Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Standards Track                            M. Lindqvist
Expires: August 29, 2013                                      F. Jansson
                                                                Ericsson
                                                       February 25, 2013


                     Using Simulcast in RTP Sessions
                draft-westerlund-avtcore-rtp-simulcast-02

Abstract

   In some applications it may be necessary to send multiple media
   encodings derived from the same media source in independent RTP
   media streams. This is called Simulcast. This document discusses
   the best way of accomplishing this in RTP and how to signal it in
   SDP. It is concluded that a solution where the different simulcast
   versions are based on separate SDP media descriptions provides best
   support for simulcast. A solution is defined by making two
   extensions to SDP. The first extension consists of two new
   attributes in SDP that express capability to send or receive
   simulcast streams, respectively. The second extension describes how
   to group media descriptions belonging to the same simulcast source
   by using the grouping framework.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 29, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

Westerlund, et al.       Expires August 29, 2013                [Page 1]

Internet-Draft                RTP Simulcast                February 2013

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Terminology
     2.2.  Requirements Language
   3.  Simulcast Scenarios
     3.1.  Simulcasting to RTP Mixer
       3.1.1.  Simulcast Combined with Scalable Encoding
     3.2.  Multicast Transported Simulcasted Media
       3.2.1.  Diversity in Receiver Population
       3.2.2.  Bit-rate Adaptation
     3.3.  Same Encoding to Multiple Destinations
     3.4.  Different Encoding to Independent Destinations
   4.  Network Aspects
   5.  Simulcast Alternatives
     5.1.  Using the Payload Type
     5.2.  Using Single RTP session
     5.3.  Using Multiple RTP sessions
     5.4.  Conclusions
   6.  Simulcast Signaling Proposal
     6.1.  Simulcast Capability
     6.2.  Grouping Simulcast Media Descriptions
       6.2.1.  Declarative Use
       6.2.2.  Offer/Answer Use
     6.3.  Two-Phase Negotiation
     6.4.  Media Stream Requirements
     6.5.  Relating Alternative Encodings
     6.6.  Multiple Stream handling
   7.  Simulcast Signaling Examples
     7.1.  Alice: Desktop Client
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1. Introduction

   Simulcast is the act of simultaneously sending multiple different
   versions of the same media content, e.g. the same video source
   encoded with different video encoders or target resolutions. This
   can be done in several ways and for different purposes. This
   document focuses on the case where one wants to provide multiple
   streams with different encodings over RTP [RFC3550] towards an
   intermediary so that the intermediary can select which encoding to
   forward to other participants in the session, and more specifically
   how the grouping of the streams is defined.

   From an RTP perspective, simulcast is a specific application of the
   aspects discussed in RTP Multiplexing Architecture
   [I-D.westerlund-avtcore-multiplex-architecture].

   The different encodings of a media content that are considered in
   this document can differ in:

   Bit-rate: The difference is the number of bits spent to encode the
      media, thus giving different quality.

   Codec: Different media codecs are used to ensure that different
      receivers that do not have a common set of decoders can decode at
      least one of the versions. This can include codec configuration
      options that are not compatible, like video encoder profiles, or
      the capability of receiving the transport packetization.

   Sampling: Different sampling of media, in the spatial as well as in
      the temporal domain, may be used to suit different rendering
      capabilities or needs at the receiving endpoints, as well as a
      method to achieve different bit-rates. For video streams,
      spatial sampling affects image resolution and temporal sampling
      affects video frame rate. For audio, spatial sampling relates to
      the number of audio channels and temporal sampling affects audio
      bandwidth. Obviously, a difference in sampling may result in a
      difference in bit-rate.

   There are different reasons for an application to provide multiple
   different encodings of a single media source. As soon as an
   application has the need to send multiple encodings, there is a
   potential need for simulcast. This need can arise even when using
   media codecs that have scalability features built in.

   The purpose of this document is to describe a few scenarios where
   the use of simulcast is motivated, elaborate on possible
   alternatives and available mechanisms, and find a suitable solution
   for signaling and performing RTP simulcast. The discussion results
   in a signaling proposal to support simulcast.

2. Definitions

2.1. Terminology

   The following terms and abbreviations are used in this document:

   Encoding: A particular encoding is the choice of the media encoder
      (codec) that has been used to compress the media and the fidelity
      of that encoding through the choice of sampling, bit-rate and
      other codec configuration parameters.

   Different encodings: An encoding is different when some parameter
      that characterizes the encoding of a particular media source is
      changed. Such changes can be to one or more of the following
      parameters: codec, codec configuration, bit-rate, sampling.

   Simulcast versions: Media streams used for simulcast that use
      different encodings and thus constitute different versions of the
      same media source.

2.2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Simulcast Scenarios

   This section discusses different usage scenarios for the term
   simulcast and clarifies which of those this document focuses on. It
   also reviews why simulcast and scalable codecs can be a useful
   combination.

3.1. Simulcasting to RTP Mixer

   This scenario relates to a multi-party session where one or more
   central nodes are used to facilitate the media transport between the
   session participants. Thus, this targets the RTP Mixer Topology
   defined in [RFC5117] (Section 3.4: Topo-Mixer). This scenario is
   targeted for further discussion in this document.

   Simulcasting different media encodings of video that differ both in
   resolution and in bit-rate is highly applicable to video
   conferencing scenarios. For example, an RTP mixer selects the video
   of the most active speaker and sends that participant's video stream
   as a high resolution stream to the other participants, and in
   addition also sends a number of low resolution video streams of the
   other participants, enabling the receiving user to both display the
   current speaker in high quality and monitor the other participants
   in lower quality/resolution/size. As the participants should not
   receive the stream showing themselves, the set of streams will be
   unique to each participant.

   A number of alternatives exist to provide both high and low
   resolutions from an RTP Mixer:

   Simulcast: The clients send one stream for the low resolution and
      another for the high resolution to the RTP Mixer.

   Scalable Video Coding: The clients send one stream to the RTP Mixer,
      using a video encoder that in this stream can provide the high
      resolution and also enables the mixer to extract a low resolution
      representation from that single stream.

   Transcoding in the Mixer: The clients send a high resolution stream
      to the RTP Mixer, which performs a transcoding to a lower
      resolution stream.

   The Transcoding alternative requires that the RTP mixer has a
   sufficient amount of transcoding resources to produce the number of
   low resolution streams required. In the worst case, all
   participants' streams may need to be transcoded. If the resources
   are not available, a different solution is needed. There will also
   normally be a quality loss and an increase in latency associated
   with the transcoding operation.

   Scalable video encoding requires a more complex encoder compared to
   non-scalable encoding. Also, if the resolution difference between
   the streams is large, a scalable codec may in fact be only
   marginally more bandwidth efficient than the simulcast case where
   the different resolutions are sent as separate streams from the
   clients to the mixer. At the same time, with scalable video
   encoding using the currently available scalable video codecs, the
   transmission of all but the lowest resolution will consume more
   bandwidth from the mixer to the other participants compared to a
   non-scalable encoding.

   Simulcasting has the benefit that it is conceptually simple. It
   enables the use of any media codec that the participants agree on,
   allowing the RTP mixer to be codec-agnostic.

              +------------+      +---+
   +---+      |            |----->| B |
   |   |=====>|            |      +---+
   | A |      |   Mixer    |
   |   |----->|            |      +---+
   +---+      |            |=====>| C |
              +------------+      +---+

          Figure 1: RTP Mixer selecting from simulcast versions

   The sender A provides the mixer with both a high resolution version
   "===>" and a low resolution version "--->". The mixer selects which
   members of its receiver population should get a particular version.

3.1.1. Simulcast Combined with Scalable Encoding

   As explained in the previous section, a scalable codec is not always
   more bandwidth efficient than simulcast, especially in the path from
   the mixer to the receiver. There are however cases where a
   combination of simulcast and scalable encoding can be beneficial.
   By using simulcast in cases where the scalable codec is less
   efficient, it is possible to optimize the efficiency of the complete
   system. A good example of this usage would be where the video is
   encoded using SVC transported in RTP [RFC6190], where each simulcast
   stream has a different resolution, and each SVC media stream uses
   temporal scalability and signal-to-noise ratio (SNR) scalability
   within that single media stream. If only resolution and temporal
   variations are needed, this can be implemented using the
   non-scalable part of H.264, as each simulcast version provides the
   different resolution, and each media stream within a simulcast
   encoding has temporal scalability through the use of non-reference
   frames.

3.2. Multicast Transported Simulcasted Media

   When using multicast, particularly Source-Specific Multicast (SSM)
   [RFC3569], to distribute RTP/RTCP packets to a large receiver
   population, one faces some issues. There are at least two different
   issues where simulcast can potentially be useful.

3.2.1. Diversity in Receiver Population

   If there is any diversity in the receivers regarding e.g.
   capability, codec support or code base, there are potentially
   restrictions in what streams can be delivered to the receivers. If
   using the lowest common denominator over a diverse receiver
   population isn't acceptable, simulcast can be one possible solution.
   By offering different stream alternatives, it is possible to let the
   receivers choose the simulcast version that matches their
   capabilities.

   By using explicit signalling for simulcast, it is not necessary for
   the stream distributor to handle multiple receiver configurations
   individually for a multi-media session, nor to ensure that each
   receiver gets an encoding that matches their capabilities. The
   simulcast version granularity the receivers can select will be on
   multicast group level. Thus, this use case puts a strict
   requirement on supporting separation through different RTP sessions.
   The reason is that having a single RTP session straddle several
   multicast groups makes any reporting on the received sources very
   difficult to interpret. Using one RTP session per simulcast version
   instead provides consistency.

3.2.2. Bit-rate Adaptation

   If the network paths from the media sender to the receivers can
   support different bit-rates, there is a need to support media
   streams encoded to different bit-rates.

   If these path differences are of a more static nature, for example
   depending primarily on the underlying link layers, using simulcast
   has an advantage over scalable encoding. The reason is that the
   efficiency of scalable coding will never be better than encoding to
   a single target rate. When the receiver can determine current
   network interface connectivity, it can choose simulcast version with
   certainty. That choice will also be correct until the event of
   another network interface becoming the active one.
   This assumes that the multicast transmission uses dedicated
   resources and will thus not be congested due to other network
   traffic. To support this behavior, the signalling must support
   indication of which media streams are alternatives to each other,
   and it is also necessary to be able to determine aggregate bit-rate
   for the selected multicast group(s) compared to available network
   properties.

   Simulcast is possible to use also in more dynamic situations where
   each receiver continuously gathers reception statistics to detect
   path congestion and based on that may change which version to
   receive. The main issue with such usage is how to achieve a switch
   from one version to another with minimal playback interruption while
   also avoiding extra load on the network during the actual switch.
   Here, scalable encoding in general has better characteristics since
   scalability layers are typically synchronized.

   When comparing simulcast and scalable encoding, the trade-offs are
   different and the down-sides occur at different places. Simulcast
   will have a higher bit-rate load at a media sender and that will
   also be the case for any network path shared between receivers of
   multiple simulcast versions. However, for parts of the network path
   where there is only a single simulcast version, the achievable
   quality at a given bit-rate will be slightly higher for simulcast.
   It will also be more difficult to seamlessly switch between
   simulcast versions than between different scalable encodings, as
   simulcast actually switches from one media stream version to another
   instead of adding or removing some enhancement layers.

3.3. Same Encoding to Multiple Destinations

   One interpretation of simulcast is when one encoding is sent to
   multiple receivers.
   This is well supported in RTP by simply copying all outgoing RTP and
   RTCP traffic to several transport destinations, if the intention is
   to create a common RTP session. As long as all participants do the
   same, a full mesh is constructed and everyone in the multi-party
   session has a similar view of the joint RTP session. This is
   analogous to an Any Source Multicast (ASM) session but without the
   traffic optimization, as multiple copies of the same content are
   likely to have to pass over the same link.

   +---+      +---+
   | A |<---->| B |
   +---+      +---+
     ^         ^
      \       /
       \     /
        v   v
        +---+
        | C |
        +---+

                  Figure 2: Full Mesh / Multi-unicast

   As this type of simulcast is analogous to ASM usage and RTP has good
   support for ASM sessions, no further consideration is made in this
   document for this scenario.

3.4. Different Encoding to Independent Destinations

   Another alternative interpretation of simulcast includes multiple
   destinations, where each destination gets a specifically tailored
   version, but where the destinations are independent. A typical
   example for this would be a streaming server distributing the same
   live session to a number of receivers, adapting the quality and
   resolution of the multi-media session to each receiver's capability
   and available bit-rate. This case can be solved in RTP by having
   independent RTP sessions between the sender and the receivers. Thus
   this case is not considered further.

4. Network Aspects

   The network aspects that are relevant for simulcast are:

   Quality of Service: When using simulcast it might be of interest to
      prioritize a particular simulcast version, rather than applying
      equal treatment to all versions. For example, lower bit-rate
      versions may be prioritized over higher bit-rate versions to
      minimize congestion or packet losses in the low bit-rate
      versions. Thus, there is a benefit to using a simulcast solution
      that supports QoS as well as possible.
      By separating simulcast versions into different RTP sessions and
      sending those RTP sessions over different transport flows, a
      simulcast version can be prioritized by existing flow based QoS
      mechanisms. When using unicast, QoS mechanisms based on
      individual packet marking are also feasible, which do not require
      separation of simulcast versions into different RTP sessions to
      apply different QoS.

   NAT/FW Traversal: Using multiple RTP sessions will incur more cost
      for NAT/FW traversal unless they can re-use the same transport
      flow, which can be achieved either by multiplexing multiple RTP
      sessions on a single lower layer transport
      [I-D.westerlund-avtcore-transport-multiplexing] or by
      Multiplexing Negotiation Using SDP Port Numbers
      [I-D.ietf-mmusic-sdp-bundle-negotiation]. If flow based QoS with
      any differentiation is desirable, the cost for additional
      transport flows is likely necessary.

   Multicast: Multiple RTP sessions will be required to enable
      combining simulcast with multicast. Different simulcast versions
      have to be separated to different multicast groups to allow a
      multicast receiver to pick the version it wants, rather than
      receive all of them. In this case, the only reasonable
      implementation is to use different RTP sessions for each
      multicast group so that reporting and other RTCP functions
      operate as intended.

5. Simulcast Alternatives

   Simulcast is in this document defined as the act of sending multiple
   alternative encodings of the same underlying media source. When
   transmitting multiple independent streams that originate from the
   same source, it could potentially be done in several different ways
   using RTP. A general discussion of considerations for use of the
   different RTP multiplexing alternatives can be found in Guidelines
   for using the Multiplexing Features of RTP
   [I-D.westerlund-avtcore-multiplex-architecture]. Discussion and
   clarification on how to handle multiple streams in an RTP session
   can be found in [I-D.lennox-avtcore-rtp-multi-stream].

   The below sub-sections briefly describe potential ways of achieving
   RTP media stream multiplexing and identification of which streams
   are alternative simulcast encodings of the same source.

   The following descriptions also cover how this interacts with
   multiple sources (SSRCs) in the same RTP session for other reasons
   than simulcast. Multiple SSRCs may occur for various reasons such
   as multiple participants in multipoint topologies like multicast,
   transport relays or full mesh transport simulcasting, multiple
   source devices such as multiple cameras or microphones at one
   end-point, or other RTP mechanisms such as RTP Retransmission
   [RFC4588].

5.1. Using the Payload Type

   An alternative could be to use only the RTP payload type to identify
   the different simulcast streams. This could be tempting, since
   simulcast streams may differ in codec, codec configuration, or
   sampling, all of which are typically specified in SDP by a format
   number on the media line that is in turn connected to an RTP Payload
   Type. Thus all simulcast streams would be sent in the same RTP
   session using only a single SSRC per actual media source. However,
   as discussed in Guidelines for using the Multiplexing Features of
   RTP [I-D.westerlund-avtcore-multiplex-architecture], using Payload
   Type Multiplexing does not generally work and is hereby dismissed as
   a potential solution.

5.2. Using Single RTP session

   This idea is based on using a unique SSRC for each alternative
   encoding of an actual media source within a single RTP session.
   The identification of streams and how they are specified to be
   related alternatives needs an additional mechanism, for example
   using SSRC grouping [RFC5576], and potentially also a new SDES item
   such as SRCNAME proposed in
   [I-D.westerlund-avtext-rtcp-sdes-srcname] with semantics that
   indicate them as alternatives of a particular media source. When
   there are multiple actual media sources in a session, each media
   source will have to use a number of SSRCs to represent the different
   simulcast alternatives it produces. For example, if the number of
   media sources is n and they all produce the same number of simulcast
   versions, m, there will be n*m SSRCs in use in the RTP session.
   Each SSRC can use any of the configured payload types for this RTP
   session. All session level attributes and parameters that are not
   source specific will apply and must function with all the
   alternative encodings in use.

   In the currently used signaling system based on SDP [RFC4566] and
   Offer/Answer [RFC3264], the properties of media streams are
   typically negotiated on media block (m-line) level. Sending
   simulcast alternatives as different SSRCs belonging to the same
   media description is likely possible to achieve, but SSRC centric
   signaling providing the needed media stream properties is currently
   almost non-existent and it would require a considerable effort to
   make the necessary SDP extensions.

   A single RTP session can be described in SDP by more than a single
   m-line, like for BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation],
   and it can re-use the same m-line grouping [RFC5888] as would be
   used for multiple RTP sessions (Section 5.3), but the RTP aspects
   described in this section will still apply. This would enable the
   same signalling expressiveness for multiple RTP sessions as for a
   single RTP session.

5.3. Using Multiple RTP sessions

   Using multiple RTP sessions means that each different simulcast
   version of an actual media source is transmitted in a separate RTP
   session, using some session identifier to distinguish the different
   versions. Since each RTP session is described by one or more SDP
   m-lines, this solution needs explicit m-line grouping [RFC5888] with
   semantics that indicate them as simulcast alternatives. It is also
   important to identify the SSRCs in the different sessions that are
   alternative encodings of the same media source, if there is more
   than a single media source in each RTP session. This could be
   accomplished using the same SSRC across the sessions, but that is
   not robust against SSRC collisions and could potentially force
   cascading SSRC changes between sessions. A better choice would be
   to use different SSRCs, but relate streams through a new SDES item
   proposed in [I-D.westerlund-avtext-rtcp-sdes-srcname].

   Each RTP session will have its own set of configured RTP payload
   types available for use with any SSRC in that session. In addition,
   all other attributes for sessions or sources can be used as normal
   to indicate the configuration of that particular alternative.

5.4. Conclusions

   If it is at all desirable to support simulcast based on multicast,
   the solution must support using multiple RTP sessions. The main
   reason is that receiver based selection of simulcast version must be
   possible, which is accomplished in multicast through receiver
   selection of which multicast group(s) it joins. This also has the
   advantage of being able to use the existing SDP media description
   (m=) expressiveness to signal or negotiate simulcast versions.

   When using simulcast based on unicast, it is desirable to be able to
   use the same media description signalling expressiveness regardless
   of whether multiple RTP sessions are used or not.
   Assuming that MMUSIC decides to enable single RTP media stream
   negotiation per SDP media description and combine that with BUNDLE
   to identify RTP sessions, it appears that using one or more RTP
   sessions for simulcast over unicast will be able to use the same
   signalling solution. Thus the decision to use one or more RTP
   sessions can be taken based on other limitations, such as cost of
   NAT/FW traversal, need for flow-based QoS etc.

   A solution proposal for an SDP media description level signaling for
   Simulcast version parameters is outlined below.

6. Simulcast Signaling Proposal

   Signaling simulcast is about negotiating between media sender and
   receiver what the different simulcast versions should be, how to
   identify them in terms of RTP streams, and how to relate those RTP
   streams.

   The proposed solution consists of:

   o  Signaling simulcast capability as SDP media level attributes in a
      first round of Offer/Answer

      *  Separate send and receive simulcast capabilities

      *  Media properties that are supported as base for different
         simulcast versions are listed as parameters

   o  Adding SDP media descriptions for the simulcast streams in a
      second round of Offer/Answer

      *  Grouping SDP media descriptions from the same media source,
         belonging to the same simulcast, using the SDP grouping
         framework [RFC5888]

      *  Separate send and receive simulcast groupings

      *  Negotiating parameters for simulcast version using regular,
         individual SDP media descriptions

      *  Identifying RTP media streams (SSRC) from same media source
         using new SDES Item SRCNAME
         [I-D.westerlund-avtext-rtcp-sdes-srcname]

   This is further outlined below.

6.1. Simulcast Capability

   There are numerous media properties that can be varied to construct
   a set of simulcast versions. A simulcast enabled endpoint could
   also support simulcast based on several of those properties.
   As long as those properties are relatively independent and if each
   simulcast version needs explicit definition (an m-line) in the SDP,
   this would lead to an exponential number of simulcast version
   candidates and a very long SDP that is likely also hard to
   interpret. There is thus a need to limit the simulcast version
   candidates included in the SDP to cover as small a set of properties
   as possible.

   If a legacy endpoint not supporting simulcast were to be presented
   with an SDP including media descriptions for a set of simulcast
   versions, it may not know how to correctly handle or interpret these
   "surplus" media descriptions.

   Based on the functionality that simulcast is intended to achieve, it
   should be clear that the reasons to send simulcast versions are not
   the same as to receive simulcast versions, seen from a single
   endpoint.

   For these reasons, it is proposed to define two new SDP media level
   attributes, "a=sim-send" and "a=sim-recv", which explicitly signal
   support for simulcast media transmission and simulcast media
   reception, respectively, for that media description. "a=sim-send"
   and "a=sim-recv" MAY be used independently and simultaneously.
   These attributes are also proposed to have parameters indicating the
   media properties used to create the simulcast versions. The meaning
   of the attributes on SDP session level is undefined and MUST NOT be
   used.

   simulcast  = "a="( "sim-send:" / "sim-recv:" ) prop-list
   prop-list  = prop-entry *(WSP prop-entry)
   prop-entry = prop *("=" q-value)
   prop       = "rtpmap" / "fmtp" / "imageattr" / "ptime" / "crypto" /
                token ; for future extensions
   q-value    = ( "0" "." 1*2DIGIT ) / ( "1" "." 1*2("0") )
                ; Values between 0.00 and 1.00

   ; WSP and DIGIT defined in [RFC5234]
   ; token defined in [RFC4566]

                      Figure 3: ABNF for Simulcast

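   As an illustration only, the grammar in Figure 3 can be exercised
   with a small parser. The following Python sketch is not part of the
   proposal. The names parse_simulcast and SimulcastCapability are
   hypothetical, the "token" rule is simplified to any run of
   characters other than whitespace and "=", and at most one q-value
   per property is assumed.

```python
import re
from typing import NamedTuple

# q-value per Figure 3: "0." plus 1-2 digits, or "1." plus 1-2 zeros.
_Q_VALUE = r"(?:0\.[0-9]{1,2}|1\.0{1,2})"
# Assumption: "token" is simplified here; RFC 4566 defines it more strictly.
_PROP_ENTRY = re.compile(rf"([^\s=]+)(?:=({_Q_VALUE}))?")


class SimulcastCapability(NamedTuple):
    direction: str  # "send" or "recv"
    props: dict     # property name -> q-value, or None if none was given


def parse_simulcast(line: str) -> SimulcastCapability:
    """Parse one "a=sim-send:" or "a=sim-recv:" attribute line."""
    match = re.fullmatch(r"a=sim-(send|recv):(.+)", line.strip())
    if match is None:
        raise ValueError(f"not a simulcast attribute: {line!r}")
    props = {}
    # prop-list entries are separated by a single WSP (space or tab).
    for entry in re.split(r"[ \t]", match.group(2)):
        entry_match = _PROP_ENTRY.fullmatch(entry)
        if entry_match is None:
            raise ValueError(f"malformed prop-entry: {entry!r}")
        name, q_value = entry_match.groups()
        props[name] = float(q_value) if q_value is not None else None
    return SimulcastCapability(match.group(1), props)
```

   For a made-up attribute line such as
   "a=sim-send:rtpmap=1.0 imageattr=0.75 ptime", the sketch reports the
   direction "send", q-values 1.0 and 0.75 for rtpmap and imageattr,
   and no q-value for ptime. A value outside the range, such as
   "fmtp=1.5", is rejected.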

   The media property values are taken from existing (and could likely
   be extended to cover future) SDP attributes that express media
   properties that can be varied to create different simulcast
   versions:

   rtpmap: Differences in codec type, sampling rate (see Section 6.4),
      and number of channels

   fmtp: Differences in codec-specific encoding parameters

   imageattr: Differences in video resolution, aspect ratio, and
      framerate [RFC6236]

   ptime: Differences in frame aggregation per packet

   crypto: Differences in encryption [RFC4568]

   ...:

   The optional q-value expresses the relative preference to base a
   simulcast version on that media property, with 1.00 meaning maximum
   (100%) preference and 0.00 meaning no (0%) preference. Several
   media properties can share the same q-value, in which case they are
   equally preferred.

   An offerer wanting to use simulcast SHALL include either one or both
   of those attributes, depending on in which direction(s) simulcast
   will be used. An offerer that receives an answer without
   "a=sim-send" or "a=sim-recv" MUST NOT define or use any simulcast
   alternatives belonging to that media description and in that
   direction to the answerer.

   An answerer that does not understand the concept of simulcast will
   also not know those attributes and will remove them in the SDP
   answer, as defined in existing SDP Offer/Answer procedures. An
   answerer that does understand the attributes and that wants to
   support simulcast in the indicated direction SHALL reverse
   directionality of the attribute, "sim-send" becomes "sim-recv" and
   vice versa, and include it in the answer.

   An offerer that intends to send simulcast alternatives and thus
   includes "a=sim-send", MUST also include at least one media property
   parameter that it intends to use to construct the simulcast
   alternatives, but it MAY include more media property parameters.
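   The q-value ordering and the direction reversal described above can
   be sketched as follows. This Python fragment is an illustration
   under assumptions, not part of the proposal: the function names are
   hypothetical, and every listed property is assumed to carry a
   q-value.

```python
from itertools import groupby


def reverse_direction(attribute: str) -> str:
    """Return the attribute name an answerer uses for an offered one."""
    return {"sim-send": "sim-recv", "sim-recv": "sim-send"}[attribute]


def preference_tiers(q_values: dict) -> list:
    """Group property names into tiers, most preferred tier first.

    Properties sharing a q-value are equally preferred and share a tier.
    """
    # Sort by descending q-value; names break ties so output is stable.
    ordered = sorted(q_values.items(), key=lambda item: (-item[1], item[0]))
    return [
        [name for name, _ in tier]
        for _, tier in groupby(ordered, key=lambda item: item[1])
    ]
```

   With the made-up q-values rtpmap=1.0, imageattr=0.5 and fmtp=0.5,
   the first tier holds rtpmap alone and the second tier holds fmtp and
   imageattr as equally preferred.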

   Including multiple media property parameters in "a=sim-send" SHALL be
   interpreted as an offer to send simulcast versions covering all
   combinations thereof, but MAY be further restricted by other
   information in the SDP, for example the number of simulcast-related
   media descriptions in the SDP or use of max-ssrc signaling
   [I-D.westerlund-mmusic-max-ssrc].

   An offerer that is capable of receiving simulcast alternatives, and
   thus includes "a=sim-recv", MUST also include at least one media
   property parameter that it is willing to use as discriminator between
   received simulcast alternatives, but MAY include more media property
   parameters.  Including multiple media property parameters in
   "a=sim-recv" SHALL be interpreted as an offer to receive simulcast
   versions covering all combinations thereof, but MAY be further
   restricted by other information in the SDP, for example the number of
   simulcast-related media descriptions in the SDP or use of max-ssrc
   signaling [I-D.westerlund-mmusic-max-ssrc].

   An answerer that lacks either the capability or the desire to use
   simulcast versions based on a certain media property parameter in a
   specific direction MUST remove that media property parameter from
   "a=sim-send" or "a=sim-recv".  The answerer MUST NOT add any media
   property parameters that were not included in the offer.

6.2.  Grouping Simulcast Media Descriptions

   To relate media descriptions holding simulcast versions, two new
   simulcast grouping semantics are defined, "SimulCast Receive" (SCR)
   and "SimulCast Send" (SCS).  There is a need to separate semantics
   for the intent to send simulcast streams from the semantics that
   describe capability to recognize and receive simulcast streams.  Both
   semantics act as an indicator that simulcast is desired and that the
   grouped media descriptions (m-lines) carry simulcast versions of
   media sources.
   There may be multiple sets of media descriptions that carry simulcast
   versions.

6.2.1.  Declarative Use

   When used as a declarative media description, SCR indicates the
   configured end-point's required capability to recognize and receive a
   specified set of RTP streams as simulcast streams.  In the same
   fashion, SCS requests the end-point to send a specified set of RTP
   streams as simulcast streams.  SCR and SCS MAY be used independently
   and at the same time, and they need not specify the same media
   descriptions, or even the same number of media descriptions, in the
   group.

6.2.2.  Offer/Answer Use

   When used in an offer, SCS indicates the SDP providing agent's intent
   of sending simulcast and the particular set of media descriptions,
   and SCR indicates the agent's capability of receiving simulcast
   streams within the configured set of media descriptions.  SCS and SCR
   MAY be used independently and at the same time, and they need not
   specify the same media descriptions, or even the same number of media
   descriptions, in the group.

   The answerer MUST change SCS to SCR and SCR to SCS in the answer,
   given that it has and wants to use the corresponding (reverse)
   capability.  An answerer not supporting the SCS or SCR direction, or
   not supporting SCS or SCR grouping semantics at all, will remove that
   grouping attribute altogether, according to the grouping framework
   [RFC5888].  However, this case should not occur, or should at least
   be very rare, due to the proposed two-phase approach (Section 6.3).
   An offerer that receives an answer indicating lack of simulcast
   support in one or both directions, where SCR and/or SCS grouping are
   removed, MUST NOT use simulcast in the non-supported direction(s).

6.3.  Two-Phase Negotiation

   These new "a=sim-send" and "a=sim-recv" attributes are proposed to be
   included in the SDP as a first phase in a two-phased approach, where
   the first phase involves a first SDP Offer/Answer procedure that only
   establishes simulcast capability at both the offerer and the
   answerer.

   This has the additional advantage of avoiding sending media
   descriptions related to simulcast to an endpoint that does not
   support simulcast.  It is also not likely to incur any significant
   extra signaling round-trips, given that many other recent SDP
   techniques also make use of two Offer/Answer procedures, as long as
   this phased approach can be used in parallel with those.  Such other
   two-phase techniques include ICE [RFC5245] and BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation].

   Thus, the first Offer/Answer SHOULD NOT include any simulcast-grouped
   media descriptions, which SHOULD then be added in a second Offer/
   Answer phase.  This second phase SHOULD be initiated by the simulcast
   receiver, meaning the endpoint that included "a=sim-recv" in the
   first phase SDP SHOULD be offerer in the second phase.  If both
   endpoints are simulcast receivers, it is not possible to define a
   preferred offerer in the second phase and either endpoint MAY then
   send the offer, using regular Offer/Answer rules to handle race
   conditions.

   The first phase of establishing capability cannot be used with
   declarative SDP, in which case it SHALL be bypassed, using the second
   phase media description grouping directly.

6.4.  Media Stream Requirements

   When doing simulcast, the media streams that are alternatives need to
   meet certain constraints to ensure that switching between alternative
   streams is as issue-free as possible.  The following constraints are
   needed:

   Same Clock Base:  To enable correct alignment of media packets on the
      source time-line, all alternative streams (SSRCs) MUST use the
      same underlying clock to relate their RTP timestamp values with
      the Network Time Protocol (NTP) formatted sender time in the RTCP
      Sender Reports.

6.5.  Relating Alternative Encodings

   To ensure that simulcast streams can be related correctly also on RTP
   level, the usage of SDES SRCNAME
   [I-D.westerlund-avtext-rtcp-sdes-srcname] to label and relate
   simulcast versions belonging to the same media source is RECOMMENDED.

6.6.  Multiple Stream Handling

   When using multiple SSRCs in a single media description, for example
   when using simulcast for multiple independent media sources, the
   grouping semantics SCR and SCS SHOULD be combined with the SDP
   attributes "a=max-send-ssrc" and "a=max-recv-ssrc"
   [I-D.westerlund-mmusic-max-ssrc] to indicate the number of
   simultaneous streams of each encoding that may be sent or that can be
   handled in the receive direction.

7.  Simulcast Signaling Examples

   For brevity and clarity, the SDP in all examples below does not
   contain signaling for multiple streams, such as the ones related to
   RTP level relations (Section 6.5) or multiple SSRC signaling
   (Section 6.6).

   This example is for a case of client to video conference service
   using a centralized media topology with an RTP mixer.  Alice and Bob
   call into a conference server for a conference call with audio and
   video sent to the RTP mixer, these clients being capable of sending a
   few video simulcast versions.  The conference server also dials out
   to Fred, who is a legacy client, resulting in fallback behavior.
   When dialing out to Joe, more functionality is enabled as Joe is a
   client similar to Alice.

   +---+      +-----------+      +---+
   | A |<---->|           |<---->| B |
   +---+      |           |      +---+
              |   Mixer   |
   +---+      |           |      +---+
   | F |<---->|           |<---->| J |
   +---+      +-----------+      +---+

              Figure 4: Four-party Mixer-based Conference

   Example of media plane for an RTP mixer based multi-party conference
   with 4 participants.

7.1.  Alice: Desktop Client

   Alice is calling in to the mixer with an audiovisual single stream
   desktop client, only adding capability to send video resolution
   [RFC6236] ("imageattr") and framerate based simulcast compared to a
   legacy client.  The first phase offer from Alice looks like

   v=0
   o=alice 2362969037 2362969040 IN IP4 192.0.2.156
   s=Simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.156
   b=AS:665
   m=audio 49200 RTP/AVP 96 97 9 8
   b=AS:145
   a=rtpmap:96 G719/48000/2
   a=rtpmap:97 G719/48000
   a=rtpmap:9 G722/8000
   a=rtpmap:8 PCMA/8000
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-send:imageattr=1.0 fmtp=0.8
   a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
   a=content:main

         Figure 5: Alice First Offer for a Simulcast Conference

   In this first phase, the only thing in the SDP that indicates
   simulcast capability is the line in the video media description
   containing the "sim-send" attribute.

   The answer from the server indicates both that it is simulcast
   capable and that it would like to use only video resolution
   ("imageattr") based simulcast.  Should it not have been simulcast
   capable, the "a=sim-recv" line would not have been present and
   communication would have started with the media negotiated in the
   SDP.

   v=0
   o=server 823479283 1209384938 IN IP4 192.0.2.2
   s=Answer to simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.43
   b=AS:665
   m=audio 49200 RTP/AVP 96
   b=AS:145
   a=rtpmap:96 G719/48000/2
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-recv:imageattr
   a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
   a=content:main

        Figure 6: Server First Answer for a Simulcast Conference

   Since the server is the simulcast media receiver, it immediately
   initiates another Offer/Answer including the simulcast versions.  The
   server also keeps the "sim-recv" as explicit simulcast capability
   indication in this second Offer/Answer round.  Note that the
   "non-simulcast" media can be started already now, before the second
   phase Offer/Answer, with the only restriction that the simulcast
   functionality is not yet established.

   v=0
   o=server 823479283 1209384938 IN IP4 192.0.2.2
   s=Server inviting simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.43
   b=AS:825
   a=group:SCR 2 3
   m=audio 49200 RTP/AVP 96
   b=AS:145
   a=rtpmap:96 G719/48000/2
   a=mid:1
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-recv:imageattr
   a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
   a=mid:2
   a=content:main
   m=video 49400 RTP/AVP 96
   b=AS:160
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c00d
   a=imageattr:96 recv [x=320,y=180]
   a=mid:3
   a=recvonly

        Figure 7: Server Second Offer for a Simulcast Conference

   The server has added one additional receive-only media description
   with the simulcast version based on difference only in imageattr.
   That the two media lines are considered to be simulcast versions is
   seen from the SCR grouping tag and the two media IDs (2 and 3).
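
   How a receiver of such an SDP might relate the grouped media
   descriptions can be sketched in Python as below.  This is a minimal,
   non-normative illustration; the function name simulcast_groups is
   hypothetical, and the sketch assumes that "a=group" appears at
   session level (before the first "m=" line) and that each media
   description carries at most one "a=mid" attribute, as in the
   examples of this section.

```python
from typing import List, Tuple


def simulcast_groups(sdp: str) -> List[Tuple[str, List[str]]]:
    """Return (semantics, [m-lines]) for every SCR/SCS group in the SDP."""
    groups = []           # (semantics, [mid, ...]) from session level
    mid_to_media = {}     # mid -> "m=" line of its media description
    current_media = None  # None while still in the session-level part
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            current_media = line
        elif line.startswith("a=group:") and current_media is None:
            fields = line[len("a=group:"):].split()
            if fields and fields[0] in ("SCR", "SCS"):
                groups.append((fields[0], fields[1:]))
        elif line.startswith("a=mid:") and current_media is not None:
            mid_to_media[line[len("a=mid:"):]] = current_media
    # Resolve each media ID to its m-line, skipping unknown IDs.
    return [(semantics, [mid_to_media[m] for m in mids if m in mid_to_media])
            for semantics, mids in groups]
```

   Returning a list rather than a mapping keyed on the semantics allows
   for several groups with the same semantics in one SDP.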

   The first video version with media ID 2 prefers 360p resolution
   (signaled via imageattr) and the second video version with media ID 3
   prefers 180p resolution.  The first video media line also acts as the
   single send video (making the media line sendrecv), while the second
   video media line is only related to simulcast transmission and is
   thus offered recvonly.  The fact that fmtp for this second video is
   also different should be seen as a secondary effect of the change of
   resolution and does not create any kind of conflict.

   The capabilities of Alice's client are very well aligned with this
   and the SDP answer is straightforward.

   v=0
   o=alice 2362969037 2362969040 IN IP4 192.0.2.156
   s=Final answer from simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.156
   b=AS:825
   a=group:SCS 2 3
   m=audio 49200 RTP/AVP 96
   b=AS:145
   a=rtpmap:96 G719/48000/2
   a=mid:1
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-send:imageattr
   a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
   a=mid:2
   a=content:main
   m=video 49400 RTP/AVP 96
   b=AS:160
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c00d
   a=imageattr:96 send [x=320,y=180]
   a=mid:3
   a=sendonly

        Figure 8: Alice Second Answer for a Simulcast Conference

8.  IANA Considerations

   This document requests registration of two new attributes, sim-send
   and sim-recv, with a new registry of defined parameters taken from
   existing SDP attributes, and of two new SDP grouping semantics, SCS
   and SCR.  Formal registrations are to be written.

9.  Security Considerations

   The simulcast capability attributes and parameters are vulnerable to
   attacks in signaling.

   A false inclusion of simulcast attributes may result in generation of
   a second phase SDP that potentially contains a large number of
   non-supported media descriptions expressing simulcast alternatives.
   A correct SDP implementation will however be able to reject any
   non-supported media descriptions, and the effect of that should be
   limited.

   A hostile removal of the simulcast attributes will result in any
   second phase Offer/Answer being skipped and simulcast not being used.

   The simulcast grouping semantics are vulnerable to attacks in the
   signaling.

   A false grouping of non-simulcast streams as simulcast would risk
   that some streams are incorrectly ignored by receivers that know
   simulcast and that are not interested in the assumed simulcast
   streams.

   A hostile removal of simulcast grouping will prevent streams from
   being interpreted as simulcast, which prevents use of the simulcast
   functionality.  It will also risk that intended simulcast streams are
   instead presented as separate, independent streams to a receiver.

   Neither of the above will likely have any major consequences, and
   both can be mitigated by signaling that is at least integrity
   protected and source authenticated, preventing an attacker from
   changing it.

10.  Acknowledgements

11.  References

11.1.  Normative References

   [I-D.westerlund-avtext-rtcp-sdes-srcname]
              Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources",
              draft-westerlund-avtext-rtcp-sdes-srcname-00 (work in
              progress), October 2011.

   [I-D.westerlund-mmusic-max-ssrc]
              Holmberg, C., Westerlund, M., Burman, B., and F. Jansson,
              "Multiple Synchronization Sources (SSRC) in SDP Media
              Descriptions", draft-westerlund-mmusic-max-ssrc-00 (work
              in progress), September 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, July 2006.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, May 2011.

11.2.  Informative References

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Multiplexing Negotiation Using Session Description
              Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-03 (work in
              progress), February 2013.

   [I-D.lennox-avtcore-rtp-multi-stream]
              Lennox, J. and M. Westerlund, "Real-Time Transport
              Protocol (RTP) Considerations for Endpoints Sending
              Multiple Media Streams",
              draft-lennox-avtcore-rtp-multi-stream-01 (work in
              progress), October 2012.

   [I-D.westerlund-avtcore-multiplex-architecture]
              Westerlund, M., Burman, B., and C. Perkins, "RTP
              Multiplexing Architecture",
              draft-westerlund-avtcore-multiplex-architecture-00 (work
              in progress), October 2011.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M., "Multiple RTP Session on a Single Lower-
              Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-00 (work
              in progress), October 2011.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3569]  Bhattacharyya, S., "An Overview of Source-Specific
              Multicast (SSM)", RFC 3569, July 2003.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com


   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com


   Morgan Lindqvist
   Ericsson
   Farogatan 6
   Kista, SE-164 80
   Sweden

   Phone: +46 10 719 00 00
   Email: morgan.lindqvist@ericsson.com


   Fredrik Jansson
   Ericsson
   Farogatan 6
   Kista, SE-164 80
   Sweden

   Phone: +46 10 719 00 00
   Email: fredrik.k.jansson@ericsson.com