[rtcweb] MediaStream Label and CNAME

Magnus Westerlund <magnus.westerlund@ericsson.com> Tue, 13 September 2011 08:41 UTC


WG,
(As an individual contributor)


There has been some discussion as a result of the presentation of
terminology at the RTCWEB Interim meeting last Thursday. The biggest
question was why CNAME can't map to the MediaStream label. Below we
clarify why we think CNAME and label are separate entities.

One part of this reasoning has to do with the current definition of
‘media resource’
(<http://dev.w3.org/html5/spec/Overview.html#media-resource>) and the
media elements of html5. The ‘media resource’ could be a file or, more
relevant to this discussion, a MediaStream. In that usage only a single
video track can be played at any one time, in sync with one or more
audio tracks.

Thus, unless we modify the existing semantics, the only way of playing
multiple video tracks in sync with one or more audio tracks is to have
multiple MediaStream objects.

We see the need for supporting multiple synchronized video tracks. We
have at least two basic use cases.

1) An endpoint has two or more cameras / video grabbers that a web
application wants to use. The user will select different cameras for
different aspects of the application. Naturally the user would expect
audio and video to be synced regardless of which camera is being viewed.

2) In the centralized mixer use case, one possible video conference
RTP mixer is an application that takes the audio of a number of
participants and mixes it into a single audio stream. The video streams
are selected from among those available and mapped to a number of SSRCs
that belong to the mixer. All these MediaStreams are mapped to the
mixer's synchronization context, represented by a CNAME owned by the
mixer.

With the current MediaStream semantics this requires multiple objects
with different combinations of media tracks to enable simultaneous
playback in the browser in a synchronized fashion.

We also have some more advanced use cases where having multiple
MediaStreams with different but overlapping tracks makes sense.

The first is a mesh application with four peers (A-D). If the
application in A wants to send Audio plus Video1 to B, Audio plus
Video2 to C, and Audio plus both Videos to D, then that would be
three different MediaStream objects unless we change the semantics.

We don't believe in modifying the MediaStream semantics unless really
necessary. One reason is that we desire an API and behavior that doesn't
make the synchronization between streams dependent on how you program
things. That can easily occur if the MediaStream object is the sole
method for achieving playback synchronization. Another is that
MediaStream ties in nicely with the existing html5 media elements. We
also see the MediaStream as a good choice of representation for the
JavaScript developer, as it represents resources that commonly should be
played together.

This argument about programming things in different ways may sound
abstract. But in reality it is as simple as this: synchronization should
be possible regardless of whether a programmer calls getUserMedia
multiple times or instead clones and disables tracks in a MediaStream
object containing all resources of interest.

In addition, not supporting synchronization between multiple
MediaStreams from a single end-point makes things difficult when the
number of tracks that end-point wishes to transmit changes. This can
occur for two reasons. One is an application that offers the
functionality to add additional cameras or media grabber devices. The
other is the centralized conference case, where the central node may
change how it mixes or switches between media resources.

Assuming no structural changes, we see a need for all media tracks
originating from one browser instance to have the same CNAME, to enable
sync across tracks belonging to different MediaStreams. As a result,
the label for a MediaStream is different from the CNAME. The CNAME is
in principle a hidden property (at the API level) of a track, enabling
playback synchronization across multiple MediaStreams that have tracks
with the same CNAME.
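
As a rough illustration of the paragraph above, the following Python
sketch models tracks that carry a CNAME independently of the label of
the MediaStream they sit in. The names Track, MediaStream and
sync_groups, and the example.invalid CNAME values, are hypothetical and
exist only for this sketch; they are not part of any proposed API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Track:
    """Hypothetical track model; cname stands in for the hidden sync property."""
    track_id: str
    kind: str   # "audio" or "video"
    cname: str  # CNAME shared by all tracks from one browser instance


@dataclass
class MediaStream:
    """Hypothetical MediaStream model: a label plus the tracks it contains."""
    label: str
    tracks: list = field(default_factory=list)


def sync_groups(streams):
    """Group track ids by CNAME, ignoring which MediaStream holds each track."""
    groups = defaultdict(set)
    for stream in streams:
        for track in stream.tracks:
            groups[track.cname].add(track.track_id)
    return dict(groups)


# One browser instance (A) with one microphone and two cameras,
# exposed as two MediaStreams that share the audio track.
audio = Track("audio", "audio", "browser-a@example.invalid")
cam1 = Track("video1", "video", "browser-a@example.invalid")
cam2 = Track("video2", "video", "browser-a@example.invalid")
# A track from a different browser instance (B) has a different CNAME.
other = Track("video9", "video", "browser-b@example.invalid")

streams = [
    MediaStream("label-1", [audio, cam1]),
    MediaStream("label-2", [audio, cam2]),
    MediaStream("label-3", [other]),
]
groups = sync_groups(streams)
```

In this model the two MediaStreams from A keep distinct labels, yet all
three of A's tracks land in one synchronization group because the
grouping key is the CNAME rather than the label.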

In addition, we see the need for an optimization where a track that is
part of multiple MediaStreams is sent only once over a PeerConnection,
despite belonging to multiple MediaStreams associated with that
PeerConnection.
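
One way to picture this optimization is as a mapping from each track to
the set of MediaStream labels it belongs to, so that a shared track
gets a single entry. The sketch below is only an assumption about how
such bookkeeping could look; plan_transmission and the stream and track
names are hypothetical.

```python
def plan_transmission(streams):
    """Map each track id to the labels of the MediaStreams that contain it.

    `streams` maps a MediaStream label to a list of track ids. Every track
    appears exactly once in the result, however many streams share it.
    """
    plan = {}
    for label, track_ids in streams.items():
        for track_id in track_ids:
            plan.setdefault(track_id, []).append(label)
    return plan


# Two MediaStreams on the same PeerConnection sharing one audio track.
streams = {
    "stream-1": ["audio", "video1"],
    "stream-2": ["audio", "video2"],
}
plan = plan_transmission(streams)
```

Here the audio track is listed once, tagged with both labels, instead
of being carried twice.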

We do understand that there are other ways of meeting the goals that
we have around tracks, their synchronization, and application handles.
Those goals are:

- It must be possible to synchronize a track with any other track from
the same sync context.

- The sync identifier must be common across multiple PeerConnection
objects within a browser instance.

- The programmer needs a logical identifier for media resources that can
be transferred between the peers.

- Synchronization should not depend on how the implementor actually
writes the code. If it is possible, it should just happen.

Given our desire to avoid redefining the semantics of MediaStream we saw
that keeping the MediaStream label and the CNAME as separate identifiers
is required.

We have also considered the issue of how to signal the MediaStream
label between the two ends of a PeerConnection.

Ericsson's suggestion for how to realize this is to use the a=ssrc
attribute defined in RFC 5576, carried in the SDP if SDP offer/answer
semantics are used. Otherwise we assume similar information
structures can be created.

A MediaStream object will have a label, e.g.
"61ca6552-968a-435d-88d9-a4727f2ed515". The MediaStream will contain
tracks, and each track is mapped to an SSRC in a particular RTP session.
In the SDP, an SSRC that is part of the MediaStream will have, in its
list of a=ssrc attributes, one that reads

"a=ssrc:345678123 mslabel=61ca6552-968a-435d-88d9-a4727f2ed515"

In the case a particular track is part of multiple MediaStream objects
the SSRC will have multiple mslabel values, one for each MediaStream
label. As the CNAME is required to be included when using the a=ssrc
attribute the receiving end-point will also a priori know how these
tracks relate to synchronization contexts.
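
To make the receiving side concrete, here is a minimal Python sketch
that collects the CNAME and mslabel values per SSRC from a=ssrc lines.
It is an illustration under assumptions, not a full SDP parser: the
function name, the sample CNAME and the second label are hypothetical.
RFC 5576 writes source attributes as attribute:value, while the example
above uses "=", so the sketch accepts either separator.

```python
import re
from collections import defaultdict

# a=ssrc:<ssrc-id> <attribute>:<value>; "=" is also accepted as separator
# because the example in this mail is written that way.
SSRC_LINE = re.compile(r"^a=ssrc:(\d+) ([A-Za-z0-9-]+)[:=](.*)$")


def parse_ssrc_attributes(sdp):
    """Return {ssrc: {"cname": str or None, "mslabels": [str, ...]}}."""
    sources = defaultdict(lambda: {"cname": None, "mslabels": []})
    for line in sdp.splitlines():
        match = SSRC_LINE.match(line.strip())
        if not match:
            continue  # not an a=ssrc line with a valued attribute
        ssrc = int(match.group(1))
        name, value = match.group(2), match.group(3)
        if name == "cname":
            sources[ssrc]["cname"] = value
        elif name == "mslabel":
            # A track in several MediaStreams carries several labels.
            sources[ssrc]["mslabels"].append(value)
    return dict(sources)


sdp_fragment = "\n".join([
    "a=ssrc:345678123 cname:mixer@example.invalid",
    "a=ssrc:345678123 mslabel=61ca6552-968a-435d-88d9-a4727f2ed515",
    "a=ssrc:345678123 mslabel=hypothetical-second-label",
    "a=ssrc:99887766 cname:mixer@example.invalid",
    "a=ssrc:99887766 mslabel=hypothetical-second-label",
])
sources = parse_ssrc_attributes(sdp_fragment)
```

With this fragment both SSRCs share one CNAME, so they belong to the
same synchronization context, while the first SSRC is a member of two
MediaStreams.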

This signaling may be a bit of a problem for legacy applications, where
a signaling gateway commonly will not know the legacy client's SSRC
values before that client actually sends MediaStreams. To support this
use case we suggest that any received SSRC for which no explicit
MediaStream indication has been given is automatically assigned its own
MediaStream with a locally generated label. The receiving web
application can combine multiple MediaStreams into a single MediaStream
if necessary. Since we propose the CNAME as a hidden property of the
underlying track, playback sync will still be possible.
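
The fallback for unsignaled SSRCs could be sketched as follows. This is
only one possible reading of the suggestion; the function name, the two
dictionaries and the use of a random UUID as the locally generated
label are assumptions made for the example.

```python
import uuid


def stream_labels_for(ssrc, signaled, auto_assigned):
    """Return the MediaStream labels for a received SSRC.

    `signaled` maps SSRC -> labels learned from signaling.
    `auto_assigned` caches labels generated locally for SSRCs that showed
    up without any explicit MediaStream indication.
    """
    if ssrc in signaled:
        return signaled[ssrc]
    if ssrc not in auto_assigned:
        # Unknown SSRC: give it its own MediaStream with a fresh local label.
        auto_assigned[ssrc] = [str(uuid.uuid4())]
    return auto_assigned[ssrc]


signaled = {345678123: ["61ca6552-968a-435d-88d9-a4727f2ed515"]}
auto_assigned = {}

known = stream_labels_for(345678123, signaled, auto_assigned)
first = stream_labels_for(111222333, signaled, auto_assigned)
again = stream_labels_for(111222333, signaled, auto_assigned)
```

The cache keeps the generated label stable, so the same legacy SSRC
maps to the same MediaStream each time it is looked up.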

Cheers

Magnus Westerlund

----------------------------------------------------------------------
Multimedia Technologies, Ericsson Research EAB/TVM
----------------------------------------------------------------------
Ericsson AB                | Phone  +46 10 7148287
Färögatan 6                | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: magnus.westerlund@ericsson.com
----------------------------------------------------------------------