CLUE, 81st IETF, Quebec City, Canada
Location: Quebec City, Canada
Chairs: Mary Barnes, Paul Kyzivat
Note Takers: Magnus Westerlund, Stephen Botzko, Marshall Eubanks
Minutes Editor: Paul Kyzivat
Jabber Scribes: Peter Saint-Andre
Recorded playback:
http://www.ietf.org/meeting/81/remote-participation.html#Meetecho
Agenda bash, Status and items of interest
Presenter: Mary Barnes
Slides: http://www.ietf.org/proceedings/81/slides/clue-2.pdf
Agenda bash: No discussion, OK
Definitions
Presenter: Stephan Wenger
Slides: http://www.ietf.org/proceedings/81/slides/clue-3.pptx
Summary of action items:
• WG/Stephan: start conversation on the list on a new definition for layout (re: Issue #2).
Conclusion:
Document will continue to be updated. We'll decide later which document should include the definitions (e.g., Requirements, Framework, or wherever).
Detailed Discussions:
Stephan?: Can we work with the crude tool that this is?
Roni Even: Good tool to start. Still some definitions that need
ironing out. Eventually this should go into the framework document.
Stephan?: Will do a revision and discuss the open issues.
Christer Holmberg: Just want to make sure we go
through the open terminology issues
Charles Eckel: We don't need this doc, just roll
it into the documents. Hopefully this should be rolled into only one.
James Polk: Are we talking about copying it out into all the
documents?
Mary Barnes: Try to ensure that the terminology only goes into one
document. Not be copied into all of them.
Stephan: Open issues
#1 Left/Right
No one objected to interpreting them in context. Authors have
to ensure that it is understandable when using the terms.
Action: accepted
Stephan: Open Issue
#2 Layout
Layout is render-side only:
Christer Holmberg: Is this the receiver of the media?
MCUs will render a layout in binaural.
Stephan Wenger: Rendering produces sound waves and photons, which MCUs don't do.
Roni Even: There are two aspects: Both within a screen and the
physical relation between devices.
Brian Rosen: Thinks we need two terms, one for physical
arrangement, one for models of ?
Christer: I agree with two terms.
Agreement to create two terms.
Stephan: I will post something to the list.
Mary: Suggests WG (folks in support of two terms) start
conversation on the list on a new definition
Stephan: Open Issue
#3 MCU
Magnus: Suggest that one starts with a central node that
doesn't imply specific media processing
Brian Rosen: ?
Eric Burger: Mixes the media under the control of the focus.
?: Devices that mix media to devices that don…
Magnus: What about relays? Are we ensuring that we don't get
media plane
?: RFC4353 requires media from the mixer be sent to
each participant. Is this an issue?
- Some say yes, others no.
Eric: Very loose on the MCU definition. No reason for
tightening it now.
Chairs: Unless proposed text, keep definition as it is.
Stephan: Open Issue
#4 Media
?: Does definition need to exclude FECC or other
non-rendered RTP streams.
Action: leave as is, but will need more discussion on the
list.
John Elwell: DTMF is also media.
Brian Rosen: Suggest that "timed" in "timed
text" is removed.
Christer: If we talk about SDP, then this becomes a bit more
complete.
Stephan: Should we include camera control?
Christer: No, maybe; we likely need a wider term, like media plane or data plane.
Eric: Not too tight - we need something that allows for smell-vision.
Roni Even: ?
?: Are we excluding MSRP?
Brian Rosen: I want text chatting. What about adding
"typically"
Stephan: Wait for more input. Likely want to keep the
definition with small modifications to allow for other media protocols, like
MSRP.
Summary:
Rendering Negotiation (Christer Holmberg)
Presenter: Christer Holmberg
Slides: http://www.ietf.org/proceedings/81/slides/clue-4.ppt
Conclusion:
Needs more
discussion on the mailing list.
Detailed Discussions:
[This was a summary/report on what was discussed at
the ad hoc meeting on Rendering Type Negotiation that took place Tuesday.]
Keith Drage: Careful with using Wikipedia. What is meant by "signaling" in the definition?
Charles Eckel: What is the difference between rendering and composing?
Christer: Fine if we can use only one word.
Brian Rosen: I would prefer Rendering, as you can render a single
stream, but not compose a single source.
Roni: Composing is not a good term.
Mark Duckworth: I thought Stephan had a good idea with sound waves, while composition and layout models better describe what is happening prior to that.
Christer: Just want to find terms that separate the cases.
John Elwell: Better, how is this related to the framework?
Mary: Whatever we do needs to fit in.
Roni: There can be different composition algorithms.
Mark: I disagree - binaural is a format. The composition
algorithm is how the sources actually are placed within the sound field.
Roni: Disagree with that. A composition algorithm is like
2-by-2 video composition.
Mark: Example 2 - most active speaker slide: This is a
good example of what I mean that there is different levels of concepts.
Roni: Agrees with Mark, there are input selection
algorithms, not composition choices.
Stephen Botzko: We need to agree what the things
really mean before determining if the requirements are agreeable.
Mary: We need more discussion on the mailing list.
Details from the Tuesday ad hoc meeting:
Presenter: Christer Holmberg
Note Taker: Roni Even
Definition:
What is rendering – definition – no comments
Usage – offer and answer.
Use case – binaural audio as example
Stephan: Everyone knows that there are different algorithms for audio rendering; most are clear, but not for video (three screens). A registry is not enough without a definition of the syntax. There is no intuitive understanding of the algorithm. In video the number of options is large.
Jonathan says not a registry but a syntax, like XML.
John: What is "offer"? Is it receive or support?
Christer: Maybe a capability; not offer/answer.
Mark: is this for central rendering?
Christer: the endpoint will do the rendering.
Mark: An MCU will need to negotiate between both sides.
Christer: yes
Steve: need more use cases; how to change mid-call.
Paul: the receiver wants to ask for a separate one.
Christer: does not care whether it is advertising capabilities or the receiver asks for one.
Mark: it is in the framework – to advertise.
Stephan: based on the framework draft we can do all you want, but the requirement here is more complicated. Two modes of audio rendering is an example that can grow into more complicated usages.
John: this is getting more asymmetric than offer/answer. Will need a separate description for each direction.
Christer: will update the presentation.
Mark: The framework handles audio format. Need to look at what is layout and what is rendering, and what audio streams you want in the rendering, so it has more than one dimension.
Christer: need to clarify this is what I want to have.
Mark: binaural is not a layout.
Charles: the framework has this but need more information
about how to do it. Complexity, asymmetry.
Mary: need detailed use cases for requirements.
Christer: will revise the presentation for tomorrow based on the feedback.
Mary: read the framework.
Requirements (Allyn Romanow)
Presenter: Allyn Romanow
Slides: http://www.ietf.org/proceedings/81/slides/clue-0.pptx
Summary of action items:
• Reqmt-3a: consensus to leave as a MUST.
• Reqmt-4/5: merge???
• Reqmt-8: leave for now
• Reqmt-10: leave in
• Reqmt-13: leave in
• Reqmt-13a: delete
• Reqmt-14: defer. Stephan to draft a definition for segment and site switching.
• Resubmit this document as a working group document.
Conclusion:
Document agreed as a WG
document. Document will be updated based on the above action items and
submitted as a WG -00 document.
Detailed Discussions:
Reqmt-3a:
Stephan Wenger: IPR concerns
Eric Burger: ?
Roni: Must be able to do it, must not do it, optional to
use it.
Stephan Wenger: A MUST in a requirement requires at least a MAY in the solution. That way one may avoid IPR from companies if we steer into them. Wants a SHOULD so that we can choose later not to.
Roni: Freedom of choice is good.
Eric Burger: +1
John E: We should state our intention of requirements
Cullen Jennings: Req are
informational and not binding
Stephan B: Should leave it as it is; we really need this for telepresence. If we find issues, deal with them later.
Christian?: Include
multiple mono streams.
Stephan: Some solution include multiple mono streams
Allyn: Rough consensus to leave it as a MUST.
Reqmt-3b:
Still needs to be deferred, as
layout discussion hasn't concluded.
Reqmt-4/5:
?: Merge Reqmt-4/5
Reqmt-7:
Marshall Eubanks: What is meant with actual size.
Stephen Botzko: Advertise what the capture sizes
really are so a renderer can make intelligent choices.
Jonathan Lennox: If we delete Reqmt-7 then Reqmt-8
isn't covered anymore.
Stephan B: Do not really need Reqmt-8.
Roni: I don't want to hear that it isn't needed later, so
please leave it.
Allyn: Consensus to leave Reqmt-8 for now.
Reqmt-10:
Magnus: Want to keep it, as bandwidth on different paths in a centralized-node media plane is going to be a reality.
Wenger: ?
Eckel: Reqmt-10 is too vague. Need it in all conferences.
Burger: A guide on how to build a good telepresence system.
Andrew Allen: What is the scope for the WG: only the work in CLUE, or a complete system?
Mary: Mostly the latter, but not all.
Marshall Eubanks: The goal is to build an interoperable system. Reqmt-10 is interoperable.
Cullen: Agree with that.
John: We need to figure out what is needed, and then we may pick what already exists.
Mary: Hum about the requirement:
Action: hum taken - leave it in. No
opposing view.
Reqmt-13:
Action: leave in.
Reqmt-13a:
Jonathan Lennox: How much control is there …
Decision to delete 13a was agreed.
Jonathan L has a somewhat different
requirement to propose.
Reqmt-14:
Wenger: What is site switching?
Roni: Site switching is selecting all streams from a
particular site, rather than a single camera. This goes back to enabling one to
select the streams.
Wenger: Where is this described? (use cases). The requirement says that you need to support at least one of the methods. I find the requirements unnecessarily complex. Needs to be improved.
Marshall Eubanks: Need to enable segment switching
when a single media stream is changed, even within a sub-part of a composite
screen.
Stephan B: We know we need to reword it. But no reason to do it
before layout discussion is done.
Action: Still deferred, current
requirement is inadequate.
Action: Stephan to draft a
definition for segment and site switching
Reqmt-15:
Magnus: Unclear if the requirement includes protocol support for transferring the indicator downstream from the source node.
Stephen Botzko: If the audio stream is common for a room, one might need to indicate which of 3 camera captures contains the active talker.
Lennox: If someone wants the requirement, it should be reworded
to the actual requirement that is ?
Allyn: Delete the current requirement and invite new
requirements that better cover this.
Adoption of document:
HUM to accept this as a working
group document:
Action: Agreed to be a working
group document, no opposing view.
Framework
Presenter: Mark Duckworth, Andy Pepperell, Brian Baldino
Slides: http://www.ietf.org/proceedings/81/slides/clue-1.pptx
Summary of action items:
• Chairs: Schedule interim meeting to complete discussion of Framework - i.e., for Brian Baldino to do a presentation on examples.
Conclusion:
WG to continue
discussion of the framework on the mailing list.
Detailed Discussions:
First/Second Row discussion:
Microphone and video, may need extension to 2D.
Comments that some messages in
encoding groups were in fact codec dependent.
Allyn:
What we are doing here - telepresence deals with multiple streams, while our standards deal with single streams. Challenges - we want something:
- immediately usable (or at least relatively quickly)
- extensible
- simple and practical to implement
The Framework clusters around 2 concepts:
- media capture information that needs to be passed
- how the provider figures out what streams to send
Process:
- provider provides capabilities
- consumer chooses from these
Optimization - before negotiation, the consumer may send info about itself to the provider, so the provider can tailor what it provides.
Properties:
- media capture
- encode groups
- simultaneous transmission sets
I want to take a minute to set
context - any proposed framework was going to be difficult to communicate. We
thought we should do so in stages and start simple. We would like it if people
would focus on the concepts first.
Mark Duckworth:
Media Capture and Attributes
A media capture is a source of audio or video media.
They can be:
- media from a camera or a microphone (a capture device)
- media from a combination of media devices
- or, this could be done remotely.
A capture set is a way to group media captures that have some relationship.
Some attributes include things like:
- is the video auto-switched or composed?
- is the audio mixed?
- audio channel format (mono/stereo…)
- what is the spatial scale / image width on video
Attributes include a
"purpose" - say, main versus presentation
I want to introduce a capture scene
-
Imagine a given scene with people -
cameras - camera views.
Types include:
- one camera
per screen
- merging
cameras in some fashion
- switched
based on voice with a composed PiP
- etc.
Chris: you don't assume that the whole scene is always shown?
Mark: of course
Roni: What about other models ? (lists some)
Mark: this is just one example.
A capture set is a representation of a group of video captures. It has N "rows." Each row is a set of media captures. Ordering within the rows is important - it's how the left-to-right order is imposed.
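As an illustrative sketch only (the class and field names below are hypothetical, not from the framework draft), the grouping described above - a capture set with ordered rows of media captures - might be modeled like this:

```python
# Hypothetical sketch of a capture set; names are illustrative,
# not taken from the CLUE framework draft.
from dataclasses import dataclass, field

@dataclass
class MediaCapture:
    capture_id: str       # e.g. "VC0" for a video capture
    kind: str             # "audio" or "video"
    attributes: dict = field(default_factory=dict)  # e.g. {"composed": True}

@dataclass
class CaptureSet:
    # Each row is a list of media captures; the ordering within a row
    # is what imposes the left-to-right spatial order.
    rows: list[list[MediaCapture]] = field(default_factory=list)

# A three-camera room: one row of three video captures, left to right.
room = CaptureSet(rows=[[MediaCapture(f"VC{i}", "video") for i in range(3)]])
left_to_right = [c.capture_id for c in room.rows[0]]  # ["VC0", "VC1", "VC2"]
```

The sketch deliberately leaves out regions and pan/zoom coverage, which the discussion below notes are not yet addressed.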
Stephan: when you have pan/zoom cameras, and they are set
differently for different members of capture set, then its hard to understand
what they are. Do different captures identify what part of the scene they
capture?
Mark: that's a non-goal.
Case where cameras cover the same
scene was raised.
Allyn indicated that a
"Regions" concept would be added to the next draft.
Cullen: when we chartered this WG …
Allyn: we were trying to start with something simple
Stephan Botzko: the goal here is to achieve
interoperability. Having some approximate idea of adjacency may be more
powerful
Roni: this talks about a simple architecture
Christian: we still have a concept of left/right?
Stephan Wenger: I am willing to spend work, but I
am not willing to let you off the hook when there are requirements that are
relevant for me.
Mark: Matching audio and video - when they are part of the
same capture set - that includes time synchronization and spatial relationships
Spatial relationships - audio
direction should roughly match video directions
In the audio, we are calling this
audio channel formats - a receiver can map these into its loudspeakers to
approximate the spatial relationship, in a way better than just going to mono,
but not requiring identical audio formats.
Allyn: the point is whether or not the draft deals with
everything we need to capture the framework, not whether it deals with all of
the details.
Andy Pepperell:
Choosing streams
Basic message flow:
media stream consumer (msc) and media stream provider (msp)
(of course, typically each side has both)
msc communicates with msp
msc: consumer capability advertisement
msp: media capture advertisement
Initial message, msc: consumer capability advertisement (AKA "the hint")
- Physical factors
- User preferences
- Software limitations
- etc.
Next (the second message, from msp to msc) is the media capture advertisement from the msp:
- most recently received consumer capability advertisement
- provider fixed parameters, such as the number of cameras
- dynamic factors - active speaker, presentation source status
- simultaneous transmission sets, etc.
Third message (msc to msp):
Stream configure message from the msc
- based on media capture advertisement
- consumer fixed characteristics
- dynamic factors
This is the trigger for actual
media transfer from the provider.
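The three-message exchange above can be sketched in a few lines; this is a minimal illustration only, with hypothetical message contents and field names (the actual encoding was not specified at this meeting):

```python
# Illustrative sketch of the three-message flow:
# 1) consumer capability advertisement ("the hint", msc -> msp)
# 2) media capture advertisement (msp -> msc)
# 3) stream configure (msc -> msp), which triggers media transfer.
# All field names here are hypothetical.

def consumer_capability_advertisement():
    # msc -> msp: physical factors, user preferences, software limits
    return {"max_streams": 2, "screens": 2}

def media_capture_advertisement(hint):
    # msp -> msc: captures on offer, tailored by the most recent hint
    captures = ["VC0", "VC1", "VC2"]
    return {"captures": captures[: hint["max_streams"]]}

def stream_configure(advertisement):
    # msc -> msp: chosen streams; this is the trigger for media transfer
    return {"requested": advertisement["captures"]}

hint = consumer_capability_advertisement()
adv = media_capture_advertisement(hint)
config = stream_configure(adv)   # {"requested": ["VC0", "VC1"]}
```

As Marshall's questions below note, the provider may send a new media capture advertisement mid-session, re-running steps 2 and 3.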
Question - why not use the terms
sender and receiver?
Andy: we thought this was a little different case and that
might confuse people.
Mark: and, this is not the sending and receiving of media
Andy: simultaneous transmission sets
Suppose that the same camera provides a digital zoom of one sub-scene and also the entire scene - that's why you need simultaneous transmission sets.
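One way to read the constraint above (a sketch under the assumption that each set lists captures that can be sent at the same time; the function and names are hypothetical):

```python
# Hypothetical check: a consumer's chosen captures are valid only if
# some advertised simultaneous transmission set contains all of them.
# Here VC2 is a digital zoom from the same camera that produces VC0/VC1,
# so it cannot be sent together with them.
def allowed(chosen, simultaneous_sets):
    return any(set(chosen) <= s for s in simultaneous_sets)

sets = [{"VC0", "VC1"}, {"VC2"}]
print(allowed(["VC0", "VC1"], sets))  # True: both in one set
print(allowed(["VC0", "VC2"], sets))  # False: spans two sets
```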
Encoding groups - part of the media capture set advertisement by the media stream provider. Each capture has an associated:
- encoding group structure - within an encoding group, there is a possibility of multiple encodes or multiple potential encodes
- the usual sort of video encode attributes (advertised by the provider to the consumer) - the usual sorts of parameters: bandwidth, max bandwidth, etc.
Roni: from the consumer side you are talking about screens.
You also have to have some way of linking an encoding group with a specific codec.
Marshall Eubanks: So, if something changes in the
middle of a session, to change things you will have to have the provider send a
new media capture advertisement to the consumer, which will have to then send a
new Stream configure message, to get the changed stream.
Andy: Yes
Marshall: So the consumer will have to be listening to the
provider for MCAs at any time?
Andy: Yes.
Marshall: And, of course, you will need error messages.
Andy: Of course. The msc might get it wrong.
Brian Baldino did not present due
to lack of time.