CLUE                                                          A. Romanow
Internet-Draft                                                 R. Hansen
Intended status: Standards Track                           Cisco Systems
Expires: December 2, 2012                                   A. Pepperell
                                                             Silverflare
                                                              B. Baldino
                                                           Cisco Systems
                                                            May 31, 2012


   The Need for an Audio Rendering Tag Mechanism in the CLUE Framework
                draft-romanow-clue-audio-rendering-tag-00

Abstract

   The purpose of this draft is to stimulate discussion in the CLUE
   working group.

   It proposes adding an audio rendering tag to the CLUE framework
   (draft-ietf-clue-framework), which makes it possible for the
   consumer to correctly render audio with respect to video in a
   multistream video conference.  The proposed solution is a partial
   response to CLUE Task #10, "Does the framework provide sufficient
   information for the receiver?"

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 2, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Motivation: the issue
   2.  Terminology
   3.  Audio Rendering Tag Mechanism
   4.  Use of the RTP header extension
   5.  Use case note
   6.  Security Considerations
   7.  Acknowledgements
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Motivation: the issue

   A goal for CLUE audio is that listeners perceive the direction of a
   sound source to be the same as that of the visual image of the
   source; this is referred to as directional audio.  In some
   situations the existing CLUE mechanisms are adequate: when the
   provider advertisement includes spatial information (point of origin
   and capture area) giving a static relationship between video and
   associated audio captures, the consumer can use that information to
   place the audio correctly.

   However, in some circumstances the audio and/or video spatial
   information is not sent in the provider advertisement.  Consider,
   for instance, a three-screen system advertising three video captures
   and one switched audio capture, where the audio is switched from the
   loudest of three microphones.
   In this case, how will the consumer know how to associate the audio
   with the correct video so that it can be played out in the correct
   location?

   Here we suggest a simple mechanism: audio rendering tagging.

   When audio and video cannot be matched through spatial information
   in the provider advertisement, we would like the ability to play out
   audio on multiple loudspeakers, matching the position of the speaker
   in the original scene.  Audio may also be assigned to a loudspeaker
   in real time, and may need to be mixed locally and played out on any
   loudspeaker.  For example, if the consumer wants to hear the top
   three speakers regardless of where they are located remotely, and
   all three happen to be on the left, then the three audio streams
   need to be mixed, perhaps locally, and played out on the left.

   Note: Several typical scenarios are described in the use case note
   near the end of this document (Section 5).

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119] and indicate requirement levels for compliant
   implementations.

3.  Audio Rendering Tag Mechanism

   We propose an audio tagging mechanism in order to cope with a
   changing mapping between the most significant audio and video
   participants (i.e., normal MCU operation in the presence of more
   participants' media streams than can be rendered simultaneously) and
   to get audio played out correctly on multiple loudspeakers.  A
   consumer optionally tells the provider an audio tag value for each
   of its chosen video captures, which enables received audio to be
   associated with the correct video stream even when the set of
   audible participants changes.
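   As a non-normative sketch of this consumer-side association (the
   class, method, and position names below are illustrative only and
   not part of the CLUE framework):

```python
# Non-normative sketch: the consumer assigns a tag to each requested
# video capture and remembers the local playout position it will use
# for audio carrying that tag.

class AudioTagMap:
    def __init__(self):
        self._positions = {}  # audio tag -> local playout position

    def request_capture(self, video_capture, tag, position):
        # The consumer configure message carries "capture, tag" pairs,
        # e.g. VC1 with ATag1; the position is purely local state.
        self._positions[tag] = position

    def positions_for(self, tags):
        # 'tags' is the (possibly empty) list of audio tags found in a
        # received audio packet's RTP header extension.
        if not tags:
            return None  # untagged audio: no video association
        # Streams sharing a tag (e.g. AC1, AC2, AC3 all tagged ATag1)
        # are mixed and played out at the same position.
        return [self._positions[t] for t in tags if t in self._positions]


tag_map = AudioTagMap()
tag_map.request_capture("VC1", 1, "left")
tag_map.request_capture("VC2", 2, "center")
tag_map.request_capture("VC3", 3, "right")

assert tag_map.positions_for([1]) == ["left"]            # play on left
assert tag_map.positions_for([1, 2]) == ["left", "center"]
assert tag_map.positions_for([]) is None                 # audio-only
```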
   This information is included with the consumer request, so no
   additional CLUE message exchanges are needed (specifically, no
   additional provider capture advertisements or consumer requests).

   The audio tags are defined in the consumer request, as opposed to in
   a capture advertised by the provider.  The reason for this is that
   it is valid for a consumer to request a capture multiple times (with
   different encodings, for example), and hence a method is required
   for differentiating between these streams.

   When the consumer configures the provider, saying which captures it
   wants, it also optionally includes an audio tag with each capture
   request; for example, VC1, ATag1; VC2, ATag2.  When the provider
   sends audio packets to the consumer, it includes the appropriate
   audio tag in an RTP header extension.  For example, if the provider
   is sending audio packets that are associated with VC1, it tags the
   packets with ATag1.  The consumer can then play out the audio in a
   position appropriate for video from VC1.

   Suppose that several audio streams need to be played out through the
   same loudspeaker -- for example, three audio streams (AC1, AC2, AC3)
   need to be played out at the loudspeaker associated with VC1.  The
   provider would send:

      AC1  ATag1
      AC2  ATag1
      AC3  ATag1

   AC1, AC2, and AC3 are all played out on the same loudspeaker, the
   audio output associated with VC1.  This takes care of the issue of
   dynamic audio output: assigning the right loudspeaker to audio
   streams.

   Figure 1 illustrates an example showing 3 screens, each with a main
   video and 3 PIPs.  Below each screen is a list of the video captures
   (VCs) with the associated audio tag.
   ----------------------- 3 Screens -----------------------

   +------------------+-------------------+------------------+
   |                  |                   |                  |
   |       VC1        |        VC2        |       VC3        |
   |                  |                   |                  |
   |                  |                   |                  |
   |  +---+---+---+   |  +---+---+---+    | +----+----+----+ |
   |  |VC4|VC5|VC6|   |  |VC7|VC8|VC9|    | |VC10|VC11|VC12| |
   +------------------+-------------------+------------------+

    VC1                 VC2                 VC3
    VC4  Audio Tag 1    VC7  Audio Tag 2    VC10  Audio Tag 3
    VC5                 VC8                 VC11
    VC6                 VC9                 VC12

          Figure 1: Audio rendering tags for 3 screen example

   The provider may choose not to include the extension header in an
   audio packet, signaling that there is no association between the
   current audio and current video (e.g., an audio-only participant).
   It may also include more than one audio tag in the extension header,
   signaling that this audio is associated with multiple current video
   participants, perhaps because a capture is being received multiple
   times at different resolutions, or because two video captures both
   include the current speaker.

   This mechanism also allows multiple audio streams to be associated
   with a single video stream (e.g., for a composed video stream); this
   simply requires the appropriate audio packets to be tagged with the
   same tag.

4.  Use of the RTP header extension

   We propose that audio tags are integers between 0 and 255,
   optionally set by the consumer per requested capture.  This allows
   up to 16 tags to be included in a one-byte RTP header extension
   [RFC5285].  An example header extension for an audio packet with one
   tag follows; the audio tag extension is ID1.
   The example includes another header extension (ID0) to show how the
   proposal would interact with [I-D.lennox-clue-rtp-usage]:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      0xBE     |      0xDE     |           length=1            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID0  |  L=0  |     data      |  ID1  |  L=0  |      Tag      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      RTP extension headers for audio rendering tag and capture ID

   The absence of the RTP header extension in a packet means that the
   audio packet is not associated with any of the requested video
   streams that included audio tags.

5.  Use case note

   o  An endpoint can receive multiple video and audio streams and
      render complex layouts locally.
   o  It may have a wide display area, so directional audio is
      important.
   o  It may have one loudspeaker per display, or perhaps some entirely
      different multi-loudspeaker setup known only to the endpoint
      itself.
   o  The endpoint may therefore have the capability of playing back
      audio from a wide range of positions, either from a few fixed
      zones or with fine granularity, and either by routing a sound
      source to a single loudspeaker, by panning between pairs of
      loudspeakers, or by some other advanced distribution scheme
      involving several or even all loudspeakers.

6.  Security Considerations

   TBD

7.  Acknowledgements

   Thanks to Johan Nielsen for discussions and for adding the use case
   note.

8.  IANA Considerations

   TBD

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

9.2.  Informative References

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B.
              Baldino, "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-05 (work in progress),
              May 2012.

   [I-D.lennox-clue-rtp-usage]
              Lennox, J., Witty, P., and A. Romanow, "Real-Time
              Transport Protocol (RTP) Usage for Telepresence
              Sessions", draft-lennox-clue-rtp-usage-03 (work in
              progress), March 2012.

Authors' Addresses

   Allyn Romanow
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: allyn@cisco.com


   Robert Hansen
   Cisco Systems
   Langley
   UK

   Email: rohanse2@cisco.com


   Andy Pepperell
   Silverflare

   Email: andy.pepperell@silverflare.com


   Brian Baldino
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: bbaldino@cisco.com
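   As a non-normative companion to Section 4, the following sketch
   shows how a one-byte header extension carrying an audio tag element
   could be built and parsed per the general mechanism of [RFC5285].
   The helper names and the ID values used here are illustrative only;
   a real implementation would follow [RFC5285] and any CLUE RTP usage
   specification exactly.

```python
import struct

def build_extension(elements):
    """elements: list of (id, data_bytes) pairs; returns the full
    one-byte header extension block (0xBEDE word + elements + pad)."""
    body = b""
    for ext_id, data in elements:
        # One-byte form: 4-bit ID (1-14), 4-bit length-minus-one.
        assert 1 <= ext_id <= 14 and 1 <= len(data) <= 16
        body += bytes([(ext_id << 4) | (len(data) - 1)]) + data
    while len(body) % 4:          # pad to a 32-bit boundary
        body += b"\x00"
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body

def parse_extension(block):
    """Returns {id: data_bytes} parsed from a one-byte extension."""
    magic, length = struct.unpack("!HH", block[:4])
    assert magic == 0xBEDE
    body, out, i = block[4:4 + 4 * length], {}, 0
    while i < len(body):
        if body[i] == 0:          # padding byte
            i += 1
            continue
        ext_id, n = body[i] >> 4, (body[i] & 0x0F) + 1
        out[ext_id] = body[i + 1:i + 1 + n]
        i += 1 + n
    return out

# One element for capture-ID-style data (ID 1) and one carrying a
# single-byte audio tag (ID 2), mirroring the Section 4 figure's
# length=1 layout (i.e., one 32-bit word of element data).
block = build_extension([(1, b"\x05"), (2, b"\x01")])
assert len(block) == 8
assert parse_extension(block)[2] == b"\x01"   # audio tag value 1
```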