idnits 2.17.1 

draft-mostafa-mmusic-sip-cp-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 1 character in excess of 72.

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses.  Maybe
     there should be IPv6 examples, too?


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 23, 2011) is 4507 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                    A. Mostafa, Ed.
3	Internet-Draft                                                     Avaya
4	Intended status: Standards Track                       December 23, 2011
5	Expires: June 25, 2012

7	 A Mechanism for Negotiating Multi-Stream Continuous Presence Video in
8	                                  SIP
9	                     draft-mostafa-mmusic-sip-cp-00

11	Abstract

13	   The NextGen video conferencing clients require multiple concurrent
14	   video streams to provide a User eXperience (UX) in which multiple
15	   participants can be viewed at the same time; this user experience is
16	   called Continuous Presence (CP) video.  The multi-stream CP video
17	   provides more client control of the UX and less processing on the
18	   conference server since the video streams are relayed by the server
19	   rather than mixed to compose a CP video stream.  The client CP
20	   layout, processing power and bandwidth limitations require a per
21	   stream bandwidth and resolution to be negotiated in the SIP Offer/
22	   Answer with the conference server.  Standard methods are used to
23	   achieve this negotiation in addition to a new SDP parameter.  This
24	   document explains the methodology and solution to achieve this in SIP
25	   and SDP.

27	Status of this Memo

29	   This Internet-Draft is submitted in full conformance with the
30	   provisions of BCP 78 and BCP 79.

32	   Internet-Drafts are working documents of the Internet Engineering
33	   Task Force (IETF).  Note that other groups may also distribute
34	   working documents as Internet-Drafts.  The list of current Internet-
35	   Drafts is at http://datatracker.ietf.org/drafts/current/.

37	   Internet-Drafts are draft documents valid for a maximum of six months
38	   and may be updated, replaced, or obsoleted by other documents at any
39	   time.  It is inappropriate to use Internet-Drafts as reference
40	   material or to cite them other than as "work in progress."

42	   This Internet-Draft will expire on June 25, 2012.

44	Copyright Notice

46	   Copyright (c) 2011 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents
51	   (http://trustee.ietf.org/license-info) in effect on the date of
52	   publication of this document.  Please review these documents
53	   carefully, as they describe your rights and restrictions with respect
54	   to this document.  Code Components extracted from this document must
55	   include Simplified BSD License text as described in Section 4.e of
56	   the Trust Legal Provisions and are provided without warranty as
57	   described in the Simplified BSD License.

59	Table of Contents

61	   1.  Overview
62	   2.  Terminology
63	     2.1.  Key Words
64	     2.2.  Abbreviations
65	     2.3.  Voice Activated Switching
66	     2.4.  Continuous Presence
67	     2.5.  Video Shuffling
68	   3.  Multi-Stream Continuous Presence Video
69	   4.  Multi-Stream Continuous Presence video SIP and SDP
70	       negotiation
71	     4.1.  Basic SIP and SDP negotiation and flows for
72	           multi-stream CP Video
73	       4.1.1.  Client Inbound CP Video
74	       4.1.2.  Client Outbound Video
75	       4.1.3.  Audio
76	     4.2.  Advanced SIP and SDP negotiation and flows for
77	           multi-stream CP Video
78	       4.2.1.  SDP content attribute
79	       4.2.2.  VAS Rank
80	   5.  Active Talker Indication
81	   6.  Security Considerations
82	   7.  IANA Considerations
83	   8.  Acknowledgements
84	   9.  References
85	     9.1.  Informative References
86	     9.2.  Normative References
87	   Author's Address

89	1.  Overview

91	   This document describes the SIP and SDP negotiation required for the
92	   multi-stream CP video using video codecs such as H.264 SVC and AVC
93	   (SVC: Scalable Video Coding, AVC: Advanced Video Coding).  It covers
94	   the CP layout use cases, grouping, shuffling and bandwidth scaling
95	   for the CP streams.

97	2.  Terminology

99	2.1.  Key Words

101	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
102	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
103	   document are to be interpreted as described in BCP 14, RFC
104	   2119 [RFC2119].

106	2.2.  Abbreviations

108	   VAS: Voice Activated Switching
109	   CP: Continuous Presence
110	   UX: User eXperience
111	   BW: Bandwidth
112	   H.264 SVC: H.264 Scalable Video Coding
113	   H.264 AVC: H.264 Advanced Video Coding

115	2.3.  Voice Activated Switching

117	   Voice Activated Switching in video delivers the video of a single user
118	   in a conference to a participant; this user is the current or most
119	   recent active speaker.  For example, Alice, Bob, Carol, Dave and John
120	   are video participants in a conference.  While Alice is talking, John
121	   sees Alice's video; when Bob starts talking, John sees Bob's video.

123	2.4.  Continuous Presence

125	   Continuous Presence in video delivers the video of multiple users in
126	   a conference to a participant.  For example, Alice, Bob, Carol, Dave
127	   and John are video participants in a conference.  John can see a
128	   Continuous Presence video that shows Alice, Bob, Carol and Dave at
129	   the same time on his video client, typically the video of the most
130	   recent active speakers.

132	2.5.  Video Shuffling

134	   Video shuffling is used in Continuous Presence use cases.  For
135	   example, Alice, Bob, Carol, Dave, Mike and John are video participants
136	   in a conference.  John sees a four-window Continuous Presence video
137	   that shows Alice, Bob, Carol and Dave on his client as the most
138	   recent active speakers.  When Mike starts talking, he becomes the most
139	   recent active speaker, and the conference server sends the Mike, Alice,
140	   Bob and Carol streams in place of the previous Alice, Bob, Carol and
141	   Dave streams.  This results in shuffling of participants in the
142	   four-window CP view on the client.

144	3.  Multi-Stream Continuous Presence Video

146	   The Multi-Stream Continuous Presence video delivers multi-stream
147	   video (e.g.  H.264 SVC or AVC) to the client from a conference server
148	   for the client to decode and render to the user.  Continuous Presence
149	   video displays multiple participants' windows on the client's
150	   display, usually for the most recent active speakers.  The multi-
151	   stream video streams are negotiated using (n) video m lines in the
152	   SDP where n > 1.  For example, n=4 when the CP video contains 4
153	   participants/streams.  A single video m line (n=1) means no CP and
154	   typically displays the most recent active speaker.  Current video SDP
155	   negotiation covers only the codecs used (e.g.  H.264 SVC and AVC),
156	   bit rate, number of layers used (in SVC per [RFC6190]) and direction
157	   (recvonly, sendrecv, sendonly) but doesn't address the various aspects
158	   of the BW optimization, the shuffling mechanism, grouping and layout
159	   of the CP windows.

161	4.  Multi-Stream Continuous Presence video SIP and SDP negotiation

163	   This section describes the SIP and SDP negotiation required for the
164	   multi-stream CP video, some use cases, flows and examples.

166	        Audio/Video +------------+ Multistream CP video  +----------+
167	Alice  ------------>|            |---------Alice-------->|          |
168	                    |            |---------Bob---------->|          |
169	Bob    ------------>|            |---------Dave--------->|          |
170	                    |            |---------Mike--------->|          |
171	Carol  ------------>|            |                       |          |
172	                    | Conference |                       |  Client  |
173	Dave   ------------>|   Server   |------Mixed Audio----->|          |
174	                    |            |<--------Audio---------|          |
175	John   ------------>|            |                       |          |
176	                    |            |                       |          |
177	Mike   ------------>|            |<--------Video---------|          |
178	                    +------------+                       +----------+
179	                    Figure 1 - Multiple Video Streams Continuous Presence

181	4.1.  Basic SIP and SDP negotiation and flows for multi-stream CP Video

183	   Multi-stream CP basic negotiation is initiated or escalated by
184	   clients: a client negotiates multiple video m lines to receive the
185	   CP video.  This can be done in the initial offer from the client to
186	   the conference server or in a re-INVITE.

188	4.1.1.  Client Inbound CP Video

190	   The conference server MAY accept all video m lines, some, one or none
191	   (audio only call) depending on conference server capabilities and
192	   policies.  The conference server should set the port to 0 in the
193	   answer for the m lines that it would like to reject, per [RFC3264].
194	   The conference server can re-INVITE to escalate/de-escalate the number
195	   of video streams (m lines with a non-zero port) as participants
196	   join/leave.  The server SHOULD NOT add video m lines in the answer
197	   beyond the ones originally offered by the client.
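
   As an illustration only, the accept/reject rule above can be sketched
   in Python.  The sketch reads "m=0" as a zero port in the m line, which
   is how [RFC3264] marks a rejected stream.  The function name, the
   max_streams policy knob, the choice to accept the first offered lines
   and the local port numbering are all assumptions made for this
   example; the draft does not define them.

```python
def answer_video_ports(offered_ports, max_streams, first_local_port=50000):
    """Return one answer port per offered video m line.

    Accepted m lines get a local RTP port; rejected ones get port 0.
    The answer keeps the same number and order of m lines as the offer,
    so the server never adds m lines beyond those the client offered.
    """
    answer = []
    accepted = 0
    for port in offered_ports:
        if port != 0 and accepted < max_streams:
            # Hypothetical allocation: even ports, two apart.
            answer.append(first_local_port + 2 * accepted)
            accepted += 1
        else:
            # Port 0 rejects (or keeps disabled) this m line.
            answer.append(0)
    return answer


print(answer_video_ports([30000, 40000, 40002, 40004], max_streams=2))
# [50000, 50002, 0, 0]
```

   A later re-INVITE could raise or lower max_streams as participants
   join or leave, which changes only how many m lines carry a non-zero
   port.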

199	4.1.2.  Client Outbound Video

201	   The conference server SHOULD NOT use more than one video m line in
202	   the outdial-to-client use cases; this achieves better backward
203	   compatibility with older video clients that don't support multi-stream
204	   video.  Only the client can escalate the number of video m lines it
205	   can receive using a re-INVITE.  A separate m line for outbound video
206	   MAY be negotiated; the outbound video MAY also be negotiated in one
207	   of the CP inbound m lines (sendrecv).

209	4.1.3.  Audio

211	   A single audio stream is negotiated by a separate audio m line; the
212	   inbound audio to the client is mixed by the conference server.

214	4.2.  Advanced SIP and SDP negotiation and flows for multi-stream CP
215	      Video

217	   A new SDP attribute is discussed in this section.  This attribute
218	   communicates the client preferences for the CP streams.

220	4.2.1.  SDP content attribute

222	   A new content attribute is negotiated in each m line by the client.
223	   This attribute is sent by the client in the video m lines negotiated
224	   in the SDP offer/answer for CP video, which follows [RFC3261] and
225	   [RFC3264].

227	   a=content: window-id, group number, bw reduction limit, VAS Rank
228	   window-id = 1 digit; window1, window2, window3, ..

230	   group number = 1-2 digits ;range 1-99, lower number = higher priority

232	   bandwidth reduction limit = 1-3 digits ; range 0-100;
233	   0 = no reduction allowed, 100 = full reduction is allowed.

235	   VAS Rank = 1 digit ; range 0-9

237	   The new content attribute is negotiated by the client to communicate
238	   the client CP streams grouping, BW optimization and video shuffling
239	   mechanism.  There is no answer for this attribute in the response
240	   from the server; the answer is reflected in the response m lines and
241	   the shuffling of the video RTP.  Conference servers that don't
242	   support this attribute will ignore it and will process the offered
243	   video m lines according to their own algorithms/preferences.  The
244	   group number specifies the group that the stream belongs to.  All
245	   streams (UI windows) in the same group have the same resolution/size.
246	   A group with a lower number has higher priority than a group with a
247	   higher number.  The CP streams/windows are grouped within a layout;
248	   grouping allows the conference server to scale down all windows in
249	   the same group for BW optimization and to deliver a uniform user
250	   experience across those windows.  The conference server should scale
251	   down the highest group number first before scaling down the next
252	   group, e.g. group2 first and then group1.  The bandwidth reduction
253	   limit sets the maximum percentage of the original bandwidth that the
254	   conference server can remove to satisfy the bandwidth constraints.
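
   The scale-down order and the reduction limit can be illustrated with
   a short Python sketch.  This is one possible reading, not an algorithm
   defined by this document: it assumes the server applies one common
   percentage to every window of a group (to keep the group uniform),
   caps that percentage at the tightest limit in the group, and uses the
   per-stream bit rate as the "original bandwidth".  The names Stream and
   scale_down are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    window_id: str
    group: int         # 1-99, lower number = higher priority
    bw_limit_pct: int  # 0-100, maximum allowed reduction in percent
    bitrate: int       # offered bit rate in bits per second


def scale_down(streams, budget):
    """Fit the streams into `budget` bps, reducing the highest group first.

    Returns a dict window_id -> bit rate.  If the budget cannot be met
    within the limits, every group ends up at its maximum reduction.
    """
    result = {s.window_id: s.bitrate for s in streams}
    excess = sum(result.values()) - budget
    for group in sorted({s.group for s in streams}, reverse=True):
        if excess <= 0:
            break
        members = [s for s in streams if s.group == group]
        # Assumption: one percentage per group keeps its windows uniform.
        max_pct = min(s.bw_limit_pct for s in members)
        group_total = sum(s.bitrate for s in members)
        if group_total == 0:
            continue
        needed_pct = -(-excess * 100 // group_total)  # ceiling division
        pct = min(max_pct, needed_pct)
        for s in members:
            result[s.window_id] = s.bitrate * (100 - pct) // 100
        excess = sum(result.values()) - budget
    return result


streams = [
    Stream("window1", group=1, bw_limit_pct=25, bitrate=512000),
    Stream("window2", group=2, bw_limit_pct=50, bitrate=300000),
]
print(scale_down(streams, budget=700000))
# {'window1': 512000, 'window2': 186000}
print(scale_down(streams, budget=600000))
# {'window1': 445440, 'window2': 150000}
```

   In the first call only group 2 is reduced (by 38 percent).  In the
   second call group 2 reaches its 50 percent limit, so the remaining
   excess is taken from group 1, which stays within its 25 percent
   limit.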

256	   Client Offer SDP example.  For simplicity, audio and sprop-operation-
257	   point-info details are not shown:

259	   v=0
260	   o=svcsrv 289083124 289083124 IN IP4 192.0.2.2
261	   s=conference
262	   t=0 0
263	   b=TIAS:812000

265	   m=video 30000 RTP/AVP 98 97 96
266	   c=IN IP4 192.0.2.2
267	   a=content:window1,1,25,1
268	   b=TIAS:512000
269	   a=rtpmap:96 H264/90000
270	   a=fmtp:96 profile-level-id=42401e
271	   a=rtpmap:97 H264-SVC/90000
272	   a=fmtp:97 profile-level-id=530016; sprop-operation-point-
273	   info..(VGA/30)
274	   a=rtpmap:98 H264-SVC/90000
275	   a=fmtp:98 profile-level-id=53001e; sprop-operation-point-
276	   info..(720p/30)
277	   a=sendrecv

279	   m=video 40000 RTP/AVP 101 100 99
280	   c=IN IP4 192.0.2.2
281	   a=content:window2,2,50,1
282	   b=TIAS:300000
283	   a=rtpmap:99 H264/90000
284	   a=fmtp:99 profile-level-id=42401e
285	   a=rtpmap:100 H264-SVC/90000
286	   a=fmtp:100 profile-level-id=530013; sprop-operation-point-
287	   info..(VGA/30)
288	   a=rtpmap:101 H264-SVC/90000
289	   a=fmtp:101 profile-level-id=530016; sprop-operation-point-
290	   info..(360/30)
291	   a=recvonly
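
   A receiver of such an offer needs to extract the four a=content
   fields from each video m line.  The following Python sketch shows one
   way to do this under the syntax given above.  The names ContentAttr,
   parse_content and video_windows are hypothetical, and the tolerance
   for spaces after the commas is an assumption based on the examples in
   this document.

```python
import re
from typing import NamedTuple


class ContentAttr(NamedTuple):
    window_id: str
    group: int
    bw_limit_pct: int
    vas_rank: int


_CONTENT_RE = re.compile(
    r"^a=content:\s*(window\d)\s*,\s*(\d{1,2})\s*,\s*(\d{1,3})\s*"
    r"(?:,\s*(\d))?\s*$"
)


def parse_content(line):
    """Parse one a=content line; return None if it does not match.

    Other a=content values (for example the tokens of RFC 4796) are
    not recognized here and also yield None.
    """
    m = _CONTENT_RE.match(line.strip())
    if m is None:
        return None
    window_id, group, limit, rank = m.groups()
    group, limit = int(group), int(limit)
    if not (1 <= group <= 99 and 0 <= limit <= 100):
        return None
    # A missing VAS rank defaults to 1.
    return ContentAttr(window_id, group, limit, 1 if rank is None else int(rank))


def video_windows(sdp):
    """Collect the a=content attributes found under m=video sections."""
    windows = []
    in_video = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            in_video = line.startswith("m=video ")
        elif in_video and line.startswith("a=content:"):
            attr = parse_content(line)
            if attr is not None:
                windows.append(attr)
    return windows


offer = """v=0
m=video 30000 RTP/AVP 98 97 96
a=content:window1,1,25,1
m=video 40000 RTP/AVP 101 100 99
a=content:window2,2,50,1
"""
for w in video_windows(offer):
    print(w.window_id, w.group, w.bw_limit_pct, w.vas_rank)
# window1 1 25 1
# window2 2 50 1
```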

293	4.2.2.  VAS Rank

295	   Rules: If the client wants the video stream/window from the conference
296	   server to be switched by active speaker activity, then it has to
297	   assign a VAS rank to the window.  The conference server will assign
298	   the window based on the active speaker history and rank.  Rank 1 gets
299	   the most recent speaker, rank 2 the next most recent, etc.  Multiple
300	   windows can share a rank, which minimizes the shuffling that takes
301	   place when the speakers switch in and out.  If not specified, the
302	   default value is 1 for all windows (minimum shuffling).

304	   Example (1) of shuffling in a 2x2 or 1x4 layout (4 equal sized
305	   windows):
306	   a=content: window1,1,100, 1 (the most recent speaker)
307	   a=content: window2,1,100, 2 (2nd most recent)
308	   a=content: window3,1,100, 3 (3rd most recent)
309	   a=content: window4,1,100, 4 (4th most recent)

311	   In this example the client's offer to the conference server has 4
312	   video SDP m lines.  The group number (second a=content parameter) is
313	   the same for the four m lines, indicating the same priority and 4
314	   equal sized windows.  The a=content for each has a different VAS rank
315	   value (last parameter in the examples above).  This means that the
316	   client is requesting the conference server to always send the most
317	   recent active speaker on the first video stream (negotiated in this
318	   example by the first video m line), the second most recent active
319	   speaker on the second video stream, the third on the third video
320	   stream and the fourth on the fourth video stream.

322	   Example (2) of shuffling in a 2x2 or 1x4 layout (4 equal sized
323	   windows):

325	   a=content: window1,1,100, 1
326	   a=content: window2,1,100, 1
327	   a=content: window3,1,100, 1
328	   a=content: window4,1,100, 1
329	   All four windows will get switched with active speaker streams.  The
330	   order will be determined by conference server to minimize shuffling.

332	   In this example the client's offer to the conference server has 4
333	   video SDP m lines, each with the same VAS rank value (last parameter
334	   in the examples above).  This means that the client is requesting
335	   the conference server to always minimize shuffling of speakers on
336	   video streams sent to the client, i.e. if the most recent active
337	   speaker changes, send his/her video on the fourth stream, replacing
338	   the least recent active speaker, and leave the other three unchanged.

340	   Example (3) of shuffling in a 1+3 layout (1 big + 3 small windows):
341	   a=content: window1,1,100, 1 (the most recent speaker)
342	   a=content: window2,2,100, 2 (2nd most recent)
343	   a=content: window3,2,100, 3 (3rd most recent)
344	   a=content: window4,2,100, 4 (4th most recent)

346	   In this example the client's offer to the conference server has 4
347	   video SDP m lines.  The group number (second a=content parameter)
348	   indicates two priorities and a 1+3 layout.  The a=content for each
349	   has a different VAS rank value (last parameter in the examples
350	   above).  This means that the client is requesting the conference
351	   server to always send the most recent active speaker on the first
352	   video stream (negotiated in this example by the first video m line),
353	   the second most recent on the second video stream, the third on the
354	   third video stream and the fourth on the fourth video stream.

356	   Example (4) of shuffling in a 1+3 layout (1 big + 3 small windows):
357	   a=content: window1,1,100, 1
358	   a=content: window2,2,100, 2
359	   a=content: window3,2,100, 2
360	   a=content: window4,2,100, 2

362	   The big window will always get the most recent active speaker.  The 3
363	   small windows will get the next 3 most recent active speakers.  The
364	   order for these three small windows will be determined by the server
365	   to minimize shuffling.

367	   Example (5) of shuffling in a 1+3 layout with pinned video (1 big + 3
368	   small windows):
369	   a=content: window1,1,100, 1 (the most recent speaker)
370	   a=content: window2,2,100, 2 (2nd or 3rd most recent)
371	   a=content: window3,2,100, 2 (2nd or 3rd most recent)
372	   a=content: window4,2,100, 0 (pinned / not switched based on speaker
373	   activity)

375	   In this example the fourth m line has a VAS rank of 0, which means
376	   this video stream will not be switched and is pinned to a certain
377	   user regardless of his/her voice activity.
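
   The five examples can be reproduced with a small Python sketch of a
   server-side assignment step.  It is an interpretation under stated
   assumptions rather than a normative algorithm: windows with rank 0
   keep their current participant, a pinned participant is not shown a
   second time in a switched window, and within one rank a speaker who
   already occupies a window of that rank stays where he or she is.  The
   function name and data shapes are hypothetical.

```python
def assign_windows(windows, recent_speakers, current):
    """Map each window to a participant.

    windows:         list of (window_id, vas_rank) in m line order
    recent_speakers: participants, most recent active speaker first
    current:         dict window_id -> participant shown right now
    """
    result = {}
    pinned = set()
    for wid, rank in windows:
        if rank == 0:
            # Rank 0: pinned, never switched by speaker activity.
            result[wid] = current.get(wid)
            pinned.add(current.get(wid))
    queue = [s for s in recent_speakers if s not in pinned]
    for rank in sorted({r for _, r in windows if r != 0}):
        group = [wid for wid, r in windows if r == rank]
        wanted = queue[:len(group)]
        queue = queue[len(group):]
        free = []
        for wid in group:
            if current.get(wid) in wanted:
                # Already visible in this rank: leave the stream alone.
                result[wid] = current[wid]
                wanted.remove(current[wid])
            else:
                free.append(wid)
        for wid, speaker in zip(free, wanted):
            result[wid] = speaker
        for wid in free[len(wanted):]:
            result[wid] = None  # fewer speakers than windows
    return result


current = {"window1": "Alice", "window2": "Bob",
           "window3": "Carol", "window4": "Dave"}
speakers = ["Mike", "Alice", "Bob", "Carol", "Dave"]  # Mike just spoke

# Example (2): all windows share rank 1, so only one stream changes.
same_rank = [("window1", 1), ("window2", 1), ("window3", 1), ("window4", 1)]
print(assign_windows(same_rank, speakers, current))
# {'window1': 'Alice', 'window2': 'Bob', 'window3': 'Carol', 'window4': 'Mike'}

# Example (1): distinct ranks, so every stream shifts by one position.
by_rank = [("window1", 1), ("window2", 2), ("window3", 3), ("window4", 4)]
print(assign_windows(by_rank, speakers, current))
# {'window1': 'Mike', 'window2': 'Alice', 'window3': 'Bob', 'window4': 'Carol'}
```

   The trade-off is visible in the output: distinct ranks give a
   predictable position for each speaker at the cost of moving all four
   streams, while a shared rank changes a single stream.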

379	5.  Active Talker Indication

381	   The audio and video active talker indications use the RTP CSRC in the
382	   audio and video RTP [RFC3550].  The SSRCs in the RTP CSRC list are
383	   mapped to userid/user name using the RFC4575 notifications.  Only one
384	   SSRC is sent in the video RTP CSRC list; the client can use this to
385	   display the user name on each CP video window.
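
   On the client side, reading the CSRC list is a matter of parsing the
   fixed RTP header of [RFC3550]: the low four bits of the first octet
   give the CSRC count, and the identifiers follow the 12-octet fixed
   header.  The Python sketch below shows this; the roster dictionary
   stands in for whatever SSRC-to-name mapping the client has built from
   the conference notifications, and its shape is an assumption.

```python
import struct


def rtp_csrc_list(packet):
    """Return the CSRC identifiers carried in an RTP packet header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = first & 0x0F          # CSRC count, 0-15
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    return list(struct.unpack("!%dI" % cc, packet[12:end]))


def window_label(packet, roster):
    """Name to display on a CP window, or None if it is unknown."""
    csrcs = rtp_csrc_list(packet)
    # Only one identifier is expected in the video CSRC list.
    return roster.get(csrcs[0]) if csrcs else None


# Hypothetical packet: V=2, CC=1, PT=96, seq=1, ts=0, SSRC of the server,
# followed by one CSRC identifying the contributing participant.
header = bytes([0x80 | 1, 96]) + struct.pack("!HII", 1, 0, 0x11111111)
packet = header + struct.pack("!I", 0xDEADBEEF) + b"payload"
roster = {0xDEADBEEF: "Alice"}   # assumed mapping learned out of band

print(window_label(packet, roster))
# Alice
```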

387	6.  Security Considerations

389	   The multi-stream CP video uses the TLS and SRTP standards for SIP
390	   signaling and media security.

392	7.  IANA Considerations

394	   This document has no actions for IANA.

396	8.  Acknowledgements

398	   Thanks to Alan Johnston, Dan Romascanu, Peter Musgrave and Rifaat
399	   Shekh-Yusef for their review of the document and comments.

401	9.  References

403	9.1.  Informative References

405	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
406	              Jacobson, "RTP: A Transport Protocol for Real-Time
407	              Applications", STD 64, RFC 3550, July 2003.

409	   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
410	              "RTP Payload Format for Scalable Video Coding", RFC 6190,
411	              May 2011.

413	9.2.  Normative References

415	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
416	              Requirement Levels", BCP 14, RFC 2119, March 1997.

418	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
419	              A., Peterson, J., Sparks, R., Handley, M., and E.
420	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
421	              June 2002.

423	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
424	              with Session Description Protocol (SDP)", RFC 3264,
425	              June 2002.

427	Author's Address

429	   Adel Mostafa (editor)
430	   Avaya
431	   Toronto, Ontario
432	   Canada

434	   Email: amostafa@avaya.com