Network Working Group A. Mostafa, Ed. Internet-Draft Avaya Intended status: Standards Track December 23, 2011 Expires: June 25, 2012 A Mechanism for Negotiating Multi-Stream Continuous Presence Video in SIP draft-mostafa-mmusic-sip-cp-00 Abstract The NextGen video conferencing clients require multiple concurrent video streams to provide a User eXperience (UX) in which multiple participants can be viewed at the same time, this user experience is called Continuous Presence (CP) video. The multi-stream CP video provides more client control of the UX and less processing on the conference server since the video streams are relayed by the server rather than mixed to compose a CP video stream. The client CP layout, processing power and bandwidth limitations require a per stream bandwidth and resolution to be negtiated in the SIP Offer/ Answer with the conference server. Standard methods are used to achieve this negotiation in addition to a new SDP parameter. This document explains the methodology and solution to achieve this in SIP and SDP. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on June 25, 2012. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. Mostafa Expires June 25, 2012 [Page 1] Internet-Draft SIP Multi-Stream CP Video December 2011 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Key Words . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2. Abbreviations . . . . . . . . . . . . . . . . . . . . . . 3 2.3. Voice Activated Switching . . . . . . . . . . . . . . . . 3 2.4. Continuous Presence . . . . . . . . . . . . . . . . . . . 3 2.5. Video Shuffling . . . . . . . . . . . . . . . . . . . . . 3 3. Multi-Stream Continuous Presence Video . . . . . . . . . . . . 4 4. Multi-Stream Continuous Presence video SIP and SDP negotiation . . . . . . . . . . . . . . . . . . . . . . . . . 4 4.1. Basic SIP and SDP negotiation and flows for multi-stream CP Video . . . . . . . . . . . . . . . . . . 5 4.1.1. Client Inbound CP Video . . . . . . . . . . . . . . . 5 4.1.2. Client Outbound Video . . . . . . . . . . . . . . . . 5 4.1.3. Audio . . . . . . . . . . . . . . . . . . . . . . . . 5 4.2. Advanced SIP and SDP negotiation and flows for multi-stream CP Video . . . . . . . . . . . . . . . . . . 5 4.2.1. SDP content attribute . . . . . . . . . . . . . . . . 5 4.2.2. VAS Rank . . . . . . . . . . . . . . . . . . . . . . . 7 5. Active Talker Indication . . . . . . . . . . . . . . . . . . . 9 6. Security Considerations . . . . . . . . . . . . . . . . . . . 9 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 9.1. Informative References . . . . . . . . . . . . . . . . . . 9 9.2. Normative References . . . . . . . . . . . . . . . . . . . 10 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10 Mostafa Expires June 25, 2012 [Page 2] Internet-Draft SIP Multi-Stream CP Video December 2011 1. Overview This document describes the SIP and SDP negotiation required for the multi-stream CP video using video codecs such as H.264 SVC and AVC (SVC: Scalable Video Coding, AVC: Advanced Video Coding). It covers the CP layout use cases, grouping, shuffling and bandwidth scaling for the CP streams. 2. Terminology 2.1. Key Words The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119[RFC2119]. 2.2. Abbreviations VAS: Voice Activated Switching CP: Continuous Presence UX: User eXperience BW: Bandwidth H.264 SVC: H.264 Scalable Video Coding H.264 AVC: H.264 Advanced Video Coding 2.3. Voice Activated Switching Voice Activated Switching in video delivers the video of single user in a conference to a participant, this user is the current or most recent active speaker. For example Alice, Bob, Carol, Dave and John are video particpants in a conference, Alice is talking, John would see Alice's video, when Bob starts talking John sees Bob's video. 2.4. Continuous Presence Continuous Presence in video delivers the video of multiple users in a conference to a participant. For example Alice, Bob, Carol, Dave and John are video particpants in a conference, John can see a Continuous Presence video that shows Alice, Bob, Carol and Dave at the same time on his video client, typically the video of the most recent active speakers. 2.5. Video Shuffling Video shuffling is used in Continuous Presence use cases. For example Alice, Bob, Carol, Dave, Mike and John are video particpants Mostafa Expires June 25, 2012 [Page 3] Internet-Draft SIP Multi-Stream CP Video December 2011 in a conference, John can see a four windows Continuous Presence video that has Alice, Bob, Carol and Dave on his client as the most recent active speakers, when Mike starts talking he becomes the most recent active speaker, the conference server shuffles Mike, Alice, Bob and Carol streams in place of previous Alice, Bob, Carol and Dave streams, this results in shuffling of particpants in the four windows CP view on client. 3. Multi-Stream Continuous Presence Video The Multi-Stream Continuous Presence video delivers multi-stream video (e.g. H.264 SVC or AVC) to the client from a conference server for the client to decode and render to the user. Continuous Presence video displays multiple participants' windows on the client's display, usually for the most recent active speakers. The multi- stream video streams are negotiated using (n) video m lines in the SDP where n > 1. example is n=4 where the CP video contains 4 participants/streams. A single video m line (n=1) means no CP and typically display the most recent active speaker. Current video SDP negotiation covers only the codecs used (e.g. H.264 SVC and AVC), bit rate, number of layers used (in SVC per[RFC6190]) and direction (recvonly, sendrcv, sendonly) but doesn't address the various aspects of the BW optimization, the shuffling mechanism, grouping and layout of the CP windows. 4. Multi-Stream Continuous Presence video SIP and SDP negotiation This section describes the SIP and SDP negotiation required for the multi-stream CP video, some use cases, flows and examples. Audio/Video +------------+ Multistream CP video +----------+ Alice ------------>| |---------Alice-------->| | | |---------Bob-- ------->| | Bob ------------>| |---------Dave--------->| | | |---------Mike--------->| | Carol ------------>| | | | | Conference | | Client | Dave ------------>| Server |------Mixed Audio----->| | | |<--------Audio---------| | John ------------>| | | | | | | | Mike ------------>| |<--------Video---------| | +------------+ +----------+ Figure 1 - Multiple Video Streams Continuous Presence Mostafa Expires June 25, 2012 [Page 4] Internet-Draft SIP Multi-Stream CP Video December 2011 4.1. Basic SIP and SDP negotiation and flows for multi-stream CP Video Multi-stream CP basic negotiation is initiated or escalated by clients where a client negotiates multiple video m lines to receive the CP video, this could be done in the initial offer from client to conference server or in a re-INVITE. 4.1.1. Client Inbound CP Video Conference server MAY accept all video m lines, some, one or none (audio only call) depending on conference server capabilities and policies. The conference sever should use m=0 in the answer for the m lines that it would like to reject. Conference server can re- Invite to escalate/de-escalate the number of video streams (with m !=0) as participants join/leave. The server SHOULD NOT add any extra video m lines in the answer than the ones originaly offered by client. 4.1.2. Client Outbound Video The Conference server SHOULD NOT use more than one video m line in the outdial to client use cases, this is to achieve better backward compatibilty with older video clients that don't support multi-stream video. Only the client can escalate the number of video m lines it can receive using a re-INVITE. A separate m line for outbound video MAY be negotiated, the outbound video MAY also be negotiated in one of the CP inbound m lines (sendrecv). 4.1.3. Audio A single audio stream is negotiated by a separate audio m line, the inbound audio to client is mixed by the conference server. 4.2. Advanced SIP and SDP negotiation and flows for multi-stream CP Video A new SDP attribute is discussed in this section. This attribute communicates the client preferences for the CP streams. 4.2.1. SDP content attribute A new content attribute is negotiated in each m line by the client, this attribute is sent by client in the video m lines negotiated in the SDP offer/answer for CP video, follows the standard[RFC3261] and [RFC3264]. a=content: window-id, group number, bw reduction limit, VAS Rank Mostafa Expires June 25, 2012 [Page 5] Internet-Draft SIP Multi-Stream CP Video December 2011 window-id = 1 digit; window1, window2, window3, .. group number = 1-2 digits ;range 1-99, lower number = higher priority bandwidth reduction limit = 1-3 digits ; range 0-100; 0 = no reduction allowed, 100 = full reduction is allowed. VAS Rank = 1 digit ; range 0-9 The new content attribute is negotiated by the client to communicate the client CP streams grouping, BW optimization and video shuffling mechanism. There is no answer for this attribute in the response from the server, the answer is reflected in the response m lines and the shuffling of the video RTP. Conference servers that don't support this attribute will ignore it and will process the offer video m lines according to its own algorithms/preferences. The group number specifies the group that the stream belongs to. All streams (UI windows) in same group have same resolution/size. A group with lower number has higher priority than higher group number. The CP streams/windows are grouped within a layout, grouping allows the conference server to scale down all windows in same group for BW optimization and to deliver a uniform user experience across those windows. The conference server should scale down the high group number first before scaling down the next group, ex: group2 first and then group1. The bandwidth reduction limit sets the maximum percentage of the original bandwidth that the conference server can reduce to satisfy the bandwidth constraints. Client Offer SDP example. For simplicity, audio and sprop-operation- point-info details are not shown: v=0 o=svcsrv 289083124 289083124 IN IP4 192.0.2.2 s=conference t=0 0 b=TIAS:812000 m=video 30000 RTP/AVP 98 97 96 c=IN IP4 192.0.2.2 a=content:window1,1,25,1 b=TIAS:512000 a=rtpmap:96 H264/90000 a=fmtp:96 profile-level-id=42401e a=rtpmap:97 H264-SVC/90000 a=fmtp:97 profile-level-id=530016; sprop-operation-point- info..(VGA/30) a=rtpmap:98 H264-SVC/90000 a=fmtp:98 profile-level-id=53001e; sprop-operation-point- Mostafa Expires June 25, 2012 [Page 6] Internet-Draft SIP Multi-Stream CP Video December 2011 info..(720p/30) a=sendrecv m=video 40000 RTP/AVP 101 100 99 c=IN IP4 192.0.2.2 a=content:window2,2,50,1 b=TIAS:300000 a=rtpmap:99 H264/90000 a=fmtp:99 profile-level-id=42401e a=rtpmap:100 H264-SVC/90000 a=fmtp:100 profile-level-id=530013; sprop-operation-point- info..(VGA/30) a=rtpmap:101 H264-SVC/90000 a=fmtp:101 profile-level-id=530016; sprop-operation-point- info..(360/30) a=recvonly 4.2.2. VAS Rank Rules: If client wants the video stream/window from conference server to be switched by active speaker activity, then it has to assign a vasrank to the window. The conference server will assign the window based on the active speaker history and rank. Rank 1 gets the most recent speaker, rank 2 the next most recent, etc. You can have multiple windows per rank. This allows us to minimize the shuffling that takes place when the speakers switch in and out. If not specified, default value is 1 for all windows (minimum shuffling). Example (1) of shuffling in a 2x2 or 1x4 layout (4 equal sized windows): a=content: window1,1,100, 1 (the most recent speaker) a=content: window2,1,100, 2 (2nd most recent) a=content: window3,1,100, 3 (3rd most recent) a=content: window4,1,100, 4 (4th most recent) In this example the client's offer to the conference server has 4 video SDP m lines, a=content (second parameter) is the same for the four m lines, indicating same priority and 4 equal sized windows. The pa=content for each has a different vas rank value (last parameter in the examples above). This means that the client is requesting the conference server to always send the most recent active speaker on first video stream negotiated in this example by first video m line, second most active speaker on second video stream, third most active speaker on third video stream and fourth most recent active speaker on fourth video stream. Example (2) of shuffling in a 2x2 or 1x4 layout (4 equal sized windows): Mostafa Expires June 25, 2012 [Page 7] Internet-Draft SIP Multi-Stream CP Video December 2011 a=content: window1,1,100, 1 a=content: window2,1,100, 1 a=content: window3,1,100, 1 a=content: window4,1,100, 1 All four windows will get switched with active speaker streams. The order will be determined by conference server to minimize shuffling. In this example the client's offer to the conference server has 4 video SDP m lines, a=content for each has a same vas rank value (last parameter in the examples above). This means that the client is requesting the conference server to always minimize shuffling of speakers on video streams sent to client, i.e. if most recent active speaker changes, send his/her video on fourth stream replacing the least recent active speaker, leave other three streams unchanged. Example (3) of shuffling in a 1+3 layout (1 big + 3 small windows): a=content: window1,1,100, 1 (the most recent speaker) a=content: window2,2,100, 2 (2nd most recent) a=content: window3,2,100, 3 (3rd most recent) a=content: window4,2,100, 4 (4th most recent) In this example the client's offer to the conference server has 4 video SDP m lines, a=content (second parameter) indicates two priorities and 1+3 layout. The a=content for each has a different vas rank value (last parameter in the examples above). This means that the client is requesting the conference server to always send the most recent active speaker on first video stream negotiated in this example by first video m line, second most active speaker on second video stream, third most active speaker on third video stream and fourth most recent active speaker on fourth video stream. Example (4) of shffling in a 1+3 layout (1 big + 3 small windows): a=content: window1,1,100, 1 a=content: window2,2,100, 2 a=content: window3,2,100, 2 a=content: window4,2,100, 2 The big window will always get the most recent active speaker. The 3 small windows will get the next 3 most recent active speaker. The order for these three small windows will be determined by the server to minimize shuffling. Example (5) of shuffling in a 1+3 layout with pinned video (1 big + 3 small windows): a=content: window1,1,100, 1 (the most recent speaker) a=content: window2,2,100, 2 (2nd or 3rd most recent) a=content: window3,2,100, 2 (2nd or 3rd most recent) a=content: window4,2,100, 0 (pinned / not switched based on speaker Mostafa Expires June 25, 2012 [Page 8] Internet-Draft SIP Multi-Stream CP Video December 2011 activity)) In this example the fourth m line has vas rank of 0, which means this video stream will not be switched and is pinned to a certain user regardles of his/her voice activity. 5. Active Talker Indication The audio and video active talker indications use the RTP CSRC in the audio and video RTP [RFC3550]. The SSRC's in the RTP CSRC list is mapped to userid/user name using the RFC4575 notifications. Only one SSRC is sent in the video RTP CSRC list, client can use this to display the user name on each CP video window. 6. Security Considerations The multi-stream CP video uses the TLS and sRTP standards for SIP signaling and media securtiy. 7. IANA Considerations This document has no actions for IANA. 8. Acknowledgements Thanks to Alan Johnston, Dan Romascanu, Peter Musgrave and Rifaat Shekh-Yusef for their review of the document and comments. 9. References 9.1. Informative References [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003. [RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, "RTP Payload Format for Scalable Video Coding", RFC 6190, May 2011. Mostafa Expires June 25, 2012 [Page 9] Internet-Draft SIP Multi-Stream CP Video December 2011 9.2. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002. Author's Address Adel Mostafa (editor) Avaya Toronto, Ontario Canada Email: amostafa@avaya.com Mostafa Expires June 25, 2012 [Page 10]