Network Working Group                                    A. Mostafa, Ed.
Internet-Draft                                                     Avaya
Intended status: Standards Track                       December 23, 2011
Expires: June 25, 2012


  A Mechanism for Negotiating Multi-Stream Continuous Presence Video in
                                  SIP
                     draft-mostafa-mmusic-sip-cp-00

Abstract

   NextGen video conferencing clients require multiple concurrent video
   streams to provide a User eXperience (UX) in which multiple
   participants can be viewed at the same time. This user experience is
   called Continuous Presence (CP) video. Multi-stream CP video gives
   the client more control of the UX and requires less processing on
   the conference server, since the server relays the video streams
   rather than mixing them to compose a single CP video stream. The
   client CP layout, processing power and bandwidth limitations require
   a per-stream bandwidth and resolution to be negotiated in the SIP
   Offer/Answer exchange with the conference server. Standard methods
   are used to achieve this negotiation, in addition to a new SDP
   parameter. This document explains the methodology and solution to
   achieve this in SIP and SDP.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 25, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Overview
   2.  Terminology
     2.1.  Key Words
     2.2.  Abbreviations
     2.3.  Voice Activated Switching
     2.4.  Continuous Presence
     2.5.  Video Shuffling
   3.  Multi-Stream Continuous Presence Video
   4.  Multi-Stream Continuous Presence video SIP and SDP negotiation
     4.1.  Basic SIP and SDP negotiation and flows for multi-stream CP
           Video
       4.1.1.  Client Inbound CP Video
       4.1.2.  Client Outbound Video
       4.1.3.  Audio
     4.2.  Advanced SIP and SDP negotiation and flows for multi-stream
           CP Video
       4.2.1.  SDP content attribute
       4.2.2.  VAS Rank
   5.  Active Talker Indication
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Informative References
     9.2.  Normative References
   Author's Address

1.  Overview

   This document describes the SIP and SDP negotiation required for
   multi-stream CP video using video codecs such as H.264 SVC and AVC
   (SVC: Scalable Video Coding, AVC: Advanced Video Coding). It covers
   the CP layout use cases, grouping, shuffling and bandwidth scaling
   for the CP streams.

2.  Terminology

2.1.  Key Words

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

2.2.  Abbreviations

   VAS: Voice Activated Switching
   CP: Continuous Presence
   UX: User eXperience
   BW: Bandwidth
   H.264 SVC: H.264 Scalable Video Coding
   H.264 AVC: H.264 Advanced Video Coding

2.3.  Voice Activated Switching

   Voice Activated Switching in video delivers the video of a single
   user in a conference to a participant; this user is the current or
   most recent active speaker. For example, Alice, Bob, Carol, Dave and
   John are video participants in a conference. While Alice is talking,
   John sees Alice's video; when Bob starts talking, John sees Bob's
   video.

2.4.  Continuous Presence

   Continuous Presence in video delivers the video of multiple users in
   a conference to a participant. For example, Alice, Bob, Carol, Dave
   and John are video participants in a conference. John can see a
   Continuous Presence video that shows Alice, Bob, Carol and Dave at
   the same time on his video client, typically the video of the most
   recent active speakers.

2.5.  Video Shuffling

   Video shuffling is used in Continuous Presence use cases. For
   example, Alice, Bob, Carol, Dave, Mike and John are video
   participants in a conference. John sees a four-window Continuous
   Presence video that shows Alice, Bob, Carol and Dave on his client
   as the most recent active speakers. When Mike starts talking he
   becomes the most recent active speaker, and the conference server
   sends the Mike, Alice, Bob and Carol streams in place of the
   previous Alice, Bob, Carol and Dave streams. This results in a
   shuffling of participants in the four-window CP view on the client.

3.  Multi-Stream Continuous Presence Video

   Multi-Stream Continuous Presence video delivers multi-stream video
   (e.g. H.264 SVC or AVC) to the client from a conference server for
   the client to decode and render to the user. Continuous Presence
   video displays multiple participants' windows on the client's
   display, usually for the most recent active speakers. The video
   streams are negotiated using n video m lines in the SDP, where
   n > 1. For example, n=4 means the CP video contains 4
   participants/streams. A single video m line (n=1) means no CP and
   typically displays the most recent active speaker. Current video SDP
   negotiation covers only the codecs used (e.g. H.264 SVC and AVC),
   bit rate, number of layers used (in SVC, per [RFC6190]) and
   direction (recvonly, sendrecv, sendonly), but does not address the
   various aspects of BW optimization, the shuffling mechanism, or the
   grouping and layout of the CP windows.

4.  Multi-Stream Continuous Presence video SIP and SDP negotiation

   This section describes the SIP and SDP negotiation required for
   multi-stream CP video, along with some use cases, flows and
   examples.
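
   As a rough illustration of the rule in Section 3 that more than one
   video m line signals CP video, the following Python sketch counts
   the video m lines in an SDP body. The function names and the
   shortened OFFER fragment are hypothetical and not part of this
   document; the sketch counts every m=video line and makes no attempt
   to detect rejected streams.

```python
def count_video_m_lines(sdp: str) -> int:
    """Return the number of m=video lines in an SDP body."""
    return sum(
        1 for line in sdp.splitlines() if line.strip().startswith("m=video ")
    )


def is_multi_stream_cp(sdp: str) -> bool:
    """n > 1 video m lines means multi-stream CP; n = 1 means no CP."""
    return count_video_m_lines(sdp) > 1


# Hypothetical, heavily shortened offer: one audio and two video m lines.
OFFER = """v=0
m=audio 20000 RTP/AVP 0
m=video 30000 RTP/AVP 96
m=video 40000 RTP/AVP 99
"""

print(count_video_m_lines(OFFER), is_multi_stream_cp(OFFER))
```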

      Audio/Video     +------------+ Multistream CP video  +----------+
   Alice ------------>|            |---------Alice-------->|          |
                      |            |---------Bob---------->|          |
   Bob   ------------>|            |---------Dave--------->|          |
                      |            |---------Mike--------->|          |
   Carol ------------>|            |                       |          |
                      | Conference |                       |  Client  |
   Dave  ------------>|   Server   |------Mixed Audio----->|          |
                      |            |<--------Audio---------|          |
   John  ------------>|            |                       |          |
                      |            |                       |          |
   Mike  ------------>|            |<--------Video---------|          |
                      +------------+                       +----------+
          Figure 1 - Multiple Video Streams Continuous Presence

4.1.  Basic SIP and SDP negotiation and flows for multi-stream CP Video

   Multi-stream CP basic negotiation is initiated or escalated by the
   client: the client negotiates multiple video m lines to receive the
   CP video. This can be done in the initial offer from the client to
   the conference server or in a re-INVITE.

4.1.1.  Client Inbound CP Video

   The conference server MAY accept all of the video m lines, some, one
   or none (audio-only call), depending on conference server
   capabilities and policies. The conference server should set the
   port to zero in the answer for each m line that it would like to
   reject [RFC3264]. The conference server can send a re-INVITE to
   escalate/de-escalate the number of video streams (m lines with a
   non-zero port) as participants join/leave. The server SHOULD NOT add
   video m lines in the answer beyond the ones originally offered by
   the client.

4.1.2.  Client Outbound Video

   The conference server SHOULD NOT use more than one video m line in
   outdial-to-client use cases; this achieves better backward
   compatibility with older video clients that don't support
   multi-stream video. Only the client can escalate the number of video
   m lines it can receive, using a re-INVITE. A separate m line for
   outbound video MAY be negotiated; the outbound video MAY also be
   negotiated in one of the CP inbound m lines (sendrecv).

4.1.3.  Audio

   A single audio stream is negotiated by a separate audio m line. The
   inbound audio to the client is mixed by the conference server.

4.2.  Advanced SIP and SDP negotiation and flows for multi-stream CP
      Video

   A new SDP attribute is discussed in this section. This attribute
   communicates the client preferences for the CP streams.

4.2.1.  SDP content attribute

   A new content attribute is negotiated in each m line by the client.
   The client sends this attribute in the video m lines negotiated in
   the SDP offer/answer for CP video, which follows the standard
   [RFC3261] and [RFC3264].

   a=content: window-id, group number, bw reduction limit, VAS Rank

   window-id = 1 digit; window1, window2, window3, ..

   group number = 1-2 digits ; range 1-99, lower number = higher priority

   bandwidth reduction limit = 1-3 digits ; range 0-100;
   0 = no reduction allowed, 100 = full reduction is allowed.

   VAS Rank = 1 digit ; range 0-9

   The new content attribute is negotiated by the client to communicate
   the client CP streams grouping, BW optimization and video shuffling
   mechanism. There is no answer for this attribute in the response
   from the server; the answer is reflected in the response m lines and
   the shuffling of the video RTP. Conference servers that don't
   support this attribute will ignore it and will process the offered
   video m lines according to their own algorithms/preferences. The
   group number specifies the group that the stream belongs to. All
   streams (UI windows) in the same group have the same
   resolution/size. A group with a lower number has higher priority
   than a group with a higher number. The CP streams/windows are
   grouped within a layout; grouping allows the conference server to
   scale down all windows in the same group for BW optimization and to
   deliver a uniform user experience across those windows.
   The conference server should scale down the group with the highest
   number first before scaling down the next group, e.g. group 2 first
   and then group 1. The bandwidth reduction limit sets the maximum
   percentage of the original bandwidth by which the conference server
   can reduce a stream to satisfy the bandwidth constraints. For
   example, a limit of 25 on a stream offered at b=TIAS:512000 allows
   the server to scale that stream down to no less than 384000 bits per
   second.

   Client Offer SDP example. For simplicity, audio and
   sprop-operation-point-info details are not shown:

   v=0
   o=svcsrv 289083124 289083124 IN IP4 192.0.2.2
   s=conference
   b=TIAS:812000
   t=0 0

   m=video 30000 RTP/AVP 98 97 96
   c=IN IP4 192.0.2.2
   b=TIAS:512000
   a=content:window1,1,25,1
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42401e
   a=rtpmap:97 H264-SVC/90000
   a=fmtp:97 profile-level-id=530016; sprop-operation-point-
      info..(VGA/30)
   a=rtpmap:98 H264-SVC/90000
   a=fmtp:98 profile-level-id=53001e; sprop-operation-point-
      info..(720p/30)
   a=sendrecv

   m=video 40000 RTP/AVP 101 100 99
   c=IN IP4 192.0.2.2
   b=TIAS:300000
   a=content:window2,2,50,1
   a=rtpmap:99 H264/90000
   a=fmtp:99 profile-level-id=42401e
   a=rtpmap:100 H264-SVC/90000
   a=fmtp:100 profile-level-id=530013; sprop-operation-point-
      info..(VGA/30)
   a=rtpmap:101 H264-SVC/90000
   a=fmtp:101 profile-level-id=530016; sprop-operation-point-
      info..(360/30)
   a=recvonly

4.2.2.  VAS Rank

   Rules: If the client wants the video stream/window from the
   conference server to be switched by active speaker activity, it has
   to assign a VAS rank to the window. The conference server will
   assign speakers to the window based on the active speaker history
   and the rank. Rank 1 gets the most recent speaker, rank 2 the next
   most recent, etc. Multiple windows can share the same rank. This
   minimizes the shuffling that takes place when speakers switch in and
   out. If not specified, the default value is 1 for all windows
   (minimum shuffling).
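
   The attribute syntax of Section 4.2.1 and the bandwidth reduction
   limit can be sketched in Python as follows. This is a minimal
   sketch, not a normative grammar: the names CpContent,
   parse_cp_content and min_bandwidth are hypothetical, the tolerance
   for optional whitespace and for an omitted VAS rank (defaulting to
   1) is an assumption drawn from the examples and rules in this
   document, and the limit is read as the largest percentage by which
   the offered b=TIAS value may be reduced.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CpContent:
    window_id: int           # 1 digit, from "window1", "window2", ..
    group: int               # 1-99, lower number = higher priority
    bw_reduction_limit: int  # 0-100, percent the server may cut
    vas_rank: int = 1        # 0-9; assumed default when omitted


# Assumption: whitespace after ":" and around "," is tolerated, and the
# trailing VAS rank field is optional.
_CONTENT_RE = re.compile(
    r"^a=content:\s*window(\d)\s*,\s*(\d{1,2})\s*,\s*(\d{1,3})"
    r"\s*(?:,\s*(\d))?\s*$"
)


def parse_cp_content(line: str) -> CpContent:
    """Parse one CP a=content line into its four fields."""
    match = _CONTENT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a CP a=content line: {line!r}")
    window_id, group, limit = (int(match.group(i)) for i in (1, 2, 3))
    vas_rank = int(match.group(4)) if match.group(4) is not None else 1
    if not 1 <= group <= 99:
        raise ValueError(f"group number out of range: {group}")
    if not 0 <= limit <= 100:
        raise ValueError(f"bandwidth reduction limit out of range: {limit}")
    return CpContent(window_id, group, limit, vas_rank)


def min_bandwidth(offered_bps: int, content: CpContent) -> int:
    """Lowest bit rate the server may scale the stream down to."""
    return offered_bps * (100 - content.bw_reduction_limit) // 100


window1 = parse_cp_content("a=content:window1,1,25,1")
window2 = parse_cp_content("a=content:window2,2,50,1")
print(window1, min_bandwidth(512000, window1))
print(window2, min_bandwidth(300000, window2))
```

   With the values from the offer above, window1 may be scaled from
   512000 down to 384000 bits per second and window2 from 300000 down
   to 150000 bits per second under this reading of the limit.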

   Example (1) of shuffling in a 2x2 or 1x4 layout (4 equal sized
   windows):

   a=content: window1,1,100, 1 (the most recent speaker)
   a=content: window2,1,100, 2 (2nd most recent)
   a=content: window3,1,100, 3 (3rd most recent)
   a=content: window4,1,100, 4 (4th most recent)

   In this example the client's offer to the conference server has 4
   video SDP m lines. The group number (second parameter of a=content)
   is the same for the four m lines, indicating the same priority and 4
   equal sized windows. The a=content for each has a different VAS rank
   value (last parameter in the examples above). This means that the
   client is requesting the conference server to always send the most
   recent active speaker on the first video stream (negotiated in this
   example by the first video m line), the second most recent active
   speaker on the second video stream, the third most recent active
   speaker on the third video stream and the fourth most recent active
   speaker on the fourth video stream.

   Example (2) of shuffling in a 2x2 or 1x4 layout (4 equal sized
   windows):

   a=content: window1,1,100, 1
   a=content: window2,1,100, 1
   a=content: window3,1,100, 1
   a=content: window4,1,100, 1

   All four windows will get switched with active speaker streams. The
   order will be determined by the conference server to minimize
   shuffling.

   In this example the client's offer to the conference server has 4
   video SDP m lines, and the a=content for each has the same VAS rank
   value (last parameter in the examples above). This means that the
   client is requesting the conference server to always minimize
   shuffling of speakers on the video streams sent to the client, i.e.
   if the most recent active speaker changes, send his/her video on the
   fourth stream, replacing the least recent active speaker, and leave
   the other three streams unchanged.
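
   The difference between Examples (1) and (2) can be sketched in
   Python as follows. This is only one possible server-side reading of
   the two behaviours, not an algorithm defined by this document; the
   function names are hypothetical, and the speaker history is assumed
   to be a list ordered from most recent to least recent speaker.

```python
from typing import List


def assign_distinct_ranks(window_count: int, recent: List[str]) -> List[str]:
    """Example (1): window i always shows the i-th most recent speaker."""
    return recent[:window_count]


def assign_same_rank(current: List[str], recent: List[str]) -> List[str]:
    """Example (2): keep speakers in place, replace only those who drop out.

    `current` lists the participant shown on each stream, in stream order.
    """
    wanted = recent[: len(current)]
    # Speakers that should be visible but are not on any stream yet.
    newcomers = iter([s for s in wanted if s not in current])
    result = list(current)
    for index, occupant in enumerate(result):
        if occupant not in wanted:
            replacement = next(newcomers, None)
            if replacement is None:
                break  # fewer speakers than windows; leave the rest as is
            result[index] = replacement
    return result


# Scenario from Section 2.5: Mike becomes the most recent active speaker.
current = ["Alice", "Bob", "Carol", "Dave"]
recent = ["Mike", "Alice", "Bob", "Carol", "Dave"]

print(assign_distinct_ranks(4, recent))   # every stream changes
print(assign_same_rank(current, recent))  # only Dave's stream changes
```

   With distinct ranks all four streams switch to a new participant,
   while with a shared rank only the stream that carried the least
   recent speaker is replaced.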

   Example (3) of shuffling in a 1+3 layout (1 big + 3 small windows):

   a=content: window1,1,100, 1 (the most recent speaker)
   a=content: window2,2,100, 2 (2nd most recent)
   a=content: window3,2,100, 3 (3rd most recent)
   a=content: window4,2,100, 4 (4th most recent)

   In this example the client's offer to the conference server has 4
   video SDP m lines. The group number (second parameter of a=content)
   indicates two priorities and a 1+3 layout. The a=content for each
   has a different VAS rank value (last parameter in the examples
   above). This means that the client is requesting the conference
   server to always send the most recent active speaker on the first
   video stream (negotiated in this example by the first video m line),
   the second most recent active speaker on the second video stream,
   the third most recent active speaker on the third video stream and
   the fourth most recent active speaker on the fourth video stream.

   Example (4) of shuffling in a 1+3 layout (1 big + 3 small windows):

   a=content: window1,1,100, 1
   a=content: window2,2,100, 2
   a=content: window3,2,100, 2
   a=content: window4,2,100, 2

   The big window will always get the most recent active speaker. The 3
   small windows will get the next 3 most recent active speakers. The
   order for these three small windows will be determined by the server
   to minimize shuffling.

   Example (5) of shuffling in a 1+3 layout with pinned video (1 big + 3
   small windows):

   a=content: window1,1,100, 1 (the most recent speaker)
   a=content: window2,2,100, 2 (2nd or 3rd most recent)
   a=content: window3,2,100, 2 (2nd or 3rd most recent)
   a=content: window4,2,100, 0 (pinned / not switched based on speaker
   activity)

   In this example the fourth m line has a VAS rank of 0, which means
   this video stream will not be switched and is pinned to a certain
   user regardless of his/her voice activity.

5.  Active Talker Indication

   The audio and video active talker indications use the RTP CSRC list
   in the audio and video RTP [RFC3550]. The SSRCs in the RTP CSRC list
   are mapped to userid/user name using the RFC 4575 notifications.
   Only one SSRC is sent in the video RTP CSRC list; the client can use
   this to display the user name on each CP video window.

6.  Security Considerations

   The multi-stream CP video uses the TLS and SRTP standards for SIP
   signaling and media security.

7.  IANA Considerations

   This document has no actions for IANA.

8.  Acknowledgements

   Thanks to Alan Johnston, Dan Romascanu, Peter Musgrave and Rifaat
   Shekh-Yusef for their review of the document and comments.

9.  References

9.1.  Informative References

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

9.2.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

Author's Address

   Adel Mostafa (editor)
   Avaya
   Toronto, Ontario
   Canada

   Email: amostafa@avaya.com