idnits 2.17.1 draft-ietf-rtcweb-jsep-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (June 4, 2012) is 4345 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC4566' is defined on line 969, but no explicit reference was found in the text == Unused Reference: 'RFC5245' is defined on line 978, but no explicit reference was found in the text ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) -- Obsolete informational reference (is this intentional?): RFC 5245 (Obsoleted by RFC 8445, RFC 8839) Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. 
Uberti 3 Internet-Draft Google 4 Intended status: Standards Track C. Jennings 5 Expires: December 6, 2012 Cisco Systems, Inc. 6 June 4, 2012 8 Javascript Session Establishment Protocol 9 draft-ietf-rtcweb-jsep-01 11 Abstract 13 This document proposes a mechanism for allowing a Javascript 14 application to fully control the signaling plane of a multimedia 15 session, and discusses how this would work with existing signaling 16 protocols. 18 This document is an input document for discussion. It should be 19 discussed on the RTCWEB WG list, rtcweb@ietf.org. 21 Status of this Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at http://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on December 6, 2012. 38 Copyright Notice 40 Copyright (c) 2012 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (http://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with respect 48 to this document. Code Components extracted from this document must 49 include Simplified BSD License text as described in Section 4.e of 50 the Trust Legal Provisions and are provided without warranty as 51 described in the Simplified BSD License. 53 Table of Contents 55 1. 
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 56 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 5 57 2. JSEP Approach . . . . . . . . . . . . . . . . . . . . . . . . . 5 58 3. Other Approaches Considered . . . . . . . . . . . . . . . . . . 6 59 4. Semantics and Syntax . . . . . . . . . . . . . . . . . . . . . 7 60 4.1. Signaling Model . . . . . . . . . . . . . . . . . . . . . . 7 61 4.2. Session Descriptions and State Machine . . . . . . . . . . 7 62 4.3. Session Description Format . . . . . . . . . . . . . . . . 9 63 4.4. Separation of Signaling and ICE State Machines . . . . . . 10 64 4.5. ICE Candidate Trickling . . . . . . . . . . . . . . . . . . 10 65 4.6. ICE Candidate Format . . . . . . . . . . . . . . . . . . . 11 66 4.7. Interactions With Forking . . . . . . . . . . . . . . . . . 11 67 4.7.1. Serial Forking . . . . . . . . . . . . . . . . . . . . 11 68 4.7.2. Parallel Forking . . . . . . . . . . . . . . . . . . . 12 69 4.8. Session Rehydration . . . . . . . . . . . . . . . . . . . . 12 70 5. Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 71 5.1. Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 13 72 5.1.1. createOffer . . . . . . . . . . . . . . . . . . . . . . 13 73 5.1.2. createAnswer . . . . . . . . . . . . . . . . . . . . . 14 74 5.1.3. SessionDescriptionType . . . . . . . . . . . . . . . . 14 75 5.1.4. setLocalDescription . . . . . . . . . . . . . . . . . . 15 76 5.1.5. setRemoteDescription . . . . . . . . . . . . . . . . . 15 77 5.1.6. localDescription . . . . . . . . . . . . . . . . . . . 16 78 5.1.7. remoteDescription . . . . . . . . . . . . . . . . . . . 16 79 5.1.8. updateIce . . . . . . . . . . . . . . . . . . . . . . . 16 80 5.1.9. addIceCandidate . . . . . . . . . . . . . . . . . . . . 17 81 5.2. Configurable SDP Parameters . . . . . . . . . . . . . . . . 17 82 6. Media Setup Overview . . . . . . . . . . . . . . . . . . . . . 17 83 6.1. Initiating the Session . . . . . . . . . 
. . . . . . . . . 18 84 6.1.1. Generating An Offer . . . . . . . . . . . . . . . . . . 18 85 6.1.2. Applying the Offer . . . . . . . . . . . . . . . . . . 18 86 6.1.3. Handling ICE Callbacks . . . . . . . . . . . . . . . . 18 87 6.1.4. Serializing the Offer and Candidates . . . . . . . . . 19 88 6.2. Receiving the Session . . . . . . . . . . . . . . . . . . . 19 89 6.2.1. Receiving the Offer . . . . . . . . . . . . . . . . . . 19 90 6.2.2. Handling ICE Messages . . . . . . . . . . . . . . . . . 19 91 6.2.3. Generating the Answer . . . . . . . . . . . . . . . . . 20 92 6.2.4. Applying the Answer . . . . . . . . . . . . . . . . . . 20 93 6.2.5. Serializing the Answer . . . . . . . . . . . . . . . . 20 94 6.3. Completing the Session . . . . . . . . . . . . . . . . . . 20 95 6.3.1. Receiving the Answer . . . . . . . . . . . . . . . . . 20 97 6.4. Updates to the Session . . . . . . . . . . . . . . . . . . 20 98 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 21 99 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 21 100 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21 101 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 102 10.1. Normative References . . . . . . . . . . . . . . . . . . . 21 103 10.2. Informative References . . . . . . . . . . . . . . . . . . 21 104 Appendix A. JSEP Implementation Examples . . . . . . . . . . . . . 22 105 A.1. Example API . . . . . . . . . . . . . . . . . . . . . . . . 22 106 A.2. Example API Flows . . . . . . . . . . . . . . . . . . . . . 23 107 A.2.1. Call using ROAP . . . . . . . . . . . . . . . . . . . . 23 108 A.2.2. Call using XMPP . . . . . . . . . . . . . . . . . . . . 24 109 A.2.3. Adding video to a call, using XMPP . . . . . . . . . . 25 110 A.2.4. Simultaneous add of video streams, using XMPP . . . . . 26 111 A.2.5. Call using SIP . . . . . . . . . . . . . . . . . . . . 27 112 A.2.6. Handling early media (e.g. 1-800-FEDEX), using SIP . . 28 113 A.3. 
Full Example Application . . . . . . . . . . . . . . . . . 28 114 Appendix B. Change log . . . . . . . . . . . . . . . . . . . . . . 30 115 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 30 117 1. Introduction 119 The thinking behind WebRTC call setup has been to fully specify and 120 control the media plane, but to leave the signaling plane up to the 121 application as much as possible. The rationale is that different 122 applications may prefer to use different protocols, such as the 123 existing SIP or Jingle call signaling protocols, or something custom 124 to the particular application, perhaps for a novel use case. In this 125 approach, the key information that needs to be exchanged is the 126 multimedia session description, which specifies the 127 transport and media configuration information necessary to establish 128 the media plane. 130 The original spec for WebRTC attempted to implement this protocol- 131 agnostic signaling by providing a mechanism to exchange session 132 descriptions in the form of SDP blobs. Upon starting a session, the 133 browser would generate an SDP blob, which would be passed to the 134 application for transport over its preferred signaling protocol. On 135 the remote side, this blob would be passed into the browser from the 136 application, and the browser would then generate a blob of its own in 137 response. Upon transmission back to the initiator, this blob would be 138 plugged into their browser, and the handshake would be complete. 140 Experimentation with this mechanism turned up several shortcomings, 141 which generally stemmed from there being insufficient context at the 142 browser to fully determine the meaning of an SDP blob. For example, 143 the browser could not reliably determine whether a blob was an offer or an answer, or 144 differentiate a new offer from a retransmit. 
146 The ROAP proposal, specified in [I-D.draft-jennings-rtcweb-signaling- 147 01], attempted to resolve these issues by providing additional 148 structure in the messaging - in essence, to create a generic 149 signaling protocol that specifies how the browser signaling state 150 machine should operate. However, even though the protocol is 151 abstracted, the state machine forces a least-common-denominator 152 approach on the signaling interactions. For example, in Jingle, the 153 call initiator can provide additional ICE candidates even after the 154 initial offer has been sent, which allows the offer to be sent 155 immediately for quicker call startup. However, in the browser state 156 machine, there is no notion of sending an updated offer before the 157 initial offer has been responded to, rendering this functionality 158 impossible. 160 While specific concerns like this could be addressed by modifying the 161 generic protocol, others would likely be discovered later. The main 162 reason this mechanism is inflexible is that it embeds a signaling 163 state machine within the browser. Since the browser generates the 164 session descriptions on its own, and fully controls the possible 165 states and advancement of the signaling state machine, modification 166 of the session descriptions or use of alternate state machines 167 becomes difficult or impossible. 169 The browser environment also has its own challenges that cause 170 problems for an embedded signaling state machine. One of these is 171 that the user may reload the web page at any time. If this happens, 172 and the state machine is being run at a server, the server can simply 173 push the current state back down to the page and resume the call 174 where it left off. 176 If instead the state machine is run at the browser end, and is 177 instantiated within, for example, the PeerConnection object, that 178 state machine will be reinitialized when the page is reloaded and the 179 JavaScript re-executed. 
This complicates the design of any 180 interoperability service, as all cases where an offer or answer has 181 already been generated but is now "forgotten" must now be handled by 182 trying to move the client state machine forward to the same state it 183 had been in previously in order to match what has already been 184 delivered to and/or answered by the far side, or handled by ensuring 185 that aborts are cleanly handled from every state and the negotiation 186 rapidly restarted. 188 1.1. Terminology 190 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 191 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 192 document are to be interpreted as described in RFC 2119 [RFC2119]. 194 2. JSEP Approach 196 To resolve the issues mentioned above, this document proposes the 197 Javascript Session Establishment Protocol (JSEP) that pulls the 198 signaling state machine out of the browser and into Javascript. This 199 mechanism effectively removes the browser almost completely from the 200 core signaling flow; the only interface needed is a way for the 201 application to pass in the local and remote session descriptions 202 negotiated by whatever signaling mechanism is used, and a way to 203 interact with the ICE state machine. 205 JSEP's handling of session descriptions is simple and 206 straightforward. Whenever an offer/answer exchange is needed, the 207 initiating side creates an offer by calling a createOffer() API. The 208 application can massage that offer, if it wants to, and then 209 uses it to set up its local config via a setLocalDescription() API. 210 The offer is then sent off to the remote side over its preferred 211 signaling mechanism (e.g. WebSockets); upon receipt of that offer, 212 the remote party installs it using a setRemoteDescription() API. 
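The offer leg described above can be sketched as follows. This is a minimal illustration only, not the browser API: MockPeer, sendOverSignaling, and the placeholder SDP string are hypothetical stand-ins introduced here so that the ordering of the calls can be run outside a browser.

```javascript
// Hypothetical stand-in for a JSEP-capable session object. It only records
// the descriptions it is given; it performs no media or ICE processing.
class MockPeer {
  constructor() {
    this.local = null;  // { type, sdp } most recently applied locally
    this.remote = null; // { type, sdp } most recently received from the far side
  }

  // Stand-in for createOffer(): returns a placeholder (incomplete) SDP blob.
  createOffer() {
    return "v=0\r\ns=-\r\nt=0 0\r\nm=audio 9 RTP/AVP 0\r\n";
  }

  setLocalDescription(type, sdp) {
    this.local = { type, sdp };
  }

  setRemoteDescription(type, sdp) {
    this.remote = { type, sdp };
  }
}

// Application-defined signaling. Here it is a direct function call; a real
// application might carry the same message over WebSockets, SIP, or Jingle.
function sendOverSignaling(remotePeer, message) {
  remotePeer.setRemoteDescription(message.type, message.sdp);
}

const caller = new MockPeer();
const callee = new MockPeer();

// 1. The initiating side creates an offer.
let offer = caller.createOffer();
// 2. The application may massage the offer (an arbitrary example edit).
offer = offer.replace("s=-", "s=jsep-demo");
// 3. The offer is applied locally, then sent over the signaling channel.
caller.setLocalDescription("offer", offer);
sendOverSignaling(callee, { type: "offer", sdp: offer });

console.log(callee.remote.type);
```

The point of the sketch is the ordering: the application, not the browser, decides when the offer is applied and how it reaches the remote party.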
214 When the call is accepted, the callee uses a createAnswer() API to 215 generate an appropriate answer, applies it using 216 setLocalDescription(), and sends the answer back to the initiator 217 over the signaling channel. When the offerer gets that answer, it 218 installs it using setRemoteDescription(), and initial setup is 219 complete. This process can be repeated for additional offer/answer 220 exchanges. 222 Regarding ICE, JSEP decouples the ICE state machine from the overall 223 signaling state machine, as the ICE state machine must remain in the 224 browser, since only the browser has the necessary knowledge of 225 candidates and other transport info. This separation 226 provides additional flexibility; in protocols that decouple session 227 descriptions from transport, such as Jingle, the transport 228 information can be sent separately; in protocols that don't, such as 229 SIP, the information can be easily aggregated and recombined. Sending 230 transport information separately can allow for faster ICE and DTLS 231 startup, since the necessary roundtrips can occur while waiting for 232 the remote side to accept the session. 234 The JSEP approach does come with a minor downside. As the application 235 now is responsible for driving the signaling state machine, slightly 236 more application code is necessary to perform call setup; the 237 application must call the right APIs at the right times, and convert 238 the session descriptions and ICE information into the defined 239 messages of its chosen signaling protocol, instead of simply 240 forwarding the messages emitted from the browser. 242 One way to mitigate this is to provide a Javascript library that 243 hides this complexity from the developer, which would implement the 244 state machine and serialization of the desired signaling protocol. 
245 For example, this library could easily adapt the JSEP API 246 into the exact ROAP API, thereby implementing the ROAP signaling 247 protocol. Such a library could of course also implement other popular 248 signaling protocols, including SIP or Jingle. In this fashion we can 249 enable greater control for the experienced developer without forcing 250 any additional complexity on the novice developer. 252 3. Other Approaches Considered 254 Another approach that was considered for JSEP was to move the 255 mechanism for generating offers and answers out of the browser as 256 well. Instead of providing createOffer/createAnswer methods within 257 the browser, this approach would instead expose a getCapabilities API 258 which would provide the application with the information it needed in 259 order to generate its own session descriptions. This increases the 260 amount of work that the application needs to do; it needs to know how 261 to generate session descriptions from capabilities, and especially 262 how to generate the correct answer from an arbitrary offer and the 263 supported capabilities. While this could certainly be addressed by 264 using a library like the one mentioned above, it basically forces the 265 use of said library even for a simple example. Exposing 266 createOffer/createAnswer avoids that problem, but still allows 267 applications to generate their own offers/answers if they choose, 268 using the description generated by createOffer as an indication of 269 the browser's capabilities. 271 Note also that while JSEP transfers more control to Javascript, it is 272 not intended to be an example of a "low-level" API. The general 273 argument against a low-level API is that there are too many necessary 274 API points, and they can be called in any order, leading to something 275 that is hard to specify and test. 
In the approach proposed here, 276 control is performed via session descriptions; this requires only a 277 few APIs to handle these descriptions, and they are evaluated in a 278 specific fashion, which reduces the number of possible states and 279 interactions. 281 4. Semantics and Syntax 283 4.1. Signaling Model 285 JSEP does not specify a particular signaling model or state machine, 286 other than the generic need to exchange RFC 3264 offers and answers 287 in order for both sides of the session to know how to conduct the 288 session. JSEP provides mechanisms to create offers and answers, as 289 well as to apply them to a session. However, the actual mechanism by 290 which these offers and answers are communicated to the remote side, 291 including addressing, retransmission, forking, and glare handling, is 292 left entirely up to the application.
294     +-----------+                                +-----------+
295     |  Web App  |<--- App-Specific Signaling --->|  Web App  |
296     +-----------+                                +-----------+
297           |                                            |
298           | SDP                                        | SDP
299           V                                            V
300     +-----------+                                +-----------+
301     |  Browser  |<----------- Media ------------>|  Browser  |
302     +-----------+                                +-----------+
304 Figure 1: JSEP Signaling Model
306 4.2. Session Descriptions and State Machine 308 In order to establish the media plane, the user agent needs specific 309 parameters to indicate what to transmit to the remote side, as well 310 as how to handle the media that is received. These parameters are 311 determined by the exchange of session descriptions in offers and 312 answers, and there are certain details to this process that must be 313 handled in the JSEP APIs. 315 Whether a session description was sent or received affects the 316 meaning of that description. For example, the list of codecs sent to 317 a remote party indicates what the local side is willing to decode, 318 and what the remote party should send. 
Not all parameters follow this 319 rule; for example, the SRTP parameters [RFC4568] sent to a remote 320 party indicate what the local side will use to encrypt, and thereby 321 how the remote party should expect to receive. 323 In addition, various RFCs put different conditions on the format of 324 offers versus answers. For example, an offer may propose multiple SRTP 325 configurations, but an answer may only contain a single SRTP 326 configuration. 328 Lastly, while the exact media parameters are known only after an 329 offer and an answer have been exchanged, it is possible for the 330 offerer to receive media after they have sent an offer and before 331 they have received an answer. To properly process incoming media in 332 this case, the offerer's media handler must be aware of the details 333 of the offer before the answer arrives. 335 Therefore, in order to handle session descriptions properly, the user 336 agent needs: 338 1. To know if a session description pertains to the local or 339 remote side. 341 2. To know if a session description is an offer or an answer. 343 3. To allow the offer to be specified independently of the answer. 345 JSEP addresses this by adding both a setLocalDescription and a 346 setRemoteDescription method, and both these methods take a parameter 347 to indicate the type of session description being supplied. This 348 satisfies the requirements listed above for both the offerer, who 349 first calls setLocalDescription("offer", sdp) and then later 350 setRemoteDescription("answer", sdp), as well as for the answerer, who 351 first calls setRemoteDescription("offer", sdp) and then later 352 setLocalDescription("answer", sdp). While it could be possible to 353 implicitly determine the value of the offer/answer argument, 354 requiring it to be specified explicitly is more robust, allowing 355 invalid combinations (e.g. an answer before an offer) to generate an 356 appropriate error. 
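The benefit of the explicit type argument can be illustrated with a small model of the checks a user agent might perform. This is a hedged sketch, not the normative state machine: the class name DescriptionStateModel and its fields are hypothetical, the SDP argument is omitted for brevity, and rejecting a crossing offer from the opposite side is an assumption of this sketch, since JSEP leaves glare handling to the application. The "pranswer" type is the provisional answer described in this section.

```javascript
// Hypothetical model of description-type validation; not part of any API.
class DescriptionStateModel {
  constructor() {
    this.state = "stable";  // "stable" | "offer" | "pranswer"
    this.offerSide = null;  // "local" | "remote": which side supplied the offer
  }

  setLocalDescription(type) {
    this.apply("local", type);
  }

  setRemoteDescription(type) {
    this.apply("remote", type);
  }

  apply(side, type) {
    if (type === "offer") {
      // An offer is accepted from the stable state, or as an update to an
      // unanswered offer that came from the same side.
      const allowed =
        this.state === "stable" ||
        (this.state === "offer" && this.offerSide === side);
      if (!allowed) {
        throw new Error(`offer from ${side} side not allowed in state ${this.state}`);
      }
      this.state = "offer";
      this.offerSide = side;
    } else if (type === "pranswer" || type === "answer") {
      // An answer needs an outstanding offer supplied by the opposite side.
      if (this.state === "stable" || this.offerSide === side) {
        throw new Error(`${type} from ${side} side has no matching offer`);
      }
      if (type === "answer") {
        this.state = "stable"; // exchange complete
        this.offerSide = null;
      } else {
        this.state = "pranswer"; // exchange still open
      }
    } else {
      throw new Error(`unknown description type: ${type}`);
    }
  }
}

// Offerer: local offer, then remote answer.
const offerer = new DescriptionStateModel();
offerer.setLocalDescription("offer");
offerer.setRemoteDescription("answer");

// Answerer: remote offer, provisional answer, then final answer.
const answerer = new DescriptionStateModel();
answerer.setRemoteDescription("offer");
answerer.setLocalDescription("pranswer");
answerer.setLocalDescription("answer");

// Invalid combination: an answer before any offer is rejected.
let rejected = false;
try {
  new DescriptionStateModel().setRemoteDescription("answer");
} catch (e) {
  rejected = true;
}
console.log(offerer.state, answerer.state, rejected);
```

Because the type is supplied by the caller rather than inferred, the invalid case fails immediately with a specific error instead of being silently misinterpreted.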
358 Specifying the type explicitly also allows for an answer to be treated as provisional by the 359 application. Provisional answers provide a way for an answerer to 360 communicate session parameters back to the offerer, in order for the 361 session to begin, while allowing a final answer to be specified 362 later. This concept of a final answer is important to the 363 offer/answer model; when such an answer is received, any extra 364 resources allocated by the caller can be released, now that the exact 365 session configuration is known. These "resources" can include things 366 like extra ICE components, TURN candidates, or video decoders. 367 Provisional answers, on the other hand, do no such deallocation; as a 368 result, multiple dissimilar provisional answers can be received and 369 applied during call setup. 371 As in [RFC3264], an offerer can send an offer, and update it as long 372 as it has not been answered. The answerer can send back zero or more 373 provisional answers, and finally end the offer-answer exchange by 374 sending a final answer. The state machine for this is as follows:
376             +-----------+
377             |           |
378             |           |
379             |  Stable   |<---------------\
380             |           |                |
381             |           |                |
382             +-----------+                |
383               ^   |                      |
384               |   | OFFER                |
385        ANSWER |   |                      | ANSWER
386               |   V                      |
387             +-----------+          +-----------+
388             |           |          |           |
389             |           | PRANSWER |           |
390             |  Offer    |--------->| Pranswer  |
391             |           |          |           |
392             |           |----\     |           |----\
393             +-----------+    |     +-----------+    |
394                        ^     |                ^     |
395                        |     |                |     |
396                        \-----/                \-----/
397                         OFFER                PRANSWER
399 Figure 2: JSEP State Machine
401 Aside from these state transitions, there is no other difference 402 between the handling of provisional ("pranswer") and final ("answer") 403 answers. 405 4.3. Session Description Format 406 In the current WebRTC specification, session descriptions are 407 formatted as SDP messages. While this format is not optimal for 408 manipulation from Javascript, it is widely accepted, and frequently 409 updated with new features. 
Any alternate encoding of session 410 descriptions would have to keep pace with the changes to SDP, at 411 least until the time that this new encoding eclipsed SDP in 412 popularity. As a result, JSEP continues to use SDP as the internal 413 representation for its session descriptions. 415 However, to simplify Javascript processing, and provide for future 416 flexibility, the SDP syntax is encapsulated within a 417 SessionDescription object, which can be constructed from SDP, and be 418 serialized out to SDP. If we were able to agree on a JSON format for 419 session descriptions, we could easily enable this object to 420 generate/expect JSON. 422 Other methods may be added to SessionDescription in the future to 423 simplify handling of SessionDescriptions from Javascript. 425 4.4. Separation of Signaling and ICE State Machines 427 JSEP does away with the SDP Agent within the browser, and this 428 functionality is now controlled directly by the application, which 429 uses the setLocalDescription and setRemoteDescription APIs to tell 430 the browser what SDP has been negotiated. The ICE Agent remains in 431 the browser, as it still needs to drive the process of gathering 432 candidates, connectivity checks, and related ICE functionality. 434 When a new ICE candidate is available, the ICE Agent will notify the 435 application via a callback; these candidates will automatically be 436 added to the local session description. When all candidates have been 437 gathered, the callback will also be invoked to signal that the 438 gathering process is complete. 440 4.5. ICE Candidate Trickling 442 Candidate trickling is a technique through which a caller may 443 incrementally provide candidates to the callee after the initial 444 offer has been dispatched. 
This allows the callee to begin acting 445 upon the call and setting up the ICE (and perhaps DTLS) connections 446 immediately, without having to wait for the caller to allocate all 447 possible candidates, resulting in faster call startup in many cases. 449 JSEP supports optional candidate trickling through APIs that 450 provide control and feedback on the ICE candidate gathering process. 451 Applications that support candidate trickling can send the initial 452 offer immediately and send individual candidates when they get the 453 onicecandidate callback with a new candidate; applications that do 454 not support this feature can simply wait for the final onicecandidate 455 callback that indicates gathering is complete, and create and send 456 their offer, with all the candidates, at this time. 458 Upon receipt of trickled candidates, the receiving application can 459 supply them to its ICE Agent by calling an addIceCandidate method. 460 This triggers the ICE Agent to start using this remote candidate for 461 connectivity checks. Applications that do not make use of candidate 462 trickling can ignore addIceCandidate entirely, and use the 463 onicecandidate callback solely to indicate when candidate gathering 464 is complete. 466 4.6. ICE Candidate Format 468 As with session descriptions, we choose to provide an IceCandidate 469 object that provides some abstraction, but can be easily converted 470 to/from SDP a=candidate lines. 472 The IceCandidate object has fields to indicate which m= line it 473 should be associated with, and a method to convert to an SDP 474 representation, e.g.: 476 a=candidate:1 1 UDP 1694498815 66.77.88.99 10000 typ host 478 Currently, a=candidate lines are the only SDP information that is 479 contained within IceCandidate, as they represent the only information 480 needed that is not present in the initial offer (i.e. for trickle 481 candidates). 483 4.7. Interactions With Forking 485 4.7.1. 
Serial Forking 487 Serial forking involves a call being dispatched to multiple remote 488 callees, where each callee can accept the call, but only one active 489 session ever exists at a time; no mixing of received media is 490 performed. 492 JSEP handles serial forking well, allowing the application to easily 493 control the policy for selecting the desired remote endpoint. When an 494 answer arrives from one of the callees, the application can choose to 495 apply it either as a provisional answer, leaving open the possibility 496 of using a different answer in the future, or apply it as a final 497 answer, ending the setup flow. 499 In a "first-one-wins" situation, the first answer will be applied as 500 a final answer, and the application will respond to any subsequent 501 answers with a terminate message. In SIP parlance, this would be ACK + BYE. 503 In a "last-one-wins" situation, all answers would be applied as 504 provisional answers, and any previous call leg will be terminated. At 505 some point, the application will end the setup process, perhaps with 506 a timer; at this point, the application could reapply the existing 507 remote description as a final answer. 509 4.7.2. Parallel Forking 511 Parallel forking involves a call being dispatched to multiple remote 512 callees, where each callee can accept the call, and multiple 513 simultaneous active sessions can be established as a result. If 514 multiple callees send media, this media is mixed and played out at 515 the caller side. 517 JSEP can handle parallel forking by "cloning" the session when needed 518 to create multiple parallel sessions. When the first answer is 519 received, the caller can clone the existing session, and then apply 520 the answer as a final answer to the original session. Upon receiving 521 the next answer, the cloned session is cloned again, and the received 522 answer is applied as a final answer to the first clone. 
This process 523 repeats until the caller decides to end the setup flow, and closes 524 the final cloned session. 526 Cloned sessions inherit the local session description and candidates 527 from their parent, and an empty remote description; only sessions 528 that have not yet applied an answer can be cloned. Each cloned 529 session may discover new peer-reflexive candidates; these candidates 530 will be supplied via the onicecandidate callback to that specific 531 session. Since the clone uses the same local description as its 532 parent, creating a clone will fail if it is not possible to reserve 533 the same resources for the clone as have already been reserved by the 534 parent. 536 As a result of this cloning, the application will end up with N 537 parallel sessions, each with a local and remote description and their 538 own local and remote addresses. The media flow from these sessions 539 can be managed by specifying SDP direction attributes in the 540 descriptions, or the application can choose to play out the media 541 from all sessions mixed together. Of course, if the application wants 542 to only keep a single session, it can simply terminate the sessions 543 that it no longer needs. 545 4.8. Session Rehydration 547 In the event that the local application state is reinitialized, 548 either due to a user reload of the page, or a decision within the 549 application to reload itself (perhaps to update to a new version), it 550 is possible to keep an existing session alive via a process called 551 "rehydration". 553 With rehydration, the current local session description is persisted 554 somewhere outside of the page, perhaps on the application server, or 555 in browser local storage. The page is then reloaded, and a new 556 session object is created in Javascript. 
The saved local session description is 557 now retrieved, but the previous ICE candidates will no longer be 558 valid in this case, so we will need to perform an ICE restart; to do 559 so, we simply generate a new ICE ufrag/pwd combo for the local 560 description. 562 The modified local description is then installed via 563 setLocalDescription, and sent off as an offer to the remote side, who 564 will reply with an answer that can be supplied to 565 setRemoteDescription. ICE processing proceeds as usual, and as soon 566 as connectivity is established, the session will be back up and 567 running again. 569 5. Interface 571 This section details the basic operations that must be present to 572 implement JSEP functionality. The actual API exposed in the W3C API 573 may have somewhat different syntax, but should map easily to these 574 concepts. 576 5.1. Methods 578 5.1.1. createOffer 580 The createOffer method generates a blob of SDP that contains an RFC 581 3264 offer with the supported configurations for the session, 582 including descriptions of the local MediaStreams attached to this 583 PeerConnection, the codec/RTP/RTCP options supported by this 584 implementation, and any candidates that have been gathered by the ICE 585 Agent. A constraints parameter may be supplied to provide additional 586 control over the generated offer, e.g. to get a full set of session 587 capabilities, or to request a new set of ICE credentials. 589 In the initial offer, the generated SDP will contain all desired 590 functionality for the session (certain parts that are supported but 591 not desired by default may be omitted); for each SDP line, the 592 generation of the SDP must follow the appropriate process for 593 generating an offer. 
   In the event createOffer is called after the session is established,
   createOffer will generate an offer that is compatible with the
   current session, incorporating any changes that have been made to the
   session since the last complete offer-answer exchange, such as
   addition or removal of streams. If no changes have been made, the
   offer will be identical to the current local description.

   Session descriptions generated by createOffer must be immediately
   usable by setLocalDescription; if a system has limited resources
   (e.g. a finite number of decoders), createOffer should return an
   offer that reflects the current state of the system, so that
   setLocalDescription will succeed when it attempts to acquire those
   resources. Because this method may need to inspect the system state
   to determine the currently available resources, it may be implemented
   as an async operation.

   Calling this method does not change state; its use is not required.

5.1.2.  createAnswer

   The createAnswer method generates a blob of SDP that contains an RFC
   3264 SDP answer with the supported configuration for the session that
   is compatible with the parameters supplied in |offer|. Like
   createOffer, the returned blob contains descriptions of the local
   MediaStreams attached to this PeerConnection, the codec/RTP/RTCP
   options negotiated for this session, and any candidates that have
   been gathered by the ICE Agent. A constraints parameter may be
   supplied to provide additional control over the generated answer.

   As an answer, the generated SDP will contain a specific configuration
   that specifies how the media plane should be established. For each
   SDP line, the generation of the SDP must follow the appropriate
   process for generating an answer.
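
   As a rough illustration of how an answer narrows an offer to a
   specific configuration, the sketch below intersects an offered codec
   list with the codecs the local side supports. The function name
   intersectCodecs and the codec record shape are assumptions made for
   this example; they are not part of the JSEP API, and a real answer
   also has to settle directions, ICE and SRTP parameters, and every
   other SDP line. Keeping the offerer's ordering is one common choice,
   not a requirement stated by this document.

```javascript
// Hypothetical helper, not part of the JSEP API.
// offered:   codec records listed by the offerer, in the offerer's order.
// supported: codec names the local implementation can handle.
function intersectCodecs(offered, supported) {
  const local = new Set(supported.map((name) => name.toLowerCase()));
  // Keep only codecs both sides know, preserving the offerer's ordering.
  return offered.filter((codec) => local.has(codec.name.toLowerCase()));
}

// Illustrative payload types and names only.
const offered = [
  { payloadType: 111, name: "opus" },
  { payloadType: 0, name: "PCMU" },
  { payloadType: 8, name: "PCMA" },
];

// The local side supports PCMU and opus, but not PCMA.
const answerCodecs = intersectCodecs(offered, ["pcmu", "opus"]);
console.log(answerCodecs.map((codec) => codec.payloadType));
```

   An empty result would mean that no common codec exists, in which case
   an application would typically have to reject the media section.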

   Session descriptions generated by createAnswer must be immediately
   usable by setLocalDescription; like createOffer, the returned
   description should reflect the current state of the system. Because
   this method may need to inspect the system state to determine the
   currently available resources, it may need to be implemented as an
   async operation.

   Calling this method does not change state; its use is not required.

5.1.3.  SessionDescriptionType

   The strings "offer", "pranswer", and "answer" serve as type arguments
   to setLocalDescription and setRemoteDescription. They provide
   information as to how the description parameter should be parsed, and
   how the media state should be changed.

   "offer" indicates that a description should be parsed as an offer;
   said description may include many possible media configurations. A
   description used as an "offer" may be applied anytime the
   PeerConnection is in a stable state, or as an update to a previously
   sent but unanswered "offer".

   "pranswer" indicates that a description should be parsed as an
   answer, but not a final answer, and so should not result in the
   freeing of allocated resources. It may result in the start of media
   transmission, if the answer does not specify an inactive media
   direction. A description used as a "pranswer" may be applied as a
   response to an "offer", or an update to a previously sent "pranswer".

   "answer" indicates that a description should be parsed as an answer,
   the offer-answer exchange should be considered complete, and any
   resources (decoders, candidates) that are no longer needed can be
   released. A description used as an "answer" may be applied as a
   response to an "offer", or an update to a previously sent "pranswer".

   The application can use some discretion on whether an answer should
   be applied as provisional or final.
   For example, in a serial forking scenario, an application may receive
   multiple "final" answers, one from each remote endpoint. The
   application could accept the initial answers as provisional answers,
   and only apply an answer as final when it receives one that meets its
   criteria (e.g. a live user instead of voicemail).

5.1.4.  setLocalDescription

   The setLocalDescription method instructs the PeerConnection to apply
   the supplied SDP blob as its local configuration. The type parameter
   indicates whether the blob should be processed as an offer,
   provisional answer, or final answer; offers and answers are checked
   differently, using the various rules that exist for each SDP line.

   This API changes the local media state; among other things, it sets
   up local resources for receiving and decoding media. In order to
   successfully handle scenarios where the application wants to offer to
   change from one media format to a different, incompatible format, the
   PeerConnection must be able to simultaneously support use of both the
   old and new local descriptions (e.g. support codecs that exist in
   both descriptions) until a final answer is received, at which point
   the PeerConnection can fully adopt the new local description, or roll
   back to the old description if the remote side denied the change.

   If setRemoteDescription was previously called with an offer, and
   setLocalDescription is called with an answer (provisional or final),
   and the media directions are compatible, this will result in the
   starting of media transmission.

5.1.5.  setRemoteDescription

   The setRemoteDescription method instructs the PeerConnection to apply
   the supplied SDP blob as the desired remote configuration. As in
   setLocalDescription, the |type| parameter indicates how the blob
   should be processed.
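
   The type rules from Section 5.1.3 can be read as a small transition
   table. The sketch below is one possible reading, not a normative
   state machine: the state names "stable", "have-offer", and
   "have-pranswer" are hypothetical labels chosen for this example, the
   table does not distinguish local from remote descriptions, and
   treating a repeated "pranswer" as an update of the previous
   provisional answer is an assumption of the sketch.

```javascript
// Hypothetical state names; only the type strings come from this document.
const TRANSITIONS = {
  "stable": { "offer": "have-offer" },
  "have-offer": {
    "offer": "have-offer",        // update to an unanswered offer
    "pranswer": "have-pranswer",  // provisional answer, resources kept
    "answer": "stable",           // final answer, exchange complete
  },
  "have-pranswer": {
    "pranswer": "have-pranswer",  // updated provisional answer
    "answer": "stable",           // final answer replaces the pranswer
  },
};

function hasOwn(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Returns the state after applying a description of the given type,
// or throws if that type is not allowed in the current state.
function nextState(state, type) {
  if (!hasOwn(TRANSITIONS, state) || !hasOwn(TRANSITIONS[state], type)) {
    throw new Error('cannot apply "' + type + '" in state "' + state + '"');
  }
  return TRANSITIONS[state][type];
}

// Serial forking: two provisional answers, then a final one.
const finalState = ["offer", "pranswer", "pranswer", "answer"]
  .reduce((state, type) => nextState(state, type), "stable");
console.log(finalState);
```

   An application could run such a check before calling
   setLocalDescription or setRemoteDescription to catch out-of-order
   signaling messages early.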

   This API changes the local media state; among other things, it sets
   up local resources for sending and encoding media.

   If setLocalDescription was previously called with an offer, and
   setRemoteDescription is called with an answer (provisional or final),
   and the media directions are compatible, this will result in the
   starting of media transmission.

5.1.6.  localDescription

   The localDescription method returns a copy of the current local
   configuration, i.e. what was most recently passed to
   setLocalDescription, plus any local candidates that have been
   generated by the ICE Agent.

   A null object will be returned if the local description has not yet
   been established.

5.1.7.  remoteDescription

   The remoteDescription method returns a copy of the current remote
   configuration, i.e. what was most recently passed to
   setRemoteDescription, plus any remote candidates that have been
   supplied via addIceCandidate.

   A null object will be returned if the remote description has not yet
   been established.

5.1.8.  updateIce

   The updateIce method allows the configuration of the ICE Agent to be
   changed during the session, primarily for changing which types of
   local candidates are provided to the application and used for
   connectivity checks. A callee may initially configure the ICE Agent
   to use only relay candidates, to avoid leaking location information,
   but update this configuration to use all candidates once the call is
   accepted.

   Regardless of the configuration, the gathering process collects all
   available candidates, but excluded candidates will not be surfaced
   via the onicecandidate callback or used for connectivity checks.

   This call may result in a change to the state of the ICE Agent, and
   may result in a change to media state if it results in connectivity
   being established.

5.1.9.  addIceCandidate

   The addIceCandidate method provides a remote candidate to the ICE
   Agent; the candidate is added to the remote description, and
   connectivity checks will be sent to it.

   This call will result in a change to the state of the ICE Agent, and
   may result in a change to media state if it results in connectivity
   being established.

5.2.  Configurable SDP Parameters

   The following is a partial list of SDP parameters that an application
   may want to control, in either local or remote descriptions, using
   this API.

   - remove or reorder codecs (m=)
   - change codec attributes (a=fmtp; ptime)
   - enable/disable BUNDLE (a=group)
   - enable/disable RTCP mux (a=rtcp-mux)
   - remove or reorder SRTP crypto-suites (a=crypto)
   - change SRTP parameters or keys (a=crypto)
   - change send resolution or framerate (TBD)
   - change desired recv resolution or framerate (TBD)
   - change total bandwidth (b=)
   - remove desired AVPF mechanisms (a=rtcp-fb)
   - remove RTP header extensions (a=rtphdr-ext)
   - add/change SSRC grouping (e.g. FID, RTX, etc) (a=ssrc-group)
   - add SSRC attributes (a=ssrc)
   - change ICE ufrag/password (a=ice-ufrag/pwd)
   - change media send/recv state (a=sendonly/recvonly/inactive)

   For example, an application could implement call hold by adding an
   a=inactive attribute to its local description, and then applying and
   signaling that description.

6.  Media Setup Overview

   The example here shows a typical call setup using the JSEP model,
   indicating the functions that are called and the state changes that
   occur. We assume the following architecture in this example, where UA
   is synonymous with "browser", and JS is synonymous with "web
   application":

   OffererUA <-> OffererJS <-> WebServer <-> AnswererJS <-> AnswererUA

6.1.  Initiating the Session

   The initiator creates a PeerConnection, hooks up to its ICE callback,
   and adds the desired MediaStreams (presumably obtained via
   getUserMedia). The ICE gathering process begins to gather candidates
   for a default number of streams, as the exact number will not be
   known until the local description is applied. The PeerConnection is
   in the NEW state.

   OffererJS->OffererUA:  var pc = new PeerConnection(config, null);
   OffererJS->OffererUA:  pc.onicecandidate = onIceCandidate;
   OffererJS->OffererUA:  pc.addStream(stream);

6.1.1.  Generating An Offer

   The initiator then creates a session description to offer to the
   callee. This description includes the codecs and other necessary
   session parameters, as well as information about each of the streams
   that has been added (e.g. SSRC, CNAME, etc.). The created description
   includes all parameters that the offerer's UA supports; if the
   initiator wants to influence the created offer, they can pass in a
   MediaConstraints object to createOffer that allows for customization
   (e.g. if the initiator wants to receive but not send video). The
   initiator can also directly manipulate the created session
   description, perhaps if it wants to change the priority of the
   offered codecs.

   OffererJS->OffererUA:  var offer = pc.createOffer(null);

6.1.2.  Applying the Offer

   The initiator then instructs the PeerConnection to use this offer as
   the local description for this session, i.e. what codecs it will use
   for received media, what SRTP keys it will use for sending media (if
   using SDES), etc. In order that the UA handle the description
   properly, the initiator marks it as an offer when calling
   setLocalDescription; this indicates to the UA that multiple
   capabilities have been offered, but this set may be pared back later,
   when the answer arrives.
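
   An application that wants to change codec priority, as Section 6.1.1
   allows, would do so on the description before applying it here. The
   sketch below shows one way such a manipulation might look when
   working on the serialized SDP string. The helper name
   preferPayloadType and the sample SDP are assumptions made for this
   example; the helper only reorders the format list on the first
   matching m= line and leaves every other line untouched.

```javascript
// Hypothetical helper, not part of the JSEP API: moves one payload type
// to the front of the format list on the first m= line of the given kind.
function preferPayloadType(sdp, kind, payloadType) {
  const lines = sdp.split("\r\n");
  const index = lines.findIndex((line) => line.startsWith("m=" + kind + " "));
  if (index === -1) {
    return sdp; // no such media section; leave the SDP unchanged
  }
  const parts = lines[index].split(" ");
  const header = parts.slice(0, 3); // "m=<kind>", port, transport protocol
  const formats = parts.slice(3);   // payload types, most preferred first
  const pt = String(payloadType);
  if (!formats.includes(pt)) {
    return sdp; // payload type was not offered; leave the SDP unchanged
  }
  const reordered = [pt].concat(formats.filter((f) => f !== pt));
  lines[index] = header.concat(reordered).join(" ");
  return lines.join("\r\n");
}

// Illustrative, heavily abbreviated SDP; not a complete session description.
const sampleSdp = [
  "v=0",
  "m=audio 9 RTP/SAVPF 111 0 8",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:8 PCMA/8000",
  "",
].join("\r\n");

const preferred = preferPayloadType(sampleSdp, "audio", 8);
console.log(preferred.split("\r\n")[1]);
```

   With the API sketched in this document, the modified string would
   then presumably be wrapped in a new SessionDescription before being
   passed to setLocalDescription.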

   Since the local user agent must be prepared to receive media upon
   applying the offer, this operation will cause local decoder resources
   to be allocated, based on the codecs indicated in the offer.

   OffererJS->OffererUA:  pc.setLocalDescription("offer", offer);

6.1.3.  Handling ICE Callbacks

   The initiator starts to receive callbacks on its onicecandidate
   handler. Candidates are provided to the IceCallback as they are
   allocated; when the last allocation completes or times out, this
   callback will be invoked with a null argument.

   OffererUA->OffererJS:  onIceCandidate(candidate);

6.1.4.  Serializing the Offer and Candidates

   At this point, the offerer is ready to send its offer to the callee
   using its preferred signaling protocol. Depending on the protocol, it
   can either send the initial session description first, and then
   "trickle" the ICE candidates as they are given to the application, or
   it can wait for all the ICE candidates to be collected, and then send
   the offer and list of candidates all at once.

6.2.  Receiving the Session

   Through the chosen signaling protocol, the recipient is notified of
   an incoming session request. It creates a PeerConnection, and sets up
   its own ICE callback. The ICE gathering process begins to gather
   candidates for a default number of streams.

   AnswererJS->AnswererUA:  var pc = new PeerConnection(config, null);
   AnswererJS->AnswererUA:  pc.onicecandidate = onIceCandidate;

6.2.1.  Receiving the Offer

   The recipient converts the received offer from its signaling protocol
   into SDP format, and supplies it to its PeerConnection, again marking
   it as an offer. As a remote description, the offer indicates what
   codecs the remote side wants to use for receiving, as well as what
   SRTP keys it will use for sending.
   The setting of the remote description causes callbacks to be issued,
   informing the application of what kinds of streams are present in the
   offer.

   This step will also cause encoder resources to be allocated, based on
   the codecs specified in |offer|.

   AnswererJS->AnswererUA:  pc.setRemoteDescription("offer", offer);
   AnswererUA->AnswererJS:  onAddStream(stream);

6.2.2.  Handling ICE Messages

   If ICE candidates from the remote side were included in the offer,
   the ICE Agent will automatically start trying to use them. Otherwise,
   if ICE candidates are sent separately, they are passed into the
   PeerConnection when they arrive.

   AnswererJS->AnswererUA:  pc.addIceCandidate(candidate);

6.2.3.  Generating the Answer

   Once the recipient has decided to accept the session, it generates an
   answer session description. This process performs the appropriate
   intersection of codecs and other parameters to generate the correct
   answer. As with the offer, MediaConstraints can be provided to
   influence the answer that is generated, and/or the application can
   post-process the answer manually.

   AnswererJS->AnswererUA:  var answer = pc.createAnswer(offer, null);

6.2.4.  Applying the Answer

   The recipient then instructs the PeerConnection to use the answer as
   its local description for this session, i.e. what codecs it will use
   to receive media, etc. It also marks the description as an answer,
   which tells the UA that these parameters are final. This causes the
   PeerConnection to move to the ACTIVE state, and transmission of media
   by the answerer to start (assuming both sides have indicated this in
   their descriptions).

   AnswererJS->AnswererUA:  pc.setLocalDescription("answer", answer);
   AnswererUA->OffererUA:

6.2.5.  Serializing the Answer

   As with the offer, the answer (with or without candidates) is now
   converted to the desired signaling format and sent to the initiator.

6.3.  Completing the Session

6.3.1.  Receiving the Answer

   The initiator converts the answer from the signaling protocol and
   applies it as the remote description, marking it as an answer. This
   causes the PeerConnection to move to the ACTIVE state, and
   transmission of media by the offerer to start (assuming both sides
   have indicated this in their descriptions).

   OffererJS->OffererUA:  pc.setRemoteDescription("answer", answer);
   OffererUA->AnswererUA:

6.4.  Updates to the Session

   Updates to the session are handled with a new offer/answer exchange.
   However, since media will already be flowing at this point, the new
   offerer needs to support both its old session description and the new
   one it has offered, until the change is accepted by the remote side.

   Note also that in an update scenario, the roles may be reversed, i.e.
   the update offerer can be different from the original offerer.

7.  Security Considerations

   TODO

8.  IANA Considerations

   This document requires no actions from IANA.

9.  Acknowledgements

   Harald Alvestrand, Dan Burnett, Neil Stratford, Eric Rescorla, Anant
   Narayanan, and Adam Bergkvist all provided valuable feedback on this
   proposal. Matthew Kaufman provided the observation that keeping state
   out of the browser allows a call to continue even if the page is
   reloaded. Richard Ejzak provided the specifics on session cloning.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

10.2.  Informative References

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, July 2006.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [webrtc-api]
              Bergkvist, Burnett, Jennings, Narayanan, "WebRTC 1.0:
              Real-time Communication Between Browsers", May 2011.
              Available at
              http://dev.w3.org/2012/webrtc/editor/webrtc.html

Appendix A.  JSEP Implementation Examples

A.1.  Example API

   The interface below shows a basic Javascript API that could be used
   to expose the functionality discussed in this document. This API is
   used for the examples in the following parts of this Appendix.

   // actions, for setLocalDescription/setRemoteDescription
   enum SessionDescriptionType { "offer", "pranswer", "answer" }

   // constraints that can be supplied to the ctor or createXXXX
   enum MediaConstraints {
     "offerConfig",    // controls the kind of offer created;
                       // "default" (normal offer)
                       // "caps" (all capabilities)
                       // "new" (brand new description)
                       // "iceRestart" (new ICE creds)

     "iceTransports",  // controls ICE candidates; can be
                       // "none" (no candidates)
                       // "relay" (only relay candidates)
                       // "all" (all available candidates)
   }

   [Constructor (int index, DOMString id, in DOMString candidateLine)]
   interface IceCandidate {
     // the m= line index for this candidate
     readonly attribute int mLineIndex;
     // the mid for the m= line for this candidate
     readonly attribute DOMString mLineId;
     // creates a SDP-ized form of this candidate
     stringifier DOMString ();
   };

   [Constructor (DOMString sdp)]
   interface SessionDescription {
     // adds the specified candidate to the description
     void addCandidate(IceCandidate candidate);
     // serializes the description to SDP
     stringifier DOMString ();
   };

   [Constructor (DOMString configuration,
                 optional MediaConstraints constraints)]
   interface PeerConnection {
     // creates a blob of SDP to be provided as an offer.
     SessionDescription createOffer (
         SessionDescriptionCallback successCb,
         optional ErrorCallback errorCb,
         optional MediaConstraints constraints);
     // creates a blob of SDP to be provided as an answer.
     SessionDescription createAnswer (
         SessionDescription offer,
         SessionDescriptionCallback successCb,
         optional ErrorCallback errorCb,
         optional MediaConstraints constraints);

     // sets the local session description
     void setLocalDescription (
         SessionDescriptionType action,
         SessionDescription desc);
     // sets the remote session description
     void setRemoteDescription (
         SessionDescriptionType action,
         SessionDescription desc);
     // returns the current local session description
     readonly attribute SessionDescription localDescription;
     // returns the current remote session description
     readonly attribute SessionDescription remoteDescription;

     // updates the constraints for ICE processing
     void updateIce (
         optional DOMString configuration,
         optional MediaConstraints constraints);
     // starts using a received remote ICE candidate
     void addIceCandidate (
         IceCandidate candidate);
     // notifies the application of a new local ICE candidate
     attribute Function? onicecandidate;
   };

A.2.  Example API Flows

   Below are several sample flows for the new PeerConnection and library
   APIs, demonstrating when the various APIs are called in different
   situations and with various transport protocols. For clarity and
   simplicity, the createOffer/createAnswer calls are assumed to be
   synchronous in these examples, whereas the actual APIs are async.

A.2.1.  Call using ROAP

   This example demonstrates a ROAP call, without the use of trickle
   candidates.

   // Call is initiated toward Answerer
   OffererJS->OffererUA:    pc = new PeerConnection();
   OffererJS->OffererUA:    pc.addStream(localStream, null);
   OffererUA->OffererJS:    onicecandidate(candidate);
   OffererJS->OffererUA:    offer = pc.createOffer(null);
   OffererJS->OffererUA:    pc.setLocalDescription("offer", offer);
   OffererJS->AnswererJS:   {"type":"OFFER", "sdp":offer }

   // OFFER arrives at Answerer
   AnswererJS->AnswererUA:  pc = new PeerConnection();
   AnswererJS->AnswererUA:  pc.setRemoteDescription("offer", msg.sdp);
   AnswererUA->AnswererJS:  onaddstream(remoteStream);
   AnswererUA->AnswererJS:  onicecandidate(candidate);

   // Answerer accepts call
   AnswererJS->AnswererUA:  pc.addStream(localStream, null);
   AnswererJS->AnswererUA:  answer = pc.createAnswer(msg.sdp, null);
   AnswererJS->AnswererUA:  pc.setLocalDescription("answer", answer);
   AnswererJS->OffererJS:   {"type":"ANSWER","sdp":answer }

   // ANSWER arrives at Offerer
   OffererJS->OffererUA:    pc.setRemoteDescription("answer", answer);
   OffererUA->OffererJS:    onaddstream(remoteStream);

   // ICE Completes (at Answerer)
   AnswererUA->AnswererJS:  onopen();
   AnswererUA->OffererUA:   Media

   // ICE Completes (at Offerer)
   OffererUA->OffererJS:    onopen();
   OffererJS->AnswererJS:   {"type":"OK" }
   OffererUA->AnswererUA:   Media

A.2.2.  Call using XMPP

   This example demonstrates an XMPP call, making use of trickle
   candidates.

   // Call is initiated toward Answerer
   OffererJS->OffererUA:    pc = new PeerConnection();
   OffererJS->OffererUA:    pc.addStream(localStream, null);
   OffererJS->OffererUA:    offer = pc.createOffer(null);
   OffererJS->OffererUA:    pc.setLocalDescription("offer", offer);
   OffererJS:               xmpp = createSessionInitiate(offer);
   OffererJS->AnswererJS:
   OffererJS->OffererUA:    pc.startIce();
   OffererUA->OffererJS:    onicecandidate(cand);
   OffererJS:               createTransportInfo(cand);
   OffererJS->AnswererJS:

   // session-initiate arrives at Answerer
   AnswererJS->AnswererUA:  pc = new PeerConnection();
   AnswererJS:              offer = parseSessionInitiate(xmpp);
   AnswererJS->AnswererUA:  pc.setRemoteDescription("offer", offer);
   AnswererUA->AnswererJS:  onaddstream(remoteStream);

   // transport-infos arrive at Answerer
   AnswererJS->AnswererUA:  candidate = parseTransportInfo(xmpp);
   AnswererJS->AnswererUA:  pc.addIceCandidate(candidate);
   AnswererUA->AnswererJS:  onicecandidate(cand)
   AnswererJS:              createTransportInfo(cand);
   AnswererJS->OffererJS:

   // transport-infos arrive at Offerer
   OffererJS->OffererUA:    candidates = parseTransportInfo(xmpp);
   OffererJS->OffererUA:    pc.addIceCandidate(candidates);

   // Answerer accepts call
   AnswererJS->AnswererUA:  pc.addStream(localStream, null);
   AnswererJS->AnswererUA:  answer = pc.createAnswer(offer, null);
   AnswererJS:              xmpp = createSessionAccept(answer);
   AnswererJS->AnswererUA:  pc.setLocalDescription("answer", answer);
   AnswererJS->OffererJS:

   // session-accept arrives at Offerer
   OffererJS:               answer = parseSessionAccept(xmpp);
   OffererJS->OffererUA:    pc.setRemoteDescription("answer", answer);
   OffererUA->OffererJS:    onaddstream(remoteStream);

   // ICE Completes (at Answerer)
   AnswererUA->AnswererJS:  onopen();
   AnswererUA->OffererUA:   Media

   // ICE Completes (at Offerer)
   OffererUA->OffererJS:    onopen();
   OffererUA->AnswererUA:   Media

A.2.3.  Adding video to a call, using XMPP

   This example demonstrates an XMPP call, where the XMPP content-add
   mechanism is used to add video media to an existing session. For
   simplicity, candidate exchange is not shown.

   Note that the offerer for the change to the session may be different
   than the original call offerer.

   // Offerer adds video stream
   OffererJS->OffererUA:    pc.addStream(videoStream)
   OffererJS->OffererUA:    offer = pc.createOffer(null);
   OffererJS:               xmpp = createContentAdd(offer);
   OffererJS->OffererUA:    pc.setLocalDescription("offer", offer);
   OffererJS->AnswererJS:

   // content-add arrives at Answerer
   AnswererJS:              offer = parseContentAdd(xmpp);
   AnswererJS->AnswererUA:  pc.setRemoteDescription("offer", offer);
   AnswererJS->AnswererUA:  answer = pc.createAnswer(offer, null);
   AnswererJS->AnswererUA:  pc.setLocalDescription("answer", answer);
   AnswererJS:              xmpp = createContentAccept(answer);
   AnswererJS->OffererJS:

   // content-accept arrives at Offerer
   OffererJS:               answer = parseContentAccept(xmpp);
   OffererJS->OffererUA:    pc.setRemoteDescription("answer", answer);

A.2.4.  Simultaneous add of video streams, using XMPP

   This example demonstrates an XMPP call, where new video sources are
   added at the same time to a call that already has video; since adding
   these sources only affects one side of the call, there is no
   conflict. The XMPP description-info mechanism is used to indicate the
   new sources to the remote side.

   // Offerer and "Answerer" add video streams at the same time
   OffererJS->OffererUA:    pc.addStream(offererVideoStream2)
   OffererJS->OffererUA:    offer = pc.createOffer(null);
   OffererJS:               xmpp = createDescriptionInfo(offer);
   OffererJS->OffererUA:    pc.setLocalDescription("offer", offer);
   OffererJS->AnswererJS:

   AnswererJS->AnswererUA:  pc.addStream(answererVideoStream2)
   AnswererJS->AnswererUA:  offer = pc.createOffer(null);
   AnswererJS:              xmpp = createDescriptionInfo(offer);
   AnswererJS->AnswererUA:  pc.setLocalDescription("offer", offer);
   AnswererJS->OffererJS:

   // description-info arrives at "Answerer", and is acked
   AnswererJS:              offer = parseDescriptionInfo(xmpp);
   AnswererJS->OffererJS:   // ack

   // ack arrives at Offerer; remote offer is used as an answer
   OffererJS->OffererUA:    pc.setRemoteDescription("answer", offer);

   // ack arrives at "Answerer"; remote offer is used as an answer
   AnswererJS->AnswererUA:  pc.setRemoteDescription("answer", offer);

A.2.5.  Call using SIP

   This example demonstrates a simple SIP call (e.g. where the client
   talks to a SIP proxy over WebSockets).

   // Call is initiated toward Answerer
   OffererJS->OffererUA:    pc = new PeerConnection();
   OffererJS->OffererUA:    pc.addStream(localStream, null);
   OffererUA->OffererJS:    onicecandidate(candidate);
   OffererJS->OffererUA:    offer = pc.createOffer(null);
   OffererJS->OffererUA:    pc.setLocalDescription("offer", offer);
   OffererJS:               sip = createInvite(offer);
   OffererJS->AnswererJS:   SIP INVITE w/ SDP

   // INVITE arrives at Answerer
   AnswererJS->AnswererUA:  pc = new PeerConnection();
   AnswererJS:              offer = parseInvite(sip);
   AnswererJS->AnswererUA:  pc.setRemoteDescription("offer", offer);
   AnswererUA->AnswererJS:  onaddstream(remoteStream);
   AnswererUA->AnswererJS:  onicecandidate(candidate);

   // Answerer accepts call
   AnswererJS->AnswererUA:  pc.addStream(localStream, null);
   AnswererJS->AnswererUA:  answer = pc.createAnswer(offer, null);
   AnswererJS:              sip = createResponse(200, answer);
   AnswererJS->AnswererUA:  pc.setLocalDescription("answer", answer);
   AnswererJS->OffererJS:   200 OK w/ SDP

   // 200 OK arrives at Offerer
   OffererJS:               answer = parseResponse(sip);
   OffererJS->OffererUA:    pc.setRemoteDescription("answer", answer);
   OffererUA->OffererJS:    onaddstream(remoteStream);
   OffererJS->AnswererJS:   ACK

   // ICE Completes (at Answerer)
   AnswererUA->AnswererJS:  onopen();
   AnswererUA->OffererUA:   Media

   // ICE Completes (at Offerer)
   OffererUA->OffererJS:    onopen();
   OffererUA->AnswererUA:   Media

A.2.6.  Handling early media (e.g. 1-800-FEDEX), using SIP

   This example demonstrates how early media could be handled; for
   simplicity, only the offerer side of the call is shown.

   // Call is initiated toward Answerer
   OffererJS->OffererUA:    pc = new PeerConnection();
   OffererJS->OffererUA:    pc.addStream(localStream, null);
   OffererUA->OffererJS:    onicecandidate(candidate);
   OffererJS->OffererUA:    offer = pc.createOffer(null);
   OffererJS->OffererUA:    pc.setLocalDescription("offer", offer);
   OffererJS:               sip = createInvite(offer);
   OffererJS->AnswererJS:   SIP INVITE w/ SDP

   // 180 Ringing is received by offerer, w/ SDP
   OffererJS:               answer = parseResponse(sip);
   OffererJS->OffererUA:    pc.setRemoteDescription("pranswer", answer);
   OffererUA->OffererJS:    onaddstream(remoteStream);

   // ICE Completes (at Offerer)
   OffererUA->OffererJS:    onopen();
   OffererUA->AnswererUA:   Media

   // 200 OK arrives at Offerer
   OffererJS:               answer = parseResponse(sip);
   OffererJS->OffererUA:    pc.setRemoteDescription("answer", answer);
   OffererJS->AnswererJS:   ACK

A.3.  Full Example Application

   The following example demonstrates a simple video calling
   application, using both trickle candidates and provisional answers to
   speed up call setup.

   // Usage:
   // Caller calls start(true)
   // Callee calls start(false) to prepare the call/start connecting,
   // and then accept() to start transmitting.

   var signalingChannel = createSignalingChannel();
   var pc = null;
   var localStream = null;
   signalingChannel.onmessage = handleMessage;

   // Set up the call, get access to local media,
   // and establish connectivity.
   function start(isCaller) {
     // Create a PeerConnection and hook up the IceCallback.
     pc = new webkitPeerConnection(null, null);
     pc.onicecandidate = function(evt) {
       sendMessage("candidate", evt.candidate);
     };

     // Get the local stream and show it in the local video element;
     // if we're the caller, ship off an offer once we get the stream.
     navigator.webkitGetUserMedia(
       {"audio": true, "video": true}, function (stream) {
         selfView.src = webkitURL.createObjectURL(stream);
         localStream = stream;
         if (isCaller) {
           pc.addStream(stream);
           pc.createOffer(function(sdp) {
             setLocalAndSendMessage("offer", sdp);
           });
         }
       });

     // When the remote stream arrives, show it in the remote
     // video element.
     pc.onaddstream = function(evt) {
       remoteView.src = webkitURL.createObjectURL(evt.stream);
     };
   }

   // The callee has accepted the call, attach their media
   // and send a final answer.
   function accept() {
     // The addStream could also be done for the pranswer,
     // although that would delay the pranswer
     // (due to the need for user consent)
     pc.addStream(localStream); // assumes we have the stream already
     // Answer the offer that was previously applied as the remote
     // description (the received message is not in scope here).
     pc.createAnswer(pc.remoteDescription, function(sdp) {
       setLocalAndSendMessage("answer", sdp);
     });
   }

   // -- internal methods --

   // Apply SDP locally and send it to the remote side.
   function setLocalAndSendMessage(type, sdp) {
     pc.setLocalDescription(type, sdp);
     sendMessage(type, sdp);
   }

   // Send a signaling message to the remote side.
   function sendMessage(type, obj) {
     signalingChannel.send(
       JSON.stringify({ "type": type, "sdp": obj }));
   }

   // Handle incoming signaling messages.
   function handleMessage(str) {
     var msg = JSON.parse(str);
     switch (msg.type) {
     case "offer":
       // create the PeerConnection
       start(false);
       // feed the received offer into the PeerConnection
       pc.setRemoteDescription(msg.type, msg.sdp);
       // create provisional answer to allow ICE/DTLS to start
       pc.createAnswer(msg.sdp, function(sdp) {
         // setDirection is an application-defined helper (not shown here)
         setDirection(sdp, "recvonly");
         setLocalAndSendMessage("pranswer", sdp);
       });
       break;
     case "pranswer":
     case "answer":
       pc.setRemoteDescription(msg.type, msg.sdp);
       break;
     case "candidate":
       pc.addIceCandidate(msg.sdp);
       break;
     }
   }

Appendix B.  Change log

   01: Added diagrams for architecture and state machine.
       Added sections on forking and rehydration.
       Clarified meaning of "pranswer" and "answer".
       Reworked how ICE restarts and media directions are controlled.
       Added list of parameters that can be changed in a description.
       Updated suggested API and examples to match latest thinking.
       Suggested API and examples have been moved to an appendix.

   00: Migrated from draft-uberti-rtcweb-jsep-02.

Authors' Addresses

   Justin Uberti
   Google
   5 Cambridge Center
   Cambridge, MA 02142
   Email: justin@uberti.name

   Cullen Jennings
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134
   USA

   Email: fluffy@cisco.com