Network Working Group                                          J. Uberti
Internet-Draft                                                    Google
Intended status: Standards Track                             C. Jennings
Expires: March 22, 2014                                            Cisco
                                                      September 18, 2013


               Javascript Session Establishment Protocol
                       draft-ietf-rtcweb-jsep-04

Abstract

   This document describes the mechanisms for allowing a Javascript
   application to control the signaling plane of a multimedia session
   via the interface specified in the W3C RTCPeerConnection API, and
   discusses how this relates to existing signaling protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 22, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  General Design of JSEP
     1.2.  Other Approaches Considered
   2.  Terminology
   3.  Semantics and Syntax
     3.1.  Signaling Model
     3.2.  Session Descriptions and State Machine
     3.3.  Session Description Format
     3.4.  ICE
       3.4.1.  ICE Candidate Trickling
         3.4.1.1.  ICE Candidate Format
     3.5.  Interactions With Forking
       3.5.1.  Sequential Forking
       3.5.2.  Parallel Forking
     3.6.  Session Rehydration
   4.  Interface
     4.1.  Methods
       4.1.1.  createOffer
       4.1.2.  createAnswer
       4.1.3.  SessionDescriptionType
         4.1.3.1.  Use of Provisional Answers
         4.1.3.2.  Rollback
       4.1.4.  setLocalDescription
       4.1.5.  setRemoteDescription
       4.1.6.  localDescription
       4.1.7.  remoteDescription
       4.1.8.  updateIce
       4.1.9.  addIceCandidate
   5.  SDP Interaction Procedures
     5.1.  SDP Requirements Overview
     5.2.  Constructing an Offer
       5.2.1.  Initial Offers
       5.2.2.  Subsequent Offers
       5.2.3.  Constraints Handling
         5.2.3.1.  OfferToReceiveAudio
         5.2.3.2.  OfferToReceiveVideo
         5.2.3.3.  VoiceActivityDetection
         5.2.3.4.  IceRestart
     5.3.  Generating an Answer
       5.3.1.  Initial Answers
       5.3.2.  Subsequent Answers
       5.3.3.  Constraints Handling
     5.4.  Parsing an Offer
     5.5.  Parsing an Answer
     5.6.  Applying a Local Description
     5.7.  Applying a Remote Description
   6.  Configurable SDP Parameters
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  JSEP Implementation Examples
     A.1.  Example API Flows
       A.1.1.  Call using ROAP
       A.1.2.  Call using XMPP
       A.1.3.  Adding video to a call, using XMPP
       A.1.4.  Simultaneous add of video streams, using XMPP
       A.1.5.  Call using SIP
       A.1.6.  Handling early media (e.g. 1-800-GO FEDEX), using SIP
     A.2.  Example Session Descriptions
       A.2.1.  createOffer
       A.2.2.  createAnswer
   Appendix B.  Change log
   Authors' Addresses

1.  Introduction

   This document describes how the W3C WebRTC RTCPeerConnection
   interface [W3C.WD-webrtc-20111027] is used to control the setup,
   management and teardown of a multimedia session.

1.1.  General Design of JSEP

   The thinking behind WebRTC call setup has been to fully specify and
   control the media plane, but to leave the signaling plane up to the
   application as much as possible.  The rationale is that different
   applications may prefer to use different protocols, such as the
   existing SIP or Jingle call signaling protocols, or something custom
   to the particular application, perhaps for a novel use case.  In this
   approach, the key information that needs to be exchanged is the
   multimedia session description, which specifies the transport and
   media configuration information necessary to establish the media
   plane.

   The browser environment also has its own challenges that pose
   problems for an embedded signaling state machine.  One of these is
   that the user may reload the web page at any time.  If the browser is
   fully in charge of the signaling state, this will result in the loss
   of the call when this state is wiped by the reload.  However, if the
   state can be stored at the server, and pushed back down to the new
   page, the call can be resumed with minimal interruption.

   With these considerations in mind, this document describes the
   Javascript Session Establishment Protocol (JSEP) that allows for full
   control of the signaling state machine from Javascript.  This
   mechanism effectively removes the browser almost completely from the
   core signaling flow; the only interface needed is a way for the
   application to pass in the local and remote session descriptions
   negotiated by whatever signaling mechanism is used, and a way to
   interact with the ICE state machine.

   In this document, the use of JSEP is described as if it always occurs
   between two browsers.
   Note, though, that in many cases it will actually be between a
   browser and some kind of server, such as a gateway or MCU.  This
   distinction is invisible to the browser; it just follows the
   instructions it is given via the API.

   JSEP's handling of session descriptions is simple and
   straightforward.  Whenever an offer/answer exchange is needed, the
   initiating side creates an offer by calling the createOffer() API.
   The application optionally modifies that offer, and then uses it to
   set up its local config via the setLocalDescription() API.  The offer
   is then sent off to the remote side over the application's preferred
   signaling mechanism (e.g., WebSockets); upon receipt of that offer,
   the remote party installs it using the setRemoteDescription() API.

   When the call is accepted, the callee uses the createAnswer() API to
   generate an appropriate answer, applies it using
   setLocalDescription(), and sends the answer back to the initiator
   over the signaling channel.  When the offerer gets that answer, it
   installs it using setRemoteDescription(), and initial setup is
   complete.  This process can be repeated for additional offer/answer
   exchanges.

   Regarding ICE [RFC5245], JSEP decouples the ICE state machine from
   the overall signaling state machine, as the ICE state machine must
   remain in the browser, because only the browser has the necessary
   knowledge of candidates and other transport info.  Performing this
   separation also provides additional flexibility; in protocols that
   decouple session descriptions from transport, such as Jingle, the
   transport information can be sent separately; in protocols that
   don't, such as SIP, the information can be used in the aggregated
   form.  Sending transport information separately can allow for faster
   ICE and DTLS startup, since the necessary roundtrips can occur while
   waiting for the remote side to accept the session.

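   As an illustration only, the offer/answer sequence described above
   might be driven from application code as sketched below.  The sketch
   assumes a promise-returning form of createOffer(), createAnswer(),
   setLocalDescription() and setRemoteDescription(); this document does
   not fix the exact syntax.  The function names startCall, handleOffer
   and handleAnswer, and the sendToRemote callback standing in for the
   application's signaling channel, are hypothetical.

```javascript
// Illustrative sketch only. `pc` is any object exposing promise-returning
// createOffer/createAnswer/setLocalDescription/setRemoteDescription
// methods (an assumption about the API shape). `sendToRemote` is a
// hypothetical stand-in for the application's signaling channel.

// Offerer: create an offer, install it locally, then signal it.
async function startCall(pc, sendToRemote) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  await sendToRemote(offer);
}

// Answerer: install the received offer, create an answer, install it
// locally, then signal it back to the offerer.
async function handleOffer(pc, offer, sendToRemote) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  await sendToRemote(answer);
}

// Offerer: install the received answer; initial setup is then complete.
async function handleAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}
```

   Each side installs its own description with setLocalDescription()
   before signaling it, which mirrors the ordering given in the text.
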
   Through its abstraction of signaling, the JSEP approach does require
   the application to be aware of the signaling process.  While the
   application does not need to understand the contents of session
   descriptions to set up a call, the application must call the right
   APIs at the right times, convert the session descriptions and ICE
   information into the defined messages of its chosen signaling
   protocol, and perform the reverse conversion on the messages it
   receives from the other side.

   One way to mitigate this is to provide a Javascript library that
   hides this complexity from the developer; said library would
   implement a given signaling protocol along with its state machine and
   serialization code, presenting a higher level call-oriented interface
   to the application developer.  For example, this library could easily
   adapt the JSEP API into the API that was proposed for the ROAP
   signaling protocol [I-D.jennings-rtcweb-signaling], which would
   perform a ROAP call setup under the covers, interacting with the
   application only when it needs a signaling message to be sent.  In
   the same fashion, one could also implement other popular signaling
   protocols, including SIP or Jingle.  This allows JSEP to provide
   greater control for the experienced developer without forcing any
   additional complexity on the novice developer.

1.2.  Other Approaches Considered

   One approach that was considered instead of JSEP was to include a
   lightweight signaling protocol.  Instead of providing session
   descriptions to the API, the API would produce and consume messages
   from this protocol.  While providing a higher-level API, this put
   more control of signaling within the browser, forcing the browser to
   understand and handle concepts like signaling glare.  In addition, it
   prevented the application from driving the state machine to a desired
   state, as is needed in the page reload case.

   A second approach that was considered but not chosen was to decouple
   the management of the media control objects from session
   descriptions, instead offering APIs that would control each component
   directly.  This was rejected based on a feeling that requiring
   exposure of this level of complexity to the application programmer
   would not be beneficial; it would result in an API where even a
   simple example would require a significant amount of code to
   orchestrate all the needed interactions, as well as creating a large
   API surface that needed to be agreed upon and documented.  In
   addition, these API points could be called in any order, resulting in
   a more complex set of interactions with the media subsystem than the
   JSEP approach, which specifies how session descriptions are to be
   evaluated and applied.

   One variation on JSEP that was considered was to keep the basic
   session description-oriented API, but to move the mechanism for
   generating offers and answers out of the browser.  Instead of
   providing createOffer/createAnswer methods within the browser, this
   approach would instead expose a getCapabilities API which would
   provide the application with the information it needed in order to
   generate its own session descriptions.  This increases the amount of
   work that the application needs to do; it needs to know how to
   generate session descriptions from capabilities, and especially how
   to generate the correct answer from an arbitrary offer and the
   supported capabilities.  While this could certainly be addressed by
   using a library like the one mentioned above, it basically forces the
   use of said library even for a simple example.
   Providing createOffer/createAnswer avoids this problem, but still
   allows applications to generate their own offers/answers (to a large
   extent) if they choose, using the description generated by
   createOffer as an indication of the browser's capabilities.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Semantics and Syntax

3.1.  Signaling Model

   JSEP does not specify a particular signaling model or state machine,
   other than the generic need to exchange SDP media descriptions in the
   fashion described by [RFC3264] (offer/answer) in order for both sides
   of the session to know how to conduct the session.  JSEP provides
   mechanisms to create offers and answers, as well as to apply them to
   a session.  However, the browser is totally decoupled from the actual
   mechanism by which these offers and answers are communicated to the
   remote side, including addressing, retransmission, forking, and glare
   handling.  These issues are left entirely up to the application; the
   application has complete control over which offers and answers get
   handed to the browser, and when.

       +-----------+                               +-----------+
       |  Web App  |<--- App-Specific Signaling -->|  Web App  |
       +-----------+                               +-----------+
             ^                                           ^
             |  SDP                                      |  SDP
             V                                           V
       +-----------+                                +-----------+
       |  Browser  |<----------- Media ------------>|  Browser  |
       +-----------+                                +-----------+

                     Figure 1: JSEP Signaling Model

3.2.  Session Descriptions and State Machine

   In order to establish the media plane, the user agent needs specific
   parameters to indicate what to transmit to the remote side, as well
   as how to handle the media that is received.
   These parameters are determined by the exchange of session
   descriptions in offers and answers, and there are certain details to
   this process that must be handled in the JSEP APIs.

   Whether a session description applies to the local side or the remote
   side affects the meaning of that description.  For example, the list
   of codecs sent to a remote party indicates what the local side is
   willing to receive, which, when intersected with the set of codecs
   the remote side supports, specifies what the remote side should send.
   However, not all parameters follow this rule; for example, the SRTP
   parameters [RFC4568] sent to a remote party indicate what the local
   side will use to encrypt, and thereby what the remote party should
   expect to receive; the remote party will have to accept these
   parameters, with no option to choose a different value.

   In addition, various RFCs put different conditions on the format of
   offers versus answers.  For example, an offer may propose multiple
   SRTP configurations, but an answer may only contain a single SRTP
   configuration.

   Lastly, while the exact media parameters are known only after an
   offer and an answer have been exchanged, it is possible for the
   offerer to receive media after they have sent an offer and before
   they have received an answer.  To properly process incoming media in
   this case, the offerer's media handler must be aware of the details
   of the offer before the answer arrives.

   Therefore, in order to handle session descriptions properly, the user
   agent needs:

   1.  To know if a session description pertains to the local or remote
       side.

   2.  To know if a session description is an offer or an answer.

   3.  To allow the offer to be specified independently of the answer.

   JSEP addresses this by adding both a setLocalDescription and a
   setRemoteDescription method and having session description objects
   contain a type field indicating the type of session description being
   supplied.  This satisfies the requirements listed above for both the
   offerer, who first calls setLocalDescription(sdp [offer]) and then
   later setRemoteDescription(sdp [answer]), as well as for the
   answerer, who first calls setRemoteDescription(sdp [offer]) and then
   later setLocalDescription(sdp [answer]).

   JSEP also allows for an answer to be treated as provisional by the
   application.  Provisional answers provide a way for an answerer to
   communicate initial session parameters back to the offerer, in order
   to allow the session to begin, while allowing a final answer to be
   specified later.  This concept of a final answer is important to the
   offer/answer model; when such an answer is received, any extra
   resources allocated by the caller can be released, now that the exact
   session configuration is known.  These "resources" can include things
   like extra ICE components, TURN candidates, or video decoders.
   Provisional answers, on the other hand, trigger no such deallocation;
   as a result, multiple dissimilar provisional answers can be received
   and applied during call setup.

   In [RFC3264], the constraint at the signaling level is that only one
   offer can be outstanding for a given session, but from the media
   stack level, a new offer can be generated at any point.  For example,
   when using SIP for signaling, if one offer is sent, then cancelled
   using a SIP CANCEL, another offer can be generated even though no
   answer was received for the first offer.  To support this, the JSEP
   media layer can provide an offer whenever the Javascript application
   needs one for the signaling.
   The answerer can send back zero or more provisional answers, and
   finally end the offer-answer exchange by sending a final answer.  The
   state machine for this is as follows:

                       setRemote(OFFER)               setLocal(PRANSWER)
                           /-----\                               /-----\
                           |     |                               |     |
                           v     |                               v     |
            +---------------+    |                +---------------+    |
            |               |----/                |               |----/
            |               | setLocal(PRANSWER)  |               |
            |  Remote-Offer |------------------- >| Local-Pranswer|
            |               |                     |               |
            |               |                     |               |
            +---------------+                     +---------------+
                 ^   |                                   |
                 |   | setLocal(ANSWER)                  |
   setRemote(OFFER)  |                                   |
                 |   V                  setLocal(ANSWER) |
            +---------------+                            |
            |               |                            |
            |               |                            |
            |    Stable     |<---------------------------+
            |               |                            |
            |               |                            |
            +---------------+          setRemote(ANSWER) |
                 ^   |                                   |
                 |   | setLocal(OFFER)                   |
   setRemote(ANSWER) |                                   |
                 |   V                                   |
            +---------------+                     +---------------+
            |               |                     |               |
            |               | setRemote(PRANSWER) |               |
            |  Local-Offer  |------------------- >|Remote-Pranswer|
            |               |                     |               |
            |               |----\                |               |----\
            +---------------+    |                +---------------+    |
                           ^     |                               ^     |
                           |     |                               |     |
                           \-----/                               \-----/
                       setLocal(OFFER)               setRemote(PRANSWER)

                      Figure 2: JSEP State Machine

   Aside from these state transitions, there is no other difference
   between the handling of provisional ("pranswer") and final ("answer")
   answers.

3.3.  Session Description Format

   In the WebRTC specification, session descriptions are formatted as
   SDP messages.  While this format is not optimal for manipulation from
   Javascript, it is widely accepted, and frequently updated with new
   features.  Any alternate encoding of session descriptions would have
   to keep pace with the changes to SDP, at least until the time that
   this new encoding eclipsed SDP in popularity.  As a result, JSEP
   currently uses SDP as the internal representation for its session
   descriptions.

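   To give a sense of what line-oriented SDP manipulation looks like
   from Javascript, the following sketch splits an SDP blob into its
   session-level lines and its m= sections.  The function name splitSdp
   is hypothetical and is not part of any API described in this
   document.

```javascript
// Illustrative sketch only: split an SDP blob into session-level lines
// and one array of lines per m= section. `splitSdp` is a hypothetical
// helper name, not part of any API described here.
function splitSdp(sdp) {
  // SDP lines end in CRLF; a bare LF is tolerated here for convenience.
  const lines = sdp.split(/\r?\n/).filter((line) => line.length > 0);
  const session = [];
  const media = [];
  for (const line of lines) {
    if (line.startsWith("m=")) {
      // Each m= line opens a new media section.
      media.push([line]);
    } else if (media.length > 0) {
      // Lines after an m= line belong to the most recent media section.
      media[media.length - 1].push(line);
    } else {
      // Everything before the first m= line is session-level.
      session.push(line);
    }
  }
  return { session, media };
}
```

   Even this small task needs explicit knowledge of SDP's line
   structure, which is the kind of burden an object wrapper is meant to
   reduce.
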
   However, to simplify Javascript processing, and provide for future
   flexibility, the SDP syntax is encapsulated within a
   SessionDescription object, which can be constructed from SDP, and be
   serialized out to SDP.  If future specifications agree on a JSON
   format for session descriptions, we could easily enable this object
   to generate and consume that JSON.

   Other methods may be added to SessionDescription in the future to
   simplify handling of SessionDescriptions from Javascript.  In the
   meantime, Javascript libraries can be used to perform these
   manipulations.

   Note that most applications should be able to treat the
   SessionDescriptions produced and consumed by these various API calls
   as opaque blobs; that is, the application will not need to read or
   change them.  The W3C API will provide appropriate APIs to allow the
   application to control various session parameters, which will provide
   the necessary information to the browser about what sort of
   SessionDescription to produce.

3.4.  ICE

   When a new ICE candidate is available, the ICE Agent will notify the
   application via a callback; these candidates will automatically be
   added to the local session description.  When all candidates have
   been gathered, the callback will also be invoked to signal that the
   gathering process is complete.

3.4.1.  ICE Candidate Trickling

   Candidate trickling is a technique through which a caller may
   incrementally provide candidates to the callee after the initial
   offer has been dispatched; the semantics of "Trickle ICE" are defined
   in [I-D.ivov-mmusic-trickle-ice].  This process allows the callee to
   begin acting upon the call and setting up the ICE (and perhaps DTLS)
   connections immediately, without having to wait for the caller to
   gather all possible candidates.  This results in faster call startup
   in cases where gathering is not performed prior to initiating the
   call.

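   The callback behaviour described in Section 3.4 can be sketched as
   follows.  The sketch assumes the callback is exposed as an
   onicecandidate property whose event carries a candidate field that is
   empty once gathering is complete, as in the W3C RTCPeerConnection
   API; forwardCandidates, sendCandidate and onGatheringComplete are
   hypothetical names.

```javascript
// Illustrative sketch only. `pc` is assumed to expose an `onicecandidate`
// callback property; `sendCandidate` and `onGatheringComplete` are
// hypothetical application hooks.
function forwardCandidates(pc, sendCandidate, onGatheringComplete) {
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      // A new local candidate: a trickling application signals it at once.
      sendCandidate(event.candidate);
    } else {
      // No candidate in the event: the gathering process is complete.
      onGatheringComplete();
    }
  };
}
```
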
   JSEP supports optional candidate trickling by providing APIs that
   provide control and feedback on the ICE candidate gathering process.
   Applications that support candidate trickling can send the initial
   offer immediately and send individual candidates when they are
   notified of a new candidate; applications that do not support this
   feature can simply wait for the indication that gathering is
   complete, and then create and send their offer, with all the
   candidates, at this time.

   Upon receipt of trickled candidates, the receiving application will
   supply them to its ICE Agent.  This triggers the ICE Agent to start
   using the new remote candidates for connectivity checks.

3.4.1.1.  ICE Candidate Format

   As with session descriptions, the syntax of the IceCandidate object
   provides some abstraction, but can be easily converted to and from
   the SDP candidate lines.

   The candidate lines are the only SDP information that is contained
   within IceCandidate, as they represent the only information needed
   that is not present in the initial offer (i.e. for trickle
   candidates).  This information is carried with the same syntax as the
   "candidate-attribute" field defined for ICE.  For example:

      candidate:1 1 UDP 1694498815 192.0.2.33 10000 typ host

   The IceCandidate object also contains fields to indicate which m=
   line it should be associated with.  The m= line can be identified in
   one of two ways: either by an m-line index or by a MID.  The m-line
   index is a zero-based index, referring to the Nth m-line in the SDP.
   The MID uses the "media stream identification", as defined in
   [RFC5888], to identify the m-line.  WebRTC implementations creating
   an ICE Candidate object MUST populate both of these fields.
   Implementations receiving an ICE Candidate object SHOULD use the MID
   if they implement that functionality, or the m-line index, if not.

3.5.  Interactions With Forking

   Some call signaling systems allow various types of forking where an
   SDP Offer may be provided to more than one device.  For example, SIP
   [RFC3261] defines both a "Parallel Search" and "Sequential Search".
   Although these are primarily signaling level issues that are outside
   the scope of JSEP, they do have some relevant impact on the
   configuration of the media plane.  When forking happens at the
   signaling layer, the Javascript application responsible for the
   signaling needs to make the decisions about what media should be sent
   or received at any point of time, as well as which remote endpoint it
   should communicate with; JSEP is used to make sure the media engine
   can make the RTP and media perform as required by the application.
   The basic operations that the applications can have the media engine
   do are:

      Start exchanging media with a given remote peer, but keep all the
      resources reserved in the offer.

      Start exchanging media with a given remote peer, and free any
      resources in the offer that are not being used.

3.5.1.  Sequential Forking

   Sequential forking involves a call being dispatched to multiple
   remote callees, where each callee can accept the call, but only one
   active session ever exists at a time; no mixing of received media is
   performed.

   JSEP handles sequential forking well, allowing the application to
   easily control the policy for selecting the desired remote endpoint.
   When an answer arrives from one of the callees, the application can
   choose to apply it either as a provisional answer, leaving open the
   possibility of using a different answer in the future, or apply it as
   a final answer, ending the setup flow.

   In a "first-one-wins" situation, the first answer will be applied as
   a final answer, and the application will reject any subsequent
   answers.  In SIP parlance, this would be ACK + BYE.

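   The first-one-wins policy can be modelled on top of the transitions
   shown in Figure 2.  The sketch below is illustrative only: the table
   is transcribed from the figure, while nextState, firstOneWins and the
   callee labels are hypothetical names, and no real media engine is
   involved.

```javascript
// Illustrative sketch only. Transition table transcribed from Figure 2:
// current state -> "method(TYPE)" -> next state.
const TRANSITIONS = {
  "Stable": {
    "setLocal(OFFER)": "Local-Offer",
    "setRemote(OFFER)": "Remote-Offer",
  },
  "Local-Offer": {
    "setLocal(OFFER)": "Local-Offer",
    "setRemote(PRANSWER)": "Remote-Pranswer",
    "setRemote(ANSWER)": "Stable",
  },
  "Remote-Pranswer": {
    "setRemote(PRANSWER)": "Remote-Pranswer",
    "setRemote(ANSWER)": "Stable",
  },
  "Remote-Offer": {
    "setRemote(OFFER)": "Remote-Offer",
    "setLocal(PRANSWER)": "Local-Pranswer",
    "setLocal(ANSWER)": "Stable",
  },
  "Local-Pranswer": {
    "setLocal(PRANSWER)": "Local-Pranswer",
    "setLocal(ANSWER)": "Stable",
  },
};

// Return the state reached by `operation`, or throw if Figure 2 has no
// such transition from `state`.
function nextState(state, operation) {
  const row = TRANSITIONS[state];
  const next = row ? row[operation] : undefined;
  if (!next) {
    throw new Error(operation + " is not valid in state " + state);
  }
  return next;
}

// First-one-wins (hypothetical helper): the first answer to arrive is
// applied as a final answer; every later answer is rejected.
function firstOneWins(calleesInArrivalOrder) {
  let state = nextState("Stable", "setLocal(OFFER)");
  const accepted = [];
  const rejected = [];
  for (const callee of calleesInArrivalOrder) {
    if (state === "Local-Offer") {
      state = nextState(state, "setRemote(ANSWER)");
      accepted.push(callee);
    } else {
      rejected.push(callee);
    }
  }
  return { state, accepted, rejected };
}
```

   Because Stable has no setRemote(ANSWER) transition in Figure 2, a
   late answer cannot be applied by accident in this model; the
   application is left to reject it at the signaling layer.
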
   In a "last-one-wins" situation, all answers would be applied as
   provisional answers, and any previous call leg will be terminated.
   At some point, the application will end the setup process, perhaps
   with a timer; at this point, the application could reapply the
   existing remote description as a final answer.

3.5.2.  Parallel Forking

   Parallel forking involves a call being dispatched to multiple remote
   callees, where each callee can accept the call, and multiple
   simultaneous active signaling sessions can be established as a
   result.  If multiple callees send media at the same time, the
   possibilities for handling this are described in Section 3.1 of
   [RFC3960].  Most SIP devices today only support exchanging media with
   a single device at a time, and do not try to mix multiple early media
   audio sources, as that could result in a confusing situation.  For
   example, consider having a European ringback tone mixed together with
   the North American ringback tone - the resulting sound would not be
   like either tone, and would confuse the user.  If the signaling
   application wishes to only exchange media with one of the remote
   endpoints at a time, then from a media engine point of view, this is
   exactly like the sequential forking case.

   In the parallel forking case where the Javascript application wishes
   to simultaneously exchange media with multiple peers, the flow is
   slightly more complex, but the Javascript application can follow the
   strategy that [RFC3960] describes using UPDATE.  (It is worth noting
   that use cases where this is the desired behavior are very unusual.)
   The UPDATE approach allows the signaling to set up a separate media
   flow for each peer that it wishes to exchange media with.
   In JSEP, the offer used in the UPDATE would be formed by simply
   creating a new PeerConnection and making sure that the same local
   media streams have been added into this new PeerConnection.  Then the
   new PeerConnection object would produce an SDP offer that could be
   used by the signaling to perform the UPDATE strategy discussed in
   [RFC3960].

   As a result of sharing the media streams, the application will end up
   with N parallel PeerConnection sessions, each with a local and remote
   description and their own local and remote addresses.  The media flow
   from these sessions can be managed by specifying SDP direction
   attributes in the descriptions, or the application can choose to play
   out the media from all sessions mixed together.  Of course, if the
   application wants to only keep a single session, it can simply
   terminate the sessions that it no longer needs.

3.6.  Session Rehydration

   In the event that the local application state is reinitialized,
   either due to a user reload of the page, or a decision within the
   application to reload itself (perhaps to update to a new version), it
   is possible to keep an existing session alive, via a process called
   "rehydration".  The explicit goal of rehydration is to carry out this
   session resumption with no interaction with the remote side other
   than normal call signaling messages.

   With rehydration, the current signaling state is persisted somewhere
   outside of the page, perhaps on the application server, or in browser
   local storage.  The page is then reloaded, the saved signaling state
   is retrieved, and a new PeerConnection object is created for the
   session.  The previously obtained MediaStreams are re-acquired, and
   are given the same IDs as in the original session; this ensures the
   IDs in use by the remote side continue to work.
   Next, a new offer is generated by the new PeerConnection; this offer will have new ICE credentials and possibly new DTLS-SRTP certificate fingerprints (since the old ICE and SRTP state has been lost). Finally, this offer is used to re-initiate the session with the existing remote endpoint, who simply sees the new offer as an in-call renegotiation, and replies with an answer that can be supplied to setRemoteDescription. ICE processing proceeds as usual, and as soon as connectivity is established, the session will be back up and running again.

   [OPEN ISSUE: EKR proposed an alternative rehydration approach where the actual internal PeerConnection object in the browser was kept alive for some time after the web page was killed and provided some way for a new page to acquire the old PeerConnection object.]

4.  Interface

   This section details the basic operations that must be present to implement JSEP functionality. The actual API exposed in the W3C API may have somewhat different syntax, but should map easily to these concepts.

4.1.  Methods

4.1.1.  createOffer

   The createOffer method generates a blob of SDP that contains a [RFC3264] offer with the supported configurations for the session, including descriptions of the local MediaStreams attached to this PeerConnection, the codec/RTP/RTCP options supported by this implementation, and any candidates that have been gathered by the ICE Agent. A constraints parameter may be supplied to provide additional control over the generated offer. This constraints parameter should allow for the following manipulations to be performed:

   o  To indicate support for a media type even if no MediaStreamTracks of that type have been added to the session (e.g., an audio call that wants to receive video.)

   o  To trigger an ICE restart, for the purpose of reestablishing connectivity.
   o  For re-offer cases, to request an offer that contains the full set of supported capabilities, as opposed to just the currently negotiated parameters.

   In the initial offer, the generated SDP will contain all desired functionality for the session (certain parts that are supported but not desired by default may be omitted); for each SDP line, the generation of the SDP will follow the process defined for generating an initial offer from the document that specifies the given SDP line. The exact handling of initial offer generation is detailed in Section 5.2.1 below.

   In the event createOffer is called after the session is established, createOffer will generate an offer to modify the current session based on any changes that have been made to the session, e.g. adding or removing MediaStreams, or requesting an ICE restart. For each existing stream, the generation of each SDP line must follow the process defined for generating an updated offer from the document that specifies the given SDP line. For each new stream, the generation of the SDP must follow the process of generating an initial offer, as mentioned above. If no changes have been made, or for SDP lines that are unaffected by the requested changes, the offer will only contain the parameters negotiated by the last offer-answer exchange. The exact handling of subsequent offer generation is detailed in Section 5.2.2 below.

   Session descriptions generated by createOffer must be immediately usable by setLocalDescription; if a system has limited resources (e.g. a finite number of decoders), createOffer should return an offer that reflects the current state of the system, so that setLocalDescription will succeed when it attempts to acquire those resources.
   Because this method may need to inspect the system state to determine the currently available resources, it may be implemented as an async operation.

   Calling this method may do things such as generate new ICE credentials, but does not result in candidate gathering, or cause media to start or stop flowing.

4.1.2.  createAnswer

   The createAnswer method generates a blob of SDP that contains a [RFC3264] SDP answer with the supported configuration for the session that is compatible with the parameters supplied in the offer. Like createOffer, the returned blob contains descriptions of the local MediaStreams attached to this PeerConnection, the codec/RTP/RTCP options negotiated for this session, and any candidates that have been gathered by the ICE Agent. A constraints parameter may be supplied to provide additional control over the generated answer.

   As an answer, the generated SDP will contain a specific configuration that specifies how the media plane should be established; for each SDP line, the generation of the SDP must follow the process defined for generating an answer from the document that specifies the given SDP line. The exact handling of answer generation is detailed in Section 5.3 below.

   Session descriptions generated by createAnswer must be immediately usable by setLocalDescription; like createOffer, the returned description should reflect the current state of the system. Because this method may need to inspect the system state to determine the currently available resources, it may need to be implemented as an async operation.

   Calling this method may do things such as generate new ICE credentials, but does not trigger candidate gathering or change media state.

4.1.3.  SessionDescriptionType

   Session description objects (RTCSessionDescription) may be of type "offer", "pranswer", and "answer".
   These types provide information as to how the description parameter should be parsed, and how the media state should be changed.

   "offer" indicates that a description should be parsed as an offer; said description may include many possible media configurations. A description used as an "offer" may be applied anytime the PeerConnection is in a stable state, or as an update to a previously supplied but unanswered "offer".

   "pranswer" indicates that a description should be parsed as an answer, but not a final answer, and so should not result in the freeing of allocated resources. It may result in the start of media transmission, if the answer does not specify an inactive media direction. A description used as a "pranswer" may be applied as a response to an "offer", or an update to a previously sent "pranswer".

   "answer" indicates that a description should be parsed as an answer, the offer-answer exchange should be considered complete, and any resources (decoders, candidates) that are no longer needed can be released. A description used as an "answer" may be applied as a response to an "offer", or an update to a previously sent "pranswer".

   The only difference between a provisional and final answer is that the final answer results in the freeing of any unused resources that were allocated as a result of the offer. As such, the application can use some discretion on whether an answer should be applied as provisional or final, and can change the type of the session description as needed. For example, in a serial forking scenario, an application may receive multiple "final" answers, one from each remote endpoint. The application could choose to accept the initial answers as provisional answers, and only apply an answer as final when it receives one that meets its criteria (e.g. a live user instead of voicemail).

4.1.3.1.
Use of Provisional Answers

   Most web applications will not need to create answers using the "pranswer" type. The preferred handling for a web application would be to create and send an "inactive" answer more or less immediately after receiving the offer, instead of waiting for a human user to physically answer the call. Later, when the human input is received, the application can create a new "sendrecv" offer to update the previous offer/answer pair and start the media flow. This approach is preferred because it minimizes the amount of time that the offer-answer exchange is left open, in addition to avoiding media clipping by ensuring the transport is ready to go by the time the call is physically answered. However, some applications may not be able to do this, particularly ones that are attempting to gateway to other signaling protocols. In these cases, "pranswer" can still allow the application to warm up the transport.

   Consider a typical web application that will set up a data channel, an audio channel, and a video channel. When an endpoint receives an offer with these channels, it could send an answer accepting the data channel for two-way data, and accepting the audio and video tracks as inactive or receive-only. It could then ask the user to accept the call, acquire the local media streams, and send a new offer to the remote side moving the audio and video to be two-way media. By the time the human has accepted the call and sent the new offer, it is likely that the ICE and DTLS handshaking for all the channels will already be set up.

4.1.3.2.  Rollback

   In certain situations it may be desirable to "undo" a change made to setLocalDescription or setRemoteDescription. Consider a case where a call is ongoing, and one side wants to change some of the session parameters; that side generates an updated offer and then calls setLocalDescription.
   However, the remote side, either before or after setRemoteDescription, decides it does not want to accept the new parameters, and sends a reject message back to the offerer. Now, the offerer, and possibly the answerer as well, need to return to a stable state and the previous local/remote description. To support this, we introduce the concept of "rollback".

   A rollback returns the state machine to its previous state, and the local or remote description to its previous value. Any resources or candidates that were allocated by the new local description are discarded; any media that is received will be processed according to the previous session description.

   A rollback is performed by supplying a session description of type "rollback" to either setLocalDescription or setRemoteDescription, depending on which needs to be rolled back (i.e. if the new offer was supplied to setLocalDescription, the rollback should be done on setLocalDescription as well.)

4.1.4.  setLocalDescription

   The setLocalDescription method instructs the PeerConnection to apply the supplied SDP blob as its local configuration. The type field indicates whether the blob should be processed as an offer, provisional answer, or final answer; offers and answers are checked differently, using the various rules that exist for each SDP line.

   This API changes the local media state; among other things, it sets up local resources for receiving and decoding media. In order to successfully handle scenarios where the application wants to offer to change from one media format to a different, incompatible format, the PeerConnection must be able to simultaneously support use of both the old and new local descriptions (e.g.
   support codecs that exist in both descriptions) until a final answer is received, at which point the PeerConnection can fully adopt the new local description, or roll back to the old description if the remote side denied the change.

   This API indirectly controls the candidate gathering process. When a local description is supplied, and the number of transports currently in use does not match the number of transports needed by the local description, the PeerConnection will create transports as needed and begin gathering candidates for them.

   If setRemoteDescription was previously called with an offer, and setLocalDescription is called with an answer (provisional or final), and the media directions are compatible, and media are available to send, this will result in the starting of media transmission.

4.1.5.  setRemoteDescription

   The setRemoteDescription method instructs the PeerConnection to apply the supplied SDP blob as the desired remote configuration. As in setLocalDescription, the type field indicates how the blob should be processed.

   This API changes the local media state; among other things, it sets up local resources for sending and encoding media.

   If setLocalDescription was previously called with an offer, and setRemoteDescription is called with an answer (provisional or final), and the media directions are compatible, and media are available to send, this will result in the starting of media transmission.

4.1.6.  localDescription

   The localDescription method returns a copy of the current local configuration, i.e. what was most recently passed to setLocalDescription, plus any local candidates that have been generated by the ICE Agent.

   TODO: Do we need to expose accessors for both the current and proposed local description?
   A null object will be returned if the local description has not yet been established, or if the PeerConnection has been closed.

4.1.7.  remoteDescription

   The remoteDescription method returns a copy of the current remote configuration, i.e. what was most recently passed to setRemoteDescription, plus any remote candidates that have been supplied via addIceCandidate.

   TODO: Do we need to expose accessors for both the current and proposed remote description?

   A null object will be returned if the remote description has not yet been established, or if the PeerConnection has been closed.

4.1.8.  updateIce

   The updateIce method allows the configuration of the ICE Agent to be changed during the session, primarily for changing which types of local candidates are provided to the application and used for connectivity checks. A callee may initially configure the ICE Agent to use only relay candidates, to avoid leaking location information, but update this configuration to use all candidates once the call is accepted.

   Regardless of the configuration, the gathering process collects all available candidates, but excluded candidates will not be surfaced in the onicecandidate callback or used for connectivity checks.

   This call may result in a change to the state of the ICE Agent, and may result in a change to media state if it results in connectivity being established.

4.1.9.  addIceCandidate

   The addIceCandidate method provides a remote candidate to the ICE Agent, which, if parsed successfully, will be added to the remote description according to the rules defined for Trickle ICE. Connectivity checks will be sent to the new candidate.

   This call will result in a change to the state of the ICE Agent, and may result in a change to media state if it results in connectivity being established.

5.
SDP Interaction Procedures

   This section describes the specific procedures to be followed when creating and parsing SDP objects.

5.1.  SDP Requirements Overview

   The key specifications that govern creation and processing of offers and answers are listed below. This list is derived from [I-D.ietf-rtcweb-rtp-usage].

   R-1   [RFC4566] is the base SDP specification and MUST be implemented.

   R-2   The [RFC5888] grouping framework MUST be implemented for signaling grouping information, and MUST be used to identify m= lines via the a=mid attribute.

   R-3   [RFC5124] MUST be supported for signaling the RTP/SAVPF RTP profile.

   R-4   [RFC4585] MUST be implemented to signal RTCP based feedback.

   R-5   [RFC5245] MUST be implemented for signaling the ICE candidate lines corresponding to each media stream.

   R-6   [RFC5761] MUST be implemented to signal multiplexing of RTP and RTCP.

   R-7   The SDP attributes of "sendonly", "recvonly", "inactive", and "sendrecv" from [RFC4566] MUST be implemented to signal information about media direction.

   R-8   [RFC5576] MUST be implemented to signal RTP SSRC values.

   R-9   [RFC5763] MUST be implemented to signal DTLS certificate fingerprints.

   R-10  [RFC5506] MAY be implemented to signal Reduced-Size RTCP messages.

   R-11  [RFC3556] with bandwidth modifiers MAY be supported for specifying RTCP bandwidth as a fraction of the media bandwidth, RTCP fraction allocated to the senders and setting maximum media bit-rate boundaries.

   R-12  [RFC4568] MUST NOT be implemented to signal SDES SRTP keying information.

   R-13  [I-D.ietf-mmusic-msid] MUST be supported, in order to signal associations between RTP objects and W3C MediaStreams and MediaStreamTracks in a standard way.
   R-14  The bundle mechanism in [I-D.ietf-mmusic-sdp-bundle-negotiation] MUST be supported to signal the use of multiplexing of RTP media on a single UDP port, in order to avoid excessive use of port number resources.

   As required by [RFC4566], Section 5.13, JSEP implementations MUST ignore unknown attribute (a=) lines.

   Example SDP for RTCWeb call flows can be found in [I-D.nandakumar-rtcweb-sdp]. [TODO: since we are starting to specify how to handle SDP in this document, should these call flows be merged into this document, or this link moved to the examples section?]

5.2.  Constructing an Offer

   When createOffer is called, a new SDP description must be created that includes the functionality specified in [I-D.ietf-rtcweb-rtp-usage]. The exact details of this process are explained below.

5.2.1.  Initial Offers

   When createOffer is called for the first time, the result is known as the initial offer.

   The first step in generating an initial offer is to generate session-level attributes, as specified in [RFC4566], Section 5. Specifically:

   o  The first SDP line MUST be "v=0", as specified in [RFC4566], Section 5.1.

   o  The second SDP line MUST be an "o=" line, as specified in [RFC4566], Section 5.2. The value of the <username> field SHOULD be "-". The value of the <sess-id> field SHOULD be a cryptographically random number. To ensure uniqueness, this number SHOULD be at least 64 bits long. The value of the <sess-version> field SHOULD be zero. The value of the <nettype> <addrtype> <unicast-address> tuple SHOULD be set to a non-meaningful address, such as IN IP4 0.0.0.0, to prevent leaking the local address in this field. As mentioned in [RFC4566], the entire o= line needs to be unique, but selecting a random number for <sess-id> is sufficient to accomplish this.

   o  The third SDP line MUST be a "s=" line, as specified in [RFC4566], Section 5.3; a single space SHOULD be used as the session name, e.g.
      "s= "

   o  Session Information ("i="), URI ("u="), Email Address ("e="), Phone Number ("p="), Bandwidth ("b="), Repeat Times ("r="), and Time Zones ("z=") lines are not useful in this context and SHOULD NOT be included.

   o  Encryption Keys ("k=") lines do not provide sufficient security and MUST NOT be included.

   o  A "t=" line MUST be added, as specified in [RFC4566], Section 5.9; both <start-time> and <stop-time> SHOULD be set to zero, e.g. "t=0 0".

   The next step is to generate m= sections for each MediaStreamTrack that has been added to the PeerConnection via the addStream method. Note that this method takes a MediaStream, which can contain multiple MediaStreamTracks, and therefore multiple m= sections can be generated even if addStream is only called once.

   Each m= section should be generated as specified in [RFC4566], Section 5.14. The <proto> field MUST be set to "RTP/SAVPF". If a m= section is not being bundled into another m= section, it MUST generate a unique set of ICE credentials and gather its own set of candidates. Otherwise, it MUST use the same ICE credentials and candidates that were used in the m= section that it is being bundled into. For DTLS, all m= sections MUST use the same certificate [OPEN ISSUE: how this is configured] and will therefore have the same fingerprint values.

   Each m= section MUST include the following:

   o  An "a=mid" line, as specified in [RFC5888], Section 4.

   o  An "a=msid" line, as specified in [I-D.ietf-mmusic-msid], Section 2.

   o  [OPEN ISSUE: Use of App Token versus stream-correlator ]

   o  An "a=sendrecv" line, as specified in [RFC3264], Section 5.1.

   o  For each supported codec, "a=rtpmap" and "a=fmtp" lines, as specified in [RFC4566], Section 6. For audio, the codecs specified in [I-D.ietf-rtcweb-audio], Section 3, MUST be supported.
   o  For each primary codec where RTP retransmission should be used, a corresponding "a=rtpmap" line indicating "rtx" with the clock rate of the primary codec and an "a=fmtp" line that references the payload type of the primary codec, as specified in [RFC4588], Section 8.1.

   o  For each supported FEC mechanism, a corresponding "a=rtpmap" line indicating the desired FEC codec.

   o  "a=ice-ufrag" and "a=ice-pwd" lines, as specified in [RFC5245], Section 15.4.

   o  An "a=ice-options" line, with the "trickle" option, as specified in [I-D.ivov-mmusic-trickle-ice], Section 4.

   o  For each candidate that has been gathered during the most recent gathering phase, an "a=candidate" line, as specified in [RFC5245], Section 4.3, paragraph 3.

   o  For the current default candidate, a "c=" line, as specified in [RFC5245], Section 4.3, paragraph 6. [OPEN ISSUE, pending resolution in mmusic: If no candidates have been gathered yet, the default candidate should be set to the null value defined in [I-D.ivov-mmusic-trickle-ice], Section 5.1.]

   o  An "a=fingerprint" line, as specified in [RFC4572], Section 5. Use of the SHA-256 algorithm for the fingerprint is REQUIRED; if the browser also supports stronger hashes, additional "a=fingerprint" lines with these hashes MAY also be added.

   o  An "a=setup" line, as specified in [RFC4145], Section 4, and clarified for use in DTLS-SRTP scenarios in [RFC5763], Section 5. The role value in the offer MUST be "actpass".

   o  An "a=rtcp-mux" line, as specified in [RFC5761], Section 5.1.1.

   o  An "a=rtcp-rsize" line, as specified in [RFC5506], Section 5.

   o  For each supported RTP header extension, an "a=extmap" line, as specified in [RFC5285], Section 5. The list of header extensions that SHOULD/MUST be supported is specified in [I-D.ietf-rtcweb-rtp-usage], Section 5.2.
      Any header extensions that require encryption MUST be specified as indicated in [RFC6904], Section 4.

   o  For each supported RTCP feedback mechanism, an "a=rtcp-fb" line, as specified in [RFC4585], Section 4.2. The list of RTCP feedback mechanisms that SHOULD/MUST be supported is specified in [I-D.ietf-rtcweb-rtp-usage], Section 5.1.

   o  An "a=ssrc" line, as specified in [RFC5576], Section 4.1, indicating the SSRC to be used for sending media.

   o  If RTX is supported for this media type, another "a=ssrc" line with the RTX SSRC, and an "a=ssrc-group" line, as specified in [RFC5576], Section 4.2, with semantics set to "FID" and including the primary and RTX SSRCs.

   o  If FEC is supported for this media type, another "a=ssrc" line with the FEC SSRC, and an "a=ssrc-group" line, as specified in [RFC5576], Section 4.2, with semantics set to "FEC" and including the primary and FEC SSRCs.

   o  [OPEN ISSUE: Handling of a=imageattr]

   o  [TODO: bundle-only]

   Lastly, if a data channel has been created, a m= section MUST be generated for data. The <media> field MUST be set to "application" and the <proto> field MUST be set to "DTLS/SCTP", as specified in [I-D.ietf-mmusic-sctp-sdp], Section 3. The "a=mid", "a=ice-ufrag", "a=ice-pwd", "a=ice-options", "a=candidate", "a=fingerprint", and "a=setup" lines MUST be included as mentioned above. [OPEN ISSUE: additional SCTP-specific stuff to be included, as indicated in [I-D.jesup-rtcweb-data-protocol] (currently none)]

   Once all m= sections have been generated, a session-level "a=group" attribute MUST be added as specified in [RFC5888]. This attribute MUST have semantics "BUNDLE", and identify the m= sections to be bundled. [OPEN ISSUE: Need to determine exactly how this decision is made.]

   Attributes that are common between all m= sections MAY be moved to session-level, if desired.
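
   As a rough illustration of the session-level steps described in this section, the following Python sketch assembles the session-level lines of an initial offer. The function name build_session_level_lines and its bundle_mids parameter are hypothetical and are not part of any API defined in this document; generation of the m= sections themselves is omitted, and skipping the group line when no mids are supplied is an assumption of the sketch.

```python
import secrets


def build_session_level_lines(bundle_mids):
    """Return session-level SDP lines for an initial offer (illustrative only)."""
    # <sess-id>: a cryptographically random number, 64 bits long.
    sess_id = secrets.randbits(64)
    lines = [
        "v=0",
        # <username> is "-", <sess-version> is 0, and the address tuple is
        # a non-meaningful value so the local address is not leaked.
        f"o=- {sess_id} 0 IN IP4 0.0.0.0",
        # A single space is used as the session name.
        "s= ",
        # <start-time> and <stop-time> are both zero.
        "t=0 0",
    ]
    if bundle_mids:
        # Session-level group attribute naming the m= sections to bundle.
        # (Assumption: the line is skipped when there is nothing to bundle.)
        lines.append("a=group:BUNDLE " + " ".join(bundle_mids))
    return lines


if __name__ == "__main__":
    print("\r\n".join(build_session_level_lines(["audio", "video"])))
```

   The "a=mid" values passed in would be the ones placed in the individual m= sections; which sections are actually bundled remains subject to the open issue noted above.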
   Attributes other than the ones specified above MAY be included, except for the following attributes which are specifically incompatible with the requirements of [I-D.ietf-rtcweb-rtp-usage], and MUST NOT be included:

   o  "a=crypto"

   o  "a=key-mgmt"

   o  "a=ice-lite"

   Note that when BUNDLE is used, any additional attributes that are added MUST follow the advice in [I-D.nandakumar-mmusic-sdp-mux-attributes] on how those attributes interact with BUNDLE.

5.2.2.  Subsequent Offers

   When createOffer is called a second (or later) time, the processing is different, depending on the current signaling state.

   If the initial offer was not applied using setLocalDescription, meaning the PeerConnection is still in the "stable" state, the steps for generating an initial offer should be followed, with this exception:

   o  The "o=" line MUST stay the same.

   If the initial offer was applied using setLocalDescription, but an answer from the remote side has not yet been applied, meaning the PeerConnection is still in the "local-offer" state, the steps for generating an initial offer should be followed, with these exceptions:

   o  The "o=" line MUST stay the same, except for the <sess-version> field, which MUST increase by 1 from the previously applied local description.

   o  The "s=" and "t=" lines MUST stay the same.

   o  Each "a=mid" line MUST stay the same.

   o  Each "a=ice-ufrag" and "a=ice-pwd" line MUST stay the same.

   o  For MediaStreamTracks that are still present, the "a=msid", "a=ssrc", and "a=ssrc-group" lines MUST stay the same.

   o  If any MediaStreamTracks have been removed, either through the removeStream method or by removing them from an added MediaStream, their m= sections MUST be marked as recvonly by changing the value of the [RFC3264] directional attribute to "a=recvonly".
      The "a=msid", "a=ssrc", and "a=ssrc-group" lines MUST be removed from the associated m= sections.

   If the initial offer was applied using setLocalDescription, and an answer from the remote side has been applied using setRemoteDescription, meaning the PeerConnection is in the "remote-pranswer" or "stable" states, an offer is generated based on the negotiated session descriptions by following the steps mentioned for the "local-offer" state above, along with these exceptions: [OPEN ISSUE: should this be permitted in the remote-pranswer state?]

   o  If a m= section was rejected, i.e. has had its port set to zero in either the local or remote description, it MUST remain rejected and have a zero port in the new offer, as indicated in [RFC3264], Section 5.1.

   o  If a m= section exists in the current local description, but has its state set to inactive or recvonly, and a new MediaStreamTrack is added, the previously existing m= section MUST be recycled instead of creating a new m= section. [OPEN ISSUE: Nail down exactly what this means. Should the codecs remain the same? (No.) Should ICE restart? (No.) Can the "a=mid" attribute be changed? (Yes?)]

   o  If a m= section exists in the current local description, but does not have an associated MediaStreamTrack (i.e. it is inactive or recvonly), a corresponding m= section MUST be generated in the new offer, but without "a=msid", "a=ssrc", or "a=ssrc-group" attributes, and the appropriate directional attribute must be specified.

   In addition, for each previously existing, non-rejected m= section in the new offer, the following adjustments are made based on the contents of the corresponding m= section in the current remote description:

   o  The m= line and corresponding "a=rtpmap" and "a=fmtp" lines MUST only include codecs present in the remote description.
   o  The RTP header extensions MUST only include those that are present in the remote description.

   o  The RTCP feedback extensions MUST only include those that are present in the remote description.

   o  The "a=rtcp-mux" line MUST only be added if present in the remote description.

   o  The "a=rtcp-rsize" line MUST only be added if present in the remote description.

5.2.3.  Constraints Handling

   The createOffer method takes as a parameter a MediaConstraints object. Special processing is performed when generating an SDP description if the following constraints are present.

5.2.3.1.  OfferToReceiveAudio

   If the "OfferToReceiveAudio" constraint is specified, with a value of "true", the offer MUST include a non-rejected m= section with media type "audio", even if no audio MediaStreamTrack has been added to the PeerConnection. This allows the offerer to receive audio even when not sending it; accordingly, the directional attribute on the audio m= section MUST be set to recvonly. If this constraint is specified when an audio MediaStreamTrack has already been added to the PeerConnection, or a non-rejected m= section with media type "audio" previously existed, it has no effect.

5.2.3.2.  OfferToReceiveVideo

   If the "OfferToReceiveVideo" constraint is specified, with a value of "true", the offer MUST include a non-rejected m= section with media type "video", even if no video MediaStreamTrack has been added to the PeerConnection. This allows the offerer to receive video even when not sending it; accordingly, the directional attribute on the video m= section MUST be set to recvonly. If this constraint is specified when a video MediaStreamTrack has already been added to the PeerConnection, or a non-rejected m= section with media type "video" previously existed, it has no effect.

5.2.3.3.
VoiceActivityDetection

   If the "VoiceActivityDetection" constraint is specified, with a value of "true", the offer MUST indicate support for silence suppression by including comfort noise ("CN") codecs for each supported clock rate, as specified in [RFC3389], Section 5.1. [OPEN ISSUE: should this do anything in signaling, or should it just control built-in DTX modes in audio codecs? Opus has built-in DTX, but G.711 does not.]

5.2.3.4.  IceRestart

   If the "IceRestart" constraint is specified, with a value of "true", the offer MUST indicate an ICE restart by generating new ICE ufrag and pwd attributes, as specified in [RFC5245], Section 9.1.1.1. If this constraint is specified on an initial offer, it has no effect (since a new ICE ufrag and pwd are already generated).

5.3.  Generating an Answer

   When createAnswer is called, a new SDP description must be created that is compatible with the supplied remote description as well as the requirements specified in [I-D.ietf-rtcweb-rtp-usage]. The exact details of this process are explained below.

5.3.1.  Initial Answers

   When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer. If no remote description has been installed, an answer cannot be generated, and an error MUST be returned.

   Note that the remote description SDP may not have been created by a WebRTC endpoint and may not conform to all the requirements listed in Section 5.2. For many cases, this is not a problem. However, if any mandatory SDP attributes are missing, or functionality listed as mandatory-to-use is not present (e.g. ICE, DTLS) [TODO: find reference for this], this MUST be treated as an error. [OPEN ISSUE: Should this cause setRemoteDescription to fail, or should this cause createAnswer to reject those particular m= sections?]
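
   The check for missing mandatory functionality could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the set of attributes treated as mandatory (REQUIRED_ATTRIBUTES) is a guess based on the ICE and DTLS examples given above, the helper names are hypothetical, and the sketch only reports problems; it does not decide between failing setRemoteDescription and rejecting individual m= sections, which remains an open issue. Session-level attributes are counted as present in every m= section.

```python
# Assumption: these attribute names stand in for "ICE and DTLS are present".
REQUIRED_ATTRIBUTES = ("ice-ufrag", "ice-pwd", "fingerprint")


def split_sections(sdp):
    """Split SDP text into session-level lines and a list of m= sections."""
    session, media = [], []
    current = session
    for line in sdp.splitlines():
        if not line:
            continue
        if line.startswith("m="):
            current = [line]
            media.append(current)
        else:
            current.append(line)
    return session, media


def attribute_names(lines):
    """Return the set of attribute names found in "a=" lines."""
    return {line[2:].split(":", 1)[0] for line in lines if line.startswith("a=")}


def missing_mandatory_attributes(sdp):
    """Map m= section index -> list of required attributes that are absent."""
    session, media = split_sections(sdp)
    session_attrs = attribute_names(session)
    problems = {}
    for index, section in enumerate(media):
        port = section[0].split()[1].split("/")[0]
        if port == "0":
            continue  # rejected m= section; nothing to check
        present = session_attrs | attribute_names(section)
        missing = [name for name in REQUIRED_ATTRIBUTES if name not in present]
        if missing:
            problems[index] = missing
    return problems
```

   An empty result means every non-rejected m= section carries the assumed attributes; any other result lists, per m= section, what the application would need to treat as an error.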
   The first step in generating an initial answer is to generate
   session-level attributes. The process here is identical to that
   indicated in the Initial Offers section above, with the addition that

   The next step is to generate m= sections for each m= section that is
   present in the remote offer, as specified in [RFC3264], Section 6.
   For the purposes of this discussion, any session-level attributes in
   the offer that are also valid as media-level attributes SHALL be
   considered to be present in each m= section.

   If any of the offered m= sections have been rejected, by stopping the
   associated remote MediaStreamTrack, the corresponding m= section in
   the answer MUST be marked as rejected by setting the port in the m=
   line to zero, as indicated in [RFC3264], Section 6, and processing
   continues with the next m= section.

   For each non-rejected m= section of a given media type, if there is a
   local MediaStreamTrack of the specified type which has been added to
   the PeerConnection via addStream and not yet associated with an m=
   section, the MediaStreamTrack is associated with the m= section at
   this time. If there are more m= sections of a certain type than
   MediaStreamTracks, some m= sections will not have an associated
   MediaStreamTrack. If there are more MediaStreamTracks of a certain
   type than m= sections, only the first N MediaStreamTracks will be
   able to be associated in the constructed answer. The remainder will
   need to be associated in a subsequent offer.

   Each m= section should then be generated as specified in [RFC3264],
   Section 6.1. The <proto> field MUST be set to "RTP/SAVPF". If the
   offer supports BUNDLE, all m= sections to be BUNDLEd must use the
   same ICE credentials and candidates; all m= sections not being
   BUNDLEd must use unique ICE credentials and candidates. Each m=
   section MUST include the following:

   o  If present in the offer, an "a=mid" line, as specified in
      [RFC5888], Section 9.1. The "mid" value MUST match that specified
      in the offer.

   o  If a local MediaStreamTrack has been associated, an "a=msid" line,
      as specified in [I-D.ietf-mmusic-msid], Section 2.

   o  [OPEN ISSUE: Use of App Token versus stream-correlator ]

   o  If a local MediaStreamTrack has been associated, an "a=sendrecv"
      line, as specified in [RFC3264], Section 6.1. If no local
      MediaStreamTrack has been associated, an "a=recvonly" line.
      [TODO: handle non-sendrecv offered m= sections]

   o  For each supported codec that is present in the offer, "a=rtpmap"
      and "a=fmtp" lines, as specified in [RFC4566], Section 6, and
      [RFC3264], Section 6.1. For audio, the codecs specified in
      [I-D.ietf-rtcweb-audio], Section 3, MUST be supported. Note
      that for simplicity, the answerer MAY use different payload types
      for codecs than the offerer, as it is not prohibited by
      [RFC3264], Section 6.1.

   o  If "rtx" is present in the offer, for each primary codec where RTP
      retransmission should be used, a corresponding "a=rtpmap" line
      indicating "rtx" with the clock rate of the primary codec and an
      "a=fmtp" line that references the payload type of the primary
      codec, as specified in [RFC4588], Section 8.1.

   o  For each supported FEC mechanism that is present in the offer, a
      corresponding "a=rtpmap" line indicating the desired FEC codec.

   o  "a=ice-ufrag" and "a=ice-pwd" lines, as specified in [RFC5245],
      Section 15.4.

   o  If the "trickle" ICE option is present in the offer, an
      "a=ice-options" line, with the "trickle" option, as specified in
      [I-D.ivov-mmusic-trickle-ice], Section 4.

   o  For each candidate that has been gathered during the most recent
      gathering phase, an "a=candidate" line, as specified in [RFC5245],
      Section 4.3, paragraph 3.
   o  For the current default candidate, a "c=" line, as specified in
      [RFC5245], Section 4.3, paragraph 6. [OPEN ISSUE, pending
      resolution in mmusic: If no candidates have been gathered yet,
      the default candidate should be set to the null value defined in
      [I-D.ivov-mmusic-trickle-ice], Section 5.1.]

   o  An "a=fingerprint" line, as specified in [RFC4572], Section 5.
      Use of the SHA-256 algorithm for the fingerprint is REQUIRED; if
      the browser also supports stronger hashes, additional
      "a=fingerprint" lines with these hashes MAY also be added.

   o  An "a=setup" line, as specified in [RFC4145], Section 4, and
      clarified for use in DTLS-SRTP scenarios in [RFC5763], Section 5.
      The role value in the answer MUST be "active" or "passive"; the
      "active" role is RECOMMENDED.

   o  If present in the offer, an "a=rtcp-mux" line, as specified in
      [RFC5761], Section 5.1.1.

   o  If present in the offer, an "a=rtcp-rsize" line, as specified in
      [RFC5506], Section 5.

   o  For each supported RTP header extension that is present in the
      offer, an "a=extmap" line, as specified in [RFC5285], Section 5.
      The list of header extensions that SHOULD/MUST be supported is
      specified in [I-D.ietf-rtcweb-rtp-usage], Section 5.2. Any header
      extensions that require encryption MUST be specified as indicated
      in [RFC6904], Section 4.

   o  For each supported RTCP feedback mechanism that is present in the
      offer, an "a=rtcp-fb" line, as specified in [RFC4585],
      Section 4.2. The list of RTCP feedback mechanisms that SHOULD/
      MUST be supported is specified in [I-D.ietf-rtcweb-rtp-usage],
      Section 5.1.

   o  If a local MediaStreamTrack has been associated, an "a=ssrc" line,
      as specified in [RFC5576], Section 4.1, indicating the SSRC to be
      used for sending media.
   o  If a local MediaStreamTrack has been associated, and RTX has been
      negotiated for this m= section, another "a=ssrc" line with the RTX
      SSRC, and an "a=ssrc-group" line, as specified in [RFC5576],
      Section 4.2, with semantics set to "FID" and including the primary
      and RTX SSRCs.

   o  If a local MediaStreamTrack has been associated, and FEC has been
      negotiated for this m= section, another "a=ssrc" line with the FEC
      SSRC, and an "a=ssrc-group" line, as specified in [RFC5576],
      Section 4.2, with semantics set to "FEC" and including the primary
      and FEC SSRCs.

   o  [OPEN ISSUE: Handling of a=imageattr]

   o  [TODO: bundle-only]

   If a data channel m= section has been offered, an m= section MUST
   also be generated for data. The <media> field MUST be set to
   "application" and the <proto> field MUST be set to "DTLS/SCTP", as
   specified in [I-D.ietf-mmusic-sctp-sdp], Section 3. The "a=mid",
   "a=ice-ufrag", "a=ice-pwd", "a=ice-options", "a=candidate",
   "a=fingerprint", and "a=setup" lines MUST be included as mentioned
   above. [OPEN ISSUE: additional SCTP-specific stuff to be included,
   as indicated in [I-D.jesup-rtcweb-data-protocol] (currently none)]

   [TODO: processing of BUNDLE group]

   Attributes that are common between all m= sections MAY be moved to
   session-level, if desired.

   The attributes prohibited in creation of offers are also prohibited
   in the creation of answers.

5.3.2.  Subsequent Answers

5.3.3.  Constraints Handling

5.4.  Parsing an Offer

5.5.  Parsing an Answer

5.6.  Applying a Local Description

5.7.  Applying a Remote Description

6.  Configurable SDP Parameters

   Note: This section is still very early and is likely to change
   significantly as we get a better understanding of a) the use cases
   for this, b) the implications at the protocol level, and c) feedback
   from implementors on what they can do.
   The following elements of the SDP media description MUST NOT be
   changed between the createOffer and the setLocalDescription, since
   they reflect transport attributes that are solely under browser
   control, and the browser MUST NOT honor an attempt to change them:

   o  The number, type and port number of m-lines.

   o  The generated ICE credentials (a=ice-ufrag and a=ice-pwd).

   o  The set of ICE candidates and their parameters (a=candidate).

   The following modifications, if done by the application to a
   description between createOffer/createAnswer and the
   setLocalDescription, MUST be honored by the browser:

   o  Remove or reorder codecs (m=)

   The following parameters may be controlled by constraints passed into
   createOffer/createAnswer. As an open issue, these changes may also
   be performed by manipulating the SDP returned from createOffer/
   createAnswer, as indicated above, as long as the capabilities of the
   endpoint are not exceeded (e.g. asking for a resolution greater than
   what the endpoint can encode):

   o  disable BUNDLE (a=group)

   o  disable RTCP mux (a=rtcp-mux)

   o  change send resolution or frame rate

   o  change desired recv resolution or frame rate

   o  change maximum total bandwidth (b=) [OPEN ISSUE: need to clarify
      if this is CT or AS - see section 5.8 of [RFC4566]]

   o  remove desired AVPF mechanisms (a=rtcp-fb)

   o  remove RTP header extensions (a=extmap)

   o  change media send/recv state (a=sendonly/recvonly/inactive)

   For example, an application could implement call hold by adding an
   a=inactive attribute to its local description, and then applying and
   signaling that description.

   The application can also modify the SDP to reduce the capabilities in
   the offer it sends to the far side in any way the application sees
   fit, as long as it is a valid SDP offer and specifies a subset of
   what the browser is expecting to do.
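   The call hold example and the list of elements that must not change
   can be sketched in code. The block below is a minimal illustration,
   not an API defined by this document: putOnHold and
   transportUnchanged are hypothetical helper names, the SDP is
   handled as plain text, and only m= sections whose <proto> starts
   with "RTP/" are given a direction attribute (an assumption made so
   that the data channel section is left alone).

```javascript
const DIRECTIONS = ["a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"];

// Hypothetical helper: set every RTP m= section to a=inactive.
function putOnHold(sdp) {
  const out = [];
  let inRtpSection = false; // true while inside an RTP m= section
  let sawDirection = false; // true once that section has a=inactive

  // Append a=inactive to an RTP section that had no direction line.
  const finishSection = () => {
    if (inRtpSection && !sawDirection) out.push("a=inactive");
  };

  for (const line of sdp.split(/\r?\n/)) {
    if (line.length === 0) continue;
    if (line.startsWith("m=")) {
      finishSection();
      const proto = line.split(" ")[2] || "";
      inRtpSection = proto.startsWith("RTP/");
      sawDirection = false;
      out.push(line);
    } else if (inRtpSection && DIRECTIONS.includes(line)) {
      if (!sawDirection) out.push("a=inactive");
      sawDirection = true;
    } else {
      out.push(line);
    }
  }
  finishSection();
  return out.join("\r\n") + "\r\n";
}

// Reduce an SDP blob to the elements listed above as not modifiable:
// the media type and port of each m= line, the ICE credentials, and
// the ICE candidates.
function transportSummary(sdp) {
  return sdp
    .split(/\r?\n/)
    .map((line) => {
      if (line.startsWith("m=")) return line.split(" ").slice(0, 2).join(" ");
      if (/^a=(ice-ufrag|ice-pwd|candidate):/.test(line)) return line;
      return null;
    })
    .filter((entry) => entry !== null);
}

// Hypothetical check an application could run before setLocalDescription.
function transportUnchanged(before, after) {
  const a = transportSummary(before);
  const b = transportSummary(after);
  return a.length === b.length && a.every((entry, i) => entry === b[i]);
}
```

   Under these assumptions, an application would pass the output of
   createOffer through putOnHold, confirm with transportUnchanged that
   the edit left the browser-controlled elements alone, and then apply
   and signal the result.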
   As always, the application is solely responsible for what it sends to
   the other party, and all incoming SDP will be processed by the
   browser to the extent of its capabilities. It is an error to assume
   that all SDP is well-formed; however, one should be able to assume
   that any implementation of this specification will be able to
   process, as a remote offer or answer, unmodified SDP coming from any
   other implementation of this specification.

7.  Security Considerations

   The intent of the WebRTC protocol suite is to provide an environment
   that is securable by default: all media is encrypted, keys are
   exchanged in a secure fashion, and the Javascript API includes
   functions that can be used to verify the identity of communication
   partners.

8.  IANA Considerations

   This document requires no actions from IANA.

9.  Acknowledgements

   Significant text incorporated in the draft, as well as review, was
   provided by Harald Alvestrand and Suhas Nandakumar. Dan Burnett,
   Neil Stratford, Eric Rescorla, Anant Narayanan, Andrew Hutton,
   Richard Ejzak, and Adam Bergkvist all provided valuable feedback on
   this proposal. Matthew Kaufman provided the observation that keeping
   state out of the browser allows a call to continue even if the page
   is reloaded.

10.  References

10.1.  Normative References

   [I-D.ietf-mmusic-msid]
              Alvestrand, H., "Cross Session Stream Identification in
              the Session Description Protocol",
              draft-ietf-mmusic-msid-01 (work in progress), August 2013.

   [I-D.ietf-mmusic-sctp-sdp]
              Loreto, S. and G. Camarillo, "Stream Control Transmission
              Protocol (SCTP)-Based Media Transport in the Session
              Description Protocol (SDP)", draft-ietf-mmusic-sctp-sdp-04
              (work in progress), June 2013.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Multiplexing Negotiation Using Session Description
              Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-04 (work in
              progress), June 2013.

   [I-D.ietf-rtcweb-audio]
              Valin, J. and C. Bran, "WebRTC Audio Codec and Processing
              Requirements", draft-ietf-rtcweb-audio-02 (work in
              progress), August 2013.

   [I-D.ietf-rtcweb-rtp-usage]
              Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time
              Communication (WebRTC): Media Transport and Use of RTP",
              draft-ietf-rtcweb-rtp-usage-09 (work in progress),
              September 2013.

   [I-D.nandakumar-mmusic-sdp-mux-attributes]
              Nandakumar, S., "A Framework for SDP Attributes when
              Multiplexing",
              draft-nandakumar-mmusic-sdp-mux-attributes-03 (work in
              progress), July 2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [RFC4145]  Yon, D. and G. Camarillo, "TCP-Based Media Transport in
              the Session Description Protocol (SDP)", RFC 4145,
              September 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4572]  Lennox, J., "Connection-Oriented Media Transport over the
              Transport Layer Security (TLS) Protocol in the Session
              Description Protocol (SDP)", RFC 4572, July 2006.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
              2006.

   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based Feedback
              (RTP/SAVPF)", RFC 5124, February 2008.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245, April
              2010.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
              Control Packets on a Single Port", RFC 5761, April 2010.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC6904]  Lennox, J., "Encryption of Header Extensions in the Secure
              Real-time Transport Protocol (SRTP)", RFC 6904, April
              2013.

10.2.  Informative References

   [I-D.ivov-mmusic-trickle-ice]
              Ivov, E., Rescorla, E., and J. Uberti, "Trickle ICE:
              Incremental Provisioning of Candidates for the Interactive
              Connectivity Establishment (ICE) Protocol",
              draft-ivov-mmusic-trickle-ice-01 (work in progress), March
              2013.

   [I-D.jennings-rtcweb-signaling]
              Jennings, C., Rosenberg, J., and R. Jesup, "RTCWeb Offer/
              Answer Protocol (ROAP)",
              draft-jennings-rtcweb-signaling-01 (work in progress),
              October 2011.

   [I-D.jesup-rtcweb-data-protocol]
              Jesup, R., Loreto, S., and M. Tuexen, "WebRTC Data Channel
              Protocol", draft-jesup-rtcweb-data-protocol-04 (work in
              progress), February 2013.

   [I-D.nandakumar-rtcweb-sdp]
              Nandakumar, S. and C. Jennings, "SDP for the WebRTC",
              draft-nandakumar-rtcweb-sdp-02 (work in progress), July
              2013.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, September 2002.
   [RFC3556]  Casner, S., "Session Description Protocol (SDP) Bandwidth
              Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC
              3556, July 2003.

   [RFC3960]  Camarillo, G. and H. Schulzrinne, "Early Media and Ringing
              Tone Generation in the Session Initiation Protocol (SIP)",
              RFC 3960, December 2004.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
              Real-Time Transport Control Protocol (RTCP): Opportunities
              and Consequences", RFC 5506, April 2009.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5763]  Fischl, J., Tschofenig, H., and E. Rescorla, "Framework
              for Establishing a Secure Real-time Transport Protocol
              (SRTP) Security Context Using Datagram Transport Layer
              Security (DTLS)", RFC 5763, May 2010.

   [W3C.WD-webrtc-20111027]
              Bergkvist, A., Burnett, D., Narayanan, A., and C.
              Jennings, "WebRTC 1.0: Real-time Communication Between
              Browsers", World Wide Web Consortium WD
              WD-webrtc-20111027, October 2011.

Appendix A.  JSEP Implementation Examples

A.1.  Example API Flows

   Below are several sample flows for the new PeerConnection and library
   APIs, demonstrating when the various APIs are called in different
   situations and with various transport protocols. For clarity and
   simplicity, the createOffer/createAnswer calls are assumed to be
   synchronous in these examples, whereas the actual APIs are async.

A.1.1.  Call using ROAP

   This example demonstrates a ROAP call, without the use of trickle
   candidates.
   // Call is initiated toward Answerer
   OffererJS->OffererUA: pc = new PeerConnection();
   OffererJS->OffererUA: pc.addStream(localStream, null);
   OffererUA->OffererJS: iceCallback(candidate);
   OffererJS->OffererUA: offer = pc.createOffer(null);
   OffererJS->OffererUA: pc.setLocalDescription("offer", offer);
   OffererJS->AnswererJS: {"type":"OFFER", "sdp":offer }

   // OFFER arrives at Answerer
   AnswererJS->AnswererUA: pc = new PeerConnection();
   AnswererJS->AnswererUA: pc.setRemoteDescription("offer", msg.sdp);
   AnswererUA->AnswererJS: onaddstream(remoteStream);
   AnswererUA->AnswererJS: iceCallback(candidate);

   // Answerer accepts call
   AnswererJS->AnswererUA: pc.addStream(localStream, null);
   AnswererJS->AnswererUA: answer = pc.createAnswer(msg.sdp, null);
   AnswererJS->AnswererUA: pc.setLocalDescription("answer", answer);
   AnswererJS->OffererJS: {"type":"ANSWER","sdp":answer }

   // ANSWER arrives at Offerer
   OffererJS->OffererUA: pc.setRemoteDescription("answer", answer);
   OffererUA->OffererJS: onaddstream(remoteStream);

   // ICE Completes (at Answerer)
   AnswererUA->OffererUA: Media

   // ICE Completes (at Offerer)
   OffererJS->AnswererJS: {"type":"OK" }
   OffererUA->AnswererUA: Media

A.1.2.  Call using XMPP

   This example demonstrates an XMPP call, making use of trickle
   candidates.
   // Call is initiated toward Answerer
   OffererJS->OffererUA: pc = new PeerConnection();
   OffererJS->OffererUA: pc.addStream(localStream, null);
   OffererJS->OffererUA: offer = pc.createOffer(null);
   OffererJS->OffererUA: pc.setLocalDescription("offer", offer);
   OffererJS: xmpp = createSessionInitiate(offer);
   OffererJS->AnswererJS:

   OffererJS->OffererUA: pc.startIce();
   OffererUA->OffererJS: onicecandidate(cand);
   OffererJS: createTransportInfo(cand);
   OffererJS->AnswererJS:

   // session-initiate arrives at Answerer
   AnswererJS->AnswererUA: pc = new PeerConnection();
   AnswererJS: offer = parseSessionInitiate(xmpp);
   AnswererJS->AnswererUA: pc.setRemoteDescription("offer", offer);
   AnswererUA->AnswererJS: onaddstream(remoteStream);

   // transport-infos arrive at Answerer
   AnswererJS->AnswererUA: candidate = parseTransportInfo(xmpp);
   AnswererJS->AnswererUA: pc.addIceCandidate(candidate);
   AnswererUA->AnswererJS: onicecandidate(cand)
   AnswererJS: createTransportInfo(cand);
   AnswererJS->OffererJS:

   // transport-infos arrive at Offerer
   OffererJS->OffererUA: candidates = parseTransportInfo(xmpp);
   OffererJS->OffererUA: pc.addIceCandidate(candidates);

   // Answerer accepts call
   AnswererJS->AnswererUA: pc.addStream(localStream, null);
   AnswererJS->AnswererUA: answer = pc.createAnswer(offer, null);
   AnswererJS: xmpp = createSessionAccept(answer);
   AnswererJS->AnswererUA: pc.setLocalDescription("answer", answer);
   AnswererJS->OffererJS:

   // session-accept arrives at Offerer
   OffererJS: answer = parseSessionAccept(xmpp);
   OffererJS->OffererUA: pc.setRemoteDescription("answer", answer);
   OffererUA->OffererJS: onaddstream(remoteStream);

   // ICE Completes (at Answerer)
   AnswererUA->OffererUA: Media

   // ICE Completes (at Offerer)
   OffererUA->AnswererUA: Media

A.1.3.  Adding video to a call, using XMPP

   This example demonstrates an XMPP call, where the XMPP content-add
   mechanism is used to add video media to an existing session. For
   simplicity, candidate exchange is not shown.

   Note that the offerer for the change to the session may be different
   than the original call offerer.

   // Offerer adds video stream
   OffererJS->OffererUA: pc.addStream(videoStream)
   OffererJS->OffererUA: offer = pc.createOffer(null);
   OffererJS: xmpp = createContentAdd(offer);
   OffererJS->OffererUA: pc.setLocalDescription("offer", offer);
   OffererJS->AnswererJS:

   // content-add arrives at Answerer
   AnswererJS: offer = parseContentAdd(xmpp);
   AnswererJS->AnswererUA: pc.setRemoteDescription("offer", offer);
   AnswererJS->AnswererUA: answer = pc.createAnswer(offer, null);
   AnswererJS->AnswererUA: pc.setLocalDescription("answer", answer);
   AnswererJS: xmpp = createContentAccept(answer);
   AnswererJS->OffererJS:

   // content-accept arrives at Offerer
   OffererJS: answer = parseContentAccept(xmpp);
   OffererJS->OffererUA: pc.setRemoteDescription("answer", answer);

A.1.4.  Simultaneous add of video streams, using XMPP

   This example demonstrates an XMPP call, where new video sources are
   added at the same time to a call that already has video; since adding
   these sources only affects one side of the call, there is no
   conflict. The XMPP description-info mechanism is used to indicate
   the new sources to the remote side.
   // Offerer and "Answerer" add video streams at the same time
   OffererJS->OffererUA: pc.addStream(offererVideoStream2)
   OffererJS->OffererUA: offer = pc.createOffer(null);
   OffererJS: xmpp = createDescriptionInfo(offer);
   OffererJS->OffererUA: pc.setLocalDescription("offer", offer);
   OffererJS->AnswererJS:

   AnswererJS->AnswererUA: pc.addStream(answererVideoStream2)
   AnswererJS->AnswererUA: offer = pc.createOffer(null);
   AnswererJS: xmpp = createDescriptionInfo(offer);
   AnswererJS->AnswererUA: pc.setLocalDescription("offer", offer);
   AnswererJS->OffererJS:

   // description-info arrives at "Answerer", and is acked
   AnswererJS: offer = parseDescriptionInfo(xmpp);
   AnswererJS->OffererJS: // ack

   // description-info arrives at Offerer, and is acked
   OffererJS: offer = parseDescriptionInfo(xmpp);
   OffererJS->AnswererJS: // ack

   // ack arrives at Offerer; remote offer is used as an answer
   OffererJS->OffererUA: pc.setRemoteDescription("answer", offer);

   // ack arrives at "Answerer"; remote offer is used as an answer
   AnswererJS->AnswererUA: pc.setRemoteDescription("answer", offer);

A.1.5.  Call using SIP

   This example demonstrates a simple SIP call (e.g. where the client
   talks to a SIP proxy over WebSockets).
   // Call is initiated toward Answerer
   OffererJS->OffererUA: pc = new PeerConnection();
   OffererJS->OffererUA: pc.addStream(localStream, null);
   OffererUA->OffererJS: onicecandidate(candidate);
   OffererJS->OffererUA: offer = pc.createOffer(null);
   OffererJS->OffererUA: pc.setLocalDescription("offer", offer);
   OffererJS: sip = createInvite(offer);
   OffererJS->AnswererJS: SIP INVITE w/ SDP

   // INVITE arrives at Answerer
   AnswererJS->AnswererUA: pc = new PeerConnection();
   AnswererJS: offer = parseInvite(sip);
   AnswererJS->AnswererUA: pc.setRemoteDescription("offer", offer);
   AnswererUA->AnswererJS: onaddstream(remoteStream);
   AnswererUA->AnswererJS: onicecandidate(candidate);

   // Answerer accepts call
   AnswererJS->AnswererUA: pc.addStream(localStream, null);
   AnswererJS->AnswererUA: answer = pc.createAnswer(offer, null);
   AnswererJS: sip = createResponse(200, answer);
   AnswererJS->AnswererUA: pc.setLocalDescription("answer", answer);
   AnswererJS->OffererJS: 200 OK w/ SDP

   // 200 OK arrives at Offerer
   OffererJS: answer = parseResponse(sip);
   OffererJS->OffererUA: pc.setRemoteDescription("answer", answer);
   OffererUA->OffererJS: onaddstream(remoteStream);
   OffererJS->AnswererJS: ACK

   // ICE Completes (at Answerer)
   AnswererUA->OffererUA: Media

   // ICE Completes (at Offerer)
   OffererUA->AnswererUA: Media

A.1.6.  Handling early media (e.g. 1-800-GO FEDEX), using SIP

   This example demonstrates how early media could be handled; for
   simplicity, only the offerer side of the call is shown.
   // Call is initiated toward Answerer
   OffererJS->OffererUA: pc = new PeerConnection();
   OffererJS->OffererUA: pc.addStream(localStream, null);
   OffererUA->OffererJS: onicecandidate(candidate);
   OffererJS->OffererUA: offer = pc.createOffer(null);
   OffererJS->OffererUA: pc.setLocalDescription("offer", offer);
   OffererJS: sip = createInvite(offer);
   OffererJS->AnswererJS: SIP INVITE w/ SDP

   // 180 Ringing is received by offerer, w/ SDP
   OffererJS: answer = parseResponse(sip);
   OffererJS->OffererUA: pc.setRemoteDescription("pranswer", answer);
   OffererUA->OffererJS: onaddstream(remoteStream);

   // ICE Completes (at Offerer)
   OffererUA->AnswererUA: Media

   // 200 OK arrives at Offerer
   OffererJS: answer = parseResponse(sip);
   OffererJS->OffererUA: pc.setRemoteDescription("answer", answer);
   OffererJS->AnswererJS: ACK

A.2.  Example Session Descriptions

A.2.1.  createOffer

   This SDP shows a typical initial offer, created by createOffer for a
   PeerConnection with a single audio MediaStreamTrack, a single video
   MediaStreamTrack, and a single data channel. Host candidates have
   also already been gathered. Note some lines have been broken into
   two lines for formatting reasons.
   v=0
   o=- 4962303333179871722 1 IN IP4 0.0.0.0
   s=-
   t=0 0
   a=group:BUNDLE audio video data
   m=audio 56500 RTP/SAVPF 111 0 8 126
   c=IN IP4 192.0.2.1
   a=rtcp:56501 IN IP4 192.0.2.1
   a=candidate:3348148302 1 udp 2113937151 192.0.2.1 56500
           typ host generation 0
   a=candidate:3348148302 2 udp 2113937151 192.0.2.1 56501
           typ host generation 0
   a=ice-ufrag:ETEn1v9DoTMB9J4r
   a=ice-pwd:OtSK0WpNtpUjkY4+86js7ZQl
   a=ice-options:trickle
   a=mid:audio
   a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
   a=sendrecv
   a=rtcp-mux
   a=rtcp-rsize
   a=fingerprint:sha-256
           19:E2:1C:3B:4B:9F:81:E6:B8:5C:F4:A5:A8:D8:73:04
           :BB:05:2F:70:9F:04:A9:0E:05:E9:26:33:E8:70:88:A2
   a=setup:actpass
   a=rtpmap:111 opus/48000/2
   a=fmtp:111 minptime=10
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:126 telephone-event/8000
   a=maxptime:60
   a=ssrc:1732846380 cname:EocUG1f0fcg/yvY7
   a=msid:47017fee-b6c1-4162-929c-a25110252400
           f83006c5-a0ff-4e0a-9ed9-d3e6747be7d9
   m=video 56502 RTP/SAVPF 100 115 116 117
   c=IN IP4 192.0.2.1
   a=rtcp:56503 IN IP4 192.0.2.1
   a=candidate:3348148302 1 udp 2113937151 192.0.2.1 56502
           typ host generation 0
   a=candidate:3348148302 2 udp 2113937151 192.0.2.1 56503
           typ host generation 0
   a=ice-ufrag:BGKkWnG5GmiUpdIV
   a=ice-pwd:mqyWsAjvtKwTGnvhPztQ9mIf
   a=ice-options:trickle
   a=mid:video
   a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
   a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
   a=sendrecv
   a=rtcp-mux
   a=rtcp-rsize
   a=fingerprint:sha-256
           19:E2:1C:3B:4B:9F:81:E6:B8:5C:F4:A5:A8:D8:73:04
           :BB:05:2F:70:9F:04:A9:0E:05:E9:26:33:E8:70:88:A2
   a=setup:actpass
   a=rtpmap:100 VP8/90000
   a=rtcp-fb:100 ccm fir
   a=rtcp-fb:100 nack
   a=rtcp-fb:100 goog-remb
   a=rtpmap:115 rtx/90000
   a=fmtp:115 apt=100
   a=rtpmap:116 red/90000
   a=rtpmap:117 ulpfec/90000
   a=ssrc:1366781083 cname:EocUG1f0fcg/yvY7
   a=ssrc:1366781084 cname:EocUG1f0fcg/yvY7
   a=ssrc:1366781085 cname:EocUG1f0fcg/yvY7
   a=ssrc-group:FID 1366781083 1366781084
   a=ssrc-group:FEC 1366781083 1366781085
   a=msid:61317484-2ed4-49d7-9eb7-1414322a7aae
           f30bdb4a-5db8-49b5-bcdc-e0c9a23172e0
   m=application 56504 DTLS/SCTP 5000
   c=IN IP4 192.0.2.1
   a=candidate:3348148302 1 udp 2113937151 192.0.2.1 56504
           typ host generation 0
   a=ice-ufrag:VD5v2BnbZm3mgP3d
   a=ice-pwd:+Jlkuox+VVIUDqxcfIDuTZMH
   a=ice-options:trickle
   a=mid:data
   a=fingerprint:sha-256 19:E2:1C:3B:4B:9F:81:E6:B8:5C:F4:A5:A8:D8:73:04
           :BB:05:2F:70:9F:04:A9:0E:05:E9:26:33:E8:70:88:A2
   a=setup:actpass
   a=fmtp:5000 protocol=webrtc-datachannel; streams=10

A.2.2.  createAnswer

   This SDP shows a typical initial answer to the above offer, created
   by createAnswer for a PeerConnection with a single audio
   MediaStreamTrack, a single video MediaStreamTrack, and a single data
   channel. Host candidates have also already been gathered. Note some
   lines have been broken into two lines for formatting reasons.
   v=0
   o=- 6729291447651054566 1 IN IP4 0.0.0.0
   s=-
   t=0 0
   a=group:BUNDLE audio video data
   m=audio 20000 RTP/SAVPF 111 0 8 126
   c=IN IP4 192.0.2.2
   a=candidate:2299743422 1 udp 2113937151 192.0.2.2 20000
           typ host generation 0
   a=ice-ufrag:6sFvz2gdLkEwjZEr
   a=ice-pwd:cOTZKZNVlO9RSGsEGM63JXT2
   a=fingerprint:sha-256 6B:8B:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35
           :DC:B8:5F:64:1A:24:C2:43:F0:A1:58:D0:A1:2C:19:08
   a=setup:active
   a=mid:audio
   a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
   a=sendrecv
   a=rtcp-mux
   a=rtpmap:111 opus/48000/2
   a=fmtp:111 minptime=10
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:126 telephone-event/8000
   a=maxptime:60
   a=ssrc:3429951804 cname:Q/NWs1ao1HmN4Xa5
   a=msid:PI39StLS8W7ZbQl1sJsWUXkr3Zf12fJUvzQ1
           PI39StLS8W7ZbQl1sJsWUXkr3Zf12fJUvzQ1a0
   m=video 20000 RTP/SAVPF 100 115 116 117
   c=IN IP4 192.0.2.2
   a=candidate:2299743422 1 udp 2113937151 192.0.2.2 20000
           typ host generation 0
   a=ice-ufrag:6sFvz2gdLkEwjZEr
   a=ice-pwd:cOTZKZNVlO9RSGsEGM63JXT2
   a=fingerprint:sha-256 6B:8B:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35
           :DC:B8:5F:64:1A:24:C2:43:F0:A1:58:D0:A1:2C:19:08
   a=setup:active
   a=mid:video
   a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
   a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
   a=sendrecv
   a=rtcp-mux
   a=rtpmap:100 VP8/90000
   a=rtcp-fb:100 ccm fir
   a=rtcp-fb:100 nack
   a=rtcp-fb:100 goog-remb
   a=rtpmap:115 rtx/90000
   a=fmtp:115 apt=100
   a=rtpmap:116 red/90000
   a=rtpmap:117 ulpfec/90000
   a=ssrc:3229706345 cname:Q/NWs1ao1HmN4Xa5
   a=ssrc:3229706346 cname:Q/NWs1ao1HmN4Xa5
   a=ssrc:3229706347 cname:Q/NWs1ao1HmN4Xa5
   a=ssrc-group:FID 3229706345 3229706346
   a=ssrc-group:FEC 3229706345 3229706347
   a=msid:PI39StLS8W7ZbQl1sJsWUXkr3Zf12fJUvzQ1
           PI39StLS8W7ZbQl1sJsWUXkr3Zf12fJUvzQ1v0
   m=application 20000 DTLS/SCTP 5000
   c=IN IP4 192.0.2.2
   a=candidate:2299743422 1 udp 2113937151 192.0.2.2 20000
           typ host generation 0
   a=ice-ufrag:6sFvz2gdLkEwjZEr
   a=ice-pwd:cOTZKZNVlO9RSGsEGM63JXT2
   a=fingerprint:sha-256 6B:8B:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35
           :DC:B8:5F:64:1A:24:C2:43:F0:A1:58:D0:A1:2C:19:08
   a=setup:active
   a=mid:data
   a=fmtp:5000 protocol=webrtc-datachannel; streams=10

Appendix B.  Change log

   Changes in draft-04:

   o  Filled in sections on createOffer and createAnswer.

   o  Added SDP examples.

   o  Fixed references.

   Changes in draft-03:

   o  Added text describing relationship to W3C specification

   Changes in draft-02:

   o  Converted from nroff

   o  Removed comparisons to old approaches abandoned by the working
      group

   o  Removed stuff that has moved to W3C specification

   o  Align SDP handling with W3C draft

   o  Clarified section on forking.

   Changes in draft-01:

   o  Added diagrams for architecture and state machine.

   o  Added sections on forking and rehydration.

   o  Clarified meaning of "pranswer" and "answer".

   o  Reworked how ICE restarts and media directions are controlled.

   o  Added list of parameters that can be changed in a description.

   o  Updated suggested API and examples to match latest thinking.

   o  Suggested API and examples have been moved to an appendix.

   Changes in draft-00:

   o  Migrated from draft-uberti-rtcweb-jsep-02.

Authors' Addresses

   Justin Uberti
   Google
   747 6th Ave S
   Kirkland, WA 98033
   USA

   Email: justin@uberti.name

   Cullen Jennings
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134
   USA

   Email: fluffy@iii.ca