Network Working Group                                          H. Kaplan
Internet Draft                                               Acme Packet
Intended status: Informational                                D. Burnett
Expires: April 21, 2012                                            Voxeo
                                                            N. Stratford
                                                                   Voxeo
                                                              Tim Panton
                                                       PhoneFromHere.com
                                                        October 21, 2011


              API Requirements for RTCWEB-enabled Browsers
                    draft-kaplan-rtcweb-api-reqs-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 21, 2012.

Copyright and License Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with


Kaplan                  Expires April 24, 2012                  [Page 1]

Internet-Draft                 Tao of Web                   October 2011


   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Abstract

   This document discusses the advantages and disadvantages of several
   proposed approaches to what type of API and architectural model
   RTCWeb Browsers should expose and use. The document then defines
   the requirements for an API that treats the Browser as a library and
   interface as opposed to a self-contained application agent.

Table of Contents

   1. Terminology
   2. Introduction
   3. Defining an RTCWeb Protocol in the Browser
   4. Leaving Logic to Web Developers
   5. API Requirements
      5.1. Browser User-Interface Requirements
      5.2. Media Properties
      5.3. RTP/RTCP Properties
      5.4. Data-stream Properties
      5.5. IP and ICE Properties
      5.6. API Design Recommendations
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgments
   9. References
      9.1. Informative References
   Authors' Addresses

1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119. The
   terminology in this document conforms to RFC 2828, "Internet
   Security Glossary".

2. Introduction

   There has been a long discussion in the RTCWeb WG mailing list
   concerning whether any "signaling" or protocol should be
   standardized between the Browser and its Server, other Browsers, or
   the JavaScript it runs.

   Within the context of the RTCWeb Browser architecture, shown in
   Figure 1 below, the discussion is centered on how much intelligence,
   logic, state, and decisions are built into the Browser, vs. provided
   by Javascript.

                     +------------------------+  On-the-wire
                     |                        |  Protocols
                     |        Servers         |--------->
                     |                        |
                     |                        |
                     +------------------------+
                                 ^
                                 |
                                 |
                                 | HTTP/
                                 | Websockets
                                 |
                                 |
                   +----------------------------+
                   |    Javascript/HTML/CSS     |
                   +----------------------------+
                Other  ^                 ^RTCWEB
                APIs   |                 |API
                   +---|-----------------|------+
                   |   |                 |      |
                   |                 +---------+|
                   |                 | Browser ||  On-the-wire
                   | Browser         | RTC     ||  Protocols
                   |                 | Function|----------->
                   |                 |         ||
                   |                 |         ||
                   |                 +---------+|
                   +---------------------|------+
                                         |
                                         V
                                Native OS Services

                        Figure 1: Browser Model

   There has been some discussion that the protocol running over the
   HTTP/Websockets connection between the Javascript and the Server be
   standardized, which will be discussed in this document.

   There has also been discussion that the interface between the
   Javascript and Browser be a protocol, rather than an API, such that
   the JavaScript could pass the protocol's messages as opaque blobs
   between two Browsers to establish the media-plane characteristics.
   An example of such a protocol interface is described in
   [draft-roap]. That will also be discussed in this document.

   The conclusion of the discussion in this document concerning those
   designs is that they are detrimental to the applications enabled
   with RTCWeb, and contrary to the Web Application model in general.
   Therefore, this document defines the beginning list of requirements
   for an API that we feel is more appropriate for Browsers to expose.

3.
Defining an RTCWeb Protocol in the Browser

   Proposals have been made for integrating an entire session signaling
   protocol into the Browser, in [draft-signaling], and for integrating
   an SDP offer/answer protocol in the Browser, in [draft-roap]. This
   section discusses the benefits and drawbacks of doing such things in
   Browsers.

   1) For a session signaling mechanism to work, it is not sufficient
      to just implement something in the browser. The Server also has
      to be involved in the protocol, in order to forward the protocol
      messages between the appropriate Browsers. Minimally, this
      requires identity and location services, such as a user database
      and a mapping of which Browser connections belong to which user.
      Often it involves authentication and authorization decisions as
      well. In SIP, for example, this would be the role of a Proxy and
      Registrar. In Web applications, such things are typically
      handled in an application-specific way based on the needs and
      architecture of the specific Web application. For example, a
      gaming website already knows who its users are and where they are
      as part of their game application, while Facebook already knows
      who its users are and where they are in its specific application.
      There is no need to standardize this in any way, and attempting
      to do so would be fruitless since it would have to make
      assumptions about the applications that could not possibly be
      known in advance, and would thus not be usable in practice.

   2) For either session signaling or SDP offer/answer protocols,
      integrating the protocols into the Browser means more logic and
      state is in the Browser, and ultimately more code. This leads to
      the following properties:

      a. It is easier for simple Web-application developers to
         initially deploy, if the code they need was built into the
         Browser the way they needed it to be.

      b.
         But, the more logic is placed into the Browser, the more need
         there is to extend/enhance/fix that logic in the future, and
         Web-application developers have little control over when users
         upgrade their Browsers.

      c. History has shown that the more complex the interface is
         between two implementations, the more interoperability
         problems occur. Ultimately the best way to provide
         interoperability is to run the same actual source-code; short
         of that, the less logic placed into it, the better the odds of
         interoperability.

   3) For the SDP offer/answer protocol proposal, a benefit is that for
      some very simple applications it makes deployment easier, if the
      simple application does not need to know anything about the SDP
      content or offer/answer semantics. If an application needs
      control at the media layer, then it could create a fake
      shim/interface from some real SDP in Javascript, to the SDP
      offer/answer protocol in the Browser. Thus it trades off
      additional simplicity for simple applications, against additional
      complexity for advanced applications. If the goal of RTCWEB is
      purely simplicity, this might seem a reasonable trade-off; if the
      goal is innovation, however, then making it harder for advanced
      uses means making it harder to innovate.

   4) For the SDP offer/answer protocol proposal, an argument has been
      made that the logic/state required for media already has to exist
      in the Browser itself, and thus splitting the domain of
      responsibility between the Browser and Javascript is more
      difficult than keeping it all in the Browser. We believe this
      conclusion is drawn from an implicit assumption that the Browser
      should be dealing with SDP to begin with. Unfortunately, SDP is
      not just about media characteristics. There are numerous
      attributes in SDP that are actually properties of a higher layer
      than RTP and codecs.
      For example, the following IANA-registered SDP attributes would
      be unknown to a media library in the browser and only known to
      the Javascript: cat, keywds, tool, type, charset, lang, setup,
      connection, confid, userid, floorid, and probably a bunch more
      (we haven't investigated them all). The point is that it is NOT
      true that "all the SDP information needs to be handled by the
      Browser, so why not put offer/answer in it too?".

   5) Building the SDP offer/answer model into the Browser restricts
      the Web application to only being able to do things that can be
      encoded and communicated with the SDP offer/answer model. As an
      example of something that cannot be accomplished because of this:
      imagine a Web-application that allows the Browser to communicate
      with a TelePresence (TP) system. TP systems have multiple
      cameras, screen displays, microphones, and speakers. A PC-based
      Browser typically only has a single microphone and camera, but
      can display multiple video feeds separately and can render-mix
      the incoming audio streams. Thus, a Browser-to-TP-system session
      would produce an asymmetric media stream model: multiple video
      streams from the TP system to the Browser, and one video stream
      from the Browser to the TP system, and the same for audio. Each
      TP stream is an independent RTP session and has unique attributes
      to indicate position (left/center/right). Encoding that is
      currently not possible with SDP offer/answer; not only because
      the SDP attributes aren't yet defined, but because the
      offer/answer model assumes a symmetric number of media-lines
      (m= lines), and also that attributes represent media-receiving
      characteristics as opposed to media-sending capabilities.
      Clearly if and when SDP is changed to handle TelePresence cases,
      Browsers could be upgraded to handle it as well sometime after;
      but they wouldn't need to if the Browser hadn't been involved in
      SDP to begin with.
      SDP information isn't strictly that of an RTP library layer; it's
      not a one-to-one correlation.

   6) Some Web application developers may prefer to make the decision
      of which codecs/media-properties to use in the Server, and
      command all the Browsers in a given session to do so. In some
      respects this is the very simplest model possible; but with the
      SDP offer/answer model being forced on the developer it becomes
      much more complicated to achieve.

   7) Since the SDP offer/answer mechanism is a protocol, involving
      both state machines and encoding schemes, interoperability
      between different vendor implementations is not guaranteed. In
      fact, real-world SIP deployments have experienced
      interoperability problems with both SDP and the offer/answer
      model.

   8) For both session signaling and SDP offer/answer, troubleshooting
      and debugging become difficult for the web-app provider if a
      problem occurs in the protocol built in the Browser. Even if the
      Javascript snoops on SIP or ROAP message exchanges and pushes
      back copies to the server in case of failure, the developer has
      to guess what the cause of an error response is. In other words,
      it's the difference between having only Wireshark traces to debug
      with vs. also having internal logs from code procedures.

4. Leaving Logic to Web Developers

   The alternative to embedding protocols in the Browser is to leave
   the work up to Javascript, for whatever "work" might be required for
   the particular application. After all, the actual knowledge of what
   the specific Web application does, wants, how it encodes it, etc. is
   only fully known by the Web developer for that application, and thus
   by the Javascript+Server-code combination employed (i.e., the
   application "source-code").

   Clearly the Browser needs to perform quite a bit of "logic": for
   implementing codec rendering/encoding, RTP/RTCP protocols, SRTP, and
   ICE. That is unavoidable, and not in question.
   The question is who should be in control, where any additional logic
   should be placed, and what the API model should be.

   There has been discussion that RTCWeb should strive to enable a
   media communication session with about "20 lines of code". We
   assert the only means of achieving that goal in a
   production-deployment manner is to use Javascript, and in particular
   Javascript libraries.

   Javascript libraries are used by a huge number of Web applications,
   and they work. Some of the libraries are so popular, reference
   books have been published for them. Yes, there are a lot of
   libraries, but that's a *good thing*.

   The properties of using Javascript, and Javascript libraries, are as
   follows:

   1) The logic is under the control of the Web developer. This means:

      a. If something is broken, the Web developer can generate
         log-type debug information within the javascript and push it
         back to the server or log collector, and determine what is
         broken and when to fix it; they do not need to rely on asking
         for the user to provide Browser logs, rely on the Browser to
         generate useful logs, understand the logs, nor rely on Browser
         manufacturers to prioritize fixing them.

      b. If an enhancement can be made, the Web developer can decide if
         and when to do so; they do not need to rely on Browser
         manufacturer decisions and timescales.

   2) All the Browsers using a given application site run the same
      literal Javascript source-code provided for that application.
      There is no greater means of achieving interoperability than
      that.

   3) If specific Browsers enable something proprietary, or some new
      media extension, the Web developer can decide whether to use it
      or not, when, and how. And the mechanism can be made flexible;
      for example, new codecs do not need new Javascript code to be
      used, unless the Javascript wishes to follow a model where they
      do need new Javascript to be used.
      (See the API section on this.) In other words, the
      Web-application developer can be as conservative or liberal as
      he/she wishes. There are already known use-cases where a Web-app
      will never want to use new codecs or capabilities introduced into
      Browser RTP libraries, and there are known use-cases for the
      opposite. Let the Web-app developer make that choice.

   4) The Javascript code does need to be downloaded (although Browser
      caching does exist), and clearly the larger the Javascript, the
      longer it takes to download. BUT, popular Javascript libraries
      are so necessary in modern Web applications that they are often
      available for free and fast downloading by local delivery
      networks.

   5) There are properties of the media library API that Javascript may
      need to access that cannot and should not be expressed in SDP.
      Some of these are described in the "Hints" and "Stats" sections
      of [draft-roap]. These will need a true API rather than an SDP
      offer/answer protocol to learn, yet they are tied to the
      information in the SDP regarding the media streams and codecs.
      Therefore, it is not the case that the Javascript does not need
      to understand SDP and could treat it as an opaque blob.

   6) There are settings of the media library API that Javascript may
      want to set that cannot be expressed in SDP. For example,
      setting which local audio or video sources to fork to two or more
      remote parties. Another example is local Javascript setting the
      media library to use audio only even if an incoming session's
      remote peer Browser indicates both audio and video, because the
      local user only wants to use audio right now (e.g., they pressed
      some Javascript-provided button which meant "audio-only" because
      they're not wearing proper attire for this particular session).
      These types of decisions and logic are not the domain of the
      Browser, but rather the Javascript, yet they are also integral to
      the SDP offer/answer.

   7) Debugging issues with signaling or SDP and the offer/answer model
      (if it's even used to begin with) is easier for the Web-app
      provider because they can save whatever information they need to
      debug their Javascript, within their own Javascript code's
      execution logic, and push it to their Server over HTTP/websocket
      however they see fit.

5. API Requirements

   Some requirements for an API are already documented in
   [draft-use-cases-and-requirements]. This section expands upon those
   in further detail, and adds new ones. In all cases, the term
   "Application" means the Javascript, and "Web API" refers to the
   Javascript <-> Browser API. It is not the goal of this document to
   define the actual API - that's W3C's job.

   [Note: this is a strawman list]

5.1. Browser User-Interface Requirements

   REQ-ID      DESCRIPTION
   ----------------------------------------------------------------
   A1-1        The Web API MUST provide a means for the
               application to ask the browser for permission to
               use cameras and microphones as input devices.
   ----------------------------------------------------------------
   A1-2        The Web API MUST provide a means for the
               application to ask the browser for permission to
               use the screen, a certain area on the screen or
               what a certain application displays on the screen
               as input to streams, and which stream.
   ----------------------------------------------------------------
   A1-3        The Web API MUST provide a means for the
               application to disable/enable the microphone and
               camera inputs. [Note: this does NOT mean
               disabling RTP transmission]
   ----------------------------------------------------------------
   A1-4        The Web API MUST provide a means for the
               application to disable/enable the rendering of
               received audio and video, per stream.
   ----------------------------------------------------------------

5.2.
Media Properties

   REQ-ID      DESCRIPTION
   ----------------------------------------------------------------
   A2-1        The Web API MUST provide a means for the
               application to learn what codecs and codec
               properties the Browser supports
   ----------------------------------------------------------------
   A2-2        The Web API MUST provide a means for the Browser
               to indicate codecs and codec properties such that
               the application does not need to know about the
               specific codec type in advance
   ----------------------------------------------------------------
   A2-3        The Web API MUST provide a means for the Browser
               to indicate codecs and codec properties such that
               the application can use them in SDP, for example
               by providing the IANA-registered encoding name
               for the payload format, and the format specific
               parameters as strings, such that they could be
               used in the 'a=rtpmap' and 'a=fmtp' lines of SDP
               should the Javascript wish to create SDP
               containing codecs unknown to it.
   ----------------------------------------------------------------
   A2-4        The Web API MUST provide means for the
               application to get the following media codec
               properties: bandwidth, clock rate, number of
               channels, type (audio vs. video)
   ----------------------------------------------------------------
   A2-5        The Web API MUST provide a means for the
               application to get the bandwidth values for
               codecs which support multiple levels, and set it
               for codecs which can be controlled/primed.
   ----------------------------------------------------------------
   A2-6        The Web API MUST provide a means for the
               application to set whether to use silence
               suppression or not, for codecs which support it.
   ----------------------------------------------------------------
   A2-7        The Web API MUST provide a means for the Browser
               to notify the application when a used codec falls
               below a given quality threshold [Note: it is TBD
               what "quality" means]
   ----------------------------------------------------------------
   A2-8        The Web API MUST provide a means for the web
               application to detect the level in audio streams.
   ----------------------------------------------------------------
   A2-9        The Web API MUST provide a means for the web
               application to adjust the level in audio streams.
   ----------------------------------------------------------------

5.3. RTP/RTCP Properties

   REQ-ID      DESCRIPTION
   ----------------------------------------------------------------
   A3-1        The Web API MUST provide a means for the
               application to get and set the SSRC value(s)
   ----------------------------------------------------------------
   A3-2        The Web API MUST provide a means for the
               application to get and set the CNAME value(s)
   ----------------------------------------------------------------
   A3-3        The Web API MUST provide a means for the
               application to get and set the Payload Type
               value(s) for each of the codecs
   ----------------------------------------------------------------
   A3-4        The Web API MUST provide a means for the
               application to set the audio and video codecs to
               be used for each stream, for both rendering and
               generating separately, at any time.
   ----------------------------------------------------------------
   A3-5        The Web API MUST provide means for the
               application to set whether to use SRTP, its
               encryption algorithm and key length, with or
               without authentication
   ----------------------------------------------------------------
   A3-6        The Web API MUST provide a means for the
               application to set whether to use SRTP or not,
               and which key exchange type to use [Note: this is
               TBD pending SRTP decisions of WG]
   ----------------------------------------------------------------
   A3-7        The Web API MUST provide a means for the
               application to set the SRTP master key value(s)
   ----------------------------------------------------------------
   A3-8        The Web API MUST provide a means for the
               application to get DTLS-SRTP fingerprint value(s)
   ----------------------------------------------------------------
   A3-10       The Web API MUST provide a means for the
               application to enable/disable generating RTP per
               stream [Note: this does not disable RTCP]
   ----------------------------------------------------------------
   A3-11       The Web API MUST provide a means for the
               application to be notified by the Browser if RTCP
               is no longer being received from the far-end
   ----------------------------------------------------------------

5.4. Data-stream Properties

   This section will detail requirements for the API for the
   client-to-client data connection stream. [TBD, since no other
   document has proposed anything for this yet either]

5.5.
IP and ICE Properties

   ----------------------------------------------------------------
   A5-1        The Web API MUST provide a means for the
               application to get IPv4/v6 addresses and ports
               for receiving ICE/RTP/RTCP on, per stream
   ----------------------------------------------------------------
   A5-2        The Web API MUST provide a means for the
               application to set a list of the remote IPv4/v6
               addresses and ports to send to, per stream
   ----------------------------------------------------------------
   A5-3        The Web API MUST provide a means for the
               application to set a list of TURN servers to use,
               including passwords
   ----------------------------------------------------------------
   A5-4        The Web API MUST provide a means for the
               application to set a list of STUN servers to use
   ----------------------------------------------------------------
   A5-5        The Web API MUST provide a means for the
               application to set the local ICE username and
               password
   ----------------------------------------------------------------
   A5-6        The Web API MUST provide a means for the
               application to set the remote ICE username and
               password to perform connectivity checks with
   ----------------------------------------------------------------
   A5-7        The Web API MUST provide a means for the
               application to set the remote IP Addresses and
               ports to perform connectivity checks with
   ----------------------------------------------------------------
   A5-8        The Web API MUST provide a means for the
               application to get any IP Addresses and ports
               learned by the Browser from STUN, TURN, or other
               methods (such as UPnP, NAT-PMP, PCP), including
               their candidate-type, foundation, etc.
   ----------------------------------------------------------------
   A5-9        The Web API MUST provide a means for the
               application to be notified by the Browser of ICE
               state-change events
   ----------------------------------------------------------------
   A5-10       The Web API MUST provide a means for the
               application to be notified by the Browser if the
               local in-use IP address changes or becomes
               inactive (e.g., link loss)
   ----------------------------------------------------------------

5.6. API Design Recommendations

   Technically the API design is the role of the W3C. That hasn't
   stopped people in the IETF RTCWEB mailing list from discussing it ad
   nauseam, however, and even defining a protocol for it. This
   document therefore recommends the following to W3C:

   1) That the API setter/getter function arguments use
      separate/discrete values, instead of one long string of separate
      tokens in a pseudo-arbitrary order with weak and complex encoding
      rules.

   2) That when the Javascript calls an API setter function to the
      Browser, it be treated as a *command*, not a protocol
      negotiation.

   3) *IF* any "blob" of information should be passed from the Browser
      to the Javascript and vice-versa, for use in such things as SDP,
      that it be something for which there would not likely be any use
      to a Javascript programmer and for which future
      extensions/changes would require Browser changes only but would
      not be easily representable in discrete fields. The most likely
      candidate for such a need for a "blob" would be ICE-specific SDP
      attributes.

   4) That when IETF documents start telling you how to build
      Javascript APIs, you should run far away... quickly. :)

6. Security Considerations

   There are no security implications for this document, yet - this is
   just a strawman document.

7. IANA Considerations

   This document makes no request of IANA.

8.
Acknowledgments

   Many of the topics discussed in this document came from numerous
   email posts and threads on the IETF RTCWEB mailing list over the
   past couple of months, so we will likely forget to recognize some
   people who have had their input written herein. We believe, though,
   that the following folks have possibly emailed something we've
   stolen^M^M borrowed: Matthew Kaufman, Roman Shpount, Inaki Baz
   Castillo, Albert Einstein, Saul Ibarra Corretge, Victor Pascual,
   Henry Sinnreich, and Bernard Aboba.

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).

9. References

9.1. Informative References

   TBD

Authors' Addresses

   Hadriel Kaplan
   Acme Packet
   Email: hkaplan@acmepacket.com

   Dan Burnett
   Voxeo
   Email: dburnett@voxeo.com

   Neil Stratford
   Voxeo
   Email: nstratford@voxeo.com

   Tim Panton
   PhoneFromHere.com
   Email: tim@phonefromhere.com
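
Illustrative Sketch (Non-Normative)

   Requirements A2-2, A2-3, and A3-3, together with recommendation 1
   in Section 5.6, describe an application receiving codec properties
   as discrete values and, if it wishes, placing them into 'a=rtpmap'
   and 'a=fmtp' lines without knowing the codec in advance. The
   TypeScript below is one possible sketch of that idea, not a proposed
   API: the CodecDescriptor type, its field names, and the toSdpLines
   function are hypothetical names invented for this illustration, and
   the sample codec values are examples only.

```typescript
// Hypothetical shape of the discrete values a Browser might report per
// codec (requirements A2-3, A2-4, A3-3). This is NOT an actual Browser API.
interface CodecDescriptor {
  kind: "audio" | "video";   // type (audio vs. video)
  encodingName: string;      // IANA-registered encoding name
  clockRate: number;         // RTP clock rate in Hz
  channels?: number;         // number of audio channels, if relevant
  payloadType: number;       // RTP Payload Type chosen by the application
  formatParameters?: string; // format-specific parameters, passed through as-is
}

// Build the SDP attribute lines for one codec. The function never
// interprets encodingName or formatParameters, so a codec unknown to the
// Javascript can still be described.
function toSdpLines(codec: CodecDescriptor): string[] {
  let rtpmap = `a=rtpmap:${codec.payloadType} ${codec.encodingName}/${codec.clockRate}`;
  // The channel count is appended only for multi-channel audio.
  if (codec.kind === "audio" && codec.channels !== undefined && codec.channels > 1) {
    rtpmap += `/${codec.channels}`;
  }
  const lines: string[] = [rtpmap];
  if (codec.formatParameters !== undefined && codec.formatParameters.length > 0) {
    lines.push(`a=fmtp:${codec.payloadType} ${codec.formatParameters}`);
  }
  return lines;
}

// Example values only; in practice the list would come from the Browser.
const exampleCodecs: CodecDescriptor[] = [
  { kind: "audio", encodingName: "PCMU", clockRate: 8000, channels: 1, payloadType: 0 },
  { kind: "audio", encodingName: "L16", clockRate: 44100, channels: 2, payloadType: 10 },
  { kind: "audio", encodingName: "telephone-event", clockRate: 8000, payloadType: 101,
    formatParameters: "0-15" },
];

let sdpLines: string[] = [];
for (let i = 0; i < exampleCodecs.length; i++) {
  sdpLines = sdpLines.concat(toSdpLines(exampleCodecs[i]));
}
console.log(sdpLines.join("\n"));
```

   Because every value is a separate field, an application that
   prefers some other encoding than SDP (for example, JSON sent over
   its own Websockets connection) can use the same descriptor
   unchanged and never call toSdpLines at all.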