Internet Engineering Task Force Yoshihiro Suzuki Internet-Draft D3 Communications Intended Status: Informational Nobuo Ogashiwa Expires: July 31, 2012 Maebashi Kyoai Gakuen College February 28, 2012 Real-Time Web Communication by using XMPP Jingle draft-suzuki-rtcweb-jingle-web-00 Abstract XMPP Jingle specification defines an XMPP protocol extension for managing peer-to-peer media sessions between two users. And XMPP Jingle can manage multi contents simultaneously in one Jingle stream, but in the XMPP world there is no common way to layout variable multi contents on each display. In this document, a solution to this issue is provided by using Web browser's layout function. This document describes a new concept to realize one of solutions of RTCWeb. Of course, "Web Browser" is used for Web application's frontend for real-time communication, and XMPP Jingle specification (XEP-166) is used as signaling protocol. And a simple mapping manner between jingle contents and HTML graphical elements enables to unite Web browser's layout function and Jingle's media content management function to realize RTCWeb functions. Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Suzuki, et al. Expires July 31, 2012 [Page 1] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on July 31, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Architecture and Functional Model. . . . . . . . . . . . . . . 4 3. Procedures of Real-Time Communication. . . . . . . . . . . . . 5 3.1 Procedures of Initialization and Negotiation . . . . . . . . 5 3.2 Mapping between Contents in Jingle and HTML/DOM Elements . . 5 3.3 Procedures of Jingle Stream Connection . . . . . . . . . . . 6 3.4 Procedures of Adding or Deleting contents. . . . . . . . . . 6 3.5 Procedures of Termination. . . . . . . . . . . . . . . . . . 6 4. Security Considerations . . . . . . . . . . . . . . . . . . . 6 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6.1 Normative References . . . . . . . . . . . . . . . . . . . 6 6.2 Informative References . . . . . . . . . . . . . . . . . . 7 1. Introduction XMPP can signal various information between users' clients, and signaling infromation can be easily written by using XML syntax. So, XMPP has now over 300 extended specifications as XEP series specifications in XSF (XMPP Standards Foundation) without core protocols written as RFC 6120: XMPP CORE, RFC 6121: XMPP IM and so on. Suzuki, et al. Expires July 31, 2012 [Page 2] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 In the XMPP world, many extensions are about how to treat various signaling informations only. In IM (Instant Messaging) service which is very typical service of XMPP, text messages are sent and received as in-band data, or piggy-backed in signaling message like as SMS of cell phone services. In Jingle Specification (XEP-0166) and Jingle related specifications, it's possible for each client to negotiate stream's specifications as out of band media paths by using a set of Jingle signaling procedures. When each client can succeed negotiation, it sets up media path as a Jingle stream directly between 2 clients, and then it can directly send and receive any numbers of media contents in one Jingle stream directly without a help of XMPP server. As above description, Jingle specifications define to manage multi contents stream between 2 clients, but in these specifications it's not written about a common way to layout and render various contents on each display . Until now it is a implement matter in the XMPP world. In order to layout and render changeable number of contents, it will be defined how to layout and render the contents at the time when number of contents are changed. So, in this proposal, Web browser is introduced to realize a flexible frontend real-time communication services. And this proposal would be one of solutions to realize RTCWeb. 2. Architecture and functional model In this document, Browser model is basically same as RTCWeb overview document [I-D.ietf-rtcweb-overview] as follows. But there are small changes to simplify to use XMPP, signaling path is changed to XMPP signaling, and RTC media path is changed to XMPP Jingle media path. And "Browser RTC function" is changed to "Browser Jingle function" in order to clarify to use XMPP features. Suzuki, et al. Expires July 31, 2012 [Page 3] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 +-----------+------------+ On-the-wire | | | Protocols | Web | XMPP |---------> | Server | Server | XMPP Federation | | | Protocol +-----------+------------+ (Jingle Signaling ^ Path) | | | HTTP/XMPP | | | +----------------------------+ | JavaScript/HTML/CSS | +----------------------------+ Other ^ ^RTC APIs | |APIs +---|-----------------|------+ | | | | | +---------+| | | Browser || On-the-wire | Browser | Jingle || Protocols | | Function|-----------> | | || Jingle | | || Media Path | +---------+| +---------------------|------+ | V Native OS Services Figure 1: Browser Model by Using XMPP Suzuki, et al. Expires July 31, 2012 [Page 4] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 And Overview of the system is basically same as RTCWeb Overview document [I-D.ietf-rtcweb-overview] as follows. Like as Figure 1, there are small changes to clarify to use XMPP features. In the XMPP specifications, "federation protocol" is defined to signal between the peering servers. + -----+------+ +------+------+ | | | | | | | Web | XMPP | Signaling | Web | XMPP | | | |-------------| | | |Server|Server| path |Server|Server| | | | | | | + -----+------+ +------+------+ / \ / \ / \ HTTP/XMPP / \ / \ / HTTP/XMPP \ / \ +-----------+ +-----------+ |JS/HTML/CSS| |JS/HTML/CSS| +-----------+ +-----------+ +-----------+ +-----------+ | | | | | | | | | Browser | ------------------------- | Browser | | | Jingle media Path | | | | | | +-----------+ +-----------+ Figure 2: Browser RTC Trapezoid by Using XMPP Suzuki, et al. Expires July 31, 2012 [Page 5] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 3. Procedures of Real-Time Communication Basic procedure is almost same as Jingle specification, but there are some different points in order to introduce Web browser as real-time communication service frontend. XMPP's signaling is done by presence, message and IQ stanza. "Stanza is a basic set of XML statements in XMPP signaling. In this proposal, in order to map contents in a Jingle stream to HTML/DOM elements, we add some xml elements or attributes in Jingle signaling IQ stanzas. In the following section, it is showed how to use our proposed protocol extensions. In Jingle specifications, caller is called as "Jingle Initiator" and callee is called as "Jingle Responder". In figure 3, it shows simplified session flow of XMPP Jingle, and horizontal arrow is one IQ stanza with Jingle action type name. Doubled horizontal arrow is used to show out-band direct media path between caller and callee. Of course, usual IQ stanzas (single horizontal arrow) are transferred by the help of XMPP servers belong the signaling path. (In XMPP world, it may be available for each clients to connect its own server, server-server signaling protocol is called federation protocol.) Caller (Initiator) Callee (Responder) | | | session-initiate | |---------------------------->| | ack | |<----------------------------| | session-accept | |<----------------------------| | ack | |---------------------------->| | [optional further | | negotiation] | |<--------------------------->| | Real-Time Call (RTP) | |<===========================>| | session-terminate | |<----------------------------| | ack | |---------------------------->| | | Figure 3: simplified session flow Suzuki, et al. Expires July 31, 2012 [Page 6] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 3.1. Procedures of Initialization and Negotiation In the RTCWeb model, user gets initial HTML contents from Web server as a Web application, and this HTML contents must have JavaScript statements to set up signaling session between his client and a XMPP server. So HTML statements is loaded, Web browser kicks this JavaScript event handler (maybe "onLoaded" event handler), and JavaScript statements set up a XMPP signaling session at first. As next step, user would choose peer user to make a real-time communication call. Maybe it will be done by the event that user select a gui part on a some kind of address book. When this event is occurred, JavaScript statements starts to manage negotiation of stream specification. Some default contents in a Jingle stream will be proposed from caller (Jingle initiator) to callee (Jingle Responder) by using Jingle "session-initiate" IQ-stanza with some candidates of content specifications. One content specification have basically 2 XML elements, one is a media information and another one is a transport information. A "description" XML element have media information with media type and some candidates of codec specifications, and a "tranport" XML element have transport type information and some candidates of IP address and port set. Caller and Callee make negotiation by using some more IQ-stanzas, and when negotiation is finally succeeded, callee (Jingle responder) send session-accept IQ-stanza, at this time, caller's initial candidate specifications are filtered to one acceptable content specification for both caller and callee. In this proposal, session-initiate and session-accept IQ-stanza should have one more information, it's layout information of contents in a Jingle stream on the each clients' Web browser window. This layout information is used to map between a content in a Jingle stream and a HTML/DOM element. This is a very important point of this proposal. So, this proposal can manage and layout any number of contents in a Jingle stream and a Web browser. Following Example IQ-stanza has layout information with usual Jingle "session-initiate" XML elements as "" tag. Suzuki, et al. Expires July 31, 2012 [Page 7] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 Example 1. Example of IQ-stanza from caller (session-initiate) 3.2. Mapping between contents in Jingle and HTML/DOM elements As mentioned above, in our proposal layout tag in session-initiate IQ stanza is very important. In Example. 1, layout information has only one attribute "URL". For simple html can be written in the manner of XML syntax, but usual Web applications' HTML statements and related CSS and JavaScript statements are not conformed XML syntax, so these are described as URL simply. Web browser's Jingle function part hands over this URL information to Core of Web brwoser and Web browser loads new page of URL which describes layout information of initial default contents set in HTML syntax manner. Suzuki, et al. Expires July 31, 2012 [Page 8] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 In this proposal, We proposed 2 way to map a content in a Jingle stream to a element of HTML/DOM. One is the way to use common id in both a new "id" attribute of "" tag and "id" information in HTML/DOM element. In this way "id" must be unique in one real-time session, and make the same value as both a new "id" attribute of "" tag and "id" information in HTML/DOM element. This idea introduce a new "id" attribute in tag in Jingle signaling information. From the view of Web application or Ajax programmer, identical objects in HTML, CSS, JavaScript statements must have the same value of "id" attribute, this approach is extension of this rule to apply the content of a Jingle stream. So, This is natural extension for Ajax programmers, but it is needed to add new attribute in Jingle specification. Example 2. Mapping by "id" value --- Jingle stream specification --- ^^^^^^^^^^^^^^^^^ ----------------------------------- --- HTML layout specification ---
--------------------------------- Suzuki, et al. Expires July 31, 2012 [Page 9] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 Another way to map a content in a Jingle stream to a element of HTML/DOM is to use a set of ip address and port number written in "" tag information of a "" description, and in a HTML/DOM element this set of ip address and port number is described as src attribute of media tag. For the video or image content, This approach is very natural for protocol designer, because there is no change in existing specifications to map a Jingle content to a HTML/CSS/JavaScript elements. But, Web browser with Jingle Functions must keep mapping information between information of Jingle stream specification and identical object in HTML/CSS/JavaScript statements. Example 3. Mapping by a set of ip address and port number --- Jingle stream specification --- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ----------------------------------- --- HTML layout specification ---
--------------------------------- Suzuki, et al. Expires July 31, 2012 [Page 10] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 In this proposal, contents in a Jingle stream is related to a identical object of HTML/CSS/JavaScript , and a Jingle stream is related to layout description by using whole HTML/CSS statements. In Jingle specification, it is needed to add "" information, when Jingle specification is described or changed in Jingle signaling messages. So when a set of contents or a Jingle stream specification is changed, HTML/CSS statements must change to update mapping information. (Maybe user's Web Browser should upload new HTML/CSS/JavaScript set to Web Server, until new Jingle signaling message will be sent to peer's Web browser. And when HTML/CSS statements is updated, Web browser must load new page which include updated layout information. Of Course, in this proposal, without Jingle signaling message, each Web browser can change position of contents or scale of contents by using Ajax manner or apply new CSS statements only to adapt user's own request on one side. 3.3. Procedures of Jingle stream connection In section 4.1, both caller (Jingle initiator) and callee (Jingle responder) agreed contents' specifications in a Jingle stream and contents' layout information. Jingle specification defines how to connect between the Jingle clients, but does not define how to layout on the screen. So, the JavaScript program which is done the negotiation and connection of Jingle stream, then connects each content in a Jingle stream to HTML/DOM element by using mapping information which is described above. HTML rendering engine starts HTML/DOM rendering, and at same time CSS mechanism can relocate and rescale HTML/DOM elements. The previous section's and this section's procedures are automatically done by user's event which user made by pressing a gui parts on some address book interfaces. Of course if callee is not online, or lacks abilities to treat contents in a Jingle stream, or some error occurred in the procedures, caller's call will be failed. Details of error condition in the Jingle protocol are well defined in Jingle specifications (XEP-166 and some more). Maybe, JavaScript API will be well defined to make a realtime communication call in order to use various signaling protocol by W3C WebRTC WG. 3.4. Procedures of adding or deleting contents In Jingle specifications, adding or deleting contents in a Jingle stream are well defined, so in this proposal the important thing is to keep mapping information between the contents in a Jingle stream and HTML/DOM elements in HTML statements or a DOM tree. The following figure shows overall session status in a Jingle session. Suzuki, et al. Expires July 31, 2012 [Page 11] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 o | | session-initiate | | +---------->--------------+ |/ | PENDING o-----------------------+ | | | content-accept, | | | | content-add, | | | | content-modify, | | | | content-reject, | | | | content-remove, | | | | description-info, | | \|/ | session-info, | | | | transport-accept, | | | | transport-info, | | | | transport-reject, | | | | transport-replace | | | +--------------------+ | | | | session-accept \|/ | | ACTIVE o-----------------------+ | | | content-accept, | | | | content-add, | | | | content-modify, | | | | content-reject, | | | | content-remove, | | | | description-info, | | \|/ | session-info, | | | | transport-accept, | | | | transport-info, | | | | transport-reject, | | | | transport-replace | | | +--------------------+ | | | +------------->--------------+ | | session-terminate | o ENDED Figure 4: overall session status Suzuki, et al. Expires July 31, 2012 [Page 12] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 Following example is showed to adding one content to a jingle stream. If user requests to add more contents in a Jingle stream, more contents specification must be added to adapt a user's request. When a set of contents or a Jingle stream specification is changed, mapping information or information must be updated. Example 4. Example of IQ-stanza (content-add) sid='a73sjjvkla37jfea'> 128 3.5. Procedures of Termination Web browser on both side can terminate realtime communication call by using usual Jingle signaling manner, but there is one different thing. It is needed to set HTML/CSS/JavaScript statements to update Web browser window to show that realtime communication call is terminated. Example 5 is one of session-terminate signaling message. Suzuki, et al. Expires July 31, 2012 [Page 13] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 Example 5. Example of IQ-stanza (session-terminate) I'm outta here! 4. Security Considerations This document has no requirement for a change to the security models within associated protocols. 5. IANA Considerations T.B.D. 6. References 6.1. Normative References [RFC6120] P. Saint-Andre, "Extensible Messaging and Presence Protocol (XMPP): Core", March 2011. [RFC6121] P. Saint-Andre, "Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence, March 2011. [XEP-0166] Scott Ludwig et al, "Jingle", December 2011. Available at http://xmpp.org/extensions/xep-0166.html [XEP-0167] Scott Ludwig et al, "Jingle RTP Sessions" December 2009. Available at http://xmpp.org/extensions/xep-0167.html Suzuki, et al. Expires July 31, 2012 [Page 14] Internet-Draft RealTimeWeb Using XMPP Jingle February 2012 [I-D.ietf-rtcweb-overview] H. Alvestrand, "Overview: Real Time Protocols for Brower-based Applications", draft-ietf-rtcweb-overview-02 (work in progress), September 2011. [I-D.ietf-rtcweb-use-cases-and-requirements] C. Holmberg et al,"Web Real-Time Communication Use-cases and Requirements", draft-ietf-rtcweb-use-cases-and-requirements-05 (work in progress), September 2011. [webrtc-api] Adam Bergkvist et al, "WebRTC 1.0: Real-time Communication Between Browsers", February 2012 Available at http://www.w3.org/TR/webrtc/ 6.2. Informative references Authors' Addresses Yoshihiro Suzuki D3 Communications 1-18-31 Tamagawa-Gakuen Machida City, Tokyo Japan 194-0041 Email: hiro.suzuki@d3communications.jp Nobuo Ogashiwa Maebashi Kyoai Gakuen College 1154-4 Koyahara-machi Maebashi City, Gunma Pref JAPAN 379-2192 EMail: ogashiwa@c.kyoai.ac.jp URI: http://www.kyoai.ac.jp/