CUSAX: Combined Use of the Session Initiation Protocol (SIP) and
the Extensible Messaging and Presence Protocol (XMPP)
JitsiStrasbourg67000France+33-177-624-330emcho@jitsi.orgCisco Systems, Inc.1899 Wynkoop Street, Suite 600DenverCO80202USA+1-303-308-3282psaintan@cisco.comTelecom ItaliaVia G. Reiss Romoli, 274Turin10148Italyenrico.marocco@telecomitalia.it
This document describes suggested practices for combined use
of the Session Initiation Protocol (SIP) and the Extensible
Messaging and Presence Protocol (XMPP). Such practices aim to
provide a single fully featured real-time communication service
by using complementary subsets of features from each of the
protocols. Typically such subsets would include telephony
capabilities from SIP and instant messaging and presence
capabilities from XMPP. This specification does not define any
new protocols or syntax for either SIP or XMPP. However,
implementing the practices outlined in this document
may require modifying or at least reconfiguring
existing client and server-side software. Also, it is not the
purpose of this document to make recommendations as to whether
or not such combined use should be preferred to the mechanisms
provided natively by each protocol (for example, SIP's SIMPLE
or XMPP's Jingle). It merely aims to provide guidance to those
who are interested in such a combined use.
Historically SIP and
XMPP have often been implemented
and deployed with different purposes: from its very start SIP's
primary goal has been to provide a means of conducting "Internet
telephone calls". XMPP on the other hand, has, from its Jabber
days, been mostly used for instant messaging and presence
, as well as related services such as
groupchat rooms .
For various reasons, these trends have continued through the
years even after each of the protocols had been equipped to
provide the features it was initially lacking:
In the context of the SIMPLE working group, the IETF has
defined a number of protocols and protocol extensions that
not only allow for SIP to be used for regular instant
messaging and presence but that also provide mechanisms for
elaborated features such as multi-user chats, server-stored
contact lists, file transfer and others.
Similarly, the XMPP community and the XMPP Standards
Foundation have worked on defining a number of XMPP
Extension Protocols (XEPs) that provide XMPP implementations
with the means of establishing end-to-end sessions. These
extensions are often jointly referred to as Jingle and
arguably their most popular use case is audio and video
calling.
Despite these advances, SIP remains the protocol of choice
for telephony-like services, especially in enterprises where
users are accustomed to features such as voice mail, call park,
call queues, conference bridges and many others that are rarely
(if at all) available in Jingle-based software. XMPP
implementations, on the other hand, greatly outnumber and
outperform those available for instant messaging and presence
extensions developed in the SIMPLE WG, such as
MSRP and
XCAP.
As a result, a number of adopters have found themselves needing
features that are not offered by any single-protocol solution,
but that separately exist in SIP and XMPP implementations.
The idea of seamlessly using both protocols
together would hence often appeal to service providers and users. Most
often, such a service would employ SIP exclusively for audio,
video, and telephony services and rely on XMPP for anything
else varying from chat, contact list management, and presence to
whiteboarding and exchanging files. Because these services and
clients involve the combined use of SIP and XMPP, we label them
"CUSAX" for short.
This document explains how such hybrid offerings can be achieved
with a minimum of modifications to existing software while
providing an optimal user experience. It covers server
discovery, determining a SIP Address of Record (AOR) while using
XMPP, and determining an XMPP Jabber Identifier ("JID") from
incoming SIP requests. Most of the text here pertains to client
behavior but it also recommends certain server-side
configurations and operational practices. The document also
discusses significant security considerations that can arise
when offering a dual-protocol solution, and provides advice
for avoiding security mismatches that would result in
degraded communications security for end users.
Note that this document is focused on coexistence of SIP and
XMPP functionality in end-user-oriented clients. By intent it
does not define methods for protocol-level mapping between SIP
and XMPP, as might be used within a server-side gateway between
a SIP network and an XMPP network (a separate series of
documents has been produced that defines such mappings). More
generally, this document does not describe service policies for
inter-domain communication (often called "federation") between
service providers (e.g., how a service provider that offers a
combined SIP-XMPP service might communicate with a SIP-only or
XMPP-only service), nor does it describe the reasons why a
service provider might choose SIP or XMPP for various features.
This document concentrates on use cases where the SIP
services and XMPP services are controlled by one and the same
provider, since that assumption greatly simplifies both
client implementation and server-side deployment (e.g., a
single service provider can enforce common or coordinated
policies across both the SIP and XMPP aspects of a CUSAX
service, which is not possible if a SIP service is offered
by one provider and an XMPP service is offered by another).
Since this document is of an informational nature, it
is not unreasonable for clients to apply some of the guidelines
here even in cases where there is no established relationship
between the SIP and the XMPP services (for example, it is
reasonable for a client to provide a way for its users to
easily start a call to a phone number recorded in a vCard
or obtained from a user directory).
However, the rules to follow in such cases are
left to application developers (although they might be
described in a future specification).
Finally, this document makes a further simplifying assumption
by discussing only the use of a single client, not use of
and coordination among multiple endpoints controlled by the
same user (e.g., user agents running simultaneously on a
laptop computer, tablet, and mobile phone). Although user
agents running on separate endpoints might themselves be
CUSAX clients or might engage in different aspects of an
interaction (e.g., a user might employ her mobile phone
for audio and her tablet for video and text chat), such
usage complicates the guidelines for developers of user agents
and therefore is left as a matter of implementation for now.
One of the main problems of using two distinct protocols when
providing one service is the impact on usability. Email
services, for example, have long been affected by the mixed use
of SMTP for outgoing mail and POP3 or IMAP for incoming mail.
Although standard service discovery methods (such as the
proper DNS records) make it possible for a user agent to
locate the right host(s) at which to connect, they do not
provide the kind of detailed information that is needed to
actually configure the user agent for use with the service.
As a result, it is rather complicated for inexperienced users to
configure a mail client and start using it with a new service,
and Internet service providers often need to provide
configuration instructions for various mail clients. Client
developers and communication device manufacturers on the other
hand often ship with a number of wizards that enable users to
easily set up a new account for a number of popular email
services. While this may improve the situation to some extent,
the user experience is still clearly sub-optimal.
While it should be possible for CUSAX users to manually
configure their separate SIP and XMPP accounts (often by
means of so-called "wizard" interfaces), service
providers offering CUSAX services to users of dual-stack
SIP/XMPP clients ought to provide means of online provisioning,
typically by means of a web-based service at an HTTP URI
(naturally single-purpose SIP services or XMPP services could
offer online provisioning as well, but they can be especially
helpful where the two aspects of the CUSAX service need to
have several configuration options in common).
Although the specifics of such mechanisms are outside the scope of
this specification, they should make it possible for a service
provider to remotely configure the clients based on minimal
user input (e.g., only a user ID and password). As far as the
authors are aware, no open protocol for endpoint configuration
is yet available and adopted; however, application developers are
encouraged to explore the potential for future progress in
that space (e.g., perhaps based on technologies such as
WebFinger ).
Because many of the features that a CUSAX client would prefer
in one protocol would also be available in the other, clients
should make it possible for such features to be disabled for a
specific account. In particular, it is suggested that clients
allow for audio and video calling features to be disabled for
XMPP accounts, and that instant messaging and presence features
should also be made optional for SIP accounts.
The main advantage of this approach is that clients would be
able to continue to function properly and use the complete
feature set of standalone SIP and XMPP accounts.
Once clients have been provisioned, they need to independently
log into the SIP and XMPP accounts that make up the CUSAX
"service" and then maintain both these connections as displayed
in .
In order to improve the user experience, when reporting
connection status clients may also wish to present the XMPP
connection as an "instant messaging" or a "chat" account.
Similarly they could also depict the SIP connection as a "Voice
and Video" or a "Telephony" connection. The exact naming is of
course entirely up to implementers. The point is that, in cases
where SIP and XMPP are components of a service offered by a
single provider, such presentation could help users better
understand why they are being shown two different connections
for what they perceive as a single service. It could alleviate
especially situations where one of these connections is
disrupted while the other one is still active. Naturally, the
developers of a CUSAX client or the providers of a CUSAX service
might decide not to accept such situations and force a client to
completely disconnect unless both aspects are successfully
connected.
Clients may also choose to delay their XMPP connection until
they have been successfully registered on SIP. This would help
avoid the situation where a user appears online to its contacts
but calling it would fail because their clients is still
connecting to the SIP aspect of their CUSAX service.
Once a CUSAX client has been provisioned and authorized to
connect to the corresponding SIP and XMPP services it would
proceed by retrieving its XMPP roster.
The client should use XMPP for all forms of communication with
the contacts from this roster, which will occur naturally
because they were retrieved through XMPP. Audio/video features
however, would typically be disabled in the XMPP stack, so any form of
communication based on these features (e.g. direct calls,
conferences, desktop streaming, etc.) would happen over SIP. The
rest of this section describes deployment, discovery, usability
and linking semantics that allow CUSAX clients to fall back and
seamlessly use SIP for these features.
In order for CUSAX to function properly, XMPP service
administrators should make sure that at least one of the
vCard "tel" fields for each
contact is properly populated with a SIP URI or a phone number
when an XMPP protocol for vCard storage is used (e.g.,
or ). There
are no limitations as to the form of that number. For example
while it is desirable to maintain a certain consistency
between SIP AORs and XMPP JIDs, that is by no means required.
It is quite important however that the phone number or SIP AOR
stored in the vCard be reachable through the SIP aspect of
this CUSAX service. (The same considerations apply even if
the directory storage format is not vCard.)
Administrators may also choose to include the "video" tel type
defined in for accounts that would be
capable of handling video communication.
To ensure that the foregoing approach is always respected,
service providers might consider (1) preventing clients (and
hence users) from modifying the vCard "tel" fields or (2)
applying some form of validation before storing changes. Of
course such validation would be feasible mostly in cases where
a single provider controls both the XMPP and the SIP service
since such providers would "know" (e.g., based on use of a
common user database for both services) what SIP AOR
corresponds to a given XMPP user (as indicated in
).
The task of operating and managing a SIP service or an XMPP
service is not always easy. Combining the two into a unified
service introduces additional challenges, including:
The necessity of opening additional ports on the client
side if SIP functionality is added to an existing XMPP
deployment or vice-versa.
The potential for subtle differences in security posture
across SIP and XMPP (e.g., SIP servers and XMPP servers
might support different TLS ciphersuites).
The need for, ideally, a common authentication backend and
other infrastructure that is shared across the SIP and XMPP
aspects of the combined service.
Coordinated monitoring and logging of the SIP and XMPP
servers to enable the correlation of incidents and the
pinpointing of problems.
The difficulty of troubleshooting client-side issues, e.g.
if the client loses connectivity for XMPP but maintains
its SIP connection.
Although separation of functionality (SIP for media, XMPP for
IM and presence) can help to ease the operational burden to
some extent, service providers are urged to address the
foregoing challenges and similar issues when preparing to
launch a CUSAX service.
When rendering the roster for a particular XMPP account CUSAX
clients should make sure that users are presented with a
"Call" option for each roster entry that has a properly set
"tel" field. This is the case even if calling features
have been disabled for that particular XMPP account, as
advised by this document. The usefulness of such a feature is
not limited to CUSAX. After all, numbers are entered in vCards
or stored in directories
in order to be dialed and called. Hence, as long as an XMPP
client has any means of conducting a call it may wish to make
it possible for the user to easily dial any numbers that it
learned through whatever means.
Clients that have separate triggers (e.g., buttons) for audio
calls and video calls may choose to use the presence or absence
of the "video" tel type defined in as
the basis for choosing whether to enable or disable the
possibility for starting video calls (i.e., if there is no
"video" tel type for a particular contact, do not provide a way
for the user to start a video call with that contact).
In addition to discovering phone numbers from vCards or
user directories, clients
may also check for alternative communication methods as
advertised in XMPP presence broadcasts and Personal
Eventing Protocol nodes as described in
XEP-0152: Reachability Addresses
. However, these indications are merely hints, and a
receiving client ought not associate a SIP address and an XMPP
address unless it has some way to verify the association
(e.g., the vCard of the XMPP account lists the SIP address and
the vCard of the SIP account lists the XMPP address, or the
association is made explicit in a record provided by a trusted
directory). Alternatively or in cases where vCard or
directory data is not available, a CUSAX client could take the
user's own address book as the canonical source for contact
addresses.
In order to improve usability, in cases where clients are
provisioned with only a single telephony-capable account they
ought to initiate calls immediately upon user request without
asking users to indicate an account that the call should go
through. This way CUSAX users (whose only account with calling
capabilities is usually the SIP part of their service) would
have a better experience, since from the user's perspective
calls "just work at the click of a button".
In some cases however, clients will be configured with more
than the two XMPP and SIP accounts provisioned by the CUSAX
provider. Users are likely to add additional stand-alone XMPP
or SIP accounts (or accounts for other communications protocols),
any of which might have both telephony and instant messaging
capabilities. Such situations can introduce additional
ambiguity since all of the telephony-capable accounts could be
used for calling the numbers the client has learned from
vCards or directories.
To avoid such confusion, client implementers and CUSAX service
providers may choose to indicate the existence of a special
relationship between the SIP and XMPP accounts of a CUSAX
service. For example, let's say that Alice's service provider
has opened both an XMPP account and a SIP account for her.
During or after provisioning, her client could indicate that
alice@xmpp.example.com has a CUSAX relation to
alice@sip.example.com (i.e., that they are two aspects of the
same service). This way whenever Alice triggers a call to a
contact in her XMPP roster, the client would preferentially
initiate this call through her example.com SIP account even if
other possibilities exist (such as the XMPP account where the
vCard was obtained or a SIP account with another provider).
If, on the other hand, no relationship has been configured or
discovered between a SIP account and an XMPP account, and the
client is aware of multiple telephony-capable accounts, it
ought to present the user with the choice of reaching the
contact through any of those accounts. This includes the
source XMPP account where the vCard was obtained (in case its
telephony capabilities are not disabled through configuration
or provisioning), in order to guarantee proper operation for
XMPP accounts that are not part of a CUSAX deployment.
When receiving SIP calls, clients may wish to determine the
identity of the caller and a corresponding XMPP roster entry so
that users could revert to chatting or other forms of
communication that require XMPP. To do so clients could search
their roster for an entry whose vCard has a "tel" field
matching the originator of the call. In addition, in order to
avoid the effort of iterating over an
entire roster and retrieving all vCards, CUSAX clients may use
a SIP Call-Info header whose 'purpose' header field parameter
has a value of "impp" as described in
. An example
follows.
Note that the information from the Call-Info header should
only be used as a cue: the actual AOR-to-JID binding would
still need to be confirmed by a vCard entry or through some
other trusted means (such as an enterprise directory). If this
confirmation succeeds the client would not need to search the
entire roster and retrieve all vCards. Not performing the
check might enable any caller (including malicious ones) to
employ someone else's identity and perform various scams or
Man-in-the-Middle attacks.
However, although an AOR-to-JID binding can be a helpful hint
to the user, nothing in the foregoing paragraph ought to be
construed as necessarily discouraging users, clients, or
service providers from accepting calls originated by
entities that are not established contacts of the user
(e.g., as reflected in the user's roster); that is a policy
matter for the user, client, or service provider.
CUSAX clients that support the SIP conferencing framework
can detect when a call they are
participating in is actually a conference and can then subscribe
for conference state updates as per .
A regular SIP user agent would also use the same conference URI
for text communication with the Message Session Relay Protocol
(MSRP). However, given that SIP's instant messaging capabilities
would normally be disabled (or simply not supported) in CUSAX
deployments, an XMPP Multi-User Chat (MUC)
associated with the conference can be announced/discovered through
<service-uris> bearing the "grouptextchat" purpose
. Similarly, an
XMPP MUC can advertise the SIP URI of an associated service for
audio/video interactions using the 'audio-video-uri' field of
the "muc#roominfo" data form to include
extended information about the MUC room
within XMPP service discovery ; see
for an example.
Once a CUSAX client joins the MUC associated with a particular
call it should not rely on any synchronization between the two.
Both the SIP conference and the XMPP MUC would function
independently, each issuing and delivering its own state
updates. It is hence possible that that certain peers would
temporarily or permanently be reachable in only one of the two
conferences. This would typically be the case with single-stack
clients that have only joined the SIP call or the XMPP MUC. It
is therefore important for CUSAX clients to provide a clear
indication to users as to the level of participation of the
various participants. In other words, a user needs to be able to
easily understand whether a certain participant can receive text
messages, audio/video, or both.
Of course, tighter integration between the XMPP MUC and the SIP
conference is also possible. Permissions, roles, kicks and bans
that are granted and performed in the MUC can easily be imitated
by the conference focus/mixer into the SIP call. If for example,
a certain MUC member is muted, the conference mixer can choose
to also apply the mute on the media stream corresponding to that
participant. The details and exact level of such integration is
of course entirely up to implementers and service providers.
The approach above describes one relatively lightweight
possibility of combining SIP and XMPP multi-party interaction
semantics without requiring tight integration between the two.
As with the rest of this specification, this approach is by no
means normative. Implementation and future specifications may
define other methods or provide other suggestions for improving
the Unified Communications user experience in cases of
multi-user chats in conference calling.
In theory there are no technical reasons why federation would
require special behavior from CUSAX clients. However, it is
worth noting that differences in administration policies may
sometimes lead to potentially confusing user experiences.
For example, let's say atlanta.example.com observes the CUSAX
policies described in this specification. All XMPP users at
atlanta.example.com are hence configured to have vCards that
match their SIP identities. Alice is therefore used to making
free, high-quality SIP calls to all the people in her roster.
Alice can also make calls to the PSTN by simply dialing
numbers. She may even be used to these calls being billed to
her online account so she would be careful about how long they
last. This is not a problem for her since she can easily
distinguish between a free SIP call (one that she made by
calling one her roster entries) from a paid PSTN call that she
dialed as a number.
Then Alice adds xmpp:bob@biloxi.example.com. The Biloxi domain
only has an XMPP service. There is no SIP server and Bob uses
a regular, XMPP-only client. Bob has however added his mobile
number to his vCard in order to make it easily accessible to
his contacts. Alice's client would pick up this number and
make it possible for Alice to start a call to Bob's mobile
phone number.
This could be a problem because, other than the fact that
Bob's address is from a different domain, Alice would have no
obvious and straightforward cues telling her that this is in
fact a call to the PSTN. In addition to the potentially lower
audio quality, Alice may also end up incurring unexpected
charges for such calls.
In order to avoid such issues, providers maintaining a CUSAX
service for the users in their domain may choose to provide
additional cues (e.g., a user interface warning or an an audio
tone or message) indicating that a call would incur charges.
A slightly less disturbing scenario, where a SIP service might
only allow communication with intra-domain numbers, would
simply prevent Alice from establishing a call with Bob's
mobile. Providers should hence make sure that calls to
inter-domain numbers are flagged with an appropriate audio or
textual warning.
The following practices are suggested for CUSAX user agents:
By default, prefer SIP for audio and video, and XMPP for
messaging and presence.
Use XMPP for all forms of communication with the contacts
from the XMPP roster, with the exception of features that
are based on establishing real-time sessions (e.g.
audio/video calls) in which case use SIP.
Provide on-line provisioning options for providers to
remotely setup SIP and XMPP accounts so that users wouldn't
need to go through a multi-step configuration process.
Provide on-line provisioning options for providers to
completely disable features for an account associated with a
given protocol (SIP or XMPP) if the features are preferred
in another protocol (XMPP or SIP).
Present a "Call" option for each roster entry that has a
properly set "tel" field.
If the client is provisioned with only a single
telephony-capable account, initiate calls immediately upon
user request without asking users to indicate an account
that the call should go through.
If no relationship has been configured or discovered between
a SIP account and an XMPP account, and the client is aware
of multiple telephony-capable accounts, present the user
with the choice of reaching the contact through any of those
accounts.
Optionally, indicate the existence of a special relationship
between the SIP and XMPP accounts of a CUSAX service.
Optionally, present the XMPP connection as an "instant
messaging" or a "chat" account and the SIP connection as a
"Voice and Video" or a "Telephony" acccount.
Optionally, determine the identity of the audio/video caller
and a corresponding XMPP roster entry so that the user could
revert to textual chatting or other forms of communication
that require XMPP.
Optionally, delay the XMPP connection until after a SIP
connection has been successfully registered.
Optionally, check for alternative communication methods
(SIP addresses advertised over XMPP, and XMPP addresses
advertised over SIP).
The following practices are suggested for CUSAX services:
Use online provisioning and configuration of accounts so
that users won't need to setup two separate accounts for
your service.
Use online provisioning so that calling features are
disabled for all XMPP accounts.
Ensure that at least one of the vCard "tel" fields for each
XMPP user is properly populated with a SIP URI or a phone
number that are reachable through your SIP service.
Optionally, include the "video" tel type for accounts that
are capable of handling video communication.
Optionally, provision clients with information indicating
that specific SIP and XMPP accounts are related in a CUSAX
service.
Optionally, attach a "Call-Info" header with an "impp"
purpose to all your SIP INVITE messages, so that clients
can more rapidly associate a caller with a roster entry
and display a "Caller ID".
Use of the same user agent with two different accounts providing
complementary features introduces the possibility of mismatches
between the security profiles of those accounts or features. Two
security mismatches of particular concern are:
The SIP aspect and XMPP aspect of a CUSAX service
might offer different authentication options (e.g.,
digest authentication for SIP as specified in
and SCRAM authentication
for XMPP as specified in
). Because SIP uses a
password-based method (digest) and XMPP uses a
pluggable framework for authentication via the Simple
Authentication and Security Layer (SASL) technology
, it is also possible that
the XMPP connection could be authenticated using
a password-free method such as client certificates
with SASL EXTERNAL even though a username and
password is used for the SIP connection.
The Transport Layer Security (TLS)
ciphersuites offered or negotiated on the XMPP side
might be different from those on the SIP side because
of implementation or configuration differences between
the SIP server and the XMPP server. Even more seriously,
a CUSAX client might successfully negotiate TLS when
connecting to the XMPP aspect of the service but not
when connecting to the SIP aspect, or vice-versa. In this
situation an end user might think that the combined CUSAX
session with the service is protected by TLS, even though
only one aspect is protected.
Security mismatches such as these (as well as others related
to end-to-end encryption of messages or media) introduce the
possibility of downgrade attacks, eavesdropping, information
leakage, and other security vulnerabilities. User agent
developers and service providers must ensure that such mismatches
are avoided as much as possible (e.g., by enforcing common
and strong security configurations and policies across protocols).
Specifically, if both protocols are not safeguarded by similar
levels of cryptographic protection, the user must be informed of
that fact and given the opportunity to bring both up to the same
level.
discusses potential issues that may
arise due to a mismatch between client capabilities, such as
calls being initiated with costs contrary to the expectation of
the end user. Such issues could be triggered maliciously,
as well as by accident. Implementers therefore need to
provide necessary cues to raise user awareness as suggested in
.
Refer to the specifications for the relevant SIP and XMPP
features for detailed security considerations applying to
each "stack" in a CUSAX client.
This document has no actions for the IANA.
A Group Text Chat Purpose for Conference and Service URIs in
the Session
Initiation Protocol (SIP) Event Package for Conference State
This document defines and registers a value of
"grouptextchat"
("Group Text Chat") value for the URI <purpose> element
of
SIP's Conference Event Package [RFC4575].
Data Formsreatmon@jabber.orgjhildebr@cisco.comjer@jabber.orgtemas@jabber.orgstpeter@jabber.orgService Discoveryjhildebr@cisco.comreatmon@jabber.orgstpeter@jabber.orgMulti-User Chatstpeter@jabber.orgvcard-tempstpeter@jabber.orgService Discovery Extensionsstpeter@jabber.orgXEP-0152: Reachability Addresses
Cisco
Cisco
vCard4 Over XMPPstpeter@jabber.orgsamizzi@cisco.com
This draft is inspired by the "SIXPAC" work of Markus Isomaki
and Simo Veikkolainen. Markus also provided various suggestions
for improving the document.
The authors would also like to thank the following people for
their reviews and suggestions: Sébastien Couture, Dan-Christian
Bogos, Richard Brady, Olivier Crête, Aaron Evans,
Kevin Gallagher, Adrian Georgescu, Saúl Ibarra Corretgé, David
Laban, Gergely Lukacsy, Murray Mar, Daniel Pocock, Travis
Reitter, and Gonzalo Salgueiro.
Brian Carpenter, Ted Hardie, Paul Hoffman, and Benson Schliesser
reviewed the document on behalf of the General Area Review
Team, the Applications Area Directorate, the Security Directorate,
and the Operations and Management Directorate, respectively.