Internet Engineering Task Force                                  IMPP WG
Internet Draft

Jonathan Rosenberg, Dean Willis, Robert Sparks, Ben Campbell
   dynamicsoft
Henning Schulzrinne, Jonathan Lennox
   Columbia U.
Bernard Aboba, Christian Huitema, David Gurle
   Microsoft
Dave Oran
   Cisco

draft-rosenberg-impp-im-00.txt
June 15, 2000
Expires: December, 2000

                 SIP Extensions for Instant Messaging

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as work in progress.

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document describes a SIP extension that supports Instant Messaging (IM). IM is supported in SIP with a single new method. We provide motivations on why SIP is an ideal platform for IM, why IM should be completely separated from presence, and exactly how to perform IM with SIP.

1 Introduction

Instant messaging is defined as the exchange of content between a set of participants in real time. Generally, the content is short textual messages, although that need not be the case. Generally, the messages that are exchanged are not stored, but this also need not be the case. IM differs from email in common usage in that instant messages are usually grouped together into brief live conversations, consisting of numerous small messages sent back and forth.

Instant messaging as a service has been in existence within intranets and IP networks for quite some time. Early implementations include Zephyr [1], the Unix talk application, and IRC. More recently, IM has been used as a service coupled with presence and buddy lists; that is, when a friend comes online, a user can be made aware of this and have the option of sending the friend an instant message. The protocols for accomplishing this are all proprietary, which has seriously hampered interoperability. Furthermore, most of these protocols tightly couple presence and IM, due to the way in which the service is offered.

Despite the popularity of presence-coupled IM services, IM is a separate application from presence. There are many ways to use IM outside of presence (for example, as part of a voice communications session). Another example is interactive games (possibly established with SIP; SIP can establish any type of session, not just voice or video); IM is already a common component of multiplayer online games. Keeping IM apart from presence means it can be used in such ways. Furthermore, keeping them separate allows separate providers for IM and for presence service. Of course, both can always be offered by the same provider, with both protocols implemented in a single client application.

In a similar vein, the mechanisms needed in an IM protocol are very similar to those needed to establish an interactive session: rapid delivery of small content to a user at their current location, which may, in general, change dynamically as the user moves. The similarity of needed function implies that an existing solution for initiation of sessions, namely the Session Initiation Protocol (SIP) [2], is an ideal base on which to build an IM protocol.

2 Motivations for Using SIP

Our first motivation for using SIP as the basis for IM is that the problems of session initiation and instant messaging are very similar. When provided independently of presence, the primary challenge behind providing IM service is to deliver the instant message to the host where the user is currently available, and if the user is not available, to return an error code indicating such. This is exactly the same service required for initiation of sessions, as these invitations must also be delivered to the host where the user is currently available. The result is that all of the application-layer routing and personal mobility services provided by SIP are both needed for, and directly applicable to, the delivery of IMs. In fact, by defining IM as just a new SIP method, existing SIP proxies can route IMs without even being aware of this extension.

SIP is a transactional service, consisting of sequences of request-response transactions within a common context (identified by the Call-ID). If desired, ordering of transactions can be guaranteed. This kind of transactional service is also needed for instant messages. Instant messages often occur in groups; that is, one party sends an instant message, and then there is a back and forth of messages that form a conversation of sorts, where the conversation (aka session) was effectively initiated by the first message. It is necessary to provide an identifier to group these instant messages together, so that each IM can be associated with a particular session. Since SIP is used for session initiation, the identifiers and tools it provides for management of the state associated with sessions are directly applicable to instant messaging.

SIP uses MIME for transport of content. The meaning and purpose of the content depend on the request method and on the content type. This means that an IM service based on SIP can transport arbitrary MIME content, which has been established as a requirement for IM [3].

SIP establishes and controls communications, generally between humans, and thus provides numerous header fields for identification of the users involved in the communication. IM is also designed to enable communications between humans, and thus the same requirements for user identification are present. The SIP header fields for this function (To and From) are directly applicable to IM, as are the authentication tools provided by SIP to verify those identities.

Scale is critical for IM service. Scale is primarily achieved by removing state from network elements and pushing protocol functions towards the periphery. It is therefore highly desirable that messaging can occur directly between participants, yet still take advantage of SIP's routing capabilities to deliver messages. This is easily supported in SIP, as the same requirement exists for achieving scale of session initiation - the initial call setup messages go through network servers in order to be routed properly, but subsequent signaling can occur end to end. So, the initial IM can pass through proxies in order to be properly routed to the recipient, and subsequent IMs can go direct. SIP's Record-Route, Route, and Contact headers are used for this purpose, and are applicable to IM to provide the same function.

Security is critical for IM, as it is for session initiation. SIP's capabilities of end to end authentication and encryption, coupled with hop by hop security mechanisms (outside of SIP itself), provide security for session initiation, and these mechanisms will work for IM as well.

Finally, and most importantly, both IM and voice/video are part of a complete communications service. It is likely that many devices will perform both IM and voice, and that these devices will have limited memory and processing power. By using the same protocol for both forms of communications, a reduction in memory requirements through code reuse is obtained. Even bigger benefits are gained by providers that wish to offer voice, video, presence and IM. By having all of these differing aspects of communications running off the same infrastructure, providers can realize substantial savings in infrastructure cost, management cost, and provisioning cost.

Furthermore, by using SIP for both IM and establishment of communications sessions, services that integrate the two are readily supported. For example, many IM systems allow an IM session to transition to voice with a single click. This is trivially done if SIP is used for IM; all of the information needed to send a SIP INVITE request directly to the other user has already been obtained through the IM exchanges. Furthermore, by using the same session identifiers, the call can be associated with the IM session. This allows the called party to know that the call was related to a specific IM exchange. If IM were done with a different protocol, this integration would not be possible.

For these reasons, we believe SIP is ideal for IM service. Section 6 examines each of the requirements outlined in [3] and demonstrates how this extension meets those requirements.

3 Terminology

Most of the terminology used here is defined in RFC2778 [4]. However, we duplicate some of the terminology from SIP in order to clarify this document:

   User Agent (UA): A UA is a piece of software which is capable of initiating requests, and of responding to requests.

   User Agent Server (UAS): A UAS is the component of a UA which receives requests, and responds to them.

   User Agent Client (UAC): A UAC is the component of a UA which sends requests, and receives responses.

   Registrar: A registrar is a SIP server which can receive and process REGISTER requests. These requests are used to construct address bindings.

4 Overview of Operation

When one user wishes to send an instant message to another, the sender formulates a SIP request with a new method, called MESSAGE. The request URI of this request is a normal SIP URL identifying the party to whom the message is directed. This request URI is rewritten by SIP proxies (which are very similar to HTTP proxies) as the request travels towards the recipient. For example, a request for sip:joe@example.com will arrive at the example.com server, which looks up Joe in some corporate database, and then determines that Joe can be reached internally at sip:joe@engineering.example.com. This new address is placed in the request URI of the outgoing request, which is sent to the server for engineering.example.com.

Since the request URI is rewritten by proxies, some means is needed to convey the identity of the original desired recipient. Thus, the sender also places the URL for the desired recipient in the mandatory To field. The From field identifies the originator of the message.

The message must also contain a Call-ID. In SIP, the Call-ID is used to associate a group of requests with the same session. Here, the usage is the same; all IMs that are part of the same session share the same Call-ID value. Call-ID has no meaning beyond being a common identifier. Each IM also carries a CSeq, which is a sequence number plus the name of the method of the request (the method name is there to support SIP features not required for IM). The CSeq uniquely identifies each IM in the session, and increases for each subsequent IM.

Each IM also carries a Via header. Via headers contain a trace of the IP addresses or FQDNs of the systems that the request traversed. As a request travels from proxy to proxy towards the recipient, each proxy adds its address, "pushing" it into the header, much like the operation of a stack.
The stack of addresses is reflected in the response, and each proxy "pops" the top address off, and uses the next one to determine where to send the response. This allows proxies to forward UDP requests statelessly, so that they need not even remember where the request came from to forward the response. Finally, clients using this extension MUST insert a Contact header into the request (Contact is used for routing of requests in the reverse direction, from the target of the original message to the initiator of the original message).

The MESSAGE request MAY contain a body. The body contains the message to be rendered by the recipient. SIP uses the standard MIME headers (Content-Type, Content-Length, and Content-Encoding) to identify the content.

The request MAY be sent using UDP or TCP. SIP supports both UDP and TCP (and even SCTP [5]) transport; over UDP, reliability and congestion control are provided through a simple retransmission scheme with exponential backoff. TCP is RECOMMENDED when the message size exceeds 1184 bytes, in order to avoid fragmentation and the associated loss exponentiation effect. This means that a TCP connection may be established for the first large message; it is RECOMMENDED that the client keep this connection open and use it to send subsequent messages destined for the same server.

The request MAY be sent to a local outbound proxy (a local outbound proxy is a device similar to an HTTP proxy; it receives requests which are not destined for itself, and then forwards them towards the final destination), or MAY be sent directly to the server in the domain specified in the request URI. This is identical to baseline SIP. Local outbound proxies are RECOMMENDED in order to provide domain-based third party signatures (i.e., re-sign the request with a key for the entire domain).
These proxies SHOULD perform proxy authentication, verifying the identity of the originator, before re-signing.

Proxies forward the message according to configured routing logic combined with DNS SRV record procedures. Pre-established security associations (SAs) MAY be used, or SAs MAY be established on demand. The SAs themselves SHOULD be based on IPSec ESP in transport mode [6] to provide privacy services for instant messages. Keys for ESP MAY be established administratively. If administrative keys are not available, IKE is used for key exchange [7]. If a proxy receives a request that does not arrive over an SA, it MAY reject the request. This decision is based on the local security policy of the proxy.

Each proxy adds its address to the Via header as it forwards the request. Proxies MAY also record-route; this means that they can request to receive all subsequent messages for the same Call-ID. By not record-routing, proxies will see only the initial request they forward; all subsequent requests in the same session will bypass the proxy, and go on a more direct path between the end systems. Record-routing is done by inserting a header into the forwarded request (called Record-Route) which contains the address of the proxy. Like the Via headers, Record-Route has a "stack" property, since proxies "push" values into the message. The entire Record-Route stack is reflected in the response to the IM, but unlike Via, no addresses are "popped" in the response. In this fashion, both sender and recipient of the IM have a list of the message path for subsequent requests. This path list is built into a Route header by the end systems, and placed in subsequent requests. The Route header is like a loose source route in IP, and specifies the path that the request should take.

Record-routing gives each proxy the capability to independently decide the right trade off of scale (achieved by not record-routing) and services (generally achieved by record-routing).
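
The Via and Record-Route handling described above can be illustrated with a small sketch. It is not part of this extension and is not a SIP implementation: headers are modeled as plain lists with the top of the stack first, and the host names proxy1.domain.com and proxy2.domain.com, as well as all function names, are hypothetical.

```python
from typing import Dict, List, Tuple

Headers = Dict[str, List[str]]  # header name -> stack of values, top first


def _copy(headers: Headers) -> Headers:
    return {name: list(values) for name, values in headers.items()}


def proxy_forward_request(request: Headers, proxy_host: str,
                          record_route: bool = False) -> Headers:
    """Headers of the request as forwarded downstream by one proxy."""
    out = _copy(request)
    out.setdefault("Via", []).insert(0, proxy_host)  # push onto the Via stack
    if record_route:
        # Optional: ask to stay on the path of later requests in the session.
        out.setdefault("Record-Route", []).insert(0, proxy_host)
    return out


def uas_build_response(request: Headers) -> Headers:
    """Reflect both stacks of the received request into the response."""
    return {"Via": list(request.get("Via", [])),
            "Record-Route": list(request.get("Record-Route", []))}


def proxy_forward_response(response: Headers) -> Tuple[str, Headers]:
    """Pop this proxy's own Via; the new top Via is where the response goes."""
    out = _copy(response)
    out["Via"].pop(0)
    return out["Via"][0], out  # Record-Route is passed through unchanged


# Two hypothetical proxies; only the first one record-routes.
request = {"Via": ["user1pc.domain.com"]}
request = proxy_forward_request(request, "proxy1.domain.com", record_route=True)
request = proxy_forward_request(request, "proxy2.domain.com")
response = uas_build_response(request)
hop_from_proxy2, response = proxy_forward_response(response)
hop_from_proxy1, response = proxy_forward_response(response)
```

In this sketch no proxy keeps per-request state: the return path is recovered entirely from the Via stack, while the Record-Route list reaches the sender intact and can be turned into a Route header for later requests.
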
Proxies which are aware that they are behind a firewall, for example, can record-route, ensuring that messages from inside to outside always come from the proxy. Beyond the existence of firewalls, however, we see no strong reason for proxies to Record-Route instant messages. The decision, of course, is at the discretion of the administrator.

Proxies MAY have access rules which prohibit the transmission of instant messages based on certain criteria. Typically, these criteria will be based on the identity of the sender of the instant messages. Establishment of these criteria in the proxy is outside the scope of this extension. We anticipate that such access controls will often be managed through web pages accessible by users, mitigating the need for standardization of a protocol for defining access rules.

Eventually, the request is forwarded to a proxy which is co-located with a registrar. A registrar is an entity in SIP that has dynamic application layer routing information. When a client starts up, it sends the registrar a REGISTER request that binds an address in the domain of the registrar to the address of the machine on which the client resides. Continuing with the example above, the proxy for engineering.example.com receives the request for Joe. Joe had previously registered a binding from sip:joe@engineering.example.com to sip:joe@mypc.engineering.example.com, which contains the FQDN of the host Joe is using. In fact, the binding established by a REGISTER can be one to many, so that a user can indicate the ability to be contacted at multiple hosts (laptop, PDA, cell phone). The proxy co-located with the registrar uses this information to forward the request once more. In fact, the proxy may fork, which means it sends multiple copies of the request, one to each host in the binding. For an IM, this means the message can appear at many hosts.
So, a user who has a tool running at work, goes home, and starts a tool there, can receive the IM at *both* machines. Once the user sends an IM back, future IMs in the same session will be routed only to the machine from which that reply came.

Proxies which route messages based on registrations SHOULD additionally support the "methods" parameter in the caller preferences specification [8]. This specification allows, among other things, for clients to indicate in a REGISTER that they would prefer to receive messages with specific methods. A proxy receiving a request with a particular method forwards it to the contact address which has indicated it can handle that method. This allows a user with a single SIP address to use separate user agents for IM and for other communications. Alternatively, users can use the same user agent for both.

If the user agent (UA) does not support IM, the MESSAGE method will be unknown to it, and it will generate a 405 (Method Not Allowed) response, and list the methods that are allowed. Similarly, if a user agent only supports IM (that is, it only does instant messaging), it rejects other requests, like INVITE, with a 405 and lists MESSAGE as the only method in the Allow header. It is RECOMMENDED that a user agent place an Allow header in a response to an INVITE, indicating its support for MESSAGE. This allows the peer UA to "grey out" the IM button that would allow message exchanges during a multimedia session when MESSAGE is not supported.

Finally, if the message is received correctly by the user agent (independently of whether the principal (i.e., the user) has read it), a 200 OK response is generated, and forwarded back towards the sender.

It is worth noting that of all the mechanisms described above, everything is already specified by SIP, excepting the new MESSAGE method, and some minor handling rules (namely, Contact MAY be left out of a 200 OK to a MESSAGE request) to enable the forking of MESSAGE.
Furthermore, the above describes the majority of the SIP capabilities needed for IM. Section 7 more fully indicates the components of SIP that are needed, and not needed, for IM.

4.1 Message flow

An example message flow is shown in Figure 1. The message flow shows an initial IM sent from User 1 to User 2, both users in the same domain, domain.com, through a single proxy. A second IM, sent in response, flows directly from User 2 to User 1.

     |  F1 MESSAGE          |                       |
     |--------------------->|  F2 MESSAGE           |
     |                      |---------------------->|
     |                      |                       |
     |                      |  F3 200 OK            |
     |                      |<----------------------|
     |  F4 200 OK           |                       |
     |<---------------------|                       |
     |                      |                       |
     |  F5 MESSAGE          |                       |
     |<---------------------------------------------|
     |                      |                       |
     |  F6 200 OK           |                       |
     |--------------------------------------------->|
     |                      |                       |
   User 1                 Proxy                   User 2

                Figure 1: Example Message Flow

Message F1 looks like:

   MESSAGE sip:user2@domain.com SIP/2.0
   Via: SIP/2.0/UDP user1pc.domain.com
   From: sip:user1@domain.com
   To: sip:user2@domain.com
   Contact: sip:user1@user1pc.domain.com
   Call-ID: asd88asd77a@1.2.3.4
   CSeq: 1 MESSAGE
   Content-Type: text/plain
   Content-Length: 18

   Watson, come here.

User1 sends this message to the server for domain.com (discovered through a combination of SRV and A record processing specified in SIP), using UDP. The proxy receives this request, and recognizes that it is the server for domain.com. It looks up user2 in its database (built up through registrations), and finds a binding from sip:user2@domain.com to sip:user2@user2pc.domain.com. It forwards the request to user2, and does not insert the Record-Route header.
The resulting message, F2, looks like:

   MESSAGE sip:user2@user2pc.domain.com SIP/2.0
   Via: SIP/2.0/UDP proxy.domain.com
   Via: SIP/2.0/UDP user1pc.domain.com
   From: sip:user1@domain.com
   To: sip:user2@domain.com
   Contact: sip:user1@user1pc.domain.com
   Call-ID: asd88asd77a@1.2.3.4
   CSeq: 1 MESSAGE
   Content-Type: text/plain
   Content-Length: 18

   Watson, come here.

The message is received by user2, displayed, and a response is generated, message F3, and sent to the proxy:

   SIP/2.0 200 OK
   Via: SIP/2.0/UDP proxy.domain.com
   Via: SIP/2.0/UDP user1pc.domain.com
   From: sip:user1@domain.com
   To: sip:user2@domain.com
   Contact: sip:user2@user2pc.domain.com
   Call-ID: asd88asd77a@1.2.3.4
   CSeq: 1 MESSAGE
   Content-Length: 0

Note that most of the header fields are simply reflected in the response. The proxy receives this response, strips off the top Via, and forwards to the address in the next Via, user1pc.domain.com, the result being message F4:

   SIP/2.0 200 OK
   Via: SIP/2.0/UDP user1pc.domain.com
   From: sip:user1@domain.com
   To: sip:user2@domain.com
   Contact: sip:user2@user2pc.domain.com
   Call-ID: asd88asd77a@1.2.3.4
   CSeq: 1 MESSAGE
   Content-Length: 0

Now, user2 wishes to send an IM to user1, message F5. As there are no Record-Routes in the original IM, it can simply send the IM directly to the address in the Contact header. Note how the To and From fields are now reversed from the response it sent in message F3:

   MESSAGE sip:user1@user1pc.domain.com SIP/2.0
   Via: SIP/2.0/UDP user2pc.domain.com
   To: sip:user1@domain.com
   From: sip:user2@domain.com;tag=ab8asdasd9
   Contact: sip:user2@user2pc.domain.com
   Call-ID: asd88asd77a@1.2.3.4
   CSeq: 1 MESSAGE
   Content-Type: text/plain
   Content-Length: 29

   My name is User2, not Watson.

This is sent directly to user1, who responds with a 200 OK in message F6:

   SIP/2.0 200 OK
   Via: SIP/2.0/UDP user2pc.domain.com
   To: sip:user1@domain.com
   From: sip:user2@domain.com;tag=ab8asdasd9
   Call-ID: asd88asd77a@1.2.3.4
   CSeq: 1 MESSAGE
   Content-Length: 0

5 Detailed Operation

This section more formally defines the syntax and semantics of this extension.

5.1 Method Definition

This specification defines a new SIP method, MESSAGE. The BNF for this method is:

   Message = "MESSAGE"

As with all other methods, the MESSAGE method name is case sensitive.

Tables 1 and 2 extend Tables 4 and 5 of SIP by adding an additional column, defining the headers that can be used in MESSAGE requests and responses.

                        where    enc.   e-e   MESSAGE
   __________________________________________________
   Accept                 R              e       o
   Accept                415             e       o
   Accept-Encoding        R              e       o
   Accept-Encoding       415             e       o
   Accept-Language        R              e       o
   Accept-Language       415             e       o
   Allow                 200             e       o
   Allow                 405             e       m
   Authorization          R              e       o
   Authorization          r              e       o
   Call-ID               gc       n      e       m
   Contact                R              e       m
   Contact               2xx             e       o
   Contact               3xx             e       o
   Contact               485             e       o
   Content-Encoding       e              e       o
   Content-Length         e              e       m
   Content-Type           e              e       *
   CSeq                  gc       n      e       m
   Date                   g              e       o
   Encryption             g       n      e       o
   Expires                g              e       o
   From                  gc       n      e       m
   Hide                   R       n      h       o
   Max-Forwards           R       n      e       o
   Organization           g       c      h       o

   Table 1: Summary of header fields, A--O

                             where         enc.   e-e   MESSAGE
   ____________________________________________________________
   Priority                    R            c      e       o
   Proxy-Authenticate         407           n      h       o
   Proxy-Authorization         R            n      h       o
   Proxy-Require               R            n      h       o
   Record-Route                R                   h       o
   Record-Route           2xx,401,484              h       o
   Require                     R                   e       o
   Retry-After                 R            c      e       -
   Retry-After          404,413,480,486     c      e       o
                            500,503         c      e       o
                            600,603         c      e       o
   Response-Key                R            c      e       o
   Route                       R                   h       o
   Server                      r            c      e       o
   Subject                     R            c      e       o
   Timestamp                   g                   e       o
   To                        gc(1)          n      e       m
   Unsupported                420                  e       o
   User-Agent                  g            c      e       o
   Via                       gc(2)          n      e       m
   Warning                     r                   e       o
   WWW-Authenticate            R            c      e       o
   WWW-Authenticate           401           c      e       o

   Table 2: Summary of header fields, P--Z; (1): copied with possible addition of tag; (2): UAS removes first Via header field

5.2 UAC processing of initial MESSAGE request

A MESSAGE request MUST contain a To, From, Call-ID, CSeq, Via, Content-Length, and Contact header, formatted as specified in [2].

All UAs MUST be prepared to send and receive MESSAGE requests with a body of type text/plain. MESSAGE requests MAY contain an Accept header listing the allowable MIME types which may be sent in the response, or in subsequent requests in the reverse direction. The absence of the Accept header implies that the only allowed MIME type is text/plain. This simplifies operation in small devices, such as wireless appliances, which will generally only have support for text, but still allows any other MIME type to be used if both sides support it.

Note that multipart may be useful for IM as well; implementations are encouraged to support multipart if possible.

MESSAGE requests MAY contain a Subject header indicating the subject of the IM session.

As a nice implementation feature, the subject can be displayed on the title bar of the window which contains the text of the IM exchange.

A UAC MAY send a MESSAGE request for an existing call, established with an INVITE. In this case, the MESSAGE request is processed identically to the INFO method [9].
The only difference is that a MESSAGE request is assumed to be for the purpose of instant messaging as part of the call, whereas INFO is less specific. Also note that it is still possible for a user to maintain separate IM and voice/video clients, yet still receive an IM for an existing call (the IM is delivered to the IM client, of course).

5.3 Proxy processing of MESSAGE requests

Proxies route requests with method MESSAGE the same as they would any other SIP request (proxy routing in SIP does not depend on the method). Note that the MESSAGE request MAY fork; this allows for delivery of the message to several possible terminals where the user might be.

If a MESSAGE request hits a proxy that uses registrations to route requests, but no registration exists for the target user in the request-URI, the request is rejected with a 404 (Not Found). This is standard behavior for SIP.

Proxies MAY insert Record-Route into a request, as specified in [10].

5.4 UAS processing of MESSAGE requests

As specified in RFC 2543, if a UAS receives a request with a body of a type it does not understand, it MUST respond with a 415 (Unsupported Media Type) containing an Accept header listing those types which are acceptable.

Servers MAY reject requests (using a 413 response code) that are too long, where too long is a matter of local configuration. All servers MUST accept requests which are up to 1184 bytes in length.

   1184 = minimum IPv6 guaranteed length (1280 bytes) minus UDP (8 bytes) minus IPSEC (48 bytes) minus layer one encapsulation (40 bytes).

A UAS receiving a MESSAGE request SHOULD respond with a final response immediately. A 200 OK is sent if the request is acceptable. Note, however, that the UAS is not obliged to display the message to the user either before or after responding with a 200 OK.
A 200 class response to a MESSAGE request MAY contain a body, but this will often not be the case, since these responses are generated automatically.

Like any other SIP request, an IM MAY be redirected, or otherwise responded to with any SIP response code. Note that a 200 OK response to a MESSAGE request does not mean the user has read the message.

A UAS MAY include a Contact in a 200 class response. Including a Contact header enables end to end messaging, which is good for efficiency. However, it rules out effectively supporting more than one terminal which can handle IM simultaneously.

Leaving the Contact out, odd as that may seem, enables a very important feature. If a user is connected at several hosts, an initial IM will fork, and arrive at each. Each UAS responds with a 200 OK immediately, one of which is arbitrarily forwarded upstream towards the UAC. If another IM is sent for the same call-leg, we still wish for this IM to fork, since we still don't know where the user is currently residing. This information becomes known when the user sends an IM in the reverse direction. This IM will contain a Contact, and when it arrives at the originator of the initial MESSAGE, will update the Route so that IMs are now delivered only to the one host where the user is residing.

A UAS constructs a set of Route headers from the Record-Route and Contact headers in the MESSAGE request, as per the procedure defined in [10].

A UAS which is, in fact, a message relay, storing the message and forwarding it later on, or forwarding it into a non-SIP domain, SHOULD return a 202 (Accepted) response indicating that the message was accepted, but end to end delivery has not been guaranteed.

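
The response rules of this section can be summarized in a short sketch. It is illustrative only: the function name, its parameters, the order in which the size and type checks are applied, and the reason phrase used for 413 are assumptions, not requirements of this extension.

```python
from typing import Dict, Optional, Sequence, Tuple

# Size every server must accept: 1280 - 8 - 48 - 40 bytes (Section 5.4).
MIN_ACCEPTED_SIZE = 1280 - 8 - 48 - 40


def choose_response(content_type: Optional[str], request_size: int,
                    supported_types: Sequence[str] = ("text/plain",),
                    size_limit: int = MIN_ACCEPTED_SIZE,
                    is_relay: bool = False) -> Tuple[int, str, Dict[str, str]]:
    """Return (status code, reason phrase, extra headers) for a MESSAGE."""
    if size_limit < MIN_ACCEPTED_SIZE:
        raise ValueError("local limit may not be below the mandatory minimum")
    if request_size > size_limit:
        # "Too long" is a matter of local configuration.
        return 413, "Request Entity Too Large", {}
    if content_type is not None:
        # Compare the bare media type, ignoring parameters such as charset.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type not in supported_types:
            return (415, "Unsupported Media Type",
                    {"Accept": ", ".join(supported_types)})
    if is_relay:
        # Stored or forwarded into a non-SIP domain: delivery not guaranteed.
        return 202, "Accepted", {}
    return 200, "OK", {}
```

A plain-text IM of a few hundred bytes yields a 200, an unsupported media type yields a 415 carrying the acceptable types, and a store-and-forward relay answers 202 instead of 200.
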
5.5 UAC processing of initial MESSAGE response

A 200 OK response to an initial IM may contain Record-Route headers; these MUST be used to construct a Route header for use in subsequent requests for the same call-leg (defined as the combination of remote address, local address, and Call-ID), using the process described in Section 6.29 of SIP [2] as if the request were INVITE. Note that the 200 OK response might not contain a Contact header.

A 400 or 500 class response indicates that the message was not delivered successfully. A 600 response means it was delivered successfully, but refused.

5.6 Subsequent MESSAGE requests

Subsequent messages follow the path established by the Route headers computed by the UA.

The CSeq header MUST be larger than the CSeq header used in any previous request for the same call leg. It is strongly RECOMMENDED that the CSeq number be computed as described in Section 6.17 of SIP, using a clock. This allows for the CSeq to increment without requiring the UA to store the previous CSeq values.

MESSAGE requests for an established IM session MUST contain a tag in the From field. Responses to an IM SHOULD contain a tag in the To field.

For SIP experts - this represents a slightly different operation than for INVITE. When a user sends an INVITE, they will receive a 200 OK with a tag. Requests in the reverse direction then contain that tag, and that tag only, in the From field. Here, the response to an IM will contain a tag in the To field, and a MESSAGE will contain a tag in the From field. However, the UA may receive MESSAGE requests with tags in the From field that do not match the tag in the 200 OK received to the initial IM. This is because only a single 200 OK is returned to a MESSAGE request, as opposed to multiple 200 OK for INVITE. Thus, the UA MUST be prepared to receive MESSAGEs with many different tags, each from a different UA.

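
One possible way to derive the CSeq number from a clock is sketched below. It is an illustration only: the one-second resolution, the bound of 2**31, and the function names are assumptions made for this example rather than text taken from SIP.

```python
import time
from typing import Optional

CSEQ_LIMIT = 2 ** 31  # assumption: keep sequence numbers below 2**31


def clock_cseq(now: Optional[float] = None) -> int:
    """CSeq number taken from a wall clock, in whole seconds."""
    if now is None:
        now = time.time()
    return int(now) % CSEQ_LIMIT


def next_cseq(now: Optional[float] = None,
              previous: Optional[int] = None) -> int:
    """Clock-based CSeq that still increases when two IMs share a second.

    `previous` is optional: a UA that keeps no state simply omits it and
    accepts that two requests sent within the same second would collide.
    """
    candidate = clock_cseq(now)
    if previous is not None and candidate <= previous:
        candidate = previous + 1
    return candidate
```

With a purely clock-based value the UA needs no stored counter, which is the benefit noted above; the optional `previous` argument shows the small amount of state needed if IMs may be sent faster than the clock ticks.
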
A UAS MUST be prepared to update the Route it has stored for an IM session with a Contact received in a request, if that Contact is different from one previously received, or if there was no Contact previously.

Note that an IM effectively initiates a session. There is state at the UA associated with that session, encapsulated in the Call-ID, Route headers, and CSeq numbers. A UA MAY terminate this session at any time, including after each MESSAGE. No messaging is required to terminate it. Any state associated with the session is simply discarded. The idempotency of SIP requests will ensure that if one side (side A) discards session state, and the other (side B) does not, a message from side B will appear as a new IM, and standard processing will reconstitute the session on side A.

5.7 Caller Preferences

User agents SHOULD add the "methods" parameter defined in the caller preferences specification [8] to Contact headers placed in REGISTER requests, indicating support for the MESSAGE method. Other elements of caller preferences MAY be supported. For example:

   REGISTER sip:dynamicsoft.com SIP/2.0
   Via: SIP/2.0/UDP mypc.dynamicsoft.com
   To: sip:jdrosen@dynamicsoft.com
   From: sip:jdrosen@dynamicsoft.com
   Call-ID: asidhasd@1.2.3.4
   CSeq: 39 REGISTER
   Contact: sip:jdrosen@im-pc.dynamicsoft.com;methods="MESSAGE"
   Content-Length: 0

Registrar/proxies which wish to offer IM service SHOULD implement the proxy processing defined in the caller preferences specification [8].

5.8 Security

SIP provides numerous security mechanisms which can be utilized for instant messaging services.

5.8.1 Privacy

In order to provide privacy of instant messages, it is RECOMMENDED that between network servers (proxies to proxies, proxies to redirect servers), transport mode ESP [6] is used to encrypt the entire message. TLS MAY be used instead.
Coupled with persistent connections between users, this makes it impossible for eavesdroppers on non-UA connections to determine when a particular user has even sent an IM, let alone what the content is. Of course, the content of IMs is exposed to proxies.

Between a UAC and its local proxy, TLS [11] is RECOMMENDED. Similarly, TLS SHOULD be used between a proxy and the UAS receiving the IM. The proxy can determine whether TLS is supported by the receiving client based on the transport parameter in the Contact header of its registration. If that registration contains the token "tls" as transport, it implies that the UAS supports TLS.

Furthermore, we allow for the Contact header in the MESSAGE request to contain TLS as a transport. The Contact header is used to route subsequent messages between a pair of entities. It defines the address and transport used to communicate with the user agent for subsequent requests in the reverse direction. If no proxies insert Record-Route headers, the recipient of the original IM, when it wishes to send an IM back, will use the Contact header, and establish a direct TLS connection for the remainder of the IM communications.

If a proxy does Record-Route, the situation is different. When the recipient of the original IM (call this participant B) sends an IM back to the originator of the original IM (call this participant A), this will be sent to the proxy closest to B which inserted Record-Route. This proxy, in turn, sends the request to the proxy before it which Record-Routed. The first proxy after A which inserted Record-Route will then use TLS to contact A.

Since we suspect that most proxies will not insert Record-Route into instant messages, efficient, secure, direct IM will occur frequently.

To prevent sensitive data from being observed by intermediate proxies, SIP encryption MAY be used end to end for the transmission of MESSAGE requests.
SIP supports PGP based encryption, which does not require the establishment of a session key for encryption of messages within a session (basically, a new session key is established for each message as part of the PGP encryption). Other encryption mechanisms, such as S/MIME, can be readily defined for SIP.

5.8.2 Message Integrity and Authenticity

It is important for the recipient of an IM to be able to verify that the message contents are actually what was sent by the originator, and to determine who the originator really is. This is supported in SIP through end to end authentication and message integrity. SIP provides PGP based authentication and integrity (both challenge-response and normal signatures), as well as HTTP basic and digest authentication.

5.8.3 Outbound authentication

When local proxies are used for transmission of outbound messages, proxy authentication is RECOMMENDED. This is useful to verify the identity of the originator, and prevent spoofing and spamming at the originating network.

5.8.4 Replay Prevention

To prevent the replay of old instant messages, all signed MESSAGE requests and responses SHOULD contain a Date header covered by the message signature. Any message with a date older than several minutes in the past, or which is more than several minutes in the future, SHOULD be answered with a 400 (Incorrect Date or Time) response, unless such messages arrive repeatedly from the same source, in which case they MAY be discarded without sending a response. Obviously, this replay attack prevention mechanism does not work for devices without clocks.

Furthermore, all signed MESSAGE requests MUST contain a Call-ID and CSeq header covered by the message signature. A user agent MAY store a list of Call-ID values, and for each, the highest CSeq seen within that Call-ID.
Any message that arrives for a Call-ID that exists, whose CSeq is lower than the highest seen so far, is discarded. Finally, challenge-response authentication MAY be used to prevent replay attacks.

6 Requirements Evaluation

RFC 2779 [3] outlines requirements for IM and presence protocols. The document describes both shared requirements and IM and presence specific requirements. Examining each of the IM requirements in turn, we also observe that they are met by this proposal:

"Requirement 2.1.1: The protocols MUST allow a PRESENCE SERVICE to be available independent of whether an INSTANT MESSAGE SERVICE is available, and vice-versa."

This requirement is met by the separation of presence and IM which we propose here.

"Requirement 2.1.2. The protocols must not assume that an INSTANT INBOX is necessarily reached by the same IDENTIFIER as that of a PRESENTITY. Specifically, the protocols must assume that some INSTANT INBOXes may have no associated PRESENTITIES, and vice versa."

This requirement is also easily met by any architecture which completely separates IM and presence as we propose.

"Requirement 2.1.3. The protocols MUST also allow an INSTANT INBOX to be reached via the same IDENTIFIER as the IDENTIFIER of some PRESENTITY."

Same as above.

"Requirement 2.1.4. The administration and naming of ENTITIES within a given DOMAIN MUST be able to operate independently of actions in any other DOMAIN."

This requirement is met by SIP. SIP uses email-like identifiers which consist of a user name at a domain. Administration of user names is done completely within the domain, and these user names have no defined rules or organization that needs to be known outside of the domain in order for SIP to operate.

"Requirement 2.1.5. The protocol MUST allow for an arbitrary number of DOMAINS within the NAMESPACE."

This requirement is met by SIP. SIP uses standard DNS domains, which are not restricted in number.

"Requirement 2.2.1. It MUST be possible for ENTITIES in one DOMAIN to interoperate with ENTITIES in another DOMAIN, without the DOMAINS having previously been aware of each other."

This requirement is met by SIP, as it is essential for establishing sessions as well. DNS SRV [12] records are used to discover servers for a particular service within a domain. They are a generalization of MX records, used for email routing. SIP defines procedures for usage of DNS records to find servers in other domains, which include SRV lookups. This allows domains to communicate without prior setup.

"Requirement 2.2.2: The protocol MUST be capable of meeting its other functional and performance requirements even when there are millions of ENTITIES within a single DOMAIN."

Whilst it is hard to judge whether this can be met by examining the architecture of a protocol, SIP has numerous mechanisms for scaling to large numbers of users within a domain. It allows hierarchies of servers, whereby the namespace can be partitioned among servers. Servers near the top of the hierarchy, used solely for routing, can be stateless, providing excellent scale.

"Requirement 2.2.3: The protocol MUST be capable of meeting its other functional and performance requirements when there are millions of DOMAINS within the single NAMESPACE."

The usage of DNS for dividing the namespace into domains provides the same scale as today's email systems, which support millions of DOMAINS.

"Requirement 2.3.5: The PRINCIPAL controlling an INSTANT INBOX MUST be able to control which other PRINCIPALS, if any, can send INSTANT MESSAGES to that INSTANT INBOX."

This is provided by access control mechanisms, outside the scope of this extension.

"Requirement 2.3.6: The PRINCIPAL controlling an INSTANT INBOX MUST be able to control which other PRINCIPALS, if any, can read INSTANT MESSAGES from that INSTANT INBOX."
This is accomplished through authenticated registration requests. Registrations are used to determine which user gets delivered an instant message. Policy in proxies can allow only certain users to register contact addresses for a particular inbox (an inbox is defined by the address-of-record in the To field in the registration).

"Requirement 2.4.3: The protocol MUST allow the sending of an INSTANT MESSAGE both directly and via intermediaries, such as PROXIES."

This is fundamental to the operation of SIP.

"Requirement 2.4.4: The protocol proxying facilities and transport practices MUST allow ADMINISTRATORS ways to enable and disable protocol activity through existing and commonly-deployed FIREWALLS. The protocol MUST specify how it can be effectively filtered by such FIREWALLS."

Although SIP itself runs on port 5060 by default, any other port can be used. It is simple to specify that IM should run on a different port, if so desired.

"Requirement 2.5.1. The protocol MUST provide means to ensure confidence that a received message (NOTIFICATION or INSTANT MESSAGE) has not been corrupted or tampered with."

This is supported by SIP's PGP and S/MIME authentication mechanisms.

"Requirement 2.5.2. The protocol MUST provide means to ensure confidence that a received message (NOTIFICATION or INSTANT MESSAGE) has not been recorded and played back by an adversary."

This is provided by SIP's challenge-response authentication mechanisms, through timestamp-based replay prevention, or through stateful storage of previous transaction identifiers (the combination of To, From, Call-ID, CSeq).

"Requirement 2.5.3. The protocol MUST provide means to ensure that a sent message (NOTIFICATION or INSTANT MESSAGE) is only readable by ENTITIES that the sender allows."

This is supported through SIP's end to end and hop by hop encryption mechanisms.

"Requirement 2.5.4. The protocol MUST allow any client to use the means to ensure non-corruption, non-playback, and privacy, but the protocol MUST NOT require that all clients use these means at all times."

All algorithms for security in SIP are optional.

"Requirement 4.1.1. All ENTITIES sending and receiving INSTANT MESSAGES MUST implement at least a common base format for INSTANT MESSAGES."

We specify text/plain here.

"Requirement 4.1.2. The common base format for an INSTANT MESSAGE MUST identify the sender and intended recipient."

This is accomplished with the To and From fields in SIP.

"Requirement 4.1.3. The common message format MUST include a return address for the receiver to reply to the sender with another INSTANT MESSAGE."

This is done through the Contact headers defined in SIP.

"Requirement 4.1.4. The common message format SHOULD include standard forms of addresses or contact means for media other than INSTANT MESSAGES, such as telephone numbers or email addresses."

SIP supports any URL format in the Contact headers. Furthermore, the body of a MESSAGE request can be multipart, and contain things like vCards.

"Requirement 4.1.5. The common message format MUST permit the encoding and identification of the message payload to allow for non-ASCII or encrypted content."

MIME content labeling is used in SIP.

"Requirement 4.1.6. The protocol must reflect best current practices related to internationalization."

SIP uses UTF-8 and is completely internationalized.

"Requirement 4.1.7. The protocol must reflect best current practices related to accessibility."

Additional requirements are needed on what is required for accessibility.

"Requirement 4.1.9. The working group MUST determine whether the common message format includes fields for numbering or identifying messages.
If there are such fields, the working group MUST define the scope within which such identifiers are unique and the acceptable means of generating such identifiers."

This is done with the combination of Call-ID and CSeq. The mechanisms for guaranteeing uniqueness are specified in SIP.

"Requirement 4.1.10. The common message format SHOULD be based on IETF-standard MIME [RFC 2045]."

SIP uses MIME.

"Requirement 4.2.1. The protocol MUST include mechanisms so that a sender can be informed of the SUCCESSFUL DELIVERY of an INSTANT MESSAGE or reasons for failure. The working group must determine what mechanisms apply when final delivery status is unknown, such as when a message is relayed to non-IMPP systems."

SIP specifies notification of successful delivery through 200 OK. When requests are delivered through gateways, success can be indicated only through the SIP component (if the gateway acts as a UAS/UAC) or through the entire system (if it acts like a proxy).

"Requirement 4.3.1. The transport of INSTANT MESSAGES MUST be sufficiently rapid to allow for comfortable conversational exchanges of short messages."

The support for end to end messaging (i.e., without intervening proxies) allows IMs to be delivered as rapidly as possible. The UDP reliability mechanisms also support fast recovery from loss.

7 Required SIP features

SIP contains many components and capabilities, only some of which are needed to support instant messaging.

It is a common misconception to believe that SIP is only good for initiating phone calls. Since SIP leaves the definition of a session to other protocols, such as the Session Description Protocol (SDP) [13], SIP is best viewed as a real-time rendezvous system, which allows content to be delivered from one user to the current location(s) where another user, the desired target, is located.
This rendezvous system can be used to deliver invitations to sessions, as is accomplished with the INVITE method, but other data, such as instant messages, can just as easily be delivered.

As such, most of the generic components of SIP as they relate to message routing are useful and needed for this extension, and most of those related specifically to INVITE, BYE, ACK, and CANCEL processing are not needed. This section outlines those components needed, and those not needed, for IM.

7.1 Needed components

The following are the SIP components needed in a user agent to support this extension:

   o  Basic SIP parser, capable of generating To, From, Call-ID, CSeq, Via, Route, Accept, Allow, Require, Record-Route, Expires, Contact, Content-Length, and Content-Type headers, in addition to the request and response line.

   o  UDP transmission mechanisms for non-INVITE requests, which is nothing more than a periodic retransmit of a request with exponential backoff.

   o  Implementation of the client and server state machine for non-INVITE requests (used for reliable transport), documented in Section 10.4.1 of RFC 2543.

   o  The ability to send SIP REGISTER requests, process responses, and refresh those registrations.

   o  Construction and usage of Route headers.

   o  Support the Require mechanism for protocol extension, as defined in Section 6.30 of RFC 2543.

   o  Reject requests with unknown methods, returning an Allow header in the response.

   o  Reject requests with unknown bodies, returning an Accept header in the response.

   o  Send and process SIP responses based solely on the 100s digit.

   o  Send responses based on the Via header processing rules of Section 6.40 of RFC 2543.

If a UA wishes to implement security, it needs to support the security mechanisms defined in RFC 2543.

A proxy for IM messages has even fewer requirements:

   o  Parse and generate SIP messages, understanding the To, From, Call-ID, CSeq, Via, Route, Record-Route, and Proxy-Require headers, in addition to the request and response line.

   o  If co-located with a registrar, process SIP REGISTER requests and generate responses.

   o  Perform the proxying functions described in Section 12 of RFC 2543; these rules mainly concern connection management, Via processing, loop detection, and transport.

7.2 Components not needed

User agents supporting IM do not need to support the following SIP capabilities:

   o  Processing of INVITE, ACK, CANCEL, BYE requests

   o  Support for the INVITE reliability mechanisms and state machines

   o  Multiple 200 OK responses

   o  SDP processing

   o  re-INVITEs

Elimination of INVITE processing alone results in a substantial reduction in required features.

8 Acknowledgements

The authors would like to thank the following people for their support of the concept of SIP for IM, support for this work, and for their useful comments and insights:

   Jon Peterson       Level(3) Communications
   Sean Olson         Ericsson
   Adam Roach         Ericsson
   Billy Biggs        University of Waterloo
   Stuart Barkley     UUNet
   Mauricio Arango    SUN
   Richard Shockey    Shockey Consulting LLC
   Jorgen Bjorker     Hotsip
   Henry Sinnreich    MCI Worldcom
   Ronald Akers       Motorola

9 Authors' Addresses

Jonathan Rosenberg
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: jdrosen@dynamicsoft.com

Dean Willis
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: dwillis@dynamicsoft.com

Robert Sparks
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: rsparks@dynamicsoft.com

Ben Campbell
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: bcampbell@dynamicsoft.com

Henning Schulzrinne
Columbia University
M/S 0401
1214 Amsterdam Ave.
New York, NY 10027-7003
email: schulzrinne@cs.columbia.edu

Jonathan Lennox
Columbia University
M/S 0401
1214 Amsterdam Ave.
New York, NY 10027-7003
email: lennox@cs.columbia.edu

Christian Huitema
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
email: huitema@microsoft.com

Bernard Aboba
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
email: bernarda@microsoft.com

David Gurle
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
email: dgurle@microsoft.com

David Oran
Cisco Systems
170 West Tasman Dr.
San Jose, CA 95134
email: oran@cisco.com

10 Bibliography

[1] C. A. DellaFera, M. W. Eichin, R. S. French, D. C. Jedlinsky, J. T. Kohl, and W. E. Sommerfeld, "The Zephyr notification service," in USENIX Winter Conference, (Dallas, Texas), Feb. 1988.

[2] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: session initiation protocol," Request for Comments 2543, Internet Engineering Task Force, Mar. 1999.

[3] M. Day, S. Aggarwal, G. Mohr, and J. Vincent, "Instant messaging / presence protocol requirements," Request for Comments 2779, Internet Engineering Task Force, Feb. 2000.

[4] M. Day, J. Rosenberg, and H. Sugano, "A model for presence and instant messaging," Request for Comments 2778, Internet Engineering Task Force, Feb. 2000.

[5] J. Rosenberg and H. Schulzrinne, "SCTP as a transport for SIP," Internet Draft, Internet Engineering Task Force, June 2000. Work in progress.

[6] S. Kent and R. Atkinson, "IP encapsulating security payload (ESP)," Request for Comments 2406, Internet Engineering Task Force, Nov. 1998.

[7] D. Harkins and D. Carrel, "The internet key exchange (IKE)," Request for Comments 2409, Internet Engineering Task Force, Nov. 1998.

[8] H. Schulzrinne and J. Rosenberg, "SIP caller preferences and callee capabilities," Internet Draft, Internet Engineering Task Force, Mar. 2000.
Work in progress.

[9] S. Donovan, "The SIP INFO method," Internet Draft, Internet Engineering Task Force, Apr. 2000. Work in progress.

[10] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: session initiation protocol (draft standard)," Internet Draft, Internet Engineering Task Force, June 2000.

[11] T. Dierks and C. Allen, "The TLS protocol version 1.0," Request for Comments 2246, Internet Engineering Task Force, Jan. 1999.

[12] A. Gulbrandsen, P. Vixie, and L. Esibov, "A DNS RR for specifying the location of services (DNS SRV)," Request for Comments 2782, Internet Engineering Task Force, Feb. 2000.

[13] M. Handley and V. Jacobson, "SDP: session description protocol," Request for Comments 2327, Internet Engineering Task Force, Apr. 1998.