Internet Engineering Task Force
MMusic WG
INTERNET-DRAFT
Scott Petrack, IBM
draft-petrack-sisp-00.txt
13 June 1996
Expires: December 1996

SISP - Simple Internet Signalling Protocol

Status of this Memo

This document is a first draft of an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. This document is truly rough, but it was felt that the timeliness of the ideas justified dissemination in this preliminary form.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

Simple Internet Signalling Protocol (SISP) performs one function: signalling of realtime data streams over IP networks. SISP has several distinguishing features: it is a tiny extension of RTCP, running over UDP or TCP, it can integrate very well with PSTN signalling, and it can run in very low bandwidth situations without disturbing the real time stream. It is completely scalable with respect to number of participants and also with respect to "tightness" of control, and can work with an extremely wide variety of conference models, policies, and standards. SISP differs from other conferencing protocols in that it performs a single essential task completely. It is argued that other protocols solve only parts of several overlapping problems.
SISP can serve as the lowest common denominator for signalling of real-time streams. The requirements that SISP fulfills, the features it offers, the fact that it uses RTCP as an encapsulation scheme, and its generally minimalist approach of solving one problem only are more important than the actual state machine it implements or the particular format of its messages.

1. Introduction

This paper discusses a solution to one particular aspect of the very large "session/conference control problem": the problem of signalling. It is also an attempt to help resolve a crisis.

There are at present two separate groups of applications which perform conferencing over the Internet. One is the suite of MBONE tools. These tools have little or no conference control built in. Separate protocols and tools are used to supply control if it is desired. This is quite a reasonable approach: after all, streaming is streaming and control is control. There is a long list of protocols emerging at present (SIP, SAP, SDP, SCIP, SCCP) which solve an equally long list of problems (location, announcement, setup, session description, scheduling, negotiation of capabilities). The problem is that these protocols have a large overlap, and in many cases they solve overlapping and ill-defined problems.

The second group is the plethora of commercial Internet telephones, videophones, and other real time communication applications. These applications often have control built directly into the real time stream. Of course, these applications are really very immature, and certainly have not done their IETF homework in almost any subject: none use multicast, few use RTP, and none are interoperable in any way, neither for control nor for streaming. Many people consider this a crisis, although oddly enough there are wildly differing views on what the crisis is.
Now of course it is very distasteful to have to deal with this second group of applications at all. One has the impression that there are no "real problems" here, certainly none worthy of real research time or thought. It seems clear that if one does some serious work on the first group of applications, then the commercial applications will fall into line as they realize the advantages of standardization.

This note argues otherwise. It begins with the question: "what is the absolute minimum infrastructure that must be in place to allow different multimedia conferencing applications to become interoperable?"

I claim that there is a very tiny thing which stands out: signalling of realtime streams. This is the mechanism by which one sets up, maintains, and tears down a realtime stream. The object of all conference control is to allow human users to pass real time streams amongst themselves (although of course there will be cases where some or all users are not human). Signalling is what happens at the very last stage, when all decisions about location, announcement, policy, and scheduling have taken place, and you want to set up the real time stream NOW. It also happens when the real time stream is already streaming, and you need to change some shared ephemeral state of it NOW.

SISP has no mechanism to perform location queries or scheduling of future conferences. In fact, it doesn't really know about conferences at all. It knows about a single RTP stream. SISP simply adds some new RTCP SDES items and a packet type to add some control to a single RTP stream. That's all. If you are satisfied with the loose control that RTCP gives over a real time stream, then you do not need to use the new packet type. Together, the new items and the new packet type allow one to scale from the currently available loose control up to a very tightly controlled model. SISP reuses a number of mechanisms that already exist within RTCP, such as SDES and CNAME.
In fact, it is better to say that SISP simply uses these mechanisms, not reuses them. One very important feature of RTP is that each real time stream is a separate entity with its own control; in the same way, SISP treats each real time stream as a separate entity. For example, this allows you to transfer the audio stream in an AudioVideo call without transferring the video stream. These sorts of services are very important. Rather than reinventing them, we get them from RTCP. In general, all issues relating to "shared ephemeral state" are implemented on a per-stream basis.

Of course, it is very desirable to have standard tools and protocols for location, etc., and of course there is overlap between the need to "announce" and "describe" and "invite." Unfortunately, these problems have not yet been defined and separated well enough, and this is the reason that there are many overlapping protocols which are solving many overlapping problems. We avoid this morass by simply not addressing it in any way. We solve the smaller and perfectly well defined problem of signalling. We certainly hope that solving this will help clarify the other higher level issues.

This paper is written from a double perspective. On the one hand, it is indeed a "letter from the front." The author is writing to the generals and strategists back home, describing a particular crisis. He has already done something about it, and he thinks that it is important the generals know. He is a bit upset that the guidelines he has from headquarters are a bit confusing and frankly confused.

From another point of view, the author believes that the knot of overlapping requirements and protocols is making for bad strategy. He has an alternative solution, and he thinks that at least some of its features are truly superior to what now exists. He hopes that the following will contribute to untangling the problem.
The author's goal is thus a contribution both to the "problem" of overlapping protocols and also to the "crisis" of non-interoperable Internet Telephones and VideoPhones.

With all this out of the way, the rest of the paper can be more straightforward. We will describe the motivation for SISP, the motivation for using RTCP as transport, basic features of SISP, and a few signal flows.

For many reasons (including the "shared ephemeral state" of people submitting internet drafts in June 1996), a great amount of important detail is missing from this paper. Apart from simply regretting this, the author wishes to state that the two main ideas of this paper, to use RTCP and to separate out signalling from all other multimedia conferencing problems, can be explained without reference to the bit patterns of the new RTCP packets proposed.

In a final section, I broadly compare and contrast SISP with various features of SIP, SAP, SDP, SCIP, and SCCP. It is important to understand the claim that non-SISP protocols try to do too much on the one hand, and on the other are not quite rich enough.

At first we thought to call the new protocol "YACC" - "yet another conference control." We have convinced ourselves that this would not be accurate: SISP separates out one particular, specific, essential problem, and solves it.

2. RTCP - a model of what is needed

The basic features of SISP stem directly from the decision to use RTCP as a basis. So it makes sense to begin with a discussion of the principles that impelled us to such a decision.

As explained above, our motivation was to perform signalling, in the dictionary sense of "an act, event, or watchword that has been agreed upon as the occasion of concerted action." That is, signalling consists of those messages involved in call setup, teardown, and maintenance which cause an action to happen NOW.
In particular, the thing of interest which is acted upon is a real time media stream, which in our world is an RTP [1] stream. So we wish to send messages to control real time streams in real time.

Now of course it might be interesting to discuss whether we really want to control real time streams, or some higher level thing like a "session" or a "conference." But it should be clear that whatever higher level constructs one makes, at some point this turns into control of some real time stream. When a user joins or leaves a conference, for example, a real time stream starts or stops flowing over a portion of the Internet, whatever particular meaning you like to assign to the words "user", "join", "leave," or "conference".

Since we have to control these RTP streams, it is natural to see what they are made of, and what already exists to signal them.

Looking at the definition of RTP, one discovers that it contains RTCP, a "control protocol," by definition. In fact, the RTP specification states explicitly that "RTCP may be sufficient for `loosely controlled' sessions" [1,p.2]. Hmmm. It is certainly natural to try to make precise just which signalling and control functions are already in RTCP, before going on to invent something new. In any case, we would certainly like to have a signalling mechanism which can scale from very loose to very tight control. If RTCP covers one end of the spectrum, it is interesting to see how far it can be pushed, at what point it breaks, and whether a continuum can be built with it as one end.

One discovers that RTCP is pretty rich already. For example, "by having each participant send its control packets to all the others, each can independently observe the number of participants." [1,p.15] This is certainly some sort of session awareness, of "shared ephemeral state" in the sense of [2].
There is also a great deal of information sent about the sender in the RTCP SDES packet. Although it is not clear at first why one needs to know the email address of the person to whom one is talking (I don't know the email address of many of the people I talk to over the telephone), the fact is that *all* the current suggestions within MMUSIC seem to think that this is very important. Luckily for us, then, every RTP stream is already required to have this information transmitted within an SDES packet. So applications already have code to transmit this information. It seems a shame to code it again.

In fact, RTCP already solves some other difficult problems in multimedia signalling. Consider the problem of how to define a "session" or "conference." In RTCP, one has the notion of a "Canonical Name" (CNAME). This is used in the RTCP packets so that different RTP streams can be associated. For example, this is how one can know that a particular RTP video stream and a different RTP audio stream are in fact meant to be a synchronized VideoPhone call.

What more natural thing to do than to use an RTCP stream to convey all this information which is vital to call signalling? For example, in a very tightly controlled conference (like an ordinary phone call), one might start by sending an RTCP stream with a CNAME and SDES and other necessary information, and when the necessary shared state has been obtained, the RTP stream itself can begin. If there are several RTP streams that make up a session, one could actually keep one RTCP stream exclusively for signalling, or just add new RTCP and RTP streams as needed.

In fact, just by using RTP one can get a range of loose to tight control models.
As one example among many others, if one wishes to have a multicast session where some parameters cannot be announced in advance, then one can send out the required parameters in an SDES packet, and any RTCP monitor looking at the traffic can join the conference. If one wants tighter control, then security and encryption are both a part of RTCP already.

Before we come to the bad news, let us continue and see how RTCP solves some problems of signalling simply and naturally.

The BYE packet of RTCP is actually a true signal, in that it does indeed constitute a "watchword which has been agreed upon as the occasion of concerted action." (Well, in an extremely loosely controlled conference this may not be true, but in such a case the BYE is not very important.)

Here is a more sophisticated reason to use RTCP as a signalling mechanism: signalling often involves precise timing considerations. The need for precise timestamps to deal with some aspects of "shared ephemeral state" is carefully discussed in [2]. Indeed, in the public telephone network (PSTN), passing these strict timing requirements is one aspect of the process of homologation. RTCP packets come with timestamps as well.

Another advantage of RTCP is that it allows for separate signalling for each real time stream. For example, if I wish to transfer a VideoPhone conversation to someone who is connected only by telephone, I might wish to transfer the audio stream of the call, but not the video stream. It would be unfortunate if the transfer had to fail just because the third party had no video support. Just as one doesn't want to *force* someone to associate or interleave two separate streams, logically or physically, one shouldn't try to force association of signalling either. Problems that arise because of bandwidth considerations are best dealt with by RTCP compression [3,4], not by forcing users to have reduced functionality.
Finally, for those applications which run on very low bandwidth links, using RTCP has two advantages, one of which is perhaps a bit subtle.

First, we have seen that many things one needs to send for signalling are required in any case by RTCP. So apart from saving tired fingers the trouble of writing new code, using RTCP can also save bandwidth.

Second, on a very low bandwidth link, merely sending a signal can overload the bandwidth when an application is sending true RTP. Now there is a small amount of tolerance for when one can send an RTCP packet. In particular, a clever application can arrange to send the RTCP packet during a "silence" period, which for the present purpose just means a period when the RTP stream is not itself transmitting. This is sometimes difficult, but a good application will know how to exploit this. Of course, one can perform similar juggling with the signals, but if one is transmitting signals for an RTP stream along with the RTCP stream, it is obviously easier to coordinate things.

I hope that the reader is convinced that RTCP is already well on the way to enabling signalling for real time streams that is robust, flexible on the scale of loose/tight control, and very efficient in bandwidth and implementation.

3. Extensions to RTCP

Unfortunately, there are indeed some needed messages that are missing from RTCP. Not surprisingly, these are needed precisely to fill out the "tightly controlled" end of the scale. What is amazing is that so little is needed.

I am sorry to be very informal here, and beg the extreme indulgence of the reader. A commitment is made to provide details at a later date.

3.1 RDES - receiver description packet type

In a tightly coupled conference you clearly need to identify the person you wish to speak to. Now exactly what "identify" means is of course an interesting subject.
We can say what we mean quite precisely: it is assumed that the remote machine receiving the RTCP packets has some means of identifying the person you wish to contact. It is the duty of a decent "location service" to provide both the address (IP, port) of the machine to receive the real time stream, and hence the RTCP signalling, as well as the tag/value information needed to identify the actual remote party. How this location service works is beyond the scope of the signalling-for-RTP-streams considered here. In any case, the RDES message should have exactly the description needed to identify the remote party.

Of course, there is no *requirement* that one use the RDES field to tightly control a conference. One could imagine a private multicast to thousands of members of a cult, where the standard methods of RTCP security could be used to control conference membership very tightly. But it is equally obvious that one mechanism for tight control is that an RDES message should be sent at the very beginning of a call to identify the called party.

It should come as no surprise that the suggested format for an RDES message is identical to that of an SDES message. We shall give an example of the use of the RDES message in the appendix.

3.2 RCAP item - receiver capabilities, new item in SDES

Until now, I have not given a single hint of any signal flow. I have given no model of any kind for control. The next item type does seem to be related to a particular model for signal flows. But in fact it does not really forbid any particular model.

The "receiver capabilities" items list the RTP payload types that the particular receiver is willing to accept. Note once again that I make no assumption whatsoever about how this list is obtained.
It may be a list of the coders that the receiver's machine can actually decode, or it may be a subset of that list based on such things as available machine resources, hierarchy within an organization, or the phase of the moon. As far as the needs of signalling go, a potential receiver must be able to send out a list of those RTP payload types that it is willing to receive right now. This list can contain any of the accepted standard RTP payload types, or elements of some other list of payload types agreed upon by non-RTP means.

An example of setting up a simple call will be given in the appendix, but it may help to state here that the basic mode of call setup is inspired by the H.245 capabilities exchange of ITU-T standard H.323. Namely, the receiver merely lists the payload types that it is willing to accept, and then the transmitter chooses one of those types for transmission. Note that we can agree that the order of payload types within a list describes the receiver's order of preference. Note also that we need no special new item in SDES to describe what is actually being sent. This is done in the payload type of RTP.

It might be rather confusing that the RCAP item type is found in the SDES packet, and not in the RDES packet. This is pure logic: an RDES packet is sent in order to identify the *remote* party. But one sends an SDES packet to describe "oneself," and part of this description is what one is willing to tolerate receiving.

3.3 CP item - call progress, new item in SDES

The final item that we need to add is one that allows call progress to occur. Call progress is the feedback that one obtains from the network system during the life of a call. For example, you hear a particular sound after you dial a remote user, and you know that his or her equipment is ringing.

The call progress words currently supported by SISP are the following: Ringing, Accept, Busy, NoAnswer, Reject, Pause, Error, Release, Info.
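The two new SDES items of Sections 3.2 and 3.3 can be illustrated with a minimal Python sketch. It is not part of SISP as specified: the draft defines no wire format, so the names `CallProgress` and `choose_payload_type` and the integer payload type values are assumptions made purely for illustration. The sketch also assumes the optional convention mentioned in Section 3.2 that the RCAP list is ordered by receiver preference.

```python
from enum import Enum
from typing import Optional, Sequence


class CallProgress(Enum):
    """The nine call progress words listed in Section 3.3 (names are hypothetical)."""
    RINGING = "Ringing"
    ACCEPT = "Accept"
    BUSY = "Busy"
    NO_ANSWER = "NoAnswer"
    REJECT = "Reject"
    PAUSE = "Pause"
    ERROR = "Error"
    RELEASE = "Release"
    INFO = "Info"


def choose_payload_type(rcap: Sequence[int],
                        sendable: Sequence[int]) -> Optional[int]:
    """Pick the payload type the transmitter will use.

    rcap     -- payload types the receiver is willing to accept,
                assumed to be in the receiver's order of preference.
    sendable -- payload types the transmitter is able to send.

    Returns the receiver's most preferred type that the transmitter
    can send, or None if the two lists have nothing in common.
    """
    can_send = set(sendable)
    for payload_type in rcap:
        if payload_type in can_send:
            return payload_type
    return None
```

For instance, with a receiver list of `[3, 0, 8]` and a transmitter that can send `[8, 0]`, the transmitter would choose `0`, the first entry in the receiver's list that it can produce. The chosen type then simply appears in the payload type field of the RTP packets; no extra SDES item announces it.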
When the remote party's application is "ringing," it sends an SDES packet with the CP item set to "ringing." The local application receives this and can send the appropriate message ("Drrrrrrrring!") to the local user.

These are new items for an SDES packet because we think of the user who is "ringing", or who "accepts" a call, as describing "himself" in an SDES packet. Unfortunately, this means that even if I am a receiver only, I might send an SDES packet, for example to say that my app is "ringing." This has caused some confusion, even with the author. It is really the "receiver" who is ringing, or busy, or pausing. But it seems to be RTCP convention that the sender of an SDES packet is describing "himself". Originally, these call progress items were part of the receiver report. There was no release item, and the BYE packet was used instead. (This whole point needs to be clarified.)

Although the general meaning of each word is clear, there are a few comments to make about some of the CP items:

Accept: The application sends an "accept" CP item when it is ready for the other side to start streaming its RTP data. Of course, there are many conference models where this makes no sense. For example, in a loosely defined model, I certainly don't want to wait for an accept message to begin streaming. This is entirely correct. SISP does not *require* that an application send an "accept" message before the remote party begins to stream. Whether or not this is necessary is decided by means totally outside of SISP, and is definitely a part of the conference model being used. This will be decided by a session "announcement" or "description", or some other means. SISP is merely a signalling protocol. SISP claims that RTCP, supplemented with the very few additions here, is rich enough for all Internet signalling needs.

One can make a similar comment about every item of type CP (Call Progress).
Indeed, we have seen that in the loosest conference model, RTCP suffices (the RTP standard says it does, so this statement is by IETF consensus true). But if one wants to distinguish, for example, between a call that is rejected because there was no answer and one that is rejected because the user made an active choice to reject it, one can use SISP to do this. We shall see another example of the fact that SISP does not mandate conference policy, but merely allows one to express it, in the appendix.

Pause: This SDES item just says that the receiver is stopping receiving "for a while". It is an indication that the receiver has "put you on hold." Note that I did not include a "Resume" item type. When I put you on hold, you really have no way of knowing this in an ordinary call. But one might wish to add the signal.

Release: On the face of it, this is a totally superfluous item, because there is a BYE packet. It was added so that a receiver can also request or announce that s/he will disconnect. It was also added to allow for some more complicated supplementary services. The idea is that many supplementary services end with a simultaneous release and an instruction for one party to do something else. For example, consider a blind call transfer: A has called B, and after some time A would like B to speak to C instead, but doesn't need to inform C about it first (this is the "blind" part of the transfer). Then A would send B an SDES with a release CP item, along with an INFO CP item which said "call C."

This last example is one of very many supplementary services. The author has checked that the very simple list here is enough to implement the gamut of supplementary services. Signal flows will be given in the next version of this document.
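The blind call transfer described under Release can be sketched as follows. This is only an illustration under stated assumptions: the draft does not define how CP items are encoded or how the INFO item carries its text, so the `SdesPacket` class, the `("CP", value)` tuples, and the `"Info:call <target>"` convention are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SdesPacket:
    """Hypothetical in-memory view of an SDES packet: a CNAME plus items."""
    cname: str
    items: List[Tuple[str, str]] = field(default_factory=list)


def blind_transfer(cname_a: str, target: str) -> SdesPacket:
    """Build the packet A sends to B: a release plus an 'Info' instruction.

    The 'Info:call <target>' text format is an assumption; the draft
    only says the INFO item "said 'call C'".
    """
    return SdesPacket(
        cname=cname_a,
        items=[("CP", "Release"), ("CP", "Info:call " + target)],
    )


def actions_for(packet: SdesPacket) -> List[Tuple[str, str]]:
    """Translate the CP items B receives into local actions, in order."""
    actions = []
    for name, value in packet.items:
        if name != "CP":
            continue
        word, _, text = value.partition(":")
        if word == "Release":
            # Stop the stream shared with the sender of this packet.
            actions.append(("release", packet.cname))
        elif word == "Info" and text.startswith("call "):
            # Start a new call to the party named in the INFO item.
            actions.append(("call", text[len("call "):]))
    return actions
```

In this sketch B first releases its stream with A and then places a new call to C, without C having been informed beforehand. Whether B is obliged to follow the INFO instruction is a matter of conference policy, which SISP deliberately leaves open.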
The conclusion of this is that by adding only three things to RTCP - one packet type and two new SDES items - it is possible to use RTCP to implement the full range of signalling needed for Internet conferencing applications.

4. Complaints, complaints.

With such a scant description of SISP, it would be highly inappropriate to criticize in detail other attempts to provide for Internet signalling. We shall instead try to list general objections to previous solutions.

First, signalling should be totally separate from the location service. Of course, a location server may indeed use SISP if that is appropriate. But that would be signalling for the location server call, not for the actual call one wishes to make. SISP begins its function after the location of the remote party has already been decided.

Second, the signalling protocol should allow for any conference model. For example, a protocol which *forces* an application to distinguish "reject" and "no answer" is flawed: the user may not wish to convey the information that s/he rejected an invitation to confer. Certainly there cannot be a requirement for any centralized statekeeper if one wishes to include loosely controlled multicast conferences.

Third, there must be the possibility of dynamic negotiation of capabilities in real time, via signalling at the time of connection. This is because one may need to reserve machine resources, and one can only do this "at the last minute."

Fourth, it is important to allow for independent signalling on independent RTP streams. This in itself is a strong motivation for starting with RTCP.

Fifth, it is important to be truly scalable, in terms of available bandwidth, number of users in the session, and along the tight-loose control axis. This is not an easy problem, and much work has gone into RTCP to this end.

Sixth, it is important that the signalling allow for timestamps on signals.
Seventh, it is important in some applications that the signalling itself be secure. At the other extreme, for some loosely controlled conferences, it is useful to have "signal monitors" that can "pick up" enough of the required information to join the conference.

Eighth, by definition RTCP is present wherever RTP is. It is far from clear that SMTP, HTTP, etc. will be there. (Imagine very small, cheap, special purpose communication devices.)

Ninth, the "global id" problem is quite complicated, and tying down multimedia conferencing to any particular solution of this problem is difficult. In any case, the part of the problem that is location should be treated by a location server, and the part of the global id problem that relates to shared ephemeral state is best treated by the simple CNAME mechanism of RTP. The part of the problem relating to things like dynamic IP or "Integration into Email," for example, is not really a problem that is related to signalling.

5. Reliability of SISP messages

The reader may have the impression that the author has somehow forgotten that RTCP is not reliable. Indeed, in trials he has simply used TCP for the RTCP flow. Since the RTCP traffic is really very slight, this has not caused problems, even on slow serial links. (In fact, because of TCP/IP compression, TCP is usually a more efficient choice over a dial-up link!) Of course, in situations where it is not possible to use TCP, some other means must be used to ensure the reliability of the SISP signalling.

6. Signal Flows

(These must be written up, but I shall give at least one!)

6.1. Of course, the simple open multicast conference is an example of SISP signalling, as is any other conference which relies on some non-RTP means to determine location, and then only RTCP for conference control. But we shall give an example of a simple Internet Telephone Call, using SISP.
In the following example A wishes to call B. For simplicity the precise timings are not given, but the packets sent are written in time order.

a. A sends B the following RTCP packets, in this order (RTP is not yet being sent):

   SDES: identifies the caller. (This is optional.)

   RDES: identifies the callee. The information used in this packet is obtained from some location server or other means.

   RCAP: identifies the reception capabilities of A (remember that if there is more than one RTP stream, then there will be more than one SISP stream as well).

b. B receives the three packets, and perhaps it consults with the OS and with some databases. It starts a ringing signal to the user, and sends the following packet to A:

   SDES: identifies B, and sends the "ringing" CP item.

c. Perhaps after some consultation with the user, with some databases, and with the operating system, B sends the following RTCP packet:

   SDES: identifies B and sends the "accept" CP item.

d. Upon getting the "accept" message, A knows that it can start streaming. It sends the following packet to B:

   SDES: identifies A and sends the "accept" CP item.

And now B knows that it can start streaming as well.

Acknowledgements

The author wishes to thank Ed Ellesson of IBM for helpful ideas and advice, encouragement, and tolerance.

References

[1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889.

[2] S. Shenker, A. Weinrib, E. Schooler, "Managing Shared Ephemeral Teleconferencing State: Policy and Mechanism", draft-mmusic-ietf-agree-00.ps

[3] S. Casner and V. Jacobson, "Compressing IP/UDP/RTP Headers for Low-Speed Serial Links", draft-casner-jacobson-crtp-00.txt

[4] S. Petrack, "Compression of Headers in RTP Streams", draft-petrack-crtp-00.txt

Author's Location Information

Name=Scott Petrack
Address=IBM Haifa Research Lab, Haifa 31905, Israel
Email=petrack@vnet.ibm.com
Telephone=+972 4 829 6290
Fax=+972 4 829 6112
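As an illustration only, the simple call of Section 6.1 can be traced with a short Python sketch. It models no timing, transport, or packet encoding; the function name `simple_call_flow` and the `(sender, packet, content)` tuples are assumptions made for this sketch. RCAP is shown as its own entry because Section 6.1 lists it that way, although Section 3.2 defines it as an item inside SDES.

```python
from typing import List, Set, Tuple

# (sender, packet kind, human-readable content); a hypothetical trace format.
Message = Tuple[str, str, str]


def simple_call_flow(caller: str = "A",
                     callee: str = "B") -> Tuple[List[Message], Set[str]]:
    """Replay steps a-d of Section 6.1 for a call that is accepted.

    Returns the packets in time order and the set of parties that
    know they may start streaming RTP once the flow completes.
    """
    trace: List[Message] = []
    may_stream: Set[str] = set()

    # a. The caller identifies itself (optional), the callee, and
    #    its own reception capabilities. No RTP is sent yet.
    trace.append((caller, "SDES", "identifies " + caller))
    trace.append((caller, "RDES", "identifies " + callee))
    trace.append((caller, "RCAP", "reception capabilities of " + caller))

    # b. The callee alerts its user and reports call progress.
    trace.append((callee, "SDES", "CP=Ringing"))

    # c. The callee accepts; on receipt the caller may start streaming.
    trace.append((callee, "SDES", "CP=Accept"))
    may_stream.add(caller)

    # d. The caller accepts in turn; now the callee may stream as well.
    trace.append((caller, "SDES", "CP=Accept"))
    may_stream.add(callee)

    return trace, may_stream
```

Calling `simple_call_flow()` yields six packets, three from A followed by "ringing" and "accept" from B and a final "accept" from A, after which both parties may stream. Under a looser conference model the accept steps could be skipped entirely, since SISP does not require them.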