SIPPING                                                       B. Stucker
Internet-Draft                                                    Nortel
Intended status: Informational                          October 18, 2006
Expires: April 21, 2007

    Coping with Early Media in the Session Initiation Protocol (SIP)
              draft-stucker-sipping-early-media-coping-03

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 21, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   Several mechanisms for early media have been proposed in the past, each attacking a different aspect of the problem. A good example of this is RFC-3960, which describes two models of early media: the gateway model and the application server model. The gateway model uses a series of offer/answer exchanges to control the rendering of early media, but breaks down in the presence of forking (as mentioned in section 3 of RFC-3960). The application server model relies on the UAS to know when it is generating early media and to use RFC-3959 to keep early media and regular media streams separate to avoid clipping.

   Even in the presence of the recommendations in RFC-3960, some problems remain within SIP in the area of early media. Some of these challenges are likely never to be overcome, for example when interworking with a PSTN gateway that does not take into account CPG or ACM messages (in the case of ISUP). However, the potential to improve on what is already there does exist.

   This document picks up where RFC-3960 left off: it goes into more detail on early media, describes the mechanisms that existing implementations use today to deal with the challenges at hand, and derives requirements and a possible mechanism to improve upon the current model. In addition, the document goes into other areas that can complicate or be complicated by the presence of early media (especially with forking) such as SRTP keying and media flow authorization.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Types of Early Media
     3.1.  Pre-routing early media
     3.2.  Pre-presentation early media
     3.3.  Post-presentation early media
     3.4.  Non-SDP early media
   4.  Current common coping mechanisms for early media
     4.1.  Problems with current coping mechanisms
       4.1.1.  Proxy-side coping mechanisms
         4.1.1.1.  Proxy SDP stripping
         4.1.1.2.  Proxy SDP weighting
       4.1.2.  Client-side coping mechanisms
         4.1.2.1.  Client detection of forking
         4.1.2.2.  Client slow-start INVITE
         4.1.2.3.  Client Usage of Gateway Model
         4.1.2.4.  Client Usage of Application Server Model
   5.  Requirements
     5.1.  Deprecation of forking
     5.2.  Deprecation of early media
     5.3.  Originating UA's to render early media
     5.4.  Downstream signaling of acceptance
     5.5.  Upstream signaling of importance
     5.6.  Universal backward-compatibility
     5.7.  Recursive forking
     5.8.  Media Gating
   6.  Recommendations
     6.1.  Early Media Classification and Prioritization
       6.1.1.  Overview
         6.1.1.1.  Early-Media Classifications
     6.2.  Early Media Flow Negotiation
       6.2.1.  Overview
       6.2.2.  SDP parameters
       6.2.3.  Usage of emflow with offer/answer
         6.2.3.1.  Meaning of a=emflow:none
         6.2.3.2.  Meaning of a=emflow:send
         6.2.3.3.  Meaning of a=emflow:recv
         6.2.3.4.  Meaning of a=emflow:sendrecv
         6.2.3.5.  Usage of RTP-SSRC-Value
       6.2.4.  Option tag for emflow
       6.2.5.  Example
     6.3.  Early Media and SRTP
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informational References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   One of the mechanisms within SIP [RFC3261] that has caused much consternation (and interesting service scenarios) is forking, especially forking of INVITE requests. This is where a SIP INVITE request sent to a SIP proxy is resolved into two or more destinations which are signaled in parallel or sequentially by the proxy. When this occurs, multiple downstream parties will receive similar INVITE requests to initiate a SIP session from a given originating SIP user agent (UA). This creates the possibility of race conditions in which the provisional and final responses to this request, as observed by the originating SIP UA, may arrive in any order, or not at all.

   Another mechanism in SIP that looks simple, but causes difficult interactions, was introduced to handle SIP to PSTN interworking. Because the PSTN has a specific set of behaviors which require that only one endpoint in the PSTN network (typically the last PSTN switch reached) may generate media back to the originator of a PSTN call, generation of early media (media produced prior to the intended terminator of a call answering the call) is relatively straightforward there. In SIP, this PSTN interaction with early media was handled by allowing any endpoint that has received an SDP offer as part of setting up a session to immediately generate media back to the SDP offerer. Further, the SDP offerer was obligated to be prepared to render any media received at the location specified in the SDP offer at any time as long as the session was in a setup or stable state.

   Each of these mechanisms, taken separately, can create complex signaling flows and difficult service interactions to resolve.
   Together, however, they compound the effects of one another to create an area of study that has been open within the SIP design community for some time.

   Several extensions to [RFC3261] have been proposed to handle some of the various effects that early media suffers from, most notably [RFC3959] and [RFC3960]. However, none has fully addressed a few key areas of interest:

   o  Controlling the order and timing of early media stream rendering at the originating SIP UAC.

   o  Knowing under what general conditions early media flows are potentially being sent to the originating SIP UAC.

   This document seeks to capture the salient requirements for these areas, and to propose a mechanism for handling these early media interactions in a more predictable manner.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [3].

3.  Types of Early Media

   Not all early media is created equal; some types are more problematic than others. There are four generic types of early media within SIP:

   1.  Pre-routing early media - This is early media that is conveyed via SDP and is presented to the originator by a proxy before routing on the URI is started.

   2.  Pre-presentation early media - This is early media that is presented to the originator, conveyed via SDP, by a proxy after the URI has been routed upon, but before any forwarding of the INVITE request has occurred.

   3.  Post-presentation early media - This is early media that is presented to the originator, conveyed via SDP, by either a forking proxy or any subsequent hop after the INVITE request has been forwarded from the proxy.

   4.
       Non-SDP early media - This is early media that may be presented to the originator at any time through means other than SDP, such as the Alert-Info header as defined in [RFC3261].

3.1.  Pre-routing early media

   Pre-routing early media is typically generated and characterized by a proxy that has an associated media resource. An example of this type of early media would be a brief 'branding' message that is played to the originator thanking them for using the service provider associated with the originator's local outbound proxy. When the message ends, the media resource signals this to the proxy and routing of the request continues per [RFC3261].

   This type of early media typically does not cause the originator's local outbound proxy any issues unless the client is using one of the mechanisms defined in Section 4.1.2 or something similar. This is because the proxy is in complete control of the pace at which the request is routed to the terminator relative to the media stream being presented.

   If the proxy attempting to present pre-routing early media to the originator is downstream of the originator's local outbound proxy, then the service may not work because upstream proxies may employ one of the mechanisms described in Section 4.1.1.

3.2.  Pre-presentation early media

   Pre-presentation early media is similar to pre-routing early media except that it may take into account the routes that the proxy is about to route the INVITE request to in its decision of what to play. This may allow the proxy to employ one of the proxy-side early media coping mechanisms defined in Section 4.1.1. Likewise, the proxy may inject its own SDP answer into the signaling to the originator to kick off services like colorful ringback tone (CRBT), where the originator hears a recording (typically music) selected by the terminator while the network attempts to reach the terminating party.

   Pre-presentation early media also differs from non-SDP early media in that the proxy or proxies are manipulating the SDP offer/answer rather than SIP headers such as Alert-Info (as defined in [RFC3261]) to signify what media the originator should be rendering. There are several potential reasons why the Alert-Info header is not used in this case: the service may be interactive, requiring two-way media in order to work (such as digit collection for a credit card number), or may not want to rely on the originator's ability to render the information in the Alert-Info header to the end user (such as a call originating from the PSTN through a SIP gateway).

3.3.  Post-presentation early media

   Post-presentation early media is most typically characterized by the ugly interactions that arise between it and forking. Because this early media comes about after the proxy has potentially caused multiple endpoints to be contacted, multiple early media streams may have been triggered. It is therefore commonly considered to be the worst-case scenario with early media.

   To compound the basic issue at play, the presence of forking can confuse the type of early media being presented to the originator. A downstream proxy that has received a forked request may not be aware that the INVITE has forked, because a B2BUA may have forked the request. As a result, the proxy may act as if it is the only proxy to handle the request from the originator, and operate in a pre-routing, pre-presentation, or non-SDP early media mode despite the fact that the early media reaching the originator is post-presentation. Therefore, unless the proxy is the originator's edge proxy, it cannot necessarily determine what kind of early media it may actually be sending to the originator.

3.4.
Non-SDP early media

   Non-SDP early media is typically characterized by the presence of an Alert-Info header [RFC3261]. The Alert-Info header specifies a URI that the originator may go to in order to receive a file or stream that contains information (such as a wave recording) about the ringback tone the terminator wishes the originator to hear. It is somewhat simpler in that it is not part of the offer/answer model, and that it is not trying to create a two-way media stream.

   The interaction between inband ringback, client generated (local) ringback, and other forms of early media is spelled out in [RFC3960]. It is worth noting that rendering the Alert-Info header contents should only be done when the origin of the header is trusted (per [RFC3261]), so this may limit its usefulness to a considerable degree. The remainder of this document assumes that the UAC and UAS follow the advice in [RFC3960] with respect to interactions with early media.

   Although non-SDP early media is for future study, it is envisioned that this document would clarify the behavioral interactions between non-SDP early media and other types of early media.

4.  Current common coping mechanisms for early media

   A number of mechanisms exist for coping with early media. They all rely, generally, on 'fixing' the early media problem by 'breaking' the behaviors specified in other RFCs (or at least bending the spirit of them to some extent):

   1.  Proxy SDP stripping - If a proxy detects that it is about to fork an INVITE, it keeps track of this fact in its processing state for the INVITE transaction. Any SDP answers in provisional responses are stripped before being forwarded upstream. If SDP is not already present in the 200 response, the proxy may add the SDP answer from the last provisional response it received to the 200 response sent upstream, to ensure that the offer/answer exchange is completed. This effectively turns off early media.

   2.
       Proxy SDP weighting - If a proxy detects that it had previously forked the INVITE to which it is now receiving a provisional response, it may allow a particular provisional response to retain the SDP answer in the message body and strip other SDP answers in provisional responses per the proxy SDP stripping methodology. This mechanism is used to favor SDP that the proxy may have some control over. For instance, if the proxy knows that one forked leg is to a media server streaming CRBT media to the originator, it may allow that SDP answer to flow back, but block all other SDP answers on other legs in the meantime.

   3.  Client detection of forking - Clients may start out playing local ringback to the originator until the first SDP answer is received. When the first SDP answer is received, the client may switch to playing the media for that SDP answer. However, upon detecting that the INVITE forked through subsequent provisionals being received (reception of two or more distinct SDP answers or [RFC3261] 'TO' header tags), the client may irrevocably return to playing local ringback. At this point, the client is likely to continue playing local ringback until the call is answered, or an error condition arises.

   4.  Client slow-start - Clients may wish to simply not include any SDP in their initial INVITE message in order to accumulate a set of SDP offers from their prospective terminating endpoints. Such INVITEs are known as 'slow-start' INVITEs, because the SDP offer/answer exchange gets off to a 'slow start'. These may also be used in protocol interworkings (notably H.323 to SIP) with no intent of managing early media. The client can use either PRACK or UPDATE to respond to offers received in provisional responses at the point in time the originating client wishes to stage early media streams.

4.1.  Problems with current coping mechanisms

4.1.1.
Proxy-side coping mechanisms

4.1.1.1.  Proxy SDP stripping

   This is a very common mechanism, perhaps second only to the two client mechanisms mentioned above. When a proxy employs this mechanism, it remembers when forking has occurred and removes any SDP in provisional responses as a result. This means that if the originator supports reliable provisional responses (100rel) as defined in [RFC3262], this option tag must be removed by the proxy before forwarding the INVITE to each forked leg. Otherwise the proxy may be forced to handle SDP negotiation within a PRACK transaction on behalf of the originating client with little or no information about the originating client's capabilities. In the case that the originator requires support for PRACK, the proxy may have to fail the call setup, handle very complex negotiation signaling in the case that the call forks, or simply not fork the call.

   Additionally, this mechanism completely breaks any early media services or announcements, some of which may be critical to proper completion or billing disposition of the call upon answer. For instance, the call may fork to a PSTN gateway that is trying to tell the originator that it is about to bill them $500 to complete the current call. With proxy SDP stripping this announcement would not be heard by the originator.

4.1.1.2.  Proxy SDP weighting

   Proxy weighting of SDP can be useful in situations where the proxy knows what is going on with the call routing for each leg. However, lack of information as to why downstream elements are sending SDP in provisional responses can cause proxies to weight the SDP incorrectly. Further, if multiple proxies are traversed, the SDP that is accepted for delivery to the originating UA may not be the SDP selected at any given proxy.
   There is no indication to downstream network clients as to what has happened with their SDP as it traverses proxies back upstream towards the originator. Likewise, the $500 warning announcement presented in the previous section may or may not be heard.

4.1.2.  Client-side coping mechanisms

4.1.2.1.  Client detection of forking

   With this mechanism, a client may play audible ringback while waiting for an initial provisional or final response to an INVITE message it originated. When the first provisional response with SDP is received, it may switch from playing audible ringback to rendering the media stream defined in the SDP. If a subsequent provisional response is received from a different endpoint (identifiable by a different to-tag in the 'TO' header as defined in [RFC3261]), it stops rendering any early media packets it is receiving and typically returns to audible ringback. Upon receiving a non-3xx final response, the UA switches media appropriately to the response. For 3xx responses, the client continues to play audible ringback if that is what is currently being rendered, or switches (typically) to ringback again if it was rendering media packets.

   This mechanism is used by client devices for a number of reasons:

   o  What gets presented to the end user is predictable.

   o  It does not rely on the set of proxies handling any given INVITE request to do anything special.

   o  It is easy to implement.

   The problem with this approach is that it often causes early media to break altogether. If a leg that the call was forked to is awaiting media from the originating client (such as prompting for digit collection, like a credit card number or extension), that leg's early media may fail due to other provisional responses sent to the originator by other call legs. Network and terminating services that utilize early media are likely to fail or work erratically (due to race conditions between messages) when an originating client behaves in this manner.
   What's worse is that there is no indication to the downstream network elements as to what the originating client is doing. Because the client switches to locally generated playback and ignores early media RTP streams prior to receiving a final response to the INVITE, there is the opportunity for clipping to occur if the SIP signaling path latency lags the media path latency.

4.1.2.2.  Client slow-start INVITE

   Slow-start INVITEs circumvent the problem of having to immediately render media packets from an unknown set of terminating endpoints by not giving those endpoints anywhere to send the media to. However, this mechanism has some serious drawbacks, most notably guaranteed clipping (potentially severe if the SDP offer is not received from the other end until a 200 response is received) and the potential for an increased number of messaging round-trips to set up a call. Due to some service designs and protocol interworking, slow-start INVITEs will continue to be seen, but due to the clipping problems associated with them this coping mechanism is considered to be incomplete.

4.1.2.3.  Client Usage of Gateway Model

   Clients typically do not use the [RFC3960] gateway model because of the limitations around early media and forking that section 3.1 of that RFC describes for the gateway model.

4.1.2.4.  Client Usage of Application Server Model

   The application server model defined in [RFC3960], along with [RFC3959], defines an improved mechanism over the gateway model in that early media is negotiated separately from regular media to reduce media clipping issues.
   However, there are still problems with UASs that generate early media packets upon receiving the SDP offer from the UAC: these packets cannot currently be distinguished from other media in all situations, and the UAC has no feedback from the various UASs that are generating early media as to which streams are of importance. UASs typically do adhere to the request in [RFC3960] section 4.1 that they not generate superfluous early media streams to assist the UAC with early media rendering.

5.  Requirements

   The following requirements are considered to be the starting point in more formally discussing improvements to SIP for early media interactions:

   R1:  Deprecation of forking within [RFC3261] is considered to be out-of-scope of the possible solutions (sorry Dean).

   R2:  Deprecation of early media from within [RFC3261] is considered to be out-of-scope of the possible solutions (sorry again, Dean).

   R3:  SIP UAs that are attempting to create a new SIP dialog using the INVITE method should no longer be obligated to blindly render media packets that are delivered to them as a result of an SDP offer sent in the INVITE.

   R4:  A mechanism should exist by which an originating SIP UA can signal to a downstream SIP endpoint that it is now willing to accept media packets.

   R5:  A mechanism should exist by which a terminating SIP UA can signal to an upstream SIP endpoint what type of early media (if known) it wishes to present to the originating UA, whether it requires one-way or two-way media flows, and the relative importance of the early media.

   R6:  Universal backwards-compatibility is a secondary goal. Where possible, backwards-compatibility with clients that do not implement recommendations in this draft should be preserved.

   R7:  The mechanism must be able to deal with recursive forking scenarios.
        This is where an INVITE passes through two or more proxies, each of which chooses to fork the request to two or more endpoints in parallel.

   R8:  The mechanism must not require exchange of packets on the media path to identify or coordinate early media streams as this may not interoperate with common network media gating mechanisms.

5.1.  Deprecation of forking

   Deprecation of forking from SIP [RFC3261] is considered to be out of scope. This is due to the heavy deployment of forking in existing implementations for key routing services. Changes of this nature are considered by the author (and others) to be of too large a scope relative to the problem at hand and are consequently excluded from this draft in favor of searching for less radical solutions.

5.2.  Deprecation of early media

   Deprecation of early media from SIP [RFC3261] is considered to be out of scope. Early media is required in order to handle certain PSTN interactions as defined in RFC-3398 [RFC3398] and elsewhere. In addition, the ability to provide announcements and other media prior to the terminating party answering the call is considered desirable, and such services must therefore use some form of "pre-answer" media (currently known as early media).

5.3.  Originating UA's to render early media

   Currently, section 5.1 of the offer/answer model [RFC3264] states that the offerer in an SDP offer/answer exchange must be prepared to receive media from media streams described in the offer as being 'recvonly' or 'sendrecv'. Further, section 6.1 of [RFC3264] states that the answerer in an SDP offer/answer exchange may immediately send media to media streams that are described in the answer as being 'sendrecv' (note: [RFC3264] does not explicitly state as much, but it is assumed that media streams that are 'sendonly' in the SDP answer can also have media immediately sent to them by the SDP offerer).
   These statements, taken together, create an obligation upon the originating UA to render any early media sent to them by anyone to whom their SDP offer was delivered (unless the media stream was defined to be 'sendonly' or 'inactive'). This is useful in resolving the PSTN interactions in [RFC3398], especially as noted in the example call flows and ACM message processing in section 7 of that document.

   This obligation on the part of the originating UA has subsequently been used in the absence of actual PSTN interworking to provide services that mimic the PSTN network (such as providing far-end announcements), or provide other services such as colorful ringback tones (CRBT) in which media is streamed to the originator while the terminator is being located/alerted.

   The argument can be, and has been, made that simply because a service exists in the PSTN world does not mean that it must exist within SIP. However, given the prevalence of services that utilize early media, and the number of RFCs that talk about dealing with various aspects of early media, this particular train appears to have long ago left the station. It is not the intent of this document to pass judgement upon these services, but to find a way to cope with them in a more robust manner than is currently available.

   The obvious downside to this property of [RFC3264] is that while the offerer may have limited control over the delivery of their SDP offer, they have an obligation to render anything sent to them. This severely restricts the policies that the offerer (as the originator) may use to decide whether to render early media, and those policies need to be augmented.

5.4.  Downstream signaling of acceptance

   An INVITE with SDP should serve two simple purposes: to establish the path that all signaling shall follow between the originator and the set of terminating clients, and to let each terminating party know what sort of communications the originator can and will engage in.
   Currently, SDP offers also imply tacit acceptance of any and all media that might be generated in the reverse path upstream towards the originator. This should not necessarily always be the case, and a mechanism whereby the originator may assert that it is further ready to receive media packets is needed. The originator may wish to imply a combination of early and final media acceptance or denial in order to prevent unruly early media interactions and clipping of final media.

5.5.  Upstream signaling of importance

   A provisional response from a terminating party currently implies that the terminating party is listening to the SIP signaling it is receiving, and (if an SDP answer is present) the type of communications that the terminator wishes to engage in (if any). What is missing is a way for the terminating party to tell upstream entities what sort of demands it has upon the originator for rendering of its early media, and the relative importance associated with the media that it generates towards the originator. This helps the originator decide what is important and what is not when choosing which media stream it should render (if it wishes to, see Section 4.1.2.1).

5.6.  Universal backward-compatibility

   There are scenarios in which there is no way to cope appropriately with early media streams. An example would be a call that forks to an ISUP PSTN gateway as defined in [RFC3398] that is ignorant of the content of early media it is generating. There is no reliable indication in ISUP CPG or ACM messages as to what the other end might be doing for early media. It is possible that a cause code is present in the CPG in some ISUP to legacy platform interworking scenarios, but these are not present generally in ISUP signaling flows, and therefore cannot be relied upon.
   Mechanisms to deal with these types of devices are left for future study and are not explored further here.

5.7.  Recursive forking

   The mechanism should be able to deal with recursive forking scenarios. This would be where two or more independent proxies fork a given INVITE request from an originating client. In this case, the proxies are normally not coordinated in their operations. As a result, the mechanism proposed should be robust enough to allow for both end-to-middle and end-to-end negotiation of early media.

5.8.  Media Gating

   In many network environments, it is common for the media flow to be 'gated' in some way. Gating refers to a practice whereby an element in the network examines the signaling (SIP and SDP) being exchanged by UAs and sends instructions to a middlebox as to when media packets are authorized to flow between UAs. This gating behavior is typically used to prevent theft of service.

   As a result of this gating behavior, any mechanism used to identify or coordinate early media should not employ media packet exchanges. It is allowable for early media itself to be marked as such in the media packets, however, because gating behavior does not interact negatively with such a mechanism. Operations that require early media packet behavior by the UAC may fail in the presence of gating.

6.  Recommendations

   The following sections include recommendations that create a framework that is capable of both identifying/prioritizing the type of early media being presented to the originator, and giving the originating client a means by which it can control the order in which early media flows are presented to it.

6.1.  Early Media Classification and Prioritization

6.1.1.
Overview Regardless of the mechanism that is used to control the presentation of early media, if at any point more than one endpoint is attempting to stream early media to the originator a few problems arise: o Nobody upstream of the device attempting to stream early media to the originator is aware of what exactly it is that the early media generator is generating. Is it advertising? Is it an important message? Who knows. This is important not only for the originating client (see Section 4.1.2.1), but proxies as well since they may be employing a weighting mechanism as described in Section 4.1.1.2. o The device generating the early media may have no idea how many other devices that are peer to it or downstream from it are also trying to generate early media. Again, this is important if the client is using the client-side detection of forking mechanism defined in Section 4.1.2.1. o Multiple streams may be included in the offer, not all of which are suitable or intended for early media. For instance, an offer may include video and audio streams. Early media may only be streamed to the audio port during call setup. Another example Stucker Expires April 21, 2007 [Page 14] Internet-Draft Coping w/ Early Media in SIP October 2006 would be the inclusion of RTP and SRTP streams where only the RTP stream is intended for early media. Therefore, the UAC may not wish to apply early-media coping mechanisms to all streams offered. In order to rectify this situation, proper classification of the possible early media to be sent after completion of the SDP exchange is needed and a specific linkage of that classification to particular streams is highly desirable. This can be handled either by inclusion of SIP headers in the message carrying SDP sent towards the originator or by inclusion in the SDP itself. 
If the classification is handled in the SDP itself, this limits the ability of intermediaries to use this information to update the SDP, as the message body may be covered by an integrity protection mechanism or may be otherwise unavailable (for example, the SDP could be encrypted). If the classification is handled in the SIP headers, then it may be unclear which SDP stream the classification applies to.

If classification is handled via a SIP header (previous revisions of this document referred to an 'Early-Media-Class' header), then it is recommended that the SIP header only apply to SDP covered by an Early-Session content disposition as defined in [RFC3959]. This allows the UAC to clearly understand which streams the classification applies to.

In either case, via SIP or SDP, upon answer of the INVITE, all processing of media streams and SDP should revert to [RFC3261] rules, as the call is answered and no media from this point on should be considered 'early'.

6.1.1.1.  Early-Media Classifications

The following list shows a possible set of common early-media classifications. The classes are listed in increasing order of importance.

1.  RFC-3264 - The default behavior defined in RFC-3264 is requested.

2.  Advertisement - A non-critical advertisement.

3.  Warning - A non-critical announcement.

4.  Two-way - The endpoint presenting early media wishes to establish a two-way early media session before completing the call.

5.  Critical - A critical announcement, such as: "We're about to bill you for $10k".

6.  Unknown - The nature of the early media being presented to the originator is unknown (such as from a PSTN gateway receiving a generic announcement).

Early media classified as "Unknown" must unfortunately be considered of the highest importance: there is no indication given that qualifies it to be of lower importance. It is recommended that unclassified early media be treated as RFC-3264.
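As an illustration of the ordering above, the following Python sketch ranks candidate early media streams by class. The table and function names are hypothetical and are not defined by this document; matching class names case-insensitively is an assumption, and unclassified media is ranked as RFC-3264 per the recommendation above.

```python
# Hypothetical importance table; the order mirrors the classification list
# (1 = least important, 6 = most important).
CLASS_RANK = {
    "rfc-3264": 1,
    "advertisement": 2,
    "warning": 3,
    "two-way": 4,
    "critical": 5,
    "unknown": 6,
}


def rank(early_media_class=None):
    """Return the importance of a class; None (unclassified) ranks as RFC-3264."""
    if early_media_class is None:
        return CLASS_RANK["rfc-3264"]
    # Case-insensitive lookup is an assumption made for this sketch.
    return CLASS_RANK[early_media_class.strip().lower()]


def most_important(streams):
    """Pick the (source, class) pair with the highest rank; the first wins on ties."""
    return max(streams, key=lambda stream: rank(stream[1]))
```

A UAC that has received several classified early media indications could call most_important() to choose which stream to render first.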
Treating unclassified early media as RFC-3264 prevents network elements that do not classify their early media from overriding elements that are more forthcoming. An additional q-value, such as that defined in section 20.10 of [RFC3261], can be used to break ties between classifications.

6.2.  Early Media Flow Negotiation

The following sections take the requirements from Section 5 and try to create a mechanism that can satisfy them. This mechanism is built along similar lines to the SIP preconditions framework [RFC3312].

6.2.1.  Overview

A simple mechanism is introduced that tells terminators what the originator expects to happen with respect to early media. This information may also be of use to intermediate nodes that wish to generate early media. The mechanism differs from the SDP [RFC2327] 'a=recvonly', 'a=sendonly', 'a=sendrecv', and 'a=inactive' attributes in that the final media flow mode can be negotiated and ready upon answer without further messaging, and from the preconditions [RFC3312] SDP attributes in that QoS can be negotiated separately as well.

6.2.2.  SDP parameters

The following media-level parameters are defined:

   early-media-flow-status = "a=emflow:" direction-tag
                             [ COMMA rtp-ssrc-value ]
   direction-tag           = "none" / "send" / "recv" / "sendrecv"
   rtp-ssrc-value          = 1*8HEXDIG

The early-media-flow-status 'a=emflow' denotes two things:

o  The current state of the early media from the perspective of the originating party of the call, as specified by the direction-tag.

o  The RTP SSRC for a given early media stream (as defined in section 8 of [RFC3550]) to facilitate correlation of RTP packets with a particular early media session. It is possible for this value to be the same for two different early media streams. The intent is to give the UAC a starting point to work from.

ISSUE: What should the UAC do if it sees that the RTP SSRC values in two or more early media flows collide?
ISSUE: How stable are RTP SSRC values during call setup?

It is expected that the directionality indicators defined in [RFC2327] as 'a=sendrecv', 'a=sendonly', 'a=recvonly', and 'a=inactive' are otherwise unaffected. Preconditions, as defined in [RFC3312], are likewise unaffected. The emflow values may be changed in subsequent offer/answer exchanges to allow the originator to properly stage multiple early media streams according to the Early-Media-Class header values. For example, an originator may specify 'a=emflow:none' initially to suppress all early media flows, and then send an UPDATE with a new SDP offer containing 'a=emflow:recv' to an endpoint from which it received an early media indication, to denote that the originator is now willing to receive early media. Regardless of the value of this parameter, both endpoints may immediately begin exchanging media packets upon answer according to [RFC3261], [RFC3264] and [RFC2327]. Intermediate proxies should honor this indication and adjust their behavior accordingly, potentially causing them to deviate from their normal early media coping mechanisms.

6.2.3.  Usage of emflow with offer/answer

6.2.3.1.  Meaning of a=emflow:none

If the emflow value of 'none' is set in an SDP offer, it indicates that the endpoint generating the offer will not accept early media and that anyone accepting this SDP offer MUST NOT send early media.

If the emflow value in the SDP offer was 'none', then the emflow value in the SDP answer MUST be set to 'none' as well.

If the emflow value of 'none' is set in an SDP answer, it indicates that the endpoint generating the answer will not generate early media. The SDP offeror can take this indication to mean that they should not expect early media packets from this endpoint per [RFC3264], and that any received prior to answer from this source MAY be discarded.

6.2.3.2.  Meaning of a=emflow:send

If the emflow value of 'send' is set in an SDP offer, it indicates that the endpoint generating the offer may send early media packets, but will not accept early media. Anyone accepting this SDP offer MUST NOT send early media, but SHOULD process received early media packets if it is appropriate for the device receiving the packets to do so.

If the emflow value in the SDP offer was 'send', then the emflow value in the SDP answer MUST be set to 'none' or 'recv', depending on whether the application intends to process the early media packets that the offeror may send to it.

If the emflow value of 'send' is set in an SDP answer, it indicates that the endpoint generating the answer may generate early media but will not process any sent to it. Any early media sent to it per [RFC3264] MAY be discarded. The SDP offeror can take this indication to mean that they should expect early media packets from this endpoint and behave appropriately.

6.2.3.3.  Meaning of a=emflow:recv

If the emflow value of 'recv' is set in an SDP offer, it indicates that the endpoint generating the offer may be sent early media packets, but will not generate early media. Anyone accepting this SDP offer MAY send early media, but SHOULD NOT expect to receive early media from the SDP offeror; any media packets received from the offeror prior to answer may safely be discarded.

If the emflow value in the SDP offer was 'recv', then the emflow value in the SDP answer MUST be set to 'none' or 'send', depending on whether the application intends to send early media packets to the offeror.
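The emflow grammar from Section 6.2.2 and the offer/answer constraints of Section 6.2.3 can be sketched in Python as follows. This is a minimal illustration rather than a normative parser: the function and table names are hypothetical, the tolerance for whitespace after ':' and around ',' is an assumption based on the figures in Section 6.2.5, and the 'sendrecv' row reflects Section 6.2.3.4, which lets the answerer choose any value.

```python
import re

# Assumption: whitespace after ':' and around ',' is tolerated, because the
# example figures contain it even though the grammar does not spell it out.
_EMFLOW = re.compile(
    r"^a=emflow:\s*(none|send|recv|sendrecv)(?:\s*,\s*([0-9A-Fa-f]{1,8}))?\s*$"
)

# Answer values permitted for each offered value (Sections 6.2.3.1 to 6.2.3.4).
ALLOWED_ANSWERS = {
    "none": {"none"},
    "send": {"none", "recv"},
    "recv": {"none", "send"},
    "sendrecv": {"none", "send", "recv", "sendrecv"},
}


def parse_emflow(line):
    """Return (direction, ssrc) for an a=emflow line, or None if it is malformed.

    ssrc is an int, or None when the optional SSRC value is absent.
    """
    match = _EMFLOW.match(line.strip())
    if match is None:
        return None
    direction, ssrc_hex = match.groups()
    ssrc = int(ssrc_hex, 16) if ssrc_hex is not None else None
    return direction, ssrc


def answer_is_valid(offer_direction, answer_direction):
    """Check an answer's direction-tag against the offered direction-tag."""
    return answer_direction in ALLOWED_ANSWERS[offer_direction]
```

For example, an answerer that receives 'a=emflow:recv' and wishes to play an announcement would answer with 'send', which answer_is_valid("recv", "send") accepts.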
If the emflow value of 'recv' is set in an SDP answer, it indicates that the endpoint generating the answer will accept early media but will not generate any. The SDP offeror can take this indication to mean that they should not expect early media packets from this endpoint and may safely discard any received prior to answer.

6.2.3.4.  Meaning of a=emflow:sendrecv

If the emflow value of 'sendrecv' is set in an SDP offer, it indicates that the endpoint generating the offer may send and receive early media packets. Anyone accepting this SDP offer MAY send early media, and SHOULD process received early media packets if it is appropriate for the device receiving the packets to do so.

If the emflow value in the SDP offer was 'sendrecv', then the emflow value in the SDP answer MAY be set to any value. The value set in the SDP answer depends on whether the endpoint answering the SDP offer intends to send and/or receive early media packets.

If the emflow value of 'sendrecv' is set in an SDP answer, it indicates that the endpoint generating the answer may generate and receive early media, and the SDP offeror should behave accordingly.

6.2.3.5.  Usage of RTP-SSRC-Value

The RTP-SSRC value is useful in helping endpoints correlate incoming RTP packets with SDP offer/answer exchanges. The value used in this tag is the SSRC value used in the header portion of an RTP packet as defined in [RFC3550].

The SSRC value in an RTP packet identifies the synchronization source of the packet, which lets an endpoint group and synchronize RTP packets sent from a particular source. As such, the SSRC value must be unique within a given RTP session.

The worst-case probability of an SSRC collision during the offer/answer SDP phase of RTP resource allocation is given in Section 8 of [RFC3550] as roughly 10^(-4) if there are 1000 different RTP streams being offered.
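The figure quoted from [RFC3550] is consistent with the birthday-bound approximation given in Section 8.1 of that document, 1 - exp(-N^2 / 2^(L+1)) for N sources choosing L-bit identifiers at random. The short Python sketch below evaluates that approximation; the function name is hypothetical, and the sketch assumes that all sources pick their SSRC values independently and at the same time.

```python
import math


def ssrc_collision_probability(num_sources, id_bits=32):
    """Approximate chance that any two of num_sources random SSRCs collide.

    Uses 1 - exp(-N^2 / 2^(L+1)); expm1 keeps precision for tiny results.
    """
    exponent = -(num_sources ** 2) / 2 ** (id_bits + 1)
    return -math.expm1(exponent)


if __name__ == "__main__":
    for n in (10, 1000):
        print(n, ssrc_collision_probability(n))
```

For 1000 streams this evaluates to about 1.16 * 10^(-4), and for 10 streams to about 1.16 * 10^(-8).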
As the number of RTP streams typically used in a call setup, even with significant forking involved, is likely to be O(10) or fewer, the likelihood of each RTP stream getting a unique SSRC number early on is good. If a collision occurs, [RFC3550] defines a mechanism for detecting it and reselecting a unique SSRC value. This re-selection does not require another SDP exchange today, but if necessary, an SDP exchange could be initiated through a target refresh of the INVITE dialog to update the SDP offer and/or answer with the updated SSRC value in the emflow parameter.

6.2.4.  Option tag for emflow

The option tag "emflow" is defined for use in the Require and Supported header fields [RFC3261]. An offerer MAY include this tag in a Require header if they wish to ensure that any endpoint reached supports this extension (typically when 'a=emflow:' is not set to 'sendrecv'). A party generating an SDP offer or answer that supports this extension MUST include this tag in a Supported header of any message containing SDP, unless the tag is already present in a Require header. This allows the other party or parties involved in the signaling flow to know that the other end is processing their emflow values.

6.2.5.  Example

The following figures show a simple offer/answer exchange in which the UAC does not wish to receive early media automatically. The UAS then answers, indicating that it has a warning announcement it would like to play as early media. The UAC then updates the emflow value to allow the warning announcement to proceed.

   v=0
   o=alice 2890844526 2890844526 IN IP4 uac.anywhere.com
   s=
   c=IN IP4 uac.example.com
   t=0 0
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   a=emflow: none, 1e4f381

Figure 1: An SDP offer with no early media allowed and SSRC 1e4f381.

   ...
   Early-Media-Class: Warning; q=1.0

   v=0
   o=bob 2890844730 2890844730 IN IP4 uas.example.com
   s=
   c=IN IP4 uas.example.com
   t=0 0
   m=audio 49920 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   a=emflow: none, 23a73c01

Figure 2: A SIP early media answer to the offer with SSRC of 23a73c01.

   v=0
   o=alice 2890844526 2890844526 IN IP4 uac.anywhere.com
   s=
   c=IN IP4 uac.example.com
   t=0 0
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   a=emflow: recv, 1e4f381

Figure 3: An SDP offer with early media allowed towards the UAC.

   v=0
   o=bob 2890844730 2890844730 IN IP4 uas.example.com
   s=
   c=IN IP4 uas.example.com
   t=0 0
   m=audio 49920 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   a=emflow: send, 23a73c01

Figure 4: An SDP answer to the offer acknowledging that early media will be sent using RTP SSRC 23a73c01.

6.3.  Early Media and SRTP

One of the challenges in dealing with SRTP is the initial key exchange required to support it. The draft [I-D.wing-rtpsec-keying-eval] discusses a number of keying mechanisms, concentrating on their interaction with SIP. Given the amount of effort likely involved in establishing a secure media flow, it is undesirable to require that early media be secure. After all, an attacker can likely surmise from a 180 Ringing response to an INVITE that the originator is probably hearing ringback. It is the conversation that typically needs to be protected; securing early media is therefore likely wasteful in many situations.

Additionally, where the early media coping mechanisms mentioned in Section 4 are employed, they may prevent SRTP keying exchanges from taking place in a timely manner. This can cause a number of potentially poor outcomes, especially when SDP is stripped or otherwise manipulated during call setup by a network element.

Finally, network elements that wish to generate early media typically serve many endpoints simultaneously.
This means that they do not have the computational power available to support key exchange and encryption without an undesirable reduction in the amount of traffic that they can handle.

Therefore, it is recommended that a client offering an SRTP stream also offer a regular RTP stream for purposes of early media. This gives the network a separate playground to work with for purposes of establishing early media to the UAC. If the SDP for the early media stream were separated in the SDP offer (possibly using [RFC3959]), it is conceivable that network elements that employ the mechanisms described in Section 4 would simply leave the SRTP portion of a UAC's offer alone, thereby improving the behavior of SRTP and early media as observed by the user and the SIP network administrator.

ISSUE: The interaction between the mechanisms outlined in this draft and SRTP clearly warrants more investigation.

7.  Security Considerations

This document is a work in progress. Security considerations will be added as the various recommendations become more concrete.

8.  IANA Considerations

This document defines the SDP attribute "emflow" and the direction-tag values of "none", "send", "recv", and "sendrecv", which will require IANA registration.

9.  References

9.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.

9.2.  Informational References

[RFC2327]  Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3262]  Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional Responses in Session Initiation Protocol (SIP)", RFC 3262, June 2002.

[RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[RFC3312]  Camarillo, G., Marshall, W., and J. Rosenberg, "Integration of Resource Management and Session Initiation Protocol (SIP)", RFC 3312, October 2002.

[RFC3398]  Camarillo, G., Roach, A., Peterson, J., and L. Ong, "Integrated Services Digital Network (ISDN) User Part (ISUP) to Session Initiation Protocol (SIP) Mapping", RFC 3398, December 2002.

[RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC3959]  Camarillo, G., "The Early Session Disposition Type for the Session Initiation Protocol (SIP)", RFC 3959, December 2004.

[RFC3960]  Camarillo, G. and H. Schulzrinne, "Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP)", RFC 3960, December 2004.

[I-D.wing-rtpsec-keying-eval]  Audet, F. and D. Wing, "Evaluation of SRTP Keying with SIP", draft-wing-rtpsec-keying-eval-01 (work in progress), June 2006.

Author's Address

Brian Stucker
Nortel
2201 Lakeside
Richardson, TX  75082
US

Phone: +1 972 685 7724
Email: bstucker@nortel.com
URI:   http://www.nortel.com/

Full Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).