< draft-rosenberg-sip-app-components-00.txt   draft-rosenberg-sip-app-components-01.txt >
Internet Engineering Task Force SIP WG Internet Engineering Task Force SIP WG
Internet Draft Rosenberg/Mataga/Schulzrinne Internet Draft Rosenberg/Mataga/Schulzrinne
draft-rosenberg-sip-app-components-00.txt dynamicsoft/Columbia U. draft-rosenberg-sip-app-components-01.txt dynamicsoft/Columbia U.
November 15, 2000 March 2, 2001
Expires: May 2001 Expires: September 2001
An Application Server Component Architecture for SIP An Application Server Component Architecture for SIP
STATUS OF THIS MEMO STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 3, line 36 skipping to change at page 3, line 36
application may require special purpose hardware. This application may require special purpose hardware. This
component can be distributed to a specialized processor, with component can be distributed to a specialized processor, with
a normal off the shelf processor handling the more generic a normal off the shelf processor handling the more generic
software tasks. Several of the components that we are software tasks. Several of the components that we are
describing fit into this category (such as the TTS server). describing fit into this category (such as the TTS server).
Sharing of resources. By decomposing a server into components, a Sharing of resources. By decomposing a server into components, a
many-to-many interaction between them becomes possible. many-to-many interaction between them becomes possible.
This means that one component can provide services to many This means that one component can provide services to many
other components. This provides for sharing of resources, other components. This provides for sharing of resources,
which ultimately results in cost reduction. which ultimately results in capital cost reduction.
Expertise. Building a complex application requires expertise in Expertise. Building a complex application requires expertise in
call control, media services, compression, web, speech call control, media services, compression, web, speech
recognition, etc. It is highly unlikely that one recognition, etc. It is highly unlikely that one
organization will have enough expertise in all of these to organization will have enough expertise in all of these to
build them all. By decomposing an application server into build them all. By decomposing an application server into
subpieces, organizations with expertise in one particular subpieces, organizations with expertise in one particular
piece can build that one. The result is that the complete piece can build that one. The result is that the complete
system can be composed of best-of-breed components. system can be composed of best-of-breed components.
skipping to change at page 4, line 41 skipping to change at page 4, line 41
Typically, the MGCP interface between the two devices is fairly Typically, the MGCP interface between the two devices is fairly
"busy"; there is a lot of messaging for complex applications. "busy"; there is a lot of messaging for complex applications.
In this model, there is a tightly coupled relationship between the MS In this model, there is a tightly coupled relationship between the MS
and AS. The MS cannot function without the AS, and the AS needs to and AS. The MS cannot function without the AS, and the AS needs to
perform tight, low-level controls over the detailed operation of the perform tight, low-level controls over the detailed operation of the
media server. media server.
To some degree, breaking of an application server into these two To some degree, breaking of an application server into these two
components represents an implementation detail of how one builds a components represents an implementation detail of how one builds a
large, monolithic application server. It is not generally possible large, monolithic application server. It is not generally practical
for the two components to be owned by separate providers. In fact, it for the two components to be owned by separate providers, due to the
has yet to be shown that complete interoperability and integration is master/slave relationship between the two.
possible with two components from different vendors, let alone
different providers.
This decomposition also does not provide a true separation of This decomposition also does not provide a true separation of
function. Most applications that require media interaction (IVR, function. Most applications that require media interaction (IVR,
credit card and debit card, etc.) have very cleanly separated media credit card and debit card, etc.) have very cleanly separated media
phases and signaling phases. The details of the media interactions phases and signaling phases. The details of the media interactions
are usually not important to the signaling component, and vice
versa. As an example, consider a debit card application. The
.................... ....................
. . . .
. +-------------+ . . +-------------+ .
. | | . . | | .
SIP . | | . SIP . | | .
-------------+ AS | . -------------+ AS | .
. | | . . | | .
. | | . . | | .
. | | . . | | .
. +-------------+ . . +-------------+ .
skipping to change at page 5, line 36 skipping to change at page 5, line 36
. | | . . | | .
. | | . . | | .
. +-------------+ . . +-------------+ .
. . . .
.................... ....................
Complete Application Complete Application
Server Server
Figure 1: MGCP-based decomposition Figure 1: MGCP-based decomposition
are usually not important to the signaling component, and vice
versa. As an example, consider a debit card application. The
application starts with the user making a call. As part of the call application starts with the user making a call. As part of the call
processing, interaction is needed with the user via the media stream processing, interaction is needed with the user via the media stream
to determine the debit card number. The precise set of menu to determine the debit card number. The precise set of menu
operations and interactions used to obtain this number aren't operations and interactions used to obtain this number aren't
important to the call/signaling processing piece; only the result important to the call/signaling processing piece; only the result
(the number), is important. Once the number is returned, media (the number), is important. Once the number is returned, media
processing ceases, and data and call processing commence. The debit processing ceases, and data and call processing commence. The debit
card is looked up in a subscriber database, and if enough time card is looked up in a subscriber database, and if enough time
remains, the call is completed. The signaling component monitors the remains, the call is completed. The signaling component monitors the
call, and when the card has run out of minutes, the call is call, and when the card has run out of minutes, the call is
skipping to change at page 6, line 30 skipping to change at page 6, line 28
interactions with the MS that are provided by MGCP, in addition to interactions with the MS that are provided by MGCP, in addition to
the detailed signaling and data processing operations. The developers the detailed signaling and data processing operations. The developers
will also need to build and manage the low level state representing will also need to build and manage the low level state representing
the controlled entity, which can be painful. The result is longer the controlled entity, which can be painful. The result is longer
development times, less code reuse, and slower innovation. development times, less code reuse, and slower innovation.
It has been argued that one of the benefits of the MGCP decomposition It has been argued that one of the benefits of the MGCP decomposition
is that it offloads the "burden" of call control from the media is that it offloads the "burden" of call control from the media
server. However, from a complexity standpoint, the MGCP processing server. However, from a complexity standpoint, the MGCP processing
required is probably on par with (if not more than), the simple required is probably on par with (if not more than), the simple
amount of call control and SIP processing needed if SIP were used amount of call control and event processing needed if SIP and
directly. VoiceXML were used.
From a reliability perspective, an MGCP style decomposition is less From a reliability perspective, an MGCP style decomposition is less
desirable. Since the components are strongly coupled, the system will desirable. Since the components are strongly coupled, the system will
fail if any one of the pieces fails. Failure can also be fail if any one of the pieces fails. Failure can also be
introduced because of additional network resources needed for introduced because of additional network resources needed for
communications between the boxes. The result is that the MGCP communications between the boxes. The result is that the MGCP
decomposition may actually increase the probability of failure, as decomposition may actually increase the probability of failure, as
compared to no decomposition at all. compared to no decomposition at all.
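The reliability argument above can be illustrated with a small calculation. This is only a sketch: it assumes component failures are independent, and the availability figures (and the function name `series_availability`) are hypothetical values chosen for illustration, not measurements from the draft.

```python
def series_availability(availabilities):
    """Availability of a system that is down whenever ANY component is down.

    Assumes independent failures, so the availabilities multiply.
    """
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Hypothetical figures: a single monolithic server versus a tightly
# coupled AS + MS pair that also depends on the network link between them.
monolithic = series_availability([0.999])
coupled = series_availability([0.999, 0.999, 0.9995])

print(f"monolithic: {monolithic:.6f}")
print(f"coupled AS/MS: {coupled:.6f}")
```

Under these assumptions each additional tightly coupled piece can only lower the product, which is the sense in which the decomposition may increase the probability of failure.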
Another decomposition that has been proposed is to break a proxy into Another decomposition that has been proposed is to break a proxy into
a routing and call control component, plus a services component. The a routing and call control component, plus a services component. The
interface between the two is then a transactional interface for interface between the two is then a transactional interface for
services, similar in concept to INAP, based upon state transitions services, similar in concept to INAP, based upon state transitions
within a call model. This is another form of tight coupling, since it within a call model. This is another form of tight coupling, since it
requires the services component to have detailed knowledge of the requires the services component to have detailed knowledge of the
operational model of the call control component. We believe that this operational model of the call control component. We believe that this
decomposition is limiting, for the same reasons the AS/MS decomposition is limiting, for the same reasons the AS/MS
decomposition is limiting. decomposition is limiting.
4 The Decoupled Model 4 The Decoupled Model
4.1 Architecture
4.1 Architecture
As a result of this, we see the master/slave decomposition as being As a result of this, we see the master/slave decomposition as being
ideal for a single vendor to build a large system. However, this ideal for a single vendor to build a large system. However, this
decomposition does not solve the other distribution needs we have decomposition does not solve the other distribution needs we have
motivated above. As a result, we propose that the AS be decomposed motivated above. As a result, we propose that the AS be decomposed
into an application component responsible for coordinating the into an application component responsible for coordinating the
overall execution of the application (called the controller), and overall execution of the application (called the controller), and
application server components that provide pieces of the overall application server components that provide pieces of the overall
application. These components are only loosely coupled with the application. These components are only loosely coupled with the
coordinating application server. The loose coupling implies that the coordinating application server. The loose coupling implies that the
interaction between them is the same as the interaction between the interaction between them is the same as the interaction between the
skipping to change at page 9, line 4 skipping to change at page 8, line 50
A prompt would be played over that session, something like "please A prompt would be played over that session, something like "please
record your message for Joe now", and then the component takes the record your message for Joe now", and then the component takes the
media input stream, records it, and saves it. When it is done, the media input stream, records it, and saves it. When it is done, the
session is terminated. session is terminated.
In some cases, the session may require a "side channel" over which In some cases, the session may require a "side channel" over which
intermediate data is passed, needed to control the session intermediate data is passed, needed to control the session
interactions from that point forward. IVR is the classic example. In interactions from that point forward. IVR is the classic example. In
some cases the coordinating application server can kick off the IVR some cases the coordinating application server can kick off the IVR
script, and then only get back the final result - a menu option, a script, and then only get back the final result - a menu option, a
credit card number, or what have you. In other cases, the
coordinating component may need to get intermediate results, so that
+-----------+ +-----------+
| | | |
| | | |
| AS | | AS |
|coordinator| |coordinator|
| | | |
| | | |
+-----------+ +-----------+
SIP, -- \ --- SIP, -- \ ---
RTP? -- \ ---- SIP, RTP? -- \ ---- SIP,
skipping to change at page 10, line 4 skipping to change at page 10, line 4
+----------+ | | +----------+ | |
| | | | | | | |
| | | | | | | |
| | | ASC | | | | ASC |
| ASC | | | | ASC | | |
| | | | | | | |
| | +----------+ | | +----------+
+----------+ +----------+
Figure 2: Decoupled Architecture Figure 2: Decoupled Architecture
credit card number, or what have you. In other cases, the
coordinating component may need to get intermediate results, so that
it can guide the operation of the IVR moving forward. This requires a it can guide the operation of the IVR moving forward. This requires a
companion control channel that provides data output from the companion control channel that provides data output from the
component server back to the client, and then returns further high component server back to the client, and then returns further high
level instructions from the client back to the server. level instructions from the client back to the server.
There is a thin line in some cases between this control channel and There is a thin line in some cases between this control channel and
the tightly coupled interactions of a master-slave MGCP relationship. the tightly coupled interactions of a master-slave MGCP relationship.
However, the loosely coupled nature of the interaction can be However, the loosely coupled nature of the interaction can be
maintained by using coarse-grained data passing over a distributed maintained by using coarse-grained data passing over a distributed
client-server protocol, such as HTTP or CORBA. client-server protocol, such as HTTP or CORBA.
skipping to change at page 17, line 4 skipping to change at page 16, line 50
caller (if they were using a softphone), the service execution code caller (if they were using a softphone), the service execution code
is unchanged. is unchanged.
Others have proposed that DTMF digits be carried in SIP directly from Others have proposed that DTMF digits be carried in SIP directly from
the caller to the AS [9,10]. However, this approach does not work the caller to the AS [9,10]. However, this approach does not work
for anything beyond DTMF, while our approach works for DTMF, speech, for anything beyond DTMF, while our approach works for DTMF, speech,
and web interfaces. Another drawback of the DTMF-in-SIP approach is and web interfaces. Another drawback of the DTMF-in-SIP approach is
that all entities on the call signaling path will receive any DTMF that all entities on the call signaling path will receive any DTMF
digits dialed by the called party. Furthermore, since the caller digits dialed by the called party. Furthermore, since the caller
doesn't know if there is an entity interested in DTMF, it is required doesn't know if there is an entity interested in DTMF, it is required
to send DTMF within SIP messages all the time, even if no entity is
interested.
Caller Coordinator Media Server Callee Caller Coordinator Media Server Callee
| | | | | | | |
|(1) SIP INV | | | |(1) SIP INV | | |
|--------------->|(2) SIP INV | | |--------------->|(2) SIP INV | |
| |----------------->| | | |----------------->| |
| |(3) 200 OK | | | |(3) 200 OK | |
| |<-----------------| | | |<-----------------| |
| |(4) SIP ACK | | | |(4) SIP ACK | |
| |----------------->| | | |----------------->| |
| |(5) SIP INV | | | |(5) SIP INV | |
skipping to change at page 18, line 4 skipping to change at page 18, line 4
| |<-----------------+-----------------| | |<-----------------+-----------------|
| |(19) SIP ACK | | | |(19) SIP ACK | |
| |------------------+---------------->| | |------------------+---------------->|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
Figure 3: Call Flow for DTMF Enabled Hold Service Figure 3: Call Flow for DTMF Enabled Hold Service
to send DTMF within SIP messages all the time, even if no entity is
interested.
There have been proposals for adding a subscription/notification There have been proposals for adding a subscription/notification
mechanism on top of this to avoid this problem. However, this further mechanism on top of this to avoid this problem. However, this further
complicates the system by adding a requirement for the caller to complicates the system by adding a requirement for the caller to
support a subscription and notification service just for DTMF. support a subscription and notification service just for DTMF.
Our approach fits well within the existing SIP framework, and Our approach fits well within the existing SIP framework, and
requires no additional work from the end users. Furthermore, it requires no additional work from the end users. Furthermore, it
transparently supports multiple application server components transparently supports multiple application server components
receiving DTMF. This is because an AS is able to send a DTMF stream receiving DTMF. This is because an AS is able to send a DTMF stream
to a component by adding a new media line to the list of media to a component by adding a new media line to the list of media
skipping to change at page 20, line 4 skipping to change at page 19, line 48
6 Patterns for Accessing Components 6 Patterns for Accessing Components
In this section, we propose a set of patterns that define the In this section, we propose a set of patterns that define the
interaction of a controller with an application server component. interaction of a controller with an application server component.
These patterns manifest themselves in the description of the service These patterns manifest themselves in the description of the service
invoked when a session is initiated, a discussion of the naming invoked when a session is initiated, a discussion of the naming
conventions of the service, and a description of any back channel conventions of the service, and a description of any back channel
used for control and data passing. used for control and data passing.
6.1 Interactive Voice Response Services 6.1 Interactive Voice Response Services
We have touched upon the basics of the interaction between a
controller and an IVR server. The controller initiates a call to the
server, the server executes some kind of IVR service, and data is
Caller A B Callee Caller A B Callee
| | | | | | | |
|(1) SIP INV | | | |(1) SIP INV | | |
|-------------->|(2) SIP INV | | |-------------->|(2) SIP INV | |
| |--------------->|(3) SIP INV | | |--------------->|(3) SIP INV |
| | |---------------->| | | |---------------->|
| | |(4) 200 OK | | | |(4) 200 OK |
| |(5) 200 OK |<----------------| | |(5) 200 OK |<----------------|
|(6) 200 OK |<---------------| | |(6) 200 OK |<---------------| |
|<--------------| | | |<--------------| | |
skipping to change at page 20, line 42 skipping to change at page 20, line 42
|(18) SIP ACK |<---------------| | |(18) SIP ACK |<---------------| |
|<--------------| | | |<--------------| | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
Figure 4: Multiple Application Servers and DTMF Figure 4: Multiple Application Servers and DTMF
We have touched upon the basics of the interaction between a possibly fed back to the controller with intermediate and/or final
controller and an IVR server. The controller initiates a call to the results of the IVR interaction.
server, the server executes some kind of IVR service, and data is
A number of questions still need to be answered, however:
1. How is the IVR service identified? 1. How is the IVR service identified?
2. How can the controller specify the details of the dialog 2. How can the controller specify the details of the dialog
the IVR carries out with the user? the IVR carries out with the user?
3. How does data from the IVR get passed back to the 3. How does data from the IVR get passed back to the
controller? controller?
4. How is intermediate control performed (e.g., to interrupt 4. How is intermediate control performed (e.g., to interrupt
skipping to change at page 22, line 47 skipping to change at page 22, line 44
which is returned by the controller (7). The prompts are played to which is returned by the controller (7). The prompts are played to
the caller, and the identity of the called party is collected. This the caller, and the identity of the called party is collected. This
is passed to the controller through another POST (8), which returns is passed to the controller through another POST (8), which returns
an empty VoiceXML script (9)[1]. Once that is complete, the controller hangs up an empty VoiceXML script (9)[1]. Once that is complete, the controller hangs up
with it (10 and 11). The information the controller got in the POST with it (10 and 11). The information the controller got in the POST
(8) is used to determine the next hop SIP server, and the initial (8) is used to determine the next hop SIP server, and the initial
INVITE is proxied there (12). INVITE is proxied there (12).
It's important to observe that all call control related to executing It's important to observe that all call control related to executing
the service lives within the controlling application server. The IVR the service lives within the controlling application server. The IVR
application server deals strictly with the media component. This
division of work, as we have discussed above, allows for independent
_________________________ _________________________
[1] Note that it is unusual for an empty script to be [1] Note that it is unusual for an empty script to be
returned; this is because we want the AS to maintain returned; this is because we want the AS to maintain
control of the call signaling control of the call signaling
application server deals strictly with the media component. This
division of work, as we have discussed above, allows for independent
evolution of the call control and media components of services. For evolution of the call control and media components of services. For
example, if the desired called party did not have a reachable SIP example, if the desired called party did not have a reachable SIP
address, but they did have an email address, the call could be address, but they did have an email address, the call could be
redirected to a mailto URL. To support this twist, only the redirected to a mailto URL. To support this twist, only the
controlling application server code need change. The media component controlling application server code need change. The media component
remains completely and totally unchanged. remains completely and totally unchanged.
Readers familiar with VoiceXML will observe that VoiceXML almost Readers familiar with VoiceXML will observe that VoiceXML almost
achieves this perfect separation. It lacks any call control except achieves this perfect separation. It lacks any call control except
for two tags, for call transfer and call termination. These tags are for two tags, for call transfer and call termination. These tags are
skipping to change at page 24, line 4 skipping to change at page 23, line 50
We observe once more that all of these conferencing "servers" are We observe once more that all of these conferencing "servers" are
really conferencing applications that are just bundled as a server. really conferencing applications that are just bundled as a server.
These conferencing applications can be decomposed into components in These conferencing applications can be decomposed into components in
exactly the way we have described above. At the core of each of these exactly the way we have described above. At the core of each of these
conferencing applications is a mixing service. This service is conferencing applications is a mixing service. This service is
responsible for taking N audio or video streams, mixing them responsible for taking N audio or video streams, mixing them
according to some matrix, and returning the mixed stream to each according to some matrix, and returning the mixed stream to each
participant. Issues such as conference policy, provisioning of participant. Issues such as conference policy, provisioning of
conferences, and authentication are all completely separate and conferences, and authentication are all completely separate and
outside of this basic mixing component.
| INVITE (1) | | | INVITE (1) | |
|------------------------>| | |------------------------>| |
| | INVITE (2) | | | INVITE (2) |
| |------------------------->| | |------------------------->|
| | 200 OK (3) | | | 200 OK (3) |
| |<-------------------------| | |<-------------------------|
| 183 (4) | | | 183 (4) | |
|<------------------------| | |<------------------------| |
| | ACK (5) | | | ACK (5) |
| |------------------------->| | |------------------------->|
skipping to change at page 25, line 4 skipping to change at page 25, line 4
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
Caller Controller IVR Server Caller Controller IVR Server
Figure 5: Interaction of App Server and IVR Component Figure 5: Interaction of App Server and IVR Component
outside of this basic mixing component.
For this reason, we argue that a large variety of conferencing For this reason, we argue that a large variety of conferencing
applications can be easily constructed by having the mixing service applications can be easily constructed by having the mixing service
as a separate application server component. as a separate application server component.
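The core mixing operation described above (N input streams combined according to a matrix, one output per participant) might be sketched as follows. The function names, the representation of streams as lists of samples, and the "mix-minus" matrix in which each participant hears everyone but themselves are all assumptions made for illustration; the draft does not specify them.

```python
def mix(streams, matrix):
    """Mix N input streams into N output streams.

    streams:  list of N equal-length lists of audio samples
    matrix:   matrix[i][j] is the gain applied to input j in output i
    """
    n = len(streams)
    length = len(streams[0])
    outputs = []
    for i in range(n):
        out = [0.0] * length
        for j in range(n):
            gain = matrix[i][j]
            if gain == 0:
                continue  # input j is not audible in output i
            for k in range(length):
                out[k] += gain * streams[j][k]
        outputs.append(out)
    return outputs


def mix_minus_matrix(n):
    """Hypothetical default: each participant hears all others, not themselves."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]


# Three participants, two samples each.
streams = [[1, 1], [2, 2], [4, 4]]
print(mix(streams, mix_minus_matrix(3)))
```

Note that nothing in this sketch knows about conference policy, provisioning, or authentication, which matches the claim that those issues sit outside the mixing component.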
What does the interface to such a mixing server look like? For the What does the interface to such a mixing server look like? For the
call control interface, users would join a conference by calling the call control interface, users would join a conference by calling the
server. The server would answer the call, thus appearing as a SIP server. The server would answer the call, thus appearing as a SIP
UAS. The media sent from the user is mixed with other users in the UAS. The media sent from the user is mixed with other users in the
conference, and the media sent back to the user is the mixed stream. conference, and the media sent back to the user is the mixed stream.
The user can leave the conference by sending a BYE to the server, and The user can leave the conference by sending a BYE to the server, and
skipping to change at page 28, line 4 skipping to change at page 27, line 49
in mechanisms for session state sharing between the SIP and HTTP in mechanisms for session state sharing between the SIP and HTTP
components. components.
For this simple conferencing service, it was sufficient for the For this simple conferencing service, it was sufficient for the
controller to act as a proxy. That's because it does not need to controller to act as a proxy. That's because it does not need to
forcibly kick anyone out of the conference once they are in. To forcibly kick anyone out of the conference once they are in. To
support that kind of functionality, third party call control is support that kind of functionality, third party call control is
needed. Let us examine a more complex service in the next section. needed. Let us examine a more complex service in the next section.
6.2.2 Web Scheduled, IVR supported, Time Limited Conference 6.2.2 Web Scheduled, IVR supported, Time Limited Conference
In this more complex example, we once again wish to use a web
interface to set up the conferences. However, we wish to add a stop
time. If there are participants in the conference when the stop time
| | | | (1) HTTP POST | | | | | | (1) HTTP POST | |
|--------------------------->| | |--------------------------->| |
| | | | (2) 200 OK | | | | | | (2) 200 OK | |
|<---------------------------| | |<---------------------------| |
| | | | | | | | | | | |
| | | | (3) INVITE | | | | | | (3) INVITE | |
| |----------------------->| (4) INVITE | | |----------------------->| (4) INVITE |
| | | | |--------------------->| | | | | |--------------------->|
| | | | | (5) 200 OK | | | | | | (5) 200 OK |
| | | | (6) 200 OK |<---------------------| | | | | (6) 200 OK |<---------------------|
skipping to change at page 29, line 4 skipping to change at page 29, line 4
| | | |<---------------| | | | | |<---------------| |
| | | |(17) ACK | | | | | |(17) ACK | |
| | | |--------------->| | | | | |--------------->| |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
Web A B C Controller Mixer Web A B C Controller Mixer
Figure 6: Web Scheduled Conference Services Figure 6: Web Scheduled Conference Services
In this more complex example, we once again wish to use a web
interface to set up the conferences. However, we wish to add a stop
time. If there are participants in the conference when the stop time
arrives, a warning announcement is played 10 minutes prior, and then arrives, a warning announcement is played 10 minutes prior, and then
they are kicked off. In addition, when a user joins the conference, they are kicked off. In addition, when a user joins the conference,
before they are added, they hear an announcement that states the name before they are added, they hear an announcement that states the name
of the person that set up the conference, and what the start and stop of the person that set up the conference, and what the start and stop
times are. They are then asked to speak their name. Then, they are times are. They are then asked to speak their name. Then, they are
dropped in. The conference server then speaks their name, so that dropped in. The conference server then speaks their name, so that
everyone knows who just joined. everyone knows who just joined.
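The timing behavior of this service can be sketched as two timers the controller would arm when the conference is scheduled. This is a minimal illustration only; the function name `conference_timers` and the dictionary layout are hypothetical, and the 10 minute lead time is taken from the description above.

```python
from datetime import datetime, timedelta


def conference_timers(stop_time, warning_lead=timedelta(minutes=10)):
    """Return the instants at which the controller would act.

    warn_at: when to have the warning announcement played
    kick_at: when to remove any remaining participants
    """
    return {
        "warn_at": stop_time - warning_lead,
        "kick_at": stop_time,
    }


# Example with an arbitrary stop time.
timers = conference_timers(datetime(2001, 3, 2, 15, 0))
print(timers["warn_at"], timers["kick_at"])
```

Only the controller needs to hold these timers; the IVR and mixing components are simply called when each one fires.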
This seemingly complex service is very easily constructed by adding This seemingly complex service is very easily constructed by adding
an IVR server as described above. Now, we have a controller, a mixing an IVR server as described above. Now, we have a controller, a mixing
skipping to change at page 32, line ? skipping to change at page 30, line 47
These examples demonstrate the component model we are proposing. The These examples demonstrate the component model we are proposing. The
mixing component does not have application level intelligence. It has mixing component does not have application level intelligence. It has
a call control interface, allowing it to exist anywhere (and be a call control interface, allowing it to exist anywhere (and be
provided by any ASP service) and yet be a callable resource by other provided by any ASP service) and yet be a callable resource by other
application server components. By combining a controller with an IVR application server components. By combining a controller with an IVR
server and the mixing server, complex and useful applications can be server and the mixing server, complex and useful applications can be
constructed in a distributed fashion. constructed in a distributed fashion.
6.3 Continuous Text-to-Speech 6.3 Continuous Text-to-Speech
Caller Controller IVR Server Mixing Server Caller Controller IVR Server Mixing Server
| | | | | | | |
| (1) INVITE | | | | (1) INVITE | | |
|-------------->| (2) INVITE | | |-------------->| (2) INVITE | |
| |----------------->| | | |----------------->| |
| | (3) 200 OK | | | | (3) 200 OK | |
| (4) 183 |<-----------------| | | (4) 183 |<-----------------| |
|<--------------| | | |<--------------| | |
| | (5) ACK | | | | (5) ACK | |
| |----------------->| | | |----------------->| |
skipping to change at page 33, line 6 skipping to change at page 33, line 6
| (18) 200 OK | | | (18) 200 OK | |
|<----------------------| | |<----------------------| |
| | | | | |
| | | | | |
Controller Mixer IVR Server Controller Mixer IVR Server
Figure 8: Advanced Web Scheduled Conference Service: Warning Figure 8: Advanced Web Scheduled Conference Service: Warning
Announcement Announcement
Another example of an application server component is a continuous
Text-to-Speech (TTS) converter. This kind of service allows a real
time text stream (encapsulated in RTP using the RTP payload format
for text [14]) to be received, which is then converted to speech and for text [14]) to be received, which is then converted to speech and
returned as an audio stream encoded using a traditional speech codec, returned as an audio stream encoded using a traditional speech codec,
be it G.723.1, G.711, or what have you. be it G.723.1, G.711, or what have you.
Like the IVR server and mixing server, the TTS server acts as a user Like the IVR server and mixing server, the TTS server acts as a user
agent server. It answers incoming calls, and basically mirrors agent server. It answers incoming calls, and basically mirrors
incoming text back as speech. It continues to do so until the call incoming text back as speech. It continues to do so until the call
is hung up by the initiating client. is hung up by the initiating client.
A TTS service can be done using VoiceXML with an IVR server, as in A TTS service can be done using VoiceXML with an IVR server, as in
skipping to change at page 34, line 15 skipping to change at page 34, line 12
payload type number bound to text/t140. The stream MUST be marked as payload type number bound to text/t140. The stream MUST be marked as
receive-only. receive-only.
The client then ACKs the request. The TTS server SHOULD attempt to The client then ACKs the request. The TTS server SHOULD attempt to
convert all text received on the incoming text stream to speech, and convert all text received on the incoming text stream to speech, and
return the resulting speech on the outgoing audio stream. return the resulting speech on the outgoing audio stream.
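As an illustration of the session description discussed above, the following Python sketch builds an SDP body with a receive-only text stream whose dynamic payload type is bound to text/t140. The function name, host address, port numbers, payload type 98, and the choice of G.711 mu-law (static payload type 0) sent one-way on the audio stream are assumptions made for the example; only the text/t140 binding and the receive-only marking come from the text.

```python
def build_tts_sdp(text_port, audio_port, host="192.0.2.10", text_pt=98):
    """Return an SDP body with a receive-only text stream and an audio stream.

    Hypothetical helper: host, ports, and payload types are illustrative.
    """
    lines = [
        "v=0",
        f"o=tts 0 0 IN IP4 {host}",
        "s=-",
        f"c=IN IP4 {host}",
        "t=0 0",
        # Text stream: dynamic payload type bound to text/t140 (RFC 2793).
        f"m=text {text_port} RTP/AVP {text_pt}",
        f"a=rtpmap:{text_pt} t140/1000",
        # The text stream is marked receive-only, as the text requires.
        "a=recvonly",
        # Audio stream for the synthesized speech (assumed: G.711 mu-law,
        # static payload type 0, sent one-way back to the client).
        f"m=audio {audio_port} RTP/AVP 0",
        "a=sendonly",
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_tts_sdp(11000, 12000))
```

A real server would derive the ports and codecs from its own configuration and from the offer it received; the sketch only shows where the payload type binding and the direction attribute appear.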
6.3.2 Hearing Impaired Service 6.3.2 Hearing Impaired Service
The TTS server is extremely useful in supporting hearing impaired The TTS server is extremely useful in supporting hearing impaired
services. Examples of such services are described in describes a services. Examples of such services are described in [16].
service where a controller accesses a TTS service. Specifically, Section 2.4 describes a service where a controller
accesses a TTS service.
6.4 Messaging Servers 6.4 Messaging Servers
Another type of application server component is a messaging server. Another type of application server component is a messaging server.
Messaging servers allow for callers to record audio messages for Messaging servers allow for callers to record audio messages for
users on the system. Users can also call into the server to retrieve users on the system. Users can also call into the server to retrieve
these messages, delete them, and file them. The system operates these messages, delete them, and file them. The system operates
through the use of voice prompts combined with DTMF detection and/or through the use of voice prompts combined with DTMF detection and/or
speech recognition. The prompts that are played are context speech recognition. The prompts that are played are context
dependent. A messaging server can be viewed as a specialized version dependent. A messaging server can be viewed as a specialized version
skipping to change at page 35, line 9 skipping to change at page 35, line 5
An example usage of this application component is a web front end An example usage of this application component is a web front end
that allows users to leave voicemail for company employees through that allows users to leave voicemail for company employees through
the company web page. The page has a URL for each company employee. the company web page. The page has a URL for each company employee.
If some user A clicks on a URL for employee B, A's phone rings. When If some user A clicks on a URL for employee B, A's phone rings. When
A picks up, they hear a greeting to record a message for employee B. A picks up, they hear a greeting to record a message for employee B.
The call flow for this application is the combination of third party The call flow for this application is the combination of third party
call control combined with access to the service. It is shown in call control combined with access to the service. It is shown in
Figure 9. Figure 9.
| | | |
| | (1) HTTP GET | |
|-------------------->| |
| | (2) 200 OK | |
|<--------------------| |
| | (3) INV | |
| |<-------------| |
| | (4) 200 OK | |
| |------------->| |
| | (5) ACK | |
| |<-------------| |
| | | (6) INV |
| | |--------------------->|
| | | (7) 200 OK |
| | |<---------------------|
| | | (8) ACK |
| | |--------------------->|
| | (9) INV | |
| |<-------------| |
| | (10) 200 OK | |
| |------------->| |
| | (11) ACK | |
| |<-------------| |
| | | |
| | | |
| | | |
Web SIP Controller Messaging
Caller Server
Figure 9: Web Enabled Message Drops
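The message sequence of Figure 9 can also be written down as data. The Python sketch below lists the eleven messages with sender and receiver read off the figure, and records which earlier response supplies the SDP for the INVITE to the messaging server (6) and for the re-INVITE to the caller (9). The class, field, and function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    step: int
    sender: str
    receiver: str
    text: str
    # Step number of the earlier response whose SDP this request reuses.
    sdp_from: Optional[int] = None


def message_drop_flow():
    """Return the Figure 9 message sequence as a list of Message records."""
    web, caller = "Web", "SIP Caller"
    ctrl, msg = "Controller", "Messaging Server"
    return [
        Message(1, web, ctrl, "HTTP GET"),
        Message(2, ctrl, web, "200 OK"),
        Message(3, ctrl, caller, "INVITE"),      # media stream on hold
        Message(4, caller, ctrl, "200 OK"),
        Message(5, ctrl, caller, "ACK"),
        Message(6, ctrl, msg, "INVITE", sdp_from=4),
        Message(7, msg, ctrl, "200 OK"),
        Message(8, ctrl, msg, "ACK"),
        Message(9, ctrl, caller, "re-INVITE", sdp_from=7),
        Message(10, caller, ctrl, "200 OK"),
        Message(11, ctrl, caller, "ACK"),
    ]


if __name__ == "__main__":
    for m in message_drop_flow():
        note = f" (SDP from step {m.sdp_from})" if m.sdp_from else ""
        print(f"({m.step}) {m.sender} -> {m.receiver}: {m.text}{note}")
```

Every SIP message in the list has the controller as one endpoint, which is the defining property of third party call control: the caller and the messaging server never signal each other directly.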
The caller, from a web page, clicks on the URL for the user they wish The caller, from a web page, clicks on the URL for the user they wish
to leave a message for. The result is an HTTP request (1) to the to leave a message for. The result is an HTTP request (1) to the
controller. The URI in this request would be some controller-specific controller. The URI in this request would be some controller-specific
identifier that tells the controller what it needs to do. The identifier that tells the controller what it needs to do. The
controller then calls the user (3) using an SDP with a single media controller then calls the user (3) using an SDP with a single media
stream on hold initially. This is accepted (4), and the resulting SDP stream on hold initially. This is accepted (4), and the resulting SDP
is used in an INVITE to the messaging server (6). The URI of this is used in an INVITE to the messaging server (6). The URI of this
INVITE is that for message drop with standard greeting (sip:sub- INVITE is that for message drop with standard greeting (sip:sub-
jdrosen-deposit@voiceserver.com). The call is accepted (7) and the jdrosen-deposit@voiceserver.com). The call is accepted (7) and the
200 OK is used in a re-INVITE to the caller (9) to set the address of 200 OK is used in a re-INVITE to the caller (9) to set the address of
skipping to change at page 36, line 4 skipping to change at page 36, line 46
components could be offered by separate providers, for example, components could be offered by separate providers, for example,
enabling an ASP component model to evolve. We have observed that many enabling an ASP component model to evolve. We have observed that many
of the components can be described as having some kind of session of the components can be described as having some kind of session
level resource that can be communicated with, usually in an automated level resource that can be communicated with, usually in an automated
fashion. Access to these resources is typically parameterized. As a fashion. Access to these resources is typically parameterized. As a
result, SIP access, using the request URI as a service indicator, is result, SIP access, using the request URI as a service indicator, is
an ideal way to communicate across these components. an ideal way to communicate across these components.
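To make the idea of the request URI as a service indicator concrete, the following Python sketch splits the user part of the message drop URI used earlier (sip:sub-jdrosen-deposit@voiceserver.com) into a subscriber and an action. The "sub-<subscriber>-<action>" layout is inferred from that single example and is an assumption, as are the function name and the returned dictionary keys; a real component would define its own naming convention.

```python
def parse_service_uri(uri):
    """Split a SIP request URI of the assumed form
    sip:sub-<subscriber>-<action>@<host> into its parts."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "sip" or "@" not in rest:
        raise ValueError(f"not a SIP URI with a user part: {uri!r}")
    user, _, host = rest.partition("@")
    parts = user.split("-")
    # Expect at least: "sub", a subscriber name, and an action.
    if len(parts) < 3 or parts[0] != "sub":
        raise ValueError(f"user part does not name a service: {user!r}")
    return {
        # Subscriber names may themselves contain hyphens.
        "subscriber": "-".join(parts[1:-1]),
        "action": parts[-1],
        "host": host,
    }


if __name__ == "__main__":
    print(parse_service_uri("sip:sub-jdrosen-deposit@voiceserver.com"))
```

The component receiving the INVITE needs nothing beyond the request URI to decide which service to run and on whose behalf, which is why the parameters are carried there rather than in a separate control protocol.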
To validate this model, we examined the specific service interfaces To validate this model, we examined the specific service interfaces
that would be defined by IVR servers, conferencing servers, text-to- that would be defined by IVR servers, conferencing servers, text-to-
speech servers and messaging servers. We gave call flows of complex speech servers and messaging servers. We gave call flows of complex
applications built up from these components using the specified applications built up from these components using the specified
interfaces. interfaces.
9 Author's Addresses 9 Changes from -00
o Minor edits
10 Author's Addresses
Jonathan Rosenberg Jonathan Rosenberg
dynamicsoft dynamicsoft
72 Eagle Rock Avenue 72 Eagle Rock Avenue
First Floor First Floor
East Hanover, NJ 07936 East Hanover, NJ 07936
email: jdrosen@dynamicsoft.com email: jdrosen@dynamicsoft.com
Peter Mataga Peter Mataga
dynamicsoft dynamicsoft
72 Eagle Rock Avenue 72 Eagle Rock Avenue
First Floor First Floor
East Hanover, NJ 07936 East Hanover, NJ 07936
email: jdrosen@dynamicsoft.com email: pmataga@dynamicsoft.com
Henning Schulzrinne Henning Schulzrinne
Columbia University Columbia University
M/S 0401 M/S 0401
1214 Amsterdam Ave. 1214 Amsterdam Ave.
New York, NY 10027-7003 New York, NY 10027-7003
email: schulzrinne@cs.columbia.edu email: schulzrinne@cs.columbia.edu
10 Bibliography 11 Bibliography
[1] N. Greene, M. Ramalho, and B. Rosen, "Media gateway control [1] N. Greene, M. Ramalho, and B. Rosen, "Media gateway control
protocol architecture and requirements," Request for Comments 2805, protocol architecture and requirements," Request for Comments 2805,
Internet Engineering Task Force, Apr. 2000. Internet Engineering Task Force, Apr. 2000.
[2] M. Arango, A. Dugan, I. Elliott, C. Huitema, and S. Pickett, [2] M. Arango, A. Dugan, I. Elliott, C. Huitema, and S. Pickett,
"Media gateway control protocol (MGCP) version 1.0," Request for "Media gateway control protocol (MGCP) version 1.0," Request for
Comments 2705, Internet Engineering Task Force, Oct. 1999. Comments 2705, Internet Engineering Task Force, Oct. 1999.
[3] F. Cuervo, N. Greene, C. Huitema, A. Rayhan, B. Rosen, and J. [3] F. Cuervo, N. Greene, C. Huitema, A. Rayhan, B. Rosen, and J.
skipping to change at page 39, line 5 skipping to change at page 38, line 47
[13] S. Donovan, "The SIP INFO method," Request for Comments 2976, [13] S. Donovan, "The SIP INFO method," Request for Comments 2976,
Internet Engineering Task Force, Oct. 2000. Internet Engineering Task Force, Oct. 2000.
[14] G. Hellstrom, "RTP payload for text conversation," Request for [14] G. Hellstrom, "RTP payload for text conversation," Request for
Comments 2793, Internet Engineering Task Force, May 2000. Comments 2793, Internet Engineering Task Force, May 2000.
[15] H. Alvestrand, "Tags for the identification of languages," [15] H. Alvestrand, "Tags for the identification of languages,"
Request for Comments 1766, Internet Engineering Task Force, Mar. Request for Comments 1766, Internet Engineering Task Force, Mar.
1995. 1995.
[16] J. Rosenberg, H. Schulzrinne, and H. Sinnreich, "SIP enabled
services to support the hearing impaired," Internet Draft, Internet
Engineering Task Force, July 2000. Work in progress.
Table of Contents Table of Contents
1 Introduction ........................................ 2 1 Introduction ........................................ 2
2 Why Decompose ....................................... 2 2 Why Decompose ....................................... 2
3 Tightly Coupled Decomposition ....................... 4 3 Tightly Coupled Decomposition ....................... 4
4 The Decoupled Model ................................. 6 4 The Decoupled Model ................................. 6
4.1 Architecture ........................................ 7 4.1 Architecture ........................................ 6
4.2 Benefits of the Decoupling .......................... 10 4.2 Benefits of the Decoupling .......................... 10
5 Architecture for the Interfaces ..................... 11 5 Architecture for the Interfaces ..................... 11
5.1 Naming .............................................. 12 5.1 Naming .............................................. 12
5.2 Additional Message Content .......................... 14 5.2 Additional Message Content .......................... 14
5.3 Session Duration .................................... 14 5.3 Session Duration .................................... 14
5.4 Third Party Call Control ............................ 15 5.4 Third Party Call Control ............................ 15
5.5 Side Channels ....................................... 18 5.5 Side Channels ....................................... 18
6 Patterns for Accessing Components ................... 19 6 Patterns for Accessing Components ................... 19
6.1 Interactive Voice Response Services ................. 19 6.1 Interactive Voice Response Services ................. 19
6.2 Conferencing Servers ................................ 23 6.2 Conferencing Servers ................................ 23
6.2.1 Web Scheduled Conference Services ................... 26 6.2.1 Web Scheduled Conference Services ................... 26
6.2.2 Web Scheduled, IVR supported, Time Limited 6.2.2 Web Scheduled, IVR supported, Time Limited
Conference ..................................................... 27 Conference ..................................................... 27
6.3 Continuous Text-to-Speech ........................... 30 6.3 Continuous Text-to-Speech ........................... 30
6.3.1 Service Interface ................................... 33 6.3.1 Service Interface ................................... 33
6.3.2 Hearing Impaired Service ............................ 34 6.3.2 Hearing Impaired Service ............................ 34
6.4 Messaging Servers ................................... 34 6.4 Messaging Servers ................................... 34
6.4.1 Service Interface ................................... 34 6.4.1 Service Interface ................................... 34
6.4.2 Web Enabled Message Drops ........................... 34 6.4.2 Web Enabled Message Drops ........................... 34
7 Security Considerations ............................. 35 7 Security Considerations ............................. 36
8 Conclusion .......................................... 35 8 Conclusion .......................................... 36
9 Author's Addresses .................................. 37 9 Changes from -00 .................................... 36
10 Bibliography ........................................ 37 10 Author's Addresses .................................. 37
11 Bibliography ........................................ 37
 End of changes. 31 change blocks. 
70 lines changed or deleted 82 lines changed or added
