idnits 2.17.1 

draft-camarillo-sip-deaf-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 25 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 17, 2003) is 7740 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 949 looks like a reference

  -- Missing reference section? '2' on line 954 looks like a reference

  -- Missing reference section? '3' on line 1004 looks like a reference

  -- Missing reference section? '4' on line 1007 looks like a reference

  -- Missing reference section? '5' on line 1011 looks like a reference

  -- Missing reference section? '6' on line 1015 looks like a reference

  -- Missing reference section? '7' on line 1020 looks like a reference

  -- Missing reference section? '8' on line 1024 looks like a reference

  -- Missing reference section? '9' on line 1027 looks like a reference

  -- Missing reference section? '10' on line 1031 looks like a reference

  -- Missing reference section? '11' on line 1036 looks like a reference


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                   SIP WG
3	Internet Draft                                              G. Camarillo
4	                                                                Ericsson
5	                                                               E. Burger
6	                                                      SnowShore Networks
7	                                                          H. Schulzrinne
8	                                                     Columbia University
9	                                                             A. van Wijk
10	                                                                 Viataal
11	draft-camarillo-sip-deaf-02.txt
12	February 17, 2003
13	Expires: August, 2003

15	   Transcoding Services Invocation in the Session Initiation Protocol

17	STATUS OF THIS MEMO

19	   This document is an Internet-Draft and is in full conformance with
20	   all provisions of Section 10 of RFC2026.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF), its areas, and its working groups.  Note that
24	   other groups may also distribute working documents as Internet-
25	   Drafts.

27	   Internet-Drafts are draft documents valid for a maximum of six months
28	   and may be updated, replaced, or obsoleted by other documents at any
29	   time.  It is inappropriate to use Internet-Drafts as reference
30	   material or to cite them other than as "work in progress".

32	   The list of current Internet-Drafts can be accessed at
33	   http://www.ietf.org/ietf/1id-abstracts.txt

35	   To view the list of Internet-Draft Shadow Directories, see
36	   http://www.ietf.org/shadow.html.

38	Abstract

40	   This document describes how to discover the need for transcoding
41	   services in a session established with SIP and how to invoke those
42	   transcoding services. Two models for transcoding services invocation
43	   are introduced: the conference bridge model and the third party call
44	   control model. Both models meet the requirements for SIP regarding
45	   transcoding services invocation to support deaf, hard of hearing and
46	   speech-impaired individuals.

48	                           Table of Contents

50	   1          Introduction ........................................    3
51	   2          Discovery of the Need for Transcoding Services ......    3
52	   3          Transcoding Services Invocation .....................    4
53	   3.1        Terminology .........................................    5
54	   3.2        Conference Bridge Transcoding Model .................    5
55	   3.2.1      Caller's Invocation .................................    6
56	   3.2.2      Callee's Invocation .................................    6
57	   3.3        Third Party Call Control Transcoding Model ..........    8
58	   3.3.1      Callee's Invocation .................................    8
59	   3.3.2      Caller's Invocation .................................   14
60	   3.3.3      Receiving the Original Stream .......................   16
61	   3.3.4      Transcoding Services in Parallel ....................   17
62	   3.3.5      Transcoding Services in Serial ......................   21
63	   4          Security Considerations .............................   21
64	   5          TODO List ...........................................   22
65	   6          Authors' Addresses ..................................   22
66	   7          Bibliography ........................................   22

68	1 Introduction

70	   Two user agents involved in a SIP [1] dialog may find it impossible
71	   to establish a media session due to a variety of incompatibilities.
72	   Assuming that both user agents understand the same session
73	   description format (e.g., SDP), incompatibilities can be found at the
74	   user agent level and at the user level. At the user agent level, both
75	   terminals may not support any common codec or may not support common
76	   media types (e.g., a text-only terminal and an audio-only terminal).
77	   At the user level, a deaf person will not be able to understand what
78	   is said over an audio stream.

80	   In order to make communications possible in the presence of
81	   incompatibilities, user agents need to introduce intermediaries that
82	   provide transcoding services to a session. From the SIP point of
83	   view, the introduction of a transcoder is done in the same way to
84	   resolve both user level and user agent level incompatibilities.
85	   Therefore, the invocation mechanisms described in this document are
86	   generally applicable to any type of incompatibility related to how
87	   the information that needs to be communicated is encoded.

89	   This document does not describe media server discovery. That is an
90	   orthogonal problem that one can address using user agent provisioning
91	   or other methods.

93	   All the examples provided in this document use the Session
94	   Description Protocol (SDP) [2]. However, other session description
95	   formats can be used with the same call flows.

97	   The remainder of this document is organized as follows. Section 2
98	   deals with the discovery of the need for transcoding services for a
99	   particular session. Section 3.2 introduces the conference bridge
100	   transcoding invocation model, and Section 3.3 introduces the third
101	   party call control model. Both models meet the requirements regarding
102	   transcoding services invocation in RFC3351 [3] to support deaf, hard
103	   of hearing and speech-impaired individuals.

105	2 Discovery of the Need for Transcoding Services

107	   Following the one-party consent model defined in RFC 3238 [4],
108	   transcoding invocation is best performed by one of the end-points
109	   involved in the communication. Following the same principle, one of
110	   the end-points should be the one detecting that transcoding is needed
111	   for a particular session.

113	   In order to decide whether or not transcoding is needed, a user agent
114	   needs to know the capabilities of the remote user agent. A user agent
115	   acting as an offerer typically obtains this knowledge by downloading
116	   a presence document that includes media capabilities (e.g., Bob is
117	   available on a terminal that only supports audio) or by getting an
118	   SDP description of media capabilities as defined in RFC 3264 [5].
119	   Presence documents are typically received in a NOTIFY request and SDP
120	   media capabilities descriptions are typically received in a 200 (OK)
121	   response to an OPTIONS request or in a 488 (Not Acceptable Here)
122	   response to an INVITE.

124	   A user agent acting as an answerer typically gets an offer
125	   that it cannot accept. The user agent can send back a media
126	   capabilities description hoping that the offerer will invoke some
127	   type of transcoding services or it can invoke transcoding services
128	   itself.

130	   It is recommended that an offerer does not invoke transcoding
131	   services before making sure that the answerer does not support the
132	   capabilities needed for the session. Making wrong assumptions about
133	   the answerer's capabilities can lead to situations where two
134	   transcoders are introduced (one by the offerer and one by the
135	   answerer) in a session that would not need any transcoding services
136	   at all.

138	        An example of the situation above is a call between two GSM
139	        phones (without using transcoding-free operation). Both
140	        phones use a GSM codec, but the speech is converted from
141	        GSM to PCM by the originating MSC and from PCM back to GSM
142	        by the terminating MSC.

144	   Note that transcoding services can be symmetric (e.g., speech-to-text
145	   plus text-to-speech) or asymmetric (e.g., a one-way speech-to-text
146	   transcoding for a hearing-impaired user who can talk).
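
   The capability comparison described in this section can be sketched
   as follows. This is only an illustration under stated assumptions:
   the helper names (parse_media, needs_transcoding) are hypothetical,
   only the m= lines are inspected, and payload type numbers are
   compared directly, which is only meaningful for static payload
   types. A real implementation would also have to resolve dynamic
   payload types through their a=rtpmap attributes.

```python
def parse_media(sdp):
    """Return {media type: set of payload formats} from the m= lines of an SDP body."""
    media = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            media.setdefault(fields[0], set()).update(fields[3:])
    return media


def needs_transcoding(local_sdp, remote_sdp):
    """True when the two descriptions share no media type with a common format.

    Assumption: payload formats are compared as plain strings, so only
    static payload types (e.g., 0 for PCMU) are compared reliably.
    """
    local = parse_media(local_sdp)
    remote = parse_media(remote_sdp)
    for media_type in local.keys() & remote.keys():
        if local[media_type] & remote[media_type]:
            return False
    return True


audio_only = "m=audio 20000 RTP/AVP 0\nc=IN IP4 A.domain.com"
text_only = "m=text 40000 RTP/AVP 96\nc=IN IP4 B.domain.com\na=rtpmap:96 t140/1000"
print(needs_transcoding(audio_only, text_only))
```

   Running the check before invoking T follows the recommendation
   above: an offerer should first confirm that the answerer really
   lacks the needed capabilities.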

148	3 Transcoding Services Invocation

150	   Once the need for transcoding for a particular session has been
151	   identified as described in Section 2, one of the user agents needs to
152	   invoke transcoding services.

154	   Invoking transcoding services from a server (T) for a session between
155	   two user agents (A and B) involves establishing two media sessions:
156	   one between A and T and another between T and B. How to invoke T's
157	   services (i.e., how to establish both A-T and T-B sessions) depends
158	   on how we model the transcoding service. We have considered two
159	   models for invoking a transcoding service. The first is to use a
160	   (dial-in and/or dial-out) conference bridge that negotiates the
161	   appropriate media parameters on each individual leg (i.e., A-T and
162	   T-B). The second is to use third party call control [6], also
163	   referred to as 3pcc, to invoke the transcoding service. Section 3.2
164	   describes the conference bridge transcoding invocation model, and
165	   Section 3.3 describes the third party call control model.

167	3.1 Terminology

169	   All the figures in this document follow the naming convention below:

171	        SDP A: A session description generated by A. It contains, among
172	             other things, the transport address/es (IP address and port
173	             number) where A wants to receive media for each particular
174	             stream.

176	        SDP B: A session description generated by B. It contains, among
177	             other things, the transport address/es where B wants to
178	             receive media for each particular stream.

180	        SDP A+B: A session description that contains, among other
181	             things, the transport address/es where A wants to receive
182	             media and the transport address/es where B wants to receive
183	             media.

185	        SDP TA: A session description generated by T and intended for A.
186	             It contains, among other things, the transport address/es
187	             where T wants to receive media from A.

189	        SDP TB: A session description generated by T and intended for B.
190	             It contains, among other things, the transport address/es
191	             where T wants to receive media from B.

193	        SDP TA+TB: A session description generated by T that contains,
194	             among other things, the transport address/es where T wants
195	             to receive media from A and the transport address/es where
196	             T wants to receive media from B.

198	3.2 Conference Bridge Transcoding Model

200	   A conference server typically establishes an audio stream with each
201	   participant of a conference. The server sends over each individual
202	   stream the media received over the rest of the streams, typically
203	   performing some mixing. The conference server may have to send audio
204	   to different participants using different audio codecs. We can think
205	   of a transcoding service as a two-party conference server that may
206	   change not only the codec in use, but also the format of the media
207	   (e.g., audio to text). Using this model, the whole A-T-B session is
208	   established in the same way as a conference [7]. Typically, the user
209	   agent invoking the transcoding service sets up the media policy at
210	   the bridge (possibly using a media policy control protocol) and sends
211	   an INVITE to join the conference. The media policy for the session
212	   determines the type of transcoding the bridge will perform.

214	   Once the conference is set up and the invoker has joined it, the
215	   remote user has to be added as a participant as well. Users have two
216	   options to join a conference. A user can dial-in (i.e., send an
217	   INVITE request to the conference bridge) to join a conference, or the
218	   conference bridge can dial-out (i.e., send an INVITE request to the
219	   user) to add the user to the conference. Both dial-in and dial-out
220	   approaches are discussed in the following sections. Section 3.2.1
221	   deals with caller's invocation and Section 3.2.2 deals with callee's
222	   invocation of the service.

224	3.2.1 Caller's Invocation

226	   Once the caller has set up the conference bridge and joined the
227	   conference by sending an INVITE to the bridge, it has two options to
228	   add the callee to the session: sending a REFER [8] to the bridge
229	   (that will instruct the bridge to dial-out) or sending a REFER to the
230	   callee (that will instruct the callee to dial-in).

232	   We recommend the first option (i.e., REFER sent to the bridge). The
233	   bridge, upon reception of the REFER, generates an INVITE towards the
234	   callee. The session description of the INVITE is generated according
235	   to the media policy set up by the caller. Figure 1 shows this
236	   scenario's message flow.
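
   As an illustration of message (4) in the recommended option, the
   sketch below assembles a partial REFER request addressed to the
   bridge, with a Refer-To header field naming the callee. The helper
   name build_refer and all URIs and identifiers are hypothetical, and
   the request is deliberately incomplete: header fields such as Via,
   Max-Forwards, Contact and the From/To tags are omitted, so it is not
   a complete, sendable request.

```python
def build_refer(bridge_uri, referrer_uri, target_uri, call_id, cseq):
    """Assemble a partial REFER request asking the bridge to dial out.

    Assumption: only the header fields relevant to the discussion are
    included; Via, Max-Forwards, Contact and tags are left out.
    """
    lines = [
        f"REFER {bridge_uri} SIP/2.0",
        f"From: <{referrer_uri}>",
        f"To: <{bridge_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REFER",
        # The bridge is asked to send an INVITE to this URI (the callee).
        f"Refer-To: <{target_uri}>",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical addresses, used for illustration only.
print(build_refer("sip:T@T.domain.com", "sip:A@A.domain.com",
                  "sip:B@B.domain.com", "example-call-id", 2))
```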

238	   Note that if the caller chooses to send the REFER directly to the
239	   callee (rather than to the bridge) the callee may generate an INVITE
240	   with a session description that contains media types the bridge was
241	   not configured to handle. In addition to that, some user agents may
242	   not support REFER or may not be able to handle out-of-the-blue REFER
243	   requests.

245	3.2.2 Callee's Invocation

247	   Similarly to the situation above, once the callee has set up the
248	   conference bridge and joined the conference by sending an INVITE to
249	   the bridge, it has two options to add the caller to the session:
250	   sending a REFER to the bridge (that will instruct the bridge to
251	   dial-out) or sending a REFER to the caller (that will instruct the
252	   caller to dial-in).

254	   We recommend the first option (i.e., REFER sent to the bridge). The
255	   bridge, upon reception of the REFER, generates an INVITE with a
256	   Replaces header field [9] towards the caller. The
257	   session description of the INVITE is generated according to the media
258	   policy set up by the callee. Figure 2 shows this scenario's message flow.

259	      A                            T                            B

261	      |                            |                            |
262	      |------(1) INVITE SDP A----->|                            |
263	      |                            |                            |
264	      |<----(2) 200 OK SDP TA------|                            |
265	      |                            |                            |
266	      |----------(3) ACK---------->|                            |
267	      |                            |                            |
268	      | ************************** |                            |
269	      |*   Media Policy Set-up    *|                            |
270	      | ************************** |                            |
271	      |                            |                            |
272	      |---------(4) REFER--------->|                            |
273	      |                            |                            |
274	      |<--------(5) 200 OK---------|                            |
275	      |                            |                            |
276	      |                            |-----(6) INVITE SDP TB----->|
277	      |                            |                            |
278	      |                            |<-----(7) 200 OK SDP B------|
279	      |                            |                            |
280	      |                            |----------(8) ACK---------->|
281	      |                            |                            |
282	      |<--------(9) NOTIFY---------|                            |
283	      |                            |                            |
284	      |---------(10) 200 OK------->|                            |
285	      |                            |                            |
286	      | ************************** | ************************** |
287	      |*          MEDIA           *|*          MEDIA           *|
288	      | ************************** | ************************** |
289	      |

291	   Figure 1: Caller's invocation of a conference bridge

295	   The flow in Figure 2 requires that the caller supports the Replaces
296	   header field. If the caller does not support it, the callee can send
297	   a 488 (Not Acceptable Here) for the original INVITE and attempt to
298	   establish the session acting as a caller (i.e., sending a new
299	   INVITE).

301	   Sending the REFER to the caller (instead of to the bridge) introduces
302	   a number of issues, since there is currently no way for the callee to
303	   inform the caller that the newly established session will substitute
304	   the original session.

306	3.3 Third Party Call Control Transcoding Model

308	   If we model T as a transcoding service rather than a special case of
309	   a conferencing server, a single INVITE transaction from the invoker
310	   of the service provides T with both A's and B's session descriptions.
311	   In order to provide in a single session description information about
312	   media streams that belong to different entities (A and B), the
313	   session description format in use should provide a means to define
314	   how these streams should be mapped. For instance, in a session
315	   description with two audio streams and one text stream, a possible
316	   mapping would be the following: the information received over the
317	   first audio stream should be sent over the text stream and over the
318	   second audio stream, and the incoming text should be sent only over
319	   the first audio stream. SDP [2] can convey this information using the
320	   source and sink attributes [10].
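
   The stream mapping can be sketched in code. The reading of the
   attributes used here is an assumption inferred from the examples in
   this document rather than from the definition in [10]: a=source:N
   labels the media received on a stream, and a=sink:N marks a stream
   over which media from source N is sent out. The helper names
   (parse_streams, routes) are hypothetical.

```python
def parse_streams(sdp):
    """Return one dict per m= line with its media type, source label and sink labels."""
    streams = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            streams.append({"media": line[2:].split()[0], "source": None, "sinks": []})
        elif line.startswith("a=source:") and streams:
            streams[-1]["source"] = line.split(":", 1)[1]
        elif line.startswith("a=sink:") and streams:
            streams[-1]["sinks"].append(line.split(":", 1)[1])
    return streams


def routes(streams):
    """Return (from, to) pairs of 1-based m= line indices.

    Assumed semantics: media received on a stream labelled source:N is
    sent over every stream that lists sink:N.
    """
    result = []
    for i, src in enumerate(streams, 1):
        for j, dst in enumerate(streams, 1):
            if src["source"] is not None and src["source"] in dst["sinks"]:
                result.append((i, j))
    return result


sdp = """m=audio 20000 RTP/AVP 0
a=source:1
a=sink:2
m=text 40000 RTP/AVP 96
a=source:2
a=sink:1"""
print(routes(parse_streams(sdp)))
```

   Under this reading, the audio received on the first stream is sent
   over the text stream, and the text received on the second stream is
   sent over the audio stream.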

322	   As stated previously, the invocation of a transcoding service
323	   consists of establishing two sessions: A-T and T-B. How these
324	   sessions are established depends on which party, the caller (A) or
325	   the callee (B), invokes the transcoding services. However, we have
326	   followed a general principle to design our 3pcc flows: a 200 (OK)
327	   response from the transcoding service has to be received before
328	   contacting the callee. This tries to ensure that the transcoding
329	   service will be available when the callee accepts the session.

331	   However, note that the transcoding service does not know the exact
332	   type of transcoding it will be performing until the callee accepts
333	   the session. Therefore, there is always a chance of failing to
334	   provide transcoding services after the callee has accepted the
335	   session. A system with strict requirements could use preconditions to
336	   avoid this situation. When preconditions are used, the callee is not
337	   alerted until everything is ready for the session.

339	3.3.1 Callee's Invocation

341	   In this scenario, B receives an INVITE from A, and B decides to
342	   introduce T in the session. Figure 3 shows the call flow for this
343	   scenario.

345	   In Figure 3, A can both hear and speak and B is a deaf user with a
346	   speech impairment. A proposes to establish a session that consists of
347	   an audio stream (1). B wants to send and receive only text, so it
348	   invokes a transcoding service T that will perform both speech-to-text
349	      A                            T                            B

351	      |                            |                            |
352	      |-------------------(1) INVITE SDP A--------------------->|
353	      |                            |                            |
354	      |                            |<-----(2) INVITE SDP B------|
355	      |                            |                            |
356	      |                            |------(3) 200 OK SDP TB---->|
357	      |                            |                            |
358	      |                            | ************************** |
359	      |                            |*   Media Policy Set-up    *|
360	      |                            | ************************** |
361	      |                            |                            |
362	      |                            |<--------(5) REFER----------|
363	      |                            |                            |
364	      |                            |---------(6) 200 OK-------->|
365	      |                            |                            |
366	      |<-----(7) INVITE SDP TA-----|                            |
367	      |                            |                            |
368	      |------(8) 200 OK SDP A----->|                            |
369	      |                            |                            |
370	      |<----------(9) ACK----------|                            |
371	      |                            |                            |
372	      |                            |---------(10) NOTIFY------->|
373	      |                            |                            |
374	      |                            |<--------(11) 200 OK--------|
375	      |                            |                            |
376	      |---------------------(12) CANCEL------------------------>|
377	      |                            |                            |
378	      |<--------------------(13) 200 OK-------------------------|
379	      |                            |                            |
380	      |<-------------(14) 487 Request Terminated----------------|
381	      |                            |                            |
382	      |-----------------------(15) ACK------------------------->|
383	      |                            |                            |
384	      | ************************** | ************************** |
385	      |*          MEDIA           *|*          MEDIA           *|
386	      | ************************** | ************************** |
387	      |                            |                            |

389	   Figure 2: Conference bridge transcoding model
390	      A                            T                            B

392	      |                            |                            |
393	      |--------------------(1) INVITE SDP A-------------------->|
394	      |                            |                            |
395	      |                            |<---(2) INVITE SDP A+B------|
396	      |                            |                            |
397	      |                            |---(3) 200 OK SDP TA+TB---->|
398	      |                            |                            |
399	      |                            |<---------(4) ACK-----------|
400	      |                            |                            |
401	      |<-------------------(5) 200 OK SDP TA--------------------|
402	      |                            |                            |
403	      |------------------------(6) ACK------------------------->|
404	      |                            |                            |
405	      | ************************** | ************************** |
406	      |*          MEDIA           *|*          MEDIA           *|
407	      | ************************** | ************************** |
408	      |                            |                            |

410	   Figure 3: Callee's invocation of a transcoding service

412	   and text-to-speech conversions (2). The session descriptions of
413	   Figure 3 are partially shown below.

415	   (1) INVITE SDP A

417	          m=audio 20000 RTP/AVP 0
418	          c=IN IP4 A.domain.com

420	   (2) INVITE SDP A+B

422	          m=audio 20000 RTP/AVP 0
423	          c=IN IP4 A.domain.com
424	          a=source:1
425	          a=sink:2
426	          m=text 40000 RTP/AVP 96
427	          c=IN IP4 B.domain.com
428	          a=rtpmap:96 t140/1000
429	          a=source:2
430	          a=sink:1

432	   (3) 200 OK SDP TA+TB

434	          m=audio 30000 RTP/AVP 0
435	          c=IN IP4 T.domain.com
436	          a=source:1
437	          a=sink:2
438	          m=text 30002 RTP/AVP 96
439	          c=IN IP4 T.domain.com
440	          a=rtpmap:96 t140/1000
441	          a=source:2
442	          a=sink:1

444	   (5) 200 OK SDP TA

446	          m=audio 30000 RTP/AVP 0
447	          c=IN IP4 T.domain.com

449	   Four media streams (i.e., two bi-directional streams) have been
450	   established at this point:

452	        1.   Audio from A to T.domain.com:30000

454	        2.   Text from T to B.domain.com:40000

456	        3.   Text from B to T.domain.com:30002

458	        4.   Audio from T to A.domain.com:20000
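
   The four streams listed above can be derived mechanically from the
   two session descriptions exchanged with T. The sketch below is an
   illustration only: the helper names (receive_addresses,
   unidirectional_flows) are hypothetical, only media-level c= lines
   are considered, and the m= lines of both descriptions are assumed to
   appear in the same order.

```python
def receive_addresses(sdp):
    """Return a (media, host, port) tuple for each m= line of an SDP body."""
    out = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            out.append([media, None, int(port)])
        elif line.startswith("c=") and out:
            # c=IN IP4 <host>; session-level c= lines are ignored here.
            out[-1][1] = line.split()[-1]
    return [tuple(entry) for entry in out]


def unidirectional_flows(endpoints_sdp, transcoder_sdp):
    """Pair m= lines by position and list (media, destination) for each direction."""
    flows = []
    pairs = zip(receive_addresses(endpoints_sdp), receive_addresses(transcoder_sdp))
    for endpoint, transcoder in pairs:
        # The endpoint sends to the address T announced for this stream.
        flows.append((endpoint[0], f"{transcoder[1]}:{transcoder[2]}"))
        # T sends to the address the endpoint announced for this stream.
        flows.append((endpoint[0], f"{endpoint[1]}:{endpoint[2]}"))
    return flows


sdp_a_b = """m=audio 20000 RTP/AVP 0
c=IN IP4 A.domain.com
m=text 40000 RTP/AVP 96
c=IN IP4 B.domain.com"""
sdp_ta_tb = """m=audio 30000 RTP/AVP 0
c=IN IP4 T.domain.com
m=text 30002 RTP/AVP 96
c=IN IP4 T.domain.com"""
for media, destination in unidirectional_flows(sdp_a_b, sdp_ta_tb):
    print(media, "to", destination)
```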

460	   When either A or B decides to terminate the session, B will send a BYE
461	   to T indicating that the session is over.

463	   If the first INVITE (1) received by B is empty (no session
464	   description), the call flow is slightly different. Figure 4 shows the
465	   messages involved.

467	   B may have different reasons for invoking T before knowing A's
468	   session description. B may want to hide its capabilities, and
469	   therefore it wants to return a session description with all the
470	   codecs B supports plus all the codecs T supports. Or T may provide
471	   recording services (besides transcoding), and B wants T to record the
472	   conversation, regardless of whether or not transcoding is needed.

474	   This scenario (Figure 4) is a bit more complex than the previous one.

476	      A                            T                            B

478	      |                            |                            |
479	      |----------------------(1) INVITE------------------------>|
480	      |                            |                            |
481	      |                            |<-----(2) INVITE SDP B------|
482	      |                            |                            |
483	      |                            |---(3) 200 OK SDP TA+TB---->|
484	      |                            |                            |
485	      |                            |<---------(4) ACK-----------|
486	      |                            |                            |
487	      |<-------------------(5) 200 OK SDP TA--------------------|
488	      |                            |                            |
489	      |-----------------------(6) ACK SDP A-------------------->|
490	      |                            |                            |
491	      |                            |<-------(7) INVITE----------|
492	      |                            |                            |
493	      |                            |---(8) 200 OK SDP TA+TB---->|
494	      |                            |                            |
495	      |<-----------------(9) INVITE SDP TA----------------------|
496	      |                            |                            |
497	      |------------------(10) 200 OK SDP A--------------------->|
498	      |                            |                            |
499	      |<-----------------------(11) ACK-------------------------|
500	      |                            |                            |
501	      |                            |<-----(12) ACK SDP A+B------|
502	      |                            |                            |
503	      | ************************** | ************************** |
504	      |*          MEDIA           *|*          MEDIA           *|
505	      | ************************** | ************************** |

507	   Figure 4: Callee's invocation after initial INVITE without SDP

509	   In INVITE (2), B still does not have SDP A, so it cannot provide T
510	   with that information. When B finally receives SDP A in (6), it has
511	   to send it to T. B sends an empty INVITE to T (7) and gets a 200 OK
512	   with SDP TA+TB (8). In general, this SDP TA+TB can be different from
513	   the one that was sent in (3). That is why B needs to send the updated
514	   SDP TA to A in (9). A then sends a possibly updated SDP A (10) and B
515	   sends it to T in (12). However, if T happens to return the same SDP
516	   TA+TB in (8) as in (3), B can skip messages (9), (10) and (11).
517	   Therefore, implementors of transcoding services are encouraged to
518	   return the same session description in (8) as in (3) in this type of
519	   scenario. The session descriptions of this flow are shown below:

521	   (2) INVITE SDP A+B

523	          m=audio 20000 RTP/AVP 0
524	          c=IN IP4 0.0.0.0
525	          a=source:1
526	          a=sink:2
527	          m=text 40000 RTP/AVP 96
528	          c=IN IP4 B.domain.com
529	          a=rtpmap:96 t140/1000
530	          a=source:2
531	          a=sink:1

533	   (3) 200 OK SDP TA+TB

535	          m=audio 30000 RTP/AVP 0
536	          c=IN IP4 T.domain.com
537	          a=source:1
538	          a=sink:2
539	          m=text 30002 RTP/AVP 96
540	          c=IN IP4 T.domain.com
541	          a=rtpmap:96 t140/1000
542	          a=source:2
543	          a=sink:1

545	   (5) 200 OK SDP TA

547	          m=audio 30000 RTP/AVP 0
548	          c=IN IP4 T.domain.com

550	   (6) ACK SDP A

552	          m=audio 20000 RTP/AVP 0
553	          c=IN IP4 A.domain.com

555	   (8) 200 OK SDP TA+TB

557	          m=audio 30004 RTP/AVP 0
558	          c=IN IP4 T.domain.com
559	          a=source:1
560	          a=sink:2
561	          m=text 30006 RTP/AVP 96
562	          c=IN IP4 T.domain.com
563	          a=rtpmap:96 t140/1000
564	          a=source:2
565	          a=sink:1

567	   (9) INVITE SDP TA

569	          m=audio 30004 RTP/AVP 0
570	          c=IN IP4 T.domain.com

572	   (10) 200 OK SDP A

574	          m=audio 20002 RTP/AVP 0
575	          c=IN IP4 A.domain.com

577	   (12) ACK SDP A+B

579	          m=audio 20002 RTP/AVP 0
580	          c=IN IP4 A.domain.com
581	          a=source:1
582	          a=sink:2
583	          m=text 40000 RTP/AVP 96
584	          c=IN IP4 B.domain.com
585	          a=rtpmap:96 t140/1000
586	          a=source:2
587	          a=sink:1

589	   Four media streams (i.e., two bi-directional streams) have been
590	   established at this point:

592	        1.   Audio from A to T.domain.com:30004

594	        2.   Text from T to B.domain.com:40000

596	        3.   Text from B to T.domain.com:30006

598	        4.   Audio from T to A.domain.com:20002
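
   The comparison that lets B skip messages (9), (10) and (11) can be
   sketched in Python as follows. This is an illustration only, not part
   of the mechanism: the helper names media_endpoints and needs_reinvite
   are hypothetical, and the sketch assumes that two session
   descriptions count as "the same" when every m-line has the same media
   type, port and connection address.

```python
def media_endpoints(sdp):
    """Return [(media, port, address), ...], one tuple per m= section."""
    endpoints = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            endpoints.append([media, int(port), None])
        elif line.startswith("c=") and endpoints:
            # c=IN IP4 <address> applies to the preceding m= section
            endpoints[-1][2] = line.split()[-1]
    return [tuple(e) for e in endpoints]


def needs_reinvite(previous_sdp, new_sdp):
    """True if T changed any transport address between two answers."""
    return media_endpoints(previous_sdp) != media_endpoints(new_sdp)


# SDP TA+TB as listed for messages (3) and (8) of Figure 4
SDP_3 = """\
m=audio 30000 RTP/AVP 0
c=IN IP4 T.domain.com
a=source:1
a=sink:2
m=text 30002 RTP/AVP 96
c=IN IP4 T.domain.com
a=rtpmap:96 t140/1000
a=source:2
a=sink:1
"""
SDP_8 = SDP_3.replace("30000", "30004").replace("30002", "30006")

print(needs_reinvite(SDP_3, SDP_8))
```

   With the session descriptions of Figure 4, the ports in (8) differ
   from those in (3), so the sketch reports that the re-INVITE in (9) is
   needed.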

600	3.3.2 Caller's Invocation

601	   In this scenario, A wishes to establish a session with B using a
602	   transcoding service. A uses 3pcc to set up the session between T and
603	   B. The call flow we provide here is slightly different from the ones
604	   in [6]. In [6], the controller establishes a session between two user
605	   agents, and the user agents decide the characteristics of the
606	   streams. Here, A wants to establish a session between T and B,
607	   but A wants to decide how many and which types of streams are
608	   established. That is why A sends its session description in the first
609	   INVITE (1) to T, as opposed to the media-less initial INVITE
610	   recommended by [6]. Figure 5 shows the call flow for this scenario.

612	      A                            T                            B

614	      |                            |                            |
615	      |-------(1) INVITE SDP A---->|                            |
616	      |                            |                            |
617	      |<----(2) 200 OK SDP TA+TB---|                            |
618	      |                            |                            |
619	      |----------(3) ACK---------->|                            |
620	      |                            |                            |
621	      |--------------------(4) INVITE SDP TA------------------->|
622	      |                            |                            |
623	      |<--------------------(5) 200 OK SDP B--------------------|
624	      |                            |                            |
625	      |-------------------------(6) ACK------------------------>|
626	      |                            |                            |
627	      |--------(7) INVITE--------->|                            |
628	      |                            |                            |
629	      |<---(8) 200 OK SDP TA+TB  --|                            |
630	      |                            |                            |
631	      |--------------------(9) INVITE SDP TA------------------->|
632	      |                            |                            |
633	      |<-------------------(10) 200 OK SDP B--------------------|
634	      |                            |                            |
635	      |-------------------------(11) ACK----------------------->|
636	      |                            |                            |
637	      |------(12) ACK SDP A+B----->|                            |
638	      |                            |                            |
639	      | ************************** | ************************** |
640	      |*          MEDIA           *|*          MEDIA           *|
641	      | ************************** | ************************** |
642	      |                            |                            |

644	   Figure 5: Caller's invocation of a transcoding service

645	   We do not include the session descriptions of this flow, since they
646	   are very similar to the ones in Figure 4. In this flow, if T returns
647	   the same SDP TA+TB in (8) as in (2), messages (9), (10) and (11) can
648	   be skipped.
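
   In both flows the controller forwards only T's half of SDP TA+TB to
   the other party. Judging from the session descriptions listed for
   Figure 4, SDP TA is the first media section of SDP TA+TB without its
   source and sink attributes. A minimal Python sketch under that
   assumption follows; the function name first_media_section is
   hypothetical, and session-level lines are ignored because the
   examples in this document omit them.

```python
def first_media_section(sdp, drop=("a=source:", "a=sink:")):
    """Return the first m= section of sdp, minus source/sink attributes."""
    section = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            if section:
                break  # a second m= line ends the first section
            section.append(line)
        elif section and line and not line.startswith(drop):
            section.append(line)
    return "\n".join(section)


# SDP TA+TB with the values used for message (3) of Figure 4
SDP_TA_TB = """\
m=audio 30000 RTP/AVP 0
c=IN IP4 T.domain.com
a=source:1
a=sink:2
m=text 30002 RTP/AVP 96
c=IN IP4 T.domain.com
a=rtpmap:96 t140/1000
a=source:2
a=sink:1
"""

print(first_media_section(SDP_TA_TB))
```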

650	3.3.3 Receiving the Original Stream

652	   Sometimes, as pointed out in the requirements for SIP in support of
653	   deaf, hard of hearing and speech-impaired individuals [3], a user
654	   wants to receive both the original stream (e.g., audio) and the
655	   transcoded stream (e.g., the output of the speech-to-text
656	   conversion). There are various possible solutions for this problem.
657	   One solution consists of using the SDP group attribute with FID
658	   semantics [11]. FID allows requesting that a stream is sent to two
659	   different transport addresses in parallel, as shown below:

661	            a=group:FID 1 2
662	            m=audio 20000 RTP/AVP 0
663	            c=IN IP4 A.domain.com
664	            a=mid:1
665	            m=audio 30000 RTP/AVP 0
666	            c=IN IP4 T.domain.com
667	            a=mid:2

669	   The problem with this solution is that the majority of SIP user
670	   agents do not support FID. Even where FID is supported, many user
671	   agents cannot send simultaneous copies of the same media stream. In
672	   addition, both copies of the stream need to use the same codec.

675	   Therefore, we recommend that T (instead of a user agent) replicate
676	   the media stream. The following session description requests T to
677	   perform speech-to-text and text-to-speech conversions between the
678	   first audio stream and the text stream. In addition, it requests T to
679	   copy the first audio stream to the second audio stream and send it
680	   to A.

682	            m=audio 40000 RTP/AVP 0
683	            c=IN IP4 B.domain.com
684	            a=source:1
685	            a=sink:2
686	            m=audio 20000 RTP/AVP 0
687	            c=IN IP4 A.domain.com
688	            a=recvonly
689	            a=sink:1
690	            m=text 20002 RTP/AVP 96
691	            c=IN IP4 A.domain.com
692	            a=rtpmap:96 t140/1000
693	            a=source:2
694	            a=sink:1
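
   How T might read such a session description can be sketched in
   Python. The sketch assumes, based only on the examples in this
   document, that streams sharing a number in a=source and a=sink
   attributes form one flow, with media from each source converted (or
   copied, when the media types match) to every sink; [10] remains the
   authoritative definition. The function name media_flows is
   hypothetical.

```python
from collections import defaultdict


def media_flows(sdp):
    """Map each flow number to the m= lines acting as its sources and sinks."""
    flows = defaultdict(lambda: {"source": [], "sink": []})
    current = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            current = line
        elif current and line.startswith(("a=source:", "a=sink:")):
            role, flow_id = line[2:].split(":", 1)
            flows[int(flow_id)][role].append(current)
    return dict(flows)


# The session description shown above
SDP = """\
m=audio 40000 RTP/AVP 0
c=IN IP4 B.domain.com
a=source:1
a=sink:2
m=audio 20000 RTP/AVP 0
c=IN IP4 A.domain.com
a=recvonly
a=sink:1
m=text 20002 RTP/AVP 96
c=IN IP4 A.domain.com
a=rtpmap:96 t140/1000
a=source:2
a=sink:1
"""

for flow_id, roles in sorted(media_flows(SDP).items()):
    print(flow_id, roles["source"], "->", roles["sink"])
```

   Under this reading, flow 1 takes B's audio and delivers it to A both
   as a copy (second audio stream) and as text, while flow 2 turns A's
   text into audio for B.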

696	3.3.4 Transcoding Services in Parallel

698	   Transcoding services sometimes consist of human relays (e.g., a
699	   person performing speech-to-text and text-to-speech conversions for a
700	   session). If the same person is involved in both conversions (i.e.,
701	   from A to B and from B to A), he or she has access to the whole
702	   conversation. In order to provide some degree of privacy, sometimes
703	   two different persons are allocated to do the job (i.e., one person
704	   handles A->B and the other B->A). This type of arrangement is also
705	   useful for automated transcoding services, where one machine converts
706	   text to synthetic speech (text-to-speech) and a different machine
707	   performs voice recognition (speech-to-text).

709	   The scenario just described involves four different sessions: A-T1,
710	   T1-B, B-T2 and T2-A. Figure 6 shows the call flow where A invokes T1
711	   and T2.

713	   (1) INVITE SDP AT1

715	          m=text 20000 RTP/AVP 96
716	          c=IN IP4 A.domain.com
717	          a=rtpmap:96 t140/1000
718	          a=sendonly
719	          a=source:1
720	          m=audio 20000 RTP/AVP 0
721	          c=IN IP4 0.0.0.0
722	          a=recvonly
723	          a=sink:1

725	   (2) INVITE SDP AT2

727	          m=text 20002 RTP/AVP 96
728	          c=IN IP4 A.domain.com
729	          a=rtpmap:96 t140/1000
730	          a=recvonly
731	          a=sink:1
732	          m=audio 20000 RTP/AVP 0
733	          c=IN IP4 0.0.0.0
734	          a=sendonly
735	          a=source:1

737	   (3) 200 OK SDP T1A+T1B

739	          m=text 30000 RTP/AVP 96
740	          c=IN IP4 T1.domain.com
741	          a=rtpmap:96 t140/1000
742	          a=recvonly
743	          a=source:1
744	          m=audio 30002 RTP/AVP 0
745	          c=IN IP4 T1.domain.com
746	          a=sendonly
747	          a=sink:1

749	   (5) 200 OK SDP T2A+T2B

751	          m=text 40000 RTP/AVP 96
752	          c=IN IP4 T2.domain.com
753	          a=rtpmap:96 t140/1000
754	          a=sendonly
755	          a=sink:1
756	          m=audio 40002 RTP/AVP 0
757	          c=IN IP4 T2.domain.com
758	          a=recvonly
759	          a=source:1

761	   (7) INVITE SDP T1B+T2B

763	          m=audio 30002 RTP/AVP 0
764	          c=IN IP4 T1.domain.com
765	          a=sendonly
766	          m=audio 40002 RTP/AVP 0
767	          c=IN IP4 T2.domain.com
768	          a=recvonly

775	  A                          T1                     T2            B

777	  |                          |                      |             |
778	  |----(1) INVITE SDP AT1--->|                      |             |
779	  |                          |                      |             |
780	  |----------------(2) INVITE SDP AT2-------------->|             |
781	  |                          |                      |             |
782	  |<-(3) 200 OK SDP T1A+T1B--|                      |             |
783	  |                          |                      |             |
784	  |---------(4) ACK--------->|                      |             |
785	  |                          |                      |             |
786	  |<---------------(5) 200 OK SDP T2A+T2B-----------|             |
787	  |                          |                      |             |
788	  |----------------------(6) ACK------------------->|             |
789	  |                          |                      |             |
790	  |-----------------------(7) INVITE SDP T1B+T2B----------------->|
791	  |                          |                      |             |
792	  |<----------------------(8) 200 OK SDP BT1+BT2------------------|
793	  |                          |                      |             |
794	  |------(9) INVITE--------->|                      |             |
795	  |                          |                      |             |
796	  |-------------------(10) INVITE------------------>|             |
797	  |                          |                      |             |
798	  |<-(11) 200 OK SDP T1A+T1B-|                      |             |
799	  |                          |                      |             |
800	  |<------------(12) 200 OK SDP T2A+T2B-------------|             |
801	  |                          |                      |             |
802	  |------------------(13) INVITE SDP T1B+T2B--------------------->|
803	  |                          |                      |             |
804	  |<-----------------(14) 200 OK SDP BT1+BT2----------------------|
805	  |                          |                      |             |
806	  |--------------------------(15) ACK---------------------------->|
807	  |                          |                      |             |
808	  |---(16) ACK SDP AT1+BT1-->|                      |             |
809	  |                          |                      |             |
810	  |------------(17) ACK SDP AT2+BT2---------------->|             |
811	  |                          |                      |             |
812	  | ************************ | ********************************** |
813	  |*          MEDIA         *|*               MEDIA              *|
814	  | ************************ | ********************************** |
815	  |                          |                      |             |
816	  | *********************************************** | *********** |
817	  |*                      MEDIA                    *|*   MEDIA   *|
818	  | *********************************************** | *********** |
819	  |                          |                      |             |

821	   Figure 6: Transcoding services in parallel

770	   (8) 200 OK SDP BT1+BT2

772	          m=audio 50000 RTP/AVP 0
773	          c=IN IP4 B.domain.com
822	          a=recvonly
823	          m=audio 50002 RTP/AVP 0
824	          c=IN IP4 B.domain.com
825	          a=sendonly

827	   (11) 200 OK SDP T1A+T1B

829	          m=text 30000 RTP/AVP 96
830	          c=IN IP4 T1.domain.com
831	          a=rtpmap:96 t140/1000
832	          a=recvonly
833	          a=source:1
834	          m=audio 30002 RTP/AVP 0
835	          c=IN IP4 T1.domain.com
836	          a=sendonly
837	          a=sink:1

839	   (12) 200 OK SDP T2A+T2B

841	          m=text 40000 RTP/AVP 96
842	          c=IN IP4 T2.domain.com
843	          a=rtpmap:96 t140/1000
844	          a=sendonly
845	          a=sink:1
846	          m=audio 40002 RTP/AVP 0
847	          c=IN IP4 T2.domain.com
848	          a=recvonly
849	          a=source:1

851	   Since T1 has returned the same SDP in (11) as in (3) and T2 has
852	   returned the same SDP in (12) as in (5), messages (13), (14) and (15)
853	   can be skipped.

855	   (16) ACK SDP AT1+BT1

857	          m=text 20000 RTP/AVP 96
858	          c=IN IP4 A.domain.com
859	          a=rtpmap:96 t140/1000
860	          a=sendonly
861	          a=source:1
862	          m=audio 50000 RTP/AVP 0
863	          c=IN IP4 B.domain.com
864	          a=recvonly
865	          a=sink:1

867	   (17) ACK SDP AT2+BT2

869	          m=text 20002 RTP/AVP 96
870	          c=IN IP4 A.domain.com
871	          a=rtpmap:96 t140/1000
872	          a=recvonly
873	          a=sink:1
874	          m=audio 50002 RTP/AVP 0
875	          c=IN IP4 B.domain.com
876	          a=sendonly
877	          a=source:1

879	   Four media streams have been established at this point:

881	        1.   Text from A to T1.domain.com:30000

883	        2.   Audio from T1 to B.domain.com:50000

885	        3.   Audio from B to T2.domain.com:40002

887	        4.   Text from T2 to A.domain.com:20002

889	   Note that B, the user agent server, needs to support two media
890	   streams: one sendonly and the other recvonly. At present, some user
891	   agents support a single sendrecv media stream but do not support a
892	   different media line per direction. Implementers are encouraged to
893	   build support for this feature.
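
   Comparing (16) and (17) with (1) and (2), A's final session
   descriptions appear to be its original offers with the 0.0.0.0
   placeholder section replaced by the port and address that B returned
   in (8). A minimal Python sketch of that substitution follows;
   fill_placeholder is a hypothetical helper, and it assumes that the c=
   line directly follows its m= line, as in the examples above.

```python
PLACEHOLDER = "c=IN IP4 0.0.0.0"


def fill_placeholder(offer, media_line, address):
    """Replace the placeholder (0.0.0.0) section of offer with a real endpoint."""
    lines = [l.strip() for l in offer.splitlines() if l.strip()]
    out = []
    for i, line in enumerate(lines):
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("m=") and following == PLACEHOLDER:
            out.append(media_line)            # port learned from the peer
        elif line == PLACEHOLDER:
            out.append("c=IN IP4 " + address)  # address learned from the peer
        else:
            out.append(line)                   # direction and source/sink kept
    return "\n".join(out)


# SDP AT1 as sent in message (1)
SDP_AT1 = """\
m=text 20000 RTP/AVP 96
c=IN IP4 A.domain.com
a=rtpmap:96 t140/1000
a=sendonly
a=source:1
m=audio 20000 RTP/AVP 0
c=IN IP4 0.0.0.0
a=recvonly
a=sink:1
"""

# B's first audio stream from message (8)
print(fill_placeholder(SDP_AT1, "m=audio 50000 RTP/AVP 0", "B.domain.com"))
```

   Applied to SDP AT1 and B's first audio stream, the sketch reproduces
   the lines listed for message (16).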

895	3.3.5 Transcoding Services in Serial

897	   In a distributed environment, a complex transcoding service (e.g.,
898	   English text to Spanish speech) is often provided by several servers.
899	   For example, one server performs English text to Spanish text
900	   translation, and its output is fed into a server that performs
901	   text-to-speech conversion. The flow in Figure 7 shows how A invokes
902	   T1 and T2.

904	4 Security Considerations

906	   This document describes how to use the REFER method and third party
907	   call control to invoke transcoding services. It does not introduce
908	   new security considerations besides the ones discussed in [8] and
909	   [6].

911	5 TODO List

913	   We need to see whether or not it is possible to use the media policy
914	   work in the 3pcc model as well (instead of source/sink).

916	6 Authors' Addresses

918	   Gonzalo Camarillo
919	   Ericsson
920	   Advanced Signalling Research Lab.
921	   FIN-02420 Jorvas
922	   Finland
923	   electronic mail:  Gonzalo.Camarillo@ericsson.com

925	   Eric W. Burger
926	   SnowShore Networks, Inc.
927	   Chelmsford, MA
928	   USA
929	   electronic mail:  eburger@snowshore.com

931	   Henning Schulzrinne
932	   Dept. of Computer Science
933	   Columbia University, 1214 Amsterdam Avenue, MC 0401
934	   New York, NY 10027
935	   USA
936	   electronic mail:  schulzrinne@cs.columbia.edu

938	   Arnoud van Wijk
939	   Viataal
940	   Research & Development
941	   Afdeling RDS
942	   Theerestraat 42
943	   5271 GD Sint-Michielsgestel
944	   The Netherlands
945	   electronic mail:  a.vwijk@viataal.nl

947	7 Bibliography

949	   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J.
950	   Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
951	   initiation protocol," RFC 3261, Internet Engineering Task Force, June
952	   2002.

956	  A                           T1                    T2            B

958	  |                           |                     |             |
959	  |----(1) INVITE SDP A-----> |                     |             |
960	  |                           |                     |             |
961	  |<-(2) 200 OK SDP T1A+T1T2- |                     |             |
962	  |                           |                     |             |
963	  |----------(3) ACK--------> |                     |             |
964	  |                           |                     |             |
965	  |-----------(4) INVITE SDP T1T2------------------>|             |
966	  |                           |                     |             |
967	  |<-----------(5) 200 OK SDP T2T1+T2B--------------|             |
968	  |                           |                     |             |
969	  |---------------------(6) ACK-------------------->|             |
970	  |                           |                     |             |
971	  |---------------------------(7) INVITE SDP T2B----------------->|
972	  |                           |                     |             |
973	  |<--------------------------(8) 200 OK SDP B--------------------|
974	  |                           |                     |             |
975	  |--------------------------------(9) ACK----------------------->|
976	  |                           |                     |             |
977	  |---(10) INVITE-----------> |                     |             |
978	  |                           |                     |             |
979	  |------------------(11) INVITE------------------->|             |
980	  |                           |                     |             |
981	  |<-(12) 200 OK SDP T1A+T1T2-|                     |             |
982	  |                           |                     |             |
983	  |<-------------(13) 200 OK SDP T2T1+T2B-----------|             |
984	  |                           |                     |             |
985	  |---(14) ACK SDP T1T2+B---> |                     |             |
986	  |                           |                     |             |
987	  |-----------------------(15) INVITE SDP T2B-------------------->|
988	  |                           |                     |             |
989	  |<----------------------(16) 200 OK SDP B-----------------------|
990	  |                           |                     |             |
991	  |----------------(17) ACK SDP T1T2+B------------->|             |
992	  |                           |                     |             |
993	  |----------------------------(18) ACK-------------------------->|
994	  |                           |                     |             |
995	  | ************************* | ******************* | *********** |
996	  |*         MEDIA           *|*       MEDIA       *|*   MEDIA   *|
997	  | ************************* | ******************* | *********** |
998	  |                           |                     |             |

1000	   Figure 7: Transcoding services in serial

954	   [2] M. Handley and V. Jacobson, "SDP: session description protocol,"
1002	   RFC 2327, Internet Engineering Task Force, Apr. 1998.

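1004	   [3] N. Charlton, M. Gasson, G. Gybels, M. Spanner, and A. van Wijk,
	   "User requirements for the session initiation protocol (SIP) in
	   support of deaf, hard of hearing and speech-impaired individuals,"
1005	   RFC 3351, Internet Engineering Task Force, Aug. 2002.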

1007	   [4] S. Floyd and L. Daigle, "IAB architectural and policy
1008	   considerations for open pluggable edge services," RFC 3238, Internet
1009	   Engineering Task Force, Jan. 2002.

1011	   [5] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
1012	   session description protocol (SDP)," RFC 3264, Internet Engineering
1013	   Task Force, June 2002.

1015	   [6] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo,
1016	   "Best current practices for third party call control in the session
1017	   initiation protocol," internet draft, Internet Engineering Task
1018	   Force, June 2002.  Work in progress.

1020	   [7] J. Rosenberg, "A framework for conferencing with the session
1021	   initiation protocol," internet draft, Internet Engineering Task
1022	   Force, Nov. 2002.  Work in progress.

1024	   [8] R. Sparks, "The SIP refer method," internet draft, Internet
1025	   Engineering Task Force, Dec. 2002.  Work in progress.

1027	   [9] B. Biggs, R. Dean, and R. Mahy, "The session initiation protocol
1028	   (SIP)," internet draft, Internet Engineering Task Force, May 2002.
1029	   Work in progress.

1031	   [10] G. Camarillo, H. Schulzrinne, and E. Burger, "The source and
1032	   sink attributes for the session description protocol," internet
1033	   draft, Internet Engineering Task Force, Sept. 2002.  Work in
1034	   progress.

1036	   [11] G. Camarillo, J. Holler, G. Eriksson, and H. Schulzrinne,
1037	   "Grouping of m lines in SDP," internet draft, Internet Engineering
1038	   Task Force, Feb. 2002.  Work in progress.

1040	   The IETF takes no position regarding the validity or scope of any
1041	   intellectual property or other rights that might be claimed to
1042	   pertain to the implementation or use of the technology described in
1043	   this document or the extent to which any license under such rights
1044	   might or might not be available; neither does it represent that it
1045	   has made any effort to identify any such rights. Information on the
1046	   IETF's procedures with respect to rights in standards-track and
1047	   standards-related documentation can be found in BCP-11. Copies of
1048	   claims of rights made available for publication and any assurances of
1049	   licenses to be made available, or the result of an attempt made to
1050	   obtain a general license or permission for the use of such
1051	   proprietary rights by implementors or users of this specification can
1052	   be obtained from the IETF Secretariat.

1054	   The IETF invites any interested party to bring to its attention any
1055	   copyrights, patents or patent applications, or other proprietary
1056	   rights which may cover technology that may be required to practice
1057	   this standard. Please address the information to the IETF Executive
1058	   Director.

1060	   Full Copyright Statement

1062	   Copyright (c) The Internet Society (2003). All Rights Reserved.

1064	   This document and translations of it may be copied and furnished to
1065	   others, and derivative works that comment on or otherwise explain it
1066	   or assist in its implementation may be prepared, copied, published
1067	   and distributed, in whole or in part, without restriction of any
1068	   kind, provided that the above copyright notice and this paragraph are
1069	   included on all such copies and derivative works. However, this
1070	   document itself may not be modified in any way, such as by removing
1071	   the copyright notice or references to the Internet Society or other
1072	   Internet organizations, except as needed for the purpose of
1073	   developing Internet standards in which case the procedures for
1074	   copyrights defined in the Internet Standards process must be
1075	   followed, or as required to translate it into languages other than
1076	   English.

1078	   The limited permissions granted above are perpetual and will not be
1079	   revoked by the Internet Society or its successors or assigns.

1081	   This document and the information contained herein is provided on an
1082	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
1083	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
1084	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
1085	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
1086	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.