idnits 2.17.1 

draft-hellstrom-mmusic-multi-party-rtt-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (February 23, 2020) is 1523 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'ISO 6429' is mentioned on line 1767, but not defined

  == Missing Reference: 'Eve-typing' is mentioned on line 1628, but not
     defined

  == Missing Reference: 'RFC 4103' is mentioned on line 1759, but not defined

  == Missing Reference: 'RTP' is mentioned on line 1761, but not defined

  == Missing Reference: 'RFC 4579' is mentioned on line 1764, but not defined

  == Missing Reference: 'UTF-8' is mentioned on line 1769, but not defined

  == Missing Reference: 'Unicode' is mentioned on line 1771, but not defined

  == Missing Reference: 'UCS-16' is mentioned on line 1777, but not defined


     Summary: 0 errors (**), 0 flaws (~~), 10 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                             G. Hellstrom
3	Internet-Draft                                                   Omnitor
4	Intended status: Best Current Practice                 February 23, 2020
5	Expires: August 26, 2020

7	        Real-time text media handling in multi-party conferences
8	               draft-hellstrom-mmusic-multi-party-rtt-01

10	Abstract

12	   This memo specifies methods for Real-Time Text (RTT) media handling
13	   in multi-party calls.  The main solution is to carry Real-Time text
14	   by the RTP protocol in a time-sampled mode according to RFC 4103.
15	   The main solution for centralized multi-party handling of real-time
16	   text is achieved through a media control unit coordinating multiple
17	   RTP text streams into one RTP session.

19	   Identification of the streams is provided through the CSRC lists in
20	   the RTP packets and through the RTCP messages.  This mechanism
21	   enables the receiving application to present the received real-time
22	   text medium separated per source, in different ways according to user
23	   preferences.  Some presentation related features are also described
24	   explaining suitable variations of transmission and presentation of
25	   text.

27	   Call control features are described for the SIP environment.  A
28	   number of alternative methods for providing the multi-party
29	   negotiation, transmission and presentation are discussed and a
30	   recommendation for the main one is provided.  Two alternative methods
31	   using a single RTP stream and source identification inline in the
32	   text stream are also described, one of them being provided as a lower
33	   functionality fallback method for endpoints with no multi-party
34	   awareness for RTT.

36	   Brief information is also provided for multi-party RTT in the WebRTC
37	   environment.

39	   EDITOR NOTE: A number of alternatives are specified for discussion.
40	   A decision is needed which alternatives are preferred and then how
41	   the preferred alternatives shall be emphasized.

43	Status of This Memo

45	   This Internet-Draft is submitted in full conformance with the
46	   provisions of BCP 78 and BCP 79.

48	   Internet-Drafts are working documents of the Internet Engineering
49	   Task Force (IETF).  Note that other groups may also distribute
50	   working documents as Internet-Drafts.  The list of current Internet-
51	   Drafts is at https://datatracker.ietf.org/drafts/current/.

53	   Internet-Drafts are draft documents valid for a maximum of six months
54	   and may be updated, replaced, or obsoleted by other documents at any
55	   time.  It is inappropriate to use Internet-Drafts as reference
56	   material or to cite them other than as "work in progress."

58	   This Internet-Draft will expire on August 26, 2020.

60	Copyright Notice

62	   Copyright (c) 2020 IETF Trust and the persons identified as the
63	   document authors.  All rights reserved.

65	   This document is subject to BCP 78 and the IETF Trust's Legal
66	   Provisions Relating to IETF Documents
67	   (https://trustee.ietf.org/license-info) in effect on the date of
68	   publication of this document.  Please review these documents
69	   carefully, as they describe your rights and restrictions with respect
70	   to this document.  Code Components extracted from this document must
71	   include Simplified BSD License text as described in Section 4.e of
72	   the Trust Legal Provisions and are provided without warranty as
73	   described in the Simplified BSD License.

75	Table of Contents

77	   1.  Introduction
78	     1.1.  Requirements Language
79	   2.  Centralized conference model
80	   3.  Requirements on multi-party RTT
81	   4.  Coordination of text RTP streams
82	     4.1.  RTP Translator sending one RTT stream per participant
83	     4.2.  RTP Mixer indicating sources in CSRC-list
84	     4.3.  Distributing packets in an end-to-end encryption
85	           structure
86	     4.4.  RTP Mixer indicating participants by a control code in
87	           the stream
88	     4.5.  Mesh of RTP endpoints
89	     4.6.  Multiple RTP sessions, one for each participant
90	     4.7.  Mixing for conference-unaware user agents
91	   5.  RTT bridging in WebRTC
92	     5.1.  RTT bridging in WebRTC with one data channel per source
93	     5.2.  RTT bridging in WebRTC with one common data channel
94	   6.  Preferred multi-party RTT transport method
95	   7.  Session control of multi-party RTT sessions
96	     7.1.  Implicit RTT multi-party capability indication
97	     7.2.  RTT multi-party capability declared by SIP media-tags
98	     7.3.  SDP media attribute for RTT multi-party capability
99	           indication
100	     7.4.  SDP format parameter for RTT multi-party capability
101	           indication
102	     7.5.  Preferred capability declaration method.
103	   8.  Identification of the source of text
104	   9.  Presentation of multi-party text
105	     9.1.  Associating identities with text streams
106	     9.2.  Presentation details for multi-party aware UAs.
107	       9.2.1.  Bubble style presentation
108	       9.2.2.  Other presentation styles
109	   10. Presentation details for multi-party unaware UAs.
110	   11. Transmission of text from each user
111	   12. Robustness and indication of possible loss
112	   13. Performance
113	   14. Security Considerations
114	   15. IANA Considerations
115	   16. Congestion considerations
116	   17. Acknowledgements
117	   18. References
118	     18.1.  Normative References
119	     18.2.  Informative References
120	   Appendix A.  Mixing for a conference-unaware UA
121	     A.1.  Short description
122	     A.2.  Functionality goals and drawbacks
123	     A.3.  Definitions
124	     A.4.  Presentation level procedures
125	       A.4.1.  Structure
126	       A.4.2.  Action on reception
127	     A.5.  Display examples
128	     A.6.  Summary of configurable parameters
129	     A.7.  References for this Appendix
130	     A.8.  Acknowledgement
131	   Author's Address

133	1.  Introduction

135	   Real-time text (RTT) is a medium in real-time conversational
136	   sessions.  Text entered by participants in a session is transmitted
137	   in a time-sampled fashion, so that no specific user action is needed
138	   to cause transmission.  This gives a direct flow of text at the rate
139	   it is created, which is suitable in a real-time conversational
140	   setting.  The real-time text medium can be combined with other media
141	   in multimedia sessions.
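
   The time-sampled behaviour described above can be sketched as a small
   buffer that releases whatever has been typed once per sampling
   interval.  This is only an illustration: the class name TextSampler,
   its methods, and the 300 ms default are assumptions made for this
   sketch and are not defined by this memo.

```python
class TextSampler:
    """Collects typed text and releases it at a fixed sampling interval."""

    def __init__(self, interval_ms=300):
        # interval_ms is an assumed default for this sketch; the memo itself
        # only requires that the interval is not longer than 500 ms.
        if interval_ms > 500:
            raise ValueError("transmission interval must not exceed 500 ms")
        self.interval_ms = interval_ms
        self._buffer = []
        self._last_sent_ms = None

    def add(self, text):
        """Queue text entered by the user; no send action is needed."""
        self._buffer.append(text)

    def poll(self, now_ms):
        """Return the text to transmit at now_ms, or None if nothing is due."""
        if not self._buffer:
            return None
        if (self._last_sent_ms is not None
                and now_ms - self._last_sent_ms < self.interval_ms):
            return None
        chunk = "".join(self._buffer)
        self._buffer.clear()
        self._last_sent_ms = now_ms
        return chunk
```

   The caller is expected to invoke poll() from its own timer; the
   sketch takes the current time as an argument so that it stays
   deterministic.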

143	   Media from a number of multimedia session participants can be
144	   combined in a multi-party session.  This memo specifies how the real-
145	   time text streams are handled in multi-party sessions.

147	   The description is mainly focused on the transport level, but also
148	   describes a few session and presentation level aspects.

150	   Transport of real-time text is specified in RFC 4103 [RFC4103], "RTP
151	   Payload for Text Conversation".  It uses the Real-time Transport
152	   Protocol (RTP), RFC 3550 [RFC3550], for transport.  Robustness against
153	   network transmission problems is normally achieved through redundant
154	   transmission based on the principle from RFC 2198, with one primary
155	   and two redundant transmissions of each text element.  Primary and
156	   redundant transmissions are combined in packets and described by a
157	   redundancy header.  This transport is usually used in the Session
158	   Initiation Protocol (SIP), RFC 3261 [RFC3261], environment.
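
   As an illustration of how primary and redundant text are combined
   behind a redundancy header, the following sketch packs and unpacks a
   payload using the RFC 2198 block header layout.  The function names
   and the payload type value 98 used in the example are assumptions
   for this sketch; the actual payload type is negotiated per session.

```python
import struct


def pack_red_payload(pt, redundant, primary):
    """Build an RFC 2198 style payload.

    pt        -- 7-bit payload type of the text blocks
    redundant -- list of (timestamp_offset, data) tuples, oldest first
    primary   -- bytes of the primary (newest) block
    """
    if not 0 <= pt < 128:
        raise ValueError("payload type must fit in 7 bits")
    headers = b""
    for ts_offset, data in redundant:
        if not 0 <= ts_offset < (1 << 14) or len(data) >= (1 << 10):
            raise ValueError("timestamp offset or block length out of range")
        # F=1 (1 bit) | block PT (7) | timestamp offset (14) | length (10)
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
    # Final one-octet header: F=0 | block PT (7); primary length is implied.
    headers += struct.pack("!B", pt)
    return headers + b"".join(data for _, data in redundant) + primary


def unpack_red_payload(payload):
    """Return a list of (pt, timestamp_offset, data), primary block last."""
    headers = []
    pos = 0
    while payload[pos] & 0x80:          # F bit set: another header follows
        (word,) = struct.unpack_from("!I", payload, pos)
        headers.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF,
                        word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    blocks = []
    for pt, ts_offset, length in headers:
        blocks.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    blocks.append((primary_pt, 0, payload[pos:]))
    return blocks
```

   For example, pack_red_payload(98, [(600, b"H"), (300, b"e")], b"l")
   yields two four-octet redundant headers, one final header octet, and
   three octets of text.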

160	   A very brief overview of functions for real-time text handling in
161	   multi-party sessions is given in RFC 4597 [RFC4597], "Conferencing
162	   Scenarios", Sections 4.8 and 4.10.  This specification builds on that
163	   description and indicates which protocol mechanisms should be used to
164	   implement multi-party handling of real-time text.

166	   EDITOR NOTE: A number of alternatives are specified for discussion.
167	   A decision is needed which alternatives are preferred and then how
168	   the preferred alternatives shall be emphasized.

170	1.1.  Requirements Language

172	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
173	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
174	   document are to be interpreted as described in RFC 2119 [RFC2119].

176	2.  Centralized conference model

178	   In the centralized conference model for SIP, introduced in RFC 4353
179	   [RFC4353] A Framework for Conferencing with the Session Initiation
180	   Protocol (SIP), one function co-ordinates the communication with
181	   participants in the multi-party session.  This function also controls
182	   media mixer functions for the media appearing in the session.  The
183	   central function is common for control of all media, while the media
184	   mixers may work differently for each medium.

186	   The central function is called the Focus UA and may be co-located in
187	   an advanced terminal including multi-party control functions, or it
188	   may be located in a separate location.  Many variants exist for
189	   setting up sessions including the multipoint control centre.
190	   Describing these is out of scope for this document, which instead
191	   covers the media-specific handling in the mixer required to handle
192	   multi-party calls with RTT.

194	   The main principle for handling real-time text media in a centralized
195	   conference is that one RTP session for real-time text is established
196	   including the multipoint media control centre and the participating
197	   endpoints which are going to have real-time text exchange with the
198	   others.

200	   The different possible mechanisms for mixing and transporting RTT
201	   differ in the way they multiplex the text streams and how they
202	   identify the sources of the streams.  RFC 7667 [RFC7667] describes a
203	   number of possible use cases for RTP.  This specification refers to
204	   different sections of RFC 7667 for further reading of the situations
205	   caused by the different possible design choices.

207	3.  Requirements on multi-party RTT

209	   The following requirements are placed on multi-party RTT:

211	      The solution shall be applicable to IMS (3GPP TS 22.173), SIP
212	      based VoIP and Next Generation Emergency Services (NENA i3, ETSI
213	      TS 103 479, RFC 6443).

215	      The transmission interval for text must not be longer than 500
216	      milliseconds when there is anything available to send.  Ref ITU-T
217	      T.140.

219	      If text loss is detected or suspected, a missing text marker shall
220	      be inserted in the text stream where the loss is detected or
221	      suspected.  Ref ITU-T T.140 Amendment 1.  ETSI EN 301 549

223	      The display of text from the members of the conversation shall be
224	      arranged so that the text from each participant is clearly
225	      readable, and its source and the relative timing of entered text
226	      is visualized in the display.  Mechanisms for looking back in the
227	      contents from the current session should be provided.  The text
228	      should be displayed as soon as it is received.  Ref ITU-T T.140

230	      Bridges must be multimedia capable (voice, video, text).  Ref NENA
231	      i3 STA-010.2.

233	      R7: It MUST be possible to use real-time text in conferences both
234	      as a medium of discussion between individual participants (for
235	      example, for sidebar discussions in real-time text while listening
236	      to the main conference audio) and for central support of the
237	      conference with real-time text interpretation of speech.  Ref RFC
238	      5194.

240	      It should be possible to protect RTT contents with usual means for
241	      privacy and integrity.  Ref RFC 6881 section 16.

243	      Conferencing procedures are documented in RFC 4579.  Ref NENA i3
244	      STA-010.2.

246	      Conferencing applies to any kind of media stream by which users
247	      may want to communicate...  Ref 3GPP TS 24.147

249	      The framework for SIP conferences is specified in RFC 4353.  Ref
250	      3GPP TS 24.147

252	4.  Coordination of text RTP streams

254	   Coordinating and sending text RTP streams in the multi-party session
255	   can be done in a number of ways.  The most suitable methods are
256	   specified here with pros and cons.

258	   A receiving UA SHOULD separate text from the different sources and
259	   identify and display them accordingly.

261	4.1.  RTP Translator sending one RTT stream per participant

263	   Within the RTP session, text from each participant is transmitted
264	   from the RTP media translator in a separate RTP stream, thus using
265	   the same destination address/port combination, but separate RTP SSRC
266	   parameters and sequence number series, as described in Sections 7.1
267	   and 7.2 of RFC 3550 [RFC3550] about the Translator function.  The
268	   sources of the text in each RTP packet are identified by the SSRC
269	   parameters in the RTP packets, containing the SSRC of the initial
270	   sources of text.

272	   A receiving UA is supposed to separate text items from the different
273	   sources and identify and display them in a suitable way.

275	   This method is described in RFC 7667, section 3.5.1 Relay-transport
276	   translator or 3.5.2 Media translator.

278	   The identification of the source is made through the RTCP SDES CNAME
279	   and NAME items as described in RFC 3550 [RFC3550].
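
   A receiver in this model could separate the incoming packets per
   source by reading the SSRC from the fixed RTP header, as in the
   following sketch.  The function names are assumptions for this
   illustration, and header extensions and padding are deliberately
   ignored.

```python
import struct
from collections import defaultdict


def parse_rtp_header(packet):
    """Return (seq, timestamp, ssrc, csrcs, payload) for an RTP packet.

    Simplification for this sketch: header extensions and padding are
    not handled.
    """
    first, _marker_pt, seq, timestamp, ssrc = struct.unpack_from(
        "!BBHII", packet, 0)
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = first & 0x0F
    csrcs = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return seq, timestamp, ssrc, csrcs, packet[12 + 4 * csrc_count:]


def demux_by_ssrc(packets):
    """Group (sequence number, payload) pairs by the SSRC of the sender."""
    streams = defaultdict(list)
    for packet in packets:
        seq, _timestamp, ssrc, _csrcs, payload = parse_rtp_header(packet)
        streams[ssrc].append((seq, payload))
    return dict(streams)
```

   Each resulting list can then be fed to a separate RFC 4103 receiver
   instance, so that redundancy recovery and loss marking are done per
   source.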

281	   Pros:

283	   This method has moderate overhead.  When packet loss occurs, text
284	   can be recovered from redundancy as long as the run of lost packets
285	   does not exceed the number of redundant generations in the RFC 4103
286	   stream (normally two).

288	   Loss exceeding what can be recovered can be detected, and the marker
289	   for text loss can be inserted in the correct stream.

291	   It may be possible in some scenarios to keep the text encrypted
292	   through the Translator.

294	   Cons:

296	   There may be RTP implementations not supporting the Translator model.

298	   It is also likely that this configuration is not supported by
299	   current media declarations in SDP.  RFC 3264 specifies in many places
300	   that one media description is supposed to describe just one RTP
301	   stream.

303	4.2.  RTP Mixer indicating sources in CSRC-list

305	   An RTP media mixer combines text from all participants except the
306	   receiving endpoint into one RTP stream, thus using the same
307	   destination address/port combination, the same RTP SSRC, and one
308	   sequence number series, as described in Sections 7.1 and 7.3 of RFC
309	   3550 [RFC3550] about the Mixer function.  The sources of the text in
310	   each RTP packet are identified by the CSRC parameters in the RTP
311	   packets, containing the SSRC of the initial sources of text.  The
312	   order of the CSRC parameters is the same as the order of the
313	   redundant and primary data fields in the packet.  If all redundancy
314	   blocks in a packet are from the same source, then it is allowed to
315	   use only one CSRC in the RTP packet.  This method is described in RFC
316	   7667, section 3.6.3 Media switching mixer.
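
   The pairing of CSRC entries with data blocks described above can be
   sketched as follows.  The function name assign_sources and its
   argument layout are assumptions for this illustration; the detailed
   rules for combining this method with RFC 4103 are still to be
   specified.

```python
def assign_sources(csrcs, blocks):
    """Pair each text block with the SSRC of its original source.

    csrcs  -- CSRC list taken from the RTP header
    blocks -- redundant blocks followed by the primary block, in the
              order they appear in the packet
    """
    if len(csrcs) == 1:
        # Shorthand: a single CSRC covers all blocks in the packet.
        return [(csrcs[0], block) for block in blocks]
    if len(csrcs) != len(blocks):
        raise ValueError("CSRC count does not match the number of blocks")
    return list(zip(csrcs, blocks))
```

   With csrcs = [A, B, A] and three blocks, the oldest redundant block
   is attributed to A, the next to B, and the primary block to A.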

318	   A set of specific rules for the application of this method together
319	   with RFC 4103 is needed.

321	   The identification of the source can be made through the RTCP SDES
322	   CNAME and NAME items as described in RFC 3550 [RFC3550].

324	   The notification sent according to RFC 4575 when a participant joins
325	   the conference also provides suitable information and a reference to
326	   the SSRC.

328	   A receiving UA is supposed to separate text items from the different
329	   sources and identify and display them accordingly.

331	   The ordered CSRC lists in the RFC 4103 packets make it possible to
332	   recover from loss of one or two packets in sequence and assign the
333	   recovered text to the right source.  For more loss, a marker for
334	   possible loss should be inserted or presented.
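
   A receiver can decide when a loss marker is needed by comparing RTP
   sequence numbers, as in this sketch.  It assumes in-order delivery
   without duplicates, two redundant generations by default, and U+FFFD
   as the missing text marker; the function names are illustrative
   only.

```python
MISSING_TEXT_MARKER = "\ufffd"  # assumed marker character for this sketch


def lost_packets(prev_seq, seq):
    """Packets missing between two received 16-bit sequence numbers."""
    return (seq - prev_seq - 1) % 65536


def needs_loss_marker(prev_seq, seq, redundant_generations=2):
    """True when the gap is larger than the redundant data can cover."""
    return lost_packets(prev_seq, seq) > redundant_generations
```

   With the default of two redundant generations, a gap of one or two
   packets is recoverable, while a gap of three or more calls for the
   marker.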

336	   The conference server needs to have authority to decrypt the payload
337	   in the RTP packets in order to be able to recover text from redundant
338	   data or insert the missing text marker in the stream, and repack the
339	   text in new packets.

341	   Pros:

343	   This method has moderate overhead.

345	   When packet loss occurs, text can be recovered from redundancy as
346	   long as the run of lost packets does not exceed the number of
347	   redundant generations in the RFC 4103 stream (normally two).

349	   This method can be implemented with most RTP implementations.

351	   Cons:

353	   When more consecutive packets are lost than the number of generations
354	   of redundant data, it is not possible to deduce the sources of the
355	   totally lost data.  Therefore it is not possible to know in which
356	   stream to insert the missing text marker.  It MAY be acceptable to
357	   either provide a general loss indication, or insert a loss marker in
358	   all streams.  Calculations of most likely source can however be made
359	   from received RTP and RTCP contents so that the loss marker can be
360	   inserted in the most likely affected stream.

362	   The conference server needs to be allowed to decrypt/encrypt the
363	   packet payload.  This is however normal for media mixers for other
364	   media.

366	4.3.  Distributing packets in an end-to-end encryption structure

368	   In order to achieve end-to-end encryption, it is possible to let the
369	   packets from the sources just pass through a central distributor, and
370	   handle the security agreements between the participants.
371	   Specifications exist for a framework with this functionality suitable
372	   for application on RTP based conferences in draft-ietf-perc-private-
373	   media-framework.  The RTP flow and mixing characteristics have
374	   similarities with the method described under "RTP Translator sending
375	   one RTT stream per participant" above.  RFC 4103 RTP streams would
376	   fit into the structure and it would provide a base for end-to-end
377	   encrypted RTT multi-party conferencing.

379	   Pros:

381	   Good security

383	   Straightforward multi-party handling.

385	   Cons:

387	   Does not operate under the usual SIP central conferencing
388	   architecture.

390	   Requires the participants to perform a lot of key handling.

392	4.4.  RTP Mixer indicating participants by a control code in the stream

394	   Text from all participants except the receiving one is transmitted
395	   from the media mixer in the same RTP session and stream, thus all
396	   using the same destination address/port combination, the same RTP
397	   SSRC, and one sequence number series, as described in Sections 7.1
398	   and 7.3 of RFC 3550 [RFC3550] about the Mixer function.  The sources
399	   of the text in each RTP packet are identified by a newly defined
400	   T.140 control code "c" followed by a unique identification of the
401	   source in UTF-8 string format.

403	   The receiver can use the string for presenting the source of text.
404	   At the RTP level, this method is described in RFC 7667, section 3.6.2
405	   Media mixing mixer.

407	   The inline coding of the source of text is applied in the data stream
408	   itself, and an RTP mixer function is used for coordinating the
409	   sources of text into one RTP stream.

411	   Information uniquely identifying each user in the multi-party session
412	   is placed as the parameter value "n" in the T.140 application
413	   protocol function with the function code "c".  The identifier shall
414	   thus be formatted like this: SOS c n ST, where SOS and ST are coded
415	   as specified in ITU-T T.140 [T.140].  The "c" is the letter "c".  The
416	   n parameter value is a string uniquely identifying the source.  This
417	   parameter shall be kept short so that it can be repeated in the
418	   transmission without concerns for network load.
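
   The inline source identifier described above could be produced and
   parsed as in the following sketch.  It assumes that SOS and ST are
   the C1 control characters U+0098 and U+009C and that no separator is
   placed between "c" and the identifier.  The "c" function itself is
   only proposed in this memo, and the function names are illustrative.

```python
SOS = "\u0098"  # START OF STRING (assumed coding for this sketch)
ST = "\u009c"   # STRING TERMINATOR (assumed coding for this sketch)


def source_label(name):
    """Encode the proposed source identifier: SOS c n ST."""
    return SOS + "c" + name + ST


def split_by_source(stream):
    """Return (source, text) pairs from a mixed text stream.

    Text seen before the first label is attributed to source None.
    Other SOS functions are not handled in this sketch.
    """
    result = []
    source = None
    pos = 0
    while pos < len(stream):
        if stream.startswith(SOS + "c", pos):
            end = stream.find(ST, pos)
            if end == -1:
                break                    # incomplete label, wait for more
            source = stream[pos + 2:end]
            pos = end + 1
            continue
        nxt = stream.find(SOS + "c", pos)
        if nxt == -1:
            nxt = len(stream)
        result.append((source, stream[pos:nxt]))
        pos = nxt
    return result
```

   A mixer would insert source_label() output each time the source of
   the following text changes, and encode the result as UTF-8 before
   transmission.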

420	   A receiving UA is supposed to separate text items from the different
421	   sources and identify and display them accordingly.

423	   The conference server needs to be allowed to decrypt/encrypt the
424	   packet payload in order to check the source and repack the text.

426	   Pros:

428	   If packet loss occurs, text can be recovered from redundancy as long
429	   as the run of lost packets does not exceed the number of redundant
430	   generations in the RFC 4103 stream (normally two).

432	   This method can be implemented with most RTP implementations.

434	   Transmitted text can also be used with other transports than RTP

436	   Cons:

438	   If more consecutive packets are lost than the number of generations
439	   of redundant data, it is not possible to deduce the source of the
440	   totally lost data.  Therefore it is not possible to know in which
441	   stream to insert the missing text marker.  Calculations of most
442	   likely source can however be made from recent history, so that it is
443	   quite likely that the marker is inserted in the correct stream.  Such
444	   loss should however be rare, and a general warning that there might
445	   have been text loss in the session might be acceptable.

447	   The mixer needs to be able to generate unique source identifications
448	   that are suitable as labels for the sources.

450	   Requires an extension of the ITU-T T.140 standard, best made by the
451	   ITU.

453	   The conference server needs to be allowed to decrypt/encrypt the
454	   packet payload.

459	4.5.  Mesh of RTP endpoints

461	   Text from all participants is transmitted directly to all others in
462	   one RTP session, without a central bridge.  The sources of the text
463	   in each RTP packet are identified by the source network address and
464	   the SSRC.

466	   This method is described in RFC 7667, section 3.4 Point to multi-
467	   point using mesh.

469	   Pros:

471	   When packet loss occurs, text can be recovered from redundancy as
472	   long as the run of lost packets does not exceed the number of
473	   redundant generations in the RFC 4103 stream (normally two).

475	   This method can be implemented with most RTP implementations.

477	   Transmitted text can also be used with other transports than RTP

479	   Cons:

481	   This model is not described in IMS, NENA and EENA specifications, and
482	   therefore does not meet the requirements.

484	4.6.  Multiple RTP sessions, one for each participant

486	   Text from all participants is transmitted directly to all others in
487	   one RTP session each, without a central bridge.  Each session is
488	   established with a separate media description in SDP.  The sources of
489	   the text in each RTP packet are identified by the source network
490	   address and the SSRC.

492	   This method is out of scope for further discussion here, because the
493	   foreseen applications use centralized model conferencing.

495	   Pros:

497	   When packet loss occurs, text can be recovered from redundancy as
498	   long as the run of lost packets does not exceed the number of
499	   redundant generations in the RFC 4103 stream (normally two).

501	   Complete loss of text can be indicated in the received stream.

503	   This method can be implemented with most RTP implementations.

505	   End-to-end encryption is achievable.

507	   Cons:

509	   This method is not described in IMS, NENA and EENA specifications and
510	   therefore does not meet the requirements.

512	   A lot of network resources are spent on setting up separate sessions
513	   for each participant.

515	4.7.  Mixing for conference-unaware user agents

517	   Multi-party real-time text contents can be transmitted to conference-
518	   unaware user agents if source labeling and formatting of the text is
519	   performed by a mixer.  This method has the limitations that the
520	   layout of the presentation and the format of source identification
521	   are purely controlled by the mixer, and that only one source at a
522	   time is allowed to present in real time.  Text from other sources
523	   needs to be stored temporarily, waiting for an appropriate moment to
524	   switch the source of transmitted text.  The mixer controls the
525	   switching of sources and inserts a source identifier in text format
526	   at the beginning of text after a switch of source.  The logic of the
527	   mixer should detect a number of places in text where a switch can be
528	   allowed, including new line, end of sentence, end of phrase, a period
529	   of inactivity, and a word separator after a long time of active
530	   transmission.
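
   The switching decision described above could look roughly like the
   following sketch.  The function name, the punctuation sets, the
   treatment of U+2028 and "\n" as new line, and both time limits are
   assumptions for this illustration and are not values specified by
   this memo.

```python
def switch_allowed(sent_text, idle_ms, active_ms,
                   idle_limit_ms=1000, active_limit_ms=10000):
    """Decide whether the mixer may switch away from the current source.

    sent_text -- text already forwarded from the current source
    idle_ms   -- time since the current source last produced text
    active_ms -- time the current source has been transmitting
    """
    if idle_ms >= idle_limit_ms:
        return True                      # a period of inactivity
    if not sent_text:
        return False
    last = sent_text[-1]
    if last in "\n\u2028":
        return True                      # new line
    if last in ".?!":
        return True                      # end of sentence
    if last in ",;:":
        return True                      # end of phrase
    if last == " " and active_ms >= active_limit_ms:
        return True                      # word separator after long activity
    return False
```

   A mixer would evaluate this each time text from another source is
   waiting, and keep buffering that text while the function returns
   False.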

532	   This method MAY be used when no support for multi-party awareness is
533	   detected in the receiving endpoint.  The base for this method is
534	   described in RFC 7667, section 3.6.2 Media mixing mixer.

536	   See Appendix A for an informative example of a procedure for
537	   presenting RTT to a conference-unaware UA.

539	   Pros:

541	   Can be transmitted to conference-unaware endpoints.

543	   Can be used with other transports than RTP

545	   Cons:

547	   Does not allow full real-time presentation of more than one source at
548	   a time.  Text from other sources will be delayed, even if automatic
549	   detection of suitable moments for switching source for presentation
550	   is made by the mixer.

552	   The only realistic presentation format is a style with the text from
553	   the different sources presented with a text label indicating source,
554	   and the text collected in a chat style presentation but with more
555	   frequent turn-taking.

557	   Endpoints often have their own system for adding labels to the RTT
558	   presentation.  In that case there will be two levels of labels in the
559	   presentation, one for the mixer and one for the sources.

561	   If more packets are lost than can be recovered by the redundancy, it
562	   is not possible to detect which source was affected by the loss.  It
563	   is also possible that a source switch occurred during the loss, and
564	   therefore a false indication of the source of text can be provided
565	   to the user after such loss.

567	   Because of all these cons, this method MUST NOT be used as the main
568	   method, but only as the last resort for backwards interoperability
569	   with conference-unaware endpoints.

571	   The conference server needs to be allowed to decrypt/encrypt the
572	   packet payload.

574	5.  RTT bridging in WebRTC

576	   Within WebRTC, real-time text is specified to be carried in WebRTC
577	   data channels as specified in draft-ietf-mmusic-t140-usage-data-
578	   channel.  A few ways to handle multi-party RTT are mentioned briefly.
579	   They are explained and further detailed below.

581	5.1.  RTT bridging in WebRTC with one data channel per source

583	   A straightforward way to handle multi-party RTT is for the bridge to
584	   open one T.140 data channel per source towards the receiving
585	   participants.

587	   The stream-id forms a unique stream identification.

589	   The identification of the source is made through the Label property
590	   of the channel, and session information belonging to the source.  The
591	   UA can compose a readable label for the presentation from this
592	   information.

594	   Pros:

596	   This is a straightforward solution.

598	   Cons:

600	   With a high number of participants, the overhead of establishing the
601	   high number of data channels required may be high.

603	5.2.  RTT bridging in WebRTC with one common data channel

605	   A way to handle multi-party RTT in WebRTC is for the bridge to
606	   combine text from all sources into one data channel and insert the
607	   sources in the stream by a T.140 control code for source.

609	   This method is described in a corresponding section for RTP
610	   transmission above.

612	   The identification of the source is made through insertion in the
613	   beginning of each text transmission from a source of a control code
614	   extension "c" followed by a string representing the source, framed by
615	   the control code start and end flags SOS and ST (See ITU-T T.140
616	   [T.140]).

618	   A receiving UA is supposed to separate text items from the different
619	   sources and identify and display them in a suitable way.

621	   The UA does not always display the source identification in the
622	   received text at the place where it is received, but has the
623	   information as a guide for planning the presentation of received
624	   text.  A label corresponding to the source identification is
625	   presented when needed depending on the selected presentation style.
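
   The separation step described above can be sketched as follows, in
   Python.  This is an illustration only: it assumes that the proposed
   control code extension "c" directly follows SOS (U+0098) and that
   the source string runs up to ST (U+009C).  The function name
   split_by_source is hypothetical, and the control code itself is not
   yet standardised.

```python
SOS = "\u0098"  # Start Of String control character (ISO 6429 C1 set)
ST = "\u009c"   # String Terminator control character (ISO 6429 C1 set)


def split_by_source(stream):
    """Return a list of (source, text) pairs from a combined stream.

    Assumes each transmission starts with SOS + "c" + source + ST, as
    proposed in this section. Text before any tag gets source None.
    """
    items = []
    source = None
    pos = 0
    while pos < len(stream):
        if stream.startswith(SOS + "c", pos):
            end = stream.find(ST, pos)
            if end == -1:
                break  # incomplete tag; wait for more data
            source = stream[pos + 2:end]
            pos = end + 1
            continue
        # Collect text up to the next SOS (or the end of the stream).
        nxt = stream.find(SOS, pos + 1)
        if nxt == -1:
            nxt = len(stream)
        items.append((source, stream[pos:nxt]))
        pos = nxt
    return items
```

   The UA can then route each (source, text) pair to the presentation
   area it has chosen for that source.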

627	   Pros:

629	   This solution has relatively low overhead on session and network
630	   level.

632	   Cons:

634	   This solution has higher overhead on the media contents level than
635	   the WebRTC solution above.

637	   Standardisation of the new control code "c" in ITU-T T.140 is
638	   required.

640	   The conference server needs to be allowed to decrypt/encrypt the data
641	   channel contents.

643	6.  Preferred multi-party RTT transport method

645	   EDITOR NOTE: The recommendations here need to be validated, and the
646	   proposed further studies performed.

648	   For RTP transport of RTT, one method for multi-party mixing and
649	   transport for conference-aware parties stands out as fulfilling the
650	   goals best: "RTP Mixer indicating participants in CSRC".

652	   For WebRTC, one method is preferred because of its simplicity.  So,
653	   for WebRTC, the method to implement for multi-party RTT with
654	   conference-aware parties when no other method is explicitly agreed
655	   between implementing parties is: "RTT bridging in WebRTC with one
656	   data channel per source".

658	7.  Session control of multi-party RTT sessions

660	   General session control aspects for multi-party sessions are
661	   described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP)
662	   Event Package for Conference State, and RFC 4579 [RFC4579] Session
663	   Initiation Protocol (SIP) Call Control - Conferencing for User
664	   Agents.  The nomenclature of these specifications is used here.

666	   The procedures for a conference-aware model for RTT-transmission
667	   shall only be applied if a capability exchange for conference-aware
668	   real-time text transmission has been completed and a supported method
669	   for multi-party real-time text transmission can be identified.

671	   A method for detection of conference-awareness for centralized SIP
672	   conferencing in general is specified in RFC 4579 [RFC4579].  The
673	   focus sends the "isfocus" feature tag in a SIP Contact header.  This
674	   causes the conference-aware UA to subscribe to conference
675	   notifications from the focus.  The focus then sends notifications to
676	   the UA about entering and disappearing conference participants and
677	   their media capabilities.  The information is carried XML-formatted
678	   in a 'conference-info' block in the notification according to RFC
679	   4575.  The mechanism is described in detail in RFC 4575 [RFC4575].

681	   Before a conference media server starts sending multi-party RTT to a
682	   UA, a verification of its ability to handle multi-party RTT must be
683	   made.  A decision on which mechanism to use for identifying text from
684	   the different participants must also be taken, implicitly or
685	   explicitly.  These verifications and decisions can be done in a
686	   number of ways.  The most apparent ways are specified here and their
687	   pros and cons described.  One of the methods is selected to be the
688	   one to be used by implementations according to this specification.

690	7.1.  Implicit RTT multi-party capability indication

692	   Capability for RTT multi-party handling can be decided to be
693	   implicitly indicated by session control items.

695	   The focus may implicitly indicate multi-party RTT capability by
696	   including the media child with value "text" in the RFC 4575
697	   conference-info provided in conference notifications.

699	   A UA may implicitly indicate multi-party RTT capability by including
700	   the text media in the SDP in the session control transactions with
701	   the conference focus after the subscription to the conference has
702	   taken place.

704	   The implicit RTT capability indication means for the focus that it
705	   can handle multi-party RTT according to the preferred method
706	   indicated in the RTT multi-party methods section above.

708	   The implicit RTT capability indication means for the UA that it can
709	   handle multi-party RTT according to the preferred method indicated in
710	   the RTT multi-party methods section above.

712	   If the focus detects that a UA implicitly declared RTT multi-party
713	   capability, it SHALL provide RTT according to the preferred method.

715	   If the focus detects that the UA does not indicate any RTT multi-
716	   party capability, then it shall either provide RTT multi-party text
717	   in the way specified for conference-unaware UA above, or refuse to
718	   set up the session.

720	   If the UA detects that the focus has implicitly declared RTT multi-
721	   party capability, it shall be prepared to present RTT in a multi-
722	   party fashion according to the preferred method.
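
   As an illustration only, a UA could detect the implicit indication
   from the focus by scanning the RFC 4575 conference-info document
   for a media type element with the value "text".  The sketch below
   is in Python.  The function name focus_indicates_text is
   hypothetical, and treating every <type> element in the document as
   a media type is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# XML namespace of the conference-info document defined in RFC 4575.
NS = "{urn:ietf:params:xml:ns:conference-info}"


def focus_indicates_text(conference_info_xml):
    """Return True if any media <type> element has the value "text"."""
    root = ET.fromstring(conference_info_xml)
    for type_el in root.iter(NS + "type"):
        if (type_el.text or "").strip() == "text":
            return True
    return False
```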

724	   Pros:

726	   Acceptance of implicit multi-party capability implies that no
727	   standardisation of explicit RTT multi-party capability exchange is
728	   required.

730	   Cons:

732	   If other methods for multi-party RTT are to be used in the same
733	   implementation environment as the preferred ones, then capability
734	   exchange needs to be defined for them.

736	   Cannot be used outside a strictly applied SIP central conference
737	   model.

739	7.2.  RTT multi-party capability declared by SIP media-tags

741	   Specifications for RTT multi-party capability declarations can be
742	   agreed for use as SIP media feature tags, to be exchanged during SIP
743	   call control operation according to the mechanisms in RFC 3840 and
744	   RFC 3841.  RTT multi-party capability is then indicated by the
745	   media feature tag "rtt-mixer", with one or more of its possible
746	   values in a comma-separated list.

748	   The possible values in the list are:

750	      rtp-translator

752	      rtp-mixer

754	      t140-mixer

756	      rtp-mesh

758	      multi-session

760	   rtp-translator indicates capability for using the RTP-translator
761	   based coordination of multi-party text.

763	   rtp-mixer indicates capability for using the RTP-mixer based
764	   presentation of multi-party text.

766	   t140-mixer indicates capability for using the T.140 control code
767	   source indicators in a mixer.

769	   text-mixer indicates capability for using the fallback method with
770	   text formatting for conference-unaware endpoints.

772	   rtp-mesh indicates capability for using the mesh based transmission
773	   of multi-party text.

775	   multi-session indicates capability for using separate point-to-point
776	   RTP sessions between all participants.

778	   Example: Contact: <sip:a2@beco.example.com>

780	   ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL"

782	   ;+sip.rtt-mixer="multi-session"
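
   A receiver of such a Contact header could extract the declared
   methods roughly as in the following Python sketch.  The function
   name rtt_mixer_methods is hypothetical, the feature tag is only
   proposed in this document, and the set of known values is taken
   from the value descriptions above (including "text-mixer").

```python
import re

# Values described in this section; unknown values are ignored.
KNOWN_METHODS = {"rtp-translator", "rtp-mixer", "t140-mixer",
                 "text-mixer", "rtp-mesh", "multi-session"}


def rtt_mixer_methods(contact):
    """Extract the rtt-mixer values from a SIP Contact header value."""
    match = re.search(r';\s*\+sip\.rtt-mixer="([^"]*)"', contact)
    if match is None:
        return []  # no multi-party RTT capability declared
    values = [v.strip().lower() for v in match.group(1).split(",")]
    return [v for v in values if v in KNOWN_METHODS]
```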

784	   If, after evaluation of the alternatives in this specification, only
785	   one mixing method is selected to be brought to implementation, then
786	   the media tag can be reduced to a single tag with no list of values.

788	   An offer-answer exchange should take place and the common method
789	   selected by the answering party shall be used in the session with
790	   that UA.

792	   When no common method is declared, then only the fallback method can
793	   be used or the session dropped.

795	   If more than one text media line is included in SDP, all must be
796	   capable of using the declared RTT multi-party method.

798	   Pros:

800	   Provides a clear decision method.

802	   Can be extended with new mixing methods.

804	   Can guide call routing to a suitable capable focus.

806	   Cons:

808	   Requires standardization and IANA registration.

810	   Is not stream specific.  If more than one text stream is specified,
811	   all must have the same type of multi-party capability.

813	   Cannot be used in the WebRTC environment.

815	7.3.  SDP media attribute for RTT multi-party capability indication

817	   An attribute can be specified on media level, to be used in text
818	   media SDP declarations for negotiating RTT multi-party capabilities.
819	   The attribute can have the name "rtt-mixer", with one or more of its
820	   possible values in a comma-separated list.

822	   The possible values in the list are:

824	      rtp-translator

826	      rtp-mixer

828	      t140-mixer

830	      rtp-mesh

832	      multi-session

834	   rtp-translator indicates capability for using the RTP-translator
835	   based coordination of multi-party text.

837	   rtp-mixer indicates capability for using the RTP-mixer based
838	   presentation of multi-party text.

840	   t140-mixer indicates capability for using the T.140 control code
841	   source indicators in a mixer.

843	   text-mixer indicates capability for using the fallback method with
844	   text formatting for conference-unaware endpoints.

846	   rtp-mesh indicates capability for using the mesh based transmission
847	   of multi-party text.

849	   multi-session indicates capability for using separate point-to-point
850	   RTP sessions between all participants.

852	   An offer-answer exchange should take place and the common method
853	   selected by the answering party shall be used in the session with
854	   that UA.

856	   When no common method is declared, then only the fallback method can
857	   be used.

859	   Example: a=rtt-mixer:rtp-mixer
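
   The offer-answer selection can be sketched as follows, in Python.
   This is a minimal illustration, not a complete SDP parser.  The
   function names are hypothetical, and picking the first offered
   method that the answerer supports is an assumed policy; this
   document only requires that the answering party selects a common
   method.

```python
def offered_methods(sdp):
    """Return the rtt-mixer values from the first text media section."""
    in_text = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # Track whether the current media section is a text medium.
            in_text = line.startswith("m=text ")
        elif in_text and line.startswith("a=rtt-mixer:"):
            return [v.strip() for v in line.split(":", 1)[1].split(",")]
    return []


def select_method(offered, supported):
    """Pick the first offered method the answerer supports, else None."""
    for method in offered:
        if method in supported:
            return method
    return None  # no common method: only the fallback remains
```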

861	   If, after evaluation of the alternatives in this specification, only
862	   one mixing method is selected to be brought to implementation, then
863	   the attribute can be reduced to a single attribute with no list of
864	   values.

866	   Pros:

868	   Provides a clear decision method.

870	   Can be extended with new mixing methods.

872	   Can be used on specific text media.

874	   Can be used also for SDP-controlled WebRTC sessions with multiple
875	   streams in the same data channel.

877	   Cons:

879	   Requires standardization and IANA registration.

881	   Cannot guide SIP routing.

883	7.4.  SDP format parameter for RTT multi-party capability indication

885	   An FMTP format parameter can be specified for the RFC 4103 media, to
886	   be used in text media SDP declarations for negotiating RTT multi-
887	   party capabilities.  The parameter can have the name "rtt-mixer",
888	   with one or more of its possible values in a comma-separated list.

890	   The possible values in the list are:

892	      rtp-translator

894	      rtp-mixer

896	      t140-mixer

898	      rtp-mesh

900	      multi-session

902	   rtp-translator indicates capability for using the RTP-translator
903	   based coordination of multi-party text.

905	   rtp-mixer indicates capability for using the RTP-mixer based
906	   presentation of multi-party text.

908	   t140-mixer indicates capability for using the T.140 control code
909	   source indicators in a mixer.

911	   text-mixer indicates capability for using the fallback method with
912	   text formatting for conference-unaware endpoints.

914	   rtp-mesh indicates capability for using the mesh based transmission
915	   of multi-party text.

917	   multi-session indicates capability for using separate point-to-point
918	   RTP sessions between all participants.

920	   Example: a=fmtp:96 cps=30;rtt-mixer=rtp-mixer

922	   If, after evaluation of the alternatives in this specification, only
923	   one mixing method is selected to be brought to implementation, then
924	   the parameter can be reduced to a single parameter with no list of
925	   values.

927	   An offer-answer exchange should take place and the common method
928	   selected by the answering party shall be used in the session with
929	   that UA.

931	   When no common method is declared, then only the fallback method can
932	   be used.

934	   Pros:

936	   Provides a clear decision method.

938	   Can be extended with new mixing methods.

940	   Can be used on specific text media.

942	   Can be used also for SDP-controlled WebRTC sessions with multiple
943	   streams in the same data channel.

945	   Cons:

947	   Requires standardization and IANA registration.

949	   May cause interop problems with current RFC4103 implementations not
950	   expecting a new fmtp-parameter.

952	   Cannot guide SIP routing.

954	7.5.  Preferred capability declaration method.

956	   The preferred capability declaration method is the one with SDP
957	   attributes because it is straightforward and partially usable also
958	   for WebRTC.

960	8.  Identification of the source of text

962	   EDITOR NOTE: The text in the following sections needs to be adapted
963	   after recommendations for the main methods for coordination of RTT
964	   have been selected.  Details should be provided mainly for the
965	   recommended method.

967	   The main way to identify the source of text in the RTP based solution
968	   is by the SSRC of the sending participant.  It is included in the
969	   CSRC list of the transmitted packets.  Further identification that
970	   may be needed for better labeling of received text may be obtained
971	   from a number of sources, such as the RTCP SDES CNAME and NAME
972	   reports and the conference notification data (RFC 4575).

974	   As soon as a new member is added to the RTP session, its
975	   characteristics should be transmitted in RTCP SDES CNAME and NAME
976	   reports according to section 6.5 in RFC 3550.  The information about
977	   the participant should also be included in the conference data
978	   including the text media member in a notification according to RFC
979	   4575.

981	   The RTCP SDES report SHOULD contain identification of the source
982	   represented by the SSRC/CSRC identifier.  This identification MUST
983	   contain the CNAME field and MAY contain the NAME field and other
984	   defined fields of the SDES report.

986	   A focus UA SHOULD primarily convey SDES information received from the
987	   sources of the session members.  When such information is not
988	   available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME
989	   information from available information from the SIP session with the
990	   participant.

992	9.  Presentation of multi-party text

994	   All session participants MUST observe the SSRC/CSRC field of incoming
995	   text RTP packets, and make note of what source they came from in
996	   order to be able to present text in a way that makes it easy to read
997	   text from each participant in a session, and get information about
998	   the source of the text.

1000	9.1.  Associating identities with text streams

1002	   A source identity SHOULD be composed from available information
1003	   sources and displayed together with the text as indicated in ITU-T
1004	   T.140 Appendix [T.140].

1006	   The source identity should primarily be the NAME field from incoming
1007	   SDES packets.  If this information is not available, and the session
1008	   is a two-party session, then the T.140 source identity SHOULD be
1009	   composed from the SIP session participant information.  For multi-
1010	   party sessions the source identity may be composed by local
1011	   information if sufficient information is not available in the
1012	   session.

1014	   Applications may abbreviate the presented source identity to a
1015	   suitable form for the available display.
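
   The order of preference described above could be expressed as in
   the following Python sketch.  The function name, the "Unknown"
   placeholder and the simple truncation used for abbreviation are
   assumptions made for illustration.  The sketch also ignores the
   distinction made above between two-party and multi-party sessions
   and simply tries the sources in order.

```python
def source_identity(sdes_name, sip_display_name, local_name, max_len=None):
    """Choose a label: SDES NAME first, then SIP info, then local data."""
    for candidate in (sdes_name, sip_display_name, local_name):
        if candidate and candidate.strip():
            label = candidate.strip()
            break
    else:
        label = "Unknown"  # assumption: placeholder when nothing is known
    # Optional abbreviation to fit the available display.
    if max_len is not None and len(label) > max_len:
        label = label[:max_len]
    return label
```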

1017	9.2.  Presentation details for multi-party aware UAs.

1019	   The multi-party aware UA should, after any recovery of data from
1020	   lost packets, separate the incoming streams and present them
1021	   according to the style that the receiving application supports and
1022	   the user has selected.  The decisions taken for presentation of the
1023	   multi-party interchange shall be purely on the receiving side.  The
1024	   sending application must not insert any item in the stream to
1025	   influence presentation that is not requested by the sending
1026	   participant.

1028	9.2.1.  Bubble style presentation

1030	   One often used style is to present real-time text in chunks in
1031	   readable bubbles identified by labels containing names of sources.
1032	   Bubbles are placed in one column in the presentation area and are
1033	   closed and moved upwards in the presentation area after certain items
1034	   or events, when there is also newer text from another source that
1035	   would go into a new bubble.  The text items that allow bubble
1036	   closing are any character closing a phrase or sentence followed by a
1037	   space or a timeout of a suitable time (about 10 seconds).
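
   The closing rule above can be sketched as a small Python predicate.
   The function name and the set of sentence-closing characters are
   assumptions; the 10 second value is the approximate timeout
   mentioned in the text.

```python
PHRASE_END = ".?!"       # assumption: characters that close a sentence
IDLE_TIMEOUT_S = 10.0    # "about 10 seconds" as suggested in the text


def may_close_bubble(text, idle_seconds, other_source_waiting):
    """Return True when an active bubble may be closed and moved up."""
    if not other_source_waiting:
        return False  # no newer text from another source; keep it open
    # A closing character followed by a space ends a phrase or sentence.
    ended = len(text) >= 2 and text[-1] == " " and text[-2] in PHRASE_END
    return ended or idle_seconds >= IDLE_TIMEOUT_S
```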

1039	   Real-time active text sent from the local user should be presented in
1040	   a separate area.  When there is a reason to close a bubble from the
1041	   local user, the bubble should be placed above all real-time active
1042	   bubbles, so that the time order that real-time text entries were
1043	   completed is visible.

1045	   Scrolling is usually provided for viewing of recent or older text.
1046	   When scrolling is done to an earlier point in the text, newly
1047	   received text shall not move the scroll position.

1049	   It must be the decision of the local user to return to automatic
1050	   viewing of latest text actions.  An indication that there is new
1051	   text to read may be useful after scrolling to an earlier position
1052	   has been activated.

1054	   The presentation area may become too small to present all text in all
1055	   real-time active bubbles.  Various techniques can be applied to
1056	   provide a good overview and good reading opportunity even in such
1057	   situations.  The active real-time bubble may have a limited number of
1058	   lines and if their contents need more lines, then a scrolling
1059	   opportunity within the real-time active bubble is provided.  Another
1060	   method can be to only show the label and the last line of the active
1061	   real-time bubble contents, and make it possible to expand or compress
1062	   the bubble presentation between full view and one line view.

1064	   Erasures require special consideration.  Erasure within a real-time
1065	   active bubble is straightforward.  But if erasure from one
1066	   participant affects the last character before a bubble, the whole
1067	   previous bubble becomes the actual bubble for real-time action by
1068	   that participant and is placed below all other bubbles in the
1069	   presentation area.  If the border between bubbles was caused by the
1070	   CRLF characters, only one erasure action is required to erase this
1071	   bubble border.  When a bubble is closed, it is moved up, above all
1072	   real-time active bubbles.

1074	9.2.2.  Other presentation styles

1076	   Other presentation styles than the bubble style may be arranged and
1077	   appreciated by the users.  In a video conference one way may be to
1078	   have a real-time text area below the video view of each participant.
1079	   Another view may be to provide one column in a presentation area for
1080	   each participant and place the text entries in a relative vertical
1081	   position corresponding to when text entry in them was completed.  The
1082	   labels can then be placed in the column header.  The considerations
1083	   for ending and moving and erasure of entered text discussed above for
1084	   the bubble style are valid also for these styles.

1086	10.  Presentation details for multi-party unaware UAs.

1088	   Multi-party unaware UAs are prepared only for presentation of two
1089	   sources of text, the local user and a remote user.  In order to
1090	   enable some multi-party communication with such a UA, the mixer needs
1091	   to plan the presentation and insert labels and line breaks before
1092	   labels.  Many limitations appear for this presentation mode, and it
1093	   must be seen as a fallback and a last resort.

1095	   See Appendix A for an informative example of a procedure for
1096	   presenting RTT to a conference-unaware UA.

1098	11.  Transmission of text from each user

1100	   UAs participating in sessions with real-time text SHOULD send SDES
1101	   packets in RTCP giving values to appropriate identification fields.

1103	   The CNAME field SHALL be included in SDES packets.

1105	   The NAME field should be given a value that is suitable as an
1106	   identifier of text from the user of the UA.

1108	12.  Robustness and indication of possible loss

1110	   This section discusses the means for robustness against loss of text
1111	   that are already specified and their performance in the multi-party
1112	   situation.  Means for reducing the risk of loss are discussed, as
1113	   well as ways to detect in which stream loss has occurred.

1115	   TBD

1117	13.  Performance

1119	   This section discusses performance and performance limitations for
1120	   the different transport solutions, and indicates which means for
1121	   performance increase versus load limitations can be suitable to apply
1122	   compared to the point-to-point case.

1124	   TBD

1126	14.  Security Considerations

1128	   The security considerations valid for RFC 4103 and RFC 3550 are valid
1129	   also for the multi-party sessions with text.

1131	15.  IANA Considerations

1133	   EDITOR NOTE: TBD after decision of proposed preferences in the draft.

1135	   This document introduces the TBD /SIP media tag/SDP media level
1136	   attribute/ rtt-mixer, with a comma-separated parameter list
1137	   containing the following possible values:

1139	      rtp-translator

1141	      rtp-mixer

1143	      t140-mixer

1145	      rtp-mesh

1146	      multi-session

1148	   rtp-translator indicates capability for using the RTP-translator
1149	   based coordination of multi-party text.

1151	   rtp-mixer indicates capability for using the RTP-mixer based
1152	   presentation of multi-party text.

1154	   t140-mixer indicates capability for using the T.140 control code
1155	   source indicators in a mixer.

1157	   text-mixer indicates capability for using the fallback method with
1158	   text formatting for conference-unaware endpoints.

1160	   rtp-mesh indicates capability for using the mesh based transmission
1161	   of multi-party text.

1163	   multi-session indicates capability for using separate point-to-point
1164	   RTP sessions between all participants.

1166	16.  Congestion considerations

1168	   The congestion considerations described in RFC 4103 are valid also
1169	   for multi-party use of the real-time text RTP transport.  A risk for
1170	   congestion may appear if a number of conference participants are
1171	   actively transmitting text simultaneously, because this multi-party
1172	   transmission method does not allow multiple sources of text to
1173	   contribute to the same packet.

1175	   In situations of risk for congestion, the Focus UA MAY combine
1176	   packets from the same source to increase the transmission interval
1177	   per source up to one second.  Local conference policy in the Focus UA
1178	   may be used to decide which streams shall be selected for such
1179	   transmission frequency reduction.
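
   A per-source buffer that implements such a longer transmission
   interval could look like the following Python sketch.  The class
   and method names are hypothetical.  The 0.3 second default mirrors
   the buffering time recommended in RFC 4103, and the upper bound of
   one second follows the limit stated above.

```python
class SourceBuffer:
    """Collects text from one source and releases it at most once per
    transmission interval."""

    def __init__(self, interval_s=0.3):
        if not 0 < interval_s <= 1.0:
            raise ValueError("interval must be within (0, 1] seconds")
        self.interval_s = interval_s
        self.pending = []
        self.last_sent = None

    def add(self, text):
        self.pending.append(text)

    def poll(self, now):
        """Return the combined pending text if a packet is due."""
        if not self.pending:
            return None
        if (self.last_sent is not None
                and now - self.last_sent < self.interval_s):
            return None  # too early; keep combining
        combined = "".join(self.pending)
        self.pending.clear()
        self.last_sent = now
        return combined
```

   Under local policy, the Focus UA would raise interval_s only for
   the streams selected for transmission frequency reduction.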

1181	17.  Acknowledgements

1183	   Arnoud van Wijk for contributions to an earlier, expired draft of
1184	   this memo.

1186	18.  References

1188	18.1.  Normative References

1190	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1191	              Requirement Levels", BCP 14, RFC 2119,
1192	              DOI 10.17487/RFC2119, March 1997,
1193	              <https://www.rfc-editor.org/info/rfc2119>.

1195	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
1196	              A., Peterson, J., Sparks, R., Handley, M., and E.
1197	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
1198	              DOI 10.17487/RFC3261, June 2002,
1199	              <https://www.rfc-editor.org/info/rfc3261>.

1201	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1202	              Jacobson, "RTP: A Transport Protocol for Real-Time
1203	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
1204	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

1206	   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
1207	              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005,
1208	              <https://www.rfc-editor.org/info/rfc4103>.

1210	   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
1211	              Session Initiation Protocol (SIP) Event Package for
1212	              Conference State", RFC 4575, DOI 10.17487/RFC4575, August
1213	              2006, <https://www.rfc-editor.org/info/rfc4575>.

1215	   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
1216	              (SIP) Call Control - Conferencing for User Agents",
1217	              BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006,
1218	              <https://www.rfc-editor.org/info/rfc4579>.

1220	   [T.140]    "Protocol for multimedia application text conversation",
1221	              1998, <http://www.itu.int/rec/T-REC-T.140/en>.

1223	18.2.  Informative References

1225	   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
1226	              Session Initiation Protocol (SIP)", RFC 4353,
1227	              DOI 10.17487/RFC4353, February 2006,
1228	              <https://www.rfc-editor.org/info/rfc4353>.

1230	   [RFC4597]  Even, R. and N. Ismail, "Conferencing Scenarios",
1231	              RFC 4597, DOI 10.17487/RFC4597, August 2006,
1232	              <https://www.rfc-editor.org/info/rfc4597>.

1234	   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
1235	              DOI 10.17487/RFC7667, November 2015,
1236	              <https://www.rfc-editor.org/info/rfc7667>.

1238	Appendix A.  Mixing for a conference-unaware UA

1240	   This informational appendix describes media mixer procedures for a
1241	   multi-party conference server to format real-time text from a number
1242	   of participants into one single text stream to a participant with a
1243	   terminal that has no features for multi-party text display.  The
1244	   procedures are intended for implementations using ITU-T T.140 [T.140]
1245	   for the real-time text coding and presentation.

1247	A.1.  Short description

1249	   The media mixer procedures described here are intended to coordinate
1250	   real-time text from a number of call participants into one text
1251	   stream to a terminal originally intended for two-party calls.  A
1252	   conference server is supposed to apply the procedures.

1254	   The procedures may also be applied on a terminal for display of
1255	   multiple streams of real-time text in one area.

1257	   The intention is that text from each participant shall be displayed
1258	   in suitable sections so that it is easy to read, and text from one
1259	   active participant at a time is sent and displayed in real-time.  The
1260	   receiving terminal is assumed to have one display area for received
1261	   text.  The display is arranged by this procedure in a text chat
1262	   style, with a name label in front of each text section where switch
1263	   of source of the text has taken place.

1265	   When more than one participant transmits text at the same time, the
1266	   text from only one of them is transmitted directly to the receiving
1267	   terminals.  Text from the other participants is stored in buffers in
1268	   the conference server for transmission at a later time, when a
1269	   suitable situation for switch of current transmitter can take place.
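
   The buffering behaviour described above can be sketched as follows,
   in Python.  The class and method names are hypothetical.  The
   sketch deliberately leaves out label insertion and the decision of
   when to switch the current transmitter.

```python
class UnawareMixer:
    """Forwards text from one current participant; buffers the rest."""

    def __init__(self):
        self.current = None
        self.buffers = {}  # participant -> unsent text

    def receive(self, participant, text):
        """Return the text to send now, or "" if it was buffered."""
        if self.current is None:
            self.current = participant
        if participant == self.current:
            return text
        self.buffers[participant] = self.buffers.get(participant, "") + text
        return ""

    def switch_to(self, participant):
        """Make participant current and return its buffered text."""
        self.current = participant
        return self.buffers.pop(participant, "")
```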

1271	A.2.  Functionality goals and drawbacks

1273	   The procedures are intended to make best efforts to present a multi-
1274	   party text conversation on a terminal that has no awareness of multi-
1275	   party calls.  There are some obvious drawbacks, and a terminal
1276	   designed with multi-party awareness will be able to present multi-
1277	   party call contents in a more flexible way.  Only two parties at a
1278	   time will be allowed to display added text in real-time, while the
1279	   other parties' produced text will need to be stored in the multi-
1280	   party server for a moment awaiting a suitable occasion to be
1281	   displayed.  There are also some cases of erasure that will not be
1282	   performed on the target text but only indicated in another way.  Even
1283	   with these drawbacks, the procedure provides an opportunity to
1284	   display text from more than two parties in a smooth and readable way.

1286	   This specification does not introduce any new protocol element, and
1287	   does not rely on anything else than basic two-party terminal
1288	   functionality with presentation level according to ITU-T T.140
1289	   [T.140].  It is a description of a best current practice for mixing
1290	   and presentation of the real-time text component in multi-party calls
1291	   with terminals without multi-party awareness.

1293	   The procedures are applicable to scenarios where the conference focus
1294	   and a User Agent have not successfully completed any negotiation
1295	   about conference awareness for the real-time text medium, either on
1296	   the transport level or on the presentation level.

1298	A.3.  Definitions

1300	      Active participant: Any user sending text, or being in a pending
1301	      period.

1303	      BOM: Byte-Order-Mark, the Unicode character FEFF in UCS-16.

1305	      Buffer: A buffer intended for unsent text collected per
1306	      participant.

1308	      Contributing participants: The participants selected to contribute
1309	      to the text stream sent to the recipients.

1311	      By default all participants except the recipient are contributing
1312	      participants for transmission to the recipient.

1314	      Current participant: The participant for whom text currently is
1315	      transmitted to the recipient in real time.

1317	      Current Recipients: By default all participants.

1319	      Display Counter: A counter for the number of displayable
1320	      characters in a participant's buffer or in the current entry.
1321	      Used for controlling how far erasure may be performed.

1323	      Erasure replacement: A character to be displayed when an erasure
1324	      was done, but the text to erase is not reachable on the multi-
1325	      party display.  Default 'X'.

1327	      Message delimiter: Character(s) forming the end of an imagined
1328	      message.  A configurable set of alternatives, consisting by
1329	      default of: Line Separator, Paragraph Separator, CR, CRLF, LF.

1331	      Pending period: A configurable time period of inactivity from a
1332	      participant, by default set to 7 seconds after each reception of
1333	      characters from that participant, evaluated as current time minus
1334	      time stamp of latest entered character.

1336	      Sentence delimiter: Characters forming end of sentence: A
1337	      configurable set of alternatives, by default consisting of: dot
1338	      '.', question mark '?' and exclamation mark '!' followed by a
1339	      space.

1341	      Label: A readable unique name for a participant, created by the
1342	      server from a suitable source related to the participant, e.g.
1343	      part of the SIP Display name, surrounded by the Label delimiters.
1344	      The label should have a settable maximum length, with 12 being the
1345	      default.

1347	      Label delimiters: A configurable set of characters at the edges of
1348	      the Label, by default being a left bracket [ at the leading edge
1349	      and a closing bracket ] followed by a space at the trailing edge.
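
The Label and Label delimiters definitions can be illustrated with a
minimal Python sketch. The function name make_label and its parameters
are hypothetical, and the sketch assumes that the maximum length counts
the delimiters as well, as the comment for the Label length parameter
in the parameter summary suggests. It does not attempt to guarantee
uniqueness of labels, which a server would need to handle separately.

```python
def make_label(display_name: str, max_length: int = 12,
               lead: str = "[", trail: str = "] ") -> str:
    """Build a label such as '[Bob] ' from a display name.

    Assumption: max_length includes the leading and trailing delimiters.
    """
    room = max_length - len(lead) - len(trail)
    if room < 1:
        raise ValueError("max_length leaves no room for a name")
    # Truncate the name so that the complete label fits in max_length.
    name = display_name.strip()[:room]
    return f"{lead}{name}{trail}"
```

With the default values, make_label("Bob") gives "[Bob] ", and a longer
name is cut so that the whole label is at most 12 characters.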

1351	      Line Separator: Unicode UCS-16 2028.  Used to request NewLine in
1352	      Real-Time Text.

1354	      Maximum waiting time: The maximum time any participant's text
1355	      shall be allowed to wait for transmission, by default set to 20
1356	      seconds.

1358	      Recipient: The terminal receiving the mixed text stream.

1360	      SGR: Select Graphic Rendition, a control code to specify colours
1361	      etc.

1363	      Switch Reason: A set of reasons to switch Current Participant,
1364	      consisting of the following:

1366	      - Waiting time higher for any other participant than the current
1367	      participant, combined with any of the following states:

1369	         - A message delimiter was the latest transmitted item

1371	         - A sentence delimiter was the latest transmitted item

1373	         - A Pending Period has expired and still no text has been
1374	         transmitted

1376	         - The Maximum Waiting time has expired followed by a Word
1377	         Delimiter or an expired Time Extension.

1379	      Waiting time: The time the first character in queue for
1380	      transmission from a participant has been waiting in a buffer for
1381	      transmission.  The granularity shall be 0.3 Seconds or finer.

1383	      Word delimiter: Character forming end of word: space

1384	      Time extension: A configurable short extension time allowed after
1385	      the Maximum waiting time during which a suitable moment for
1386	      switching Current Participant is awaited, by default set to 7
1387	      seconds.

1389	A.4.  Presentation level procedures

1391	   The conference server applies these mixing procedures to text
1392	   transmitted to all call participants who have not gone through a
1393	   completed negotiation for conference awareness in real-time text
1394	   presentation.

1396	   All the participants and the conference server use real-time text
1397	   conversation presentation coding according to ITU-T T.140 [T.140].  A
1398	   consequence is that real-time text transmissions are UTF-8 coded,
1399	   with control codes selected from ISO 6429 [ISO 6429].

1401	   The description is from the conference server point of view.

1403	A.4.1.  Structure

1405	   The real-time text mixer structure described here is supposed to be
1406	   placed in the media path so that it is implemented with one mixer per
1407	   recipient.  A mixer contains buffers for temporary storage of text
1408	   intended for the recipient.  Each mixer has one buffer for each
1409	   contributing participant.  A set of status variables is maintained
1410	   per buffer and is used in the mixer actions.  The mixer logic decides
1411	   for each moment which participant's buffer content is to be sent on
1412	   to the recipient.  By default, the recipient does not contribute text
1413	   to its own mixer.  Text transmitted by a participant is usually
1414	   displayed locally and will only cause confusion if it appears also in
1415	   received text.

1417	   If there is a reason, own text can be configured to be transmitted
1418	   also to the participants.  That can enable a simplification of the
1419	   mixer design to have only one common set of buffers instead of a set
1420	   per recipient.  That simplification will however hamper the flow of
1421	   the conversation severely and is therefore NOT RECOMMENDED.

1423	A.4.2.  Action on reception

1425	   This description of the mixer is valid per recipient.

1427	   Text from each contributing participant is checked for a set of
1428	   characteristics on reception.

1430	      Delete BOM: BOM characters are deleted.

1432	      Insert in buffer: Resulting text is put into the contributing
1433	      participant's buffer in the receiving participant's mixer.

1435	      Maintain a display counter: For each text character that will take
1436	      a position on the receiving display, a Display Counter for each
1437	      participant is increased by one.

1439	      One T.140 real-time text item, CRLF, consists of two characters
1440	      but is regarded as a unit and therefore increases the Display
1441	      Counter by one only.

1443	      Furthermore, the following control codes are regarded as units that
1444	      shall not take any position on the receiving display and shall
1445	      therefore not increase the Display Counter:

1447	      0098 string 009C (SOS-ST strings)

1449	      ESC 0061 (INT)

1451	      009B Ps 006D (the SGR code, with special handling described below)

1453	      BEL (Alert in session)

1455	      See the section on control codes below for details.

1457	      Combination characters: Also note that it is possible to use
1458	      combination characters in Unicode.  Such combination characters
1459	      contain more than one character part.  They shall only increase
1460	      the Display Counter by one.  The combination characters mainly
1461	      have components in the series 0300 - 0361 and 20D0 - 20E1.

1463	      Erasure: If the control code for erasure, BS, is received, the
1464	      following shall be done: If the Display Counter is 0, an Erasure
1465	      Replacement character, by default 'X', is inserted in the
1466	      buffer instead of the erasure, to mark that erasure was intended
1467	      in earlier transmitted entries.  (This matches traditional habits
1468	      in real-time text when participants sometimes type XXX to indicate
1469	      erasure they do not bother to make explicit.)  If the Display
1470	      Counter is >0, then the counter is reduced by one, and the erasure
1471	      control code BS is put into the buffer.
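
The reception rules above (BOM deletion, Display Counter maintenance and
erasure handling) can be sketched in Python as follows. This is only an
illustration under stated assumptions: the class name ParticipantBuffer
is hypothetical, combining characters are detected with the standard
unicodedata.combining() function rather than with the code ranges given
above, the Erasure Replacement character is assumed not to change the
Display Counter, and the multi-character control sequences (SOS-ST
strings, INT and SGR) are left out for brevity.

```python
import unicodedata

BOM = "\uFEFF"   # byte order mark, deleted on reception
BS = "\u0008"    # back space, the erasure control code
BEL = "\u0007"   # alert in session, takes no display position


class ParticipantBuffer:
    """Buffer and Display Counter for one contributing participant."""

    def __init__(self, erasure_replacement="X"):
        self.items = []
        self.display_counter = 0
        self.erasure_replacement = erasure_replacement
        self._previous = ""

    def receive(self, text):
        for ch in text:
            if ch == BOM:
                continue  # Delete BOM
            if ch == BS:
                if self.display_counter == 0:
                    # Text to erase is out of reach: mark the erasure.
                    self.items.append(self.erasure_replacement)
                else:
                    self.display_counter -= 1
                    self.items.append(BS)
            else:
                self.items.append(ch)
                lf_of_crlf = ch == "\n" and self._previous == "\r"
                takes_position = not (
                    ch == BEL or lf_of_crlf or unicodedata.combining(ch)
                )
                if takes_position:
                    self.display_counter += 1
            self._previous = ch
```

A CRLF pair is counted once because only the CR increases the counter,
and a base character followed by a combining mark is counted once
because the mark itself is skipped.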

1473	      Initial action in the session: BOM shall be sent initially to the
1474	      recipients in the beginning of the session.

1476	      Maintaining a waiting time per participant: The time that text has
1477	      been in the buffer is maintained as the waiting time for each
1478	      buffer.  A granularity of 0.3 seconds is sufficient.

1480	      Storing time of reception for each character: Each character that
1481	      is stored in a buffer shall be assigned with a time stamp
1482	      indicating its time of reception.  A granularity of 0.3 seconds is
1483	      sufficient.  This time stamp is used for calculation of idle time
1484	      and waiting time in the evaluation of switch reasons.

1486	      Initial assignment of the Current Participant: The first
1487	      contributing participant to send text in the session is assigned
1488	      to be the Current Participant.

1490	      Actions on assignment of a Current Participant: When a participant
1491	      becomes the Current Participant, the following initial actions
1492	      shall be performed:

1494	      1.  Scanning transmissions and timers for a Switch Reason is
1495	      inactivated.

1497	      2.  The Current Recipients are set so that all transmissions go to
1498	      the new set of Current Recipients (See definition).

1500	      3.  A Line Separator is transmitted if the switch reason was any
1501	      other than a message delimiter.

1503	      4.  The Label is transmitted

1505	      5.  Any stored SGR code is transmitted

1507	      6.  Scanning transmissions and timers for a Switch Reason is
1508	      activated.

1510	      7.  Text in the buffer is transmitted, recalculating and setting
1511	      the waiting time for each transmitted character based on the time
1512	      of reception of next character in the buffer.  If a switch occurs
1513	      during transmission from the buffer, the remaining buffer contents
1514	      are maintained and transmission can continue next time this
1515	      participant becomes the Current Participant.  Any text entered
1516	      into the buffer for the current participant is after that sent to
1517	      the recipient until a Switch Reason occurs.
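
Steps 3 to 5 of the assignment procedure can be sketched as a small
Python helper that builds the text sent ahead of the buffer contents.
The function and parameter names are hypothetical. How the very first
assignment in a session is treated (when no Switch Reason exists yet)
is not stated above, so the sketch leaves that decision to the caller
through the switched_on_message_delimiter argument.

```python
LINE_SEPARATOR = "\u2028"


def assignment_preamble(label, stored_sgr, switched_on_message_delimiter):
    """Return the text to transmit before the new participant's buffer.

    label: the participant's label including delimiters, e.g. '[Bob] '.
    stored_sgr: the stored SGR sequence for the participant, or None.
    """
    parts = []
    if not switched_on_message_delimiter:
        # Step 3: start a new line unless a message delimiter already did.
        parts.append(LINE_SEPARATOR)
    parts.append(label)           # Step 4: transmit the Label
    if stored_sgr:
        parts.append(stored_sgr)  # Step 5: restore the stored rendition
    return "".join(parts)
```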

1519	      Actions on transmission and during the session: Transmissions are
1520	      checked for control codes to act on at transmission as described
1521	      below in the section about handling of control codes and such
1522	      actions are performed.  When the scanning of transmission and
1523	      timers for a Switch Reason is active, the timers and the
1524	      transmission to the recipient are analyzed to detect whether a
1525	      Switch Reason has occurred.  See the definition of Switch Reasons
1526	      for details.

1528	      Actions when a Switch Reason has occurred: If a Switch Reason has
1529	      occurred, then the following actions shall be performed:

1531	      1.  The Display Counter of the Current Participant is set to zero

1533	      2.  If there is an SGR code stored for the Current Participant, a
1534	      reset of SGR shall be sent by the sequence SGR 0 [009B 0030 006D].

1536	      3.  A participant with the longest waiting time is assigned to be
1537	      the Current Participant, and the procedure for assignment of a
1538	      Current Participant described above is performed.
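
The evaluation of Switch Reasons and the selection in step 3 can be
sketched in Python as below. This is one possible reading of the Switch
Reason definition, not a definitive implementation. All names are
hypothetical. The sketch assumes that the caller classifies the latest
transmitted item as "message", "sentence", "word" or "other", that the
Pending Period is measured as the idle time of the Current Participant,
and that the Maximum waiting time and Time extension are measured on the
longest waiting time among the other participants.

```python
from typing import Dict, Optional


def find_switch_reason(
    current_wait: float,
    longest_other_wait: float,
    current_idle: float,
    last_item: str,
    pending_period: float = 7.0,
    max_waiting_time: float = 20.0,
    time_extension: float = 7.0,
) -> Optional[str]:
    """Return a description of the Switch Reason, or None if there is none."""
    # Precondition: some other participant has waited longer.
    if longest_other_wait <= current_wait:
        return None
    if last_item == "message":
        return "message delimiter"
    if last_item == "sentence":
        return "sentence delimiter"
    if current_idle >= pending_period:
        return "pending period expired"
    if longest_other_wait >= max_waiting_time:
        forced = longest_other_wait >= max_waiting_time + time_extension
        if last_item == "word" or forced:
            return "maximum waiting time expired"
    return None


def next_current_participant(waiting: Dict[str, float]) -> Optional[str]:
    """Pick a participant with the longest waiting time (step 3)."""
    if not waiting:
        return None
    return max(waiting, key=waiting.__getitem__)
```

The waiting dictionary is assumed to hold only participants that have
text in their buffer, keyed by a participant identifier.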

1540	      Handling of Control codes: The following control codes are
1541	      specified by ITU-T T.140.  Some of them require consideration in
1542	      the conference server.  Note that the codes presented here are
1543	      expressed in UCS-16, while transmission is made in UTF-8 transform
1544	      of these codes.  Other sections specify procedures for handling of
1545	      specific control codes in the conference server.

1547	      BEL 0007 Bell, provides for alerting during an active session.

1549	      BS 0008 Back Space, erases the last entered character.

1551	      NEW LINE 2028 Line separator.

1553	      CR LF 000D 000A A supported, but not preferred way of requesting a
1554	      new line.

1556	      INT ESC 0061 Interrupt (used to initiate mode negotiation
1557	      procedure).

1559	      SGR 009B Ps 006D Select graphic rendition.  Ps is rendition
1560	      parameters specified in ISO 6429.

1562	      SOS 0098 Start of string, used as a general protocol element
1563	      introducer, followed by a maximum 256 bytes string.

1565	      ST 009C String terminator, end of SOS string.

1567	      ESC 001B Escape - used in control strings.

1569	      Byte order mark FEFF Zero width, no break space, used for
1570	      synchronization.

1572	      Missing text mark FFFD Replacement character, marks place in
1573	      stream of possible text loss.

1575	      Code for message border, useful, but not mentioned in T.140: New
1576	      Message 2029 Paragraph separator

1578	      Handling of Graphic Rendition SGR: The following procedure shall
1579	      be followed in order to let the participants control the graphic
1580	      rendition of their entries without disturbing other participants'
1581	      graphic rendition.  The text stream sent to a recipient shall be
1582	      monitored for the SGR sequence.  The latest conveyed SGR sequence
1583	      is also stored as a status variable for the recipient.  If the SGR
1584	      0 code initiated from the current participant is transmitted, the
1585	      SGR storage shall be cleared.
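
The SGR procedure can be sketched in Python as a small state holder kept
per contributing participant in each mixer. The class name SgrState is
hypothetical. The sketch assumes that the SGR parameters are encoded as
digit characters as in ISO 6429 (so SGR 0 is 009B 0030 006D), that an
SGR sequence without parameters is equivalent to SGR 0, and that a
sequence is never split between two calls to observe().

```python
import re

# CSI (009B), parameter digits and separators, then 'm' (006D).
SGR_PATTERN = re.compile("\u009B([0-9;]*)m")
SGR_RESET = "\u009B0m"


class SgrState:
    """Tracks the latest SGR sequence conveyed for one participant."""

    def __init__(self):
        self.stored = None

    def observe(self, transmitted):
        """Monitor transmitted text and update the stored SGR sequence."""
        for match in SGR_PATTERN.finditer(transmitted):
            if match.group(1) in ("", "0"):
                self.stored = None          # SGR 0 clears the storage
            else:
                self.stored = match.group(0)

    def reset_on_switch(self):
        """Return the reset to send when leaving this participant."""
        return SGR_RESET if self.stored else ""
```

On a switch, the mixer would send reset_on_switch() to the recipient
while keeping the stored value, so that it can be transmitted again
when the participant next becomes the Current Participant.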

1587	A.5.  Display examples

1589	   The following pictures are examples of the view on a participant's
1590	   display.

1592	     _________________________________________________
1593	    |       Conference       |          Alice          |
1594	    |________________________|_________________________|
1595	    |                        |I will arrive by TGV.    |
1596	    |[Bob]:My flight is to   |Convenient to the main   |
1597	    |Orly.                   |station.                 |
1598	    |[Eve]:Hi all, can we    |                         |
1599	    |plan for the seminar.   |                         |
1600	    |                        |                         |
1601	    |[Bob]:Eve, will you do  |                         |
1602	    |your presentation on    |                         |
1603	    |Friday?                 |                         |
1604	    |[Eve]:Yes, Friday at 10.|                         |
1605	    |[Bob]: Fine, wo         |We need to meet befo     |
1606	    |________________________|_________________________|

1608	   Figure 2 : Alice who has a conference-unaware client is receiving the
1609	   multi-party real-time text in a single-stream.  This figure shows how
1610	   a coordinated column view MAY be presented on Alice's device.

1612	                 _________________________________________________
1613	                |                                              |^|
1614	                |[Alice] Hi, Alice here.                       | |
1615	                |                                              | |
1616	                |[Bob] Bob as well.                            | |
1617	                |                                              | |
1618	                |[Eve] Hi, this is Eve, calling from Paris.    | |
1619	                |      I thought you should be here.           | |
1620	                |                                              | |
1621	                |[Alice] I am coming on Thursday, my           | |
1622	                |      performance is not until Friday morning.| |
1623	                |                                              | |
1624	                |[Bob] And I on Wednesday evening.             | |
1625	                |                                              | |
1626	                |[Eve] we can have dinner and then take a walk | |
1627	                |                                              | |
1628	                | [Eve-typing] But I need to be back to        | |
1629	                |    the hotel by 11 because I need            |-|
1630	                |                                              |-|
1631	                |______________________________________________|v|
1632	                | of course, I underst                           |
1633	                |________________________________________________|

1635	   Figure 3: A conference view with real-time text preview.  Bob's
1636	   text is buffered until a Switch Reason occurs.

1638	A.6.  Summary of configurable parameters

1640	   A number of configurable parameters are described in this
1641	   specification.  This table provides a summary of the parameters on
1642	   presentation level.  A service provider implementing a multi-party
1643	   service may want to set specific values on these parameters to adapt
1644	   the characteristics of the service.  It is possible to control them
1645	   per recipient, if desired.

1647	   Parameter: Current Recipients

1649	   Purpose: Control if participant shall get their own text.

1651	   Possible values: Exclude or Include Current Participant

1653	   Default value: Exclude

1655	   Comment: Own transmissions are usually displayed sufficiently locally

1657	   Parameter: Erasure replacement

1659	   Purpose: Character to show erasure, when erasure cannot be done

1660	   Possible values: Character

1662	   Default value: X

1664	   Comment: May need to have other value for other than Latin script.

1666	   Parameter: Message delimiter

1668	   Purpose: Detection of suitable place in text for switching Current
1669	   Participant

1671	   Possible values: List of Unicode editing codes

1673	   Default value: Line Separator, Paragraph Separator, CR, CRLF, LF

1675	   Comment: Other than Latin based scripts may have other conventions

1677	   Parameter: Pending period

1679	   Purpose: Inactivity timer for detection of time to Switch Current
1680	   Participant

1682	   Possible values: Time in seconds

1684	   Default value: 7

1686	   Comment: Longer times may cause inefficient transmission.  Shorter
1687	   time may cause unwanted switching cutting lines of thought
1688	   inconveniently

1690	   Parameter: Sentence delimiter

1692	   Purpose: Characters forming end of sentence

1694	   Possible values: List of delimiters.

1696	   Default value: . or ? or ! followed by a space

1698	   Comment: Used for deciding on a position in the text to switch
1699	   Current Participant according to configured logic.

1701	   Parameter: Label length

1703	   Purpose: Length of label put in front of or above entry.

1705	   Possible values: Number of characters

1707	   Default value: 12

1708	   Comment: Includes any surrounding characters

1710	   Parameter: Label delimiters

1712	   Purpose: Set of characters at the edges of the label

1714	   Possible values: Two strings.  One in the beginning, one after.

1716	   Default value: [] followed by a space

1718	   Comment: It may be valid to include a Line Separator instead of the
1719	   space

1721	   Parameter: Maximum waiting time

1723	   Purpose: The maximum time any participant's text shall be allowed to
1724	   wait for transmission

1726	   Possible values: Seconds

1728	   Default value: 20

1730	   Comment: After this time a Switch will be forced within the Time
1731	   Extension.

1733	   Parameter: Word delimiter

1735	   Purpose: Delimiter for words

1737	   Possible values: List of characters

1739	   Default value: Space

1741	   Comment: Used for detection of suitable switch position if Maximum
1742	   Waiting time has passed.

1744	   Parameter: Time extension

1746	   Purpose: Time for maximum further waiting for a Switch Reason

1748	   Possible values: Time in seconds

1750	   Default value: 7

1752	   Comment: After this time a Switch is forced.
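
The parameters summarized above could be collected in one configuration
object, for example as in this Python sketch. The class and field names
are hypothetical. Only the default values are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class PresentationConfig:
    """Presentation level parameters with the defaults from the table."""

    include_own_text: bool = False          # Current Recipients: Exclude
    erasure_replacement: str = "X"
    message_delimiters: tuple = ("\u2028", "\u2029", "\r", "\r\n", "\n")
    pending_period_s: float = 7.0
    sentence_delimiters: tuple = (". ", "? ", "! ")
    label_length: int = 12                  # includes the delimiters
    label_delimiters: tuple = ("[", "] ")
    max_waiting_time_s: float = 20.0
    word_delimiters: tuple = (" ",)
    time_extension_s: float = 7.0


# Per-recipient control: one configuration object per recipient.
configs = {
    "alice": PresentationConfig(),
    "bob": PresentationConfig(pending_period_s=5.0, label_length=16),
}
```

Keeping one object per recipient matches the statement that the
parameters can be controlled per recipient if desired.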

1754	A.7.  References for this Appendix

1756	      [T.140] ITU-T T.140 Application protocol, text conversation
1757	      (including amendment 1.)

1759	      [RFC 4103] IETF RFC 4103 RTP Payload for text conversation

1761	      [RTP] IETF RFC 3550 RTP: A Transport Protocol for Real-Time
1762	      Applications.

1764	      [RFC 4579] IETF RFC 4579 SIP Call Control - Conferencing for user
1765	      agents.

1767	      [ISO 6429] ISO 6429 Control functions for coded character sets.

1769	      [UTF-8] IETF RFC 3629 UTF-8, a transformation format of ISO 10646

1771	      [Unicode] The Unicode Consortium, "The Unicode Standard - Version
1772	      4.0"

1774	      [ISO 10646-1] ISO 10646 Universal multiple-octet coded character
1775	      set (UCS)

1777	      [UCS-16] See ISO 10646-1

1779	A.8.  Acknowledgement

1781	   This appendix was developed with funding in part from the National
1782	   Institute on Disability and Rehabilitation Research, U.S.  Department
1783	   of Education, RERC on Telecommunications Access, grant # H133E090001.
1784	   However, the contents do not necessarily represent the policy of the
1785	   Department of Education, and you should not assume endorsement by the
1786	   Federal Government.

1788	Author's Address

1790	   Gunnar Hellstrom
1791	   Omnitor
1792	   Esplanaden 30
1793	   Vendelso  SE-136 70
1794	   SE

1796	   Phone: +46 708 204 288
1797	   Email: gunnar.hellstrom@omnitor.se
1798	   URI:   www.omnitor.se