idnits 2.17.1 

draft-hellstrom-mmusic-multi-party-rtt-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC4103]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 3, 2020) is 1514 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'ISO 6429' is mentioned on line 1748, but not defined

  == Missing Reference: 'Alice' is mentioned on line 1711, but not defined

  == Missing Reference: 'Bob' is mentioned on line 1721, but not defined

  == Missing Reference: 'Eve' is mentioned on line 1725, but not defined

  == Missing Reference: 'RFC 4103' is mentioned on line 1740, but not defined

  == Missing Reference: 'RTP' is mentioned on line 1742, but not defined

  == Missing Reference: 'RFC 4579' is mentioned on line 1745, but not defined

  == Missing Reference: 'UTF-8' is mentioned on line 1750, but not defined

  == Missing Reference: 'Unicode' is mentioned on line 1752, but not defined

  == Missing Reference: 'UCS-16' is mentioned on line 1758, but not defined

  == Unused Reference: 'RFC3264' is defined on line 1258, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-03) exists of
     draft-hellstrom-avtcore-multi-party-rtt-source-01

  == Outdated reference: A later version (-14) exists of
     draft-ietf-mmusic-t140-usage-data-channel-11


     Summary: 1 error (**), 0 flaws (~~), 14 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                             G. Hellstrom
3	Internet-Draft                                                   Omnitor
4	Intended status: Best Current Practice                     March 3, 2020
5	Expires: September 4, 2020

7	        Real-time text media handling in multi-party conferences
8	               draft-hellstrom-mmusic-multi-party-rtt-02

10	Abstract

12	   This memo specifies methods for Real-Time Text (RTT) media handling
13	   in multi-party calls.  The main RTT transport is to carry Real-Time
14	   text by the RTP protocol in a time-sampled mode according to
15	   RFC 4103 [RFC4103].  The mechanisms enable the receiving application
16	   to present the received real-time text medium separated per source,
17	   in different ways according to user preferences.  Some presentation
18	   related features are also described explaining suitable variations of
19	   transmission and presentation of text.

21	   Call control features are described for the SIP environment.  A
22	   number of alternative methods for providing the multi-party
23	   negotiation, transmission and presentation are discussed and a
24	   recommendation for the main one is provided.  The main solution for
25	   centralized multi-party handling of real-time text is achieved
26	   through a media control unit coordinating multiple RTP text streams
27	   into one RTP stream.

29	   Alternative methods using a single RTP stream and source
30	   identification inline in the text stream are also described, one of
31	   them being provided as a lower functionality fallback method for
32	   endpoints with no multi-party awareness for RTT.

34	   Bridging methods where the text stream is carried untouched by the
35	   bridge are also discussed.

37	   Brief information is also provided for multi-party RTT in the WebRTC
38	   environment.

40	Status of This Memo

42	   This Internet-Draft is submitted in full conformance with the
43	   provisions of BCP 78 and BCP 79.

45	   Internet-Drafts are working documents of the Internet Engineering
46	   Task Force (IETF).  Note that other groups may also distribute
47	   working documents as Internet-Drafts.  The list of current Internet-
48	   Drafts is at https://datatracker.ietf.org/drafts/current/.

50	   Internet-Drafts are draft documents valid for a maximum of six months
51	   and may be updated, replaced, or obsoleted by other documents at any
52	   time.  It is inappropriate to use Internet-Drafts as reference
53	   material or to cite them other than as "work in progress."

55	   This Internet-Draft will expire on September 4, 2020.

57	Copyright Notice

59	   Copyright (c) 2020 IETF Trust and the persons identified as the
60	   document authors.  All rights reserved.

62	   This document is subject to BCP 78 and the IETF Trust's Legal
63	   Provisions Relating to IETF Documents
64	   (https://trustee.ietf.org/license-info) in effect on the date of
65	   publication of this document.  Please review these documents
66	   carefully, as they describe your rights and restrictions with respect
67	   to this document.  Code Components extracted from this document must
68	   include Simplified BSD License text as described in Section 4.e of
69	   the Trust Legal Provisions and are provided without warranty as
70	   described in the Simplified BSD License.

72	Table of Contents

74	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
75	     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   4
76	   2.  Centralized conference model  . . . . . . . . . . . . . . . .   4
77	   3.  Requirements on multi-party RTT . . . . . . . . . . . . . . .   5
78	   4.  Coordination of text RTP streams  . . . . . . . . . . . . . .   6
79	     4.1.  RTP-based solutions with a central mixer  . . . . . . . .   6
80	       4.1.1.  RTP Mixer indicating sources in CSRC-list . . . . . .   6
81	       4.1.2.  RTP Mixer indicating participants by a control code
82	               in the stream . . . . . . . . . . . . . . . . . . . .   8
83	       4.1.3.  Mixing for conference-unaware user agents . . . . . .   9
84	     4.2.  RTP-based bridging with RTT media contents untouched by
85	           the bridge  . . . . . . . . . . . . . . . . . . . . . . .  10
86	       4.2.1.  RTP Translator sending one RTT stream per participant  10
87	       4.2.2.  Distributing packets in an end-to-end encryption
88	               structure . . . . . . . . . . . . . . . . . . . . . .  11
89	       4.2.3.  Mesh of RTP endpoints . . . . . . . . . . . . . . . .  12
90	       4.2.4.  Multiple RTP sessions, one for each participant . . .  12
91	     4.3.  RTT bridging in WebRTC  . . . . . . . . . . . . . . . . .  13
92	       4.3.1.  RTT bridging in WebRTC with one data channel per
93	               source  . . . . . . . . . . . . . . . . . . . . . . .  13
94	       4.3.2.  RTT bridging in WebRTC with one common data channel .  14
95	   5.  Preferred multi-party RTT transport method  . . . . . . . . .  14
96	   6.  Session control of multi-party RTT sessions . . . . . . . . .  15
97	     6.1.  Implicit RTT multi-party capability indication  . . . . .  16
98	     6.2.  RTT multi-party capability declared by SIP media-tags . .  17
99	     6.3.  SDP media attribute for RTT multi-party capability
100	           indication  . . . . . . . . . . . . . . . . . . . . . . .  18
101	     6.4.  Simplified SDP media attribute for RTT multi-party
102	           capability indication . . . . . . . . . . . . . . . . . .  19
103	     6.5.  SDP format parameter for RTT multi-party capability
104	           indication  . . . . . . . . . . . . . . . . . . . . . . .  20
105	     6.6.  Preferred capability declaration method.  . . . . . . . .  21
106	   7.  Identification of the source of text  . . . . . . . . . . . .  21
107	   8.  Presentation of multi-party text  . . . . . . . . . . . . . .  21
108	     8.1.  Associating identities with text streams  . . . . . . . .  22
109	     8.2.  Presentation details for multi-party aware UAs. . . . . .  22
110	       8.2.1.  Bubble style presentation . . . . . . . . . . . . . .  22
111	       8.2.2.  Other presentation styles . . . . . . . . . . . . . .  24
112	   9.  Presentation details for multi-party unaware UAs. . . . . . .  25
113	   10. Security Considerations . . . . . . . . . . . . . . . . . . .  25
114	   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  25
115	   12. Congestion considerations . . . . . . . . . . . . . . . . . .  26
116	   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  26
117	   14. Changes . . . . . . . . . . . . . . . . . . . . . . . . . . .  26
118	     14.1.  Changes from version -01 to -02  . . . . . . . . . . . .  26
119	   15. References  . . . . . . . . . . . . . . . . . . . . . . . . .  26
120	     15.1.  Normative References . . . . . . . . . . . . . . . . . .  26
121	     15.2.  Informative References . . . . . . . . . . . . . . . . .  27
122	   Appendix A.  Mixing for a conference-unaware UA . . . . . . . . .  29
123	     A.1.  Short description . . . . . . . . . . . . . . . . . . . .  29
124	     A.2.  Functionality goals and drawbacks . . . . . . . . . . . .  30
125	     A.3.  Definitions . . . . . . . . . . . . . . . . . . . . . . .  30
126	     A.4.  Presentation level procedures . . . . . . . . . . . . . .  32
127	       A.4.1.  Structure . . . . . . . . . . . . . . . . . . . . . .  33
128	       A.4.2.  Action on reception . . . . . . . . . . . . . . . . .  33
129	     A.5.  Display examples  . . . . . . . . . . . . . . . . . . . .  36
130	     A.6.  References for this Appendix  . . . . . . . . . . . . . .  38
131	     A.7.  Acknowledgement for the appendix  . . . . . . . . . . . .  38
132	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  38

134	1.  Introduction

136	   Real-time text (RTT) is a medium in real-time conversational
137	   sessions.  Text entered by participants in a session is transmitted
138	   in a time-sampled fashion, so that no specific user action is needed
139	   to cause transmission.  This gives a direct flow of text at the rate
140	   it is created, which is suitable in a real-time conversational
141	   setting.  The real-time text medium can be combined with other media
142	   in multimedia sessions.

144	   Media from a number of multimedia session participants can be
145	   combined in a multi-party session.  This memo specifies how the real-
146	   time text streams can be handled in multi-party sessions.

148	   The description is mainly focused on the transport level, but also
149	   describes a few session and presentation level aspects.

151	   Transport of real-time text is specified in RFC 4103 [RFC4103], RTP
152	   Payload for Text Conversation.  It makes use of RTP, RFC 3550
153	   [RFC3550], for transport.  Robustness against network
154	   transmission problems is normally achieved through redundant
155	   transmission based on the principle from RFC 2198 [RFC2198], with one
156	   primary and two redundant transmissions of each text element.
157	   Primary and redundant transmissions are combined in packets and
158	   described by a redundancy header.  This transport is usually used in
159	   the Session Initiation Protocol (SIP) RFC 3261 [RFC3261] environment.
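
   The redundancy principle described above can be illustrated with a
   minimal sketch.  It is not the RFC 4103 wire format: packets are
   modelled as Python dictionaries, and the class names, the field
   names and the use of U+FFFD as the missing text marker are
   assumptions made for this illustration only.

```python
from collections import deque

REDUNDANCY_LEVELS = 2       # one primary plus two redundant generations
MISSING_TEXT = "\ufffd"     # assumed marker for text that could not be recovered


class RedundantSender:
    """Builds logical packets carrying primary and redundant text (illustrative)."""

    def __init__(self):
        self.seq = 0
        # Holds the most recently sent text elements, oldest first.
        self.history = deque(maxlen=REDUNDANCY_LEVELS)

    def build_packet(self, text):
        packet = {
            "seq": self.seq,
            "redundant": list(self.history),  # oldest generation first
            "primary": text,
        }
        self.history.append(text)
        self.seq += 1
        return packet


class RedundantReceiver:
    """Recovers text after loss of up to two consecutive packets (illustrative)."""

    def __init__(self):
        self.next_seq = 0
        self.text = ""

    def receive(self, packet):
        gap = packet["seq"] - self.next_seq   # number of packets lost before this one
        if gap < 0:
            return                            # late or duplicate packet, ignore
        redundant = packet["redundant"]
        if gap > len(redundant):
            # More loss than the redundancy covers: mark it, then use what is there.
            self.text += MISSING_TEXT
            recovered = redundant
        else:
            recovered = redundant[len(redundant) - gap:] if gap else []
        self.text += "".join(recovered) + packet["primary"]
        self.next_seq = packet["seq"] + 1
```

   Sequence number wrap-around and packet reordering are left out to
   keep the sketch short.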

161	   A very brief overview of functions for real-time text handling in
162	   multi-party sessions is described in RFC 4597 [RFC4597] Conferencing
163	   Scenarios, sections 4.8 and 4.10.  This specification builds on that
164	   description and indicates which protocol mechanisms should be used to
165	   implement multi-party handling of real-time text.

167	   Real-time text can also be transported in the WebRTC environment, by
168	   using WebRTC data channels according to
169	   [I-D.ietf-mmusic-t140-usage-data-channel].  Multi-party aspects for
170	   WebRTC solutions are briefly covered.

172	1.1.  Requirements Language

174	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
175	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
176	   document are to be interpreted as described in RFC 2119 [RFC2119].

178	2.  Centralized conference model

180	   In the centralized conference model for SIP, introduced in RFC 4353
181	   [RFC4353] A Framework for Conferencing with the Session Initiation
182	   Protocol (SIP), one function co-ordinates the communication with
183	   participants in the multi-party session.  This function also controls
184	   media mixer functions for the media appearing in the session.  The
185	   central function is common for control of all media, while the media
186	   mixers may work differently for each medium.

188	   The central function is called the Focus UA.  Many variants exist for
189	   setting up sessions including the multipoint control centre.  It is
190	   not within scope of this description to describe these, but rather
191	   the media specific handling in the mixer required to handle multi-
192	   party calls with RTT.

194	   The main principle for handling real-time text media in a centralized
195	   conference is that one RTP session for real-time text is established
196	   including the multipoint media control centre and the participating
197	   endpoints which are going to have real-time text exchange with the
198	   others.

200	   The different possible mechanisms for mixing and transporting RTT
201	   differ in the way they multiplex the text streams and how they
202	   identify the sources of the streams.  RFC 7667 [RFC7667] describes a
203	   number of possible use cases for RTP.  This specification refers to
204	   different sections of RFC 7667 for further reading of the situations
205	   caused by the different possible design choices.

207	   The recommended method for using RTT in a centralized conference
208	   model is specified in [I-D.hellstrom-avtcore-multi-party-rtt-source].

210	   Real-time text can also be transported in the WebRTC environment, by
211	   using WebRTC data channels according to
212	   [I-D.ietf-mmusic-t140-usage-data-channel].  Ways to handle multi-
213	   party calls in that environment are also specified.

215	3.  Requirements on multi-party RTT

217	   The following requirements are placed on multi-party RTT:

219	      A solution shall be applicable to IMS (3GPP TS 22.173 [TS22173]),
220	      SIP based VoIP and Next Generation Emergency Services (NENA i3
221	      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]).

223	      The transmission interval for text must not be longer than 500
224	      milliseconds when there is anything available to send.  Ref ITU-T
225	      T.140 [T140].

227	      If text loss is detected or suspected, a missing text marker shall
228	      be inserted in the text stream.  Ref ITU-T T.140 Amendment 1
229	      [T140ad1] and ETSI EN 301 549 [EN301549].

231	      The display of text from the members of the conversation shall be
232	      arranged so that the text from each participant is clearly
233	      readable, and its source and the relative timing of entered text
234	      is visualized in the display.  Mechanisms for looking back in the
235	      contents from the current session should be provided.  The text
236	      should be displayed as soon as it is received.  Ref ITU-T T.140
237	      [T140].

238	      Bridges must be multimedia capable (voice, video, text).  Ref NENA
239	      i3 STA-010.2 [NENAi3].

241	      R7: It MUST be possible to use real-time text in conferences both
242	      as a medium of discussion between individual participants (for
243	      example, for sidebar discussions in real-time text while listening
244	      to the main conference audio) and for central support of the
245	      conference with real-time text interpretation of speech.  Ref RFC
246	      5194 [RFC5194].

248	      It should be possible to protect RTT contents with usual means for
249	      privacy and integrity.  Ref RFC 6881 section 16 [RFC6881].

251	      Conferencing procedures are documented in RFC 4579 [RFC4579].  Ref
252	      NENA i3 STA-010.2 [NENAi3].

254	      Conferencing applies to any kind of media stream by which users
255	      may want to communicate.  Ref 3GPP TS 24.147 [TS24147].

257	      The framework for SIP conferences is specified in RFC 4353
258	      [RFC4353].  Ref 3GPP TS 24.147 [TS24147].

260	4.  Coordination of text RTP streams

262	   Coordinating and sending text RTP streams in the multi-party session
263	   can be done in a number of ways.  The most suitable methods are
264	   specified here with pros and cons.

266	   A receiving and presenting UA SHOULD separate text from the different
267	   sources and identify and display them accordingly.

269	4.1.  RTP-based solutions with a central mixer

271	   A set of solutions can be based on the central RTP mixer.  They are
272	   described here and a preferred method selected.

274	4.1.1.  RTP Mixer indicating sources in CSRC-list

276	   An RTP media mixer combines text from participants into one RTP
277	   stream, thus all using the same destination address/port
278	   combination, the same RTP SSRC and one sequence number series, as
279	   described in Sections 7.1 and 7.3 of RTP RFC 3550 [RFC3550] about the
280	   Mixer function.  This method is also briefly described in RFC 7667,
281	   section 3.6.1 Media mixing mixer [RFC7667].

283	   The sources of the text in each RTP packet are identified by the CSRC
284	   list in the RTP packets, containing the SSRC of the initial sources
285	   of text.  The order of the CSRC parameters is with the SSRC of the
286	   source of the primary text first, followed by the SSRC of the first
287	   level redundancy, and then the second level.
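
   As an illustration of the CSRC ordering described above, the
   following sketch packs and parses the fixed RTP header of RFC 3550
   followed by a CSRC list.  The function names, the payload type value
   100 and the generation labels are assumptions chosen for this
   example; the normative rules are in the referenced specifications.

```python
import struct


def build_rtp_header(seq, timestamp, mixer_ssrc, csrcs, payload_type=100):
    """Pack a 12-octet RTP header followed by the CSRC list.

    payload_type=100 is a hypothetical dynamic payload type.
    """
    if len(csrcs) > 15:
        raise ValueError("the CC field allows at most 15 CSRC entries")
    first = (2 << 6) | len(csrcs)   # V=2, P=0, X=0, CC=number of CSRCs
    second = payload_type & 0x7F    # M=0, PT
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_csrc_list(packet):
    """Return (ssrc, [csrc, ...]) from the start of an RTP packet."""
    first, _second, _seq, _ts, ssrc = struct.unpack("!BBHII", packet[:12])
    count = first & 0x0F
    csrcs = struct.unpack("!%dI" % count, packet[12:12 + 4 * count])
    return ssrc, list(csrcs)


def sources_by_generation(csrcs):
    """Map the ordered CSRC list to text generations: primary first,
    then first-level redundancy, then second-level redundancy."""
    labels = ["primary", "redundant-1", "redundant-2"]
    return dict(zip(labels, csrcs))
```

   A receiver could call parse_csrc_list() on each packet and use
   sources_by_generation() to decide which participant any recovered
   redundant text belongs to.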

289	   A set of specific rules for the application of this method together
290	   with RFC 4103 [RFC4103] is specified in
291	   [I-D.hellstrom-avtcore-multi-party-rtt-source].

293	   The identification of the sources is made through the CSRC fields and
294	   can be made more readable through the RTCP SDES CNAME and NAME
295	   packets as described in RTP [RFC3550].

297	   The notification according to RFC 4575 [RFC4575], sent when a
298	   participant joins the conference, also provides suitable
299	   information and a reference to the SSRC.

301	   A receiving UA is supposed to separate text items from the different
302	   sources and identify and display them accordingly.

304	   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
305	   possible to recover from loss of one or two packets in sequence and
306	   assign the recovered text to the right source.  For more loss, a
307	   marker for possible loss should be inserted or presented.

309	   The conference server needs to have authority to decrypt the payload
310	   in the RTP packets in order to be able to recover text from redundant
311	   data or insert the missing text marker in the stream, and repack the
312	   text in new packets.

314	   Pros:

316	   This method has low overhead.

318	   When loss of packets occurs, it is possible to recover text from
319	   redundancy at loss of up to the number of redundancy levels carried
320	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
321	   levels).

323	   This method can be implemented with most RTP implementations.

325	   Cons:

327	   When more consecutive packets are lost than the number of
328	   generations of redundant data, it is not possible to deduce the
329	   sources of the totally lost data.

331	   The conference server needs to be allowed to decrypt/encrypt the
332	   packet payload.  This is however normal for media mixers for other
333	   media.

335	4.1.2.  RTP Mixer indicating participants by a control code in the
336	        stream

338	   Text from all participants except the receiving one is transmitted
339	   from the media mixer in the same RTP session and stream, thus all
340	   using the same destination address/port combination, the same RTP
341	   SSRC and one sequence number series, as described in Sections 7.1 and
342	   7.3 of RTP RFC 3550 [RFC3550] about the Mixer function.  The sources
343	   of the text in each RTP packet are identified by a newly defined T.140
344	   control code "c" followed by a unique identification of the source in
345	   UTF-8 string format.

347	   The receiver can use the string for presenting the source of text.
348	   This method is on the RTP level described in RFC 7667, section 3.6.1
349	   Media mixing mixer [RFC7667].

351	   The inline coding of the source of text is applied in the data stream
352	   itself, and an RTP mixer function is used for coordinating the
353	   sources of text into one RTP stream.

355	   Information uniquely identifying each user in the multi-party session
356	   is placed as the parameter value "n" in the T.140 application
357	   protocol function with the function code "c".  The identifier shall
358	   thus be formatted like this: SOS c n ST, where SOS and ST are coded
359	   as specified in ITU-T T.140 [T140].  The "c" is the letter "c".  The
360	   n parameter value is a string uniquely identifying the source.  This
361	   parameter shall be kept short so that it can be repeated in the
362	   transmission without concerns for network load.
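
   The "SOS c n ST" framing described above can be sketched as follows.
   The code points U+0098 for SOS and U+009C for ST are assumed here
   from the C1 control set used by T.140; the function names are
   hypothetical, and other SOS-framed functions are not handled.

```python
SOS = "\u0098"   # assumed code point for Start Of String
ST = "\u009c"    # assumed code point for String Terminator


def source_marker(source_id):
    """Return the inline source identifier 'SOS c n ST' for source_id."""
    if SOS in source_id or ST in source_id:
        raise ValueError("source identifier must not contain SOS or ST")
    return SOS + "c" + source_id + ST


def split_by_source(stream, current=None):
    """Split a mixed text stream into (source, text) segments.

    'current' is the source in effect before the first marker, if any.
    """
    segments = []
    pos = 0
    while pos < len(stream):
        start = stream.find(SOS + "c", pos)
        if start == -1:
            segments.append((current, stream[pos:]))
            break
        if start > pos:
            segments.append((current, stream[pos:start]))
        end = stream.find(ST, start)
        if end == -1:
            break                     # incomplete marker, wait for more data
        current = stream[start + 2:end]
        pos = end + 1
    return segments
```

   For transmission, the resulting string would be encoded as UTF-8,
   so each of the two control characters occupies two octets.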

364	   A receiving UA is supposed to separate text items from the different
365	   sources and identify and display them accordingly.

367	   The conference server needs to be allowed to decrypt/encrypt the
368	   packet payload in order to check the source and repack the text.

370	   Pros:

372	   If loss of packets occurs, it is possible to recover text from
373	   redundancy at loss of up to the number of redundancy levels carried
374	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
375	   levels).

377	   This method can be implemented with most RTP implementations.

379	   Transmitted text can also be used with other transports than RTP.

381	   Cons:

383	   The method implies a moderate load by the need to insert the source
384	   often in the stream.

386	   If more consecutive packets are lost than the number of generations
387	   of redundant data, it is not possible to deduce the source of
388	   the totally lost data.

390	   The mixer needs to be able to generate unique source
391	   identifications which are suitable as labels for the sources.

393	   Requires an extension to the ITU-T T.140 standard, best made by the
394	   ITU.

396	   The conference server needs to be allowed to decrypt/encrypt the
397	   packet payload.

402	4.1.3.  Mixing for conference-unaware user agents

404	   Multi-party real-time text contents can be transmitted to conference-
405	   unaware user agents if source labeling and formatting of the text is
406	   performed by a mixer.  This method has the limitations that the
407	   layout of the presentation and the format of source identification is
408	   purely controlled by the mixer, and that only one source at a time is
409	   allowed to present in real-time.  Other sources need to be stored
410	   temporarily waiting for an appropriate moment to switch the source of
411	   transmitted text.  The mixer controls the switching of sources and
412	   inserts a source identifier in text format at the beginning of text
413	   after switch of source.  The logic of the mixer to detect when a
414	   switch is appropriate should detect a number of places in text where
415	   a switch can be allowed, including new line, end of sentence, end of
416	   phrase, a period of inactivity, and a word separator after a long
417	   time of active transmission.
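
   The switching logic described above could be sketched as below.  The
   thresholds, the punctuation sets and the function name are
   assumptions for illustration; a real mixer would tune them and would
   likely also consider language-specific rules.

```python
NEW_LINE = ("\n", "\u2028")        # line feed and Unicode LINE SEPARATOR
SENTENCE_END = (".", "!", "?")
PHRASE_END = (",", ";", ":")
IDLE_LIMIT = 5.0                   # hypothetical: seconds of inactivity
LONG_ACTIVE_LIMIT = 20.0           # hypothetical: seconds of continuous sending


def switch_allowed(last_sent, idle_seconds, active_seconds):
    """Decide whether the mixer may switch to text from another source.

    last_sent      -- text sent so far from the currently active source
    idle_seconds   -- time since that source last produced text
    active_seconds -- how long that source has been sending continuously
    """
    if not last_sent or idle_seconds >= IDLE_LIMIT:
        return True                               # nothing sent yet, or inactive
    if last_sent.endswith(NEW_LINE):
        return True                               # new line
    stripped = last_sent.rstrip(" ")
    if stripped.endswith(SENTENCE_END + PHRASE_END):
        return True                               # end of sentence or phrase
    # A word separator is only accepted after a long time of active sending.
    return last_sent.endswith(" ") and active_seconds >= LONG_ACTIVE_LIMIT
```

   The mixer would evaluate this for the source currently being
   transmitted and, when it returns true and other sources have text
   waiting, insert the next source label and switch.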

419	   This method MAY be used when no support for multi-party awareness is
420	   detected in the receiving endpoint.  The basis for this method is
421	   described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667].

423	   See Appendix A for an informative example of a procedure for
424	   presenting RTT to a conference-unaware UA.

426	   Pros:

428	   Can be transmitted to conference-unaware endpoints.

430	   Can be used with other transports than RTP.

431	   Cons:

433	   Does not allow full real-time presentation of more than one source at
434	   a time.  Text from other sources will be delayed, even if automatic
435	   detection of suitable moments for switching source for presentation
436	   is made by the mixer.

438	   The only realistic presentation format is a style with the text from
439	   the different sources presented with a text label indicating source,
440	   and the text collected in a chat style presentation but with more
441	   frequent turn-taking.

443	   Endpoints often have their own system for adding labels to the RTT
444	   presentation.  In that case there will be two levels of labels in the
445	   presentation, one for the mixer and one for the sources.

447	   If more packets are lost than can be recovered by the redundancy,
448	   it is not possible to detect which source was affected by the
449	   loss.  It is also possible that a source switch occurred during the
450	   loss, and therefore a false indication of the source of text can be
451	   provided to the user after such loss.

453	   Because of all these cons, this method is not recommended and MUST
454	   NOT be used as the main method, but only as the last resort for
455	   backwards interoperability with conference-unaware endpoints.

457	   The conference server needs to be allowed to decrypt/encrypt the
458	   packet payload.

460	4.2.  RTP-based bridging with RTT media contents untouched by the bridge

462	   It may be desirable to send text in a multi-party setting in a way
463	   that allows the text stream contents to be distributed without
464	   decryption and encryption in any central server.  A number of such
465	   methods are described.  However, at the time of writing, none of
466	   these methods has a specified way of establishing the session
467	   by SDP.

469	4.2.1.  RTP Translator sending one RTT stream per participant

471	   Within the RTP session, text from each participant is transmitted
472	   from the RTP media translator in a separate RTP stream, thus using
473	   the same destination address/port combination, but separate RTP SSRC
474	   parameters and sequence number series as described in Section 7.1 and
475	   7.2 of RTP RFC 3550 [RFC3550] about the Translator function.  The
476	   sources of the text in each RTP packet are identified by the SSRC
477	   parameters in the RTP packets, containing the SSRC of the initial
478	   sources of text.

480	   A receiving and presenting UA is supposed to separate text items from
481	   the different sources and identify and display them in a suitable
482	   way.

484	   This method is described in RFC 7667, section 3.5.1 Relay-transport
485	   translator or 3.5.2 Media translator [RFC7667].

487	   The identification of the source is made through the RTCP SDES CNAME
488	   and NAME packets as described in RTP [RFC3550].
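
   Because each participant keeps its own SSRC and sequence number
   series in this model, a receiver can track loss per source.  The
   following sketch shows that idea only; the class name and the use of
   U+FFFD as the missing text marker are assumptions, and redundancy
   recovery and sequence number wrap-around are left out.

```python
MISSING_TEXT = "\ufffd"   # assumed marker for lost text


class PerSourceReceiver:
    """Keeps one text buffer and one sequence counter per SSRC (illustrative)."""

    def __init__(self):
        self.streams = {}   # ssrc -> {"next_seq": int, "text": str}

    def receive(self, ssrc, seq, text):
        stream = self.streams.setdefault(ssrc, {"next_seq": seq, "text": ""})
        if seq < stream["next_seq"]:
            return                              # late or duplicate packet
        if seq > stream["next_seq"]:
            stream["text"] += MISSING_TEXT      # loss is marked in this stream only
        stream["text"] += text
        stream["next_seq"] = seq + 1
```

   A gap in one participant's sequence numbers then leaves the text of
   all other participants unaffected.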

490	   Pros:

492	   This method has moderate overhead.  When loss of packets occurs, it
493	   is possible to recover text from redundancy at loss of up to the
494	   number of redundancy levels carried in the RFC 4103 [RFC4103] stream
495	   (normally primary and two redundant levels).

497	   Loss beyond what can be recovered can be detected, and the marker
498	   for text loss can be inserted in the correct stream.

500	   It may be possible in some scenarios to keep the text encrypted
501	   through the Translator.

503	   Cons:

505	   There may be RTP implementations not supporting the Translator model.

507	   This configuration is not supported by current media declarations in
508	   SDP.  RFC 3264 [RFC3264] specifies in many places that one media
509	   description is supposed to describe just one RTP stream.

511	4.2.2.  Distributing packets in an end-to-end encryption structure

513	   In order to achieve end-to-end encryption, it is possible to let the
514	   packets from the sources just pass through a central distributor, and
515	   handle the security agreements between the participants.
516	   Specifications exist for a framework with this functionality suitable
517	   for application on RTP based conferences in
518	   [I-D.ietf-perc-private-media-framework].  The RTP flow and mixing
519	   characteristics have similarities with the method described under "RTP
520	   Translator sending one RTT stream per participant" above.  RFC 4103
521	   RTP streams [RFC4103] would fit into the structure and it would
522	   provide a base for end-to-end encrypted RTT multi-party conferencing.

524	   Pros:

526	   Good security.

527	   Straightforward multi-party handling.

529	   Cons:

531	   Does not operate under the usual SIP central conferencing
532	   architecture.

534	   Requires the participants to perform a lot of key handling.

536	4.2.3.  Mesh of RTP endpoints

538	   Text from all participants is transmitted directly to all others in
539	   one RTP session, without a central bridge.  The sources of the text
540	   in each RTP packet are identified by the source network address and
541	   the SSRC.

543	   This method is described in RFC 7667, section 3.4 Point to multi-
544	   point using mesh [RFC7667].

546	   Pros:

548	   When loss of packets occurs, it is possible to recover text from
549	   redundancy at loss of up to the number of redundancy levels carried
550	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
551	   levels).

553	   This method can be implemented with most RTP implementations.

555	   Transmitted text can also be used with other transports than RTP.

557	   Cons:

559	   This model is not described in IMS, NENA and EENA specifications, and
560	   does therefore not meet the requirements.

562	4.2.4.  Multiple RTP sessions, one for each participant

564	   Text from all participants is transmitted directly to all others in
565	   one RTP session each, without a central bridge.  Each session is
566	   established with a separate media description in SDP.  The sources of
567	   the text in each RTP packet are identified by the source network
568	   address and the SSRC.

570	   This method is out of scope for further discussion here, because the
571	   foreseen applications use centralized model conferencing.

573	   Pros:

575	   When loss of packets occurs, it is possible to recover text from
576	   redundancy at loss of up to the number of redundancy levels carried
577	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
578	   levels).

580	   Complete loss of text can be indicated in the received stream.

582	   This method can be implemented with most RTP implementations.

584	   End-to-end encryption is achievable.

586	   Cons:

588	   This method is not described in IMS, NENA and ETSI specifications and
589	   does therefore not meet the requirements.

591	   A lot of network resources are spent on setting up separate sessions
592	   for each participant.

594	4.3.  RTT bridging in WebRTC

596	   Within WebRTC, real-time text is specified to be carried in WebRTC
597	   data channels as specified in
598	   [I-D.ietf-mmusic-t140-usage-data-channel].  A few ways to handle
599	   multi-party RTT are mentioned briefly.  They are explained and
600	   further detailed below.

602	4.3.1.  RTT bridging in WebRTC with one data channel per source

604	   A straightforward way to handle multi-party RTT is for the bridge to
605	   open one T.140 data channel per source towards the receiving
606	   participants.

608	   The stream-id forms a unique stream identification.

610	   The identification of the source is made through the Label property
611	   of the channel, and session information belonging to the source.  The
612	   UA can compose a readable label for the presentation from this
613	   information.

615	   Pros:

617	   This is a straightforward solution.

619	   Cons:

621	   With many participants, the overhead of establishing the required
622	   number of data channels may be high.

624	4.3.2.  RTT bridging in WebRTC with one common data channel

626	   Another way to handle multi-party RTT in WebRTC is for the bridge to
627	   combine text from all sources into one data channel and identify
628	   each source in the stream by a T.140 control code for source.

630	   This method corresponds to the one described for RTP transmission
631	   in Section 4.1.2 above.

633	   The identification of the source is made through insertion in the
634	   beginning of each text transmission from a source of a control code
635	   extension "c" followed by a string representing the source, framed by
636	   the control code start and end flags SOS and ST (See ITU-T T.140
637	   [T140]).

639	   A receiving UA is supposed to separate text items from the different
640	   sources and identify and display them in a suitable way.

642	   The UA does not always display the source identification in the
643	   received text at the place where it is received, but has the
644	   information as a guide for planning the presentation of received
645	   text.  A label corresponding to the source identification is
646	   presented when needed depending on the selected presentation style.
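
   The framing and separation described above can be sketched as
   follows.  This is a minimal illustration, not a definitive
   implementation: it assumes that SOS and ST are the code points
   U+0098 and U+009C, and that the proposed (not yet standardised)
   control code "c" is followed directly by the source string.  The
   function names are hypothetical.

```python
from typing import List, Optional, Tuple

SOS = "\u0098"     # assumed code point for Start Of String
ST = "\u009c"      # assumed code point for String Terminator
SOURCE_CODE = "c"  # proposed control code extension for source


def tag_source(source: str, text: str) -> str:
    """Prefix a text transmission with the framed source identification."""
    return f"{SOS}{SOURCE_CODE}{source}{ST}{text}"


def split_by_source(stream: str) -> List[Tuple[Optional[str], str]]:
    """Separate a combined stream into (source, text) items in arrival order."""
    segments: List[Tuple[Optional[str], str]] = []
    source: Optional[str] = None
    text: List[str] = []
    pos = 0
    while pos < len(stream):
        if stream.startswith(SOS + SOURCE_CODE, pos):
            end = stream.find(ST, pos)
            if end == -1:
                break  # incomplete control sequence; wait for more data
            if text:
                segments.append((source, "".join(text)))
                text = []
            source = stream[pos + 2:end]
            pos = end + 1
        else:
            text.append(stream[pos])
            pos += 1
    if text:
        segments.append((source, "".join(text)))
    return segments
```

   A receiving UA would use the source of each item to plan the
   presentation rather than display the control sequence itself.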

648	   Pros:

650	   This solution has relatively low overhead on session and network
651	   level.

653	   Cons:

655	   This solution has higher overhead on the media contents level than
656	   the WebRTC solution above.

658	   Standardisation of the new control code "c" in ITU-T T.140 [T140] is
659	   required.

661	   The conference server needs to be allowed to decrypt/encrypt the data
662	   channel contents.

664	5.  Preferred multi-party RTT transport method

666	   For RTP transport of RTT using RTP-mixer technology, one method for
667	   multi-party mixing and transport stands out as fulfilling the goals
668	   best and is therefore recommended.  That is: "RTP Mixer indicating
669	   participants in CSRC list", Section 4.1.1.

671	   For RTP transport in separate streams or sessions, a bridging method
672	   with good characteristics is the end-to-end encryption model "perc"
673	   Section 4.2.2.

675	   For WebRTC, one method is preferred because of its simplicity.  So,
676	   for WebRTC, the method to implement for multi-party RTT with
677	   conference-aware parties when no other method is explicitly agreed
678	   between implementing parties is: "RTT bridging in WebRTC with one
679	   data channel per source" Section 4.3.1.

681	6.  Session control of multi-party RTT sessions

683	   General session control aspects for multi-party sessions are
684	   described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP)
685	   Event Package for Conference State, and RFC 4579 [RFC4579] Session
686	   Initiation Protocol (SIP) Call Control - Conferencing for User
687	   Agents.  The nomenclature of these specifications is used here.

689	   The procedures for a conference-aware model for RTT-transmission
690	   shall only be applied if a capability exchange for conference-aware
691	   real-time text transmission has been completed and a supported method
692	   for multi-party real-time text transmission can be negotiated.

694	   A method for detection of conference-awareness for centralized SIP
695	   conferencing in general is specified in RFC 4579 [RFC4579].  The
696	   focus sends the "isfocus" feature tag in a SIP Contact header.  This
697	   causes the conference-aware UA to subscribe to conference
698	   notifications from the focus.  The focus then sends notifications to
699	   the UA about entering and disappearing conference participants and
700	   their media capabilities.  The information is carried XML-formatted
701	   in a 'conference-info' block in the notification according to RFC
702	   4575 [RFC4575].  The mechanism is described in detail in RFC 4575
703	   [RFC4575].

705	   Before a conference media server starts sending multi-party RTT to a
706	   UA, a verification of its ability to handle multi-party RTT must be
707	   made.  A decision on which mechanism to use for identifying text from
708	   the different participants must also be taken, implicitly or
709	   explicitly.  These verifications and decisions can be done in a
710	   number of ways.  The most apparent ways are specified here and their
711	   pros and cons described.  One of the methods is selected to be the
712	   one to be used by implementations of the centralized conference model
713	   according to this specification.

715	6.1.  Implicit RTT multi-party capability indication

717	   Capability for RTT multi-party handling can be decided to be
718	   implicitly indicated by session control items.

720	   The focus may implicitly indicate multi-party RTT capability by
721	   including the media child with value "text" in the RFC 4575 [RFC4575]
722	   conference-info provided in conference notifications.

724	   A UA may implicitly indicate multi-party RTT capability by including
725	   the text media in the SDP in the session control transactions with
726	   the conference focus after the subscription to the conference has
727	   taken place.

729	   The implicit RTT capability indication means for the focus that it
730	   can handle multi-party RTT according to the preferred method
731	   indicated in the RTT multi-party methods section above.

733	   The implicit RTT capability indication means for the UA that it can
734	   handle multi-party RTT according to the preferred method indicated in
735	   the RTT multi-party methods section above.

737	   If the focus detects that a UA implicitly declared RTT multi-party
738	   capability, it SHALL provide RTT according to the preferred method.

740	   If the focus detects that the UA does not indicate any RTT multi-
741	   party capability, then it shall either provide RTT multi-party text
742	   in the way specified for conference-unaware UA above, or refuse to
743	   set up the session.

745	   If the UA detects that the focus has implicitly declared RTT multi-
746	   party capability, it shall be prepared to present RTT in a multi-
747	   party fashion according to the preferred method.
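
   The UA-side check for the implicit indication might look like the
   sketch below.  It assumes that any "type" element with the value
   "text" in the RFC 4575 conference-info namespace counts as the
   indication; which element the focus uses is not fixed by the text
   above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace of the RFC 4575 conference-info document.
NS = "urn:ietf:params:xml:ns:conference-info"


def focus_indicates_text(conference_info_xml: str) -> bool:
    """Return True if any media type element in the notification is 'text'.

    Assumption: the position of the element in the document is ignored.
    """
    root = ET.fromstring(conference_info_xml)
    for type_el in root.iter(f"{{{NS}}}type"):
        if (type_el.text or "").strip() == "text":
            return True
    return False
```

   A UA finding the indication would then prepare to present RTT
   according to the preferred method.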

749	   Pros:

751	   Acceptance of implicit multi-party capability implies that no
752	   standardisation of explicit RTT multi-party capability exchange is
753	   required.

755	   Cons:

757	   If other methods for multi-party RTT are to be used in the same
758	   implementation environment as the preferred ones, then capability
759	   exchange needs to be defined for them.

761	   Cannot be used outside a strictly applied SIP central conference
762	   model.

764	6.2.  RTT multi-party capability declared by SIP media-tags

766	   Specifications for RTT multi-party capability declarations can be
767	   agreed for use as SIP media feature tags, to be exchanged during SIP
768	   call control operation according to the mechanisms in RFC 3840
769	   [RFC3840] and RFC 3841 [RFC3841].  RTT multi-party capability is
770	   then indicated by the media feature tag "rtt-mix", with a set of
771	   possible values for the different possible methods.

773	   The possible values in the list may be:

775	      rtp-mixer

777	      perc

779	   rtp-mixer indicates capability for using the RTP-mixer based
780	   presentation of multi-party text.

782	   perc indicates capability for using the perc based transmission of
783	   multi-party text.

785	   Example: Contact: <sip:a2@beco.example.com>

787	   ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL"

789	   ;+sip.rtt-mix="rtp-mixer"

791	   If, after evaluation of the alternatives in this specification, only
792	   one mixing method is selected to be brought to implementation, then
793	   the media tag can be reduced to a single tag with no list of values.

795	   An offer-answer exchange should take place and the common method
796	   selected by the answering party shall be used in the session with
797	   that UA.

799	   When no common method is declared, then only the fallback method for
800	   multi-party unaware participants can be used, or the session dropped.

802	   If more than one text media line is included in SDP, all must be
803	   capable of using the declared RTT multi-party method.
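
   Reading the proposed feature tag from a Contact header value could
   be sketched as below.  The tag "+sip.rtt-mix" is the one proposed in
   this section and is not registered; the function name is
   hypothetical, and a real SIP stack would use its own header parser
   rather than a regular expression.

```python
import re
from typing import List


def rtt_mix_from_contact(contact: str) -> List[str]:
    """Return the methods listed in the proposed +sip.rtt-mix feature tag,
    or an empty list if the tag is absent."""
    match = re.search(r';\s*\+sip\.rtt-mix="([^"]*)"', contact)
    if not match:
        return []
    return [value.strip() for value in match.group(1).split(",") if value.strip()]
```

   An empty result corresponds to the case where no common method is
   declared and only the fallback method can be used.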

805	   Pros:

807	   Provides a clear decision method.

809	   Can be extended with new mixing methods.

811	   Can guide call routing to a suitable capable focus.

813	   Cons:

815	   Requires standardization and IANA registration.

817	   Is not stream specific.  If more than one text stream is specified,
818	   all must have the same type of multi-party capability.

820	   Cannot be used in the WebRTC environment.

822	6.3.  SDP media attribute for RTT multi-party capability indication

824	   An attribute can be specified on media level, to be used in text
825	   media SDP declarations for negotiating RTT multi-party capabilities.
826	   The attribute can have the name "rtt-mix", with one or more of its
827	   possible values in a comma-separated list.

829	   The possible values in the list are:

831	      rtp-mixer

833	      perc

835	   rtp-mixer indicates capability for using the RTP-mixer based
836	   presentation of multi-party text.

838	   perc indicates capability for using the perc based transmission of
839	   multi-party text.

841	   An offer-answer exchange should take place and the common method
842	   selected by the answering party shall be used in the session with
843	   that UA.

845	   When no common method is declared, then only the fallback method can
846	   be used.

848	   Example: a=rtt-mix:rtp-mixer

850	   If, after evaluation of the alternatives in this specification, only
851	   one mixing method is selected to be brought to implementation, then
852	   the attribute can be reduced to a single attribute with no list of
853	   values.
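
   The selection made by the answering party can be sketched as
   follows.  This is an illustration under assumptions: the attribute
   syntax "a=rtt-mix:" with a comma-separated value list follows the
   proposal in this section, the answerer is assumed to pick by its
   own preference order, and the function names are hypothetical.

```python
from typing import List, Optional

ATTRIBUTE = "a=rtt-mix:"  # proposed attribute, not registered


def offered_methods(sdp_media_section: str) -> List[str]:
    """Return the methods listed in the rtt-mix attribute of one media section."""
    for line in sdp_media_section.splitlines():
        line = line.strip()
        if line.startswith(ATTRIBUTE):
            values = line[len(ATTRIBUTE):].split(",")
            return [value.strip() for value in values if value.strip()]
    return []


def select_method(offer: str, supported: List[str]) -> Optional[str]:
    """Pick the first method in the answerer's preference order that the
    offer also lists; None means only the fallback method can be used."""
    offered = offered_methods(offer)
    for method in supported:
        if method in offered:
            return method
    return None
```

   Because the attribute is evaluated per media section, each text
   media line can be negotiated independently.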

855	   Pros:

857	   Provides a clear decision method.

859	   Can be extended with new mixing methods.

861	   Can be used on specific text media.

863	   Can be used also for SDP-controlled WebRTC sessions with multiple
864	   streams in the same data channel.

866	   Cons:

868	   Requires standardization and IANA registration.

870	   Cannot guide SIP routing.

872	6.4.  Simplified SDP media attribute for RTT multi-party capability
873	      indication

875	   An attribute can be specified on media level, to be used in text
876	   media SDP declarations for negotiating RTT multi-party capabilities.
877	   The attribute can have the name "rtt-mix" with no value.  It would be
878	   selected and used if only one method for multi-party RTT is brought
879	   forward from this specification, and the others are suppressed or
880	   found to be negotiable in another way.

882	   An offer-answer exchange should take place and if both parties
883	   specify rtt-mix capability, the method for indicating source in the
884	   CSRC-list shall be used.

886	   When no common method is declared, then only the fallback method can
887	   be used, or the session not accepted for multi-party use.

889	   Example: a=rtt-mix

891	   Pros:

893	   Provides a clear decision method.

895	   Very simple syntax and semantics.

897	   Can be used on specific text media.

899	   Can be used also for SDP-controlled WebRTC sessions with multiple
900	   streams in the same data channel.

902	   Cons:

904	   Requires standardization and IANA registration.

906	   Cannot guide SIP routing.

908	6.5.  SDP format parameter for RTT multi-party capability indication

910	   An fmtp format parameter can be specified for the RFC 4103
911	   [RFC4103] media, to be used in text media SDP declarations for
912	   negotiating RTT multi-party capabilities.  The parameter can have the
913	   name "rtt-mix", with one or more of its possible values.

915	   The possible values in the list are:

917	      rtp-mixer

919	      perc

921	   rtp-mixer indicates capability for using the RTP-mixer based
922	   presentation of multi-party text.

924	   perc indicates capability for using the perc based transmission of
925	   multi-party text.

927	   Example: a=fmtp:96 98/98/98 cps=30;rtt-mix=rtp-mixer

929	   If, after evaluation of the alternatives in this specification, only
930	   one mixing method is selected to be brought to implementation, then
931	   the parameter can be reduced to a single parameter with no list of
932	   values.

934	   An offer-answer exchange should take place and the common method
935	   selected by the answering party shall be used in the session with
936	   that UA.

938	   When no common method is declared, then only the fallback method can
939	   be used, or the session denied.

941	   Pros:

943	   Provides a clear decision method.

945	   Can be extended with new mixing methods.

947	   Can be used on specific text media.

949	   Can be used also for SDP-controlled WebRTC sessions with multiple
950	   streams in the same data channel.

952	   Cons:

954	   Requires standardization and IANA registration.

956	   May cause interop problems with current RFC 4103 [RFC4103]
957	   implementations not expecting a new fmtp-parameter.

959	   Cannot guide SIP routing.

961	6.6.  Preferred capability declaration method.

963	   The preferred capability declaration method is the one with a
964	   simplified SDP attribute "a=rtt-mix" Section 6.4 because it is
965	   straightforward and partially usable also for WebRTC.

967	7.  Identification of the source of text

969	   The main way to identify the source of text in the RTP based solution
970	   is by the SSRC of the sending participant.  In the RTP-mixer
971	   solution, it is included in the CSRC list of the transmitted packets.
972	   Further identification that may be needed for better labeling of
973	   received text may be obtained from a number of sources, such as
974	   the RTCP SDES CNAME and NAME reports and the conference
975	   notification data (RFC 4575 [RFC4575]).

977	   As soon as a new member is added to the RTP session, its
978	   characteristics should be transmitted in RTCP SDES CNAME and NAME
979	   reports according to section 6.5 in RFC 3550 [RFC3550].  The
980	   information about the participant should also be included in the
981	   conference data including the text media member in a notification
982	   according to RFC 4575 [RFC4575].

984	   The RTCP SDES report SHOULD contain identification of the source
985	   represented by the SSRC/CSRC identifier.  This identification MUST
986	   contain the CNAME field and MAY contain the NAME field and other
987	   defined fields of the SDES report.

989	   A focus UA SHOULD primarily convey SDES information received from the
990	   sources of the session members.  When such information is not
991	   available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME
992	   information from available information from the SIP session with the
993	   participant.

995	8.  Presentation of multi-party text

997	   All session participants MUST observe the SSRC/CSRC field of incoming
998	   text RTP packets, and make note of what source they came from in
999	   order to be able to present text in a way that makes it easy to read
1000	   text from each participant in a session, and get information about
1001	   the source of the text.
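
   Observing the SSRC/CSRC fields amounts to reading the fixed RTP
   header defined in RFC 3550.  The sketch below extracts the SSRC and
   the CSRC list from a raw packet; the function name is hypothetical
   and header extensions and payload are ignored.

```python
import struct
from typing import List, Tuple


def rtp_sources(packet: bytes) -> Tuple[int, List[int]]:
    """Return (SSRC, CSRC list) from the fixed RTP header of a packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first_byte = packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = first_byte & 0x0F  # CC field, low four bits
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("packet too short for its CSRC list")
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    csrcs = list(struct.unpack_from(f"!{csrc_count}I", packet, 12))
    return ssrc, csrcs
```

   In the RTP-mixer solution the SSRC identifies the mixer and the
   CSRC list identifies the participant whose text is in the packet.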

1003	8.1.  Associating identities with text streams

1005	   A source identity SHOULD be composed from available information
1006	   sources and displayed together with the text as indicated in ITU-T
1007	   T.140 Appendix [T140].

1009	   The source identity should primarily be the NAME field from incoming
1010	   SDES packets.  If this information is not available, and the session
1011	   is a two-party session, then the T.140 source identity SHOULD be
1012	   composed from the SIP session participant information.  For multi-
1013	   party sessions the source identity may be composed by local
1014	   information if sufficient information is not available in the
1015	   session.

1017	   Applications may abbreviate the presented source identity to a
1018	   suitable form for the available display.

1020	   Applications may also replace received source information with
1021	   internally used nicknames.
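
   The order of preference for composing a source identity can be
   sketched as below.  The function name, the parameters, the fallback
   format "Participant <ssrc>" and the truncation length are all
   assumptions made for illustration only.

```python
from typing import Dict, Optional


def source_label(ssrc: int,
                 sdes_names: Dict[int, str],
                 sip_display_name: Optional[str] = None,
                 two_party: bool = False,
                 max_len: int = 20) -> str:
    """Compose a label: SDES NAME first, then SIP session information for
    two-party sessions, then a locally composed identity."""
    name = sdes_names.get(ssrc)
    if not name and two_party and sip_display_name:
        name = sip_display_name
    if not name:
        name = f"Participant {ssrc:08x}"  # hypothetical local fallback
    return name[:max_len]  # abbreviate to fit the available display
```

   A locally configured nickname table could replace the result before
   display, as permitted above.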

1023	8.2.  Presentation details for multi-party aware UAs.

1025	   The multi-party aware UA should, after any action for recovery of
1026	   data from lost packets, separate the incoming streams and present
1027	   them according to the style that the receiving application supports
1028	   and the user has selected.  The decisions taken for presentation of
1029	   the multi-party interchange shall be purely on the receiving side.
1030	   The sending application must not insert any item in the stream to
1031	   influence presentation that is not requested by the sending
1032	   participant.

1034	8.2.1.  Bubble style presentation

1036	   One often used style is to present real-time text in chunks in
1037	   readable bubbles identified by labels containing names of sources.
1038	   Bubbles are placed in one column in the presentation area and are
1039	   closed and moved upwards in the presentation area after certain items
1040	   or events, when there is also newer text from another source that
1041	   would go into a new bubble.  The text items that allow bubble
1042	   closing are any character closing a phrase or sentence followed by a
1043	   space or a timeout of a suitable time (about 10 seconds).
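
   The closing rule above can be expressed as a small predicate.  This
   is a sketch under assumptions: the set of phrase-closing characters
   and the exact timeout are illustrative choices, and the function
   name is hypothetical.

```python
PHRASE_ENDINGS = ".?!,;:"  # assumed set of phrase-closing characters
IDLE_TIMEOUT_S = 10.0      # "about 10 seconds"


def may_close_bubble(text: str,
                     seconds_since_last_input: float,
                     other_source_waiting: bool) -> bool:
    """Return True if the active bubble may be closed and moved upwards."""
    if not other_source_waiting or not text:
        return False  # nothing newer from another source, keep the bubble open
    ends_phrase = (len(text) >= 2
                   and text[-1] == " "
                   and text[-2] in PHRASE_ENDINGS)
    timed_out = seconds_since_last_input >= IDLE_TIMEOUT_S
    return ends_phrase or timed_out
```

   The predicate only decides whether closing is allowed; placement of
   the closed bubble remains a presentation decision of the receiving
   application.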

1045	   Real-time active text sent from the local user should be presented in
1046	   a separate area.  When there is a reason to close a bubble from the
1047	   local user, the bubble should be placed above all real-time active
1048	   bubbles, so that the time order that real-time text entries were
1049	   completed is visible.

1051	   Scrolling is usually provided for viewing of recent or older text.
1052	   When scrolling is done to an earlier point in the text, the
1053	   presentation shall not move the scroll position by new received text.
1054	   It must be the decision of the local user to return to automatic
1055	   viewing of latest text actions.  It may be useful with an indication
1056	   that there is new text to read after scrolling to an earlier position
1057	   has been activated.

1059	   The presentation area may become too small to present all text in all
1060	   real-time active bubbles.  Various techniques can be applied to
1061	   provide a good overview and good reading opportunity even in such
1062	   situations.  The active real-time bubble may have a limited number of
1063	   lines and if their contents need more lines, then a scrolling
1064	   opportunity within the real-time active bubble is provided.  Another
1065	   method can be to only show the label and the last line of the active
1066	   real-time bubble contents, and make it possible to expand or compress
1067	   the bubble presentation between full view and one line view.

1069	   Erasures require special consideration.  Erasure within a real-time
1070	   active bubble is straightforward.  But if erasure from one
1071	   participant affects the last character before a bubble, the whole
1072	   previous bubble becomes the actual bubble for real-time action by
1073	   that participant and is placed below all other bubbles in the
1074	   presentation area.  If the border between bubbles was caused by the
1075	   CRLF characters (instead of the normal "Line Separator"), only one
1076	   erasure action is required to erase this bubble border.  When a
1077	   bubble is closed, it is moved up, above all real-time active bubbles.

1079	   A three-party view is shown in this example.

1081	                 _________________________________________________
1082	                |                                              |^|
1083	                |                                              | |
1084	                |                                              | |
1085	                |                                              | |
1086	                |[Alice] Hi, Alice here.                       | |
1087	                |                                              | |
1088	                |[Bob] Bob as well.                            | |
1089	                |                                              | |
1090	                |[Eve] Hi, this is Eve, calling from Paris.    | |
1091	                |      I thought you should be here.           | |
1092	                |                                              | |
1093	                |[Alice] I am coming on Thursday, my           | |
1094	                |      performance is not until Friday morning.| |
1095	                |                                              | |
1096	                |[Bob] And I on Wednesday evening.             | |
1097	                |                                              | |
1098	                |[Alice] Can we meet on Thursday evening?      | |
1099	                |                                              | |
1100	                |[Eve] Yes, definitely. How about 7pm.         | |
1101	                |     at the entrance of the restaurant        | |
1102	                |     Le Lion Blanc?                           | |
1103	                |[Eve] we can have dinner and then take a walk | |
1104	                |                                              | |
1105	                | <Eve-typing> But I need to be back to        | |
1106	                |    the hotel by 11 because I need            | |
1107	                |                                              |-|
1108	                | <Bob-typing> I wou                           |-|
1109	                |______________________________________________|v|
1110	                | of course, I underst                           |
1111	                |________________________________________________|

1113	   Figure 1: Example of a three-party call presented in the bubble
1114	   style.

1118	8.2.2.  Other presentation styles

1120	   Other presentation styles than the bubble style may be arranged and
1121	   appreciated by the users.  In a video conference one way may be to
1122	   have a real-time text area below the video view of each participant.
1123	   Another view may be to provide one column in a presentation area for
1124	   each participant and place the text entries in a relative vertical
1125	   position corresponding to when text entry in them was completed.  The
1126	   labels can then be placed in the column header.  The considerations
1127	   for ending and moving and erasure of entered text discussed above for
1128	   the bubble style are valid also for these styles.

1130	   This figure shows how a coordinated column view MAY be presented.

1132	   _____________________________________________________________________
1133	   |       Bob          |       Eve            |       Alice           |
1134	   |____________________|______________________|_______________________|
1135	   |                    |                      |I will arrive by TGV.  |
1136	   |My flight is to Orly|                      |Convenient to the main |
1137	   |                    |Hi all, can we plan   |station.               |
1138	   |                    |for the seminar?      |                       |
1139	   |Eve, will you do    |                      |                       |
1140	   |your presentation on|                      |                       |
1141	   |Friday?             |Yes, Friday at 10.    |                       |
1142	   |Fine, wo            |                      |We need to meet befo   |
1143	   |___________________________________________________________________|

1145	   Figure 2: A coordinated column-view of a three-party session with
1146	   entries ordered in approximate time-order.

1148	9.  Presentation details for multi-party unaware UAs.

1150	   Multi-party unaware UAs are prepared only for presentation of two
1151	   sources of text, the local user and a remote user.  If mixing for
1152	   multi-party unaware UAs is to be supported, in order to enable some
1153	   multi-party communication with such UAs, the mixer needs to plan the
1154	   presentation and insert labels and line breaks before labels.  Many
1155	   limitations appear for this presentation mode, and it must be seen as
1156	   a fallback and a last resort.  A realistic alternative is to not
1157	   allow multi-party sessions with multi-party unaware UAs.

1159	   See Appendix A for an informative example of a procedure for
1160	   presenting RTT to a conference-unaware UA.

1162	10.  Security Considerations

1164	   The security considerations valid for RFC 4103 [RFC4103] and RFC 3550
1165	   [RFC3550] are valid also for the multi-party sessions with text.

1167	11.  IANA Considerations

1169	   The items for indication and negotiation of capability for multi-
1170	   party RTT should be registered with IANA in the specifications where
1171	   they are specified in detail.

1173	12.  Congestion considerations

1175	   The congestion considerations described in RFC 4103 [RFC4103] are
1176	   valid also for multi-party use of the real-time text RTP transport.
1177	   A risk for congestion may appear if a number of conference
1178	   participants are actively transmitting text simultaneously, because
1179	   this multi-party transmission method does not allow multiple sources
1180	   of text to contribute to the same packet.

1182	   In situations of risk for congestion, the Focus UA MAY combine
1183	   packets from the same source to increase the transmission interval
1184	   per source up to one second.  Local conference policy in the Focus UA
1185	   may be used to decide which streams shall be selected for such
1186	   transmission frequency reduction.
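
   The per-source combining described above can be sketched as below.
   The class and method names are hypothetical, and the normal
   interval of 0.3 seconds is an assumption based on the default
   buffering time of RTP text transport; the one-second ceiling is the
   one stated above.

```python
from typing import Dict, List

NORMAL_INTERVAL_S = 0.3     # assumed normal transmission interval
CONGESTED_INTERVAL_S = 1.0  # upper limit stated in this section


class SourceCombiner:
    """Buffer text per source and release it at most once per interval."""

    def __init__(self) -> None:
        self.interval_s = NORMAL_INTERVAL_S
        self._pending: Dict[int, List[str]] = {}
        self._last_sent: Dict[int, float] = {}

    def set_congested(self, congested: bool) -> None:
        """Lengthen the transmission interval while congestion is at risk."""
        self.interval_s = CONGESTED_INTERVAL_S if congested else NORMAL_INTERVAL_S

    def add(self, ssrc: int, text: str) -> None:
        """Queue text received from one source."""
        self._pending.setdefault(ssrc, []).append(text)

    def flush_due(self, now: float) -> Dict[int, str]:
        """Return combined text for every source whose interval has elapsed."""
        due: Dict[int, str] = {}
        for ssrc in list(self._pending):
            if now - self._last_sent.get(ssrc, float("-inf")) >= self.interval_s:
                due[ssrc] = "".join(self._pending.pop(ssrc))
                self._last_sent[ssrc] = now
        return due
```

   Text from different sources is never merged into one entry, which
   matches the constraint that multiple sources do not contribute to
   the same packet.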

1188	13.  Acknowledgements

1190	   Arnoud van Wijk for contributions to an earlier, expired draft of
1191	   this memo.

1193	14.  Changes

1195	14.1.  Changes from version -01 to -02

1197	   Change from a general overview to overview with clear
1198	   recommendations.

1200	   Splits text coordination methods in three groups.

1202	   Recommends rtt-mixer with sources in CSRC-list but refers to its
1203	   spec for details.

1205	   Shortened Appendix with conference-unaware example.

1207	   Cleaned up preferences.

1209	   Inserted pictures of screen-views.

1211	15.  References

1213	15.1.  Normative References

1215	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1216	              Requirement Levels", BCP 14, RFC 2119,
1217	              DOI 10.17487/RFC2119, March 1997,
1218	              <https://www.rfc-editor.org/info/rfc2119>.

1220	15.2.  Informative References

1222	   [EN301549]
1223	              ETSI, "EN 301 549. Accessibility requirements for ICT
1224	              products and services", November 2019.

1226	   [I-D.hellstrom-avtcore-multi-party-rtt-source]
1227	              Hellstrom, G., "Indicating source of multi-party Real-time
1228	              text", draft-hellstrom-avtcore-multi-party-rtt-source-01
1229	              (work in progress), February 2020.

1231	   [I-D.ietf-mmusic-t140-usage-data-channel]
1232	              Holmberg, C., "T.140 Real-time Text Conversation over
1233	              WebRTC Data Channels", draft-ietf-mmusic-t140-usage-data-
1234	              channel-11 (work in progress), December 2019.

1236	   [I-D.ietf-perc-private-media-framework]
1237	              Jones, P., Benham, D., and C. Groves, "A Solution
1238	              Framework for Private Media in Privacy Enhanced RTP
1239	              Conferencing (PERC)", draft-ietf-perc-private-media-
1240	              framework-12 (work in progress), June 2019.

1242	   [NENAi3]   NENA, "NENA-STA-010.2-2016. Detailed Functional and
1243	              Interface Standards for the NENA i3 Solution", October
1244	              2016.

1246	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1247	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
1248	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1249	              DOI 10.17487/RFC2198, September 1997,
1250	              <https://www.rfc-editor.org/info/rfc2198>.

1252	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
1253	              A., Peterson, J., Sparks, R., Handley, M., and E.
1254	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
1255	              DOI 10.17487/RFC3261, June 2002,
1256	              <https://www.rfc-editor.org/info/rfc3261>.

1258	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1259	              with Session Description Protocol (SDP)", RFC 3264,
1260	              DOI 10.17487/RFC3264, June 2002,
1261	              <https://www.rfc-editor.org/info/rfc3264>.

1263	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1264	              Jacobson, "RTP: A Transport Protocol for Real-Time
1265	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
1266	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

1268	   [RFC3840]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat,
1269	              "Indicating User Agent Capabilities in the Session
1270	              Initiation Protocol (SIP)", RFC 3840,
1271	              DOI 10.17487/RFC3840, August 2004,
1272	              <https://www.rfc-editor.org/info/rfc3840>.

1274	   [RFC3841]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
1275	              Preferences for the Session Initiation Protocol (SIP)",
1276	              RFC 3841, DOI 10.17487/RFC3841, August 2004,
1277	              <https://www.rfc-editor.org/info/rfc3841>.

1279	   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
1280	              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005,
1281	              <https://www.rfc-editor.org/info/rfc4103>.

1283	   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
1284	              Session Initiation Protocol (SIP)", RFC 4353,
1285	              DOI 10.17487/RFC4353, February 2006,
1286	              <https://www.rfc-editor.org/info/rfc4353>.

1288	   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
1289	              Session Initiation Protocol (SIP) Event Package for
1290	              Conference State", RFC 4575, DOI 10.17487/RFC4575, August
1291	              2006, <https://www.rfc-editor.org/info/rfc4575>.

1293	   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
1294	              (SIP) Call Control - Conferencing for User Agents",
1295	              BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006,
1296	              <https://www.rfc-editor.org/info/rfc4579>.

1298	   [RFC4597]  Even, R. and N. Ismail, "Conferencing Scenarios",
1299	              RFC 4597, DOI 10.17487/RFC4597, August 2006,
1300	              <https://www.rfc-editor.org/info/rfc4597>.

1302	   [RFC5194]  van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real-
1303	              Time Text over IP Using the Session Initiation Protocol
1304	              (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008,
1305	              <https://www.rfc-editor.org/info/rfc5194>.

1307	   [RFC6443]  Rosen, B., Schulzrinne, H., Polk, J., and A. Newton,
1308	              "Framework for Emergency Calling Using Internet
1309	              Multimedia", RFC 6443, DOI 10.17487/RFC6443, December
1310	              2011, <https://www.rfc-editor.org/info/rfc6443>.

1312	   [RFC6881]  Rosen, B. and J. Polk, "Best Current Practice for
1313	              Communications Services in Support of Emergency Calling",
1314	              BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013,
1315	              <https://www.rfc-editor.org/info/rfc6881>.

1317	   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
1318	              DOI 10.17487/RFC7667, November 2015,
1319	              <https://www.rfc-editor.org/info/rfc7667>.

1321	   [T140]     ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for
1322	              multimedia application text conversation", February 1998.

1324	   [T140ad1]  ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000),
1325	              Protocol for multimedia application text conversation",
1326	              February 2000.

1328	   [TS103479]
1329	              ETSI, "TS 103 479. Emergency communications (EMTEL); Core
1330	              elements for network independent access to emergency
1331	              services", December 2019.

1333	   [TS22173]  3GPP, "IP Multimedia Core Network Subsystem (IMS)
1334	              Multimedia Telephony Service and supplementary services;
1335	              Stage 1", 3GPP TS 22.173 17.1.0, December 2019.

1337	   [TS24147]  3GPP, "Conferencing using the IP Multimedia (IM) Core
1338	              Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0,
1339	              December 2019.

1341	Appendix A.  Mixing for a conference-unaware UA

1343	   This informational appendix describes media mixer procedures for a
1344	   multi-party conference server to format real-time text from a number
1345	   of participants into one single text stream to a participant with a
1346	   terminal that has no features for multi-party text display.  The
1347	   procedures are intended for implementations using ITU-T T.140 [T.140]
1348	   for the real-time text coding and presentation.

1350	A.1.  Short description

1352	   The media mixer procedures described here coordinate real-time text
1353	   from a number of call participants into one text stream sent to a
1354	   terminal originally intended for two-party calls.  A conference
1355	   server is expected to apply the procedures.

1357	   The procedures may also be applied on a terminal for display of
1358	   multiple streams of real-time text in one area.

1360	   The intention is that text from each participant shall be displayed
1361	   in suitable sections so that it is easy to read, and text from one
1362	   active participant at a time is sent and displayed in real-time.  The
1363	   receiving terminal is assumed to have one display area for received
1364	   text.  The display is arranged by this procedure in a text chat
1365	   style, with a name label in front of each text section where the
1366	   source of the text has switched.

1368	   When more than one participant transmits text at the same time, the
1369	   text from only one of them is transmitted directly to the receiving
1370	   terminals.  Text from the other participants is stored in buffers in
1371	   the conference server for transmission at a later time, when a
1372	   suitable occasion to switch the current transmitter arises.

1374	A.2.  Functionality goals and drawbacks

1376	   The procedures are intended to make best efforts to present a multi-
1377	   party text conversation on a terminal that has no awareness of multi-
1378	   party calls.  There are some obvious drawbacks, and a terminal
1379	   designed with multi-party awareness will be able to present multi-
1380	   party call contents in a more flexible way.  Only two parties at a
1381	   time will be allowed to display added text in real-time, while the
1382	   other parties' produced text will need to be stored in the multi-
1383	   party server for a moment awaiting a suitable occasion to be
1384	   displayed.  There are also some cases of erasure that will not be
1385	   performed on the target text but only indicated in another way.  Even
1386	   with these drawbacks, the procedure provides an opportunity to
1387	   display text from more than two parties in a smooth and readable way.

1389	   This specification does not introduce any new protocol element, and
1390	   does not rely on anything else than basic two-party terminal
1391	   functionality with presentation level according to ITU-T T.140
1392	   [T.140].  It is a description of a best current practice for mixing
1393	   and presentation of the real-time text component in multi-party calls
1394	   with terminals without multi-party awareness.

1396	   The procedures are applicable to scenarios where the conference focus
1397	   and a User Agent have not successfully completed any negotiation
1398	   about conference awareness for the real-time text medium, either on
1399	   the transport level or on the presentation level.

1401	A.3.  Definitions

1403	      Active participant: Any user sending text, or being in a pending
1404	      period.

1406	      BOM: Byte-Order-Mark, the Unicode character FEFF in UCS-16.

1408	      Buffer: A buffer intended for unsent text collected per
1409	      participant.

1411	      Contributing participants: The participants selected to contribute
1412	      to the text stream sent to the recipients.

1414	      By default all participants except the recipient are contributing
1415	      participants for transmission to the recipient.

1417	      Current participant: The participant for whom text currently is
1418	      transmitted to the recipient in real time.

1420	      Current Recipients: By default all participants.

1422	      Display Counter: A counter for the number of displayable
1423	      characters in a participant's buffer or in the current entry.
1424	      Used for controlling how far erasure may be performed.

1426	      Erasure replacement: A character to be displayed when an erasure
1427	      was done, but the text to erase is not reachable on the multi-
1428	      party display.  Default 'X'.

1430	      Message delimiter: Character(s) forming the end of an imagined
1431	      message.  A configurable set of alternatives, consisting by
1432	      default of: Line Separator, Paragraph Separator, CR, CRLF, LF.

1434	      Pending period: A configurable time period of inactivity from a
1435	      participant, by default set to 7 seconds after each reception of
1436	      characters from that participant, evaluated as current time minus
1437	      time stamp of latest entered character.

1439	      Sentence delimiter: Characters forming end of sentence: A
1440	      configurable set of alternatives, by default consisting of: dot
1441	      '.', question mark '?' and exclamation mark '!' followed by a
1442	      space.

1444	      Label: A readable unique name for a participant, created by the
1445	      server from a suitable source related to the participant, e.g.
1446	      part of the SIP Display name, surrounded by the Label delimiters.
1447	      The label should have a settable maximum length, with 12 being the
1448	      default.

1450	      Label delimiters: A configurable set of characters at the edges
1451	      of the Label, by default a left bracket [ at the leading edge and
1452	      a right bracket ] followed by a space at the trailing edge.

1454	      Line Separator: Unicode UCS-16 2028.  Used to request NewLine in
1455	      Real-Time Text.

1457	      Maximum waiting time: The maximum time any participant's text
1458	      shall be allowed to wait for transmission, by default set to 20
1459	      seconds.

1461	      Recipient: The terminal receiving the mixed text stream.

1463	      SGR: Select Graphic Rendition, a control code to specify colours
1464	      etc.

1466	      Switch Reason: A set of reasons to switch Current Participant,
1467	      consisting of the following:

1469	      -Waiting time higher for any other participant than the current
1470	      participant, combined with any of the following states:

1472	      -A message delimiter was the latest transmitted item

1474	      -A sentence delimiter was the latest transmitted item

1476	      -A Pending Period has expired and still no text has been
1477	      transmitted

1479	      -The Maximum Waiting time has expired followed by a Word Delimiter
1480	      or an expired Time Extension.

1482	      Waiting time: The time the first character in queue for
1483	      transmission from a participant has been waiting in a buffer for
1484	      transmission.  The granularity shall be 0.3 Seconds or finer.

1486	      Word delimiter: Character forming end of word: space

1488	      Time extension: A configurable short extension time allowed after
1489	      the Maximum waiting time during which a suitable moment for
1490	      switching Current Participant is awaited, by default set to 7
1491	      seconds.
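
	      As an illustration of the Label and Label delimiters definitions
	      above, the following Python sketch builds a label from a display
	      name.  The function name make_label and its parameters are
	      hypothetical.  The sketch assumes that the maximum length applies
	      to the name part inside the delimiters, which the text leaves
	      open.  Uniqueness among participants is not handled here.

```python
def make_label(display_name, max_length=12, opening="[", closing="] "):
    """Build a Label from a display name (hypothetical helper).

    Assumption: max_length limits the name part only, not the delimiters.
    """
    name = display_name.strip()[:max_length].rstrip()
    return f"{opening}{name}{closing}"


if __name__ == "__main__":
    print(repr(make_label("Alice")))
    print(repr(make_label("Maximilian-Alexander")))
```

	      A server would additionally need to make labels unique, for
	      example when two participants share the same display name.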

1493	A.4.  Presentation level procedures

1495	   The conference server applies these mixing procedures to text
1496	   transmitted to call participants who have not gone through a
1497	   completed negotiation for conference awareness in real-time text
1498	   presentation.

1500	   All the participants and the conference server use real-time text
1501	   conversation presentation coding according to ITU-T T.140 [T.140].  A
1502	   consequence is that real-time text transmissions are UTF-8 coded,
1503	   with control codes selected from ISO 6429 [ISO 6429].

1505	   The description is from the conference server point of view.

1507	A.4.1.  Structure

1509	   The real-time text mixer structure described here is supposed to be
1510	   placed in the media path so that it is implemented with one mixer per
1511	   recipient.  A mixer contains buffers for temporary storage of text
1512	   intended for the recipient.  Each mixer has one buffer for each
1513	   contributing participant.  A set of status variables is maintained
1514	   per buffer and is used in the mixer actions.  The mixer logic decides
1515	   for each moment which participant's buffer content is to be sent on
1516	   to the recipient.  By default, the recipient does not contribute text
1517	   to its own mixer.  Text transmitted by a participant is usually
1518	   displayed locally and it will only cause confusion if it appears also
1519	   in received text.

1521	A.4.2.  Action on reception

1523	   This description of the mixer is valid per recipient.

1525	   Text from each contributing participant is checked for a set of
1526	   characteristics on reception.

1528	      Delete BOM: BOM characters are deleted.

1530	      Insert in buffer: Resulting text is put into the contributing
1531	      participant's buffer in the receiving participant's mixer.

1533	      Maintain a display counter: For each text character that will take
1534	      a position on the receiving display, the Display Counter for the
1535	      sending participant is increased by one.

1537	      There is one T.140 real-time text item, CRLF, that consists of two
1538	      characters but is regarded as a unit and therefore increases the
1539	      Display Counter by one only.

1541	      Furthermore, the following control codes are regarded units that
1542	      shall not take any position on the receiving display and shall
1543	      therefore not increase the Display Counter:

1545	      0098 string 009C (SOS-ST strings)

1547	      ESC 0061 (INT)

1549	      009B Ps 006D (the SGR code, with special handling described below)

1551	      BEL (Alert in session)

1553	      See the section on control codes below for details.

1555	      Combination characters: Also note that it is possible to use
1556	      combination characters in Unicode.  Such combination characters
1557	      contain more than one character part.  They shall only increase
1558	      the Display Counter by one.  The combination characters mainly
1559	      have components in the series 0300 - 0361 and 20D0 - 20E1.

1561	      Erasure: If the control code for erasure, BS, is received, the
1562	      following shall be done: If the Display Counter is 0, an Erasure
1563	      Replacement character, by default "X", is inserted in the buffer
1564	      instead of the erasure, to mark that erasure was intended in
1565	      earlier transmitted entries.  (This matches traditional habits in
1566	      real-time text, where participants sometimes type XXX to indicate
1567	      erasure they do not bother to make explicit.)  If the Display
1568	      Counter is >0, then the counter is reduced by one, and the erasure
1569	      control code BS is put into the buffer.
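
	      The reception steps described so far (BOM deletion, buffer
	      insertion, Display Counter maintenance and erasure) can be
	      sketched in Python as follows.  The names receive and
	      is_combining are hypothetical.  The sketch assumes that an
	      inserted Erasure Replacement does not raise the Display Counter,
	      which the text does not state, and it leaves out the SGR, SOS-ST,
	      INT and BEL control codes.

```python
BOM = "\ufeff"
BS = "\u0008"
ERASURE_REPLACEMENT = "X"  # default Erasure Replacement from A.3


def is_combining(ch):
    """True for the combining ranges named above (0300-0361, 20D0-20E1)."""
    cp = ord(ch)
    return 0x0300 <= cp <= 0x0361 or 0x20D0 <= cp <= 0x20E1


def receive(buffer, display_counter, text):
    """Store received text in a participant's buffer (a list of items).

    Returns the updated Display Counter.  SGR, SOS-ST, INT and BEL are
    not handled in this sketch.
    """
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == BOM:
            pass  # BOM characters are deleted
        elif ch == BS:
            if display_counter == 0:
                # Assumption: the replacement does not raise the counter,
                # so repeated BS yields "XXX".
                buffer.append(ERASURE_REPLACEMENT)
            else:
                display_counter -= 1
                buffer.append(BS)
        elif text[i:i + 2] == "\r\n":
            buffer.append("\r\n")  # CRLF counts as one unit
            display_counter += 1
            i += 1  # skip the LF
        elif is_combining(ch):
            buffer.append(ch)  # combining part, no new display position
        else:
            buffer.append(ch)
            display_counter += 1
        i += 1
    return display_counter
```

	      With an empty buffer and a counter of 0, receiving "ab" followed
	      by three BS codes leaves the counter at 0 and stores a, b, BS, BS
	      and X in the buffer.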

1571	      Initial action in the session: BOM shall be sent to the recipients
1572	      at the beginning of the session.

1574	      Maintaining a waiting time per participant: The time that text has
1575	      been in the buffer is maintained as the waiting time for each
1576	      buffer.  A granularity of 0.3 seconds is sufficient.

1578	      Storing time of reception for each character: Each character that
1579	      is stored in a buffer shall be assigned with a time stamp
1580	      indicating its time of reception.  A granularity of 0.3 seconds is
1581	      sufficient.  This time stamp is used for calculation of idle time
1582	      and waiting time in the evaluation of switch reasons.

1584	      Initial assignment of the Current Participant: The first
1585	      contributing participant to send text in the session is assigned
1586	      to be the Current Participant.

1588	      Actions on assignment of a Current Participant: When a participant
1589	      becomes the Current Participant, the following initial actions
1590	      shall be performed:

1592	      1.  Scanning transmissions and timers for a Switch Reason is
1593	      inactivated.

1595	      2.  The Current Recipients are set so that all transmissions go to
1596	      the new set of Current Recipients (See definition).

1598	      3.  A Line Separator is transmitted if the switch reason was any
1599	      other than a message delimiter.

1601	      4.  The Label is transmitted.
1602	      5.  Any stored SGR code is transmitted.

1604	      6.  Scanning transmissions and timers for a Switch Reason is
1605	      activated.

1607	      7.  Text in the buffer is transmitted, recalculating and setting
1608	      the waiting time for each transmitted character based on the time
1609	      of reception of the next character in the buffer.  If a switch
1610	      occurs during transmission from the buffer, the remaining buffer
1611	      contents are maintained and transmission can continue the next
1612	      time this transmitter becomes the current participant.  Any text
1613	      entered into the buffer for the current participant is after that
1614	      sent to the recipient until a Switch Reason occurs.
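
	      Steps 3 to 5 above determine what the recipient sees just before
	      the new Current Participant's text.  A minimal Python sketch,
	      assuming a hypothetical function start_of_turn and a switch
	      reason passed as a plain string:

```python
LINE_SEPARATOR = "\u2028"


def start_of_turn(label, switch_reason, stored_sgr=""):
    """Text sent before the new Current Participant's buffer contents.

    switch_reason is a hypothetical string naming the reason for the switch.
    """
    out = ""
    if switch_reason != "message delimiter":
        out += LINE_SEPARATOR  # step 3: start a new line unless one was sent
    out += label               # step 4: the Label, delimiters included
    out += stored_sgr          # step 5: restore the participant's rendition
    return out


if __name__ == "__main__":
    print(repr(start_of_turn("[Bob] ", "sentence delimiter")))
```

	      The buffered text of step 7 would follow directly after the
	      returned string.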

1616	      Actions on transmission and during the session: Transmissions are
1617	      checked for control codes to act on at transmission, as described
1618	      below in the section about handling of control codes, and such
1619	      actions are performed.  When the scanning of transmissions and
1620	      timers for a Switch Reason is active, the timers and the
1621	      transmission to the recipient are analyzed to detect whether a
1622	      Switch Reason has occurred.  See the definition of Switch Reasons
1623	      for details.
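
	      One possible reading of the Switch Reason definition is sketched
	      below in Python.  The function switch_reason and its parameters
	      are hypothetical.  The sketch assumes that the Pending Period is
	      measured on the current participant's idle time and that the
	      Maximum waiting time is measured on the longest-waiting other
	      participant; both are interpretations, not statements from the
	      text.

```python
MESSAGE_DELIMITERS = ("\u2028", "\u2029", "\r\n", "\r", "\n")
SENTENCE_DELIMITERS = (". ", "? ", "! ")
WORD_DELIMITER = " "
PENDING_PERIOD = 7.0      # seconds, default
MAX_WAITING_TIME = 20.0   # seconds, default
TIME_EXTENSION = 7.0      # seconds, default


def switch_reason(sent, current_wait, current_idle, longest_other_wait):
    """Return the name of the Switch Reason that applies, or None.

    sent: text transmitted so far for the current participant
    current_wait: waiting time of the current participant's buffer
    current_idle: seconds since the current participant's latest character
    longest_other_wait: longest waiting time among the other participants
    """
    if longest_other_wait <= current_wait:
        return None  # nobody else has waited longer
    if sent.endswith(MESSAGE_DELIMITERS):
        return "message delimiter"
    if sent.endswith(SENTENCE_DELIMITERS):
        return "sentence delimiter"
    if current_idle >= PENDING_PERIOD:
        return "pending period expired"
    if longest_other_wait >= MAX_WAITING_TIME:
        if sent.endswith(WORD_DELIMITER):
            return "maximum waiting time, word delimiter"
        if longest_other_wait >= MAX_WAITING_TIME + TIME_EXTENSION:
            return "maximum waiting time, time extension expired"
    return None
```

	      The checks are ordered so that the least disruptive switch points
	      (message and sentence borders) are preferred over a switch in the
	      middle of a sentence.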

1625	      Actions when a Switch Reason has occurred: If a Switch Reason has
1626	      occurred, then the following actions shall be performed:

1628	      1.  The Display Counter of the Current Participant is set to zero

1630	      2.  If there is an SGR code stored for the Current Participant, a
1631	      reset of SGR shall be sent by the sequence SGR 0 [009B 0030 006D].

1633	      3.  A participant with the longest waiting time is assigned to be
1634	      the Current Participant, and the procedure for assignment of a
1635	      Current Participant described above is performed.
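
	      The three actions above can be sketched in Python as follows.
	      The function switch_current and its dictionary arguments are
	      hypothetical.  The sketch encodes SGR 0 as CSI, the digit 0
	      (U+0030) and 'm', following the ISO 6429 parameter syntax, and
	      it assumes that waiting times are kept only for participants
	      with text in their buffers.

```python
SGR_RESET = "\u009b0m"  # SGR 0: CSI, digit zero, 'm'


def switch_current(current, waiting_times, stored_sgr, display_counters):
    """Apply the three switch actions.

    waiting_times: participant -> waiting time in seconds, for
    participants that have text in their buffers.
    Returns (text_to_send, new_current).
    """
    display_counters[current] = 0                            # action 1
    out = SGR_RESET if stored_sgr.get(current) else ""       # action 2
    new_current = max(waiting_times, key=waiting_times.get)  # action 3
    return out, new_current
```

	      The procedure for assignment of a Current Participant then runs
	      for the returned participant.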

1637	      Handling of Control codes: The following control codes are
1638	      specified by ITU-T T.140.  Some of them require consideration in
1639	      the conference server.  Note that the codes presented here are
1640	      expressed in UCS-16, while transmission is made in UTF-8 transform
1641	      of these codes.  Other sections specify procedures for handling of
1642	      specific control codes in the conference server.

1644	      BEL 0007 Bell, provides for alerting during an active session.

1646	      BS 0008 Back Space, erases the last entered character.

1648	      NEW LINE 2028 Line separator.

1650	      CR LF 000D 000A A supported, but not preferred way of requesting a
1651	      new line.

1653	      INT ESC 0061 Interrupt (used to initiate mode negotiation
1654	      procedure).

1656	      SGR 009B Ps 006D Select graphic rendition.  Ps is rendition
1657	      parameters specified in ISO 6429.

1659	      SOS 0098 Start of string, used as a general protocol element
1660	      introducer, followed by a maximum 256 bytes string.

1662	      ST 009C String terminator, end of SOS string.

1664	      ESC 001B Escape - used in control strings.

1666	      Byte order mark FEFF Zero width, no break space, used for
1667	      synchronization.

1669	      Missing text mark FFFD Replacement character, marks place in
1670	      stream of possible text loss.

1672	      Code for message border, useful, but not mentioned in T.140: New
1673	      Message 2029 Paragraph separator

1675	      Handling of Graphic Rendition SGR: The following procedure shall
1676	      be followed in order to let the participants control the graphic
1677	      rendition of their entries without disturbing other participants'
1678	      graphic rendition.  The text stream sent to a recipient shall be
1679	      monitored for the SGR sequence.  The latest conveyed SGR sequence
1680	      is also stored as a status variable for the recipient.  If the SGR
1681	      0 code initiated from the current participant is transmitted, the
1682	      SGR storage shall be cleared.
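
	      A minimal Python sketch of the SGR monitoring described above
	      follows.  The function track_sgr is hypothetical.  The sketch
	      assumes that Ps consists of digits and semicolons and that an
	      empty Ps is equivalent to SGR 0.

```python
import re

# CSI (009B), parameter string Ps, final character 'm' (006D)
SGR_PATTERN = re.compile("\u009b([0-9;]*)m")


def track_sgr(stored, transmitted):
    """Return the SGR sequence to store after `transmitted` has been sent.

    stored: the SGR sequence stored so far, or "" if none.
    """
    for match in SGR_PATTERN.finditer(transmitted):
        params = match.group(1)
        # SGR 0 clears the storage; any other SGR replaces it.
        stored = "" if params in ("", "0") else match.group(0)
    return stored
```

	      The stored value is what would be sent again when the participant
	      next becomes the Current Participant, and its presence decides
	      whether an SGR reset is sent at a switch.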

1684	A.5.  Display examples

1686	   The following pictures are examples of the view on a participant's
1687	   display.

1689	     _________________________________________________
1690	    |       Conference       |          Alice          |
1691	    |________________________|_________________________|
1692	    |                        |I will arrive by TGV.    |
1693	    |[Bob]:My flight is to   |Convenient to the main   |
1694	    |Orly.                   |station.                 |
1695	    |[Eve]:Hi all, can we    |                         |
1696	    |plan for the seminar.   |                         |
1697	    |                        |                         |
1698	    |[Bob]:Eve, will you do  |                         |
1699	    |your presentation on    |                         |
1700	    |Friday?                 |                         |
1701	    |[Eve]:Yes, Friday at 10.|                         |
1702	    |[Bob]: Fine, wo         |We need to meet befo     |
1703	    |________________________|_________________________|

1705	   Figure A1: Alice, who has a conference-unaware client, receives the
1706	   multi-party real-time text in a single stream.  This figure shows
1707	   how a coordinated column view MAY be presented on Alice's device.

1709	                 _________________________________________________
1710	                |                                              |^|
1711	                |[mix][Alice] Hi, Alice here.                  | |
1712	                |                                              | |
1713	                |[mix][Bob] Bob as well.                       | |
1714	                |                                              | |
1715	                |[mix][Eve] Hi, this is Eve, calling from Paris| |
1716	                |      I thought you should be here.           | |
1717	                |                                              | |
1718	                |[Alice] I am coming on Thursday, my           | |
1719	                |      performance is not until Friday morning.| |
1720	                |                                              | |
1721	                |[mix][Bob] And I on Wednesday evening.        | |
1722	                |                                              | |
1723	                |[mix][Eve] we can have dinner and then walk   | |
1724	                |                                              | |
1725	                |[mix][Eve] But I need to be back to           | |
1726	                |    the hotel by 11 because I need            |-|
1727	                |                                              |-|
1728	                |______________________________________________|v|
1729	                | of course, I underst                           |
1730	                |________________________________________________|

1732	   Figure A2 shows a conference view with real-time text preview.  Bob's
1733	   text is buffered until a Switch Reason occurs.

1735	A.6.  References for this Appendix

1737	      [T.140] ITU-T T.140 Application protocol, text conversation
1738	      (including amendment 1.)

1740	      [RFC 4103] IETF RFC 4103 RTP Payload for text conversation

1742	      [RTP] IETF RFC 3550 RTP: A Transport Protocol for Real-Time
1743	      Applications.

1745	      [RFC 4579] IETF RFC 4579 SIP Call Control - Conferencing for user
1746	      agents.

1748	      [ISO 6429] ISO 6429 Control functions for coded character sets.

1750	      [UTF-8] IETF RFC 3629 UTF-8, a transformation format of ISO 10646

1752	      [Unicode] The Unicode Consortium, "The Unicode Standard - Version
1753	      4.0"

1755	      [ISO 10646-1] ISO 10646 Universal multiple-octet coded character
1756	      set (UCS)

1758	      [UCS-16] See ISO 10646-1

1760	A.7.  Acknowledgement for the appendix

1762	   This appendix was developed with funding in part from the National
1763	   Institute on Disability and Rehabilitation Research, U.S.  Department
1764	   of Education, RERC on Telecommunications Access, grant # H133E090001.
1765	   However, the contents do not necessarily represent the policy of the
1766	   Department of Education, and you should not assume endorsement by the
1767	   Federal Government.

1769	Author's Address

1771	   Gunnar Hellstrom
1772	   Omnitor
1773	   Esplanaden 30
1774	   Vendelso  SE-136 70
1775	   SE

1777	   Phone: +46 708 204 288
1778	   Email: gunnar.hellstrom@omnitor.se