idnits 2.17.1 

draft-hellstrom-avtcore-multi-party-rtt-solutions-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (8 August 2020) is 1356 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'ICE' is mentioned on line 1039, but not defined

  == Unused Reference: 'RFC3264' is defined on line 1960, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-20) exists of
     draft-ietf-avtcore-multi-party-rtt-mix-06


     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                             G. Hellstrom
3	Internet-Draft                 Gunnar Hellstrom Accessible Communication
4	Intended status: Informational                             8 August 2020
5	Expires: 9 February 2021

7	           Real-time text solutions for multi-party sessions
8	          draft-hellstrom-avtcore-multi-party-rtt-solutions-03

10	Abstract

12	   This document specifies methods for Real-Time Text (RTT) media
13	   handling in multi-party calls.  The main discussed transport is to
14	   carry Real-Time text by the RTP protocol in a time-sampled mode
15	   according to RFC 4103.  The mechanisms enable the receiving
16	   application to present the received real-time text media, separated
17	   per source, in different ways according to user preferences.  Some
18	   presentation related features are also described explaining suitable
19	   variations of transmission and presentation of text.

21	   Call control features are described for the SIP environment.  A
22	   number of alternative methods for providing the multi-party
23	   negotiation, transmission and presentation are discussed and a
24	   recommendation for the main ones is provided.  The main solution for
25	   SIP based centralized multi-party handling of real-time text is
26	   achieved through a media control unit coordinating multiple RTP text
27	   streams into one RTP stream.

29	   Alternative methods using a single RTP stream and source
30	   identification inline in the text stream are also described, one of
31	   them being provided as a lower functionality fallback method for
32	   endpoints with no multi-party awareness for RTT.

34	   Bridging methods where the text stream is carried without the
35	   contents being dealt with in detail by the bridge are also discussed.

37	   Brief information is also provided for multi-party RTT in the WebRTC
38	   environment.

40	   The intention is to provide background for decisions, specification
41	   and implementation of selected methods.

43	Status of This Memo

45	   This Internet-Draft is submitted in full conformance with the
46	   provisions of BCP 78 and BCP 79.

48	   Internet-Drafts are working documents of the Internet Engineering
49	   Task Force (IETF).  Note that other groups may also distribute
50	   working documents as Internet-Drafts.  The list of current Internet-
51	   Drafts is at https://datatracker.ietf.org/drafts/current/.

53	   Internet-Drafts are draft documents valid for a maximum of six months
54	   and may be updated, replaced, or obsoleted by other documents at any
55	   time.  It is inappropriate to use Internet-Drafts as reference
56	   material or to cite them other than as "work in progress."

58	   This Internet-Draft will expire on 9 February 2021.

60	Copyright Notice

62	   Copyright (c) 2020 IETF Trust and the persons identified as the
63	   document authors.  All rights reserved.

65	   This document is subject to BCP 78 and the IETF Trust's Legal
66	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
67	   license-info) in effect on the date of publication of this document.
68	   Please review these documents carefully, as they describe your rights
69	   and restrictions with respect to this document.  Code Components
70	   extracted from this document must include Simplified BSD License text
71	   as described in Section 4.e of the Trust Legal Provisions and are
72	   provided without warranty as described in the Simplified BSD License.

74	Table of Contents

76	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
77	     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   5
78	   2.  Centralized conference model  . . . . . . . . . . . . . . . .   5
79	   3.  Requirements on multi-party RTT . . . . . . . . . . . . . . .   6
80	     3.1.  General requirements  . . . . . . . . . . . . . . . . . .   6
81	     3.2.  Performance requirements  . . . . . . . . . . . . . . . .   7
82	   4.  RTP based solutions . . . . . . . . . . . . . . . . . . . . .   8
83	     4.1.  Coordination of text RTP streams  . . . . . . . . . . . .   8
84	       4.1.1.  RTP-based solutions with a central mixer  . . . . . .   8
85	         4.1.1.1.  RTP Mixer using default RFC 4103 methods  . . . .   8
86	         4.1.1.2.  RTP Mixer using the default method but decreased
87	                 transmission interval . . . . . . . . . . . . . . .   9
88	         4.1.1.3.  RTP Mixer with frequent transmission and indicating
89	                 sources in CSRC-list  . . . . . . . . . . . . . . .  10
90	         4.1.1.4.  RTP Mixer using timestamp to identify
91	                 redundancy  . . . . . . . . . . . . . . . . . . . .  11
92	         4.1.1.5.  RTP Mixer with multiple primary data in each packet
93	                 and individual sequence numbers . . . . . . . . . .  12
94	         4.1.1.6.  RTP Mixer with multiple primary data in each
95	                 packet  . . . . . . . . . . . . . . . . . . . . . .  13

97	         4.1.1.7.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
98	                 in the packets  . . . . . . . . . . . . . . . . . .  14
99	         4.1.1.8.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
100	                 and separate sequence number in the packets . . . .  16
101	         4.1.1.9.  RTP Mixer indicating participants by a control code
102	                 in the stream . . . . . . . . . . . . . . . . . . .  18
103	         4.1.1.10. Mixing for multi-party unaware user agents  . . .  20
104	       4.1.2.  RTP-based bridging with minor RTT media contents
105	               reformatting by the bridge  . . . . . . . . . . . . .  21
106	         4.1.2.1.  RTP Translator sending one RTT stream per
107	                 participant . . . . . . . . . . . . . . . . . . . .  21
108	         4.1.2.2.  Distributing packets in an end-to-end encryption
109	                 structure . . . . . . . . . . . . . . . . . . . . .  24
110	         4.1.2.3.  Mesh of RTP endpoints . . . . . . . . . . . . . .  25
111	         4.1.2.4.  Multiple RTP sessions, one for each
112	                 participant . . . . . . . . . . . . . . . . . . . .  25
113	   5.  Preferred RTP-based multi-party RTT transport method  . . . .  26
114	   6.  Session control of RTP-based multi-party RTT sessions . . . .  26
115	     6.1.  Implicit RTT multi-party capability indication  . . . . .  27
116	     6.2.  RTT multi-party capability declared by SIP media-tags . .  28
117	     6.3.  SDP media attribute for RTT multi-party capability
118	           indication  . . . . . . . . . . . . . . . . . . . . . . .  29
119	     6.4.  Simplified SDP media attribute for RTT multi-party
120	           capability indication . . . . . . . . . . . . . . . . . .  31
121	     6.5.  SDP format parameter for RTT multi-party capability
122	           indication  . . . . . . . . . . . . . . . . . . . . . . .  31
123	     6.6.  A text media subtype for support of multi-party rtt . . .  33
124	     6.7.  Preferred capability declaration method for RTP-based
125	           transport.  . . . . . . . . . . . . . . . . . . . . . . .  33
126	     6.8.  Identification of the source of text for RTP-based
127	           solutions . . . . . . . . . . . . . . . . . . . . . . . .  33
128	   7.  RTT bridging in WebRTC  . . . . . . . . . . . . . . . . . . .  34
129	     7.1.  RTT bridging in WebRTC with one data channel per
130	           source  . . . . . . . . . . . . . . . . . . . . . . . . .  34
131	     7.2.  RTT bridging in WebRTC with one common data channel . . .  35
132	     7.3.  Preferred rtt multi-party method for WebRTC . . . . . . .  35
133	   8.  Presentation of multi-party text  . . . . . . . . . . . . . .  36
134	     8.1.  Associating identities with text streams  . . . . . . . .  36
135	     8.2.  Presentation details for multi-party aware endpoints. . .  36
136	       8.2.1.  Bubble style presentation . . . . . . . . . . . . . .  37
137	       8.2.2.  Other presentation styles . . . . . . . . . . . . . .  38
138	   9.  Presentation details for multi-party unaware endpoints. . . .  39
139	   10. Security Considerations . . . . . . . . . . . . . . . . . . .  39
140	   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  39
141	   12. Congestion considerations . . . . . . . . . . . . . . . . . .  40
142	   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  40
143	   14. Change history  . . . . . . . . . . . . . . . . . . . . . . .  40
144	     14.1.  Changes to
145	            draft-hellstrom-avtcore-multi-party-rtt-solutions-03 . .  40
146	     14.2.  Changes to
147	            draft-hellstrom-avtcore-multi-party-rtt-solutions-02 . .  40
148	     14.3.  Changes to
149	            draft-hellstrom-avtcore-multi-party-rtt-solutions-01 . .  40
150	     14.4.  Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to
151	            draft-hellstrom-avtcore-multi-party-rtt-solutions-00 . .  41
152	     14.5.  Changes from version
153	            draft-hellstrom-mmusic-multi-party-rtt-01 to -02 . . . .  41
154	   15. References  . . . . . . . . . . . . . . . . . . . . . . . . .  41
155	     15.1.  Normative References . . . . . . . . . . . . . . . . . .  41
156	     15.2.  Informative References . . . . . . . . . . . . . . . . .  42
157	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  45

159	1.  Introduction

161	   Real-time text (RTT) is a medium in real-time conversational
162	   sessions.  Text entered by participants in a session is transmitted
163	   in a time-sampled fashion, so that no specific user action is needed
164	   to cause transmission.  This gives a direct flow of text at the rate
165	   it is created, which is suitable in a real-time conversational
166	   setting.  The real-time text medium can be combined with other media
167	   in multimedia sessions.

169	   Media from a number of multimedia session participants can be
170	   combined in a multi-party session.  The present document specifies
171	   how the real-time text streams can be handled in multi-party
172	   sessions.  Recommendations are provided for preferred methods.

174	   The description is mainly focused on the transport level, but also
175	   describes a few session and presentation level aspects.

177	   Transport of real-time text is specified in RFC 4103 [RFC4103] RTP
178	   Payload for text conversation.  It makes use of RFC 3550 [RFC3550]
179	   Real-time Transport Protocol (RTP).  Robustness against network
180	   transmission problems is normally achieved through redundant
181	   transmission based on the principle from RFC 2198 [RFC2198], with one
182	   primary and two redundant transmissions of each text element.  Primary
183	   and redundant transmissions are combined in packets and described by
184	   a redundancy header.  This transport is usually used in the Session
185	   Initiation Protocol (SIP) RFC 3261 [RFC3261] environment.
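
   The redundancy principle described above can be sketched as follows.
   This is only an illustration, assuming the default of one primary
   and two redundant generations; the function name and the
   dictionary layout are hypothetical and do not represent the RFC
   4103 wire format.

```python
def build_packets(chunks, generations=2):
    """Return one packet per text chunk.

    Each packet carries the newest chunk as primary data plus up to
    `generations` earlier chunks as redundant data (newest first).
    The dictionary layout is illustrative only, not the wire format.
    """
    packets = []
    for i, primary in enumerate(chunks):
        # Indices i-1, i-2, ... limited by the number of generations
        # and by the start of the stream.
        oldest = max(i - 1 - generations, -1)
        redundant = [chunks[j] for j in range(i - 1, oldest, -1)]
        packets.append({"seq": i, "primary": primary, "redundant": redundant})
    return packets


packets = build_packets(["H", "e", "l", "lo"])
```

   With this layout, every text element is sent three times in total,
   so the loss of up to two consecutive packets can be repaired from
   the next packet that arrives.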

187	   A very brief overview of functions for real-time text handling in
188	   multi-party sessions is described in RFC 4597 [RFC4597] Conferencing
189	   Scenarios, sections 4.8 and 4.10.  The present specification builds
190	   on that description and indicates which protocol mechanisms should be
191	   used to implement multi-party handling of real-time text.

193	   Real-time text can also be transported in the WebRTC environment, by
194	   using WebRTC data channels according to
195	   [I-D.ietf-mmusic-t140-usage-data-channel].  Multi-party aspects for
196	   WebRTC solutions are briefly covered.

198	1.1.  Requirements Language

200	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
201	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
202	   document are to be interpreted as described in RFC 2119 [RFC2119].

204	2.  Centralized conference model

206	   In the centralized conference model for SIP, introduced in RFC 4353
207	   [RFC4353] "A Framework for Conferencing with the Session Initiation
208	   Protocol (SIP)", one function co-ordinates the communication with
209	   participants in the multi-party session.  This function also controls
210	   media mixer functions for the media appearing in the session.  The
211	   central function is common for control of all media, while the media
212	   mixers may work differently for each medium.

214	   The central function is called the Focus UA.  Many variants exist for
215	   setting up sessions including the multipoint control centre.  It is
216	   not within scope of this description to describe these, but rather
217	   the media specific handling in the mixer required to handle multi-
218	   party calls with RTT.

220	   The main principle for handling real-time text media in a centralized
221	   conference is that one RTP session for real-time text is established
222	   including the multipoint media control centre and the participating
223	   endpoints which are going to have real-time text exchange with the
224	   others.

226	   The different possible mechanisms for mixing and transporting RTT
227	   differ in the way they multiplex the text streams and how they
228	   identify the sources of the streams.  RFC 7667 [RFC7667] describes a
229	   number of possible use cases for RTP.  This specification refers to
230	   different sections of RFC 7667 for further reading of the situations
231	   caused by the different possible design choices.

233	   The recommended method for using RTP based RTT in a centralized
234	   conference model is specified in
235	   [I-D.ietf-avtcore-multi-party-rtt-mix] based on the recommendations
236	   in this document.

238	   Real-time text can also be transported in the WebRTC environment, by
239	   using WebRTC data channels according to
240	   [I-D.ietf-mmusic-t140-usage-data-channel].  Ways to handle multi-
241	   party calls in that environment are also specified.

243	3.  Requirements on multi-party RTT

245	3.1.  General requirements

247	   The following general requirements are placed on multi-party RTT:

249	      A solution shall be applicable to IMS (3GPP TS 22.173)[TS22173],
250	      SIP based VoIP and Next Generation Emergency Services (NENA i3
251	      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443[RFC6443]).

253	      The transmission interval for text should not be longer than 500
254	      milliseconds when there is anything available to send.  Ref ITU-T
255	      T.140 [T140].

257	      If text loss is detected or suspected, a missing text marker
258	      should be inserted in the text stream.  Ref ITU-T T.140 Amendment
259	      1 [T140ad1].  ETSI EN 301 549 [EN301549]

261	      The display of text from the members of the conversation shall be
262	      arranged so that the text from each participant is clearly
263	      readable, and its source and the relative timing of entered text
264	      is visualized in the display.  Mechanisms for looking back in the
265	      contents from the current session should be provided.  The text
266	      should be displayed as soon as it is received.  Ref ITU-T T.140
267	      [T140]

269	      Bridges must be multimedia capable (voice, video, text).  Ref NENA
270	      i3 STA-010.2.  [NENAi3]

272	      It MUST be possible to use real-time text in conferences both as a
273	      medium of discussion between individual participants (for example,
274	      for sidebar discussions in real-time text while listening to the
275	      main conference audio) and for central support of the conference
276	      with real-time text interpretation of speech.  Ref (R7) in RFC
277	      5194.[RFC5194]

279	      It should be possible to protect RTT contents with usual means for
280	      privacy and integrity.  Ref RFC 6881 section 16.  [RFC6881]

282	      Conferencing procedures are documented in RFC 4579 [RFC4579].  Ref
283	      NENA i3 STA-010.2.[NENAi3]
284	      Conferencing applies to any kind of media stream by which users
285	      may want to communicate.  Ref 3GPP TS 24.147 [TS24147]

287	      The framework for SIP conferences is specified in RFC 4353
288	      [RFC4353].  Ref 3GPP TS 24.147 [TS24147]

290	3.2.  Performance requirements

292	   The mixer performance requirements can be expressed in one number,
293	   extracted from the user requirements on real-time text expressed in
294	   ITU-T F.700, where it is stated that for "good" usability, text
295	   characters should not be delayed more than 1 second from creation to
296	   presentation.  For "usable" usability the figure is 2 seconds.  The
297	   main factor behind these limits is that turn-taking in a
298	   conversation is disturbed when a response is delayed in becoming
299	   visible to the receiving party.  If that time gets too long, the
300	   receiving party becomes unsure whether the previous utterance was
301	   well perceived and may prepare to repeat it.  This is similar to
302	   the corresponding effect in voice communication, where the
303	   usability limit is a delay of 400 ms.

305	   Another important factor in a multi-party conference is the
306	   opportunity for a participant using real-time text to provide timely
307	   comments and get a chance to enter the discussion if the majority of
308	   participants use voice in the conference.  A complicating factor when
309	   stating the requirements is that some transport methods do not cause
310	   a total delay, but instead an increasing jerkiness when the number of
311	   simultaneously sending participants is increased.

313	   It should however be remembered that the expected number of
314	   participants sending real-time text simultaneously is low.  Just as
315	   with voice or sign language, the capability of the participants to
316	   perceive utterances from more than one participant at a time is very
317	   limited.  Therefore the normal case in multi-party situations is that
318	   one participant at a time is the main provider of text.  Others might
319	   usually just provide very brief comments such as "yes" or "no" or
320	   "may I comment?".  Only in very rare situations do two participants
321	   provide more information simultaneously.

323	   *  The number of expected simultaneously transmitting users is
324	      different for different applications.  In all cases, just one
325	      transmitting user is the normal case.  Two simultaneously
326	      transmitting participants can occasionally be expected in
327	      emergency services, relay services, small unmanaged conferences
328	      and group calls and large managed conferences.  Three
329	      simultaneously transmitting participants may appear occasionally
330	      in large unmanaged conferences.  The following can therefore
331	      express the performance requirement.

333	   *  The mean delay of text passing the mixer introduced when only one
334	      participant is sending text should be kept to a minimum and should
335	      not be more than 400 ms.

337	   *  The mean delay of text passing the mixer should not be more than 1
338	      second during moments when up to three users are sending text
339	      simultaneously.

341	   *  For the very rare case that more than three participants send text
342	      simultaneously, the mixer may take action to limit the introduced
343	      delay of the text passing the mixer to 7 seconds e.g. by
344	      discarding text from some participants and instead inserting a
345	      general warning about possible text loss in the stream.
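
   The delay limits in the list above can be summarised in a small
   check.  This is a sketch only; the function names are hypothetical,
   and the 7 second figure is treated here as a hard cap although the
   text only says that the mixer may act to enforce it.

```python
def mixer_delay_limit_ms(simultaneous_senders):
    """Return the delay limit in ms for the given number of senders."""
    if simultaneous_senders < 1:
        raise ValueError("at least one sender is required")
    if simultaneous_senders == 1:
        return 400    # single sender: mean delay at most 400 ms
    if simultaneous_senders <= 3:
        return 1000   # up to three senders: mean delay at most 1 second
    return 7000       # rare case: mixer may cap the delay at 7 seconds


def meets_requirement(mean_delay_ms, simultaneous_senders):
    """True if the mean mixer delay is within the limit."""
    return mean_delay_ms <= mixer_delay_limit_ms(simultaneous_senders)
```

   For example, meets_requirement(500, 3) is true, while
   meets_requirement(900, 1) is false.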

347	4.  RTP based solutions

349	4.1.  Coordination of text RTP streams

351	   Coordinating and sending text RTP streams in the multi-party session
352	   can be done in a number of ways.  The most suitable methods are
353	   specified here with pros and cons.

355	   A receiving and presenting endpoint MUST separate text from the
356	   different sources and identify and display them accordingly.

358	4.1.1.  RTP-based solutions with a central mixer

360	   A set of solutions can be based on the central RTP mixer.  They are
361	   described here and a preferred method selected.

363	4.1.1.1.  RTP Mixer using default RFC 4103 methods

365	   Without any extra specifications, a mixer would transmit with 300
366	   milliseconds intervals, and use RFC 4103 [RFC4103] with the default
367	   redundancy of one original and two redundant transmissions.  The
368	   source of the text would be indicated by a single member in the CSRC
369	   list.  Text from different sources cannot be transmitted in the same
370	   packet.  Therefore, from the time when the mixer sent one piece of
371	   new text from one source, it will need to transmit that text again
372	   twice as redundant data, before it can send text from another source.
373	   The jerkiness (the time between transmissions of new text) is 900
374	   ms.  This is clearly insufficient.
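
   The 900 ms figure follows from the transmission interval and the
   number of redundant generations.  A minimal sketch of the
   arithmetic, assuming that the mixer must complete all
   transmissions of one text element before switching source (the
   function name is hypothetical):

```python
def jerkiness_ms(interval_ms, redundant_generations=2):
    """Time between transmissions of new text when sources alternate.

    One primary transmission plus `redundant_generations` redundant
    transmissions must be sent before the mixer can switch source.
    """
    return interval_ms * (1 + redundant_generations)


default_case = jerkiness_ms(300)   # default RFC 4103 interval
reduced_case = jerkiness_ms(100)   # interval reduced to 100 ms
```

   With the default interval of 300 ms this gives 900 ms; with an
   interval of 100 ms it gives 300 ms.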

376	   Pros:

378	   Only a capability negotiation method is needed.  No other updates of
379	   standards are needed, just a general remark that traditional RTP-
380	   mixing is used.

382	   Cons:

384	   Clearly insufficient mixer switching performance.

386	   Somewhat complex handling of transmission when there is new text
387	   available from more than one source.  The mixer needs to send two
388	   more packets with redundant text from the current source before
389	   starting to send anything from the other source.

391	4.1.1.2.  RTP Mixer using the default method but decreased transmission
392	          interval

394	   This method makes use of the default RTP-mixing method briefly
395	   described in Section 4.1.1.1.  The only difference is that the
396	   transmission interval is decreased to 100 milliseconds when there is
397	   text from more than one source available for transmission.  The
398	   jerkiness is 300 ms.  The mean delay with two simultaneously sending
399	   participants is 250 ms, and with three simultaneously sending
400	   participants 500 ms.  This is acceptable performance.

402	   Pros:

404	   Minor influence on standards

406	   Can be introduced relatively rapidly in the intended technical
407	   environments.

409	   Can be declared in SDP as the already existing "text/red" format with
410	   a multi-party attribute for capability negotiation.

412	   Cons:

414	   The introduced jerkiness of new text from more than the required
415	   three simultaneously sending sources is high.

417	   Slightly higher risk for loss of text at bursty packet loss than for
418	   the recommended transmission interval (300 ms) for RFC 4103.

420	   When complete loss of packets occurs (beyond recovery), it is not
421	   possible to deduce from which source text was lost.

423	   Somewhat complex handling of transmission when there is new text
424	   available from more than one source.  The mixer needs to send two
425	   more packets with redundant text from the current source before
426	   starting to send anything from the other source.

428	4.1.1.3.  RTP Mixer with frequent transmission and indicating sources in
429	          CSRC-list

431	   An RTP media mixer combines text from participants into one RTP
432	   stream, thus all using the same destination address/port combination,
433	   the same RTP SSRC, and one sequence number series as described in
434	   Sections 7.1 and 7.3 of RTP RFC 3550 [RFC3550] about the Mixer
435	   function.  This method is also briefly described in RFC 7667, section
436	   3.6.1 Media mixing mixer [RFC7667].

438	   The sources of the text in each RTP packet are identified by the CSRC
439	   list in the RTP packets, containing the SSRC of the initial sources
440	   of text.  The order of the CSRC parameters is with the SSRC of the
441	   source of the primary text first, followed by the SSRC of the first
442	   level redundancy, and then the second level redundancy.

444	   The transmission interval should be 100 milliseconds when there is
445	   text to transmit from more than one source, and otherwise 300 ms.

447	   The identification of the sources is made through the CSRC fields and
448	   can be made more readable at the receiver through the RTCP SDES CNAME
449	   and NAME packets as described in RTP[RFC3550].

451	   Information provided through the notification according to RFC 4575
452	   [RFC4575] when the participant joined the conference also provides
453	   suitable information and a reference to the SSRC.

455	   A receiving endpoint is supposed to separate text items from the
456	   different sources and identify and display them accordingly.

458	   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
459	   possible to recover from loss of one and two packets in sequence and
460	   assign the recovered text to the right source.  For more loss, a
461	   marker for possible loss should be inserted or presented.
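
   The use of the ordered CSRC list for recovery can be sketched as
   follows.  This is an illustration under stated assumptions, not a
   definitive implementation: packets are modelled as dictionaries,
   they arrive in order, "first level redundancy" is taken to mean
   the most recently sent earlier text, and one missing text marker
   (U+FFFD) is inserted per unrecoverable gap.  All names are
   hypothetical.

```python
MISSING_TEXT_MARKER = "\ufffd"  # T.140 missing text marker


def mix(chunks, generations=2):
    """chunks: list of (ssrc, text).  Returns packets whose CSRC list
    is ordered: primary, first-level, then second-level redundancy."""
    packets = []
    for seq in range(len(chunks)):
        first = max(seq - generations, 0)
        window = [chunks[j] for j in range(seq, first - 1, -1)]
        packets.append({
            "seq": seq,
            "csrc": [ssrc for ssrc, _ in window],
            "texts": [text for _, text in window],
        })
    return packets


def receive(packets, generations=2):
    """Return (ssrc, text) items in order; ssrc None marks lost text."""
    out = []
    expected = 0
    for p in packets:
        gap = p["seq"] - expected
        if gap < 0:
            continue  # duplicate or reordered packet, ignored here
        if gap > generations:
            # More loss than redundancy covers: source is unknown.
            out.append((None, MISSING_TEXT_MARKER))
            gap = generations
        # Recover the oldest missing text first, then the primary.
        for level in range(gap, -1, -1):
            out.append((p["csrc"][level], p["texts"][level]))
        expected = p["seq"] + 1
    return out


chunks = [(0xA, "Hi"), (0xB, "yes"), (0xA, " all"), (0xB, "!")]
sent = mix(chunks)
recovered = receive([sent[0], sent[3]])  # packets 1 and 2 were lost
```

   Here the two lost packets are fully repaired from the fourth
   packet, and each recovered text item is assigned to its source
   through the corresponding CSRC position.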

463	   The conference server needs to have authority to decrypt the payload
464	   in the received RTP packets in order to be able to recover text from
465	   redundant data or insert the missing text marker in the stream, and
466	   repack the text in new packets.

468	   Even if the format is very similar to "text/red" of RFC 4103, it
469	   needs to be declared as a new media subtype, e.g. "text/rex".

471	   Pros:

473	   This method has low overhead and less complexity than the methods in
474	   Section 4.1.1.1, Section 4.1.1.2, Section 4.1.1.4 and
475	   Section 4.1.1.6.

477	   When loss of packets occurs, it is possible to recover text from
478	   redundancy at loss of up to the number of redundancy levels carried
479	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
480	   levels).

482	   This method can be implemented with most RTP implementations.

484	   The source switching performance is sufficient for well-behaving
485	   conference participants.  The jerkiness is 100 ms.

487	   Cons:

489	   When more consecutive packets are lost than there are generations of
490	   redundant data, it is not possible to deduce the sources of
491	   the totally lost data.

493	   Slightly higher risk for loss of text at bursty packet loss than for
494	   the recommended transmission interval for RFC 4103.

496	   Requires a different media subtype, e.g. "text/rex".  This takes a
497	   long time in standardisation and in releases of target technical
498	   environments.

500	   The conference server needs to be allowed to decrypt/encrypt the
501	   packet payload.  This is however normal for media mixers for other
502	   media.

504	4.1.1.4.  RTP Mixer using timestamp to identify redundancy

506	   This method has text only from one source per packet, as the original
507	   RFC 4103 [RFC4103] specifies.  Packets with text from different
508	   sources are instead allowed to be merged.  The recovery procedure in
509	   the receiver will use the RTP timestamp and timestamp offsets in the
510	   redundancy headers to evaluate if a piece of redundant data should be
511	   recovered or not in case of packet loss.

513	   In this method, the transmission interval is 100 milliseconds when
514	   text from more than one source is available for transmission.

516	   Pros:

518	   The format of each packet is equal to what is specified in RFC 4103
519	   [RFC4103].

521	   The source switching performance is sufficient.  Text from five
522	   participants can be transmitted simultaneously with 500 milliseconds
523	   interval per source.

525	   New text from five simultaneous sources can be transmitted within 500
526	   milliseconds.  This is sufficient.

528	   Cons:

530	   The recovery time in case of packet loss is long.  With five
531	   simultaneously sending participants, it will be 1.5 seconds.

533	   The recovery procedure is complex and very different from what is
534	   described in RFC 4103 [RFC4103].

536	   It is uncertain whether this change can be regarded as an update to
537	   RFC 4103.  It may need a new media subtype.

539	4.1.1.5.  RTP Mixer with multiple primary data in each packet and
540	          individual sequence numbers

542	   This method allows primary as well as redundant text from more than
543	   one source per packet.  The packet payload contains an ordered set of
544	   redundant and primary data with the same number of generations of
545	   redundancy as once agreed in the SDP negotiation.  The data header
546	   reflects these parts of the payload.  The CSRC list contains one CSRC
547	   member per source in the payload and in the same order.  An
548	   individual sequence number per source is included in the data header
549	   replacing the t140 payload type number that is instead assumed to be
550	   constant in this format.  This allows an individual extra sequence
551	   number per source with maximum value 127, suitable for determining
552	   from which source text was lost when recovery was not possible.

554	   The data header would contain the following fields:
555	    0                   1                   2                   3
556	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
557	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
558	   |F| Source-seq  |  timestamp offset         |   block length    |
559	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
560	   Where "Source-seq" is the sequence number per source.
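
   A minimal sketch of packing and parsing this 32-bit data header is
   given here.  The field widths are read off the diagram above (F: 1
   bit, Source-seq: 7 bits, timestamp offset: 14 bits, block length:
   10 bits); the function names are hypothetical.

```python
import struct


def pack_header(f, source_seq, ts_offset, block_len):
    """Pack one data header into 4 bytes in network byte order."""
    if not (0 <= f <= 1 and 0 <= source_seq <= 127
            and 0 <= ts_offset < (1 << 14)
            and 0 <= block_len < (1 << 10)):
        raise ValueError("field value out of range")
    word = (f << 31) | (source_seq << 24) | (ts_offset << 10) | block_len
    return struct.pack("!I", word)


def unpack_header(data):
    """Return (f, source_seq, ts_offset, block_len) from 4 bytes."""
    (word,) = struct.unpack("!I", data[:4])
    return (word >> 31,
            (word >> 24) & 0x7F,
            (word >> 10) & 0x3FFF,
            word & 0x3FF)


header = pack_header(1, 5, 300, 12)
fields = unpack_header(header)
```

   The 7-bit Source-seq field is what limits the per-source sequence
   number to a maximum value of 127.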

562	   The maximum number of members in the CSRC-list is 15, and that is
563	   therefore the maximum number of sources that can be represented in
564	   each packet provided that all data can be fitted into the size
565	   allowable in one packet.

567	   Transmission is done as soon as there is new text available, but not
568	   with shorter interval than 150 ms and not longer than 300 ms while
569	   there is anything to send.
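
   One possible reading of this timing rule is sketched here.  It
   assumes that new text is sent as soon as at least 150 ms have
   passed since the previous packet, and that a packet carrying only
   pending redundant data is sent when 300 ms have passed.  The
   function and parameter names are hypothetical.

```python
MIN_INTERVAL_MS = 150  # never send more often than this
MAX_INTERVAL_MS = 300  # never wait longer while data is pending


def should_send(now_ms, last_send_ms, has_new_text, has_redundancy):
    """Decide whether the mixer should transmit a packet now."""
    elapsed = now_ms - last_send_ms
    if has_new_text:
        # New text goes out as soon as the minimum interval allows.
        return elapsed >= MIN_INTERVAL_MS
    if has_redundancy:
        # Only redundant data pending: wait for the longer interval.
        return elapsed >= MAX_INTERVAL_MS
    return False  # nothing to send
```

   For example, with the last packet sent at time 0, new text that
   arrives at 100 ms is held until 150 ms, while remaining redundant
   data alone is sent at 300 ms.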

571	   A new media subtype is needed, e.g. "text/rex".

573	   This is an SDP offer example for both traditional "text/red"
574	   and multi-party "text/rex" format:

576	         m=text 11000 RTP/AVP 101 100 98
577	         a=rtpmap:98 t140/1000
578	         a=rtpmap:100 red/1000
579	         a=rtpmap:101 rex/1000
580	         a=fmtp:100 98/98/98
581	         a=fmtp:101 98/98/98
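
   How an answerer might read such an offer can be sketched as
   follows.  This is an illustration only: "rex" is the example
   subtype name used in this section, the parser handles just the
   lines shown above, and the function name is hypothetical.

```python
SDP_OFFER = """\
m=text 11000 RTP/AVP 101 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=rtpmap:101 rex/1000
a=fmtp:100 98/98/98
a=fmtp:101 98/98/98
"""


def text_formats(sdp):
    """Return {encoding name: payload type} in m-line preference order."""
    order = []
    names = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=text "):
            # Payload types follow the port and the transport profile.
            order = line.split()[3:]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            names[pt] = encoding.split("/")[0].lower()
    return {names[pt]: int(pt) for pt in order if pt in names}


formats = text_formats(SDP_OFFER)
multi_party_capable = "rex" in formats
```

   An endpoint without multi-party support would not recognise the
   "rex" entry and could fall back to the traditional "red" format
   that is offered next in the list.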

583	   Pros:

585	   The source switching performance is good.  Text from 15 participants
586	   can be transmitted simultaneously.

588	   New text from 15 simultaneous sources can be transmitted within 300
589	   milliseconds.  This is good performance.

591	   When more consecutive packets are lost than the number of generations
592	   of redundant data, it is still possible to deduce the sources of the
593	   totally lost data when the next text from these sources arrives.

595	   Cons:

597	   The format of each packet is different from what is specified in RFC
598	   4103 [RFC4103].

600	   Standardisation of the format will take a long time.

602	   A new media subtype is needed, which complicates the negotiation.

604	   The recovery procedure is somewhat complex.

606	4.1.1.6.  RTP Mixer with multiple primary data in each packet

608	   This method allows primary as well as redundant text from more than
609	   one source per packet.  The packet payload contains an ordered set of
610	   redundant and primary data with the same number of generations of
611	   redundancy as once agreed in the SDP negotiation.  The data header
612	   reflects these parts of the payload.  The CSRC list contains one CSRC
613	   member per source in the payload and in the same order.

615	   The maximum number of members in the CSRC-list is 15, and that is
616	   therefore the maximum number of sources that can be represented in
617	   each packet provided that all data can be fitted into the size
618	   allowable in one packet.

620	   Transmission is done as soon as there is new text available, but at
621	   intervals no shorter than 150 ms and no longer than 300 ms while
622	   there is anything to send.

624	   A new media subtype is needed, e.g. "text/rex".

626	   SDP would be the same as in Section 4.1.1.5.

628	   Pros:

630	   The source switching performance is good.  Text from 15 participants
631	   can be transmitted simultaneously.

633	   New text from 15 simultaneous sources can be transmitted within 150
634	   milliseconds.  This is good performance.

636	   Cons:

638	   The format of each packet is different from what is specified in RFC
639	   4103 [RFC4103].

643	   A new media subtype is needed, which complicates the negotiation.

645	   Standardisation of the format will take a long time.

647	   The recovery procedure is somewhat complex [RFC4103].

649	   When more consecutive packets are lost than the number of generations
650	   of redundant data, it is not possible to deduce the sources of the
651	   totally lost data.

653	4.1.1.7.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy in the
654	          packets

656	   This method allows primary data from one source and redundant text
657	   from other sources in each packet.  The packet payload contains
658	   primary data in "text/t140" format, and redundant data in RFC 5109
659	   FEC [RFC5109] format called "text/ulpfec".  That means that the
660	   redundant data contains the sequence number and the CSRC and other
661	   characteristics from the RTP header when the data was sent as
662	   primary.  The redundancy can be sent a selected number of packets
663	   after the data was sent as primary, in order to improve protection
664	   against bursty packet loss.  The redundancy level is recommended to
665	   be the same as in the original RFC 4103.

667	   RFC 4103 says that the protection against loss can be made by other
668	   methods than plain redundancy, so this method is in line with that
669	   statement.

671	   Transmission is done as soon as there is new text available, but at
672	   intervals no shorter than 100 ms and no longer than 300 ms while
673	   there is anything to send (new or redundant text).

675	   When more consecutive packets are lost than the number of generations
676	   of redundant data, it is not possible to deduce the sources of the
677	   totally lost data.

679	   The SDP can indicate the format as "text/red" with "text/ulpfec"
680	   redundant data in this way, with traditional RFC 4103 "text/red"
681	   with "text/t140" as redundant data as a fallback:

683	   m=text 49170 RTP/AVP 98 101 100 102
684	   a=rtpmap:98 red/1000
685	   a=fmtp:98 100/102/102
686	   a=rtpmap:102 ulpfec/1000
687	   a=rtpmap:100 t140/1000
688	   a=rtpmap:101 red/1000
689	   a=fmtp:101 100/100/100
690	   a=fmtp:100 cps=200

692	   The "text/ulpfec" format includes an indication of how far back the
693	   redundancy belongs, making it possible to cover bursty packet loss
694	   better than the other formats with short transmission intervals.  For
695	   real-time text, it is recommended to send three packets between the
696	   primary and the redundant transmissions of text.  That makes the
697	   transmission cover between 500 and 1500 ms of bursty packet loss.
698	   The variation is because of the varying packet interval between many
699	   and one simultaneously transmitting source.

701	   The "text/ulpfec" format has a number of parameters.  One is the
702	   length of the data to be protected which in this case must be the
703	   whole t140block.

705	   Pros:

707	   The source switching performance is good.  Text from 5 participants
708	   can be transmitted within 500 ms.

710	   Good recovery from bursty packet loss.

712	   The method is based on existing standards.  No new registrations are
713	   needed.

715	   Cons:

717	   When more consecutive packets are lost than the number of generations
718	   of redundant data, it is not possible to deduce the sources of the
719	   totally lost data.

721	   Even if the switching performance is good, it is not as good as for
722	   the method called "RTP Mixer with multiple primary data in each
723	   packet" (Section 4.1.1.6).  With more than 5 simultaneously sending
724	   sources, there will be a noticeable delay of text of over 500 ms,
725	   with 100 ms added per simultaneous source.  This is, however, beyond
726	   the requirements and would be a concern only in congestion
727	   situations.

729	   The recovery procedure is somewhat complex [RFC5109].

731	   There is more overhead in terms of extra data and extra packets sent
732	   than in the other methods.  With the recommended two redundant
733	   generations of data, each packet will be 36 bytes longer than with
734	   traditional RFC 4103, and at each pause in transmission five extra
735	   packets with only redundant data will be sent compared to two extra
736	   packets for the traditional RFC 4103 case.

738	4.1.1.8.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy and
739	          separate sequence number in the packets

741	   This method allows primary data from one source and redundant text
742	   from other sources in each packet.  The packet payload contains
743	   primary data in a new "text/t140e" format, and redundant data in RFC
744	   5109 FEC [RFC5109] format called "text/ulpfec".  That means that the
745	   redundant data contains the sequence number and the CSRC and other
746	   characteristics from the RTP header when the data was sent as
747	   primary.  The redundancy can be sent a selected number of packets
748	   after the data was sent as primary, in order to improve protection
749	   against bursty packet loss.  The redundancy level is recommended to
750	   be the same as in the original RFC 4103.  The "text/t140e" format
751	   contains a source-specific sequence number and the t140block.

753	   RFC 4103 says that the protection against loss can be made by other
754	   methods than plain redundancy, so this method is in line with that
755	   statement.

757	   Transmission is done as soon as there is new text available, but at
758	   intervals no shorter than 100 ms and no longer than 300 ms while
759	   there is anything to send (new or redundant text).

761	   When more consecutive packets are lost than the number of
762	   generations of redundant data, it is possible to deduce which
763	   sources lost data when new data arrives from the sources.  This is
764	   done by monitoring the received source-specific sequence numbers
765	   preceding the text.
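
	   The monitoring of source-specific sequence numbers can be sketched
	   as follows.  The width of the sequence number in "text/t140e" is
	   not stated here, so the wrap-around at 128 is an assumption borrowed
	   from the 7-bit "Source-seq" field described earlier.  The class and
	   function names are hypothetical, and the sketch assumes that only
	   in-order text (after recovery from redundancy) is fed to it.

```python
SEQ_MODULO = 128  # assumption: 7-bit source-specific sequence number


def missing_count(last_seq, new_seq, modulo=SEQ_MODULO):
    """Number of sequence numbers skipped between two received ones."""
    if new_seq == last_seq:
        return 0  # the same text seen again, e.g. as redundancy
    return (new_seq - last_seq - 1) % modulo


class SourceLossTracker:
    """Remembers the last sequence number seen for each source."""

    def __init__(self):
        self.last_seq = {}

    def observe(self, source_id, seq):
        """Record a received number; return how many were lost for that source."""
        previous = self.last_seq.get(source_id)
        self.last_seq[source_id] = seq
        if previous is None:
            return 0  # first text from this source, nothing to compare with
        return missing_count(previous, seq)
```

	   A non-zero return value from observe() tells the receiver that a
	   marker for lost text belongs in the presentation of that particular
	   source.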

767	   This is an example of how SDP can indicate the format as "text/red"
768	   with "text/t140e" as primary and "text/ulpfec" redundant data, with
769	   traditional RFC 4103 "text/red" with "text/t140" as redundant data
770	   as a fallback:

772	   m=text 49170 RTP/AVP 98 101 100 102 103
773	   a=rtpmap:98 red/1000
774	   a=fmtp:98 100/102/102
775	   a=rtpmap:102 ulpfec/1000
776	   a=rtpmap:103 t140/1000
777	   a=rtpmap:100 t140e/1000
778	   a=rtpmap:101 red/1000
779	   a=fmtp:101 103/103/103
780	   a=fmtp:100 cps=200

782	   The "text/ulpfec" format includes an indication of how far back the
783	   redundancy belongs, making it possible to cover bursty packet loss
784	   better than the other formats with short transmission intervals.  For
785	   real-time text, it is recommended to send three packets between the
786	   primary and the redundant transmissions of text.  That makes the
787	   transmission cover between 500 and 1500 ms of bursty packet loss.
788	   The variation is because of the varying packet interval between many
789	   and one simultaneously transmitting source.

791	   The "text/ulpfec" format has a number of parameters.  One is the
792	   length of the data to be protected which in this case must be the
793	   whole t140block.

795	   Pros:

797	   The source switching performance is good.  Text from 5 participants
798	   can be transmitted within 500 ms.

800	   Good recovery from bursty packet loss.

802	   The method is based on an existing standard for FEC.

804	   When more consecutive packets are lost than the number of generations
805	   of redundant data, it is possible to deduce the source of the lost
806	   data when new text arrives from the source.

808	   Cons:

810	   Even if the switching performance is good, it is not as good as for
811	   the method called "RTP Mixer with multiple primary data in each
812	   packet" (Section 4.1.1.6).  With more than 5 simultaneously sending
813	   sources, there will be a noticeable delay of text of over 500 ms,
814	   with 100 ms added per simultaneous source.  This is, however, beyond
815	   the requirements and would be a concern only in congestion
816	   situations.

818	   The recovery procedure is somewhat complex [RFC5109].

820	   There is more overhead in terms of extra data and extra packets sent
821	   than in the other methods.  With the recommended two redundant
822	   generations of data, each packet will be 40 bytes longer than with
823	   traditional RFC 4103, and at each pause in transmission five extra
824	   packets with only redundant data will be sent compared to two extra
825	   packets for the traditional RFC 4103 case.

827	   A new text media subtype "text/t140e" needs to be registered.

829	   Standardisation of the format will take a long time.

831	4.1.1.9.  RTP Mixer indicating participants by a control code in the
832	          stream

834	   Text from all participants except the receiving one is transmitted
835	   from the media mixer in the same RTP session and stream, thus all
836	   using the same destination address/port combination, the same RTP
837	   SSRC, and one sequence number series as described in Sections 7.1 and
838	   7.3 of RTP RFC 3550 [RFC3550] about the Mixer function.  The sources
839	   of the text in each RTP packet are identified by a newly defined
840	   T.140 control code "c" followed by a unique identification of the
841	   source in UTF-8 string format.

843	   The receiver can use the string for presenting the source of text.
844	   This method is on the RTP level described in RFC 7667, section 3.6.1
845	   Media mixing mixer [RFC7667].

847	   The inline coding of the source of text is applied in the data stream
848	   itself, and an RTP mixer function is used for coordinating the
849	   sources of text into one RTP stream.

851	   Information uniquely identifying each user in the multi-party session
852	   is placed as the parameter value "n" in the T.140 application
853	   protocol function with the function code "c".  The identifier shall
854	   thus be formatted like this: SOS c n ST, where SOS and ST are coded
855	   as specified in ITU-T T.140 [T140].  The "c" is the letter "c".  The
856	   n parameter value is a string uniquely identifying the source.  This
857	   parameter shall be kept short so that it can be repeated in the
858	   transmission without concerns for network load.
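
	   The inline source marker can be sketched as follows.  T.140 codes
	   SOS as U+0098 and ST as U+009C.  The sketch assumes that the
	   spaces in "SOS c n ST" are notation only and that no separator is
	   sent between the elements.  The function names are hypothetical,
	   and other T.140 functions introduced by SOS are not handled.

```python
SOS = "\u0098"  # T.140 start of string
ST = "\u009c"   # T.140 string terminator


def source_marker(label):
    """Build the marker SOS c n ST for a source label n."""
    if SOS in label or ST in label:
        raise ValueError("label must not contain SOS or ST")
    return f"{SOS}c{label}{ST}"


def split_by_source(stream, initial_source=None):
    """Split mixed text into (source, text) runs at each source marker."""
    runs = []
    source = initial_source
    pos = 0
    while pos < len(stream):
        start = stream.find(SOS + "c", pos)
        if start == -1:
            runs.append((source, stream[pos:]))
            break
        if start > pos:
            runs.append((source, stream[pos:start]))
        end = stream.find(ST, start)
        if end == -1:
            break  # incomplete marker: wait for more data
        source = stream[start + 2:end]
        pos = end + 1
    return runs
```

	   For example, the stream built from source_marker("Alice"), "Hi",
	   source_marker("Bob") and "Hello" splits into one run per source.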

860	   A receiving endpoint is supposed to separate text items from the
861	   different sources and identify and display them accordingly.

863	   The conference server needs to be allowed to decrypt/encrypt the
864	   packet payload in order to check the source and repack the text.

866	   Pros:

868	   If loss of packets occurs, it is possible to recover text from
869	   redundancy at loss of up to the number of redundancy levels carried
870	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
871	   levels).

873	   This method can be implemented with most RTP implementations.

875	   The method can also be used with other transports than RTP.

877	   Cons:

879	   The method implies a moderate load by the need to insert the source
880	   often in the stream.

882	   If more consecutive packets are lost than the number of generations
883	   of redundant data, it is not possible to deduce the source of the
884	   totally lost data.

886	   The mixer needs to be able to generate unique source identifications
887	   that are suitable as labels for the sources.

889	   Requires an extension of the ITU-T T.140 standard, best made by the
890	   ITU.

892	   There is a risk that the control code indicating the change of source
893	   is lost and the result is false source indication of text.

895	   The conference server needs to be allowed to decrypt/encrypt the
896	   packet payload.

898	4.1.1.10.  Mixing for multi-party unaware user agents

900	   Multi-party real-time text contents can be transmitted to multi-party
901	   unaware user agents if source labelling and formatting of the text is
902	   performed by a mixer.  This method has the limitations that the
903	   layout of the presentation and the format of source identification
904	   are purely controlled by the mixer, and that only one source at a
905	   time is allowed to present in real time.  Text from other sources
906	   needs to be stored temporarily, waiting for an appropriate moment to
907	   switch the source of transmitted text.  The mixer controls the
908	   switching of sources and inserts a source identifier in text format
909	   at the beginning of text after a switch of source.  The logic of the
910	   mixer to detect when a switch is appropriate should detect a number
911	   of places in text where a switch can be allowed, including new line,
912	   end of sentence, end of phrase, a period of inactivity, and a word
913	   separator after a long time of active transmission.
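
	   The switching logic can be sketched as a single test on the text
	   most recently sent for the current source.  The punctuation sets
	   and the two time limits are assumptions chosen for illustration.
	   This document does not specify them, and the function name is
	   hypothetical.

```python
import re

SENTENCE_END = re.compile(r"[.!?]\s*$")
PHRASE_END = re.compile(r"[,;:]\s*$")
LINE_ENDS = ("\n", "\u2028")  # LF and the Unicode line separator used by T.140


def may_switch_source(sent_text, idle_seconds, active_seconds,
                      idle_limit=1.0, active_limit=20.0):
    """Return True if the mixer may switch to another source now."""
    if not sent_text:
        return True                      # nothing sent yet for this source
    if sent_text.endswith(LINE_ENDS):
        return True                      # new line
    if SENTENCE_END.search(sent_text) or PHRASE_END.search(sent_text):
        return True                      # end of sentence or end of phrase
    if idle_seconds >= idle_limit:
        return True                      # a period of inactivity
    if active_seconds >= active_limit and sent_text[-1].isspace():
        return True                      # word separator after long activity
    return False
```

	   The mixer would call this before taking queued text from another
	   source, and would insert the source label when it returns True.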

915	   This method MAY be used when no support for multi-party awareness is
916	   detected in the receiving endpoint.  The basis for this method is
917	   described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667].

919	   See [I-D.ietf-avtcore-multi-party-rtt-mix] for a procedure for mixing
920	   RTT for a conference-unaware endpoint.

922	   Pros:

924	   Can be transmitted to conference-unaware endpoints.

926	   Can be used with other transports than RTP.

928	   Cons:

930	   Does not allow full real-time presentation of more than one source at
931	   a time.  Text from other sources will be delayed.

933	   The only realistic presentation format is a style with the text from
934	   the different sources presented with a text label indicating source,
935	   and the text collected in a chat style presentation but with more
936	   frequent turn-taking.

938	   Endpoints often have their own system for adding labels to the RTT
939	   presentation.  In that case there will be two levels of labels in the
940	   presentation, one for the mixer and one for the sources.

942	   If loss of more packets than can be recovered by the redundancy
943	   appears, it is not possible to detect which source was struck by the
944	   loss.  It is also possible that a source switch occurred during the
945	   loss, and therefore a false indication of the source of text can be
946	   provided to the user after such loss.

948	   Because of all these cons, this method is not recommended and should
949	   not be used as the main method, but only as a last-resort fallback
950	   for backwards interoperability with multi-party unaware endpoints.

952	   The conference server needs to be allowed to decrypt/encrypt the
953	   packet payload.

955	4.1.2.  RTP-based bridging with minor RTT media contents reformatting by
956	        the bridge

958	   It may be desirable to send text in a multi-party setting in a way
959	   that allows the text stream contents to be distributed without being
960	   dealt with in detail in any central server.  A number of such methods
961	   are described.  However, at the time of writing, none of these
962	   methods has a specified way of establishing the session by
963	   SDP.

965	4.1.2.1.  RTP Translator sending one RTT stream per participant

967	   Within the RTP session, text from each participant is transmitted
968	   from the RTP media translator (bridge) in a separate RTP stream, thus
969	   using the same destination address/port combination, the same payload
970	   type number (PT) but separate RTP SSRC parameters and sequence number
971	   series as described in Sections 7.1 and 7.2 of RTP RFC 3550 [RFC3550]
972	   about the Translator function.  The source of the text in each RTP
973	   packet is identified by the SSRC parameter in the RTP packets,
974	   containing the SSRC of the initial source of text.

976	   A receiving and presenting endpoint is supposed to separate text
977	   items from the different sources and identify and display them in a
978	   suitable way.

980	   This method is described in RFC 7667, section 3.5.1 Relay-transport
981	   translator or 3.5.2 Media translator [RFC7667].

983	   The identification of the source is made through the SSRC.  The
984	   translation to a readable label can be done by mapping to information
985	   from the RTCP SDES CNAME and NAME items as described in
986	   RTP [RFC3550], and also through information in the text media member
987	   in the conference notification described in RFC 4575 [RFC4575].

989	   The SDP exchange for establishing this mixing type can be equal to
990	   what is used for basic two-party use of RFC 4103 with just an added
991	   attribute for indicating multi-party capability.

993	   m=text 49170 RTP/AVP 98 103
994	   a=rtpmap:98 red/1000
995	   a=fmtp:98 103/103/103
996	   a=rtpmap:103 t140/1000
997	   a=fmtp:103 cps=150
998	   a=RTT-mix:RTP-translator

1000	   A similar answer including the same RTT-mix attribute would indicate
1001	   that multi-party coding can begin.  An answer without the same
1002	   RTT-mix attribute could result in diversion to use of the mixing
1003	   method for multi-party unaware endpoints (Section 4.1.1.10) if more
1004	   than two parties are involved in the session.
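
	   The decision made from the offer and answer can be sketched as
	   follows.  The attribute name "RTT-mix" is taken from the example
	   above.  The function names and the returned labels are
	   hypothetical and serve only to show the three outcomes.

```python
OFFER = """m=text 49170 RTP/AVP 98 103
a=rtpmap:98 red/1000
a=fmtp:98 103/103/103
a=rtpmap:103 t140/1000
a=fmtp:103 cps=150
a=RTT-mix:RTP-translator
"""


def rtt_mix_values(sdp_text):
    """Collect the values of all a=RTT-mix attributes in an SDP body."""
    prefix = "a=RTT-mix:"
    values = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            values.append(line[len(prefix):].strip())
    return values


def select_mixing(offer_sdp, answer_sdp, parties):
    """Pick a mixing mode from the RTT-mix attributes of offer and answer."""
    answered = rtt_mix_values(answer_sdp)
    for value in rtt_mix_values(offer_sdp):
        if value in answered:
            return value                 # both sides declared this method
    if parties > 2:
        return "multi-party-unaware-fallback"
    return "two-party"
```

	   An answer that repeats the attribute selects the translator
	   method.  An answer without it selects the fallback only when more
	   than two parties take part in the session.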

1006	   The bridge can add new sources in the communication to a participant
1007	   by first sending a conference notification according to RFC 4575
1008	   [RFC4575] with the SSRC of the new source included in the
1009	   corresponding "text" media member, or by sending an RTCP message with
1010	   the new SSRC in an SDES packet.

1012	   A receiver should be prepared to receive such indications of new
1013	   streams being added to the multi-party session, so that the new SSRC
1014	   is not taken for a change in SSRC value for an already established
1015	   RTP stream.

1017	   Transmission, reception, packet loss recovery and text loss
1018	   indication are performed per source in the separate RTP streams in
1019	   the same way as in two-party sessions with RFC 4103 [RFC4103].

1021	   Text is recommended to be sent by the bridge as soon as it is
1022	   available for transmission, but not less than 250 ms after a previous
1023	   transmission.  This will in many cases result in close to 0 added
1024	   delay by the bridge, because most RTT senders use a 300 ms
1025	   transmission interval.
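
	   The timing rule can be sketched as follows.  The 250 ms minimum
	   interval is taken from the paragraph above.  The function names
	   are hypothetical, and times are assumed to be seconds on a common
	   clock, kept separately for each forwarded stream.

```python
MIN_INTERVAL = 0.250  # seconds between transmissions in one stream


def next_send_time(text_ready_at, last_sent_at, min_interval=MIN_INTERVAL):
    """Earliest time the bridge may forward text that became ready."""
    if last_sent_at is None:
        return text_ready_at             # nothing sent before: send at once
    return max(text_ready_at, last_sent_at + min_interval)


def added_delay(text_ready_at, last_sent_at, min_interval=MIN_INTERVAL):
    """Delay added by the bridge for this piece of text."""
    return next_send_time(text_ready_at, last_sent_at, min_interval) - text_ready_at
```

	   With a sender that transmits every 300 ms, text becomes ready
	   300 ms after the previous transmission, which is later than the
	   250 ms limit, so the added delay is 0.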

1027	   It is sometimes said that this configuration is not supported by
1028	   current media declarations in SDP.  RFC 3264 [RFC3264] specifies in
1029	   some places that one media description is supposed to describe just
1030	   one RTP media stream.  However, this is not directly referencing an
1031	   RTP stream, and use of multiple RTP streams in the same RTP session
1032	   is recommended in many other RFCs.

1034	   This confusion is clarified in RFC 5576 [RFC5576] section 3 by the
1035	   following statements:

1037	   "The term "media stream" does not appear in the SDP specification
1038	   itself, but is used by a number of SDP extensions, for instance,
1039	   Interactive Connectivity Establishment (ICE) [ICE], to denote the
1040	   object described by an SDP media description.  This term is
1041	   unfortunately rather confusing, as the RTP specification [RFC3550]
1042	   uses the term "media stream" to refer to an individual media source
1043	   or RTP packet stream, identified by an SSRC, whereas an SDP media
1044	   stream describes an entire RTP session, which can contain any number
1045	   of RTP sources."

1047	   In most cases, it will be sufficient that new sources are introduced
1048	   with a conference notification or RTCP message.  However, RFC 5576
1049	   [RFC5576] specifies attributes which may be used to more explicitly
1050	   announce new sources or restart of earlier established RTP streams.

1052	   This method is encouraged by draft-ietf-avtcore-multiplex-guidelines
1053	   [I-D.ietf-avtcore-multiplex-guidelines] section 5.2.

1055	   Normal operation will be that the bridge receives text packets from
1056	   the source and handles any text recovery and indication of loss
1057	   needed before queueing the resulting clean text for transmission from
1058	   the bridge to the receivers.

1060	   It may however also be possible for the bridge to just convey the
1061	   packet contents as received from the sources, with minor adjustments,
1062	   and let the receiving endpoint handle all aspects of recovery and
1063	   indication of loss, even for the source to bridge path.  In that case
1064	   also the sequence number must be maintained as it was at reception in
1065	   the bridge.  This mode needs further study before application.

1067	   Pros:

1069	   This method is the natural way to do multi-party bridging with RFC
1070	   4103 based RTT.  Only a small addition is included in the session
1071	   establishment to verify capability by the parties because many
1072	   implementations are done without multi-party capability.

1074	   This method has moderate overhead in terms of work for the mixer, but
1075	   high in terms of packet transmission rate.  Five sources sending
1076	   simultaneously cause the bridge to send 15 packets per second to each
1077	   receiver.

1079	   When loss of packets occurs, it is possible to recover text from
1080	   redundancy at loss of up to the number of redundancy levels carried
1081	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
1082	   levels).

1084	   Loss beyond what can be recovered can be detected, and the marker
1085	   for text loss can be inserted in the correct stream.

1087	   It may be possible in some scenarios to keep the text encrypted
1088	   through the Translator.

1090	   Minimal delay.  The delay can often be kept close to 0 with at least
1091	   5 simultaneous sending participants.

1093	   Cons:

1095	   There are RTP implementations not supporting the Translator model.
1096	   They will need to use the fall-back to multi-party-unaware mixing.
1097	   An investigation about how common this is is needed before the method
1098	   is used.

1100	   Standardisation of the method will take a long time.

1102	   With many simultaneous sending sources, the total rate of packets
1103	   will be high, and can cause congestion.  The requirement to handle 3
1104	   simultaneous sources in this specification will cause 10 packets per
1105	   second, which is manageable in most cases, e.g. considering that
1106	   audio usually uses 50 packets per second.
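
	   The packet rate for three sources follows from simple arithmetic,
	   sketched here under the assumption that each forwarded stream
	   carries one packet per 300 ms transmission interval while its
	   source is sending.  The function name is hypothetical.

```python
def packets_per_second(sources, interval_seconds=0.3):
    """Total packet rate towards one receiver for simultaneously sending sources."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return sources / interval_seconds
```

	   Three simultaneous sources give 3 / 0.3 = 10 packets per second.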

1108	4.1.2.2.  Distributing packets in an end-to-end encryption structure

1110	   In order to achieve end-to-end encryption, it is possible to let the
1111	   packets from the sources just pass through a central distributor, and
1112	   handle the security agreements between the participants.
1113	   Specifications exist for a framework with this functionality for
1114	   application on RTP based conferences in
1115	   [I-D.ietf-perc-private-media-framework].  The RTP flow and mixing
1116	   characteristics are similar to the method described under "RTP
1117	   Translator sending one RTT stream per participant" above.  RFC 4103
1118	   RTP streams [RFC4103] would fit into the structure and it would
1119	   provide a base for end-to-end encrypted RTT multi-party conferencing.

1121	   Pros:

1123	   Good security.

1125	   Straightforward multi-party handling.

1127	   Cons:

1129	   Does not operate under the usual SIP central conferencing
1130	   architecture.

1132	   Requires the participants to perform a lot of key handling.

1134	   Is work in progress when this is written.

1136	4.1.2.3.  Mesh of RTP endpoints

1138	   Text from all participants is transmitted directly to all others in
1139	   one RTP session, without a central bridge.  The sources of the text
1140	   in each RTP packet are identified by the source network address and
1141	   the SSRC.

1143	   This method is described in RFC 7667, section 3.4 Point to multi-
1144	   point using mesh [RFC7667].

1146	   Pros:

1148	   When loss of packets occurs, it is possible to recover text from
1149	   redundancy at loss of up to the number of redundancy levels carried
1150	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
1151	   levels).

1153	   This method can be implemented with most RTP implementations.

1155	   Transmitted text can also be used with other transports than RTP.

1157	   Cons:

1159	   This model is not described in IMS, NENA and EENA specifications, and
1160	   does therefore not meet the requirements.

1162	   Requires a drastically increasing number of connections when the
1163	   number of participants increases.

1165	4.1.2.4.  Multiple RTP sessions, one for each participant

1167	   Text from all participants is transmitted directly to all others in
1168	   one RTP session each, without a central bridge.  Each session is
1169	   established with a separate media description in SDP.  The sources of
1170	   the text in each RTP packet are identified by the source network
1171	   address and the SSRC.

1173	   Pros:

1175	   When loss of packets occurs, it is possible to recover text from
1176	   redundancy at loss of up to the number of redundancy levels carried
1177	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
1178	   levels).

1180	   Complete loss of text can be indicated in the received stream.

1182	   This method can be implemented with most RTP implementations.

1184	   End-to-end encryption is achievable.

1186	   Cons:

1188	   This method is not described in IMS, NENA and ETSI specifications and
1189	   does therefore not meet the requirements.

1191	   A lot of network resources are spent on setting up separate sessions
1192	   for each participant.

1194	5.  Preferred RTP-based multi-party RTT transport method

1196	   For RTP transport of RTT using RTP-mixer technology, one method for
1197	   multi-party mixing and transport stands out as fulfilling the goals
1198	   best and is therefore recommended: "RTP Mixer using the default
1199	   method but decreased transmission interval" (Section 4.1.1.2).

1201	   For RTP transport in separate streams or sessions, no current
1202	   recommendation can be made.  A bridging method in the process of
1203	   standardisation with interesting characteristics is the end-to-end
1204	   encryption model "perc" (Section 4.1.2.2).

1206	6.  Session control of RTP-based multi-party RTT sessions

1208	   General session control aspects for multi-party sessions are
1209	   described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP)
1210	   Event Package for Conference State, and RFC 4579 [RFC4579] Session
1211	   Initiation Protocol (SIP) Call Control - Conferencing for User
1212	   Agents.  The nomenclature of these specifications is used here.

1214	   The procedures for a multi-party aware model for RTT-transmission
1215	   shall only be applied if a capability exchange for multi-party aware
1216	   real-time text transmission has been completed and a supported method
1217	   for multi-party real-time text transmission can be negotiated.

1219	   A method for detection of conference-awareness for centralized SIP
1220	   conferencing in general is specified in RFC 4579 [RFC4579].  The
1221	   focus sends the "isfocus" feature tag in a SIP Contact header.  This
1222	   causes the conference-aware endpoint to subscribe to conference
1223	   notifications from the focus.  The focus then sends notifications to
1224	   the endpoint about entering and disappearing conference participants
1225	   and their media capabilities.  The information is carried XML-
1226	   formatted in a 'conference-info' block in the notification according
1227	   to RFC 4575 [RFC4575].  The mechanism is described in detail in RFC
1228	   4575 [RFC4575].

1230	   Before a conference media server starts sending multi-party RTT to an
1231	   endpoint, a verification of its ability to handle multi-party RTT
1232	   must be made.  A decision on which mechanism to use for identifying
1233	   text from the different participants must also be taken, implicitly
1234	   or explicitly.  These verifications and decisions can be done in a
1235	   number of ways.  The most apparent ways are specified here and their
1236	   pros and cons described.  One of the methods is selected to be the
1237	   one to be used by implementations of the centralized conference model
1238	   according to this specification.

1240	6.1.  Implicit RTT multi-party capability indication

1242	   Capability for RTT multi-party handling can be decided to be
1243	   implicitly indicated by session control items.

1245	   The focus may implicitly indicate multi-party RTT capability by
1246	   including the media child with value "text" in the RFC 4575 [RFC4575]
1247	   conference-info provided in conference notifications.

1249	   An endpoint may implicitly indicate multi-party RTT capability by
1250	   including the text media in the SDP in the session control
1251	   transactions with the conference focus after the subscription to the
1252	   conference has taken place.

1254	   The implicit RTT capability indication means for the focus that it
1255	   can handle multi-party RTT according to the preferred method
1256	   indicated in the RTT multi-party methods section above.

1258	   The implicit RTT capability indication means for the endpoint that it
1259	   can handle multi-party RTT according to the preferred method
1260	   indicated in the RTT multi-party methods section above.

1262	   If the focus detects that an endpoint implicitly declared RTT multi-
1263	   party capability, it SHALL provide RTT according to the preferred
1264	   method.

1266	   If the focus detects that the endpoint does not indicate any RTT
1267	   multi-party capability, then it shall either provide RTT multi-party
1268	   text in the way specified for conference-unaware endpoint above, or
1269	   refuse to set up the session.

1271	   If the endpoint detects that the focus has implicitly declared RTT
1272	   multi-party capability, it shall be prepared to present RTT in a
1273	   multi-party fashion according to the preferred method.

1275	   Pros:

1277	   Acceptance of implicit multi-party capability implies that no
1278	   standardisation of explicit RTT multi-party capability exchange is
1279	   required.

1281	   Cons:

1283	   If other methods for multi-party RTT are to be used in the same
1284	   implementation environment as the preferred ones, then capability
1285	   exchange needs to be defined for them.

1287	   Cannot be used outside a strictly applied SIP central conference
1288	   model.

1290	6.2.  RTT multi-party capability declared by SIP media-tags

1292	   Specifications for RTT multi-party capability declarations can be
1293	   agreed for use as SIP media feature tags, to be exchanged during SIP
1294	   call control operation according to the mechanisms in RFC 3840
1295	   [RFC3840] and RFC 3841 [RFC3841].  RTT multi-party capability is
1296	   then indicated by the media feature tag "rtt-mix", with a set of
1297	   possible values for the different possible methods.

1299	   The possible values in the list may for example be:

1301	      rtp-mixer

1303	      perc

1305	   rtp-mixer indicates capability for using the RTP-mixer based
1306	   presentation of multi-party text.

1308	   perc indicates capability for using the perc based transmission of
1309	   multi-party text.

1311	   Example: Contact: <sip:a2@beco.example.com>

1313	   ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL"
1314	   ;+sip.rtt-mix="rtp-mixer"
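
   The following is a minimal, non-normative sketch of how a UA could
   read the proposed feature tag and pick a common method.  It assumes
   the hypothetical tag name "+sip.rtt-mix" and a comma-separated value
   list as in the example above; the function names are illustrative
   only and not part of any specification.

```python
import re
from typing import List, Optional

# Hypothetical tag name, taken from the proposal in this section.
RTT_MIX_TAG = "+sip.rtt-mix"


def rtt_mix_methods(contact_header: str) -> List[str]:
    """Return the methods listed in the rtt-mix feature tag, in order."""
    pattern = re.escape(RTT_MIX_TAG) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, contact_header, flags=re.IGNORECASE)
    if match is None:
        return []  # tag absent: no multi-party capability declared
    return [m.strip().lower() for m in match.group(1).split(",") if m.strip()]


def select_method(local: List[str], remote: List[str]) -> Optional[str]:
    """Pick the first locally preferred method that the remote also lists."""
    for method in local:
        if method in remote:
            return method
    return None  # caller uses the multi-party unaware fallback or drops


contact = ('Contact: <sip:a2@beco.example.com>'
           ';methods="INVITE,ACK,OPTIONS,BYE,CANCEL"'
           ';+sip.rtt-mix="rtp-mixer"')
remote_methods = rtt_mix_methods(contact)
chosen = select_method(["rtp-mixer", "perc"], remote_methods)
```

   A result of None corresponds to the case where no common method is
   declared.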

1316	   If, after evaluation of the alternatives in this specification, only
1317	   one mixing method is selected to be brought to implementation, then
1318	   the media tag can be reduced to a single tag with no list of values.

1320	   An offer-answer exchange should take place and the common method
1321	   selected by the answering party shall be used in the session with
1322	   that UA.

1324	   When no common method is declared, then only the fallback method for
1325	   multi-party unaware participants can be used, or the session dropped.

1327	   If more than one text media section is included in SDP, all must be
1328	   capable of using the declared RTT multi-party method.

1330	   Pros:

1332	   Provides a clear decision method.

1334	   Can be extended with new mixing methods.

1336	   Can guide call routing to a suitable capable focus.

1338	   Cons:

1340	   Requires standardization and IANA registration.

1342	   Is not stream specific.  If more than one text stream is specified,
1343	   all must have the same type of multi-party capability.

1345	   Cannot be used in the WebRTC environment.

1347	6.3.  SDP media attribute for RTT multi-party capability indication

1349	   An attribute can be specified on media level, to be used in text
1350	   media SDP declarations for negotiating RTT multi-party capabilities.
1351	   The attribute can have the name "rtt-mix".

1353	   More than one attribute can be included in one media description.

1355	   The attribute can have a value.  The value can for example be:

1357	      rtp-mixer

1359	      rtp-translator
1360	      perc

1362	   rtp-mixer indicates capability for using the RTP-mixer and CSRC-list
1363	   based mixing of multi-party text.

1365	   rtp-translator indicates capability for using the RTP-translator
1366	   based mixing of multi-party text.

1368	   perc indicates capability for using the perc based transmission of
1369	   multi-party text.

1371	   An offer-answer exchange should take place and the common method
1372	   selected by the answering party shall be used in the session with
1373	   that endpoint.

1375	   When no common method is declared, then only the fallback method for
1376	   multi-party unaware endpoints can be used.

1378	   Example: a=rtt-mix:rtp-mixer
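
   As a non-normative illustration, the sketch below collects the
   proposed "rtt-mix" attribute values per text media description and
   lets the answering party select a method.  The attribute name and
   values are the ones proposed in this section; the function names
   and the SDP body are made up for the example.

```python
from typing import List, Optional


def rtt_mix_by_text_section(sdp: str) -> List[List[str]]:
    """Return, for each m=text section, the rtt-mix values it declares."""
    sections: List[List[str]] = []
    current: Optional[List[str]] = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media description begins; track it only if it is text.
            current = [] if line.startswith("m=text ") else None
            if current is not None:
                sections.append(current)
        elif current is not None and line.startswith("a=rtt-mix:"):
            current.append(line[len("a=rtt-mix:"):].strip())
    return sections


def answer_method(offered: List[str], supported: List[str]) -> Optional[str]:
    """Answerer's choice: first offered method it supports, else None."""
    for method in offered:
        if method in supported:
            return method
    return None  # only the multi-party unaware fallback remains


offer = "\r\n".join([
    "v=0",
    "m=audio 49170 RTP/AVP 0",
    "m=text 11000 RTP/AVP 100 98",
    "a=rtpmap:98 t140/1000",
    "a=rtpmap:100 red/1000",
    "a=fmtp:100 98/98/98",
    "a=rtt-mix:rtp-mixer",
    "a=rtt-mix:perc",
])
offered = rtt_mix_by_text_section(offer)
```

   Because the attribute is on media level, each text media description
   is evaluated on its own.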

1380	   If, after evaluation of the alternatives in this specification, only
1381	   one mixing method is selected to be brought to implementation, then
1382	   the attribute can be reduced to a single attribute with no list of
1383	   values.

1385	   Pros:

1387	   Provides a clear decision method.

1389	   Can be extended with new mixing methods.

1391	   Can be used on specific text media.

1393	   Can be used also for SDP-controlled WebRTC sessions with multiple
1394	   streams in the same data channel.

1396	   Cons:

1398	   Requires standardization and IANA registration.

1400	   Cannot guide SIP routing.

1402	6.4.  Simplified SDP media attribute for RTT multi-party capability
1403	      indication

1405	   An attribute can be specified on media level, to be used in text
1406	   media SDP declarations for negotiating RTT multi-party capabilities.
1407	   The attribute can have a name suitable for the selected method and no
1408	   value.  It would be selected and used if only one method for multi-
1409	   party rtt is brought forward from this specification, and the other
1410	   suppressed or found to be possible to negotiate in another way.

1412	   An offer-answer exchange should take place and if both parties
1413	   specify rtt-mixing capability with the same attribute, the selected
1414	   mixing method shall be used.

1416	   When no common method is declared, then only the fallback method for
1417	   multi-party unaware endpoints can be used, or the session not
1418	   accepted for multi-party use.

1420	   Example: a=rtt-mix-rtp-mixer

1422	   Pros:

1424	   Provides a clear decision method.

1426	   Very simple syntax and semantics.

1428	   Can be used on specific text media.

1430	   Cons:

1432	   Requires standardization and IANA registration.

1434	   If another RTT mixing method is also specified in the future, then
1435	   that method may also need to specify and register its own attribute.
1436	   With an attribute that takes a parameter value, only the addition of
1437	   a new possible value would be needed.

1439	   Cannot guide SIP routing.

1441	6.5.  SDP format parameter for RTT multi-party capability indication

1443	   An FMTP format parameter can be specified for the RFC 4103
1444	   [RFC4103] media, to be used in text media SDP declarations for
1445	   negotiating RTT multi-party capabilities.  The parameter can have the
1446	   name "rtt-mix", with one or more of its possible values.

1448	   The possible values in the list are:

1450	      rtp-mixer

1452	      perc

1454	   rtp-mixer indicates capability for using the RTP-mixer based mixing
1455	   and presentation of multi-party text using the CSRC-list.

1457	   perc indicates capability for using the perc based transmission of
1458	   multi-party text.

1460	   Example: a=fmtp:96 98/98/98 rtt-mix=rtp-mixer

1462	   If, after evaluation of the alternatives in this specification, only
1463	   one mixing method is selected to be brought to implementation, then
1464	   the parameter can be reduced to a single parameter with no list of
1465	   values.

1467	   An offer-answer exchange should take place and the common method
1468	   selected by the answering party shall be used in the session with
1469	   that UA.

1471	   When no common method is declared, then only the fallback method can
1472	   be used, or the session denied.

1474	   Pros:

1476	   Provides a clear decision method.

1478	   Can be extended with new mixing methods.

1480	   Can be used on specific text media.

1482	   Can be used also for SDP-controlled WebRTC sessions with multiple
1483	   streams in the same data channel.

1485	   Cons:

1487	   Requires standardization and IANA registration.

1489	   May cause interop problems with current RFC4103 [RFC4103]
1490	   implementations not expecting a new fmtp-parameter.

1492	   Cannot guide SIP routing.

1494	6.6.  A text media subtype for support of multi-party rtt

1496	   Indicating a specific text media subtype in SDP is a straightforward
1497	   way for negotiating multi-party capability.  Especially if there are
1498	   format differences from the "text/red" and "text/t140" formats of
1499	   RFC4103 [RFC4103], then this is a natural way to do the negotiation
1500	   for multi-party rtt.

1502	   Pros:

1504	   No extra efforts if a new format is needed anyway.

1506	   Cons:

1508	   None specific to using the format indication for negotiation of
1509	   multi-party capability.  But only feasible if a new format is needed
1510	   anyway.

1512	6.7.  Preferred capability declaration method for RTP-based transport

1514	   If the preferred transport method is one with a specific media
1515	   subtype in SDP, then specification by media subtype is preferred.

1517	   If that is not the case, the preferred capability declaration
1518	   method is the one with a specific SDP attribute for the selected
1519	   mixing method (Section 6.4), because it is straightforward.

1521	6.8.  Identification of the source of text for RTP-based solutions

1523	   The main way to identify the source of text in the RTP based solution
1524	   is by the SSRC of the sending participant.  In the RTP-mixer
1525	   solution, this SSRC is included in the CSRC list of the transmitted
1526	   packets.  Further identification that may be needed for better
1527	   labelling of received text may be obtained from a number of sources,
1528	   such as the RTCP SDES CNAME and NAME reports and the conference
1529	   notification data according to RFC 4575 [RFC4575].
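
   As a non-normative sketch, the SSRC and the CSRC list can be read
   from the fixed RTP header defined in RFC 3550 [RFC3550] as follows.
   The function name is illustrative only, and header extensions and
   payload are ignored.

```python
import struct
from typing import List, Tuple


def rtp_sources(packet: bytes) -> Tuple[int, List[int]]:
    """Return (ssrc, csrc_list) from the header of an RTP packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    # V/P/X/CC octet, M/PT octet, sequence number, timestamp, SSRC.
    first, _m_pt, _seq, _ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = first & 0x0F  # CSRC count, 0..15
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:end]))
    return ssrc, csrcs


# A mixer packet (SSRC 0xAAAA) carrying text that originates from 0x1234.
example = (struct.pack("!BBHII", 0x81, 100, 1, 0, 0xAAAA)
           + struct.pack("!I", 0x1234) + b"hi")
```

   In the RTP-mixer solution, the receiver would use the CSRC value,
   not the mixer's SSRC, as the key when looking up the label of the
   source.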

1531	   As soon as a new member is added to the RTP session, its
1532	   characteristics should be transmitted in RTCP SDES CNAME and NAME
1533	   reports according to section 6.5 in RFC 3550 [RFC3550].  The
1534	   information about the participant should also be included in the
1535	   conference data including the text media member in a notification
1536	   according to RFC 4575 [RFC4575].

1538	   The RTCP SDES report SHOULD contain identification of the source
1539	   represented by the SSRC/CSRC identifier.  This identification MUST
1540	   contain the CNAME field and MAY contain the NAME field and other
1541	   defined fields of the SDES report.

1543	   A focus UA SHOULD primarily convey SDES information received from the
1544	   sources of the session members.  When such information is not
1545	   available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME
1546	   information from available information from the SIP session with the
1547	   participant.

1549	   Provision of detailed information in the NAME field has security
1550	   implications, especially if provided without encryption.

1552	7.  RTT bridging in WebRTC

1554	   Within WebRTC, real-time text is specified to be carried in WebRTC
1555	   data channels as specified in
1556	   [I-D.ietf-mmusic-t140-usage-data-channel].  A few ways to handle
1557	   multi-party RTT are mentioned briefly.  They are repeated below.

1559	7.1.  RTT bridging in WebRTC with one data channel per source

1561	   A straightforward way to handle multi-party RTT is for the bridge to
1562	   open one T.140 data channel per source towards the receiving
1563	   participants.

1565	   The stream-id forms a unique stream identification.

1567	   The identification of the source is made through the Label property
1568	   of the channel, and session information belonging to the source.  The
1569	   endpoint can compose a readable label for the presentation from this
1570	   information.

1572	   Pros:

1574	   This is a straightforward solution.

1576	   The load per source is low.

1578	   Cons:

1580	   With a high number of participants, the overhead of establishing and
1581	   maintaining the high number of data channels required may be high,
1582	   even if the load per channel is low.

1584	7.2.  RTT bridging in WebRTC with one common data channel

1586	   A way to handle multi-party RTT in WebRTC is for the bridge to
1587	   combine text from all sources into one data channel and mark the
1588	   sources in the stream by a T.140 control code for source.

1590	   This method is described in a corresponding section for RTP
1591	   transmission above in Section 4.1.1.9.

1593	   The identification of the source is made through insertion in the
1594	   beginning of each text transmission from a source of a control code
1595	   extension "c" followed by a string representing the source, framed by
1596	   the control code start and end flags SOS and ST (See ITU-T T.140
1597	   [T140]).

1599	   A receiving endpoint is supposed to separate text items from the
1600	   different sources and identify and display them in a suitable way.

1602	   The endpoint does not always display the source identification in the
1603	   received text at the place where it is received, but has the
1604	   information as a guide for planning the presentation of received
1605	   text.  A label corresponding to the source identification is
1606	   presented when needed depending on the selected presentation style.
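
   A minimal, non-normative sketch of the receiver side of this method
   follows.  It assumes that the source marker is SOS (U+0098), the
   letter "c", a source string, and ST (U+009C), as outlined above.
   The "c" function is the extension proposed in this section and is
   not an existing T.140 function; the function name is hypothetical.

```python
from typing import List, Tuple

SOS = "\u0098"  # Start Of String control character
ST = "\u009c"   # String Terminator control character


def split_by_source(stream: str, initial: str = "") -> List[Tuple[str, str]]:
    """Split a combined T.140 stream into (source, text) segments."""
    segments: List[Tuple[str, str]] = []
    source = initial
    text: List[str] = []
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == SOS:
            end = stream.find(ST, i + 1)
            if end == -1:
                break  # incomplete control string; wait for more data
            body = stream[i + 1:end]
            if body.startswith("c"):
                # Proposed source marker: close the running segment.
                if text:
                    segments.append((source, "".join(text)))
                    text = []
                source = body[1:]
            # Other SOS..ST functions are skipped in this sketch.
            i = end + 1
        else:
            text.append(ch)
            i += 1
    if text:
        segments.append((source, "".join(text)))
    return segments


combined = SOS + "cAlice" + ST + "Hi " + SOS + "cBob" + ST + "Hello"
```

   The resulting segments carry the source string as a guide for the
   presentation; whether and where a label is shown remains a decision
   of the receiving application.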

1608	   Pros:

1610	   This solution has relatively low overhead on session and network
1611	   level.

1613	   Cons:

1615	   This solution has higher overhead on the media contents level than
1616	   the WebRTC solution above.

1618	   Standardisation of the new control code "c" in ITU-T T.140 [T140] is
1619	   required.

1621	   The conference server needs to be allowed to decrypt/encrypt the data
1622	   channel contents.

1624	7.3.  Preferred rtt multi-party method for WebRTC

1626	   For WebRTC, one method is preferred because of its simplicity.  So,
1627	   for WebRTC, the method to implement for multi-party RTT with multi-
1628	   party aware parties when no other method is explicitly agreed between
1629	   implementing parties is: "RTT bridging in WebRTC with one data
1630	   channel per source" (Section 7.1).

1632	8.  Presentation of multi-party text

1634	   All session participants with RTP based transport MUST observe the
1635	   SSRC/CSRC field of incoming text RTP packets, and make note of which
1636	   source they came from in order to be able to present text in a way
1637	   that makes it easy to read text from each participant in a session,
1638	   and get information about the source of the text.

1640	   In the WebRTC case, the Label parameter and other provided endpoint
1641	   information should be used for the same purpose.

1643	8.1.  Associating identities with text streams

1645	   A source identity SHOULD be composed from available information
1646	   sources and displayed together with the text as indicated in ITU-T
1647	   T.140 Appendix [T140].

1649	   The source identity should primarily be the NAME field from incoming
1650	   SDES packets.  If this information is not available, and the session
1651	   is a two-party session, then the T.140 source identity SHOULD be
1652	   composed from the SIP session participant information.  For multi-
1653	   party sessions the source identity may be composed by local
1654	   information if sufficient information is not available in the
1655	   session.
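
   The label composition can be sketched as below.  This is a
   non-normative illustration: the NAME field is preferred as stated
   above, while the remaining fallback order, the SourceInfo type and
   the "source-" placeholder are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SourceInfo:
    cname: str                         # RTCP SDES CNAME
    name: Optional[str] = None         # RTCP SDES NAME, if received
    sip_display: Optional[str] = None  # assumed: from SIP session data


def label_for(source_id: int, sources: Dict[int, SourceInfo]) -> str:
    """Compose a presentation label for an SSRC/CSRC value."""
    info = sources.get(source_id)
    if info is None:
        # Nothing known yet: locally composed placeholder (assumption).
        return "source-%08x" % source_id
    # Prefer NAME, then SIP participant information, then CNAME.
    return info.name or info.sip_display or info.cname


known = {
    0x1234: SourceInfo(cname="a2@beco.example.com", name="Alice"),
    0x5678: SourceInfo(cname="b7@beco.example.com", sip_display="Bob"),
    0x9ABC: SourceInfo(cname="e1@beco.example.com"),
}
```

   An application that abbreviates labels or replaces them with local
   nicknames would apply that step to the returned string.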

1657	   Applications may abbreviate the presented source identity to a
1658	   suitable form for the available display.

1660	   Applications may also replace received source information with
1661	   internally used nicknames.

1663	8.2.  Presentation details for multi-party aware endpoints

1665	   The multi-party aware endpoint should, after any action for recovery
1666	   of data from lost packets, separate the incoming streams and present
1667	   them according to the style that the receiving application supports
1668	   and the user has selected.  The decisions taken for presentation of
1669	   the multi-party interchange shall be purely on the receiving side.
1670	   The sending application must not insert any item in the stream to
1671	   influence presentation that is not requested by the sending
1672	   participant.

1674	8.2.1.  Bubble style presentation

1676	   One often used style is to present real-time text in chunks in
1677	   readable bubbles identified by labels containing names of sources.
1678	   Bubbles are placed in one column in the presentation area and are
1679	   closed and moved upwards in the presentation area after certain items
1680	   or events, when there is also newer text from another source that
1681	   would go into a new bubble.  The conditions that allow bubble
1682	   closing are any character closing a phrase or sentence followed by a
1683	   space, or a timeout of a suitable length (about 10 seconds).
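
   The closing rule can be sketched as follows.  This is a
   non-normative illustration; the set of sentence-closing characters
   and the exact timeout value are assumptions, and the function name
   is hypothetical.

```python
SENTENCE_END = ".!?"   # assumed set of phrase or sentence closers
IDLE_TIMEOUT = 10.0    # seconds, "about 10 seconds" in the text


def may_close_bubble(text: str, idle_seconds: float,
                     other_source_waiting: bool) -> bool:
    """Decide whether a real-time active bubble may be closed."""
    if not other_source_waiting:
        # Without newer text from another source the bubble stays open.
        return False
    if idle_seconds >= IDLE_TIMEOUT:
        return True
    # A sentence-closing character followed by a space.
    return len(text) >= 2 and text[-1] == " " and text[-2] in SENTENCE_END
```

   For example, may_close_bubble("Bob as well. ", 0.5, True) allows
   closing, while the same text without the trailing space does not
   until the timeout has passed.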

1685	   Real-time active text sent from the local user should be presented in
1686	   a separate area.  When there is a reason to close a bubble from the
1687	   local user, the bubble should be placed above all real-time active
1688	   bubbles, so that the time order that real-time text entries were
1689	   completed is visible.

1691	   Scrolling is usually provided for viewing of recent or older text.
1692	   When scrolling is done to an earlier point in the text, the
1693	   presentation shall not move the scroll position by new received text.
1694	   It must be the decision of the local user to return to automatic
1695	   viewing of latest text actions.  An indication that there is new
1696	   text to read may be useful after scrolling to an earlier position
1697	   has been activated.

1699	   The presentation area may become too small to present all text in all
1700	   real-time active bubbles.  Various techniques can be applied to
1701	   provide a good overview and good reading opportunity even in such
1702	   situations.  The active real-time bubble may have a limited number of
1703	   lines and if their contents need more lines, then a scrolling
1704	   opportunity within the real-time active bubble is provided.  Another
1705	   method can be to only show the label and the last line of the active
1706	   real-time bubble contents, and make it possible to expand or compress
1707	   the bubble presentation between full view and one line view.

1709	   Erasures require special consideration.  Erasure within a real-time
1710	   active bubble is straightforward.  But if erasure from one
1711	   participant affects the last character before a bubble, the whole
1712	   previous bubble becomes the actual bubble for real-time action by
1713	   that participant and is placed below all other bubbles in the
1714	   presentation area.  If the border between bubbles was caused by the
1715	   CRLF characters (instead of the normal "Line Separator"), only one
1716	   erasure action is required to erase this bubble border.  When a
1717	   bubble is closed, it is moved up, above all real-time active bubbles.

1719	   A three-party view is shown in this example.

1721	                 _________________________________________________
1722	                |                                              |^|
1723	                |                                              |-|
1724	                |[Alice] Hi, Alice here.                       | |
1725	                |                                              | |
1726	                |[Bob] Bob as well.                            | |
1727	                |                                              | |
1728	                |[Eve] Hi, this is Eve, calling from Paris.    | |
1729	                |      I thought you should be here.           | |
1730	                |                                              | |
1731	                |[Alice] I am coming on Thursday, my           | |
1732	                |      performance is not until Friday morning.| |
1733	                |                                              | |
1734	                |[Bob] And I on Wednesday evening.             | |
1735	                |                                              | |
1736	                |[Alice] Can we meet on Thursday evening?      | |
1737	                |                                              | |
1738	                |[Eve] Yes, definitely. How about 7pm.         | |
1739	                |     at the entrance of the restaurant        | |
1740	                |     Le Lion Blanc?                           | |
1741	                |[Eve] we can have dinner and then take a walk | |
1742	                |                                              | |
1743	                | <Eve-typing> But I need to be back to        | |
1744	                |    the hotel by 11 because I need            | |
1745	                |                                              | |
1746	                | <Bob-typing> I wou                           |-|
1747	                |______________________________________________|v|
1748	                | of course, I underst                           |
1749	                |________________________________________________|

1751	               Figure 1: Three-party call with bubble style.

1753	   Figure 1: Example of a three-party call presented in the bubble
1754	   style.

1756	8.2.2.  Other presentation styles

1758	   Other presentation styles than the bubble style may be arranged and
1759	   appreciated by the users.  In a video conference one way may be to
1760	   have a real-time text area below the video view of each participant.
1761	   Another view may be to provide one column in a presentation area for
1762	   each participant and place the text entries in a relative vertical
1763	   position corresponding to when text entry in them was completed.  The
1764	   labels can then be placed in the column header.  The considerations
1765	   for ending and moving and erasure of entered text discussed above for
1766	   the bubble style are valid also for these styles.

1768	   This figure shows how a coordinated column view MAY be presented.

1770	   _____________________________________________________________________
1771	   |       Bob          |       Eve            |       Alice           |
1772	   |____________________|______________________|_______________________|
1773	   |                    |                      |I will arrive by TGV.  |
1774	   |My flight is to Orly|                      |Convenient to the main |
1775	   |                    |Hi all, can we plan   |station.               |
1776	   |                    |for the seminar?      |                       |
1777	   |Eve, will you do    |                      |                       |
1778	   |your presentation on|                      |                       |
1779	   |Friday?             |Yes, Friday at 10.    |                       |
1780	   |Fine, wo            |                      |We need to meet befo   |
1781	   |___________________________________________________________________|

1783	   Figure 2: A coordinated column-view of a three-party session with
1784	   entries ordered in approximate time-order.

1786	9.  Presentation details for multi-party unaware endpoints

1788	   Multi-party unaware endpoints are prepared only for presentation of
1789	   two sources of text, the local user and a remote user.  If mixing for
1790	   multi-party unaware endpoints is to be supported, in order to enable
1791	   some multi-party communication with such endpoints, the mixer needs
1792	   to plan the presentation and insert labels and line breaks before
1793	   labels.  Many limitations appear for this presentation mode, and it
1794	   must be seen as a fallback and a last resort.

1796	   A procedure for presenting RTT to a conference-unaware endpoint is
1797	   included in [I-D.ietf-avtcore-multi-party-rtt-mix].

1799	10.  Security Considerations

1801	   The security considerations valid for RFC 4103 [RFC4103] and RFC 3550
1802	   [RFC3550] are valid also for the multi-party sessions with text.

1804	11.  IANA Considerations

1806	   The items for indication and negotiation of capability for multi-
1807	   party rtt should be registered with IANA in the specifications where
1808	   they are specified in detail.

1810	12.  Congestion considerations

1812	   The congestion considerations described in RFC 4103 [RFC4103] are
1813	   valid also for the recommended RTP-based multi-party use of the real-
1814	   time text transport.  A risk for congestion may appear if a number of
1815	   conference participants are actively transmitting text simultaneously,
1816	   because the recommended RTP-based multi-party transmission method
1817	   does not allow multiple sources of text to contribute to the same
1818	   packet.

1820	   In situations of risk for congestion, the Focus UA MAY combine
1821	   packets from the same source to increase the transmission interval
1822	   per source up to one second.  Local conference policy in the Focus UA
1823	   may be used to decide which streams shall be selected for such
1824	   transmission frequency reduction.
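
   A possible shape of such per-source combining is sketched below.
   It is non-normative: the class name is hypothetical, the 0.3 second
   default reflects the buffering time recommended in RFC 4103
   [RFC4103], and the policy that selects which sources get a longer
   interval is left out.

```python
from typing import List

NORMAL_INTERVAL = 0.3  # seconds, buffering time recommended in RFC 4103
MAX_INTERVAL = 1.0     # seconds, upper bound given in this section


class SourceBuffer:
    """Collects text from one source between transmissions."""

    def __init__(self, interval: float = NORMAL_INTERVAL) -> None:
        # Never exceed the one second limit, whatever the policy asks.
        self.interval = min(interval, MAX_INTERVAL)
        self.pending: List[str] = []
        self.last_sent = 0.0

    def add(self, text: str) -> None:
        self.pending.append(text)

    def poll(self, now: float) -> str:
        """Return the combined text if the interval has elapsed, else ''."""
        if not self.pending or now - self.last_sent < self.interval:
            return ""
        combined = "".join(self.pending)
        self.pending.clear()
        self.last_sent = now
        return combined
```

   The focus would keep one such buffer per source, so that text from
   different sources is never combined into the same packet.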

1826	13.  Acknowledgements

1828	   Arnoud van Wijk for contributions to an earlier, expired draft of
1829	   this memo.

1831	14.  Change history

1833	14.1.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-03

1835	   Modified info on the method with RFC 4103 format and sdp attribute
1836	   "rtt-mix-rtp-mixer".

1838	   Increased the performance requirements section.

1840	   Inserted recommendations, with emphasis on ease of implementation and
1841	   ease of standardisation.

1843	14.2.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-02

1845	   Added detail in the section on RTP translator model alternative
1846	   4.1.2.1.

1848	14.3.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-01

1850	   Added three more methods for RTP-mixer mixing.  Two RFC 5109 FEC
1851	   based and another with modified data header to detect source of
1852	   completely lost text.

1854	   Separated RTP-based and WebRTC based solutions.

1856	   Deleted the multi-party-unaware mixing procedure appendix.  It is now
1857	   included in the draft draft-ietf-avtcore-multi-party-rtt-mix.  Kept a
1858	   section with a reference to the new place.

1860	14.4.  Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to draft-
1861	       hellstrom-avtcore-multi-party-rtt-solutions-00

1863	   Add discussion about switching performance, as discussed in avtcore
1864	   on March 13.

1866	   Added that a decrease of transmission interval to 100 ms increases
1867	   switching performance by a factor of 3, which is still not sufficient.

1869	   Added that the CSRC-list method also uses 100 milliseconds
1870	   transmission interval.

1872	   Added the method with multiple primary text in each packet.

1874	   Added the timestamp-based method for rtp-mixing proposed by James
1875	   Hamlin on March 14.

1877	   Corrected the chat style presentation example picture.  Deleted a few
1878	   "[mix]".

1880	14.5.  Changes from version draft-hellstrom-mmusic-multi-party-rtt-01 to
1881	       -02

1883	   Change from a general overview to overview with clear
1884	   recommendations.

1886	   Splits text coordination methods in three groups.

1888	   Recommends rtt-mixer with sources in CSRC-list but refers to its
1889	   spec for details.

1891	   Shortened Appendix with conference-unaware example.

1893	   Cleaned up preferences.

1895	   Inserted pictures of screen-views.

1897	15.  References

1899	15.1.  Normative References

1901	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1902	              Requirement Levels", BCP 14, RFC 2119,
1903	              DOI 10.17487/RFC2119, March 1997,
1904	              <https://www.rfc-editor.org/info/rfc2119>.

1906	15.2.  Informative References

1908	   [EN301549] ETSI, "EN 301 549. Accessibility requirements for ICT
1909	              products and services", November 2019,
1910	              <https://www.etsi.org/deliver/
1911	              etsi_en/301500_301599/301549/03.01.01_60/
1912	              en_301549v030101p.pdf>.

1914	   [I-D.ietf-avtcore-multi-party-rtt-mix]
1915	              Hellstrom, G., "RTP-mixer formatting of multi-party Real-
1916	              time text", Work in Progress, Internet-Draft, draft-ietf-
1917	              avtcore-multi-party-rtt-mix-06, 11 June 2020,
1918	              <https://tools.ietf.org/html/draft-ietf-avtcore-multi-
1919	              party-rtt-mix-06>.

1921	   [I-D.ietf-avtcore-multiplex-guidelines]
1922	              Westerlund, M., Burman, B., Perkins, C., Alvestrand, H.,
1923	              and R. Even, "Guidelines for using the Multiplexing
1924	              Features of RTP to Support Multiple Media Streams", Work
1925	              in Progress, Internet-Draft, draft-ietf-avtcore-multiplex-
1926	              guidelines-12, 16 June 2020, <https://tools.ietf.org/html/
1927	              draft-ietf-avtcore-multiplex-guidelines-12>.

1929	   [I-D.ietf-mmusic-t140-usage-data-channel]
1930	              Holmberg, C. and G. Hellstrom, "T.140 Real-time Text
1931	              Conversation over WebRTC Data Channels", Work in Progress,
1932	              Internet-Draft, draft-ietf-mmusic-t140-usage-data-channel-
1933	              14, 10 April 2020, <https://tools.ietf.org/html/draft-
1934	              ietf-mmusic-t140-usage-data-channel-14>.

1936	   [I-D.ietf-perc-private-media-framework]
1937	              Jones, P., Benham, D., and C. Groves, "A Solution
1938	              Framework for Private Media in Privacy Enhanced RTP
1939	              Conferencing (PERC)", Work in Progress, Internet-Draft,
1940	              draft-ietf-perc-private-media-framework-12, 5 June 2019,
1941	              <https://tools.ietf.org/html/draft-ietf-perc-private-
1942	              media-framework-12>.

1944	   [NENAi3]   NENA, "NENA-STA-010.2-2016. Detailed Functional and
1945	              Interface Standards for the NENA i3 Solution", October
1946	              2016, <https://www.nena.org/page/i3_Stage3>.

1948	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1949	              Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse-
1950	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1951	              DOI 10.17487/RFC2198, September 1997,
1952	              <https://www.rfc-editor.org/info/rfc2198>.

1954	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
1955	              A., Peterson, J., Sparks, R., Handley, M., and E.
1956	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
1957	              DOI 10.17487/RFC3261, June 2002,
1958	              <https://www.rfc-editor.org/info/rfc3261>.

1960	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1961	              with Session Description Protocol (SDP)", RFC 3264,
1962	              DOI 10.17487/RFC3264, June 2002,
1963	              <https://www.rfc-editor.org/info/rfc3264>.

1965	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1966	              Jacobson, "RTP: A Transport Protocol for Real-Time
1967	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
1968	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

1970	   [RFC3840]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat,
1971	              "Indicating User Agent Capabilities in the Session
1972	              Initiation Protocol (SIP)", RFC 3840,
1973	              DOI 10.17487/RFC3840, August 2004,
1974	              <https://www.rfc-editor.org/info/rfc3840>.

1976	   [RFC3841]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
1977	              Preferences for the Session Initiation Protocol (SIP)",
1978	              RFC 3841, DOI 10.17487/RFC3841, August 2004,
1979	              <https://www.rfc-editor.org/info/rfc3841>.

1981	   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
1982	              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005,
1983	              <https://www.rfc-editor.org/info/rfc4103>.

1985	   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
1986	              Session Initiation Protocol (SIP)", RFC 4353,
1987	              DOI 10.17487/RFC4353, February 2006,
1988	              <https://www.rfc-editor.org/info/rfc4353>.

1990	   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
1991	              Session Initiation Protocol (SIP) Event Package for
1992	              Conference State", RFC 4575, DOI 10.17487/RFC4575, August
1993	              2006, <https://www.rfc-editor.org/info/rfc4575>.

1995	   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
1996	              (SIP) Call Control - Conferencing for User Agents",
1997	              BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006,
1998	              <https://www.rfc-editor.org/info/rfc4579>.

2000	   [RFC4597]  Even, R. and N. Ismail, "Conferencing Scenarios",
2001	              RFC 4597, DOI 10.17487/RFC4597, August 2006,
2002	              <https://www.rfc-editor.org/info/rfc4597>.

2004	   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
2005	              Correction", RFC 5109, DOI 10.17487/RFC5109, December
2006	              2007, <https://www.rfc-editor.org/info/rfc5109>.

2008	   [RFC5194]  van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real-
2009	              Time Text over IP Using the Session Initiation Protocol
2010	              (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008,
2011	              <https://www.rfc-editor.org/info/rfc5194>.

2013	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
2014	              Media Attributes in the Session Description Protocol
2015	              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
2016	              <https://www.rfc-editor.org/info/rfc5576>.

2018	   [RFC6443]  Rosen, B., Schulzrinne, H., Polk, J., and A. Newton,
2019	              "Framework for Emergency Calling Using Internet
2020	              Multimedia", RFC 6443, DOI 10.17487/RFC6443, December
2021	              2011, <https://www.rfc-editor.org/info/rfc6443>.

2023	   [RFC6881]  Rosen, B. and J. Polk, "Best Current Practice for
2024	              Communications Services in Support of Emergency Calling",
2025	              BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013,
2026	              <https://www.rfc-editor.org/info/rfc6881>.

2028	   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
2029	              DOI 10.17487/RFC7667, November 2015,
2030	              <https://www.rfc-editor.org/info/rfc7667>.

2032	   [T140]     ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for
2033	              multimedia application text conversation", February 1998,
2034	              <https://www.itu.int/rec/T-REC-T.140-199802-I/en>.

2036	   [T140ad1]  ITU-T, "Recommendation ITU-T T.140 Addendum 1 (02/2000),
2037	              Protocol for multimedia application text conversation",
2038	              February 2000,
2039	              <https://www.itu.int/rec/T-REC-T.140-200002-I!Add1/en>.

2041	   [TS103479] ETSI, "TS 103 479. Emergency communications (EMTEL); Core
2042	              elements for network independent access to emergency
2043	              services", December 2019, <https://www.etsi.org/deliver/
2044	              etsi_ts/103400_103499/103479/01.01.01_60/
2045	              ts_103479v010101p.pdf>.

2047	   [TS22173]  3GPP, "IP Multimedia Core Network Subsystem (IMS)
2048	              Multimedia Telephony Service and supplementary services;
2049	              Stage 1", 3GPP TS 22.173 17.1.0, 20 December 2019,
2050	              <http://www.3gpp.org/ftp/Specs/html-info/22173.htm>.

2052	   [TS24147]  3GPP, "Conferencing using the IP Multimedia (IM) Core
2053	              Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0,
2054	              19 December 2019,
2055	              <http://www.3gpp.org/ftp/Specs/html-info/24147.htm>.

2057	Author's Address

2059	   Gunnar Hellstrom
2060	   Gunnar Hellstrom Accessible Communication
2061	   Esplanaden 30
2062	   SE-136 70 Vendelso
2063	   Sweden

2065	   Phone: +46 708 204 288
2066	   Email: gunnar.hellstrom@ghaccess.se