idnits 2.17.1 

draft-hellstrom-avtcore-multi-party-rtt-solutions-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC4103]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (19 June 2020) is 1399 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'ICE' is mentioned on line 981, but not defined

  == Unused Reference: 'RFC3264' is defined on line 1890, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-20) exists of
     draft-ietf-avtcore-multi-party-rtt-mix-06


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                             G. Hellstrom
3	Internet-Draft                 Gunnar Hellstrom Accessible Communication
4	Intended status: Informational                              19 June 2020
5	Expires: 21 December 2020

7	           Real-time text solutions for multi-party sessions
8	          draft-hellstrom-avtcore-multi-party-rtt-solutions-02

10	Abstract

12	   This document specifies methods for Real-Time Text (RTT) media
13	   handling in multi-party calls.  The main transport is to carry Real-
14	   Time text by the RTP protocol in a time-sampled mode according to RFC
15	   4103 [RFC4103].  The mechanisms enable the receiving application to
16	   present the received real-time text media separated per source, in
17	   different ways according to user preferences.  Some presentation
18	   related features are also described explaining suitable variations of
19	   transmission and presentation of text.

21	   Call control features are described for the SIP environment.  A
22	   number of alternative methods for providing the multi-party
23	   negotiation, transmission and presentation are discussed and a
24	   recommendation for the main ones is provided.  The main solution for
25	   SIP based centralized multi-party handling of real-time text is
26	   achieved through a media control unit coordinating multiple RTP text
27	   streams into one RTP stream.

29	   Alternative methods using a single RTP stream and source
30	   identification inline in the text stream are also described, one of
31	   them being provided as a lower functionality fallback method for
32	   endpoints with no multi-party awareness for RTT.

34	   Bridging methods where the text stream is carried without the
35	   contents being dealt with in detail by the bridge are also discussed.

37	   Brief information is also provided for multi-party RTT in the WebRTC
38	   environment.

40	   The intention is to provide background for decisions, specification
41	   and implementation of selected methods.

43	Status of This Memo

45	   This Internet-Draft is submitted in full conformance with the
46	   provisions of BCP 78 and BCP 79.

48	   Internet-Drafts are working documents of the Internet Engineering
49	   Task Force (IETF).  Note that other groups may also distribute
50	   working documents as Internet-Drafts.  The list of current Internet-
51	   Drafts is at https://datatracker.ietf.org/drafts/current/.

53	   Internet-Drafts are draft documents valid for a maximum of six months
54	   and may be updated, replaced, or obsoleted by other documents at any
55	   time.  It is inappropriate to use Internet-Drafts as reference
56	   material or to cite them other than as "work in progress."

58	   This Internet-Draft will expire on 21 December 2020.

60	Copyright Notice

62	   Copyright (c) 2020 IETF Trust and the persons identified as the
63	   document authors.  All rights reserved.

65	   This document is subject to BCP 78 and the IETF Trust's Legal
66	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
67	   license-info) in effect on the date of publication of this document.
68	   Please review these documents carefully, as they describe your rights
69	   and restrictions with respect to this document.  Code Components
70	   extracted from this document must include Simplified BSD License text
71	   as described in Section 4.e of the Trust Legal Provisions and are
72	   provided without warranty as described in the Simplified BSD License.

74	Table of Contents

76	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
77	     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   5
78	   2.  Centralized conference model  . . . . . . . . . . . . . . . .   5
79	   3.  Requirements on multi-party RTT . . . . . . . . . . . . . . .   6
80	   4.  RTP based solutions . . . . . . . . . . . . . . . . . . . . .   7
81	     4.1.  Coordination of text RTP streams  . . . . . . . . . . . .   7
82	       4.1.1.  RTP-based solutions with a central mixer  . . . . . .   7
83	         4.1.1.1.  RTP Mixer using default RFC 4103 methods  . . . .   7
84	         4.1.1.2.  RTP Mixer using the default method but decreased
85	                 transmission interval . . . . . . . . . . . . . . .   8
86	         4.1.1.3.  RTP Mixer with frequent transmission and indicating
87	                 sources in CSRC-list  . . . . . . . . . . . . . . .   9
88	         4.1.1.4.  RTP Mixer using timestamp to identify
89	                 redundancy  . . . . . . . . . . . . . . . . . . . .  10
90	         4.1.1.5.  RTP Mixer with multiple primary data in each packet
91	                 and individual sequence numbers . . . . . . . . . .  11
92	         4.1.1.6.  RTP Mixer with multiple primary data in each
93	                 packet  . . . . . . . . . . . . . . . . . . . . . .  12
94	         4.1.1.7.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
95	                 in the packets  . . . . . . . . . . . . . . . . . .  13

97	         4.1.1.8.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
98	                 and separate sequence number in the packets . . . .  15
99	         4.1.1.9.  RTP Mixer indicating participants by a control code
100	                 in the stream . . . . . . . . . . . . . . . . . . .  17
101	         4.1.1.10. Mixing for multi-party unaware user agents  . . .  18
102	       4.1.2.  RTP-based bridging with minor RTT media contents
103	               reformatting by the bridge  . . . . . . . . . . . . .  20
104	         4.1.2.1.  RTP Translator sending one RTT stream per
105	                 participant . . . . . . . . . . . . . . . . . . . .  20
106	         4.1.2.2.  Distributing packets in an end-to-end encryption
107	                 structure . . . . . . . . . . . . . . . . . . . . .  23
108	         4.1.2.3.  Mesh of RTP endpoints . . . . . . . . . . . . . .  23
109	         4.1.2.4.  Multiple RTP sessions, one for each
110	                 participant . . . . . . . . . . . . . . . . . . . .  24
111	   5.  Preferred RTP-based multi-party RTT transport method  . . . .  25
112	   6.  Session control of RTP-based multi-party RTT sessions . . . .  25
113	     6.1.  Implicit RTT multi-party capability indication  . . . . .  26
114	     6.2.  RTT multi-party capability declared by SIP media-tags . .  27
115	     6.3.  SDP media attribute for RTT multi-party capability
116	           indication  . . . . . . . . . . . . . . . . . . . . . . .  28
117	     6.4.  Simplified SDP media attribute for RTT multi-party
118	           capability indication . . . . . . . . . . . . . . . . . .  29
119	     6.5.  SDP format parameter for RTT multi-party capability
120	           indication  . . . . . . . . . . . . . . . . . . . . . . .  30
121	     6.6.  A text media subtype for support of multi-party rtt . . .  31
122	     6.7.  Preferred capability declaration method for RTP-based
123	           transport.  . . . . . . . . . . . . . . . . . . . . . . .  31
124	     6.8.  Identification of the source of text for RTP-based
125	           solutions . . . . . . . . . . . . . . . . . . . . . . . .  32
126	   7.  RTT bridging in WebRTC  . . . . . . . . . . . . . . . . . . .  32
127	     7.1.  RTT bridging in WebRTC with one data channel per
128	           source  . . . . . . . . . . . . . . . . . . . . . . . . .  32
129	     7.2.  RTT bridging in WebRTC with one common data channel . . .  33
130	     7.3.  Preferred rtt multi-party method for WebRTC . . . . . . .  34
131	   8.  Presentation of multi-party text  . . . . . . . . . . . . . .  34
132	     8.1.  Associating identities with text streams  . . . . . . . .  34
133	     8.2.  Presentation details for multi-party aware endpoints. . .  35
134	       8.2.1.  Bubble style presentation . . . . . . . . . . . . . .  35
135	       8.2.2.  Other presentation styles . . . . . . . . . . . . . .  37
136	   9.  Presentation details for multi-party unaware endpoints. . . .  37
137	   10. Security Considerations . . . . . . . . . . . . . . . . . . .  37
138	   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  38
139	   12. Congestion considerations . . . . . . . . . . . . . . . . . .  38
140	   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  38
141	   14. Change history  . . . . . . . . . . . . . . . . . . . . . . .  38
142	     14.1.  Changes to
143	            draft-hellstrom-avtcore-multi-party-rtt-solutions-02 . .  38

145	     14.2.  Changes to
146	            draft-hellstrom-avtcore-multi-party-rtt-solutions-01 . .  38
147	     14.3.  Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to
148	            draft-hellstrom-avtcore-multi-party-rtt-solutions-00 . .  39
149	     14.4.  Changes from version
150	            draft-hellstrom-mmusic-multi-party-rtt-01 to -02 . . . .  39
151	   15. References  . . . . . . . . . . . . . . . . . . . . . . . . .  39
152	     15.1.  Normative References . . . . . . . . . . . . . . . . . .  39
153	     15.2.  Informative References . . . . . . . . . . . . . . . . .  39
154	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  43

156	1.  Introduction

158	   Real-time text (RTT) is a medium in real-time conversational
159	   sessions.  Text entered by participants in a session is transmitted
160	   in a time-sampled fashion, so that no specific user action is needed
161	   to cause transmission.  This gives a direct flow of text at the rate
162	   it is created, which is suitable in a real-time conversational
163	   setting.  The real-time text medium can be combined with other media
164	   in multimedia sessions.

166	   Media from a number of multimedia session participants can be
167	   combined in a multi-party session.  The present document specifies
168	   how the real-time text streams can be handled in multi-party
169	   sessions.  Recommendations are provided for preferred methods.

171	   The description is mainly focused on the transport level, but also
172	   describes a few session and presentation level aspects.

174	   Transport of real-time text is specified in RFC 4103 [RFC4103], "RTP
175	   Payload for Text Conversation".  It makes use of RTP, RFC 3550
176	   [RFC3550], for transport.  Robustness against network
177	   transmission problems is normally achieved through redundant
178	   transmission based on the principle from RFC 2198 [RFC2198], with one
179	   primary and two redundant transmissions of each text element.  Primary
180	   and redundant transmissions are combined in packets and described by
181	   a redundancy header.  This transport is usually used in the Session
182	   Initiation Protocol (SIP), RFC 3261 [RFC3261], environment.

184	   A very brief overview of functions for real-time text handling in
185	   multi-party sessions is given in RFC 4597 [RFC4597], "Conferencing
186	   Scenarios", sections 4.8 and 4.10.  The present specification builds
187	   on that description and indicates which protocol mechanisms should be
188	   used to implement multi-party handling of real-time text.

190	   Real-time text can also be transported in the WebRTC environment, by
191	   using WebRTC data channels according to
192	   [I-D.ietf-mmusic-t140-usage-data-channel].  Multi-party aspects for
193	   WebRTC solutions are briefly covered.

195	1.1.  Requirements Language

197	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
198	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
199	   document are to be interpreted as described in RFC 2119 [RFC2119].

201	2.  Centralized conference model

203	   In the centralized conference model for SIP, introduced in RFC 4353
204	   [RFC4353] "A Framework for Conferencing with the Session Initiation
205	   Protocol (SIP)", one function co-ordinates the communication with
206	   participants in the multi-party session.  This function also controls
207	   media mixer functions for the media appearing in the session.  The
208	   central function is common for control of all media, while the media
209	   mixers may work differently for each medium.

211	   The central function is called the Focus UA.  Many variants exist for
212	   setting up sessions including the multipoint control centre.  It is
213	   not within scope of this description to describe these, but rather
214	   the media specific handling in the mixer required to handle multi-
215	   party calls with RTT.

217	   The main principle for handling real-time text media in a centralized
218	   conference is that one RTP session for real-time text is established
219	   including the multipoint media control centre and the participating
220	   endpoints which are going to have real-time text exchange with the
221	   others.

223	   The different possible mechanisms for mixing and transporting RTT
224	   differ in the way they multiplex the text streams and how they
225	   identify the sources of the streams.  RFC 7667 [RFC7667] describes a
226	   number of possible use cases for RTP.  This specification refers to
227	   different sections of RFC 7667 for further reading of the situations
228	   caused by the different possible design choices.

230	   The recommended method for using RTT in a centralized conference
231	   model is specified in [I-D.ietf-avtcore-multi-party-rtt-mix] based on
232	   the recommendations in the present document.

234	   Real-time text can also be transported in the WebRTC environment, by
235	   using WebRTC data channels according to
236	   [I-D.ietf-mmusic-t140-usage-data-channel].  Ways to handle multi-
237	   party calls in that environment are also specified.

239	3.  Requirements on multi-party RTT

241	   The following requirements are placed on multi-party RTT:

243	      A solution shall be applicable to IMS (3GPP TS 22.173 [TS22173]),
244	      SIP based VoIP and Next Generation Emergency Services (NENA i3
245	      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]).

247	      The transmission interval for text must not be longer than 500
248	      milliseconds when there is anything available to send.  Ref ITU-T
249	      T.140 [T140].

251	      If text loss is detected or suspected, a missing text marker shall
252	      be inserted in the text stream.  Ref ITU-T T.140 Amendment 1
253	      [T140ad1] and ETSI EN 301 549 [EN301549].

255	      The display of text from the members of the conversation shall be
256	      arranged so that the text from each participant is clearly
257	      readable, and its source and the relative timing of entered text
258	      is visualized in the display.  Mechanisms for looking back in the
259	      contents from the current session should be provided.  The text
260	      should be displayed as soon as it is received.  Ref ITU-T T.140
261	      [T140]

263	      Bridges must be multimedia capable (voice, video, text).  Ref NENA
264	      i3 STA-010.2 [NENAi3].

266	      R7: It MUST be possible to use real-time text in conferences both
267	      as a medium of discussion between individual participants (for
268	      example, for sidebar discussions in real-time text while listening
269	      to the main conference audio) and for central support of the
270	      conference with real-time text interpretation of speech.  Ref RFC
271	      5194 [RFC5194].

273	      It should be possible to protect RTT contents with usual means for
274	      privacy and integrity.  Ref RFC 6881 section 16 [RFC6881].

276	      Conferencing procedures are documented in RFC 4579 [RFC4579].  Ref
277	      NENA i3 STA-010.2 [NENAi3].

279	      Conferencing applies to any kind of media stream by which users
280	      may want to communicate.  Ref 3GPP TS 24.147 [TS24147]

282	      The framework for SIP conferences is specified in RFC 4353
283	      [RFC4353].  Ref 3GPP TS 24.147 [TS24147]

285	      The mixer performance requirements can be expressed in two
286	      numbers.

288	      1) The number of participants who can transmit simultaneously with
289	      the text not being delayed in the mixer more than 500
290	      milliseconds.  This requirement depends on the application.
291	      Five simultaneously transmitting participants is a sufficiently high
292	      number for most situations.

294	      2) The switching time from when the mixer is transmitting text
295	      from one participant and text arrives from another participant,
296	      until the mixer sends the text from the second participant.  This
297	      time should not be more than 500 milliseconds when there are up to
298	      five participants sending text simultaneously.

300	4.  RTP based solutions

302	4.1.  Coordination of text RTP streams

304	   Coordinating and sending text RTP streams in the multi-party session
305	   can be done in a number of ways.  The most suitable methods are
306	   specified here with pros and cons.

308	   A receiving and presenting endpoint MUST separate text from the
309	   different sources and identify and display them accordingly.

311	4.1.1.  RTP-based solutions with a central mixer

313	   A set of solutions can be based on the central RTP mixer.  They are
314	   described here and a preferred method selected.

316	4.1.1.1.  RTP Mixer using default RFC 4103 methods

318	   Without any extra specifications, a mixer would transmit with 300
319	   milliseconds intervals, and use RFC 4103 [RFC4103] with the default
320	   redundancy of one original and two redundant transmissions.  The
321	   source of the text would be indicated by a single member in the CSRC
322	   list.  Text from different sources cannot be transmitted in the same
323	   packet.  Therefore, from the time when the mixer sent one piece of
324	   new text from one source, it will need to transmit that text again
325	   twice as redundant data, before it can send text from another source.
326	   The switching time will thus be 900 milliseconds.  The mixer cannot
327	   even send text from two simultaneous sources without introducing more
328	   than 500 milliseconds delay.  This is clearly insufficient.
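
	   The 900 millisecond figure above follows from simple arithmetic,
	   sketched here in Python.  The function name and parameters are
	   illustrative assumptions and not part of any specification; the
	   sketch only assumes that each piece of text occupies one packet
	   per transmission interval, once as primary and once per redundant
	   generation, before the mixer can switch source.

```python
def switching_time_ms(interval_ms: int, redundant_generations: int = 2) -> int:
    """Time the mixer stays with one source before it can switch.

    Each piece of text is sent once as primary and then once more per
    redundant generation, one packet per transmission interval.
    """
    transmissions = 1 + redundant_generations
    return interval_ms * transmissions


# Default RFC 4103 values: 300 ms interval, two redundant generations.
print(switching_time_ms(300))  # 900
```

	   With these assumptions, any switching time above 500 milliseconds
	   fails the mixer performance requirement stated in Section 3.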

330	   Pros:

332	   Only a capability negotiation method is needed.  No other update of
333	   standards is needed, just a general remark that traditional RTP-
334	   mixing is used.

336	   Cons:

338	   Clearly insufficient mixer switching performance.

340	   Somewhat complex handling of transmission when there is new text
341	   available from more than one source.  The mixer needs to send two
342	   more packets with redundant text from the current source before
343	   starting to send anything from the other source.

345	4.1.1.2.  RTP Mixer using the default method but decreased transmission
346	          interval

348	   This method makes use of the default RTP-mixing method briefly
349	   described in Section 4.1.1.1.  The only difference is that the
350	   transmission interval is decreased to 100 milliseconds when there is
351	   text from more than one source available for transmission.  This
352	   increases the switching performance to three source switches per
353	   second.  The delay of new text from a participant can be one second
354	   if five users send new text simultaneously.  Text from two
355	   simultaneous users would not be delayed more than 400 ms.

357	   Pros:

359	   Minor influence on standards.

361	   Can be declared in SDP as "text/red" with a multi-party attribute for
362	   capability negotiation.

364	   Cons:

366	   Too long delay of new text from more than two simultaneous sources.

368	   Slightly higher risk for loss of text at bursty packet loss than for
369	   the recommended transmission interval (300 ms) for RFC 4103.

371	   When complete loss of packets occurs (beyond recovery), it is not
372	   possible to deduce from which source text was lost.

374	   Somewhat complex handling of transmission when there is new text
375	   available from more than one source.  The mixer needs to send two
376	   more packets with redundant text from the current source before
377	   starting to send anything from the other source.

379	4.1.1.3.  RTP Mixer with frequent transmission and indicating sources in
380	          CSRC-list

382	   An RTP media mixer combines text from participants into one RTP
383	   stream, thus all using the same destination address/port combination,
384	   the same RTP SSRC, and one sequence number series as described in
385	   Sections 7.1 and 7.3 of RFC 3550 [RFC3550] about the Mixer
386	   function.  This method is also briefly described in RFC 7667
387	   [RFC7667], section 3.6.1, "Media-Mixing Mixer".

389	   The sources of the text in each RTP packet are identified by the CSRC
390	   list in the RTP packets, containing the SSRC of the initial sources
391	   of text.  The order of the CSRC parameters is with the SSRC of the
392	   source of the primary text first, followed by the SSRC of the first
393	   level redundancy, and then the second level redundancy.
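
	   The CSRC ordering described above can be illustrated with a small
	   Python sketch that packs the fixed RTP header of RFC 3550 followed
	   by an ordered CSRC list.  The function names, the payload type 101
	   and the SSRC values are hypothetical and chosen only for
	   illustration; padding, header extension and marker bit are left at
	   zero for brevity.

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, mixer_ssrc, csrcs):
    """Pack the 12-byte RTP header followed by an ordered CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("the CC field is 4 bits, so at most 15 CSRCs fit")
    first = (2 << 6) | len(csrcs)        # V=2, P=0, X=0, CC=len(csrcs)
    second = payload_type & 0x7F         # M=0, 7-bit payload type
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_csrcs(packet):
    """Return the CSRC list of a packet in the order it was sent."""
    cc = packet[0] & 0x0F
    return list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))


# Hypothetical sources: primary text from ALICE, first-level redundancy
# from BOB, second-level redundancy from ALICE again.
ALICE, BOB, MIXER = 0x11111111, 0x22222222, 0x99999999
packet = build_rtp_header(101, 7, 1000, MIXER, [ALICE, BOB, ALICE])
```

	   A receiver calling parse_csrcs(packet) gets the sources back in
	   the same order, primary first, which is what allows recovered
	   redundant text to be assigned to the right participant.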

395	   The transmission interval should be 100 milliseconds when there is
396	   text to transmit from more than one source, and otherwise 300 ms.

398	   The identification of the sources is made through the CSRC fields and
399	   can be made more readable at the receiver through the RTCP SDES CNAME
400	   and NAME items as described in RTP [RFC3550].

402	   Information provided through the notification according to RFC 4575
403	   [RFC4575] when the participant joined the conference also provides
404	   suitable information and a reference to the SSRC.

406	   A receiving endpoint is supposed to separate text items from the
407	   different sources and identify and display them accordingly.

409	   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
410	   possible to recover from loss of one and two packets in sequence and
411	   assign the recovered text to the right source.  For more loss, a
412	   marker for possible loss should be inserted or presented.
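
	   The recovery rule above can be sketched in Python as follows.
	   This is a minimal illustration under stated assumptions, not the
	   normative procedure: the function name is hypothetical, the
	   payload is assumed to be already split into text generations
	   listed in the same order as the CSRC list (primary first), and
	   U+FFFD is assumed as the missing text marker character.

```python
MISSING_TEXT_MARKER = "\ufffd"  # assumed marker character, see lead-in


def texts_to_present(last_seq, seq, csrcs, generations):
    """Return (source, text) pairs to present, oldest first.

    generations[0] is the primary text, generations[1] the first-level
    redundancy and so on; csrcs lists the sources in the same order.
    A source of None means the origin of the lost text is unknown.
    """
    gap = (seq - last_seq) & 0xFFFF          # 16-bit sequence arithmetic
    if gap == 0 or gap > 0x7FFF:
        return []                            # duplicate or late packet
    lost = gap - 1
    redundant_levels = len(generations) - 1
    result = []
    if lost > redundant_levels:
        # More packets lost than the redundancy covers: source unknown.
        result.append((None, MISSING_TEXT_MARKER))
        lost = redundant_levels
    for level in range(lost, 0, -1):         # oldest lost text first
        if generations[level]:
            result.append((csrcs[level], generations[level]))
    result.append((csrcs[0], generations[0]))
    return result
```

	   For example, if the last received sequence number was 10 and
	   packet 13 arrives, the two lost packets are covered by the two
	   redundant generations and each recovered piece of text is paired
	   with its own CSRC entry.  If packet 15 arrives instead, one piece
	   of text is beyond recovery and the marker is returned with an
	   unknown source.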

414	   The conference server needs to have authority to decrypt the payload
415	   in the received RTP packets in order to be able to recover text from
416	   redundant data or insert the missing text marker in the stream, and
417	   repack the text in new packets.

419	   Even if the format is very similar to "text/red" of RFC 4103, it has
420	   been indicated that it needs to be declared as a new media subtype,
421	   e.g. "text/rex".

423	   Pros:

425	   This method has low overhead and less complexity than the methods in
426	   Section 4.1.1.1, Section 4.1.1.2, Section 4.1.1.4 and
427	   Section 4.1.1.6.

429	   When loss of packets occurs, it is possible to recover text from
430	   redundancy at loss of up to the number of redundancy levels carried
431	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
432	   levels).

434	   This method can be implemented with most RTP implementations.

436	   The source switching performance is sufficient for well-behaving
437	   conference participants.  There can be switching between five sources
438	   per second with an introduced delay of maximum 500 ms.  With just two
439	   parties typing simultaneously, the delay will be a maximum of 100 ms.

441	   Cons:

443	   When more consecutive packets are lost than there are generations of
444	   redundant data, it is not possible to deduce the sources of
445	   the totally lost data.

447	   Slightly higher risk for loss of text at bursty packet loss than for
448	   the recommended transmission interval for RFC 4103.

450	   Requires a different media subtype, e.g. "text/rex".

452	   The conference server needs to be allowed to decrypt/encrypt the
453	   packet payload.  This is however normal for media mixers for other
454	   media.

456	4.1.1.4.  RTP Mixer using timestamp to identify redundancy

458	   This method has text only from one source per packet, as the original
459	   RFC 4103 [RFC4103] specifies.  Packets with text from different
460	   sources are instead allowed to be interleaved.  The recovery procedure in
461	   the receiver will use the RTP timestamp and timestamp offsets in the
462	   redundancy headers to evaluate if a piece of redundant data should be
463	   recovered or not in case of packet loss.

465	   In this method, the transmission interval is 100 milliseconds when
466	   text from more than one source is available for transmission.

468	   Pros:

470	   The format of each packet is equal to what is specified in RFC 4103
471	   [RFC4103].

473	   The source switching performance is sufficient.  Text from five
474	   participants can be transmitted simultaneously with 500 milliseconds
475	   interval per source.

477	   New text from five simultaneous sources can be transmitted within 500
478	   milliseconds.  This is sufficient.

480	   Cons:

482	   The recovery time in case of packet loss is long.  With five
483	   participants, it will be 1.5 seconds.

485	   The recovery procedure is complex and very different from what is
486	   described in RFC 4103 [RFC4103].

488	   It is not certain that this change can be regarded as an update to
489	   RFC 4103.  It may need a new media subtype.

491	4.1.1.5.  RTP Mixer with multiple primary data in each packet and
492	          individual sequence numbers

494	   This method allows primary as well as redundant text from more than
495	   one source per packet.  The packet payload contains an ordered set of
496	   redundant and primary data with the same number of generations of
497	   redundancy as once agreed in the SDP negotiation.  The data header
498	   reflects these parts of the payload.  The CSRC list contains one CSRC
499	   member per source in the payload and in the same order.  An
500	   individual sequence number per source is included in the data header
501	   replacing the t140 payload type number that is instead assumed to be
502	   constant in this format.  This allows an individual extra sequence
503	   number per source with maximum value 127, suitable for checking for
504	   which source loss of text appeared when recovery was not possible.

506	   The data header would contain the following fields:
507	     0                   1                    2                   3
508	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
509	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
510	   |F| Source-seq  |  timestamp offset         |   block length    |
511	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
512	   Where "Source-seq" is the sequence number per source.
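
	   The 32-bit layout drawn above can be packed and parsed as in the
	   following Python sketch.  The field widths (1, 7, 14 and 10 bits)
	   are read from the diagram; the function names are hypothetical,
	   and the sketch does not address whether the last header in a
	   packet uses a shorter form.

```python
import struct


def pack_data_header(source_seq, ts_offset, block_len, more=True):
    """Pack F, Source-seq, timestamp offset and block length into 4 bytes."""
    if not 0 <= source_seq <= 127:
        raise ValueError("Source-seq is a 7-bit field")
    if not 0 <= ts_offset < (1 << 14):
        raise ValueError("timestamp offset is a 14-bit field")
    if not 0 <= block_len < (1 << 10):
        raise ValueError("block length is a 10-bit field")
    word = (int(more) << 31) | (source_seq << 24) | (ts_offset << 10) | block_len
    return struct.pack("!I", word)


def unpack_data_header(data):
    """Parse the first 4 bytes of data into the header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "F": word >> 31,
        "source_seq": (word >> 24) & 0x7F,
        "ts_offset": (word >> 10) & 0x3FFF,
        "block_len": word & 0x3FF,
    }
```

	   Because Source-seq is 7 bits wide, a sender would have to wrap
	   the per-source counter after 127, and a receiver comparing
	   counters would need modulo-128 arithmetic.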

514	   The maximum number of members in the CSRC-list is 16, and that is
515	   therefore the maximum number of sources that can be represented in
516	   each packet provided that all data can be fitted into the size
517	   allowable in one packet.

519	   Transmission is done as soon as there is new text available, but not
520	   with shorter interval than 150 ms and not longer than 300 ms while
521	   there is anything to send.

523	   A new media subtype is needed, e.g. "text/rex".

525	   This is an SDP offer example for both traditional "text/red"
526	   and multi-party "text/rex" format:

528	         m=text 11000 RTP/AVP 101 100 98
529	         a=rtpmap:98 t140/1000
530	         a=rtpmap:100 red/1000
531	         a=rtpmap:101 rex/1000
532	         a=fmtp:100 98/98/98
533	         a=fmtp:101 98/98/98
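
	   As an illustration of how an answerer might process such an
	   offer, the following Python sketch maps the payload types on the
	   m=text line to their encoding names and picks the first one it
	   supports.  The function names are hypothetical, "rex" is only
	   the example subtype name used in this section, and a real SDP
	   parser needs to handle far more than these two line types.

```python
OFFER = """\
m=text 11000 RTP/AVP 101 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=rtpmap:101 rex/1000
a=fmtp:100 98/98/98
a=fmtp:101 98/98/98
"""


def text_formats(sdp):
    """Return (payload type, encoding name) pairs in offered order."""
    names, order = {}, []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=text "):
            order = line.split()[3:]         # payload types after the proto
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            names[pt] = encoding.split("/")[0].lower()
    return [(pt, names.get(pt)) for pt in order]


def pick_format(sdp, supported=("rex", "red", "t140")):
    """Pick the first offered format that the answerer supports."""
    for pt, name in text_formats(sdp):
        if name in supported:
            return pt, name
    return None
```

	   An endpoint without multi-party support would call pick_format
	   with supported=("red", "t140") and fall back to payload type 100.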

535	   Pros:

537	   The source switching performance is good.  Text from 16 participants
538	   can be transmitted simultaneously.

540	   New text from 16 simultaneous sources can be transmitted within 300
541	   milliseconds.  This is good performance.

543	   When more consecutive packets are lost than there are generations of
544	   redundant data, it is still possible to deduce the sources of
545	   the totally lost data when the next text from these sources arrives.

547	   Cons:

549	   The format of each packet is different from what is specified in RFC
550	   4103 [RFC4103].

552	   A new media subtype is needed.

554	   The recovery procedure is a bit complex.

556	4.1.1.6.  RTP Mixer with multiple primary data in each packet

558	   This method allows primary as well as redundant text from more than
559	   one source per packet.  The packet payload contains an ordered set of
560	   redundant and primary data with the same number of generations of
561	   redundancy as once agreed in the SDP negotiation.  The data header
562	   reflects these parts of the payload.  The CSRC list contains one CSRC
563	   member per source in the payload and in the same order.
564	   The maximum number of members in the CSRC-list is 16, and that is
565	   therefore the maximum number of sources that can be represented in
566	   each packet provided that all data can be fitted into the size
567	   allowable in one packet.

569	   Transmission is done as soon as there is new text available, but not
570	   with shorter interval than 150 ms and not longer than 300 ms while
571	   there is anything to send.

573	   A new media subtype is needed, e.g. "text/rex".

575	   SDP would be the same as in Section 4.1.1.5.

577	   Pros:

579	   The source switching performance is good.  Text from 16 participants
580	   can be transmitted simultaneously.

582	   New text from 16 simultaneous sources can be transmitted within 150
583	   milliseconds.  This is good performance.

585	   Cons:

587	   The format of each packet is different from what is specified in RFC
588	   4103 [RFC4103].

590	   A new media subtype is needed.

592	   The recovery procedure is a bit complex [RFC4103].

594	   When more consecutive packets are lost than there are generations of
595	   redundant data, it is not possible to deduce the sources of
596	   the totally lost data.

598	4.1.1.7.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy in the
599	          packets

601	   This method allows primary data from one source and redundant text
602	   from other sources in each packet.  The packet payload contains
603	   primary data in "text/t140" format, and redundant data in RFC 5109
604	   FEC [RFC5109] format called "text/ulpfec".  That means that the
605	   redundant data contains the sequence number and the CSRC and other
606	   characteristics from the RTP header when the data was sent as
607	   primary.  The redundancy can be sent at a selected number of packets
608	   after when it was sent as primary, in order to improve the protection
609	   against bursty packet loss.  The redundancy level is recommended to
610	   be the same as in original RFC 4103.

612	   RFC 4103 says that the protection against loss can be made by other
613	   methods than plain redundancy, so this method is in line with that
614	   statement.

616	   Transmission is done as soon as there is new text available, but not
617	   with shorter interval than 100 ms and not longer than 300 ms while
618	   there is anything to send (new or redundant text).

620	   When more consecutive packets are lost than the number of
621	   generations of redundant data, it is not possible to deduce the
622	   sources of the totally lost data.

624	   SDP can indicate the format as "text/red" with "text/ulpfec"
625	   redundant data in the following way, with traditional RFC 4103
626	   "text/red" with "text/t140" redundant data as a fallback.

628	   m=text 49170 RTP/AVP 98 101 100 102
629	   a=rtpmap:98 red/1000
630	   a=fmtp:98 100/102/102
631	   a=rtpmap:102 ulpfec/1000
632	   a=rtpmap:100 t140/1000
633	   a=rtpmap:101 red/1000
634	   a=fmtp:101 100/100/100
635	   a=fmtp:100 cps=200
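
	   As an illustration of how a receiver might read the declaration
	   above, the following sketch maps each "red" payload type to the
	   formats of its generations.  The function name and the simplified
	   line-by-line parsing are assumptions; a real SDP parser handles many
	   more cases.

```python
def red_formats(sdp_text):
    """Map each RED payload type to the subtype names of its generations."""
    rtpmap = {}  # payload type -> encoding name, e.g. "100" -> "t140"
    fmtp = {}    # payload type -> raw fmtp parameter string
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            rtpmap[pt] = encoding.split("/")[0].lower()
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(None, 1)
            fmtp[pt] = params.strip()
    result = {}
    for pt, name in rtpmap.items():
        if name == "red" and pt in fmtp:
            # The RED fmtp lists primary and redundant payload types,
            # separated by "/".
            result[pt] = [rtpmap.get(g, "?") for g in fmtp[pt].split("/")]
    return result


EXAMPLE_SDP = """m=text 49170 RTP/AVP 98 101 100 102
a=rtpmap:98 red/1000
a=fmtp:98 100/102/102
a=rtpmap:102 ulpfec/1000
a=rtpmap:100 t140/1000
a=rtpmap:101 red/1000
a=fmtp:101 100/100/100
a=fmtp:100 cps=200
"""

formats = red_formats(EXAMPLE_SDP)
```

	   For the example, payload type 98 resolves to t140 primary data with
	   two ulpfec generations, and payload type 101 to the traditional
	   all-t140 fallback.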

637	   The "text/ulpfec" format includes an indication of how far back the
638	   redundancy belongs, making it possible to cover bursty packet loss
639	   better than the other formats with short transmission intervals.  For
640	   real-time text, it is recommended to send three packets between the
641	   primary and the redundant transmissions of text.  That makes the
642	   transmission cover between 500 and 1500 ms of bursty packet loss.
643	   The variation is because of the varying packet interval between many
644	   and one simultaneously transmitting source.

646	   The "text/ulpfec" format has a number of parameters.  One is the
647	   length of the data to be protected which in this case must be the
648	   whole t140block.

650	   Pros:

652	   The source switching performance is good.  Text from 5 participants
653	   can be transmitted within 500 ms.

655	   Good recovery from bursty packet loss.

657	   The method is based on existing standards.  No new registrations are
658	   needed.

660	   Cons:

662	   When more consecutive packets are lost than the number of
663	   generations of redundant data, it is not possible to deduce the
664	   sources of the totally lost data.

666	   Even if the switching performance is good, it is not as good as for
667	   the method called "RTP Mixer with multiple primary data in each
668	   packet" (Section 4.1.1.6).  With more than 5 simultaneously sending
669	   sources, there will be a noticeable delay of text of over 500 ms,
670	   with 100 ms added per simultaneous source.  This is however beyond
671	   the requirements and would be a concern only in congestion
672	   situations.

674	   The recovery procedure is a bit complex [RFC5109].

676	   There is more overhead in terms of extra data and extra packets sent
677	   than in the other methods.  With the recommended two redundant
678	   generations of data, each packet will be 36 bytes longer than with
679	   traditional RFC 4103, and at each pause in transmission five extra
680	   packets with only redundant data will be sent compared to two extra
681	   packets for the traditional RFC 4103 case.
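
	   The switching delay figures given for this method (5 sources within
	   500 ms, 100 ms added per further source) follow from sending primary
	   text from one source per packet at the 100 ms minimum interval.  A
	   small sketch of that arithmetic, with a hypothetical function name:

```python
def rotation_delay_ms(simultaneous_sources, packet_interval_ms=100):
    """Time for every simultaneously sending source to get one packet.

    Assumes one source's primary text per packet and a fixed packet
    interval, as in the figures quoted in this section.
    """
    return simultaneous_sources * packet_interval_ms
```

	   Five sources give 500 ms; a sixth raises the figure to 600 ms.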

683	4.1.1.8.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy and
684	          separate sequence number in the packets

686	   This method allows primary data from one source and redundant text
687	   from other sources in each packet.  The packet payload contains
688	   primary data in a new "text/t140e" format, and redundant data in RFC
689	   5109 FEC [RFC5109] format called "text/ulpfec".  That means that the
690	   redundant data contains the sequence number and the CSRC and other
691	   characteristics from the RTP header when the data was sent as
692	   primary.  The redundancy can be sent at a selected number of packets
693	   after when it was sent as primary, in order to improve the protection
694	   against bursty packet loss.  The redundancy level is recommended to
695	   be the same as in original RFC 4103.  The "text/t140e" format
696	   contains a source-specific sequence number and the t140block.

698	   RFC 4103 says that the protection against loss can be made by other
699	   methods than plain redundancy, so this method is in line with that
700	   statement.

702	   Transmission is done as soon as there is new text available, but not
703	   with shorter interval than 100 ms and not longer than 300 ms while
704	   there is anything to send (new or redundant text).

706	   When more consecutive packets are lost than the number of
707	   generations of redundant data, it is possible to deduce which
708	   sources lost data when new data arrives from those sources.  This
709	   is done by monitoring the received source-specific sequence numbers
710	   preceding the text.
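
	   A receiver-side sketch of this monitoring is shown below.  The class
	   name and the 16-bit wrap-around of the source-specific sequence
	   number are assumptions; the size of that number is not specified
	   here.

```python
class SourceLossTracker:
    """Track source-specific sequence numbers to attribute lost text."""

    def __init__(self, modulo=1 << 16):
        self._modulo = modulo  # assumed 16-bit sequence number space
        self._last_seq = {}    # source id (e.g. CSRC) -> last number seen

    def observe(self, source_id, seq):
        """Record a received text block; return how many were missed."""
        previous = self._last_seq.get(source_id)
        if previous is None:
            self._last_seq[source_id] = seq
            return 0
        delta = (seq - previous) % self._modulo
        if delta == 0 or delta >= self._modulo // 2:
            # Duplicate or older block: keep the newer state, report no loss.
            return 0
        self._last_seq[source_id] = seq
        return delta - 1
```

	   When a block from a source arrives with a jump in its own number
	   series, the gap size tells how many blocks from that source were
	   lost, so the loss marker can be placed in the right place.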

712	   This is an example of how SDP can indicate the format as "text/red"
713	   with "text/t140e" as primary and "text/ulpfec" as redundant data,
714	   with traditional RFC 4103 "text/red" with "text/t140" redundant
715	   data as a fallback.

717	   m=text 49170 RTP/AVP 98 101 100 102 103
718	   a=rtpmap:98 red/1000
719	   a=fmtp:98 100/102/102
720	   a=rtpmap:102 ulpfec/1000
721	   a=rtpmap:103 t140/1000
722	   a=rtpmap:100 t140e/1000
723	   a=rtpmap:101 red/1000
724	   a=fmtp:101 103/103/103
725	   a=fmtp:100 cps=200

727	   The "text/ulpfec" format includes an indication of how far back the
728	   redundancy belongs, making it possible to cover bursty packet loss
729	   better than the other formats with short transmission intervals.  For
730	   real-time text, it is recommended to send three packets between the
731	   primary and the redundant transmissions of text.  That makes the
732	   transmission cover between 500 and 1500 ms of bursty packet loss.
733	   The variation is because of the varying packet interval between many
734	   and one simultaneously transmitting source.

736	   The "text/ulpfec" format has a number of parameters.  One is the
737	   length of the data to be protected which in this case must be the
738	   whole t140block.

740	   Pros:

742	   The source switching performance is good.  Text from 5 participants
743	   can be transmitted within 500 ms.

745	   Good recovery from bursty packet loss.

747	   The method is based on an existing standard for FEC.

749	   When more consecutive packets are lost than the number of
750	   generations of redundant data, it is possible to deduce the source
751	   of the lost data when new text arrives from that source.

753	   Cons:

755	   Even if the switching performance is good, it is not as good as for
756	   the method called "RTP Mixer with multiple primary data in each
757	   packet" (Section 4.1.1.6).  With more than 5 simultaneously sending
758	   sources, there will be a noticeable delay of text of over 500 ms,
759	   with 100 ms added per simultaneous source.  This is however beyond
760	   the requirements and would be a concern only in congestion
761	   situations.

763	   The recovery procedure is a bit complex [RFC5109].

765	   There is more overhead in terms of extra data and extra packets sent
766	   than in the other methods.  With the recommended two redundant
767	   generations of data, each packet will be 40 bytes longer than with
768	   traditional RFC 4103, and at each pause in transmission five extra
769	   packets with only redundant data will be sent compared to two extra
770	   packets for the traditional RFC 4103 case.

772	   A new text media subtype "text/t140e" needs to be registered.

774	4.1.1.9.  RTP Mixer indicating participants by a control code in the
775	          stream

777	   Text from all participants except the receiving one is transmitted
778	   from the media mixer in the same RTP session and stream, thus all
779	   using the same destination address/port combination, the same RTP
780	   SSRC, and one sequence number series as described in Sections 7.1
781	   and 7.3 of RTP RFC 3550 [RFC3550] about the Mixer function.  The
782	   sources of the text in each RTP packet are identified by a newly
783	   defined T.140 control code "c" followed by a unique identification
784	   of the source in UTF-8 string format.

786	   The receiver can use the string for presenting the source of text.
787	   This method is on the RTP level described in RFC 7667, section 3.6.1
788	   Media mixing mixer [RFC7667].

790	   The inline coding of the source of text is applied in the data stream
791	   itself, and an RTP mixer function is used for coordinating the
792	   sources of text into one RTP stream.

794	   Information uniquely identifying each user in the multi-party session
795	   is placed as the parameter value "n" in the T.140 application
796	   protocol function with the function code "c".  The identifier shall
797	   thus be formatted like this: SOS c n ST, where SOS and ST are coded
798	   as specified in ITU-T T.140 [T140].  The "c" is the letter "c".  The
799	   n parameter value is a string uniquely identifying the source.  This
800	   parameter shall be kept short so that it can be repeated in the
801	   transmission without concerns for network load.
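
	   The "SOS c n ST" format can be sketched as below.  The code points
	   U+0098 for SOS and U+009C for ST are taken to be the T.140 values;
	   the function names and the parsing strategy are assumptions made for
	   illustration only.

```python
SOS = "\u0098"  # Start Of String (assumed T.140 code point)
ST = "\u009c"   # String Terminator (assumed T.140 code point)


def encode_source_label(name):
    """Build the UTF-8 bytes of the proposed "SOS c n ST" identifier."""
    if SOS in name or ST in name:
        raise ValueError("label must not contain SOS or ST")
    return (SOS + "c" + name + ST).encode("utf-8")


def split_by_source(payload):
    """Split mixed text into (source, text) segments; hypothetical helper."""
    text = payload.decode("utf-8")
    segments, source, pos = [], None, 0
    while pos < len(text):
        start = text.find(SOS, pos)
        end = text.find(ST, start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete control sequence: the rest is plain text.
            segments.append((source, text[pos:]))
            break
        if start > pos:
            segments.append((source, text[pos:start]))
        function = text[start + 1:end]
        if function.startswith("c"):
            # Function code "c": the parameter names the new source.
            source = function[1:]
        pos = end + 1
    return segments
```

	   A short label keeps the repeated identifier small: a one-letter
	   label costs six bytes in UTF-8 with these code points.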

803	   A receiving endpoint is supposed to separate text items from the
804	   different sources and identify and display them accordingly.

806	   The conference server needs to be allowed to decrypt/encrypt the
807	   packet payload in order to check the source and repack the text.

809	   Pros:

811	   If packets are lost, it is possible to recover text from
812	   redundancy at loss of up to the number of redundancy levels carried
813	   in the RFC 4103 [RFC4103] stream (normally primary and two
814	   redundant levels).

816	   This method can be implemented with most RTP implementations.

818	   The method can also be used with other transports than RTP.

820	   Cons:

822	   The method implies a moderate load by the need to insert the source
823	   often in the stream.

825	   If more consecutive packets are lost than the number of generations
826	   of redundant data, it is not possible to deduce the source of the
827	   totally lost data.

829	   The mixer needs to be able to generate unique source
830	   identifications that are suitable as labels for the sources.

832	   Requires an extension of the ITU-T T.140 standard, best made by the
833	   ITU.

835	   There is a risk that the control code indicating the change of source
836	   is lost and the result is false source indication of text.

838	   The conference server needs to be allowed to decrypt/encrypt the
839	   packet payload.

841	4.1.1.10.  Mixing for multi-party unaware user agents

843	   Multi-party real-time text contents can be transmitted to multi-party
844	   unaware user agents if source labelling and formatting of the text is
845	   performed by a mixer.  This method has the limitations that the
846	   layout of the presentation and the format of source identification is
847	   purely controlled by the mixer, and that only one source at a time is
848	   allowed to present in real-time.  Other sources need to be stored
849	   temporarily waiting for an appropriate moment to switch the source of
850	   transmitted text.  The mixer controls the switching of sources and
851	   inserts a source identifier in text format at the beginning of text
852	   after switch of source.  The logic of the mixer to detect when a
853	   switch is appropriate should detect a number of places in text where
854	   a switch can be allowed, including new line, end of sentence, end of
855	   phrase, a period of inactivity, and a word separator after a long
856	   time of active transmission.
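
	   The switch-point logic described above could look roughly like the
	   sketch below.  The character sets and both time limits are
	   assumptions chosen for illustration; no values are given in this
	   document.

```python
LINE_END = frozenset("\n\r\u2028")  # new line (U+2028 is the line separator)
SENTENCE_END = frozenset(".!?")     # end of sentence
PHRASE_END = frozenset(",;:")       # end of phrase
WORD_SEPARATOR = frozenset(" \t")   # word separator


def switch_allowed(last_char, idle_ms, active_ms,
                   idle_limit_ms=7000, active_limit_ms=15000):
    """Decide whether the mixer may switch away from the current source.

    last_char is the most recently forwarded character ("" if none),
    idle_ms the time since the source last sent text, and active_ms how
    long the source has been presented.  Both limits are hypothetical.
    """
    if idle_ms >= idle_limit_ms:
        return True  # a period of inactivity
    if last_char in LINE_END or last_char in SENTENCE_END:
        return True  # new line or end of sentence
    if last_char in PHRASE_END:
        return True  # end of phrase
    # A word separator is accepted only after long active transmission.
    return last_char in WORD_SEPARATOR and active_ms >= active_limit_ms
```

	   Text from the other sources stays queued until this check returns
	   true for the currently presented source.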

858	   This method MAY be used when no support for multi-party awareness is
859	   detected in the receiving endpoint.  The basis for this method is
860	   described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667].

862	   See [I-D.ietf-avtcore-multi-party-rtt-mix] for a procedure for mixing
863	   RTT for a conference-unaware endpoint.

865	   Pros:

867	   Can be transmitted to conference-unaware endpoints.

869	   Can be used with other transports than RTP.

871	   Cons:

873	   Does not allow full real-time presentation of more than one source at
874	   a time.  Text from other sources will be delayed.

876	   The only realistic presentation format is a style with the text from
877	   the different sources presented with a text label indicating source,
878	   and the text collected in a chat style presentation but with more
879	   frequent turn-taking.

881	   Endpoints often have their own system for adding labels to the RTT
882	   presentation.  In that case there will be two levels of labels in the
883	   presentation, one for the mixer and one for the sources.

885	   If loss of more packets than can be recovered by the redundancy
886	   appears, it is not possible to detect which source was struck by the
887	   loss.  It is also possible that a source switch occurred during the
888	   loss, and therefore a false indication of the source of text can be
889	   provided to the user after such loss.

891	   Because of all these cons, this method is not recommended and MUST
892	   NOT be used as the main method, but only as the last resort for
893	   backwards interoperability with multi-party unaware endpoints.

895	   The conference server needs to be allowed to decrypt/encrypt the
896	   packet payload.

898	4.1.2.  RTP-based bridging with minor RTT media contents reformatting by
899	        the bridge

901	   It may be desirable to send text in a multi-party setting in a way
902	   that allows the text stream contents to be distributed without being
903	   dealt with in detail in any central server.  A number of such methods
904	   are described.  However, at the time of writing, none of these
905	   methods has a specified way of establishing the session by
906	   SDP.

908	4.1.2.1.  RTP Translator sending one RTT stream per participant

910	   Within the RTP session, text from each participant is transmitted
911	   from the RTP media translator (bridge) in a separate RTP stream, thus
912	   using the same destination address/port combination, the same payload
913	   type number (PT) but separate RTP SSRC parameters and sequence number
914	   series as described in Section 7.1 and 7.2 of RTP RFC 3550 [RFC3550]
915	   about the Translator function.  The source of the text in each RTP
916	   packet is identified by the SSRC parameter in the RTP packets,
917	   containing the SSRC of the initial source of text.

919	   A receiving and presenting endpoint is supposed to separate text
920	   items from the different sources and identify and display them in a
921	   suitable way.

923	   This method is described in RFC 7667, section 3.5.1 Relay-transport
924	   translator or 3.5.2 Media translator [RFC7667].

926	   The identification of the source is made through the SSRC.  The
927	   translation to a readable label can be done by mapping to information
928	   from the RTCP SDES CNAME and NAME packets as described in
929	   RTP [RFC3550], and also through information in the text media member
930	   in the conference notification described in RFC 4575 [RFC4575].

932	   The SDP exchange for establishing this mixing type can be equal to
933	   what is used for basic two-party use of RFC 4103 with just an added
934	   attribute for indicating multi-party capability.

936	   m=text 49170 RTP/AVP 98 103
937	   a=rtpmap:98 red/1000
938	   a=fmtp:98 103/103/103
939	   a=rtpmap:103 t140/1000
940	   a=fmtp:103 cps=150
941	   a=rtt-mix:rtp-translator

942	   A similar answer including the same rtt-mix attribute would indicate
943	   that multi-party coding can begin.  An answer without the same
944	   rtt-mix attribute could result in diversion to use of the mixing
945	   method for multi-party unaware endpoints (Section 4.1.1.10) if more
946	   than two parties are involved in the session.

948	   The bridge can add new sources in the communication to a participant
949	   by first sending a conference notification according to RFC 4575
950	   [RFC4575] with the SSRC of the new source included in the
951	   corresponding "text" media member, or by sending an RTCP message with
952	   the new SSRC in an SDES packet.

954	   A receiver should be prepared to receive such indications of new
955	   streams being added to the multi-party session, so that the new SSRC
956	   is not taken for a change in SSRC value for an already established
957	   RTP stream.
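
	   A receiver that keeps text apart per SSRC, and accepts labels
	   announced through a conference notification or RTCP SDES, might be
	   organized as in this sketch.  The class and method names are
	   hypothetical, and RTP and RTCP parsing is left out.

```python
class TranslatorReceiver:
    """Keep one text buffer per SSRC and map SSRCs to readable labels."""

    def __init__(self):
        self.labels = {}  # SSRC -> label from SDES or conference-info
        self.text = {}    # SSRC -> text received so far

    def announce(self, ssrc, label):
        # Called when a new source is indicated before or with its text.
        self.labels[ssrc] = label
        self.text.setdefault(ssrc, "")

    def on_text(self, ssrc, text):
        # An unknown SSRC is treated as a new stream, not as a changed
        # SSRC of an already established stream.
        self.text[ssrc] = self.text.get(ssrc, "") + text

    def label_for(self, ssrc):
        # Fall back to the numeric SSRC until a label is known.
        return self.labels.get(ssrc, f"SSRC {ssrc:#010x}")

    def transcript(self):
        return {self.label_for(s): t for s, t in self.text.items()}
```

	   Each buffer can then be presented in its own column or labelled
	   line, as the endpoint prefers.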

959	   Transmission, reception, packet loss recovery and text loss
960	   indication are performed per source in the separate RTP streams in
961	   the same way as in two-party sessions with RFC 4103 [RFC4103].

963	   Text is recommended to be sent by the bridge as soon as it is
964	   available for transmission, but not less than 250 ms after a previous
965	   transmission.  This will in many cases result in close to 0 added
966	   delay by the bridge, because most RTT senders use a 300 ms
967	   transmission interval.

969	   It is sometimes said that this configuration is not supported by
970	   current media declarations in SDP.  RFC 3264 [RFC3264] specifies in
971	   some places that one media description is supposed to describe just
972	   one RTP media stream.  However this is not directly referencing an
973	   RTP stream, and use of multiple RTP streams in the same RTP session
974	   is recommended in many other RFCs.

976	   This confusion is clarified in RFC 5576 [RFC5576] section 3 by the
977	   following statements:

979	   "The term "media stream" does not appear in the SDP specification
980	   itself, but is used by a number of SDP extensions, for instance,
981	   Interactive Connectivity Establishment (ICE) [ICE], to denote the
982	   object described by an SDP media description.  This term is
983	   unfortunately rather confusing, as the RTP specification [RFC3550]
984	   uses the term "media stream" to refer to an individual media source
985	   or RTP packet stream, identified by an SSRC, whereas an SDP media
986	   stream describes an entire RTP session, which can contain any number
987	   of RTP sources."

988	   In most cases, it will be sufficient that new sources are introduced
989	   with a conference notification or RTCP message.  However, RFC 5576
990	   [RFC5576] specifies attributes which may be used to more explicitly
991	   announce new sources or restart of earlier established RTP streams.

993	   This method is encouraged by draft-ietf-avtcore-multiplex-guidelines
994	   [I-D.ietf-avtcore-multiplex-guidelines] section 5.2.

996	   Normal operation will be that the bridge receives text packets from
997	   the source and handles any text recovery and indication of loss
998	   needed before queueing the resulting clean text for transmission from
999	   the bridge to the receivers.

1001	   It may however also be possible for the bridge to just convey the
1002	   packet contents as received from the sources, with minor adjustments,
1003	   and let the receiving endpoint handle all aspects of recovery and
1004	   indication of loss, even for the source to bridge path.  In that case
1005	   also the sequence number must be maintained as it was at reception in
1006	   the bridge.  This mode needs further study before application.

1008	   Pros:

1010	   This method is the natural way to do multi-party bridging with RFC
1011	   4103 based RTT.  Only a small addition is included in the session
1012	   establishment to verify capability by the parties because many
1013	   implementations are done without multi-party capability.

1015	   This method has moderate overhead in terms of work for the mixer, but
1016	   high in terms of packet transmission rate.  Five sources sending
1017	   simultaneously cause the bridge to send 15 packets per second to each
1018	   receiver.

1020	   When packets are lost, it is possible to recover text from
1021	   redundancy at loss of up to the number of redundancy levels carried
1022	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
1023	   levels).

1025	   Loss beyond what can be recovered can be detected, and the marker
1026	   for text loss can be inserted in the correct stream.

1028	   It may be possible in some scenarios to keep the text encrypted
1029	   through the Translator.

1031	   Minimal delay.  The delay can often be kept close to 0 with at least
1032	   5 simultaneous sending participants.

1034	   Cons:

1036	   There may be RTP implementations not supporting the Translator model.
1037	   They will need to use the fall-back to multi-party-unaware mixing.
1038	   An investigation of how common such implementations are is needed
1039	   before the method is used.

1041	   With many simultaneous sending sources, the total rate of packets
1042	   will be high, and can cause congestion.  The requirement to handle 5
1043	   simultaneous sources in this specification will cause 15 packets per
1044	   second, which is on the high side but still manageable in most cases,
1045	   e.g. considering that audio usually uses 50 packets per second.

1047	4.1.2.2.  Distributing packets in an end-to-end encryption structure

1049	   In order to achieve end-to-end encryption, it is possible to let the
1050	   packets from the sources just pass through a central distributor, and
1051	   handle the security agreements between the participants.
1052	   Specifications exist for a framework with this functionality for
1053	   application on RTP based conferences in
1054	   [I-D.ietf-perc-private-media-framework].  The RTP flow and mixing
1055	   characteristics have similarities with the method described under
1056	   "RTP Translator sending one RTT stream per participant" above.  RFC
1057	   4103 RTP streams [RFC4103] would fit into the structure and it would
1058	   provide a base for end-to-end encrypted RTT multi-party conferencing.

1060	   Pros:

1062	   Good security.

1064	   Straightforward multi-party handling.

1066	   Cons:

1068	   Does not operate under the usual SIP central conferencing
1069	   architecture.

1071	   Requires the participants to perform a lot of key handling.

1073	   Is work in progress when this is written.

1075	4.1.2.3.  Mesh of RTP endpoints

1077	   Text from all participants is transmitted directly to all others in
1078	   one RTP session, without a central bridge.  The sources of the text
1079	   in each RTP packet are identified by the source network address and
1080	   the SSRC.

1082	   This method is described in RFC 7667, section 3.4 Point to multi-
1083	   point using mesh [RFC7667].

1085	   Pros:

1087	   When packets are lost, it is possible to recover text from
1088	   redundancy at loss of up to the number of redundancy levels carried
1089	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
1090	   levels).

1092	   This method can be implemented with most RTP implementations.

1094	   Transmitted text can also be used with other transports than RTP.

1096	   Cons:

1098	   This model is not described in IMS, NENA and EENA specifications, and
1099	   therefore does not meet the requirements.

1101	   Requires a drastically increasing number of connections when the
1102	   number of participants increases.
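
	   The growth can be made concrete by counting one bidirectional
	   association per pair of participants, which is an assumption about
	   how connections are counted.  The function name is hypothetical.

```python
def mesh_connections(participants):
    """Number of pairwise associations in a full mesh of endpoints."""
    return participants * (participants - 1) // 2
```

	   Two participants need one association, five need 10 and ten need
	   45.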

1104	4.1.2.4.  Multiple RTP sessions, one for each participant

1106	   Text from all participants is transmitted directly to all others in
1107	   one RTP session each, without a central bridge.  Each session is
1108	   established with a separate media description in SDP.  The sources of
1109	   the text in each RTP packet are identified by the source network
1110	   address and the SSRC.

1112	   Pros:

1114	   When packets are lost, it is possible to recover text from
1115	   redundancy at loss of up to the number of redundancy levels carried
1116	   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
1117	   levels).

1119	   Complete loss of text can be indicated in the received stream.

1121	   This method can be implemented with most RTP implementations.

1123	   End-to-end encryption is achievable.

1125	   Cons:

1127	   This method is not described in IMS, NENA and ETSI specifications and
1128	   therefore does not meet the requirements.

1130	   A lot of network resources are spent on setting up separate sessions
1131	   for each participant.

1133	5.  Preferred RTP-based multi-party RTT transport method

1135	   For RTP transport of RTT using RTP-mixer technology, one method for
1136	   multi-party mixing and transport stands out as fulfilling the goals
1137	   best and is therefore recommended.  That is: TBD

1139	   For RTP transport in separate streams or sessions, no current
1140	   recommendation can be made.  A bridging method in the process of
1141	   standardisation with interesting characteristics is the end-to-end
1142	   encryption model "perc" (Section 4.1.2.2).

1144	6.  Session control of RTP-based multi-party RTT sessions

1146	   General session control aspects for multi-party sessions are
1147	   described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP)
1148	   Event Package for Conference State, and RFC 4579 [RFC4579] Session
1149	   Initiation Protocol (SIP) Call Control - Conferencing for User
1150	   Agents.  The nomenclature of these specifications is used here.

1152	   The procedures for a multi-party aware model for RTT-transmission
1153	   shall only be applied if a capability exchange for multi-party aware
1154	   real-time text transmission has been completed and a supported method
1155	   for multi-party real-time text transmission can be negotiated.

1157	   A method for detection of conference-awareness for centralized SIP
1158	   conferencing in general is specified in RFC 4579 [RFC4579].  The
1159	   focus sends the "isfocus" feature tag in a SIP Contact header.  This
1160	   causes the conference-aware endpoint to subscribe to conference
1161	   notifications from the focus.  The focus then sends notifications to
1162	   the endpoint about entering and disappearing conference participants
1163	   and their media capabilities.  The information is carried XML-
1164	   formatted in a 'conference-info' block in the notification according
1165	   to RFC 4575 [RFC4575].  The mechanism is described in detail in RFC
1166	   4575 [RFC4575].

1168	   Before a conference media server starts sending multi-party RTT to an
1169	   endpoint, a verification of its ability to handle multi-party RTT
1170	   must be made.  A decision on which mechanism to use for identifying
1171	   text from the different participants must also be taken, implicitly
1172	   or explicitly.  These verifications and decisions can be done in a
1173	   number of ways.  The most apparent ways are specified here and their
1174	   pros and cons described.  One of the methods is selected to be the
1175	   one to be used by implementations of the centralized conference model
1176	   according to this specification.

1178	6.1.  Implicit RTT multi-party capability indication

1180	   Capability for RTT multi-party handling can be decided to be
1181	   implicitly indicated by session control items.

1183	   The focus may implicitly indicate multi-party RTT capability by
1184	   including the media child with value "text" in the RFC 4575 [RFC4575]
1185	   conference-info provided in conference notifications.

1187	   An endpoint may implicitly indicate multi-party RTT capability by
1188	   including the text media in the SDP in the session control
1189	   transactions with the conference focus after the subscription to the
1190	   conference has taken place.

1192	   The implicit RTT capability indication means for the focus that it
1193	   can handle multi-party RTT according to the preferred method
1194	   indicated in the RTT multi-party methods section above.

1196	   The implicit RTT capability indication means for the endpoint that it
1197	   can handle multi-party RTT according to the preferred method
1198	   indicated in the RTT multi-party methods section above.

1200	   If the focus detects that an endpoint implicitly declared RTT multi-
1201	   party capability, it SHALL provide RTT according to the preferred
1202	   method.

1204	   If the focus detects that the endpoint does not indicate any RTT
1205	   multi-party capability, then it shall either provide RTT multi-party
1206	   text in the way specified for conference-unaware endpoint above, or
1207	   refuse to set up the session.

1209	   If the endpoint detects that the focus has implicitly declared RTT
1210	   multi-party capability, it shall be prepared to present RTT in a
1211	   multi-party fashion according to the preferred method.

1213	   Pros:

1215	   Acceptance of implicit multi-party capability implies that no
1216	   standardisation of explicit RTT multi-party capability exchange is
1217	   required.

1219	   Cons:

1221	   If other methods for multi-party RTT are to be used in the same
1222	   implementation environment as the preferred ones, then capability
1223	   exchange needs to be defined for them.

1225	   Cannot be used outside a strictly applied SIP central conference
1226	   model.

1228	6.2.  RTT multi-party capability declared by SIP media-tags

1230	   Specifications for RTT multi-party capability declarations can be
1231	   agreed for use as SIP media feature tags, to be exchanged during SIP
1232	   call control operation according to the mechanisms in RFC 3840
1233	   [RFC3840] and RFC 3841 [RFC3841].  Capability for the RTT Multi-party
1234	   capability is then indicated by the media feature tag "rtt-mix", with
1235	   a set of possible values for the different possible methods.

1237	   The possible values in the list may for example be:

1239	      rtp-mixer

1241	      perc

1243	   rtp-mixer indicates capability for using the RTP-mixer based
1244	   presentation of multi-party text.

1246	   perc indicates capability for using the perc based transmission of
1247	   multi-party text.

1249	   Example: Contact: <sip:a2@beco.example.com>

1251	   ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL"

1253	   ;+sip.rtt-mix="rtp-mixer"

1255	   If, after evaluation of the alternatives in this specification, only
1256	   one mixing method is selected to be brought to implementation, then
1257	   the media tag can be reduced to a single tag with no list of values.

1259	   An offer-answer exchange should take place and the common method
1260	   selected by the answering party shall be used in the session with
1261	   that UA.

1263	   When no common method is declared, then only the fallback method for
1264	   multi-party unaware participants can be used, or the session dropped.

1266	   If more than one text media section is included in SDP, all must be
1267	   capable of using the declared RTT multi-party method.

1269	   Pros:

1271	   Provides a clear decision method.

1273	   Can be extended with new mixing methods.

1275	   Can guide call routing to a suitable capable focus.

1277	   Cons:

1279	   Requires standardization and IANA registration.

1281	   Is not stream specific.  If more than one text stream is specified,
1282	   all must have the same type of multi-party capability.

1284	   Cannot be used in the WebRTC environment.

1286	6.3.  SDP media attribute for RTT multi-party capability indication

1288	   An attribute can be specified on media level, to be used in text
1289	   media SDP declarations for negotiating RTT multi-party capabilities.
1290	   The attribute can have the name "rtt-mix".

1292	   More than one attribute can be included in one media description.

1294	   The attribute can have a value.  The value can for example be:

1296	      rtp-mixer

1298	      rtp-translator

1300	      perc

1302	   rtp-mixer indicates capability for using the RTP-mixer and CSRC-list
1303	   based mixing of multi-party text.

1305	   rtp-translator indicates capability for using the RTP-translator
1306	   based mixing of multi-party text.

1308	   perc indicates capability for using the perc based transmission of
1309	   multi-party text.

1311	   An offer-answer exchange should take place and the common method
1312	   selected by the answering party shall be used in the session with
1313	   that endpoint.

1315	   When no common method is declared, then only the fallback method for
1316	   multi-party unaware endpoints can be used.

1318	   Example: a=rtt-mix:rtp-mixer

1319	   If, after evaluation of the alternatives in this specification, only
1320	   one mixing method is selected to be brought to implementation, then
1321	   the attribute can be reduced to a single attribute with no list of
1322	   values.
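
   The selection of a common method from the "rtt-mix" attributes can be
   sketched as below.  This is a minimal illustration, not a normative
   procedure.  The function names, the sample SDP lines and the rule that
   the answerer picks the first of its own supported methods found in the
   offer are assumptions made for the example.

```python
from typing import List, Optional


def rtt_mix_values(sdp_media_lines: List[str]) -> List[str]:
    """Collect the values of all a=rtt-mix attributes in one media section."""
    prefix = "a=rtt-mix:"
    return [
        line.strip()[len(prefix):]
        for line in sdp_media_lines
        if line.strip().startswith(prefix)
    ]


def select_method(offered: List[str], supported: List[str]) -> Optional[str]:
    """Return the first locally supported method that was also offered.

    The preference order (the answerer's own list) is an assumption.
    None means that only the fallback for multi-party unaware endpoints
    remains.
    """
    for method in supported:
        if method in offered:
            return method
    return None


# Illustrative text media section of an offer (payload types are examples).
offer = [
    "m=text 11000 RTP/AVP 100 98",
    "a=rtpmap:98 t140/1000",
    "a=rtpmap:100 red/1000",
    "a=fmtp:100 98/98/98",
    "a=rtt-mix:rtp-mixer",
    "a=rtt-mix:perc",
]

chosen = select_method(rtt_mix_values(offer), ["perc", "rtp-translator"])
```

   With the sample offer above, an answerer supporting "perc" and
   "rtp-translator" would select "perc".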

1324	   Pros:

1326	   Provides a clear decision method.

1328	   Can be extended with new mixing methods.

1330	   Can be used on specific text media.

1332	   Can be used also for SDP-controlled WebRTC sessions with multiple
1333	   streams in the same data channel.

1335	   Cons:

1337	   Requires standardization and IANA registration.

1339	   Cannot guide SIP routing.

1341	6.4.  Simplified SDP media attribute for RTT multi-party capability
1342	      indication

1344	   An attribute can be specified on media level, to be used in text
1345	   media SDP declarations for negotiating RTT multi-party capabilities.
1346	   The attribute can have the name "rtt-mix" with no value.  It would be
1347	   selected and used if only one method for multi-party RTT is brought
1348	   forward from this specification, and the others are dropped or found
1349	   to be negotiable in another way.

1351	   An offer-answer exchange should take place and if both parties
1352	   specify "rtt-mix" capability, the selected mixing method shall be
1353	   used.

1355	   When no common method is declared, then only the fallback method for
1356	   multi-party unaware endpoints can be used, or the session not
1357	   accepted for multi-party use.

1359	   Example: a=rtt-mix

1361	   Pros:

1363	   Provides a clear decision method.

1365	   Very simple syntax and semantics.

1367	   Can be used on specific text media.

1369	   Could possibly be used also for SDP-controlled WebRTC sessions with
1370	   multiple streams in the same data channel.

1372	   Cons:

1374	   Requires standardization and IANA registration.

1376	   If another RTT mixing method is specified in the future, that method
1377	   may need to specify and register its own attribute.  With an
1378	   attribute that takes a parameter value, only the addition of a new
1379	   possible value would be needed.

1381	   Cannot guide SIP routing.

1383	6.5.  SDP format parameter for RTT multi-party capability indication

1385	   An FMTP format parameter can be specified for the RFC 4103
1386	   [RFC4103] media, to be used in text media SDP declarations for
1387	   negotiating RTT multi-party capabilities.  The parameter can have the
1388	   name "rtt-mix", with one or more of its possible values.

1390	   The possible values in the list are:

1392	      rtp-mixer

1394	      perc

1396	   rtp-mixer indicates capability for using the RTP-mixer based mixing
1397	   and presentation of multi-party text using the CSRC-list.

1399	   perc indicates capability for using the perc based transmission of
1400	   multi-party text.

1402	   Example: a=fmtp:96 98/98/98 rtt-mix=rtp-mixer

1404	   If, after evaluation of the alternatives in this specification, only
1405	   one mixing method is selected to be brought to implementation, then
1406	   the parameter can be reduced to a single parameter with no list of
1407	   values.

1409	   An offer-answer exchange should take place and the common method
1410	   selected by the answering party shall be used in the session with
1411	   that UA.

1413	   When no common method is declared, then only the fallback method can
1414	   be used, or the session denied.

1416	   Pros:

1418	   Provides a clear decision method.

1420	   Can be extended with new mixing methods.

1422	   Can be used on specific text media.

1424	   Can be used also for SDP-controlled WebRTC sessions with multiple
1425	   streams in the same data channel.

1427	   Cons:

1429	   Requires standardization and IANA registration.

1431	   May cause interop problems with current RFC4103 [RFC4103]
1432	   implementations not expecting a new fmtp-parameter.

1434	   Cannot guide SIP routing.

1436	6.6.  A text media subtype for support of multi-party rtt

1438	   Indicating a specific text media subtype in SDP is a straightforward
1439	   way for negotiating multi-party capability.  Especially if there are
1440	   format differences from the "text/red" and "text/t140" formats of
1441	   RFC4103 [RFC4103], then this is a natural way to do the negotiation
1442	   for multi-party rtt.

1444	   Pros:

1446	   No extra efforts if a new format is needed anyway.

1448	   Cons:

1450	   None specific to using the format indication for negotiation of
1451	   multi-party capability.  But only feasible if a new format is needed
1452	   anyway.

1454	6.7.  Preferred capability declaration method for RTP-based transport.

1456	   If the preferred transport method is one with a specific media
1457	   subtype in SDP, then specification by media subtype is preferred.

1459	   If this would not be the case, then the preferred capability
1460	   declaration method would be the one with a simplified SDP attribute
1461	   "a=rtt-mix" (Section 6.4), because it is straightforward and
1462	   partially usable also for WebRTC if so needed.

1464	6.8.  Identification of the source of text for RTP-based solutions

1466	   The main way to identify the source of text in the RTP based solution
1467	   is by the SSRC of the sending participant.  In the RTP-mixer
1468	   solution, this SSRC is included in the CSRC list of the transmitted
1469	   packets.  Further identification that may be needed for better
1470	   labelling of received text may be obtained from a number of sources,
1471	   such as the RTCP SDES CNAME and NAME reports and the conference
1472	   notification data of RFC 4575 [RFC4575].

1474	   As soon as a new member is added to the RTP session, its
1475	   characteristics should be transmitted in RTCP SDES CNAME and NAME
1476	   reports according to section 6.5 in RFC 3550 [RFC3550].  The
1477	   information about the participant should also be included in the
1478	   conference data including the text media member in a notification
1479	   according to RFC 4575 [RFC4575].

1481	   The RTCP SDES report SHOULD contain identification of the source
1482	   represented by the SSRC/CSRC identifier.  This identification MUST
1483	   contain the CNAME field and MAY contain the NAME field and other
1484	   defined fields of the SDES report.

1486	   A focus UA SHOULD primarily convey SDES information received from the
1487	   sources of the session members.  When such information is not
1488	   available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME
1489	   information from available information from the SIP session with the
1490	   participant.

1492	7.  RTT bridging in WebRTC

1494	   Within WebRTC, real-time text is specified to be carried in WebRTC
1495	   data channels as specified in
1496	   [I-D.ietf-mmusic-t140-usage-data-channel].  A few ways to handle
1497	   multi-party RTT are mentioned briefly.  They are repeated below.

1499	7.1.  RTT bridging in WebRTC with one data channel per source

1501	   A straightforward way to handle multi-party RTT is for the bridge to
1502	   open one T.140 data channel per source towards the receiving
1503	   participants.

1505	   The stream-id forms a unique stream identification.

1507	   The identification of the source is made through the Label property
1508	   of the channel, and session information belonging to the source.  The
1509	   endpoint can compose a readable label for the presentation from this
1510	   information.

1512	   Pros:

1514	   This is a straightforward solution.

1516	   The load per source is low.

1518	   Cons:

1520	   With a high number of participants, the overhead of establishing and
1521	   maintaining the high number of data channels required may be high,
1522	   even if the load per channel is low.

1524	7.2.  RTT bridging in WebRTC with one common data channel

1526	   A way to handle multi-party RTT in WebRTC is for the bridge to
1527	   combine text from all sources into one data channel and mark the
1528	   sources in the stream with a T.140 control code for source.

1530	   This method is described in a corresponding section for RTP
1531	   transmission above in Section 4.1.1.9.

1533	   The identification of the source is made through insertion in the
1534	   beginning of each text transmission from a source of a control code
1535	   extension "c" followed by a string representing the source, framed by
1536	   the control code start and end flags SOS and ST (See ITU-T T.140
1537	   [T140]).

1539	   A receiving endpoint is supposed to separate text items from the
1540	   different sources and identify and display them in a suitable way.

1542	   The endpoint does not always display the source identification in the
1543	   received text at the place where it is received, but has the
1544	   information as a guide for planning the presentation of received
1545	   text.  A label corresponding to the source identification is
1546	   presented when needed depending on the selected presentation style.
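
   The separation of a combined stream by source can be sketched as
   below.  The sketch assumes that SOS and ST are the T.140 code points
   U+0098 and U+009C, and that the proposed, not yet standardised "c"
   extension is encoded as SOS, the letter "c", the source string and
   ST.  The function name is hypothetical.

```python
from typing import List, Tuple

SOS = "\u0098"  # T.140 Start Of String, introduces a control sequence
ST = "\u009c"   # T.140 String Terminator, ends a control sequence


def split_by_source(stream: str) -> List[Tuple[str, str]]:
    """Return (source, text) pairs from a combined T.140 stream.

    Assumed framing of the source code: SOS + "c" + source + ST.
    """
    items: List[Tuple[str, str]] = []
    source = ""
    text: List[str] = []
    i = 0
    while i < len(stream):
        if stream[i] == SOS:
            end = stream.find(ST, i + 1)
            if end == -1:
                break  # incomplete control sequence, wait for more data
            body = stream[i + 1:end]
            if body.startswith("c"):
                # A new source begins; emit the text collected so far.
                if text:
                    items.append((source, "".join(text)))
                    text = []
                source = body[1:]
            # Other SOS...ST functions are ignored in this sketch.
            i = end + 1
        else:
            text.append(stream[i])
            i += 1
    if text:
        items.append((source, "".join(text)))
    return items


combined = SOS + "cAlice" + ST + "Hi " + SOS + "cBob" + ST + "Hello"
pairs = split_by_source(combined)
```

   The receiving application would then map each source string to a
   label and place the text according to the selected presentation
   style, rather than displaying the control code where it was received.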

1548	   Pros:

1550	   This solution has relatively low overhead on session and network
1551	   level.

1553	   Cons:

1555	   This solution has higher overhead on the media contents level than
1556	   the WebRTC solution above.

1558	   Standardisation of the new control code "c" in ITU-T T.140 [T140] is
1559	   required.

1561	   The conference server needs to be allowed to decrypt/encrypt the data
1562	   channel contents.

1564	7.3.  Preferred rtt multi-party method for WebRTC

1566	   For WebRTC, one method is preferred because of its simplicity.  The
1567	   method to implement for multi-party RTT with multi-party aware
1568	   parties, when no other method is explicitly agreed between the
1569	   implementing parties, is "RTT bridging in WebRTC with one data
1570	   channel per source" (Section 7.1).

1572	8.  Presentation of multi-party text

1574	   All session participants with RTP based transport MUST observe the
1575	   SSRC/CSRC field of incoming text RTP packets, and make note of which
1576	   source they came from in order to be able to present text in a way
1577	   that makes it easy to read text from each participant in a session,
1578	   and get information about the source of the text.

1580	   In the WebRTC case, the Label parameter and other provided endpoint
1581	   information should be used for the same purpose.

1583	8.1.  Associating identities with text streams

1585	   A source identity SHOULD be composed from available information
1586	   sources and displayed together with the text as indicated in ITU-T
1587	   T.140 Appendix [T140].

1589	   The source identity should primarily be the NAME field from incoming
1590	   SDES packets.  If this information is not available, and the session
1591	   is a two-party session, then the T.140 source identity SHOULD be
1592	   composed from the SIP session participant information.  For multi-
1593	   party sessions the source identity may be composed by local
1594	   information if sufficient information is not available in the
1595	   session.
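
   The order of preference described above can be sketched as below.
   The function and parameter names are hypothetical, and the handling
   of empty strings as "not available" is an assumption of the sketch.

```python
from typing import Optional


def source_identity(
    sdes_name: Optional[str],
    sip_participant: Optional[str],
    local_label: str,
    two_party: bool,
) -> str:
    """Choose the identity to display together with received text."""
    if sdes_name:
        # Primary choice: the NAME field from incoming SDES packets.
        return sdes_name
    if two_party and sip_participant:
        # Two-party session: compose from SIP session participant info.
        return sip_participant
    # Otherwise fall back to locally composed information.
    return local_label


label = source_identity(None, "sip:bob@example.com", "Participant 2", True)
```

   An application may then abbreviate the returned string or replace it
   with an internally used nickname before presentation.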

1597	   Applications may abbreviate the presented source identity to a
1598	   suitable form for the available display.

1600	   Applications may also replace received source information with
1601	   internally used nicknames.

1603	8.2.  Presentation details for multi-party aware endpoints.

1605	   The multi-party aware endpoint should, after any action for recovery
1606	   of data from lost packets, separate the incoming streams and present
1607	   them according to the style that the receiving application supports
1608	   and the user has selected.  The decisions taken for presentation of
1609	   the multi-party interchange shall be purely on the receiving side.
1610	   The sending application must not insert any item in the stream to
1611	   influence presentation that is not requested by the sending
1612	   participant.

1614	8.2.1.  Bubble style presentation

1616	   One often used style is to present real-time text in chunks in
1617	   readable bubbles identified by labels containing names of sources.
1618	   Bubbles are placed in one column in the presentation area and are
1619	   closed and moved upwards in the presentation area after certain items
1620	   or events, when there is also newer text from another source that
1621	   would go into a new bubble.  The items that allow bubble closing are
1622	   any character closing a phrase or sentence followed by a space, or a
1623	   timeout of a suitable time (about 10 seconds).
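
   The closing rule can be sketched as below.  The set of characters
   treated as closing a phrase or sentence, the function name and the
   parameter names are assumptions made for the illustration.  The
   10 second default follows the approximate value given above.

```python
PHRASE_END = ".,;:!?"  # assumed set of phrase and sentence closers


def may_close_bubble(
    text: str,
    idle_seconds: float,
    other_source_has_newer_text: bool,
    timeout: float = 10.0,
) -> bool:
    """Decide whether a real-time active bubble may be closed."""
    if not other_source_has_newer_text:
        # Nothing is waiting for a new bubble, so keep the bubble open.
        return False
    if idle_seconds >= timeout:
        return True
    # A phrase or sentence closing character followed by a space.
    return len(text) >= 2 and text[-1] == " " and text[-2] in PHRASE_END
```

   For example, may_close_bubble("Hi, Alice here. ", 1.0, True) allows
   closing, while the same call with unfinished text "Hi, Alice her"
   does not until the timeout has passed.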

1625	   Real-time active text sent from the local user should be presented in
1626	   a separate area.  When there is a reason to close a bubble from the
1627	   local user, the bubble should be placed above all real-time active
1628	   bubbles, so that the time order that real-time text entries were
1629	   completed is visible.

1631	   Scrolling is usually provided for viewing of recent or older text.
1632	   When scrolling is done to an earlier point in the text, the
1633	   presentation shall not move the scroll position by new received text.
1634	   It must be the decision of the local user to return to automatic
1635	   viewing of latest text actions.  It may be useful to show an
1636	   indication that there is new text to read after scrolling to an
1637	   earlier position has been activated.

1639	   The presentation area may become too small to present all text in all
1640	   real-time active bubbles.  Various techniques can be applied to
1641	   provide a good overview and good reading opportunity even in such
1642	   situations.  The active real-time bubble may have a limited number of
1643	   lines and if their contents need more lines, then a scrolling
1644	   opportunity within the real-time active bubble is provided.  Another
1645	   method can be to only show the label and the last line of the active
1646	   real-time bubble contents, and make it possible to expand or compress
1647	   the bubble presentation between full view and one line view.

1649	   Erasures require special consideration.  Erasure within a real-time
1650	   active bubble is straightforward.  But if erasure from one
1651	   participant affects the last character before a bubble, the whole
1652	   previous bubble becomes the actual bubble for real-time action by
1653	   that participant and is placed below all other bubbles in the
1654	   presentation area.  If the border between bubbles was caused by the
1655	   CRLF characters (instead of the normal "Line Separator"), only one
1656	   erasure action is required to erase this bubble border.  When a
1657	   bubble is closed, it is moved up, above all real-time active bubbles.

1659	   A three-party view is shown in this example.

1661	                 _________________________________________________
1662	                |                                              |^|
1663	                |                                              |-|
1664	                |[Alice] Hi, Alice here.                       | |
1665	                |                                              | |
1666	                |[Bob] Bob as well.                            | |
1667	                |                                              | |
1668	                |[Eve] Hi, this is Eve, calling from Paris.    | |
1669	                |      I thought you should be here.           | |
1670	                |                                              | |
1671	                |[Alice] I am coming on Thursday, my           | |
1672	                |      performance is not until Friday morning.| |
1673	                |                                              | |
1674	                |[Bob] And I on Wednesday evening.             | |
1675	                |                                              | |
1676	                |[Alice] Can we meet on Thursday evening?      | |
1677	                |                                              | |
1678	                |[Eve] Yes, definitely. How about 7pm.         | |
1679	                |     at the entrance of the restaurant        | |
1680	                |     Le Lion Blanc?                           | |
1681	                |[Eve] we can have dinner and then take a walk | |
1682	                |                                              | |
1683	                | <Eve-typing> But I need to be back to        | |
1684	                |    the hotel by 11 because I need            | |
1685	                |                                              | |
1686	                | <Bob-typing> I wou                           |-|
1687	                |______________________________________________|v|
1688	                | of course, I underst                           |
1689	                |________________________________________________|

1693	   Figure 1: Example of a three-party call presented in the bubble
1694	   style.

1696	8.2.2.  Other presentation styles

1698	   Other presentation styles than the bubble style may be arranged and
1699	   appreciated by the users.  In a video conference one way may be to
1700	   have a real-time text area below the video view of each participant.
1701	   Another view may be to provide one column in a presentation area for
1702	   each participant and place the text entries in a relative vertical
1703	   position corresponding to when text entry in them was completed.  The
1704	   labels can then be placed in the column header.  The considerations
1705	   for ending and moving and erasure of entered text discussed above for
1706	   the bubble style are valid also for these styles.

1708	   This figure shows how a coordinated column view MAY be presented.

1710	   _____________________________________________________________________
1711	   |       Bob          |       Eve            |       Alice           |
1712	   |____________________|______________________|_______________________|
1713	   |                    |                      |I will arrive by TGV.  |
1714	   |My flight is to Orly|                      |Convenient to the main |
1715	   |                    |Hi all, can we plan   |station.               |
1716	   |                    |for the seminar?      |                       |
1717	   |Eve, will you do    |                      |                       |
1718	   |your presentation on|                      |                       |
1719	   |Friday?             |Yes, Friday at 10.    |                       |
1720	   |Fine, wo            |                      |We need to meet befo   |
1721	   |___________________________________________________________________|

1723	   Figure 2: A coordinated column-view of a three-party session with
1724	   entries ordered in approximate time-order.

1726	9.  Presentation details for multi-party unaware endpoints.

1728	   Multi-party unaware endpoints are prepared only for presentation of
1729	   two sources of text, the local user and a remote user.  If mixing for
1730	   multi-party unaware endpoints is to be supported, in order to enable
1731	   some multi-party communication with such endpoints, the mixer needs
1732	   to plan the presentation and insert labels and line breaks before
1733	   labels.  Many limitations appear for this presentation mode, and it
1734	   must be seen as a fallback and a last resort.

1736	   A procedure for presenting RTT to a conference-unaware endpoint is
1737	   included in [I-D.ietf-avtcore-multi-party-rtt-mix].

1739	10.  Security Considerations

1741	   The security considerations valid for RFC 4103 [RFC4103] and RFC 3550
1742	   [RFC3550] are valid also for the multi-party sessions with text.

1744	11.  IANA Considerations

1746	   The items for indication and negotiation of capability for multi-
1747	   party rtt should be registered with IANA in the specifications where
1748	   they are specified in detail.

1750	12.  Congestion considerations

1752	   The congestion considerations described in RFC 4103 [RFC4103] are
1753	   valid also for the recommended RTP-based multi-party use of the real-
1754	   time text transport.  A risk for congestion may appear if a number of
1755	   conference participants are active transmitting text simultaneously,
1756	   because the recommended RTP-based multi-party transmission method
1757	   does not allow multiple sources of text to contribute to the same
1758	   packet.

1760	   In situations of risk for congestion, the Focus UA MAY combine
1761	   packets from the same source to increase the transmission interval
1762	   per source up to one second.  Local conference policy in the Focus UA
1763	   may be used to decide which streams shall be selected for such
1764	   transmission frequency reduction.
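
   The combination of text from the same source into fewer packets can
   be sketched as below.  The class and method names are hypothetical.
   Text from different sources is kept apart, in line with the
   recommended method, and the interval is capped at one second.

```python
from typing import Dict, Optional

MAX_INTERVAL = 1.0  # seconds, upper limit stated for the Focus UA


class PerSourceCombiner:
    """Buffer text per source and release it at a reduced frequency."""

    def __init__(self, interval: float) -> None:
        self.interval = min(interval, MAX_INTERVAL)
        self.pending: Dict[str, str] = {}
        self.last_sent: Dict[str, float] = {}

    def add(self, source: str, text: str, now: float) -> Optional[str]:
        """Queue text; return the combined text when it is time to send."""
        self.pending[source] = self.pending.get(source, "") + text
        last = self.last_sent.get(source)
        if last is not None and now - last < self.interval:
            return None  # still within the interval, keep buffering
        self.last_sent[source] = now
        return self.pending.pop(source)


combiner = PerSourceCombiner(interval=1.0)
first = combiner.add("alice", "H", now=0.0)
held = combiner.add("alice", "e", now=0.3)
held_again = combiner.add("alice", "l", now=0.6)
flushed = combiner.add("alice", "lo", now=1.0)
```

   A real implementation would also need a timer that sends pending
   text when the interval expires without new input.  The sketch leaves
   that part out.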

1766	13.  Acknowledgements

1768	   Arnoud van Wijk for contributions to an earlier, expired draft of
1769	   this memo.

1771	14.  Change history

1773	14.1.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-02

1775	   Added detail in the section on RTP translator model alternative
1776	   4.1.2.1.

1778	14.2.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-01

1780	   Added three more methods for RTP-mixer mixing.  Two RFC 5109 FEC
1781	   based and another with modified data header to detect source of
1782	   completely lost text.

1784	   Separated RTP-based and WebRTC based solutions.

1786	   Deleted the multi-party-unaware mixing procedure appendix.  It is now
1787	   included in the draft draft-ietf-avtcore-multi-party-rtt-mix.  Kept a
1788	   section with a reference to the new place.

1790	14.3.  Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to draft-
1791	       hellstrom-avtcore-multi-party-rtt-solutions-00

1793	   Add discussion about switching performance, as discussed in avtcore
1794	   on March 13.

1796	   Added that a decrease of transmission interval to 100 ms increases
1797	   switching performance by a factor 3, but still not sufficient.

1799	   Added that the CSRC-list method also uses 100 milliseconds
1800	   transmission interval.

1802	   Added the method with multiple primary text in each packet.

1804	   Added the timestamp-based method for rtp-mixing proposed by James
1805	   Hamlin on March 14.

1807	   Corrected the chat style presentation example picture.  Deleted a few
1808	   "[mix]".

1810	14.4.  Changes from version draft-hellstrom-mmusic-multi-party-rtt-01 to
1811	       -02

1813	   Change from a general overview to overview with clear
1814	   recommendations.

1816	   Splits text coordination methods in three groups.

1818	   Recommends rtt-mixer with sources in CSRC-list but references its
1819	   spec for details.

1821	   Shortened Appendix with conference-unaware example.

1823	   Cleaned up preferences.

1825	   Inserted pictures of screen-views.

1827	15.  References

1829	15.1.  Normative References

1831	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1832	              Requirement Levels", BCP 14, RFC 2119,
1833	              DOI 10.17487/RFC2119, March 1997,
1834	              <https://www.rfc-editor.org/info/rfc2119>.

1836	15.2.  Informative References

1838	   [EN301549] ETSI, "EN 301 549. Accessibility requirements for ICT
1839	              products and services", November 2019,
1840	              <https://www.etsi.org/deliver/
1841	              etsi_en/301500_301599/301549/03.01.01_60/
1842	              en_301549v030101p.pdf>.

1844	   [I-D.ietf-avtcore-multi-party-rtt-mix]
1845	              Hellstrom, G., "RTP-mixer formatting of multi-party Real-
1846	              time text", Work in Progress, Internet-Draft, draft-ietf-
1847	              avtcore-multi-party-rtt-mix-06, 11 June 2020,
1848	              <https://tools.ietf.org/html/draft-ietf-avtcore-multi-
1849	              party-rtt-mix-06>.

1851	   [I-D.ietf-avtcore-multiplex-guidelines]
1852	              Westerlund, M., Burman, B., Perkins, C., Alvestrand, H.,
1853	              and R. Even, "Guidelines for using the Multiplexing
1854	              Features of RTP to Support Multiple Media Streams", Work
1855	              in Progress, Internet-Draft, draft-ietf-avtcore-multiplex-
1856	              guidelines-12, 16 June 2020, <https://tools.ietf.org/html/
1857	              draft-ietf-avtcore-multiplex-guidelines-12>.

1859	   [I-D.ietf-mmusic-t140-usage-data-channel]
1860	              Holmberg, C. and G. Hellstrom, "T.140 Real-time Text
1861	              Conversation over WebRTC Data Channels", Work in Progress,
1862	              Internet-Draft, draft-ietf-mmusic-t140-usage-data-channel-
1863	              14, 10 April 2020, <https://tools.ietf.org/html/draft-
1864	              ietf-mmusic-t140-usage-data-channel-14>.

1866	   [I-D.ietf-perc-private-media-framework]
1867	              Jones, P., Benham, D., and C. Groves, "A Solution
1868	              Framework for Private Media in Privacy Enhanced RTP
1869	              Conferencing (PERC)", Work in Progress, Internet-Draft,
1870	              draft-ietf-perc-private-media-framework-12, 5 June 2019,
1871	              <https://tools.ietf.org/html/draft-ietf-perc-private-
1872	              media-framework-12>.

1874	   [NENAi3]   NENA, "NENA-STA-010.2-2016. Detailed Functional and
1875	              Interface Standards for the NENA i3 Solution", October
1876	              2016, <https://www.nena.org/page/i3_Stage3>.

1878	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1879	              Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse-
1880	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1881	              DOI 10.17487/RFC2198, September 1997,
1882	              <https://www.rfc-editor.org/info/rfc2198>.

1884	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
1885	              A., Peterson, J., Sparks, R., Handley, M., and E.
1886	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
1887	              DOI 10.17487/RFC3261, June 2002,
1888	              <https://www.rfc-editor.org/info/rfc3261>.

1890	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1891	              with Session Description Protocol (SDP)", RFC 3264,
1892	              DOI 10.17487/RFC3264, June 2002,
1893	              <https://www.rfc-editor.org/info/rfc3264>.

1895	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1896	              Jacobson, "RTP: A Transport Protocol for Real-Time
1897	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
1898	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

1900	   [RFC3840]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat,
1901	              "Indicating User Agent Capabilities in the Session
1902	              Initiation Protocol (SIP)", RFC 3840,
1903	              DOI 10.17487/RFC3840, August 2004,
1904	              <https://www.rfc-editor.org/info/rfc3840>.

1906	   [RFC3841]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
1907	              Preferences for the Session Initiation Protocol (SIP)",
1908	              RFC 3841, DOI 10.17487/RFC3841, August 2004,
1909	              <https://www.rfc-editor.org/info/rfc3841>.

1911	   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
1912	              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005,
1913	              <https://www.rfc-editor.org/info/rfc4103>.

1915	   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
1916	              Session Initiation Protocol (SIP)", RFC 4353,
1917	              DOI 10.17487/RFC4353, February 2006,
1918	              <https://www.rfc-editor.org/info/rfc4353>.

1920	   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
1921	              Session Initiation Protocol (SIP) Event Package for
1922	              Conference State", RFC 4575, DOI 10.17487/RFC4575, August
1923	              2006, <https://www.rfc-editor.org/info/rfc4575>.

1925	   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
1926	              (SIP) Call Control - Conferencing for User Agents",
1927	              BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006,
1928	              <https://www.rfc-editor.org/info/rfc4579>.

1930	   [RFC4597]  Even, R. and N. Ismail, "Conferencing Scenarios",
1931	              RFC 4597, DOI 10.17487/RFC4597, August 2006,
1932	              <https://www.rfc-editor.org/info/rfc4597>.

1934	   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
1935	              Correction", RFC 5109, DOI 10.17487/RFC5109, December
1936	              2007, <https://www.rfc-editor.org/info/rfc5109>.

1938	   [RFC5194]  van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real-
1939	              Time Text over IP Using the Session Initiation Protocol
1940	              (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008,
1941	              <https://www.rfc-editor.org/info/rfc5194>.

1943	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
1944	              Media Attributes in the Session Description Protocol
1945	              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
1946	              <https://www.rfc-editor.org/info/rfc5576>.

1948	   [RFC6443]  Rosen, B., Schulzrinne, H., Polk, J., and A. Newton,
1949	              "Framework for Emergency Calling Using Internet
1950	              Multimedia", RFC 6443, DOI 10.17487/RFC6443, December
1951	              2011, <https://www.rfc-editor.org/info/rfc6443>.

1953	   [RFC6881]  Rosen, B. and J. Polk, "Best Current Practice for
1954	              Communications Services in Support of Emergency Calling",
1955	              BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013,
1956	              <https://www.rfc-editor.org/info/rfc6881>.

1958	   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
1959	              DOI 10.17487/RFC7667, November 2015,
1960	              <https://www.rfc-editor.org/info/rfc7667>.

1962	   [T140]     ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for
1963	              multimedia application text conversation", February 1998,
1964	              <https://www.itu.int/rec/T-REC-T.140-199802-I/en>.

1966	   [T140ad1]  ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000),
1967	              Protocol for multimedia application text conversation",
1968	              February 2000,
1969	              <https://www.itu.int/rec/T-REC-T.140-200002-I!Add1/en>.

1971	   [TS103479] ETSI, "TS 103 479. Emergency communications (EMTEL); Core
1972	              elements for network independent access to emergency
1973	              services", December 2019, <https://www.etsi.org/deliver/
1974	              etsi_ts/103400_103499/103479/01.01.01_60/
1975	              ts_103479v010101p.pdf>.

1977	   [TS22173]  3GPP, "IP Multimedia Core Network Subsystem (IMS)
1978	              Multimedia Telephony Service and supplementary services;
1979	              Stage 1", 3GPP TS 22.173 17.1.0, 20 December 2019,
1980	              <http://www.3gpp.org/ftp/Specs/html-info/22173.htm>.

1982	   [TS24147]  3GPP, "Conferencing using the IP Multimedia (IM) Core
1983	              Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0,
1984	              19 December 2019,
1985	              <http://www.3gpp.org/ftp/Specs/html-info/24147.htm>.

1987	Author's Address

1989	   Gunnar Hellstrom
1990	   Gunnar Hellstrom Accessible Communication
1991	   Esplanaden 30
1992	   SE-136 70 Vendelso
1993	   Sweden

1995	   Phone: +46 708 204 288
1996	   Email: gunnar.hellstrom@ghaccess.se