Internet Engineering Task Force G.
Hellstrom
Internet-Draft                  Gunnar Hellstrom Accessible Communication
Intended status: Informational                               19 June 2020
Expires: 21 December 2020

           Real-time text solutions for multi-party sessions
          draft-hellstrom-avtcore-multi-party-rtt-solutions-02

Abstract

   This document specifies methods for Real-Time Text (RTT) media
   handling in multi-party calls. The main transport is to carry
   real-time text by the RTP protocol in a time-sampled mode according
   to RFC 4103 [RFC4103]. The mechanisms enable the receiving
   application to present the received real-time text media separated
   per source, in different ways according to user preferences. Some
   presentation related features are also described explaining suitable
   variations of transmission and presentation of text.

   Call control features are described for the SIP environment. A
   number of alternative methods for providing the multi-party
   negotiation, transmission and presentation are discussed and a
   recommendation for the main ones is provided. The main solution for
   SIP based centralized multi-party handling of real-time text is
   achieved through a media control unit coordinating multiple RTP text
   streams into one RTP stream.

   Alternative methods using a single RTP stream and source
   identification inline in the text stream are also described, one of
   them being provided as a lower functionality fallback method for
   endpoints with no multi-party awareness for RTT.

   Bridging methods where the text stream is carried without the
   contents being dealt with in detail by the bridge are also discussed.

   Brief information is also provided for multi-party RTT in the WebRTC
   environment.

   The intention is to provide background for decisions, specification
   and implementation of selected methods.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 21 December 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Requirements Language
   2. Centralized conference model
   3. Requirements on multi-party RTT
   4. RTP based solutions
     4.1. Coordination of text RTP streams
       4.1.1. RTP-based solutions with a central mixer
         4.1.1.1. RTP Mixer using default RFC 4103 methods
         4.1.1.2. RTP Mixer using the default method but decreased
                  transmission interval
         4.1.1.3.
                  RTP Mixer with frequent transmission and indicating
                  sources in CSRC-list
         4.1.1.4. RTP Mixer using timestamp to identify redundancy
         4.1.1.5. RTP Mixer with multiple primary data in each packet
                  and individual sequence numbers
         4.1.1.6. RTP Mixer with multiple primary data in each packet
         4.1.1.7. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                  in the packets
         4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                  and separate sequence number in the packets
         4.1.1.9. RTP Mixer indicating participants by a control code
                  in the stream
         4.1.1.10. Mixing for multi-party unaware user agents
       4.1.2. RTP-based bridging with minor RTT media contents
              reformatting by the bridge
         4.1.2.1. RTP Translator sending one RTT stream per participant
         4.1.2.2. Distributing packets in an end-to-end encryption
                  structure
         4.1.2.3. Mesh of RTP endpoints
         4.1.2.4. Multiple RTP sessions, one for each participant
   5. Preferred RTP-based multi-party RTT transport method
   6. Session control of RTP-based multi-party RTT sessions
     6.1. Implicit RTT multi-party capability indication
     6.2. RTT multi-party capability declared by SIP media-tags
     6.3. SDP media attribute for RTT multi-party capability indication
     6.4. Simplified SDP media attribute for RTT multi-party capability
          indication
     6.5.
          SDP format parameter for RTT multi-party capability indication
     6.6. A text media subtype for support of multi-party rtt
     6.7. Preferred capability declaration method for RTP-based
          transport.
     6.8. Identification of the source of text for RTP-based solutions
   7. RTT bridging in WebRTC
     7.1. RTT bridging in WebRTC with one data channel per source
     7.2. RTT bridging in WebRTC with one common data channel
     7.3. Preferred rtt multi-party method for WebRTC
   8. Presentation of multi-party text
     8.1. Associating identities with text streams
     8.2. Presentation details for multi-party aware endpoints.
       8.2.1. Bubble style presentation
       8.2.2. Other presentation styles
   9. Presentation details for multi-party unaware endpoints.
   10. Security Considerations
   11. IANA Considerations
   12. Congestion considerations
   13. Acknowledgements
   14. Change history
     14.1. Changes to
           draft-hellstrom-avtcore-multi-party-rtt-solutions-02
     14.2. Changes to
           draft-hellstrom-avtcore-multi-party-rtt-solutions-01
     14.3. Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to
           draft-hellstrom-avtcore-multi-party-rtt-solutions-00
     14.4. Changes from version
           draft-hellstrom-mmusic-multi-party-rtt-01 to -02
   15. References
     15.1. Normative References
     15.2. Informative References
   Author's Address

1. Introduction

   Real-time text (RTT) is a medium in real-time conversational
   sessions. Text entered by participants in a session is transmitted
   in a time-sampled fashion, so that no specific user action is needed
   to cause transmission. This gives a direct flow of text at the rate
   it is created, which is suitable in a real-time conversational
   setting. The real-time text medium can be combined with other media
   in multimedia sessions.

   Media from a number of multimedia session participants can be
   combined in a multi-party session. The present document specifies
   how the real-time text streams can be handled in multi-party
   sessions. Recommendations are provided for preferred methods.

   The description is mainly focused on the transport level, but also
   describes a few session and presentation level aspects.

   Transport of real-time text is specified in RFC 4103 [RFC4103] RTP
   Payload for text conversation. It makes use of RFC 3550 [RFC3550],
   the Real-time Transport Protocol (RTP), for transport. Robustness
   against network transmission problems is normally achieved through
   redundant transmission based on the principle from RFC 2198
   [RFC2198], with one primary and two redundant transmissions of each
   text element. Primary and redundant transmissions are combined in
   packets and described by a redundancy header. This transport is
   usually used in the Session Initiation Protocol (SIP) RFC 3261
   [RFC3261] environment.

   A very brief overview of functions for real-time text handling in
   multi-party sessions is described in RFC 4597 [RFC4597] Conferencing
   Scenarios, sections 4.8 and 4.10.
   The present specification builds on that description and indicates
   which protocol mechanisms should be used to implement multi-party
   handling of real-time text.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel]. Multi-party aspects for
   WebRTC solutions are briefly covered.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Centralized conference model

   In the centralized conference model for SIP, introduced in RFC 4353
   [RFC4353] "A Framework for Conferencing with the Session Initiation
   Protocol (SIP)", one function coordinates the communication with
   participants in the multi-party session. This function also controls
   media mixer functions for the media appearing in the session. The
   central function is common for control of all media, while the media
   mixers may work differently for each medium.

   The central function is called the Focus UA. Many variants exist for
   setting up sessions including the multipoint control centre.
   Describing these variants is outside the scope of this document,
   which instead covers the media-specific handling in the mixer
   required to handle multi-party calls with RTT.

   The main principle for handling real-time text media in a centralized
   conference is that one RTP session for real-time text is established
   including the multipoint media control centre and the participating
   endpoints which are going to have real-time text exchange with the
   others.

   The different possible mechanisms for mixing and transporting RTT
   differ in the way they multiplex the text streams and how they
   identify the sources of the streams.
   RFC 7667 [RFC7667] describes a number of possible use cases for RTP.
   This specification refers to different sections of RFC 7667 for
   further reading on the situations caused by the different possible
   design choices.

   The recommended method for using RTT in a centralized conference
   model is specified in [I-D.ietf-avtcore-multi-party-rtt-mix] based on
   the recommendations in the present document.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel]. Ways to handle
   multi-party calls in that environment are also specified.

3. Requirements on multi-party RTT

   The following requirements are placed on multi-party RTT:

      A solution shall be applicable to IMS (3GPP TS 22.173) [TS22173],
      SIP based VoIP and Next Generation Emergency Services (NENA i3
      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]).

      The transmission interval for text must not be longer than 500
      milliseconds when there is anything available to send. Ref ITU-T
      T.140 [T140].

      If text loss is detected or suspected, a missing text marker shall
      be inserted in the text stream. Ref ITU-T T.140 Amendment 1
      [T140ad1], ETSI EN 301 549 [EN301549].

      The display of text from the members of the conversation shall be
      arranged so that the text from each participant is clearly
      readable, and its source and the relative timing of entered text
      is visualized in the display. Mechanisms for looking back in the
      contents from the current session should be provided. The text
      should be displayed as soon as it is received. Ref ITU-T T.140
      [T140].

      Bridges must be multimedia capable (voice, video, text). Ref NENA
      i3 STA-010.2.
      [NENAi3]

      R7: It MUST be possible to use real-time text in conferences both
      as a medium of discussion between individual participants (for
      example, for sidebar discussions in real-time text while listening
      to the main conference audio) and for central support of the
      conference with real-time text interpretation of speech. Ref RFC
      5194 [RFC5194].

      It should be possible to protect RTT contents with usual means for
      privacy and integrity. Ref RFC 6881 section 16 [RFC6881].

      Conferencing procedures are documented in RFC 4579 [RFC4579]. Ref
      NENA i3 STA-010.2 [NENAi3].

      Conferencing applies to any kind of media stream by which users
      may want to communicate. Ref 3GPP TS 24.147 [TS24147].

      The framework for SIP conferences is specified in RFC 4353
      [RFC4353]. Ref 3GPP TS 24.147 [TS24147].

      The mixer performance requirements can be expressed in two
      numbers.

      1) The number of participants who can transmit simultaneously
      without the text being delayed in the mixer by more than 500
      milliseconds. This requirement depends on the application.
      Five simultaneously transmitting participants is a sufficiently
      high number for most situations.

      2) The switching time from when the mixer is transmitting text
      from one participant and text arrives from another participant,
      until the mixer sends the text from the second participant. This
      time should not be more than 500 milliseconds when there are up to
      five participants sending text simultaneously.

4. RTP based solutions

4.1. Coordination of text RTP streams

   Coordinating and sending text RTP streams in the multi-party session
   can be done in a number of ways. The most suitable methods are
   specified here with pros and cons.

   A receiving and presenting endpoint MUST separate text from the
   different sources and identify and display them accordingly.

4.1.1.
RTP-based solutions with a central mixer

   A set of solutions can be based on the central RTP mixer. They are
   described here and a preferred method is selected.

4.1.1.1. RTP Mixer using default RFC 4103 methods

   Without any extra specifications, a mixer would transmit at 300
   millisecond intervals, and use RFC 4103 [RFC4103] with the default
   redundancy of one original and two redundant transmissions. The
   source of the text would be indicated by a single member in the CSRC
   list. Text from different sources cannot be transmitted in the same
   packet. Therefore, from the time when the mixer sent one piece of
   new text from one source, it will need to transmit that text again
   twice as redundant data, before it can send text from another source.
   The switching time will thus be 900 milliseconds. The mixer cannot
   even send text from two simultaneous sources without introducing more
   than 500 milliseconds delay. This is clearly insufficient.

   Pros:

   Only a capability negotiation method is needed. No other updates of
   standards are needed, just a general remark that traditional
   RTP-mixing is used.

   Cons:

   Clearly insufficient mixer switching performance.

   Somewhat complex handling of transmission when there is new text
   available from more than one source. The mixer needs to send two
   more packets with redundant text from the current source before
   starting to send anything from the other source.

4.1.1.2. RTP Mixer using the default method but decreased transmission
         interval

   This method makes use of the default RTP-mixing method briefly
   described in Section 4.1.1.1. The only difference is that the
   transmission interval is decreased to 100 milliseconds when there is
   text from more than one source available for transmission. This
   increases the switching performance to three source switches per
   second.
   The delay of new text from a participant can be one second
   if five users send new text simultaneously. Text from two
   simultaneous users would not be delayed by more than 400 ms.

   Pros:

   Minor influence on standards.

   Can be declared in SDP as "text/red" with a multi-party attribute for
   capability negotiation.

   Cons:

   Too long delay of new text from more than two simultaneous sources.

   Slightly higher risk for loss of text at bursty packet loss than for
   the recommended transmission interval (300 ms) for RFC 4103.

   When complete loss of packets occurs (beyond recovery), it is not
   possible to deduce from which source text was lost.

   Somewhat complex handling of transmission when there is new text
   available from more than one source. The mixer needs to send two
   more packets with redundant text from the current source before
   starting to send anything from the other source.

4.1.1.3. RTP Mixer with frequent transmission and indicating sources in
         CSRC-list

   An RTP media mixer combines text from participants into one RTP
   stream, thus all using the same destination address/port combination,
   the same RTP SSRC, and one sequence number series as described in
   Section 7.1 and 7.3 of RTP RFC 3550 [RFC3550] about the Mixer
   function. This method is also briefly described in RFC 7667, section
   3.6.1 Media mixing mixer [RFC7667].

   The sources of the text in each RTP packet are identified by the CSRC
   list in the RTP packets, containing the SSRC of the initial sources
   of text. The CSRC list is ordered with the SSRC of the source of the
   primary text first, followed by the SSRC of the first level
   redundancy, and then the second level redundancy.

   The transmission interval should be 100 milliseconds when there is
   text to transmit from more than one source, and otherwise 300 ms.
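
   The CSRC ordering and the interval rule described above can be
   illustrated with a minimal Python sketch. The function names are
   hypothetical and chosen only for this illustration; nothing here is
   defined by RFC 4103 or by this document. It assumes that each
   redundancy generation in a packet maps to the CSRC entry at the same
   position (primary first).

```python
from typing import List, Sequence

# Interval values taken from the text above (milliseconds).
MULTI_SOURCE_INTERVAL_MS = 100
SINGLE_SOURCE_INTERVAL_MS = 300


def transmission_interval_ms(sources_with_pending_text: int) -> int:
    """Pick the mixer transmission interval for the next packet."""
    if sources_with_pending_text > 1:
        return MULTI_SOURCE_INTERVAL_MS
    return SINGLE_SOURCE_INTERVAL_MS


def build_csrc_list(primary_ssrc: int,
                    redundant_ssrcs: Sequence[int]) -> List[int]:
    """Order the CSRC list: primary source first, then the source of the
    first-level redundancy, then the second-level redundancy."""
    return [primary_ssrc, *redundant_ssrcs]


def source_of_generation(csrc_list: Sequence[int], generation: int) -> int:
    """Return the SSRC for a generation (0 = primary, 1 = first-level
    redundancy, 2 = second-level redundancy)."""
    return csrc_list[generation]


# Hypothetical SSRC values for two participants.
ALICE, BOB = 0x1111, 0x2222
csrcs = build_csrc_list(ALICE, [BOB, ALICE])
```

   With this ordering, a receiver that recovers text from the
   first-level redundancy after a single lost packet can attribute it
   to the second CSRC entry (BOB in the sketch).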

   The identification of the sources is made through the CSRC fields and
   can be made more readable at the receiver through the RTCP SDES CNAME
   and NAME packets as described in RTP [RFC3550].

   Information provided through the notification according to RFC 4575
   [RFC4575] when the participant joined the conference also provides
   suitable information and a reference to the SSRC.

   A receiving endpoint is supposed to separate text items from the
   different sources and identify and display them accordingly.

   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
   possible to recover from loss of one or two packets in sequence and
   assign the recovered text to the right source. For more loss, a
   marker for possible loss should be inserted or presented.

   The conference server needs to have authority to decrypt the payload
   in the received RTP packets in order to be able to recover text from
   redundant data or insert the missing text marker in the stream, and
   repack the text in new packets.

   Even if the format is very similar to "text/red" of RFC 4103, it has
   been indicated that it needs to be declared as a new media subtype,
   e.g. "text/rex".

   Pros:

   This method has low overhead and less complexity than the methods in
   Section 4.1.1.1, Section 4.1.1.2, Section 4.1.1.4 and
   Section 4.1.1.6.

   When loss of packets occurs, it is possible to recover text from
   redundancy at loss of up to the number of redundancy levels carried
   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
   levels).

   This method can be implemented with most RTP implementations.

   The source switching performance is sufficient for well-behaving
   conference participants. There can be switching between five sources
   per second with an introduced delay of maximum 500 ms. With just two
   parties typing simultaneously, the delay will be a maximum of 100 ms.

   Cons:

   When more consecutive packets are lost than the number of generations
   of redundant data, it is not possible to deduce the sources of
   the totally lost data.

   Slightly higher risk for loss of text at bursty packet loss than for
   the recommended transmission interval for RFC 4103.

   Requires a different media subtype, e.g. "text/rex".

   The conference server needs to be allowed to decrypt/encrypt the
   packet payload. This is however normal for media mixers for other
   media.

4.1.1.4. RTP Mixer using timestamp to identify redundancy

   This method has text only from one source per packet, as the original
   RFC 4103 [RFC4103] specifies. Packets with text from different
   sources are instead allowed to be merged. The recovery procedure in
   the receiver will use the RTP timestamp and timestamp offsets in the
   redundancy headers to evaluate if a piece of redundant data should be
   recovered or not in case of packet loss.

   In this method, the transmission interval is 100 milliseconds when
   text from more than one source is available for transmission.

   Pros:

   The format of each packet is equal to what is specified in RFC 4103
   [RFC4103].

   The source switching performance is sufficient. Text from five
   participants can be transmitted simultaneously with a 500 millisecond
   interval per source.

   New text from five simultaneous sources can be transmitted within 500
   milliseconds. This is sufficient.

   Cons:

   The recovery time in case of packet loss is long. With five
   participants, it will be 1.5 seconds.

   The recovery procedure is complex and very different from what is
   described in RFC 4103 [RFC4103].

   It is not certain that this change can be regarded as an update to
   RFC 4103. It may need a new media subtype.

4.1.1.5.
RTP Mixer with multiple primary data in each packet and
         individual sequence numbers

   This method allows primary as well as redundant text from more than
   one source per packet. The packet payload contains an ordered set of
   redundant and primary data with the same number of generations of
   redundancy as once agreed in the SDP negotiation. The data header
   reflects these parts of the payload. The CSRC list contains one CSRC
   member per source in the payload and in the same order. An
   individual sequence number per source is included in the data header
   replacing the t140 payload type number that is instead assumed to be
   constant in this format. This allows an individual extra sequence
   number per source with maximum value 127, suitable for checking for
   which source loss of text appeared when recovery was not possible.

   The data header would contain the following fields:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F| Source-seq  |  timestamp offset         |   block length    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Where "Source-seq" is the sequence number per source.

   The maximum number of members in the CSRC-list is 16, and that is
   therefore the maximum number of sources that can be represented in
   each packet provided that all data can be fitted into the size
   allowable in one packet.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".
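
   The 32-bit data header shown above can be illustrated with a minimal
   Python sketch. It assumes the field widths read off the bit ruler in
   the figure (F: 1 bit, Source-seq: 7 bits, timestamp offset: 14 bits,
   block length: 10 bits) and network byte order. The function names are
   hypothetical, and whether the final (primary) block uses a shorter
   header, as in RFC 2198, is not covered by this sketch.

```python
import struct


def pack_data_header(f: int, source_seq: int,
                     ts_offset: int, block_len: int) -> bytes:
    """Pack one 4-byte data header in network byte order."""
    if f not in (0, 1):
        raise ValueError("F must be 0 or 1")
    if not 0 <= source_seq <= 0x7F:
        raise ValueError("Source-seq must fit in 7 bits (0..127)")
    if not 0 <= ts_offset <= 0x3FFF:
        raise ValueError("timestamp offset must fit in 14 bits")
    if not 0 <= block_len <= 0x3FF:
        raise ValueError("block length must fit in 10 bits")
    word = (f << 31) | (source_seq << 24) | (ts_offset << 10) | block_len
    return struct.pack("!I", word)


def unpack_data_header(data: bytes):
    """Return (F, Source-seq, timestamp offset, block length)."""
    (word,) = struct.unpack("!I", data[:4])
    return (
        (word >> 31) & 0x1,
        (word >> 24) & 0x7F,
        (word >> 10) & 0x3FFF,
        word & 0x3FF,
    )


header = pack_data_header(1, 5, 300, 12)
fields = unpack_data_header(header)
```

   The 7-bit width of Source-seq is what gives the maximum value of 127
   mentioned above, so a receiver comparing sequence numbers has to
   handle wrap-around from 127 to 0.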

   This is an SDP offer example for both traditional "text/red"
   and multi-party "text/rex" format:

      m=text 11000 RTP/AVP 101 100 98
      a=rtpmap:98 t140/1000
      a=rtpmap:100 red/1000
      a=rtpmap:101 rex/1000
      a=fmtp:100 98/98/98
      a=fmtp:101 98/98/98

   Pros:

   The source switching performance is good. Text from 16 participants
   can be transmitted simultaneously.

   New text from 16 simultaneous sources can be transmitted within 300
   milliseconds. This is good performance.

   When more consecutive packets are lost than the number of generations
   of redundant data, it is still possible to deduce the sources of
   the totally lost data, when the next text from these sources arrives.

   Cons:

   The format of each packet is different from what is specified in RFC
   4103 [RFC4103].

   A new media subtype is needed.

   The recovery procedure is a bit complex.

4.1.1.6. RTP Mixer with multiple primary data in each packet

   This method allows primary as well as redundant text from more than
   one source per packet. The packet payload contains an ordered set of
   redundant and primary data with the same number of generations of
   redundancy as once agreed in the SDP negotiation. The data header
   reflects these parts of the payload. The CSRC list contains one CSRC
   member per source in the payload and in the same order. The
   maximum number of members in the CSRC-list is 16, and that is
   therefore the maximum number of sources that can be represented in
   each packet provided that all data can be fitted into the size
   allowable in one packet.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".

   SDP would be the same as in Section 4.1.1.5.

   Pros:

   The source switching performance is good.
   Text from 16 participants can be transmitted simultaneously.

   New text from 16 simultaneous sources can be transmitted within 150
   milliseconds. This is good performance.

   Cons:

   The format of each packet is different from what is specified in RFC
   4103 [RFC4103].

   A new media subtype is needed.

   The recovery procedure is a bit complex [RFC4103].

   When more consecutive packets are lost than the number of generations
   of redundant data, it is not possible to deduce the sources of
   the totally lost data.

4.1.1.7. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy in the
         packets

   This method allows primary data from one source and redundant text
   from other sources in each packet. The packet payload contains
   primary data in "text/t140" format, and redundant data in RFC 5109
   FEC [RFC5109] format called "text/ulpfec". That means that the
   redundant data contains the sequence number and the CSRC and other
   characteristics from the RTP header when the data was sent as
   primary. The redundancy can be sent a selected number of packets
   after the data was sent as primary, in order to improve the
   protection against bursty packet loss. The redundancy level is
   recommended to be the same as in original RFC 4103.

   RFC 4103 says that the protection against loss can be made by other
   methods than plain redundancy, so this method is in line with that
   statement.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 100 ms and not longer than 300 ms while
   there is anything to send (new or redundant text).

   When more consecutive packets are lost than the number of generations
   of redundant data, it is not possible to deduce the sources of
   the totally lost data.

   The SDP can indicate the format as "text/red" with "text/ulpfec"
   redundant data in this way,
   with traditional RFC 4103 with "text/red"
   with "text/t140" as redundant data as a fallback.

      m=text 49170 RTP/AVP 98 101 100 102
      a=rtpmap:98 red/1000
      a=fmtp:98 100/102/102
      a=rtpmap:102 ulpfec/1000
      a=rtpmap:100 t140/1000
      a=rtpmap:101 red/1000
      a=fmtp:101 100/100/100
      a=fmtp:100 cps=200

   The "text/ulpfec" format includes an indication of how far back the
   redundancy belongs, making it possible to cover bursty packet loss
   better than the other formats with short transmission intervals. For
   real-time text, it is recommended to send three packets between the
   primary and the redundant transmissions of text. That makes the
   transmission cover between 500 and 1500 ms of bursty packet loss.
   The variation is because of the varying packet interval between many
   and one simultaneously transmitting source.

   The "text/ulpfec" format has a number of parameters. One is the
   length of the data to be protected, which in this case must be the
   whole t140block.

   Pros:

   The source switching performance is good. Text from 5 participants
   can be transmitted within 500 ms.

   Good recovery from bursty packet loss.

   The method is based on existing standards. No new registrations are
   needed.

   Cons:

   When more consecutive packets are lost than the number of generations
   of redundant data, it is not possible to deduce the sources of
   the totally lost data.

   Even if the switching performance is good, it is not as good as for
   the method called "RTP Mixer with multiple primary data in each
   packet" (Section 4.1.1.6). With more than 5 simultaneously sending
   sources, there will be a noticeable delay of text of over 500 ms,
   with 100 ms added per simultaneous source. This is however beyond
   the requirements and would be a concern only in congestion
   situations.

   The recovery procedure is a bit complex [RFC5109].
676 There is more overhead in terms of extra data and extra packets sent 677 than in the other methods. With the recommended two redundant 678 generations of data, each packet will be 36 bytes longer than with 679 traditional RFC 4103, and at each pause in transmission five extra 680 packets with only redundant data will be sent compared to two extra 681 packets for the traditional RFC 4103 case. 683 4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy and 684 separate sequence number in the packets 686 This method allows primary data from one source and redundant text 687 from other sources in each packet. The packet payload contains 688 primary data in a new "text/t140e" format, and redundant data in RFC 689 5109 FEC [RFC5109] format called "text/ulpfec". That means that the 690 redundant data contains the sequence number and the CSRC and other 691 characteristics from the RTP header when the data was sent as 692 primary. The redundancy can be sent at a selected number of packets 693 after when it was sent as primary, in order to improve the protection 694 against bursty packet loss. The redundancy level is recommended to 695 be the same as in original RFC 4103. The "text/t140e" format 696 contains a source-specific sequence number and the t140block. 698 RFC 4103 says that the protection against loss can be made by other 699 methods than plain redundancy, so this method is in line with that 700 statement. 702 Transmission is done as soon as there is new text available, but not 703 with shorter interval than 100 ms and not longer than 300 ms while 704 there is anything to send (new or redundant text). 706 When more consecutive packet loss than the number of generations of 707 redundant data appears, it is possible to deduct which sources lost 708 data when new data arrives from the sources. This is done by 709 monitoring the received source specific sequence numbers preceding 710 the text. 
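The monitoring of source-specific sequence numbers described above can be sketched as follows. This is a minimal illustration only, not part of the specification: the class name SourceLossTracker, the observe() method and the 16-bit wrap-around of the counter are assumptions, since the wire format of "text/t140e" is not defined in this document.

```python
class SourceLossTracker:
    """Track the last source-specific sequence number seen per source.

    Hypothetical helper: the names and the 16-bit wrap-around are
    assumptions, not taken from the draft.
    """

    def __init__(self, modulus=1 << 16):
        # Assumption: the counter wraps like a 16-bit RTP sequence number.
        self.modulus = modulus
        self.last_seq = {}

    def observe(self, source_id, seq):
        """Record a received item; return how many items from this
        source were missed since the previous item from it."""
        prev = self.last_seq.get(source_id)
        if prev is None:
            # First item from this source: nothing to compare against.
            self.last_seq[source_id] = seq
            return 0
        step = (seq - prev) % self.modulus
        if step == 0 or step > self.modulus // 2:
            # Duplicate or late (reordered) item: report no new loss
            # and keep the stored state unchanged.
            return 0
        self.last_seq[source_id] = seq
        return step - 1


tracker = SourceLossTracker()
tracker.observe("A", 5)
tracker.observe("B", 1)
missed = tracker.observe("A", 9)  # items 6, 7 and 8 from "A" never arrived
```

A non-zero return value tells the receiver which source to mark with a text loss indication, even when the RTP-level loss exceeded what the redundancy could recover.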
712 This is an example of how SDP can indicate the format as "text/red" with 713 "text/t140e" as primary and "text/ulpfec" redundant data, with 714 traditional RFC 4103 with "text/red" with "text/t140" as redundant 715 data as a fallback.

717 m=text 49170 RTP/AVP 98 101 100 102 103
718 a=rtpmap:98 red/1000
719 a=fmtp:98 100/102/102
720 a=rtpmap:102 ulpfec/1000
721 a=rtpmap:103 t140/1000
722 a=rtpmap:100 t140e/1000
723 a=rtpmap:101 red/1000
724 a=fmtp:101 103/103/103
725 a=fmtp:100 cps=200

727 The "text/ulpfec" format includes an indication of how far back the 728 redundancy belongs, making it possible to cover bursty packet loss 729 better than the other formats with short transmission intervals. For 730 real-time text, it is recommended to send three packets between the 731 primary and the redundant transmissions of text. That makes the 732 transmission cover between 500 and 1500 ms of bursty packet loss. 733 The variation is because of the varying packet interval between many 734 and one simultaneously transmitting source.

736 The "text/ulpfec" format has a number of parameters. One is the 737 length of the data to be protected, which in this case must be the 738 whole t140block.

740 Pros:

742 The source switching performance is good. Text from 5 participants 743 can be transmitted within 500 ms.

745 Good recovery from bursty packet loss.

747 The method is based on an existing standard for FEC.

749 When more consecutive packets are lost than the number of generations of 750 redundant data, it is possible to deduce the source of the 751 lost data when new text arrives from the source.

753 Cons:

755 Even if the switching performance is good, it is not as good as for 756 the method called "RTP Mixer with multiple primary data in each 757 packet" (Section 4.1.1.6). With more than 5 simultaneously sending 758 sources, there will be a noticeable delay of text of over 500 ms, 759 with 100 ms added per simultaneous source.
This is however beyond 760 the requirements and would be a concern only in congestion 761 situations.

763 The recovery procedure is a bit complex [RFC5109].

765 There is more overhead in terms of extra data and extra packets sent 766 than in the other methods. With the recommended two redundant 767 generations of data, each packet will be 40 bytes longer than with 768 traditional RFC 4103, and at each pause in transmission five extra 769 packets with only redundant data will be sent compared to two extra 770 packets for the traditional RFC 4103 case.

772 A new text media subtype "text/t140e" needs to be registered.

774 4.1.1.9. RTP Mixer indicating participants by a control code in the 775 stream

777 Text from all participants except the receiving one is transmitted 778 from the media mixer in the same RTP session and stream, thus all 779 using the same destination address/port combination, the same RTP 780 SSRC and one sequence number series as described in Sections 7.1 and 781 7.3 of RTP RFC 3550 [RFC3550] about the Mixer function. The sources 782 of the text in each RTP packet are identified by a newly defined T.140 783 control code "c" followed by a unique identification of the source in 784 UTF-8 string format.

786 The receiver can use the string for presenting the source of text. 787 On the RTP level, this method is described in RFC 7667, section 3.6.1 788 Media mixing mixer [RFC7667].

790 The inline coding of the source of text is applied in the data stream 791 itself, and an RTP mixer function is used for coordinating the 792 sources of text into one RTP stream.

794 Information uniquely identifying each user in the multi-party session 795 is placed as the parameter value "n" in the T.140 application 796 protocol function with the function code "c". The identifier shall 797 thus be formatted like this: SOS c n ST, where SOS and ST are coded 798 as specified in ITU-T T.140 [T140]. The "c" is the letter "c".
The 799 n parameter value is a string uniquely identifying the source. This 800 parameter shall be kept short so that it can be repeated in the 801 transmission without concerns for network load.

803 A receiving endpoint is supposed to separate text items from the 804 different sources and identify and display them accordingly.

806 The conference server needs to be allowed to decrypt/encrypt the 807 packet payload in order to check the source and repack the text.

809 Pros:

811 If loss of packets occurs, it is possible to recover text from 812 redundancy at loss of up to the number of redundancy levels carried 813 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 814 levels).

816 This method can be implemented with most RTP implementations.

818 The method can also be used with other transports than RTP.

820 Cons:

822 The method implies a moderate load by the need to insert the source 823 often in the stream.

825 If more consecutive packets are lost than the number of generations of 826 redundant data, it is not possible to deduce the source of 827 the totally lost data.

829 The mixer needs to be able to generate unique source 830 identifications which are suitable as labels for the sources.

832 Requires an extension of the ITU-T T.140 standard, best made by the 833 ITU.

835 There is a risk that the control code indicating the change of source 836 is lost and the result is false source indication of text.

838 The conference server needs to be allowed to decrypt/encrypt the 839 packet payload.

841 4.1.1.10. Mixing for multi-party unaware user agents

843 Multi-party real-time text contents can be transmitted to multi-party 844 unaware user agents if source labelling and formatting of the text is 845 performed by a mixer.
This method has the limitations that the 846 layout of the presentation and the format of source identification are 847 purely controlled by the mixer, and that only one source at a time is 848 allowed to present in real-time. Text from other sources needs to be stored 849 temporarily waiting for an appropriate moment to switch the source of 850 transmitted text. The mixer controls the switching of sources and 851 inserts a source identifier in text format at the beginning of text 852 after switch of source. The logic of the mixer to detect when a 853 switch is appropriate should detect a number of places in text where 854 a switch can be allowed, including new line, end of sentence, end of 855 phrase, a period of inactivity, and a word separator after a long 856 time of active transmission.

858 This method MAY be used when no support for multi-party awareness is 859 detected in the receiving endpoint. The basis for this method is 860 described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667].

862 See [I-D.ietf-avtcore-multi-party-rtt-mix] for a procedure for mixing 863 RTT for a conference-unaware endpoint.

865 Pros:

867 Can be transmitted to conference-unaware endpoints.

869 Can be used with other transports than RTP.

871 Cons:

873 Does not allow full real-time presentation of more than one source at 874 a time. Text from other sources will be delayed.

876 The only realistic presentation format is a style with the text from 877 the different sources presented with a text label indicating source, 878 and the text collected in a chat style presentation but with more 879 frequent turn-taking.

881 Endpoints often have their own system for adding labels to the RTT 882 presentation. In that case there will be two levels of labels in the 883 presentation, one for the mixer and one for the sources.

885 If loss of more packets than can be recovered by the redundancy 886 appears, it is not possible to detect which source was struck by the 887 loss.
It is also possible that a source switch occurred during the 888 loss, and therefore a false indication of the source of text can be 889 provided to the user after such loss. 891 Because of all these cons, this method is not recommended and MUST 892 NOT be used as the main method, but only as the last resort for 893 backwards interoperability with multi-party unaware endpoints. 895 The conference server need to be allowed to decrypt/encrypt the 896 packet payload. 898 4.1.2. RTP-based bridging with minor RTT media contents reformatting by 899 the bridge 901 It may be desirable to send text in a multi-party setting in a way 902 that allows the text stream contents to be distributed without being 903 dealt with in detail in any central server. A number of such methods 904 are described. However, when writing this specification, no one of 905 these methods have a specified way of establishing the session by 906 sdp. 908 4.1.2.1. RTP Translator sending one RTT stream per participant 910 Within the RTP session, text from each participant is transmitted 911 from the RTP media translator (bridge) in a separate RTP stream, thus 912 using the same destination address/port combination, the same payload 913 type number (PT) but separate RTP SSRC parameters and sequence number 914 series as described in Section 7.1 and 7.2 of RTP RFC 3550 [RFC3550] 915 about the Translator function. The source of the text in each RTP 916 packet is identified by the SSRC parameter in the RTP packets, 917 containing the SSRC of the initial source of text. 919 A receiving and presenting endpoint is supposed to separate text 920 items from the different sources and identify and display them in a 921 suitable way. 923 This method is described in RFC 7667, section 3.5.1 Relay-transport 924 translator or 3.5.2 Media translator [RFC7667]. 926 The identification of the source is made through the SSRC. 
The 927 translation to a readable label can be done by mapping to information 928 from the RTCP SDES CNAME and NAME packets as described in 929 RTP [RFC3550], and also through information in the text media member 930 in the conference notification described in RFC 4575 [RFC4575].

932 The SDP exchange for establishing this mixing type can be equal to 933 what is used for basic two-party use of RFC 4103 with just an added 934 attribute for indicating multi-party capability.

936 m=text 49170 RTP/AVP 98 103
937 a=rtpmap:98 red/1000
938 a=fmtp:98 103/103/103
939 a=rtpmap:103 t140/1000
940 a=fmtp:103 cps=150
941 a=RTT-mix:RTP-translator

942 A similar answer including the same RTT-mix attribute would indicate 943 that multi-party coding can begin. An answer without the same 944 RTT-mix attribute could result in diversion to use of the mixing method 945 for multi-party unaware endpoints (Section 4.1.1.10) if more than two 946 parties are involved in the session.

948 The bridge can add new sources in the communication to a participant 949 by first sending a conference notification according to RFC 4575 950 [RFC4575] with the SSRC of the new source included in the 951 corresponding "text" media member, or by sending an RTCP message with 952 the new SSRC in an SDES packet.

954 A receiver should be prepared to receive such indications of new 955 streams being added to the multi-party session, so that the new SSRC 956 is not taken for a change in SSRC value for an already established 957 RTP stream.

959 Transmission, reception, packet loss recovery and text loss 960 indication are performed per source in the separate RTP streams in the 961 same way as in two-party sessions with RFC 4103 [RFC4103].

963 Text is recommended to be sent by the bridge as soon as it is 964 available for transmission, but not less than 250 ms after a previous 965 transmission.
This will in many cases result in close to 0 added 966 delay by the bridge, because most RTT senders use a 300 ms 967 transmission interval. 969 It is sometimes said that this configuration is not supported by 970 current media declarations in sdp. RFC 3264 [RFC3264]specifies in 971 some places that one media description is supposed to describe just 972 one RTP media stream. However this is not directly referencing an 973 RTP stream, and use of multiple RTP streams in the same RTP session 974 is recommended in many other RFCs. 976 This confusion is clarified in RFC 5576 [RFC5576] section 3 by the 977 following statements: 979 "The term "media stream" does not appear in the SDP specification 980 itself, but is used by a number of SDP extensions, for instance, 981 Interactive Connectivity Establishment (ICE) [ICE], to denote the 982 object described by an SDP media description. This term is 983 unfortunately rather confusing, as the RTP specification [RFC3550] 984 uses the term "media stream" to refer to an individual media source 985 or RTP packet stream, identified by an SSRC, whereas an SDP media 986 stream describes an entire RTP session, which can contain any number 987 of RTP sources." 988 In most cases, it will be sufficient that new sources are introduced 989 with a conference notification or RTCP message. However, RFC 5576 990 [RFC5576] specifies attributes which may be used to more explicitly 991 announce new sources or restart of earlier established RTP streams. 993 This method is encouraged by draft-ietf-avtcore-multiplex-guidelines 994 [I-D.ietf-avtcore-multiplex-guidelines] section 5.2. 996 Normal operation will be that the bridge receives text packets from 997 the source and handles any text recovery and indication of loss 998 needed before queueing the resulting clean text for transmission from 999 the bridge to the receivers. 
1001 It may however also be possible for the bridge to just convey the 1002 packet contents as received from the sources, with minor adjustments, 1003 and let the receiving endpoint handle all aspects of recovery and 1004 indication of loss, even for the source to bridge path. In that case 1005 also the sequence number must be maintained as it was at reception in 1006 the bridge. This mode needs further study before application. 1008 Pros: 1010 This method is the natural way to do multi-party bridging with RFC 1011 4103 based RTT. Only a small addition is included in the session 1012 establishment to verify capability by the parties because many 1013 implementations are done without multi-party capability. 1015 This method has moderate overhead in terms of work for the mixer, but 1016 high in terms of packet transmission rate. Five sources sending 1017 simultaneously cause the bridge to send 15 packets per second to each 1018 receiver. 1020 When loss of packets occur, it is possible to recover text from 1021 redundancy at loss of up to the number of redundancy levels carried 1022 in the RFC 4103 [RFC4103] stream(normally primary and two redundant 1023 levels). 1025 More loss than what can be recovered, can be detected and the marker 1026 for text loss can be inserted in the correct stream. 1028 It may be possible in some scenarios to keep the text encrypted 1029 through the Translator. 1031 Minimal delay. The delay can often be kept close to 0 with at least 1032 5 simultaneous sending participants. 1034 Cons: 1036 There may be RTP implementations not supporting the Translator model. 1037 They will need to use the fall-back to multi-party-unaware mixing. 1038 An investigation about how common this is is needed before the method 1039 is used. 1041 With many simultaneous sending sources, the total rate of packets 1042 will be high, and can cause congestion. 
The requirement to handle 5 1043 simultaneous sources in this specification will cause 15 packets per 1044 second, which is on the high side but still manageable in most cases, 1045 e.g. considering that audio usually uses 50 packets per second.

1047 4.1.2.2. Distributing packets in an end-to-end encryption structure

1049 In order to achieve end-to-end encryption, it is possible to let the 1050 packets from the sources just pass through a central distributor, and 1051 handle the security agreements between the participants. 1052 Specifications exist for a framework with this functionality for 1053 application on RTP based conferences in 1054 [I-D.ietf-perc-private-media-framework]. The RTP flow and mixing 1055 characteristics have similarities with the method described under "RTP 1056 Translator sending one RTT stream per participant" above. RFC 4103 1057 RTP streams [RFC4103] would fit into the structure and it would 1058 provide a base for end-to-end encrypted RTT multi-party conferencing.

1060 Pros:

1062 Good security.

1064 Straightforward multi-party handling.

1066 Cons:

1068 Does not operate under the usual SIP central conferencing 1069 architecture.

1071 Requires the participants to perform a lot of key handling.

1073 Is work in progress when this is written.

1075 4.1.2.3. Mesh of RTP endpoints

1077 Text from all participants is transmitted directly to all others in 1078 one RTP session, without a central bridge. The sources of the text 1079 in each RTP packet are identified by the source network address and 1080 the SSRC.

1082 This method is described in RFC 7667, section 3.4 Point to 1083 multi-point using mesh [RFC7667].

1085 Pros:

1087 When loss of packets occurs, it is possible to recover text from 1088 redundancy at loss of up to the number of redundancy levels carried 1089 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1090 levels).

1092 This method can be implemented with most RTP implementations.
1094 Transmitted text can also be used with other transports than RTP 1096 Cons: 1098 This model is not described in IMS, NENA and EENA specifications, and 1099 does therefore not meet the requirements. 1101 Requires a drastically increasing number of connections when the 1102 number of participants increase. 1104 4.1.2.4. Multiple RTP sessions, one for each participant 1106 Text from all participants are transmitted directly to all others in 1107 one RTP session each, without a central bridge. Each session is 1108 established with a separate media description in SDP. The sources of 1109 the text in each RTP packet are identified by the source network 1110 address and the SSRC. 1112 Pros: 1114 When loss of packets occur, it is possible to recover text from 1115 redundancy at loss of up to the number of redundancy levels carried 1116 in the RFC 4103 [RFC4103] stream. (normally primary and two redundant 1117 levels. 1119 Complete loss of text can be indicated in the received stream. 1121 This method can be implemented with most RTP implementations. 1123 End-to-end encryption is achievable. 1125 Cons: 1127 This method is not described in IMS, NENA and ETSI specifications and 1128 does therefore not meet the requirements. 1130 A lot of network resources are spent on setting up separate sessions 1131 for each participant. 1133 5. Preferred RTP-based multi-party RTT transport method 1135 For RTP transport of RTT using RTP-mixer technology, one method for 1136 multi-party mixing and transport stand out as fulfilling the goals 1137 best and is therefore recommended. That is: TBD 1139 For RTP transport in separate streams or sessions, no current 1140 recommendation can be made. A bridging method in the process of 1141 standardisation with interesting characteristics is the end-to-end 1142 encryption model "perc" Section 4.1.2.2. 1144 6. 
Session control of RTP-based multi-party RTT sessions 1146 General session control aspects for multi-party sessions are 1147 described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP) 1148 Event Package for Conference State, and RFC 4579 [RFC4579] Session 1149 Initiation Protocol (SIP) Call Control - Conferencing for User 1150 Agents. The nomenclature of these specifications are used here. 1152 The procedures for a multi-party aware model for RTT-transmission 1153 shall only be applied if a capability exchange for multi-party aware 1154 real-time text transmission has been completed and a supported method 1155 for multi-party real-time text transmission can be negotiated. 1157 A method for detection of conference-awareness for centralized SIP 1158 conferencing in general is specified in RFC 4579 [RFC4579]. The 1159 focus sends the "isfocus" feature tag in a SIP Contact header. This 1160 causes the conference-aware endpoint to subscribe to conference 1161 notifications from the focus. The focus then sends notifications to 1162 the endpoint about entering and disappearing conference participants 1163 and their media capabilities. The information is carried XML- 1164 formatted in a 'conference-info' block in the notification according 1165 to RFC 4575 [RFC4575]. The mechanism is described in detail in RFC 1166 4575 [RFC4575]. 1168 Before a conference media server starts sending multi-party RTT to an 1169 endpoint, a verification of its ability to handle multi-party RTT 1170 must be made. A decision on which mechanism to use for identifying 1171 text from the different participants must also be taken, implicitly 1172 or explicitly. These verifications and decisions can be done in a 1173 number of ways. The most apparent ways are specified here and their 1174 pros and cons described. One of the methods is selected to be the 1175 one to be used by implementations of the centralized conference model 1176 according to this specification. 1178 6.1. 
Implicit RTT multi-party capability indication

1180 Capability for RTT multi-party handling can be decided to be 1181 implicitly indicated by session control items.

1183 The focus may implicitly indicate multi-party RTT capability by 1184 including the media child with value "text" in the RFC 4575 [RFC4575] 1185 conference-info provided in conference notifications.

1187 An endpoint may implicitly indicate multi-party RTT capability by 1188 including the text media in the SDP in the session control 1189 transactions with the conference focus after the subscription to the 1190 conference has taken place.

1192 The implicit RTT capability indication means for the focus that it 1193 can handle multi-party RTT according to the preferred method 1194 indicated in the RTT multi-party methods section above.

1196 The implicit RTT capability indication means for the endpoint that it 1197 can handle multi-party RTT according to the preferred method 1198 indicated in the RTT multi-party methods section above.

1200 If the focus detects that an endpoint implicitly declared RTT 1201 multi-party capability, it SHALL provide RTT according to the preferred 1202 method.

1204 If the focus detects that the endpoint does not indicate any RTT 1205 multi-party capability, then it shall either provide RTT multi-party 1206 text in the way specified for conference-unaware endpoints above, or 1207 refuse to set up the session.

1209 If the endpoint detects that the focus has implicitly declared RTT 1210 multi-party capability, it shall be prepared to present RTT in a 1211 multi-party fashion according to the preferred method.

1213 Pros:

1215 Acceptance of implicit multi-party capability implies that no 1216 standardisation of explicit RTT multi-party capability exchange is 1217 required.

1219 Cons:

1221 If other methods for multi-party RTT are to be used in the same 1222 implementation environment as the preferred ones, then capability 1223 exchange needs to be defined for them.
1225 Cannot be used outside a strictly applied SIP central conference 1226 model. 1228 6.2. RTT multi-party capability declared by SIP media-tags 1230 Specifications for RTT multi-party capability declarations can be 1231 agreed for use as SIP media feature tags, to be exchanged during SIP 1232 call control operation according to the mechanisms in RFC 3840 1233 [RFC3840] and RFC 3841 [RFC3841]. Capability for the RTT Multi-party 1234 capability is then indicated by the media feature tag "rtt-mix", with 1235 a set of possible values for the different possible methods. 1237 The possible values in the list may for example be: 1239 rtp-mixer 1241 perc 1243 rtp-mixer indicates capability for using the RTP-mixer based 1244 presentation of multi-party text. 1246 perc indicates capability for using the perc based transmission of 1247 multi-party text. 1249 Example: Contact: 1251 ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL" 1253 ;+sip.rtt-mix="rtp-mixer" 1255 If, after evaluation of the alternatives in this specification, only 1256 one mixing method is selected to be brought to implementation, then 1257 the media tag can be reduced to a single tag with no list of values. 1259 An offer-answer exchange should take place and the common method 1260 selected by the answering party shall be used in the session with 1261 that UA. 1263 When no common method is declared, then only the fallback method for 1264 multi-party unaware participants can be used, or the session dropped. 1266 If more than one text media section is included in SDP, all must be 1267 capable of using the declared RTT multi-party method. 1269 Pros: 1271 Provides a clear decision method. 1273 Can be extended with new mixing methods. 1275 Can guide call routing to a suitable capable focus. 1277 Cons: 1279 Requires standardization and IANA registration. 1281 Is not stream specific. If more than one text stream is specified, 1282 all must have the same type of multi-party capability. 
1284 Cannot be used in the WebRTC environment. 1286 6.3. SDP media attribute for RTT multi-party capability indication 1288 An attribute can be specified on media level, to be used in text 1289 media SDP declarations for negotiating RTT multi-party capabilities. 1290 The attribute can have the name "rtt-mix". 1292 More than one attribute can be included in one media description. 1294 The attribute can have a value. The value can for example be: 1296 rtp-mixer 1298 rtp-translator 1300 perc 1302 rtp-mixer indicates capability for using the RTP-mixer and CSRC-list 1303 based mixing of multi-party text. 1305 rtp-translator indicates capability for using the RTP-translator 1306 based mixing 1308 perc indicates capability for using the perc based transmission of 1309 multi-party text. 1311 An offer-answer exchange should take place and the common method 1312 selected by the answering party shall be used in the session with 1313 that endpoint. 1315 When no common method is declared, then only the fallback method for 1316 multi-party unaware endpoints can be used. 1318 Example: a=rtt-mix:rtp-mixer 1319 If, after evaluation of the alternatives in this specification, only 1320 one mixing method is selected to be brought to implementation, then 1321 the attribute can be reduced to a single attribute with no list of 1322 values. 1324 Pros: 1326 Provides a clear decision method. 1328 Can be extended with new mixing methods. 1330 Can be used on specific text media. 1332 Can be used also for SDP-controlled WebRTC sessions with multiple 1333 streams in the same data channel. 1335 Cons: 1337 Requires standardization and IANA registration. 1339 Cannot guide SIP routing. 1341 6.4. Simplified SDP media attribute for RTT multi-party capability 1342 indication 1344 An attribute can be specified on media level, to be used in text 1345 media SDP declarations for negotiating RTT multi-party capabilities. 1346 The attribute can have the name "rtt-mix" with no value. 
It would be 1347 selected and used if only one method for multi-party rtt is brought 1348 forward from this specification, and the other suppressed or found to 1349 be possible to negotiate in another way. 1351 An offer-answer exchange should take place and if both parties 1352 specify "rtt-mix" capability, the selected mixing method shall be 1353 used. 1355 When no common method is declared, then only the fallback method for 1356 multi-party unaware endpoints can be used, or the session not 1357 accepted for multi-party use. 1359 Example: a=rtt-mix 1361 Pros: 1363 Provides a clear decision method. 1365 Very simple syntax and semantics. 1367 Can be used on specific text media. 1369 Could possibly be used also for SDP-controlled WebRTC sessions with 1370 multiple streams in the same data channel. 1372 Cons: 1374 Requires standardization and IANA registration. 1376 If another RTT mixing method is also specified in the future, then 1377 that method may also need to specify and register its own attribute, 1378 instead of if an attribute with a parameter value is used, when only 1379 an addition of a new possible value is needed. 1381 Cannot guide SIP routing. 1383 6.5. SDP format parameter for RTT multi-party capability indication 1385 An FMTP format parameter can be specified for the RFC 4103 1386 [RFC4103]media, to be used in text media SDP declarations for 1387 negotiating RTT multi-party capabilities. The parameter can have the 1388 name "rtt-mix", with one or more of its possible values. 1390 The possible values in the list are: 1392 rtp-mixer 1394 perc 1396 rtp-mixer indicates capability for using the RTP-mixer based mixing 1397 and presentation of multi-party text using the CSRC-list. 1399 perc indicates capability for using the perc based transmission of 1400 multi-party text. 
1402 Example: a=fmtp:96 98/98/98 rtt-mix=rtp-mixer

1404 If, after evaluation of the alternatives in this specification, only 1405 one mixing method is selected to be brought to implementation, then 1406 the parameter can be reduced to a single parameter with no list of 1407 values.

1409 An offer-answer exchange should take place and the common method 1410 selected by the answering party shall be used in the session with 1411 that UA.

1413 When no common method is declared, then only the fallback method can 1414 be used, or the session denied.

1416 Pros:

1418 Provides a clear decision method.

1420 Can be extended with new mixing methods.

1422 Can be used on specific text media.

1424 Can be used also for SDP-controlled WebRTC sessions with multiple 1425 streams in the same data channel.

1427 Cons:

1429 Requires standardization and IANA registration.

1431 May cause interop problems with current RFC 4103 [RFC4103] 1432 implementations not expecting a new fmtp-parameter.

1434 Cannot guide SIP routing.

1436 6.6. A text media subtype for support of multi-party RTT

1438 Indicating a specific text media subtype in SDP is a straightforward 1439 way for negotiating multi-party capability. Especially if there are 1440 format differences from the "text/red" and "text/t140" formats of 1441 RFC 4103 [RFC4103], then this is a natural way to do the negotiation 1442 for multi-party RTT.

1444 Pros:

1446 No extra efforts if a new format is needed anyway.

1448 Cons:

1450 None specific to using the format indication for negotiation of 1451 multi-party capability. But only feasible if a new format is needed 1452 anyway.

1454 6.7. Preferred capability declaration method for RTP-based transport.

1456 If the preferred transport method is one with a specific media 1457 subtype in SDP, then specification by media subtype is preferred.
1459 If this is not the case, then the preferred capability 1460 declaration method is the one with a simplified SDP attribute 1461 "a=rtt-mix" (Section 6.4), because it is straightforward and partially 1462 usable also for WebRTC if so needed. 1464 6.8. Identification of the source of text for RTP-based solutions 1466 The main way to identify the source of text in the RTP based solution 1467 is by the SSRC of the sending participant. In the RTP-mixer 1468 solution, this SSRC is included in the CSRC list of the transmitted 1469 packets. Further identification that may be needed for better 1470 labelling of received text may be obtained from a number of sources. 1471 These include the RTCP SDES CNAME and NAME reports and the conference 1472 notification data (RFC 4575) [RFC4575]. 1474 As soon as a new member is added to the RTP session, its 1475 characteristics should be transmitted in RTCP SDES CNAME and NAME 1476 reports according to Section 6.5 of RFC 3550 [RFC3550]. The 1477 information about the participant should also be included in the 1478 conference data including the text media member in a notification 1479 according to RFC 4575 [RFC4575]. 1481 The RTCP SDES report SHOULD contain identification of the source 1482 represented by the SSRC/CSRC identifier. This identification MUST 1483 contain the CNAME field and MAY contain the NAME field and other 1484 defined fields of the SDES report. 1486 A focus UA SHOULD primarily convey SDES information received from the 1487 sources of the session members. When such information is not 1488 available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME 1489 information from available information from the SIP session with the 1490 participant. 1492 7. RTT bridging in WebRTC 1494 Within WebRTC, real-time text is specified to be carried in WebRTC 1495 data channels as specified in 1496 [I-D.ietf-mmusic-t140-usage-data-channel]. A few ways to handle 1497 multi-party RTT are mentioned briefly.
They are repeated below. 1499 7.1. RTT bridging in WebRTC with one data channel per source 1501 A straightforward way to handle multi-party RTT is for the bridge to 1502 open one T.140 data channel per source towards the receiving 1503 participants. 1505 The stream-id forms a unique stream identification. 1507 The identification of the source is made through the Label property 1508 of the channel, and session information belonging to the source. The 1509 endpoint can compose a readable label for the presentation from this 1510 information. 1512 Pros: 1514 This is a straightforward solution. 1516 The load per source is low. 1518 Cons: 1520 With a high number of participants, the overhead of establishing and 1521 maintaining the high number of data channels required may be high, 1522 even if the load per channel is low. 1524 7.2. RTT bridging in WebRTC with one common data channel 1526 A way to handle multi-party RTT in WebRTC is for the bridge to combine 1527 text from all sources into one data channel and identify the sources in 1528 the stream by a T.140 control code for source. 1530 This method is described for RTP transmission in Section 4.1.1.9 1531 above. 1533 The identification of the source is made through insertion in the 1534 beginning of each text transmission from a source of a control code 1535 extension "c" followed by a string representing the source, framed by 1536 the control code start and end flags SOS and ST (see ITU-T T.140 1537 [T140]). 1539 A receiving endpoint is expected to separate text items from the 1540 different sources and identify and display them in a suitable way. 1542 The endpoint does not always display the source identification in the 1543 received text at the place where it is received, but has the 1544 information as a guide for planning the presentation of received 1545 text. A label corresponding to the source identification is 1546 presented when needed depending on the selected presentation style.
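The inline source identification described above can be sketched as follows. This is a minimal illustration, not a definitive parser: it assumes that SOS and ST are the C1 control characters U+0098 and U+009C, that the proposed "c" extension is a single leading character followed directly by the source string, and that the source string itself never contains ST. The helper names source_label and split_by_source are hypothetical.

```python
from typing import List, Optional, Tuple

SOS = "\u0098"  # Start Of String (assumed C1 code point)
ST = "\u009c"   # String Terminator (assumed C1 code point)


def source_label(source: str) -> str:
    """Bridge side: frame a source string as SOS "c" <source> ST."""
    return f"{SOS}c{source}{ST}"


def split_by_source(stream: str) -> List[Tuple[Optional[str], str]]:
    """Receiver side: split a combined stream into (source, text) segments.

    Text seen before any source label is attributed to None.
    """
    segments: List[Tuple[Optional[str], str]] = []
    current: Optional[str] = None
    buf: List[str] = []
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == SOS:
            end = stream.find(ST, i + 1)
            if end == -1:
                break  # incomplete control code: wait for more data
            body = stream[i + 1:end]
            if body.startswith("c"):
                # A new source label closes the text collected so far.
                if buf:
                    segments.append((current, "".join(buf)))
                    buf = []
                current = body[1:]
            # Other SOS..ST control codes are skipped in this sketch.
            i = end + 1
            continue
        buf.append(ch)
        i += 1
    if buf:
        segments.append((current, "".join(buf)))
    return segments
```

The receiver keeps the source string only as a guide for presentation; whether and where a readable label is shown remains a decision of the presentation style, as stated above.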
1548 Pros: 1550 This solution has relatively low overhead on session and network 1551 level. 1553 Cons: 1555 This solution has higher overhead on the media contents level than 1556 the WebRTC solution with one data channel per source (Section 7.1). 1558 Standardisation of the new control code "c" in ITU-T T.140 [T140] is 1559 required. 1561 The conference server needs to be allowed to decrypt/encrypt the data 1562 channel contents. 1564 7.3. Preferred rtt multi-party method for WebRTC 1566 For WebRTC, one method is preferred because of its simplicity. So, 1567 for WebRTC, the method to implement for multi-party RTT with multi- 1568 party aware parties when no other method is explicitly agreed between 1569 implementing parties is: "RTT bridging in WebRTC with one data 1570 channel per source" (Section 7.1). 1572 8. Presentation of multi-party text 1574 All session participants with RTP based transport MUST observe the 1575 SSRC/CSRC field of incoming text RTP packets, and make note of which 1576 source they came from in order to be able to present text in a way 1577 that makes it easy to read text from each participant in a session, 1578 and get information about the source of the text. 1580 In the WebRTC case, the Label parameter and other provided endpoint 1581 information should be used for the same purpose. 1583 8.1. Associating identities with text streams 1585 A source identity SHOULD be composed from available information 1586 sources and displayed together with the text as indicated in ITU-T 1587 T.140 Appendix [T140]. 1589 The source identity should primarily be the NAME field from incoming 1590 SDES packets. If this information is not available, and the session 1591 is a two-party session, then the T.140 source identity SHOULD be 1592 composed from the SIP session participant information. For multi- 1593 party sessions the source identity may be composed from local 1594 information if sufficient information is not available in the 1595 session.
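The order of preference for composing a source identity, as described above, can be sketched like this. The function name and all parameter names are hypothetical; the sketch assumes the caller has already extracted the SDES NAME value and the SIP participant display name, and that a locally generated label (for example "Participant 2") is always available as a last resort.

```python
from typing import Optional


def compose_source_identity(
    sdes_name: Optional[str],
    sip_display_name: Optional[str],
    two_party: bool,
    local_label: str,
) -> str:
    """Pick the label to present for a text source.

    1. The NAME field from incoming SDES packets, when present.
    2. In a two-party session, the SIP session participant information.
    3. Otherwise a label composed from local information.
    """
    if sdes_name and sdes_name.strip():
        return sdes_name.strip()
    if two_party and sip_display_name and sip_display_name.strip():
        return sip_display_name.strip()
    return local_label
```

In a multi-party session an implementation may have further session information to draw on, such as conference notification data, before resorting to the local label; that step is left out here.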
1597 Applications may abbreviate the presented source identity to a 1598 suitable form for the available display. 1600 Applications may also replace received source information with 1601 internally used nicknames. 1603 8.2. Presentation details for multi-party aware endpoints. 1605 The multi-party aware endpoint should, after any action for recovery 1606 of data from lost packets, separate the incoming streams and present 1607 them according to the style that the receiving application supports 1608 and the user has selected. The decisions taken for presentation of 1609 the multi-party interchange shall be purely on the receiving side. 1610 The sending application must not insert any item in the stream to 1611 influence presentation that is not requested by the sending 1612 participant. 1614 8.2.1. Bubble style presentation 1616 One often used style is to present real-time text in chunks in 1617 readable bubbles identified by labels containing names of sources. 1618 Bubbles are placed in one column in the presentation area and are 1619 closed and moved upwards in the presentation area after certain items 1620 or events, when there is also newer text from another source that 1621 would go into a new bubble. The items that allow bubble 1622 closing are any character closing a phrase or sentence followed by a 1623 space, or a timeout of a suitable length (about 10 seconds). 1625 Real-time active text sent from the local user should be presented in 1626 a separate area. When there is a reason to close a bubble from the 1627 local user, the bubble should be placed above all real-time active 1628 bubbles, so that the time order in which real-time text entries were 1629 completed is visible. 1631 Scrolling is usually provided for viewing of recent or older text. 1632 When scrolling is done to an earlier point in the text, the 1633 presentation shall not move the scroll position because of newly received text.
1634 It must be the decision of the local user to return to automatic 1635 viewing of latest text actions. An indication that there is new text 1636 to read may be useful after scrolling to an earlier position 1637 has been activated. 1639 The presentation area may become too small to present all text in all 1640 real-time active bubbles. Various techniques can be applied to 1641 provide a good overview and good reading opportunity even in such 1642 situations. The active real-time bubble may have a limited number of 1643 lines and if their contents need more lines, then a scrolling 1644 opportunity within the real-time active bubble is provided. Another 1645 method can be to only show the label and the last line of the active 1646 real-time bubble contents, and make it possible to expand or compress 1647 the bubble presentation between full view and one line view. 1649 Erasures require special consideration. Erasure within a real-time 1650 active bubble is straightforward. But if erasure from one 1651 participant affects the last character before a bubble, the whole 1652 previous bubble becomes the actual bubble for real-time action by 1653 that participant and is placed below all other bubbles in the 1654 presentation area. If the border between bubbles was caused by the 1655 CRLF characters (instead of the normal "Line Separator"), only one 1656 erasure action is required to erase this bubble border. When a 1657 bubble is closed, it is moved up, above all real-time active bubbles. 1659 A three-party view is shown in this example. 1661 _________________________________________________ 1662 | |^| 1663 | |-| 1664 |[Alice] Hi, Alice here. | | 1665 | | | 1666 |[Bob] Bob as well. | | 1667 | | | 1668 |[Eve] Hi, this is Eve, calling from Paris. | | 1669 | I thought you should be here. | | 1670 | | | 1671 |[Alice] I am coming on Thursday, my | | 1672 | performance is not until Friday morning.| | 1673 | | | 1674 |[Bob] And I on Wednesday evening.
| | 1675 | | | 1676 |[Alice] Can we meet on Thursday evening? | | 1677 | | | 1678 |[Eve] Yes, definitely. How about 7pm. | | 1679 | at the entrance of the restaurant | | 1680 | Le Lion Blanc? | | 1681 |[Eve] we can have dinner and then take a walk | | 1682 | | | 1683 | But I need to be back to | | 1684 | the hotel by 11 because I need | | 1685 | | | 1686 | I wou |-| 1687 |______________________________________________|v| 1688 | of course, I underst | 1689 |________________________________________________| 1691 Figure 1: Three-party call with bubble style. 1693 Figure 1: Example of a three-party call presented in the bubble 1694 style. 1696 8.2.2. Other presentation styles 1698 Other presentation styles than the bubble style may be arranged and 1699 appreciated by the users. In a video conference one way may be to 1700 have a real-time text area below the video view of each participant. 1701 Another view may be to provide one column in a presentation area for 1702 each participant and place the text entries in a relative vertical 1703 position corresponding to when text entry in them was completed. The 1704 labels can then be placed in the column header. The considerations 1705 for ending and moving and erasure of entered text discussed above for 1706 the bubble style are valid also for these styles. 1708 This figure shows how a coordinated column view MAY be presented. 1710 _____________________________________________________________________ 1711 | Bob | Eve | Alice | 1712 |____________________|______________________|_______________________| 1713 | | |I will arrive by TGV. | 1714 |My flight is to Orly| |Convenient to the main | 1715 | |Hi all, can we plan |station. | 1716 | |for the seminar? | | 1717 |Eve, will you do | | | 1718 |your presentation on| | | 1719 |Friday? |Yes, Friday at 10. 
| | 1720 |Fine, wo | |We need to meet befo | 1721 |___________________________________________________________________| 1723 Figure 2: A coordinated column-view of a three-party session with 1724 entries ordered in approximate time-order. 1726 9. Presentation details for multi-party unaware endpoints. 1728 Multi-party unaware endpoints are prepared only for presentation of 1729 two sources of text, the local user and a remote user. If mixing for 1730 multi-party unaware endpoints is to be supported, in order to enable 1731 some multi-party communication with such endpoints, the mixer needs to 1732 plan the presentation and insert labels and line breaks before 1733 labels. Many limitations appear for this presentation mode, and it 1734 must be seen as a fallback and a last resort. 1736 A procedure for presenting RTT to a conference-unaware endpoint is 1737 included in [I-D.ietf-avtcore-multi-party-rtt-mix]. 1739 10. Security Considerations 1741 The security considerations valid for RFC 4103 [RFC4103] and RFC 3550 1742 [RFC3550] are valid also for the multi-party sessions with text. 1744 11. IANA Considerations 1746 The items for indication and negotiation of capability for multi- 1747 party rtt should be registered with IANA in the specifications where 1748 they are specified in detail. 1750 12. Congestion considerations 1752 The congestion considerations described in RFC 4103 [RFC4103] are 1753 valid also for the recommended RTP-based multi-party use of the real- 1754 time text transport. A risk of congestion may appear if a number of 1755 conference participants are actively transmitting text simultaneously, 1756 because the recommended RTP-based multi-party transmission method 1757 does not allow multiple sources of text to contribute to the same 1758 packet. 1760 In situations where there is a risk of congestion, the Focus UA MAY combine 1761 packets from the same source to increase the transmission interval 1762 per source up to one second.
Local conference policy in the Focus UA 1763 may be used to decide which streams shall be selected for such 1764 transmission frequency reduction. 1766 13. Acknowledgements 1768 Arnoud van Wijk for contributions to an earlier, expired draft of 1769 this memo. 1771 14. Change history 1773 14.1. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-02 1775 Added detail in the section on RTP translator model alternative 1776 4.1.2.1. 1778 14.2. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-01 1780 Added three more methods for RTP-mixer mixing. Two RFC 5109 FEC 1781 based and another with modified data header to detect source of 1782 completely lost text. 1784 Separated RTP-based and WebRTC based solutions. 1786 Deleted the multi-party-unaware mixing procedure appendix. It is now 1787 included in the draft draft-ietf-avtcore-multi-party-rtt-mix. Kept a 1788 section with a reference to the new place. 1790 14.3. Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to draft- 1791 hellstrom-avtcore-multi-party-rtt-solutions-00 1793 Added discussion about switching performance, as discussed in avtcore 1794 on March 13. 1796 Added that a decrease of transmission interval to 100 ms increases 1797 switching performance by a factor of 3, but that this is still not sufficient. 1799 Added that the CSRC-list method also uses 100 milliseconds 1800 transmission interval. 1802 Added the method with multiple primary text in each packet. 1804 Added the timestamp-based method for rtp-mixing proposed by James 1805 Hamlin on March 14. 1807 Corrected the chat style presentation example picture. Deleted a few 1808 "[mix]". 1810 14.4. Changes from version draft-hellstrom-mmusic-multi-party-rtt-01 to 1811 -02 1813 Changed from a general overview to overview with clear 1814 recommendations. 1816 Split text coordination methods in three groups. 1818 Recommends rtt-mixer with sources in CSRC-list but refers to its 1819 spec for details.
1821 Shortened Appendix with conference-unaware example. 1823 Cleaned up preferences. 1825 Inserted pictures of screen-views. 1827 15. References 1829 15.1. Normative References 1831 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1832 Requirement Levels", BCP 14, RFC 2119, 1833 DOI 10.17487/RFC2119, March 1997, 1834 . 1836 15.2. Informative References 1838 [EN301549] ETSI, "EN 301 549. Accessibility requirements for ICT 1839 products and services", November 2019, 1840 . 1844 [I-D.ietf-avtcore-multi-party-rtt-mix] 1845 Hellstrom, G., "RTP-mixer formatting of multi-party Real- 1846 time text", Work in Progress, Internet-Draft, draft-ietf- 1847 avtcore-multi-party-rtt-mix-06, 11 June 2020, 1848 . 1851 [I-D.ietf-avtcore-multiplex-guidelines] 1852 Westerlund, M., Burman, B., Perkins, C., Alvestrand, H., 1853 and R. Even, "Guidelines for using the Multiplexing 1854 Features of RTP to Support Multiple Media Streams", Work 1855 in Progress, Internet-Draft, draft-ietf-avtcore-multiplex- 1856 guidelines-12, 16 June 2020, . 1859 [I-D.ietf-mmusic-t140-usage-data-channel] 1860 Holmberg, C. and G. Hellstrom, "T.140 Real-time Text 1861 Conversation over WebRTC Data Channels", Work in Progress, 1862 Internet-Draft, draft-ietf-mmusic-t140-usage-data-channel- 1863 14, 10 April 2020, . 1866 [I-D.ietf-perc-private-media-framework] 1867 Jones, P., Benham, D., and C. Groves, "A Solution 1868 Framework for Private Media in Privacy Enhanced RTP 1869 Conferencing (PERC)", Work in Progress, Internet-Draft, 1870 draft-ietf-perc-private-media-framework-12, 5 June 2019, 1871 . 1874 [NENAi3] NENA, "NENA-STA-010.2-2016. Detailed Functional and 1875 Interface Standards for the NENA i3 Solution", October 1876 2016, . 1878 [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., 1879 Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse- 1880 Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, 1881 DOI 10.17487/RFC2198, September 1997, 1882 . 
1884 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1885 A., Peterson, J., Sparks, R., Handley, M., and E. 1886 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1887 DOI 10.17487/RFC3261, June 2002, 1888 . 1890 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1891 with Session Description Protocol (SDP)", RFC 3264, 1892 DOI 10.17487/RFC3264, June 2002, 1893 . 1895 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1896 Jacobson, "RTP: A Transport Protocol for Real-Time 1897 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 1898 July 2003, . 1900 [RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, 1901 "Indicating User Agent Capabilities in the Session 1902 Initiation Protocol (SIP)", RFC 3840, 1903 DOI 10.17487/RFC3840, August 2004, 1904 . 1906 [RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller 1907 Preferences for the Session Initiation Protocol (SIP)", 1908 RFC 3841, DOI 10.17487/RFC3841, August 2004, 1909 . 1911 [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text 1912 Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, 1913 . 1915 [RFC4353] Rosenberg, J., "A Framework for Conferencing with the 1916 Session Initiation Protocol (SIP)", RFC 4353, 1917 DOI 10.17487/RFC4353, February 2006, 1918 . 1920 [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A 1921 Session Initiation Protocol (SIP) Event Package for 1922 Conference State", RFC 4575, DOI 10.17487/RFC4575, August 1923 2006, . 1925 [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol 1926 (SIP) Call Control - Conferencing for User Agents", 1927 BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006, 1928 . 1930 [RFC4597] Even, R. and N. Ismail, "Conferencing Scenarios", 1931 RFC 4597, DOI 10.17487/RFC4597, August 2006, 1932 . 1934 [RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error 1935 Correction", RFC 5109, DOI 10.17487/RFC5109, December 1936 2007, . 
1938 [RFC5194] van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real- 1939 Time Text over IP Using the Session Initiation Protocol 1940 (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008, 1941 . 1943 [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific 1944 Media Attributes in the Session Description Protocol 1945 (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009, 1946 . 1948 [RFC6443] Rosen, B., Schulzrinne, H., Polk, J., and A. Newton, 1949 "Framework for Emergency Calling Using Internet 1950 Multimedia", RFC 6443, DOI 10.17487/RFC6443, December 1951 2011, . 1953 [RFC6881] Rosen, B. and J. Polk, "Best Current Practice for 1954 Communications Services in Support of Emergency Calling", 1955 BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013, 1956 . 1958 [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, 1959 DOI 10.17487/RFC7667, November 2015, 1960 . 1962 [T140] ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for 1963 multimedia application text conversation", February 1998, 1964 . 1966 [T140ad1] ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000), 1967 Protocol for multimedia application text conversation", 1968 February 2000, 1969 . 1971 [TS103479] ETSI, "TS 103 479. Emergency communications (EMTEL); Core 1972 elements for network independent access to emergency 1973 services", December 2019, . 1977 [TS22173] 3GPP, "IP Multimedia Core Network Subsystem (IMS) 1978 Multimedia Telephony Service and supplementary services; 1979 Stage 1", 3GPP TS 22.173 17.1.0, 20 December 2019, 1980 . 1982 [TS24147] 3GPP, "Conferencing using the IP Multimedia (IM) Core 1983 Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0, 1984 19 December 2019, 1985 . 1987 Author's Address 1989 Gunnar Hellstrom 1990 Gunnar Hellstrom Accessible Communication 1991 Esplanaden 30 1992 SE-136 70 Vendelso 1993 Sweden 1995 Phone: +46 708 204 288 1996 Email: gunnar.hellstrom@ghaccess.se