Internet Engineering Task Force G.
Hellstrom
Internet-Draft                 Gunnar Hellstrom Accessible Communication
Intended status: Informational                          19 November 2020
Expires: 23 May 2021

            Real-time text solutions for multi-party sessions
          draft-hellstrom-avtcore-multi-party-rtt-solutions-05

Abstract

   This document specifies methods for Real-Time Text (RTT) media
   handling in multi-party calls.  The main transport discussed carries
   real-time text over RTP in a time-sampled mode according to RFC
   4103.  The mechanisms enable the receiving application to present
   the received real-time text media, separated per source, in
   different ways according to user preferences.  Some
   presentation-related features are also described, explaining
   suitable variations of transmission and presentation of text.

   Call control features are described for the SIP environment.  A
   number of alternative methods for providing the multi-party
   negotiation, transmission and presentation are discussed, and a
   recommendation for the main ones is provided.  The main solution for
   SIP-based centralized multi-party handling of real-time text is
   achieved through a media control unit coordinating multiple RTP text
   streams into one RTP stream.

   Alternative methods using a single RTP stream and source
   identification inline in the text stream are also described, one of
   them being provided as a lower-functionality fallback method for
   endpoints with no multi-party awareness for RTT.

   Bridging methods where the text stream is carried without the
   contents being dealt with in detail by the bridge are also
   discussed.

   Brief information is also provided for multi-party RTT in the WebRTC
   environment.

   The intention is to provide background for decisions, specification
   and implementation of selected methods.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 May 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Centralized conference model
   3.  Requirements on multi-party RTT
     3.1.  General requirements
     3.2.  Performance requirements
   4.  RTP based solutions
     4.1.  Coordination of text RTP streams
       4.1.1.  RTP-based solutions with a central mixer
         4.1.1.1.  RTP Mixer using default RFC 4103 methods
         4.1.1.2.
RTP Mixer using the default method but decreased transmission interval
         4.1.1.3.  RTP Mixer with frequent transmission and indicating
                   sources in CSRC-list
         4.1.1.4.  RTP Mixer interleaving packets, receiver using
                   timestamp to recover from loss
         4.1.1.5.  RTP Mixer with multiple primary data in each packet
                   and individual sequence numbers
         4.1.1.6.  RTP Mixer with multiple primary data in each packet
         4.1.1.7.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                   in the packets
         4.1.1.8.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                   and separate sequence number in the packets
         4.1.1.9.  RTP Mixer indicating participants by a control code
                   in the stream
         4.1.1.10. Mixing for multi-party unaware user agents
       4.1.2.  RTP-based bridging with minor RTT media contents
               reformatting by the bridge
         4.1.2.1.  RTP Translator sending one RTT stream per
                   participant
         4.1.2.2.  Distributing packets in an end-to-end encryption
                   structure
         4.1.2.3.  Mesh of RTP endpoints
         4.1.2.4.  Multiple RTP sessions, one for each participant
   5.  Preferred RTP-based multi-party RTT transport method
   6.  Session control of RTP-based multi-party RTT sessions
     6.1.  Implicit RTT multi-party capability indication
     6.2.  RTT multi-party capability declared by SIP media-tags
     6.3.  SDP media attribute for RTT multi-party capability
           indication
     6.4.
Simplified SDP media attribute for RTT multi-party capability indication
     6.5.  SDP format parameter for RTT multi-party capability
           indication
     6.6.  A text media subtype for support of multi-party rtt
     6.7.  Preferred capability declaration method for RTP-based
           transport.
     6.8.  Identification of the source of text for RTP-based
           solutions
   7.  RTT bridging in WebRTC
     7.1.  RTT bridging in WebRTC with one data channel per source
     7.2.  RTT bridging in WebRTC with one common data channel
     7.3.  Preferred rtt multi-party method for WebRTC
   8.  Presentation of multi-party text
     8.1.  Associating identities with text streams
     8.2.  Presentation details for multi-party aware endpoints.
       8.2.1.  Bubble style presentation
       8.2.2.  Other presentation styles
   9.  Presentation details for multi-party unaware endpoints.
   10. Security Considerations
   11. IANA Considerations
   12. Congestion considerations
   13. Acknowledgements
   14. Change history
     14.1.  Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-05
     14.2.  Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-04
     14.3.  Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-03
     14.4.
Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-02
     14.5.  Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-01
     14.6.  Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-00
     14.7.  Changes from version
            draft-hellstrom-mmusic-multi-party-rtt-01 to -02
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Author's Address

1.  Introduction

   Real-time text (RTT) is a medium in real-time conversational
   sessions.  Text entered by participants in a session is transmitted
   in a time-sampled fashion, so that no specific user action is needed
   to cause transmission.  This gives a direct flow of text at the rate
   it is created, which is suitable in a real-time conversational
   setting.  The real-time text medium can be combined with other media
   in multimedia sessions.

   Media from a number of multimedia session participants can be
   combined in a multi-party session.  The present document specifies
   how the real-time text streams can be handled in multi-party
   sessions.  Recommendations are provided for preferred methods.

   The description is mainly focused on the transport level, but also
   describes a few session and presentation level aspects.

   Transport of real-time text is specified in RFC 4103 [RFC4103], "RTP
   Payload for Text Conversation".  It makes use of the Real-time
   Transport Protocol (RTP), RFC 3550 [RFC3550], for transport.
   Robustness against network transmission problems is normally
   achieved through redundant transmission based on the principle from
   RFC 2198 [RFC2198], with one primary and two redundant transmissions
   of each text element.
   Primary and redundant transmissions are combined in packets and
   described by a redundancy header.  This transport is usually used in
   the Session Initiation Protocol (SIP), RFC 3261 [RFC3261],
   environment.

   A very brief overview of functions for real-time text handling in
   multi-party sessions is given in RFC 4597 [RFC4597], "Conferencing
   Scenarios", sections 4.8 and 4.10.  The present specification builds
   on that description and indicates which protocol mechanisms should
   be used to implement multi-party handling of real-time text.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to RFC-to-be 8865
   [I-D.ietf-mmusic-t140-usage-data-channel].  Multi-party aspects for
   WebRTC solutions are briefly covered.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Centralized conference model

   In the centralized conference model for SIP, introduced in RFC 4353
   [RFC4353], "A Framework for Conferencing with the Session Initiation
   Protocol (SIP)", one function coordinates the communication with
   participants in the multi-party session.  This function also
   controls media mixer functions for the media appearing in the
   session.  The central function is common for control of all media,
   while the media mixers may work differently for each medium.

   The central function is called the Focus UA.  Many variants exist
   for setting up sessions including the multipoint control centre.
   Describing these is outside the scope of this document, which
   instead covers the media-specific handling in the mixer required to
   handle multi-party calls with RTT.

   The main principle for handling real-time text media in a
   centralized conference is that one RTP session for real-time text is
   established including the multipoint media control centre and the
   participating endpoints which are going to have real-time text
   exchange with the others.

   The different possible mechanisms for mixing and transporting RTT
   differ in the way they multiplex the text streams and how they
   identify the sources of the streams.  RFC 7667 [RFC7667] describes a
   number of possible use cases for RTP.  This specification refers to
   different sections of RFC 7667 for further reading on the situations
   caused by the different possible design choices.

   The recommended method for using RTP-based RTT in a centralized
   conference model is specified in
   [I-D.ietf-avtcore-multi-party-rtt-mix] based on the recommendations
   in this document.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel].  Ways to handle
   multi-party calls in that environment are also specified.

3.  Requirements on multi-party RTT

3.1.  General requirements

   The following general requirements are placed on multi-party RTT:

      A solution shall be applicable to IMS (3GPP TS 22.173) [TS22173],
      SIP-based VoIP and Next Generation Emergency Services (NENA i3
      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]).

      The transmission interval for text should not be longer than 500
      milliseconds when there is anything available to send.  Ref ITU-T
      T.140 [T140].

      If text loss is detected or suspected, a missing text marker
      should be inserted in the text stream.  Ref ITU-T T.140 Amendment
      1 [T140ad1].
      ETSI EN 301 549 [EN301549]

      The display of text from the members of the conversation shall be
      arranged so that the text from each participant is clearly
      readable, and its source and the relative timing of entered text
      are visualized in the display.  Mechanisms for looking back in
      the contents from the current session should be provided.  The
      text should be displayed as soon as it is received.  Ref ITU-T
      T.140 [T140]

      Bridges must be multimedia capable (voice, video, text).  Ref
      NENA i3 STA-010.2 [NENAi3]

      It MUST be possible to use real-time text in conferences both as
      a medium of discussion between individual participants (for
      example, for sidebar discussions in real-time text while
      listening to the main conference audio) and for central support
      of the conference with real-time text interpretation of speech.
      Ref (R7) in RFC 5194 [RFC5194]

      It should be possible to protect RTT contents with usual means
      for privacy and integrity.  Ref RFC 6881 section 16 [RFC6881]

      Conferencing procedures are documented in RFC 4579 [RFC4579].
      Ref NENA i3 STA-010.2 [NENAi3]

      Conferencing applies to any kind of media stream by which users
      may want to communicate.  Ref 3GPP TS 24.147 [TS24147]

      The framework for SIP conferences is specified in RFC 4353
      [RFC4353].  Ref 3GPP TS 24.147 [TS24147]

3.2.  Performance requirements

   The mixer performance requirements can be expressed in one number,
   extracted from the user requirements on real-time text expressed in
   ITU-T F.700, where it is stated that for "good" usability, text
   characters should not be delayed more than 1 second from creation to
   presentation.  For "usable" usability the figure is 2 seconds.  The
   main factor behind these limits is the point at which turn-taking in
   a conversation is disturbed by the delay before a response becomes
   visible to the receiving party.
   If that time gets too long, the receiving party becomes unsure
   whether the previous utterance was well perceived and may prepare to
   repeat it.  This is similar to the same effect in voice
   communication, where the usability limit is 400 ms delay.

   Another important factor in a multi-party conference is the
   opportunity for a participant using real-time text to provide timely
   comments and get a chance to enter the discussion if the majority of
   participants use voice in the conference.  A complicating factor
   when stating the requirements is that some transport methods do not
   cause a total delay, but instead an increasing jerkiness when the
   number of simultaneously sending participants is increased.

   It should however be remembered that the expected number of
   participants sending real-time text simultaneously is low.  Just as
   with voice or sign language, the capability of the participants to
   perceive utterances from more than one participant at a time is very
   limited.  Therefore the normal case in multi-party situations is
   that one participant at a time is the main provider of text.  Others
   usually just provide very brief comments such as "yes" or "no" or
   "may I comment?".  Only in very rare situations do two participants
   provide more information simultaneously.

   *  The number of expected simultaneously transmitting users is
      different for different applications.  In all cases, just one
      transmitting user is the normal case.  Two simultaneously
      transmitting participants can occasionally be expected in
      emergency services, relay services, small unmanaged conferences,
      group calls, and large managed conferences.  Three simultaneously
      transmitting participants may appear occasionally in large
      unmanaged conferences.  The following can therefore express the
      performance requirement.

   *  The mean delay of text passing the mixer introduced when only one
      participant is sending text should be kept to a minimum and
      should not be more than 400 ms.

   *  The mean delay of text passing the mixer should not be more than
      1 second during moments when up to three users are sending text
      simultaneously.

   *  For the very rare case that more than three participants send
      text simultaneously, the mixer may take action to limit the
      introduced delay of the text passing the mixer to 7 seconds,
      e.g., by discarding text from some participants and instead
      inserting a general warning about possible text loss in the
      stream.

4.  RTP based solutions

4.1.  Coordination of text RTP streams

   Coordinating and sending text RTP streams in the multi-party session
   can be done in a number of ways.  The most suitable methods are
   specified here with pros and cons.

   A receiving and presenting endpoint MUST separate text from the
   different sources and identify and display them accordingly.

4.1.1.  RTP-based solutions with a central mixer

   A set of solutions can be based on the central RTP mixer.  They are
   described here and a preferred method is selected.

4.1.1.1.  RTP Mixer using default RFC 4103 methods

   Without any extra specifications, a mixer would transmit at 300
   millisecond intervals, and use RFC 4103 [RFC4103] with the default
   redundancy of one original and two redundant transmissions.  The
   source of the text would be indicated by a single member in the CSRC
   list.  Text from different sources cannot be transmitted in the same
   packet.  Therefore, from the time when the mixer has sent one piece
   of new text from one source, it will need to transmit that text
   again twice as redundant data before it can send text from another
   source.  The jerkiness (the time between transmissions of new text)
   is therefore 3 x 300 ms = 900 ms.  This is clearly insufficient.
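
   As a rough illustration of the 900 ms figure, the following Python
   sketch computes how long a mixer that carries only one source per
   packet is occupied by one piece of text before it can switch to
   another source.  The function name and parameters are hypothetical
   and not taken from RFC 4103; the model simply assumes that each
   text element occupies one primary transmission plus every redundant
   generation.

```python
def source_switch_time_ms(interval_ms: int, redundant_generations: int = 2) -> int:
    """Time one text element occupies a one-source-per-packet mixer.

    Assumption for illustration: the element is sent once as primary and
    once per redundant generation, one packet per transmission interval.
    """
    packets_per_element = 1 + redundant_generations
    return interval_ms * packets_per_element


if __name__ == "__main__":
    # Default RFC 4103 interval of 300 ms with two redundant generations.
    print(source_switch_time_ms(300))
    # The same model with a 100 ms transmission interval.
    print(source_switch_time_ms(100))
```

   With the default values the sketch gives 900 ms; with a 100 ms
   interval the same model gives 300 ms.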

   Pros:

   Only a capability negotiation method is needed.  No other update of
   standards is needed, just a general remark that traditional RTP
   mixing is used.

   Cons:

   Clearly insufficient mixer switching performance.

   Somewhat complex handling of transmission when there is new text
   available from more than one source.  The mixer needs to send two
   more packets with redundant text from the current source before
   starting to send anything from the other source.

4.1.1.2.  RTP Mixer using the default method but decreased transmission
          interval

   This method makes use of the default RTP-mixing method briefly
   described in Section 4.1.1.1.  The only difference is that the
   transmission interval is decreased to 100 milliseconds when there is
   text from more than one source available for transmission.  The
   jerkiness is 300 ms.  The mean delay with two simultaneously sending
   participants is 250 ms, and with three simultaneously sending
   participants 500 ms.  This is acceptable performance.

   Pros:

   Minor influence on standards.

   Can be introduced relatively rapidly in the intended technical
   environments.

   Can be declared in SDP as the already existing "text/red" format
   with a multi-party attribute for capability negotiation.

   Cons:

   The introduced jerkiness of new text from more than the required
   three simultaneously sending sources is high.

   Slightly higher risk for loss of text at bursty packet loss than for
   the recommended transmission interval (300 ms) for RFC 4103.

   When complete loss of packets occurs (beyond recovery), it is not
   possible to deduce from which source text was lost.

   Somewhat complex handling of transmission when there is new text
   available from more than one source.  The mixer needs to send two
   more packets with redundant text from the current source before
   starting to send anything from the other source.

4.1.1.3.
RTP Mixer with frequent transmission and indicating sources in CSRC-list

   An RTP media mixer combines text from participants into one RTP
   stream, thus all using the same destination address/port
   combination, the same RTP SSRC, and one sequence number series as
   described in Sections 7.1 and 7.3 of RTP RFC 3550 [RFC3550] about
   the Mixer function.  This method is also briefly described in RFC
   7667, section 3.6.1 Media mixing mixer [RFC7667].

   The sources of the text in each RTP packet are identified by the
   CSRC list in the RTP packets, containing the SSRC of the initial
   sources of text.  The CSRC entries are ordered with the SSRC of the
   source of the primary text first, followed by the SSRC of the first
   level redundancy, and then the second level redundancy.

   The transmission interval should be 100 milliseconds when there is
   text to transmit from more than one source, and otherwise 300 ms.

   The identification of the sources is made through the CSRC fields
   and can be made more readable at the receiver through the RTCP SDES
   CNAME and NAME items as described in RTP [RFC3550].

   Information provided through the notification according to RFC 4575
   [RFC4575] when the participant joined the conference also provides
   suitable information and a reference to the SSRC.

   A receiving endpoint is supposed to separate text items from the
   different sources and identify and display them accordingly.

   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
   possible to recover from loss of one or two packets in sequence and
   assign the recovered text to the right source.  For more loss, a
   marker for possible loss should be inserted or presented.
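
   The recovery rule above can be sketched as follows.  This is a
   minimal Python illustration and not the procedure of any
   specification: the MixedPacket type and the texts_to_present
   function are hypothetical, the text blocks are assumed to be listed
   in the same order as the CSRC list (primary first), sequence number
   wrap-around is ignored, and U+FFFD is assumed as the missing text
   marker.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MISSING_TEXT_MARKER = "\ufffd"  # assumed marker character for possible loss


@dataclass
class MixedPacket:
    seq: int          # RTP sequence number (wrap-around ignored here)
    csrc: List[int]   # [primary source, 1st-level source, 2nd-level source]
    text: List[str]   # text blocks in the same order as csrc


def texts_to_present(packet: MixedPacket,
                     last_seq: int) -> List[Tuple[Optional[int], str]]:
    """Return (source, text) pairs to present, oldest text first."""
    lost = packet.seq - last_seq - 1
    if lost < 0:
        return []  # duplicate or reordered packet: nothing new in this sketch
    generations = len(packet.text) - 1
    out: List[Tuple[Optional[int], str]] = []
    if lost > generations:
        # More packets lost than redundancy covers: source is unknown.
        out.append((None, MISSING_TEXT_MARKER))
    # Recover the redundant generations that cover the lost packets.
    for level in range(min(lost, generations), 0, -1):
        if packet.text[level]:
            out.append((packet.csrc[level], packet.text[level]))
    out.append((packet.csrc[0], packet.text[0]))
    return out


if __name__ == "__main__":
    pkt = MixedPacket(seq=12, csrc=[0xA1, 0xB2, 0xA1], text=["c", "b", "a"])
    print(texts_to_present(pkt, last_seq=10))  # one packet was lost
```

   In the usage example one packet is missing, so the first-level
   redundant text is recovered and attributed to the second CSRC entry
   before the primary text is presented.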

   The conference server needs to have authority to decrypt the payload
   in the received RTP packets in order to be able to recover text from
   redundant data or insert the missing text marker in the stream, and
   repack the text in new packets.

   Even if the format is very similar to "text/red" of RFC 4103, it
   needs to be declared as a new media subtype, e.g. "text/rex".

   Pros:

   This method has low overhead and less complexity than the methods in
   Section 4.1.1.1, Section 4.1.1.2, Section 4.1.1.4 and
   Section 4.1.1.6.

   When loss of packets occurs, it is possible to recover text from
   redundancy at loss of up to the number of redundancy levels carried
   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
   levels).

   This method can be implemented with most RTP implementations.

   The source switching performance is sufficient for well-behaving
   conference participants.  The jerkiness is 100 ms.

   Cons:

   When more consecutive packets are lost than the number of
   generations of redundant data, it is not possible to deduce the
   sources of the totally lost data.

   Slightly higher risk for loss of text at bursty packet loss than for
   the recommended transmission interval for RFC 4103.

   Requires a new media subtype, e.g. "text/rex".  Standardising it and
   getting it into releases of the target technical environments takes
   a long time.

   The conference server needs to be allowed to decrypt/encrypt the
   packet payload.  This is however normal for media mixers for other
   media.

4.1.1.4.  RTP Mixer interleaving packets, receiver using timestamp to
          recover from loss

   This method has text only from one source per packet, as the
   original RFC 4103 [RFC4103] specifies.  Packets with text from
   different sources are instead allowed to be interleaved.
   The recovery procedure in the receiver makes use of the RTP
   timestamp and the timestamp offsets in the redundancy headers to
   evaluate whether a piece of redundant data was received earlier or
   not, as a basis for deciding whether the redundant data should be
   recovered in case of packet loss.

   In this method, the transmission interval is 100 milliseconds when
   text (new or redundant) from more than one source is available for
   transmission.  Otherwise it is 300 ms or follows the timing of
   received packets.

   Pros:

   The format of each packet is equal to what is specified in RFC 4103
   [RFC4103].

   The source switching performance is sufficient and good.  Text from
   five participants can be transmitted simultaneously with 500
   milliseconds interval per source.

   New text from five simultaneous sources can be transmitted within
   500 milliseconds.  This is sufficient.

   Recovery from packet loss with five simultaneous sources takes 1
   second.  This is good and implies good protection against bursty
   packet loss resulting in text loss.

   Cons:

   The recovery time in case of packet loss is long with more than ten
   simultaneously sending participants.  Then it will be more than 2
   seconds.

   The recovery procedure is different from what is described in RFC
   4103 [RFC4103].

   In many cases of loss of multiple packets, it will not be possible
   to deduce whether there was any resulting loss of text.  A mark for
   possible loss should be inserted in cases when there might have been
   resulting loss.

4.1.1.5.  RTP Mixer with multiple primary data in each packet and
          individual sequence numbers

   This method allows primary as well as redundant text from more than
   one source per packet.  The packet payload contains an ordered set
   of redundant and primary data with the same number of generations of
   redundancy as once agreed in the SDP negotiation.
   The data header reflects these parts of the payload.  The CSRC list
   contains one CSRC member per source in the payload and in the same
   order.  An individual sequence number per source is included in the
   data header, replacing the t140 payload type number, which is
   instead assumed to be constant in this format.  This allows an
   individual extra sequence number per source with maximum value 127,
   suitable for checking for which source loss of text appeared when
   recovery was not possible.

   The data header would contain the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F| Source-seq  |  timestamp offset         |   block length    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Where "Source-seq" is the sequence number per source.

   The maximum number of members in the CSRC-list is 15, and that is
   therefore the maximum number of sources that can be represented in
   each packet provided that all data can be fitted into the size
   allowable in one packet.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".

   This is an SDP offer example for both traditional "text/red" and
   multi-party "text/rex" format:

      m=text 11000 RTP/AVP 101 100 98
      a=rtpmap:98 t140/1000
      a=rtpmap:100 red/1000
      a=rtpmap:101 rex/1000
      a=fmtp:100 98/98/98
      a=fmtp:101 98/98/98

   Pros:

   The source switching performance is good.  Text from 15 participants
   can be transmitted simultaneously.

   New text from 15 simultaneous sources can be transmitted within 300
   milliseconds.  This is good performance.

   When more consecutive packets are lost than the number of
   generations of redundant data, it is still possible to deduce the
   sources of the totally lost data when the next text from these
   sources arrives.

   Cons:

   The format of each packet is different from what is specified in RFC
   4103 [RFC4103].

   The processing time in standards organisations will be long.

   A new media subtype is needed, making negotiation somewhat complex.

   The recovery procedure is somewhat complex.

4.1.1.6.  RTP Mixer with multiple primary data in each packet

   This method allows primary as well as redundant text from more than
   one source per packet.  The packet payload contains an ordered set
   of redundant and primary data with the same number of generations of
   redundancy as once agreed in the SDP negotiation.  The data header
   reflects these parts of the payload.  The CSRC list contains one
   CSRC member per source in the payload and in the same order.

   The maximum number of members in the CSRC-list is 15, and that is
   therefore the maximum number of sources that can be represented in
   each packet provided that all data can be fitted into the size
   allowable in one packet.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".

   SDP would be the same as in Section 4.1.1.5.

   Pros:

   The source switching performance is good.  Text from 15 participants
   can be transmitted simultaneously.

   New text from 15 simultaneous sources can be transmitted within 150
   milliseconds.  This is good performance.

   Cons:

   The format of each packet is different from what is specified in RFC
   4103 [RFC4103].

   A new media subtype is needed, making negotiation somewhat complex.

   The processing time in standards organisations will be long.
662 The recovery procedure is a bit complex [RFC4103]. 664 When more consecutive packet loss than the number of generations of 665 redundant data appears, it is not possible to deduce the sources of 666 the totally lost data. 668 4.1.1.7. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy in the 669 packets 671 This method allows primary data from one source and redundant text 672 from other sources in each packet. The packet payload contains 673 primary data in "text/t140" format, and redundant data in RFC 5109 674 FEC [RFC5109] format called "text/ulpfec". That means that the 675 redundant data contains the sequence number and the CSRC and other 676 characteristics from the RTP header when the data was sent as 677 primary. The redundancy can be sent a selected number of packets 678 after it was sent as primary, in order to improve the protection 679 against bursty packet loss. The redundancy level is recommended to 680 be the same as in original RFC 4103. 682 RFC 4103 says that the protection against loss can be made by other 683 methods than plain redundancy, so this method is in line with that 684 statement. 686 Transmission is done as soon as there is new text available, but not 687 with shorter interval than 100 ms and not longer than 300 ms while 688 there is anything to send (new or redundant text). 690 When more consecutive packet loss than the number of generations of 691 redundant data appears, it is not possible to deduce the sources of 692 the totally lost data. 694 The SDP can indicate the format as "text/red" with "text/ulpfec" 695 redundant data in this way, with traditional RFC 4103 "text/red" 696 with "text/t140" as redundant data as a fallback.
698 m=text 49170 RTP/AVP 98 101 100 102 699 a=rtpmap:98 red/1000 700 a=fmtp:98 100/102/102 701 a=rtpmap:102 ulpfec/1000 702 a=rtpmap:100 t140/1000 703 a=rtpmap:101 red/1000 704 a=fmtp:101 100/100/100 705 a=fmtp:100 cps=200 707 The "text/ulpfec" format includes an indication of how far back the 708 redundancy belongs, making it possible to cover bursty packet loss 709 better than the other formats with short transmission intervals. For 710 real-time text, it is recommended to send three packets between the 711 primary and the redundant transmissions of text. That makes the 712 transmission cover between 500 and 1500 ms of bursty packet loss. 713 The variation is because of the varying packet interval between many 714 and one simultaneously transmitting source. 716 The "text/ulpfec" format has a number of parameters. One is the 717 length of the data to be protected which in this case must be the 718 whole t140block. 720 Pros: 722 The source switching performance is good. Text from 5 participants 723 can be transmitted within 500 ms. 725 Good recovery from bursty packet loss. 727 The method is based on existing standards. No new registrations are 728 needed. 730 Cons: 732 When more consecutive packet loss than the number of generations of 733 redundant data appears, it is not possible to deduce the sources of 734 the totally lost data. 736 Even if the switching performance is good, it is not as good as for 737 the method called "RTP Mixer with multiple primary data in each 738 packet "Section 4.1.1.6. With more than 5 simultaneously sending 739 sources, there will be a noticeable delay of text of over 500 ms, 740 with 100 ms added per simultaneous source. This is however beyond 741 the requirements and would be a concern only in congestion 742 situations. 744 The recovery procedure is a bit complex [RFC5109]. 746 There is more overhead in terms of extra data and extra packets sent 747 than in the other methods. 
With the recommended two redundant 748 generations of data, each packet will be 36 bytes longer than with 749 traditional RFC 4103, and at each pause in transmission five extra 750 packets with only redundant data will be sent compared to two extra 751 packets for the traditional RFC 4103 case. 753 4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy and 754 separate sequence number in the packets 756 This method allows primary data from one source and redundant text 757 from other sources in each packet. The packet payload contains 758 primary data in a new "text/t140e" format, and redundant data in RFC 759 5109 FEC [RFC5109] format called "text/ulpfec". That means that the 760 redundant data contains the sequence number and the CSRC and other 761 characteristics from the RTP header when the data was sent as 762 primary. The redundancy can be sent at a selected number of packets 763 after when it was sent as primary, in order to improve the protection 764 against bursty packet loss. The redundancy level is recommended to 765 be the same as in original RFC 4103. The "text/t140e" format 766 contains a source-specific sequence number and the t140block. 768 RFC 4103 says that the protection against loss can be made by other 769 methods than plain redundancy, so this method is in line with that 770 statement. 772 Transmission is done as soon as there is new text available, but not 773 with shorter interval than 100 ms and not longer than 300 ms while 774 there is anything to send (new or redundant text). 776 When more consecutive packet loss than the number of generations of 777 redundant data appears, it is possible to deduce which sources lost 778 data when new data arrives from the sources. This is done by 779 monitoring the received source specific sequence numbers preceding 780 the text. 
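A minimal sketch of the per-source loss detection described above. It assumes that each "text/t140e" block has already been parsed into a source identifier and a source-specific sequence number, and that the sequence number is a 16-bit wrapping counter like the RTP sequence number. The field width and the names SourceLossTracker and observe are illustrative assumptions and are not defined by this document.

```python
from typing import Dict

# Assumption: the source-specific sequence number wraps at 16 bits,
# mirroring the width of the RTP sequence number.
SEQ_MODULUS = 1 << 16


class SourceLossTracker:
    """Remember the last source-specific sequence number seen per source."""

    def __init__(self) -> None:
        self._last_seq: Dict[int, int] = {}

    def observe(self, source_id: int, seq: int) -> int:
        """Record a received block and return the number of blocks from
        this source that were missed immediately before it."""
        previous = self._last_seq.get(source_id)
        if previous is None:
            # First block from this source: nothing to compare against.
            self._last_seq[source_id] = seq
            return 0
        gap = (seq - previous) % SEQ_MODULUS
        if gap == 0 or gap >= SEQ_MODULUS // 2:
            # Duplicate or late block: not a loss, keep the newer state.
            return 0
        self._last_seq[source_id] = seq
        return gap - 1
```

A receiver could call observe() for every received block and insert a text loss marker in the presentation area of the affected source whenever the returned value is greater than zero.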
782 This is an example of how SDP can indicate the format as "text/red" with 783 "text/t140e" as primary and "text/ulpfec" redundant data, with 784 traditional RFC 4103 "text/red" with "text/t140" as redundant 785 data as a fallback. 787 m=text 49170 RTP/AVP 98 101 100 102 103 788 a=rtpmap:98 red/1000 789 a=fmtp:98 100/102/102 790 a=rtpmap:102 ulpfec/1000 791 a=rtpmap:103 t140/1000 792 a=rtpmap:100 t140e/1000 793 a=rtpmap:101 red/1000 794 a=fmtp:101 103/103/103 795 a=fmtp:100 cps=200 797 The "text/ulpfec" format includes an indication of how far back the 798 redundancy belongs, making it possible to cover bursty packet loss 799 better than the other formats with short transmission intervals. For 800 real-time text, it is recommended to send three packets between the 801 primary and the redundant transmissions of text. That makes the 802 transmission cover between 500 and 1500 ms of bursty packet loss. 803 The variation is because of the varying packet interval between many 804 and one simultaneously transmitting source. 806 The "text/ulpfec" format has a number of parameters. One is the 807 length of the data to be protected which in this case must be the 808 whole t140block. 810 Pros: 812 The source switching performance is good. Text from 5 participants 813 can be transmitted within 500 ms. 815 Good recovery from bursty packet loss. 817 The method is based on an existing standard for FEC. 819 When more consecutive packet loss than the number of generations of 820 redundant data appears, it is possible to deduce the source of the 821 lost data when new text arrives from the source. 823 Cons: 825 Even if the switching performance is good, it is not as good as for 826 the method called "RTP Mixer with multiple primary data in each 827 packet" Section 4.1.1.6. With more than 5 simultaneously sending 828 sources, there will be a noticeable delay of text of over 500 ms, 829 with 100 ms added per simultaneous source.
This is however beyond 830 the requirements and would be a concern only in congestion 831 situations. 833 The recovery procedure is a bit complex [RFC5109]. 835 There is more overhead in terms of extra data and extra packets sent 836 than in the other methods. With the recommended two redundant 837 generations of data, each packet will be 40 bytes longer than with 838 traditional RFC 4103, and at each pause in transmission five extra 839 packets with only redundant data will be sent compared to two extra 840 packets for the traditional RFC 4103 case. 842 A new text media subtype "text/t140e" needs to be registered. 844 The processing time in standard organisation will be long. 846 4.1.1.9. RTP Mixer indicating participants by a control code in the 847 stream 849 Text from all participants except the receiving one is transmitted 850 from the media mixer in the same RTP session and stream, thus all 851 using the same destination address/port combination, the same RTP 852 SSRC and one sequence number series as described in Sections 7.1 and 853 7.3 of RTP RFC 3550 [RFC3550] about the Mixer function. The sources 854 of the text in each RTP packet are identified by a newly defined T.140 855 control code "c" followed by a unique identification of the source in 856 UTF-8 string format. 858 The receiver can use the string for presenting the source of text. 859 This method is on the RTP level described in RFC 7667, section 3.6.1 860 Media mixing mixer [RFC7667]. 862 The inline coding of the source of text is applied in the data stream 863 itself, and an RTP mixer function is used for coordinating the 864 sources of text into one RTP stream. 866 Information uniquely identifying each user in the multi-party session 867 is placed as the parameter value "n" in the T.140 application 868 protocol function with the function code "c". The identifier shall 869 thus be formatted like this: SOS c n ST, where SOS and ST are coded 870 as specified in ITU-T T.140 [T140].
The "c" is the letter "c". The 871 n parameter value is a string uniquely identifying the source. This 872 parameter shall be kept short so that it can be repeated in the 873 transmission without concerns for network load. 875 A receiving endpoint is supposed to separate text items from the 876 different sources and identify and display them accordingly. 878 The conference server needs to be allowed to decrypt/encrypt the 879 packet payload in order to check the source and repack the text. 881 Pros: 883 If loss of packets occurs, it is possible to recover text from 884 redundancy at loss of up to the number of redundancy levels carried 885 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 886 levels). 888 This method can be implemented with most RTP implementations. 890 The method can also be used with other transports than RTP. 892 Cons: 894 The method implies a moderate load by the need to insert the source 895 often in the stream. 897 If more consecutive packet loss than the number of generations of 898 redundant data appears, it is not possible to deduce the source of 899 the totally lost data. 901 The mixer needs to be able to generate unique source 902 identifications which are suitable as labels for the sources. 904 Requires an extension to the ITU-T T.140 standard, best made by the 905 ITU. 907 There is a risk that the control code indicating the change of source 908 is lost and the result is false source indication of text. 910 The conference server needs to be allowed to decrypt/encrypt the 911 packet payload. 913 4.1.1.10. Mixing for multi-party unaware user agents 915 Multi-party real-time text contents can be transmitted to multi-party 916 unaware user agents if source labelling and formatting of the text is 917 performed by a mixer.
This method has the limitations that the 918 layout of the presentation and the format of source identification are 919 purely controlled by the mixer, and that only one source at a time is 920 allowed to present in real-time. Other sources need to be stored 921 temporarily waiting for an appropriate moment to switch the source of 922 transmitted text. The mixer controls the switching of sources and 923 inserts a source identifier in text format at the beginning of text 924 after switch of source. The logic of the mixer to detect when a 925 switch is appropriate should detect a number of places in text where 926 a switch can be allowed, including new line, end of sentence, end of 927 phrase, a period of inactivity, and a word separator after a long 928 time of active transmission. 930 This method MAY be used when no support for multi-party awareness is 931 detected in the receiving endpoint. The base for this method is 932 described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667]. 934 See [I-D.ietf-avtcore-multi-party-rtt-mix] for a procedure for mixing 935 RTT for a conference-unaware endpoint. 937 Pros: 939 Can be transmitted to conference-unaware endpoints. 941 Can be used with other transports than RTP. 942 Cons: 944 Does not allow full real-time presentation of more than one source at 945 a time. Text from other sources will be delayed. 947 The only realistic presentation format is a style with the text from 948 the different sources presented with a text label indicating source, 949 and the text collected in a chat style presentation but with more 950 frequent turn-taking. 952 Endpoints often have their own system for adding labels to the RTT 953 presentation. In that case there will be two levels of labels in the 954 presentation, one for the mixer and one for the sources. 956 If loss of more packets than can be recovered by the redundancy 957 appears, it is not possible to detect which source was struck by the 958 loss.
It is also possible that a source switch occurred during the 959 loss, and therefore a false indication of the source of text can be 960 provided to the user after such loss. 962 Because of all these cons, this method is not recommended to be used as 963 the main method, but only as fallback and the last resort for 964 backwards interoperability with multi-party unaware endpoints. 966 The conference server needs to be allowed to decrypt/encrypt the 967 packet payload. 969 4.1.2. RTP-based bridging with minor RTT media contents reformatting by 970 the bridge 972 It may be desirable to send text in a multi-party setting in a way 973 that allows the text stream contents to be distributed without being 974 dealt with in detail in any central server. A number of such methods 975 are described. However, when writing this specification, none of 976 these methods has a specified way of establishing the session by 977 SDP. 979 4.1.2.1. RTP Translator sending one RTT stream per participant 981 Within the RTP session, text from each participant is transmitted 982 from the RTP media translator (bridge) in a separate RTP stream, thus 983 using the same destination address/port combination, the same payload 984 type number (PT) but separate RTP SSRC parameters and sequence number 985 series as described in Sections 7.1 and 7.2 of RTP RFC 3550 [RFC3550] 986 about the Translator function. The source of the text in each RTP 987 packet is identified by the SSRC parameter in the RTP packets, 988 containing the SSRC of the initial source of text. 990 A receiving and presenting endpoint is supposed to separate text 991 items from the different sources and identify and display them in a 992 suitable way. 994 This method is described in RFC 7667, section 3.5.1 Relay-transport 995 translator or 3.5.2 Media translator [RFC7667]. 997 The identification of the source is made through the SSRC.
The 998 translation to a readable label can be done by mapping to information 999 from the RTCP SDES CNAME and NAME packets as described in 1000 RTP [RFC3550], and also through information in the text media member 1001 in the conference notification described in RFC 4575 [RFC4575]. 1003 The SDP exchange for establishing this mixing type can be equal to 1004 what is used for basic two-party use of RFC 4103 with just an added 1005 attribute for indicating multi-party capability. 1007 m=text 49170 RTP/AVP 98 103 1008 a=rtpmap:98 red/1000 1009 a=fmtp:98 103/103/103 1010 a=rtpmap:103 t140/1000 1011 a=fmtp:103 cps=150 1012 a=RTT-mixing:RTP-translator 1014 A similar answer including the same RTT-mixing attribute would 1015 indicate that multi-party coding can begin. An answer without the 1016 same RTT-mixing attribute could result in diversion to use of the 1017 mixing method for multi-party unaware endpoints Section 4.1.1.10 if 1018 more than two parties are involved in the session. 1020 The bridge can add new sources in the communication to a participant 1021 by first sending a conference notification according to RFC 4575 1022 [RFC4575] with the SSRC of the new source included in the 1023 corresponding "text" media member, or by sending an RTCP message with 1024 the new SSRC in an SDES packet. 1026 A receiver should be prepared to receive such indications of new 1027 streams being added to the multi-party session, so that the new SSRC 1028 is not taken for a change in SSRC value for an already established 1029 RTP stream. 1031 Transmission, reception, packet loss recovery and text loss 1032 indication are performed per source in the separate RTP streams in the 1033 same way as in two-party sessions with RFC 4103 [RFC4103]. 1035 Text is recommended to be sent by the bridge as soon as it is 1036 available for transmission, but not less than 250 ms after a previous 1037 transmission.
This will in many cases result in close to 0 added 1038 delay by the bridge, because most RTT senders use a 300 ms 1039 transmission interval. 1041 It is sometimes said that this configuration is not supported by 1042 current media declarations in sdp. RFC 3264 [RFC3264]specifies in 1043 some places that one media description is supposed to describe just 1044 one RTP media stream. However this is not directly referencing an 1045 RTP stream, and use of multiple RTP streams in the same RTP session 1046 is recommended in many other RFCs. 1048 This confusion is clarified in RFC 5576 [RFC5576] section 3 by the 1049 following statements: 1051 "The term "media stream" does not appear in the SDP specification 1052 itself, but is used by a number of SDP extensions, for instance, 1053 Interactive Connectivity Establishment (ICE) [ICE], to denote the 1054 object described by an SDP media description. This term is 1055 unfortunately rather confusing, as the RTP specification [RFC3550] 1056 uses the term "media stream" to refer to an individual media source 1057 or RTP packet stream, identified by an SSRC, whereas an SDP media 1058 stream describes an entire RTP session, which can contain any number 1059 of RTP sources." 1061 In most cases, it will be sufficient that new sources are introduced 1062 with a conference notification or RTCP message. However, RFC 5576 1063 [RFC5576] specifies attributes which may be used to more explicitly 1064 announce new sources or restart of earlier established RTP streams. 1066 This method is encouraged by draft-ietf-avtcore-multiplex-guidelines 1067 [I-D.ietf-avtcore-multiplex-guidelines] section 5.2. 1069 Normal operation will be that the bridge receives text packets from 1070 the source and handles any text recovery and indication of loss 1071 needed before queueing the resulting clean text for transmission from 1072 the bridge to the receivers. 
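The pacing rule given earlier in this section for the translator bridge (send as soon as text is available, but not less than 250 ms after the previous transmission of the same stream) can be illustrated with a small sketch. It is only an illustration under stated assumptions: times are integer milliseconds, the rule is applied per forwarded stream, and the names next_send_time_ms and MIN_INTERVAL_MS are hypothetical.

```python
from typing import Optional

# From the text above: not less than 250 ms after a previous transmission.
MIN_INTERVAL_MS = 250


def next_send_time_ms(now_ms: int, last_sent_ms: Optional[int]) -> int:
    """Return the earliest time at which the bridge may forward text
    that became available at now_ms on one stream."""
    if last_sent_ms is None:
        # Nothing sent on this stream yet: forward immediately.
        return now_ms
    return max(now_ms, last_sent_ms + MIN_INTERVAL_MS)


def added_delay_ms(now_ms: int, last_sent_ms: Optional[int]) -> int:
    """Delay introduced by the pacing rule for text available at now_ms."""
    return next_send_time_ms(now_ms, last_sent_ms) - now_ms
```

With a sender that transmits every 300 ms, each new packet arrives after the 250 ms guard time has expired, so added_delay_ms() returns 0, which matches the statement above that the bridge often adds close to no delay.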
1074 It may however also be possible for the bridge to just convey the 1075 packet contents as received from the sources, with minor adjustments, 1076 and let the receiving endpoint handle all aspects of recovery and 1077 indication of loss, even for the source to bridge path. In that case 1078 also the sequence number must be maintained as it was at reception in 1079 the bridge. This mode needs further study before application. 1081 Pros: 1083 This method is the natural way to do multi-party bridging with RFC 1084 4103 based RTT. Only a small addition is included in the session 1085 establishment to verify capability by the parties because many 1086 implementations are done without multi-party capability. 1088 This method has moderate overhead in terms of work for the mixer, but 1089 high in terms of packet transmission rate. Five sources sending 1090 simultaneously cause the bridge to send 15 packets per second to each 1091 receiver. 1093 When loss of packets occur, it is possible to recover text from 1094 redundancy at loss of up to the number of redundancy levels carried 1095 in the RFC 4103 [RFC4103] stream(normally primary and two redundant 1096 levels). 1098 More loss than what can be recovered, can be detected and the marker 1099 for text loss can be inserted in the correct stream. 1101 It may be possible in some scenarios to keep the text encrypted 1102 through the Translator. 1104 Minimal delay. The delay can often be kept close to 0 with at least 1105 5 simultaneous sending participants. 1107 Cons: 1109 There are RTP implementations not supporting the Translator model. 1110 They will need to use the fall-back to multi-party-unaware mixing. 1111 An investigation about how common this is is needed before the method 1112 is used. 1114 The processing time in standard organisation will be long. 1116 With many simultaneous sending sources, the total rate of packets 1117 will be high, and can cause congestion. 
The requirement to handle 3 1118 simultaneous sources in this specification will cause 10 packets per 1119 second, which is manageable in most cases, e.g. considering that audio 1120 usually uses 50 packets per second. 1122 4.1.2.2. Distributing packets in an end-to-end encryption structure 1124 In order to achieve end-to-end encryption, it is possible to let the 1125 packets from the sources just pass through a central distributor, and 1126 handle the security agreements between the participants. 1127 Specifications exist for a framework with this functionality for 1128 application on RTP based conferences in 1129 [I-D.ietf-perc-private-media-framework]. The RTP flow and mixing 1130 characteristics have similarities with the method described under "RTP 1131 Translator sending one RTT stream per participant" above. RFC 4103 1132 RTP streams [RFC4103] would fit into the structure and it would 1133 provide a base for end-to-end encrypted RTT multi-party conferencing. 1135 Pros: 1137 Good security. 1139 Straightforward multi-party handling. 1141 Cons: 1143 Does not operate under the usual SIP central conferencing 1144 architecture. 1146 Requires the participants to perform a lot of key handling. 1148 Is work in progress when this is written. 1150 4.1.2.3. Mesh of RTP endpoints 1152 Text from all participants is transmitted directly to all others in 1153 one RTP session, without a central bridge. The sources of the text 1154 in each RTP packet are identified by the source network address and 1155 the SSRC. 1157 This method is described in RFC 7667, section 3.4 Point to multi- 1158 point using mesh [RFC7667]. 1160 Pros: 1162 When loss of packets occurs, it is possible to recover text from 1163 redundancy at loss of up to the number of redundancy levels carried 1164 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1165 levels). 1167 This method can be implemented with most RTP implementations.
1169 Transmitted text can also be used with other transports than RTP. 1171 Cons: 1173 This model is not described in IMS, NENA and EENA specifications, and 1174 does therefore not meet the requirements. 1176 Requires a drastically increasing number of connections when the 1177 number of participants increases. 1179 4.1.2.4. Multiple RTP sessions, one for each participant 1181 Text from all participants is transmitted directly to all others in 1182 one RTP session each, without a central bridge. Each session is 1183 established with a separate media description in SDP. The sources of 1184 the text in each RTP packet are identified by the source network 1185 address and the SSRC. 1187 Pros: 1189 When loss of packets occurs, it is possible to recover text from 1190 redundancy at loss of up to the number of redundancy levels carried 1191 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1192 levels). 1194 Complete loss of text can be indicated in the received stream. 1196 This method can be implemented with most RTP implementations. 1198 End-to-end encryption is achievable. 1200 Cons: 1202 This method is not described in IMS, NENA and ETSI specifications and 1203 does therefore not meet the requirements. 1205 A lot of network resources are spent on setting up separate sessions 1206 for each participant. 1208 5. Preferred RTP-based multi-party RTT transport method 1210 For RTP transport of RTT using RTP-mixer technology, one method for 1211 multi-party mixing and transport stands out as fulfilling the goals 1212 best and is therefore recommended. That is: "RTP Mixer interleaving 1213 packets, receiver using timestamp to recover from loss" 1214 Section 4.1.1.4. 1216 For RTP transport in separate streams or sessions, no current 1217 recommendation can be made. A bridging method in the process of 1218 standardisation with interesting characteristics is the end-to-end 1219 encryption model "perc" Section 4.1.2.2. 1221 6.
Session control of RTP-based multi-party RTT sessions 1223 General session control aspects for multi-party sessions are 1224 described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP) 1225 Event Package for Conference State, and RFC 4579 [RFC4579] Session 1226 Initiation Protocol (SIP) Call Control - Conferencing for User 1227 Agents. The nomenclature of these specifications are used here. 1229 The procedures for a multi-party aware model for RTT-transmission 1230 shall only be applied if a capability exchange for multi-party aware 1231 real-time text transmission has been completed and a supported method 1232 for multi-party real-time text transmission can be negotiated. 1234 A method for detection of conference-awareness for centralized SIP 1235 conferencing in general is specified in RFC 4579 [RFC4579]. The 1236 focus sends the "isfocus" feature tag in a SIP Contact header. This 1237 causes the conference-aware endpoint to subscribe to conference 1238 notifications from the focus. The focus then sends notifications to 1239 the endpoint about entering and disappearing conference participants 1240 and their media capabilities. The information is carried XML- 1241 formatted in a 'conference-info' block in the notification according 1242 to RFC 4575 [RFC4575]. The mechanism is described in detail in RFC 1243 4575 [RFC4575]. 1245 Before a conference media server starts sending multi-party RTT to an 1246 endpoint, a verification of its ability to handle multi-party RTT 1247 must be made. A decision on which mechanism to use for identifying 1248 text from the different participants must also be taken, implicitly 1249 or explicitly. These verifications and decisions can be done in a 1250 number of ways. The most apparent ways are specified here and their 1251 pros and cons described. One of the methods is selected to be the 1252 one to be used by implementations of the centralized conference model 1253 according to this specification. 1255 6.1. 
Implicit RTT multi-party capability indication 1257 Capability for RTT multi-party handling can be decided to be 1258 implicitly indicated by session control items. 1260 The focus may implicitly indicate multi-party RTT capability by 1261 including the media child with value "text" in the RFC 4575 [RFC4575] 1262 conference-info provided in conference notifications. 1264 An endpoint may implicitly indicate multi-party RTT capability by 1265 including the text media in the SDP in the session control 1266 transactions with the conference focus after the subscription to the 1267 conference has taken place. 1269 The implicit RTT capability indication means for the focus that it 1270 can handle multi-party RTT according to the preferred method 1271 indicated in the RTT multi-party methods section above. 1273 The implicit RTT capability indication means for the endpoint that it 1274 can handle multi-party RTT according to the preferred method 1275 indicated in the RTT multi-party methods section above. 1277 If the focus detects that an endpoint has implicitly declared RTT multi- 1278 party capability, it SHALL provide RTT according to the preferred 1279 method. 1281 If the focus detects that the endpoint does not indicate any RTT 1282 multi-party capability, then it shall either provide RTT multi-party 1283 text in the way specified for conference-unaware endpoints above, or 1284 refuse to set up the session. 1286 If the endpoint detects that the focus has implicitly declared RTT 1287 multi-party capability, it shall be prepared to present RTT in a 1288 multi-party fashion according to the preferred method. 1290 Pros: 1292 Acceptance of implicit multi-party capability implies that no 1293 standardisation of explicit RTT multi-party capability exchange is 1294 required. 1296 Cons: 1298 If other methods for multi-party RTT are to be used in the same 1299 implementation environment as the preferred ones, then capability 1300 exchange needs to be defined for them.
1302 Cannot be used outside a strictly applied SIP central conference 1303 model. 1305 6.2. RTT multi-party capability declared by SIP media-tags 1307 Specifications for RTT multi-party capability declarations can be 1308 agreed for use as SIP media feature tags, to be exchanged during SIP 1309 call control operation according to the mechanisms in RFC 3840 1310 [RFC3840] and RFC 3841 [RFC3841]. Capability for the RTT Multi-party 1311 capability is then indicated by the media feature tag "rtt-mix", with 1312 a set of possible values for the different possible methods. 1314 The possible values in the list may for example be: 1316 rtp-mixer 1317 perc 1319 rtp-mixer indicates capability for using the RTP-mixer based 1320 presentation of multi-party text. 1322 perc indicates capability for using the perc based transmission of 1323 multi-party text. 1325 Example: Contact: 1327 ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL" 1329 ;+sip.rtt-mix="rtp-mixer" 1331 If, after evaluation of the alternatives in this specification, only 1332 one mixing method is selected to be brought to implementation, then 1333 the media tag can be reduced to a single tag with no list of values. 1335 An offer-answer exchange should take place and the common method 1336 selected by the answering party shall be used in the session with 1337 that UA. 1339 When no common method is declared, then only the fallback method for 1340 multi-party unaware participants can be used, or the session dropped. 1342 If more than one text media section is included in SDP, all must be 1343 capable of using the declared RTT multi-party method. 1345 Pros: 1347 Provides a clear decision method. 1349 Can be extended with new mixing methods. 1351 Can guide call routing to a suitable capable focus. 1353 Cons: 1355 Requires standardization and IANA registration. 1357 Is not stream specific. If more than one text stream is specified, 1358 all must have the same type of multi-party capability. 
1360 Cannot be used in the WebRTC environment. 1362 6.3. SDP media attribute for RTT multi-party capability indication 1364 An attribute can be specified on media level, to be used in text 1365 media SDP declarations for negotiating RTT multi-party capabilities. 1366 The attribute can have the name "rtt-mixing". 1368 More than one attribute can be included in one media description. 1370 The attribute can have a value. The value can for example be: 1372 rtp-mixer 1374 rtp-translator 1376 perc 1378 rtp-mixer indicates capability for using the RTP-mixer and CSRC-list 1379 based mixing of multi-party text. 1381 rtp-translator indicates capability for using the RTP-translator 1382 based mixing 1384 perc indicates capability for using the perc based transmission of 1385 multi-party text. 1387 An offer-answer exchange should take place and the common method 1388 selected by the answering party shall be used in the session with 1389 that endpoint. 1391 When no common method is declared, then only the fallback method for 1392 multi-party unaware endpoints can be used. 1394 Example: a=rtt-mixing:rtp-mixer 1396 If, after evaluation of the alternatives in this specification, only 1397 one mixing method is selected to be brought to implementation, then 1398 the attribute can be reduced to a single attribute with no list of 1399 values. 1401 Pros: 1403 Provides a clear decision method. 1405 Can be extended with new mixing methods. 1407 Can be used on specific text media. 1409 Can be used also for SDP-controlled WebRTC sessions with multiple 1410 streams in the same data channel. 1412 Cons: 1414 Requires standardization and IANA registration. 1416 Cannot guide SIP routing. 1418 6.4. Simplified SDP media attribute for RTT multi-party capability 1419 indication 1421 An attribute can be specified on media level, to be used in text 1422 media SDP declarations for negotiating RTT multi-party capabilities. 
   The attribute can have a name suitable for the selected method and no
   value.  It would be selected and used if only one method for
   multi-party RTT is brought forward from this specification, and the
   others are left unspecified for now or found to be possible to
   negotiate in another way.

   An offer-answer exchange should take place and if both parties
   specify rtt-mixing capability with the same attribute, the selected
   mixing method shall be used.

   When no common method is declared, then only the fallback method for
   multi-party unaware endpoints can be used, or the session is not
   accepted for multi-party use.

   Example: a=rtt-mix

   Pros:

   Provides a clear decision method.

   Very simple syntax and semantics.

   Can be used on specific text media.

   Cons:

   Requires standardization and IANA registration.

   If another RTT mixing method is specified in the future, that method
   may need to specify and register its own attribute.  With an
   attribute that takes a parameter value, only the addition of a new
   possible value would be needed.

   Cannot guide SIP routing.

6.5.  SDP format parameter for RTT multi-party capability indication

   An FMTP format parameter can be specified for the RFC 4103 [RFC4103]
   media, to be used in text media SDP declarations for negotiating RTT
   multi-party capabilities.  The parameter can have the name
   "rtt-mixing", with one or more of its possible values.

   The possible values in the list are:

      rtp-mixer

      perc

   rtp-mixer indicates capability for using the RTP-mixer based mixing
   and presentation of multi-party text using the CSRC-list.

   perc indicates capability for using the perc based transmission of
   multi-party text.
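
   As an informal illustration only, the following Python sketch shows
   how an endpoint might read an "rtt-mixing" format parameter and how
   an answering party might pick a common method.  The comma as
   separator between several values, the function names and the
   "fallback" label are assumptions made for this sketch; this document
   does not define them.

```python
FALLBACK = "fallback"  # hypothetical label, not defined by this document


def parse_rtt_mixing(fmtp_params):
    """Return the rtt-mixing values found in the parameter part of an
    fmtp line, for example "98/98/98 rtt-mixing=rtp-mixer".

    Assumption: several values are separated by commas.
    """
    for token in fmtp_params.split():
        name, sep, value = token.partition("=")
        if sep and name == "rtt-mixing":
            return [v for v in value.split(",") if v]
    return []


def select_method(offered, answerer_preference):
    """Pick the first method in the answerer's preference order that the
    offerer also declared; otherwise report the fallback."""
    for method in answerer_preference:
        if method in offered:
            return method
    return FALLBACK


offered = parse_rtt_mixing("98/98/98 rtt-mixing=rtp-mixer,perc")
chosen = select_method(offered, ["perc", "rtp-mixer"])
```

   The selection is made in the answerer's preference order, since the
   answering party is the one that selects the common method.  When the
   lists have no method in common, the caller of select_method would
   apply the fallback method or deny the session.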
   Example: a=fmtp:96 98/98/98 rtt-mixing=rtp-mixer

   If, after evaluation of the alternatives in this specification, only
   one mixing method is selected to be brought to implementation, then
   the parameter can be reduced to a single parameter with no list of
   values.

   An offer-answer exchange should take place and the common method
   selected by the answering party shall be used in the session with
   that UA.

   When no common method is declared, then only the fallback method can
   be used, or the session is denied.

   Pros:

   Provides a clear decision method.

   Can be extended with new mixing methods.

   Can be used on specific text media.

   Can be used also for SDP-controlled WebRTC sessions with multiple
   streams in the same data channel.

   Cons:

   Requires standardization and IANA registration.

   May cause interop problems with current RFC 4103 [RFC4103]
   implementations not expecting a new fmtp-parameter.

   Cannot guide SIP routing.

6.6.  A text media subtype for support of multi-party rtt

   Indicating a specific text media subtype in SDP is a straightforward
   way for negotiating multi-party capability.  Especially if there are
   format differences from the "text/red" and "text/t140" formats of
   RFC 4103 [RFC4103], then this is a natural way to do the negotiation
   for multi-party RTT.

   Pros:

   No extra effort if a new format is needed anyway.

   Cons:

   None specific to using the format indication for negotiation of
   multi-party capability, but it is only feasible if a new format is
   needed anyway.

6.7.  Preferred capability declaration method for RTP-based transport.

   If the preferred transport method is one with a specific media
   subtype in SDP, then specification by media subtype is preferred.
   If this would not be the case, then the preferred capability
   declaration method would be the one with a specific SDP attribute for
   the selected mixing method (Section 6.4), because it is
   straightforward.

6.8.  Identification of the source of text for RTP-based solutions

   The main way to identify the source of text in the RTP based solution
   is by the SSRC of the sending participant.  In the RTP-mixer
   solution, this SSRC is included in the CSRC list of the transmitted
   packets.  Further identification that may be needed for better
   labelling of received text may be obtained from a number of sources,
   such as the RTCP SDES CNAME and NAME reports and the conference
   notification data (RFC 4575) [RFC4575].

   As soon as a new member is added to the RTP session, its
   characteristics should be transmitted in RTCP SDES CNAME and NAME
   reports according to section 6.5 in RFC 3550 [RFC3550].  The
   information about the participant should also be included in the
   conference data including the text media member in a notification
   according to RFC 4575 [RFC4575].

   The RTCP SDES report SHOULD contain identification of the source
   represented by the SSRC/CSRC identifier.  This identification MUST
   contain the CNAME field and MAY contain the NAME field and other
   defined fields of the SDES report.

   A focus UA SHOULD primarily convey SDES information received from the
   sources of the session members.  When such information is not
   available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME
   information from available information from the SIP session with the
   participant.

   Provision of detailed information in the NAME field has security
   implications, especially if provided without encryption.

7.  RTT bridging in WebRTC

   Within WebRTC, real-time text is specified to be carried in WebRTC
   data channels as specified in
   [I-D.ietf-mmusic-t140-usage-data-channel].  That specification
   briefly mentions a few ways to handle multi-party RTT.  They are
   repeated below.

7.1.  RTT bridging in WebRTC with one data channel per source

   A straightforward way to handle multi-party RTT is for the bridge to
   open one T.140 data channel per source towards the receiving
   participants.

   The stream-id forms a unique stream identification.

   The identification of the source is made through the Label property
   of the channel, and session information belonging to the source.  The
   endpoint can compose a readable label for the presentation from this
   information.

   Pros:

   This is a straightforward solution.

   The load per source is low.

   Cons:

   With a high number of participants, the overhead of establishing and
   maintaining the high number of data channels required may be high,
   even if the load per channel is low.

7.2.  RTT bridging in WebRTC with one common data channel

   A way to handle multi-party RTT in WebRTC is for the bridge to
   combine text from all sources into one data channel and insert the
   source identification in the stream by a T.140 control code for
   source.

   This method is described in a corresponding section for RTP
   transmission above in Section 4.1.1.9.

   The identification of the source is made through insertion in the
   beginning of each text transmission from a source of a control code
   extension "c" followed by a string representing the source, framed by
   the control code start and end flags SOS and ST (See ITU-T T.140
   [T140]).

   A receiving endpoint is supposed to separate text items from the
   different sources and identify and display them in a suitable way.
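
   The separation step can be sketched as follows.  This is a minimal
   illustration under stated assumptions, not a definitive receiver:
   SOS and ST are taken as the T.140 code points U+0098 and U+009C, the
   "c" extension is the not yet standardised control code discussed in
   this section, and the function name split_by_source is hypothetical.
   Loss handling and buffering of control sequences that are split
   between transmissions are left out.

```python
SOS = "\u0098"  # T.140 start of string
ST = "\u009c"   # T.140 string terminator


def split_by_source(stream, source=None):
    """Split a combined T.140 text stream into (source, text) pairs.

    A sequence SOS "c" <source> ST switches the current source.  Other
    SOS ... ST sequences are skipped in this sketch.
    """
    items = []
    pos = 0
    while pos < len(stream):
        start = stream.find(SOS, pos)
        end = stream.find(ST, start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete control sequence.  This sketch treats
            # the rest as text; a real receiver would buffer a partial
            # sequence until its ST arrives.
            if stream[pos:]:
                items.append((source, stream[pos:]))
            break
        if start > pos:
            items.append((source, stream[pos:start]))
        body = stream[start + 1:end]
        if body.startswith("c"):
            source = body[1:]  # hypothetical source marker
        pos = end + 1
    return items


received = SOS + "cAlice" + ST + "Hi " + SOS + "cBob" + ST + "Hello"
pairs = split_by_source(received)
```

   Keeping the result as an ordered list of (source, text) pairs leaves
   the choice of presentation style to the receiving application.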

   The endpoint does not always display the source identification in the
   received text at the place where it is received, but has the
   information as a guide for planning the presentation of received
   text.  A label corresponding to the source identification is
   presented when needed depending on the selected presentation style.

   Pros:

   This solution has relatively low overhead on session and network
   level.

   Cons:

   This solution has higher overhead on the media contents level than
   the WebRTC solution above.

   Standardisation of the new control code "c" in ITU-T T.140 [T140] is
   required.

   The conference server needs to be allowed to decrypt/encrypt the data
   channel contents.

7.3.  Preferred rtt multi-party method for WebRTC

   For WebRTC, one method is preferred because of its simplicity.  So,
   for WebRTC, the method to implement for multi-party RTT with
   multi-party aware parties, when no other method is explicitly agreed
   between implementing parties, is "RTT bridging in WebRTC with one
   data channel per source" (Section 7.1).

8.  Presentation of multi-party text

   All session participants with RTP based transport MUST observe the
   SSRC/CSRC field of incoming text RTP packets, and make note of which
   source they came from in order to be able to present text in a way
   that makes it easy to read text from each participant in a session,
   and get information about the source of the text.

   In the WebRTC case, the Label parameter and other provided endpoint
   information should be used for the same purpose.

8.1.  Associating identities with text streams

   A source identity SHOULD be composed from available information
   sources and displayed together with the text as indicated in ITU-T
   T.140 Appendix [T140].

   The source identity should primarily be the NAME field from incoming
   SDES packets.
   If this information is not available, and the session is a two-party
   session, then the T.140 source identity SHOULD be composed from the
   SIP session participant information.  For multi-party sessions the
   source identity may be composed from local information if sufficient
   information is not available in the session.

   Applications may abbreviate the presented source identity to a
   suitable form for the available display.

   Applications may also replace received source information with
   internally used nicknames.

8.2.  Presentation details for multi-party aware endpoints.

   The multi-party aware endpoint should, after any action for recovery
   of data from lost packets, separate the incoming streams and present
   them according to the style that the receiving application supports
   and the user has selected.  The decisions taken for presentation of
   the multi-party interchange shall be purely on the receiving side.
   The sending application must not insert any item in the stream to
   influence presentation that is not requested by the sending
   participant.

8.2.1.  Bubble style presentation

   One often used style is to present real-time text in chunks in
   readable bubbles identified by labels containing names of sources.
   Bubbles are placed in one column in the presentation area and are
   closed and moved upwards in the presentation area after certain items
   or events, when there is also newer text from another source that
   would go into a new bubble.  The text items that allow bubble closing
   are any character closing a phrase or sentence followed by a space,
   or a timeout of a suitable time (about 10 seconds).

   Real-time active text sent from the local user should be presented in
   a separate area.
   When there is a reason to close a bubble from the local user, the
   bubble should be placed above all real-time active bubbles, so that
   the time order that real-time text entries were completed is visible.

   Scrolling is usually provided for viewing of recent or older text.
   When scrolling is done to an earlier point in the text, the
   presentation shall not move the scroll position by new received text.
   It must be the decision of the local user to return to automatic
   viewing of latest text actions.  It may be useful to provide an
   indication that there is new text to read after scrolling to an
   earlier position has been activated.

   The presentation area may become too small to present all text in all
   real-time active bubbles.  Various techniques can be applied to
   provide a good overview and good reading opportunity even in such
   situations.  The active real-time bubble may have a limited number of
   lines and if their contents need more lines, then a scrolling
   opportunity within the real-time active bubble is provided.  Another
   method can be to only show the label and the last line of the active
   real-time bubble contents, and make it possible to expand or compress
   the bubble presentation between full view and one line view.

   Erasures require special consideration.  Erasure within a real-time
   active bubble is straightforward.  But if erasure from one
   participant affects the last character before a bubble, the whole
   previous bubble becomes the actual bubble for real-time action by
   that participant and is placed below all other bubbles in the
   presentation area.  If the border between bubbles was caused by the
   CRLF characters (instead of the normal "Line Separator"), only one
   erasure action is required to erase this bubble border.  When a
   bubble is closed, it is moved up, above all real-time active bubbles.

   A three-party view is shown in this example.

    ___________________________________________________
   |                                                |^|
   |                                                |-|
   |[Alice] Hi, Alice here.                         | |
   |                                                | |
   |[Bob]   Bob as well.                            | |
   |                                                | |
   |[Eve]   Hi, this is Eve, calling from Paris.    | |
   |        I thought you should be here.           | |
   |                                                | |
   |[Alice] I am coming on Thursday, my             | |
   |        performance is not until Friday morning.| |
   |                                                | |
   |[Bob]   And I on Wednesday evening.             | |
   |                                                | |
   |[Alice] Can we meet on Thursday evening?        | |
   |                                                | |
   |[Eve]   Yes, definitely. How about 7pm.         | |
   |        at the entrance of the restaurant       | |
   |        Le Lion Blanc?                          | |
   |[Eve]   we can have dinner and then take a walk | |
   |                                                | |
   |        But I need to be back to                | |
   |        the hotel by 11 because I need          | |
   |                                                | |
   |        I wou                                   |-|
   |________________________________________________|v|
   | of course, I underst                             |
   |__________________________________________________|

   Figure 1: Example of a three-party call presented in the bubble
   style.

8.2.2.  Other presentation styles

   Other presentation styles than the bubble style may be arranged and
   appreciated by the users.  In a video conference one way may be to
   have a real-time text area below the video view of each participant.
   Another view may be to provide one column in a presentation area for
   each participant and place the text entries in a relative vertical
   position corresponding to when text entry in them was completed.  The
   labels can then be placed in the column header.  The considerations
   for ending and moving and erasure of entered text discussed above for
   the bubble style are valid also for these styles.

   This figure shows how a coordinated column view MAY be presented.

    ____________________________________________________________________
   |        Bob         |         Eve          |        Alice          |
   |____________________|______________________|_______________________|
   |                    |                      |I will arrive by TGV.  |
   |My flight is to Orly|                      |Convenient to the main |
   |                    |Hi all, can we plan   |station.               |
   |                    |for the seminar?      |                       |
   |Eve, will you do    |                      |                       |
   |your presentation on|                      |                       |
   |Friday?             |Yes, Friday at 10.    |                       |
   |Fine, wo            |                      |We need to meet befo   |
   |___________________________________________________________________|

   Figure 2: A coordinated column-view of a three-party session with
   entries ordered in approximate time-order.

9.  Presentation details for multi-party unaware endpoints.

   Multi-party unaware endpoints are prepared only for presentation of
   two sources of text, the local user and a remote user.  If mixing for
   multi-party unaware endpoints is to be supported, in order to enable
   some multi-party communication with such endpoints, the mixer needs
   to plan the presentation and insert labels and line breaks before
   labels.  Many limitations appear for this presentation mode, and it
   must be seen as a fallback and a last resort.

   A procedure for presenting RTT to a conference-unaware endpoint is
   included in [I-D.ietf-avtcore-multi-party-rtt-mix].

10.  Security Considerations

   The security considerations valid for RFC 4103 [RFC4103] and RFC 3550
   [RFC3550] are valid also for the multi-party sessions with text.

11.  IANA Considerations

   The items for indication and negotiation of capability for
   multi-party RTT should be registered with IANA in the specifications
   where they are specified in detail.

12.  Congestion considerations

   The congestion considerations described in RFC 4103 [RFC4103] are
   valid also for the recommended RTP-based multi-party use of the
   real-time text transport.
   A risk for congestion may appear if a number of conference
   participants are actively transmitting text simultaneously, because
   the recommended RTP-based multi-party transmission method does not
   allow multiple sources of text to contribute to the same packet.

   In situations of risk for congestion, the Focus UA MAY combine
   packets from the same source to increase the transmission interval
   per source up to one second.  Local conference policy in the Focus UA
   may be used to decide which streams shall be selected for such
   transmission frequency reduction.

13.  Acknowledgements

   Arnoud van Wijk for contributions to an earlier, expired draft of
   this memo.

14.  Change history

14.1.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-05

   Modify the solution changing source in every packet in the RTP-mixer
   solution, and base recovery on analyzing timestamp and make it the
   recommended one.  Aligned with the recommendation in
   draft-ietf-avtcore-multi-party-rtt-mix-10.

14.2.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-04

   Change name of simplified sdp attribute to "rtt-mix" to match a
   change in the draft draft-ietf-avtcore-multi-party-rtt-mix-09.

14.3.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-03

   Modified info on the method with RFC 4103 format and sdp attribute
   "rtt-mix-rtp-mixer".

   Increased the performance requirements section.

   Inserted recommendations, with emphasis on ease of implementation and
   ease of standardisation.

14.4.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-02

   Added detail in the section on RTP translator model alternative
   4.1.2.1.

14.5.  Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-01

   Added three more methods for RTP-mixer mixing.
   Two RFC 5109 FEC based and another with modified data header to
   detect source of completely lost text.

   Separated RTP-based and WebRTC based solutions.

   Deleted the multi-party-unaware mixing procedure appendix.  It is now
   included in the draft draft-ietf-avtcore-multi-party-rtt-mix.  Kept a
   section with a reference to the new place.

14.6.  Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to
       draft-hellstrom-avtcore-multi-party-rtt-solutions-00

   Add discussion about switching performance, as discussed in avtcore
   on March 13.

   Added that a decrease of transmission interval to 100 ms increases
   switching performance by a factor 3, but still not sufficient.

   Added that the CSRC-list method also uses 100 milliseconds
   transmission interval.

   Added the method with multiple primary text in each packet.

   Added the timestamp-based method for rtp-mixing proposed by James
   Hamlin on March 14.

   Corrected the chat style presentation example picture.  Delete a few
   "[mix]".

14.7.  Changes from version draft-hellstrom-mmusic-multi-party-rtt-01 to
       -02

   Change from a general overview to overview with clear
   recommendations.

   Splits text coordination methods in three groups.

   Recommends rtt-mixer with sources in CSRC-list but refers to its spec
   for details.

   Shortened Appendix with conference-unaware example.

   Cleaned up preferences.

   Inserted pictures of screen-views.

15.  References

15.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

15.2.  Informative References

   [EN301549] ETSI, "EN 301 549. Accessibility requirements for ICT
              products and services", November 2019.

   [I-D.ietf-avtcore-multi-party-rtt-mix]
              Hellstrom, G., "RTP-mixer formatting of multi-party
              Real-time text", Work in Progress, Internet-Draft,
              draft-ietf-avtcore-multi-party-rtt-mix-10,
              18 November 2020.

   [I-D.ietf-avtcore-multiplex-guidelines]
              Westerlund, M., Burman, B., Perkins, C., Alvestrand, H.,
              and R. Even, "Guidelines for using the Multiplexing
              Features of RTP to Support Multiple Media Streams", Work
              in Progress, Internet-Draft,
              draft-ietf-avtcore-multiplex-guidelines-12, 16 June 2020.

   [I-D.ietf-mmusic-t140-usage-data-channel]
              Holmberg, C. and G. Hellstrom, "T.140 Real-time Text
              Conversation over WebRTC Data Channels", Work in Progress,
              Internet-Draft,
              draft-ietf-mmusic-t140-usage-data-channel-14,
              10 April 2020.

   [I-D.ietf-perc-private-media-framework]
              Jones, P., Benham, D., and C. Groves, "A Solution
              Framework for Private Media in Privacy Enhanced RTP
              Conferencing (PERC)", Work in Progress, Internet-Draft,
              draft-ietf-perc-private-media-framework-12, 5 June 2019.

   [NENAi3]   NENA, "NENA-STA-010.2-2016. Detailed Functional and
              Interface Standards for the NENA i3 Solution", October
              2016.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J.C., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, DOI 10.17487/RFC2198, September 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              DOI 10.17487/RFC3264, June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3840]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat,
              "Indicating User Agent Capabilities in the Session
              Initiation Protocol (SIP)", RFC 3840,
              DOI 10.17487/RFC3840, August 2004.

   [RFC3841]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
              Preferences for the Session Initiation Protocol (SIP)",
              RFC 3841, DOI 10.17487/RFC3841, August 2004.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353,
              DOI 10.17487/RFC4353, February 2006.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
              Session Initiation Protocol (SIP) Event Package for
              Conference State", RFC 4575, DOI 10.17487/RFC4575, August
              2006.

   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
              (SIP) Call Control - Conferencing for User Agents",
              BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006.

   [RFC4597]  Even, R. and N. Ismail, "Conferencing Scenarios",
              RFC 4597, DOI 10.17487/RFC4597, August 2006.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, DOI 10.17487/RFC5109, December
              2007.

   [RFC5194]  van Wijk, A., Ed. and G. Gybels, Ed., "Framework for
              Real-Time Text over IP Using the Session Initiation
              Protocol (SIP)", RFC 5194, DOI 10.17487/RFC5194, June
              2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.

   [RFC6443]  Rosen, B., Schulzrinne, H., Polk, J., and A. Newton,
              "Framework for Emergency Calling Using Internet
              Multimedia", RFC 6443, DOI 10.17487/RFC6443, December
              2011.

   [RFC6881]  Rosen, B. and J. Polk, "Best Current Practice for
              Communications Services in Support of Emergency Calling",
              BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
              DOI 10.17487/RFC7667, November 2015.

   [T140]     ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for
              multimedia application text conversation", February 1998.

   [T140ad1]  ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000),
              Protocol for multimedia application text conversation",
              February 2000.

   [TS103479] ETSI, "TS 103 479. Emergency communications (EMTEL); Core
              elements for network independent access to emergency
              services", December 2019.

   [TS22173]  3GPP, "IP Multimedia Core Network Subsystem (IMS)
              Multimedia Telephony Service and supplementary services;
              Stage 1", 3GPP TS 22.173 17.1.0, 20 December 2019.

   [TS24147]  3GPP, "Conferencing using the IP Multimedia (IM) Core
              Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0,
              19 December 2019.

Author's Address

   Gunnar Hellstrom
   Gunnar Hellstrom Accessible Communication
   Esplanaden 30
   SE-136 70 Vendelso
   Sweden

   Phone: +46 708 204 288
   Email: gunnar.hellstrom@ghaccess.se