Internet Engineering Task Force                             G. Hellstrom
Internet-Draft                 Gunnar Hellstrom Accessible Communication
Intended status: Informational                             8 August 2020
Expires: 9 February 2021


           Real-time text solutions for multi-party sessions
          draft-hellstrom-avtcore-multi-party-rtt-solutions-03

Abstract

   This document specifies methods for Real-Time Text (RTT) media
   handling in multi-party calls.  The main discussed transport is to
   carry Real-Time text by the RTP protocol in a time-sampled mode
   according to RFC 4103.  The mechanisms enable the receiving
   application to present the received real-time text media, separated
   per source, in different ways according to user preferences.  Some
   presentation related features are also described explaining suitable
   variations of transmission and presentation of text.

   Call control features are described for the SIP environment.  A
   number of alternative methods for providing the multi-party
   negotiation, transmission and presentation are discussed and a
   recommendation for the main ones is provided.  The main solution for
   SIP based centralized multi-party handling of real-time text is
   achieved through a media control unit coordinating multiple RTP text
   streams into one RTP stream.

   Alternative methods using a single RTP stream and source
   identification inline in the text stream are also described, one of
   them being provided as a lower functionality fallback method for
   endpoints with no multi-party awareness for RTT.

   Bridging methods where the text stream is carried without the
   contents being dealt with in detail by the bridge are also discussed.

   Brief information is also provided for multi-party RTT in the WebRTC
   environment.

   The intention is to provide background for decisions, specification
   and implementation of selected methods.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 9 February 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Centralized conference model
   3.  Requirements on multi-party RTT
     3.1.  General requirements
     3.2.  Performance requirements
   4.  RTP based solutions
     4.1.  Coordination of text RTP streams
       4.1.1.  RTP-based solutions with a central mixer
         4.1.1.1.  RTP Mixer using default RFC 4103 methods
         4.1.1.2.  RTP Mixer using the default method but decreased
                   transmission interval
         4.1.1.3.  RTP Mixer with frequent transmission and indicating
                   sources in CSRC-list
         4.1.1.4.  RTP Mixer using timestamp to identify redundancy
         4.1.1.5.  RTP Mixer with multiple primary data in each packet
                   and individual sequence numbers
         4.1.1.6.  RTP Mixer with multiple primary data in each packet
         4.1.1.7.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                   in the packets
         4.1.1.8.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                   and separate sequence number in the packets
         4.1.1.9.  RTP Mixer indicating participants by a control code
                   in the stream
         4.1.1.10. Mixing for multi-party unaware user agents
       4.1.2.  RTP-based bridging with minor RTT media contents
               reformatting by the bridge
         4.1.2.1.  RTP Translator sending one RTT stream per
                   participant
         4.1.2.2.  Distributing packets in an end-to-end encryption
                   structure
         4.1.2.3.  Mesh of RTP endpoints
         4.1.2.4.  Multiple RTP sessions, one for each participant
   5.  Preferred RTP-based multi-party RTT transport method
   6.  Session control of RTP-based multi-party RTT sessions
     6.1.  Implicit RTT multi-party capability indication
     6.2.  RTT multi-party capability declared by SIP media-tags
     6.3.  SDP media attribute for RTT multi-party capability
           indication
     6.4.  Simplified SDP media attribute for RTT multi-party
           capability indication
     6.5.  SDP format parameter for RTT multi-party capability
           indication
     6.6.  A text media subtype for support of multi-party rtt
     6.7.  Preferred capability declaration method for RTP-based
           transport.
     6.8.  Identification of the source of text for RTP-based
           solutions
   7.  RTT bridging in WebRTC
     7.1.  RTT bridging in WebRTC with one data channel per source
     7.2.  RTT bridging in WebRTC with one common data channel
     7.3.  Preferred rtt multi-party method for WebRTC
   8.  Presentation of multi-party text
     8.1.  Associating identities with text streams
     8.2.  Presentation details for multi-party aware endpoints.
       8.2.1.  Bubble style presentation
       8.2.2.  Other presentation styles
   9.  Presentation details for multi-party unaware endpoints.
   10. Security Considerations
   11. IANA Considerations
   12. Congestion considerations
   13. Acknowledgements
   14. Change history
     14.1.  Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-03
     14.2.  Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-02
     14.3.  Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-01
     14.4.  Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-00
     14.5.  Changes from version
            draft-hellstrom-mmusic-multi-party-rtt-01 to -02
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Author's Address

1.  Introduction

   Real-time text (RTT) is a medium in real-time conversational
   sessions.  Text entered by participants in a session is transmitted
   in a time-sampled fashion, so that no specific user action is needed
   to cause transmission.  This gives a direct flow of text at the rate
   it is created, which is suitable in a real-time conversational
   setting.  The real-time text medium can be combined with other media
   in multimedia sessions.

   Media from a number of multimedia session participants can be
   combined in a multi-party session.  The present document specifies
   how the real-time text streams can be handled in multi-party
   sessions.  Recommendations are provided for preferred methods.

   The description is mainly focused on the transport level, but also
   describes a few session and presentation level aspects.

   Transport of real-time text is specified in RFC 4103 [RFC4103], "RTP
   Payload for Text Conversation".  It makes use of the Real-time
   Transport Protocol (RTP), RFC 3550 [RFC3550], for transport.
   Robustness against network transmission problems is normally
   achieved through redundant transmission based on the principle from
   RFC 2198 [RFC2198], with one primary and two redundant transmissions
   of each text element.  Primary and redundant transmissions are
   combined in packets and described by a redundancy header.  This
   transport is usually used in the Session Initiation Protocol (SIP),
   RFC 3261 [RFC3261], environment.
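
   As an illustration of the redundancy scheme described above, the
   following is a minimal Python sketch of how a sender could assemble
   the payload of one packet in the RFC 2198 layout, with redundant
   generations first and the primary text last.  It is not taken from
   any of the referenced specifications; the function name, the payload
   type value 98 and the sample text are assumptions made for the
   example.

```python
import struct

T140_PT = 98  # assumed dynamic payload type for t140; negotiated in SDP in practice


def build_red_payload(generations, primary, t140_pt=T140_PT):
    """Assemble a redundancy payload in the RFC 2198 layout.

    generations: list of (timestamp_offset, data) for the redundant
                 blocks, oldest generation first.
    primary:     bytes of the new (primary) text.
    """
    headers = b""
    for offset, data in generations:
        if not (0 <= offset < 1 << 14 and len(data) < 1 << 10):
            raise ValueError("timestamp offset or block length out of range")
        # F=1 (another header follows), 7-bit payload type,
        # 14-bit timestamp offset, 10-bit block length.
        word = (1 << 31) | (t140_pt << 24) | (offset << 10) | len(data)
        headers += struct.pack("!I", word)
    # Final header: F=0 and the payload type of the primary block (one octet).
    headers += struct.pack("!B", t140_pt & 0x7F)
    body = b"".join(data for _, data in generations) + primary
    return headers + body


# Two redundant generations sent 600 ms and 300 ms earlier, then new text.
payload = build_red_payload([(600, b"H"), (300, b"e")], b"l")
```

   With two redundant generations the headers occupy nine octets (two
   four-octet headers and the one-octet primary header), followed by
   the text blocks in the same order.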

   A very brief overview of functions for real-time text handling in
   multi-party sessions is given in RFC 4597 [RFC4597], "Conferencing
   Scenarios", Sections 4.8 and 4.10.  The present specification builds
   on that description and indicates which protocol mechanisms should be
   used to implement multi-party handling of real-time text.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel].  Multi-party aspects for
   WebRTC solutions are briefly covered.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Centralized conference model

   In the centralized conference model for SIP, introduced in RFC 4353
   [RFC4353], "A Framework for Conferencing with the Session Initiation
   Protocol (SIP)", one function co-ordinates the communication with
   participants in the multi-party session.  This function also controls
   media mixer functions for the media appearing in the session.  The
   central function is common for control of all media, while the media
   mixers may work differently for each medium.

   The central function is called the Focus UA.  Many variants exist for
   setting up sessions including the multipoint control centre.
   Describing these is not within the scope of this document, which
   instead covers the media specific handling in the mixer required to
   handle multi-party calls with RTT.

   The main principle for handling real-time text media in a centralized
   conference is that one RTP session for real-time text is established
   including the multipoint media control centre and the participating
   endpoints which are going to have real-time text exchange with the
   others.

   The different possible mechanisms for mixing and transporting RTT
   differ in the way they multiplex the text streams and how they
   identify the sources of the streams.  RFC 7667 [RFC7667] describes a
   number of possible use cases for RTP.  This specification refers to
   different sections of RFC 7667 for further reading of the situations
   caused by the different possible design choices.

   The recommended method for using RTP based RTT in a centralized
   conference model is specified in
   [I-D.ietf-avtcore-multi-party-rtt-mix] based on the recommendations
   in this document.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel].  Ways to handle
   multi-party calls in that environment are also specified.

3.  Requirements on multi-party RTT

3.1.  General requirements

   The following general requirements are placed on multi-party RTT:

      A solution shall be applicable to IMS (3GPP TS 22.173) [TS22173],
      SIP based VoIP and Next Generation Emergency Services (NENA i3
      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]).

      The transmission interval for text should not be longer than 500
      milliseconds when there is anything available to send.  Ref ITU-T
      T.140 [T140].

      If text loss is detected or suspected, a missing text marker
      should be inserted in the text stream.  Ref ITU-T T.140 Amendment
      1 [T140ad1] and ETSI EN 301 549 [EN301549].

      The display of text from the members of the conversation shall be
      arranged so that the text from each participant is clearly
      readable, and its source and the relative timing of entered text
      are visualized in the display.  Mechanisms for looking back in
      the contents from the current session should be provided.  The
      text should be displayed as soon as it is received.  Ref ITU-T
      T.140 [T140].

      Bridges must be multimedia capable (voice, video, text).  Ref
      NENA i3 STA-010.2 [NENAi3].

      It MUST be possible to use real-time text in conferences both as a
      medium of discussion between individual participants (for example,
      for sidebar discussions in real-time text while listening to the
      main conference audio) and for central support of the conference
      with real-time text interpretation of speech.  Ref (R7) in RFC
      5194 [RFC5194].

      It should be possible to protect RTT contents with usual means for
      privacy and integrity.  Ref RFC 6881 section 16 [RFC6881].

      Conferencing procedures are documented in RFC 4579 [RFC4579].  Ref
      NENA i3 STA-010.2 [NENAi3].

      Conferencing applies to any kind of media stream by which users
      may want to communicate.  Ref 3GPP TS 24.147 [TS24147].

      The framework for SIP conferences is specified in RFC 4353
      [RFC4353].  Ref 3GPP TS 24.147 [TS24147].

3.2.  Performance requirements

   The mixer performance requirements can be expressed in one number,
   extracted from the user requirements on real-time text expressed in
   ITU-T F.700, where it is stated that for "good" usability, text
   characters should not be delayed more than 1 second from creation to
   presentation.  For "usable" usability the figure is 2 seconds.  The
   main factor behind these limits is the point at which turn-taking in
   a conversation is disturbed by the delay before a response becomes
   visible to the receiving party.  If that time gets too long, the
   receiving party becomes unsure whether the previous utterance was
   well perceived and may prepare to repeat it.  This is similar to the
   effect in voice communication, where the usability limit is 400 ms
   delay.

   Another important factor in a multi-party conference is the
   opportunity for a participant using real-time text to provide timely
   comments and get a chance to enter the discussion if the majority of
   participants use voice in the conference.  A complicating factor when
   stating the requirements is that some transport methods do not cause
   a total delay, but instead an increasing jerkiness when the number of
   simultaneously sending participants is increased.

   It should however be remembered that the expected number of
   participants sending real-time text simultaneously is low.  Just as
   with voice or sign language, the capability of the participants to
   perceive utterances from more than one participant at a time is very
   limited.  Therefore the normal case in multi-party situations is that
   one participant at a time is the main provider of text.  Others
   usually just provide very brief comments such as "yes" or "no" or
   "may I comment?".  Only in very rare situations do two participants
   provide more information simultaneously.

   *  The number of expected simultaneously transmitting users is
      different for different applications.  In all cases, just one
      transmitting user is the normal case.  Two simultaneously
      transmitting participants can occasionally be expected in
      emergency services, relay services, small unmanaged conferences
      and group calls and large managed conferences.  Three
      simultaneously transmitting participants may appear occasionally
      in large unmanaged conferences.  The following can therefore
      express the performance requirement.

   *  The mean delay of text passing the mixer introduced when only one
      participant is sending text should be kept to a minimum and should
      not be more than 400 ms.

   *  The mean delay of text passing the mixer should not be more than 1
      second during moments when up to three users are sending text
      simultaneously.

   *  For the very rare case that more than three participants send text
      simultaneously, the mixer may take action to limit the introduced
      delay of the text passing the mixer to 7 seconds, e.g. by
      discarding text from some participants and instead inserting a
      general warning about possible text loss in the stream.

4.  RTP based solutions

4.1.  Coordination of text RTP streams

   Coordinating and sending text RTP streams in the multi-party session
   can be done in a number of ways.  The most suitable methods are
   specified here with pros and cons.

   A receiving and presenting endpoint MUST separate text from the
   different sources and identify and display them accordingly.

4.1.1.  RTP-based solutions with a central mixer

   A set of solutions can be based on the central RTP mixer.  They are
   described here and a preferred method is selected.

4.1.1.1.  RTP Mixer using default RFC 4103 methods

   Without any extra specifications, a mixer would transmit with 300
   milliseconds intervals, and use RFC 4103 [RFC4103] with the default
   redundancy of one original and two redundant transmissions.  The
   source of the text would be indicated by a single member in the CSRC
   list.  Text from different sources cannot be transmitted in the same
   packet.  Therefore, from the time when the mixer sent one piece of
   new text from one source, it will need to transmit that text again
   twice as redundant data, before it can send text from another source.
   The jerkiness, that is the time between transmissions of new text
   from different sources, is therefore 3 x 300 ms = 900 ms.  This is
   clearly insufficient.

   Pros:

   Only a capability negotiation method is needed.  No other updates of
   standards are needed, just a general remark that traditional
   RTP-mixing is used.

   Cons:

   Clearly insufficient mixer switching performance.

   Somewhat complex handling of transmission when there is new text
   available from more than one source.  The mixer needs to send two
   more packets with redundant text from the current source before
   starting to send anything from the other source.

4.1.1.2.  RTP Mixer using the default method but decreased transmission
          interval

   This method makes use of the default RTP-mixing method briefly
   described in Section 4.1.1.1.  The only difference is that the
   transmission interval is decreased to 100 milliseconds when there is
   text from more than one source available for transmission.  The
   jerkiness is 300 ms.  The mean delay with two simultaneously sending
   participants is 250 ms, and with three simultaneously sending
   participants 500 ms.  This is acceptable performance.

   Pros:

   Minor influence on standards.

   Can be introduced relatively rapidly in the intended technical
   environments.

   Can be declared in SDP as the already existing "text/red" format with
   a multi-party attribute for capability negotiation.

   Cons:

   The introduced jerkiness of new text from more than the required
   three simultaneously sending sources is high.

   Slightly higher risk for loss of text at bursty packet loss than for
   the recommended transmission interval (300 ms) for RFC 4103.

   When complete loss of packets occurs (beyond recovery), it is not
   possible to deduce from which source text was lost.

   Somewhat complex handling of transmission when there is new text
   available from more than one source.  The mixer needs to send two
   more packets with redundant text from the current source before
   starting to send anything from the other source.

4.1.1.3.  RTP Mixer with frequent transmission and indicating sources in
          CSRC-list

   An RTP media mixer combines text from participants into one RTP
   stream, thus all using the same destination address/port combination,
   the same RTP SSRC, and one sequence number series as described in
   Sections 7.1 and 7.3 of RTP RFC 3550 [RFC3550] about the Mixer
   function.  This method is also briefly described in RFC 7667,
   section 3.6.1 Media mixing mixer [RFC7667].
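
   The single-stream structure described above can be sketched as
   follows.  This is a minimal Python illustration of the RFC 3550
   fixed header followed by a CSRC list, as a mixer would emit it under
   its own SSRC.  The function names, the payload type value 100 and
   the SSRC values are hypothetical and chosen only for the example.

```python
import struct


def build_rtp_header(pt, seq, timestamp, mixer_ssrc, csrcs, marker=False):
    """Pack an RFC 3550 fixed header followed by the CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("the CC field is 4 bits, so at most 15 CSRCs fit")
    byte0 = (2 << 6) | len(csrcs)              # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (pt & 0x7F)   # M bit and payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_csrcs(packet):
    """Return the CSRC list of a packet, in the order it was sent."""
    cc = packet[0] & 0x0F
    return list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))


# Hypothetical values: the mixer's own SSRC and two contributing sources.
hdr = build_rtp_header(100, 1, 1000, 0x11111111, [0xAAAA0001, 0xBBBB0002])
sources = parse_csrcs(hdr)
```

   Because the receiver reads the CSRC entries back in transmission
   order, a mixer can give that order a meaning, which is what this
   method relies on.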

   The sources of the text in each RTP packet are identified by the CSRC
   list in the RTP packets, containing the SSRC of the initial sources
   of text.  The CSRC entries are ordered with the SSRC of the source of
   the primary text first, followed by the SSRC of the first level
   redundancy, and then the second level redundancy.

   The transmission interval should be 100 milliseconds when there is
   text to transmit from more than one source, and otherwise 300 ms.

   The identification of the sources is made through the CSRC fields and
   can be made more readable at the receiver through the RTCP SDES CNAME
   and NAME items as described in RTP [RFC3550].

   Information provided through the notification according to RFC 4575
   [RFC4575] when the participant joined the conference also provides
   suitable information and a reference to the SSRC.

   A receiving endpoint is supposed to separate text items from the
   different sources and identify and display them accordingly.

   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
   possible to recover from loss of one or two packets in sequence and
   assign the recovered text to the right source.  For more loss, a
   marker for possible loss should be inserted or presented.

   The conference server needs to have authority to decrypt the payload
   in the received RTP packets in order to be able to recover text from
   redundant data or insert the missing text marker in the stream, and
   repack the text in new packets.

   Even if the format is very similar to "text/red" of RFC 4103, it
   needs to be declared as a new media subtype, e.g. "text/rex".

   Pros:

   This method has low overhead and less complexity than the methods in
   Section 4.1.1.1, Section 4.1.1.2, Section 4.1.1.4 and
   Section 4.1.1.6.

   When packet loss occurs, it is possible to recover text from
   redundancy after loss of up to the number of redundancy levels
   carried in the RFC 4103 [RFC4103] stream (normally primary and two
   redundant levels).

   This method can be implemented with most RTP implementations.

   The source switching performance is sufficient for well-behaving
   conference participants.  The jerkiness is 100 ms.

   Cons:

   When more consecutive packets are lost than the number of
   generations of redundant data, it is not possible to deduce the
   sources of the totally lost data.

   Slightly higher risk for loss of text at bursty packet loss than for
   the recommended transmission interval for RFC 4103.

   Requires a different media subtype, e.g. "text/rex".  This takes a
   long time in standardisation and in releases of target technical
   environments.

   The conference server needs to be allowed to decrypt/encrypt the
   packet payload.  This is however normal for media mixers for other
   media.

4.1.1.4.  RTP Mixer using timestamp to identify redundancy

   This method has text only from one source per packet, as the original
   RFC 4103 [RFC4103] specifies.  Packets with text from different
   sources are instead allowed to be merged.  The recovery procedure in
   the receiver will use the RTP timestamp and timestamp offsets in the
   redundancy headers to evaluate if a piece of redundant data should be
   recovered or not in case of packet loss.

   In this method, the transmission interval is 100 milliseconds when
   text from more than one source is available for transmission.

   Pros:

   The format of each packet is equal to what is specified in RFC 4103
   [RFC4103].

   The source switching performance is sufficient.  Text from five
   participants can be transmitted simultaneously with 500 milliseconds
   interval per source.

   New text from five simultaneous sources can be transmitted within 500
   milliseconds.  This is sufficient.

   Cons:

   The recovery time in case of packet loss is long.  With five
   simultaneously sending participants, it will be 1.5 seconds.

   The recovery procedure is complex and very different from what is
   described in RFC 4103 [RFC4103].

   It is not certain that this change can be regarded as an update to
   RFC 4103.  It may need a new media subtype.

4.1.1.5.  RTP Mixer with multiple primary data in each packet and
          individual sequence numbers

   This method allows primary as well as redundant text from more than
   one source per packet.  The packet payload contains an ordered set of
   redundant and primary data with the same number of generations of
   redundancy as once agreed in the SDP negotiation.  The data header
   reflects these parts of the payload.  The CSRC list contains one CSRC
   member per source in the payload and in the same order.  An
   individual sequence number per source is included in the data header,
   replacing the t140 payload type number that is instead assumed to be
   constant in this format.  This allows an individual extra sequence
   number per source with maximum value 127, suitable for checking for
   which source loss of text appeared when recovery was not possible.

   The data header would contain the following fields:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F| Source-seq  |      timestamp offset     |   block length    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Where "Source-seq" is the sequence number per source.

   The maximum number of members in the CSRC-list is 15, and that is
   therefore the maximum number of sources that can be represented in
   each packet provided that all data can be fitted into the size
   allowable in one packet.
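
   The 32-bit data header shown above can be packed and parsed as in
   the following Python sketch.  The field widths are read from the
   diagram (1, 7, 14 and 10 bits).  The function names are
   hypothetical, and two points are assumptions not stated in the text:
   that the F bit keeps its RFC 2198 meaning of "another header
   follows", and that the per-source sequence number wraps from 127
   back to 0.

```python
import struct


def pack_data_header(more, source_seq, ts_offset, block_len):
    """Pack F, Source-seq, timestamp offset and block length into 32 bits."""
    if not (0 <= source_seq <= 127 and 0 <= ts_offset < 1 << 14
            and 0 <= block_len < 1 << 10):
        raise ValueError("field out of range")
    word = (int(more) << 31) | (source_seq << 24) | (ts_offset << 10) | block_len
    return struct.pack("!I", word)


def unpack_data_header(data):
    """Return (F, source_seq, ts_offset, block_len) from the first 4 octets."""
    (word,) = struct.unpack("!I", data[:4])
    return (bool(word >> 31), (word >> 24) & 0x7F,
            (word >> 10) & 0x3FFF, word & 0x3FF)


def next_source_seq(seq):
    """Advance the per-source sequence number (assumed to wrap after 127)."""
    return (seq + 1) % 128


def lost_count(last_seen, received):
    """Number of text blocks from one source missing between two headers."""
    return (received - last_seen - 1) % 128


fields = unpack_data_header(pack_data_header(True, 127, 300, 5))
```

   A receiver that keeps the last seen Source-seq per CSRC can use a
   check like lost_count() to decide for which source a missing text
   marker is needed when recovery from redundancy was not possible.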

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".

   This is an SDP offer example for both traditional "text/red" and
   multi-party "text/rex" format:

      m=text 11000 RTP/AVP 101 100 98
      a=rtpmap:98 t140/1000
      a=rtpmap:100 red/1000
      a=rtpmap:101 rex/1000
      a=fmtp:100 98/98/98
      a=fmtp:101 98/98/98

   Pros:

   The source switching performance is good.  Text from 15 participants
   can be transmitted simultaneously.

   New text from 15 simultaneous sources can be transmitted within 300
   milliseconds.  This is good performance.

   When more consecutive packets are lost than the number of
   generations of redundant data, it is still possible to deduce the
   sources of the totally lost data when the next text from these
   sources arrives.

   Cons:

   The format of each packet is different from what is specified in RFC
   4103 [RFC4103].

   The processing time in standards organisations will be long.

   A new media subtype is needed, causing a somewhat complex
   negotiation.

   The recovery procedure is somewhat complex.

4.1.1.6.  RTP Mixer with multiple primary data in each packet

   This method allows primary as well as redundant text from more than
   one source per packet.  The packet payload contains an ordered set of
   redundant and primary data with the same number of generations of
   redundancy as once agreed in the SDP negotiation.  The data header
   reflects these parts of the payload.  The CSRC list contains one CSRC
   member per source in the payload and in the same order.

   The maximum number of members in the CSRC-list is 15, and that is
   therefore the maximum number of sources that can be represented in
   each packet provided that all data can be fitted into the size
   allowable in one packet.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".

   SDP would be the same as in Section 4.1.1.5.

   Pros:

   The source switching performance is good.  Text from 15 participants
   can be transmitted simultaneously.

   New text from 15 simultaneous sources can be transmitted within 150
   milliseconds.  This is good performance.

   Cons:

   The format of each packet is different from what is specified in RFC
   4103 [RFC4103].

   A new media subtype is needed, causing a somewhat complex
   negotiation.

   The processing time in standards organisations will be long.

   The recovery procedure is somewhat complex [RFC4103].

   When more consecutive packets are lost than the number of
   generations of redundant data, it is not possible to deduce the
   sources of the totally lost data.

4.1.1.7.  RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy in the
          packets

   This method allows primary data from one source and redundant text
   from other sources in each packet.  The packet payload contains
   primary data in "text/t140" format, and redundant data in RFC 5109
   FEC [RFC5109] format called "text/ulpfec".  That means that the
   redundant data contains the sequence number and the CSRC and other
   characteristics from the RTP header when the data was sent as
   primary.  The redundancy can be sent a selected number of packets
   after it was sent as primary, in order to improve the protection
   against bursty packet loss.  The redundancy level is recommended to
   be the same as in original RFC 4103.

   RFC 4103 says that the protection against loss can be made by other
   methods than plain redundancy, so this method is in line with that
   statement.
671 Transmission is done as soon as there is new text available, but not 672 with a shorter interval than 100 ms and not longer than 300 ms while 673 there is anything to send (new or redundant text). 675 When more consecutive packets are lost than the number of generations of 676 redundant data, it is not possible to deduce the sources of 677 the totally lost data. 679 The SDP can indicate the format as "text/red" with "text/ulpfec" 680 redundant data in this way, with traditional RFC 4103 "text/red" 681 with "text/t140" as redundant data as a fallback: 683 m=text 49170 RTP/AVP 98 101 100 102 684 a=rtpmap:98 red/1000 685 a=fmtp:98 100/102/102 686 a=rtpmap:102 ulpfec/1000 687 a=rtpmap:100 t140/1000 688 a=rtpmap:101 red/1000 689 a=fmtp:101 100/100/100 690 a=fmtp:100 cps=200 692 The "text/ulpfec" format includes an indication of how far back the 693 redundancy belongs, making it possible to cover bursty packet loss 694 better than the other formats with short transmission intervals. For 695 real-time text, it is recommended to send three packets between the 696 primary and the redundant transmissions of text. That makes the 697 transmission cover between 500 and 1500 ms of bursty packet loss. 698 The variation is because of the varying packet interval between many 699 and one simultaneously transmitting source. 701 The "text/ulpfec" format has a number of parameters. One is the 702 length of the data to be protected, which in this case must be the 703 whole t140block. 705 Pros: 707 The source switching performance is good. Text from 5 participants 708 can be transmitted within 500 ms. 710 Good recovery from bursty packet loss. 712 The method is based on existing standards. No new registrations are 713 needed. 715 Cons: 717 When more consecutive packets are lost than the number of generations of 718 redundant data, it is not possible to deduce the sources of 719 the totally lost data.
721 Even if the switching performance is good, it is not as good as for 722 the method called "RTP Mixer with multiple primary data in each 723 packet" (Section 4.1.1.6). With more than 5 simultaneously sending 724 sources, there will be a noticeable delay of text of over 500 ms, 725 with 100 ms added per simultaneous source. This is however beyond 726 the requirements and would be a concern only in congestion 727 situations. 729 The recovery procedure is somewhat complex [RFC5109]. 731 There is more overhead in terms of extra data and extra packets sent 732 than in the other methods. With the recommended two redundant 733 generations of data, each packet will be 36 bytes longer than with 734 traditional RFC 4103, and at each pause in transmission five extra 735 packets with only redundant data will be sent compared to two extra 736 packets for the traditional RFC 4103 case. 738 4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy and 739 separate sequence number in the packets 741 This method allows primary data from one source and redundant text 742 from other sources in each packet. The packet payload contains 743 primary data in a new "text/t140e" format, and redundant data in RFC 744 5109 FEC [RFC5109] format called "text/ulpfec". That means that the 745 redundant data contains the sequence number and the CSRC and other 746 characteristics from the RTP header when the data was sent as 747 primary. The redundancy can be sent a selected number of packets 748 after the data was sent as primary, in order to improve the protection 749 against bursty packet loss. The redundancy level is recommended to 750 be the same as in original RFC 4103. The "text/t140e" format 751 contains a source-specific sequence number and the t140block. 753 RFC 4103 says that the protection against loss can be made by other 754 methods than plain redundancy, so this method is in line with that 755 statement.
757 Transmission is done as soon as there is new text available, but not 758 with a shorter interval than 100 ms and not longer than 300 ms while 759 there is anything to send (new or redundant text). 761 When more consecutive packets are lost than the number of generations of 762 redundant data, it is possible to deduce which sources lost 763 data when new data arrives from the sources. This is done by 764 monitoring the received source-specific sequence numbers preceding 765 the text. 767 This is an example of how SDP can indicate the format as "text/red" with 768 "text/t140e" as primary and "text/ulpfec" redundant data, with 769 traditional RFC 4103 "text/red" with "text/t140" as redundant 770 data as a fallback: 772 m=text 49170 RTP/AVP 98 101 100 102 103 773 a=rtpmap:98 red/1000 774 a=fmtp:98 100/102/102 775 a=rtpmap:102 ulpfec/1000 776 a=rtpmap:103 t140/1000 777 a=rtpmap:100 t140e/1000 778 a=rtpmap:101 red/1000 779 a=fmtp:101 103/103/103 780 a=fmtp:100 cps=200 782 The "text/ulpfec" format includes an indication of how far back the 783 redundancy belongs, making it possible to cover bursty packet loss 784 better than the other formats with short transmission intervals. For 785 real-time text, it is recommended to send three packets between the 786 primary and the redundant transmissions of text. That makes the 787 transmission cover between 500 and 1500 ms of bursty packet loss. 788 The variation is because of the varying packet interval between many 789 and one simultaneously transmitting source. 791 The "text/ulpfec" format has a number of parameters. One is the 792 length of the data to be protected, which in this case must be the 793 whole t140block. 795 Pros: 797 The source switching performance is good. Text from 5 participants 798 can be transmitted within 500 ms. 800 Good recovery from bursty packet loss. 802 The method is based on an existing standard for FEC.
804 When more consecutive packets are lost than the number of generations of 805 redundant data, it is possible to deduce the source of the 806 lost data when new text arrives from the source. 808 Cons: 810 Even if the switching performance is good, it is not as good as for 811 the method called "RTP Mixer with multiple primary data in each 812 packet" (Section 4.1.1.6). With more than 5 simultaneously sending 813 sources, there will be a noticeable delay of text of over 500 ms, 814 with 100 ms added per simultaneous source. This is however beyond 815 the requirements and would be a concern only in congestion 816 situations. 818 The recovery procedure is somewhat complex [RFC5109]. 820 There is more overhead in terms of extra data and extra packets sent 821 than in the other methods. With the recommended two redundant 822 generations of data, each packet will be 40 bytes longer than with 823 traditional RFC 4103, and at each pause in transmission five extra 824 packets with only redundant data will be sent compared to two extra 825 packets for the traditional RFC 4103 case. 827 A new text media subtype "text/t140e" needs to be registered. 829 The processing time in the standards organisation will be long. 831 4.1.1.9. RTP Mixer indicating participants by a control code in the 832 stream 834 Text from all participants except the receiving one is transmitted 835 from the media mixer in the same RTP session and stream, thus all 836 using the same destination address/port combination, the same RTP 837 SSRC and one sequence number series as described in Sections 7.1 and 838 7.3 of RTP RFC 3550 [RFC3550] about the Mixer function. The sources 839 of the text in each RTP packet are identified by a newly defined T.140 840 control code "c" followed by a unique identification of the source in 841 UTF-8 string format. 843 The receiver can use the string for presenting the source of text.
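The inline source identification can be sketched in Python as follows, using the framing "SOS c n ST" that this section specifies further on. The code points U+0098 for SOS and U+009C for ST are the ISO 6429 C1 values that T.140 is understood to use; treat them, the absence of separators between the elements, and the function names source_label and split_by_source as assumptions of this sketch rather than as text of an agreed T.140 extension.

```python
SOS = "\u0098"  # Start Of String (assumed T.140 / ISO 6429 code point)
ST = "\u009c"   # String Terminator (assumed T.140 / ISO 6429 code point)


def source_label(name):
    """Build the hypothetical control sequence SOS c n ST for source n."""
    return f"{SOS}c{name}{ST}"


def split_by_source(stream, initial=None):
    """Split received text into (source, text) runs.

    Text before the first label is attributed to `initial`.
    Control sequences with other function codes are skipped.
    An unterminated sequence at the end is left unprocessed.
    """
    runs, buf, source = [], [], initial
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == SOS:
            end = stream.find(ST, i + 1)
            if end == -1:
                break  # incomplete sequence; a real receiver would wait
            body = stream[i + 1:end]
            if body.startswith("c"):
                if buf:
                    runs.append((source, "".join(buf)))
                    buf = []
                source = body[1:]
            i = end + 1
            continue
        buf.append(ch)
        i += 1
    if buf:
        runs.append((source, "".join(buf)))
    return runs
```

A mixer would emit source_label("Alice") before text from the first source and source_label("Bob") when switching, and a receiver applying split_by_source to the concatenated text would get one run per source. The sketch does not address the loss case noted in the cons of this section, where a lost label leads to text being attributed to the wrong source.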
844 This method is on the RTP level described in RFC 7667, section 3.6.1 845 Media mixing mixer [RFC7667]. 847 The inline coding of the source of text is applied in the data stream 848 itself, and an RTP mixer function is used for coordinating the 849 sources of text into one RTP stream. 851 Information uniquely identifying each user in the multi-party session 852 is placed as the parameter value "n" in the T.140 application 853 protocol function with the function code "c". The identifier shall 854 thus be formatted like this: SOS c n ST, where SOS and ST are coded 855 as specified in ITU-T T.140 [T140]. The "c" is the letter "c". The 856 n parameter value is a string uniquely identifying the source. This 857 parameter shall be kept short so that it can be repeated in the 858 transmission without concerns for network load. 860 A receiving endpoint is supposed to separate text items from the 861 different sources and identify and display them accordingly. 863 The conference server needs to be allowed to decrypt/encrypt the 864 packet payload in order to check the source and repack the text. 866 Pros: 868 If loss of packets occurs, it is possible to recover text from 869 redundancy at loss of up to the number of redundancy levels carried 870 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 871 levels). 873 This method can be implemented with most RTP implementations. 875 The method can also be used with other transports than RTP. 877 Cons: 879 The method implies a moderate load by the need to insert the source 880 often in the stream. 882 If more consecutive packets are lost than the number of generations of 883 redundant data, it is not possible to deduce the source of 884 the totally lost data. 886 The mixer needs to be able to generate unique source 887 identifications which are suitable as labels for the sources. 889 Requires an extension of the ITU-T T.140 standard, best made by the 890 ITU.
892 There is a risk that the control code indicating the change of source 893 is lost and the result is false source indication of text. 895 The conference server needs to be allowed to decrypt/encrypt the 896 packet payload. 898 4.1.1.10. Mixing for multi-party unaware user agents 900 Multi-party real-time text contents can be transmitted to multi-party 901 unaware user agents if source labelling and formatting of the text is 902 performed by a mixer. This method has the limitations that the 903 layout of the presentation and the format of source identification are 904 purely controlled by the mixer, and that only one source at a time is 905 allowed to present in real-time. Text from other sources needs to be stored 906 temporarily waiting for an appropriate moment to switch the source of 907 transmitted text. The mixer controls the switching of sources and 908 inserts a source identifier in text format at the beginning of text 909 after a switch of source. The logic of the mixer to detect when a 910 switch is appropriate should detect a number of places in text where 911 a switch can be allowed, including new line, end of sentence, end of 912 phrase, a period of inactivity, and a word separator after a long 913 time of active transmission. 915 This method MAY be used when no support for multi-party awareness is 916 detected in the receiving endpoint. The base for this method is 917 described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667]. 919 See [I-D.ietf-avtcore-multi-party-rtt-mix] for a procedure for mixing 920 RTT for a conference-unaware endpoint. 922 Pros: 924 Can be transmitted to conference-unaware endpoints. 926 Can be used with other transports than RTP. 928 Cons: 930 Does not allow full real-time presentation of more than one source at 931 a time. Text from other sources will be delayed.
933 The only realistic presentation format is a style with the text from 934 the different sources presented with a text label indicating source, 935 and the text collected in a chat style presentation but with more 936 frequent turn-taking. 938 Endpoints often have their own system for adding labels to the RTT 939 presentation. In that case there will be two levels of labels in the 940 presentation, one for the mixer and one for the sources. 942 If more packets are lost than can be recovered by the redundancy, 943 it is not possible to detect which source was affected by the 944 loss. It is also possible that a source switch occurred during the 945 loss, and therefore a false indication of the source of text can be 946 provided to the user after such loss. 948 Because of all these cons, this method is not recommended and should 949 not be used as the main method, but only as a fallback and last resort 950 for backwards interoperability with multi-party unaware endpoints. 952 The conference server needs to be allowed to decrypt/encrypt the 953 packet payload. 955 4.1.2. RTP-based bridging with minor RTT media contents reformatting by 956 the bridge 958 It may be desirable to send text in a multi-party setting in a way 959 that allows the text stream contents to be distributed without being 960 dealt with in detail in any central server. A number of such methods 961 are described. However, when writing this specification, none of 962 these methods has a specified way of establishing the session by 963 SDP. 965 4.1.2.1.
RTP Translator sending one RTT stream per participant 967 Within the RTP session, text from each participant is transmitted 968 from the RTP media translator (bridge) in a separate RTP stream, thus 969 using the same destination address/port combination, the same payload 970 type number (PT) but separate RTP SSRC parameters and sequence number 971 series as described in Section 7.1 and 7.2 of RTP RFC 3550 [RFC3550] 972 about the Translator function. The source of the text in each RTP 973 packet is identified by the SSRC parameter in the RTP packets, 974 containing the SSRC of the initial source of text. 976 A receiving and presenting endpoint is supposed to separate text 977 items from the different sources and identify and display them in a 978 suitable way. 980 This method is described in RFC 7667, section 3.5.1 Relay-transport 981 translator or 3.5.2 Media translator [RFC7667]. 983 The identification of the source is made through the SSRC. The 984 translation to a readable label can be done by mapping to information 985 from the RTCP SDES CNAME and NAME packets as described in 986 RTP[RFC3550], and also through information in the text media member 987 in the conference notification described in RFC 4575 [RFC4575]. 989 The sdp exchange for establishing this mixing type can be equal to 990 what is used for basic two-party use of RFC 4103 with just an added 991 attribute for indicating multi-party capability. 993 m=text 49170 RTP/AVP 98 103 994 a=rtpmap:98 red/1000 995 a=fmtp:98 103/103/103 996 a=rtpmap:103 t140/1000 997 a=fmtp:103 cps=150 998 a=RTT-mix:RTP-translator 1000 A similar answer including the same RTT-mix attribute would indicate 1001 that multi-party coding can begin. An answer without the same RTT- 1002 mix attribute could result in diversion to use of the mixing method 1003 for multi-party unaware endpoints Section 4.1.1.10 if more than two 1004 parties are involved in the session. 
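The capability check in the offer/answer exchange above can be sketched in Python. The sketch reads the multi-party attribute from the first text media section of an offer and an answer and falls back when no common value is found. Names are compared case-insensitively because this document spells the attribute both "RTT-mix" and "rtt-mix"; that choice, the function names and the fallback label "multi-party-unaware" are assumptions of this sketch, since the attribute is not a registered SDP attribute.

```python
def rtt_mix_values(sdp):
    """Return the set of rtt-mix values in the first text media section."""
    values = set()
    in_text = False
    seen_text = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if seen_text:
                break  # only the first text section is examined here
            in_text = line.startswith("m=text ")
            seen_text = in_text
        elif in_text and line.lower().startswith("a=rtt-mix:"):
            values.add(line.split(":", 1)[1].strip().lower())
    return values


def select_method(offer, answer, fallback="multi-party-unaware"):
    """Pick a method present in both offer and answer, else the fallback."""
    common = rtt_mix_values(offer) & rtt_mix_values(answer)
    return sorted(common)[0] if common else fallback


OFFER = """m=text 49170 RTP/AVP 98 103
a=rtpmap:98 red/1000
a=fmtp:98 103/103/103
a=rtpmap:103 t140/1000
a=fmtp:103 cps=150
a=RTT-mix:RTP-translator
"""
```

Calling select_method(OFFER, OFFER) returns "rtp-translator", while an answer that lacks the attribute leads to the fallback value, corresponding to the diversion to mixing for multi-party unaware endpoints described above.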
1006 The bridge can add new sources in the communication to a participant 1007 by first sending a conference notification according to RFC 4575 1008 [RFC4575] with the SSRC of the new source included in the 1009 corresponding "text" media member, or by sending an RTCP message with 1010 the new SSRC in an SDES packet. 1012 A receiver should be prepared to receive such indications of new 1013 streams being added to the multi-party session, so that the new SSRC 1014 is not taken for a change in SSRC value for an already established 1015 RTP stream. 1017 Transmission, reception, packet loss recovery and text loss 1018 indication are performed per source in the separate RTP streams in the 1019 same way as in two-party sessions with RFC 4103 [RFC4103]. 1021 Text is recommended to be sent by the bridge as soon as it is 1022 available for transmission, but not less than 250 ms after a previous 1023 transmission. This will in many cases result in close to 0 added 1024 delay by the bridge, because most RTT senders use a 300 ms 1025 transmission interval. 1027 It is sometimes said that this configuration is not supported by 1028 current media declarations in SDP. RFC 3264 [RFC3264] specifies in 1029 some places that one media description is supposed to describe just 1030 one RTP media stream. However this is not directly referencing an 1031 RTP stream, and use of multiple RTP streams in the same RTP session 1032 is recommended in many other RFCs. 1034 This confusion is clarified in RFC 5576 [RFC5576] section 3 by the 1035 following statements: 1037 "The term "media stream" does not appear in the SDP specification 1038 itself, but is used by a number of SDP extensions, for instance, 1039 Interactive Connectivity Establishment (ICE) [ICE], to denote the 1040 object described by an SDP media description.
This term is 1041 unfortunately rather confusing, as the RTP specification [RFC3550] 1042 uses the term "media stream" to refer to an individual media source 1043 or RTP packet stream, identified by an SSRC, whereas an SDP media 1044 stream describes an entire RTP session, which can contain any number 1045 of RTP sources." 1047 In most cases, it will be sufficient that new sources are introduced 1048 with a conference notification or RTCP message. However, RFC 5576 1049 [RFC5576] specifies attributes which may be used to more explicitly 1050 announce new sources or restart of earlier established RTP streams. 1052 This method is encouraged by draft-ietf-avtcore-multiplex-guidelines 1053 [I-D.ietf-avtcore-multiplex-guidelines] section 5.2. 1055 Normal operation will be that the bridge receives text packets from 1056 the source and handles any text recovery and indication of loss 1057 needed before queueing the resulting clean text for transmission from 1058 the bridge to the receivers. 1060 It may however also be possible for the bridge to just convey the 1061 packet contents as received from the sources, with minor adjustments, 1062 and let the receiving endpoint handle all aspects of recovery and 1063 indication of loss, even for the source to bridge path. In that case 1064 also the sequence number must be maintained as it was at reception in 1065 the bridge. This mode needs further study before application. 1067 Pros: 1069 This method is the natural way to do multi-party bridging with RFC 1070 4103 based RTT. Only a small addition is included in the session 1071 establishment to verify capability by the parties because many 1072 implementations are done without multi-party capability. 1074 This method has moderate overhead in terms of work for the mixer, but 1075 high in terms of packet transmission rate. Five sources sending 1076 simultaneously cause the bridge to send 15 packets per second to each 1077 receiver. 
1079 When loss of packets occurs, it is possible to recover text from 1080 redundancy at loss of up to the number of redundancy levels carried 1081 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1082 levels). 1084 More loss than what can be recovered can be detected and the marker 1085 for text loss can be inserted in the correct stream. 1087 It may be possible in some scenarios to keep the text encrypted 1088 through the Translator. 1090 Minimal delay. The delay can often be kept close to 0 with at least 1091 5 simultaneous sending participants. 1093 Cons: 1095 There are RTP implementations not supporting the Translator model. 1096 They will need to use the fall-back to multi-party-unaware mixing. 1097 How common this is needs to be investigated before the method 1098 is used. 1100 The processing time in the standards organisation will be long. 1102 With many simultaneous sending sources, the total rate of packets 1103 will be high, and can cause congestion. The requirement to handle 3 1104 simultaneous sources in this specification will cause 10 packets per 1105 second, which is manageable in most cases, e.g. considering that audio 1106 usually uses 50 packets per second. 1108 4.1.2.2. Distributing packets in an end-to-end encryption structure 1110 In order to achieve end-to-end encryption, it is possible to let the 1111 packets from the sources just pass through a central distributor, and 1112 handle the security agreements between the participants. 1113 Specifications exist for a framework with this functionality for 1114 application on RTP based conferences in 1115 [I-D.ietf-perc-private-media-framework]. The RTP flow and mixing 1116 characteristics have similarities with the method described under "RTP 1117 Translator sending one RTT stream per participant" above. RFC 4103 1118 RTP streams [RFC4103] would fit into the structure and it would 1119 provide a base for end-to-end encrypted RTT multi-party conferencing.
1121 Pros: 1123 Good security. 1125 Straightforward multi-party handling. 1127 Cons: 1129 Does not operate under the usual SIP central conferencing 1130 architecture. 1132 Requires the participants to perform a lot of key handling. 1134 Is work in progress when this is written. 1136 4.1.2.3. Mesh of RTP endpoints 1138 Text from all participants is transmitted directly to all others in 1139 one RTP session, without a central bridge. The sources of the text 1140 in each RTP packet are identified by the source network address and 1141 the SSRC. 1143 This method is described in RFC 7667, section 3.4 Point to multi- 1144 point using mesh [RFC7667]. 1146 Pros: 1148 When loss of packets occurs, it is possible to recover text from 1149 redundancy at loss of up to the number of redundancy levels carried 1150 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1151 levels). 1153 This method can be implemented with most RTP implementations. 1155 Transmitted text can also be used with other transports than RTP. 1157 Cons: 1159 This model is not described in IMS, NENA and EENA specifications, and 1160 does therefore not meet the requirements. 1162 Requires a drastically increasing number of connections when the 1163 number of participants increases. 1165 4.1.2.4. Multiple RTP sessions, one for each participant 1167 Text from all participants is transmitted directly to all others in 1168 one RTP session each, without a central bridge. Each session is 1169 established with a separate media description in SDP. The sources of 1170 the text in each RTP packet are identified by the source network 1171 address and the SSRC. 1173 Pros: 1175 When loss of packets occurs, it is possible to recover text from 1176 redundancy at loss of up to the number of redundancy levels carried 1177 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1178 levels). 1180 Complete loss of text can be indicated in the received stream.
1182 This method can be implemented with most RTP implementations. 1184 End-to-end encryption is achievable. 1186 Cons: 1188 This method is not described in IMS, NENA and ETSI specifications and 1189 does therefore not meet the requirements. 1191 A lot of network resources are spent on setting up separate sessions 1192 for each participant. 1194 5. Preferred RTP-based multi-party RTT transport method 1196 For RTP transport of RTT using RTP-mixer technology, one method for 1197 multi-party mixing and transport stands out as fulfilling the goals 1198 best and is therefore recommended. That is: "RTP Mixer using the 1199 default method but decreased transmission interval" (Section 4.1.1.2). 1201 For RTP transport in separate streams or sessions, no current 1202 recommendation can be made. A bridging method in the process of 1203 standardisation with interesting characteristics is the end-to-end 1204 encryption model "perc" (Section 4.1.2.2). 1206 6. Session control of RTP-based multi-party RTT sessions 1208 General session control aspects for multi-party sessions are 1209 described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP) 1210 Event Package for Conference State, and RFC 4579 [RFC4579] Session 1211 Initiation Protocol (SIP) Call Control - Conferencing for User 1212 Agents. The nomenclature of these specifications is used here. 1214 The procedures for a multi-party aware model for RTT-transmission 1215 shall only be applied if a capability exchange for multi-party aware 1216 real-time text transmission has been completed and a supported method 1217 for multi-party real-time text transmission can be negotiated. 1219 A method for detection of conference-awareness for centralized SIP 1220 conferencing in general is specified in RFC 4579 [RFC4579]. The 1221 focus sends the "isfocus" feature tag in a SIP Contact header. This 1222 causes the conference-aware endpoint to subscribe to conference 1223 notifications from the focus.
The focus then sends notifications to 1224 the endpoint about entering and leaving conference participants 1225 and their media capabilities. The information is carried XML- 1226 formatted in a 'conference-info' block in the notification according 1227 to RFC 4575 [RFC4575]. The mechanism is described in detail in RFC 1228 4575 [RFC4575]. 1230 Before a conference media server starts sending multi-party RTT to an 1231 endpoint, a verification of its ability to handle multi-party RTT 1232 must be made. A decision on which mechanism to use for identifying 1233 text from the different participants must also be taken, implicitly 1234 or explicitly. These verifications and decisions can be done in a 1235 number of ways. The most apparent ways are specified here and their 1236 pros and cons described. One of the methods is selected to be the 1237 one to be used by implementations of the centralized conference model 1238 according to this specification. 1240 6.1. Implicit RTT multi-party capability indication 1242 Capability for RTT multi-party handling can be decided to be 1243 implicitly indicated by session control items. 1245 The focus may implicitly indicate multi-party RTT capability by 1246 including the media child with value "text" in the RFC 4575 [RFC4575] 1247 conference-info provided in conference notifications. 1249 An endpoint may implicitly indicate multi-party RTT capability by 1250 including the text media in the SDP in the session control 1251 transactions with the conference focus after the subscription to the 1252 conference has taken place. 1254 The implicit RTT capability indication means for the focus that it 1255 can handle multi-party RTT according to the preferred method 1256 indicated in the RTT multi-party methods section above. 1258 The implicit RTT capability indication means for the endpoint that it 1259 can handle multi-party RTT according to the preferred method 1260 indicated in the RTT multi-party methods section above.
1262 If the focus detects that an endpoint implicitly declared RTT multi- 1263 party capability, it SHALL provide RTT according to the preferred 1264 method. 1266 If the focus detects that the endpoint does not indicate any RTT 1267 multi-party capability, then it shall either provide RTT multi-party 1268 text in the way specified for conference-unaware endpoint above, or 1269 refuse to set up the session. 1271 If the endpoint detects that the focus has implicitly declared RTT 1272 multi-party capability, it shall be prepared to present RTT in a 1273 multi-party fashion according to the preferred method. 1275 Pros: 1277 Acceptance of implicit multi-party capability implies that no 1278 standardisation of explicit RTT multi-party capability exchange is 1279 required. 1281 Cons: 1283 If other methods for multi-party RTT are to be used in the same 1284 implementation environment as the preferred ones, then capability 1285 exchange needs to be defined for them. 1287 Cannot be used outside a strictly applied SIP central conference 1288 model. 1290 6.2. RTT multi-party capability declared by SIP media-tags 1292 Specifications for RTT multi-party capability declarations can be 1293 agreed for use as SIP media feature tags, to be exchanged during SIP 1294 call control operation according to the mechanisms in RFC 3840 1295 [RFC3840] and RFC 3841 [RFC3841]. Capability for the RTT Multi-party 1296 capability is then indicated by the media feature tag "rtt-mix", with 1297 a set of possible values for the different possible methods. 1299 The possible values in the list may for example be: 1301 rtp-mixer 1303 perc 1305 rtp-mixer indicates capability for using the RTP-mixer based 1306 presentation of multi-party text. 1308 perc indicates capability for using the perc based transmission of 1309 multi-party text. 
1311 Example: Contact: 1313 ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL" 1314 ;+sip.rtt-mix="rtp-mixer" 1316 If, after evaluation of the alternatives in this specification, only 1317 one mixing method is selected to be brought to implementation, then 1318 the media tag can be reduced to a single tag with no list of values. 1320 An offer-answer exchange should take place and the common method 1321 selected by the answering party shall be used in the session with 1322 that UA. 1324 When no common method is declared, then only the fallback method for 1325 multi-party unaware participants can be used, or the session dropped. 1327 If more than one text media section is included in SDP, all must be 1328 capable of using the declared RTT multi-party method. 1330 Pros: 1332 Provides a clear decision method. 1334 Can be extended with new mixing methods. 1336 Can guide call routing to a suitable capable focus. 1338 Cons: 1340 Requires standardization and IANA registration. 1342 Is not stream specific. If more than one text stream is specified, 1343 all must have the same type of multi-party capability. 1345 Cannot be used in the WebRTC environment. 1347 6.3. SDP media attribute for RTT multi-party capability indication 1349 An attribute can be specified on media level, to be used in text 1350 media SDP declarations for negotiating RTT multi-party capabilities. 1351 The attribute can have the name "rtt-mix". 1353 More than one attribute can be included in one media description. 1355 The attribute can have a value. The value can for example be: 1357 rtp-mixer 1359 rtp-translator 1360 perc 1362 rtp-mixer indicates capability for using the RTP-mixer and CSRC-list 1363 based mixing of multi-party text. 1365 rtp-translator indicates capability for using the RTP-translator 1366 based mixing 1368 perc indicates capability for using the perc based transmission of 1369 multi-party text. 
1371 An offer-answer exchange should take place and the common method 1372 selected by the answering party shall be used in the session with 1373 that endpoint. 1375 When no common method is declared, then only the fallback method for 1376 multi-party unaware endpoints can be used. 1378 Example: a=rtt-mix:rtp-mixer 1380 If, after evaluation of the alternatives in this specification, only 1381 one mixing method is selected to be brought to implementation, then 1382 the attribute can be reduced to a single attribute with no list of 1383 values. 1385 Pros: 1387 Provides a clear decision method. 1389 Can be extended with new mixing methods. 1391 Can be used on specific text media. 1393 Can be used also for SDP-controlled WebRTC sessions with multiple 1394 streams in the same data channel. 1396 Cons: 1398 Requires standardization and IANA registration. 1400 Cannot guide SIP routing. 1402 6.4. Simplified SDP media attribute for RTT multi-party capability 1403 indication 1405 An attribute can be specified on media level, to be used in text 1406 media SDP declarations for negotiating RTT multi-party capabilities. 1407 The attribute can have a name suitable for the selected method and no 1408 value. It would be selected and used if only one method for multi- 1409 party rtt is brought forward from this specification, and the other 1410 suppressed or found to be possible to negotiate in another way. 1412 An offer-answer exchange should take place and if both parties 1413 specify rtt-mixing capability with the same attribute, the selected 1414 mixing method shall be used. 1416 When no common method is declared, then only the fallback method for 1417 multi-party unaware endpoints can be used, or the session not 1418 accepted for multi-party use. 1420 Example: a=rtt-mix-rtp-mixer 1422 Pros: 1424 Provides a clear decision method. 1426 Very simple syntax and semantics. 1428 Can be used on specific text media. 1430 Cons: 1432 Requires standardization and IANA registration. 
1434 If another RTT mixing method is also specified in the future, then 1435 that method may also need to specify and register its own attribute, 1436 whereas an attribute with a parameter value would only need the 1437 addition of a new possible value. 1439 Cannot guide SIP routing. 1441 6.5. SDP format parameter for RTT multi-party capability indication 1443 An FMTP format parameter can be specified for the RFC 4103 1444 [RFC4103] media, to be used in text media SDP declarations for 1445 negotiating RTT multi-party capabilities. The parameter can have the 1446 name "rtt-mix", with one or more of its possible values. 1448 The possible values in the list are: 1450 rtp-mixer 1452 perc 1454 rtp-mixer indicates capability for using the RTP-mixer based mixing 1455 and presentation of multi-party text using the CSRC-list. 1457 perc indicates capability for using the perc based transmission of 1458 multi-party text. 1460 Example: a=fmtp:96 98/98/98 rtt-mix=rtp-mixer 1462 If, after evaluation of the alternatives in this specification, only 1463 one mixing method is selected to be brought to implementation, then 1464 the parameter can be reduced to a single parameter with no list of 1465 values. 1467 An offer-answer exchange should take place and the common method 1468 selected by the answering party shall be used in the session with 1469 that UA. 1471 When no common method is declared, then only the fallback method can 1472 be used, or the session is denied. 1474 Pros: 1476 Provides a clear decision method. 1478 Can be extended with new mixing methods. 1480 Can be used on specific text media. 1482 Can be used also for SDP-controlled WebRTC sessions with multiple 1483 streams in the same data channel. 1485 Cons: 1487 Requires standardization and IANA registration. 1489 May cause interop problems with current RFC4103 [RFC4103] 1490 implementations not expecting a new fmtp-parameter. 1492 Cannot guide SIP routing. 1494 6.6.
A text media subtype for support of multi-party rtt 1496 Indicating a specific text media subtype in SDP is a straightforward 1497 way of negotiating multi-party capability. Especially if there are 1498 format differences from the "text/red" and "text/t140" formats of 1499 RFC4103 [RFC4103], then this is a natural way to do the negotiation 1500 for multi-party rtt. 1502 Pros: 1504 No extra efforts if a new format is needed anyway. 1506 Cons: 1508 None specific to using the format indication for negotiation of 1509 multi-party capability. But only feasible if a new format is needed 1510 anyway. 1512 6.7. Preferred capability declaration method for RTP-based transport. 1514 If the preferred transport method is one with a specific media 1515 subtype in SDP, then specification by media subtype is preferred. 1517 Otherwise, the preferred capability 1518 declaration method is the one with a specific SDP attribute for 1519 the selected mixing method (Section 6.4), because it is straightforward. 1521 6.8. Identification of the source of text for RTP-based solutions 1523 The main way to identify the source of text in the RTP based solution 1524 is by the SSRC of the sending participant. In the RTP-mixer 1525 solution, this SSRC is included in the CSRC list of the transmitted 1526 packets. Further identification that may be needed for better 1527 labelling of received text may be obtained from a number of sources, 1528 such as the RTCP SDES CNAME and NAME reports and the conference 1529 notification data (RFC 4575) [RFC4575]. 1531 As soon as a new member is added to the RTP session, its 1532 characteristics should be transmitted in RTCP SDES CNAME and NAME 1533 reports according to section 6.5 in RFC 3550 [RFC3550]. The 1534 information about the participant should also be included in the 1535 conference data including the text media member in a notification 1536 according to RFC 4575 [RFC4575].
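The identification by SSRC or CSRC can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names text_source_id and source_label are hypothetical, the SDES information is assumed to be already collected into a dictionary keyed by source identifier, and a mixed packet is assumed to carry text from one source only, so that the first CSRC entry identifies it. The header layout used is the fixed RTP header of RFC 3550 (12 octets followed by the CSRC list).

```python
import struct


def text_source_id(rtp_packet: bytes) -> int:
    """Return the identifier of the text source for one RTP packet.

    Without a CSRC list the SSRC is the source. With a CSRC list (RTP-mixer
    case) the first CSRC is used; this sketch assumes one source per packet.
    """
    if len(rtp_packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    first_octet, _marker_pt, _seq, _timestamp, ssrc = struct.unpack(
        "!BBHII", rtp_packet[:12]
    )
    csrc_count = first_octet & 0x0F  # CC field, low four bits of octet 0
    if csrc_count == 0:
        return ssrc
    if len(rtp_packet) < 12 + 4 * csrc_count:
        raise ValueError("truncated CSRC list")
    (first_csrc,) = struct.unpack("!I", rtp_packet[12:16])
    return first_csrc


def source_label(source_id: int, sdes_items: dict) -> str:
    """Compose a label: NAME if known, else CNAME, else the identifier."""
    items = sdes_items.get(source_id, {})
    return items.get("NAME") or items.get("CNAME") or f"source-{source_id:08x}"


# A packet sent directly by a participant, and one forwarded by a mixer.
direct = struct.pack("!BBHII", 0x80, 98, 1, 0, 0x11111111)
mixed = struct.pack("!BBHII", 0x81, 98, 2, 0, 0x99999999) + struct.pack(
    "!I", 0x22222222
)
known = {0x22222222: {"CNAME": "alice@example.com", "NAME": "Alice"}}
```

The fallback order in source_label is a design choice of the sketch; an application may equally compose labels from conference notification data.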
1538 The RTCP SDES report SHOULD contain identification of the source 1539 represented by the SSRC/CSRC identifier. This identification MUST 1540 contain the CNAME field and MAY contain the NAME field and other 1541 defined fields of the SDES report. 1543 A focus UA SHOULD primarily convey SDES information received from the 1544 sources of the session members. When such information is not 1545 available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME 1546 information from available information from the SIP session with the 1547 participant. 1549 Provision of detailed information in the NAME field has security 1550 implications, especially if provided without encryption. 1552 7. RTT bridging in WebRTC 1554 Within WebRTC, real-time text is specified to be carried in WebRTC 1555 data channels as specified in 1556 [I-D.ietf-mmusic-t140-usage-data-channel]. A few ways to handle 1557 multi-party RTT are mentioned briefly there. They are repeated below. 1559 7.1. RTT bridging in WebRTC with one data channel per source 1561 A straightforward way to handle multi-party RTT is for the bridge to 1562 open one T.140 data channel per source towards the receiving 1563 participants. 1565 The stream-id forms a unique stream identification. 1567 The identification of the source is made through the Label property 1568 of the channel, and session information belonging to the source. The 1569 endpoint can compose a readable label for the presentation from this 1570 information. 1572 Pros: 1574 This is a straightforward solution. 1576 The load per source is low. 1578 Cons: 1580 With a high number of participants, the overhead of establishing and 1581 maintaining the high number of data channels required may be high, 1582 even if the load per channel is low. 1584 7.2.
RTT bridging in WebRTC with one common data channel 1586 A way to handle multi-party RTT in WebRTC is for the bridge to combine 1587 text from all sources into one data channel and mark the sources in 1588 the stream by a T.140 control code for source. 1590 This method is described for RTP 1591 transmission in Section 4.1.1.9. 1593 The identification of the source is made through insertion, at the 1594 beginning of each text transmission from a source, of a control code 1595 extension "c" followed by a string representing the source, framed by 1596 the control code start and end flags SOS and ST (see ITU-T T.140 1597 [T140]). 1599 A receiving endpoint is supposed to separate text items from the 1600 different sources and identify and display them in a suitable way. 1602 The endpoint does not always display the source identification in the 1603 received text at the place where it is received, but has the 1604 information as a guide for planning the presentation of received 1605 text. A label corresponding to the source identification is 1606 presented when needed depending on the selected presentation style. 1608 Pros: 1610 This solution has relatively low overhead on session and network 1611 level. 1613 Cons: 1615 This solution has higher overhead on the media contents level than 1616 the solution with one data channel per source (Section 7.1). 1618 Standardisation of the new control code "c" in ITU-T T.140 [T140] is 1619 required. 1621 The conference server needs to be allowed to decrypt/encrypt the data 1622 channel contents. 1624 7.3. Preferred rtt multi-party method for WebRTC 1626 For WebRTC, one method is preferred because of its simplicity. So, 1627 for WebRTC, the method to implement for multi-party RTT with multi- 1628 party aware parties when no other method is explicitly agreed between 1629 implementing parties is: "RTT bridging in WebRTC with one data 1630 channel per source" (Section 7.1). 1632 8.
Presentation of multi-party text 1634 All session participants with RTP based transport MUST observe the 1635 SSRC/CSRC field of incoming text RTP packets, and make note of which 1636 source they came from in order to be able to present text in a way 1637 that makes it easy to read text from each participant in a session, 1638 and get information about the source of the text. 1640 In the WebRTC case, the Label parameter and other provided endpoint 1641 information should be used for the same purpose. 1643 8.1. Associating identities with text streams 1645 A source identity SHOULD be composed from available information 1646 sources and displayed together with the text as indicated in ITU-T 1647 T.140 Appendix[T140]. 1649 The source identity should primarily be the NAME field from incoming 1650 SDES packets. If this information is not available, and the session 1651 is a two-party session, then the T.140 source identity SHOULD be 1652 composed from the SIP session participant information. For multi- 1653 party sessions the source identity may be composed by local 1654 information if sufficient information is not available in the 1655 session. 1657 Applications may abbreviate the presented source identity to a 1658 suitable form for the available display. 1660 Applications may also replace received source information with 1661 internally used nicknames. 1663 8.2. Presentation details for multi-party aware endpoints. 1665 The multi-party aware endpoint should after any action for recovery 1666 of data from lost packets, separate the incoming streams and present 1667 them according to the style that the receiving application supports 1668 and the user has selected. The decisions taken for presentation of 1669 the multi-party interchange shall be purely on the receiving side. 1670 The sending application must not insert any item in the stream to 1671 influence presentation that is not requested by the sending 1672 participant. 1674 8.2.1. 
Bubble style presentation 1676 One often used style is to present real-time text in chunks in 1677 readable bubbles identified by labels containing names of sources. 1678 Bubbles are placed in one column in the presentation area and are 1679 closed and moved upwards in the presentation area after certain items 1680 or events, when there is also newer text from another source that 1681 would go into a new bubble. The text items that allow bubble 1682 closing are any character closing a phrase or sentence followed by a 1683 space, or a timeout of a suitable time (about 10 seconds). 1685 Real-time active text sent from the local user should be presented in 1686 a separate area. When there is a reason to close a bubble from the 1687 local user, the bubble should be placed above all real-time active 1688 bubbles, so that the time order in which real-time text entries were 1689 completed is visible. 1691 Scrolling is usually provided for viewing of recent or older text. 1692 When scrolling is done to an earlier point in the text, newly 1693 received text shall not move the scroll position. 1694 It must be the decision of the local user to return to automatic 1695 viewing of latest text actions. An indication that there is new text 1696 to read may be useful after scrolling to an earlier position 1697 has been activated. 1699 The presentation area may become too small to present all text in all 1700 real-time active bubbles. Various techniques can be applied to 1701 provide a good overview and good reading opportunity even in such 1702 situations. The active real-time bubble may have a limited number of 1703 lines and if their contents need more lines, then a scrolling 1704 opportunity within the real-time active bubble is provided.
Another 1705 method can be to only show the label and the last line of the active 1706 real-time bubble contents, and make it possible to expand or compress 1707 the bubble presentation between full view and one line view. 1709 Erasures require special consideration. Erasure within a real-time 1710 active bubble is straightforward. But if erasure from one 1711 participant affects the last character before a bubble, the whole 1712 previous bubble becomes the active bubble for real-time action by 1713 that participant and is placed below all other bubbles in the 1714 presentation area. If the border between bubbles was caused by the 1715 CRLF characters (instead of the normal "Line Separator"), only one 1716 erasure action is required to erase this bubble border. When a 1717 bubble is closed, it is moved up, above all real-time active bubbles. 1719 A three-party view is shown in this example.
1721  _________________________________________________
1722 |                                              |^|
1723 |                                              |-|
1724 |[Alice] Hi, Alice here.                       | |
1725 |                                              | |
1726 |[Bob] Bob as well.                            | |
1727 |                                              | |
1728 |[Eve] Hi, this is Eve, calling from Paris.    | |
1729 |      I thought you should be here.           | |
1730 |                                              | |
1731 |[Alice] I am coming on Thursday, my           | |
1732 |      performance is not until Friday morning.| |
1733 |                                              | |
1734 |[Bob] And I on Wednesday evening.             | |
1735 |                                              | |
1736 |[Alice] Can we meet on Thursday evening?      | |
1737 |                                              | |
1738 |[Eve] Yes, definitely. How about 7pm.         | |
1739 |      at the entrance of the restaurant       | |
1740 |      Le Lion Blanc?                          | |
1741 |[Eve] we can have dinner and then take a walk | |
1742 |                                              | |
1743 |      But I need to be back to                | |
1744 |      the hotel by 11 because I need          | |
1745 |                                              | |
1746 |      I wou                                   |-|
1747 |______________________________________________|v|
1748 | of course, I underst                           |
1749 |________________________________________________|
1753 Figure 1: Example of a three-party call presented in the bubble 1754 style. 1756 8.2.2.
Other presentation styles 1758 Other presentation styles than the bubble style may be arranged and 1759 appreciated by the users. In a video conference one way may be to 1760 have a real-time text area below the video view of each participant. 1761 Another view may be to provide one column in a presentation area for 1762 each participant and place the text entries in a relative vertical 1763 position corresponding to when text entry in them was completed. The 1764 labels can then be placed in the column header. The considerations 1765 for ending and moving and erasure of entered text discussed above for 1766 the bubble style are valid also for these styles. 1768 This figure shows how a coordinated column view MAY be presented.
1770 _____________________________________________________________________
1771 |       Bob          |       Eve            |       Alice           |
1772 |____________________|______________________|_______________________|
1773 |                    |                      |I will arrive by TGV.  |
1774 |My flight is to Orly|                      |Convenient to the main |
1775 |                    |Hi all, can we plan   |station.               |
1776 |                    |for the seminar?      |                       |
1777 |Eve, will you do    |                      |                       |
1778 |your presentation on|                      |                       |
1779 |Friday?             |Yes, Friday at 10.    |                       |
1780 |Fine, wo            |                      |We need to meet befo   |
1781 |___________________________________________________________________|
1783 Figure 2: A coordinated column-view of a three-party session with 1784 entries ordered in approximate time-order.
1786 9. Presentation details for multi-party unaware endpoints. 1788 Multi-party unaware endpoints are prepared only for presentation of 1789 two sources of text, the local user and a remote user. If mixing for 1790 multi-party unaware endpoints is to be supported, in order to enable 1791 some multi-party communication with such endpoints, the mixer needs to 1792 plan the presentation and insert labels and line breaks before 1793 labels. Many limitations appear for this presentation mode, and it 1794 must be seen as a fallback and a last resort.
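The insertion of labels and line breaks by the mixer can be pictured with a small Python sketch. It is a deliberately simplified illustration and not the procedure of any specification: the class name FallbackLabeller and the "[name] " label format are hypothetical, the T.140 Line Separator is assumed to be U+2028, and the sketch switches source immediately instead of waiting for a suitable switching point such as the end of a phrase.

```python
LINE_SEPARATOR = "\u2028"  # assumed T.140 line break character


class FallbackLabeller:
    """Merge text from several sources into one stream for an unaware endpoint.

    A line break and a label are inserted whenever the source of the text
    differs from the source of the previously forwarded text.
    """

    def __init__(self):
        self.current_source = None

    def add(self, source_name, text):
        """Return the characters to forward for text received from one source."""
        if not text:
            return ""
        if source_name == self.current_source:
            return text
        # New source: start a new line (except at the very start) and label it.
        line_break = "" if self.current_source is None else LINE_SEPARATOR
        self.current_source = source_name
        return f"{line_break}[{source_name}] {text}"


mixer = FallbackLabeller()
stream = "".join([
    mixer.add("Alice", "Hi, Alice here."),
    mixer.add("Bob", "Bob as "),
    mixer.add("Bob", "well."),
    mixer.add("Alice", "Good."),
])
```

Even this small sketch shows one of the limitations of the fallback mode: text from two simultaneously typing participants becomes interleaved line by line in the single presented stream.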
1796 A procedure for presenting RTT to a conference-unaware endpoint is 1797 included in [I-D.ietf-avtcore-multi-party-rtt-mix] 1799 10. Security Considerations 1801 The security considerations valid for RFC 4103 [RFC4103] and RFC 3550 1802 [RFC3550] are valid also for the multi-party sessions with text. 1804 11. IANA Considerations 1806 The items for indication and negotiation of capability for multi- 1807 party rtt should be registered with IANA in the specifications where 1808 they are specified in detail. 1810 12. Congestion considerations 1812 The congestion considerations described in RFC 4103 [RFC4103] are 1813 valid also for the recommended RTP-based multi-party use of the real- 1814 time text transport. A risk for congestion may appear if a number of 1815 conference participants are active transmitting text simultaneously, 1816 because the recommended RTP-based multi-party transmission method 1817 does not allow multiple sources of text to contribute to the same 1818 packet. 1820 In situations of risk for congestion, the Focus UA MAY combine 1821 packets from the same source to increase the transmission interval 1822 per source up to one second. Local conference policy in the Focus UA 1823 may be used to decide which streams shall be selected for such 1824 transmission frequency reduction. 1826 13. Acknowledgements 1828 Arnoud van Wijk for contributions to an earlier, expired draft of 1829 this memo. 1831 14. Change history 1833 14.1. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-03 1835 Modified info on the method with RFC 4103 format and sdp attribute 1836 "rtt-mix-rtp-mixer". 1838 Increased the performance requirements section. 1840 Inserted recommendations, with emphasis on ease of implementation and 1841 ease of standardisation. 1843 14.2. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-02 1845 Added detail in the section on RTP translator model alternative 1846 4.1.2.1. 1848 14.3. 
Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-01 1850 Added three more methods for RTP-mixer mixing. Two RFC 5109 FEC 1851 based and another with modified data header to detect source of 1852 completely lost text. 1854 Separated RTP-based and WebRTC based solutions. 1856 Deleted the multi-party-unaware mixing procedure appendix. It is now 1857 included in the draft draft-ietf-avtcore-multi-party-rtt-mix. Kept a 1858 section with a reference to the new place. 1860 14.4. Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to draft- 1861 hellstrom-avtcore-multi-party-rtt-solutions-00 1863 Added discussion about switching performance, as discussed in avtcore 1864 on March 13. 1866 Added that a decrease of transmission interval to 100 ms increases 1867 switching performance by a factor 3, but still not sufficient. 1869 Added that the CSRC-list method also uses 100 milliseconds 1870 transmission interval. 1872 Added the method with multiple primary text in each packet. 1874 Added the timestamp-based method for rtp-mixing proposed by James 1875 Hamlin on March 14. 1877 Corrected the chat style presentation example picture. Deleted a few 1878 "[mix]". 1880 14.5. Changes from version draft-hellstrom-mmusic-multi-party-rtt-01 to 1881 -02 1883 Change from a general overview to overview with clear 1884 recommendations. 1886 Splits text coordination methods in three groups. 1888 Recommends rtt-mixer with sources in CSRC-list but refers to its 1889 spec for details. 1891 Shortened Appendix with conference-unaware example. 1893 Cleaned up preferences. 1895 Inserted pictures of screen-views. 1897 15. References 1899 15.1. Normative References 1901 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1902 Requirement Levels", BCP 14, RFC 2119, 1903 DOI 10.17487/RFC2119, March 1997, 1904 . 1906 15.2. Informative References 1908 [EN301549] ETSI, "EN 301 549. Accessibility requirements for ICT 1909 products and services", November 2019, 1910 .
1914 [I-D.ietf-avtcore-multi-party-rtt-mix] 1915 Hellstrom, G., "RTP-mixer formatting of multi-party Real- 1916 time text", Work in Progress, Internet-Draft, draft-ietf- 1917 avtcore-multi-party-rtt-mix-06, 11 June 2020, 1918 . 1921 [I-D.ietf-avtcore-multiplex-guidelines] 1922 Westerlund, M., Burman, B., Perkins, C., Alvestrand, H., 1923 and R. Even, "Guidelines for using the Multiplexing 1924 Features of RTP to Support Multiple Media Streams", Work 1925 in Progress, Internet-Draft, draft-ietf-avtcore-multiplex- 1926 guidelines-12, 16 June 2020, . 1929 [I-D.ietf-mmusic-t140-usage-data-channel] 1930 Holmberg, C. and G. Hellstrom, "T.140 Real-time Text 1931 Conversation over WebRTC Data Channels", Work in Progress, 1932 Internet-Draft, draft-ietf-mmusic-t140-usage-data-channel- 1933 14, 10 April 2020, . 1936 [I-D.ietf-perc-private-media-framework] 1937 Jones, P., Benham, D., and C. Groves, "A Solution 1938 Framework for Private Media in Privacy Enhanced RTP 1939 Conferencing (PERC)", Work in Progress, Internet-Draft, 1940 draft-ietf-perc-private-media-framework-12, 5 June 2019, 1941 . 1944 [NENAi3] NENA, "NENA-STA-010.2-2016. Detailed Functional and 1945 Interface Standards for the NENA i3 Solution", October 1946 2016, . 1948 [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., 1949 Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse- 1950 Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, 1951 DOI 10.17487/RFC2198, September 1997, 1952 . 1954 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1955 A., Peterson, J., Sparks, R., Handley, M., and E. 1956 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1957 DOI 10.17487/RFC3261, June 2002, 1958 . 1960 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1961 with Session Description Protocol (SDP)", RFC 3264, 1962 DOI 10.17487/RFC3264, June 2002, 1963 . 1965 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 
1966 Jacobson, "RTP: A Transport Protocol for Real-Time 1967 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 1968 July 2003, . 1970 [RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, 1971 "Indicating User Agent Capabilities in the Session 1972 Initiation Protocol (SIP)", RFC 3840, 1973 DOI 10.17487/RFC3840, August 2004, 1974 . 1976 [RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller 1977 Preferences for the Session Initiation Protocol (SIP)", 1978 RFC 3841, DOI 10.17487/RFC3841, August 2004, 1979 . 1981 [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text 1982 Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, 1983 . 1985 [RFC4353] Rosenberg, J., "A Framework for Conferencing with the 1986 Session Initiation Protocol (SIP)", RFC 4353, 1987 DOI 10.17487/RFC4353, February 2006, 1988 . 1990 [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A 1991 Session Initiation Protocol (SIP) Event Package for 1992 Conference State", RFC 4575, DOI 10.17487/RFC4575, August 1993 2006, . 1995 [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol 1996 (SIP) Call Control - Conferencing for User Agents", 1997 BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006, 1998 . 2000 [RFC4597] Even, R. and N. Ismail, "Conferencing Scenarios", 2001 RFC 4597, DOI 10.17487/RFC4597, August 2006, 2002 . 2004 [RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error 2005 Correction", RFC 5109, DOI 10.17487/RFC5109, December 2006 2007, . 2008 [RFC5194] van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real- 2009 Time Text over IP Using the Session Initiation Protocol 2010 (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008, 2011 . 2013 [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific 2014 Media Attributes in the Session Description Protocol 2015 (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009, 2016 . 2018 [RFC6443] Rosen, B., Schulzrinne, H., Polk, J., and A. 
Newton, 2019 "Framework for Emergency Calling Using Internet 2020 Multimedia", RFC 6443, DOI 10.17487/RFC6443, December 2021 2011, . 2023 [RFC6881] Rosen, B. and J. Polk, "Best Current Practice for 2024 Communications Services in Support of Emergency Calling", 2025 BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013, 2026 . 2028 [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, 2029 DOI 10.17487/RFC7667, November 2015, 2030 . 2032 [T140] ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for 2033 multimedia application text conversation", February 1998, 2034 . 2036 [T140ad1] ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000), 2037 Protocol for multimedia application text conversation", 2038 February 2000, 2039 . 2041 [TS103479] ETSI, "TS 103 479. Emergency communications (EMTEL); Core 2042 elements for network independent access to emergency 2043 services", December 2019, . 2047 [TS22173] 3GPP, "IP Multimedia Core Network Subsystem (IMS) 2048 Multimedia Telephony Service and supplementary services; 2049 Stage 1", 3GPP TS 22.173 17.1.0, 20 December 2019, 2050 . 2052 [TS24147] 3GPP, "Conferencing using the IP Multimedia (IM) Core 2053 Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0, 2054 19 December 2019, 2055 . 2057 Author's Address 2059 Gunnar Hellstrom 2060 Gunnar Hellstrom Accessible Communication 2061 Esplanaden 30 2062 SE-136 70 Vendelso 2063 Sweden 2065 Phone: +46 708 204 288 2066 Email: gunnar.hellstrom@ghaccess.se