idnits 2.17.1 draft-hellstrom-avtcore-multi-party-rtt-solutions-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (20 June 2021) is 1013 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'ICE' is mentioned on line 1069, but not defined == Unused Reference: 'RFC3264' is defined on line 2094, but no explicit reference was found in the text == Outdated reference: A later version (-20) exists of draft-ietf-avtcore-multi-party-rtt-mix-10 Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force G. 
Hellstrom 3 Internet-Draft GHAccess 4 Intended status: Informational 20 June 2021 5 Expires: 22 December 2021 7 Real-time text solutions for multi-party sessions 8 draft-hellstrom-avtcore-multi-party-rtt-solutions-07 10 Abstract 12 This document specifies methods for Real-Time Text (RTT) media 13 handling in multi-party calls. The main discussed transport is to 14 carry Real-Time text by the RTP protocol in a time-sampled mode 15 according to RFC 4103. The mechanisms enable the receiving 16 application to present the received real-time text media, separated 17 per source, in different ways according to user preferences. Some 18 presentation related features are also described explaining suitable 19 variations of transmission and presentation of text. 21 Call control features are described for the SIP environment. A 22 number of alternative methods for providing the multi-party 23 negotiation, transmission and presentation are discussed and a 24 recommendation for the main ones is provided. The main solution for 25 SIP based centralized multi-party handling of real-time text is 26 achieved through a media control unit coordinating multiple RTP text 27 streams into one RTP stream. 29 Alternative methods using a single RTP stream and source 30 identification inline in the text stream are also described, one of 31 them being provided as a lower functionality fallback method for 32 endpoints with no multi-party awareness for RTT. 34 Bridging methods where the text stream is carried without the 35 contents being dealt with in detail by the bridge are also discussed. 37 Brief information is also provided for multi-party RTT in the WebRTC 38 environment. 40 The intention is to provide background for decisions, specification 41 and implementation of selected methods. 43 Status of This Memo 45 This Internet-Draft is submitted in full conformance with the 46 provisions of BCP 78 and BCP 79. 
48 Internet-Drafts are working documents of the Internet Engineering 49 Task Force (IETF). Note that other groups may also distribute 50 working documents as Internet-Drafts. The list of current Internet- 51 Drafts is at https://datatracker.ietf.org/drafts/current/. 53 Internet-Drafts are draft documents valid for a maximum of six months 54 and may be updated, replaced, or obsoleted by other documents at any 55 time. It is inappropriate to use Internet-Drafts as reference 56 material or to cite them other than as "work in progress." 58 This Internet-Draft will expire on 22 December 2021. 60 Copyright Notice 62 Copyright (c) 2021 IETF Trust and the persons identified as the 63 document authors. All rights reserved. 65 This document is subject to BCP 78 and the IETF Trust's Legal 66 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 67 license-info) in effect on the date of publication of this document. 68 Please review these documents carefully, as they describe your rights 69 and restrictions with respect to this document. Code Components 70 extracted from this document must include Simplified BSD License text 71 as described in Section 4.e of the Trust Legal Provisions and are 72 provided without warranty as described in the Simplified BSD License. 74 Table of Contents 76 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 77 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 78 2. Centralized conference model . . . . . . . . . . . . . . . . 5 79 3. Requirements on multi-party RTT . . . . . . . . . . . . . . . 6 80 3.1. General requirements . . . . . . . . . . . . . . . . . . 6 81 3.2. Performance requirements . . . . . . . . . . . . . . . . 7 82 4. RTP based solutions . . . . . . . . . . . . . . . . . . . . . 8 83 4.1. Coordination of text RTP streams . . . . . . . . . . . . 8 84 4.1.1. RTP-based solutions with a central mixer . . . . . . 9 85 4.1.1.1. RTP Mixer using default RFC 4103 methods . . . . 9 86 4.1.1.2. 
RTP Mixer using the default method but decreased 87 transmission interval . . . . . . . . . . . . . . . 9 88 4.1.1.3. RTP Mixer with frequent transmission and indicating 89 sources in CSRC-list . . . . . . . . . . . . . . . 10 90 4.1.1.4. RTP Mixer interleaving packets, receiver using 91 timestamp to recover from loss . . . . . . . . . . 12 92 4.1.1.5. RTP Mixer with multiple primary data in each packet 93 and individual sequence numbers . . . . . . . . . . 13 94 4.1.1.6. RTP Mixer with multiple primary data in each 95 packet . . . . . . . . . . . . . . . . . . . . . . 14 97 4.1.1.7. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy 98 in the packets . . . . . . . . . . . . . . . . . . 15 99 4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy 100 and separate sequence number in the packets . . . . 17 101 4.1.1.9. RTP Mixer indicating participants by a control code 102 in the stream . . . . . . . . . . . . . . . . . . . 19 103 4.1.1.10. Mixing for multi-party unaware user agents . . . 21 104 4.1.2. RTP-based bridging with minor RTT media contents 105 reformatting by the bridge . . . . . . . . . . . . . 22 106 4.1.2.1. One RTP stream for RTT per participant from the 107 mixer . . . . . . . . . . . . . . . . . . . . . . . 22 108 4.1.2.2. Selective Forwarding Middlebox . . . . . . . . . 25 109 4.1.2.3. Distributing packets in an end-to-end encryption 110 structure . . . . . . . . . . . . . . . . . . . . . 27 111 4.1.2.4. Mesh of RTP endpoints . . . . . . . . . . . . . . 27 112 4.1.2.5. Multiple RTP sessions, one for each 113 participant . . . . . . . . . . . . . . . . . . . . 28 114 5. Preferred RTP-based multi-party RTT transport method . . . . 29 115 6. Session control of RTP-based multi-party RTT sessions . . . . 29 116 6.1. Implicit RTT multi-party capability indication . . . . . 30 117 6.2. RTT multi-party capability declared by SIP media-tags . . 31 118 6.3. SDP media attribute for RTT multi-party capability 119 indication . . . . . . . . . . . . . . . 
. . . . . . . . 32 120 6.4. Simplified SDP media attribute for RTT multi-party 121 capability indication . . . . . . . . . . . . . . . . . . 33 122 6.5. SDP format parameter for RTT multi-party capability 123 indication . . . . . . . . . . . . . . . . . . . . . . . 34 124 6.6. A text media subtype for support of multi-party rtt . . . 35 125 6.7. Preferred capability declaration method for RTP-based 126 transport. . . . . . . . . . . . . . . . . . . . . . . . 35 127 6.8. Identification of the source of text for RTP-based 128 solutions . . . . . . . . . . . . . . . . . . . . . . . . 36 129 7. RTT bridging in WebRTC . . . . . . . . . . . . . . . . . . . 36 130 7.1. RTT bridging in WebRTC with one data channel per 131 source . . . . . . . . . . . . . . . . . . . . . . . . . 36 132 7.2. RTT bridging in WebRTC with one common data channel . . . 37 133 7.3. Preferred rtt multi-party method for WebRTC . . . . . . . 38 134 8. Presentation of multi-party text . . . . . . . . . . . . . . 38 135 8.1. Associating identities with text streams . . . . . . . . 38 136 8.2. Presentation details for multi-party aware endpoints. . . 39 137 8.2.1. Bubble style presentation . . . . . . . . . . . . . . 39 138 8.2.2. Other presentation styles . . . . . . . . . . . . . . 41 139 9. Presentation details for multi-party unaware endpoints. . . . 41 140 10. Security Considerations . . . . . . . . . . . . . . . . . . . 41 141 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 142 12. Congestion considerations . . . . . . . . . . . . . . . . . . 42 143 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 42 144 14. Change history . . . . . . . . . . . . . . . . . . . . . . . 42 145 14.1. Changes to 146 draft-hellstrom-avtcore-multi-party-rtt-solutions-07 . . 42 147 14.2. Changes to 148 draft-hellstrom-avtcore-multi-party-rtt-solutions-06 . . 42 149 14.3. Changes to 150 draft-hellstrom-avtcore-multi-party-rtt-solutions-05 . . 42 151 14.4. 
Changes to 152 draft-hellstrom-avtcore-multi-party-rtt-solutions-04 . . 43 153 14.5. Changes to 154 draft-hellstrom-avtcore-multi-party-rtt-solutions-03 . . 43 155 14.6. Changes to 156 draft-hellstrom-avtcore-multi-party-rtt-solutions-02 . . 43 157 14.7. Changes to 158 draft-hellstrom-avtcore-multi-party-rtt-solutions-01 . . 43 159 14.8. Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to 160 draft-hellstrom-avtcore-multi-party-rtt-solutions-00 . . 43 161 14.9. Changes from version 162 draft-hellstrom-mmusic-multi-party-rtt-01 to -02 . . . . 44 163 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 44 164 15.1. Normative References . . . . . . . . . . . . . . . . . . 44 165 15.2. Informative References . . . . . . . . . . . . . . . . . 44 166 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 47 168 1. Introduction 170 Real-time text (RTT) is a medium in real-time conversational 171 sessions. Text entered by participants in a session is transmitted 172 in a time-sampled fashion, so that no specific user action is needed 173 to cause transmission. This gives a direct flow of text in the rate 174 it is created, that is suitable in a real-time conversational 175 setting. The real-time text medium can be combined with other media 176 in multimedia sessions. 178 Media from a number of multimedia session participants can be 179 combined in a multi-party session. The present document specifies 180 how the real-time text streams can be handled in multi-party 181 sessions. Recommendations are provided for preferred methods. 183 The description is mainly focused on the transport level, but also 184 describes a few session and presentation level aspects. 186 Transport of real-time text is specified in RFC 4103 [RFC4103] RTP 187 Payload for text conversation. It makes use of RFC 3550 [RFC3550] 188 Real Time Protocol, for transport. 
Robustness against network 189 transmission problems is normally achieved through redundant 190 transmission based on the principle from RFC 2198 [RFC2198], with one 191 primary and two redundant transmissions of each text element. Primary 192 and redundant transmissions are combined in packets and described by 193 a redundancy header. This transport is usually used in the 194 Session Initiation Protocol (SIP) RFC 3261 [RFC3261] environment. 196 A very brief overview of functions for real-time text handling in 197 multi-party sessions is given in RFC 4597 [RFC4597] Conferencing 198 Scenarios, sections 4.8 and 4.10. The present specification builds 199 on that description and indicates which protocol mechanisms should be 200 used to implement multi-party handling of real-time text. 202 Real-time text can also be transported in the WebRTC environment, by 203 using WebRTC data channels according to RFC-to-be 8865 204 [I-D.ietf-mmusic-t140-usage-data-channel]. Multi-party aspects for 205 WebRTC solutions are briefly covered. 207 1.1. Requirements Language 209 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 210 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 211 document are to be interpreted as described in RFC 2119 [RFC2119]. 213 2. Centralized conference model 215 In the centralized conference model for SIP, introduced in RFC 4353 216 [RFC4353] "A Framework for Conferencing with the Session Initiation 217 Protocol (SIP)", one function co-ordinates the communication with 218 participants in the multi-party session. This function also controls 219 media mixer functions for the media appearing in the session. The 220 central function is common for control of all media, while the media 221 mixers may work differently for each medium. 223 The central function is called the Focus UA. Many variants exist for 224 setting up sessions including the multipoint control centre. 
This 225 document does not describe these variants, but rather 226 the media-specific handling in the mixer required to handle 227 multi-party calls with RTT. 229 The main principle for handling real-time text media in a centralized 230 conference is that one RTP session for real-time text is established 231 including the multipoint media control centre and the participating 232 endpoints that are going to exchange real-time text with the 233 others. 235 The different possible mechanisms for mixing and transporting RTT 236 differ in the way they multiplex the text streams and how they 237 identify the sources of the streams. RFC 7667 [RFC7667] describes a 238 number of possible use cases for RTP. This specification refers to 239 different sections of RFC 7667 for further reading on the situations 240 caused by the different possible design choices. 242 The recommended method for using RTP-based RTT in a centralized 243 conference model is specified in 244 [I-D.ietf-avtcore-multi-party-rtt-mix] based on the recommendations 245 in this document. 247 Real-time text can also be transported in the WebRTC environment, by 248 using WebRTC data channels according to 249 [I-D.ietf-mmusic-t140-usage-data-channel]. Ways to handle 250 multi-party calls in that environment are also specified. 252 3. Requirements on multi-party RTT 254 3.1. General requirements 256 The following general requirements are placed on multi-party RTT: 258 A solution shall be applicable to IMS (3GPP TS 22.173) [TS22173], 259 SIP based VoIP and Next Generation Emergency Services (NENA i3 260 [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]). 262 The transmission interval for text should not be longer than 500 263 milliseconds when there is anything available to send. Ref ITU-T 264 T.140 [T140]. 266 If text loss is detected or suspected, a missing text marker 267 should be inserted in the text stream. Ref ITU-T T.140 Amendment 268 1 [T140ad1]. 
ETSI EN 301 549 [EN301549] 270 The display of text from the members of the conversation shall be 271 arranged so that the text from each participant is clearly 272 readable, and its source and the relative timing of entered text 273 are visualized in the display. Mechanisms for looking back in the 274 contents from the current session should be provided. The text 275 should be displayed as soon as it is received. Ref ITU-T T.140 276 [T140] 278 Bridges must be multimedia capable (voice, video, text). Ref NENA 279 i3 STA-010.2. [NENAi3] 280 It MUST be possible to use real-time text in conferences both as a 281 medium of discussion between individual participants (for example, 282 for sidebar discussions in real-time text while listening to the 283 main conference audio) and for central support of the conference 284 with real-time text interpretation of speech. Ref (R7) in RFC 285 5194. [RFC5194] 287 It should be possible to protect RTT contents with usual means for 288 privacy and integrity. Ref RFC 6881 section 16. [RFC6881] 290 Conferencing procedures are documented in RFC 4579 [RFC4579]. Ref 291 NENA i3 STA-010.2. [NENAi3] 293 Conferencing applies to any kind of media stream by which users 294 may want to communicate. Ref 3GPP TS 24.147 [TS24147] 296 The framework for SIP conferences is specified in RFC 4353 297 [RFC4353]. Ref 3GPP TS 24.147 [TS24147] 299 3.2. Performance requirements 301 The mixer performance requirements can be expressed in one number, 302 extracted from the user requirements on real-time text expressed in 303 ITU-T F.700, where it is stated that for "good" usability, text 304 characters should not be delayed more than 1 second from creation to 305 presentation. For "usable" usability the figure is 2 seconds. The 306 main factor behind these limits is that turn-taking in a 307 conversation is disturbed when there is a delay before a response becomes 308 visible to the receiving party. 
If that time gets too long, the 309 receiving party becomes unsure whether the previous utterance was well 310 perceived and may prepare to repeat it. This 311 is similar to the effect in voice communication, where the 312 usability limit is 400 ms delay. 314 Another important factor in a multi-party conference is the 315 opportunity for a participant using real-time text to provide timely 316 comments and get a chance to enter the discussion if the majority of 317 participants use voice in the conference. A complicating factor when 318 stating the requirements is that some transport methods do not cause 319 a total delay, but instead an increasing jerkiness when the number of 320 simultaneously sending participants is increased. 322 It should however be remembered that the expected number of 323 participants sending real-time text simultaneously is low. Just as 324 with voice or sign language, the capability of the participants to 325 perceive utterances from more than one participant at a time is very 326 limited. Therefore the normal case in multi-party situations is that 327 one participant at a time is the main provider of text. Others 328 usually just provide very brief comments such as "yes" or "no" or 329 "may I comment?". Only in very rare situations do two participants 330 provide more information simultaneously. 332 * The number of expected simultaneously transmitting users is 333 different for different applications. In all cases, just one 334 transmitting user is the normal case. Two simultaneously 335 transmitting participants can occasionally be expected in 336 emergency services, relay services, small unmanaged conferences, 337 group calls, and large managed conferences. Three 338 simultaneously transmitting participants may appear occasionally 339 in large unmanaged conferences. The performance requirements can 340 therefore be expressed as follows. 
342 * The mean delay introduced on text passing the mixer when only one 343 participant is sending text should be kept to a minimum and should 344 not be more than 400 ms. 346 * The mean delay of text passing the mixer should not be more than 1 347 second during moments when up to three users are sending text 348 simultaneously. 350 * For the very rare case that more than three participants send text 351 simultaneously, the mixer may take action to limit the introduced 352 delay of the text passing the mixer to 7 seconds, e.g., by 353 discarding text from some participants and instead inserting a 354 general warning about possible text loss in the stream. 356 * The load on network and nodes should be limited. This is usually 357 achieved by setting a limit on how many packets per second 358 may be sent from a mixer to each participant. While two-party use 359 according to RFC 4103 limits the load to 3.3 packets per second, a 360 realistic limit for mixers could be 10 packets per second. This 361 is still just a small fraction of what is commonly transmitted in 362 real-time video and audio, so in known environments it may be 363 possible to increase the packet rate if needed to keep latency 364 low. 366 4. RTP based solutions 368 4.1. Coordination of text RTP streams 370 Coordinating and sending text RTP streams in the multi-party session 371 can be done in a number of ways. The most suitable methods are 372 specified here with pros and cons. 374 A receiving and presenting endpoint MUST separate text from the 375 different sources and identify and display them accordingly. 377 4.1.1. RTP-based solutions with a central mixer 379 A set of solutions can be based on the central RTP mixer. They are 380 described here and a preferred method is selected. 382 4.1.1.1. 
RTP Mixer using default RFC 4103 methods 384 Without any extra specifications, a mixer would transmit at 300 385 millisecond intervals, and use RFC 4103 [RFC4103] with the default 386 redundancy of one original and two redundant transmissions. The 387 source of the text would be indicated by a single member in the CSRC 388 list. Text from different sources cannot be transmitted in the same 389 packet. Therefore, after the mixer has sent one piece of 390 new text from one source, it needs to transmit that text 391 twice more as redundant data before it can send text from another source. 392 The jerkiness, i.e., the time between transmissions of new text, is therefore 3 x 300 ms = 900 ms. 393 This is clearly insufficient. 395 Pros: 397 Only a capability negotiation method is needed. No other updates of 398 standards are needed, just a general remark that traditional 399 RTP-mixing is used. 401 Cons: 403 Clearly insufficient mixer switching performance. 405 Somewhat complex handling of transmission when there is new text 406 available from more than one source. The mixer needs to send two 407 more packets with redundant text from the current source before 408 starting to send anything from the other source. 410 4.1.1.2. RTP Mixer using the default method but decreased transmission 411 interval 413 This method makes use of the default RTP-mixing method briefly 414 described in Section 4.1.1.1. The only difference is that the 415 transmission interval is decreased to 100 milliseconds when there is 416 text from more than one source available for transmission. The 417 jerkiness is then 3 x 100 ms = 300 ms. The mean delay with two simultaneously sending 418 participants is 250 ms, and with three simultaneously sending 419 participants 500 ms. This is acceptable performance. 421 Pros: 423 Minor influence on standards. 424 Can be introduced relatively rapidly in the intended technical 425 environments. 
427 Can be declared in SDP as the already existing "text/red" format with 428 a multi-party attribute for capability negotiation. 430 Cons: 432 The introduced jerkiness of new text from more than the required 433 three simultaneously sending sources is high. 435 Slightly higher risk for loss of text during bursty packet loss than for 436 the recommended transmission interval (300 ms) for RFC 4103. 438 When complete loss of packets occurs (beyond recovery), it is not 439 possible to deduce from which source text was lost. 441 Somewhat complex handling of transmission when there is new text 442 available from more than one source. The mixer needs to send two 443 more packets with redundant text from the current source before 444 starting to send anything from the other source. 446 4.1.1.3. RTP Mixer with frequent transmission and indicating sources in 447 CSRC-list 449 An RTP media mixer combines text from participants into one RTP 450 stream, thus all using the same destination address/port combination, 451 the same RTP SSRC, and one sequence number series as described in 452 Sections 7.1 and 7.3 of RTP RFC 3550 [RFC3550] about the Mixer 453 function. This method is also briefly described in RFC 7667, section 454 3.6.1 Media mixing mixer [RFC7667]. 456 The sources of the text in each RTP packet are identified by the CSRC 457 list in the RTP packets, containing the SSRCs of the initial sources 458 of text. The CSRC entries are ordered with the SSRC of the 459 source of the primary text first, followed by the SSRC of the source of the first 460 level redundancy, and then that of the second level redundancy. 462 The transmission interval should be 100 milliseconds when there is 463 text to transmit from more than one source, and otherwise 300 ms. 465 The identification of the sources is made through the CSRC fields and 466 can be made more readable at the receiver through the RTCP SDES CNAME 467 and NAME packets as described in RTP [RFC3550]. 
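The CSRC ordering and the transmission interval rule described for this method can be illustrated with a short sketch. It is not taken from the draft: the names LabelledText, label_by_csrc and transmission_interval_ms are hypothetical, and the sketch assumes the receiver has already split the payload into one text item per generation, primary first.

```python
from typing import List, NamedTuple


class LabelledText(NamedTuple):
    ssrc: int        # SSRC of the original source, taken from the CSRC list
    generation: int  # 0 = primary, 1 = first-level redundancy, 2 = second-level
    text: str


def label_by_csrc(csrc_list: List[int], generations: List[str]) -> List[LabelledText]:
    """Pair each generation of text with its source.

    Assumption: generations[0] is the primary text, generations[1] the
    first-level redundancy and so on, matching the CSRC order described
    in the text (primary source first).
    """
    if len(csrc_list) != len(generations):
        raise ValueError("one CSRC entry is expected per generation")
    return [
        LabelledText(ssrc, gen, text)
        for gen, (ssrc, text) in enumerate(zip(csrc_list, generations))
    ]


def transmission_interval_ms(sources_with_pending_text: int) -> int:
    """100 ms when more than one source has text to send, otherwise 300 ms."""
    return 100 if sources_with_pending_text > 1 else 300
```

A receiver built along these lines would use the ssrc field to decide in which presentation area recovered redundant text belongs.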
469 Information provided through the notification according to RFC 4575 470 [RFC4575] when the participant joined the conference also provides 471 suitable information and a reference to the SSRC. 473 A receiving endpoint is supposed to separate text items from the 474 different sources and identify and display them accordingly. 476 The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it 477 possible to recover from loss of one or two packets in sequence and 478 assign the recovered text to the right source. For more loss, a 479 marker for possible loss should be inserted or presented. 481 The conference server needs to have authority to decrypt the payload 482 in the received RTP packets in order to be able to recover text from 483 redundant data or insert the missing text marker in the stream, and 484 repack the text in new packets. 486 Even if the format is very similar to "text/red" of RFC 4103, it 487 needs to be declared as a new media subtype, e.g. "text/rex". 489 Pros: 491 This method has low overhead and less complexity than the methods in 492 Section 4.1.1.1, Section 4.1.1.2, Section 4.1.1.4 and 493 Section 4.1.1.6. 495 When packet loss occurs, it is possible to recover text from 496 redundancy as long as the number of consecutively lost packets does not exceed the number of redundancy levels carried 497 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 498 levels). 500 This method can be implemented with most RTP implementations. 502 The source switching performance is sufficient for well-behaving 503 conference participants. The jerkiness is 100 ms. 505 Cons: 507 When more consecutive packets are lost than there are generations of 508 redundant data, it is not possible to deduce the sources of 509 the totally lost data. 511 Slightly higher risk for loss of text during bursty packet loss than for 512 the recommended transmission interval for RFC 4103. 514 Requires a different media subtype, e.g. "text/rex". 
This takes a 515 long time in standardisation and releases of target technical 516 environments. 518 The conference server needs to be allowed to decrypt/encrypt the 519 packet payload. This is however normal for media mixers for other 520 media. 522 4.1.1.4. RTP Mixer interleaving packets, receiver using timestamp to 523 recover from loss 525 This method has text only from one source per packet, as the original 526 RFC 4103 [RFC4103] specifies. Packets with text from different 527 sources are instead allowed to be interleaved. The recovery 528 procedure in the receiver makes use of the RTP timestamp and 529 timestamp offsets in the redundancy headers to evaluate if a piece of 530 redundant data was received earlier or not as a base for decision if 531 the redundant data should be recovered or not in case of packet loss. 533 In this method, the transmission is immediate when new text from a 534 source is available for transmission. Otherwise the transmission 535 interval for redundant transmission of text from each source is 320 536 ms when no new text is available. At congestion, the transmission 537 interval is allowed to be longer. 539 Pros: 541 The format of each packet is equal to what is specified in RFC 4103 542 [RFC4103]. 544 The source switching performance is sufficient and good. Text from 545 five participants can be transmitted simultaneously with 300 546 milliseconds interval per source. 548 New text from five simultaneous sources can be transmitted within 300 549 milliseconds. This is sufficient. 551 Recovery from packet loss with five simultaneous sources takes 1 552 second. This is good and implies good protection against bursty 553 packet loss causing resulting text loss. 555 Cons: 557 The recovery time in case of packet loss can be long with more than 558 ten simultaneously intensively sending participants. Then it will be 559 more than 2 seconds. 561 The recovery procedure is different from what is described in RFC 562 4103 [RFC4103]. 
564 In many cases of loss of multiple packets, it will not be possible to 565 deduce whether any text was lost. A mark for possible 566 loss should be inserted in cases when there might have been resulting 567 loss. 569 4.1.1.5. RTP Mixer with multiple primary data in each packet and 570 individual sequence numbers 572 This method allows primary as well as redundant text from more than 573 one source per packet. The packet payload contains an ordered set of 574 redundant and primary data with the same number of generations of 575 redundancy as once agreed in the SDP negotiation. The data header 576 reflects these parts of the payload. The CSRC list contains one CSRC 577 member per source in the payload and in the same order. An 578 individual sequence number per source is included in the data header 579 replacing the t140 payload type number that is instead assumed to be 580 constant in this format. This allows an individual extra sequence 581 number per source with maximum value 127, suitable for determining for 582 which source text was lost when recovery was not possible. 584 The data header would contain the following fields:
585     0                   1                   2                   3
586     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
587    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
588    |F| Source-seq  |     timestamp offset      |   block length    |
589    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
590 Where "Source-seq" is the sequence number per source. 592 The maximum number of members in the CSRC-list is 15, and that is 593 therefore the maximum number of sources that can be represented in 594 each packet provided that all data can be fitted into the size 595 allowable in one packet. 597 Transmission is done as soon as there is new text available, but 598 with an interval no shorter than 150 ms and no longer than 300 ms while 599 there is anything to send. 601 A new media subtype is needed, e.g. "text/rex". 
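The data header with a per-source sequence number can be sketched as a 32-bit word. The field widths used here are an assumption: 1 bit for F, 7 bits for Source-seq (consistent with the stated maximum value of 127), and the 14-bit timestamp offset and 10-bit block length of the RFC 2198 redundancy header. The function names are hypothetical.

```python
import struct
from typing import Tuple


def pack_data_header(f: int, source_seq: int, ts_offset: int, block_len: int) -> bytes:
    """Pack one 4-byte data header in network byte order.

    Assumed layout: F (1 bit) | Source-seq (7 bits) |
    timestamp offset (14 bits) | block length (10 bits).
    """
    if f not in (0, 1):
        raise ValueError("F must be 0 or 1")
    if not 0 <= source_seq <= 127:
        raise ValueError("Source-seq must fit in 7 bits")
    if not 0 <= ts_offset < (1 << 14):
        raise ValueError("timestamp offset must fit in 14 bits")
    if not 0 <= block_len < (1 << 10):
        raise ValueError("block length must fit in 10 bits")
    word = (f << 31) | (source_seq << 24) | (ts_offset << 10) | block_len
    return struct.pack("!I", word)


def unpack_data_header(data: bytes) -> Tuple[int, int, int, int]:
    """Return (F, Source-seq, timestamp offset, block length)."""
    (word,) = struct.unpack("!I", data[:4])
    return (word >> 31) & 0x1, (word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF
```

Because Source-seq has only 7 bits under this assumption, a receiver comparing consecutive values per source would need to do so modulo 128.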
603 This is an SDP offer example for both traditional "text/red" 604 and multi-party "text/rex" format:
606    m=text 11000 RTP/AVP 101 100 98
607    a=rtpmap:98 t140/1000
608    a=rtpmap:100 red/1000
609    a=rtpmap:101 rex/1000
610    a=fmtp:100 98/98/98
611    a=fmtp:101 98/98/98
613 Pros: 615 The source switching performance is good. Text from 15 participants 616 can be transmitted simultaneously. 618 New text from 15 simultaneous sources can be transmitted within 300 619 milliseconds. This is good performance. 621 When more consecutive packets are lost than there are generations of 622 redundant data, it is still possible to deduce the sources of 623 the totally lost data when the next text from these sources arrives. 625 Cons: 627 The format of each packet is different from what is specified in RFC 628 4103 [RFC4103]. 630 The processing time in standards organisations will be long. 632 A new media subtype is needed, making negotiation somewhat complex. 634 The recovery procedure is somewhat complex. 636 4.1.1.6. RTP Mixer with multiple primary data in each packet 638 This method allows primary as well as redundant text from more than 639 one source per packet. The packet payload contains an ordered set of 640 redundant and primary data with the same number of generations of 641 redundancy as once agreed in the SDP negotiation. The data header 642 reflects these parts of the payload. The CSRC list contains one CSRC 643 member per source in the payload and in the same order. 645 The maximum number of members in the CSRC-list is 15, and that is 646 therefore the maximum number of sources that can be represented in 647 each packet provided that all data can be fitted into the size 648 allowable in one packet. 650 Transmission is done as soon as there is new text available, but 651 with an interval no shorter than 150 ms and no longer than 300 ms while 652 there is anything to send. 654 A new media subtype is needed, e.g. "text/rex". 656 SDP would be the same as in Section 4.1.1.5. 
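How an answerer might inspect the SDP offer example given for the "text/rex" format can be sketched as follows. This is a simplified illustration, not a full SDP parser: it assumes a single text media section, the names text_formats and choose_format are hypothetical, and "rex" is only the example subtype name used in this document, not a registered media subtype.

```python
from typing import Dict

OFFER = """\
m=text 11000 RTP/AVP 101 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=rtpmap:101 rex/1000
a=fmtp:100 98/98/98
a=fmtp:101 98/98/98
"""


def text_formats(sdp: str) -> Dict[int, str]:
    """Map payload type numbers to encoding names found in a=rtpmap lines."""
    formats: Dict[int, str] = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            formats[int(pt)] = encoding.split("/")[0].lower()
    return formats


def choose_format(sdp: str) -> str:
    """Prefer the multi-party example format, then "red", then plain "t140"."""
    offered = set(text_formats(sdp).values())
    for preferred in ("rex", "red", "t140"):
        if preferred in offered:
            return preferred
    raise ValueError("no usable text format offered")
```

An endpoint without multi-party awareness would ignore the unknown payload type 101 and fall back to "red", which is the purpose of offering both formats.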
658 Pros: 660 The source switching performance is good. Text from 15 participants 661 can be transmitted simultaneously. 663 New text from 15 simultaneous sources can be transmitted within 150 664 milliseconds. This is good performance. 666 Cons: 668 The format of each packet is different from what is specified in RFC 669 4103 [RFC4103]. 673 A new media subtype is needed, causing somewhat complex negotiation. 675 The processing time in standards organisations will be long. 677 The recovery procedure is a bit complex [RFC4103]. 679 When more consecutive packets are lost than the number of generations of 680 redundant data, it is not possible to deduce the sources of 681 the totally lost data. 683 4.1.1.7. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy in the 684 packets 686 This method allows primary data from one source and redundant text 687 from other sources in each packet. The packet payload contains 688 primary data in "text/t140" format, and redundant data in RFC 5109 689 FEC [RFC5109] format called "text/ulpfec". That means that the 690 redundant data contains the sequence number and the CSRC and other 691 characteristics from the RTP header when the data was sent as 692 primary. The redundancy can be sent a selected number of packets 693 after the data was sent as primary, in order to improve the protection 694 against bursty packet loss. The redundancy level is recommended to 695 be the same as in original RFC 4103. 697 RFC 4103 says that the protection against loss can be made by other 698 methods than plain redundancy, so this method is in line with that 699 statement. 701 Transmission is done as soon as there is new text available, but not 702 with shorter interval than 100 ms and not longer than 300 ms while 703 there is anything to send (new or redundant text).
705 When more consecutive packets are lost than the number of generations of 706 redundant data, it is not possible to deduce the sources of 707 the totally lost data. 709 The SDP can indicate the format as "text/red" with "text/ulpfec" 710 redundant data in the following way, with traditional RFC 4103 "text/red" 711 with "text/t140" as redundant data as a fallback. 713 m=text 49170 RTP/AVP 98 101 100 102 714 a=rtpmap:98 red/1000 715 a=fmtp:98 100/102/102 716 a=rtpmap:102 ulpfec/1000 717 a=rtpmap:100 t140/1000 718 a=rtpmap:101 red/1000 719 a=fmtp:101 100/100/100 720 a=fmtp:100 cps=200 722 The "text/ulpfec" format includes an indication of how far back the 723 redundancy belongs, making it possible to cover bursty packet loss 724 better than the other formats with short transmission intervals. For 725 real-time text, it is recommended to send three packets between the 726 primary and the redundant transmissions of text. That makes the 727 transmission cover between 500 and 1500 ms of bursty packet loss. 728 The variation is because of the varying packet interval between many 729 and one simultaneously transmitting source. 731 The "text/ulpfec" format has a number of parameters. One is the 732 length of the data to be protected, which in this case must be the 733 whole t140block. 735 Pros: 737 The source switching performance is good. Text from 5 participants 738 can be transmitted within 500 ms. 740 Good recovery from bursty packet loss. 742 The method is based on existing standards. No new registrations are 743 needed. 745 Cons: 747 When more consecutive packets are lost than the number of generations of 748 redundant data, it is not possible to deduce the sources of 749 the totally lost data. 751 Even if the switching performance is good, it is not as good as for 752 the method called "RTP Mixer with multiple primary data in each 753 packet", Section 4.1.1.6.
With more than 5 simultaneously sending 754 sources, there will be a noticeable delay of text of over 500 ms, 755 with 100 ms added per simultaneous source. This is however beyond 756 the requirements and would be a concern only in congestion 757 situations. 759 The recovery procedure is a bit complex [RFC5109]. 761 There is more overhead in terms of extra data and extra packets sent 762 than in the other methods. With the recommended two redundant 763 generations of data, each packet will be 36 bytes longer than with 764 traditional RFC 4103, and at each pause in transmission five extra 765 packets with only redundant data will be sent compared to two extra 766 packets for the traditional RFC 4103 case. 768 4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy and 769 separate sequence number in the packets 771 This method allows primary data from one source and redundant text 772 from other sources in each packet. The packet payload contains 773 primary data in a new "text/t140e" format, and redundant data in RFC 774 5109 FEC [RFC5109] format called "text/ulpfec". That means that the 775 redundant data contains the sequence number and the CSRC and other 776 characteristics from the RTP header when the data was sent as 777 primary. The redundancy can be sent at a selected number of packets 778 after when it was sent as primary, in order to improve the protection 779 against bursty packet loss. The redundancy level is recommended to 780 be the same as in original RFC 4103. The "text/t140e" format 781 contains a source-specific sequence number and the t140block. 783 RFC 4103 says that the protection against loss can be made by other 784 methods than plain redundancy, so this method is in line with that 785 statement. 787 Transmission is done as soon as there is new text available, but not 788 with shorter interval than 100 ms and not longer than 300 ms while 789 there is anything to send (new or redundant text). 
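The transmission rule above (send as soon as new text is available, but not sooner than 100 ms after the previous packet, and not later than 300 ms while anything remains to send) can be illustrated by the following sketch. It is one possible reading of the rule, not a normative algorithm; the function and parameter names are hypothetical, and times are integer milliseconds.

```python
MIN_INTERVAL_MS = 100  # shortest allowed spacing between packets
MAX_INTERVAL_MS = 300  # longest spacing while something remains to send

def next_send_time(last_sent, new_text_at=None, pending_redundancy=False):
    """Return the time (ms) of the next transmission, or None when idle.

    last_sent          -- time the previous packet was sent
    new_text_at        -- time new text became (or becomes) available, if any
    pending_redundancy -- True while redundant data still needs to be sent
    """
    candidates = []
    if new_text_at is not None:
        # New text goes out at once, subject to the minimum spacing.
        candidates.append(max(new_text_at, last_sent + MIN_INTERVAL_MS))
    if pending_redundancy:
        # Redundancy alone is flushed at the maximum interval.
        candidates.append(last_sent + MAX_INTERVAL_MS)
    return min(candidates) if candidates else None
```

With several simultaneously active sources the mixer would tend to send at the 100 ms spacing, and with a single source at the 300 ms spacing, which is the variation in packet interval the text refers to.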
791 When more consecutive packets are lost than the number of generations of 792 redundant data, it is possible to deduce which sources lost 793 data when new data arrives from the sources. This is done by 794 monitoring the received source-specific sequence numbers preceding 795 the text. 797 This is an example of how the SDP can indicate the format as "text/red" with 798 "text/t140e" as primary and "text/ulpfec" redundant data, with 799 traditional RFC 4103 "text/red" with "text/t140" as redundant 800 data as a fallback. 802 m=text 49170 RTP/AVP 98 101 100 102 103 803 a=rtpmap:98 red/1000 804 a=fmtp:98 100/102/102 805 a=rtpmap:102 ulpfec/1000 806 a=rtpmap:103 t140/1000 807 a=rtpmap:100 t140e/1000 808 a=rtpmap:101 red/1000 809 a=fmtp:101 103/103/103 810 a=fmtp:100 cps=200 812 The "text/ulpfec" format includes an indication of how far back the 813 redundancy belongs, making it possible to cover bursty packet loss 814 better than the other formats with short transmission intervals. For 815 real-time text, it is recommended to send three packets between the 816 primary and the redundant transmissions of text. That makes the 817 transmission cover between 500 and 1500 ms of bursty packet loss. 818 The variation is because of the varying packet interval between many 819 and one simultaneously transmitting source. 821 The "text/ulpfec" format has a number of parameters. One is the 822 length of the data to be protected, which in this case must be the 823 whole t140block. 825 Pros: 827 The source switching performance is good. Text from 5 participants 828 can be transmitted within 500 ms. 830 Good recovery from bursty packet loss. 832 The method is based on an existing standard for FEC. 834 When more consecutive packets are lost than the number of generations of 835 redundant data, it is possible to deduce the source of the 836 lost data when new text arrives from the source.
838 Cons: 840 Even if the switching performance is good, it is not as good as for 841 the method called "RTP Mixer with multiple primary data in each 842 packet", Section 4.1.1.6. With more than 5 simultaneously sending 843 sources, there will be a noticeable delay of text of over 500 ms, 844 with 100 ms added per simultaneous source. This is however beyond 845 the requirements and would be a concern only in congestion 846 situations. 848 The recovery procedure is a bit complex [RFC5109]. 850 There is more overhead in terms of extra data and extra packets sent 851 than in the other methods. With the recommended two redundant 852 generations of data, each packet will be 40 bytes longer than with 853 traditional RFC 4103, and at each pause in transmission five extra 854 packets with only redundant data will be sent compared to two extra 855 packets for the traditional RFC 4103 case. 857 A new text media subtype "text/t140e" needs to be registered. 859 The processing time in standards organisations will be long. 861 4.1.1.9. RTP Mixer indicating participants by a control code in the 862 stream 864 Text from all participants except the receiving one is transmitted 865 from the media mixer in the same RTP session and stream, thus all 866 using the same destination address/port combination, the same RTP 867 SSRC and one sequence number series, as described in Sections 7.1 and 868 7.3 of RTP RFC 3550 [RFC3550] about the Mixer function. The sources 869 of the text in each RTP packet are identified by a newly defined T.140 870 control code "c" followed by a unique identification of the source in 871 UTF-8 string format. 873 The receiver can use the string for presenting the source of text. 874 On the RTP level, this method is described in RFC 7667, section 3.6.1 875 Media mixing mixer [RFC7667]. 877 The inline coding of the source of text is applied in the data stream 878 itself, and an RTP mixer function is used for coordinating the 879 sources of text into one RTP stream.
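The inline source identification could be handled roughly as in the sketch below. It assumes the "SOS c n ST" framing given in this section, with SOS and ST taken to be the C1 control characters U+0098 and U+009C; those code points should be verified against ITU-T T.140 [T140]. The helper names are hypothetical, and other T.140 application protocol functions are ignored for brevity.

```python
SOS = "\u0098"  # Start Of String (assumed code point, verify against T.140)
ST = "\u009c"   # String Terminator (assumed code point, verify against T.140)

def source_marker(label):
    """Build the inline source identifier: SOS c <label> ST."""
    if SOS in label or ST in label:
        raise ValueError("label must not contain SOS or ST")
    return f"{SOS}c{label}{ST}"

def split_by_source(stream, initial=None):
    """Split mixed text into (source, text) runs at each source marker."""
    runs = []
    current = initial
    pos = 0
    while pos < len(stream):
        start = stream.find(SOS + "c", pos)
        if start == -1:
            runs.append((current, stream[pos:]))
            break
        if start > pos:
            runs.append((current, stream[pos:start]))
        end = stream.find(ST, start)
        if end == -1:
            break  # incomplete marker, wait for more data
        current = stream[start + 2:end]
        pos = end + 1
    return runs
```

For example, a mixer would emit source_marker("Alice") before Alice's text, and the receiver would use split_by_source to attribute each run of text to the label that preceded it.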
881 Information uniquely identifying each user in the multi-party session 882 is placed as the parameter value "n" in the T.140 application 883 protocol function with the function code "c". The identifier shall 884 thus be formatted like this: SOS c n ST, where SOS and ST are coded 885 as specified in ITU-T T.140 [T140]. The "c" is the letter "c". The 886 n parameter value is a string uniquely identifying the source. This 887 parameter shall be kept short so that it can be repeated in the 888 transmission without concerns for network load. 890 A receiving endpoint is supposed to separate text items from the 891 different sources and identify and display them accordingly. 893 The conference server needs to be allowed to decrypt/encrypt the 894 packet payload in order to check the source and repack the text. 896 Pros: 898 If packets are lost, it is possible to recover text from 899 redundancy for loss of up to the number of redundancy levels carried 900 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 901 levels). 903 This method can be implemented with most RTP implementations. 905 The method can also be used with other transports than RTP. 907 Cons: 909 The method implies a moderate load by the need to insert the source 910 often in the stream. 912 If more consecutive packets are lost than the number of generations of 913 redundant data, it is not possible to deduce the source of 914 the totally lost data. 916 The mixer needs to be able to generate unique source 917 identifications which are suitable as labels for the sources. 919 Requires an extension of the ITU-T T.140 standard, best made by the 920 ITU. 922 There is a risk that the control code indicating the change of source 923 is lost and the result is false source indication of text. 925 The conference server needs to be allowed to decrypt/encrypt the 926 packet payload. 928 4.1.1.10.
Mixing for multi-party unaware user agents 930 Multi-party real-time text contents can be transmitted to multi-party 931 unaware user agents if source labelling and formatting of the text is 932 performed by a mixer. This method has the limitations that the 933 layout of the presentation and the format of source identification are 934 purely controlled by the mixer, and that only one source at a time is 935 allowed to present in real-time. Text from other sources needs to be stored 936 temporarily, waiting for an appropriate moment to switch the source of 937 transmitted text. The mixer controls the switching of sources and 938 inserts a source identifier in text format at the beginning of text 939 after a switch of source. The logic of the mixer to detect when a 940 switch is appropriate should detect a number of places in text where 941 a switch can be allowed, including new line, end of sentence, end of 942 phrase, a period of inactivity, and a word separator after a long 943 time of active transmission. 945 This method MAY be used when no support for multi-party awareness is 946 detected in the receiving endpoint. The basis for this method is 947 described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667]. 949 See [I-D.ietf-avtcore-multi-party-rtt-mix] for a procedure for mixing 950 RTT for a conference-unaware endpoint. 952 Pros: 954 Can be transmitted to conference-unaware endpoints. 956 Can be used with other transports than RTP. 958 Cons: 960 Does not allow full real-time presentation of more than one source at 961 a time. Text from other sources will be delayed. 963 The only realistic presentation format is a style with the text from 964 the different sources presented with a text label indicating source, 965 and the text collected in a chat style presentation but with more 966 frequent turn-taking. 968 Endpoints often have their own system for adding labels to the RTT 969 presentation.
In that case there will be two levels of labels in the 970 presentation, one for the mixer and one for the sources. 972 If more packets are lost than can be recovered by the redundancy, 973 it is not possible to detect which source was struck by the 974 loss. It is also possible that a source switch occurred during the 975 loss, and therefore a false indication of the source of text can be 976 provided to the user after such loss. 978 Because of all these cons, this method is not recommended to be used as 979 the main method, but only as fallback and the last resort for 980 backwards interoperability with multi-party unaware endpoints. 982 The conference server needs to be allowed to decrypt/encrypt the 983 packet payload. 985 4.1.2. RTP-based bridging with minor RTT media contents reformatting by 986 the bridge 988 It may be desirable to send text in a multi-party setting in a way 989 that allows the text stream contents to be distributed without being 990 dealt with in detail in any central server. This approach may enable 991 end-to-end encryption. A number of such methods are described. 992 However, at the time of writing this specification, none of these methods 993 has a specified way of establishing the session by SDP. 995 4.1.2.1. One RTP stream for RTT per participant from the mixer 997 Within the RTP session, text from each participant is transmitted 998 from the RTP media bridge in a separate RTP stream, thus using the 999 same destination address/port combination, the same payload type 1000 number (PT) but separate RTP SSRC parameters and sequence number 1001 series as described in Sections 7.1 and 7.2 of RTP RFC 3550 [RFC3550] 1002 about the Translator function. The source of the text in each RTP 1003 packet is identified by the SSRC parameter in the RTP packets, 1004 containing the SSRC of the initial source of text.
1006 A receiving and presenting endpoint is supposed to separate text 1007 items from the different sources and identify and display them in a 1008 suitable way. 1010 This method is described in RFC 7667, section 3.5.1 Relay-transport 1011 translator or 3.5.2 Media translator [RFC7667]. 1013 The identification of the source is made through the SSRC. The 1014 translation to a readable label can be done by mapping to information 1015 from the RTCP SDES CNAME and NAME packets as described in 1016 RTP[RFC3550], and also through information in the text media member 1017 in the conference notification described in RFC 4575 [RFC4575]. 1019 The sdp exchange for establishing this mixing type can be equal to 1020 what is used for basic two-party use of RFC 4103 with just an added 1021 attribute for indicating multi-party capability. 1023 m=text 49170 RTP/AVP 98 103 1024 a=rtpmap:98 red/1000 1025 a=fmtp:98 103/103/103 1026 a=rtpmap:103 t140/1000 1027 a=fmtp:103 cps=150 1028 a=RTT-mixing:RTP-translator 1030 A similar answer including the same RTT-mixing attribute would 1031 indicate that multi-party coding can begin. An answer without the 1032 same RTT-mixing attribute could result in diversion to use of the 1033 mixing method for multi-party unaware endpoints Section 4.1.1.10 if 1034 more than two parties are involved in the session. 1036 The bridge can add new sources in the communication to a participant 1037 by first sending a conference notification according to RFC 4575 1038 [RFC4575] with the SSRC of the new source included in the 1039 corresponding "text" media member, or by sending an RTCP message with 1040 the new SSRC in an SDES packet. 1042 A receiver should be prepared to receive such indications of new 1043 streams being added to the multi-party session, so that the new SSRC 1044 is not taken for a change in SSRC value for an already established 1045 RTP stream. 
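The receiver behaviour described above, where text is kept separate per SSRC and a newly appearing SSRC is treated as an added source rather than as a changed SSRC of an existing stream, can be sketched as follows. This is an illustration only; the class and method names are hypothetical, and the label would in practice come from RTCP SDES or the RFC 4575 conference notification.

```python
class SourceTable:
    """Keeps received text separated per SSRC, with a readable label each."""

    def __init__(self):
        self.labels = {}  # SSRC -> display label
        self.text = {}    # SSRC -> text received so far

    def announce(self, ssrc, label):
        """Record a source announced by conference notification or SDES."""
        self.labels[ssrc] = label
        self.text.setdefault(ssrc, "")

    def on_text(self, ssrc, chunk):
        """Append recovered text to the stream identified by its SSRC."""
        if ssrc not in self.text:
            # Unannounced SSRC: treat it as a new source with a placeholder
            # label, not as a replacement for an already known stream.
            self.announce(ssrc, f"ssrc-{ssrc:08x}")
        self.text[ssrc] += chunk

    def view(self):
        """Return the current text per label for presentation."""
        return {self.labels[s]: t for s, t in self.text.items()}
```

A later announce for an SSRC that first appeared unannounced only updates the label; the text already received from that SSRC is retained.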
1047 Transmission, reception, packet loss recovery and text loss 1048 indication are performed per source in the separate RTP streams in the 1049 same way as in two-party sessions with RFC 4103 [RFC4103]. 1051 Text is recommended to be sent by the bridge as soon as it is 1052 available for transmission, but not less than 250 ms after a previous 1053 transmission. This will in many cases result in close to 0 added 1054 delay by the bridge, because most RTT senders use a 300 ms 1055 transmission interval. 1057 It is sometimes said that this configuration is not supported by 1058 current media declarations in SDP. RFC 3264 [RFC3264] specifies in 1059 some places that one media description is supposed to describe just 1060 one RTP media stream. However, this is not directly referencing an 1061 RTP stream, and use of multiple RTP streams in the same RTP session 1062 is recommended in many other RFCs. 1064 This confusion is clarified in RFC 5576 [RFC5576] section 3 by the 1065 following statements: 1067 "The term "media stream" does not appear in the SDP specification 1068 itself, but is used by a number of SDP extensions, for instance, 1069 Interactive Connectivity Establishment (ICE) [ICE], to denote the 1070 object described by an SDP media description. This term is 1071 unfortunately rather confusing, as the RTP specification [RFC3550] 1072 uses the term "media stream" to refer to an individual media source 1073 or RTP packet stream, identified by an SSRC, whereas an SDP media 1074 stream describes an entire RTP session, which can contain any number 1075 of RTP sources." 1077 In most cases, it will be sufficient that new sources are introduced 1078 with a conference notification or RTCP message. However, RFC 5576 1079 [RFC5576] specifies attributes which may be used to more explicitly 1080 announce new sources or restart of earlier established RTP streams.
1082 This method is encouraged by draft-ietf-avtcore-multiplex-guidelines 1083 [I-D.ietf-avtcore-multiplex-guidelines] section 5.2. 1085 One way of operation will be that the bridge receives text packets 1086 from the source and handles any text recovery and indication of loss 1087 needed before queueing the resulting clean text for transmission from 1088 the bridge to the receivers. However, that method requires the mixer 1089 to decrypt the payload of the packets and makes end-to-end encryption 1090 impossible. 1092 It may however also be possible for the bridge to just convey the 1093 packet contents as received from the sources, with minor adjustments 1094 in the RTP header, and let the receiving endpoint handle all aspects 1095 of recovery and indication of loss, even for the source to bridge 1096 path. In that case also the sequence number sequence must be 1097 maintained as it was at reception in the bridge at least regarding 1098 gaps in the sequence. This mode needs further study before 1099 application. 1101 Pros: 1103 This method may be designed so that end-to-end encryption is enabled. 1105 This method is a natural way to do multi-party bridging with RFC 4103 1106 based RTT. 1108 This method has moderate overhead in terms of work for the mixer, but 1109 high in terms of packet transmission rate. Five sources sending 1110 simultaneously cause the bridge to send 15 packets per second to each 1111 receiver. 1113 When loss of packets occur, it is possible to recover text from 1114 redundancy at loss of up to the number of redundancy levels carried 1115 in the RFC 4103 [RFC4103] stream(normally primary and two redundant 1116 levels). 1118 More loss than what can be recovered, can be detected and the marker 1119 for text loss can be inserted in the correct stream. 1121 It may be possible in some scenarios to keep the text encrypted 1122 through the Translator. 1124 Minimal delay. 
The delay can often be kept close to 0 with at least 1125 5 simultaneous sending participants. 1127 Cons: 1129 There are RTP implementations not supporting the Translator model. 1130 They will need to use the fall-back to multi-party-unaware mixing or 1131 another method based on RTP-mixer. An investigation of how common 1132 this lack of support is will be needed before the method is used. 1134 The processing time in standards organisations will be long. 1136 With many simultaneous sending sources, the total rate of packets 1137 will be high, and can cause congestion. The requirement to handle 3 1138 simultaneous sources in this specification will cause 10 packets per 1139 second, which is manageable in most cases, e.g. considering that audio 1140 usually uses 50 packets per second. 1142 4.1.2.2. Selective Forwarding Middlebox 1144 From some points of view, use of multiple RTP streams, one for each 1145 source, sent in the same RTP session would be efficient, and would 1146 use exactly the same packet format as [RFC4103] and the same payload 1147 type. 1149 A couple of relevant scenarios using multiple RTP streams are 1150 specified in "RTP Topologies" [RFC7667]. One is described in the 1151 previous section. Another possibility of special interest is the 1152 Selective Forwarding Middlebox (SFM) topology specified in RFC 7667 1153 section 3.7 that could enable end-to-end encryption. The idea of SFM 1154 is that the mixer selects a limited number of sources to be conveyed 1155 to the participants while other media streams are discarded. This 1156 gives very good efficiency for the audio and video media which are 1157 transmitted continuously from the sources. 1159 In contrast to audio and video, real-time text is only transmitted 1160 when the users actually transmit information. Thus an SFM solution 1161 would not need to exclude any party from transmission under all 1162 normal conditions.
It needs however be able to vary which sources 1163 are conveyed depending on which users are active transmitting at the 1164 moment. 1166 In order to allow the mixer to convey the packets with the payload 1167 preserved and encrypted, an SFM solution would need to act on some 1168 specific characteristics of the "text/red" format. The redundancy 1169 headers are part of the payload, so the receiver would need to just 1170 assume that the payload type number in the redundancy header is for 1171 "text/t140". The characters per second parameter (CPS) would need to 1172 act per stream. The relation between the SSRC and the source would 1173 need to be conveyed in some specified way, e.g. in the CSRC. 1174 Recovery and loss detection would preferably be based on sequence 1175 number gap detection. Thus sequence number gaps in the incoming 1176 stream to the mixer would need to be reflected in the stream to the 1177 participant and no new gaps created by the mixer, even if the 1178 sequence number series may be different. 1180 Pros: 1182 This method may be designed so that end-to-end encryption is enabled. 1184 This method is a natural way to do multi-party bridging with RFC 4103 1185 based RTT. 1187 This method has moderate overhead in terms of work for the mixer, but 1188 high in terms of packet transmission rate. Five sources sending 1189 simultaneously cause the bridge to send 15 packets per second to each 1190 receiver. 1192 When loss of packets occur, it is possible to recover text from 1193 redundancy at loss of up to the number of redundancy levels carried 1194 in the RFC 4103 [RFC4103] stream(normally primary and two redundant 1195 levels). 1197 More loss than what can be recovered, can be detected and the marker 1198 for text loss can be inserted in the correct stream. 1200 Minimal delay. The delay can often be kept close to 0 with at least 1201 5 simultaneous sending participants. 1203 Cons: 1205 There are RTP implementations not supporting the SFM method. 
They 1206 will need to use the fall-back to multi-party-unaware mixing or 1207 another method based on RTP-mixer. 1209 In the rare case of a high number of simultaneous sending 1210 sources, the SFM will need to discard text from some sources in order 1211 to keep the total rate of packets at a suitable level. That can 1212 cause confusion. 1214 This method requires a lot of further specification. 1216 4.1.2.3. Distributing packets in an end-to-end encryption structure 1218 In order to achieve end-to-end encryption, it is possible to let the 1219 packets from the sources just pass through a central distributor, and 1220 handle the security agreements between the participants. 1221 Specifications exist for a framework with this functionality for 1222 application on RTP based conferences in 1223 [I-D.ietf-perc-private-media-framework]. The RTP flow and mixing 1224 characteristics have similarities with the method described under "RTP 1225 Translator sending one RTT stream per participant" above. RFC 4103 1226 RTP streams [RFC4103] would fit into the structure and it would 1227 provide a base for end-to-end encrypted RTT multi-party conferencing. 1229 Pros: 1231 Good security. 1233 Straightforward multi-party handling. 1235 Cons: 1237 Does not operate under the usual SIP central conferencing 1238 architecture. 1240 Requires the participants to perform a lot of key handling. 1242 Is work in progress when this is written. 1244 4.1.2.4. Mesh of RTP endpoints 1246 Text from all participants is transmitted directly to all others in 1247 one RTP session, without a central bridge. The sources of the text 1248 in each RTP packet are identified by the source network address and 1249 the SSRC. 1251 This method is described in RFC 7667, section 3.4 Point to multi- 1252 point using mesh [RFC7667].
1254 Pros: 1256 When packets are lost, it is possible to recover text from 1257 redundancy for loss of up to the number of redundancy levels carried 1258 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1259 levels). 1261 This method can be implemented with most RTP implementations. 1263 Transmitted text can also be used with other transports than RTP. 1265 Cons: 1267 This model is not described in IMS, NENA and EENA specifications, and 1268 does therefore not meet the requirements. 1270 Requires a drastically increasing number of connections when the 1271 number of participants increases. 1273 4.1.2.5. Multiple RTP sessions, one for each participant 1275 Text from all participants is transmitted directly to all others in 1276 one RTP session each, without a central bridge. Each session is 1277 established with a separate media description in SDP. The sources of 1278 the text in each RTP packet are identified by the source network 1279 address and the SSRC. 1281 Pros: 1283 When packets are lost, it is possible to recover text from 1284 redundancy for loss of up to the number of redundancy levels carried 1285 in the RFC 4103 [RFC4103] stream (normally primary and two redundant 1286 levels). 1288 Complete loss of text can be indicated in the received stream. 1290 This method can be implemented with most RTP implementations. 1292 End-to-end encryption is achievable. 1294 Cons: 1296 This method is not described in IMS, NENA and ETSI specifications and 1297 does therefore not meet the requirements. 1299 A lot of network resources are spent on setting up separate sessions 1300 for each participant. 1302 5. Preferred RTP-based multi-party RTT transport method 1304 For RTP transport of RTT using RTP-mixer technology, one method for 1305 multi-party mixing and transport stands out as fulfilling the goals 1306 best and is therefore recommended.
That is: "RTP Mixer interleaving 1307 packets, receiver using timestamp to recover from loss", 1308 Section 4.1.1.4. 1310 For RTP transport in separate streams or sessions, no current 1311 recommendation can be made. A bridging method in the process of 1312 standardisation with interesting characteristics is the end-to-end 1313 encryption model "perc", Section 4.1.2.3. 1315 6. Session control of RTP-based multi-party RTT sessions 1317 General session control aspects for multi-party sessions are 1318 described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP) 1319 Event Package for Conference State, and RFC 4579 [RFC4579] Session 1320 Initiation Protocol (SIP) Call Control - Conferencing for User 1321 Agents. The nomenclature of these specifications is used here. 1323 The procedures for a multi-party aware model for RTT transmission 1324 shall only be applied if a capability exchange for multi-party aware 1325 real-time text transmission has been completed and a supported method 1326 for multi-party real-time text transmission can be negotiated. 1328 A method for detection of conference-awareness for centralized SIP 1329 conferencing in general is specified in RFC 4579 [RFC4579]. The 1330 focus sends the "isfocus" feature tag in a SIP Contact header. This 1331 causes the conference-aware endpoint to subscribe to conference 1332 notifications from the focus. The focus then sends notifications to 1333 the endpoint about joining and leaving conference participants 1334 and their media capabilities. The information is carried XML- 1335 formatted in a 'conference-info' block in the notification according 1336 to RFC 4575 [RFC4575]. The mechanism is described in detail in RFC 1337 4575 [RFC4575]. 1339 Before a conference media server starts sending multi-party RTT to an 1340 endpoint, a verification of its ability to handle multi-party RTT 1341 must be made.
A decision on which mechanism to use for identifying 1342 text from the different participants must also be taken, implicitly 1343 or explicitly. These verifications and decisions can be done in a 1344 number of ways. The most apparent ways are specified here and their 1345 pros and cons described. One of the methods is selected to be the 1346 one to be used by implementations of the centralized conference model 1347 according to this specification. 1349 6.1. Implicit RTT multi-party capability indication 1351 Capability for RTT multi-party handling can be decided to be 1352 implicitly indicated by session control items. 1354 The focus may implicitly indicate multi-party RTT capability by 1355 including the media child with value "text" in the RFC 4575 [RFC4575] 1356 conference-info provided in conference notifications. 1358 An endpoint may implicitly indicate multi-party RTT capability by 1359 including the text media in the SDP in the session control 1360 transactions with the conference focus after the subscription to the 1361 conference has taken place. 1363 The implicit RTT capability indication means for the focus that it 1364 can handle multi-party RTT according to the preferred method 1365 indicated in the RTT multi-party methods section above. 1367 The implicit RTT capability indication means for the endpoint that it 1368 can handle multi-party RTT according to the preferred method 1369 indicated in the RTT multi-party methods section above. 1371 If the focus detects that an endpoint implicitly declared RTT multi- 1372 party capability, it SHALL provide RTT according to the preferred 1373 method. 1375 If the focus detects that the endpoint does not indicate any RTT 1376 multi-party capability, then it shall either provide RTT multi-party 1377 text in the way specified for conference-unaware endpoints above, or 1378 refuse to set up the session.
If the endpoint detects that the focus has implicitly declared RTT multi-party capability, it shall be prepared to present RTT in a multi-party fashion according to the preferred method.

Pros:

Acceptance of implicit multi-party capability implies that no standardisation of explicit RTT multi-party capability exchange is required.

Cons:

If other methods for multi-party RTT are to be used in the same implementation environment as the preferred ones, then capability exchange needs to be defined for them.

Cannot be used outside a strictly applied SIP central conference model.

6.2. RTT multi-party capability declared by SIP media-tags

Specifications for RTT multi-party capability declarations can be agreed for use as SIP media feature tags, to be exchanged during SIP call control operation according to the mechanisms in RFC 3840 [RFC3840] and RFC 3841 [RFC3841]. RTT multi-party capability is then indicated by the media feature tag "rtt-mix", with a set of possible values for the different possible methods.

The possible values in the list may for example be:

rtp-mixer

perc

rtp-mixer indicates capability for using the RTP-mixer based presentation of multi-party text.

perc indicates capability for using the perc based transmission of multi-party text.

Example:

Contact:
;methods="INVITE,ACK,OPTIONS,BYE,CANCEL"
;+sip.rtt-mix="rtp-mixer"

If, after evaluation of the alternatives in this specification, only one mixing method is selected to be brought to implementation, then the media tag can be reduced to a single tag with no list of values.

An offer-answer exchange should take place and the common method selected by the answering party shall be used in the session with that UA.
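A minimal sketch of how a UA might read the feature tag and pick a common method is given here. It assumes the tag name "+sip.rtt-mix" from the example in this section, which is not a registered tag, and a comma-separated value list inside the quotes; the function names are hypothetical.

```python
import re
from typing import List, Optional

# Tag name taken from the example in this section; an assumption, not registered.
RTT_MIX_TAG = "+sip.rtt-mix"


def rtt_mix_methods(contact_header: str) -> List[str]:
    """Return the methods listed in the rtt-mix feature tag, in order."""
    pattern = re.escape(RTT_MIX_TAG) + r'="([^"]*)"'
    match = re.search(pattern, contact_header)
    if match is None:
        return []
    # The quoted value is assumed to hold a comma-separated list of tokens.
    return [m.strip() for m in match.group(1).split(",") if m.strip()]


def select_method(offered: List[str], supported: List[str]) -> Optional[str]:
    """Answering party picks the first offered method it also supports."""
    for method in offered:
        if method in supported:
            return method
    return None  # no common method: fallback or drop, decided by the caller
```

Returning None keeps the fallback-or-drop decision outside the parser.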
When no common method is declared, then only the fallback method for multi-party unaware participants can be used, or the session dropped.

If more than one text media section is included in SDP, all must be capable of using the declared RTT multi-party method.

Pros:

Provides a clear decision method.

Can be extended with new mixing methods.

Can guide call routing to a suitable capable focus.

Cons:

Requires standardization and IANA registration.

Is not stream specific. If more than one text stream is specified, all must have the same type of multi-party capability.

Cannot be used in the WebRTC environment.

6.3. SDP media attribute for RTT multi-party capability indication

An attribute can be specified on media level, to be used in text media SDP declarations for negotiating RTT multi-party capabilities. The attribute can have the name "rtt-mixing".

More than one attribute can be included in one media description.

The attribute can have a value. The value can for example be:

rtp-mixer

rtp-translator

perc

rtp-mixer indicates capability for using the RTP-mixer and CSRC-list based mixing of multi-party text.

rtp-translator indicates capability for using the RTP-translator based mixing.

perc indicates capability for using the perc based transmission of multi-party text.

An offer-answer exchange should take place and the common method selected by the answering party shall be used in the session with that endpoint.

When no common method is declared, then only the fallback method for multi-party unaware endpoints can be used.

Example: a=rtt-mixing:rtp-mixer

If, after evaluation of the alternatives in this specification, only one mixing method is selected to be brought to implementation, then the attribute can be reduced to a single attribute with no list of values.
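The per-media nature of this attribute can be illustrated with a small parser. The sketch below assumes the attribute name "rtt-mixing" proposed in this section and a simplified SDP in which each media description starts with an "m=" line; the function name is hypothetical.

```python
from typing import List


def rtt_mixing_by_text_section(sdp: str) -> List[List[str]]:
    """List the rtt-mixing values declared in each m=text section."""
    sections: List[List[str]] = []
    current = None  # values of the text section being read, if any
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media description starts; track it only if it is text.
            current = [] if line.startswith("m=text ") else None
            if current is not None:
                sections.append(current)
        elif current is not None and line.startswith("a=rtt-mixing:"):
            current.append(line.split(":", 1)[1])
    return sections
```

Because the result is kept per text section, each text stream can declare its own set of methods, which the feature-tag alternative cannot express.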
Pros:

Provides a clear decision method.

Can be extended with new mixing methods.

Can be used on specific text media.

Can be used also for SDP-controlled WebRTC sessions with multiple streams in the same data channel.

Cons:

Requires standardization and IANA registration.

Cannot guide SIP routing.

6.4. Simplified SDP media attribute for RTT multi-party capability indication

An attribute can be specified on media level, to be used in text media SDP declarations for negotiating RTT multi-party capabilities. The attribute can have a name suitable for the selected method and no value. It would be selected and used if only one method for multi-party RTT is brought forward from this specification, and the others are left unspecified for now or found to be possible to negotiate in another way.

An offer-answer exchange should take place and if both parties specify rtt-mixing capability with the same attribute, the selected mixing method shall be used.

When no common method is declared, then only the fallback method for multi-party unaware endpoints can be used, or the session not accepted for multi-party use.

Example: a=rtt-mix

Pros:

Provides a clear decision method.

Very simple syntax and semantics.

Can be used on specific text media.

Cons:

Requires standardization and IANA registration.

If another RTT mixing method is specified in the future, that method may need to specify and register its own attribute, whereas an attribute with a parameter value would only need a new possible value to be added.

Cannot guide SIP routing.

6.5.
SDP format parameter for RTT multi-party capability indication

An FMTP format parameter can be specified for the RFC 4103 [RFC4103] media, to be used in text media SDP declarations for negotiating RTT multi-party capabilities. The parameter can have the name "rtt-mixing", with one or more of its possible values.

The possible values in the list are:

rtp-mixer

perc

rtp-mixer indicates capability for using the RTP-mixer based mixing and presentation of multi-party text using the CSRC-list.

perc indicates capability for using the perc based transmission of multi-party text.

Example: a=fmtp:96 98/98/98 rtt-mixing=rtp-mixer

If, after evaluation of the alternatives in this specification, only one mixing method is selected to be brought to implementation, then the parameter can be reduced to a single parameter with no list of values.

An offer-answer exchange should take place and the common method selected by the answering party shall be used in the session with that UA.

When no common method is declared, then only the fallback method can be used, or the session denied.

Pros:

Provides a clear decision method.

Can be extended with new mixing methods.

Can be used on specific text media.

Can be used also for SDP-controlled WebRTC sessions with multiple streams in the same data channel.

Cons:

Requires standardization and IANA registration.

May cause interop problems with current RFC 4103 [RFC4103] implementations not expecting a new fmtp-parameter.

Cannot guide SIP routing.

6.6. A text media subtype for support of multi-party RTT

Indicating a specific text media subtype in SDP is a straightforward way for negotiating multi-party capability.
Especially if there are format differences from the "text/red" and "text/t140" formats of RFC 4103 [RFC4103], this is a natural way to do the negotiation for multi-party RTT.

Pros:

No extra efforts if a new format is needed anyway.

Cons:

None specific to using the format indication for negotiation of multi-party capability. But only feasible if a new format is needed anyway.

6.7. Preferred capability declaration method for RTP-based transport

If the preferred transport method is one with a specific media subtype in SDP, then specification by media subtype is preferred.

Otherwise, the preferred capability declaration method is the one with a specific SDP attribute for the selected mixing method (Section 6.4), because it is straightforward.

6.8. Identification of the source of text for RTP-based solutions

The main way to identify the source of text in the RTP based solution is by the SSRC of the sending participant. In the RTP-mixer solution, this SSRC is included in the CSRC list of the transmitted packets. Further identification that may be needed for better labelling of received text may be obtained from a number of sources. These include the RTCP SDES CNAME and NAME items and the conference notification data (RFC 4575 [RFC4575]).

As soon as a new member is added to the RTP session, its characteristics should be transmitted in RTCP SDES CNAME and NAME reports according to section 6.5 in RFC 3550 [RFC3550]. The information about the participant should also be included in the conference data including the text media member in a notification according to RFC 4575 [RFC4575].

The RTCP SDES report SHOULD contain identification of the source represented by the SSRC/CSRC identifier.
This identification MUST contain the CNAME field and MAY contain the NAME field and other defined fields of the SDES report.

A focus UA SHOULD primarily convey SDES information received from the sources of the session members. When such information is not available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME information from available information from the SIP session with the participant.

Provision of detailed information in the NAME field has security implications, especially if provided without encryption.

7. RTT bridging in WebRTC

Within WebRTC, real-time text is specified to be carried in WebRTC data channels as specified in [I-D.ietf-mmusic-t140-usage-data-channel]. That specification briefly mentions a few ways to handle multi-party RTT. They are repeated below.

7.1. RTT bridging in WebRTC with one data channel per source

A straightforward way to handle multi-party RTT is for the bridge to open one T.140 data channel per source towards the receiving participants.

The stream-id forms a unique stream identification.

The identification of the source is made through the Label property of the channel, and session information belonging to the source. The endpoint can compose a readable label for the presentation from this information.

Pros:

This is a straightforward solution.

The load per source is low.

Cons:

With a high number of participants, the overhead of establishing and maintaining the high number of data channels required may be high, even if the load per channel is low.

7.2. RTT bridging in WebRTC with one common data channel

A way to handle multi-party RTT in WebRTC is for the bridge to combine text from all sources into one data channel and insert the sources in the stream by a T.140 control code for source.
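The combining step can be sketched as below, using the framing that this section goes on to describe: SOS, the extension letter "c", a source string, and ST. The code points U+0098 for SOS and U+009C for ST follow the C1 control set used by T.140; the "c" extension is only proposed in this document and is not standardised, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

SOS = "\u0098"  # Start Of String
ST = "\u009c"   # String Terminator


def frame_source(label: str) -> str:
    """Frame a source string with the proposed "c" control code extension."""
    return SOS + "c" + label + ST


def combine_for_one_channel(chunks: Iterable[Tuple[str, str]]) -> str:
    """Merge (source, text) transmissions into one data channel stream.

    Each text transmission is preceded by the framed source, so that the
    receiver can separate text per source.
    """
    return "".join(frame_source(source) + text for source, text in chunks)
```

A receiving endpoint would strip the framed items from the displayed text and use them only to decide where each text item belongs.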
This method is described in a corresponding section for RTP transmission above in Section 4.1.1.9.

The identification of the source is made through insertion in the beginning of each text transmission from a source of a control code extension "c" followed by a string representing the source, framed by the control code start and end flags SOS and ST (See ITU-T T.140 [T140]).

A receiving endpoint is supposed to separate text items from the different sources and identify and display them in a suitable way.

The endpoint does not always display the source identification in the received text at the place where it is received, but has the information as a guide for planning the presentation of received text. A label corresponding to the source identification is presented when needed depending on the selected presentation style.

Pros:

This solution has relatively low overhead on session and network level.

Cons:

This solution has higher overhead on the media contents level than the WebRTC solution above.

Standardisation of the new control code "c" in ITU-T T.140 [T140] is required.

The conference server needs to be allowed to decrypt/encrypt the data channel contents.

7.3. Preferred RTT multi-party method for WebRTC

For WebRTC, one method is preferred because of its simplicity. So, for WebRTC, the method to implement for multi-party RTT with multi-party aware parties when no other method is explicitly agreed between implementing parties is: "RTT bridging in WebRTC with one data channel per source" (Section 7.1).

8.
Presentation of multi-party text

All session participants with RTP based transport MUST observe the SSRC/CSRC field of incoming text RTP packets, and make note of which source they came from in order to be able to present text in a way that makes it easy to read text from each participant in a session, and get information about the source of the text.

In the WebRTC case, the Label parameter and other provided endpoint information should be used for the same purpose.

8.1. Associating identities with text streams

A source identity SHOULD be composed from available information sources and displayed together with the text as indicated in ITU-T T.140 Appendix [T140].

The source identity should primarily be the NAME field from incoming SDES packets. If this information is not available, and the session is a two-party session, then the T.140 source identity SHOULD be composed from the SIP session participant information. For multi-party sessions the source identity may be composed by local information if sufficient information is not available in the session.

Applications may abbreviate the presented source identity to a suitable form for the available display.

Applications may also replace received source information with internally used nicknames.

8.2. Presentation details for multi-party aware endpoints

The multi-party aware endpoint should, after any action for recovery of data from lost packets, separate the incoming streams and present them according to the style that the receiving application supports and the user has selected. The decisions taken for presentation of the multi-party interchange shall be purely on the receiving side. The sending application must not insert any item in the stream to influence presentation that is not requested by the sending participant.

8.2.1.
Bubble style presentation

One often used style is to present real-time text in chunks in readable bubbles identified by labels containing names of sources. Bubbles are placed in one column in the presentation area and are closed and moved upwards in the presentation area after certain items or events, when there is also newer text from another source that would go into a new bubble. The text items that allow bubble closing are any character closing a phrase or sentence followed by a space, or a timeout of a suitable time (about 10 seconds).

Real-time active text sent from the local user should be presented in a separate area. When there is a reason to close a bubble from the local user, the bubble should be placed above all real-time active bubbles, so that the time order in which real-time text entries were completed is visible.

Scrolling is usually provided for viewing of recent or older text. When scrolling is done to an earlier point in the text, the presentation shall not move the scroll position when new text is received. It must be the decision of the local user to return to automatic viewing of latest text actions. It may be useful to indicate that there is new text to read after scrolling to an earlier position has been activated.

The presentation area may become too small to present all text in all real-time active bubbles. Various techniques can be applied to provide a good overview and good reading opportunity even in such situations. The active real-time bubble may have a limited number of lines and if their contents need more lines, then a scrolling opportunity within the real-time active bubble is provided.
Another method can be to only show the label and the last line of the active real-time bubble contents, and make it possible to expand or compress the bubble presentation between full view and one line view.

Erasures require special consideration. Erasure within a real-time active bubble is straightforward. But if erasure from one participant affects the last character before a bubble, the whole previous bubble becomes the actual bubble for real-time action by that participant and is placed below all other bubbles in the presentation area. If the border between bubbles was caused by the CRLF characters (instead of the normal "Line Separator"), only one erasure action is required to erase this bubble border. When a bubble is closed, it is moved up, above all real-time active bubbles.

A three-party view is shown in this example.

 _________________________________________________
|                                              |^|
|                                              |-|
|[Alice] Hi, Alice here.                       | |
|                                              | |
|[Bob] Bob as well.                            | |
|                                              | |
|[Eve] Hi, this is Eve, calling from Paris.    | |
|      I thought you should be here.           | |
|                                              | |
|[Alice] I am coming on Thursday, my           | |
|      performance is not until Friday morning.| |
|                                              | |
|[Bob] And I on Wednesday evening.             | |
|                                              | |
|[Alice] Can we meet on Thursday evening?      | |
|                                              | |
|[Eve] Yes, definitely. How about 7pm.         | |
|      at the entrance of the restaurant       | |
|      Le Lion Blanc?                          | |
|[Eve] we can have dinner and then take a walk | |
|                                              | |
| But I need to be back to                     | |
| the hotel by 11 because I need               | |
|                                              | |
| I wou                                        |-|
|______________________________________________|v|
| of course, I underst                           |
|________________________________________________|

Figure 1: Example of a three-party call presented in the bubble style.

8.2.2.
Other presentation styles

Other presentation styles than the bubble style may be arranged and appreciated by the users. In a video conference one way may be to have a real-time text area below the video view of each participant. Another view may be to provide one column in a presentation area for each participant and place the text entries in a relative vertical position corresponding to when text entry in them was completed. The labels can then be placed in the column header. The considerations for ending and moving and erasure of entered text discussed above for the bubble style are valid also for these styles.

This figure shows how a coordinated column view MAY be presented.

 ___________________________________________________________________
|        Bob         |         Eve          |         Alice         |
|____________________|______________________|_______________________|
|                    |                      |I will arrive by TGV.  |
|My flight is to Orly|                      |Convenient to the main |
|                    |Hi all, can we plan   |station.               |
|                    |for the seminar?      |                       |
|Eve, will you do    |                      |                       |
|your presentation on|                      |                       |
|Friday?             |Yes, Friday at 10.    |                       |
|Fine, wo            |                      |We need to meet befo   |
|___________________________________________________________________|

Figure 2: A coordinated column-view of a three-party session with entries ordered in approximate time-order.

9. Presentation details for multi-party unaware endpoints

Multi-party unaware endpoints are prepared only for presentation of two sources of text, the local user and a remote user. If mixing for multi-party unaware endpoints is to be supported, in order to enable some multi-party communication with such endpoints, the mixer needs to plan the presentation and insert labels and line breaks before labels. Many limitations appear for this presentation mode, and it must be seen as a fallback and a last resort.
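The label insertion done by such a mixer can be sketched as follows. This is a simplified illustration and not the procedure of the referenced mixing specification: the bracketed label format follows the figures in this document, the line break is assumed to be the T.140 "Line Separator" (U+2028), and the function name is hypothetical. Erasure handling and the choice of suitable switching points are left out.

```python
from typing import Iterable, Tuple

LINE_SEPARATOR = "\u2028"  # assumed line break between entries


def mix_for_unaware_endpoint(chunks: Iterable[Tuple[str, str]]) -> str:
    """Merge (source, text) chunks into one stream with inserted labels.

    A line break and a label are inserted whenever the source changes,
    so that an endpoint that only knows one remote party still shows
    who wrote what.
    """
    out = []
    current = None
    for source, text in chunks:
        if source != current:
            if current is not None:
                out.append(LINE_SEPARATOR)  # line break before the label
            out.append("[" + source + "] ")
            current = source
        out.append(text)
    return "".join(out)
```

Since the labels become part of the text itself, the receiving user cannot choose another presentation style, which is one reason this mode is treated as a last resort.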
A procedure for presenting RTT to a conference-unaware endpoint is included in [I-D.ietf-avtcore-multi-party-rtt-mix].

10. Security Considerations

The security considerations valid for RFC 4103 [RFC4103] and RFC 3550 [RFC3550] are valid also for the multi-party sessions with text.

11. IANA Considerations

The items for indication and negotiation of capability for multi-party RTT should be registered with IANA in the specifications where they are specified in detail.

12. Congestion considerations

The congestion considerations described in RFC 4103 [RFC4103] are valid also for the recommended RTP-based multi-party use of the real-time text transport. A risk for congestion may appear if a number of conference participants are actively transmitting text simultaneously, because the recommended RTP-based multi-party transmission method does not allow multiple sources of text to contribute to the same packet.

In situations of risk for congestion, the Focus UA MAY combine packets from the same source to increase the transmission interval per source up to one second. Local conference policy in the Focus UA may be used to decide which streams shall be selected for such transmission frequency reduction.

13. Acknowledgements

Arnoud van Wijk for contributions to an earlier, expired draft of this memo.

14. Change history

14.1. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-07

Adjustment of section 4.1.1.4 to match the specification in draft-ietf-avtcore-multi-party-rtt-mix-20.

14.2. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-06

Addition of the Selective Forwarding Middlebox SFM among the methods with multiple RTP streams, to match the contents of draft-ietf-avtcore-multi-party-rtt-mix-12.

14.3.
Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-05

Modify the solution changing source in every packet in the RTP-mixer solution, and base recovery on analyzing timestamp and make it the recommended one. Aligned with the recommendation in draft-ietf-avtcore-multi-party-rtt-mix-10.

14.4. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-04

Change name of simplified SDP attribute to "rtt-mix" to match a change in the draft draft-ietf-avtcore-multi-party-rtt-mix-09.

14.5. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-03

Modified info on the method with RFC 4103 format and SDP attribute "rtt-mix-rtp-mixer".

Increased the performance requirements section.

Inserted recommendations, with emphasis on ease of implementation and ease of standardisation.

14.6. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-02

Added detail in the section on RTP translator model alternative 4.1.2.1.

14.7. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-01

Added three more methods for RTP-mixer mixing. Two RFC 5109 FEC based and another with modified data header to detect source of completely lost text.

Separated RTP-based and WebRTC based solutions.

Deleted the multi-party-unaware mixing procedure appendix. It is now included in the draft draft-ietf-avtcore-multi-party-rtt-mix. Kept a section with a reference to the new place.

14.8. Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to draft-hellstrom-avtcore-multi-party-rtt-solutions-00

Add discussion about switching performance, as discussed in avtcore on March 13.

Added that a decrease of transmission interval to 100 ms increases switching performance by a factor 3, but still not sufficient.

Added that the CSRC-list method also uses 100 milliseconds transmission interval.
Added the method with multiple primary text in each packet.

Added the timestamp-based method for rtp-mixing proposed by James Hamlin on March 14.

Corrected the chat style presentation example picture. Delete a few "[mix]".

14.9. Changes from version draft-hellstrom-mmusic-multi-party-rtt-01 to -02

Change from a general overview to overview with clear recommendations.

Splits text coordination methods in three groups.

Recommends rtt-mixer with sources in CSRC-list but refers to its spec for details.

Shortened Appendix with conference-unaware example.

Cleaned up preferences.

Inserted pictures of screen-views.

15. References

15.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

15.2. Informative References

[EN301549] ETSI, "EN 301 549. Accessibility requirements for ICT products and services", November 2019.

[I-D.ietf-avtcore-multi-party-rtt-mix] Hellstrom, G., "RTP-mixer formatting of multi-party Real-time text", Work in Progress, Internet-Draft, draft-ietf-avtcore-multi-party-rtt-mix-10, 18 November 2020.

[I-D.ietf-avtcore-multiplex-guidelines] Westerlund, M., Burman, B., Perkins, C., Alvestrand, H., and R. Even, "Guidelines for using the Multiplexing Features of RTP to Support Multiple Media Streams", Work in Progress, Internet-Draft, draft-ietf-avtcore-multiplex-guidelines-12, 16 June 2020.

[I-D.ietf-mmusic-t140-usage-data-channel] Holmberg, C. and G. Hellstrom, "T.140 Real-time Text Conversation over WebRTC Data Channels", Work in Progress, Internet-Draft, draft-ietf-mmusic-t140-usage-data-channel-14, 10 April 2020.

[I-D.ietf-perc-private-media-framework] Jones, P., Benham, D., and C.
Groves, "A Solution Framework for Private Media in Privacy Enhanced RTP Conferencing (PERC)", Work in Progress, Internet-Draft, draft-ietf-perc-private-media-framework-12, 5 June 2019.

[NENAi3] NENA, "NENA-STA-010.2-2016. Detailed Functional and Interface Standards for the NENA i3 Solution", October 2016.

[RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, DOI 10.17487/RFC2198, September 1997.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, DOI 10.17487/RFC3261, June 2002.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, DOI 10.17487/RFC3264, June 2002.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003.

[RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating User Agent Capabilities in the Session Initiation Protocol (SIP)", RFC 3840, DOI 10.17487/RFC3840, August 2004.

[RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller Preferences for the Session Initiation Protocol (SIP)", RFC 3841, DOI 10.17487/RFC3841, August 2004.

[RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005.

[RFC4353] Rosenberg, J., "A Framework for Conferencing with the Session Initiation Protocol (SIP)", RFC 4353, DOI 10.17487/RFC4353, February 2006.

[RFC4575] Rosenberg, J., Schulzrinne, H., and O.
Levin, Ed., "A Session Initiation Protocol (SIP) Event Package for Conference State", RFC 4575, DOI 10.17487/RFC4575, August 2006.

[RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol (SIP) Call Control - Conferencing for User Agents", BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006.

[RFC4597] Even, R. and N. Ismail, "Conferencing Scenarios", RFC 4597, DOI 10.17487/RFC4597, August 2006.

[RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error Correction", RFC 5109, DOI 10.17487/RFC5109, December 2007.

[RFC5194] van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008.

[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.

[RFC6443] Rosen, B., Schulzrinne, H., Polk, J., and A. Newton, "Framework for Emergency Calling Using Internet Multimedia", RFC 6443, DOI 10.17487/RFC6443, December 2011.

[RFC6881] Rosen, B. and J. Polk, "Best Current Practice for Communications Services in Support of Emergency Calling", BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013.

[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, DOI 10.17487/RFC7667, November 2015.

[T140] ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for multimedia application text conversation", February 1998.

[T140ad1] ITU-T, "Recommendation ITU-T T.140 Addendum 1 - (02/2000), Protocol for multimedia application text conversation", February 2000.

[TS103479] ETSI, "TS 103 479. Emergency communications (EMTEL); Core elements for network independent access to emergency services", December 2019.
[TS22173] 3GPP, "IP Multimedia Core Network Subsystem (IMS) Multimedia Telephony Service and supplementary services; Stage 1", 3GPP TS 22.173 17.1.0, 20 December 2019.

[TS24147] 3GPP, "Conferencing using the IP Multimedia (IM) Core Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0, 19 December 2019.

Author's Address

Gunnar Hellstrom
Gunnar Hellstrom Accessible Communication
SE-136 70 Vendelso
Sweden

Email: gunnar.hellstrom@ghaccess.se