Internet Engineering Task Force                             G. Hellstrom
Internet-Draft                 Gunnar Hellstrom Accessible Communication
Intended status: Informational                              15 June 2020
Expires: 17 December 2020

           Real-time text solutions for multi-party sessions
          draft-hellstrom-avtcore-multi-party-rtt-solutions-01

Abstract

   This document specifies methods for Real-Time Text (RTT) media
   handling in multi-party calls.  The main transport is to carry
   real-time text over RTP in a time-sampled mode according to RFC
   4103.  The mechanisms enable the receiving application to present
   the received real-time text media separated per source, in different
   ways according to user preferences.  Some presentation-related
   features are also described, explaining suitable variations of
   transmission and presentation of text.

   Call control features are described for the SIP environment.  A
   number of alternative methods for providing the multi-party
   negotiation, transmission and presentation are discussed, and a
   recommendation for the main ones is provided.  The main solution for
   SIP-based centralized multi-party handling of real-time text is
   achieved through a media control unit coordinating multiple RTP text
   streams into one RTP stream.

   Alternative methods using a single RTP stream and source
   identification inline in the text stream are also described, one of
   them being provided as a lower-functionality fallback method for
   endpoints with no multi-party awareness for RTT.

   Bridging methods where the text stream is carried without the
   contents being dealt with in detail by the bridge are also discussed.

   Brief information is also provided for multi-party RTT in the WebRTC
   environment.

   The intention is to provide background for decisions, specification
   and implementation of selected methods.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 17 December 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. Centralized conference model
   3. Requirements on multi-party RTT
   4. RTP based solutions
      4.1. Coordination of text RTP streams
         4.1.1. RTP-based solutions with a central mixer
            4.1.1.1. RTP Mixer using default RFC 4103 methods
            4.1.1.2. RTP Mixer using the default method but decreased
                     transmission interval
            4.1.1.3. RTP Mixer with frequent transmission and indicating
                     sources in CSRC-list
            4.1.1.4. RTP Mixer using timestamp to identify redundancy
            4.1.1.5. RTP Mixer with multiple primary data in each packet
                     and individual sequence numbers
            4.1.1.6. RTP Mixer with multiple primary data in each packet
            4.1.1.7. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                     in the packets
            4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy
                     and separate sequence number in the packets
            4.1.1.9. RTP Mixer indicating participants by a control code
                     in the stream
            4.1.1.10. Mixing for multi-party unaware user agents
         4.1.2. RTP-based bridging with minor RTT media contents
                reformatting by the bridge
            4.1.2.1. RTP Translator sending one RTT stream per
                     participant
            4.1.2.2. Distributing packets in an end-to-end encryption
                     structure
            4.1.2.3. Mesh of RTP endpoints
            4.1.2.4. Multiple RTP sessions, one for each participant
   5. Preferred RTP-based multi-party RTT transport method
   6. Session control of RTP-based multi-party RTT sessions
      6.1. Implicit RTT multi-party capability indication
      6.2. RTT multi-party capability declared by SIP media-tags
      6.3. SDP media attribute for RTT multi-party capability
           indication
      6.4. Simplified SDP media attribute for RTT multi-party
           capability indication
      6.5. SDP format parameter for RTT multi-party capability
           indication
      6.6. A text media subtype for support of multi-party rtt
      6.7. Preferred capability declaration method for RTP-based
           transport.
      6.8. Identification of the source of text for RTP-based
           solutions
   7. RTT bridging in WebRTC
      7.1. RTT bridging in WebRTC with one data channel per source
      7.2. RTT bridging in WebRTC with one common data channel
      7.3. Preferred rtt multi-party method for WebRTC
   8. Presentation of multi-party text
      8.1. Associating identities with text streams
      8.2. Presentation details for multi-party aware endpoints.
         8.2.1. Bubble style presentation
         8.2.2. Other presentation styles
   9. Presentation details for multi-party unaware endpoints.
   10. Security Considerations
   11. IANA Considerations
   12. Congestion considerations
   13. Acknowledgements
   14. Change history
      14.1. Changes to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-01
      14.2. Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to
            draft-hellstrom-avtcore-multi-party-rtt-solutions-00
      14.3. Changes from version
            draft-hellstrom-mmusic-multi-party-rtt-01 to -02
   15. References
      15.1. Normative References
      15.2. Informative References
   Author's Address

1. Introduction

   Real-time text (RTT) is a medium in real-time conversational
   sessions.  Text entered by participants in a session is transmitted
   in a time-sampled fashion, so that no specific user action is needed
   to cause transmission.  This gives a direct flow of text at the rate
   it is created, which is suitable in a real-time conversational
   setting.  The real-time text medium can be combined with other media
   in multimedia sessions.

   Media from a number of multimedia session participants can be
   combined in a multi-party session.  The present document specifies
   how the real-time text streams can be handled in multi-party
   sessions.  Recommendations are provided for preferred methods.

   The description is mainly focused on the transport level, but also
   describes a few session and presentation level aspects.

   Transport of real-time text is specified in RFC 4103 [RFC4103], "RTP
   Payload for Text Conversation".  It makes use of the Real-time
   Transport Protocol (RTP), RFC 3550 [RFC3550], for transport.
   Robustness against network transmission problems is normally achieved
   through redundant transmission based on the principle from RFC 2198
   [RFC2198], with one primary and two redundant transmissions of each
   text element.  Primary and redundant transmissions are combined in
   packets and described by a redundancy header.  This transport is
   usually used in the Session Initiation Protocol (SIP), RFC 3261
   [RFC3261], environment.

   A very brief overview of functions for real-time text handling in
   multi-party sessions is given in RFC 4597 [RFC4597], "Conferencing
   Scenarios", Sections 4.8 and 4.10.  The present specification builds
   on that description and indicates which protocol mechanisms should be
   used to implement multi-party handling of real-time text.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel].  Multi-party aspects for
   WebRTC solutions are briefly covered.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Centralized conference model

   In the centralized conference model for SIP, introduced in RFC 4353
   [RFC4353], "A Framework for Conferencing with the Session Initiation
   Protocol (SIP)", one function coordinates the communication with
   participants in the multi-party session.  This function also controls
   media mixer functions for the media appearing in the session.  The
   central function is common for control of all media, while the media
   mixers may work differently for each medium.

   The central function is called the Focus UA.  Many variants exist for
   setting up sessions including the multipoint control centre.  It is
   not within the scope of this document to describe these, but rather
   the media-specific handling in the mixer required to handle
   multi-party calls with RTT.

   The main principle for handling real-time text media in a centralized
   conference is that one RTP session for real-time text is established,
   including the multipoint media control centre and the participating
   endpoints that are going to exchange real-time text with the others.

   The different possible mechanisms for mixing and transporting RTT
   differ in the way they multiplex the text streams and how they
   identify the sources of the streams.  RFC 7667 [RFC7667] describes a
   number of possible use cases for RTP.  This specification refers to
   different sections of RFC 7667 for further reading about the
   situations caused by the different possible design choices.

   The recommended method for using RTT in a centralized conference
   model is specified in [I-D.ietf-avtcore-multi-party-rtt-mix] based on
   the recommendations in the present document.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel].  Ways to handle
   multi-party calls in that environment are also specified.

3. Requirements on multi-party RTT

   The following requirements are placed on multi-party RTT:

      A solution shall be applicable to IMS (3GPP TS 22.173) [TS22173],
      SIP-based VoIP and Next Generation Emergency Services (NENA i3
      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]).

      The transmission interval for text must not be longer than 500
      milliseconds when there is anything available to send.  Ref ITU-T
      T.140 [T140].

      If text loss is detected or suspected, a missing text marker shall
      be inserted in the text stream.  Ref ITU-T T.140 Amendment 1
      [T140ad1] and ETSI EN 301 549 [EN301549].

      The display of text from the members of the conversation shall be
      arranged so that the text from each participant is clearly
      readable, and its source and the relative timing of entered text
      are visualized in the display.  Mechanisms for looking back in the
      contents from the current session should be provided.  The text
      should be displayed as soon as it is received.  Ref ITU-T T.140
      [T140].

      Bridges must be multimedia capable (voice, video, text).  Ref NENA
      i3 STA-010.2 [NENAi3].

      R7: It MUST be possible to use real-time text in conferences both
      as a medium of discussion between individual participants (for
      example, for sidebar discussions in real-time text while listening
      to the main conference audio) and for central support of the
      conference with real-time text interpretation of speech.  Ref RFC
      5194 [RFC5194].

      It should be possible to protect RTT contents with usual means for
      privacy and integrity.  Ref RFC 6881, Section 16 [RFC6881].

      Conferencing procedures are documented in RFC 4579 [RFC4579].  Ref
      NENA i3 STA-010.2 [NENAi3].

      Conferencing applies to any kind of media stream by which users
      may want to communicate.  Ref 3GPP TS 24.147 [TS24147].

      The framework for SIP conferences is specified in RFC 4353
      [RFC4353].  Ref 3GPP TS 24.147 [TS24147].

      The mixer performance requirements can be expressed in two
      numbers.

      1) The number of participants who can transmit simultaneously
      without the text being delayed in the mixer by more than 500
      milliseconds.  This requirement depends on the application.  Five
      simultaneously transmitting participants is a sufficiently high
      number for most situations.

      2) The switching time from when the mixer is transmitting text
      from one participant and text arrives from another participant,
      until the mixer sends the text from the second participant.  This
      time should not be more than 500 milliseconds when there are up to
      five participants sending text simultaneously.

4. RTP based solutions

4.1. Coordination of text RTP streams

   Coordinating and sending text RTP streams in the multi-party session
   can be done in a number of ways.  The most suitable methods are
   specified here with pros and cons.

   A receiving and presenting endpoint MUST separate text from the
   different sources and identify and display them accordingly.

4.1.1. RTP-based solutions with a central mixer

   A set of solutions can be based on the central RTP mixer.  They are
   described here and a preferred method is selected.

4.1.1.1. RTP Mixer using default RFC 4103 methods

   Without any extra specifications, a mixer would transmit at 300
   millisecond intervals, and use RFC 4103 [RFC4103] with the default
   redundancy of one original and two redundant transmissions.  The
   source of the text would be indicated by a single member in the CSRC
   list.  Text from different sources cannot be transmitted in the same
   packet.  Therefore, from the time when the mixer sent one piece of
   new text from one source, it will need to transmit that text again
   twice as redundant data, before it can send text from another source.
   The switching time will thus be 900 milliseconds (three transmissions
   at 300 millisecond intervals).  The mixer cannot even send text from
   two simultaneous sources without introducing more than 500
   milliseconds delay.  This is clearly insufficient.

   Pros:

      Only a capability negotiation method is needed.  No other updates
      of standards are needed, just a general remark that traditional
      RTP mixing is used.

   Cons:

      Clearly insufficient mixer switching performance.

      Somewhat complex handling of transmission when there is new text
      available from more than one source.  The mixer needs to send two
      more packets with redundant text from the current source before
      starting to send anything from the other source.

4.1.1.2. RTP Mixer using the default method but decreased transmission
         interval

   This method makes use of the default RTP-mixing method briefly
   described in Section 4.1.1.1.  The only difference is that the
   transmission interval is decreased to 100 milliseconds when there is
   text from more than one source available for transmission.  This
   increases the switching performance to three source switches per
   second.  The delay of new text from a participant can be one second
   if five users send new text simultaneously.  Text from two
   simultaneous users would not be delayed by more than 400 ms.

   Pros:

      Minor influence on standards.

      Can be declared in SDP as "text/red" with a multi-party attribute
      for capability negotiation.

   Cons:

      Too long delay of new text from more than two simultaneous
      sources.

      Slightly higher risk of loss of text at bursty packet loss than
      with the recommended transmission interval (300 ms) for RFC 4103.

      When complete loss of packets occurs (beyond recovery), it is not
      possible to deduce from which source text was lost.

      Somewhat complex handling of transmission when there is new text
      available from more than one source.  The mixer needs to send two
      more packets with redundant text from the current source before
      starting to send anything from the other source.

4.1.1.3. RTP Mixer with frequent transmission and indicating sources in
         CSRC-list

   An RTP media mixer combines text from participants into one RTP
   stream, thus all using the same destination address/port combination,
   the same RTP SSRC, and one sequence number series, as described in
   Sections 7.1 and 7.3 of RFC 3550 [RFC3550] about the Mixer function.
   This method is also briefly described in RFC 7667 [RFC7667], Section
   3.6.1, "Media-Mixing Mixer".

   The sources of the text in each RTP packet are identified by the CSRC
   list in the RTP packets, containing the SSRC of the initial sources
   of text.  The CSRC members are ordered with the SSRC of the source of
   the primary text first, followed by the SSRC of the first level
   redundancy, and then the second level redundancy.

   The transmission interval should be 100 milliseconds when there is
   text to transmit from more than one source, and otherwise 300 ms.
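
   The CSRC ordering and the interval rule described for this method can
   be sketched as follows.  This is only an illustrative sketch: the
   function names and constants are hypothetical and are not defined by
   RFC 4103 or by this document.

```python
from typing import List

# Intervals taken from the text of this section (values in milliseconds).
MULTI_SOURCE_INTERVAL_MS = 100
SINGLE_SOURCE_INTERVAL_MS = 300


def transmission_interval_ms(sources_with_text: int) -> int:
    """Return 100 ms when more than one source has text to send, else 300 ms."""
    if sources_with_text > 1:
        return MULTI_SOURCE_INTERVAL_MS
    return SINGLE_SOURCE_INTERVAL_MS


def build_csrc_list(primary_ssrc: int, redundant_ssrcs: List[int]) -> List[int]:
    """Order the CSRC list as described above.

    primary_ssrc: SSRC of the source of the primary text.
    redundant_ssrcs: SSRCs of the redundant generations, first level
    first, then second level (hypothetical parameter layout).
    """
    return [primary_ssrc] + list(redundant_ssrcs)
```

   For example, a packet with primary text from source 0x11 and
   redundant text from 0x22 (first level) and 0x33 (second level) would
   carry the CSRC list [0x11, 0x22, 0x33] under this sketch.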

   The identification of the sources is made through the CSRC fields and
   can be made more readable at the receiver through the RTCP SDES CNAME
   and NAME items as described in RFC 3550 [RFC3550].

   Information provided through the notification according to RFC 4575
   [RFC4575] when the participant joined the conference also provides
   suitable information and a reference to the SSRC.

   A receiving endpoint is supposed to separate text items from the
   different sources and identify and display them accordingly.

   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
   possible to recover from loss of one or two packets in sequence and
   assign the recovered text to the right source.  For more loss, a
   marker for possible loss should be inserted or presented.

   The conference server needs to have authority to decrypt the payload
   in the received RTP packets in order to be able to recover text from
   redundant data or insert the missing text marker in the stream, and
   repack the text in new packets.

   Even if the format is very similar to "text/red" of RFC 4103, it has
   been indicated that it needs to be declared as a new media subtype,
   e.g. "text/rex".

   Pros:

      This method has low overhead and less complexity than the methods
      in Section 4.1.1.1, Section 4.1.1.2, Section 4.1.1.4 and
      Section 4.1.1.6.

      When loss of packets occurs, it is possible to recover text from
      redundancy at loss of up to the number of redundancy levels
      carried in the RFC 4103 [RFC4103] stream (normally primary and two
      redundant levels).

      This method can be implemented with most RTP implementations.

      The source switching performance is sufficient for well-behaving
      conference participants.  There can be switching between five
      sources per second with an introduced delay of maximum 500 ms.
      With just two parties typing simultaneously, the delay will be a
      maximum of 100 ms.

   Cons:

      When more consecutive packets are lost than the number of
      generations of redundant data, it is not possible to deduce the
      sources of the totally lost data.

      Slightly higher risk of loss of text at bursty packet loss than
      with the recommended transmission interval for RFC 4103.

      Requires a different media subtype, e.g. "text/rex".

      The conference server needs to be allowed to decrypt/encrypt the
      packet payload.  This is however normal for media mixers for other
      media.

4.1.1.4. RTP Mixer using timestamp to identify redundancy

   This method has text only from one source per packet, as the original
   RFC 4103 [RFC4103] specifies.  Packets with text from different
   sources are instead allowed to be merged.  The recovery procedure in
   the receiver will use the RTP timestamp and timestamp offsets in the
   redundancy headers to evaluate if a piece of redundant data should be
   recovered or not in case of packet loss.

   In this method, the transmission interval is 100 milliseconds when
   text from more than one source is available for transmission.

   Pros:

      The format of each packet is equal to what is specified in RFC
      4103 [RFC4103].

      The source switching performance is sufficient.  Text from five
      participants can be transmitted simultaneously with 500
      milliseconds interval per source.

      New text from five simultaneous sources can be transmitted within
      500 milliseconds.  This is sufficient.

   Cons:

      The recovery time in case of packet loss is long.  With five
      participants, it will be 1.5 seconds.

      The recovery procedure is complex and very different from what is
      described in RFC 4103 [RFC4103].

      It is not certain that this change can be regarded as an update to
      RFC 4103.  It may need a new media subtype.

4.1.1.5. RTP Mixer with multiple primary data in each packet and
         individual sequence numbers

   This method allows primary as well as redundant text from more than
   one source per packet.  The packet payload contains an ordered set of
   redundant and primary data with the same number of generations of
   redundancy as agreed in the SDP negotiation.  The data header
   reflects these parts of the payload.  The CSRC list contains one CSRC
   member per source in the payload and in the same order.  An
   individual sequence number per source is included in the data header,
   replacing the t140 payload type number, which is instead assumed to
   be constant in this format.  This allows an individual extra sequence
   number per source with maximum value 127, suitable for checking for
   which source loss of text appeared when recovery was not possible.

   The data header would contain the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F| Source-seq  |  timestamp offset         |   block length    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Where "Source-seq" is the sequence number per source.

   The maximum number of members in the CSRC list is 15, since the CC
   field in the RTP header is four bits wide [RFC3550].  That is
   therefore the maximum number of sources that can be represented in
   each packet, provided that all data can be fitted into the size
   allowable in one packet.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".
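
   The data header described in this section can be illustrated with a
   short Python sketch.  It assumes the field widths of the RFC 2198
   redundancy header (1-bit F, a 7-bit field in the position of the
   payload type, 14-bit timestamp offset, 10-bit block length), which is
   consistent with the maximum Source-seq value of 127 stated above.
   The function names are hypothetical.

```python
import struct
from typing import Tuple


def pack_data_header(f: bool, source_seq: int, ts_offset: int, block_len: int) -> bytes:
    """Pack one 32-bit data header in network byte order.

    Assumed layout: F (1 bit), Source-seq (7 bits),
    timestamp offset (14 bits), block length (10 bits).
    """
    if not 0 <= source_seq <= 0x7F:
        raise ValueError("Source-seq must fit in 7 bits (0..127)")
    if not 0 <= ts_offset <= 0x3FFF:
        raise ValueError("timestamp offset must fit in 14 bits")
    if not 0 <= block_len <= 0x3FF:
        raise ValueError("block length must fit in 10 bits")
    word = (int(f) << 31) | (source_seq << 24) | (ts_offset << 10) | block_len
    return struct.pack("!I", word)


def unpack_data_header(data: bytes) -> Tuple[bool, int, int, int]:
    """Inverse of pack_data_header; reads the first four bytes of data."""
    (word,) = struct.unpack("!I", data[:4])
    return (
        bool(word >> 31),
        (word >> 24) & 0x7F,
        (word >> 10) & 0x3FFF,
        word & 0x3FF,
    )
```

   A receiver following this sketch would read one such header per data
   block and pair it with the CSRC member at the same position in the
   CSRC list.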

   This is an SDP offer example for both the traditional "text/red"
   format and the multi-party "text/rex" format:

      m=text 11000 RTP/AVP 101 100 98
      a=rtpmap:98 t140/1000
      a=rtpmap:100 red/1000
      a=rtpmap:101 rex/1000
      a=fmtp:100 98/98/98
      a=fmtp:101 98/98/98

   Pros:

      The source switching performance is good.  Text from 15
      participants can be transmitted simultaneously.

      New text from 15 simultaneous sources can be transmitted within
      300 milliseconds.  This is good performance.

      When more consecutive packets are lost than the number of
      generations of redundant data, it is still possible to deduce the
      sources of the totally lost data when the next text from these
      sources arrives.

   Cons:

      The format of each packet is different from what is specified in
      RFC 4103 [RFC4103].

      A new media subtype is needed.

      The recovery procedure is somewhat complex.

4.1.1.6. RTP Mixer with multiple primary data in each packet

   This method allows primary as well as redundant text from more than
   one source per packet.  The packet payload contains an ordered set of
   redundant and primary data with the same number of generations of
   redundancy as agreed in the SDP negotiation.  The data header
   reflects these parts of the payload.  The CSRC list contains one CSRC
   member per source in the payload and in the same order.

   The maximum number of members in the CSRC list is 15, since the CC
   field in the RTP header is four bits wide [RFC3550].  That is
   therefore the maximum number of sources that can be represented in
   each packet, provided that all data can be fitted into the size
   allowable in one packet.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 150 ms and not longer than 300 ms while
   there is anything to send.

   A new media subtype is needed, e.g. "text/rex".

   SDP would be the same as in Section 4.1.1.5.

   Pros:

      The source switching performance is good.  Text from 15
      participants can be transmitted simultaneously.

      New text from 15 simultaneous sources can be transmitted within
      150 milliseconds.  This is good performance.

   Cons:

      The format of each packet is different from what is specified in
      RFC 4103 [RFC4103].

      A new media subtype is needed.

      The recovery procedure is somewhat complex [RFC4103].

      When more consecutive packets are lost than the number of
      generations of redundant data, it is not possible to deduce the
      sources of the totally lost data.

4.1.1.7. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy in the
         packets

   This method allows primary data from one source and redundant text
   from other sources in each packet.  The packet payload contains
   primary data in "text/t140" format, and redundant data in the RFC
   5109 FEC [RFC5109] format called "text/ulpfec".  That means that the
   redundant data contains the sequence number, the CSRC and other
   characteristics from the RTP header when the data was sent as
   primary.  The redundancy can be sent a selected number of packets
   after the data was sent as primary, in order to improve the
   protection against bursty packet loss.  The redundancy level is
   recommended to be the same as in original RFC 4103.

   RFC 4103 says that the protection against loss can be made by other
   methods than plain redundancy, so this method is in line with that
   statement.

   Transmission is done as soon as there is new text available, but not
   with shorter interval than 100 ms and not longer than 300 ms while
   there is anything to send (new or redundant text).

   When more consecutive packets are lost than the number of generations
   of redundant data, it is not possible to deduce the sources of the
   totally lost data.

   The SDP can indicate the format as "text/red" with "text/ulpfec"
   redundant data in the following way, with traditional RFC 4103
   "text/red" with "text/t140" as redundant data as a fallback:

      m=text 49170 RTP/AVP 98 101 100 102
      a=rtpmap:98 red/1000
      a=fmtp:98 100/102/102
      a=rtpmap:102 ulpfec/1000
      a=rtpmap:100 t140/1000
      a=rtpmap:101 red/1000
      a=fmtp:101 100/100/100
      a=fmtp:100 cps=200

   The "text/ulpfec" format includes an indication of how far back the
   redundancy belongs, making it possible to cover bursty packet loss
   better than the other formats with short transmission intervals.  For
   real-time text, it is recommended to send three packets between the
   primary and the redundant transmissions of text.  That makes the
   transmission cover between 500 and 1500 ms of bursty packet loss.
   The variation is because of the varying packet interval between many
   and one simultaneously transmitting source.

   The "text/ulpfec" format has a number of parameters.  One is the
   length of the data to be protected, which in this case must be the
   whole t140block.

   Pros:

      The source switching performance is good.  Text from 5
      participants can be transmitted within 500 ms.

      Good recovery from bursty packet loss.

      The method is based on existing standards.  No new registrations
      are needed.

   Cons:

      When more consecutive packets are lost than the number of
      generations of redundant data, it is not possible to deduce the
      sources of the totally lost data.

      Even if the switching performance is good, it is not as good as
      for the method called "RTP Mixer with multiple primary data in
      each packet" (Section 4.1.1.6).  With more than 5 simultaneously
      sending sources, there will be a noticeable delay of text of over
      500 ms, with 100 ms added per simultaneous source.  This is
      however beyond the requirements and would be a concern only in
      congestion situations.

      The recovery procedure is somewhat complex [RFC5109].

There is more overhead in terms of extra data and extra packets sent than in the other methods. With the recommended two redundant generations of data, each packet will be 36 bytes longer than with traditional RFC 4103, and at each pause in transmission five extra packets with only redundant data will be sent, compared to two extra packets for the traditional RFC 4103 case.

4.1.1.8. RTP Mixer with RFC 5109 FEC and RFC 2198 redundancy and separate sequence number in the packets

This method allows primary data from one source and redundant text from other sources in each packet. The packet payload contains primary data in a new "text/t140e" format, and redundant data in RFC 5109 FEC [RFC5109] format called "text/ulpfec". That means that the redundant data contains the sequence number, the CSRC and other characteristics from the RTP header of the packet in which the data was sent as primary. The redundancy can be sent a selected number of packets after the data was sent as primary, in order to improve the protection against bursty packet loss. The redundancy level is recommended to be the same as in original RFC 4103. The "text/t140e" format contains a source-specific sequence number and the t140block.

RFC 4103 says that the protection against loss can be made by other methods than plain redundancy, so this method is in line with that statement.

Transmission is done as soon as there is new text available, but not with a shorter interval than 100 ms and not a longer interval than 300 ms while there is anything to send (new or redundant text).

When more consecutive packets are lost than the number of generations of redundant data, it is possible to deduce which sources lost data when new data arrives from those sources. This is done by monitoring the received source-specific sequence numbers preceding the text.
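
The monitoring of source-specific sequence numbers described above can be sketched as follows. This is only an illustration under stated assumptions: the draft does not define the width of the "text/t140e" sequence number, so a 16-bit counter is assumed here, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SourceLossTracker:
    """Track the last source-specific sequence number seen per source."""

    modulus: int = 65536  # assumption: 16-bit counter, like the RTP sequence number
    last_seen: dict = field(default_factory=dict)

    def observe(self, source_id: int, seq: int) -> int:
        """Record a received text block and return how many blocks from
        this source were lost since the previous one (0 if none)."""
        previous = self.last_seen.get(source_id)
        self.last_seen[source_id] = seq
        if previous is None:
            return 0  # first block from this source: nothing to compare with
        # The modular distance handles wrap-around of the counter.
        gap = (seq - previous) % self.modulus
        return max(gap - 1, 0)


tracker = SourceLossTracker()
tracker.observe(0x1111, 10)
tracker.observe(0x2222, 7)
lost = tracker.observe(0x2222, 10)  # blocks 8 and 9 from source 0x2222 are missing
```

A receiver could insert a mark for missing text in the presentation area of exactly the source for which a non-zero value is returned. Reordered or duplicated blocks are not handled in this sketch.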

This is an example of how SDP can indicate the format as "text/red" with "text/t140e" as primary and "text/ulpfec" redundant data, with traditional RFC 4103 "text/red" with "text/t140" as redundant data as a fallback.

   m=text 49170 RTP/AVP 98 101 100 102 103
   a=rtpmap:98 red/1000
   a=fmtp:98 100/102/102
   a=rtpmap:102 ulpfec/1000
   a=rtpmap:103 t140/1000
   a=rtpmap:100 t140e/1000
   a=rtpmap:101 red/1000
   a=fmtp:101 103/103/103
   a=fmtp:100 cps=200

The "text/ulpfec" format includes an indication of how far back the redundancy belongs, making it possible to cover bursty packet loss better than the other formats with short transmission intervals. For real-time text, it is recommended to send three packets between the primary and the redundant transmissions of text. That makes the transmission cover between 500 and 1500 ms of bursty packet loss. The variation is because of the varying packet interval between many and one simultaneously transmitting source.

The "text/ulpfec" format has a number of parameters. One is the length of the data to be protected, which in this case must be the whole t140block.

Pros:

The source switching performance is good. Text from 5 participants can be transmitted within 500 ms.

Good recovery from bursty packet loss.

The method is based on an existing standard for FEC.

When more consecutive packets are lost than the number of generations of redundant data, it is possible to deduce the source of the lost data when new text arrives from the source.

Cons:

Even if the switching performance is good, it is not as good as for the method called "RTP Mixer with multiple primary data in each packet" (Section 4.1.1.6). With more than 5 simultaneously sending sources, there will be a noticeable delay of text of over 500 ms, with 100 ms added per simultaneous source.
This is however beyond the requirements and would be a concern only in congestion situations.

The recovery procedure is a bit complex [RFC5109].

There is more overhead in terms of extra data and extra packets sent than in the other methods. With the recommended two redundant generations of data, each packet will be 40 bytes longer than with traditional RFC 4103, and at each pause in transmission five extra packets with only redundant data will be sent, compared to two extra packets for the traditional RFC 4103 case.

A new text media subtype "text/t140e" needs to be registered.

4.1.1.9. RTP Mixer indicating participants by a control code in the stream

Text from all participants except the receiving one is transmitted from the media mixer in the same RTP session and stream, thus all using the same destination address/port combination, the same RTP SSRC, and one sequence number series as described in Section 7.1 and 7.3 of RTP RFC 3550 [RFC3550] about the Mixer function. The sources of the text in each RTP packet are identified by a newly defined T.140 control code "c" followed by a unique identification of the source in UTF-8 string format.

The receiver can use the string for presenting the source of text. This method is on the RTP level described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667].

The inline coding of the source of text is applied in the data stream itself, and an RTP mixer function is used for coordinating the sources of text into one RTP stream.

Information uniquely identifying each user in the multi-party session is placed as the parameter value "n" in the T.140 application protocol function with the function code "c". The identifier shall thus be formatted like this: SOS c n ST, where SOS and ST are coded as specified in ITU-T T.140 [T140]. The "c" is the letter "c".
The n parameter value is a string uniquely identifying the source. This parameter shall be kept short so that it can be repeated in the transmission without concerns for network load.

A receiving endpoint is supposed to separate text items from the different sources and identify and display them accordingly.

The conference server needs to be allowed to decrypt/encrypt the packet payload in order to check the source and repack the text.

Pros:

If loss of packets occurs, it is possible to recover text from redundancy at loss of up to the number of redundancy levels carried in the RFC 4103 [RFC4103] stream (normally primary and two redundant levels).

This method can be implemented with most RTP implementations.

The method can also be used with other transports than RTP.

Cons:

The method implies a moderate load by the need to insert the source often in the stream.

If more consecutive packets are lost than the number of generations of redundant data, it is not possible to deduce the source of the totally lost data.

The mixer needs to be able to generate unique source identifications which are suitable as labels for the sources.

Requires an extension of the ITU-T T.140 standard, best made by the ITU.

There is a risk that the control code indicating the change of source is lost and the result is false source indication of text.

The conference server needs to be allowed to decrypt/encrypt the packet payload.

4.1.1.10. Mixing for multi-party unaware user agents

Multi-party real-time text contents can be transmitted to multi-party unaware user agents if source labelling and formatting of the text is performed by a mixer.
This method has the limitations that the layout of the presentation and the format of source identification are purely controlled by the mixer, and that only one source at a time is allowed to present in real-time. Text from other sources needs to be stored temporarily, waiting for an appropriate moment to switch the source of transmitted text. The mixer controls the switching of sources and inserts a source identifier in text format at the beginning of text after a switch of source. The logic of the mixer to detect when a switch is appropriate should detect a number of places in text where a switch can be allowed, including new line, end of sentence, end of phrase, a period of inactivity, and a word separator after a long time of active transmission.

This method MAY be used when no support for multi-party awareness is detected in the receiving endpoint. The base for this method is described in RFC 7667, section 3.6.1 Media mixing mixer [RFC7667].

See [I-D.ietf-avtcore-multi-party-rtt-mix] for a procedure for mixing RTT for a conference-unaware endpoint.

Pros:

Can be transmitted to conference-unaware endpoints.

Can be used with other transports than RTP.

Cons:

Does not allow full real-time presentation of more than one source at a time. Text from other sources will be delayed.

The only realistic presentation format is a style with the text from the different sources presented with a text label indicating source, and the text collected in a chat style presentation but with more frequent turn-taking.

Endpoints often have their own system for adding labels to the RTT presentation. In that case there will be two levels of labels in the presentation, one for the mixer and one for the sources.

If loss of more packets than can be recovered by the redundancy appears, it is not possible to detect which source was struck by the loss.
It is also possible that a source switch occurred during the loss, and therefore a false indication of the source of text can be provided to the user after such loss.

Because of all these cons, this method is not recommended and MUST NOT be used as the main method, but only as the last resort for backwards interoperability with multi-party unaware endpoints.

The conference server needs to be allowed to decrypt/encrypt the packet payload.

4.1.2. RTP-based bridging with minor RTT media contents reformatting by the bridge

It may be desirable to send text in a multi-party setting in a way that allows the text stream contents to be distributed without being dealt with in detail in any central server. A number of such methods are described. However, when writing this specification, none of these methods has a specified way of establishing the session by SDP.

4.1.2.1. RTP Translator sending one RTT stream per participant

Within the RTP session, text from each participant is transmitted from the RTP media translator in a separate RTP stream, thus using the same destination address/port combination, but separate RTP SSRC parameters and sequence number series as described in Section 7.1 and 7.2 of RTP RFC 3550 [RFC3550] about the Translator function. The source of the text in each RTP packet is identified by the SSRC parameter in the RTP packets, containing the SSRC of the initial source of text.

A receiving and presenting endpoint is supposed to separate text items from the different sources and identify and display them in a suitable way.

This method is described in RFC 7667, section 3.5.1 Relay-transport translator or 3.5.2 Media translator [RFC7667].

The identification of the source is made through the SSRC and the RTCP SDES CNAME and NAME packets as described in RTP [RFC3550].
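
The receiver-side separation by SSRC described above can be sketched as follows. This is a simplified illustration that assumes RTP and RTCP parsing, redundancy recovery and T.140 decoding have already been done elsewhere; the class and method names are hypothetical and are not taken from any specification.

```python
from collections import defaultdict


class TranslatorReceiver:
    """Keep one text buffer per SSRC and label it from RTCP SDES items."""

    def __init__(self):
        self.text_by_ssrc = defaultdict(str)
        self.name_by_ssrc = {}

    def on_sdes(self, ssrc, cname, name=None):
        # Prefer the human-readable NAME item, fall back to the CNAME.
        self.name_by_ssrc[ssrc] = name or cname

    def on_text(self, ssrc, text):
        # Text from each source is appended to its own buffer.
        self.text_by_ssrc[ssrc] += text

    def label(self, ssrc):
        # Placeholder label until SDES information has arrived.
        return self.name_by_ssrc.get(ssrc, f"source-{ssrc:08x}")

    def transcript(self):
        return {self.label(s): t for s, t in self.text_by_ssrc.items()}


receiver = TranslatorReceiver()
receiver.on_sdes(0xA, "alice@host.example", "Alice")
receiver.on_text(0xA, "Hel")
receiver.on_text(0xB, "Hi")
receiver.on_text(0xA, "lo")
```

Because every source keeps its own SSRC and sequence number series, a detected loss can be marked in the buffer of the affected source only.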

Pros:

This method has moderate overhead in terms of work for the mixer, but high in terms of packet transmission rate. When loss of packets occurs, it is possible to recover text from redundancy at loss of up to the number of redundancy levels carried in the RFC 4103 [RFC4103] stream (normally primary and two redundant levels).

More loss than what can be recovered can be detected, and the marker for text loss can be inserted in the correct stream.

It may be possible in some scenarios to keep the text encrypted through the Translator.

Cons:

There may be RTP implementations not supporting the Translator model.

With many simultaneous sending sources, the total rate of packets will be high, and can cause congestion.

This configuration is not supported by current media declarations in SDP. RFC 3264 [RFC3264] specifies in many places that one media description is supposed to describe just one RTP stream.

4.1.2.2. Distributing packets in an end-to-end encryption structure

In order to achieve end-to-end encryption, it is possible to let the packets from the sources just pass through a central distributor, and handle the security agreements between the participants. Specifications exist for a framework with this functionality for application on RTP based conferences in [I-D.ietf-perc-private-media-framework]. The RTP flow and mixing characteristics have similarities with the method described under "RTP Translator sending one RTT stream per participant" above. RFC 4103 RTP streams [RFC4103] would fit into the structure and it would provide a base for end-to-end encrypted RTT multi-party conferencing.

Pros:

Good security.

Straightforward multi-party handling.

Cons:

Does not operate under the usual SIP central conferencing architecture.

Requires the participants to perform a lot of key handling.

Is work in progress when this is written.

4.1.2.3. Mesh of RTP endpoints

Text from all participants is transmitted directly to all others in one RTP session, without a central bridge. The sources of the text in each RTP packet are identified by the source network address and the SSRC.

This method is described in RFC 7667, section 3.4 Point to multi-point using mesh [RFC7667].

Pros:

When loss of packets occurs, it is possible to recover text from redundancy at loss of up to the number of redundancy levels carried in the RFC 4103 [RFC4103] stream (normally primary and two redundant levels).

This method can be implemented with most RTP implementations.

Transmitted text can also be used with other transports than RTP.

Cons:

This model is not described in IMS, NENA and EENA specifications, and does therefore not meet the requirements.

Requires a drastically increasing number of connections when the number of participants increases.

4.1.2.4. Multiple RTP sessions, one for each participant

Text from all participants is transmitted directly to all others in one RTP session each, without a central bridge. Each session is established with a separate media description in SDP. The sources of the text in each RTP packet are identified by the source network address and the SSRC.

Pros:

When loss of packets occurs, it is possible to recover text from redundancy at loss of up to the number of redundancy levels carried in the RFC 4103 [RFC4103] stream (normally primary and two redundant levels).

Complete loss of text can be indicated in the received stream.

This method can be implemented with most RTP implementations.

End-to-end encryption is achievable.

Cons:

This method is not described in IMS, NENA and ETSI specifications and does therefore not meet the requirements.

A lot of network resources are spent on setting up separate sessions for each participant.

5. Preferred RTP-based multi-party RTT transport method

For RTP transport of RTT using RTP-mixer technology, one method for multi-party mixing and transport stands out as fulfilling the goals best and is therefore recommended. That is: TBD

For RTP transport in separate streams or sessions, no current recommendation can be made. A bridging method in the process of standardisation with interesting characteristics is the end-to-end encryption model "perc" (Section 4.1.2.2).

6. Session control of RTP-based multi-party RTT sessions

General session control aspects for multi-party sessions are described in RFC 4575 [RFC4575] A Session Initiation Protocol (SIP) Event Package for Conference State, and RFC 4579 [RFC4579] Session Initiation Protocol (SIP) Call Control - Conferencing for User Agents. The nomenclature of these specifications is used here.

The procedures for a multi-party aware model for RTT transmission shall only be applied if a capability exchange for multi-party aware real-time text transmission has been completed and a supported method for multi-party real-time text transmission can be negotiated.

A method for detection of conference-awareness for centralized SIP conferencing in general is specified in RFC 4579 [RFC4579]. The focus sends the "isfocus" feature tag in a SIP Contact header. This causes the conference-aware endpoint to subscribe to conference notifications from the focus. The focus then sends notifications to the endpoint about entering and disappearing conference participants and their media capabilities. The information is carried XML-formatted in a 'conference-info' block in the notification according to RFC 4575 [RFC4575]. The mechanism is described in detail in RFC 4575 [RFC4575].
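
As an illustration of how an endpoint might read the media capabilities carried in such a notification, the following sketch lists the participants whose endpoints declare a media stream of type "text". The element names and the namespace follow the RFC 4575 conference-info schema; the function name, the addresses and the sample document are made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"ci": "urn:ietf:params:xml:ns:conference-info"}


def participants_with_text(conference_info_xml: str) -> list:
    """Return the entity URIs of users that have at least one text media."""
    root = ET.fromstring(conference_info_xml)
    result = []
    for user in root.findall("ci:users/ci:user", NS):
        # Collect the <type> of every <media> under every <endpoint>.
        types = {
            (t.text or "").strip()
            for t in user.findall("ci:endpoint/ci:media/ci:type", NS)
        }
        if "text" in types:
            result.append(user.get("entity"))
    return result


# Hypothetical notification body, reduced to the parts used here.
EXAMPLE = """<conference-info xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sip:conf@example.com" state="full" version="1">
  <users>
    <user entity="sip:alice@example.com" state="full">
      <endpoint entity="sip:alice@pc.example.com">
        <media id="1"><type>audio</type></media>
        <media id="2"><type>text</type></media>
      </endpoint>
    </user>
    <user entity="sip:bob@example.com" state="full">
      <endpoint entity="sip:bob@pc.example.com">
        <media id="1"><type>audio</type></media>
      </endpoint>
    </user>
  </users>
</conference-info>"""

text_users = participants_with_text(EXAMPLE)
```

The same information could serve as a fallback source of labels for received text when RTCP SDES information is missing.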

Before a conference media server starts sending multi-party RTT to an endpoint, a verification of its ability to handle multi-party RTT must be made. A decision on which mechanism to use for identifying text from the different participants must also be taken, implicitly or explicitly. These verifications and decisions can be done in a number of ways. The most apparent ways are specified here and their pros and cons described. One of the methods is selected to be the one to be used by implementations of the centralized conference model according to this specification.

6.1. Implicit RTT multi-party capability indication

Capability for RTT multi-party handling can be decided to be implicitly indicated by session control items.

The focus may implicitly indicate multi-party RTT capability by including the media child with value "text" in the RFC 4575 [RFC4575] conference-info provided in conference notifications.

An endpoint may implicitly indicate multi-party RTT capability by including the text media in the SDP in the session control transactions with the conference focus after the subscription to the conference has taken place.

The implicit RTT capability indication means for the focus that it can handle multi-party RTT according to the preferred method indicated in the RTT multi-party methods section above.

The implicit RTT capability indication means for the endpoint that it can handle multi-party RTT according to the preferred method indicated in the RTT multi-party methods section above.

If the focus detects that an endpoint implicitly declared RTT multi-party capability, it SHALL provide RTT according to the preferred method.

If the focus detects that the endpoint does not indicate any RTT multi-party capability, then it shall either provide RTT multi-party text in the way specified for conference-unaware endpoints above, or refuse to set up the session.

If the endpoint detects that the focus has implicitly declared RTT multi-party capability, it shall be prepared to present RTT in a multi-party fashion according to the preferred method.

Pros:

Acceptance of implicit multi-party capability implies that no standardisation of explicit RTT multi-party capability exchange is required.

Cons:

If other methods for multi-party RTT are to be used in the same implementation environment as the preferred ones, then capability exchange needs to be defined for them.

Cannot be used outside a strictly applied SIP central conference model.

6.2. RTT multi-party capability declared by SIP media-tags

Specifications for RTT multi-party capability declarations can be agreed for use as SIP media feature tags, to be exchanged during SIP call control operation according to the mechanisms in RFC 3840 [RFC3840] and RFC 3841 [RFC3841]. RTT multi-party capability is then indicated by the media feature tag "rtt-mix", with a set of possible values for the different possible methods.

The possible values in the list may for example be:

   rtp-mixer

   perc

rtp-mixer indicates capability for using the RTP-mixer based presentation of multi-party text.

perc indicates capability for using the perc based transmission of multi-party text.

Example:

   Contact:
    ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL"
    ;+sip.rtt-mix="rtp-mixer"

If, after evaluation of the alternatives in this specification, only one mixing method is selected to be brought to implementation, then the media tag can be reduced to a single tag with no list of values.
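
A receiver of such a Contact header could extract the proposed "+sip.rtt-mix" values roughly as in the sketch below. The tag is only a proposal in this document and is not registered, the helper names and the sample URI are hypothetical, and the parsing is deliberately simplified compared to the full SIP grammar (for instance, escaped quotes inside quoted strings are not handled).

```python
import re

# One header parameter: ;name, ;name=token or ;name="quoted string"
_PARAM = re.compile(r';\s*([^=;\s]+)(?:\s*=\s*(?:"([^"]*)"|([^;\s]+)))?')


def contact_params(contact_value: str) -> dict:
    """Return the header parameters that follow the <URI> part."""
    # Drop the name-addr part so that URI parameters inside <...> are ignored.
    _, _, tail = contact_value.rpartition(">")
    params = {}
    for name, quoted, token in _PARAM.findall(tail):
        params[name.lower()] = quoted or token
    return params


def rtt_mix_methods(contact_value: str) -> list:
    """Return the list of values of the proposed +sip.rtt-mix feature tag."""
    value = contact_params(contact_value).get("+sip.rtt-mix", "")
    return [v.strip() for v in value.split(",") if v.strip()]


# Hypothetical header value; the URI is invented for the example.
contact = ('<sip:focus@example.com>;isfocus'
           ';methods="INVITE,ACK,OPTIONS,BYE,CANCEL"'
           ';+sip.rtt-mix="rtp-mixer,perc"')
methods = rtt_mix_methods(contact)
```

An empty list would correspond to an endpoint that declares no multi-party RTT method, in which case only the fallback for multi-party unaware participants applies.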

An offer-answer exchange should take place and the common method selected by the answering party shall be used in the session with that UA.

When no common method is declared, then only the fallback method for multi-party unaware participants can be used, or the session dropped.

If more than one text media section is included in SDP, all must be capable of using the declared RTT multi-party method.

Pros:

Provides a clear decision method.

Can be extended with new mixing methods.

Can guide call routing to a suitable capable focus.

Cons:

Requires standardization and IANA registration.

Is not stream specific. If more than one text stream is specified, all must have the same type of multi-party capability.

Cannot be used in the WebRTC environment.

6.3. SDP media attribute for RTT multi-party capability indication

An attribute can be specified on media level, to be used in text media SDP declarations for negotiating RTT multi-party capabilities. The attribute can have the name "rtt-mix".

More than one attribute can be included in one media description.

The attribute can have a value. The value can for example be:

   rtp-mixer

   rtp-translator

   perc

rtp-mixer indicates capability for using the RTP-mixer and CSRC-list based mixing of multi-party text.

rtp-translator indicates capability for using the RTP-translator based mixing.

perc indicates capability for using the perc based transmission of multi-party text.

An offer-answer exchange should take place and the common method selected by the answering party shall be used in the session with that endpoint.

When no common method is declared, then only the fallback method for multi-party unaware endpoints can be used.
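
The selection rule above can be sketched as follows, assuming the proposed "a=rtt-mix:<value>" syntax with one attribute line per supported method. The function names are hypothetical, and the sketch looks at a single text media section only.

```python
PREFIX = "a=rtt-mix:"


def rtt_mix_values(sdp_media_section: str) -> list:
    """Collect the values of all a=rtt-mix attributes in a media section."""
    values = []
    for line in sdp_media_section.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            values.append(line[len(PREFIX):].strip())
    return values


def select_method(offer: str, answer: str):
    """Return the method chosen by the answerer, or None when no common
    method exists and only the multi-party unaware fallback remains."""
    offered = rtt_mix_values(offer)
    for method in rtt_mix_values(answer):
        if method in offered:
            return method
    return None


offer = """m=text 49170 RTP/AVP 98 100
a=rtpmap:98 red/1000
a=rtpmap:100 t140/1000
a=rtt-mix:rtp-mixer
a=rtt-mix:perc
"""
answer = """m=text 50000 RTP/AVP 98 100
a=rtpmap:98 red/1000
a=rtpmap:100 t140/1000
a=rtt-mix:perc
"""
chosen = select_method(offer, answer)
```

The answerer's order decides when several common methods exist, which reflects the statement that the answering party selects the method.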

Example:

   a=rtt-mix:rtp-mixer

If, after evaluation of the alternatives in this specification, only one mixing method is selected to be brought to implementation, then the attribute can be reduced to a single attribute with no list of values.

Pros:

Provides a clear decision method.

Can be extended with new mixing methods.

Can be used on specific text media.

Can be used also for SDP-controlled WebRTC sessions with multiple streams in the same data channel.

Cons:

Requires standardization and IANA registration.

Cannot guide SIP routing.

6.4. Simplified SDP media attribute for RTT multi-party capability indication

An attribute can be specified on media level, to be used in text media SDP declarations for negotiating RTT multi-party capabilities. The attribute can have the name "rtt-mix" with no value. It would be selected and used if only one method for multi-party RTT is brought forward from this specification, and the others are suppressed or found to be possible to negotiate in another way.

An offer-answer exchange should take place and if both parties specify "rtt-mix" capability, the selected mixing method shall be used.

When no common method is declared, then only the fallback method for multi-party unaware endpoints can be used, or the session not accepted for multi-party use.

Example:

   a=rtt-mix

Pros:

Provides a clear decision method.

Very simple syntax and semantics.

Can be used on specific text media.

Could possibly be used also for SDP-controlled WebRTC sessions with multiple streams in the same data channel.

Cons:

Requires standardization and IANA registration.

If another RTT mixing method is also specified in the future, then that method may need to specify and register its own attribute, whereas with an attribute that carries a parameter value only a new possible value would need to be added.

Cannot guide SIP routing.

6.5. SDP format parameter for RTT multi-party capability indication

An FMTP format parameter can be specified for the RFC 4103 [RFC4103] media, to be used in text media SDP declarations for negotiating RTT multi-party capabilities. The parameter can have the name "rtt-mix", with one or more of its possible values.

The possible values in the list are:

   rtp-mixer

   perc

rtp-mixer indicates capability for using the RTP-mixer based mixing and presentation of multi-party text using the CSRC-list.

perc indicates capability for using the perc based transmission of multi-party text.

Example:

   a=fmtp:96 98/98/98 rtt-mix=rtp-mixer

If, after evaluation of the alternatives in this specification, only one mixing method is selected to be brought to implementation, then the parameter can be reduced to a single parameter with no list of values.

An offer-answer exchange should take place and the common method selected by the answering party shall be used in the session with that UA.

When no common method is declared, then only the fallback method can be used, or the session denied.

Pros:

Provides a clear decision method.

Can be extended with new mixing methods.

Can be used on specific text media.

Can be used also for SDP-controlled WebRTC sessions with multiple streams in the same data channel.

Cons:

Requires standardization and IANA registration.

May cause interop problems with current RFC 4103 [RFC4103] implementations not expecting a new fmtp-parameter.

Cannot guide SIP routing.

6.6. A text media subtype for support of multi-party RTT

Indicating a specific text media subtype in SDP is a straightforward way of negotiating multi-party capability. Especially if there are format differences from the "text/red" and "text/t140" formats of RFC 4103 [RFC4103], this is a natural way to do the negotiation for multi-party RTT.

Pros:

No extra efforts if a new format is needed anyway.

Cons:

None specific to using the format indication for negotiation of multi-party capability. But only feasible if a new format is needed anyway.

6.7. Preferred capability declaration method for RTP-based transport.

If the preferred transport method is one with a specific media subtype in SDP, then specification by media subtype is preferred.

If this would not be the case, then the preferred capability declaration method would be the one with a simplified SDP attribute "a=rtt-mix" (Section 6.4) because it is straightforward and partially usable also for WebRTC if so needed.

6.8. Identification of the source of text for RTP-based solutions

The main way to identify the source of text in the RTP based solution is by the SSRC of the sending participant. In the RTP-mixer solution, this SSRC is included in the CSRC list of the transmitted packets. Further identification that may be needed for better labelling of received text may be obtained from a number of sources. These include the RTCP SDES CNAME and NAME reports, and the conference notification data (RFC 4575) [RFC4575].

As soon as a new member is added to the RTP session, its characteristics should be transmitted in RTCP SDES CNAME and NAME reports according to section 6.5 in RFC 3550 [RFC3550]. The information about the participant should also be included in the conference data including the text media member in a notification according to RFC 4575 [RFC4575].
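
For reference, reading the SSRC and the CSRC list from a received packet can be sketched as below, based on the fixed RTP header layout of RFC 3550. The function name is hypothetical, and header extensions, padding and payload handling are left out.

```python
import struct


def rtp_sources(packet: bytes):
    """Return (ssrc, [csrc, ...]) from the header of an RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    # V/P/X/CC byte, M/PT byte, sequence number, timestamp, SSRC
    first, _m_pt, _seq, _timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = first & 0x0F  # CC: number of 32-bit CSRC entries that follow
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", packet[12:end]))
    return ssrc, csrcs


# A mixer packet (SSRC 0xAAAA) carrying text from two contributing sources.
packet = (struct.pack("!BBHII", 0x80 | 2, 98, 1, 1000, 0xAAAA)
          + struct.pack("!II", 0x1111, 0x2222)
          + b"hi")
mixer_ssrc, contributing = rtp_sources(packet)
```

With a mixer, the SSRC identifies the mixer itself and the CSRC entries identify the original sources; with a translator, the CSRC list is normally empty and the SSRC is the original source.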

The RTCP SDES report SHOULD contain identification of the source represented by the SSRC/CSRC identifier. This identification MUST contain the CNAME field and MAY contain the NAME field and other defined fields of the SDES report.

A focus UA SHOULD primarily convey SDES information received from the sources of the session members. When such information is not available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME information from available information from the SIP session with the participant.

7. RTT bridging in WebRTC

Within WebRTC, real-time text is specified to be carried in WebRTC data channels as specified in [I-D.ietf-mmusic-t140-usage-data-channel]. A few ways to handle multi-party RTT are mentioned briefly. They are repeated below.

7.1. RTT bridging in WebRTC with one data channel per source

A straightforward way to handle multi-party RTT is for the bridge to open one T.140 data channel per source towards the receiving participants.

The stream-id forms a unique stream identification.

The identification of the source is made through the Label property of the channel, and session information belonging to the source. The endpoint can compose a readable label for the presentation from this information.

Pros:

This is a straightforward solution.

The load per source is low.

Cons:

With a high number of participants, the overhead of establishing and maintaining the high number of data channels required may be high, even if the load per channel is low.

7.2. RTT bridging in WebRTC with one common data channel

A way to handle multi-party RTT in WebRTC is for the bridge to combine text from all sources into one data channel and insert the sources in the stream by a T.140 control code for source.
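
A receiver for this method might split the combined stream on the proposed source control code (SOS c n ST, as introduced in Section 4.1.1.9) roughly as in the sketch below. The code points U+0098 for SOS and U+009C for ST are the ones this editor understands T.140 to use, and it is an assumption that the identifier n follows the letter "c" directly with no separator; the control code itself is only a proposal. Codes split across two received chunks are not handled here.

```python
import re

SOS = "\u0098"  # Start Of String, as understood from ITU-T T.140
ST = "\u009c"   # String Terminator
_SOURCE_CODE = re.compile(SOS + "c([^" + ST + "]*)" + ST)


def split_by_source(stream: str, current=None) -> list:
    """Split received text into (source, text) pieces.

    `current` is the source that was active before this chunk, if any."""
    result = []
    pos = 0
    for match in _SOURCE_CODE.finditer(stream):
        text = stream[pos:match.start()]
        if text:
            result.append((current, text))
        current = match.group(1)  # the identifier n of the new source
        pos = match.end()
    tail = stream[pos:]
    if tail:
        result.append((current, tail))
    return result


received = SOS + "cAlice" + ST + "Hello " + SOS + "cBob" + ST + "Hi"
pieces = split_by_source(received)
```

The presentation layer can then route each piece to the area or bubble of its source instead of showing the identifier inline.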
1435 This method is described in a corresponding section for RTP 1436 transmission above in Section 4.1.1.9. 1438 The identification of the source is made through insertion in the 1439 beginning of each text transmission from a source of a control code 1440 extension "c" followed by a string representing the source, framed by 1441 the control code start and end flags SOS and ST (See ITU-T T.140 1442 [T140]). 1444 A receiving endpoint is supposed to separate text items from the 1445 different sources and identify and display them in a suitable way. 1447 The endpoint does not always display the source identification in the 1448 received text at the place where it is received, but has the 1449 information as a guide for planning the presentation of received 1450 text. A label corresponding to the source identification is 1451 presented when needed depending on the selected presentation style. 1453 Pros: 1455 This solution has relatively low overhead on session and network 1456 level 1458 Cons: 1460 This solution has higher overhead on the media contents level than 1461 the WebRTC solution above. 1463 Standardisation of the new control code "c" in ITU-T T.140 [T140] is 1464 required. 1466 The conference server need to be allowed to decrypt/encrypt the data 1467 channel contents. 1469 7.3. Preferred rtt multi-party method for WebRTC 1471 For WebRTC, one method is to prefer because of the simplicity. So, 1472 for WebRTC, the method to implement for multi-party RTT with multi- 1473 party aware parties when no other method is explicitly agreed between 1474 implementing parties is: "RTT bridging in WebRTC with one data 1475 channel per source" Section 7.1. 1477 8. 
Presentation of multi-party text 1479 All session participants with RTP based transport MUST observe the 1480 SSRC/CSRC field of incoming text RTP packets, and make note of which 1481 source they came from in order to be able to present text in a way 1482 that makes it easy to read text from each participant in a session, 1483 and get information about the source of the text. 1485 In the WebRTC case, the Label parameter and other provided endpoint 1486 information should be used for the same purpose. 1488 8.1. Associating identities with text streams 1490 A source identity SHOULD be composed from available information 1491 sources and displayed together with the text as indicated in ITU-T 1492 T.140 Appendix [T140]. 1494 The source identity should primarily be the NAME field from incoming 1495 SDES packets. If this information is not available, and the session 1496 is a two-party session, then the T.140 source identity SHOULD be 1497 composed from the SIP session participant information. For multi- 1498 party sessions the source identity may be composed from local 1499 information if sufficient information is not available in the 1500 session. 1502 Applications may abbreviate the presented source identity to a 1503 suitable form for the available display. 1505 Applications may also replace received source information with 1506 internally used nicknames. 1508 8.2. Presentation details for multi-party aware endpoints. 1510 The multi-party aware endpoint should, after any action for recovery 1511 of data from lost packets, separate the incoming streams and present 1512 them according to the style that the receiving application supports 1513 and the user has selected. The decisions taken for presentation of 1514 the multi-party interchange shall be made purely on the receiving side. 1515 The sending application must not insert any item in the stream to 1516 influence presentation that is not requested by the sending 1517 participant. 1519 8.2.1.
Bubble style presentation 1521 One often used style is to present real-time text in chunks in 1522 readable bubbles identified by labels containing names of sources. 1523 Bubbles are placed in one column in the presentation area and are 1524 closed and moved upwards in the presentation area after certain items 1525 or events, when there is also newer text from another source that 1526 would go into a new bubble. The text items that allow bubble 1527 closing are any character closing a phrase or sentence followed by a 1528 space or a timeout of a suitable time (about 10 seconds). 1530 Real-time active text sent from the local user should be presented in 1531 a separate area. When there is a reason to close a bubble from the 1532 local user, the bubble should be placed above all real-time active 1533 bubbles, so that the time order in which real-time text entries were 1534 completed is visible. 1536 Scrolling is usually provided for viewing of recent or older text. 1537 When scrolling is done to an earlier point in the text, the 1538 presentation shall not move the scroll position when new text is received. 1539 It must be the decision of the local user to return to automatic 1540 viewing of latest text actions. It may be useful to provide an indication 1541 that there is new text to read after scrolling to an earlier position 1542 has been activated. 1544 The presentation area may become too small to present all text in all 1545 real-time active bubbles. Various techniques can be applied to 1546 provide a good overview and good reading opportunity even in such 1547 situations. The active real-time bubble may have a limited number of 1548 lines and if its contents need more lines, then a scrolling 1549 opportunity within the real-time active bubble is provided.
Another 1550 method can be to only show the label and the last line of the active 1551 real-time bubble contents, and make it possible to expand or compress 1552 the bubble presentation between full view and one line view. 1554 Erasures require special consideration. Erasure within a real-time 1555 active bubble is straightforward. But if erasure from one 1556 participant affects the last character before a bubble, the whole 1557 previous bubble becomes the current bubble for real-time action by 1558 that participant and is placed below all other bubbles in the 1559 presentation area. If the border between bubbles was caused by the 1560 CRLF characters (instead of the normal "Line Separator"), only one 1561 erasure action is required to erase this bubble border. When a 1562 bubble is closed, it is moved up, above all real-time active bubbles. 1564 A three-party view is shown in this example.
1566  __________________________________________________
1567  |                                              |^|
1568  |                                              |-|
1569  |[Alice] Hi, Alice here.                       | |
1570  |                                              | |
1571  |[Bob] Bob as well.                            | |
1572  |                                              | |
1573  |[Eve] Hi, this is Eve, calling from Paris.    | |
1574  |      I thought you should be here.           | |
1575  |                                              | |
1576  |[Alice] I am coming on Thursday, my           | |
1577  |      performance is not until Friday morning.| |
1578  |                                              | |
1579  |[Bob] And I on Wednesday evening.             | |
1580  |                                              | |
1581  |[Alice] Can we meet on Thursday evening?      | |
1582  |                                              | |
1583  |[Eve] Yes, definitely. How about 7pm.         | |
1584  |      at the entrance of the restaurant       | |
1585  |      Le Lion Blanc?                          | |
1586  |[Eve] we can have dinner and then take a walk | |
1587  |                                              | |
1588  |      But I need to be back to                | |
1589  |      the hotel by 11 because I need          | |
1590  |                                              | |
1591  |      I wou                                   |-|
1592  |______________________________________________|v|
1593  | of course, I underst                           |
1594  |________________________________________________|
1598 Figure 1: Example of a three-party call presented in the bubble 1599 style. 1601 8.2.2.
Other presentation styles 1603 Presentation styles other than the bubble style may be provided and 1604 preferred by the users. In a video conference one way may be to 1605 have a real-time text area below the video view of each participant. 1606 Another view may be to provide one column in a presentation area for 1607 each participant and place the text entries in a relative vertical 1608 position corresponding to when text entry in them was completed. The 1609 labels can then be placed in the column header. The considerations 1610 for ending and moving and erasure of entered text discussed above for 1611 the bubble style are valid also for these styles. 1613 This figure shows how a coordinated column view MAY be presented.
1615  _____________________________________________________________________
1616  |        Bob         |         Eve          |         Alice         |
1617  |____________________|______________________|_______________________|
1618  |                    |                      |I will arrive by TGV.  |
1619  |My flight is to Orly|                      |Convenient to the main |
1620  |                    |Hi all, can we plan   |station.               |
1621  |                    |for the seminar?      |                       |
1622  |Eve, will you do    |                      |                       |
1623  |your presentation on|                      |                       |
1624  |Friday?             |Yes, Friday at 10.    |                       |
1625  |Fine, wo            |                      |We need to meet befo   |
1626  |___________________________________________________________________|
1628 Figure 2: A coordinated column-view of a three-party session with 1629 entries ordered in approximate time-order. 1631 9. Presentation details for multi-party unaware endpoints. 1633 Multi-party unaware endpoints are prepared only for presentation of 1634 two sources of text, the local user and a remote user. If mixing for 1635 multi-party unaware endpoints is to be supported, in order to enable 1636 some multi-party communication with such endpoints, the mixer needs to 1637 plan the presentation and insert labels and line breaks before 1638 labels. Many limitations appear for this presentation mode, and it 1639 must be seen as a fallback and a last resort.
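The label insertion described above can be sketched roughly as follows. This Python fragment is only an illustration under stated assumptions: labels are written as "[Name] " in the manner of Figure 1, the line break is the T.140 Line Separator (U+2028), and a label is inserted only when the source changes. The function name format_for_unaware_endpoint is hypothetical. The sketch ignores erasures and does not choose suitable switching points.

```python
# Minimal sketch, not the procedure of any specification.
# Assumptions: "[Name] " label format, U+2028 as line break, and a switch of
# source permitted at every chunk boundary.
LINE_SEPARATOR = "\u2028"  # T.140 Line Separator


def format_for_unaware_endpoint(chunks):
    """Merge (name, text) chunks into one stream with labels.

    A line break and a label are inserted whenever the source changes,
    so that a two-party endpoint shows readable, labelled text.
    """
    out = []
    current = None
    for name, text in chunks:
        if not text:
            continue  # nothing to present for this chunk
        if name != current:
            if out:
                out.append(LINE_SEPARATOR)  # line break before the label
            out.append(f"[{name}] ")
            current = name
        out.append(text)
    return "".join(out)
```

In practice a mixer would probably delay switching between sources until a suitable point in the text, which this sketch does not attempt.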
1641 A procedure for presenting RTT to a conference-unaware endpoint is 1642 included in [I-D.ietf-avtcore-multi-party-rtt-mix]. 1644 10. Security Considerations 1646 The security considerations valid for RFC 4103 [RFC4103] and RFC 3550 1647 [RFC3550] are valid also for the multi-party sessions with text. 1649 11. IANA Considerations 1651 The items for indication and negotiation of capability for multi- 1652 party RTT should be registered with IANA in the specifications where 1653 they are specified in detail. 1655 12. Congestion considerations 1657 The congestion considerations described in RFC 4103 [RFC4103] are 1658 valid also for the recommended RTP-based multi-party use of the real- 1659 time text transport. A risk for congestion may appear if a number of 1660 conference participants are actively transmitting text simultaneously, 1661 because the recommended RTP-based multi-party transmission method 1662 does not allow multiple sources of text to contribute to the same 1663 packet. 1665 In situations of risk for congestion, the Focus UA MAY combine 1666 packets from the same source to increase the transmission interval 1667 per source up to one second. Local conference policy in the Focus UA 1668 may be used to decide which streams shall be selected for such 1669 transmission frequency reduction. 1671 13. Acknowledgements 1673 Thanks to Arnoud van Wijk for contributions to an earlier, expired draft of 1674 this memo. 1676 14. Change history 1678 14.1. Changes to draft-hellstrom-avtcore-multi-party-rtt-solutions-01 1680 Added three more methods for RTP-mixer mixing. Two are RFC 5109 FEC 1681 based and another uses a modified data header to detect the source of 1682 completely lost text. 1684 Separated RTP-based and WebRTC based solutions. 1686 Deleted the multi-party-unaware mixing procedure appendix. It is now 1687 included in the draft draft-ietf-avtcore-multi-party-rtt-mix. Kept a 1688 section with a reference to the new place. 1690 14.2.
Changes from draft-hellstrom-mmusic-multi-party-rtt-02 to draft- 1691 hellstrom-avtcore-multi-party-rtt-solutions-00 1693 Added discussion about switching performance, as discussed in avtcore 1694 on March 13. 1696 Added that a decrease of transmission interval to 100 ms increases 1697 switching performance by a factor of 3, but that this is still not sufficient. 1699 Added that the CSRC-list method also uses a 100 milliseconds 1700 transmission interval. 1702 Added the method with multiple primary text in each packet. 1704 Added the timestamp-based method for rtp-mixing proposed by James 1705 Hamlin on March 14. 1707 Corrected the chat style presentation example picture. Deleted a few 1708 "[mix]". 1710 14.3. Changes from version draft-hellstrom-mmusic-multi-party-rtt-01 to 1711 -02 1713 Changed from a general overview to an overview with clear 1714 recommendations. 1716 Split text coordination methods into three groups. 1718 Recommended rtt-mixer with sources in CSRC-list, but refers to its 1719 spec for details. 1721 Shortened Appendix with conference-unaware example. 1723 Cleaned up preferences. 1725 Inserted pictures of screen-views. 1727 15. References 1729 15.1. Normative References 1731 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1732 Requirement Levels", BCP 14, RFC 2119, 1733 DOI 10.17487/RFC2119, March 1997, 1734 . 1736 15.2. Informative References 1738 [EN301549] ETSI, "EN 301 549. Accessibility requirements for ICT 1739 products and services", November 2019, 1740 . 1744 [I-D.ietf-avtcore-multi-party-rtt-mix] 1745 Hellstrom, G., "RTP-mixer formatting of multi-party Real- 1746 time text", Work in Progress, Internet-Draft, draft-ietf- 1747 avtcore-multi-party-rtt-mix-06, 11 June 2020, 1748 . 1751 [I-D.ietf-mmusic-t140-usage-data-channel] 1752 Holmberg, C. and G. Hellstrom, "T.140 Real-time Text 1753 Conversation over WebRTC Data Channels", Work in Progress, 1754 Internet-Draft, draft-ietf-mmusic-t140-usage-data-channel- 1755 14, 10 April 2020, .
1758 [I-D.ietf-perc-private-media-framework] 1759 Jones, P., Benham, D., and C. Groves, "A Solution 1760 Framework for Private Media in Privacy Enhanced RTP 1761 Conferencing (PERC)", Work in Progress, Internet-Draft, 1762 draft-ietf-perc-private-media-framework-12, 5 June 2019, 1763 . 1766 [NENAi3] NENA, "NENA-STA-010.2-2016. Detailed Functional and 1767 Interface Standards for the NENA i3 Solution", October 1768 2016, . 1770 [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., 1771 Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse- 1772 Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, 1773 DOI 10.17487/RFC2198, September 1997, 1774 . 1776 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1777 A., Peterson, J., Sparks, R., Handley, M., and E. 1778 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1779 DOI 10.17487/RFC3261, June 2002, 1780 . 1782 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1783 with Session Description Protocol (SDP)", RFC 3264, 1784 DOI 10.17487/RFC3264, June 2002, 1785 . 1787 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1788 Jacobson, "RTP: A Transport Protocol for Real-Time 1789 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 1790 July 2003, . 1792 [RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, 1793 "Indicating User Agent Capabilities in the Session 1794 Initiation Protocol (SIP)", RFC 3840, 1795 DOI 10.17487/RFC3840, August 2004, 1796 . 1798 [RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller 1799 Preferences for the Session Initiation Protocol (SIP)", 1800 RFC 3841, DOI 10.17487/RFC3841, August 2004, 1801 . 1803 [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text 1804 Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, 1805 . 1807 [RFC4353] Rosenberg, J., "A Framework for Conferencing with the 1808 Session Initiation Protocol (SIP)", RFC 4353, 1809 DOI 10.17487/RFC4353, February 2006, 1810 . 
1812 [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A 1813 Session Initiation Protocol (SIP) Event Package for 1814 Conference State", RFC 4575, DOI 10.17487/RFC4575, August 1815 2006, . 1817 [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol 1818 (SIP) Call Control - Conferencing for User Agents", 1819 BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006, 1820 . 1822 [RFC4597] Even, R. and N. Ismail, "Conferencing Scenarios", 1823 RFC 4597, DOI 10.17487/RFC4597, August 2006, 1824 . 1826 [RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error 1827 Correction", RFC 5109, DOI 10.17487/RFC5109, December 1828 2007, . 1830 [RFC5194] van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real- 1831 Time Text over IP Using the Session Initiation Protocol 1832 (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008, 1833 . 1835 [RFC6443] Rosen, B., Schulzrinne, H., Polk, J., and A. Newton, 1836 "Framework for Emergency Calling Using Internet 1837 Multimedia", RFC 6443, DOI 10.17487/RFC6443, December 1838 2011, . 1840 [RFC6881] Rosen, B. and J. Polk, "Best Current Practice for 1841 Communications Services in Support of Emergency Calling", 1842 BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013, 1843 . 1845 [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, 1846 DOI 10.17487/RFC7667, November 2015, 1847 . 1849 [T140] ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for 1850 multimedia application text conversation", February 1998, 1851 . 1853 [T140ad1] ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000), 1854 Protocol for multimedia application text conversation", 1855 February 2000, 1856 . 1858 [TS103479] ETSI, "TS 103 479. Emergency communications (EMTEL); Core 1859 elements for network independent access to emergency 1860 services", December 2019, . 
1864 [TS22173] 3GPP, "IP Multimedia Core Network Subsystem (IMS) 1865 Multimedia Telephony Service and supplementary services; 1866 Stage 1", 3GPP TS 22.173 17.1.0, 20 December 2019, 1867 . 1869 [TS24147] 3GPP, "Conferencing using the IP Multimedia (IM) Core 1870 Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0, 1871 19 December 2019, 1872 . 1874 Author's Address 1876 Gunnar Hellstrom 1877 Gunnar Hellstrom Accessible Communication 1878 Esplanaden 30 1879 SE-136 70 Vendelso 1880 Sweden 1882 Phone: +46 708 204 288 1883 Email: gunnar.hellstrom@ghaccess.se