Internet Engineering Task Force                             G. Hellstrom
Internet-Draft                                                   Omnitor
Intended status: Best Current Practice                     March 3, 2020
Expires: September 4, 2020

        Real-time text media handling in multi-party conferences
               draft-hellstrom-mmusic-multi-party-rtt-02

Abstract

   This memo specifies methods for Real-Time Text (RTT) media handling
   in multi-party calls.
   The main RTT transport carries real-time text over RTP in a
   time-sampled mode according to RFC 4103. The mechanisms enable the
   receiving application to present the received real-time text medium
   separated per source, in different ways according to user
   preferences. Some presentation-related features are also described,
   explaining suitable variations of transmission and presentation of
   text.

   Call control features are described for the SIP environment. A
   number of alternative methods for providing the multi-party
   negotiation, transmission and presentation are discussed and a
   recommendation for the main one is provided. The main solution for
   centralized multi-party handling of real-time text is achieved
   through a media control unit coordinating multiple RTP text streams
   into one RTP stream.

   Alternative methods using a single RTP stream and source
   identification inline in the text stream are also described, one of
   them being provided as a lower functionality fallback method for
   endpoints with no multi-party awareness for RTT.

   Bridging methods where the text stream is carried untouched by the
   bridge are also discussed.

   Brief information is also provided for multi-party RTT in the WebRTC
   environment.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   This Internet-Draft will expire on September 4, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Centralized conference model
   3.  Requirements on multi-party RTT
   4.  Coordination of text RTP streams
     4.1.  RTP-based solutions with a central mixer
       4.1.1.  RTP Mixer indicating sources in CSRC-list
       4.1.2.  RTP Mixer indicating participants by a control code
               in the stream
       4.1.3.  Mixing for conference-unaware user agents
     4.2.  RTP-based bridging with RTT media contents untouched by
           the bridge
       4.2.1.  RTP Translator sending one RTT stream per participant
       4.2.2.  Distributing packets in an end-to-end encryption
               structure
       4.2.3.  Mesh of RTP endpoints
       4.2.4.  Multiple RTP sessions, one for each participant
     4.3.  RTT bridging in WebRTC
       4.3.1.  RTT bridging in WebRTC with one data channel per
               source
       4.3.2.  RTT bridging in WebRTC with one common data channel
   5.  Preferred multi-party RTT transport method
   6.  Session control of multi-party RTT sessions
     6.1.  Implicit RTT multi-party capability indication
     6.2.  RTT multi-party capability declared by SIP media-tags
     6.3.  SDP media attribute for RTT multi-party capability
           indication
     6.4.  Simplified SDP media attribute for RTT multi-party
           capability indication
     6.5.  SDP format parameter for RTT multi-party capability
           indication
     6.6.  Preferred capability declaration method
   7.  Identification of the source of text
   8.  Presentation of multi-party text
     8.1.  Associating identities with text streams
     8.2.  Presentation details for multi-party aware UAs
       8.2.1.  Bubble style presentation
       8.2.2.  Other presentation styles
   9.  Presentation details for multi-party unaware UAs
   10. Security Considerations
   11. IANA Considerations
   12. Congestion considerations
   13. Acknowledgements
   14. Changes
     14.1.  Changes from version -01 to -02
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Appendix A.  Mixing for a conference-unaware UA
     A.1.  Short description
     A.2.  Functionality goals and drawbacks
     A.3.  Definitions
     A.4.  Presentation level procedures
       A.4.1.  Structure
       A.4.2.  Action on reception
     A.5.  Display examples
     A.6.  References for this Appendix
     A.7.  Acknowledgement for the appendix
   Author's Address

1. Introduction

   Real-time text (RTT) is a medium in real-time conversational
   sessions. Text entered by participants in a session is transmitted
   in a time-sampled fashion, so that no specific user action is needed
   to cause transmission. This gives a direct flow of text at the rate
   it is created, which is suitable in a real-time conversational
   setting. The real-time text medium can be combined with other media
   in multimedia sessions.

   Media from a number of multimedia session participants can be
   combined in a multi-party session. This memo specifies how the
   real-time text streams can be handled in multi-party sessions.

   The description is mainly focused on the transport level, but also
   describes a few session and presentation level aspects.

   Transport of real-time text is specified in RFC 4103 [RFC4103], "RTP
   Payload for Text Conversation". It makes use of the Real-time
   Transport Protocol (RTP), RFC 3550 [RFC3550], for transport.
   Robustness against network transmission problems is normally
   achieved through redundant transmission based on the principle from
   RFC 2198 [RFC2198], with one primary and two redundant transmissions
   of each text element. Primary and redundant transmissions are
   combined in packets and described by a redundancy header. This
   transport is usually used in the Session Initiation Protocol (SIP),
   RFC 3261 [RFC3261], environment.

   A very brief overview of functions for real-time text handling in
   multi-party sessions is given in RFC 4597 [RFC4597], "Conferencing
   Scenarios", Sections 4.8 and 4.10. This specification builds on that
   description and indicates which protocol mechanisms should be used to
   implement multi-party handling of real-time text.

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel]. Multi-party aspects for
   WebRTC solutions are briefly covered.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Centralized conference model

   In the centralized conference model for SIP, introduced in RFC 4353
   [RFC4353], "A Framework for Conferencing with the Session Initiation
   Protocol (SIP)", one function coordinates the communication with
   participants in the multi-party session. This function also controls
   media mixer functions for the media appearing in the session. The
   central function is common for control of all media, while the media
   mixers may work differently for each medium.

   The central function is called the Focus UA. Many variants exist for
   setting up sessions including the multipoint control centre.
   It is not within the scope of this description to describe these,
   but rather the media-specific handling in the mixer required to
   handle multi-party calls with RTT.

   The main principle for handling real-time text media in a centralized
   conference is that one RTP session for real-time text is established
   including the multipoint media control centre and the participating
   endpoints which are going to have real-time text exchange with the
   others.

   The different possible mechanisms for mixing and transporting RTT
   differ in the way they multiplex the text streams and how they
   identify the sources of the streams. RFC 7667 [RFC7667] describes a
   number of possible use cases for RTP. This specification refers to
   different sections of RFC 7667 for further reading on the situations
   caused by the different possible design choices.

   The recommended method for using RTT in a centralized conference
   model is specified in [I-D.hellstrom-avtcore-multi-party-rtt-source].

   Real-time text can also be transported in the WebRTC environment, by
   using WebRTC data channels according to
   [I-D.ietf-mmusic-t140-usage-data-channel]. Ways to handle
   multi-party calls in that environment are also specified.

3. Requirements on multi-party RTT

   The following requirements are placed on multi-party RTT:

      A solution shall be applicable to IMS (3GPP TS 22.173) [TS22173],
      SIP based VoIP and Next Generation Emergency Services (NENA i3
      [NENAi3], ETSI TS 103 479 [TS103479], RFC 6443 [RFC6443]).

      The transmission interval for text must not be longer than 500
      milliseconds when there is anything available to send. Ref ITU-T
      T.140 [T140].

      If text loss is detected or suspected, a missing text marker shall
      be inserted in the text stream. Ref ITU-T T.140 Amendment 1
      [T140ad1].
      ETSI EN 301 549 [EN301549]

      The display of text from the members of the conversation shall be
      arranged so that the text from each participant is clearly
      readable, and its source and the relative timing of entered text
      is visualized in the display. Mechanisms for looking back in the
      contents from the current session should be provided. The text
      should be displayed as soon as it is received. Ref ITU-T T.140
      [T140]

      Bridges must be multimedia capable (voice, video, text). Ref NENA
      i3 STA-010.2 [NENAi3].

      R7: It MUST be possible to use real-time text in conferences both
      as a medium of discussion between individual participants (for
      example, for sidebar discussions in real-time text while listening
      to the main conference audio) and for central support of the
      conference with real-time text interpretation of speech. Ref RFC
      5194 [RFC5194].

      It should be possible to protect RTT contents with usual means for
      privacy and integrity. Ref RFC 6881, Section 16 [RFC6881].

      Conferencing procedures are documented in RFC 4579 [RFC4579]. Ref
      NENA i3 STA-010.2 [NENAi3].

      Conferencing applies to any kind of media stream by which users
      may want to communicate. Ref 3GPP TS 24.147 [TS24147].

      The framework for SIP conferences is specified in RFC 4353
      [RFC4353]. Ref 3GPP TS 24.147 [TS24147].

4. Coordination of text RTP streams

   Coordinating and sending text RTP streams in the multi-party session
   can be done in a number of ways. The most suitable methods are
   specified here with pros and cons.

   A receiving and presenting UA SHOULD separate text from the different
   sources and identify and display them accordingly.

4.1. RTP-based solutions with a central mixer

   A set of solutions can be based on the central RTP mixer. They are
   described here and a preferred method is selected.

4.1.1. RTP Mixer indicating sources in CSRC-list

   An RTP media mixer combines text from participants into one RTP
   stream, thus all using the same destination address/port
   combination, the same RTP SSRC and one sequence number series, as
   described in Sections 7.1 and 7.3 of RFC 3550 [RFC3550] about the
   Mixer function. This method is also briefly described in RFC 7667
   [RFC7667], Section 3.6.1, "Media-Mixing Mixer".

   The sources of the text in each RTP packet are identified by the CSRC
   list in the RTP packets, containing the SSRC of the initial sources
   of text. The CSRC entries are ordered with the SSRC of the source of
   the primary text first, followed by the SSRC of the source of the
   first-level redundancy, and then that of the second level.

   A set of specific rules for the application of this method together
   with RFC 4103 [RFC4103] is specified in
   [I-D.hellstrom-avtcore-multi-party-rtt-source].

   The identification of the sources is made through the CSRC fields and
   can be made more readable through the RTCP SDES CNAME and NAME
   items as described in RTP [RFC3550].

   The notification sent according to RFC 4575 [RFC4575] when a
   participant joins the conference also provides suitable information
   and a reference to the SSRC.

   A receiving UA is supposed to separate text items from the different
   sources and identify and display them accordingly.

   The ordered CSRC lists in the RFC 4103 [RFC4103] packets make it
   possible to recover from loss of one or two consecutive packets and
   assign the recovered text to the right source. For more loss, a
   marker for possible loss should be inserted or presented.

   The conference server needs to have authority to decrypt the payload
   in the RTP packets in order to be able to recover text from redundant
   data or insert the missing text marker in the stream, and repack the
   text in new packets.
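
   The receiver-side recovery described in this section can be
   illustrated with a minimal Python sketch. It is not taken from this
   memo or from [I-D.hellstrom-avtcore-multi-party-rtt-source]; the
   names MixedPacket and Receiver are hypothetical, packets are assumed
   to be already depacketized into one text string per redundancy
   generation, and U+FFFD is used as the missing text marker. The
   sketch also assumes that the CSRC list always has one entry per
   generation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MISSING_TEXT = "\ufffd"  # missing text marker (assumed: U+FFFD)


@dataclass
class MixedPacket:
    """Hypothetical, already-parsed RFC 4103 packet from the mixer."""
    seq: int                # 16-bit RTP sequence number
    csrc: List[int]         # [primary source, 1st-level source, 2nd-level source]
    generations: List[str]  # [primary text, 1st redundant, 2nd redundant]


class Receiver:
    """Assigns received and recovered text to sources via the ordered CSRC list."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None

    def receive(self, pkt: MixedPacket) -> List[Tuple[Optional[int], str]]:
        out: List[Tuple[Optional[int], str]] = []
        levels = len(pkt.generations) - 1
        if self.last_seq is None:
            missing = 0
        else:
            gap = (pkt.seq - self.last_seq) & 0xFFFF
            if gap == 0 or gap >= 0x8000:
                return out  # duplicate or late packet: nothing new
            missing = gap - 1
        recover = min(missing, levels)
        if missing > levels:
            # More loss than the redundancy covers: the source is unknown.
            out.append((None, MISSING_TEXT))
        # Oldest recovered generation first, then the primary text.
        for i in range(recover, -1, -1):
            if pkt.generations[i]:
                out.append((pkt.csrc[i], pkt.generations[i]))
        self.last_seq = pkt.seq
        return out
```

   In this sketch, a gap of one or two packets is filled from the
   redundant generations and each recovered piece is attributed to the
   CSRC entry at the same position. A larger gap yields a marker with
   no source, which matches the limitation listed under Cons for this
   method.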

   Pros:

   This method has low overhead.

   When loss of packets occurs, it is possible to recover text from
   redundancy at loss of up to the number of redundancy levels carried
   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
   levels).

   This method can be implemented with most RTP implementations.

   Cons:

   When more consecutive packets are lost than the number of
   generations of redundant data, it is not possible to deduce the
   sources of the totally lost data.

   The conference server needs to be allowed to decrypt/encrypt the
   packet payload. This is however normal for media mixers for other
   media.

4.1.2. RTP Mixer indicating participants by a control code in the
       stream

   Text from all participants except the receiving one is transmitted
   from the media mixer in the same RTP session and stream, thus all
   using the same destination address/port combination, the same RTP
   SSRC and one sequence number series, as described in Sections 7.1
   and 7.3 of RFC 3550 [RFC3550] about the Mixer function. The sources
   of the text in each RTP packet are identified by a newly defined
   T.140 control code "c" followed by a unique identification of the
   source in UTF-8 string format.

   The receiver can use the string for presenting the source of text.
   On the RTP level, this method is described in RFC 7667 [RFC7667],
   Section 3.6.1, "Media-Mixing Mixer".

   The inline coding of the source of text is applied in the data stream
   itself, and an RTP mixer function is used for coordinating the
   sources of text into one RTP stream.

   Information uniquely identifying each user in the multi-party session
   is placed as the parameter value "n" in the T.140 application
   protocol function with the function code "c". The identifier shall
   thus be formatted like this: SOS c n ST, where SOS and ST are coded
   as specified in ITU-T T.140 [T140]. The "c" is the letter "c".
   The n parameter value is a string uniquely identifying the source.
   This parameter shall be kept short so that it can be repeated in the
   transmission without concerns for network load.

   A receiving UA is supposed to separate text items from the different
   sources and identify and display them accordingly.

   The conference server needs to be allowed to decrypt/encrypt the
   packet payload in order to check the source and repack the text.

   Pros:

   If loss of packets occurs, it is possible to recover text from
   redundancy at loss of up to the number of redundancy levels carried
   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
   levels).

   This method can be implemented with most RTP implementations.

   Transmitted text can also be used with other transports than RTP.

   Cons:

   The method implies a moderate load because the source identifier
   needs to be inserted often in the stream.

   If more consecutive packets are lost than the number of generations
   of redundant data, it is not possible to deduce the source of the
   totally lost data.

   The mixer needs to be able to generate unique source identifications
   which are suitable as labels for the sources.

   Requires an extension of the ITU-T T.140 standard, best made by the
   ITU.

   The conference server needs to be allowed to decrypt/encrypt the
   packet payload.

4.1.3. Mixing for conference-unaware user agents

   Multi-party real-time text contents can be transmitted to
   conference-unaware user agents if source labeling and formatting of
   the text is performed by a mixer. This method has the limitations
   that the layout of the presentation and the format of source
   identification are purely controlled by the mixer, and that only one
   source at a time is allowed to present in real-time.
   Text from other sources needs to be stored temporarily while waiting
   for an appropriate moment to switch the source of transmitted text.
   The mixer controls the switching of sources and inserts a source
   identifier in text format at the beginning of text after a switch of
   source. The logic of the mixer to detect when a switch is
   appropriate should detect a number of places in text where a switch
   can be allowed, including new line, end of sentence, end of phrase,
   a period of inactivity, and a word separator after a long time of
   active transmission.

   This method MAY be used when no support for multi-party awareness is
   detected in the receiving endpoint. The base for this method is
   described in RFC 7667 [RFC7667], Section 3.6.1, "Media-Mixing
   Mixer".

   See Appendix A for an informative example of a procedure for
   presenting RTT to a conference-unaware UA.

   Pros:

   Can be transmitted to conference-unaware endpoints.

   Can be used with other transports than RTP.

   Cons:

   Does not allow full real-time presentation of more than one source at
   a time. Text from other sources will be delayed, even if automatic
   detection of suitable moments for switching source for presentation
   is made by the mixer.

   The only realistic presentation format is a style with the text from
   the different sources presented with a text label indicating source,
   and the text collected in a chat style presentation but with more
   frequent turn-taking.

   Endpoints often have their own system for adding labels to the RTT
   presentation. In that case there will be two levels of labels in the
   presentation, one for the mixer and one for the sources.

   If more packets are lost than can be recovered by the redundancy, it
   is not possible to detect which source was struck by the loss.
   It is also possible that a source switch occurred during the loss,
   and therefore a false indication of the source of text can be
   provided to the user after such loss.

   Because of all these cons, this method is not recommended and MUST
   NOT be used as the main method, but only as the last resort for
   backwards interoperability with conference-unaware endpoints.

   The conference server needs to be allowed to decrypt/encrypt the
   packet payload.

4.2. RTP-based bridging with RTT media contents untouched by the bridge

   It may be desirable to send text in a multi-party setting in a way
   that allows the text stream contents to be distributed without
   decryption and encryption in any central server. A number of such
   methods are described. However, at the time of writing, none of
   these methods has a specified way of establishing the session by
   SDP.

4.2.1. RTP Translator sending one RTT stream per participant

   Within the RTP session, text from each participant is transmitted
   from the RTP media translator in a separate RTP stream, thus using
   the same destination address/port combination, but separate RTP SSRC
   parameters and sequence number series, as described in Sections 7.1
   and 7.2 of RFC 3550 [RFC3550] about the Translator function. The
   sources of the text in each RTP packet are identified by the SSRC
   parameters in the RTP packets, containing the SSRC of the initial
   sources of text.

   A receiving and presenting UA is supposed to separate text items from
   the different sources and identify and display them in a suitable
   way.

   This method is described in RFC 7667 [RFC7667], Section 3.5.1,
   "Relay-Transport Translator", or Section 3.5.2, "Media Translator".

   The identification of the source is made through the RTCP SDES CNAME
   and NAME items as described in RTP [RFC3550].

   Pros:

   This method has moderate overhead.
   When loss of packets occurs, it is possible to recover text from
   redundancy at loss of up to the number of redundancy levels carried
   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
   levels).

   More loss than what can be recovered can be detected, and the marker
   for text loss can be inserted in the correct stream.

   It may be possible in some scenarios to keep the text encrypted
   through the Translator.

   Cons:

   There may be RTP implementations not supporting the Translator model.

   This configuration is not supported by current media declarations in
   SDP. RFC 3264 [RFC3264] specifies in many places that one media
   description is supposed to describe just one RTP stream.

4.2.2. Distributing packets in an end-to-end encryption structure

   In order to achieve end-to-end encryption, it is possible to let the
   packets from the sources just pass through a central distributor,
   and handle the security agreements between the participants.
   A framework with this functionality, suitable for application to
   RTP-based conferences, is specified in
   [I-D.ietf-perc-private-media-framework]. The RTP flow and mixing
   characteristics have similarities with the method described under
   "RTP Translator sending one RTT stream per participant" above.
   RFC 4103 [RFC4103] RTP streams would fit into the structure and it
   would provide a base for end-to-end encrypted RTT multi-party
   conferencing.

   Pros:

   Good security.

   Straightforward multi-party handling.

   Cons:

   Does not operate under the usual SIP central conferencing
   architecture.

   Requires the participants to perform a lot of key handling.

4.2.3. Mesh of RTP endpoints

   Text from all participants is transmitted directly to all others in
   one RTP session, without a central bridge. The sources of the text
   in each RTP packet are identified by the source network address and
   the SSRC.

   This method is described in RFC 7667 [RFC7667], Section 3.4,
   "Point to Multipoint Using Mesh".

   Pros:

   When loss of packets occurs, it is possible to recover text from
   redundancy at loss of up to the number of redundancy levels carried
   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
   levels).

   This method can be implemented with most RTP implementations.

   Transmitted text can also be used with other transports than RTP.

   Cons:

   This model is not described in IMS, NENA and EENA specifications, and
   therefore does not meet the requirements.

4.2.4. Multiple RTP sessions, one for each participant

   Text from all participants is transmitted directly to all others in
   one RTP session each, without a central bridge. Each session is
   established with a separate media description in SDP. The sources of
   the text in each RTP packet are identified by the source network
   address and the SSRC.

   This method is out of scope for further discussion here, because the
   foreseen applications use centralized model conferencing.

   Pros:

   When loss of packets occurs, it is possible to recover text from
   redundancy at loss of up to the number of redundancy levels carried
   in the RFC 4103 [RFC4103] stream (normally primary and two redundant
   levels).

   Complete loss of text can be indicated in the received stream.

   This method can be implemented with most RTP implementations.

   End-to-end encryption is achievable.

   Cons:

   This method is not described in IMS, NENA and ETSI specifications and
   therefore does not meet the requirements.

   A lot of network resources are spent on setting up separate sessions
   for each participant.

4.3. RTT bridging in WebRTC

   Within WebRTC, real-time text is specified to be carried in WebRTC
   data channels as specified in
   [I-D.ietf-mmusic-t140-usage-data-channel]. A few ways to handle
   multi-party RTT are mentioned briefly.
   They are explained and further detailed below.

4.3.1. RTT bridging in WebRTC with one data channel per source

   A straightforward way to handle multi-party RTT is for the bridge to
   open one T.140 data channel per source towards the receiving
   participants.

   The stream-id forms a unique stream identification.

   The identification of the source is made through the Label property
   of the channel, and session information belonging to the source. The
   UA can compose a readable label for the presentation from this
   information.

   Pros:

   This is a straightforward solution.

   Cons:

   With a high number of participants, the overhead of establishing the
   high number of data channels required may be high.

4.3.2. RTT bridging in WebRTC with one common data channel

   A way to handle multi-party RTT in WebRTC is for the bridge to
   combine text from all sources into one data channel and insert the
   sources in the stream by a T.140 control code for source.

   This method is described in the corresponding section for RTP
   transmission above, Section 4.1.2.

   The identification of the source is made through insertion in the
   beginning of each text transmission from a source of a control code
   extension "c" followed by a string representing the source, framed by
   the control code start and end flags SOS and ST (see ITU-T T.140
   [T140]).

   A receiving UA is supposed to separate text items from the different
   sources and identify and display them in a suitable way.

   The UA does not always display the source identification in the
   received text at the place where it is received, but has the
   information as a guide for planning the presentation of received
   text. A label corresponding to the source identification is
   presented when needed depending on the selected presentation style.
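
   How a receiving UA might separate a combined text stream using the
   proposed "SOS c n ST" framing can be sketched in Python as follows.
   This is an illustration only: the function name split_by_source is
   hypothetical, the "c" function is a proposal of this memo and not
   part of ITU-T T.140, and the sketch assumes that SOS and ST are the
   C1 control characters U+0098 and U+009C and that the identifier n
   follows the letter "c" without any separator.

```python
from typing import List, Optional, Tuple

SOS = "\u0098"  # start of string (assumed code point)
ST = "\u009c"   # string terminator (assumed code point)


def split_by_source(stream: str,
                    current: Optional[str] = None
                    ) -> List[Tuple[Optional[str], str]]:
    """Split a mixed text stream into (source label, text) segments."""
    segments: List[Tuple[Optional[str], str]] = []
    buf: List[str] = []
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == SOS:
            end = stream.find(ST, i + 1)
            if end == -1:
                # Incomplete control sequence: this sketch simply stops
                # and ignores the unterminated tail.
                break
            body = stream[i + 1:end]
            if body.startswith("c"):
                # Source switch: flush text collected for the old source.
                if buf:
                    segments.append((current, "".join(buf)))
                    buf = []
                current = body[1:]
            # Other application functions are skipped in this sketch.
            i = end + 1
            continue
        buf.append(ch)
        i += 1
    if buf:
        segments.append((current, "".join(buf)))
    return segments
```

   The returned labels are only a guide for the presentation layer. As
   described above, the UA decides when and where a label is actually
   shown, depending on the selected presentation style.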

   Pros:

   This solution has relatively low overhead on session and network
   level.

   Cons:

   This solution has higher overhead on the media contents level than
   the WebRTC solution above.

   Standardisation of the new control code "c" in ITU-T T.140 [T140] is
   required.

   The conference server needs to be allowed to decrypt/encrypt the
   data channel contents.

5. Preferred multi-party RTT transport method

   For RTP transport of RTT using RTP-mixer technology, one method for
   multi-party mixing and transport stands out as fulfilling the goals
   best and is therefore recommended. That is: "RTP Mixer indicating
   sources in CSRC-list", Section 4.1.1.

   For RTP transport in separate streams or sessions, a bridging method
   with good characteristics is the end-to-end encryption model "perc",
   Section 4.2.2.

   For WebRTC, one method is preferred because of its simplicity. So,
   for WebRTC, the method to implement for multi-party RTT with
   conference-aware parties when no other method is explicitly agreed
   between implementing parties is: "RTT bridging in WebRTC with one
   data channel per source", Section 4.3.1.

6. Session control of multi-party RTT sessions

   General session control aspects for multi-party sessions are
   described in RFC 4575 [RFC4575], "A Session Initiation Protocol
   (SIP) Event Package for Conference State", and RFC 4579 [RFC4579],
   "Session Initiation Protocol (SIP) Call Control - Conferencing for
   User Agents". The nomenclature of these specifications is used here.

   The procedures for a conference-aware model for RTT transmission
   shall only be applied if a capability exchange for conference-aware
   real-time text transmission has been completed and a supported method
   for multi-party real-time text transmission can be negotiated.

   A method for detection of conference-awareness for centralized SIP
   conferencing in general is specified in RFC 4579 [RFC4579].
The 696 focus sends the "isfocus" feature tag in a SIP Contact header. This 697 causes the conference-aware UA to subscribe to conference 698 notifications from the focus. The focus then sends notifications to 699 the UA about joining and leaving conference participants and 700 their media capabilities. The information is carried XML-formatted 701 in a 'conference-info' block in the notification according to RFC 702 4575 [RFC4575]. The mechanism is described in detail in RFC 4575 703 [RFC4575]. 705 Before a conference media server starts sending multi-party RTT to a 706 UA, a verification of its ability to handle multi-party RTT must be 707 made. A decision on which mechanism to use for identifying text from 708 the different participants must also be taken, implicitly or 709 explicitly. These verifications and decisions can be done in a 710 number of ways. The most apparent ways are specified here and their 711 pros and cons described. One of the methods is selected to be the 712 one to be used by implementations of the centralized conference model 713 according to this specification. 715 6.1. Implicit RTT multi-party capability indication 717 Capability for RTT multi-party handling can be decided to be 718 implicitly indicated by session control items. 720 The focus may implicitly indicate multi-party RTT capability by 721 including the media child with value "text" in the RFC 4575 [RFC4575] 722 conference-info provided in conference notifications. 724 A UA may implicitly indicate multi-party RTT capability by including 725 the text media in the SDP in the session control transactions with 726 the conference focus after the subscription to the conference has 727 taken place. 729 The implicit RTT capability indication means for the focus that it 730 can handle multi-party RTT according to the preferred method 731 indicated in the RTT multi-party methods section above.
733 The implicit RTT capability indication means for the UA that it can 734 handle multi-party RTT according to the preferred method indicated in 735 the RTT multi-party methods section above. 737 If the focus detects that a UA implicitly declared RTT multi-party 738 capability, it SHALL provide RTT according to the preferred method. 740 If the focus detects that the UA does not indicate any RTT multi- 741 party capability, then it shall either provide RTT multi-party text 742 in the way specified for conference-unaware UAs above, or refuse to 743 set up the session. 745 If the UA detects that the focus has implicitly declared RTT multi- 746 party capability, it shall be prepared to present RTT in a multi- 747 party fashion according to the preferred method. 749 Pros: 751 Acceptance of implicit multi-party capability implies that no 752 standardisation of explicit RTT multi-party capability exchange is 753 required. 755 Cons: 757 If other methods for multi-party RTT are to be used in the same 758 implementation environment as the preferred ones, then capability 759 exchange needs to be defined for them. 761 Cannot be used outside a strictly applied SIP central conference 762 model. 764 6.2. RTT multi-party capability declared by SIP media-tags 766 Specifications for RTT multi-party capability declarations can be 767 agreed for use as SIP media feature tags, to be exchanged during SIP 768 call control operation according to the mechanisms in RFC 3840 769 [RFC3840] and RFC 3841 [RFC3841]. RTT multi-party 770 capability is then indicated by the media feature tag "rtt-mix", with 771 a set of possible values for the different possible methods. 773 The possible values in the list may be: 775 rtp-mixer 777 perc 779 rtp-mixer indicates capability for using the RTP-mixer based 780 presentation of multi-party text. 782 perc indicates capability for using the perc based transmission of 783 multi-party text.
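As an informal illustration, a receiver could read the "rtt-mix" feature tag from a Contact header value roughly as in the Python sketch below. The assumptions are that the tag appears as a ";+sip.rtt-mix" parameter with a quoted value, and that several methods, if present, are comma-separated inside the quotes. The function name rtt_mix_from_contact and the example URI are hypothetical.

```python
import re

# Feature tag names are matched case-insensitively in this sketch.
_RTT_MIX = re.compile(r';\s*\+sip\.rtt-mix="([^"]*)"', re.IGNORECASE)


def rtt_mix_from_contact(contact: str) -> list[str]:
    """Return the declared multi-party RTT methods, or [] if none declared."""
    match = _RTT_MIX.search(contact)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]
```

An empty result would correspond to a party that has not declared any multi-party RTT capability with this tag.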
785 Example: Contact: 787 ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL" 789 ;+sip.rtt-mix="rtp-mixer" 791 If, after evaluation of the alternatives in this specification, only 792 one mixing method is selected to be brought to implementation, then 793 the media tag can be reduced to a single tag with no list of values. 795 An offer-answer exchange should take place and the common method 796 selected by the answering party shall be used in the session with 797 that UA. 799 When no common method is declared, then only the fallback method for 800 multi-party unaware participants can be used, or the session dropped. 802 If more than one text media line is included in SDP, all must be 803 capable of using the declared RTT multi-party method. 805 Pros: 807 Provides a clear decision method. 809 Can be extended with new mixing methods. 811 Can guide call routing to a suitable capable focus. 813 Cons: 815 Requires standardization and IANA registration. 817 Is not stream specific. If more than one text stream is specified, 818 all must have the same type of multi-party capability. 820 Cannot be used in the WebRTC environment. 822 6.3. SDP media attribute for RTT multi-party capability indication 824 An attribute can be specified on media level, to be used in text 825 media SDP declarations for negotiating RTT multi-party capabilities. 826 The attribute can have the name "rtt-mix", with one or more of its 827 possible values in a comma-separated list. 829 The possible values in the list are: 831 rtp-mixer 833 perc 835 rtp-mixer indicates capability for using the RTP-mixer based 836 presentation of multi-party text. 838 perc indicates capability for using the perc based transmission of 839 multi-party text. 841 An offer-answer exchange should take place and the common method 842 selected by the answering party shall be used in the session with 843 that UA. 845 When no common method is declared, then only the fallback method can 846 be used. 
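As an illustration only, the selection rule of this section can be sketched in Python as follows. The sketch assumes that the attribute is written "a=rtt-mix:" followed by a comma-separated list, and that the answering party picks the first method in its own preference order that also appears in the offer. The preference order and the function names parse_rtt_mix and choose_method are assumptions made for this example.

```python
from typing import Optional


def parse_rtt_mix(media_lines: list[str]) -> list[str]:
    """Return the methods listed in an 'a=rtt-mix:' line of one media section."""
    prefix = "a=rtt-mix:"
    for line in media_lines:
        if line.startswith(prefix):
            value = line[len(prefix):]
            return [m.strip() for m in value.split(",") if m.strip()]
    return []


def choose_method(offered: list[str], supported: list[str]) -> Optional[str]:
    """Answerer side: pick the first supported method that was also offered.

    Returns None when there is no common method, in which case only the
    fallback method remains.
    """
    for method in supported:
        if method in offered:
            return method
    return None
```

Because the attribute is parsed per media section, each text media line can carry its own declaration.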
848 Example: a=rtt-mix:rtp-mixer 850 If, after evaluation of the alternatives in this specification, only 851 one mixing method is selected to be brought to implementation, then 852 the attribute can be reduced to a single attribute with no list of 853 values. 855 Pros: 857 Provides a clear decision method. 859 Can be extended with new mixing methods. 861 Can be used on specific text media. 863 Can be used also for SDP-controlled WebRTC sessions with multiple 864 streams in the same data channel. 866 Cons: 868 Requires standardization and IANA registration. 870 Cannot guide SIP routing. 872 6.4. Simplified SDP media attribute for RTT multi-party capability 873 indication 875 An attribute can be specified on media level, to be used in text 876 media SDP declarations for negotiating RTT multi-party capabilities. 877 The attribute can have the name "rtt-mix" with no value. It would be 878 selected and used if only one method for multi-party RTT is brought 879 forward from this specification, and the others are suppressed or found to 880 be possible to negotiate in another way. 882 An offer-answer exchange should take place and if both parties 883 specify rtt-mix capability, the method for indicating source in the 884 CSRC-list shall be used. 886 When no common method is declared, then only the fallback method can 887 be used, or the session not accepted for multi-party use. 889 Example: a=rtt-mix 891 Pros: 893 Provides a clear decision method. 895 Very simple syntax and semantics. 897 Can be used on specific text media. 899 Can be used also for SDP-controlled WebRTC sessions with multiple 900 streams in the same data channel. 902 Cons: 904 Requires standardization and IANA registration. 906 Cannot guide SIP routing. 908 6.5. SDP format parameter for RTT multi-party capability indication 910 An FMTP format parameter can be specified for the RFC 4103 911 [RFC4103] media, to be used in text media SDP declarations for 912 negotiating RTT multi-party capabilities.
The parameter can have the 913 name "rtt-mix", with one or more of its possible values. 915 The possible values in the list are: 917 rtp-mixer 919 perc 921 rtp-mixer indicates capability for using the RTP-mixer based 922 presentation of multi-party text. 924 perc indicates capability for using the perc based transmission of 925 multi-party text. 927 Example: a=fmtp 96 98/98/98 cps=30;rtt-mix=rtp-mixer 929 If, after evaluation of the alternatives in this specification, only 930 one mixing method is selected to be brought to implementation, then 931 the parameter can be reduced to a single parameter with no list of 932 values. 934 An offer-answer exchange should take place and the common method 935 selected by the answering party shall be used in the session with 936 that UA. 938 When no common method is declared, then only the fallback method can 939 be used, or the session denied. 941 Pros: 943 Provides a clear decision method. 945 Can be extended with new mixing methods. 947 Can be used on specific text media. 949 Can be used also for SDP-controlled WebRTC sessions with multiple 950 streams in the same data channel. 952 Cons: 954 Requires standardization and IANA registration. 956 May cause interop problems with current RFC4103 [RFC4103] 957 implementations not expecting a new fmtp-parameter. 959 Cannot guide SIP routing. 961 6.6. Preferred capability declaration method. 963 The preferred capability declaration method is the one with a 964 simplified SDP attribute "a=rtt-mix" Section 6.4 because it is 965 straightforward and partially usable also for WebRTC. 967 7. Identification of the source of text 969 The main way to identify the source of text in the RTP based solution 970 is by the SSRC of the sending participant. In the RTP-mixer 971 solution, it is included in the CSRC list of the transmitted packets. 972 Further identification that may be needed for better labeling of 973 received text may be achieved from a number of sources. 
These may be 974 the RTCP SDES CNAME and NAME reports and the conference 975 notification data (RFC 4575 [RFC4575]). 977 As soon as a new member is added to the RTP session, its 978 characteristics should be transmitted in RTCP SDES CNAME and NAME 979 reports according to section 6.5 in RFC 3550 [RFC3550]. The 980 information about the participant should also be included in the 981 conference data including the text media member in a notification 982 according to RFC 4575 [RFC4575]. 984 The RTCP SDES report SHOULD contain identification of the source 985 represented by the SSRC/CSRC identifier. This identification MUST 986 contain the CNAME field and MAY contain the NAME field and other 987 defined fields of the SDES report. 989 A focus UA SHOULD primarily convey SDES information received from the 990 sources of the session members. When such information is not 991 available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME 992 information from available information from the SIP session with the 993 participant. 995 8. Presentation of multi-party text 997 All session participants MUST observe the SSRC/CSRC field of incoming 998 text RTP packets, and make note of what source they came from in 999 order to be able to present text in a way that makes it easy to read 1000 text from each participant in a session, and get information about 1001 the source of the text. 1003 8.1. Associating identities with text streams 1005 A source identity SHOULD be composed from available information 1006 sources and displayed together with the text as indicated in ITU-T 1007 T.140 Appendix [T140]. 1009 The source identity should primarily be the NAME field from incoming 1010 SDES packets. If this information is not available, and the session 1011 is a two-party session, then the T.140 source identity SHOULD be 1012 composed from the SIP session participant information.
For multi- 1013 party sessions the source identity may be composed from local 1014 information if sufficient information is not available in the 1015 session. 1017 Applications may abbreviate the presented source identity to a 1018 suitable form for the available display. 1020 Applications may also replace received source information with 1021 internally used nicknames. 1023 8.2. Presentation details for multi-party aware UAs. 1025 The multi-party aware UA should, after any action for recovery of data 1026 from lost packets, separate the incoming streams and present them 1027 according to the style that the receiving application supports and 1028 the user has selected. The decisions taken for presentation of the 1029 multi-party interchange shall be purely on the receiving side. The 1030 sending application must not insert any item in the stream to 1031 influence presentation that is not requested by the sending 1032 participant. 1034 8.2.1. Bubble style presentation 1036 One often used style is to present real-time text in chunks in 1037 readable bubbles identified by labels containing names of sources. 1038 Bubbles are placed in one column in the presentation area and are 1039 closed and moved upwards in the presentation area after certain items 1040 or events, when there is also newer text from another source that 1041 would go into a new bubble. The text items that allow bubble 1042 closing are any character closing a phrase or sentence followed by a 1043 space or a timeout of a suitable time (about 10 seconds). 1045 Real-time active text sent from the local user should be presented in 1046 a separate area. When there is a reason to close a bubble from the 1047 local user, the bubble should be placed above all real-time active 1048 bubbles, so that the time order in which real-time text entries were 1049 completed is visible. 1051 Scrolling is usually provided for viewing of recent or older text.
1052 When scrolling is done to an earlier point in the text, the 1053 presentation shall not move the scroll position by new received text. 1054 It must be the decision of the local user to return to automatic 1055 viewing of latest text actions. It may be useful to provide an indication 1056 that there is new text to read after scrolling to an earlier position 1057 has been activated. 1059 The presentation area may become too small to present all text in all 1060 real-time active bubbles. Various techniques can be applied to 1061 provide a good overview and good reading opportunity even in such 1062 situations. The active real-time bubble may have a limited number of 1063 lines and if their contents need more lines, then a scrolling 1064 opportunity within the real-time active bubble is provided. Another 1065 method can be to only show the label and the last line of the active 1066 real-time bubble contents, and make it possible to expand or compress 1067 the bubble presentation between full view and one line view. 1069 Erasures require special consideration. Erasure within a real-time 1070 active bubble is straightforward. But if erasure from one 1071 participant affects the last character before a bubble, the whole 1072 previous bubble becomes the actual bubble for real-time action by 1073 that participant and is placed below all other bubbles in the 1074 presentation area. If the border between bubbles was caused by the 1075 CRLF characters (instead of the normal "Line Separator"), only one 1076 erasure action is required to erase this bubble border. When a 1077 bubble is closed, it is moved up, above all real-time active bubbles. 1079 A three-party view is shown in this example.
1081  _________________________________________________
1082 |                                              |^|
1083 |                                              | |
1084 |                                              | |
1085 |                                              | |
1086 |[Alice] Hi, Alice here.                       | |
1087 |                                              | |
1088 |[Bob] Bob as well.                            | |
1089 |                                              | |
1090 |[Eve] Hi, this is Eve, calling from Paris.    | |
1091 |      I thought you should be here.           | |
1092 |                                              | |
1093 |[Alice] I am coming on Thursday, my           | |
1094 |      performance is not until Friday morning.| |
1095 |                                              | |
1096 |[Bob] And I on Wednesday evening.             | |
1097 |                                              | |
1098 |[Alice] Can we meet on Thursday evening?      | |
1099 |                                              | |
1100 |[Eve] Yes, definitely. How about 7pm.         | |
1101 |      at the entrance of the restaurant       | |
1102 |      Le Lion Blanc?                          | |
1103 |[Eve] we can have dinner and then take a walk | |
1104 |                                              | |
1105 | But I need to be back to                     | |
1106 | the hotel by 11 because I need               | |
1107 |                                              |-|
1108 | I wou                                        |-|
1109 |______________________________________________|v|
1110 | of course, I underst                         |
1111 |________________________________________________|
1113 Figure 1: Example of a three-party call presented in the bubble 1114 style. 1118 8.2.2. Other presentation styles 1120 Other presentation styles than the bubble style may be arranged and 1121 appreciated by the users. In a video conference one way may be to 1122 have a real-time text area below the video view of each participant. 1123 Another view may be to provide one column in a presentation area for 1124 each participant and place the text entries in a relative vertical 1125 position corresponding to when text entry in them was completed. The 1126 labels can then be placed in the column header. The considerations 1127 for ending and moving and erasure of entered text discussed above for 1128 the bubble style are valid also for these styles. 1130 This figure shows how a coordinated column view MAY be presented.
1132 _____________________________________________________________________
1133 |        Bob         |         Eve          |         Alice         |
1134 |____________________|______________________|_______________________|
1135 |                    |                      |I will arrive by TGV.  |
1136 |My flight is to Orly|                      |Convenient to the main |
1137 |                    |Hi all, can we plan   |station.               |
1138 |                    |for the seminar?      |                       |
1139 |Eve, will you do    |                      |                       |
1140 |your presentation on|                      |                       |
1141 |Friday?             |Yes, Friday at 10.    |                       |
1142 |Fine, wo            |                      |We need to meet befo   |
1143 |___________________________________________________________________|
1145 Figure 2: A coordinated column-view of a three-party session with 1146 entries ordered in approximate time-order. 1148 9. Presentation details for multi-party unaware UAs. 1150 Multi-party unaware UAs are prepared only for presentation of two 1151 sources of text, the local user and a remote user. If mixing for 1152 multi-party unaware UAs is to be supported, in order to enable some 1153 multi-party communication with such UAs, the mixer needs to plan the 1154 presentation and insert labels and line breaks before labels. Many 1155 limitations appear for this presentation mode, and it must be seen as 1156 a fallback and a last resort. A realistic alternative is to not 1157 allow multi-party sessions with multi-party unaware UAs. 1159 See Appendix A for an informative example of a procedure for 1160 presenting RTT to a conference-unaware UA. 1162 10. Security Considerations 1164 The security considerations valid for RFC 4103 [RFC4103] and RFC 3550 1165 [RFC3550] are valid also for the multi-party sessions with text. 1167 11. IANA Considerations 1169 The items for indication and negotiation of capability for multi- 1170 party RTT should be registered with IANA in the specifications where 1171 they are specified in detail. 1173 12. Congestion considerations 1175 The congestion considerations described in RFC 4103 [RFC4103] are 1176 valid also for multi-party use of the real-time text RTP transport. 1177 A risk for congestion may appear if a number of conference 1178 participants are actively transmitting text simultaneously, because 1179 this multi-party transmission method does not allow multiple sources 1180 of text to contribute to the same packet. 1182 In situations of risk for congestion, the Focus UA MAY combine 1183 packets from the same source to increase the transmission interval 1184 per source up to one second.
Local conference policy in the Focus UA 1185 may be used to decide which streams shall be selected for such 1186 transmission frequency reduction. 1188 13. Acknowledgements 1190 Arnoud van Wijk for contributions to an earlier, expired draft of 1191 this memo. 1193 14. Changes 1195 14.1. Changes from version -01 to -02 1197 Change from a general overview to overview with clear 1198 recommendations. 1200 Splits text coordination methods in three groups. 1202 Recommends rtt-mixer with sources in CSRC-list but referenes to its 1203 spec for details. 1205 Shortened Appendix with conference-unaware example. 1207 Cleaned up preferences. 1209 Inserted pictures of screen-views. 1211 15. References 1213 15.1. Normative References 1215 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1216 Requirement Levels", BCP 14, RFC 2119, 1217 DOI 10.17487/RFC2119, March 1997, 1218 . 1220 15.2. Informative References 1222 [EN301549] 1223 ETSI, "EN 301 549. Accessibility requirements for ICT 1224 products and services", November 2019. 1226 [I-D.hellstrom-avtcore-multi-party-rtt-source] 1227 Hellstrom, G., "Indicating source of multi-party Real-time 1228 text", draft-hellstrom-avtcore-multi-party-rtt-source-01 1229 (work in progress), February 2020. 1231 [I-D.ietf-mmusic-t140-usage-data-channel] 1232 Holmberg, C., "T.140 Real-time Text Conversation over 1233 WebRTC Data Channels", draft-ietf-mmusic-t140-usage-data- 1234 channel-11 (work in progress), December 2019. 1236 [I-D.ietf-perc-private-media-framework] 1237 Jones, P., Benham, D., and C. Groves, "A Solution 1238 Framework for Private Media in Privacy Enhanced RTP 1239 Conferencing (PERC)", draft-ietf-perc-private-media- 1240 framework-12 (work in progress), June 2019. 1242 [NENAi3] NENA, "NENA-STA-010.2-2016. Detailed Functional and 1243 Interface Standards for the NENA i3 Solution", October 1244 2016. 1246 [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., 1247 Handley, M., Bolot, J., Vega-Garcia, A., and S. 
Fosse- 1248 Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, 1249 DOI 10.17487/RFC2198, September 1997, 1250 . 1252 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1253 A., Peterson, J., Sparks, R., Handley, M., and E. 1254 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1255 DOI 10.17487/RFC3261, June 2002, 1256 . 1258 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1259 with Session Description Protocol (SDP)", RFC 3264, 1260 DOI 10.17487/RFC3264, June 2002, 1261 . 1263 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1264 Jacobson, "RTP: A Transport Protocol for Real-Time 1265 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 1266 July 2003, . 1268 [RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, 1269 "Indicating User Agent Capabilities in the Session 1270 Initiation Protocol (SIP)", RFC 3840, 1271 DOI 10.17487/RFC3840, August 2004, 1272 . 1274 [RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller 1275 Preferences for the Session Initiation Protocol (SIP)", 1276 RFC 3841, DOI 10.17487/RFC3841, August 2004, 1277 . 1279 [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text 1280 Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, 1281 . 1283 [RFC4353] Rosenberg, J., "A Framework for Conferencing with the 1284 Session Initiation Protocol (SIP)", RFC 4353, 1285 DOI 10.17487/RFC4353, February 2006, 1286 . 1288 [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A 1289 Session Initiation Protocol (SIP) Event Package for 1290 Conference State", RFC 4575, DOI 10.17487/RFC4575, August 1291 2006, . 1293 [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol 1294 (SIP) Call Control - Conferencing for User Agents", 1295 BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006, 1296 . 1298 [RFC4597] Even, R. and N. Ismail, "Conferencing Scenarios", 1299 RFC 4597, DOI 10.17487/RFC4597, August 2006, 1300 . 1302 [RFC5194] van Wijk, A., Ed. and G. 
Gybels, Ed., "Framework for Real- 1303 Time Text over IP Using the Session Initiation Protocol 1304 (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008, 1305 . 1307 [RFC6443] Rosen, B., Schulzrinne, H., Polk, J., and A. Newton, 1308 "Framework for Emergency Calling Using Internet 1309 Multimedia", RFC 6443, DOI 10.17487/RFC6443, December 1310 2011, . 1312 [RFC6881] Rosen, B. and J. Polk, "Best Current Practice for 1313 Communications Services in Support of Emergency Calling", 1314 BCP 181, RFC 6881, DOI 10.17487/RFC6881, March 2013, 1315 . 1317 [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, 1318 DOI 10.17487/RFC7667, November 2015, 1319 . 1321 [T140] ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for 1322 multimedia application text conversation", February 1998. 1324 [T140ad1] ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000), 1325 Protocol for multimedia application text conversation", 1326 February 2000. 1328 [TS103479] 1329 ETSI, "TS 103 479. Emergency communications (EMTEL); Core 1330 elements for network independent access to emergency 1331 services", December 2019. 1333 [TS22173] 3GPP, "IP Multimedia Core Network Subsystem (IMS) 1334 Multimedia Telephony Service and supplementary services; 1335 Stage 1", 3GPP TS 22.173 17.1.0, December 2019. 1337 [TS24147] 3GPP, "Conferencing using the IP Multimedia (IM) Core 1338 Network (CN) subsystem; Stage 3", 3GPP TS 24.147 16.0.0, 1339 December 2019. 1341 Appendix A. Mixing for a conference-unaware UA 1343 This informational appendix describes media mixer procedures for a 1344 multi-party conference server to format real-time text from a number 1345 of participants into one single text stream to a participant with a 1346 terminal that has no features for multi-party text display. The 1347 procedures are intended for implementations using ITU-T T.140 [T.140] 1348 for the real-time text coding and presentation. 1350 A.1. 
Short description 1352 The media mixer procedures described here are intended to make real- 1353 time text from a number of call participants be coordinated into one 1354 text stream to a terminal originally intended for two-party calls. A 1355 conference server is supposed to apply the procedures. 1357 The procedures may also be applied on a terminal for display of 1358 multiple streams of real-time text in one area. 1360 The intention is that text from each participant shall be displayed 1361 in suitable sections so that it is easy to read, and text from one 1362 active participant at a time is sent and displayed in real-time. The 1363 receiving terminal is assumed to have one display area for received 1364 text. The display is arranged by this procedure in a text chat 1365 style, with a name label in front of each text section where switch 1366 of source of the text has taken place. 1368 When more than one participant transmits text at the same time, the 1369 text from only one of them is transmitted directly to the receiving 1370 terminals. Text from the other participants is stored in buffers in 1371 the conference server for transmission at a later time, when a 1372 suitable situation for switch of current transmitter can take place. 1374 A.2. Functionality goals and drawbacks 1376 The procedures are intended to make best efforts to present a multi- 1377 party text conversation on a terminal that has no awareness of multi- 1378 party calls. There are some obvious drawbacks, and a terminal 1379 designed with multi-party awareness will be able to present multi- 1380 party call contents in a more flexible way. Only two parties at a 1381 time will be allowed to display added text in real-time, while the 1382 other parties' produced text will need to be stored in the multi- 1383 party server for a moment awaiting a suitable occasion to be 1384 displayed. 
There are also some cases of erasure that will not be 1385 performed on the target text but only indicated in another way. Even 1386 with these drawbacks, the procedure provides an opportunity to 1387 display text from more than two parties in a smooth and readable way. 1389 This specification does not introduce any new protocol element, and 1390 does not rely on anything else than basic two-party terminal 1391 functionality with presentation level according to ITU-T T.140 1392 [T.140]. It is a description of a best current practice for mixing 1393 and presentation of the real-time text component in multi-party calls 1394 with terminals without multi-party awareness. 1396 The procedures are applicable to scenarios, when the conference focus 1397 and a User Agent have not gone through any successfully completed 1398 negotiation about conference awareness for the real-time text medium 1399 neither on the transport level, nor on the presentation level. 1401 A.3. Definitions 1403 Active participant: Any user sending text, or being in a pending 1404 period. 1406 BOM Byte-Order-Mark, the Unicode character FEFF in UCS-16. 1408 Buffer: A buffer intended for unsent text collected per 1409 participant. 1411 Contributing participants: The participants selected to contribute 1412 to the text stream sent to the recipients. 1414 By default all participants except the recipient are contributing 1415 participants for transmission to the recipient. 1417 Current participant: The participant for whom text currently is 1418 transmitted to the recipient in real time. 1420 Current Recipients: By default all participants. 1422 Display Counter: A counter for the number of displayable 1423 characters in a participant's buffer or in the current entry. 1424 Used for controlling how far erasure may be performed. 1426 Erasure replacement A character to be displayed when an erasure 1427 was done, but the text to erase is not reachable on the multi- 1428 party display. Default 'X'. 
1430 Message delimiter: Character(s) forming the end of an imagined 1431 message. A configurable set of alternatives, consisting by 1432 default of: Line Separator, Paragraph Separator, CR, CRLF, LF. 1434 Pending period: A configurable time period of inactivity from a 1435 participant, by default set to 7 seconds after each reception of 1436 characters from that participant, evaluated as current time minus 1437 time stamp of latest entered character. 1439 Sentence delimiter: Characters forming end of sentence: A 1440 configurable set of alternatives, by default consisting of: dot 1441 '.', question mark '?' and exclamation mark '!' followed by a 1442 space. 1444 Label: A readable unique name for a participant, created by the 1445 server from a suitable source related to the participant, e.g. 1446 part of the SIP Display name, surrounded by the Label delimiters. 1447 The label should have a settable maximum length, with 12 being the 1448 default. 1450 Label delimiters A configurable set of characters at the edges of 1451 the Label, by default being a left bracket [ at the leading edge 1452 and a closing bracket ] followed by a space at the trailing edge. 1454 Line Separator Unicode UCS-16 2028. Used to request NewLine in 1455 Real-Time Text. 1457 Maximum waiting time: The maximum time any participant's text 1458 shall be allowed to wait for transmission, by default set to 20 1459 seconds. 1461 Recipient: The terminal receiving the mixed text stream. 1463 SGR Select Graphic Rendition, a control code to specify colours 1464 etc. 
1466 Switch Reason: A set of reasons to switch Current Participant, 1467 consisting of the following 1469 -Waiting time higher for any other participant than the current 1470 participant combined with any of the following states: 1472 -A message delimiter was the latest transmitted item 1474 -A sentence delimiter was the latest transmitted item 1476 -A Pending Period has expired and still no text has been 1477 transmitted 1479 -The Maximum Waiting time has expired followed by a Word Delimiter 1480 or an expired Time Extension. 1482 Waiting time: The time the first character in queue for 1483 transmission from a participant has been waiting in a buffer for 1484 transmission. The granularity shall be 0.3 Seconds or finer. 1486 Word delimiter: Character forming end of word: space 1488 Time extension: A configurable short extension time allowed after 1489 the Maximum waiting time during which a suitable moment for 1490 switching Current Participant is awaited, by default set to 7 1491 seconds. 1493 A.4. Presentation level procedures 1495 The conference server applies these mixing procedures to text 1496 transmitted to call participants who have not gone through a 1497 completed negotiation for conference awareness in real-time text 1498 presentation. 1500 All the participants and the conference server use real-time text 1501 conversation presentation coding according to ITU-T T.140 [T.140]. A 1502 consequence is that real-time text transmissions are UTF-8 coded, 1503 with control codes selected from ISO 6429 [ISO 6429]. 1505 The description is from the conference server point of view. 1507 A.4.1. Structure 1509 The real-time text mixer structure described here is supposed to be 1510 placed in the media path so that it is implemented with one mixer per 1511 recipient. A mixer contains buffers for temporary storage of text 1512 intended for the recipient. Each mixer has one buffer for each 1513 contributing participant. 
A set of status variables is maintained 1514 per buffer and is used in the mixer actions. The mixer logic decides 1515 for each moment which participant's buffer content is to be sent on 1516 to the recipient. By default, the recipient does not contribute text 1517 to its own mixer. Text transmitted by a participant is usually 1518 displayed locally and it will only cause confusion if it appears also 1519 in received text. 1521 A.4.2. Action on reception 1523 This description of the mixer is valid per recipient. 1525 Text from each contributing participant is checked for a set of 1526 characteristics on reception. 1528 Delete BOM: BOM characters are deleted. 1530 Insert in buffer: Resulting text is put into the contributing 1531 participant's buffer in the receiving participant's mixer. 1533 Maintain a display counter: For each text character that will take 1534 a position on the receiving display, the Display Counter maintained 1535 for that participant is increased by one. 1537 There is one T.140 real-time text item that consists of two 1538 characters, but is regarded as a unit and therefore increases 1539 the Display Counter by one only. That is CRLF. 1541 Furthermore, the following control codes are regarded as units that 1542 shall not take any position on the receiving display and shall 1543 therefore not increase the Display Counter: 1545 0098 string 009C (SOS-ST strings) 1547 ESC 0061 (INT) 1549 009B Ps 006D (the SGR code, with special handling described below) 1551 BEL (Alert in session) 1553 See the section on control codes below for details. 1555 Combination characters: Also note that it is possible to use 1556 combination characters in Unicode. Such combination characters 1557 contain more than one character part. They shall only increase 1558 the Display Counter by one. The combination characters mainly 1559 have components in the series 0300 - 0361 and 20D0 - 20E1.
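As an illustration only, the Display Counter rules above (BOM removal, CRLF counted as one unit, uncounted control codes, and combination characters) could be sketched in Python as follows. The function name display_increment and the constant names are hypothetical and not part of this specification. The sketch assumes that every control sequence arrives complete within one chunk of text, treats only the two code ranges named above as combining components, and leaves BS to the separate Erasure rule.

```python
# Hypothetical names; code points are those listed in the text above.
BOM = "\ufeff"
BEL = "\u0007"
ESC = "\u001b"
SOS = "\u0098"   # start of an SOS-ST string
ST = "\u009c"    # string terminator
CSI = "\u009b"   # introduces the SGR sequence 009B Ps 006D


def is_combining_part(ch: str) -> bool:
    """True for the component ranges named in the text (0300-0361, 20D0-20E1)."""
    cp = ord(ch)
    return 0x0300 <= cp <= 0x0361 or 0x20D0 <= cp <= 0x20E1


def display_increment(text: str) -> int:
    """Return how many display positions a chunk of received text occupies.

    Assumption: BS is not present; erasure is handled elsewhere.
    """
    count = 0
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in (BOM, BEL):
            i += 1                       # takes no display position
        elif ch == SOS:
            end = text.find(ST, i + 1)   # skip the whole SOS ... ST string
            i = n if end == -1 else end + 1
        elif ch == ESC and text[i + 1:i + 2] == "a":
            i += 2                       # INT (ESC 0061)
        elif ch == CSI:
            end = text.find("m", i + 1)  # skip SGR up to its final 006D
            i = n if end == -1 else end + 1
        elif ch == "\r" and text[i + 1:i + 2] == "\n":
            count += 1                   # CRLF is one unit
            i += 2
        elif is_combining_part(ch):
            i += 1                       # already counted with its base character
        else:
            count += 1
            i += 1
    return count
```

For example, display_increment("Hi\r\n") evaluates to 3, since CRLF is counted as a single unit. A real mixer would also need to handle control sequences that are split across received packets, which this sketch does not attempt.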
1561 Erasure: If the control code for erasure, BS, is received, the 1562 following shall be done: If the Display Counter is 0, an Erasure 1563 Replacement character, by default being "X", is inserted in the 1564 buffer instead of the erasure, to mark that erasure was intended 1565 in earlier transmitted entries. (This matches traditional habits 1566 in real-time text when participants sometimes type XXX to indicate 1567 erasure they do not bother to make explicit.) If the Display 1568 Counter is >0, then the counter is reduced by one, and the erasure 1569 control code BS is put into the buffer. 1571 Initial action in the session: BOM shall be sent to the 1572 recipients at the beginning of the session. 1574 Maintaining a waiting time per participant: The time that text has 1575 been in the buffer is maintained as the waiting time for each 1576 buffer. A granularity of 0.3 seconds is sufficient. 1578 Storing time of reception for each character: Each character that 1579 is stored in a buffer shall be assigned a time stamp 1580 indicating its time of reception. A granularity of 0.3 seconds is 1581 sufficient. This time stamp is used for calculation of idle time 1582 and waiting time in the evaluation of switch reasons. 1584 Initial assignment of the Current Participant: The first 1585 contributing participant to send text in the session is assigned 1586 to be the Current Participant. 1588 Actions on assignment of a Current Participant: When a participant 1589 becomes the Current Participant, the following initial actions 1590 shall be performed: 1592 1. Scanning transmissions and timers for a Switch Reason is 1593 deactivated. 1595 2. The Current Recipients are set so that all transmissions go to 1596 the new set of Current Recipients (See definition). 1598 3. A Line Separator is transmitted if the switch reason was any 1599 other than a message delimiter. 1601 4. The Label is transmitted. 1602 5. Any stored SGR code is transmitted. 1604 6.
Scanning transmissions and timers for a Switch Reason is 1605 activated. 1607 7. Text in the buffer is transmitted, recalculating and setting 1608 the waiting time for each transmitted character based on the time 1609 of reception of the next character in the buffer. If a switch occurs 1610 during transmission from the buffer, the remaining buffer contents 1611 are maintained and transmission can continue next time this 1612 transmitter becomes the current participant. Any text entered 1613 into the buffer for the current participant is after that sent to 1614 the recipient until a Switch Reason occurs. 1616 Actions on transmission and during the session: Transmissions are 1617 checked for control codes to act on at transmission as described 1618 below in the section about handling of control codes and such 1619 actions are performed. When the scanning of transmission and 1620 timers for a Switch Reason is active, the timers and the 1621 transmission to the recipient are analyzed to detect whether a 1622 Switch Reason has occurred. See the definition of Switch Reasons 1623 for details. 1625 Actions when a Switch Reason has occurred: If a Switch Reason has 1626 occurred, then the following actions shall be performed: 1628 1. The Display Counter of the Current Participant is set to zero. 1630 2. If there is an SGR code stored for the Current Participant, a 1631 reset of SGR shall be sent by the sequence SGR 0 [009B 0000 006D]. 1633 3. A participant with the longest waiting time is assigned to be 1634 the Current Participant, and the procedure for assignment of a 1635 Current Participant described above is performed. 1637 Handling of Control codes: The following control codes are 1638 specified by ITU-T T.140. Some of them require consideration in 1639 the conference server. Note that the codes presented here are 1640 expressed in UCS-16, while transmission is made in the UTF-8 transform 1641 of these codes.
Other sections specify procedures for handling of 1642 specific control codes in the conference server. 1644 BEL 0007 Bell, provides for alerting during an active session. 1646 BS 0008 Back Space, erases the last entered character. 1648 NEW LINE 2028 Line separator. 1650 CR LF 000D 000A A supported, but not preferred way of requesting a 1651 new line. 1653 INT ESC 0061 Interrupt (used to initiate mode negotiation 1654 procedure). 1656 SGR 009B Ps 006D Select graphic rendition. Ps denotes the rendition 1657 parameters specified in ISO 6429. 1659 SOS 0098 Start of string, used as a general protocol element 1660 introducer, followed by a string of at most 256 bytes. 1662 ST 009C String terminator, end of SOS string. 1664 ESC 001B Escape - used in control strings. 1666 Byte order mark FEFF Zero width, no break space, used for 1667 synchronization. 1669 Missing text mark FFFD Replacement character, marks place in 1670 stream of possible text loss. 1672 Code for message border, useful, but not mentioned in T.140: New 1673 Message 2029 Paragraph separator 1675 Handling of Graphic Rendition SGR: The following procedure shall 1676 be followed in order to let the participants control the graphic 1677 rendition of their entries without disturbing other participants' 1678 graphic rendition. The text stream sent to a recipient shall be 1679 monitored for the SGR sequence. The latest conveyed SGR sequence 1680 is also stored as a status variable for the recipient. If the SGR 1681 0 code initiated from the current participant is transmitted, the 1682 SGR storage shall be cleared. 1684 A.5. Display examples 1686 The following pictures are examples of the view on a participant's 1687 display. 1689 _________________________________________________ 1690 | Conference | Alice | 1691 |________________________|_________________________| 1692 | |I will arrive by TGV. | 1693 |[Bob]:My flight is to |Convenient to the main | 1694 |Orly. |station.
| 1695 |[Eve]:Hi all, can we | | 1696 |plan for the seminar. | | 1697 | | | 1698 |[Bob]:Eve, will you do | | 1699 |your presentation on | | 1700 |Friday? | | 1701 |[Eve]:Yes, Friday at 10.| | 1702 |[Bob]: Fine, wo |We need to meet befo | 1703 |________________________|_________________________| 1705 Figure A1: Alice, who has a conference-unaware client, is receiving 1706 the multi-party real-time text in a single stream. This figure shows 1707 how a coordinated column view MAY be presented on Alice's device. 1709 _________________________________________________ 1710 | |^| 1711 |[mix][Alice] Hi, Alice here. | | 1712 | | | 1713 |[mix][Bob] Bob as well. | | 1714 | | | 1715 |[mix][Eve] Hi, this is Eve, calling from Paris| | 1716 | I thought you should be here. | | 1717 | | | 1718 |[Alice] I am coming on Thursday, my | | 1719 | performance is not until Friday morning.| | 1720 | | | 1721 |[mix][Bob] And I on Wednesday evening. | | 1722 | | | 1723 |[mix][Eve] we can have dinner and then walk | | 1724 | | | 1725 |[mix][Eve] But I need to be back to | | 1726 | the hotel by 11 because I need |-| 1727 | |-| 1728 |______________________________________________|v| 1729 | of course, I underst | 1730 |________________________________________________| 1732 Figure A2 shows a conference view with real-time text preview. Bob's 1733 text is buffering until a Current switch reason. 1735 A.6. References for this Appendix 1737 [T.140] ITU-T T.140 Application protocol, text conversation 1738 (including amendment 1). 1740 [RFC 4103] IETF RFC 4103 RTP Payload for text conversation 1742 [RTP] IETF RFC 3550 RTP: A Transport Protocol for Real-Time 1743 Applications. 1745 [RFC 4579] IETF RFC 4579 SIP Call Control - Conferencing for user 1746 agents. 1748 [ISO 6429] ISO 6429 Control functions for coded character sets. 1750 [UTF-8] IETF RFC 3629 UTF-8, a transformation format of ISO 10646 1752 [Unicode] The Unicode Consortium, "The Unicode Standard - Version 1753 4.0"
1755 [ISO 10646-1] ISO 10646 Universal multiple-octet coded character 1756 set (UCS) 1758 [UCS-16] See ISO 10646-1 1760 A.7. Acknowledgement for the appendix 1762 This appendix was developed with funding in part from the National 1763 Institute on Disability and Rehabilitation Research, U.S. Department 1764 of Education, RERC on Telecommunications Access, grant # H133E090001. 1765 However, the contents do not necessarily represent the policy of the 1766 Department of Education, and you should not assume endorsement by the 1767 Federal Government. 1769 Author's Address 1771 Gunnar Hellstrom 1772 Omnitor 1773 Esplanaden 30 1774 Vendelso SE-136 70 1775 SE 1777 Phone: +46 708 204 288 1778 Email: gunnar.hellstrom@omnitor.se