Internet Engineering Task Force                             G. Hellstrom
Internet-Draft                                                   Omnitor
Intended status: Best Current Practice                 February 23, 2020
Expires: August 26, 2020

        Real-time text media handling in multi-party conferences
               draft-hellstrom-mmusic-multi-party-rtt-01

Abstract

   This memo specifies methods for real-time text (RTT) media handling in multi-party calls. Real-time text is carried over RTP in a time-sampled mode according to RFC 4103. Centralized multi-party handling of real-time text is achieved through a media control unit that coordinates multiple RTP text streams into one RTP session.

   Identification of the streams is provided through the CSRC lists in the RTP packets and through RTCP messages. This mechanism enables the receiving application to present the received real-time text medium separated per source, in different ways according to user preferences. Some presentation-related features are also described, explaining suitable variations of transmission and presentation of text.

   Call control features are described for the SIP environment. A number of alternative methods for providing the multi-party negotiation, transmission and presentation are discussed, and a recommendation for the main one is provided. Two alternative methods using a single RTP stream and source identification inline in the text stream are also described, one of them being provided as a lower-functionality fallback method for endpoints with no multi-party awareness for RTT.

   Brief information is also provided for multi-party RTT in the WebRTC environment.

   EDITOR NOTE: A number of alternatives are specified for discussion. A decision is needed on which alternatives are preferred and how the preferred alternatives shall be emphasized.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 26, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Centralized conference model
   3.  Requirements on multi-party RTT
   4.  Coordination of text RTP streams
     4.1.  RTP Translator sending one RTT stream per participant
     4.2.  RTP Mixer indicating sources in CSRC-list
     4.3.  Distributing packets in an end-to-end encryption structure
     4.4.  RTP Mixer indicating participants by a control code in the stream
     4.5.  Mesh of RTP endpoints
     4.6.  Multiple RTP sessions, one for each participant
     4.7.  Mixing for conference-unaware user agents
   5.  RTT bridging in WebRTC
     5.1.  RTT bridging in WebRTC with one data channel per source
     5.2.  RTT bridging in WebRTC with one common data channel
   6.  Preferred multi-party RTT transport method
   7.  Session control of multi-party RTT sessions
     7.1.  Implicit RTT multi-party capability indication
     7.2.  RTT multi-party capability declared by SIP media-tags
     7.3.  SDP media attribute for RTT multi-party capability indication
     7.4.  SDP format parameter for RTT multi-party capability indication
     7.5.  Preferred capability declaration method.
   8.  Identification of the source of text
   9.  Presentation of multi-party text
     9.1.  Associating identities with text streams
     9.2.  Presentation details for multi-party aware UAs.
       9.2.1.  Bubble style presentation
       9.2.2.  Other presentation styles
   10. Presentation details for multi-party unaware UAs.
   11. Transmission of text from each user
   12. Robustness and indication of possible loss
   13. Performance
   14. Security Considerations
   15. IANA Considerations
   16. Congestion considerations
   17. Acknowledgements
   18. References
     18.1.  Normative References
     18.2.  Informative References
   Appendix A.  Mixing for a conference-unaware UA
     A.1.  Short description
     A.2.  Functionality goals and drawbacks
     A.3.  Definitions
     A.4.  Presentation level procedures
       A.4.1.  Structure
       A.4.2.  Action on reception
     A.5.  Display examples
     A.6.  Summary of configurable parameters
     A.7.  References for this Appendix
     A.8.  Acknowledgement
   Author's Address

1.  Introduction

   Real-time text (RTT) is a medium in real-time conversational sessions. Text entered by participants in a session is transmitted in a time-sampled fashion, so that no specific user action is needed to cause transmission. This gives a direct flow of text at the rate it is created, which is suitable in a real-time conversational setting. The real-time text medium can be combined with other media in multimedia sessions.

   Media from a number of multimedia session participants can be combined in a multi-party session.
   This memo specifies how the real-time text streams are handled in multi-party sessions.

   The description is mainly focused on the transport level, but also describes a few session and presentation level aspects.

   Transport of real-time text is specified in RFC 4103 [RFC4103], RTP Payload for Text Conversation. It uses RFC 3550 [RFC3550], the Real-time Transport Protocol (RTP), for transport. Robustness against network transmission problems is normally achieved through redundant transmission based on the principle from RFC 2198, with one primary and two redundant transmissions of each text element. Primary and redundant transmissions are combined in packets and described by a redundancy header. This transport is usually used in the Session Initiation Protocol (SIP) RFC 3261 [RFC3261] environment.

   A very brief overview of functions for real-time text handling in multi-party sessions is given in RFC 4597 [RFC4597], Conferencing Scenarios, sections 4.8 and 4.10. This specification builds on that description and indicates which protocol mechanisms should be used to implement multi-party handling of real-time text.

   EDITOR NOTE: A number of alternatives are specified for discussion. A decision is needed on which alternatives are preferred and how the preferred alternatives shall be emphasized.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Centralized conference model

   In the centralized conference model for SIP, introduced in RFC 4353 [RFC4353], A Framework for Conferencing with the Session Initiation Protocol (SIP), one function co-ordinates the communication with participants in the multi-party session.
   This function also controls media mixer functions for the media appearing in the session. The central function is common for control of all media, while the media mixers may work differently for each medium.

   The central function is called the Focus UA and may be co-located in an advanced terminal including multi-party control functions, or it may be located in a separate location. Many variants exist for setting up sessions including the multipoint control centre. It is not within the scope of this description to describe these, but rather the media-specific handling in the mixer required to handle multi-party calls with RTT.

   The main principle for handling real-time text media in a centralized conference is that one RTP session for real-time text is established, including the multipoint media control centre and the participating endpoints that are going to exchange real-time text with the others.

   The different possible mechanisms for mixing and transporting RTT differ in the way they multiplex the text streams and how they identify the sources of the streams. RFC 7667 [RFC7667] describes a number of possible use cases for RTP. This specification refers to different sections of RFC 7667 for further reading on the situations caused by the different possible design choices.

3.  Requirements on multi-party RTT

   The following requirements are placed on multi-party RTT:

      The solution shall be applicable to IMS (3GPP TS 22.173), SIP based VoIP and Next Generation Emergency Services (NENA i3, ETSI TS 103 479, RFC 6443).

      The transmission interval for text must not be longer than 500 milliseconds when there is anything available to send. Ref ITU-T T.140.

      If text loss is detected or suspected, a missing text marker shall be inserted in the text stream where the loss is detected or suspected. Ref ITU-T T.140 Amendment 1. ETSI EN 301 549.

      The display of text from the members of the conversation shall be arranged so that the text from each participant is clearly readable, and its source and the relative timing of entered text are visualized in the display. Mechanisms for looking back in the contents from the current session should be provided. The text should be displayed as soon as it is received. Ref ITU-T T.140.

      Bridges must be multimedia capable (voice, video, text). Ref NENA i3 STA-010.2.

      R7: It MUST be possible to use real-time text in conferences both as a medium of discussion between individual participants (for example, for sidebar discussions in real-time text while listening to the main conference audio) and for central support of the conference with real-time text interpretation of speech. Ref RFC 5194.

      It should be possible to protect RTT contents with usual means for privacy and integrity. Ref RFC 6881 section 16.

      Conferencing procedures are documented in RFC 4579. Ref NENA i3 STA-010.2.

      Conferencing applies to any kind of media stream by which users may want to communicate... Ref 3GPP TS 24.147.

      The framework for SIP conferences is specified in RFC 4353. Ref 3GPP TS 24.147.

4.  Coordination of text RTP streams

   Coordinating and sending text RTP streams in the multi-party session can be done in a number of ways. The most suitable methods are specified here with pros and cons.

   A receiving UA SHOULD separate text from the different sources and identify and display them accordingly.

4.1.  RTP Translator sending one RTT stream per participant

   Within the RTP session, text from each participant is transmitted from the RTP media translator in a separate RTP stream, thus using the same destination address/port combination, but separate RTP SSRC parameters and sequence number series, as described in Sections 7.1 and 7.2 of RFC 3550 [RFC3550] about the Translator function. The sources of the text in each RTP packet are identified by the SSRC parameters in the RTP packets, containing the SSRC of the initial sources of text.

   A receiving UA is supposed to separate text items from the different sources and identify and display them in a suitable way.

   This method is described in RFC 7667, section 3.5.1 Relay-transport translator or 3.5.2 Media translator.

   The identification of the source is made through the RTCP SDES CNAME and NAME items as described in RTP [RFC3550].

   Pros:

   This method has moderate overhead. When packet loss occurs, text can be recovered from redundancy for a loss of up to the number of redundancy levels carried in the RFC 4103 stream (normally primary and two redundant levels).

   Loss beyond what can be recovered can be detected, and the marker for text loss can be inserted in the correct stream.

   It may be possible in some scenarios to keep the text encrypted through the Translator.

   Cons:

   There may be RTP implementations that do not support the Translator model.

   It is most likely that this configuration is not supported by current media declarations in SDP. RFC 3264 specifies in many places that one media description is supposed to describe just one RTP stream.

4.2.  RTP Mixer indicating sources in CSRC-list

   An RTP media mixer combines text from all participants except the receiving endpoint into one RTP stream, thus using the same destination address/port combination, the same RTP SSRC, and one sequence number series, as described in Sections 7.1 and 7.3 of RFC 3550 [RFC3550] about the Mixer function. The sources of the text in each RTP packet are identified by the CSRC parameters in the RTP packets, containing the SSRC of the initial sources of text. The order of the CSRC parameters is the same as the order of the redundant and primary data fields in the packet. If all redundancy blocks in a packet are from the same source, then it is allowed to use only one CSRC in the RTP packet. This method is described in RFC 7667, section 3.6.3 Media switching mixer.

   A set of specific rules for the application of this method together with RFC 4103 is needed.

   The identification of the source can be made through the RTCP SDES CNAME and NAME items as described in RTP [RFC3550].

   The notification sent according to RFC 4575 when the participant joined the conference also provides suitable information and a reference to the SSRC.

   A receiving UA is supposed to separate text items from the different sources and identify and display them accordingly.

   The ordered CSRC lists in the RFC 4103 packets make it possible to recover from the loss of one or two packets in sequence and assign the recovered text to the right source. For more loss, a marker for possible loss should be inserted or presented.

   The conference server needs to have authority to decrypt the payload in the RTP packets in order to be able to recover text from redundant data or insert the missing text marker in the stream, and repack the text in new packets.

   Pros:

   This method has moderate overhead.

   When packet loss occurs, text can be recovered from redundancy for a loss of up to the number of redundancy levels carried in the RFC 4103 stream (normally primary and two redundant levels).

   This method can be implemented with most RTP implementations.

   Cons:

   When more consecutive packets are lost than the number of generations of redundant data, it is not possible to deduce the sources of the totally lost data. Therefore it is not possible to know in which stream to insert the missing text marker. It MAY be acceptable to either give a general loss indication, or insert a loss marker in all streams. The most likely source can however be calculated from received RTP and RTCP contents, so that the loss marker can be inserted in the stream most likely to have been affected.

   The conference server needs to be allowed to decrypt/encrypt the packet payload. This is however normal for media mixers for other media.

4.3.  Distributing packets in an end-to-end encryption structure

   In order to achieve end-to-end encryption, it is possible to let the packets from the sources just pass through a central distributor, and handle the security agreements between the participants. Specifications exist for a framework with this functionality, suitable for application to RTP based conferences, in draft-ietf-perc-private-media-framework. The RTP flow and mixing characteristics have similarities with the method described under "RTP Translator sending one RTT stream per participant" above. RFC 4103 RTP streams would fit into the structure, and it would provide a base for end-to-end encrypted RTT multi-party conferencing.

   Pros:

   Good security.

   Straightforward multi-party handling.

   Cons:

   Does not operate under the usual SIP central conferencing architecture.

   Requires the participants to perform a lot of key handling.

4.4.  RTP Mixer indicating participants by a control code in the stream

   Text from all participants except the receiving one is transmitted from the media mixer in the same RTP session and stream, thus using the same destination address/port combination, the same RTP SSRC, and one sequence number series, as described in Sections 7.1 and 7.3 of RFC 3550 [RFC3550] about the Mixer function. The sources of the text in each RTP packet are identified by a newly defined T.140 control code "c" followed by a unique identification of the source in UTF-8 string format.

   The receiver can use the string for presenting the source of text. This method is described on the RTP level in RFC 7667, section 3.6.2 Media mixing mixer.

   The inline coding of the source of text is applied in the data stream itself, and an RTP mixer function is used for coordinating the sources of text into one RTP stream.

   Information uniquely identifying each user in the multi-party session is placed as the parameter value "n" in the T.140 application protocol function with the function code "c". The identifier shall thus be formatted like this: SOS c n ST, where SOS and ST are coded as specified in ITU-T T.140 [T.140]. The "c" is the letter "c". The n parameter value is a string uniquely identifying the source. This parameter shall be kept short so that it can be repeated in the transmission without concerns for network load.

   A receiving UA is supposed to separate text items from the different sources and identify and display them accordingly.

   The conference server needs to be allowed to decrypt/encrypt the packet payload in order to check the source and repack the text.

   Pros:

   If packet loss occurs, text can be recovered from redundancy for a loss of up to the number of redundancy levels carried in the RFC 4103 stream (normally primary and two redundant levels).

   This method can be implemented with most RTP implementations.

   Transmitted text can also be used with other transports than RTP.

   Cons:

   If more consecutive packets are lost than the number of generations of redundant data, it is not possible to deduce the source of the totally lost data. Therefore it is not possible to know in which stream to insert the missing text marker. The most likely source can however be calculated from recent history, so that it is quite likely that the marker is inserted in the correct stream. Such loss should however be rare, and a general warning that there might have been text loss in the session might be acceptable.

   The mixer needs to be able to generate unique source identifications that are suitable as labels for the sources.

   Requires an extension of the ITU-T T.140 standard, best made by the ITU.

   The conference server needs to be allowed to decrypt/encrypt the packet payload.

4.5.  Mesh of RTP endpoints

   Text from all participants is transmitted directly to all others in one RTP session, without a central bridge. The sources of the text in each RTP packet are identified by the source network address and the SSRC.

   This method is described in RFC 7667, section 3.4 Point to multi-point using mesh.

   Pros:

   When packet loss occurs, text can be recovered from redundancy for a loss of up to the number of redundancy levels carried in the RFC 4103 stream (normally primary and two redundant levels).

   This method can be implemented with most RTP implementations.

   Transmitted text can also be used with other transports than RTP.

   Cons:

   This model is not described in IMS, NENA and EENA specifications, and therefore does not meet the requirements.

4.6.  Multiple RTP sessions, one for each participant

   Text from all participants is transmitted directly to all others in one RTP session each, without a central bridge. Each session is established with a separate media description in SDP. The sources of the text in each RTP packet are identified by the source network address and the SSRC.

   This method is out of scope for further discussion here, because the foreseen applications use centralized model conferencing.

   Pros:

   When packet loss occurs, text can be recovered from redundancy for a loss of up to the number of redundancy levels carried in the RFC 4103 stream (normally primary and two redundant levels).

   Complete loss of text can be indicated in the received stream.

   This method can be implemented with most RTP implementations.

   End-to-end encryption is achievable.

   Cons:

   This method is not described in IMS, NENA and EENA specifications, and therefore does not meet the requirements.

   A lot of network resources are spent on setting up separate sessions for each participant.

4.7.  Mixing for conference-unaware user agents

   Multi-party real-time text contents can be transmitted to conference-unaware user agents if source labeling and formatting of the text is performed by a mixer. This method has the limitations that the layout of the presentation and the format of source identification are controlled purely by the mixer, and that only one source at a time can be presented in real time. Text from other sources needs to be stored temporarily while waiting for an appropriate moment to switch the source of transmitted text. The mixer controls the switching of sources and inserts a source identifier in text format at the beginning of the text after a switch of source.
   The logic of the mixer for deciding when a switch is appropriate should detect a number of places in the text where a switch can be allowed, including new line, end of sentence, end of phrase, a period of inactivity, and a word separator after a long time of active transmission.

   This method MAY be used when no support for multi-party awareness is detected in the receiving endpoint. The basis for this method is described in RFC 7667, section 3.6.2 Media mixing mixer.

   See Appendix A for an informative example of a procedure for presenting RTT to a conference-unaware UA.

   Pros:

   Can be transmitted to conference-unaware endpoints.

   Can be used with other transports than RTP.

   Cons:

   Does not allow full real-time presentation of more than one source at a time. Text from other sources will be delayed, even if the mixer automatically detects suitable moments for switching the source for presentation.

   The only realistic presentation format is a style with the text from the different sources presented with a text label indicating the source, and the text collected in a chat style presentation but with more frequent turn-taking.

   Endpoints often have their own system for adding labels to the RTT presentation. In that case there will be two levels of labels in the presentation, one for the mixer and one for the sources.

   If more packets are lost than can be recovered by the redundancy, it is not possible to detect which source was affected by the loss. It is also possible that a source switch occurred during the loss, and therefore a false indication of the source of text can be provided to the user after such loss.

   Because of all these cons, this method MUST NOT be used as the main method, but only as the last resort for backwards interoperability with conference-unaware endpoints.

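   The switch-point detection described in this section can be sketched as follows. This is a minimal, non-normative illustration in Python. The function name, the parameter names and the two time limits are hypothetical and are not taken from this memo. The sketch also assumes that a new line arrives as LF or as the Unicode LINE SEPARATOR (U+2028).

```python
LINE_BREAKS = ("\n", "\u2028")  # assumed new-line codings
SENTENCE_END = ".!?"
PHRASE_END = ",;:"


def switch_allowed(sent, idle_s, active_s,
                   idle_limit_s=5.0, active_limit_s=30.0):
    """Return True if the mixer may switch to another source now.

    sent      -- text transmitted from the current source since the last switch
    idle_s    -- seconds since the current source last produced text
    active_s  -- seconds the current source has been transmitting
    The two limits are hypothetical, illustrative defaults.
    """
    # A period of inactivity always allows a switch.
    if idle_s >= idle_limit_s:
        return True
    # Nothing sent yet from this source, so nothing is interrupted.
    if not sent:
        return True
    # New line.
    if sent.endswith(LINE_BREAKS):
        return True
    # End of sentence or end of phrase, optionally followed by spaces.
    trimmed = sent.rstrip(" ")
    if trimmed and trimmed[-1] in SENTENCE_END + PHRASE_END:
        return True
    # Word separator after a long time of active transmission.
    if sent.endswith(" ") and active_s >= active_limit_s:
        return True
    return False
```

   A mixer using such a check would keep text from the other sources buffered until the function returns True for the currently presented source, and then insert the text-format source identifier before sending the buffered text.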
   The conference server needs to be allowed to decrypt/encrypt the packet payload.

5.  RTT bridging in WebRTC

   Within WebRTC, real-time text is specified to be carried in WebRTC data channels as specified in draft-ietf-mmusic-t140-usage-data-channel. Two ways to handle multi-party RTT in WebRTC are described briefly below.

5.1.  RTT bridging in WebRTC with one data channel per source

   A straightforward way to handle multi-party RTT is for the bridge to open one T.140 data channel per source towards the receiving participants.

   The stream-id forms a unique stream identification.

   The identification of the source is made through the Label property of the channel, and session information belonging to the source. The UA can compose a readable label for the presentation from this information.

   Pros:

   This is a straightforward solution.

   Cons:

   With a high number of participants, the overhead of establishing the required number of data channels may be high.

5.2.  RTT bridging in WebRTC with one common data channel

   Another way to handle multi-party RTT in WebRTC is for the bridge to combine text from all sources into one data channel and indicate the sources in the stream by a T.140 control code for source.

   The corresponding method for RTP transmission is described in the section "RTP Mixer indicating participants by a control code in the stream" above.

   The identification of the source is made by inserting, at the beginning of each text transmission from a source, a control code extension "c" followed by a string representing the source, framed by the control code start and end flags SOS and ST (see ITU-T T.140 [T.140]).

   A receiving UA is supposed to separate text items from the different sources and identify and display them in a suitable way.

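   As an illustration of the inline source identification described in this section, the following Python sketch frames a source label as SOS c n ST and splits a received string into per-source runs. It is a minimal, non-normative sketch. It assumes that SOS and ST are the C1 control characters U+0098 and U+009C, as in ISO 6429. The function names are hypothetical, and the "c" function is the extension proposed in this memo, which is not yet part of T.140.

```python
SOS = "\u0098"  # Start Of String (assumed coding, C1 control)
ST = "\u009c"   # String Terminator (assumed coding, C1 control)
SOURCE_FUNCTION = "c"  # function code proposed in this memo


def frame_source(label):
    """Build the inline source identifier: SOS c n ST."""
    if SOS in label or ST in label:
        raise ValueError("label must not contain SOS or ST")
    return SOS + SOURCE_FUNCTION + label + ST


def split_by_source(stream, initial=""):
    """Split received text into (source, text) runs.

    Only the "c" function is interpreted here; any other SOS
    function is left in the text untouched.
    """
    runs = []
    source = initial
    pos = 0
    marker = SOS + SOURCE_FUNCTION
    while pos < len(stream):
        start = stream.find(marker, pos)
        if start == -1:
            # No further source switch: the rest belongs to the current source.
            runs.append((source, stream[pos:]))
            break
        if start > pos:
            runs.append((source, stream[pos:start]))
        end = stream.find(ST, start)
        if end == -1:
            # Incomplete identifier; a real receiver would wait for more data.
            break
        source = stream[start + len(marker):end]
        pos = end + 1
    return runs


mixed = frame_source("Alice") + "Hello" + frame_source("Bob") + "Hi"
runs = split_by_source(mixed)
```

   With the example input above, `runs` holds one entry for the text from "Alice" and one for the text from "Bob". A UA would then map each source string to a label suited to the selected presentation style.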
   The UA does not always display the source identification in the received text at the place where it is received, but uses the information as a guide for planning the presentation of received text. A label corresponding to the source identification is presented when needed, depending on the selected presentation style.

   Pros:

   This solution has relatively low overhead on session and network level.

   Cons:

   This solution has higher overhead on the media contents level than the WebRTC solution above.

   Standardisation of the new control code "c" in ITU-T T.140 is required.

   The conference server needs to be allowed to decrypt/encrypt the data channel contents.

6.  Preferred multi-party RTT transport method

   EDITOR NOTE: The recommendations here need to be validated, and the proposed further studies performed.

   For RTP transport of RTT, the method for multi-party mixing and transport for conference-aware parties that stands out as fulfilling the goals best is: "RTP Mixer indicating sources in CSRC-list".

   For WebRTC, one method is preferred because of its simplicity. So, for WebRTC, the method to implement for multi-party RTT with conference-aware parties when no other method is explicitly agreed between implementing parties is: "RTT bridging in WebRTC with one data channel per source".

7.  Session control of multi-party RTT sessions

   General session control aspects for multi-party sessions are described in RFC 4575 [RFC4575], A Session Initiation Protocol (SIP) Event Package for Conference State, and RFC 4579 [RFC4579], Session Initiation Protocol (SIP) Call Control - Conferencing for User Agents. The nomenclature of these specifications is used here.

666 The procedures for a conference-aware model for RTT-transmission 667 shall only be applied if a capability exchange for conference-aware 668 real-time text transmission has been completed and a supported method 669 for multi-party real-time text transmission can be identified. 671 A method for detection of conference-awareness for centralized SIP 672 conferencing in general is specified in RFC 4579 [RFC4579]. The 673 focus sends the "isfocus" feature tag in a SIP Contact header. This 674 causes the conference-aware UA to subscribe to conference 675 notifications from the focus. The focus then sends notifications to 676 the UA about participants entering and leaving the conference and 677 their media capabilities. The information is carried XML-formatted 678 in a 'conference-info' block in the notification according to RFC 679 4575. The mechanism is described in detail in RFC 4575 [RFC4575]. 681 Before a conference media server starts sending multi-party RTT to a 682 UA, the ability of the UA to handle multi-party RTT must be 683 verified. A decision on which mechanism to use for identifying text from 684 the different participants must also be taken, implicitly or 685 explicitly. These verifications and decisions can be made in a 686 number of ways. The most apparent ways are specified here and their 687 pros and cons described. One of the methods is selected to be the 688 one to be used by implementations according to this specification. 690 7.1. Implicit RTT multi-party capability indication 692 Capability for RTT multi-party handling can be decided to be 693 implicitly indicated by session control items. 695 The focus may implicitly indicate multi-party RTT capability by 696 including the media child with value "text" in the RFC 4575 697 conference-info provided in conference notifications.
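As an illustration of how a UA might detect this implicit indication, the following sketch searches a conference-info document for a type element with the value "text". The namespace is the one defined in RFC 4575; the sample document is a shortened, hypothetical notification body, and the function name focus_indicates_text is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# XML namespace of the RFC 4575 conference-info document
NS = "urn:ietf:params:xml:ns:conference-info"


def focus_indicates_text(conference_info_xml):
    """Return True if any <type> element in the document has the value 'text'."""
    root = ET.fromstring(conference_info_xml)
    return any((el.text or "").strip() == "text"
               for el in root.iter("{%s}type" % NS))


# Hypothetical, shortened notification body used only for this sketch
SAMPLE = """<conference-info xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sip:conf@example.com" state="full" version="1">
  <conference-description>
    <available-media>
      <entry label="1"><type>audio</type></entry>
      <entry label="2"><type>text</type></entry>
    </available-media>
  </conference-description>
</conference-info>"""

has_text = focus_indicates_text(SAMPLE)
audio_only = focus_indicates_text(SAMPLE.replace("<type>text</type>",
                                                 "<type>video</type>"))
```

A real implementation would also need to track partial notifications and per-endpoint media, which this sketch does not attempt.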
699 A UA may implicitly indicate multi-party RTT capability by including 700 the text media in the SDP in the session control transactions with 701 the conference focus after the subscription to the conference has 702 taken place. 704 The implicit RTT capability indication means for the focus that it 705 can handle multi-party RTT according to the preferred method 706 indicated in the RTT multi-party methods section above. 708 The implicit RTT capability indication means for the UA that it can 709 handle multi-party RTT according to the preferred method indicated in 710 the RTT multi-party methods section above. 712 If the focus detects that a UA implicitly declared RTT multi-party 713 capability, it SHALL provide RTT according to the preferred method. 715 If the focus detects that the UA does not indicate any RTT multi- 716 party capability, then it shall either provide RTT multi-party text 717 in the way specified for conference-unaware UA above, or refuse to 718 set up the session. 720 If the UA detects that the focus has implicitly declared RTT multi- 721 party capability, it shall be prepared to present RTT in a multi- 722 party fashion according to the preferred method. 724 Pros: 726 Acceptance of implicit multi-party capability implies that no 727 standardisation of explicit RTT multi-party capability exchange is 728 required. 730 Cons: 732 If other methods for multi-party RTT are to be used in the same 733 implementation environment as the preferred ones, then capability 734 exchange needs to be defined for them. 736 Cannot be used outside a strictly applied SIP central conference 737 model. 739 7.2. RTT multi-party capability declared by SIP media-tags 741 Specifications for RTT multi-party capability declarations can be 742 agreed for use as SIP media feature tags, to be exchanged during SIP 743 call control operation according to the mechanisms in RFC 3840 and 744 RFC 3841.
RTT multi-party capability is then 745 indicated by the media feature tag "rtt-mixer", with one or more of 746 its possible values in a comma-separated list. 748 The possible values in the list are: 750 rtp-translator 752 rtp-mixer 754 t140-mixer 756 rtp-mesh 758 multi-session 760 rtp-translator indicates capability for using the RTP-translator 761 based coordination of multi-party text. 763 rtp-mixer indicates capability for using the RTP-mixer based 764 presentation of multi-party text. 766 t140-mixer indicates capability for using the T.140 control code 767 source indicators in a mixer. 769 text-mixer indicates capability for using the fallback method with 770 text formatting for conference-unaware endpoints. 772 rtp-mesh indicates capability for using the mesh based transmission 773 of multi-party text. 775 multi-session indicates capability for using separate point-to-point 776 RTP sessions between all participants. 778 Example: Contact: 780 ;methods="INVITE,ACK,OPTIONS,BYE,CANCEL" 782 ;+sip.rtt-mixer="multi-session" 784 If, after evaluation of the alternatives in this specification, only 785 one mixing method is selected to be brought to implementation, then 786 the media tag can be reduced to a single tag with no list of values. 788 An offer-answer exchange should take place and the common method 789 selected by the answering party shall be used in the session with 790 that UA. 792 When no common method is declared, then only the fallback method can 793 be used or the session dropped. 795 If more than one text media line is included in SDP, all must be 796 capable of using the declared RTT multi-party method. 798 Pros: 800 Provides a clear decision method. 802 Can be extended with new mixing methods. 804 Can guide call routing to a suitable capable focus. 806 Cons: 808 Requires standardization and IANA registration. 810 Is not stream specific.
If more than one text stream is specified, 811 all must have the same type of multi-party capability. 813 Cannot be used in the WebRTC environment. 815 7.3. SDP media attribute for RTT multi-party capability indication 817 An attribute can be specified on media level, to be used in text 818 media SDP declarations for negotiating RTT multi-party capabilities. 819 The attribute can have the name "rtt-mixer", with one or more of its 820 possible values in a comma-separated list. 822 The possible values in the list are: 824 rtp-translator 826 rtp-mixer 828 t140-mixer 830 rtp-mesh 832 multi-session 834 rtp-translator indicates capability for using the RTP-translator 835 based coordination of multi-party text. 837 rtp-mixer indicates capability for using the RTP-mixer based 838 presentation of multi-party text. 840 t140-mixer indicates capability for using the T.140 control code 841 source indicators in a mixer. 843 text-mixer indicates capability for using the fallback method with 844 text formatting for conference-unaware endpoints. 846 rtp-mesh indicates capability for using the mesh based transmission 847 of multi-party text. 849 multi-session indicates capability for using separate point-to-point 850 RTP sessions between all participants. 852 An offer-answer exchange should take place and the common method 853 selected by the answering party shall be used in the session with 854 that UA. 856 When no common method is declared, then only the fallback method can 857 be used. 859 Example: a=rtt-mixer:rtp-mixer 861 If, after evaluation of the alternatives in this specification, only 862 one mixing method is selected to be brought to implementation, then 863 the attribute can be reduced to a single attribute with no list of 864 values. 866 Pros: 868 Provides a clear decision method. 870 Can be extended with new mixing methods. 872 Can be used on specific text media. 874 Can be used also for SDP-controlled WebRTC sessions with multiple 875 streams in the same data channel. 
877 Cons: 879 Requires standardization and IANA registration. 881 Cannot guide SIP routing. 883 7.4. SDP format parameter for RTT multi-party capability indication 885 An FMTP format parameter can be specified for the RFC 4103 media, to 886 be used in text media SDP declarations for negotiating RTT multi- 887 party capabilities. The parameter can have the name "rtt-mixer", 888 with one or more of its possible values in a comma-separated list. 890 The possible values in the list are: 892 rtp-translator 894 rtp-mixer 896 t140-mixer 898 rtp-mesh 900 multi-session 902 rtp-translator indicates capability for using the RTP-translator 903 based coordination of multi-party text. 905 rtp-mixer indicates capability for using the RTP-mixer based 906 presentation of multi-party text. 908 t140-mixer indicates capability for using the T.140 control code 909 source indicators in a mixer. 911 text-mixer indicates capability for using the fallback method with 912 text formatting for conference-unaware endpoints. 914 rtp-mesh indicates capability for using the mesh based transmission 915 of multi-party text. 917 multi-session indicates capability for using separate point-to-point 918 RTP sessions between all participants. 920 Example: a=fmtp 96 98/98/98 cps=30;rtt-mixer=rtp-mixer 922 If, after evaluation of the alternatives in this specification, only 923 one mixing method is selected to be brought to implementation, then 924 the parameter can be reduced to a single parameter with no list of 925 values. 927 An offer-answer exchange should take place and the common method 928 selected by the answering party shall be used in the session with 929 that UA. 931 When no common method is declared, then only the fallback method can 932 be used. 934 Pros: 936 Provides a clear decision method. 938 Can be extended with new mixing methods. 940 Can be used on specific text media. 942 Can be used also for SDP-controlled WebRTC sessions with multiple 943 streams in the same data channel. 
945 Cons: 947 Requires standardization and IANA registration. 949 May cause interop problems with current RFC 4103 implementations not 950 expecting a new fmtp-parameter. 952 Cannot guide SIP routing. 954 7.5. Preferred capability declaration method. 956 The preferred capability declaration method is the one with SDP 957 attributes because it is straightforward and partially usable also 958 for WebRTC. 960 8. Identification of the source of text 962 EDITOR NOTE: The text in the following sections needs to be adapted 963 after recommendations for the main methods for coordination of RTT 964 have been selected. Details should be provided mainly for the 965 recommended method. 967 The main way to identify the source of text in the RTP based solution 968 is by the SSRC of the sending participant. It is included in the 969 CSRC list of the transmitted packets. Further identification that 970 may be needed for better labeling of received text may be obtained 971 from a number of sources. These include the RTCP SDES CNAME and NAME 972 reports and the conference notification data (RFC 4575). 974 As soon as a new member is added to the RTP session, its 975 characteristics should be transmitted in RTCP SDES CNAME and NAME 976 reports according to section 6.5 in RFC 3550. The information about 977 the participant should also be included in the conference data 978 including the text media member in a notification according to RFC 979 4575. 981 The RTCP SDES report SHOULD contain identification of the source 982 represented by the SSRC/CSRC identifier. This identification MUST 983 contain the CNAME field and MAY contain the NAME field and other 984 defined fields of the SDES report. 986 A focus UA SHOULD primarily convey SDES information received from the 987 sources of the session members. When such information is not 988 available, the focus UA SHOULD compose SSRC/CSRC, CNAME and NAME 989 information from available information from the SIP session with the 990 participant.
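The identification of sources through the CSRC list can be illustrated by a minimal sketch that reads the SSRC and the CSRC list from the fixed RTP header defined in RFC 3550, and maps a CSRC to a display label. The function names and the sdes_names dictionary (a mapping from SSRC/CSRC values to NAME strings collected from RTCP SDES reports) are hypothetical; header extensions and payload parsing are outside the scope of the sketch.

```python
import struct


def parse_rtp_sources(packet):
    """Return (ssrc, csrc_list) from the fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("unsupported RTP version")
    cc = first & 0x0F                      # CSRC count, low 4 bits of byte 0
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for its CSRC list")
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    csrcs = list(struct.unpack_from("!%dI" % cc, packet, 12))
    return ssrc, csrcs


def label_for(csrc, sdes_names):
    """Pick a label for a source; fall back to the identifier in hex."""
    return sdes_names.get(csrc, "Participant %08x" % csrc)


# Example: version 2, one CSRC, payload type 98, mixer SSRC 0x11111111
header = struct.pack("!BBHII", 0x81, 98, 1, 0, 0x11111111)
packet = header + struct.pack("!I", 0xAABBCCDD) + b"Hi"
ssrc, csrcs = parse_rtp_sources(packet)
label = label_for(csrcs[0], {0xAABBCCDD: "Alice"})
```

In a mixer-based session the SSRC identifies the mixer, while the CSRC list carries the identifiers of the participants whose text is in the packet.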
992 9. Presentation of multi-party text 994 All session participants MUST observe the SSRC/CSRC field of incoming 995 text RTP packets, and make note of what source they came from in 996 order to be able to present text in a way that makes it easy to read 997 text from each participant in a session, and get information about 998 the source of the text. 1000 9.1. Associating identities with text streams 1002 A source identity SHOULD be composed from available information 1003 sources and displayed together with the text as indicated in ITU-T 1004 T.140 Appendix [T.140]. 1006 The source identity should primarily be the NAME field from incoming 1007 SDES packets. If this information is not available, and the session 1008 is a two-party session, then the T.140 source identity SHOULD be 1009 composed from the SIP session participant information. For multi- 1010 party sessions the source identity may be composed by local 1011 information if sufficient information is not available in the 1012 session. 1014 Applications may abbreviate the presented source identity to a 1015 suitable form for the available display. 1017 9.2. Presentation details for multi-party aware UAs. 1019 The multi-party aware UA should after any action for recovery of data 1020 from lost packets, separate the incoming streams and present them 1021 according to the style that the receiving application supports and 1022 the user has selected. The decisions taken for presentation of the 1023 multi-party interchange shall be purely on the receiving side. The 1024 sending application must not insert any item in the stream to 1025 influence presentation that is not requested by the sending 1026 participant. 1028 9.2.1. Bubble style presentation 1030 One often used style is to present real-time text in chunks in 1031 readable bubbles identified by labels containing names of sources. 
1032 Bubbles are placed in one column in the presentation area and are 1033 closed and moved upwards in the presentation area after certain items 1034 or events, when there is also newer text from another source that 1035 would go into a new bubble. The items that allow bubble 1036 closing are any character closing a phrase or sentence followed by a 1037 space, or a timeout of a suitable length (about 10 seconds). 1039 Real-time active text sent from the local user should be presented in 1040 a separate area. When there is a reason to close a bubble from the 1041 local user, the bubble should be placed above all real-time active 1042 bubbles, so that the time order in which real-time text entries were 1043 completed is visible. 1045 Scrolling is usually provided for viewing of recent or older text. 1046 When scrolling is done to an earlier point in the text, the 1047 presentation shall not move the scroll position when new text is received. 1049 It must be the decision of the local user to return to automatic 1050 viewing of latest text actions. An indication 1051 that there is new text to read may be useful after scrolling to an earlier position 1052 has been activated. 1054 The presentation area may become too small to present all text in all 1055 real-time active bubbles. Various techniques can be applied to 1056 provide a good overview and good reading opportunity even in such 1057 situations. The active real-time bubble may have a limited number of 1058 lines and if its contents need more lines, then a scrolling 1059 opportunity within the real-time active bubble is provided. Another 1060 method can be to only show the label and the last line of the active 1061 real-time bubble contents, and make it possible to expand or compress 1062 the bubble presentation between full view and one line view. 1064 Erasures require special consideration. Erasure within a real-time 1065 active bubble is straightforward.
But if erasure from one 1066 participant affects the last character before a bubble, the whole 1067 previous bubble becomes the current bubble for real-time action by 1068 that participant and is placed below all other bubbles in the 1069 presentation area. If the border between bubbles was caused by the 1070 CRLF characters, only one erasure action is required to erase this 1071 bubble border. When a bubble is closed, it is moved up, above all 1072 real-time active bubbles. 1074 9.2.2. Other presentation styles 1076 Other presentation styles than the bubble style may be arranged and 1077 appreciated by the users. In a video conference one way may be to 1078 have a real-time text area below the video view of each participant. 1079 Another view may be to provide one column in a presentation area for 1080 each participant and place the text entries in a relative vertical 1081 position corresponding to when text entry in them was completed. The 1082 labels can then be placed in the column header. The considerations 1083 for ending and moving and erasure of entered text discussed above for 1084 the bubble style are valid also for these styles. 1086 10. Presentation details for multi-party unaware UAs. 1088 Multi-party unaware UAs are prepared only for presentation of two 1089 sources of text, the local user and a remote user. In order to 1090 enable some multi-party communication with such a UA, the mixer needs to 1091 plan the presentation and insert labels and line breaks before 1092 labels. Many limitations appear for this presentation mode, and it 1093 must be seen as a fallback and a last resort. 1095 See Appendix A for an informative example of a procedure for 1096 presenting RTT to a conference-unaware UA. 1098 11. Transmission of text from each user 1100 UAs participating in sessions with real-time text SHOULD send SDES 1101 packets in RTCP giving values to appropriate identification fields. 1103 The CNAME field SHALL be included in SDES packets.
1105 The NAME field should be given a value that is suitable as an 1106 identifier of text from the user of the UA. 1108 12. Robustness and indication of possible loss 1110 This section discusses the means for robustness against loss of text 1111 that are already specified and their performance in the multi-party 1112 situation. Means for reducing the risk of loss are discussed, as 1113 well as ways to detect in which stream loss has occurred. 1115 TBD 1117 13. Performance 1119 This section discusses performance and performance limitations for 1120 the different transport solutions, and indicates which means for 1121 performance increase versus load limitations can be suitable to apply 1122 compared to the point-to-point case. 1124 TBD 1126 14. Security Considerations 1128 The security considerations valid for RFC 4103 and RFC 3550 are valid 1129 also for the multi-party sessions with text. 1131 15. IANA Considerations 1133 EDITOR NOTE: TBD after decision of proposed preferences in the draft. 1135 This document introduces the TBD /SIP media tag/SDP media level 1136 attribute/ rtt-mixer, with a comma-separated parameter list 1137 containing the following possible values: 1139 rtp-translator 1141 rtp-mixer 1143 t140-mixer 1145 rtp-mesh 1146 multi-session 1148 rtp-translator indicates capability for using the RTP-translator 1149 based coordination of multi-party text. 1151 rtp-mixer indicates capability for using the RTP-mixer based 1152 presentation of multi-party text. 1154 t140-mixer indicates capability for using the T.140 control code 1155 source indicators in a mixer. 1157 text-mixer indicates capability for using the fallback method with 1158 text formatting for conference-unaware endpoints. 1160 rtp-mesh indicates capability for using the mesh based transmission 1161 of multi-party text. 1163 multi-session indicates capability for using separate point-to-point 1164 RTP sessions between all participants. 1166 16.
Congestion considerations 1168 The congestion considerations described in RFC 4103 are valid also 1169 for multi-party use of the real-time text RTP transport. A risk for 1170 congestion may appear if a number of conference participants are 1171 active transmitting text simultaneously, because this multi-party 1172 transmission method does not allow multiple sources of text to 1173 contribute to the same packet. 1175 In situations of risk for congestion, the Focus UA MAY combine 1176 packets from the same source to increase the transmission interval 1177 per source up to one second. Local conference policy in the Focus UA 1178 may be used to decide which streams shall be selected for such 1179 transmission frequency reduction. 1181 17. Acknowledgements 1183 Arnoud van Wijk for contributions to an earlier, expired draft of 1184 this memo. 1186 18. References 1188 18.1. Normative References 1190 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1191 Requirement Levels", BCP 14, RFC 2119, 1192 DOI 10.17487/RFC2119, March 1997, 1193 . 1195 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1196 A., Peterson, J., Sparks, R., Handley, M., and E. 1197 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1198 DOI 10.17487/RFC3261, June 2002, 1199 . 1201 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1202 Jacobson, "RTP: A Transport Protocol for Real-Time 1203 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 1204 July 2003, . 1206 [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text 1207 Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, 1208 . 1210 [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A 1211 Session Initiation Protocol (SIP) Event Package for 1212 Conference State", RFC 4575, DOI 10.17487/RFC4575, August 1213 2006, . 1215 [RFC4579] Johnston, A. and O. 
Levin, "Session Initiation Protocol 1216 (SIP) Call Control - Conferencing for User Agents", 1217 BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006, 1218 . 1220 [T.140] "Protocol for multimedia application text conversation", 1221 1998, . 1223 18.2. Informative References 1225 [RFC4353] Rosenberg, J., "A Framework for Conferencing with the 1226 Session Initiation Protocol (SIP)", RFC 4353, 1227 DOI 10.17487/RFC4353, February 2006, 1228 . 1230 [RFC4597] Even, R. and N. Ismail, "Conferencing Scenarios", 1231 RFC 4597, DOI 10.17487/RFC4597, August 2006, 1232 . 1234 [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, 1235 DOI 10.17487/RFC7667, November 2015, 1236 . 1238 Appendix A. Mixing for a conference-unaware UA 1240 This informational appendix describes media mixer procedures for a 1241 multi-party conference server to format real-time text from a number 1242 of participants into one single text stream to a participant with a 1243 terminal that has no features for multi-party text display. The 1244 procedures are intended for implementations using ITU-T T.140 [T.140] 1245 for the real-time text coding and presentation. 1247 A.1. Short description 1249 The media mixer procedures described here are intended to make real- 1250 time text from a number of call participants be coordinated into one 1251 text stream to a terminal originally intended for two-party calls. A 1252 conference server is supposed to apply the procedures. 1254 The procedures may also be applied on a terminal for display of 1255 multiple streams of real-time text in one area. 1257 The intention is that text from each participant shall be displayed 1258 in suitable sections so that it is easy to read, and text from one 1259 active participant at a time is sent and displayed in real-time. The 1260 receiving terminal is assumed to have one display area for received 1261 text. 
The display is arranged by this procedure in a text chat 1262 style, with a name label in front of each text section where switch 1263 of source of the text has taken place. 1265 When more than one participant transmits text at the same time, the 1266 text from only one of them is transmitted directly to the receiving 1267 terminals. Text from the other participants is stored in buffers in 1268 the conference server for transmission at a later time, when a 1269 suitable situation for switch of current transmitter can take place. 1271 A.2. Functionality goals and drawbacks 1273 The procedures are intended to make best efforts to present a multi- 1274 party text conversation on a terminal that has no awareness of multi- 1275 party calls. There are some obvious drawbacks, and a terminal 1276 designed with multi-party awareness will be able to present multi- 1277 party call contents in a more flexible way. Only two parties at a 1278 time will be allowed to display added text in real-time, while the 1279 other parties' produced text will need to be stored in the multi- 1280 party server for a moment awaiting a suitable occasion to be 1281 displayed. There are also some cases of erasure that will not be 1282 performed on the target text but only indicated in another way. Even 1283 with these drawbacks, the procedure provides an opportunity to 1284 display text from more than two parties in a smooth and readable way. 1286 This specification does not introduce any new protocol element, and 1287 does not rely on anything else than basic two-party terminal 1288 functionality with presentation level according to ITU-T T.140 1289 [T.140]. It is a description of a best current practice for mixing 1290 and presentation of the real-time text component in multi-party calls 1291 with terminals without multi-party awareness. 
1293 The procedures are applicable to scenarios where the conference focus 1294 and a User Agent have not successfully completed any 1295 negotiation about conference awareness for the real-time text medium, 1296 either on the transport level or on the presentation level. 1298 A.3. Definitions 1300 Active participant: Any user sending text, or being in a pending 1301 period. 1303 BOM: Byte-Order-Mark, the Unicode character FEFF in UCS-16. 1305 Buffer: A buffer intended for unsent text collected per 1306 participant. 1308 Contributing participants: The participants selected to contribute 1309 to the text stream sent to the recipients. 1311 By default all participants except the recipient are contributing 1312 participants for transmission to the recipient. 1314 Current participant: The participant for whom text currently is 1315 transmitted to the recipient in real time. 1317 Current Recipients: By default all participants. 1319 Display Counter: A counter for the number of displayable 1320 characters in a participant's buffer or in the current entry. 1321 Used for controlling how far erasure may be performed. 1323 Erasure replacement: A character to be displayed when an erasure 1324 was done, but the text to erase is not reachable on the multi- 1325 party display. Default 'X'. 1327 Message delimiter: Character(s) forming the end of an imagined 1328 message. A configurable set of alternatives, consisting by 1329 default of: Line Separator, Paragraph Separator, CR, CRLF, LF. 1331 Pending period: A configurable time period of inactivity from a 1332 participant, by default set to 7 seconds after each reception of 1333 characters from that participant, evaluated as current time minus 1334 time stamp of latest entered character. 1336 Sentence delimiter: Characters forming end of sentence: A 1337 configurable set of alternatives, by default consisting of: dot 1338 '.', question mark '?' and exclamation mark '!' followed by a 1339 space.
1341 Label: A readable unique name for a participant, created by the 1342 server from a suitable source related to the participant, e.g. 1343 part of the SIP Display name, surrounded by the Label delimiters. 1344 The label should have a settable maximum length, with 12 being the 1345 default. 1347 Label delimiters: A configurable set of characters at the edges of 1348 the Label, by default being a left bracket [ at the leading edge 1349 and a closing bracket ] followed by a space at the trailing edge. 1351 Line Separator: Unicode UCS-16 2028. Used to request NewLine in 1352 Real-Time Text. 1354 Maximum waiting time: The maximum time any participant's text 1355 shall be allowed to wait for transmission, by default set to 20 1356 seconds. 1358 Recipient: The terminal receiving the mixed text stream. 1360 SGR: Select Graphic Rendition, a control code to specify colours 1361 etc. 1363 Switch Reason: A set of reasons to switch Current Participant, 1364 consisting of the following: 1366 -Waiting time higher for any other participant than the current 1367 participant combined with any of the following states: 1369 -A message delimiter was the latest transmitted item 1371 -A sentence delimiter was the latest transmitted item 1373 -A Pending Period has expired and still no text has been 1374 transmitted 1376 -The Maximum Waiting time has expired followed by a Word Delimiter 1377 or an expired Time Extension. 1379 Waiting time: The time the first character in queue for 1380 transmission from a participant has been waiting in a buffer for 1381 transmission. The granularity shall be 0.3 Seconds or finer. 1383 Word delimiter: Character forming end of word: space 1384 Time extension: A configurable short extension time allowed after 1385 the Maximum waiting time during which a suitable moment for 1386 switching Current Participant is awaited, by default set to 7 1387 seconds. 1389 A.4.
Presentation level procedures 1391 The conference server applies these mixing procedures to text 1392 transmitted to all call participants who have not gone through a 1393 completed negotiation for conference awareness in real-time text 1394 presentation. 1396 All the participants and the conference server use real-time text 1397 conversation presentation coding according to ITU-T T.140 [T.140]. A 1398 consequence is that real-time text transmissions are UTF-8 coded, 1399 with control codes selected from ISO 6429 [ISO 6429]. 1401 The description is from the conference server point of view. 1403 A.4.1. Structure 1405 The real-time text mixer structure described here is supposed to be 1406 placed in the media path so that it is implemented with one mixer per 1407 recipient. A mixer contains buffers for temporary storage of text 1408 intended for the recipient. Each mixer has one buffer for each 1409 contributing participant. A set of status variables is maintained 1410 per buffer and is used in the mixer actions. The mixer logic decides 1411 for each moment which participant's buffer content is to be sent on 1412 to the recipient. By default, the recipient does not contribute text 1413 to its own mixer. Text transmitted by a participant is usually 1414 displayed locally and will only cause confusion if it appears also in 1415 received text. 1417 If there is a reason, own text can be configured to be transmitted 1418 also to the participants. That can enable a simplification of the 1419 mixer design to have only one common set of buffers instead of a set 1420 per recipient. That simplification will however hamper the flow of 1421 the conversation severely and is therefore NOT RECOMMENDED. 1423 A.4.2. Action on reception 1425 This description of the mixer is valid per recipient. 1427 Text from each contributing participant is checked for a set of 1428 characteristics on reception. 1430 Delete BOM: BOM characters are deleted.
1432 Insert in buffer: Resulting text is put into the contributing 1433 participant's buffer in the receiving participant's mixer. 1435 Maintain a display counter: For each text character that will take 1436 a position on the receiving display, a Display Counter for each 1437 participant is increased by one. 1439 There is one T.140 real-time text item that consists of two 1440 characters, but is regarded to be a unit and therefore increases 1441 the Display Counter by one only. That is CRLF. 1443 Furthermore, the following control codes are regarded units that 1444 shall not take any position on the receiving display and shall 1445 therefore not increase the Display Counter: 1447 0098 string 009C (SOS-ST strings) 1449 ESC 0061 (INT) 1451 009B Ps 006D (the SGR code, with special handling described below) 1453 BEL (Alert in session) 1455 See the section on control codes below for details. 1457 Combination characters: Also note that it is possible to use 1458 combination characters in Unicode. Such combination characters 1459 contain more than one character part. They shall only increase 1460 the Display Counter by one. The combination characters mainly 1461 have components in the series 0300 - 0361 and 20D0 - 20E1. 1463 Erasure: If the control code for erasure, BS, is received, the 1464 following shall be done: If the Display Counter is 0, an Erasure 1465 Replacement character, by default being 'X', is inserted in the 1466 buffer instead of the erasure, to mark that erasure was intended 1467 in earlier transmitted entries. (This matches traditional habits 1468 in real-time text, where participants sometimes type XXX to indicate 1469 erasure they do not bother to make explicit.) If the Display 1470 Counter is >0, then the counter is reduced by one, and the erasure 1471 control code BS is put into the buffer. 1473 Initial action in the session: BOM shall be sent initially to the 1474 recipients in the beginning of the session.

   Maintaining a waiting time per participant:  The time that text has
      been in the buffer is maintained as the waiting time for each
      buffer.  A granularity of 0.3 seconds is sufficient.

   Storing time of reception for each character:  Each character that
      is stored in a buffer shall be assigned a time stamp indicating
      its time of reception.  A granularity of 0.3 seconds is
      sufficient.  This time stamp is used for calculation of idle time
      and waiting time in the evaluation of switch reasons.

   Initial assignment of the Current Participant:  The first
      contributing participant to send text in the session is assigned
      to be the Current Participant.

   Actions on assignment of a Current Participant:  When a participant
      becomes the Current Participant, the following initial actions
      shall be performed:

      1.  Scanning of transmissions and timers for a Switch Reason is
      deactivated.

      2.  The Current Recipients are set so that all transmissions go to
      the new set of Current Recipients (see definition).

      3.  A Line Separator is transmitted if the switch reason was any
      other than a message delimiter.

      4.  The Label is transmitted.

      5.  Any stored SGR code is transmitted.

      6.  Scanning of transmissions and timers for a Switch Reason is
      activated.

      7.  Text in the buffer is transmitted, recalculating and setting
      the waiting time for each transmitted character based on the time
      of reception of the next character in the buffer.  If a switch
      occurs during transmission from the buffer, the remaining buffer
      contents are retained and transmission can continue the next time
      this transmitter becomes the Current Participant.  Any text
      entered into the buffer for the Current Participant is after that
      sent to the recipient until a Switch Reason occurs.
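
   The reception rules above (BOM deletion, Display Counter
   maintenance and BS erasure) can be illustrated with a minimal,
   non-normative Python sketch.  The class name ParticipantBuffer, its
   fields and the helper is_combining are hypothetical names chosen for
   illustration.  The sketch assumes that the Erasure Replacement
   character does not itself increase the Display Counter and that a
   lone CR or LF counts as one unit; the text above does not settle
   either point.  SOS-ST strings, INT and SGR sequences are left out
   for brevity.

```python
from dataclasses import dataclass, field

BOM = "\ufeff"   # byte order mark, deleted on reception
BS = "\u0008"    # T.140 erasure control code
BEL = "\u0007"   # alert in session, takes no display position
CR, LF = "\r", "\n"


def is_combining(ch: str) -> bool:
    """True for the combining ranges named in the text (0300-0361, 20D0-20E1)."""
    cp = ord(ch)
    return 0x0300 <= cp <= 0x0361 or 0x20D0 <= cp <= 0x20E1


@dataclass
class ParticipantBuffer:
    """Hypothetical per-contributor buffer inside one recipient's mixer."""
    erasure_replacement: str = "X"   # default of "Erasure replacement"
    display_counter: int = 0
    text: list = field(default_factory=list)
    _prev: str = ""                  # last non-BOM character seen

    def receive(self, incoming: str) -> None:
        for ch in incoming:
            if ch == BOM:
                continue             # Delete BOM
            if ch == BS:
                if self.display_counter == 0:
                    # Nothing left of this entry to erase: mark the intent.
                    # Assumption: the replacement leaves the counter at 0.
                    self.text.append(self.erasure_replacement)
                else:
                    self.display_counter -= 1
                    self.text.append(BS)
            else:
                self.text.append(ch)
                lf_of_crlf = ch == LF and self._prev == CR
                # BEL, the LF of a CRLF pair and combining parts add no position.
                if not (ch == BEL or lf_of_crlf or is_combining(ch)):
                    self.display_counter += 1
            self._prev = ch
```

   For example, after receiving BOM, "Hi" and CRLF the counter is 3,
   and a BS received while the counter is 0 is stored as "X" rather
   than forwarded as an erasure.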

   Actions on transmission and during the session:  Transmissions are
      checked for control codes to act on at transmission, as described
      below in the section about handling of control codes, and such
      actions are performed.  When the scanning of transmissions and
      timers for a Switch Reason is active, the timers and the
      transmission to the recipient are analyzed to detect whether a
      Switch Reason has occurred.  See the definition of Switch Reasons
      for details.

   Actions when a Switch Reason has occurred:  If a Switch Reason has
      occurred, the following actions shall be performed:

      1.  The Display Counter of the Current Participant is set to zero.

      2.  If there is an SGR code stored for the Current Participant, a
      reset of SGR shall be sent by the sequence SGR 0 [009B 0000 006D].

      3.  A participant with the longest waiting time is assigned to be
      the Current Participant, and the procedure for assignment of a
      Current Participant described above is performed.

   Handling of Control codes:  The following control codes are
      specified by ITU-T T.140.  Some of them require consideration in
      the conference server.  Note that the codes presented here are
      expressed in UCS-16, while transmission is made in the UTF-8
      transform of these codes.  Other sections specify procedures for
      handling of specific control codes in the conference server.

      BEL  0007  Bell, provides for alerting during an active session.

      BS  0008  Back Space, erases the last entered character.

      NEW LINE  2028  Line separator.

      CR LF  000D 000A  A supported, but not preferred, way of
      requesting a new line.

      INT  ESC 0061  Interrupt (used to initiate mode negotiation
      procedure).

      SGR  009B Ps 006D  Select graphic rendition.  Ps is rendition
      parameters specified in ISO 6429.

      SOS  0098  Start of string, used as a general protocol element
      introducer, followed by a string of at most 256 bytes.

      ST  009C  String terminator, end of SOS string.

      ESC  001B  Escape - used in control strings.

      Byte order mark  FEFF  Zero width, no break space, used for
      synchronization.

      Missing text mark  FFFD  Replacement character, marks place in
      stream of possible text loss.

      Code for message border, useful, but not mentioned in T.140:

      New Message  2029  Paragraph separator

   Handling of Graphic Rendition SGR:  The following procedure shall
      be followed in order to let the participants control the graphic
      rendition of their entries without disturbing other participants'
      graphic rendition.  The text stream sent to a recipient shall be
      monitored for the SGR sequence.  The latest conveyed SGR sequence
      is also stored as a status variable for the recipient.  If the SGR
      0 code initiated from the Current Participant is transmitted, the
      SGR storage shall be cleared.

A.5.  Display examples

   The following pictures are examples of the view on a participant's
   display.

    __________________________________________________
   |       Conference       |          Alice          |
   |________________________|_________________________|
   |                        |I will arrive by TGV.    |
   |[Bob]:My flight is to   |Convenient to the main   |
   |Orly.                   |station.                 |
   |[Eve]:Hi all, can we    |                         |
   |plan for the seminar.   |                         |
   |                        |                         |
   |[Bob]:Eve, will you do  |                         |
   |your presentation on    |                         |
   |Friday?                 |                         |
   |[Eve]:Yes, Friday at 10.|                         |
   |[Bob]: Fine, wo         |We need to meet befo     |
   |________________________|_________________________|

   Figure 2: Alice, who has a conference-unaware client, is receiving
   the multi-party real-time text in a single stream.  This figure shows
   how a coordinated column view MAY be presented on Alice's device.

    ________________________________________________
   |                                              |^|
   |[Alice] Hi, Alice here.                       | |
   |                                              | |
   |[Bob] Bob as well.                            | |
   |                                              | |
   |[Eve] Hi, this is Eve, calling from Paris.    | |
   |      I thought you should be here.           | |
   |                                              | |
   |[Alice] I am coming on Thursday, my           | |
   |      performance is not until Friday morning.| |
   |                                              | |
   |[Bob] And I on Wednesday evening.             | |
   |                                              | |
   |[Eve] we can have dinner and then take a walk | |
   |                                              | |
   | [Eve-typing] But I need to be back to        | |
   |      the hotel by 11 because I need          |-|
   |                                              |-|
   |______________________________________________|v|
   | of course, I underst                           |
   |________________________________________________|

   Figure 3: A conference view with real-time text preview.  Bob's text
   is buffered until a Switch Reason occurs for the Current Participant.

A.6.  Summary of configurable parameters

   A number of configurable parameters are described in this
   specification.  This table provides a summary of the parameters on
   presentation level.  A service provider implementing a multi-party
   service may want to set specific values on these parameters to adapt
   the characteristics of the service.  It is possible to control them
   per recipient, if desired.

   Parameter: Current Recipients

   Purpose: Controls whether participants get their own text.

   Possible values: Exclude or Include Current Participant

   Default value: Exclude

   Comment: Own transmissions are usually displayed locally, which is
   sufficient.

   Parameter: Erasure replacement

   Purpose: Character to show erasure, when erasure cannot be done

   Possible values: Character

   Default value: X

   Comment: May need another value for scripts other than Latin.

   Parameter: Message delimiter

   Purpose: Detection of suitable place in text for switching Current
   Participant

   Possible values: List of Unicode editing codes

   Default value: Line Separator, Paragraph Separator, CR, CRLF, LF

   Comment: Scripts other than Latin-based ones may have other
   conventions.

   Parameter: Pending period

   Purpose: Inactivity timer for detection of time to switch Current
   Participant

   Possible values: Time in seconds

   Default value: 7

   Comment: Longer times may cause inefficient transmission.  Shorter
   times may cause unwanted switching that cuts lines of thought
   inconveniently.

   Parameter: Sentence delimiter

   Purpose: Characters forming end of sentence

   Possible values: List of delimiters.

   Default value: . or ? or ! followed by a space

   Comment: Used for deciding on a position in the text to switch
   Current Participant according to configured logic.

   Parameter: Label length

   Purpose: Length of label put in front of or above entry.

   Possible values: Number of characters

   Default value: 12

   Comment: Includes any surrounding characters

   Parameter: Label delimiters

   Purpose: Set of characters at the edges of the label

   Possible values: Two strings.  One in the beginning, one after.

   Default value: [] followed by a space

   Comment: It may be valid to include a Line Separator instead of the
   space

   Parameter: Maximum waiting time

   Purpose: The maximum time any participant's text shall be allowed to
   wait for transmission

   Possible values: Seconds

   Default value: 20

   Comment: After this time a Switch will be forced within the Time
   Extension

   Parameter: Word delimiter

   Purpose: Delimiter for words

   Possible values: List of characters

   Default value: Space

   Comment: Used for detection of suitable switch position if Maximum
   Waiting time has passed.

   Parameter: Time extension

   Purpose: Time for maximum further waiting for a Switch Reason

   Possible values: Time in seconds

   Default value: 7

   Comment: After this time a Switch is forced.

A.7.  References for this Appendix

   [T.140]  ITU-T T.140 Application protocol, text conversation
   (including amendment 1.)

   [RFC 4103]  IETF RFC 4103 RTP Payload for text conversation

   [RTP]  IETF RFC 3550 RTP: A Transport Protocol for Real-Time
   Applications.

   [RFC 4579]  IETF RFC 4579 SIP Call Control - Conferencing for user
   agents.

   [ISO 6429]  ISO 6429 Control functions for coded character sets.

   [UTF-8]  IETF RFC 3629 UTF-8, a transformation format of ISO 10646

   [Unicode]  The Unicode Consortium, "The Unicode Standard - Version
   4.0"

   [ISO 10646-1]  ISO 10646 Universal multiple-octet coded character
   set (UCS)

   [UCS-16]  See ISO 10646-1

A.8.  Acknowledgement

   This appendix was developed with funding in part from the National
   Institute on Disability and Rehabilitation Research, U.S. Department
   of Education, RERC on Telecommunications Access, grant # H133E090001.
   However, the contents do not necessarily represent the policy of the
   Department of Education, and you should not assume endorsement by the
   Federal Government.

Author's Address

   Gunnar Hellstrom
   Omnitor
   Esplanaden 30
   Vendelso  SE-136 70
   SE

   Phone: +46 708 204 288
   Email: gunnar.hellstrom@omnitor.se
   URI:   www.omnitor.se