idnits 2.17.1 draft-ietf-avtcore-multi-party-rtt-mix-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC4102], [RFC4103]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == The 'Updates: ' line in the draft header should list only the _numbers_ of the RFCs which will be updated by this document (if approved); it should not include the word 'RFC' in the list. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 647 has weird spacing: '...example from ...' == Line 1368 has weird spacing: '...example from ...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: A party not performing as a mixer MUST not include the CSRC list. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: A party not performing as a mixer MUST not include the CSRC list if it has a single source of text. 
== Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: BEL 0007 Bell Alert in session, provides for alerting during an active session. The display count SHOULD not be altered. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: INT ESC 0061 Interrupt (used to initiate mode negotiation procedure). The display count SHOULD not be altered. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: SGR 009B Ps 006D Select graphic rendition. Ps is rendition parameters specified in ISO 6429. The display count SHOULD not be altered. The SGR code SHOULD be stored for the current source. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: SOS 0098 Start of string, used as a general protocol element introducer, followed by a maximum 256 bytes string and the ST. The display count SHOULD not be altered. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: ST 009C String terminator, end of SOS string. 
The display count SHOULD not be altered. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: ESC 001B Escape - used in control strings. The display count SHOULD not be altered for the complete escape code. (Using the creation date from RFC4102, updated by this document, for RFC5378 checks: 2003-12-18) (Using the creation date from RFC4103, updated by this document, for RFC5378 checks: 2003-11-21) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (12 July 2020) is 1384 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Bob' is mentioned on line 1975, but not defined ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) ** Downref: Normative reference to an Informational RFC: RFC 8643 -- Possible downref: Non-RFC (?) normative reference: ref. 'T140' -- Possible downref: Non-RFC (?) normative reference: ref. 'T140ad1' Summary: 3 errors (**), 0 flaws (~~), 13 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 AVTCore G. 
Hellstrom 3 Internet-Draft Gunnar Hellstrom Accessible Communication 4 Updates: RFC 4102, RFC 4103 (if approved) 12 July 2020 5 Intended status: Standards Track 6 Expires: 13 January 2021 8 RTP-mixer formatting of multi-party Real-time text 9 draft-ietf-avtcore-multi-party-rtt-mix-07 11 Abstract 13 Real-time text mixers for multi-party sessions need to identify the 14 source of each transmitted group of text so that the text can be 15 presented by endpoints in suitable grouping with other text from the 16 same source. 18 Regional regulatory requirements specify provision of real-time text 19 in multi-party calls. RFC 4103 mixer implementations can use 20 traditional RTP functions for source identification, but the mixer 21 source switching performance is limited when using the default 22 transmission with redundancy. 24 Enhancements for RFC 4103 real-time text mixing are provided in this 25 document, suitable for a centralized conference model that enables 26 source identification and source switching. The intended use is for 27 real-time text mixers and multi-party-aware participant endpoints. 28 Two mechanisms are provided. The mechanisms build on use of the 29 CSRC list in the RTP packet for source identification. One method 30 makes use of the same "text/red" format as for two-party sessions, 31 while the other makes use of an extended packet format "text/rex" for 32 more efficient transmission. 34 A capability exchange is specified so that it can be verified that a 35 participant can handle the multi-party coded real-time text stream. 36 The capability for one method is by use of a media attribute a=rtt- 37 mix-rtp-mixer. The other method is indicated by the media subtype 38 "text/rex". 40 The document updates RFC 4102[RFC4102] and RFC 4103[RFC4103]. 42 A brief description of how a mixer can format text for the case 43 when the endpoint is not multi-party aware is also provided. 
45 Status of This Memo 47 This Internet-Draft is submitted in full conformance with the 48 provisions of BCP 78 and BCP 79. 50 Internet-Drafts are working documents of the Internet Engineering 51 Task Force (IETF). Note that other groups may also distribute 52 working documents as Internet-Drafts. The list of current Internet- 53 Drafts is at https://datatracker.ietf.org/drafts/current/. 55 Internet-Drafts are draft documents valid for a maximum of six months 56 and may be updated, replaced, or obsoleted by other documents at any 57 time. It is inappropriate to use Internet-Drafts as reference 58 material or to cite them other than as "work in progress." 60 This Internet-Draft will expire on 13 January 2021. 62 Copyright Notice 64 Copyright (c) 2020 IETF Trust and the persons identified as the 65 document authors. All rights reserved. 67 This document is subject to BCP 78 and the IETF Trust's Legal 68 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 69 license-info) in effect on the date of publication of this document. 70 Please review these documents carefully, as they describe your rights 71 and restrictions with respect to this document. Code Components 72 extracted from this document must include Simplified BSD License text 73 as described in Section 4.e of the Trust Legal Provisions and are 74 provided without warranty as described in the Simplified BSD License. 76 Table of Contents 78 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 79 1.1. Selected solution and considered alternative . . . . . . 5 80 1.2. Nomenclature . . . . . . . . . . . . . . . . . . . . . . 6 81 1.3. Intended application . . . . . . . . . . . . . . . . . . 7 82 2. Specified solutions . . . . . . . . . . . . . . . . . . . . . 7 83 2.1. Negotiated use of the RFC 4103 format for multi-party in a 84 single RTP stream . . . . . . . . . . . . . . . . . . . . 7 85 2.2. Use of an extended packet format "text/rex" with text from 86 multiple sources . . . . . . . 
. . . . . . . . . . . . . 17 87 2.3. Mixing for multi-party unaware endpoints . . . . . . . . 35 88 3. Presentation level considerations . . . . . . . . . . . . . . 36 89 3.1. Presentation by multi-party aware endpoints . . . . . . . 36 90 3.2. Multi-party mixing for multi-party unaware endpoints . . 38 91 4. Gateway Considerations . . . . . . . . . . . . . . . . . . . 44 92 4.1. Gateway considerations with Textphones (e.g. TTYs). . . 44 93 4.2. Gateway considerations with WebRTC. . . . . . . . . . . . 45 94 5. Updates to RFC 4102 and RFC 4103 . . . . . . . . . . . . . . 45 95 6. Congestion considerations . . . . . . . . . . . . . . . . . . 46 96 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 46 97 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46 98 8.1. Registration of the "rtt-mix-rtp-mixer" sdp media 99 attribute . . . . . . . . . . . . . . . . . . . . . . . . 46 100 8.2. Registration of "text/rex" media subtype . . . . . . . . 47 101 9. Security Considerations . . . . . . . . . . . . . . . . . . . 47 102 10. Change history . . . . . . . . . . . . . . . . . . . . . . . 47 103 10.1. Changes included in 104 draft-ietf-avtcore-multi-party-rtt-mix-07 . . . . . . . 47 105 10.2. Changes included in 106 draft-ietf-avtcore-multi-party-rtt-mix-06 . . . . . . . 47 107 10.3. Changes included in 108 draft-ietf-avtcore-multi-party-rtt-mix-05 . . . . . . . 48 109 10.4. Changes included in 110 draft-ietf-avtcore-multi-party-rtt-mix-04 . . . . . . . 48 111 10.5. Changes included in 112 draft-ietf-avtcore-multi-party-rtt-mix-03 . . . . . . . 48 113 10.6. Changes included in 114 draft-ietf-avtcore-multi-party-rtt-mix-02 . . . . . . . 49 115 10.7. Changes to draft-ietf-avtcore-multi-party-rtt-mix-01 . . 49 116 10.8. Changes from 117 draft-hellstrom-avtcore-multi-party-rtt-source-03 to 118 draft-ietf-avtcore-multi-party-rtt-mix-00 . . . . . . . 50 119 10.9. Changes from 120 draft-hellstrom-avtcore-multi-party-rtt-source-02 to 121 -03 . . . . . . 
. . . . . . . . . . . . . . . . . . . . 50 122 10.10. Changes from 123 draft-hellstrom-avtcore-multi-party-rtt-source-01 to 124 -02 . . . . . . . . . . . . . . . . . . . . . . . . . . 50 125 10.11. Changes from 126 draft-hellstrom-avtcore-multi-party-rtt-source-00 to 127 -01 . . . . . . . . . . . . . . . . . . . . . . . . . . 51 128 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 51 129 11.1. Normative References . . . . . . . . . . . . . . . . . . 51 130 11.2. Informative References . . . . . . . . . . . . . . . . . 53 131 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 53 133 1. Introduction 135 RFC 4103[RFC4103] specifies use of RFC 3550 RTP [RFC3550] for 136 transmission of real-time text (RTT) and the "text/t140" format. It 137 also specifies a redundancy format "text/red" for increased 138 robustness. RFC 4102 [RFC4102] registers the "text/red" format. 139 Regional regulatory requirements specify provision of real-time text 140 in multi-party calls. 142 Real-time text is usually provided together with audio and sometimes 143 with video in conversational sessions. 145 The redundancy scheme of RFC 4103 [RFC4103] enables efficient 146 transmission of redundant text in packets together with new text. 147 However the redundancy header format has no source indicators for the 148 redundant transmissions. An assumption has had to be made that the 149 redundant parts in a packet are from the same source as the new text. 150 The recommended transmission is one new and two redundant generations 151 of text (T140blocks) in each packet and the recommended transmission 152 interval is 300 ms. 154 A mixer, selecting between text input from different sources and 155 transmitting it in a common stream needs to make sure that the 156 receiver can assign the received text to the proper sources for 157 presentation. 
Therefore, using RFC 4103 without any extra rule for 158 source identification, the mixer needs to stop sending new text from 159 one source and then make sure that all text so far has been sent with 160 all intended redundancy levels (usually two) before switching to 161 another source. That causes a long switching time of about one second 162 between transmission of text from one source and text from another 163 source when using the default transmission interval of 300 ms. Both the 164 total throughput and the switching performance in the mixer would be 165 too low for most applications. However, by shortening the transmission 166 interval to 100 ms, good performance is achieved for up to 3 167 simultaneously sending sources and usable performance for up to 5 168 simultaneously sending sources. This method is negotiated through an 169 sdp media attribute "rtt-mix-rtp-mixer". 171 A more efficient source identification scheme requires that each 172 redundant T140block has its source individually preserved. This 173 document introduces a source indicator by specific rules for 174 populating the CSRC-list and the data header in the RTP-packet. 176 An extended packet format "text/rex" is specified for this purpose, 177 providing the possibility to include text from up to 15 sources in 178 each packet in order to enhance mixer source switching performance. 179 By these extensions, the performance requirements on multi-party 180 mixing for real-time text are exceeded by the "text/rex" solution in 181 this document. 183 A negotiation mechanism can therefore be based on selection of the 184 "text/red" format with media attribute "rtt-mix-rtp-mixer" or the "text/rex" 185 media format for verification that the parties are able to handle a 186 multi-party coded stream and for agreeing on which method to use. 188 A fall-back mixing procedure is specified for cases when the 189 negotiation results in "text/red" without the "rtt-mix-rtp-mixer" attribute 190 being the only common submedia format. 
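As a rough illustration of the switching times discussed above, the following sketch assumes a simplified model in which the mixer spends one packet on the primary text of a source and one further packet per redundant generation before it may switch to another source. The function name and the model itself are illustrative assumptions, not part of the specification.

```python
def switch_time_ms(interval_ms: int, redundant_generations: int = 2) -> int:
    """Time the mixer stays on one source before it may switch.

    Assumed model: one packet carrying the text as primary, followed by
    one packet per redundant generation, all spaced interval_ms apart.
    """
    packets_per_turn = 1 + redundant_generations
    return packets_per_turn * interval_ms


# Default RFC 4103 interval versus the shortened interval.
for interval in (300, 100):
    print(interval, "ms interval:", switch_time_ms(interval), "ms per source turn")
```

Under this model the default 300 ms interval gives 900 ms per turn, which is roughly the one second mentioned above, while a 100 ms interval gives 300 ms per turn.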
192 The document updates RFC 4102[RFC4102] and RFC 4103[RFC4103] by 193 introducing an attribute for indicating multi-party capability, and 194 an extended packet format for the multi-party mixing case and more 195 strict rules for the source indications. 197 1.1. Selected solution and considered alternative 199 A number of alternatives were considered when searching for an efficient 200 multi-party method for real-time text. This section explains a few 201 of them briefly. 203 One RTP stream per source, sent in the same RTP session with 204 "text/red" format. From some points of view, use of multiple RTP 205 streams, one for each source, sent in the same RTP session, called 206 the RTP translator model in RFC 3550 [RFC3550], would be 207 efficient, and use exactly the same packet format as RFC 4103, the 208 same payload type and a simple SDP declaration. However, there is 209 currently a lack of support for multi-stream RTP in certain 210 implementation technologies. The multi-stream solution would also 211 cause more overhead than the single RTP stream solution "text/rex" 212 specified in this document, and the overhead grows with the number of 213 simultaneously sending participants. 215 The "text/red" format in RFC 4103 with shorter transmission 216 interval, and indicating source in CSRC. The "text/red" format with 217 "text/t140" payload in a single RTP stream can be sent with 100 ms 218 packet intervals instead of the regular 300 ms. The source is 219 indicated in the CSRC field. Source switching can then be done 220 every 300 ms while simultaneous transmission occurs. With two 221 participants sending text simultaneously, the switching and 222 transmission performance is good. With three or more 223 simultaneously sending participants, there will be a noticeable 224 jerkiness in text presentation, increasing with the number of participants who 225 send text simultaneously. With three sending participants, the 226 jerkiness will be about 450 ms, and with five, about 1350 ms. 
227 Text sent from a source at the end of the period its text is sent 228 by the mixer will have close to zero extra delay. Recent text 229 will be presented with no or low delay. The 1350 ms jerkiness 230 will be noticeable and slightly unpleasant, but corresponds in time 231 to what typing humans often cause by hesitation or changing 232 position while typing. A benefit of this method is that no new 233 packet format needs to be introduced and implemented. Since 234 simultaneous typing by more than two parties is rare, and in many 235 applications calls with more than three parties are also rare, this 236 method can be used successfully without its limitations becoming 237 annoying. Negotiation is based on a new sdp media attribute "rtt- 238 mix-rtp-mixer". 240 The "text/rex" packet format with up to 15 sources in one packet. 241 The mechanism called "text/rex" specified in this document makes use 242 of the RTP mixer model specified in RFC 3550 [RFC3550]. Text from 243 up to 15 sources can be included in each packet. Packets are 244 normally sent every 300 ms. The mean delay will be 150 ms. The 245 sources are indicated in the CSRC list of the RTP packets. A new 246 redundancy packet format is specified, named "text/rex". 248 The presentation planned by the mixer for multi-party unaware 249 endpoints. It is desirable to have a method that does not require 250 any modifications in existing user devices implementing RFC 4103 251 for RTT without explicit support of multi-party sessions. This is 252 possible by having the mixer insert a new line and a text 253 formatted source label before each switch of text source in the 254 stream. Switch of source can only be done in places in the text 255 where it does not disturb the perception of the contents. Text 256 from only one source can be presented in real time at a time. The 257 delay will therefore be varying. The method also has other 258 limitations, but is included in this document as a fallback 259 method. 
In calls where parties take turns properly by ending 260 their entries with a new line, the limitations will have limited 261 influence on the user experience. While only two parties send 262 text, these two will see the text in real time with no delay. 264 1.2. Nomenclature 266 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 267 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 268 document are to be interpreted as described in [RFC2119]. 270 The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP- 271 mixer, RTP-translator are explained in [RFC3550]. 273 The term "T140block" is defined in RFC 4103 [RFC4103] to contain one 274 or more T.140 code elements. 276 "TTY" stands for a text telephone type used in North America. 278 "WebRTC" stands for web based communication specified by W3C and 279 IETF. 281 "DTLS-SRTP" stands for security specified in RFC 5764 [RFC5764]. 283 1.3. Intended application 285 The methods for multi-party real-time text are primarily intended for 286 use in transmission between mixers and endpoints in centralised 287 mixing configurations. They are also applicable between endpoints as 288 well as between mixers. An often mentioned application is for 289 emergency service calls with real-time text and voice, where a 290 calltaker wants to make an attended handover of a call to another 291 agent, and stay observing in the session. Multimedia conference 292 sessions with support for participants to contribute in text are 293 another application. Conferences with central support for speech-to- 294 text conversion are yet another mentioned application. 296 In all these applications, normally only one participant at a time 297 will send long text utterances. In some cases, one other participant 298 will occasionally contribute with a longer comment simultaneously. 299 That may also happen in some rare cases when text is interpreted to 300 text in another language in a conference. 
Apart from these cases, 301 other participants are only expected to contribute with very brief 302 utterances while others are sending text. 304 Text is supposed to be human generated, by some text input means, 305 such as typing on a keyboard or using speech-to-text technology. 306 Occasional small cut-and-paste operations may appear even if that is 307 not the initial purpose of real-time text. 309 The real-time characteristics of real-time text are essential for the 310 participants to be able to contribute to a conversation. If the text 311 is delayed too much from typing a letter to its presentation, then, 312 in some conference situations, the opportunity to comment will be 313 gone and someone else will grab the turn. A delay of more than one 314 second in such situations is an obstacle for good conversation. 316 2. Specified solutions 318 2.1. Negotiated use of the RFC 4103 format for multi-party in a single 319 RTP stream 321 This section specifies use of the current format specified in 322 [RFC4103] for true multi-party real-time text. It is an update of 323 RFC 4103 by a clarification on one way to use it in the multi-party 324 situation. It is done by completing a negotiation for this kind of 325 multi-party capability and by indicating source in the CSRC element 326 in the RTP packets. Please use [RFC4103] as reference when reading 327 the following description. 329 2.1.1. Negotiation for use of this method 331 RFC 4103[RFC4103] specifies use of RFC 3550 RTP[RFC3550], and a 332 redundancy format "text/red" for increased robustness of real-time 333 text transmission. This document updates RFC 4102[RFC4102] and RFC 334 4103[RFC4103] by introducing a capability negotiation for handling 335 multi-party real-time text. The capability negotiation is based on 336 use of the sdp media attribute "rtt-mix-rtp-mixer". 
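A minimal sketch of how an implementation might check for the "rtt-mix-rtp-mixer" attribute in SDP is shown here. It assumes, for illustration only, that the negotiation counts as successful when both the offer and the answer carry the attribute in a text media section; the helper names, the port and the payload type numbers are hypothetical.

```python
MIX_ATTRIBUTE = "a=rtt-mix-rtp-mixer"


def text_media_has_mix_attribute(sdp: str) -> bool:
    """Return True if a text media section of the SDP carries the attribute."""
    in_text_section = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # A new media section starts; remember whether it is a text section.
            in_text_section = line.startswith("m=text ")
        elif in_text_section and line == MIX_ATTRIBUTE:
            return True
    return False


def multi_party_negotiated(offer: str, answer: str) -> bool:
    """Assumption: both sides must declare the attribute for success."""
    return text_media_has_mix_attribute(offer) and text_media_has_mix_attribute(answer)


# Illustrative SDP fragments; port and payload type numbers are made up.
AWARE = "\n".join([
    "m=text 11000 RTP/AVP 100 98",
    "a=rtpmap:98 t140/1000",
    "a=rtpmap:100 red/1000",
    "a=fmtp:100 98/98/98",
    "a=rtt-mix-rtp-mixer",
])
UNAWARE = "\n".join(AWARE.splitlines()[:-1])

print(multi_party_negotiated(AWARE, AWARE))    # both sides declare the attribute
print(multi_party_negotiated(AWARE, UNAWARE))  # answerer is multi-party unaware
```

When the check fails, the sender would populate packets as for a two-party session.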
338 The syntax is as follows: 339 a=rtt-mix-rtp-mixer 341 A transmitting party SHALL send text according to the multi-party 342 format only when the negotiation for this method was successful and 343 when the CC field in the RTP packet is 1. In all other cases, the 344 packets SHALL be populated as for a two-party session. 346 2.1.2. Use of fields in the RTP packets 348 The CC field SHALL show the number of members in the CSRC list, which 349 is one (1) in transmissions from a mixer involved in a multi-party 350 session, and otherwise 0. 352 When transmitted from a mixer during a multi-party session, a CSRC 353 list is included in the packet. The single member in the CSRC-list 354 SHALL contain the SSRC of the source of the T140blocks in the packet. 355 When redundancy is used, the recommended level of redundancy is to 356 use one primary and two redundant generations of T140blocks. In some 357 cases, a primary or redundant T140block is empty, but is still 358 represented by a member in the redundancy header. 360 In other respects, the contents of the RTP packets are equal to what 361 is specified in RFC 4103. 363 2.1.3. Transmission of multi-party contents 365 As soon as a participant is known to participate in a session and 366 to be available for text reception, a Unicode BOM character SHALL be 367 sent to it according to the procedures in this section. If the 368 transmitter is a mixer, then the source of this character SHALL be 369 indicated to be the mixer itself. 371 2.1.4. Keep-alive 373 After that, the transmitter SHALL send keep-alive traffic to the 374 receivers at regular intervals when no other traffic has occurred 375 during that interval if that is decided for the actual connection. 376 Recommendations for keep-alive can be found in RFC 6263[RFC6263]. 378 2.1.5. Transmission interval 380 A "text/red" transmitter SHOULD send packets distributed in time as 381 long as there is something (new or redundant T140blocks) to transmit. 
382 The maximum transmission interval SHOULD then be 300 ms. It is 383 RECOMMENDED to send the next packet to a receiver as soon as new text to 384 that receiver is available, as long as the time after the latest sent 385 packet to the same receiver is more than or equal to 100 ms, and also 386 the maximum character rate to the receiver is not exceeded. The 387 intention is to keep the latency low while keeping a good protection 388 against text loss in bursty packet loss conditions. New and 389 redundant text from one source MAY be transmitted in the same packet. 390 Text from different sources MUST NOT be transmitted in the same 391 packet. 393 2.1.6. Do not send received text to the originating source 395 Text received from a participant SHOULD NOT be included in 396 transmission to that participant. 398 2.1.7. Clean incoming text 400 A mixer SHALL handle reception and recovery of packet loss, marking 401 of possible text loss and deletion of 'BOM' characters from each 402 participant before queueing received text for transmission to 403 receiving participants. 405 2.1.8. Redundancy 407 The transmitting party using redundancy SHALL send redundant 408 repetitions of T140blocks already transmitted in earlier packets. The 409 number of redundant generations of T140blocks to include in 410 transmitted packets SHALL be deduced from the SDP negotiation. It 411 SHOULD be set to the minimum of the number declared by the two 412 parties negotiating a connection. 414 2.1.9. Text placement in packets 416 At time of transmission, the mixer SHALL populate the RTP packet with 417 all T140blocks queued for transmission originating from the source in 418 turn for transmission as long as this is not in conflict with the 419 allowed number of characters per second or the maximum packet size. 420 The SSRC of the source shall be placed as the member in the CSRC- 421 list. 423 2.1.10. 
Maximum number of sources per packet 425 When text from more than one source is available for transmission, 426 the mixer SHALL let the sources take turns in having their text 427 transmitted. When switching from transmission of one source to allow 428 another source to have its text sent, all intended redundant 429 generations of the last text from the current source MUST be 430 transmitted before text from another source can be transmitted. 431 Actively transmitting sources SHOULD be allowed to take turns as 432 frequently as possible to have their text transmitted. That implies 433 that with the recommended redundancy, the mixer SHALL send primary 434 text and two packets with redundant text from the current source 435 before text from another source is transmitted. The source with the 436 oldest received text in the mixer SHOULD be next in turn to get all 437 its available text transmitted. 439 Note: The CSRC-list in an RTP packet only includes the participant 440 whose text is included in text blocks. It is not the same as the 441 total list of participants in a conference. With audio and video 442 media, the CSRC-list would often contain all participants who are not 443 muted whereas text participants that don't type are completely silent 444 and thus are not represented in RTP packet CSRC-lists once their text 445 has been transmitted as primary and the intended number of redundant 446 generations. 448 2.1.11. Empty T140blocks 450 If no unsent T140blocks were available for a source at the time of 451 populating a packet, but T140blocks are available which have not yet 452 been sent the full intended number of redundant transmissions, then 453 the primary T140block for that source is composed of an empty 454 T140block, and populated (without taking up any length) in a packet 455 for transmission. The corresponding SSRC SHALL be placed as usual in 456 its place in the CSRC-list. 458 2.1.12. 
Creation of the redundancy 460 The primary T140block from a source in the latest transmitted packet 461 is used to populate the first redundant T140block for that source. 462 The first redundant T140block for that source from the latest 463 transmission is placed as the second redundant T140block. 465 Usually this is the level of redundancy used. If a higher level of 466 redundancy is negotiated, then the procedure SHALL be maintained 467 until all available redundant levels of T140blocks are placed in the 468 packet. If a receiver has negotiated a lower number of "text/red" 469 generations, then that level shall be the maximum used by the 470 transmitter. 472 2.1.13. Timer offset fields 474 The timestamp offset values are inserted in the data header, with the 475 time offset from the RTP timestamp in the packet when the 476 corresponding T140block was sent from its original source as primary. 478 The timestamp offsets are expressed in the same clock tick units as 479 the RTP timestamp. 481 The timestamp offset values for empty T140blocks have no relevance 482 but SHOULD be assigned realistic values. 484 2.1.14. Other RTP header fields 486 The number of members in the CSRC list (0 or 1) shall be placed in 487 the "CC" header field. Only mixers place value 1 in the "CC" field. 489 The current time SHALL be inserted in the timestamp. 491 The SSRC of the mixer for the RTT session SHALL be inserted in the 492 SSRC field of the RTP header. 494 The M-bit shall be handled as specified in [RFC4103]. 496 2.1.15. Pause in transmission 498 When there is no new T140block to transmit, and no redundant 499 T140block that has not been retransmitted the intended number of 500 times from any source, the transmission process can stop until either 501 new T140blocks arrive, or a keep-alive method calls for transmission 502 of keep-alive packets. 504 2.1.16. 
RTCP considerations 506 A mixer SHALL send RTCP reports with SDES, CNAME and NAME information 507 about the sources in the multi-party call. This makes it possible 508 for participants to compose a suitable label for text from each 509 source. 511 2.1.17. Reception of multi-party contents 513 The "text/red" receiver included in an endpoint with presentation 514 functions will receive RTP packets in the single stream from the 515 mixer, and SHALL distribute the T140blocks for presentation in 516 presentation areas for each source. Other receiver roles, such as 517 gateways or chained mixers are also feasible, and require 518 consideration of whether the stream shall just be forwarded, or distributed 519 based on the different sources. 521 2.1.17.1. Multi-party vs two-party use 523 If the "CC" field value of a received packet is 1, it indicates that 524 multi-party transmission is active, and the receiver MUST be prepared 525 to act on the source according to its role. If the CC value is 0, 526 the connection is point-to-point. 528 2.1.17.2. Level of redundancy 530 The used level of redundancy generations SHALL be evaluated from the 531 received packet contents. The number of generations (including the 532 primary) is equal to the number of members in the redundancy header. 534 2.1.17.3. Extracting text and handling recovery and loss 536 The RTP sequence numbers of the received packets SHALL be monitored 537 for gaps and packets out of order. 539 As long as the sequence is correct, each packet SHALL be unpacked in 540 order. The T140blocks SHALL be extracted from the primary area, and 541 the corresponding SSRC SHALL be extracted from the CSRC list and used 542 for assigning the new T140block to the correct presentation areas (or 543 correspondingly for other receiver roles). 545 If a sequence number gap appears and is still there after some 546 defined time for jitter resolution, T140data SHALL be recovered from 547 redundant data. 
If the gap is wider than the number of generations 548 of redundant T140blocks in the packet, then a T140block SHALL be 549 created with a marker for possible text loss [T140ad1] and assigned 550 to the SSRC of the transmitter as a general input from the mixer 551 because in general it is not possible to deduce from which source(s) 552 text was lost. It is in some cases possible to deduce that no text 553 was lost even for a gap wider than the redundancy generations, and in 554 some cases it can be concluded which source likely had loss. 555 Therefore, the receiver MAY insert the marker for possible text loss 556 [T140ad1] in the presentation area corresponding to the source which 557 may have had loss. 559 Then, the T140block in the received packet SHALL be retrieved 560 beginning with the highest redundant generation, and assigned to 561 the presentation area of that source. Finally the primary T140block 562 SHALL be retrieved from the packet and similarly assigned to the 563 corresponding presentation area for the source. 565 If the sequence number gap was equal to or less than the number of 566 redundancy generations in the received packet, a missing text marker 567 SHALL NOT be inserted, and instead the T140block and the SSRC are fully 568 recovered from the redundancy information and the CSRC-list in the 569 way indicated above. 571 2.1.17.4. Delete BOM 573 Unicode character "BOM" is used as a start indication and sometimes 574 used as a filler or keep alive by transmission implementations. 575 These SHALL be deleted on reception. 577 2.1.17.5. Empty T140blocks 579 Empty T140blocks are included as fillers for unused redundancy levels 580 in the packets. They just do not provide any contents and do not 581 contribute to the received streams. 583 2.1.18. Performance considerations 585 This solution has good performance for up to three participants 586 simultaneously sending text.
At higher numbers of participants 587 simultaneously sending text, jerkiness is visible in the 588 presentation of text. With five participants simultaneously 589 transmitting text, the jerkiness is about 1400 ms. Even so, the 590 transmission of text catches up, so there is no resulting delay 591 introduced. The solution is therefore suitable for emergency service 592 use, relay service use, and small or well-managed larger multimedia 593 conferences. It is only less suitable for large conferences with a 594 high number of participants sending text simultaneously. It should 595 be noted that it is only the number of users sending text at the 596 same moment that causes jerkiness, not the total number of users with 597 RTT capability. 599 2.1.19. Offer/answer considerations 601 A party which has negotiated the "rtt-mix-rtp-mixer" SDP media 602 attribute MUST populate the CSRC-list and format the packets 603 according to this section if it acts as an RTP mixer and sends multi- 604 party text. 606 A party which has negotiated the "rtt-mix-rtp-mixer" SDP media 607 attribute MUST interpret the contents of the CSRC-list and the 608 packets according to this section in received RTP packets in the 609 corresponding RTP stream. 611 A party performing as a mixer, which has not negotiated the "rtt-mix- 612 rtp-mixer" SDP media attribute, but negotiated a "text/red" or "text/ 613 t140" format in a session with a participant SHOULD, if nothing else 614 is specified for the application, format transmitted text to that 615 participant to be suitable to present on a multi-party unaware 616 endpoint as further specified in Section 3.2. 618 A party not performing as a mixer MUST NOT include the CSRC list. 620 2.1.20. Security for session control and media 622 Security SHOULD be applied on both session control and media. In 623 applications where legacy endpoints without security may exist, a 624 negotiation between security and no security SHOULD be applied.
If 625 no other security solution is mandated by the application, then RFC 626 8643 OSRTP [RFC8643] SHOULD be applied to negotiate SRTP media 627 security with DTLS. Most SDP examples below are for simplicity 628 expressed without the security additions. The principles (but not 629 all details) for applying DTLS-SRTP security are shown in a couple of 630 the following examples. 632 2.1.21. SDP offer/answer examples 634 This section shows some examples of SDP for session negotiation of 635 the real-time text media in SIP sessions. Audio is usually provided 636 in the same session, and sometimes also video. The examples only 637 show the part of importance for the real-time text media. 639 Offer example for "text/red" format and multi-party support: 641 m=text 11000 RTP/AVP 100 98 642 a=rtpmap:98 t140/1000 643 a=rtpmap:100 red/1000 644 a=fmtp:100 98/98/98 645 a=rtt-mix-rtp-mixer 647 Answer example from a multi-party capable device: 648 m=text 11000 RTP/AVP 100 98 649 a=rtpmap:98 t140/1000 650 a=rtpmap:100 red/1000 651 a=fmtp:100 98/98/98 652 a=rtt-mix-rtp-mixer 654 Offer example for "text/red" format including multi-party 655 and security: 656 a=fingerprint: SHA-1 \ 657 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB 658 m=text 11000 RTP/AVP 100 98 659 a=rtpmap:98 t140/1000 660 a=rtpmap:100 red/1000 661 a=fmtp:100 98/98/98 662 a=rtt-mix-rtp-mixer 664 The "fingerprint" is sufficient to offer DTLS-SRTP, with the media 665 line still indicating RTP/AVP. 667 Answer example from a multi-party capable device with security: 668 a=fingerprint: SHA-1 \ 669 FF:FF:FF:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB 670 m=text 11000 RTP/AVP 100 98 671 a=rtpmap:98 t140/1000 672 a=rtpmap:100 red/1000 673 a=fmtp:100 98/98/98 674 a=rtt-mix-rtp-mixer 676 With the "fingerprint" the device acknowledges use of DTLS-SRTP.
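As an illustration of how an endpoint might inspect an SDP body for the multi-party attribute used in the examples above, the following Python sketch scans the "m=text" media section for "a=rtt-mix-rtp-mixer". The function name supports_rtt_mixing is hypothetical and not part of this specification, and the sketch assumes one attribute per line with no further SDP validation.

```python
def supports_rtt_mixing(sdp: str) -> bool:
    """Return True if a text media section carries a=rtt-mix-rtp-mixer.

    Hypothetical helper for illustration only; it does not validate SDP.
    """
    in_text_section = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # A new media description starts; remember whether it is text.
            in_text_section = line.startswith("m=text ")
        elif in_text_section and line == "a=rtt-mix-rtp-mixer":
            return True
    return False
```

A mixer could use such a check on a received answer to choose between the multi-party format and the presentation intended for multi-party unaware endpoints.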
678 Answer example from a multi-party unaware device that also 679 does not support security: 681 m=text 12000 RTP/AVP 100 98 682 a=rtpmap:98 t140/1000 683 a=rtpmap:100 red/1000 684 a=fmtp:100 98/98/98 686 2.1.22. Packet sequence example 688 This example shows a symbolic flow of packets from a mixer with loss 689 and recovery. A and B are sources of RTT. P indicates primary data. 690 R1 is first redundant generation data and R2 is second redundant 691 generation data. A1, B1, A2, etc. are text chunks (T140blocks) 692 received from the respective sources. X indicates a packet dropped 693 between the mixer and a receiver. 695 |----------------| 696 |Seq no 1 | 697 |CC=1 | 698 |CSRC list A | 699 |R2: A1 | 700 |R1: A2 | 701 |P: A3 | 702 |----------------| 704 Assuming that earlier packets (with text A1 and A2) were received in 705 sequence, text A3 is received from packet 1 and assigned to reception 706 area A. The mixer is now assumed to have received text from source B 707 and needs to prepare for sending that text. First it must send the 708 redundant generations of text A1. 710 |----------------| 711 |Seq no 2 | 712 |CC=1 | 713 |CSRC list A | 714 |R2: A2 | 715 |R1: A3 | 716 |P: Empty | 717 |----------------| 718 Nothing needs to be retrieved from this packet. 720 X----------------| 721 X Seq no 3 | 722 X CC=1 | 723 X CSRC list A | 724 X R2: A3 | 725 X R1: Empty | 726 X P: Empty | 727 X----------------| 728 Packet 3 is assumed to be dropped due to network problems. 730 X----------------| 731 X Seq no 4 | 732 X CC=1 | 733 X CSRC list B | 734 X R2: Empty | 735 X R1: Empty | 736 X P: B1 | 737 X----------------| 738 Packet 4 contains text from B and is assumed to be dropped due to network problems. 739 The mixer is assumed to have received text from A that is waiting for its turn to be sent. 740 Sending of text from B must therefore be temporarily ended by 741 sending redundancy twice.
743 X----------------| 744 X Seq no 5 | 745 X CC=1 | 746 X CSRC list B | 747 X R2: Empty | 748 X R1: B1 | 749 X P: Empty | 750 X----------------| 751 Packet 5 is assumed to be dropped due to network problems. 753 |----------------| 754 |Seq no 6 | 755 |CC=1 | 756 |CSRC list B | 757 | R2: B1 | 758 | R1: Empty | 759 | P: Empty | 760 |----------------| 762 Packet 6 is received. The latest received sequence number was 2. 763 Recovery is therefore tried for 3, 4 and 5. There is no coverage for seq 764 no 3. But knowing that A3 must have been sent as R2 in packet 3, it 765 can be concluded that nothing was lost. 767 For seq no 4, text B1 is recovered from the second generation 768 redundancy and appended to the reception area of B. For seq no 5, 769 nothing needs to be recovered. No primary text is available in 770 packet 6. 772 After this sequence, A3 and B1 have been received. In this case no 773 text was lost. Even if packet 2 had also been lost, it could be concluded 774 that no text was lost. 776 If packets 1 and 2 had also been lost, there would be a need to create a 777 marker for possibly lost text (U+FFFD) [T140ad1], inserted generally 778 and possibly also in text sequences A and B. 780 2.2. Use of an extended packet format "text/rex" with text from 781 multiple sources 783 The method specified in this section, called "text/rex", has higher 784 performance than the previous method. Text from up to 15 sources can 785 be included in each packet. This may be of value in large non- 786 managed conferences. 788 2.2.1. Use of fields in the RTP packets 790 RFC 4103 [RFC4103] specifies use of RFC 3550 RTP [RFC3550], and a 791 redundancy format "text/red" for increased robustness of real-time 792 text transmission. This document updates RFC 4102 [RFC4102] and RFC 793 4103 [RFC4103] by introducing a format "text/rex" with a rule for 794 populating and using the CSRC-list in the RTP packet and extending 795 the redundancy header to be called a data header.
This is done in 796 order to enhance the performance in multi-party RTT sessions. 798 The "text/rex" format can be seen as an "n-tuple" variant of the 799 "text/red" format intended to carry text information from up to 15 800 sources per packet. 802 The CC field SHALL show the number of members in the CSRC list, which 803 is one per source represented in the packet. 805 When transmitted from a mixer, a CSRC list is included in the packet. 806 The members in the CSRC-list SHALL contain the SSRCs of the sources 807 of the T140blocks in the packet. The order of the CSRC members MUST 808 be the same as the order of the sources of the data header fields and 809 the T140blocks. When redundancy is used, text from all included 810 sources MUST have the same number of redundant generations. The 811 primary, first redundant, second redundant and possible further 812 redundant generations of T140blocks MUST be grouped per source in the 813 packet in "source groups". The recommended level of redundancy is to 814 use one primary and two redundant generations of T140blocks. In some 815 cases, a primary or redundant T140block is empty, but is still 816 represented by a member in the data header. 818 The RTP header is followed by one or more source groups of data 819 headers: one header for each text block to be included. Each of 820 these data headers provides the timestamp offset and length of the 821 corresponding data block, in addition to the payload type number 822 corresponding to the payload format "text/t140". The data headers 823 are followed by the data fields carrying T140blocks from the sources. 825 0 1 2 3 826 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 827 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 828 |F| block PT | timestamp offset | block length | 829 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 831 Figure 1: The bits in the data header. 
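The 32-bit layout in Figure 1 can be sketched in Python as follows. This is a minimal illustration, not a normative encoder: the function names pack_data_header and unpack_data_header are hypothetical, and the sketch only packs and unpacks the four fields (F, block PT, timestamp offset, block length) in network byte order.

```python
def pack_data_header(follows: bool, block_pt: int, ts_offset: int, block_len: int) -> bytes:
    """Pack F (1 bit), block PT (7), timestamp offset (14), block length (10)."""
    if not 0 <= block_pt < (1 << 7):
        raise ValueError("block PT must fit in 7 bits")
    if not 0 <= ts_offset < (1 << 14):
        raise ValueError("timestamp offset must fit in 14 bits")
    if not 0 <= block_len < (1 << 10):
        raise ValueError("block length must fit in 10 bits")
    word = (int(follows) << 31) | (block_pt << 24) | (ts_offset << 10) | block_len
    return word.to_bytes(4, "big")  # network byte order


def unpack_data_header(raw: bytes):
    """Return (follows, block_pt, ts_offset, block_len) from 4 header bytes."""
    if len(raw) < 4:
        raise ValueError("a data header is 4 bytes long")
    word = int.from_bytes(raw[:4], "big")
    return (bool(word >> 31), (word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF)
```

For example, a header with F=1, payload type 98, an offset of 300 ticks and a 5-byte block packs to the bytes E2 04 B0 05.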
833 The bits in the data header are specified as follows: 835 F: 1 bit First bit in header indicates whether another header block 836 follows. It has value 1 if further header blocks follow, and 837 value 0 if this is the last header block. 839 block PT: 7 bits RTP payload type number for this block, 840 corresponding to the t140 payload type from the RTPMAP SDP 841 attribute. 843 timestamp offset: 14 bits Unsigned offset of timestamp of this block 844 relative to the timestamp given in the RTP header. The offset is 845 a time to be subtracted from the current timestamp to determine 846 the timestamp of the data when the latest part of this block was 847 sent from the original source. If the timestamp offset would be 848 >15 000, it SHALL be set to 15 000. For redundant data, the 849 resulting time is the time when the data was sent as primary from 850 the original source. If the value would be >15 000, then it SHALL 851 be set to 15 000 plus 300 times the redundancy level of the data. 852 The high values appear only in exceptional cases, e.g. when some 853 data has been held in order to keep the text flow under the 854 Characters Per Second (CPS) limit. 856 block length: 10 bits Length in bytes of the corresponding data 857 block excluding the header. 859 The header for the final block has a zero F bit, and apart from that 860 the same fields as other data headers. 862 Note: The "text/rex" packet format is similar to that of RFC 2198 863 [RFC2198] but is different from some aspects. RFC 2198 associates 864 the whole of the CSRC-list with the primary data and assumes that the 865 same list applies to reconstructed redundant data. In this section a 866 T140block is associated with exactly one CSRC list member as 867 described above. 
Also RFC 2198 [RFC2198] anticipates infrequent 868 change to CSRCs; implementers should be aware that the order of the 869 CSRC-list according to this section will vary during transitions 870 between transmission from the mixer of text originated by different 871 participants. Another difference is that the last member in the data 872 header area in RFC 2198 [RFC2198] only contains the payload type 873 number while in this section it has the same format as all other 874 entries in the data header. 876 The picture below shows a typical "text/rex" RTP packet with multi- 877 party RTT contents from three sources and coding according to this 878 section. 880 0 1 2 3 881 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 882 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 883 |V=2|P|X| CC=3 |M| "REX" PT | RTP sequence number | 884 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 885 | timestamp of packet creation | 886 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 887 | synchronization source (SSRC) identifier | 888 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 889 | CSRC list member 1 = SSRC of source of "A" | 890 | CSRC list member 2 = SSRC of source of "B" | 891 | CSRC list member 3 = SSRC of source of "C" | 892 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 893 |1| T140 PT |timestmp offset of "A-R2" |"A-R2" block length| 894 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 895 |1| T140 PT |timestamp offset of "A-R1" |"A-R1" block length| 896 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 897 |1| T140 PT | timestamp offset of "A-P" |"A-P" block length | 898 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 899 |1| T140 PT |timestamp offset of "B-R2" |"B-R2" block length| 900 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 901 |1| T140 PT |timestamp offset of "B-R1" |"B-R1" block length| 902 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 903 |1| T140 PT | timestamp offset of "B-P" | "B-P" block length| 904 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 905 |1| T140 PT |timestamp offset of "C-R2" |"C-R2" block length| 906 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 907 |1| T140 PT |timestamp offset of "C-R1" |"C-R1" block length| 908 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 909 |0| T140 PT |timestamp offset of "C-P" |"C-P" block length | 910 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 911 | "A-R2" T.140 encoded redundant data | 912 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 913 | |"A-R1" T.140 encoded redundant data | 914 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 915 |"A-P" T.140 encoded primary | | 916 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 917 | "B-R2" T.140 encoded redundant data | | 918 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 919 | "B-R1" T.140 encoded redundant data | 920 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 921 | "B-P" T.140 encoded primary data | | 922 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 923 | "C-R2" T.140 encoded redundant data | | 924 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 925 | "C-R1" T.140 encoded redundant data | 926 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 927 | "C-P" T.140 encoded primary data | 928 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 929 Figure 2: A "text/rex" packet with text from three sources A, B, C. 931 A-P, B-P, C-P are primary data from A, B and C. 933 A-R1, B-R1, C-R1 are first redundant generation data from A, B and C. 935 A-R2, B-R2, C-R2 are second redundant generation data from A, B and C.
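A receiver could separate the data headers and data blocks of a packet such as the one in Figure 2 along the lines of the Python sketch below. It assumes that the argument is the payload that follows the fixed RTP header and the CSRC list; the function name split_rex_payload is hypothetical, and mapping blocks to sources through the CSRC list is left out.

```python
def split_rex_payload(payload: bytes):
    """Return [(block_pt, ts_offset, data), ...] in packet order.

    Illustrative sketch: walks data headers until one has F=0, then
    slices the data blocks using the lengths found in the headers.
    """
    headers = []
    pos = 0
    while True:
        if pos + 4 > len(payload):
            raise ValueError("truncated data header")
        word = int.from_bytes(payload[pos:pos + 4], "big")
        pos += 4
        headers.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        if not (word >> 31):  # F bit is 0: this was the last header
            break
    blocks = []
    for block_pt, ts_offset, length in headers:
        if pos + length > len(payload):
            raise ValueError("truncated data block")
        blocks.append((block_pt, ts_offset, payload[pos:pos + length]))
        pos += length
    return blocks
```

Empty T140blocks simply appear as entries with zero-length data, so the position of each block within its source group is preserved.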
937 In a real case, some of the data headers will likely indicate a zero 938 block length, and no corresponding T.140 data. 940 2.2.2. Actions at transmission by a mixer 942 2.2.2.1. Initial BOM transmission 944 As soon as a participant is known to participate in a session and 945 being available for text reception, a Unicode "BOM" character SHALL 946 be sent to it according to the procedures in this section. If the 947 transmitter is a mixer, then the source of this character SHALL be 948 indicated to be the mixer itself. 950 2.2.2.2. Keep-alive 952 After that, the transmitter SHALL send keep-alive traffic to the 953 receivers at regular intervals when no other traffic has occurred 954 during that interval if that is decided for the actual connection. 955 Recommendations for keep-alive can be found in RFC 6263[RFC6263]. 957 2.2.2.3. Transmission interval 959 A "text/rex" transmitter SHOULD send packets distributed in time as 960 long as there is something (new or redundant T140blocks) to transmit. 961 The maximum transmission interval SHOULD then be 300 ms. It is 962 RECOMMENDED to send a packet to a receiver as soon as new text to 963 that receiver is available, as long as the time after the latest sent 964 packet to the same receiver is more than 150 ms, and also the maximum 965 character rate to the receiver is not exceeded. The intention is to 966 keep the latency low while keeping a good protection against text 967 loss in bursty packet loss conditions. 969 2.2.2.4. Do not send received text to the originating source 971 Text received from a participant SHOULD NOT be included in 972 transmission to that participant. 974 2.2.2.5. Clean incoming text 976 A mixer SHALL handle reception and recovery of packet loss, marking 977 of possible text loss and deletion of 'BOM' characters from each 978 participant before queueing received text for transmission to 979 receiving participants. 981 2.2.2.6. 
Redundancy 983 The transmitting party using redundancy SHALL send redundant 984 repetitions of T140blocks already transmitted in earlier packets. The 985 number of redundant generations of T140blocks to include in 986 transmitted packets SHALL be deduced from the SDP negotiation. It 987 SHOULD be set to the minimum of the numbers declared by the two 988 parties negotiating a connection. The same number of redundant 989 generations MUST be used for text from all sources when it is 990 transmitted to a receiver. The number of generations sent to a 991 receiver SHALL be the same during the whole session unless it is 992 modified by session renegotiation. 994 2.2.2.7. Text placement in packets 996 At the time of transmission, the mixer SHALL populate the RTP packet with 997 T140blocks combined from all T140blocks queued for transmission 998 originating from each source as long as this is not in conflict with 999 the allowed number of characters per second or the maximum packet 1000 size. These T140blocks SHALL be placed in the packet interleaved 1001 with redundant T140blocks and new T140blocks from other sources. The 1002 SSRC of each source shall be placed as a member in the CSRC-list at a 1003 place corresponding to the place of its T140blocks in the packet. 1005 2.2.2.8. Maximum number of sources per packet 1007 Text from a maximum of 15 sources MAY be included in a packet. The 1008 reason for this limitation is the maximum number of CSRC list members 1009 allowed in a packet. If text from more sources needs to be 1010 transmitted, the mixer MAY let the sources take turns in having their 1011 text transmitted. When stopping transmission of one source to allow 1012 another source to have its text sent, all intended redundant 1013 generations of the last text from the source to be stopped MUST be 1014 transmitted before text from another source can be transmitted.
1015 Actively transmitting sources SHOULD be allowed to take turns with 1016 short intervals to have their text transmitted. 1018 Note: The CSRC-list in an RTP packet only includes participants whose 1019 text is included in text blocks. It is not the same as the total 1020 list of participants in a conference. With audio and video media, 1021 the CSRC-list would often contain all participants who are not muted 1022 whereas text participants that don't type are completely silent and 1023 thus are not represented in RTP packet CSRC-lists once their text 1024 has been transmitted as primary and in the intended number of redundant 1025 generations. 1027 2.2.2.9. Empty T140blocks 1029 If no unsent T140blocks are available for a source at the time of 1030 populating a packet, but T140blocks are available which have not yet 1031 been sent the full intended number of redundant transmissions, then 1032 the primary T140block for that source is composed of an empty 1033 T140block, and populated (without taking up any length) in a packet 1034 for transmission. The corresponding SSRC SHALL be placed in its 1035 place in the CSRC-list. 1037 2.2.2.10. Creation of the redundancy 1039 The primary T140block from each source in the latest transmitted 1040 packet is used to populate the first redundant T140block for that 1041 source. The first redundant T140block for that source from the 1042 latest transmission is placed as the second redundant T140block. 1044 Usually this is the level of redundancy used. If a higher number of 1045 redundant generations is negotiated, then the procedure SHALL be maintained 1046 until all available redundant levels of T140blocks and their sources 1047 are placed in the packet. If a receiver has negotiated a lower 1048 number of "text/rex" generations, then that level shall be the 1049 maximum used by the transmitter. 1051 2.2.2.11.
Timer offset fields 1053 The timer offset values are inserted in the data header, with the 1054 time offset from the RTP timestamp in the packet when the 1055 corresponding T140block was sent from its original source as primary. 1057 The timer offsets are expressed in the same clock tick units as the 1058 RTP timestamp. 1060 The timestamp offset values for empty T140blocks have no relevance 1061 but SHOULD be assigned realistic values. 1063 2.2.2.12. Other RTP header fields 1065 The number of members in the CSRC list shall be placed in the "CC" 1066 header field. Only mixers place values >0 in the "CC" field. 1068 The current time SHALL be inserted in the timestamp. 1070 The SSRC of the mixer for the RTT session SHALL be inserted in the 1071 SSRC field of the RTP header. 1073 The M-bit SHALL be set to 1 first in the session and after a pause. 1075 2.2.2.13. Pause in transmission 1077 When there is no new T140block to transmit, and no redundant 1078 T140block that has not been retransmitted the intended number of 1079 times, the transmission process can stop until either new T140blocks 1080 arrive, or a keep-alive method calls for transmission of keep-alive 1081 packets. 1083 2.2.3. Actions at reception 1085 The "text/rex" receiver included in an endpoint with presentation 1086 functions will receive RTP packets in the single stream from the 1087 mixer, and SHALL distribute the T140blocks for presentation in 1088 presentation areas for each source. Other receiver roles, such as 1089 gateways or chained mixers are also feasible, and requires 1090 consideration if the stream shall just be forwarded, or distributed 1091 based on the different sources. 1093 2.2.3.1. Multi-party vs two-party use 1095 If the "CC" field value of a received packet is >0, it indicates that 1096 multi-party transmission is active, and the receiver MUST be prepared 1097 to act on the different sources according to its role. If the CC 1098 value is 0, the transmission is point-to-point. 
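The CC test described above can be sketched in Python as follows. The sketch relies only on the RTP fixed header layout of RFC 3550 (CC in the low four bits of the first octet, 12 octets of fixed header, then 32-bit CSRC entries); the function names is_multi_party and csrc_list are hypothetical.

```python
RTP_FIXED_HEADER_LEN = 12  # octets, per RFC 3550


def is_multi_party(packet: bytes) -> bool:
    """True if the CC field of the RTP header is greater than 0."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    return (packet[0] & 0x0F) > 0


def csrc_list(packet: bytes) -> list:
    """Return the CSRC identifiers (the SSRCs of the text sources) in order."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    cc = packet[0] & 0x0F
    end = RTP_FIXED_HEADER_LEN + 4 * cc
    if len(packet) < end:
        raise ValueError("packet shorter than its CSRC list")
    return [int.from_bytes(packet[i:i + 4], "big")
            for i in range(RTP_FIXED_HEADER_LEN, end, 4)]
```

The order of the returned identifiers matters, because each CSRC member corresponds to the source group at the same position in the packet.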
1100 2.2.3.2. Level of redundancy 1102 The level of redundancy generations in use SHALL be evaluated from the 1103 received packet contents. If the CC value is 0, the number of 1104 generations (including the primary) is equal to the number of members 1105 in the data header. If the CC value is >0, the number of generations 1106 (including the primary) is equal to the number of members in the data 1107 header divided by the CC value. If the remainder from the division 1108 is >0, then the packet is malformed and SHALL cause an error 1109 indication in the receiver. 1111 2.2.3.3. Extracting text and handling recovery and loss 1113 The RTP sequence numbers of the received packets SHALL be monitored 1114 for gaps and packets out of order. 1116 As long as the sequence is correct, each packet SHALL be unpacked in 1117 order. The T140blocks SHALL be extracted from the primary areas, and 1118 the corresponding SSRCs SHALL be extracted from the corresponding 1119 positions in the CSRC list and used for assigning the new T140blocks 1120 to the correct presentation areas (or correspondingly for other receiver roles). 1122 If a sequence number gap appears and is still there after some 1123 defined time for jitter resolution, T140data SHALL be recovered from 1124 redundant data. If the gap is wider than the number of generations 1125 of redundant T140blocks in the packet, then a T140block SHALL be 1126 created with a marker for possible text loss [T140ad1] and assigned 1127 to the SSRC of the transmitter as a general input from the mixer 1128 because in general it is not possible to deduce from which sources 1129 text was lost. It is however likely that the sources which had loss 1130 were active in transmission just before or after the sequence number 1131 gap. Therefore, the receiver MAY insert the marker for possible text 1132 loss [T140ad1] in the presentation areas corresponding to the sources 1133 which had text in the packets just before and after the gap.
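The generation count rule of Section 2.2.3.2 and the gap rule above can be sketched in Python as follows. The function names generation_count and needs_loss_marker are hypothetical. The sketch assumes that "gap" means the number of missing packets between the last packet received in sequence and the new one, and that sequence numbers wrap at 16 bits as in RTP.

```python
def generation_count(header_count: int, cc: int) -> int:
    """Number of generations (including the primary) in a received packet."""
    if header_count <= 0:
        raise ValueError("a packet carries at least one data header")
    if cc == 0:
        return header_count
    generations, remainder = divmod(header_count, cc)
    if remainder:
        # Headers do not divide evenly among the sources.
        raise ValueError("malformed packet")
    return generations


def needs_loss_marker(last_seq: int, new_seq: int, generations: int) -> bool:
    """True if the gap is wider than the redundant generations can cover."""
    missing = (new_seq - last_seq - 1) % 65536  # 16-bit sequence numbers
    redundant_generations = generations - 1     # exclude the primary
    return missing > redundant_generations
```

With three generations per source, a gap of two packets is fully covered by redundancy, while a gap of three packets calls for the marker unless the receiver can deduce otherwise.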
1135 Then, the T140blocks in the received packet SHALL be retrieved 1136 beginning with the highest redundant generation, grouping them with 1137 the corresponding SSRC from the CSRC-list and assigning them to the 1138 presentation areas per source. Finally the primary T140blocks SHALL 1139 be retrieved from the packet and similarly their sources retrieved 1140 from the corresponding positions in the CSRC-list, and then assigned 1141 to the corresponding presentation areas for the sources. 1143 If the sequence number gap was equal to or less than the number of 1144 redundancy generations in the received packet, a missing text marker 1145 SHALL NOT be inserted, and instead the T140blocks and their SSRCs 1146 fully recovered from the redundancy information and the CSRC-list in 1147 the way indicated above. 1149 2.2.3.4. Delete BOM 1151 Unicode character "BOM" is used as a start indication and sometimes 1152 used as a filler or keep alive by transmission implementations. 1153 These SHALL be deleted on reception. 1155 2.2.3.5. Empty T140blocks 1157 Empty T140blocks are included as fillers for unused redundancy levels 1158 in the packets. They just do not provide any contents and do not 1159 contribute to the received streams. 1161 2.2.4. RTCP considerations 1163 A mixer SHALL send RTCP reports with SDES, CNAME and NAME information 1164 about the sources in the multi-party call. This makes it possible 1165 for participants to compose a suitable label for text from each 1166 source. 1168 2.2.5. Chained operation 1170 By strictly applying the rules for "text/rex" packet format by all 1171 conforming devices, mixers MAY be arranged in chains. 1173 2.2.6. Usage without redundancy 1175 The "text/rex" format SHALL be used also for multi-party 1176 communication when the redundancy mechanism is not used. That MAY be 1177 the case when robustness in transmission is provided by some other 1178 means than by redundancy. 
All aspects of this section SHALL be 1179 applied except the redundant generations in transmission. 1181 The "text/rex" format SHOULD thus be used for multi-party operation, 1182 also when some other protection against packet loss is utilized, for 1183 example a reliable network or transport. The format is also suitable 1184 to be used for point-to-point operation. 1186 2.2.7. Use with SIP centralized conferencing framework 1188 The SIP conferencing framework, mainly specified in RFC 1189 4353 [RFC4353], RFC 4579 [RFC4579] and RFC 4575 [RFC4575], is suitable 1190 for coordinating sessions including multi-party RTT. The RTT stream 1191 between the mixer and a participant is one and the same during the 1192 conference. Participants are announced by notifications when 1193 they join or leave, and further user information may 1194 be provided. The SSRC of the text to expect from joined users MAY be 1195 included in a notification. The notifications MAY be used both for 1196 security purposes and for translation to a label for presentation to 1197 other users. 1199 2.2.8. Conference control 1201 In managed conferences, control of the real-time text media SHOULD be 1202 provided in the same way as for other media, e.g. for muting and 1203 unmuting by the direction attributes in SDP [RFC4566]. 1205 Note that floor control functions may be of value for RTT users as 1206 well as for users of other media in a conference. 1208 2.2.9. Media Subtype Registration 1210 This registration is done using the template defined in [RFC6838] and 1211 following [RFC4855]. 1213 Type name: 1214 text 1216 Subtype name: 1217 rex 1219 Required parameters: 1220 rate: 1221 The RTP timestamp (clock) rate. The only valid value is 1000. 1223 pt: 1224 A comma-separated list of RTP payload types. Because comma is 1225 a special character, the list must be a quoted-string (enclosed 1226 in double quotes).
Each list element is a mapping of the 1227 dynamic payload type number to an embedded Content-type 1228 specification for the payload format corresponding to the 1229 payload type. The format of the mapping is: 1231 payload-type-number "=" content-type 1233 If the content-type string includes a comma, then the content- 1234 type string MUST be a quoted-string. If the content-type 1235 string does not include a comma, it MAY still be quoted. Since 1236 it is part of the list which must itself be a quoted-string, 1237 that means the quotation marks MUST be quoted with backslash 1238 quoting as specified in RFC 2045. If the content-type string 1239 itself contains a quoted-string, then the requirement for 1240 backslash quoting is recursively applied. To specify the text/ 1241 rex payload format in SDP, the pt parameter is mapped to an 1242 a=fmtp attribute by eliminating the parameter name (pt) and 1243 changing the commas to slashes. For example: 1245 pt = " = \"text/t140;cps=200,text/t140,text/t140\" " 1247 implies the following SDP: 1249 m=text 49170 RTP/AVP 98 100 1250 a=rtpmap:98 rex/1000 1251 a=fmtp:98 100/100/100 1252 a=rtpmap:100 t140/1000 1253 a=fmtp:100 cps=200 1255 Encoding considerations: 1256 binary; see Section 4.8 of [RFC6838]. 1258 Security considerations: 1259 See Section 9 of RFC XXXX. [RFC Editor: Upon publication as an 1260 RFC, please replace "XXXX" with the number assigned to this 1261 document and remove this note.] 1263 Interoperability considerations: 1264 None. 1266 Published specification: 1267 RFC XXXX. [RFC Editor: Upon publication as an RFC, please replace 1268 "XXXX" with the number assigned to this document and remove this 1269 note.] 1271 Applications which use this media type: 1272 For example: text conferencing tools, multimedia conferencing 1273 tools, real-time conversational tools. 1275 Fragment identifier considerations: 1276 N/A. 1278 Additional information: 1279 None.
1281 Person & email address to contact for further information: 1282 Gunnar Hellstrom 1284 Intended usage: 1285 COMMON 1287 Restrictions on usage: 1288 This media type depends on RTP framing, and hence is only defined 1289 for transfer via RTP [RFC3550]. 1291 Author: 1292 Gunnar Hellstrom 1294 Change controller: 1295 IETF AVTCore Working Group delegated from the IESG. 1297 2.2.10. SDP considerations 1299 There are receiving RTT implementations which implement RFC 4103 1300 [RFC4103] but not the source separation by the CSRC. Sending mixed 1301 text according to the usual CSRC convention from RFC 2198 [RFC2198] 1302 to a device implementing only RFC 4103 [RFC4103] and no multi-party 1303 mechanism would risk producing unreadable presented text. 1304 Therefore, in order to negotiate RTT mixing capability according to 1305 the "text/rex" method, all devices supporting "text/rex" for multi- 1306 party aware participants SHALL include an SDP media format "text/rex" 1307 in the SDP [RFC4566], indicating this format in offers and answers. 1308 Multi-party streams using the coding of this section intended for 1309 multi-party aware endpoints MUST NOT be sent to devices which have 1310 not indicated the "text/rex" format. 1312 Implementations not understanding the "text/rex" format MUST ignore 1313 it according to common SDP rules. 1315 The SDP media format defined here is named "rex", for extended 1316 "red". It is intended to be used in "text" media descriptions with 1317 "text/rex" and "text/t140" formats. Both formats MUST be declared 1318 for the "text/rex" format to be used. It indicates capability to use 1319 source indications in the CSRC list and the packet format according 1320 to this section. It also indicates ability to receive 150 real-time 1321 text characters per second by default. 1323 2.2.10.1.
Mapping of media parameters to SDP 1325 The information carried in the media type registration has a specific 1326 mapping to fields in the Session Description Protocol (SDP), which 1327 is commonly used to describe RTP sessions. When SDP (RFC 4566 1328 [RFC4566]) is used to specify sessions employing the "text/rex" format, 1329 the mapping is as follows: 1331 * The media type ("text") goes in SDP "m=" as the media name. 1333 * The media subtype (payload format name) goes in SDP "a=rtpmap" as 1334 the encoding name. The RTP clock rate in "a=rtpmap" MUST be 1000 1335 for "text/rex". 1337 * When the payload type is used with redundancy, the level of 1338 redundancy is shown by the number of elements in the slash- 1339 separated payload type list in the "fmtp" parameter of the "text/ 1340 rex" media format. 1342 2.2.10.2. Security for session control and media 1344 Security SHOULD be applied on both session control and media. In 1345 applications where legacy endpoints without security may exist, a 1346 negotiation between security and no security SHOULD be applied. If 1347 no other security solution is mandated by the application, then RFC 1348 8643 OSRTP [RFC8643] SHOULD be applied to negotiate SRTP media 1349 security with DTLS. Most SDP examples below are for simplicity 1350 expressed without the security additions. The principles (but not 1351 all details) for applying DTLS-SRTP security are shown in a couple of 1352 the following examples. 1354 2.2.10.3. SDP offer/answer examples 1356 This section shows some examples of SDP for session negotiation of 1357 the real-time text media in SIP sessions. Audio is usually provided 1358 in the same session, and sometimes also video. The examples only 1359 show the parts relevant to the real-time text media.
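As an illustration of the mapping in Section 2.2.10.1, the following Python sketch composes the media-level SDP lines for a "text/rex" description. The function name build_rex_media and its parameters are hypothetical and not defined by this document. The sketch assumes that every generation in the slash-separated list uses the same "text/t140" payload type, as in the examples of this section.

```python
def build_rex_media(port, rex_pt, t140_pt, generations=3):
    """Compose SDP lines for a "text/rex" media description (illustrative only).

    All names here are hypothetical; only the SDP syntax comes from the text.
    """
    # One list element per generation (primary plus redundant generations),
    # so the element count shows the level of redundancy.
    fmtp_list = "/".join(str(t140_pt) for _ in range(generations))
    return [
        # The media type "text" goes in "m=" as the media name.
        f"m=text {port} RTP/AVP {rex_pt} {t140_pt}",
        f"a=rtpmap:{t140_pt} t140/1000",
        # The clock rate in "a=rtpmap" MUST be 1000 for "text/rex".
        f"a=rtpmap:{rex_pt} rex/1000",
        f"a=fmtp:{rex_pt} {fmtp_list}",
    ]
```

Calling build_rex_media(11000, 101, 98) yields the four media lines used in the "text/rex"-only offer example of this section.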
1361 Offer example for just "text/rex" multi-party capability : 1363 m=text 11000 RTP/AVP 101 98 1364 a=rtpmap:98 t140/1000 1365 a=rtpmap:101 rex/1000 1366 a=fmtp:101 98/98/98 1368 Answer example from a multi-party capable device 1369 m=text 12000 RTP/AVP 101 98 1370 a=rtpmap:98 t140/1000 1371 a=rtpmap:101 rex/1000 1372 a=fmtp:101 98/98/98 1374 Offer example for "text/red" and "text/rex" multi-party support: 1376 m=text 11000 RTP/AVP 101 100 98 1377 a=rtpmap:98 t140/1000 1378 a=rtpmap:100 red/1000 1379 a=rtpmap:101 rex/1000 1380 a=fmtp:100 98/98/98 1381 a=fmtp:101 98/98/98 1382 a=rtt-mix-rtp-mixer 1384 Answer example from multi-party capable device using "text/rex". 1385 m=text 11000 RTP/AVP 101 98 1386 a=rtpmap:98 t140/1000 1387 a=rtpmap:101 rex/1000 1388 a=fmtp:101 98/98/98 1390 Offer example for both traditional "text/red" and multi-party format 1391 including security: 1392 a=fingerprint: SHA-1 \ 1393 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB 1394 m=text 11000 RTP/AVP 101 100 98 1395 a=rtpmap:98 t140/1000 1396 a=rtpmap:100 red/1000 1397 a=rtpmap:101 rex/1000 1398 a=fmtp:100 98/98/98 1399 a=fmtp:101 98/98/98 1400 a=rtt-mix-rtp-mixer 1402 The "Fingerprint" is sufficient to offer DTLS-SRTP, with the media 1403 line still indicating RTP/AVP. 1405 Answer example from a multi-party capable device including security 1406 a=fingerprint: SHA-1 \ 1407 FF:FF:FF:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB 1408 m=text 11000 RTP/AVP 101 98 1409 a=rtpmap:98 t140/1000 1410 a=rtpmap:101 rex/1000 1411 a=fmtp:101 98/98/98 1413 With the "fingerprint" the device acknowledges use of SRTP/DTLS. 
1415 Answer example from a multi-party unaware device that also 1416 does not support security: 1418 m=text 12000 RTP/AVP 100 98 1419 a=rtpmap:98 t140/1000 1420 a=rtpmap:100 red/1000 1421 a=fmtp:100 98/98/98 1423 A party which has negotiated the "text/rex" format MUST populate the 1424 CSRC-list and format the packets according to this section if it acts 1425 as an RTP mixer and sends multi-party text. 1427 A party which has negotiated the "text/rex" capability MUST interpret 1428 the contents of the CSRC-list and the packets according to this 1429 section in received RTP packets using the corresponding payload type. 1431 A party performing as a mixer, which has not negotiated the "text/ 1432 rex" format, but negotiated a "text/red" or "text/t140" format in a 1433 session with a participant SHOULD, if nothing else is specified for 1434 the application, format transmitted text to that participant to be 1435 suitable to present on a multi-party unaware endpoint as further 1436 specified in Section 3.2. 1438 A party not performing as a mixer MUST NOT include the CSRC list if 1439 it has a single source of text. 1441 2.2.10.4. Packet examples 1443 This example shows a symbolic flow of packets from a mixer with loss 1444 and recovery. A, B and C are sources of RTT. M is the mixer. Pn 1445 indicates primary data in source group "n". Rn1 is the first redundant 1446 generation data and Rn2 is the second redundant generation data in source 1447 group "n". A1, B1, A2 etc. are text chunks (T140blocks) received from 1448 the respective sources. X indicates a packet dropped between the mixer 1449 and a receiver. 1451 |----------------| 1452 |Seq no 1 | 1453 |CC=1 | 1454 |CSRC list A | 1455 |R12: Empty | 1456 |R11: Empty | 1457 |P1: A1 | 1458 |----------------| 1460 Assuming that earlier packets were received in sequence, text A1 is 1461 received from packet 1 and assigned to reception area A.
1463 |----------------| 1464 |Seq no 2 | 1465 |CC=3 | 1466 |CSRC list C,A | 1467 |R12 Empty | 1468 |R11:Empty | 1469 |P1: C1 | 1470 |R22 Empty | 1471 |R21: A1 | 1472 |P2: Empty | 1473 |----------------| 1474 Text C1 is received from packet 2 and assigned to reception area C. 1476 X----------------| 1477 X Seq no 3 | 1478 X CC=2 | 1479 X CSRC list C,A | 1480 X R12: Empty | 1481 X R11: C1 | 1482 X P1: Empty | 1483 X R22: A1 | 1484 X R21: Empty | 1485 X P2: A2 | 1486 X----------------| 1487 Packet 3 is assumed to be dropped in network problems 1489 X----------------| 1490 X Seq no 4 | 1491 X CC=3 | 1492 X CSRC list C,B,A| 1493 X R12: Empty | 1494 X R11: Empty | 1495 X P1: C2 | 1496 X R22: Empty | 1497 X R21: Empty | 1498 X P2: B1 | 1499 X R32: Empty | 1500 X R31: A2 | 1501 X P3: A3 | 1502 X----------------| 1503 Packet 4 is assumed to be dropped in network problems 1505 X----------------| 1506 X Seq no 5 | 1507 X CC=3 | 1508 X CSRC list C,B,A| 1509 X R12: Empty | 1510 X R11: C2 | 1511 X P1: Empty | 1512 X R22: Empty | 1513 X R21: B1 | 1514 X P2: B2 | 1515 X R32: A2 | 1516 X R31: A3 | 1517 X P3: A4 | 1518 X----------------| 1519 Packet 5 is assumed to be dropped in network problems 1520 |----------------| 1521 |Seq no 6 | 1522 |CC=3 | 1523 |CSRC list C,B,A | 1524 | R12: C2 | 1525 | R11: Empty | 1526 | P1: Empty | 1527 | R22: B1 | 1528 | R21: B2 | 1529 | P2: B3 | 1530 | R32: A3 | 1531 | R31: A4 | 1532 | P3: A5 | 1533 |----------------| 1535 Packet 6 is received. The latest received sequence number was 2. 1536 Recovery is therefore tried for 3,4,5. But there is no coverage for 1537 seq no 3. A missing text mark (U'FFFD) [T140ad1] is created and 1538 appended to the common mixer reception area. A missing text mark 1539 (U'FFFD) MAY also be appended in all streams which had text in the 1540 packets before and after the gap. That is in this case after A1, and 1541 C1, and before B1. 
1543 For seqno 4, texts C2, B1 and A3 are recovered from the second 1544 generation redundancy and appended to their respective reception 1545 areas. For seqno 5, texts B2 and A4 are recovered from the first 1546 generation redundancy and appended to their respective reception 1547 areas. Primary text B3 and A5 are received and appended to their 1548 respective reception areas. 1550 After this sequence, the following has been received: A1,A3, A4, A5; 1551 B1, B2, B3; C1, C2. A possible loss is indicated by the general 1552 missing text mark in time between A1 and A3, and in the streams after 1553 A1 and C1 and before B1. 1555 With only one or two packets lost, there would not be any need to 1556 create a missing text marker, and all text would be recovered. 1558 It will be a design decision how to present the missing text markers 1559 assigned to the mixer as a source. 1561 2.2.10.5. Performance considerations 1563 This method allows new text from up to 15 sources per packet. A 1564 mixer implementing the specification will normally cause a latency of 1565 0 to 150 milliseconds in text from up to 15 simultaneous sources. 1566 This performance meets well the realistic requirements for conference 1567 and conversational applications for which up to 5 simultaneous 1568 sources should not be delayed more than 500 milliseconds by a mixer. 1569 In order to achieve good performance, a receiver for multi-party 1570 calls SHOULD declare a sufficient CPS value for the "text/t140" 1571 format in SDP for the number of allowable characters per second. 1573 As comparison, if the "text/red" format would be used for multi-party 1574 communication with its default timing and redundancy, 5 1575 simultaneously sending parties would cause jerky presentation of the 1576 text from them in text spurts with 5 seconds intervals. With a 1577 reduction of the transmission interval to 150 ms, the time between 1578 text spurts for 5 simultaneous sending parties would be 2.5 seconds. 
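Returning to the packet example in Section 2.2.10.4, the receiver-side recovery can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a normative algorithm: the function name recover, the tuple layout (source, second generation, first generation, primary) and the "mixer" key for the common reception area are hypothetical, two redundant generations are assumed as in the example, and packets are assumed to arrive in order.

```python
MISSING_TEXT_MARK = "\uFFFD"

def recover(last_seq, seq, sources, areas):
    """Append recovered and primary text to per-source reception areas.

    sources: list of (name, r2, r1, primary); None stands for "Empty".
    areas:   dict mapping a source name to its list of received chunks.
    """
    # Number of packets lost between the last received packet and this one
    # (modulo 2**16 because RTP sequence numbers are 16 bits wide).
    lost = (seq - last_seq - 1) % 65536
    if lost > 2:
        # Two generations cannot cover the gap: mark a possible loss in
        # the common mixer reception area (hypothetical key "mixer").
        areas.setdefault("mixer", []).append(MISSING_TEXT_MARK)
    for name, r2, r1, primary in sources:
        area = areas.setdefault(name, [])
        if lost >= 2 and r2 is not None:
            area.append(r2)       # primary text of packet seq-2
        if lost >= 1 and r1 is not None:
            area.append(r1)       # primary text of packet seq-1
        if primary is not None:
            area.append(primary)  # new text in this packet
    return areas
```

Applied to packet 6 of the example with packet 2 as the last one received, the sketch leaves A1, A3, A4, A5 in area A, B1, B2, B3 in area B, C1, C2 in area C, and one missing text mark in the common area. With only one or two packets lost, no mark is created.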
1580 Five simultaneous sending parties may occasionally occur in a 1581 conference with one or two main sending parties and three parties 1582 giving very brief comments. 1584 The default maximum rate of reception of "text/t140" real-time text 1585 is in RFC 4103 [RFC4103] specified to be 30 characters per second. 1586 The value MAY be modified in the CPS parameter of the FMTP attribute 1587 in the media section for the "text/t140" media. A mixer combining 1588 real-time text from a number of sources may have a higher combined 1589 flow of text coming from the sources. Endpoints SHOULD therefore 1590 specify a suitable higher value for the CPS parameter, corresponding 1591 to its real reception capability. A value for CPS of 150 is the 1592 default for the "text/t140" stream in the "text/rex" format. See RFC 1593 4103 [RFC4103] for the format and use of the CPS parameter. The same 1594 rules apply for the "text/rex" format except for the default value. 1596 2.3. Mixing for multi-party unaware endpoints 1598 A method is specified in this section for cases when the 1599 participating endpoint does not implement any solution for multi- 1600 party presentation of real-time text. The solution requires the 1601 mixer to insert text dividers and readable labels and only send text 1602 from one source at a time until a suitable point appears for source 1603 change. This solution is a fallback method with functional 1604 limitations that acts on the presentation level and is further 1605 specified in Section 3.2. 1607 3. Presentation level considerations 1609 ITU-T T.140 [T140] provides the presentation level requirements for 1610 the RFC 4103 [RFC4103] transport. 
T.140 [T140] has functions for 1611 erasure and other formatting functions and has the following general 1612 statement for the presentation: 1614 "The display of text from the members of the conversation should be 1615 arranged so that the text from each participant is clearly readable, 1616 and its source and the relative timing of entered text is visualized 1617 in the display. Mechanisms for looking back in the contents from the 1618 current session should be provided. The text should be displayed as 1619 soon as it is received." 1621 Strict application of T.140 [T140] is of essence for the 1622 interoperability of real-time text implementations and to fulfill the 1623 intention that the session participants have the same information of 1624 the text contents of the conversation without necessarily having the 1625 exact same layout of the conversation. 1627 T.140 [T140] specifies a set of presentation control codes to include 1628 in the stream. Some of them are optional. Implementations MUST be 1629 able to ignore optional control codes that they do not support. 1631 There is no strict "message" concept in real-time text. Line 1632 Separator SHALL be used as a separator allowing a part of received 1633 text to be grouped in presentation. The characters "CRLF" may be 1634 used by other implementations as replacement for Line Separator. The 1635 "CRLF" combination SHALL be erased by just one erasing action, just 1636 as the Line Separator. Presentation functions are allowed to group 1637 text for presentation in smaller groups than the line separators 1638 imply and present such groups with source indication together with 1639 text groups from other sources (see the following presentation 1640 examples). Erasure has no specific limit by any delimiter in the 1641 text stream. 1643 3.1. 
Presentation by multi-party aware endpoints 1645 A multi-party aware receiving party, presenting real-time text MUST 1646 separate text from different sources and present them in separate 1647 presentation fields. The receiving party MAY separate presentation 1648 of parts of text from a source in readable groups based on other 1649 criteria than line separator and merge these groups in the 1650 presentation area when it benefits the user to most easily find and 1651 read text from the different participants. The criteria MAY e.g. be 1652 a received comma, full stop, or other phrase delimiters, or a long 1653 pause. 1655 When text is received from multiple original sources simultaneously, 1656 the presentation SHOULD provide a view where text is added in 1657 multiple places simultaneously. 1659 If the presentation presents text from different sources in one 1660 common area, the presenting endpoint SHOULD insert text from the 1661 local user ended at suitable points merged with received text to 1662 indicate the relative timing for when the text groups were completed. 1663 In this presentation mode, the receiving endpoint SHALL present the 1664 source of the different groups of text. 1666 A view of a three-party RTT call in chat style is shown in this 1667 example . 1669 _________________________________________________ 1670 | |^| 1671 |[Alice] Hi, Alice here. |-| 1672 | | | 1673 |[Bob] Bob as well. | | 1674 | | | 1675 |[Eve] Hi, this is Eve, calling from Paris. | | 1676 | I thought you should be here. | | 1677 | | | 1678 |[Alice] I am coming on Thursday, my | | 1679 | performance is not until Friday morning.| | 1680 | | | 1681 |[Bob] And I on Wednesday evening. | | 1682 | | | 1683 |[Alice] Can we meet on Thursday evening? | | 1684 | | | 1685 |[Eve] Yes, definitely. How about 7pm. | | 1686 | at the entrance of the restaurant | | 1687 | Le Lion Blanc? 
| | 1688 |[Eve] we can have dinner and then take a walk |-| 1689 |______________________________________________|v| 1690 | But I need to be back to |^| 1691 | the hotel by 11 because I need |-| 1692 | | | 1693 | I wou |-| 1694 |______________________________________________|v| 1695 | of course, I underst | 1696 |________________________________________________| 1698 Figure 3: Example of a three-party RTT call presented in chat style 1699 seen at participant 'Alice's endpoint. 1701 Other presentation styles than the chat style may be arranged. 1703 This figure shows how a coordinated column view MAY be presented. 1705 _____________________________________________________________________ 1706 | Bob | Eve | Alice | 1707 |____________________|______________________|_______________________| 1708 | | |I will arrive by TGV. | 1709 |My flight is to Orly| |Convenient to the main | 1710 | |Hi all, can we plan |station. | 1711 | |for the seminar? | | 1712 |Eve, will you do | | | 1713 |your presentation on| | | 1714 |Friday? |Yes, Friday at 10. | | 1715 |Fine, wo | |We need to meet befo | 1716 |___________________________________________________________________| 1718 Figure 4: An example of a coordinated column-view of a three-party 1719 session with entries ordered vertically in approximate time-order. 1721 3.2. Multi-party mixing for multi-party unaware endpoints 1723 When the mixer has indicated multi-party capability by the "rtt-mix- 1724 rtp-mixer" sdp attribute or the "text/rex" format in an SDP 1725 negotiation, but the multi-party capability negotiation fails with an 1726 endpoint, then the agreed "text/red" or "text/t140" format SHALL be 1727 used and the mixer SHOULD compose a best-effort presentation of 1728 multi-party real-time text in one stream intended to be presented by 1729 an endpoint with no multi-party awareness. 
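The fallback decision described in the paragraph above can be sketched in Python as follows. The function name select_mixing_mode, the returned mode strings, and the order of preference when an answer carries more than one indication are assumptions made for illustration; only the SDP attribute and format names are taken from this document.

```python
def select_mixing_mode(answer_lines):
    """Pick a mixing mode from the media-level lines of an SDP answer.

    Returns "rex", "rtp-mixer", "unaware" or None (hypothetical labels).
    """
    encodings = set()
    has_mixer_attr = False
    for line in answer_lines:
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            # Syntax: a=rtpmap:<payload type> <encoding name>/<clock rate>
            parts = line.split(None, 1)
            if len(parts) == 2:
                encodings.add(parts[1].split("/", 1)[0].lower())
        elif line == "a=rtt-mix-rtp-mixer":
            has_mixer_attr = True
    if "rex" in encodings:
        return "rex"            # multi-party aware, "text/rex" packets
    if has_mixer_attr:
        return "rtp-mixer"      # multi-party aware, one source per packet
    if "red" in encodings or "t140" in encodings:
        return "unaware"        # mix for a multi-party unaware endpoint
    return None                 # no usable real-time text format
```

For the answer from a multi-party unaware device shown in Section 2.2.10.3, the sketch selects the "unaware" mode, in which the mixer composes a single-stream presentation.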
1731 This presentation format has functional limitations and SHOULD be 1732 used only to enable participation in multi-party calls by legacy 1733 deployed endpoints implementing only RFC 4103 without any multi-party 1734 extensions specified in this document. 1736 The principles and procedures below do not specify any new protocol 1737 elements. They are instead composed from the information in ITU-T 1738 T.140 [T140] and an ambition to provide a best effort presentation on 1739 an endpoint which has functions only for two-party calls. 1741 The mixer mixing for multi-party unaware endpoints SHALL compose a 1742 simulated limited multi-party RTT view suitable for presentation in 1743 one presentation area. The mixer SHALL group text in suitable groups 1744 and prepare for presentation of them by inserting a new line between 1745 them if the transmitted text did not already end with a new line. A 1746 presentable label SHOULD be composed and sent for the source 1747 initially in the session and after each source switch. With this 1748 procedure the time for source switching is depending on the actions 1749 of the users. In order to expedite source switch, a user can for 1750 example end its turn with a new line. 1752 3.2.1. Actions by the mixer at reception from the call participants 1754 When text is received by the mixer from the different participants, 1755 the mixer SHALL recover text from redundancy if any packets are lost. 1756 The mark for lost text [T140ad1] SHOULD be inserted in the stream if 1757 unrecoverable loss appears. Any Unicode "BOM" characters, possibly 1758 used for keep-alive shall be deleted. The time of creation of text 1759 (retrieved from the RTP timestamp) SHALL be stored together with the 1760 received text from each source in queues for transmission to the 1761 recipients. 1763 3.2.2. 
Actions by the mixer for transmission to the recipients 1765 The following procedure SHOULD be applied for each recipient of 1766 multi-party text from the mixer. 1768 The text for transmission SHOULD be formatted by the mixer for each 1769 receiving user for presentation in one single presentation area. 1770 Text received from a participant SHOULD NOT be included in 1771 transmission to that participant. When there is text available for 1772 transmission from the mixer to a receiving party from more than one 1773 participant, the mixer SHOULD switch between transmission of text 1774 from the different sources at suitable points in the transmitted 1775 stream. 1777 When switching source, the mixer SHOULD insert a line separator if 1778 the already transmitted text did not end with a new line (line 1779 separator or CRLF). A label SHOULD be composed from information in 1780 the CNAME and NAME fields in RTCP reports from the participant 1781 whose text is to be transmitted, or from other session information for that 1782 user. The label SHOULD be delimited by suitable characters (e.g. '[ 1783 ]') and transmitted. The CSRC SHOULD indicate the selected source. 1784 Then text from that selected participant SHOULD be transmitted until 1785 a new suitable point for switching source is reached. 1787 Seeking a suitable point for switching source SHOULD be done when 1788 text waiting for transmission from any other party is older than 1789 the last transmitted text. Suitable points for switching are: 1791 * A completed phrase ended by comma 1793 * A completed sentence 1795 * A new line (line separator or CRLF) 1797 * A long pause (e.g. > 10 seconds) in received text from the 1798 currently transmitted source 1800 * If text from one participant has been transmitted with text from 1801 other sources waiting for transmission for a long time (e.g.
> 1 1802 minute) and none of the other suitable points for switching has 1803 occurred, a source switch MAY be forced by the mixer at next word 1804 delimiter, and also if even a word delimiter does not occur within 1805 a time (e.g. 15 seconds) after the scan for word delimiter 1806 started. 1808 When switching source, the source which has the oldest text in queue 1809 SHOULD be selected to be transmitted. A character display count 1810 SHOULD be maintained for the currently transmitted source, starting 1811 at zero after the label is transmitted for the currently transmitted 1812 source. 1814 The status SHOULD be maintained for the latest control code for 1815 Select Graphic Rendition (SGR) from each source. If there is an SGR 1816 code stored as the status for the current source before the source 1817 switch is done, a reset of SGR shall be sent by the sequence SGR 0 1818 [009B 0000 006D] after the new line and before the new label during a 1819 source switch. See SGR below for an explanation. This transmission 1820 does not influence the display count. 1822 If there is an SGR code stored for the new source after the source 1823 switch, that SGR code SHOULD be transmitted to the recipient before 1824 the label. This transmission does not influence the display count. 1826 3.2.3. Actions on transmission of text 1828 Text from a source sent to the recipient SHOULD increase the display 1829 count by one per transmitted character. 1831 3.2.4. Actions on transmission of control codes 1833 The following control codes specified by T.140 require specific 1834 actions. They SHOULD cause specific considerations in the mixer. 1835 Note that the codes presented here are expressed in UCS-16, while 1836 transmission is made in UTF-8 transform of these codes. 1838 BEL 0007 Bell Alert in session, provides for alerting during an 1839 active session. The display count SHOULD not be altered. 1841 NEW LINE 2028 Line separator. Check and perform a source switch if 1842 appropriate. 
Increase display count by 1. 1844 CR LF 000D 000A A supported, but not preferred way of requesting a 1845 new line. Check and perform a source switch if appropriate. 1846 Increase display count by 1. 1848 INT ESC 0061 Interrupt (used to initiate mode negotiation 1849 procedure). The display count SHOULD NOT be altered. 1851 SGR 009B Ps 006D Select graphic rendition. Ps is rendition 1852 parameters specified in ISO 6429. The display count SHOULD NOT be 1853 altered. The SGR code SHOULD be stored for the current source. 1855 SOS 0098 Start of string, used as a general protocol element 1856 introducer, followed by a string of at most 256 bytes and the ST. 1857 The display count SHOULD NOT be altered. 1859 ST 009C String terminator, end of SOS string. The display count 1860 SHOULD NOT be altered. 1862 ESC 001B Escape - used in control strings. The display count SHOULD 1863 NOT be altered for the complete escape code. 1865 Byte order mark "BOM" (U+FEFF) "Zero width, no break space", used 1866 for synchronization and keep-alive. SHOULD be deleted from 1867 incoming streams. Shall be sent first after session establishment 1868 to the recipient. The display count shall not be altered. 1870 Missing text mark (U+FFFD) "Replacement character", represented as a 1871 question mark in a rhombus, or if that is not feasible, replaced 1872 by an apostrophe ', marks the place in the stream of possible text loss. 1873 SHOULD be inserted by the reception procedure in case of 1874 unrecoverable loss of packets. The display count SHOULD be 1875 increased by one when sent, as for any other character. 1877 SGR If a control code for selecting graphic rendition (SGR), other 1878 than reset of the graphic rendition (SGR 0), is sent to a 1879 recipient, that control code shall also be stored as status for 1880 the source in the storage for SGR status. If a reset graphic 1881 rendition (SGR 0) originating from a source is sent, then the SGR 1882 status storage for that source shall be cleared.
The display 1883 count shall not be increased. 1885 BS (U+0008) Back Space, intended to erase the last entered character 1886 by a source. Erasure by backspace cannot always be performed as 1887 the erasing party intended. If an erasing action erases all text 1888 up to the end of the leading label after a source switch, then the 1889 mixer must not transmit more backspaces. Instead it is 1890 RECOMMENDED that a letter "X" is inserted in the text stream for 1891 each backspace as an indication of the intent to erase more. A 1892 new line is usually coded by a Line Separator, but the character 1893 combination "CRLF" MAY be used instead. Erasure of a new line is 1894 in both cases done by just one erasing action (Backspace). If the 1895 display count has a positive value it is decreased by one when the 1896 BS is sent. If the display count is at zero, it is not altered. 1898 3.2.5. Packet transmission 1900 A mixer transmitting to a multi-party unaware terminal SHOULD send 1901 primary data only from one source per packet. The SSRC SHOULD be the 1902 SSRC of the mixer. The CSRC list SHOULD contain one member and be 1903 the SSRC of the source of the primary data. 1905 3.2.6. Functional limitations 1907 When a multi-party unaware endpoint presents a conversation in one 1908 display area in a chat style, it inserts source indications for 1909 remote text and local user text as they are merged in completed text 1910 groups. When an endpoint using this layout receives and presents 1911 text mixed for multi-party unaware endpoints, there will be two 1912 levels of source indicators for the received text; one generated by 1913 the mixer and inserted in a label after each source switch, and 1914 another generated by the receiving endpoint and inserted after each 1915 switch between local and remote source in the presentation area. 1916 This will waste display space and look inconsistent to the reader. 1918 New text can be presented only from one source at a time. 
Switching of the 1919 source to be presented takes place at suitable points in the text, 1920 such as end of phrase, end of sentence, line separator and 1921 inactivity. Therefore the time to switch to present waiting text 1922 from other sources may become long and will vary and depend on the 1923 actions of the currently presented source. 1925 Erasure can only be done up to the latest source switch. If a user 1926 tries to erase more text, the erasing actions will be presented as 1927 letter X after the label. 1929 Text loss because of network errors may hit the label between entries 1930 from different parties, causing a risk of misunderstanding which 1931 source a piece of text comes from. 1933 These facts make it strongly RECOMMENDED to implement multi-party 1934 awareness in RTT endpoints. The use of the mixing method for multi- 1935 party-unaware endpoints should be left for use with endpoints which 1936 are impossible to upgrade to become multi-party aware. 1938 3.2.7. Example views of presentation on multi-party unaware endpoints 1940 The following pictures are examples of the view on a participant's 1941 display for the multi-party-unaware case. 1943 _________________________________________________ 1944 | Conference | Alice | 1945 |________________________|_________________________| 1946 | |I will arrive by TGV. | 1947 |[Bob]:My flight is to |Convenient to the main | 1948 |Orly. |station. | 1949 |[Eve]:Hi all, can we | | 1950 |plan for the seminar. | | 1951 | | | 1952 |[Bob]:Eve, will you do | | 1953 |your presentation on | | 1954 |Friday? | | 1955 |[Eve]:Yes, Friday at 10.| | 1956 |[Bob]: Fine, wo |We need to meet befo | 1957 |________________________|_________________________| 1959 Figure 5: Alice, who has a conference-unaware client, is receiving the 1960 multi-party real-time text in a single stream. This figure shows how 1961 a coordinated column view MAY be presented on Alice's device.
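The display count and backspace rules of Sections 3.2.3 and 3.2.4, which produce the letter X seen after a label when a user erases past a source switch, can be sketched in Python as follows. The function name forward_chunk is hypothetical. The sketch covers only BOM, BEL, BS, Line Separator, CRLF and ordinary characters; SGR, SOS/ST and escape sequences are left out. Leaving the display count at zero when an "X" is inserted is an assumption about the intended behavior, not something this document states explicitly.

```python
BS, BEL, BOM, LINE_SEPARATOR = "\u0008", "\u0007", "\uFEFF", "\u2028"

def forward_chunk(text, display_count=0):
    """Return (text_to_send, new_display_count) for text from the current source."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == BOM:
            pass                          # deleted from incoming streams
        elif ch == BEL:
            out.append(ch)                # display count not altered
        elif ch == BS:
            if display_count > 0:
                out.append(BS)            # erase one displayed character
                display_count -= 1
            else:
                out.append("X")           # never erase into the label
                # Assumption: the count stays at zero here.
        elif ch == "\r" and text[i + 1:i + 2] == "\n":
            out.append("\r\n")            # CRLF counts as one new line
            display_count += 1
            i += 1                        # skip the LF of the pair
        else:
            out.append(ch)                # text or Line Separator
            display_count += 1
        i += 1
    return "".join(out), display_count
```

A source that types "ab" and then sends three backspaces directly after its label therefore results in two transmitted backspaces followed by one "X", with the display count back at zero.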
1963 _________________________________________________ 1964 | |^| 1965 |[Alice] Hi, Alice here. |-| 1966 | | | 1967 |[mix][Bob] Bob as well. | | 1968 | | | 1969 |[Eve] Hi, this is Eve, calling from Paris | | 1970 | I thought you should be here. | | 1971 | | | 1972 |[Alice] I am coming on Thursday, my | | 1973 | performance is not until Friday morning.| | 1974 | | | 1975 |[mix][Bob] And I on Wednesday evening. | | 1976 | | | 1977 |[Eve] we can have dinner and then walk | | 1978 | | | 1979 |[Eve] But I need to be back to | | 1980 | the hotel by 11 because I need | | 1981 | |-| 1982 |______________________________________________|v| 1983 | of course, I underst | 1984 |________________________________________________| 1986 Figure 6: An example of a view of the multi-party unaware 1987 presentation in chat style. Alice is the local user. 1989 4. Gateway Considerations 1991 4.1. Gateway considerations with Textphones (e.g. TTYs). 1993 Multi-party RTT sessions may involve gateways of different kinds. 1994 Gateways involved in setting up sessions SHALL correctly reflect the 1995 multi-party capability or unawareness of the combination of the 1996 gateway and the remote endpoint beyond the gateway. 1998 One case that may occur is a gateway to PSTN for communication with 1999 textphones (e.g. TTYs). Textphones are limited devices with no 2000 multi-party awareness, and it SHOULD therefore be suitable for the 2001 gateway to not indicate multi-party awareness for that case. Another 2002 solution is that the gateway indicates multi-party capability towards 2003 the mixer, and includes the multi-party mixer function for multi- 2004 party unaware endpoints itself. This solution makes it possible to 2005 make adaptations for the functional limitations of the textphone 2006 (TTY). 2008 More information on gateways to textphones (TTYs) is found in RFC 2009 5194[RFC5194] 2011 4.2. Gateway considerations with WebRTC. 

2013     Gateway operation to real-time text in WebRTC may also be required.
2014     In WebRTC, RTT is specified in draft-ietf-mmusic-t140-usage-data-
2015     channel [I-D.ietf-mmusic-t140-usage-data-channel].

2017     A multi-party bridge may have functionality for communicating by RTT
2018     both in RTP streams and in WebRTC t140 data channels. Other
2019     configurations may consist of a multi-party bridge with either
2020     technology for RTT transport and a separate gateway for conversion of
2021     the text communication streams between RTP and t140 data channel.

2023     In WebRTC, it is assumed that for a multi-party session, one t140
2024     data channel is established for each source from a gateway or bridge
2025     to each participant. Each participant also has a two-way data
2026     channel with the gateway or bridge.

2028     The t140 channel used both ways is for text from the WebRTC user and
2029     from the bridge or gateway itself to the WebRTC user. The label
2030     parameter of this t140 channel is used as the NAME field in RTCP to
2031     participants on the RTP side. The other t140 channels are only for
2032     text from other participants to the WebRTC user.

2034     When a new participant has entered the session with RTP transport of
2035     RTT, a new t140 channel SHOULD be established to WebRTC users with
2036     the label parameter composed from the NAME field in RTCP on the RTP
2037     side.

2039     When a new participant has entered the multi-party session with RTT
2040     transport in a WebRTC t140 data channel, the new participant SHOULD
2041     be announced by a notification to RTP users. The label parameter
2042     from the WebRTC side, or other available session information, SHOULD
2043     be used as the NAME RTCP field on the RTP side.

2045  5.  Updates to RFC 4102 and RFC 4103

2047     This document updates RFC 4102 [RFC4102] and RFC 4103 [RFC4103] by
2048     introducing an SDP media attribute "rtt-mix-rtp-mixer" for
2049     negotiation of multi-party mixing capability with the [RFC4103]
2050     format, an extended packet format "text/rex" for the enhanced
2051     performance multi-party mixing case, and stricter rules for the use
2052     of redundancy and for population of the CSRC list in the packets.
2053     Implications for the CSRC list use from RFC 2198 [RFC2198] are not in
2054     effect for the "text/rex" format.

2056     The update is in line with the statement in RFC 4103 section 4,
2057     saying that "Forward Error Correction mechanisms, ..., or any other
2058     mechanism with the purpose of increasing the reliability of text
2059     transmission, MAY be used as an alternative or complement to
2060     redundancy."

2062  6.  Congestion considerations

2064     The congestion considerations and recommended actions from RFC 4103
2065     [RFC4103] are also valid in multi-party situations.

2067     The first action in case of congestion SHOULD be to temporarily
2068     increase the transmission interval up to two seconds.

2070  7.  Acknowledgements

2072     James Hamlin for format input.

2074  8.  IANA Considerations

2076  8.1.  Registration of the "rtt-mix-rtp-mixer" SDP media attribute

2078     [RFC EDITOR NOTE: Please replace all instances of RFCXXXX with the
2079     RFC number of this document.]

2081     IANA is asked to register the new SDP attribute "rtt-mix-rtp-mixer".

2083     Contact name: IESG

2085     Contact email: iesg@ietf.org

2087     Attribute name: rtt-mix-rtp-mixer

2089     Attribute syntax: a=rtt-mix-rtp-mixer

2091     Attribute semantics: See RFCXXXX Section 2.1.1

2093     Attribute value: none

2095     Usage level: media

2097     Purpose: Indicate support by mixer or endpoint of multi-party mixing
2098        for real-time text transmission, using a common RTP-stream for
2099        transmission of text from a number of sources mixed with one
2100        source at a time and the source indicated in a single CSRC-list
2101        member.

2103     Charset Dependent: no

2104     O/A procedure: See RFCXXXX Section 2.1.19

2106     Mux Category: normal

2108     Reference: RFCXXXX

2110  8.2.  Registration of "text/rex" media subtype

2112     The IANA is requested to register the media type "text/rex" as
2113     specified in Section 2.2.9. The media type is also requested to be
2114     added to the IANA registry for "RTP Payload Format Media Types".

2117  9.  Security Considerations

2119     The RTP-mixer model requires the mixer to be allowed to decrypt, pack
2120     and encrypt secured text from the conference participants. Therefore
2121     the mixer needs to be trusted. This is similar to the situation for
2122     central mixers of audio and video.

2124     The requirement to transfer information about the user in RTCP
2125     reports in SDES, CNAME and NAME fields, and in conference
2126     notifications, for creation of labels may raise privacy concerns as
2127     already stated in RFC 3550 [RFC3550], and may be restricted for
2128     privacy reasons. The receiving user will then get a more symbolic
2129     label for the source.

2131  10.  Change history

2133  10.1.  Changes included in draft-ietf-avtcore-multi-party-rtt-mix-07

2135     Added a method based on the "text/red" format and single source per
2136     packet, negotiated by the "rtt-mix-rtp-mixer" SDP attribute.

2138     Added reasoning and recommendation about indication of loss.

2140     The highest number of sources in one packet is 15, not 16. Changed.

2142     Added information on the update to RFC 4103: RFC 4103 explicitly
2143     allows addition of an FEC method. The redundancy is a kind of forward
2144     error correction.

2146  10.2.  Changes included in draft-ietf-avtcore-multi-party-rtt-mix-06

2148     Improved definitions list format.

2150     The format of the media subtype parameters is made to match the
2151     requirements.

2153     The mapping of media subtype parameters to SDP is included.

2155     The CPS parameter belongs to the t140 subtype and does not need to be
2156     registered here.

2158  10.3.  Changes included in draft-ietf-avtcore-multi-party-rtt-mix-05

2160     Nomenclature and editorial improvements.

2162     "this document" used consistently to refer to this document.

2164  10.4.  Changes included in draft-ietf-avtcore-multi-party-rtt-mix-04

2166     'Redundancy header' renamed to 'data header'.

2168     More clarifications added.

2170     Language and figure number corrections.

2172  10.5.  Changes included in draft-ietf-avtcore-multi-party-rtt-mix-03

2174     Mention possible need to mute and raise hands as for other media.
2175     -done-

2177     Make sure that use in two-party calls is also possible and explained.
2178     - may need more wording -

2180     Clarify that RTT is often used together with other media. -done-

2182     Tell that text mixing is N-1. A user's own text is not received in
2183     the mix. -done-

2185     In 3. correct the interval to: A "text/rex" transmitter SHOULD send
2186     packets distributed in time as long as there is something (new or
2187     redundant T140blocks) to transmit. The maximum transmission interval
2188     SHOULD then be 300 ms. It is RECOMMENDED to send a packet to a
2189     receiver as soon as new text to that receiver is available, as long
2190     as the time after the latest sent packet to the same receiver is more
2191     than 150 ms, and also the maximum character rate to the receiver is
2192     not exceeded. The intention is to keep the latency low while keeping
2193     a good protection against text loss in bursty packet loss conditions.
2194     -done-

2196     In 1.3 say that the format is used both ways. -done-

2198     In 13.1 change presentation area to presentation field so that the
2199     reader does not think it shall be totally separated. -done-

2200     In Performance and intro, tell the performance in number of
2201     simultaneous sending users and introduced delay 16, 150 vs
2202     requirements 5 vs 500. -done-

2204     Clarify redundancy level per connection. -done-

2206     Timestamp also for the last data header. To make it possible for all
2207     text to have time offset as for transmission from the source. Make
2208     that header equal to the others. -done-

2210     Mixer always uses the CSRC list, even for its own BOM. -done-

2212     Combine all talk about transmission interval (300 ms vs when text has
2213     arrived) in section 3 in one paragraph or close to each other. -done-

2215     Document the goal of good performance with low delay for 5
2216     simultaneous typers in the introduction. -done-

2218     Describe better that only primary text shall be sent on to receivers.
2219     Redundancy and loss must be resolved by the mixer. -done-

2221  10.6.  Changes included in draft-ietf-avtcore-multi-party-rtt-mix-02

2223     SDP and better description and visibility of security by OSRTP RFC
2224     8643 needed.

2226     The description of gatewaying to WebRTC extended.

2228     The description of the data header in the packet is improved.

2230  10.7.  Changes to draft-ietf-avtcore-multi-party-rtt-mix-01

2232     2,5,6 More efficient format "text/rex" introduced and attribute
2233     a=rtt-mix deleted.

2235     3. Brief about use of OSRTP for security included. More needed.

2237     4. Brief motivation for the solution, and why an RTP translator is
2238     not used, added to intro.

2240     7. More limitations for the multi-party unaware mixing method
2241     inserted.

2243     8. Updates to RFC 4102 and 4103 more clearly expressed.

2245     9. Gateway to WebRTC started. More needed.

2247  10.8.  Changes from draft-hellstrom-avtcore-multi-party-rtt-source-03 to
2248         draft-ietf-avtcore-multi-party-rtt-mix-00

2250     Changed file name to draft-ietf-avtcore-multi-party-rtt-mix-00

2252     Replaced CDATA in IANA registration table with better coding.

2254     Converted to xml2rfc version 3.

2256  10.9.  Changes from draft-hellstrom-avtcore-multi-party-rtt-source-02 to
2257         -03

2259     Changed company and e-mail of the author.

2261     Changed title to "RTP-mixer formatting of multi-party Real-time text"
2262     to better match contents.

2264     Check and modification where needed of use of RFC 2119 words SHALL
2265     etc.

2267     More about the CC value in sections on transmitters and receivers so
2268     that 1-to-1 sessions do not use the mixer format.

2270     Enhanced section on presentation for multi-party-unaware endpoints.

2272     A paragraph recommending CPS=150 inserted in the performance section.

2274  10.10.  Changes from draft-hellstrom-avtcore-multi-party-rtt-source-01
2275          to -02

2277     In Abstract and 1. Introduction: Introduced wording about regulatory
2278     requirements.

2280     In section 5: The transmission interval is decreased to 100 ms when
2281     there is text from more than one source to transmit.

2283     In section 11 about SDP negotiation, a SHOULD-requirement is
2284     introduced that the mixer should make a mix for multi-party unaware
2285     endpoints if the negotiation is not successful, together with a
2286     reference to a later chapter about it.

2288     The presentation considerations chapter 14 is extended with more
2289     information about presentation on multi-party aware endpoints, and a
2290     new section on the multi-party unaware mixing with low functionality
2291     but which SHOULD be implemented in mixers. Presentation examples are
2292     added.

2294     A short chapter 15 on gateway considerations is introduced.

2296     Clarification about the text/t140 format included in chapter 10.

2298     This sentence added to the chapter 10 about use without redundancy.
2299     "The text/red format SHOULD be used unless some other protection
2300     against packet loss is utilized, for example a reliable network or
2301     transport."

2303     Note about deviation from RFC 2198 added in chapter 4.

2305     In chapter 9. "Use with SIP centralized conferencing framework" the
2306     following note is inserted: Note: The CSRC-list in an RTP packet only
2307     includes participants whose text is included in one or more text
2308     blocks. It is not the same as the list of participants in a
2309     conference. With audio and video media, the CSRC-list would often
2310     contain all participants who are not muted whereas text participants
2311     that don't type are completely silent and so don't show up in RTP
2312     packet CSRC-lists.

2314  10.11.  Changes from draft-hellstrom-avtcore-multi-party-rtt-source-00
2315          to -01

2317     Editorial cleanup.

2319     Changed capability indication from fmtp-parameter to SDP attribute
2320     "rtt-mix".

2322     Swapped order of redundancy elements in the example to match reality.

2324     Increased the SDP negotiation section.

2326  11.  References

2328  11.1.  Normative References

2330     [I-D.ietf-mmusic-t140-usage-data-channel]
2331                Holmberg, C. and G. Hellstrom, "T.140 Real-time Text
2332                Conversation over WebRTC Data Channels", Work in Progress,
2333                Internet-Draft, draft-ietf-mmusic-t140-usage-data-channel-
2334                14, 10 April 2020.

2337     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2338                Requirement Levels", BCP 14, RFC 2119,
2339                DOI 10.17487/RFC2119, March 1997.

2342     [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
2343                Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse-
2344                Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
2345                DOI 10.17487/RFC2198, September 1997.

2348     [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
2349                Jacobson, "RTP: A Transport Protocol for Real-Time
2350                Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
2351                July 2003.

2353     [RFC4102]  Jones, P., "Registration of the text/red MIME Sub-Type",
2354                RFC 4102, DOI 10.17487/RFC4102, June 2005.

2357     [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
2358                Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005.

2361     [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
2362                Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
2363                July 2006.

2365     [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
2366                Formats", RFC 4855, DOI 10.17487/RFC4855, February 2007.

2369     [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
2370                Security (DTLS) Extension to Establish Keys for the Secure
2371                Real-time Transport Protocol (SRTP)", RFC 5764,
2372                DOI 10.17487/RFC5764, May 2010.

2375     [RFC6263]  Marjou, X. and A. Sollaud, "Application Mechanism for
2376                Keeping Alive the NAT Mappings Associated with RTP / RTP
2377                Control Protocol (RTCP) Flows", RFC 6263,
2378                DOI 10.17487/RFC6263, June 2011.

2381     [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
2382                Specifications and Registration Procedures", BCP 13,
2383                RFC 6838, DOI 10.17487/RFC6838, January 2013.

2386     [RFC8643]  Johnston, A., Aboba, B., Hutton, A., Jesske, R., and T.
2387                Stach, "An Opportunistic Approach for Secure Real-time
2388                Transport Protocol (OSRTP)", RFC 8643,
2389                DOI 10.17487/RFC8643, August 2019.

2392     [T140]     ITU-T, "Recommendation ITU-T T.140 (02/1998), Protocol for
2393                multimedia application text conversation", February 1998.

2396     [T140ad1]  ITU-T, "Recommendation ITU-T.140 Addendum 1 - (02/2000),
2397                Protocol for multimedia application text conversation",
2398                February 2000.

2401  11.2.  Informative References

2403     [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
2404                Session Initiation Protocol (SIP)", RFC 4353,
2405                DOI 10.17487/RFC4353, February 2006.

2408     [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
2409                Session Initiation Protocol (SIP) Event Package for
2410                Conference State", RFC 4575, DOI 10.17487/RFC4575, August
2411                2006.

2413     [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
2414                (SIP) Call Control - Conferencing for User Agents",
2415                BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006.

2418     [RFC5194]  van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real-
2419                Time Text over IP Using the Session Initiation Protocol
2420                (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008.

2423  Author's Address

2425     Gunnar Hellstrom
2426     Gunnar Hellstrom Accessible Communication
2427     Esplanaden 30
2428     SE-13670 Vendelso
2429     Sweden

2431     Email: gunnar.hellstrom@ghaccess.se