Internet Engineering Task Force                             G. Hellstrom
Internet-Draft                                                   Omnitor
Intended status: BCP                                         A. van Wijk
Expires: September 15, 2011              Real-Time Text Taskforce (R3TF)
                                                          March 14, 2011

         Text media handling in RTP based real-time conferences
                   draft-hellstrom-text-conference-04

Abstract

   This memo specifies methods for text media handling in multi-party
   calls, where the text is carried by the RTP protocol. Real-time text
   is carried in a time-sampled mode according to RFC 4103. Centralized
   multi-party handling of real-time text is achieved through a media
   control unit coordinating multiple RTP text streams into one single
   stream RTP session, identifying each stream with its own CSRC.
   Identification of the streams is provided through RTCP messages.
   This mechanism enables the receiving application to present the
   received real-time text medium in different ways according to user
   preferences. Some presentation-related features are also described,
   explaining suitable variations of transmission and presentation of
   text. Call control features are described for the SIP environment,
   while the transport mechanisms should be suitable for any IP-based
   call control environment using RTP transport. Two alternative
   methods using a single RTP stream and source identification inline
   in the text stream are also described.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 15, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Centralized conference model
     2.1.  Coordination of text RTP streams
     2.2.  Session control of multi-party sessions
   3.  Identification of the source of text
   4.  Presentation of multi-party text
     4.1.  Associating identities with text streams
   5.  Transmission of text from each user
   6.  Presentation level source indicator
   7.  Mixing for conference-unaware user agents
   8.  IANA Considerations
   9.  Security Considerations
   10. Congestion considerations
   11. Normative References
   Authors' Addresses

1. Introduction

   Real-time text is a medium in real-time conversational sessions.
   Text entered by participants in a session is transmitted in a
   time-sampled fashion, so that no specific user action is needed to
   cause transmission. This gives a direct flow of text that is
   suitable in a real-time conversational setting. The real-time text
   medium can be combined with other media in multimedia sessions.

   A number of multimedia sessions can be combined in a multi-party
   session. This memo specifies how the real-time text streams are
   handled in such multi-party sessions.

   The description is mainly focused on the transport level, but also
   describes a few presentation level features.

   Transport of real-time text is specified in RFC 4103 [RFC4103], "RTP
   Payload for Text Conversation". It makes use of the Real-time
   Transport Protocol (RTP), RFC 3550 [RFC3550], for transport, and is
   usually used in the Session Initiation Protocol (SIP), RFC 3261
   [RFC3261], environment, although it is also used in other call
   control environments. Call control aspects in this specification are
   explained with examples from SIP. The specifications about how to
   handle multi-party text transport, identification and presentation
   are valid also for other call control environments where RTP and
   RTCP are used.

   A very brief overview of functions for both real-time and messaging
   text handling in multi-party sessions is given in RFC 4597,
   "Conferencing Scenarios". This specification builds on that
   description and indicates what existing protocol mechanisms should be
   used to implement multi-party handling of text in real-time sessions.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Centralized conference model

   In the centralized conference model, one function co-ordinates the
   sessions with participants in the multi-party session. This function
   also controls media mixer functions for the media appearing in the
   session. The central function is common for control of all media,
   while the media mixers may work differently for each medium.

   The central function is called the Focus UA and may be co-located in
   an advanced terminal including multi-party control functions, or it
   may be located in a separate location. Many variants exist for
   setting up sessions including the multipoint control centre. It is
   not within the scope of this description to describe these, but
   rather the media-specific handling in the mixer required to handle
   multi-party calls.

   The main principle for handling real-time text media in a centralized
   conference is that one RTP session for real-time text is established
   between the multipoint media control centre and each participant who
   is going to have real-time text exchange with the others.

2.1. Coordination of text RTP streams

   The preferred way of coordinating text RTP streams is that within
   each RTP session, text from all participants is transmitted from the
   media mixer in the same RTP stream, thus all using the same
   destination address/port combination and the same RTP SSRC, as
   described in Sections 7.1 and 7.3 of RFC 3550 [RFC3550] about the
   Mixer function. The source of the primary text in each RTP packet is
   identified by the CSRC parameter, containing the SSRC of the initial
   source of text.

   The mixer MUST NOT transmit redundant levels of text from one source
   together with primary text from another source.
   Thus, when there is text available for primary or redundant
   transmission from more than one source, the mixer MUST buffer text
   from other sources until all the redundant transmissions of a packet
   from one selected source have been transmitted. Without this
   restriction, there would be no way to decide with what source to
   associate text recovered from the redundant information in case of
   packet loss.

   The identification of the source is made through the RTCP SDES CNAME
   and NAME items as described in RTP [RFC3550].

   This method enables the receiver to freely select display
   characteristics of the text conversation.

2.2. Session control of multi-party sessions

   General session control aspects for multi-party sessions are
   described in RFC 4575 [RFC4575], "A Session Initiation Protocol (SIP)
   Event Package for Conference State", and RFC 4579 [RFC4579], "Session
   Initiation Protocol (SIP) Call Control - Conferencing for User
   Agents". The nomenclature of these specifications is used here.

   The procedures for the mixer-based model shall only be applied if a
   capability exchange for mixer-based real-time text transmission has
   been completed. Capability for the mixer-based model is indicated by
   both the focus and the user agent by the media tag
   rtt-mixer=rtp-mixer.

3. Identification of the source of text

   The Focus UA co-ordinates the media flow. Real-time text media from
   different sources are combined in one text media session by the Focus
   UA. The main principle is that the Focus UA SHOULD act as an RTP
   Mixer as described in Section 7.1 of RTP [RFC3550].

   The RTP text stream from each participant who transmits text is
   allocated one unique CSRC. The CSRC is used by the receiver to
   identify text packets originating from one source. Each RTP packet
   MUST contain text from only one source.
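
   As an illustration of the per-source identification described above,
   the following Python sketch shows how a receiver might group incoming
   text by CSRC. It is a minimal example under stated assumptions, not a
   normative procedure: the function names, the payload type value 98
   and the SSRC/CSRC values used with it are hypothetical, and the
   payload is assumed to be plain UTF-8 T.140 text without the RFC 4103
   redundancy format, header extensions, padding or loss recovery.

```python
import struct
from collections import defaultdict

RTP_VERSION = 2
FIXED_HEADER_LEN = 12  # bytes in the fixed RTP header (RFC 3550)


def build_rtp_packet(seq, timestamp, mixer_ssrc, csrc, text, payload_type=98):
    """Build a minimal mixer packet: one CSRC, UTF-8 text, no redundancy.

    payload_type=98 is an arbitrary dynamic value chosen for illustration.
    """
    first_byte = (RTP_VERSION << 6) | 1  # V=2, P=0, X=0, CC=1
    header = struct.pack("!BBHII", first_byte, payload_type & 0x7F,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, mixer_ssrc)
    return header + struct.pack("!I", csrc) + text.encode("utf-8")


def parse_rtp_packet(packet):
    """Return (csrc_list, text) for a packet in the simplified form above."""
    if len(packet) < FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    first_byte = packet[0]
    if first_byte >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    if first_byte & 0x30:
        raise ValueError("padding and extensions are not handled here")
    csrc_count = first_byte & 0x0F
    end = FIXED_HEADER_LEN + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack("!%dI" % csrc_count,
                               packet[FIXED_HEADER_LEN:end]))
    return csrcs, packet[end:].decode("utf-8")


def demultiplex(packets):
    """Group received text by the single CSRC carried in each packet."""
    text_by_source = defaultdict(str)
    for packet in packets:
        csrcs, text = parse_rtp_packet(packet)
        if len(csrcs) != 1:
            continue  # each packet is expected to carry one source only
        text_by_source[csrcs[0]] += text
    return dict(text_by_source)
```

   Given three packets from a mixer where the first and third carry one
   CSRC and the second carries another, demultiplex returns one text
   string per CSRC, which the application is then free to present in
   separate columns, labelled lines or any other layout.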

   The redundancy mechanism for increased robustness used by the RFC
   4103 transport makes use of the RTP sequence number for detection of
   loss. The RTP Mixer mechanism maintains a separate CSRC for each
   source RTP stream in the combined RTP session. Therefore the RTP
   Mixer mechanism can be used for conveying text from multiple sources
   to one destination, while retaining the ability to detect and recover
   from loss and to identify text from the different sources.

   As soon as a new member is added to the RTP session, its
   characteristics shall be transmitted in RTCP SDES CNAME and NAME
   reports according to Section 6.5 of RFC 3550.

   The RTCP SDES report SHOULD contain identification of the source
   represented by the CSRC identifier. This identification MUST contain
   the CNAME field and MAY contain the NAME field and other defined
   fields of the SDES report.

   A focus UA SHOULD primarily convey SDES information received from the
   sources of the session members. When such information is not
   available, the focus UA SHOULD compose CSRC, CNAME and NAME
   information from available information from the SIP session with the
   participant.

4. Presentation of multi-party text

   All session participants MUST observe the CSRC field of incoming text
   RTP packets, and make note of what source they came from in order to
   be able to present text in a way that makes it easy to read text from
   each participant in a session, and get information about the source
   of the text.

4.1. Associating identities with text streams

   A source identity SHOULD be composed from available information
   sources and displayed together with the text as indicated in ITU-T
   T.140 Appendix [T.140].

   The source should primarily be the NAME field from incoming SDES
   packets.
   If this information is not available, and the session is a two-party
   session, then the T.140 source identity SHOULD be composed from the
   SIP session participant information. For multi-party sessions the
   source identity may be composed from local information if sufficient
   information is not available in the session.

   Applications may abbreviate the presented source identity to a
   suitable form for the available display.

5. Transmission of text from each user

   UAs participating in sessions with real-time text SHOULD send SDES
   packets in RTCP giving values to appropriate identification fields.

   The CNAME field SHALL be included in SDES packets.

   The NAME field should be given a value that is suitable as an
   identifier of text from the user of the UA.

6. Presentation level source indicator

   In certain application environments, it may be known to be unsuitable
   to use the CSRC identification on the RTP level as the basis for
   identifying the source of text. In such cases, an inline coding of
   the source of text SHOULD be applied in the data stream itself, and
   an RTP mixer function, normally without CSRC identification, used for
   coordinating the sources of text into one RTP stream.

   The support of this mixer type is indicated by the SIP media tag
   rtt-mixer=t140-mixer, both by the focus and the user agent.

   Information uniquely identifying each user in the multi-party session
   SHALL then be placed as the parameter value "cn" in the T.140
   application protocol function with the function code "c". The
   identifier shall thus be formatted like this: SOS c cn field contents
   ST, where SOS and ST are coded as specified in ITU-T T.140 [T.140].
   The cn parameter shall be kept short so that it can be repeated in
   the transmission without concerns for network load.
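
   The formatting of the "c" function described above can be sketched in
   Python as follows. This is a non-normative illustration under
   assumptions: SOS and ST are taken to be the code points U+0098 and
   U+009C, the cn contents are assumed to follow the function code "c"
   directly with no separator, and the names source_indicator and
   split_by_source are hypothetical helpers, not part of T.140.

```python
SOS = "\u0098"  # Start of String (assumed code point)
ST = "\u009c"   # String Terminator (assumed code point)


def source_indicator(cn):
    """Format the inline source indicator: SOS, "c", cn contents, ST."""
    if SOS in cn or ST in cn:
        raise ValueError("cn must not contain SOS or ST")
    return SOS + "c" + cn + ST


def split_by_source(stream):
    """Split a decoded text stream into (cn, text) segments.

    Text that arrives before any "c" function is reported with cn None.
    Functions with other codes are skipped in this sketch.
    """
    segments = []
    current = None
    pos = 0
    while pos < len(stream):
        if stream[pos] == SOS:
            end = stream.find(ST, pos)
            if end == -1:
                break  # incomplete function; a real receiver would wait
            body = stream[pos + 1:end]
            if body.startswith("c"):
                current = body[1:]
            pos = end + 1
        else:
            nxt = stream.find(SOS, pos)
            if nxt == -1:
                nxt = len(stream)
            text = stream[pos:nxt]
            if segments and segments[-1][0] == current:
                # A repeated indicator for the same source continues
                # the previous segment.
                segments[-1] = (current, segments[-1][1] + text)
            else:
                segments.append((current, text))
            pos = nxt
    return segments
```

   A receiver could feed the decoded text stream to split_by_source and
   present each segment under the identity associated with its cn
   value. Keeping cn to a few characters keeps the overhead of each
   indicator small, in line with the requirement above.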

   The information otherwise conveyed in the NAME field of an SDES
   packet SHOULD then be placed as the parameter value in the T.140
   application protocol function with the function code "n".

   A T.140 application protocol function with the function code "c" MUST
   be included at the beginning of the text when the source of the text
   changes. A T.140 application protocol function with the function
   code "c" MAY be repeated in the text from the same source. A T.140
   application protocol function with the function code "n" MAY be
   included in the text to further provide identification of the
   transmitting party. This information SHOULD also be provided in the
   SDES NAME field. A receiving UA SHOULD separate text from the
   different sources and identify and display them accordingly.

   In this case, the mixer can use the redundancy transmission function
   of RFC 4103 without restrictions.

7. Mixing for conference-unaware user agents

   Multi-party real-time text contents can be transmitted to
   conference-unaware user agents if source labeling and formatting of
   the text is performed by a mixer. This method has the limitations
   that the format of source identification is purely controlled by the
   mixer, and that only one source at a time is allowed to present in
   real-time. Text from other sources needs to be stored temporarily,
   waiting for an appropriate moment to switch the source of
   transmitted text.

   This method is used when no exchange of the rtt-mixer media tag has
   occurred in the session setup. Support for the method can however be
   expressed by the focus by the SIP media tag rtt-mixer=text-mixer.

8. IANA Considerations

   This document introduces the SIP media tag rtt-mixer, with a
   comma-separated parameter list containing the following possible
   values:

      rtp-mixer

      t140-mixer

      text-mixer

   rtp-mixer indicates capability for using the RTP-mixer based
   presentation of multi-party text. t140-mixer indicates capability for
   using the T.140 control code source indicators in a mixer. text-mixer
   indicates capability for using text-level control over formatting and
   presentation of multi-party text.

9. Security Considerations

   The security considerations valid for RFC 4103 and RFC 3550 are valid
   also for the multi-party sessions with text.

10. Congestion considerations

   The congestion considerations described in RFC 4103 are valid also
   for multi-party use of the real-time text RTP transport. A risk of
   congestion may appear if a number of conference participants are
   actively transmitting text simultaneously, because this multi-party
   transmission method does not allow multiple sources of text to
   contribute to the same packet.

   In situations of risk for congestion, the Focus UA MAY combine
   packets from the same source to increase the transmission interval
   per source up to one second. Local conference policy in the Focus UA
   may be used to decide which streams shall be selected for such
   transmission frequency reduction.

11. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, June 2005.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
              (SIP) Call Control - Conferencing for User Agents",
              BCP 119, RFC 4579, August 2006.

   [T.140]    ITU-T, "Protocol for multimedia application text
              conversation", 1998.

Authors' Addresses

   Gunnar Hellstrom
   Omnitor
   Box 92054
   Stockholm SE-120 06
   SE

   Phone: +46 858900056
   Fax:   +46 858900051
   Email: gunnar.hellstrom@omnitor.se
   URI:   www.omnitor.se

   Arnoud van Wijk
   Real-Time Text Taskforce (R3TF)
   NL

   Fax:   +31 412614000
   Email: arnoud@realtimetext.org
   URI:   www.realtimetext.org