idnits 2.17.1

draft-ietf-avt-rtp-redundancy-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-23) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with respect
        to this document.  Code Components extracted from this document must
        include Simplified BSD License text as described in Section 4.e of
        the Trust Legal Provisions and are provided without warranty as
        described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing document type: Expected "INTERNET-DRAFT" in the upper left hand
     corner of the first page

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 73 instances of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 227: '...encoding MUST be sampled at the same r...'

     RFC 2119 keyword, line 266: '...e a copy of the primary MAY be used as...'

     RFC 2119 keyword, line 267: '...dundant encoding MUST NOT be higher ba...'

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 111 has weird spacing: '...ur byte bound...'

  == Line 384 has weird spacing: '...mestamp of pr...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 1890 (ref. '3') (Obsoleted by RFC 3551)

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  == Outdated reference: A later version (-06) exists of
     draft-ietf-mmusic-sdp-03

     Summary: 13 errors (**), 0 flaws (~~), 4 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Expire in six months

3    Colin Perkins
4    Isidor Kouvelas
5    Orion Hodson
6    Vicky Hardman
7    University College London

9    Mark Handley
10   ISI

12   Jean-Chrysostome Bolot
13   Andres Vega-Garcia
14   Sacha Fosse-Parisis
15   INRIA Sophia Antipolis

17   RTP Payload for Redundant Audio Data
18   draft-ietf-avt-rtp-redundancy-00.txt

20   Status of this Memo

22   This document is an Internet-Draft. Internet-Drafts are working documents
23   of the Internet Engineering Task Force (IETF), its areas, and its working
24   groups. Note that other groups may also distribute working documents as
25   Internet-Drafts.

27   Internet-Drafts are draft documents valid for a maximum of six months and
28   may be updated, replaced, or obsoleted by other documents at any time. It
29   is inappropriate to use Internet-Drafts as reference material or to cite
30   them other than as ``work in progress''.
     To learn the current status of
31   any Internet-Draft, please check the ``1id-abstracts.txt'' listing
32   contained in the Internet-Drafts Shadow Directories on ftp.is.co.za
33   (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
34   ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

36   Distribution of this document is unlimited.

38   Comments are solicited and should be addressed to the authors and/or
39   the AVT working group's mailing list at rem-conf@es.net.

41   Abstract

43   This document describes a payload format for use with the
44   real-time transport protocol (RTP), version 2, for encoding
45   redundant audio data. The primary motivation for the scheme
46   described herein is the development of audio conferencing
47   tools for use with lossy packet networks such as the Internet
48   Mbone, although this scheme is not limited to such applications.

50   1 Introduction

52   If multimedia conferencing is to become widely used by the Internet Mbone
53   community, users must perceive the quality to be sufficiently good for most
54   applications. We have identified a number of problems which impair the
55   quality of conferences, the most significant of which is packet loss. This
56   is a persistent problem, particularly given the increasing popularity, and
57   therefore increasing load, of the Internet. The disruption of speech
58   intelligibility even at low loss rates which is currently experienced may
59   convince a whole generation of users that multimedia conferencing over the
60   Internet is not viable. The addition of redundancy to the data stream is
61   offered as a solution [1]. If a packet is lost then the missing information
62   may be reconstructed at the receiver from the redundant data that arrives
63   in the following packet(s), provided that the average number of
64   consecutively lost packets is small. Recent work [4,5] shows that packet
65   loss patterns in the Internet are such that this scheme typically functions
66   well.

68   This document describes an RTP payload format for the transmission
69   of audio data encoded in such a redundant fashion. Section 2 presents
70   the requirements and motivation leading to the definition of this
71   payload format, and does not form part of the payload format definition.
72   Sections 3 onwards define the RTP payload format for redundant audio
73   data.

75   2 Requirements/Motivation

77   The requirements for a redundant encoding scheme under RTP are as
78   follows:

80      o Packets have to carry a primary encoding and one or more redundant
81        encodings.

83      o As a multitude of encodings may be used for redundant information,
84        each block of redundant encoding has to have an encoding type
85        identifier.

87      o As the use of variable size encodings is desirable, each encoded
88        block in the packet has to have a length indicator.

90      o The RTP header provides a timestamp field that corresponds to
91        the time of creation of the encoded data. When redundant encodings
92        are used this timestamp field can refer to the time of creation
93        of the primary encoding data. Redundant blocks of data will
94        correspond to different time intervals than the primary data,
95        and hence each block of redundant encoding will require its own
96        timestamp. To reduce the number of bytes needed to carry the
97        timestamp, it can be encoded as the difference of the timestamp
98        for the redundant encoding and the timestamp of the primary.

100  There are two essential means by which redundant audio may be added
101  to the standard RTP specification: a header extension may hold the
102  redundancy, or one, or more, additional payload types may be defined.

104  Including all the redundancy information for a packet in a header extension
105  would make it easy for applications that do not implement redundancy to
106  discard it and just process the primary encoding data.
     There are, however,
107  a number of disadvantages with this scheme:

109     o There is a large overhead from the number of bytes needed for
110       the extension header (4) and the possible padding that is needed
111       at the end of the extension to round up to a four byte boundary
112       (up to 3 bytes). For many applications this overhead is unacceptable.

114     o Use of the header extension limits applications to a single redundant
115       encoding, unless further structure is introduced into the extension.
116       This would result in further overhead.

118  For these reasons, the use of an RTP header extension to hold redundant
119  audio encodings is disregarded.

121  The RTP profile for audio and video conferences [3] lists a set of
122  payload types and provides for a dynamic range of 32 encodings that
123  may be defined through a conference control protocol. This leads
124  to two possible schemes for assigning additional RTP payload types
125  for redundant audio applications:

127     1. A dynamic encoding scheme may be defined, for each combination
128        of primary/redundant payload types, using the RTP dynamic payload
129        type range.

131     2. A single fixed payload type may be defined to represent a packet
132        with redundancy. This may then be assigned to either a static
133        RTP payload type, or the payload type for this may be assigned
134        dynamically.

136  It is possible to define a set of payload types that signify a particular
137  combination of primary and secondary encodings for each of the 32 dynamic
138  payload types provided. This would be a slightly restrictive yet feasible
139  solution for packets with a single block of redundancy as the number of
140  possible combinations is not too large. However, the need for multiple
141  blocks of redundancy greatly increases the number of encoding combinations
142  and makes this solution not viable.
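
As a rough illustration of the growth described above, the sketch below
counts the combinations that would each need their own dynamic payload
type. It assumes, hypothetically, that any of five candidate encodings may
appear in each position (primary, first redundant block, and so on); the
function name and the figure of five are illustrative and are not taken
from this document.

```python
# Size of the dynamic payload type range provided by the profile [3].
DYNAMIC_PAYLOAD_TYPES = 32


def combinations(codecs, redundant_blocks):
    """Ordered choices for one primary plus `redundant_blocks` blocks.

    Assumption: every position may use any of the `codecs` encodings.
    """
    return codecs ** (redundant_blocks + 1)


for blocks in (1, 2, 3):
    count = combinations(5, blocks)
    verdict = "fits" if count <= DYNAMIC_PAYLOAD_TYPES else "does not fit"
    print(f"{blocks} redundant block(s): {count} combinations ({verdict})")
```

Under these assumptions a single block of redundancy needs 25 payload
types, which fits in the dynamic range, while two blocks already need 125.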

144  A modified version of the above solution could be to decide prior
145  to the beginning of a conference on a set of 32 encoding combinations
146  that will be used for the duration of the conference. All tools
147  in the conference can be initialized with this working set of encoding
148  combinations. Communication of the working set could be made through
149  the use of an external, out of band, mechanism. Setup is complicated
150  as great care needs to be taken in starting tools with identical
151  parameters. This scheme is more efficient as only one byte is used
152  to identify combinations of encodings.

154  It is felt that the complication inherent in distributing the mapping of
155  payload types onto combinations of redundant data precludes the use of this
156  mechanism.

158  A more flexible solution is to have a single payload type which signifies a
159  packet with redundancy. That packet then becomes a container, encapsulating
160  multiple payloads into a single RTP packet. Such a scheme is flexible,
161  since any amount of redundancy may be encapsulated within a single packet.
162  There is, however, a small overhead since each encapsulated payload must be
163  preceded by a header indicating the type of data enclosed. This is the
164  preferred solution, since it is flexible, extensible, and has a
165  relatively low overhead. The remainder of this document describes this
166  solution.

168  3 Payload Format Specification

170  The assignment of an RTP payload type for this new packet format is outside
171  the scope of this document, and will not be specified here. It is expected
172  that the RTP profile for a particular class of applications will assign a
173  payload type for this encoding, or if that is not done then a payload type
174  in the dynamic range shall be chosen.

176  An RTP packet containing redundant data shall have a standard RTP header,
177  with payload type indicating redundancy.
     The other fields of the RTP
178  header relate to the primary data block of the redundant data.

180  Following the RTP header are a number of additional headers, defined in the
181  figure below, which specify the contents of each of the encodings carried
182  by the packet. Following these additional headers are a number of data
183  blocks, which contain the standard RTP payload data for these encodings.
184  It is noted that all the headers are aligned to a 32 bit boundary, but that
185  the payload data will typically not be aligned. If multiple redundant
186  encodings are carried in a packet, they should correspond to different time
187  intervals: there is no reason to include multiple copies of data for a
188  single time interval within a packet.

190      0                   1                   2                   3
191      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
192     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
193     |F|   block PT  |  timestamp offset         |   block length    |
194     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
195  The bits in the header are specified as follows:

197  F: 1 bit  First bit in header indicates whether another header block
198     follows. If 1, further header blocks follow; if 0, this is the
199     last header block.

201  block PT: 7 bits  RTP payload type for this block.

203  timestamp offset: 14 bits  Unsigned offset of timestamp of this block
204     relative to timestamp given in RTP header. The use of an unsigned
205     offset implies that redundant data must be sent after the primary
206     data, and is hence a time to be subtracted from the current
207     timestamp to determine the timestamp of the data for which this
208     block is the redundancy.

210  block length: 10 bits  Length in bytes of the corresponding data
211     block excluding header.

213  It is noted that the use of an unsigned timestamp offset limits the use of
214  redundant data slightly: it is not possible to send redundancy before the
215  primary encoding.
     This may affect schemes where a low bandwidth coding
216  suitable for redundancy is produced early in the encoding process, and
217  hence could feasibly be transmitted early. However, the addition of a sign
218  bit would unacceptably reduce the range of the timestamp offset, and
219  increasing the size of the field above 14 bits limits the block length
220  field. It seems that limiting redundancy to be transmitted after the
221  primary will cause fewer problems than limiting the size of the other
222  fields.

224  The timestamp offset for a redundant block is measured in the same units as
225  the timestamp of the primary encoding (i.e., audio samples, with the same
226  clock rate as the primary). The implication of this is that the redundant
227  encoding MUST be sampled at the same rate as the primary.

229  It is further noted that the block length and timestamp offset are 10 bits,
230  and 14 bits respectively; rather than the more obvious 8 and 16 bits.
231  Whilst such an encoding complicates parsing the header information
232  slightly, and adds some additional processing overhead, there are a number
233  of problems involved with the more obvious choice: An 8 bit block length
234  field is sufficient for most, but not all, possible encodings: for example
235  80ms PCM and DVI audio packets comprise more than 256 bytes, and cannot be
236  encoded with a single byte length field. It is possible to impose
237  additional structure on the block length field (for example the high bit
238  set could imply the lower 7 bits code a length in words, rather than
239  bytes), however such schemes are complex. The use of a 10 bit block length
240  field retains simplicity and provides an enlarged range, at the expense of
241  a reduced range of timestamp values.

243  The primary encoding block header is placed last in the packet.
     It
244  is therefore possible to omit the timestamp and block-length fields
245  from the header of this block, since they may be determined from
246  the RTP header and overall packet length. The header for the primary
247  (final) block comprises only a zero F bit, and the block payload
248  type information, a total of 8 bits. This is illustrated in the
249  figure below:

251      0 1 2 3 4 5 6 7
252     +-+-+-+-+-+-+-+-+
253     |0|   Block PT  |
254     +-+-+-+-+-+-+-+-+

256  The final header is followed, immediately, by the data blocks, stored in
257  the same order as the headers. There is no padding or other delimiter
258  between the data blocks, and they are typically not 32 bit aligned. Again,
259  this choice was made to reduce bandwidth overheads, at the expense of
260  additional decoding time.

262  The choice of encodings used should reflect the bandwidth requirements of
263  those encodings. It is expected that the redundant encoding shall use
264  significantly less bandwidth than the primary encoding: the exception
265  being the case where the primary is very low-bandwidth and has high
266  processing requirement, in which case a copy of the primary MAY be used as
267  the redundancy. The redundant encoding MUST NOT be higher bandwidth than
268  the primary.

270  The use of multiple levels of redundancy is rarely necessary. However, in
271  those cases which require it, the bandwidth required by each level of
272  redundancy is expected to be significantly less than that of the previous
273  level.

275  4 Limitations

277  The RTP marker bit is not preserved for redundant data blocks. Hence
278  if the primary (containing this marker) is lost, the marker is lost.
279  It is believed that this will not cause undue problems: even if
280  the marker bit was transmitted with the redundant information, there
281  would still be the possibility of its loss, so applications would
282  still have to be written with this in mind.

284  In addition, CSRC information is not preserved for redundant data.
285  The CSRC data in the RTP header of a redundant audio packet relates
286  to the primary only. Since CSRC data in an audio stream is expected
287  to change relatively infrequently, it is recommended that applications
288  which require this information assume that the CSRC data in the RTP
289  header may be applied to the reconstructed redundant data.

291  5 Relation to SDP

293  When a redundant payload is used, it may need to be bound to an RTP dynamic
294  payload type. This may be achieved through any out-of-band mechanism, but
295  one common way is to communicate this binding using the Session Description
296  Protocol (SDP) [6]. SDP has a mechanism for binding dynamic payload
297  types to a particular codec, sample rate, and number of channels using the
298  ``rtpmap'' attribute. An example of its use (using the RTP audio/video
299  profile [3]) is:

301     m=audio 12345 RTP/AVP 121 0 5
302     a=rtpmap:121 red/8000/1

304  This specifies that an audio stream using RTP is using payload types 121 (a
305  dynamic payload type), 0 (PCM u-law) and 5 (DVI). The ``rtpmap'' attribute
306  is used to bind payload type 121 to codec ``red'' indicating this codec is
307  actually a redundancy frame, 8 kHz, and monaural. When used with SDP, the
308  term ``red'' is used to indicate the redundancy format discussed in this
309  document.

311  In this case the additional formats of PCM and DVI are specified. The
312  receiver must therefore be prepared to use these formats. Such a
313  specification means the sender will send redundancy by default, but also
314  may send PCM or DVI. However, with a redundant payload we additionally take
315  this to mean that no codec other than PCM or DVI will be used in the
316  redundant encodings. Note that the additional payload formats defined in
317  the ``m='' field may themselves be dynamic payload types, and if so a
318  number of additional ``a='' attributes may be required to describe these
319  dynamic payload types.
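
A receiver-side reading of the example above might look like the following
minimal sketch. It assumes the session description is available as plain
text and handles only the ``m=audio'' and ``a=rtpmap'' lines shown; the
function name find_red_payload_type is hypothetical, and a real SDP parser
has to deal with considerably more than this.

```python
def find_red_payload_type(sdp):
    """Return (red payload type, other payload types), or None.

    Illustrative only: looks at a single m=audio line and at rtpmap
    attributes whose codec name is exactly "red".
    """
    formats = []
    red_pt = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # m=<media> <port> <transport> <payload type list>
            formats = [int(field) for field in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt_text, encoding = line[len("a=rtpmap:"):].split(None, 1)
            if encoding.split("/")[0] == "red":
                red_pt = int(pt_text)
    if red_pt is None or red_pt not in formats:
        return None
    return red_pt, [pt for pt in formats if pt != red_pt]


example = "m=audio 12345 RTP/AVP 121 0 5\na=rtpmap:121 red/8000/1\n"
print(find_red_payload_type(example))  # prints (121, [0, 5])
```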

321  To receive a redundant stream, this is all that is required. However to
322  send a redundant stream, the sender needs to know which codecs are
323  recommended for the primary and secondary (and tertiary, etc.) encodings.
324  This information is specific to the redundancy format, and is specified
325  using an additional attribute ``fmtp'' which conveys format-specific
326  information. A session directory does not parse the values specified in an
327  fmtp attribute but merely hands it to the media tool unchanged. For
328  redundancy, we define the format parameters to be a slash ``/'' separated
329  list of RTP payload types.

331  Thus a complete example is:

333     m=audio 12345 RTP/AVP 121 0 5
334     a=rtpmap:121 red/8000/1
335     a=fmtp:121 0/5

337  This specifies that the default format for senders is redundancy with PCM
338  as the primary encoding and DVI as the secondary encoding. Encodings
339  cannot be specified in the fmtp attribute unless they are also specified as
340  valid encodings on the media (``m='') line.

342  6 Security Considerations

344  RTP packets containing redundant information are subject to the security
345  considerations discussed in the RTP specification [2], and any appropriate
346  RTP profile (for example [3]). This implies that confidentiality of the
347  media streams is achieved by encryption. Encryption of a redundant data
348  stream may occur in two ways:

350     1. The entire stream is to be secured, and all participants are
351        expected to have keys to decode the entire stream. In this
352        case, nothing special need be done, and encryption is performed
353        in the usual manner.

355     2. A portion of the stream is to be encrypted with a different
356        key to the remainder. In this case a redundant copy of the
357        last packet of that portion cannot be sent, since there is no
358        following packet which is encrypted with the correct key in which
359        to send it. Similar limitations may occur when enabling/disabling
360        encryption.

362  The choice between these two is a matter for the encoder only. Decoders
363  can decrypt either form without modification.

365  Whilst the addition of low-bandwidth redundancy to an audio stream is an
366  effective means by which that stream may be protected against packet loss,
367  application designers should be aware that the addition of large amounts of
368  redundancy will increase network congestion, and hence packet loss, leading
369  to a worsening of the problem which the use of redundancy was intended to
370  solve. At its worst, this can lead to excessive network congestion and may
371  constitute a denial of service attack.

373  7 Example Packet

375  An RTP audio data packet containing a DVI4 (8 kHz) primary, and a
376  single block of redundancy encoded using 8 kHz LPC (both 20ms packets),
377  as defined in the RTP audio/video profile [3] is illustrated:

379      0                   1                   2                   3
380      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
381     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
382     |V=2|P|X| CC=0  |M|      PT     |   sequence number of primary  |
383     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
384     |              timestamp  of primary encoding                   |
385     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
386     |           synchronization source (SSRC) identifier            |
387     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
388     |1| block PT=7  |  timestamp offset         |   block length    |
389     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
390     |0| block PT=5  |                                               |
391     +-+-+-+-+-+-+-+-+                                               +
392     |                                                               |
393     +                LPC encoded redundant data (PT=7)              +
394     |                (14 bytes)                                     |
395     +                                               +---------------+
396     |                                               |               |
397     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
398     |                                                               |
399     +                                                               +
400     |                                                               |
401     +                                                               +
402     |                                                               |
403     +                                                               +
404     |                DVI4 encoded primary data (PT=5)               |
405     +                (84 bytes, not to scale)                       +
406     /                                                               /
407     +                                                               +
408     |                                                               |
409     +                                                               +
410     |                                                               |
411     +                                               +---------------+
412     |                                               |
413     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
414  8 Authors' Addresses

416  Colin
     Perkins/Isidor Kouvelas/Orion Hodson/Vicky Hardman
417  Department of Computer Science
418  University College London
419  London WC1E 6BT
420  United Kingdom
421  Email: {c.perkins|i.kouvelas|o.hodson|v.hardman}@cs.ucl.ac.uk

423  Mark Handley
424  USC Information Sciences Institute
425  c/o MIT Laboratory for Computer Science
426  545 Technology Square
427  Cambridge, MA 02139, USA
428  Email: mjh@isi.edu

430  Jean-Chrysostome Bolot/Andres Vega-Garcia/Sacha Fosse-Parisis
431  INRIA Sophia Antipolis
432  2004 Route des Lucioles, BP 93
433  06902 Sophia Antipolis
434  France
435  Email: {bolot|avega|sfosse}@sophia.inria.fr

437  9 References

439  [1] V.J. Hardman, M.A. Sasse, M. Handley and A. Watson; Reliable
440  Audio for Use over the Internet; Proceedings INET'95, Honolulu, Oahu,
441  Hawaii, September 1995. http://www.isoc.org/in95prc/

443  [2] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson; RTP:
444  A Transport Protocol for Real-Time Applications; RFC 1889, January
445  1996

447  [3] H. Schulzrinne; RTP Profile for Audio and Video Conferences with
448  Minimal Control; RFC 1890, January 1996
449  [4] M. Yajnik, J. Kurose and D. Towsley; Packet loss correlation
450  in the MBone multicast network; IEEE Globecom Internet workshop, London,
451  November 1996

453  [5] J.-C. Bolot and A. Vega-Garcia; The case for FEC-based error
454  control for packet audio in the Internet; ACM Multimedia Systems,
455  1997

457  [6] M. Handley and V. Jacobson; SDP: Session Description Protocol
458  (draft 03.2) draft-ietf-mmusic-sdp-03.txt, November 1996
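
The format defined in Section 3 and the packet illustrated in Section 7 can
be sketched in code as follows. This is a minimal illustration, not a
reference implementation: the function names are hypothetical, the RTP
header is assumed to have been removed already, and the timestamp offset of
160 assumes that the redundant block covers the immediately preceding 20ms
packet at 8 kHz.

```python
import struct


def pack_red_header(block_pt, ts_offset, length):
    """Build one 4-byte redundant block header (F bit set to 1)."""
    if not 0 <= block_pt < 1 << 7:
        raise ValueError("block PT must fit in 7 bits")
    if not 0 <= ts_offset < 1 << 14:
        raise ValueError("timestamp offset must fit in 14 bits")
    if not 0 <= length < 1 << 10:
        raise ValueError("block length must fit in 10 bits")
    word = (1 << 31) | (block_pt << 24) | (ts_offset << 10) | length
    return struct.pack("!I", word)


def parse_red_payload(payload, rtp_timestamp):
    """Return a list of (payload type, timestamp, data), primary last."""
    headers = []
    pos = 0
    while True:
        if pos >= len(payload):
            raise ValueError("payload ends inside the block headers")
        if payload[pos] & 0x80:  # F=1: 4-byte header, more headers follow
            if pos + 4 > len(payload):
                raise ValueError("truncated block header")
            (word,) = struct.unpack_from("!I", payload, pos)
            headers.append(
                ((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF)
            )
            pos += 4
        else:  # F=0: 1-byte header of the primary, which is the last one
            headers.append((payload[pos] & 0x7F, 0, None))
            pos += 1
            break
    blocks = []
    for block_pt, ts_offset, length in headers:
        if length is None:
            # The primary has no length field; it takes the remainder.
            length = len(payload) - pos
        if pos + length > len(payload):
            raise ValueError("data block runs past the end of the payload")
        # The offset is subtracted from the (32 bit) RTP timestamp.
        timestamp = (rtp_timestamp - ts_offset) % (1 << 32)
        blocks.append((block_pt, timestamp, payload[pos:pos + length]))
        pos += length
    return blocks


# Section 7 example: 14 bytes of LPC (PT=7) and 84 bytes of DVI4 (PT=5).
lpc = bytes(14)
dvi4 = bytes(84)
payload = pack_red_header(7, 160, len(lpc)) + bytes([5]) + lpc + dvi4
for block_pt, timestamp, data in parse_red_payload(payload, rtp_timestamp=1600):
    print(block_pt, timestamp, len(data))
```

With an RTP timestamp of 1600 this prints "7 1440 14" followed by
"5 1600 84": the redundant block is dated one packet earlier, and the
length of the primary is recovered from the overall payload length.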