Internet Engineering Task Force                                   AVT WG
Internet Draft                                        Schulzrinne/Casner
draft-ietf-avt-profile-new-09.txt              Columbia U./Packet Design
July 14, 2000
Expires: January 14, 2001

    RTP Profile for Audio and Video Conferences with Minimal Control

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   To view the list of Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.

Abstract

   This memorandum is a revision of RFC 1890 in preparation for
   advancement from Proposed Standard to Draft Standard status. Readers
   are encouraged to use the PostScript form of this draft to see where
   changes from RFC 1890 are marked by change bars.

   This document describes a profile called "RTP/AVP" for the use of the
   real-time transport protocol (RTP), version 2, and the associated
   control protocol, RTCP, within audio and video multiparticipant
   conferences with minimal control. It provides interpretations of
   generic fields within the RTP specification suitable for audio and
   video conferences. In particular, this document defines a set of
   default mappings from payload type numbers to encodings.

   This document also describes how audio and video data may be carried
   within RTP. It defines a set of standard encodings and their names
   when used within RTP. The descriptions provide pointers to reference
   implementations and the detailed standards. This document is meant as
   an aid for implementors of audio, video and other real-time
   multimedia applications.

Resolution of Open Issues

   [Note to the RFC Editor: This section is to be deleted when this
   draft is published as an RFC but is shown here for reference during
   the Last Call. The first paragraph of the Abstract is also to be
   deleted. All RFC XXXX should be filled in with the number of the RTP
   specification RFC submitted for Draft Standard status, and all RFC
   YYYY should be filled in with the number of the draft specifying MIME
   registration of RTP payload types as it is submitted for Proposed
   Standard status. These latter references are intended to be
   non-normative.]

   Readers are directed to Appendix 9, Changes from RFC 1890, for a
   listing of the changes that have been made in this draft. The
   changes from RFC 1890 are marked with change bars in the PostScript
   form of this draft.

   The revisions in this draft are intended to be complete for Last
   Call.
   The following open issues from previous drafts have been
   addressed:

     o The procedure for registering RTP encoding names as MIME
       subtypes was moved to a separate RFC-to-be that may also serve
       to specify how (some of) the encodings here may be used with
       mail and other non-RTP transports. That procedure is not
       required to implement this profile, but may be used in those
       contexts where it is needed.

     o This profile follows the suggestion in the RTP spec that RTCP
       bandwidth may be specified separately from the session
       bandwidth and separately for active senders and passive
       receivers.

     o No specific action is taken in this document to address
       generic payload formats; it is assumed that if any generic
       payload formats are developed, they can be specified in
       separate RFCs and that the session parameters they require for
       operation can be specified in the MIME registration of those
       formats.

     o The specification of the CN (comfort noise) payload format has
       been moved to a separate draft so that it may be enhanced as
       a result of additional work in ITU-T. That draft is intended
       for publication at Proposed Standard status. Static payload
       type 13 is marked reserved here for the use of that payload
       format (since CN has already been implemented from earlier
       drafts of this profile). Static payload type 19 is also
       reserved because some revisions of the draft assigned that
       number to CN to avoid an historic use of 13.

     o The requirement for congestion control in RTP is addressed in
       the RTP spec with an explanation that the behavior is context
       specific and should be defined in RTP profiles. Text has been
       added to this profile in Section 2 to describe the
       requirements only in general terms because specific algorithms
       have not been devised yet for multicast congestion control.

1 Introduction

   This profile defines aspects of RTP left unspecified in the RTP
   Version 2 protocol definition (RFC XXXX) [1]. This profile is
   intended for use within audio and video conferences with minimal
   session control. In particular, no support for the negotiation of
   parameters or membership control is provided. The profile is expected
   to be useful in sessions where no negotiation or membership control
   is used (e.g., using the static payload types and the membership
   indications provided by RTCP), but this profile may also be useful in
   conjunction with a higher-level control protocol.

   Use of this profile may be implicit in the use of the appropriate
   applications; there may be no explicit indication by port number,
   protocol identifier or the like. Applications such as session
   directories may use the name for this profile specified in Section 3.

   Other profiles may make different choices for the items specified
   here.

   This document also defines a set of encodings and payload formats for
   audio and video.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2] and
   indicate requirement levels for implementations compliant with this
   RTP profile.

   This draft defines the term media type as dividing encodings of audio
   and video content into three classes: audio, video and audio/video
   (interleaved).

2 RTP and RTCP Packet Forms and Protocol Behavior

   The section "RTP Profiles and Payload Format Specification" of RFC
   XXXX enumerates a number of items that can be specified or modified
   in a profile. This section addresses these items. Generally, this
   profile follows the default and/or recommended aspects of the RTP
   specification.

   RTP data header: The standard format of the fixed RTP data
        header is used (one marker bit).

   Payload types: Static payload types are defined in Section 6.

   RTP data header additions: No additional fixed fields are
        appended to the RTP data header.

   RTP data header extensions: No RTP header extensions are
        defined, but applications operating under this profile MAY
        use such extensions. Thus, applications SHOULD NOT assume
        that the RTP header X bit is always zero and SHOULD be
        prepared to ignore the header extension. If a header
        extension is defined in the future, that definition MUST
        specify the contents of the first 16 bits in such a way
        that multiple different extensions can be identified.

   RTCP packet types: No additional RTCP packet types are defined
        by this profile specification.

   RTCP report interval: The suggested constants are to be used for
        the RTCP report interval calculation. Sessions operating
        under this profile MAY specify a separate parameter for the
        RTCP traffic bandwidth rather than using the default
        fraction of the session bandwidth. The RTCP traffic
        bandwidth MAY be divided into two separate session
        parameters for those participants which are active data
        senders and those which are not. Following the
        recommendation in the RTP specification [1] that 1/4 of the
        RTCP bandwidth be dedicated to data senders, the
        RECOMMENDED default values for these two parameters would
        be 1.25% and 3.75%, respectively. For a particular session,
        the RTCP bandwidth for non-data-senders MAY be set to zero
        when operating on unidirectional links or for sessions that
        don't require feedback on the quality of reception. The
        RTCP bandwidth for data senders SHOULD be kept non-zero so
        that sender reports can still be sent for inter-media
        synchronization and to identify the source by CNAME.
        The means by which the one or two session parameters for
        RTCP bandwidth are specified is beyond the scope of this
        memo.

   SR/RR extension: No extension section is defined for the RTCP SR
        or RR packet.

   SDES use: Applications MAY use any of the SDES items described
        in the RTP specification. While CNAME information MUST be
        sent every reporting interval, other items SHOULD only be
        sent every third reporting interval, with NAME sent seven
        out of eight times within that slot and the remaining SDES
        items cyclically taking up the eighth slot, as defined in
        Section 6.2.2 of the RTP specification. In other words,
        NAME is sent in RTCP packets 1, 4, 7, 10, 13, 16, 19,
        while, say, EMAIL is used in RTCP packet 22.

   Security: The RTP default security services are also the default
        under this profile.

   String-to-key mapping: A user-provided string ("pass phrase") is
        hashed with the MD5 algorithm to a 16-octet digest. An
        n-bit key is extracted from the digest by taking the first n
        bits from the digest. If several keys are needed with a
        total length of 128 bits or less (as for triple DES), they
        are extracted in order from that digest. The octet ordering
        is specified in RFC 1423, Section 2.2. (Note that some DES
        implementations require that the 56-bit key be expanded
        into 8 octets by inserting an odd parity bit in the most
        significant bit of the octet to go with each 7 bits of the
        key.)

        It is RECOMMENDED that pass phrases be restricted to ASCII
        letters, digits, the hyphen, and white space to reduce the
        chance of transcription errors when conveying keys by
        phone, fax, telex or email.

        The pass phrase MAY be preceded by a specification of the
        encryption algorithm. Any characters up to the first slash
        (ASCII 0x2f) are taken as the name of the encryption
        algorithm.
        The encryption format specifiers SHOULD be drawn
        from RFC 1423 or any additional identifiers registered with
        IANA. If no slash is present, DES-CBC is assumed as
        default. The encryption algorithm specifier is case
        sensitive.

        The pass phrase typed by the user is transformed to a
        canonical form before applying the hash algorithm. For that
        purpose, we define `white space' to be the ASCII space,
        formfeed, newline, carriage return, tab, or vertical tab as
        well as all characters contained in the Unicode space
        characters table. The transformation consists of the
        following steps: (1) convert the input string to the ISO
        10646 character set, using the UTF-8 encoding as specified
        in Annex P to ISO/IEC 10646-1:1993 (ASCII characters
        require no mapping, but ISO 8859-1 characters do); (2)
        remove leading and trailing white space characters; (3)
        replace one or more contiguous white space characters by a
        single space (ASCII or UTF-8 0x20); (4) convert all letters
        to lower case and replace sequences of characters and
        non-spacing accents with a single character, where possible.
        A minimum length of 16 key characters (after applying the
        transformation) SHOULD be enforced by the application,
        while applications MUST allow up to 256 characters of
        input.

   Congestion: RTP and this profile may be used in the context of
        enhanced network service, for example, through Integrated
        Services (RFC 1633) [3] or Differentiated Services (RFC
        2475) [4], or they may be used with best effort service.

        If enhanced service is being used, RTP receivers SHOULD
        monitor packet loss to ensure that the service that was
        requested is actually being delivered. If it is not, then
        they SHOULD assume that they are receiving best-effort
        service and behave accordingly.

        If best-effort service is being used, RTP receivers SHOULD
        monitor packet loss to ensure that the packet loss rate is
        within acceptable parameters. Packet loss is considered
        acceptable if a TCP flow across the same network path and
        experiencing the same network conditions would achieve an
        average throughput that is not less than the RTP flow is
        achieving. This condition can be satisfied by implementing
        congestion control mechanisms to adapt the transmission
        rate (or the number of layers subscribed for a layered
        multicast session), or by arranging for a receiver to leave
        the session if the loss rate is unacceptably high.

   Underlying protocol: The profile specifies the use of RTP over
        unicast and multicast UDP as well as TCP. (This does not
        preclude the use of these definitions when RTP is carried
        by other lower-layer protocols.)

   Transport mapping: The standard mapping of RTP and RTCP to
        transport-level addresses is used.

   Encapsulation: A minimal TCP encapsulation is defined.

3 IANA Considerations

   The RTP specification establishes a registry of profile names for use
   by higher-level control protocols, such as the Session Description
   Protocol (SDP), RFC 2327 [5], to refer to transport methods. This
   profile registers the name "RTP/AVP".

3.1 Registering Additional Encodings

   This profile lists a set of encodings, each of which is comprised of
   a particular media data compression or representation plus a payload
   format for encapsulation within RTP. Some of those payload formats
   are specified here, while others are specified in separate RFCs. It
   is expected that additional encodings beyond the set listed here will
   be created in the future and specified in additional payload format
   RFCs.

   This profile also assigns to each encoding a short name which MAY be
   used by higher-level control protocols, such as the Session
   Description Protocol (SDP), RFC 2327 [5], to identify encodings
   selected for a particular RTP session.

   In some contexts it may be useful to refer to these encodings in the
   form of a MIME content-type. To facilitate this, RFC YYYY [6]
   provides registrations for all of the encoding names listed here as
   MIME subtype names under the "audio" and "video" MIME types through
   the MIME registration procedure as specified in RFC 2048 [7].

   Any additional encodings specified for use under this profile (or
   others) may also be assigned names registered as MIME subtypes with
   the Internet Assigned Numbers Authority (IANA). This registry
   provides a means to ensure that the names assigned to the additional
   encodings are kept unique. RFC YYYY specifies the information that is
   required for the registration of RTP encodings.

   In addition to assigning names to encodings, this profile also
   assigns static RTP payload type numbers to some of them. However, the
   payload type number space is relatively small and cannot accommodate
   assignments for all existing and future encodings. During the early
   stages of RTP development, it was necessary to use statically
   assigned payload types because no other mechanism had been specified
   to bind encodings to payload types. It was anticipated that non-RTP
   means beyond the scope of this memo (such as directory services or
   invitation protocols) would be specified to establish a dynamic
   mapping between a payload type and an encoding. Now, mechanisms for
   defining dynamic payload type bindings have been specified in the
   Session Description Protocol (SDP) and in other protocols such as
   ITU-T recommendation H.323/H.245.
   These mechanisms associate the registered name of the
   encoding/payload format, along with any additional required
   parameters such as the RTP timestamp clock rate and number of
   channels, to a payload type number. This association is effective
   only for the duration of the RTP session in which the dynamic payload
   type binding is made. This association applies only to the RTP
   session for which it is made, thus the numbers can be reused for
   different encodings in different sessions so the number space
   limitation is avoided.

   This profile reserves payload type numbers in the range 96-127
   exclusively for dynamic assignment. Applications SHOULD first use
   values in this range for dynamic payload types. Those applications
   which need to define more than 32 dynamic payload types MAY bind
   codes below 96, in which case it is RECOMMENDED that unassigned
   payload type numbers be used first. However, the statically assigned
   payload types are default bindings and MAY be dynamically bound to
   new encodings if needed. Redefining payload types below 96 may cause
   incorrect operation if an attempt is made to join a session without
   obtaining session description information that defines the dynamic
   payload types.

   Dynamic payload types SHOULD NOT be used without a well-defined
   mechanism to indicate the mapping. Systems that expect to
   interoperate with others operating under this profile SHOULD NOT make
   their own assignments of proprietary encodings to particular, fixed
   payload types.

   This specification establishes the policy that no additional static
   payload types will be assigned beyond the ones defined in this
   document. Establishing this policy avoids the problem of trying to
   create a set of criteria for accepting static assignments and
   encourages the implementation and deployment of the dynamic payload
   type mechanisms.

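   As an informal illustration of the dynamic binding rules in Section
   3.1, the following Python sketch binds encodings to payload type
   numbers, using the range 96-127 first and falling back to numbers
   below 96 only when more than 32 bindings are needed. The function
   name, the tuple layout and the caller-supplied list of fallback
   numbers are assumptions made for this example; they are not defined
   by this profile, and which numbers below 96 are unassigned must be
   taken from the payload type tables.

```python
from typing import Dict, Sequence, Tuple

# (encoding name, RTP timestamp clock rate in Hz, number of channels)
Encoding = Tuple[str, int, int]

DYNAMIC_PT_FIRST = 96   # start of the range reserved for dynamic use
DYNAMIC_PT_LAST = 127   # end of the range reserved for dynamic use


def bind_dynamic_payload_types(
    encodings: Sequence[Encoding],
    fallback_below_96: Sequence[int] = (),
) -> Dict[int, Encoding]:
    """Return a session-local mapping from payload type to encoding.

    fallback_below_96 is a hypothetical, caller-supplied list of
    payload types below 96, ordered so that unassigned numbers come
    first. It is consulted only after 96-127 are exhausted.
    """
    for pt in fallback_below_96:
        if not 0 <= pt < DYNAMIC_PT_FIRST:
            raise ValueError(f"fallback payload type {pt} is not below 96")

    candidates = list(range(DYNAMIC_PT_FIRST, DYNAMIC_PT_LAST + 1))
    candidates += list(fallback_below_96)
    if len(encodings) > len(candidates):
        raise ValueError("not enough payload type numbers for this session")

    # The binding is valid only for the session it is made in, so the
    # same numbers may be reused for other encodings elsewhere.
    return dict(zip(candidates, encodings))


bindings = bind_dynamic_payload_types([("L16", 44100, 2), ("DVI4", 16000, 1)])
```

   A real application would convey the resulting mapping through a
   session description mechanism such as SDP rather than keep it local.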
4 Audio

4.1 Encoding-Independent Rules

   For applications which send either no packets or comfort-noise
   packets during silence, the first packet of a talkspurt, that is, the
   first packet after a silence period, SHOULD be distinguished by
   setting the marker bit in the RTP data header to one. The marker bit
   in all other packets is zero. The beginning of a talkspurt MAY be
   used to adjust the playout delay to reflect changing network delays.

   Applications without silence suppression MUST set the marker bit to
   zero.

   The RTP clock rate used for generating the RTP timestamp is
   independent of the number of channels and the encoding; it equals the
   number of sampling periods per second. For N-channel encodings, each
   sampling period (say, 1/8000 of a second) generates N samples. (This
   terminology is standard, but somewhat confusing, as the total number
   of samples generated per second is then the sampling rate times the
   channel count.)

   If multiple audio channels are used, channels are numbered
   left-to-right, starting at one. In RTP audio packets, information
   from lower-numbered channels precedes that from higher-numbered
   channels. For more than two channels, the convention followed by the
   AIFF-C audio interchange format SHOULD be followed [8], using the
   following notation, unless some other convention is specified for a
   particular encoding or payload format:

      l    left
      r    right
      c    center
      S    surround
      F    front
      R    rear

      channels  description   channel
                              1     2     3     4     5     6
      ________________________________________________________
      2         stereo        l     r
      3                       l     r     c
      4         quadrophonic  Fl    Fr    Rl    Rr
      4                       l     c     r     S
      5                       Fl    Fr    Fc    Sl    Sr
      6                       l     lc    c     r     rc    S

   Samples for all channels belonging to a single sampling instant MUST
   be within the same packet. The interleaving of samples from different
   channels depends on the encoding. General guidelines are given in
   Sections 4.3 and 4.4.

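   The relationship between clock rate, channel count and sample count
   described in Section 4.1 can be sketched as follows. This is an
   informal Python illustration only; the function names and the
   dictionary holding the channel-order table are assumptions made for
   this example and are not part of the profile.

```python
# Channel order per channel count, transcribed from the table above.
# Four channels have two conventions, so each count maps to a list
# of possible layouts.
CHANNEL_LAYOUTS = {
    2: [["l", "r"]],
    3: [["l", "r", "c"]],
    4: [["Fl", "Fr", "Rl", "Rr"], ["l", "c", "r", "S"]],
    5: [["Fl", "Fr", "Fc", "Sl", "Sr"]],
    6: [["l", "lc", "c", "r", "rc", "S"]],
}


def timestamp_increment(clock_rate_hz: int, packet_ms: int) -> int:
    """RTP timestamp units covered by one packet.

    The clock counts sampling periods, so the channel count does not
    appear here.
    """
    return clock_rate_hz * packet_ms // 1000


def samples_in_packet(clock_rate_hz: int, channels: int, packet_ms: int) -> int:
    """Total samples carried: one per channel per sampling period."""
    return timestamp_increment(clock_rate_hz, packet_ms) * channels


# A 20 ms stereo packet at 8000 Hz advances the timestamp by 160
# while carrying 320 samples.
stereo_increment = timestamp_increment(8000, 20)
stereo_samples = samples_in_packet(8000, 2, 20)
```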
   The sampling frequency SHOULD be drawn from the set: 8000, 11025,
   16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Older Apple
   Macintosh computers had a native sample rate of 22254.54 Hz, which
   can be converted to 22050 with acceptable quality by dropping 4
   samples in a 20 ms frame.) However, most audio encodings are defined
   for a more restricted set of sampling frequencies. Receivers SHOULD
   be prepared to accept multi-channel audio, but MAY choose to only
   play a single channel.

4.2 Operating Recommendations

   The following recommendations are default operating parameters.
   Applications SHOULD be prepared to handle other values. The ranges
   given are meant to give guidance to application writers, allowing a
   set of applications conforming to these guidelines to interoperate
   without additional negotiation. These guidelines are not intended to
   restrict operating parameters for applications that can negotiate a
   set of interoperable parameters, e.g., through a conference control
   protocol.

   For packetized audio, the default packetization interval SHOULD have
   a duration of 20 ms or one frame, whichever is longer, unless
   otherwise noted in Table 1 (column "ms/packet"). The packetization
   interval determines the minimum end-to-end delay; longer packets
   introduce less header overhead but higher delay and make packet loss
   more noticeable. For non-interactive applications such as lectures or
   for links with severe bandwidth constraints, a higher packetization
   delay MAY be used. A receiver SHOULD accept packets representing
   between 0 and 200 ms of audio data. (For framed audio encodings, a
   receiver SHOULD accept packets with a number of frames equal to 200
   ms divided by the frame duration, rounded up.) This restriction
   allows reasonable buffer sizing for the receiver.

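   The two rules in Section 4.2 (default packetization of 20 ms or one
   frame, whichever is longer, and acceptance of up to 200 ms of audio)
   reduce to simple arithmetic. The Python sketch below is an informal
   illustration; the function names are assumptions made for this
   example, and the per-encoding exceptions noted in Table 1 are not
   modelled.

```python
import math
from fractions import Fraction

DEFAULT_PACKET_MS = 20    # default packetization interval
MAX_ACCEPTED_MS = 200     # upper bound a receiver should accept


def default_packet_ms(frame_ms):
    """20 ms or one frame, whichever is longer."""
    return max(Fraction(DEFAULT_PACKET_MS), Fraction(str(frame_ms)))


def max_frames_to_accept(frame_ms):
    """200 ms divided by the frame duration, rounded up."""
    return math.ceil(Fraction(MAX_ACCEPTED_MS) / Fraction(str(frame_ms)))


# Frame durations taken from Table 1: 30 ms, 10 ms and 2.5 ms.
frames_30ms = max_frames_to_accept(30)    # 200 / 30 rounds up to 7
frames_10ms = max_frames_to_accept(10)
frames_2_5ms = max_frames_to_accept(2.5)
```

   Exact fractions are used so that frame durations such as 2.5 ms do
   not introduce rounding error before the ceiling is taken.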
4.3 Guidelines for Sample-Based Audio Encodings

   In sample-based encodings, each audio sample is represented by a
   fixed number of bits. Within the compressed audio data, codes for
   individual samples may span octet boundaries. An RTP audio packet may
   contain any number of audio samples, subject to the constraint that
   the number of bits per sample times the number of samples per packet
   yields an integral octet count. Fractional encodings produce less
   than one octet per sample.

   The duration of an audio packet is determined by the number of
   samples in the packet.

   For sample-based encodings producing one or more octets per sample,
   samples from different channels sampled at the same sampling instant
   SHOULD be packed in consecutive octets. For example, for a
   two-channel encoding, the octet sequence is (left channel, first
   sample), (right channel, first sample), (left channel, second
   sample), (right channel, second sample), .... For multi-octet
   encodings, octets SHOULD be transmitted in network byte order (i.e.,
   most significant octet first).

   The packing of sample-based encodings producing less than one octet
   per sample is encoding-specific.

   The RTP timestamp reflects the instant at which the first sample in
   the packet was sampled, that is, the oldest information in the
   packet.

4.4 Guidelines for Frame-Based Audio Encodings

   Frame-based encodings encode a fixed-length block of audio into
   another block of compressed data, typically also of fixed length. For
   frame-based encodings, the sender MAY choose to combine several such
   frames into a single RTP packet. The receiver can tell the number of
   frames contained in an RTP packet, if all the frames have the same
   length, by dividing the RTP payload length by the audio frame size
   which is defined as part of the encoding.
   This does not work when carrying frames of different sizes unless
   the frame sizes are relatively prime. If not, the frames MUST
   indicate their size.

   For frame-based codecs, the channel order is defined for the whole
   block. That is, for two-channel audio, right and left samples SHOULD
   be coded independently, with the encoded frame for the left channel
   preceding that for the right channel.

   All frame-oriented audio codecs SHOULD be able to encode and decode
   several consecutive frames within a single packet. Since the frame
   size for the frame-oriented codecs is given, there is no need to use
   a separate designation for the same encoding, but with different
   number of frames per packet.

   RTP packets SHALL contain a whole number of frames, with frames
   inserted according to age within a packet, so that the oldest frame
   (to be played first) occurs immediately after the RTP packet header.
   The RTP timestamp reflects the instant at which the first sample in
   the first frame was sampled, that is, the oldest information in the
   packet.

4.5 Audio Encodings

   The characteristics of the audio encodings described in this document
   are shown in Table 1; they are listed in order of their payload type
   in Table 4. While most audio codecs are only specified for a fixed
   sampling rate, some sample-based algorithms (indicated by an entry of
   "var." in the sampling rate column of Table 1) may be used with
   different sampling rates, resulting in different coded bit rates.
   When used with a sampling rate other than that for which a static
   payload type is defined, non-RTP means beyond the scope of this memo
   MUST be used to define a dynamic payload type and MUST indicate the
   selected RTP timestamp clock rate, which is usually the same as the
   sampling rate for audio.

      name of                              sampling              default
      encoding  sample/frame  bits/sample      rate  ms/frame  ms/packet
      __________________________________________________________________
      1016      frame         N/A             8,000        30         30
      DVI4      sample        4                var.                   20
      G722      sample        8              16,000                   20
      G723      frame         N/A             8,000        30         30
      G726-32   sample        4               8,000                   20
      G728      frame         N/A             8,000       2.5         20
      G729      frame         N/A             8,000        10         20
      G729D     frame         N/A             8,000        10         20
      G729E     frame         N/A             8,000        10         20
      GSM       frame         N/A             8,000        20         20
      GSM-HR    frame         N/A             8,000        20         20
      GSM-EFR   frame         N/A             8,000        20         20
      L8        sample        8                var.                   20
      L16       sample        16               var.                   20
      LPC       frame         N/A             8,000        20         20
      MPA       frame         N/A              var.      var.
      PCMA      sample        8                var.                   20
      PCMU      sample        8                var.                   20
      QCELP     frame         N/A             8,000        20         20
      VDVI      sample        var.             var.                   20

   Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
   variable)

4.5.1 1016

   Encoding 1016 is a frame-based encoding using code-excited linear
   prediction (CELP) and is specified in Federal Standard FED-STD 1016
   [9,10,11,12].

4.5.2 DVI4

   DVI4 is specified, with pseudo-code, in [13] as the IMA ADPCM wave
   type.

   However, the encoding defined here as DVI4 differs in three respects
   from this recommendation:

     o The RTP DVI4 header contains the predicted value rather than
       the first sample value contained in the IMA ADPCM block header.

     o IMA ADPCM blocks contain an odd number of samples, since the
       first sample of a block is contained just in the header
       (uncompressed), followed by an even number of compressed
       samples. DVI4 has an even number of compressed samples only,
       using the `predict' word from the header to decode the first
       sample.
580 o For DVI4, the 4-bit samples are packed with the first sample 581 in the four most significant bits and the second sample in the 582 four least significant bits. In the IMA ADPCM codec, the 583 samples are packed in the opposite order. 585 Each packet contains a single DVI block. This profile only defines 586 the 4-bit-per-sample version, while IMA also specifies a 3-bit-per- 587 sample encoding. 589 The "header" word for each channel has the following structure:

591   int16  predict;  /* predicted value of first sample
592                       from the previous block (L16 format) */
593   u_int8 index;    /* current index into stepsize table */
594   u_int8 reserved; /* set to zero by sender, ignored by receiver */

596 Each octet following the header contains two 4-bit samples, thus the 597 number of samples per packet MUST be even because there is no means 598 to indicate a partially filled last octet. 600 Packing of samples for multiple channels is for further study. 602 The document IMA Recommended Practices for Enhancing Digital Audio 603 Compatibility in Multimedia Systems (version 3.0) contains the 604 algorithm description. It is available from

606   Interactive Multimedia Association
607   48 Maryland Avenue, Suite 202
608   Annapolis, MD 21401-8011
609   USA
610   phone: +1 410 626-1380

612 4.5.3 G722 614 G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding 615 within 64 kbit/s". The G.722 encoder produces a stream of octets, 616 each of which SHALL be octet-aligned in an RTP packet. The first bit 617 transmitted in the G.722 octet, which is the most significant bit of 618 the higher sub-band sample, SHALL correspond to the most significant 619 bit of the octet in the RTP packet. 621 Even though the actual sampling rate for G.722 audio is 16000 Hz, the 622 RTP clock rate for the G722 payload format is 8000 Hz because that 623 value was erroneously assigned in RFC 1890 and must remain unchanged 624 for backward compatibility. The octet rate or sample-pair rate is 625 8000 Hz.
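As an informal illustration of the G722 clock-rate rule in Section 4.5.3 (not part of the payload format definition), the following C sketch derives per-packet quantities from the packet duration. The function and constant names are hypothetical. The only facts assumed are those stated above: a 16000 Hz sampling rate, an 8000 Hz RTP clock, and an octet (sample-pair) rate of 8000 Hz.

```c
#include <stdint.h>

/* Values taken from Section 4.5.3; the identifiers are illustrative only. */
enum {
    G722_SAMPLE_RATE_HZ = 16000, /* actual audio sampling rate        */
    G722_RTP_CLOCK_HZ   = 8000,  /* RTP timestamp clock (kept for     */
                                 /* backward compatibility)           */
    G722_OCTET_RATE_HZ  = 8000   /* one octet per sample pair         */
};

/* RTP timestamp increment between consecutive packets of packet_ms each. */
static uint32_t g722_timestamp_increment(uint32_t packet_ms)
{
    return (uint32_t)(G722_RTP_CLOCK_HZ / 1000) * packet_ms;
}

/* Number of payload octets carried by a packet of packet_ms. */
static uint32_t g722_payload_octets(uint32_t packet_ms)
{
    return (uint32_t)(G722_OCTET_RATE_HZ / 1000) * packet_ms;
}

/* Number of 16 kHz audio samples represented by a packet of packet_ms. */
static uint32_t g722_audio_samples(uint32_t packet_ms)
{
    return (uint32_t)(G722_SAMPLE_RATE_HZ / 1000) * packet_ms;
}
```

For the default 20 ms packet, this gives a timestamp increment of 160, a payload of 160 octets, and 320 audio samples.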
627 4.5.4 G723 629 G723 is specified in ITU-T Recommendation G.723.1, "Dual-rate speech 630 coder for multimedia communications transmitting at 5.3 and 6.3 631 kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T as 632 a mandatory codec for ITU-T H.324 GSTN videophone terminal 633 applications. The algorithm has a floating point specification in 634 Annex B to G.723.1, a silence compression algorithm in Annex A to 635 G.723.1 and an encoded signal bit-error sensitivity specification in 636 G.723.1 Annex C. 638 This Recommendation specifies a coded representation that can be used 639 for compressing the speech signal component of multi-media services 640 at a very low bit rate. Audio is encoded in 30 ms frames, with an 641 additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be 642 one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s 643 frame), or 4 octets. These 4-octet frames are called SID frames 644 (Silence Insertion Descriptor) and are used to specify comfort noise 645 parameters. There is no restriction on how 4, 20, and 24 octet frames 646 are intermixed. The least significant two bits of the first octet in 647 the frame determine the frame size and codec type:

649   bits  content                      octets/frame
650   00    high-rate speech (6.3 kb/s)  24
651   01    low-rate speech  (5.3 kb/s)  20
652   10    SID frame                     4
653   11    reserved

655 It is possible to switch between the two rates at any 30 ms frame 656 boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of 657 the encoder and decoder. This coder was optimized to represent speech 658 with near-toll quality at the above rates using a limited amount of 659 complexity. 661 The packing of the encoded bit stream into octets and the 662 transmission order of the octets are specified in Rec. G.723.1 and are 663 the same as that produced by the G.723 C code reference 664 implementation.
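The frame-size rule given earlier in this section lends itself to a short sketch. The following C fragment is illustrative only and its function names are hypothetical. It reads the two least significant bits of the first octet of each frame to find the frame length, then walks an RTP payload that may mix 24-, 20-, and 4-octet frames. Treating the reserved value as an error is an assumption of this sketch, not a requirement stated in this memo.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame length in octets selected by the two least significant bits of
 * the first octet of a G.723.1 frame; 0 is returned for the reserved
 * pattern (assumption: the caller treats that as an error). */
static size_t g7231_frame_octets(uint8_t first_octet)
{
    switch (first_octet & 0x03u) {
    case 0x00: return 24; /* high-rate speech, 6.3 kb/s */
    case 0x01: return 20; /* low-rate speech, 5.3 kb/s  */
    case 0x02: return 4;  /* SID frame                  */
    default:   return 0;  /* reserved                   */
    }
}

/* Count the frames in an RTP payload. Returns -1 if a reserved header
 * is met or the last frame is truncated. */
static int g7231_count_frames(const uint8_t *payload, size_t len)
{
    size_t off = 0;
    int frames = 0;

    while (off < len) {
        size_t flen = g7231_frame_octets(payload[off]);
        if (flen == 0 || flen > len - off)
            return -1;
        off += flen;
        frames++;
    }
    return frames;
}
```

Because each frame announces its own size, the receiver does not need the frames in a packet to share one length.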
For the 6.3 kb/s data rate, this packing is 665 illustrated as follows, where the header (HDR) bits are always "0 0" 666 as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit 667 is always set to zero. The diagrams show the bit packing in "network 668 byte order," also known as big-endian order. The bits of each 32-bit 669 word are numbered 0 to 31, with the most significant bit on the left 670 and numbered 0. The octets (bytes) of each word are transmitted most 671 significant octet first. The bits of each data field are numbered in 672 the order of the bit stream representation of the encoding (least 673 significant bit first). The vertical bars indicate the boundaries 674 between field fragments. 676 0 1 2 3 677 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 678 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 679 | LPC |HDR| LPC | LPC | ACL0 |LPC| 680 | | | | | | | 681 |0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2| 682 |5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2| 683 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 684 | ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 | 685 | | 1 |C| | 3 | 2 | | | 686 |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0| 687 |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0| 688 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 689 | GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 | 690 | | | | | | | 691 |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0| 692 |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8| 693 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 694 | MSBPOS |Z|POS| MSBPOS | POS0 |POS| POS0 | 695 | | | 0 | | | 1 | | 696 |0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1| 697 |6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0| 698 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 699 | 
POS1 | POS2 | POS1 | POS2 | POS3 | POS2 | 700 | | | | | | | 701 |0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1| 702 |9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2| 703 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 704 | POS3 | PSIG0 |POS|PSIG2| PSIG1 | PSIG3 |PSIG2| 705 | | | 3 | | | | | 706 |1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0| 707 |1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3| 708 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 710 Figure 1: G.723 (6.3 kb/s) bit packing 712 For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1", 713 as shown in Fig. 2, to indicate operation at 5.3 kb/s. 715 0 1 2 3 716 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 717 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 718 | LPC |HDR| LPC | LPC | ACL0 |LPC| 719 | | | | | | | 720 |0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2| 721 |5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2| 722 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 723 | ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 | 724 | | 1 |C| | 3 | 2 | | | 725 |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0| 726 |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0| 727 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 728 | GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 | 729 | | | | | | | 730 |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0| 731 |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|4 3 2 1|1 0 9 8| 732 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 733 | POS0 | POS1 | POS0 | POS1 | POS2 | 734 | | | | | | 735 |0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0| 736 |7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0| 737 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 738 | POS3 | POS2 
| POS3 | PSIG1 | PSIG0 | PSIG3 | PSIG2 | 739 | | | | | | | | 740 |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0| 741 |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0| 742 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 744 Figure 2: G.723 (5.3 kb/s) bit packing 746 The packing of G.723.1 SID (silence) frames, which are indicated by 747 the header (HDR) bits having the pattern "1 0", is depicted in Fig. 748 3. 750 0 1 2 3 751 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 752 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 753 | LPC |HDR| LPC | LPC | GAIN |LPC| 754 | | | | | | | 755 |0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2| 756 |5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2| 757 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 759 Figure 3: G.723 SID mode bit packing 761 4.5.5 G726-32 763 ITU-T Recommendation G.726 describes, among others, the algorithm 764 recommended for conversion of a single 64 kbit/s A-law or mu-law PCM 765 channel encoded at 8000 samples/sec to and from a 32 kbit/s channel. 766 The conversion is applied to the PCM stream using an Adaptive 767 Differential Pulse Code Modulation (ADPCM) transcoding technique. 768 G.726 describes codecs operating at 16 kb/s (2 bits/sample), 24 kb/s 769 (3 bits/sample), 32 kb/s (4 bits/sample), 40 kb/s (5 bits/sample). 770 Packetization is specified here only for the 32 kb/s encoding which 771 is labeled G726-32. 773 Note: In 1990, ITU-T Recommendation G.721 was merged with 774 Recommendation G.723 into ITU-T Recommendation G.726. Thus, G726-32 775 designates the same algorithm as G721 in RFC 1890. 777 No payload-specific header information SHALL be included as part of 778 the audio data. 
The 4-bit code words of the G726-32 encoding MUST be 779 packed into octets as follows: the first code word is placed in the 780 four least significant bits of the first octet, with the least 781 significant bit of the code word in the least significant bit of the 782 octet; the second code word is placed in the four most significant 783 bits of the first octet, with the most significant bit of the code 784 word in the most significant bit of the octet. Subsequent pairs of 785 the code words SHALL be packed in the same way into successive 786 octets, with the first code word of each pair placed in the least 787 significant four bits of the octet. The number of samples per packet 788 MUST be even because there is no means to indicate a partially filled 789 last octet. 791 4.5.6 G728 793 G728 is specified in ITU-T Recommendation G.728, "Coding of speech at 794 16 kbit/s using low-delay code excited linear prediction". 796 A G.728 encoder translates 5 consecutive audio samples into a 10-bit 797 codebook index, resulting in a bit rate of 16 kb/s for audio sampled 798 at 8,000 samples per second. The group of five consecutive samples is 799 called a vector. Four consecutive vectors, labeled V1 to V4 (where V1 800 is to be played first by the receiver), build one G.728 frame. The 801 four vectors of 40 bits are packed into 5 octets, labeled B1 through 802 B5. B1 SHALL be placed first in the RTP packet. 804 Referring to the figure below, the principle for bit order is 805 "maintenance of bit significance". Bits from an older vector are more 806 significant than bits from newer vectors. The MSB of the frame goes 807 to the MSB of B1 and the LSB of the frame goes to the LSB of B5.
809 1 2 3 3 810 0 0 0 0 9 811 ++++++++++++++++++++++++++++++++++++++++ 812 <---V1---><---V2---><---V3---><---V4---> vectors 813 <--B1--><--B2--><--B3--><--B4--><--B5--> octets 814 <------------- frame 1 ----------------> 816 In particular, B1 contains the eight most significant bits of V1, 817 with the MSB of V1 being the MSB of B1. B2 contains the two least 818 significant bits of V1, the more significant of the two in its MSB, 819 and the six most significant bits of V2. B1 SHALL be placed first in 820 the RTP packet and B5 last. 822 4.5.7 G729 824 G729 is specified in ITU-T Recommendation G.729, "Coding of speech at 825 8 kbit/s using conjugate structure-algebraic code excited linear 826 prediction (CS-ACELP)". A reduced-complexity version of the G.729 827 algorithm is specified in Annex A to Rec. G.729. The speech coding 828 algorithms in the main body of G.729 and in G.729 Annex A are fully 829 interoperable with each other, so there is no need to further 830 distinguish between them. The G.729 and G.729 Annex A codecs were 831 optimized to represent speech with high quality, where G.729 Annex A 832 trades some speech quality for an approximate 50% complexity 833 reduction [14]. See the next Section (4.5.8) for other data rates 834 added in later G.729 Annexes. For all data rates, the sampling 835 frequency (and RTP timestamp clock rate) is 8000 Hz. 837 A voice activity detector (VAD) and comfort noise generator (CNG) 838 algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous 839 voice and data applications and can be used in conjunction with G.729 840 or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets, 841 while the G.729 Annex B comfort noise frame occupies 2 octets. 843 A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A 844 frames, followed by zero or one G.729 Annex B frames. The presence of 845 a comfort noise frame can be deduced from the length of the RTP 846 payload. 
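The length-based deduction can be sketched as follows. This C fragment is illustrative only, with hypothetical names. It assumes the frame sizes given above (10 octets per G.729 or G.729 Annex A frame, 2 octets per Annex B comfort noise frame) and rejects any other payload length.

```c
#include <stddef.h>

/* Result of splitting a G729 payload by length alone (names are
 * illustrative, not defined by this memo). */
struct g729_layout {
    size_t speech_frames;     /* number of 10-octet speech frames      */
    int    has_comfort_noise; /* 1 if a 2-octet Annex B frame follows  */
};

/* Returns 0 on success, -1 if the length cannot be a sequence of
 * 10-octet frames optionally followed by one 2-octet frame. */
static int g729_parse_length(size_t payload_len, struct g729_layout *out)
{
    size_t rem = payload_len % 10;

    if (rem != 0 && rem != 2)
        return -1;
    out->speech_frames = payload_len / 10;
    out->has_comfort_noise = (rem == 2);
    return 0;
}
```

A 22-octet payload, for example, is read as two speech frames followed by one comfort noise frame.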
The default packetization interval is 20 ms (two frames), 847 but in some situations it may be desirable to send 10 ms packets. An 848 example would be a transition from speech to comfort noise in the 849 first 10 ms of the packet. For some applications, a longer 850 packetization interval may be required to reduce the packet rate. 852 The transmitted parameters of a G.729/G.729A 10-ms frame, consisting 853 of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The 854 mapping of these parameters is given below in Fig. 4. The 855 diagrams show the bit packing in "network byte order," also known as 856 big-endian order. The bits of each 32-bit word are numbered 0 to 31, 857 with the most significant bit on the left and numbered 0. The octets 858 (bytes) of each word are transmitted most significant octet first. 859 The bits of each data field are numbered in the order as produced by 860 the G.729 C code reference implementation.

862    0                   1                   2                   3
863    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
864   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
865   |L|      L1     |    L2   |    L3   |       P1      |P|    C1   |
866   |0|             |         |         |               |0|         |
867   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
868   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
869   |       C1      |   S1  | GA1 |  GB1  |    P2   |      C2       |
870   |          1 1 1|       |     |       |         |               |
871   |5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
872   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
873   |   C2    |   S2  | GA2 |  GB2  |
874   |    1 1 1|       |     |       |
875   |8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|
876   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

878   Figure 4: G.729 and G.729A bit packing

880 The packing of the G.729 Annex B comfort noise frame is shown in Fig. 881 5.
883 0 1 884 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 885 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 886 |L| LSF1 | LSF2 | GAIN |R| 887 |S| | | |E| 888 |F| | | |S| 889 |0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V| RESV = Reserved (zero) 890 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 892 Figure 5: G.729 Annex B bit packing 894 4.5.8 G729D and G729E 896 Annexes D and E to ITU-T Recommendation G.729 provide additional data 897 rates. Because the data rate is not signaled in the bitstream, the 898 different data rates are given distinct RTP encoding names which are 899 mapped to distinct payload type numbers. G729D indicates a 6.4 kbit/s 900 coding mode (G.729 Annex D, for momentary reduction in channel 901 capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E, 902 for improved performance with a wide range of narrow-band input 903 signals, e.g. music and background noise). Annex E has two operating 904 modes, backward adaptive and forward adaptive, which are signaled by 905 the first two bits in each frame (the most significant two bits of 906 the first octet). 908 The voice activity detector (VAD) and comfort noise generator (CNG) 909 algorithm specified in Annex B of G.729 may be used with Annex D and 910 Annex E frames in addition to G.729 and G.729 Annex A frames. The 911 algorithm details for the operation of Annexes D and E with the Annex 912 B CNG are specified in G.729 Annexes F and G. Note that Annexes F and 913 G do not introduce any new encodings. 915 For G729D, an RTP packet may consist of zero or more G.729 Annex D 916 frames, followed by zero or one G.729 Annex B frame. Similarly, for 917 G729E, an RTP packet may consist of zero or more G.729 Annex E 918 frames, followed by zero or one G.729 Annex B frame. The presence of 919 a comfort noise frame can be deduced from the length of the RTP 920 payload. 922 A single RTP packet must contain frames of only one data rate, 923 optionally followed by one comfort noise frame. 
The data rate may be 924 changed from packet to packet by changing the payload type number. 925 G.729 Annexes D, E and H describe what the encoding and decoding 926 algorithms must do to accommodate a change in data rate. 928 For G729D, the bits of a G.729 Annex D frame are formatted as shown 929 below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits. 931 0 1 2 3 932 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 933 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 934 |L| L1 | L2 | L3 | P1 | C1 | 935 |0| | | | | | 936 | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5| 937 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 938 | C1 |S1 | GA1 | GB1 | P2 | C2 |S2 | GA2 | GB2 | 939 | | | | | | | | | | 940 |6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2| 941 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 943 Figure 6: G.729 Annex D bit packing 945 The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a 946 total of 118 bits are used. Two bits are appended as "don't care" 947 bits to complete an integer number of octets for the frame. For 948 G729E, the bits of a data frame are formatted as shown in the next 949 two diagrams (cf. Table E.1/G.729). The fields for the G729E forward 950 adaptive mode are packed as shown in Fig. 7. 
952 0 1 2 3 953 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 954 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 955 |0 0|L| L1 | L2 | L3 | P1 |P| C0_1| 956 | |0| | | | |0| | 957 | | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2| 958 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 959 | | C1_1 | C2_1 | C3_1 | C4_1 | 960 | | | | | | 961 |3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6| 962 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 963 | GA1 | GB1 | P2 | C0_2 | C1_2 | C2_2 | 964 | | | | | | | 965 |0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5| 966 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 967 | | C3_2 | C4_2 | GA2 | GB2 |DC | 968 | | | | | | | 969 |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| 970 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 972 Figure 7: G.729 Annex E (forward adaptive mode) bit packing 974 The fields for the G729E backward adaptive mode are packed as shown 975 in Fig. 8. 
977 0 1 2 3 978 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 979 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 980 |1 1| P1 |P| C0_1 | C1_1 | 981 | | |0| 1 1 1| | 982 | |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7| 983 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 984 | | C2_1 | C3_1 | C4_1 |GA1 | GB1 |P2 | 985 | | | | | | | | 986 |8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| 987 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 988 | | C0_2 | C1_2 | C2_2 | 989 | | 1 1 1| | | 990 |2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5| 991 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 992 | | C3_2 | C4_2 | GA2 | GB2 |DC | 993 | | | | | | | 994 |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| 995 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 997 Figure 8: G.729 Annex E (backward adaptive mode) bit packing 999 4.5.9 GSM 1001 GSM (group speciale mobile) denotes the European GSM 06.10 standard 1002 for full-rate speech transcoding, ETS 300 961, which is based on 1003 RPE/LTP (residual pulse excitation/long term prediction) coding at a 1004 rate of 13 kb/s [15,16,17]. The text of the standard can be obtained 1005 from 1007 ETSI (European Telecommunications Standards Institute) 1008 ETSI Secretariat: B.P.152 1009 F-06561 Valbonne Cedex 1010 France 1011 Phone: +33 92 94 42 00 1012 Fax: +33 93 65 47 16 1014 Blocks of 160 audio samples are compressed into 33 octets, for an 1015 effective data rate of 13,200 b/s. 1017 4.5.9.1 General Packaging Issues 1019 The GSM standard (ETS 300 961) specifies the bit stream produced by 1020 the codec, but does not specify how these bits should be packed for 1021 transmission. The packetization specified here has subsequently been 1022 adopted in ETSI Technical Specification TS 101 318. Some software 1023 implementations of the GSM codec use a different packing than that 1024 specified here. 
1026 In the GSM packing used by RTP, the bits SHALL be packed beginning 1027 from the most significant bit. Every 160-sample GSM frame is coded 1028 into one 33-octet (264-bit) buffer. Every such buffer begins with a 4-bit 1029 signature (0xD), followed by the MSB encoding of the fields of 1030 the frame. The first octet thus contains 1101 in the 4 most 1031 significant bits (0-3) and the 4 most significant bits of F1 (0-3) in 1032 the 4 least significant bits (4-7). The second octet contains the 2 1033 least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so 1034 on. The order of the fields in the frame is described in Table 2. 1036 4.5.9.2 GSM variable names and numbers 1038 In the RTP encoding we have the bit pattern described in Table 3, 1039 where F.i signifies the ith bit of the field F, bit 0 is the most 1040 significant bit, and the bits of every octet are numbered from 0 to 7 1041 from most to least significant. 1043 4.5.10 GSM-HR 1045 GSM-HR denotes GSM 06.20 half rate speech transcoding, specified in 1046 ETS 300 969 which is available from ETSI at the address given in 1047 Section 4.5.9. This codec has a frame length of 112 bits (14 octets). 1048 Packing of the fields in the codec bit stream into octets for 1049 transmission in RTP is done in a manner similar to that specified 1050 here for the original GSM 06.10 codec and is specified in ETSI 1051 Technical Specification TS 101 318. 1053 4.5.11 GSM-EFR 1055 GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding, 1056 specified in ETS 300 726 which is available from ETSI at the address 1057 given in Section 4.5.9. This codec has a frame length of 244 bits. 1058 For transmission in RTP, each codec frame is packed into a 31-octet 1059 (248-bit) buffer beginning with a 4-bit signature 0xC in a manner 1060 similar to that specified here for the original GSM 06.10 codec. The 1061 packing is specified in ETSI Technical Specification TS 101 318.
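The GSM 06.10 packing rule of Section 4.5.9.1 can be sketched in C as shown below. This is an informal illustration with hypothetical function names, not a reference implementation. It assumes the caller supplies the 76 codec parameters in the order of Table 2, each held in the low-order bits of a 16-bit value, and it uses the field widths listed in that table.

```c
#include <stdint.h>
#include <string.h>

#define GSM_FRAME_OCTETS 33 /* 264 bits: 4-bit signature + 260 data bits */
#define GSM_NUM_FIELDS   76 /* fields 1..76 of Table 2, 0-based here      */

/* Width in bits of field i (0-based), following Table 2: eight LARc
 * fields, then four sub-frames of Nc, bc, Mc, xmaxc and 13 xmc values. */
static int gsm_field_bits(int i)
{
    static const uint8_t larc_bits[8] = { 6, 6, 5, 5, 4, 4, 3, 3 };
    int k;

    if (i < 8)
        return larc_bits[i];
    k = (i - 8) % 17;         /* position inside one sub-frame */
    if (k == 0)
        return 7;             /* Nc    */
    if (k == 1 || k == 2)
        return 2;             /* bc, Mc */
    if (k == 3)
        return 6;             /* xmaxc */
    return 3;                 /* xmc   */
}

/* Append the nbits low-order bits of value, most significant bit first. */
static void put_bits(uint8_t *buf, unsigned *bitpos, unsigned value, int nbits)
{
    int b;

    for (b = nbits - 1; b >= 0; b--) {
        if ((value >> b) & 1u)
            buf[*bitpos / 8] |= (uint8_t)(0x80u >> (*bitpos % 8));
        (*bitpos)++;
    }
}

/* Pack one frame: signature 0xD, then every field MSB first. */
static void gsm_pack_frame(const uint16_t fields[GSM_NUM_FIELDS],
                           uint8_t out[GSM_FRAME_OCTETS])
{
    unsigned bitpos = 0;
    int i;

    memset(out, 0, GSM_FRAME_OCTETS);
    put_bits(out, &bitpos, 0xD, 4);
    for (i = 0; i < GSM_NUM_FIELDS; i++)
        put_bits(out, &bitpos, fields[i], gsm_field_bits(i));
}
```

With the first field (LARc[0]) set to all ones and the second set to zero, the first two octets come out as 0xDF and 0xC0, matching the description of the first and second octets in Section 4.5.9.1.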
1063 4.5.12 L8 1065 L8 denotes linear audio data samples, using 8 bits of precision with 1066 an offset of 128, that is, the most negative signal is encoded as 1067 zero.

1069   field  field name   bits   field  field name   bits
1070   ___________________________________________________
1071   1      LARc[0]      6      39     xmc[22]      3
1072   2      LARc[1]      6      40     xmc[23]      3
1073   3      LARc[2]      5      41     xmc[24]      3
1074   4      LARc[3]      5      42     xmc[25]      3
1075   5      LARc[4]      4      43     Nc[2]        7
1076   6      LARc[5]      4      44     bc[2]        2
1077   7      LARc[6]      3      45     Mc[2]        2
1078   8      LARc[7]      3      46     xmaxc[2]     6
1079   9      Nc[0]        7      47     xmc[26]      3
1080   10     bc[0]        2      48     xmc[27]      3
1081   11     Mc[0]        2      49     xmc[28]      3
1082   12     xmaxc[0]     6      50     xmc[29]      3
1083   13     xmc[0]       3      51     xmc[30]      3
1084   14     xmc[1]       3      52     xmc[31]      3
1085   15     xmc[2]       3      53     xmc[32]      3
1086   16     xmc[3]       3      54     xmc[33]      3
1087   17     xmc[4]       3      55     xmc[34]      3
1088   18     xmc[5]       3      56     xmc[35]      3
1089   19     xmc[6]       3      57     xmc[36]      3
1090   20     xmc[7]       3      58     xmc[37]      3
1091   21     xmc[8]       3      59     xmc[38]      3
1092   22     xmc[9]       3      60     Nc[3]        7
1093   23     xmc[10]      3      61     bc[3]        2
1094   24     xmc[11]      3      62     Mc[3]        2
1095   25     xmc[12]      3      63     xmaxc[3]     6
1096   26     Nc[1]        7      64     xmc[39]      3
1097   27     bc[1]        2      65     xmc[40]      3
1098   28     Mc[1]        2      66     xmc[41]      3
1099   29     xmaxc[1]     6      67     xmc[42]      3
1100   30     xmc[13]      3      68     xmc[43]      3
1101   31     xmc[14]      3      69     xmc[44]      3
1102   32     xmc[15]      3      70     xmc[45]      3
1103   33     xmc[16]      3      71     xmc[46]      3
1104   34     xmc[17]      3      72     xmc[47]      3
1105   35     xmc[18]      3      73     xmc[48]      3
1106   36     xmc[19]      3      74     xmc[49]      3
1107   37     xmc[20]      3      75     xmc[50]      3
1108   38     xmc[21]      3      76     xmc[51]      3

1110   Table 2: Ordering of GSM variables

1112 4.5.13 L16 1114 L16 denotes uncompressed audio data samples, using 16-bit signed 1115 representation with 65535 equally divided steps between minimum and 1154 maximum signal level, ranging from -32768 to 32767. The value is 1155 represented in two's complement notation and transmitted in network 1156 byte order (most significant byte first).

1116   Octet  Bit 0    Bit 1    Bit 2    Bit 3    Bit 4    Bit 5    Bit 6    Bit 7
1117   _____________________________________________________________________________
1118   0      1        1        0        1        LARc0.0  LARc0.1  LARc0.2  LARc0.3
1119   1      LARc0.4  LARc0.5  LARc1.0  LARc1.1  LARc1.2  LARc1.3  LARc1.4  LARc1.5
1120   2      LARc2.0  LARc2.1  LARc2.2  LARc2.3  LARc2.4  LARc3.0  LARc3.1  LARc3.2
1121   3      LARc3.3  LARc3.4  LARc4.0  LARc4.1  LARc4.2  LARc4.3  LARc5.0  LARc5.1
1122   4      LARc5.2  LARc5.3  LARc6.0  LARc6.1  LARc6.2  LARc7.0  LARc7.1  LARc7.2
1123   5      Nc0.0    Nc0.1    Nc0.2    Nc0.3    Nc0.4    Nc0.5    Nc0.6    bc0.0
1124   6      bc0.1    Mc0.0    Mc0.1    xmaxc00  xmaxc01  xmaxc02  xmaxc03  xmaxc04
1125   7      xmaxc05  xmc0.0   xmc0.1   xmc0.2   xmc1.0   xmc1.1   xmc1.2   xmc2.0
1126   8      xmc2.1   xmc2.2   xmc3.0   xmc3.1   xmc3.2   xmc4.0   xmc4.1   xmc4.2
1127   9      xmc5.0   xmc5.1   xmc5.2   xmc6.0   xmc6.1   xmc6.2   xmc7.0   xmc7.1
1128   10     xmc7.2   xmc8.0   xmc8.1   xmc8.2   xmc9.0   xmc9.1   xmc9.2   xmc10.0
1129   11     xmc10.1  xmc10.2  xmc11.0  xmc11.1  xmc11.2  xmc12.0  xmc12.1  xmc12.2
1130   12     Nc1.0    Nc1.1    Nc1.2    Nc1.3    Nc1.4    Nc1.5    Nc1.6    bc1.0
1131   13     bc1.1    Mc1.0    Mc1.1    xmaxc10  xmaxc11  xmaxc12  xmaxc13  xmaxc14
1132   14     xmaxc15  xmc13.0  xmc13.1  xmc13.2  xmc14.0  xmc14.1  xmc14.2  xmc15.0
1133   15     xmc15.1  xmc15.2  xmc16.0  xmc16.1  xmc16.2  xmc17.0  xmc17.1  xmc17.2
1134   16     xmc18.0  xmc18.1  xmc18.2  xmc19.0  xmc19.1  xmc19.2  xmc20.0  xmc20.1
1135   17     xmc20.2  xmc21.0  xmc21.1  xmc21.2  xmc22.0  xmc22.1  xmc22.2  xmc23.0
1136   18     xmc23.1  xmc23.2  xmc24.0  xmc24.1  xmc24.2  xmc25.0  xmc25.1  xmc25.2
1137   19     Nc2.0    Nc2.1    Nc2.2    Nc2.3    Nc2.4    Nc2.5    Nc2.6    bc2.0
1138   20     bc2.1    Mc2.0    Mc2.1    xmaxc20  xmaxc21  xmaxc22  xmaxc23  xmaxc24
1139   21     xmaxc25  xmc26.0  xmc26.1  xmc26.2  xmc27.0  xmc27.1  xmc27.2  xmc28.0
1140   22     xmc28.1  xmc28.2  xmc29.0  xmc29.1  xmc29.2  xmc30.0  xmc30.1  xmc30.2
1141   23     xmc31.0  xmc31.1  xmc31.2  xmc32.0  xmc32.1  xmc32.2  xmc33.0  xmc33.1
1142   24     xmc33.2  xmc34.0  xmc34.1  xmc34.2  xmc35.0  xmc35.1  xmc35.2  xmc36.0
1143   25     xmc36.1  xmc36.2  xmc37.0  xmc37.1  xmc37.2  xmc38.0  xmc38.1  xmc38.2
1144   26     Nc3.0    Nc3.1    Nc3.2    Nc3.3    Nc3.4    Nc3.5    Nc3.6    bc3.0
1145   27     bc3.1    Mc3.0    Mc3.1    xmaxc30  xmaxc31  xmaxc32  xmaxc33  xmaxc34
1146   28     xmaxc35  xmc39.0  xmc39.1  xmc39.2  xmc40.0  xmc40.1  xmc40.2  xmc41.0
1147   29     xmc41.1  xmc41.2  xmc42.0  xmc42.1  xmc42.2  xmc43.0  xmc43.1  xmc43.2
1148   30     xmc44.0  xmc44.1  xmc44.2  xmc45.0  xmc45.1  xmc45.2  xmc46.0  xmc46.1
1149   31     xmc46.2  xmc47.0  xmc47.1  xmc47.2  xmc48.0  xmc48.1  xmc48.2  xmc49.0
1150   32     xmc49.1  xmc49.2  xmc50.0  xmc50.1  xmc50.2  xmc51.0  xmc51.1  xmc51.2

1152   Table 3: GSM payload format

1158 4.5.14 LPC 1160 LPC designates an experimental linear predictive encoding contributed 1161 by Ron Frederick, which is based on an implementation written by Ron 1162 Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The 1163 codec generates 14 octets for every frame. The frame size is set to 20 1164 ms, resulting in a bit rate of 5,600 b/s. 1166 4.5.15 MPA 1168 MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary 1169 streams. The encoding is defined in ISO standards ISO/IEC 11172-3 1170 and 13818-3. The encapsulation is specified in RFC 2250 [18]. 1172 The encoding may be at any of three levels of complexity, called 1173 Layer I, II and III. The selected layer as well as the sampling rate 1174 and channel count are indicated in the payload. The RTP timestamp 1175 clock rate is always 90000, independent of the sampling rate. MPEG-1 1176 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC 1177 11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of 16, 1178 22.05 and 24 kHz. The number of samples per frame is fixed, but the 1179 frame size will vary with the sampling rate and bit rate. 1181 4.5.16 PCMA and PCMU 1183 PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio data 1184 is encoded as eight bits per sample, after logarithmic scaling. PCMU 1185 denotes mu-law scaling, PCMA A-law scaling. A detailed description is 1186 given by Jayant and Noll [19]. Each G.711 octet SHALL be octet- 1187 aligned in an RTP packet.
The sign bit of each G.711 octet SHALL 1188 correspond to the most significant bit of the octet in the RTP packet 1189 (i.e., assuming the G.711 samples are handled as octets on the host 1190 machine, the sign bit SHALL be the most significant bit of the octet 1191 as defined by the host machine format). The 56 kb/s and 48 kb/s modes 1192 of G.711 are not applicable to RTP, since PCMA and PCMU MUST always 1193 be transmitted as 8-bit samples. 1195 4.5.17 QCELP 1197 The Electronic Industries Association (EIA) & Telecommunications 1198 Industry Association (TIA) standard IS-733, "TR45: High Rate Speech 1199 Service Option for Wideband Spread Spectrum Communications Systems," 1200 defines the QCELP audio compression algorithm for use in wireless 1201 CDMA applications. The QCELP CODEC compresses each 20 milliseconds of 1202 8000 Hz, 16-bit sampled input speech into one of four different size 1203 output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4 (54 1204 bits) or Rate 1/8 (20 bits). For typical speech patterns, this 1205 results in an average output of 6.8 kbit/s for normal mode and 1206 4.7 kbit/s for reduced rate mode. The packetization of the QCELP 1207 audio codec is described in [20]. 1209 4.5.18 RED 1210 The redundant audio payload format "RED" is specified by RFC 2198 1211 [21]. It defines a means by which multiple redundant copies of an 1212 audio packet may be transmitted in a single RTP stream. Each packet 1213 in such a stream contains, in addition to the audio data for that 1214 packetization interval, a (more heavily compressed) copy of the data 1215 from a previous packetization interval. This allows an approximation 1216 of the data from lost packets to be recovered upon decoding of a 1217 subsequent packet, giving much improved sound quality when compared 1218 with silence substitution for lost packets. 1220 4.5.19 VDVI 1222 VDVI is a variable-rate version of DVI4, yielding speech bit rates of 1223 between 10 and 25 kb/s.
It is specified for single-channel operation 1224 only. Samples are packed into octets starting at the most- 1225 significant bit. The last octet is padded with 1 bits if the last 1226 sample does not fill the last octet. This padding is distinct from 1227 the valid codewords. The receiver needs to detect the padding 1228 because there is no explicit count of samples in the packet. 1230 It uses the following encoding:
1232 DVI4 codeword    VDVI bit pattern
1233 _______________________________
1234  0               00
1235  1               010
1236  2               1100
1237  3               11100
1238  4               111100
1239  5               1111100
1240  6               11111100
1241  7               11111110
1242  8               10
1243  9               011
1244 10               1101
1245 11               11101
1246 12               111101
1247 13               1111101
1248 14               11111101
1249 15               11111111
1251 5 Video 1253 The following sections describe the video encodings that are defined 1254 in this memo and give their abbreviated names used for 1255 identification. These video encodings and their payload types are 1256 listed in Table 5. 1258 All of these video encodings use an RTP timestamp frequency of 90,000 1259 Hz, the same as the MPEG presentation time stamp frequency. This 1260 frequency yields exact integer timestamp increments for the typical 1261 24 (HDTV), 25 (PAL), and 29.97 (NTSC) and 30 Hz (HDTV) frame rates 1262 and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED 1263 rate for future video encodings used within this profile, other rates 1264 MAY be used. However, it is not sufficient to use the video frame 1265 rate (typically between 15 and 30 Hz) because that does not provide 1266 adequate resolution for typical synchronization requirements when 1267 calculating the RTP timestamp corresponding to the NTP timestamp in 1268 an RTCP SR packet. The timestamp resolution MUST also be sufficient 1269 for the jitter estimate contained in the receiver reports. 1271 For most of these video encodings, the RTP timestamp encodes the 1272 sampling instant of the video image contained in the RTP data packet.
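As a minimal sketch of the 90 kHz arithmetic described above (an illustration, not part of this memo), the following Python fragment computes the timestamp increment per frame for the frame rates listed. The names VIDEO_CLOCK_HZ and timestamp_increment are hypothetical, and treating the nominal 29.97 Hz NTSC rate as exactly 30000/1001 Hz is an assumption of the sketch.

```python
from fractions import Fraction

# RTP timestamp frequency used by the video encodings in this profile.
VIDEO_CLOCK_HZ = 90_000


def timestamp_increment(frame_rate_hz: Fraction) -> Fraction:
    """Return the number of RTP timestamp ticks between consecutive frames."""
    return Fraction(VIDEO_CLOCK_HZ) / frame_rate_hz


# Assumption: the nominal 29.97 Hz NTSC rate is exactly 30000/1001 Hz.
FRAME_RATES = {
    "24 Hz (HDTV)": Fraction(24),
    "25 Hz (PAL)": Fraction(25),
    "29.97 Hz (NTSC)": Fraction(30000, 1001),
    "30 Hz (HDTV)": Fraction(30),
}

if __name__ == "__main__":
    for name, rate in FRAME_RATES.items():
        ticks = timestamp_increment(rate)
        # A denominator of 1 means the increment is an exact integer.
        print(name, ticks, ticks.denominator == 1)
```

Exact rational arithmetic is used here so that an integer result is not an artifact of floating-point rounding.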
1273 If a video image occupies more than one packet, the timestamp is the 1274 same on all of those packets. Packets from different video images are 1275 distinguished by their different timestamps. 1277 Most of these video encodings also specify that the marker bit of the 1278 RTP header SHOULD be set to one in the last packet of a video frame 1279 and otherwise set to zero. Thus, it is not necessary to wait for a 1280 following packet with a different timestamp to detect that a new 1281 frame should be displayed. 1283 5.1 BT656 1285 The encoding is specified in ITU-R Recommendation BT.656-3, 1286 "Interfaces for Digital Component Video Signals in 525-Line and 625- 1287 Line Television Systems operating at the 4:2:2 Level of 1288 Recommendation ITU-R BT.601 (Part A)". The packetization and RTP- 1289 specific properties are described in RFC 2431 [22]. 1291 5.2 CelB 1293 The CELL-B encoding is a proprietary encoding proposed by Sun 1294 Microsystems. The byte stream format is described in RFC 2029 [23]. 1296 5.3 JPEG 1298 The encoding is specified in ISO Standards 10918-1 and 10918-2. The 1299 RTP payload format is as specified in RFC 2435 [24]. 1301 5.4 H261 1303 The encoding is specified in ITU-T Recommendation H.261, "Video codec 1304 for audiovisual services at p x 64 kbit/s". The packetization and 1305 RTP-specific properties are described in RFC 2032 [25]. 1307 5.5 H263 1309 The encoding is specified in the 1996 version of ITU-T Recommendation 1310 H.263, "Video coding for low bit rate communication". The 1311 packetization and RTP-specific properties are described in RFC 2190 1312 [26]. The H263-1998 payload format is RECOMMENDED over this one for 1313 use by new implementations. 1315 5.6 H263-1998 1317 The encoding is specified in the 1998 version of ITU-T Recommendation 1318 H.263, "Video coding for low bit rate communication". The 1319 packetization and RTP-specific properties are described in RFC 2429 1320 [27]. 
Because the 1998 version of H.263 is a superset of the 1996 1321 syntax, this payload format can also be used with the 1996 version of 1322 H.263, and is RECOMMENDED for this use by new implementations. This 1323 payload format does not replace RFC 2190, which continues to be used 1324 by existing implementations, and may be required for backward 1325 compatibility in new implementations. Implementations using the new 1326 features of the 1998 version of H.263 MUST use the payload format 1327 described in RFC 2429. 1329 5.7 MPV 1331 MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary 1332 streams as specified in ISO Standards ISO/IEC 11172 and 13818-2, 1333 respectively. The RTP payload format is as specified in RFC 2250 1334 [18], Section 3. 1336 5.8 MP2T 1338 MP2T designates the use of MPEG-2 transport streams, for either audio 1339 or video. The RTP payload format is described in RFC 2250 [18], 1340 Section 2. 1342 5.9 MP1S 1344 MP1S designates an MPEG-1 systems stream, encapsulated according to 1345 RFC 2250 [18]. 1347 5.10 MP2P 1349 MP2P designates an MPEG-2 program stream, encapsulated according to 1350 RFC 2250 [18]. 1352 5.11 BMPEG 1354 BMPEG designates an experimental payload format for MPEG-1 and MPEG-2 1355 which specifies bundled (multiplexed) transport of audio and video 1356 elementary streams in one RTP stream as an alternative to the MP1S 1357 and MP2P formats. The packetization is described in RFC 2343 [28]. 1359 5.12 nv 1361 The encoding is implemented in the program `nv', version 4, developed 1362 at Xerox PARC by Ron Frederick. Further information is available from 1363 the author: 1365 Ron Frederick 1366 Entera, Inc. 1367 40971 Encyclopedia Circle 1368 Fremont, CA 94538 1369 United States 1370 electronic mail: ronf@entera.com 1372 6 Payload Type Definitions 1374 Tables 4 and 5 define this profile's static payload type values for 1375 the PT field of the RTP data header.
In addition, payload type 1376 values in the range 96-127 MAY be defined dynamically through a 1377 conference control protocol, which is beyond the scope of this 1378 document. For example, a session directory could specify that for a 1379 given session, payload type 96 indicates PCMU encoding, 8,000 Hz 1380 sampling rate, 2 channels. Entries in Tables 4 and 5 with payload 1381 type "dyn" have no static payload type assigned and are only used 1382 with a dynamic payload type. Payload type 13 is reserved for a 1383 comfort noise payload format to be specified in a separate RFC. 1384 Payload type 19 is also marked "reserved" because some draft versions 1385 of this specification assigned that number to a comfort noise payload 1386 format. The payload type range 72-76 is marked "reserved" so that 1387 RTCP and RTP packets can be reliably distinguished (see Section 1388 "Summary of Protocol Constants" of the RTP protocol specification). 1390 The payload types currently defined in this profile are assigned to 1391 exactly one of three categories or media types : audio only, video 1392 only and those combining audio and video. The media types are marked 1393 in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types 1394 of different media types SHALL NOT be interleaved or multiplexed 1395 within a single RTP session, but multiple RTP sessions MAY be used in 1396 parallel to send multiple media types. An RTP source MAY change 1397 payload types within the same media type during a session. See the 1398 section "Multiplexing RTP Sessions" of RFC XXXX for additional 1399 explanation. 1401 Session participants agree through mechanisms beyond the scope of 1402 this specification on the set of payload types allowed in a given 1403 session. This set MAY, for example, be defined by the capabilities 1404 of the applications used, negotiated by a conference control protocol 1405 or established by agreement between the human participants. 
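The payload type ranges and the media-type rule described in this section can be sketched as follows. This is an illustrative Python fragment, not part of the profile: STATIC_TYPES is transcribed from Tables 4 and 5, while the function names classify_payload_type and single_media_type, and the dynamic_media mapping, are hypothetical. How a dynamic binding is actually signalled is outside the sketch.

```python
# Static assignments transcribed from Tables 4 and 5:
# payload type -> (encoding name, media type).
STATIC_TYPES = {
    0: ("PCMU", "A"), 1: ("1016", "A"), 2: ("G726-32", "A"), 3: ("GSM", "A"),
    4: ("G723", "A"), 5: ("DVI4", "A"), 6: ("DVI4", "A"), 7: ("LPC", "A"),
    8: ("PCMA", "A"), 9: ("G722", "A"), 10: ("L16", "A"), 11: ("L16", "A"),
    12: ("QCELP", "A"), 14: ("MPA", "A"), 15: ("G728", "A"), 16: ("DVI4", "A"),
    17: ("DVI4", "A"), 18: ("G729", "A"),
    25: ("CelB", "V"), 26: ("JPEG", "V"), 28: ("nv", "V"), 31: ("H261", "V"),
    32: ("MPV", "V"), 33: ("MP2T", "AV"), 34: ("H263", "V"),
}

# 13 and 19 are reserved in Table 4; 72-76 are reserved so that RTCP and
# RTP packets can be reliably distinguished.
RESERVED = {13, 19} | set(range(72, 77))


def classify_payload_type(pt: int) -> str:
    """Return 'static', 'reserved', 'dynamic' or 'unassigned' for a PT value."""
    if not 0 <= pt <= 127:
        raise ValueError("the PT field is 7 bits wide")
    if pt in STATIC_TYPES:
        return "static"
    if pt in RESERVED:
        return "reserved"
    if pt >= 96:
        return "dynamic"
    return "unassigned"


def single_media_type(payload_types, dynamic_media=None) -> bool:
    """Check that all payload types used in one session share a media type.

    dynamic_media (hypothetical) maps dynamic PT values to "A", "V" or "AV".
    """
    dynamic_media = dynamic_media or {}
    media = set()
    for pt in payload_types:
        if pt in STATIC_TYPES:
            media.add(STATIC_TYPES[pt][1])
        elif pt in dynamic_media:
            media.add(dynamic_media[pt])
        else:
            raise ValueError(f"no media type known for payload type {pt}")
    return len(media) <= 1
```

For instance, a session that binds payload type 96 to two-channel PCMU, as in the example above, could be checked with single_media_type([0, 5, 96], {96: "A"}), whereas mixing PCMU (0) and H261 (31) in one session would fail the check.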
1407 Audio applications operating under this profile SHOULD, at a minimum, 1408 be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4). 1409 This allows interoperability without format negotiation and ensures 1410 successful negotiation with a conference control protocol.
1412 PT      encoding    media type  clock rate  channels
1413         name                    (Hz)
1414 ___________________________________________________
1415 0       PCMU        A           8000        1
1416 1       1016        A           8000        1
1417 2       G726-32     A           8000        1
1418 3       GSM         A           8000        1
1419 4       G723        A           8000        1
1420 5       DVI4        A           8000        1
1421 6       DVI4        A           16000       1
1422 7       LPC         A           8000        1
1423 8       PCMA        A           8000        1
1424 9       G722        A           8000        1
1425 10      L16         A           44100       2
1426 11      L16         A           44100       1
1427 12      QCELP       A           8000        1
1428 13      reserved    A
1429 14      MPA         A           90000       (see text)
1430 15      G728        A           8000        1
1431 16      DVI4        A           11025       1
1432 17      DVI4        A           22050       1
1433 18      G729        A           8000        1
1434 19      reserved    A
1435 20      unassigned  A
1436 21      unassigned  A
1437 22      unassigned  A
1438 23      unassigned  A
1439 dyn     G729D       A           8000        1
1440 dyn     G729E       A           8000        1
1441 dyn     GSM-HR      A           8000        1
1442 dyn     GSM-EFR     A           8000        1
1443 dyn     L8          A           var.        var.
1444 dyn     RED         A           (see text)
1445 dyn     VDVI        A           var.        1
1447 Table 4: Payload types (PT) for audio encodings
1448 PT      encoding    media type  clock rate
1449         name                    (Hz)
1450 ____________________________________________
1451 24      unassigned  V
1452 25      CelB        V           90000
1453 26      JPEG        V           90000
1454 27      unassigned  V
1455 28      nv          V           90000
1456 29      unassigned  V
1457 30      unassigned  V
1458 31      H261        V           90000
1459 32      MPV         V           90000
1460 33      MP2T        AV          90000
1461 34      H263        V           90000
1462 35-71   unassigned  ?
1463 72-76   reserved    N/A         N/A
1464 77-95   unassigned  ?
1465 96-127  dynamic     ?
1466 dyn     BT656       V           90000
1467 dyn     H263-1998   V           90000
1468 dyn     MP1S        V           90000
1469 dyn     MP2P        V           90000
1470 dyn     BMPEG       V           90000
1472 Table 5: Payload types (PT) for video and combined encodings
1474 7 RTP over TCP and Similar Byte Stream Protocols 1476 Under special circumstances, it may be necessary to carry RTP in 1477 protocols offering a byte stream abstraction, such as TCP, possibly 1478 multiplexed with other data.
If the application does not define its 1479 own method of delineating RTP and RTCP packets, it SHOULD prefix each 1480 packet with a two-octet length field in network order (most 1481 significant octet first). 1483 (Note: RTSP [29] provides its own encapsulation and does not need an 1484 extra length indication.) 1486 8 Port Assignment 1488 As specified in the RTP protocol definition, RTP data SHOULD be 1489 carried on an even UDP or TCP port number and the corresponding RTCP 1490 packets SHOULD be carried on the next higher (odd) port number. 1492 Applications operating under this profile MAY use any such UDP or TCP 1493 port pair. For example, the port pair MAY be allocated randomly by a 1494 session management program. A single fixed port number pair cannot be 1495 required because multiple applications using this profile are likely 1496 to run on the same host, and there are some operating systems that do 1497 not allow multiple processes to use the same UDP port with different 1498 multicast addresses. 1500 However, port numbers 5004 and 5005 have been registered for use with 1501 this profile for those applications that choose to use them as the 1502 default pair. Applications that operate under multiple profiles MAY 1503 use this port pair as an indication to select this profile if they 1504 are not subject to the constraint of the previous paragraph. 1505 Applications need not have a default and MAY require that the port 1506 pair be explicitly specified. The particular port numbers were chosen 1507 to lie in the range above 5000 to accommodate port number allocation 1508 practice within some versions of the Unix operating system, where 1509 port numbers below 1024 can only be used by privileged processes and 1510 port numbers between 1024 and 5000 are automatically assigned by the 1511 operating system. 1513 9 Changes from RFC 1890 1515 This RFC revises RFC 1890. It is fully backwards-compatible with RFC 1516 1890 and codifies existing practice. 
The changes are listed below. 1518 o Additional payload formats and/or expanded descriptions were 1519 included for G722, G723, G726, G728, G729, GSM, GSM-HR, GSM- 1520 EFR, QCELP, RED, VDVI, BT656, H263, H263-1998, MP1S, MP2P and 1521 BMPEG. 1523 o Static payload types 4, 12, 16, 17, 18 and 34 were added, and 1524 13 and 19 were reserved. 1526 o Requirements for congestion control were added in Section 2. 1528 o A new Section "IANA Considerations" was added to specify the 1529 registration of the name for this profile and to establish a 1530 new policy that no additional registration of static payload 1531 types for this profile will be made beyond those included in 1532 Tables 4 and 5, but that additional encoding names may be 1533 registered as MIME subtypes for binding to dynamic payload 1534 types. 1536 o In Section 4.1, the requirement level for setting of the 1537 marker bit on the first packet after silence for audio was 1538 changed from "is" to "SHOULD be". 1540 o Similarly, text was added to specify that the marker bit 1541 SHOULD be set to one on the last packet of a video frame, and 1542 that video frames are distinguished by their timestamps. 1544 o This profile follows the suggestion in the RTP spec that RTCP 1545 bandwidth may be specified separately from the session 1546 bandwidth and separately for active senders and passive 1547 receivers. 1549 o RFC references are added for payload formats published after 1550 RFC 1890. 1552 o A minimal TCP encapsulation is defined. 1554 o The security considerations and full copyright sections were 1555 added. 1557 o According to Peter Hoddie of Apple, only pre-1994 Macintosh 1558 used the 22254.54 rate and none the 11127.27 rate, so the 1559 latter was dropped from the discussion of suggested sampling 1560 frequencies. 1562 o Table 1 was corrected to move some values from the "ms/packet" 1563 column to the "default ms/packet" column where they belonged.
1565 o A note has been added for G722 to clarify a discrepancy 1566 between the actual sampling rate and the RTP timestamp clock 1567 rate. 1569 o Small clarifications of the text have been made in several 1570 places, some in response to questions from readers. In 1571 particular: 1573 - A definition for "media type" is given in Section 1.1 to 1574 allow the explanation of multiplexing RTP sessions in 1575 Section 6 to be more clear regarding the multiplexing of 1576 multiple media. 1578 - The explanation of how to determine the number of audio 1579 frames in a packet from the length was expanded. 1581 - More description of the allocation of bandwidth to SDES 1582 items is given. 1584 - A note was added that the convention for the order of 1585 channels specified in Section 4.1 may be overridden by a 1586 particular encoding or payload format specification. 1588 - The terms MUST, SHOULD, MAY, etc. are used as defined in RFC 1589 2119. 1591 o A second author for this document was added. 1593 10 Security Considerations 1595 Implementations using the profile defined in this specification are 1596 subject to the security considerations discussed in the RTP 1597 specification [1]. This profile does not specify any different 1598 security services other than giving rules for mapping characters in a 1599 user-provided pass phrase to canonical form. The primary function of 1600 this profile is to list a set of data compression encodings for audio 1601 and video media. 1603 Confidentiality of the media streams is achieved by encryption. 1604 Because the data compression used with the payload formats described 1605 in this profile is applied end-to-end, encryption may be performed 1606 after compression so there is no conflict between the two operations. 1608 A potential denial-of-service threat exists for data encodings using 1609 compression techniques that have non-uniform receiver-end 1610 computational load. 
The attacker can inject pathological datagrams 1611 into the stream which are complex to decode and cause the receiver to 1612 be overloaded. However, the encodings described in this profile do 1613 not exhibit any significant non-uniformity. 1615 As with any IP-based protocol, in some circumstances a receiver may 1616 be overloaded simply by the receipt of too many packets, either 1617 desired or undesired. Network-layer authentication MAY be used to 1618 discard packets from undesired sources, but the processing cost of 1619 the authentication itself may be too high. In a multicast 1620 environment, pruning of specific sources may be implemented in future 1621 versions of IGMP [30] and in multicast routing protocols to allow a 1622 receiver to select which sources are allowed to reach it. 1624 11 Full Copyright Statement 1626 Copyright (C) The Internet Society (2000). All Rights Reserved. 1628 This document and translations of it may be copied and furnished to 1629 others, and derivative works that comment on or otherwise explain it 1630 or assist in its implementation may be prepared, copied, published and 1631 distributed, in whole or in part, without restriction of any kind, 1632 provided that the above copyright notice and this paragraph are 1633 included on all such copies and derivative works. However, this 1634 document itself may not be modified in any way, such as by removing 1635 the copyright notice or references to the Internet Society or other 1636 Internet organizations, except as needed for the purpose of 1637 developing Internet standards in which case the procedures for 1638 copyrights defined in the Internet Standards process must be 1639 followed, or as required to translate it into languages other than 1640 English. 1642 The limited permissions granted above are perpetual and will not be 1643 revoked by the Internet Society or its successors or assigns.
1645 This document and the information contained herein is provided on an 1646 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 1647 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 1648 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 1649 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 1650 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1652 12 Acknowledgements 1654 The comments and careful review of Simao Campos, Richard Cox and AVT 1655 Working Group participants are gratefully acknowledged. The GSM 1656 description was adopted from the IMTC Voice over IP Forum Service 1657 Interoperability Implementation Agreement (January 1997). Fred Burg 1658 and Terry Lyons helped with the G.729 description. 1660 13 Addresses of Authors 1662 Henning Schulzrinne 1663 Dept. of Computer Science 1664 Columbia University 1665 1214 Amsterdam Avenue 1666 New York, NY 10027 1667 USA 1668 electronic mail: schulzrinne@cs.columbia.edu 1670 Stephen L. Casner 1671 Packet Design, Inc. 1672 66 Willow Place 1673 Menlo Park, CA 94025 1674 United States 1675 electronic mail: casner@acm.org 1677 A Bibliography 1679 [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A 1680 transport protocol for real-time applications," Internet Draft, 1681 Internet Engineering Task Force, Feb. 1999 Work in progress, revision 1682 to RFC 1889. 1684 [2] S. Bradner, "Key words for use in RFCs to Indicate Requirement 1685 Levels," RFC 2119, Internet Engineering Task Force, Mar. 1997. 1687 [3] R. Braden, D. Clark, S. Shenker, "Integrated Services in the 1688 Internet Architecture: an Overview," Request for Comments 1689 (Informational) RFC 1633, Internet Engineering Task Force, June 1994. 1691 [4] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An 1692 Architecture for Differentiated Service," Request for Comments 1693 (Proposed Standard) RFC 2475, Internet Engineering Task Force, Dec. 1694 1998. 1696 [5] M. 
Handley and V. Jacobson, "SDP: Session Description Protocol," 1697 Request for Comments (Proposed Standard) RFC 2327, Internet 1698 Engineering Task Force, Apr. 1998. 1700 [6] P. Hoschka, "MIME Type Registration of RTP Payload Types," 1701 Internet Draft, Internet Engineering Task Force, Feb. 1999 Work in 1702 progress. 1704 [7] N. Freed, J. Klensin, and J. Postel, "Multipurpose Internet Mail 1705 Extensions (MIME) Part Four: Registration Procedures," RFC 2048, 1706 Internet Engineering Task Force, Nov. 1996. 1708 [8] Apple Computer, "Audio interchange file format AIFF-C," Aug. 1709 1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z). 1711 [9] Office of Technology and Standards, "Telecommunications: Analog 1712 to digital conversion of radio voice by 4,800 bit/second code excited 1713 linear prediction (celp)," Federal Standard FS-1016, GSA, Room 6654; 1714 7th & D Street SW; Washington, DC 20407 (+1-202-708-9205), 1990. 1716 [10] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The 1717 proposed Federal Standard 1016 4800 bps voice coder: CELP," Speech 1718 Technology , vol. 5, pp. 58--64, April/May 1990. 1720 [11] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The 1721 federal standard 1016 4800 bps CELP voice coder," Digital Signal 1722 Processing , vol. 1, no. 3, pp. 145--155, 1991. 1724 [12] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The DoD 1725 4.8 kbps standard (proposed federal standard 1016)," in Advances in 1726 Speech Coding (B. Atal, V. Cuperman, and A. Gersho, eds.), ch. 12, 1727 pp. 121--133, Kluwer Academic Publishers, 1991. 1729 [13] IMA Digital Audio Focus and Technical Working Groups, 1730 "Recommended practices for enhancing digital audio compatibility in 1731 multimedia systems (version 3.00)," tech. rep., Interactive 1732 Multimedia Association, Annapolis, Maryland, Oct. 1992. 1734 [14] D. Deleam and J.-P. 
Petit, "Real-time implementations of the 1735 recent ITU-T low bit rate speech coders on the TI TMS320C54X DSP: 1736 results, methodology, and applications," in Proc. of International 1737 Conference on Signal Processing, Technology, and Applications 1738 (ICSPAT) , (Boston, Massachusetts), pp. 1656--1660, Oct. 1996. 1740 [15] M. Mouly and M.-B. Pautet, The GSM system for mobile 1741 communications Lassay-les-Chateaux, France: Europe Media Duplication, 1742 1993. 1744 [16] J. Degener, "Digital speech compression," Dr. Dobb's Journal , 1745 Dec. 1994. 1747 [17] S. M. Redl, M. K. Weber, and M. W. Oliphant, An Introduction to 1748 GSM Boston: Artech House, 1995. 1750 [18] D. Hoffman, G. Fernando, V. Goyal, and M. Civanlar, "RTP payload 1751 format for MPEG1/MPEG2 video," Request for Comments (Proposed 1752 Standard) RFC 2250, Internet Engineering Task Force, Jan. 1998. 1754 [19] N. S. Jayant and P. Noll, Digital Coding of Waveforms-- 1755 Principles and Applications to Speech and Video Englewood Cliffs, New 1756 Jersey: Prentice-Hall, 1984. 1758 [20] K. McKay, "RTP Payload Format for PureVoice(tm) Audio", Request 1759 for Comments (Proposed Standard) RFC 2658, Internet Engineering Task 1760 Force, Aug. 1999. 1762 [21] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C. 1763 Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP Payload for 1764 Redundant Audio Data," Request for Comments (Proposed Standard) RFC 1765 2198, Internet Engineering Task Force, Sep. 1997. 1767 [22] D. Tynan, "RTP payload format for BT.656 Video Encoding," 1768 Request for Comments (Proposed Standard) RFC 2431, Internet 1769 Engineering Task Force, Oct. 1998. 1771 [23] M. Speer and D. Hoffman, "RTP payload format of sun's CellB 1772 video encoding," Request for Comments (Proposed Standard) RFC 2029, 1773 Internet Engineering Task Force, Oct. 1996. 1775 [24] L. Berc, W. Fenner, R. Frederick, and S. 
McCanne, "RTP payload 1776 format for JPEG-compressed video," Request for Comments (Proposed 1777 Standard) RFC 2435, Internet Engineering Task Force, Oct. 1998. 1779 [25] T. Turletti and C. Huitema, "RTP payload format for H.261 video 1780 streams," Request for Comments (Proposed Standard) RFC 2032, Internet 1781 Engineering Task Force, Oct. 1996. 1783 [26] C. Zhu, "RTP payload format for H.263 video streams," Request 1784 for Comments (Proposed Standard) RFC 2190, Internet Engineering Task 1785 Force, Sep. 1997. 1787 [27] C. Bormann, L. Cline, G. Deisher, T. Gardos, C. Maciocco, D. 1788 Newell, J. Ott, G. Sullivan, S. Wenger, C. Zhu, "RTP Payload Format 1789 for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)," Request for 1790 Comments (Proposed Standard) RFC 2429, Internet Engineering Task 1791 Force, Oct. 1998. 1793 [28] M. Civanlar, G. Cash, B. Haskell, "RTP Payload Format for 1794 Bundled MPEG," Request for Comments (Experimental) RFC 2343, Internet 1795 Engineering Task Force, May 1998. 1797 [29] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming 1798 protocol (RTSP)," Request for Comments (Proposed Standard) RFC 2326, 1799 Internet Engineering Task Force, Apr. 1998. 1801 [30] S. Deering, "Host Extensions for IP Multicasting," Request for 1802 Comments RFC 1112, STD 5, Internet Engineering Task Force, Aug. 1989. 1804 Current Locations of Related Resources 1806 Note: Several sections below refer to the ITU-T Software Tool Library 1807 (STL). It is available from the ITU Sales Service, Place des Nations, 1808 CH-1211 Geneve 20, Switzerland (also check http://www.itu.int). The 1809 ITU-T STL is covered by a license defined in ITU-T Recommendation 1810 G.191, "Software tools for speech and audio coding standardization". 1812 UTF-8 1814 Information on the UCS Transformation Format 8 (UTF-8) is available 1815 at 1817 http://www.stonehand.com/unicode/standard/utf8.html 1819 1016 1821 The U.S.
DoD's Federal-Standard-1016 based 4800 bps code excited 1822 linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C 1823 simulation source codes are available for worldwide distribution at 1824 no charge (on DOS diskettes, but configured to compile on Sun SPARC 1825 stations) from: Bob Fenichel, National Communications System, 1826 Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960. 1828 An implementation is also available at 1830 ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z 1832 DVI4 1834 An implementation is available from Jack Jansen at 1836 ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar 1838 G722 1840 An implementation of the G.722 algorithm is available as part of the 1841 ITU-T STL, described above. 1843 G723 1845 The reference C code implementation defining the G.723.1 algorithm 1846 and its Annexes A, B, and C is available as an integral part of 1847 Recommendation G.723.1 from the ITU Sales Service, address listed 1848 above. Both the algorithm and C code are covered by a specific 1849 license. The ITU-T Secretariat should be contacted to obtain such 1850 licensing information. 1852 G726-32 1854 G726-32 is specified in the ITU-T Recommendation G.726, "40, 32, 24, 1855 and 16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)". An 1856 implementation of the G.726 algorithm is available as part of the 1857 ITU-T STL, described above. 1859 G729 1861 The reference C code implementation defining the G.729 algorithm and 1862 its Annexes A through I is available as an integral part of 1863 Recommendation G.729 from the ITU Sales Service, listed above. Annex 1864 I contains the integrated C source code for all G.729 operating 1865 modes. The G.729 algorithm and associated C code are covered by a 1866 specific license. The contact information for obtaining the license 1867 is available from the ITU-T Secretariat. 1869 GSM 1871 A reference implementation was written by Carsten Bormann and Jutta 1872 Degener (TU Berlin, Germany).
It is available at 1874 ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ 1876 Although the RPE-LTP algorithm is not an ITU-T standard, there is a C 1877 code implementation of the RPE-LTP algorithm available as part of the 1878 ITU-T STL. The STL implementation is an adaptation of the TU Berlin 1879 version. 1881 LPC 1883 An implementation is available at 1885 ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z 1887 PCMU, PCMA 1889 An implementation of these algorithms is available as part of the 1890 ITU-T STL, described above. Code to convert between linear and mu-law 1891 companded data is also available in [13].
1893 Table of Contents
1895 1        Introduction
1896 1.1      Terminology
1897 2        RTP and RTCP Packet Forms and Protocol Behavior
1898 3        IANA Considerations
1899 3.1      Registering Additional Encodings
1900 4        Audio
1901 4.1      Encoding-Independent Rules
1902 4.2      Operating Recommendations
1903 4.3      Guidelines for Sample-Based Audio Encodings
1904 4.4      Guidelines for Frame-Based Audio Encodings
1905 4.5      Audio Encodings
1906 4.5.1    1016
1907 4.5.2    DVI4
1908 4.5.3    G722
1909 4.5.4    G723
1910 4.5.5    G726-32
1911 4.5.6    G728
1912 4.5.7    G729
1913 4.5.8    G729D and G729E
1914 4.5.9    GSM
1915 4.5.9.1  General Packaging Issues
1916 4.5.9.2  GSM variable names and numbers
1917 4.5.10   GSM-HR
1918 4.5.11   GSM-EFR
1919 4.5.12   L8
1920 4.5.13   L16
1921 4.5.14   LPC
1922 4.5.15   MPA
1923 4.5.16   PCMA and PCMU
1924 4.5.17   QCELP
1925 4.5.18   RED
1926 4.5.19   VDVI
1927 5        Video
1928 5.1      BT656
1929 5.2      CelB
1930 5.3      JPEG
1931 5.4      H261
1932 5.5      H263
1933 5.6      H263-1998
1934 5.7      MPV
1935 5.8      MP2T
1936 5.9      MP1S
1937 5.10     MP2P
1938 5.11     BMPEG
1939 5.12     nv
1940 6        Payload Type Definitions
1941 7        RTP over TCP and Similar Byte Stream Protocols
1942 8        Port Assignment
1943 9        Changes from RFC 1890
1944 10       Security Considerations
1945 11       Full Copyright Statement
1946 12       Acknowledgements
1947 13       Addresses of Authors
1948 A        Bibliography