idnits 2.17.1

draft-ietf-avt-profile-new-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** There are 35 instances of too long lines in the document, the longest
     one being 8 characters in excess of 72.

  ** There are 6 instances of lines with control characters in the document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 401 has weird spacing: '...hannels descr...'

  == Line 409 has weird spacing: '... lc c r...'

  == Line 516 has weird spacing: '...ncoding sampl...'

  == Line 538 has weird spacing: '...A: not appli...'

  == Line 923 has weird spacing: '... field field...'

  == (2 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 2, 2001) is 8454 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 1494 looks like a reference

  -- Missing reference section? '2' on line 1499 looks like a reference

  -- Missing reference section? '3' on line 1502 looks like a reference

  -- Missing reference section? '4' on line 1506 looks like a reference

  -- Missing reference section? '5' on line 1511 looks like a reference

  -- Missing reference section? '6' on line 1515 looks like a reference

  -- Missing reference section? '7' on line 1519 looks like a reference

  -- Missing reference section? '8' on line 1523 looks like a reference

  -- Missing reference section? '9' on line 1654 looks like a reference

  -- Missing reference section? '10' on line 1531 looks like a reference

  -- Missing reference section? '15' on line 1551 looks like a reference

  -- Missing reference section? '16' on line 1555 looks like a reference

  -- Missing reference section? '17' on line 1559 looks like a reference

  -- Missing reference section? '0' on line 937 looks like a reference

  -- Missing reference section? '22' on line 1582 looks like a reference

  -- Missing reference section? '23' on line 1586 looks like a reference

  -- Missing reference section? '24' on line 927 looks like a reference

  -- Missing reference section? '25' on line 928 looks like a reference

  -- Missing reference section? '26' on line 933 looks like a reference

  -- Missing reference section? '27' on line 934 looks like a reference

  -- Missing reference section? '28' on line 935 looks like a reference

  -- Missing reference section? '29' on line 936 looks like a reference

  -- Missing reference section? '30' on line 937 looks like a reference

  -- Missing reference section? '31' on line 938 looks like a reference

  -- Missing reference section? '32' on line 939 looks like a reference

  -- Missing reference section? '33' on line 940 looks like a reference

  -- Missing reference section? '34' on line 941 looks like a reference

  -- Missing reference section? '35' on line 942 looks like a reference

  -- Missing reference section? '36' on line 943 looks like a reference

  -- Missing reference section? '37' on line 944 looks like a reference

  -- Missing reference section? '38' on line 945 looks like a reference

  -- Missing reference section? '11' on line 1537 looks like a reference

  -- Missing reference section? '12' on line 1541 looks like a reference

  -- Missing reference section? '39' on line 950 looks like a reference

  -- Missing reference section? '40' on line 951 looks like a reference

  -- Missing reference section? '41' on line 952 looks like a reference

  -- Missing reference section? '42' on line 953 looks like a reference

  -- Missing reference section? '13' on line 1544 looks like a reference

  -- Missing reference section? '43' on line 954 looks like a reference

  -- Missing reference section? '14' on line 1547 looks like a reference

  -- Missing reference section? '44' on line 955 looks like a reference

  -- Missing reference section? '45' on line 956 looks like a reference

  -- Missing reference section? '46' on line 957 looks like a reference

  -- Missing reference section? '47' on line 958 looks like a reference

  -- Missing reference section? '18' on line 1564 looks like a reference

  -- Missing reference section? '48' on line 959 looks like a reference

  -- Missing reference section? '19' on line 1568 looks like a reference

  -- Missing reference section? '49' on line 960 looks like a reference

  -- Missing reference section? '20' on line 1572 looks like a reference

  -- Missing reference section? '50' on line 961 looks like a reference

  -- Missing reference section? '21' on line 1576 looks like a reference

  -- Missing reference section? '51' on line 962 looks like a reference


     Summary: 6 errors (**), 0 flaws (~~), 8 warnings (==), 55 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2     Internet Engineering Task Force                                   AVT WG
3     Internet Draft                                        Schulzrinne/Casner
4     draft-ietf-avt-profile-new-10.txt               Columbia U./Packet Design
5     March 2, 2001
6     Expires: August 2, 2001

8         RTP Profile for Audio and Video Conferences with Minimal Control

10    STATUS OF THIS MEMO

12       This document is an Internet-Draft and is in full conformance with
13       all provisions of Section 10 of RFC2026.

15       Internet-Drafts are working documents of the Internet Engineering
16       Task Force (IETF), its areas, and its working groups. Note that
17       other groups may also distribute working documents as Internet-
18       Drafts.

20       Internet-Drafts are draft documents valid for a maximum of six months
21       and may be updated, replaced, or obsoleted by other documents at any
22       time. It is inappropriate to use Internet-Drafts as reference
23       material or to cite them other than as "work in progress".

25       The list of current Internet-Drafts can be accessed at
26       http://www.ietf.org/ietf/1id-abstracts.txt

28       To view the list of Internet-Draft Shadow Directories, see
29       http://www.ietf.org/shadow.html.

31    Abstract

33       This memorandum is a revision of RFC 1890 in preparation for
34       advancement from Proposed Standard to Draft Standard status. Readers
35       are encouraged to use the PostScript form of this draft to see where
36       changes from RFC 1890 are marked by change bars.

38       This document describes a profile called "RTP/AVP" for the use of the
39       real-time transport protocol (RTP), version 2, and the associated
40       control protocol, RTCP, within audio and video multiparticipant
41       conferences with minimal control. It provides interpretations of
42       generic fields within the RTP specification suitable for audio and
43       video conferences. In particular, this document defines a set of
44       default mappings from payload type numbers to encodings.

46       This document also describes how audio and video data may be carried
47       within RTP. It defines a set of standard encodings and their names
48       when used within RTP. The descriptions provide pointers to reference
49       implementations and the detailed standards. This document is meant as
50       an aid for implementors of audio, video and other real-time
51       multimedia applications.

53    Resolution of Open Issues

55       [Note to the RFC Editor: This section is to be deleted when this
56       draft is published as an RFC but is shown here for reference during
57       the Last Call. The first paragraph of the Abstract is also to be
58       deleted. All RFC XXXX should be filled in with the number of the RTP
59       specification RFC submitted for Draft Standard status, and all RFC
60       YYYY should be filled in with the number of the draft specifying MIME
61       registration of RTP payload types as it is submitted for Proposed
62       Standard status. These latter references are intended to be non-
63       normative.]

65       Readers are directed to Appendix 9, Changes from RFC 1890, for a
66       listing of the changes that have been made in this draft. The
67       changes from RFC 1890 are marked with change bars in the PostScript
68       form of this draft.

70       The changes in this revision of the draft from the previous one are:

72         o A paragraph further explaining the requirements for congestion
73           control was added to Section 2 based on the discussion at IETF
74           49.

76         o Packetization of G.726 audio at rates 40, 24 and 16 kb/s is
77           specified in addition to 32 kb/s.

79         o The mapping of a user pass-phrase string into an encryption key
80           was deleted from Section 2 because two interoperable
81           implementations were not found.

83         o The specification of a two-byte encapsulation for RTP over TCP
84           was deleted because two interoperable implementations were not
85           found.

87         o The audio payload formats 1016, G723, GSM-HR and GSM-EFR were
88           removed because two interoperable implementations were not
89           found.

91         o The video payload formats H263, BT656, MP2T, MP1S, MP2P and
92           BMPEG were removed because two interoperable implementations
93           were not found.

95       This version of the draft is intended to be complete for Last Call.
96       The following open issues from previous drafts have been addressed:

98         o The procedure for registering RTP encoding names as MIME
99           subtypes was moved to a separate RFC-to-be that may also serve
100          to specify how (some of) the encodings here may be used with
101          mail and other non-RTP transports. That procedure is not
102          required to implement this profile, but may be used in those
103          contexts where it is needed.

105        o This profile follows the suggestion in the RTP spec that RTCP
106          bandwidth may be specified separately from the session
107          bandwidth and separately for active senders and passive
108          receivers.

110        o No specific action is taken in this document to address
111          generic payload formats; it is assumed that if any generic
112          payload formats are developed, they can be specified in
113          separate RFCs and that the session parameters they require for
114          operation can be specified in the MIME registration of those
115          formats.

117        o The specification of the CN (comfort noise) payload format has
118          been removed to a separate draft so that it may be enhanced as
119          a result of additional work in ITU-T. That draft is intended
120          for publication at Proposed Standard status. Static payload
121          type 13 is marked reserved here for the use of that payload
122          format (since CN has already been implemented from earlier
123          drafts of this profile). Static payload type 19 is also
124          reserved because some revisions of the draft assigned that
125          number to CN to avoid an historic use of 13.

127        o The requirement for congestion control in RTP is addressed in
128          the RTP spec with an explanation that the behavior is context
129          specific and should be defined in RTP profiles. Text has been
130          added to this profile in Section 2 to describe the
131          requirements only in general terms because specific algorithms
132          have not been devised yet for multicast congestion control.

134   1 Introduction

136      This profile defines aspects of RTP left unspecified in the RTP
137      Version 2 protocol definition (RFC XXXX) [1]. This profile is
138      intended for use within audio and video conferences with minimal
139      session control. In particular, no support for the negotiation of
140      parameters or membership control is provided. The profile is expected
141      to be useful in sessions where no negotiation or membership control
142      is used (e.g., using the static payload types and the membership
143      indications provided by RTCP), but this profile may also be useful in
144      conjunction with a higher-level control protocol.

146      Use of this profile may be implicit in the use of the appropriate
147      applications; there may be no explicit indication by port number,
148      protocol identifier or the like. Applications such as session
149      directories may use the name for this profile specified in Section 3.

151      Other profiles may make different choices for the items specified
152      here.

154      This document also defines a set of encodings and payload formats for
155      audio and video.

157   1.1 Terminology

159      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
160      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
161      document are to be interpreted as described in RFC 2119 [2] and
162      indicate requirement levels for implementations compliant with this
163      RTP profile.

165      This draft defines the term media type as dividing encodings of audio
166      and video content into three classes: audio, video and audio/video
167      (interleaved).

169   2 RTP and RTCP Packet Forms and Protocol Behavior

171      The section "RTP Profiles and Payload Format Specification" of RFC
172      XXXX enumerates a number of items that can be specified or modified
173      in a profile. This section addresses these items. Generally, this
174      profile follows the default and/or recommended aspects of the RTP
175      specification.

177      RTP data header: The standard format of the fixed RTP data
178           header is used (one marker bit).

180      Payload types: Static payload types are defined in Section 6.

182      RTP data header additions: No additional fixed fields are
183           appended to the RTP data header.

185      RTP data header extensions: No RTP header extensions are
186           defined, but applications operating under this profile MAY
187           use such extensions. Thus, applications SHOULD NOT assume
188           that the RTP header X bit is always zero and SHOULD be
189           prepared to ignore the header extension. If a header
190           extension is defined in the future, that definition MUST
191           specify the contents of the first 16 bits in such a way
192           that multiple different extensions can be identified.

194      RTCP packet types: No additional RTCP packet types are defined
195           by this profile specification.

197      RTCP report interval: The suggested constants are to be used for
198           the RTCP report interval calculation. Sessions operating
199           under this profile MAY specify a separate parameter for the
200           RTCP traffic bandwidth rather than using the default
201           fraction of the session bandwidth. The RTCP traffic
202           bandwidth MAY be divided into two separate session
203           parameters for those participants which are active data
204           senders and those which are not. Following the
205           recommendation in the RTP specification [1] that 1/4 of the
206           RTCP bandwidth be dedicated to data senders, the
207           RECOMMENDED default values for these two parameters would
208           be 1.25% and 3.75%, respectively. For a particular session,
209           the RTCP bandwidth for non-data-senders MAY be set to zero
210           when operating on unidirectional links or for sessions that
211           don't require feedback on the quality of reception. The
212           RTCP bandwidth for data senders SHOULD be kept non-zero so
213           that sender reports can still be sent for inter-media
214           synchronization and to identify the source by CNAME. The
215           means by which the one or two session parameters for RTCP
216           bandwidth are specified is beyond the scope of this memo.

218      SR/RR extension: No extension section is defined for the RTCP SR
219           or RR packet.

221      SDES use: Applications MAY use any of the SDES items described
222           in the RTP specification. While CNAME information MUST be
223           sent every reporting interval, other items SHOULD only be
224           sent every third reporting interval, with NAME sent seven
225           out of eight times within that slot and the remaining SDES
226           items cyclically taking up the eighth slot, as defined in
227           Section 6.2.2 of the RTP specification. In other words,
228           NAME is sent in RTCP packets 1, 4, 7, 10, 13, 16, 19,
229           while, say, EMAIL is used in RTCP packet 22.

231      Security: The RTP default security services are also the default
232           under this profile.

234      String-to-key mapping: No mapping is specified by this profile.

236      Congestion: RTP and this profile may be used in the context of
237           enhanced network service, for example, through Integrated
238           Services (RFC 1633) [3] or Differentiated Services (RFC
239           2475) [4], or they may be used with best effort service.

241           If enhanced service is being used, RTP receivers SHOULD
242           monitor packet loss to ensure that the service that was
243           requested is actually being delivered. If it is not, then
244           they SHOULD assume that they are receiving best-effort
245           service and behave accordingly.

247           If best-effort service is being used, RTP receivers SHOULD
248           monitor packet loss to ensure that the packet loss rate is
249           within acceptable parameters. Packet loss is considered
250           acceptable if a TCP flow across the same network path and
251           experiencing the same network conditions would achieve an
252           average throughput, measured on a reasonable timescale,
253           that is not less than the RTP flow is achieving. This condition
254           can be satisfied by implementing congestion control
255           mechanisms to adapt the transmission rate (or the number of
256           layers subscribed for a layered multicast session), or by
257           arranging for a receiver to leave the session if the loss
258           rate is unacceptably high.

260           The comparison to TCP cannot be specified exactly, but is
261           intended as an "order-of-magnitude" comparison in timescale
262           and throughput. The timescale on which TCP throughput is
263           measured is the round-trip time of the connection. In
264           essence, this requirement states that it is not acceptable
265           to deploy an application (using RTP or any other transport
266           protocol) on the best-effort Internet which consumes
267           bandwidth arbitrarily and does not compete fairly with TCP
268           within an order of magnitude.

270      Underlying protocol: The profile specifies the use of RTP over
271           unicast and multicast UDP as well as TCP. (This does not
272           preclude the use of these definitions when RTP is carried
273           by other lower-layer protocols.)

275      Transport mapping: The standard mapping of RTP and RTCP to
276           transport-level addresses is used.

278      Encapsulation: This profile leaves to applications the
279           specification of RTP encapsulation in protocols other
280           than UDP.

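The RTCP bandwidth split and the SDES schedule given in Section 2 can
be sketched as follows. This is a minimal illustration, not part of
the profile: the function names, the parameter names and the order of
the "remaining" SDES items are assumptions made for the example; only
the 1.25%/3.75% defaults and the packet pattern 1, 4, 7, ..., 19 (NAME)
and 22 (EMAIL) come from the text.

```python
# Illustrative sketch only; all names here are hypothetical.

def rtcp_bandwidth_split(session_bw_bps):
    """Return (sender_bps, receiver_bps) for the RECOMMENDED defaults.

    Data senders get 1.25% and non-senders 3.75% of the session
    bandwidth. Integer arithmetic avoids floating-point rounding.
    """
    if session_bw_bps < 0:
        raise ValueError("session bandwidth must be non-negative")
    sender_bps = session_bw_bps * 125 // 10000
    receiver_bps = session_bw_bps * 375 // 10000
    return sender_bps, receiver_bps


def extra_sdes_item(packet_number,
                    other_items=("EMAIL", "PHONE", "LOC", "TOOL")):
    """Return the SDES item sent alongside CNAME, or None.

    packet_number counts RTCP packets from 1. CNAME goes in every
    packet; one extra item goes in every third packet (1, 4, 7, ...).
    Seven of every eight such slots carry NAME; the eighth cycles
    through other_items (the order of that tuple is an assumption).
    """
    if packet_number < 1:
        raise ValueError("packet numbers start at 1")
    if (packet_number - 1) % 3 != 0:
        return None                       # CNAME only
    slot = (packet_number - 1) // 3       # index of the "third" slots
    if slot % 8 != 7:
        return "NAME"
    return other_items[(slot // 8) % len(other_items)]
```

For a 64 kb/s session, rtcp_bandwidth_split(64000) yields 800 b/s for
senders and 2400 b/s for receivers, and extra_sdes_item(22) selects
the first of the remaining items.
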
282   3 IANA Considerations

284      The RTP specification establishes a registry of profile names for use
285      by higher-level control protocols, such as the Session Description
286      Protocol (SDP), RFC 2327 [5], to refer to transport methods. This
287      profile registers the name "RTP/AVP".

289   3.1 Registering Additional Encodings

291      This profile lists a set of encodings, each of which is comprised of
292      a particular media data compression or representation plus a payload
293      format for encapsulation within RTP. Some of those payload formats
294      are specified here, while others are specified in separate RFCs. It
295      is expected that additional encodings beyond the set listed here will
296      be created in the future and specified in additional payload format
297      RFCs.

299      This profile also assigns to each encoding a short name which MAY be
300      used by higher-level control protocols, such as the Session
301      Description Protocol (SDP), RFC 2327 [5], to identify encodings
302      selected for a particular RTP session.

304      In some contexts it may be useful to refer to these encodings in the
305      form of a MIME content-type. To facilitate this, RFC YYYY [6]
306      provides registrations for all of the encoding names listed here as
307      MIME subtype names under the "audio" and "video" MIME types through
308      the MIME registration procedure as specified in RFC 2048 [7].

310      Any additional encodings specified for use under this profile (or
311      others) may also be assigned names registered as MIME subtypes with
312      the Internet Assigned Numbers Authority (IANA). This registry
313      provides a means to ensure that the names assigned to the additional
314      encodings are kept unique. RFC YYYY specifies the information that is
315      required for the registration of RTP encodings.

317      In addition to assigning names to encodings, this profile also
318      assigns static RTP payload type numbers to some of them. However, the
319      payload type number space is relatively small and cannot accommodate
320      assignments for all existing and future encodings. During the early
321      stages of RTP development, it was necessary to use statically
322      assigned payload types because no other mechanism had been specified
323      to bind encodings to payload types. It was anticipated that non-RTP
324      means beyond the scope of this memo (such as directory services or
325      invitation protocols) would be specified to establish a dynamic
326      mapping between a payload type and an encoding. Now, mechanisms for
327      defining dynamic payload type bindings have been specified in the
328      Session Description Protocol (SDP) and in other protocols such as
329      ITU-T recommendation H.323/H.245. These mechanisms associate the
330      registered name of the encoding/payload format, along with any
331      additional required parameters such as the RTP timestamp clock rate
332      and number of channels, to a payload type number. This association
333      is effective only for the duration of the RTP session in which the
334      dynamic payload type binding is made. This association applies only
335      to the RTP session for which it is made, thus the numbers can be re-
336      used for different encodings in different sessions so the number
337      space limitation is avoided.

339      This profile reserves payload type numbers in the range 96-127
340      exclusively for dynamic assignment. Applications SHOULD first use
341      values in this range for dynamic payload types. Those applications
342      which need to define more than 32 dynamic payload types MAY bind
343      codes below 96, in which case it is RECOMMENDED that unassigned
344      payload type numbers be used first. However, the statically assigned
345      payload types are default bindings and MAY be dynamically bound to
346      new encodings if needed. Redefining payload types below 96 may cause
347      incorrect operation if an attempt is made to join a session without
348      obtaining session description information that defines the dynamic
349      payload types.

351      Dynamic payload types SHOULD NOT be used without a well-defined
352      mechanism to indicate the mapping. Systems that expect to
353      interoperate with others operating under this profile SHOULD NOT make
354      their own assignments of proprietary encodings to particular, fixed
355      payload types.

357      This specification establishes the policy that no additional static
358      payload types will be assigned beyond the ones defined in this
359      document. Establishing this policy avoids the problem of trying to
360      create a set of criteria for accepting static assignments and
361      encourages the implementation and deployment of the dynamic payload
362      type mechanisms.

364   4 Audio

366   4.1 Encoding-Independent Rules

368      For applications which send either no packets or comfort-noise
369      packets during silence, the first packet of a talkspurt, that is, the
370      first packet after a silence period, SHOULD be distinguished by
371      setting the marker bit in the RTP data header to one. The marker bit
372      in all other packets is zero. The beginning of a talkspurt MAY be
373      used to adjust the playout delay to reflect changing network delays.

375      Applications without silence suppression MUST set the marker bit to
376      zero.

378      The RTP clock rate used for generating the RTP timestamp is
379      independent of the number of channels and the encoding; it equals the
380      number of sampling periods per second. For N-channel encodings, each
381      sampling period (say, 1/8000 of a second) generates N samples. (This
382      terminology is standard, but somewhat confusing, as the total number
383      of samples generated per second is then the sampling rate times the
384      channel count.)

386      If multiple audio channels are used, channels are numbered left-to-
387      right, starting at one. In RTP audio packets, information from
388      lower-numbered channels precedes that from higher-numbered channels.
389      For more than two channels, the convention followed by the AIFF-C
390      audio interchange format SHOULD be followed [8], using the following
391      notation, unless some other convention is specified for a particular
392      encoding or payload format:

394         l    left
395         r    right
396         c    center
397         S    surround
398         F    front
399         R    rear

401      channels  description                 channel
402                                1     2     3     4     5     6
403      __________________________________________________
404      2         stereo          l     r
405      3                         l     r     c
406      4         quadrophonic    Fl    Fr    Rl    Rr
407      4                         l     c     r     S
408      5                         Fl    Fr    Fc    Sl    Sr
409      6                         l     lc    c     r     rc    S

411      Samples for all channels belonging to a single sampling instant MUST
412      be within the same packet. The interleaving of samples from different
413      channels depends on the encoding. General guidelines are given in
414      Sections 4.3 and 4.4.

416      The sampling frequency SHOULD be drawn from the set: 8000, 11025,
417      16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Older Apple
418      Macintosh computers had a native sample rate of 22254.54 Hz, which
419      can be converted to 22050 with acceptable quality by dropping 4
420      samples in a 20 ms frame.) However, most audio encodings are defined
421      for a more restricted set of sampling frequencies. Receivers SHOULD
422      be prepared to accept multi-channel audio, but MAY choose to only
423      play a single channel.

425   4.2 Operating Recommendations

427      The following recommendations are default operating parameters.
428      Applications SHOULD be prepared to handle other values. The ranges
429      given are meant to give guidance to application writers, allowing a
430      set of applications conforming to these guidelines to interoperate
431      without additional negotiation. These guidelines are not intended to
432      restrict operating parameters for applications that can negotiate a
433      set of interoperable parameters, e.g., through a conference control
434      protocol.

436      For packetized audio, the default packetization interval SHOULD have
437      a duration of 20 ms or one frame, whichever is longer, unless
438      otherwise noted in Table 1 (column "ms/packet"). The packetization
439      interval determines the minimum end-to-end delay; longer packets
440      introduce less header overhead but higher delay and make packet loss
441      more noticeable. For non-interactive applications such as lectures or
442      for links with severe bandwidth constraints, a higher packetization
443      delay MAY be used. A receiver SHOULD accept packets representing
444      between 0 and 200 ms of audio data. (For framed audio encodings, a
445      receiver SHOULD accept packets with a number of frames equal to 200
446      ms divided by the frame duration, rounded up.) This restriction
447      allows reasonable buffer sizing for the receiver.

449   4.3 Guidelines for Sample-Based Audio Encodings

451      In sample-based encodings, each audio sample is represented by a
452      fixed number of bits. Within the compressed audio data, codes for
453      individual samples may span octet boundaries. An RTP audio packet may
454      contain any number of audio samples, subject to the constraint that
455      the number of bits per sample times the number of samples per packet
456      yields an integral octet count. Fractional encodings produce less
457      than one octet per sample.

459      The duration of an audio packet is determined by the number of
460      samples in the packet.

462      For sample-based encodings producing one or more octets per sample,
463      samples from different channels sampled at the same sampling instant
464      SHOULD be packed in consecutive octets. For example, for a two-
465      channel encoding, the octet sequence is (left channel, first sample),
466      (right channel, first sample), (left channel, second sample), (right
467      channel, second sample), .... For multi-octet encodings, octets
468      SHOULD be transmitted in network byte order (i.e., most significant
469      octet first).

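The channel interleaving and byte order described above can be
sketched for a 16-bit linear encoding such as L16. This is an
illustration under stated assumptions, not a normative packetizer: the
function name is hypothetical, and samples are assumed to be signed
16-bit integers supplied as one sequence per channel, in channel-number
order (left first).

```python
import struct

# Illustrative sketch only; the function name is hypothetical.

def pack_l16_interleaved(channels):
    """Interleave per-channel sample sequences into an RTP payload.

    channels: list of equal-length sequences of signed 16-bit ints,
    ordered by channel number (for stereo: left, then right).
    Samples taken at the same sampling instant occupy consecutive
    octets, and each sample is written most significant octet first.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    payload = bytearray()
    for instant in zip(*channels):        # one tuple per sampling instant
        for sample in instant:            # lower-numbered channel first
            payload += struct.pack("!h", sample)   # network byte order
    return bytes(payload)
```

For a stereo input with left samples (1, 2) and right samples
(-1, -2), the payload carries left 1, right -1, left 2, right -2, two
octets each.
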
471      The packing of sample-based encodings producing less than one octet
472      per sample is encoding-specific.

474      The RTP timestamp reflects the instant at which the first sample in
475      the packet was sampled, that is, the oldest information in the
476      packet.

478   4.4 Guidelines for Frame-Based Audio Encodings

480      Frame-based encodings encode a fixed-length block of audio into
481      another block of compressed data, typically also of fixed length. For
482      frame-based encodings, the sender MAY choose to combine several such
483      frames into a single RTP packet. The receiver can tell the number of
484      frames contained in an RTP packet, if all the frames have the same
485      length, by dividing the RTP payload length by the audio frame size
486      which is defined as part of the encoding. This does not work when
487      carrying frames of different sizes unless the frame sizes are
488      relatively prime. If not, the frames MUST indicate their size.

490      For frame-based codecs, the channel order is defined for the whole
491      block. That is, for two-channel audio, right and left samples SHOULD
492      be coded independently, with the encoded frame for the left channel
493      preceding that for the right channel.

495      All frame-oriented audio codecs SHOULD be able to encode and decode
496      several consecutive frames within a single packet. Since the frame
497      size for the frame-oriented codecs is given, there is no need to use
498      a separate designation for the same encoding, but with a different
499      number of frames per packet.

501      RTP packets SHALL contain a whole number of frames, with frames
502      inserted according to age within a packet, so that the oldest frame
503      (to be played first) occurs immediately after the RTP packet header.
504      The RTP timestamp reflects the instant at which the first sample in
505      the first frame was sampled, that is, the oldest information in the
506      packet.

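The frame-count rule of Section 4.4 and the 200 ms receiver guideline
of Section 4.2 reduce to two small calculations, sketched below. The
function names are hypothetical, and the 33-octet frame size used in
the note that follows is an assumed example value that this section
does not state.

```python
import math

# Illustrative sketch only; the function names are hypothetical.

def frames_in_packet(payload_len, frame_size):
    """Number of equal-sized frames in an RTP payload.

    Applies only when every frame has the same length: the count is
    the payload length divided by the frame size of the encoding.
    A remainder means the payload is not a whole number of frames.
    """
    if frame_size <= 0:
        raise ValueError("frame size must be positive")
    count, remainder = divmod(payload_len, frame_size)
    if remainder:
        raise ValueError("payload is not a whole number of frames")
    return count


def max_frames_to_accept(frame_ms, limit_ms=200):
    """Frames a receiver should accept per packet.

    This is the limit (200 ms by default) divided by the frame
    duration, rounded up.
    """
    if frame_ms <= 0:
        raise ValueError("frame duration must be positive")
    return math.ceil(limit_ms / frame_ms)
```

With an assumed 33-octet frame, a 66-octet payload holds two frames.
For the 2.5 ms frames of G728 the receiver guideline works out to 80
frames per packet, and for a 30 ms frame it rounds up to 7.
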
508   4.5 Audio Encodings

510      The characteristics of the audio encodings described in this document
511      are shown in Table 1; they are listed in order of their payload type
512      in Table 4. While most audio codecs are only specified for a fixed
513      sampling rate, some sample-based algorithms (indicated by an entry of
514      "var." in the sampling rate column of Table 1) may be used with

515      name of                              sampling            default
516      encoding  sample/frame  bits/sample  rate      ms/frame  ms/packet
517      __________________________________________________________________
518      DVI4      sample        4            var.                20
519      G722      sample        8            16,000              20
520      G726-40   sample        5            8,000               20
521      G726-32   sample        4            8,000               20
522      G726-24   sample        3            8,000               20
523      G726-16   sample        2            8,000               20
524      G728      frame         N/A          8,000     2.5       20
525      G729      frame         N/A          8,000     10        20
526      G729D     frame         N/A          8,000     10        20
527      G729E     frame         N/A          8,000     10        20
528      GSM       frame         N/A          8,000     20        20
529      L8        sample        8            var.                20
530      L16       sample        16           var.                20
531      LPC       frame         N/A          8,000     20        20
532      MPA       frame         N/A          var.      var.
533      PCMA      sample        8            var.                20
534      PCMU      sample        8            var.                20
535      QCELP     frame         N/A          8,000     20        20
536      VDVI      sample        var.         var.                20

538      Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
539      variable)

541      different sampling rates, resulting in different coded bit rates.
542      When used with a sampling rate other than that for which a static
543      payload type is defined, non-RTP means beyond the scope of this memo
544      MUST be used to define a dynamic payload type and MUST indicate the
545      selected RTP timestamp clock rate, which is usually the same as the
546      sampling rate for audio.

548   4.5.1 DVI4

550      DVI4 is specified, with pseudo-code, in [9] as the IMA ADPCM wave
551      type.

553      However, the encoding defined here as DVI4 differs in three respects
554      from this recommendation:

556        o The RTP DVI4 header contains the predicted value rather than
557          the first sample value contained in the IMA ADPCM block header.

559 o IMA ADPCM blocks contain an odd number of samples, since the 560 first sample of a block is contained just in the header 561 (uncompressed), followed by an even number of compressed 562 samples. DVI4 has an even number of compressed samples only, 563 using the `predict' word from the header to decode the first 564 sample. 566 o For DVI4, the 4-bit samples are packed with the first sample 567 in the four most significant bits and the second sample in the 568 four least significant bits. In the IMA ADPCM codec, the 569 samples are packed in the opposite order. 571 Each packet contains a single DVI block. This profile only defines 572 the 4-bit-per-sample version, while IMA also specifies a 3-bit-per- 573 sample encoding. 575 The "header" word for each channel has the following structure: 577 int16 predict; /* predicted value of first sample 578 from the previous block (L16 format) */ 579 u_int8 index; /* current index into stepsize table */ 580 u_int8 reserved; /* set to zero by sender, ignored by receiver */ 582 Each octet following the header contains two 4-bit samples, thus the 583 number of samples per packet MUST be even because there is no means 584 to indicate a partially filled last octet. 586 Packing of samples for multiple channels is for further study. 588 The document IMA Recommended Practices for Enhancing Digital Audio 589 Compatibility in Multimedia Systems (version 3.0) contains the 590 algorithm description. It is available from 592 Interactive Multimedia Association 593 48 Maryland Avenue, Suite 202 594 Annapolis, MD 21401-8011 595 USA 596 phone: +1 410 626-1380 598 4.5.2 G722 600 G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding 601 within 64 kbit/s". The G.722 encoder produces a stream of octets, 602 each of which SHALL be octet-aligned in an RTP packet. 
The first bit 603 transmitted in the G.722 octet, which is the most significant bit of 604 the higher sub-band sample, SHALL correspond to the most significant 605 bit of the octet in the RTP packet. 607 Even though the actual sampling rate for G.722 audio is 16000 Hz, the 608 RTP clock rate for the G722 payload format is 8000 Hz because that 609 value was erroneously assigned in RFC 1890 and must remain unchanged 610 for backward compatibility. The octet rate or sample-pair rate is 611 8000 Hz. 613 4.5.3 G726-40, G726-32, G726-24, and G726-16 615 ITU-T Recommendation G.726 describes, among others, the algorithm 616 recommended for conversion of a single 64 kbit/s A-law or mu-law 617 PCM channel encoded at 8000 samples/sec to and from a 40, 32, 24, 618 or 16 kbit/s channel. The conversion is applied to the PCM stream 619 using an Adaptive Differential Pulse Code Modulation (ADPCM) 620 transcoding technique. The ADPCM representation consists of a 621 series of codewords with a one-to-one correspondence to the samples 622 in the PCM stream. The G726 data rates of 40, 32, 24, and 16 623 kbit/s have codewords of 5, 4, 3, and 2 bits respectively. 625 The 16 and 24 kbit/s encodings do not provide toll-quality speech. 626 They are designed for use in overloaded Digital Circuit 627 Multiplication Equipment (DCME). ITU-T G.726 recommends that the 628 16 and 24 kbit/s encodings should be alternated with higher data 629 rate encodings to provide an average sample size of between 3.5 and 630 3.7 bits per sample. 632 The encodings of G.726 are here denoted as G726-40, G726-32, 633 G726-24, and G726-16. Prior to 1990, G721 described the 32 kbit/s 634 ADPCM encoding, and G723 described the 40, 32, and 16 kbit/s 635 encodings. Thus, G726-32 designates the same algorithm as G721 in 636 RFC 1890.
638 A stream of G726 codewords contains no information on the encoding 639 being used; therefore, transitions between G726 encoding types are 640 not permitted within a sequence of packed codewords. Applications 641 MUST determine the encoding type of packed codewords from the RTP 642 payload identifier. 644 No payload-specific header information SHALL be included as part 645 of the audio data. A stream of G726 codewords MUST be packed into 646 octets as follows: the first codeword is placed into the first 647 octet such that the least significant bit of the codeword aligns 648 with the least significant bit in the octet, the second codeword 649 is then packed so that its least significant bit coincides with 650 the least significant unoccupied bit in the octet. When a 651 complete codeword cannot be placed into an octet, the bits 652 overlapping the octet boundary are placed into the least 653 significant bits of the next octet. Packing MUST end with a 654 completely packed final octet. The number of codewords packed 655 will therefore be a multiple of 8, 2, 8, and 4 for G726-40, 656 G726-32, G726-24, and G726-16 respectively. An example of the 657 packing scheme for G726-32 codewords is shown below: 659 0 1 660 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 661 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 662 |B B B B|A A A A|D D D D|C C C C| ... 663 |0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3| 664 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 666 An example of the packing scheme for G726-24 codewords is: 668 0 1 2 669 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 670 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 671 |C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ... 672 |1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1| 673 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 675 4.5.4 G728 677 G728 is specified in ITU-T Recommendation G.728, "Coding of speech at 678 16 kbit/s using low-delay code excited linear prediction".
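Returning briefly to the G726 packing rule of Section 4.5.3: it amounts to filling each octet from its least significant bit upward. The following is a minimal sketch of that rule, not a reference implementation; the function name pack_g726 and its argument layout are assumptions made for illustration.

```python
def pack_g726(codewords, bits):
    """Pack G726 codewords into octets, least significant bit first.

    bits is the codeword width: 5, 4, 3 or 2 for G726-40, G726-32,
    G726-24 and G726-16 respectively.
    """
    if bits not in (2, 3, 4, 5):
        raise ValueError("codeword width must be 2, 3, 4 or 5 bits")
    if (len(codewords) * bits) % 8:
        # Packing MUST end with a completely packed final octet.
        raise ValueError("codewords do not fill a whole number of octets")
    acc = 0       # bit accumulator; the oldest bits sit in the low positions
    nbits = 0     # number of valid bits currently held in acc
    out = bytearray()
    for cw in codewords:
        if not 0 <= cw < (1 << bits):
            raise ValueError("codeword out of range")
        acc |= cw << nbits      # place at the least significant free bit
        nbits += bits
        while nbits >= 8:       # emit every completed octet
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    return bytes(out)
```

With 4-bit codewords A=1 and B=2 the result is the single octet 0x21 (B in the upper half, A in the lower half), which agrees with the G726-32 diagram in Section 4.5.3.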
680 A G.728 encoder translates 5 consecutive audio samples into a 10-bit 681 codebook index, resulting in a bit rate of 16 kb/s for audio sampled 682 at 8,000 samples per second. The group of five consecutive samples is 683 called a vector. Four consecutive vectors, labeled V1 to V4 (where V1 684 is to be played first by the receiver), build one G.728 frame. The 685 four vectors of 40 bits are packed into 5 octets, labeled B1 through 686 B5. B1 SHALL be placed first in the RTP packet. 688 Referring to the figure below, the principle for bit order is 689 "maintenance of bit significance". Bits from an older vector are more 690 significant than bits from newer vectors. The MSB of the frame goes 691 to the MSB of B1 and the LSB of the frame goes to the LSB of B5. 693 1 2 3 3 694 0 0 0 0 9 695 ++++++++++++++++++++++++++++++++++++++++ 696 <---V1---><---V2---><---V3---><---V4---> vectors 697 <--B1--><--B2--><--B3--><--B4--><--B5--> octets 698 <------------- frame 1 ----------------> 700 In particular, B1 contains the eight most significant bits of V1, 701 with the MSB of V1 being the MSB of B1. B2 contains the two least 702 significant bits of V1, the more significant of the two in its MSB, 703 and the six most significant bits of V2. B1 SHALL be placed first in 704 the RTP packet and B5 last. 706 4.5.5 G729 708 G729 is specified in ITU-T Recommendation G.729, "Coding of speech at 709 8 kbit/s using conjugate structure-algebraic code excited linear 710 prediction (CS-ACELP)". A reduced-complexity version of the G.729 711 algorithm is specified in Annex A to Rec. G.729. The speech coding 712 algorithms in the main body of G.729 and in G.729 Annex A are fully 713 interoperable with each other, so there is no need to further 714 distinguish between them. The G.729 and G.729 Annex A codecs were 715 optimized to represent speech with high quality, where G.729 Annex A 716 trades some speech quality for an approximate 50% complexity 717 reduction [10].
See the next Section (4.5.6) for other data rates 718 added in later G.729 Annexes. For all data rates, the sampling 719 frequency (and RTP timestamp clock rate) is 8000 Hz. 721 A voice activity detector (VAD) and comfort noise generator (CNG) 722 algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous 723 voice and data applications and can be used in conjunction with G.729 724 or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets, 725 while the G.729 Annex B comfort noise frame occupies 2 octets. 727 A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A 728 frames, followed by zero or one G.729 Annex B frame. The presence of 729 a comfort noise frame can be deduced from the length of the RTP 730 payload. The default packetization interval is 20 ms (two frames), 731 but in some situations it may be desirable to send 10 ms packets. An 732 example would be a transition from speech to comfort noise in the 733 first 10 ms of the packet. For some applications, a longer 734 packetization interval may be required to reduce the packet rate. 736 The transmitted parameters of a G.729/G.729A 10-ms frame, consisting 737 of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The 738 mapping of these parameters is given below in Fig. 4. The 739 diagrams show the bit packing in "network byte order," also known as 740 big-endian order. The bits of each 32-bit word are numbered 0 to 31, 741 with the most significant bit on the left and numbered 0. The octets 742 (bytes) of each word are transmitted most significant octet first. 743 The bits of each data field are numbered in the order as produced by 744 the G.729 C code reference implementation.
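Before turning to the bit layout, the length rule stated earlier in this section (10-octet speech frames, optionally followed by one 2-octet Annex B comfort noise frame) can be sketched as follows. The function and constant names are hypothetical and chosen only for illustration.

```python
G729_FRAME_OCTETS = 10   # G.729 or G.729 Annex A speech frame
G729B_CN_OCTETS = 2      # G.729 Annex B comfort noise frame


def split_g729_payload(payload: bytes):
    """Split a G729 payload into speech frames and an optional CN frame.

    Returns (speech_frames, cn_frame); cn_frame is None when the
    payload length is an exact multiple of 10 octets.
    """
    n_speech, rest = divmod(len(payload), G729_FRAME_OCTETS)
    if rest not in (0, G729B_CN_OCTETS):
        raise ValueError("length is not n*10 or n*10+2 octets")
    frames = [
        payload[i * G729_FRAME_OCTETS:(i + 1) * G729_FRAME_OCTETS]
        for i in range(n_speech)
    ]
    cn_frame = payload[n_speech * G729_FRAME_OCTETS:] if rest else None
    return frames, cn_frame
```

A 22-octet payload therefore yields two speech frames followed by a comfort noise frame, while a 2-octet payload carries comfort noise only.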
746 0 1 2 3 747 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 748 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 749 |L| L1 | L2 | L3 | P1 |P| C1 | 750 |0| | | | |0| | 751 | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4| 752 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 753 | C1 | S1 | GA1 | GB1 | P2 | C2 | 754 | 1 1 1| | | | | | 755 |5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7| 756 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 757 | C2 | S2 | GA2 | GB2 | 758 | 1 1 1| | | | 759 |8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3| 760 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 762 The packing of the G.729 Annex B comfort noise frame is as follows: 764 0 1 765 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 766 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 767 |L| LSF1 | LSF2 | GAIN |R| 768 |S| | | |E| 769 |F| | | |S| 770 |0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V| RESV = Reserved (zero) 771 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 773 4.5.6 G729D and G729E 775 Annexes D and E to ITU-T Recommendation G.729 provide additional data 776 rates. Because the data rate is not signaled in the bitstream, the 777 different data rates are given distinct RTP encoding names which are 778 mapped to distinct payload type numbers. G729D indicates a 6.4 kbit/s 779 coding mode (G.729 Annex D, for momentary reduction in channel 780 capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E, 781 for improved performance with a wide range of narrow-band input 782 signals, e.g. music and background noise). Annex E has two operating 783 modes, backward adaptive and forward adaptive, which are signaled by 784 the first two bits in each frame (the most significant two bits of 785 the first octet). 787 The voice activity detector (VAD) and comfort noise generator (CNG) 788 algorithm specified in Annex B of G.729 may be used with Annex D and 789 Annex E frames in addition to G.729 and G.729 Annex A frames. 
The 790 algorithm details for the operation of Annexes D and E with the Annex 791 B CNG are specified in G.729 Annexes F and G. Note that Annexes F and 792 G do not introduce any new encodings. 794 For G729D, an RTP packet may consist of zero or more G.729 Annex D 795 frames, followed by zero or one G.729 Annex B frame. Similarly, for 796 G729E, an RTP packet may consist of zero or more G.729 Annex E 797 frames, followed by zero or one G.729 Annex B frame. The presence of 798 a comfort noise frame can be deduced from the length of the RTP 799 payload. 801 A single RTP packet must contain frames of only one data rate, 802 optionally followed by one comfort noise frame. The data rate may be 803 changed from packet to packet by changing the payload type number. 804 G.729 Annexes D, E and H describe what the encoding and decoding 805 algorithms must do to accommodate a change in data rate. 807 For G729D, the bits of a G.729 Annex D frame are formatted as shown 808 below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits. 810 0 1 2 3 811 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 812 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 813 |L| L1 | L2 | L3 | P1 | C1 | 814 |0| | | | | | 815 | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5| 816 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 817 | C1 |S1 | GA1 | GB1 | P2 | C2 |S2 | GA2 | GB2 | 818 | | | | | | | | | | 819 |6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2| 820 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 822 The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a 823 total of 118 bits are used. Two bits are appended as "don't care" 824 bits to complete an integer number of octets for the frame. For 825 G729E, the bits of a data frame are formatted as shown in the next 826 two diagrams (cf. Table E.1/G.729). 
The fields for the G729E forward 827 adaptive mode are packed as follows: 829 0 1 2 3 830 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 831 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 832 |0 0|L| L1 | L2 | L3 | P1 |P| C0_1| 833 | |0| | | | |0| | 834 | | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2| 835 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 836 | | C1_1 | C2_1 | C3_1 | C4_1 | 837 | | | | | | 838 |3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6| 839 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 840 | GA1 | GB1 | P2 | C0_2 | C1_2 | C2_2 | 841 | | | | | | | 842 |0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5| 843 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 844 | | C3_2 | C4_2 | GA2 | GB2 |DC | 845 | | | | | | | 846 |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| 847 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 849 The fields for the G729E backward adaptive mode are packed as shown 850 in Fig. 8. 
852 0 1 2 3 853 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 854 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 855 |1 1| P1 |P| C0_1 | C1_1 | 856 | | |0| 1 1 1| | 857 | |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7| 858 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 859 | | C2_1 | C3_1 | C4_1 |GA1 | GB1 |P2 | 860 | | | | | | | | 861 |8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| 862 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 863 | | C0_2 | C1_2 | C2_2 | 864 | | 1 1 1| | | 865 |2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5| 866 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 867 | | C3_2 | C4_2 | GA2 | GB2 |DC | 868 | | | | | | | 869 |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| 870 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 872 4.5.7 GSM 874 GSM (group speciale mobile) denotes the European GSM 06.10 standard 875 for full-rate speech transcoding, ETS 300 961, which is based on 876 RPE/LTP (residual pulse excitation/long term prediction) coding at a 877 rate of 13 kb/s [15,16,17]. The text of the standard can be obtained 878 from 880 ETSI (European Telecommunications Standards Institute) 881 ETSI Secretariat: B.P.152 882 F-06561 Valbonne Cedex 883 France 884 Phone: +33 92 94 42 00 885 Fax: +33 93 65 47 16 887 Blocks of 160 audio samples are compressed into 33 octets, for an 888 effective data rate of 13,200 b/s. 890 4.5.7.1 General Packaging Issues 892 The GSM standard (ETS 300 961) specifies the bit stream produced by 893 the codec, but does not specify how these bits should be packed for 894 transmission. The packetization specified here has subsequently been 896 adopted in ETSI Technical Specification TS 101 318. Some software 897 implementations of the GSM codec use a different packing than that 898 specified here. 
900 In the GSM packing used by RTP, the bits SHALL be packed beginning 901 from the most significant bit. Every 160 sample GSM frame is coded 902 into one 33 octet (264 bit) buffer. Every such buffer begins with a 4 903 bit signature (0xD), followed by the MSB encoding of the fields of 904 the frame. The first octet thus contains 1101 in the 4 most 905 significant bits (0-3) and the 4 most significant bits of F1 (0-3) in 906 the 4 least significant bits (4-7). The second octet contains the 2 907 least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so 908 on. The order of the fields in the frame is described in Table 2. 910 4.5.7.2 GSM variable names and numbers 912 In the RTP encoding we have the bit pattern described in Table 3, 913 where F.i signifies the ith bit of the field F, bit 0 is the most 914 significant bit, and the bits of every octet are numbered from 0 to 7 915 from most to least significant. 917 4.5.8 L8 919 L8 denotes linear audio data samples, using 8-bits of precision with 920 an offset of 128, that is, the most negative signal is encoded as 921 zero. 
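The L8 offset representation can be illustrated with a short sketch. It assumes the application holds samples as signed integers in the range -128 to 127; the function names are illustrative only.

```python
def l8_encode(samples):
    """Map signed samples (-128..127) to L8 octets using an offset of 128."""
    out = bytearray()
    for s in samples:
        if not -128 <= s <= 127:
            raise ValueError("sample out of 8-bit range")
        out.append(s + 128)   # the most negative signal becomes zero
    return bytes(out)


def l8_decode(payload: bytes):
    """Recover signed samples from L8 octets."""
    return [b - 128 for b in payload]
```

Under this mapping the samples -128, 0 and 127 are carried as the octets 0, 128 and 255.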
923 field field name bits field field name bits 924 ________________________________________________ 925 1 LARc[0] 6 39 xmc[22] 3 926 2 LARc[1] 6 40 xmc[23] 3 927 3 LARc[2] 5 41 xmc[24] 3 928 4 LARc[3] 5 42 xmc[25] 3 929 5 LARc[4] 4 43 Nc[2] 7 930 6 LARc[5] 4 44 bc[2] 2 931 7 LARc[6] 3 45 Mc[2] 2 932 8 LARc[7] 3 46 xmaxc[2] 6 933 9 Nc[0] 7 47 xmc[26] 3 934 10 bc[0] 2 48 xmc[27] 3 935 11 Mc[0] 2 49 xmc[28] 3 936 12 xmaxc[0] 6 50 xmc[29] 3 937 13 xmc[0] 3 51 xmc[30] 3 938 14 xmc[1] 3 52 xmc[31] 3 939 15 xmc[2] 3 53 xmc[32] 3 940 16 xmc[3] 3 54 xmc[33] 3 941 17 xmc[4] 3 55 xmc[34] 3 942 18 xmc[5] 3 56 xmc[35] 3 943 19 xmc[6] 3 57 xmc[36] 3 944 20 xmc[7] 3 58 xmc[37] 3 945 21 xmc[8] 3 59 xmc[38] 3 946 22 xmc[9] 3 60 Nc[3] 7 947 23 xmc[10] 3 61 bc[3] 2 948 24 xmc[11] 3 62 Mc[3] 2 949 25 xmc[12] 3 63 xmaxc[3] 6 950 26 Nc[1] 7 64 xmc[39] 3 951 27 bc[1] 2 65 xmc[40] 3 952 28 Mc[1] 2 66 xmc[41] 3 953 29 xmaxc[1] 6 67 xmc[42] 3 954 30 xmc[13] 3 68 xmc[43] 3 955 31 xmc[14] 3 69 xmc[44] 3 956 32 xmc[15] 3 70 xmc[45] 3 957 33 xmc[16] 3 71 xmc[46] 3 958 34 xmc[17] 3 72 xmc[47] 3 959 35 xmc[18] 3 73 xmc[48] 3 960 36 xmc[19] 3 74 xmc[49] 3 961 37 xmc[20] 3 75 xmc[50] 3 962 38 xmc[21] 3 76 xmc[51] 3 964 Table 2: Ordering of GSM variables 966 4.5.9 L16 968 L16 denotes uncompressed audio data samples, using 16-bit signed 969 representation with 65535 equally divided steps between minimum and 970 Octet Bit 0 Bit 1 Bit 2 Bit 3 Bit 4 Bit 5 Bit 6 Bit 7 971 _____________________________________________________________________________ 972 0 1 1 0 1 LARc0.0 LARc0.1 LARc0.2 LARc0.3 973 1 LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5 974 2 LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2 975 3 LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1 976 4 LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2 977 5 Nc0.0 Nc0.1 Nc0.2 Nc0.3 Nc0.4 Nc0.5 Nc0.6 bc0.0 978 6 bc0.1 Mc0.0 Mc0.1 xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04 979 7 xmaxc05 
xmc0.0 xmc0.1 xmc0.2 xmc1.0 xmc1.1 xmc1.2 xmc2.0 980 8 xmc2.1 xmc2.2 xmc3.0 xmc3.1 xmc3.2 xmc4.0 xmc4.1 xmc4.2 981 9 xmc5.0 xmc5.1 xmc5.2 xmc6.0 xmc6.1 xmc6.2 xmc7.0 xmc7.1 982 10 xmc7.2 xmc8.0 xmc8.1 xmc8.2 xmc9.0 xmc9.1 xmc9.2 xmc10.0 983 11 xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xmc12.2 984 12 Nc1.0 Nc1.1 Nc1.2 Nc1.3 Nc1.4 Nc1.5 Nc1.6 bc1.0 985 13 bc1.1 Mc1.0 Mc1.1 xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14 986 14 xmaxc15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0 987 15 xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2 988 16 xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1 989 17 xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0 990 18 xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2 991 19 Nc2.0 Nc2.1 Nc2.2 Nc2.3 Nc2.4 Nc2.5 Nc2.6 bc2.0 992 20 bc2.1 Mc2.0 Mc2.1 xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24 993 21 xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0 994 22 xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2 995 23 xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1 996 24 xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0 997 25 xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2 998 26 Nc3.0 Nc3.1 Nc3.2 Nc3.3 Nc3.4 Nc3.5 Nc3.6 bc3.0 999 27 bc3.1 Mc3.0 Mc3.1 xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34 1000 28 xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0 1001 29 xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2 1002 30 xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1 1003 31 xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0 1004 32 xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2 1006 Table 3: GSM payload format 1008 maximum signal level, ranging from -32768 to 32767. The value is 1009 represented in two's complement notation and transmitted in network 1010 byte order (most significant byte first).
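The L16 representation (16-bit two's complement, most significant byte first) can be produced with Python's standard struct module, as in the sketch below. It covers a single channel only, and the function names are illustrative.

```python
import struct


def l16_encode(samples):
    """Serialize signed 16-bit samples in network byte order."""
    # ">" selects big-endian, "h" a signed 16-bit integer; struct.error
    # is raised for values outside -32768..32767.
    return struct.pack(">%dh" % len(samples), *samples)


def l16_decode(payload: bytes):
    """Parse an L16 payload back into signed 16-bit samples."""
    if len(payload) % 2:
        raise ValueError("L16 payload must hold whole 16-bit samples")
    return list(struct.unpack(">%dh" % (len(payload) // 2), payload))
```

For example, the samples -32768, -1, 0 and 32767 are carried as the octets 80 00 FF FF 00 00 7F FF (hexadecimal).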
1012 4.5.10 LPC 1014 LPC designates an experimental linear predictive encoding contributed 1015 by Ron Frederick, which is based on an implementation written by Ron 1016 Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The 1017 codec generates 14 octets for every frame. The framesize is set to 20 1018 ms, resulting in a bit rate of 5,600 b/s. 1020 4.5.11 MPA 1022 MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary 1023 streams. The encoding is defined in ISO standards ISO/IEC 11172-3 1024 and 13818-3. The encapsulation is specified in RFC 2250 [14]. 1026 The encoding may be at any of three levels of complexity, called 1027 Layer I, II and III. The selected layer as well as the sampling rate 1028 and channel count are indicated in the payload. The RTP timestamp 1029 clock rate is always 90000, independent of the sampling rate. MPEG-1 1030 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC 1031 11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of 16, 1032 22.05 and 24 kHz. The number of samples per frame is fixed, but the 1033 frame size will vary with the sampling rate and bit rate. 1035 4.5.12 PCMA and PCMU 1037 PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio data 1038 is encoded as eight bits per sample, after logarithmic scaling. PCMU 1039 denotes mu-law scaling, PCMA A-law scaling. A detailed description is 1040 given by Jayant and Noll [15]. Each G.711 octet SHALL be octet- 1041 aligned in an RTP packet. The sign bit of each G.711 octet SHALL 1042 correspond to the most significant bit of the octet in the RTP packet 1043 (i.e., assuming the G.711 samples are handled as octets on the host 1044 machine, the sign bit SHALL be the most significant bit of the octet 1045 as defined by the host machine format). The 56 kb/s and 48 kb/s modes 1046 of G.711 are not applicable to RTP, since PCMA and PCMU MUST always 1047 be transmitted as 8-bit samples.
1049 4.5.13 QCELP 1051 The Electronic Industries Association (EIA) & Telecommunications 1052 Industry Association (TIA) standard IS-733, "TR45: High Rate Speech 1053 Service Option for Wideband Spread Spectrum Communications Systems," 1054 defines the QCELP audio compression algorithm for use in wireless 1055 CDMA applications. The QCELP CODEC compresses each 20 milliseconds of 1056 8000 Hz, 16-bit sampled input speech into one of four different size 1057 output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4 (54 1058 bits) or Rate 1/8 (20 bits). For typical speech patterns, this 1059 results in an average output of 6.8 kbit/s for normal mode and 1060 4.7 kbit/s for reduced rate mode. The packetization of the QCELP 1061 audio codec is described in [16]. 1063 4.5.14 RED 1064 The redundant audio payload format "RED" is specified by RFC 2198 1065 [17]. It defines a means by which multiple redundant copies of an 1066 audio packet may be transmitted in a single RTP stream. Each packet 1067 in such a stream contains, in addition to the audio data for that 1068 packetization interval, a (more heavily compressed) copy of the data 1069 from a previous packetization interval. This allows an approximation 1070 of the data from lost packets to be recovered upon decoding of a 1071 subsequent packet, giving much improved sound quality when compared 1072 with silence substitution for lost packets. 1074 4.5.15 VDVI 1076 VDVI is a variable-rate version of DVI4, yielding speech bit rates of 1077 between 10 and 25 kb/s. It is specified for single-channel operation 1078 only. Samples are packed into octets starting at the most- 1079 significant bit. The last octet is padded with 1 bits if the last 1080 sample does not fill the last octet. This padding is distinct from 1081 the valid codewords. The receiver needs to detect the padding 1082 because there is no explicit count of samples in the packet.
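The padding rule can be illustrated with a minimal sketch of the VDVI bit packing. The bit patterns are copied from the codeword table in this section; the function names and the use of character strings for bits are simplifications made for readability, not an efficient or normative implementation.

```python
# DVI4 codeword -> VDVI bit pattern, as listed in the table of this section.
VDVI_PATTERNS = [
    "00", "010", "1100", "11100", "111100", "1111100", "11111100",
    "11111110", "10", "011", "1101", "11101", "111101", "1111101",
    "11111101", "11111111",
]
VDVI_LOOKUP = {pattern: code for code, pattern in enumerate(VDVI_PATTERNS)}


def vdvi_encode(codewords):
    """Pack DVI4 codewords MSB-first, padding the last octet with 1 bits."""
    bits = "".join(VDVI_PATTERNS[c] for c in codewords)
    bits += "1" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def vdvi_decode(payload: bytes):
    """Recover DVI4 codewords; trailing 1 bits are treated as padding."""
    bits = "".join(format(octet, "08b") for octet in payload)
    codewords, current = [], ""
    for bit in bits:
        current += bit
        if current in VDVI_LOOKUP:       # patterns form a prefix code
            codewords.append(VDVI_LOOKUP[current])
            current = ""
    if current and set(current) != {"1"}:
        raise ValueError("truncated codeword at end of payload")
    return codewords
```

A run of one to seven 1 bits never completes a pattern, which is what allows a VDVI receiver to tell padding from data without an explicit sample count.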
1084 It uses the following encoding: 1086 DVI4 codeword VDVI bit pattern 1087 _______________________________ 1088 0 00 1089 1 010 1090 2 1100 1091 3 11100 1092 4 111100 1093 5 1111100 1094 6 11111100 1095 7 11111110 1096 8 10 1097 9 011 1098 10 1101 1099 11 11101 1100 12 111101 1101 13 1111101 1102 14 11111101 1103 15 11111111 1105 5 Video 1107 The following sections describe the video encodings that are defined 1108 in this memo and give their abbreviated names used for 1109 identification. These video encodings and their payload types are 1110 listed in Table 5. 1112 All of these video encodings use an RTP timestamp frequency of 90,000 1113 Hz, the same as the MPEG presentation time stamp frequency. This 1114 frequency yields exact integer timestamp increments for the typical 1115 24 (HDTV), 25 (PAL), and 29.97 (NTSC) and 30 Hz (HDTV) frame rates 1116 and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED 1117 rate for future video encodings used within this profile, other rates 1118 MAY be used. However, it is not sufficient to use the video frame 1119 rate (typically between 15 and 30 Hz) because that does not provide 1120 adequate resolution for typical synchronization requirements when 1121 calculating the RTP timestamp corresponding to the NTP timestamp in 1122 an RTCP SR packet. The timestamp resolution MUST also be sufficient 1123 for the jitter estimate contained in the receiver reports. 1125 For most of these video encodings, the RTP timestamp encodes the 1126 sampling instant of the video image contained in the RTP data packet. 1127 If a video image occupies more than one packet, the timestamp is the 1128 same on all of those packets. Packets from different video images are 1129 distinguished by their different timestamps. 1131 Most of these video encodings also specify that the marker bit of the 1132 RTP header SHOULD be set to one in the last packet of a video frame 1133 and otherwise set to zero. 
Thus, it is not necessary to wait for a 1134 following packet with a different timestamp to detect that a new 1135 frame should be displayed. 1137 5.1 CelB 1139 The CELL-B encoding is a proprietary encoding proposed by Sun 1140 Microsystems. The byte stream format is described in RFC 2029 [18]. 1142 5.2 JPEG 1144 The encoding is specified in ISO Standards 10918-1 and 10918-2. The 1145 RTP payload format is as specified in RFC 2435 [19]. 1147 5.3 H261 1149 The encoding is specified in ITU-T Recommendation H.261, "Video codec 1150 for audiovisual services at p x 64 kbit/s". The packetization and 1151 RTP-specific properties are described in RFC 2032 [20]. 1153 5.4 H263-1998 1155 The encoding is specified in the 1998 version of ITU-T Recommendation 1156 H.263, "Video coding for low bit rate communication". The 1157 packetization and RTP-specific properties are described in RFC 2429 1158 [21]. Because the 1998 version of H.263 is a superset of the 1996 1159 syntax, this payload format can also be used with the 1996 version of 1160 H.263, and is RECOMMENDED for this use by new implementations. This 1161 payload format does not replace RFC 2190, which continues to be used 1162 by existing implementations, and may be required for backward 1163 compatibility in new implementations. Implementations using the new 1164 features of the 1998 version of H.263 MUST use the payload format 1165 described in RFC 2429. 1167 5.5 MPV 1169 MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary 1170 streams as specified in ISO Standards ISO/IEC 11172 and 13818-2, 1171 respectively. The RTP payload format is as specified in RFC 2250 1172 [14], Section 3. 1174 5.8 nv 1176 The encoding is implemented in the program `nv', version 4, developed 1177 at Xerox PARC by Ron Frederick. Further information is available from 1178 the author: 1180 Ron Frederick 1181 Entera, Inc. 
1182 40971 Encyclopedia Circle 1183 Fremont, CA 94538 1184 United States 1185 electronic mail: ronf@entera.com 1187 6 Payload Type Definitions 1189 Tables 4 and 5 define this profile's static payload type values for 1190 the PT field of the RTP data header. In addition, payload type 1191 values in the range 96-127 MAY be defined dynamically through a 1192 conference control protocol, which is beyond the scope of this 1193 document. For example, a session directory could specify that for a 1194 given session, payload type 96 indicates PCMU encoding, 8,000 Hz 1195 sampling rate, 2 channels. Entries in Tables 4 and 5 with payload 1196 type "dyn" have no static payload type assigned and are only used 1197 with a dynamic payload type. Payload type 13 is reserved for a 1198 comfort noise payload format to be specified in a separate RFC. 1199 Payload type 19 is also marked "reserved" because some draft versions 1200 of this specification assigned that number to a comfort noise payload 1201 format. The payload type range 72-76 is marked "reserved" so that 1202 RTCP and RTP packets can be reliably distinguished (see Section 1203 "Summary of Protocol Constants" of the RTP protocol specification). 1205 The payload types currently defined in this profile are assigned to 1206 exactly one of three categories or media types : audio only, video 1207 only and those combining audio and video. The media types are marked 1208 in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types 1209 of different media types SHALL NOT be interleaved or multiplexed 1210 within a single RTP session, but multiple RTP sessions MAY be used in 1211 parallel to send multiple media types. An RTP source MAY change 1212 payload types within the same media type during a session. See the 1213 section "Multiplexing RTP Sessions" of RFC XXXX for additional 1214 explanation. 
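The payload type ranges described in this section can be summarized in a small lookup, sketched below. The static assignments are copied from Tables 4 and 5; the function name and the returned category strings are illustrative assumptions.

```python
# Static assignments from Tables 4 and 5: PT -> (encoding name, media type).
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", "A"), 2: ("G726-32", "A"), 3: ("GSM", "A"),
    5: ("DVI4", "A"), 6: ("DVI4", "A"), 7: ("LPC", "A"),
    8: ("PCMA", "A"), 9: ("G722", "A"), 10: ("L16", "A"),
    11: ("L16", "A"), 12: ("QCELP", "A"), 14: ("MPA", "A"),
    15: ("G728", "A"), 16: ("DVI4", "A"), 17: ("DVI4", "A"),
    18: ("G729", "A"), 25: ("CelB", "V"), 26: ("JPEG", "V"),
    28: ("nv", "V"), 31: ("H261", "V"), 32: ("MPV", "V"),
}
# Values marked "reserved" in the tables, including the range 72-76.
RESERVED_PAYLOAD_TYPES = {1, 4, 13, 19, 33, 34} | set(range(72, 77))


def classify_payload_type(pt: int) -> str:
    """Return 'static', 'reserved', 'dynamic' or 'unassigned' for a PT value."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type is a 7-bit field")
    if 96 <= pt <= 127:
        return "dynamic"
    if pt in RESERVED_PAYLOAD_TYPES:
        return "reserved"
    if pt in STATIC_PAYLOAD_TYPES:
        return "static"
    return "unassigned"
```

An application could use such a lookup to reject reserved values early and to require out-of-band signalling before accepting a dynamic payload type.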
1216 Session participants agree through mechanisms beyond the scope of 1217 this specification on the set of payload types allowed in a given 1218 session. This set MAY, for example, be defined by the capabilities 1219 of the applications used, negotiated by a conference control protocol 1220 or established by agreement between the human participants. 1222 Audio applications operating under this profile SHOULD, at a minimum, 1223 be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4). 1224 This allows interoperability without format negotiation and ensures 1225 successful negotiation with a conference control protocol. 1227 PT encoding media type clock rate channels 1228 name (Hz) 1229 ___________________________________________________ 1230 0 PCMU A 8000 1 1231 1 reserved A 1232 2 G726-32 A 8000 1 1233 3 GSM A 8000 1 1234 4 reserved A 1235 5 DVI4 A 8000 1 1236 6 DVI4 A 16000 1 1237 7 LPC A 8000 1 1238 8 PCMA A 8000 1 1239 9 G722 A 8000 1 1240 10 L16 A 44100 2 1241 11 L16 A 44100 1 1242 12 QCELP A 8000 1 1243 13 reserved A 1244 14 MPA A 90000 (see text) 1245 15 G728 A 8000 1 1246 16 DVI4 A 11025 1 1247 17 DVI4 A 22050 1 1248 18 G729 A 8000 1 1249 19 reserved A 1250 20 unassigned A 1251 21 unassigned A 1252 22 unassigned A 1253 23 unassigned A 1254 dyn G726-40 A 8000 1 1255 dyn G726-24 A 8000 1 1256 dyn G726-16 A 8000 1 1257 dyn G729D A 8000 1 1258 dyn G729E A 8000 1 1259 dyn L8 A var. var. 1260 dyn RED A (see text) 1261 dyn VDVI A var. 1 1263 Table 4: Payload types (PT) for audio encodings 1264 PT encoding media type clock rate 1265 name (Hz) 1266 ____________________________________________ 1267 24 unassigned V 1268 25 CelB V 90000 1269 26 JPEG V 90000 1270 27 unassigned V 1271 28 nv V 90000 1272 29 unassigned V 1273 30 unassigned V 1274 31 H261 V 90000 1275 32 MPV V 90000 1276 33 reserved V 1277 34 reserved V 1278 35-71 unassigned ? 1279 72-76 reserved N/A N/A 1280 77-95 unassigned ? 1281 96-127 dynamic ?
   dyn     BT656       V           90000
   dyn     H263-1998   V           90000

   Table 5: Payload types (PT) for video and combined encodings

7 RTP over TCP and Similar Byte Stream Protocols

Under special circumstances, it may be necessary to carry RTP in
protocols offering a byte stream abstraction, such as TCP, possibly
multiplexed with other data. The application MUST define its own
method of delineating RTP and RTCP packets (RTSP [22] provides an
example of such an encapsulation specification).

8 Port Assignment

As specified in the RTP protocol definition, RTP data SHOULD be
carried on an even UDP port number and the corresponding RTCP
packets SHOULD be carried on the next higher (odd) port number.

Applications operating under this profile MAY use any such UDP port
pair. For example, the port pair MAY be allocated randomly by a
session management program. A single fixed port number pair cannot be
required because multiple applications using this profile are likely
to run on the same host, and there are some operating systems that do
not allow multiple processes to use the same UDP port with different
multicast addresses.

However, port numbers 5004 and 5005 have been registered for use with
this profile for those applications that choose to use them as the
default pair. Applications that operate under multiple profiles MAY
use this port pair as an indication to select this profile if they
are not subject to the constraint of the previous paragraph.
Applications need not have a default and MAY require that the port
pair be explicitly specified.
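
The even/odd pairing and the registered default pair described in
this section can be sketched as follows. This is an illustration
only. The helper names and the bounds used for random allocation are
assumptions, and the sketch rejects odd RTP ports outright for
simplicity, although the profile only says SHOULD.

```python
import random

# Registered default pair named in this section.
DEFAULT_RTP_PORT = 5004
DEFAULT_RTCP_PORT = 5005


def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an even RTP port (hypothetical helper)."""
    if not 0 < rtp_port < 65535:
        raise ValueError("RTP port must leave room for the next higher port")
    if rtp_port % 2 != 0:
        # The profile says SHOULD; treating odd ports as an error is a
        # simplification made for this sketch.
        raise ValueError("RTP data SHOULD use an even port number")
    return rtp_port + 1


def random_port_pair(rng: random.Random, low: int = 5002, high: int = 65534):
    """Pick a random even/odd pair; the default bounds are assumptions."""
    first_even = low + (low % 2)
    # randrange excludes the stop value, so high itself remains eligible.
    rtp_port = rng.randrange(first_even, high + 1, 2)
    return rtp_port, rtcp_port_for(rtp_port)


if __name__ == "__main__":
    print(DEFAULT_RTP_PORT, rtcp_port_for(DEFAULT_RTP_PORT))
    print(random_port_pair(random.Random()))
```

A session management program could call random_port_pair once per
session and announce the resulting RTP port; receivers would derive
the RTCP port by adding one.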
The particular port numbers were chosen
to lie in the range above 5000 to accommodate port number allocation
practice within some versions of the Unix operating system, where
port numbers below 1024 can only be used by privileged processes and
port numbers between 1024 and 5000 are automatically assigned by the
operating system.

9 Changes from RFC 1890

This RFC revises RFC 1890. It is mostly backwards-compatible with RFC
1890 and codifies existing practice. The changes are listed below.

o  The mapping of a user pass-phrase string into an encryption
   key was deleted from Section 2 because two interoperable
   implementations were not found.

o  The payload formats for 1016 audio and MP2T video were removed
   and their static payload type assignments 1 and 33 were marked
   "reserved" because two interoperable implementations were not
   found.

o  Additional payload formats and/or expanded descriptions were
   included for G722, G726, G728, G729, GSM, QCELP, RED, VDVI,
   and H263-1998.

o  Static payload types 12, 16, 17 and 18 were added, and 13 and
   19 were reserved.

o  Requirements for congestion control were added in Section 2.

o  A new Section "IANA Considerations" was added to specify the
   registration of the name for this profile and to establish a
   new policy that no additional registration of static payload
   types for this profile will be made beyond those included in
   Tables 4 and 5, but that additional encoding names may be
   registered as MIME subtypes for binding to dynamic payload
   types.

o  In Section 4.1, the requirement level for setting of the
   marker bit on the first packet after silence for audio was
   changed from "is" to "SHOULD be".

o  Similarly, text was added to specify that the marker bit
   SHOULD be set to one on the last packet of a video frame, and
   that video frames are distinguished by their timestamps.

o  This profile follows the suggestion in the RTP spec that RTCP
   bandwidth may be specified separately from the session
   bandwidth and separately for active senders and passive
   receivers.

o  RFC references are added for payload formats published after
   RFC 1890.

o  The security considerations and full copyright sections were
   added.

o  According to Peter Hoddie of Apple, only pre-1994 Macintosh
   used the 22254.54 rate and none the 11127.27 rate, so the
   latter was dropped from the discussion of suggested sampling
   frequencies.

o  Table 1 was corrected to move some values from the "ms/packet"
   column to the "default ms/packet" column where they belonged.

o  A note has been added for G722 to clarify a discrepancy
   between the actual sampling rate and the RTP timestamp clock
   rate.

o  Small clarifications of the text have been made in several
   places, some in response to questions from readers. In
   particular:

   -  A definition for "media type" is given in Section 1.1 to
      allow the explanation of multiplexing RTP sessions in
      Section 6 to be more clear regarding the multiplexing of
      multiple media.

   -  The explanation of how to determine the number of audio
      frames in a packet from the length was expanded.

   -  More description of the allocation of bandwidth to SDES
      items is given.

   -  A note was added that the convention for the order of
      channels specified in Section 4.1 may be overridden by a
      particular encoding or payload format specification.

   -  The terms MUST, SHOULD, MAY, etc. are used as defined in RFC
      2119.

o  A second author for this document was added.

10 Security Considerations

Implementations using the profile defined in this specification are
subject to the security considerations discussed in the RTP
specification [1].
This profile does not specify any different
security services other than giving rules for mapping characters in a
user-provided pass phrase to canonical form. The primary function of
this profile is to list a set of data compression encodings for audio
and video media.

Confidentiality of the media streams is achieved by encryption.
Because the data compression used with the payload formats described
in this profile is applied end-to-end, encryption may be performed
after compression so there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using
compression techniques that have non-uniform receiver-end
computational load. The attacker can inject pathological datagrams
into the stream which are complex to decode and cause the receiver to
be overloaded. However, the encodings described in this profile do
not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may
be overloaded simply by the receipt of too many packets, either
desired or undesired. Network-layer authentication MAY be used to
discard packets from undesired sources, but the processing cost of
the authentication itself may be too high. In a multicast
environment, pruning of specific sources may be implemented in future
versions of IGMP [23] and in multicast routing protocols to allow a
receiver to select which sources are allowed to reach it.

11 Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

12 Acknowledgements

The comments and careful review of Simao Campos, Richard Cox and AVT
Working Group participants are gratefully acknowledged. The GSM
description was adopted from the IMTC Voice over IP Forum Service
Interoperability Implementation Agreement (January 1997). Fred Burg
and Terry Lyons helped with the G.729 description.

13 Addresses of Authors

Henning Schulzrinne
Dept.
of Computer Science
Columbia University
1214 Amsterdam Avenue
New York, NY 10027
USA
electronic mail: schulzrinne@cs.columbia.edu

Stephen L. Casner
Packet Design
2465 Latham Street
Mountain View, CA 94040
United States
electronic mail: casner@acm.org

A Bibliography

[1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A
transport protocol for real-time applications," Internet Draft,
Internet Engineering Task Force, Feb. 1999. Work in progress, revision
to RFC 1889.

[2] S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels," RFC 2119, Internet Engineering Task Force, Mar. 1997.

[3] R. Braden, D. Clark, S. Shenker, "Integrated Services in the
Internet Architecture: an Overview," Request for Comments
(Informational) RFC 1633, Internet Engineering Task Force, June 1994.

[4] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An
Architecture for Differentiated Service," Request for Comments
(Proposed Standard) RFC 2475, Internet Engineering Task Force, Dec.
1998.

[5] M. Handley and V. Jacobson, "SDP: Session Description Protocol,"
Request for Comments (Proposed Standard) RFC 2327, Internet
Engineering Task Force, Apr. 1998.

[6] P. Hoschka, "MIME Type Registration of RTP Payload Types,"
Internet Draft, Internet Engineering Task Force, Feb. 1999. Work in
progress.

[7] N. Freed, J. Klensin, and J. Postel, "Multipurpose Internet Mail
Extensions (MIME) Part Four: Registration Procedures," RFC 2048,
Internet Engineering Task Force, Nov. 1996.

[8] Apple Computer, "Audio interchange file format AIFF-C," Aug.
1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).

[9] IMA Digital Audio Focus and Technical Working Groups,
"Recommended practices for enhancing digital audio compatibility in
multimedia systems (version 3.00)," tech.
rep., Interactive
Multimedia Association, Annapolis, Maryland, Oct. 1992.

[10] D. Deleam and J.-P. Petit, "Real-time implementations of the
recent ITU-T low bit rate speech coders on the TI TMS320C54X DSP:
results, methodology, and applications," in Proc. of International
Conference on Signal Processing, Technology, and Applications
(ICSPAT), (Boston, Massachusetts), pp. 1656--1660, Oct. 1996.

[11] M. Mouly and M.-B. Pautet, The GSM system for mobile
communications. Lassay-les-Chateaux, France: Europe Media Duplication,
1993.

[12] J. Degener, "Digital speech compression," Dr. Dobb's Journal,
Dec. 1994.

[13] S. M. Redl, M. K. Weber, and M. W. Oliphant, An Introduction to
GSM. Boston: Artech House, 1995.

[14] D. Hoffman, G. Fernando, V. Goyal, and M. Civanlar, "RTP payload
format for MPEG1/MPEG2 video," Request for Comments (Proposed
Standard) RFC 2250, Internet Engineering Task Force, Jan. 1998.

[15] N. S. Jayant and P. Noll, Digital Coding of Waveforms--
Principles and Applications to Speech and Video. Englewood Cliffs, New
Jersey: Prentice-Hall, 1984.

[16] K. McKay, "RTP Payload Format for PureVoice(tm) Audio," Request
for Comments (Proposed Standard) RFC 2658, Internet Engineering Task
Force, Aug. 1999.

[17] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C.
Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP Payload for
Redundant Audio Data," Request for Comments (Proposed Standard) RFC
2198, Internet Engineering Task Force, Sep. 1997.

[18] M. Speer and D. Hoffman, "RTP payload format of Sun's CellB
video encoding," Request for Comments (Proposed Standard) RFC 2029,
Internet Engineering Task Force, Oct. 1996.

[19] L. Berc, W. Fenner, R. Frederick, and S.
McCanne, "RTP payload
format for JPEG-compressed video," Request for Comments (Proposed
Standard) RFC 2435, Internet Engineering Task Force, Oct. 1998.

[20] T. Turletti and C. Huitema, "RTP payload format for H.261 video
streams," Request for Comments (Proposed Standard) RFC 2032, Internet
Engineering Task Force, Oct. 1996.

[21] C. Bormann, L. Cline, G. Deisher, T. Gardos, C. Maciocco, D.
Newell, J. Ott, G. Sullivan, S. Wenger, C. Zhu, "RTP Payload Format
for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)," Request for
Comments (Proposed Standard) RFC 2429, Internet Engineering Task
Force, Oct. 1998.

[22] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming
protocol (RTSP)," Request for Comments (Proposed Standard) RFC 2326,
Internet Engineering Task Force, Apr. 1998.

[23] S. Deering, "Host Extensions for IP Multicasting," Request for
Comments RFC 1112, STD 5, Internet Engineering Task Force, Aug. 1989.

Current Locations of Related Resources

Note: Several sections below refer to the ITU-T Software Tool Library
(STL). It is available from the ITU Sales Service, Place des Nations,
CH-1211 Geneve 20, Switzerland (also check http://www.itu.int). The
ITU-T STL is covered by a license defined in ITU-T Recommendation
G.191, "Software tools for speech and audio coding standardization".

UTF-8

Information on the UCS Transformation Format 8 (UTF-8) is available
at

http://www.stonehand.com/unicode/standard/utf8.html

DVI4

An implementation is available from Jack Jansen at

ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar

G722

An implementation of the G.722 algorithm is available as part of the
ITU-T STL, described above.

G726

G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24,
and 16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)".
An implementation of the G.726 algorithm is available as part of the
ITU-T STL, described above.

G729

The reference C code implementation defining the G.729 algorithm and
its Annexes A through I are available as an integral part of
Recommendation G.729 from the ITU Sales Service, listed above. Annex
I contains the integrated C source code for all G.729 operating
modes. The G.729 algorithm and associated C code are covered by a
specific license. The contact information for obtaining the license
is available from the ITU-T Secretariat.

GSM

A reference implementation was written by Carsten Bormann and Jutta
Degener (TU Berlin, Germany). It is available at

ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/

Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
code implementation of the RPE-LTP algorithm available as part of the
ITU-T STL. The STL implementation is an adaptation of the TU Berlin
version.

LPC

An implementation is available at

ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z

PCMU, PCMA

An implementation of these algorithms is available as part of the
ITU-T STL, described above. Code to convert between linear and mu-law
companded data is also available in [9].

Table of Contents

1         Introduction
1.1       Terminology
2         RTP and RTCP Packet Forms and Protocol Behavior
3         IANA Considerations
3.1       Registering Additional Encodings
4         Audio
4.1       Encoding-Independent Rules
4.2       Operating Recommendations
4.3       Guidelines for Sample-Based Audio Encodings
4.4       Guidelines for Frame-Based Audio Encodings
4.5       Audio Encodings
4.5.1     DVI4
4.5.2     G722
4.5.3     G726-40, G726-32, G726-24, and G726-16
4.5.4     G728
4.5.5     G729
4.5.6     G729D and G729E
4.5.7     GSM
4.5.7.1   General Packaging Issues
4.5.7.2   GSM variable names and numbers
4.5.8     L8
4.5.9     L16
4.5.10    LPC
4.5.11    MPA
4.5.12    PCMA and PCMU
4.5.13    QCELP
4.5.14    RED
4.5.15    VDVI
5         Video
5.1       CelB
5.2       JPEG
5.3       H261
5.4       H263-1998
5.5       MPV
5.8       nv
6         Payload Type Definitions
7         RTP over TCP and Similar Byte Stream Protocols
8         Port Assignment
9         Changes from RFC 1890
10        Security Considerations
11        Full Copyright Statement
12        Acknowledgements
13        Addresses of Authors
A         Bibliography