idnits 2.17.1

draft-ietf-avt-profile-new-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** There are 35 instances of too long lines in the document, the longest one
     being 8 characters in excess of 72.

  ** The abstract seems to contain references ([6]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 393 has weird spacing: '...hannels descr...'

  == Line 401 has weird spacing: '... lc c r...'

  == Line 523 has weird spacing: '...ncoding sampl...'

  == Line 547 has weird spacing: '...A: not appli...'

  == Line 629 has weird spacing: '... bits conte...'

  == (3 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 20, 2001) is 8310 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '6' on line 1698 looks like a reference
  -- Missing reference section? '1' on line 1677 looks like a reference
  -- Missing reference section? '2' on line 1682 looks like a reference
  -- Missing reference section? '3' on line 1685 looks like a reference
  -- Missing reference section? '4' on line 1689 looks like a reference
  -- Missing reference section? '5' on line 1694 looks like a reference
  -- Missing reference section? '7' on line 1702 looks like a reference
  -- Missing reference section? '8' on line 1706 looks like a reference
  -- Missing reference section? '9' on line 1850 looks like a reference
  -- Missing reference section? '10' on line 1714 looks like a reference
  -- Missing reference section? '11' on line 1720 looks like a reference
  -- Missing reference section? '12' on line 1724 looks like a reference
  -- Missing reference section? '13' on line 1727 looks like a reference
  -- Missing reference section? '0' on line 1099 looks like a reference
  -- Missing reference section? '22' on line 1763 looks like a reference
  -- Missing reference section? '23' on line 1769 looks like a reference
  -- Missing reference section? '24' on line 1773 looks like a reference
  -- Missing reference section? '25' on line 1090 looks like a reference
  -- Missing reference section? '26' on line 1095 looks like a reference
  -- Missing reference section? '27' on line 1096 looks like a reference
  -- Missing reference section? '28' on line 1097 looks like a reference
  -- Missing reference section? '29' on line 1098 looks like a reference
  -- Missing reference section? '30' on line 1099 looks like a reference
  -- Missing reference section? '31' on line 1100 looks like a reference
  -- Missing reference section? '32' on line 1101 looks like a reference
  -- Missing reference section? '33' on line 1102 looks like a reference
  -- Missing reference section? '34' on line 1103 looks like a reference
  -- Missing reference section? '35' on line 1104 looks like a reference
  -- Missing reference section? '36' on line 1105 looks like a reference
  -- Missing reference section? '37' on line 1106 looks like a reference
  -- Missing reference section? '38' on line 1107 looks like a reference
  -- Missing reference section? '39' on line 1112 looks like a reference
  -- Missing reference section? '40' on line 1113 looks like a reference
  -- Missing reference section? '41' on line 1114 looks like a reference
  -- Missing reference section? '42' on line 1115 looks like a reference
  -- Missing reference section? '43' on line 1116 looks like a reference
  -- Missing reference section? '14' on line 1730 looks like a reference
  -- Missing reference section? '44' on line 1117 looks like a reference
  -- Missing reference section? '15' on line 1734 looks like a reference
  -- Missing reference section? '45' on line 1118 looks like a reference
  -- Missing reference section? '16' on line 1738 looks like a reference
  -- Missing reference section? '46' on line 1119 looks like a reference
  -- Missing reference section? '17' on line 1742 looks like a reference
  -- Missing reference section? '47' on line 1120 looks like a reference
  -- Missing reference section? '18' on line 1747 looks like a reference
  -- Missing reference section? '48' on line 1121 looks like a reference
  -- Missing reference section? '19' on line 1751 looks like a reference
  -- Missing reference section? '49' on line 1122 looks like a reference
  -- Missing reference section? '20' on line 1755 looks like a reference
  -- Missing reference section? '50' on line 1123 looks like a reference
  -- Missing reference section? '21' on line 1759 looks like a reference
  -- Missing reference section? '51' on line 1124 looks like a reference

     Summary: 6 errors (**), 0 flaws (~~), 8 warnings (==), 55 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Internet Engineering Task Force                                   AVT WG
3    Internet Draft                                        Schulzrinne/Casner
4    draft-ietf-avt-profile-new-11.txt              Columbia U./Packet Design
5    July 20, 2001
6    Expires: January 2002

8       RTP Profile for Audio and Video Conferences with Minimal Control

10   STATUS OF THIS MEMO

12      This document is an Internet-Draft and is in full conformance with
13      all provisions of Section 10 of RFC2026.

15      Internet-Drafts are working documents of the Internet Engineering
16      Task Force (IETF), its areas, and its working groups. Note that
17      other groups may also distribute working documents as
18      Internet-Drafts.

20      Internet-Drafts are draft documents valid for a maximum of six months
21      and may be updated, replaced, or obsoleted by other documents at any
22      time. It is inappropriate to use Internet-Drafts as reference
23      material or to cite them other than as "work in progress".

25      The list of current Internet-Drafts can be accessed at
26      http://www.ietf.org/ietf/1id-abstracts.txt

28      To view the list of Internet-Draft Shadow Directories, see
29      http://www.ietf.org/shadow.html.

31   Abstract

33      This memorandum is a revision of RFC 1890 in preparation for
34      advancement from Proposed Standard to Draft Standard status. Readers
35      are encouraged to use the PostScript form of this draft to see where
36      changes from RFC 1890 are marked by change bars.

38      This document describes a profile called "RTP/AVP" for the use of the
39      real-time transport protocol (RTP), version 2, and the associated
40      control protocol, RTCP, within audio and video multiparticipant
41      conferences with minimal control. It provides interpretations of
42      generic fields within the RTP specification suitable for audio and
43      video conferences. In particular, this document defines a set of
44      default mappings from payload type numbers to encodings.

46      This document also describes how audio and video data may be carried
47      within RTP. It defines a set of standard encodings and their names
48      when used within RTP. The descriptions provide pointers to reference
49      implementations and the detailed standards. This document is meant as
50      an aid for implementors of audio, video and other real-time
51      multimedia applications.

53   Resolution of Open Issues

55      [Note to the RFC Editor: This section is to be deleted when this
56      draft is published as an RFC but is shown here for reference during
57      the Last Call. The first paragraph of the Abstract is also to be
58      deleted. All RFC XXXX should be filled in with the number of the RTP
59      specification RFC submitted for Draft Standard status, and all RFC
60      YYYY should be filled in with the number of the draft specifying MIME
61      registration of RTP payload types as it is submitted for Proposed
62      Standard status. These latter references are intended to be
63      non-normative.]

65      Readers are directed to Appendix 9, Changes from RFC 1890, for a
66      listing of the changes that have been made in this draft. The
67      changes from RFC 1890 are marked with change bars in the PostScript
68      form of this draft.

70      The changes in this revision of the draft from the previous one are:

72        o Added back G723, GSM-EFR, H263 (1996), MP2T payload formats
73          since reports of interoperable implementations of these were
74          received.

76        o Added references to optional parameters in the payload format
77          MIME registrations [6] for G723, G729, L16, MPA and MPV.

79        o Clarified that the marker bit for audio is set only when
80          packets are intentionally not sent during silence.

82        o Removed a reference in the Security Considerations section to
83          the previously removed mapping of a user pass-phrase into an
84          encryption key.

86      This version of the draft is intended to be complete for Last Call.
87      The following open issues from previous drafts have been addressed:

89        o The procedure for registering RTP encoding names as MIME
90          subtypes was moved to a separate RFC-to-be that may also serve
91          to specify how (some of) the encodings here may be used with
92          mail and other non-RTP transports. That procedure is not
93          required to implement this profile, but may be used in those
94          contexts where it is needed.

96        o This profile follows the suggestion in the RTP spec that RTCP
97          bandwidth may be specified separately from the session
98          bandwidth and separately for active senders and passive
99          receivers.

101       o No specific action is taken in this document to address
102         generic payload formats; it is assumed that if any generic
103         payload formats are developed, they can be specified in
104         separate RFCs and that the session parameters they require for
105         operation can be specified in the MIME registration of those
106         formats.

108       o The specification of the CN (comfort noise) payload format has
109         been removed to a separate draft so that it may be enhanced as
110         a result of additional work in ITU-T. That draft is intended
111         for publication at Proposed Standard status. Static payload
112         type 13 is marked reserved here for the use of that payload
113         format (since CN has already been implemented from earlier
114         drafts of this profile). Static payload type 19 is also
115         reserved because some revisions of the draft assigned that
116         number to CN to avoid an historic use of 13.

118       o The requirement for congestion control in RTP is addressed in
119         the RTP spec with an explanation that the behavior is context
120         specific and should be defined in RTP profiles. Text has been
121         added to this profile in Section 2 to describe the
122         requirements only in general terms because specific algorithms
123         have not been devised yet for multicast congestion control.

125  1 Introduction

127     This profile defines aspects of RTP left unspecified in the RTP
128     Version 2 protocol definition (RFC XXXX) [1]. This profile is
129     intended for use within audio and video conferences with minimal
130     session control. In particular, no support for the negotiation of
131     parameters or membership control is provided. The profile is expected
132     to be useful in sessions where no negotiation or membership control
133     is used (e.g., using the static payload types and the membership
134     indications provided by RTCP), but this profile may also be useful in
135     conjunction with a higher-level control protocol.

137     Use of this profile may be implicit in the use of the appropriate
138     applications; there may be no explicit indication by port number,
139     protocol identifier or the like. Applications such as session
140     directories may use the name for this profile specified in Section 3.

142     Other profiles may make different choices for the items specified
143     here.

145     This document also defines a set of encodings and payload formats for
146     audio and video.

148  1.1 Terminology

150     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
151     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
152     document are to be interpreted as described in RFC 2119 [2] and
153     indicate requirement levels for implementations compliant with this
154     RTP profile.

156     This draft defines the term media type as dividing encodings of audio
157     and video content into three classes: audio, video and audio/video
158     (interleaved).

160  2 RTP and RTCP Packet Forms and Protocol Behavior

162     The section "RTP Profiles and Payload Format Specification" of RFC
163     XXXX enumerates a number of items that can be specified or modified
164     in a profile. This section addresses these items. Generally, this
165     profile follows the default and/or recommended aspects of the RTP
166     specification.

168     RTP data header: The standard format of the fixed RTP data
169        header is used (one marker bit).

171     Payload types: Static payload types are defined in Section 6.

173     RTP data header additions: No additional fixed fields are
174        appended to the RTP data header.

176     RTP data header extensions: No RTP header extensions are
177        defined, but applications operating under this profile MAY
178        use such extensions. Thus, applications SHOULD NOT assume
179        that the RTP header X bit is always zero and SHOULD be
180        prepared to ignore the header extension. If a header
181        extension is defined in the future, that definition MUST
182        specify the contents of the first 16 bits in such a way
183        that multiple different extensions can be identified.

185     RTCP packet types: No additional RTCP packet types are defined
186        by this profile specification.

188     RTCP report interval: The suggested constants are to be used for
189        the RTCP report interval calculation. Sessions operating
190        under this profile MAY specify a separate parameter for the
191        RTCP traffic bandwidth rather than using the default
192        fraction of the session bandwidth. The RTCP traffic
193        bandwidth MAY be divided into two separate session
194        parameters for those participants which are active data
195        senders and those which are not. Following the
196        recommendation in the RTP specification [1] that 1/4 of the
197        RTCP bandwidth be dedicated to data senders, the
198        RECOMMENDED default values for these two parameters would
199        be 1.25% and 3.75%, respectively. For a particular session,
200        the RTCP bandwidth for non-data-senders MAY be set to zero
201        when operating on unidirectional links or for sessions that
202        don't require feedback on the quality of reception. The
203        RTCP bandwidth for data senders SHOULD be kept non-zero so
204        that sender reports can still be sent for inter-media
205        synchronization and to identify the source by CNAME. The
206        means by which the one or two session parameters for RTCP
207        bandwidth are specified is beyond the scope of this memo.

209     SR/RR extension: No extension section is defined for the RTCP SR
210        or RR packet.

212     SDES use: Applications MAY use any of the SDES items described
213        in the RTP specification. While CNAME information MUST be
214        sent every reporting interval, other items SHOULD only be
215        sent every third reporting interval, with NAME sent seven
216        out of eight times within that slot and the remaining SDES
217        items cyclically taking up the eighth slot, as defined in
218        Section 6.2.2 of the RTP specification. In other words,
219        NAME is sent in RTCP packets 1, 4, 7, 10, 13, 16, 19,
220        while, say, EMAIL is used in RTCP packet 22.

222     Security: The RTP default security services are also the default
223        under this profile.

225     String-to-key mapping: No mapping is specified by this profile.

227     Congestion: RTP and this profile may be used in the context of
228        enhanced network service, for example, through Integrated
229        Services (RFC 1633) [3] or Differentiated Services (RFC
230        2475) [4], or they may be used with best effort service.

232        If enhanced service is being used, RTP receivers SHOULD
233        monitor packet loss to ensure that the service that was
234        requested is actually being delivered. If it is not, then
235        they SHOULD assume that they are receiving best-effort
236        service and behave accordingly.

238        If best-effort service is being used, RTP receivers SHOULD
239        monitor packet loss to ensure that the packet loss rate is
240        within acceptable parameters. Packet loss is considered
241        acceptable if a TCP flow across the same network path and
242        experiencing the same network conditions would achieve an
243        average throughput, measured on a reasonable timescale,
244        that is not less than the RTP flow is achieving. This
245        condition can be satisfied by implementing congestion
246        control mechanisms to adapt the transmission rate (or the
247        number of layers subscribed for a layered multicast
248        session), or by arranging for a receiver to leave the
249        session if the loss rate is unacceptably high.

251        The comparison to TCP cannot be specified exactly, but is
252        intended as an "order-of-magnitude" comparison in timescale
253        and throughput. The timescale on which TCP throughput is
254        measured is the round-trip time of the connection. In
255        essence, this requirement states that it is not acceptable
256        to deploy an application (using RTP or any other transport
257        protocol) on the best-effort Internet which consumes
258        bandwidth arbitrarily and does not compete fairly with TCP
259        within an order of magnitude.

261     Underlying protocol: The profile specifies the use of RTP over
262        unicast and multicast UDP as well as TCP. (This does not
263        preclude the use of these definitions when RTP is carried
264        by other lower-layer protocols.)

266     Transport mapping: The standard mapping of RTP and RTCP to
267        transport-level addresses is used.

269     Encapsulation: This profile leaves to applications the
270        specification of RTP encapsulation in protocols other than
271        UDP.
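
The "RTCP report interval" and "SDES use" items earlier in this section can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not part of the profile: the function names are invented here, bandwidth is taken in bits per second, and the order in which the remaining SDES items cycle through the eighth slot is an assumption (the text only says they are sent cyclically).

```python
# Hypothetical helpers illustrating two items of Section 2.

def rtcp_bandwidth_split(session_bw_bps, receivers_enabled=True):
    """Return (sender_bps, receiver_bps) using the RECOMMENDED defaults.

    Senders get 1.25% and non-senders 3.75% of the session bandwidth.
    The non-sender share MAY be zero (e.g. unidirectional links).
    """
    sender = session_bw_bps * 125 / 10000
    receiver = session_bw_bps * 375 / 10000 if receivers_enabled else 0.0
    return sender, receiver


def sdes_items_for_packet(n, other_items=("EMAIL", "PHONE", "LOC", "TOOL", "NOTE")):
    """Return the SDES items carried by the n-th RTCP packet (n starts at 1).

    CNAME goes in every packet. Every third packet has one extra slot:
    NAME fills it seven times out of eight, and one of the other items
    (cycling order assumed here) fills it the eighth time.
    """
    items = ["CNAME"]
    if (n - 1) % 3 == 0:
        slot = (n - 1) // 3          # index of this extra slot
        if slot % 8 < 7:
            items.append("NAME")
        else:
            items.append(other_items[(slot // 8) % len(other_items)])
    return items
```

With these assumptions, packets 1, 4, ..., 19 carry CNAME and NAME, packet 22 carries CNAME and EMAIL, and packets in between carry CNAME only, which matches the example in the "SDES use" item.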

273  3 IANA Considerations

275     The RTP specification establishes a registry of profile names for use
276     by higher-level control protocols, such as the Session Description
277     Protocol (SDP), RFC 2327 [5], to refer to transport methods. This
278     profile registers the name "RTP/AVP".

280  3.1 Registering Additional Encodings

282     This profile lists a set of encodings, each of which comprises
283     a particular media data compression or representation plus a payload
284     format for encapsulation within RTP. Some of those payload formats
285     are specified here, while others are specified in separate RFCs. It
286     is expected that additional encodings beyond the set listed here will
287     be created in the future and specified in additional payload format
288     RFCs.

290     This profile also assigns to each encoding a short name which MAY be
291     used by higher-level control protocols, such as the Session
292     Description Protocol (SDP), RFC 2327 [5], to identify encodings
293     selected for a particular RTP session.

295     In some contexts it may be useful to refer to these encodings in the
296     form of a MIME content-type. To facilitate this, RFC YYYY [6]
297     provides registrations for all of the encoding names listed here as
298     MIME subtype names under the "audio" and "video" MIME types through
299     the MIME registration procedure as specified in RFC 2048 [7].

301     Any additional encodings specified for use under this profile (or
302     others) may also be assigned names registered as MIME subtypes with
303     the Internet Assigned Numbers Authority (IANA). This registry
304     provides a means to ensure that the names assigned to the additional
305     encodings are kept unique. RFC YYYY specifies the information that is
306     required for the registration of RTP encodings.

308     In addition to assigning names to encodings, this profile also
309     assigns static RTP payload type numbers to some of them. However, the
310     payload type number space is relatively small and cannot accommodate
311     assignments for all existing and future encodings. During the early
312     stages of RTP development, it was necessary to use statically
313     assigned payload types because no other mechanism had been specified
314     to bind encodings to payload types. It was anticipated that non-RTP
315     means beyond the scope of this memo (such as directory services or
316     invitation protocols) would be specified to establish a dynamic
317     mapping between a payload type and an encoding. Now, mechanisms for
318     defining dynamic payload type bindings have been specified in the
319     Session Description Protocol (SDP) and in other protocols such as
320     ITU-T recommendation H.323/H.245. These mechanisms associate the
321     registered name of the encoding/payload format, along with any
322     additional required parameters such as the RTP timestamp clock rate
323     and number of channels, to a payload type number. This association
324     is effective only for the duration of the RTP session in which the
325     dynamic payload type binding is made. This association applies only
326     to the RTP session for which it is made, so the numbers can be
327     reused for different encodings in different sessions and the number
328     space limitation is avoided.

330     This profile reserves payload type numbers in the range 96-127
331     exclusively for dynamic assignment. Applications SHOULD first use
332     values in this range for dynamic payload types. Those applications
333     which need to define more than 32 dynamic payload types MAY bind
334     codes below 96, in which case it is RECOMMENDED that unassigned
335     payload type numbers be used first. However, the statically assigned
336     payload types are default bindings and MAY be dynamically bound to
337     new encodings if needed. Redefining payload types below 96 may cause
338     incorrect operation if an attempt is made to join a session without
339     obtaining session description information that defines the dynamic
340     payload types.

342     Dynamic payload types SHOULD NOT be used without a well-defined
343     mechanism to indicate the mapping. Systems that expect to
344     interoperate with others operating under this profile SHOULD NOT make
345     their own assignments of proprietary encodings to particular, fixed
346     payload types.

348     This specification establishes the policy that no additional static
349     payload types will be assigned beyond the ones defined in this
350     document. Establishing this policy avoids the problem of trying to
351     create a set of criteria for accepting static assignments and
352     encourages the implementation and deployment of the dynamic payload
353     type mechanisms.

355  4 Audio

357  4.1 Encoding-Independent Rules

359     For applications which send either no packets or occasional
360     comfort-noise packets during silence, the first packet of a
361     talkspurt, that is, the first packet after a silence period during
362     which packets have not been transmitted contiguously, SHOULD be
363     distinguished by setting the marker bit in the RTP data header to one.
364     The marker bit in all other packets is zero. The beginning of a
365     talkspurt MAY be used to adjust the playout delay to reflect changing
366     network delays. Applications without silence suppression MUST set the
367     marker bit to zero.

369     The RTP clock rate used for generating the RTP timestamp is
370     independent of the number of channels and the encoding; it equals the
371     number of sampling periods per second. For N-channel encodings, each
372     sampling period (say, 1/8000 of a second) generates N samples. (This
373     terminology is standard, but somewhat confusing, as the total number
374     of samples generated per second is then the sampling rate times the
375     channel count.)
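
The two rules above (marker bit at the start of a talkspurt, and an RTP clock rate that is independent of the channel count) can be sketched in Python as follows. This is a minimal illustration; the function names are invented here, and treating the very start of the stream as "not a silence gap" is an assumption, since the text does not address that case.

```python
# Hypothetical helpers illustrating Section 4.1.

def timestamp_increment(clock_rate_hz, packet_ms):
    """RTP timestamp ticks spanned by one packet.

    The result depends only on the number of sampling periods,
    not on the number of channels.
    """
    ticks, remainder = divmod(clock_rate_hz * packet_ms, 1000)
    if remainder:
        raise ValueError("packet duration is not a whole number of sampling periods")
    return ticks


def samples_in_packet(clock_rate_hz, packet_ms, channels):
    """Total samples carried: N samples per sampling period for N channels."""
    return timestamp_increment(clock_rate_hz, packet_ms) * channels


def marker_bits(sent):
    """Marker bit for each transmitted packet.

    `sent` holds one flag per packetization interval: True if a packet
    was sent, False if it was suppressed during silence. The first
    packet after a suppressed interval gets marker bit 1.
    """
    markers = []
    after_gap = False  # assumption: stream start is not treated as a gap
    for was_sent in sent:
        if not was_sent:
            after_gap = True
            continue
        markers.append(1 if after_gap else 0)
        after_gap = False
    return markers
```

For example, a 20 ms packet at 8000 Hz advances the timestamp by 160 whether it carries one channel (160 samples) or two (320 samples).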

377     If multiple audio channels are used, channels are numbered
378     left-to-right, starting at one. In RTP audio packets, information from
379     lower-numbered channels precedes that from higher-numbered channels.

381     For more than two channels, the convention followed by the AIFF-C
382     audio interchange format SHOULD be used [8], with the following
383     notation, unless some other convention is specified for a particular
384     encoding or payload format:

386        l    left
387        r    right
388        c    center
389        S    surround
390        F    front
391        R    rear

393        channels  description                 channel
394                                  1     2     3     4     5     6
395        __________________________________________________
396        2         stereo          l     r
397        3                         l     r     c
398        4         quadrophonic    Fl    Fr    Rl    Rr
399        4                         l     c     r     S
400        5                         Fl    Fr    Fc    Sl    Sr
401        6                         l     lc    c     r     rc    S

403     Samples for all channels belonging to a single sampling instant MUST
404     be within the same packet. The interleaving of samples from different
405     channels depends on the encoding. General guidelines are given in
406     Sections 4.3 and 4.4.

408     The sampling frequency SHOULD be drawn from the set: 8000, 11025,
409     16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Older Apple
410     Macintosh computers had a native sample rate of 22254.54 Hz, which
411     can be converted to 22050 with acceptable quality by dropping 4
412     samples in a 20 ms frame.) However, most audio encodings are defined
413     for a more restricted set of sampling frequencies. Receivers SHOULD
414     be prepared to accept multi-channel audio, but MAY choose to only
415     play a single channel.

417  4.2 Operating Recommendations

419     The following recommendations are default operating parameters.
420     Applications SHOULD be prepared to handle other values. The ranges
421     given are meant to give guidance to application writers, allowing a
422     set of applications conforming to these guidelines to interoperate
423     without additional negotiation. These guidelines are not intended to
424     restrict operating parameters for applications that can negotiate a
425     set of interoperable parameters, e.g., through a conference control
426     protocol.

428     For packetized audio, the default packetization interval SHOULD have
429     a duration of 20 ms or one frame, whichever is longer, unless
430     otherwise noted in Table 1 (column "ms/packet"). The packetization
431     interval determines the minimum end-to-end delay; longer packets
432     introduce less header overhead but higher delay and make packet loss
433     more noticeable. For non-interactive applications such as lectures or
434     for links with severe bandwidth constraints, a higher packetization
435     delay MAY be used. A receiver SHOULD accept packets representing
436     between 0 and 200 ms of audio data. (For framed audio encodings, a
437     receiver SHOULD accept packets with a number of frames equal to 200
438     ms divided by the frame duration, rounded up.) This restriction
439     allows reasonable buffer sizing for the receiver.

441  4.3 Guidelines for Sample-Based Audio Encodings

443     In sample-based encodings, each audio sample is represented by a
444     fixed number of bits. Within the compressed audio data, codes for
445     individual samples may span octet boundaries. An RTP audio packet may
446     contain any number of audio samples, subject to the constraint that
447     the number of bits per sample times the number of samples per packet
448     yields an integral octet count. Fractional encodings produce less
449     than one octet per sample.

451     The duration of an audio packet is determined by the number of
452     samples in the packet.

454     For sample-based encodings producing one or more octets per sample,
455     samples from different channels sampled at the same sampling instant
456     SHOULD be packed in consecutive octets. For example, for a
457     two-channel encoding, the octet sequence is (left channel, first
458     sample), (right channel, first sample), (left channel, second sample),
459     (right channel, second sample), .... For multi-octet encodings, octets
460     SHOULD be transmitted in network byte order (i.e., most significant
461     octet first).

463     The packing of sample-based encodings producing less than one octet
464     per sample is encoding-specific.

466     The RTP timestamp reflects the instant at which the first sample in
467     the packet was sampled, that is, the oldest information in the
468     packet.

470  4.4 Guidelines for Frame-Based Audio Encodings

472     Frame-based encodings encode a fixed-length block of audio into
473     another block of compressed data, typically also of fixed length. For
474     frame-based encodings, the sender MAY choose to combine several such
475     frames into a single RTP packet. The receiver can tell the number of
476     frames contained in an RTP packet, if all the frames have the same
477     length, by dividing the RTP payload length by the audio frame size
478     which is defined as part of the encoding. This does not work when
479     carrying frames of different sizes unless the frame sizes are
480     relatively prime. If not, the frames MUST indicate their size.

482     For frame-based codecs, the channel order is defined for the whole
483     block. That is, for two-channel audio, right and left samples SHOULD
484     be coded independently, with the encoded frame for the left channel
485     preceding that for the right channel.

487     All frame-oriented audio codecs SHOULD be able to encode and decode
488     several consecutive frames within a single packet. Since the frame
489     size for the frame-oriented codecs is given, there is no need to use
490     a separate designation for the same encoding, but with a different
491     number of frames per packet.
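
As a rough illustration of Sections 4.3 and 4.4, the Python sketch below interleaves two channels of 16-bit linear samples in network byte order and counts equal-sized frames in a payload. The function names are invented for this example, and the 24-octet frame size used in the usage note is only a sample value.

```python
import struct


def pack_l16_interleaved(left, right):
    """Interleave two channels of signed 16-bit samples for one packet.

    Output order is (left, sample 1), (right, sample 1), (left, sample 2),
    ..., with each sample sent most significant octet first.
    """
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    interleaved = []
    for l_sample, r_sample in zip(left, right):
        interleaved.extend((l_sample, r_sample))
    return struct.pack(">%dh" % len(interleaved), *interleaved)


def frames_in_payload(payload_len, frame_size):
    """Number of frames in a payload when all frames have the same size."""
    count, remainder = divmod(payload_len, frame_size)
    if remainder:
        raise ValueError("payload is not a whole number of frames")
    return count
```

For instance, a 48-octet payload of 24-octet frames holds two frames. The division only works because every frame has the same length; with mixed frame sizes the receiver needs the frames themselves to indicate their size, as stated above.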
   RTP packets SHALL contain a whole number of frames, with frames
   inserted according to age within a packet, so that the oldest frame
   (to be played first) occurs immediately after the RTP packet header.
   The RTP timestamp reflects the instant at which the first sample in
   the first frame was sampled, that is, the oldest information in the
   packet.

4.5 Audio Encodings

   The characteristics of the audio encodings described in this document
   are shown in Table 1; they are listed in order of their payload type
   in Table 4. While most audio codecs are only specified for a fixed
   sampling rate, some sample-based algorithms (indicated by an entry of
   "var." in the sampling rate column of Table 1) may be used with
   different sampling rates, resulting in different coded bit rates.
   When used with a sampling rate other than that for which a static
   payload type is defined, non-RTP means beyond the scope of this memo
   MUST be used to define a dynamic payload type and MUST indicate the
   selected RTP timestamp clock rate, which is usually the same as the
   sampling rate for audio.

   name of                              sampling              default
   encoding  sample/frame  bits/sample      rate  ms/frame  ms/packet
   __________________________________________________________________
   DVI4      sample        4                var.                   20
   G722      sample        8              16,000                   20
   G723      frame         N/A             8,000        30         30
   G726-40   sample        5               8,000                   20
   G726-32   sample        4               8,000                   20
   G726-24   sample        3               8,000                   20
   G726-16   sample        2               8,000                   20
   G728      frame         N/A             8,000       2.5         20
   G729      frame         N/A             8,000        10         20
   G729D     frame         N/A             8,000        10         20
   G729E     frame         N/A             8,000        10         20
   GSM       frame         N/A             8,000        20         20
   GSM-EFR   frame         N/A             8,000        20         20
   L8        sample        8                var.                   20
   L16       sample        16               var.                   20
   LPC       frame         N/A             8,000        20         20
   MPA       frame         N/A              var.      var.
   PCMA      sample        8                var.                   20
   PCMU      sample        8                var.                   20
   QCELP     frame         N/A             8,000        20         20
   VDVI      sample        var.             var.                   20

   Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
   variable)

4.5.1 DVI4

   DVI4 is specified, with pseudo-code, in [9] as the IMA ADPCM wave
   type.

   However, the encoding defined here as DVI4 differs in three respects
   from this recommendation:

   o  The RTP DVI4 header contains the predicted value rather than
      the first sample value contained in the IMA ADPCM block header.

   o  IMA ADPCM blocks contain an odd number of samples, since the
      first sample of a block is contained just in the header
      (uncompressed), followed by an even number of compressed
      samples. DVI4 has an even number of compressed samples only,
      using the `predict' word from the header to decode the first
      sample.

   o  For DVI4, the 4-bit samples are packed with the first sample
      in the four most significant bits and the second sample in the
      four least significant bits. In the IMA ADPCM codec, the
      samples are packed in the opposite order.

   Each packet contains a single DVI block. This profile only defines
   the 4-bit-per-sample version, while IMA also specifies a
   3-bit-per-sample encoding.

   The "header" word for each channel has the following structure:

      int16  predict;  /* predicted value of first sample
                          from the previous block (L16 format) */
      u_int8 index;    /* current index into stepsize table */
      u_int8 reserved; /* set to zero by sender, ignored by receiver */

   Each octet following the header contains two 4-bit samples, thus the
   number of samples per packet MUST be even because there is no means
   to indicate a partially filled last octet.

   Packing of samples for multiple channels is for further study.

   The document "IMA Recommended Practices for Enhancing Digital Audio
   Compatibility in Multimedia Systems (version 3.0)" contains the
   algorithm description.
   It is available from

      Interactive Multimedia Association
      48 Maryland Avenue, Suite 202
      Annapolis, MD 21401-8011
      USA
      phone: +1 410 626-1380

4.5.2 G722

   G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
   within 64 kbit/s". The G.722 encoder produces a stream of octets,
   each of which SHALL be octet-aligned in an RTP packet. The first bit
   transmitted in the G.722 octet, which is the most significant bit of
   the higher sub-band sample, SHALL correspond to the most significant
   bit of the octet in the RTP packet.

   Even though the actual sampling rate for G.722 audio is 16000 Hz, the
   RTP clock rate for the G722 payload format is 8000 Hz because that
   value was erroneously assigned in RFC 1890 and must remain unchanged
   for backward compatibility. The octet rate or sample-pair rate is
   8000 Hz.

4.5.3 G723

   G723 is specified in ITU-T Recommendation G.723.1, "Dual-rate speech
   coder for multimedia communications transmitting at 5.3 and 6.3
   kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T as
   a mandatory codec for ITU-T H.324 GSTN videophone terminal
   applications. The algorithm has a floating point specification in
   Annex B to G.723.1, a silence compression algorithm in Annex A to
   G.723.1 and an encoded signal bit-error sensitivity specification in
   G.723.1 Annex C.

   This Recommendation specifies a coded representation that can be used
   for compressing the speech signal component of multi-media services
   at a very low bit rate. Audio is encoded in 30 ms frames, with an
   additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be
   one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
   frame), or 4 octets. These 4-octet frames are called SID frames
   (Silence Insertion Descriptor) and are used to specify comfort noise
   parameters. There is no restriction on how 4, 20, and 24 octet frames
   are intermixed.
   The least significant two bits of the first octet in the frame
   determine the frame size and codec type:

      bits  content                      octets/frame
      00    high-rate speech (6.3 kb/s)            24
      01    low-rate speech  (5.3 kb/s)            20
      10    SID frame                               4
      11    reserved

   It is possible to switch between the two rates at any 30 ms frame
   boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
   the encoder and decoder. The MIME registration for G723 in RFC YYYY
   [6] specifies parameters that MAY be used with MIME or SDP to
   restrict to a single data rate or to restrict the use of SID frames.
   This coder was optimized to represent speech with near-toll quality
   at the above rates using a limited amount of complexity.

   The packing of the encoded bit stream into octets and the
   transmission order of the octets is specified in Rec. G.723.1 and is
   the same as that produced by the G.723 C code reference
   implementation. For the 6.3 kb/s data rate, this packing is
   illustrated as follows, where the header (HDR) bits are always "0 0"
   as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit
   is always set to zero. The diagrams show the bit packing in "network
   byte order," also known as big-endian order. The bits of each 32-bit
   word are numbered 0 to 31, with the most significant bit on the left
   and numbered 0. The octets (bytes) of each word are transmitted most
   significant octet first. The bits of each data field are numbered in
   the order of the bit stream representation of the encoding (least
   significant bit first). The vertical bars indicate the boundaries
   between field fragments.
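   Because the frame type is carried in the two least significant bits
   of each frame's first octet, a receiver can walk a payload that mixes
   24-, 20-, and 4-octet frames. The following is a minimal sketch under
   that reading; the name split_g7231_frames and the choice to raise an
   error on the reserved value or on a truncated frame are assumptions
   of this example.

```python
from typing import List

# Frame size in octets, keyed by the two least significant bits of the
# first octet of the frame (value 0b11 is reserved).
G7231_FRAME_SIZES = {0b00: 24, 0b01: 20, 0b10: 4}


def split_g7231_frames(payload: bytes) -> List[bytes]:
    """Split an RTP payload into individual G.723.1 frames."""
    frames = []
    offset = 0
    while offset < len(payload):
        hdr = payload[offset] & 0x03
        size = G7231_FRAME_SIZES.get(hdr)
        if size is None:
            raise ValueError("reserved HDR value 11")
        if offset + size > len(payload):
            raise ValueError("truncated frame at end of payload")
        frames.append(payload[offset:offset + size])
        offset += size
    return frames
```

   The loop reads the header bits frame by frame because the size of
   one frame says nothing about the size of the next.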
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LPC    |HDR|      LPC      |      LPC      |   ACL0    |LPC|
   |           |   |               |               |           |   |
   |0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ACL2   |ACL|A| GAIN0 |ACL|ACL|     GAIN0     |     GAIN1     |
   |         | 1 |C|       | 3 | 2 |               |               |
   |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
   |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | GAIN2 | GAIN1 |     GAIN2     |     GAIN3     | GRID  | GAIN3 |
   |       |       |               |               |       |       |
   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MSBPOS    |Z|POS|  MSBPOS   |     POS0      |POS|   POS0    |
   |             | | 0 |           |               | 1 |           |
   |0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1|
   |6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     POS1      | POS2  | POS1  |     POS2      | POS3  | POS2  |
   |               |       |       |               |       |       |
   |0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1|
   |9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     POS3      |   PSIG0   |POS|PSIG2|  PSIG1  |  PSIG3  |PSIG2|
   |               |           | 3 |     |         |         |     |
   |1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0|
   |1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1: G.723 (6.3 kb/s) bit packing

   For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1",
   as shown in Fig. 2, to indicate operation at 5.3 kb/s.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LPC    |HDR|      LPC      |      LPC      |   ACL0    |LPC|
   |           |   |               |               |           |   |
   |0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ACL2   |ACL|A| GAIN0 |ACL|ACL|     GAIN0     |     GAIN1     |
   |         | 1 |C|       | 3 | 2 |               |               |
   |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
   |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | GAIN2 | GAIN1 |     GAIN2     |     GAIN3     | GRID  | GAIN3 |
   |       |       |               |               |       |       |
   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     POS0      | POS1  | POS0  |     POS1      |     POS2      |
   |               |       |       |               |               |
   |0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
   |7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | POS3  | POS2  |     POS3      | PSIG1 | PSIG0 | PSIG3 | PSIG2 |
   |       |       |               |       |       |       |       |
   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2: G.723 (5.3 kb/s) bit packing

   The packing of G.723.1 SID (silence) frames, which are indicated by
   the header (HDR) bits having the pattern "1 0", is depicted in Fig.
   3.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LPC    |HDR|      LPC      |      LPC      |   GAIN    |LPC|
   |           |   |               |               |           |   |
   |0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: G.723 SID mode bit packing

4.5.4 G726-40, G726-32, G726-24, and G726-16

   ITU-T Recommendation G.726 describes, among others, the algorithm
   recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
   channel encoded at 8000 samples/sec to and from a 40, 32, 24, or 16
   kbit/s channel. The conversion is applied to the PCM stream using an
   Adaptive Differential Pulse Code Modulation (ADPCM) transcoding
   technique. The ADPCM representation consists of a series of codewords
   with a one-to-one correspondence to the samples in the PCM stream.
   The G726 data rates of 40, 32, 24, and 16 kbit/s have codewords of 5,
   4, 3, and 2 bits respectively.

   The 16 and 24 kbit/s encodings do not provide toll quality speech.
   They are designed for use in overloaded Digital Circuit
   Multiplication Equipment (DCME). ITU-T G.726 recommends that the 16
   and 24 kbit/s encodings should be alternated with higher data rate
   encodings to provide an average sample size of between 3.5 and 3.7
   bits per sample.

   The encodings of G.726 are here denoted as G726-40, G726-32, G726-24,
   and G726-16. Prior to 1990, G721 described the 32 kbit/s ADPCM
   encoding, and G723 described the 40, 32, and 16 kbit/s encodings.
   Thus, G726-32 designates the same algorithm as G721 in RFC 1890.

   A stream of G726 codewords contains no information on the encoding
   being used, therefore transitions between G726 encoding types are not
   permitted within a sequence of packed codewords.
   Applications MUST determine the encoding type of packed codewords
   from the RTP payload identifier.

   No payload-specific header information SHALL be included as part of
   the audio data. A stream of G726 codewords MUST be packed into octets
   as follows: the first codeword is placed into the first octet such
   that the least significant bit of the codeword aligns with the least
   significant bit in the octet, the second codeword is then packed so
   that its least significant bit coincides with the least significant
   unoccupied bit in the octet. When a complete codeword cannot be
   placed into an octet, the bits overlapping the octet boundary are
   placed into the least significant bits of the next octet. Packing
   MUST end with a completely packed final octet. The number of
   codewords packed will therefore be a multiple of 8, 2, 8, and 4 for
   G726-40, G726-32, G726-24, and G726-16 respectively. An example of
   the packing scheme for G726-32 codewords is as shown:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
      |B B B B|A A A A|D D D D|C C C C| ...
      |0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   An example of the packing scheme for G726-24 codewords is:

       0                   1                   2
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
      |C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ...
      |1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

4.5.5 G728

   G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
   16 kbit/s using low-delay code excited linear prediction".

   A G.728 encoder translates 5 consecutive audio samples into a 10-bit
   codebook index, resulting in a bit rate of 16 kb/s for audio sampled
   at 8,000 samples per second. The group of five consecutive samples is
   called a vector.
   Four consecutive vectors, labeled V1 to V4 (where V1 is to be played
   first by the receiver), build one G.728 frame. The four vectors of 40
   bits are packed into 5 octets, labeled B1 through B5. B1 SHALL be
   placed first in the RTP packet.

   Referring to the figure below, the principle for bit order is
   "maintenance of bit significance". Bits from an older vector are more
   significant than bits from newer vectors. The MSB of the frame goes
   to the MSB of B1 and the LSB of the frame goes to LSB of B5.

                1         2         3        3
      0         0         0         0        9
      ++++++++++++++++++++++++++++++++++++++++
      <---V1---><---V2---><---V3---><---V4---> vectors
      <--B1--><--B2--><--B3--><--B4--><--B5--> octets
      <------------- frame 1 ---------------->

   In particular, B1 contains the eight most significant bits of V1,
   with the MSB of V1 being the MSB of B1. B2 contains the two least
   significant bits of V1, the more significant of the two in its MSB,
   and the six most significant bits of V2. B1 SHALL be placed first in
   the RTP packet and B5 last.

4.5.6 G729

   G729 is specified in ITU-T Recommendation G.729, "Coding of speech at
   8 kbit/s using conjugate structure-algebraic code excited linear
   prediction (CS-ACELP)". A reduced-complexity version of the G.729
   algorithm is specified in Annex A to Rec. G.729. The speech coding
   algorithms in the main body of G.729 and in G.729 Annex A are fully
   interoperable with each other, so there is no need to further
   distinguish between them. The G.729 and G.729 Annex A codecs were
   optimized to represent speech with high quality, where G.729 Annex A
   trades some speech quality for an approximate 50% complexity
   reduction [10]. See the next Section (4.5.7) for other data rates
   added in later G.729 Annexes. For all data rates, the sampling
   frequency (and RTP timestamp clock rate) is 8000 Hz.
   A voice activity detector (VAD) and comfort noise generator (CNG)
   algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous
   voice and data applications and can be used in conjunction with G.729
   or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
   while the G.729 Annex B comfort noise frame occupies 2 octets. The
   MIME registration for G729 in RFC YYYY [6] specifies a parameter that
   MAY be used with MIME or SDP to restrict the use of comfort noise
   frames.

   A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A
   frames, followed by zero or one G.729 Annex B frames. The presence of
   a comfort noise frame can be deduced from the length of the RTP
   payload. The default packetization interval is 20 ms (two frames),
   but in some situations it may be desirable to send 10 ms packets. An
   example would be a transition from speech to comfort noise in the
   first 10 ms of the packet. For some applications, a longer
   packetization interval may be required to reduce the packet rate.

   The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
   of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The
   mapping of these parameters is given below in Fig. 4. The diagrams
   show the bit packing in "network byte order," also known as
   big-endian order. The bits of each 32-bit word are numbered 0 to 31,
   with the most significant bit on the left and numbered 0. The octets
   (bytes) of each word are transmitted most significant octet first.
   The bits of each data field are numbered in the order as produced by
   the G.729 C code reference implementation.
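   The rule that a comfort noise frame is deduced from the payload
   length can be sketched as follows: with 10-octet speech frames and a
   2-octet Annex B frame, a remainder of 2 after dividing by 10
   indicates a trailing comfort noise frame. The function name
   parse_g729_payload and the rejection of other remainders are
   assumptions of this example.

```python
from typing import List, Optional, Tuple

G729_FRAME_OCTETS = 10  # G.729 or G.729 Annex A speech frame
G729_CN_OCTETS = 2      # G.729 Annex B comfort noise frame


def parse_g729_payload(payload: bytes) -> Tuple[List[bytes], Optional[bytes]]:
    """Split a G729 payload into speech frames and an optional CN frame."""
    n_speech, leftover = divmod(len(payload), G729_FRAME_OCTETS)
    if leftover not in (0, G729_CN_OCTETS):
        raise ValueError("length matches no valid frame combination")
    speech = [
        payload[i * G729_FRAME_OCTETS:(i + 1) * G729_FRAME_OCTETS]
        for i in range(n_speech)
    ]
    # A 2-octet tail after the speech frames is the comfort noise frame.
    comfort_noise = payload[n_speech * G729_FRAME_OCTETS:] if leftover else None
    return speech, comfort_noise
```

   A 22-octet payload, for example, splits into two speech frames plus
   one comfort noise frame, and a 2-octet payload carries comfort noise
   only.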
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |L|     L1      |   L2    |   L3    |      P1       |P|   C1    |
   |0|             |         |         |               |0|         |
   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      C1       |  S1   | GA1 |  GB1  |   P2    |      C2       |
   |          1 1 1|       |     |       |         |               |
   |5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   C2    |  S2   | GA2 |  GB2  |
   |    1 1 1|       |     |       |
   |8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4: G.729 and G.729A bit packing

   The packing of the G.729 Annex B comfort noise frame is shown in Fig.
   5.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |L|  LSF1   | LSF2  |  GAIN   |R|
   |S|         |       |         |E|
   |F|         |       |         |S|
   |0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V|    RESV = Reserved (zero)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5: G.729 Annex B bit packing

4.5.7 G729D and G729E

   Annexes D and E to ITU-T Recommendation G.729 provide additional data
   rates. Because the data rate is not signaled in the bitstream, the
   different data rates are given distinct RTP encoding names which are
   mapped to distinct payload type numbers. G729D indicates a 6.4 kbit/s
   coding mode (G.729 Annex D, for momentary reduction in channel
   capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E,
   for improved performance with a wide range of narrow-band input
   signals, e.g. music and background noise). Annex E has two operating
   modes, backward adaptive and forward adaptive, which are signaled by
   the first two bits in each frame (the most significant two bits of
   the first octet).
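   Reading the Annex E operating mode from a frame might look like the
   sketch below. It takes the bit values from Figs. 7 and 8 ("0 0" for
   forward adaptive, "1 1" for backward adaptive) and the 15-octet frame
   length from the 118 coded bits plus two "don't care" bits. The
   function name g729e_mode is hypothetical, and treating any other bit
   pattern as an error is an assumption, since this memo does not say
   how such values should be handled.

```python
G729E_FRAME_OCTETS = 15  # 118 coded bits + 2 "don't care" bits = 120 bits


def g729e_mode(frame: bytes) -> str:
    """Return "forward" or "backward" for one G.729 Annex E frame."""
    if len(frame) != G729E_FRAME_OCTETS:
        raise ValueError("a G729E frame is 15 octets long")
    # The mode is carried in the two most significant bits of octet 0.
    mode_bits = frame[0] >> 6
    if mode_bits == 0b00:
        return "forward"
    if mode_bits == 0b11:
        return "backward"
    # Assumption: patterns other than 00 and 11 are rejected.
    raise ValueError("unexpected mode bits")
```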
   The voice activity detector (VAD) and comfort noise generator (CNG)
   algorithm specified in Annex B of G.729 may be used with Annex D and
   Annex E frames in addition to G.729 and G.729 Annex A frames. The
   algorithm details for the operation of Annexes D and E with the Annex
   B CNG are specified in G.729 Annexes F and G. Note that Annexes F and
   G do not introduce any new encodings. The MIME registrations for
   G729D and G729E in RFC YYYY [6] specify a parameter that MAY be used
   with MIME or SDP to restrict the use of comfort noise frames.

   For G729D, an RTP packet may consist of zero or more G.729 Annex D
   frames, followed by zero or one G.729 Annex B frame. Similarly, for
   G729E, an RTP packet may consist of zero or more G.729 Annex E
   frames, followed by zero or one G.729 Annex B frame. The presence of
   a comfort noise frame can be deduced from the length of the RTP
   payload.

   A single RTP packet must contain frames of only one data rate,
   optionally followed by one comfort noise frame. The data rate may be
   changed from packet to packet by changing the payload type number.
   G.729 Annexes D, E and H describe what the encoding and decoding
   algorithms must do to accommodate a change in data rate.

   For G729D, the bits of a G.729 Annex D frame are formatted as shown
   below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |L|     L1      |   L2    |   L3    |      P1       |    C1     |
   |0|             |         |         |               |           |
   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | C1  |S1 | GA1 | GB1 |  P2   |       C2        |S2 | GA2 | GB2 |
   |     |   |     |     |       |                 |   |     |     |
   |6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6: G.729 Annex D bit packing

   The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a
   total of 118 bits are used. Two bits are appended as "don't care"
   bits to complete an integer number of octets for the frame. For
   G729E, the bits of a data frame are formatted as shown in the next
   two diagrams (cf. Table E.1/G.729). The fields for the G729E forward
   adaptive mode are packed as shown in Fig. 7.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0|L|     L1      |   L2    |   L3    |      P1       |P| C0_1|
   |   |0|             |         |         |               |0|     |
   |   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |    C1_1     |    C2_1     |    C3_1     |    C4_1     |
   |       |             |             |             |             |
   |3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | GA1 |  GB1  |   P2    |    C0_2     |    C1_2     |   C2_2    |
   |     |       |         |             |             |           |
   |0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |    C3_2     |    C4_2     | GA2 |  GB2  |DC |
   | |             |             |     |       |   |
   |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7: G.729 Annex E (forward adaptive mode) bit packing

   The fields for the G729E backward adaptive mode are packed as shown
   in Fig. 8.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 1|      P1       |P|          C0_1           |     C1_1      |
   |   |               |0|                    1 1 1|               |
   |   |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |    C2_1     |    C3_1     |    C4_1     | GA1 |  GB1  |P2 |
   |   |             |             |             |     |       |   |
   |8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     |          C0_2           |       C1_2        |   C2_2    |
   |     |                    1 1 1|                   |           |
   |2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |    C3_2     |    C4_2     | GA2 |  GB2  |DC |
   | |             |             |     |       |   |
   |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8: G.729 Annex E (backward adaptive mode) bit packing

4.5.8 GSM

   GSM (Groupe Special Mobile) denotes the European GSM 06.10 standard
   for full-rate speech transcoding, ETS 300 961, which is based on
   RPE/LTP (regular pulse excitation/long term prediction) coding at a
   rate of 13 kb/s [11,12,13]. The text of the standard can be obtained
   from

      ETSI (European Telecommunications Standards Institute)
      ETSI Secretariat: B.P.152
      F-06561 Valbonne Cedex
      France
      Phone: +33 92 94 42 00
      Fax: +33 93 65 47 16

   Blocks of 160 audio samples are compressed into 33 octets, for an
   effective data rate of 13,200 b/s.

4.5.8.1 General Packaging Issues

   The GSM standard (ETS 300 961) specifies the bit stream produced by
   the codec, but does not specify how these bits should be packed for
   transmission. The packetization specified here has subsequently been
   adopted in ETSI Technical Specification TS 101 318.
   Some software implementations of the GSM codec use a different
   packing than that specified here.

   In the GSM packing used by RTP, the bits SHALL be packed beginning
   from the most significant bit. Every 160-sample GSM frame is coded
   into one 33-octet (264-bit) buffer. Every such buffer begins with a
   4-bit signature (0xD), followed by the MSB encoding of the fields of
   the frame. The first octet thus contains 1101 in the 4 most
   significant bits (0-3) and the 4 most significant bits of F1 (0-3) in
   the 4 least significant bits (4-7). The second octet contains the 2
   least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so
   on. The order of the fields in the frame is described in Table 2.

4.5.8.2 GSM variable names and numbers

   In the RTP encoding we have the bit pattern described in Table 3,
   where F.i signifies the ith bit of the field F, bit 0 is the most
   significant bit, and the bits of every octet are numbered from 0 to 7
   from most to least significant.

4.5.9 GSM-EFR

   GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding,
   specified in ETS 300 969 which is available from ETSI at the address
   given in Section 4.5.8. This codec has a frame length of 244 bits.
   For transmission in RTP, each codec frame is packed into a 31-octet
   (248-bit) buffer beginning with a 4-bit signature 0xC in a manner
   similar to that specified here for the original GSM 06.10 codec. The
   packing is specified in ETSI Technical Specification TS 101 318.

4.5.10 L8

   L8 denotes linear audio data samples, using 8 bits of precision with
   an offset of 128, that is, the most negative signal is encoded as
   zero.

4.5.11 L16

   L16 denotes uncompressed audio data samples, using 16-bit signed
   representation with 65535 equally divided steps between minimum and
   maximum signal level, ranging from -32768 to 32767.
   The value is represented in two's complement notation and transmitted
   in network byte order (most significant byte first).

   The MIME registration for L16 in RFC YYYY [6] specifies parameters
   that MAY be used with MIME or SDP to indicate that analog preemphasis
   was applied to the signal before quantization or to indicate that a
   multiple-channel audio stream follows a different channel ordering
   convention than is specified in Section 4.1.

      field  field name  bits  field  field name  bits
      ________________________________________________
          1  LARc[0]        6     39  xmc[22]        3
          2  LARc[1]        6     40  xmc[23]        3
          3  LARc[2]        5     41  xmc[24]        3
          4  LARc[3]        5     42  xmc[25]        3
          5  LARc[4]        4     43  Nc[2]          7
          6  LARc[5]        4     44  bc[2]          2
          7  LARc[6]        3     45  Mc[2]          2
          8  LARc[7]        3     46  xmaxc[2]       6
          9  Nc[0]          7     47  xmc[26]        3
         10  bc[0]          2     48  xmc[27]        3
         11  Mc[0]          2     49  xmc[28]        3
         12  xmaxc[0]       6     50  xmc[29]        3
         13  xmc[0]         3     51  xmc[30]        3
         14  xmc[1]         3     52  xmc[31]        3
         15  xmc[2]         3     53  xmc[32]        3
         16  xmc[3]         3     54  xmc[33]        3
         17  xmc[4]         3     55  xmc[34]        3
         18  xmc[5]         3     56  xmc[35]        3
         19  xmc[6]         3     57  xmc[36]        3
         20  xmc[7]         3     58  xmc[37]        3
         21  xmc[8]         3     59  xmc[38]        3
         22  xmc[9]         3     60  Nc[3]          7
         23  xmc[10]        3     61  bc[3]          2
         24  xmc[11]        3     62  Mc[3]          2
         25  xmc[12]        3     63  xmaxc[3]       6
         26  Nc[1]          7     64  xmc[39]        3
         27  bc[1]          2     65  xmc[40]        3
         28  Mc[1]          2     66  xmc[41]        3
         29  xmaxc[1]       6     67  xmc[42]        3
         30  xmc[13]        3     68  xmc[43]        3
         31  xmc[14]        3     69  xmc[44]        3
         32  xmc[15]        3     70  xmc[45]        3
         33  xmc[16]        3     71  xmc[46]        3
         34  xmc[17]        3     72  xmc[47]        3
         35  xmc[18]        3     73  xmc[48]        3
         36  xmc[19]        3     74  xmc[49]        3
         37  xmc[20]        3     75  xmc[50]        3
         38  xmc[21]        3     76  xmc[51]        3

   Table 2: Ordering of GSM variables
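   The GSM packing of Section 4.5.8.1 (a 4-bit signature followed by
   the 76 fields of Table 2, most significant bit first) can be
   sketched as shown below. This is an illustration only: the names
   FIELD_BITS and pack_gsm_frame are hypothetical, and the caller is
   assumed to supply the field values already quantized by a GSM 06.10
   encoder.

```python
from typing import Sequence

# Field widths in Table 2 order: eight LARc fields, then four
# sub-frames of Nc, bc, Mc, xmaxc and thirteen 3-bit xmc values.
LAR_BITS = [6, 6, 5, 5, 4, 4, 3, 3]
SUBFRAME_BITS = [7, 2, 2, 6] + [3] * 13
FIELD_BITS = LAR_BITS + SUBFRAME_BITS * 4  # 76 fields, 260 bits

GSM_SIGNATURE = 0xD  # 4-bit signature that starts every buffer


def pack_gsm_frame(fields: Sequence[int]) -> bytes:
    """Pack 76 GSM field values into the 33-octet RTP buffer."""
    if len(fields) != len(FIELD_BITS):
        raise ValueError("expected 76 field values")
    acc = GSM_SIGNATURE
    for value, width in zip(fields, FIELD_BITS):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit its width")
        # Append each field below the bits packed so far, so that the
        # first field lands in the most significant position.
        acc = (acc << width) | value
    return acc.to_bytes(33, "big")
```

   The 4 signature bits plus 260 field bits fill the 264-bit buffer
   exactly, so no padding is needed. With LARc[0] set to 63 and all
   other fields zero, the first two octets are 0xDF and 0xC0, which
   matches the octet 0 and octet 1 rows of Table 3.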
   Octet  Bit 0   Bit 1   Bit 2   Bit 3   Bit 4   Bit 5   Bit 6   Bit 7
   ______________________________________________________________________
       0  1       1       0       1       LARc0.0 LARc0.1 LARc0.2 LARc0.3
       1  LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5
       2  LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2
       3  LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1
       4  LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2
       5  Nc0.0   Nc0.1   Nc0.2   Nc0.3   Nc0.4   Nc0.5   Nc0.6   bc0.0
       6  bc0.1   Mc0.0   Mc0.1   xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04
       7  xmaxc05 xmc0.0  xmc0.1  xmc0.2  xmc1.0  xmc1.1  xmc1.2  xmc2.0
       8  xmc2.1  xmc2.2  xmc3.0  xmc3.1  xmc3.2  xmc4.0  xmc4.1  xmc4.2
       9  xmc5.0  xmc5.1  xmc5.2  xmc6.0  xmc6.1  xmc6.2  xmc7.0  xmc7.1
      10  xmc7.2  xmc8.0  xmc8.1  xmc8.2  xmc9.0  xmc9.1  xmc9.2  xmc10.0
      11  xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xmc12.2
      12  Nc1.0   Nc1.1   Nc1.2   Nc1.3   Nc1.4   Nc1.5   Nc1.6   bc1.0
      13  bc1.1   Mc1.0   Mc1.1   xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14
      14  xmaxc15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0
      15  xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2
      16  xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1
      17  xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0
      18  xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2
      19  Nc2.0   Nc2.1   Nc2.2   Nc2.3   Nc2.4   Nc2.5   Nc2.6   bc2.0
      20  bc2.1   Mc2.0   Mc2.1   xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24
      21  xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0
      22  xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2
      23  xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1
      24  xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0
      25  xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2
      26  Nc3.0   Nc3.1   Nc3.2   Nc3.3   Nc3.4   Nc3.5   Nc3.6   bc3.0
      27  bc3.1   Mc3.0   Mc3.1   xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34
      28  xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0
      29  xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2
      30  xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1
      31  xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0
      32  xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2

   Table 3: GSM payload format

4.5.12 LPC

   LPC designates an experimental linear predictive encoding contributed
   by Ron Frederick, which is based on an implementation written by Ron
   Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The
   codec generates 14 octets for every frame. The frame size is set to
   20 ms, resulting in a bit rate of 5,600 b/s.

4.5.13 MPA

   MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary
   streams. The encoding is defined in ISO standards ISO/IEC 11172-3
   and 13818-3. The encapsulation is specified in RFC 2250 [14].

   The encoding may be at any of three levels of complexity, called
   Layer I, II and III. The selected layer as well as the sampling rate
   and channel count are indicated in the payload. The RTP timestamp
   clock rate is always 90000, independent of the sampling rate. MPEG-1
   audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC
   11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of 16,
   22.05 and 24 kHz. The number of samples per frame is fixed, but the
   frame size will vary with the sampling rate and bit rate.

   The MIME registration for MPA in RFC YYYY [6] specifies parameters
   that MAY be used with MIME or SDP to restrict the selection of layer,
   channel count, sampling rate, and bit rate.

4.5.14 PCMA and PCMU

   PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio data
   is encoded as eight bits per sample, after logarithmic scaling. PCMU
   denotes mu-law scaling, PCMA A-law scaling.
   A detailed description is given by Jayant and Noll [15]. Each G.711
   octet SHALL be octet-aligned in an RTP packet. The sign bit of each
   G.711 octet SHALL correspond to the most significant bit of the octet
   in the RTP packet (i.e., assuming the G.711 samples are handled as
   octets on the host machine, the sign bit SHALL be the most
   significant bit of the octet as defined by the host machine format).
   The 56 kb/s and 48 kb/s modes of G.711 are not applicable to RTP,
   since PCMA and PCMU MUST always be transmitted as 8-bit samples.

4.5.15 QCELP

   The Electronic Industries Association (EIA) & Telecommunications
   Industry Association (TIA) standard IS-733, "TR45: High Rate Speech
   Service Option for Wideband Spread Spectrum Communications Systems,"
   defines the QCELP audio compression algorithm for use in wireless
   CDMA applications. The QCELP CODEC compresses each 20 milliseconds of
   8000 Hz, 16-bit sampled input speech into one of four different size
   output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4 (54
   bits) or Rate 1/8 (20 bits). For typical speech patterns, this
   results in an average output of 6.8 kbit/s for normal mode and 4.7
   kbit/s for reduced rate mode. The packetization of the QCELP audio
   codec is described in [16].

4.5.16 RED

   The redundant audio payload format "RED" is specified by RFC 2198
   [17]. It defines a means by which multiple redundant copies of an
   audio packet may be transmitted in a single RTP stream. Each packet
   in such a stream contains, in addition to the audio data for that
   packetization interval, a (more heavily compressed) copy of the data
   from a previous packetization interval. This allows an approximation
   of the data from lost packets to be recovered upon decoding of a
   subsequent packet, giving much improved sound quality when compared
   with silence substitution for lost packets.
1237  4.5.17 VDVI

1239     VDVI is a variable-rate version of DVI4, yielding speech bit rates of
1240     between 10 and 25 kb/s. It is specified for single-channel operation
1241     only. Samples are packed into octets starting at the
1242     most-significant bit. The last octet is padded with 1 bits if the last
1243     sample does not fill the last octet. This padding is distinct from
1244     the valid codewords. The receiver needs to detect the padding
1245     because there is no explicit count of samples in the packet.

1247     It uses the following encoding:

1249               DVI4 codeword    VDVI bit pattern
1250               _______________________________
1251                          0     00
1252                          1     010
1253                          2     1100
1254                          3     11100
1255                          4     111100
1256                          5     1111100
1257                          6     11111100
1258                          7     11111110
1259                          8     10
1260                          9     011
1261                         10     1101
1262                         11     11101
1263                         12     111101
1264                         13     1111101
1265                         14     11111101
1266                         15     11111111

1268  5 Video

1270     The following sections describe the video encodings that are defined
1271     in this memo and give their abbreviated names used for
1272     identification. These video encodings and their payload types are
1273     listed in Table 5.

1275     All of these video encodings use an RTP timestamp frequency of 90,000
1276     Hz, the same as the MPEG presentation time stamp frequency. This
1277     frequency yields exact integer timestamp increments for the typical
1278     24 (HDTV), 25 (PAL), and 29.97 (NTSC) and 30 Hz (HDTV) frame rates
1279     and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED
1280     rate for future video encodings used within this profile, other rates
1281     MAY be used. However, it is not sufficient to use the video frame
1282     rate (typically between 15 and 30 Hz) because that does not provide
1283     adequate resolution for typical synchronization requirements when
1284     calculating the RTP timestamp corresponding to the NTP timestamp in
1285     an RTCP SR packet. The timestamp resolution MUST also be sufficient
1286     for the jitter estimate contained in the receiver reports.
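The arithmetic behind the 90 kHz choice can be checked with a short, non-normative Python sketch. It computes the timestamp increment per frame for the frame rates named above and compares the duration of one clock tick at 90 kHz with that of a clock running at the frame rate. The helper names are hypothetical, and the NTSC rate written as 29.97 Hz is assumed here to mean exactly 30000/1001 Hz.

```python
from fractions import Fraction

RTP_VIDEO_CLOCK_HZ = 90_000  # timestamp frequency used by the video encodings


def timestamp_increment(frame_rate_hz: Fraction) -> Fraction:
    """Number of RTP clock ticks between consecutive frames."""
    return Fraction(RTP_VIDEO_CLOCK_HZ) / frame_rate_hz


def tick_duration_us(clock_hz: int) -> float:
    """Duration of one clock tick in microseconds."""
    return 1_000_000 / clock_hz


# Assumption: "29.97 Hz" is the exact NTSC rate 30000/1001 Hz.
FRAME_RATES = {
    "24 Hz (HDTV)": Fraction(24),
    "25 Hz (PAL)": Fraction(25),
    "29.97 Hz (NTSC)": Fraction(30000, 1001),
    "30 Hz (HDTV)": Fraction(30),
}

for name, rate in FRAME_RATES.items():
    print(f"{name}: {timestamp_increment(rate)} ticks per frame")

# Resolution of a 90 kHz clock versus a clock running at a 30 Hz frame rate.
print(f"90 kHz tick: {tick_duration_us(RTP_VIDEO_CLOCK_HZ):.1f} us")
print(f"30 Hz tick:  {tick_duration_us(30):.1f} us")
```

Exact rational arithmetic is used so that a non-integer increment would show up as a fraction rather than being hidden by floating-point rounding. The last two lines illustrate why a frame-rate clock is too coarse: one tick at 30 Hz spans about 33 ms, whereas one tick at 90 kHz spans about 11 microseconds.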
1288 For most of these video encodings, the RTP timestamp encodes the 1289 sampling instant of the video image contained in the RTP data packet. 1290 If a video image occupies more than one packet, the timestamp is the 1291 same on all of those packets. Packets from different video images are 1292 distinguished by their different timestamps. 1294 Most of these video encodings also specify that the marker bit of the 1295 RTP header SHOULD be set to one in the last packet of a video frame 1296 and otherwise set to zero. Thus, it is not necessary to wait for a 1297 following packet with a different timestamp to detect that a new 1298 frame should be displayed. 1300 5.1 CelB 1302 The CELL-B encoding is a proprietary encoding proposed by Sun 1303 Microsystems. The byte stream format is described in RFC 2029 [18]. 1305 5.2 JPEG 1307 The encoding is specified in ISO Standards 10918-1 and 10918-2. The 1308 RTP payload format is as specified in RFC 2435 [19]. 1310 5.3 H261 1312 The encoding is specified in ITU-T Recommendation H.261, "Video codec 1313 for audiovisual services at p x 64 kbit/s". The packetization and 1314 RTP-specific properties are described in RFC 2032 [20]. 1316 5.4 H263 1318 The encoding is specified in the 1996 version of ITU-T Recommendation 1319 H.263, "Video coding for low bit rate communication". The 1320 packetization and RTP-specific properties are described in RFC 2190 1321 [21]. The H263-1998 payload format is RECOMMENDED over this one for 1322 use by new implementations. 1324 5.5 H263-1998 1326 The encoding is specified in the 1998 version of ITU-T Recommendation 1327 H.263, "Video coding for low bit rate communication". The 1328 packetization and RTP-specific properties are described in RFC 2429 1329 [22]. Because the 1998 version of H.263 is a superset of the 1996 1330 syntax, this payload format can also be used with the 1996 version of 1331 H.263, and is RECOMMENDED for this use by new implementations. 
This 1332 payload format does not replace RFC 2190, which continues to be used 1333 by existing implementations, and may be required for backward 1334 compatibility in new implementations. Implementations using the new 1335 features of the 1998 version of H.263 MUST use the payload format 1336 described in RFC 2429. 1338 5.6 MPV 1340 MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary 1341 streams as specified in ISO Standards ISO/IEC 11172 and 13818-2, 1342 respectively. The RTP payload format is as specified in RFC 2250 1343 [14], Section 3. 1345 The MIME registration for MPV in RFC YYYY [6] specifies a parameter 1346 that MAY be used with MIME or SDP to restrict the selection of the 1347 type of MPEG video. 1349 5.7 MP2T 1351 MP2T designates the use of MPEG-2 transport streams, for either audio 1352 or video. The RTP payload format is described in RFC 2250 [14], 1353 Section 2. 1355 5.8 nv 1357 The encoding is implemented in the program `nv', version 4, developed 1358 at Xerox PARC by Ron Frederick. Further information is available from 1359 the author: 1361 Ron Frederick 1362 Cacheflow Inc. 1363 650 Almanor Avenue 1364 Sunnyvale, CA 94085 1365 United States 1366 electronic mail: ronf@cacheflow.com 1368 6 Payload Type Definitions 1370 Tables 4 and 5 define this profile's static payload type values for 1371 the PT field of the RTP data header. In addition, payload type 1372 values in the range 96-127 MAY be defined dynamically through a 1373 conference control protocol, which is beyond the scope of this 1374 document. For example, a session directory could specify that for a 1375 given session, payload type 96 indicates PCMU encoding, 8,000 Hz 1376 sampling rate, 2 channels. Entries in Tables 4 and 5 with payload 1377 type "dyn" have no static payload type assigned and are only used 1378 with a dynamic payload type. Payload type 13 is reserved for a 1379 comfort noise payload format to be specified in a separate RFC.
1380 Payload type 19 is also marked "reserved" because some draft versions 1381 of this specification assigned that number to a comfort noise payload 1382 format. The payload type range 72-76 is marked "reserved" so that 1383 RTCP and RTP packets can be reliably distinguished (see Section 1384 "Summary of Protocol Constants" of the RTP protocol specification). 1386 The payload types currently defined in this profile are assigned to 1387 exactly one of three categories or media types: audio only, video 1388 only and those combining audio and video. The media types are marked 1389 in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types 1390 of different media types SHALL NOT be interleaved or multiplexed 1391 within a single RTP session, but multiple RTP sessions MAY be used in 1392 parallel to send multiple media types. An RTP source MAY change 1393 payload types within the same media type during a session. See the 1394 section "Multiplexing RTP Sessions" of RFC XXXX for additional 1395 explanation. 1397 Session participants agree through mechanisms beyond the scope of 1398 this specification on the set of payload types allowed in a given 1399 session. This set MAY, for example, be defined by the capabilities 1400 of the applications used, negotiated by a conference control protocol 1401 or established by agreement between the human participants. 1403 Audio applications operating under this profile SHOULD, at a minimum, 1404 be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4). 1405 This allows interoperability without format negotiation and ensures 1406 successful negotiation with a conference control protocol. 1408 7 RTP over TCP and Similar Byte Stream Protocols 1410 Under special circumstances, it may be necessary to carry RTP in 1411 protocols offering a byte stream abstraction, such as TCP, possibly 1412 multiplexed with other data.
The application MUST define its own
1413     method of delineating RTP and RTCP packets (RTSP [23] provides an
1414     example of such an encapsulation specification.)

1416  8 Port Assignment

1417        PT   encoding    media type  clock rate   channels
1418             name                    (Hz)
1419        ___________________________________________________
1420        0    PCMU        A            8000        1
1421        1    reserved    A
1422        2    G726-32     A            8000        1
1423        3    GSM         A            8000        1
1424        4    G723        A            8000        1
1425        5    DVI4        A            8000        1
1426        6    DVI4        A           16000        1
1427        7    LPC         A            8000        1
1428        8    PCMA        A            8000        1
1429        9    G722        A            8000        1
1430        10   L16         A           44100        2
1431        11   L16         A           44100        1
1432        12   QCELP       A            8000        1
1433        13   reserved    A
1434        14   MPA         A           90000        (see text)
1435        15   G728        A            8000        1
1436        16   DVI4        A           11025        1
1437        17   DVI4        A           22050        1
1438        18   G729        A            8000        1
1439        19   reserved    A
1440        20   unassigned  A
1441        21   unassigned  A
1442        22   unassigned  A
1443        23   unassigned  A
1444        dyn  G726-40     A            8000        1
1445        dyn  G726-24     A            8000        1
1446        dyn  G726-16     A            8000        1
1447        dyn  G729D       A            8000        1
1448        dyn  G729E       A            8000        1
1449        dyn  GSM-EFR     A            8000        1
1450        dyn  L8          A            var.        var.
1451        dyn  RED         A            (see text)
1452        dyn  VDVI        A            var.        1

1454        Table 4: Payload types (PT) for audio encodings

1456     As specified in the RTP protocol definition, RTP data SHOULD be
1457     carried on an even UDP port number and the corresponding RTCP packets
1458     SHOULD be carried on the next higher (odd) port number.

1460     Applications operating under this profile MAY use any such UDP port
1461     pair. For example, the port pair MAY be allocated randomly by a
1462     session management program. A single fixed port number pair cannot be
1463     required because multiple applications using this profile are likely

1464        PT      encoding    media type  clock rate
1465                name                    (Hz)
1466        ____________________________________________
1467        24      unassigned  V
1468        25      CelB        V           90000
1469        26      JPEG        V           90000
1470        27      unassigned  V
1471        28      nv          V           90000
1472        29      unassigned  V
1473        30      unassigned  V
1474        31      H261        V           90000
1475        32      MPV         V           90000
1476        33      MP2T        AV          90000
1477        34      H263        V           90000
1478        35-71   unassigned  ?
1479        72-76   reserved    N/A         N/A
1480        77-95   unassigned  ?
1481        96-127  dynamic     ?
1482        dyn     H263-1998   V           90000

1484        Table 5: Payload types (PT) for video and combined encodings

1486     to run on the same host, and there are some operating systems that do
1487     not allow multiple processes to use the same UDP port with different
1488     multicast addresses.

1490     However, port numbers 5004 and 5005 have been registered for use with
1491     this profile for those applications that choose to use them as the
1492     default pair. Applications that operate under multiple profiles MAY
1493     use this port pair as an indication to select this profile if they
1494     are not subject to the constraint of the previous paragraph.
1495     Applications need not have a default and MAY require that the port
1496     pair be explicitly specified. The particular port numbers were chosen
1497     to lie in the range above 5000 to accommodate port number allocation
1498     practice within some versions of the Unix operating system, where
1499     port numbers below 1024 can only be used by privileged processes and
1500     port numbers between 1024 and 5000 are automatically assigned by the
1501     operating system.

1503  9 Changes from RFC 1890

1505     This RFC revises RFC 1890. It is mostly backwards-compatible with RFC
1506     1890 and codifies existing practice. The changes are listed below.

1508        o The mapping of a user pass-phrase string into an encryption
1509          key was deleted from Section 2 because two interoperable
1510          implementations were not found.

1512        o The payload format for 1016 audio was removed and its static
1513          payload type assignment 1 was marked "reserved" because two
1514          interoperable implementations were not found.

1516        o Additional payload formats and/or expanded descriptions were
1517          included for G722, G723, G726, G728, G729, GSM, GSM-EFR,
1518          QCELP, RED, VDVI, H263 and H263-1998.

1520        o Static payload types 4, 12, 16, 17, 18 and 34 were added, and
1521          13 and 19 were reserved.
1523 o Requirements for congestion control were added in Section 2. 1525 o A new Section "IANA Considerations" was added to specify the 1526 registration of the name for this profile and to establish a 1527 new policy that no additional registration of static payload 1528 types for this profile will be made beyond those included in 1529 Tables 4 and 5, but that additional encoding names may be 1530 registered as MIME subtypes for binding to dynamic payload 1531 types. Non-normative references were added to RFC YYYY [6] 1532 where MIME subtypes for all the listed payload formats are 1533 registered, some with optional parameters for use of the 1534 payload formats. 1536 o In Section 4.1, the requirement level for setting of the 1537 marker bit on the first packet after silence for audio was 1538 changed from "is" to "SHOULD be", and clarified that the 1539 marker bit is set only when packets are intentionally not 1540 sent. 1542 o Similarly, text was added to specify that the marker bit 1543 SHOULD be set to one on the last packet of a video frame, and 1544 that video frames are distinguished by their timestamps. 1546 o This profile follows the suggestion in the RTP spec that RTCP 1547 bandwidth may be specified separately from the session 1548 bandwidth and separately for active senders and passive 1549 receivers. 1551 o RFC references are added for payload formats published after 1552 RFC 1890. 1554 o The security considerations and full copyright sections were 1555 added. 1557 o According to Peter Hoddie of Apple, only pre-1994 Macintosh 1558 used the 22254.54 rate and none the 11127.27 rate, so the 1559 latter was dropped from the discussion of suggested sampling 1560 frequencies. 1562 o Table 1 was corrected to move some values from the "ms/packet" 1563 column to the "default ms/packet" column where they belonged. 1565 o A note has been added for G722 to clarify a discrepancy 1566 between the actual sampling rate and the RTP timestamp clock 1567 rate.
1569 o Small clarifications of the text have been made in several 1570 places, some in response to questions from readers. In 1571 particular: 1573 - A definition for "media type" is given in Section 1.1 to 1574 allow the explanation of multiplexing RTP sessions in 1575 Section 6 to be more clear regarding the multiplexing of 1576 multiple media. 1578 - The explanation of how to determine the number of audio 1579 frames in a packet from the length was expanded. 1581 - More description of the allocation of bandwidth to SDES 1582 items is given. 1584 - A note was added that the convention for the order of 1585 channels specified in Section 4.1 may be overridden by a 1586 particular encoding or payload format specification. 1588 - The terms MUST, SHOULD, MAY, etc. are used as defined in RFC 1589 2119. 1591 o A second author for this document was added. 1593 10 Security Considerations 1595 Implementations using the profile defined in this specification are 1596 subject to the security considerations discussed in the RTP 1597 specification [1]. This profile does not specify any different 1598 security services. The primary function of this profile is to list a 1599 set of data compression encodings for audio and video media. 1601 Confidentiality of the media streams is achieved by encryption. 1602 Because the data compression used with the payload formats described 1603 in this profile is applied end-to-end, encryption may be performed 1604 after compression so there is no conflict between the two operations. 1606 A potential denial-of-service threat exists for data encodings using 1607 compression techniques that have non-uniform receiver-end 1608 computational load. The attacker can inject pathological datagrams 1609 into the stream which are complex to decode and cause the receiver to 1610 be overloaded. However, the encodings described in this profile do 1611 not exhibit any significant non-uniformity. 
1613 As with any IP-based protocol, in some circumstances a receiver may 1614 be overloaded simply by the receipt of too many packets, either 1615 desired or undesired. Network-layer authentication MAY be used to 1616 discard packets from undesired sources, but the processing cost of 1617 the authentication itself may be too high. In a multicast 1618 environment, pruning of specific sources may be implemented in future 1619 versions of IGMP [24] and in multicast routing protocols to allow a 1620 receiver to select which sources are allowed to reach it. 1622 11 Full Copyright Statement 1624 Copyright (C) The Internet Society (2001). All Rights Reserved. 1626 This document and translations of it may be copied and furnished to 1627 others, and derivative works that comment on or otherwise explain it 1628 or assist in its implementation may be prepared, copied, published and 1629 distributed, in whole or in part, without restriction of any kind, 1630 provided that the above copyright notice and this paragraph are 1631 included on all such copies and derivative works. However, this 1632 document itself may not be modified in any way, such as by removing 1633 the copyright notice or references to the Internet Society or other 1634 Internet organizations, except as needed for the purpose of 1635 developing Internet standards in which case the procedures for 1636 copyrights defined in the Internet Standards process must be 1637 followed, or as required to translate it into languages other than 1638 English. 1640 The limited permissions granted above are perpetual and will not be 1641 revoked by the Internet Society or its successors or assigns.
1643 This document and the information contained herein is provided on an 1644 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 1645 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 1646 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 1647 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 1648 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1650 12 Acknowledgements 1652 The comments and careful review of Simao Campos, Richard Cox and AVT 1653 Working Group participants are gratefully acknowledged. The GSM 1654 description was adopted from the IMTC Voice over IP Forum Service 1655 Interoperability Implementation Agreement (January 1997). Fred Burg 1656 and Terry Lyons helped with the G.729 description. 1658 13 Addresses of Authors 1660 Henning Schulzrinne 1661 Dept. of Computer Science 1662 Columbia University 1663 1214 Amsterdam Avenue 1664 New York, NY 10027 1665 USA 1666 electronic mail: schulzrinne@cs.columbia.edu 1668 Stephen L. Casner 1669 Packet Design 1670 2465 Latham Street 1671 Mountain View, CA 94040 1672 United States 1673 electronic mail: casner@acm.org 1675 A Bibliography 1677 [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A 1678 transport protocol for real-time applications," Internet Draft, 1679 Internet Engineering Task Force, Feb. 1999 Work in progress, revision 1680 to RFC 1889. 1682 [2] S. Bradner, "Key words for use in RFCs to Indicate Requirement 1683 Levels," RFC 2119, Internet Engineering Task Force, Mar. 1997. 1685 [3] R. Braden, D. Clark, S. Shenker, "Integrated Services in the 1686 Internet Architecture: an Overview," Request for Comments 1687 (Informational) RFC 1633, Internet Engineering Task Force, June 1994. 1689 [4] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An 1690 Architecture for Differentiated Service," Request for Comments 1691 (Proposed Standard) RFC 2475, Internet Engineering Task Force, Dec. 1692 1998. 1694 [5] M. 
Handley and V. Jacobson, "SDP: Session Description Protocol," 1695 Request for Comments (Proposed Standard) RFC 2327, Internet 1696 Engineering Task Force, Apr. 1998. 1698 [6] S. Casner and P. Hoschka, "MIME Type Registration of RTP Payload 1699 Types," Internet Draft, Internet Engineering Task Force, July 2001. 1700 Work in progress. 1702 [7] N. Freed, J. Klensin, and J. Postel, "Multipurpose Internet Mail 1703 Extensions (MIME) Part Four: Registration Procedures," RFC 2048, 1704 Internet Engineering Task Force, Nov. 1996. 1706 [8] Apple Computer, "Audio interchange file format AIFF-C," Aug. 1707 1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z). 1709 [9] IMA Digital Audio Focus and Technical Working Groups, 1710 "Recommended practices for enhancing digital audio compatibility in 1711 multimedia systems (version 3.00)," tech. rep., Interactive 1712 Multimedia Association, Annapolis, Maryland, Oct. 1992. 1714 [10] D. Deleam and J.-P. Petit, "Real-time implementations of the 1715 recent ITU-T low bit rate speech coders on the TI TMS320C54X DSP: 1716 results, methodology, and applications," in Proc. of International 1717 Conference on Signal Processing, Technology, and Applications 1718 (ICSPAT) , (Boston, Massachusetts), pp. 1656--1660, Oct. 1996. 1720 [11] M. Mouly and M.-B. Pautet, The GSM system for mobile 1721 communications Lassay-les-Chateaux, France: Europe Media Duplication, 1722 1993. 1724 [12] J. Degener, "Digital speech compression," Dr. Dobb's Journal , 1725 Dec. 1994. 1727 [13] S. M. Redl, M. K. Weber, and M. W. Oliphant, An Introduction to 1728 GSM Boston: Artech House, 1995. 1730 [14] D. Hoffman, G. Fernando, V. Goyal, and M. Civanlar, "RTP payload 1731 format for MPEG1/MPEG2 video," Request for Comments (Proposed 1732 Standard) RFC 2250, Internet Engineering Task Force, Jan. 1998. 1734 [15] N. S. Jayant and P. 
Noll, Digital Coding of Waveforms-- 1735 Principles and Applications to Speech and Video Englewood Cliffs, New 1736 Jersey: Prentice-Hall, 1984. 1738 [16] K. McKay, "RTP Payload Format for PureVoice(tm) Audio", Request 1739 for Comments (Proposed Standard) RFC 2658, Internet Engineering Task 1740 Force, Aug. 1999. 1742 [17] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C. 1743 Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP Payload for 1744 Redundant Audio Data," Request for Comments (Proposed Standard) RFC 1745 2198, Internet Engineering Task Force, Sep. 1997. 1747 [18] M. Speer and D. Hoffman, "RTP payload format of sun's CellB 1748 video encoding," Request for Comments (Proposed Standard) RFC 2029, 1749 Internet Engineering Task Force, Oct. 1996. 1751 [19] L. Berc, W. Fenner, R. Frederick, and S. McCanne, "RTP payload 1752 format for JPEG-compressed video," Request for Comments (Proposed 1753 Standard) RFC 2435, Internet Engineering Task Force, Oct. 1996. 1755 [20] T. Turletti and C. Huitema, "RTP payload format for H.261 video 1756 streams," Request for Comments (Proposed Standard) RFC 2032, Internet 1757 Engineering Task Force, Oct. 1996. 1759 [21] C. Zhu, "RTP payload format for H.263 video streams," Request 1760 for Comments (Proposed Standard) RFC 2190, Internet Engineering Task 1761 Force, Sep. 1997. 1763 [22] C. Bormann, L. Cline, G. Deisher, T. Gardos, C. Maciocco, D. 1764 Newell, J. Ott, G. Sullivan, S. Wenger, C. Zhu, "RTP Payload Format 1765 for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)," Request for 1766 Comments (Proposed Standard) RFC 2429, Internet Engineering Task 1767 Force, Oct. 1998. 1769 [23] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming 1770 protocol (RTSP)," Request for Comments (Proposed Standard) RFC 2326, 1771 Internet Engineering Task Force, Apr. 1998. 1773 [24] S. Deering, "Host Extensions for IP Multicasting," Request for 1774 Comments RFC 1112, STD 5, Internet Engineering Task Force, Aug. 
1989. 1776 Current Locations of Related Resources 1778 Note: Several sections below refer to the ITU-T Software Tool Library 1779 (STL). It is available from the ITU Sales Service, Place des Nations, 1780 CH-1211 Geneve 20, Switzerland (also check http://www.itu.int). The 1781 ITU-T STL is covered by a license defined in ITU-T Recommendation 1782 G.191, "Software tools for speech and audio coding standardization". 1784 UTF-8 1786 Information on the UCS Transformation Format 8 (UTF-8) is available 1787 at 1789 http://www.stonehand.com/unicode/standard/utf8.html 1791 DVI4 1793 An implementation is available from Jack Jansen at 1795 ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar 1797 G722 1799 An implementation of the G.722 algorithm is available as part of the 1800 ITU-T STL, described above. 1802 G723 1804 The reference C code implementation defining the G.723.1 algorithm 1805 and its Annexes A, B, and C are available as an integral part of 1806 Recommendation G.723.1 from the ITU Sales Service, address listed 1807 above. Both the algorithm and C code are covered by a specific 1808 license. The ITU-T Secretariat should be contacted to obtain such 1809 licensing information. 1811 G726 1813 G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24, and 1814 16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)". An 1815 implementation of the G.726 algorithm is available as part of the 1816 ITU-T STL, described above. 1818 G729 1820 The reference C code implementation defining the G.729 algorithm and 1821 its Annexes A through I are available as an integral part of 1822 Recommendation G.729 from the ITU Sales Service, listed above. Annex 1823 I contains the integrated C source code for all G.729 operating 1824 modes. The G.729 algorithm and associated C code are covered by a 1825 specific license. The contact information for obtaining the license 1826 is available from the ITU-T Secretariat.
1828  GSM

1830     A reference implementation was written by Carsten Bormann and Jutta
1831     Degener (TU Berlin, Germany). It is available at

1833     ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/

1835     Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
1836     code implementation of the RPE-LTP algorithm available as part of the
1837     ITU-T STL. The STL implementation is an adaptation of the TU Berlin
1838     version.

1840  LPC

1842     An implementation is available at

1844     ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z

1846  PCMU, PCMA

1848     An implementation of these algorithms is available as part of the
1849     ITU-T STL, described above. Code to convert between linear and mu-law
1850     companded data is also available in [9].

1852  Table of Contents

1854     1        Introduction
1855     1.1      Terminology
1856     2        RTP and RTCP Packet Forms and Protocol Behavior
1857     3        IANA Considerations
1858     3.1      Registering Additional Encodings
1859     4        Audio
1860     4.1      Encoding-Independent Rules
1861     4.2      Operating Recommendations
1862     4.3      Guidelines for Sample-Based Audio Encodings
1863     4.4      Guidelines for Frame-Based Audio Encodings
1864     4.5      Audio Encodings
1865     4.5.1    DVI4
1866     4.5.2    G722
1867     4.5.3    G723
1868     4.5.4    G726-40, G726-32, G726-24, and G726-16
1869     4.5.5    G728
1870     4.5.6    G729
1871     4.5.7    G729D and G729E
1872     4.5.8    GSM
1873     4.5.8.1  General Packaging Issues
1874     4.5.8.2  GSM variable names and numbers
1875     4.5.9    GSM-EFR
1876     4.5.10   L8
1877     4.5.11   L16
1878     4.5.12   LPC
1879     4.5.13   MPA
1880     4.5.14   PCMA and PCMU
1881     4.5.15   QCELP
1882     4.5.16   RED
1883     4.5.17   VDVI
1884     5        Video
1885     5.1      CelB
1886     5.2      JPEG
1887     5.3      H261
1888     5.4      H263
1889     5.5      H263-1998
1890     5.6      MPV
1891     5.7      MP2T
1892     5.8      nv
1893     6        Payload Type Definitions
1894     7        RTP over TCP and Similar Byte Stream Protocols
1895     8        Port Assignment
1896     9        Changes from RFC 1890
1897     10       Security Considerations
1898     11       Full Copyright Statement
1899     12       Acknowledgements
1900     13       Addresses of Authors
1901     A        Bibliography