Internet Engineering Task Force                                   AVT WG
Internet Draft                                               Schulzrinne
ietf-avt-profile-new-03.txt                                  Columbia U.
August 7, 1998
Expires: February 7, 1999

    RTP Profile for Audio and Video Conferences with Minimal Control

STATUS OF THIS MEMO

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress''.

To view the entire list of current Internet-Drafts, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

ABSTRACT

    This memo describes a profile called "RTP/AVP" for the
    use of the real-time transport protocol (RTP), version 2,
    and the associated control protocol, RTCP, within audio
    and video multiparticipant conferences with minimal
    control. It provides interpretations of generic fields
    within the RTP specification suitable for audio and video
    conferences.
    In particular, this document defines a set
    of default mappings from payload type numbers to
    encodings.

    The document also describes how audio and video data may
    be carried within RTP. It defines a set of standard
    encodings and their names when used within RTP. However,
    the encoding definitions are independent of the
    particular transport mechanism used. The descriptions
    provide pointers to reference implementations and the
    detailed standards. This document is meant as an aid for
    implementors of audio, video and other real-time
    multimedia applications.

Changes

This draft revises RFC 1890. It is fully backwards-compatible with
RFC 1890 and codifies existing practice. It is intended that this
draft form the basis of a new RFC to obsolete RFC 1890 as it moves to
Draft Standard.

Besides wording clarifications and filling in RFC numbers for payload
type definitions, this draft adds payload types 4, 16, 17, 18, 19 and
34. The PostScript version of this draft contains change bars marking
changes made since draft -00.

A tentative TCP encapsulation is defined.

According to Peter Hoddie of Apple, only pre-1994 Macintosh computers
used the 22254.54 Hz rate, and none used the 11127.27 Hz rate.

Note to RFC editor: This section is to be removed before publication
as an RFC. All RFC TBD should be filled in with the number of the RTP
specification RFC submitted for Draft Standard status.

1 Introduction

This profile defines aspects of RTP left unspecified in the RTP
Version 2 protocol definition (RFC XXXX). This profile is intended
for use within audio and video conferences with minimal session
control. In particular, no support for the negotiation of parameters
or membership control is provided.
The profile is expected to be
useful in sessions where no negotiation or membership control is
used (e.g., using the static payload types and the membership
indications provided by RTCP), but this profile may also be useful in
conjunction with a higher-level control protocol.

Use of this profile is implicit in the use of the appropriate
applications; there is no explicit indication by port number,
protocol identifier or the like. Applications such as session
directories should refer to this profile as "RTP/AVP".

Other profiles may make different choices for the items specified
here.

This document also defines a set of payload formats for audio.

This draft defines the term "media type" as dividing encodings of
audio and video content into three classes: audio, video and
audio/video (interleaved).

2 RTP and RTCP Packet Forms and Protocol Behavior

The section "RTP Profiles and Payload Format Specification" of RFC
TBD enumerates a number of items that can be specified or modified in
a profile. This section addresses these items. Generally, this
profile follows the default and/or recommended aspects of the RTP
specification.

RTP data header: The standard format of the fixed RTP data header is
     used (one marker bit).

Payload types: Static payload types are defined in Section 6.

RTP data header additions: No additional fixed fields are appended to
     the RTP data header.

RTP data header extensions: No RTP header extensions are defined, but
     applications operating under this profile may use such
     extensions. Thus, applications should not assume that the RTP
     header X bit is always zero and should be prepared to ignore the
     header extension. If a header extension is defined in the
     future, that definition must specify the contents of the first
     16 bits in such a way that multiple different extensions can be
     identified.

RTCP packet types: No additional RTCP packet types are defined by
     this profile specification.

RTCP report interval: The suggested constants are to be used for the
     RTCP report interval calculation.

SR/RR extension: No extension section is defined for the RTCP SR or
     RR packet.

SDES use: Applications may use any of the SDES items described in the
     RTP specification. While CNAME information is sent every
     reporting interval, other items should be sent only every third
     reporting interval, with NAME sent seven out of eight times
     within that slot and the remaining SDES items cyclically taking
     up the eighth slot, as defined in Section 6.2.2 of the RTP
     specification. In other words, NAME is sent in RTCP packets 1,
     4, 7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet
     22.

Security: The RTP default security services are also the default
     under this profile.

String-to-key mapping: A user-provided string ("pass phrase") is
     hashed with the MD5 algorithm to a 16-octet digest. An n-bit key
     is extracted from the digest by taking the first n bits from the
     digest. If several keys are needed with a total length of 128
     bits or less (as for triple DES), they are extracted in order
     from that digest. The octet ordering is specified in RFC 1423,
     Section 2.2. (Note that some DES implementations require that
     the 56-bit key be expanded into 8 octets by inserting an odd
     parity bit in the most significant bit of the octet to go with
     each 7 bits of the key.)

It is suggested that pass phrases be restricted to ASCII letters,
digits, the hyphen, and white space to reduce the chance of
transcription errors when conveying keys by phone, fax, telex or
email.

The pass phrase may be preceded by a specification of the encryption
algorithm. Any characters up to the first slash (ASCII 0x2f) are
taken as the name of the encryption algorithm.
The encryption format
specifiers should be drawn from RFC 1423 or any additional
identifiers registered with IANA. If no slash is present, DES-CBC is
assumed as default. The encryption algorithm specifier is case
sensitive.

The pass phrase typed by the user is transformed to a canonical form
before applying the hash algorithm. For that purpose, we define
"white space" to be return, tab, or vertical tab as well as all
characters contained in the Unicode space characters table. The
transformation consists of the following steps: (1) convert the input
string to the ISO 10646 character set, using the UTF-8 encoding as
specified in Annex P to ISO/IEC 10646-1:1993 (ASCII characters
require no mapping, but ISO 8859-1 characters do); (2) remove leading
and trailing white space characters; (3) replace one or more
contiguous white space characters by a single space (ASCII or UTF-8
0x20); (4) convert all letters to lower case and replace sequences of
characters and non-spacing accents with a single character, where
possible. A minimum length of 16 key characters (after applying the
transformation) should be enforced by the application, while
applications must allow up to 256 characters of input.

Underlying protocol: The profile specifies the use of RTP over
     unicast and multicast UDP as well as TCP. (This does not
     preclude the use of these definitions when RTP is carried by
     other lower-layer protocols.)

Transport mapping: The standard mapping of RTP and RTCP to
     transport-level addresses is used.

Encapsulation: No encapsulation of RTP packets is specified.

3 Registering Payload Types

This profile defines a set of standard encodings and their payload
types when used within RTP. Other encodings and their payload types
are to be registered with the Internet Assigned Numbers Authority
(IANA).
When registering a new encoding/payload type, the following
information should be provided:

     o name and description of encoding, in particular the RTP
       timestamp clock rate; the names defined here are 3 or 4
       characters long to allow a compact representation if needed;

     o indication of who has change control over the encoding (for
       example, ISO, ITU-T, other international standardization
       bodies, a consortium or a particular company or group of
       companies);

     o any operating parameters or profiles;

     o a reference to a further description, if available, for
       example (in order of preference) an RFC, a published paper, a
       patent filing, a technical report, documented source code or a
       computer manual;

     o for proprietary encodings, contact information (postal and
       email address);

     o the payload type value for this profile, if necessary (see
       below).

Note that not all encodings to be used by RTP need to be assigned a
static payload type. There will be no additional static payload
types assigned beyond the ones described in this document. Non-RTP
means beyond the scope of this memo (such as directory services or
invitation protocols) may be used to establish a dynamic mapping
between a payload type and an encoding ("dynamic payload types").
Applications should first use the range 96 to 127 for dynamic payload
types. Only applications which need to define more than 32 dynamic
payload types may redefine codes below 96. Redefining payload types
below 96 may cause incorrect operation if an attempt is made to join
a session without obtaining session description information that
defines the dynamic payload types.

Dynamic payload types should not be used without a well-defined
mechanism to indicate the mapping.
Systems that expect to
interoperate with others operating under this profile should not
assign proprietary encodings to particular, fixed payload types in
the range reserved for dynamic payload types. The Session Description
Protocol (SDP), RFC 2327 [1], defines such a mapping mechanism.

The available payload type space is relatively small. Thus, no new
static payload types will be assigned beyond the current list. For
implementor convenience, this profile contains descriptions of
encodings which do not currently have a static payload type assigned
to them. SDP uses the encoding names defined here.

4 Audio

4.1 Encoding-Independent Rules

For applications which send either no packets or comfort-noise
packets during silence, the first packet of a talkspurt, that is, the
first packet after a silence period, is distinguished by setting the
marker bit in the RTP data header to one. The marker bit in all
other packets is zero. The beginning of a talkspurt may be used to
adjust the playout delay to reflect changing network delays.
Applications without silence suppression set the bit to zero.

The RTP clock rate used for generating the RTP timestamp is
independent of the number of channels and the encoding; it equals the
number of sampling periods per second. For N-channel encodings, each
sampling period (say, 1/8000 of a second) generates N samples. (This
terminology is standard, but somewhat confusing, as the total number
of samples generated per second is then the sampling rate times the
channel count.)

If multiple audio channels are used, channels are numbered
left-to-right, starting at one. In RTP audio packets, information from
lower-numbered channels precedes that from higher-numbered channels.
For more than two channels, the convention followed by the AIFF-C
audio interchange format should be followed [2], using the following
notation:

     l    left
     r    right
     c    center
     S    surround
     F    front
     R    rear

channels    description                 channel
                              1     2     3     4     5     6
________________________________________________________________
2           stereo            l     r
3                             l     r     c
4           quadrophonic      Fl    Fr    Rl    Rr
4                             l     c     r     S
5                             Fl    Fr    Fc    Sl    Sr
6                             l     lc    c     r     rc    S

Samples for all channels belonging to a single sampling instant must
be within the same packet. The interleaving of samples from different
channels depends on the encoding. General guidelines are given in
Sections 4.3 and 4.4.

The sampling frequency should be drawn from the set: 8000, 11025,
16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Older Apple
Macintosh computers had a native sample rate of 22254.54 Hz, which
can be converted to 22050 with acceptable quality by dropping 4
samples in a 20 ms frame.) However, most audio encodings are defined
for a more restricted set of sampling frequencies. Receivers should
be prepared to accept multi-channel audio, but may choose to only
play a single channel.

4.2 Operating Recommendations

The following recommendations are default operating parameters.
Applications should be prepared to handle other values. The ranges
given are meant to give guidance to application writers, allowing a
set of applications conforming to these guidelines to interoperate
without additional negotiation. These guidelines are not intended to
restrict operating parameters for applications that can negotiate a
set of interoperable parameters, e.g., through a conference control
protocol.

For packetized audio, the default packetization interval should have
a duration of 20 ms or one frame, whichever is longer, unless
otherwise noted in Table 1 (column "ms/packet").
The packetization
interval determines the minimum end-to-end delay; longer packets
introduce less header overhead but higher delay and make packet loss
more noticeable. For non-interactive applications such as lectures or
links with severe bandwidth constraints, a higher packetization delay
may be appropriate. A receiver should accept packets representing
between 0 and 200 ms of audio data. (For framed audio encodings, a
receiver should accept packets with a number of frames equal to
200 ms divided by the frame duration, rounded up.) This restriction
allows reasonable buffer sizing for the receiver.

4.3 Guidelines for Sample-Based Audio Encodings

In sample-based encodings, each audio sample is represented by a
fixed number of bits. Within the compressed audio data, codes for
individual samples may span octet boundaries. An RTP audio packet may
contain any number of audio samples, subject to the constraint that
the number of bits per sample times the number of samples per packet
yields an integral octet count. Fractional encodings produce less
than one octet per sample.

The duration of an audio packet is determined by the number of
samples in the packet.

For sample-based encodings producing one or more octets per sample,
samples from different channels sampled at the same sampling instant
are packed in consecutive octets. For example, for a two-channel
encoding, the octet sequence is (left channel, first sample), (right
channel, first sample), (left channel, second sample), (right
channel, second sample), .... For multi-octet encodings, octets are
transmitted in network byte order (i.e., most significant octet
first).

The packing of sample-based encodings producing less than one octet
per sample is encoding-specific.
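
As a non-normative illustration of the packing rule for encodings
that produce one or more octets per sample, the following C sketch
interleaves two-channel 16-bit samples into a payload buffer, left
channel first and most significant octet first. The function name
pack_l16_stereo and its parameters are assumptions made for this
example; they are not defined by this profile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the profile): interleave two
 * channels of 16-bit samples into an RTP payload buffer.
 *
 * For each sampling instant the left sample is written first, then
 * the right sample. Each sample is written in network byte order
 * (most significant octet first).
 *
 * `out` must have room for n_instants * 4 octets.
 * Returns the number of octets written.
 */
size_t pack_l16_stereo(const int16_t *left, const int16_t *right,
                       size_t n_instants, uint8_t *out)
{
    size_t n = 0;

    assert(left != NULL && right != NULL && out != NULL);

    for (size_t i = 0; i < n_instants; i++) {
        uint16_t l = (uint16_t)left[i];
        uint16_t r = (uint16_t)right[i];

        out[n++] = (uint8_t)(l >> 8);    /* left, high octet  */
        out[n++] = (uint8_t)(l & 0xff);  /* left, low octet   */
        out[n++] = (uint8_t)(r >> 8);    /* right, high octet */
        out[n++] = (uint8_t)(r & 0xff);  /* right, low octet  */
    }
    return n;
}
```

Because both samples of a sampling instant are written together, a
packet built this way always carries all channels for every instant
it contains.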

4.4 Guidelines for Frame-Based Audio Encodings

Frame-based encodings encode a fixed-length block of audio into
another block of compressed data, typically also of fixed length. For
frame-based encodings, the sender may choose to combine several such
frames into a single RTP packet. The receiver can tell the number of
frames contained in an RTP packet since the audio frame size (in
octets) is defined as part of the encoding, as long as all frames
have the same length measured in octets. This does not work when
carrying frames of different sizes unless the frame sizes are
relatively prime.

For frame-based codecs, the channel order is defined for the whole
block. That is, for two-channel audio, right and left samples are
coded independently, with the encoded frame for the left channel
preceding that for the right channel.

All frame-oriented audio codecs should be able to encode and decode
several consecutive frames within a single packet. Since the frame
size for the frame-oriented codecs is given, there is no need to use
a separate designation for the same encoding, but with different
number of frames per packet.

RTP packets shall contain a whole number of frames, with frames
inserted according to age within a packet, so that the oldest frame
(to be played first) occurs immediately after the RTP packet header.
The RTP timestamp reflects the capturing time of the first sample in
the first frame, that is, the oldest information in the packet.

4.5 Audio Encodings

name of                                  sampling              default
encoding   sample/frame   bits/sample    rate       ms/frame   ms/packet
________________________________________________________________________
1016       frame          N/A            8,000      30         30
CN         frame          N/A            var.
DVI4       sample         4              var.                  20
G722       sample         8              16,000                20
G723       frame          N/A            8,000      30         30
G726-16    sample         2              8,000                 20
G726-24    sample         3              8,000                 20
G726-32    sample         4              8,000                 20
G726-40    sample         5              8,000                 20
G727-16    sample         2              8,000                 20
G727-24    sample         3              8,000                 20
G727-32    sample         4              8,000                 20
G727-40    sample         5              8,000                 20
G728       frame          N/A            8,000      2.5        20
G729       frame          N/A            8,000      10         20
GSM        frame          N/A            8,000      20         20
L8         sample         8              var.                  20
L16        sample         16             var.                  20
LPC        frame          N/A            8,000      20         20
MPA        frame          N/A            var.                  20
PCMA       sample         8              var.                  20
PCMU       sample         8              var.                  20
QCELP      frame          N/A            8,000      20
SX7300P    frame          N/A            8,000      15         30
SX8300P    frame          N/A            8,000      15         30
SX9600P    frame          N/A            8,000      15         30
VDVI       sample         var.           var.                  20

Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
variable)

The characteristics of standard audio encodings are shown in Table 1;
they are listed in order of their payload type in Table 4. Entries
with payload type "dyn" have a dynamic rather than static payload
type. While most audio codecs are only specified for a fixed sampling
rate, some sample-based algorithms (indicated by an entry of "var."
in the sampling rate column of Table 1) may be used with different
sampling rates, resulting in different coded bit rates. Non-RTP means
MUST indicate the appropriate sampling rate.

4.5.1 1016

Encoding 1016 is a frame-based encoding using code-excited linear
prediction (CELP) and is specified in Federal Standard FED-STD 1016
[3,4,5,6].

4.5.2 CN

The CN (comfort noise) packet contains a single-octet message to the
receiver to play comfort noise at the absolute level specified. This
message would normally be sent once at the beginning of a silence
period (which also indicates the transition from speech to silence),
but the rate of noise level updates is implementation specific.
The
magnitude of the noise level is packed into the least significant
bits of the noise-level payload, as shown below.

The noise level is expressed in dBov, with values from 0 to 127 dBov.
dBov is the level relative to the overload of the system. (Note:
Representation relative to the overload point of a system is
particularly useful for digital implementations, since one does not
need to know the relative calibration of the analog circuitry.)
Example: In a 16-bit linear PCM system (L16), a signal with 0 dBov
represents a square wave with the maximum possible amplitude
(+/-32767). -63 dBov corresponds to -58 dBm0 in a standard telephone
system. (dBm is the power level in decibels relative to 1 mW, with an
impedance of 600 Ohms.)

      0 1 2 3 4 5 6 7
     +-+-+-+-+-+-+-+-+
     |0|   level     |
     +-+-+-+-+-+-+-+-+

The RTP header for the comfort noise packet should be constructed as
if the comfort noise were an independent codec. Thus, the RTP
timestamp designates the beginning of the silence period. A static
payload type is assigned for a sampling rate of 8,000 Hz; if other
sampling rates are needed, they should be defined through dynamic
payload types. The RTP packet should not have the marker bit set.

The CN payload type is primarily for use with L16, DVI4, PCMA, PCMU
and other audio codecs that do not support comfort noise as part of
the codec itself. G.723.1 and G.729 have their own comfort noise
systems as part of Annexes A (G.723.1) and B (G.729), respectively.

4.5.3 DVI4

DVI4 is specified, with pseudo-code, in [7] as the IMA ADPCM wave
type.

However, the encoding defined here as DVI4 differs in three respects
from this recommendation:

     o The RTP DVI4 header contains the predicted value rather than
       the first sample value contained in the IMA ADPCM block
       header.

     o IMA ADPCM blocks contain an odd number of samples, since the
       first sample of a block is contained just in the header
       (uncompressed), followed by an even number of compressed
       samples. DVI4 has an even number of compressed samples only,
       using the 'predict' word from the header to decode the first
       sample.

     o For DVI4, the 4-bit samples are packed with the first sample
       in the four most significant bits and the second sample in the
       four least significant bits. In the IMA ADPCM codec, the
       samples are packed in little-endian order.

Each packet contains a single DVI block. This profile only defines
the 4-bit-per-sample version, while IMA also specifies a
3-bit-per-sample encoding.

The "header" word for each channel has the following structure:

     int16  predict;  /* predicted value of first sample
                         from the previous block (L16 format) */
     u_int8 index;    /* current index into stepsize table */
     u_int8 reserved; /* set to zero by sender, ignored by receiver */

Each octet following the header contains two 4-bit samples, thus the
number of samples per packet must be even.

Packing of samples for multiple channels is for further study.

The document IMA Recommended Practices for Enhancing Digital Audio
Compatibility in Multimedia Systems (version 3.0) contains the
algorithm description. It is available from

     Interactive Multimedia Association
     48 Maryland Avenue, Suite 202
     Annapolis, MD 21401-8011
     USA
     phone: +1 410 626-1380

4.5.4 G722

G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
within 64 kbit/s".

4.5.5 G723

G.723.1 is specified in ITU Recommendation G.723.1, "Dual-rate speech
coder for multimedia communications transmitting at 5.3 and 6.3
kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T as
a mandatory codec for ITU-T H.324 GSTN videophone terminal
applications.
The algorithm has a floating point specification in 537 Annex B to G.723.1, a silence compression algorithm in Annex A to 538 G.723.1 and an encoded signal bit-error sensitivity specification in 539 G.723.1 Annex C. 541 This Recommendation specifies a coded representation that can be used 542 for compressing the speech signal component of multi-media services 543 at a very low bit rate. Audio is encoded in 30 ms frames, with an 544 additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be 545 one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s 546 frame), or 4 octets. These 4-octet frames are called SID frames 547 (Silence Insertion Descriptor) and are used to specify comfort noise 548 parameters. There is no restriction on how 4, 20, and 24 octet frames 549 are intermixed. The least significant two bits of the first octet in 550 the frame determine the frame size and codec type: 552 bits content octets/frame 553 00 high-rate speech (6.3 kb/s) 24 554 01 low-rate speech (5.3 kb/s) 20 555 10 SID frame 4 556 11 reserved 558 It is possible to switch between the two rates at any 30 ms frame 559 boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of 560 the encoder and decoder. This coder was optimized to represent speech 561 with near-toll quality at the above rates using a limited amount of 562 complexity. 564 All the bits of the encoded bit stream are transmitted always from 565 the the least significant bit towards the most significant bit. 567 4.5.6 G726-16, G726-24, G726-32, G726-40 569 ITU-T Recommendation G.726 describes, among others, the algorithm 570 recommended for conversion of a single 64 kbit/s A-law or mu-law PCM 571 channel encoded at 8000 samples/sec to and from a 32 kbit/s channel. 572 The conversion is applied to the PCM stream using an Adaptive 573 Differential Pulse Code Modulation (ADPCM) transcoding technique. 
574   G.726 describes codecs operating at 16 kb/s (2 bits/sample), 24 kb/s
575   (3 bits/sample), 32 kb/s (4 bits/sample) and 40 kb/s (5 bits/sample).
576   These encodings are labeled G726-16, G726-24, G726-32 and G726-40,
577   respectively.

579   Note: In 1990, ITU-T Recommendation G.721 was merged with
580   Recommendation G.723 into ITU-T Recommendation G.726. Thus, G726-32
581   designates the same algorithm as G721 in RFC 1890.

583   No header information shall be included as part of the audio data.
584   The 4-bit code words of the G726-32 encoding MUST be packed into
585   octets as follows: the first code word is placed in the four least
586   significant bits of the first octet, with the least significant bit
587   of the code word in the least significant bit of the octet; the
588   second code word is placed in the four most significant bits of the
589   first octet, with the most significant bit of the code word in the
590   most significant bit of the octet. Subsequent pairs of the code words
591   shall be packed in the same way into successive octets, with the
592   first code word of each pair placed in the least significant four
593   bits of the octet. It is preferred that the voice sample be extended
594   with silence such that the encoded value comprises an even number of
595   code words. [TBD: Shouldn't we just require an even number of
596   samples?]

598   4.5.7 G727-16, G727-24, G727-32, G727-40

600   ITU-T Recommendation G.727, "5-, 4-, 3- and 2-bits sample embedded
601   adaptive differential pulse code modulation (ADPCM)", specifies an
602   embedded ADPCM algorithm which has the intrinsic capability of
603   dropping bits in the encoded words to alleviate network congestion
604   conditions. The algorithm, although not bitstream compatible with
605   G.726, was based on and has a structure similar to the G.726 ADPCM
606   algorithm.

608   4.5.8 G728

610   G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
611   16 kbit/s using low-delay code excited linear prediction".

613   A G.728 encoder translates 5 consecutive audio samples into a 10-bit
614   codebook index, resulting in a bit rate of 16 kb/s for audio sampled
615   at 8,000 samples per second. The group of five consecutive samples is
616   called a vector. Four consecutive vectors, labeled V1 to V4 (where V1
617   is to be played first by the receiver), build one G.728 frame. The
618   four vectors, 40 bits in total, are packed into 5 octets, labeled B1
619   through B5. B1 shall be placed first in the RTP packet.

621   Referring to the figure below, the principle for bit order is
622   "maintenance of bit significance". Bits from an older vector are more
623   significant than bits from newer vectors. The MSB of the frame goes
624   to the MSB of B1 and the LSB of the frame goes to the LSB of B5. For
625   example: octet B1 contains the eight most significant bits of vector
626   V1, and the MSB of V1 is the MSB of B1.

628                1         2         3        3
629      0         0         0         0        9
630      ++++++++++++++++++++++++++++++++++++++++
631      <---V1---><---V2---><---V3---><---V4---> vectors
632      <--B1--><--B2--><--B3--><--B4--><--B5--> octets
633      <------------- frame 1 ---------------->

635   In particular, B1 contains the eight most significant bits of V1,
636   with the MSB of V1 being the MSB of B1. B2 contains the two least
637   significant bits of V1, the more significant of the two in its MSB,
638   and the six most significant bits of V2. B1 shall be placed first in
639   the RTP packet and B5 last.

641   4.5.9 G729

643   G729 is specified in ITU-T Recommendation G.729, "Coding of speech at
644   8 kbit/s using conjugate structure-algebraic code excited linear
645   prediction (CS-ACELP)". A complexity-reduced version of the G.729
646   algorithm is specified in Annex A to Rec. G.729. The speech coding
647   algorithms in the main body of G.729 and in G.729 Annex A are fully
648   interoperable with each other, so there is no need to further
649   distinguish between them. The G.729 and G.729 Annex A codecs were
650   optimized to represent speech with high quality, where G.729 Annex A
651   trades some speech quality for an approximate 50% complexity
652   reduction [8].

654   A voice activity detector (VAD) and comfort noise generator (CNG)
655   algorithm in Annex B of G.729 is recommended for digital simultaneous
656   voice and data applications and can be used in conjunction with G.729
657   or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
658   while the G.729 Annex B comfort noise frame occupies 2 octets:

660       0                   1
661       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
662      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
663      |L|  LSF1   |  LSF2 |   GAIN  |R|
664      |S|         |       |         |E|
665      |F|0 1 2 3 4|0 1 2 3|0 1 2 3 4|S|
666      |0|         |       |         |V|    RESV = Reserved (zero)
667      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

669   An RTP packet may consist of zero or more G.729 or G.729 Annex A
670   frames, followed by zero or one G.729 Annex B payloads. The presence
671   of a comfort noise frame can be deduced from the length of the RTP
672   payload.

674   A floating-point version of G.729, G.729 Annex A, and G.729 Annex
675   B will be available shortly as Annex C to Recommendation G.729.

677   The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
678   of 80 bits, are defined in Recommendation G.729, Table 8/G.729.

680   The mapping of these parameters is given below. Bits are numbered
681   in Internet order, that is, the most significant bit is bit 0.

683       0                   1                   2                   3
684       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
685      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
686      |L|     L1      |   L2    |   L3    |      P1       |P|   C1    |
687      |0|             |         |         |               |0|         |
688      | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
689      | |             |         |         |               | |         |
690      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

692                       4                   5                   6
693       2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
694      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
695      |      C1       |  S1   | GA1 |  GB1  |   P2    |      C2       |
696      |               |       |     |       |         |               |
697      |5 6 7 8 9 1 1 1|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
698      |          0 1 2|       |     |       |         |               |
699      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

701                   7
703       4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
704      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
705      |   C2    |  S2   | GA2 |  GB2  |
706      |         |       |     |       |
707      |8 9 1 1 1|0 1 2 3|0 1 2|0 1 2 3|
708      |    0 1 2|       |     |       |
709      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

711   The encoding name "G729B" is assigned for the case when a particular
712   RTP payload type is to contain G.729 Annex B comfort noise packets
713   only. This may be necessary if the underlying RTP mechanism has no
714   means of distinguishing talkspurt from comfort-noise packets.

716   4.5.10 GSM

718   GSM (Groupe Special Mobile) denotes the European GSM 06.10
719   provisional standard for full-rate speech transcoding, prI-ETS 300
720   036, which is based on RPE/LTP (residual pulse excitation/long term
721   prediction) coding at a rate of 13 kb/s [9,10,11]. The text of the
722   standard can be obtained from

724     ETSI (European Telecommunications Standards Institute)
725     ETSI Secretariat: B.P.152
726     F-06561 Valbonne Cedex
727     France
728     Phone: +33 92 94 42 00
729     Fax: +33 93 65 47 16

731   Blocks of 160 audio samples are compressed into 33 octets, for an
732   effective data rate of 13,200 b/s.
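
The data rate quoted above follows directly from the frame size in octets, the number of samples each frame covers, and the sampling clock. The sketch below is illustrative only: the function name frame_bit_rate and its signature are assumptions made for this example and are not defined by the profile.

```c
#include <stdint.h>

/*
 * Effective bit rate, in bits per second, of a frame-based encoding.
 * Hypothetical helper for illustration; not part of the profile.
 *
 *   octets_per_frame  - size of one encoded frame
 *   samples_per_frame - audio samples covered by one frame
 *   clock_hz          - sampling rate in Hz
 */
static uint32_t frame_bit_rate(uint32_t octets_per_frame,
                               uint32_t samples_per_frame,
                               uint32_t clock_hz)
{
    /* Multiply before dividing so integer division loses as little
       precision as possible; the result is truncated, not rounded. */
    uint64_t bits_times_clock = (uint64_t)octets_per_frame * 8u * clock_hz;
    return (uint32_t)(bits_times_clock / samples_per_frame);
}
```

For GSM, frame_bit_rate(33, 160, 8000) gives the 13,200 b/s stated above. The same arithmetic applied to a 10-octet G.729 frame covering 80 samples gives 8,000 b/s.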

734   4.5.10.1 General Packaging Issues

736   The GSM standard specifies the bit stream produced by the codec, but
737   does not specify how these bits should be packed for transmission.
738   Some software implementations of the GSM codec use a different
739   packing than that specified here.

741   In the GSM encoding used by RTP, the bits are packed beginning from
742   the most significant bit. Every 160-sample GSM frame is coded into
743   one 33-octet (264-bit) buffer. Every such buffer begins with a 4-bit
744   signature (0xD), followed by the MSB encoding of the fields of the
745   frame. The first octet thus contains 1101 in the 4 most significant
746   bits (0-3) and the 4 most significant bits of F1 (0-3) in the 4 least
747   significant bits (4-7). The second octet contains the 2 least
748   significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so on.

750   The order of the fields in the frame is described in Table 2.

752   4.5.10.2 GSM variable names and numbers

754   So if F.i signifies the ith bit of the field F, and bit 0 is the most
755   significant bit, and the bits of every octet are numbered from 0 to 7
756   from most to least significant, then in the RTP encoding we have the
757   bit pattern described in Table 3.

759   4.5.11 L8

761   L8 denotes linear audio data, using 8 bits of precision with an
762   offset of 128, that is, the most negative signal is encoded as zero.

764   4.5.12 L16

766   L16 denotes uncompressed audio data, using 16-bit signed
767   representation with 65535 equally divided steps between minimum and
768   maximum signal level, ranging from -32768 to 32767. The value is
769   represented in two's complement notation and network byte order.

771   4.5.13 LPC

773   LPC designates an experimental linear predictive encoding contributed
774   by Ron Frederick, Xerox PARC, which is based on an implementation
775   written by Ron Zuckerman, Motorola, posted to the Usenet group
776   comp.dsp on June 26, 1992. The codec generates 14 octets for every
777   frame. The frame size is set to 20 ms, resulting in a bit rate of
778   5,600 b/s.

780   4.5.14 MPA

782   MPA denotes MPEG-I or MPEG-II audio encapsulated as elementary
783   streams. The encoding is defined in ISO standards ISO/IEC 11172-3
784   and 13818-3. The encapsulation is specified in RFC 2250 [12].

786   Sampling rate and channel count are contained in the payload. MPEG-I
787   audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC
788   11172-3, section 1.1; "Scope"). MPEG-II additionally supports
789   sampling rates of 16, 22.05 and 24 kHz.

791   4.5.15 PCMA and PCMU

793   PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio data
794   is encoded as eight bits per sample, after logarithmic scaling. PCMU
795   denotes mu-law scaling, PCMA A-law scaling. A detailed description is

796   field  field name  bits    field  field name  bits
797   __________________________________________________
798   1      LARc[0]     6       39     xmc[22]     3
799   2      LARc[1]     6       40     xmc[23]     3
800   3      LARc[2]     5       41     xmc[24]     3
801   4      LARc[3]     5       42     xmc[25]     3
802   5      LARc[4]     4       43     Nc[2]       7
803   6      LARc[5]     4       44     bc[2]       2
804   7      LARc[6]     3       45     Mc[2]       2
805   8      LARc[7]     3       46     xmaxc[2]    6
806   9      Nc[0]       7       47     xmc[26]     3
807   10     bc[0]       2       48     xmc[27]     3
808   11     Mc[0]       2       49     xmc[28]     3
809   12     xmaxc[0]    6       50     xmc[29]     3
810   13     xmc[0]      3       51     xmc[30]     3
811   14     xmc[1]      3       52     xmc[31]     3
812   15     xmc[2]      3       53     xmc[32]     3
813   16     xmc[3]      3       54     xmc[33]     3
814   17     xmc[4]      3       55     xmc[34]     3
815   18     xmc[5]      3       56     xmc[35]     3
816   19     xmc[6]      3       57     xmc[36]     3
817   20     xmc[7]      3       58     xmc[37]     3
818   21     xmc[8]      3       59     xmc[38]     3
819   22     xmc[9]      3       60     Nc[3]       7
820   23     xmc[10]     3       61     bc[3]       2
821   24     xmc[11]     3       62     Mc[3]       2
822   25     xmc[12]     3       63     xmaxc[3]    6
823   26     Nc[1]       7       64     xmc[39]     3
824   27     bc[1]       2       65     xmc[40]     3
825   28     Mc[1]       2       66     xmc[41]     3
826   29     xmaxc[1]    6       67     xmc[42]     3
827   30     xmc[13]     3       68     xmc[43]     3
828   31     xmc[14]     3       69     xmc[44]     3
829   32     xmc[15]     3       70     xmc[45]     3
830   33     xmc[16]     3       71     xmc[46]     3
831   34     xmc[17]     3       72     xmc[47]     3
832   35     xmc[18]     3       73     xmc[48]     3
833   36     xmc[19]     3       74     xmc[49]     3
834   37     xmc[20]     3       75     xmc[50]     3
835   38     xmc[21]     3       76     xmc[51]     3

837   Table 2: Ordering of GSM variables

839   given by Jayant and Noll [13]. Each G.711 octet shall be octet-
840   aligned in an RTP packet. The sign bit of each G.711 octet shall
841   correspond to the most significant bit of the octet in the RTP packet
842   (i.e., assuming the G.711 samples are handled as octets on the host

843   Octet  Bit 0    Bit 1    Bit 2    Bit 3    Bit 4    Bit 5    Bit 6    Bit 7
844   _____________________________________________________________________________
845   0      1        1        0        1        LARc0.0  LARc0.1  LARc0.2  LARc0.3
846   1      LARc0.4  LARc0.5  LARc1.0  LARc1.1  LARc1.2  LARc1.3  LARc1.4  LARc1.5
847   2      LARc2.0  LARc2.1  LARc2.2  LARc2.3  LARc2.4  LARc3.0  LARc3.1  LARc3.2
848   3      LARc3.3  LARc3.4  LARc4.0  LARc4.1  LARc4.2  LARc4.3  LARc5.0  LARc5.1
849   4      LARc5.2  LARc5.3  LARc6.0  LARc6.1  LARc6.2  LARc7.0  LARc7.1  LARc7.2
850   5      Nc0.0    Nc0.1    Nc0.2    Nc0.3    Nc0.4    Nc0.5    Nc0.6    bc0.0
851   6      bc0.1    Mc0.0    Mc0.1    xmaxc00  xmaxc01  xmaxc02  xmaxc03  xmaxc04
852   7      xmaxc05  xmc0.0   xmc0.1   xmc0.2   xmc1.0   xmc1.1   xmc1.2   xmc2.0
853   8      xmc2.1   xmc2.2   xmc3.0   xmc3.1   xmc3.2   xmc4.0   xmc4.1   xmc4.2
854   9      xmc5.0   xmc5.1   xmc5.2   xmc6.0   xmc6.1   xmc6.2   xmc7.0   xmc7.1
855   10     xmc7.2   xmc8.0   xmc8.1   xmc8.2   xmc9.0   xmc9.1   xmc9.2   xmc10.0
856   11     xmc10.1  xmc10.2  xmc11.0  xmc11.1  xmc11.2  xmc12.0  xmc12.1  xmc12.2
857   12     Nc1.0    Nc1.1    Nc1.2    Nc1.3    Nc1.4    Nc1.5    Nc1.6    bc1.0
858   13     bc1.1    Mc1.0    Mc1.1    xmaxc10  xmaxc11  xmaxc12  xmaxc13  xmaxc14
859   14     xmaxc15  xmc13.0  xmc13.1  xmc13.2  xmc14.0  xmc14.1  xmc14.2  xmc15.0
860   15     xmc15.1  xmc15.2  xmc16.0  xmc16.1  xmc16.2  xmc17.0  xmc17.1  xmc17.2
861   16     xmc18.0  xmc18.1  xmc18.2  xmc19.0  xmc19.1  xmc19.2  xmc20.0  xmc20.1
862   17     xmc20.2  xmc21.0  xmc21.1  xmc21.2  xmc22.0  xmc22.1  xmc22.2  xmc23.0
863   18     xmc23.1  xmc23.2  xmc24.0  xmc24.1  xmc24.2  xmc25.0  xmc25.1  xmc25.2
864   19     Nc2.0    Nc2.1    Nc2.2    Nc2.3    Nc2.4    Nc2.5    Nc2.6    bc2.0
865   20     bc2.1    Mc2.0    Mc2.1    xmaxc20  xmaxc21  xmaxc22  xmaxc23  xmaxc24
866   21     xmaxc25  xmc26.0  xmc26.1  xmc26.2  xmc27.0  xmc27.1  xmc27.2  xmc28.0
867   22     xmc28.1  xmc28.2  xmc29.0  xmc29.1  xmc29.2  xmc30.0  xmc30.1  xmc30.2
868   23     xmc31.0  xmc31.1  xmc31.2  xmc32.0  xmc32.1  xmc32.2  xmc33.0  xmc33.1
869   24     xmc33.2  xmc34.0  xmc34.1  xmc34.2  xmc35.0  xmc35.1  xmc35.2  xmc36.0
870   25     xmc36.1  xmc36.2  xmc37.0  xmc37.1  xmc37.2  xmc38.0  xmc38.1  xmc38.2
871   26     Nc3.0    Nc3.1    Nc3.2    Nc3.3    Nc3.4    Nc3.5    Nc3.6    bc3.0
872   27     bc3.1    Mc3.0    Mc3.1    xmaxc30  xmaxc31  xmaxc32  xmaxc33  xmaxc34
873   28     xmaxc35  xmc39.0  xmc39.1  xmc39.2  xmc40.0  xmc40.1  xmc40.2  xmc41.0
874   29     xmc41.1  xmc41.2  xmc42.0  xmc42.1  xmc42.2  xmc43.0  xmc43.1  xmc43.2
875   30     xmc44.0  xmc44.1  xmc44.2  xmc45.0  xmc45.1  xmc45.2  xmc46.0  xmc46.1
876   31     xmc46.2  xmc47.0  xmc47.1  xmc47.2  xmc48.0  xmc48.1  xmc48.2  xmc49.0
877   32     xmc49.1  xmc49.2  xmc50.0  xmc50.1  xmc50.2  xmc51.0  xmc51.1  xmc51.2

879   Table 3: GSM payload format

881   machine, the sign bit shall be the most significant bit of the octet
882   as defined by the host machine format). The 56 kb/s and 48 kb/s modes
883   of G.711 are not applicable to RTP, since G.711 shall always be
884   transmitted as 8-bit samples.

886   4.5.16 QCELP

888   The packetization of the QCELP audio codec is described in [14].

890   4.5.17 RED

892   The redundant audio payload format "RED" is specified by RFC 2198
893   [15]. It defines a means by which multiple redundant copies of an
894   audio packet may be transmitted in a single RTP stream. Each packet
895   in such a stream contains, in addition to the audio data for that
896   packetization interval, a (more heavily compressed) copy of the data
897   from the previous packetization interval. This allows an
898   approximation of the data from lost packets to be recovered upon
899   decoding of the following packet, giving much improved sound quality
900   when compared with silence substitution for lost packets.
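
The recovery idea described above can be sketched with a deliberately simplified model. This is not the RFC 2198 wire format: the struct red_unit, its fields and the function red_lookup are hypothetical names invented for this illustration, and payloads are represented as plain strings.

```c
#include <stddef.h>

/* Conceptual model of one received packet (NOT the RFC 2198 layout). */
struct red_unit {
    int seq;               /* packetization interval of the primary data */
    const char *primary;   /* audio data for interval seq */
    const char *redundant; /* more heavily compressed copy of interval
                              seq - 1, or NULL if none was sent */
};

/*
 * Return the best data available for interval `want` among the n
 * packets that actually arrived. The primary copy is preferred; if the
 * packet for `want` was lost, fall back to the redundant copy carried
 * by the following packet. NULL means nothing arrived for that
 * interval and the receiver has to substitute silence.
 */
static const char *red_lookup(const struct red_unit *rx, size_t n, int want)
{
    const char *fallback = NULL;
    for (size_t i = 0; i < n; i++) {
        if (rx[i].seq == want)
            return rx[i].primary;
        if (rx[i].seq == want + 1 && rx[i].redundant != NULL)
            fallback = rx[i].redundant;
    }
    return fallback;
}
```

With packets for intervals 1 and 3 received and interval 2 lost, a lookup for interval 2 returns the redundant copy carried in packet 3, which is the approximation the text refers to.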

902   4.5.18 SX*

904   The SX7300P, SX8300P and SX9600P codecs are part of the same
905   compatible family and are distinguished by the first octet in each
906   frame, where "x" can be any value:

908       0 1 2 3 4 5 6 7
909      +-+-+-+-+-+-+-+-+
910      |0 0 x          |   SX7300P bitstream (14 byte frame)
911      |0 1 0          |   SX8300P bitstream (16 byte frame)
912      |1 0 x          |   VAD bitstream     ( 2 byte frame)
913      |1 1 x          |   SX9600P bitstream (18 byte frame)
914      +-+-+-+-+-+-+-+-+

916   4.5.18.1 SX7300P

918   The SX7300P is a low-complexity CELP-based audio codec operating at a
919   sampling rate of 8000 Hz. It encodes blocks of 120 audio samples (15
920   ms) into an encoded frame of 14 octets, yielding an encoded bit rate
921   of approximately 7467 b/s.

923   4.5.18.2 SX8300P

925   The SX8300P is a low-complexity CELP-based audio codec operating at a
926   sampling rate of 8000 Hz. It encodes blocks of 120 audio samples (15
927   ms) into an encoded frame of 16 octets, yielding an encoded bit rate
928   of approximately 8533 b/s.

930   4.5.18.3 SX9600P

932   The SX9600P is a low-complexity, toll-quality CELP-based audio codec
933   operating at a sampling rate of 8000 Hz. It encodes blocks of 120
934   audio samples (15 ms) into an encoded frame of 18 octets, yielding an
935   encoded bit rate of 9600 b/s.

937   4.5.19 VDVI

939   VDVI is a variable-rate version of DVI4, yielding speech bit rates of
940   between 10 and 25 kb/s. It is specified for single-channel operation
941   only. Samples are packed into octets starting at the most-significant
942   bit.

944   It uses the following encoding:

946     DVI4 codeword    VDVI bit pattern
947     _________________________________
948      0               00
949      1               010
950      2               1100
951      3               11100
952      4               111100
953      5               1111100
954      6               11111100
955      7               11111110
956      8               10
957      9               011
958     10               1101
959     11               11101
960     12               111101
961     13               1111101
962     14               11111101
963     15               11111111

965   5 Video

967   The following video encodings are currently defined, with their
968   abbreviated names used for identification:

970   5.1 CelB

972   The CELL-B encoding is a proprietary encoding proposed by Sun
973   Microsystems. The byte stream format is described in RFC 2029 [16].

975   5.2 JPEG

977   The encoding is specified in ISO Standards 10918-1 and 10918-2. The
978   RTP payload format is as specified in RFC 2035 [17].

980   5.3 H261
981   The encoding is specified in ITU-T Recommendation H.261, "Video codec
982   for audiovisual services at p x 64 kbit/s". The packetization and
983   RTP-specific properties are described in RFC 2032 [18].

985   5.4 H263

987   The encoding is specified in ITU-T Recommendation H.263, "Video
988   coding for low bit rate communication". The packetization and RTP-
989   specific properties are described in [19].

991   5.5 MPV

993   MPV designates the use of MPEG-I and MPEG-II video encoding elementary
994   streams as specified in ISO Standards ISO/IEC 11172 and 13818-2,
995   respectively. The RTP payload format is as specified in RFC 2250
996   [12], Section 3.

998   5.6 MP2T

1000  MP2T designates the use of MPEG-II transport streams, for either
1001  audio or video. The encapsulation is described in RFC 2250 [12],
1002  Section 2.

1004  5.7 MP1S

1006  MP1S designates an MPEG-I systems stream, encapsulated according to
1007  RFC 2250 [12].

1009  5.8 MP2P

1011  MP2P designates an MPEG-II program stream, encapsulated according to
1012  RFC 2250 [12].

1014  5.9 nv

1016  The encoding is implemented in the program 'nv', version 4, developed
1017  at Xerox PARC by Ron Frederick. Further information is available from
1018  the author:

1020    Ron Frederick
1021    Xerox Palo Alto Research Center
1022    3333 Coyote Hill Road
1023    Palo Alto, CA 94304
1024    United States
1025    electronic mail: frederic@parc.xerox.com

1027  6 Payload Type Definitions
1028  Table 4 defines this profile's static payload type values for the PT
1029  field of the RTP data header. A new RTP payload format specification
1030  may be registered with the IANA by name. In addition, payload type
1031  values in the range 96-127 may be defined dynamically through a
1032  conference control protocol, which is beyond the scope of this
1033  document. For example, a session directory could specify that for a
1034  given session, payload type 96 indicates PCMU encoding, 8,000 Hz
1035  sampling rate, 2 channels. The payload type range marked 'reserved'
1036  has been set aside so that RTCP and RTP packets can be reliably
1037  distinguished (see Section "Summary of Protocol Constants" of the RTP
1038  protocol specification).

1040  An RTP source emits a single RTP payload type at any given instant.
1041  The interleaving or multiplexing of several RTP media types within a
1042  single RTP session is not allowed, but multiple RTP sessions may be
1043  used in parallel to send multiple media types. An RTP source may
1044  change payload types during a session.

1046  The payload types currently defined in this profile are assigned to
1047  exactly one of three categories or media types: audio only, video
1048  only and those combining audio and video. A single RTP session
1049  consists of payload types of one and only one media type.

1051  Session participants agree through mechanisms beyond the scope of
1052  this specification on the set of payload types allowed in a given
1053  session. This set may, for example, be defined by the capabilities
1054  of the applications used, negotiated by a conference control protocol
1055  or established by agreement between the human participants. The media
1056  types in Table 4 are marked as "A" for audio, "V" for video and "AV"
1057  for combined audio/video streams.

1059  Audio applications operating under this profile should, at minimum,
1060  be able to send and receive payload types 0 (PCMU) and 5 (DVI4). This
1061  allows interoperability without format negotiation and successful
1062  negotiation with a conference control protocol.

1064  All current video encodings use a timestamp frequency of 90,000 Hz,
1065  the same as the MPEG presentation time stamp frequency. This
1066  frequency yields exact integer timestamp increments for the typical
1067  24 (HDTV), 25 (PAL), 29.97 (NTSC) and 30 Hz (HDTV) frame rates
1068  and 50, 59.94 and 60 Hz field rates. While 90 kHz is the recommended
1069  rate for future video encodings used within this profile, other rates
1070  are possible. However, it is not sufficient to use the video frame
1071  rate (typically between 15 and 30 Hz) because that does not provide
1072  adequate resolution for typical synchronization requirements when
1073  calculating the RTP timestamp corresponding to the NTP timestamp in
1074  an RTCP SR packet. The timestamp resolution must also be sufficient
1075  for the jitter estimate contained in the receiver reports.

1077  The standard video encodings and their payload types are listed in
1078  Table 4.

1080  7 RTP over TCP and Similar Byte Stream Protocols

1082  Under special circumstances, it may be necessary to carry RTP in
1083  protocols offering a byte stream abstraction, such as TCP, possibly
1084  multiplexed with other data. If the application does not define its
1085  own method of delineating RTP and RTCP packets, it SHOULD prefix each
1086  packet with a two-octet length field.

1088  (Note: RTSP [20] provides its own encapsulation and does not need an
1089  extra length indication.)
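
A minimal sketch of the two-octet length prefix follows. The text above does not state the byte order of the length field or whether the length counts the prefix itself; this example assumes network byte order (most significant octet first) and a length that covers only the RTP or RTCP packet. The function names frame_packet and next_packet are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write one length-prefixed packet into out.
 * Assumptions (not stated by the profile): big-endian length field,
 * length excludes the two prefix octets.
 * Returns the number of octets written, or 0 if the packet is too
 * large for a two-octet length or does not fit in out.
 */
static size_t frame_packet(const uint8_t *pkt, size_t len,
                           uint8_t *out, size_t cap)
{
    if (len > 0xFFFFu || cap < len + 2)
        return 0;
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xFFu);
    memcpy(out + 2, pkt, len);
    return len + 2;
}

/*
 * Locate the next packet in a received byte stream buffer.
 * On success, sets *pkt and *len and returns the number of octets
 * consumed (prefix plus packet). Returns 0 if the buffer does not yet
 * hold a complete packet, in which case the caller should read more.
 */
static size_t next_packet(const uint8_t *buf, size_t avail,
                          const uint8_t **pkt, size_t *len)
{
    if (avail < 2)
        return 0;
    size_t n = ((size_t)buf[0] << 8) | buf[1];
    if (avail < n + 2)
        return 0;
    *pkt = buf + 2;
    *len = n;
    return n + 2;
}
```

A receiver would call next_packet repeatedly on its accumulated input, advancing by the returned count each time, and wait for more data whenever the function returns 0.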

1091  8 Port Assignment

1093  As specified in the RTP protocol definition, RTP data is to be
1094  carried on an even UDP or TCP port number and the corresponding RTCP
1095  packets are to be carried on the next higher (odd) port number.

1097  Applications operating under this profile may use any such UDP or TCP
1098  port pair. For example, the port pair may be allocated randomly by a
1099  session management program. A single fixed port number pair cannot be
1100  required because multiple applications using this profile are likely
1101  to run on the same host, and there are some operating systems that do
1102  not allow multiple processes to use the same UDP port with different
1103  multicast addresses.

1105  However, port numbers 5004 and 5005 have been registered for use with
1106  this profile for those applications that choose to use them as the
1107  default pair. Applications that operate under multiple profiles may
1108  use this port pair as an indication to select this profile if they
1109  are not subject to the constraint of the previous paragraph.
1110  Applications need not have a default and may require that the port
1111  pair be explicitly specified. The particular port numbers were chosen
1112  to lie in the range above 5000 to accommodate port number allocation
1113  practice within the Unix operating system, where port numbers below
1114  1024 can only be used by privileged processes and port numbers
1115  between 1024 and 5000 are automatically assigned by the operating
1116  system.

1118  9 Bibliography

1120  [1] M. Handley and V. Jacobson, "SDP: Session Description Protocol,"
1121  Request for Comments (Proposed Standard) RFC 2327, Internet
1122  Engineering Task Force, Apr. 1998.

1124  [2] Apple Computer, "Audio interchange file format AIFF-C," Aug.
1125  1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).

1127  [3] Office of Technology and Standards, "Telecommunications: Analog
1128  to digital conversion of radio voice by 4,800 bit/second code excited
1129  linear prediction (CELP)," Federal Standard FS-1016, GSA, Room 6654;
1130  7th & D Street SW; Washington, DC 20407 (+1-202-708-9205), 1990.

1132  [4] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The
1133  proposed Federal Standard 1016 4800 bps voice coder: CELP," Speech
1134  Technology, vol. 5, pp. 58--64, April/May 1990.

1136  [5] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The federal
1137  standard 1016 4800 bps CELP voice coder," Digital Signal Processing,
1138  vol. 1, no. 3, pp. 145--155, 1991.

1140  [6] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The DoD 4.8
1141  kbps standard (proposed federal standard 1016)," in Advances in
1142  Speech Coding (B. Atal, V. Cuperman, and A. Gersho, eds.), ch. 12,
1143  pp. 121--133, Kluwer Academic Publishers, 1991.

1145  [7] IMA Digital Audio Focus and Technical Working Groups,
1146  "Recommended practices for enhancing digital audio compatibility in
1147  multimedia systems (version 3.00)," tech. rep., Interactive
1148  Multimedia Association, Annapolis, Maryland, Oct. 1992.

1150  [8] D. Deleam and J.-P. Petit, "Real-time implementations of the
1151  recent ITU-T low bit rate speech coders on the TI TMS320C54X DSP:
1152  results, methodology, and applications," in Proc. of International
1153  Conference on Signal Processing, Technology, and Applications
1154  (ICSPAT), (Boston, Massachusetts), pp. 1656--1660, Oct. 1996.

1156  [9] M. Mouly and M.-B. Pautet, The GSM system for mobile
1157  communications. Lassay-les-Chateaux, France: Europe Media Duplication,
1158  1993.

1160  [10] J. Degener, "Digital speech compression," Dr. Dobb's Journal,
1161  Dec. 1994.

1163  [11] S. M. Redl, M. K. Weber, and M. W. Oliphant, An Introduction to
1164  GSM. Boston: Artech House, 1995.

1166  [12] D. Hoffman, G. Fernando, V. Goyal, and M. Civanlar, "RTP payload
1167  format for MPEG1/MPEG2 video," Request for Comments (Proposed
1168  Standard) RFC 2250, Internet Engineering Task Force, Jan. 1998.

1170  [13] N. S. Jayant and P. Noll, Digital Coding of Waveforms--
1171  Principles and Applications to Speech and Video. Englewood Cliffs, New

1172  PT       encoding    media type  clock rate  channels
1173           name                    (Hz)        (audio)
1174  _____________________________________________________
1175  0        PCMU        A           8000        1
1176  1        1016        A           8000        1
1177  2        G726-32     A           8000        1
1178  3        GSM         A           8000        1
1179  4        G723        A           8000        1
1180  5        DVI4        A           8000        1
1181  6        DVI4        A           16000       1
1182  7        LPC         A           8000        1
1183  8        PCMA        A           8000        1
1184  9        G722        A           16000       1
1185  10       L16         A           44100       2
1186  11       L16         A           44100       1
1187  12       QCELP       A           8000        1
1188  13       unassigned  A
1189  14       MPA         A           90000       (see text)
1190  15       G728        A           8000        1
1191  16       DVI4        A           11025       1
1192  17       DVI4        A           22050       1
1193  18       G729        A           8000        1
1194  19       CN          A           8000        1
1195  20       unassigned  A
1196  21       unassigned  A
1197  22       unassigned  A
1198  23       unassigned  A
1199  24       unassigned  V
1200  25       CelB        V           90000
1201  26       JPEG        V           90000
1202  27       unassigned  V
1203  28       nv          V           90000
1204  29       unassigned  V
1205  30       unassigned  V
1206  31       H261        V           90000
1207  32       MPV         V           90000
1208  33       MP2T        AV          90000
1209  34       H263        V           90000
1210  35--71   unassigned  ?
1211  72--76   reserved    N/A         N/A         N/A
1212  77--95   unassigned  ?
1213  96--127  dynamic     ?
1214  dyn      RED         A
1215  dyn      MP1S        V           90000
1216  dyn      MP2P        V           90000

1218  Table 4: Payload types (PT) for standard audio and video encodings

1219  Jersey: Prentice-Hall, 1984.

1221  [14] K. McKay, "RTP Payload Format for PureVoice(tm) Audio", Internet
1222  Draft, Internet Engineering Task Force, Oct. 1998. Work in progress.

1224  [15] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C.
1225  Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP Payload for
1226  Redundant Audio Data," Request for Comments (Proposed Standard) RFC
1227  2198, Internet Engineering Task Force, Sep. 1997.

1229  [16] M. Speer and D. Hoffman, "RTP payload format of Sun's CellB
1230  video encoding," Request for Comments (Proposed Standard) RFC 2029,
1231  Internet Engineering Task Force, Oct. 1996.

1233  [17] L. Berc, W. Fenner, R. Frederick, and S. McCanne, "RTP payload
1234  format for JPEG-compressed video," Request for Comments (Proposed
1235  Standard) RFC 2035, Internet Engineering Task Force, Oct. 1996.

1237  [18] T. Turletti and C. Huitema, "RTP payload format for H.261 video
1238  streams," Request for Comments (Proposed Standard) RFC 2032, Internet
1239  Engineering Task Force, Oct. 1996.

1241  [19] C. Zhu, "RTP payload format for H.263 video streams," Request
1242  for Comments (Proposed Standard) RFC 2190, Internet Engineering Task
1243  Force, Sep. 1997.

1245  [20] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming
1246  protocol (RTSP)," Request for Comments (Proposed Standard) RFC 2326,
1247  Internet Engineering Task Force, Apr. 1998.

1249  10 Acknowledgements

1251  The comments and careful review of Steve Casner, Simao Campos and
1252  Richard Cox are gratefully acknowledged. The GSM description was
1253  adopted from the IMTC Voice over IP Forum Service Interoperability
1254  Implementation Agreement (January 1997). Fred Burg and Terry Lyons
1255  helped with the G.729 description.

1257  11 Address of Author

1259    Henning Schulzrinne
1260    Dept. of Computer Science
1261    Columbia University
1262    1214 Amsterdam Avenue
1263    New York, NY 10027
1264    USA
1265    electronic mail: schulzrinne@cs.columbia.edu

1266  Current Locations of Related Resources

1268  Note: Several sections below refer to the ITU-T Software Tool Library
1269  (STL). It is available from the ITU Sales Service, Place des Nations,
1270  CH-1211 Geneve 20, Switzerland (also check http://www.itu.int). The
1271  ITU-T STL is covered by a license defined in ITU-T Recommendation
1272  G.191, "Software tools for speech and audio coding standardization".

1275  UTF-8

1277  Information on the UCS Transformation Format 8 (UTF-8) is available
1278  at

1280    http://www.stonehand.com/unicode/standard/utf8.html

1282  1016

1284  The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited
1285  linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C
1286  simulation source codes are available for worldwide distribution at
1287  no charge (on DOS diskettes, but configured to compile on Sun SPARC
1288  stations) from: Bob Fenichel, National Communications System,
1289  Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960.

1291  An implementation is also available at

1293    ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z

1295  DVI4

1297  An implementation is available from Jack Jansen at

1299    ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar

1301  G722

1303  An implementation of the G.722 algorithm is available as part of the
1304  ITU-T STL, described above.

1306  G723

1308  The reference C code implementation defining the G.723.1 algorithm
1309  and its Annexes A, B, and C is available as an integral part of
1310  Recommendation G.723.1 from the ITU Sales Service, address listed
1311  above. Both the algorithm and C code are covered by a specific
1312  license. The ITU-T Secretariat should be contacted to obtain such
1313  licensing information.

1315  G726-16 through G726-40

1317  G726-16 through G726-40 are specified in the ITU-T Recommendation
1318  G.726, "40, 32, 24, and 16 kb/s Adaptive Differential Pulse Code
1319  Modulation (ADPCM)". An implementation of the G.726 algorithm is
1320  available as part of the ITU-T STL, described above.

1322  G727-16 through G727-40

1324  G727-16 through G727-40 are specified in the ITU-T Recommendation
1325  G.727, "5-, 4-, 3-, and 2-bit/sample embedded adaptive differential
1326  pulse code modulation". An implementation of the G.727 algorithm will
1327  be available in a future release of the ITU-T STL, described above.
G729

The reference C code implementation defining the G.729 algorithm and
its Annexes A and B are available as an integral part of
Recommendation G.729 from the ITU Sales Service, listed above. Both
the algorithm and the C code are covered by a specific license. The
contact information for obtaining the license is listed in the C
code.

GSM

A reference implementation was written by Carsten Bormann and Jutta
Degener (TU Berlin, Germany). It is available at

   ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/

Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
code implementation of the RPE-LTP algorithm available as part of the
ITU-T STL. The STL implementation is an adaptation of the TU Berlin
version.

LPC

An implementation is available at

   ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z

PCMU, PCMA

An implementation of these algorithms is available as part of the
ITU-T STL, described above. Code to convert between linear and mu-law
companded data is also available in [7].

Table of Contents

1           Introduction
2           RTP and RTCP Packet Forms and Protocol Behavior
3           Registering Payload Types
4           Audio
4.1         Encoding-Independent Rules
4.2         Operating Recommendations
4.3         Guidelines for Sample-Based Audio Encodings
4.4         Guidelines for Frame-Based Audio Encodings
4.5         Audio Encodings
4.5.1       1016
4.5.2       CN
4.5.3       DVI4
4.5.4       G722
4.5.5       G723
4.5.6       G726-16, G726-24, G726-32, G726-40
4.5.7       G727-16, G727-24, G727-32, G727-40
4.5.8       G728
4.5.9       G729
4.5.10      GSM
4.5.10.1    General Packaging Issues
4.5.10.2    GSM variable names and numbers
4.5.11      L8
4.5.12      L16
4.5.13      LPC
4.5.14      MPA
4.5.15      PCMA and PCMU
4.5.16      QCELP
4.5.17      RED
4.5.18      SX*
4.5.18.1    SX7300P
4.5.18.2    SX8300P
4.5.18.3    SX9600P
4.5.19      VDVI
5           Video
5.1         CelB
5.2         JPEG
5.3         H261
5.4         H263
5.5         MPV
5.6         MP2T
5.7         MP1S
5.8         MP2P
5.9         nv
6           Payload Type Definitions
7           RTP over TCP and Similar Byte Stream Protocols
8           Port Assignment
9           Bibliography
10          Acknowledgements
11          Address of Author