idnits 2.17.1 draft-ietf-avt-profile-new-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-18) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** Expected the document's filename to be given on the first page, but didn't find any ** The document is more than 15 pages and seems to lack a Table of Contents. == There is 1 instance of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack an Authors' Addresses Section. ** There are 56 instances of too long lines in the document, the longest one being 24 characters in excess of 72. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 406: '...nt coded bit rates. Non-RTP means MUST...' RFC 2119 keyword, line 592: '...e G726-32 encoding MUST be packed into...' RFC 2119 keyword, line 1104: '...nd RTCP packets, it SHOULD prefix each...' Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 297 has weird spacing: '...hannels des...' == Line 305 has weird spacing: '... lc c ...' == Line 426 has weird spacing: '...ncoding sam...' == Line 454 has weird spacing: '...A: not appli...' == Line 560 has weird spacing: '... bits con...' == (2 more instances...) == Couldn't figure out when the document was first submitted -- there may comments or warnings related to the use of a disclaimer for pre-RFC5378 work that could not be issued because of this. 
Please check the Legal Provisions document at https://trustee.ietf.org/license-info to determine if you need the pre-RFC5378 disclaimer. -- The document date (November 20, 1997) is 9646 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 1139 looks like a reference -- Missing reference section? '2' on line 1142 looks like a reference -- Missing reference section? '3' on line 1147 looks like a reference -- Missing reference section? '4' on line 1151 looks like a reference -- Missing reference section? '5' on line 1155 looks like a reference -- Missing reference section? '6' on line 1320 looks like a reference -- Missing reference section? '7' on line 1165 looks like a reference -- Missing reference section? '8' on line 1171 looks like a reference -- Missing reference section? '9' on line 1175 looks like a reference -- Missing reference section? '10' on line 1178 looks like a reference -- Missing reference section? '0' on line 782 looks like a reference -- Missing reference section? '22' on line 770 looks like a reference -- Missing reference section? '23' on line 771 looks like a reference -- Missing reference section? '24' on line 772 looks like a reference -- Missing reference section? '25' on line 773 looks like a reference -- Missing reference section? '26' on line 778 looks like a reference -- Missing reference section? '27' on line 779 looks like a reference -- Missing reference section? '28' on line 780 looks like a reference -- Missing reference section? '29' on line 781 looks like a reference -- Missing reference section? 
'30' on line 782 looks like a reference -- Missing reference section? '31' on line 783 looks like a reference -- Missing reference section? '32' on line 784 looks like a reference -- Missing reference section? '33' on line 785 looks like a reference -- Missing reference section? '34' on line 786 looks like a reference -- Missing reference section? '35' on line 787 looks like a reference -- Missing reference section? '36' on line 788 looks like a reference -- Missing reference section? '37' on line 789 looks like a reference -- Missing reference section? '38' on line 790 looks like a reference -- Missing reference section? '11' on line 1181 looks like a reference -- Missing reference section? '12' on line 1185 looks like a reference -- Missing reference section? '39' on line 795 looks like a reference -- Missing reference section? '40' on line 796 looks like a reference -- Missing reference section? '41' on line 797 looks like a reference -- Missing reference section? '42' on line 798 looks like a reference -- Missing reference section? '13' on line 1189 looks like a reference -- Missing reference section? '43' on line 799 looks like a reference -- Missing reference section? '14' on line 1193 looks like a reference -- Missing reference section? '44' on line 800 looks like a reference -- Missing reference section? '15' on line 1197 looks like a reference -- Missing reference section? '45' on line 801 looks like a reference -- Missing reference section? '16' on line 1201 looks like a reference -- Missing reference section? '46' on line 802 looks like a reference -- Missing reference section? '17' on line 1205 looks like a reference -- Missing reference section? '47' on line 803 looks like a reference -- Missing reference section? '18' on line 804 looks like a reference -- Missing reference section? '48' on line 804 looks like a reference -- Missing reference section? '19' on line 805 looks like a reference -- Missing reference section? 
'49' on line 805 looks like a reference -- Missing reference section? '20' on line 806 looks like a reference -- Missing reference section? '50' on line 806 looks like a reference -- Missing reference section? '21' on line 807 looks like a reference -- Missing reference section? '51' on line 807 looks like a reference Summary: 13 errors (**), 0 flaws (~~), 9 warnings (==), 54 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force AVT WG 3 Internet Draft Schulzrinne 4 ietf-avt-profile-new-02.txt Columbia U. 5 November 20, 1997 6 Expires: January 1, 1998 8 RTP Profile for Audio and Video Conferences with Minimal Control 10 STATUS OF THIS MEMO 12 This document is an Internet-Draft. Internet-Drafts are working 13 documents of the Internet Engineering Task Force (IETF), its areas, 14 and its working groups. Note that other groups may also distribute 15 working documents as Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six months 18 and may be updated, replaced, or obsoleted by other documents at any 19 time. It is inappropriate to use Internet-Drafts as reference 20 material or to cite them other than as ``work in progress''. 22 To learn the current status of any Internet-Draft, please check the 23 ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow 24 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 25 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 26 ftp.isi.edu (US West Coast). 28 Distribution of this document is unlimited. 30 ABSTRACT 32 This memo describes a profile called ''RTP/AVP'' for the 33 use of the real-time transport protocol (RTP), version 2, 34 and the associated control protocol, RTCP, within audio 35 and video multiparticipant conferences with minimal 36 control. 
It provides interpretations of generic fields 37 within the RTP specification suitable for audio and video 38 conferences. In particular, this document defines a set 39 of default mappings from payload type numbers to 40 encodings. 42 The document also describes how audio and video data may 43 be carried within RTP. It defines a set of standard 44 encodings and their names when used within RTP. However, 45 the encoding definitions are independent of the 46 particular transport mechanism used. The descriptions 47 provide pointers to reference implementations and the 48 detailed standards. This document is meant as an aid for 49 implementors of audio, video and other real-time 50 multimedia applications. 52 Changes 54 This draft revises RFC 1890. It is fully backwards-compatible with 55 RFC 1890 and codifies existing practice. It is intended that this 56 draft form the basis of a new RFC to obsolete RFC 1890 as it moves to 57 Draft Standard. 59 Besides wording clarifications and filling in RFC numbers for payload 60 type definitions, this draft adds payload types 4, 16, 17, 18, 19 and 61 34. The PostScript version of this draft contains change bars marking 62 changes made since draft -00. 64 A tentative TCP encapsulation is defined. 66 According to Peter Hoddie of Apple, only pre-1994 Macintosh computers used the 67 22254.54 rate and none used the 11127.27 rate. 69 Note to RFC editor: This section is to be removed before publication 70 as an RFC. All RFC TBD should be filled in with the number of the RTP 71 specification RFC submitted for Draft Standard status. 73 1 Introduction 75 This profile defines aspects of RTP left unspecified in the RTP 76 Version 2 protocol definition (RFC XXXX). This profile is intended 77 for use within audio and video conferences with minimal session 78 control. In particular, no support for the negotiation of parameters 79 or membership control is provided.
The profile is expected to be 80 useful in sessions where no negotiation or membership control are 81 used (e.g., using the static payload types and the membership 82 indications provided by RTCP), but this profile may also be useful in 83 conjunction with a higher-level control protocol. 85 Use of this profile occurs by use of the appropriate applications; 86 there is no explicit indication by port number, protocol identifier 87 or the like. Applications such as session directories should refer to 88 this profile as "RTP/AVP". 90 Other profiles may make different choices for the items specified 91 here. 93 This document also defines a set of payload formats for audio. 95 This draft defines the term media type as dividing encodings of audio 96 and video content into three classes: audio, video and audio/video 97 (interleaved). 99 2 RTP and RTCP Packet Forms and Protocol Behavior 101 The section "RTP Profiles and Payload Format Specification" of RFC 102 TBD enumerates a number of items that can be specified or modified in 103 a profile. This section addresses these items. Generally, this 104 profile follows the default and/or recommended aspects of the RTP 105 specification. 107 RTP data header: The standard format of the fixed RTP data header is 108 used (one marker bit). 110 Payload types: Static payload types are defined in Section 6. 112 RTP data header additions: No additional fixed fields are appended to 113 the RTP data header. 115 RTP data header extensions: No RTP header extensions are defined, but 116 applications operating under this profile may use such 117 extensions. Thus, applications should not assume that the RTP 118 header X bit is always zero and should be prepared to ignore the 119 header extension. If a header extension is defined in the 120 future, that definition must specify the contents of the first 121 16 bits in such a way that multiple different extensions can be 122 identified. 
124 RTCP packet types: No additional RTCP packet types are defined by 125 this profile specification. 127 RTCP report interval: The suggested constants are to be used for the 128 RTCP report interval calculation. 130 SR/RR extension: No extension section is defined for the RTCP SR or 131 RR packet. 133 SDES use: Applications may use any of the SDES items described in the 134 RTP specification. While CNAME information is sent every 135 reporting interval, other items should be sent only every third 136 reporting interval, with NAME sent seven out of eight times 137 within that slot and the remaining SDES items cyclically taking 138 up the eighth slot, as defined in Section 6.2.2 of the RTP 139 specification. In other words, NAME is sent in RTCP packets 1, 140 4, 7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet 141 22. 143 Security: The RTP default security services are also the default 144 under this profile. 146 String-to-key mapping: A user-provided string ("pass phrase") is 147 hashed with the MD5 algorithm to a 16-octet digest. An n-bit 148 key is extracted from the digest by taking the first n bits 149 from the digest. If several keys are needed with a total length 150 of 128 bits or less (as for triple DES), they are extracted in 151 order from that digest. The octet ordering is specified in RFC 152 1423, Section 2.2. (Note that some DES implementations require 153 that the 56-bit key be expanded into 8 octets by inserting an 154 odd parity bit in the most significant bit of the octet to go 155 with each 7 bits of the key.) 157 It is suggested that pass phrases be restricted to ASCII letters, 158 digits, the hyphen, and white space to reduce the chance of 159 transcription errors when conveying keys by phone, fax, telex or 160 email. 162 The pass phrase may be preceded by a specification of the encryption 163 algorithm. Any characters up to the first slash (ASCII 0x2f) are 164 taken as the name of the encryption algorithm.
The encryption format 165 specifiers should be drawn from RFC 1423 or any additional 166 identifiers registered with IANA. If no slash is present, DES-CBC is 167 assumed as default. The encryption algorithm specifier is case 168 sensitive. 170 The pass phrase typed by the user is transformed to a canonical form 171 before applying the hash algorithm. For that purpose, we define 172 return, tab, or vertical tab as well as all characters contained in 173 the Unicode space characters table as white space. The transformation consists of 174 the following steps: (1) convert the input string to the ISO 10646 175 character set, using the UTF-8 encoding as specified in Annex P to 176 ISO/IEC 10646-1:1993 (ASCII characters require no mapping, but ISO 177 8859-1 characters do); (2) remove leading and trailing white space 178 characters; (3) replace one or more contiguous white space characters 179 by a single space (ASCII or UTF-8 0x20); (4) convert all letters to 180 lower case and replace sequences of characters and non-spacing 181 accents with a single character, where possible. A minimum length of 182 16 key characters (after applying the transformation) should be 183 enforced by the application, while applications must allow up to 256 184 characters of input. 186 Underlying protocol: The profile specifies the use of RTP over 187 unicast and multicast UDP as well as TCP. (This does not 188 preclude the use of these definitions when RTP is carried by 189 other lower-layer protocols.) 191 Transport mapping: The standard mapping of RTP and RTCP to 192 transport-level addresses is used. 194 Encapsulation: No encapsulation of RTP packets is specified. 196 3 Registering Payload Types 198 This profile defines a set of standard encodings and their payload 199 types when used within RTP. Other encodings and their payload types 200 are to be registered with the Internet Assigned Numbers Authority 201 (IANA).
When registering a new encoding/payload type, the following 202 information should be provided:
204 o name and description of encoding, in particular the RTP 205 timestamp clock rate; the names defined here are 3 or 4 206 characters long to allow a compact representation if needed;
208 o indication of who has change control over the encoding (for 209 example, ISO, ITU-T, other international standardization 210 bodies, a consortium or a particular company or group of 211 companies);
213 o any operating parameters or profiles;
215 o a reference to a further description, if available, for example 216 (in order of preference) an RFC, a published paper, a patent 217 filing, a technical report, documented source code or a 218 computer manual;
220 o for proprietary encodings, contact information (postal and 221 email address);
223 o the payload type value for this profile, if necessary (see 224 below).
226 Note that not all encodings to be used by RTP need to be assigned a 227 static payload type. Non-RTP means beyond the scope of this memo 228 (such as directory services or invitation protocols) may be used to 229 establish a dynamic mapping between a payload type and an encoding 230 ("dynamic payload types"). Applications should first use the range 96 231 to 127 for dynamic payload types. Only applications which need to 232 define more than 32 dynamic payload types may redefine codes below 233 96. Redefining payload types below 96 may cause incorrect operation 234 if an attempt is made to join a session without obtaining session 235 description information that defines the dynamic payload types. 237 Note that dynamic payload types should not be used without a well- 238 defined mechanism to indicate the mapping. Systems that expect to 239 interoperate with others operating under this profile should not 240 assign proprietary encodings to particular, fixed payload types in 241 the range reserved for dynamic payload types. SDP (RFC XXXX) defines 242 such a mapping mechanism.
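A receiver could hold the dynamic mapping described above in a small per-session table. The following is a minimal sketch in C and is not part of this profile: the type pt_map and the functions pt_map_bind and pt_map_lookup are hypothetical names, and the mapping itself is assumed to arrive through some non-RTP means such as a session description.

```c
#include <assert.h>
#include <stddef.h>

#define RTP_PT_DYNAMIC_MIN 96   /* first preferred dynamic payload type */
#define RTP_PT_DYNAMIC_MAX 127  /* last value of the 7-bit payload type field */

/* Hypothetical per-session table: slot i holds the encoding name bound
 * to payload type 96 + i, or NULL if that payload type is unbound. */
struct pt_map {
    const char *name[RTP_PT_DYNAMIC_MAX - RTP_PT_DYNAMIC_MIN + 1];
};

/* Bind an encoding name to a dynamic payload type.
 * Returns 0 on success, -1 if pt is outside the range 96-127. */
int pt_map_bind(struct pt_map *m, int pt, const char *encoding)
{
    assert(m != NULL);
    if (pt < RTP_PT_DYNAMIC_MIN || pt > RTP_PT_DYNAMIC_MAX)
        return -1;
    m->name[pt - RTP_PT_DYNAMIC_MIN] = encoding;
    return 0;
}

/* Look up the encoding bound to a received payload type.
 * Returns NULL if pt is outside 96-127 or no mapping was signalled. */
const char *pt_map_lookup(const struct pt_map *m, int pt)
{
    assert(m != NULL);
    if (pt < RTP_PT_DYNAMIC_MIN || pt > RTP_PT_DYNAMIC_MAX)
        return NULL;
    return m->name[pt - RTP_PT_DYNAMIC_MIN];
}
```

In this sketch a receiver would call pt_map_bind once per signalled mapping and pt_map_lookup for each arriving packet whose payload type is 96 or above; a NULL result means that no mapping was signalled and the payload cannot be interpreted.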
244 The available payload type space is relatively small. Thus, new 245 static payload types are assigned only if the following conditions 246 are met:
248 o The encoding is of interest to the Internet community at large.
250 o It offers benefits compared to existing encodings and/or is 251 required for interoperation with existing, widely deployed 252 conferencing or multimedia systems.
254 o The description is sufficient to build a decoder.
256 For implementor convenience, this profile contains descriptions of 257 encodings which do not currently have a static payload type assigned 258 to them. 260 The Session Description Protocol (SDP) (RFC XXXX) uses the encoding 261 names defined here. 263 4 Audio 265 4.1 Encoding-Independent Rules 267 For applications which send either no packets or comfort-noise 268 packets during silence, the first packet of a talkspurt, that is, the 269 first packet after a silence period, is distinguished by setting the 270 marker bit in the RTP data header to one. The marker bit in all 271 other packets is zero. The beginning of a talkspurt may be used to 272 adjust the playout delay to reflect changing network delays. 273 Applications without silence suppression set the bit to zero. 275 The RTP clock rate used for generating the RTP timestamp is 276 independent of the number of channels and the encoding; it equals the 277 number of sampling periods per second. For N-channel encodings, 278 each sampling period (say, 1/8000 of a second) generates N samples. 279 (This terminology is standard, but somewhat confusing, as the total 280 number of samples generated per second is then the sampling rate 281 times the channel count.) 283 If multiple audio channels are used, channels are numbered left-to- 284 right, starting at one. In RTP audio packets, information from 285 lower-numbered channels precedes that from higher-numbered channels.
286 For more than two channels, the convention of the AIFF-C 287 audio interchange format should be followed [1], using the following 288 notation:
290 l  left
291 r  right
292 c  center
293 S  surround
294 F  front
295 R  rear
297 channels  description   channel
298                         1    2    3    4    5    6
299 ________________________________________________________________
300 2         stereo        l    r
301 3                       l    r    c
302 4         quadrophonic  Fl   Fr   Rl   Rr
303 4                       l    c    r    S
304 5                       Fl   Fr   Fc   Sl   Sr
305 6                       l    lc   c    r    rc   S
307 Samples for all channels belonging to a single sampling instant must 308 be within the same packet. The interleaving of samples from different 309 channels depends on the encoding. General guidelines are given in 310 Sections 4.3 and 4.4. 312 The sampling frequency should be drawn from the set: 8000, 11025, 313 16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Older Apple 314 Macintosh computers had a native sample rate of 22254.54 Hz, which 315 can be converted to 22050 with acceptable quality by dropping 4 316 samples in a 20 ms frame.) However, most audio encodings are defined 317 for a more restricted set of sampling frequencies. Receivers should 318 be prepared to accept multi-channel audio, but may choose to only 319 play a single channel. 321 4.2 Operating Recommendations 323 The following recommendations are default operating parameters. 324 Applications should be prepared to handle other values. The ranges 325 given are meant to give guidance to application writers, allowing a 326 set of applications conforming to these guidelines to interoperate 327 without additional negotiation. These guidelines are not intended to 328 restrict operating parameters for applications that can negotiate a 329 set of interoperable parameters, e.g., through a conference control 330 protocol. 332 For packetized audio, the default packetization interval should have 333 a duration of 20 ms or one frame, whichever is longer, unless 334 otherwise noted in Table 1 (column "ms/packet").
The packetization 335 interval determines the minimum end-to-end delay; longer packets 336 introduce less header overhead but higher delay and make packet loss 337 more noticeable. For non-interactive applications such as lectures, or for 338 links with severe bandwidth constraints, a higher packetization delay 339 may be appropriate. A receiver should accept packets representing 340 between 0 and 200 ms of audio data. (For framed audio encodings, a 341 receiver should accept packets with a number of frames equal to 200 ms divided by the frame 342 duration, rounded up.) This restriction allows reasonable buffer 343 sizing for the receiver. 345 4.3 Guidelines for Sample-Based Audio Encodings 347 In sample-based encodings, each audio sample is represented by a 348 fixed number of bits. Within the compressed audio data, codes for 349 individual samples may span octet boundaries. An RTP audio packet may 350 contain any number of audio samples, subject to the constraint that 351 the number of bits per sample times the number of samples per packet 352 yields an integral octet count. Fractional encodings produce less 353 than one octet per sample. 355 The duration of an audio packet is determined by the number of 356 samples in the packet. 358 For sample-based encodings producing one or more octets per sample, 359 samples from different channels sampled at the same sampling instant 360 are packed in consecutive octets. For example, for a two-channel 361 encoding, the octet sequence is (left channel, first sample), (right 362 channel, first sample), (left channel, second sample), (right 363 channel, second sample), .... For multi-octet encodings, octets are 364 transmitted in network byte order (i.e., most significant octet 365 first). 367 The packing of sample-based encodings producing less than one octet 368 per sample is encoding-specific.
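For an encoding that produces two octets per sample, such as L16, the two-channel interleaving and octet order described above can be sketched in C as follows. The function name pack_l16_stereo and its argument layout are illustrative assumptions and are not defined by this profile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave n sampling instants of two-channel 16-bit linear audio
 * into an RTP payload buffer: for each instant the left sample comes
 * first, then the right sample, each with its most significant octet
 * first (network byte order). The buffer must hold at least 4 * n
 * octets. Returns the number of octets written. */
size_t pack_l16_stereo(const int16_t *left, const int16_t *right,
                       size_t n, uint8_t *out)
{
    assert(left != NULL && right != NULL && out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t l = (uint16_t)left[i];
        uint16_t r = (uint16_t)right[i];
        out[k++] = (uint8_t)(l >> 8);    /* left channel, high octet  */
        out[k++] = (uint8_t)(l & 0xff);  /* left channel, low octet   */
        out[k++] = (uint8_t)(r >> 8);    /* right channel, high octet */
        out[k++] = (uint8_t)(r & 0xff);  /* right channel, low octet  */
    }
    return k;
}
```

Because both channels of a sampling instant are written together, a packet built this way always satisfies the rule that samples for all channels belonging to a single sampling instant are within the same packet.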
370 4.4 Guidelines for Frame-Based Audio Encodings 372 Frame-based encodings encode a fixed-length block of audio into 373 another block of compressed data, typically also of fixed length. For 374 frame-based encodings, the sender may choose to combine several such 375 frames into a single RTP packet. The receiver can tell the number of 376 frames contained in an RTP packet since the audio frame duration (in 377 octets) is defined as part of the encoding, as long as all frames 378 have the same length measured in octets. This does not work when 379 carrying frames of different sizes unless the frame sizes are 380 relatively prime. 382 For frame-based codecs, the channel order is defined for the whole 383 block. That is, for two-channel audio, right and left samples are 384 coded independently, with the encoded frame for the left channel 385 preceding that for the right channel. 387 All frame-oriented audio codecs should be able to encode and decode 388 several consecutive frames within a single packet. Since the frame 389 size for the frame-oriented codecs is given, there is no need to use 390 a separate designation for the same encoding, but with different 391 number of frames per packet. 393 RTP packets shall contain a whole number of frames, with frames 394 inserted according to age within a packet, so that the oldest frame 395 (to be played first) occurs immediately after the RTP packet header. 396 The RTP timestamp reflects the capturing time of the first sample in 397 the first frame, that is, the oldest information in the packet. 399 4.5 Audio Encodings 401 The characteristics of standard audio encodings are shown in Table 1; 402 those assigned static payload types are listed in Table 3. While most 403 audio codecs are only specified for a fixed sampling rate, some 404 sample-based algorithms (indicated by an entry of "var." in the 405 sampling rate column of Table 1) may be used with different sampling 406 rates, resulting in different coded bit rates. 
Non-RTP means MUST 407 indicate the appropriate sampling rate. 409 4.5.1 1016 411 Encoding 1016 is a frame-based encoding using code-excited linear 412 prediction (CELP) and is specified in Federal Standard FED-STD 1016 413 [2,3,4,5]. 415 4.5.2 CN 417 The CN (comfort noise) packet contains a single-octet message to the 418 receiver to play comfort noise at the absolute level specified. This 419 message would normally be sent once at the beginning of a silence 420 period (which also indicates the transition from speech to silence), 421 but the rate of noise level updates is implementation specific. The 422 magnitude of the noise level is packed into the least significant 423 bits of the noise-level payload, as shown below.
425 name of                              sampling            default
426 encoding  sample/frame  bits/sample  rate      ms/frame  ms/packet
427 ____________________________________________________________________________
428 1016      frame         N/A          8,000     30        30
429 CN        frame         N/A          var.
430 DVI4      sample        4            var.                20
431 G722      sample        8            16,000              20
432 G723      frame         N/A          8,000     30        30
433 G726-16   sample        2            8,000               20
434 G726-24   sample        3            8,000               20
435 G726-32   sample        4            8,000               20
436 G726-40   sample        5            8,000               20
437 G727-16   sample        2            8,000               20
438 G727-24   sample        3            8,000               20
439 G727-32   sample        4            8,000               20
440 G727-40   sample        5            8,000               20
441 G728      frame         N/A          8,000     2.5       20
442 G729      frame         N/A          8,000     10        20
443 GSM       frame         N/A          8,000     20        20
444 L8        sample        8            var.                20
445 L16       sample        16           var.                20
446 LPC       frame         N/A          8,000     20        20
447 MPA       frame         N/A          var.                20
448 PCMA      sample        8            var.                20
449 PCMU      sample        8            var.                20
450 SX7300P   frame         N/A          8,000     15        30
451 SX8300P   frame         N/A          8,000     15        30
452 VDVI      sample        var.         var.                20
454 Table 1: Properties of Audio Encodings (N/A: not applicable; var.: 455 variable) 457 The noise level is expressed in -dBov, with values from 0 to 127 representing 0 to -127 dBov. 458 dBov is the level relative to the overload of the system.
(Note: 459 Representation relative to the overload point of a system is 460 particularly useful for digital implementations, since one does not 461 need to know the relative calibration of the analog circuitry.) 462 Example: In a 16-bit linear PCM system (L16), a signal with 0 dBov 463 represents a square wave with the maximum possible amplitude (+/- 464 32767). -63 dBov corresponds to -58 dBm0 in a standard telephone 465 system. (dBm is the power level in decibels relative to 1 mW, with an 466 impedance of 600 Ohms.)
467  0 1 2 3 4 5 6 7
468 +-+-+-+-+-+-+-+-+
469 |0|   level     |
470 +-+-+-+-+-+-+-+-+
472 The RTP header for the comfort noise packet should be constructed as 473 if the comfort noise were an independent codec. Thus, the RTP 474 timestamp designates the beginning of the silence period. A static 475 payload type is assigned for a sampling rate of 8,000 Hz; if other 476 sampling rates are needed, they should be defined through dynamic 477 payload types. The RTP packet should not have the marker bit set. 479 The CN payload type is primarily for use with L16, DVI4, PCMA, PCMU 480 and other audio codecs that do not support comfort noise as part of 481 the codec itself. G.723.1 and G.729 have their own comfort noise 482 systems as part of Annexes A (G.723.1) and B (G.729), respectively. 484 4.5.3 DVI4 486 DVI4 is specified, with pseudo-code, in [6] as the IMA ADPCM wave 487 type. 489 However, the encoding defined here as DVI4 differs in three respects 490 from this recommendation:
492 o The RTP DVI4 header contains the predicted value rather than 493 the first sample value contained in the IMA ADPCM block header.
495 o IMA ADPCM blocks contain an odd number of samples, since the 496 first sample of a block is contained just in the header 497 (uncompressed), followed by an even number of compressed 498 samples. DVI4 has an even number of compressed samples only, 499 using the 'predict' word from the header to decode the first 500 sample.
502 o For DVI4, the 4-bit samples are packed with the first sample in 503 the four most significant bits and the second sample in the 504 four least significant bits. In the IMA ADPCM codec, the 505 samples are packed in little-endian order.
507 Each packet contains a single DVI block. This profile only defines 508 the 4-bit-per-sample version, while IMA also specifies a 3-bit-per- 509 sample encoding. 511 The "header" word for each channel has the following structure:
513 int16   predict;   /* predicted value of first sample
514                       from the previous block (L16 format) */
515 u_int8  index;     /* current index into stepsize table */
516 u_int8  reserved;  /* set to zero by sender, ignored by receiver */
518 Each octet following the header contains two 4-bit samples, thus the 519 number of samples per packet must be even. 521 Packing of samples for multiple channels is for further study. 523 The document IMA Recommended Practices for Enhancing Digital Audio 524 Compatibility in Multimedia Systems (version 3.0) contains the 525 algorithm description. It is available from
527 Interactive Multimedia Association
528 48 Maryland Avenue, Suite 202
529 Annapolis, MD 21401-8011
530 USA
531 phone: +1 410 626-1380
533 4.5.4 G722 535 G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding 536 within 64 kbit/s". 538 4.5.5 G723 540 G.723.1 is specified in ITU-T Recommendation G.723.1, "Dual-rate speech 541 coder for multimedia communications transmitting at 5.3 and 6.3 542 kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T as 543 a mandatory codec for ITU-T H.324 GSTN videophone terminal 544 applications. The algorithm has a floating point specification in 545 Annex B to G.723.1, a silence compression algorithm in Annex A to 546 G.723.1 and an encoded signal bit-error sensitivity specification in 547 G.723.1 Annex C.
549  This Recommendation specifies a coded representation that can be used
550  for compressing the speech signal component of multi-media services
551  at a very low bit rate. Audio is encoded in 30 ms frames, with an
552  additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be
553  one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
554  frame), or 4 octets. These 4-octet frames are called SID frames
555  (Silence Insertion Descriptor) and are used to specify comfort noise
556  parameters. There is no restriction on how 4, 20, and 24 octet frames
557  are intermixed. The least significant two bits of the first octet in
558  the frame determine the frame size and codec type:

560  bits  content                      octets/frame
561  00    high-rate speech (6.3 kb/s)  24
562  01    low-rate speech (5.3 kb/s)   20
563  10    SID frame                    4
564  11    reserved

566  It is possible to switch between the two rates at any 30 ms frame
567  boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
568  the encoder and decoder. This coder was optimized to represent speech
569  with near-toll quality at the above rates using a limited amount of
570  complexity.

572  All the bits of the encoded bit stream are always transmitted from
573  the least significant bit towards the most significant bit.

575  4.5.6 G726-16, G726-24, G726-32, G726-40

577  ITU-T Recommendation G.726 describes, among others, the algorithm
578  recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
579  channel encoded at 8000 samples/sec to and from a 32 kbit/s channel.
580  The conversion is applied to the PCM stream using an Adaptive
581  Differential Pulse Code Modulation (ADPCM) transcoding technique.
582  G.726 describes codecs operating at 16 kb/s (2 bits/sample), 24 kb/s
583  (3 bits/sample), 32 kb/s (4 bits/sample), 40 kb/s (5 bits/sample).
584  These encodings are labeled G726-16, G726-24, G726-32 and G726-40,
585  respectively.
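As an illustration of the G.726 rates just listed, the payload size for a given packetization interval follows from the code-word width: 8000 samples/s times the interval gives the sample count, and each sample contributes 2 to 5 bits. The C sketch below is not part of the profile. The function name g726_payload_octets and the rounding up to a whole octet are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Octets needed for duration_ms of G.726 audio at 8000 samples/s.
 * bits_per_sample is 2, 3, 4 or 5 (G726-16, -24, -32, -40).
 * Returns 0 for any other width. The name and the round-up rule
 * are illustrative assumptions, not taken from the profile.
 */
size_t g726_payload_octets(unsigned bits_per_sample, unsigned duration_ms)
{
    size_t samples, bits;

    if (bits_per_sample < 2 || bits_per_sample > 5)
        return 0;
    samples = (size_t)duration_ms * 8;   /* 8000 samples/s = 8 per ms */
    bits = samples * bits_per_sample;
    return (bits + 7) / 8;               /* round up to a whole octet */
}
```

For a 20 ms interval (160 samples) this gives 40, 60, 80 and 100 octets for G726-16 through G726-40, respectively.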
587  Note: In 1990, ITU-T Recommendation G.721 was merged with
588  Recommendation G.723 into ITU-T Recommendation G.726. Thus, G726-32
589  designates the same algorithm as G721 in RFC 1890.

591  No header information shall be included as part of the audio data.
592  The 4-bit code words of the G726-32 encoding MUST be packed into
593  octets as follows: the first code word is placed in the four least
594  significant bits of the first octet, with the least significant bit
595  of the code word in the least significant bit of the octet; the
596  second code word is placed in the four most significant bits of the
597  first octet, with the most significant bit of the code word in the
598  most significant bit of the octet. Subsequent pairs of the code words
599  shall be packed in the same way into successive octets, with the
600  first code word of each pair placed in the least significant four
601  bits of the octet. It is preferred that the voice sample be extended
602  with silence such that the encoded value comprises an even number of
603  code words. [TBD: Shouldn't we just require an even number of
604  samples?]

606  4.5.7 G727-16, G727-24, G727-32, G727-40

608  ITU-T Recommendation G.727, "5-, 4-, 3- and 2-bit/sample embedded
609  adaptive differential pulse code modulation (ADPCM)", specifies an
610  embedded ADPCM algorithm which has the intrinsic capability of
611  dropping bits in the encoded words to alleviate network congestion
612  conditions. The algorithm, although not bitstream compatible with
613  G.726, was based on and has a structure similar to the G.726 ADPCM
614  algorithm.

616  4.5.8 G728

618  G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
619  16 kbit/s using low-delay code excited linear prediction".

621  A G.728 encoder translates 5 consecutive audio samples into a 10-bit
622  codebook index, resulting in a bit rate of 16 kb/s for audio sampled
623  at 8,000 samples per second. The group of five consecutive samples is
624  called a vector.
Four consecutive vectors, labeled V1 to V4 (where V1 625 is to be played first by the receiver), build one G.728 frame. The 626 four vectors of 40 bits are packed into 5 octets, labeled B1 through 627 B5. B1 shall be placed first in the RTP packet. 629 Referring to the figure below, the principle for bit order is 630 "maintenance of bit significance". Bits from an older vector are more 631 significant than bits from newer vectors. The MSB of the frame goes 632 to the MSB of B1 and the LSB of the frame goes to LSB of B5. For 633 example: octet B1 contains the eight most significant bits of vector 634 V1, the MSB of V1 is MSB of B1. 636 1 2 3 3 637 0 0 0 0 9 638 ++++++++++++++++++++++++++++++++++++++++ 639 <---V1---><---V2---><---V3---><---V4---> vectors 640 <--B1--><--B2--><--B3--><--B4--><--B5--> octets 641 <------------- frame 1 ----------------> 642 In particular, B1 contains the eight most significant bits of V1, 643 with the MSB of V1 being the MSB of B1. B2 contains the two least 644 significant bits of V1, the more significant of the two in its MSB, 645 and the six most significant bits of V2. B1 shall be placed first in 646 the RTP packet and B5 last. 648 4.5.9 G729 650 G729 is specified in ITU-T Recommendation G.729, "Coding of speech at 651 8 kbit/s using conjugate structure-algebraic code excited linear 652 prediction (CS-ACELP)". A complexity-reduced version of the G.729 653 algorithm is specified in Annex A to Rec. G.729. The speech coding 654 algorithms in the main body of G.729 and in G.729 Annex A are fully 655 interoperable with each other, so there is no need to further 656 distinguish between them. The G.729 and G.729 Annex A codecs were 657 optimized to represent speech with high quality, where G.729 Annex A 658 trades some speech quality for an approximate 50% complexity 659 reduction [7]. 
661  A voice activity detector (VAD) and comfort noise generator (CNG)
662  algorithm in Annex B of G.729 is recommended for digital simultaneous
663  voice and data applications and can be used in conjunction with G.729
664  or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
665  while the G.729 Annex B comfort noise frame occupies 2 octets:

667   0                   1
668   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
669  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
670  |L|  LSF1   |  LSF2 |   GAIN  |R|
671  |S|         |       |         |E|
672  |F|0 1 2 3 4|0 1 2 3|0 1 2 3 4|S|
673  |0|         |       |         |V|    RESV = Reserved (zero)
674  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

676  An RTP packet may consist of zero or more G.729 or G.729 Annex A
677  frames, followed by zero or one G.729 Annex B payloads. The presence
678  of a comfort noise frame can be deduced from the length of the RTP
679  payload.

681  A floating-point version of G.729, G.729 Annex A, and G.729 Annex
682  B will be available shortly as Annex C to Recommendation G.729.

684  The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
685  of 80 bits, are defined in Recommendation G.729, Table 8/G.729.

687  The mapping of these parameters is given below. Bits are numbered
688  in Internet order, that is, the most significant bit is bit 0.
690 0 1 2 3 691 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 692 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 693 |L| L1 | L2 | L3 | P1 |P| C1 | 694 |0| | | | |0| | 695 | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4| 696 | | | | | | | | 697 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 699 4 5 6 700 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 701 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 702 | C1 | S1 | GA1 | GB1 | P2 | C2 | 703 | | | | | | | 704 |5 6 7 8 9 1 1 1|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7| 705 | 0 1 2| | | | | | 706 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 708 7 709 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 710 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 711 | C2 | S2 | GA2 | GB2 | 712 | | | | | 713 |8 9 1 1 1|0 1 2 3|0 1 2|0 1 2 3| 714 | 0 1 2| | | | 715 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 717 The encoding name "G729B" is assigned for the case when a particular 718 RTP payload type is to contain G.729 Annex B comfort noise packets 719 only. This may be necessary if the underlying RTP mechanism has no 720 means of distinguishing talkspurt from comfort-noise packets. 722 4.5.10 GSM 724 GSM (group speciale mobile) denotes the European GSM 06.10 725 provisional standard for full-rate speech transcoding, prI-ETS 300 726 036, which is based on RPE/LTP (residual pulse excitation/long term 727 prediction) coding at a rate of 13 kb/s [8,9,10]. The text of the 728 standard can be obtained from 730 ETSI (European Telecommunications Standards Institute) 731 ETSI Secretariat: B.P.152 732 F-06561 Valbonne Cedex 733 France 734 Phone: +33 92 94 42 00 735 Fax: +33 93 65 47 16 737 Blocks of 160 audio samples are compressed into 33 octets, for an 738 effective data rate of 13,200 b/s. 
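The 13,200 b/s figure can be checked directly: a 160-sample block at 8,000 Hz spans 20 ms, so 50 frames of 33 octets (264 bits) are sent per second. The C sketch below generalizes this check to any frame-based encoding. The function name frame_bit_rate is hypothetical, and the result is truncated to an integer.

```c
#include <stdint.h>

/*
 * Bit rate in b/s of a frame-based encoding that emits
 * octets_per_frame octets for every samples_per_frame input samples
 * at sample_rate_hz. Returns 0 if samples_per_frame is 0.
 * Illustrative helper; the name is not defined by the profile.
 */
uint32_t frame_bit_rate(uint32_t octets_per_frame,
                        uint32_t samples_per_frame,
                        uint32_t sample_rate_hz)
{
    uint64_t bits_times_rate;

    if (samples_per_frame == 0)
        return 0;
    /* 64-bit intermediate avoids overflow before the division. */
    bits_times_rate = (uint64_t)octets_per_frame * 8u * sample_rate_hz;
    return (uint32_t)(bits_times_rate / samples_per_frame);
}
```

For GSM, frame_bit_rate(33, 160, 8000) gives 264 * 8000 / 160 = 13,200. Of the 264 bits, 260 carry codec parameters (the sum of the field widths in Table 2, that is, the nominal 13 kb/s) and 4 are the signature described in Section 4.5.10.1.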
740 4.5.10.1 General Packaging Issues 742 The GSM standard specifies the bit stream produced by the codec, but 743 does not specify how these bits should be packed for transmission. 744 Some software implementations of the GSM codec use a different 745 packing than that specified here. 747 In the GSM encoding used by RTP, the bits are packed beginning from 748 the most significant bit. Every 160 sample GSM frame is coded into 749 one 33 octet (264 bit) buffer. Every such buffer begins with a 4 bit 750 signature (0xD), followed by the MSB encoding of the fields of the 751 frame. The first octet thus contains 1101 in the 4 most significant 752 bits (0-3) and the 4 most significant bits of F1 (0-3) in the 4 least 753 significant bits (4-7). The second octet contains the 2 least 754 significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so on. 755 The order of the fields in the frame is as follows: 757 4.5.10.2 GSM variable names and numbers 759 So if F.i signifies the ith bit of the field F, and bit 0 is the most 760 significant bit, and the bits of every octet are numbered from 0 to 7 761 from most to least significant, then in the RTP encoding we have: 763 4.5.11 L8 765 L8 denotes linear audio data, using 8-bits of precision with an 766 offset of 128, that is, the most negative signal is encoded as zero. 
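The offset-128 representation of L8 can be sketched in C as follows. This is an illustration only. The function names l8_encode and l8_decode, and the choice of int8_t for the host-side signed sample, are assumptions and not part of the profile.

```c
#include <stdint.h>

/* Signed sample in [-128, 127] to L8 octet: -128 maps to 0, 0 to 128. */
uint8_t l8_encode(int8_t sample)
{
    return (uint8_t)(sample + 128);   /* computed as int, range 0..255 */
}

/* L8 octet back to a signed sample: 0 maps to -128, 255 to 127. */
int8_t l8_decode(uint8_t octet)
{
    return (int8_t)(octet - 128);     /* computed as int, range -128..127 */
}
```

The most negative signal (-128) thus encodes as zero, silence as 128, and the most positive signal (127) as 255.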
768   field  field name  bits   field  field name  bits
769   _______________________________________________
770   1      LARc[0]     6      39     xmc[22]     3
771   2      LARc[1]     6      40     xmc[23]     3
772   3      LARc[2]     5      41     xmc[24]     3
773   4      LARc[3]     5      42     xmc[25]     3
774   5      LARc[4]     4      43     Nc[2]       7
775   6      LARc[5]     4      44     bc[2]       2
776   7      LARc[6]     3      45     Mc[2]       2
777   8      LARc[7]     3      46     xmaxc[2]    6
778   9      Nc[0]       7      47     xmc[26]     3
779   10     bc[0]       2      48     xmc[27]     3
780   11     Mc[0]       2      49     xmc[28]     3
781   12     xmaxc[0]    6      50     xmc[29]     3
782   13     xmc[0]      3      51     xmc[30]     3
783   14     xmc[1]      3      52     xmc[31]     3
784   15     xmc[2]      3      53     xmc[32]     3
785   16     xmc[3]      3      54     xmc[33]     3
786   17     xmc[4]      3      55     xmc[34]     3
787   18     xmc[5]      3      56     xmc[35]     3
788   19     xmc[6]      3      57     xmc[36]     3
789   20     xmc[7]      3      58     xmc[37]     3
790   21     xmc[8]      3      59     xmc[38]     3
791   22     xmc[9]      3      60     Nc[3]       7
792   23     xmc[10]     3      61     bc[3]       2
793   24     xmc[11]     3      62     Mc[3]       2
794   25     xmc[12]     3      63     xmaxc[3]    6
795   26     Nc[1]       7      64     xmc[39]     3
796   27     bc[1]       2      65     xmc[40]     3
797   28     Mc[1]       2      66     xmc[41]     3
798   29     xmaxc[1]    6      67     xmc[42]     3
799   30     xmc[13]     3      68     xmc[43]     3
800   31     xmc[14]     3      69     xmc[44]     3
801   32     xmc[15]     3      70     xmc[45]     3
802   33     xmc[16]     3      71     xmc[46]     3
803   34     xmc[17]     3      72     xmc[47]     3
804   35     xmc[18]     3      73     xmc[48]     3
805   36     xmc[19]     3      74     xmc[49]     3
806   37     xmc[20]     3      75     xmc[50]     3
807   38     xmc[21]     3      76     xmc[51]     3

809   Table 2: Ordering of GSM variables

815   Octet  Bit 0    Bit 1    Bit 2    Bit 3    Bit 4    Bit 5    Bit 6    Bit 7
816   _____________________________________________________________________________
817   0      1        1        0        1        LARc0.0  LARc0.1  LARc0.2  LARc0.3
818   1      LARc0.4  LARc0.5  LARc1.0  LARc1.1  LARc1.2  LARc1.3  LARc1.4  LARc1.5
819   2      LARc2.0  LARc2.1  LARc2.2  LARc2.3  LARc2.4  LARc3.0  LARc3.1  LARc3.2
820   3      LARc3.3  LARc3.4  LARc4.0  LARc4.1  LARc4.2  LARc4.3  LARc5.0  LARc5.1
822   4      LARc5.2  LARc5.3  LARc6.0  LARc6.1  LARc6.2  LARc7.0  LARc7.1  LARc7.2
823   5      Nc0.0    Nc0.1    Nc0.2    Nc0.3    Nc0.4    Nc0.5    Nc0.6    bc0.0
824   6      bc0.1    Mc0.0    Mc0.1    xmaxc00  xmaxc01  xmaxc02  xmaxc03  xmaxc04
825   7      xmaxc05  xmc0.0   xmc0.1   xmc0.2   xmc1.0   xmc1.1   xmc1.2   xmc2.0
826   8      xmc2.1   xmc2.2   xmc3.0   xmc3.1   xmc3.2   xmc4.0   xmc4.1   xmc4.2
827   9      xmc5.0   xmc5.1   xmc5.2   xmc6.0   xmc6.1   xmc6.2   xmc7.0   xmc7.1
828   10     xmc7.2   xmc8.0   xmc8.1   xmc8.2   xmc9.0   xmc9.1   xmc9.2   xmc10.0
829   11     xmc10.1  xmc10.2  xmc11.0  xmc11.1  xmc11.2  xmc12.0  xmc12.1  xmc12.2
830   12     Nc1.0    Nc1.1    Nc1.2    Nc1.3    Nc1.4    Nc1.5    Nc1.6    bc1.0
831   13     bc1.1    Mc1.0    Mc1.1    xmaxc10  xmaxc11  xmaxc12  xmaxc13  xmaxc14
832   14     xmaxc15  xmc13.0  xmc13.1  xmc13.2  xmc14.0  xmc14.1  xmc14.2  xmc15.0
833   15     xmc15.1  xmc15.2  xmc16.0  xmc16.1  xmc16.2  xmc17.0  xmc17.1  xmc17.2
834   16     xmc18.0  xmc18.1  xmc18.2  xmc19.0  xmc19.1  xmc19.2  xmc20.0  xmc20.1
835   17     xmc20.2  xmc21.0  xmc21.1  xmc21.2  xmc22.0  xmc22.1  xmc22.2  xmc23.0
836   18     xmc23.1  xmc23.2  xmc24.0  xmc24.1  xmc24.2  xmc25.0  xmc25.1  xmc25.2
837   19     Nc2.0    Nc2.1    Nc2.2    Nc2.3    Nc2.4    Nc2.5    Nc2.6    bc2.0
838   20     bc2.1    Mc2.0    Mc2.1    xmaxc20  xmaxc21  xmaxc22  xmaxc23  xmaxc24
839   21     xmaxc25  xmc26.0  xmc26.1  xmc26.2  xmc27.0  xmc27.1  xmc27.2  xmc28.0
840   22     xmc28.1  xmc28.2  xmc29.0  xmc29.1  xmc29.2  xmc30.0  xmc30.1  xmc30.2
841   23     xmc31.0  xmc31.1  xmc31.2  xmc32.0  xmc32.1  xmc32.2  xmc33.0  xmc33.1
842   24     xmc33.2  xmc34.0  xmc34.1  xmc34.2  xmc35.0  xmc35.1  xmc35.2  xmc36.0
843   25     xmc36.1  xmc36.2  xmc37.0  xmc37.1  xmc37.2  xmc38.0  xmc38.1  xmc38.2
844   26     Nc3.0    Nc3.1    Nc3.2    Nc3.3    Nc3.4    Nc3.5    Nc3.6    bc3.0
845   27     bc3.1    Mc3.0    Mc3.1    xmaxc30  xmaxc31  xmaxc32  xmaxc33  xmaxc34
846   28     xmaxc35  xmc39.0  xmc39.1  xmc39.2  xmc40.0  xmc40.1  xmc40.2  xmc41.0
847   29     xmc41.1  xmc41.2  xmc42.0  xmc42.1  xmc42.2  xmc43.0  xmc43.1  xmc43.2
848   30     xmc44.0  xmc44.1  xmc44.2  xmc45.0  xmc45.1  xmc45.2  xmc46.0  xmc46.1
849   31     xmc46.2  xmc47.0  xmc47.1  xmc47.2  xmc48.0  xmc48.1  xmc48.2  xmc49.0
850   32     xmc49.1  xmc49.2  xmc50.0  xmc50.1  xmc50.2  xmc51.0  xmc51.1  xmc51.2

811   4.5.12 L16

813   L16 denotes uncompressed audio data, using 16-bit signed
814   representation with 65535 equally divided steps between minimum and
852   maximum signal level, ranging from -32768 to 32767. The value is
853   represented in two's complement notation and network byte order.
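The L16 octet layout (two's complement, most significant octet first) can be sketched in C as follows. The function names l16_put and l16_get are hypothetical. The decoder avoids relying on implementation-defined narrowing conversions by subtracting 65536 explicitly when the sign bit is set.

```c
#include <stdint.h>

/* Store one signed 16-bit sample in network byte order. */
void l16_put(uint8_t out[2], int16_t sample)
{
    uint16_t u = (uint16_t)sample;    /* two's complement bit pattern */

    out[0] = (uint8_t)(u >> 8);       /* most significant octet first */
    out[1] = (uint8_t)(u & 0xFF);
}

/* Read one signed 16-bit sample stored in network byte order. */
int16_t l16_get(const uint8_t in[2])
{
    int32_t v = ((int32_t)in[0] << 8) | in[1];   /* 0..65535 */

    if (v & 0x8000)
        v -= 65536;                   /* sign bit set: negative value */
    return (int16_t)v;
}
```

With these helpers, -2 is carried as the octets 0xFF 0xFE and -32768 as 0x80 0x00.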
855  4.5.13 LPC

857  LPC designates an experimental linear predictive encoding contributed
858  by Ron Frederick, Xerox PARC, which is based on an implementation
859  written by Ron Zuckerman, Motorola, posted to the Usenet group
860  comp.dsp on June 26, 1992. The codec generates 14 octets for every
861  frame. The frame size is set to 20 ms, resulting in a bit rate of
862  5,600 b/s.

864  4.5.14 MPA

866  MPA denotes MPEG-I or MPEG-II audio encapsulated as elementary
867  streams. The encoding is defined in ISO standards ISO/IEC 11172-3
868  and 13818-3. The encapsulation is specified in RFC 2038 [11].

870  Sampling rate and channel count are contained in the payload. MPEG-I
871  audio supports sampling rates of 32000, 44100, and 48000 Hz (ISO/IEC
872  11172-3, section 1.1; "Scope"). MPEG-II additionally supports ISO/IEC
873  11172-3 Audio. "TBD"). [Something missing here.]

875  4.5.15 PCMA and PCMU

877  PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio data
878  is encoded as eight bits per sample, after logarithmic scaling. PCMU
879  denotes mu-law scaling, PCMA A-law scaling. A detailed description is
880  given by Jayant and Noll [12]. Each G.711 octet shall be octet-
881  aligned in an RTP packet. The sign bit of each G.711 octet shall
882  correspond to the most significant bit of the octet in the RTP packet
883  (i.e., assuming the G.711 samples are handled as octets on the host
884  machine, the sign bit shall be the most significant bit of the octet
885  as defined by the host machine format). The 56 kb/s and 48 kb/s modes
886  of G.711 are not applicable to RTP, since G.711 shall always be
887  transmitted as 8-bit samples.

889  4.5.16 RED

891  The redundant audio payload format "RED" is specified by RFC XXX. It
892  defines a means by which multiple redundant copies of an audio packet
893  may be transmitted in a single RTP stream.
Each packet in such a 894 stream contains, in addition to the audio data for that packetization 895 interval, a (more heavily compressed) copy of the data from the 896 previous packetization interval. This allows an approximation of the 897 data from lost packets to be recovered upon decoding of the following 898 packet, giving much improved sound quality when compared with silence 899 substitution for lost packets. 901 4.5.17 SX7300P 903 The SX7300P is a low-complexity CELP-based audio codec operating at a 904 sampling rate of 8000 Hz. It encodes blocks of 120 audio samples (15 905 ms) into an encoded frame of 14 octets, yielding an encoded bit rate 906 of approximately 7467 b/s. 908 4.5.18 SX8300P 910 The SX8300P is a low-complexity CELP-based audio codec operating at a 911 sampling rate of 8000 Hz. It encodes blocks of 120 audio samples (15 912 ms) into an encoded frame of 16 octets, yielding an encoded bit rate 913 of approximately 8533 b/s. 915 4.5.19 VDVI 917 VDVI is a variable-rate version of DVI4, yielding speech bit rates of 918 between 10 and 25 kb/s. It is specified for single-channel operation 919 only. Samples are packed into octets starting at the most-significant 920 bit. 922 It uses the following encoding: 924 DVI4 codeword VDVI bit pattern 925 _________________________________ 926 0 00 927 1 010 928 2 1100 929 3 11100 930 4 111100 931 5 1111100 932 6 11111100 933 7 11111110 934 8 10 935 9 011 936 10 1101 937 11 11101 938 12 111101 939 13 1111101 940 14 11111101 941 15 11111111 943 5 Video 945 The following video encodings are currently defined, with their 946 abbreviated names used for identification: 948 5.1 CelB 950 The CELL-B encoding is a proprietary encoding proposed by Sun 951 Microsystems. The byte stream format is described in RFC 2029 [13]. 953 5.2 JPEG 955 The encoding is specified in ISO Standards 10918-1 and 10918-2. The 956 RTP payload format is as specified in RFC 2035 [14]. 
958 5.3 H261 960 The encoding is specified in ITU-T Recommendation H.261, "Video codec 961 for audiovisual services at p x 64 kbit/s". The packetization and 962 RTP-specific properties are described in RFC 2032 [15]. 964 5.4 H263 966 The encoding is specified in ITU-T Recommendation H.263, "Video 967 coding for low bit rate communication". The packetization and RTP- 968 specific properties are described in [16]. 970 5.5 MPV 972 MPV designates the use MPEG-I and MPEG-II video encoding elementary 973 streams as specified in ISO Standards ISO/IEC 11172 and 13818-2, 974 respectively. The RTP payload format is as specified in RFC 2038 975 [11], Section 3. 977 5.6 MP2T 979 MP2T designates the use of MPEG-II transport streams, for either 980 audio or video. The encapsulation is described in RFC 2038 [11], 981 Section 2. See the description of the MPA audio encoding for contact 982 information. 984 5.7 nv 986 The encoding is implemented in the program 'nv', version 4, developed 987 at Xerox PARC by Ron Frederick. Further information is available from 988 the author: 990 Ron Frederick 991 Xerox Palo Alto Research Center 992 3333 Coyote Hill Road 993 Palo Alto, CA 94304 994 United States 995 electronic mail: frederic@parc.xerox.com 997 6 Payload Type Definitions 999 Table 3 defines this profile's static payload type values for the PT 1000 field of the RTP data header. A new RTP payload format specification 1001 may be registered with the IANA by name, and may also be assigned a 1002 static payload type value from the range marked in Section 3. 1004 In addition, payload type values in the range 96--127 may be defined 1005 dynamically through a conference control protocol, which is beyond 1006 the scope of this document. For example, a session directory could 1007 specify that for a given session, payload type 96 indicates PCMU 1008 encoding, 8,000 Hz sampling rate, 2 channels. 
The payload type range
1009  marked 'reserved' has been set aside so that RTCP and RTP packets can
1010  be reliably distinguished (see Section "Summary of Protocol
1011  Constants" of the RTP protocol specification).

1013  An RTP source emits a single RTP payload type at any given instant.
1014  The interleaving or multiplexing of several RTP media types within a
1015  single RTP session is not allowed, but multiple RTP sessions may be
1016  used in parallel to send multiple media types. An RTP source may
1017  change payload types during a session.

1019  The payload types currently defined in this profile are assigned to
1020  exactly one of three categories or media types: audio only, video
1021  only and those combining audio and video. A single RTP session
1022  consists of payload types of one and only one media type.

1024  Session participants agree through mechanisms beyond the scope of
1025  this specification on the set of payload types allowed in a given
1026  session. This set may, for example, be defined by the capabilities
1027  of the applications used, negotiated by a conference control protocol
1028  or established by agreement between the human participants. The media
1029  types in Table 3 are marked as "A" for audio, "V" for video and "AV"
1030  for combined audio/video streams.

1032  Audio applications operating under this profile should, at minimum,
1033  be able to send and receive payload types 0 (PCMU) and 5 (DVI4). This
1034  allows interoperability without format negotiation and successful
1035  negotiation with a conference control protocol.

1037  All current video encodings use a timestamp frequency of 90,000 Hz,
1038  the same as the MPEG presentation time stamp frequency. This
1039  frequency yields exact integer timestamp increments for the typical
1040  24 (HDTV), 25 (PAL), 29.97 (NTSC) and 30 Hz (HDTV) frame rates
1041  and 50, 59.94 and 60 Hz field rates.
While 90 kHz is the recommended
1042  rate for future video encodings used within this profile, other rates
1043  are possible. However, it is not sufficient to use the video frame
1044  rate (typically between 15 and 30 Hz) because that does not provide
1045  adequate resolution for typical synchronization requirements when
1046  calculating the RTP timestamp corresponding to the NTP timestamp in
1047  an RTCP SR packet. The timestamp resolution must also be sufficient
1048  for the jitter estimate contained in the receiver reports.

1050  The standard video encodings and their payload types are listed in
1051  Table 3.

1053  PT       encoding    media type  clock rate  channels
1054           name                    (Hz)        (audio)
1055  _______________________________________________________
1056  0        PCMU        A           8000        1
1057  1        1016        A           8000        1
1058  2        G726-32     A           8000        1
1059  3        GSM         A           8000        1
1060  4        G723        A           8000        1
1061  5        DVI4        A           8000        1
1062  6        DVI4        A           16000       1
1063  7        LPC         A           8000        1
1064  8        PCMA        A           8000        1
1065  9        G722        A           16000       1
1066  10       L16         A           44100       2
1067  11       L16         A           44100       1
1068  12       unassigned  A
1069  13       unassigned  A
1070  14       MPA         A           90000       (see text)
1071  15       G728        A           8000        1
1072  16       DVI4        A           11025       1
1073  17       DVI4        A           22050       1
1074  18       G729        A           8000        1
1075  19       CN          A           8000        1
1076  20       unassigned  A
1077  21       unassigned  A
1078  22       unassigned  A
1079  23       unassigned  A
1080  24       unassigned  V
1081  25       CelB        V           90000
1082  26       JPEG        V           90000
1083  27       unassigned  V
1084  28       nv          V           90000
1085  29       unassigned  V
1086  30       unassigned  V
1087  31       H261        V           90000
1088  32       MPV         V           90000
1089  33       MP2T        AV          90000
1090  34       H263        V           90000
1091  35-71    unassigned  ?
1092  72-76    reserved    N/A         N/A         N/A
1093  77       RED         A           N/A         N/A
1094  78-95    unassigned  ?
1095  96-127   dynamic     ?

1097  Table 3: Payload types (PT) for standard audio and video encodings

1099  7 RTP over TCP and Similar Byte Stream Protocols

1101  Under special circumstances, it may be necessary to carry RTP in
1102  protocols offering a byte stream abstraction, such as TCP, possibly
1103  multiplexed with other data.
If the application does not define its
1104  own method of delineating RTP and RTCP packets, it SHOULD prefix each
1105  packet with a two-octet length field.

1107  (Note: RTSP [17] provides its own encapsulation and does not need an
1108  extra length indication.)

1110  8 Port Assignment

1112  As specified in the RTP protocol definition, RTP data is to be
1113  carried on an even UDP or TCP port number and the corresponding RTCP
1114  packets are to be carried on the next higher (odd) port number.

1116  Applications operating under this profile may use any such UDP or TCP
1117  port pair. For example, the port pair may be allocated randomly by a
1118  session management program. A single fixed port number pair cannot be
1119  required because multiple applications using this profile are likely
1120  to run on the same host, and there are some operating systems that do
1121  not allow multiple processes to use the same UDP port with different
1122  multicast addresses.

1124  However, port numbers 5004 and 5005 have been registered for use with
1125  this profile for those applications that choose to use them as the
1126  default pair. Applications that operate under multiple profiles may
1127  use this port pair as an indication to select this profile if they
1128  are not subject to the constraint of the previous paragraph.
1129  Applications need not have a default and may require that the port
1130  pair be explicitly specified. The particular port numbers were chosen
1131  to lie in the range above 5000 to accommodate port number allocation
1132  practice within the Unix operating system, where port numbers below
1133  1024 can only be used by privileged processes and port numbers
1134  between 1024 and 5000 are automatically assigned by the operating
1135  system.

1137  9 Bibliography

1139  [1] Apple Computer, "Audio interchange file format AIFF-C," Aug.
1140  1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).
1142  [2] Office of Technology and Standards, "Telecommunications: Analog
1143  to digital conversion of radio voice by 4,800 bit/second code excited
1144  linear prediction (CELP)," Federal Standard FS-1016, GSA, Room 6654;
1145  7th & D Street SW; Washington, DC 20407 (+1-202-708-9205), 1990.

1147  [3] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The
1148  proposed Federal Standard 1016 4800 bps voice coder: CELP," Speech
1149  Technology, vol. 5, pp. 58-64, April/May 1990.

1151  [4] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The federal
1152  standard 1016 4800 bps CELP voice coder," Digital Signal Processing,
1153  vol. 1, no. 3, pp. 145-155, 1991.

1155  [5] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The DoD 4.8
1156  kbps standard (proposed federal standard 1016)," in Advances in
1157  Speech Coding (B. Atal, V. Cuperman, and A. Gersho, eds.), ch. 12,
1158  pp. 121-133, Kluwer Academic Publishers, 1991.

1160  [6] IMA Digital Audio Focus and Technical Working Groups,
1161  "Recommended practices for enhancing digital audio compatibility in
1162  multimedia systems (version 3.00)," tech. rep., Interactive
1163  Multimedia Association, Annapolis, Maryland, Oct. 1992.

1165  [7] D. Deléam and J.-P. Petit, "Real-time implementations of the
1166  recent ITU-T low bit rate speech coders on the TI TMS320C54X DSP:
1167  results, methodology, and applications," in Proc. of International
1168  Conference on Signal Processing, Technology, and Applications
1169  (ICSPAT), (Boston, Massachusetts), pp. 1656-1660, Oct. 1996.

1171  [8] M. Mouly and M.-B. Pautet, The GSM system for mobile
1172  communications. Lassay-les-Chateaux, France: Europe Media Duplication,
1173  1993.

1175  [9] J. Degener, "Digital speech compression," Dr. Dobb's Journal,
1176  Dec. 1994.

1178  [10] S. M. Redl, M. K. Weber, and M. W. Oliphant, An Introduction to
1179  GSM. Boston: Artech House, 1995.

1181  [11] D. Hoffman, G. Fernando, and V.
Goyal, "RTP payload format for 1182 MPEG1/MPEG2 video," Request for Comments (Proposed Standard) RFC 1183 2038, Internet Engineering Task Force, Oct. 1996. 1185 [12] N. S. Jayant and P. Noll, Digital Coding of Waveforms-- 1186 Principles and Applications to Speech and Video Englewood Cliffs, New 1187 Jersey: Prentice-Hall, 1984. 1189 [13] M. Speer and D. Hoffman, "RTP payload format of sun's CellB 1190 video encoding," Request for Comments (Proposed Standard) RFC 2029, 1191 Internet Engineering Task Force, Oct. 1996. 1193 [14] L. Berc, W. Fenner, R. Frederick, and S. McCanne, "RTP payload 1194 format for JPEG-compressed video," Request for Comments (Proposed 1195 Standard) RFC 2035, Internet Engineering Task Force, Oct. 1996. 1197 [15] T. Turletti and C. Huitema, "RTP payload format for H.261 video 1198 streams," Request for Comments (Proposed Standard) RFC 2032, Internet 1199 Engineering Task Force, Oct. 1996. 1201 [16] C. C. Zhu, "RTP payload format for H.263 video streams," 1202 Internet Draft, Internet Engineering Task Force, Mar. 1997. Work in 1203 progress. 1205 [17] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming 1206 protocol (RTSP)," Internet Draft, Internet Engineering Task Force, 1207 July 1997. Work in progress. 1209 10 Acknowledgements 1211 The comments and careful review of Steve Casner, Simao Campos and 1212 Richard Cox are gratefully acknowledged. The GSM description was 1213 adopted from the IMTC Voice over IP Forum Service Interoperability 1214 Implementation Agreement (January 1997). Fred Burg and Terry Lyons 1215 helped with the G.729 description. 1217 11 Address of Author 1219 Henning Schulzrinne 1220 Dept. of Computer Science 1221 Columbia University 1222 1214 Amsterdam Avenue 1223 New York, NY 10027 1224 USA 1225 electronic mail: schulzrinne@cs.columbia.edu 1227 Current Locations of Related Resources 1229 Note: Several sections below refer to the ITU-T Software Tool Library 1230 (STL). 
It is available from the ITU Sales Service, Place des Nations, 1231 CH-1211 Geneve 20, Switzerland (also check http://www.itu.int. The 1232 ITU-T STL is covered by a license defined in ITU-T Recommendation 1233 G.191, " Software tools for speech and audio coding standardization 1234 ". 1236 UTF-8 1238 Information on the UCS Transformation Format 8 (UTF-8) is available 1239 at 1240 http://www.stonehand.com/unicode/standard/utf8.html 1242 1016 1244 The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited 1245 linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C 1246 simulation source codes are available for worldwide distribution at 1247 no charge (on DOS diskettes, but configured to compile on Sun SPARC 1248 stations) from: Bob Fenichel, National Communications System, 1249 Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960. 1251 An implementation is also available at 1253 ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z 1255 DVI4 1257 An implementation is available from Jack Jansen at 1259 ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar 1261 G722 1263 An implementation of the G.722 algorithm is available as part of the 1264 ITU-T STL, described above. 1266 G723 1268 The reference C code implementation defining the G.723.1 algorithm 1269 and its Annexes A, B, and C are available as an integral part of 1270 Recommendation G.723.1 from the ITU Sales Service, address listed 1271 above. Both the algorithm and C code are covered by a specific 1272 license. The ITU-T Secretariat should be contacted to obtain such 1273 licensing information. 1275 G726-16 through G726-40 1277 G726-16 through G726-40 are specified in the ITU-T Recommendation 1278 G.726, "40, 32, 24, and 16 kb/s Adaptive Differential Pulse Code 1279 Modulation (ADPCM)". An implementation of the G.726 algorithm is 1280 available as part of the ITU-T STL, described above. 
1282  G727-16 through G727-40

1284  G727-16 through G727-40 are specified in the ITU-T Recommendation
1285  G.727, "5-, 4-, 3-, and 2-bit/sample embedded adaptive differential
1286  pulse code modulation". An implementation of the G.727 algorithm will
1287  be available in a future release of the ITU-T STL, described above.

1289  G729

1291  The reference C code implementation defining the G.729 algorithm and
1292  its Annexes A and B is available as an integral part of
1293  Recommendation G.729 from the ITU Sales Service, listed above. Both
1294  the algorithm and the C code are covered by a specific license. The
1295  contact information for obtaining the license is listed in the C
1296  code.

1298  GSM

1300  A reference implementation was written by Carsten Bormann and Jutta
1301  Degener (TU Berlin, Germany). It is available at

1303  ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/

1305  Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
1306  code implementation of the RPE-LTP algorithm available as part of the
1307  ITU-T STL. The STL implementation is an adaptation of the TU Berlin
1308  version.

1310  LPC

1312  An implementation is available at

1314  ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z

1316  PCMU, PCMA

1318  An implementation of these algorithms is available as part of the
1319  ITU-T STL, described above. Code to convert between linear and mu-law
1320  companded data is also available in [6].

1322  Table of Contents

1324  1         Introduction
1325  2         RTP and RTCP Packet Forms and Protocol Behavior
1326  3         Registering Payload Types
1327  4         Audio
1328  4.1       Encoding-Independent Rules
1329  4.2       Operating Recommendations
1330  4.3       Guidelines for Sample-Based Audio Encodings
1331  4.4       Guidelines for Frame-Based Audio Encodings
1332  4.5       Audio Encodings
1333  4.5.1     1016
1334  4.5.2     CN
1335  4.5.3     DVI4
1336  4.5.4     G722
1337  4.5.5     G723
1338  4.5.6     G726-16, G726-24, G726-32, G726-40
1339  4.5.7     G727-16, G727-24, G727-32, G727-40
1340  4.5.8     G728
1341  4.5.9     G729
1342  4.5.10    GSM
1343  4.5.10.1  General Packaging Issues
1344  4.5.10.2  GSM variable names and numbers
1345  4.5.11    L8
1346  4.5.12    L16
1347  4.5.13    LPC
1348  4.5.14    MPA
1349  4.5.15    PCMA and PCMU
1350  4.5.16    RED
1351  4.5.17    SX7300P
1352  4.5.18    SX8300P
1353  4.5.19    VDVI
1354  5         Video
1355  5.1       CelB
1356  5.2       JPEG
1357  5.3       H261
1358  5.4       H263
1359  5.5       MPV
1360  5.6       MP2T
1361  5.7       nv
1362  6         Payload Type Definitions
1363  7         RTP over TCP and Similar Byte Stream Protocols
1364  8         Port Assignment
1365  9         Bibliography
1366  10        Acknowledgements
1367  11        Address of Author