Internet Engineering Task Force                                   AVT WG
Internet Draft                                               Schulzrinne
ietf-avt-profile-new-01.txt                                  Columbia U.
July 29, 1997
Expires: January 1, 1998

    RTP Profile for Audio and Video Conferences with Minimal Control

STATUS OF THIS MEMO

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

ABSTRACT

This memo describes a profile called "RTP/AVP" for the use of the
real-time transport protocol (RTP), version 2, and the associated
control protocol, RTCP, within audio and video multiparticipant
conferences with minimal control.
It provides interpretations of generic fields within the RTP
specification suitable for audio and video conferences. In
particular, this document defines a set of default mappings from
payload type numbers to encodings.

The document also describes how audio and video data may be carried
within RTP. It defines a set of standard encodings and their names
when used within RTP. However, the encoding definitions are
independent of the particular transport mechanism used. The
descriptions provide pointers to reference implementations and the
detailed standards. This document is meant as an aid for implementors
of audio, video and other real-time multimedia applications.

Changes

This draft revises RFC 1890. It is fully backwards-compatible with
RFC 1890 and codifies existing practice. It is intended that this
draft form the basis of a new RFC to obsolete RFC 1890 as it moves to
Draft Standard.

Besides wording clarifications and filling in RFC numbers for payload
type definitions, this draft adds payload types 4, 16, 17, 18, 19 and
34. The PostScript version of this draft contains change bars marking
changes made since draft -00.

A tentative TCP encapsulation is defined.

According to Peter Hoddie of Apple, only pre-1994 Macintosh computers
used the 22254.54 Hz rate, and none used the 11127.27 Hz rate.

Note to RFC editor: This section is to be removed before publication
as an RFC. All instances of "RFC TBD" should be filled in with the
number of the RTP specification RFC submitted for Draft Standard
status.

1 Introduction

This profile defines aspects of RTP left unspecified in the RTP
Version 2 protocol definition (RFC XXXX). This profile is intended
for use within audio and video conferences with minimal session
control. In particular, no support for the negotiation of parameters
or membership control is provided.
The profile is expected to be useful in sessions where no negotiation
or membership control are used (e.g., using the static payload types
and the membership indications provided by RTCP), but this profile
may also be useful in conjunction with a higher-level control
protocol.

Use of this profile occurs by use of the appropriate applications;
there is no explicit indication by port number, protocol identifier
or the like. Applications such as session directories should refer to
this profile as "RTP/AVP".

Other profiles may make different choices for the items specified
here.

This document also defines a set of payload formats for audio.

This draft defines the term media type as dividing encodings of audio
and video content into three classes: audio, video and audio/video
(interleaved).

2 RTP and RTCP Packet Forms and Protocol Behavior

The section "RTP Profiles and Payload Format Specification" of RFC
TBD enumerates a number of items that can be specified or modified in
a profile. This section addresses these items. Generally, this
profile follows the default and/or recommended aspects of the RTP
specification.

RTP data header: The standard format of the fixed RTP data header is
     used (one marker bit).

Payload types: Static payload types are defined in Section 6.

RTP data header additions: No additional fixed fields are appended to
     the RTP data header.

RTP data header extensions: No RTP header extensions are defined, but
     applications operating under this profile may use such
     extensions. Thus, applications should not assume that the RTP
     header X bit is always zero and should be prepared to ignore the
     header extension. If a header extension is defined in the
     future, that definition must specify the contents of the first
     16 bits in such a way that multiple different extensions can be
     identified.

RTCP packet types: No additional RTCP packet types are defined by
     this profile specification.

RTCP report interval: The suggested constants are to be used for the
     RTCP report interval calculation.

SR/RR extension: No extension section is defined for the RTCP SR or
     RR packet.

SDES use: Applications may use any of the SDES items described in the
     RTP specification. While CNAME information is sent every
     reporting interval, other items should be sent only every third
     reporting interval, with NAME sent seven out of eight times
     within that slot and the remaining SDES items cyclically taking
     up the eighth slot, as defined in Section 6.2.2 of the RTP
     specification. In other words, NAME is sent in RTCP packets 1,
     4, 7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet
     22.

Security: The RTP default security services are also the default
     under this profile.

String-to-key mapping: A user-provided string ("pass phrase") is
     hashed with the MD5 algorithm to a 16-octet digest. An n-bit
     key is extracted from the digest by taking the first n bits
     from the digest. If several keys are needed with a total length
     of 128 bits or less (as for triple DES), they are extracted in
     order from that digest. The octet ordering is specified in RFC
     1423, Section 2.2. (Note that some DES implementations require
     that the 56-bit key be expanded into 8 octets by inserting an
     odd parity bit in the most significant bit of the octet to go
     with each 7 bits of the key.)

It is suggested that pass phrases be restricted to ASCII letters,
digits, the hyphen, and white space to reduce the chance of
transcription errors when conveying keys by phone, fax, telex or
email.

The pass phrase may be preceded by a specification of the encryption
algorithm. Any characters up to the first slash (ASCII 0x2f) are
taken as the name of the encryption algorithm.
The encryption format specifiers should be drawn from RFC 1423 or any
additional identifiers registered with IANA. If no slash is present,
DES-CBC is assumed as default. The encryption algorithm specifier is
case sensitive.

The pass phrase typed by the user is transformed to a canonical form
before applying the hash algorithm. For that purpose, we define white
space as return, tab, or vertical tab as well as all characters
contained in the Unicode space characters table. The transformation
consists of the following steps: (1) convert the input string to the
ISO 10646 character set, using the UTF-8 encoding as specified in
Annex P to ISO/IEC 10646-1:1993 (ASCII characters require no mapping,
but ISO 8859-1 characters do); (2) remove leading and trailing white
space characters; (3) replace one or more contiguous white space
characters by a single space (ASCII or UTF-8 0x20); (4) convert all
letters to lower case and replace sequences of characters and
non-spacing accents with a single character, where possible. A
minimum length of 16 key characters (after applying the
transformation) should be enforced by the application, while
applications must allow up to 256 characters of input.

Underlying protocol: The profile specifies the use of RTP over
     unicast and multicast UDP as well as TCP. (This does not
     preclude the use of these definitions when RTP is carried by
     other lower-layer protocols.)

Transport mapping: The standard mapping of RTP and RTCP to
     transport-level addresses is used.

Encapsulation: No encapsulation of RTP packets is specified.

3 Registering Payload Types

This profile defines a set of standard encodings and their payload
types when used within RTP. Other encodings and their payload types
are to be registered with the Internet Assigned Numbers Authority
(IANA).
When registering a new encoding/payload type, the following
information should be provided:

o  name and description of encoding, in particular the RTP
   timestamp clock rate; the names defined here are 3 or 4
   characters long to allow a compact representation if needed;

o  indication of who has change control over the encoding (for
   example, ISO, ITU-T, other international standardization
   bodies, a consortium or a particular company or group of
   companies);

o  any operating parameters or profiles;

o  a reference to a further description, if available, for
   example (in order of preference) an RFC, a published paper, a
   patent filing, a technical report, documented source code or a
   computer manual;

o  for proprietary encodings, contact information (postal and
   email address);

o  the payload type value for this profile, if necessary (see
   below).

Note that not all encodings to be used by RTP need to be assigned a
static payload type. Non-RTP means beyond the scope of this memo
(such as directory services or invitation protocols) may be used to
establish a dynamic mapping between a payload type and an encoding
("dynamic payload types"). Applications should first use the range 96
to 127 for dynamic payload types. Only applications which need to
define more than 32 dynamic payload types may redefine codes below
96. Redefining payload types below 96 may cause incorrect operation
if an attempt is made to join a session without obtaining session
description information that defines the dynamic payload types.

Note that dynamic payload types should not be used without a
well-defined mechanism to indicate the mapping. Systems that expect
to interoperate with others operating under this profile should not
assign proprietary encodings to particular, fixed payload types in
the range reserved for dynamic payload types.
SDP (RFC XXXX) defines such a mapping mechanism.

The available payload type space is relatively small. Thus, new
static payload types are assigned only if the following conditions
are met:

o  The encoding is of interest to the Internet community at
   large.

o  It offers benefits compared to existing encodings and/or is
   required for interoperation with existing, widely deployed
   conferencing or multimedia systems.

o  The description is sufficient to build a decoder.

For implementor convenience, this profile contains descriptions of
encodings which do not currently have a static payload type assigned
to them.

The Session Description Protocol (SDP) (RFC XXXX) uses the encoding
names defined here.

4 Audio

4.1 Encoding-Independent Rules

For applications which send no packets during silence, the first
packet of a talkspurt, that is, the first packet after a silence
period, is distinguished by setting the marker bit in the RTP data
header to one. The marker bit in all other packets is zero. The
beginning of a talkspurt may be used to adjust the playout delay to
reflect changing network delays. Applications without silence
suppression set the bit to zero.

The RTP clock rate used for generating the RTP timestamp is
independent of the number of channels and the encoding; it equals the
number of sampling periods per second. For N-channel encodings, each
sampling period (say, 1/8000 of a second) generates N samples. (This
terminology is standard, but somewhat confusing, as the total number
of samples generated per second is then the sampling rate times the
channel count.)

If multiple audio channels are used, channels are numbered
left-to-right, starting at one. In RTP audio packets, information
from lower-numbered channels precedes that from higher-numbered
channels.
For more than two channels, the convention followed by the AIFF-C
audio interchange format should be followed [1], using the following
notation:

     l    left
     r    right
     c    center
     S    surround
     F    front
     R    rear

channels  description     channel
                          1     2     3     4     5     6
____________________________________________________________
2         stereo          l     r
3                         l     r     c
4         quadrophonic    Fl    Fr    Rl    Rr
4                         l     c     r     S
5                         Fl    Fr    Fc    Sl    Sr
6                         l     lc    c     r     rc    S

Samples for all channels belonging to a single sampling instant must
be within the same packet. The interleaving of samples from different
channels depends on the encoding. General guidelines are given in
Sections 4.3 and 4.4.

The sampling frequency should be drawn from the set: 8000, 11025,
16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Older Apple
Macintosh computers had a native sample rate of 22254.54 Hz, which
can be converted to 22050 with acceptable quality by dropping 4
samples in a 20 ms frame.) However, most audio encodings are defined
for a more restricted set of sampling frequencies. Receivers should
be prepared to accept multi-channel audio, but may choose to only
play a single channel.

4.2 Operating Recommendations

The following recommendations are default operating parameters.
Applications should be prepared to handle other values. The ranges
given are meant to give guidance to application writers, allowing a
set of applications conforming to these guidelines to interoperate
without additional negotiation. These guidelines are not intended to
restrict operating parameters for applications that can negotiate a
set of interoperable parameters, e.g., through a conference control
protocol.

For packetized audio, the default packetization interval should have
a duration of 20 ms or one frame, whichever is longer, unless
otherwise noted in Table 1 (column "ms/packet").
The packetization interval determines the minimum end-to-end delay;
longer packets introduce less header overhead but higher delay and
make packet loss more noticeable. For non-interactive applications
such as lectures or links with severe bandwidth constraints, a higher
packetization delay may be appropriate. A receiver should accept
packets representing between 0 and 200 ms of audio data. (For framed
audio encodings, a receiver should accept packets with a number of
frames equal to 200 ms divided by the frame duration, rounded up.)
This restriction allows reasonable buffer sizing for the receiver.

4.3 Guidelines for Sample-Based Audio Encodings

In sample-based encodings, each audio sample is represented by a
fixed number of bits. Within the compressed audio data, codes for
individual samples may span octet boundaries. An RTP audio packet may
contain any number of audio samples, subject to the constraint that
the number of bits per sample times the number of samples per packet
yields an integral octet count. Fractional encodings produce less
than one octet per sample.

The duration of an audio packet is determined by the number of
samples in the packet.

For sample-based encodings producing one or more octets per sample,
samples from different channels sampled at the same sampling instant
are packed in consecutive octets. For example, for a two-channel
encoding, the octet sequence is (left channel, first sample), (right
channel, first sample), (left channel, second sample), (right
channel, second sample), .... For multi-octet encodings, octets are
transmitted in network byte order (i.e., most significant octet
first).

The packing of sample-based encodings producing less than one octet
per sample is encoding-specific.
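
The interleaving rule above can be illustrated with a minimal Python
sketch for a 16-bit linear encoding such as L16. The function names
pack_l16 and unpack_l16 are hypothetical and not part of this
profile; the sketch assumes every channel supplies the same number of
signed 16-bit samples.

```python
import struct


def pack_l16(channels):
    """Interleave per-channel sample lists into one payload.

    channels[0] is channel 1 (left), channels[1] is channel 2 (right),
    and so on. Hypothetical helper for illustration only.
    """
    if len({len(ch) for ch in channels}) != 1:
        raise ValueError("all channels must carry the same number of samples")
    out = bytearray()
    for instant in zip(*channels):        # one sampling instant at a time
        for sample in instant:            # lower-numbered channel first
            out += struct.pack("!h", sample)  # network byte order, 16 bits
    return bytes(out)


def unpack_l16(payload, n_channels):
    """Split an interleaved payload back into per-channel sample lists."""
    count, rest = divmod(len(payload), 2)
    if rest or count % n_channels:
        raise ValueError("payload does not hold whole sampling instants")
    flat = struct.unpack("!%dh" % count, payload)
    return [list(flat[c::n_channels]) for c in range(n_channels)]
```

For a stereo stream, pack_l16([left, right]) emits the first left
sample, the first right sample, the second left sample, and so on,
each with its most significant octet first.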

4.4 Guidelines for Frame-Based Audio Encodings

Frame-based encodings encode a fixed-length block of audio into
another block of compressed data, typically also of fixed length. For
frame-based encodings, the sender may choose to combine several such
frames into a single RTP packet. The receiver can tell the number of
frames contained in an RTP packet since the audio frame size (in
octets) is defined as part of the encoding, as long as all frames
have the same length measured in octets. This does not work when
carrying frames of different sizes unless the frame sizes are
relatively prime.

For frame-based codecs, the channel order is defined for the whole
block. That is, for two-channel audio, right and left samples are
coded independently, with the encoded frame for the left channel
preceding that for the right channel.

All frame-oriented audio codecs should be able to encode and decode
several consecutive frames within a single packet. Since the frame
size for the frame-oriented codecs is given, there is no need to use
a separate designation for the same encoding, but with different
number of frames per packet.

RTP packets shall contain a whole number of frames, with frames
inserted according to age within a packet, so that the oldest frame
(to be played first) occurs immediately after the RTP packet header.
The RTP timestamp reflects the capturing time of the first sample in
the first frame, that is, the oldest information in the packet.

4.5 Audio Encodings

The characteristics of standard audio encodings are shown in Table 1;
those assigned static payload types are listed in Table 3. While most
audio codecs are only specified for a fixed sampling rate, some
sample-based algorithms (indicated by an entry of "var." in the
sampling rate column of Table 1) may be used with different sampling
rates, resulting in different coded bit rates.
Non-RTP means MUST indicate the appropriate sampling rate.

4.5.1 1016

Encoding 1016 is a frame-based encoding using code-excited linear
prediction (CELP) and is specified in Federal Standard FED-STD 1016
[2,3,4,5].

4.5.2 CN

The CN (comfort noise) packet contains a single-octet message to the
receiver to play comfort noise at the absolute level specified. This
message would normally be sent once at the beginning of a silence
period (which also indicates the transition from speech to silence),
but the rate of noise level updates is implementation specific. The
magnitude of the noise level is packed into the least significant
bits of the noise-level payload, as shown below.

name of                              sampling            default
encoding  sample/frame  bits/sample  rate      ms/frame  ms/packet
__________________________________________________________________
1016      frame         N/A          8,000     30        30
CN        frame         N/A          var.
DVI4      sample        4            var.                20
G722      sample        8            16,000              20
G723      frame         N/A          8,000     30        30
G726-16   sample        2            8,000               20
G726-24   sample        3            8,000               20
G726-32   sample        4            8,000               20
G726-40   sample        5            8,000               20
G727-16   sample        2            8,000               20
G727-24   sample        3            8,000               20
G727-32   sample        4            8,000               20
G727-40   sample        5            8,000               20
G728      frame         N/A          8,000     2.5       20
G729      frame         N/A          8,000     10        20
GSM       frame         N/A          8,000     20        20
L8        sample        8            var.                20
L16       sample        16           var.                20
LPC       frame         N/A          8,000     20        20
MPA       frame         N/A          var.                20
PCMA      sample        8            var.                20
PCMU      sample        8            var.                20
SX7300P   frame         N/A          8,000     15        30
SX8300P   frame         N/A          8,000     15        30
VDVI      sample        var.         var.                20

Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
variable)

The noise level is expressed in dBov, with magnitudes from 0 to 63
(that is, levels from 0 down to -63 dBov). dBov is the level relative
to the overload of the system.
(Note: Representation relative to the overload point of a system is
particularly useful for digital implementations, since one does not
need to know the relative calibration of the analog circuitry.)
Example: In a 16-bit linear PCM system (L16), a signal with 0 dBov
represents a square wave with the maximum possible amplitude (+/-
32767). -63 dBov corresponds to -58 dBm0 in a standard telephone
system. (dBm is the power level in decibels relative to 1 mW, with an
impedance of 600 Ohms.)

      0 1 2 3 4 5 6 7
     +-+-+-+-+-+-+-+-+
     |0 0|   level   |
     +-+-+-+-+-+-+-+-+

The RTP header for the comfort noise packet should be constructed as
if the comfort noise were an independent codec. Thus, the RTP
timestamp designates the beginning of the silence period. A static
payload type is assigned for a sampling rate of 8,000 Hz; if other
sampling rates are needed, they should be defined through dynamic
payload types. The RTP packet should not have the marker bit set.

The CN payload type is primarily for use with L16, DVI4, PCMA, PCMU
and other audio codecs that do not support comfort noise as part of
the codec itself. G.723.1 and G.729 have their own comfort noise
systems as part of Annexes A (G.723.1) and B (G.729), respectively.

4.5.3 DVI4

DVI4 is specified, with pseudo-code, in [6] as the IMA ADPCM wave
type.

However, the encoding defined here as DVI4 differs in three respects
from this recommendation:

o  The header contains the predicted value rather than the first
   sample value.

o  IMA ADPCM blocks contain an odd number of samples, since the
   first sample of a block is contained just in the header
   (uncompressed), followed by an even number of compressed
   samples. DVI4 has an even number of compressed samples only,
   using the 'predict' word from the header to decode the first
   sample.

o  For DVI4, the 4-bit samples are packed with the first sample
   in the four most significant bits and the second sample in the
   four least significant bits. In the IMA ADPCM codec, the
   samples are packed in little-endian order.

Each packet contains a single DVI block. This profile only defines
the 4-bit-per-sample version, while IMA also specifies a
3-bit-per-sample encoding.

The "header" word for each channel has the following structure:

     int16  predict;   /* predicted value of first sample
                          from the previous block (L16 format) */
     u_int8 index;     /* current index into stepsize table */
     u_int8 reserved;  /* set to zero by sender, ignored by receiver */

Each octet following the header contains two 4-bit samples, thus the
number of samples per packet must be even.

Packing of samples for multiple channels is for further study.

The document "IMA Recommended Practices for Enhancing Digital Audio
Compatibility in Multimedia Systems" (version 3.0) contains the
algorithm description. It is available from

     Interactive Multimedia Association
     48 Maryland Avenue, Suite 202
     Annapolis, MD 21401-8011
     USA
     phone: +1 410 626-1380

4.5.4 G722

G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
within 64 kbit/s".

4.5.5 G723

G.723.1 is specified in ITU Recommendation G.723.1, "Dual-rate speech
coder for multimedia communications transmitting at 5.3 and 6.3
kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T as
a mandatory codec for ITU-T H.324 GSTN videophone terminal
applications. The algorithm has a floating point specification in
Annex B to G.723.1, a silence compression algorithm in Annex A to
G.723.1 and an encoded signal bit-error sensitivity specification in
G.723.1 Annex C.
549 This Recommendation specifies a coded representation that can be used
550 for compressing the speech signal component of multi-media services
551 at a very low bit rate. Audio is encoded in 30 ms frames, with an
552 additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be
553 one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
554 frame), or 4 octets. These 4-octet frames are called SID frames
555 (Silence Insertion Descriptor) and are used to specify comfort noise
556 parameters. There is no restriction on how 4, 20, and 24 octet frames
557 are intermixed. The least significant two bits of the first octet in
558 the frame determine the frame size and codec type:

560    bits  content                      octets/frame
561    00    high-rate speech (6.3 kb/s)  24
562    01    low-rate speech (5.3 kb/s)   20
563    10    SID frame                    4
564    11    reserved

566 It is possible to switch between the two rates at any 30 ms frame
567 boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
568 the encoder and decoder. This coder was optimized to represent speech
569 with near-toll quality at the above rates using a limited amount of
570 complexity.

572 All the bits of the encoded bit stream are always transmitted from
573 the least significant bit towards the most significant bit.

575 4.5.6 G726-16, G726-24, G726-32, G726-40

577 ITU-T Recommendation G.726 describes, among others, the algorithm
578 recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
579 channel encoded at 8000 samples/sec to and from a 32 kbit/s channel.
580 The conversion is applied to the PCM stream using an Adaptive
581 Differential Pulse Code Modulation (ADPCM) transcoding technique.
582 G.726 describes codecs operating at 16 kb/s (2 bits/sample), 24 kb/s
583 (3 bits/sample), 32 kb/s (4 bits/sample), 40 kb/s (5 bits/sample).
584 These encodings are labeled G726-16, G726-24, G726-32 and G726-40,
585 respectively.
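
The octet packing that this section specifies for G726-32 code words can be illustrated with a short C sketch. It is not part of the profile: the function name g726_32_pack and the convention of passing one 4-bit code word per input byte are assumptions made for illustration, and the sketch assumes an even number of code words.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 4-bit G726-32 code words into octets (illustrative helper, not
 * defined by the profile). The first code word of each pair goes into
 * the four least significant bits of the octet, the second into the
 * four most significant bits. n_codes is assumed to be even; a trailing
 * odd code word is ignored. Returns the number of octets written. */
size_t g726_32_pack(const uint8_t *codes, size_t n_codes, uint8_t *out)
{
    size_t n_octets = n_codes / 2;
    for (size_t i = 0; i < n_octets; i++) {
        uint8_t first  = codes[2 * i]     & 0x0F;
        uint8_t second = codes[2 * i + 1] & 0x0F;
        out[i] = (uint8_t)((second << 4) | first);
    }
    return n_octets;
}
```

For example, the code words 0x1, 0x2 would be carried in a single octet with value 0x21.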
587 Note: In 1990, ITU-T Recommendation G.721 was merged with
588 Recommendation G.723 into ITU-T Recommendation G.726. Thus, G726-32
589 designates the same algorithm as G721 in RFC 1890.

591 No header information shall be included as part of the audio data.
592 The 4-bit code words of the G726-32 encoding MUST be packed into
593 octets as follows: the first code word is placed in the four least
594 significant bits of the first octet, with the least significant bit
595 of the code word in the least significant bit of the octet; the
596 second code word is placed in the four most significant bits of the
597 first octet, with the most significant bit of the code word in the
598 most significant bit of the octet. Subsequent pairs of the code words
599 shall be packed in the same way into successive octets, with the
600 first code word of each pair placed in the least significant four
601 bits of the octet. It is preferred that the voice sample be extended
602 with silence such that the encoded value comprises an even number of
603 code words.

605 4.5.7 G727-16, G727-24, G727-32, G727-40

607 ITU-T Recommendation G.727, "5-, 4-, 3- and 2-bits sample embedded
608 adaptive differential pulse code modulation (ADPCM)", specifies an
609 embedded ADPCM algorithm which has the intrinsic capability of
610 dropping bits in the encoded words to alleviate network congestion
611 conditions. The algorithm, although not bitstream compatible with
612 G.726, was based on and has a structure similar to the G.726 ADPCM
613 algorithm.

615 4.5.8 G728

617 G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
618 16 kbit/s using low-delay code excited linear prediction".

620 A G.728 encoder translates 5 consecutive audio samples into a 10-bit
621 codebook index, resulting in a bit rate of 16 kb/s for audio sampled
622 at 8,000 samples per second. The group of five consecutive samples is
623 called a vector.
Four consecutive vectors, labeled V1 to V4 (where V1 624 is to be played first by the receiver), build one G.728 frame. The 625 four vectors of 40 bits are packed into 5 octets, labeled B1 through 626 B5. B1 shall be placed first in the RTP packet. 628 Referring to the figure below, the principle for bit order is 629 "maintenance of bit significance". Bits from an older vector are more 630 significant than bits from newer vectors. The MSB of the frame goes 631 to the MSB of B1 and the LSB of the frame goes to LSB of B5. For 632 example: octet B1 contains the eight most significant bits of vector 633 V1, the MSB of V1 is MSB of B1. 635 1 2 3 3 636 0 0 0 0 9 637 ++++++++++++++++++++++++++++++++++++++++ 638 <---V1---><---V2---><---V3---><---V4---> vectors 639 <--B1--><--B2--><--B3--><--B4--><--B5--> octets 640 <------------- frame 1 ----------------> 641 In particular, B1 contains the eight most significant bits of V1, 642 with the MSB of V1 being the MSB of B1. B2 contains the two least 643 significant bits of V1, the more significant of the two in its MSB, 644 and the six most significant bits of V2. B1 shall be placed first in 645 the RTP packet and B5 last. 647 4.5.9 G729 649 G729 is specified in ITU-T Recommendation G.729, "Coding of speech at 650 8 kbit/s using conjugate structure-algebraic code excited linear 651 prediction (CS-ACELP)". A complexity-reduced version of the G.729 652 algorithm is specified in Annex A to Rec. G.729. The speech coding 653 algorithms in the main body of G.729 and in G.729 Annex A are fully 654 interoperable with each other, so there is no need to further 655 distinguish between them. The G.729 and G.729 Annex A codecs were 656 optimized to represent speech with high quality, where G.729 Annex A 657 trades some speech quality for an approximate 50% complexity 658 reduction [7]. 
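
This section specifies 10-octet G.729 (or Annex A) frames, optionally followed by a single 2-octet Annex B comfort noise frame, with the comfort noise frame detected from the payload length. A minimal C sketch of that length check follows. The function name g729_split_payload and its return convention are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* Split a G.729 RTP payload length into speech frames and an optional
 * comfort noise frame (illustrative helper, not defined by the profile).
 * Returns 0 on success, or -1 if the length cannot be formed from
 * 10-octet speech frames plus at most one trailing 2-octet frame. */
int g729_split_payload(size_t payload_len, size_t *n_speech, int *has_cn)
{
    size_t rem = payload_len % 10;
    if (rem != 0 && rem != 2)
        return -1;
    *n_speech = payload_len / 10;
    *has_cn = (rem == 2);
    return 0;
}
```

A 22-octet payload, for instance, would be read as two speech frames followed by one comfort noise frame.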
660 A voice activity detector (VAD) and comfort noise generator (CNG)
661 algorithm in Annex B of G.729 is recommended for digital simultaneous
662 voice and data applications and can be used in conjunction with G.729
663 or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets,
664 while the G.729 Annex B comfort noise frame occupies 2 octets:

666     0                   1
667     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
668    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
669    |L|  LSF1   |  LSF2 |   GAIN  |R|
670    |S|         |       |         |E|
671    |F|0 1 2 3 4|0 1 2 3|0 1 2 3 4|S|
672    |0|         |       |         |V|    RESV = Reserved (zero)
673    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

675 An RTP packet may consist of zero or more G.729 or G.729 Annex A
676 frames, followed by zero or one G.729 Annex B payloads. The presence
677 of a comfort noise frame can be deduced from the length of the RTP
678 payload.

680 A floating-point version of G.729, G.729 Annex A, and G.729 Annex
681 B will be available shortly as Annex C to Recommendation G.729.

683 The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
684 of 80 bits, are defined in Recommendation G.729, Table 8/G.729.

686 The mapping of these parameters is given below. Bits are numbered
687 in Internet order, that is, the most significant bit is bit 0.
689 0 1 2 3 690 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 691 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 692 |L| L1 | L2 | L3 | P1 |P| C1 | 693 |0| | | | |0| | 694 | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4| 695 | | | | | | | | 696 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 698 4 5 6 699 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 700 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 701 | C1 | S1 | GA1 | GB1 | P2 | C2 | 702 | | | | | | | 703 |5 6 7 8 9 1 1 1|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7| 704 | 0 1 2| | | | | | 705 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 707 7 708 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 709 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 710 | C2 | S2 | GA2 | GB2 | 711 | | | | | 712 |8 9 1 1 1|0 1 2 3|0 1 2|0 1 2 3| 713 | 0 1 2| | | | 714 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 716 4.5.10 GSM 718 GSM (group speciale mobile) denotes the European GSM 06.10 719 provisional standard for full-rate speech transcoding, prI-ETS 300 720 036, which is based on RPE/LTP (residual pulse excitation/long term 721 prediction) coding at a rate of 13 kb/s [8,9,10]. The text of the 722 standard can be obtained from 724 ETSI (European Telecommunications Standards Institute) 725 ETSI Secretariat: B.P.152 726 F-06561 Valbonne Cedex 727 France 728 Phone: +33 92 94 42 00 729 Fax: +33 93 65 47 16 731 Blocks of 160 audio samples are compressed into 33 octets, for an 732 effective data rate of 13,200 b/s. 734 4.5.10.1 General Packaging Issues 736 The GSM standard specifies the bit stream produced by the codec, but 737 does not specify how these bits should be packed for transmission. 738 Some software implementations of the GSM codec use a different 739 packing than that specified here. 741 In the GSM encoding used by RTP, the bits are packed beginning from 742 the most significant bit. 
Every 160 sample GSM frame is coded into 743 one 33 octet (264 bit) buffer. Every such buffer begins with a 4 bit 744 signature (0xD), followed by the MSB encoding of the fields of the 745 frame. The first octet thus contains 1101 in the 4 most significant 746 bits (4-7) and the 4 most significant bits of F1 (2-5) in the 4 least 747 significant bits (0-3). The second octet contains the 2 least bits of 748 F1 in bits 6-7, and F2 in bits 0-5, and so on. The order of the 749 fields in the frame is as follows: 751 4.5.10.2 GSM variable names and numbers 753 So if F.i signifies the ith bit of the field F, and bit 0 is the most 754 significant bit, and the bits of every octet are numbered from 0 to 7 755 from most to least significant, then in the RTP encoding we have: 757 4.5.11 L8 759 L8 denotes linear audio data, using 8-bits of precision with an 760 offset of 128, that is, the most negative signal is encoded as zero. 762 4.5.12 L16 764 L16 denotes uncompressed audio data, using 16-bit signed 765 representation with 65535 equally divided steps between minimum and 766 maximum signal level, ranging from --32768 to 32767. 
The value is represented in two's complement notation and network byte order.

767 field  field name  bits   field  field name  bits
768 __________________________________________________________
769 1      LARc[0]     6      39     xmc[22]     3
770 2      LARc[1]     6      40     xmc[23]     3
771 3      LARc[2]     5      41     xmc[24]     3
772 4      LARc[3]     5      42     xmc[25]     3
773 5      LARc[4]     4      43     Nc[2]       7
774 6      LARc[5]     4      44     bc[2]       2
775 7      LARc[6]     3      45     Mc[2]       2
776 8      LARc[7]     3      46     xmaxc[2]    6
777 9      Nc[0]       7      47     xmc[26]     3
778 10     bc[0]       2      48     xmc[27]     3
779 11     Mc[0]       2      49     xmc[28]     3
780 12     xmaxc[0]    6      50     xmc[29]     3
781 13     xmc[0]      3      51     xmc[30]     3
782 14     xmc[1]      3      52     xmc[31]     3
783 15     xmc[2]      3      53     xmc[32]     3
784 16     xmc[3]      3      54     xmc[33]     3
785 17     xmc[4]      3      55     xmc[34]     3
786 18     xmc[5]      3      56     xmc[35]     3
787 19     xmc[6]      3      57     xmc[36]     3
788 20     xmc[7]      3      58     xmc[37]     3
789 21     xmc[8]      3      59     xmc[38]     3
790 22     xmc[9]      3      60     Nc[3]       7
791 23     xmc[10]     3      61     bc[3]       2
792 24     xmc[11]     3      62     Mc[3]       2
793 25     xmc[12]     3      63     xmaxc[3]    6
794 26     Nc[1]       7      64     xmc[39]     3
795 27     bc[1]       2      65     xmc[40]     3
796 28     Mc[1]       2      66     xmc[41]     3
797 29     xmaxc[1]    6      67     xmc[42]     3
798 30     xmc[13]     3      68     xmc[43]     3
799 31     xmc[14]     3      69     xmc[44]     3
800 32     xmc[15]     3      70     xmc[45]     3
801 33     xmc[16]     3      71     xmc[46]     3
802 34     xmc[17]     3      72     xmc[47]     3
803 35     xmc[18]     3      73     xmc[48]     3
804 36     xmc[19]     3      74     xmc[49]     3
805 37     xmc[20]     3      75     xmc[50]     3
806 38     xmc[21]     3      76     xmc[51]     3

808 Table 2: Ordering of GSM variables
813 Octet Bit 0    Bit 1    Bit 2    Bit 3    Bit 4    Bit 5    Bit 6    Bit 7
814 _____________________________________________________________________________________________
815 0     1        1        0        1        LARc0.0  LARc0.1  LARc0.2  LARc0.3
816 1     LARc0.4  LARc0.5  LARc1.0  LARc1.1  LARc1.2  LARc1.3  LARc1.4  LARc1.5
817 2     LARc2.0  LARc2.1  LARc2.2  LARc2.3  LARc2.4  LARc3.0  LARc3.1  LARc3.2
818 3     LARc3.3  LARc3.4  LARc4.0  LARc4.1  LARc4.2  LARc4.3  LARc5.0  LARc5.1
819 4     LARc5.2  LARc5.3  LARc6.0  LARc6.1  LARc6.2  LARc7.0  LARc7.1  LARc7.2
820 5     Nc0.0    Nc0.1    Nc0.2    Nc0.3    Nc0.4    Nc0.5    Nc0.6    bc0.0
821 6     bc0.1    Mc0.0    Mc0.1    xmaxc00  xmaxc01  xmaxc02  xmaxc03  xmaxc04
822 7     xmaxc05  xmc0.0   xmc0.1   xmc0.2   xmc1.0   xmc1.1   xmc1.2   xmc2.0
823 8     xmc2.1   xmc2.2   xmc3.0   xmc3.1   xmc3.2   xmc4.0   xmc4.1   xmc4.2
824 9     xmc5.0   xmc5.1   xmc5.2   xmc6.0   xmc6.1   xmc6.2   xmc7.0   xmc7.1
825 10    xmc7.2   xmc8.0   xmc8.1   xmc8.2   xmc9.0   xmc9.1   xmc9.2   xmc10.0
826 11    xmc10.1  xmc10.2  xmc11.0  xmc11.1  xmc11.2  xmc12.0  xmc12.1  xmc12.2
827 12    Nc1.0    Nc1.1    Nc1.2    Nc1.3    Nc1.4    Nc1.5    Nc1.6    bc1.0
828 13    bc1.1    Mc1.0    Mc1.1    xmaxc10  xmaxc11  xmaxc12  xmaxc13  xmaxc14
829 14    xmaxc15  xmc13.0  xmc13.1  xmc13.2  xmc14.0  xmc14.1  xmc14.2  xmc15.0
830 15    xmc15.1  xmc15.2  xmc16.0  xmc16.1  xmc16.2  xmc17.0  xmc17.1  xmc17.2
831 16    xmc18.0  xmc18.1  xmc18.2  xmc19.0  xmc19.1  xmc19.2  xmc20.0  xmc20.1
832 17    xmc20.2  xmc21.0  xmc21.1  xmc21.2  xmc22.0  xmc22.1  xmc22.2  xmc23.0
833 18    xmc23.1  xmc23.2  xmc24.0  xmc24.1  xmc24.2  xmc25.0  xmc25.1  xmc25.2
834 19    Nc2.0    Nc2.1    Nc2.2    Nc2.3    Nc2.4    Nc2.5    Nc2.6    bc2.0
835 20    bc2.1    Mc2.0    Mc2.1    xmaxc20  xmaxc21  xmaxc22  xmaxc23  xmaxc24
836 21    xmaxc25  xmc26.0  xmc26.1  xmc26.2  xmc27.0  xmc27.1  xmc27.2  xmc28.0
837 22    xmc28.1  xmc28.2  xmc29.0  xmc29.1  xmc29.2  xmc30.0  xmc30.1  xmc30.2
838 23    xmc31.0  xmc31.1  xmc31.2  xmc32.0  xmc32.1  xmc32.2  xmc33.0  xmc33.1
839 24    xmc33.2  xmc34.0  xmc34.1  xmc34.2  xmc35.0  xmc35.1  xmc35.2  xmc36.0
840 25    xmc36.1  xmc36.2  xmc37.0  xmc37.1  xmc37.2  xmc38.0  xmc38.1  xmc38.2
841 26    Nc3.0    Nc3.1    Nc3.2    Nc3.3    Nc3.4    Nc3.5    Nc3.6    bc3.0
842 27    bc3.1    Mc3.0    Mc3.1    xmaxc30  xmaxc31  xmaxc32  xmaxc33  xmaxc34
843 28    xmaxc35  xmc39.0  xmc39.1  xmc39.2  xmc40.0  xmc40.1  xmc40.2  xmc41.0
844 29    xmc41.1  xmc41.2  xmc42.0  xmc42.1  xmc42.2  xmc43.0  xmc43.1  xmc43.2
845 30    xmc44.0  xmc44.1  xmc44.2  xmc45.0  xmc45.1  xmc45.2  xmc46.0  xmc46.1
846 31    xmc46.2  xmc47.0  xmc47.1  xmc47.2  xmc48.0  xmc48.1  xmc48.2  xmc49.0
847 32    xmc49.1  xmc49.2  xmc50.0  xmc50.1  xmc50.2  xmc51.0  xmc51.1  xmc51.2

812 4.5.13 LPC

849 LPC designates an experimental linear predictive encoding contributed
850 by Ron Frederick, Xerox PARC, which is based on an implementation
851 written by Ron Zuckerman, Motorola, posted to the Usenet group
852 comp.dsp on June 26, 1992. The codec generates 14 octets for every
853 frame. The framesize is set to 20 ms, resulting in a bit rate of
854 5,600 b/s.

856 4.5.14 MPA

858 MPA denotes MPEG-I or MPEG-II audio encapsulated as elementary
859 streams. The encoding is defined in ISO standards ISO/IEC 11172-3
860 and 13818-3. The encapsulation is specified in RFC 2038 [11].

862 Sampling rate and channel count are contained in the payload. MPEG-I
863 audio supports sampling rates of 32000, 44100, and 48000 Hz (ISO/IEC
864 11172-3, section 1.1; "Scope"). MPEG-II additionally supports ISO/IEC
865 11172-3 Audio...").

867 4.5.15 PCMA and PCMU

869 PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio data
870 is encoded as eight bits per sample, after logarithmic scaling. PCMU
871 denotes mu-law scaling, PCMA A-law scaling. A detailed description is
872 given by Jayant and Noll [12]. Each G.711 octet shall be octet-
873 aligned in an RTP packet. The sign bit of each G.711 octet shall
874 correspond to the most significant bit of the octet in the RTP packet
875 (i.e., assuming the G.711 samples are handled as octets on the host
876 machine, the sign bit shall be the most significant bit of the octet
877 as defined by the host machine format). The 56 kb/s and 48 kb/s modes
878 of G.711 are not applicable to RTP, since G.711 shall always be
879 transmitted as 8-bit samples.
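
The bit rates quoted for the frame-based encodings in Section 4.5 follow directly from the frame size and the number of samples each frame covers. The C sketch below reproduces that arithmetic. The helper name frame_bit_rate is hypothetical, and the figure of 160 samples per LPC frame is derived here from the stated 20 ms frame size at the 8,000 Hz clock rate rather than quoted from the text.

```c
#include <stdint.h>

/* Bit rate in b/s, rounded to the nearest integer, for an encoding that
 * emits frame_octets octets for every block of samples_per_frame samples
 * taken at sample_rate_hz (illustrative helper, not defined by the
 * profile). */
uint32_t frame_bit_rate(uint32_t frame_octets,
                        uint32_t samples_per_frame,
                        uint32_t sample_rate_hz)
{
    uint64_t scaled = (uint64_t)frame_octets * 8u * sample_rate_hz;
    return (uint32_t)((scaled + samples_per_frame / 2) / samples_per_frame);
}
```

With these inputs, GSM (33 octets per 160 samples) gives 13,200 b/s, LPC (14 octets per 160 samples) gives 5,600 b/s, and SX7300P and SX8300P (14 and 16 octets per 120 samples) give roughly 7467 b/s and 8533 b/s.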
881 4.5.16 RED 883 The redundant audio payload format "RED" is specified by RFC XXX. It 884 defines a means by which multiple redundant copies of an audio packet 885 may be transmitted in a single RTP stream. Each packet in such a 886 stream contains, in addition to the audio data for that packetization 887 interval, a (more heavily compressed) copy of the data from the 888 previous packetization interval. This allows an approximation of the 889 data from lost packets to be recovered upon decoding of the following 890 packet, giving much improved sound quality when compared with silence 891 substitution for lost packets. 893 4.5.17 SX7300P 895 The SX7300P is a low-complexity CELP-based audio codec operating at a 896 sampling rate of 8000 Hz. It encodes blocks of 120 audio samples (15 897 ms) into an encoded frame of 14 octets, yielding an encoded bit rate 898 of approximately 7467 b/s. 900 4.5.18 SX8300P 902 The SX8300P is a low-complexity CELP-based audio codec operating at a 903 sampling rate of 8000 Hz. It encodes blocks of 120 audio samples (15 904 ms) into an encoded frame of 16 octets, yielding an encoded bit rate 905 of approximately 8533 b/s. 907 4.5.19 VDVI 908 VDVI is a variable-rate version of DVI4, yielding speech bit rates of 909 between 10 and 25 kb/s. It is specified for single-channel operation 910 only. Samples are packed into octets starting at the most-significant 911 bit. 913 It uses the following encoding: 915 DVI4 codeword VDVI bit pattern 916 _________________________________ 917 0 00 918 1 010 919 2 1100 920 3 11100 921 4 111100 922 5 1111100 923 6 11111100 924 7 11111110 925 8 10 926 9 011 927 10 1101 928 11 11101 929 12 111101 930 13 1111101 931 14 11111101 932 15 11111111 934 5 Video 936 The following video encodings are currently defined, with their 937 abbreviated names used for identification: 939 5.1 CelB 941 The CELL-B encoding is a proprietary encoding proposed by Sun 942 Microsystems. 
The byte stream format is described in RFC 2029 [13].

944 5.2 JPEG

946 The encoding is specified in ISO Standards 10918-1 and 10918-2. The
947 RTP payload format is as specified in RFC 2035 [14].

949 5.3 H261

951 The encoding is specified in ITU-T Recommendation H.261, "Video codec
952 for audiovisual services at p x 64 kbit/s". The packetization and
953 RTP-specific properties are described in RFC 2032 [15].

955 5.4 H263

957 The encoding is specified in ITU-T Recommendation H.263, "Video
958 coding for low bit rate communication". The packetization and RTP-
959 specific properties are described in [16].

961 5.5 MPV

963 MPV designates the use of MPEG-I and MPEG-II video encoding elementary
964 streams as specified in ISO Standards ISO/IEC 11172 and 13818-2,
965 respectively. The RTP payload format is as specified in RFC 2038
966 [11], Section 3.

968 5.6 MP2T

970 MP2T designates the use of MPEG-II transport streams, for either
971 audio or video. The encapsulation is described in RFC 2038 [11],
972 Section 2. See the description of the MPA audio encoding for contact
973 information.

975 5.7 nv

977 The encoding is implemented in the program 'nv', version 4, developed
978 at Xerox PARC by Ron Frederick. Further information is available from
979 the author:

981 Ron Frederick
982 Xerox Palo Alto Research Center
983 3333 Coyote Hill Road
984 Palo Alto, CA 94304
985 United States
986 electronic mail: frederic@parc.xerox.com

988 6 Payload Type Definitions

990 Table 3 defines this profile's static payload type values for the PT
991 field of the RTP data header. A new RTP payload format specification
992 may be registered with the IANA by name, and may also be assigned a
993 static payload type value from the range marked in Section 3.

995 In addition, payload type values in the range 96--127 may be defined
996 dynamically through a conference control protocol, which is beyond
997 the scope of this document.
For example, a session directory could
998 specify that for a given session, payload type 96 indicates PCMU
999 encoding, 8,000 Hz sampling rate, 2 channels. The payload type range
1000 marked 'reserved' has been set aside so that RTCP and RTP packets can
1001 be reliably distinguished (see Section "Summary of Protocol
1002 Constants" of the RTP protocol specification).

1004 An RTP source emits a single RTP payload type at any given instant.
1005 The interleaving or multiplexing of several RTP media types within a
1006 single RTP session is not allowed, but multiple RTP sessions may be
1007 used in parallel to send multiple media types. An RTP source may
1008 change payload types during a session.

1010 The payload types currently defined in this profile are assigned to
1011 exactly one of three categories or media types: audio only, video
1012 only and those combining audio and video. A single RTP session
1013 consists of payload types of one and only one media type.

1015 Session participants agree through mechanisms beyond the scope of
1016 this specification on the set of payload types allowed in a given
1017 session. This set may, for example, be defined by the capabilities
1018 of the applications used, negotiated by a conference control protocol
1019 or established by agreement between the human participants. The media
1020 types in Table 3 are marked as "A" for audio, "V" for video and "AV"
1021 for combined audio/video streams.

1023 Audio applications operating under this profile should, at minimum,
1024 be able to send and receive payload types 0 (PCMU) and 5 (DVI4). This
1025 allows interoperability without format negotiation and successful
1026 negotiation with a conference control protocol.

1028 All current video encodings use a timestamp frequency of 90,000 Hz,
1029 the same as the MPEG presentation time stamp frequency.
This 1030 frequency yields exact integer timestamp increments for the typical 1031 24 (HDTV), 25 (PAL), and 29.97 (NTSC) and 30 Hz (HDTV) frame rates 1032 and 50, 59.94 and 60 Hz field rates. While 90 kHz is the recommended 1033 rate for future video encodings used within this profile, other rates 1034 are possible. However, it is not sufficient to use the video frame 1035 rate (typically between 15 and 30 Hz) because that does not provide 1036 adequate resolution for typical synchronization requirements when 1037 calculating the RTP timestamp corresponding to the NTP timestamp in 1038 an RTCP SR packet. The timestamp resolution must also be sufficient 1039 for the jitter estimate contained in the receiver reports. 1041 The standard video encodings and their payload types are listed in 1042 Table 3. 1044 7 RTP over TCP and Similar Byte Stream Protocols 1046 Under special circumstances, it may be necessary to carry RTP in 1047 protocols offering a byte stream abstraction, such as TCP, possibly 1048 multiplexed with other data. If the application does not define its 1049 own method of delineating RTP and RTCP packets, it SHOULD prefix each 1050 packet with a two-octet length field. 
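
The two-octet length prefix recommended in this section can be sketched in C as follows. The text does not state the byte order of the length field or whether the length counts the prefix itself, so this sketch assumes network byte order (most significant octet first) and a length that covers only the RTP or RTCP packet. The function names rtp_frame_packet and rtp_unframe_packet are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a two-octet length followed by the packet into buf.
 * ASSUMPTIONS: big-endian length, length excludes the prefix.
 * Returns the total number of octets written, or 0 if the packet is too
 * long for a 16-bit length or buf is too small. */
size_t rtp_frame_packet(const uint8_t *pkt, size_t pkt_len,
                        uint8_t *buf, size_t buf_len)
{
    if (pkt_len > 0xFFFF || buf_len < pkt_len + 2)
        return 0;
    buf[0] = (uint8_t)(pkt_len >> 8);
    buf[1] = (uint8_t)(pkt_len & 0xFF);
    memcpy(buf + 2, pkt, pkt_len);
    return pkt_len + 2;
}

/* Try to extract one framed packet from the start of a byte stream
 * buffer. Returns the number of octets consumed, or 0 if the buffer
 * does not yet hold a complete length-prefixed packet. */
size_t rtp_unframe_packet(const uint8_t *buf, size_t buf_len,
                          const uint8_t **pkt, size_t *pkt_len)
{
    if (buf_len < 2)
        return 0;
    size_t len = ((size_t)buf[0] << 8) | buf[1];
    if (buf_len < len + 2)
        return 0;
    *pkt = buf + 2;
    *pkt_len = len;
    return len + 2;
}
```

A receiver reading from a TCP socket would call the second function repeatedly on its accumulated input, consuming the returned number of octets each time and waiting for more data whenever it returns 0.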
1097 (Note: RTSP [17] provides its own encapsulation and does not need an
1098 extra length indication.)

1052 PT        encoding     media type   clock rate   channels
1053           name                      (Hz)         (audio)
1054 _______________________________________________________________
1055 0         PCMU         A            8000         1
1056 1         1016         A            8000         1
1057 2         G726-32      A            8000         1
1058 3         GSM          A            8000         1
1059 4         G723         A            8000         1
1060 5         DVI4         A            8000         1
1061 6         DVI4         A            16000        1
1062 7         LPC          A            8000         1
1063 8         PCMA         A            8000         1
1064 9         G722         A            16000        1
1065 10        L16          A            44100        2
1066 11        L16          A            44100        1
1067 12        unassigned   A
1068 13        unassigned   A
1069 14        MPA          A            90000        (see text)
1070 15        G728         A            8000         1
1071 16        DVI4         A            11025        1
1072 17        DVI4         A            22050        1
1073 18        G729         A            8000         1
1074 19        CN           A            8000         1
1075 20        unassigned   A
1076 21        unassigned   A
1077 22        unassigned   A
1078 23        unassigned   A
1079 24        unassigned   V
1080 25        CelB         V            90000
1081 26        JPEG         V            90000
1082 27        unassigned   V
1083 28        nv           V            90000
1084 29        unassigned   V
1085 30        unassigned   V
1086 31        H261         V            90000
1087 32        MPV          V            90000
1088 33        MP2T         AV           90000
1089 34        H263         V            90000
1090 35--71    unassigned   ?
1091 72--76    reserved     N/A          N/A          N/A
1092 77        RED          A            N/A          N/A
1093 78--95    unassigned   ?
1094 96--127   dynamic      ?

1096 Table 3: Payload types (PT) for standard audio and video encodings

1100 8 Port Assignment

1102 As specified in the RTP protocol definition, RTP data is to be
1103 carried on an even UDP or TCP port number and the corresponding RTCP
1104 packets are to be carried on the next higher (odd) port number.

1106 Applications operating under this profile may use any such UDP or TCP
1107 port pair. For example, the port pair may be allocated randomly by a
1108 session management program. A single fixed port number pair cannot be
1109 required because multiple applications using this profile are likely
1110 to run on the same host, and there are some operating systems that do
1111 not allow multiple processes to use the same UDP port with different
1112 multicast addresses.
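
The even/odd port rule in this section can be expressed as a small C check. This is only a sketch: the function name rtcp_port_for is hypothetical and the profile does not define any such interface.

```c
#include <stdint.h>

/* Derive the RTCP port for a given RTP port (illustrative helper, not
 * defined by the profile). Returns 1 and stores the next higher (odd)
 * port in *rtcp_port if rtp_port is even; returns 0 if rtp_port is odd
 * and therefore not usable as the RTP port of a pair. */
int rtcp_port_for(uint16_t rtp_port, uint16_t *rtcp_port)
{
    if (rtp_port % 2 != 0)
        return 0;
    *rtcp_port = (uint16_t)(rtp_port + 1);
    return 1;
}
```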
1114 However, port numbers 5004 and 5005 have been registered for use with
1115 this profile for those applications that choose to use them as the
1116 default pair. Applications that operate under multiple profiles may
1117 use this port pair as an indication to select this profile if they
1118 are not subject to the constraint of the previous paragraph.
1119 Applications need not have a default and may require that the port
1120 pair be explicitly specified. The particular port numbers were chosen
1121 to lie in the range above 5000 to accommodate port number allocation
1122 practice within the Unix operating system, where port numbers below
1123 1024 can only be used by privileged processes and port numbers
1124 between 1024 and 5000 are automatically assigned by the operating
1125 system.

1127 9 Bibliography

1129 [1] Apple Computer, "Audio interchange file format AIFF-C," Aug.
1130 1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).

1132 [2] Office of Technology and Standards, "Telecommunications: Analog
1133 to digital conversion of radio voice by 4,800 bit/second code excited
1134 linear prediction (celp)," Federal Standard FS-1016, GSA, Room 6654;
1135 7th & D Street SW; Washington, DC 20407 (+1-202-708-9205), 1990.

1137 [3] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The
1138 proposed Federal Standard 1016 4800 bps voice coder: CELP," Speech
1139 Technology, vol. 5, pp. 58--64, April/May 1990.

1141 [4] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The federal
1142 standard 1016 4800 bps CELP voice coder," Digital Signal Processing,
1143 vol. 1, no. 3, pp. 145--155, 1991.

1145 [5] J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The dod 4.8
1146 kbps standard (proposed federal standard 1016)," in Advances in
1147 Speech Coding (B. Atal, V. Cuperman, and A. Gersho, eds.), ch. 12,
1148 pp. 121--133, Kluwer Academic Publishers, 1991.
1150 [6] IMA Digital Audio Focus and Technical Working Groups,
1151 "Recommended practices for enhancing digital audio compatibility in
1152 multimedia systems (version 3.00)," tech. rep., Interactive
1153 Multimedia Association, Annapolis, Maryland, Oct. 1992.

1155 [7] D. Deleam and J.-P. Petit, "Real-time implementations of the
1156 recent ITU-T low bit rate speech coders on the TI TMS320C54X DSP:
1157 results, methodology, and applications," in Proc. of International
1158 Conference on Signal Processing, Technology, and Applications
1159 (ICSPAT), (Boston, Massachusetts), pp. 1656--1660, Oct. 1996.

1161 [8] M. Mouly and M.-B. Pautet, The GSM system for mobile
1162 communications. Lassay-les-Chateaux, France: Europe Media Duplication,
1163 1993.

1165 [9] J. Degener, "Digital speech compression," Dr. Dobb's Journal,
1166 Dec. 1994.

1168 [10] S. M. Redl, M. K. Weber, and M. W. Oliphant, An Introduction to
1169 GSM. Boston: Artech House, 1995.

1171 [11] D. Hoffman, G. Fernando, and V. Goyal, "RTP payload format for
1172 MPEG1/MPEG2 video," Request for Comments (Proposed Standard) RFC
1173 2038, Internet Engineering Task Force, Oct. 1996.

1175 [12] N. S. Jayant and P. Noll, Digital Coding of Waveforms--
1176 Principles and Applications to Speech and Video. Englewood Cliffs, New
1177 Jersey: Prentice-Hall, 1984.

1179 [13] M. Speer and D. Hoffman, "RTP payload format of sun's CellB
1180 video encoding," Request for Comments (Proposed Standard) RFC 2029,
1181 Internet Engineering Task Force, Oct. 1996.

1183 [14] L. Berc, W. Fenner, R. Frederick, and S. McCanne, "RTP payload
1184 format for JPEG-compressed video," Request for Comments (Proposed
1185 Standard) RFC 2035, Internet Engineering Task Force, Oct. 1996.

1187 [15] T. Turletti and C. Huitema, "RTP payload format for H.261 video
1188 streams," Request for Comments (Proposed Standard) RFC 2032, Internet
1189 Engineering Task Force, Oct. 1996.

1191 [16] C. C.
Zhu, "RTP payload format for H.263 video streams,"
1192 Internet Draft, Internet Engineering Task Force, Mar. 1997. Work in
1193 progress.

1195 [17] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming
1196 protocol (RTSP)," Internet Draft, Internet Engineering Task Force,
1197 July 1997. Work in progress.

1199 10 Acknowledgements

1201 The comments and careful review of Steve Casner, Simao Campos and
1202 Richard Cox are gratefully acknowledged. The GSM description was
1203 adopted from the IMTC Voice over IP Forum Service Interoperability
1204 Implementation Agreement (January 1997). Fred Burg and Terry Lyons
1205 helped with the G.729 description.

1207 11 Address of Author

1209 Henning Schulzrinne
1210 Dept. of Computer Science
1211 Columbia University
1212 1214 Amsterdam Avenue
1213 New York, NY 10027
1214 USA
1215 electronic mail: schulzrinne@cs.columbia.edu

1217 Current Locations of Related Resources

1219 Note: Several sections below refer to the ITU-T Software Tool Library
1220 (STL). It is available from the ITU Sales Service, Place des Nations,
1221 CH-1211 Geneve 20, Switzerland (also check http://www.itu.int). The
1222 ITU-T STL is covered by a license defined in ITU-T Recommendation
1223 G.191, "Software tools for speech and audio coding
1224 standardization".

1226 UTF-8

1228 Information on the UCS Transformation Format 8 (UTF-8) is available
1229 at

1231 http://www.stonehand.com/unicode/standard/utf8.html

1233 1016

1235 The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited
1236 linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C
1237 simulation source codes are available for worldwide distribution at
1238 no charge (on DOS diskettes, but configured to compile on Sun SPARC
1239 stations) from: Bob Fenichel, National Communications System,
1240 Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960.
1242 An implementation is also available at 1244 ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z 1246 DVI4 1248 An implementation is available from Jack Jansen at 1250 ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar 1252 G722 1254 An implementation of the G.722 algorithm is available as part of the 1255 ITU-T STL, described above. 1257 G723 1259 The reference C code implementation defining the G.723.1 algorithm 1260 and its Annexes A, B, and C are available as an integral part of 1261 Recommendation G.723.1 from the ITU Sales Service, address listed 1262 above. Both the algorithm and C code are covered by a specific 1263 license. The ITU-T Secretariat should be contacted to obtain such 1264 licensing information. 1266 G726-16 through G726-40 1268 G726-16 through G726-40 are specified in the ITU-T Recommendation 1269 G.726, "40, 32, 24, and 16 kb/s Adaptive Differential Pulse Code 1270 Modulation (ADPCM)". An implementation of the G.726 algorithm is 1271 available as part of the ITU-T STL, described above. 1273 G727-16 through G727-40 1275 G727-16 through G727-40 are specified in the ITU-T Recommendation 1276 G.727, "5-, 4-, 3-, and 2-bit/sample embedded adaptive differential 1277 pulse code modulation". An implementation of the G.727 algorithm will 1278 be available in a future release of the ITU-T STL, described above. 1280 G729 1282 The reference C code implementation defining the G.729 algorithm and 1283 its Annexes A and B are available as an integral part of 1284 Recommendation G.729 from the ITU Sales Service, listed above. Both 1285 the algorithm and the C code are covered by a specific license. The 1286 contact information for obtaining the license is listed in the C 1287 code. 1289 GSM 1291 A reference implementation was written by Carsten Borman and Jutta 1292 Degener (TU Berlin, Germany). 
It is available at

1294 ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/

1296 Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
1297 code implementation of the RPE-LTP algorithm available as part of the
1298 ITU-T STL. The STL implementation is an adaptation of the TU Berlin
1299 version.

1301 LPC

1303 An implementation is available at

1305 ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z

1307 PCMU, PCMA

1309 An implementation of these algorithms is available as part of the
1310 ITU-T STL, described above. Code to convert between linear and mu-law
1311 companded data is also available in [6].

1313 Table of Contents

1315    1          Introduction
1316    2          RTP and RTCP Packet Forms and Protocol Behavior
1317    3          Registering Payload Types
1318    4          Audio
1319    4.1        Encoding-Independent Rules
1320    4.2        Operating Recommendations
1321    4.3        Guidelines for Sample-Based Audio Encodings
1322    4.4        Guidelines for Frame-Based Audio Encodings
1323    4.5        Audio Encodings
1324    4.5.1      1016
1325    4.5.2      CN
1326    4.5.3      DVI4
1327    4.5.4      G722
1328    4.5.5      G723
1329    4.5.6      G726-16, G726-24, G726-32, G726-40
1330    4.5.7      G727-16, G727-24, G727-32, G727-40
1331    4.5.8      G728
1332    4.5.9      G729
1333    4.5.10     GSM
1334    4.5.10.1   General Packaging Issues
1335    4.5.10.2   GSM variable names and numbers
1336    4.5.11     L8
1337    4.5.12     L16
1338    4.5.13     LPC
1339    4.5.14     MPA
1340    4.5.15     PCMA and PCMU
1341    4.5.16     RED
1342    4.5.17     SX7300P
1343    4.5.18     SX8300P
1344    4.5.19     VDVI
1345    5          Video
1346    5.1        CelB
1347    5.2        JPEG
1348    5.3        H261
1349    5.4        H263
1350    5.5        MPV
1351    5.6        MP2T
1352    5.7        nv
1353    6          Payload Type Definitions
1354    7          RTP over TCP and Similar Byte Stream Protocols
1355    8          Port Assignment
1356    9          Bibliography
1357    10         Acknowledgements
1358    11         Address of Author