idnits 2.17.1 draft-ietf-avt-tones-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 29 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 6 characters in excess of 72. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 138: '...ing events via RTP MAY send both named...' RFC 2119 keyword, line 145: '...one representation, it SHOULD send the...' RFC 2119 keyword, line 165: '...udio stream, and MUST use the same seq...' RFC 2119 keyword, line 191: '... A source MAY send events and coded ...' RFC 2119 keyword, line 193: '... stream, or it MAY block outgoing au...' (18 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 482 has weird spacing: '... Event encod...' == Line 929 has weird spacing: '...equency on pe...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. 
If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (December 9, 1999) is 8876 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 1213 looks like a reference -- Missing reference section? '2' on line 1217 looks like a reference -- Missing reference section? '3' on line 1221 looks like a reference -- Missing reference section? '4' on line 1226 looks like a reference -- Missing reference section? '5' on line 1230 looks like a reference -- Missing reference section? '6' on line 1234 looks like a reference -- Missing reference section? '7' on line 1239 looks like a reference -- Missing reference section? '8' on line 1243 looks like a reference -- Missing reference section? '9' on line 1250 looks like a reference -- Missing reference section? '10' on line 1255 looks like a reference -- Missing reference section? '11' on line 1259 looks like a reference -- Missing reference section? '12' on line 1265 looks like a reference -- Missing reference section? '13' on line 1273 looks like a reference -- Missing reference section? '14' on line 1278 looks like a reference -- Missing reference section? '15' on line 1282 looks like a reference -- Missing reference section? '16' on line 1286 looks like a reference -- Missing reference section? 
'17' on line 1291 looks like a reference -- Missing reference section? '18' on line 1296 looks like a reference -- Missing reference section? '19' on line 1301 looks like a reference Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 22 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force AVT WG 2 Internet Draft Schulzrinne/Petrack 3 draft-ietf-avt-tones-04.txt Columbia U./MetaTel 4 December 9, 1999 5 Expires: May 2000 7 RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals 9 STATUS OF THIS MEMO 11 This document is an Internet-Draft and is in full conformance with 12 all provisions of Section 10 of RFC2026. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that 16 other groups may also distribute working documents as Internet- 17 Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress". 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 Abstract 32 This memo describes how to carry dual-tone multifrequency (DTMF) 33 signaling, other tone signals and telephony events in RTP packets. 35 1 Introduction 37 This memo defines two payload formats, one for carrying dual-tone 38 multifrequency (DTMF) digits, other line and trunk signals (Section 39 3), and a second one for general multi-frequency tones in RTP [1] 40 packets (Section 4). 
Separate RTP payload formats are desirable since 41 low-rate voice codecs cannot be guaranteed to reproduce these tone 42 signals accurately enough for automatic recognition. Defining 43 separate payload formats also permits higher redundancy while 44 maintaining a low bit rate. 46 The payload formats described here may be useful in at least three 47 applications: DTMF handling for gateways and end systems, as well as 48 "RTP trunks". In the first application, the Internet telephony 49 gateway detects DTMF on the incoming circuits and sends the RTP 50 payload described here instead of regular audio packets. The gateway 51 likely has the necessary digital signal processors and algorithms, as 52 it often needs to detect DTMF, e.g., for two-stage dialing. Having 53 the gateway detect tones relieves the receiving Internet end system 54 from having to do this work and also keeps low bit-rate codecs 55 like G.723.1 from rendering DTMF tones unintelligible. In the second application, an Internet 56 end system such as an "Internet phone" can emulate DTMF functionality 57 without concerning itself with generating precise tone pairs and 58 without imposing the burden of tone recognition on the receiver. 60 In the "RTP trunk" application, RTP is used to replace a normal 61 circuit-switched trunk between two nodes. This is particularly of 62 interest in a telephone network that is still mostly 63 circuit-switched. In this case, each end of the RTP trunk encodes audio 64 channels into the appropriate encoding, such as G.723.1 or G.729. 65 However, this encoding process destroys in-band signaling information 66 that is carried using the least-significant bit ("robbed bit 67 signaling") and may also interfere with in-band signaling tones, such 68 as the MF digit tones. In addition, tone properties such as the phase 69 reversals in the ANSam tone will not survive speech coding. Thus, 70 the gateway needs to remove the in-band signaling information from 71 the bit stream. 
It can now either carry it out-of-band in a signaling 72 transport mechanism yet to be defined, or it can use the mechanism 73 described in this memorandum. (If the two trunk end points are within 74 reach of the same media gateway controller, the media gateway 75 controller can also handle the signaling.) Carrying it in-band may 76 simplify the time synchronization between audio packets and the tone 77 or signal information. This is particularly relevant where duration 78 and timing matter, as in the carriage of DTMF signals. 80 1.1 Terminology 82 In this document, the key words "MUST", "MUST NOT", "REQUIRED", 83 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", 84 and "OPTIONAL" are to be interpreted as described in RFC 2119 [2] and 85 indicate requirement levels for compliant implementations. 87 2 Events vs. Tones 89 A gateway has two options for handling DTMF digits and events. First, 90 it can simply measure the frequency components of the voice band 91 signals and transmit this information to the RTP receiver (Section 92 4). In this mode, the gateway makes no attempt to discern the meaning 93 of the tones, but simply distinguishes tones from speech signals. 95 All tone signals in use in the PSTN and meant for human consumption 96 are sequences of simple combinations of sine waves, either added or 97 modulated. (There is at least one tone, the ANSam tone [3] used for 98 indicating data transmission over voice lines, that makes use of 99 periodic phase reversals.) 101 As a second option, a gateway can recognize the tones and translate 102 them into a name, such as ringing or busy tone. The receiver then 103 produces a tone signal or other indication appropriate to the signal. 104 Generally, since the recognition of signals often depends on their 105 on/off pattern or the sequence of several tones, this recognition can 106 take several seconds. 
On the other hand, the gateway may have access 107 to the actual signaling information that generates the tones and thus 108 can generate the RTP packet immediately, without the detour through 109 acoustic signals. 111 In the phone network, tones are generated at different places, 112 depending on the switching technology and the nature of the tone. 113 This determines, for example, whether a person making a call to a 114 foreign country hears the local tones she is familiar with or the 115 tones used in the country called. 117 For analog lines, dial tone is always generated by the local switch. 118 ISDN terminals may generate dial tone locally and then send a Q.931 119 SETUP message containing the dialed digits. If the terminal just 120 sends a SETUP message without any Called Party digits, then the 121 switch collects the digits, which the terminal provides as KEYPAD 122 messages, and provides dial tone over the B-channel. The terminal can 123 either use the audio signal on the B-channel or use the Q.931 124 messages to trigger locally generated dial tone. 126 Ringing tone (also called ringback tone) is generated by the local 127 switch at the callee, with a one-way voice path opened up as soon as 128 the callee's phone rings. (This reduces the chance of clipping the 129 called party's response just after answer. It also permits pre-answer 130 announcements or in-band call-progress indications to reach the 131 caller before or in lieu of ringing tone.) Congestion tone and 132 special information tones can be generated by any of the switches 133 along the way, and may be generated by the caller's switch based on 134 ISUP messages received. Busy tone is generated either by the caller's 135 switch, triggered by the appropriate ISUP message (for analog 136 instruments), or by the ISDN terminal. 
138 Gateways which send signalling events via RTP MAY send both named 139 signals (Section 3) and the tone representation (Section 4) as a 140 single RTP session, using the redundancy mechanism defined in Section 141 3.7 to interleave the two representations. It is generally a good 142 idea to send both, since it allows the receiver to choose the 143 appropriate rendering. 145 If a gateway cannot present a tone representation, it SHOULD send the 146 audio tones as regular RTP audio packets (e.g., as payload format 147 PCMU), in addition to the named signals. 149 3 RTP Payload Format for Named Telephone Events 151 3.1 Introduction 153 The payload format for named telephone events described below is 154 suitable for both gateway and end-to-end scenarios. In the gateway 155 scenario, an Internet telephony gateway connecting a packet voice 156 network to the PSTN recreates the DTMF tones or other telephony 157 events and injects them into the PSTN. Since, for example, DTMF digit 158 recognition takes several tens of milliseconds, the first few 159 milliseconds of a digit will arrive as regular audio packets. Thus, 160 careful time and power (volume) alignment between the audio samples 161 and the events is needed to avoid generating spurious digits at the 162 receiver. 164 DTMF digits and named telephone events are carried as part of the 165 audio stream, and MUST use the same sequence number and time-stamp 166 base as the regular audio channel to simplify the generation of audio 167 waveforms at a gateway. The default clock frequency is 8,000 Hz, but 168 the clock frequency can be redefined when assigning the dynamic 169 payload type. 171 The payload format described here achieves a higher redundancy even 172 in the case of sustained packet loss than the method proposed for the 173 Voice over Frame Relay Implementation Agreement [4]. 
175 If an end system is directly connected to the Internet and does not 176 need to generate tone signals again, time alignment and power levels 177 are not relevant. These systems rely on PSTN gateways or Internet end 178 systems to generate DTMF events and do not perform their own audio 179 waveform analysis. An example of such a system is an Internet 180 interactive voice-response (IVR) system. 182 In circumstances where exact timing alignment between the audio 183 stream and the DTMF digits or other events is not important and data 184 is sent unicast, such as the IVR example mentioned earlier, it may be 185 preferable to use a reliable control protocol rather than RTP 186 packets. In those circumstances, this payload format would not be 187 used. 189 3.2 Simultaneous Generation of Audio and Events 191 A source MAY send events and coded audio packets for the same time 192 instants, using events as the redundant encoding for the audio 193 stream, or it MAY block outgoing audio while event tones are active 194 and only send named events as both the primary and redundant 195 encodings. 197 Note that a period covered by an encoded tone may overlap in time 198 with a period of audio encoded by other means. This is likely to 199 occur at the onset of a tone and is necessary to avoid possible 200 errors in the interpretation of the reproduced tone at the remote 201 end. Implementations supporting this payload format must be prepared 202 to handle the overlap. It is RECOMMENDED that gateways only render 203 the encoded tone since the audio may contain spurious tones 204 introduced by the audio compression algorithm. However, it is 205 anticipated that these extra tones in general should not interfere 206 with recognition at the far end. 
208 3.3 Event Types 210 This payload format is used for five different types of signals: 212 o DTMF tones (Section 3.10); 214 o fax-related tones (Section 3.11); 216 o standard subscriber line tones (Section 3.12); 218 o country-specific subscriber line tones (Section 3.13); and 220 o trunk events (Section 3.14). 222 A compliant implementation MUST support the events listed in Table 1. 223 If it uses some other, out-of-band mechanism for signaling line 224 conditions, it does not have to implement the other events. 226 In some cases, an implementation may simply ignore certain events, 227 such as fax tones, that do not make sense in a particular 228 environment. Section 3.9 specifies how an implementation can use the 229 SDP "fmtp" parameter within an SDP description to indicate its 230 inability to understand a particular event or range of events. 232 Depending on the available user interfaces, an implementation MAY 233 render all tones in Table 5 the same or, preferably, use the tones 234 conveyed by the concurrent "tone" payload or other RTP audio payload. 235 Alternatively, it could provide a textual representation. 237 Note that end systems that emulate telephones only need to support 238 the events described in Sections 3.10 and 3.12, while systems that 239 receive trunk signaling need to implement those in Sections 3.10, 240 3.11, 3.12 and 3.14, since MF trunks also carry most of the "line" 241 signals. Systems that do not support fax or modem functionality do 242 not need to render the fax-related events described in Section 3.11. 244 The RTP payload format is designated as "telephone-event", the MIME 245 type as "audio/telephone-event". The default timestamp rate is 8000 246 Hz, but other rates may be defined. In accordance with current 247 practice, this payload format does not have a static payload type 248 number, but uses an RTP payload type number established dynamically 249 and out-of-band. 
251 3.4 Use of RTP Header Fields 253 Timestamp: The RTP timestamp reflects the measurement point for 254 the current packet. The event duration described in Section 255 3.5 extends forwards from that time. The receiver 256 calculates jitter for RTCP receiver reports based on all 257 packets with a given timestamp. Note: The jitter value 258 should primarily be used as a means for comparing the 259 reception quality between two users or two time-periods, 260 not as an absolute measure. 262 Marker bit: The RTP marker bit indicates the beginning of a new 263 event. 265 3.5 Payload Format 267 The payload format is shown in Fig. 1.
269     0                   1                   2                   3
270     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
271    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
272    |     event     |E|R| volume    |          duration             |
273    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
275 Figure 1: Payload Format for Named Events 277 event: The events are encoded as shown in Sections 3.10 through 278 3.14. 280 volume: For DTMF digits and other events representable as tones, 281 this field describes the power level of the tone, expressed 282 in dBm0 after dropping the sign. Power levels range from 0 283 to -63 dBm0. The range of valid DTMF is from 0 to -36 dBm0 284 (must accept); lower than -55 dBm0 must be rejected 285 (TR-TSY-000181, ITU-T Q.24A). Thus, larger values denote lower 286 volume. This value is defined only for DTMF digits. For 287 other events, it is set to zero by the sender and is 288 ignored by the receiver. 290 duration: Duration of this event, in timestamp units. Thus, the 291 event began at the instant identified by the RTP timestamp 292 and has so far lasted as long as indicated by this 293 parameter. The event may or may not have ended. 295 For a sampling rate of 8000 Hz, this 16-bit field is 296 sufficient to express event durations of up to 297 approximately 8.19 seconds (65,535 timestamp units). 
299 E: If set to a value of one, the "end" bit indicates that this 300 packet contains the end of the event. Thus, the duration 301 parameter above measures the complete duration of the 302 event. 304 A sender MAY delay setting the end bit until retransmitting 305 the last packet for a tone, rather than setting it on its first 306 transmission. This avoids having to wait to detect whether 307 the tone has indeed ended. 309 Receiver implementations MAY use different algorithms to 310 create tones, including the two described here. In the 311 first, the receiver simply places a tone of the given 312 duration in the audio playout buffer at the location 313 indicated by the timestamp. As additional packets are 314 received that extend the same tone, the waveform in the 315 playout buffer is extended accordingly. (Care has to be 316 taken if audio is mixed, i.e., summed, in the playout 317 buffer rather than simply copied.) Thus, if a packet in a 318 tone lasting longer than the packet interarrival time gets 319 lost and the playout delay is short, a gap in the tone may 320 occur. Alternatively, the receiver can start a tone and 321 play it until it receives a packet with the "E" bit set, 322 receives the next tone (distinguished by a different timestamp value), 323 or a given time period elapses. This is more robust against 324 packet loss, but may extend the tone if all retransmissions 325 of the last packet in an event are lost. Limiting the time 326 period of extending the tone is necessary to keep a 327 tone from "getting stuck". Regardless of the algorithm used, the 328 tone SHOULD NOT be extended by more than three packet 329 interarrival times. A slight extension of tone durations 330 and shortening of pauses is generally harmless. 332 R: This field is reserved for future use. The sender MUST set it 333 to zero, and the receiver MUST ignore it. 
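As an illustration of the layout in Fig. 1, the following Python sketch packs and unpacks the four-octet named-event payload. It is only a sketch and not part of the payload format definition: the function names pack_event and unpack_event are hypothetical, and the code assumes that the volume is supplied as the unsigned 6-bit value described above (the dBm0 level with the sign dropped).

```python
import struct

def pack_event(event, end, volume, duration):
    """Pack one named-event payload (Fig. 1) into 4 octets, network byte order.

    Hypothetical helper for illustration only.
    """
    if not 0 <= event <= 0xFF:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 0x3F:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second octet: E bit (0x80), R bit (0x40, always sent as zero), volume.
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags, duration)

def unpack_event(payload):
    """Return (event, end, volume, duration); the reserved R bit is ignored."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return event, bool(flags & 0x80), flags & 0x3F, duration

# Longest duration the 16-bit field can express at the default 8000 Hz clock.
MAX_DURATION_SECONDS = 0xFFFF / 8000  # about 8.19 seconds
```

For instance, a digit "9" with the E bit set, a volume of 7 and a duration of 1600 timestamp units packs to the octets 09 87 06 40.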
335 3.6 Sending Event Packets 337 An audio source SHOULD start transmitting event packets as soon as it 338 recognizes an event and then every 50 ms thereafter, or at the packet interval 339 of the audio codec used for this session, if known. (The sender does 340 not need to maintain precise time intervals between event packets in 341 order to maintain precise inter-event times, since the timing 342 information is contained in the timestamp.) 344 Q.24 [5], Table A-1, indicates that all administrations 345 surveyed use a minimum signal duration of 40 ms, with 346 signaling velocity (tone and pause) of no less than 93 ms. 348 If an event continues for more than one period, the source generating 349 the events should send a new event packet with the RTP timestamp 350 value corresponding to the beginning of the event and the duration of 351 the event increased correspondingly. (The RTP sequence number is 352 incremented by one for each packet.) If there has been no new event 353 in the last interval, the event SHOULD be retransmitted three times 354 or until the next event is recognized. This ensures that the duration 355 of the event can be recognized correctly even if the last packet for 356 an event is lost. 358 DTMF digits and events are sent incrementally to avoid 359 having the receiver wait for the completion of the event. 360 Since some tones are two seconds long, waiting would incur a 361 substantial delay. The transmitter does not know if event 362 length is important and thus needs to transmit immediately 363 and incrementally. If the receiver application does not 364 care about event length, the incremental transmission 365 mechanism avoids delay. Some applications, such as gateways 366 into the PSTN, care about both delays and event duration. 368 3.7 Reliability 370 During an event, the RTP event payload format provides incremental 371 updates on the event. The error resiliency depends on the playout 372 delay at the receiver. 
For example, for a playout delay of 120 ms and 373 a packet gap of 50 ms, two packets in a row can get lost without 374 causing a gap in the tones generated at the receiver. 376 The audio redundancy mechanism described in RFC 2198 [6] MAY be used 377 to recover from packet loss across events. The effective data rate is 378 r times 64 bits (32 bits for the redundancy header and 32 bits for 379 the telephone-event payload) every 50 ms, or r times 1280 bits/second, 380 where r is the number of redundant events carried in each packet. The 381 value of r is an implementation trade-off, with a value of 5 382 suggested. 384 The timestamp offset in this redundancy scheme has 14 bits, 385 so that it allows a single packet to "cover" 2.048 seconds 386 of telephone events at a sampling rate of 8000 Hz. 387 Including the starting time of previous events allows 388 precise reconstruction of the tone sequence at a gateway. 389 The scheme is resilient to consecutive packet losses 390 spanning this interval of 2.048 seconds or r digits, 391 whichever is less. Note that for previous digits, only an 392 average loudness can be represented. 394 An encoder MAY treat the event payload as a highly-compressed version 395 of the current audio frame. In that mode, each RTP packet during an 396 event would contain the current audio codec rendition (say, G.723.1 or 397 G.729) of this digit as well as the representation described in 398 Section 3.5, plus any previous events seen earlier. 400 This approach allows dumb gateways that do not understand 401 this format to function. See also the discussion in Section 402 1. 404 3.8 Example 406 This example shows a typical RTP packet, in which the user is just dialing the last digit 407 of the DTMF sequence "911". 
The first digit was 200 ms long (1600 408 timestamp units) and started at time 0; the second digit lasted 250 409 ms (2000 timestamp units) and started at time 800 ms (6400 timestamp 410 units); the third digit was pressed at time 1.4 s (11,200 timestamp 411 units), and the packet shown was sent at 1.45 s (11,600 timestamp 412 units). The frame duration is 50 ms. To make the parts recognizable, 413 the figure below ignores byte alignment. Timestamp and sequence 414 number are assumed to have been zero at the beginning of the first 415 digit. In this example, the dynamic payload types 96 and 97 have been 416 assigned for the redundancy mechanism and the telephone event 417 payload, respectively.
419     0                   1                   2                   3
420     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
421    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
422    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
423    | 2 |0|0|   0   |0|     96      |              28               |
424    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
425    |                           timestamp                           |
426    |                             11200                             |
427    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
428    |           synchronization source (SSRC) identifier            |
429    |                           0x5234a8                            |
430    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
431    |F|   block PT  |     timestamp offset      |   block length    |
432    |1|     97      |            11200          |         4         |
433    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
434    |F|   block PT  |     timestamp offset      |   block length    |
435    |1|     97      |   11200 - 6400 = 4800     |         4         |
436    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
437    |F|   Block PT  |
438    |0|     97      |
439    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
440    |     digit     |E R| volume    |          duration             |
441    |       9       |1 0|     7     |             1600              |
442    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
443    |     digit     |E R| volume    |          duration             |
444    |       1       |1 0|    10     |             2000              |
445    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
446    |     digit     |E R| volume    |          duration             |
447    |       1       |0 0|    20     |              400              |
448    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
450 3.9 Indication of Receiver Capabilities using SDP 452 Receivers MAY indicate which named events they can handle, for 453 example, by using the Session Description Protocol (RFC 2327 [7]). 454 The payload formats use the following fmtp format to list the event 455 values that they can receive:
457    a=fmtp:<format> <list of values>
459 The list of values consists of comma-separated elements, which can be 460 either a single decimal number or two decimal numbers separated by a 461 hyphen (dash), where the second number is larger than the first. No 462 whitespace is allowed between numbers or hyphens. The list does not 463 have to be sorted. 465 For example, if the payload format uses the payload type number 100, 466 and the implementation can handle the common DTMF tones (events 0 467 through 11) and the dial and ringing tones, it would include the 468 following description in its SDP message:
470    a=fmtp:100 0-11,66,70
472 The corresponding MIME parameter is "events", so that the following 473 sample media type definition corresponds to the SDP example above:
475    audio/telephone-event;events="0-11,66,70";rate="8000"
477 3.10 DTMF Events 479 Table 1 summarizes the DTMF-related named events within the 480 telephone-event payload format.
482    Event    encoding (decimal)
483    _________________________
484    0--9                0--9
485    *                     10
486    #                     11
487    A--D              12--15
488    Flash                 16
490 Table 1: DTMF named events 492 3.11 Data Modem and Fax Events 494 Table 3 summarizes the events and tones that can appear on a 495 subscriber line serving a fax machine or modem. The tones are 496 described below, with additional detail in Table 7. 498 ANS: This 2100 +/- 15 Hz tone is used to disable echo 499 suppression for data transmission [8,9]. For fax machines, 500 Recommendation T.30 [9] refers to this tone as called 501 terminal identification (CED) answer tone. 
503 /ANS: This is the same signal as ANS, except that it reverses 504 phase at an interval of 450 +/- 25 ms. It disables both 505 echo cancellers and echo suppressors. (In the ITU 506 Recommendation, this signal is rendered as ANS with a bar 507 on top.) 509 ANSam: The modified answer tone (ANSam) [3] is a sinewave signal 510 at 2100 +/- 1 Hz with phase reversals at an interval of 450 511 +/- 25 ms, amplitude-modulated by a sinewave at 15 +/- 0.1 512 Hz. This tone [10,8] is sent by modems [11] and faxes to 513 disable echo suppressors. 515 /ANSam: This is the same signal as ANSam, except that it 516 reverses phase at an interval of 450 +/- 25 ms. It disables 517 both echo cancellers and echo suppressors. (In the ITU 518 Recommendation, this signal is rendered as ANSam with a bar 519 on top.) 521 CNG: After dialing the called fax machine's telephone number 522 (and before it answers), the calling Group III fax machine 523 (optionally) begins sending a CalliNG tone (CNG) consisting 524 of an interrupted tone of 1100 Hz [9]. 526 CRd: Capabilities Request (CRd) [12] is a dual-tone signal with 527 tones at 1375 Hz and 2002 Hz for 400 ms for the 528 initiating side and 1529 Hz and 2225 Hz for the responding 529 side, followed by a single tone at 1900 Hz for 100 ms. 530 "This signal requests the remote station transition from 531 telephony mode to an information transfer mode and requests 532 the transmission of a capabilities list message by the 533 remote station. In particular, CRd is sent by the 534 initiating station during the course of a call, or by the 535 calling station at call establishment in response to a CRe 536 or MRe." 538 CRe: Capabilities Request (CRe) [12] is a dual-tone signal with 539 tones at 1375 Hz and 2002 Hz for 400 ms, followed 540 by a single tone at 400 Hz for 100 ms. "This signal 541 requests the remote station transition from telephony mode 542 to an information transfer mode and requests the 543 transmission of a capabilities list message by the remote 544 station. In particular, CRe is sent by an automatic 545 answering station at call establishment." 547 CT: "The calling tone [8] consists of a series of interrupted 548 bursts of binary 1 signal or 1300 Hz, on for a duration of 549 not less than 0.5 s and not more than 0.7 s and off for a 550 duration of not less than 1.5 s and not more than 2.0 s." 551 Modems not starting with the V.8 call initiation tone often 552 use this tone. 554 ESi: Escape Signal (ESi) [12] is a dual-tone signal with tones 555 at 1375 Hz and 2002 Hz for 400 ms, followed by a single 556 tone at 980 Hz for 100 ms. "This signal requests the remote 557 station transition from telephony mode to an information 558 transfer mode. signal ESi is sent by the initiating 559 station." 561 ESr: Escape Signal (ESr) [12] is a dual-tone signal with tones 562 at 1529 Hz and 2225 Hz for 400 ms, followed by a single 563 tone at 1650 Hz for 100 ms. It is the same as ESi, but is sent by the 564 responding station. 566 MRd: Mode Request (MRd) [12] is a dual-tone signal with tones 567 at 1375 Hz and 2002 Hz for 400 ms for the initiating side 568 and 1529 Hz and 2225 Hz for the responding side, followed 569 by a single tone at 1150 Hz for 100 ms. "This signal 570 requests the remote station transition from telephony mode 571 to an information transfer mode and requests the 572 transmission of a mode select message by the remote 573 station. In particular, signal MRd is sent by the 574 initiating station during the course of a call, or by the 575 calling station at call establishment in response to an 576 MRe." [12] 578 MRe: Mode Request (MRe) [12] is a dual-tone signal with tones at 579 1375 Hz and 2002 Hz for 400 ms, followed by a single tone 580 at 650 Hz for 100 ms. "This signal requests the remote 581 station transition from telephony mode to an information 582 transfer mode and requests the transmission of a mode 583 select message by the remote station. In particular, signal 584 MRe is sent by an automatic answering station at call 585 establishment." [12] 587 V.21: V.21 describes a 300 b/s full-duplex modem that employs 588 frequency shift keying (FSK). It is used by Group 3 fax 589 machines to exchange T.30 information. The calling modem 590 transmits on channel 1 and receives on channel 2; the 591 answering modem transmits on channel 2 and receives on 592 channel 1. Each bit value has a distinct tone, so that V.21 593 signaling comprises a total of four distinct tones. 595 In summary, the procedures in Table 2 are used. 597 3.12 Line Events 599 Table 4 summarizes the events and tones that can appear on a 600 subscriber line.
602    Procedure                        indications
603    ________________________________________________________
604    V.25 and V.8                     ANS, ANS, ...
605    V.25, echo canceller disabled    ANS, /ANS, ANS, /ANS
606    V.8                              ANSam, ANSam, ...
607    V.8, echo canceller disabled     ANSam, /ANSam, ANSam, ...
609 Table 2: Use of ANS, ANSam and /ANSam in V.x recommendations
611    Event                      encoding (decimal)
612    Answer tone (ANS)          32
613    /ANS                       33
614    ANSam                      34
615    /ANSam                     35
616    Calling tone (CNG)         36
617    V.21 channel 1, "0" bit    37
618    V.21 channel 1, "1" bit    38
619    V.21 channel 2, "0" bit    39
620    V.21 channel 2, "1" bit    40
621    CRd                        41
622    CRe                        42
623    ESi                        43
624    ESr                        44
625    MRd                        45
626    MRe                        46
627    CT                         47
629 Table 3: Data and fax named events 631 ITU Recommendation E.182 [13] defines when certain tones should be 632 used. It defines the following standard tones that are heard by the 633 caller: 635 Dial tone: The exchange is ready to receive address information. 637 PABX internal dial tone: The PABX is ready to receive address 638 information. 
640 Special dial tone: Same as dial tone, but the caller's line is 641 subject to a specific condition, such as call diversion or 642 available voice mail (e.g., "stutter dial tone"). 644 Second dial tone: The network has accepted the address 645 information, but additional information is required. 647 Ring: This named signal event causes the recipient to generate 648 an alerting signal ("ring"). The actual tone or other 649 indication used to render this named event is left up to 650 the receiver. (This differs from the ringing tone, below, 651 heard by the caller.) 653 Ringing tone: The call has been placed to the callee and a 654 calling signal (ringing) is being transmitted to the 655 callee. This tone is also called "ringback". 657 Special ringing tone: A special service, such as call forwarding 658 or call waiting, is active at the called number. 660 Busy tone: The called telephone number is busy. 662 Congestion tone: Facilities necessary for the call are 663 temporarily unavailable. 665 Calling card service tone: The calling card service tone 666 consists of 60 ms of the sum of 941 Hz and 1477 Hz tones 667 (DTMF '#'), followed by 940 ms of 350 Hz and 440 Hz (U.S. 668 dial tone), decaying exponentially with a time constant of 669 200 ms. 671 Special information tone: The callee cannot be reached, but the 672 reason is neither "busy" nor "congestion". This tone should 673 be used before all call failure announcements, for the 674 benefit of automatic equipment. 676 Comfort tone: The call is being processed. This tone may be used 677 during long post-dial delays, e.g., in international 678 connections. 680 Hold tone: The caller has been placed on hold. Replaced by 681 Greensleeves. 683 Record tone: The caller has been connected to an automatic 684 answering device and is requested to begin speaking. 686 Caller waiting tone: The called station is busy, but has call 687 waiting service.
689 Pay tone: The caller, at a payphone, is reminded to deposit 690 additional coins. 692 Positive indication tone: The supplementary service has been 693 activated. 695 Negative indication tone: The supplementary service could not be activated. 697 Off-hook warning tone: The caller has left the instrument off- 698 hook for an extended period of time. 700 The following tones can be heard by either the calling or called party 701 during a conversation: 703 Call waiting tone: Another party wants to reach the subscriber. 705 Warning tone: The call is being recorded. This tone is not 706 required in all jurisdictions. 708 Intrusion tone: The call is being monitored, e.g., by an 709 operator. (Use by law enforcement authorities is optional.) 711 CPE alerting signal: A tone used to alert a device to an 712 arriving in-band FSK data transmission. A CPE alerting 713 signal is a combined 2130 and 2750 Hz tone, both with 714 tolerances of 0.5% and a duration of 80 to 80 ms. The CPE 715 alerting signal is used with ADSI services and Call Waiting 716 ID services [14]. 718 The following tones are heard by operators: 720 Payphone recognition tone: The person making the call or being 721 called is using a payphone (and thus it is ill-advised to 722 allow collect calls to such a person). 724 3.13 Extended Line Events 726 Table 5 summarizes country-specific events and tones that can appear 727 on a subscriber line. 729 3.14 Trunk Events 731 Table 6 summarizes the events and tones that can appear on a trunk. 732 Note that a trunk can also carry line events (Section 3.12), as MF 733 signaling does not include backward signals [15]. 735 ABCD transitional: 4-bit signaling used by digital trunks. For 736 N-state signaling, the first N values are used. 738 The T1 ESF (extended super frame format) allows 2, 4, and 739 16 state signalling bit options.
These signalling bits are 740 Event encoding (decimal) 741 _____________________________________________ 742 Off Hook 64 743 On Hook 65 744 Dial tone 66 745 PABX internal dial tone 67 746 Special dial tone 68 747 Second dial tone 69 748 Ringing tone 70 749 Special ringing tone 71 750 Busy tone 72 751 Congestion tone 73 752 Special information tone 74 753 Comfort tone 75 754 Hold tone 76 755 Record tone 77 756 Caller waiting tone 78 757 Call waiting tone 79 758 Pay tone 80 759 Positive indication tone 81 760 Negative indication tone 82 761 Warning tone 83 762 Intrusion tone 84 763 Calling card service tone 85 764 Payphone recognition tone 86 765 CPE alerting signal (CAS) 87 766 Off-hook warning tone 88 767 Ring 89 769 Table 4: E.182 line events 771 named A, B, C, and D. Signalling information is sent as 772 robbed bits in frames 6, 12, 18, and 24 when using ESF T1 773 framing. A D4 superframe only transmits 4-state signalling 774 with A and B bits. On the CEPT E1 frame, all signalling is 775 carried in timeslot 16, and two channels of 16-state (ABCD) 776 signalling are sent per frame. 778 Since this information is a state rather than a changing 779 signal, implementations SHOULD use the following triple- 780 redundancy mechanism, similar to the one specified in ITU-T 781 Rec. I.366.2 [16], Annex L. At the time of a transition, 782 the same ABCD information is sent 3 times at an interval of 783 5 ms. If another transition occurs during this time, then 784 this continues. After a period of no change, the ABCD 785 information is sent every 5 seconds. 
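The triple-redundancy mechanism just described can be sketched as a send schedule. The Python below is only an illustration under stated assumptions and is not part of this specification: the function names and parameters are hypothetical, a new transition is assumed to cut the current burst short and start a fresh one, the 5-second refresh is assumed to count from the last burst packet, and the 4-bit ABCD value is assumed to map onto event codes 144 through 159 of Table 6 by simple addition.

```python
ABCD_BASE_EVENT = 144  # assumption: ABCD value v is carried as event 144 + v


def abcd_event(state):
    """Map a 4-bit ABCD state (0..15) to an event code (assumed mapping)."""
    if not 0 <= state <= 15:
        raise ValueError("ABCD state must fit in 4 bits")
    return ABCD_BASE_EVENT + state


def abcd_send_schedule(transitions, end_ms, repeats=3, gap_ms=5,
                       refresh_ms=5000):
    """Return [(send_time_ms, event_code)] for sorted (time_ms, state) pairs.

    Each transition is sent `repeats` times, `gap_ms` apart. A later
    transition cuts the burst short (assumption). With no change, the
    state is refreshed every `refresh_ms` until `end_ms`.
    """
    sends = []
    for i, (t, state) in enumerate(transitions):
        # The current state is valid until the next transition or end_ms.
        if i + 1 < len(transitions):
            next_t = transitions[i + 1][0]
        else:
            next_t = end_ms
        burst = [t + k * gap_ms for k in range(repeats)
                 if t + k * gap_ms < next_t]
        if not burst:
            continue
        event = abcd_event(state)
        sends.extend((x, event) for x in burst)
        # Steady-state refresh after the burst.
        x = burst[-1] + refresh_ms
        while x < next_t:
            sends.append((x, event))
            x += refresh_ms
    return sends


if __name__ == "__main__":
    for when, event in abcd_send_schedule([(0, 0b0101), (7, 0b1111)], 10020):
        print(when, event)
```

In this sketch a transition at 7 ms interrupts the burst that began at 0 ms, so the first state is sent only twice before the new state takes over.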
787 Event encoding (decimal) 788 ___________________________________________________ 789 Acceptance tone 96 790 Confirmation tone 97 791 Dial tone, recall 98 792 End of three party service tone 99 793 Facilities tone 100 794 Line lockout tone 101 795 Number unobtainable tone 102 796 Offering tone 103 797 Permanent signal tone 104 798 Preemption tone 105 799 Queue tone 106 800 Refusal tone 107 801 Route tone 108 802 Valid tone 109 803 Waiting tone 110 804 Warning tone (end of period) 111 805 Warning Tone (PIP tone) 112 807 Table 5: Country-specific Line events 809 Wink: A brief transition, typically 120-290 ms, from on-hook 810 (unseized) to off-hook (seized) and back to on-hook, used by 811 the incoming exchange to signal that the call address 812 signaling can proceed. 814 Incoming seizure: Incoming indication of call attempt (off- 815 hook). 817 Return seizure: Seizure by answering exchange, in response to 818 outgoing seizure. [NOTE: Not clear why the difference here, 819 but not for Unseize. Should probably be just Seizure.] 821 Unseize circuit: Transition of circuit from off-hook to on-hook 822 at the end of a call. 824 Wink off: A brief transition, typically 100-350 ms, from off- 825 hook (seized) to on-hook (unseized) and back to off-hook 826 (seized). Used in operator services trunks. 828 Continuity tone send: A tone of 2010 Hz. 830 Continuity tone detect: A tone of 2010 Hz. 832 Continuity test send: A tone of 1780 Hz is sent by the calling 833 exchange. If received by the called exchange, it returns a 835 Event encoding (decimal) 836 __________________________________________________ 837 MF 0... 9 128... 137 838 MF K0 or KP (start-of-pulsing) 138 839 MF K1 139 840 MF K2 140 841 MF S0 to ST (end-of-pulsing) 141 842 MF S1... S3 142... 143 843 ABCD signaling (see below) 144...
159 844 Wink 160 845 Wink off 161 846 Incoming seizure 162 847 Return seizure 163 848 Unseize circuit 164 849 Continuity test 165 850 Default continuity tone 166 851 Continuity tone (single tone) 167 852 Continuity test send 168 853 Continuity verified 170 854 Loopback 171 855 Old milliwatt tone (1000 Hz) 172 856 New milliwatt tone (1004 Hz) 173 858 Table 6: Trunk events 860 "continuity verified" tone. 862 Continuity verified: A tone of 2010 Hz. This is a response tone, 863 used in dual-tone procedures. 865 4 RTP Payload Format for Telephony Tones 867 4.1 Introduction 869 As an alternative to describing tones and events by name, as 870 described in Section 3, it is sometimes preferable to describe them 871 by their waveform properties. In particular, recognition is faster 872 than for naming signals since it does not depend on recognizing 873 durations or pauses. 875 There is no single international standard for telephone tones such as 876 dial tone, ringing (ringback), busy, congestion ("fast-busy"), 877 special announcement tones or some of the other special tones, such 878 as payphone recognition, call waiting or record tone. However, across 879 all countries, these tones share a number of characteristics [17]: 881 o Telephony tones consist of either a single tone, the addition 882 of two or three tones or the modulation of two tones. (Almost 883 all tones use two frequencies; only the Hungarian "special 884 dial tone" has three.) Tones that are mixed have the same 885 amplitude and do not decay. 887 o Tones for telephony events are in the range of 25 (ringing 888 tone in Angola) to 1800 Hz. CED is the highest used tone at 889 2100 Hz. The telephone frequency range is limited to 3,400 Hz. 891 o Modulation frequencies range from 15 (ANSam tone) to 480 Hz 892 (Jamaica). Non-integer frequencies are used only for 893 frequencies of 16 2/3 and 33 1/3 Hz. (These fractional 894 frequencies appear to be derived from older AC power grid 895 frequencies.)
897 o Tones that are not continuous have durations of less than four 898 seconds. 900 o ITU Recommendation E.180 [18] notes that different telephone 901 companies require a tone accuracy of between 0.5 and 1.5%. 902 The Recommendation suggests a frequency tolerance of 1%. 904 4.2 Examples of Common Telephone Tone Signals 906 As an aid to the implementor, Table 7 summarizes some common tones. 907 The rows labeled "ITU ..." refer to the general recommendation of 908 Recommendation E.180 [18]. Note that there are no specific guidelines 909 for these tones. In the table, the symbol "+" indicates addition of 910 the tones, without modulation, while "*" indicates amplitude 911 modulation. The meaning of some of the tones is described in Section 912 3.12 or Section 3.11 (for V.21). 914 4.3 Use of RTP Header Fields 916 Timestamp: The RTP timestamp reflects the measurement point for 917 the current packet. The event duration described in Section 918 3.5 extends forwards from that time. 920 4.4 Payload Format 922 Based on the characteristics described above, this document defines 923 an RTP payload format called "tone" that can represent tones 924 consisting of one or more frequencies. (The corresponding MIME type 925 is "audio/tone".) The default timestamp rate is 8,000 Hz, but other 926 rates may be defined. Note that the timestamp rate does not affect 927 the interpretation of the frequency, just the durations. 929 Tone name frequency on period off period 930 ______________________________________________________ 931 CNG 1100 0.5 3.0 932 V.25 CT 1300 0.5 2.0 933 CED 2100 3.3 -- 934 ANS 2100 3.3 -- 935 ANSam 2100*15 3.3 -- 936 V.21 "0" bit, ch. 1 1180 0.033 937 V.21 "1" bit, ch. 1 980 0.033 938 V.21 "0" bit, ch. 2 1850 0.033 939 V.21_"1"_bit,_ch._2________1650______0.033____________ 940 ITU dial tone 425 -- -- 941 U.S. 
dial tone 350+440 -- -- 942 ______________________________________________________ 943 ITU ringing tone 425 0.67--1.5 3--5 944 U.S._ringing_tone_______440+480________2.0_________4.0 945 ITU busy tone 425 946 U.S. busy tone 480+620 0.5 0.5 947 ______________________________________________________ 948 ITU congestion tone 425 949 U.S. congestion tone 480+620 0.25 0.25 951 Table 7: Examples of telephony tones 953 In accordance with current practice, this payload format does not 954 have a static payload type number, but uses an RTP payload type number 955 established dynamically and out-of-band. 957 The payload format is shown in Fig. 2. 959 The payload contains the following fields: 961 modulation: The modulation frequency, in Hz. The field is a 9- 962 bit unsigned integer, allowing modulation frequencies up to 963 511 Hz. If there is no modulation, this field has a value 964 of zero. 966 T: If the "T" bit is set (one), the modulation frequency is to 967 be divided by three. Otherwise, the modulation frequency is 968 taken as is. 970 This bit allows frequencies accurate to 1/3 Hz, since 971 modulation frequencies such as 16 2/3 Hz are in 972 practical use. 974 0 1 2 3 975 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 976 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 977 | modulation |T| volume | duration | 978 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 979 |R R R R| frequency |R R R R| frequency | 980 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 981 |R R R R| frequency |R R R R| frequency | 982 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 983 ......
985 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 986 |R R R R| frequency |R R R R| frequency | 987 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 989 Figure 2: Payload format for tones 991 volume: The power level of the tone, expressed in dBm0 after 992 dropping the sign, with range from 0 to -63 dBm0. (Note: A 993 preferred level range for digital tone generators is -8 994 dBm0 to -3 dBm0.) 996 duration: The duration of the tone, measured in timestamp units. 997 The tone begins at the instant identified by the RTP 998 timestamp and lasts for the duration value. 1000 The definition of duration corresponds to that for 1001 sample-based codecs, where the timestamp represents 1002 the sampling point for the first sample. 1004 frequency: The frequencies of the tones to be added, measured in 1005 Hz and represented as a 12-bit unsigned integer. The field 1006 size is sufficient to represent frequencies up to 4095 Hz, 1007 which exceeds the range of telephone systems. A value of 1008 zero indicates silence. A single tone can contain any 1009 number of frequencies. 1011 R: This field is reserved for future use. The sender MUST set it 1012 to zero, the receiver MUST ignore it. 1014 4.5 Reliability 1016 This payload format uses the reliability mechanism described in 1017 Section 3.7. 1019 5 Combining Tones and Named Events 1021 The payload formats in Sections 3 and 4 can be combined into a single 1022 payload using the method specified in RFC 2198. Fig. 3 shows an 1023 example. In that example, the RTP packet combines two "tone" and one 1024 "telephone-event" payloads. The payload types are chosen arbitrarily 1025 as 97 and 98, respectively, with a sample rate of 8000 Hz. Here, the 1026 redundancy format has the dynamic payload type 96. 1028 The packet represents a snapshot of U.S. 
ringing tone, 1.5 seconds 1029 (12,000 timestamp units) into the second "on" part of the 2.0/4.0 1030 second cadence, i.e., a total of 7.5 seconds (60,000 timestamp units) 1031 into the ring cycle. The 440 + 480 Hz tone of this second cadence 1032 started at RTP timestamp 48,000. Four seconds of silence preceded it, 1033 but since RFC 2198 only has a fourteen-bit offset, only 2.05 seconds 1034 (16383 timestamp units) can be represented. Even though the tone 1035 sequence is not complete, the sender was able to determine that this 1036 is indeed ringback, and thus includes the corresponding named event. 1038 6 MIME Registration 1040 6.1 audio/telephone-event 1042 MIME media type name: audio 1044 MIME subtype name: telephone-event 1046 Required parameters: none. 1048 Optional parameters: The "events" parameter lists the events 1049 supported by the implementation. Events are listed as one 1050 or more comma-separated elements. Each element can either 1051 be a single integer or two integers separated by a hyphen. 1052 No white space is allowed in the argument. The integers 1053 designate the event numbers supported by the 1054 implementation. 1056 The "rate" parameter describes the sampling rate, in Hertz. 1057 The number is written as a floating point number or as an 1058 integer. If omitted, the default value is 8000 Hz. 1060 Encoding considerations: This type is only defined for transfer 1061 via RTP [1]. 
1063 Security considerations: See the "Security Considerations" 1065 0 1 2 3 1066 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1067 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1068 | V |P|X| CC |M| PT | sequence number | 1069 | 2 |0|0| 0 |0| 96 | 31 | 1070 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1071 | timestamp | 1072 | 48000 | 1073 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1074 | synchronization source (SSRC) identifier | 1075 | 0x5234a8 | 1076 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1077 |F| block PT | timestamp offset | block length | 1078 |1| 98 | 16383 | 4 | 1079 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1080 |F| block PT | timestamp offset | block length | 1081 |1| 97 | 16383 | 8 | 1082 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1083 |F| Block PT | 1084 |0| 97 | 1085 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1086 | event=ring |0|0| volume=0 | duration=28383 | 1087 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1089 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1090 | modulation=0 |0| volume=63 | duration=16383 | 1091 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1092 |0 0 0 0| frequency=0 |0 0 0 0| frequency=0 | 1093 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1095 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1096 | modulation=0 |0| volume=5 | duration=12000 | 1097 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1098 |0 0 0 0| frequency=440 |0 0 0 0| frequency=480 | 1099 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1101 Figure 3: Combining tones and events in a single RTP packet 1103 (Section 7) section in this document. 1105 Interoperability considerations: none 1107 Published specification: This document. 
1109 Applications which use this media: The telephone-event audio 1110 subtype supports the transport of events occurring in 1111 telephone systems over the Internet. 1113 Additional information: 1115 1. Magic number(s): N/A 1117 2. File extension(s): N/A 1119 3. Macintosh file type code: N/A 1121 6.2 audio/tone 1123 MIME media type name: audio 1125 MIME subtype name: tone 1127 Required parameters: none 1129 Optional parameters: The "rate" parameter describes the sampling 1130 rate, in Hertz. The number is written as a floating point 1131 number or as an integer. If omitted, the default value is 1132 8000 Hz. 1134 Encoding considerations: This type is only defined for transfer 1135 via RTP [1]. 1137 Security considerations: See the "Security Considerations" 1138 (Section 7) section in this document. 1140 Interoperability considerations: none 1142 Published specification: This document. 1144 Applications which use this media: The tone audio subtype 1145 supports the transport of pure composite tones, for example 1146 those commonly used in the current telephone system to 1147 signal call progress. 1149 Additional information: 1151 1. Magic number(s): N/A 1153 2. File extension(s): N/A 1155 3. Macintosh file type code: N/A 1157 7 Security Considerations 1159 RTP packets using the payload format defined in this specification 1160 are subject to the security considerations discussed in the RTP 1161 specification (RFC 1889 [1]), and any appropriate RTP profile (for 1162 example RFC 1890 [19]). This implies that confidentiality of the media 1163 streams is achieved by encryption. Because the data compression used 1164 with this payload format is applied end-to-end, encryption may be 1165 performed after compression so there is no conflict between the two 1166 operations. 1168 This payload type does not exhibit any significant non-uniformity in 1169 the receiver side computational complexity for packet processing to 1170 cause a potential denial-of-service threat.
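As an illustration of why receiver-side processing cost is uniform, the following Python sketch parses the "tone" payload of Section 4.4 (Fig. 2): a fixed 4-octet header followed by one 16-bit word per frequency, so the work is bounded by the payload length. The function name and the returned dictionary keys are hypothetical, and accepting any whole number of frequency words (rather than only pairs, as drawn in Fig. 2) is an assumption, not something this document states.

```python
import struct


def parse_tone_payload(payload):
    """Parse a "tone" payload laid out as in Fig. 2 (illustrative sketch).

    Layout: modulation (9 bits), T (1 bit), volume (6 bits),
    duration (16 bits), then 16-bit words of 4 reserved bits
    plus a 12-bit frequency.
    """
    if len(payload) < 4 or (len(payload) - 4) % 2 != 0:
        raise ValueError("malformed tone payload")

    first, duration = struct.unpack("!HH", payload[:4])
    modulation = first >> 7        # upper 9 bits
    t_bit = (first >> 6) & 0x1     # divide-by-three flag
    volume = first & 0x3F          # lower 6 bits, sign dropped

    count = (len(payload) - 4) // 2
    words = struct.unpack("!%dH" % count, payload[4:])
    # The receiver ignores the four reserved (R) bits of each word.
    frequencies = [word & 0x0FFF for word in words]

    return {
        "modulation_hz": modulation / 3 if t_bit else float(modulation),
        "volume_dbm0": -volume,
        "duration": duration,      # in RTP timestamp units
        "frequencies": frequencies,
    }


if __name__ == "__main__":
    # Final tone block of Fig. 3: volume=5, duration=12000, 440 + 480 Hz.
    example = struct.pack("!HHHH", 5, 12000, 440, 480)
    print(parse_tone_payload(example))
```

The length check up front rejects truncated or odd-sized payloads before any field is read, which keeps the per-packet cost proportional to the number of frequency words.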
1172 8 IANA Considerations 1174 This document defines two new RTP payload formats, named telephone- 1175 event and tone, and associated Internet media (MIME) types, 1176 audio/telephone-event and audio/tone. 1178 Within the audio/telephone-event type, additional events MUST be 1179 registered with IANA. Registrations are subject to approval by the 1180 current chair of the IETF audio/video transport working group, or by 1181 an expert designated by the transport area director if the AVT group 1182 has closed. 1184 The meaning of new events MUST be documented either as an RFC or an 1185 equivalent standards document produced by another standardization 1186 body, such as ITU-T. 1188 9 Acknowledgements 1190 The suggestions of the Megaco working group are gratefully 1191 acknowledged. Detailed advice and comments were provided by Fred 1192 Burg, Steve Casner, Fatih Erdin, Bill Foster, Mike Fox, Gunnar 1193 Hellstrom, Terry Lyons, Colin Perkins and Steve Magnell. 1195 10 Authors 1197 Henning Schulzrinne 1198 Dept. of Computer Science 1199 Columbia University 1200 1214 Amsterdam Avenue 1201 New York, NY 10027 1202 USA 1203 electronic mail: schulzrinne@cs.columbia.edu 1204 Scott Petrack 1205 MetaTel 1206 45 Rumford Avenue 1207 Waltham, MA 02453 1208 USA 1209 electronic mail: scott.petrack@metatel.com 1211 11 Bibliography 1213 [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a 1214 transport protocol for real-time applications," Request for Comments 1215 (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996. 1217 [2] S. Bradner, "Key words for use in RFCs to indicate requirement 1218 levels," Request for Comments (Best Current Practice) 2119, Internet 1219 Engineering Task Force, Mar. 1997. 
1221 [3] International Telecommunication Union, "Procedures for starting 1222 sessions of data transmission over the public switched telephone 1223 network," Recommendation V.8, Telecommunication Standardization 1224 Sector of ITU, Geneva, Switzerland, Feb. 1998. 1226 [4] R. Kocen and T. Hatala, "Voice over frame relay implementation 1227 agreement," Implementation Agreement FRF.11, Frame Relay Forum, 1228 Foster City, California, Jan. 1997. 1230 [5] International Telecommunication Union, "Multifrequency push- 1231 button signal reception," Recommendation Q.24, Telecommunication 1232 Standardization Sector of ITU, Geneva, Switzerland, 1988. 1234 [6] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J. C. 1235 Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP payload for 1236 redundant audio data," Request for Comments (Proposed Standard) 2198, 1237 Internet Engineering Task Force, Sept. 1997. 1239 [7] M. Handley and V. Jacobson, "SDP: session description protocol," 1240 Request for Comments (Proposed Standard) 2327, Internet Engineering 1241 Task Force, Apr. 1998. 1243 [8] International Telecommunication Union, "Automatic answering 1244 equipment and general procedures for automatic calling equipment on 1245 the general switched telephone network including procedures for 1246 disabling of echo control devices for both manually and automatically 1247 established calls," Recommendation V.25, Telecommunication 1248 Standardization Sector of ITU, Geneva, Switzerland, Oct. 1996. 1250 [9] International Telecommunication Union, "Procedures for document 1251 facsimile transmission in the general switched telephone network," 1252 Recommendation T.30, Telecommunication Standardization Sector of ITU, 1253 Geneva, Switzerland, July 1996. 1255 [10] International Telecommunication Union, "Echo cancellers," 1256 Recommendation G.165, Telecommunication Standardization Sector of 1257 ITU, Geneva, Switzerland, Mar. 1993. 
1259 [11] International Telecommunication Union, "A modem operating at 1260 data signalling rates of up to 33 600 bit/s for use on the general 1261 switched telephone network and on leased point-to-point 2-wire 1262 telephone-type circuits," Recommendation V.34, Telecommunication 1263 Standardization Sector of ITU, Geneva, Switzerland, Feb. 1998. 1265 [12] International Telecommunication Union, "Procedures for the 1266 identification and selection of common modes of operation between 1267 data circuit-terminating equipments (DCEs) and between data terminal 1268 equipments (DTEs) over the public switched telephone network and on 1269 leased point-to-point telephone-type circuits," Recommendation 1270 V.8bis, Telecommunication Standardization Sector of ITU, Geneva, 1271 Switzerland, Sept. 1998. 1273 [13] International Telecommunication Union, "Application of tones and 1274 recorded announcements in telephone services," Recommendation E.182, 1275 Telecommunication Standardization Sector of ITU, Geneva, Switzerland, 1276 Mar. 1998. 1278 [14] Bellcore, "Functional criteria for digital loop carrier 1279 systems," Technical Requirement TR-NWT-000057, Telcordia (formerly 1280 Bellcore), Morristown, New Jersey, Jan. 1993. 1282 [15] J. G. van Bosse, Signaling in Telecommunications Networks. 1283 Telecommunications and Signal Processing, New York, New York: Wiley, 1284 1998. 1286 [16] International Telecommunication Union, "AAL type 2 service 1287 specific convergence sublayer for trunking," Recommendation I.366.2, 1288 Telecommunication Standardization Sector of ITU, Geneva, Switzerland, 1289 Feb. 1999. 1291 [17] International Telecommunication Union, "Various tones used in 1292 national networks," Recommendation Supplement 2 to Recommendation 1293 E.180, Telecommunication Standardization Sector of ITU, Geneva, 1294 Switzerland, Jan. 1994.
1296 [18] International Telecommunication Union, "Technical 1297 characteristics of tones for telephone service," Recommendation 1298 Supplement 2 to Recommendation E.180, Telecommunication 1299 Standardization Sector of ITU, Geneva, Switzerland, Jan. 1994. 1301 [19] H. Schulzrinne, "RTP profile for audio and video conferences 1302 with minimal control," Request for Comments (Proposed Standard) 1890, 1303 Internet Engineering Task Force, Jan. 1996.