Internet Engineering Task Force                                   AVT WG
Internet Draft                                                SB Petrack
ietf-avt-telephone-tones-00.txt                                 VocalTec
17 November 1998
Expires: 17 May, 1999

             RTP Payloads for Telephone Signal Events

STATUS OF THIS MEMO

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

ABSTRACT

This note describes two RTP payload formats for in-band telephony
signal events (TSE) such as DTMF, dial-tone, ring-tone, off-hook,
SIT, etc. One payload is designed to carry a named signal, and the
other is designed to carry a compact representation of the actual
audio waveform cadence to be played. The two formats are independent,
but they can usefully be combined within redundant audio payloads;
this enables highly efficient and robust transport of telephony
network signal events along with a representation of the actual audio
media associated with the signal events.

Acknowledgements: This Internet-Draft is an extension of
H. Schulzrinne's draft ietf-avt-dtmf-00.txt and borrows heavily from
it (including copying actual text).
The main extensions appearing in this draft are as follows:

a) many other telephony call progress tones and signal events have
   been added to the original DTMF and flash-hook of
   ietf-avt-dtmf-00.txt (an attempt has been made to align this with
   [MGCP] and [E.180 supp2]);

b) a second payload is defined which carries a highly compact
   frequency representation of the audio waveform of the signal;

c) a few clarifications about reliability and redundancy, including
   how the two payloads work together.

Acknowledgement is also due to [MGCP]; part of this draft is an
attempt to use RTP to provide the signal transport required by MGCP,
in a way which can be reused by a large class of applications.

Petrack                                                         [Page 1]
Internet Draft                TSE Payload                   17 Nov, 1998

1 Introduction

This memo defines two payload formats for carrying in-band telephony
signal events (TSE) within RTP packets. These are events such as DTMF
tones, MF tones, on-hook, off-hook and flash-hook events, ring tones
and ring-back tones, continuity tones, busy tones, call waiting
tones, etc. A complete list is given in section 2.

It is desirable to transport these signals with a separate payload
type for several reasons:

a) low-rate voice codecs cannot be guaranteed to reproduce the audio
   waveforms accurately, while special tone generators can be driven
   by a special payload type;

b) defining a separate payload type permits higher redundancy than
   for other payload types in the audio stream;

c) it removes the need for complicated tone detection algorithms in
   equipment that receives the new payload type;

d) it enables the separation of the detection of the signal tones in
   media gateways from the semantic interpretation of the tones in
   signalling gateways and "call agents" (see [MGCP]).

The payload type must be suitable for both gateway and end-to-end
scenarios.
In gateway scenarios, a gateway connecting a packet voice network
with the PSTN recreates the signal events and then injects them into
the PSTN, or detects the signal events and then re-encodes them with
the new payload format. Since the detection may take several tens of
milliseconds, careful time and power (volume) alignment is needed to
avoid generating spurious events for some applications. For other
applications, such as MGCP Call Agents [MGCP] or interactive voice
response (IVR) systems directly connected to the packet voice
network, time alignment and volume levels are not important, since
the unit will not perform any signal analysis to detect the signal
events from the audio stream.

When these telephony signal events are carried in-band as part of the
audio stream, as is the case with DTMF digits, senders SHOULD use the
same sequence number and timestamp base as the regular audio channel
to simplify recreation of analog audio at a gateway. The default
clock frequency is 8000 Hz, but the clock frequency can be redefined
when assigning the dynamic payload type.

This format achieves a higher redundancy than the method proposed for
the Voice over Frame Relay Implementation Agreement [1], even in the
case of sustained packet loss.

Note that these telephony signal events are often sent as audio
waveforms within the ordinary audio stream. In these cases, a source
MAY send a TSE payload and coded audio packets for the same time
instants, using TSE as the redundant encoding for the audio stream,
or it MAY block outgoing audio while the TSE tones are active and
send only TSE as both the primary and redundant encodings. For signal
events which are not normally sent as audio waveforms (such as
on-hook, flash-hook, etc.), the regular audio SHOULD be blocked for
the duration of the TSE.

In this note we define two independent payload formats:

a) a payload for named signal events (NSE).
This payload is useful when there is no need to send any encoding of
the audio waveforms that are used within the telephone network to
represent the signal events, or when no such audio waveforms exist.

b) a payload for signal tone frequencies (STF). This payload is
   useful when it is desired to transmit the actual audio media
   associated with the telephone signal event.

Each of these payloads can be viewed as a special type of audio
compression. Just as there are codecs which are highly optimised for
(say) human speech, these payloads are highly optimised for the
special audio which signals certain events within telephone networks.

When the transmitter and receiver both agree (via non-RTP means) on
the audio interpretation of the TSE, the transmitter SHOULD use the
named signal events payload. For example, this is usually the case
for two PSTN gateways within a single country: "dial tone" is the
same in all parts of the United States.

When the RTP transmitter needs finer control over the actual audio
media to be played at the receiver, it SHOULD use the signal tone
frequencies payload. For example, since ringback in Mexico differs
from ringback in the United States, a Mexican gateway which used the
named signal events payload to transmit ringback from Mexico to the
United States might cause an incorrect audio signal ("USA ringback")
to be generated at the handset of the caller.

The RTP transmitter MAY use the named signal events payload as a
primary payload type and the tone frequencies payload as a secondary
payload type within a redundant audio payload type (see section 4).
This allows the transmitter to send the semantic information present
in the named signal ("ringback") along with the representation of the
audio.

Some of the named signal events are very long (on the order of
several seconds). If a gateway waits to detect the named signal event
before transmitting it within an NSE packet, a substantial delay
might result.
To avoid this delay, a gateway MAY transmit the audio within an
ordinary encoded audio payload, or within a signal tone frequency
payload. The NSE payload can then be used as a redundant audio type
as soon as detection is accomplished.

2 Named Signal Event (NSE) Payload Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     event     |R R|  volume   |          duration             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

volume: The power level of the digit, expressed in dBm0 after
     dropping the sign, with range from 0 to -63 dBm0. The range of
     valid DTMF is from 0 to -36 dBm0 (must accept); lower than
     -55 dBm0 must be rejected (TR-TSY-000181, ITU-T Q.24A). Thus,
     larger values denote lower volume.

     Note: since the acceptable dip is 10 dB and the minimum
     detectable loudness variation is 3 dB, this field could be
     compressed by at least a bit by reducing resolution to 2 dB, if
     needed.

duration: Duration of this digit, in timestamp units. (For a sampling
     rate of 8000 Hz, this field is sufficient to express digit
     durations of up to approximately 8 seconds; the minimum
     permissible digit length is 40 ms.)

R:   This field is reserved for future use. The sender MUST set it to
     zero; the receiver MUST ignore it.

event: The event name identifies the actual telephone signal event.
The events are encoded as follows (these events are extracted from
[E.180Supp2] and [MGCP]):

Encoding   Definition
(decimal)
---------  ------------------------------------------------------------
  0        DTMF 0
  1        DTMF 1
  2        DTMF 2
  3        DTMF 3
  4        DTMF 4
  5        DTMF 5
  6        DTMF 6
  7        DTMF 7
  8        DTMF 8
  9        DTMF 9
 10        DTMF A
 11        DTMF B
 12        DTMF C
 13        DTMF D
 14        DTMF *
 15        DTMF #
 16        Flash Hook
 17        MF 0
 18        MF 1
 19        MF 2
 20        MF 3
 21        MF 4
 22        MF 5
 23        MF 6
 24        MF 7
 25        MF 8
 26        MF 9
 27        MF K0 or KP
 28        MF K1
 29        MF K2
 30        MF S0 or ST
 31        MF S1
 32        MF S2
 33        MF S3
 34        Wink
 35        Wink off
 36        Incoming seizure
 37        Return seizure
 38        Unseize circuit
 39        Continuity Test
 40        Default continuity tone
 41        Continuity tone (single tone)
 42        Continuity test (go tone, in dual tone procedures)
 43        Continuity verified (response tone, in dual tone procedures)
 44        Loopback
 45        Old Milliwatt Tone (1000 Hz)
 46        New Milliwatt Tone (1004 Hz)
 47        Test Line
 48        No circuit
 49        Answer Supervision
 50        Offering Tone
 51        Reorder Tone I
 52        Reorder Tone II
 53        report failure
 54        Off hook transition
 55        On hook transition
 56        Flash hook
 57        Acceptance Tone
 58        Alerting Tone
 59        Answer tone
 60        Busy tone
 61        Busy Tone II
 62        Busy Tone III
 63        Calling Card Service Tone
 64        Call Waiting Tone
 65        Comfort Tone
 66        Confirmation Tone (PABX)
 67        Confirmation Tone I
 68        Confirmation Tone II
 69        Congestion Tone (Network busy)
 70        Congestion Tone II (Network busy)
 71        Congestion Tone III (Network busy)
 72        Dial tone
 73        Dial Tone II
 74        Second Dial Tone
 75        Second Dial Tone II
 76        Special Dial Tone
 77        Stutter Dial Tone
 78        Recall Dial Tone
 79        End of Three Party Service Tone
 80        Error Tone
 81        Executive Override Tone (PABX)
 82        Facilities Tone
 83        Function Recognition Tone
 84        Holding Tone
 85        Intrusion Tone
 86        Interception Tone (PABX)
 87        Line Lockout Tone
 88        Negative Indication Tone
 89        Number Unobtainable Tone
 90        Number Unobtainable Tone II
 91        Off hook warning tone
 92        Pay Tone
 93        Payphone Recognition Tone
 94        Payphone Recognition Tone II
 95        Payphone Recognition Tone III
 96        Payphone Recognition Tone IV
 97        Permanent Signal Tone
 98        Positive Indication Tone
 99        Preemption Tone
100        Prompt Tone
101        Queue Tone
102        Recall Dial Tone
103        Recorder Warning Tone
104        Refusal Tone
105        Report on completion
106        Ringing Tone
107        Ringing Tone II
108        Ringing Tone III
109        Ringing Tone IV
110-117    Distinct. Ringing (8 varieties)
118        Ringing Tone (PABX)
119        Route Tone
120        Route Tone II
121        SIT Tone
122        Test Number Tone
123        Valid Tone
124        Waiting Tone I
125        Waiting Tone II
126        Waiting Tone III
127        Warning Tone (Operator Intervening)
128        Warning Tone (End of Period)
129        Warning Tone (PIP Tone)
130        Warning Tone PABX (Operator Intervening)
131-255    Reserved
---------  ------------------------------------------------------------

(Note: the encoding has been chosen to be backward compatible with
draft-ietf-dtmf-00.txt)

The audio waveforms corresponding to these tones are given in
Supplement 2 of the ITU-T E.180 standard. They are country dependent.
The events listed above which are not listed there are defined as
follows (much better to have Bellcore standard, what about other
tones?)

Wink: A transition from unseized to seized to unseized trunk states
     within a specified period. (Typical seizure period is
     100-350 ms.)
Incoming seizure: Incoming indication of call attempt.

Return seizure: Seizure in response to outgoing seizure.

Unseize circuit: Unseizure of a circuit at the end of a call.

Wink off: A signal used in operator services trunks. A transition
     from seized to unseized to seized trunk states within a
     specified period of 100-350 ms. (To be checked)

Continuity Test: A tone at 2010 +/- 8 Hz.

Continuity Test: A tone at 1780 +/- 30 Hz.

Continuity Verified: A tone at 2010 +/- 8 Hz.

Milliwatt Tones: Old Milliwatt Tone (1000 Hz), New Milliwatt Tone
     (1004 Hz)

Line Test: 105 Test Line test progress tone (2225 Hz +/- 25 Hz at
     -10 dBm0 +/- 0.5 dB).

No circuit: (that annoying tri-tone, low to high)

Answer Supervision:

Reorder Tone: (120 impulses per minute tone).

Calling Card Service Tone: 60 ms of 941 + 1477 Hz and 940 ms of
     350 + 440 Hz (dial tone), decaying exponentially with a time
     constant of 200 ms.

If a gateway is detecting signal events for re-encoding within this
NSE payload format, it SHOULD start transmitting event packets as
soon as it recognizes the event, and thereafter at every multiple of
a frame period or, for sample-based codecs, every 50 ms. If an event
continues for more than one period, the gateway SHOULD send a new NSE
packet whose RTP timestamp corresponds to the beginning of the event
and whose duration field is increased correspondingly. (The RTP
sequence number is incremented by one for each packet.) Note that the
RTP timestamp value corresponding to the beginning of the digit
(perhaps combined with the SSRC) can be used, in the case of packet
loss, to avoid mistaking one tone for two separate tones and creating
spurious tones at the receiver.

Events are sent incrementally so that the receiver does not have to
wait for the completion of the event. Since some tones are very long,
waiting would incur a substantial delay.
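
As an illustration only, the following Python sketch packs and parses
the 32-bit NSE payload laid out in this section and lists the
incremental update packets that a detecting gateway might emit. The
function names, the use of network byte order, and the defaults of a
50 ms period and an 8000 Hz clock are assumptions of this sketch;
this memo does not define an API.

```python
import struct

def pack_nse(event: int, volume: int, duration: int) -> bytes:
    """Pack event(8) | R(2)=0 | volume(6) | duration(16), big-endian (assumed)."""
    if not 0 <= event <= 255:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume is -dBm0 and must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # The two reserved bits are the top bits of the second octet; sent as zero.
    return struct.pack("!BBH", event, volume, duration)

def unpack_nse(payload: bytes):
    """Return (event, volume, duration); reserved bits are ignored."""
    event, reserved_and_volume, duration = struct.unpack("!BBH", payload[:4])
    return event, reserved_and_volume & 0x3F, duration

def incremental_updates(start_ts: int, elapsed_ms: int,
                        period_ms: int = 50, clock_hz: int = 8000):
    """List (timestamp, duration) pairs for the updates sent so far.

    Every update keeps the timestamp of the start of the event; only the
    duration grows, by one period per packet.
    """
    step = clock_hz * period_ms // 1000
    elapsed = clock_hz * elapsed_ms // 1000
    updates = []
    duration = step
    while duration <= elapsed:
        updates.append((start_ts, duration))
        duration += step
    return updates
```

For example, `pack_nse(9, 7, 1600)` encodes DTMF digit 9 at -7 dBm0
with a duration of 200 ms at 8000 Hz, and a receiver that sees several
updates with the same timestamp can treat them as one event.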
3 Signal Tone Frequency Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  modulation   |R R|  volume   |          duration             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  mul  |         freq1         |R R R R|         freq2         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

volume and duration are as in the named signal event format of
section 2.

freq1 and freq2 are values from 0 Hz to 4095 Hz (12 bits each).

modulation is a value from 0 Hz to 255 Hz.

mul is a value from 0 to 15.

freq1 must be present; the presence of freq2 is optional. (Is there
some reason to prefer a fixed-length payload?)

The tone to be played is the sum of the frequencies freq1 and freq2,
modulated by Func(modulation, mul), where Func is defined as follows:

     Func (x, y) = x * y / 3

([E.180 supp. 2] defines modulation frequencies which go from 16 Hz
to 450 Hz, including a few fractional values such as 16 1/3 and
20 2/3).

Standard tone cadences can be built up very easily using this RTP
payload. Receivers which receive a consecutive sequence of STF
packets MUST play them out with a smoothly interpolated phase. (Is
there a need to include a bit for phase reversal?)

As a simple example, ringing tone in the United States is a tone of
420 Hz modulated by a sine wave of 25 Hz, playing for 2 seconds,
followed by silence for 4 seconds, and then recycling. In France,
ringing tone is an unmodulated 440 Hz tone, playing for 1.5 seconds,
followed by silence for 3.5 seconds, then recycling. When a call is
placed from a phone in the USA to a phone in France, the STF format
SHOULD be transmitted by the French PSTN gateway back to an American
gateway, in order to ensure that French ringback tone is heard.
(If some non-RTP signalling had been used to convey to the American
gateway that the remote end is in France, and the American gateway
knew French tone cadences, it would be possible to transmit the
telephone signal events via the simpler NSE payload format.)

4 Reliability

To achieve reliability even when the network loses packets, the audio
redundancy mechanism described in [RFC 2198] is used. The effective
data rate is r times 64 bits (32 bits for the redundancy header and
32 bits for the DTMF payload) every 50 ms, or r times 1280
bits/second, where r is the number of redundant events carried in
each packet. The value of r is an implementation trade-off, with a
value of 5 suggested. Each TSE event SHOULD be retransmitted at least
three times to ensure some measure of reliability.

The timestamp offset in RFC 2198 has 14 bits, so that it allows a
single packet to "cover" 2.048 seconds of events at a sampling rate
of 8000 Hz. Including the starting time of previous events allows
precise reconstruction of the tone sequence at a gateway. The scheme
is resilient to consecutive packet losses spanning this interval of
2.048 seconds or r events, whichever is less. Note that for previous
events, only an average loudness can be represented.

An encoder MAY treat either TSE payload as a highly-compressed
version of the current audio frame. In that mode, each RTP packet
during a TSE tone would contain the current audio codec rendition
(say, G.723.1 or G.729) of this event as well as the representation
described in Section 2, plus any previous digits as before.

This approach allows dumb gateways that do not understand this format
to function. It also allows the actual audio waveform to be sent with
no extra delay, and then to be covered by the TSE format once the
transmitter detects the signal event.
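
The figures quoted in this section can be checked with a few lines of
arithmetic. The Python sketch below is illustrative only; the
function names and default arguments are assumptions of this sketch,
chosen to match the 64 bits per event, 50 ms period, 14-bit timestamp
offset and 8000 Hz clock stated above.

```python
def redundancy_rate_bps(r: int, period_ms: int = 50,
                        bits_per_event: int = 64) -> int:
    """Bit rate when r events of 64 bits each are sent every period."""
    return r * bits_per_event * 1000 // period_ms

def offset_coverage_seconds(clock_hz: int = 8000, offset_bits: int = 14) -> float:
    """Time span that the redundancy header's timestamp offset can express."""
    return (1 << offset_bits) / clock_hz

print(redundancy_rate_bps(1))     # one event per packet
print(redundancy_rate_bps(5))     # the suggested value of r
print(offset_coverage_seconds())  # seconds covered at 8000 Hz
```

With r = 1 this gives the 1280 bits/second per event quoted above,
and 2^14 / 8000 gives the 2.048 second coverage.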
It is possible to mix these two audio telephony payload formats, in
order to adapt to network conditions and the shared knowledge of the
participants in the RTP session.

3.1 Example

3.1.1 NSE format

A typical RTP packet, where the user is just dialing the last digit
of the DTMF sequence "911". The first digit was 200 ms long and
started at time 0; the second digit lasted 250 ms and started at time
800 ms; the third digit was pressed at time 1.5 s and has been held
for 100 ms so far. The frame duration is 50 ms. To make the parts
recognizable, the figure below ignores byte alignment. Timestamp and
sequence number are assumed to have been zero at the beginning of the
first digit, so the timestamp offsets of the two redundant blocks are
12000 - 0 = 12000 and 12000 - 6400 = 5600.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
| 2 |0|0|   0   |0|     96      |              31               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
|                             12000                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
|                           0x5234a8                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |     timestamp offset      |   block length    |
|1|     96      |           12000           |         4         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |     timestamp offset      |   block length    |
|1|     96      |           5600            |         4         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Block PT  |
|0|     96      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     event     |R R|  volume   |          duration             |
|       9       |0 0|     7     |             1600              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     event     |R R|  volume   |          duration             |
|       1       |0 0|    10     |             2000              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     event     |R R|  volume   |          duration             |
|       1       |0 0|    20     |              800              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.1.2 Mixed NSE and STF format

Here is a typical packet
which might be sent from a French PSTN gateway back to a PSTN gateway
in the USA, during the transmission of ringback tone (event number
106). This packet would be sent back 1.5 seconds into the
transmission of the ringback tone. At this point, the receiver should
have heard 1.5 seconds of 440 Hz tone:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
| 2 |0|0|   0   |0|     97      |              31               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
|                             12000                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
|                           0x5234a8                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |     timestamp offset      |   block length    |
|1|     96      |           12000           |         4         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |     timestamp offset      |   block length    |
|1|     97      |           12000           |         6         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Block PT  |
|0|     97      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   event=106   |R R| volume=3  |        duration=12000         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| modulation=1  |R R| volume=3  |        duration=12000         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| mul=3 |       freq1=440       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| modulation=0  |R R| volume=0  |        duration=28000         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| mul=0 |        freq1=0        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5 Acknowledgements

The suggestions of the VoIP working group are gratefully
acknowledged. Steve Magnell of Dialogic Corp. gave some very useful
advice.

6 Bibliography

[1] R. Kocen and T. Hatala, "Voice over frame relay implementation
    agreement," Implementation Agreement FRF.11, Frame Relay Forum,
    Foster City, California, Jan. 1997.

[RFC2198] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley,
    J.-C. Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP payload
    for redundant audio data," RFC 2198, Internet Engineering Task
    Force, Sept. 1997.

[MGCP] M. Arango, A. Dugan, I. Elliott, C. Huitema, S. Pickett,
    "Media Gateway Control Protocol (MGCP)," Internet Draft, Work in
    Progress, Oct. 1998.

[E.180] "Various Tones used in National Networks," ITU-T E.180,
    Supplement 2.