Audio-Video Transport Working Group                             D. Budge
INTERNET-DRAFT                                               R. McKenzie
                                                                W. Mills
                                                                 W. Diss
                                                                 P. Long
                                              Smith Micro Software, Inc.
                                                                May 1997
Expires: December 4 1997


             Media-independent Error Correction using RTP
               draft-budge-media-error-correction-00.txt

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document specifies a media-independent error-correction scheme using the Real-Time Transport Protocol (RTP), along with the payload format for encapsulating both error-correction signaling and media bitstreams in RTP. It enables the reconstruction of lost packets across a connectionless transport such as RTP over UDP. The goal of this scheme is to maximize isochrony, the regular and timely delivery of data, with minimal bandwidth, latency, and computational costs.

Table of Contents

   1. Background
   2. Internet Behavior
   3. Effects of Packet Loss
   4. Alternative Solutions
   5. This Solution
   6. Usage of RTP
      6.1 RTP Header Usage
      6.2 RTP Packet Structure
   7. Error-correction Payload Header
      7.1 Error-Correction Schemes
         7.1.1 Error-Correction Scheme 0 (1:1:0)
         7.1.2 Error-Correction Scheme 1 (2:1:1)
         7.1.3 Error-Correction Scheme 2 (3:2:2)
         7.1.4 Error-Correction Scheme 3 (2:1:4)
      7.2 Changing Error-Correction Scheme During an RTP Session
         7.2.1 How Changing Scheme is Possible
         7.2.2 Scenarios for Choosing a Scheme
      7.3 Stream Interruptions
         7.3.1 The Problem with Stream Interruptions
         7.3.2 Handling Stream Interruptions by Changing Schemes
         7.3.3 Handling Stream Interruptions by Inserting Null Payloads
         7.3.4 Comparison of Handling Stream Interruptions
      7.4 Tutorial
   8. Security Considerations
   9. Conclusions
   10. References
   11. Authors' Address

1. Background

Data communication over the Internet is markedly different from point-to-point communication via modems. Modems can communicate data with relatively low latency (in milliseconds) and low data protocol overhead (the number of non-payload bytes transmitted relative to the total number of bytes transmitted).
Modem data errors take the form of corrupted or missing bytes. Because there is such low latency on modems, when an error occurs, we can simply ask the modem to retransmit the corrupted data.

The Internet has variable latency that is orders of magnitude greater than that of modem communications (several seconds versus milliseconds) and relatively high protocol overhead (for audio communications, up to 25 percent versus 5 percent over a modem). Communications bandwidths on the Internet vary constantly, from bytes per second to Kbytes per second. Internet data errors take the form of whole packets of data being lost. Large latencies prohibit the retransmission of corrupted data, which would cause pauses of, for example, 5 to 20 seconds in a video conference.

Because of these limitations, video conferencing using the Internet is currently limited to those users who are more fascinated with the technology than with actually communicating with another person. The challenge then is to deliver the best video conferencing technology possible on today's Internet despite its failings.

Finally, despite popular opinion, H.323 [1] is not a reasonable Internet video conferencing standard--it is an intranet video conferencing standard. The needs of a user connected to his or her Internet Service Provider (ISP) via a modem are very different from the needs of a user connected to a corporate LAN, with its large bandwidth capacity and relatively low latency.

2. Internet Behavior

Communication across the Internet consists of the transmission of IP data packets [2]. Several different packet types/protocols are available, the two principal protocols being TCP [3] and UDP [4]. TCP is the most familiar protocol because it provides guaranteed delivery of data between two points.
In order to guarantee delivery, TCP has a handshaking mechanism where a buffer is transmitted to the receiver, the receiver checks the buffer for errors and then acknowledges correct delivery before the next buffer is transmitted. If the buffer is not acknowledged, it is retransmitted. Although this is a reasonable scheme for sending data, it is problematic for sending real-time data such as audio and video across the Internet.

One problem is latency. It takes so long for the acknowledgment to get back to the sender that the data channel is frequently idle. For example, if the round trip delay on a 20,000-bit-per-second TCP connection is 2 seconds, sending 1000-byte TCP packets results in only 20 percent usage of available bandwidth. Furthermore, real-time data is inherently time-critical. After a few seconds (as in the case of a retransmission caused by a failed acknowledgment), some of the data has "spoiled," i.e., it is no longer useful to the receiver. Finally, due to retransmission, TCP is not an acceptable protocol for multicast signals, since each receiver may or may not require retransmission.

The packet type most often used to transmit real-time data like audio and video on the Internet is UDP, or User Datagram Protocol. UDP affords higher throughput and lower latency than TCP at the expense of data integrity. UDP is an unreliable protocol in that there is no guarantee of packet delivery. The errors seen with UDP include lost packets, packets arriving out of sequence, and duplicate packets. Duplicate packets and out-of-order packets are easily handled. Lost packets are another story.

We have observed that the amount of packet loss varies between 5 and 20 percent across an Internet connection. Further, we have observed that packet loss is unpredictable--random over a uniform distribution--and the packets are usually not dropped in temporally close groups.
For example, in a scenario where there is 10 percent packet loss, the dropped packets occur on average 1 out of every 10, with 2 sequential packets being dropped 1 out of every 100 packets and so on. (It has been pointed out that many studies have focused on aggregate packet loss at a node, not packet loss across a connection [5]. Each model exhibits a different pattern of packet loss. For example, a single connection may experience random, mostly single-packet and adjacent two-packet losses while a network node may experience grouped packet losses that occur in overlaid waves of different frequencies and relative phases.)

3. Effects of Packet Loss

Audio and video performance are affected differently by packet loss. Converting the audio data into sound for a given audio packet does not depend on any previous or future audio data packets. That is, it is temporally "self contained." Therefore when audio data is lost, silence may be heard during the interval that the lost speech packet occupied. The effect is somewhat like a microphone with a bad cable--speech with holes in it. When packet loss approaches 10 percent, the effect is very annoying, such that it would be unacceptable for users who are accustomed to phone-quality speech. At 20-percent packet loss, the signal approaches the unintelligible.

Unlike audio codecs, video codecs such as H.263 [6] have facilities for error correction in the form of simple redundancy. The effects of packet loss can be mitigated further by minimizing inter-frame dependency. Temporary picture aberrations still occur, though, such as brief freezes in the video signal.

4. Alternative Solutions

Because the H.323 video conferencing standard is--again--directed towards LAN-based systems, it does not attempt to address the problem of dropped audio packets since packet loss on corporate LANs is usually well below 10 percent.
For video, H.323 attempts to align data packets with discrete segments of a video frame and wrap that video segment with a header that gives context to the video data in the current frame. The effect of this is that, even when a video packet is dropped, packets that describe other pieces of the current picture can update their respective piece. For example, if a video data packet contains a complete set of data for 16 lines of pixels, and that packet is lost, other lines in that frame could still be updated.

A sender could anticipate or detect the percentage of missing macroblocks at the receiver and send redundant intra-coded macroblocks. They could be sent on a schedule so that they are received often enough that freezes are brief and bad visual effects are minimized, yet they do not consume excessive bandwidth.

However, drawbacks of sending redundant macroblocks in H.323 include the amount of overhead per packet and increased codec inefficiency caused by the packetization schemes. Between IP, UDP, and RTP [7] packet headers and the need to encode Groups of Blocks (GOBs) with headers, the efficiency of data transfer is reduced by at least 25 percent--bandwidth capacity that is present on the corporate LAN but hardly over a 28.8 kbps modem connection to the Internet.

In addition, although completely contrary to the spirit of H.26x compression, inter-frame dependency can be reduced by, for example, turning motion vectors off. This prevents the errors resulting in missing macroblocks from propagating themselves, at the expense of increasing the size of the bitstream by 25 to 50 percent.

Although the means by which H.26x codecs cope with packet loss have their disadvantages, they do exist.
Therefore, there is a greater need for the error-correction scheme described in this document for audio streams than video streams, although it may be appropriate for video if temporary picture aberrations are unacceptable to the user.

One promising solution that has been offered to overcome the Internet's unreliability is that of layering video. The concept of layering is simple. In essence, send two independent interleaved frame sequences. Then if one sequence "takes a hit," one can at least continue to view the other stream (realizing that half the frame rate has been lost) while signaling the transmitter to send another key frame which refreshes both streams.

This approach has merit and needs to be studied further. However, according to the authors of this method, there is a 20 percent degradation of effective bandwidth due to the fact that frame-to-frame differences are computed over a two-frame interval rather than over a single-frame interval. The fact that one of the streams will be knocked out--and soon--can be empirically demonstrated. What one is left with is a picture that sometimes is fast (but not for long) then degrades to less than half of the frame rate of which the channel is capable (40 percent of maximum). In addition, audio is not protected by this mechanism. Finally, this would represent a major change to most existing H.263 codecs. For software codec manufacturers, the additional buffering would be reflected in some amount of performance degradation.

A simple solution is to transmit each UDP packet twice, regardless of whether it is carrying audio or video data. This fits within the existing H.323 standard in that there is a requirement for duplicate packets to be ignored. Sending each packet twice guarantees correction of all single-packet losses and half of all two-packet losses. Assuming a 1:10 chance of any packet being dropped, analysis yields a freeze chance of approximately 1:200.
At first glance, it appears that the loss of effective bandwidth (only 50 percent of maximum) is prohibitive. But when one analyzes the effects of dropped audio packets and the 10x cost of transmitting key frames (not to mention the time that the screen is frozen), it starts to look like a pretty good way to go.

5. This Solution

Rather than layering the signal or sending each packet twice, the remainder of this document describes the packetization and scheme for sending the exclusive-or, or XOR, of combinations of packets (the Mills-and-McKenzie, or M&M, Algorithm). This scheme substantially increases the reliability of packet delivery, because the original packets can be reconstructed in several different ways, not just recovered from the surviving duplicate.

6. Usage of RTP

Along with a header for error-correction information, defined in section 7, the media stream is carried as payload data within RTP packets.

6.1 RTP Header Usage

marker bit (M bit):

   The RTP marker-bit field has the following interpretation unless a profile supersedes it. The RTP marker bit for a given packet shall be what would have been the RTP marker bit for the original media payload of the most-recently-transmitted non-null payload of those represented in the packet (the timestamps in consecutive RTP packets might not be monotonic [7]). See section 7.3 for a definition of "null payload." If a packet contains only null payloads, the marker bit shall have no meaning.

   For example, here is a list of RTP packets transmitted from left to right, where uppercase letters represent original media payloads and xy represents some arbitrary function, f(x,y) (f() is always the XOR operation in this document).

      A*, B*, ABC*, C*, ACD*, ABD*, D*, BCD*

   In each packet, the original media payload that is followed by an asterisk, *, is the one whose would-be RTP marker bit is used for the value of this packet's RTP marker-bit field.
   Knowing to which original media payload the field belongs, one refers to the original media's RTP profile to determine the final use of this field.

   As an example of what happens in the presence of a null payload, here is the same list where C is a null payload:

      A*, B*, AB*C, C, ACD*, ABD*, D*, BCD*

payload type (PT):

   This is the type of error-correction-encapsulated media, not the original media. The payload type of the original media cannot be used, because an error-correction-encapsulated media stream is different from the original media stream. Therefore, a new static payload type may be defined for an error-correction-encapsulated version of a media stream or, more likely, a dynamic payload type may be defined out-of-band. Regardless, payload-type assignment and the mapping between original payload type and error-correction-encapsulated payload type are outside the scope of this document.

sequence number:

   The RTP sequence number can be used to restore the original packet sequence and determine whether and how many packets are lost, exactly as without this error-correction scheme. It can also be used to determine media context such as the spatial position of a video payload or the timing of the source. However, since the value of this field increases monotonically with respect to the sequence of generated RTP packets but not necessarily with respect to the sequence of the original media payloads (e.g., an error-correction RTP packet may contain the XOR of three original media payloads or conversely an original media payload may be represented in more than one error-correction RTP packet), the receiver may also need to take into consideration the values of the scheme and mode fields to determine the same media context.

timestamp:

   The RTP timestamp field has the same interpretation as the marker-bit field described above unless a profile supersedes this interpretation of the timestamp field.

6.2 RTP Packet Structure

The error-correction payload header starts at the first octet in the RTP payload. The media payload immediately follows the error-correction payload header. The media payload is the data that would have otherwise exclusively occupied the RTP payload if this error-correction scheme were not used. The RTP profile defined for the media shall be used for the packetization of the media payload field. The layout of the RTP error-correction packet is shown as:

   +---------------------------------------------------------------+
   |                          RTP Header                           |
   |---------------------------------------------------------------|
   |                Error-correction Payload Header                |
   |---------------------------------------------------------------|
   |                         Media Payload                         |
   +---------------------------------------------------------------+

7. Error-correction Payload Header

Each RTP error-correction packet carries as many media packets as would have been carried without error correction. The error-correction payload header is always present in each RTP packet even if the error-correction scheme=0, which indicates "do not apply error-correction to this packet," and not necessarily, "do not apply error-correction during this RTP session."

Four error-correction schemes, i.e., 0, 1, 2, and 3, are defined for the RTP error-correction payload header. The ability to receive packets of a particular scheme is signaled out-of-band. Only one scheme applies to an RTP packet at a time, but the scheme can change from one RTP packet to another. The ability of the receiver to switch error-correction scheme during an RTP session (not the actual switching) is also signaled out-of-band.

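For illustration only, the following Python sketch packs and parses the 24-bit error-correction payload header that this section defines (a 4-bit scheme, a 4-bit mode, and a 16-bit length, most significant bit first, placed at the first octet of the RTP payload). The function names pack_ec_header and unpack_ec_header are hypothetical and are not part of this specification; the sketch assumes the RTP header has already been removed.

```python
import struct


def pack_ec_header(scheme: int, mode: int, length: int) -> bytes:
    """Pack scheme (4 bits), mode (4 bits), and length (16 bits) into 3 octets."""
    if not (0 <= scheme <= 0xF and 0 <= mode <= 0xF and 0 <= length <= 0xFFFF):
        raise ValueError("header field out of range")
    # "!" selects network byte order with no alignment padding:
    # one octet holding scheme|mode, then a 16-bit big-endian length.
    return struct.pack("!BH", (scheme << 4) | mode, length)


def unpack_ec_header(rtp_payload: bytes):
    """Split an RTP payload into (scheme, mode, length, media_payload)."""
    if len(rtp_payload) < 3:
        raise ValueError("RTP payload is shorter than the 3-octet header")
    first_octet, length = struct.unpack("!BH", rtp_payload[:3])
    return first_octet >> 4, first_octet & 0x0F, length, rtp_payload[3:]
```

For example, pack_ec_header(3, 2, 0x0102) yields the octets 0x32 0x01 0x02, and unpack_ec_header applied to those octets followed by a media payload returns the same three field values plus the payload.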
(It would have been convenient to use the payload-type field in the RTP header to express the information represented by the scheme and mode fields, thus saving one byte per RTP packet. However, this could consume several dynamic payload types in the rather small number space from 96 to 127 and require a relatively complicated out-of-band method to assign dynamic payload types to the corresponding scheme and mode combinations.)

The error-correction payload header is a single 24-bit word, which is transmitted in network byte and bit order (decreasing significance) with the most significant bit shown at the left in the following diagram.

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |scheme | mode  |            length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

scheme: 4 bits

   The error-correction scheme that is being used on this and presumably subsequent packets. (Note: This document describes a single error-correction scheme in the larger sense, but this field identifies which [sub]scheme is currently in use. Therefore, the word, "scheme," is overloaded in this document.)

mode: 4 bits

   Media-payload mode. This indicates the position of this packet in the cycle of packets for this error-correction scheme. The sequence number in the RTP header is used to establish further context in the packet stream.

length: 16 bits

   The XOR of the lengths of the original media payloads of which this payload contains an XOR. For example, if an error-correction payload contains (the scheme and mode indicate what it contains):

      A XOR B XOR C

   where the letters, A, B, and C, represent original media payloads, this field contains:

      (length of A) XOR (length of B) XOR (length of C)

   The purpose is, by XORing the length in the same way as the media payload, to recover the original media payload's length at the same time as the payload is recovered. If an error-correction payload contains an original media payload that is not XORed with another one, this field simply contains that payload's length.

7.1 Error-Correction Schemes

These are the error-correction schemes currently defined. In the future, others may be defined with corresponding values for the scheme field of the error-correction payload header.

If the media payloads being XORed are of different lengths, for example, A XOR B, where A or B is shorter, or (A XOR B) XOR C, where (A XOR B) or C is shorter, the shorter payload is effectively padded with zeros up to the length of the longer one before the XOR operation is performed. This padding is just for the XOR operation and is not part of the payload proper. When the original media payload is reconstructed along with its length, the padding is ignored.

The notation, x:y:z, at the end of the following section headings is a descriptive short-hand for the scheme. The first two numbers are the ratio of the number of media payloads transmitted to the number of original media payloads. The third is the number of additional packet delays incurred by using the scheme relative to not using error correction at all. If a scheme is added whose short-hand would be the same as that of a scheme already defined, it should be qualified in some way such as by appending a lower-case letter as in 2:1:1a.

7.1.1 Error-Correction Scheme 0 (1:1:0)

This indicates that error correction is not being applied to the packet stream.
This allows an RTP session over a connection that is currently experiencing extremely low packet loss to immediately provide error correction in case the line degrades.

mode is unused but shall be set to 0.

7.1.2 Error-Correction Scheme 1 (2:1:1)

With this simplest error-correction scheme, the packet stream is translated to the original sequence with a packet of the XOR of each adjacent pair inserted between them as in

   A, B, C, D, E, F, . . . => A, AB; B, BC; C, CD; D, DE; E, EF; F, . . .

where AB stands for A XOR B, etc. Noting that XOR is associative, commutative, and self-inverse, where the latter means that XXY=Y, there are two ways to reconstruct the first packet and three ways to reconstruct all subsequent packets:

   A = A = (AB)(B)
   B = B = (A)(AB) = (BC)(C)
   C = C = (B)(BC) = (CD)(D)

This strategy has the same 2:1 transmission overhead and single additional packet delay as simply sending each packet twice; however, it has much better error-correction capability. It protects against all single-packet and two-packet losses, and 75 percent of three-packet losses within the group of 4 sent packets for each pair of original packets, while sending each packet twice protects against all single-packet losses but only 50 percent of two-packet losses and no three-packet losses.

mode shall be set to 1 for an XOR packet and 0 otherwise.

7.1.3 Error-Correction Scheme 2 (3:2:2)

We can increase our effective bandwidth relative to scheme=1 at the expense of another packet delay and some error-correction capability. This is done by sending, for each group of two packets, three combinations of XOR packets. If the packet stream would have been

   A, B, C, D, E, F, G

we first partition it into groups of two packets:

   . . . B, C; D, E; F, G; . . .

Then each group is translated as follows into groups of three packets, remembering that, for example, BC stands for B XOR C:

   . . . B, C, BC; D, E, DE; F, G, FG; . . .

Finally, every second packet in a group is carried over into the next group by XORing it with each packet in the next group, resulting in

   AB, AC, ABC; CD, CE, CDE; EF, EG, EFG; . . .

(Since this is a streaming operation, these translations are not, of course, done in separate passes as has been shown here.)

Because of the rolling nature of the scheme, analysis is not straightforward. When the carry-over is known, the scheme protects against any single-packet loss in a group. When the carry-over is not known, the carry-over can be reconstructed if all three packets of a group are received.

Reconstruction also depends on the carry-over situation. In the presence of the carry-over, there are two ways to reconstruct each packet:

   B = (A)(AB) = (AC)(ABC)
   C = (A)(AC) = (AB)(ABC)

Absent the carry-over, there is only one way to reconstruct each packet:

   A = (AB)(AC)(ABC) -- reconstructing the carry-over
   B = (AC)(ABC)
   C = (AB)(ABC)

This strategy has a 3:2 transmission overhead instead of 2:1 for scheme=1 or sending each packet twice--a significant improvement. It protects against all single-packet losses and against 73 to 83 percent of two-packet losses within the group. It protects against 11 of the 15 two-packet losses, 1 of the 15 only leaves the last carry-over open which may be reconstructed by the next group, and 3 of the 15 yield an unrecoverable error.

Note that the above method achieves more error-correction capability than duplicating each packet, with greater effective bandwidth at the expense of 2 packet delays rather than 1. Where ideal transmission is 100 percent transfer rate, duplicating packets represents a 50 percent transfer rate, and the above mechanism represents a 67 percent transfer rate.

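As an illustration only, the following Python sketch shows one reading of the scheme=2 translation above, combined with the zero-padding and length-XOR rules of sections 7 and 7.1. The names xor_padded, xor_lengths, encode_scheme2, and combine are hypothetical and are not part of this specification. Treating the first payload of the stream as the initial carry-over is an assumption, and a trailing payload that does not yet complete a group is simply left unencoded.

```python
def xor_padded(*payloads: bytes) -> bytes:
    """XOR payloads octet by octet, treating shorter ones as zero-padded."""
    out = bytearray(max(len(p) for p in payloads))
    for payload in payloads:
        for i, octet in enumerate(payload):
            out[i] ^= octet
    return bytes(out)


def xor_lengths(*payloads: bytes) -> int:
    """Value of the length field: XOR of the original payload lengths."""
    value = 0
    for payload in payloads:
        value ^= len(payload)
    return value


def encode_scheme2(payloads):
    """Translate original payloads into (mode, length_field, body) tuples.

    With carry-over X and group (Y, Z) the output is XY (mode 0),
    XZ (mode 1), and XYZ (mode 2); Z becomes the next carry-over.
    """
    carry = payloads[0]  # assumption: first payload is the initial carry-over
    packets = []
    for i in range(1, len(payloads) - 1, 2):
        y, z = payloads[i], payloads[i + 1]
        for mode, members in enumerate(((carry, y), (carry, z), (carry, y, z))):
            packets.append((mode, xor_lengths(*members), xor_padded(*members)))
        carry = z
    return packets


def combine(*packets) -> bytes:
    """XOR (length_field, body) pairs and trim the result to its true length."""
    length = 0
    for length_field, _ in packets:
        length ^= length_field
    body = xor_padded(*(body for _, body in packets))
    return body[:length]  # the zero padding is not part of the payload
```

With original payloads A, B, and C, encode_scheme2 produces AB, AC, and ABC; passing the length and body of all three to combine returns A at its original length, which corresponds to A = (AB)(AC)(ABC) above, and combine applied to AC and ABC returns B.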
mode shall be set to 0 for the XOR of two adjacent original media payloads, as in AB, CD, and EF, above, 1 for the XOR of two payloads separated by an intervening payload, as in AC, CE, and EG, and 2 for the XOR of three adjacent payloads, as in ABC, CDE, and EFG.

7.1.4 Error-Correction Scheme 3 (2:1:4)

To increase error correction further with the effective bandwidth of scheme=1, but at the expense of 4 packet delays instead of 2 for scheme=2 and 1 for scheme=1, a better strategy is to send, for each group of four packets, eight combinations of XORs:

   A, B, C, D; . . . => A, B, ABC, C, ACD, ABD, D, BCD; . . .

This protects against one-, two-, and three-packet losses, plus correction of 80 percent (56/70) of four-packet losses per group of eight sent.

For this strategy, there are eight ways to reconstruct each packet:

   A = A = (B)(C)(ABC) = (B)(D)(ABD) = (C)(D)(ACD)
         = (D)(ABC)(BCD) = (C)(ABD)(BCD) = (B)(ACD)(BCD)
         = (ABC)(ABD)(ACD)

   B = B = (A)(C)(ABC) = (A)(D)(ABD) = (C)(D)(BCD)
         = (D)(ABC)(ACD) = (C)(ABD)(ACD) = (A)(BCD)(ACD)
         = (ABC)(ABD)(BCD)

   C = C = (A)(B)(ABC) = (A)(D)(ACD) = (B)(D)(BCD)
         = (D)(ABC)(ABD) = (B)(ACD)(ABD) = (A)(BCD)(ABD)
         = (ABC)(ACD)(BCD)

   D = D = (A)(B)(ABD) = (A)(C)(ACD) = (B)(C)(BCD)
         = (C)(ABD)(ABC) = (B)(ACD)(ABC) = (A)(BCD)(ABC)
         = (ABD)(ACD)(BCD)

The above sequence, while having the same bandwidth efficiency as the duplicate-packet mechanism, 2:1, is far superior in error-correction ability. With this transmission scheme (assuming a 10 percent packet loss), one could anticipate a video freeze or audio discontinuity about once every 20 minutes (the odds of an anomaly are 1:5922).

mode shall be set to 0 for an A packet, 1 for B, 2 for ABC, 3 for C, 4 for ACD, 5 for ABD, 6 for D, and 7 for BCD.

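The loss figures quoted for scheme=3 can be checked by exhaustive enumeration. The following Python sketch is an illustration only: it represents each of the eight packets of a group as a bitmask over the originals A, B, C, and D, and counts, for every combination of lost packets, whether all four originals can still be obtained by XORing received packets. The names SCHEME3, span, all_recoverable, and recoverable_count are hypothetical, and the sketch considers a single group in isolation.

```python
from itertools import combinations

# Bitmask per original payload; a packet is the OR of the originals it XORs.
A, B, C, D = 1, 2, 4, 8

# The eight scheme=3 packets in mode order 0..7:
# A, B, ABC, C, ACD, ABD, D, BCD.
SCHEME3 = [A, B, A | B | C, C, A | C | D, A | B | D, D, B | C | D]


def span(masks):
    """All combinations reachable by XORing any subset of the given packets."""
    reachable = {0}
    for mask in masks:
        reachable |= {r ^ mask for r in reachable}
    return reachable


def all_recoverable(received):
    """True if each of A, B, C, and D is an XOR of some received packets."""
    reachable = span(received)
    return all(original in reachable for original in (A, B, C, D))


def recoverable_count(lost_count):
    """Return (recoverable, total) over all ways to lose lost_count packets."""
    recoverable = total = 0
    for lost in combinations(range(len(SCHEME3)), lost_count):
        received = [m for i, m in enumerate(SCHEME3) if i not in lost]
        total += 1
        recoverable += all_recoverable(received)
    return recoverable, total
```

Under these assumptions, recoverable_count returns (8, 8), (28, 28), and (56, 56) for one, two, and three lost packets, and (56, 70) for four lost packets, in agreement with the 56/70 figure given above.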
7.2 Changing Error-Correction Scheme During an RTP Session

Since the scheme field is present in every error-correction RTP packet, the RTP sender may change its value at any time, signaling to the RTP receiver that a different error-correction scheme is being used starting with the packet containing the changed scheme value. The capability of the receiver to perform a given error-correction scheme must be signaled out-of-band, such as through the use of the capability-exchange and open-logical-channel procedures of H.245 [8]. The sender shall not use an error-correction scheme which the receiver has no capability to process.

When the scheme changes, the transmitter shall not send encodings for any original packets that the receiver could have reconstructed from the previous scheme's packet stream, assuming no packet loss. This avoids duplicate packets, because the receiver has no way to correlate the same original packets sent using the different schemes--it has no way of knowing that the same packet has been sent twice. For example, for scheme=3, if A, B, and ABC have been transmitted and the sender wishes to switch to scheme=0 for the next RTP packet, it shall not transmit an encoding of C because C could have been reconstructed from the previous packets, i.e., (A)(B)(ABC).

7.2.1 How Changing Scheme is Possible

It is possible to change the error-correction scheme on a packet-by-packet basis, because the first packet of the new scheme can be thought of as if it were the first packet of a new RTP session using only that scheme. Once the receiver has released any system resources associated with the previous scheme, it uses the same start-of-RTP-session logic it would have used for the indicated scheme to start processing packets for the new scheme during the current RTP session.
This logic includes determining the mode of the packet within the indicated error-correction scheme and taking into consideration the possibility that the first few packets of this scheme may have been lost.

7.2.2 Scenarios for Choosing a Scheme

As long as the receiver has the capability to process it, the sender may use the same error-correction scheme for all RTP sessions, a scheme used throughout the session but chosen for the known performance characteristics and apparent condition of one or more links in the connection before each session starts, or a scheme based on the current condition of the connection, switching schemes as the condition changes. Feedback for the last scenario could be provided by the receiver to the sender through a reverse RTCP [7] channel associated with the forward RTP channel.

7.3 Stream Interruptions

The error-correction scheme described in this document assumes that there is usually a steady stream of original media payloads present at the input of the error-correction encoder. When there is not--when the stream of payloads into the encoder is interrupted for some reason such as at the onset of silence suppression in an audio stream--there are at least two ways to make sure that enough information is transmitted so that all packets can be reconstructed up to and including the last packet before the interruption. Only one way is required; however, since they have different performance characteristics, two ways are described, and the choice of which one to use is left up to the sender, although the receiver shall be capable of handling both.

To ensure that the sender may use either way of handling stream interruptions, even though it uses only one of them, the following requirements apply:

A receiver shall ignore all original media payloads with length=0 (this assumes that an original media payload can never otherwise have length=0). These payloads are called "null payloads."
Any RTP header fields that have been associated with a payload by an RTP profile shall be ignored as they pertain to a null payload. If a sender inserts a null payload into the input of its encoder, it shall consistently apply the null payload across all modes of the scheme as if Budge, et al. [Page 13] INTERNET-DRAFT Media-independent Error Correction using RTP May 1997 it were a real payload whose place it has taken. For example, with scheme=1, a null payload cannot simply replace B in a single mode, as in A, AB; 0, BC; C, . . . It must replace all encodings of B, as in A, A0; 0, 0C; C, . . . 7.3.1 The Problem with Stream Interruptions If the last original media payload before an interruption must be encoded with one or more subsequent payloads (e.g., the D in DE or the A in ACD) according to the current mode of the scheme in use, the last payload must be delayed until the subsequent payloads arrive at the input of the error-correction encoder. For some media or if the stream resumes after a relatively short delay, this is not a problem; however, it is a problem for other media such as audio with silence suppression. This particular media uses silence frames to indicate to the receiver that the audio stream is being interrupted due to silence at the source, but the silence frame cannot be sent until more frames arrive at the encoder! If previously transmitted packets do not contain sufficient information for the receiver to reconstruct the last payload, the receiver encounters the silence frame at the end of the period of silence that the frame was intended to announce. For example, with scheme=2, if AB, AC, and ABC have been sent and C is a silence frame, the receiver can reconstruct the original payloads of A, B, and C (assuming no packets have been lost) without CD. However, if only AB has been sent and B is a silence frame, the receiver cannot reconstruct the original payloads of A or B without AC and ABC. 
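
The scheme=2 example above can be checked with a short sketch. It assumes, as the tutorial in 7.4 suggests, that a packet written AB carries A XOR B and that ABC carries A XOR B XOR C, and it further assumes equal-length payloads. The names xor_payloads and recover_scheme2_group are hypothetical and are not part of this specification.

```python
from functools import reduce


def xor_payloads(*payloads: bytes) -> bytes:
    """XOR payloads byte by byte (assumption: all have the same length)."""
    length = len(payloads[0])
    if any(len(p) != length for p in payloads):
        raise ValueError("this sketch assumes equal-length payloads")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*payloads))


def recover_scheme2_group(ab=None, ac=None, abc=None) -> dict:
    """Return the original payloads recoverable from one scheme=2 group.

    Each argument is the received packet body, or None if that packet
    has not been sent or was lost.
    """
    recovered = {}
    if ab is not None and abc is not None:
        recovered["C"] = xor_payloads(ab, abc)   # (A^B) ^ (A^B^C) = C
    if ac is not None and abc is not None:
        recovered["B"] = xor_payloads(ac, abc)   # (A^C) ^ (A^B^C) = B
    if "B" in recovered and "C" in recovered:    # all three packets present
        recovered["A"] = xor_payloads(ab, recovered["B"])  # (A^B) ^ B = A
    return recovered


# Hypothetical two-byte payloads, used only for illustration.
a, b, c = b"\x0a\x11", b"\x09\x22", b"\x0f\x33"
ab = xor_payloads(a, b)
ac = xor_payloads(a, c)
abc = xor_payloads(a, b, c)

full_group = recover_scheme2_group(ab, ac, abc)   # A, B, and C are recovered
only_ab = recover_scheme2_group(ab=ab)            # nothing can be recovered
```

With all three packets of the group the receiver obtains A, B, and C; with AB alone the result is empty, which is the situation that delays a silence frame B.
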
The solution lies in somehow transmitting A and B without waiting for C to arrive at the encoder. The following solutions do just that, the first by changing schemes, the second by inserting null payloads.

7.3.2 Handling Stream Interruptions by Changing Schemes

Perhaps the simplest way of dealing with stream interruption is to immediately switch to scheme=0--no error correction--for the last few packets before the interruption. If, assuming no packet loss, previously transmitted packets do not contain sufficient information for the receiver to reconstruct original payloads up to and including the last original payload before the interruption, scheme=0 shall be used to transmit all of the original payloads that the receiver cannot reconstruct, in the same order that they were received by the encoder. When original payloads start arriving at the input of the encoder again--the interruption is over--the sender shall resume with the first mode of any scheme, although it will typically be the previous, non-0 scheme.

A stream may be interrupted at any mode, even before a group is completed. When the stream resumes, the sender shall start with mode=0, not necessarily where it left off.

For example, with scheme=2, if only AB has been sent and B is the last payload before an interruption, the sender transmits packets containing A and then B using scheme=0. When the interruption is over, the sender transmits using scheme=2 again as in the following, where the scheme is enclosed in brackets:

   AB[2], A[0], B[0] . . . interruption . . . A'B'[2], A'C'[2], . . .

7.3.3 Handling Stream Interruptions by Inserting Null Payloads

An alternative to changing schemes is for the sender to insert null payloads into its encoder for any original media payloads for which it would have otherwise had to wait.
This allows the packet stream to continue until packets have been sent that contain encodings necessary for the receiver to reconstruct all packets up to and including the last original payload before the interruption. The receiver simply has to ignore null payloads.

For example, with scheme=2, if only AB has been sent and B is the last payload before an interruption, the sender shall transmit AC and ABC where a null payload is inserted in place of C:

   AB, A0, AB0; . . . interruption . . . 0D, 0E, 0DE; EF, EG, EFG; . . .

This allows the receiver to reconstruct A and B without waiting for a real C to arrive at the sender's encoder. Note that C shall continue to be replaced with a null payload in subsequent modes that include encodings of C.

7.3.4 Comparison of Handling Stream Interruptions

Handling stream interruptions with null payloads continues to generate error-correction packets while the sender is "spinning down," before the interruption, whereas the alternative, changing to scheme=0, simply transmits single copies of the last few payloads as it is spinning down. Scheme=0 does not provide any encodings for secondary reconstructions, making it less resilient to packet loss. On the other hand, handling interruptions by changing to scheme=0 requires slightly less bandwidth and may be less complex.

7.4 Tutorial

For those not familiar with the basic error-correcting ability provided by XOR, here is a simplified example of how it is done. When z, the result of A XOR B, is XORed with A, B is recovered as the result; when XORed with B, A is recovered.

   A = 1010
   B = 1001
   z = A XOR B = 0011
   B = z XOR A = 1001
   A = z XOR B = 1010

8. Security Considerations

This error-correction scheme does not expose the data in the RTP header and payload to any further security risk, nor does it provide any further security protection, than if it were not used.

9. Conclusions

The data transport must fit the expected parameters of the target transmission medium, e.g., data transfer rate, data error rate, data error mode, and round trip delay. In the case of the Internet via phone modems, we have empirically determined that a moderate-to-good Internet transmission transfers about 20,000 bits per second with a 10 percent to 20 percent packet loss.

Given these parameters and the need to provide mechanisms that support multicasting of audio and video, the XOR error-correction, packet-redundancy scheme described in this document works well between endpoints on a lossy packet-switched network, providing the user with higher quality, intelligible audio and video that is more fluid. There may be strategies for other transmission media, like intranets, that reduce overhead from the levels stated above, but in many cases, intranets have enough capacity so that the bandwidth requirements of this scheme are not an issue.

A final note: Although this error-correction scheme is intended to be implemented within endpoints on the network, it would be more efficient to implement the core technology solely on network points-of-presence such as, in the case of the Internet, at the dial-up user's service provider. To illustrate, assume we have two users, each connected via a modem to their service provider, engaged in a conference. Their modem connections are small-bandwidth, low-latency, high-reliability connections. The Internet connection between the two service providers is a high-bandwidth, high-latency, low-reliability connection. The users could send their packets without redundancy, hence using the narrow bandwidth efficiently, to an active agent residing at the service provider. That agent could then create the XORed redundant-packet stream for transmission to a peer agent at the other service provider, which would then recover the original packets and send them down the modem pipe to the second user. Of course, the drawback of this scheme is the need for active agents resident at the Internet service providers.

10. References

[1] "Visual Telephone System and Equipment for Local Area Networks Which Provide a Non-Guaranteed Quality of Service," ITU-T Draft Recommendation H.323, 1996.

[2] Postel, J., "Internet Protocol," RFC 791, 1981.

[3] Postel, J., ed., "Transmission Control Protocol - DARPA Internet Program Protocol Specification," RFC 793, 1981.

[4] Postel, J., "User Datagram Protocol," RFC 768, 1980.

[5] Bolot, J.-C. and Vega-Garcia, A., "The case for FEC-based error control for packet audio in the Internet," Multimedia Systems, 1997.

[6] "Video Coding for Low Bitrate Communication," ITU-T Recommendation H.263, 1996.

[7] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, V., "RTP: A Transport Protocol for Real-Time Applications," RFC 1889, 1996.

[8] "Control Protocol for Multimedia Communication," ITU-T Recommendation H.245, 1996.

11. Authors' Address

Dan Budge (dbudge@smithmicro.com, telephone extension 22)
Robert McKenzie (bmckenzie@smithmicro.com)
Willie Mills (wmills@smithmicro.com)
William Diss (bdiss@smithmicro.com)
Paul Long (plong@smithmicro.com, telephone extension 12)

Smith Micro Software, Inc.
15050 SW Koll Parkway
Suite 2B
Beaverton, OR 97006
USA

Phone: +1.503.641.1221
Fax: +1.503.641.3344