| < draft-ietf-tsvwg-tfrc-04.txt | draft-ietf-tsvwg-tfrc-05.txt > | |||
|---|---|---|---|---|
| Internet Engineering Task Force TSV WG | A new Request for Comments is now available in online RFC libraries. | |||
| INTERNET-DRAFT Mark Handley/ICIR | ||||
| draft-ietf-tsvwg-tfrc-04.txt Jitendra Padhye/ICIR | ||||
| Sally Floyd/ICIR | ||||
| Joerg Widmer/Univ. Mannheim | ||||
| 27 April 2002 | ||||
| Expires: October 2002 | ||||
| TCP Friendly Rate Control (TFRC): | ||||
| Protocol Specification | ||||
| Status of this Document | ||||
| This document is an Internet-Draft and is in full conformance with all | ||||
| provisions of Section 10 of RFC2026. | ||||
| Internet-Drafts are working documents of the Internet Engineering Task | ||||
| Force (IETF), its areas, and its working groups. Note that other groups | ||||
| may also distribute working documents as Internet-Drafts. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | ||||
| and may be updated, replaced, or obsoleted by other documents at any | ||||
| time. It is inappropriate to use Internet- Drafts as reference material | ||||
| or to cite them other than as "work in progress." | ||||
| The list of current Internet-Drafts can be accessed at | ||||
| http://www.ietf.org/ietf/1id-abstracts.txt | ||||
| The list of Internet-Draft Shadow Directories can be accessed at | ||||
| http://www.ietf.org/shadow.html. | ||||
| This document is a product of the IETF TSV WG. Comments should be | ||||
| addressed to the authors. | ||||
| Abstract | ||||
| This document specifies TCP-Friendly Rate Control (TFRC). | ||||
| TFRC is a congestion control mechanism for unicast flows | ||||
| operating in a best-effort Internet environment. It is | ||||
| reasonably fair when competing for bandwidth with TCP flows, | ||||
| but has a much lower variation of throughput over time | ||||
| compared with TCP, making it more suitable for applications | ||||
| such as telephony or streaming media where a relatively smooth | ||||
| sending rate is of importance. | ||||
| Table of Contents | ||||
| 1. Introduction. . . . . . . . . . . . . . . . . . . . . . 4 | ||||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . 5 | ||||
| 3. Protocol Mechanism. . . . . . . . . . . . . . . . . . . 5 | ||||
| 3.1. TCP Throughput Equation. . . . . . . . . . . . . . . 6 | ||||
| 3.2. Packet Contents. . . . . . . . . . . . . . . . . . . 7 | ||||
| 3.2.1. Data Packets. . . . . . . . . . . . . . . . . . . 7 | ||||
| 3.2.2. Feedback Packets. . . . . . . . . . . . . . . . . 8 | ||||
| 4. Data Sender Protocol. . . . . . . . . . . . . . . . . . 8 | ||||
| 4.1. Measuring the Packet Size. . . . . . . . . . . . . . 9 | ||||
| 4.2. Sender Initialization. . . . . . . . . . . . . . . . 9 | ||||
| 4.3. Sender behavior when a feedback packet is | ||||
| received. . . . . . . . . . . . . . . . . . . . . . . . . 10 | ||||
| 4.4. Expiration of nofeedback timer . . . . . . . . . . . 11 | ||||
| 4.5. Preventing Oscillations. . . . . . . . . . . . . . . 11 | ||||
| 4.6. Scheduling of Packet Transmissions . . . . . . . . . 12 | ||||
| 5. Calculation of the Loss Event Rate (p). . . . . . . . . 13 | ||||
| 5.1. Detection of Lost or Marked Packets. . . . . . . . . 13 | ||||
| 5.2. Translation from Loss History to Loss Events . . . . 14 | ||||
| 5.3. Inter-loss Event Interval. . . . . . . . . . . . . . 15 | ||||
| 5.4. Average Loss Interval. . . . . . . . . . . . . . . . 15 | ||||
| 5.5. History Discounting. . . . . . . . . . . . . . . . . 17 | ||||
| 6. Data Receiver Protocol. . . . . . . . . . . . . . . . . 19 | ||||
| 6.1. Receiver behavior when a data packet is | ||||
| received. . . . . . . . . . . . . . . . . . . . . . . . . 19 | ||||
| 6.2. Expiration of feedback timer . . . . . . . . . . . . 20 | ||||
| 6.3. Receiver initialization. . . . . . . . . . . . . . . 20 | ||||
| 6.3.1. Initializing the Loss History after the | ||||
| First Loss Event . . . . . . . . . . . . . . . . . . . . 21 | ||||
| 7. Security Considerations . . . . . . . . . . . . . . . . 21 | ||||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . 22 | ||||
| 9. Authors' Addresses. . . . . . . . . . . . . . . . . . . 22 | ||||
| 10. Acknowledgments. . . . . . . . . . . . . . . . . . . . 22 | ||||
| 11. References . . . . . . . . . . . . . . . . . . . . . . 22 | ||||
| NOTE TO RFC EDITOR: PLEASE DELETE THIS NOTE UPON PUBLICATION. | RFC 3448 | |||
| Changes from draft-ietf-tsvwg-tfrc-03.txt: | ||||
| * Changed one line in the computation of the weighted average I_mean of | ||||
| the loss intervals from: | ||||
| W_tot = w_(i-1) * DF_i; | ||||
| to: | ||||
| W_tot = W_tot + w_(i-1) * DF_i; | ||||
| Bug report from Wendy Lee. | ||||
| * Removed the specification that the sender sends its current transmit | Title: TCP Friendly Rate Control (TFRC): | |||
| rate. This was only used to initialize X_recv, and is not needed. | Protocol Specification | |||
| Author(s): M. Handley, S. Floyd, J. Padhye, J. Widmer | ||||
| Status: Standards Track | ||||
| Date: January 2003 | ||||
| Mailbox: mjh@icir.org, floyd@icir.org, | ||||
| padhye@microsoft.com, | ||||
| widmer@informatik.uni-mannheim.de | ||||
| Pages: 24 | ||||
| Characters: 52657 | ||||
| Updates/Obsoletes/SeeAlso: None | ||||
| * Said that instead of sending the timestamp and the estimate of the | I-D Tag: draft-ietf-tsvwg-tfrc-05.txt | |||
| RTT, the sender could send a coarse-grained counter that increments | ||||
| every quarter RTT. In this case, the receiver also does not have to | ||||
| send the timestamp back to the sender. | ||||
| END OF NOTE TO RFC EDITOR. | ||||
| 1. Introduction | URL: ftp://ftp.rfc-editor.org/in-notes/rfc3448.txt | |||
| This document specifies TCP-Friendly Rate Control (TFRC). TFRC is a | This document specifies TCP-Friendly Rate Control (TFRC). TFRC is a | |||
| congestion control mechanism designed for unicast flows operating in a | congestion control mechanism for unicast flows | |||
| Internet environment and competing with TCP traffic [2]. Instead of | operating in a best-effort Internet environment. It is | |||
| specifying a complete protocol, this document simply specifies a | reasonably fair when competing for bandwidth with TCP flows, | |||
| congestion control mechanism that could be used in a transport protocol | but has a much lower variation of throughput over time | |||
| such as RTP [6], in an application incorporating end-to-end congestion | compared with TCP, making it more suitable for applications | |||
| control at the application level, or in the context of endpoint | such as telephony or streaming media where a relatively smooth | |||
| congestion management [BRS99]. This document does not discuss packet | sending rate is of importance. | |||
| formats, reliability, or implementation-related issues. | ||||
| TFRC is designed to be reasonably fair when competing for bandwidth with | ||||
| TCP flows, where a flow is ``reasonably fair'' if its sending rate is | ||||
| generally within a factor of two of the sending rate of a TCP flow under | ||||
| the same conditions. However TFRC has a much lower variation of | ||||
| throughput over time compared with TCP, which makes it more suitable for | ||||
| applications such as telephony or streaming media where a relatively | ||||
| smooth sending rate is of importance. | ||||
| The penalty of having smoother throughput than TCP while competing | ||||
| fairly for bandwidth is that TFRC responds slower than TCP to changes in | ||||
| available bandwidth. Thus TFRC should only be used when the application | ||||
| has a requirement for smooth throughput, in particular, avoiding TCP's | ||||
| halving of the sending rate in response to a single packet drop. For | ||||
| applications that simply need to transfer as much data as possible in as | ||||
| short a time as possible we recommend using TCP, or if reliability is | ||||
| not required, using an Additive-Increase, Multiplicative-Decrease (AIMD) | ||||
| congestion control scheme with similar parameters to those used by TCP. | ||||
| TFRC is designed for applications that use a fixed packet size, and vary | ||||
| their sending rate in packets per second in response to congestion. | ||||
| Some audio applications require a fixed interval of time between packets | ||||
| and vary their packet size instead of their packet rate in response to | ||||
| congestion. The congestion control mechanism in this document cannot be | ||||
| used by those applications; TFRC-PS (for TFRC-PacketSize) is a variant | ||||
| of TFRC for applications that have a fixed sending rate but vary their | ||||
| packet size in response to congestion. TFRC-PS will be specified in a | ||||
| later document. | ||||
| TFRC is a receiver-based mechanism, with the calculation of the | ||||
| congestion control information (i.e., the loss event rate) in the data | ||||
| receiver rather in the data sender. This is well-suited to an | ||||
| application where the sender is a large server handling many concurrent | ||||
| connections, and the receiver has more memory and CPU cycles available | ||||
| for computation. In addition, a receiver-based mechanism is more | ||||
| suitable as a building block for multicast congestion control. | ||||
| 2. Terminology | ||||
| In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", | ||||
| "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | ||||
| "OPTIONAL" are to be interpreted as described in RFC 2119 and indicate | ||||
| requirement levels for compliant TFRC implementations. | ||||
| 3. Protocol Mechanism | ||||
| For its congestion control mechanism, TFRC directly uses a throughput | ||||
| equation for the allowed sending rate as a function of the loss event | ||||
| rate and round-trip time. In order to compete fairly with TCP, TFRC | ||||
| uses the TCP throughput equation, which roughly describes TCP's sending | ||||
| rate as a function of the loss event rate, round-trip time, and packet | ||||
| size. We define a loss event as one or more lost or marked packets from | ||||
| a window of data, where a marked packet refers to a congestion | ||||
| indication from Explicit Congestion Notification (ECN) [7]. | ||||
| Generally speaking, TFRC's congestion control mechanism works as | ||||
| follows: | ||||
| o The receiver measures the loss event rate and feeds this | ||||
| information back to the sender. | ||||
| o The sender also uses these feedback messages to measure the round- | ||||
| trip time (RTT). | ||||
| o The loss event rate and RTT are then fed into TFRC's throughput | ||||
| equation, giving the acceptable transmit rate. | ||||
| o The sender then adjusts its transmit rate to match the calculated | ||||
| rate. | ||||
| The dynamics of TFRC are sensitive to how the measurements are performed | ||||
| and applied. We recommend specific mechanisms below to perform and | ||||
| apply these measurements. Other mechanisms are possible, but it is | ||||
| important to understand how the interactions between mechanisms affect | ||||
| the dynamics of TFRC. | ||||
| 3.1. TCP Throughput Equation | ||||
| Any realistic equation giving TCP throughput as a function of loss event | ||||
| rate and RTT should be suitable for use in TFRC. However, we note that | ||||
| the TCP throughput equation used must reflect TCP's retransmit timeout | ||||
| behavior, as this dominates TCP throughput at higher loss rates. We | ||||
| also note that the assumptions implicit in the throughput equation about | ||||
| the loss event rate parameter have to be a reasonable match to how the | ||||
| loss rate or loss event rate is actually measured. While this match is | ||||
| not perfect for the throughput equation and loss rate measurement | ||||
| mechanisms given below, in practice the assumptions turn out to be close | ||||
| enough. | ||||
| The throughput equation we currently recommend for TFRC is a slightly | ||||
| simplified version of the throughput equation for Reno TCP from [4]. | ||||
| Ideally we'd prefer a throughput equation based on SACK TCP, but no one | ||||
| has yet derived the throughput equation for SACK TCP, and from both | ||||
| simulations and experiments, the differences between the two equations | ||||
| are relatively minor. | ||||
| The throughput equation is: | ||||
| s | ||||
| X = ---------------------------------------------------------- | ||||
| R*sqrt(2*b*p/3) + (t_RTO * (3*sqrt(3*b*p/8) * p * (1+32*p^2))) | ||||
| Where: | ||||
| X is the transmit rate in bytes/second. | ||||
| s is the packet size in bytes. | ||||
| R is the round trip time in seconds. | ||||
| p is the loss event rate, between 0 and 1.0, of the number of loss | ||||
| events as a fraction of the number of packets transmitted. | ||||
| t_RTO is the TCP retransmission timeout value in seconds. | ||||
| b is the number of packets acknowledged by a single TCP | ||||
| acknowledgement. | ||||
| We further simplify this by setting t_RTO = 4*R. A more accurate | ||||
| calculation of t_RTO is possible, but experiments with the current | ||||
| setting have resulted in reasonable fairness with existing TCP | ||||
| implementations [5]. Another possibility would be to set t_RTO = max(4R, | ||||
| one second), to match the recommended minimum of one second on the RTO | ||||
| [8]. | ||||
| Many current TCP connections use delayed acknowledgements, sending an | ||||
| acknowledgement for every two data packets received, and thus have a | ||||
| sending rate modeled by b = 2. However, TCP is also allowed to send an | ||||
| acknowledgement for every data packet, and this would be modeled by b = | ||||
| 1. Because many TCP implementations do not use delayed | ||||
| acknowledgements, we recommend b = 1. | ||||
| In future, different TCP equations may be substituted for this equation. | ||||
| The requirement is that the throughput equation be a reasonable | ||||
| approximation of the sending rate of TCP for conformant TCP congestion | ||||
| control. | ||||
| The parameters s (packet size), p (loss event rate) and R (RTT) need to | ||||
| be measured or calculated by a TFRC implementation. The measurement of | ||||
| s is specified in Section 4.1, measurement of R is specified in Section | ||||
| 4.3, and measurement of p is specified in Section 5. In the rest of this | ||||
| document all data rates are measured in bytes/second. | ||||
| 3.2. Packet Contents | ||||
| Before specifying the sender and receiver functionality, we describe the | ||||
| contents of the data packets sent by the sender and feedback packets | ||||
| sent by the receiver. As TFRC will be used along with a transport | ||||
| protocol, we do not specify packet formats, as these depend on the | ||||
| details of the transport protocol used. | ||||
| 3.2.1. Data Packets | ||||
| Each data packet sent by the data sender contains the following | ||||
| information: | ||||
| o A sequence number. This number is incremented by one for each data | ||||
| packet transmitted. The field must be sufficiently large that it | ||||
| does not wrap causing two different packets with the same sequence | ||||
| number to be in the receiver's recent packet history at the same | ||||
| time. | ||||
| o A timestamp indicating when the packet is sent. We denote by ts_i | ||||
| the timestamp of the packet with sequence number i. The resolution | ||||
| of the timestamp should typically be measured in milliseconds. | ||||
| This timestamp is used by the receiver to determine which losses | ||||
| belong to the same loss event. The timestamp is also echoed by the | ||||
| receiver to enable the sender to estimate the round-trip time, for | ||||
| senders that do not save timestamps of transmitted data packets. | ||||
| We note that as an alternative to a timestamp incremented in | ||||
| milliseconds, a "timestamp" that increments every quarter of a | ||||
| round-trip time would be sufficient for determining when losses | ||||
| belong to the same loss event, in the context of a protocol where | ||||
| this is understood by both sender and receiver, and where the | ||||
| sender saves the timestamps of transmitted data packets. | ||||
| o The sender's current estimate of the round trip time. The estimate | ||||
| reported in packet i is denoted by R_i. The round-trip time | ||||
| estimate is used by the receiver, along with the timestamp, to | ||||
| determine when multiple losses belong to the same loss event. | ||||
| If the sender sends a coarse-grained "timestamp" that increments | ||||
| every quarter of a round-trip time, as discussed above, then the | ||||
| sender does not need to send its current estimate of the round trip | ||||
| time. | ||||
| 3.2.2. Feedback Packets | ||||
| Each feedback packet sent by the data receiver contains the following | ||||
| information: | ||||
| o The timestamp of the last data packet received. We denote this by | ||||
| t_recvdata. If the last packet received at the receiver has | ||||
| sequence number i, then t_recvdata = ts_i. | ||||
| This timestamp is used by the sender to estimate the round-trip | ||||
| time, and is only needed if the sender does not save timestamps of | ||||
| transmitted data packets. | ||||
| o The amount of time elapsed between the receipt of the last data | ||||
| packet at the receiver, and the generation of this feedback report. | ||||
| We denote this by t_delay. | ||||
| o The rate at which the receiver estimates that data was received | ||||
| since the last feedback report was sent. We denote this by X_recv. | ||||
| o The receiver's current estimate of the loss event rate, p. | ||||
| 4. Data Sender Protocol | ||||
| The data sender sends a stream of data packets to the data receiver at a | ||||
| controlled rate. When a feedback packet is received from the data | ||||
| receiver, the data sender changes its sending rate, based on the | ||||
| information contained in the feedback report. If the sender does not | ||||
| receive a feedback report for two round trip times, it cuts its sending | ||||
| rate in half. This is achieved by means of a timer called the | ||||
| nofeedback timer. | ||||
| We specify the sender-side protocol in the following steps: | ||||
| o Measurement of the mean packet size being sent. | ||||
| o The sender behavior when a feedback packet is received. | ||||
| o The sender behavior when the nofeedback timer expires. | ||||
| o Oscillation prevention (optional) | ||||
| o Scheduling of transmission on non-realtime operating systems. | ||||
| 4.1. Measuring the Packet Size | ||||
| The parameter s (packet size) is normally known to an application. This | ||||
| may not be so in two cases: | ||||
| o The packet size naturally varies depending on the data. In this | ||||
| case, although the packet size varies, that variation is not | ||||
| coupled to the transmit rate. It should normally be safe to use an | ||||
| estimate of the mean packet size for s. | ||||
| o The application needs to change the packet size rather than the | ||||
| number of packets per second to perform congestion control. This | ||||
| would normally be the case with packet audio applications where a | ||||
| fixed interval of time needs to be represented by each packet. | ||||
| Such applications need to have a completely different way of | ||||
| measuring parameters. | ||||
| The second class of applications are discussed separately in a separate | ||||
| document on TFRC-PS. For the remainder of this section we assume the | ||||
| sender can estimate the packet size, and that congestion control is | ||||
| performed by adjusting the number of packets sent per second. | ||||
| 4.2. Sender Initialization | ||||
| To initialize the sender, the value of X is set to 1 packet/second and | ||||
| the nofeedback timer is set to expire after 2 seconds. The initial | ||||
| values for R (RTT) and t_RTO are undefined until they are set as | ||||
| described below. The intial value of tld, for the Time Last Doubled | ||||
| during slow-start, is set to -1. | ||||
| 4.3. Sender behavior when a feedback packet is received | ||||
| The sender knows its current sending rate, X, and maintains an estimate | ||||
| of the current round trip time, R, and an estimate of the timeout | ||||
| interval, t_RTO. | ||||
| When a feedback packet is received by the sender at time t_now, the | ||||
| following actions should be performed: | ||||
| 1) Calculate a new round trip sample. | ||||
| R_sample = (t_now - t_recvdata) - t_delay. | ||||
| 2) Update the round trip time estimate: | ||||
| If no feedback has been received before | ||||
| R = R_sample; | ||||
| Else | ||||
| R = q*R + (1-q)*R_sample; | ||||
| TFRC is not sensitive to the precise value for the filter constant | ||||
| q, but we recommend a default value of 0.9. | ||||
| 3) Update the timeout interval: | ||||
| t_RTO = 4*R. | ||||
| 4) Update the sending rate as follows: | ||||
| If (p > 0) | ||||
| Calculate X_calc using the TCP throughput equation. | ||||
| X = max(min(X_calc, 2*X_recv), s/64); | ||||
| Else | ||||
| If (t_now - tld >= RTT) | ||||
| X = max(min(2*X, 2*X_recv), s/RTT); | ||||
| tld = t_now; | ||||
| Note that if p == 0, then the sender is in slow-start phase, | ||||
| where it approximately doubles the sending rate each round- | ||||
| trip time until a loss occurs. The s/RTT term gives a minimum | ||||
| sending rate during slow-start of one packet per RTT. When p | ||||
| > 0, the sender sends at least one packet every 64 seconds. | ||||
| 5) Reset the nofeedback timer to expire after max(4*R, 2*s/X) seconds. | ||||
| 4.4. Expiration of nofeedback timer | ||||
| If the nofeedback timer expires, the sender should perform the following | ||||
| actions: | ||||
| 1) Cut the sending rate in half. This is done by modifying the | ||||
| sender's cached copy of X_recv (the receive rate). Because the | ||||
| sending rate is limited to at most twice X_recv, modifying X_recv | ||||
| limits the current sending rate, but allows the sender to slow- | ||||
| start, doubling its sending rate each RTT, if feedback messages | ||||
| resume reporting no losses. | ||||
| If (X_calc > 2*X_recv) | ||||
| X_recv = max(X_recv/2, s/128); | ||||
| Else | ||||
| X_recv = X_calc/4; | ||||
| The s/128 term limits the backoff to one packet every 64 seconds in | ||||
| the case of persistent absence of feedback. | ||||
| 2) The value of X must then be recalculated as described under point | ||||
| (4) above. | ||||
| If the nofeedback timer expires when the sender does not yet have | ||||
| an RTT sample, and has not yet received any feedback from the | ||||
| sender, then step (1) can be skipped, and the sending rate cut in | ||||
| half directly: | ||||
| X = max(X/2, s/64) | ||||
| 3) Restart the nofeedback timer to expire after max(4*R, 2*s/X) | ||||
| seconds. | ||||
| Note that when the sender stops sending, the receiver will stop sending | ||||
| feedback. This will cause the nofeedback timer to start to expire and | ||||
| decrease X_recv. If the sender subsequently starts to send again, | ||||
| X_recv will limit the transmit rate, and a normal slowstart phase will | ||||
| occur until the transmit rate reaches X_calc. | ||||
| 4.5. Preventing Oscillations | ||||
| To prevent oscillatory behavior in environments with a low degree of | ||||
| statistical multiplexing it is useful to modify sender's transmit rate | ||||
| to provide congestion avoidance behavior by reducing the transmit rate | ||||
| as the queuing delay (and hence RTT) increases. To do this the sender | ||||
| maintains an estimate of the long-term RTT and modifies its sending rate | ||||
| depending on how the most recent sample of the RTT differs from this | ||||
| value. The long-term sample is R_sqmean, and is set as follows: | ||||
| If no feedback has been received before | ||||
| R_sqmean = sqrt(R_sample); | ||||
| Else | ||||
| R_sqmean = q2*R_sqmean + (1-q2)*sqrt(R_sample); | ||||
| Thus R_sqmean gives the exponentially weighted moving average of the | ||||
| square root of the RTT samples. The constant q2 should be set similarly | ||||
| to q, and we recommend a value of 0.9 as the default. | ||||
| The sender obtains the base transmit rate, X, from the throughput | ||||
| function. It then calculates a modified instantaneous transmit rate | ||||
| X_inst, as follows: | ||||
| X_inst = X * R_sqmean / sqrt(R_sample); | ||||
| When sqrt(R_sample) is greater than R_sqmean then the queue is typically | ||||
| increasing and so the transmit rate needs to be decreased for stable | ||||
| operation. | ||||
| Note: This modification is not always strictly required, especially if | ||||
| the degree of statistical multiplexing in the network is high. However | ||||
| we recommend that it is done because it does make TFRC behave better in | ||||
| environments with a low level of statistical multiplexing. If it is not | ||||
| done, we recommend using a very low value of q, such that q is close to | ||||
| or exactly zero. | ||||
| 4.6. Scheduling of Packet Transmissions | ||||
| As TFRC is rate-based, and as operating systems typically cannot | ||||
| schedule events precisely, it is necessary to be opportunistic about | ||||
| sending data packets so that the correct average rate is maintained | ||||
| despite the course-grain or irregular scheduling of the operating | ||||
| system. Thus a typical sending loop will calculate the correct inter- | ||||
| packet interval, t_ipi, as follows: | ||||
| t_ipi = s/X_inst; | ||||
| When a sender first starts sending at time t_0, it calculates t_ipi, and | ||||
| calculates a nominal send time t_1 = t_0 + t_ipi for packet 1. When the | ||||
| application becomes idle, it checks the current time, t_now, and then | ||||
| requests re-scheduling after (t_ipi - (t_now - t_0)) seconds. When the | ||||
| application is re-scheduled, it checks the current time, t_now, again. | ||||
| If (t_now > t_1 - delta) then packet 1 is sent. | ||||
| Now a new t_ipi may be calculated, and used to calculate a nominal send | ||||
| time t_2 for packet 2: t2 = t_1 + t_ipi. The process then repeats, with | ||||
| each successive packet's send time being calculated from the nominal | ||||
| send time of the previous packet. | ||||
| In some cases, when the nominal send time, t_i, of the next packet is | ||||
| calculated, it may already be the case that t_now > t_i - delta. In | ||||
| such a case the packet should be sent immediately. Thus if the | ||||
| operating system has coarse timer granularity and the transmit rate is | ||||
| high, then TFRC may send short bursts of several packets separated by | ||||
| intervals of the OS timer granularity. | ||||
| The parameter delta is to allow a degree of flexibility in the send time | ||||
| of a packet. If the operating system has a scheduling timer granularity | ||||
| of t_gran seconds, then delta would typically be set to: | ||||
| delta = min(t_ipi/2, t_gran/2); | ||||
| t_gran is 10ms on many Unix systems. If t_gran is not known, a value of | ||||
| 10ms can be safely assumed. | ||||
| 5. Calculation of the Loss Event Rate (p) | ||||
| Obtaining an accurate and stable measurement of the loss event rate is | ||||
| of primary importance for TFRC. Loss rate measurement is performed at | ||||
| the receiver, based on the detection of lost or marked packets from the | ||||
| sequence numbers of arriving packets. We describe this process before | ||||
| describing the rest of the receiver protocol. | ||||
| 5.1. Detection of Lost or Marked Packets | ||||
| TFRC assumes that all packets contain a sequence number that is | ||||
| incremented by one for each packet that is sent. For the purposes of | ||||
| this specification, we require that if a lost packet is retransmitted, | ||||
| the retransmission is given a new sequence number that is the latest in | ||||
| the transmission sequence, and not the same sequence number as the | ||||
| packet that was lost. If a transport protocol has the requirement that | ||||
| it must retransmit with the original sequence number, then the transport | ||||
| protocol designer must figure out how to distinguish delayed from | ||||
| retransmitted packets and how to detect lost retransmissions. | ||||
| The receiver maintains a data structure that keeps track of which | ||||
| packets have arrived and which are missing. For the purposes of | ||||
| specification, we assume that the data structure consists of a list of | ||||
| packets that have arrived along with the receiver timestamp when each | ||||
| packet was received. In practice this data structure will normally be | ||||
| stored in a more compact representation, but this is implementation- | ||||
| specific. | ||||
| The loss of a packet is detected by the arrival of at least three | ||||
| packets with a higher sequence number than the lost packet. The | ||||
| requirement for three subsequent packets is the same as with TCP, and is | ||||
| to make TFRC more robust in the presence of reordering. In contrast to | ||||
| TCP, if a packet arrives late (after 3 subsequent packets arrived) in | ||||
| TFRC, the late packet can fill the hole in TFRC's reception record, and | ||||
| the receiver can recalculate the loss event rate. Future versions of | ||||
| TFRC might make the requirement for three subsequent packets adaptive | ||||
| based on experienced packet reordering, but we do not specify such a | ||||
| mechanism here. | ||||
| For an ECN-capable connection, a marked packet is detected as a | ||||
| congestion event as soon as it arrives, without having to wait for the | ||||
| arrival of subsequent packets. | ||||
| 5.2. Translation from Loss History to Loss Events | ||||
| TFRC requires that the loss fraction be robust to several consecutive | ||||
| packets lost where those packets are part of the same loss event. This | ||||
| is similar to TCP, which (typically) only performs one halving of the | ||||
| congestion window during any single RTT. Thus the receiver needs to map | ||||
| the packet loss history into a loss event record, where a loss event is | ||||
| one or more packets lost in an RTT. To perform this mapping, the | ||||
| receiver needs to know the RTT to use, and this is supplied periodically | ||||
| by the sender, typically as control information piggy-backed onto a data | ||||
| packet. TFRC is not sensitive to how the RTT measurement sent to the | ||||
| receiver is made, but we recommend using the sender's calculated RTT, R, | ||||
| (see Section 4.3) for this purpose. | ||||
| To determine whether a lost or marked packet should start a new loss | ||||
| event, or be counted as part of an existing loss event, we need to | ||||
| compare the sequence numbers and timestamps of the packets that arrived | ||||
| at the receiver. For a marked packet S_new, its reception time T_new | ||||
| can be noted directly. For a lost packet, we can interpolate to infer | ||||
| the nominal "arrival time". Assume: | ||||
| S_loss is the sequence number of a lost packet. | ||||
| S_before is the sequence number of the last packet to arrive with | ||||
| sequence number before S_loss. | ||||
| S_after is the sequence number of the first packet to arrive with | ||||
| sequence number after S_loss. | ||||
| T_before is the reception time of S_before. | ||||
| T_after is the reception time of S_after. | ||||
| Note that T_before can either be before or after T_after due to | ||||
| reordering. | ||||
| For a lost packet S_loss, we can interpolate its nominal "arrival time" | ||||
| at the receiver from the arrival times of S_before and S_after. Thus | ||||
| T_loss = T_before + ( (T_after - T_before) | ||||
| * (S_loss - S_before)/(S_after - S_before) ); | ||||
| Note that if the sequence space wrapped between S_before and S_after, | ||||
| then the sequence numbers must be modified to take this into account | ||||
| before performing this calculation. If the largest possible sequence | ||||
| number is S_max, and S_before > S_after, then modifying each sequence | ||||
| number S by S' = (S + (S_max + 1)/2) mod (S_max + 1) would normally be | ||||
| sufficient. | ||||
| If the lost packet S_old was determined to have started the previous | ||||
| loss event, and we have just determined that S_new has been lost, then | ||||
| we interpolate the nominal arrival times of S_old and S_new, called | ||||
| T_old and T_new respectively. | ||||
| If T_old + R >= T_new, then S_new is part of the existing loss event. | ||||
| Otherwise S_new is the first packet in a new loss event. | ||||
| 5.3. Inter-loss Event Interval | ||||
| If a loss interval, A, is determined to have started with packet | ||||
| sequence number S_A and the next loss interval, B, started with packet | ||||
| sequence number S_B, then the number of packets in loss interval A is | ||||
| given by (S_B - S_A). | ||||
| 5.4. Average Loss Interval | ||||
| To calculate the loss event rate p, we first calculate the average loss | ||||
| interval. This is done using a filter that weights the n most recent | ||||
| loss event intervals in such a way that the measured loss event rate | ||||
| changes smoothly. | ||||
| Weights w_0 to w_(n-1) are calculated as: | ||||
| If (i < n/2) | ||||
| w_i = 1; | ||||
| Else | ||||
| w_i = 1 - (i - (n/2 - 1))/(n/2 + 1); | ||||
| Thus if n=8, the values of w_0 to w_7 are: | ||||
| 1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2 | ||||
| The value n for the number of loss intervals used in calculating the | ||||
| loss event rate determines TRFC's speed in responding to changes in the | ||||
| level of congestion. As currently specified, TFRC should not be used | ||||
| for values of n significantly greater than 8, for traffic that might | ||||
| compete in the global Internet with TCP. At the very least, safe | ||||
| operation with values of n greater than 8 would require a slight change | ||||
| to TFRC's mechanisms to include a more severe response to two or more | ||||
| round-trip times with heavy packet loss. | ||||
| When calculating the average loss interval we need to decide whether to | ||||
| include the interval since the most recent packet loss event. We only | ||||
| do this if it is sufficiently large to increase the average loss | ||||
| interval. | ||||
| Thus if the most recent loss intervals are I_0 to I_n, with I_0 being | ||||
| the interval since the most recent loss event, then we calculate the | ||||
| average loss interval I_mean as: | ||||
| I_tot0 = 0; | ||||
| I_tot1 = 0; | ||||
| W_tot = 0; | ||||
| for (i = 0 to n-1) { | ||||
| I_tot0 = I_tot0 + (I_i * w_i); | ||||
| W_tot = W_tot + w_i; | ||||
| } | ||||
| for (i = 1 to n) { | ||||
| I_tot1 = I_tot1 + (I_i * w_(i-1)); | ||||
| } | ||||
| I_tot = max(I_tot0, I_tot1); | ||||
| I_mean = I_tot/W_tot; | ||||
| The loss event rate, p is simply: | ||||
| p = 1 / I_mean; | ||||
| 5.5. History Discounting | ||||
| As described in Section 5.4, the most recent loss interval is only | ||||
| assigned 1/(0.75*n) of the total weight in calculating the average loss | ||||
| interval, regardless of the size of the most recent loss interval. This | ||||
| section describes an optional history discounting mechanism, discussed | ||||
| further in [3] and [5], that allows the TFRC receiver to adjust the | ||||
| weights, concentrating more of the relative weight on the most recent | ||||
| loss interval, when the most recent loss interval is more than twice as | ||||
| large as the computed average loss interval. | ||||
| To carry out history discounting, we associate a discount factor DF_i | ||||
| with each loss interval L_i, for i > 0, where each discount factor is a | ||||
| floating point number. The discount array maintains the cumulative | ||||
| history of discounting for each loss interval. At the beginning, the | ||||
| values of DF_i in the discount array are initialized to 1: | ||||
| for (i = 1 to n) { | ||||
| DF_i = 1; | ||||
| } | ||||
| History discounting also uses a general discount factor DF, also a | ||||
| floating point number, that is also initialized to 1. First we show how | ||||
| the discount factors are used in calculating the average loss interval, | ||||
| and then we describe later in this section how the discount factors are | ||||
| modified over time. | ||||
| As described in Section 5.4 the average loss interval is calculated | ||||
| using the n previous loss intervals I_1, ..., I_n, and the interval I_0 | ||||
| that represents the number of packets received since the last loss | ||||
| event. The computation of the average loss interval using the discount | ||||
| factors is a simple modification of the procedure in Section 5.4, as | ||||
| follows: | ||||
| I_tot0 = I_0 * w_0 | ||||
| I_tot1 = 0; | ||||
| W_tot0 = w_0 | ||||
| W_tot1 = 0; | ||||
| for (i = 1 to n-1) { | ||||
| I_tot0 = I_tot0 + (I_i * w_i * DF_i * DF); | ||||
| W_tot0 = W_tot0 + w_i * DF_i * DF; | ||||
| } | ||||
| for (i = 1 to n) { | ||||
| I_tot1 = I_tot1 + (I_i * w_(i-1) * DF_i); | ||||
| W_tot1 = W_tot1 + w_(i-1) * DF_i; | ||||
| } | ||||
| p = min(W_tot0/I_tot0, W_tot1/I_tot1); | ||||
| The general discounting factor, DF is updated on every packet arrival as | ||||
| follows. First, the receiver computes the weighted average I_mean of the | ||||
| loss intervals I_1, ..., I_n: | ||||
| I_tot = 0; | ||||
| W_tot = 0; | ||||
| for (i = 1 to n) { | ||||
| W_tot = W_tot + w_(i-1) * DF_i; | ||||
| I_tot = I_tot + (I_i * w_(i-1) * DF_i); | ||||
| } | ||||
| I_mean = I_tot / W_tot; | ||||
| This weighted average I_mean is compared to I_0, the number of packets | ||||
| received since the last loss event. If I_0 is greater than twice | ||||
| I_mean, then the new loss interval is considerably larger than the old | ||||
| ones, and the general discount factor DF is updated to decrease the | ||||
| relative weight on the older intervals, as follows: | ||||
| if (I_0 > 2 * I_mean) { | ||||
| DF = 2 * I_mean/I_0; | ||||
| if (DF < THRESHOLD) | ||||
| DF = THRESHOLD; | ||||
| } else | ||||
| DF = 1; | ||||
| A nonzero value for THRESHOLD ensures that older loss intervals from an | ||||
| earlier time of high congestion are not discounted entirely. We | ||||
| recommend a THRESHOLD of 0.5. Note that with each new packet arrival, | ||||
| I_0 will increase further, and the discount factor DF will be updated. | ||||
| When a new loss event occurs, the current interval shifts from I_0 to | ||||
| I_1, loss interval I_i shifts to interval I_(i+1), and the loss interval | ||||
| I_n is forgotten. The previous discount factor DF has to be | ||||
| incorporated into the discount array. Because DF_i carries the discount | ||||
| factor associated with loss interval I_i, the DF_i array has to be | ||||
| shifted as well. This is done as follows: | ||||
| for (i = 1 to n) { | ||||
| DF_i = DF * DF_i; | ||||
| } | ||||
| for (i = n-1 to 0 step -1) { | ||||
| DF_(i+1) = DF_i; | ||||
| } | ||||
| I_0 = 1; | ||||
| DF_0 = 1; | ||||
| DF = 1; | ||||
| This completes the description of the optional history discounting | ||||
| mechanism. We emphasize that this is an optional mechanism whose sole | ||||
| purpose is to allow TFRC to response somewhat more quickly to the sudden | ||||
| absence of congestion, as represented by a long current loss interval. | ||||
| 6. Data Receiver Protocol | ||||
| The receiver periodically sends feedback messages to the sender. | ||||
| Feedback packets should normally be sent at least once per RTT, unless | ||||
| the sender is sending at a rate of less than one packet per RTT, in | ||||
| which case a feedback packet should be send for every data packet | ||||
| received. A feedback packet should also be sent whenever a new loss | ||||
| event is detected without waiting for the end of an RTT, and whenever an | ||||
| out-of-order data packet is received that removes a loss event from the | ||||
| history. | ||||
| If the sender is transmitting at a high rate (many packets per RTT) | ||||
| there may be some advantages to sending periodic feedback messages more | ||||
| than once per RTT as this allows faster response to changing RTT | ||||
| measurements, and more resilience to feedback packet loss. However | ||||
| there is little gain from sending a large number of feedback messages | ||||
| per RTT. | ||||
| 6.1. Receiver behavior when a data packet is received | ||||
| When a data packet is received, the receiver performs the following | ||||
| steps: | ||||
| 1) Add the packet to the packet history. | ||||
| 2) Let the previous value of p be p_prev. Calculate the new value of | ||||
| p as described in Section 5. | ||||
| 3) If p > p_prev, cause the feedback timer to expire, and perform the | ||||
| actions described in Section 6.2 | ||||
| If p <= p_prev no action need be performed. | ||||
| However an optimization might check to see if the arrival of the | ||||
| packet caused a hole in the packet history to be filled and | ||||
| consequently two loss intervals were merged into one. If this is | ||||
| the case, the receiver might also send feedback immediately. The | ||||
| effects of such an optimization are normally expected to be small. | ||||
| 6.2. Expiration of feedback timer | ||||
| When the feedback timer at the receiver expires, the action to be taken | ||||
| depends on whether data packets have been received since the last | ||||
| feedback was sent. | ||||
| Let the maximum sequence number of a packet at the receiver so far be | ||||
| S_m, and the value of the RTT measurement included in packet S_m be R_m. | ||||
| If data packets have been received since the pervious feedback was sent, | ||||
| the receiver performs the following steps: | ||||
| 1) Calculate the average loss event rate using the algorithm described | ||||
| above. | ||||
| 2) Calculate the measured receive rate, X_recv, based on the packets | ||||
| received within the previous R_m seconds. | ||||
| 3) Prepare and send a feedback packet containing the information | ||||
| described in Section 3.2.2 | ||||
| 4) Restart the feedback timer to expire after R_m seconds. | ||||
| If no data packets have been received since the last feedback was sent, | ||||
| no feedback packet is sent, and the feedback timer is restarted to | ||||
| expire after R_m seconds. | ||||
| 6.3. Receiver initialization | ||||
| The receiver is initialized by the first packet that arrives at the | ||||
| receiver. Let the sequence number of this packet be i. | ||||
| When the first packet is received: | ||||
| o Set p=0 | ||||
| o Set X_recv = 0. | ||||
| o Prepare and send a feedback packet. | ||||
| o Set the feedback timer to expire after R_i seconds. | ||||
| 6.3.1. Initializing the Loss History after the First Loss Event | ||||
| The number of packets until the first loss can not be used to compute | ||||
| the sending rate directly, as the sending rate changes rapidly during | ||||
| this time. TFRC assumes that the correct data rate after the first loss | ||||
| is half of the sending rate when the loss occurred. TFRC approximates | ||||
| this target rate by X_recv, the receive rate over the most recent round- | ||||
| trip time. After the first loss, instead of initializing the first loss | ||||
| interval to the number of packets sent until the first loss, the TFRC | ||||
| receiver calculates the loss interval that would be required to produce | ||||
| the data rate X_recv, and uses this synthetic loss interval to seed the | ||||
| loss history mechanism. | ||||
| TFRC does this by finding some value p for which the throughput equation | ||||
| in Section 3.1 gives a sending rate within 5% of X_recv, given the | ||||
| current packet size s and round-trip time R. The first loss interval is | ||||
| then set to 1/p. (The 5% tolerance is introduced simply because the | ||||
| throughput equation is difficult to invert, and we want to reduce the | ||||
| costs of calculating p numerically.) | ||||
| 7. Security Considerations | ||||
| TFRC is not a transport protocol in its own right, but a congestion | ||||
| control mechanism that is intended to be used in conjunction with a | ||||
| transport protocol. Therefore security primarily needs to be considered | ||||
| in the context of a specific transport protocol and its authentication | ||||
| mechanisms. | ||||
| Congestion control mechanisms can potentially be exploited to create | ||||
| denial of service. This may occur through spoofed feedback. Thus any | ||||
| transport protocol that uses TFRC should take care to ensure that | ||||
| feedback is only accepted from the receiver of the data. The precise | ||||
| mechanism to achieve this will however depend on the transport protocol | ||||
| itself. | ||||
| In addition, congestion control mechanisms may potentially be | ||||
| manipulated by a greedy receiver that wishes to receive more than its | ||||
| fair share of network bandwidth. A receiver might do this by claiming | ||||
| to have received packets that in fact were lost due to congestion. | ||||
| Possible defenses against such a receiver would normally include some | ||||
| form of nonce that the receiver must feed back to the sender to prove | ||||
| receipt. However, the details of such a nonce would depend on the | ||||
| transport protocol, and in particular on whether the transport protocol | ||||
| is reliable or unreliable. | ||||
| We expect that protocols incorporating ECN with TFRC will also want to | ||||
| incorporate feedback from the receiver to the sender using the ECN nonce | ||||
| [WES01]. The ECN nonce is a modification to ECN that protects the | ||||
| sender from the accidental or malicious concealment of marked packets. | ||||
| Again, the details of such a nonce would depend on the transport | ||||
| protocol, and are not addressed in this document. | ||||
| 8. IANA Considerations | ||||
| There are no IANA actions required for this document. | ||||
| 9. Authors' Addresses | ||||
| Mark Handley, Jitendra Padhye, Sally Floyd | ||||
| ICIR/ICSI | ||||
| 1947 Center St, Suite 600 | ||||
| Berkeley, CA 94708 | ||||
| mjh@icir.org, padhye@icir.org, floyd@icir.org | ||||
| Joerg Widmer | ||||
| Lehrstuhl Praktische Informatik IV | ||||
| Universitat Mannheim | ||||
| L 15, 16 - Room 415 | ||||
| D-68131 Mannheim | ||||
| Germany | ||||
| widmer@informatik.uni-mannheim.de | ||||
| 10. Acknowledgments | ||||
| We would like to acknowledge feedback and discussions on equation-based | ||||
| congestion control with a wide range of people, including members of the | ||||
| Reliable Multicast Research Group, the Reliable Multicast Transport | ||||
| Working Group, and the End-to-End Research Group. We would like to | ||||
| thank Ken Lofgren, Eduardo Urzaiz, Vladica Stanisic, Shushan Wen, and | ||||
| Wendy Lee (lhh@zsu.edu.cn) for feedback on earlier versions of this | ||||
| document, and to thank Mark Allman for his extensive feedback from using | ||||
| the draft to produce a working implementation. | ||||
| 11. References | ||||
| [1] Balakrishnan, H., Rahul, H., and Seshan, S., "An Integrated | ||||
| Congestion Management Architecture for Internet Hosts," Proc. ACM | ||||
| SIGCOMM, Cambridge, MA, September 1999. | ||||
| [2] S. Floyd, M. Handley, J. Padhye, and J. Widmer, "Equation-Based | This document is a product of the Transport Area Working Group of the | |||
| Congestion Control for Unicast Applications", August 2000, Proc | IETF. | |||
| SIGCOMM 2000. | ||||
| [3] S. Floyd, M. Handley, J. Padhye, and J. Widmer, "Equation-Based | This is now a Proposed Standard Protocol. | |||
| Congestion Control for Unicast Applications: the Extended Version", | ||||
| ICSI tech report TR-00-03, March 2000. | ||||
| [4] Padhye, J. and Firoiu, V. and Towsley, D. and Kurose, J., "Modeling | This document specifies an Internet standards track protocol for | |||
| TCP Throughput: A Simple Model and its Empirical Validation", Proc | the Internet community, and requests discussion and suggestions | |||
| ACM SIGCOMM 1998. | for improvements. Please refer to the current edition of the | |||
| "Internet Official Protocol Standards" (STD 1) for the | ||||
| standardization state and status of this protocol. Distribution | ||||
| of this memo is unlimited. | ||||
| [5] Widmer, J., "Equation-Based Congestion Control", Diploma Thesis, | This announcement is sent to the IETF list and the RFC-DIST list. | |||
| University of Mannheim, February 2000. URL | Requests to be added to or deleted from the IETF distribution list | |||
| "http://www.icir.org/tfrc/". | should be sent to IETF-REQUEST@IETF.ORG. Requests to be | |||
| added to or deleted from the RFC-DIST distribution list should | ||||
| be sent to RFC-DIST-REQUEST@RFC-EDITOR.ORG. | ||||
| [6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A | Details on obtaining RFCs via FTP or EMAIL may be obtained by sending | |||
| Transport Protocol for Real-Time Applications", RFC 1889, January | an EMAIL message to rfc-info@RFC-EDITOR.ORG with the message body | |||
| 1996. | help: ways_to_get_rfcs. For example: | |||
| [7] K. Ramakrishnan and S. Floyd, "A Proposal to add Explicit Congestion | To: rfc-info@RFC-EDITOR.ORG | |||
| Notification (ECN) to IP", RFC 2481, January 1999. | Subject: getting rfcs | |||
| [8] V. Paxson and M. Allman, "Computing TCP's Retransmission Timer", RFC | help: ways_to_get_rfcs | |||
| 2988, November 2000. | ||||
| [9] Wetherall, D., Ely, D., and Spring, N., "Robust ECN Signaling with | Requests for special distribution should be addressed to either the | |||
| Nonces", draft-ietf-tsvwg-tcp-nonce-00.txt, internet draft, work in | author of the RFC in question, or to RFC-Manager@RFC-EDITOR.ORG. Unless | |||
| progress, January 2001. Citation for informational purposes only. | specifically noted otherwise on the RFC itself, all RFCs are for | |||
| unlimited distribution.echo | ||||
| Submissions for Requests for Comments should be sent to | ||||
| RFC-EDITOR@RFC-EDITOR.ORG. Please consult RFC 2223, Instructions to RFC | ||||
| Authors, for further information. | ||||
| End of changes. 14 change blocks. | ||||
| 968 lines changed or deleted | 42 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||