idnits 2.17.1 draft-ietf-tsvwg-tfrc-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (20 July 2001) is 8316 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'BRS99' on line 92 -- Looks like a reference, but probably isn't: 'WES01' on line 882 == Unused Reference: '1' is defined on line 920, but no explicit reference was found in the text == Unused Reference: '9' is defined on line 950, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '3' -- Possible downref: Non-RFC (?) normative reference: ref. '4' -- Possible downref: Non-RFC (?) normative reference: ref. '5' ** Obsolete normative reference: RFC 1889 (ref. '6') (Obsoleted by RFC 3550) ** Obsolete normative reference: RFC 2481 (ref. '7') (Obsoleted by RFC 3168) ** Obsolete normative reference: RFC 2988 (ref. '8') (Obsoleted by RFC 6298) == Outdated reference: A later version (-04) exists of draft-ietf-tsvwg-tcp-nonce-00 ** Downref: Normative reference to an Historic draft: draft-ietf-tsvwg-tcp-nonce (ref. '9') Summary: 7 errors (**), 0 flaws (~~), 4 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force TSV WG 2 INTERNET-DRAFT Mark Handley/ACIRI 3 draft-ietf-tsvwg-tfrc-03.txt Jitendra Padhye/ACIRI 4 Sally Floyd/ACIRI 6 Joerg Widmer/Univ. Mannheim 7 20 July 2001 8 Expires: January 2002 10 TCP Friendly Rate Control (TFRC): 11 Protocol Specification 13 Status of this Document 15 This document is an Internet-Draft and is in full conformance with all 16 provisions of Section 10 of RFC2026. 18 Internet-Drafts are working documents of the Internet Engineering Task 19 Force (IETF), its areas, and its working groups. Note that other groups 20 may also distribute working documents as Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet- Drafts as reference material 25 or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 This document is a product of the IETF TSV WG. Comments should be 34 addressed to the authors. 36 Abstract 38 This document specifies TCP-Friendly Rate Control (TFRC). 39 TFRC is a congestion control mechanism for unicast flows 40 operating in a best-effort Internet environment. It is 41 reasonably fair when competing for bandwidth with TCP flows, 42 but has a much lower variation of throughput over time 43 compared with TCP, making it more suitable for applications 44 such as telephony or streaming media where a relatively smooth 45 sending rate is of importance. 47 Table of Contents 49 1. Introduction. . . . . . . . . . . . . . . . . . . . . . 4 50 2. Terminology . . . . . . . . . . . . . . . . . . . . . . 5 51 3. Protocol Mechanism. . . . . . . . . . . . . . . . . . . 5 52 3.1. TCP Throughput Equation. . . . . . . . . . . . . . . 5 53 3.2. Packet Contents. . . . . . . . . . . . . . . . . . . 7 54 3.2.1. Data Packets. . . . . . . . . . . . . . . . . . . 7 55 3.2.2. Feedback Packets. . . . . . . . . . . . . . . . . 8 56 4. Data Sender Protocol. . . . . . . . . . . . . . . . . . 8 57 4.1. Measuring the Packet Size. . . . . . . . . . . . . . 8 58 4.2. Sender Initialization. . . . . . . . . . . . . . . . 9 59 4.3. Sender behavior when a feedback packet is 60 received. . . . . . . . . . . . . . . . . . . . . . . . . 9 61 4.4. Expiration of nofeedback timer . . . . . . . . . . . 10 62 4.5. Preventing Oscillations. . . . . . . . . . . . . . . 11 63 4.6. Scheduling of Packet Transmissions . . . . . . . . . 12 64 5. Calculation of the Loss Event Rate (p). . . . . . . . . 13 65 5.1. Detection of Lost or Marked Packets. . . . . . . . . 13 66 5.2. Translation from Loss History to Loss Events . . . . 13 67 5.3. Inter-loss Event Interval. . . . . . . . . . . . . . 15 68 5.4. Average Loss Interval. . . . . . . . . . . . . . . . 15 69 5.5. History Discounting. . . . . . . . . . . . . . . . . 16 70 6. Data Receiver Protocol. . . . . . . . . . . . . . . . . 18 71 6.1. Receiver behavior when a data packet is 72 received. . . . . . . . . . . . . . . . . . . . . . . . . 19 73 6.2. Expiration of feedback timer . . . . . . . . . . . . 19 74 6.3. Receiver initialization. . . . . . . . . . . . . . . 20 75 6.3.1. Initializing the Loss History after the 76 First Loss Event . . . . . . . . . . . . . . . . . . . . 20 77 7. Security Considerations . . . . . . . . . . . . . . . . 20 78 8. IANA Considerations . . . . . . . . . . . . . . . . . . 21 79 9. Authors' Addresses. . . . . . . . . . . . . . . . . . . 21 80 10. Acknowledgments. . . . . . . . . . . . . . . . . . . . 22 81 11. References . . . . . . . . . . . . . . . . . . . . . . 22 83 1. Introduction 85 This document specifies TCP-Friendly Rate Control (TFRC). TFRC is a 86 congestion control mechanism designed for unicast flows operating in a 87 Internet environment and competing with TCP traffic [2]. Instead of 88 specifying a complete protocol, this document simply specifies a 89 congestion control mechanism that could be used in a transport protocol 90 such as RTP [6], in an application incorporating end-to-end congestion 91 control at the application level, or in the context of endpoint 92 congestion management [BRS99]. This document does not discuss packet 93 formats, reliability, or implementation-related issues. 95 TFRC is designed to be reasonably fair when competing for bandwidth with 96 TCP flows, where a flow is ``reasonably fair'' if its sending rate is 97 generally within a factor of two of the sending rate of a TCP flow under 98 the same conditions. However TFRC has a much lower variation of 99 throughput over time compared with TCP, which makes it more suitable for 100 applications such as telephony or streaming media where a relatively 101 smooth sending rate is of importance. 103 The penalty of having smoother throughput than TCP while competing 104 fairly for bandwidth is that TFRC responds slower than TCP to changes in 105 available bandwidth. Thus TFRC should only be used when the application 106 has a requirement for smooth throughput, in particular, avoiding TCP's 107 halving of the sending rate in response to a single packet drop. For 108 applications that simply need to transfer as much data as possible in as 109 short a time as possible we recommend using TCP, or if reliability is 110 not required, using an Additive-Increase, Multiplicative-Decrease (AIMD) 111 congestion control scheme with similar parameters to those used by TCP. 113 TFRC is designed for applications that use a fixed packet size, and vary 114 their sending rate in packets per second in response to congestion. 115 Some audio applications require a fixed interval of time between packets 116 and vary their packet size instead of their packet rate in response to 117 congestion. The congestion control mechanism in this document cannot be 118 used by those applications; TFRC-PS (for TFRC-PacketSize) is a variant 119 of TFRC for applications that have a fixed sending rate but vary their 120 packet size in response to congestion. TFRC-PS will be specified in a 121 later document. 123 TFRC is a receiver-based mechanism, with the calculation of the 124 congestion control information (i.e., the loss event rate) in the data 125 receiver rather in the data sender. This is well-suited to an 126 application where the sender is a large server handling many concurrent 127 connections, and the receiver has more memory and CPU cycles available 128 for computation. In addition, a receiver-based mechanism is more 129 suitable as a building block for multicast congestion control. 131 2. Terminology 133 In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", 134 "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 135 "OPTIONAL" are to be interpreted as described in RFC 2119 and indicate 136 requirement levels for compliant TFRC implementations. 138 3. Protocol Mechanism 140 For its congestion control mechanism, TFRC directly uses a throughput 141 equation for the allowed sending rate as a function of the loss event 142 rate and round-trip time. In order to compete fairly with TCP, TFRC 143 uses the TCP throughput equation, which roughly describes TCP's sending 144 rate as a function of the loss event rate, round-trip time, and packet 145 size. We define a loss event as one or more lost or marked packets from 146 a window of data, where a marked packet refers to a congestion 147 indication from Explicit Congestion Notification (ECN) [7]. 149 Generally speaking, TFRC's congestion control mechanism works as 150 follows: 152 o The receiver measures the loss event rate and feeds this 153 information back to the sender. 155 o The sender also uses these feedback messages to measure the round- 156 trip time (RTT). 158 o The loss event rate and RTT are then fed into TFRC's throughput 159 equation, giving the acceptable transmit rate. 161 o The sender then adjusts its transmit rate to match the calculated 162 rate. 164 The dynamics of TFRC are sensitive to how the measurements are performed 165 and applied. We recommend specific mechanisms below to perform and 166 apply these measurements. Other mechanisms are possible, but it is 167 important to understand how the interactions between mechanisms affect 168 the dynamics of TFRC. 170 3.1. TCP Throughput Equation 172 Any realistic equation giving TCP throughput as a function of loss event 173 rate and RTT should be suitable for use in TFRC. However, we note that 174 the TCP throughput equation used must reflect TCP's retransmit timeout 175 behavior, as this dominates TCP throughput at higher loss rates. We 176 also note that the assumptions implicit in the throughput equation about 177 the loss event rate parameter have to be a reasonable match to how the 178 loss rate or loss event rate is actually measured. While this match is 179 not perfect for the throughput equation and loss rate measurement 180 mechanisms given below, in practice the assumptions turn out to be close 181 enough. 183 The throughput equation we currently recommend for TFRC is a slightly 184 simplified version of the throughput equation for Reno TCP from [4]. 185 Ideally we'd prefer a throughput equation based on SACK TCP, but no one 186 has yet derived the throughput equation for SACK TCP, and from both 187 simulations and experiments, the differences between the two equations 188 are relatively minor. 190 The throughput equation is: 192 s 193 X = ---------------------------------------------------------- 194 R*sqrt(2*b*p/3) + (t_RTO * (3*sqrt(3*b*p/8) * p * (1+32*p^2))) 196 Where: 198 X is the transmit rate in bytes/second. 200 s is the packet size in bytes. 202 R is the round trip time in seconds. 204 p is the loss event rate, between 0 and 1.0, of the number of loss 205 events as a fraction of the number of packets transmitted. 207 t_RTO is the TCP retransmission timeout value in seconds. 209 b is the number of packets acknowledged by a single TCP 210 acknowledgement. 212 We further simplify this by setting t_RTO = 4*R. A more accurate 213 calculation of t_RTO is possible, but experiments with the current 214 setting have resulted in reasonable fairness with existing TCP 215 implementations [5]. Another possibility would be to set t_RTO = max(4R, 216 one second), to match the recommended minimum of one second on the RTO 217 [8]. 219 Many current TCP connections use delayed acknowledgements, sending an 220 acknowledgement for every two data packets received, and thus have a 221 sending rate modeled by b = 2. However, TCP is also allowed to send an 222 acknowledgement for every data packet, and this would be modeled by b = 223 1. Because many TCP implementations do not use delayed 224 acknowledgements, we recommend b = 1. 226 In future, different TCP equations may be substituted for this equation. 227 The requirement is that the throughput equation be a reasonable 228 approximation of the sending rate of TCP for conformant TCP congestion 229 control. 231 The parameters s (packet size), p (loss event rate) and R (RTT) need to 232 be measured or calculated by a TFRC implementation. The measurement of 233 s is specified in Section 4.1, measurement of R is specified in Section 234 4.3, and measurement of p is specified in Section 5. In the rest of this 235 document all data rates are measured in bytes/second. 237 3.2. Packet Contents 239 Before specifying the sender and receiver functionality, we describe the 240 contents of the data packets sent by the sender and feedback packets 241 sent by the receiver. As TFRC will be used along with a transport 242 protocol, we do not specify packet formats, as these depend on the 243 details of the transport protocol used. 245 3.2.1. Data Packets 247 Each data packet sent by the data sender contains the following 248 information: 250 o A sequence number. This number is incremented by one for each data 251 packet transmitted. The field must be sufficiently large that it 252 does not wrap causing two different packets with the same sequence 253 number to be in the receiver's recent packet history at the same 254 time. 256 o A timestamp indicating when the packet is sent. We denote by ts_i 257 the timestamp of the packet with sequence number i. The resolution 258 of the timestamp should typically be measured in milliseconds. 260 o The sender's current estimate of the round trip time. The estimate 261 reported in packet i is denoted by R_i. 263 o The sender's current transmit rate. The estimate reported in packet 264 i is denoted by X_i. 266 3.2.2. Feedback Packets 268 Each feedback packet sent by the data receiver contains the following 269 information: 271 o The timestamp of the last data packet received. We denote this by 272 t_recvdata. If the last packet received at the receiver has 273 sequence number i, then t_recvdata = ts_i. 275 o The amount of time elapsed between the receipt of the last data 276 packet at the receiver, and the generation of this feedback report. 277 We denote this by t_delay. 279 o The rate at which the receiver estimates that data was received 280 since the last feedback report was sent. We denote this by X_recv. 282 o The receiver's current estimate of the loss event rate, p. 284 4. Data Sender Protocol 286 The data sender sends a stream of data packets to the data receiver at a 287 controlled rate. When a feedback packet is received from the data 288 receiver, the data sender changes its sending rate, based on the 289 information contained in the feedback report. If the sender does not 290 receive a feedback report for two round trip times, it cuts its sending 291 rate in half. This is achieved by means of a timer called the 292 nofeedback timer. 294 We specify the sender-side protocol in the following steps: 296 o Measurement of the mean packet size being sent. 298 o The sender behavior when a feedback packet is received. 300 o The sender behavior when the nofeedback timer expires. 302 o Oscillation prevention (optional) 304 o Scheduling of transmission on non-realtime operating systems. 306 4.1. Measuring the Packet Size 308 The parameter s (packet size) is normally known to an application. This 309 may not be so in two cases: 311 o The packet size naturally varies depending on the data. In this 312 case, although the packet size varies, that variation is not 313 coupled to the transmit rate. It should normally be safe to use an 314 estimate of the mean packet size for s. 316 o The application needs to change the packet size rather than the 317 number of packets per second to perform congestion control. This 318 would normally be the case with packet audio applications where a 319 fixed interval of time needs to be represented by each packet. 320 Such applications need to have a completely different way of 321 measuring parameters. 323 The second class of applications are discussed separately in a separate 324 document on TFRC-PS. For the remainder of this section we assume the 325 sender can estimate the packet size, and that congestion control is 326 performed by adjusting the number of packets sent per second. 328 4.2. Sender Initialization 330 To initialize the sender, the value of X is set to 1 packet/second and 331 the nofeedback timer is set to expire after 2 seconds. The initial 332 values for R (RTT) and t_RTO are undefined until they are set as 333 described above. The intial value of tld, for the Time Last Doubled 334 during slow-start, is set to -1. 336 4.3. Sender behavior when a feedback packet is received 338 The sender knows its current sending rate, X, and maintains an estimate 339 of the current round trip time, R, and an estimate of the timeout 340 interval, t_RTO. 342 When a feedback packet is received by the sender at time t_now, the 343 following actions should be performed: 345 1) Calculate a new round trip sample. 346 R_sample = (t_now - t_recvdata) - t_delay. 348 2) Update the round trip time estimate: 350 If no feedback has been received before 351 R = R_sample; 352 Else 353 R = q*R + (1-q)*R_sample; 355 TFRC is not sensitive to the precise value for the filter constant 356 q, but we recommend a default value of 0.9. 358 3) Update the timeout interval: 360 t_RTO = 4*R. 362 4) Update the sending rate as follows: 364 If (p > 0) 365 Calculate X_calc using the TCP throughput equation. 366 X = max(min(X_calc, 2*X_recv), s/64); 367 Else 368 If (t_now - tld >= RTT) 369 X = max(min(2*X, 2*X_recv), s/RTT); 370 tld = t_now; 372 Note that if p == 0, then the sender is in slow-start phase, 373 where it approximately doubles the sending rate each round- 374 trip time until a loss occurs. The s/RTT term gives a minimum 375 sending rate during slow-start of one packet per RTT. When p 376 > 0, the sender sends at least one packet every 64 seconds. 378 5) Reset the nofeedback timer to expire after max(4*R, 2*s/X) seconds. 380 4.4. Expiration of nofeedback timer 382 If the nofeedback timer expires, the sender should perform the following 383 actions: 385 1) Cut the sending rate in half. This is done by modifying the 386 sender's cached copy of X_recv (the receive rate). Because the 387 sending rate is limited to at most twice X_recv, modifying X_recv 388 limits the current sending rate, but allows the sender to slow- 389 start, doubling its sending rate each RTT, if feedback messages 390 resume reporting no losses. 392 If (X_calc > 2*X_recv) 393 X_recv = max(X_recv/2, s/128); 394 Else 395 X_recv = X_calc/4; 397 The s/128 term limits the backoff to one packet every 64 seconds in 398 the case of persistent absence of feedback. 400 2) The value of X must then be recalculated as described under point 401 (4) above. 403 If the nofeedback timer expires when the sender does not yet have 404 an RTT sample, and has not yet received any feedback from the 405 sender, then step (1) can be skipped, and the sending rate cut in 406 half directly: 408 X = max(X/2, s/64) 410 3) Restart the nofeedback timer to expire after max(4*R, 2*s/X) 411 seconds. 413 Note that when the sender stops sending, the receiver will stop sending 414 feedback. This will cause the nofeedback timer to start to expire and 415 decrease X_recv. If the sender subsequently starts to send again, 416 X_recv will limit the transmit rate, and a normal slowstart phase will 417 occur until the transmit rate reaches X_calc. 419 4.5. Preventing Oscillations 421 To prevent oscillatory behavior in environments with a low degree of 422 statistical multiplexing it is useful to modify sender's transmit rate 423 to provide congestion avoidance behavior by reducing the transmit rate 424 as the queuing delay (and hence RTT) increases. To do this the sender 425 maintains an estimate of the long-term RTT and modifies its sending rate 426 depending on how the most recent sample of the RTT differs from this 427 value. The long-term sample is R_sqmean, and is set as follows: 429 If no feedback has been received before 430 R_sqmean = sqrt(R_sample); 431 Else 432 R_sqmean = q2*R_sqmean + (1-q2)*sqrt(R_sample); 434 Thus R_sqmean gives the exponentially weighted moving average of the 435 square root of the RTT samples. The constant q2 should be set similarly 436 to q, and we recommend a value of 0.9 as the default. 438 The sender obtains the base transmit rate, X, from the throughput 439 function. It then calculates a modified instantaneous transmit rate 440 X_inst, as follows: 442 X_inst = X * R_sqmean / sqrt(R_sample); 444 When sqrt(R_sample) is greater than R_sqmean then the queue is typically 445 increasing and so the transmit rate needs to be decreased for stable 446 operation. 448 Note: This modification is not always strictly required, especially if 449 the degree of statistical multiplexing in the network is high. However 450 we recommend that it is done because it does make TFRC behave better in 451 environments with a low level of statistical multiplexing. If it is not 452 done, we recommend using a very low value of q, such that q is close to 453 or exactly zero. 455 4.6. Scheduling of Packet Transmissions 457 As TFRC is rate-based, and as operating systems typically cannot 458 schedule events precisely, it is necessary to be opportunistic about 459 sending data packets so that the correct average rate is maintained 460 despite the course-grain or irregular scheduling of the operating 461 system. Thus a typical sending loop will calculate the correct inter- 462 packet interval, t_ipi, as follows: 464 t_ipi = s/X_inst; 466 When a sender first starts sending at time t_0, it calculates t_ipi, and 467 calculates a nominal send time t_1 = t_0 + t_ipi for packet 1. When the 468 application becomes idle, it checks the current time, t_now, and then 469 requests re-scheduling after (t_ipi - (t_now - t_0)) seconds. When the 470 application is re-scheduled, it checks the current time, t_now, again. 471 If (t_now > t_1 - delta) then packet 1 is sent. 473 Now a new t_ipi may be calculated, and used to calculate a nominal send 474 time t_2 for packet 2: t2 = t_1 + t_ipi. The process then repeats, with 475 each successive packet's send time being calculated from the nominal 476 send time of the previous packet. 478 In some cases, when the nominal send time, t_i, of the next packet is 479 calculated, it may already be the case that t_now > t_i - delta. In 480 such a case the packet should be sent immediately. Thus if the 481 operating system has coarse timer granularity and the transmit rate is 482 high, then TFRC may send short bursts of several packets separated by 483 intervals of the OS timer granularity. 485 The parameter delta is to allow a degree of flexibility in the send time 486 of a packet. If the operating system has a scheduling timer granularity 487 of t_gran seconds, then delta would typically be set to: 489 delta = min(t_ipi/2, t_gran/2); 491 t_gran is 10ms on many Unix systems. If t_gran is not known, a value of 492 10ms can be safely assumed. 494 5. Calculation of the Loss Event Rate (p) 496 Obtaining a accurate and stable measurement of the loss event rate is of 497 primary importance for TFRC. Loss rate measurement is performed at the 498 receiver, based on the detection of lost or marked packets from the 499 sequence numbers of arriving packets. We describe this process before 500 describing the rest of the receiver protocol. 502 5.1. Detection of Lost or Marked Packets 504 TFRC assumes that all packets contain a sequence number that is 505 incremented by one for each packet that is sent. For the purposes of 506 this specification, we require that if a lost packet is retransmitted, 507 the retransmission is given a new sequence number that is the latest in 508 the transmission sequence, and not the same sequence number as the 509 packet that was lost. If a transport protocol has the requirement that 510 it must retransmit with the original sequence number, then the transport 511 protocol designer must figure out how to distinguish delayed from 512 retransmitted packets and how to detect lost retransmissions. 514 The receiver maintains a data structure that keeps track of which 515 packets have arrived and which are missing. For the purposes of 516 specification, we assume that the data structure consists of a list of 517 packets that have arrived along with the receiver timestamp when each 518 packet was received. In practice this data structure will normally be 519 stored in a more compact representation, but this is implementation- 520 specific. 522 The loss of a packet is detected by the arrival of at least three 523 packets with a higher sequence number than the lost packet. The 524 requirement for three subsequent packets is the same as with TCP, and is 525 to make TFRC more robust in the presence of reordering. In contrast to 526 TCP, if a packet arrives late (after 3 subsequent packets arrived) in 527 TFRC, the late packet can fill the hole in TFRC's reception record, and 528 the receiver can recalculate the loss event rate. Future versions of 529 TFRC might make the requirement for three subsequent packets adaptive 530 based on experienced packet reordering, but we do not specify such a 531 mechanism here. 533 For an ECN-capable connection, a marked packet is detected as a 534 congestion event as soon as it arrives, without having to wait for the 535 arrival of subsequent packets. 537 5.2. Translation from Loss History to Loss Events 539 TFRC requires that the loss fraction be robust to several consecutive 540 packets lost where those packets are part of the same loss event. This 541 is similar to TCP, which (typically) only performs one halving of the 542 congestion window during any single RTT. Thus the receiver needs to map 543 the packet loss history into a loss event record, where a loss event is 544 one or more packets lost in an RTT. To perform this mapping, the 545 receiver needs to know the RTT to use, and this is supplied periodically 546 by the sender, typically as control information piggy-backed onto a data 547 packet. TFRC is not sensitive to how the RTT measurement sent to the 548 receiver is made, but we recommend using the sender's calculated RTT, R, 549 (see Section 4.3) for this purpose. 551 To determine whether a lost or marked packet should start a new loss 552 event, or be counted as part of an existing loss event, we need to 553 compare the sequence numbers and timestamps of the packets that arrived 554 at the receiver. For a marked packet S_new, its reception time T_new 555 can be noted directly. For a lost packet, we can interpolate to infer 556 the nominal "arrival time". Assume: 558 S_loss is the sequence number of a lost packet. 560 S_before is the sequence number of the last packet to arrive with 561 sequence number before S_loss. 563 S_after is the sequence number of the first packet to arrive with 564 sequence number after S_loss. 566 T_before is the reception time of S_before. 568 T_after is the reception time of S_after. 570 Note that T_before can either be before or after T_after due to 571 reordering. 573 For a lost packet S_loss, we can interpolate its nominal "arrival time" 574 at the receiver from the arrival times of S_before and S_after. Thus 576 T_loss = T_before + ( (T_after - T_before) 577 * (S_loss - S_before)/(S_after - S_before) ); 579 Note that if the sequence space wrapped between S_before and S_after, 580 then the sequence numbers must be modified to take this into account 581 before performing this calculation. If the largest possible sequence 582 number is S_max, and S_before > S_after, then modifying each sequence 583 number S by S' = (S + (S_max + 1)/2) mod (S_max + 1) would normally be 584 sufficient. 586 If the lost packet S_old was determined to have started the previous 587 loss event, and we have just determined that S_new has been lost, then 588 we interpolate the nominal arrival times of S_old and S_new, called 589 T_old and T_new respectively. 591 If T_old + R >= T_new, then S_new is part of the existing loss event. 592 Otherwise S_new is the first packet in a new loss event. 594 5.3. Inter-loss Event Interval 596 If a loss interval, A, is determined to have started with packet 597 sequence number S_A and the next loss interval, B, started with packet 598 sequence number S_B, then the number of packets in loss interval A is 599 given by (S_B - S_A). 601 5.4. Average Loss Interval 603 To calculate the loss event rate p, we first calculate the average loss 604 interval. This is done using a filter that weights the n most recent 605 loss event intervals in such a way that the measured loss event rate 606 changes smoothly. 608 Weights w_0 to w_(n-1) are calculated as: 610 If (i < n/2) 611 w_i = 1; 612 Else 613 w_i = 1 - (i - (n/2 - 1))/(n/2 + 1); 615 Thus if n=8, the values of w_0 to w_7 are: 617 1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2 619 The value n for the number of loss intervals used in calculating the 620 loss event rate determines TRFC's speed in responding to changes in the 621 level of congestion. As currently specified, TFRC should not be used 622 for values of n significantly greater than 8, for traffic that might 623 compete in the global Internet with TCP. At the very least, safe 624 operation with values of n greater than 8 would require a slight change 625 to TFRC's mechanisms to include a more severe response to two or more 626 round-trip times with heavy packet loss. 628 When calculating the average loss interval we need to decide whether to 629 include the interval since the most recent packet loss event. We only 630 do this if it is sufficiently large to increase the average loss 631 interval. 633 Thus if the most recent loss intervals are I_0 to I_n, with I_0 being 634 the interval since the most recent loss event, then we calculate the 635 average loss interval I_mean as: 637 I_tot0 = 0; 638 I_tot1 = 0; 639 W_tot = 0; 640 for (i = 0 to n-1) { 641 I_tot0 = I_tot0 + (I_i * w_i); 642 W_tot = W_tot + w_i; 643 } 644 for (i = 1 to n) { 645 I_tot1 = I_tot1 + (I_i * w_(i-1)); 646 } 647 I_tot = max(I_tot0, I_tot1); 648 I_mean = I_tot/W_tot; 650 The loss event rate, p is simply: 652 p = 1 / I_mean; 654 5.5. History Discounting 656 As described in Section 5.4, the most recent loss interval is only 657 assigned 1/(0.75*n) of the total weight in calculating the average loss 658 interval, regardless of the size of the most recent loss interval. This 659 section describes an optional history discounting mechanism, discussed 660 further in [3] and [5], that allows the TFRC receiver to adjust the 661 weights, concentrating more of the relative weight on the most recent 662 loss interval, when the most recent loss interval is more than twice as 663 large as the computed average loss interval. 665 To carry out history discounting, we associate a discount factor DF_i 666 with each loss interval L_i, for i > 0, where each discount factor is a 667 floating point number. The discount array maintains the cumulative 668 history of discounting for each loss interval. At the beginning, the 669 values of DF_i in the discount array are initialized to 1: 671 for (i = 1 to n) { 672 DF_i = 1; 673 } 675 History discounting also uses a general discount factor DF, also a 676 floating point number, that is also initialized to 1. First we show how 677 the discount factors are used in calculating the average loss interval, 678 and then we describe later in this section how the discount factors are 679 modified over time. 681 As described in Section 5.4 the average loss interval is calculated 682 using the n previous loss intervals I_1, ..., I_n, and the interval I_0 683 that represents the number of packets received since the last loss 684 event. The computation of the average loss interval using the discount 685 factors is a simple modification of the procedure in Section 5.4, as 686 follows: 688 I_tot0 = I_0 * w_0 689 I_tot1 = 0; 690 W_tot0 = w_0 691 W_tot1 = 0; 692 for (i = 1 to n-1) { 693 I_tot0 = I_tot0 + (I_i * w_i * DF_i * DF); 694 W_tot0 = W_tot0 + w_i * DF_i * DF; 695 } 696 for (i = 1 to n) { 697 I_tot1 = I_tot1 + (I_i * w_(i-1) * DF_i); 698 W_tot1 = W_tot1 + w_(i-1) * DF_i; 699 } 700 p = min(W_tot0/I_tot0, W_tot1/I_tot1); 702 The general discounting factor, DF is updated on every packet arrival as 703 follows. First, the receiver computes the weighted average I_mean of the 704 loss intervals I_1, ..., I_n: 706 I_tot = 0; 707 W_tot = 0; 708 for (i = 1 to n) { 709 W_tot = w_(i-1) * DF_i; 710 I_tot = I_tot + (I_i * w_(i-1) * DF_i); 711 } 712 I_mean = I_tot / W_tot; 714 This weighted average I_mean is compared to I_0, the number of packets 715 received since the last loss event. If I_0 is greater than twice 716 I_mean, then the new loss interval is considerably larger than the old 717 ones, and the general discount factor DF is updated to decrease the 718 relative weight on the older intervals, as follows: 720 if (I_0 > 2 * I_mean) { 721 DF = 2 * I_mean/I_0; 722 if (DF < THRESHOLD) 723 DF = THRESHOLD; 724 } else 725 DF = 1; 727 A nonzero value for THRESHOLD ensures that older loss intervals from an 728 earlier time of high congestion are not discounted entirely. We 729 recommend a THRESHOLD of 0.5. Note that with each new packet arrival, 730 I_0 will increase further, and the discount factor DF will be updated. 732 When a new loss event occurs, the current interval shifts from I_0 to 733 I_1, loss interval I_i shifts to interval I_(i+1), and the loss interval 734 I_n is forgotten. The previous discount factor DF has to be 735 incorporated into the discount array. Because DF_i carries the discount 736 factor associated with loss interval I_i, the DF_i array has to be 737 shifted as well. This is done as follows: 739 for (i = 1 to n) { 740 DF_i = DF * DF_i; 741 } 742 for (i = n-1 to 0 step -1) { 743 DF_(i+1) = DF_i; 744 } 745 I_0 = 1; 746 DF_0 = 1; 747 DF = 1; 749 This completes the description of the optional history discounting 750 mechanism. We emphasize that this is an optional mechanism whose sole 751 purpose is to allow TFRC to response somewhat more quickly to the sudden 752 absence of congestion, as represented by a long current loss interval. 754 6. Data Receiver Protocol 756 The receiver periodically sends feedback messages to the sender. 757 Feedback packets should normally be sent at least once per RTT, unless 758 the sender is sending at a rate of less than one packet per RTT, in 759 which case a feedback packet should be send for every data packet 760 received. A feedback packet should also be sent whenever a new loss 761 event is detected without waiting for the end of an RTT, and whenever an 762 out-of-order data packet is received that removes a loss event from the 763 history. 765 If the sender is transmitting at a high rate (many packets per RTT) 766 there may be some advantages to sending periodic feedback messages more 767 than once per RTT as this allows faster response to changing RTT 768 measurements, and more resilience to feedback packet loss. However 769 there is little gain from sending a large number of feedback messages 770 per RTT. 772 6.1. Receiver behavior when a data packet is received 774 When a data packet is received, the receiver performs the following 775 steps: 777 1) Add the packet to the packet history. 779 2) Let the previous value of p be p_prev. Calculate the new value of 780 p as described in Section 5. 782 3) If p > p_prev, cause the feedback timer to expire, and perform the 783 actions described in Section 6.2 785 If p <= p_prev no action need be performed. 787 However an optimization might check to see if the arrival of the 788 packet caused a hole in the packet history to be filled and 789 consequently two loss intervals were merged into one. If this is 790 the case, the receiver might also send feedback immediately. The 791 effects of such an optimization are normally expected to be small. 793 6.2. Expiration of feedback timer 795 When the feedback timer at the receiver expires, the action to be taken 796 depends on whether data packets have been received since the last 797 feedback was sent. 799 Let the maximum sequence number of a packet at the receiver so far be 800 S_m, and the value of the RTT measurement included in packet S_m be R_m. 801 If data packets have been received since the pervious feedback was sent, 802 the receiver performs the following steps: 804 1) Calculate the average loss event rate using the algorithm described 805 above. 807 2) Calculate the measured receive rate, X_recv, based on the packets 808 received within the previous R_m seconds. 810 3) Prepare and send a feedback packet containing the information 811 described in Section 3.2.2 813 4) Restart the feedback timer to expire after R_m seconds. 815 If no data packets have been received since the last feedback was sent, 816 no feedback packet is sent, and the feedback timer is restarted to 817 expire after R_m seconds. 819 6.3. Receiver initialization 821 The receiver is initialized by the first packet that arrives at the 822 receiver. Let the sequence number of this packet be i. 824 When the first packet is received: 826 o Set p=0 828 o Set X_recv = X_send, where X_send is the sending rate the sender 829 reports in the first data packet. 831 o Prepare and send a feedback packet. 833 o Set the feedback timer to expire after R_i seconds. 835 6.3.1. Initializing the Loss History after the First Loss Event 837 The number of packets until the first loss can not be used to compute 838 the sending rate directly, as the sending rate changes rapidly during 839 this time. TFRC assumes that the correct data rate after the first loss 840 is half of the sending rate when the loss occurred. TFRC approximates 841 this target rate by X_recv, the receive rate over the most recent round- 842 trip time. After the first loss, instead of initializing the first loss 843 interval to the number of packets sent until the first loss, the TFRC 844 receiver calculates the loss interval that would be required to produce 845 the data rate X_recv, and uses this synthetic loss interval to seed the 846 loss history mechanism. 848 TFRC does this by finding some value p for which the throughput equation 849 in Section 3.1 gives a sending rate within 5% of X_recv, given the 850 current packet size s and round-trip time R. The first loss interval is 851 then set to 1/p. (The 5% tolerance is introduced simply because the 852 throughput equation is difficult to invert, and we want to reduce the 853 costs of calculating p numerically.) 855 7. Security Considerations 857 TFRC is not a transport protocol in its own right, but a congestion 858 control mechanism that is intended to be used in conjunction with a 859 transport protocol. Therefore security primarily needs to be considered 860 in the context of a specific transport protocol and its authentication 861 mechanisms. 863 Congestion control mechanisms can potentially be exploited to create 864 denial of service. This may occur through spoofed feedback. Thus any 865 transport protocol that uses TFRC should take care to ensure that 866 feedback is only accepted from the receiver of the data. The precise 867 mechanism to achieve this will however depend on the transport protocol 868 itself. 870 In addition, congestion control mechanisms may potentially be 871 manipulated by a greedy receiver that wishes to receive more than its 872 fair share of network bandwidth. A receiver might do this by claiming 873 to have received packets that in fact were lost due to congestion. 874 Possible defenses against such a receiver would normally include some 875 form of nonce that the receiver must feed back to the sender to prove 876 receipt. However, the details of such a nonce would depend on the 877 transport protocol, and in particular on whether the transport protocol 878 is reliable or unreliable. 880 We expect that protocols incorporating ECN with TFRC will also want to 881 incorporate feedback from the receiver to the sender using the ECN nonce 882 [WES01]. The ECN nonce is a modification to ECN that protects the 883 sender from the accidental or malicious concealment of marked packets. 884 Again, the details of such a nonce would depend on the transport 885 protocol, and are not addressed in this document. 887 8. IANA Considerations 889 There are no IANA actions required for this document. 891 9. Authors' Addresses 893 Mark Handley, Jitendra Padhye, Sally Floyd 894 ACIRI/ICSI 895 1947 Center St, Suite 600 896 Berkeley, CA 94708 897 mjh@aciri.org, padhye@aciri.org, floyd@aciri.org 899 Joerg Widmer 900 Lehrstuhl Praktische Informatik IV 901 Universitat Mannheim 902 L 15, 16 - Room 415 903 D-68131 Mannheim 904 Germany 905 widmer@informatik.uni-mannheim.de 907 10. Acknowledgments 909 We would like to acknowledge feedback and discussions on equation-based 910 congestion control with a wide range of people, including members of the 911 Reliable Multicast Research Group, the Reliable Multicast Transport 912 Working Group, and the End-to-End Research Group. We would like to 913 thank Eduardo Urzaiz, Vladica Stanisic, and Shushan Wen for feedback on 914 earlier versions of this document, and to thank Mark Allman for his 915 extensive feedback from using the draft to produce a working 916 implementation. 918 11. References 920 [1] Balakrishnan, H., Rahul, H., and Seshan, S., "An Integrated 921 Congestion Management Architecture for Internet Hosts," Proc. ACM 922 SIGCOMM, Cambridge, MA, September 1999. 924 [2] S. Floyd, M. Handley, J. Padhye, and J. Widmer, "Equation-Based 925 Congestion Control for Unicast Applications", August 2000, Proc 926 SIGCOMM 2000. 928 [3] S. Floyd, M. Handley, J. Padhye, and J. Widmer, "Equation-Based 929 Congestion Control for Unicast Applications: the Extended Version", 930 ICSI tech report TR-00-03, March 2000. 932 [4] Padhye, J. and Firoiu, V. and Towsley, D. and Kurose, J., "Modeling 933 TCP Throughput: A Simple Model and its Empirical Validation", Proc 934 ACM SIGCOMM 1998. 936 [5] Widmer, J., "Equation-Based Congestion Control", Diploma Thesis, 937 University of Mannheim, February 2000. URL 938 "http://www.aciri.org/tfrc/". 940 [6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A 941 Transport Protocol for Real-Time Applications", RFC 1889, January 942 1996. 944 [7] K. Ramakrishnan and S. Floyd, "A Proposal to add Explicit Congestion 945 Notification (ECN) to IP", RFC 2481, January 1999. 947 [8] V. Paxson and M. Allman, "Computing TCP's Retransmission Timer", RFC 948 2988, November 2000. 950 [9] Wetherall, D., Ely, D., and Spring, N., "Robust ECN Signaling with 951 Nonces", draft-ietf-tsvwg-tcp-nonce-00.txt, internet draft, work in 952 progress, January 2001. Citation for informational purposes only.