Internet Engineering Task Force                               M. Handley
INTERNET-DRAFT                                 University College London
Intended status: Proposed Standard                              S. Floyd
Expires: September 2007                                             ICIR
                                                               J. Padhye
                                                               Microsoft
                                                               J. Widmer
                                                  University of Mannheim
                                                            4 March 2007

       TCP Friendly Rate Control (TFRC): Protocol Specification
                   draft-ietf-dccp-rfc3448bis-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document specifies TCP-Friendly Rate Control (TFRC).  TFRC is a
   congestion control mechanism for unicast flows operating in a best-
   effort Internet environment.  It is reasonably fair when competing
   for bandwidth with TCP flows, but has a much lower variation of
   throughput over time compared with TCP, making it more suitable for
   applications such as streaming media where a relatively smooth
   sending rate is of importance.

Table of Contents

   1. Introduction
   2. Conventions
   3. Protocol Mechanism
      3.1. TCP Throughput Equation
      3.2. Packet Contents
           3.2.1. Data Packets
           3.2.2. Feedback Packets
   4. Data Sender Protocol
      4.1. Measuring the Segment Size
      4.2. Sender Initialization
      4.3. Sender behavior when a feedback packet is received
      4.4. Expiration of nofeedback timer
      4.5. Sending a packet after an idle or data-limited period
      4.6. Preventing Oscillations
      4.7. Scheduling of Packet Transmissions
   5. Calculation of the Loss Event Rate (p)
      5.1. Detection of Lost or Marked Packets
      5.2. Translation from Loss History to Loss Events
      5.3. Inter-loss Event Interval
      5.4. Average Loss Interval
      5.5. History Discounting
   6. Data Receiver Protocol
      6.1. Receiver behavior when a data packet is received
      6.2. Expiration of feedback timer
      6.3. Receiver initialization
           6.3.1. Initializing the Loss History after the First Loss
                  Event
   7. Sender-based Variants
   8. Implementation Issues
   9. Changes from RFC 3448
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgments
   13. Terminology
   14. Normative References
   15. Informational References
   16. Authors' Addresses
   Full Copyright Statement
   Intellectual Property

NOTE TO RFC EDITOR: PLEASE DELETE THIS NOTE UPON PUBLICATION.

   Changes from draft-ietf-dccp-rfc3448bis-00.txt:

   * When initializing the loss history after the first
     data packet sent is lost or ECN-marked, TFRC uses
     a minimum receive rate of 0.5 packets per second.

   * For initializing the estimated packet drop rate
     for the first loss interval when coming out of slow-start,
     it is ok to use the maximum receive rate so far, not just
     the receive rate in the last round-trip time.
     Feedback from Ladan Gharai.

   * General feedback from Gorry Fairhurst:
     - Added a reference for TFRC-SP.
     - Clarified that R_m is sender's estimate of RTT, as reported
       in Section 3.2.1.
     - Added a definition of terms.
     - Added a discussion of why the initial value of the nofeedback
       timer is two seconds, instead of three seconds for the
       recommended initial value for TCP's retransmit timer.

   * General feedback from Arjuna Sathiaseelan:
     - Added more details about sending multiple feedback
       packets per RTT.
     - Added change to Section 4.3 to use the first feedback
       packet, or the first feedback packet after a
       nofeedback timer during slow-start, *if min_rate > X*.

   * General feedback from Gerrit Renker:
     - Changed "delta" to "t_delta".
     - Changed X_calc to X_Bps, clarified X.
     - Clarified send times in Section 4.7.
     - Changed so that tld can be initialized to either 0 or -1.
     - Fixed Section 5.5 to say that the most recent loss
       interval has weight 1/(0.75*n) *when there have been
       at least eight loss intervals*.
     - Clarified introduction about fixed-size and variable-size
       packets.

   * Added more about sender-based variants.
     Feedback from Guillaume Jourjon.

   * Corrected that the loss interval I_0 includes all transmitted
     packets, including lost and marked packets (as defined in Section
     5.3 in the general definition.)  Email from Eddie Kohler and
     Gerrit Renker.

   * Open issue: Feedback from Ian about problems being limited by
     X_recv after a loss event.  There might not be an easy answer.

   * Related open issue: Add Faster Restart to RFC3448bis?  Or not?
     From Ian McDonald.

   * Open issue: Adopt something like DCCP's Receive Rate Length,
     instead of ignoring one feedback packet?  From Eddie Kohler.

   * Open issue: Add possible mechanisms for limiting the maximum
     burst size?  Using a token bucket size based on the
     current rate?  Or not?  Email from Eddie Kohler and Gerrit
     Renker.

   * Related open issue: To deal with idle periods and the like,
     in Section 4.7 say that t_i := max(t_i, t_now - RTT/2), to
     limit bursts to RTT/2 packets?  Has anyone implemented this?
     Email from Eddie Kohler and Ian McDonald.

   * Not done: I didn't add a minimum value for the nofeedback
     timer.  (Why would a nofeedback timer need to be bigger
     than max(4*R, 2*s/X)?)  Email discussing pros and cons from
     Arjuna.

   * Not addressed yet: Email thread on "RFC 3448, 4.4: Modifying
     X_recv if p = 0 at the time of last feedback".

   * Todo: Update Section 9 on "Changes from RFC 3448" with
     changes since draft-floyd-rfc3448bis-00.txt.

   Changes from draft-floyd-rfc3448bis-00.txt:

   * Name change to draft-ietf-dccp-rfc3448bis-00.txt.

   * Specified the receiver's initialization of the feedback timer
     when the first data packet doesn't have an estimate of the
     RTT.  From feedback from Dado Colussi.

   * Added the procedure for sending receiver
     feedback packets when a coarse-grained
     timestamp is used.  From RFC 4243.

   Changes from RFC 3448:

   * Incorporated changes in the RFC 3448 errata:

     - "If the sender does not receive a feedback report for
       four round trip times, it cuts its sending rate in half."
       ("Two" changed to "four", for consistency with the rest
       of the document.  Reported by Joerg Widmer).

     - "If the nofeedback timer expires when the sender does not
       yet have an RTT sample, and has not yet received any
       feedback from the receiver, or when p == 0,..."
       (Added "or when p == 0,", reported by Wim Heirman).

     - In Section 5.5, changed:
         for (i = 1 to n) { DF_i = 1; }
       to:
         for (i = 0 to n) { DF_i = 1; }
       Reported by Michele R.

   * Changed RFC 3448 to correspond to the larger initial windows
     specified in RFC 3390.
     This includes the following:

     - Incorporated Section 5.1 from [RFC4342], saying that
       when reducing the sending rate after an idle period, don't
       reduce the sending rate below the initial sending rate.

     - Change for a datalimited sender:
       When the sender has been datalimited, the sender doesn't
       let the receive rate limit it to a sending rate less than
       the initial rate.

     - Small change to slow-start:
       Changed so that for the first feedback packet received,
       or for the first feedback packet received after an idle
       period, the receive rate is not used to limit the
       sending rate.  This is because the receiver might not yet
       have seen an entire window of data.

   * Clarified how the average loss interval is calculated when
     the receiver has not yet seen eight loss intervals.

   * Discussed more about estimating the average segment size:

     - For initializing the loss history after the first loss event,
       either the receiver knows the sender's value for s, or
       the receiver uses the throughput equation for X_pps and does
       not need to know an estimate for s.

     - Added a discussion about estimating the average segment size
       s in Section 4.1 on "Measuring the Segment Size".

     - Changed "packet size" to "segment size".

END OF NOTE TO RFC EDITOR.

1.  Introduction

   This document specifies TCP-Friendly Rate Control (TFRC).  TFRC is a
   congestion control mechanism designed for unicast flows operating in
   an Internet environment and competing with TCP traffic [FHPW00].
   Instead of specifying a complete protocol, this document simply
   specifies a congestion control mechanism that could be used in a
   transport protocol such as DCCP (Datagram Congestion Control
   Protocol) [RFC4340], in an application incorporating end-to-end
   congestion control at the application level, or in the context of
   endpoint congestion management [BRS99].
   This document does not
   discuss packet formats or reliability.  Implementation-related
   issues are discussed only briefly, in Section 8.

   TFRC is designed to be reasonably fair when competing for bandwidth
   with TCP flows, where a flow is "reasonably fair" if its sending
   rate is generally within a factor of two of the sending rate of a
   TCP flow under the same conditions.  However, TFRC has a much lower
   variation of throughput over time compared with TCP, which makes it
   more suitable for applications such as telephony or streaming media
   where a relatively smooth sending rate is of importance.

   The penalty of having smoother throughput than TCP while competing
   fairly for bandwidth is that TFRC responds more slowly than TCP to
   changes in available bandwidth.  Thus TFRC should only be used when
   the application has a requirement for smooth throughput, in
   particular, avoiding TCP's halving of the sending rate in response
   to a single packet drop.  For applications that simply need to
   transfer as much data as possible in as short a time as possible we
   recommend using TCP, or if reliability is not required, using an
   Additive-Increase, Multiplicative-Decrease (AIMD) congestion control
   scheme with similar parameters to those used by TCP.

   TFRC is designed for best performance with applications that use a
   fixed segment size, and vary their sending rate in packets per
   second in response to congestion.  TFRC can also be used, perhaps
   with less optimal performance, with applications that don't have a
   fixed segment size, but where the segment size varies according to
   the needs of the application (e.g., video applications).

   Some applications (e.g., some audio applications) require a fixed
   interval of time between packets and vary their segment size instead
   of their packet rate in response to congestion.  The congestion
   control mechanism in this document is not designed for those
   applications; TFRC-SP (Small-Packet TFRC) is a variant of TFRC for
   applications that have a fixed sending rate in packets per second
   but either use small packets, or vary their packet size in response
   to congestion.  TFRC-SP will be specified in a later document
   [TFRC-SP].

   This document specifies TFRC as a receiver-based mechanism, with the
   calculation of the congestion control information (i.e., the loss
   event rate) in the data receiver rather than in the data sender.
   This is well-suited to an application where the sender is a large
   server handling many concurrent connections, and the receiver has
   more memory and CPU cycles available for computation.  In addition,
   a receiver-based mechanism is more suitable as a building block for
   multicast congestion control.  However, it is also possible to
   implement TFRC in sender-based variants, as allowed in DCCP's
   Congestion Control ID 3 (CCID 3) [RFC4342].

2.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Section 13 gives a list of terms used in this document.

3.  Protocol Mechanism

   For its congestion control mechanism, TFRC directly uses a
   throughput equation for the allowed sending rate as a function of
   the loss event rate and round-trip time.  In order to compete fairly
   with TCP, TFRC uses the TCP throughput equation, which roughly
   describes TCP's sending rate as a function of the loss event rate,
   round-trip time, and segment size.  We define a loss event as one or
   more lost or marked packets from a window of data, where a marked
   packet refers to a congestion indication from Explicit Congestion
   Notification (ECN) [RFC3168].

   Generally speaking, TFRC's congestion control mechanism works as
   follows:

   o  The receiver measures the loss event rate and feeds this
      information back to the sender.

   o  The sender also uses these feedback messages to measure the
      round-trip time (RTT).

   o  The loss event rate and RTT are then fed into TFRC's throughput
      equation, giving the acceptable transmit rate.

   o  The sender then adjusts its transmit rate to match the
      calculated rate.

   The dynamics of TFRC are sensitive to how the measurements are
   performed and applied.  We recommend specific mechanisms below to
   perform and apply these measurements.  Other mechanisms are
   possible, but it is important to understand how the interactions
   between mechanisms affect the dynamics of TFRC.

3.1.  TCP Throughput Equation

   Any realistic equation giving TCP throughput as a function of loss
   event rate and RTT should be suitable for use in TFRC.  However, we
   note that the TCP throughput equation used must reflect TCP's
   retransmit timeout behavior, as this dominates TCP throughput at
   higher loss rates.  We also note that the assumptions implicit in
   the throughput equation about the loss event rate parameter have to
   be a reasonable match to how the loss rate or loss event rate is
   actually measured.  While this match is not perfect for the
   throughput equation and loss rate measurement mechanisms given
   below, in practice the assumptions turn out to be close enough.

   The throughput equation we currently recommend for TFRC is a
   slightly simplified version of the throughput equation for Reno TCP
   from [PFTK98].  Ideally we'd prefer a throughput equation based on
   SACK TCP, but no one has yet derived the throughput equation for
   SACK TCP, and from both simulations and experiments, the differences
   between the two equations are relatively minor.

   The throughput equation is:

                                   s
   X_Bps = ----------------------------------------------------------
           R*sqrt(2*b*p/3) + (t_RTO * (3*sqrt(3*b*p/8)*p*(1+32*p^2)))

   Where:

      X_Bps is the transmit rate in bytes/second.

      s is the segment size in bytes.

      R is the round trip time in seconds.

      p is the loss event rate, between 0 and 1.0, of the number of
      loss events as a fraction of the number of packets transmitted.

      t_RTO is the TCP retransmission timeout value in seconds.

      b is the number of packets acknowledged by a single TCP
      acknowledgement.

   We further simplify this by setting t_RTO = 4*R.  A more accurate
   calculation of t_RTO is possible, but experiments with the current
   setting have resulted in reasonable fairness with existing TCP
   implementations [W00].  Another possibility would be to set t_RTO =
   max(4R, one second), to match the recommended minimum of one second
   on the RTO [RFC2988].

   Many current TCP connections use delayed acknowledgements, sending
   an acknowledgement for every two data packets received, and thus
   have a sending rate modeled by b = 2.  However, TCP is also allowed
   to send an acknowledgement for every data packet, and this would be
   modeled by b = 1.  Because many TCP implementations do not use
   delayed acknowledgements, we recommend b = 1.

   In future, different TCP equations may be substituted for this
   equation.  The requirement is that the throughput equation be a
   reasonable approximation of the sending rate of TCP for conformant
   TCP congestion control.
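
   As an illustration only, the throughput equation above can be
   evaluated as in the following Python sketch.  The function name
   tfrc_throughput_bps and its argument names are assumptions made for
   this example and are not defined by this document.  The defaults
   follow the recommendations above (b = 1, t_RTO = 4*R).  The sketch
   rejects p = 0, since the equation is undefined there.

```python
import math


def tfrc_throughput_bps(s, R, p, b=1, t_RTO=None):
    """Evaluate the Section 3.1 throughput equation (hypothetical helper).

    s     -- segment size in bytes
    R     -- round trip time in seconds
    p     -- loss event rate, 0 < p <= 1.0
    b     -- packets acknowledged by a single TCP acknowledgement
    t_RTO -- retransmission timeout in seconds; defaults to 4*R
    Returns the allowed transmit rate X_Bps in bytes/second.
    """
    if not 0.0 < p <= 1.0:
        # With p == 0 the denominator is zero, so the equation gives no rate.
        raise ValueError("p must satisfy 0 < p <= 1.0")
    if R <= 0.0:
        raise ValueError("R must be positive")
    if t_RTO is None:
        t_RTO = 4.0 * R  # simplification recommended in the text
    denominator = (
        R * math.sqrt(2.0 * b * p / 3.0)
        + t_RTO * (3.0 * math.sqrt(3.0 * b * p / 8.0) * p * (1.0 + 32.0 * p ** 2))
    )
    return s / denominator


if __name__ == "__main__":
    # Example inputs chosen for illustration: 1460-byte segments,
    # 100 ms round trip time, 1% loss event rate.
    print(tfrc_throughput_bps(s=1460, R=0.1, p=0.01))
```

   With the example inputs shown (s = 1460, R = 0.1, p = 0.01), the
   sketch yields a rate of roughly 164,000 bytes/second.  Raising p or
   setting b = 2 lowers the result, as the equation predicts.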

   The throughput equation can also be expressed as

      X_Bps = X_pps * s ,

   with X_pps, the sending rate in packets per second, given as

                                  1
   X_pps = --------------------------------------------------------
           R*sqrt(2*b*p/3) + (t_RTO*(3*sqrt(3*b*p/8)*p*(1+32*p^2)))

   The parameters s (segment size), p (loss event rate) and R (RTT)
   need to be measured or calculated by a TFRC implementation.  The
   measurement of s is specified in Section 4.1, measurement of R is
   specified in Section 4.3, and measurement of p is specified in
   Section 5.  In the rest of this document all data rates are measured
   in bytes/second.

3.2.  Packet Contents

   Before specifying the sender and receiver functionality, we describe
   the contents of the data packets sent by the sender and feedback
   packets sent by the receiver.  As TFRC will be used along with a
   transport protocol, we do not specify packet formats, as these
   depend on the details of the transport protocol used.

3.2.1.  Data Packets

   Each data packet sent by the data sender contains the following
   information:

   o  A sequence number.  This number is incremented by one for each
      data packet transmitted.  The field must be sufficiently large
      that it does not wrap causing two different packets with the
      same sequence number to be in the receiver's recent packet
      history at the same time.

   o  A timestamp indicating when the packet is sent.  We denote by
      ts_i the timestamp of the packet with sequence number i.  The
      resolution of the timestamp should typically be measured in
      milliseconds.
      This timestamp is used by the receiver to determine which losses
      belong to the same loss event.  The timestamp is also echoed by
      the receiver to enable the sender to estimate the round-trip
      time, for senders that do not save timestamps of transmitted
      data packets.
      We note that as an alternative to a timestamp incremented in
      milliseconds, a "timestamp" that increments every quarter of a
      round-trip time would be sufficient for determining when losses
      belong to the same loss event, in the context of a protocol
      where this is understood by both sender and receiver, and where
      the sender saves the timestamps of transmitted data packets.

   o  The sender's current estimate of the round trip time.  The
      estimate reported in packet i is denoted by R_i.  The round-trip
      time estimate is used by the receiver, along with the timestamp,
      to determine when multiple losses belong to the same loss event.
      The round-trip time estimate is also used by the receiver to
      determine the interval to use for calculating the receive rate,
      and to determine when to send feedback packets.
      If the sender sends a coarse-grained "timestamp" that increments
      every quarter of a round-trip time, as discussed above, then the
      sender does not need to send its current estimate of the round
      trip time.

3.2.2.  Feedback Packets

   Each feedback packet sent by the data receiver contains the
   following information:

   o  The timestamp of the last data packet received.  We denote this
      by t_recvdata.  If the last packet received at the receiver has
      sequence number i, then t_recvdata = ts_i.

      This timestamp is used by the sender to estimate the round-trip
      time, and is only needed if the sender does not save timestamps
      of transmitted data packets.

   o  The amount of time elapsed between the receipt of the last data
      packet at the receiver, and the generation of this feedback
      report.  We denote this by t_delay.

   o  The rate at which the receiver estimates that data was received
      since the last feedback report was sent.  We denote this by
      X_recv.

   o  The receiver's current estimate of the loss event rate, p.

4.  Data Sender Protocol

   The data sender sends a stream of data packets to the data receiver
   at a controlled rate.  When a feedback packet is received from the
   data receiver, the data sender changes its sending rate, based on
   the information contained in the feedback report.  If the sender does
   not receive a feedback report for four round trip times, it cuts its
   sending rate in half.  This is achieved by means of a timer called
   the nofeedback timer.

   We specify the sender-side protocol in the following steps:

   o  Measurement of the mean segment size being sent.

   o  The sender behavior when a feedback packet is received.

   o  The sender behavior when the nofeedback timer expires.

   o  Oscillation prevention (optional)

   o  Scheduling of transmission on non-realtime operating systems.

4.1.  Measuring the Segment Size

   The parameter s (segment size) is normally known to an application.
   This may not be so in two cases:

   o  (1) The segment size naturally varies depending on the data.  In
      this case, although the segment size varies, that variation is
      not coupled to the transmit rate.  The TFRC sender can either
      compute the average segment size or use the maximum segment size
      for the segment size s.

   o  (2) The application needs to change the segment size rather than
      the number of segments per second to perform congestion control.
      This would normally be the case with packet audio applications
      where a fixed interval of time needs to be represented by each
      packet.  Such applications need to have a completely different
      way of measuring parameters.

   For the first class of applications where the segment size varies
   depending on the data, the sender MAY estimate the segment size s as
   the average segment size over the last four loss intervals.  The
   sender MAY also estimate the average segment size over longer time
   intervals, if so desired.
   The TFRC sender uses the segment size s
   in the throughput equation, in the setting of the maximum receive
   rate and the minimum sending rate, and in the setting of the
   nofeedback timer.

   The TFRC receiver may use the average segment size s in initializing
   the loss history after the first loss event, but Section 6.3.1 also
   gives an alternate procedure that does not use the average segment
   size s.

   The second class of applications is discussed in a separate
   document on TFRC-SP.  For the remainder of this section we
   assume the sender can estimate the segment size, and that congestion
   control is performed by adjusting the number of packets sent per
   second.

4.2.  Sender Initialization

   The initial values for X (the allowed sending rate in bytes per
   second) and tld (the Time Last Doubled during slow-start) are
   undefined until they are set as described below.  If the sender is
   ready to send data when it does not yet have a round trip sample,
   the value of X is set to 1 MSS/second (for MSS the Maximum Segment
   Size), the nofeedback timer is set to expire after two seconds, and
   tld is set either to 0 or to -1.  Upon receiving a round trip time
   measurement (e.g., after the first feedback packet), tld is set to
   the current time, and the allowed transmit rate X is set to
   W_init/R, for W_init below from [RFC3390]:

      W_init = min(4*MSS, max(2*MSS, 4380)).

   For responding to the initial feedback packet, this replaces step
   (4) of Section 4.3 below.

   If the sender does have a round trip sample when it is ready to
   first send data (e.g., from the SYN exchange or from a previous
   connection [RFC2140]), the initial transmit rate X is set to
   W_init/R, and tld is set to the current time.

   Why is the initial value of TFRC's nofeedback timer set to two
   seconds, instead of the recommended initial value of three seconds
   for TCP's retransmit timer, from [RFC2988]?
   There isn't any
   particular reason why TFRC's nofeedback timer should have the same
   initial value as TCP's retransmit timer.  TCP's retransmit timer is
   used not only to reduce the sending rate in response to congestion,
   but also to retransmit a packet that is assumed to have been dropped
   in the network.  In contrast, TFRC's nofeedback timer is only used
   to reduce the allowed sending rate, not to trigger the sending of a
   new packet.  As a result, there is no danger to the network for the
   initial value of TFRC's nofeedback timer to be smaller than the
   recommended initial value for TCP's retransmit timer.

4.3.  Sender behavior when a feedback packet is received

   The sender knows its current allowed sending rate, X, and maintains
   an estimate of the current round trip time, R, and an estimate of
   the timeout interval, t_RTO.

   When a feedback packet is received by the sender at time t_now, the
   following actions should be performed:

   1) Calculate a new round trip sample.
      R_sample = (t_now - t_recvdata) - t_delay.

   2) Update the round trip time estimate:

         If no feedback has been received before
            R = R_sample;
         Else
            R = q*R + (1-q)*R_sample;

      TFRC is not sensitive to the precise value for the filter
      constant q, but we recommend a default value of 0.9.

   3) Update the timeout interval:

         t_RTO = 4*R.

   4) Update the sending rate as follows:

         If (sender has been idle or data-limited
               within last two round-trip times)
            min_rate = max(2*X_recv, W_init/R);
         Else
            min_rate = 2*X_recv;
         If (p > 0)
            Calculate X_Bps using the TCP throughput equation.
            X = max(min(X_Bps, min_rate), s/t_mbi);
         Else if ((min_rate < X) and (the first feedback packet, or
                  the first feedback packet after a nofeedback timer))
            Do nothing;
         Else if (t_now - tld >= R)
            X = max(min(2*X, min_rate), s/R);
            tld = t_now;

   The condition ``if (sender has been idle or data-limited within last
   two round-trip times)'' prevents an idle or data-limited sender from
   having to reduce the sending rate to less than the initial sending
   rate as a result of limitations from a small receive rate.  The
   condition ``if ((min_rate < X) and (the first feedback packet, or
   the first feedback packet after a nofeedback timer))'' prevents a
   sender from reducing the sending rate in response to a feedback
   packet that reports the receipt of only a few packets after start-up
   or after an idle period.

   Note that if p == 0, then the sender is in slow-start phase, where
   it approximately doubles the sending rate each round-trip time until
   a loss occurs.  The s/R term gives a minimum sending rate during
   slow-start of one packet per RTT.  The parameter t_mbi is 64
   seconds, and represents the maximum inter-packet backoff interval in
   the persistent absence of feedback.  Thus, when p > 0 the sender
   sends at least one packet every 64 seconds.

   5) Reset the nofeedback timer to expire after max(4*R, 2*s/X)
      seconds.

4.4.  Expiration of nofeedback timer

   If the nofeedback timer expires, the sender should perform the
   following actions:

   1) Cut the sending rate in half.  If the sender has received
      feedback from the receiver, this is done by modifying the
      sender's cached copy of X_recv (the receive rate).  Because the
      sending rate is limited to at most twice X_recv, modifying
      X_recv limits the current sending rate, but allows the sender to
      slow-start, doubling its sending rate each RTT, if feedback
      messages resume reporting no losses.

         If (X_Bps > 2*X_recv)
             X_recv = max(X_recv/2, s/(2*t_mbi));
         Else
             X_recv = X_Bps/4;

      The term s/(2*t_mbi) limits the backoff to one packet every 64
      seconds in the case of persistent absence of feedback.

   2) The value of X must then be recalculated as described under
      step (4) of Section 4.3.

      If the nofeedback timer expires when the sender does not yet
      have an RTT sample and has not yet received any feedback from
      the receiver, or when p == 0, then step (1) can be skipped, and
      the sending rate cut in half directly:

         X = max(X/2, s/t_mbi);

   3) Restart the nofeedback timer to expire after max(4*R, 2*s/X)
      seconds.

   Note that when the sender stops sending, the receiver will stop
   sending feedback. When the sender's nofeedback timer expires, the
   sender will decrease X_recv. If the sender subsequently starts to
   send again, X_recv will limit the transmit rate, and a normal
   slow-start phase will occur until the transmit rate reaches X_Bps.

4.5. Sending a packet after an idle or data-limited period

   If the sender has been idle (unable to send because there is little
   or no data from the application), the allowed sending rate could
   have been reduced due to the nofeedback timer, as specified in the
   section above. Because the sender is always restricted to sending
   at most twice the receive rate reported by the receiver, the sender
   will be limited to at most doubling its sending rate each round-trip
   time, until the sending rate reaches the allowed sending rate
   calculated by the throughput equation.

4.6. Preventing Oscillations

   To prevent oscillatory behavior in environments with a low degree of
   statistical multiplexing it is useful to modify the sender's
   transmit rate to provide congestion avoidance behavior by reducing
   the transmit rate as the queuing delay (and hence RTT) increases.
   To do this the sender maintains an estimate of the long-term RTT and
   modifies its sending rate depending on how the most recent sample of
   the RTT differs from this value. The long-term sample is R_sqmean,
   and is set as follows:

         If no feedback has been received before
             R_sqmean = sqrt(R_sample);
         Else
             R_sqmean = q2*R_sqmean + (1-q2)*sqrt(R_sample);

   Thus R_sqmean gives the exponentially weighted moving average of the
   square root of the RTT samples. The constant q2 should be set
   similarly to q, and we recommend a value of 0.9 as the default.

   The sender obtains the base allowed transmit rate, X, from the
   throughput function. It then calculates a modified instantaneous
   transmit rate X_inst, as follows:

         X_inst = X * R_sqmean / sqrt(R_sample);

   When sqrt(R_sample) is greater than R_sqmean then the queue is
   typically increasing and so the transmit rate needs to be decreased
   for stable operation.

   Note: This modification is not always strictly required, especially
   if the degree of statistical multiplexing in the network is high.
   However, we recommend that it is done because it does make TFRC
   behave better in environments with a low level of statistical
   multiplexing. If it is not done, we recommend using a very low
   value of q, such that q is close to or exactly zero.

4.7. Scheduling of Packet Transmissions

   As TFRC is rate-based, and as operating systems typically cannot
   schedule events precisely, it is necessary to be opportunistic about
   sending data packets so that the correct average rate is maintained
   despite the coarse-grain or irregular scheduling of the operating
   system. Thus a typical sending loop will calculate the correct
   inter-packet interval, t_ipi, as follows:

         t_ipi = s/X_inst;

   Let t_now be the current time and i be a natural number, i = 0, 1,
   ..., with t_i the nominal send time for the i-th packet.
   Then the nominal send time t_(i+1) is derived recursively as

         t_0 = t_now,
         t_(i+1) = t_i + t_ipi.

   The parameter t_delta allows a degree of flexibility in the send
   time of a packet. When the application becomes idle, it requests
   re-scheduling for time t_i = t_(i-1) + t_ipi, for t_(i-1) the send
   time for the previous packet. When the application is re-scheduled,
   it checks the current time, t_now. If (t_now > t_i - t_delta) then
   packet i is sent.

   In some cases, when the nominal send time, t_i, of the next packet
   is calculated, it may already be the case that t_now > t_i -
   t_delta. In such a case the packet should be sent immediately.
   Thus if the operating system has coarse timer granularity and the
   transmit rate is high, then TFRC may send short bursts of several
   packets separated by intervals of the OS timer granularity.

   If the operating system has a scheduling timer granularity of t_gran
   seconds, then t_delta would typically be set to:

         t_delta = min(t_ipi/2, t_gran/2);

   t_gran is 10ms on many Unix systems. If t_gran is not known, a
   value of 10ms can be safely assumed.

5. Calculation of the Loss Event Rate (p)

   Obtaining an accurate and stable measurement of the loss event rate
   is of primary importance for TFRC. Loss rate measurement is
   performed at the receiver, based on the detection of lost or marked
   packets from the sequence numbers of arriving packets. We describe
   this process before describing the rest of the receiver protocol.

5.1. Detection of Lost or Marked Packets

   TFRC assumes that all packets contain a sequence number that is
   incremented by one for each packet that is sent. For the purposes
   of this specification, we require that if a lost packet is
   retransmitted, the retransmission is given a new sequence number
   that is the latest in the transmission sequence, and not the same
   sequence number as the packet that was lost.
   If a transport protocol has the requirement that it must retransmit
   with the original sequence number, then the transport protocol
   designer must figure out how to distinguish delayed from
   retransmitted packets and how to detect lost retransmissions.

   The receiver maintains a data structure that keeps track of which
   packets have arrived and which are missing. For the purposes of
   specification, we assume that the data structure consists of a list
   of packets that have arrived along with the receiver timestamp when
   each packet was received. In practice this data structure will
   normally be stored in a more compact representation, but this is
   implementation-specific.

   The loss of a packet is detected by the arrival of at least NDUPACK
   packets with a higher sequence number than the lost packet, for
   NDUPACK set to 3. The requirement for NDUPACK subsequent packets is
   the same as with TCP, and is to make TFRC more robust in the
   presence of reordering. In contrast to TCP, if a packet arrives
   late (after NDUPACK subsequent packets arrived) in TFRC, the late
   packet can fill the hole in TFRC's reception record, and the
   receiver can recalculate the loss event rate. Future versions of
   TFRC might make the requirement for NDUPACK subsequent packets
   adaptive based on experienced packet reordering, but we do not
   specify such a mechanism here.

   For an ECN-capable connection, a marked packet is detected as a
   congestion event as soon as it arrives, without having to wait for
   the arrival of subsequent packets.

5.2. Translation from Loss History to Loss Events

   TFRC requires that the loss fraction be robust to several
   consecutive packets lost or marked where those packets are part of
   the same loss event. This is similar to TCP, which (typically) only
   performs one halving of the congestion window during any single RTT.
   Thus the receiver needs to map the packet loss history into a loss
   event record, where a loss event is one or more packets lost or
   marked in an RTT. To perform this mapping, the receiver needs to
   know the RTT to use, and this is supplied periodically by the
   sender, typically as control information piggy-backed onto a data
   packet. TFRC is not sensitive to how the RTT measurement sent to
   the receiver is made, but we recommend using the sender's calculated
   RTT, R, (see Section 4.3) for this purpose.

   To determine whether a lost or marked packet should start a new loss
   event, or be counted as part of an existing loss event, we need to
   compare the sequence numbers and timestamps of the packets that
   arrived at the receiver. For a marked packet S_new, its reception
   time T_new can be noted directly. For a lost packet, we can
   interpolate to infer the nominal "arrival time". Assume:

      S_loss is the sequence number of a lost packet.

      S_before is the sequence number of the last packet to arrive
      with sequence number before S_loss.

      S_after is the sequence number of the first packet to arrive
      with sequence number after S_loss.

      S_max is the largest sequence number.

      T_loss is the nominal estimated arrival time for the lost
      packet.

      T_before is the reception time of S_before.

      T_after is the reception time of S_after.

   Note that T_before can either be before or after T_after due to
   reordering.

   For a lost packet S_loss, we can interpolate its nominal "arrival
   time" at the receiver from the arrival times of S_before and
   S_after. Thus:

         T_loss = T_before + ( (T_after - T_before)
                     * (S_loss - S_before)/(S_after - S_before) );

   Note that if the sequence space wrapped between S_before and
   S_after, then the sequence numbers must be modified to take this
   into account before performing this calculation.
   If the largest possible sequence number is S_max, and S_before >
   S_after, then modifying each sequence number S by
   S' = (S + (S_max + 1)/2) mod (S_max + 1) would normally be
   sufficient.

   If the lost packet S_old was determined to have started the previous
   loss event, and we have just determined that S_new has been lost,
   then we interpolate the nominal arrival times of S_old and S_new,
   called T_old and T_new respectively.

   If T_old + R >= T_new, then S_new is part of the existing loss
   event. Otherwise S_new is the first packet in a new loss event.

5.3. Inter-loss Event Interval

   If a loss interval, A, is determined to have started with packet
   sequence number S_A and the next loss interval, B, started with
   packet sequence number S_B, then the number of packets in loss
   interval A is given by (S_B - S_A). Thus, loss interval A contains
   all of the packets transmitted by the sender starting with the first
   packet transmitted in loss interval A, and ending with but not
   including the first packet transmitted in loss interval B.

5.4. Average Loss Interval

   To calculate the loss event rate p, we first calculate the average
   loss interval. This is done using a filter that weights the n most
   recent loss event intervals in such a way that the measured loss
   event rate changes smoothly.

   Weights w_0 to w_(n-1) are calculated as:

         If (i < n/2)
             w_i = 1;
         Else
             w_i = 1 - (i - (n/2 - 1))/(n/2 + 1);

   Thus if n=8, the values of w_0 to w_7 are:

         1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2

   The value n for the number of loss intervals used in calculating the
   loss event rate determines TFRC's speed in responding to changes in
   the level of congestion. As currently specified, TFRC should not be
   used for values of n significantly greater than 8, for traffic that
   might compete in the global Internet with TCP.
   At the very least, safe operation with values of n greater than 8
   would require a slight change to TFRC's mechanisms to include a more
   severe response to two or more round-trip times with heavy packet
   loss.

   When calculating the average loss interval we need to decide whether
   to include the interval since the most recent packet loss event. We
   only do this if it is sufficiently large to increase the average
   loss interval.

   Let the most recent loss intervals be I_0 to I_k, where I_0 is the
   interval starting with the most recent loss event (if there has been
   one). If there have been at least n loss intervals, then k is set
   to n; otherwise k is the maximum number of loss intervals seen so
   far. We calculate the average loss interval I_mean as follows:

         I_tot0 = 0;
         I_tot1 = 0;
         W_tot = 0;
         for (i = 0 to k-1) {
             I_tot0 = I_tot0 + (I_i * w_i);
             W_tot = W_tot + w_i;
         }
         for (i = 1 to k) {
             I_tot1 = I_tot1 + (I_i * w_(i-1));
         }
         I_tot = max(I_tot0, I_tot1);
         I_mean = I_tot/W_tot;

   The loss event rate, p, is simply:

         p = 1 / I_mean;

5.5. History Discounting

   As described in Section 5.4, when there have been at least eight
   loss intervals, the most recent loss interval is only assigned
   1/(0.75*n) of the total weight in calculating the average loss
   interval, regardless of the size of the most recent loss interval.
   This section describes an optional history discounting mechanism,
   discussed further in [FHPW00a] and [W00], that allows the TFRC
   receiver to adjust the weights, concentrating more of the relative
   weight on the most recent loss interval, when the most recent loss
   interval is more than twice as large as the computed average loss
   interval.

   To carry out history discounting, we associate a discount factor
   DF_i with each loss interval L_i, for i > 0, where each discount
   factor is a floating point number.
   The discount array maintains the cumulative history of discounting
   for each loss interval. At the beginning, the values of DF_i in the
   discount array are initialized to 1:

         for (i = 0 to n) {
             DF_i = 1;
         }

   History discounting also uses a general discount factor DF, also a
   floating point number, that is also initialized to 1. First we show
   how the discount factors are used in calculating the average loss
   interval, and then we describe later in this section how the
   discount factors are modified over time.

   As described in Section 5.4 the average loss interval is calculated
   using the n previous loss intervals I_1, ..., I_n, and the interval
   I_0 that represents the number of packets sent since the beginning
   of the last loss event. The computation of the average loss
   interval using the discount factors is a simple modification of the
   procedure in Section 5.4, as follows:

         I_tot0 = I_0 * w_0;
         I_tot1 = 0;
         W_tot0 = w_0;
         W_tot1 = 0;
         for (i = 1 to n-1) {
             I_tot0 = I_tot0 + (I_i * w_i * DF_i * DF);
             W_tot0 = W_tot0 + w_i * DF_i * DF;
         }
         for (i = 1 to n) {
             I_tot1 = I_tot1 + (I_i * w_(i-1) * DF_i);
             W_tot1 = W_tot1 + w_(i-1) * DF_i;
         }
         p = min(W_tot0/I_tot0, W_tot1/I_tot1);

   The general discounting factor, DF, is updated on every packet
   arrival as follows. First, the receiver computes the weighted
   average I_mean of the loss intervals I_1, ..., I_n:

         I_tot = 0;
         W_tot = 0;
         for (i = 1 to n) {
             W_tot = W_tot + w_(i-1) * DF_i;
             I_tot = I_tot + (I_i * w_(i-1) * DF_i);
         }
         I_mean = I_tot / W_tot;

   This weighted average I_mean is compared to I_0, the number of
   packets sent since the beginning of the last loss event.
   If I_0 is greater than twice I_mean, then the new loss interval is
   considerably larger than the old ones, and the general discount
   factor DF is updated to decrease the relative weight on the older
   intervals, as follows:

         if (I_0 > 2 * I_mean) {
             DF = 2 * I_mean/I_0;
             if (DF < THRESHOLD)
                 DF = THRESHOLD;
         } else
             DF = 1;

   A nonzero value for THRESHOLD ensures that older loss intervals from
   an earlier time of high congestion are not discounted entirely. We
   recommend a THRESHOLD of 0.5. Note that with each new packet
   arrival, I_0 will increase further, and the discount factor DF will
   be updated.

   When a new loss event occurs, the current interval shifts from I_0
   to I_1, loss interval I_i shifts to interval I_(i+1), and the loss
   interval I_n is forgotten. The previous discount factor DF has to
   be incorporated into the discount array. Because DF_i carries the
   discount factor associated with loss interval I_i, the DF_i array
   has to be shifted as well. This is done as follows:

         for (i = 1 to n) {
             DF_i = DF * DF_i;
         }
         for (i = n-1 to 0 step -1) {
             DF_(i+1) = DF_i;
         }
         I_0 = 1;
         DF_0 = 1;
         DF = 1;

   This completes the description of the optional history discounting
   mechanism. We emphasize that this is an optional mechanism whose
   sole purpose is to allow TFRC to respond somewhat more quickly to
   the sudden absence of congestion, as represented by a long current
   loss interval.

6. Data Receiver Protocol

   The receiver periodically sends feedback messages to the sender.
   Feedback packets should normally be sent at least once per RTT,
   unless the sender is sending at a rate of less than one packet per
   RTT, in which case a feedback packet should be sent for every data
   packet received.
   A feedback packet should also be sent whenever a new loss event is
   detected without waiting for the end of an RTT, and whenever an
   out-of-order data packet is received that removes a loss event from
   the history.

   If the sender is transmitting at a high rate (many packets per RTT)
   there may be some advantages to sending periodic feedback messages
   more than once per RTT as this allows faster response to changing
   RTT measurements, and more resilience to feedback packet loss. If
   the receiver was sending k feedback packets per RTT, step (4) of
   Section 6.2 would be modified to set the feedback timer to expire
   after R_m/k seconds. However, each feedback packet would still
   report the receiver rate over the last RTT, not over a fraction of
   an RTT. We note that there is little gain from sending a large
   number of feedback messages per RTT.

6.1. Receiver behavior when a data packet is received

   When a data packet is received, the receiver performs the following
   steps:

   1) Add the packet to the packet history.

   2) Let the previous value of p be p_prev. Calculate the new value
      of p as described in Section 5.

   3) If p > p_prev, cause the feedback timer to expire, and perform
      the actions described in Section 6.2.

      If p <= p_prev no action need be performed.

      However, an optimization might check to see if the arrival of
      the packet caused a hole in the packet history to be filled and
      consequently two loss intervals were merged into one. If this
      is the case, the receiver might also send feedback immediately.
      The effects of such an optimization are normally expected to be
      small.

6.2. Expiration of feedback timer

   When the feedback timer at the receiver expires, the action to be
   taken depends on whether data packets have been received since the
   last feedback was sent.

   Let the maximum sequence number of a packet at the receiver so far
   be S_m, and the value of the RTT measurement included in packet S_m
   be R_m. As described in Section 3.2.1, R_m is the sender's current
   estimate of the round trip time, reported in data packets. If data
   packets have been received since the previous feedback was sent, the
   receiver performs the following steps:

   1) Calculate the average loss event rate using the algorithm
      described above.

   2) Calculate the measured receive rate, X_recv, based on the
      packets received within the previous R_m seconds.

   3) Prepare and send a feedback packet containing the information
      described in Section 3.2.2.

   4) Restart the feedback timer to expire after R_m seconds.

   Note that rule 2) above gives a minimum value for the measured
   receive rate X_recv of one packet per round-trip time. If the
   sender is limited to a sending rate of less than one packet per
   round-trip time, this will be due to the loss event rate, not from a
   limit imposed by the measured receive rate at the receiver.

   If no data packets have been received since the last feedback was
   sent, no feedback packet is sent, and the feedback timer is
   restarted to expire after R_m seconds.

6.3. Receiver initialization

   The receiver is initialized by the first data packet that arrives at
   the receiver. Let the sequence number of this packet be i.

   When the first packet is received:

   o Set p = 0.

   o Set X_recv = 0.

   o Prepare and send a feedback packet.

   o Set the feedback timer to expire after R_i seconds.

   If the first data packet doesn't contain an estimate R_i of the
   round-trip time, then the receiver sends a feedback packet for every
   arriving data packet, until a data packet arrives containing an
   estimate of the round-trip time.
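
   The receiver initialization above and the feedback-timer handling of
   Section 6.2 can be sketched roughly as follows. This is an
   illustrative Python sketch, not part of the specification: the class
   and method names are hypothetical, every data packet is assumed to
   carry an RTT estimate, the loss event rate p is assumed to be
   maintained elsewhere (Section 5), and the feedback is reduced to the
   two values p and X_recv rather than the full field list of Section
   3.2.2.

```python
from collections import deque


class FeedbackTimerSketch:
    """Hypothetical receiver-side feedback logic (Sections 6.2 and 6.3)."""

    def __init__(self):
        self.arrivals = deque()  # (arrival time in seconds, size in bytes)
        self.started = False     # has the first data packet arrived?
        self.new_data = False    # data received since the last feedback?
        self.r_m = None          # RTT estimate carried in the latest packet
        self.p = 0.0             # loss event rate, updated elsewhere

    def on_data(self, now, size, rtt_estimate):
        """Record a data packet; return feedback to send now, or None."""
        self.arrivals.append((now, size))
        self.r_m = rtt_estimate
        if not self.started:
            # Section 6.3: first packet gives p = 0, X_recv = 0, and
            # immediate feedback; the caller arms the timer for r_m.
            self.started = True
            return {"p": 0.0, "X_recv": 0.0}
        self.new_data = True
        return None

    def on_feedback_timer(self, now):
        """Return (feedback or None, seconds until the next expiry)."""
        if not self.new_data:
            # No data since the last feedback: send nothing, restart timer.
            return None, self.r_m
        # Keep only the packets received within the previous R_m seconds.
        while self.arrivals and self.arrivals[0][0] <= now - self.r_m:
            self.arrivals.popleft()
        x_recv = sum(size for _, size in self.arrivals) / self.r_m
        self.new_data = False
        return {"p": self.p, "X_recv": x_recv}, self.r_m
```

   For example, with two 1000-byte packets arriving at 0.00 s and
   0.05 s and R_m = 0.1 s, a timer expiry at 0.12 s counts only the
   second packet and reports a receive rate of about 10000 bytes per
   second.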

   If the sender is using a coarse-grained timestamp that increments
   every quarter of a round-trip time, then a feedback timer is not
   needed, and the following procedure from RFC 4342 is used to
   determine when to send feedback messages.

   o Whenever the receiver sends a feedback message, the receiver
     sets a local variable last_counter to the greatest received
     value of the window counter since the last feedback message was
     sent, if any data packets have been received since the last
     feedback message was sent.

   o If the receiver receives a data packet with a window counter
     value greater than or equal to last_counter + 4, then the
     receiver sends a new feedback packet. ("Greater" and "greatest"
     are measured in circular window counter space.)

6.3.1. Initializing the Loss History after the First Loss Event

   The number of packets until the first loss cannot be used to
   compute the allowed sending rate directly, as the sending rate
   changes rapidly during this time. TFRC assumes that the correct
   data rate after the first loss is half of the maximum sending rate
   before the loss occurred. TFRC approximates this target rate
   X_target by the maximum X_recv so far, for X_recv the receive rate
   over a single round-trip time. (For a TFRC sender that always has
   data to send, it is sufficient to approximate the target rate by the
   most recent X_recv. However, for a TFRC sender that is sometimes
   data-limited or idle, it is best to use the maximum X_recv so far.)

   After the first loss, instead of initializing the first loss
   interval to the number of packets sent until the first loss, the
   TFRC receiver calculates the loss interval that would be required to
   produce the data rate X_target, and uses this synthetic loss
   interval to seed the loss history mechanism.
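
   One way to obtain such a synthetic loss interval is to invert the
   throughput equation numerically. The Python sketch below is only an
   illustration under stated assumptions: it uses the simplified form
   X_Bps = s/(R*f(p)) given in Section 8 (valid for t_RTO = 4*R and
   b = 1), it searches by bisection over an assumed range of p from
   1e-9 to 1, and it stops once the rate is within a relative tolerance
   (5% by default) of X_target. The function names, the search range,
   and the iteration cap are hypothetical choices, not part of the
   specification.

```python
import math


def f(p):
    """f(p) from Section 8, for t_RTO = 4*R and b = 1."""
    return math.sqrt(2 * p / 3) + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p * p)


def x_bps(s, r, p):
    """Sending rate in bytes per second for packet size s, RTT r, loss rate p."""
    return s / (r * f(p))


def synthetic_loss_interval(x_target, s, r, tol=0.05, max_iter=100):
    """Return 1/p for a p whose rate is within tol of x_target.

    f(p) increases with p, so the rate decreases with p and bisection
    applies. If x_target lies outside the rates reachable on the assumed
    bracket, the result converges to the nearest bracket endpoint.
    """
    lo, hi = 1e-9, 1.0              # assumed search range for p
    p = math.sqrt(lo * hi)
    for _ in range(max_iter):
        p = math.sqrt(lo * hi)      # geometric midpoint: p spans many decades
        x = x_bps(s, r, p)
        if abs(x - x_target) <= tol * x_target:
            break
        if x > x_target:
            lo = p                  # rate too high: a larger p is needed
        else:
            hi = p                  # rate too low: a smaller p is needed
    return 1.0 / p
```

   A receiver that does not know s could apply the same search to rates
   expressed in packets per second by passing s = 1.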

   TFRC does this by finding some value p for which the throughput
   equation in Section 3.1 gives a sending rate within 5% of X_target,
   given the round-trip time R, and the first loss interval is then set
   to 1/p. If the receiver knows the segment size s used by the
   sender, then the receiver can use the throughput equation for X;
   otherwise, the receiver can measure the receive rate in packets per
   second instead of bytes per second for this purpose, and use the
   throughput equation for X_pps. (The 5% tolerance is introduced
   simply because the throughput equation is difficult to invert, and
   we want to reduce the costs of calculating p numerically.)

   Special care is needed for initializing the first loss interval when
   the first data packet is lost or marked. When the first data packet
   is lost in TCP, the TCP sender retransmits the packet after the
   retransmit timer expires. If TCP's first data packet is ECN-marked,
   the TCP sender resets the retransmit timer, and sends a new data
   packet only when the retransmit timer expires [RFC3168] (Section
   6.1.2). For TFRC, if the first data packet is lost or ECN-marked,
   then the first loss interval consists of the null interval with no
   data packets. In this case, the loss interval length for this
   (null) loss interval should be set to give a similar sending rate to
   that of TCP.

   When the first TFRC loss interval is null, meaning that the first
   data packet is lost or ECN-marked, in order to follow the behavior
   of TCP, TFRC wants the allowed sending rate to be 1 packet every two
   round-trip times, or equivalently, 0.5 packets per RTT. Thus, the
   TFRC receiver calculates the loss interval that would be required to
   produce the target rate X_target of 0.5/R packets per second, for
   the round-trip time R, and uses this synthetic loss interval for the
   first loss interval.
   The TFRC receiver uses 0.5/R packets per second as the minimum value
   for X_target when initializing the first loss interval.

7. Sender-based Variants

   In a sender-based variant of TFRC, the receiver would use reliable
   delivery to send information about packet losses to the sender, and
   the sender would compute the packet loss rate and the acceptable
   transmit rate.

   The main advantage of a sender-based variant of TFRC would be that
   the sender would not have to trust the receiver's calculation of the
   packet loss rate. However, with the requirement of reliable
   delivery of loss information from the receiver to the sender, a
   sender-based TFRC would have much tighter constraints on the
   transport protocol in which it is embedded.

   In contrast, the receiver-based variant of TFRC specified in this
   document is robust to the loss of feedback packets, and therefore
   does not require the reliable delivery of feedback packets. It is
   also better suited for applications where it is desirable to offload
   work from the server to the client as much as possible.

   RFC 4340 and RFC 4342 together specify CCID 3, which can be used as
   a sender-based variant of TFRC. In CCID 3, each feedback packet
   from the receiver contains a Loss Intervals option, reporting the
   lengths of the most recent loss intervals. Feedback packets may
   also include the Ack Vector option, allowing the sender to determine
   exactly which packets were dropped or marked, and to check the
   information reported in the Loss Intervals options. The Ack Vector
   option can also include ECN Nonce Echoes, allowing the sender to
   verify the receiver's report of having received a data packet. The
   Ack Vector option allows the sender to determine for itself which
   data packets were lost or ECN-marked, to determine loss intervals,
   and to calculate the loss event rate.
   Section 9.2 of RFC 4342 discusses issues in the sender verifying
   information reported by the receiver.

8. Implementation Issues

   This document has specified the TFRC congestion control mechanism,
   for use by applications and transport protocols. This section
   briefly mentions a few implementation issues.

   For t_RTO = 4*R and b = 1, the throughput equation in Section 3.1
   can be expressed as follows:

                     s
         X_Bps = --------
                 R * f(p)

   for

         f(p) = sqrt(2*p/3) + (12*sqrt(3*p/8) * p * (1+32*p^2)).

   A table lookup could be used for the function f(p).

   Many of the multiplications (e.g., q and 1-q for the round-trip time
   average, a factor of 4 for the timeout interval) are or could be by
   powers of two, and therefore could be implemented as simple shift
   operations.

   We note that the optional sender mechanism for preventing
   oscillations described in Section 4.6 uses a square-root
   computation.

   For the calculation of the nominal arrival time T_loss for a lost
   packet from Section 5.2, one way to implement this that would avoid
   concerns about wrapped sequence space would be to use the following:

         T_loss = T_before + (T_after - T_before)
                     * Dist(S_loss, S_before)/Dist(S_after, S_before)

   where

         Dist(Seqno_A, Seqno_B) = (Seqno_A + 2^48 - Seqno_B) % 2^48

   The calculation of the average loss interval in Section 5.4 involves
   multiplications by the weights w_0 to w_(n-1), which for n=8 are:

         1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2.

   With a minor loss of smoothness, it would be possible to use weights
   that were powers of two or sums of powers of two, e.g.,

         1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.25.

   The optional history discounting mechanism described in Section 5.5
   is used in the calculation of the average loss rate.
   The history discounting mechanism is invoked only when there has
   been an unusually long interval with no packet losses. For a more
   efficient operation, the discount factor DF_i could be restricted to
   be a power of two.

9. Changes from RFC 3448

   The changes from RFC 3448 are as follows:

   o Changes to the initial sending rate: In RFC 3448, the initial
     sending rate was two packets per round trip time. In this
     document, the initial sending rate can be as high as four
     packets per round trip time, following RFC 3390.

     Following Section 5.1 from [RFC4342], this document also
     specifies that when the sending rate is reduced after an idle
     period, it is not reduced below the initial sending rate. In
     addition, when the sender has been data-limited and the sender
     is reducing the allowed transmit rate to twice the receive
     rate, the sender doesn't reduce the allowed transmit rate to
     less than the initial sending rate.

     A larger initial sending rate is of little use if the receiver
     sends a feedback packet after the first packet is received, and
     the sender in response reduces the allowed sending rate to at
     most twice the receive rate. In the current document, the
     sender does not reduce the allowed sending rate to at most twice
     the receive rate in response to the first feedback packet.

   o RFC 3448 had contradictory text about whether the sender halved
     its sending rate after *two* round-trip times without receiving
     a feedback report, or after *four* round-trip times. This
     document clarifies that the sender halves its sending rate after
     four round-trip times without receiving a feedback report
     [RFC3448Err].

   o Section 4.4 was clarified to specify that on the expiration of
     the nofeedback timer, if p = 0, step (2) applies instead of step
     (1) [RFC3448Err].

   o A line in Section 5.5 was changed from ``for (i = 1 to n) { DF_i
     = 1; }'' to ``for (i = 0 to n) { DF_i = 1; }'' [RFC3448Err].

   o Section 5.4 was modified to clarify the receiver's calculation
     of the average loss interval when the receiver has not yet seen
     eight loss intervals.

   o Section 4.1 was modified to give a specific algorithm that could
     be used for estimating the average segment size.

10. Security Considerations

   TFRC is not a transport protocol in its own right, but a congestion
   control mechanism that is intended to be used in conjunction with a
   transport protocol. Therefore security primarily needs to be
   considered in the context of a specific transport protocol and its
   authentication mechanisms.

   Congestion control mechanisms can potentially be exploited to create
   denial of service. This may occur through spoofed feedback. Thus
   any transport protocol that uses TFRC should take care to ensure
   that feedback is only accepted from the receiver of the data. The
   precise mechanism to achieve this will however depend on the
   transport protocol itself.

   In addition, congestion control mechanisms may potentially be
   manipulated by a greedy receiver that wishes to receive more than
   its fair share of network bandwidth. A receiver might do this by
   claiming to have received packets that in fact were lost due to
   congestion. Possible defenses against such a receiver would
   normally include some form of nonce that the receiver must feed back
   to the sender to prove receipt. However, the details of such a
   nonce would depend on the transport protocol, and in particular on
   whether the transport protocol is reliable or unreliable.

   We expect that protocols incorporating ECN with TFRC will also want
   to incorporate feedback from the receiver to the sender using the
   ECN nonce [RFC3540].
The ECN nonce is a modification to ECN that
1388 protects the sender from the accidental or malicious concealment of
1389 marked packets. Again, the details of such a nonce would depend on
1390 the transport protocol, and are not addressed in this document.

1392 11. IANA Considerations

1394 There are no IANA actions required for this document.

1396 12. Acknowledgments

1398 We would like to acknowledge feedback and discussions on equation-
1399 based congestion control with a wide range of people, including
1400 members of the Reliable Multicast Research Group, the Reliable
1401 Multicast Transport Working Group, and the End-to-End Research
1402 Group. We would like to thank Dado Colussi, Gorry Fairhurst, Ladan
1403 Gharai, Wim Heirman, Eddie Kohler, Ken Lofgren, Mike Luby, Ian
1404 McDonald, Michele R., Gerrit Renker, Arjuna Sathiaseelan, Vladica
1405 Stanisic, Randall Stewart, Eduardo Urzaiz, Shushan Wen, and Wendy
1406 Lee (lhh@zsu.edu.cn) for feedback on earlier versions of this
1407 document, and to thank Mark Allman for his extensive feedback from
1408 using the document to produce a working implementation.

1410 13.
Terminology

1412 This document uses the following terms:

1414 DF : discount factor for a loss interval

1416 last_counter : greatest received value of the window counter

1418 min_rate : minimum transmit rate

1420 MSS : Maximum Segment Size (constant)

1422 n : number of loss intervals

1424 NDUPACK : number of dupacks for inferring loss (constant)

1426 nofeedback timer : sender-side timer

1428 p : measured Loss Event Rate

1430 p_prev : previous value of p

1432 q : filter constant for RTT (constant)

1434 q2 : filter constant for long-term RTT (constant)

1436 R : estimated path round-trip time

1438 R_sample : measured path RTT

1440 R_sqmean : estimated long-term RTT

1442 s : nominal packet size in bytes (constant)

1444 S : sequence number

1446 t_delta : parameter for flexibility in send time

1448 t_gran : scheduler granularity (constant)

1450 t_ipi : calculated inter-packet interval for sending packets

1452 t_mbi : maximum RTO value of TCP (constant)

1454 tld : Time Last Doubled

1456 t_now : current time

1458 t_RTO : estimated RTO of TCP

1460 X : allowed transmit rate

1461 X_Bps : calculated sending rate in bytes per second

1463 X_pps : calculated sending rate in packets per second

1465 X_recv : estimated receive rate at the receiver

1467 X_inst : instantaneous transmit rate

1469 W_init : TCP initial window (constant)

1471 14. Normative References

1473 15. Informational References

1475 [BRS99] Balakrishnan, H., Rahul, H., and Seshan, S., "An
1476   Integrated Congestion Management Architecture for
1477   Internet Hosts," Proc. ACM SIGCOMM, Cambridge, MA,
1478   September 1999.

1480 [FHPW00] S. Floyd, M. Handley, J. Padhye, and J. Widmer,
1481   "Equation-Based Congestion Control for Unicast
1482   Applications", August 2000, Proc SIGCOMM 2000.

1484 [FHPW00a] S. Floyd, M. Handley, J. Padhye, and J. Widmer,
1485   "Equation-Based Congestion Control for Unicast
1486   Applications: the Extended Version", ICSI tech
1487   report TR-00-03, March 2000.

1489 [PFTK98] Padhye, J. and Firoiu, V.
and Towsley, D. and
1490   Kurose, J., "Modeling TCP Throughput: A Simple Model
1491   and its Empirical Validation", Proc ACM SIGCOMM
1492   1998.

1494 [RFC2119] S. Bradner, "Key Words For Use in RFCs to Indicate
1495   Requirement Levels", RFC 2119, March 1997.

1497 [RFC2140] J. Touch, "TCP Control Block Interdependence", RFC
1498   2140, April 1997.

1500 [RFC2988] V. Paxson and M. Allman, "Computing TCP's
1501   Retransmission Timer", RFC 2988, November 2000.

1503 [RFC3168] K. Ramakrishnan and S. Floyd, "The Addition of
1504   Explicit Congestion Notification (ECN) to IP", RFC
1505   3168, September 2001.

1507 [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing
1508   TCP's Initial Window", RFC 3390, October 2002.

1510 [RFC3448Err] RFC 3448 Errata, URL
1511   ``http://www.icir.org/tfrc/rfc3448.errata''.

1513 [RFC3540] Wetherall, D., Ely, D., and Spring, N., "Robust ECN
1514   Signaling with Nonces", RFC 3540, Experimental, June
1515   2003.

1517 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
1518   Congestion Control Protocol (DCCP)", RFC 4340, March
1519   2006.

1521 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for
1522   Datagram Congestion Control Protocol (DCCP)
1523   Congestion Control ID 3: TCP-Friendly Rate Control
1524   (TFRC)", RFC 4342, March 2006.

1526 [TFRC-SP] Floyd, S., and E. Kohler, "TCP Friendly Rate Control
1527   (TFRC): the Small-Packet (SP) Variant", Internet
1528   draft draft-ietf-dccp-tfrc-voip-07.txt, work in
1529   progress, November 2006. Approved for Experimental.

1532 [W00] Widmer, J., "Equation-Based Congestion Control",
1533   Diploma Thesis, University of Mannheim, February
1534   2000. URL "http://www.icir.org/tfrc/".

1536 16.
Authors' Addresses

1537 Mark Handley
1538 Department of Computer Science
1539 University College London
1540 Gower Street
1541 London WC1E 6BT
1542 UK
1543 EMail: M.Handley@cs.ucl.ac.uk

1545 Sally Floyd
1546 ICIR/ICSI
1547 1947 Center St, Suite 600
1548 Berkeley, CA 94708
1549 floyd@icir.org

1551 Jitendra Padhye
1552 Microsoft Research
1553 padhye@microsoft.com

1555 Joerg Widmer
1556 Lehrstuhl Praktische Informatik IV
1557 Universitat Mannheim
1558 L 15, 16 - Room 415
1559 D-68131 Mannheim
1560 Germany
1561 widmer@informatik.uni-mannheim.de

1563 Full Copyright Statement

1565 Copyright (C) The IETF Trust (2007).

1567 This document is subject to the rights, licenses and restrictions
1568 contained in BCP 78, and except as set forth therein, the authors
1569 retain all their rights.

1571 This document and the information contained herein are provided on
1572 an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
1573 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
1574 IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
1575 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
1576 WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
1577 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
1578 FOR A PARTICULAR PURPOSE.

1580 Intellectual Property

1582 The IETF takes no position regarding the validity or scope of any
1583 Intellectual Property Rights or other rights that might be claimed
1584 to pertain to the implementation or use of the technology described
1585 in this document or the extent to which any license under such
1586 rights might or might not be available; nor does it represent that
1587 it has made any independent effort to identify any such rights.
1588 Information on the procedures with respect to rights in RFC
1589 documents can be found in BCP 78 and BCP 79.
1591 Copies of IPR disclosures made to the IETF Secretariat and any
1592 assurances of licenses to be made available, or the result of an
1593 attempt made to obtain a general license or permission for the use
1594 of such proprietary rights by implementers or users of this
1595 specification can be obtained from the IETF on-line IPR repository
1596 at http://www.ietf.org/ipr.

1598 The IETF invites any interested party to bring to its attention any
1599 copyrights, patents or patent applications, or other proprietary
1600 rights that may cover technology that may be required to implement
1601 this standard. Please address the information to the IETF at
1602 ietf-ipr@ietf.org.