Internet Engineering Task Force                              Sally Floyd
INTERNET-DRAFT                                                      ICSI
draft-ietf-tsvwg-highspeed-01.txt                         26 August 2003
Expires: February 2004

              HighSpeed TCP for Large Congestion Windows

Status of this Memo

The proposals in this document are experimental.  While they may be deployed in the current Internet, they do not represent a consensus that this is the best method for high-speed congestion control.  In particular, we note that alternative experimental proposals are likely to be forthcoming, and it is not well understood how the proposals in this document will interact with such alternative proposals.

Status of this Document

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document proposes HighSpeed TCP, a modification to TCP's congestion control mechanism for use with TCP connections with large congestion windows.  The congestion control mechanisms of the current Standard TCP constrain the congestion windows that can be achieved by TCP in realistic environments.  For example, for a Standard TCP connection with 1500-byte packets and a 100 ms round-trip time, achieving a steady-state throughput of 10 Gbps would require an average congestion window of 83,333 segments, and a packet drop rate of at most one congestion event every 5,000,000,000 packets (or equivalently, at most one congestion event every 1 2/3 hours).  This is widely acknowledged as an unrealistic constraint.  To address this limitation of TCP, this document proposes HighSpeed TCP, and solicits experimentation and feedback from the wider community.

TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

Changes from draft-ietf-tsvwg-highspeed-00.txt:

Changed:

"The proposals in this document are experimental.  We believe they are safe for deployment in the current Internet, "

To:

"The proposals in this document are experimental.
While they may be deployed in the current Internet, "

Changes from draft-floyd-tcp-highspeed-03.txt:

* Added the section on "Status of this Memo".

* Added a paragraph to the end of the section on "Deployment issues of HighSpeed TCP" about possible interactions between HighSpeed TCP and other alternative experimental proposals.

Changes from draft-floyd-tcp-highspeed-02.txt:

* Added a section on "Deployment issues."

* Added a short section on "Implementation issues."

* Added a section on "Limiting burstiness on short time scales".

* Added to the discussion on convergence times.

* Clarified that "log" is "log base 10".

* Clarified that W = Low_window and W_1 = High_window, in the equation for b(w).

Changes from draft-floyd-tcp-highspeed-01.txt:

* Added a section on "Tradeoffs for Choosing Congestion Control Parameters".

* Added mention of Scalable TCP from Tom Kelly.

Changes from draft-floyd-tcp-highspeed-00.txt:

* Added a discussion on related work about changing the PMTU.

* Added a discussion of an alternate, linear response function.

* Added a discussion of the TCP window scale option.

* Added a discussion of HighSpeed TCP as roughly emulating the congestion control response of N parallel TCP connections.

* Added a discussion of the time to converge to fairness.

* Expanded the Introduction.

Table of Contents

1. Introduction
2. The Problem Description
3. Design Guidelines
4. Non-Goals
5. Modifying the TCP Response Function
6. Fairness Implications of the HighSpeed Response Function
7. Translating the HighSpeed Response Function into Congestion Control Parameters
8. An Alternate, Linear Response Function
9. Tradeoffs for Choosing Congestion Control Parameters
   9.1. The Number of Round-Trip Times between Loss Events
   9.2. The Number of Packet Drops per Loss Event, with Drop-Tail
10. Related Issues
   10.1. Slow-Start
   10.2. Limiting burstiness on short time scales
   10.3. Other limitations on window size
   10.4. Implementation issues
11. Deployment issues
   11.1. Deployment issues of HighSpeed TCP
   11.2. Deployment issues of Scalable TCP
12. Related Work in HighSpeed TCP
13. Relationship to other Work
14. Conclusions
15. Acknowledgements
16. Normative References
17. Informative References
18. Security Considerations
19. IANA Considerations
20. TCP's Loss Event Rate in Steady-State

1. Introduction.

This document proposes HighSpeed TCP, a modification to TCP's congestion control mechanism for use with TCP connections with large congestion windows.  In a steady-state environment, with a packet loss rate p, the current Standard TCP's average congestion window is roughly 1.2/sqrt(p) segments.  This places a serious constraint on the congestion windows that can be achieved by TCP in realistic environments.
For example, for a Standard TCP connection with 1500-byte packets and a 100 ms round-trip time, achieving a steady-state throughput of 10 Gbps would require an average congestion window of 83,333 segments, and a packet drop rate of at most one congestion event every 5,000,000,000 packets (or equivalently, at most one congestion event every 1 2/3 hours).  The average packet drop rate of at most 2*10^(-10) needed for full link utilization in this environment corresponds to a bit error rate of at most 2*10^(-14), and this is an unrealistic requirement for current networks.

To address this fundamental limitation of TCP and of the TCP response function (the function mapping the steady-state packet drop rate to TCP's average sending rate in packets per round-trip time), this document describes a modified TCP response function for regimes with higher congestion windows.  This document also solicits experimentation and feedback on HighSpeed TCP from the wider community.

Because HighSpeed TCP's modified response function would only take effect with higher congestion windows, HighSpeed TCP does not modify TCP behavior in environments with mild to heavy congestion, and therefore does not introduce any new dangers of congestion collapse.  However, if relative fairness between HighSpeed TCP connections is to be preserved, then in our view any modification to the TCP response function should be addressed in the IETF, rather than made as ad hoc decisions by individual implementors or TCP senders.  Modifications to the TCP response function would also have implications for transport protocols that use TFRC and other forms of equation-based congestion control, as these congestion control mechanisms directly use the TCP response function [RFC3448].
This proposal for HighSpeed TCP focuses specifically on a proposed change to the TCP response function, and its implications for TCP.  This document does not address what we view as a separate fundamental issue, of the mechanisms required to enable best-effort connections to *start* with large initial windows.  In our view, while HighSpeed TCP proposes a somewhat fundamental change to the TCP response function, at the same time it is a relatively simple change to implement in a single TCP sender, and presents no dangers in terms of congestion collapse.  In contrast, in our view, the problem of enabling connections to *start* with large initial windows is inherently more risky and structurally more difficult, requiring some form of explicit feedback from all of the routers along the path.  This is another reason why we would propose addressing the problem of starting with large initial windows separately, and on a separate timetable, from the problem of modifying the TCP response function.

2. The Problem Description.

This section describes the number of round-trip times between congestion events required for a Standard TCP flow to achieve an average throughput of B bps, given packets of D bytes and a round-trip time of R seconds.  A congestion event refers to a window of data with one or more dropped or ECN-marked packets (where ECN stands for Explicit Congestion Notification).

From Appendix A, achieving an average TCP throughput of B bps requires a loss event at most every BR/(12D) round-trip times.  This is illustrated in Table 1, for R = 0.1 seconds and D = 1500 bytes.  The table also gives the average congestion window W of BR/(8D), and the steady-state packet drop rate P of 1.5/W^2.
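As a non-normative illustration (not part of this specification), the relations above can be checked with a short script.  A minimal Python sketch, assuming throughput B in bps, round-trip time R in seconds, and packet size D in bytes:

```python
# Sketch of the Section 2 relations:
#   RTTs between losses = B*R/(12*D), W = B*R/(8*D), P = 1.5/W^2
R = 0.1      # round-trip time in seconds
D = 1500     # packet size in bytes

def standard_tcp_requirements(throughput_mbps):
    B = throughput_mbps * 1e6                # throughput in bps
    rtts_between_losses = B * R / (12 * D)   # loss events at most this often
    W = B * R / (8 * D)                      # average congestion window (segments)
    P = 1.5 / W**2                           # steady-state packet drop rate
    return rtts_between_losses, W, P

for mbps in (1, 10, 100, 1000, 10000):
    rtts, W, P = standard_tcp_requirements(mbps)
    print(f"{mbps:>6} Mbps: {rtts:10.1f} RTTs between losses, "
          f"W = {W:9.1f}, P = {P:.2e}")
```

The 10 Gbps row reproduces the figures quoted in the Introduction: W of roughly 83,333 segments and a drop rate of roughly 2*10^(-10).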
TCP Throughput (Mbps)   RTTs Between Losses       W            P
---------------------   -------------------   ---------   ------------
        1                       5.5               8.3      0.02
       10                      55.5              83.3      0.0002
      100                     555.5             833.3      0.000002
     1000                    5555.5            8333.3      0.00000002
    10000                   55555.5           83333.3      0.0000000002

Table 1: RTTs Between Congestion Events for Standard TCP, for 1500-Byte Packets and a Round-Trip Time of 0.1 Seconds.

This document proposes HighSpeed TCP, a minimal modification to TCP's increase and decrease parameters, for TCP connections with larger congestion windows, to allow TCP to achieve high throughput with more realistic requirements for the steady-state packet drop rate.  Equivalently, HighSpeed TCP has more realistic requirements for the number of round-trip times between loss events.

3. Design Guidelines.

Our proposal for HighSpeed TCP is motivated by the following requirements:

* Achieve high per-connection throughput without requiring unrealistically low packet loss rates.

* Reach high throughput reasonably quickly when in slow-start.

* Reach high throughput without overly long delays when recovering from multiple retransmit timeouts, or when ramping-up from a period with small congestion windows.

* No additional feedback or support required from routers:

For example, the goal is for acceptable performance in both ECN-capable and non-ECN-capable environments, and with Drop-Tail as well as with Active Queue Management such as RED in the routers.

* No additional feedback required from TCP receivers.

* TCP-compatible performance in environments with moderate or high congestion:

Equivalently, the requirement is that there be no additional load on the network (in terms of increased packet drop rates) in environments with moderate or high congestion.

* Performance at least as good as Standard TCP in environments with moderate or high congestion.
* Acceptable transient performance, in terms of increases in the congestion window in one round-trip time, responses to severe congestion, and convergence times to fairness.

Currently, users wishing to achieve throughputs of 1 Gbps or more typically open up multiple TCP connections in parallel, or use MulTCP [CO98,GRK99], which behaves roughly like the aggregate of N virtual TCP connections.  While this approach suffices for the occasional user on well-provisioned links, it leaves the parameter N to be determined by the user, and results in more aggressive performance and higher steady-state packet drop rates if used in environments with periods of moderate or high congestion.  We believe that a new approach is needed that offers more flexibility, more effectively scales to a wide range of available bandwidths, and competes more fairly with Standard TCP in congested environments.

4. Non-Goals.

The following are explicitly *not* goals of our work:

* Non-goal: TCP-compatible performance in environments with very low packet drop rates.

We note that our proposal does not require, or deliver, TCP-compatible performance in environments with very low packet drop rates, e.g., with packet loss rates of 10^-5 or 10^-6.  As we discuss later in this document, we assume that Standard TCP is unable to make effective use of the available bandwidth in environments with loss rates of 10^-6 in any case, so that it is acceptable and appropriate for HighSpeed TCP to perform more aggressively than Standard TCP in such an environment.

* Non-goal: Ramping-up more quickly than allowed by slow-start.

It is our belief that ramping-up more quickly than allowed by slow-start would necessitate more explicit feedback from routers along the path.
The proposal for HighSpeed TCP is focused on changes to TCP that could be effectively deployed in the current Internet environment.

* Non-goal: Avoiding oscillations in environments with only one-way, long-lived flows all with the same round-trip times.

While we agree that attention to oscillatory behavior is useful, avoiding oscillations in aggregate throughput has not been our primary consideration, particularly for simplified environments limited to one-way, long-lived flows all with the same, large round-trip times.  Our assessment is that some oscillatory behavior in these extreme environments is an acceptable price to pay for the other benefits of HighSpeed TCP.

5. Modifying the TCP Response Function.

The TCP response function, w = 1.2/sqrt(p), gives TCP's average congestion window w in MSS-sized segments, as a function of the steady-state packet drop rate p [FF98].  This TCP response function is a direct consequence of TCP's Additive Increase Multiplicative Decrease (AIMD) mechanisms of increasing the congestion window by roughly one segment per round-trip time in the absence of congestion, and halving the congestion window in response to a round-trip time with a congestion event.  This response function for Standard TCP is reflected in the table below.  In this proposal we restrict our attention to TCP performance in environments with packet loss rates of at most 10^-2, and so we can ignore the more complex response functions that are required to model TCP performance in more congested environments with retransmit timeouts.  From Appendix A, an average congestion window of W corresponds to an average of 2/3 W round-trip times between loss events for Standard TCP (with the congestion window varying from 2/3 W to 4/3 W).
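As a non-normative illustration, the Standard TCP response function and the 2/3 W relation above can be tabulated directly; a minimal Python sketch:

```python
from math import sqrt

def standard_response(p):
    """Standard TCP: average congestion window W = 1.2/sqrt(p), and the
    average of 2/3 W round-trip times between loss events (Appendix A)."""
    W = 1.2 / sqrt(p)
    rtts_between_losses = (2.0 / 3.0) * W
    return W, rtts_between_losses

# Reproduce the rows of the table below, for p = 10^-2 .. 10^-10:
for exp in range(2, 11):
    p = 10 ** -exp
    W, rtts = standard_response(p)
    print(f"p = 10^-{exp:<2}: W = {W:9.0f}, RTTs between losses = {rtts:7.0f}")
```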
Packet Drop Rate P   Congestion Window W   RTTs Between Losses
------------------   -------------------   -------------------
     10^-2                     12                      8
     10^-3                     38                     25
     10^-4                    120                     80
     10^-5                    379                    252
     10^-6                   1200                    800
     10^-7                   3795                   2530
     10^-8                  12000                   8000
     10^-9                  37948                  25298
     10^-10                120000                  80000

Table 2: TCP Response Function for Standard TCP.  The average congestion window W in MSS-sized segments is given as a function of the packet drop rate P.

To specify a modified response function for HighSpeed TCP, we use three parameters, Low_Window, High_Window, and High_P.  To ensure TCP compatibility, the HighSpeed response function uses the same response function as Standard TCP when the current congestion window is at most Low_Window, and uses the HighSpeed response function when the current congestion window is greater than Low_Window.  In this document we set Low_Window to 38 MSS-sized segments, corresponding to a packet drop rate of 10^-3 for TCP.

To specify the upper end of the HighSpeed response function, we specify the packet drop rate needed in the HighSpeed response function to achieve an average congestion window of 83000 segments.  This is roughly the window needed to sustain 10 Gbps throughput, for a TCP connection with the default packet size and round-trip time used earlier in this document.  For High_Window set to 83000, we specify High_P of 10^-7; that is, with HighSpeed TCP a packet drop rate of 10^-7 allows the HighSpeed TCP connection to achieve an average congestion window of 83000 segments.  We believe that this loss rate sets an achievable target for high-speed environments, while still allowing acceptable fairness for the HighSpeed response function when competing with Standard TCP in environments with packet drop rates of 10^-4 or 10^-5.
For simplicity, for the HighSpeed response function we maintain the property that the response function gives a straight line on a log-log scale (as does the response function for Standard TCP, for low to moderate congestion).  This results in the following response function, for values of the average congestion window W greater than Low_Window:

   W = (p/Low_P)^S Low_Window,

for Low_P the packet drop rate corresponding to Low_Window, and for S the following constant [FRS02]:

   S = (log High_Window - log Low_Window)/(log High_P - log Low_P).

(In this paper, "log x" refers to the log base 10.)  For example, for Low_Window set to 38, we have Low_P of 10^-3 (for compatibility with Standard TCP).  Thus, for High_Window set to 83000 and High_P set to 10^-7, we get the following response function:

   W = 0.12/p^0.835.                                          (1)

This HighSpeed response function is illustrated in Table 3 below.  For HighSpeed TCP, the number of round-trip times between losses, 1/(pW), equals 12.7 W^0.2, for W > 38 segments.

Packet Drop Rate P   Congestion Window W   RTTs Between Losses
------------------   -------------------   -------------------
     10^-2                      12                     8
     10^-3                      38                    25
     10^-4                     263                    38
     10^-5                    1795                    57
     10^-6                   12279                    83
     10^-7                   83981                   123
     10^-8                  574356                   180
     10^-9                 3928088                   264
     10^-10               26864653                   388

Table 3: TCP Response Function for HighSpeed TCP.  The average congestion window W in MSS-sized segments is given as a function of the packet drop rate P.

We believe that the problem of backward compatibility with Standard TCP requires a response function that is quite close to that of Standard TCP for loss rates of 10^-1, 10^-2, or 10^-3.  We believe, however, that such stringent TCP-compatibility is not required for smaller loss rates, and that an appropriate response function is one that gives a plausible packet drop rate for a connection throughput of 10 Gbps.
This also gives a slowly increasing number of round-trip times between loss events as a function of a decreasing packet drop rate.

Another way to look at the HighSpeed response function is to consider that HighSpeed TCP is roughly emulating the congestion control response of N parallel TCP connections, where N is initially one, and where N increases as a function of the HighSpeed TCP's congestion window.  Thus for the HighSpeed response function in Equation (1) above, the response function can be viewed as equivalent to that of N(W) parallel TCP connections, where N(W) varies as a function of the congestion window W.  Recall that for a single standard TCP connection, the average congestion window equals 1.2/sqrt(p).  For N parallel TCP connections, the aggregate congestion window for the N connections equals N*1.2/sqrt(p).  From the HighSpeed response function in Equation (1) and the relationship above, we can derive the following:

   N(W) = 0.23*W^(0.4)

for N(W) the number of parallel TCP connections emulated by the HighSpeed TCP response function, and for N(W) >= 1.  This is shown in Table 4 below.

Congestion Window W   Number N(W) of Parallel TCPs
-------------------   ----------------------------
          1                       1
         10                       1
        100                       1.4
      1,000                       3.6
     10,000                       9.2
    100,000                      23.0

Table 4: Number N(W) of parallel TCP connections roughly emulated by the HighSpeed TCP response function.

We do not in this document attempt to seriously evaluate the HighSpeed response function for congestion windows greater than 100,000 packets.  We believe that we will learn more about the requirements for sustaining the throughput of best-effort connections in that range as we gain more experience with HighSpeed TCP with congestion windows of thousands and tens of thousands of packets.
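As a non-normative illustration, the emulation above can be computed directly by inverting Equation (1) to obtain p(w) = 0.078/w^1.2 (the inversion used in Section 7) and applying the N-parallel-connections relationship; a minimal Python sketch (values differ slightly from Table 4 because of the rounded constants):

```python
from math import sqrt

def n_emulated(W):
    """Number of parallel Standard TCP connections roughly emulated by a
    HighSpeed TCP connection with average congestion window W, from
    p(W) = 0.078/W^1.2 and the aggregate window N*1.2/sqrt(p)."""
    p = 0.078 / W**1.2           # drop rate at which HighSpeed TCP holds window W
    return max(1.0, W * sqrt(p) / 1.2)   # approximately 0.23 * W^0.4 for large W

for W in (1, 10, 100, 1000, 10000, 100000):
    print(f"W = {W:>7}: N(W) = {n_emulated(W):.1f}")
```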
There also might be limitations to the per-connection throughput that can be realistically achieved for best-effort traffic, in terms of congestion windows of hundreds of thousands of packets or more, in the absence of additional support or feedback from the routers along the path.

6. Fairness Implications of the HighSpeed Response Function.

The Standard and HighSpeed response functions can be used directly to infer the relative fairness between flows using the two response functions.  For example, given a packet drop rate P, assume that Standard TCP has an average congestion window of W_Standard, and HighSpeed TCP has a higher average congestion window of W_HighSpeed.  In this case, a single HighSpeed TCP connection is receiving W_HighSpeed/W_Standard times the throughput of a single Standard TCP connection competing in the same environment.

This relative fairness is illustrated below in Table 5, for the parameters used for the HighSpeed response function in the section above.  The second column gives the relative fairness, for the steady-state packet drop rate specified in the first column.  To help calibrate, the third column gives the aggregate average congestion window for the two TCP connections, and the fourth column gives the bandwidth that would be needed by the two connections to achieve that aggregate window and packet drop rate, given 100 ms round-trip times and 1500-byte packets.

Packet Drop Rate P   Fairness   Aggregate Window   Bandwidth
------------------   --------   ----------------   ---------
     10^-2              1.0                 24      2.8 Mbps
     10^-3              1.0                 76      9.1 Mbps
     10^-4              2.2                383     45.9 Mbps
     10^-5              4.7               2174    260.8 Mbps
     10^-6             10.2              13479      1.6 Gbps
     10^-7             22.1              87776     10.5 Gbps

Table 5: Relative Fairness between the HighSpeed and Standard Response Functions.
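As a non-normative illustration, the columns of Table 5 follow directly from the two response functions; a minimal Python sketch, assuming the 100 ms round-trip time and 1500-byte packets used throughout:

```python
from math import sqrt

RTT = 0.1        # round-trip time, seconds
PACKET = 1500    # packet size, bytes

def fairness(p):
    """Relative fairness W_HighSpeed/W_Standard at packet drop rate p,
    plus the aggregate window of the two competing flows and the
    bandwidth (bps) needed to sustain that aggregate window."""
    w_std = 1.2 / sqrt(p)            # Standard TCP response function
    w_hs = 0.12 / p**0.835           # HighSpeed response function, Equation (1)
    w_hs = max(w_hs, w_std)          # identical at and below Low_Window
    aggregate = w_std + w_hs
    bandwidth = aggregate * PACKET * 8 / RTT
    return w_hs / w_std, aggregate, bandwidth

for exp in range(2, 8):
    f, agg, bw = fairness(10 ** -exp)
    print(f"p = 10^-{exp}: fairness = {f:5.1f}, aggregate window = {agg:7.0f}, "
          f"bandwidth = {bw/1e6:8.1f} Mbps")
```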
Thus, for packet drop rates of 10^-4, a flow with the HighSpeed response function can expect to receive 2.2 times the throughput of a flow using the Standard response function, given the same round-trip times and packet sizes.  With packet drop rates of 10^-6 (or 10^-7), the unfairness is more severe, and we have entered the regime where a Standard TCP connection requires at most one congestion event every 800 (or 2530) round-trip times in order to make use of the available bandwidth.  Our judgement would be that there are not a lot of TCP connections effectively operating in this regime today, with congestion windows of thousands of packets, and that therefore the benefits of the HighSpeed response function would outweigh the unfairness that would be experienced by Standard TCP in this regime.  However, one purpose of this document is to solicit feedback on this issue.  The parameter Low_Window determines directly the point of divergence between the Standard and HighSpeed response functions.

The third column of Table 5, the Aggregate Window, gives the aggregate congestion window of the two competing TCP connections, with HighSpeed and Standard TCP, given the packet drop rate specified in the first column.  From Table 5, a HighSpeed TCP connection would receive ten times the bandwidth of a Standard TCP connection in an environment with a packet drop rate of 10^-6.  This would occur when the two flows sharing a single pipe achieved an aggregate window of 13479 packets.  Given a round-trip time of 100 ms and a packet size of 1500 bytes, this would occur with an available bandwidth for the two competing flows of 1.6 Gbps.

Next we consider the time that it takes a standard or HighSpeed TCP flow to converge to fairness against a pre-existing HighSpeed TCP flow.
The worst case for convergence to fairness occurs when a new flow is starting up, competing against a high-bandwidth existing flow, and the new flow suffers a packet drop and exits slow-start while its window is still small.  In the worst case, consider that the new flow has entered the congestion avoidance phase while its window is only one packet.  A standard TCP flow in congestion avoidance increases its window by at most one packet per round-trip time, and after N round-trip times has only achieved a window of N packets (when starting with a window of 1 in the first round-trip time).  In contrast, a HighSpeed TCP flow increases much faster than a standard TCP flow while in the congestion avoidance phase, and we can expect its convergence to fairness to be much better.  This is shown in Table 6 below.  The script used to generate this table is given in Appendix C.

 RTT   HS_Window   Standard_TCP_Window
----   ---------   -------------------
 100         131                   100
 200         475                   200
 300        1131                   300
 400        2160                   400
 500        3601                   500
 600        5477                   600
 700        7799                   700
 800       10567                   800
 900       13774                   900
1000       17409                  1000
1100       21455                  1100
1200       25893                  1200
1300       30701                  1300
1400       35856                  1400
1500       41336                  1500
1600       47115                  1600
1700       53170                  1700
1800       59477                  1800
1900       66013                  1900
2000       72754                  2000

Table 6: For a HighSpeed and a Standard TCP connection, the congestion window during the congestion avoidance phase (starting with a congestion window of 1 packet during RTT 1).

The classic paper on relative fairness is from Chiu and Jain [CJ89].  This paper shows that AIMD (Additive Increase Multiplicative Decrease) converges to fairness in an environment with synchronized congestion events.  From [CJ89], it is easy to see that MIMD and AIAD do not converge to fairness in this environment.  However, the results of [CJ89] do not apply to an asynchronous environment such as that of the current Internet, where the frequency of congestion feedback can be different for different flows.  For example, it has been shown that MIMD converges to fair states in a model with proportional instead of synchronous feedback in terms of packet drops [GV02].  Thus, we are not concerned about abandoning a strict model of AIMD for HighSpeed TCP.

7. Translating the HighSpeed Response Function into Congestion Control Parameters.

For equation-based congestion control such as TFRC, the HighSpeed response function above could be used directly by the TFRC congestion control mechanism.  However, for TCP the HighSpeed response function has to be translated into additive increase and multiplicative decrease parameters.  The HighSpeed response function cannot be achieved by TCP with an additive increase of one segment per round-trip time and a multiplicative decrease of halving the current congestion window; HighSpeed TCP will have to modify either the increase or the decrease parameter, or both.  We have concluded that HighSpeed TCP is most likely to achieve an acceptable compromise between moderate increases and timely decreases by modifying both the increase and the decrease parameter.

That is, for HighSpeed TCP let the congestion window increase by a(w) segments per round-trip time in the absence of congestion, and let the congestion window decrease to w(1-b(w)) segments in response to a round-trip time with one or more loss events.  Thus, in response to a single acknowledgement HighSpeed TCP increases its congestion window in segments as follows:

   w <- w + a(w)/w.

In response to a congestion event, HighSpeed TCP decreases as follows:

   w <- (1-b(w))w.

For Standard TCP, a(w) = 1 and b(w) = 1/2, regardless of the value of w.
HighSpeed TCP uses the same values of a(w) and b(w) for w <= 630 Low_Window. This section specifies a(w) and b(w) for HighSpeed TCP 631 for larger values of w. 633 For w = High_Window, we have specified a loss rate of High_P. From 634 [FRS02], or from elementary calculations, this requires the 635 following relationship between a(w) and b(w) for w = High_Window: 637 a(w) = High_Window^2 * High_P * 2 * b(w)/(2-b(w)). (2) 639 We use the parameter High_Decrease to specify the decrease parameter 640 b(w) for w = High_Window, and use Equation (2) to derive the 641 increase parameter a(w) for w = High_Window. Along with High_P = 642 10^-7 and High_Window = 83000, for example, we specify High_Decrease 643 = 0.1, specifying that b(83000) = 0.1, giving a decrease of 10% 644 after a congestion event. Equation (2) then gives a(83000) = 72, 645 for an increase of 72 segments, or just under 0.1%, within a round- 646 trip time, for w = 83000. 648 This moderate decrease strikes us as acceptable, particularly when 649 coupled with the role of TCP's ACK-clocking in limiting the sending 650 rate in response to more severe congestion [BBFS01]. A more severe 651 decrease would require a more aggressive increase in the congestion 652 window for a round-trip time without congestion. In particular, a 653 decrease factor High_Decrease of 0.5, as in Standard TCP, would 654 require an increase of 459 segments per round-trip time when w = 655 83000. 657 Given decrease parameters of b(w) = 1/2 for w = Low_Window, and b(w) 658 = High_Decrease for w = High_Window, we are left to specify the 659 value of b(w) for other values of w > Low_Window. From [FRS02], we 660 let b(w) vary linearly as the log of w, as follows: 662 b(w) = (High_Decrease - 0.5) (log(w)-log(W)) / (log(W_1)-log(W)) + 663 0.5, 665 for W = Low_Window and W_1 = High_Window.
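The arithmetic of Equation (2) with the example parameters can be checked directly (Python used here only as a calculator; a sketch, not part of the specification):

```python
high_window = 83000
high_p = 1e-7

def increase_from_eq2(b):
    # Equation (2): a(w) = High_Window^2 * High_P * 2 * b(w) / (2 - b(w))
    return high_window ** 2 * high_p * 2.0 * b / (2.0 - b)

a_mild = increase_from_eq2(0.1)    # High_Decrease = 0.1, as proposed
a_harsh = increase_from_eq2(0.5)   # a Standard-TCP-style halving
print(int(a_mild))     # 72 segments per RTT, just under 0.1% of w
print(int(a_harsh))    # 459 segments per RTT, as noted in the text
```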
The increase parameter 666 a(w) can then be computed as follows: 668 a(w) = w^2 * p(w) * 2 * b(w)/(2-b(w)), 670 for p(w) the packet drop rate for congestion window w. From 671 inverting Equation (1), we get p(w) as follows: 673 p(w) = 0.078/w^1.2. 675 We assume that experimental implementations of HighSpeed TCP for 676 further investigation will use a pre-computed look-up table for 677 finding a(w) and b(w). For example, the implementation from Tom 678 Dunigan adjusts the a(w) and b(w) parameters every 0.1 seconds. In 679 the appendix we give such a table for our default values of 680 Low_Window = 38, High_Window = 83,000, High_P = 10^-7, and 681 High_Decrease = 0.1. These are also the default values in the NS 682 simulator; example simulations in NS can be run with the command 683 "./test-all-tcpHighspeed" in the directory tcl/test. 685 8. An Alternate, Linear Response Function. 687 In this section we explore an alternate, linear response function 688 for HighSpeed TCP that has been proposed by a number of other 689 people, in particular by Glenn Vinnicombe and Tom Kelly. Similarly, 690 it has been suggested by others that a less "ad-hoc" guideline for a 691 response function for HighSpeed TCP would be to specify a constant 692 value for the number of round-trip times between congestion events. 694 Assume that we keep the value of Low_Window as 38 MSS-sized 695 segments, indicating when the HighSpeed response function diverges 696 from the current TCP response function, but that we modify the 697 High_Window and High_P parameters that specify the upper range of 698 the HighSpeed response function. In particular, consider the 699 response function given by High_Window = 380,000 and High_P = 10^-7, 700 with Low_Window = 38 and Low_P = 10^-3 as before. 702 Using the equations in Section 5, this would give the following 703 Linear response function, for w > Low_Window: 705 W = 0.038/p. 707 This Linear HighSpeed response function is illustrated in Table 7 708 below.
For HighSpeed TCP, the number of round-trip times between 709 losses, 1/(pW), equals 1/0.038, or approximately 26, for W > 38 710 segments. 712 Packet Drop Rate P Congestion Window W RTTs Between Losses 713 ------------------ ------------------- ------------------- 714 10^-2 12 8 715 10^-3 38 26 716 10^-4 380 26 717 10^-5 3800 26 718 10^-6 38000 26 719 10^-7 380000 26 720 10^-8 3800000 26 721 10^-9 38000000 26 722 10^-10 380000000 26 724 Table 7: An Alternate, Linear TCP Response Function for HighSpeed 725 TCP. The average congestion window W in MSS-sized segments is given 726 as a function of the packet drop rate P. 728 Given a constant decrease b(w) of 1/2, this would give an increase 729 a(w) of w/Low_Window, or equivalently, a constant increase of 730 1/Low_Window packets per acknowledgement, for w > Low_Window. 731 Another possibility is Scalable TCP [K03], which uses a fixed 732 decrease b(w) of 1/8 and a fixed increase per acknowledgement of 733 0.01. This gives an increase a(w) per window of 0.005 w, for a TCP 734 with delayed acknowledgements, for pure MIMD. 736 The relative fairness between the alternate Linear response function 737 and the standard TCP response function is illustrated below in Table 738 8. 740 Packet Drop Rate P Fairness Aggregate Window Bandwidth 741 ------------------ -------- ---------------- --------- 742 10^-2 1.0 24 2.8 Mbps 743 10^-3 1.0 76 9.1 Mbps 744 10^-4 3.2 500 60.0 Mbps 745 10^-5 10.0 4179 501.4 Mbps 746 10^-6 31.6 39200 4.7 Gbps 747 10^-7 100.1 383795 46.0 Gbps 749 Table 8: Relative Fairness between the Linear HighSpeed and Standard 750 Response Functions. 752 One attraction of the linear response function is that it is scale- 753 invariant, with a fixed increase in the congestion window per 754 acknowledgement, and a fixed number of round-trip times between loss 755 events.
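The entries in Table 7 follow directly from the Linear response function (a sketch, with Python as a calculator):

```python
def linear_window(p):
    # The alternate Linear response function: W = 0.038/p.
    return 0.038 / p

def rtts_between_losses(p):
    # With one loss per 1/p packets and W packets sent per RTT, the
    # number of RTTs between losses is 1/(p*W) = 1/0.038, about 26,
    # independent of the window size.
    return 1.0 / (p * linear_window(p))

for p in (1e-4, 1e-7, 1e-10):
    print(round(linear_window(p)), int(rtts_between_losses(p)))
```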
My own assumption would be that having a fixed length for 756 the congestion epoch in round-trip times, regardless of the packet 757 drop rate, would be a poor fit for an imprecise and imperfect world 758 with routers with a range of queue management mechanisms, such as 759 the Drop-Tail queue management that is common today. For example, a 760 response function with a fixed length for the congestion epoch in 761 round-trip times might give less clearly-differentiated feedback in 762 an environment with steady-state background losses at fixed 763 intervals for all flows (as might occur with a wireless link with 764 occasional short error bursts, giving losses for all flows every N 765 seconds regardless of their sending rate). 767 While it is not a goal to have perfect fairness in an environment 768 with synchronized losses, it would be good to have moderately 769 acceptable performance in this regime. This goal might argue 770 against a response function with a constant number of round-trip 771 times between congestion events. However, this is a question that 772 could clearly use additional research and investigation. In 773 addition, flows with different round-trip times would have different 774 time durations for congestion epochs even in the model with a linear 775 response function. 777 The third column of Table 8, the Aggregate Window, gives the 778 aggregate congestion window of two competing TCP connections, one 779 with Linear HighSpeed TCP and one with Standard TCP, given the 780 packet drop rate specified in the first column. From Table 8, a 781 Linear HighSpeed TCP connection would receive ten times the 782 bandwidth of a Standard TCP in an environment with a packet drop 783 rate of 10^-5. This would occur when the two flows sharing a single 784 pipe achieved an aggregate window of 4179 packets.
Given a round- 785 trip time of 100 ms and a packet size of 1500 bytes, this would 786 occur with an available bandwidth for the two competing flows of 501 787 Mbps. Thus, because the Linear HighSpeed TCP is more aggressive 788 than the HighSpeed TCP proposed above, it also is less fair when 789 competing with Standard TCP in a high-bandwidth environment. 791 9. Tradeoffs for Choosing Congestion Control Parameters. 793 A range of metrics can be used for evaluating choices for congestion 794 control parameters for HighSpeed TCP. My assumption in this section 795 is that for a response function of the form w = c/p^d, for constant 796 c and exponent d, the only response functions that would be 797 considered are response functions with 1/2 <= d <= 1. The two ends 798 of this spectrum are represented by current TCP, with d = 1/2, and 799 by the linear response function described in Section 8 above, with d 800 = 1. HighSpeed TCP lies somewhere in the middle of the spectrum, 801 with d = 0.835. 803 Response functions with exponents less than 1/2 can be eliminated 804 from consideration because they would be even worse than standard 805 TCP in accommodating connections with high congestion windows. 807 9.1. The Number of Round-Trip Times between Loss Events. 809 Response functions with exponents greater than 1 can be eliminated 810 from consideration because for these response functions, the number 811 of round-trip times between loss events decreases as congestion 812 decreases. For a response function of w = c/p^d, with one loss 813 event or congestion event every 1/p packets, the number of round- 814 trip times between loss events is w^((1/d)-1)/c^(1/d). Thus, for 815 standard TCP the number of round-trip times between loss events is 816 linear in w.
In contrast, one attraction of the linear response 817 function, as described in Section 8 above, is that it is scale- 818 invariant, in terms of a fixed increase in the congestion window per 819 acknowledgement, and a fixed number of round-trip times between loss 820 events. 822 However, for a response function with d > 1, the number of round- 823 trip times between loss events would be proportional to w^((1/d)-1), 824 for a negative exponent ((1/d)-1), getting smaller as w increases. 825 This would seem undesirable. 827 9.2. The Number of Packet Drops per Loss Event, with Drop-Tail. 829 A TCP connection increases its sending rate by a(w) packets per 830 round-trip time, and in a Drop-Tail environment, this is likely to 831 result in a(w) dropped packets during a single loss event. One 832 attraction of standard TCP is that it has a fixed increase per 833 round-trip time of one packet, minimizing the number of packets that 834 would be dropped in a Drop-Tail environment. For an environment 835 with some form of Active Queue Management, and in particular for an 836 environment that uses ECN, the number of packets dropped in a single 837 congestion event would not be a problem. However, even in these 838 environments, larger increases in the sending rate per round-trip 839 time result in larger stresses on the ability of the queues in the 840 router to absorb the fluctuations. 842 HighSpeed TCP strikes a middle ground between the metrics of a 843 moderate number of round-trip times between loss events, and a 844 moderate increase in the sending rate per round-trip time. As shown 845 in Appendix B, for a congestion window of 83,000 packets, HighSpeed 846 TCP increases its sending rate by 70 packets per round-trip time, 847 resulting in at most 70 packet drops when the buffer overflows in a 848 Drop-Tail environment. This increased aggressiveness is the price 849 paid by HighSpeed TCP for its increased scalability.
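The increase of roughly 70 segments per round-trip time at w = 83,000 can be recomputed from the formulas of Section 7 (a sketch with the draft's default parameters; an actual implementation would use the pre-computed look-up table in the appendix, whose stepped entries can differ slightly from this continuous computation):

```python
import math

LOW_WINDOW, HIGH_WINDOW, HIGH_DECREASE = 38, 83000, 0.1

def decrease(w):
    # b(w) varies linearly in log(w), from 0.5 at Low_Window down to
    # High_Decrease at High_Window.
    if w <= LOW_WINDOW:
        return 0.5
    frac = (math.log(w) - math.log(LOW_WINDOW)) / (
        math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return (HIGH_DECREASE - 0.5) * frac + 0.5

def increase(w):
    # a(w) = w^2 * p(w) * 2 * b(w)/(2 - b(w)), with p(w) = 0.078/w^1.2.
    if w <= LOW_WINDOW:
        return 1.0
    p = 0.078 / w ** 1.2
    b = decrease(w)
    return w * w * p * 2.0 * b / (2.0 - b)

print(round(decrease(83000), 2))   # 0.1, i.e. a 10% decrease
print(int(increase(83000)))        # about 70 segments per RTT
```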
A large number 850 of packets dropped per congestion event could result in synchronized 851 drops from multiple flows, with a possible loss of throughput as a 852 result. 854 Scalable TCP has an increase a(w) of 0.005 w packets per round-trip 855 time. For a congestion window of 83,000 packets, this gives an 856 increase of 415 packets per round-trip time, resulting in roughly 857 415 packet drops per congestion event in a Drop-Tail environment. 859 Thus, HighSpeed TCP and its variants place increased demands on 860 queue management in routers, relative to Standard TCP. (This is 861 rather similar to the increased demands on queue management that 862 would result from using N parallel TCP connections instead of a 863 single Standard TCP connection.) 865 10. Related Issues 867 10.1. Slow-Start. 869 A companion internet-draft on "Limited Slow-Start for TCP with 870 Large Congestion Windows" [F02b] proposes a modification to TCP's 871 slow-start procedure that can significantly improve the performance 872 of TCP connections slow-starting up to large congestion windows. 873 For TCP connections that are able to use congestion windows of 874 thousands (or tens of thousands) of MSS-sized segments (for MSS the 875 sender's MAXIMUM SEGMENT SIZE), the current slow-start procedure can 876 result in increasing the congestion window by thousands of segments 877 in a single round-trip time. Such an increase can easily result in 878 thousands of packets being dropped in one round-trip time. This is 879 often counter-productive for the TCP flow itself, and is also hard 880 on the rest of the traffic sharing the congested link. 882 [F02b] proposes Limited Slow-Start, limiting the number of segments 883 by which the congestion window is increased for one window of data 884 during slow-start, in order to improve performance for TCP 885 connections with large congestion windows.
We have separated out 886 Limited Slow-Start into a separate draft because it can be used with 887 either Standard or HighSpeed TCP. 889 Limited Slow-Start is illustrated in the NS simulator, for snapshots 890 after May 1, 2002, in the tests "./test-all-tcpHighspeed tcp1A" and 891 "./test-all-tcpHighspeed tcpHighspeed1" in the subdirectory 892 "tcl/lib". 894 In order for best-effort flows to safely start up faster than slow- 895 start, e.g., in future high-bandwidth networks, we believe that it 896 would be necessary for the flow to have explicit feedback from the 897 routers along the path. There are a number of proposals for this, 898 ranging from a minimal proposal for an IP option that allows TCP SYN 899 packets to collect information from routers along the path about the 900 allowed initial sending rate [J02], to proposals with more power 901 that require more fine-tuned and continuous feedback from routers. 902 These proposals are all somewhat longer-term proposals than the 903 HighSpeed TCP proposal in this document, requiring longer lead times 904 and more coordination for deployment, and will be discussed in later 905 documents. 907 10.2. Limiting burstiness on short time scales. 909 Because the congestion window achieved by a HighSpeed TCP connection 910 could be quite large, there is a possibility for the sender to send 911 a large burst of packets in response to a single acknowledgement. 912 This could happen, for example, when there is congestion or 913 reordering on the reverse path, and the sender receives an 914 acknowledgement acknowledging hundreds or thousands of new packets. 915 Such a burst would also result if the application was idle for a 916 short period of time less than a round-trip time, and then suddenly 917 had lots of data available to send. In this case, it would be 918 useful for the HighSpeed TCP connection to have some method for 919 limiting bursts.
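A maxburst-style limit of the kind discussed here can be sketched as follows (a hypothetical illustration only: this document does not specify a burst-limiting mechanism, and the MAXBURST constant is an assumed parameter, not part of this proposal):

```python
MAXBURST = 4  # assumed cap on packets released per ACK (illustrative)

def packets_to_send(cwnd, outstanding):
    # A single ACK may open the window by hundreds of segments at once
    # (e.g., after reverse-path loss or reordering); release at most
    # MAXBURST of them and let subsequent ACKs clock out the rest.
    allowed = max(0, int(cwnd) - outstanding)
    return min(allowed, MAXBURST)

# An ACK suddenly acknowledges 1000 segments: the window has room for
# 1000 new packets, but only MAXBURST go out immediately.
print(packets_to_send(cwnd=5000, outstanding=4000))   # 4, not 1000
```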
921 We do not in this document specify TCP mechanisms for reducing the 922 short-term burstiness. One possible mechanism is to use some form 923 of rate-based pacing, and another possibility is to use maxburst, 924 which limits the number of packets that are sent in response to a 925 single acknowledgement. We would caution, however, against a 926 permanent reduction in the congestion window as a mechanism for 927 limiting short-term bursts. Such a mechanism has been deployed in 928 some TCP stacks, and our view would be that using permanent 929 reductions of the congestion window to reduce transient bursts would 930 be a bad idea [Fl03]. 932 10.3. Other limitations on window size. 934 The TCP header uses a 16-bit field to report the receive window size 935 to the sender. Unmodified, this allows a window size of at most 936 2**16 = 65K bytes. With window scaling, the maximum window size is 937 2**30 = 1073M bytes [RFC 1323]. Given 1500-byte packets, this 938 allows a window of up to 715,000 packets. 940 10.4. Implementation issues. 942 One implementation issue that has been raised with HighSpeed TCP is 943 that with congestion windows of 4MB or more, the handling of 944 successive SACK packets after a packet is dropped becomes very time- 945 consuming at the TCP sender [S03]. Tom Kelly's Scalable TCP 946 includes a "SACK Fast Path" patch that addresses this problem. 948 The issues addressed in the Web100 project, the Net100 project, and 949 related projects about the tuning necessary to achieve high 950 bandwidth data rates with TCP apply to HighSpeed TCP as well 951 [Net100, Web100]. 953 11. Deployment issues. 955 11.1. Deployment issues of HighSpeed TCP 957 We do not claim that the HighSpeed TCP modification to TCP described 958 in this paper is an optimal transport protocol for high-bandwidth 959 environments. 
Based on our experiences with HighSpeed TCP in the NS 960 simulator [NS], on simulation studies [SA03], and on experimental 961 reports [ABLLS03,D02,CC03,F03], we believe that HighSpeed TCP 962 improves the performance of TCP in high-bandwidth environments, and 963 we are documenting it for the benefit of the IETF community. We 964 encourage the use of HighSpeed TCP, and of its underlying response 965 function, and we further encourage feedback about operational 966 experiences with this or related modifications. 968 We note that in environments typical of much of the current 969 Internet, HighSpeed TCP behaves exactly as does Standard TCP today. 970 This is the case any time the congestion window is less than 38 971 segments. 973 Bandwidth Avg Cwnd w (pkts) Increase a(w) Decrease b(w) 974 --------- ----------------- ------------- ------------- 975 1.5 Mbps 12.5 1 0.50 976 10 Mbps 83 1 0.50 977 100 Mbps 833 6 0.35 978 1 Gbps 8333 26 0.22 979 10 Gbps 83333 70 0.10 981 Table 9: Performance of a HighSpeed TCP connection. 983 To help calibrate, Table 9 considers a TCP connection with 1500-byte 984 packets, an RTT of 100 ms (including average queueing delay), and no 985 competing traffic, and shows the average congestion window if that 986 TCP connection had a pipe all to itself and fully used the link 987 bandwidth, for a range of bandwidths for the pipe. This assumes 988 that the TCP connection would use Table 12 in determining its 989 increase and decrease parameters. The first column of Table 9 gives 990 the bandwidth, and the second column gives the average congestion 991 window w needed to utilize that bandwidth. The third column shows 992 the increase a(w) in segments per RTT for window w. The fourth 993 column shows the decrease b(w) for that window w (where the TCP 994 sender decreases the congestion window from w to w(1-b(w)) segments 995 after a loss event).
We note that the actual congestion window when 996 a loss occurs is likely to be greater than the average congestion 997 window w in column 2, so the decrease parameter used could be 998 slightly smaller than the one given in column 4 of Table 9. 1000 Table 9 shows that a HighSpeed TCP over a 10 Mbps link behaves 1001 exactly the same as a Standard TCP connection, even in the absence 1002 of competing traffic. One can think of the congestion window 1003 staying generally in the range of 55 to 110 segments, with the 1004 HighSpeed TCP behavior being exactly the same as the behavior of 1005 Standard TCP. (If the congestion window is ever 128 segments or 1006 more, then the HighSpeed TCP increases by two segments per RTT 1007 instead of by one, and uses a decrease parameter of 0.44 instead of 1008 0.50.) 1010 Table 9 shows that for a HighSpeed TCP connection over a 100 Mbps 1011 link, with no competing traffic, HighSpeed TCP behaves roughly as 1012 aggressively as six parallel TCP connections, increasing its 1013 congestion window by roughly six segments per round-trip time, and 1014 with a decrease parameter of roughly 1/3 (corresponding to 1015 decreasing down to 2/3-rds of its old congestion window, rather than 1016 to half, in response to a loss event). 1018 For a Standard TCP connection in this environment, the congestion 1019 window could be thought of as varying generally in the range of 550 1020 to 1100 segments, with an average packet drop rate of 2.2 * 10^-6 1021 (corresponding to a bit error rate of 1.8 * 10^-10), or 1022 equivalently, roughly 55 seconds between congestion events. 
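The drop-rate arithmetic in the preceding sentence can be checked with a short sketch (assuming the idealized Standard TCP sawtooth, 1500-byte packets, and a 100 ms RTT, as in the text):

```python
w_max = 1100            # window oscillates between 550 and 1100 segments
rtt = 0.1               # seconds
packet_bits = 1500 * 8

# The sawtooth climbs from w_max/2 back to w_max over w_max/2 RTTs,
# delivering on average 3/4 * w_max segments per RTT.
rtts_per_cycle = w_max // 2
packets_per_cycle = 0.75 * w_max * rtts_per_cycle
p = 1.0 / packets_per_cycle            # one loss per sawtooth cycle
ber = p / packet_bits                  # corresponding bit error rate
seconds_between_losses = rtts_per_cycle * rtt

print(p)                        # about 2.2 * 10^-6
print(ber)                      # about 1.8 * 10^-10
print(seconds_between_losses)   # 55 seconds between congestion events
```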
While a 1023 Standard TCP connection could sustain such a low packet drop rate in 1024 a carefully controlled environment with minimal competing traffic, 1025 we would contend that in an uncontrolled best-effort environment 1026 with even a small amount of competing traffic, the occasional 1027 congestion events from smaller competing flows could easily be 1028 sufficient to prevent a Standard TCP flow with no lower-speed 1029 bottlenecks from fully utilizing the available bandwidth of the 1030 underutilized 100 Mbps link. 1032 That is, we would contend that in the environment of 100 Mbps links 1033 with a significant amount of available bandwidth, Standard TCP would 1034 sometimes be unable to fully utilize the link bandwidth, and that 1035 HighSpeed TCP would be an improvement in this regard. We would 1036 further contend that in this environment, the behavior of HighSpeed 1037 TCP is sufficiently close to that of Standard TCP that HighSpeed TCP 1038 would be safe to deploy in the current Internet. 1040 We do not believe that the deployment of HighSpeed TCP would serve 1041 as a block to the possible deployment of alternate experimental 1042 protocols for high-speed congestion control, such as Scalable TCP, 1043 XCP [KHR02], or FAST TCP [JWL03]. In particular, we don't expect 1044 HighSpeed TCP to interact any more poorly with alternative 1045 experimental proposals than would the N parallel TCP connections 1046 commonly used today in the absence of HighSpeed TCP. 1048 11.2. Deployment issues of Scalable TCP 1050 We believe that Scalable TCP and HighSpeed TCP have sufficiently 1051 similar response functions that they could easily coexist in the 1052 Internet. However, we have not investigated Scalable TCP 1053 sufficiently to be able to claim, in this document, that Scalable 1054 TCP is safe for a widespread deployment in the current Internet.
1056 Bandwidth Avg Cwnd w (pkts) Increase a(w) Decrease b(w) 1057 --------- ----------------- ------------- ------------- 1058 1.5 Mbps 12.5 1 0.50 1059 10 Mbps 83 0.4 0.125 1060 100 Mbps 833 4.1 0.125 1061 1 Gbps 8333 41.6 0.125 1062 10 Gbps 83333 416.5 0.125 1064 Table 10: Performance of a Scalable TCP connection. 1066 Table 10 shows the performance of a Scalable TCP connection with 1067 1500-byte packets, an RTT of 100 ms (including average queueing 1068 delay), and no competing traffic. The TCP connection is assumed to 1069 use delayed acknowledgements. The first column of Table 10 gives 1070 the bandwidth, the second column gives the average congestion window 1071 needed to utilize that bandwidth, and the third and fourth columns 1072 give the increase and decrease parameters. 1074 Note that even in an environment with a 10 Mbps link, Scalable TCP's 1075 behavior is considerably different from that of Standard TCP. The 1076 increase parameter is smaller than that of Standard TCP, and the 1077 decrease is smaller also, 1/8-th instead of 1/2. That is, for 10 1078 Mbps links, Scalable TCP increases less aggressively than Standard 1079 TCP or HighSpeed TCP, but decreases less aggressively as well. 1081 In an environment with a 100 Mbps link, Scalable TCP has an increase 1082 parameter of roughly four segments per round-trip time, with the 1083 same decrease parameter of 1/8-th. A comparison of Tables 9 and 10 1084 shows that for this scenario of 100 Mbps links, HighSpeed TCP 1085 increases more aggressively than Scalable TCP. 1087 Next we consider the relative fairness between Standard TCP, 1088 HighSpeed TCP and Scalable TCP. The relative fairness between 1089 HighSpeed TCP and Standard TCP was shown in Table 5 earlier in this 1090 document, and the relative fairness between Scalable TCP and 1091 Standard TCP was shown in Table 8. 
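The increase column of Table 10 can be recomputed from Scalable TCP's fixed per-ACK increase (a sketch; the 0.005 factor assumes delayed acknowledgements, as in the text, and small rounding differences from the table are possible):

```python
def scalable_increase(w):
    # A fixed increase of 0.01 segments per ACK with delayed ACKs
    # (one ACK per two segments) gives a(w) = 0.005 * w per RTT.
    return 0.005 * w

for w in (83, 833, 8333, 83333):
    print(w, scalable_increase(w))
```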
Following the approach in 1092 Section 6, for a given packet drop rate p, for p < 10^-3, we can 1093 estimate the relative fairness between Scalable and HighSpeed TCP as 1094 W_Scalable/W_HighSpeed. This relative fairness is shown in Table 11 1095 below. The bandwidth in the last column of Table 11 is the 1096 aggregate bandwidth of the two competing flows given 100 ms round- 1097 trip times and 1500-byte packets. 1099 Packet Drop Rate P Fairness Aggregate Window Bandwidth 1100 ------------------ -------- ---------------- --------- 1101 10^-2 1.0 24 2.8 Mbps 1102 10^-3 1.0 76 9.1 Mbps 1103 10^-4 1.4 643 77.1 Mbps 1104 10^-5 2.1 5595 671.4 Mbps 1105 10^-6 3.1 50279 6.0 Gbps 1106 10^-7 4.5 463981 55.7 Gbps 1108 Table 11: Relative Fairness between the Scalable and HighSpeed 1109 Response Functions. 1111 The second row of Table 11 shows that for a Scalable TCP and a 1112 HighSpeed TCP flow competing in an environment with 100 ms RTTs and 1113 a 10 Mbps pipe, the two flows would receive essentially the same 1114 bandwidth. The next row shows that for a Scalable TCP and a 1115 HighSpeed TCP flow competing in an environment with 100 ms RTTs and 1116 a 100 Mbps pipe, the Scalable TCP flow would receive roughly 50% 1117 more bandwidth than would HighSpeed TCP. Table 11 shows the 1118 relative fairness in higher-bandwidth environments as well. This 1119 relative fairness seems sufficient that there should be no problems 1120 with Scalable TCP and HighSpeed TCP coexisting in the same 1121 environment as Experimental variants of TCP. 1123 We note that one question that requires more investigation with 1124 Scalable TCP is that of convergence to fairness in environments with 1125 Drop-Tail queue management. 1127 12. Related Work in HighSpeed TCP. 1129 HighSpeed TCP has been separately investigated in simulations by 1130 Sylvia Ratnasamy and by Evandro de Souza [SA03]. 
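The estimates in Table 11 can be reproduced from the two response functions (a sketch with Python as a calculator, assuming the HighSpeed response function W = 0.12/p^0.835 of Equation (1) in Section 5, and W = 0.038/p as an approximation for Scalable TCP):

```python
def w_highspeed(p):
    # HighSpeed response function, W = 0.12/p^0.835 (Equation (1)).
    return 0.12 / p ** 0.835

def w_scalable(p):
    # Scalable TCP's response function, approximately W = 0.038/p.
    return 0.038 / p

p = 1e-5
fairness = w_scalable(p) / w_highspeed(p)
aggregate = w_scalable(p) + w_highspeed(p)
# Aggregate bandwidth for a 100 ms RTT and 1500-byte packets.
bandwidth_mbps = aggregate * 1500 * 8 / 0.1 / 1e6

print(round(fairness, 1))        # 2.1, matching Table 11
print(int(aggregate))            # 5595 packets
print(round(bandwidth_mbps))     # about 671 Mbps
```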
The simulations in 1131 [SA03] verify the fairness properties of HighSpeed TCP when sharing 1132 a link with Standard TCP. 1134 These simulations explore the relative fairness of HighSpeed TCP 1135 flows when competing with Standard TCP. The simulation environment 1136 includes background forward and reverse-path TCP traffic limited by 1137 the TCP receive window, along with a small amount of forward and 1138 reverse-path traffic from the web traffic generator. Most of the 1139 simulations so far explore performance on a simple dumbbell topology 1140 with a 1 Gbps link with a propagation delay of 50 ms. Simulations 1141 have been run with Adaptive RED and with DropTail queue management. 1143 The simulations in [SA03] explore performance with a varying number 1144 of competing flows, with the competing traffic being all standard 1145 TCP; all HighSpeed TCP; or a mix of standard and HighSpeed TCP. For 1146 the simulations in [SA03] with RED queue management, the relative 1147 fairness between standard and HighSpeed TCP is consistent with the 1148 relative fairness predicted in Table 5. For the simulations with 1149 Drop Tail queues, the relative fairness is more skewed, with the 1150 HighSpeed TCP flows receiving an even larger share of the link 1151 bandwidth. This is not surprising; with Active Queue Management at 1152 the congested link, the fraction of packet drops received by each 1153 flow should be roughly proportional to that flow's share of the link 1154 bandwidth, while this property no longer holds with Drop Tail queue 1155 management. We also note that relative fairness in simulations with 1156 Drop Tail queue management can sometimes depend on small details of 1157 the simulation scenario, and that Drop Tail simulations need special 1158 care to avoid phase effects [F92]. 
1160 [SA03] explores the bandwidth `stolen' by HighSpeed TCP from 1161 standard TCP by exploring the fraction of the link bandwidth N 1162 standard TCP flows receive when competing against N other standard 1163 TCP flows, and comparing this to the fraction of the link bandwidth 1164 the N standard TCP flows receive when competing against N HighSpeed 1165 TCP flows. For the 1 Gbps simulation scenarios dominated by long- 1166 lived traffic, a small number of standard TCP flows are able to 1167 achieve high link utilization, and the HighSpeed TCP flows can be 1168 viewed as stealing bandwidth from the competing standard TCP flows, 1169 as predicted in Section 6 on the Fairness Implications of the 1170 HighSpeed Response Function. However, [SA03] shows that when even a 1171 small fraction of the link bandwidth is used by more bursty, short 1172 TCP connections, the standard TCP flows are unable to achieve high 1173 link utilization, and the HighSpeed TCP flows in this case are not 1174 `stealing' bandwidth from the standard TCP flows, but instead are 1175 using bandwidth that otherwise would not be utilized. 1177 The conclusions of [SA03] are that "HighSpeed TCP behaved as 1178 foreseen by its response function, and appears to be a real and 1179 viable option for use on high-speed wide area TCP connections." 1181 Future work that could be explored in more detail includes 1182 convergence times after new flows start up; recovery time after a 1183 transient outage; the response to sudden severe congestion, and 1184 investigations of the potential for oscillations. We invite 1185 contributions from others in this work. 1187 13. Relationship to other Work. 1189 Our assumption is that HighSpeed TCP will be used with the TCP SACK 1190 option, and also with the increased Initial Window of three or four 1191 segments, as allowed by [RFC3390].
For paths that have substantial 1192 reordering, TCP performance would be greatly improved by some of the 1193 mechanisms still in the research stages for robust performance in 1194 the presence of reordered packets. 1196 Our view is that HighSpeed TCP is largely orthogonal to proposals 1197 for higher PMTU (Path MTU) values [M02]. Unlike changes to the 1198 PMTU, HighSpeed TCP does not require any changes in the network or 1199 at the TCP receiver, and works well in the current Internet. Our 1200 assumption is that HighSpeed TCP would be useful even with larger 1201 values for the PMTU. Unlike the current congestion window, the PMTU 1202 gives no information about the bandwidth-delay product available to 1203 that particular flow. 1205 A related approach is that of a virtual MTU, where the actual MTU of 1206 the path might be limited [VMSS,S02]. The virtual MTU approach has 1207 not been fully investigated, and we do not explore the virtual MTU 1208 approach further in this document. 1210 14. Conclusions. 1212 This document has proposed HighSpeed TCP, a modification to TCP's 1213 congestion control mechanism for use with TCP connections with large 1214 congestion windows. We have explored this proposal in simulations, 1215 and others have explored HighSpeed TCP with experiments, and we 1216 believe HighSpeed TCP to be safe to deploy on the current Internet. 1217 We would welcome additional analysis, simulations, and particularly, 1218 experimentation. More information on simulations and experiments is 1219 available from the HighSpeed TCP Web Page [HSTCP]. There are 1220 several independent implementations of HighSpeed TCP [D02,F03] and 1221 of Scalable TCP [K03] for further investigation. 1223 We are bringing this proposal to the IETF to be considered as an 1224 Experimental RFC. 1226 15. Acknowledgements 1228 The HighSpeed TCP proposal is from joint work with Sylvia Ratnasamy 1229 and Scott Shenker (and was initiated by Scott Shenker).
Additional 1230 investigations of HighSpeed TCP were joint work with Evandro de 1231 Souza and Deb Agarwal. We thank Tom Dunigan for the implementation 1232 in the Linux 2.4.16 Web100 kernel, and for resulting experimentation 1233 with HighSpeed TCP. We are grateful to the End-to-End Research 1234 Group, the members of the Transport Area Working Group, and to 1235 members of the IPAM program in Large Scale Communication Networks 1236 for feedback. We thank Glenn Vinnicombe for framing the Linear 1237 response function in the parameters of HighSpeed TCP. We are also 1238 grateful for contributions and feedback from the following 1239 individuals: Les Cottrell, Mitchell Erblich, Jeffrey Hsu, Tom Kelly, 1240 Jitendra Padhye, Andrew Reiter, Stanislav Shalunov, Alex Solan, Paul 1241 Sutter, Brian Tierney, Joe Touch. 1243 16. Normative References 1245 [RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion 1246 Control", RFC 2581, April 1999. 1248 17. Informative References 1250 [ABLLS03] A. Antony, J. Blom, C. de Laat, J. Lee, and W. Sjouw, 1251 Macroscopic Examination of TCP Flows over Transatlantic Links, 1252 January 2003. URL 1253 "http://carol.wins.uva.nl/%7Edelaat/techrep-2003-2-tcp.pdf". 1255 [BBFS01] Deepak Bansal, Hari Balakrishnan, Sally Floyd, and Scott 1256 Shenker, "Dynamic Behavior of Slowly-Responsive Congestion Control 1257 Algorithms", SIGCOMM 2001, August 2001. 1259 [CC03] Fabrizio Coccetti and Les Cottrell, TCP Stack Measurements on 1260 Lightly Loaded Testbeds, 2003. URL "http://www- 1261 iepm.slac.stanford.edu/monitoring/bulk/fast/". 1263 [CJ89] D. Chiu and R. Jain, "Analysis of the Increase and Decrease 1264 Algorithms for Congestion Avoidance in Computer Networks", Computer 1265 Networks and ISDN Systems, Vol. 17, pp. 1-14, 1989. 1267 [CO98] J. Crowcroft and P. Oechslin, "Differentiated End-to-end 1268 Services using a Weighted Proportional Fair Share TCP", Computer 1269 Communication Review, 28(3):53--69, 1998. 
[D02] Tom Dunigan, "Floyd's TCP slow-start and AIMD mods", URL
"http://www.csm.ornl.gov/~dunigan/net100/floyd.html".

[F03] Gareth Fairey, "High-Speed TCP", 2003.  URL
"http://www.hep.man.ac.uk/u/garethf/hstcp/".

[F92] S. Floyd and V. Jacobson, "On Traffic Phase Effects in
Packet-Switched Gateways", Internetworking: Research and Experience,
V.3 N.3, September 1992, pp. 115-156.  URL
"http://www.icir.org/floyd/papers.html".

[Fl03] Sally Floyd, "Re: [Tsvwg] taking NewReno (RFC 2582) to
Proposed Standard", Email to the tsvwg mailing list, May 14, 2003,
URLs
"http://www1.ietf.org/mail-archive/working-groups/tsvwg/current/msg04086.html"
and
"http://www1.ietf.org/mail-archive/working-groups/tsvwg/current/msg04087.html".

[FF98] S. Floyd and K. Fall, "Promoting the Use of End-to-End
Congestion Control in the Internet", IEEE/ACM Transactions on
Networking, August 1999.

[FRS02] Sally Floyd, Sylvia Ratnasamy, and Scott Shenker, "Modifying
TCP's Congestion Control for High Speeds", May 2002.  URL
"http://www.icir.org/floyd/notes.html".

[GRK99] Panos Gevros, Fulvio Risso, and Peter Kirstein, "Analysis of
a Method for Differential TCP Service", in Proceedings of IEEE
GLOBECOM'99, Symposium on Global Internet, December 1999, Rio de
Janeiro, Brazil.

[GV02] S. Gorinsky and H. Vin, "Extended Analysis of Binary
Adjustment Algorithms", Technical Report TR2002-39, Department of
Computer Sciences, The University of Texas at Austin, August 2002.
URL "http://www.cs.utexas.edu/users/gorinsky/pubs.html".

[HSTCP] HighSpeed TCP Web Page, URL
"http://www.icir.org/floyd/hstcp.html".

[J02] Amit Jain and Sally Floyd, "Quick-Start for TCP and IP",
internet-draft draft-amit-quick-start-02.txt, work in progress,
2002.

[JWL03] Cheng Jin, David X. Wei, and Steven H.
Low, "FAST TCP for High-speed Long-distance Networks",
internet-draft draft-jwl-tcp-fast-01.txt, work in progress, June
2003.

[K03] Tom Kelly, "Scalable TCP: Improving Performance in HighSpeed
Wide Area Networks", February 2003.  URL
"http://www-lce.eng.cam.ac.uk/~ctk21/scalable/".

[KHR02] Dina Katabi, Mark Handley, and Charlie Rohrs, "Congestion
Control for High Bandwidth-Delay Product Networks", SIGCOMM 2002.

[M02] Matt Mathis, "Raising the Internet MTU", Web Page, URL
"http://www.psc.edu/~mathis/MTU/".

[Net100] The DOE/MICS Net100 project.  URL
"http://www.csm.ornl.gov/~dunigan/net100/".

[NS] The NS Simulator, URL "http://www.isi.edu/nsnam/ns/".

[RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for
High Performance", RFC 1323, May 1992.

[RFC3390] M. Allman, S. Floyd, and C. Partridge, "Increasing TCP's
Initial Window", RFC 3390, October 2002.

[RFC3448] Mark Handley, Jitendra Padhye, Sally Floyd, and Joerg
Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification",
RFC 3448, January 2003.

[SA03] E. Souza and D.A. Agarwal, "A HighSpeed TCP Study:
Characteristics and Deployment Issues", LBNL Technical Report
LBNL-53215.  URL "http://www.icir.org/floyd/hstcp.html".

[S02] Stanislav Shalunov, "TCP Armonk", draft, 2002.  URL
"http://www.internet2.edu/~shalunov/tcpar/".

[S03] Alex Solan, private communication, 2003.

[VMSS] "Web100 at ORNL", Web Page,
"http://www.csm.ornl.gov/~dunigan/netperf/web100.html".

[Web100] The Web100 project.  URL "http://www.web100.org/".

18.  Security Considerations

This proposal makes no changes to the underlying security of TCP.

19.  IANA Considerations

There are no IANA considerations regarding this document.

A.
TCP's Loss Event Rate in Steady-State

This section gives the number of round-trip times between congestion
events for a TCP flow with D-byte packets, for D=1500, as a function
of the connection's average throughput B in bps.  To achieve this
average throughput B, a TCP connection with round-trip time R in
seconds requires an average congestion window w of BR/(8D) segments.

In steady-state, TCP's average congestion window w is roughly
1.2/sqrt(p) segments, for a packet drop rate p.  This is equivalent
to a loss event at most once every 1/p packets, or at most once
every 1/(pw) = w/1.5 round-trip times.  Substituting for w, this is
a loss event at most every (BR)/(12D) round-trip times.

As an example, for R = 0.1 seconds and D = 1500 bytes, this gives
B/180000 round-trip times between loss events.

B.  A table for a(w) and b(w).

This section gives a table of the increase and decrease parameters
a(w) and b(w) for HighSpeed TCP, for the default values of
Low_Window = 38, High_Window = 83000, High_P = 10^-7, and
High_Decrease = 0.1.
       w  a(w)  b(w)
    ----  ----  ----
      38     1  0.50
     118     2  0.44
     221     3  0.41
     347     4  0.38
     495     5  0.37
     663     6  0.35
     851     7  0.34
    1058     8  0.33
    1284     9  0.32
    1529    10  0.31
    1793    11  0.30
    2076    12  0.29
    2378    13  0.28
    2699    14  0.28
    3039    15  0.27
    3399    16  0.27
    3778    17  0.26
    4177    18  0.26
    4596    19  0.25
    5036    20  0.25
    5497    21  0.24
    5979    22  0.24
    6483    23  0.23
    7009    24  0.23
    7558    25  0.22
    8130    26  0.22
    8726    27  0.22
    9346    28  0.21
    9991    29  0.21
   10661    30  0.21
   11358    31  0.20
   12082    32  0.20
   12834    33  0.20
   13614    34  0.19
   14424    35  0.19
   15265    36  0.19
   16137    37  0.19
   17042    38  0.18
   17981    39  0.18
   18955    40  0.18
   19965    41  0.17
   21013    42  0.17
   22101    43  0.17
   23230    44  0.17
   24402    45  0.16
   25618    46  0.16
   26881    47  0.16
   28193    48  0.16
   29557    49  0.15
   30975    50  0.15
   32450    51  0.15
   33986    52  0.15
   35586    53  0.14
   37253    54  0.14
   38992    55  0.14
   40808    56  0.14
   42707    57  0.13
   44694    58  0.13
   46776    59  0.13
   48961    60  0.13
   51258    61  0.13
   53677    62  0.12
   56230    63  0.12
   58932    64  0.12
   61799    65  0.12
   64851    66  0.11
   68113    67  0.11
   71617    68  0.11
   75401    69  0.10
   79517    70  0.10
   84035    71  0.10
   89053    72  0.10
   94717    73  0.09

Table 12: Parameters for HighSpeed TCP.
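The entries in Table 12 can be spot-checked from the same formulas
used to generate it: b(w) interpolates linearly in log(w) between 0.5
at Low_Window and High_Decrease at High_Window, and a(w) follows from
the response function p(w) = 0.078/w^1.2 (the 1/12.8 constant in the
Perl program).  A short sketch in Python (rather than the document's
Perl, for convenience; the constant names are ours):

```python
import math

# Default HighSpeed TCP parameters from this document.
LOW_WINDOW = 38
HIGH_WINDOW = 83000
HIGH_DECREASE = 0.1

def b(w):
    # Linear interpolation in log(w) between b(Low_Window) = 0.5
    # and b(High_Window) = High_Decrease.
    return ((HIGH_DECREASE - 0.5)
            * (math.log(w) - math.log(LOW_WINDOW))
            / (math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
            + 0.5)

def a(w):
    # a(w) = w^2 * p(w) * 2*b(w) / (2 - b(w)),
    # with p(w) = 0.078 / w^1.2 = 1 / (12.8 * w^1.2).
    return (w**2 * 2.0 * b(w)) / ((2.0 - b(w)) * w**1.2 * 12.8)

# Spot-check against Table 12: the row for w = 1058 lists a(w) = 8
# and b(w) = 0.33.
print(round(a(1058)), round(b(1058), 2))  # -> 8 0.33
```

At the endpoints, b(38) = 0.5 and b(83000) = 0.1 by construction,
matching the default parameters above.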
This table was computed with the following Perl program:

$top = 100000;
$num = 38;
if ($num == 38) {
    print "     w  a(w)  b(w)\n";
    print "  ----  ----  ----\n";
    print "    38     1  0.50\n";
    $oldb = 0.50;
    $olda = 1;
}
while ($num < $top) {
    $bw = (0.1 - 0.5)*(log($num)-log(38))/(log(83000)-log(38)) + 0.5;
    $aw = ($num**2 * 2.0 * $bw) / ((2.0 - $bw) * $num**1.2 * 12.8);
    if ($aw > $olda + 1) {
        printf "%6d %5d  %3.2f\n", $num, $aw, $bw;
        $olda = $aw;
    }
    $num++;
}

Table 13: Perl program for computing parameters for HighSpeed TCP.

C.  Exploring the time to converge to fairness.

This section gives the Perl program used to compute the congestion
window growth during congestion avoidance.

$top = 2001;
$hswin = 1;
$regwin = 1;
$rtt = 1;
$lastrtt = 0;
$rttstep = 100;
if ($hswin == 1) {
    print "  RTT  HS_Window  Standard_TCP_Window\n";
    print "  ---  ---------  -------------------\n";
}
while ($rtt < $top) {
    $bw = (0.1 - 0.5)*(log($hswin)-log(38))/(log(83000)-log(38)) + 0.5;
    $aw = ($hswin**2 * 2.0 * $bw) / ((2.0 - $bw) * $hswin**1.2 * 12.8);
    if ($aw < 1) {
        $aw = 1;
    }
    if ($rtt >= $lastrtt + $rttstep) {
        printf "%5d %9d %10d\n", $rtt, $hswin, $regwin;
        $lastrtt = $rtt;
    }
    $hswin += $aw;
    $regwin += 1;
    $rtt++;
}

Table 14: Perl program for computing the window in congestion
avoidance.

AUTHORS' ADDRESSES

Sally Floyd
ICIR (ICSI Center for Internet Research)
Phone: +1 (510) 666-2989
Email: floyd@acm.org
URL: http://www.icir.org/floyd/

This draft was created in August 2003.