Network Working Group                                       M. Sridharan
Internet Draft                                                 Microsoft
Intended status: Experimental                                     K. Tan
Expires: January 2008                                 Microsoft Research
                                                               D. Bansal
                                                               D. Thaler
                                                               Microsoft
                                                           July 18, 2007

   Compound TCP: A New TCP Congestion Control for High-Speed and Long
                           Distance Networks

                    draft-sridharan-tcpm-ctcp-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 18, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Abstract

This document proposes Compound TCP (CTCP), a modification to TCP's
congestion control mechanism for use with TCP connections with large
congestion windows. The key idea behind CTCP is to add a scalable
delay-based component to the standard TCP's loss-based congestion
control. The sending rate of CTCP is controlled by both loss and
delay components. The delay-based component has a scalable window
increasing rule that not only uses the link capacity efficiently
but also, on sensing queue build-up, gracefully reduces the sending
rate. We have implemented CTCP on Microsoft's Windows and we have done
extensive testing on production links and in Windows Beta
deployments. We also engaged with Stanford Linear Accelerator Center
to evaluate the properties of CTCP. The results so far are very
encouraging. This document describes the Compound TCP algorithm in
detail, and solicits experimentation and feedback from the wider
community. In this document, we collectively refer to any TCP
congestion control algorithm that employs a linear increase function
for congestion control, including TCP Reno and all its variants, as
Standard TCP.

Table of Contents

1. Introduction
2. Design Goals
3. Compound TCP Control Law
4. Compound TCP Response Function
5. Automatic Selection of Gamma
6. Implementation Issues
7. Deployment Issues
8. Security Considerations
9. IANA Considerations
10. Conclusions
11. Acknowledgments
12. References
   12.1. Normative References
   12.2. Informative References
Authors' Addresses
Intellectual Property Statement
Disclaimer of Validity

1. Introduction

This document proposes Compound TCP, a modification to TCP's congestion
control mechanism for fast, long-distance networks. The standard TCP
congestion avoidance algorithm employs an additive increase and
multiplicative decrease (AIMD) scheme, which uses a conservative
linear growth function for increasing the congestion window and a
multiplicative decrease function on encountering a loss. For a
high-speed and long delay network, it will take standard TCP an
unreasonably long time to recover the sending rate after a single loss
event [RFC2581, RFC3649]. Moreover, it is well known that in a
steady-state environment, with a packet loss rate of p, the current
standard TCP's average congestion window is inversely proportional to
the square root of the packet loss rate [RFC2581, PADHYE]. Therefore,
it requires an extremely small packet loss rate to sustain a large
window. As an example, Floyd et al.
[RFC3649] pointed out that under a 10Gbps link with 100ms delay, it
will take roughly one hour for a standard TCP flow to fully utilize the
link capacity, if no packet is lost or corrupted. This one hour of
error-free transmission requires a packet loss rate around 10^-11 with
1500-byte packets (one packet loss over 2,600,000,000 packet
transmissions!), which is not practical in today's networks.

There are several proposals to address this fundamental limitation of
TCP. One straightforward way to overcome this limitation is to modify
TCP's increase/decrease rule in its congestion avoidance stage. More
specifically, in the absence of packet loss, the sender increases the
congestion window more quickly, and it decreases the window more gently
upon a packet loss. In a mixed network environment, the aggressive
behavior of such approaches may severely degrade the performance of
regular TCP flows whenever the network path is already highly utilized.
When an aggressive high-speed variant flow traverses the bottleneck
link with other standard TCP flows, it may increase its own share of
bandwidth by reducing the throughput of other competing TCP flows. As a
result, the aggressive variants will cause many more self-induced
packet losses on bottleneck links, and push back the throughput of the
regular TCP flows.

Then there is the class of high-speed protocols that use variations in
RTT as a congestion indicator (e.g., [AFRICA, FAST]). The delay-based
approaches are more or less derived from the seminal work of TCP Vegas
[VEGAS]. An increase in RTT is considered an early indicator of
congestion, and the sending rate is reduced to avoid buffer overflow.
The problem with this approach arises when delay-based and loss-based
flows share the same bottleneck link.
While the delay-based flows respond to increases in RTT by cutting
their sending rate, the loss-based flows continue to increase their
sending rate. As a result, a delay-based flow obtains far less
bandwidth than its fair share. This weakness is hard to remedy for
purely delay-based approaches.

Compound TCP is designed to satisfy the efficiency requirement and the
TCP friendliness requirement simultaneously. The key idea is that if
the link is under-utilized, the high-speed protocol should be
aggressive and increase the sending rate quickly. However, once the
link is fully utilized, being aggressive will not only adversely affect
standard TCP flows but will also cause instability. As noted above,
delay-based approaches already have the nice property of adjusting
their aggressiveness based on the link utilization, which is observed
by the end-systems as an increase in RTT. CTCP incorporates a scalable
delay-based component into the standard TCP's congestion avoidance
algorithm. Using the delay component as an automatic tuning knob, CTCP
is scalable yet TCP friendly.

2. Design Goals

The design of CTCP is motivated by the following requirements:

   o Improve throughput by efficiently using the spare capacity in
     the network
   o Good intra-protocol fairness when competing with flows that
     have different RTTs
   o Should not impact the performance of standard TCP flows sharing
     the same bottleneck
   o No additional feedback or support required from the network

CTCP can efficiently use the network resource and achieve high link
utilization. The aggressiveness can be controlled by adopting a rapid
increase rule in the delay-based component. We choose CTCP to have
aggressiveness similar to that of HighSpeed TCP [RFC3649]. Our design
choice is motivated by the fact that HSTCP has been tested to be
aggressive enough in real world networks and is now an experimental
IETF RFC.
We also wanted an upper bound on the amount of unfairness to standard
TCP flows. As shown later, CTCP is able to maintain TCP friendliness
under high statistical multiplexing and also while traversing poorly
buffered links. CTCP has similar or, in some cases, even improved RTT
fairness compared to standard TCP. As we will demonstrate later, this
is because the amount of backlogged packets for a connection is
independent of the RTT of the connection. Even though CTCP does not
require any feedback from the network, CTCP works well in ECN-capable
environments. There is also no expectation on the queuing algorithm
deployed in the routers.

As is the case with most high-speed variants today, CTCP does not
modify slow-start. We agree with the view that ramping up faster than
slow-start without additional information from the network can be
harmful. Similar to HSTCP, to ensure TCP compatibility, CTCP's scalable
component uses the same response function as Standard TCP when the
current congestion window is at most Low_Window. CTCP sets Low_Window
to 38 MSS-sized segments, corresponding to a packet drop rate of 10^-3
for TCP.

3. Compound TCP Control Law

CTCP modifies Standard TCP's loss-based control law with a scalable
delay-based component. To do so, a new state variable is introduced in
the current TCP Control Block (TCB), namely dwnd (Delay Window), which
controls the delay-based component in CTCP. The conventional congestion
window, cwnd, remains untouched and controls the loss-based component
in CTCP. Thus, the CTCP sending window is now controlled by both cwnd
and dwnd. Specifically, the TCP sending window (wnd) is now calculated
as follows:

      wnd = min(cwnd + dwnd, awnd),                               (1)

where awnd is the advertised window from the receiver.
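
As a minimal sketch of equation (1), assuming all three windows are
expressed in MSS-sized packets (the function name send_window is
illustrative and not part of this specification):

```python
def send_window(cwnd: float, dwnd: float, awnd: float) -> float:
    """Effective CTCP sending window per equation (1), in packets."""
    # The loss-based and delay-based windows add up; the receiver's
    # advertised window still caps the result.
    return min(cwnd + dwnd, awnd)


# Hypothetical values: 40 packets from cwnd, 25 packets from dwnd.
print(send_window(40, 25, 1000))  # the receiver window is not the limit
print(send_window(40, 25, 50))    # the receiver window caps the sender
```

The delay window only ever adds to what the loss-based window already
allows, so with dwnd = 0 the sender behaves like Standard TCP.
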

cwnd is updated in the same way as regular TCP in the congestion
avoidance phase, i.e., cwnd is increased by 1 MSS every RTT and halved
when a packet loss is encountered. The update to dwnd will be explained
in detail later in this section. The combined window for CTCP from (1)
above allows up to (cwnd + dwnd) packets in one RTT. Therefore, the
increment of cwnd on the arrival of an ACK is modified accordingly:

      cwnd = cwnd + 1/(cwnd+dwnd)                                 (2)

As stated above, CTCP retains the same behavior during slow start. When
a connection starts up, dwnd is initialized to zero and stays at zero
while the connection is in the slow start phase. Thus the delay
component only becomes effective when the connection enters congestion
avoidance. The delay-based algorithm has the following properties. It
uses a scalable increase rule when it infers that the network is
under-utilized. It also reduces the sending rate when it senses
incipient congestion. By reducing its sending rate, the delay-based
component yields to competing TCP flows and ensures TCP fairness. It
reacts to packet losses by reducing its sending rate, which is
necessary to avoid congestion collapse. Our control law for the
delay-based component is derived from TCP Vegas. A state variable
called basertt tracks the minimum round trip delay seen by a packet
over the network path. When a connection is started, basertt is set to
the minimum RTT observed during the 3-way handshake. The CTCP sender
also maintains a smoothed RTT, srtt, updated as specified in [RFC2988].
Then, the number of backlogged packets of the connection can be
estimated using

      expected (throughput) = wnd/basertt
      actual (throughput)   = wnd/srtt
      diff = (expected - actual) * basertt

The expected throughput is an estimate of the throughput CTCP gets if
it does not overrun the network path. The actual throughput stands for
the throughput CTCP really gets.
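
The estimate above can be sketched as follows. This is an illustrative
calculation only: the function name backlog_estimate and the sample
values are assumptions, with the window in packets and RTTs in seconds.

```python
def backlog_estimate(wnd: float, basertt: float, srtt: float) -> float:
    """Estimate the packets queued at the bottleneck (diff)."""
    expected = wnd / basertt   # throughput if nothing were queued
    actual = wnd / srtt        # throughput actually achieved
    return (expected - actual) * basertt


# Hypothetical sample: 100-packet window, 100 ms base RTT, 125 ms srtt.
# expected = 1000 pkt/s and actual = 800 pkt/s, so about 20 packets of
# the window are sitting in the bottleneck queue.
diff = backlog_estimate(wnd=100, basertt=0.100, srtt=0.125)
print(round(diff, 3))
```

Algebraically, diff = wnd * (1 - basertt/srtt), so the estimate is zero
when srtt equals basertt and grows as queuing delay builds up.
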
Using this we can calculate the amount of data backlogged in the
bottleneck queue (diff). Congestion is detected by comparing diff to a
threshold gamma. If diff < gamma, the network path is assumed to be
under-utilized; otherwise the network path is assumed to be congested
and CTCP should gracefully reduce its window.

Note that a connection must have at least gamma packets backlogged in
the bottleneck queue to be able to detect incipient congestion. This
motivates the need for gamma to be small, since it implies that even
when the bottleneck buffer size is small, CTCP will react early enough
to ensure TCP fairness. On the other hand, if gamma is too small
compared to the queue size, CTCP will falsely detect congestion, which
will adversely affect the throughput. Choosing the appropriate value
for gamma could be a problem because this parameter depends on both the
network configuration and the number of concurrent flows, which are
generally unknown to the end-systems. We present an effective way to
automatically estimate gamma in Section 5.

The increase law of the delay-based component should make CTCP more
scalable in high-speed and long delay pipes. We choose a binomial
function to increase the window [BAINF01]. More specifically, when no
congestion is detected, the CTCP window increases using the following
function:

      wnd(t+1) = wnd(t) + alpha*wnd(t)^k                          (3)

When a packet loss occurs, the window is multiplicatively decreased:

      wnd(t+1) = wnd(t)*(1-beta)                                  (4)

where alpha, beta and k are tunable to obtain the desired scalability,
smoothness and responsiveness. We assume that a loss is detected by
three duplicate ACKs. As explained in the next section, we have modeled
the response function for CTCP to have comparable scalability to
HighSpeed TCP.
Since there is already a loss-based component in CTCP, the delay-based
component needs to be designed to only fill the gap, and the overall
CTCP window should follow the behavior defined in (3) and (4). We now
summarize the control law for CTCP's delay component as follows:

   dwnd(t+1) =
      dwnd(t) + alpha*wnd(t)^k - 1,     if diff < gamma          (5)
      dwnd(t) - eta*diff,               if diff >= gamma         (6)
      wnd(t)*(1-beta) - cwnd/2,         on packet loss           (7)

where (5) shows that in the increase phase, dwnd only needs to increase
by (alpha*wnd(t)^k - 1) packets, since the loss-based component cwnd
will also increase by 1 packet. When a packet loss occurs, dwnd is set
to the difference between the desired reduced window size and the
window that cwnd can provide. The rule in equation (6) is very
important to preserve good RTT and TCP fairness. Eta defines how
rapidly the delay component should reduce its window when congestion is
detected. Note that dwnd is never negative, so the CTCP window is lower
bounded by its loss-based component, which is the same as Standard TCP.

If a retransmission timeout occurs, dwnd should be reset to zero and
the delay-based component is disabled. This is because after a timeout
the TCP sender enters the slow-start phase. After the CTCP sender exits
slow-start and enters congestion avoidance, dwnd control kicks in
again.

4. Compound TCP Response Function

The TCP response function gives TCP's average congestion window w, in
MSS-sized segments, as a function of the steady-state packet drop rate
p. To specify a modified response function for CTCP, we use the
analytical model in [CTCPI06] to derive a relationship between w and p.
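
The delay-window control law (5)-(7) of Section 3 can be sketched as
below. This is a sketch under stated assumptions, not a reference
implementation: it evaluates the growth term in (5) and the reduction
in (7) on the total window wnd = cwnd + dwnd so that the overall window
follows (3) and (4); it uses alpha = 1/8, beta = 1/2 and k = 0.75, the
values this document selects; and it uses eta = 1 purely as a
placeholder, since no value for eta is given here. The function name
update_dwnd is illustrative.

```python
ALPHA, BETA, K = 1 / 8, 1 / 2, 0.75  # parameter values chosen in Section 4
ETA = 1.0                            # assumption: placeholder, eta is not specified


def update_dwnd(dwnd, cwnd, diff, gamma, loss=False):
    """One per-round update of the delay window, in packets.

    Assumption: cwnd is the loss window before its own update for this
    round (the +1 packet increase, or the halving on loss).
    """
    wnd = cwnd + dwnd
    if loss:
        new = wnd * (1 - BETA) - cwnd / 2   # rule (7)
    elif diff < gamma:
        new = dwnd + ALPHA * wnd ** K - 1   # rule (5)
    else:
        new = dwnd - ETA * diff             # rule (6)
    return max(new, 0.0)                    # dwnd is never negative


# Hypothetical rounds with gamma = 30 packets.
print(update_dwnd(dwnd=0, cwnd=100, diff=0, gamma=30))    # path under-utilized
print(update_dwnd(dwnd=50, cwnd=100, diff=40, gamma=30))  # queue building up
print(update_dwnd(dwnd=50, cwnd=100, diff=0, gamma=30, loss=True))
```

With beta = 1/2, rule (7) reduces to halving dwnd, which mirrors the
halving of cwnd by the loss-based component.
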
Based on this model, the response function for CTCP provides the
following relationship between w and p (where ~ denotes
proportionality):

      w ~ 1/p^(1/(2-k))                                           (8)

As explained earlier, we modeled the response function for CTCP to have
comparable scalability to HighSpeed TCP. The response function for
HighSpeed TCP is

      w ~ 1/p^0.835                                               (9)

Comparing (8) and (9), we get k to be around 0.8. Since it is difficult
to implement an arbitrary power, we choose k = 0.75, which can be
implemented using a fast integer algorithm for square root. Based on
extensive experimentation, we choose alpha = 1/8 and beta = 1/2.
Substituting the above values for alpha, beta and k in (8), we get the
following response function for CTCP:

      w = 0.255/p^0.8                                             (10)

The response function for CTCP is compared with that of HSTCP in
Table 1 below.

                              CTCP                  HSTCP

   Packet Drop Rate P   Congestion Window W   Congestion Window W
   ------------------   -------------------   -------------------
         10^-3                        64                    38
         10^-4                       404                   263
         10^-5                      2552                  1795
         10^-6                     16107                 12279
         10^-7                    101630                 83981
         10^-8                    641245                574356
         10^-9                   4045987               3928088
         10^-10                 25528453              26864653

          Table 1: TCP Response function for CTCP & HSTCP

The values in Table 1 illustrate that our choice of parameters makes
CTCP slightly more aggressive than HSTCP at moderate and low packet
loss rates, while it approaches HSTCP for larger windows. We chose
this because, unlike HighSpeed TCP, CTCP's delay control is capable of
scaling back on detecting incipient congestion. As a result, we expect
CTCP to be more TCP friendly than HighSpeed TCP. We show that this is
in fact the case even under low buffering conditions in the presence of
high statistical multiplexing. The fairness considerations and choice
of gamma are detailed in later sections.

5. Automatic Selection of Gamma

To detect early congestion effectively, CTCP estimates the number of
backlogged packets at the bottleneck queue and compares this estimate
to a pre-defined threshold gamma. However, setting this threshold is
particularly difficult for CTCP (and for many other similar delay-based
approaches), because gamma largely depends on the network configuration
and the number of concurrent flows that compete for the same bottleneck
link, which are, unfortunately, unknown to end-systems. Based on
experimentation over varying conditions, we selected gamma to be 30
packets. This value provided a good tradeoff between TCP fairness and
throughput. However, a fixed gamma can still result in poor TCP
friendliness over under-buffered network links. One naive solution is
to choose a very small value for gamma, but this can falsely detect
congestion and adversely affect throughput. To address this problem we
use a method called tuning-by-emulation to dynamically adjust gamma.
The basic idea of our proposal is to estimate the backlogged packets of
a Standard TCP flow along the same path by emulating the behavior of a
Standard TCP flow at runtime. Based on this, gamma is set so as to
ensure good TCP-friendliness. CTCP can then automatically adapt to
different network configurations (i.e., buffer provisioning) and also
to concurrent competing flows.

Our analytical model of CTCP shows that gamma should at least be less
than B/(m+l) to ensure the effectiveness of incipient congestion
detection, where B is the bottleneck buffer size and m and l represent
the numbers of concurrent Standard TCP flows and CTCP flows,
respectively, that are competing for the same bottleneck link
[CTCPI06, CTCPP06, CTCPT]. Generally, both B and (m+l) are unknown to
end-systems.
It is very difficult to estimate these values from end-systems in real
time, especially the number of flows, which can vary significantly over
time. Fortunately there is a way to directly estimate the ratio
B/(m+l), even though the individual variables B and (m+l) are hard to
estimate. Let's first assume there are (m+l) regular TCP flows in the
network. These (m+l) flows should be able to fairly share the
bottleneck capacity in steady state. Therefore, they should also get
roughly equal shares of the buffers at the bottleneck, each of which
should equal B/(m+l). Although such a Standard TCP flow does not know
either B or (m+l), it can still infer B/(m+l) easily by estimating its
backlogged packets, which is a rather mature technique widely used in
many delay-based protocols. This brings us to the core idea of CTCP's
algorithm: CTCP lets the sender emulate the congestion window of a
Standard TCP flow. Using this emulated window, we can estimate the
buffer occupancy (Q) for a Standard TCP flow. Q can be regarded as a
conservative estimate of B/(m+l), assuming that the high-speed flow is
more aggressive than Standard TCP. By choosing gamma <= Q, we can
ensure TCP fairness.

The implementation is simple, because CTCP already emulates Standard
TCP as its loss-based component. We can simply estimate the buffer
occupancy of a competing Standard TCP flow from state that CTCP already
maintains. We choose an initial gamma = 30, and Q is calculated as
follows:

      expected_reno (throughput) = cwnd/basertt
      actual_reno (throughput)   = cwnd/srtt
      diff_reno = (expected_reno - actual_reno) * basertt

The difference between diff_reno and diff is simply that diff_reno is
computed using only the loss-based component cwnd.
Since Standard TCP reaches its maximum buffer occupancy just before a
loss, CTCP uses the diff_reno value computed in the earlier round to
calculate the gamma for the next round. Whenever a loss happens, gamma
is chosen to be less than diff_reno, and the sample values of gamma are
smoothed using a standard exponentially weighted moving average. The
pseudocode to calculate gamma is shown below. Here a round tracks every
window's worth of data. We will provide more details on how to maintain
a round in Section 6.

   Initialization:
      diff_reno = invalid;
      gamma = 30;

   End-of-Round:

      expected_reno = cwnd / basertt;
      actual_reno = cwnd / srtt;
      diff_reno = (expected_reno - actual_reno) * basertt;

   On-Packet-Loss:

      if diff_reno is valid then
         g_sample = 3/4 * diff_reno;
         gamma = gamma*(1-lambda) + lambda*g_sample;
         if (gamma < gamma_low)
            gamma = gamma_low;
         else if (gamma > gamma_high)
            gamma = gamma_high;
         fi
         diff_reno = invalid;
      fi

The recommended values for gamma_low and gamma_high are 5 and 30
respectively. diff_reno is set to invalid to prevent using stale
diff_reno data when there are consecutive losses between which no
samples were taken.

6. Implementation Issues

The first challenge is to design a mechanism that can precisely track
the changes in round trip time with minimal overhead, and can scale
well to support many concurrent TCP connections. Naively taking an RTT
sample for every packet would be overkill for both CPU and system
memory, especially for high-speed and long distance networks where the
congestion window can be very large. Therefore, CTCP needs to limit the
number of samples taken, but without compromising on accuracy. In our
implementation, we only take up to M samples per window of data. M is
chosen to scale with the round trip delay and window size.
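
Returning to the gamma selection procedure of Section 5, its pseudocode
can be made concrete as in the sketch below. The class name GammaTuner
and the smoothing weight LAMBDA = 0.5 are assumptions made for
illustration; this document does not give a value for the weight. None
stands in for the "invalid" marker.

```python
GAMMA_LOW, GAMMA_HIGH = 5, 30  # recommended bounds from Section 5
LAMBDA = 0.5                   # assumption: EWMA weight is not specified


class GammaTuner:
    """Tuning-by-emulation of gamma from the loss-based window cwnd."""

    def __init__(self):
        self.gamma = 30.0      # initial gamma, in packets
        self.diff_reno = None  # None means "invalid" (no fresh sample)

    def end_of_round(self, cwnd, basertt, srtt):
        # Backlog that an emulated Standard TCP flow would hold.
        expected_reno = cwnd / basertt
        actual_reno = cwnd / srtt
        self.diff_reno = (expected_reno - actual_reno) * basertt

    def on_packet_loss(self):
        if self.diff_reno is None:
            return self.gamma  # consecutive loss, keep the current gamma
        g_sample = 0.75 * self.diff_reno
        gamma = self.gamma * (1 - LAMBDA) + LAMBDA * g_sample
        self.gamma = min(max(gamma, GAMMA_LOW), GAMMA_HIGH)
        self.diff_reno = None  # do not reuse a stale sample
        return self.gamma


# Hypothetical round: cwnd = 80 packets, base RTT 100 ms, srtt 125 ms,
# so diff_reno is about 16 packets and the gamma sample is about 12.
tuner = GammaTuner()
tuner.end_of_round(cwnd=80, basertt=0.100, srtt=0.125)
print(round(tuner.on_packet_loss(), 3))
print(round(tuner.on_packet_loss(), 3))  # no new sample, gamma unchanged
```

Clamping with min/max is equivalent to the if/else-if bounds check in
the pseudocode.
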

To further improve the efficiency of memory usage, we have developed a
memory allocation mechanism to dynamically allocate sample buffers from
a fixed-size, per-processor kernel pool. The size should be chosen as a
function of the available system memory. As the window size increases,
M can be updated so that the samples are uniformly distributed over the
window. As M gets updated, more memory blocks are allocated and linked
to the existing sample buffers. If the sending rate changes, either due
to network conditions or due to application behavior, the sample blocks
are returned to the global memory pool. This dynamic buffer management
ensures the scalability of our implementation, so that it can work well
even on a busy server that could host tens of thousands of TCP
connections simultaneously. Note that it may also require a
high-resolution timer to time RTT samples.

The rest of the implementation is rather straightforward. We add two
new state variables to the standard TCP Control Block, namely dwnd and
basertt. basertt tracks the minimum RTT sample measured so far and is
used as an estimate of the transmission delay of a single packet.
basertt is usually cleared if a retransmission timeout is hit, since it
is a good idea to re-measure basertt in case the network conditions
have changed. Following the common practice of high-speed protocols,
CTCP reverts to standard TCP behavior when the window is small. The
delay-based component only kicks in when cwnd is larger than some
threshold, currently set to 38 packets assuming a 1500-byte MTU. dwnd
is updated at the end of each round. Note that no RTT sampling and no
dwnd update happen during the loss recovery phase. This is because
retransmissions during the loss recovery phase may result in inaccurate
RTT samples and can adversely affect the delay-based control.

7. Deployment Issues

There are several variations of TCP proposed for high speed and long
delay networks. We do not claim Compound TCP to be the best or the
optimal algorithm. However, based on our extensive testing via
simulations and experimentation, including on production links as well
as in beta deployments of a reasonable scale, we believe that Compound
TCP satisfies the design considerations outlined earlier in this
document. It effectively uses spare bandwidth in high speed networks,
achieves good intra-protocol fairness even in the presence of differing
RTTs, and does not adversely impact standard TCP. Further, Compound TCP
does not require any changes or any new feedback from the network and
is deployable over the current Internet in an incremental fashion. It
inter-operates with Standard TCP and requires support only on the send
side of a TCP connection for it to be used.

We also note that, similar to HighSpeed TCP, in environments typical of
much of the current Internet, Compound TCP behaves exactly like
Standard TCP. It does this by ensuring that it follows the standard TCP
algorithm without any modification whenever the congestion window is
less than 38 packets. Only when the congestion window is greater than
38 packets is the delay-based component of Compound TCP invoked. Thus,
for example, for a connection with an RTT of 100ms, the end-to-end
bandwidth must be greater than 4.8Mbps for the CTCP algorithm to
respond to network conditions any differently from standard TCP.

Further, we do not believe that the deployment of Compound TCP would
block the possible deployment of alternate experimental congestion
control algorithms such as FAST TCP [FAST] or CUBIC [CUBIC]. In
particular, Compound TCP's response falls back to a loss-based function
that has characteristics very similar to HS-TCP or N parallel TCP
connections.

8. Security Considerations

This proposal makes no changes to the underlying security of the TCP
protocol.

9. IANA Considerations

There are no IANA considerations regarding this proposal.

10. Conclusions

This document proposes a novel congestion control algorithm for TCP for
high speed and long delay networks. By introducing a delay-based
component in addition to a standard TCP loss-based component, Compound
TCP is able to detect and effectively use spare bandwidth that may be
available on a high speed and long delay network. Further, the
delay-based component detects the onset of congestion early and
gracefully reduces the sending rate. The loss-based component, on the
other hand, ensures there is an effective response to losses in the
network while, in the absence of losses, keeping the throughput of CTCP
lower bounded by TCP Reno. Thus, CTCP is not timid, nor does it induce
more self-induced packet loss than a single standard TCP flow. Compound
TCP is therefore efficient in consuming available bandwidth while being
friendly to standard TCP. Further, the delay component does not have
any RTT bias, thereby reducing the RTT bias of Compound TCP vis-a-vis
standard TCP.

Compound TCP has been implemented as an optional component in the
Microsoft Windows Vista operating system. It has been tested and
experimented with through broad Windows Vista beta deployments, where
it has been verified to meet its objectives without causing any adverse
impact. SLAC has also evaluated Compound TCP on production links. Based
on the testing and evaluation done so far, we believe Compound TCP is
safe to deploy on the current Internet. We welcome additional analysis,
testing and evaluation of Compound TCP by the Internet community at
large and continue to do additional testing ourselves.

11. Acknowledgments

The authors would like to thank Jingmin Song for all his efforts in
evaluating the algorithm on the test beds.
   We are thankful to Yee-Ting Li and Les Cottrell for testing and
   evaluating Compound TCP on Internet2 links [SLAC]. We would like to
   thank Sanjay Kaniyar for his insightful comments and for driving
   this project at Microsoft. We are also thankful to the
   Microsoft.com data center staff who helped us evaluate Compound TCP
   on their production links. In addition, several members of the
   Internet research community who attended the High-Speed TCP Summit
   at Microsoft [MSWRK] have provided valuable feedback on Compound
   TCP. Finally, we are thankful to the Windows Vista beta program
   participants who helped us test and evaluate CTCP.

12. References

12.1. Normative References

   [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
             Control", RFC 2581, April 1999.

12.2. Informative References

   [AFRICA]  R. King, R. Baraniuk and R. Riedi, "TCP-Africa: An
             Adaptive and Fair Rapid Increase Rule for Scalable
             TCP", in Proc. INFOCOM 2005.

   [BAINF01] D. Bansal and H. Balakrishnan, "Binomial Congestion
             Control Algorithms", in Proc. INFOCOM 2001.

   [CTCPI06] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "A
             Compound TCP Approach for High-speed and Long Distance
             Networks", in IEEE Infocom, April 2006, Barcelona,
             Spain.

   [CTCPP06] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "Compound
             TCP: A Scalable and TCP-friendly Congestion Control for
             High-speed Networks", in 4th International Workshop on
             Protocols for Fast Long-Distance Networks (PFLDNet),
             2006, Nara, Japan.

   [CTCPT]   K. Tan, J. Song, M. Sridharan, and C.Y. Ho, "CTCP:
             Improving TCP-Friendliness Over Low-Buffered Network
             Links", Microsoft Technical Report.

   [CUBIC]   I. Rhee, L. Xu and S. Ha, "CUBIC for fast long distance
             networks", Internet Draft, Expires Aug 31, 2007,
             draft-rhee-tcp-cubic-00.txt

   [FAST]    C. Jin, D. Wei, S.
             Low, "FAST TCP: Motivation, Architecture, Algorithms,
             Performance", in IEEE Infocom 2004.

   [MSWRK]   Microsoft High-Speed TCP Summit,
             http://research.microsoft.com/events/TCPSummit/

   [PADHYE]  J. Padhye, V. Firoiu, D. Towsley and J. Kurose, "Modeling
             TCP Throughput: A Simple Model and its Empirical
             Validation", in Proc. ACM SIGCOMM 1998.

   [RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
             Timer", RFC 2988, November 2000.

   [RFC3649] S. Floyd, "HighSpeed TCP for Large Congestion Windows",
             RFC 3649, December 2003.

   [SLAC]    Yee-Ting Li, "Evaluation of TCP Congestion Control
             Algorithms on the Windows Vista Platform",
             SLAC-TN-06-005,
             http://www.slac.stanford.edu/pubs/slactns/tn04/slac-tn-06-005.pdf

   [VEGAS]   L. Brakmo, S. O'Malley, and L. Peterson, "TCP Vegas: New
             techniques for congestion detection and avoidance", in
             Proc. ACM SIGCOMM, 1994.

Authors' Addresses

   Murari Sridharan
   Microsoft Corporation
   1 Microsoft Way, Redmond 98052

   Email: muraris@microsoft.com

   Kun Tan
   Microsoft Research
   5/F, Beijing Sigma Center
   No.49, Zhichun Road, Hai Dian District
   Beijing China 100080

   Email: kuntan@microsoft.com

   Deepak Bansal
   Microsoft Corporation
   1 Microsoft Way, Redmond 98052

   Email: dbansal@microsoft.com

   Dave Thaler
   Microsoft Corporation
   1 Microsoft Way, Redmond 98052

   Email: dthaler@microsoft.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.