

Network Working Group                                      M. Sridharan
Internet Draft                                                Microsoft
Intended status: Experimental July 18, 2007                      K. Tan
Expires: January 2008                                Microsoft Research
                                                              D. Bansal
                                                              D. Thaler
                                                              Microsoft

    Compound TCP: A New TCP Congestion Control for High-Speed and Long
                             Distance Networks

                       draft-sridharan-tcpm-ctcp-00.txt

Status of this Memo

  By submitting this Internet-Draft, each author represents that any
  applicable patent or other IPR claims of which he or she is aware
  have been or will be disclosed, and any of which he or she becomes
  aware will be disclosed, in accordance with Section 6 of BCP 79.

  Internet-Drafts are working documents of the Internet Engineering
  Task Force (IETF), its areas, and its working groups.  Note that
  other groups may also distribute working documents as Internet-
  Drafts.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as "work in progress."

  The list of current Internet-Drafts can be accessed at
  http://www.ietf.org/ietf/1id-abstracts.txt.

  The list of Internet-Draft Shadow Directories can be accessed at
  http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 18, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document proposes Compound TCP (CTCP), a modification to TCP's
   congestion control mechanism for use with TCP connections with large
   congestion windows. The key idea behind CTCP is to add a scalable
   delay-based component to standard TCP's loss-based congestion
   control. The sending rate of CTCP is controlled by both loss and
   delay components. The delay-based component has a scalable window
   increase rule that not only uses the link capacity efficiently but
   also gracefully reduces the sending rate on sensing queue build-up.
   We have implemented CTCP on Microsoft Windows and have done
   extensive testing on production links and in Windows beta
   deployments. We also worked with the Stanford Linear Accelerator
   Center (SLAC) to evaluate the properties of CTCP. The results so far
   are very encouraging. This document describes the Compound TCP
   algorithm in detail and solicits experimentation and feedback from
   the wider community. In this document, we collectively refer to any
   TCP congestion control algorithm that employs a linear increase
   function for congestion control, including TCP Reno and all its
   variants, as Standard TCP.

Table of Contents

   1. Introduction
   2. Design Goals
   3. Compound TCP Control Law
   4. Compound TCP Response Function
   5. Automatic Selection of Gamma
   6. Implementation Issues
   7. Deployment Issues
   8. Security Considerations
   9. IANA Considerations
   10. Conclusions
   11. Acknowledgments
   12. References
       12.1. Normative References
       12.2. Informative References
   Author's Addresses
   Intellectual Property Statement
   Disclaimer of Validity

1. Introduction

This document proposes Compound TCP, a modification to TCP's congestion
control mechanism for fast, long-distance networks. The standard TCP
congestion avoidance algorithm uses an additive increase and
multiplicative decrease (AIMD) scheme: the congestion window grows by a
conservative linear function and is decreased multiplicatively when a
loss is encountered. On a high-speed, long-delay network, standard TCP
takes an unreasonably long time to recover its sending rate after a
single loss event [RFC2581, RFC3649]. Moreover, it is well known that
in a steady-state environment with a packet loss rate of p, standard
TCP's average congestion window is inversely proportional to the square
root of p [RFC2581, PADHYE]. Sustaining a large window therefore
requires an extremely small packet loss rate. As an example, Floyd et
al. [RFC3649] pointed out that on a 10Gbps link with 100ms delay, a
standard TCP flow takes roughly one hour to fully utilize the link
capacity, provided no packet is lost or corrupted. One hour of
error-free transmission requires a packet loss rate around 10^-11 with
1500-byte packets (one packet loss in 2,600,000,000 packet
transmissions!), which is not practical in today's networks.
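
The one-hour figure can be reproduced with simple arithmetic. The
Python sketch below is illustrative only and is not taken from
[RFC3649]: it assumes 1500-byte packets with no header overhead and an
idealized congestion avoidance phase that grows the window by exactly
one packet per RTT after a single halving. The function names are
hypothetical.

```python
def full_window_packets(link_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Window, in packets, needed to fill the pipe (bandwidth-delay product)."""
    return link_bps * rtt_s / (packet_bytes * 8)


def recovery_time_s(window_packets: float, rtt_s: float) -> float:
    """Time to grow from W/2 back to W at one packet per RTT."""
    return (window_packets / 2) * rtt_s


def packets_per_cycle(window_packets: float) -> float:
    """Packets sent while the window ramps linearly from W/2 to W.

    The average window during the ramp is 3W/4 and the ramp lasts W/2
    round trips, giving (3/8) * W^2 packets.
    """
    return 0.375 * window_packets ** 2


if __name__ == "__main__":
    w = full_window_packets(10e9, 0.100, 1500)
    print(f"full window:        {w:,.0f} packets")
    print(f"recovery time:      {recovery_time_s(w, 0.100) / 60:.1f} minutes")
    print(f"packets per cycle:  {packets_per_cycle(w):.2e}")
```

Under these assumptions the full window is about 83,333 packets,
recovery from W/2 takes about 69 minutes, and roughly 2.6 billion
packets are sent during that recovery, in line with the one-hour and
2,600,000,000-packet figures quoted above.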

There are several proposals to address this fundamental limitation of
TCP. One straightforward approach is to modify TCP's increase/decrease
rule in the congestion avoidance stage: in the absence of packet loss,
the sender increases its congestion window more quickly, and upon a
packet loss it decreases the window more gently. In a mixed network
environment, the aggressive behavior of such approaches may severely
degrade the performance of regular TCP flows whenever the network path
is already highly utilized. When an aggressive high-speed variant flow
shares a bottleneck link with standard TCP flows, it may increase its
own share of bandwidth by reducing the throughput of the competing TCP
flows. As a result, the aggressive variants cause many more
self-induced packet losses on bottleneck links and push back the
throughput of the regular TCP flows.

A second class of high-speed protocols uses variations in RTT as a
congestion indicator (e.g., [AFRICA, FAST]). These delay-based
approaches are more or less derived from the seminal work on TCP Vegas
[VEGAS]. An increase in RTT is considered an early indicator of
congestion, and the sending rate is cut in half to avoid buffer
overflow. The problem with this approach arises when delay-based and
loss-based flows share the same bottleneck link. While the delay-based
flows respond to increases in RTT by cutting their sending rates, the
loss-based flows continue to increase theirs. As a result, a
delay-based flow obtains far less bandwidth than its fair share. This
weakness is hard to remedy for purely delay-based approaches.

Compound TCP is designed to satisfy the efficiency requirement and the
TCP friendliness requirement simultaneously. The key idea is that if
the link is under-utilized, the high-speed protocol should be
aggressive and increase the sending rate quickly. However, once the
link is fully utilized, being aggressive will not only adversely affect
standard TCP flows but will also cause instability. As noted above,
delay-based approaches already have the useful property of adjusting
their aggressiveness based on link utilization, which the end-systems
observe as an increase in RTT. CTCP adds a scalable delay-based
component to standard TCP's congestion avoidance algorithm. Using the
delay component as an automatic tuning knob, CTCP is scalable yet TCP
friendly.

2. Design Goals

The design of CTCP is motivated by the following requirements:

     o  Improve throughput by efficiently using the spare capacity in
        the network
     o  Good intra-protocol fairness when competing with flows that
        have different RTTs
     o  Should not impact the performance of standard TCP flows sharing
        the same bottleneck
     o  No additional feedback or support required from the network

CTCP can efficiently use the network resource and achieve high link
utilization. Its aggressiveness is controlled by the rapid increase
rule adopted in the delay-based component. We chose CTCP to have
aggressiveness similar to that of HighSpeed TCP (HSTCP) [RFC3649].
This design choice is motivated by the fact that HSTCP has been shown
to be aggressive enough in real-world networks and is now an
Experimental IETF RFC. We also wanted an upper bound on the amount of
unfairness to standard TCP flows. As shown later, CTCP is able to
maintain TCP friendliness under high statistical multiplexing and also
while traversing poorly buffered links. CTCP has similar or, in some
cases, even improved RTT fairness compared to standard TCP. As we will
demonstrate later, this is because the amount of backlogged packets for
a connection is independent of the RTT of the connection. Even though
CTCP does not require any feedback from the network, it works well in
ECN-capable environments. CTCP also makes no assumptions about the
queuing algorithm deployed in the routers.

As is the case with most high-speed variants today, CTCP does not
modify slow-start. We share the belief that ramping up faster than
slow-start without additional information from the network can be
harmful. Similar to HSTCP, to ensure TCP compatibility, CTCP's scalable
component uses the same response function as Standard TCP when the
current congestion window is at most Low_Window. CTCP sets Low_Window
to 38 MSS-sized segments, corresponding to a packet drop rate of 10^-3
for TCP.

3. Compound TCP Control Law

CTCP augments Standard TCP's loss-based control law with a scalable
delay-based component. To do so, a new state variable, dwnd (Delay
Window), is introduced in the TCP Control Block (TCB); it controls the
delay-based component in CTCP. The conventional congestion window,
cwnd, remains untouched and controls the loss-based component. The
CTCP sending window is thus controlled by both cwnd and dwnd.
Specifically, the TCP sending window (wnd) is now calculated as
follows:

  wnd = min(cwnd + dwnd, awnd),             (1)

where awnd is the advertised window from the receiver.

cwnd is updated in the same way as regular TCP in the congestion
avoidance phase, i.e., cwnd is increased by 1 MSS every RTT and halved
when a packet loss is encountered. The update to dwnd is explained in
detail later in this section. The combined window for CTCP from (1)
allows up to (cwnd + dwnd) packets in one RTT. Therefore, the
increment of cwnd on the arrival of an ACK is modified accordingly:

  cwnd = cwnd + 1/(cwnd+dwnd)               (2)
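
As an illustration of (1) and (2), the following Python sketch shows
that applying (2) once per ACK still grows cwnd by about one packet per
RTT. It is a simplified model, not an implementation: windows are
counted in packets, every packet is assumed to be acknowledged
individually (no delayed ACKs), and dwnd is held fixed for the round.
The function names are hypothetical.

```python
def send_window(cwnd: float, dwnd: float, awnd: float) -> float:
    """Equation (1): sending window, limited by the receiver's window."""
    return min(cwnd + dwnd, awnd)


def cwnd_after_ack(cwnd: float, dwnd: float) -> float:
    """Equation (2): per-ACK increase of cwnd in congestion avoidance."""
    return cwnd + 1.0 / (cwnd + dwnd)


def cwnd_after_round(cwnd: float, dwnd: float) -> float:
    """Apply (2) for one window's worth of ACKs with dwnd held fixed."""
    acks_in_round = int(cwnd + dwnd)
    for _ in range(acks_in_round):
        cwnd = cwnd_after_ack(cwnd, dwnd)
    return cwnd


if __name__ == "__main__":
    print(send_window(cwnd=40, dwnd=60, awnd=1000))
    print(cwnd_after_round(cwnd=40.0, dwnd=60.0))
```

Dividing by (cwnd + dwnd) rather than by cwnd alone compensates for the
larger number of ACKs that arrive per RTT, so the loss-based component
keeps its usual growth of roughly one packet per round.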

As stated above, CTCP retains the same behavior during slow start.
dwnd is initialized to zero when a connection starts and remains zero
while the connection is in the slow start phase. Thus the delay
component takes effect only when the connection enters congestion
avoidance. The delay-based algorithm has the following properties. It
uses a scalable increase rule when it infers that the network is
under-utilized. It also reduces the sending rate when it senses
incipient congestion. By reducing its sending rate, the delay-based
component yields to competing TCP flows and ensures TCP fairness. It
reacts to packet losses by reducing its sending rate, which is
necessary to avoid congestion collapse.

Our control law for the delay-based component is derived from TCP
Vegas. A state variable called basertt tracks the minimum round trip
delay seen by a packet over the network path. When a connection is
started, basertt is initialized to the minimum RTT observed during the
3-way handshake. The CTCP sender also maintains a smoothed RTT, srtt,
updated as specified in [RFC2988]. The number of backlogged packets of
the connection can then be estimated as follows:

  expected (throughput) = wnd/basertt
  actual (throughput) = wnd/srtt
  diff = (expected - actual) * basertt

The expected throughput estimates the throughput CTCP would get if it
did not overrun the network path. The actual throughput is the
throughput CTCP really gets. From these we can calculate the amount of
data backlogged in the bottleneck queue (diff). Congestion is detected
by comparing diff to a threshold gamma. If diff < gamma, the network
path is assumed to be under-utilized; otherwise the network path is
assumed to be congested and CTCP should gracefully reduce its window.
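
The backlog estimate can be written directly from the three formulas
above. The Python sketch below is a minimal illustration, assuming wnd
is counted in packets and both RTT values use the same time unit; the
function names are hypothetical.

```python
def backlog_estimate(wnd: float, basertt: float, srtt: float) -> float:
    """Estimate the packets this connection has queued at the bottleneck."""
    expected = wnd / basertt   # throughput if the path were not overrun
    actual = wnd / srtt        # throughput actually achieved
    return (expected - actual) * basertt


def path_underutilized(diff: float, gamma: float) -> bool:
    """The path is assumed under-utilized while diff stays below gamma."""
    return diff < gamma


if __name__ == "__main__":
    diff = backlog_estimate(wnd=100, basertt=0.100, srtt=0.125)
    print(diff, path_underutilized(diff, gamma=30))
```

Algebraically, diff equals wnd * (1 - basertt/srtt), so it is zero
when srtt equals basertt and grows as queuing delay inflates srtt. For
a window of 100 packets, a base RTT of 100ms and a smoothed RTT of
125ms, the estimate is about 20 backlogged packets.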

Note that a connection must have at least gamma packets backlogged in
the bottleneck queue before it can detect incipient congestion. This
motivates keeping gamma small, so that even when the bottleneck buffer
is small, CTCP reacts early enough to ensure TCP fairness. On the
other hand, if gamma is too small compared to the queue size, CTCP
will falsely detect congestion, which adversely affects throughput.
Choosing an appropriate value for gamma is difficult because the
parameter depends on both the network configuration and the number of
concurrent flows, which are generally unknown to the end-systems. We
present an effective way to estimate gamma automatically in Section 5.

The increase law of the delay-based component should make CTCP more
scalable in high-speed and long-delay pipes. We choose a binomial
function to increase the delay window [BAINF01]. More specifically,
when no congestion is detected, the CTCP window increases using the
following function:

  dwnd(t+1) = dwnd(t) + alpha*dwnd(t)^k    (3)

When a packet loss occurs, the delay window is multiplicatively
decreased:

  dwnd(t+1) = dwnd(t)*(1-beta)             (4)

where alpha, beta and k are tunable to obtain the desired scalability,
smoothness and responsiveness. We assume that a loss is detected by
three duplicate ACKs. As explained in the next section, we have modeled
the response function for CTCP to have scalability comparable to
HighSpeed TCP. Since there is already a loss-based component in CTCP,
the delay-based component only needs to fill the gap, so that CTCP
overall follows the behavior defined in (3) and (4). We summarize the
control law for CTCP's delay component as follows:

  dwnd(t+1) =
      dwnd(t) + alpha*dwnd(t)^k - 1,     if diff < gamma  (5)
      dwnd(t) - eta*diff,                if diff >= gamma (6)
      dwnd(t)*(1-beta) - cwnd/2,         on packet loss   (7)

where (5) shows that in the increase phase, dwnd only needs to increase
by (alpha*dwnd(t)^k - 1) packets, since the loss-based component cwnd
also increases by 1 packet. When a packet loss occurs, dwnd is set to
the difference between the desired reduced window size and the window
that cwnd can provide. The rule in equation (6) is very important for
preserving good RTT and TCP fairness. Eta defines how rapidly the delay
component reduces its window when congestion is detected. Note that
dwnd is never negative, so the CTCP window is lower bounded by its
loss-based component, which behaves the same as Standard TCP.
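
Rules (5) through (7) can be sketched as a single update function. The
Python code below transcribes the three cases literally and clamps the
result at zero, as the text requires. Several points are assumptions
made for illustration: alpha, beta and k use the values selected in
Section 4; no value for eta is given in this section, so ETA = 1.0 is a
placeholder; cwnd in (7) is taken to be the value before it is halved;
and the function name is hypothetical. Read literally, (5) cannot raise
dwnd from zero, and how dwnd is seeded when the delay-based component
first takes effect is not specified here, so the sketch does not
address it.

```python
ALPHA = 1 / 8   # value chosen in Section 4
BETA = 1 / 2    # value chosen in Section 4
K = 0.75        # value chosen in Section 4
ETA = 1.0       # ASSUMPTION: placeholder, no value is given in the text


def next_dwnd(dwnd: float, cwnd: float, diff: float, gamma: float,
              loss: bool) -> float:
    """One update of the delay window following rules (5), (6) and (7)."""
    if loss:
        # (7): desired reduced window minus what cwnd/2 already provides.
        candidate = dwnd * (1 - BETA) - cwnd / 2
    elif diff < gamma:
        # (5): binomial increase, minus the 1 packet that cwnd contributes.
        candidate = dwnd + ALPHA * dwnd ** K - 1
    else:
        # (6): back off in proportion to the estimated backlog.
        candidate = dwnd - ETA * diff
    # dwnd is never negative.
    return max(candidate, 0.0)


if __name__ == "__main__":
    print(next_dwnd(256.0, 100.0, diff=10.0, gamma=30.0, loss=False))
    print(next_dwnd(256.0, 100.0, diff=40.0, gamma=30.0, loss=False))
    print(next_dwnd(256.0, 100.0, diff=10.0, gamma=30.0, loss=True))
```

The clamp at zero is what keeps the overall window from dropping below
the loss-based component.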

If a retransmission timeout occurs, dwnd should be reset to zero and
the delay-based component disabled, because after a timeout the TCP
sender enters the slow-start phase. After the CTCP sender exits slow
start and enters congestion avoidance, dwnd control takes effect
again.

4. Compound TCP Response Function

The TCP response function relates TCP's average congestion window w,
in MSS-sized segments, to the steady-state packet drop rate p. To
specify a modified response function for CTCP, we use the analytical
model in [CTCPI06] to derive a relationship between w and p. Based on
this model, the response function for CTCP provides the following
relationship between w and p:

   w ~ 1/p^(1/(2-k))       (8)

As explained earlier, we modeled the response function for CTCP to have
scalability comparable to HighSpeed TCP. The response function for
HighSpeed TCP is

   w ~ 1/p^0.835           (9)

Comparing (8) and (9), i.e., setting 1/(2-k) = 0.835, we get k to be
around 0.8. Since it is difficult to implement an arbitrary power, we
choose k = 0.75, which can be implemented using a fast integer
algorithm for square root. Based on extensive experimentation, we
choose alpha = 1/8 and beta = 1/2. Substituting the above values for
alpha, beta and k in (8), we get the following response function for
CTCP:

   w = 0.255/p^0.8        (10)

The response function for CTCP is compared with that of HSTCP in
Table 1 below.

                                    CTCP                 HSTCP

     Packet Drop Rate P   Congestion Window W    Congestion Window W
     ------------------   -------------------    -------------------
            10^-3                     64                     38
            10^-4                    404                    263
            10^-5                   2552                   1795
            10^-6                  16107                  12279
            10^-7                 101630                  83981
            10^-8                 641245                 574356
            10^-9                4045987                3928088
            10^-10              25528453               26864653

   Table 1: TCP Response function for CTCP & HSTCP
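
The entries in Table 1 can be approximated with a few lines of Python.
This is a sketch for checking the arithmetic, not part of the
algorithm. The CTCP function is (10) as written; the HSTCP constant
0.12 is taken from [RFC3649] rather than from this document; the
function names are hypothetical. Because the coefficient 0.255 in (10)
is rounded, the computed CTCP values drift slightly from the table for
larger windows (for example, 2550 rather than 2552 at p = 10^-5).

```python
def ctcp_window(p: float) -> float:
    """Equation (10): average CTCP window for packet drop rate p."""
    return 0.255 / p ** 0.8


def hstcp_window(p: float) -> float:
    """HighSpeed TCP response function, w = 0.12 / p^0.835 (RFC 3649)."""
    return 0.12 / p ** 0.835


def k_matching_exponent(exponent: float = 0.835) -> float:
    """Solve 1/(2-k) = exponent for k, as done when comparing (8) and (9)."""
    return 2 - 1 / exponent


if __name__ == "__main__":
    print(f"k matching HSTCP: {k_matching_exponent():.3f}")
    for e in range(3, 11):
        p = 10.0 ** -e
        print(f"10^-{e:<3} {ctcp_window(p):>12.0f} {hstcp_window(p):>12.0f}")
```

Solving for k gives approximately 0.802, which is the "around 0.8"
quoted above before k is rounded to 0.75 for ease of implementation.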

The values in Table 1 illustrate that our choice of parameters makes
CTCP slightly more aggressive than HSTCP at moderate and low packet
loss rates, while it approaches HSTCP for larger windows. We chose
this because, unlike HighSpeed TCP, CTCP's delay control is capable of
scaling back on detecting incipient congestion. As a result, we expect
CTCP to be more TCP friendly than HighSpeed TCP. We show that this is
in fact the case even under low buffering conditions in the presence
of high statistical multiplexing. The fairness considerations and
choice of gamma are detailed in later sections.

5. Automatic Selection of Gamma

To detect incipient congestion effectively, CTCP estimates the number
of backlogged packets at the bottleneck queue and compares this
estimate to a pre-defined threshold gamma. However, setting this
threshold is particularly difficult for CTCP (and for many other
similar delay-based approaches), because gamma largely depends on the
network configuration and the number of concurrent flows that compete
for the same bottleneck link, which are, unfortunately, unknown to
end-systems. Based on experimentation over varying conditions, we
selected gamma to be 30 packets. This value provided a good tradeoff
between TCP fairness and throughput. However, a fixed gamma can still
result in poor TCP friendliness over under-buffered network links. One
naive solution is to choose a very small value for gamma, but this can
falsely detect congestion and adversely affect throughput. To address
this problem, we use a method called tuning-by-emulation to adjust
gamma dynamically. The basic idea is to estimate the backlogged
packets of a Standard TCP flow along the same path by emulating the
behavior of a Standard TCP flow at runtime. Based on this estimate,
gamma is set so as to ensure good TCP friendliness. CTCP can then
automatically adapt to different network configurations (i.e., buffer
provisioning) and to the number of concurrent competing flows.

Our analytical model of CTCP shows that gamma should be less than
B/(m+l) to ensure the effectiveness of incipient congestion detection,
where B is the buffer size at the bottleneck and m and l are the
numbers of concurrent Standard TCP flows and CTCP flows, respectively,
that compete for the same bottleneck link [CTCPI06, CTCPP06, CTCPT].
Generally, both B and (m+l) are unknown to end-systems. It is very
difficult to estimate these values from end-systems in real time,
especially the number of flows, which can vary significantly over
time. Fortunately, there is a way to estimate the ratio B/(m+l)
directly, even though the individual values B and (m+l) are hard to
estimate. Let us first assume there are (m+l) regular TCP flows in the
network. These (m+l) flows should be able to share the bottleneck
capacity fairly in steady state. Therefore, they should also get
roughly equal shares of the buffer at the bottleneck, each equal to
B/(m+l). Although such a Standard TCP flow knows neither B nor (m+l),
it can still infer B/(m+l) easily by estimating its own backlogged
packets, a mature technique widely used in delay-based protocols.

This brings us to the core idea of CTCP's algorithm: CTCP lets the
sender emulate the congestion window of a Standard TCP flow. Using this
emulated window, we can estimate the buffer occupancy (Q) for a
Standard TCP flow. Q can be regarded as a conservative estimate of
B/(m+l), assuming that the high-speed flow is more aggressive than
Standard TCP. By choosing gamma <= Q, we can ensure TCP fairness.

The implementation is straightforward, because CTCP already emulates
Standard TCP as its loss-based component. We can simply estimate the
buffer occupancy of a competing Standard TCP flow from state that CTCP
already maintains. We choose an initial gamma = 30, and Q is
calculated as follows:

  expected_reno (throughput) = cwnd/basertt
  actual_reno (throughput) = cwnd/srtt
  diff_reno = (expected_reno - actual_reno) * basertt

The only difference between diff_reno and diff is that diff_reno is
computed using only the loss-based component cwnd. Since Standard TCP
reaches its maximum buffer occupancy just before a loss, CTCP uses the
diff_reno value computed in the preceding round to calculate gamma for
the next round. Whenever a loss happens, a gamma sample smaller than
diff_reno is chosen, and gamma is updated from these samples using a
standard exponentially weighted moving average. The pseudocode to
calculate gamma is shown below. Here a round tracks one window's worth
of data. We will provide more details on how to maintain a round in
Section 7.

  Initialization:

     diff_reno = invalid;
     gamma = 30;

  End-of-Round:

     expected_reno = cwnd / basertt;
     actual_reno = cwnd / srtt;
     diff_reno = (expected_reno - actual_reno) * basertt;

  On-Packet-Loss:

     if diff_reno is valid then
       g_sample = 3/4 * diff_reno;
       gamma = gamma*(1-lambda) + lambda*g_sample;
       if (gamma < gamma_low)
         gamma = gamma_low;
       else if (gamma > gamma_high)
         gamma = gamma_high;
       fi
       diff_reno = invalid;
     fi

The recommended values for gamma_low and gamma_high are 5 and 30,
respectively. diff_reno is set to invalid to prevent the use of stale
diff_reno data when there are consecutive losses between which no
samples were taken.
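
The pseudocode above translates almost line for line into Python. The
sketch below is one possible rendering, not the implementation
described in this document. The class name GammaTuner and its method
names are hypothetical, None stands in for the "invalid" marker, and
the EWMA weight lambda is not specified in the text, so the default of
1/8 is purely an assumption.

```python
from dataclasses import dataclass
from typing import Optional

GAMMA_LOW = 5.0     # recommended lower bound
GAMMA_HIGH = 30.0   # recommended upper bound


@dataclass
class GammaTuner:
    lam: float = 0.125                 # ASSUMPTION: lambda is not given in the text
    gamma: float = 30.0                # initial gamma
    diff_reno: Optional[float] = None  # None means "invalid"

    def end_of_round(self, cwnd: float, basertt: float, srtt: float) -> None:
        """Record the backlog a Standard TCP flow would have this round."""
        expected_reno = cwnd / basertt
        actual_reno = cwnd / srtt
        self.diff_reno = (expected_reno - actual_reno) * basertt

    def on_packet_loss(self) -> None:
        """Fold the last backlog sample into gamma, then invalidate it."""
        if self.diff_reno is None:
            return  # no fresh sample since the previous loss
        g_sample = 0.75 * self.diff_reno
        gamma = self.gamma * (1 - self.lam) + self.lam * g_sample
        self.gamma = min(max(gamma, GAMMA_LOW), GAMMA_HIGH)
        self.diff_reno = None


if __name__ == "__main__":
    tuner = GammaTuner()
    tuner.end_of_round(cwnd=100, basertt=0.100, srtt=0.125)
    tuner.on_packet_loss()
    print(tuner.gamma)
```

Invalidating diff_reno after each loss means that a second loss in the
same round leaves gamma unchanged, which matches the stated purpose of
avoiding stale samples.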

6. Implementation Issues

The first challenge is to design a mechanism that can precisely track
changes in round trip time with minimal overhead and can scale to
support many concurrent TCP connections. Naively taking an RTT sample
for every packet would be overkill for both CPU and system memory,
especially on high-speed and long-distance networks where the
congestion window can be very large. Therefore, CTCP needs to limit the
number of samples taken without compromising accuracy. In our
implementation, we take at most M samples per window of data. M is
chosen to scale with the round trip delay and window size.

To further improve memory efficiency, we have developed a memory
allocation mechanism that dynamically allocates sample buffers from a
fixed-size, per-processor kernel pool. The pool size should be chosen
as a function of the available system memory. As the window size
increases, M can be updated so that the samples are uniformly
distributed over the window. As M grows, more memory blocks are
allocated and linked to the existing sample buffers. If the sending
rate changes, either due to network conditions or due to application
behavior, the sample blocks are returned to the global memory pool.
This dynamic buffer management ensures the scalability of our
implementation, so that it works well even on a busy server hosting
tens of thousands of TCP connections simultaneously. Note that a
high-resolution timer may also be required to time RTT samples.

The rest of the implementation is rather straightforward. We add two
new state variables to the standard TCP Control Block, namely dwnd and
basertt. basertt tracks the minimum RTT sample seen so far and is used
as an estimate of the transmission delay of a single packet. basertt
is usually cleared if a retransmission timeout is hit, since it is a
good idea to re-measure basertt in case the network conditions have
changed. Following the common practice of high-speed protocols, CTCP
reverts to standard TCP behavior when the window is small. The
delay-based component only takes effect when cwnd is larger than some
threshold, currently set to 38 packets assuming a 1500-byte MTU. dwnd
is updated at the end of each round. Note that no RTT sampling or dwnd
update happens during the loss recovery phase, because retransmissions
during loss recovery may result in inaccurate RTT samples and can
adversely affect the delay-based control.

7. Deployment Issues

Several variations of TCP have been proposed for high-speed and
long-delay networks. We do not claim Compound TCP to be the best or
optimal algorithm. However, based on our extensive testing via
simulations and experimentation, including on production links and in
beta deployments of a reasonable scale, we believe that Compound TCP
satisfies the design considerations outlined earlier in this document.
It effectively uses spare bandwidth in high-speed networks, achieves
good intra-protocol fairness even in the presence of differing RTTs,
and does not adversely impact standard TCP. Further, Compound TCP does
not require any changes or any new feedback from the network and is
deployable over the current Internet in an incremental fashion. It
interoperates with Standard TCP and requires support only on the send
side of a TCP connection.

We also note that, similar to HighSpeed TCP, in environments typical
of much of the current Internet, Compound TCP behaves exactly like
Standard TCP. It does so by following the standard TCP algorithm
without modification whenever the congestion window is less than 38
packets. Only when the congestion window is greater than 38 packets is
the delay-based component of Compound TCP invoked. Thus, for example,
for a connection with an RTT of 100ms, the end-to-end bandwidth must
be greater than about 4.6Mbps (38 packets of 1500 bytes per 100ms RTT)
for the CTCP algorithm to respond to network conditions any
differently than standard TCP.

513	Further, we do not believe that the deployment of Compound TCP would
514	block the possible deployment of alternate experimental congestion
515	control algorithms such as Fast TCP [FAST] or CUBIC [CUBIC]. In
516	particular, Compound TCP's response has a fallback to a loss-based
517	function that has characteristics very similar to HS-TCP or N parallel
518	TCP connections.

520	8.    Security Considerations

522	This proposal makes no changes to the underlying security of the TCP
523	protocol.

525	9.    IANA Considerations

527	There are no IANA considerations regarding this proposal.

529	10.   Conclusions

531	This document proposes a novel congestion control algorithm for TCP for
532	high speed and long delay networks. By introducing a delay-based
533	component in addition to a standard TCP loss-based component, Compound
534	TCP is able to detect and effectively use spare bandwidth that may be
535	available on a high speed and long delay network. Further, the
536	delay-based component detects the onset of congestion early and
537	gracefully reduces the sending rate. The loss-based component ensures
538	an effective response to losses in the network and, in the absence of
539	losses, keeps the throughput of CTCP lower bounded by TCP Reno. Thus,
540	CTCP is not timid, nor does it induce more self-induced packet loss
541	than a single standard TCP flow. Compound TCP is therefore efficient in
542	consuming available bandwidth while being friendly to standard TCP.
543	Further, the delay component does not have any RTT bias, thereby
544	reducing the RTT bias of Compound TCP vis-a-vis standard TCP.

546	Compound TCP has been implemented as an optional component in the
547	Microsoft Windows Vista operating system. It has been tested through
548	broad Windows Vista beta deployments, where it has been verified to
549	meet its objectives without causing any adverse impact. SLAC has
550	also evaluated Compound TCP on production links. Based on the testing
551	and evaluation done so far, we believe Compound TCP is safe to deploy
552	on the current Internet. We welcome additional analysis, testing and
553	evaluation of Compound TCP by the Internet community at large and
554	continue to do additional testing ourselves.

556	11.   Acknowledgments

558	The authors would like to thank Jingmin Song for all his efforts in
559	evaluating the algorithm on the test beds. We are thankful to Yee-Ting
560	Li and Les Cottrell for testing and evaluation of Compound TCP on
561	Internet2 links [SLAC]. We would like to thank Sanjay Kaniyar for his
562	insightful comments and for driving this project at Microsoft. We are
563	also thankful to the Microsoft.com data center staff who helped us
564	evaluate Compound TCP on their production links. In addition, several
565	folks from the Internet research community who attended the High-Speed
566	TCP Summit at Microsoft [MSWRK] have provided valuable feedback on
567	Compound TCP. Finally, we are thankful to the Windows Vista program
568	beta participants who helped us test and evaluate CTCP.

570	12.   References

572	12.1. Normative References

574	   [RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
575	             Control", RFC 2581, April 1999.

577	12.2. Informative References

579	   [AFRICA]  R. King, R. Baraniuk and R. Riedi, "TCP-Africa: An
580	             Adaptive and Fair Rapid Increase Rule for Scalable
581	             TCP", In Proc. INFOCOM 2005.

583	   [BAINF01] D. Bansal and H. Balakrishnan, "Binomial Congestion
584	             Control Algorithms", Proc INFOCOM 2001.

586	   [CTCPI06] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "A
587	             Compound TCP Approach for High-speed and Long Distance
588	             Networks", in IEEE Infocom, April 2006, Barcelona,
589	             Spain.

591	   [CTCPP06] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "Compound
592	             TCP: A Scalable and TCP-friendly Congestion Control for
593	             High-speed Networks", in 4th International workshop on
594	             Protocols for Fast Long-Distance Networks (PFLDNet),
595	             2006, Nara, Japan.

597	   [CTCPT]   K. Tan, J. Song, M. Sridharan, and C.Y. Ho, "CTCP:
598	             Improving TCP-Friendliness Over Low-Buffered Network
599	             Links", Microsoft Technical Report.

601	   [CUBIC]   I. Rhee, L. Xu and S. Ha, "CUBIC for fast long distance
602	             networks", Internet Draft, Expires Aug 31, 2007, draft-
603	             rhee-tcp-cubic-00.txt

605	   [FAST]    C. Jin, D. Wei, S. Low, "FAST TCP: Motivation,
606	             Architecture, Algorithms, Performance", in IEEE Infocom
607	             2004.

609	   [MSWRK]   Microsoft High-Speed TCP Summit,
610	             http://research.microsoft.com/events/TCPSummit/

612	   [PADHYE]  J. Padhye, V. Firoiu, D. Towsley and J. Kurose, "Modeling
613	             TCP Throughput: A Simple Model and its Empirical
614	             Validation", in Proc. ACM SIGCOMM 1998.

616	   [RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
617	             Timer", RFC 2988, November 2000.

619	   [RFC3649] S. Floyd, "HighSpeed TCP for Large Congestion Windows",
620	             RFC 3649, Dec 2003.

622	   [SLAC]    Yee-Ting Li, "Evaluation of TCP Congestion Control
623	             Algorithms on the Windows Vista Platform", SLAC-TN-06-
624	             005,
625	             http://www.slac.stanford.edu/pubs/slactns/tn04/slac-tn-
626	             06-005.pdf

628	   [VEGAS]   L. Brakmo, S. O'Malley, and L. Peterson, "TCP Vegas: New
629	             techniques for congestion detection and avoidance", in
630	             Proc. ACM SIGCOMM, 1994.

632	Authors' Addresses

634	   Murari Sridharan
635	   Microsoft Corporation
636	   1 Microsoft Way, Redmond 98052

638	   Email: muraris@microsoft.com

640	   Kun Tan
641	   Microsoft Research
642	   5/F, Beijing Sigma Center
643	   No.49, Zhichun Road, Hai Dian District
644	   Beijing China 100080

646	   Email: kuntan@microsoft.com

648	   Deepak Bansal
649	   Microsoft Corporation
650	   1 Microsoft Way, Redmond 98052

652	   Email: dbansal@microsoft.com

654	   Dave Thaler
655	   Microsoft Corporation
656	   1 Microsoft Way, Redmond 98052

658	   Email: dthaler@microsoft.com

660	Intellectual Property Statement

662	   The IETF takes no position regarding the validity or scope of any
663	   Intellectual Property Rights or other rights that might be claimed
664	   to pertain to the implementation or use of the technology described
665	   in this document or the extent to which any license under such
666	   rights might or might not be available; nor does it represent that
667	   it has made any independent effort to identify any such rights.
668	   Information on the procedures with respect to rights in RFC
669	   documents can be found in BCP 78 and BCP 79.

671	   Copies of IPR disclosures made to the IETF Secretariat and any
672	   assurances of licenses to be made available, or the result of an
673	   attempt made to obtain a general license or permission for the use
674	   of such proprietary rights by implementers or users of this
675	   specification can be obtained from the IETF on-line IPR repository
676	   at http://www.ietf.org/ipr.

678	   The IETF invites any interested party to bring to its attention any
679	   copyrights, patents or patent applications, or other proprietary
680	   rights that may cover technology that may be required to implement
681	   this standard.  Please address the information to the IETF at
682	   ietf-ipr@ietf.org.

684	Disclaimer of Validity

686	   This document and the information contained herein are provided on
687	   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
688	   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
689	   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
690	   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
691	   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
692	   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
693	   FOR A PARTICULAR PURPOSE.

695	Copyright Statement
696	   Copyright (C) The IETF Trust (2007).
697	   This document is subject to the rights, licenses and restrictions
698	   contained in BCP 78, and except as set forth therein, the authors
699	   retain all their rights.

701	Acknowledgment
702	   Funding for the RFC Editor function is currently provided by the
703	   Internet Society.