TCP Maintenance Working Group                                  M. Mathis
Internet-Draft                                               Google, Inc
Intended status: Experimental                          February 21, 2012
Expires: August 24, 2012

   Laminar TCP and the case for refactoring TCP congestion control
                 draft-mathis-tcpm-tcp-laminar-00.txt

Abstract

   The primary state variables used by all TCP congestion control
   algorithms, cwnd and ssthresh, are heavily overloaded, carrying
   different semantics in different states.  This leads to excess
   implementation complexity and poorly defined behaviors under some
   combinations of events, such as loss recovery during cwnd
   validation.  We propose a new framework for TCP congestion control
   and recast the current standard algorithms to use new state
   variables.  This new framework will not generally change the
   behavior of any of the primary congestion control algorithms when
   invoked in isolation, but it will permit new algorithms with better
   behaviors in many corner cases, such as when two distinct primary
   algorithms are invoked concurrently.  It will also foster the
   creation of new algorithms to address some events that are poorly
   treated by today's standards.  For the vast majority of traditional
   algorithms the transformation to the new state variables is
   completely straightforward.  However, the resulting implementation
   will technically be in violation of all existing TCP standards, even
   if it is fully compliant with their principles and intent.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 24, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Overview of the new algorithm
     1.2.  Standards Impact
     1.3.  Meta Language
   2.  State variables and definitions
   3.  Updated Algorithms
     3.1.  Congestion avoidance
     3.2.  Proportional Rate Reduction
     3.3.  Restart after idle, Congestion Window Validation and Pacing
     3.4.  RTO and F-RTO
     3.5.  Undo
     3.6.  Control Block Interdependence
     3.7.  New Reno
   4.  Example Pseudocode
   5.  Compatibility with existing implementations
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
   Author's Address

1.  Introduction

   The primary state variables used by all TCP congestion control
   algorithms, cwnd and ssthresh, are heavily overloaded, carrying
   different semantics in different states.  This leads to excess
   implementation complexity and poorly defined behaviors under some
   combinations of events, such as overlapping application stalls and
   loss recovery.  Multiple algorithms sharing the same state variables
   lead to excess complexity and conflicting correctness constraints,
   making it unreasonably difficult to implement, test and evaluate new
   algorithms.

   We propose a new framework for TCP congestion control that uses new
   state variables to separate transmission scheduling, which
   determines precisely when data is sent, from congestion control,
   which determines the amount of data to be sent in each RTT.
   This separation greatly simplifies the interactions between the two
   subsystems and permits a vast range of new algorithms that are not
   feasible with the current parameterization.

   This note describes the new framework, represented through its state
   variables, and presents a preliminary mapping between current
   standards and new algorithms based on the new state variables.  At
   this point the new algorithms are not fully specified, and many
   still have unconstrained design choices.  In most cases, our goal is
   to precisely mimic today's standard TCP, at least as far as its
   well-defined primary behaviors are concerned.  In general, it is a
   non-goal to mimic behaviors in poorly defined corner cases, or other
   cases where standard behaviors are viewed as being problematic.

   It is called Laminar because one of its design goals is to eliminate
   unnecessary turbulence introduced by TCP itself.

1.1.  Overview of the new algorithm

   The new framework separates transmission scheduling, which
   determines precisely when data is sent, from congestion control,
   which determines the total amount of data sent in any given RTT.

   The default algorithm for transmission scheduling is a strict
   implementation of Van Jacobson's packet conservation principle
   [Jacobson88].  Data arriving at the receiver causes ACKs, which in
   turn cause the sender to transmit an equivalent quantity of data
   back into the network.  The primary state variable is implicit in
   the quantity of data and ACKs circulating in the network.  This
   state is observed through a new "total_pipe" estimator, which is a
   generalization of "pipe" as described in RFC 3517 [RFC3517].

   A new state variable, CCwin, is the primary congestion control state
   variable.  It is updated only by the congestion control algorithms,
   which are concerned with detecting and regulating the overall level
   of congestion along the path.  CCwin is TCP's best estimate for an
   appropriate average window size.  In general, it rises when the
   network seems to be underfilled and is reduced in the presence of
   congestion signals, such as loss, ECN marks or increased delay.
   Although CCwin resembles cwnd, it is actually quite different; for
   one thing, the new parameterization does not use ssthresh at all.

   Any time CCwin is larger than total_pipe, the default algorithm to
   grow total_pipe is for each ACK to trigger one segment of additional
   data.  This is essentially an implicit slowstart, but it is gated by
   the difference between CCwin and total_pipe, rather than the
   difference between cwnd and ssthresh.

   During Fast Retransmit, the congestion control algorithm, such as
   CUBIC, generally reduces CCwin in a single step.  Proportional Rate
   Reduction [I-D.ietf-tcpm-proportional-rate-reduction] is used to
   gradually reduce total_pipe to agree with CCwin.  PRR is based on
   Laminar principles, so its specification has many parallels to this
   document.

   Connection startup is accomplished as follows: CCwin is set to
   MAX_WINDOW (akin to ssthresh), and IW segments are transmitted.  The
   ACKs from these segments trigger additional data transmissions, and
   slowstart proceeds as it does today.  The very first congestion
   event is a special case because there is no prior value for CCwin.
   By default, on the first congestion event only, CCwin would be set
   from total_pipe, and then standard congestion control is invoked.
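   The following Python sketch is purely illustrative and is not part
   of this specification: it shows the gating idea described above,
   with CCwin and total_pipe kept in segment units.  The class name,
   the initial window, and the slowstart multiplier used to prime CCwin
   are assumptions made for this example only.

      # Illustrative sketch: transmissions are clocked by the difference
      # between CCwin and total_pipe rather than by cwnd/ssthresh.
      MAX_WINDOW = float("inf")   # "largest possible value" priming CCwin

      class LaminarSketch:
          def __init__(self, iw=10):
              self.ccwin = MAX_WINDOW   # congestion control state variable
              self.total_pipe = iw      # circulating data and ACKs (segments)

          def on_ack(self, delivered=1):
              """Return how many segments to send in response to this ACK."""
              self.total_pipe -= delivered      # ACKed data has left the network
              if self.total_pipe < self.ccwin:
                  sndcnt = delivered + 1        # implicit slowstart: one extra segment
              else:
                  sndcnt = delivered            # pure packet conservation
              self.total_pipe += sndcnt
              return sndcnt

          def on_first_congestion(self, ss_multiplier=2.0):
              """Prime CCwin from total_pipe on the very first congestion event."""
              if self.ccwin == MAX_WINDOW:
                  self.ccwin = self.total_pipe / ss_multiplier

   Note that nothing in the sketch ever consults ssthresh; the only
   comparison made is between CCwin and total_pipe.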
   The primary advantage of the Laminar framework is that by
   partitioning congestion control and transmission scheduling into
   separate subsystems, each is subject to far simpler design
   constraints, making it far easier to develop many new algorithms
   that are not feasible with the current organization of the code.

1.2.  Standards Impact

   Since we are proposing to refactor existing standards into new state
   variables, all of the current congestion control standards documents
   will potentially need to be revised.  Note that there are roughly 60
   RFCs that mention cwnd or ssthresh, and all of them should be
   reviewed for material that may need to be updated.

   This document does not propose to change the TCP friendly paradigm.
   By default, all updated algorithms using these new state variables
   would have behaviors similar to the current TCP implementations.  We
   do, however, anticipate some second order effects, which we address
   in Section XXX below.  For example, while testing PRR it was
   observed that suppressing bursts by slightly delaying transmissions
   can improve average performance, even though in a strict sense the
   new algorithm is less aggressive than the old.

1.3.  Meta Language

   We use the following terms when describing algorithms and their
   alternatives:

   Standard - The current state of the art, including both formal
   standards and widely deployed algorithms that have come into
   standard use, even though they may not be formally specified.
   [Although PRR does not yet technically meet these criteria, we
   include it here.]

   default - The simplest or most straightforward algorithm that fits
   within the Laminar framework.  For example, implicit slowstart
   whenever total_pipe is less than CCwin.  This term does not make a
   statement about the relative aggressiveness or any other properties
   of the algorithm except that it is a reasonable choice and
   straightforward to implement.

   conformant - An algorithm that can produce the same packet trace as
   a TCP implementation that strictly conforms to the current
   standards.

   mimic - An algorithm constructed to be conformant to standards.

   opportunity - An algorithm that can do something better than the
   standard algorithm, typically better behavior in a corner case that
   is either not well specified or where the standard behavior is
   viewed as being less than ideal.

   more/less aggressive - Any algorithm that sends segments earlier/
   later than another (typically conformant) algorithm under identical
   sequences of events.  Note that this is an evaluation of the packet
   level behavior, and does not reflect any higher order effects.

   Net more/less aggressive - Any algorithm that gets more/less average
   data rate than another (typically conformant) algorithm.  This is an
   empirical statement based on measurement (or perhaps justified
   speculation), and potentially indicates a problem with failing to be
   "TCP friendly".

2.  State variables and definitions

   CCwin - The primary congestion control state variable.

   DeliveredData - The total number of bytes that the current ACK
   indicates have been delivered to the receiver.  (See PRR for more
   detail.)

   total_pipe - The total quantity of circulating data and ACKs.  In
   addition to RFC 3517 pipe, it includes DeliveredData for the current
   ACK, plus any data held for delayed transmission, for example to
   permit a later TSO transmission.

   sndcnt - The quantity of data to be sent in response to the current
   event.

   application stall - The application is failing to keep TCP in bulk
   mode: either the sender is running out of data to send, or the
   receiver is not reading it fast enough.  When there is an
   application stall, congestion control does not regulate data
   transmission and some of the protocol events are triggered by
   application reads or writes, as appropriate.
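   As a small illustration of the definitions above (the helper names
   and arguments are assumed for this example and are not taken from
   any implementation), the per-ACK quantities might be computed as
   follows:

      def delivered_data(prev_snd_una, snd_una, prev_sacked, sacked):
          """Bytes newly reported delivered by this ACK (cumulative + SACK)."""
          return (snd_una - prev_snd_una) + (sacked - prev_sacked)

      def total_pipe(rfc3517_pipe, delivered, held_for_tso):
          """RFC 3517 pipe, plus this ACK's DeliveredData, plus any data
          held back from transmission (e.g. waiting to form a TSO burst)."""
          return rfc3517_pipe + delivered + held_for_tso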
3.  Updated Algorithms

   A survey of standard, common and proposed algorithms, and how they
   might be reimplemented under the Laminar framework.

3.1.  Congestion avoidance

   Under the Laminar framework the loss recovery mechanism does not, by
   default, interfere with the primary congestion control algorithms.
   The CCwin state variable is updated only by the algorithms that
   decide how much data to send on successive round trips.  For
   example, standard Reno AIMD congestion control [RFC5681] can be
   implemented by raising CCwin by one segment every CCwin worth of
   ACKs (once per RTT) and halving it on every loss or ECN signal
   (e.g., CCwin = CCwin/2).  During recovery the transmission
   scheduling part of the Laminar framework makes the necessary
   adjustments to bring total_pipe to agree with CCwin, without
   tampering with CCwin.

   This separation between computing CCwin and transmission scheduling
   will enable new classes of congestion control algorithms, such as
   fluid models that adjust CCwin on every ACK, even during recovery.
   This is safe because raising CCwin does not directly trigger any
   transmissions; it just steers the transmission scheduling closer to
   the end of recovery.  Fluid models have a number of advantages, such
   as simpler closed form mathematical representations, and are
   intrinsically more tolerant to reordering since non-recovery
   disordered states don't inhibit growing the window.

   Investigating alternative algorithms and their impact is out of
   scope for this document.  It is important to note that while our
   goal here is not to alter the TCP friendly paradigm, Laminar does
   not include any implicit or explicit mechanism to prevent a Tragedy
   of the Commons.  However, see the comments in Section 6.

   The initial slowstart does not use CCwin, except that CCwin starts
   at the largest possible value.  It is the transmission scheduling
   algorithms that are responsible for performing the slowstart.  On
   the first loss it is necessary to compute a reasonable CCwin from
   total_pipe.  Ideally, we might save total_pipe at the time each
   segment is scheduled for transmission, and use the saved value
   associated with the lost segment to prime CCwin.  However, this
   approach requires extra state attached to every segment in the
   retransmit queue.  A simpler approach is to have a mathematical
   model of the slowstart, and to prime CCwin from total_pipe at the
   time the loss is detected, but scaled down by the effective
   slowstart multiplier (e.g. 1.5 or 2).  In either case, once CCwin is
   primed from total_pipe, it is typically appropriate to invoke the
   reduction-on-loss function, to reduce it again per the congestion
   control algorithm.

   Nearly all congestion control algorithms need to have some mechanism
   to prevent CCwin from growing while it is not regulating
   transmissions, e.g., during application stalls.
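   For illustration only, Reno-style AIMD expressed purely in terms of
   CCwin might look like the following sketch; the byte units, the MSS
   constant and the function names are assumptions of this example, and
   the transmission scheduler never appears here:

      MSS = 1460  # assumed segment size in bytes

      def ccwin_on_ack(ccwin, delivered_bytes, application_limited):
          """Additive increase: about one MSS per CCwin worth of ACKed data."""
          if application_limited:
              return ccwin                  # do not grow while not regulating
          return ccwin + MSS * delivered_bytes / ccwin

      def ccwin_on_congestion(ccwin):
          """Multiplicative decrease on a loss or ECN signal."""
          return ccwin / 2

   Because these functions never consult pipe or recovery state, they
   can be invoked on every ACK, including during recovery, as discussed
   above.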
3.2.  Proportional Rate Reduction

   Since PRR [I-D.ietf-tcpm-proportional-rate-reduction] was designed
   with Laminar principles in mind, updating it is a straightforward
   variable substitution.  CCwin replaces ssthresh, and RecoverFS is
   initialized from total_pipe at the beginning of recovery.  Thus PRR
   provides a gradual window reduction from the prior total_pipe down
   to the new CCwin.

   There is one important difference from the current standards: CCwin
   is computed solely on the basis of the prior value of CCwin.
   Compare this to RFC 5681, which specifies that the congestion
   control function is computed on the basis of the FlightSize (e.g.,
   ssthresh = FlightSize/2).  This change from the prior standard
   completely alters how application stalls interact with congestion
   control.

   Consider what happens if there is an application stall for most of
   the RTT just before a Fast Retransmit: under Laminar it is likely
   that CCwin will be set to a value that is larger than total_pipe,
   and, subject to available application data, PRR will go directly to
   slowstart mode, to raise total_pipe up to CCwin.  Note that the
   final CCwin value does not depend on the duration of the application
   stall.

   With standard TCP, any application stall reduces the final value of
   cwnd at the end of recovery.  In some sense application stalls
   during recovery are treated as though they are additional losses,
   and have a detrimental effect on the connection data rate that lasts
   far longer than the stall itself.

   If there are no application stalls, the standard and Laminar
   variants of the PRR algorithm should have identical behaviors.
   Although it is tempting to characterize Laminar as being more
   aggressive than the standards, it would be more apropos to
   characterize the standard as being excessively timid under common
   combinations of overlapping events that are not well represented by
   benchmarks or models.

3.3.  Restart after idle, Congestion Window Validation and Pacing

   Decoupling congestion control from transmission scheduling permits
   us to develop new algorithms to raise total_pipe to CCwin after an
   application stall or other events.  Although it was stated earlier
   that the default transmission scheduling algorithm for raising
   total_pipe is an implicit slowstart, there is ample opportunity for
   better algorithms.

   We imagine a new class of hybrid transmission scheduling algorithms
   that use a combination of pacing and slowstart to reestablish TCP's
   self clock.  For example, whenever total_pipe is significantly below
   CCwin, RTT and CCwin can be used to directly compute a pacing rate.
   We suspect that pacing at the previous full rate will prove to be
   somewhat brittle, yielding erratic results.  It is more likely that
   a hybrid strategy will work better, for example by pacing at some
   fraction (1/2 or 1/4) of the prior rate until total_pipe reaches
   some fraction of CCwin (e.g. CCwin/2) and then using conventional
   slowstart to bring total_pipe the rest of the way up to CCwin, as
   sketched below.
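   A minimal sketch of such a hybrid restart policy follows; the pacing
   fraction, the switch-over point, the MSS constant and the function
   name are illustrative assumptions rather than recommendations:

      MSS = 1460  # assumed segment size in bytes

      def restart_plan(ccwin, total_pipe, rtt,
                       pace_fraction=0.5, switch_fraction=0.5):
          """Return ('pace', seconds_between_segments) while total_pipe is
          well below CCwin, then ('slowstart', None) once it has recovered
          far enough for the self clock to take over."""
          if total_pipe < switch_fraction * ccwin:
              rate = pace_fraction * ccwin / rtt    # bytes per second
              return ("pace", MSS / rate)           # inter-segment interval
          return ("slowstart", None)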
   Such a hybrid restart is far less aggressive than standard TCP
   without cwnd validation [RFC2861] or when the application stall was
   less than one RTO, since standards permit TCP to send a full cwnd
   size burst in these situations.  It is potentially more aggressive
   than conventional slowstart invoked by cwnd validation when the
   application stall is longer than several RTOs.  Both standard
   behaviors in these situations have always been viewed as
   problematic, because interface rate bursts are clearly too
   aggressive and a full slowstart is clearly too conservative.
   Mimicking either is a non-goal when there is ample opportunity to
   find a better compromise.

   Although strictly speaking any new transmission scheduling
   algorithms are independent of the Laminar framework, they are
   expected to have substantially better behavior in many common
   environments and as such strongly motivate the effort required to
   refactor TCP implementations and standards.

3.4.  RTO and F-RTO

   We are not proposing any changes to the RTO timer or the F-RTO
   [RFC5682] algorithm used to detect spurious retransmissions.  Once
   it is determined that segments were lost, CCwin is updated to a new
   value as determined by the congestion control function, and Laminar
   implicit slowstart is used to clock out (re)transmissions.  Once all
   holes are filled, hybrid paced transmissions can be used to
   reestablish TCP's self clock at the new data rate.  This can be the
   same hybrid pacing algorithm as is used to recover the self clock
   after application stalls.

   Note that as long as there is non-contiguous data at the receiver
   the retransmission algorithms require timely SACK information to
   make proper decisions about which segments to send.  Pacing during
   loss recovery is not recommended without further investigation.

3.5.  Undo

   Since CCwin is not used to implement transmission scheduling, undo
   is trivial.  CCwin can just be set back to a prior value and the
   transmission scheduling algorithm will transmit more (or less) data
   as needed.

3.6.  Control Block Interdependence

   Under the Laminar framework, congestion control state can be easily
   shared between connections [RFC2140].  An ensemble of connections
   can each maintain their own total_pipe (partial_pipe?), which in
   aggregate tracks a single common CCwin.  A master transmission
   scheduler allocates permission to send (sndcnt) to each of the
   constituent connections on the basis of the difference between CCwin
   and the aggregate total_pipe, and a fairness or capacity allocation
   policy that balances the flows.  Note that ACKs on one connection in
   an ensemble might be used to clock transmissions on another
   connection, and that following a loss, the window reductions can be
   allocated to flows other than the one experiencing the loss.

3.7.  New Reno

   The key to making Laminar function well without SACK is having good
   estimators for DeliveredData and total_pipe.  By definition every
   duplicate ACK indicates that one segment has arrived at the receiver
   and total_pipe has fallen by one.  On any ACK that advances snd.una,
   total_pipe can be updated from snd.nxt - snd.una, and DeliveredData
   is the change in snd.una, minus the estimated DeliveredData of the
   preceding duplicate ACKs.
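   The following sketch (with assumed names and byte units) illustrates
   the non-SACK estimators described above:

      MSS = 1460  # assumed segment size in bytes

      def on_ack_newreno(snd_una, prev_snd_una, snd_nxt, dupacks_since_cum_ack):
          """Return (DeliveredData, total_pipe) estimates without SACK."""
          if snd_una == prev_snd_una:
              delivered = MSS               # duplicate ACK: one segment arrived
          else:
              advanced = snd_una - prev_snd_una
              # subtract what the preceding duplicate ACKs already reported
              delivered = max(advanced - dupacks_since_cum_ack * MSS, 0)
          pipe = snd_nxt - snd_una          # outstanding data estimate
          return delivered, pipe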
4.  Example Pseudocode

   The example pseudocode in this section incorporates (or subsumes)
   the following algorithms:

   On startup:

      CCwin = MAX_WINDOW
      sndBank = IW

   On every ACK:

      DeliveredData = delta(snd.una) + delta(SACKd)
      pipe = (RFC 3517 pipe algorithm)
      total_pipe = pipe + DeliveredData + sndBank
      sndcnt = DeliveredData                // Default outcome

      if new_recovery():
          if CCwin == MAX_WINDOW:
              CCwin = total_pipe/2          // First time only
          CCwin = CCwin/2                   // Reno congestion control
          prr_delivered = 0                 // Total bytes delivered during recovery
          prr_out = 0                       // Total bytes sent during recovery
          RecoverFS = total_pipe            // Snapshot of total_pipe for PRR

      if !in_recovery() && !application_limited():
          CCwin += MSS*MSS/CCwin            // Additive increase: about 1 MSS per RTT
      prr_delivered += DeliveredData        // noop if not in recovery

      if total_pipe > CCwin:
          // Proportional Rate Reduction
          sndcnt = CEIL(prr_delivered * CCwin / RecoverFS) - prr_out
      else if total_pipe < CCwin:
          if in_recovery():
              // PRR Slow Start Reduction Bound
              limit = MAX(prr_delivered - prr_out, DeliveredData) + MSS
              sndcnt = MIN(CCwin - total_pipe, limit)
          else:
              // slowstart with appropriate byte counting
              inc = MIN(DeliveredData, 2*MSS)
              sndcnt = DeliveredData + inc

      // cue the (re)transmission machinery
      sndBank += sndcnt
      limit = maxBank()
      if sndBank > limit:
          sndBank = limit
      tcp_output()

   For any data transmission or retransmission:

      tcp_output():
          while sndBank && tso_ok():
              len = sendsomething()
              sndBank -= len
              prr_out += len                // noop if not in recovery

5.  Compatibility with existing implementations

   On a segment by segment basis, the above algorithm is [believed to
   be] fully conformant with or less aggressive than standards under
   all conditions.

   However, this condition is not sufficient to guarantee that average
   performance can't be substantially better (net more aggressive) than
   standards.  Consider an application that keeps TCP in bulk mode
   nearly all of the time, but has occasional pauses that last some
   fraction of one RTT.  A fully conformant TCP would be permitted to
   "catch up" by sending a partial window burst at full interface rate.
   In some networks, such bursts might be very disruptive, causing
   otherwise unnecessary packet losses and corresponding cwnd
   reductions.

   In Laminar, such a burst would be permitted, but the default
   algorithm would be slowstart.  A better algorithm would be to pace
   the data at (some fraction of) the prior rate.  Neither pacing nor
   slowstart is likely to cause unnecessary losses, and as was observed
   while testing PRR, being less aggressive at the segment level has
   the potential to increase average performance [IMC11PRR].  In this
   scenario Laminar with pacing has the potential to outperform both of
   the behaviors described by standards.

6.  Security Considerations

   The Laminar framework does not, by itself, change the risk profile
   of TCP or other transport protocols.

   However, the complexity of current algorithms as embodied in today's
   code presents a substantial barrier to people wishing to cheat "TCP
   friendliness".  It is a fairly well known and easily rediscovered
   result that custom tweaks to make TCP more aggressive in one
   environment generally make it fragile and perform less well across
   the extreme diversity of the Internet.  This negative outcome is a
   substantial intrinsic barrier to wide deployment of rogue congestion
   control algorithms.
   A direct consequence of the changes proposed in this note,
   decoupling congestion control from other algorithms, is likely to
   lower that barrier to rogue algorithms.  However, this separation
   and the ability to introduce new congestion control algorithms is a
   key part of the motivation for this work.

   It is also important to note that web browsers have already largely
   defeated TCP's ability to regulate congestion by opening many
   concurrent connections.  When a Web page contains content served
   from multiple domains (the norm these days), all modern browsers
   open between 35 and 60 connections (see:
   http://www.browserscope.org/?category=network ).  This is the Web
   community's deliberate workaround for TCP's perceived poor
   performance and inability to fill certain kinds of consumer grade
   networks.  As a consequence, the transport layer has already lost a
   substantial portion of its ability to regulate congestion.  It was
   not anticipated that the tragedy of the commons in Internet
   congestion would be driven by competition between applications and
   not TCP implementations.

   In the short term, we can continue to try to use standards and peer
   pressure to moderate the rise in overall congestion levels; however,
   the only real solution is to develop mechanisms in the Internet
   itself to apply some sort of backpressure to overly aggressive
   applications and transport protocols.  We need to redouble efforts
   by the ConEx WG and others to develop mechanisms to inform policy
   with information about congestion and its causes.  Otherwise we have
   a looming tragedy of the commons, in which TCP has only a minor
   role.

   Implementers that change Laminar from counting bytes to segments
   have to be cautious about the effects of ACK splitting attacks
   [Savage99], where the receiver acknowledges partial segments for the
   purpose of confusing the sender's congestion accounting.

7.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

8.  References

   [Jacobson88]
              Jacobson, V., "Congestion Avoidance and Control",
              SIGCOMM 18(4), August 1988.

   [RFC2140]  Touch, J., "TCP Control Block Interdependence", RFC 2140,
              April 1997.

   [RFC2861]  Handley, M., Padhye, J., and S. Floyd, "TCP Congestion
              Window Validation", RFC 2861, June 2000.

   [RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
              Conservative Selective Acknowledgment (SACK)-based Loss
              Recovery Algorithm for TCP", RFC 3517, April 2003.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP", RFC 5682,
              September 2009.

   [I-D.ietf-tcpm-proportional-rate-reduction]
              Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional
              Rate Reduction for TCP",
              draft-ietf-tcpm-proportional-rate-reduction-00 (work in
              progress), October 2011.

   [IMC11PRR] Mathis, M., Dukkipati, N., Cheng, Y., and M. Ghobadi,
              "Proportional Rate Reduction for TCP", Proceedings of the
              2011 ACM SIGCOMM Internet Measurement Conference, 2011.

   [Savage99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
              "TCP congestion control with a misbehaving receiver",
              SIGCOMM Comput. Commun. Rev. 29(5), October 1999.
Author's Address

   Matt Mathis
   Google, Inc
   1600 Amphitheater Parkway
   Mountain View, California 93117
   USA

   Email: mattmathis@google.com