TCP Maintenance Working Group                                  M. Mathis
Internet-Draft                                               Google, Inc
Intended status: Experimental                              July 15, 2012
Expires: January 16, 2013

    Laminar TCP and the case for refactoring TCP congestion control
                  draft-mathis-tcpm-tcp-laminar-01.txt

Abstract

   The primary state variables used by all TCP congestion control
   algorithms, cwnd and ssthresh, are heavily overloaded, carrying
   different semantics in different states.  This leads to excess
   implementation complexity and poorly defined behaviors under some
   combinations of events, such as application stalls during loss
   recovery.  We propose a new framework for TCP congestion control and
   a recasting of the current standard algorithms to use new state
   variables.  This new framework will not generally change the
   behavior of any of the primary congestion control algorithms when
   they are invoked in isolation.  It will permit new algorithms with
   better behaviors in many corner cases, such as when two distinct
   primary algorithms are invoked concurrently.  It will also foster
   the creation of new algorithms to address some events that are
   poorly treated by today's standards.  For the vast majority of
   traditional algorithms the transformation to the new state variables
   is completely straightforward.  However, the resulting
   implementation is likely to be technically in violation of existing
   TCP standards, even if it is fully compliant with their principles
   and intent.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Overview of the new algorithm
   3.  Standards Impact
   4.  Meta Language
   5.  State variables and definitions
   6.  Updated Algorithms
     6.1.  Congestion avoidance
     6.2.  Proportional Rate Reduction
     6.3.  Restart after idle, Congestion Window Validation and Pacing
     6.4.  RTO and F-RTO
     6.5.  Undo
     6.6.  Control Block Interdependence
     6.7.  New Reno
   7.  Example Pseudocode
   8.  Compatibility with existing implementations
   9.  Security Considerations
   10. IANA Considerations
   11. References
   Author's Address

1.  Introduction

   The primary state variables used by all TCP congestion control
   algorithms, cwnd and ssthresh, are heavily overloaded, carrying
   different semantics in different states.  Having multiple algorithms
   share the same state variables leads to excess complexity and
   conflicting correctness constraints, and makes it unreasonably
   difficult to implement, test, and evaluate new algorithms.

   We propose a new framework for TCP congestion control that
   separates transmission scheduling, which determines precisely when
   data is sent, from pure congestion control, which determines the
   amount of data to be sent in each RTT.  This separation is
   implemented with new state variables and greatly simplifies the
   interactions between the two subsystems.  It permits a vast range
   of new algorithms that are not feasible with the current
   parameterization.

   This note describes the new framework and presents a preliminary
   mapping between current standards and new algorithms based on the
   new state variables.
   At this point the new algorithms are not fully specified, and many
   still have unconstrained design choices.  In most cases, our goal
   is to precisely mimic today's standard TCP, at least as far as its
   well-defined primary behaviors are concerned.  In general, it is a
   non-goal to mimic behaviors in poorly defined corner cases, or in
   other cases where the standard behaviors are viewed as problematic.

   The framework is called Laminar because one of its design goals is
   to eliminate unnecessary turbulence introduced by TCP itself.

2.  Overview of the new algorithm

   The new framework separates transmission scheduling, which
   determines precisely when data is sent, from pure congestion
   control, which determines the total amount of data sent in any
   given RTT.

   The default algorithm for transmission scheduling is a strict
   implementation of Van Jacobson's packet conservation principle
   [Jacobson88].  Data arriving at the receiver causes ACKs, which in
   turn cause the sender to transmit an equivalent quantity of data
   back into the network.  The primary state variable is implicit in
   the quantity of data and ACKs circulating in the network.  This
   state is observed through an improved "total_pipe" estimator, which
   is based on "pipe" as described in RFC 3517 [RFC3517], but which
   also includes the quantity of data reported by the current ACK and
   any pending transmissions that have passed congestion control but
   are waiting for other events such as TSO.

   A new state variable, CCwin, is the primary congestion control
   state variable.  It is updated only by the congestion control
   algorithms, which are concerned with detecting and regulating the
   overall level of congestion along the path.  CCwin is TCP's best
   estimate of an appropriate average window size.  In general, it
   rises when the network seems to be underfilled and is reduced in
   the presence of congestion signals, such as loss, ECN marks, or
   increased delay.  Although CCwin resembles cwnd, cwnd is overloaded
   and used by multiple algorithms (such as burst suppression) with
   different and sometimes conflicting goals.

   Any time total_pipe differs from CCwin, the transmission scheduling
   algorithm slightly adjusts the number of segments sent in response
   to each ACK.  Slow start and Proportional Rate Reduction [PRRid]
   are both embedded in the transmission scheduling algorithm.

   If CCwin is larger than total_pipe, the default algorithm to grow
   total_pipe is for each ACK to trigger one segment of additional
   data.  This is essentially an implicit slowstart, but it is gated
   by the difference between CCwin and total_pipe, rather than by the
   difference between cwnd and ssthresh.  In the future, additional
   algorithms, such as pacing, might be used to raise total_pipe.

   During Fast Retransmit, the congestion control algorithm, such as
   CUBIC, generally reduces CCwin in a single step.  Proportional Rate
   Reduction [PRRid] is used to gradually reduce total_pipe to agree
   with CCwin.  PRR was based on Laminar principles, so its
   specification has many parallels to this document.

   Connection startup is accomplished as follows: CCwin is set to
   MAX_WIN (akin to ssthresh), and IW segments are transmitted.  The
   ACKs from these segments trigger additional data transmissions, and
   slowstart proceeds as it does today.  The very first congestion
   event is a special case because there is no prior value for CCwin.
   By default, and on the first congestion event only, CCwin would be
   set from total_pipe, and then standard congestion control is
   invoked.
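   To make the division of labor concrete, the core per-ACK scheduling
   decision can be sketched as follows.  This is a simplified preview
   of the fuller pseudocode in Section 7, using the same variable
   names; it is illustrative, not a complete specification.

      // On every ACK: packet conservation by default, gently
      // steered toward CCwin.
      sndcnt = DeliveredData                 // conserve packets
      if total_pipe < CCwin:
         sndcnt += MIN(DeliveredData, 2*MSS) // implicit slowstart
      else if total_pipe > CCwin:
         // sndcnt is reduced by PRR; see Sections 6.2 and 7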
   The primary advantage of the Laminar framework is that by
   partitioning congestion control and transmission scheduling into
   separate subsystems, each is subject to simpler design constraints,
   making it far easier to develop many new algorithms that are not
   feasible with the current organization of the code.

3.  Standards Impact

   Since we are proposing to refactor existing standards into new
   state variables, all of the current congestion control standards
   documents will potentially need to be reviewed.  Although there are
   roughly 60 RFCs that mention cwnd or ssthresh, most need only self-
   evident reinterpretation.  Others, such as MIBs, warrant a sentence
   or two clarifying how to map CCwin and total_pipe onto existing
   specifications that use cwnd and ssthresh.  There are, however,
   several RFCs that explicitly address the interplay between cwnd and
   ssthresh in today's TCP, including RFC 5681 [RFC5681], RFC 5682
   [RFC5682], RFC 4015 [RFC4015], and RFC 6582 [RFC6582].  These need
   to be reviewed more carefully.  In most cases the algorithms can
   easily be restated under the Laminar framework.  Others, such as
   Congestion Window Validation [RFC2861], potentially require
   redesign.

   This document does not propose to change the TCP-friendly paradigm
   [RFC2914].  By default, all updated algorithms using these new
   state variables would have behaviors similar to current TCP
   implementations; however, over the longer term the intent is to
   permit new algorithms that are not feasible today.  For example,
   since CCwin does not directly affect transmissions during recovery,
   it is straightforward to permit recovery ACKs to raise CCwin even
   while PRR is reducing total_pipe.  This facilitates so-called
   "fluid model" algorithms, which further decouple congestion control
   from the details of the TCP protocol.

   But even without these advanced algorithms, we do anticipate some
   second order effects.  For example, while testing PRR it was
   observed that suppressing bursts by slightly delaying transmissions
   can improve average performance, even though in a strict sense the
   new algorithm is less aggressive than the old [IMC11PRR].

4.  Meta Language

   We use the following terms when describing algorithms and their
   alternatives:

   Standard - The current state of the art, including both formal
   standards and widely deployed algorithms that have come into
   standard use, even though they may not be formally specified.
   [Although PRR does not yet technically meet these criteria, we
   include it here.]

   default - The simplest or most straightforward algorithm that fits
   within the Laminar framework, for example the implicit slowstart
   invoked whenever total_pipe is less than CCwin.  This term does not
   make a statement about the relative aggressiveness or any other
   properties of the algorithm, except that it is a reasonable choice
   and straightforward to implement.

   conformant - An algorithm that can produce the same packet trace
   as a TCP implementation that strictly conforms to the current
   standards.

   mimic - An algorithm constructed to be conformant to standards.
   opportunity - An algorithm that can do something better than the
   standard algorithm, typically better behavior in corner cases that
   are either not well specified or where the standard behavior is
   viewed as less than ideal.

   more/less aggressive - Any algorithm that sends segments earlier/
   later than another (typically conformant) algorithm under identical
   sequences of events.  Note that this is an evaluation of the packet
   level behavior, and does not reflect any higher order effects.

   Observed performance - A statement about algorithm performance
   based on a measurement study or other observations over a
   significant sample of authentic Internet paths.  For example, an
   algorithm might have an observed data rate that differs from that
   of another (typically conformant) algorithm.

   application stall - The application is failing to keep up with TCP:
   either the sender is running out of data to send, or the receiver
   is not reading it fast enough.  When there is an application stall,
   congestion control does not regulate data transmission, and some of
   the protocol events are triggered by application reads or writes,
   as appropriate.

5.  State variables and definitions

   CCwin - The primary congestion control state variable.

   DeliveredData - The total number of bytes that the current ACK
   indicates have been delivered to the receiver.  (See [PRRid] for
   more details.)

   total_pipe - The total quantity of circulating data and ACKs.  In
   addition to RFC 3517 pipe, it includes DeliveredData for the
   current ACK, plus any data held for delayed transmission, for
   example to permit a later TSO transmission.

   sndcnt - The quantity of data to be sent in response to the current
   ACK or other event.

6.  Updated Algorithms

   This section surveys standard, common, and proposed algorithms, and
   how they might be reimplemented under the Laminar framework.

6.1.  Congestion avoidance

   Under the Laminar framework the loss recovery mechanism does not,
   by default, interfere with the primary congestion control
   algorithms.  The CCwin state variable is updated only by the
   algorithms that decide how much data to send on successive round
   trips.  For example, standard Reno AIMD congestion control
   [RFC5681] can be implemented by raising CCwin by one segment every
   CCwin worth of ACKs (once per RTT) and halving it on every loss or
   ECN signal (e.g., CCwin = CCwin/2); see the sketch below.  During
   recovery, the transmission scheduling part of the Laminar framework
   makes the necessary adjustments to bring total_pipe into agreement
   with CCwin, without tampering with CCwin.

   This separation between computing CCwin and transmission scheduling
   will enable new classes of congestion control algorithms, such as
   fluid models that adjust CCwin on every ACK, even during recovery.
   This is safe because raising CCwin does not directly trigger any
   transmissions; it just steers the transmission scheduling closer to
   the end of recovery.  Fluid models have a number of advantages,
   such as simpler closed form mathematical representations, and they
   are intrinsically more tolerant of reordering, since non-recovery
   disordered states don't inhibit window growth.

   Investigating alternative algorithms and their impact is out of
   scope for this document.
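   As a concrete illustration, the Reno AIMD rule described above can
   be expressed purely on CCwin.  The sketch uses appropriate byte
   counting for the additive increase; it is illustrative, not a
   complete specification.

      // Once per ACK, outside recovery, not application limited:
      CCwin += MSS*MSS/CCwin   // about one segment per CCwin of ACKs

      // On entering recovery (loss or ECN signal):
      CCwin = CCwin/2          // multiplicative decrease; recovery
                               // scheduling (PRR) brings total_pipe
                               // down to match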
   It is important to note that while our goal here is not to alter
   the TCP-friendly paradigm, Laminar does not include any implicit or
   explicit mechanism to prevent a Tragedy of the Commons.  However,
   see the comments in Section 9.

   The initial slowstart does not use CCwin, except that CCwin starts
   at the largest possible value.  The transmission scheduling
   algorithms are responsible for performing the slowstart.  On the
   first loss it is necessary to compute a reasonable CCwin from
   total_pipe.  Ideally, we might save total_pipe at the time each
   segment is scheduled for transmission, and use the saved value
   associated with the lost segment to prime CCwin.  However, this
   approach requires extra state attached to every segment in the
   retransmit queue.  A simpler approach is to have a mathematical
   model of the slowstart, and to prime CCwin from total_pipe at the
   time the loss is detected, scaled down by the effective slowstart
   multiplier (e.g., 1.5 or 2).  In either case, once CCwin is primed
   from total_pipe, it is typically appropriate to invoke the
   reduction-on-loss function, to reduce it again per the congestion
   control algorithm.

   Nearly all congestion control algorithms need some mechanism to
   prevent CCwin from growing while it is not regulating
   transmissions, e.g., during prolonged application stalls.

6.2.  Proportional Rate Reduction

   Since PRR [PRRid] was designed with Laminar principles in mind,
   updating it is a straightforward variable substitution: CCwin
   replaces ssthresh, and RecoverFS is initialized from total_pipe at
   the beginning of recovery.  Thus PRR provides a gradual window
   reduction from the prior total_pipe down to the new CCwin.

   There is one important difference from the current standards: CCwin
   is computed solely on the basis of the prior value of CCwin.
   Compare this to RFC 5681, which specifies that the congestion
   control function is computed on the basis of the FlightSize (e.g.,
   ssthresh = FlightSize/2).  This change from the prior standard
   completely alters how application stalls interact with congestion
   control.

   Consider what happens if there is an application stall for most of
   the RTT just before a Fast Retransmit.  Under Laminar it is likely
   that CCwin will be set to a value that is larger than total_pipe,
   and, subject to available application data, PRR will go directly
   into slowstart mode to raise total_pipe up to CCwin.  Note that the
   final CCwin value does not depend on the duration of the
   application stall.

   With standard TCP, any application stall reduces the final value of
   cwnd at the end of recovery.  In some sense, application stalls
   during recovery are treated as though they were additional losses,
   and they have a detrimental effect on the connection data rate that
   lasts far longer than the stall itself.

   If there are no application stalls, the standard and Laminar
   variants of the PRR algorithm should have identical behaviors.
   Although it is tempting to characterize Laminar as more aggressive
   than the standards, it would be more apropos to characterize the
   standard as excessively timid under certain combinations of
   overlapping events that are not well represented by benchmarks or
   models.
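   The contrast reduces to the window-reduction step performed on
   entering recovery.  The following sketch (assuming Reno-style
   halving) highlights the difference described above:

      // Standard (RFC 5681): an application stall shrinks
      // FlightSize, and therefore the window after recovery.
      ssthresh = FlightSize/2

      // Laminar: the new window depends only on the prior CCwin, so
      // a stall just before Fast Retransmit does not shrink it.
      CCwin = CCwin/2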
6.3.  Restart after idle, Congestion Window Validation and Pacing

   Decoupling congestion control from transmission scheduling permits
   us to develop new algorithms to raise total_pipe to CCwin after an
   application stall or other events.  Although it was stated earlier
   that the default transmission scheduling algorithm for raising
   total_pipe is an implicit slowstart, there is an opportunity for
   better algorithms.

   We imagine a class of hybrid transmission scheduling algorithms
   that use a combination of pacing and slowstart to reestablish TCP's
   self clock (see [Visweswaraiah99]).  For example, whenever
   total_pipe is significantly below CCwin, RTT and CCwin can be used
   to directly compute a pacing rate.  We suspect that pacing at the
   previous full rate will prove to be somewhat brittle, sometimes
   causing excessive loss and yielding erratic results.  It is more
   likely that a hybrid strategy will work better and be better for
   the network, for example by pacing at some fraction (1/2 or 1/4) of
   the prior rate until total_pipe reaches some fraction of CCwin
   (e.g., CCwin/2) and then using conventional slowstart to bring
   total_pipe the rest of the way up to CCwin.

   This is far less aggressive than standard TCP without cwnd
   validation [RFC2861], or when the application stall was shorter
   than one RTO, since the standards permit TCP to send a full cwnd
   size burst in these situations.  It is potentially more aggressive
   than the conventional slowstart invoked by cwnd validation when the
   application stall is longer than several RTOs.  Both standard
   behaviors in these situations have always been viewed as
   problematic, because interface rate bursts are clearly too
   aggressive and a full slowstart is clearly too conservative.
   Mimicking either is a non-goal when there is ample opportunity to
   find a better compromise.

   Although, strictly speaking, any new transmission scheduling
   algorithms are independent of the Laminar framework, they are
   expected to have substantially better behavior in many common
   environments, and as such they strongly motivate the effort
   required to refactor TCP implementations and standards.

6.4.  RTO and F-RTO

   We are not proposing any changes to the RTO timer or to the F-RTO
   [RFC5682] algorithm used to detect spurious retransmissions.  Once
   it is determined that segments were lost, CCwin is updated to a new
   value as determined by the congestion control function, and the
   Laminar implicit slowstart is used to clock out (re)transmissions.
   Once all holes are filled, hybrid paced transmissions can be used
   to reestablish TCP's self clock at the new data rate.  This can be
   the same hybrid pacing algorithm as is used to recover the self
   clock after application stalls.

   Note that as long as there is non-contiguous data at the receiver,
   the retransmission algorithms require timely SACK information to
   make proper decisions about which segments to send.  Pacing during
   loss recovery is not recommended without further investigation.

6.5.  Undo

   Since CCwin is not used to implement transmission scheduling, undo
   is trivial: CCwin can just be set back to its prior value, and the
   transmission scheduling algorithm will transmit more (or less) data
   as needed.  It is useful to note that the discussion about ssthresh
   in [RFC4015] also applies to CCwin in TCP Laminar.  Some people
   might find it useful to think of CCwin as being equivalent to
   MAX(ssthresh, cwnd).
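   A minimal sketch of this trivial undo follows.  The variable
   prior_CCwin is hypothetical; it merely records the value to be
   restored.

      // On entering recovery:
      prior_CCwin = CCwin
      CCwin = CCwin/2             // or other reduction on loss

      // On determining that the retransmissions were spurious:
      CCwin = prior_CCwin         // transmission scheduling then
                                  // raises total_pipe as needed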
   There is an opportunity to do substantially better than current
   algorithms.  Undo can be implemented by saving the arithmetic
   difference between the current and prior values of CCwin, and then
   adding this delta back into CCwin when all retransmissions are
   deemed to be spurious.  If the congestion avoidance algorithm is
   linear (or can be linearized), and is mathematically transportable
   across undo, it is possible to design a congestion control
   algorithm that is completely immune to reordering, in the sense
   that the overall evolution of CCwin is not affected by low level
   reordering, even if it is pervasive.  This is an area for future
   research.

6.6.  Control Block Interdependence

   Under the Laminar framework, congestion control state can be easily
   shared between connections [RFC2140].  An ensemble of connections
   can each maintain their own total_pipe (partial_pipe?), which in
   aggregate tracks a single common CCwin.  A master transmission
   scheduler allocates permission to send (sndcnt) to each of the
   constituent connections on the basis of the difference between
   CCwin and the aggregate total_pipe, and a fairness or capacity
   allocation policy that balances the flows.  Note that ACKs on one
   connection in an ensemble might be used to clock transmissions on
   another connection, and that following a loss, the window
   reductions can be allocated to flows other than the one
   experiencing the loss.

6.7.  New Reno

   The key to making Laminar function well without SACK is having good
   estimators for DeliveredData and total_pipe.  By definition, every
   duplicate ACK indicates that one segment has arrived at the
   receiver and total_pipe has fallen by one.  On any ACK that
   advances snd.una, total_pipe can be updated from snd.nxt - snd.una,
   and DeliveredData is the change in snd.una, minus the sum of the
   estimated DeliveredData of the preceding duplicate ACKs.  As with
   SACK, the total DeliveredData must agree with the overall forward
   progress over time, as illustrated in the sketch below.
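   A sketch of these estimators, in the style of the pseudocode in
   Section 7 (dup_bytes is a hypothetical helper variable):

      // On each duplicate ACK:
      DeliveredData = 1*MSS          // one segment reached receiver
      dup_bytes += 1*MSS

      // On an ACK that advances snd.una:
      DeliveredData = delta(snd.una) - dup_bytes
                                     // may need a floor at 0
      dup_bytes = 0
      total_pipe = snd.nxt - snd.una // plus sndBank, as in Section 7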
7.  Example Pseudocode

   On startup:

      CCwin = MAX_WIN
      sndBank = IW

   On every ACK:

      DeliveredData = delta(snd.una) + delta(SACKd)
      pipe = (RFC 3517 pipe algorithm)
      total_pipe = pipe + DeliveredData + sndBank
      sndcnt = DeliveredData          // Default # transmissions

      if new_recovery():
         if CCwin == MAX_WIN:
            CCwin = total_pipe/2      // Prime CCwin, first time only
         CCwin = CCwin/2              // Reno congestion control
         prr_delivered = 0            // Total bytes delivered during
                                      // recovery
         prr_out = 0                  // Total bytes sent during
                                      // recovery
         RecoverFS = total_pipe

      if !in_recovery() && !application_limited():
         CCwin += MSS*MSS/CCwin       // Appropriate byte counting

      prr_delivered += DeliveredData  // noop if not in recovery
      if total_pipe > CCwin:
         // Proportional Rate Reduction
         sndcnt = CEIL(prr_delivered * CCwin / RecoverFS) - prr_out

      else if total_pipe < CCwin:
         if in_recovery():
            // PRR Slow Start Reduction Bound
            limit = MAX(prr_delivered - prr_out, DeliveredData) + SMSS
            sndcnt = MIN(CCwin - total_pipe, limit)
         else:
            // Slowstart with appropriate byte counting
            inc = MIN(DeliveredData, 2*MSS)
            sndcnt = DeliveredData + inc

      // Cue the transmission machinery
      sndBank += sndcnt
      limit = maxBank()
      if sndBank > limit:
         sndBank = limit
      tcp_output()

   For any data transmission or retransmission:

      tcp_output():
         while sndBank && tso_ok():
            len = sendsomething()
            sndBank -= len
            prr_out += len            // noop if not in recovery

8.  Compatibility with existing implementations

   On a segment by segment basis, the above algorithm is [believed to
   be] fully conformant with, or less aggressive than, the standards
   under all conditions.

   However, this condition is not sufficient to guarantee that the
   observed performance can't be better than the standards.  Consider
   an application that keeps TCP in bulk mode nearly all of the time,
   but has occasional pauses that last some fraction of one RTT.  A
   fully conformant TCP would be permitted to "catch up" by sending a
   partial window burst at full interface rate.  On some networks,
   such bursts might be very disruptive, causing otherwise unnecessary
   packet losses and corresponding cwnd reductions.

   In Laminar the default algorithm would be slowstart.  Other
   algorithms that might cause the same bursts would be permitted,
   although they are not described here.  A better algorithm would be
   to pace the data at (some fraction of) the prior rate.  Neither
   pacing nor slowstart is likely to cause unnecessary losses, and as
   was observed while testing PRR, being less aggressive at the
   segment level has the potential to increase the observed
   performance [IMC11PRR].  In this scenario Laminar with pacing has
   the potential to outperform both of the behaviors described by the
   standards.

9.  Security Considerations

   The Laminar framework does not change the risk profile for TCP (or
   other transport protocols) themselves.

   However, the complexity of the current algorithms, as embodied in
   today's code, presents a substantial barrier to people wishing to
   cheat "TCP friendliness".  It is a fairly well known and easily
   rediscovered result that custom tweaks to make TCP more aggressive
   in one environment generally make it fragile, so that it performs
   less well across the extreme diversity of the Internet.  This
   negative outcome is a substantial intrinsic barrier to wide
   deployment of rogue congestion control algorithms.
   A direct consequence of the changes proposed in this note,
   decoupling congestion control from the other algorithms, is likely
   to reduce that barrier to rogue algorithms.  However, this
   separation and the ability to introduce new congestion control
   algorithms is a key part of the motivation for this work.

   It is also important to note that web browsers have already largely
   defeated TCP's ability to regulate congestion by opening many
   concurrent connections.  When a Web page contains content served
   from multiple domains (the norm these days), all modern browsers
   open between 35 and 60 connections (see
   http://www.browserscope.org/?category=network).  This is the Web
   community's deliberate workaround for TCP's perceived poor
   performance and its inability to make full use of certain types of
   consumer grade networks.  As a consequence, the transport layer has
   already lost a substantial portion of its ability to regulate
   congestion.  It was not anticipated that the tragedy of the commons
   in Internet congestion would be driven by competition between
   applications rather than between TCP implementations.

   In the short term, we can continue to try to use standards and peer
   pressure to moderate the rise in overall congestion levels;
   however, the only real solution is to develop mechanisms in the
   Internet itself to apply some sort of backpressure to overly
   aggressive applications and transport protocols.  We need to
   redouble efforts by the ConEx WG and others to develop mechanisms
   to inform policy with information about congestion and its causes.
   Otherwise we have a looming tragedy of the commons, in which TCP
   has only a minor role.

   Implementers that change Laminar from counting bytes to counting
   segments have to be cautious about the effects of ACK splitting
   attacks [Savage99], in which the receiver acknowledges partial
   segments for the purpose of confusing the sender's congestion
   accounting.

10.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as
   an RFC.

11.  References

   [Jacobson88]   Jacobson, V., "Congestion Avoidance and Control",
                  SIGCOMM 18(4), August 1988.

   [RFC2140]      Touch, J., "TCP Control Block Interdependence",
                  RFC 2140, April 1997.

   [RFC2861]      Handley, M., Padhye, J., and S. Floyd, "TCP
                  Congestion Window Validation", RFC 2861, June 2000.

   [RFC2914]      Floyd, S., "Congestion Control Principles", BCP 41,
                  RFC 2914, September 2000.

   [RFC3517]      Blanton, E., Allman, M., Fall, K., and L. Wang, "A
                  Conservative Selective Acknowledgment (SACK)-based
                  Loss Recovery Algorithm for TCP", RFC 3517,
                  April 2003.

   [RFC4015]      Ludwig, R. and A. Gurtov, "The Eifel Response
                  Algorithm for TCP", RFC 4015, February 2005.

   [RFC5681]      Allman, M., Paxson, V., and E. Blanton, "TCP
                  Congestion Control", RFC 5681, September 2009.

   [RFC5682]      Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
                  "Forward RTO-Recovery (F-RTO): An Algorithm for
                  Detecting Spurious Retransmission Timeouts with
                  TCP", RFC 5682, September 2009.

   [RFC6582]      Henderson, T., Floyd, S., Gurtov, A., and Y.
                  Nishida, "The NewReno Modification to TCP's Fast
                  Recovery Algorithm", RFC 6582, April 2012.

   [PRRid]        Mathis, M., Dukkipati, N., and Y. Cheng,
                  "Proportional Rate Reduction for TCP",
                  draft-mathis-tcpm-proportional-rate-reduction-01
                  (work in progress), July 2011.
   [IMC11PRR]     Mathis, M., Dukkipati, N., Cheng, Y., and M.
                  Ghobadi, "Proportional Rate Reduction for TCP",
                  Proceedings of the 2011 ACM SIGCOMM Internet
                  Measurement Conference, 2011.

   [Savage99]     Savage, S., Cardwell, N., Wetherall, D., and T.
                  Anderson, "TCP congestion control with a misbehaving
                  receiver", SIGCOMM Comput. Commun. Rev. 29(5),
                  October 1999.

   [Visweswaraiah99]
                  Visweswaraiah, V., "Improving Restart of Idle TCP
                  Connections", Tech Report USC TR 97-661,
                  November 1997.

Author's Address

   Matt Mathis
   Google, Inc
   1600 Amphitheatre Parkway
   Mountain View, California 94043
   USA

   Email: mattmathis@google.com