<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<!--
To format this document:


TODO:


-->
<?rfc toc="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="yes"?>
<?rfc tocdepth="6"?>
<?rfc symrefs="yes"?>
<rfc ipr="trust200902" category="exp" docName="draft-mathis-tcpm-tcp-laminar-00.txt">
<front>
<title abbrev="Laminar TCP">

Laminar TCP and the case for refactoring TCP congestion control
</title>
<author fullname="Matt Mathis" initials="M." surname="Mathis">
  <organization>Google, Inc</organization>
  <address>
  <postal>
    <street>1600 Amphitheater Parkway</street>
    <city>Mountain View</city> <code>93117</code>  <region>California</region> <country>USA</country>
  </postal>
  <email> mattmathis@google.com </email>
  </address>
 </author>
<date month="February" day="21" year="2012"/>
 <area> Transport Area </area>
 <workgroup> TCP Maintenance Working Group </workgroup>


<abstract>


<t>
The primary state variables used by all TCP congestion control algorithms, cwnd and ssthresh are heavily overloaded, carrying different semantics in different states.   This leads to excess implementation complexity and poorly defined behaviors under some combinations of events, such as loss recovery during cwnd validation.   We propose a new framework for TCP congestion control, and to recast current standard algorithms to use new state variables.   This new framework will not generally change the behavior of any of the primary congestion control algorithms when invoked in isolation but will to permit new algorithms with better behaviors in many corner cases, such as when two distinct primary algorithms are invoked concurrently.   It will also foster the creation of new algorithms to address some events that are poorly treated by today's standards. For the vast majority of traditional algorithms the transformation to the new state variables is completely straightforward.   However, the resulting implementation will technically be in violation of all existing TCP standards, even if it is fully compliant with their principles and intent.</t>


 </abstract>        
</front>
<middle>


<section title="Introduction">
<t>
The primary state variables used by all TCP congestion control algorithms, cwnd and ssthresh,  are heavily overloaded, carrying different semantics in different states.   This leads to excess implementation complexity and poorly defined behaviors under some combinations of events, such as overlapping application stalls and loss recovery.   Multiple algorithms sharing the same state variables lead to excess complexity and conflicting correctness constraints, making it unreasonably difficult to implement, test and evaluate new algorithms.</t> 


<t>
We are proposing a new framework for TCP congestion control and it use new state variables that separate transmission scheduling, which determines precisely when data is sent, from congestion control, which determines the amount of data to be sent in each RTT.   This separation greatly simplifies the interactions between the two subsystems and permits vast range of new algorithms that are not feasible with the current parameterization.</t>


<t>
This note describes the new framework, represented through its state variables, and presents a preliminary mapping between current standards and new algorithms based on the new state variables.   At this point the new algorithms are not fully specified, and many have still unconstrained design choices.   In most cases, our goal is to precisely mimic todays standard TCP, at least as far as well defined primary behaviors.   In general, it is a non-goal to mimic behaviors in poorly defined corner cases, or other cases where standard behaviors are viewed as being problematic.</t> 


<t>
It is called Laminar because one of its design goals is to eliminate unnecessary turbulence introduced by TCP itself.</t>
<section title="Overview of the new algorithm"> 


<t>
The new framework separate transmission scheduling, which determines precisely when data is sent, from Congestion Control, which determines the total amount of data sent in any given RTT.</t> 


<t>
The default algorithm for transmission scheduling is a strict implementation of Van Jacobsons' packet conservation principle <xref target='Jacobson88' />.   Data arriving at the receiver cause ACKs which in turn cause the sender to transmit an equivalent quantity of data back into the network.  The primary state variable is implicit in the quantity of data and ACKs circulating in the network.  This state observed through a new "total_pipe" estimator, which is a generalization of "pipe" as described in RFC 3517. <xref target="RFC3517" /></t>


<t>
A new state variable, CCwin, is the primary congestion control state variable.    It is updated only by the congestion control algorithms, which are concerned with detecting and regulating the overall level of congestion along the path.    CCwin is TCP's best estimate for an appropriate average window size.  In general, it rises when the network seem to be underfilled and is reduced in the presence of congestion signals, such as loss, ECN marks or increased delay.   Although CCwin resembles cwnd, it is actually quite different, for one thing the new parameterization does not use ssthresh at all.</t>


<t>
Any time CCwin is larger than total_pipe, the default algorithm to grow total_pipe is for each ACK to trigger one segment of additional data.  This is essentially an implicit slowstart, but it is gated by the difference between CCwin and total_pipe, rather than the difference between cwnd and ssthresh.</t>


<t>
During Fast Retransmit, the congestion control algorithm, such as CUBIC, generally reduces CCwin in a single step.  Proportional Rate Reduction [PRR] is used to gradually reduce total_pipe to agree with CCwin.   PRR is based on Laminar principles, so  its specification has many parallels to this document. </t>



<t>
Connection startup is accomplished as follows: CCwin is set to MAX_WINDOW (akin to ssthresh), and IW segments are transmitted.   The ACKs from these segments trigger additional data transmissions, and slowstart proceeds as it does today.  The very first congestion event is a special case because there is not a prior value for CCwin.   By default on the first congestion event only, CCwin would be set from total_pipe, and then standard congestion control is invoked.</t>


<t>
The primary advantage of the Laminar framework is that by partitioning congestion control and transmission scheduling into separate subsystems, each is subject to far simpler simpler design constraints, making it far easier to develop many new algorithms that are not feasible with the current organization of the code.</t>


</section>
<section title="Standards Impact">
<t>
Since we are proposing to to refactor existing standards into new state variables, all of the current congestion control standards documents will potentially need to be revised.   Note that there are roughly 60 RFC that mention cwnd or ssthresh, and all of them should be reviewed for material that may need to be updated. </t>   


<t>
This document does not propose to change the TCP friendly paradigm.   By default all updated algorithms using these new state variables would have behaviors similar to the current TCP implementations.   We do however anticipate some second order effects which we will address in section XXX below.  For example while testing PRR it was observed that suppressing bursts by slightly delaying transmissions can improve average performance, even though in a strict sense the new algorithm is less aggressive than the old.</t>


</section>
<section title="Meta Language">


<t>
We use the following terms when describing algorithms and their alternatives:</t>


<t>
Standard - The current state of the art, including both formal standards and widely deployed algorithms that have come into standard use, even though they may not be formally specified.  [Although PRR does not yet technically meet these criteria, we include it here].</t>


<t>
default - The simplest or most straightforward algorithm that fits within the Laminar framework.  For example implicit slowstart whenever total_pipe is less than CCwin.   This term does not make a statment about the relative aggressiveness or any other properties of the algorithm except that it is a reasonable choice and straightforward to implement.</t>


<t>
conformant - An algorithm that can produce the same packet trace as a TCP implementation that strictly conforms to the current standards.</t>


<t>
mimic - An algorithm constructed to be conformant to standards.</t>


<t>
opportunity - An algorithm that can do something better than the standard algorithm, typically better behavior in a corner cases that is either not well specified or where the standard behavior is viewed as being less than ideal.</t>


<t>
more/less aggressive - Any algorithm that sends segments earlier/later than another (typically conformant) algorithm under identical sequences of events.  Note that this is an evaluation of the packet level behavior, and does not reflect any higher order effects.</t>


<t>
Net more/less aggressive - Any algorithm that gets more/less average data rate than another (typically conformant) algorithm.  This is an empirical statement based on measurement (or perhaps justified speculation), and potentially indicates a problem with failing to be "TCP friendly".</t>
</section>
</section>
<section title="State variables and definitions">


<t>
CCwin - The primary congestion control state variable.</t>


<t>
DeliveredData - The total number of bytes that the current ACK indicates have been delivered to the receiver.  (See PRR for more detail).</t>


<t>
total_pipe - The total quantity of circulating data and ACKs.  In addition to RFC 3517 pipe, it includes DeliveredData for the current ack, plus any data held for delayed transmission, for example to permit a later TSO transmission.</t>


<t>
sendcnt - The quantity of data to be sent in response to the current event.</t>


<t>
application stall - The application is failing to keep TCP in bulk mode: either the sender is running out of data to send, or the receiver is not reading it fast enough.   When there is an application stall, congestion control does not regulate data transmission and some of the protocol events are triggered by application reads or writes, as appropriate.</t>


</section>
<section title="Updated Algorithms">


<t>
A survey of standard, common and proposed algorithms, and how they might be reimplemented under the Laminar framework.</t>


<section title="Congestion avoidance">
<t>
Under the Laminar framework the loss recovery mechanism does not, by default, interfere with the primary congestion control algorithms.  The CCwin state variable is updated only by the algorithms that decide how much data to send on successive round trips.  For example standard Reno AIMD congestion control <xref target='RFC5681' /> can be implemented by raising CCwin by one segment every CCwin worth of ACKs (once per RTT) and halving it on every loss or ECN signal (e.g. CCwin = CCwin/2).   During recovery the transmission scheduling part of the Laminar framework makes the necessary adjustments to bring total_pipe to agree with CCwin, without tampering with CCwin.</t>


<t>
This separation between computing CCwin and transmission scheduling will enable new classes of congestion control algorithms, such as fluid models that adjust CCwin on every ACK, even during recovery.  This is safe because raising CCwin does not directly trigger any transmissions, it just steers the transmission scheduling closer to the end of recovery.  Fluid models have a number of advantages, such as simpler closed form mathematical representations, and are intrinsically more tolerant to reordering since non-recovery disordered states don't inhibit growing the window.</t>


<t>Investigating alternative algorithms and their impact is out of scope for this document.   It is important to note that while our goal here is not to alter the TCP friendly paradigm, Laminar does not include any implicit or explicit mechanism to prevent a Tragedy of the Commons. However, see the comments in <xref target="security" />.</t>


<t>
The initial slowstart does not use the CCwin, except that CCwin starts at the largest possible value.  It is the transmission scheduling algorithms that are responsible for performing the slowstart.   On the first loss it is necessary to compute a reasonable CCwin from total_pipe.  Ideally, we might save total_pipe at the time each segment is scheduled for transmission, and use the saved value associated with the lost segment to prime CCwin.  However, this approach requires extra state attached to every segment in the retransmit queue.  A simpler approach is to have a mathematical model the slowstart, and to prime CCwin from total_pipe at the time the loss is detected, but scaled down by the effective slowstart multiplier (e.g. 1.5 or 2).   In either case, once CCwin is primed from total_pipe, it is typically appropriate to invoke the reduction on loss function, to reduce it again per the congestion control algorithm.</t>


<t>
Nearly all congestion control algorithms need to have some mechanism to prevent CCwin from growing while it is not regulating transmissions  e.g. during application stalls.</t>


</section>
<section title="Proportional Rate Reduction"> 
<t>
Since PRR <xref target='I-D.ietf-tcpm-proportional-rate-reduction' /> was designed with Laminar principles in mind, updating it is a straightforward variable substitution.  CCwin replaces ssthresh, and RecoverFS is initialized from total_pipe at the beginning of recovery.   Thus PRR provides a gradual window reduction from the prior total_pipe down to the new CCwin.</t>


<t>
There is one important difference from the current standards: CCwin is computed solely on the basis of the prior value of CCwin.  Compare this to RFC 5681 which specifies that the congestion control function is computed on the basis of the FlightSize  (e.g. ssthresh=FlightSize/2 )   This change from prior standard completely alters how application stalls interact with congestion control.</t>


<t>Consider what happens if there is an application stall for most of the RTT just before a Fast Retransmit: Under Laminar it is likely that CCwin will be set to a value that is larger than total_pipe, and subject to available application data  PRR will go directly to slowstart mode, to raise total_pipe up to CCwin.  Note that the final CCwin value does not depend on the duration of the application stall.</t>


<t>
WIth standard TCP, any application stall reducs the final value of cwnd at the end of recovery.  In some sense application stalls during recovery are treated as though they are additional losses, and have a detrimental effect on the connection data rate that lasts far longer than the stall itself.</t>


<t>
If there are no application stalls, the standard and Laminar variants of the PRR algorithm should have identical behaviors. Although it is tempting to characterize Laminar as being more aggressive than the standards, it would be more apropos to characterize the standard as being excessively timid under common combinations of overlapping events that are not well represented by benchmarks or models.</t>
</section>
<section title="Restart after idle, Congestion Window Validation and Pacing">
<t>
Decoupling congestion control from transmission scheduling permits us to develop new algorithms to raise total_pipe to CCwin after an application stall or other events.  Although it was stated earlier that the default transmission scheduling algorithm for raising total_pipe is an implicit slowstart, there is lots of opportunity for better algorithms.</t>


<t>
We imagine a new class of hybrid transmission scheduling algorithms that use a combination of pacing and slowstart to reestablish TCP's self clock.   For example, whenever total_pipe is significantly below CCwin, RTT and CCwin can be used to directly compute a pacing rate.   We suspect that pacing at the previous full rate will prove to be somewhat brittle, yielding erratic results.  It is more likely that a hybrid strategy will work better, for example by pacing at some fraction (1/2 or 1/4) of the prior rate until total_pipe reaches some fraction of CCwin (e.g. CCwin/2) and then using  conventional slowstart to bring total_pipe the rest of the way up to CCwin</t>   


<t>
This is far less aggressive than standard TCP without cwnd validation <xref target='RFC2861' />or when the application stall was less than one RTO, since standards permit TCP to send a full cwnd size burst in these situations.  It is potentially more aggressive than conventional slowstart invoked by cwnd validation when the application stall is longer than several RTOs.  Both standard behaviors in these situations have always been viewed as problematic, because interface rate bursts are clearly too aggressive and a full slowstart is clearly too conservative.   Mimicking either is a non-goal, when there is ample opportunity to find a better compromise.</t>


<t>
Although strictly speaking any new transmission scheduling algorithms are independent of the Laminar framework, they are expected to have substantially better behavior in many common environments and as such strongly motivate the effort required to refactor TCP implementations and standards.</t>


</section>
<section title="RTO and F-RTO">
<t>
We are not proposing any changes to the RTO timer or the F-RTO<xref target='RFC5682' /> algorithm used to detect spurious retransmissions.   Once it is determined that segments were lost, CCwin is updated to a new value as determined by the congestion control function, and Laminar implicit slowstart is used to clock out (re)transmissions.   Once all holes are filled, a hybrid paced transmissions can be used to reestablish TCPs self clock at the new data rate.  This can be the same hybrid pacing algorithm as is used to recover the self clock after application stalls.</t>


<t>
Note that as long as there is non-contiguous data at the receiver the retransmission algorithms require timely SACK information to make proper decisions about which segments to send.   Pacing during loss recovery is not recommended without further investigation.</t>


</section>
<section title="Undo">
<t>
Since CCwin is not used to implement transmission scheduling, undo is trivial.  CCwin can just be set back to a prior value and the transmission scheduling algorithm will transmit more (or less) data as needed.</t>


</section>
<section title="Control Block Interdependence">
<t>
Under the Laminar framework, congestion control state can be easily shared between connections<xref target='RFC2140' />.   An ensemble of connections can each maintain their own total_pipe (partial_pipe?) which in aggregate tracks a single common CCwin.   A master transmission scheduler allocates permission to send (sndcnt) to each of the constituent connection on the basis of the difference between the CCwin and the aggregate total_pipe, and a fairness or capacity allocation policy that balances the flows.  Note that ACKs on one connection in an ensemble might be used to clock transmissions on another connection, and that following a loss, the window reductions can be allocated to flows other than the one experiencing the loss.</t>


</section>


<section title="New Reno">
<t>
The key to making Laminar function well without SACK is having good estimators for DeliveredData and total_pipe.  By definition every duplicate ACK indicates that one segment has arrived at the receiver and total_pipe has fallen by one.  On any ACK that advances snd.una, total pipe can be updated from snd.nxt-snd.una, and DeliveredData is the change in snd.una, minus the estimated DeliveredData of the preceding duplicate ACKs.</t>


</section>
</section>
<section title="Example Pseudocode">
<t>
The example pseudocode in this section incorporates (or subsumes) the following algorithms:
</t>


<t>On startup:
<figure><artwork><![CDATA[
  CCwin = MAX_WINOW
  sndBank = IW
]]></artwork></figure></t>


<t>On every ACK:
<figure><artwork><![CDATA[
  DeliveredData = delta(snd.una) + delta(SACKd)
  pipe = (RFC 3517 pipe algorithm)
  total_pipe = pipe+DeliveredData+sndBank
  sndcnt = DeliveredData    // Default outcome


  if new_recovery():
     if CCwin == MAX_WIN:
        CCwin = total_pipe/2 // First time only
     CCwin = CCwin/2         // Reno congestion control
     prr_delivered = 0       // Total bytes delivered during recov
     prr_out = 0             // Total bytes sent during recovery
     RecoverFS = total_pipe  //


  if !in_recovery() && !application_limited():
     CCwin += (MSS/CCwin)
  prr_delivered += DeliveredData  // noop if not in recovery 
]]></artwork></figure>  
<figure><artwork><![CDATA[ 
  if total_pipe > CCwin:
     // Proportional Rate Reduction
     sndcnt = CEIL(prr_delivered * CCwin / RecoverFS) - prr_out
  
  else if total_pipe < CCwin:
     if in_recovery():
        // PRR Slow Start Reduction Bound
        limit = MAX(prr_delivered - prr_out, DeliveredData) + SMSS
        sndcnt = MIN(CCwin - total_pipe, limit)
     else:
        // slow start with appropriate byte counting
        inc = MIN(DeliveredData, 2*MSS)
        sndcnt = DeliveredData + inc


  // cue the (re)transmission machinery
  sndBank += sndcnt
  limit = maxBank()
  if sndBank > limit:
     sndBank = limit
  tcp_output()
]]></artwork></figure></t>


<t>For any data transmission or retransmission:
<figure><artwork><![CDATA[
tcp_output():
  while sndBank && tso_ok():
     len = sendsomething()
     sndBank -= len
     prr_out += len  // noop if not in recovery


]]></artwork></figure></t>
</section> 
<section title="Compatibility with existing implementations">


<t>
On a segment by segment basis, the above algorithm is [believed to be] fully conformant with or less aggressive than standards under all conditions.</t>


<t>However this condition is not sufficient to guarantee that average performance can't be substantially better (net more aggressive) than standards.  Consider an application that keeps TCP in bulk mode nearly all of the time, but has occasional pauses that last some fraction of one RTT.   A fully conforment TCP would be permitted to "catch up" by sending a partial window burst at full interface rate.  In some networks, such bursts might be very disruptive, causing otherwise unnecessary packet losses and corresponding cwnd reductions.</t>


<t>In Laminar, such a burst would be permitted, but the default algorithm would be slowstart.  A better algorithm would be to pace the data at (some fraction of) the prior rate.   Neither pacing nor slowstart is likely to cause unnecessary losses, and as was observed while testing PRR, being less aggressive at the segment level has the potential to increase average performance<xref target='IMC11PRR' />.  In this scenario Laminar with pacing has the potential to outperform both of the behaviors described by standards.</t>


</section>
<section anchor='security' title="Security Considerations">
<t>The Laminar framework does not change the risk profile for TCP (or other transport protocols) themselves.</t>


<t>However, the complexity of current algorithms as embodied in today's code present a substantial barrier to people wishing to cheat "TCP friendliness".  It is a fairly well known and easily rediscovered result that custom tweaks to make TCP more aggressive in one environment generally make it fragile and perform less well across the extreme diversity of the Internet.  This negative outcome is a substantial intrinsic barrier to wide deployment of rogue congestion control algorithms.</t>


<t>A direct consequence of the changes proposed in this note, decoupling congestion control from other algorithms, is likely to reduce the barrier to rogue algorithms.  However this separation and the ability to introduce new congestion control algorithms is a key part of the motivation for this work.</t>


<t>It is also important to note that web browsers have already largely defeated TCP's ability to regulate congestion by opening many concurrent connections.  When a Web page contains content served from multiple domains (the norm these days) all modern browsers open between 35 and 60 connections (see: http://www.browserscope.org/?category=network ).  This is the Web community's deliberate workaround for TCP's perceived poor performance and inability fill certain kinds of consumer grade networks.  As a consequence the transport layer has already lost a substantial portion of its ability to regulate congestion.  It was not anticipated that the tragedy of the commons in Internet congestion would be driven by competition between applications and not TCP implementations.</t>


<t>In the short term, we can continue to try to use standards and peer pressure to moderate the rise in overall congestion levels, however the only real solution is to develop mechanisms in the Internet itself to apply some sort of backpressure to overly aggressive applications and transport protocols.  We need to redouble efforts by the ConEx WG and others to develop mechanisms to inform policy with information about congestion and it's causes.  Otherwise we have a looming tragedy of the commons, in which TCP has only a minor role.</t>


<t>Implementers that change Laminar from counting bytes to segments have to be cautious about the effects of ACK splitting attacks<xref target='Savage99' />, where the receiver acknowledges partial segments for the purpose of confusing the sender's congestion accounting.</t>


<!--   @@@@ -->


</section>
<section title="IANA Considerations">


<t>This document makes no request of IANA.</t>
<t>Note to RFC Editor: this section may be removed on publication as an RFC.</t>


</section>
</middle>
<back>
<references>


<!-- Van Jacobson Sigcomm 88 -->
<reference anchor='Jacobson88'>
<front>
<title>Congestion Avoidance and Control</title>
<author initials='V' surname='Jacobson' fullname='Van Jacobson'>
    <organization /></author>
<date year='1988' month='August' />
</front>
<seriesInfo name='SIGCOMM' value="18(4)" />
</reference>


<!-- ./bibxml/reference.RFC.2140.xml -->
<?xml version='1.0' encoding='UTF-8'?>
<reference anchor='RFC2140'>
<front>
<title abbrev='TCP Control Block'>TCP Control Block Interdependence</title>
<author initials='J.' surname='Touch' fullname='Joe Touch'>
<organization>University of Southern California/Information Sciences Institute</organization>
<address>
<postal>
<street>4676 Admiralty Way</street>
<street>Marina del Rey</street>
<street>CA 90292-6695</street>
<country>USA</country></postal>
<phone>+1 310-822-1511 x151</phone>
<facsimile>+1 310-823-6714</facsimile>
<email>touch@isi.edu</email>
<uri>http://www.isi.edu/~touch</uri></address></author>
<date year='1997' month='April' />
<area>Transport</area>
<keyword>TCP</keyword>
<keyword>congestion</keyword>
<keyword>transmission control protocol</keyword>
<abstract>
<t>
   This memo makes the case for interdependent TCP control blocks, where
   part of the TCP state is shared among similar concurrent connections,
   or across similar connection instances. TCP state includes a
   combination of parameters, such as connection state, current round-
   trip time estimates, congestion control information, and process
   information.  This state is currently maintained on a per-connection
   basis in the TCP control block, but should be shared across
   connections to the same host. The goal is to improve transient
   transport performance, while maintaining backward-compatibility with
   existing implementations.
</t>
<t>
   This document is a product of the LSAM project at ISI.
</t></abstract></front>
<seriesInfo name='RFC' value='2140' />
<format type='TXT' octets='26032' target='http://www.rfc-editor.org/rfc/rfc2140.txt' />
<format type='HTML' octets='38814' target='http://xml.resource.org/public/rfc/html/rfc2140.html' />
<format type='XML' octets='23381' target='http://xml.resource.org/public/rfc/xml/rfc2140.xml' />
</reference>


<!-- ./bibxml/reference.RFC.2861.xml -->
<?xml version='1.0' encoding='UTF-8'?>


<reference anchor='RFC2861'>


<front>
<title>TCP Congestion Window Validation</title>
<author initials='M.' surname='Handley' fullname='M. Handley'>
<organization /></author>
<author initials='J.' surname='Padhye' fullname='J. Padhye'>
<organization /></author>
<author initials='S.' surname='Floyd' fullname='S. Floyd'>
<organization /></author>
<date year='2000' month='June' />
<abstract>
<t>This document describes a simple modification to TCP's congestion control algorithms to decay the congestion window cwnd after the transition from a sufficiently-long application-limited period, while using the slow-start threshold ssthresh to save information about the previous value of the congestion window.  This memo defines an Experimental Protocol for the Internet community.</t></abstract></front>


<seriesInfo name='RFC' value='2861' />
<format type='TXT' octets='26993' target='http://www.rfc-editor.org/rfc/rfc2861.txt' />
</reference>


<!-- ./bibxml/reference.RFC.3517.xml -->
<?xml version='1.0' encoding='UTF-8'?>
<reference anchor='RFC3517'>
<front>
<title>A Conservative Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP</title>
<author initials='E.' surname='Blanton' fullname='E. Blanton'>
<organization /></author>
<author initials='M.' surname='Allman' fullname='M. Allman'>
<organization /></author>
<author initials='K.' surname='Fall' fullname='K. Fall'>
<organization /></author>
<author initials='L.' surname='Wang' fullname='L. Wang'>
<organization /></author>
<date year='2003' month='April' />
<abstract>
<t>This document presents a conservative loss recovery algorithm for TCP that is based on the use of the selective acknowledgment (SACK) TCP option.  The algorithm presented in this document conforms to the spirit of the current congestion control specification (RFC 2581), but allows TCP senders to recover more effectively when multiple segments are lost from a single flight of data. [STANDARDS-TRACK]</t></abstract></front>
<seriesInfo name='RFC' value='3517' />
<format type='TXT' octets='27855' target='http://www.rfc-editor.org/rfc/rfc3517.txt' />
</reference>


<!-- ./bibxml/reference.RFC.5681.xml -->
<?xml version='1.0' encoding='UTF-8'?>
<reference anchor='RFC5681'>
<front>
<title>TCP Congestion Control</title>
<author initials='M.' surname='Allman' fullname='M. Allman'>
<organization /></author>
<author initials='V.' surname='Paxson' fullname='V. Paxson'>
<organization /></author>
<author initials='E.' surname='Blanton' fullname='E. Blanton'>
<organization /></author>
<date year='2009' month='September' />
<abstract>
<t>This document defines TCP's four intertwined congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery.  In addition, the document specifies how TCP should begin transmission after a relatively long idle period, as well as discussing various acknowledgment generation methods.  This document obsoletes RFC 2581. [STANDARDS-TRACK]</t></abstract></front>


<seriesInfo name='RFC' value='5681' />
<format type='TXT' octets='44339' target='http://www.rfc-editor.org/rfc/rfc5681.txt' />
</reference>


<!-- ./bibxml/reference.RFC.5682.xml -->
<?xml version='1.0' encoding='UTF-8'?>
<reference anchor='RFC5682'>
<front>
<title>Forward RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious Retransmission Timeouts with TCP</title>
<author initials='P.' surname='Sarolahti' fullname='P. Sarolahti'>
<organization /></author>
<author initials='M.' surname='Kojo' fullname='M. Kojo'>
<organization /></author>
<author initials='K.' surname='Yamamoto' fullname='K. Yamamoto'>
<organization /></author>
<author initials='M.' surname='Hata' fullname='M. Hata'>
<organization /></author>
<date year='2009' month='September' />
<abstract>
<t>The purpose of this document is to move the F-RTO (Forward RTO-Recovery) functionality for TCP in RFC 4138 from Experimental to Standards Track status. The F-RTO support for Stream Control Transmission Protocol (SCTP) in RFC 4138 remains with Experimental status. See Appendix B for the differences between this document and RFC 4138.&lt;/t>&lt;t> Spurious retransmission timeouts cause suboptimal TCP performance because they often result in unnecessary retransmission of the last window of data. This document describes the F-RTO detection algorithm for detecting spurious TCP retransmission timeouts. F-RTO is a TCP sender-only algorithm that does not require any TCP options to operate. After retransmitting the first unacknowledged segment triggered by a timeout, the F-RTO algorithm of the TCP sender monitors the incoming acknowledgments to determine whether the timeout was spurious. It then decides whether to send new segments or retransmit unacknowledged segments. The algorithm effectively helps to avoid additional unnecessary retransmissions and thereby improves TCP performance in the case of a spurious timeout. [STANDARDS-TRACK]</t></abstract></front>
<seriesInfo name='RFC' value='5682' />
<format type='TXT' octets='47337' target='http://www.rfc-editor.org/rfc/rfc5682.txt' />
</reference>


<!-- bibxml3/reference.I-D.ietf-tcpm-proportional-rate-reduction.xml -->
<?xml version='1.0' encoding='UTF-8'?>
<reference anchor='I-D.ietf-tcpm-proportional-rate-reduction'>
<front>
<title>Proportional Rate Reduction for TCP</title>
<author initials='M' surname='Mathis' fullname='Matt Mathis'>
    <organization /></author>
<author initials='N' surname='Dukkipati' fullname='Nandita Dukkipati'>
    <organization /></author>
<author initials='Y' surname='Cheng' fullname='Yuchung Cheng'>
    <organization /></author>
<date month='October' day='24' year='2011' />
<abstract><t>This document describes an experimental algorithm, Proportional Rate Reduction (PPR) to improve the accuracy of the amount of data sent by TCP during loss recovery.  Standard Congestion Control requires that TCP and other protocols reduce their congestion window in response to losses.  This window reduction naturally occurs in the same round trip as the data retransmissions to repair the losses, and is implemented by choosing not to transmit any data in response to some ACKs arriving from the receiver.  Two widely deployed algorithms are used to implement this window reduction: Fast Recovery and Rate Halving.  Both algorithms are needlessly fragile under a number of conditions, particularly when there is a burst of losses that such that the number of ACKs returning to the sender is small. Proportional Rate Reduction avoids these excess window reductions such that at the end of recovery the actual window size will be as close as possible to ssthresh, the window size determined by the congestion control algorithm.  It is patterned after Rate Halving, but using the fraction that is appropriate for target window chosen by the congestion control algorithm.</t></abstract>
</front>
<seriesInfo name='Internet-Draft' value='draft-ietf-tcpm-proportional-rate-reduction-00' />
<format type='TXT'
        target='http://www.ietf.org/internet-drafts/draft-ietf-tcpm-proportional-rate-reduction-00.txt' />
</reference>


<reference anchor='IMC11PRR'>
<front>
<title>Proportional Rate Reduction for TCP</title>
<author initials='M' surname='Mathis' fullname='Matt Mathis'>
    <organization /></author>
<author initials='N' surname='Dukkipati' fullname='Nandita Dukkipati'>
    <organization /></author>
<author initials='Y' surname='Cheng' fullname='Yuchung Cheng'>
    <organization /></author>
<author initials='M' surname='Ghobadi' fullname='Monia Ghobadi'>
    <organization /></author>
<date year='2011' />
<abstract><t>Packet losses increase latency for Web users. Fast recovery is a key mechanism for TCP to recover from packet losses. In this paper, we explore some of the weaknesses of the standard algorithm described in RFC 3517 and the non-standard algorithms implemented in Linux. We find that these algorithms deviate from their intended behavior in the real world due to the combined effect of short flows, application stalls, burst losses, acknowledgment (ACK) loss and reordering, and stretch ACKs. Linux suffers from excessive congestion window reductions while RFC 3517 transmits large bursts under high losses, both of which harm the rest of the flow and increase Web latency.</t>
<t>Our primary contribution is a new design to control transmission in fast recovery called proportional rate reduction (PRR). PRR recovers from losses quickly, smoothly and accurately by pacing out retransmissions across received ACKs. In addition to PRR, we evaluate the TCP early retransmit (ER) algorithm which lowers the duplicate acknowledgment threshold for short transfers, and show that delaying early retransmissions for a short interval is effective in avoiding spurious retransmissions in the presence of a small degree of reordering. PRR and ER reduce the TCP latency of connections experiencing losses by 3-10% depending on the response size. Based on our instrumentation on Google Web and YouTube servers in U.S. and India, we also present key statistics on the nature of TCP retransmissions.
 
</t></abstract>
</front>
<seriesInfo name='Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference' value=''/>
</reference>


<!-- NOTUSED 
      <reference anchor="SPDY"
         target="http://dev.chromium.org/spdy">
        <front>
          <title>SPDY: An experimental protocol for a faster web</title>
          <author fullname='Mike Belshe' surname='Belshe'>
            <organization abbrev='Google'>Google, Inc.</organization>
          </author>
          <date year="2011" />
        </front>
      </reference>
NOTUSED ->


<!-- Van Jacobson Sigcomm 88 -->
<reference anchor='Savage99'>
<front>
<title>TCP congestion control with a misbehaving receiver
</title>
<author initials='S' surname='Savage' > <organization /></author>
<author initials='N' surname='Cardwell' > <organization /></author>
<author initials='D' surname='Wetherall' > <organization /></author>
<author initials='T' surname='Anderson' > <organization /></author>
<date year='1999' month='October ' />
</front>
<seriesInfo name='SIGCOMM Comput. Commun. Rev.' value="29(5)" />
</reference>


</references>
</back>
</rfc>
