INTERNET-DRAFT                     Amy Hughes, Joe Touch, John Heidemann
draft-ietf-tcpimpl-restart-00.txt                                    ISI
                                                          March 30, 1998
                                                 Expires: Sept. 30, 1998

              Issues in TCP Slow-Start Restart After Idle

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet- Drafts as reference
   material or to cite them other than as ``work in progress.''

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   The distribution of this document is unlimited.

Abstract

   This draft discusses variations in the TCP 'slow-start restart' (SSR)
   algorithm, and the unintended failure of some variations to properly
   restart in some environments. SSR is intended to avoid line-rate
   bursts after idle periods, where TCP accumulates permission to send
   in the form of ACKs, but does not consume that permission
   immediately. SSR's original "restart after send is idle" is commonly
   implemented as "restart after receive is idle". The latter
   unintentionally fails to restart for bidirectional connections where
   the sender's burst is triggered by a reverse-path data packet, such
   as in persistent HTTP. Both the former and latter are shown to permit
   bursts in other circumstances. Three solutions are discussed, and
   their implementations evaluated.

   This document is a product of the LSAM project at ISI.  Comments are
   solicited and should be addressed to the authors.

Introduction

   Slow-Start Restart (SSR) describes one TCP behavior to respond to
   long sending pauses in an open connection.  When a sender becomes
   idle, the normal ack-clocking mechanism which regulates traffic is no
   longer present and the sender may introduce a burst of packets into
   the network as large as the current congestion window (CWND).  Such a
   burst may be too large for the intermediate routers to handle and may
   be too large for the receiver to handle at one time as well.

   A send timer was first proposed [JK90] to detect idle sending
   periods; the recommended response is to close the congestion window
   and perform a new slow-start.  However, a footnote to this first
   proposed solution noted that send/receive symmetry on the channel
   meant that a receive timer could be used instead to achieve the same
   results.  As this second solution takes advantage of a timer that is
   already required (to detect packet loss) it was implemented by
   Jacobson and Karels.  This solution has been repeated in
   implementations which derive from their work.

   Bursty connections, such as the persistent connections required in
   HTTP/1.1 [FGMFB97] have been found to interact in meaningful ways
   with SSR [6].  In fact, it was discovered that SSR never occurs with
   HTTP/1.1 [Poo97].  This is because a new request will reset the
   receive timer (as suggested in the footnote in [JK90]) and the
   sending pause will not be detected [Tou97].

   Further, both timer solutions depend on the retransmit timeout (RTO)
   and cannot detect send pauses that are shorter than this duration.
   In such cases, the sender may transmit a burst as large as the full
   congestion window.

Burst detection.

   There are several ways of determining whether a connection is at risk
   of sending a burst of packets into the channel.  We will discuss each
   method below, from the least radical to the most radical.

 Receive Timer:
   The use of a receive timer is the most common burst detection method.
   It is attractive because it is simple and makes use of an existing
   timer.  However, a receive timer does not properly detect bursts in
   HTTP/1.1 because the timer is cancelled when the request packet is
   received.  Further, when the connection is idle for less than a full
   RTO, a burst cannot be detected.  Such a burst can happen when the
   connection is "nearly idle" or when acks are lost or reordered.

 Send Timer:
   A send timer is the reciprocal solution to using a receive timer.
   While it requires a new timestamp field to be maintained, it clearly
   detects send pauses and corrects the problem presented by HTTP/1.1.
   However, as with the receive timer, it cannot detect bursts that
   could happen before a full RTO.

 Packet Counting:
   An alternative method examines the unused portion of the congestion
   window to determine if the capacity to burst exists.  This method is
   simple, it uses existing information to make its decision, and it
   solves both the HTTP/1.1. problem as well as the RTO problem.  In
   addition, it addresses the problem that needs to be solved (bursts)
   instead of a specific circumstance where the problem could happen
   (send pauses).  However, where timer detection avoids defining a
   burst (it defines idle periods instead), here a burst must be defined
   before it can be detected.  One possible definition is the situation
   where the available portion of the sending window is some proportion
   of the entire congestion window, say 50%.  Another definition places
   a numerical limit on the available portion of the congestion window,
   say 4 or CWND-1 packets.

Burst Response

   Once a burst is detected, there are several different ways to take
   action.  The different possibilities are listed below, again from
   least to most radical.

 Full Restart:
   Reducing the congestion window to one packet and re-entering slow-
   start, the original slow-start restart is one response.  This was the
   solution proposed by J&K.  This is a very conservative response and
   it defeats most of the speedup that HTTP/1.1 provides [HOT97].
   Current proposals [FAP97] have suggested increasing the initial
   window from 1 packet to 4 packets.  Further, depending on the method
   of burst detection, Full Restart can be far more punitive than it
   should be.  Coupled with a timer, full restart is most likely to
   respond to a completely empty congestion window.  Coupled with Packet
   Counting, the response could close the window too far, even smaller
   than the amount of outstanding data.

 Window Limiting:
   This is a modified version of Full Restart which solves the problem
   created by using Packet Counting to detect bursts.  With this type of
   response, the congestion window is reduced to the amount of
   outstanding data plus the slow-start initial window (1, 2, or 4).  It
   works exactly like Full Restart in the idle case, but is successful
   at controlling bursts in an active connection.  Further, in an active
   connection, it effectively implements a leaky bucket of the initial
   window size for the accumulation of send opportunity based on the
   receipt of acks.  This solution is fairly conservative, especially as
   it defaults to Full Restart, but more importantly, sending
   opportunity is simply lost if not used, and is not available for
   paced output.  Also, it forces negative congestion feedback on the
   congestion window.

 Burst Size Limitation:
   When a burst is detected, its effects are limited, the sender may not
   send any more than a preset number of packets into the network.  It
   is less conservative than the first two responses in that it does not
   affect the size of the congestion window, and it is simple to
   implement, simply count up the number of packets you can send and
   stop when you reach the limit.  Whether to wait for an ack or some
   other signal to resume sending is an implementation detail.  Lastly,
   this burst response can be performed after each ack or with each
   send. The behavior is slightly different in each case.

 Pacing:
   When a burst is detected, packets are dribbled into the network until
   the sender starts receiving acks and normal maintenance can be
   resumed [VH97].  This solution is very easy on the network and scales
   well in cases of high bw/delay.  However, it requires a new timer and
   parameter tuning require more research.

Implemented Solutions

   Now we will examine combinations of the different detection and
   response methods presented above.  Each of the solutions that below
   have been implemented in some form.

 BSD Implementation (Jacobson and Karels)
   The most common implementation uses a receive timer coupled with Full
   Restart.  This is the implementation that causes the interaction
   problems with HTTP/1.1.  The obvious alternative is to implement a
   send timer as originally intended and use Full Restart.  There are
   several drawbacks to this solution.  First, a send timer adds
   additional state and serves no purpose other than to correct the
   bursting behavior after send pauses.  Second, forcing a slow-start in
   this situation is problematic for HTTP/1.1.  A slow-start for each
   new user request adds a delay burden to characteristically small HTTP
   responses. Further, the HTTP user request pattern is unpredictable.
   It is possible for the user to make a new request before the send
   timer expires, triggering a burst that would defeat such a timer.

 Maximum Burst Limitation (Floyd)
   Floyd has proposed a coupling of Packet Counting with Burst Size
   Limitation.  This solution has been implemented in ns and it prevents
   the sender from transmitting a series of back-to-back packets larger
   than the user configured burst limit (suggested to be 4 packets)
   [NS97].  There are several issues involved with recovering from a
   burst and the ns implementation doesn't address them consistently.
   First, it is not clear when the sender is allowed to send again after
   sending the the first limited burst of packets.  One implementation
   requires the sender to wait for the burst timer to expire.  Another
   seems to allow a series of short bursts.  Another issue is how the
   simulation implementation and usage translates to a live network
   situation.  The implementation of this solution can range from simple
   to more complex.

 Congestion Window Monitoring (Hughes, Touch, and Heidemann)
   Our proposed solution combines Packet Counting with Window Limiting.
   Whenever (CWND - outstanding data > 4), we reduce CWND to
   (outstanding data + 4).  The choice of 4 packets is discussed in with
   the implementation details below.  Congestion Window Monitoring (CWM)
   allows the congestion window to grow normally but shrinks the
   congestion window as the sender becomes idle.  It also prevents the
   sender from transmitting any bursts larger than 4 packets in response
   to a new request. Because CWM is not dependent on any timers, the
   loss of an ack or a nearly idle connection cannot cause any bursts.
   CWM is similar to Burst Limitation, but avoids the burst by reducing
   CWND, rather than by inhibiting the sends directly.  As a result, we
   avoid the potential problem of sequential calls to TCP_output, which
   would cause bursts in the former, but not the latter.  CWM also
   causes TCP to use the feedback of 'not using the CWND fast enough',
   which results in a decrease in the CWND.

   CWM effectively imposes a leaky bucket type limitation on the
   congestion window.  The window is allowed to grow and be managed
   normally but the sender is not allowed to save up any sending
   opportunities.  Any opportunity that is not used is lost.  This
   property of CWM forces interleaved reception of acks and processing
   of sends.

 Rate Based Pacing (Visweswaraiah and Heidemann)
   Rate Based Pacing combines the Pacing response with either a Send
   Timer or Packet Counting.  It avoids slow-start when resuming after
   sending pauses and allows the normal clocking of packets to be
   gracefully restarted.  When a burst potential is detected, the
   algorithm meters a small burst of packets into the channel [VH97].
   RBP is the least conservative solution to the bursting problem
   because it continues to make use of the pre-pause congestion window.
   If network conditions have changed significantly, maintaining the
   previous window could cause the paced connection to be overly
   aggressive as compared to other connections.  (Although some work
   suggests congestion windows are stable over multi-minute timeframes
   [BSSK97].)  More recently pacing been suggested for use in wireless
   networking scenarios [BPK97], and for satellite connections.

Experimental Comparisons

   Packet traces of the current FreeBSD implementation of SSR (using the
   receive timer), of a modified version of FreeBSD using a send timer,
   and of CWM with HTTP/1.1 support the above observations.  In all of
   the traces, the response pattern for the first request is the same
   with each method.  This shows that CWM allows the congestion window
   to grow normally.  Because of the different actions taken by the
   three algorithms, the response pattern for the second request differs
   as would be expected.  [We have graphs available upon request]

   When the second request arrives at the server after the
   retransmission timeout (RTO), normal FreeBSD allows the server to
   respond with a burst of packets.  FreeBSD using a send timer responds
   by entering slow-start. CWM allows a 4 packet burst.  When the second
   request arrives at the server before the RTO, both timer
   implementations allow a burst.  CWM again limits the burst to 4
   packets.  Note, RTO is the common timer limit, but any value would
   have the same results, depending on when the second request was
   presented in relation to the timer.

Implementation of Congestion Window Monitoring

   Congestion Window Monitoring requires a simple modification to
   existing TCP output routines.  The changes required replace the
   current idle detection code.  Replace the existing 3 lines of code:

          idle = (snd_max == snd_una)
          if (idle && now - lastrcv >= rto)
                  cwnd = 1;

   with the following 3 lines of code:

          maxwin = 4 + snd_nxt - snd_una;
          if (cwnd > maxwin)
                  cwnd = maxwin;

   Packet counting is implemented by line 1.  Lines 2 and 3 implement
   Window Limitation.

   The choice of limiting the available congestion window to 4 packets
   is based on the normal operation of TCP.  An ACK received by the
   sender may be in response to the receipt of 2 packets, allowing
   another 2 to be sent.  Further, normal window growth may require the
   sending of a third packet.  Lastly, in slow-start with delayed ACKs,
   the receipt of an ACK can trigger the sending of 4 packets. Thus, 4
   packets is a reasonable burst to send into the network.

   Increasing the initial window in slow-start to 4 packets has already
   been proposed [FAP97].  The effects of this change have been explored
   in simulation in [PN98] and in practice in [AHO97].  Such a
   modification to TCP would cause the same behavior as our solution in
   the cases where the pause timer has expired.  It does not address the
   pre-timeout bursting situation we are concerned with.

Conclusions

   At this time, we propose CWM as a simple, minimal and effective fix
   to the 'bug' in current TCP implementations that is exploited by
   HTTP/1.1.  Modifications can be made to TCP to solve the slow-start
   restart problem that are consistent with the original congestion
   avoidance specifications (i.e. a send timer).  However, we feel that
   the original intended behavior is not appropriate to some current
   applications, specifically HTTP. Thus, we recommend Congestion Window
   Monitoring to prevent bursts into the network.  Not only does this
   solution solve the current problem in a simple way, it will prevent
   bursting in any other situation that might arise. The 4 packet bursts
   which we allow are consistent with congestion window growth
   algorithms and with Floyd's conclusion about increasing the initial
   window size.

   CWM, as well as the other solutions listed, need to be re-evaluated
   within emerging TCP implementations, e.g., SACK [JB88].  In general,
   TCP has no rate pacing and uses congestion control to avoid bursts in
   current implementations.  A more explicit mechanism, such as RBP or
   similar proposals may be desirable in the future.

Security implications

   CWM presents no security problems.

References


   [AHO97] Mark Allman, Chris Hayes, and Shawn Ostermann.  An Evaluatin
       of TCP Slow Start Modifications, July 1997.  (Submitted to CCR,
       draft available from http://jarok.cs.ohiou.edu/papers/)

   [BPK97] Hari Balakrishnan, Venkata N. Padmanabhan, and Randy H. Katz.
       The Effects of Asymmetry on TCP Performance.  In Proceedings of
       the ACM/IEEE Mobicom, Budapest, Hungary, ACM.  September, 1997.

   [BSSK97] Hari Balakrishnan, Srinivasan Seshan, Mark Stemm, and Randy
       H. Katz.  Analyzing Stability in Wide-Area Network Performance.
       In Proceedings of the ACM SIGMETRICS, Seattle WA, USA, ACM.
       June, 1997.

   [FGMFB97] R. Fielding, Jim Gettys, Jeffrey C. Mogul, H. Frystyk, and
       Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, January
       1997.  RFC 2068.

   [FAP97] Sally Floyd, Mark Allman, and Craig Partridge.  Increasing
       TCP's Initial Window, July 1997.  Internet Draft draft-floyd-
       incr-init-win-01.txt

   [Hei97] John Heidemann. Performance Interactions Between P-HTTP and
       TCP Implementations.  ACM Computer Communications Review, 27(2),
       65-73, April 1997.

   [HOT97] John Heidemann, Katia Obraczka, and Joe Touch.  Modeling the
       Performance of HTTP Over Several Transport Protocols.  ACM/IEEE
       Transactions on Networking 5(5), 616-630, October, 1997.

   [JB88] Van Jacobson and R.T. Braden. TCP extensions for long-delay
       paths, October 1988. RFC 1072.

   [JK90] Van Jacobson and Michael J. Karels.  Congestion Avoidance and
       Control.  ACM Computer Communication Review, 18(4):314-329,
       August 1990. Revised version of his SIGCOMM '88 paper.

   [NS97] ns Network Simulator.  http://www-mash.cs.berkeley.edu/ns/,
       1997.

   [PN98] K. Poduri and K. Nichols. Simulation Studies of Increased
       Initial TCP Window Size, February 1998.  Internet Draft draft-
       ietf-tcpimpl-poduri-00.txt

   [Poo97] Kacheong Poon, Sun Microsystems, tcp-implementors mailing
       list, August, 1997.

   [Tou97] Joe Touch, ISI, tcp-implementors mailing list, August 12,
       1997.

   [VH97] Vikram Visweswaraiah and John Heidemann.  Improving Restart of
       Idle TCP Connections.  Technical Report 97-661, University of
       Southern California, November 1997.


Authors/ Address

   Amy Hughes, Joe Touch, John Hiedemann
   University of Southern California/Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   USA
   Phone: +1 310-822-1511
   Fax:   +1 310-823-6714
   URLs:   http://www.isi.edu/~ahughes
           http://www.isi.edu/~touch
           http://www.isi.edu/~johnh
   Email: ahughes@isi.edu
          touch@isi.edu
          johnh@isi.edu