idnits 2.17.1 

draft-ietf-tcpimpl-restart-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 452 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([AHO97], [JK90], [BSSK97],
     [JB88], [BPK97], [NS97], [VH97], [6], [PN98], [Tou97], [FAP97], [HOT97],
     [Poo97], [FGMFB97]), which it shouldn't.  Please replace those with
     straight textual mentions of the documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The "Author's Address" (or "Authors' Addresses") section title is
     misspelled.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 30, 1998) is 9517 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '6' on line 66

  == Unused Reference: 'Hei97' is defined on line 343, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'AHO97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BPK97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BSSK97'

  ** Obsolete normative reference: RFC 2068 (ref. 'FGMFB97') (Obsoleted by
     RFC 2616)

  == Outdated reference: A later version (-02) exists of
     draft-floyd-incr-init-win-01

  ** Downref: Normative reference to an Experimental draft:
     draft-floyd-incr-init-win (ref. 'FAP97')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Hei97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HOT97'

  ** Obsolete normative reference: RFC 1072 (ref. 'JB88') (Obsoleted by RFC
     1323, RFC 2018, RFC 6247)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'JK90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NS97'

  == Outdated reference: A later version (-01) exists of
     draft-ietf-tcpimpl-poduri-00

  ** Downref: Normative reference to an Informational draft:
     draft-ietf-tcpimpl-poduri (ref. 'PN98')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Poo97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Tou97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VH97'


     Summary: 14 errors (**), 0 flaws (~~), 6 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET-DRAFT                     Amy Hughes, Joe Touch, John Heidemann
2	draft-ietf-tcpimpl-restart-00.txt                                    ISI
3	                                                          March 30, 1998
4	                                                 Expires: Sept. 30, 1998

6	              Issues in TCP Slow-Start Restart After Idle

8	Status of this Memo

10	   This document is an Internet-Draft.  Internet-Drafts are working
11	   documents of the Internet Engineering Task Force (IETF), its areas,
12	   and its working groups.  Note that other groups may also distribute
13	   working documents as Internet-Drafts.

15	   Internet-Drafts are draft documents valid for a maximum of six months
16	   and may be updated, replaced, or obsoleted by other documents at any
17	   time.  It is inappropriate to use Internet-Drafts as reference
18	   material or to cite them other than as ``work in progress.''

20	   Please check the I-D abstract listing contained in each Internet
21	   Draft directory to learn the current status of this or any other
22	   Internet Draft.

24	   The distribution of this document is unlimited.

26	Abstract

28	   This draft discusses variations in the TCP 'slow-start restart' (SSR)
29	   algorithm, and the unintended failure of some variations to properly
30	   restart in some environments. SSR is intended to avoid line-rate
31	   bursts after idle periods, where TCP accumulates permission to send
32	   in the form of ACKs, but does not consume that permission
33	   immediately. SSR's original "restart after send is idle" is commonly
34	   implemented as "restart after receive is idle". The latter
35	   unintentionally fails to restart for bidirectional connections where
36	   the sender's burst is triggered by a reverse-path data packet, such
37	   as in persistent HTTP. Both the former and latter are shown to permit
38	   bursts in other circumstances. Three solutions are discussed, and
39	   their implementations evaluated.

41	   This document is a product of the LSAM project at ISI.  Comments are
42	   solicited and should be addressed to the authors.

44	Introduction

46	   Slow-Start Restart (SSR) describes one TCP behavior to respond to
47	   long sending pauses in an open connection.  When a sender becomes
48	   idle, the normal ack-clocking mechanism which regulates traffic is no
49	   longer present and the sender may introduce a burst of packets into
50	   the network as large as the current congestion window (CWND).  Such a
51	   burst may be too large for the intermediate routers to handle and may
52	   be too large for the receiver to handle at one time as well.

54	   A send timer was first proposed [JK90] to detect idle sending
55	   periods; the recommended response is to close the congestion window
56	   and perform a new slow-start.  However, a footnote to this first
57	   proposed solution noted that send/receive symmetry on the channel
58	   meant that a receive timer could be used instead to achieve the same
59	   results.  As this second solution takes advantage of a timer that is
60	   already required (to detect packet loss) it was implemented by
61	   Jacobson and Karels.  This solution has been repeated in
62	   implementations which derive from their work.

64	   Bursty connections, such as the persistent connections required in
65	   HTTP/1.1 [FGMFB97], have been found to interact in meaningful ways
66	   with SSR [6].  In fact, it was discovered that SSR never occurs with
67	   HTTP/1.1 [Poo97].  This is because a new request will reset the
68	   receive timer (as suggested in the footnote in [JK90]) and the
69	   sending pause will not be detected [Tou97].

71	   Further, both timer solutions depend on the retransmit timeout (RTO)
72	   and cannot detect send pauses that are shorter than this duration.
73	   In such cases, the sender may transmit a burst as large as the full
74	   congestion window.

76	Burst Detection

78	   There are several ways of determining whether a connection is at risk
79	   of sending a burst of packets into the channel.  We will discuss each
80	   method below, from the least radical to the most radical.

82	 Receive Timer:
83	   The use of a receive timer is the most common burst detection method.
84	   It is attractive because it is simple and makes use of an existing
85	   timer.  However, a receive timer does not properly detect bursts in
86	   HTTP/1.1 because the timer is cancelled when the request packet is
87	   received.  Further, when the connection is idle for less than a full
88	   RTO, a burst cannot be detected.  Such a burst can happen when the
89	   connection is "nearly idle" or when acks are lost or reordered.

91	 Send Timer:
92	   A send timer is the reciprocal solution to using a receive timer.
93	   While it requires a new timestamp field to be maintained, it clearly
94	   detects send pauses and corrects the problem presented by HTTP/1.1.
95	   However, as with the receive timer, it cannot detect bursts that
96	   could happen before a full RTO.

98	 Packet Counting:
99	   An alternative method examines the unused portion of the congestion
100	   window to determine if the capacity to burst exists.  This method is
101	   simple, it uses existing information to make its decision, and it
102	   solves both the HTTP/1.1 problem and the RTO problem.  In
103	   addition, it addresses the problem that needs to be solved (bursts)
104	   instead of a specific circumstance where the problem could happen
105	   (send pauses).  However, where timer detection avoids defining a
106	   burst (it defines idle periods instead), here a burst must be defined
107	   before it can be detected.  One possible definition is the situation
108	   where the available portion of the sending window is some proportion
109	   of the entire congestion window, say 50%.  Another definition places
110	   a numerical limit on the available portion of the congestion window,
111	   say 4 or CWND-1 packets.
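
   As an illustration only, the two candidate burst definitions can be
   written as small C predicates.  The function names, the use of
   whole-packet counters, and the exact comparisons ("at least" the
   given percentage, "more than" the given count) are assumptions made
   for this sketch; the text above does not fix them.

```c
#include <assert.h>
#include <stdbool.h>

/* Unused portion of the congestion window, in packets.
 * cwnd and outstanding are hypothetical per-connection counters. */
unsigned unused_window(unsigned cwnd, unsigned outstanding)
{
    return cwnd > outstanding ? cwnd - outstanding : 0;
}

/* Definition 1 (assumed form): a burst is possible when the unused
 * portion is at least `percent` percent of the congestion window. */
bool burst_by_fraction(unsigned cwnd, unsigned outstanding, unsigned percent)
{
    assert(percent <= 100);
    if (cwnd == 0)
        return false;
    return unused_window(cwnd, outstanding) * 100 >= cwnd * percent;
}

/* Definition 2 (assumed form): a burst is possible when the unused
 * portion exceeds a fixed number of packets, e.g. 4. */
bool burst_by_count(unsigned cwnd, unsigned outstanding, unsigned limit)
{
    return unused_window(cwnd, outstanding) > limit;
}
```

   With a window of 10 packets and 5 outstanding, both predicates (50%
   and a limit of 4) report burst potential; with 8 outstanding,
   neither does.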

113	Burst Response

115	   Once a burst is detected, there are several different ways to take
116	   action.  The different possibilities are listed below, again from
117	   least to most radical.

119	 Full Restart:
120	   One response is the original slow-start restart: reduce the congestion
121	   window to one packet and re-enter slow-start.  This was the
122	   solution proposed by J&K.  This is a very conservative response and
123	   it defeats most of the speedup that HTTP/1.1 provides [HOT97].
124	   Current proposals [FAP97] have suggested increasing the initial
125	   window from 1 packet to 4 packets.  Further, depending on the method
126	   of burst detection, Full Restart can be far more punitive than it
127	   should be.  Coupled with a timer, full restart is most likely to
128	   respond to a completely empty congestion window.  Coupled with Packet
129	   Counting, the response could close the window too far, even smaller
130	   than the amount of outstanding data.

132	 Window Limiting:
133	   This is a modified version of Full Restart which solves the problem
134	   created by using Packet Counting to detect bursts.  With this type of
135	   response, the congestion window is reduced to the amount of
136	   outstanding data plus the slow-start initial window (1, 2, or 4).  It
137	   works exactly like Full Restart in the idle case, but is successful
138	   at controlling bursts in an active connection.  Further, in an active
139	   connection, it effectively implements a leaky bucket of the initial
140	   window size for the accumulation of send opportunity based on the
141	   receipt of acks.  This solution is fairly conservative, especially as
142	   it defaults to Full Restart, but more importantly, sending
143	   opportunity is simply lost if not used, and is not available for
144	   paced output.  Also, it forces negative congestion feedback on the
145	   congestion window.

147	 Burst Size Limitation:
148	   When a burst is detected, its effects are limited: the sender may not
149	   send any more than a preset number of packets into the network.  It
150	   is less conservative than the first two responses in that it does not
151	   affect the size of the congestion window, and it is simple to
152	   implement: simply count up the number of packets you can send and
153	   stop when you reach the limit.  Whether to wait for an ack or some
154	   other signal to resume sending is an implementation detail.  Lastly,
155	   this burst response can be performed after each ack or with each
156	   send. The behavior is slightly different in each case.
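
   A minimal sketch of this counting, in C, is given here for
   illustration.  The function name and the packet-based parameters are
   hypothetical; when sending may resume after the limit is reached is
   left open, as in the text above.

```c
#include <assert.h>

/* Number of packets that may be sent back-to-back right now.
 * The congestion window itself is left unchanged; only the size of
 * this one burst is capped at max_burst. */
unsigned burst_limited_send(unsigned cwnd, unsigned outstanding,
                            unsigned queued, unsigned max_burst)
{
    unsigned window, n;

    assert(max_burst > 0);
    /* Room left in the congestion window, in packets. */
    window = cwnd > outstanding ? cwnd - outstanding : 0;
    /* Never send more than is queued or more than the window allows. */
    n = queued < window ? queued : window;
    /* Stop counting once the preset limit is reached. */
    return n < max_burst ? n : max_burst;
}
```

   For example, with a window of 20 packets, nothing outstanding, 15
   packets queued, and a limit of 4, only 4 packets are released; the
   congestion window remains 20.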

158	 Pacing:
159	   When a burst is detected, packets are dribbled into the network until
160	   the sender starts receiving acks and normal maintenance can be
161	   resumed [VH97].  This solution is very easy on the network and scales
162	   well in cases of high bw/delay.  However, it requires a new timer and
163	   its parameter tuning requires more research.
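
   To make the idea of dribbling packets concrete, the sketch below
   spreads one congestion window evenly over one round-trip time.  This
   spacing rule, the function names, and the microsecond units are
   assumptions chosen for illustration; they are not parameters taken
   from [VH97].

```c
#include <assert.h>
#include <stdint.h>

/* Gap between paced packets, in microseconds, assuming one window of
 * cwnd_pkts packets is spread evenly over one round-trip time. */
uint32_t pacing_interval_us(uint32_t rtt_us, uint32_t cwnd_pkts)
{
    assert(cwnd_pkts > 0);
    return rtt_us / cwnd_pkts;
}

/* Offset from the restart at which paced packet k (0-based) is sent. */
uint32_t pacing_send_offset_us(uint32_t rtt_us, uint32_t cwnd_pkts,
                               uint32_t k)
{
    return k * pacing_interval_us(rtt_us, cwnd_pkts);
}
```

   Under these assumptions, a 100 ms round-trip time and a 20 packet
   window give one packet every 5 ms, which is why a new timer is
   needed to drive the sends.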

165	Implemented Solutions

167	   Now we will examine combinations of the different detection and
168	   response methods presented above.  Each of the solutions below
169	   has been implemented in some form.

171	 BSD Implementation (Jacobson and Karels)
172	   The most common implementation uses a receive timer coupled with Full
173	   Restart.  This is the implementation that causes the interaction
174	   problems with HTTP/1.1.  The obvious alternative is to implement a
175	   send timer as originally intended and use Full Restart.  There are
176	   several drawbacks to this solution.  First, a send timer adds
177	   additional state and serves no purpose other than to correct the
178	   bursting behavior after send pauses.  Second, forcing a slow-start in
179	   this situation is problematic for HTTP/1.1.  A slow-start for each
180	   new user request adds a delay burden to characteristically small HTTP
181	   responses. Further, the HTTP user request pattern is unpredictable.
182	   It is possible for the user to make a new request before the send
183	   timer expires, triggering a burst that would defeat such a timer.

185	 Maximum Burst Limitation (Floyd)
186	   Floyd has proposed a coupling of Packet Counting with Burst Size
187	   Limitation.  This solution has been implemented in ns and it prevents
188	   the sender from transmitting a series of back-to-back packets larger
189	   than the user configured burst limit (suggested to be 4 packets)
190	   [NS97].  There are several issues involved with recovering from a
191	   burst and the ns implementation doesn't address them consistently.
192	   First, it is not clear when the sender is allowed to send again after
193	   sending the first limited burst of packets.  One implementation
194	   requires the sender to wait for the burst timer to expire.  Another
195	   seems to allow a series of short bursts.  Another issue is how the
196	   simulation implementation and usage translates to a live network
197	   situation.  The implementation of this solution can range from simple
198	   to more complex.

200	 Congestion Window Monitoring (Hughes, Touch, and Heidemann)
201	   Our proposed solution combines Packet Counting with Window Limiting.
202	   Whenever (CWND - outstanding data > 4), we reduce CWND to
203	   (outstanding data + 4).  The choice of 4 packets is discussed with
204	   the implementation details below.  Congestion Window Monitoring (CWM)
205	   allows the congestion window to grow normally but shrinks the
206	   congestion window as the sender becomes idle.  It also prevents the
207	   sender from transmitting any bursts larger than 4 packets in response
208	   to a new request. Because CWM is not dependent on any timers, the
209	   loss of an ack or a nearly idle connection cannot cause any bursts.
210	   CWM is similar to Burst Limitation, but avoids the burst by reducing
211	   CWND, rather than by inhibiting the sends directly.  As a result, we
212	   avoid the potential problem of sequential calls to TCP_output, which
213	   would cause bursts in the former, but not the latter.  CWM also
214	   causes TCP to use the feedback of 'not using the CWND fast enough',
215	   which results in a decrease in the CWND.

217	   CWM effectively imposes a leaky bucket type limitation on the
218	   congestion window.  The window is allowed to grow and be managed
219	   normally but the sender is not allowed to save up any sending
220	   opportunities.  Any opportunity that is not used is lost.  This
221	   property of CWM forces interleaved reception of acks and processing
222	   of sends.

224	 Rate Based Pacing (Visweswaraiah and Heidemann)
225	   Rate Based Pacing combines the Pacing response with either a Send
226	   Timer or Packet Counting.  It avoids slow-start when resuming after
227	   sending pauses and allows the normal clocking of packets to be
228	   gracefully restarted.  When a burst potential is detected, the
229	   algorithm meters a small burst of packets into the channel [VH97].
230	   RBP is the least conservative solution to the bursting problem
231	   because it continues to make use of the pre-pause congestion window.
232	   If network conditions have changed significantly, maintaining the
233	   previous window could cause the paced connection to be overly
234	   aggressive as compared to other connections.  (Although some work
235	   suggests congestion windows are stable over multi-minute timeframes
236	   [BSSK97].)  More recently, pacing has been suggested for use in
237	   wireless networking scenarios [BPK97], and for satellite connections.

239	Experimental Comparisons

241	   Packet traces of the current FreeBSD implementation of SSR (using the
242	   receive timer), of a modified version of FreeBSD using a send timer,
243	   and of CWM with HTTP/1.1 support the above observations.  In all of
244	   the traces, the response pattern for the first request is the same
245	   with each method.  This shows that CWM allows the congestion window
246	   to grow normally.  Because of the different actions taken by the
247	   three algorithms, the response pattern for the second request differs
248	   as would be expected.  [We have graphs available upon request]

250	   When the second request arrives at the server after the
251	   retransmission timeout (RTO), normal FreeBSD allows the server to
252	   respond with a burst of packets.  FreeBSD using a send timer responds
253	   by entering slow-start. CWM allows a 4 packet burst.  When the second
254	   request arrives at the server before the RTO, both timer
255	   implementations allow a burst.  CWM again limits the burst to 4
256	   packets.  Note, RTO is the common timer limit, but any value would
257	   have the same results, depending on when the second request was
258	   presented in relation to the timer.
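
   The outcomes described above can be summarized by a small model,
   sketched here in C.  It is a simplification and not the code used
   for the traces: it assumes that nothing is outstanding when the
   second request arrives, counts in whole packets, and uses
   hypothetical names throughout.

```c
#include <assert.h>

enum ssr_variant { SSR_RCV_TIMER, SSR_SND_TIMER, SSR_CWM };

/* Size (in packets) of the first burst sent when a request reaches a
 * server that has been idle for idle_time.  idle_time and rto use the
 * same (arbitrary) time unit. */
unsigned first_burst(enum ssr_variant v, unsigned cwnd,
                     unsigned idle_time, unsigned rto)
{
    assert(cwnd >= 1);
    switch (v) {
    case SSR_RCV_TIMER:
        /* The request itself has just reset the receive timer, so the
         * idle test never fires and the full window may be sent. */
        return cwnd;
    case SSR_SND_TIMER:
        /* Full Restart only once the send pause reaches the RTO. */
        return idle_time >= rto ? 1 : cwnd;
    case SSR_CWM:
        /* The window is clamped to outstanding data (0) plus 4. */
        return cwnd < 4 ? cwnd : 4;
    }
    return cwnd;
}
```

   With a 16 packet window, a request arriving after the RTO yields
   bursts of 16, 1, and 4 packets for the receive timer, the send
   timer, and CWM respectively; a request arriving before the RTO
   yields 16, 16, and 4.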

260	Implementation of Congestion Window Monitoring

262	   Congestion Window Monitoring requires a simple modification to
263	   existing TCP output routines.  The changes required replace the
264	   current idle detection code.  Replace the existing 3 lines of code:

266	          idle = (snd_max == snd_una);
267	          if (idle && now - lastrcv >= rto)
268	                  cwnd = 1;

270	   with the following 3 lines of code:

272	          maxwin = 4 + snd_nxt - snd_una;
273	          if (cwnd > maxwin)
274	                  cwnd = maxwin;

276	   Packet Counting is implemented by line 1.  Lines 2 and 3 implement
277	   Window Limiting.
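
   The fragment above counts in packets.  For a stack that keeps its
   windows and sequence numbers in bytes, the same test might be
   written as the self-contained function below.  This is a sketch
   under that assumption; the function name, the mss parameter, and the
   CWM_BURST_PKTS constant are hypothetical and do not appear in any
   existing implementation.

```c
#include <assert.h>
#include <stdint.h>

#define CWM_BURST_PKTS 4u   /* unused window allowed, in packets */

/* Returns the congestion window, in bytes, after applying Congestion
 * Window Monitoring.  snd_una and snd_nxt are TCP sequence numbers;
 * mss is the segment size in bytes. */
uint32_t cwm_clamp(uint32_t cwnd, uint32_t snd_una, uint32_t snd_nxt,
                   uint32_t mss)
{
    uint32_t outstanding, maxwin;

    assert(mss > 0);
    /* Packet Counting: unsigned subtraction is correct modulo 2^32,
     * so sequence number wraparound needs no special case. */
    outstanding = snd_nxt - snd_una;
    maxwin = outstanding + CWM_BURST_PKTS * mss;
    /* Window Limiting: never leave more than 4 packets unused. */
    return cwnd > maxwin ? maxwin : cwnd;
}
```

   With a 1460 byte segment size, an idle connection (snd_nxt equal to
   snd_una) has its window reduced to 4 segments, while a connection
   with 18 of 20 segments outstanding is left unchanged.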

279	   The choice of limiting the available congestion window to 4 packets
280	   is based on the normal operation of TCP.  An ACK received by the
281	   sender may be in response to the receipt of 2 packets, allowing
282	   another 2 to be sent.  Further, normal window growth may require the
283	   sending of a third packet.  Lastly, in slow-start with delayed ACKs,
284	   the receipt of an ACK can trigger the sending of 4 packets. Thus, 4
285	   packets is a reasonable burst to send into the network.

287	   Increasing the initial window in slow-start to 4 packets has already
288	   been proposed [FAP97].  The effects of this change have been explored
289	   in simulation in [PN98] and in practice in [AHO97].  Such a
290	   modification to TCP would cause the same behavior as our solution in
291	   the cases where the pause timer has expired.  It does not address the
292	   pre-timeout bursting situation we are concerned with.

294	Conclusions

296	   At this time, we propose CWM as a simple, minimal and effective fix
297	   to the 'bug' in current TCP implementations that is exposed by
298	   HTTP/1.1.  Modifications can be made to TCP to solve the slow-start
299	   restart problem that are consistent with the original congestion
300	   avoidance specifications (i.e. a send timer).  However, we feel that
301	   the original intended behavior is not appropriate to some current
302	   applications, specifically HTTP. Thus, we recommend Congestion Window
303	   Monitoring to prevent bursts into the network.  Not only does this
304	   solution solve the current problem in a simple way, it will prevent
305	   bursting in any other situation that might arise. The 4 packet bursts
306	   which we allow are consistent with congestion window growth
307	   algorithms and with Floyd's conclusion about increasing the initial
308	   window size.

310	   CWM, as well as the other solutions listed, needs to be re-evaluated
311	   within emerging TCP implementations, e.g., SACK [JB88].  In general,
312	   TCP has no rate pacing and uses congestion control to avoid bursts in
313	   current implementations.  A more explicit mechanism, such as RBP or
314	   similar proposals may be desirable in the future.

316	Security implications

318	   CWM presents no security problems.

320	References

322	   [AHO97] Mark Allman, Chris Hayes, and Shawn Ostermann.  An Evaluation
323	       of TCP Slow Start Modifications, July 1997.  (Submitted to CCR,
324	       draft available from http://jarok.cs.ohiou.edu/papers/)

326	   [BPK97] Hari Balakrishnan, Venkata N. Padmanabhan, and Randy H. Katz.
327	       The Effects of Asymmetry on TCP Performance.  In Proceedings of
328	       the ACM/IEEE Mobicom, Budapest, Hungary, ACM.  September, 1997.

330	   [BSSK97] Hari Balakrishnan, Srinivasan Seshan, Mark Stemm, and Randy
331	       H. Katz.  Analyzing Stability in Wide-Area Network Performance.
332	       In Proceedings of the ACM SIGMETRICS, Seattle WA, USA, ACM.
333	       June, 1997.

335	   [FGMFB97] R. Fielding, Jim Gettys, Jeffrey C. Mogul, H. Frystyk, and
336	       Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, January
337	       1997.  RFC 2068.

339	   [FAP97] Sally Floyd, Mark Allman, and Craig Partridge.  Increasing
340	       TCP's Initial Window, July 1997.  Internet Draft draft-floyd-
341	       incr-init-win-01.txt

343	   [Hei97] John Heidemann. Performance Interactions Between P-HTTP and
344	       TCP Implementations.  ACM Computer Communications Review, 27(2),
345	       65-73, April 1997.

347	   [HOT97] John Heidemann, Katia Obraczka, and Joe Touch.  Modeling the
348	       Performance of HTTP Over Several Transport Protocols.  ACM/IEEE
349	       Transactions on Networking 5(5), 616-630, October, 1997.

351	   [JB88] Van Jacobson and R.T. Braden. TCP extensions for long-delay
352	       paths, October 1988. RFC 1072.

354	   [JK90] Van Jacobson and Michael J. Karels.  Congestion Avoidance and
355	       Control.  ACM Computer Communication Review, 18(4):314-329,
356	       August 1990. Revised version of his SIGCOMM '88 paper.

358	   [NS97] ns Network Simulator.  http://www-mash.cs.berkeley.edu/ns/,
359	       1997.

361	   [PN98] K. Poduri and K. Nichols. Simulation Studies of Increased
362	       Initial TCP Window Size, February 1998.  Internet Draft draft-
363	       ietf-tcpimpl-poduri-00.txt

365	   [Poo97] Kacheong Poon, Sun Microsystems, tcp-implementors mailing
366	       list, August, 1997.

368	   [Tou97] Joe Touch, ISI, tcp-implementors mailing list, August 12,
369	       1997.

371	   [VH97] Vikram Visweswaraiah and John Heidemann.  Improving Restart of
372	       Idle TCP Connections.  Technical Report 97-661, University of
373	       Southern California, November 1997.

375	Authors' Addresses

377	   Amy Hughes, Joe Touch, John Heidemann
378	   University of Southern California/Information Sciences Institute
379	   4676 Admiralty Way
380	   Marina del Rey, CA 90292-6695
381	   USA
382	   Phone: +1 310-822-1511
383	   Fax:   +1 310-823-6714
384	   URLs:   http://www.isi.edu/~ahughes
385	           http://www.isi.edu/~touch
386	           http://www.isi.edu/~johnh
387	   Email: ahughes@isi.edu
388	          touch@isi.edu
389	          johnh@isi.edu