
TCP Implementation Working Group                               M. Allman
INTERNET DRAFT                              NASA Lewis/Sterling Software
File: draft-ietf-tcpimpl-cong-control-05.txt                   V. Paxson
                                                                    LBNL
                                                              W. Stevens
                                                              Consultant
                                                          February, 1999

                        TCP Congestion Control

Status of this Memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.  Internet-Drafts are
    working documents of the Internet Engineering Task Force (IETF), its
    areas, and its working groups.  Note that other groups may also
    distribute working documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as ``work in
    progress.''

    To view the entire list of current Internet-Drafts, please check the
    "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
    Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
    Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
    Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

    This document defines TCP's four intertwined congestion control
    algorithms: slow start, congestion avoidance, fast retransmit, and
    fast recovery.  In addition, the document specifies how TCP should
    begin transmission after a relatively long idle period and discusses
    various acknowledgment generation methods.

1   Introduction

    This document specifies four TCP [Pos81] congestion control
    algorithms: slow start, congestion avoidance, fast retransmit and
    fast recovery.  These algorithms were devised in [Jac88] and
    [Jac90].  Their use with TCP is standardized in [Bra89].

    This document is an update of [Ste97].  In addition to specifying
    the congestion control algorithms, this document specifies what TCP
    connections should do after a relatively long idle period, as well
    as specifying and clarifying some of the issues pertaining to TCP
    ACK generation.

    Note that [Ste94] provides examples of these algorithms in action
    and [WS95] provides an explanation of the source code for the BSD
    implementation of these algorithms.

    This document is organized as follows.  Section 2 provides various
    definitions which will be used throughout the document.  Section 3
    provides a specification of the congestion control algorithms.
    Section 4 outlines concerns related to the congestion control
    algorithms, Section 5 outlines security considerations, and
    Section 6 summarizes the changes relative to RFC 2001.

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in [Bra97].

2   Definitions

    This section provides the definition of several terms that will be
    used throughout the remainder of this document.

    SEGMENT:
        A segment is ANY TCP/IP data or acknowledgment packet (or both).

    SENDER MAXIMUM SEGMENT SIZE (SMSS):
        The SMSS is the size of the largest segment that the sender can
        transmit.  This value can be based on the maximum transmission
        unit of the network, the path MTU discovery [MD90] algorithm,
        RMSS (see next item), or other factors.  The size does not
        include the TCP/IP headers and options.

    RECEIVER MAXIMUM SEGMENT SIZE (RMSS):
        The RMSS is the size of the largest segment the receiver is
        willing to accept.  This is the value specified in the MSS
        option sent by the receiver during connection startup or, if
        the MSS option is not used, 536 bytes [Bra89].  The size does
        not include the TCP/IP headers and options.

    FULL-SIZED SEGMENT:
        A segment that contains the maximum number of data bytes
        permitted (i.e., a segment containing SMSS bytes of data).

    RECEIVER WINDOW (rwnd):
        The most recently advertised receiver window.

    CONGESTION WINDOW (cwnd):
        A TCP state variable that limits the amount of data a TCP can
        send.  At any given time, a TCP MUST NOT send data with a
        sequence number higher than the sum of the highest acknowledged
        sequence number and the minimum of cwnd and rwnd.

    INITIAL WINDOW (IW):
        The initial window is the size of the sender's congestion window
        after the three-way handshake is completed.

    LOSS WINDOW (LW):
        The loss window is the size of the congestion window after a TCP
        sender detects loss using its retransmission timer.

    RESTART WINDOW (RW):
        The restart window is the size of the congestion window when a
        TCP restarts transmission after an idle period (if the slow
        start algorithm is used; see Section 4.1 for more discussion).

    FLIGHT SIZE:
        The amount of data that has been sent but not yet acknowledged.

3   Congestion Control Algorithms

    This section defines the four congestion control algorithms: slow
    start, congestion avoidance, fast retransmit and fast recovery,
    developed in [Jac88] and [Jac90].  In some situations it may be
    beneficial for a TCP sender to be more conservative than the
    algorithms allow; however, a TCP MUST NOT be more aggressive than
    the following algorithms allow (that is, it MUST NOT send data when
    the value of cwnd computed by the following algorithms would not
    allow the data to be sent).

3.1 Slow Start and Congestion Avoidance

    The slow start and congestion avoidance algorithms MUST be used by a
    TCP sender to control the amount of outstanding data being injected
    into the network.  To implement these algorithms, two variables are
    added to the TCP per-connection state.  The congestion window (cwnd)
    is a sender-side limit on the amount of data the sender can transmit
    into the network before receiving an acknowledgment (ACK), while the
    receiver's advertised window (rwnd) is a receiver-side limit on the
    amount of outstanding data.  The minimum of cwnd and rwnd governs
    data transmission.
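    As a minimal sketch of this sending limit, the following Python
    function computes how many more bytes a sender may transmit.  The
    names snd_una (oldest unacknowledged sequence number) and snd_nxt
    (next sequence number to send) are assumptions borrowed from the
    SND.UNA and SND.NXT variables of [Pos81], the function name is
    hypothetical, and sequence-number wraparound is ignored.

```python
def usable_window(snd_una: int, snd_nxt: int, cwnd: int, rwnd: int) -> int:
    """Bytes the sender may still transmit under the min(cwnd, rwnd) limit.

    snd_una: oldest unacknowledged sequence number (the cumulative ACK point).
    snd_nxt: sequence number of the next byte to be sent.
    Sequence-number wraparound is ignored in this sketch.
    """
    send_limit = snd_una + min(cwnd, rwnd)  # no data may be sent beyond this
    return max(0, send_limit - snd_nxt)


# 2000 bytes in flight, cwnd = 4000, generous rwnd: 2000 more bytes allowed.
print(usable_window(snd_una=1000, snd_nxt=3000, cwnd=4000, rwnd=65535))  # 2000
# Same state, but rwnd is now the tighter limit: nothing more may be sent.
print(usable_window(snd_una=1000, snd_nxt=3000, cwnd=4000, rwnd=1500))   # 0
```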

    Another state variable, the slow start threshold (ssthresh), is used
    to determine whether the slow start or congestion avoidance
    algorithm is used to control data transmission, as discussed below.

    Beginning transmission into a network with unknown conditions
    requires TCP to slowly probe the network to determine the available
    capacity, in order to avoid congesting the network with an
    inappropriately large burst of data.  The slow start algorithm is
    used for this purpose at the beginning of a transfer, or after
    repairing loss detected by the retransmission timer.

    IW, the initial value of cwnd, MUST be less than or equal to 2*SMSS
    bytes and MUST NOT be more than 2 segments.

    We note that a non-standard, experimental TCP extension allows that
    a TCP MAY use a larger initial window (IW), as defined in equation 1
    [AFP98]:

               IW = min (4*SMSS, max (2*SMSS, 4380 bytes))           (1)

    With this extension, a TCP sender MAY use a 3 or 4 segment initial
    window, provided the combined size of the segments does not exceed
    4380 bytes.  We do NOT allow this change as part of the standard
    defined by this document.  However, we include discussion of (1) in
    the remainder of this document as a guideline for those
    experimenting with the change, rather than conforming to the present
    standards for TCP congestion control.
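    The arithmetic of equation (1) can be checked with a short Python
    sketch (function names hypothetical).  It contrasts the standard
    bound of 2*SMSS with the experimental [AFP98] value and reports how
    many full-sized segments each permits for a few example SMSS values.

```python
def standard_iw(smss: int) -> int:
    """Largest IW this document allows: 2*SMSS bytes (2 segments)."""
    return 2 * smss


def experimental_iw(smss: int) -> int:
    """Equation (1) from the experimental [AFP98] extension."""
    return min(4 * smss, max(2 * smss, 4380))


for smss in (536, 1460, 2190, 4000):
    iw = experimental_iw(smss)
    print(f"SMSS={smss}: standard IW={standard_iw(smss)}, "
          f"experimental IW={iw} ({iw // smss} segments)")
```

    For SMSS = 1460 bytes, equation (1) gives 4380 bytes (3 segments);
    for SMSS = 536 bytes it gives 2144 bytes (4 segments); and once SMSS
    reaches 2190 bytes it gives 2*SMSS, the same as the standard limit.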

    The initial value of ssthresh MAY be arbitrarily high (for example,
    some implementations use the size of the advertised window), but it
    may be reduced in response to congestion.  The slow start algorithm
    is used when cwnd < ssthresh, while the congestion avoidance
    algorithm is used when cwnd > ssthresh.  When cwnd and ssthresh are
    equal, the sender may use either slow start or congestion avoidance.

    During slow start, a TCP increments cwnd by at most SMSS bytes for
    each ACK received that acknowledges new data.  Slow start ends when
    cwnd exceeds ssthresh (or, optionally, when it reaches it, as noted
    above) or when congestion is observed.
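    The following Python sketch illustrates the slow start rules above
    under simplifying assumptions that are not part of this
    specification: no loss, one ACK per full-sized segment, and each ACK
    adding the maximum SMSS bytes.  It treats cwnd == ssthresh as
    congestion avoidance (one of the two permitted choices).  Function
    names are hypothetical.

```python
def in_slow_start(cwnd: int, ssthresh: int) -> bool:
    # When cwnd == ssthresh either algorithm is allowed; this sketch
    # treats equality as congestion avoidance.
    return cwnd < ssthresh


def slow_start_rtts(iw: int, smss: int, ssthresh: int, rtts: int) -> list:
    """Return cwnd at the start of each RTT, assuming one ACK per segment."""
    cwnd = iw
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd // smss  # one ACK per full-sized segment sent
        for _ in range(acks_this_rtt):
            if not in_slow_start(cwnd, ssthresh):
                break  # congestion avoidance would take over from here
            cwnd += smss  # at most SMSS bytes per ACK of new data
        history.append(cwnd)
    return history


print(slow_start_rtts(iw=2920, smss=1460, ssthresh=100000, rtts=3))
# [2920, 5840, 11680, 23360]: cwnd doubles each RTT
print(slow_start_rtts(iw=2920, smss=1460, ssthresh=8000, rtts=3))
# [2920, 5840, 8760, 8760]: growth stops once cwnd exceeds ssthresh
```

    In the second run cwnd overshoots ssthresh by less than one SMSS,
    which is consistent with slow start ending when cwnd exceeds
    ssthresh.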

    During congestion avoidance, cwnd is incremented by 1 full-sized
    segment per round-trip time (RTT).  Congestion avoidance continues
    until congestion is detected.  One formula commonly used to update
    cwnd during congestion avoidance is given in equation 2:

                          cwnd += SMSS*SMSS/cwnd                     (2)

    This adjustment is executed on every incoming non-duplicate ACK.
    Equation (2) provides an acceptable approximation to the underlying
    principle of increasing cwnd by 1 full-sized segment per RTT.  (Note
    that for a connection in which the receiver acknowledges every data
    segment, (2) proves slightly more aggressive than 1 segment per RTT,
    and for a receiver acknowledging every other packet, (2) is less
    aggressive.)

    Implementation Note: Since integer arithmetic is usually used in TCP
    implementations, the formula given in equation 2 can fail to
    increase cwnd when the congestion window is very large (larger than
    SMSS*SMSS).  If the above formula yields 0, the result SHOULD be
    rounded up to 1 byte.
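    A minimal Python sketch of equation (2) using integer arithmetic,
    including the round-up-to-1-byte rule from the implementation note.
    The function name is hypothetical; cwnd and SMSS are in bytes.

```python
def congestion_avoidance_on_ack(cwnd: int, smss: int) -> int:
    """Apply equation (2) for one non-duplicate ACK using integer math."""
    increment = (smss * smss) // cwnd  # truncating division, as in C
    if increment == 0:
        # cwnd > SMSS*SMSS, so the formula yields 0: round up to 1 byte.
        increment = 1
    return cwnd + increment


print(congestion_avoidance_on_ack(14600, 1460))      # 14746 (+146 bytes)
print(congestion_avoidance_on_ack(3_000_000, 1460))  # 3000001 (+1 byte)
```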

    Implementation Note: older implementations have an additional
    additive constant on the right-hand side of equation (2).  This is
    incorrect and can actually lead to diminished performance [PAD+98].

    Another acceptable way to increase cwnd during congestion avoidance
    is to count the number of bytes that have been acknowledged by ACKs
    for new data.  (A drawback of this implementation is that it
    requires maintaining an additional state variable.)  When the number
    of bytes acknowledged reaches cwnd, then cwnd can be incremented by
    up to SMSS bytes.  Note that during congestion avoidance, cwnd MUST
    NOT be increased by more than the larger of either 1 full-sized
    segment per RTT, or the value computed using equation 2.
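    The byte-counting alternative can be sketched in Python as follows.
    The additional state variable, here called bytes_acked, and the
    class name are hypothetical.  This sketch increments cwnd by the
    full SMSS (the largest increment allowed) each time the count of
    acknowledged bytes reaches cwnd.

```python
class ByteCountingCongestionAvoidance:
    """Grow cwnd by SMSS once per cwnd's worth of newly acknowledged bytes."""

    def __init__(self, cwnd: int, smss: int) -> None:
        self.cwnd = cwnd
        self.smss = smss
        self.bytes_acked = 0  # the additional per-connection state variable

    def on_ack_of_new_data(self, newly_acked: int) -> None:
        self.bytes_acked += newly_acked
        if self.bytes_acked >= self.cwnd:
            # Consume one window's worth of credit, then grow by one segment.
            self.bytes_acked -= self.cwnd
            self.cwnd += self.smss


ca = ByteCountingCongestionAvoidance(cwnd=14600, smss=1460)
for _ in range(10):  # ten ACKs, each covering one full-sized segment
    ca.on_ack_of_new_data(1460)
print(ca.cwnd, ca.bytes_acked)  # 16060 0
```

    Because the increment depends only on the number of bytes
    acknowledged, not the number of ACKs, this approach is also
    convenient for implementations that keep cwnd in units of segments.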

    Implementation Note: some implementations maintain cwnd in units of
    bytes, while others maintain it in units of full-sized segments.
    The latter will find equation (2) difficult to use, and may prefer
    to use the counting approach discussed in the previous paragraph.

    When a TCP sender detects segment loss using the retransmission
    timer, the value of ssthresh MUST be set to no more than the value
    given in equation 3:

                   ssthresh = max (FlightSize / 2, 2*SMSS)           (3)

    As discussed above, FlightSize is the amount of outstanding data in
    the network.

    Implementation Note: an easy mistake to make is to simply use cwnd,
    rather than FlightSize, in equation (3); in some implementations
    cwnd may incidentally increase well beyond rwnd.

    Furthermore, upon a timeout cwnd MUST be set to no more than the
    loss window, LW, which equals 1 full-sized segment (regardless of
    the value of IW).  Therefore, after retransmitting the dropped
    segment the TCP sender uses the slow start algorithm to increase the
    window from 1 full-sized segment to the new value of ssthresh, at
    which point congestion avoidance again takes over.
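    The timeout response can be summarized in a small Python sketch
    (hypothetical function name).  It computes ssthresh from FlightSize
    rather than cwnd, as the implementation note above advises, and sets
    cwnd to the loss window of one full-sized segment.  It uses the
    largest values permitted; a more conservative TCP may use less.

```python
def on_retransmission_timeout(flight_size: int, smss: int) -> tuple:
    """Return (cwnd, ssthresh) after the retransmission timer fires."""
    ssthresh = max(flight_size // 2, 2 * smss)  # equation (3)
    cwnd = smss  # LW: one full-sized segment, regardless of IW
    return cwnd, ssthresh


print(on_retransmission_timeout(flight_size=20000, smss=1460))  # (1460, 10000)
print(on_retransmission_timeout(flight_size=2000, smss=1460))   # (1460, 2920)
```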

3.2 Fast Retransmit/Fast Recovery

    A TCP receiver SHOULD send an immediate duplicate ACK when an
    out-of-order segment arrives.  The purpose of this ACK is to inform
    the sender that a segment was received out-of-order and which
    sequence number is expected.  From the sender's perspective,
    duplicate ACKs can be caused by a number of network problems.
    First, they can be caused by dropped segments.  In this case, all
    segments after the dropped segment will trigger duplicate ACKs.
    Second, duplicate ACKs can be caused by the re-ordering of data
    segments by the network (not a rare event along some network paths
    [Pax97]).  Finally, duplicate ACKs can be caused by replication of
    ACK or data segments by the network.  In addition, a TCP receiver
    SHOULD send an immediate ACK when the incoming segment fills in all
    or part of a gap in the sequence space.  This will generate more
    timely information for a sender recovering from a loss through a
    retransmission timeout, a fast retransmit, or an experimental loss
    recovery algorithm, such as NewReno [FH98].

    The TCP sender SHOULD use the "fast retransmit" algorithm to detect
    and repair loss, based on incoming duplicate ACKs.  The fast
    retransmit algorithm uses the arrival of 3 duplicate ACKs (4
    identical ACKs without the arrival of any other intervening packets)
    as an indication that a segment has been lost.  After receiving 3
    duplicate ACKs, TCP performs a retransmission of what appears to be
    the missing segment, without waiting for the retransmission timer to
    expire.

    After the fast retransmit algorithm sends what appears to be the
    missing segment, the "fast recovery" algorithm governs the
    transmission of new data until a non-duplicate ACK arrives.  The
    reason for not performing slow start is that the receipt of the
    duplicate ACKs not only indicates that a segment has been lost, but
    also that segments are most likely leaving the network (although a
    massive segment duplication by the network can invalidate this
    conclusion).  In other words, since the receiver can only generate a
    duplicate ACK when a segment has arrived, that segment has left the
    network and is in the receiver's buffer, so we know it is no longer
    consuming network resources.  Furthermore, since the ACK "clock"
    [Jac88] is preserved, the TCP sender can continue to transmit new
    segments (although transmission must continue using a reduced cwnd).

    The fast retransmit and fast recovery algorithms are usually
    implemented together as follows.

    1.  When the third duplicate ACK is received, set ssthresh to no
        more than the value given in equation 3.

    2.  Retransmit the lost segment and set cwnd to ssthresh plus
        3*SMSS.  This artificially "inflates" the congestion window by
        the number of segments (three) that have left the network and
        which the receiver has buffered.

    3.  For each additional duplicate ACK received, increment cwnd by
        SMSS.  This artificially inflates the congestion window in order
        to reflect the additional segment that has left the network.

    4.  Transmit a segment, if allowed by the new value of cwnd and the
        receiver's advertised window.

    5.  When the next ACK arrives that acknowledges new data, set cwnd
        to ssthresh (the value set in step 1).  This is termed
        "deflating" the window.

        This ACK should be the acknowledgment elicited by the
        retransmission from step 2, one RTT after the retransmission
        (though it may arrive sooner in the presence of significant
        out-of-order delivery of data segments at the receiver).
        Additionally, this ACK should acknowledge all the intermediate
        segments sent between the lost segment and the receipt of the
        third duplicate ACK, if none of these were lost.
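    Steps 1 through 5 can be sketched as a small Python state machine.
    It is a simplified model, not a complete sender: a duplicate ACK is
    approximated as any ACK that does not advance snd_una while data is
    outstanding (a simplification of the definition given earlier), the
    actual transmission in step 4 is left to the caller, and the normal
    slow start/congestion avoidance growth on new ACKs is omitted.  The
    class and field names are hypothetical.

```python
from dataclasses import dataclass, field

DUPACK_THRESHOLD = 3  # fast retransmit after 3 duplicate ACKs


@dataclass
class FastRecoverySender:
    smss: int
    cwnd: int
    ssthresh: int
    snd_una: int  # oldest unacknowledged sequence number
    snd_nxt: int  # next sequence number to send
    dupacks: int = 0
    in_recovery: bool = False
    retransmissions: list = field(default_factory=list)

    def flight_size(self) -> int:
        return self.snd_nxt - self.snd_una

    def on_ack(self, ack: int) -> None:
        if ack == self.snd_una and self.flight_size() > 0:
            self.dupacks += 1
            if self.dupacks == DUPACK_THRESHOLD:
                # Step 1: set ssthresh using equation (3).
                self.ssthresh = max(self.flight_size() // 2, 2 * self.smss)
                # Step 2: retransmit the missing segment and inflate cwnd.
                self.retransmissions.append(self.snd_una)
                self.cwnd = self.ssthresh + 3 * self.smss
                self.in_recovery = True
            elif self.dupacks > DUPACK_THRESHOLD:
                # Step 3: each further duplicate ACK inflates cwnd by SMSS.
                self.cwnd += self.smss
            # Step 4 (sending if cwnd and rwnd allow) is left to the caller.
        elif ack > self.snd_una:
            self.snd_una = ack
            self.dupacks = 0
            if self.in_recovery:
                # Step 5: deflate the window on the first ACK of new data.
                self.cwnd = self.ssthresh
                self.in_recovery = False


s = FastRecoverySender(smss=1000, cwnd=10000, ssthresh=65535,
                       snd_una=0, snd_nxt=10000)
for _ in range(5):  # five duplicate ACKs for sequence number 0
    s.on_ack(0)
print(s.retransmissions, s.ssthresh, s.cwnd)  # [0] 5000 10000
s.on_ack(10000)  # ACK covering the retransmission and all later data
print(s.cwnd, s.in_recovery)  # 5000 False
```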

    Note: This algorithm is known to generally not recover very
    efficiently from multiple losses in a single flight of packets
    [FF96].  One proposed set of modifications to address this problem
    can be found in [FH98].

4   Additional Considerations

4.1 Re-starting Idle Connections

    A known problem with the TCP congestion control algorithms described
    above is that they allow a potentially inappropriate burst of
    traffic to be transmitted after TCP has been idle for a relatively
    long period of time.  After an idle period, TCP cannot use the ACK
    clock to strobe new segments into the network, as all the ACKs have
    drained from the network.  Therefore, as specified above, TCP can
    potentially send a cwnd-size line-rate burst into the network after
    an idle period.

    [Jac88] recommends that a TCP use slow start to restart transmission
    after a relatively long idle period.  Slow start serves to restart
    the ACK clock, just as it does at the beginning of a transfer.  This
    mechanism has been widely deployed in the following manner.  When
    TCP has not received a segment for more than one retransmission
    timeout, cwnd is reduced to the value of the restart window (RW)
    before transmission begins.

    For the purposes of this standard, we define RW = IW.

    We note that the non-standard experimental extension to TCP defined
    in [AFP98] defines RW = min(IW, cwnd), with the definition of IW
    adjusted per equation (1) above.

    Using the last time a segment was received to determine whether or
    not to decrease cwnd fails to deflate cwnd in the common case of
    persistent HTTP connections [HTH98].  In this case, a WWW server
    receives a request before transmitting data to the WWW browser.  The
    reception of the request makes the test for an idle connection fail,
    and allows the TCP to begin transmission with a possibly
    inappropriately large cwnd.

    Therefore, a TCP SHOULD set cwnd to no more than RW before beginning
    transmission if the TCP has not sent data in an interval exceeding
    the retransmission timeout.
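    A minimal Python sketch of this rule (hypothetical function name;
    times in seconds).  The idle test uses the time data was last sent,
    not the time a segment was last received, which is what avoids the
    persistent-HTTP problem described above.  RW = IW as defined for
    this standard; taking the minimum with the current cwnd keeps the
    result no larger than RW without ever raising cwnd.

```python
def cwnd_before_sending(cwnd: int, iw: int, now: float,
                        last_data_sent: float, rto: float) -> int:
    """Return the cwnd to use before (re)starting transmission."""
    rw = iw  # restart window: RW = IW for this standard
    if now - last_data_sent > rto:
        # Idle longer than one retransmission timeout: restart slow start.
        return min(cwnd, rw)
    return cwnd


# No data sent for 5 s (RTO = 1 s): a large cwnd is cut back to RW.
print(cwnd_before_sending(20000, 2920, now=10.0, last_data_sent=5.0, rto=1.0))  # 2920
# Data sent 0.5 s ago: cwnd is left alone.
print(cwnd_before_sending(20000, 2920, now=10.0, last_data_sent=9.5, rto=1.0))  # 20000
```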

4.2 Generating Acknowledgments

    The delayed ACK algorithm specified in [Bra89] SHOULD be used by a
    TCP receiver.  When used, a TCP receiver MUST NOT excessively delay
    acknowledgments.  Specifically, an ACK SHOULD be generated for at
    least every second full-sized segment, and MUST be generated within
    500 ms of the arrival of the first unacknowledged packet.

    The requirement that an ACK "SHOULD" be generated for at least every
    second full-sized segment is listed in [Bra89] in one place as a
    SHOULD and another as a MUST.  Here we unambiguously state it is a
    SHOULD.  We also emphasize that this is a SHOULD, meaning that an
    implementor should indeed only deviate from this requirement after
    careful consideration of the implications.  See the discussion of
    "Stretch ACK violation" in [PAD+98] and the references therein for a
    discussion of the possible performance problems with generating ACKs
    less frequently than every second full-sized segment.

    In some cases, the sender and receiver may not agree on what
    constitutes a full-sized segment.  An implementation is deemed to
    comply with this requirement if it sends at least one acknowledgment
    every time it receives 2*RMSS bytes of new data from the sender,
    where RMSS is the Maximum Segment Size specified by the receiver to
    the sender (or the default value of 536 bytes, per [Bra89], if the
    receiver does not specify an MSS option during connection
    establishment).  The sender may be forced to use a segment size less
    than RMSS due to the maximum transmission unit (MTU), the path MTU
    discovery algorithm or other factors.  For instance, consider the
    case when the receiver announces an RMSS of X bytes but the sender
    ends up using a segment size of Y bytes (Y < X) due to path MTU
    discovery (or the sender's MTU size).  The receiver will generate
    stretch ACKs if it waits for 2*X bytes to arrive before an ACK is
    sent.  Clearly this will take more than 2 segments of size Y bytes.
    Therefore, while a specific algorithm is not defined, it is
    desirable for receivers to attempt to prevent this situation, for
    example by acknowledging at least every second segment, regardless
    of size.  Finally, we repeat that an ACK MUST NOT be delayed for
    more than 500 ms waiting on a second full-sized segment to arrive.
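    The delayed ACK requirements can be sketched in Python as follows.
    This is a minimal model with hypothetical names: it acknowledges
    every second in-order segment regardless of size (which also meets
    the 2*RMSS rule, since no segment exceeds RMSS) and reports when a
    pending ACK's delay has expired.  The 200 ms default delay is an
    arbitrary illustrative choice; the only requirement here is that it
    not exceed 500 ms.  Driving the timer is left to the caller.

```python
class DelayedAckPolicy:
    MAX_DELAY = 0.5  # seconds; an ACK MUST be sent within 500 ms

    def __init__(self, delay: float = 0.2) -> None:
        if not 0 < delay <= self.MAX_DELAY:
            raise ValueError("ACK delay must be positive and at most 500 ms")
        self.delay = delay
        self.unacked_segments = 0
        self.first_unacked_at = None  # arrival time of first unacked segment

    def on_in_order_segment(self, now: float) -> bool:
        """Record an in-order data segment; return True to ACK immediately."""
        if self.first_unacked_at is None:
            self.first_unacked_at = now
        self.unacked_segments += 1
        if self.unacked_segments >= 2:  # every second segment, any size
            self._reset()
            return True
        return False

    def on_timer(self, now: float) -> bool:
        """Return True if the pending ACK's delay has expired."""
        if (self.first_unacked_at is not None
                and now - self.first_unacked_at >= self.delay):
            self._reset()
            return True
        return False

    def _reset(self) -> None:
        self.unacked_segments = 0
        self.first_unacked_at = None


policy = DelayedAckPolicy(delay=0.2)
print(policy.on_in_order_segment(now=0.00))  # False: first segment, ACK delayed
print(policy.on_in_order_segment(now=0.01))  # True: second segment, ACK now
print(policy.on_in_order_segment(now=1.00))  # False
print(policy.on_timer(now=1.10))             # False: only 100 ms elapsed
print(policy.on_timer(now=1.25))             # True: delay expired
```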

    Out-of-order data segments SHOULD be acknowledged immediately, in
    order to accelerate loss recovery.  To trigger the fast retransmit
    algorithm, the receiver SHOULD send an immediate duplicate ACK when
    it receives a data segment above a gap in the sequence space.  To
    provide feedback to senders recovering from losses, the receiver
    SHOULD send an immediate ACK when it receives a data segment that
    fills in all or part of a gap in the sequence space.
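    The receiver's choice among an immediate duplicate ACK, an immediate
    ACK, and a possibly delayed ACK can be sketched in Python.  The model
    is hypothetical and simplified: rcv_nxt is the next in-order sequence
    number expected (after RCV.NXT in [Pos81]), buffered out-of-order
    data is tracked as a map from start to end sequence numbers, and
    every arriving segment is assumed to carry some data at or beyond
    rcv_nxt.

```python
class ReceiverAckPolicy:
    def __init__(self, rcv_nxt: int = 0) -> None:
        self.rcv_nxt = rcv_nxt  # next in-order sequence number expected
        self.out_of_order = {}  # start -> end of buffered segments

    def on_segment(self, start: int, end: int) -> str:
        """Classify the ACK response to a segment covering [start, end)."""
        if start > self.rcv_nxt:
            # Above a gap: buffer it and send a duplicate ACK at once so
            # the sender can trigger fast retransmit.
            self.out_of_order[start] = max(end, self.out_of_order.get(start, end))
            return "immediate duplicate ACK"
        fills_gap = bool(self.out_of_order) and end > self.rcv_nxt
        self.rcv_nxt = max(self.rcv_nxt, end)
        # Absorb buffered segments that are now contiguous.
        for seg_start in sorted(self.out_of_order):
            if seg_start > self.rcv_nxt:
                break
            self.rcv_nxt = max(self.rcv_nxt, self.out_of_order.pop(seg_start))
        if fills_gap:
            return "immediate ACK"  # filled all or part of a gap
        return "delayed ACK permitted"


r = ReceiverAckPolicy()
print(r.on_segment(0, 1000))     # delayed ACK permitted
print(r.on_segment(2000, 3000))  # immediate duplicate ACK (1000-1999 missing)
print(r.on_segment(1000, 2000))  # immediate ACK (gap filled)
print(r.rcv_nxt)                 # 3000
```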

    A TCP receiver MUST NOT generate more than one ACK for every
    incoming segment, other than to update the offered window as the
    receiving application consumes new data [page 42, Pos81][Cla82].

4.3 Loss Recovery Mechanisms

    A number of loss recovery algorithms that augment fast retransmit
    and fast recovery have been suggested by TCP researchers.  While
    some of these algorithms are based on the TCP selective
    acknowledgment (SACK) option [MMFR96], such as [FF96,MM96a,MM96b],
    others do not require SACKs [Hoe96,FF96,FH98].  The non-SACK
    algorithms use "partial acknowledgments" (ACKs which cover new data,
    but not all the data outstanding when loss was detected) to trigger
    retransmissions.  While this document does not standardize any of
    the specific algorithms that may improve fast retransmit/fast
    recovery, these enhanced algorithms are implicitly allowed, as long
    as they follow the general principles of the basic four algorithms
    outlined above.

    These general principles are as follows.  First, when the first loss
    in a window of data is detected, ssthresh MUST be set to no more
    than the value given by equation (3).  Second, until all lost
    segments in the window of data in question are repaired, the number
    of segments transmitted in each RTT MUST be no more than half the
    number of outstanding segments when the loss was detected.  Finally,
    after all loss in the given window of segments has been successfully
    retransmitted, cwnd MUST be set to no more than ssthresh and
    congestion avoidance MUST be used to further increase cwnd.  Loss in
    two successive windows of data, or the loss of a retransmission,
    should be taken as two indications of congestion and, therefore,
    cwnd (and ssthresh) MUST be lowered twice in this case.  The
    algorithms outlined in [Hoe96,FF96,MM96a,MM96b] follow the
    principles of the basic four congestion control algorithms outlined
    in this document.
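    The numeric bounds in these principles can be sketched in Python
    (class and method names hypothetical).  The sketch only computes the
    limits and is not itself a loss recovery algorithm.  Each congestion
    indication, whether loss in a new window of data or loss of a
    retransmission, recomputes ssthresh from equation (3) and the per-RTT
    transmission budget, so two successive indications lower both
    limits twice.  Full-sized segments are assumed when converting
    FlightSize to a segment count.

```python
class RecoveryBounds:
    def __init__(self, smss: int) -> None:
        self.smss = smss
        self.ssthresh = None
        self.max_segments_per_rtt = None

    def on_congestion_indication(self, flight_size: int) -> None:
        """Call once per window with loss, or per lost retransmission."""
        self.ssthresh = max(flight_size // 2, 2 * self.smss)  # equation (3)
        outstanding_segments = flight_size // self.smss  # full-sized segments
        # Until repair completes, send at most half of these per RTT.
        self.max_segments_per_rtt = outstanding_segments // 2

    def cwnd_after_repair(self, cwnd: int) -> int:
        """Once all loss is repaired, cwnd MUST be no more than ssthresh."""
        return min(cwnd, self.ssthresh)


b = RecoveryBounds(smss=1000)
b.on_congestion_indication(flight_size=32000)  # loss in the first window
print(b.ssthresh, b.max_segments_per_rtt)      # 16000 16
b.on_congestion_indication(flight_size=16000)  # loss in the next window
print(b.ssthresh, b.max_segments_per_rtt)      # 8000 8
print(b.cwnd_after_repair(20000))              # 8000
```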

5   Security Considerations

    This document requires a TCP to diminish its sending rate in the
    presence of retransmission timeouts and the arrival of duplicate
    acknowledgments.  An attacker can therefore impair the performance
    of a TCP connection by either causing data packets or their
    acknowledgments to be lost, or by forging excessive duplicate
    acknowledgments.  Causing two congestion control events back-to-back
    will often cut ssthresh to its minimum value of 2*SMSS, causing the
    connection to immediately enter the slower-performing congestion
    avoidance phase.

    The Internet relies to a considerable degree on the correct
    implementation of these algorithms in order to preserve network
    stability and avoid congestion collapse.  An attacker could cause
    TCP endpoints to respond more aggressively in the face of congestion
    by forging excessive duplicate acknowledgments or excessive
    acknowledgments for new data.  Conceivably, such an attack could
    drive a portion of the network into congestion collapse.

6   Changes Relative to RFC 2001

    This document has been extensively rewritten editorially, and it is
    not feasible to itemize the list of changes between the two
    documents.  The intention of this document is not to change any of
    the recommendations given in RFC 2001, but to further clarify cases
    that were not discussed in detail in RFC 2001.  Specifically, this
    document suggests what TCP connections should do after a relatively
    long idle period, as well as specifying and clarifying some of the
    issues pertaining to TCP ACK generation.  Finally, the allowable
    upper bound for the initial congestion window has also been raised
    from one to two segments.

Acknowledgments

    The four algorithms that are described were developed by Van
    Jacobson.

    Some of the text from this document is taken from "TCP/IP
    Illustrated, Volume 1: The Protocols" by W. Richard Stevens
    (Addison-Wesley, 1994) and "TCP/IP Illustrated, Volume 2: The
    Implementation" by Gary R. Wright and W. Richard Stevens
    (Addison-Wesley, 1995).  This material is used with the permission
    of Addison-Wesley.

    Neal Cardwell, Sally Floyd, Craig Partridge and Joe Touch
    contributed a number of helpful suggestions.

References

    [AFP98] M. Allman, S. Floyd, C. Partridge, "Increasing TCP's
        Initial Window Size", RFC 2414, September 1998.

    [Bra89] B. Braden, ed., "Requirements for Internet Hosts --
        Communication Layers", RFC 1122, October 1989.

    [Bra97] S. Bradner, "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

    [Cla82] D. Clark, "Window and Acknowledgment Strategy in TCP",
        RFC 813, July 1982.

    [FF96] K. Fall, S. Floyd, "Simulation-based Comparisons of Tahoe,
        Reno and SACK TCP", Computer Communication Review, July 1996.
        ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.

    [FH98] S. Floyd, T. Henderson, "The NewReno Modification to TCP's
        Fast Recovery Algorithm", Internet-Draft
        draft-ietf-tcpimpl-newreno-00.txt, November 1998.  (Work in
        progress.)

    [Flo94] S. Floyd, "TCP and Successive Fast Retransmits", Technical
        report, October 1994.
        ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.

    [Hoe96] J. Hoe, "Improving the Start-up Behavior of a Congestion
        Control Scheme for TCP", ACM SIGCOMM, August 1996.

    [HTH98] A. Hughes, J. Touch, J. Heidemann, "Issues in TCP
        Slow-Start Restart After Idle", Internet-Draft
        draft-ietf-tcpimpl-restart-00.txt, March 1998.  (Work in
        progress.)

    [Jac88] V. Jacobson, "Congestion Avoidance and Control", Computer
        Communication Review, vol. 18, no. 4, pp. 314-329, August 1988.
        ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

    [Jac90] V. Jacobson, "Modified TCP Congestion Avoidance Algorithm",
        end2end-interest mailing list, April 30, 1990.
        ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.

    [MD90] J. Mogul, S. Deering, "Path MTU Discovery", RFC 1191,
        November 1990.

    [MM96a] M. Mathis, J. Mahdavi, "Forward Acknowledgment: Refining
        TCP Congestion Control", Proceedings of SIGCOMM '96, Stanford,
        CA, August 1996.  Available from
        http://www.psc.edu/networking/papers/papers.html

    [MM96b] M. Mathis, J. Mahdavi, "TCP Rate-Halving with Bounding
        Parameters", Technical report.  Available from
        http://www.psc.edu/networking/papers/FACKnotes/current.

    [MMFR96] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective
        Acknowledgement Options", RFC 2018, October 1996.

    [PAD+98] V. Paxson, M. Allman, S. Dawson, W. Fenner, J. Griner,
        I. Heavens, K. Lahey, J. Semke, B. Volz, "Known TCP
        Implementation Problems", Internet-Draft
        draft-ietf-tcpimpl-prob-05.txt, November 1998.  (Work in
        progress.)

    [Pax97] V. Paxson, "End-to-End Internet Packet Dynamics",
        Proceedings of SIGCOMM '97, Cannes, France, September 1997.

    [Pos81] J. Postel, "Transmission Control Protocol", RFC 793,
        September 1981.

    [Ste94] W. R. Stevens, "TCP/IP Illustrated, Volume 1: The
        Protocols", Addison-Wesley, 1994.

    [Ste97] W. R. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
        Retransmit, and Fast Recovery Algorithms", RFC 2001, January
        1997.

    [WS95] G. R. Wright, W. R. Stevens, "TCP/IP Illustrated, Volume 2:
        The Implementation", Addison-Wesley, 1995.

Authors' Addresses

    Mark Allman
    NASA Lewis Research Center/Sterling Software
    21000 Brookpark Rd.  MS 54-2
    Cleveland, OH  44135
    216-433-6586
    mallman@lerc.nasa.gov
    http://roland.lerc.nasa.gov/~mallman

    Vern Paxson
    Network Research Group
    Lawrence Berkeley National Laboratory
    Berkeley, CA 94720
    USA
    510-486-7504
    vern@ee.lbl.gov

    W. Richard Stevens
    1202 E. Paseo del Zorro
    Tucson, AZ  85718
    520-297-9416
    rstevens@kohala.com
    http://www.kohala.com/~rstevens
588	    http://www.kohala.com/~rstevens