idnits 2.17.1 

draft-paxson-tcpm-rfc2988bis-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 7 instances of too long lines in the document, the longest one
     being 6 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 2010) is 5184 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2988' is mentioned on line 360, but not defined

  ** Obsolete undefined reference: RFC 2988 (Obsoleted by RFC 6298)

  == Missing Reference: 'JBB92' is mentioned on line 156, but not defined

  == Missing Reference: 'RFC1122' is mentioned on line 360, but not defined

  == Missing Reference: 'RFC5681' is mentioned on line 383, but not defined

  ** Obsolete normative reference: RFC 2581 (ref. 'APS99') (Obsoleted by RFC
     5681)

  ** Obsolete normative reference: RFC  793 (ref. 'Pos81') (Obsoleted by RFC
     9293)


     Summary: 5 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                                V. Paxson
2	INTERNET DRAFT                                          ICSI/UC Berkeley
3	File: draft-paxson-tcpm-rfc2988bis-00.txt                      M. Allman
4	                                                                    ICSI
5	                                                                  J. Chu
6	                                                                  Google
7	                                                           February 2010

9	                  Computing TCP's Retransmission Timer

11	Status of this Memo

13	    This Internet-Draft is submitted to IETF in full conformance with
14	    the provisions of BCP 78 and BCP 79.

16	    Internet-Drafts are working documents of the Internet Engineering
17	    Task Force (IETF), its areas, and its working groups.  Note that
18	    other groups may also distribute working documents as Internet-
19	    Drafts.

21	    Internet-Drafts are draft documents valid for a maximum of six
22	    months and may be updated, replaced, or obsoleted by other documents
23	    at any time.  It is inappropriate to use Internet-Drafts as
24	    reference material or to cite them other than as "work in progress."

26	    The list of current Internet-Drafts can be accessed at
27	    http://www.ietf.org/ietf/1id-abstracts.txt.

29	    The list of Internet-Draft Shadow Directories can be accessed at
30	    http://www.ietf.org/shadow.html.

32	    This Internet-Draft will expire on August 1, 2010.

34	Copyright Notice

36	    Copyright (c) 2010 IETF Trust and the persons identified as the
37	    document authors.  All rights reserved.

39	    This document is subject to BCP 78 and the IETF Trust's Legal
40	    Provisions Relating to IETF Documents
41	    (http://trustee.ietf.org/license-info) in effect on the date of
42	    publication of this document.  Please review these documents
43	    carefully, as they describe your rights and restrictions with
44	    respect to this document.  Code Components extracted from this
45	    document must include Simplified BSD License text as described in
46	    Section 4.e of the Trust Legal Provisions and are provided without
47	    warranty as described in the BSD License.

49	Abstract

51	   This document defines the standard algorithm that Transmission
52	   Control Protocol (TCP) senders are required to use to compute and
53	   manage their retransmission timer.  It expands on the discussion in
54	   section 4.2.3.1 of RFC 1122 and upgrades the requirement of
55	   supporting the algorithm from a SHOULD to a MUST.

57	1   Introduction

59	   The Transmission Control Protocol (TCP) [Pos81] uses a retransmission
60	   timer to ensure data delivery in the absence of any feedback from the
61	   remote data receiver.  The duration of this timer is referred to as
62	   RTO (retransmission timeout).  RFC 1122 [Bra89] specifies that the
63	   RTO should be calculated as outlined in [Jac88].

65	   This document codifies the algorithm for setting the RTO.  In
66	   addition, this document expands on the discussion in section 4.2.3.1
67	   of RFC 1122 and upgrades the requirement of supporting the algorithm
68	   from a SHOULD to a MUST.  RFC 2581 [APS99] outlines the algorithm TCP
69	   uses to begin sending after the RTO expires and a retransmission is
70	   sent.  This document does not alter the behavior outlined in RFC 2581
71	   [APS99].

73	   In some situations it may be beneficial for a TCP sender to be more
74	   conservative than the algorithms detailed in this document allow.
75	   However, a TCP MUST NOT be more aggressive than the following
76	   algorithms allow.

78	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
79	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
80	   document are to be interpreted as described in [Bra97].

82	2   The Basic Algorithm

84	   To compute the current RTO, a TCP sender maintains two state
85	   variables, SRTT (smoothed round-trip time) and RTTVAR (round-trip
86	   time variation).  In addition, we assume a clock granularity of G
87	   seconds.

89	   The rules governing the computation of SRTT, RTTVAR, and RTO are as
90	   follows:

92	   (2.1) Until a round-trip time (RTT) measurement has been made for a
93	         segment sent between the sender and receiver, the sender SHOULD
94	         set RTO <- 1 second, though the "backing off" on repeated
95	         retransmission discussed in (5.5) still applies.

97	           Note that the previous version of this document used an
98	           initial RTO of 3 seconds [RFC2988].  A TCP implementation MAY
99	           still use this value (or any other value > 1 second).  This
100	           change in the lower bound on the initial RTO is discussed in
101	           further detail in Appendix A.

103	   (2.2) When the first RTT measurement R is made, the host MUST set

105	            SRTT <- R
106	            RTTVAR <- R/2
107	            RTO <- SRTT + max (G, K*RTTVAR)

109	         where K = 4.

111	   (2.3) When a subsequent RTT measurement R' is made, a host MUST set

113	            RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'|
114	            SRTT <- (1 - alpha) * SRTT + alpha * R'

116	         The value of SRTT used in the update to RTTVAR is its value
117	         before updating SRTT itself using the second assignment.  That
118	         is, updating RTTVAR and SRTT MUST be computed in the above
119	         order.

121	         The above SHOULD be computed using alpha=1/8 and beta=1/4 (as
122	         suggested in [JK88]).

124	         After the computation, a host MUST update
125	         RTO <- SRTT + max (G, K*RTTVAR)

127	   (2.4) Whenever RTO is computed, if it is less than 1 second then the
128	         RTO SHOULD be rounded up to 1 second.

130	         Traditionally, TCP implementations use coarse grain clocks to
131	         measure the RTT and trigger the RTO, which imposes a large
132	         minimum value on the RTO.  Research suggests that a large
133	         minimum RTO is needed to keep TCP conservative and avoid
134	         spurious retransmissions [AP99].  Therefore, this
135	         specification requires a large minimum RTO as a conservative
136	         approach, while at the same time acknowledging that at some
137	         future point, research may show that a smaller minimum RTO is
138	         acceptable or superior.

140	   (2.5) A maximum value MAY be placed on RTO provided it is at least 60
141	         seconds.
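
Rules (2.1) through (2.5) can be sketched in a few lines of Python.
This is a minimal illustration rather than a reference implementation:
the class name RtoEstimator, the default granularity of 0.1 seconds,
and the choice of exactly 60 seconds as the optional upper bound are
assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

K = 4
ALPHA = 1 / 8      # suggested gain for SRTT, rule (2.3)
BETA = 1 / 4       # suggested gain for RTTVAR, rule (2.3)
MIN_RTO = 1.0      # rule (2.4): round RTO up to 1 second
MAX_RTO = 60.0     # rule (2.5): optional cap; 60 seconds is an assumed choice


@dataclass
class RtoEstimator:
    """Hypothetical holder for SRTT, RTTVAR and RTO (all in seconds)."""
    granularity: float = 0.1          # G, an assumed clock granularity
    srtt: Optional[float] = None      # None until the first RTT measurement
    rttvar: Optional[float] = None
    rto: float = 1.0                  # rule (2.1): initial RTO of 1 second

    def on_rtt_sample(self, r: float) -> float:
        """Fold one RTT measurement into the state and return the new RTO."""
        if self.srtt is None:
            # Rule (2.2): first measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Rule (2.3): RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        rto = self.srtt + max(self.granularity, K * self.rttvar)
        # Rules (2.4) and (2.5): clamp to the lower and upper bounds.
        self.rto = min(max(rto, MIN_RTO), MAX_RTO)
        return self.rto
```

With these assumptions, a first sample of 0.5 seconds gives SRTT = 0.5,
RTTVAR = 0.25 and RTO = 1.5 seconds; a second identical sample lowers
RTTVAR to 0.1875 and RTO to 1.25 seconds.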

143	3   Taking RTT Samples

145	   TCP MUST use Karn's algorithm [KP87] for taking RTT samples.  That
146	   is, RTT samples MUST NOT be made using segments that were
147	   retransmitted (and thus for which it is ambiguous whether the reply
148	   was for the first instance of the packet or a later instance).  The
149	   only case when TCP can safely take RTT samples from retransmitted
150	   segments is when the TCP timestamp option [JBB92] is employed, since
151	   the timestamp option removes the ambiguity regarding which instance
152	   of the data segment triggered the acknowledgment.
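
The sampling rule above can be sketched as follows.  The record type
SentSegment and the function rtt_sample are hypothetical names, and the
echoed timestamp is simplified to a sender clock reading in seconds; a
real timestamp option carries an opaque tick value that the sender must
convert itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentSegment:
    """Hypothetical bookkeeping record for one outstanding segment."""
    send_time: float              # time of the first transmission, in seconds
    retransmitted: bool = False   # set once the segment has been resent


def rtt_sample(seg: SentSegment, ack_time: float,
               echoed_timestamp: Optional[float] = None) -> Optional[float]:
    """Return an RTT sample in seconds, or None if no sample may be taken."""
    if echoed_timestamp is not None:
        # Timestamp option in use: the echoed value identifies which
        # transmission triggered the ACK, so the sample is unambiguous.
        return ack_time - echoed_timestamp
    if seg.retransmitted:
        # Karn's algorithm: the ACK could belong to either transmission.
        return None
    return ack_time - seg.send_time
```

A caller would feed only the non-None results into the SRTT and RTTVAR
updates of section 2.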

154	   Traditionally, TCP implementations have taken one RTT measurement at
155	   a time (typically once per RTT).  However, when using the timestamp
156	   option, each ACK can be used as an RTT sample.  RFC 1323 [JBB92]
157	   suggests that TCP connections utilizing large congestion windows
158	   should take many RTT samples per window of data to avoid aliasing
159	   effects in the estimated RTT.  A TCP implementation MUST take at
160	   least one RTT measurement per RTT (unless that is not possible per
161	   Karn's algorithm).

163	   For fairly modest congestion window sizes, research suggests that
164	   timing each segment does not lead to a better RTT estimator [AP99].
165	   Additionally, when multiple samples are taken per RTT, the alpha and
166	   beta defined in section 2 may keep an inadequate RTT history.  A
167	   method for changing these constants is currently an open research
168	   question.

170	4   Clock Granularity

172	   There is no requirement for the clock granularity G used for
173	   computing RTT measurements and the different state variables.
174	   However, if the K*RTTVAR term in the RTO calculation equals zero,
175	   the variance term MUST be rounded to G seconds (i.e., use the
176	   equation given in step 2.3).

178	       RTO <- SRTT + max (G, K*RTTVAR)
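
As a small illustration of why the floor of G matters, the sketch below
shows the variance term of the RTO formula once RTTVAR has decayed on a
very steady path.  The function names and the granularity of 0.1
seconds used in the example are assumptions; the update applied is the
RTTVAR assignment of rule (2.3) for samples where |SRTT - R'| is zero.

```python
K = 4
BETA = 1 / 4


def variance_term(rttvar: float, g: float) -> float:
    """Return max(G, K*RTTVAR), the variance term of the RTO formula."""
    return max(g, K * rttvar)


def rttvar_after_steady_samples(rttvar: float, rounds: int) -> float:
    """Apply the rule (2.3) RTTVAR update for samples equal to SRTT."""
    for _ in range(rounds):
        # |SRTT - R'| is zero, so only the (1 - beta) decay remains.
        rttvar = (1 - BETA) * rttvar
    return rttvar
```

Starting from RTTVAR = 1 second, twenty such samples leave K*RTTVAR
well below 0.1 seconds, so with G = 0.1 the variance term is held at G
and the RTO never collapses onto SRTT.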

180	   Experience has shown that finer clock granularities (<= 100 msec)
181	   perform somewhat better than more coarse granularities.

183	   Note that [Jac88] outlines several clever tricks that can be used to
184	   obtain better precision from coarse granularity timers.  These
185	   changes are widely implemented in current TCP implementations.

187	5   Managing the RTO Timer

189	   An implementation MUST manage the retransmission timer(s) in such a
190	   way that a segment is never retransmitted too early, i.e., less than
191	   one RTO after the previous transmission of that segment.

193	   The following is the RECOMMENDED algorithm for managing the
194	   retransmission timer:

196	   (5.1) Every time a packet containing data is sent (including a
197	         retransmission), if the timer is not running, start it running
198	         so that it will expire after RTO seconds (for the current value
199	         of RTO).

201	   (5.2) When all outstanding data has been acknowledged, turn off the
202	         retransmission timer.

204	   (5.3) When an ACK is received that acknowledges new data, restart the
205	         retransmission timer so that it will expire after RTO seconds
206	         (for the current value of RTO).

208	   When the retransmission timer expires, do the following:

210	   (5.4) Retransmit the earliest segment that has not been acknowledged
211	         by the TCP receiver.

213	   (5.5) The host MUST set RTO <- RTO * 2 ("back off the timer").  The
214	         maximum value discussed in (2.5) above may be used to provide an
215	         upper bound to this doubling operation.

217	   (5.6) Start the retransmission timer, such that it expires after RTO
218	         seconds (for the value of RTO after the doubling operation
219	         outlined in 5.5).

221	   (5.7) If the timer expires awaiting the ACK of a SYN segment and the
222	         TCP implementation is using an RTO less than 3 seconds, the RTO
223	         MUST be re-initialized to 3 seconds when data transmission
224	         begins (i.e., after the three-way handshake completes).

226	         This represents a change from the previous version of this
227	         document [RFC2988] and is discussed in Appendix A.
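
The recommended steps can be sketched as a small state holder driven by
an explicit clock.  Everything named here (RetransmitTimer, the `now`
argument, the awaiting_syn_ack flag, a 60 second cap) is hypothetical,
and the handling of (5.7) reflects one reading of that rule: after a
SYN timeout, an RTO below 3 seconds is raised to 3 seconds when the
handshake completes.

```python
from typing import Optional

MAX_RTO = 60.0   # optional cap from (2.5); 60 seconds is an assumed choice


class RetransmitTimer:
    """Hypothetical timer driven by an explicit clock value, in seconds."""

    def __init__(self, rto: float = 1.0) -> None:
        self.rto = rto
        self.expires_at: Optional[float] = None   # None means "not running"
        self.syn_timed_out = False

    def on_data_sent(self, now: float) -> None:
        # (5.1) Start the timer only if it is not already running.
        if self.expires_at is None:
            self.expires_at = now + self.rto

    def on_new_ack(self, now: float, all_acked: bool) -> None:
        # (5.2) Stop when nothing is outstanding; otherwise (5.3) restart.
        self.expires_at = None if all_acked else now + self.rto

    def on_expiry(self, now: float, awaiting_syn_ack: bool = False) -> None:
        # (5.4) Retransmitting the earliest unacknowledged segment is left
        # to the caller; this sketch only tracks the timer state.
        self.rto = min(self.rto * 2, MAX_RTO)     # (5.5) back off
        self.expires_at = now + self.rto          # (5.6) restart
        self.syn_timed_out = self.syn_timed_out or awaiting_syn_ack

    def on_handshake_complete(self) -> None:
        # (5.7) After a SYN timeout, begin data transfer with RTO >= 3 s.
        if self.syn_timed_out and self.rto < 3.0:
            self.rto = 3.0
```

Passing the clock in explicitly keeps the sketch free of real timers,
which makes the backoff sequence easy to step through by hand.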

229	   Note that after retransmitting, once a new RTT measurement is
230	   obtained (which can only happen when new data has been sent and
231	   acknowledged), the computations outlined in section 2 are performed,
232	   including the computation of RTO, which may result in "collapsing"
233	   RTO back down after it has been subject to exponential backoff
234	   (rule 5.5).

236	   Note that a TCP implementation MAY clear SRTT and RTTVAR after
237	   backing off the timer multiple times as it is likely that the
238	   current SRTT and RTTVAR are bogus in this situation.  Once SRTT and
239	   RTTVAR are cleared they should be initialized with the next RTT
240	   sample taken per (2.2) rather than using (2.3).

242	6   Security Considerations

244	   This document requires a TCP to wait for a given interval before
245	   retransmitting an unacknowledged segment.  An attacker could cause a
246	   TCP sender to compute a large value of RTO by adding delay to a
247	   timed packet's latency, or that of its acknowledgment.  However,
248	   the ability to add delay to a packet's latency often coincides with
249	   the ability to cause the packet to be lost, so it is difficult to
250	   see what an attacker might gain from such an attack that could cause
251	   more damage than simply discarding some of the TCP connection's
252	   packets.

254	   The Internet to a considerable degree relies on the correct
255	   implementation of the RTO algorithm (as well as those described in
256	   RFC 2581) in order to preserve network stability and avoid
257	   congestion collapse.  An attacker could cause TCP endpoints to
258	   respond more aggressively in the face of congestion by forging
259	   acknowledgments for segments before the receiver has actually
260	   received the data, thus lowering RTO to an unsafe value.  But to do
261	   so requires spoofing the acknowledgments correctly, which is
262	   difficult unless the attacker can monitor traffic along the path
263	   between the sender and the receiver.  In addition, even if the
264	   attacker can cause the sender's RTO to reach too small a value, it
265	   appears the attacker cannot leverage this into much of an attack
266	   (compared to the other damage they can do if they can spoof packets
267	   belonging to the connection), since the sending TCP will still back
268	   off its timer in the face of an incorrectly transmitted packet's
269	   loss due to actual congestion.

271	7   IANA Considerations

273	   None

275	Acknowledgments

277	   The RTO algorithm described in this memo was originated by Van
278	   Jacobson in [Jac88].

280	   Much of the data that motivated changing the initial RTO from 3
281	   seconds to 1 second came from Robert Love, Andre Broido and Mike
282	   Belshe.

284	Normative References

286	   [APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
287	           Control", RFC 2581, April 1999.

289	   [Bra89] Braden, R., "Requirements for Internet Hosts --
290	           Communication Layers", STD 3, RFC 1122, October 1989.

292	   [Bra97] Bradner, S., "Key words for use in RFCs to Indicate
293	           Requirement Levels", BCP 14, RFC 2119, March 1997.

295	   [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
296	           September 1981.

298	Non-Normative References

300	   [AP99]  Allman, M. and V. Paxson, "On Estimating End-to-End Network
301	           Path Properties", SIGCOMM 99.

303	   [Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century",
304	           http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July
305	           2009.

307	   [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
308	           Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.

310	   [JK88]  Jacobson, V. and M. Karels, "Congestion Avoidance and
311	           Control", ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

313	   [KP87]  Karn, P. and C. Partridge, "Improving Round-Trip Time
314	           Estimates in Reliable Transport Protocols", SIGCOMM 87.

316	Authors' Addresses

318	   Vern Paxson
319	   ICSI
320	   1947 Center Street
321	   Suite 600
322	   Berkeley, CA 94704-1198

324	   Phone: 510-666-2882
325	   EMail: vern@icir.org
326	   http://www.icir.org/vern/

328	   Mark Allman
329	   ICSI
330	   1947 Center Street
331	   Suite 600
332	   Berkeley, CA 94704-1198

334	   Phone: 440-235-1792
335	   EMail: mallman@icir.org
336	   http://www.icir.org/mallman/

338	   H.K. Jerry Chu
339	   Google, Inc.
340	   1600 Amphitheatre Parkway
341	   Mountain View, CA 94043

343	   Phone: 650-253-3010
344	   EMail: hkchu@google.com

346	Appendix A

348	    Choosing a reasonable initial RTO requires balancing two
349	    competing considerations:

351	    1. The initial RTO should be sufficiently large to cover most of the
352	       end-to-end paths to avoid spurious retransmissions and their
353	       associated negative performance impact.

355	    2. The initial RTO should be small enough to ensure a timely
356	       recovery from packet loss occurring before an RTT sample is
357	       taken.

359	    Traditionally, TCP has used 3 seconds as the initial RTO
360	    [RFC1122,RFC2988].  This document calls for lowering this value to 1
361	    second for the following reasons:

363	     - Modern networks are simply faster than the state-of-the-art was
364	       at the time the initial RTO of 3 seconds was defined.

366	     - Studies have found that the round-trip time of more than 97.5% of
367	       the connections observed in a large-scale analysis was less than
368	       1 second [Chu09], suggesting that 1 second meets criterion 1 above.

370	     - In addition, the studies observed retransmission rates within the
371	       three-way handshake of roughly 2%.  This shows that reducing the
372	       initial RTO would benefit a non-negligible set of connections.

374	     - However, roughly 2.5% of the connections studied in [Chu09] have
375	       an RTT longer than 1 second.  For those connections, a 1 second
376	       initial RTO guarantees a retransmission during connection establishment
377	       (needed or not).

379	       When this happens, this document calls for reverting to an initial
380	       RTO of 3 seconds for the data transmission phase.  Therefore, the
381	       implications of the spurious retransmission are modest: (1) an
382	       extra SYN is transmitted into the network, and (2) according to
383	       [RFC5681] the initial congestion window will be limited to 1
384	       segment.  While (2) clearly puts such connections at a
385	       disadvantage, this document at least resets the RTO such that the
386	       connection will not continually run into problems with a short
387	       timeout.  (Of course, if the RTT is more than three seconds, the
388	       connection will still encounter difficulties.  But that is not a new
389	       issue for TCP.)

391	       In addition, we note that when using timestamps the TCP will be
392	       able to take an RTT sample even in the presence of a spurious
393	       retransmission, hence avoiding concern (2) above.
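
To make the second consideration (timely recovery) concrete, the
following sketch computes when successive retransmissions of a lost
segment would occur under the doubling of rule (5.5), before any RTT
sample exists.  The function name retransmit_times and the 60 second
cap are assumptions made for the illustration.

```python
from typing import List


def retransmit_times(initial_rto: float, attempts: int,
                     max_rto: float = 60.0) -> List[float]:
    """Return the elapsed time, in seconds, of each retransmission."""
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(attempts):
        elapsed += rto                  # timer expires after the current RTO
        times.append(elapsed)
        rto = min(rto * 2, max_rto)     # rule (5.5): back off the timer
    return times
```

With an initial RTO of 1 second the first three retransmissions occur
at 1, 3 and 7 seconds; with 3 seconds they occur at 3, 9 and 21
seconds.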