idnits 2.17.1 

draft-swami-tsvwg-tcp-dclor-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 368.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 345.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 352.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 358.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 174: '...   the TCP SACK option [4] is enabled, only then it SHOULD follow the...'
     RFC 2119 keyword, line 186: '...  The TCP sender MUST record the time ...'
     RFC 2119 keyword, line 195: '...ally, the sender MUST NOT update the S...'
     RFC 2119 keyword, line 205: '...PTR), the sender SHOULD send one *new*...'
     RFC 2119 keyword, line 208: '... then the sender MUST retransmit no mo...'
     (5 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     3.  For each ACK or SACK < SS_PTR (i.e., a SACK block whose left
     edge is < SS_PTR), the sender SHOULD send one *new* data packet if it is
     present and if dclor_cntr < cwnd and (rwnd < SND.NXT -SND.UNA).  If (rwnd
     >= SND.NXT - SND.UNA) or if there is no new data to send, then the sender
     MUST retransmit no more than one packet per RTO from the tail of the
     retransmission queue regardless of the value of dclor_cntr.  Moreover,
     for each *new* packet sent, dclor_cntr should be incremented by one.  For
     ACK/ SACK < SS_PTR, the sender MUST not initiate any loss recovery
     algorithm nor should it update cwnd value.  Additionally, the SS_THRESH
     should be left unchanged for all these ACKs.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 27, 2005) is 6786 days in the past.  Is
     this intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '8' is defined on line 309, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2581 (ref. '1') (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 3517 (ref. '2') (Obsoleted by RFC 6675)

  ** Obsolete normative reference: RFC 2861 (ref. '5') (Obsoleted by RFC 7661)

  ** Downref: Normative reference to an Experimental RFC: RFC 3522 (ref. '6')

  -- No information found for draft-ietf-tsvwg- - is the name correct?

  -- Possible downref: Normative reference to a draft: ref. '7' 

  ** Obsolete normative reference: RFC 2988 (ref. '8') (Obsoleted by RFC 6298)

  -- Possible downref: Non-RFC (?) normative reference: ref. '9'


     Summary: 11 errors (**), 0 flaws (~~), 4 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                           Y. Swami
3	Internet-Draft                                                     K. Le
4	Expires: March 31, 2006                    Nokia Research Center, Dallas
5	                                                      September 27, 2005

7	   Decorrelated Loss Recovery (DCLOR) Using SACK Option for Spurious
8	                                Timeouts
9	                     draft-swami-tsvwg-tcp-dclor-06

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on March 31, 2006.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2005).

40	Abstract

42	   A spurious timeout in TCP forces the sender to unnecessarily
43	   retransmit one complete congestion window of data into the network.
44	   In addition, the congestion state of the network could change
45	   substantially after a spurious timeout.  In this draft we propose a
46	   conservative congestion response algorithm after a spurious timeout
47	   that takes network state into account.

49	1.  Introduction

51	   The response of a TCP sender after a retransmission timeout is
52	   governed by the underlying assumption that a mid-stream timeout can
53	   occur only if there is heavy congestion--manifested as packet
54	   loss--in the network.  TCP therefore assumes that a timeout is a
55	   sufficient indication to a) recover all the packets in flight, and b)
56	   to initiate a congestion response (slow start in this case) suited
57	   for heavy congestion scenarios.

59	   Although the assumption that a timeout can occur only if there is
60	   severe congestion is valid for traditional wireline networks, it does
61	   not hold good for some other types of networks--networks where
62	   packets can be stalled "in the network" for a significant duration
63	   without being discarded.  In cellular networks, for example, the link
64	   layer can experience a relatively long disruption due to errors, and
65	   the link layer protocol can keep all packets buffered as long as the
66	   link layer disruption lasts.

68	   In this document we present an alternative approach to loss recovery
69	   and congestion control that "De-Correlates" Loss Recovery from
70	   congestion after a spurious timeout.  The algorithm described here
71	   follows the congestion control principles of [1], [3], and [5], but
72	   unlike the present go-back-N loss recovery algorithm after timeout,
73	   DCLOR sends only the segments that were actually lost in the network.

75	2.  Terminology

77	   The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT,"
78	   "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," "OPTIONAL," and
79	   "silently ignore" in this document are to be interpreted as described
80	   in RFC 2119.

82	3.  Problem Description

84	   Let us assume that a TCP sender has sent N packets, p(1) ... p(N),
85	   into the network and it's waiting for the ACK of p(1).  Due to bad
86	   network conditions or some other problem, these packets are
87	   excessively delayed at some intermediary node RTR-1.  This excessive
88	   delay forces the TCP sender to timeout and enter slow start.

90	   As far as the sender is concerned, a timeout is always interpreted as
91	   heavy congestion.  The TCP sender therefore makes the assumption that
92	   all packets between p(1) and p(N) were lost in the network.  To
93	   recover from this misconstrued loss, the sender retransmits p1(1) and
94	   waits for the ACK a(1) (where px(k) represents the xth
95	   retransmission of the packet with sequence number k).

97	   After some period of time when the network conditions at RTR-1
98	   improve, the queued packets are finally dispatched to their
99	   intended recipient.  In response, the TCP receiver generates the ACK
100	   a(1).  When the TCP sender receives a(1), it's fooled into believing
101	   that a(1) was generated in response to the retransmitted packet
102	   p1(1), while in reality a(1) was generated in response to the
103	   originally transmitted packet p(1).  When the sender receives a(1),
104	   it increases its congestion window to two, and retransmits p1(2) and
105	   p1(3).  As the sender receives more acknowledgments, it continues
106	   with retransmissions and finally starts sending new data.  Here we
107	   only analyze the congestion control behavior after a spurious
108	   timeout.  Our scheme can be used in conjunction with the detection
109	   schemes in [6] and [9].

111	   To analyze network congestion after spurious timeout, we compute the
112	   worst case scenario packet loss in the system--assuming only TCP
113	   connections to be present.  After the timeout (real or spurious), the
114	   TCP sender sets its SS_THRESH to N/2.  Therefore, for the first N/2
115	   ACKs received (i.e., ACK a(1) to a(N/2)), the TCP sender will grow
116	   its congestion window by one and reach the SS_THRESH value of N/2.
117	   For each ACK received, the TCP sender sends 2 packets.  Therefore, by
118	   the end of the slow start, the TCP sender would have sent 2*(N/2)
119	   packets into the network.  For the remaining N/2 ACKs (i.e., ACKs
120	   between a(N/2+1) to a(N)) the TCP sender will remain in the
121	   congestion avoidance phase and send one packet for each ACK
122	   received--sending N/2 more data segments.  The net amount of data
123	   sent is therefore N/2 + N = 3N/2.

125	   Please note that the entire 3N/2 packets are injected into the
126	   network within a time period less than or equal to RTT in most cases.
127	   The number of data segments that left the network during this time is
128	   only N.  Therefore, the packet conservation principle has been
129	   compromised, and of the 3N/2 packets injected in the network, N/2
130	   packets will be lost with a very high probability.  These N/2 lost
131	   packets, however, need not come from the same connection, and such a
132	   data-burst will unnecessarily penalize all the competing TCP
133	   connections that share the same bottleneck router.
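
   The packet accounting in the two paragraphs above can be checked with
   a short sketch.  The function name and the use of Python are
   illustrative assumptions rather than part of the draft, and N is
   assumed to be even.

```python
def burst_after_spurious_timeout(n):
    """Worst-case packet accounting after a spurious timeout.

    n is the number of stalled packets p(1) ... p(N), assumed even.
    Returns (packets injected, packets in excess of the N that left).
    """
    half = n // 2
    slow_start_sent = 2 * half   # first N/2 ACKs: two packets per ACK
    cong_avoid_sent = half       # remaining N/2 ACKs: one packet per ACK
    total_sent = slow_start_sent + cong_avoid_sent   # N + N/2 = 3N/2
    excess = total_sent - n      # only N segments left the network
    return total_sent, excess


total, excess = burst_after_spurious_timeout(16)
print(total, excess)
```

   For N = 16 the sender injects 24 packets while only 16 leave the
   network, so 8 (that is, N/2) are in excess.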

135	   Now let's assume there are M competing TCP connections that share the
136	   same bottleneck router(s) with C(0) (each connection is numbered C(0)
137	   ...  C(M)).  During the period of time while C(0) is stalled, the
138	   TCP sender does not use its network resources--the buffer space--on
139	   the bottleneck router(s).  The competing connections, C(1)...  C(M),
140	   however see this lack of activity as resource availability and start
141	   growing their window by at least one segment per RTT during this time
142	   period (by virtue of linear window increase during congestion
143	   avoidance phase).  For simplicity, we assume that each of
144	   these connections has the same round trip time of RTT, and the idle
145	   time for C(0) is k*RTT (where k > RTO/RTT).  Under these assumptions,
146	   each of these competing connections will increase its congestion
147	   window by k segments.  Therefore the number of packets lost in the
148	   network due to slow start following a spurious timeout can be as high
149	   as: N/2 + M*k.
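
   As a rough illustration of the bound above, the following sketch
   evaluates N/2 + M*k for sample values.  The function name and the
   sample numbers are hypothetical; integer packet counts and an even N
   are assumed.

```python
def worst_case_loss(n, m, k):
    """Upper bound on packets lost after a spurious timeout.

    n: packets in flight on the stalled connection C(0) (assumed even)
    m: number of competing connections at the bottleneck
    k: idle time of C(0), in units of RTT
    """
    own_burst = n // 2          # excess from C(0)'s slow start burst
    competitor_growth = m * k   # each competitor grew by k segments
    return own_burst + competitor_growth


print(worst_case_loss(n=16, m=4, k=3))
```

   With N = 16, M = 4, and k = 3, the bound is 8 + 12 = 20 packets.  The
   second term grows with the stall duration, which is why restoring the
   old window alone does not account for it.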

151	   The Eifel response algorithm [7] solves the problem of N/2 packet
152	   loss, by restoring the congestion window to an old value immediately
153	   before the spurious timeout.  Based on the above equation, however,
154	   we note that the congestion state of the network not only depends
155	   upon the old window size, but also upon the duration of spurious
156	   timeout.  In our response algorithm, we therefore take the time
157	   duration of spurious timeout into account by reducing the data rate
158	   by half every RTO.  Please note that this scheme works well only when
159	   the number of competing connections M does not vary too much while
160	   C(0) was stalled.  A more conservative response algorithm should
161	   reduce the data rate to INIT_WINDOW if M is not bounded.

163	   In addition to the above congestion and packet loss issues, the
164	   current response after spurious timeouts is inefficient, in the sense
165	   that it unnecessarily retransmits data that is not lost, but simply
166	   stalled.  Such unnecessary retransmission is an issue when bandwidth
167	   resources are at a premium, like over a cellular link, where spectrum
168	   is scarce and expensive.

170	4.  DCLOR Response Algorithm

172	   A TCP sender should follow [6] or [9] (or any other algorithm) to
173	   detect a spurious timeout.  If the spurious timeout is confirmed and
174	   the TCP SACK option [4] is enabled, only then it SHOULD follow the
175	   DCLOR algorithm.

177	   The basic idea of this algorithm is that the ACKs received for the
178	   stalled packets don't provide sufficient information about the end-
179	   to-end congestion state of the network.  Therefore, the sender
180	   reduces the congestion window by 1/2 every RTO, and waits for the ACK
181	   or SACK of a new data packet before increasing its congestion
182	   window.  Additionally, while the sender is waiting for the ACK/SACK
183	   of new data, it's allowed to send cwnd (the updated cwnd) worth of
184	   new data into the network.

186	   1.  The TCP sender MUST record the time when the first timeout took
187	       place, and when the first ACK after the timeout was received.
188	       Based on these times (or through some other means) it should
189	       compute the number of unbacked-off timeouts that must have taken
190	       place during this time period.  Let's call this number N-RTO.
191	       The sender should also keep the highest sequence number of data
192	       packet that was sent in a variable called SS_PTR.  The sender
193	       should also keep a counter called dclor_cntr, which allows the
194	       sender to send new data while it's waiting for the ACK or SACK of
195	       SS_PTR.  Additionally, the sender MUST NOT update the SS_THRESH
196	       value due to spurious timeouts (i.e., the spurious timeout
197	       algorithm should leave SS_THRESH values unaltered).

199	   2.  Once the Spurious Timeout is confirmed, the TCP sender should set
200	       cwnd = max(2, pipe-size / 2^(N-RTO)), where pipe-size is the
201	       number of packets in flight at the time when the spurious timeout
202	       was confirmed.  Additionally, it should set dclor_cntr = 0.

204	   3.  For each ACK or SACK < SS_PTR (i.e., a SACK block whose left edge
205	       is < SS_PTR), the sender SHOULD send one *new* data packet if it
206	       is present and if dclor_cntr < cwnd and (rwnd < SND.NXT -
207	       SND.UNA).  If (rwnd >= SND.NXT - SND.UNA) or if there is no new
208	       data to send, then the sender MUST retransmit no more than one
209	       packet per RTO from the tail of the retransmission queue
210	       regardless of the value of dclor_cntr.  Moreover, for each *new*
211	       packet sent, dclor_cntr should be incremented by one.  For ACK/
212	       SACK < SS_PTR, the sender MUST not initiate any loss recovery
213	       algorithm nor should it update cwnd value.  Additionally, the
214	       SS_THRESH should be left unchanged for all these ACKs.

216	   4.  If the sender receives a pure ACK > SS_PTR, it should update cwnd
217	       = cwnd+1, and follow normal TCP behavior.  (Note that this means
218	       that none of the stalled packets were lost so we don't need to
219	       change SS_THRESH value).

221	   5.  If the sender receives a SACK block whose left edge is greater
222	       than SS_PTR, then it should traverse the retransmission queue
223	       from SND.UNA to the left edge of SACK block, and mark all
224	       unsacked packets as lost.  Additionally, it should set cwnd =
225	       cwnd + 1 and reset SS_THRESH to 1/2 the pipe-size.  Beyond this
226	       point, the sender MUST recover lost packets based on [2].
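
   The window handling in steps 2, 4, and 5 can be sketched as follows.
   This is a minimal illustration under stated assumptions, not a
   reference implementation: sequence numbers count whole packets, the
   class and method names are hypothetical, integer division is used for
   pipe-size / 2^(N-RTO), step 5 uses the pipe-size recorded in step 2,
   and the sending decisions of step 3 are left out.  An ACK or SACK edge
   exactly equal to SS_PTR is not covered by the text and is treated here
   as not yet above SS_PTR.

```python
from dataclasses import dataclass, field


@dataclass
class DclorSender:
    ss_ptr: int      # highest sequence number sent before the timeout
    pipe_size: int   # packets in flight when the timeout was confirmed
    cwnd: int
    ssthresh: int
    dclor_cntr: int = 0
    lost: set = field(default_factory=set)

    def confirm_spurious_timeout(self, n_rto):
        # Step 2: halve the window once per unbacked-off RTO, floor of 2.
        self.cwnd = max(2, self.pipe_size // (2 ** n_rto))
        self.dclor_cntr = 0

    def on_ack(self, ack, snd_una, sack_left_edge=None, sacked=()):
        if sack_left_edge is not None and sack_left_edge > self.ss_ptr:
            # Step 5: a SACK above SS_PTR means stalled packets were lost.
            for seq in range(snd_una, sack_left_edge):
                if seq not in sacked:
                    self.lost.add(seq)
            self.cwnd += 1
            self.ssthresh = self.pipe_size // 2
            return "sack-above-ss-ptr"
        if sack_left_edge is None and ack > self.ss_ptr:
            # Step 4: a pure ACK above SS_PTR; SS_THRESH stays unchanged.
            self.cwnd += 1
            return "ack-above-ss-ptr"
        # Step 3: feedback for stalled data; cwnd and SS_THRESH untouched.
        return "below-ss-ptr"


s = DclorSender(ss_ptr=10, pipe_size=16, cwnd=16, ssthresh=20)
s.confirm_spurious_timeout(n_rto=2)
print(s.cwnd)
```

   With pipe-size 16 and N-RTO = 2, the sketch sets cwnd to 4.  Returning
   a label from on_ack is a design choice of this sketch, so that a
   caller can decide separately whether to send new data or start
   recovery according to [2].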

228	5.  Data Delivery To Upper Layers

230	   If a TCP sender loses its entire congestion window worth of data,
231	   sending new data after timeout prevents a TCP receiver from
232	   forwarding the new data to the upper layers immediately.  However,
233	   once the SACK for this new data is received, the TCP sender will send
234	   the first lost segment.  This essentially means that data delivery to
235	   the upper layers could be delayed by at most one RTT when all the
236	   packets are lost in the network.

238	   This, however, does not affect the throughput of the connection in
239	   any way.  If a timeout has occurred, then the data delivery to the
240	   upper layers has already been excessively delayed.  Delaying it by
241	   another round trip is not a serious problem.  Please note that
242	   reliability and timeliness are two conflicting issues and one cannot
243	   gain on one without sacrificing something else on the other.

245	6.  SACK reneging

247	   The TCP SACK information is meant to be advisory, and a TCP receiver
248	   is allowed--though strongly discouraged--to discard data blocks the
249	   receiver has already SACKed [4].  Please note however that even if
250	   the TCP receiver discards the data block it received, it MUST still
251	   send the SACK block for at least the most recently received data.
252	   Therefore in spite of SACK reneging, DCLOR will work without any
253	   deadlocks.

255	   A SACK implementation is also allowed not to send a SACK block even
256	   though the TCP sender and receiver might have agreed to SACK-
257	   Permitted option at the start of the connection.  In these cases,
258	   however, if the receiver sends one SACK block, it must send SACK
259	   blocks for the rest of the connection.  Because of the above
260	   mentioned leniency in implementation, it is possible that a TCP
261	   receiver may agree on SACK-Permitted option, and yet not send any
262	   SACK blocks.  To make DCLOR robust under these circumstances, DCLOR
263	   SHOULD NOT be invoked unless the sender has seen at least one SACK
264	   block before timeout.  We, however, believe that once the SACK-
265	   Permitted option is accepted, the TCP receiver MUST send a SACK
266	   block--even though that block might finally be discarded.  Otherwise,
267	   the SACK-Permitted option is completely redundant and serves little
268	   purpose.  To the best of our knowledge, almost all SACK
269	   implementations send a SACK block if they have accepted the SACK-
270	   Permitted option.
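
   The invocation guard described in this section, combined with the
   preconditions of Section 4, could be expressed as a single predicate.
   The function and parameter names below are hypothetical and serve
   only to illustrate the rule.

```python
def dclor_may_be_invoked(spurious_timeout_confirmed,
                         sack_permitted,
                         sack_blocks_seen_before_timeout):
    """Return True only when all preconditions for DCLOR hold.

    sack_blocks_seen_before_timeout is a count of SACK blocks received
    on this connection before the retransmission timeout fired.
    """
    return (spurious_timeout_confirmed
            and sack_permitted
            and sack_blocks_seen_before_timeout >= 1)


# A receiver that agreed to SACK-Permitted but never sent a SACK block:
print(dclor_may_be_invoked(True, True, 0))
```

   In the example the receiver negotiated SACK but has not yet sent a
   block, so the predicate is false and the sender would fall back to
   its regular timeout response.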

272	7.  Security Considerations

274	   DCLOR does not open TCP to new attacks.

276	8.  Acknowledgments

278	   We would like to thank Shashikant Maheshwari, Pasi Sarolahti, and
279	   Mika Liljeberg for their comments and suggestions on a previous
280	   version of this draft.  Special thanks to Jani Hirsimaki for
281	   thoroughly reviewing the document and providing feedback on the
282	   algorithm.

284	9.  References

286	   [1]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
287	        Control", RFC 2581, April 1999.

289	   [2]  Blanton, E., Allman, M., Fall, K., and L. Wang, "Conservative
290	        SACK-based Loss Recovery Algorithm for TCP", RFC 3517,
291	        April 2003.

293	   [3]  Floyd, S., "Congestion Control Principles", RFC 2914,
294	        September 2000.

296	   [4]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
297	        Selective Acknowledgment Options", RFC 2018, October 1996.

299	   [5]  Handley, M., Padhye, J., and S. Floyd, "TCP Congestion Window
300	        Validation", RFC 2861, June 2000.

302	   [6]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm",
303	        RFC 3522, April 2003.

305	   [7]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for
306	        TCP.", Internet draft; work in progress, draft-ietf-tsvwg- tcp-
307	        eifel-response-05.txt, March 2004.

309	   [8]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
310	        Timer", RFC 2988, November 2000.

312	   [9]  Sarolahti, P. and M. Kojo, "F-RTO: A TCP RTO Recovery Algorithm
313	        for Avoiding Unnecessary Retransmissions.", Internet draft; work
314	        in progress, July 2004.

316	Authors' Addresses

318	   Yogesh Prem Swami
319	   Nokia Research Center, Dallas
320	   6000 Connection Drive
321	   Irving, TX  75039
322	   USA

324	   Phone: +1 972 374 0669
325	   Email: yogesh.swami@nokia.com

327	   Khiem Le
328	   Nokia Research Center, Dallas
329	   6000 Connection Drive
330	   Irving, TX  75039
331	   USA

333	   Phone: +1 972 894 4882
334	   Email: khiem.le@nokia.com

336	Intellectual Property Statement

338	   The IETF takes no position regarding the validity or scope of any
339	   Intellectual Property Rights or other rights that might be claimed to
340	   pertain to the implementation or use of the technology described in
341	   this document or the extent to which any license under such rights
342	   might or might not be available; nor does it represent that it has
343	   made any independent effort to identify any such rights.  Information
344	   on the procedures with respect to rights in RFC documents can be
345	   found in BCP 78 and BCP 79.

347	   Copies of IPR disclosures made to the IETF Secretariat and any
348	   assurances of licenses to be made available, or the result of an
349	   attempt made to obtain a general license or permission for the use of
350	   such proprietary rights by implementers or users of this
351	   specification can be obtained from the IETF on-line IPR repository at
352	   http://www.ietf.org/ipr.

354	   The IETF invites any interested party to bring to its attention any
355	   copyrights, patents or patent applications, or other proprietary
356	   rights that may cover technology that may be required to implement
357	   this standard.  Please address the information to the IETF at
358	   ietf-ipr@ietf.org.

360	Disclaimer of Validity

362	   This document and the information contained herein are provided on an
363	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
364	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
365	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
366	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
367	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
368	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

370	Copyright Statement

372	   Copyright (C) The Internet Society (2005).  This document is subject
373	   to the rights, licenses and restrictions contained in BCP 78, and
374	   except as set forth therein, the authors retain all their rights.

376	Acknowledgment

378	   Funding for the RFC Editor function is currently provided by the
379	   Internet Society.