TCP Maintenance and Minor                                  A. Zimmermann
Extensions (TCPM) WG                                        A. Hannemann
Internet-Draft                                    RWTH Aachen University
Intended status: Experimental                              July 29, 2010
Expires: January 30, 2011

   Making TCP more Robust to Long Connectivity Disruptions (TCP-LCD)
                       draft-ietf-tcpm-tcp-lcd-02

11	Abstract

13	   Disruptions in end-to-end path connectivity, which last longer than
14	   one retransmission timeout, cause suboptimal TCP performance.  The
15	   reason for this performance degradation is that TCP interprets
16	   segment loss induced by long connectivity disruptions as a sign of
17	   congestion, resulting in repeated retransmission timer backoffs.
18	   This, in turn, leads to a delayed detection of the re-establishment
19	   of the connection since TCP waits for the next retransmission timeout
20	   before it attempts a retransmission.

22	   This document proposes an algorithm to make TCP more robust to long
23	   connectivity disruptions (TCP-LCD).  It describes how standard ICMP
24	   messages can be exploited during timeout-based loss recovery to
25	   disambiguate true congestion loss from non-congestion loss caused by
26	   connectivity disruptions.  Moreover, a reversion strategy of the
27	   retransmission timer is specified that enables a more prompt
28	   detection of whether or not the connectivity to a previously
29	   disconnected peer node has been restored.  TCP-LCD is a TCP sender-
30	   only modification that effectively improves TCP performance in case
31	   of connectivity disruptions.

33	Status of this Memo

35	   This Internet-Draft is submitted in full conformance with the
36	   provisions of BCP 78 and BCP 79.

38	   Internet-Drafts are working documents of the Internet Engineering
39	   Task Force (IETF).  Note that other groups may also distribute
40	   working documents as Internet-Drafts.  The list of current Internet-
41	   Drafts is at http://datatracker.ietf.org/drafts/current/.

43	   Internet-Drafts are draft documents valid for a maximum of six months
44	   and may be updated, replaced, or obsoleted by other documents at any
45	   time.  It is inappropriate to use Internet-Drafts as reference
46	   material or to cite them other than as "work in progress."

48	   This Internet-Draft will expire on January 30, 2011.

50	Copyright Notice

52	   Copyright (c) 2010 IETF Trust and the persons identified as the
53	   document authors.  All rights reserved.

55	   This document is subject to BCP 78 and the IETF Trust's Legal
56	   Provisions Relating to IETF Documents
57	   (http://trustee.ietf.org/license-info) in effect on the date of
58	   publication of this document.  Please review these documents
59	   carefully, as they describe your rights and restrictions with respect
60	   to this document.  Code Components extracted from this document must
61	   include Simplified BSD License text as described in Section 4.e of
62	   the Trust Legal Provisions and are provided without warranty as
63	   described in the Simplified BSD License.

65	Table of Contents

   1.  Terminology
   2.  Introduction
   3.  Connectivity Disruption Indication
   4.  Connectivity Disruption Reaction
     4.1.  Basic Idea
     4.2.  Algorithm Details
   5.  Discussion of TCP-LCD
     5.1.  Retransmission Ambiguity
     5.2.  Wrapped Sequence Numbers
     5.3.  Packet Duplication
     5.4.  Probing Frequency
     5.5.  Reaction during Connection Establishment
     5.6.  Reaction in Steady-State
   6.  Dissolving Ambiguity Issues using the TCP Timestamps Option
   7.  Interoperability Issues
     7.1.  Detection of TCP Connection Failures
     7.2.  Explicit Congestion Notification
     7.3.  ICMP for IP version 6
     7.4.  TCP-LCD and IP Tunnels
   8.  Related Work
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgments
   12. References
     12.1. Normative References
     12.2. Informative References
   Appendix A.  Changes from previous versions of the draft
     A.1.  Changes from draft-ietf-tcpm-tcp-lcd-01
     A.2.  Changes from draft-ietf-tcpm-tcp-lcd-00
     A.3.  Changes from draft-zimmermann-tcp-lcd-02
     A.4.  Changes from draft-zimmermann-tcp-lcd-01
     A.5.  Changes from draft-zimmermann-tcp-lcd-00
   Authors' Addresses

101	1.  Terminology

103	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
104	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
105	   document are to be interpreted as described in [RFC2119].

107	   The reader should be familiar with the algorithm and terminology from
108	   [RFC2988], which defines the standard algorithm Transmission Control
109	   Protocol (TCP) senders are required to use to compute and manage
110	   their retransmission timer.  In this document, the terms
111	   "retransmission timer" and "retransmission timeout" are used as
112	   defined in [RFC2988].  The retransmission timer ensures data delivery
113	   in the absence of any feedback from the receiver.  The duration of
114	   this timer is referred to as retransmission timeout (RTO).

   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
   refers to a TCP segment that acknowledges previously unacknowledged
   data.  The TCP sender state variable "SND.UNA" and the current
   segment variable "SEG.SEQ" are used as defined in [RFC0793].  SND.UNA
   holds the segment sequence number of the earliest segment that has
   not been acknowledged by the TCP receiver (the oldest outstanding
   segment).  SEG.SEQ is the segment sequence number of a given segment.

   For the purposes of this specification, we define the term "timeout-
   based loss recovery" to refer to the state that a TCP sender enters
   upon the first timeout of the oldest outstanding segment (SND.UNA)
   and leaves upon the arrival of the *first* acceptable ACK.  It is
   important to note that other documents use a different interpretation
   of the term "timeout-based loss recovery".  For example, the NewReno
   modification to TCP's Fast Recovery algorithm [RFC3782] extends the
   period a TCP sender remains in timeout-based loss recovery compared
   to the one defined in this document.  This is because [RFC3782]
   attempts to avoid unnecessary multiple Fast Retransmits that can
   occur after an RTO.

136	2.  Introduction

   Connectivity disruptions can occur in many different situations.  The
   frequency of connectivity disruptions depends on the properties of
   the end-to-end path between the communicating hosts.  While
   connectivity disruptions can occur in traditional wired networks,
   e.g., caused by an unplugged network cable, the likelihood of their
   occurrence is significantly higher in wireless (multi-hop) networks.
   In particular, end-host mobility, network topology changes, and
   wireless interference are crucial factors.  In the case of the
   Transmission Control Protocol (TCP) [RFC0793], the performance of the
   connection can experience a significant reduction compared to a
   permanently connected path [SESB05].  This is because TCP, which was
   originally designed to operate in fixed and wired networks, generally
   assumes that the end-to-end path connectivity is relatively stable
   over the connection's lifetime.

153	   Depending on their duration, connectivity disruptions can be
154	   classified into two groups [I-D.schuetz-tcpm-tcp-rlci]: "short" and
155	   "long".  A connectivity disruption is "short" if connectivity returns
156	   before the retransmission timer fires for the first time.  In this
157	   case, TCP recovers lost data segments through Fast Retransmit and
158	   lost acknowledgments (ACK) through successfully delivered later ACKs.
159	   Connectivity disruptions are declared as "long" for a given TCP
160	   connection if the retransmission timer fires at least once before
161	   connectivity is resumed.  Whether or not path characteristics, like
162	   the round trip time (RTT) or the available bandwidth, have changed
163	   when connectivity resumes after a disruption is another important
164	   aspect for TCP's retransmission scheme [I-D.schuetz-tcpm-tcp-rlci].

166	   This document improves TCP's behavior in case of "long connectivity
167	   disruptions".  In particular, it focuses on the period prior to the
168	   re-establishment of the connectivity to a previously disconnected
169	   peer node.  The document does not describe any modifications to TCP's
170	   behavior and its congestion control mechanisms [RFC5681] after
171	   connectivity has been restored.

173	   When a long connectivity disruption occurs on a TCP connection, the
174	   TCP sender eventually does not receive any more acknowledgments.
175	   After the retransmission timer expires, the TCP sender enters the
176	   timeout-based loss recovery and declares the oldest outstanding
177	   segment (SND.UNA) as lost.  Since TCP tightly couples reliability and
178	   congestion control, the retransmission of SND.UNA is triggered
179	   together with the reduction of the transmission rate.  This is based
180	   on the assumption that segment loss is an indication of congestion
181	   [RFC5681].  As long as the connectivity disruption persists, TCP will
182	   repeat this procedure until the oldest outstanding segment has
183	   successfully been acknowledged, or until the connection has timed
184	   out.  TCP implementations that follow the recommended retransmission
185	   timeout (RTO) management of RFC 2988 [RFC2988] double the RTO after
186	   each retransmission attempt.  However, the RTO growth may be bounded
187	   by an upper limit, the maximum RTO, which is at least 60s, but may be
188	   longer: Linux, for example, uses 120s.  If connectivity is restored
189	   between two retransmission attempts, TCP still has to wait until the
190	   retransmission timer expires before resuming transmission, since it
191	   simply does not have any means to know if the connectivity has been
192	   re-established.  Therefore, depending on when connectivity becomes
193	   available again, this can waste up to a maximum RTO of possible
194	   transmission time.
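
   As a rough illustration of the cost described above, the following
   Python sketch lists the RTO values armed after successive timeouts
   under bounded doubling.  The initial RTO of 1 second, the 60-second
   bound, and the function names are illustrative assumptions, not
   values mandated by this document.

```python
def backoff_schedule(initial_rto, max_rto, attempts):
    """Return the RTO (in seconds) armed after each of `attempts` timeouts."""
    schedule = []
    rto = initial_rto
    for _ in range(attempts):
        # Double the RTO on every timeout, bounded by the maximum RTO.
        rto = min(rto * 2, max_rto)
        schedule.append(rto)
    return schedule


def worst_case_idle(initial_rto, max_rto, attempts):
    """Longest wait between two consecutive retransmission attempts."""
    return max(backoff_schedule(initial_rto, max_rto, attempts))


if __name__ == "__main__":
    # Assumed values: initial RTO 1 s, maximum RTO 60 s, 8 timeouts.
    print(backoff_schedule(1.0, 60.0, 8))
    print(worst_case_idle(1.0, 60.0, 8))
```

   Under these assumed values, a sender whose path comes back just after
   a retransmission may remain idle for up to the maximum RTO before it
   probes again.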

196	   This retransmission behavior is not efficient, especially in
197	   scenarios with long connectivity disruptions.  In the ideal case, TCP
198	   would attempt a retransmission as soon as connectivity to its peer
199	   has been re-established.  In this document, we specify a TCP sender-
200	   only modification to provide robustness to long connectivity
201	   disruptions (TCP-LCD).  The memo describes how the standard Internet
202	   Control Message Protocol (ICMP) can be exploited during timeout-based
203	   loss recovery to identify non-congestion loss caused by long
204	   connectivity disruptions.  TCP-LCD's reversion strategy of the
205	   retransmission timer enables higher-frequency retransmissions and
206	   thereby a prompt detection when connectivity to a previously
207	   disconnected peer node has been restored.  If no congestion is
208	   present, TCP-LCD approaches the ideal behavior.

210	3.  Connectivity Disruption Indication

   If the queue of an intermediate router that is experiencing a link
   outage can buffer all incoming packets, a connectivity disruption
   will only cause a variation in delay, which is handled well by TCP
   implementations using either Eifel [RFC3522], [RFC4015] or Forward
   RTO-Recovery (F-RTO) [RFC5682].  However, if the link outage lasts
   too long, the router experiencing the link outage is forced to drop
   packets, and finally to discard the corresponding route.  Means to
   detect such link outages include reacting to failed Address
   Resolution Protocol (ARP) [RFC0826] queries, unsuccessful link
   sensing, and the like.  However, this is solely the responsibility
   of the respective router.

      Note: The focus of this memo is on introducing a method by which
      ICMP messages may be exploited to improve TCP's performance; how
      different physical and link layer mechanisms below the network
      layer may trigger ICMP destination unreachable messages is out of
      the scope of this memo.

   Provided that no other route to the specific destination exists, the
   router will notify the corresponding sending host about the dropped
   packets via ICMP destination unreachable messages of code 0 (net
   unreachable) or code 1 (host unreachable) [RFC1812].  Therefore, the
   sending host can use the ICMP destination unreachable messages of
   these codes as an indication of a connectivity disruption, since the
   reception of these messages provides evidence that packets were
   dropped due to a link outage.

   Note that there are also other ICMP destination unreachable messages
   with different codes.  Some of them are candidates for connectivity
   disruption indications, too, but need further investigation.
   Examples are ICMP destination unreachable messages with code 5
   (source route failed), code 11 (net unreachable for TOS), or code 12
   (host unreachable for TOS) [RFC1812].  On the other hand, codes that
   flag hard errors are of no use for this scheme, since TCP should
   abort the connection when those are received [RFC1122].  In the
   following, the term "ICMP unreachable message" is used as a synonym
   for ICMP destination unreachable messages of code 0 or code 1.
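
   The code selection above can be sketched as a small classifier.  This
   is a minimal Python illustration; the function name and the returned
   labels are hypothetical, and codes not named in the text are simply
   treated as not usable by the scheme.

```python
ICMP_DEST_UNREACHABLE = 3  # ICMP type "destination unreachable" (RFC 792)

# Codes used by the scheme: 0 (net unreachable), 1 (host unreachable).
DISRUPTION_CODES = frozenset({0, 1})

# Codes the text names as candidates that need further investigation:
# 5 (source route failed), 11 and 12 (net/host unreachable for TOS).
CANDIDATE_CODES = frozenset({5, 11, 12})


def classify_icmp(icmp_type, icmp_code):
    """Return 'disruption', 'candidate', or 'ignore' (labels are illustrative)."""
    if icmp_type != ICMP_DEST_UNREACHABLE:
        return "ignore"
    if icmp_code in DISRUPTION_CODES:
        return "disruption"
    if icmp_code in CANDIDATE_CODES:
        return "candidate"
    # All remaining codes, including hard errors, are not used by the scheme.
    return "ignore"
```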

250	   The accurate interpretation of ICMP unreachable messages as a
251	   connectivity disruption indication is complicated by the following
252	   two peculiarities of ICMP messages.  First, they do not necessarily
253	   operate on the same timescale as the packets, i.e., TCP segments that
254	   elicited them.  When a router drops a packet due to a missing route,
255	   it will not necessarily send an ICMP unreachable message immediately,
256	   but will rather queue it for later delivery.  Second, ICMP messages
257	   are subject to rate limiting, e.g., when a router drops a whole
258	   window of data due to a link outage, it is unlikely to send as many
259	   ICMP unreachable messages as dropped TCP segments.  Depending on the
260	   load of the router, it may not even send any ICMP unreachable
261	   messages at all.  Both peculiarities originate from [RFC1812].

263	   Fortunately, according to [RFC0792], ICMP unreachable messages have
264	   to contain in their body the entire Internet Protocol (IP) header
265	   [RFC0791] of the datagram eliciting the ICMP unreachable message,
266	   plus the first 64 bits of the payload of that datagram.  This allows
267	   the sending host to match the ICMP error message to the transport
268	   connection that elicited it.  RFC 1812 [RFC1812] augments these
269	   requirements and states that ICMP messages should contain as much of
270	   the original datagram as possible without the length of the ICMP
271	   datagram exceeding 576 bytes.  Therefore, in case of TCP, at least
272	   the source port number, the destination port number, and the 32-bit
273	   TCP sequence number are included.  This allows the originating TCP to
274	   demultiplex the received ICMP message and to identify the affected
275	   connection.  Moreover, it can identify which segment of the
276	   respective connection triggered the ICMP unreachable message, unless
277	   there are several segments in-flight with the same sequence number
278	   (see Section 5.1).
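
   The demultiplexing step described above can be sketched in Python as
   follows.  The sketch assumes that the input starts at the ICMP header
   of an IPv4 message and that the quoted datagram carries at least the
   IP header plus 64 bits of payload; the function name and the shape of
   the return value are illustrative assumptions.

```python
import socket
import struct


def parse_icmp_unreachable(icmp):
    """Return (src_ip, src_port, dst_ip, dst_port, seq) of the quoted TCP
    segment, or None if the message is not usable."""
    if len(icmp) < 8 + 20 + 8:
        return None
    icmp_type, icmp_code = icmp[0], icmp[1]
    # Only destination unreachable (type 3) with code 0 or code 1.
    if icmp_type != 3 or icmp_code not in (0, 1):
        return None
    inner = icmp[8:]  # quoted IP datagram follows the 8-byte ICMP header
    version = inner[0] >> 4
    ihl = (inner[0] & 0x0F) * 4  # IP header length in bytes
    if version != 4 or ihl < 20 or len(inner) < ihl + 8:
        return None
    if inner[9] != 6:  # IP protocol number 6 is TCP
        return None
    src_ip = socket.inet_ntoa(inner[12:16])
    dst_ip = socket.inet_ntoa(inner[16:20])
    # First 64 bits of the TCP header: source port, destination port,
    # and the 32-bit sequence number.
    src_port, dst_port, seq = struct.unpack("!HHI", inner[ihl:ihl + 8])
    return src_ip, src_port, dst_ip, dst_port, seq
```

   The returned address and port pairs identify the connection, and the
   sequence number is the value compared against SND.UNA.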

   A connectivity disruption indication in the form of an ICMP
   unreachable message associated with a presumably lost TCP segment
   provides strong evidence that the segment was not dropped due to
   congestion, but was successfully delivered as far as the reporting
   router.  The segment therefore did not witness any congestion, at
   least on that part of the path that was traversed by both the TCP
   segment eliciting the ICMP unreachable message and the ICMP
   unreachable message itself.

288	4.  Connectivity Disruption Reaction

290	   Section 4.1 introduces the basic idea of TCP-LCD.  The complete
291	   algorithm is specified in Section 4.2.

293	4.1.  Basic Idea

295	   The goal of the algorithm is to promptly detect when connectivity to
296	   a previously disconnected peer node has been restored after a long
297	   connectivity disruption, while retaining appropriate behavior in case
298	   of congestion.  TCP-LCD exploits standard ICMP unreachable messages
299	   during timeout-based loss recovery.  This increases TCP's
300	   retransmission frequency by undoing one retransmission timer backoff
301	   whenever an ICMP unreachable message is received that contains a
302	   segment with a sequence number of a presumably lost retransmission.

   This approach has the advantage of appropriately reducing the probing
   rate in case of congestion.  If either the retransmission itself or
   the corresponding ICMP message is dropped, the previously performed
   retransmission timer backoff is not undone, which effectively halves
   the probing rate.

310	4.2.  Algorithm Details

312	   A TCP sender that uses RFC 2988 [RFC2988] to compute TCP's
313	   retransmission timer MAY employ the following scheme to avoid over-
314	   conservative retransmission timer backoffs in case of long
315	   connectivity disruptions.  If a TCP sender does implement the
316	   following steps, the algorithm MUST be initiated upon the first
317	   timeout of the oldest outstanding segment (SND.UNA) and MUST be
318	   stopped upon the arrival of the first acceptable ACK.  The algorithm
319	   MUST NOT be re-initiated upon subsequent timeouts for the same
320	   segment.  The scheme SHOULD NOT be used in SYN-SENT or SYN-RECEIVED
321	   states [RFC0793] (see Section 5.5).

   A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's
   retransmission timer MUST NOT use TCP-LCD.  We envision that the
   scheme could be easily adapted to algorithms other than RFC 2988.
   However, we leave this as future work.

328	   In rule (2.5), RFC 2988 [RFC2988] provides the option to place a
329	   maximum value on the RTO.  When a TCP implements this rule to provide
330	   an upper bound for the RTO, it MUST also be used in the following
331	   algorithm.  In particular, if the RTO is bounded by an upper limit
332	   (maximum RTO), the "MAX_RTO" variable used in this scheme MUST be
333	   initialized with this upper limit.  Otherwise, if the RTO is
334	   unbounded, the "MAX_RTO" variable MUST be set to infinity.

336	   The scheme specified in this document uses the "BACKOFF_CNT"
337	   variable, whose initial value is zero.  The variable is used to count
338	   the number of performed retransmission timer backoffs during one
339	   timeout-based loss recovery.  Moreover, the "RTO_BASE" variable is
340	   used to recover the previous RTO if the retransmission timer backoff
341	   was unnecessary.  The variable is initialized with the RTO upon
342	   initiation of timeout-based loss recovery.

   (1)  Before TCP updates the variable "RTO" when it initiates timeout-
        based loss recovery, set the variables "BACKOFF_CNT" and
        "RTO_BASE" as follows:

           BACKOFF_CNT := 0;
           RTO_BASE := RTO.

        Proceed to step (R).

   (R)  This is a placeholder for standard TCP's behavior in case the
        retransmission timer has expired.  In particular, if RFC 2988
        [RFC2988] is used, steps (5.4) - (5.6) of that algorithm go
        here.  Proceed to step (2).

   (2)  To account for the expiration of the retransmission timer in the
        previous step (R), increment the "BACKOFF_CNT" variable by one:

           BACKOFF_CNT := BACKOFF_CNT + 1.

   (3)  Wait either

           for the expiration of the retransmission timer.  When the
           retransmission timer expires, proceed to step (R);

           or for the arrival of an acceptable ACK.  When an acceptable
           ACK arrives, proceed to step (A);

           or for the arrival of an ICMP unreachable message.  When the
           ICMP unreachable message "ICMP_DU" arrives, proceed to step
           (4).

   (4)  If "BACKOFF_CNT > 0", i.e., if at least one retransmission timer
        backoff can be undone, then

           proceed to step (5);

        else

           proceed to step (3).

   (5)  Extract the TCP segment header included in the ICMP unreachable
        message "ICMP_DU":

           SEG := Extract(ICMP_DU).

   (6)  If "SEG.SEQ == SND.UNA", i.e., if the TCP segment "SEG"
        eliciting the ICMP unreachable message "ICMP_DU" contains the
        sequence number of a retransmission, then

           proceed to step (7);

        else

           proceed to step (3).

   (7)  Undo the last retransmission timer backoff:

           BACKOFF_CNT := BACKOFF_CNT - 1;
           RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

   (8)  If the retransmission timer expires due to the undoing in the
        previous step (7), then

           proceed to step (R);

        else

           proceed to step (3).

   (A)  This is a placeholder for standard TCP's behavior in case an
        acceptable ACK has arrived.  No further processing.
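
   The bookkeeping of steps (1) through (8) can be sketched in Python as
   shown below.  This is a non-normative illustration: the class and
   method names are hypothetical, timers and packet transmission are
   left to the caller, and step (R) is reduced to the bounded doubling
   of the RTO.

```python
import math


class TcpLcd:
    """State for one connection; names are illustrative, not normative."""

    def __init__(self, rto, max_rto=math.inf):
        self.rto = rto
        self.max_rto = max_rto  # infinity when the RTO is unbounded
        self.backoff_cnt = 0
        self.rto_base = rto
        self.active = False     # True while in timeout-based loss recovery

    def on_timer_expired(self):
        if not self.active:
            # Step (1): first timeout of SND.UNA, before the RTO is updated.
            self.backoff_cnt = 0
            self.rto_base = self.rto
            self.active = True
        # Step (R), simplified: bounded exponential backoff.  The caller
        # retransmits SND.UNA and restarts the timer.
        self.rto = min(self.rto * 2, self.max_rto)
        # Step (2): always count the expiration, even if the RTO was capped.
        self.backoff_cnt += 1

    def on_icmp_unreachable(self, seg_seq, snd_una):
        """Return True if one backoff was undone (steps 4 to 7)."""
        if not self.active or self.backoff_cnt == 0:  # step (4)
            return False
        if seg_seq != snd_una:                        # step (6)
            return False
        self.backoff_cnt -= 1                         # step (7)
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        # Step (8) is left to the caller: retransmit at once if the
        # shortened timer has already expired, otherwise keep waiting.
        return True

    def on_acceptable_ack(self):
        # Step (A): standard TCP processing follows; the scheme stops.
        self.active = False
```

   A caller would invoke on_icmp_unreachable() with the sequence number
   extracted in step (5) and the current SND.UNA.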

416	   When a TCP in steady-state detects a segment loss using the
417	   retransmission timer, it enters the timeout-based loss recovery and
418	   initiates the algorithm (step 1).  It adjusts the slow start
419	   threshold (ssthresh), sets the congestion window (CWND) to one
420	   segment, backs off the retransmission timer, and retransmits the
421	   first unacknowledged segment (step R) [RFC5681], [RFC2988].  To
422	   account for the expiration of the retransmission timer, the TCP
423	   sender increments the "BACKOFF_CNT" variable by one (step 2).

425	   In case the retransmission timer expires again (step 3a), a TCP will
426	   repeat the retransmission of the first unacknowledged segment and
427	   back off the retransmission timer once more (step R) [RFC2988], as
428	   well as increment the "BACKOFF_CNT" variable by one (step 2).  Note
429	   that a TCP may implement RFC 2988's [RFC2988] option to place a
430	   maximum value on the RTO that may result in not performing the
431	   retransmission timer backoff.  However, step (2) MUST always and
432	   unconditionally be applied, no matter whether or not the
433	   retransmission timer is actually backed off.  In other words, each
434	   time the retransmission timer expires, the "BACKOFF_CNT" variable
435	   MUST be incremented by one.
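
   The following Python sketch illustrates why step (2) counts every
   expiration.  The values (RTO_BASE of 32 seconds, MAX_RTO of 60
   seconds) and the "count only real changes" variant are hypothetical
   and serve only as a contrast.

```python
def rto_after_undo(rto_base, backoff_cnt, max_rto):
    """Apply step (7) once; return the new (BACKOFF_CNT, RTO) pair."""
    backoff_cnt -= 1
    return backoff_cnt, min(rto_base * 2 ** backoff_cnt, max_rto)


RTO_BASE, MAX_RTO = 32.0, 60.0  # assumed values for illustration

# Two expirations: the RTO goes 32 -> 60 (capped) -> 60 (no real backoff).
timeouts = 2
counted_always = timeouts   # step (2) applied unconditionally
counted_if_changed = 1      # hypothetical variant: count only real changes

# With the unconditional count, one undo leaves the RTO at the cap,
# which is the value that was in effect after the first expiration.
correct = rto_after_undo(RTO_BASE, counted_always, MAX_RTO)

# The hypothetical variant drops straight back to RTO_BASE, so a single
# ICMP message would undo the effect of two expirations.
too_aggressive = rto_after_undo(RTO_BASE, counted_if_changed, MAX_RTO)

print(correct, too_aggressive)
```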

437	   If the first received packet after the retransmission(s) is an
438	   acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow
439	   start the connection and terminate the algorithm (step A).  Later
440	   ICMP unreachable messages from the just terminated timeout-based loss
441	   recovery are ignored, since the ACK clock is already restarting due
442	   to the successful retransmission.

444	   On the other hand, if the first received packet after the
445	   retransmission(s) is an ICMP unreachable message (step 3c), and if
446	   step (4) permits it, TCP SHOULD undo one backoff for each ICMP
447	   unreachable message reporting an error on a retransmission.  To
448	   decide if an ICMP unreachable message was elicited by a
449	   retransmission, the sequence number it contains is inspected (step 5,
450	   step 6).  The undo is performed by re-calculating the RTO with the
451	   decremented "BACKOFF_CNT" variable (step 7).  This calculation
452	   explicitly matches the (bounded) exponential backoff specified in
453	   rule (5.5) of [RFC2988].

455	   Upon receipt of an ICMP unreachable message that legitimately undoes
456	   one backoff, there is the possibility that the shortened
457	   retransmission timer has already expired (step 8).  Then, TCP SHOULD
458	   retransmit immediately.  In case the shortened retransmission timer
459	   has not yet expired, TCP MUST wait accordingly.
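
   One possible way to evaluate step (8) is sketched below in Python.
   It assumes that the retransmission timer is measured from the time of
   the most recent retransmission; this bookkeeping and the function
   name are assumptions of the sketch and are not specified by this
   document.

```python
def remaining_after_undo(last_retransmit_time, new_rto, now):
    """Seconds until the shortened timer fires.

    A result of 0.0 means the shortened timer has already expired and
    the sender would retransmit immediately; otherwise it waits for the
    returned duration.
    """
    deadline = last_retransmit_time + new_rto
    return max(0.0, deadline - now)
```

   For example, if the last retransmission was sent at t = 100 s and the
   RTO is reduced from 8 s to 4 s at t = 105 s, the shortened timer has
   already expired; at t = 102.5 s, 1.5 s would remain.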

461	5.  Discussion of TCP-LCD

   TCP-LCD takes care to react to connectivity disruption indications in
   the form of ICMP unreachable messages only during timeout-based loss
   recovery.  Therefore, TCP's behavior is not altered when either no
   ICMP unreachable messages are received, or the retransmission timer
   of the TCP sender did not expire since the last received acceptable
   ACK.  Thus, by definition, the algorithm triggers only in the case of
   long connectivity disruptions.

   Only ICMP unreachable messages that contain a TCP segment with the
   sequence number of a retransmission, i.e., contain SND.UNA, are
   evaluated by TCP-LCD.  All other ICMP unreachable messages are
   ignored.  The arrival of the evaluated messages provides strong
   evidence that the retransmissions were not dropped due to congestion,
   but were successfully delivered to the reporting router.  In other
   words, there is no evidence for any congestion, at least on that very
   part of the path that was traversed by both the TCP segment eliciting
   the ICMP unreachable message and the ICMP unreachable message itself.

   However, there are some situations where TCP-LCD makes a false
   decision and incorrectly undoes a retransmission timer backoff.  This
   can happen even when the received ICMP unreachable message contains
   the sequence number of a retransmission (SND.UNA), because the TCP
   segment that elicited the ICMP unreachable message may either not be
   a retransmission (Section 5.1) or not belong to the current timeout-
   based loss recovery (Section 5.2).  Finally, packet duplication
   (Section 5.3) can also spuriously trigger the algorithm.

491	   Section 5.4 discusses possible probing frequencies, while Section 5.6
492	   describes the motivation for not reacting to ICMP unreachable
493	   messages while TCP is in steady-state.

495	5.1.  Retransmission Ambiguity

497	   Historically, the retransmission ambiguity problem [Zh86], [KP87] is
498	   the TCP sender's inability to distinguish whether the first
499	   acceptable ACK after a retransmission refers to the original
500	   transmission or to the retransmission.  This problem occurs after
501	   both a Fast Retransmit and a timeout-based retransmit.  However,
502	   modern TCP implementations can eliminate the retransmission ambiguity
503	   with either the help of Eifel [RFC3522], [RFC4015] or Forward RTO-
504	   Recovery (F-RTO) [RFC5682].

   The reversion strategy of the given algorithm suffers from a form of
   retransmission ambiguity, too.  In contrast to the above case, TCP
   suffers from ambiguity regarding ICMP unreachable messages received
   during timeout-based loss recovery.  With the TCP sequence number
   included in the ICMP unreachable message, a TCP sender is not able to
   determine if the ICMP unreachable message refers to the original
   transmission or to any of the timeout-based retransmissions.  That
   is, there is an ambiguity with regard to which TCP segment an ICMP
   unreachable message reports on.

   However, this ambiguity is not considered to be a problem for the
   algorithm.  The assumption that a received ICMP message provides
   evidence that a non-congestion loss caused by the connectivity
   disruption was wrongly considered a congestion loss still holds,
   regardless of which TCP segment, transmission or retransmission, the
   message refers to.

523	5.2.  Wrapped Sequence Numbers

   Besides the ambiguity whether a received ICMP unreachable message
   refers to the original transmission or to any of the retransmissions,
   there is another source of ambiguity related to the TCP sequence
   numbers contained in ICMP unreachable messages.  For high-bandwidth
   paths, the sequence space may wrap quickly.  As a consequence,
   delayed ICMP unreachable messages might coincidentally fit as valid
   input in the proposed scheme, and the scheme may incorrectly undo
   retransmission timer backoffs.  Chances for this to happen are
   minuscule, since a particular ICMP message would need to contain the
   exact sequence number of the current oldest outstanding segment
   (SND.UNA), while at the same time TCP is in timeout-based loss
   recovery.  However, two "worst case" scenarios for the algorithm are
   possible:

539	   For instance, consider a steady state TCP connection, which will be
540	   disrupted at an intermediate router R due to a link outage.  Upon the
541	   expiration of the RTO, the TCP sender enters the timeout-based loss
542	   recovery and starts to retransmit the earliest segment that has not
543	   been acknowledged (SND.UNA).  For some reason, router R delays all
544	   corresponding ICMP unreachable messages so that the TCP sender backs
545	   the retransmission timer off normally without any undoing.  At the
546	   end of the connectivity disruption, the TCP sender eventually detects
547	   the re-establishment, leaves the scheme and finally the timeout-based
548	   loss recovery, too.  A sequence number wrap-around later, the
549	   connectivity between the two peers is disrupted again, but this time
550	   due to congestion and exactly at the time at which the current
551	   SND.UNA matches the SND.UNA from the previous cycle.  If router R
552	   emits the delayed ICMP unreachable messages now, the TCP sender would
553	   incorrectly undo retransmission timer backoffs.  As the TCP sequence
554	   number contains 32 bits, the probability of this scenario is at most
555	   1/2^32.  Given sufficiently many retransmissions in the first
556	   timeout-based loss recovery, the corresponding ICMP unreachable
557	   messages could reduce the RTO in the second recovery at most to
558	   "RTO_BASE".  However, once the ICMP unreachable messages are
559	   depleted, the standard exponential backoff will be performed.  Thus,
560	   the congestion response will only be delayed by some false
561	   retransmissions.

563	   Similar to the above, consider the case where a steady state TCP
564	   connection with n segments in flight will be disrupted at some point
565	   due to a link outage at an intermediate router R.  For each segment
566	   in flight, router R may generate an ICMP unreachable message, but
567	   for some reason it delays them.  Once the link outage is over and
568	   the connection has been re-established, the TCP sender leaves the
569	   scheme and slow-starts the connection.  Following a sequence number
570	   wrap-around, a retransmission timeout occurs, just at the moment the
571	   TCP sender's current window of data reaches the previous range of the
572	   sequence number space again.  In case router R emits the delayed ICMP
573	   unreachable messages now, spurious undoing of the retransmission
574	   timer backoff is possible once, if the TCP sequence number contained
575	   in ICMP unreachable messages matches the current SND.UNA, and the
576	   timeout was a result of congestion.  In the case of another
577	   connectivity disruption, the additional undoing of the retransmission
578	   timer backoff has no impact.  The probability of this scenario is at
579	   most n/2^32.
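
   To put the two bounds above into perspective, the following minimal
   sketch computes how long a sender needs to wrap the 32-bit sequence
   space and evaluates the n/2^32 bound.  It assumes a sender that
   transmits continuously at the given rate; the function names are
   illustrative and are not part of TCP-LCD.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide, one per byte


def wrap_time_seconds(rate_bits_per_second: float) -> float:
    """Time needed to send 2^32 bytes, i.e., to wrap the sequence space.

    Assumption: the sender transmits continuously at the given rate.
    """
    return SEQ_SPACE * 8 / rate_bits_per_second


def false_match_bound(segments_in_flight: int = 1) -> float:
    """Upper bound n/2^32 on a delayed ICMP message matching SND.UNA."""
    return segments_in_flight / SEQ_SPACE


if __name__ == "__main__":
    for rate in (100e6, 1e9, 10e9):
        seconds = wrap_time_seconds(rate)
        print(f"{rate / 1e9:g} Gbit/s: sequence space wraps after {seconds:.1f} s")
    print(f"n = 100 segments in flight: bound {false_match_bound(100):.2e}")
```

   At 10 Gbit/s the sequence space wraps in roughly 3.4 seconds, so
   wrap-arounds are frequent on fast paths, while the chance that a
   delayed message matches SND.UNA stays bounded as stated above.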

581	5.3.  Packet Duplication

583	   In case an intermediate router duplicates packets, a TCP sender may
584	   receive more ICMP unreachable messages during timeout-based loss
585	   recovery than sent timeout-based retransmissions.  However, since
586	   TCP-LCD keeps track of the number of performed retransmission timer
587	   backoffs in the "BACKOFF_CNT" variable, it will not undo more
588	   retransmission timer backoffs than were actually performed.
589	   Nevertheless, if packet duplication and congestion coincide on the
590	   path between the two communicating hosts, duplicated ICMP messages
591	   could hide the congestion loss of some retransmissions or ICMP
592	   messages, and the algorithm may incorrectly undo retransmission timer
593	   backoffs.  Considering the overall impact of a router that duplicates
594	   packets, the additional load induced by some spurious timeout-based
595	   retransmits can probably be neglected.
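
   The protective effect of the "BACKOFF_CNT" variable can be
   illustrated with a minimal sketch.  It is a simplification under
   stated assumptions: the sequence number check is omitted, the value
   of "MAX_RTO" (60 seconds) is assumed, and the class and method names
   are hypothetical.

```python
MAX_RTO = 60.0  # seconds; assumed upper bound on the RTO


class BackoffState:
    """Tracks RTO_BASE, RTO and BACKOFF_CNT during one loss recovery."""

    def __init__(self, rto: float) -> None:
        self.rto_base = rto   # RTO before the first backoff
        self.rto = rto
        self.backoff_cnt = 0  # number of backoffs not yet undone

    def on_timeout(self) -> None:
        """Back off the retransmission timer once."""
        self.backoff_cnt += 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, MAX_RTO)

    def on_icmp_unreachable(self) -> bool:
        """Undo one backoff; return False if none is left to undo."""
        if self.backoff_cnt == 0:
            return False  # duplicated messages beyond this point are ignored
        self.backoff_cnt -= 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, MAX_RTO)
        return True


if __name__ == "__main__":
    state = BackoffState(rto=1.0)
    state.on_timeout()
    state.on_timeout()
    # Five ICMP messages arrive although only two retransmissions were sent.
    undone = [state.on_icmp_unreachable() for _ in range(5)]
    print(undone, state.rto)
```

   In this sketch, the RTO never drops below "RTO_BASE", no matter how
   many duplicated ICMP unreachable messages arrive.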

597	5.4.  Probing Frequency

599	   One could argue that if an ICMP unreachable message arrives for a
600	   timeout-based retransmission, the RTO shall be reset or recalculated,
601	   similar to what is done when an ACK arrives during timeout-based loss
602	   recovery (see Karn's algorithm [KP87], [RFC2988]), and a new
603	   retransmission should be sent immediately.  Generally, this would
604	   allow for a much higher probing frequency based on the round trip
605	   time up to the router where connectivity has been disrupted.
606	   However, we believe the current scheme provides a good trade-off
607	   between conservative behavior and fast detection of connectivity re-
608	   establishment.

610	5.5.  Reaction during Connection Establishment

612	   It is possible that a TCP sender enters timeout-based loss recovery
613	   while the connection is in SYN-SENT or SYN-RECEIVED states [RFC0793].
614	   The algorithm described in this document could also be used for
615	   faster connection establishment in networks with connectivity
616	   disruptions.  However, because existing TCP implementations [RFC5461]
617	   already interpret ICMP unreachable messages during connection
618	   establishment and abort the corresponding connection, we refrain from
619	   suggesting this.

621	5.6.  Reaction in Steady-State

623	   Another exploitation of ICMP unreachable messages in the context of
624	   TCP congestion control might seem appropriate in case the ICMP
625	   unreachable message is received while TCP is in steady-state, and the
626	   message refers to a segment from within the current window of data.
627	   As the RTT up to the router that generated the ICMP unreachable
628	   message is likely to be substantially shorter than the overall RTT to
629	   the destination, the ICMP unreachable message may very well reach the
630	   originating TCP while it is transmitting the current window of data.
631	   In case the remaining window is large, it might seem appropriate to
632	   refrain from transmitting the remaining window as there is timely
633	   evidence that it will only trigger further ICMP unreachable messages
634	   at that very router.  Although this promises improvement from a
635	   wastage perspective, it may be counterproductive from a security
636	   perspective.  An attacker could forge such ICMP messages, thereby
637	   forcing the originating TCP to stop sending data, very similar to the
638	   blind throughput-reduction attack mentioned in [RFC5927].

640	   An additional consideration is the following: in the presence of
641	   multi-path routing, even the receipt of a legitimate ICMP unreachable
642	   message cannot be exploited accurately, because there is the
643	   possibility that only one of the multiple paths to the destination is
644	   suffering from a connectivity disruption, which causes ICMP
645	   unreachable messages to be sent.  Then, however, there is the
646	   possibility that the path along which the connectivity disruption
647	   occurred contributed considerably to the overall bandwidth, such that
648	   a congestion response may well be reasonable.  However, this is not
649	   necessarily the case.  Therefore, a TCP has no means except for its
650	   inherent congestion control to decide on this matter.  All in all, it
651	   seems that for a connection in steady-state, i.e., not in timeout-
652	   based loss recovery, reacting to ICMP unreachable messages in regard
653	   to congestion control is not appropriate.  For the case of timeout-
654	   based retransmissions, however, there is a reasonable congestion
655	   response, which is skipping further retransmission timer backoffs
656	   because there is no congestion indication - as described above.

658	6.  Dissolving Ambiguity Issues using the TCP Timestamps Option

660	   If the TCP Timestamps option [RFC1323] is enabled for a connection, a
661	   TCP sender SHOULD use the following algorithm to dissolve the
662	   ambiguity issues mentioned in Sections 5.1, 5.2, and 5.3.  In
663	   particular, both the retransmission ambiguity and the packet
664	   duplication problems are prevented by the following TCP-LCD variant.
665	   On the other hand, the false positives caused by wrapped sequence
666	   numbers cannot be completely avoided, but the likelihood is further
667	   reduced by a factor of 1/2^32 since the Timestamp Value field (TSval)
668	   of the TCP Timestamps Option contains 32 bits.

670	   Hence, implementers may choose to implement the TCP-LCD with the
671	   following modifications.

673	   Step (1) is replaced by step (1'):

675	   (1')  Before TCP updates the variable "RTO" when it initiates
676	         timeout-based loss recovery, set the variables "BACKOFF_CNT"
677	         and "RTO_BASE" and the data structure "RETRANS_TS" as follows:

679	            BACKOFF_CNT := 0;
680	            RTO_BASE := RTO;
681	            RETRANS_TS := [].

683	         Proceed to step (R).

685	   Step (2) is extended by step (2b):

687	   (2b)  Store the value of the Timestamp Value field (TSval) of the TCP
688	         Timestamps option included in the retransmission "RET" sent in
689	         step (R) into the "RETRANS_TS" data structure:

691	            RETRANS_TS.add(RET.TSval)

693	   Step (6) is replaced by step (6'):

695	   (6')  If "SEG.SEQ == SND.UNA && RETRANS_TS.exists(SEG.TSval)", i.e.,
696	         if the TCP segment "SEG" eliciting the ICMP unreachable message
697	         "ICMP_DU" contains the sequence number of a retransmission, and
698	         the value in its Timestamp Value field (TSval) is valid, then

700	               proceed to step (7');

702	         else

704	               proceed to step (3).

706	   Step (7) is replaced by step (7'):

708	   (7')  Undo the last retransmission timer backoff:

710	               RETRANS_TS.remove(SEG.TSval);
711	               BACKOFF_CNT := BACKOFF_CNT - 1;
712	               RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

714	   The downside of this variant is twofold.  First, the
715	   modifications come at a cost: the TCP sender is required to store the
716	   timestamps of all retransmissions sent during one timeout-based loss
717	   recovery.  Second, this variant can only undo a retransmission timer
718	   backoff if the intermediate router experiencing the link outage
719	   implements [RFC1812] and chooses to include, beyond the first 64
720	   bits of the payload of the triggering datagram, as many further bits
721	   as are needed to carry the TCP Timestamps option in the ICMP
722	   unreachable message.
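
   The modified steps (1'), (2b), (6'), and (7') can be sketched as
   follows.  This is an illustration under stated assumptions rather
   than a reference implementation: the backoff of step (2) is folded
   into the method that records the retransmission, "RETRANS_TS" is
   modelled as a list so that two retransmissions carrying the same
   TSval are both remembered, "MAX_RTO" defaults to an assumed 60
   seconds, and all class and method names are hypothetical.

```python
from typing import List, Optional


class TcpLcdTimestamps:
    """Sketch of the TCP-LCD variant that uses the Timestamps option."""

    def __init__(self, rto: float, max_rto: float = 60.0) -> None:
        self.rto = rto
        self.max_rto = max_rto  # assumed bound, not mandated here
        self.rto_base = rto
        self.backoff_cnt = 0
        self.retrans_ts: List[int] = []

    def enter_timeout_recovery(self) -> None:
        """Step (1'): initialise the state before the first backoff."""
        self.backoff_cnt = 0
        self.rto_base = self.rto
        self.retrans_ts = []

    def on_retransmission(self, tsval: int) -> None:
        """Back off the timer and, as in step (2b), remember the TSval."""
        self.backoff_cnt += 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        self.retrans_ts.append(tsval)

    def on_icmp_unreachable(self, seg_seq: int, seg_tsval: Optional[int],
                            snd_una: int) -> bool:
        """Steps (6') and (7'): undo one backoff if SEG matches.

        seg_tsval is None when the ICMP payload was too short to
        contain the TCP Timestamps option.
        """
        if seg_seq != snd_una or seg_tsval not in self.retrans_ts:
            return False
        self.retrans_ts.remove(seg_tsval)
        self.backoff_cnt -= 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        return True
```

   Because each stored TSval is removed once it has been matched, a
   duplicated ICMP unreachable message for the same retransmission
   finds no entry and leaves the RTO unchanged.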

724	7.  Interoperability Issues

726	   This section discusses interoperability issues related to introducing
727	   TCP-LCD.

729	7.1.  Detection of TCP Connection Failures

731	   TCP-LCD may have side-effects on TCP implementations that attempt to
732	   detect TCP connection failures by counting timeout-based
733	   retransmissions.  [RFC1122] states in Section 4.2.3.5 that a TCP host
734	   must handle excessive retransmissions of data segments with two
735	   thresholds R1 and R2 that measure the number of retransmissions that
736	   have occurred for the same segment.  Both thresholds might either be
737	   measured in time units or as a count of retransmissions.

739	   Due to TCP-LCD's reversion strategy of the retransmission timer, the
740	   assumption that a certain number of retransmissions corresponds to a
741	   specific time interval no longer holds, as additional retransmissions
742	   may be performed during timeout-based loss recovery to detect the end
743	   of the connectivity disruption.  Therefore, a TCP employing TCP-LCD
744	   either MUST measure the thresholds R1 and R2 in time units or, in
745	   case R1 and R2 are counters of retransmissions, MUST convert them
746	   into time intervals, which correspond to the time an unmodified TCP
747	   would need to reach the specified number of retransmissions.
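
   One possible way to convert a retransmission count into a time
   interval is sketched below.  The sketch assumes that an unmodified
   TCP doubles its RTO after every timeout up to an assumed "MAX_RTO"
   of 60 seconds, and that the threshold counts as reached at the
   moment the last of the counted retransmissions is sent.  Both are
   assumptions of this example, and the function name is hypothetical.

```python
def count_to_interval(retransmissions: int, rto: float,
                      max_rto: float = 60.0) -> float:
    """Seconds an unmodified TCP needs to send that many retransmissions.

    rto is the RTO in effect when the segment was first transmitted;
    every timeout doubles it, bounded by max_rto (an assumed value).
    """
    elapsed = 0.0
    timeout = rto
    for _ in range(retransmissions):
        elapsed += timeout                   # wait for the timer to expire
        timeout = min(timeout * 2, max_rto)  # standard exponential backoff
    return elapsed


if __name__ == "__main__":
    for count in (3, 8):
        print(f"{count} retransmissions: {count_to_interval(count, rto=1.0)} s")
```

   A TCP employing TCP-LCD could then compare the time elapsed since
   the first transmission of the segment against this interval instead
   of counting retransmissions.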

749	7.2.  Explicit Congestion Notification

751	   With Explicit Congestion Notification (ECN) [RFC3168], ECN-capable
752	   routers are no longer limited to dropping packets to indicate
753	   congestion.  Instead, they can set the Congestion Experienced (CE)
754	   codepoint in the IP header to indicate congestion.  With TCP-LCD, it
755	   may happen that during a connectivity disruption, a received ICMP
756	   unreachable message has been elicited by a timeout-based
757	   retransmission that was marked with the CE codepoint before reaching
758	   the router experiencing the link outage.  In such a case, a TCP
759	   sender MUST, corresponding to [RFC3168] (Section 6.1.2), additionally
760	   reset the retransmission timer in case the algorithm undoes a
761	   retransmission timer backoff.

763	7.3.  ICMP for IP version 6

765	   RFC 4443 [RFC4443] specifies the Internet Control Message Protocol
766	   (ICMPv6) to be used with the Internet Protocol version 6 (IPv6)
767	   [RFC2460].  From TCP-LCD's point of view, it is important to notice
768	   that for IPv6, the payload of an ICMPv6 error message has to include
769	   as many bytes as possible from the IPv6 datagram that elicited the
770	   ICMPv6 error message, without making the error message exceed the
771	   minimum IPv6 MTU (1280 bytes) [RFC4443].  Thus, more information is
772	   available for TCP-LCD than in the case of IPv4.

774	   The counterpart of the ICMPv4 destination unreachable message of code
775	   0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6
776	   destination unreachable message of code 0 (no route to destination)
777	   [RFC4443].  As with IPv4, a router should generate an ICMPv6
778	   destination unreachable message of code 0 in response to a packet
779	   that cannot be delivered to its destination address because it lacks
780	   a matching entry in its routing table.  As a result, TCP-LCD can
781	   employ this ICMPv6 error message as a connectivity disruption
782	   indication, too.

784	7.4.  TCP-LCD and IP Tunnels

786	   It is worth noting that IP tunnels, including IPsec [RFC4301], IP in
787	   IP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], and
788	   others are compatible with TCP-LCD, as long as the received ICMP
789	   unreachable messages can be demultiplexed and extracted appropriately
790	   by the TCP sender during timeout-based loss recovery.

792	   If, for example, end-to-end tunnels like IPsec in transport mode
793	   [RFC4301] are employed, a TCP sender may receive ICMP unreachable
794	   messages where additional steps, e.g., decrypting in step (5) of the
795	   algorithm, are needed to extract the TCP header from these ICMP
796	   messages.  Provided that the received ICMP unreachable message
797	   contains enough information, i.e., SEG.SEQ is extractable, this
798	   information can still be used as a valid input for the proposed
799	   algorithm.

801	   Likewise, if IP encapsulation like [RFC2003] is used in some part of
802	   the path between the communicating hosts, the tunnel ingress node may
803	   receive the ICMP unreachable messages from an intermediate router
804	   experiencing the link outage.  Nevertheless, the tunnel ingress node
805	   may replay the ICMP unreachable messages in order to inform the TCP
806	   sender.  If enough information is preserved to extract SEG.SEQ, the
807	   replayed ICMP unreachable messages can still be used in TCP-LCD.

809	8.  Related Work

811	   Several methods that address TCP's problems in the presence of
812	   connectivity disruptions have been proposed in literature.  Some of
813	   them try to improve TCP's performance by modifying lower layers.  For
814	   example, [SM03] introduces a "smart link layer", which buffers one
815	   segment for each active connection and replays these segments upon
816	   connectivity re-establishment.  This approach has a serious drawback:
817	   previously stateless intermediate routers have to be modified in
818	   order to inspect TCP headers, to track the end-to-end connection, and
819	   to provide additional buffer space.  This leads to an additional need
820	   of memory and processing power.

822	   On the other hand, stateless link layer schemes, as proposed in
823	   [RFC3819], which unconditionally buffer some small number of packets
824	   may have another problem: if a packet is buffered longer than the
825	   maximum segment lifetime (MSL) of 2 min [RFC0793], i.e., the
826	   disconnection lasts longer than MSL, TCP's assumption that such
827	   segments will never be received will no longer be true, violating
828	   TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].

830	   Other approaches, like TCP-F [CRVP01] or the Explicit Link Failure
831	   Notification (ELFN) [HV02] inform a TCP sender about a disrupted path
832	   by special messages generated and sent from intermediate routers.  In
833	   the case of a link failure, the TCP sender stops sending segments and
834	   freezes its retransmission timers.  TCP-F stays in this state and
835	   remains silent until either a "route establishment notification" is
836	   received or an internal timer expires.  In contrast, ELFN
837	   periodically probes the network to detect connectivity re-
838	   establishment.  Both proposals rely on changes to intermediate
839	   routers, whereas the scheme proposed in this document is a sender-
840	   only modification.  Moreover, ELFN does not consider congestion and
841	   may impose serious additional load on the network, depending on the
842	   probe interval.

844	   The authors of ATCP [LS01] propose enhancements to identify different
845	   types of packet loss by introducing a layer between TCP and IP.  They
846	   utilize ICMP destination unreachable messages to set TCP's receiver
847	   advertised window to zero, thus forcing the TCP sender to perform
848	   zero window probing with an exponential backoff.  ICMP destination
849	   unreachable messages that arrive during this probing period are
850	   ignored.  This approach is nearly orthogonal to this document, which
851	   exploits ICMP messages to undo a retransmission timer backoff when
852	   TCP is already probing.  In principle, both mechanisms could be
853	   combined.  However, due to security considerations, it does not seem
854	   appropriate to adopt ATCP's reaction, as discussed in Section 5.6.

856	   Schuetz et al.  [I-D.schuetz-tcpm-tcp-rlci] describe a set of TCP
857	   extensions that improve TCP's behavior when transmitting over paths
858	   whose characteristics can change rapidly.  Their proposed extensions
859	   modify the local behavior of TCP and introduce a new TCP option to
860	   signal locally received connectivity-change indications (CCIs) to
861	   remote peers.  Upon receipt of a CCI, they re-probe the path
862	   characteristics either by performing a speculative retransmission or
863	   by sending a single segment of new data, depending on whether the
864	   connection is currently stalled in exponential backoff or
865	   transmitting in steady-state, respectively.  The authors focus on
866	   specifying TCP response mechanisms; nevertheless, underlying layers
867	   would have to be modified to explicitly send CCIs to make these
868	   immediate responses possible.

870	9.  IANA Considerations

872	   This memo includes no request to IANA.

874	10.  Security Considerations

876	   The algorithm proposed in this document is considered to be secure.
877	   For example, an attacker who already guessed the correct four-tuple
878	   (i.e., Source IP Address, Source TCP port, Destination IP Address,
879	   and Destination TCP port) still cannot make a TCP modified with
880	   TCP-LCD flood the network just by sending forged ICMP unreachable
881	   messages in an attempt to maliciously shorten the retransmission
882	   timer.  The attacker additionally would need to guess the correct
883	   segment sequence number of the current timeout-based retransmission,
884	   with a probability of at most 1/2^32.  Even in the case of man-in-
885	   the-middle attacks, i.e., attacks performed in scenarios in which the
886	   attacker can sniff the retransmissions, the impact on network load is
887	   considered to be low, since the retransmission frequency is limited
888	   by the RTO that was computed before TCP had entered the timeout-based
889	   loss recovery.  Hence, the highest probing frequency is expected to
890	   be even lower than once per minimum RTO, i.e., 1 s as specified by
891	   [RFC2988].

893	11.  Acknowledgments

895	   We would like to thank Lars Eggert, Mark Handley, Kai Jakobs, Ilpo
896	   Jarvinen, Pasi Sarolahti, Tim Shepard, Joe Touch and Carsten Wolff
897	   for feedback on earlier versions of this document.  We also thank
898	   Michael Faber, Daniel Schaffrath, and Damian Lukowski for
899	   implementing and testing the algorithm in Linux.  Special thanks go
900	   to Ilpo Jarvinen for giving valuable feedback regarding the Linux
901	   implementation.

903	   This work has been supported by the German National Science
904	   Foundation (DFG) within the research excellence cluster Ultra High-
905	   Speed Mobile Information and Communication (UMIC), RWTH Aachen
906	   University.

908	12.  References

910	12.1.  Normative References

912	   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
913	              RFC 792, September 1981.

915	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
916	              RFC 793, September 1981.

918	   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
919	              for High Performance", RFC 1323, May 1992.

921	   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
922	              RFC 1812, June 1995.

924	   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
925	              Timer", RFC 2988, November 2000.

927	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
928	              Control", RFC 5681, September 2009.

930	12.2.  Informative References

932	   [CRVP01]   Chandran, K., Raghunathan, S., Venkatesan, S., and R.
933	              Prakash, "A feedback-based scheme for improving TCP
934	              performance in ad hoc wireless networks", IEEE Personal
935	              Communications vol. 8, no. 1, pp. 34-39, February 2001.

937	   [HV02]     Holland, G. and N. Vaidya, "Analysis of TCP performance
938	              over mobile ad hoc networks", Wireless Networks vol. 8,
939	              no. 2-3, pp. 275-288, March 2002.

941	   [I-D.eggert-tcpm-tcp-retransmit-now]
942	              Eggert, L., "TCP Extensions for Immediate
943	              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
944	              (work in progress), June 2005.

946	   [I-D.schuetz-tcpm-tcp-rlci]
947	              Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami,
948	              Y., and K. Le, "TCP Response to Lower-Layer Connectivity-
949	              Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work
950	              in progress), February 2008.

952	   [KP87]     Karn, P. and C. Partridge, "Improving Round-Trip Time
953	              Estimates in Reliable Transport Protocols", Proceedings of
954	              the Conference on Applications, Technologies,
955	              Architectures, and Protocols for Computer Communication
956	              (SIGCOMM'87) pp. 2-7, August 1987.

958	   [LS01]     Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc
959	              networks", IEEE Journal on Selected Areas in
960	              Communications vol. 19, no. 7, pp. 1300-1315, July 2001.

962	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
963	              September 1981.

965	   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
966	              converting network protocol addresses to 48.bit Ethernet
967	              address for transmission on Ethernet hardware", STD 37,
968	              RFC 826, November 1982.

970	   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
971	              Communication Layers", STD 3, RFC 1122, October 1989.

973	   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
974	              October 1996.

976	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
977	              Requirement Levels", BCP 14, RFC 2119, March 1997.

979	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
980	              (IPv6) Specification", RFC 2460, December 1998.

982	   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
983	              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
984	              March 2000.

986	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
987	              of Explicit Congestion Notification (ECN) to IP",
988	              RFC 3168, September 2001.

990	   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
991	              for TCP", RFC 3522, April 2003.

993	   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
994	              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
995	              April 2004.

997	   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
998	              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
999	              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
1000	              RFC 3819, July 2004.

1002	   [RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
1003	              for TCP", RFC 4015, February 2005.

1005	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1006	              Internet Protocol", RFC 4301, December 2005.

1008	   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
1009	              Message Protocol (ICMPv6) for the Internet Protocol
1010	              Version 6 (IPv6) Specification", RFC 4443, March 2006.

1012	   [RFC5461]  Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
1013	              February 2009.

1015	   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
1016	              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
1017	              Spurious Retransmission Timeouts with TCP", RFC 5682,
1018	              September 2009.

1020	   [RFC5927]  Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

1022	   [SESB05]   Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
1023	              "Protocol enhancements for intermittently connected
1024	              hosts", SIGCOMM Computer Communication Review vol. 35, no.
1025	              3, pp. 5-18, December 2005.

1027	   [SM03]     Scott, J. and G. Mapp, "Link layer-based TCP optimisation
1028	              for disconnecting networks", SIGCOMM Computer
1029	              Communication Review vol. 33, no. 5, pp. 31-42,
1030	              October 2003.

1032	   [Zh86]     Zhang, L., "Why TCP Timers Don't Work Well", Proceedings
1033	              of the Conference on Applications, Technologies,
1034	              Architectures, and Protocols for Computer Communication
1035	              (SIGCOMM'86) pp. 397-405, August 1986.

1037	Appendix A.  Changes from previous versions of the draft

1039	   This appendix should be removed by the RFC Editor before publishing
1040	   this document as an RFC.

1042	A.1.  Changes from draft-ietf-tcpm-tcp-lcd-01

1044	   o  Incorporated feedback submitted by Lars Eggert

1046	A.2.  Changes from draft-ietf-tcpm-tcp-lcd-00

1048	   o  Editorial changes.

1050	   o  Clarified TCP-LCD's behaviour during connection establishment
1051	      (Thanks to Mark Handley).

1053	A.3.  Changes from draft-zimmermann-tcp-lcd-02

1055	   o  Incorporated feedback submitted by Ilpo Jarvinen.
1056	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04841.html>

1058	   o  Incorporated feedback submitted by Pasi Sarolahti.
1059	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04870.html>

1061	   o  Incorporated feedback submitted by Joe Touch.
1062	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04895.html>
1063	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04900.html>

1065	   o  Extended and reorganized the discussion (Section 5):

1067	      *  Every discussion item got its own title, so that we have a
1068	         better overview.

1070	      *  Extended Retransmission Ambiguity section.  Added also some
1071	         references to the historical retransmission ambiguity problem.

1073	      *  Heavily extended discussion about wrapped sequence numbers (see
1074	         Joe's comments).

1076	      *  Described the influence of packet duplication on the algorithm
1077	         (Thanks to Ilpo).

1079	      *  The section "Protecting Against Misbehaving Routers" is not a
1080	         subsection anymore.  Moreover, the section was renamed to
1081	         "Dissolving Ambiguity Issues" and has now real content.

1083	   o  An interoperability issues section (Section 7) was added.  In
1084	      particular comments to ECN, ICMPv6, and to the two thresholds R1
1085	      and R2 of [RFC1122] (Section 4.2.3.5) were added.

1087	   o  Miscellaneous editorial changes.  In particular, the algorithm has
1088	      a name now: TCP-LCD.

1090	A.4.  Changes from draft-zimmermann-tcp-lcd-01

1092	   o  The algorithm in Section 4.2 was slightly changed.  Instead of
1093	      reverting the last retransmission timer backoff by halving the
1094	      RTO, the RTO is recalculated with help of the "BACKOFF_CNT"
1095	      variable.  This fixes an issue that occurred when the
1096	      retransmission timer was backed off but bounded by a maximum
1097	      value.  The algorithm in the previous version of the draft would
1098	      have "reverted" to half of that maximum value, instead of using
1099	      the value before the RTO was doubled (and then bounded).

1101	   o  Miscellaneous editorial changes.

1103	A.5.  Changes from draft-zimmermann-tcp-lcd-00

1105	   o  Miscellaneous editorial changes in Section 1, 2 and 3.

1107	   o  The document was restructured in Section 1, 2 and 3 for easier
1108	      reading.  The motivation for the algorithm is changed according
1109	      to TCP's problem of telling congestion from non-congestion loss.

1111	   o  Added Section 4.1.

1113	   o  The algorithm in Section 4.2 was restructured and simplified:

1115	      *  The special case of the first received ICMP destination
1116	         unreachable message after an RTO was removed.

1118	      *  The "BACKOFF_CNT" variable was introduced so it is no longer
1119	         possible to perform more reverts than backoffs.

1121	   o  The discussion in Section 5 was improved and expanded according to
1122	      the algorithm changes.

1124	Authors' Addresses

1126	   Alexander Zimmermann
1127	   RWTH Aachen University
1128	   Ahornstrasse 55
1129	   Aachen,   52074
1130	   Germany

1132	   Phone: +49 241 80 21422
1133	   Email: zimmermann@cs.rwth-aachen.de

1134	   Arnd Hannemann
1135	   RWTH Aachen University
1136	   Ahornstrasse 55
1137	   Aachen,   52074
1138	   Germany

1140	   Phone: +49 241 80 21423
1141	   Email: hannemann@nets.rwth-aachen.de