idnits 2.17.1 

draft-ietf-tcpm-tcp-lcd-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 14, 2010) is 4972 days in the past.  Is
     this intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 1323 (Obsoleted by RFC 7323)

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)


     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TCP Maintenance and Minor                                  A. Zimmermann
3	Extensions (TCPM) WG                                        A. Hannemann
4	Internet-Draft                                    RWTH Aachen University
5	Intended status: Experimental                         September 14, 2010
6	Expires: March 18, 2011

8	   Making TCP more Robust to Long Connectivity Disruptions (TCP-LCD)
9	                       draft-ietf-tcpm-tcp-lcd-03

11	Abstract

13	   Disruptions in end-to-end path connectivity, which last longer than
14	   one retransmission timeout, cause suboptimal TCP performance.  The
15	   reason for this performance degradation is that TCP interprets
16	   segment loss induced by long connectivity disruptions as a sign of
17	   congestion, resulting in repeated retransmission timer backoffs.
18	   This, in turn, leads to a delayed detection of the re-establishment
19	   of the connection since TCP waits for the next retransmission timeout
20	   before it attempts a retransmission.

22	   This document proposes an algorithm to make TCP more robust to long
23	   connectivity disruptions (TCP-LCD).  It describes how standard ICMP
24	   messages can be exploited during timeout-based loss recovery to
25	   disambiguate true congestion loss from non-congestion loss caused by
26	   connectivity disruptions.  Moreover, a reversion strategy of the
27	   retransmission timer is specified that enables a more prompt
28	   detection of whether or not the connectivity to a previously
29	   disconnected peer node has been restored.  TCP-LCD is a TCP sender-
30	   only modification that effectively improves TCP performance in case
31	   of connectivity disruptions.

33	Status of this Memo

35	   This Internet-Draft is submitted in full conformance with the
36	   provisions of BCP 78 and BCP 79.

38	   Internet-Drafts are working documents of the Internet Engineering
39	   Task Force (IETF).  Note that other groups may also distribute
40	   working documents as Internet-Drafts.  The list of current Internet-
41	   Drafts is at http://datatracker.ietf.org/drafts/current/.

43	   Internet-Drafts are draft documents valid for a maximum of six months
44	   and may be updated, replaced, or obsoleted by other documents at any
45	   time.  It is inappropriate to use Internet-Drafts as reference
46	   material or to cite them other than as "work in progress."

48	   This Internet-Draft will expire on March 18, 2011.

50	Copyright Notice

52	   Copyright (c) 2010 IETF Trust and the persons identified as the
53	   document authors.  All rights reserved.

55	   This document is subject to BCP 78 and the IETF Trust's Legal
56	   Provisions Relating to IETF Documents
57	   (http://trustee.ietf.org/license-info) in effect on the date of
58	   publication of this document.  Please review these documents
59	   carefully, as they describe your rights and restrictions with respect
60	   to this document.  Code Components extracted from this document must
61	   include Simplified BSD License text as described in Section 4.e of
62	   the Trust Legal Provisions and are provided without warranty as
63	   described in the Simplified BSD License.

65	Table of Contents

67	   1.  Terminology
68	   2.  Introduction
69	   3.  Connectivity Disruption Indication
70	   4.  Connectivity Disruption Reaction
71	     4.1.  Basic Idea
72	     4.2.  Algorithm Details
73	   5.  Discussion of TCP-LCD
74	     5.1.  Retransmission Ambiguity
75	     5.2.  Wrapped Sequence Numbers
76	     5.3.  Packet Duplication
77	     5.4.  Probing Frequency
78	     5.5.  Reaction during Connection Establishment
79	     5.6.  Reaction in Steady-State
80	   6.  Dissolving Ambiguity Issues using the TCP Timestamps Option
81	   7.  Interoperability Issues
82	     7.1.  Detection of TCP Connection Failures
83	     7.2.  Explicit Congestion Notification (ECN)
84	     7.3.  TCP-LCD and IP Tunnels
85	   8.  Related Work
86	   9.  IANA Considerations
87	   10. Security Considerations
88	   11. Acknowledgments
89	   12. References
90	     12.1. Normative References
91	     12.2. Informative References
92	   Appendix A.  Changes from previous versions of the draft
93	     A.1.  Changes from draft-ietf-tcpm-tcp-lcd-02
94	     A.2.  Changes from draft-ietf-tcpm-tcp-lcd-01
95	     A.3.  Changes from draft-ietf-tcpm-tcp-lcd-00
96	     A.4.  Changes from draft-zimmermann-tcp-lcd-02
97	     A.5.  Changes from draft-zimmermann-tcp-lcd-01
98	     A.6.  Changes from draft-zimmermann-tcp-lcd-00
99	   Authors' Addresses

101	1.  Terminology

103	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
104	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
105	   document are to be interpreted as described in [RFC2119].

107	   The reader should be familiar with the algorithm and terminology from
108	   [RFC2988], which defines the standard algorithm Transmission Control
109	   Protocol (TCP) senders are required to use to compute and manage
110	   their retransmission timer.  In this document, the terms
111	   "retransmission timer" and "retransmission timeout" are used as
112	   defined in [RFC2988].  The retransmission timer ensures data delivery
113	   in the absence of any feedback from the receiver.  The duration of
114	   this timer is referred to as retransmission timeout (RTO).

116	   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
117	   refers to a TCP segment that acknowledges previously unacknowledged
118	   data.  The TCP sender state variable "SND.UNA" and the current
119	   segment variable "SEG.SEQ" are used as defined in [RFC0793].  SND.UNA
120	   holds the segment sequence number of the earliest segment that has not
121	   been acknowledged by the TCP receiver (the oldest outstanding
122	   segment).  SEG.SEQ is the segment sequence number of a given segment.

124	   For the purposes of this specification, we define the term "timeout-
125	   based loss recovery" that refers to the state that a TCP sender
126	   enters upon the first timeout of the oldest outstanding segment
127	   (SND.UNA) and leaves upon the arrival of the *first* acceptable ACK.
128	   It is important to note that other documents use a different
129	   interpretation of the term "timeout-based loss recovery".  For
130	   example, the NewReno modification to TCP's Fast Recovery algorithm
131	   [RFC3782] extends the period a TCP sender remains in timeout-based
132	   loss recovery compared to the one defined in this document.  This is
133	   because [RFC3782] attempts to avoid unnecessary multiple Fast
134	   Retransmits that can occur after an RTO.

136	2.  Introduction

138	   Connectivity disruptions can occur in many different situations.  The
139	   frequency of connectivity disruptions depends on the properties of
140	   the end-to-end path between the communicating hosts.  While
141	   connectivity disruptions can occur in traditional wired networks,
142	   e.g., caused by an unplugged network cable, the likelihood of their
143	   occurrence is significantly higher in wireless (multi-hop) networks.
144	   In particular, end-host mobility, network topology changes, and
145	   wireless interference are crucial factors.  In the case of the Transmission
146	   Control Protocol (TCP) [RFC0793], the performance of the connection
147	   can experience a significant reduction compared to a permanently
148	   connected path [SESB05].  This is because TCP, which was originally
149	   designed to operate in fixed and wired networks, generally assumes
150	   that the end-to-end path connectivity is relatively stable over the
151	   connection's lifetime.

153	   Depending on their duration, connectivity disruptions can be
154	   classified into two groups [I-D.schuetz-tcpm-tcp-rlci]: "short" and
155	   "long".  A connectivity disruption is "short" if connectivity returns
156	   before the retransmission timer fires for the first time.  In this
157	   case, TCP recovers lost data segments through Fast Retransmit and
158	   lost acknowledgments (ACK) through successfully delivered later ACKs.
159	   Connectivity disruptions are declared as "long" for a given TCP
160	   connection if the retransmission timer fires at least once before
161	   connectivity is resumed.  Whether or not path characteristics, like
162	   the round trip time (RTT) or the available bandwidth, have changed
163	   when connectivity resumes after a disruption is another important
164	   aspect for TCP's retransmission scheme [I-D.schuetz-tcpm-tcp-rlci].

166	   The algorithm specified in this document improves TCP's behavior in
167	   case of "long connectivity disruptions".  In particular, it focuses
168	   on the period prior to the re-establishment of the connectivity to a
169	   previously disconnected peer node.  The document does not describe
170	   any modifications to TCP's behavior and its congestion control
171	   mechanisms [RFC5681] after connectivity has been restored.

173	   When a long connectivity disruption occurs on a TCP connection, the
174	   TCP sender eventually does not receive any more acknowledgments.
175	   After the retransmission timer expires, the TCP sender enters the
176	   timeout-based loss recovery and declares the oldest outstanding
177	   segment (SND.UNA) as lost.  Since TCP tightly couples reliability and
178	   congestion control, the retransmission of SND.UNA is triggered
179	   together with the reduction of the transmission rate.  This is based
180	   on the assumption that segment loss is an indication of congestion
181	   [RFC5681].  As long as the connectivity disruption persists, TCP will
182	   repeat this procedure until the oldest outstanding segment has
183	   successfully been acknowledged, or until the connection has timed
184	   out.  TCP implementations that follow the recommended retransmission
185	   timeout (RTO) management of RFC 2988 [RFC2988] double the RTO after
186	   each retransmission attempt.  However, the RTO growth may be bounded
187	   by an upper limit, the maximum RTO, which is at least 60s, but may be
188	   longer: Linux, for example, uses 120s.  If connectivity is restored
189	   between two retransmission attempts, TCP still has to wait until the
190	   retransmission timer expires before resuming transmission, since it
191	   simply does not have any means to know if the connectivity has been
192	   re-established.  Therefore, depending on when connectivity becomes
193	   available again, this can waste up to a maximum RTO of possible
194	   transmission time.
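
   As an illustration only (not part of the specification), the bounded
   exponential backoff described above can be sketched in Python.  The
   function name and the example values (an initial RTO of 1s and a
   maximum RTO of 60s) are assumptions chosen for this sketch.

```python
def backoff_schedule(initial_rto, max_rto, timeouts):
    """Return the RTO armed after each successive timeout.

    Each timeout doubles the RTO; the result is capped at max_rto.
    """
    rto = initial_rto
    schedule = []
    for _ in range(timeouts):
        rto = min(rto * 2, max_rto)  # double, then apply the upper bound
        schedule.append(rto)
    return schedule


# Hypothetical values: initial RTO 1s, maximum RTO 60s, eight timeouts.
schedule = backoff_schedule(initial_rto=1.0, max_rto=60.0, timeouts=8)
print(schedule)
```

   Under these assumed values the timer reaches the cap at the sixth
   timeout.  From then on, a sender whose connectivity returns just
   after a retransmission attempt may idle for up to the full maximum
   RTO before it probes the path again.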

196	   This retransmission behavior is not efficient, especially in
197	   scenarios with long connectivity disruptions.  In the ideal case, TCP
198	   would attempt a retransmission as soon as connectivity to its peer
199	   has been re-established.  In this document, we specify a TCP sender-
200	   only modification to provide robustness to long connectivity
201	   disruptions (TCP-LCD).  The memo describes how the standard Internet
202	   Control Message Protocol (ICMP) can be exploited during timeout-based
203	   loss recovery to identify non-congestion loss caused by long
204	   connectivity disruptions.  TCP-LCD's reversion strategy of the
205	   retransmission timer enables higher-frequency retransmissions and
206	   thereby a prompt detection when connectivity to a previously
207	   disconnected peer node has been restored.  If no congestion is
208	   present, TCP-LCD approaches the ideal behavior.

210	   Experimental results of a Linux implementation of TCP-LCD have been
211	   presented in [ZimHan09].  The implementation has been incorporated
212	   into mainline Linux, and is already used within the Internet.  Thus
213	   far, no negative experiences have been reported that could be
214	   attributed to the algorithm.  However, we consider TCP-LCD as
215	   experimental until more real-life results have been obtained.
216	   Nevertheless, we encourage implementation of TCP-LCD under other
217	   operating systems to provide for broader testing and experimentation
218	   opportunities.

220	3.  Connectivity Disruption Indication

222	   If the queue of an intermediate router that is experiencing a link
223	   outage can buffer all incoming packets, a connectivity disruption
224	   will only cause a variation in delay, which is handled well by TCP
225	   implementations using either Eifel [RFC3522], [RFC4015] or Forward
226	   RTO-Recovery (F-RTO) [RFC5682].  However, if the link outage lasts
227	   for too long, the router experiencing the link outage is forced to
228	   drop packets, and finally to discard the corresponding route.  Means to
229	   detect such link outages include reacting to failed address
230	   resolution protocol (ARP) [RFC0826] queries, unsuccessful link
231	   sensing, and the like.  However, this is solely in the responsibility
232	   of the respective router.

234	      Note: The focus of this memo is on introducing a method by which
235	      ICMP messages may be exploited to improve TCP's performance; how
236	      different physical and link layer mechanisms below the network
237	      layer may trigger ICMP destination unreachable messages is out of
238	      scope of this memo.

240	   Provided that no other route to the specific destination exists, an
241	   Internet Protocol version 4 (IPv4) [RFC0791] router will notify the
242	   corresponding sending host about the dropped packets via ICMP
243	   destination unreachable messages of code 0 (net unreachable) or code
244	   1 (host unreachable) [RFC1812].  Therefore, the sending host can use
245	   the ICMP destination unreachable messages of these codes as an
246	   indication of a connectivity disruption, since the reception of
247	   these messages provides evidence that packets were dropped due to a
248	   link outage.

250	   For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of
251	   the ICMP destination unreachable message of code 0 (net unreachable)
252	   and of code 1 (host unreachable) is the ICMPv6 destination
253	   unreachable message of code 0 (no route to destination) [RFC4443].
254	   As with IPv4, a router should generate an ICMPv6 destination
255	   unreachable message of code 0 in response to a packet that cannot be
256	   delivered to its destination address because it lacks a matching
257	   entry in its routing table.

259	   Note that there are also other ICMP and ICMPv6 destination
260	   unreachable messages with different codes.  Some of them are
261	   candidates for connectivity disruption indications, too, but need
262	   further investigation.  Examples are ICMP destination unreachable
263	   messages with code 5 (source route failed), code 11 (net unreachable
264	   for TOS), or code 12 (host unreachable for TOS) [RFC1812].  On the
265	   other hand, codes that flag hard errors are of no use for this
266	   scheme, since TCP should abort the connection when those are received
267	   [RFC1122].

269	   For the sake of simplicity, we will use, unless explicitly qualified
270	   with ICMPv4 or ICMPv6, the term "ICMP unreachable message" as a synonym
271	   for ICMP destination unreachable messages of code 0 or code 1 and
272	   ICMPv6 destination unreachable of code 0.  This implies that all
273	   keywords from [RFC2119] that deal with the handling of received ICMP
274	   messages apply in the same way to ICMPv6 messages.
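
   The definition above can be expressed as a small predicate.  The
   following Python sketch is illustrative only; the function and
   constant names are assumptions, while the numeric values are the
   destination unreachable message types of ICMPv4 (type 3) and ICMPv6
   (type 1) together with the codes named in this section.

```python
ICMPV4_DEST_UNREACHABLE = 3  # ICMPv4 message type
ICMPV6_DEST_UNREACHABLE = 1  # ICMPv6 message type


def is_icmp_unreachable_message(ip_version, icmp_type, icmp_code):
    """Return True for the messages this memo treats as a connectivity
    disruption indication."""
    if ip_version == 4:
        # code 0: net unreachable, code 1: host unreachable
        return icmp_type == ICMPV4_DEST_UNREACHABLE and icmp_code in (0, 1)
    if ip_version == 6:
        # code 0: no route to destination
        return icmp_type == ICMPV6_DEST_UNREACHABLE and icmp_code == 0
    return False
```

   All other type and code combinations, including hard errors, are
   rejected by this predicate and would be left to the regular ICMP
   handling of the TCP implementation.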

276	   The accurate interpretation of ICMP unreachable messages as a
277	   connectivity disruption indication is complicated by the following
278	   two peculiarities of ICMP messages.  First, they do not necessarily
279	   operate on the same timescale as the packets, i.e., TCP segments that
280	   elicited them.  When a router drops a packet due to a missing route,
281	   it will not necessarily send an ICMP unreachable message immediately,
282	   but will rather queue it for later delivery.  Second, ICMP messages
283	   are subject to rate limiting, e.g., when a router drops a whole
284	   window of data due to a link outage, it is unlikely to send as many
285	   ICMP unreachable messages as dropped TCP segments.  Depending on the
286	   load of the router, it may not even send any ICMP unreachable
287	   messages at all.  Both peculiarities originate from [RFC1812] for
288	   ICMPv4 and [RFC4443] for ICMPv6.

290	   Fortunately, according to [RFC0792], ICMPv4 unreachable messages have
291	   to contain in their body the entire IPv4 header [RFC0791] of the
292	   datagram eliciting the ICMPv4 unreachable message, plus the first 64
293	   bits of the payload of that datagram.  This allows the sending host
294	   to match the ICMPv4 error message to the transport connection that
295	   elicited it.  RFC 1812 [RFC1812] augments these requirements and
296	   states that ICMPv4 messages should contain as much of the original
297	   datagram as possible without the length of the ICMPv4 datagram
298	   exceeding 576 bytes.  Therefore, in case of TCP, at least the source
299	   port number, the destination port number, and the 32-bit TCP sequence
300	   number are included.  This allows the originating TCP to demultiplex
301	   the received ICMPv4 message and to identify the affected connection.
302	   Moreover, it can identify which segment of the respective connection
303	   triggered the ICMPv4 unreachable message, unless there are several
304	   segments in-flight with the same sequence number (see Section 5.1).
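
   To make the demultiplexing step concrete, the following Python
   sketch parses the datagram quoted in an ICMPv4 unreachable message.
   It is a minimal illustration under stated assumptions: the buffer is
   assumed to start at the ICMP header (the outer IPv4 header has
   already been removed), and the function name and sample addresses
   are hypothetical.

```python
import ipaddress
import struct

ICMP_HEADER_LEN = 8  # type, code, checksum, 4 unused bytes
IPPROTO_TCP = 6


def extract_quoted_tcp(icmp_message):
    """Return (src_addr, src_port, dst_addr, dst_port, seq) of the TCP
    segment quoted in an ICMPv4 error message, or None if unusable."""
    if len(icmp_message) < ICMP_HEADER_LEN + 20 + 8:
        return None
    quoted = icmp_message[ICMP_HEADER_LEN:]
    if quoted[0] >> 4 != 4:
        return None
    ihl = (quoted[0] & 0x0F) * 4  # quoted IPv4 header length in bytes
    if ihl < 20 or len(quoted) < ihl + 8 or quoted[9] != IPPROTO_TCP:
        return None
    src_addr = str(ipaddress.IPv4Address(quoted[12:16]))
    dst_addr = str(ipaddress.IPv4Address(quoted[16:20]))
    # The first 64 bits of the TCP header: ports and sequence number.
    src_port, dst_port, seq = struct.unpack("!HHI", quoted[ihl:ihl + 8])
    return src_addr, src_port, dst_addr, dst_port, seq


# Hand-built sample: host unreachable (type 3, code 1) quoting a segment
# from 192.0.2.1:49152 to 198.51.100.7:80 with sequence number 1000.
icmp_header = struct.pack("!BBHI", 3, 1, 0, 0)
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64,
                        IPPROTO_TCP, 0,
                        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
tcp_start = struct.pack("!HHI", 49152, 80, 1000)
info = extract_quoted_tcp(icmp_header + ip_header + tcp_start)
```

   The address and port pairs select the connection, and the sequence
   number is what a sender would later compare against SND.UNA.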

306	   For IPv6 [RFC2460], the payload of an ICMPv6 error message has to
307	   include as many bytes as possible from the IPv6 datagram that
308	   elicited the ICMPv6 error message, without making the error message
309	   exceed the minimum IPv6 MTU (1280 bytes) [RFC4443].  Thus, enough
310	   information is available to identify both the affected connection
311	   and the corresponding segment that triggered the ICMPv6 error
312	   message.

314	   A connectivity disruption indication in the form of an ICMP unreachable
315	   message associated with a presumably lost TCP segment provides strong
316	   evidence that the segment was not dropped due to congestion, but was
317	   successfully delivered as far as the reporting router.  It therefore
318	   did not witness any congestion at least on that part of the path that
319	   was traversed by both the TCP segment eliciting the ICMP unreachable
320	   message as well as the ICMP unreachable message itself.

322	4.  Connectivity Disruption Reaction

324	   Section 4.1 introduces the basic idea of TCP-LCD.  The complete
325	   algorithm is specified in Section 4.2.

327	4.1.  Basic Idea

329	   The goal of the algorithm is to promptly detect when connectivity to
330	   a previously disconnected peer node has been restored after a long
331	   connectivity disruption, while retaining appropriate behavior in case
332	   of congestion.  TCP-LCD exploits standard ICMP unreachable messages
333	   during timeout-based loss recovery.  This increases TCP's
334	   retransmission frequency by undoing one retransmission timer backoff
335	   whenever an ICMP unreachable message is received that contains a
336	   segment with a sequence number of a presumably lost retransmission.

338	   This approach has the advantage of appropriately reducing the probing
339	   rate in case of congestion.  If either the retransmission itself or
340	   the corresponding ICMP message is dropped, the previously performed
341	   retransmission timer backoff is not undone, which effectively halves
342	   the probing rate.
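
   The effect on the probing rate can be illustrated with a short
   Python sketch.  It is a simplification made for this example: every
   ICMP unreachable message is assumed to arrive before the shortened
   timer would expire, and the function name and input values are
   hypothetical.

```python
def probe_intervals(rto_base, icmp_received, max_rto=float("inf")):
    """Return the timer interval armed after each timeout.

    icmp_received[i] states whether the i-th retransmission elicited an
    ICMP unreachable message that reached the sender.
    """
    backoff_cnt = 0
    intervals = []
    for got_icmp in icmp_received:
        backoff_cnt += 1      # timeout: the timer is backed off once
        if got_icmp:
            backoff_cnt -= 1  # ICMP received: undo one backoff
        intervals.append(min(rto_base * 2 ** backoff_cnt, max_rto))
    return intervals


# The third retransmission (or its ICMP message) is lost.
intervals = probe_intervals(1.0, [True, True, False, True, True])
print(intervals)
```

   In this run the interval stays at the base RTO while every probe is
   answered by an ICMP message, doubles once the third probe goes
   unanswered, and remains doubled afterwards, which corresponds to the
   halved probing rate described above.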

344	4.2.  Algorithm Details

346	   A TCP sender that uses RFC 2988 [RFC2988] to compute TCP's
347	   retransmission timer MAY employ the following scheme to avoid over-
348	   conservative retransmission timer backoffs in case of long
349	   connectivity disruptions.  If a TCP sender does implement the
350	   following steps, the algorithm MUST be initiated upon the first
351	   timeout of the oldest outstanding segment (SND.UNA) and MUST be
352	   stopped upon the arrival of the first acceptable ACK.  The algorithm
353	   MUST NOT be re-initiated upon subsequent timeouts for the same
354	   segment.  The scheme SHOULD NOT be used in SYN-SENT or SYN-RECEIVED
355	   states [RFC0793] (see Section 5.5).

357	   A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's
358	   retransmission timer MUST NOT use TCP-LCD.  We envision that the
359	   scheme could be easily adapted to algorithms other than RFC 2988.
360	   However, we leave this as future work.

362	   In rule (2.5), RFC 2988 [RFC2988] provides the option to place a
363	   maximum value on the RTO.  When a TCP implements this rule to provide
364	   an upper bound for the RTO, it MUST also be used in the following
365	   algorithm.  In particular, if the RTO is bounded by an upper limit
366	   (maximum RTO), the "MAX_RTO" variable used in this scheme MUST be
367	   initialized with this upper limit.  Otherwise, if the RTO is
368	   unbounded, the "MAX_RTO" variable MUST be set to infinity.

370	   The scheme specified in this document uses the "BACKOFF_CNT"
371	   variable, whose initial value is zero.  The variable is used to count
372	   the number of performed retransmission timer backoffs during one
373	   timeout-based loss recovery.  Moreover, the "RTO_BASE" variable is
374	   used to recover the previous RTO if the retransmission timer backoff
375	   was unnecessary.  The variable is initialized with the RTO upon
376	   initiation of timeout-based loss recovery.

378	   (1)  Before TCP updates the variable "RTO" when it initiates timeout-
379	        based loss recovery, set the variables "BACKOFF_CNT" and
380	        "RTO_BASE" as follows:

382	           BACKOFF_CNT := 0;
383	           RTO_BASE := RTO.

385	        Proceed to step (R).

387	   (R)  This is a placeholder for standard TCP's behavior in case the
388	        retransmission timer has expired.  In particular, if RFC 2988
389	        [RFC2988] is used, steps (5.4) - (5.6) of that algorithm go
390	        here.  Proceed to step (2).

392	   (2)  To account for the expiration of the retransmission timer in the
393	        previous step (R), increment the "BACKOFF_CNT" variable by one:

395	           BACKOFF_CNT := BACKOFF_CNT + 1.

397	   (3)  Wait either

399	           for the expiration of the retransmission timer.  When the
400	           retransmission timer expires, proceed to step (R);

402	           or for the arrival of an acceptable ACK.  When an acceptable
403	           ACK arrives, proceed to step (A);

405	           or for the arrival of an ICMP unreachable message.  When the
406	           ICMP unreachable message "ICMP_DU" arrives, proceed to step
407	           (4).

409	   (4)  If "BACKOFF_CNT > 0", i.e., if at least one retransmission timer
410	        backoff can be undone, then

412	           proceed to step (5);

414	        else

416	           proceed to step (3).

418	   (5)  Extract the TCP segment header included in the ICMP unreachable
419	        message "ICMP_DU":

421	           SEG := Extract(ICMP_DU).

423	   (6)  If "SEG.SEQ == SND.UNA", i.e., if the TCP segment "SEG"
424	        eliciting the ICMP unreachable message "ICMP_DU" contains the
425	        sequence number of a retransmission, then

427	           proceed to step (7);

429	        else

431	           proceed to step (3).

433	   (7)  Undo the last retransmission timer backoff:

435	           BACKOFF_CNT := BACKOFF_CNT - 1;
436	           RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

438	   (8)  If the retransmission timer expires due to the undoing in the
439	        previous step (7), then

441	           proceed to step (R);

443	        else

445	           proceed to step (3).

447	   (A)  This is a placeholder for standard TCP's behavior in case an
448	        acceptable ACK has arrived.  No further processing.
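
   The steps above can be sketched as a small Python state machine.
   This is an illustrative sketch, not a reference implementation: the
   class, its method names, and the "elapsed" parameter (time since the
   last retransmission) are assumptions, and the congestion control
   actions of step (R) as well as the actual timer and retransmission
   machinery are omitted.

```python
import math


class TcpLcdSender:
    def __init__(self, rto, snd_una, max_rto=math.inf):
        self.rto = rto
        self.snd_una = snd_una
        self.max_rto = max_rto  # infinity if the RTO is unbounded
        self.in_recovery = False
        self.backoff_cnt = 0
        self.rto_base = rto

    def on_rto_expired(self):
        if not self.in_recovery:
            # Step (1): record state before the RTO is updated.
            self.backoff_cnt = 0
            self.rto_base = self.rto
            self.in_recovery = True
        # Step (R): bounded exponential backoff; retransmission omitted.
        self.rto = min(self.rto * 2, self.max_rto)
        # Step (2): always count the expiration, even if the RTO is capped.
        self.backoff_cnt += 1

    def on_acceptable_ack(self, new_snd_una):
        # Step (A): leave timeout-based loss recovery.
        self.in_recovery = False
        self.snd_una = new_snd_una

    def on_icmp_unreachable(self, seg_seq, elapsed):
        # Step (4): only act if a backoff can be undone.
        if not self.in_recovery or self.backoff_cnt == 0:
            return "ignored"
        # Steps (5) and (6): the quoted segment must carry SND.UNA.
        if seg_seq != self.snd_una:
            return "ignored"
        # Step (7): undo one backoff.
        self.backoff_cnt -= 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        # Step (8): the shortened timer may already have expired.
        if elapsed >= self.rto:
            self.on_rto_expired()
            return "retransmit"
        return "undone"


sender = TcpLcdSender(rto=1.0, snd_una=1000, max_rto=60.0)
for _ in range(3):
    sender.on_rto_expired()  # RTO grows 2s, 4s, 8s
first = sender.on_icmp_unreachable(seg_seq=1000, elapsed=1.0)
second = sender.on_icmp_unreachable(seg_seq=999, elapsed=1.0)
third = sender.on_icmp_unreachable(seg_seq=1000, elapsed=2.5)
```

   In this hypothetical run the first message undoes one backoff, the
   second is ignored because its sequence number differs from SND.UNA,
   and the third shortens the timer below the time already elapsed, so
   the sender proceeds directly to step (R).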

450	   When a TCP in steady-state detects a segment loss using the
451	   retransmission timer, it enters the timeout-based loss recovery and
452	   initiates the algorithm (step 1).  It adjusts the slow start
453	   threshold (ssthresh), sets the congestion window (CWND) to one
454	   segment, backs off the retransmission timer, and retransmits the
455	   first unacknowledged segment (step R) [RFC5681], [RFC2988].  To
456	   account for the expiration of the retransmission timer, the TCP
457	   sender increments the "BACKOFF_CNT" variable by one (step 2).

459	   In case the retransmission timer expires again (step 3a), a TCP will
460	   repeat the retransmission of the first unacknowledged segment and
461	   back off the retransmission timer once more (step R) [RFC2988], as
462	   well as increment the "BACKOFF_CNT" variable by one (step 2).  Note
463	   that a TCP may implement RFC 2988's [RFC2988] option to place a
464	   maximum value on the RTO that may result in not performing the
465	   retransmission timer backoff.  However, step (2) MUST always and
466	   unconditionally be applied, no matter whether or not the
467	   retransmission timer is actually backed off.  In other words, each
468	   time the retransmission timer expires, the "BACKOFF_CNT" variable
469	   MUST be incremented by one.

471	   If the first received packet after the retransmission(s) is an
472	   acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow
473	   start the connection and terminate the algorithm (step A).  Later
474	   ICMP unreachable messages from the just terminated timeout-based loss
475	   recovery are ignored, since the ACK clock is already restarting due
476	   to the successful retransmission.

478	   On the other hand, if the first received packet after the
479	   retransmission(s) is an ICMP unreachable message (step 3c), and if
480	   step (4) permits it, TCP SHOULD undo one backoff for each ICMP
481	   unreachable message reporting an error on a retransmission.  To
482	   decide if an ICMP unreachable message was elicited by a
483	   retransmission, the sequence number it contains is inspected (step 5,
484	   step 6).  The undo is performed by re-calculating the RTO with the
485	   decremented "BACKOFF_CNT" variable (step 7).  This calculation
486	   explicitly matches the (bounded) exponential backoff specified in
487	   rule (5.5) of [RFC2988].

489	   Upon receipt of an ICMP unreachable message that legitimately undoes
490	   one backoff, there is the possibility that the shortened
491	   retransmission timer has already expired (step 8).  Then, TCP SHOULD
492	   retransmit immediately.  In case the shortened retransmission timer
493	   has not yet expired, TCP MUST wait accordingly.

495	5.  Discussion of TCP-LCD

497	   TCP-LCD takes care to react only to connectivity disruption
498	   indications in the form of ICMP unreachable messages during timeout-
499	   based loss recovery.  Therefore, TCP's behavior is not altered when
500	   either no ICMP unreachable messages are received, or the
501	   retransmission timer of the TCP sender did not expire since the last
502	   received acceptable ACK.  Thus, by definition, the algorithm triggers
503	   only in the case of long connectivity disruptions.

505	   Only such ICMP unreachable messages that contain a TCP segment with
506	   the sequence number of a retransmission, i.e., contain SND.UNA, are
507	   evaluated by TCP-LCD.  All other ICMP unreachable messages are
508	   ignored.  The arrival of those ICMP unreachable messages provides
509	   strong evidence that the retransmissions were not dropped due to
510	   congestion, but were successfully delivered to the reporting router.
511	   In other words, there is no evidence for any congestion at least on
512	   that very part of the path that was traversed by both the TCP segment
513	   eliciting the ICMP unreachable message as well as the ICMP
514	   unreachable message itself.

516	   However, there are some situations where TCP-LCD makes a false
517	   decision and incorrectly undoes a retransmission timer backoff.  This
518	   can happen, even when the received ICMP unreachable message contains
519	   the sequence number of a retransmission (SND.UNA), because the TCP
520	   segment that elicited the ICMP unreachable message may either not be
521	   a retransmission (Section 5.1) or not belong to the current
522	   timeout-based loss recovery (Section 5.2).  Finally, packet
523	   duplication (Section 5.3) can also spuriously trigger the algorithm.

525	   Section 5.4 discusses possible probing frequencies, while Section 5.6
526	   describes the motivation for not reacting to ICMP unreachable
527	   messages while TCP is in steady-state.

529	5.1.  Retransmission Ambiguity

531	   Historically, the retransmission ambiguity problem [Zh86], [KP87] is
532	   the TCP sender's inability to distinguish whether the first
533	   acceptable ACK after a retransmission refers to the original
534	   transmission or to the retransmission.  This problem occurs after
535	   both a Fast Retransmit and a timeout-based retransmit.  However,
536	   modern TCP implementations can eliminate the retransmission ambiguity
537	   with either the help of Eifel [RFC3522], [RFC4015] or Forward RTO-
538	   Recovery (F-RTO) [RFC5682].

540	   The reversion strategy of the given algorithm suffers from a form of
541	   retransmission ambiguity, too.  In contrast to the above case, TCP
542	   suffers from ambiguity regarding ICMP unreachable messages received
543	   during timeout-based loss recovery.  With the TCP sequence number
544	   included in the ICMP unreachable message, a TCP sender is not able to
545	   determine if the ICMP unreachable message refers to the original
546	   transmission or to any of the timeout-based retransmissions.  That
547	   is, there is an ambiguity with regard to which TCP segment an ICMP
548	   unreachable message reports on.

550	   However, this ambiguity is not considered to be a problem for the
551	   algorithm.  The assumption that a received ICMP unreachable message
552	   provides evidence that a non-congestion loss caused by the
553	   connectivity disruption was wrongly considered a congestion loss
554	   still holds, regardless of which TCP segment, transmission or
555	   retransmission, the message refers.

557	5.2.  Wrapped Sequence Numbers

559	   Besides the ambiguity whether a received ICMP unreachable message
560	   refers to the original transmission or to any of the retransmissions,
561	   there is another source of ambiguity related to the TCP sequence
562	   numbers contained in ICMP unreachable messages.  For high bandwidth
563	   paths, the sequence space may wrap quickly.  Delayed ICMP
564	   unreachable messages might then coincidentally fit as valid
565	   input in the proposed scheme.  As a result, the scheme may
566	   incorrectly undo retransmission timer backoffs.  Chances for this to
567	   happen are minuscule, since a particular ICMP unreachable message
568	   would need to contain the exact sequence number of the current oldest
569	   outstanding segment (SND.UNA), while at the same time TCP is in
570	   timeout-based loss recovery.  However, two "worst case" scenarios for
571	   the algorithm are possible:

573	   For instance, consider a steady state TCP connection, which will be
574	   disrupted at an intermediate router due to a link outage.  Upon the
575	   expiration of the RTO, the TCP sender enters the timeout-based loss
576	   recovery and starts to retransmit the earliest segment that has not
577	   been acknowledged (SND.UNA).  For some reason, the router delays all
578	   corresponding ICMP unreachable messages so that the TCP sender backs
579	   the retransmission timer off normally without any undoing.  At the
580	   end of the connectivity disruption, the TCP sender eventually detects
581	   the re-establishment, leaves the scheme and finally the timeout-based
582	   loss recovery, too.  A sequence number wrap-around later, the
583	   connectivity between the two peers is disrupted again, but this time
584	   due to congestion and exactly at the time at which the current
585	   SND.UNA matches the SND.UNA from the previous cycle.  If the router
586	   emits the delayed ICMP unreachable messages now, the TCP sender would
587	   incorrectly undo retransmission timer backoffs.  As the TCP sequence
588	   number contains 32 bits, the probability of this scenario is at most
589	   1/2^32.  Given sufficiently many retransmissions in the first
590	   timeout-based loss recovery, the corresponding ICMP unreachable
591	   messages could reduce the RTO in the second recovery at most to
592	   "RTO_BASE".  However, once the ICMP unreachable messages are
593	   depleted, the standard exponential backoff will be performed.  Thus,
594	   the congestion response will only be delayed by some false
595	   retransmissions.

597	   Similar to the above, consider the case where a steady state TCP
598	   connection with n segments in flight will be disrupted at some point
599	   due to a link outage at an intermediate router.  For each segment in
600	   flight, the router may generate an ICMP unreachable message.
601	   However, for some reason, it delays them.  Once the link outage is
602	   over and the connection has been re-established, the TCP sender
603	   leaves the scheme and slow-starts the connection.  Following a
604	   sequence number wrap-around, a retransmission timeout occurs, just at
605	   the moment the TCP sender's current window of data reaches the
606	   previous range of the sequence number space again.  In case the
607	   router emits the delayed ICMP unreachable messages now, spurious
608	   undoing of the retransmission timer backoff is possible once, if the
609	   TCP sequence number contained in such a message matches the
610	   current SND.UNA, and the timeout was a result of congestion.  In the
611	   case of another connectivity disruption, the additional undoing of
612	   the retransmission timer backoff has no impact.  The probability of
613	   this scenario is at most n/2^32.
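
   To put "wrap quickly" into perspective, the following Python sketch
   estimates how long a sender needs to cycle once through the 32-bit
   sequence space at a constant rate, and evaluates the n/2^32 bound of
   the second scenario.  The function names and the example rates are
   assumptions made for illustration; they are not part of TCP-LCD.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers count bytes modulo 2^32


def wrap_time_seconds(rate_bits_per_second):
    """Time to send 2^32 payload bytes at a constant rate.

    Header overhead and idle periods are ignored, so this is a lower
    bound on the time between two uses of the same sequence number.
    """
    return SEQ_SPACE * 8 / rate_bits_per_second


def spurious_match_bound(segments_in_flight):
    """Upper bound n/2^32 from the second scenario above."""
    return segments_in_flight / SEQ_SPACE
```

   For example, wrap_time_seconds(1e9) yields roughly 34 seconds for a
   path that sustains 1 Gbit/s, so a delayed ICMP unreachable message
   would have to be held back for at least that long to fall into the
   next cycle of the sequence space.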

615	5.3.  Packet Duplication

617	   In case an intermediate router duplicates packets, a TCP sender may
618	   receive more ICMP unreachable messages during timeout-based loss
619	   recovery than it sent timeout-based retransmissions.  However, since
620	   TCP-LCD keeps track of the number of performed retransmission timer
621	   backoffs in the "BACKOFF_CNT" variable, it will not undo more
622	   retransmission timer backoffs than were actually performed.
623	   Nevertheless, if packet duplication and congestion coincide on the
624	   path between the two communicating hosts, duplicated ICMP unreachable
625	   messages could hide the congestion loss of some retransmissions or
626	   ICMP unreachable messages, and the algorithm may incorrectly undo
627	   retransmission timer backoffs.  Considering the overall impact of a
628	   router that duplicates packets, the additional load induced by some
629	   spurious timeout-based retransmits can probably be neglected.
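
   The bound provided by "BACKOFF_CNT" can be illustrated with a short
   Python sketch.  It assumes that a backoff is only undone while
   "BACKOFF_CNT" is greater than zero; the function names and the
   60-second value of MAX_RTO are assumptions made for this example.

```python
MAX_RTO = 60.0  # seconds; assumed upper bound for this sketch


def rto_for(rto_base, backoff_cnt):
    """RTO := min(RTO_BASE * 2^BACKOFF_CNT, MAX_RTO)."""
    return min(rto_base * 2 ** backoff_cnt, MAX_RTO)


def apply_icmp_messages(rto_base, backoffs_performed, icmp_messages):
    """Undo one backoff per matching ICMP unreachable message, but
    never more backoffs than were performed.

    Returns the resulting (BACKOFF_CNT, RTO) pair.
    """
    backoff_cnt = backoffs_performed
    for _ in range(icmp_messages):
        if backoff_cnt > 0:  # assumed guard: nothing left to undo at 0
            backoff_cnt -= 1
    return backoff_cnt, rto_for(rto_base, backoff_cnt)
```

   With three performed backoffs and six received messages (each
   message duplicated once), apply_icmp_messages(1.0, 3, 6) stops at a
   "BACKOFF_CNT" of zero, i.e., the RTO never drops below "RTO_BASE".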

631	5.4.  Probing Frequency

633	   One might argue that if an ICMP unreachable message arrives for a
634	   timeout-based retransmission, the RTO shall be reset or recalculated,
635	   similar to what is done when an ACK arrives during timeout-based loss
636	   recovery (see Karn's algorithm [KP87], [RFC2988]), and a new
637	   retransmission should be sent immediately.  Generally, this would
638	   result in a much higher probing frequency based on the round trip
639	   time to the router where connectivity has been disrupted.  However,
640	   we believe the current scheme provides a good trade-off between
641	   conservative behavior and fast detection of connectivity re-
642	   establishment.  TCP-LCD focuses on long-connectivity disruptions,
643	   i.e., on disruptions that last for several RTOs.  Thus, a much higher
644	   probing frequency (less than once per RTO) would not significantly
645	   increase the available transmission time compared to the duration of
646	   the connectivity disruption.
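
   The resulting probing schedule can be compared with standard
   exponential backoff in a small Python sketch.  It models an
   idealized case in which every retransmission elicits an ICMP
   unreachable message that arrives before the retransmission timer
   expires, so that each backoff is undone again.  The function name,
   its parameters, and the 60-second MAX_RTO are assumptions made for
   illustration.

```python
def retransmission_times(rto_base, count, max_rto=60.0, undo_each=False):
    """Send times, in seconds after the retransmission timer was first
    started, of `count` timeout-based retransmissions.

    undo_each=False: standard exponential backoff.
    undo_each=True:  idealized TCP-LCD, where an ICMP unreachable
                     message undoes every backoff before the timer fires.
    """
    times = []
    now = 0.0
    backoff_cnt = 0
    for _ in range(count):
        now += min(rto_base * 2 ** backoff_cnt, max_rto)
        times.append(now)
        if not undo_each:
            backoff_cnt += 1  # the backoff stays in effect
        # with undo_each, backoff and undo cancel out: count stays 0
    return times
```

   With an "RTO_BASE" of one second, five retransmissions are sent after
   1, 3, 7, 15, and 31 seconds under standard backoff, and after 1, 2,
   3, 4, and 5 seconds in the idealized TCP-LCD case, i.e., about once
   per "RTO_BASE" and never more often.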

648	5.5.  Reaction during Connection Establishment

650	   It is possible that a TCP sender enters timeout-based loss recovery
651	   while the connection is in SYN-SENT or SYN-RECEIVED states [RFC0793].
652	   The algorithm described in this document could also be used for
653	   faster connection establishment in networks with connectivity
654	   disruptions.  However, because existing TCP implementations [RFC5461]
655	   already interpret ICMP unreachable messages during connection
656	   establishment and abort the corresponding connection, we refrain from
657	   suggesting this.

659	5.6.  Reaction in Steady-State

661	   Another exploitation of ICMP unreachable messages in the context of
662	   TCP congestion control might seem appropriate, while TCP is in
663	   steady-state.  As the RTT up to the router that generated the ICMP
664	   unreachable message is likely to be substantially shorter than the
665	   overall RTT to the destination, the ICMP unreachable message may very
666	   well reach the originating TCP while it is transmitting the current
667	   window of data.  In case the remaining window is large, it might seem
668	   appropriate to refrain from transmitting the remaining window as
669	   there is timely evidence that it will only trigger further ICMP
670	   unreachable messages at the very router.  Although this promises
671	   improvement from a wastage perspective, it may be counterproductive
672	   from a security perspective.  An attacker could forge such ICMP
673	   messages, thereby forcing the originating TCP to stop sending data,
674	   very similar to the blind throughput-reduction attack mentioned in
675	   [RFC5927].

677	   An additional consideration is the following: in the presence of
678	   multi-path routing, even the receipt of a legitimate ICMP unreachable
679	   message cannot be exploited accurately, because there is the
680	   possibility that only one of the multiple paths to the destination is
681	   suffering from a connectivity disruption, which causes ICMP
682	   unreachable messages to be sent.  Then, however, there is the
683	   possibility that the path along which the connectivity disruption
684	   occurred contributed considerably to the overall bandwidth, such that
685	   a congestion response may well be reasonable.  However, this is not
686	   necessarily the case.  Therefore, a TCP has no means except for its
687	   inherent congestion control to decide on this matter.  All in all, it
688	   seems that for a connection in steady-state, i.e., not in timeout-
689	   based loss recovery, reacting to ICMP unreachable messages in regard
690	   to congestion control is not appropriate.  For the case of timeout-
691	   based retransmissions, however, there is a reasonable congestion
692	   response, which is skipping further retransmission timer backoffs
693	   because there is no congestion indication - as described above.

695	6.  Dissolving Ambiguity Issues using the TCP Timestamps Option

697	   If the TCP Timestamps option [RFC1323] is enabled for a connection, a
698	   TCP sender SHOULD use the following algorithm to dissolve the
699	   ambiguity issues mentioned in Sections 5.1, 5.2, and 5.3.  In
700	   particular, both the retransmission ambiguity and the packet
701	   duplication problems are prevented by the following TCP-LCD variant.
702	   On the other hand, the false positives caused by wrapped sequence
703	   numbers cannot be completely avoided, but the likelihood is further
704	   reduced by a factor of 2^32 since the Timestamp Value field (TSval)
705	   of the TCP Timestamps Option contains 32 bits.

707	   Hence, implementers may choose to implement TCP-LCD with the
708	   following modifications.

710	   Step (1) is replaced by step (1'):

712	   (1')  Before TCP updates the variable "RTO" when it initiates
713	         timeout-based loss recovery, set the variables "BACKOFF_CNT"
714	         and "RTO_BASE" and the data structure "RETRANS_TS" as follows:

716	            BACKOFF_CNT := 0;
717	            RTO_BASE := RTO;
718	            RETRANS_TS := [].

720	         Proceed to step (R).

722	   Step (2) is extended by step (2b):

724	   (2b)  Store the value of the Timestamp Value field (TSval) of the TCP
725	         Timestamps option included in the retransmission "RET" sent in
726	         step (R) into the "RETRANS_TS" data structure:

728	            RETRANS_TS.add(RET.TSval)

730	   Step (6) is replaced by step (6'):

732	   (6')  If "SEG.SEQ == SND.UNA && RETRANS_TS.exists(SEG.TSval)", i.e.,
733	         if the TCP segment "SEG" eliciting the ICMP unreachable message
734	         "ICMP_DU" contains the sequence number of a retransmission, and
735	         the value in its Timestamp Value field (TSval) is valid, then

737	               proceed to step (7');

739	         else

741	               proceed to step (3).

743	   Step (7) is replaced by step (7'):

745	   (7')  Undo the last retransmission timer backoff:

747	               RETRANS_TS.remove(SEG.TSval);
748	               BACKOFF_CNT := BACKOFF_CNT - 1;
749	               RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

751	   The downside of this variant is twofold.  First, the
752	   modifications come at a cost: the TCP sender is required to store the
753	   timestamps of all retransmissions sent during one timeout-based loss
754	   recovery.  Second, this variant can only undo a retransmission timer
755	   backoff if the intermediate router experiencing the link outage
756	   implements [RFC1812] and chooses to include, beyond the first 64
757	   bits of the payload of the triggering datagram, as many bits as are
758	   needed to include the TCP Timestamps option in the ICMP unreachable
759	   message.
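
   The modified steps can be sketched in Python as follows.  This is a
   minimal illustration rather than a reference implementation: the
   class and method names, the use of a set for "RETRANS_TS", and the
   60-second MAX_RTO are assumptions, and the backoff in on_retransmit()
   is assumed to mirror the formula used in step (7').

```python
from dataclasses import dataclass, field

MAX_RTO = 60.0  # seconds; assumed upper bound for this sketch


@dataclass
class LcdTimestampState:
    """Per-connection state of the TCP-LCD Timestamps variant."""
    rto: float
    rto_base: float = 0.0
    backoff_cnt: int = 0
    # A set suffices as long as retransmissions carry distinct TSvals.
    retrans_ts: set = field(default_factory=set)

    def start_recovery(self):
        """Step (1'): initialise the state before RTO is updated."""
        self.backoff_cnt = 0
        self.rto_base = self.rto
        self.retrans_ts = set()

    def on_retransmit(self, ret_tsval):
        """Steps (2) and (2b): back off and remember RET.TSval."""
        self.backoff_cnt += 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, MAX_RTO)
        self.retrans_ts.add(ret_tsval)

    def on_icmp_unreachable(self, seg_seq, seg_tsval, snd_una):
        """Steps (6') and (7'): undo one backoff for a valid message.

        Returns True if a retransmission timer backoff was undone.
        """
        if seg_seq != snd_una or seg_tsval not in self.retrans_ts:
            return False  # ignore the message
        self.retrans_ts.remove(seg_tsval)
        self.backoff_cnt -= 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, MAX_RTO)
        return True
```

   Because a timestamp is removed once it has been matched, a
   duplicated ICMP unreachable message for the same retransmission is
   ignored, and a message elicited by the original transmission never
   matches, since only the timestamps of retransmissions are stored.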

761	7.  Interoperability Issues

763	   This section discusses interoperability issues related to introducing
764	   TCP-LCD.

766	7.1.  Detection of TCP Connection Failures

768	   TCP-LCD may have side-effects on TCP implementations that attempt to
769	   detect TCP connection failures by counting timeout-based
770	   retransmissions.  [RFC1122] states in Section 4.2.3.5 that a TCP host
771	   must handle excessive retransmissions of data segments with two
772	   thresholds R1 and R2 that measure the number of retransmissions that
773	   have occurred for the same segment.  Both thresholds might either be
774	   measured in time units or as a count of retransmissions.

776	   Due to TCP-LCD's reversion strategy of the retransmission timer, the
777	   assumption that a certain number of retransmissions corresponds to a
778	   specific time interval no longer holds, as additional retransmissions
779	   may be performed during timeout-based loss recovery to detect the end
780	   of the connectivity disruption.  Therefore, a TCP employing TCP-LCD
781	   either MUST measure the thresholds R1 and R2 in time units or, in
782	   case R1 and R2 are counters of retransmissions, MUST convert them
783	   into time intervals, which correspond to the time an unmodified TCP
784	   would need to reach the specified number of retransmissions.
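
   One possible reading of this conversion is sketched below in Python:
   the threshold is taken as the time that passes, under standard
   exponential backoff, until an unmodified TCP sends the given number
   of retransmissions of the same segment.  The function name, its
   parameters, and the 60-second MAX_RTO are assumptions made for this
   example.

```python
def threshold_as_time(retransmissions, rto_base, max_rto=60.0):
    """Seconds an unmodified TCP needs, counted from the moment the
    retransmission timer is first started, to send `retransmissions`
    timeout-based retransmissions of the same segment."""
    return sum(min(rto_base * 2 ** i, max_rto)
               for i in range(retransmissions))
```

   For instance, a threshold of three retransmissions with an
   "RTO_BASE" of one second corresponds to threshold_as_time(3, 1.0),
   i.e., 7 seconds.  A TCP employing TCP-LCD would then compare the
   time elapsed in timeout-based loss recovery against this interval
   instead of counting retransmissions.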

786	7.2.  Explicit Congestion Notification (ECN)

788	   With Explicit Congestion Notification (ECN) [RFC3168], ECN-capable
789	   routers are no longer limited to dropping packets to indicate
790	   congestion.  Instead, they can set the Congestion Experienced (CE)
791	   codepoint in the IP header to indicate congestion.  With TCP-LCD, it
792	   may happen that during a connectivity disruption, a received ICMP
793	   unreachable message has been elicited by a timeout-based
794	   retransmission that was marked with the CE codepoint before reaching
795	   the router experiencing the link outage.  In such a case, a TCP
796	   sender MUST, corresponding to [RFC3168] (Section 6.1.2), additionally
797	   reset the retransmission timer in case the algorithm undoes a
798	   retransmission timer backoff.

800	7.3.  TCP-LCD and IP Tunnels

802	   It is worth noting that IP tunnels, including IPsec [RFC4301], IP in
803	   IP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], and
804	   others are compatible with TCP-LCD, as long as the received ICMP
805	   unreachable messages can be demultiplexed and extracted appropriately
806	   by the TCP sender during timeout-based loss recovery.

808	   If, for example, end-to-end tunnels like IPsec in transport mode
809	   [RFC4301] are employed, a TCP sender may receive ICMP unreachable
810	   messages where additional steps, e.g., decrypting in step (5) of the
811	   algorithm, are needed to extract the TCP header from these ICMP
812	   messages.  Provided that the received ICMP unreachable message
813	   contains enough information, i.e., SEG.SEQ is extractable, this
814	   information can still be used as a valid input for the proposed
815	   algorithm.

817	   Likewise, if IP encapsulation like [RFC2003] is used in some part of
818	   the path between the communicating hosts, the tunnel ingress node may
819	   receive the ICMP unreachable messages from an intermediate router
820	   experiencing the link outage.  Nevertheless, the tunnel ingress node
821	   may replay the ICMP unreachable messages in order to inform the TCP
822	   sender.  If enough information is preserved to extract SEG.SEQ, the
823	   replayed ICMP unreachable messages can still be used in TCP-LCD.

825	8.  Related Work

827	   Several methods that address TCP's problems in the presence of
828	   connectivity disruptions have been proposed in literature.  Some of
829	   them try to improve TCP's performance by modifying lower layers.  For
830	   example, [SM03] introduces a "smart link layer", which buffers one
831	   segment for each active connection and replays these segments upon
832	   connectivity re-establishment.  This approach has a serious drawback:
833	   previously stateless intermediate routers have to be modified in
834	   order to inspect TCP headers, to track the end-to-end connection, and
835	   to provide additional buffer space.  This leads to an additional need
836	   for memory and processing power.

838	   On the other hand, stateless link layer schemes, as proposed in
839	   [RFC3819], which unconditionally buffer some small number of packets
840	   may have another problem: if a packet is buffered longer than the
841	   maximum segment lifetime (MSL) of 2 min [RFC0793], i.e., the
842	   disconnection lasts longer than MSL, TCP's assumption that such
843	   segments will never be received will no longer be true, violating
844	   TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].

846	   Other approaches, like TCP-F [CRVP01] or the Explicit Link Failure
847	   Notification (ELFN) [HV02] inform a TCP sender about a disrupted path
848	   by special messages generated and sent from intermediate routers.  In
849	   the case of a link failure, the TCP sender stops sending segments and
850	   freezes its retransmission timers.  TCP-F stays in this state and
851	   remains silent until either a "route establishment notification" is
852	   received or an internal timer expires.  In contrast, ELFN
853	   periodically probes the network to detect connectivity re-
854	   establishment.  Both proposals rely on changes to intermediate
855	   routers, whereas the scheme proposed in this document is a sender-
856	   only modification.  Moreover, ELFN does not consider congestion and
857	   may impose serious additional load on the network, depending on the
858	   probe interval.

860	   The authors of ATCP [LS01] propose enhancements to identify different
861	   types of packet loss by introducing a layer between TCP and IP.  They
862	   utilize ICMP destination unreachable messages to set TCP's receiver
863	   advertised window to zero, thus forcing the TCP sender to perform
864	   zero window probing with an exponential backoff.  ICMP destination
865	   unreachable messages that arrive during this probing period are
866	   ignored.  This approach is nearly orthogonal to this document, which
867	   exploits ICMP messages to undo a retransmission timer backoff when
868	   TCP is already probing.  In principle, both mechanisms could be
869	   combined.  However, due to security considerations, it does not seem
870	   appropriate to adopt ATCP's reaction, as discussed in Section 5.6.

872	   Schuetz et al.  [I-D.schuetz-tcpm-tcp-rlci] describe a set of TCP
873	   extensions that improve TCP's behavior when transmitting over paths
874	   whose characteristics can change rapidly.  Their proposed extensions
875	   modify the local behavior of TCP and introduce a new TCP option to
876	   signal locally received connectivity-change indications (CCIs) to
877	   remote peers.  Upon receipt of a CCI, they re-probe the path
878	   characteristics either by performing a speculative retransmission or
879	   by sending a single segment of new data, depending on whether the
880	   connection is currently stalled in exponential backoff or
881	   transmitting in steady-state, respectively.  The authors focus on
882	   specifying TCP response mechanisms; nevertheless, underlying layers
883	   would have to be modified to explicitly send CCIs to make these
884	   immediate responses possible.

886	9.  IANA Considerations

888	   This memo includes no request to IANA.

890	10.  Security Considerations

892	   Generally, an attacker has only two attack alternatives: to generate
893	   ICMP unreachable messages to try to make a TCP modified with TCP-LCD
894	   flood the network, or to suppress legitimate ICMP unreachable
895	   messages to try to slow down the transmission rate of a TCP sender.

897	   In order to generate ICMP unreachable messages that fit as an input
898	   for TCP-LCD, an attacker would need to guess the correct four-tuple
899	   (i.e., Source IP Address, Source TCP port, Destination IP Address,
900	   and Destination TCP port) and the exact segment sequence number of
901	   the current timeout-based retransmission.  Yet, the correct sequence
902	   number is generally hard to guess, with a probability of 1/2^32.
903	   Even if an attacker has information about that sequence number (i.e.,
904	   the attacker can eavesdrop on the retransmissions), the attacker's
905	   impact on the network load may be considered low, since the
906	   retransmission frequency is limited by the RTO that was computed
907	   before TCP had entered the timeout-based loss recovery.  Hence, the
908	   highest probing frequency is expected to be even lower than once per
909	   minimum RTO, i.e., 1s as specified by [RFC2988].  It is important to
910	   note that an attacker who can correctly guess the four-tuple and
911	   the segment sequence number can easily launch more serious attacks
912	   (e.g., hijack the connection), whether or not TCP-LCD is used.

914	   There may be means by which an attacker can cause the suppression of
915	   legitimate ICMP unreachable messages (e.g., by flooding the router
916	   experiencing the link outage to trigger ICMP rate-limiting).
917	   However, even if the attacker could suppress every legitimate ICMP
918	   unreachable message, the security impact of such an attack is
919	   negligible, since the TCP sender using TCP-LCD will behave like a
920	   regular TCP would.  Note that this kind of attack is
921	   indistinguishable from a router that experiences a link outage but
922	   sends no ICMP unreachable messages at all (e.g., because of local
923	   policy).

925	   In summary, the algorithm proposed in this document is considered to
926	   be secure.

928	11.  Acknowledgments

930	   We would like to thank Lars Eggert, Adrian Farrel, Mark Handley, Kai
931	   Jakobs, Ilpo Jarvinen, Enrico Marocco, Catherine Meadows, Juergen
932	   Quittek, Pasi Sarolahti, Tim Shepard, Joe Touch and Carsten Wolff for
933	   feedback on earlier versions of this document.  We also thank Michael
934	   Faber, Daniel Schaffrath, and Damian Lukowski for implementing and
935	   testing the algorithm in Linux.  Special thanks go to Ilpo Jarvinen
936	   for giving valuable feedback regarding the Linux implementation.

938	   This work has been supported by the German National Science
939	   Foundation (DFG) within the research excellence cluster Ultra High-
940	   Speed Mobile Information and Communication (UMIC), RWTH Aachen
941	   University.

943	12.  References

945	12.1.  Normative References

947	   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
948	              RFC 792, September 1981.

950	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
951	              RFC 793, September 1981.

953	   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
954	              for High Performance", RFC 1323, May 1992.

956	   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
957	              RFC 1812, June 1995.

959	   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
960	              Timer", RFC 2988, November 2000.

962	   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
963	              Message Protocol (ICMPv6) for the Internet Protocol
964	              Version 6 (IPv6) Specification", RFC 4443, March 2006.

966	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
967	              Control", RFC 5681, September 2009.

969	12.2.  Informative References

971	   [CRVP01]   Chandran, K., Raghunathan, S., Venkatesan, S., and R.
972	              Prakash, "A feedback-based scheme for improving TCP
973	              performance in ad hoc wireless networks", IEEE Personal
974	              Communications vol. 8, no. 1, pp. 34-39, February 2001.

976	   [HV02]     Holland, G. and N. Vaidya, "Analysis of TCP performance
977	              over mobile ad hoc networks", Wireless Networks vol. 8,
978	              no. 2-3, pp. 275-288, March 2002.

980	   [I-D.eggert-tcpm-tcp-retransmit-now]
981	              Eggert, L., "TCP Extensions for Immediate
982	              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
983	              (work in progress), June 2005.

985	   [I-D.schuetz-tcpm-tcp-rlci]
986	              Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami,
987	              Y., and K. Le, "TCP Response to Lower-Layer Connectivity-
988	              Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work
989	              in progress), February 2008.

991	   [KP87]     Karn, P. and C. Partridge, "Improving Round-Trip Time
992	              Estimates in Reliable Transport Protocols", Proceedings of
993	              the Conference on Applications, Technologies,
994	              Architectures, and Protocols for Computer Communication
995	              (SIGCOMM'87) pp. 2-7, August 1987.

997	   [LS01]     Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc
998	              networks", IEEE Journal on Selected Areas in
999	              Communications vol. 19, no. 7, pp. 1300-1315, July 2001.

1001	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
1002	              September 1981.

1004	   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
1005	              converting network protocol addresses to 48.bit Ethernet
1006	              address for transmission on Ethernet hardware", STD 37,
1007	              RFC 826, November 1982.

1009	   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
1010	              Communication Layers", STD 3, RFC 1122, October 1989.

1012	   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
1013	              October 1996.

1015	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1016	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1018	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1019	              (IPv6) Specification", RFC 2460, December 1998.

1021	   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
1022	              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
1023	              March 2000.

1025	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
1026	              of Explicit Congestion Notification (ECN) to IP",
1027	              RFC 3168, September 2001.

1029	   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
1030	              for TCP", RFC 3522, April 2003.

1032	   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
1033	              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
1034	              April 2004.

1036	   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
1037	              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
1038	              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
1039	              RFC 3819, July 2004.

1041	   [RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
1042	              for TCP", RFC 4015, February 2005.

1044	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1045	              Internet Protocol", RFC 4301, December 2005.

1047	   [RFC5461]  Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
1048	              February 2009.

1050	   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
1051	              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
1052	              Spurious Retransmission Timeouts with TCP", RFC 5682,
1053	              September 2009.

1055	   [RFC5927]  Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

1057	   [SESB05]   Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
1058	              "Protocol enhancements for intermittently connected
1059	              hosts", SIGCOMM Computer Communication Review vol. 35, no.
1060	              3, pp. 5-18, December 2005.

1062	   [SM03]     Scott, J. and G. Mapp, "Link layer-based TCP optimisation
1063	              for disconnecting networks", SIGCOMM Computer
1064	              Communication Review vol. 33, no. 5, pp. 31-42,
1065	              October 2003.

1067	   [Zh86]     Zhang, L., "Why TCP Timers Don't Work Well", Proceedings
1068	              of the Conference on Applications, Technologies,
1069	              Architectures, and Protocols for Computer Communication
1070	              (SIGCOMM'86) pp. 397-405, August 1986.

1072	   [ZimHan09]
1073	              Zimmermann, A., "Make TCP more Robust to Long Connectivity
1074	              Disruptions", Proceedings of the 75th IETF Meeting slides,
1075	              July 2009,
1076	              <http://www.ietf.org/proceedings/75/slides/tcpm-0.pdf>.

1078	Appendix A.  Changes from previous versions of the draft

1080	   This appendix should be removed by the RFC Editor before publishing
1081	   this document as an RFC.

1083	A.1.  Changes from draft-ietf-tcpm-tcp-lcd-02

1085	   o  Incorporated feedback submitted by Enrico Marocco (Gen-ART Review)

1087	   o  Incorporated feedback submitted by Juergen Quittek (OpsDir Review)

1089	   o  Incorporated feedback submitted by Catherine Meadows (SecDir
1090	      Review)

1092	   o  Incorporated feedback submitted by Adrian Farrel (IESG Review)

1094	A.2.  Changes from draft-ietf-tcpm-tcp-lcd-01

1096	   o  Incorporated feedback submitted by Lars Eggert (AD Review)

1098	A.3.  Changes from draft-ietf-tcpm-tcp-lcd-00

1100	   o  Editorial changes.

1102	   o  Clarified TCP-LCD's behaviour during connection establishment
1103	      (Thanks to Mark Handley).

1105	A.4.  Changes from draft-zimmermann-tcp-lcd-02

1107	   o  Incorporated feedback submitted by Ilpo Jarvinen.
1108	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04841.html>

1110	   o  Incorporated feedback submitted by Pasi Sarolahti.
1111	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04870.html>

1113	   o  Incorporated feedback submitted by Joe Touch.
1114	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04895.html>
1115	      <http://www.ietf.org/mail-archive/web/tcpm/current/msg04900.html>

1117	   o  Extended and reorganized the discussion (Section 5):

1119	      *  Every discussion item got its own title, so that we have a
1120	         better overview.

1122	      *  Extended Retransmission Ambiguity section.  Added also some
1123	         references to the historical retransmission ambiguity problem.

1125	      *  Heavily extended discussion about wrapped sequence numbers (see
1126	         Joe's comments).

1128	      *  Described the influence of packet duplication on the algorithm
1129	         (Thanks to Ilpo).

1131	      *  The section "Protecting Against Misbehaving Routers" is not a
1132	         subsection anymore.  Moreover, the section was renamed to
1133	         "Dissolving Ambiguity Issues" and has now real content.

1135	   o  An interoperability issues section (Section 7) was added.  In
1136	      particular comments to ECN, ICMPv6, and to the two thresholds R1
1137	      and R2 of [RFC1122] (Section 4.2.3.5) were added.

1139	   o  Miscellaneous editorial changes.  In particular, the algorithm has
1140	      a name now: TCP-LCD.

1142	A.5.  Changes from draft-zimmermann-tcp-lcd-01

1144	   o  The algorithm in Section 4.2 was slightly changed.  Instead of
1145	      reverting the last retransmission timer backoff by halving the
1146	      RTO, the RTO is recalculated with help of the "BACKOFF_CNT"
1147	      variable.  This fixes an issue that occurred when the
1148	      retransmission timer was backed off but bounded by a maximum
1149	      value.  The algorithm in the previous version of the draft would
1150	      have "reverted" to half of that maximum value, instead of using
1151	      the value from before the RTO was doubled (and then bounded).

1153	   o  Miscellaneous editorial changes.

1155	A.6.  Changes from draft-zimmermann-tcp-lcd-00

1157	   o  Miscellaneous editorial changes in Sections 1, 2, and 3.

1159	   o  The document was restructured in Sections 1, 2, and 3 for easier
1160	      reading.  The motivation for the algorithm is changed according to
1161	      TCP's problem to disambiguate congestion from non-congestion loss.

1163	   o  Added Section 4.1.

1165	   o  The algorithm in Section 4.2 was restructured and simplified:

1167	      *  The special case of the first received ICMP destination
1168	         unreachable message after an RTO was removed.

1170	      *  The "BACKOFF_CNT" variable was introduced so it is no longer
1171	         possible to perform more reverts than backoffs.

1173	   o  The discussion in Section 5 was improved and expanded according to
1174	      the algorithm changes.

1176	Authors' Addresses

1178	   Alexander Zimmermann
1179	   RWTH Aachen University
1180	   Ahornstrasse 55
1181	   Aachen,   52074
1182	   Germany

1184	   Phone: +49 241 80 21422
1185	   Email: zimmermann@cs.rwth-aachen.de

1186	   Arnd Hannemann
1187	   RWTH Aachen University
1188	   Ahornstrasse 55
1189	   Aachen,   52074
1190	   Germany

1192	   Phone: +49 241 80 21423
1193	   Email: hannemann@nets.rwth-aachen.de