TCP Maintenance and Minor                                  A. Zimmermann
Extensions (TCPM) WG                                        A. Hannemann
Internet-Draft                                    RWTH Aachen University
Intended status: Experimental                             March 30, 2010
Expires: October 1, 2010

    Making TCP more Robust to Long Connectivity Disruptions (TCP-LCD)
                       draft-ietf-tcpm-tcp-lcd-01

Abstract

   Disruptions in end-to-end path connectivity, which last longer than
   one retransmission timeout, cause suboptimal TCP performance. The
   reason for this performance degradation is that TCP interprets
   segment loss induced by long connectivity disruptions as a sign of
   congestion, resulting in repeated retransmission timer backoffs.
   This, in turn, leads to a delayed detection of the re-establishment
   of the connection since TCP waits for the next retransmission timeout
   before it attempts a retransmission.

   This document proposes an algorithm to make TCP more robust to long
   connectivity disruptions (TCP-LCD). It describes how standard ICMP
   messages can be exploited during timeout-based loss recovery to
   disambiguate true congestion loss from non-congestion loss caused by
   connectivity disruptions. Moreover, a revert strategy of the
   retransmission timer is specified that enables a more prompt
   detection of whether or not the connectivity to a previously
   disconnected peer node has been restored. TCP-LCD is a TCP sender-
   only modification that effectively improves TCP performance in case
   of connectivity disruptions.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 1, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
   3.  Connectivity Disruption Indication
   4.  Connectivity Disruption Reaction
     4.1.  Basic Idea
     4.2.  Algorithm Details
   5.  Discussion of TCP-LCD
     5.1.  Retransmission Ambiguity
     5.2.  Wrapped Sequence Numbers
     5.3.  Packet Duplication
     5.4.  Probing Frequency
     5.5.  Reaction during Connection Establishment
     5.6.  Reaction in Steady-State
   6.  Dissolving Ambiguity Issues (the Safe Variant)
   7.  Interoperability Issues
     7.1.  Detection of TCP Connection Failures
     7.2.  Explicit Congestion Notification
     7.3.  ICMP for IP version 6
     7.4.  TCP-LCD and IP Tunnels
   8.  Related Work
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Changes from previous versions of the draft
     A.1.  Changes from draft-ietf-tcpm-tcp-lcd-00
     A.2.  Changes from draft-zimmermann-tcp-lcd-02
     A.3.  Changes from draft-zimmermann-tcp-lcd-01
     A.4.  Changes from draft-zimmermann-tcp-lcd-00
   Authors' Addresses

1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The reader should be familiar with the algorithm and terminology from
   [RFC2988], which defines the standard algorithm Transmission Control
   Protocol (TCP) senders are required to use to compute and manage
   their retransmission timer. In this document the terms
   "retransmission timer" and "retransmission timeout" are used as
   defined in [RFC2988].
   The retransmission timer ensures data delivery
   in the absence of any feedback from the receiver. The duration of
   this timer is referred to as retransmission timeout (RTO).

   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
   refers to a TCP segment that acknowledges previously unacknowledged
   data. The TCP sender state variable "SND.UNA" and the current
   segment variable "SEG.SEQ" are used as defined in [RFC0793]. SND.UNA
   holds the segment sequence number of the earliest segment that has
   not been acknowledged by the TCP receiver (the oldest outstanding
   segment). SEG.SEQ is the segment sequence number of a given segment.

   For the purposes of this specification, we define the term
   "timeout-based loss recovery" as the state that a TCP sender enters
   upon the first timeout of the oldest outstanding segment (SND.UNA)
   and leaves upon the arrival of the *first* acceptable ACK. It is
   important to note that other documents use a different
   interpretation of the term "timeout-based loss recovery". For
   example, the NewReno modification to TCP's Fast Recovery algorithm
   [RFC3782] extends the period a TCP sender remains in timeout-based
   loss recovery compared to the one defined in this document. This is
   because [RFC3782] attempts to avoid unnecessary multiple Fast
   Retransmits that can occur after an RTO.

2. Introduction

   Connectivity disruptions can occur in many different situations. The
   frequency of connectivity disruptions depends on the properties of
   the end-to-end path between the communicating hosts. While
   connectivity disruptions can occur in traditional wired networks
   too, e.g., caused by an unplugged network cable, the likelihood of
   occurrence is significantly higher in wireless (multi-hop) networks.
   In particular, end-host mobility, network topology changes, and
   wireless interference are crucial factors.
   In the case of the Transmission
   Control Protocol (TCP) [RFC0793], the performance of the connection
   can experience a significant reduction compared to a permanently
   connected path [SESB05]. This is because TCP, which was originally
   designed to operate in fixed and wired networks, generally assumes
   that the end-to-end path connectivity is relatively stable over the
   connection's lifetime.

   Depending on their duration, connectivity disruptions can be
   classified into two groups [I-D.schuetz-tcpm-tcp-rlci]: "short" and
   "long". A connectivity disruption is "short" if connectivity returns
   before the retransmission timer fires for the first time. In this
   case, TCP recovers lost data segments through Fast Retransmit and
   lost acknowledgments (ACK) through successfully delivered later ACKs.
   Connectivity disruptions are declared as "long" for a given TCP
   connection if the retransmission timer fires at least once before
   connectivity is resumed. Whether or not path characteristics, like
   the round trip time (RTT) or the available bandwidth, have changed
   when connectivity resumes after a disruption is another important
   aspect for TCP's retransmission scheme [I-D.schuetz-tcpm-tcp-rlci].

   This document improves TCP's behavior in case of "long connectivity
   disruptions". In particular, it focuses on the period "prior" to the
   re-establishment of the connectivity to a previously disconnected
   peer node. The document does not describe any modifications of TCP's
   behavior and its congestion control mechanisms [RFC5681] "after"
   connectivity has been restored.

   When a long connectivity disruption occurs on a TCP connection, the
   TCP sender eventually stops receiving acknowledgments. After the
   retransmission timer expires, the TCP sender enters the
   timeout-based loss recovery and declares the oldest outstanding
   segment (SND.UNA) as lost.
   Since TCP tightly couples reliability and
   congestion control, the retransmission of SND.UNA is triggered
   together with the reduction of the transmission rate. This is based
   on the assumption that segment loss is an indication of congestion
   [RFC5681]. As long as the connectivity disruption persists, TCP will
   repeat this procedure until the oldest outstanding segment has
   successfully been acknowledged, or until the connection has timed
   out. TCP implementations that follow the recommended retransmission
   timeout (RTO) management of RFC 2988 [RFC2988] double the RTO after
   each retransmission attempt. However, the RTO's growth may be
   bounded by an upper limit, the maximum RTO, which is at least 60s,
   but may be longer: Linux, for example, uses 120s. If connectivity is
   restored between two retransmission attempts, TCP still has to wait
   until the retransmission timer expires before resuming transmission,
   since it simply does not have any means to know if the connectivity
   has been re-established. Therefore, depending on when connectivity
   becomes available again, this can waste up to a maximum RTO of
   possible transmission time.

   This retransmission behavior is not efficient, especially in
   scenarios with long connectivity disruptions. In the ideal case, TCP
   would attempt a retransmission as soon as connectivity to its peer
   has been re-established. In this document, we specify a TCP sender-
   only modification to provide robustness to long connectivity
   disruptions (TCP-LCD). The memo describes how the standard Internet
   Control Message Protocol (ICMP) can be exploited during timeout-based
   loss recovery to identify non-congestion loss caused by long
   connectivity disruptions.
   TCP-LCD's revert strategy of the
   retransmission timer enables higher-frequency retransmissions and
   thereby a prompt detection when connectivity to a previously
   disconnected peer node has been restored. If no congestion is
   present, TCP-LCD approaches the ideal behavior.

3. Connectivity Disruption Indication

   If the queue of an intermediate router experiencing a link outage can
   buffer all incoming packets, a connectivity disruption will only
   cause a variation in delay, which is handled well by TCP
   implementations using either Eifel [RFC3522], [RFC4015] or Forward
   RTO-Recovery (F-RTO) [RFC5682]. However, if the link outage lasts
   for too long, the router experiencing the link outage is forced to
   drop packets, and finally to discard the corresponding route. Means
   to detect such link outages include reacting to failed address
   resolution protocol (ARP) [RFC0826] queries, unsuccessful link
   sensing, and the like. However, this is solely the responsibility
   of the respective router.

      Note: The focus of this memo is on introducing a method by which
      ICMP messages may be exploited to improve TCP's performance; how
      different physical and link layer mechanisms below the network
      layer may trigger ICMP destination unreachable messages is out
      of scope of this memo.

   Provided that no other route to the specific destination exists, the
   router will notify the corresponding sending host about the dropped
   packets via ICMP destination unreachable messages of code 0 (net
   unreachable) or code 1 (host unreachable) [RFC1812]. Therefore, the
   sending host can use the ICMP destination unreachable messages of
   these codes as an indication for a connectivity disruption, since the
   reception of these messages provides evidence that packets were
   dropped due to a link outage.

   Note that there are also other ICMP destination unreachable messages
   with different codes.
   Some of them are candidates for connectivity
   disruption indications, too, but need further investigation.
   Examples are ICMP destination unreachable messages with code 5
   (source route failed), code 11 (net unreachable for TOS), or code 12
   (host unreachable for TOS) [RFC1812]. On the other hand, codes that
   flag hard errors are of no use for the proposed scheme, since TCP
   should abort the connection when those are received [RFC1122]. In
   the following, the term "ICMP unreachable message" is used as a
   synonym for ICMP destination unreachable messages of code 0 or
   code 1.

   The accurate interpretation of ICMP unreachable messages as a
   connectivity disruption indication is complicated by the following
   two peculiarities of ICMP messages. Firstly, they do not necessarily
   operate on the same timescale as the packets, i.e., TCP segments,
   that elicited them. When a router drops a packet due to a missing
   route, it will not necessarily send an ICMP unreachable message
   immediately, but will rather queue it for later delivery. Secondly,
   ICMP messages are subject to rate limiting, e.g., when a router
   drops a whole window of data due to a link outage, it is unlikely to
   send as many ICMP unreachable messages as it dropped TCP segments.
   Depending on the load of the router, it may even send no ICMP
   unreachable messages at all. Both peculiarities originate from
   [RFC1812].

   Fortunately, according to [RFC0792], ICMP unreachable messages have
   to contain in their body the entire Internet Protocol (IP) header
   [RFC0791] of the datagram eliciting the ICMP unreachable message,
   plus the first 64 bits of the payload of that datagram. This allows
   the sending host to match the ICMP error message to the transport
   connection that elicited it.
   RFC 1812 [RFC1812] augments the requirements and
   states that ICMP messages should contain as much of the original
   datagram as possible without the length of the ICMP datagram
   exceeding 576 bytes. Therefore, in case of TCP, at least the source
   port number, the destination port number, and the 32-bit TCP sequence
   number are included. This allows the originating TCP to demultiplex
   the received ICMP message and to identify the faulty connection.
   Moreover, it can identify which segment of the respective connection
   triggered the ICMP unreachable message, unless there are several
   segments in-flight with the same sequence number (see Section 5.1).

   A connectivity disruption indication in the form of an ICMP
   unreachable message associated with a presumably lost TCP segment
   provides strong evidence that the segment was not dropped due to
   congestion, but was successfully delivered to the temporary
   end-point of the employed path, i.e., the reporting router. It
   therefore did not witness any congestion at least on that part of
   the path that was traversed by both the TCP segment eliciting the
   ICMP unreachable message and the ICMP unreachable message itself.

4. Connectivity Disruption Reaction

   Section 4.1 introduces the basic idea of TCP-LCD. The complete
   algorithm is specified in Section 4.2.

4.1. Basic Idea

   The goal of the algorithm is to promptly detect when connectivity to
   a previously disconnected peer node has been restored after a long
   connectivity disruption, while retaining appropriate behavior in case
   of congestion. TCP-LCD exploits standard ICMP unreachable messages
   during timeout-based loss recovery. This increases TCP's
   retransmission frequency by undoing one retransmission timer backoff
   whenever an ICMP unreachable message reports on the sequence number
   of a presumably lost retransmission.
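
   As an illustration of how a sender might obtain the sequence number
   that an ICMP unreachable message reports on (see Section 3), the
   following Python sketch parses the quoted datagram. It is not part
   of the specification. The names "parse_icmp_unreachable" and
   "QuotedSegment" are hypothetical, the input is assumed to be a raw
   ICMP message over IPv4 starting at its type field, and checksum
   validation as well as the lookup of the matching connection are
   omitted.

```python
import struct
from typing import NamedTuple, Optional

ICMP_DEST_UNREACH = 3        # ICMP type "destination unreachable"
CODES_OF_INTEREST = {0, 1}   # code 0: net unreachable, code 1: host unreachable
IPPROTO_TCP = 6              # protocol number of TCP in the IP header


class QuotedSegment(NamedTuple):
    """Fields from the first 64 bits of the quoted TCP header (hypothetical type)."""
    src_port: int
    dst_port: int
    seq: int


def parse_icmp_unreachable(icmp: bytes) -> Optional[QuotedSegment]:
    """Return the quoted TCP ports and sequence number, or None if not usable.

    Assumes `icmp` starts at the ICMP type field and quotes an IPv4 datagram.
    """
    # 8 bytes ICMP header + minimal 20-byte IP header + 8 bytes of payload.
    if len(icmp) < 8 + 20 + 8:
        return None
    icmp_type, code = icmp[0], icmp[1]
    if icmp_type != ICMP_DEST_UNREACH or code not in CODES_OF_INTEREST:
        return None

    quoted_ip = icmp[8:]
    version = quoted_ip[0] >> 4
    ihl = (quoted_ip[0] & 0x0F) * 4      # IP header length in bytes
    if version != 4 or ihl < 20 or quoted_ip[9] != IPPROTO_TCP:
        return None
    if len(quoted_ip) < ihl + 8:
        return None

    # First 64 bits of the TCP header: source port, destination port, sequence number.
    src_port, dst_port, seq = struct.unpack_from("!HHI", quoted_ip, ihl)
    return QuotedSegment(src_port, dst_port, seq)
```

   A caller would use the ports (together with the addresses in the
   quoted IP header) to find the connection and then compare the
   returned "seq" field with that connection's SND.UNA.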

   This approach has the advantage of appropriately reducing the probing
   rate in case of congestion. If either the retransmission itself or
   the corresponding ICMP message is dropped, the previously performed
   retransmission timer backoff is not undone, which effectively halves
   the probing rate.

4.2. Algorithm Details

   A TCP sender using RFC 2988 [RFC2988] to compute TCP's retransmission
   timer MAY employ the following scheme to avoid over-conservative
   retransmission timer backoffs in case of long connectivity
   disruptions. If a TCP sender does implement the following steps, the
   algorithm MUST be initiated upon the first timeout of the oldest
   outstanding segment (SND.UNA) and MUST be stopped upon the arrival of
   the first acceptable ACK. The algorithm MUST NOT be re-initiated
   upon subsequent timeouts for the same segment. The scheme SHOULD NOT
   be used in SYN-SENT or SYN-RECEIVED states [RFC0793] (i.e., during
   connection establishment).

   A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's
   retransmission timer SHOULD NOT use TCP-LCD. We envision that the
   scheme could be easily adapted to algorithms other than RFC 2988.
   However, we leave this as future work.

   In rule (2.5) RFC 2988 [RFC2988] provides the option to place a
   maximum value on the RTO. When a TCP implements this rule to provide
   an upper bound for the RTO, it SHOULD also be used in the following
   algorithm. In particular, if the RTO is bounded by an upper limit
   (maximum RTO), the "MAX_RTO" variable used in this scheme SHOULD be
   initialized with this upper limit. Otherwise, if the RTO is
   unbounded, the "MAX_RTO" variable SHOULD be set to infinity.

   The scheme specified in this document uses the "BACKOFF_CNT"
   variable, whose initial value is zero. The variable is used to count
   the number of performed retransmission timer backoffs during one
   timeout-based loss recovery.
   Moreover, the "RTO_BASE" variable is
   used to recover the previous RTO if the retransmission timer backoff
   was unnecessary. The variable is initialized with the RTO upon
   initiation of timeout-based loss recovery.

   (1)  Before TCP updates the variable "RTO" when it initiates timeout-
        based loss recovery, set the variables "BACKOFF_CNT" and
        "RTO_BASE" as follows:

           BACKOFF_CNT := 0;
           RTO_BASE := RTO.

        Proceed to step (R).

   (R)  This is a placeholder for standard TCP's behavior in case the
        retransmission timer has expired. In particular, if RFC 2988
        [RFC2988] is used, steps (5.4) - (5.6) of that algorithm go
        here. Proceed to step (2).

   (2)  To account for the expiration of the retransmission timer in the
        previous step (R), increment the "BACKOFF_CNT" variable by one:

           BACKOFF_CNT := BACKOFF_CNT + 1.

   (3)  Wait either

           for the expiration of the retransmission timer. When the
           retransmission timer expires, proceed to step (R);

           or for the arrival of an acceptable ACK. When an acceptable
           ACK arrives, proceed to step (A);

           or for the arrival of an ICMP unreachable message. When the
           ICMP unreachable message "ICMP_DU" arrives, proceed to step
           (4).

   (4)  If "BACKOFF_CNT > 0", i.e., if at least one retransmission timer
        backoff can be undone, then

           proceed to step (5);

        else

           proceed to step (3).

   (5)  Extract the TCP segment header included in the ICMP unreachable
        message "ICMP_DU":

           SEG := Extract(ICMP_DU).

   (6)  If "SEG.SEQ == SND.UNA", i.e., if the TCP segment "SEG"
        eliciting the ICMP unreachable message "ICMP_DU" carries the
        sequence number of a retransmission, then

           proceed to step (7);

        else

           proceed to step (3).

   (7)  Undo the last retransmission timer backoff:

           BACKOFF_CNT := BACKOFF_CNT - 1;
           RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

   (8)  If the retransmission timer expires due to the undoing in the
        previous step (7), then

           proceed to step (R);

        else

           proceed to step (3).

   (A)  This is a placeholder for standard TCP's behavior in case an
        acceptable ACK has arrived. No further processing.

   When a TCP in steady-state detects a segment loss using the
   retransmission timer it enters the timeout-based loss recovery and
   initiates the algorithm (step 1). It adjusts the slow start
   threshold (ssthresh), sets the congestion window (CWND) to one
   segment, backs off the retransmission timer, and retransmits the
   first unacknowledged segment (step R) [RFC5681], [RFC2988]. To
   account for the expiration of the retransmission timer the TCP sender
   increments the "BACKOFF_CNT" variable by one (step 2).

   In case the retransmission timer expires again (step 3a) a TCP will
   repeat the retransmission of the first unacknowledged segment and
   back off the retransmission timer once more (step R) [RFC2988] as
   well as increment the "BACKOFF_CNT" variable by one (step 2). Note
   that a TCP may implement RFC 2988's [RFC2988] option to place a
   maximum value on the RTO that may result in not performing the
   retransmission timer backoff. However, step (2) MUST always and
   unconditionally be applied, no matter whether or not the
   retransmission timer is actually backed off. In other words, each
   time the retransmission timer expires, the "BACKOFF_CNT" variable
   MUST be incremented by one.

   If the first received packet after the retransmission(s) is an
   acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow
   start the connection and terminate the algorithm (step A). Later
   ICMP unreachable messages from the just terminated timeout-based loss
   recovery are ignored since the ACK clock is already restarting due to
   the successful retransmission.
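
   The bookkeeping of steps (1) through (7) and (A) can be sketched in
   Python as follows. This is a minimal illustration under stated
   assumptions, not a reference implementation: the class
   "TcpLcdState" and its method names are hypothetical, the RTO is
   kept in seconds, step (R) is reduced to the bounded doubling of the
   RTO, and the retransmission itself as well as the timer check of
   step (8) are left to the caller.

```python
import math
from dataclasses import dataclass


@dataclass
class TcpLcdState:
    """Hypothetical per-connection state for the TCP-LCD bookkeeping."""
    rto: float                   # current RTO in seconds
    max_rto: float = math.inf    # MAX_RTO; infinity if the RTO is unbounded
    backoff_cnt: int = 0         # BACKOFF_CNT
    rto_base: float = 0.0        # RTO_BASE
    active: bool = False         # True while in timeout-based loss recovery

    def on_rto_expired(self) -> None:
        """Steps (1), (R), and (2): the retransmission timer has expired."""
        if not self.active:
            # Step (1): only on the first timeout, before the RTO is updated.
            self.backoff_cnt = 0
            self.rto_base = self.rto
            self.active = True
        # Step (R), simplified: bounded exponential backoff of the RTO.
        self.rto = min(self.rto * 2, self.max_rto)
        # Step (2): always count the expiration, even if the RTO was capped.
        self.backoff_cnt += 1

    def on_icmp_unreachable(self, seg_seq: int, snd_una: int) -> bool:
        """Steps (4) to (7): return True if one backoff was undone."""
        if not self.active or self.backoff_cnt == 0:
            return False             # step (4): nothing to undo
        if seg_seq != snd_una:
            return False             # step (6): not a retransmission of SND.UNA
        # Step (7): undo the last retransmission timer backoff.
        self.backoff_cnt -= 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        return True

    def on_acceptable_ack(self) -> None:
        """Step (A): leave timeout-based loss recovery."""
        self.active = False
```

   With an initial RTO of 1 second and a "max_rto" of 60 seconds, two
   timer expirations raise the RTO to 4 seconds, and one matching ICMP
   unreachable message brings it back to 2 seconds. After an undo, the
   caller would compare the time elapsed since the last retransmission
   with the new RTO to decide between steps (R) and (3).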

   On the other hand, if the first received packet after the
   retransmission(s) is an ICMP unreachable message (step 3c), and if
   step (4) permits it, a TCP SHOULD undo one backoff for each ICMP
   unreachable message reporting an error on a retransmission. To
   decide if an ICMP unreachable message reports on a retransmission,
   the sequence number therein is exploited (step 5, step 6). The undo
   is performed by re-calculating the RTO with the decremented
   "BACKOFF_CNT" variable (step 7). This calculation explicitly matches
   the (bounded) exponential backoff specified in rule (5.5) of
   [RFC2988].

   Upon receipt of an ICMP unreachable message that legitimately undoes
   one backoff there is the possibility that the shortened
   retransmission timer has already expired (step 8). Then, a TCP
   SHOULD retransmit immediately, i.e., perform a retransmission
   clocked by the ICMP message. In case the shortened retransmission
   timer has not yet expired, TCP MUST wait accordingly.

5. Discussion of TCP-LCD

   TCP-LCD takes care to react only to connectivity disruption
   indications in the form of ICMP unreachable messages during
   timeout-based loss recovery. Therefore, TCP's behavior is not
   altered when either no ICMP unreachable messages are received, or
   the retransmission timer of the TCP sender did not expire since the
   last received acceptable ACK. Thus, by definition, the algorithm
   triggers only in case of long connectivity disruptions.

   Only such ICMP unreachable messages that report on the sequence
   number of a retransmission, i.e., report on SND.UNA, are evaluated by
   TCP-LCD. All other ICMP unreachable messages are ignored. The
   arrival of those ICMP unreachable messages provides strong evidence
   that the retransmissions were not dropped due to congestion but were
   successfully delivered to the temporary end-point of the employed
   path, i.e., the reporting router.
   In other words, there is no
   evidence for any congestion at least on that very part of the path
   that was traveled by both the TCP segment eliciting the ICMP
   unreachable message and the ICMP unreachable message itself.

   However, there are some situations where TCP-LCD makes a false
   decision and incorrectly undoes a retransmission timer backoff. This
   can happen even though the received ICMP unreachable message reports
   on the sequence number of a retransmission (SND.UNA), because the
   TCP segment that elicited the ICMP unreachable message may either
   not be a retransmission (Section 5.1) or not belong to the current
   timeout-based loss recovery (Section 5.2). Finally, packet
   duplication (Section 5.3) can also spuriously trigger the algorithm.

   Section 5.4 discusses possible probing frequencies, while Section 5.6
   describes the motivation for not reacting to ICMP unreachable
   messages while TCP is in steady-state.

5.1. Retransmission Ambiguity

   Historically, the retransmission ambiguity problem [Zh86], [KP87] is
   the TCP sender's inability to distinguish whether the first
   acceptable ACK after a retransmission refers to the original
   transmission or to the retransmission. This problem occurs after
   both a Fast Retransmit and a timeout-based retransmit. However,
   modern TCP implementations can eliminate the retransmission ambiguity
   with either the help of Eifel [RFC3522], [RFC4015] or Forward RTO-
   Recovery (F-RTO) [RFC5682].

   The revert strategy of the given algorithm suffers from a form of
   retransmission ambiguity, too. In contrast to the above case, TCP
   suffers from ambiguity regarding ICMP unreachable messages received
   during timeout-based loss recovery.
   With only the TCP sequence number
   included in the ICMP unreachable message, a TCP sender is not able to
   determine if the ICMP unreachable message refers to the original
   transmission or to any of the timeout-based retransmissions. That
   is, there is an ambiguity as to which TCP segment an ICMP
   unreachable message reports on.

   However, for the algorithm this ambiguity is not considered to be a
   problem. The assumption that a received ICMP message provides
   evidence that a non-congestion loss caused by the connectivity
   disruption was wrongly considered a congestion loss still holds,
   regardless of which TCP segment, transmission or retransmission, the
   message refers to.

5.2. Wrapped Sequence Numbers

   Besides the ambiguity whether a received ICMP unreachable message
   refers to the original transmission or to any of the retransmissions,
   there is another source of ambiguity about the TCP sequence numbers
   contained in ICMP unreachable messages. For high bandwidth paths
   like modern gigabit links the sequence space may wrap rather quickly,
   thereby allowing the possibility that delayed ICMP unreachable
   messages - a router dropping packets due to a link outage is not
   obliged to send ICMP unreachable messages in a timely manner
   [RFC1812] - may coincidentally fit as valid input in the proposed
   scheme. As a result, the scheme may incorrectly undo retransmission
   timer backoffs. Chances for this to happen are minuscule, since a
   particular ICMP message would need to contain the exact sequence
   number of the current oldest outstanding segment (SND.UNA), while at
   the same time TCP is in timeout-based loss recovery. However, two
   "worst case" scenarios for the algorithm are possible:

   For instance, consider a steady state TCP connection, which will be
   disrupted at an intermediate router R due to a link outage.
   Upon the
   expiration of the RTO, the TCP sender enters the timeout-based loss
   recovery and starts to retransmit the earliest segment that has not
   been acknowledged (SND.UNA). For some reason, router R delays all
   corresponding ICMP unreachable messages so that the TCP sender
   backs off the retransmission timer normally without any undoing. At
   the end of the connectivity disruption, the TCP sender eventually
   detects the re-establishment, leaves the scheme and finally the
   timeout-based loss recovery, too. A sequence number wrap-around
   later, the connectivity between the two peers is disrupted again, but
   this time due to congestion and exactly at the time at which the
   current SND.UNA matches the SND.UNA from the previous cycle. If
   router R emits the delayed ICMP unreachable messages now, the TCP
   sender would incorrectly undo retransmission timer backoffs. As the
   TCP sequence number contains 32 bits, the probability of this
   scenario is at most 1/2^32. Given sufficiently many retransmissions
   in the first timeout-based loss recovery, the corresponding ICMP
   unreachable messages could reduce the RTO in the second recovery at
   most to "RTO_BASE". However, once the ICMP unreachable messages are
   depleted, the standard exponential backoff will be performed. Thus,
   the congestion response will only be delayed by some false
   retransmissions.

   Similar to the above, consider the case where a steady state TCP
   connection with n segments in-flight will be disrupted at some point
   due to a link outage at an intermediate router R. For each segment
   in-flight, router R may generate an ICMP unreachable message.
   However, for some reason it delays them. Once the link outage is
   over and the connection has been re-established, the TCP sender
   leaves the scheme and slow-starts the connection.
   Following a sequence number wrap-around, a retransmission timeout
   occurs, just at the moment the TCP sender's current window of data
   reaches the previous range of the sequence number space again. If
   router R emits the delayed ICMP unreachable messages now, one
   spurious undoing of the retransmission timer backoff is possible,
   provided that the TCP segment number contained in the ICMP
   unreachable messages matches the current SND.UNA and the timeout was
   a result of congestion. In the case of another connectivity
   disruption, the additional undoing of the retransmission timer
   backoff has no impact. The probability of this scenario is at most
   n/2^32.

5.3. Packet Duplication

   In case an intermediate router duplicates packets, a TCP sender may
   receive more ICMP unreachable messages during timeout-based loss
   recovery than it actually has sent timeout-based retransmissions.
   However, since TCP-LCD keeps track of the number of performed
   retransmission timer backoffs in the "BACKOFF_CNT" variable, it will
   not undo more retransmission timer backoffs than were actually
   performed. Nevertheless, if packet duplication and congestion
   coincide on the path between the two communicating hosts, duplicated
   ICMP messages could hide the congestion loss of some retransmissions
   or ICMP messages, and the algorithm may incorrectly undo
   retransmission timer backoffs. Considering the overall impact of a
   router that duplicates packets, the additional load induced by some
   spurious timeout-based retransmits can probably be neglected.

5.4. Probing Frequency

   One could argue that if an ICMP unreachable message arrives for a
   timeout-based retransmission, the RTO should be reset or
   recalculated, similar to what is done when an ACK arrives during
   timeout-based loss recovery (see Karn's algorithm [KP87], [RFC2988]),
   and a new retransmission should be sent immediately.
   Generally, this would allow for a much higher probing frequency based
   on the round trip time up to the router where connectivity has been
   disrupted. However, we believe the current scheme provides a good
   trade-off between conservative behavior and fast detection of
   connectivity re-establishment.

5.5. Reaction during Connection Establishment

   It is possible that a TCP sender enters timeout-based loss recovery
   while the connection is in SYN-SENT or SYN-RECEIVED states [RFC0793].
   The algorithm described in this document could also be used for
   faster connection establishment in networks with connectivity
   disruptions. However, because existing TCP implementations [RFC5461]
   already interpret ICMP unreachable messages during connection
   establishment and abort the corresponding connection, we refrain from
   suggesting this.

5.6. Reaction in Steady-State

   Another exploitation of ICMP unreachable messages in the context of
   TCP congestion control might seem appropriate in case the ICMP
   unreachable message is received while TCP is in steady-state, and the
   message refers to a segment from within the current window of data.
   As the RTT up to the router that generated the ICMP unreachable
   message is likely to be substantially shorter than the overall RTT to
   the destination, the ICMP unreachable message may very well reach the
   originating TCP while it is transmitting the current window of data.
   In case the remaining window is large, it might seem appropriate to
   refrain from transmitting the remaining window as there is timely
   evidence that it will only trigger further ICMP unreachable messages
   at that very router. Although this promises improvement from a
   wastage perspective, it may be counterproductive from a security
   perspective.
   An attacker could forge such ICMP messages, thereby forcing the
   originating TCP to stop sending data, very similar to the blind
   throughput-reduction attack mentioned in
   [I-D.ietf-tcpm-icmp-attacks].

   An additional consideration is the following: in the presence of
   multi-path routing even the receipt of a legitimate ICMP unreachable
   message cannot be exploited accurately because it is possible that
   only one of the multiple paths to the destination is suffering from a
   connectivity disruption, which causes ICMP unreachable messages to be
   sent. Then, however, there is the possibility that the path along
   which the connectivity disruption occurred contributed considerably
   to the overall bandwidth, such that a congestion response may well be
   reasonable. However, this is not necessarily the case. Therefore, a
   TCP has no means except for its inherent congestion control to decide
   on this matter. All in all, it seems that for a connection in
   steady-state, i.e., not in timeout-based loss recovery, reacting to
   ICMP unreachable messages in regard to congestion control is not
   appropriate. For the case of timeout-based retransmissions, however,
   there is a reasonable congestion response, which is skipping further
   retransmission timer backoffs because there is no congestion
   indication - as described above.

6. Dissolving Ambiguity Issues (the Safe Variant)

   Provided that the TCP Timestamps option [RFC1323] is enabled for a
   connection, a TCP sender MAY use the following algorithm to dissolve
   the ambiguity issues mentioned in Sections 5.1, 5.2, and 5.3. In
   particular, both the retransmission ambiguity and the packet
   duplication problems are prevented by the following TCP-LCD variant.
   On the other hand, the false positives caused by wrapped sequence
   numbers cannot be completely avoided, but the likelihood is further
   reduced by a factor of 1/2^32 since the Timestamp Value field (TSval)
   of the TCP Timestamps option contains 32 bits.

   Hence, implementers may choose to implement TCP-LCD with the
   following modifications.

   Step (1) is replaced by step (1'):

   (1')  Before TCP updates the variable "RTO" when it initiates
         timeout-based loss recovery, set the variables "BACKOFF_CNT"
         and "RTO_BASE" and the data structure "RETRANS_TS" as follows:

            BACKOFF_CNT := 0;
            RTO_BASE := RTO;
            RETRANS_TS := [];

         Proceed to step (R).

   Step (2) is extended by step (2b):

   (2b)  Store the value of the Timestamp Value field (TSval) of the TCP
         Timestamps option included in the retransmission "RET" sent in
         step (R) into the "RETRANS_TS" data structure:

            RETRANS_TS.add(RET.TSval)

   Step (6) is replaced by step (6'):

   (6')  If "SEG.SEQ == SND.UNA && RETRANS_TS.exists(SEG.TSval)", i.e.,
         if the TCP segment "SEG" eliciting the ICMP unreachable message
         "ICMP_DU" carries the sequence number of a retransmission, and
         the value in its Timestamp Value field (TSval) is valid, then

            proceed to step (7');

         else

            proceed to step (3).

   Step (7) is replaced by step (7'):

   (7')  Undo the last retransmission timer backoff:

            RETRANS_TS.remove(SEG.TSval);
            BACKOFF_CNT := BACKOFF_CNT - 1;
            RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

   The downside of the safe variant is twofold. First, the
   modifications come at a cost: the TCP sender is required to store the
   timestamps of all retransmissions sent during one timeout-based loss
   recovery.
   Second, the safe variant can only undo a retransmission timer backoff
   if the intermediate router experiencing the link outage implements
   [RFC1812] and chooses to include, in addition to the first 64 bits of
   the payload of the triggering datagram, as many bits as are needed to
   include the TCP Timestamps option in the ICMP unreachable message.

7. Interoperability Issues

   This section discusses interoperability issues related to introducing
   TCP-LCD.

7.1. Detection of TCP Connection Failures

   TCP-LCD may have side-effects on TCP implementations that attempt to
   detect TCP connection failures by counting timeout-based
   retransmissions. RFC 1122 [RFC1122] states in Section 4.2.3.5 that a
   TCP host must handle excessive retransmissions of data segments with
   two thresholds R1 and R2 measuring the number of retransmissions that
   have occurred for the same segment. Both thresholds might either be
   measured in time units or as a count of retransmissions.

   Due to TCP-LCD's revert strategy of the retransmission timer, the
   assumption that a certain number of retransmissions corresponds to a
   specific time interval no longer holds, as additional retransmissions
   may be performed during timeout-based loss recovery to detect the end
   of the connectivity disruption. Therefore, a TCP employing TCP-LCD
   either SHOULD measure the thresholds R1 and R2 in time units or, in
   case R1 and R2 are counters of retransmissions, SHOULD convert them
   into time intervals, which correspond to the time an unmodified TCP
   would need to reach the specified number of retransmissions.

7.2. Explicit Congestion Notification

   With Explicit Congestion Notification (ECN) [RFC3168], ECN-capable
   routers are no longer limited to dropping packets as congestion
   indication. Instead, they can set the Congestion Experienced (CE)
   codepoint in the IP header to indicate congestion.
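
   As an illustration of the threshold conversion recommended in
   Section 7.1, the following Python sketch computes the time an
   unmodified TCP would need to send a given number of timeout-based
   retransmissions under plain exponential backoff. The function name,
   the 60 s default for "MAX_RTO", and the convention that the interval
   ends when the last counted retransmission is sent are assumptions
   made for this example; they are not part of the algorithm
   specification.

```python
def retransmissions_to_seconds(count, rto_base, max_rto=60.0):
    """Time in seconds until an unmodified TCP has sent `count`
    timeout-based retransmissions of the same segment.

    rto_base: RTO in effect when timeout-based loss recovery starts.
    max_rto:  upper bound on the RTO (hypothetical default of 60 s).
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    elapsed = 0.0
    rto = rto_base
    for _ in range(count):
        # The next retransmission is sent when the current RTO expires.
        elapsed += rto
        # Standard exponential backoff, bounded by max_rto.
        rto = min(rto * 2, max_rto)
    return elapsed


if __name__ == "__main__":
    # Hypothetical thresholds expressed as retransmission counts.
    r1_count, r2_count = 3, 15
    print("R1 as time:", retransmissions_to_seconds(r1_count, 1.0))
    print("R2 as time:", retransmissions_to_seconds(r2_count, 1.0))
```

   A sender using TCP-LCD could then compare the time elapsed since the
   first timeout against these intervals instead of counting
   retransmissions, so that additional probes triggered by undone
   backoffs do not cause the thresholds to be reached early.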

   With TCP-LCD it may happen that during a connectivity disruption a
   received ICMP unreachable message has been elicited by a
   timeout-based retransmission that was marked with the CE codepoint
   before reaching the router experiencing the link outage. In such a
   case, we suggest that the TCP sender SHOULD additionally reset the
   retransmission timer in case the algorithm undoes a retransmission
   timer backoff.

7.3. ICMP for IP version 6

   RFC 4443 [RFC4443] specifies the Internet Control Message Protocol
   (ICMPv6) to be used with the Internet Protocol version 6 (IPv6)
   [RFC2460]. From TCP-LCD's point of view, it is important to notice
   that for IPv6, the payload of an ICMPv6 error message has to include
   as many bytes as possible from the IPv6 datagram that elicited the
   ICMPv6 error message, without making the error message exceed the
   minimum IPv6 MTU (1280 bytes) [RFC4443]. Thus, more information is
   available for TCP-LCD than in the case of IPv4.

   The counterpart of the ICMPv4 destination unreachable message of code
   0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6
   destination unreachable message of code 0 (no route to destination)
   [RFC4443]. As with IPv4, a router should generate an ICMPv6
   destination unreachable message of code 0 in response to a packet
   that cannot be delivered to its destination address because it lacks
   a matching entry in its routing table. As a result, TCP-LCD can
   employ this ICMPv6 error message as a connectivity disruption
   indication, too.

7.4. TCP-LCD and IP Tunnels

   It is worth noting that IP tunnels, including IPsec [RFC4301], IP in
   IP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], and
   others are compatible with TCP-LCD, as long as the received ICMP
   unreachable messages can be demultiplexed and extracted appropriately
   by the TCP sender during timeout-based loss recovery.

   If, for example, end-to-end tunnels like IPsec in transport mode
   [RFC4301] are employed, a TCP sender may receive ICMP unreachable
   messages where additional steps, e.g., decrypting in step (5) of the
   algorithm, are needed to extract the TCP header from these ICMP
   messages. Provided that the received ICMP unreachable message
   contains enough information, i.e., SEG.SEQ is extractable, this
   information MAY still be used as a valid input for the proposed
   algorithm.

   Likewise, if IP encapsulation like [RFC2003] is used in some part of
   the path between the communicating hosts, the tunnel ingress node may
   receive the ICMP unreachable messages from an intermediate router
   experiencing the link outage. Nevertheless, the tunnel ingress node
   may replay the ICMP unreachable messages in order to inform the TCP
   sender. If enough information is preserved to extract SEG.SEQ, the
   replayed ICMP unreachable messages MAY still be used in TCP-LCD.

8. Related Work

   Several methods that address TCP's problems in the presence of
   connectivity disruptions have been proposed in the literature. Some
   of them try to improve TCP's performance by modifying lower layers.
   For example, [SM03] introduces a "smart link layer", which buffers
   one segment for each active connection and replays these segments
   upon connectivity re-establishment. This approach has a serious
   drawback: previously stateless intermediate routers have to be
   modified in order to inspect TCP headers, to track the end-to-end
   connection, and to provide additional buffer space. This leads to an
   additional need for memory and processing power.

   On the other hand, stateless link layer schemes, as proposed in
   [RFC3819], which unconditionally buffer some small number of packets
   may have another problem: if a packet is buffered longer than the
   maximum segment lifetime (MSL) of 2 min [RFC0793], i.e., the
   disconnection lasts longer than MSL, TCP's assumption that such
   segments will never be received will no longer be true, violating
   TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].

   Other approaches, like TCP-F [CRVP01] or the Explicit Link Failure
   Notification (ELFN) [HV02], inform a TCP sender about a disrupted
   path by special messages generated and sent from intermediate
   routers. In case of a link failure the TCP sender stops sending
   segments and freezes its retransmission timers. TCP-F stays in this
   state and remains silent until either a "route establishment
   notification" is received or an internal timer expires. In contrast,
   ELFN periodically probes the network to detect connectivity
   re-establishment. Both proposals rely on changes to intermediate
   routers, whereas the scheme proposed in this document is a
   sender-only modification. Moreover, ELFN does not consider
   congestion and may impose serious additional load on the network,
   depending on the probe interval.

   The authors of ATCP [LS01] propose enhancements to identify different
   types of packet loss by introducing a layer between TCP and IP. They
   utilize ICMP destination unreachable messages to set TCP's receiver
   advertised window to zero, thus forcing the TCP sender to perform
   zero window probing with an exponential backoff. ICMP destination
   unreachable messages that arrive during this probing period are
   ignored. This approach is nearly orthogonal to this document, which
   exploits ICMP messages to undo a retransmission timer backoff when
   TCP is already probing. In principle, both mechanisms could be
   combined.
   However, due to security considerations it does not seem appropriate
   to adopt ATCP's reaction as discussed in Section 5.6.

   Schuetz et al. describe, in [I-D.schuetz-tcpm-tcp-rlci], a set of TCP
   extensions that improve TCP's behavior when transmitting over paths
   whose characteristics can change rapidly. Their proposed extensions
   modify the local behavior of TCP and introduce a new TCP option to
   signal locally received connectivity-change indications (CCIs) to
   remote peers. Upon receipt of a CCI, they re-probe the path
   characteristics either by performing a speculative retransmission or
   by sending a single segment of new data, depending on whether the
   connection is currently stalled in exponential backoff or
   transmitting in steady-state, respectively. The authors focus on
   specifying TCP response mechanisms; nevertheless, underlying layers
   would have to be modified to explicitly send CCIs to make these
   immediate responses possible.

9. IANA Considerations

   This memo includes no request to IANA.

10. Security Considerations

   The algorithm proposed in this document is considered to be secure.
   For example, an attacker who has already guessed the correct
   four-tuple (i.e., Source IP Address, Source TCP port, Destination IP
   Address, and Destination TCP port) still cannot make a TCP modified
   with TCP-LCD flood the network just by sending forged ICMP
   unreachable messages in an attempt to maliciously shorten the
   retransmission timer. The attacker additionally would need to guess
   the correct segment sequence number of the current timeout-based
   retransmission, with a probability of at most 1/2^32.
   Even in the case of man-in-the-middle attacks, i.e., attacks
   performed in scenarios in which the attacker can sniff the
   retransmissions, the impact on network load is considered to be low,
   since the retransmission frequency is limited by the RTO that was
   computed before TCP had entered the timeout-based loss recovery.
   Hence, the highest probing frequency is expected to be even lower
   than once per minimum RTO, i.e., 1 s as specified by [RFC2988].

11. Acknowledgments

   We would like to thank Kai Jakobs, Ilpo Jarvinen, Pasi Sarolahti,
   Timothy Shepard, Joe Touch and Carsten Wolff for feedback on earlier
   versions of this document. We also thank Michael Faber, Daniel
   Schaffrath, and Damian Lukowski for implementing and testing the
   algorithm in Linux. Special thanks go to Ilpo Jarvinen for giving
   valuable feedback regarding the Linux implementation.

   This work has been supported by the German National Science
   Foundation (DFG) within the research excellence cluster Ultra
   High-Speed Mobile Information and Communication (UMIC), RWTH Aachen
   University.

12. References

12.1. Normative References

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, September 1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

12.2. Informative References

   [CRVP01]   Chandran, K., Raghunathan, S., Venkatesan, S., and R.
              Prakash, "A feedback-based scheme for improving TCP
              performance in ad hoc wireless networks", IEEE Personal
              Communications vol. 8, no. 1, pp. 34-39, February 2001.

   [HV02]     Holland, G. and N. Vaidya, "Analysis of TCP performance
              over mobile ad hoc networks", Wireless Networks vol. 8,
              no. 2-3, pp. 275-288, March 2002.

   [I-D.eggert-tcpm-tcp-retransmit-now]
              Eggert, L., "TCP Extensions for Immediate
              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
              (work in progress), June 2005.

   [I-D.ietf-tcpm-icmp-attacks]
              Gont, F., "ICMP attacks against TCP",
              draft-ietf-tcpm-icmp-attacks-12 (work in progress),
              March 2010.

   [I-D.schuetz-tcpm-tcp-rlci]
              Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami,
              Y., and K. Le, "TCP Response to Lower-Layer Connectivity-
              Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work
              in progress), February 2008.

   [KP87]     Karn, P. and C. Partridge, "Improving Round-Trip Time
              Estimates in Reliable Transport Protocols", Proceedings of
              the Conference on Applications, Technologies,
              Architectures, and Protocols for Computer Communication
              (SIGCOMM'87) pp. 2-7, August 1987.

   [LS01]     Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc
              networks", IEEE Journal on Selected Areas in
              Communications vol. 19, no. 7, pp. 1300-1315, July 2001.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              March 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
              for TCP", RFC 3522, April 2003.

   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
              for TCP", RFC 4015, February 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

   [RFC5461]  Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
              February 2009.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP", RFC 5682,
              September 2009.

   [SESB05]   Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
              "Protocol enhancements for intermittently connected
              hosts", SIGCOMM Computer Communication Review vol. 35,
              no. 3, pp. 5-18, December 2005.

   [SM03]     Scott, J. and G. Mapp, "Link layer-based TCP optimisation
              for disconnecting networks", SIGCOMM Computer
              Communication Review vol. 33, no.
              5, pp. 31-42, October 2003.

   [Zh86]     Zhang, L., "Why TCP Timers Don't Work Well", Proceedings
              of the Conference on Applications, Technologies,
              Architectures, and Protocols for Computer Communication
              (SIGCOMM'86) pp. 397-405, August 1986.

Appendix A. Changes from previous versions of the draft

A.1. Changes from draft-ietf-tcpm-tcp-lcd-00

   o  Editorial changes.

   o  Clarified TCP-LCD's behaviour during connection establishment
      (Thanks to Mark Handley).

A.2. Changes from draft-zimmermann-tcp-lcd-02

   o  Incorporated feedback submitted by Ilpo Jarvinen.

   o  Incorporated feedback submitted by Pasi Sarolahti.

   o  Incorporated feedback submitted by Joe Touch.

   o  Extended and reorganized the discussion (Section 5):

      *  Every discussion item got its own title, so that we have a
         better overview.

      *  Extended Retransmission Ambiguity section. Added also some
         references to the historical retransmission ambiguity problem.

      *  Heavily extended discussion about wrapped sequence numbers (see
         Joe's comments).

      *  Described the influence of packet duplication on the algorithm
         (Thanks to Ilpo).

      *  The section "Protecting Against Misbehaving Routers" is not a
         subsection anymore. Moreover, the section was renamed to
         "Dissolving Ambiguity Issues" and has now real content.

   o  An interoperability issues section (Section 7) was added. In
      particular comments to ECN, ICMPv6, and to the two thresholds R1
      and R2 of [RFC1122] (Section 4.2.3.5) were added.

   o  Miscellaneous editorial changes. In particular, the algorithm has
      a name now: TCP-LCD.

A.3. Changes from draft-zimmermann-tcp-lcd-01

   o  The algorithm in Section 4.2 was slightly changed. Instead of
      reverting the last retransmission timer backoff by halving the
      RTO, the RTO is recalculated with help of the "BACKOFF_CNT"
      variable.
      This fixes an issue that occurred when the retransmission timer
      was backed off but bounded by a maximum value. The algorithm in
      the previous version of the draft would have "reverted" to half of
      that maximum value, instead of using the value before the RTO was
      doubled (and then bounded).

   o  Miscellaneous editorial changes.

A.4. Changes from draft-zimmermann-tcp-lcd-00

   o  Miscellaneous editorial changes in Sections 1, 2 and 3.

   o  The document was restructured in Sections 1, 2 and 3 for easier
      reading. The motivation for the algorithm is changed according to
      TCP's problem to disambiguate congestion from non-congestion loss.

   o  Added Section 4.1.

   o  The algorithm in Section 4.2 was restructured and simplified:

      *  The special case of the first received ICMP destination
         unreachable message after an RTO was removed.

      *  The "BACKOFF_CNT" variable was introduced so it is no longer
         possible to perform more reverts than backoffs.

   o  The discussion in Section 5 was improved and expanded according to
      the algorithm changes.

Authors' Addresses

   Alexander Zimmermann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen, 52074
   Germany

   Phone: +49 241 80 21422
   Email: zimmermann@cs.rwth-aachen.de

   Arnd Hannemann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen, 52074
   Germany

   Phone: +49 241 80 21423
   Email: hannemann@nets.rwth-aachen.de