TCP Maintenance and Minor                                  A. Zimmermann
Extensions (TCPM) WG                                        A. Hannemann
Internet-Draft                                    RWTH Aachen University
Intended status: Experimental                              July 29, 2010
Expires: January 30, 2011

   Making TCP more Robust to Long Connectivity Disruptions (TCP-LCD)
                      draft-ietf-tcpm-tcp-lcd-02

Abstract

   Disruptions in end-to-end path connectivity, which last longer than
   one retransmission timeout, cause suboptimal TCP performance. The
   reason for this performance degradation is that TCP interprets
   segment loss induced by long connectivity disruptions as a sign of
   congestion, resulting in repeated retransmission timer backoffs.
   This, in turn, leads to a delayed detection of the re-establishment
   of the connection since TCP waits for the next retransmission timeout
   before it attempts a retransmission.

   This document proposes an algorithm to make TCP more robust to long
   connectivity disruptions (TCP-LCD). It describes how standard ICMP
   messages can be exploited during timeout-based loss recovery to
   disambiguate true congestion loss from non-congestion loss caused by
   connectivity disruptions. Moreover, a reversion strategy of the
   retransmission timer is specified that enables a more prompt
   detection of whether or not the connectivity to a previously
   disconnected peer node has been restored. TCP-LCD is a TCP
   sender-only modification that effectively improves TCP performance in
   case of connectivity disruptions.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 30, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Terminology
   2. Introduction
   3. Connectivity Disruption Indication
   4. Connectivity Disruption Reaction
     4.1. Basic Idea
     4.2. Algorithm Details
   5. Discussion of TCP-LCD
     5.1. Retransmission Ambiguity
     5.2. Wrapped Sequence Numbers
     5.3. Packet Duplication
     5.4. Probing Frequency
     5.5. Reaction during Connection Establishment
     5.6. Reaction in Steady-State
   6. Dissolving Ambiguity Issues using the TCP Timestamps Option
   7. Interoperability Issues
     7.1. Detection of TCP Connection Failures
     7.2. Explicit Congestion Notification
     7.3. ICMP for IP version 6
     7.4. TCP-LCD and IP Tunnels
   8. Related Work
   9. IANA Considerations
   10. Security Considerations
   11. Acknowledgments
   12. References
     12.1. Normative References
     12.2. Informative References
   Appendix A. Changes from previous versions of the draft
     A.1. Changes from draft-ietf-tcpm-tcp-lcd-01
     A.2. Changes from draft-ietf-tcpm-tcp-lcd-00
     A.3. Changes from draft-zimmermann-tcp-lcd-02
     A.4. Changes from draft-zimmermann-tcp-lcd-01
     A.5. Changes from draft-zimmermann-tcp-lcd-00
   Authors' Addresses

1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The reader should be familiar with the algorithm and terminology from
   [RFC2988], which defines the standard algorithm Transmission Control
   Protocol (TCP) senders are required to use to compute and manage
   their retransmission timer. In this document, the terms
   "retransmission timer" and "retransmission timeout" are used as
   defined in [RFC2988]. The retransmission timer ensures data delivery
   in the absence of any feedback from the receiver. The duration of
   this timer is referred to as retransmission timeout (RTO).

   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
   refers to a TCP segment that acknowledges previously unacknowledged
   data. The TCP sender state variable "SND.UNA" and the current
   segment variable "SEG.SEQ" are used as defined in [RFC0793]. SND.UNA
   holds the segment sequence number of the earliest segment that has
   not been acknowledged by the TCP receiver (the oldest outstanding
   segment). SEG.SEQ is the segment sequence number of a given segment.

   For the purposes of this specification, we define the term
   "timeout-based loss recovery" to refer to the state that a TCP sender
   enters upon the first timeout of the oldest outstanding segment
   (SND.UNA) and leaves upon the arrival of the *first* acceptable ACK.
   It is important to note that other documents use a different
   interpretation of the term "timeout-based loss recovery". For
   example, the NewReno modification to TCP's Fast Recovery algorithm
   [RFC3782] extends the period a TCP sender remains in timeout-based
   loss recovery compared to the one defined in this document. This is
   because [RFC3782] attempts to avoid unnecessary multiple Fast
   Retransmits that can occur after an RTO.

2. Introduction

   Connectivity disruptions can occur in many different situations. The
   frequency of connectivity disruptions depends on the properties of
   the end-to-end path between the communicating hosts. While
   connectivity disruptions can occur in traditional wired networks,
   e.g., caused by an unplugged network cable, the likelihood of their
   occurrence is significantly higher in wireless (multi-hop) networks.
   In particular, end-host mobility, network topology changes, and
   wireless interference are crucial factors. In the case of the
   Transmission Control Protocol (TCP) [RFC0793], the performance of the
   connection can experience a significant reduction compared to a
   permanently connected path [SESB05].
   This is because TCP, which was originally
   designed to operate in fixed and wired networks, generally assumes
   that the end-to-end path connectivity is relatively stable over the
   connection's lifetime.

   Depending on their duration, connectivity disruptions can be
   classified into two groups [I-D.schuetz-tcpm-tcp-rlci]: "short" and
   "long". A connectivity disruption is "short" if connectivity returns
   before the retransmission timer fires for the first time. In this
   case, TCP recovers lost data segments through Fast Retransmit and
   lost acknowledgments (ACKs) through successfully delivered later
   ACKs. Connectivity disruptions are declared as "long" for a given TCP
   connection if the retransmission timer fires at least once before
   connectivity is resumed. Whether or not path characteristics, like
   the round trip time (RTT) or the available bandwidth, have changed
   when connectivity resumes after a disruption is another important
   aspect for TCP's retransmission scheme [I-D.schuetz-tcpm-tcp-rlci].

   This document improves TCP's behavior in case of "long connectivity
   disruptions". In particular, it focuses on the period prior to the
   re-establishment of the connectivity to a previously disconnected
   peer node. The document does not describe any modifications to TCP's
   behavior and its congestion control mechanisms [RFC5681] after
   connectivity has been restored.

   When a long connectivity disruption occurs on a TCP connection, the
   TCP sender eventually stops receiving acknowledgments.
   After the retransmission timer expires, the TCP sender enters
   timeout-based loss recovery and declares the oldest outstanding
   segment (SND.UNA) as lost. Since TCP tightly couples reliability and
   congestion control, the retransmission of SND.UNA is triggered
   together with the reduction of the transmission rate.
   This is based
   on the assumption that segment loss is an indication of congestion
   [RFC5681]. As long as the connectivity disruption persists, TCP will
   repeat this procedure until the oldest outstanding segment has
   successfully been acknowledged, or until the connection has timed
   out. TCP implementations that follow the recommended retransmission
   timeout (RTO) management of RFC 2988 [RFC2988] double the RTO after
   each retransmission attempt. However, the RTO growth may be bounded
   by an upper limit, the maximum RTO, which is at least 60s, but may be
   longer: Linux, for example, uses 120s. If connectivity is restored
   between two retransmission attempts, TCP still has to wait until the
   retransmission timer expires before resuming transmission, since it
   simply does not have any means to know if the connectivity has been
   re-established. Therefore, depending on when connectivity becomes
   available again, this can waste up to a maximum RTO of possible
   transmission time.

   This retransmission behavior is not efficient, especially in
   scenarios with long connectivity disruptions. In the ideal case, TCP
   would attempt a retransmission as soon as connectivity to its peer
   has been re-established. In this document, we specify a TCP
   sender-only modification to provide robustness to long connectivity
   disruptions (TCP-LCD). The memo describes how the standard Internet
   Control Message Protocol (ICMP) can be exploited during timeout-based
   loss recovery to identify non-congestion loss caused by long
   connectivity disruptions. TCP-LCD's reversion strategy of the
   retransmission timer enables higher-frequency retransmissions and
   thereby a prompt detection of when connectivity to a previously
   disconnected peer node has been restored. If no congestion is
   present, TCP-LCD approaches the ideal behavior.

3. Connectivity Disruption Indication

   If the queue of an intermediate router that is experiencing a link
   outage can buffer all incoming packets, a connectivity disruption
   will only cause a variation in delay, which is handled well by TCP
   implementations using either Eifel [RFC3522], [RFC4015] or Forward
   RTO-Recovery (F-RTO) [RFC5682]. However, if the link outage lasts
   for too long, the router experiencing the link outage is forced to
   drop packets, and finally to discard the corresponding route. Means
   to detect such link outages include reacting to failed address
   resolution protocol (ARP) [RFC0826] queries, unsuccessful link
   sensing, and the like. However, this is solely the responsibility
   of the respective router.

      Note: The focus of this memo is on introducing a method by which
      ICMP messages may be exploited to improve TCP's performance; how
      different physical and link layer mechanisms below the network
      layer may trigger ICMP destination unreachable messages is out of
      scope of this memo.

   Provided that no other route to the specific destination exists, the
   router will notify the corresponding sending host about the dropped
   packets via ICMP destination unreachable messages of code 0 (net
   unreachable) or code 1 (host unreachable) [RFC1812]. Therefore, the
   sending host can use the ICMP destination unreachable messages of
   these codes as an indication of a connectivity disruption, since the
   reception of these messages provides evidence that packets were
   dropped due to a link outage.

   Note that there are also other ICMP destination unreachable messages
   with different codes. Some of them are candidates for connectivity
   disruption indications, too, but need further investigation.
   Examples are ICMP destination unreachable messages with code 5
   (source route failed), code 11 (net unreachable for TOS), or code 12
   (host unreachable for TOS) [RFC1812].
   On the other hand, codes that flag
   hard errors are of no use for this scheme, since TCP should abort the
   connection when those are received [RFC1122]. In the following, the
   term "ICMP unreachable message" is used as a synonym for ICMP
   destination unreachable messages of code 0 or code 1.

   The accurate interpretation of ICMP unreachable messages as a
   connectivity disruption indication is complicated by the following
   two peculiarities of ICMP messages. First, they do not necessarily
   operate on the same timescale as the packets, i.e., the TCP segments,
   that elicited them. When a router drops a packet due to a missing
   route, it will not necessarily send an ICMP unreachable message
   immediately, but will rather queue it for later delivery. Second,
   ICMP messages are subject to rate limiting, e.g., when a router drops
   a whole window of data due to a link outage, it is unlikely to send
   as many ICMP unreachable messages as dropped TCP segments. Depending
   on the load of the router, it may not even send any ICMP unreachable
   messages at all. Both peculiarities originate from [RFC1812].

   Fortunately, according to [RFC0792], ICMP unreachable messages have
   to contain in their body the entire Internet Protocol (IP) header
   [RFC0791] of the datagram eliciting the ICMP unreachable message,
   plus the first 64 bits of the payload of that datagram. This allows
   the sending host to match the ICMP error message to the transport
   connection that elicited it. RFC 1812 [RFC1812] augments these
   requirements and states that ICMP messages should contain as much of
   the original datagram as possible without the length of the ICMP
   datagram exceeding 576 bytes. Therefore, in case of TCP, at least
   the source port number, the destination port number, and the 32-bit
   TCP sequence number are included.
   This allows the originating TCP to
   demultiplex the received ICMP message and to identify the affected
   connection. Moreover, it can identify which segment of the
   respective connection triggered the ICMP unreachable message, unless
   there are several segments in-flight with the same sequence number
   (see Section 5.1).

   A connectivity disruption indication in the form of an ICMP
   unreachable message associated with a presumably lost TCP segment
   provides strong evidence that the segment was not dropped due to
   congestion, but was successfully delivered as far as the reporting
   router. It therefore did not witness any congestion at least on that
   part of the path that was traversed by both the TCP segment eliciting
   the ICMP unreachable message and the ICMP unreachable message itself.

4. Connectivity Disruption Reaction

   Section 4.1 introduces the basic idea of TCP-LCD. The complete
   algorithm is specified in Section 4.2.

4.1. Basic Idea

   The goal of the algorithm is to promptly detect when connectivity to
   a previously disconnected peer node has been restored after a long
   connectivity disruption, while retaining appropriate behavior in case
   of congestion. TCP-LCD exploits standard ICMP unreachable messages
   during timeout-based loss recovery. This increases TCP's
   retransmission frequency by undoing one retransmission timer backoff
   whenever an ICMP unreachable message is received that contains a
   segment with a sequence number of a presumably lost retransmission.

   This approach has the advantage of appropriately reducing the probing
   rate in case of congestion. If either the retransmission itself or
   the corresponding ICMP message is dropped, the previously performed
   retransmission timer backoff is not undone, which effectively halves
   the probing rate.

4.2. Algorithm Details

   A TCP sender that uses RFC 2988 [RFC2988] to compute TCP's
   retransmission timer MAY employ the following scheme to avoid
   over-conservative retransmission timer backoffs in case of long
   connectivity disruptions. If a TCP sender does implement the
   following steps, the algorithm MUST be initiated upon the first
   timeout of the oldest outstanding segment (SND.UNA) and MUST be
   stopped upon the arrival of the first acceptable ACK. The algorithm
   MUST NOT be re-initiated upon subsequent timeouts for the same
   segment. The scheme SHOULD NOT be used in SYN-SENT or SYN-RECEIVED
   states [RFC0793] (see Section 5.5).

   A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's
   retransmission timer MUST NOT use TCP-LCD. We envision that the
   scheme could be easily adapted to algorithms other than RFC 2988.
   However, we leave this as future work.

   In rule (2.5), RFC 2988 [RFC2988] provides the option to place a
   maximum value on the RTO. When a TCP implements this rule to provide
   an upper bound for the RTO, it MUST also be used in the following
   algorithm. In particular, if the RTO is bounded by an upper limit
   (maximum RTO), the "MAX_RTO" variable used in this scheme MUST be
   initialized with this upper limit. Otherwise, if the RTO is
   unbounded, the "MAX_RTO" variable MUST be set to infinity.

   The scheme specified in this document uses the "BACKOFF_CNT"
   variable, whose initial value is zero. The variable is used to count
   the number of performed retransmission timer backoffs during one
   timeout-based loss recovery. Moreover, the "RTO_BASE" variable is
   used to recover the previous RTO if the retransmission timer backoff
   was unnecessary. The variable is initialized with the RTO upon
   initiation of timeout-based loss recovery.
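
   As an illustration only, and not as part of the specification, the
   following Python sketch shows how "RTO_BASE", "BACKOFF_CNT", and
   "MAX_RTO" relate under a bounded exponential backoff. The function
   name bounded_rto and the sample values (RTO_BASE of 1 second,
   MAX_RTO of 60 seconds) are assumptions chosen for this example.

```python
import math


def bounded_rto(rto_base: float, backoff_cnt: int,
                max_rto: float = math.inf) -> float:
    """RTO after backoff_cnt backoffs, clamped to max_rto.

    max_rto defaults to infinity, which models an unbounded RTO.
    """
    return min(rto_base * 2 ** backoff_cnt, max_rto)


# Assumed sample values: RTO_BASE = 1 s, MAX_RTO = 60 s.
schedule = [bounded_rto(1.0, k, 60.0) for k in range(8)]
print(schedule)
```

   With these sample values, the upper bound takes effect from the
   sixth backoff onward. Once the RTO has been clamped, decrementing
   the backoff count does not necessarily shorten the RTO: going from
   seven to six backoffs still yields 60 seconds in this example.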

   (1)  Before TCP updates the variable "RTO" when it initiates
        timeout-based loss recovery, set the variables "BACKOFF_CNT" and
        "RTO_BASE" as follows:

           BACKOFF_CNT := 0;
           RTO_BASE := RTO.

        Proceed to step (R).

   (R)  This is a placeholder for standard TCP's behavior in case the
        retransmission timer has expired. In particular, if RFC 2988
        [RFC2988] is used, steps (5.4) - (5.6) of that algorithm go
        here. Proceed to step (2).

   (2)  To account for the expiration of the retransmission timer in the
        previous step (R), increment the "BACKOFF_CNT" variable by one:

           BACKOFF_CNT := BACKOFF_CNT + 1.

   (3)  Wait either

           for the expiration of the retransmission timer. When the
           retransmission timer expires, proceed to step (R);

           or for the arrival of an acceptable ACK. When an acceptable
           ACK arrives, proceed to step (A);

           or for the arrival of an ICMP unreachable message. When the
           ICMP unreachable message "ICMP_DU" arrives, proceed to step
           (4).

   (4)  If "BACKOFF_CNT > 0", i.e., if at least one retransmission timer
        backoff can be undone, then

           proceed to step (5);

        else

           proceed to step (3).

   (5)  Extract the TCP segment header included in the ICMP unreachable
        message "ICMP_DU":

           SEG := Extract(ICMP_DU).

   (6)  If "SEG.SEQ == SND.UNA", i.e., if the TCP segment "SEG"
        eliciting the ICMP unreachable message "ICMP_DU" contains the
        sequence number of a retransmission, then

           proceed to step (7);

        else

           proceed to step (3).

   (7)  Undo the last retransmission timer backoff:

           BACKOFF_CNT := BACKOFF_CNT - 1;
           RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

   (8)  If the retransmission timer expires due to the undoing in the
        previous step (7), then

           proceed to step (R);

        else

           proceed to step (3).

   (A)  This is a placeholder for standard TCP's behavior in case an
        acceptable ACK has arrived. No further processing.
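
   The steps above can be sketched as a small event-driven state holder.
   This is a minimal, non-normative Python sketch under stated
   assumptions: the class LcdState, its method names, and the string
   return values are hypothetical; retransmitting SND.UNA and
   (re)starting the timer are left to the caller; and the "elapsed"
   argument is assumed to be the time in seconds since the
   retransmission timer was last started.

```python
import math
from dataclasses import dataclass


@dataclass
class LcdState:
    """Per-connection TCP-LCD bookkeeping (hypothetical helper)."""
    rto: float                  # current retransmission timeout (seconds)
    max_rto: float = math.inf   # MAX_RTO; infinity if the RTO is unbounded
    backoff_cnt: int = 0        # BACKOFF_CNT
    rto_base: float = 0.0       # RTO_BASE
    active: bool = False        # True while in timeout-based loss recovery

    def on_first_timeout(self) -> None:
        # Step (1): remember the RTO before it is backed off.
        self.backoff_cnt = 0
        self.rto_base = self.rto
        self.active = True
        self.on_timer_expired()

    def on_timer_expired(self) -> None:
        # Step (R): back off the timer; the caller retransmits SND.UNA
        # and restarts the timer with the new RTO.
        self.rto = min(self.rto * 2, self.max_rto)
        # Step (2): always count the expiration, even if the RTO was
        # clamped to MAX_RTO.
        self.backoff_cnt += 1

    def on_icmp_unreachable(self, seg_seq: int, snd_una: int,
                            elapsed: float) -> str:
        # Steps (4) and (6): act only if a backoff can be undone and the
        # quoted sequence number is that of the retransmitted segment.
        if not self.active or self.backoff_cnt == 0 or seg_seq != snd_una:
            return "ignored"
        # Step (7): undo the last backoff.
        self.backoff_cnt -= 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        # Step (8): the shortened timer may already have expired.
        if elapsed >= self.rto:
            self.on_timer_expired()
            return "retransmit"
        return "wait"

    def on_acceptable_ack(self) -> None:
        # Step (A): standard processing takes over; the scheme stops.
        self.active = False


state = LcdState(rto=1.0, max_rto=60.0)
state.on_first_timeout()      # RTO 1 -> 2, BACKOFF_CNT = 1
state.on_timer_expired()      # RTO 2 -> 4, BACKOFF_CNT = 2
print(state.on_icmp_unreachable(seg_seq=1000, snd_una=1000, elapsed=0.5))
print(state.rto, state.backoff_cnt)
```

   The sketch does not enforce the rule that the algorithm is not
   re-initiated for the same segment; a caller would invoke
   on_first_timeout only once per timeout-based loss recovery and
   on_timer_expired for every later expiration.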

   When a TCP in steady-state detects a segment loss using the
   retransmission timer, it enters timeout-based loss recovery and
   initiates the algorithm (step 1). It adjusts the slow start
   threshold (ssthresh), sets the congestion window (CWND) to one
   segment, backs off the retransmission timer, and retransmits the
   first unacknowledged segment (step R) [RFC5681], [RFC2988]. To
   account for the expiration of the retransmission timer, the TCP
   sender increments the "BACKOFF_CNT" variable by one (step 2).

   In case the retransmission timer expires again (step 3a), a TCP will
   repeat the retransmission of the first unacknowledged segment and
   back off the retransmission timer once more (step R) [RFC2988], as
   well as increment the "BACKOFF_CNT" variable by one (step 2). Note
   that a TCP may implement RFC 2988's [RFC2988] option to place a
   maximum value on the RTO, which may result in not performing the
   retransmission timer backoff. However, step (2) MUST always and
   unconditionally be applied, no matter whether or not the
   retransmission timer is actually backed off. In other words, each
   time the retransmission timer expires, the "BACKOFF_CNT" variable
   MUST be incremented by one.

   If the first received packet after the retransmission(s) is an
   acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow
   start the connection and terminate the algorithm (step A). Later
   ICMP unreachable messages from the just terminated timeout-based loss
   recovery are ignored, since the ACK clock is already restarting due
   to the successful retransmission.

   On the other hand, if the first received packet after the
   retransmission(s) is an ICMP unreachable message (step 3c), and if
   step (4) permits it, TCP SHOULD undo one backoff for each ICMP
   unreachable message reporting an error on a retransmission.
   To
   decide if an ICMP unreachable message was elicited by a
   retransmission, the sequence number it contains is inspected (step 5,
   step 6). The undo is performed by re-calculating the RTO with the
   decremented "BACKOFF_CNT" variable (step 7). This calculation
   explicitly matches the (bounded) exponential backoff specified in
   rule (5.5) of [RFC2988].

   Upon receipt of an ICMP unreachable message that legitimately undoes
   one backoff, there is the possibility that the shortened
   retransmission timer has already expired (step 8). Then, TCP SHOULD
   retransmit immediately. In case the shortened retransmission timer
   has not yet expired, TCP MUST wait accordingly.

5. Discussion of TCP-LCD

   TCP-LCD takes care to react to connectivity disruption indications
   in the form of ICMP unreachable messages only during timeout-based
   loss recovery. Therefore, TCP's behavior is not altered when
   either no ICMP unreachable messages are received, or the
   retransmission timer of the TCP sender did not expire since the last
   received acceptable ACK. Thus, by definition, the algorithm triggers
   only in the case of long connectivity disruptions.

   Only such ICMP unreachable messages that contain a TCP segment with
   the sequence number of a retransmission, i.e., contain SND.UNA, are
   evaluated by TCP-LCD. All other ICMP unreachable messages are
   ignored. The arrival of those ICMP unreachable messages provides
   strong evidence that the retransmissions were not dropped due to
   congestion, but were successfully delivered to the reporting router.
   In other words, there is no evidence for any congestion at least on
   that very part of the path that was traversed by both the TCP segment
   eliciting the ICMP unreachable message and the ICMP unreachable
   message itself.
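
   The check described above, i.e., whether an ICMP unreachable message
   quotes a segment whose sequence number equals SND.UNA, can be
   sketched as follows. This is a minimal Python sketch under stated
   assumptions: it handles IPv4 only, does not verify the ICMP
   checksum, and does not look up the connection by address; the names
   QuotedSegment, extract_quoted_segment, and reports_on_snd_una are
   hypothetical. The byte offsets follow the ICMP destination
   unreachable layout of [RFC0792] and the header layouts of [RFC0791]
   and [RFC0793].

```python
import struct
from typing import NamedTuple, Optional

ICMP_DEST_UNREACH = 3          # ICMP type "destination unreachable"
LCD_CODES = (0, 1)             # net unreachable, host unreachable
IPPROTO_TCP = 6


class QuotedSegment(NamedTuple):
    src_port: int
    dst_port: int
    seq: int


def extract_quoted_segment(icmp: bytes) -> Optional[QuotedSegment]:
    """Return the ports and sequence number quoted in an ICMP unreachable
    message of code 0 or 1, or None if the message does not qualify."""
    if (len(icmp) < 8 or icmp[0] != ICMP_DEST_UNREACH
            or icmp[1] not in LCD_CODES):
        return None
    quoted = icmp[8:]                      # skip type, code, checksum, unused
    if len(quoted) < 20 or quoted[0] >> 4 != 4:
        return None                        # not a complete IPv4 header
    ihl = (quoted[0] & 0x0F) * 4           # IPv4 header length in bytes
    if ihl < 20 or quoted[9] != IPPROTO_TCP or len(quoted) < ihl + 8:
        return None
    # First 64 bits of the TCP header: source port, destination port, SEQ.
    return QuotedSegment(*struct.unpack_from("!HHI", quoted, ihl))


def reports_on_snd_una(icmp: bytes, snd_una: int) -> bool:
    """True if the message quotes a segment with SEG.SEQ == SND.UNA."""
    seg = extract_quoted_segment(icmp)
    return seg is not None and seg.seq == snd_una


# Hand-built sample: type 3, code 1, quoting a TCP segment with SEQ 123456.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, IPPROTO_TCP,
                        0, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
sample = (bytes([ICMP_DEST_UNREACH, 1, 0, 0, 0, 0, 0, 0]) + ip_header
          + struct.pack("!HHI", 40000, 80, 123456))
print(extract_quoted_segment(sample))
print(reports_on_snd_una(sample, 123456))
```

   A real implementation would additionally use the quoted addresses
   and port numbers to select the connection before comparing the
   sequence number against that connection's SND.UNA.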

   However, there are some situations where TCP-LCD makes a false
   decision and incorrectly undoes a retransmission timer backoff. This
   can happen, even when the received ICMP unreachable message contains
   the sequence number of a retransmission (SND.UNA), because the TCP
   segment that elicited the ICMP unreachable message may either not be
   a retransmission (Section 5.1) or not belong to the current
   timeout-based loss recovery (Section 5.2). Finally, packet
   duplication (Section 5.3) can also spuriously trigger the algorithm.

   Section 5.4 discusses possible probing frequencies, while Section 5.6
   describes the motivation for not reacting to ICMP unreachable
   messages while TCP is in steady-state.

5.1. Retransmission Ambiguity

   Historically, the retransmission ambiguity problem [Zh86], [KP87] is
   the TCP sender's inability to distinguish whether the first
   acceptable ACK after a retransmission refers to the original
   transmission or to the retransmission. This problem occurs after
   both a Fast Retransmit and a timeout-based retransmit. However,
   modern TCP implementations can eliminate the retransmission ambiguity
   with either the help of Eifel [RFC3522], [RFC4015] or Forward
   RTO-Recovery (F-RTO) [RFC5682].

   The reversion strategy of the given algorithm suffers from a form of
   retransmission ambiguity, too. In contrast to the above case, TCP
   suffers from ambiguity regarding ICMP unreachable messages received
   during timeout-based loss recovery. With the TCP sequence number
   included in the ICMP unreachable message, a TCP sender is not able to
   determine if the ICMP unreachable message refers to the original
   transmission or to any of the timeout-based retransmissions. That
   is, there is an ambiguity with regard to which TCP segment an ICMP
   unreachable message reports on.

   However, this ambiguity is not considered to be a problem for the
   algorithm.
   The assumption that a received ICMP message provides
   evidence that a non-congestion loss caused by the connectivity
   disruption was wrongly considered a congestion loss still holds,
   regardless of which TCP segment, transmission or retransmission, the
   message refers to.

5.2. Wrapped Sequence Numbers

   Besides the ambiguity whether a received ICMP unreachable message
   refers to the original transmission or to any of the retransmissions,
   there is another source of ambiguity related to the TCP sequence
   numbers contained in ICMP unreachable messages. For high bandwidth
   paths, the sequence space may wrap quickly. As a consequence,
   delayed ICMP unreachable messages might coincidentally fit as valid
   input in the proposed scheme. As a result, the scheme may
   incorrectly undo retransmission timer backoffs. Chances for this to
   happen are minuscule, since a particular ICMP message would need to
   contain the exact sequence number of the current oldest outstanding
   segment (SND.UNA), while at the same time TCP is in timeout-based
   loss recovery. However, two "worst case" scenarios for the algorithm
   are possible:

   For instance, consider a steady state TCP connection, which is
   disrupted at an intermediate router R due to a link outage. Upon the
   expiration of the RTO, the TCP sender enters timeout-based loss
   recovery and starts to retransmit the earliest segment that has not
   been acknowledged (SND.UNA). For some reason, router R delays all
   corresponding ICMP unreachable messages so that the TCP sender backs
   the retransmission timer off normally without any undoing. At the
   end of the connectivity disruption, the TCP sender eventually detects
   the re-establishment, leaves the scheme and finally the timeout-based
   loss recovery, too.
   A sequence number wrap-around later, the
   connectivity between the two peers is disrupted again, but this time
   due to congestion and exactly at the time at which the current
   SND.UNA matches the SND.UNA from the previous cycle. If router R
   emits the delayed ICMP unreachable messages now, the TCP sender would
   incorrectly undo retransmission timer backoffs. As the TCP sequence
   number contains 32 bits, the probability of this scenario is at most
   1/2^32. Given sufficiently many retransmissions in the first
   timeout-based loss recovery, the corresponding ICMP unreachable
   messages could reduce the RTO in the second recovery at most to
   "RTO_BASE". However, once the ICMP unreachable messages are
   depleted, the standard exponential backoff will be performed. Thus,
   the congestion response will only be delayed by some false
   retransmissions.

   Similar to the above, consider the case where a steady state TCP
   connection with n segments in flight is disrupted at some point
   due to a link outage at an intermediate router R. For each segment in
   flight, router R may generate an ICMP unreachable message. However,
   for some reason, it delays them. Once the link outage is over and
   the connection has been re-established, the TCP sender leaves the
   scheme and slow-starts the connection. Following a sequence number
   wrap-around, a retransmission timeout occurs, just at the moment the
   TCP sender's current window of data reaches the previous range of the
   sequence number space again. In case router R emits the delayed ICMP
   unreachable messages now, spurious undoing of the retransmission
   timer backoff is possible once, if the TCP sequence number contained
   in the ICMP unreachable messages matches the current SND.UNA, and the
   timeout was a result of congestion. In the case of another
   connectivity disruption, the additional undoing of the retransmission
   timer backoff has no impact.
The probability of this scenario is at 579 most n/2^32. 581 5.3. Packet Duplication 583 In case an intermediate router duplicates packets, a TCP sender may 584 receive more ICMP unreachable messages during timeout-based loss 585 recovery than sent timeout-based retransmissions. However, since 586 TCP-LCD keeps track of the number of performed retransmission timer 587 backoffs in the "BACKOFF_CNT" variable, it will not undo more 588 retransmission timer backoffs than were actually performed. 589 Nevertheless, if packet duplication and congestion coincide on the 590 path between the two communicating hosts, duplicated ICMP messages 591 could hide the congestion loss of some retransmissions or ICMP 592 messages, and the algorithm may incorrectly undo retransmission timer 593 backoffs. Considering the overall impact of a router that duplicates 594 packets, the additional load induced by some spurious timeout-based 595 retransmits can probably be neglected. 597 5.4. Probing Frequency 599 One could argue that if an ICMP unreachable message arrives for a 600 timeout-based retransmission, the RTO shall be reset or recalculated, 601 similar to what is done when an ACK arrives during timeout-based loss 602 recovery (see Karn's algorithm [KP87], [RFC2988]), and a new 603 retransmission should be sent immediately. Generally, this would 604 allow for a much higher probing frequency based on the round trip 605 time up to the router where connectivity has been disrupted. 606 However, we believe the current scheme provides a good trade-off 607 between conservative behavior and fast detection of connectivity re- 608 establishment. 610 5.5. Reaction during Connection Establishment 612 It is possible that a TCP sender enters timeout-based loss recovery 613 while the connection is in SYN-SENT or SYN-RECEIVED states [RFC0793]. 614 The algorithm described in this document could also be used for 615 faster connection establishment in networks with connectivity 616 disruptions. 
However, because existing TCP implementations [RFC5461] 617 already interpret ICMP unreachable messages during connection 618 establishment and abort the corresponding connection, we refrain from 619 suggesting this. 621 5.6. Reaction in Steady-State 623 Another exploitation of ICMP unreachable messages in the context of 624 TCP congestion control might seem appropriate in case the ICMP 625 unreachable message is received while TCP is in steady-state, and the 626 message refers to a segment from within the current window of data. 627 As the RTT up to the router that generated the ICMP unreachable 628 message is likely to be substantially shorter than the overall RTT to 629 the destination, the ICMP unreachable message may very well reach the 630 originating TCP while it is transmitting the current window of data. 631 In case the remaining window is large, it might seem appropriate to 632 refrain from transmitting the remaining window as there is timely 633 evidence that it will only trigger further ICMP unreachable messages 634 at the very router. Although this promises improvement from a 635 wastage perspective, it may be counterproductive from a security 636 perspective. An attacker could forge such ICMP messages, thereby 637 forcing the originating TCP to stop sending data, very similar to the 638 blind throughput-reduction attack mentioned in [RFC5927]. 640 An additional consideration is the following: in the presence of 641 multi-path routing, even the receipt of a legitimate ICMP unreachable 642 message cannot be exploited accurately, because there is the 643 possibility that only one of the multiple paths to the destination is 644 suffering from a connectivity disruption, which causes ICMP 645 unreachable messages to be sent. Then, however, there is the 646 possibility that the path along which the connectivity disruption 647 occurred contributed considerably to the overall bandwidth, such that 648 a congestion response is very well reasonable. 
However, this is not 649 necessarily the case. Therefore, a TCP has no means except for its 650 inherent congestion control to decide on this matter. All in all, it 651 seems that for a connection in steady-state, i.e., not in timeout- 652 based loss recovery, reacting on ICMP unreachable messages in regard 653 to congestion control is not appropriate. For the case of timeout- 654 based retransmissions, however, there is a reasonable congestion 655 response, which is skipping further retransmission timer backoffs 656 because there is no congestion indication - as described above. 658 6. Dissolving Ambiguity Issues using the TCP Timestamps Option 660 If the TCP Timestamps option [RFC1323] is enabled for a connection, a 661 TCP sender SHOULD use the following algorithm to dissolve the 662 ambiguity issues mentioned in Sections 5.1, 5.2, and 5.3. In 663 particular, both the retransmission ambiguity and the packet 664 duplication problems are prevented by the following TCP-LCD variant. 665 On the other hand, the false positives caused by wrapped sequence 666 numbers cannot be completely avoided, but the likelihood is further 667 reduced by a factor of 1/2^32 since the Timestamp Value field (TSval) 668 of the TCP Timestamps Option contains 32 bits. 670 Hence, implementers may choose to implement the TCP-LCD with the 671 following modifications. 673 Step (1) is replaced by step (1'): 675 (1') Before TCP updates the variable "RTO" when it initiates 676 timeout-based loss recovery, set the variables "BACKOFF_CNT" 677 and "RTO_BASE" and the data structure "RETRANS_TS" as follows: 679 BACKOFF_CNT := 0; 680 RTO_BASE := RTO; 681 RETRANS_TS := []. 683 Proceed to step (R). 
685 Step (2) is extended by step (2b): 687 (2b) Store the value of the Timestamp Value field (TSval) of the TCP 688 Timestamps option included in the retransmission "RET" sent in 689 step (R) into the "RETRANS_TS" data structure: 691 RETRANS_TS.add(RET.TSval) 693 Step (6) is replaced by step (6'): 695 (6') If "SEG.SEQ == SND.UNA && RETRANS_TS.exists(SEG.TSval)", i.e., 696 if the TCP segment "SEG" eliciting the ICMP unreachable message 697 "ICMP_DU" contains the sequence number of a retransmission, and 698 the value in its Timestamp Value field (TSval) is valid, then 700 proceed to step (7'); 702 else 704 proceed to step (3). 706 Step (7) is replaced by step (7'): 708 (7') Undo the last retransmission timer backoff: 710 RETRANS_TS.remove(SEG.TSval); 711 BACKOFF_CNT := BACKOFF_CNT - 1; 712 RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO). 714 The downside of this variant is twofold. First, the 715 modifications come at a cost: the TCP sender is required to store the 716 timestamps of all retransmissions sent during one timeout-based loss 717 recovery. Second, this variant can only undo a retransmission timer 718 backoff if the intermediate router experiencing the link outage 719 implements [RFC1812] and chooses to include, in addition to the 720 first 64 bits of the payload of the triggering datagram, as many bits as are 721 needed to include the TCP Timestamps option in the ICMP unreachable 722 message. 724 7. Interoperability Issues 726 This section discusses interoperability issues related to introducing 727 TCP-LCD. 729 7.1. Detection of TCP Connection Failures 731 TCP-LCD may have side-effects on TCP implementations that attempt to 732 detect TCP connection failures by counting timeout-based 733 retransmissions. [RFC1122] states in Section 4.2.3.5 that a TCP host 734 must handle excessive retransmissions of data segments with two 735 thresholds R1 and R2 that measure the number of retransmissions that 736 have occurred for the same segment.
Both thresholds might either be 737 measured in time units or as a count of retransmissions. 739 Due to TCP-LCD's reversion strategy of the retransmission timer, the 740 assumption that a certain number of retransmissions corresponds to a 741 specific time interval no longer holds, as additional retransmissions 742 may be performed during timeout-based-loss recovery to detect the end 743 of the connectivity disruption. Therefore, a TCP employing TCP-LCD 744 either MUST measure the thresholds R1 and R2 in time units or, in 745 case R1 and R2 are counters of retransmissions, MUST convert them 746 into time intervals, which correspond to the time an unmodified TCP 747 would need to reach the specified number of retransmissions. 749 7.2. Explicit Congestion Notification 751 With Explicit Congestion Notification (ECN) [RFC3168], ECN-capable 752 routers are no longer limited to dropping packets to indicate 753 congestion. Instead, they can set the Congestion Experienced (CE) 754 codepoint in the IP header to indicate congestion. With TCP-LCD, it 755 may happen that during a connectivity disruption, a received ICMP 756 unreachable message has been elicited by a timeout-based 757 retransmission that was marked with the CE codepoint before reaching 758 the router experiencing the link outage. In such a case, a TCP 759 sender MUST, corresponding to [RFC3168] (Section 6.1.2), additionally 760 reset the retransmission timer in case the algorithm undoes a 761 retransmission timer backoff. 763 7.3. ICMP for IP version 6 765 RFC 4443 [RFC4443] specifies the Internet Control Message Protocol 766 (ICMPv6) to be used with the Internet Protocol version 6 (IPv6) 767 [RFC2460]. 
From TCP-LCD's point of view, it is important to notice 768 that for IPv6, the payload of an ICMPv6 error message has to include 769 as many bytes as possible from the IPv6 datagram that elicited the 770 ICMPv6 error message, without making the error message exceed the 771 minimum IPv6 MTU (1280 bytes) [RFC4443]. Thus, more information is 772 available for TCP-LCD than in the case of IPv4. 774 The counterpart of the ICMPv4 destination unreachable message of code 775 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 776 destination unreachable message of code 0 (no route to destination) 777 [RFC4443]. As with IPv4, a router should generate an ICMPv6 778 destination unreachable message of code 0 in response to a packet 779 that cannot be delivered to its destination address because it lacks 780 a matching entry in its routing table. As a result, TCP-LCD can 781 employ this ICMPv6 error message as a connectivity disruption 782 indication, too. 784 7.4. TCP-LCD and IP Tunnels 786 It is worth noting that IP tunnels, including IPsec [RFC4301], IP in 787 IP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], and 788 others are compatible with TCP-LCD, as long as the received ICMP 789 unreachable messages can be demultiplexed and extracted appropriately 790 by the TCP sender during timeout-based loss recovery. 792 If, for example, end-to-end tunnels like IPsec in transport mode 793 [RFC4301] are employed, a TCP sender may receive ICMP unreachable 794 messages where additional steps, e.g., decrypting in step (5) of the 795 algorithm, are needed to extract the TCP header from these ICMP 796 messages. Provided that the received ICMP unreachable message 797 contains enough information, i.e., SEG.SEQ is extractable, this 798 information can still be used as a valid input for the proposed 799 algorithm.
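The extractability condition above can be illustrated with a minimal sketch in Python. It is not part of the specification. It assumes an ICMPv4 destination unreachable message whose payload (the bytes following the 8-byte ICMP header) quotes the IPv4 header and at least the first 64 bits of the TCP header of the triggering datagram, without tunnel encapsulation or encryption. The function name "extract_seg_seq" is hypothetical and chosen for illustration only.

```python
import struct
from typing import Optional


def extract_seg_seq(icmp_payload: bytes) -> Optional[int]:
    """Return the TCP sequence number (SEG.SEQ) quoted in the payload of an
    ICMPv4 unreachable message, or None if it cannot be extracted.

    Assumption: icmp_payload starts at the quoted IPv4 header, i.e., the
    8-byte ICMP header has already been removed by the caller.
    """
    if len(icmp_payload) < 20:          # shorter than a minimal IPv4 header
        return None
    version = icmp_payload[0] >> 4
    ip_header_len = (icmp_payload[0] & 0x0F) * 4   # IHL is in 32-bit words
    if version != 4 or ip_header_len < 20:
        return None
    if icmp_payload[9] != 6:            # quoted datagram is not TCP
        return None
    # The first 64 bits of the TCP header hold the source port, the
    # destination port, and the sequence number.
    if len(icmp_payload) < ip_header_len + 8:
        return None
    _src_port, _dst_port, seq = struct.unpack_from(
        "!HHI", icmp_payload, ip_header_len
    )
    return seq
```

A caller would compare the returned value against SND.UNA; a return value of None corresponds to the case in which the message does not carry enough information to serve as input for the algorithm.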
801 Likewise, if IP encapsulation like [RFC2003] is used in some part of 802 the path between the communicating hosts, the tunnel ingress node may 803 receive the ICMP unreachable messages from an intermediate router 804 experiencing the link outage. Nevertheless, the tunnel ingress node 805 may replay the ICMP unreachable messages in order to inform the TCP 806 sender. If enough information is preserved to extract SEG.SEQ, the 807 replayed ICMP unreachable messages can still be used in TCP-LCD. 809 8. Related Work 811 Several methods that address TCP's problems in the presence of 812 connectivity disruptions have been proposed in the literature. Some of 813 them try to improve TCP's performance by modifying lower layers. For 814 example, [SM03] introduces a "smart link layer", which buffers one 815 segment for each active connection and replays these segments upon 816 connectivity re-establishment. This approach has a serious drawback: 817 previously stateless intermediate routers have to be modified in 818 order to inspect TCP headers, to track the end-to-end connection, and 819 to provide additional buffer space. This leads to an additional need 820 for memory and processing power. 822 On the other hand, stateless link layer schemes, as proposed in 823 [RFC3819], which unconditionally buffer some small number of packets 824 may have another problem: if a packet is buffered longer than the 825 maximum segment lifetime (MSL) of 2 min [RFC0793], i.e., the 826 disconnection lasts longer than MSL, TCP's assumption that such 827 segments will never be received will no longer be true, violating 828 TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now]. 830 Other approaches, like TCP-F [CRVP01] or the Explicit Link Failure 831 Notification (ELFN) [HV02] inform a TCP sender about a disrupted path 832 by special messages generated and sent from intermediate routers.
In 833 the case of a link failure, the TCP sender stops sending segments and 834 freezes its retransmission timers. TCP-F stays in this state and 835 remains silent until either a "route establishment notification" is 836 received or an internal timer expires. In contrast, ELFN 837 periodically probes the network to detect connectivity re- 838 establishment. Both proposals rely on changes to intermediate 839 routers, whereas the scheme proposed in this document is a sender- 840 only modification. Moreover, ELFN does not consider congestion and 841 may impose serious additional load on the network, depending on the 842 probe interval. 844 The authors of ATCP [LS01] propose enhancements to identify different 845 types of packet loss by introducing a layer between TCP and IP. They 846 utilize ICMP destination unreachable messages to set TCP's receiver 847 advertised window to zero, thus forcing the TCP sender to perform 848 zero window probing with an exponential backoff. ICMP destination 849 unreachable messages that arrive during this probing period are 850 ignored. This approach is nearly orthogonal to this document, which 851 exploits ICMP messages to undo a retransmission timer backoff when 852 TCP is already probing. In principle, both mechanisms could be 853 combined. However, due to security considerations, it does not seem 854 appropriate to adopt ATCP's reaction, as discussed in Section 5.6. 856 Schuetz et al. [I-D.schuetz-tcpm-tcp-rlci] describe a set of TCP 857 extensions that improve TCP's behavior when transmitting over paths 858 whose characteristics can change rapidly. Their proposed extensions 859 modify the local behavior of TCP and introduce a new TCP option to 860 signal locally received connectivity-change indications (CCIs) to 861 remote peers. 
Upon receipt of a CCI, they re-probe the path 862 characteristics either by performing a speculative retransmission or 863 by sending a single segment of new data, depending on whether the 864 connection is currently stalled in exponential backoff or 865 transmitting in steady-state, respectively. The authors focus on 866 specifying TCP response mechanisms; nevertheless, underlying layers 867 would have to be modified to explicitly send CCIs to make these 868 immediate responses possible. 870 9. IANA Considerations 872 This memo includes no request to IANA. 874 10. Security Considerations 876 The algorithm proposed in this document is considered to be secure. 877 For example, an attacker who has already guessed the correct four-tuple 878 (i.e., Source IP Address, Source TCP port, Destination IP Address, 879 and Destination TCP port) still cannot make a TCP modified with 880 TCP-LCD flood the network just by sending forged ICMP unreachable 881 messages in an attempt to maliciously shorten the retransmission 882 timer. The attacker additionally would need to guess the correct 883 segment sequence number of the current timeout-based retransmission, 884 with a probability of at most 1/2^32. Even in the case of man-in- 885 the-middle attacks, i.e., attacks performed in scenarios in which the 886 attacker can sniff the retransmissions, the impact on network load is 887 considered to be low, since the retransmission frequency is limited 888 by the RTO that was computed before TCP had entered the timeout-based 889 loss recovery. Hence, the highest probing frequency is expected to 890 be even lower than once per minimum RTO, i.e., 1 s, as specified by 891 [RFC2988]. 893 11. Acknowledgments 895 We would like to thank Lars Eggert, Mark Handley, Kai Jakobs, Ilpo 896 Jarvinen, Pasi Sarolahti, Tim Shepard, Joe Touch and Carsten Wolff 897 for feedback on earlier versions of this document.
We also thank 898 Michael Faber, Daniel Schaffrath, and Damian Lukowski for 899 implementing and testing the algorithm in Linux. Special thanks go 900 to Ilpo Jarvinen for giving valuable feedback regarding the Linux 901 implementation. 903 This work has been supported by the German National Science 904 Foundation (DFG) within the research excellence cluster Ultra High- 905 Speed Mobile Information and Communication (UMIC), RWTH Aachen 906 University. 908 12. References 910 12.1. Normative References 912 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 913 RFC 792, September 1981. 915 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 916 RFC 793, September 1981. 918 [RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions 919 for High Performance", RFC 1323, May 1992. 921 [RFC1812] Baker, F., "Requirements for IP Version 4 Routers", 922 RFC 1812, June 1995. 924 [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission 925 Timer", RFC 2988, November 2000. 927 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 928 Control", RFC 5681, September 2009. 930 12.2. Informative References 932 [CRVP01] Chandran, K., Raghunathan, S., Venkatesan, S., and R. 933 Prakash, "A feedback-based scheme for improving TCP 934 performance in ad hoc wireless networks", IEEE Personal 935 Communications vol. 8, no. 1, pp. 34-39, February 2001. 937 [HV02] Holland, G. and N. Vaidya, "Analysis of TCP performance 938 over mobile ad hoc networks", Wireless Networks vol. 8, 939 no. 2-3, pp. 275-288, March 2002. 941 [I-D.eggert-tcpm-tcp-retransmit-now] 942 Eggert, L., "TCP Extensions for Immediate 943 Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02 944 (work in progress), June 2005. 946 [I-D.schuetz-tcpm-tcp-rlci] 947 Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami, 948 Y., and K. 
Le, "TCP Response to Lower-Layer Connectivity- 949 Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work 950 in progress), February 2008. 952 [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time 953 Estimates in Reliable Transport Protocols", Proceedings of 954 the Conference on Applications, Technologies, 955 Architectures, and Protocols for Computer Communication 956 (SIGCOMM'87) pp. 2-7, August 1987. 958 [LS01] Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc 959 networks", IEEE Journal on Selected Areas in 960 Communications vol. 19, no. 7, pp. 1300-1315, July 2001. 962 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, 963 September 1981. 965 [RFC0826] Plummer, D., "Ethernet Address Resolution Protocol: Or 966 converting network protocol addresses to 48.bit Ethernet 967 address for transmission on Ethernet hardware", STD 37, 968 RFC 826, November 1982. 970 [RFC1122] Braden, R., "Requirements for Internet Hosts - 971 Communication Layers", STD 3, RFC 1122, October 1989. 973 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, 974 October 1996. 976 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 977 Requirement Levels", BCP 14, RFC 2119, March 1997. 979 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 980 (IPv6) Specification", RFC 2460, December 1998. 982 [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. 983 Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, 984 March 2000. 986 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 987 of Explicit Congestion Notification (ECN) to IP", 988 RFC 3168, September 2001. 990 [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm 991 for TCP", RFC 3522, April 2003. 993 [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno 994 Modification to TCP's Fast Recovery Algorithm", RFC 3782, 995 April 2004.
997 [RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D., 998 Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. 999 Wood, "Advice for Internet Subnetwork Designers", BCP 89, 1000 RFC 3819, July 2004. 1002 [RFC4015] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm 1003 for TCP", RFC 4015, February 2005. 1005 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1006 Internet Protocol", RFC 4301, December 2005. 1008 [RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control 1009 Message Protocol (ICMPv6) for the Internet Protocol 1010 Version 6 (IPv6) Specification", RFC 4443, March 2006. 1012 [RFC5461] Gont, F., "TCP's Reaction to Soft Errors", RFC 5461, 1013 February 2009. 1015 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 1016 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 1017 Spurious Retransmission Timeouts with TCP", RFC 5682, 1018 September 2009. 1020 [RFC5927] Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010. 1022 [SESB05] Schuetz, S., Eggert, L., Schmid, S., and M. Brunner, 1023 "Protocol enhancements for intermittently connected 1024 hosts", SIGCOMM Computer Communication Review vol. 35, no. 1025 3, pp. 5-18, December 2005. 1027 [SM03] Scott, J. and G. Mapp, "Link layer-based TCP optimisation 1028 for disconnecting networks", SIGCOMM Computer 1029 Communication Review vol. 33, no. 5, pp. 31-42, 1030 October 2003. 1032 [Zh86] Zhang, L., "Why TCP Timers Don't Work Well", Proceedings 1033 of the Conference on Applications, Technologies, 1034 Architectures, and Protocols for Computer Communication 1035 (SIGCOMM'86) pp. 397-405, August 1986. 1037 Appendix A. Changes from previous versions of the draft 1039 This appendix should be removed by the RFC Editor before publishing 1040 this document as an RFC. 1042 A.1. Changes from draft-ietf-tcpm-tcp-lcd-01 1044 o Incorporated feedback submitted by Lars Eggert 1046 A.2. Changes from draft-ietf-tcpm-tcp-lcd-00 1048 o Editorial changes. 
1050 o Clarified TCP-LCD's behaviour during connection establishment 1051 (Thanks to Mark Handley). 1053 A.3. Changes from draft-zimmermann-tcp-lcd-02 1055 o Incorporated feedback submitted by Ilpo Jarvinen. 1056 1058 o Incorporated feedback submitted by Pasi Sarolahti. 1059 1061 o Incorporated feedback submitted by Joe Touch. 1062 1063 1065 o Extended and reorganized the discussion (Section 5): 1067 * Every discussion item got its own title, so that we have a 1068 better overview. 1070 * Extended Retransmission Ambiguity section. Added also some 1071 references to the historical retransmission ambiguity problem. 1073 * Heavily extended discussion about wrapped sequence numbers (see 1074 Joe's comments). 1076 * Described the influence of packet duplication on the algorithm 1077 (Thanks to Ilpo). 1079 * The section "Protecting Against Misbehaving Routers" is not a 1080 subsection anymore. Moreover, the section was renamed to 1081 "Dissolving Ambiguity Issues" and now has real content. 1083 o An interoperability issues section (Section 7) was added. In 1084 particular, comments on ECN, ICMPv6, and the two thresholds R1 1085 and R2 of [RFC1122] (Section 4.2.3.5) were added. 1087 o Miscellaneous editorial changes. In particular, the algorithm has 1088 a name now: TCP-LCD. 1090 A.4. Changes from draft-zimmermann-tcp-lcd-01 1092 o The algorithm in Section 4.2 was slightly changed. Instead of 1093 reverting the last retransmission timer backoff by halving the 1094 RTO, the RTO is recalculated with help of the "BACKOFF_CNT" 1095 variable. This fixes an issue that occurred when the 1096 retransmission timer was backed off but bounded by a maximum 1097 value. The algorithm in the previous version of the draft would 1098 have "reverted" to half of that maximum value instead of using 1099 the value the RTO had before it was doubled (and then bounded). 1101 o Miscellaneous editorial changes. 1103 A.5.
Changes from draft-zimmermann-tcp-lcd-00 1105 o Miscellaneous editorial changes in Sections 1, 2, and 3. 1107 o The document was restructured in Sections 1, 2, and 3 for easier 1108 reading. The motivation for the algorithm was changed according to 1109 TCP's problem of disambiguating congestion from non-congestion loss. 1111 o Added Section 4.1. 1113 o The algorithm in Section 4.2 was restructured and simplified: 1115 * The special case of the first received ICMP destination 1116 unreachable message after an RTO was removed. 1118 * The "BACKOFF_CNT" variable was introduced so it is no longer 1119 possible to perform more reverts than backoffs. 1121 o The discussion in Section 5 was improved and expanded according to 1122 the algorithm changes. 1124 Authors' Addresses 1126 Alexander Zimmermann 1127 RWTH Aachen University 1128 Ahornstrasse 55 1129 Aachen, 52074 1130 Germany 1132 Phone: +49 241 80 21422 1133 Email: zimmermann@cs.rwth-aachen.de 1134 Arnd Hannemann 1135 RWTH Aachen University 1136 Ahornstrasse 55 1137 Aachen, 52074 1138 Germany 1140 Phone: +49 241 80 21423 1141 Email: hannemann@nets.rwth-aachen.de