TCP Maintenance and Minor                                  A. Zimmermann
Extensions (TCPM) WG                                        A. Hannemann
Internet-Draft                                  RWTH Aachen University
Intended status: Experimental                          November 17, 2009
Expires: May 21, 2010


   Making TCP more Robust to Long Connectivity Disruptions (TCP-LCD)
                       draft-ietf-tcpm-tcp-lcd-00

Abstract

   Disruptions in end-to-end path connectivity that last longer than
   one retransmission timeout cause suboptimal TCP performance.  The
   reason for the performance degradation is that TCP interprets segment
   loss induced by long connectivity disruptions as a sign of
   congestion, resulting in repeated retransmission timer backoffs.
   This, in turn, delays the detection of the re-establishment of
   connectivity, since TCP waits until the next retransmission timeout
   occurs before attempting a retransmission.

   This document proposes an algorithm for making TCP more robust to
   long connectivity disruptions (TCP-LCD).  The memo describes how
   standard ICMP messages can be exploited during timeout-based loss
   recovery to disambiguate true congestion loss from non-congestion
   loss caused by connectivity disruptions.  Moreover, a revert strategy
   for the retransmission timer is specified that enables more prompt
   detection of whether connectivity to a previously disconnected peer
   node has been restored.  TCP-LCD is a TCP sender-only modification
   that effectively improves TCP performance in the presence of
   connectivity disruptions.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 21, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
   3.  Connectivity Disruption Indication
   4.  Connectivity Disruption Reaction
     4.1.  Basic Idea
     4.2.  Algorithm Details
   5.  Discussion of TCP-LCD
     5.1.  Retransmission Ambiguity
     5.2.  Wrapped Sequence Numbers
     5.3.  Packet Duplication
     5.4.  Probing Frequency
     5.5.  Reaction in Steady-State
   6.  Dissolving Ambiguity Issues (the Safe Variant)
   7.  Interoperability Issues
     7.1.  Detection of TCP Connection Failures
     7.2.  Explicit Congestion Notification
     7.3.  ICMP for IP version 6
     7.4.  TCP-LCD and IP Tunnels
   8.  Related Work
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Changes from previous versions of the draft
     A.1.  Changes from draft-zimmermann-tcp-lcd-02
     A.2.  Changes from draft-zimmermann-tcp-lcd-01
     A.3.  Changes from draft-zimmermann-tcp-lcd-00
   Authors' Addresses

1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The reader should be familiar with the algorithm and terminology from
   [RFC2988], which defines the standard algorithm that Transmission
   Control Protocol (TCP) senders are required to use to compute and
   manage their retransmission timer.  In this document, the terms
   "retransmission timer" and "retransmission timeout" are used as
   defined in [RFC2988].  The retransmission timer ensures data delivery
   in the absence of any feedback from the receiver.  The duration of
   this timer is referred to as the retransmission timeout (RTO).

   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
   refers to a TCP segment that acknowledges previously unacknowledged
   data.  The TCP sender state variable "SND.UNA" and the current
   segment variable "SEG.SEQ" are used as defined in [RFC0793].  SND.UNA
   holds the sequence number of the earliest segment that has not been
   acknowledged by the TCP receiver (the oldest outstanding segment).
   SEG.SEQ is the sequence number of a given segment.

   For the purposes of this specification, we define the term
   "timeout-based loss recovery" to refer to the state that a TCP sender
   enters upon the first timeout of the oldest outstanding segment
   (SND.UNA) and leaves upon the arrival of the *first* acceptable ACK.
   It is important to note that other documents use a different
   interpretation of the term "timeout-based loss recovery".  For
   example, the NewReno modification to TCP's Fast Recovery algorithm
   [RFC3782] extends the period a TCP sender remains in timeout-based
   loss recovery compared to the one defined in this document.  This is
   because [RFC3782] attempts to avoid unnecessary multiple Fast
   Retransmits that can occur after an RTO.

2.  Introduction

   Connectivity disruptions can occur in many different situations.  The
   frequency of connectivity disruptions depends on the properties of
   the end-to-end path between the communicating hosts.  While
   connectivity disruptions can occur in traditional wired networks,
   too, e.g., simply due to an unplugged network cable, they are
   significantly more likely in wireless (multi-hop) networks.  In
   particular, end-host mobility, network topology changes, and wireless
   interference are crucial factors.  In the case of the Transmission
   Control Protocol (TCP) [RFC0793], the performance of the connection
   can be significantly lower than on a permanently connected path
   [SESB05].
   This is because TCP, which was originally designed to operate in
   fixed and wired networks, generally assumes that the end-to-end path
   connectivity is relatively stable over the connection's lifetime.

   Depending on their duration, connectivity disruptions can be
   classified into two groups [I-D.schuetz-tcpm-tcp-rlci]: "short" and
   "long" connectivity disruptions.  A connectivity disruption is
   "short" if connectivity returns before the retransmission timer fires
   for the first time.  In this case, TCP recovers lost data segments
   through Fast Retransmit and lost acknowledgments (ACKs) through
   successfully delivered later ACKs.  Connectivity disruptions are
   declared "long" for a given TCP connection if the retransmission
   timer fires at least once before connectivity returns.  Whether or
   not path characteristics, such as the round-trip time (RTT) or the
   available bandwidth, have changed when connectivity returns after a
   disruption is another important aspect of TCP's retransmission
   scheme [I-D.schuetz-tcpm-tcp-rlci].

   This document improves TCP's behavior in case of "long connectivity
   disruptions".  In particular, it focuses on the period "prior" to the
   re-establishment of connectivity to a previously disconnected peer
   node.  The document does not describe any additional modification to
   detect whether the path characteristics have remained unchanged in
   order to improve TCP's behavior once connectivity has been restored.
   Hence, TCP's basic congestion control mechanisms [RFC5681] remain
   unchanged.

   When a long connectivity disruption occurs on a TCP connection, the
   TCP sender stops receiving acknowledgments.  After the retransmission
   timer expires, the TCP sender enters timeout-based loss recovery and
   declares the oldest outstanding segment (SND.UNA) lost.
   Since TCP tightly couples reliability and congestion control, the
   retransmission of SND.UNA is triggered together with a reduction of
   the sending rate, based on the assumption that segment loss is an
   indication of congestion [RFC5681].  As long as the connectivity
   disruption persists, TCP will repeat this procedure until the oldest
   outstanding segment is successfully acknowledged or the connection
   times out.  TCP implementations that follow the recommended
   retransmission timeout (RTO) management of RFC 2988 [RFC2988] double
   the RTO after each retransmission attempt.  However, the RTO growth
   may be bounded by an upper limit, the maximum RTO, which is at least
   60 seconds but may be longer; Linux, for example, uses 120 seconds.
   If connectivity is restored between two retransmission attempts, TCP
   still has to wait until the retransmission timer expires before
   resuming transmission, since it simply has no means of knowing when
   connectivity is re-established.  Therefore, depending on when
   connectivity becomes available again, up to one maximum RTO of
   possible transmission time can be wasted.

   This retransmission behavior is not efficient, especially in
   scenarios where long connectivity disruptions are frequent.  In the
   ideal case, a TCP would attempt a retransmission as soon as
   connectivity to its peer is re-established.  In this document, we
   specify a TCP sender-only modification to provide robustness to long
   connectivity disruptions (TCP-LCD).  The memo describes how the
   standard Internet Control Message Protocol (ICMP) can be exploited
   during timeout-based loss recovery to identify non-congestion loss
   caused by long connectivity disruptions.  Through higher-frequency
   retransmissions, TCP-LCD's revert strategy for the retransmission
   timer enables prompt detection of when connectivity to a previously
   disconnected peer node has been restored.
   If the network allows it, i.e., if no congestion is present, TCP-LCD
   approaches the ideal behavior.

3.  Connectivity Disruption Indication

   As long as the queue of an intermediate router experiencing a link
   outage is deep enough, i.e., it can buffer all incoming packets, a
   connectivity disruption will only cause variation in delay, which is
   handled well by contemporary TCP implementations with the help of
   Eifel [RFC3522], [RFC4015] or Forward RTO-Recovery (F-RTO) [RFC5682].
   However, if the link outage lasts too long, the router experiencing
   the link outage is forced to drop packets and eventually to discard
   the corresponding route.  Means to detect such link outages include
   reacting to failed Address Resolution Protocol (ARP) [RFC0826]
   queries, unsuccessful link sensing, and the like.  However, this is
   solely the responsibility of the respective router.

      Note: The focus of this memo is on how ICMP messages may be
      exploited to improve TCP's performance; how the different
      physical and link-layer mechanisms underneath the network layer
      may trigger ICMP destination unreachable messages is out of the
      scope of this memo.

   Provided that no other route (including a default route) to the
   specific destination exists, the removal of the route is accompanied
   by a notification to the sending host about the dropped packets via
   ICMP destination unreachable messages of code 0 (net unreachable) or
   code 1 (host unreachable) [RFC1812].  Since the reception of ICMP
   destination unreachable messages with these codes provides evidence
   that packets were dropped due to a link outage, the sending host can
   use them as an indication of a connectivity disruption.

   Note that there are also other ICMP destination unreachable messages
   with different codes.
   Some of them are candidates for connectivity disruption indications,
   too, but need further investigation, for example, ICMP destination
   unreachable messages with code 5 (source route failed), code 11 (net
   unreachable for TOS), or code 12 (host unreachable for TOS)
   [RFC1812].  On the other hand, codes that flag hard errors are of no
   use for the proposed scheme, since TCP should abort the connection
   when those are received [RFC1122].  In the following, the term "ICMP
   unreachable message" is used as a synonym for ICMP destination
   unreachable messages of code 0 or code 1.

   The accurate interpretation of ICMP unreachable messages as a
   connectivity disruption indication is complicated by the following
   two peculiarities of ICMP messages.  First, they do not necessarily
   operate on the same timescale as the packets that elicited them,
   i.e., in the given case, TCP segments.  When a router drops a packet
   due to a missing route, it will not necessarily send an ICMP
   unreachable message immediately, but rather queues it for later
   delivery.  Second, ICMP messages are subject to rate limiting; e.g.,
   when a router drops a whole window of data due to a link outage, it
   will hardly send as many ICMP unreachable messages as it dropped TCP
   segments.  Depending on the load of the router, it may even send no
   ICMP unreachable messages at all.  Both peculiarities originate from
   [RFC1812].

   Fortunately, according to [RFC0792], ICMP unreachable messages are
   obliged to contain in their body the entire Internet Protocol (IP)
   header [RFC0791] of the datagram eliciting the ICMP unreachable
   message, plus the first 64 bits of the payload of that datagram.
   This allows the sending host to match the ICMP error message to the
   transport that elicited it.
   RFC 1812 [RFC1812] augments the requirements and states that ICMP
   messages should contain as much of the original datagram as possible
   without the length of the ICMP datagram exceeding 576 bytes.
   Therefore, in the case of TCP, at least the source port number, the
   destination port number, and the 32-bit TCP sequence number are
   included.  This allows the originating TCP to demultiplex the
   received ICMP message and to identify the connection about which an
   ICMP unreachable message reports an error.  Moreover, it can identify
   which segment of the respective connection triggered the ICMP
   unreachable message, provided that there are not several segments in
   flight with the same sequence number (see Section 5.1).

   A connectivity disruption indication in the form of an ICMP
   unreachable message associated with a presumably lost TCP segment
   provides strong evidence that the segment was not dropped due to
   congestion but instead was successfully delivered to the temporary
   end-point of the employed path, i.e., the reporting router.  It
   therefore did not encounter any congestion, at least on that very
   part of the path that was traveled by both the TCP segment eliciting
   the ICMP unreachable message and the ICMP unreachable message itself.

4.  Connectivity Disruption Reaction

   Section 4.1 gives the basic idea of TCP-LCD.  The complete algorithm
   is specified in Section 4.2.

4.1.  Basic Idea

   The goal of the algorithm is to promptly detect when connectivity to
   a previously disconnected peer node has been restored after a long
   connectivity disruption, while retaining appropriate behavior in case
   of congestion.
   TCP-LCD exploits standard ICMP unreachable messages during
   timeout-based loss recovery to increase TCP's retransmission
   frequency by undoing one retransmission timer backoff whenever an
   ICMP unreachable message reports on the sequence number of a
   presumably lost retransmission.

   This approach has the advantage of appropriately reducing the probing
   rate in case of congestion.  If either the retransmission itself or
   the corresponding ICMP message is dropped, the previously performed
   retransmission timer backoff is not undone, which effectively halves
   the probing rate.

4.2.  Algorithm Details

   A TCP sender using RFC 2988 [RFC2988] to compute TCP's retransmission
   timer MAY employ the following scheme to avoid over-conservative
   retransmission timer backoffs in case of long connectivity
   disruptions.  If a TCP sender does implement the following steps, the
   algorithm MUST be initiated upon the first timeout of the oldest
   outstanding segment (SND.UNA) and MUST be stopped upon the arrival of
   the first acceptable ACK.  The algorithm MUST NOT be re-initiated
   upon subsequent timeouts for the same segment.

   A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's
   retransmission timer SHOULD NOT use TCP-LCD.  We envision that the
   scheme could easily be adapted to algorithms other than RFC 2988.
   However, we leave this as future work.

   In rule (2.5), RFC 2988 [RFC2988] provides the option to place a
   maximum value on the RTO.  When a TCP implements this rule to provide
   an upper bound for the RTO, the rule SHOULD also be used in the
   following algorithm.  In particular, if the RTO is bounded by an
   upper limit (maximum RTO), the "MAX_RTO" variable used in this scheme
   SHOULD be initialized with this upper limit.  Otherwise, if the RTO
   is unbounded, the "MAX_RTO" variable SHOULD be set to infinity.
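
   As an illustration only, the following Python sketch (the function
   names are hypothetical and not defined by RFC 2988) compares doubling
   the RTO once per expiration, capped at MAX_RTO as permitted by rules
   (5.5) and (2.5) of RFC 2988, with recomputing it directly as
   min(RTO_BASE * 2^BACKOFF_CNT, MAX_RTO), the form used in step (7) of
   the algorithm.  It assumes RTO_BASE does not exceed MAX_RTO and
   ignores RFC 2988's one-second minimum.

```python
import math


def rto_after_doubling(rto_base, backoffs, max_rto=math.inf):
    """Back off once per timer expiration: RTO := min(2 * RTO, MAX_RTO)."""
    rto = rto_base
    for _ in range(backoffs):
        rto = min(rto * 2, max_rto)
    return rto


def rto_closed_form(rto_base, backoff_cnt, max_rto=math.inf):
    """Recompute the RTO directly from RTO_BASE and BACKOFF_CNT."""
    return min(rto_base * 2 ** backoff_cnt, max_rto)


# Both forms agree for every backoff count when RTO_BASE <= MAX_RTO.
for n in range(12):
    assert rto_after_doubling(1.0, n, 60.0) == rto_closed_form(1.0, n, 60.0)
    assert rto_after_doubling(3.0, n) == rto_closed_form(3.0, n)

print(rto_closed_form(1.0, 7, 60.0))  # 2^7 s = 128 s, capped at 60 s
```

   Note that the direct form depends only on BACKOFF_CNT: with RTO_BASE
   = 1 s and MAX_RTO = 60 s, BACKOFF_CNT values of 7 and 6 both yield
   60 s and only BACKOFF_CNT = 5 yields 32 s, so undoing a backoff that
   was absorbed by the cap does not shorten the RTO.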

   The scheme specified in this document uses the "BACKOFF_CNT"
   variable, whose initial value is zero.  The variable is used to count
   the number of retransmission timer backoffs performed during one
   timeout-based loss recovery.  Moreover, the "RTO_BASE" variable is
   used to recover the previous RTO in case the retransmission timer
   backoff was unnecessary.  The variable is initialized with the RTO
   upon initiation of timeout-based loss recovery.

   (1)  Before TCP updates the variable "RTO" when it initiates
        timeout-based loss recovery, set the variables "BACKOFF_CNT" and
        "RTO_BASE" as follows:

           BACKOFF_CNT := 0;
           RTO_BASE := RTO.

        Proceed to step (R).

   (R)  This is a placeholder for the behavior that a standard TCP must
        execute at this point in case the retransmission timer has
        expired.  In particular, if RFC 2988 [RFC2988] is used, steps
        (5.4) - (5.6) of that algorithm go here.  Proceed to step (2).

   (2)  To account for the expiration of the retransmission timer in the
        previous step (R), increment the "BACKOFF_CNT" variable by one:

           BACKOFF_CNT := BACKOFF_CNT + 1.

   (3)  Wait either

        a) for the expiration of the retransmission timer.  When the
           retransmission timer expires, proceed to step (R);

        b) or for the arrival of an acceptable ACK.  When an acceptable
           ACK arrives, proceed to step (A);

        c) or for the arrival of an ICMP unreachable message.  When the
           ICMP unreachable message "ICMP_DU" arrives, proceed to step
           (4).

   (4)  If "BACKOFF_CNT > 0", i.e., if at least one retransmission timer
        backoff can be undone, then

           proceed to step (5);

        else

           proceed to step (3).

   (5)  Extract the TCP segment header included in the ICMP unreachable
        message "ICMP_DU":

           SEG := Extract(ICMP_DU).

   (6)  If "SEG.SEQ == SND.UNA", i.e., if the TCP segment "SEG"
        eliciting the ICMP unreachable message "ICMP_DU" carries the
        sequence number of a retransmission, then

           proceed to step (7);

        else

           proceed to step (3).

   (7)  Undo the last retransmission timer backoff:

           BACKOFF_CNT := BACKOFF_CNT - 1;
           RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

   (8)  If the retransmission timer expires due to the undoing in the
        previous step (7), then

           proceed to step (R);

        else

           proceed to step (3).

   (A)  This is a placeholder for the standard TCP behavior that must be
        executed at this point in case an acceptable ACK has arrived.
        No further processing.

   When a TCP in steady state detects a segment loss using the
   retransmission timer, it enters timeout-based loss recovery and
   initiates the algorithm (step 1).  It adjusts the slow start
   threshold (ssthresh), sets the congestion window (CWND) to one
   segment, backs off the retransmission timer, and retransmits the
   first unacknowledged segment (step R) [RFC5681], [RFC2988].  To
   account for the expiration of the retransmission timer, the TCP
   sender increments the "BACKOFF_CNT" variable by one (step 2).

   If the retransmission timer expires again (step 3a), a TCP will
   repeat the retransmission of the first unacknowledged segment and
   back off the retransmission timer once more (step R) [RFC2988], as
   well as increment the "BACKOFF_CNT" variable by one (step 2).  Note
   that a TCP may implement RFC 2988's [RFC2988] option to place a
   maximum value on the RTO, which may result in the retransmission
   timer not actually being backed off.  However, step (2) MUST always
   and unconditionally be applied, no matter whether or not the
   retransmission timer is actually backed off.  In other words, each
   time the retransmission timer expires, the "BACKOFF_CNT" variable
   MUST be incremented by one.
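
   The steps above can be sketched in Python as follows.  This is a
   simplified model under stated assumptions, not a TCP implementation:
   the class and method names are hypothetical, times are plain seconds
   passed in by the caller, the caller is assumed to invoke
   on_timer_expired() whenever the retransmission timer fires and to
   perform the extraction of step (5) itself, and step (R) is reduced to
   RFC 2988 steps (5.4) - (5.6) (retransmit, back off the RTO bounded by
   MAX_RTO, restart the timer) without any congestion window handling.

```python
import math


class TcpLcdSender:
    """Hypothetical model of a sender running TCP-LCD for SND.UNA."""

    def __init__(self, rto, snd_una, max_rto=math.inf):
        self.rto = rto               # current RTO in seconds
        self.max_rto = max_rto       # rule (2.5) bound, or infinity
        self.snd_una = snd_una       # oldest outstanding sequence number
        self.in_recovery = False     # in timeout-based loss recovery?
        self.backoff_cnt = 0
        self.rto_base = rto
        self.timer_start = 0.0       # when the timer was last (re)started
        self.retransmissions = []    # times at which SND.UNA was resent

    def on_timer_expired(self, now):
        if not self.in_recovery:
            # Step (1): first timeout of SND.UNA, before the RTO changes.
            self.in_recovery = True
            self.backoff_cnt = 0
            self.rto_base = self.rto
        self._step_r(now)
        # Step (2): count every expiration, even if MAX_RTO capped the RTO.
        self.backoff_cnt += 1

    def _step_r(self, now):
        # Step (R), reduced to RFC 2988 (5.4)-(5.6): retransmit SND.UNA,
        # double the RTO (bounded by MAX_RTO), and restart the timer.
        self.retransmissions.append(now)
        self.rto = min(self.rto * 2, self.max_rto)
        self.timer_start = now

    def on_acceptable_ack(self, ack, now):
        # Step (A): terminate the algorithm; standard processing resumes.
        self.in_recovery = False
        self.snd_una = ack

    def on_icmp_unreachable(self, seg_seq, now):
        # Only evaluated during timeout-based loss recovery (step 3c).
        if not self.in_recovery or self.backoff_cnt == 0:   # step (4)
            return
        if seg_seq != self.snd_una:                         # step (6)
            return
        self.backoff_cnt -= 1                               # step (7)
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        if now - self.timer_start >= self.rto:              # step (8)
            self.on_timer_expired(now)  # ICMP-message-clocked retransmit


# Example trace with RTO = 1 s and no upper bound.
s = TcpLcdSender(rto=1.0, snd_una=1000)
s.on_timer_expired(1.0)           # RTO 1 -> 2 s, BACKOFF_CNT = 1
s.on_timer_expired(3.0)           # RTO 2 -> 4 s, BACKOFF_CNT = 2
s.on_icmp_unreachable(1000, 3.5)  # undo: RTO = 2 s, timer still running
s.on_icmp_unreachable(1000, 4.2)  # undo: RTO = 1 s, 1.2 s elapsed: resend
print(s.retransmissions)
```

   In the trace, the second ICMP unreachable message shortens the RTO
   back to RTO_BASE while 1.2 seconds have already elapsed since the
   last retransmission, so step (8) triggers an immediate retransmission
   instead of waiting for the backed-off timer.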

   If the first received packet after the retransmission(s) is an
   acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow
   start the connection and terminate the algorithm (step A).  Later
   ICMP unreachable messages from the just-terminated timeout-based loss
   recovery are of no use and are therefore ignored, since the ACK clock
   is already restarting due to the successful retransmission.

   On the other hand, if the first received packet after the
   retransmission(s) is an ICMP unreachable message (step 3c) and if
   step (4) allows, a TCP SHOULD undo one backoff for each ICMP
   unreachable message reporting an error on a retransmission.  To
   decide whether an ICMP unreachable message reports on a
   retransmission, the sequence number contained therein is used (steps
   5 and 6).  The undo is performed by re-calculating the RTO with the
   decremented "BACKOFF_CNT" variable (step 7).  This calculation
   explicitly matches the (bounded) exponential backoff specified in
   rule (5.5) of [RFC2988].

   Upon receipt of an ICMP unreachable message that legitimately undoes
   one backoff, the shortened retransmission timer may already have
   expired (step 8).  In this case, a TCP SHOULD retransmit immediately,
   i.e., perform an ICMP-message-clocked retransmission.  If the
   shortened retransmission timer has not yet expired, the TCP MUST wait
   accordingly.

5.  Discussion of TCP-LCD

   TCP-LCD takes care to react only to connectivity disruption
   indications in the form of ICMP unreachable messages received during
   timeout-based loss recovery.  Therefore, TCP's behavior is not
   altered when either no ICMP unreachable messages are received or the
   retransmission timer of the TCP sender has not expired since the last
   acceptable ACK was received.  Thus, by definition, the algorithm
   triggers only in the case of long connectivity disruptions.
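
   To make the filtering concrete, the following Python sketch shows one
   way to perform steps (5) and (6) for ICMPv4: it parses a destination
   unreachable message and evaluates it only for the matching
   connection, only during timeout-based loss recovery, and only if it
   reports on SND.UNA.  It relies on the RFC 792 layout (an 8-byte ICMP
   header followed by the quoted IP header and at least the first 64
   bits of the TCP header).  The function names and the four-tuple
   representation are hypothetical, and the input is assumed to start
   at the ICMP header, with any outer IP header already stripped.

```python
import socket
import struct
from typing import NamedTuple, Optional, Tuple

ICMP_DEST_UNREACH = 3
LCD_CODES = (0, 1)  # net unreachable, host unreachable


class QuotedSegment(NamedTuple):
    code: int
    src: str
    dst: str
    sport: int
    dport: int
    seq: int  # SEG.SEQ of the segment that elicited the message


def parse_icmp_unreachable(icmp: bytes) -> Optional[QuotedSegment]:
    """Step (5): return the quoted TCP identifiers, or None if unusable."""
    if len(icmp) < 8 or icmp[0] != ICMP_DEST_UNREACH:
        return None
    if icmp[1] not in LCD_CODES:
        return None
    inner = icmp[8:]                      # quoted IP header + payload
    if len(inner) < 20:
        return None
    ihl = (inner[0] & 0x0F) * 4           # quoted IP header length
    if ihl < 20 or inner[9] != socket.IPPROTO_TCP or len(inner) < ihl + 8:
        return None
    src = socket.inet_ntoa(inner[12:16])
    dst = socket.inet_ntoa(inner[16:20])
    # First 64 bits of the TCP header: source port, destination port, SEQ.
    sport, dport, seq = struct.unpack("!HHI", inner[ihl:ihl + 8])
    return QuotedSegment(icmp[1], src, dst, sport, dport, seq)


def lcd_should_evaluate(q: Optional[QuotedSegment],
                        conn: Tuple[str, str, int, int],
                        in_recovery: bool, snd_una: int) -> bool:
    """Same connection, in timeout-based recovery, SEG.SEQ == SND.UNA."""
    if q is None or not in_recovery:
        return False
    return (q.src, q.dst, q.sport, q.dport) == conn and q.seq == snd_una


# Example: a host unreachable (code 1) message quoting SEG.SEQ 123456.
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64,
                     socket.IPPROTO_TCP, 0,
                     socket.inet_aton("192.0.2.1"),
                     socket.inet_aton("198.51.100.7"))
tcp_first_64_bits = struct.pack("!HHI", 40000, 80, 123456)
msg = struct.pack("!BBHI", 3, 1, 0, 0) + ip_hdr + tcp_first_64_bits
conn = ("192.0.2.1", "198.51.100.7", 40000, 80)
print(lcd_should_evaluate(parse_icmp_unreachable(msg), conn, True, 123456))
```

   Messages with other codes, such as code 3 (port unreachable), are
   rejected by the parser, mirroring the restriction of TCP-LCD to
   codes 0 and 1 described in Section 3.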

   Only those ICMP unreachable messages that report on the sequence
   number of a retransmission, i.e., on SND.UNA, are evaluated by
   TCP-LCD.  All other ICMP unreachable messages are ignored.  The
   arrival of such ICMP unreachable messages provides strong evidence
   that the retransmissions were not dropped due to congestion but
   instead were successfully delivered to the temporary end-point of the
   employed path, i.e., the reporting router.  In other words, there is
   no evidence of congestion, at least on that very part of the path
   that was traveled by both the TCP segment eliciting the ICMP
   unreachable message and the ICMP unreachable message itself.

   However, there are some situations where TCP-LCD makes a false
   decision and wrongly undoes a retransmission timer backoff.  This can
   happen even though the received ICMP unreachable message reports on
   the sequence number of a retransmission (SND.UNA), because the TCP
   segment that elicited the ICMP unreachable message may either not be
   a retransmission (Section 5.1) or not belong to the current
   timeout-based loss recovery (Section 5.2).  Finally, packet
   duplication (Section 5.3) can also spuriously trigger the algorithm.

   Section 5.4 discusses possible probing frequencies, while Section 5.5
   describes the motivation for not reacting to ICMP unreachable
   messages while TCP is in steady state.

5.1.  Retransmission Ambiguity

   Historically, the retransmission ambiguity problem [Zh86], [KP87] is
   the TCP sender's inability to distinguish whether the first
   acceptable ACK after a retransmission refers to the original
   transmission or the retransmission.  This problem occurs after both a
   Fast Retransmit and a timeout-based retransmit.  However, modern TCP
   implementations can eliminate the retransmission ambiguity with the
   help of either Eifel [RFC3522], [RFC4015] or Forward RTO-Recovery
   (F-RTO) [RFC5682].

   The revert strategy of the given algorithm suffers from a form of
   retransmission ambiguity, too.  In contrast to the aforementioned
   case, TCP suffers from ambiguity regarding ICMP unreachable messages
   received during timeout-based loss recovery.  Because the TCP
   sequence number included in the ICMP unreachable message is the same
   for the original transmission and all retransmissions, a TCP sender
   is not able to determine whether the ICMP unreachable message refers
   to the original transmission or to any of the timeout-based
   retransmissions.

   However, this ambiguity is not considered a problem for the
   algorithm.  The assumption that a received ICMP message provides
   evidence that one non-congestion loss caused by the connectivity
   disruption was wrongly considered a congestion loss still holds,
   regardless of which TCP segment, transmission or retransmission, the
   message refers to.

5.2.  Wrapped Sequence Numbers

   Besides the ambiguity of whether a received ICMP unreachable message
   refers to the original transmission or to any of the retransmissions,
   there is another source of ambiguity about the TCP sequence numbers
   contained in ICMP unreachable messages.  For high-bandwidth paths,
   such as modern gigabit links, the sequence space may wrap rather
   quickly, thereby allowing the possibility that delayed ICMP
   unreachable messages (a router dropping packets due to a link outage
   is not obliged to send ICMP unreachable messages in a timely manner
   [RFC1812]) coincidentally fit as valid input to the proposed scheme.
   As a result, the scheme may wrongly undo retransmission timer
   backoffs.
   The chances of this happening are minuscule, since a particular ICMP
   message would need to contain the exact sequence number of the
   current oldest outstanding segment (SND.UNA) while, at the same time,
   TCP is in timeout-based loss recovery.  However, two "worst case"
   scenarios for the algorithm are possible:

   First, consider a steady-state TCP connection that is disrupted at an
   intermediate router R due to a link outage.  Upon the expiration of
   the RTO, the TCP sender enters timeout-based loss recovery and starts
   to retransmit the earliest segment that has not been acknowledged
   (SND.UNA).  For whatever reason, router R delays all corresponding
   ICMP unreachable messages, so that the TCP sender backs off the
   retransmission timer normally without any undoing.  At the end of the
   connectivity disruption, the TCP sender eventually detects the
   re-establishment and leaves the scheme and, finally, timeout-based
   loss recovery, too.  One sequence number wrap-around later, the
   connectivity between the two peers is disrupted again, but this time
   due to congestion and exactly at the time at which the current
   SND.UNA matches the SND.UNA from the previous cycle.  If router R
   emits the delayed ICMP unreachable messages now, the TCP sender would
   wrongly undo retransmission timer backoffs.  As the TCP sequence
   number contains 32 bits, the probability of this scenario is at most
   1/2^32.  Given sufficiently many retransmissions in the first
   timeout-based loss recovery, the corresponding ICMP unreachable
   messages could reduce the RTO in the second recovery at most to
   "RTO_BASE".  However, once the ICMP unreachable messages are
   depleted, the standard exponential backoff will be performed.  Thus,
   the congestion response will only be delayed by some spurious
   retransmissions.
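
   To put "rather quickly" into numbers, the following Python snippet
   (an illustrative calculation, assuming a sender that transmits
   continuously at the given rate; the function names are hypothetical)
   computes how long it takes to cycle through the 2^32-byte sequence
   space, and evaluates the 1/2^32 bound given above.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and count bytes


def wrap_time_seconds(rate_bits_per_second):
    """Time needed to send 2^32 bytes, i.e., to wrap the sequence space."""
    return SEQ_SPACE * 8 / rate_bits_per_second


def match_probability_bound(n=1):
    """Upper bound n/2^32 on a coincidental match with SND.UNA."""
    return n / SEQ_SPACE


for rate in (1e8, 1e9, 1e10):
    print(f"{rate / 1e9:g} Gbit/s: wraps every "
          f"{wrap_time_seconds(rate):.2f} s")
print(f"1/2^32 = {match_probability_bound():.2e}")
```

   At 1 Gbit/s, the sequence space wraps roughly every 34 seconds; at
   10 Gbit/s, roughly every 3.4 seconds.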

Similar to the above, consider a steady-state TCP connection with n segments in flight that is disrupted at some point by an intermediate router R due to a link outage. For each segment in flight, router R may generate an ICMP unreachable message, but for some reason delays it. Once the link outage is over and the connection is re-established, the TCP sender leaves the scheme and slow-starts the connection. Following a sequence number wrap-around, a retransmission timeout occurs just at the moment the TCP sender's current window of data reaches the previous range of the sequence number space again. If router R emits the delayed ICMP unreachable messages now, one spurious undoing of the retransmission timer backoff is possible if, firstly, the TCP sequence number contained in the ICMP unreachable messages matches the current SND.UNA and, secondly, the timeout was a result of congestion. In the case of another connectivity disruption, the additional undoing of the retransmission timer backoff has no impact. The probability of this scenario is at most n/2^32.

5.3. Packet Duplication

If an intermediate router duplicates packets, a TCP sender may receive more ICMP unreachable messages during timeout-based loss recovery than it has actually sent timeout-based retransmissions. However, since TCP-LCD keeps track of the number of performed retransmission timer backoffs in the "BACKOFF_CNT" variable, it will not undo more retransmission timer backoffs than were actually performed. Nevertheless, if packet duplication and congestion coincide on the path between the two communicating hosts, duplicated ICMP messages could hide the congestion loss of some retransmissions or ICMP messages, and the algorithm may wrongly undo retransmission timer backoffs.
Considering the overall impact of a router that duplicates packets, the additional load induced by a few spurious timeout-based retransmissions can probably be neglected.

5.4. Probing Frequency

One could argue that if an ICMP unreachable message arrives for a timeout-based retransmission, the RTO should be reset or recalculated, similar to what is done when an ACK arrives during timeout-based loss recovery (see Karn's algorithm [KP87], [RFC2988]), and a new retransmission should be sent immediately. Generally, this would allow a much higher probing frequency, based on the round-trip time to the router where the connectivity is disrupted. However, we believe the current scheme provides a good trade-off between conservative behavior and fast detection of connectivity re-establishment.

5.5. Reaction in Steady-State

Another exploitation of ICMP unreachable messages in the context of TCP congestion control might seem appropriate if an ICMP unreachable message is received while TCP is in steady state and the message refers to a segment from within the current window of data. As the RTT to the router that generates the ICMP unreachable message is likely to be substantially shorter than the overall RTT to the destination, the ICMP unreachable message may very well reach the originating TCP while it is still transmitting the current window of data. If the remaining window is large, it might seem appropriate to refrain from transmitting the rest of the window, as there is timely evidence that it will only trigger further ICMP unreachable messages at the same router. Although this promises an improvement from a wastage perspective, it may be counterproductive from a security perspective.
An attacker could forge such ICMP messages, thereby forcing the originating TCP to stop sending data, much like the blind throughput-reduction attack described in [I-D.ietf-tcpm-icmp-attacks].

An additional consideration is the following: in the presence of multi-path routing, even the receipt of a legitimate ICMP unreachable message cannot be exploited accurately, because it is possible that only one of the multiple paths to the destination suffers from the connectivity disruption that causes ICMP unreachable messages to be sent. The path along which the connectivity disruption occurred may have contributed considerably to the overall bandwidth, in which case a congestion response is quite reasonable. However, this is not necessarily the case, and a TCP has no means other than its inherent congestion control to decide on this matter. All in all, for a connection in steady state, i.e., not in timeout-based loss recovery, reacting to ICMP unreachable messages with respect to congestion control does not seem appropriate. For timeout-based retransmissions, however, there is a reasonable congestion response, namely skipping further retransmission timer backoffs because there is no congestion indication, as described above.

6. Dissolving Ambiguity Issues (the Safe Variant)

If the TCP Timestamps option [I-D.ietf-tcpm-1323bis] is enabled for a connection, a TCP sender MAY use the following algorithm to dissolve the ambiguity issues described in Sections 5.1, 5.2, and 5.3. In particular, both the retransmission ambiguity and the packet duplication problems are prevented by the following TCP-LCD variant.
On the other hand, the false positives caused by wrapped sequence numbers cannot be completely avoided, but their likelihood is further reduced by a factor of 2^32, since the Timestamp Value field (TSval) of the TCP Timestamps option contains 32 bits.

Hence, implementers may choose to implement TCP-LCD with the following modifications.

Step (1) is replaced by step (1'):

(1') Before TCP updates the variable "RTO" when it initiates timeout-based loss recovery, set the variables "BACKOFF_CNT" and "RTO_BASE" and the data structure "RETRANS_TS" as follows:

        BACKOFF_CNT := 0;
        RTO_BASE := RTO;
        RETRANS_TS := [];

     Proceed to step (R).

Step (2) is extended by step (2b):

(2b) Store the value of the Timestamp Value field (TSval) of the TCP Timestamps option included in the retransmission "RET" sent in step (R) in the "RETRANS_TS" data structure:

        RETRANS_TS.add(RET.TSval);

Step (6) is replaced by step (6'):

(6') If "SEG.SEQ == SND.UNA && RETRANS_TS.exists(SEG.TSval)", i.e., if the TCP segment "SEG" eliciting the ICMP unreachable message "ICMP_DU" carries the sequence number of a retransmission and the value of its Timestamp Value field (TSval) matches one stored in "RETRANS_TS" in step (2b), then

        proceed to step (7');

     else

        proceed to step (3).

Step (7) is replaced by step (7'):

(7') Undo the last retransmission timer backoff:

        RETRANS_TS.remove(SEG.TSval);
        BACKOFF_CNT := BACKOFF_CNT - 1;
        RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

The downside of the safe variant is twofold. Firstly, the modifications come at a cost: the TCP sender is required to store the timestamps of all retransmissions sent during one timeout-based loss recovery.
Secondly, the safe variant can only undo a retransmission timer backoff if the intermediate router experiencing the link outage implements [RFC1812] and chooses to include enough of the triggering datagram's payload beyond the first 64 bits for the TCP Timestamps option to be contained in the ICMP unreachable message.

7. Interoperability Issues

This section discusses interoperability issues related to introducing TCP-LCD.

7.1. Detection of TCP Connection Failures

TCP-LCD may have side effects on TCP implementations that attempt to detect TCP connection failures by counting timeout-based retransmissions. Section 4.2.3.5 of RFC 1122 [RFC1122] states that a TCP host must handle excessive retransmissions of data segments with two thresholds, R1 and R2, measuring the amount of retransmission that has occurred for the same segment. Both thresholds may be measured either in time units or as a count of retransmissions.

Due to TCP-LCD's revert strategy of the retransmission timer, the assumption that a certain number of retransmissions corresponds to a specific time interval no longer holds, as additional retransmissions may be performed during timeout-based loss recovery to detect the end of the connectivity disruption. Therefore, a TCP employing TCP-LCD SHOULD either measure the thresholds R1 and R2 in time units or, if R1 and R2 are counters of retransmissions, convert them into the time intervals that an unmodified TCP would need to reach the specified number of retransmissions.

7.2. Explicit Congestion Notification

With Explicit Congestion Notification (ECN) [RFC3168], ECN-capable routers are no longer limited to dropping packets as a congestion indication. Instead, they can set the Congestion Experienced (CE) codepoint in the IP header of packets to indicate congestion.
Concerning TCP-LCD, it is possible that during a connectivity disruption, a received ICMP unreachable message has been elicited by a timeout-based retransmission that was marked with the CE codepoint before reaching the router experiencing the link outage. In such a case, if the algorithm undoes a retransmission timer backoff, the TCP sender SHOULD additionally reset the retransmission timer.

7.3. ICMP for IP version 6

RFC 4443 [RFC4443] specifies the Internet Control Message Protocol (ICMPv6) to be used with the Internet Protocol version 6 (IPv6) [RFC2460]. From TCP-LCD's point of view, it is important to note that for IPv6, the payload of an ICMPv6 error message has to include as many bytes of the IPv6 datagram that elicited the ICMPv6 error message as possible without making the error message exceed the minimum IPv6 MTU (1280 bytes) [RFC4443]. Thus, more information is available to TCP-LCD than in the case of IPv4, where only the IP header and the first 64 bits of the original datagram's payload must be included [RFC0792].

The counterpart of the ICMPv4 destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As in the IPv4 case, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because the router lacks a matching entry in its routing table. As a result, TCP-LCD can employ these ICMPv6 error messages as connectivity disruption indications, too.

7.4. TCP-LCD and IP Tunnels

It is worth noting that IP tunnels, including IPsec [RFC4301], IP in IP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], and others, are compatible with TCP-LCD, as long as the received ICMP unreachable messages can be demultiplexed and extracted appropriately by the TCP sender during timeout-based loss recovery.
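The extraction of SEG.SEQ on which the algorithm depends can be sketched in Python as follows. The sketch assumes it is handed the ICMP message itself, starting at the ICMP type field (the outer IP header already removed), and it only recognizes the message types named in Section 7.3: ICMPv4 destination unreachable with code 0 or 1, and ICMPv6 destination unreachable with code 0. It relies on the source port, destination port, and sequence number occupying the first 8 bytes of the TCP header, so they fit within the 64 bits of payload that ICMPv4 must quote [RFC0792]. The function and type names are hypothetical; the sketch does not verify checksums, does not walk IPv6 extension headers, and leaves matching the result against a connection's four-tuple to the caller.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional, Tuple

IPPROTO_TCP = 6


class QuotedTcp(NamedTuple):
    """Fields recovered from the datagram quoted inside an ICMP error."""
    src: str
    dst: str
    sport: int
    dport: int
    seq: int  # SEG.SEQ of the segment that elicited the ICMP message


def _tcp_prefix(ip: bytes, off: int) -> Optional[Tuple[int, int, int]]:
    # Ports and sequence number are the first 8 bytes of the TCP header,
    # i.e., within the 64 bits of payload that ICMPv4 has to quote.
    if len(ip) < off + 8:
        return None
    return struct.unpack_from("!HHI", ip, off)


def parse_icmpv4_unreach(icmp: bytes) -> Optional[QuotedTcp]:
    """Type 3 (destination unreachable), code 0 (net) or 1 (host)."""
    if len(icmp) < 8 + 20 or icmp[0] != 3 or icmp[1] not in (0, 1):
        return None
    ip = icmp[8:]                 # skip type, code, checksum, unused
    ihl = (ip[0] & 0x0F) * 4      # quoted IPv4 header length in bytes
    if ip[0] >> 4 != 4 or ihl < 20 or ip[9] != IPPROTO_TCP:
        return None
    tcp = _tcp_prefix(ip, ihl)
    if tcp is None:
        return None
    return QuotedTcp(str(ipaddress.IPv4Address(ip[12:16])),
                     str(ipaddress.IPv4Address(ip[16:20])), *tcp)


def parse_icmpv6_unreach(icmp: bytes) -> Optional[QuotedTcp]:
    """Type 1 (destination unreachable), code 0 (no route to destination).

    Assumes TCP directly follows the fixed 40-byte IPv6 header; extension
    headers are not walked in this sketch.
    """
    if len(icmp) < 8 + 40 or icmp[0] != 1 or icmp[1] != 0:
        return None
    ip = icmp[8:]
    if ip[0] >> 4 != 6 or ip[6] != IPPROTO_TCP:
        return None
    tcp = _tcp_prefix(ip, 40)
    if tcp is None:
        return None
    return QuotedTcp(str(ipaddress.IPv6Address(ip[8:24])),
                     str(ipaddress.IPv6Address(ip[24:40])), *tcp)
```

A caller would then compare the returned seq field with SND.UNA, as in the "SEG.SEQ == SND.UNA" test of step (6'). Returning None for anything unrecognized keeps malformed or truncated messages from ever reaching the undo logic.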

If, for example, end-to-end tunnels such as IPsec in transport mode [RFC4301] are employed, a TCP sender may receive ICMP unreachable messages for which additional steps, e.g., decryption in step (5) of the algorithm, are needed to extract the TCP header. Provided that the received ICMP unreachable message contains enough information, i.e., SEG.SEQ is extractable, this information MAY still be used as valid input to the proposed algorithm.

Likewise, if IP encapsulation such as [RFC2003] is used on some part of the path between the communicating hosts, the tunnel ingress node, rather than the TCP sender, may receive the ICMP unreachable messages from an intermediate router experiencing the link outage. Nevertheless, the tunnel ingress node may replay the ICMP unreachable messages in order to inform the TCP sender. If enough information is preserved to extract SEG.SEQ, the replayed ICMP unreachable messages MAY still be used in TCP-LCD.

8. Related Work

The literature describes several methods that address TCP's problems in the presence of connectivity disruptions. Some of them try to improve TCP's performance by modifying lower layers. For example, [SM03] introduces a "smart link layer", which buffers one segment for each active connection and replays these segments upon connectivity re-establishment. This approach has a serious drawback: previously stateless intermediate routers have to be modified in order to inspect TCP headers, track the end-to-end connection, and provide additional buffer space. Altogether, this leads to additional memory and processing requirements.

On the other hand, stateless link layer schemes such as those proposed in [RFC3819], which unconditionally buffer a small number of packets, may have another problem: if a packet is buffered longer than the maximum segment lifetime (MSL) of 2 minutes [RFC0793], i.e., if the disconnection lasts longer than the MSL, TCP's assumption that such segments will never be received no longer holds, violating TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].

Other approaches, such as TCP-F [CRVP01] or Explicit Link Failure Notification (ELFN) [HV02], inform a TCP sender about a disrupted path by means of special messages generated and sent by intermediate routers. In case of a link failure, the TCP sender stops sending segments and freezes its retransmission timers. TCP-F stays in this state and remains silent until either a "route establishment notification" is received or an internal timer expires. In contrast, ELFN periodically probes the network to detect connectivity re-establishment. Both proposals rely on changes to intermediate routers, whereas the scheme proposed in this document is a sender-only modification. Moreover, ELFN does not consider congestion and may impose serious additional load on the network, depending on the probe interval.

The authors of ATCP [LS01] propose enhancements to identify different types of packet loss by introducing a layer between TCP and IP. They use ICMP destination unreachable messages to set TCP's receiver advertised window to zero, thus forcing the TCP sender to perform zero window probing with an exponential backoff. ICMP destination unreachable messages that arrive during this probing period are ignored. This approach is nearly orthogonal to this document, which exploits ICMP messages to undo a retransmission timer backoff when TCP is already probing.
In principle, both mechanisms could be combined; however, due to security considerations, it does not seem appropriate to adopt ATCP's reaction, as discussed in Section 5.5.

Schuetz et al. describe in [I-D.schuetz-tcpm-tcp-rlci] a set of TCP extensions that improve TCP's behavior when transmitting over paths whose characteristics can change on short time-scales. Their proposed extensions modify the local behavior of TCP and introduce a new TCP option to signal locally received connectivity-change indications (CCIs) to remote peers. Upon reception of a CCI, they re-probe the path characteristics either by performing a speculative retransmission or by sending a single segment of new data, depending on whether the connection is currently stalled in exponential backoff or transmitting in steady state, respectively. The authors focus on specifying TCP response mechanisms; nevertheless, underlying layers would have to be modified to explicitly send CCIs to make these immediate responses possible.

9. IANA Considerations

This memo includes no request to IANA.

10. Security Considerations

The algorithm proposed in this document is considered to be secure. For example, an attacker who has already guessed the correct four-tuple (i.e., Source IP Address, Source TCP Port, Destination IP Address, and Destination TCP Port) still cannot make a TCP modified with TCP-LCD flood the network just by sending forged ICMP unreachable messages in an attempt to maliciously shorten the retransmission timer. The attacker would additionally need to guess the correct segment sequence number of the current timeout-based retransmission, which succeeds with a probability of at most 1/2^32.
Even in the case of man-in-the-middle attacks, i.e., attacks performed in scenarios in which the attacker can sniff the retransmissions, the impact on network load is considered to be low, since the retransmission frequency is limited by the RTO that was computed before TCP entered timeout-based loss recovery ("RTO_BASE"). Hence, the probing frequency is not expected to exceed one probe per minimum RTO, i.e., one probe per second, as specified by [RFC2988].

11. Acknowledgments

We would like to thank Ilpo Jarvinen, Pasi Sarolahti, Timothy Shepard, Joe Touch, and Carsten Wolff for feedback on earlier versions of this document. We also thank Michael Faber, Daniel Schaffrath, and Damian Lukowski for implementing and testing the algorithm in Linux. Special thanks go to Ilpo Jarvinen for giving valuable feedback regarding the Linux implementation.

This work has been supported by the German National Science Foundation (DFG) within the research excellence cluster Ultra High-Speed Mobile Information and Communication (UMIC), RWTH Aachen University.

12. References

12.1. Normative References

[I-D.ietf-tcpm-1323bis] Borman, D., Braden, R., and V. Jacobson, "TCP Extensions for High Performance", draft-ietf-tcpm-1323bis-01 (work in progress), March 2009.

[RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, September 1981.

[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1812] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995.

[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission Timer", RFC 2988, November 2000.

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.

12.2. Informative References

[CRVP01] Chandran, K., Raghunathan, S., Venkatesan, S., and R.
          Prakash, "A feedback-based scheme for improving TCP performance in ad hoc wireless networks", IEEE Personal Communications vol. 8, no. 1, pp. 34-39, February 2001.

[HV02] Holland, G. and N. Vaidya, "Analysis of TCP performance over mobile ad hoc networks", Wireless Networks vol. 8, no. 2-3, pp. 275-288, March 2002.

[I-D.eggert-tcpm-tcp-retransmit-now] Eggert, L., "TCP Extensions for Immediate Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02 (work in progress), June 2005.

[I-D.ietf-tcpm-icmp-attacks] Gont, F., "ICMP attacks against TCP", draft-ietf-tcpm-icmp-attacks-06 (work in progress), August 2009.

[I-D.schuetz-tcpm-tcp-rlci] Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami, Y., and K. Le, "TCP Response to Lower-Layer Connectivity-Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work in progress), February 2008.

[KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols", Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'87) pp. 2-7, August 1987.

[LS01] Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc networks", IEEE Journal on Selected Areas in Communications vol. 19, no. 7, pp. 1300-1315, July 2001.

[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[RFC0826] Plummer, D., "Ethernet Address Resolution Protocol: Or converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware", STD 37, RFC 826, November 1982.

[RFC1122] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for TCP", RFC 3522, April 2003.

[RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno Modification to TCP's Fast Recovery Algorithm", RFC 3782, April 2004.

[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, July 2004.

[RFC4015] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for TCP", RFC 4015, February 2005.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", RFC 4443, March 2006.

[RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious Retransmission Timeouts with TCP", RFC 5682, September 2009.

[SESB05] Schuetz, S., Eggert, L., Schmid, S., and M. Brunner, "Protocol enhancements for intermittently connected hosts", SIGCOMM Computer Communication Review vol. 35, no. 3, pp. 5-18, December 2005.

[SM03] Scott, J. and G. Mapp, "Link layer-based TCP optimisation for disconnecting networks", SIGCOMM Computer Communication Review vol. 33, no. 5, pp. 31-42, October 2003.

[Zh86] Zhang, L., "Why TCP Timers Don't Work Well", Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'86) pp. 397-405, August 1986.

Appendix A. Changes from previous versions of the draft

A.1. Changes from draft-zimmermann-tcp-lcd-02

   o  Incorporated feedback submitted by Ilpo Jarvinen.

   o  Incorporated feedback submitted by Pasi Sarolahti.

   o  Incorporated feedback submitted by Joe Touch.

   o  Extended and reorganized the discussion (Section 5):

      *  Every discussion item got its own title, for a better overview.

      *  Extended the Retransmission Ambiguity section. Also added some references to the historical retransmission ambiguity problem.

      *  Heavily extended the discussion about wrapped sequence numbers (see Joe's comments).

      *  Described the influence of packet duplication on the algorithm (thanks to Ilpo).

      *  The section "Protecting Against Misbehaving Routers" is no longer a subsection. Moreover, the section was renamed to "Dissolving Ambiguity Issues" and now has real content.

   o  An interoperability issues section (Section 7) was added. In particular, comments on ECN, ICMPv6, and the two thresholds R1 and R2 of [RFC1122] (Section 4.2.3.5) were added.

   o  Miscellaneous editorial changes. In particular, the algorithm now has a name: TCP-LCD.

A.2. Changes from draft-zimmermann-tcp-lcd-01

   o  The algorithm in Section 4.2 was slightly changed. Instead of reverting the last retransmission timer backoff by halving the RTO, the RTO is recalculated with the help of the "BACKOFF_CNT" variable. This fixes an issue that occurred when the retransmission timer was backed off but bounded by a maximum value.
      The algorithm in the previous version of the draft would have "reverted" to half of that maximum value, instead of to the value before the RTO was doubled (and then bounded).

   o  Miscellaneous editorial changes.

A.3. Changes from draft-zimmermann-tcp-lcd-00

   o  Miscellaneous editorial changes in Sections 1, 2, and 3.

   o  The document was restructured in Sections 1, 2, and 3 for easier reading. The motivation for the algorithm was changed according to TCP's problem of disambiguating congestion from non-congestion loss.

   o  Added Section 4.1.

   o  The algorithm in Section 4.2 was restructured and simplified:

      *  The special case of the first received ICMP destination unreachable message after an RTO was removed.

      *  The "BACKOFF_CNT" variable was introduced, so it is no longer possible to perform more reverts than backoffs.

   o  The discussion in Section 5 was improved and expanded according to the algorithm changes.

Authors' Addresses

Alexander Zimmermann
RWTH Aachen University
Ahornstrasse 55
Aachen, 52074
Germany

Phone: +49 241 80 21422
Email: zimmermann@cs.rwth-aachen.de

Arnd Hannemann
RWTH Aachen University
Ahornstrasse 55
Aachen, 52074
Germany

Phone: +49 241 80 21423
Email: hannemann@nets.rwth-aachen.de