TCP Maintenance and Minor                                  A. Zimmermann
Extensions (TCPM) WG                                        A. Hannemann
Internet-Draft                                    RWTH Aachen University
Intended status: Experimental                         September 14, 2010
Expires: March 18, 2011


    Making TCP more Robust to Long Connectivity Disruptions (TCP-LCD)
                       draft-ietf-tcpm-tcp-lcd-03

Abstract

Disruptions in end-to-end path connectivity, which last longer than one retransmission timeout, cause suboptimal TCP performance. The reason for this performance degradation is that TCP interprets segment loss induced by long connectivity disruptions as a sign of congestion, resulting in repeated retransmission timer backoffs. This, in turn, leads to a delayed detection of the re-establishment of the connection since TCP waits for the next retransmission timeout before it attempts a retransmission.

This document proposes an algorithm to make TCP more robust to long connectivity disruptions (TCP-LCD). It describes how standard ICMP messages can be exploited during timeout-based loss recovery to disambiguate true congestion loss from non-congestion loss caused by connectivity disruptions. Moreover, a reversion strategy of the retransmission timer is specified that enables a more prompt detection of whether or not the connectivity to a previously disconnected peer node has been restored. TCP-LCD is a TCP sender-only modification that effectively improves TCP performance in case of connectivity disruptions.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 18, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
   3.  Connectivity Disruption Indication
   4.  Connectivity Disruption Reaction
     4.1.  Basic Idea
     4.2.  Algorithm Details
   5.  Discussion of TCP-LCD
     5.1.  Retransmission Ambiguity
     5.2.  Wrapped Sequence Numbers
     5.3.  Packet Duplication
     5.4.  Probing Frequency
     5.5.  Reaction during Connection Establishment
     5.6.  Reaction in Steady-State
   6.  Dissolving Ambiguity Issues using the TCP Timestamps Option
   7.  Interoperability Issues
     7.1.  Detection of TCP Connection Failures
     7.2.  Explicit Congestion Notification (ECN)
     7.3.  TCP-LCD and IP Tunnels
   8.  Related Work
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Changes from previous versions of the draft
     A.1.  Changes from draft-ietf-tcpm-tcp-lcd-02
     A.2.  Changes from draft-ietf-tcpm-tcp-lcd-01
     A.3.  Changes from draft-ietf-tcpm-tcp-lcd-00
     A.4.  Changes from draft-zimmermann-tcp-lcd-02
     A.5.  Changes from draft-zimmermann-tcp-lcd-01
     A.6.  Changes from draft-zimmermann-tcp-lcd-00
   Authors' Addresses

1.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The reader should be familiar with the algorithm and terminology from [RFC2988], which defines the standard algorithm Transmission Control Protocol (TCP) senders are required to use to compute and manage their retransmission timer. In this document, the terms "retransmission timer" and "retransmission timeout" are used as defined in [RFC2988]. The retransmission timer ensures data delivery in the absence of any feedback from the receiver. The duration of this timer is referred to as retransmission timeout (RTO).

As defined in [RFC0793], the term "acceptable acknowledgment (ACK)" refers to a TCP segment that acknowledges previously unacknowledged data. The TCP sender state variable "SND.UNA" and the current segment variable "SEG.SEQ" are used as defined in [RFC0793]. SND.UNA holds the segment sequence number of the earliest segment that has not been acknowledged by the TCP receiver (the oldest outstanding segment). SEG.SEQ is the segment sequence number of a given segment.

For the purposes of this specification, we define the term "timeout-based loss recovery" to refer to the state that a TCP sender enters upon the first timeout of the oldest outstanding segment (SND.UNA) and leaves upon the arrival of the *first* acceptable ACK. It is important to note that other documents use a different interpretation of the term "timeout-based loss recovery". For example, the NewReno modification to TCP's Fast Recovery algorithm [RFC3782] extends the period a TCP sender remains in timeout-based loss recovery compared to the one defined in this document. This is because [RFC3782] attempts to avoid unnecessary multiple Fast Retransmits that can occur after an RTO.

2.  Introduction

Connectivity disruptions can occur in many different situations. The frequency of connectivity disruptions depends on the properties of the end-to-end path between the communicating hosts. While connectivity disruptions can occur in traditional wired networks, e.g., caused by an unplugged network cable, the likelihood of their occurrence is significantly higher in wireless (multi-hop) networks. In particular, end-host mobility, network topology changes, and wireless interference are crucial factors. In the case of the Transmission Control Protocol (TCP) [RFC0793], the performance of the connection can experience a significant reduction compared to a permanently connected path [SESB05].
This is because TCP, which was originally designed to operate in fixed and wired networks, generally assumes that the end-to-end path connectivity is relatively stable over the connection's lifetime.

Depending on their duration, connectivity disruptions can be classified into two groups [I-D.schuetz-tcpm-tcp-rlci]: "short" and "long". A connectivity disruption is "short" if connectivity returns before the retransmission timer fires for the first time. In this case, TCP recovers lost data segments through Fast Retransmit and lost acknowledgments (ACK) through successfully delivered later ACKs. Connectivity disruptions are declared as "long" for a given TCP connection if the retransmission timer fires at least once before connectivity is resumed. Whether or not path characteristics, like the round trip time (RTT) or the available bandwidth, have changed when connectivity resumes after a disruption is another important aspect for TCP's retransmission scheme [I-D.schuetz-tcpm-tcp-rlci].

The algorithm specified in this document improves TCP's behavior in case of "long connectivity disruptions". In particular, it focuses on the period prior to the re-establishment of the connectivity to a previously disconnected peer node. The document does not describe any modifications to TCP's behavior and its congestion control mechanisms [RFC5681] after connectivity has been restored.

When a long connectivity disruption occurs on a TCP connection, the TCP sender eventually does not receive any more acknowledgments. After the retransmission timer expires, the TCP sender enters the timeout-based loss recovery and declares the oldest outstanding segment (SND.UNA) as lost. Since TCP tightly couples reliability and congestion control, the retransmission of SND.UNA is triggered together with the reduction of the transmission rate.
This is based on the assumption that segment loss is an indication of congestion [RFC5681]. As long as the connectivity disruption persists, TCP will repeat this procedure until the oldest outstanding segment has successfully been acknowledged, or until the connection has timed out. TCP implementations that follow the recommended retransmission timeout (RTO) management of RFC 2988 [RFC2988] double the RTO after each retransmission attempt. However, the RTO growth may be bounded by an upper limit, the maximum RTO, which is at least 60s, but may be longer: Linux, for example, uses 120s. If connectivity is restored between two retransmission attempts, TCP still has to wait until the retransmission timer expires before resuming transmission, since it simply does not have any means to know if the connectivity has been re-established. Therefore, depending on when connectivity becomes available again, this can waste up to a maximum RTO of possible transmission time.

This retransmission behavior is not efficient, especially in scenarios with long connectivity disruptions. In the ideal case, TCP would attempt a retransmission as soon as connectivity to its peer has been re-established. In this document, we specify a TCP sender-only modification to provide robustness to long connectivity disruptions (TCP-LCD). The memo describes how the standard Internet Control Message Protocol (ICMP) can be exploited during timeout-based loss recovery to identify non-congestion loss caused by long connectivity disruptions. TCP-LCD's reversion strategy of the retransmission timer enables higher-frequency retransmissions and thereby a prompt detection when connectivity to a previously disconnected peer node has been restored. If no congestion is present, TCP-LCD approaches the ideal behavior.
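
As a rough illustration of how much transmission time the standard backoff can waste, the following Python sketch lists the times at which an unmodified sender retransmits and computes the idle time after connectivity returns. It is not part of the specification: the function names, the 1 s initial RTO, and the 60 s maximum RTO are assumptions chosen for the example.

```python
def retransmission_schedule(initial_rto, max_rto, attempts):
    """Return (send_time, rto_after_backoff) for each timeout-based retransmission.

    Times are seconds since the retransmission timer was first armed.
    All names and values are illustrative assumptions, not taken from the memo.
    """
    schedule = []
    now = 0.0
    rto = initial_rto
    for _ in range(attempts):
        now += rto                    # the timer armed with the current RTO expires
        rto = min(rto * 2, max_rto)   # exponential backoff, bounded by the maximum RTO
        schedule.append((now, rto))
    return schedule


def wasted_time(restored_at, schedule):
    """Idle time between connectivity returning and the next retransmission."""
    for send_time, _ in schedule:
        if send_time >= restored_at:
            return send_time - restored_at
    raise ValueError("connectivity restored after the last scheduled attempt")


if __name__ == "__main__":
    plan = retransmission_schedule(initial_rto=1.0, max_rto=60.0, attempts=8)
    for send_time, rto in plan:
        print(f"retransmit at t={send_time:6.1f}s, next RTO={rto:5.1f}s")
    # Connectivity returning just after the retransmission at t=63s
    # leaves the sender idle for almost a full maximum RTO.
    print("wasted:", wasted_time(64.0, plan), "s")
```

With these assumed values, a path that comes back one second after the sixth retransmission stays unused for 59 seconds, which is the inefficiency TCP-LCD targets.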

Experimental results of a Linux implementation of TCP-LCD have been presented in [ZimHan09]. The implementation has been incorporated into mainline Linux, and is already used within the Internet. Thus far, no negative experiences have been reported that could be attributed to the algorithm. However, we consider TCP-LCD as experimental until more real-life results have been obtained. Nevertheless, we encourage implementation of TCP-LCD under other operating systems to provide for broader testing and experimentation opportunities.

3.  Connectivity Disruption Indication

If the queue of an intermediate router that is experiencing a link outage can buffer all incoming packets, a connectivity disruption will only cause a variation in delay, which is handled well by TCP implementations using either Eifel [RFC3522], [RFC4015] or Forward RTO-Recovery (F-RTO) [RFC5682]. However, if the link outage lasts for too long, the router experiencing the link outage is forced to drop packets, and finally to discard the corresponding route. Means to detect such link outages include reacting to failed address resolution protocol (ARP) [RFC0826] queries, unsuccessful link sensing, and the like. However, this is solely the responsibility of the respective router.

   Note: The focus of this memo is on introducing a method by which ICMP messages may be exploited to improve TCP's performance; how different physical and link layer mechanisms below the network layer may trigger ICMP destination unreachable messages is out of scope of this memo.

Provided that no other route to the specific destination exists, an Internet Protocol version 4 (IPv4) [RFC0791] router will notify the corresponding sending host about the dropped packets via ICMP destination unreachable messages of code 0 (net unreachable) or code 1 (host unreachable) [RFC1812].
Therefore, the sending host can use the ICMP destination unreachable messages of these codes as an indication for a connectivity disruption, since the reception of these messages provides evidence that packets were dropped due to a link outage.

For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table.

Note that there are also other ICMP and ICMPv6 destination unreachable messages with different codes. Some of them are candidates for connectivity disruption indications, too, but need further investigation. Examples are ICMP destination unreachable messages with code 5 (source route failed), code 11 (net unreachable for TOS), or code 12 (host unreachable for TOS) [RFC1812]. On the other hand, codes that flag hard errors are of no use for this scheme, since TCP should abort the connection when those are received [RFC1122].

For the sake of simplicity, we will use, unless explicitly qualified with ICMPv4 or ICMPv6, the term "ICMP unreachable message" as a synonym for ICMP destination unreachable messages of code 0 or code 1 and ICMPv6 destination unreachable of code 0. This implies that all keywords from [RFC2119] that deal with the handling of received ICMP messages apply in the same way to ICMPv6 messages.

The accurate interpretation of ICMP unreachable messages as a connectivity disruption indication is complicated by the following two peculiarities of ICMP messages.
First, they do not necessarily operate on the same timescale as the packets, i.e., the TCP segments, that elicited them. When a router drops a packet due to a missing route, it will not necessarily send an ICMP unreachable message immediately, but will rather queue it for later delivery. Second, ICMP messages are subject to rate limiting. For example, when a router drops a whole window of data due to a link outage, it is unlikely to send as many ICMP unreachable messages as dropped TCP segments. Depending on the load of the router, it may not even send any ICMP unreachable messages at all. Both peculiarities originate from [RFC1812] for ICMPv4 and [RFC4443] for ICMPv6.

Fortunately, according to [RFC0792], ICMPv4 unreachable messages have to contain in their body the entire IPv4 header [RFC0791] of the datagram eliciting the ICMPv4 unreachable message, plus the first 64 bits of the payload of that datagram. This allows the sending host to match the ICMPv4 error message to the transport connection that elicited it. RFC 1812 [RFC1812] augments these requirements and states that ICMPv4 messages should contain as much of the original datagram as possible without the length of the ICMPv4 datagram exceeding 576 bytes. Therefore, in case of TCP, at least the source port number, the destination port number, and the 32-bit TCP sequence number are included. This allows the originating TCP to demultiplex the received ICMPv4 message and to identify the affected connection. Moreover, it can identify which segment of the respective connection triggered the ICMPv4 unreachable message, unless there are several segments in-flight with the same sequence number (see Section 5.1).
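
The demultiplexing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any particular stack implements it: the function name is hypothetical, the input is assumed to be the raw ICMPv4 message starting at the ICMP header, and checksum verification is deliberately omitted.

```python
import struct

ICMP_DEST_UNREACH = 3        # ICMPv4 type "destination unreachable"
CODES_OF_INTEREST = {0, 1}   # code 0: net unreachable, code 1: host unreachable
IPPROTO_TCP = 6


def parse_icmpv4_unreachable(icmp):
    """Return (src_port, dst_port, seq) of the quoted TCP segment, or None.

    `icmp` is assumed to hold the ICMPv4 message starting at its own header.
    Checksums are not verified in this sketch.
    """
    if len(icmp) < 8 + 20 + 8:   # ICMP header + minimal IPv4 header + 64 bits
        return None
    icmp_type, code = icmp[0], icmp[1]
    if icmp_type != ICMP_DEST_UNREACH or code not in CODES_OF_INTEREST:
        return None
    quoted = icmp[8:]                    # quoted IPv4 header plus leading payload
    version = quoted[0] >> 4
    ihl = (quoted[0] & 0x0F) * 4         # IPv4 header length in bytes
    if version != 4 or ihl < 20 or quoted[9] != IPPROTO_TCP:
        return None
    if len(quoted) < ihl + 8:            # need the first 64 bits of the TCP header
        return None
    # The first 8 bytes of a TCP header: source port, destination port, sequence number.
    src_port, dst_port, seq = struct.unpack_from("!HHI", quoted, ihl)
    return src_port, dst_port, seq
```

The returned port pair, together with the addresses in the quoted IPv4 header, identifies the connection; the sequence number is what the algorithm in Section 4.2 compares against SND.UNA.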

For IPv6 [RFC2460], the payload of an ICMPv6 error message has to include as many bytes as possible from the IPv6 datagram that elicited the ICMPv6 error message, without making the error message exceed the minimum IPv6 MTU (1280 bytes) [RFC4443]. Thus, enough information is available to identify both the affected connection and the corresponding segment that triggered the ICMPv6 error message.

A connectivity disruption indication in the form of an ICMP unreachable message associated with a presumably lost TCP segment provides strong evidence that the segment was not dropped due to congestion, but was successfully delivered as far as the reporting router. It therefore did not witness any congestion at least on that part of the path that was traversed by both the TCP segment eliciting the ICMP unreachable message as well as the ICMP unreachable message itself.

4.  Connectivity Disruption Reaction

Section 4.1 introduces the basic idea of TCP-LCD. The complete algorithm is specified in Section 4.2.

4.1.  Basic Idea

The goal of the algorithm is to promptly detect when connectivity to a previously disconnected peer node has been restored after a long connectivity disruption, while retaining appropriate behavior in case of congestion. TCP-LCD exploits standard ICMP unreachable messages during timeout-based loss recovery. This increases TCP's retransmission frequency by undoing one retransmission timer backoff whenever an ICMP unreachable message is received that contains a segment with a sequence number of a presumably lost retransmission.

This approach has the advantage of appropriately reducing the probing rate in case of congestion. If either the retransmission itself or the corresponding ICMP message is dropped, the previously performed retransmission timer backoff is not undone, which effectively halves the probing rate.

4.2.  Algorithm Details

A TCP sender that uses RFC 2988 [RFC2988] to compute TCP's retransmission timer MAY employ the following scheme to avoid over-conservative retransmission timer backoffs in case of long connectivity disruptions. If a TCP sender does implement the following steps, the algorithm MUST be initiated upon the first timeout of the oldest outstanding segment (SND.UNA) and MUST be stopped upon the arrival of the first acceptable ACK. The algorithm MUST NOT be re-initiated upon subsequent timeouts for the same segment. The scheme SHOULD NOT be used in SYN-SENT or SYN-RECEIVED states [RFC0793] (see Section 5.5).

A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's retransmission timer MUST NOT use TCP-LCD. We envision that the scheme could be easily adapted to algorithms other than RFC 2988. However, we leave this as future work.

In rule (2.5), RFC 2988 [RFC2988] provides the option to place a maximum value on the RTO. When a TCP implements this rule to provide an upper bound for the RTO, it MUST also be used in the following algorithm. In particular, if the RTO is bounded by an upper limit (maximum RTO), the "MAX_RTO" variable used in this scheme MUST be initialized with this upper limit. Otherwise, if the RTO is unbounded, the "MAX_RTO" variable MUST be set to infinity.

The scheme specified in this document uses the "BACKOFF_CNT" variable, whose initial value is zero. The variable is used to count the number of performed retransmission timer backoffs during one timeout-based loss recovery. Moreover, the "RTO_BASE" variable is used to recover the previous RTO if the retransmission timer backoff was unnecessary. The variable is initialized with the RTO upon initiation of timeout-based loss recovery.

   (1)  Before TCP updates the variable "RTO" when it initiates
        timeout-based loss recovery, set the variables "BACKOFF_CNT"
        and "RTO_BASE" as follows:

           BACKOFF_CNT := 0;
           RTO_BASE := RTO.

        Proceed to step (R).

   (R)  This is a placeholder for standard TCP's behavior in case the
        retransmission timer has expired.  In particular, if RFC 2988
        [RFC2988] is used, steps (5.4) - (5.6) of that algorithm go
        here.  Proceed to step (2).

   (2)  To account for the expiration of the retransmission timer in
        the previous step (R), increment the "BACKOFF_CNT" variable by
        one:

           BACKOFF_CNT := BACKOFF_CNT + 1.

   (3)  Wait either

           for the expiration of the retransmission timer.  When the
           retransmission timer expires, proceed to step (R);

           or for the arrival of an acceptable ACK.  When an acceptable
           ACK arrives, proceed to step (A);

           or for the arrival of an ICMP unreachable message.  When the
           ICMP unreachable message "ICMP_DU" arrives, proceed to step
           (4).

   (4)  If "BACKOFF_CNT > 0", i.e., if at least one retransmission
        timer backoff can be undone, then

           proceed to step (5);

        else

           proceed to step (3).

   (5)  Extract the TCP segment header included in the ICMP unreachable
        message "ICMP_DU":

           SEG := Extract(ICMP_DU).

   (6)  If "SEG.SEQ == SND.UNA", i.e., if the TCP segment "SEG"
        eliciting the ICMP unreachable message "ICMP_DU" contains the
        sequence number of a retransmission, then

           proceed to step (7);

        else

           proceed to step (3).

   (7)  Undo the last retransmission timer backoff:

           BACKOFF_CNT := BACKOFF_CNT - 1;
           RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

   (8)  If the retransmission timer expires due to the undoing in the
        previous step (7), then

           proceed to step (R);

        else

           proceed to step (3).

   (A)  This is a placeholder for standard TCP's behavior in case an
        acceptable ACK has arrived.  No further processing.
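
The steps above can be sketched as a small event-driven state object in Python. This is an illustrative sketch only, not the Linux implementation referenced in Section 2: the class and method names are hypothetical, step (R) is reduced to the bounded RTO doubling, and the caller is assumed to supply the time elapsed since the retransmission timer was last restarted.

```python
import math


class TcpLcdState:
    """Per-connection TCP-LCD bookkeeping (hypothetical names, sketch only)."""

    def __init__(self, rto, max_rto=math.inf):
        self.rto = rto              # current retransmission timeout in seconds
        self.max_rto = max_rto      # MAX_RTO; infinity if the RTO is unbounded
        self.backoff_cnt = 0        # BACKOFF_CNT
        self.rto_base = rto         # RTO_BASE
        self.active = False         # True while in timeout-based loss recovery

    def on_timeout(self):
        """Steps (1), (R), and (2): the retransmission timer expired."""
        if not self.active:
            # Step (1): first timeout of SND.UNA, remember the unbacked-off RTO.
            self.backoff_cnt = 0
            self.rto_base = self.rto
            self.active = True
        # Step (R): stand-in for the standard reaction; a real stack also
        # retransmits SND.UNA and restarts the timer here.
        self.rto = min(self.rto * 2, self.max_rto)
        # Step (2): counted unconditionally, even if the RTO was already capped.
        self.backoff_cnt += 1

    def on_icmp_unreachable(self, seg_seq, snd_una, elapsed):
        """Steps (4)-(8). Returns True if the shortened timer has already
        expired, in which case the caller proceeds with on_timeout()."""
        if not self.active or self.backoff_cnt == 0:    # step (4)
            return False
        if seg_seq != snd_una:                          # steps (5) and (6)
            return False
        self.backoff_cnt -= 1                           # step (7)
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, self.max_rto)
        return elapsed >= self.rto                      # step (8)

    def on_acceptable_ack(self):
        """Step (A): the first acceptable ACK terminates the algorithm."""
        self.active = False
```

In this sketch the sequence number passed as `seg_seq` would come from the TCP header quoted in the ICMP unreachable message. Keeping `backoff_cnt` separate from the RTO value is what allows the recalculation in step (7) to remain correct when the RTO is capped at `max_rto`.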

When a TCP in steady-state detects a segment loss using the retransmission timer, it enters the timeout-based loss recovery and initiates the algorithm (step 1). It adjusts the slow start threshold (ssthresh), sets the congestion window (CWND) to one segment, backs off the retransmission timer, and retransmits the first unacknowledged segment (step R) [RFC5681], [RFC2988]. To account for the expiration of the retransmission timer, the TCP sender increments the "BACKOFF_CNT" variable by one (step 2).

In case the retransmission timer expires again (step 3a), a TCP will repeat the retransmission of the first unacknowledged segment and back off the retransmission timer once more (step R) [RFC2988], as well as increment the "BACKOFF_CNT" variable by one (step 2). Note that a TCP may implement RFC 2988's [RFC2988] option to place a maximum value on the RTO that may result in not performing the retransmission timer backoff. However, step (2) MUST always and unconditionally be applied, no matter whether or not the retransmission timer is actually backed off. In other words, each time the retransmission timer expires, the "BACKOFF_CNT" variable MUST be incremented by one.

If the first received packet after the retransmission(s) is an acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow start the connection and terminate the algorithm (step A). Later ICMP unreachable messages from the just terminated timeout-based loss recovery are ignored, since the ACK clock is already restarting due to the successful retransmission.

On the other hand, if the first received packet after the retransmission(s) is an ICMP unreachable message (step 3c), and if step (4) permits it, TCP SHOULD undo one backoff for each ICMP unreachable message reporting an error on a retransmission.
To decide if an ICMP unreachable message was elicited by a retransmission, the sequence number it contains is inspected (step 5, step 6). The undo is performed by re-calculating the RTO with the decremented "BACKOFF_CNT" variable (step 7). This calculation explicitly matches the (bounded) exponential backoff specified in rule (5.5) of [RFC2988].

Upon receipt of an ICMP unreachable message that legitimately undoes one backoff, there is the possibility that the shortened retransmission timer has already expired (step 8). Then, TCP SHOULD retransmit immediately. In case the shortened retransmission timer has not yet expired, TCP MUST wait accordingly.

5.  Discussion of TCP-LCD

TCP-LCD takes caution to only react to connectivity disruption indications in the form of ICMP unreachable messages during timeout-based loss recovery. Therefore, TCP's behavior is not altered when either no ICMP unreachable messages are received, or the retransmission timer of the TCP sender did not expire since the last received acceptable ACK. Thus, by definition, the algorithm triggers only in the case of long connectivity disruptions.

Only such ICMP unreachable messages that contain a TCP segment with the sequence number of a retransmission, i.e., contain SND.UNA, are evaluated by TCP-LCD. All other ICMP unreachable messages are ignored. The arrival of those ICMP unreachable messages provides strong evidence that the retransmissions were not dropped due to congestion, but were successfully delivered to the reporting router. In other words, there is no evidence for any congestion at least on that very part of the path that was traversed by both the TCP segment eliciting the ICMP unreachable message as well as the ICMP unreachable message itself.

However, there are some situations where TCP-LCD makes a false decision and incorrectly undoes a retransmission timer backoff.
This can happen, even when the received ICMP unreachable message contains the sequence number of a retransmission (SND.UNA), because the TCP segment that elicited the ICMP unreachable message may either not be a retransmission (Section 5.1), or may not belong to the current timeout-based loss recovery (Section 5.2). Finally, packet duplication (Section 5.3) can also spuriously trigger the algorithm.

Section 5.4 discusses possible probing frequencies, while Section 5.6 describes the motivation for not reacting to ICMP unreachable messages while TCP is in steady-state.

5.1.  Retransmission Ambiguity

Historically, the retransmission ambiguity problem [Zh86], [KP87] is the TCP sender's inability to distinguish whether the first acceptable ACK after a retransmission refers to the original transmission or to the retransmission. This problem occurs after both a Fast Retransmit and a timeout-based retransmit. However, modern TCP implementations can eliminate the retransmission ambiguity with either the help of Eifel [RFC3522], [RFC4015] or Forward RTO-Recovery (F-RTO) [RFC5682].

The reversion strategy of the given algorithm suffers from a form of retransmission ambiguity, too. In contrast to the above case, TCP suffers from ambiguity regarding ICMP unreachable messages received during timeout-based loss recovery. With the TCP sequence number included in the ICMP unreachable message, a TCP sender is not able to determine if the ICMP unreachable message refers to the original transmission or to any of the timeout-based retransmissions. That is, there is an ambiguity with regard to which TCP segment an ICMP unreachable message reports on.

However, this ambiguity is not considered to be a problem for the algorithm.
The assumption that a received ICMP unreachable message provides evidence that a non-congestion loss caused by the connectivity disruption was wrongly considered a congestion loss still holds, regardless of which TCP segment, transmission or retransmission, the message refers to.

5.2.  Wrapped Sequence Numbers

Besides the ambiguity whether a received ICMP unreachable message refers to the original transmission or to any of the retransmissions, there is another source of ambiguity related to the TCP sequence numbers contained in ICMP unreachable messages. For high bandwidth paths, the sequence space may wrap quickly. As a consequence, delayed ICMP unreachable messages may coincidentally fit as valid input in the proposed scheme. As a result, the scheme may incorrectly undo retransmission timer backoffs. Chances for this to happen are minuscule, since a particular ICMP unreachable message would need to contain the exact sequence number of the current oldest outstanding segment (SND.UNA), while at the same time TCP is in timeout-based loss recovery. However, two "worst case" scenarios for the algorithm are possible:

For instance, consider a steady state TCP connection, which will be disrupted at an intermediate router due to a link outage. Upon the expiration of the RTO, the TCP sender enters the timeout-based loss recovery and starts to retransmit the earliest segment that has not been acknowledged (SND.UNA). For some reason, the router delays all corresponding ICMP unreachable messages so that the TCP sender backs the retransmission timer off normally without any undoing. At the end of the connectivity disruption, the TCP sender eventually detects the re-establishment, leaves the scheme and finally the timeout-based loss recovery, too.
   A sequence number wrap-around later, the connectivity between the
   two peers is disrupted again, but this time due to congestion and
   exactly at the time at which the current SND.UNA matches the SND.UNA
   from the previous cycle. If the router emitted the delayed ICMP
   unreachable messages now, the TCP sender would incorrectly undo
   retransmission timer backoffs. As the TCP sequence number contains
   32 bits, the probability of this scenario is at most 1/2^32. Given
   sufficiently many retransmissions in the first timeout-based loss
   recovery, the corresponding ICMP unreachable messages could reduce
   the RTO in the second recovery at most to "RTO_BASE". However, once
   the ICMP unreachable messages are depleted, the standard exponential
   backoff will be performed. Thus, the congestion response will only
   be delayed by some false retransmissions.

   Similar to the above, consider the case where a steady-state TCP
   connection with n segments in flight is disrupted at some point due
   to a link outage at an intermediate router. For each segment in
   flight, the router may generate an ICMP unreachable message.
   However, for some reason it delays them. Once the link outage is
   over and the connection has been re-established, the TCP sender
   leaves the scheme and slow-starts the connection. Following a
   sequence number wrap-around, a retransmission timeout occurs, just at
   the moment the TCP sender's current window of data reaches the
   previous range of the sequence number space again. In case the
   router emits the delayed ICMP unreachable messages now, spurious
   undoing of the retransmission timer backoff is possible once, if the
   TCP segment number contained in the ICMP unreachable messages matches
   the current SND.UNA and the timeout was a result of congestion. In
   the case of another connectivity disruption, the additional undoing
   of the retransmission timer backoff has no impact.
   The probability of this scenario is at most n/2^32.

5.3.  Packet Duplication

   In case an intermediate router duplicates packets, a TCP sender may
   receive more ICMP unreachable messages during timeout-based loss
   recovery than it sent timeout-based retransmissions. However, since
   TCP-LCD keeps track of the number of performed retransmission timer
   backoffs in the "BACKOFF_CNT" variable, it will not undo more
   retransmission timer backoffs than were actually performed.
   Nevertheless, if packet duplication and congestion coincide on the
   path between the two communicating hosts, duplicated ICMP unreachable
   messages could hide the congestion loss of some retransmissions or
   ICMP unreachable messages, and the algorithm may incorrectly undo
   retransmission timer backoffs. Considering the overall impact of a
   router that duplicates packets, the additional load induced by some
   spurious timeout-based retransmits can probably be neglected.

5.4.  Probing Frequency

   One might argue that if an ICMP unreachable message arrives for a
   timeout-based retransmission, the RTO should be reset or
   recalculated, similar to what is done when an ACK arrives during
   timeout-based loss recovery (see Karn's algorithm [KP87], [RFC2988]),
   and a new retransmission should be sent immediately. Generally, this
   would result in a much higher probing frequency based on the round
   trip time to the router where connectivity has been disrupted.
   However, we believe the current scheme provides a good trade-off
   between conservative behavior and fast detection of connectivity
   re-establishment. TCP-LCD focuses on long connectivity disruptions,
   i.e., on disruptions that last for several RTOs. Thus, a much higher
   probing frequency (probes spaced less than one RTO apart) would not
   significantly increase the available transmission time compared to
   the duration of the connectivity disruption.

5.5.  Reaction during Connection Establishment

   It is possible that a TCP sender enters timeout-based loss recovery
   while the connection is in SYN-SENT or SYN-RECEIVED states [RFC0793].
   The algorithm described in this document could also be used for
   faster connection establishment in networks with connectivity
   disruptions. However, because existing TCP implementations [RFC5461]
   already interpret ICMP unreachable messages during connection
   establishment and abort the corresponding connection, we refrain from
   suggesting this.

5.6.  Reaction in Steady-State

   Another exploitation of ICMP unreachable messages in the context of
   TCP congestion control might seem appropriate while TCP is in
   steady-state. As the RTT up to the router that generated the ICMP
   unreachable message is likely to be substantially shorter than the
   overall RTT to the destination, the ICMP unreachable message may very
   well reach the originating TCP while it is transmitting the current
   window of data. In case the remaining window is large, it might seem
   appropriate to refrain from transmitting the remaining window as
   there is timely evidence that it will only trigger further ICMP
   unreachable messages at that very router. Although this promises
   improvement from a wastage perspective, it may be counterproductive
   from a security perspective. An attacker could forge such ICMP
   messages, thereby forcing the originating TCP to stop sending data,
   very similar to the blind throughput-reduction attack mentioned in
   [RFC5927].

   An additional consideration is the following: in the presence of
   multi-path routing, even the receipt of a legitimate ICMP unreachable
   message cannot be exploited accurately, because there is the
   possibility that only one of the multiple paths to the destination is
   suffering from a connectivity disruption, which causes ICMP
   unreachable messages to be sent.
   Then, however, there is the possibility that the path along which
   the connectivity disruption occurred contributed considerably to the
   overall bandwidth, such that a congestion response may well be
   reasonable. However, this is not necessarily the case. Therefore, a
   TCP has no means except for its inherent congestion control to decide
   on this matter. All in all, it seems that for a connection in
   steady-state, i.e., not in timeout-based loss recovery, reacting to
   ICMP unreachable messages in regard to congestion control is not
   appropriate. For the case of timeout-based retransmissions, however,
   there is a reasonable congestion response, which is skipping further
   retransmission timer backoffs because there is no congestion
   indication, as described above.

6.  Dissolving Ambiguity Issues using the TCP Timestamps Option

   If the TCP Timestamps option [RFC1323] is enabled for a connection, a
   TCP sender SHOULD use the following algorithm to dissolve the
   ambiguity issues mentioned in Sections 5.1, 5.2, and 5.3. In
   particular, both the retransmission ambiguity and the packet
   duplication problems are prevented by the following TCP-LCD variant.
   On the other hand, the false positives caused by wrapped sequence
   numbers cannot be completely avoided, but their likelihood is further
   reduced by a factor of 2^32, since the Timestamp Value field (TSval)
   of the TCP Timestamps option contains 32 bits.

   Hence, implementers may choose to implement TCP-LCD with the
   following modifications.

   Step (1) is replaced by step (1'):

   (1')  Before TCP updates the variable "RTO" when it initiates
         timeout-based loss recovery, set the variables "BACKOFF_CNT"
         and "RTO_BASE" and the data structure "RETRANS_TS" as follows:

            BACKOFF_CNT := 0;
            RTO_BASE := RTO;
            RETRANS_TS := [].

         Proceed to step (R).
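
   The "RETRANS_TS" data structure is only named in this section, not
   specified further. As a non-normative illustration, the following
   Python sketch models it as a set of TSval values next to the
   "BACKOFF_CNT" and "RTO_BASE" bookkeeping that the replaced steps of
   this section operate on. The class and method names, the 60-second
   "MAX_RTO", and the assumption that every timeout doubles the RTO
   starting from "RTO_BASE" are choices made for this example only.

```python
MAX_RTO = 60.0  # seconds; assumed upper bound, not fixed by this section


class LcdTimestampState:
    """Illustrative per-connection state for the Timestamps variant."""

    def __init__(self):
        self.backoff_cnt = 0
        self.rto_base = 0.0
        self.rto = 0.0
        self.retrans_ts = set()

    def start_recovery(self, rto):
        # Step (1'): snapshot the RTO before it is backed off for the first time.
        self.backoff_cnt = 0
        self.rto_base = rto
        self.rto = rto
        self.retrans_ts = set()

    def on_timeout_retransmit(self, tsval):
        # Step (2) extended by (2b): count the backoff and remember the TSval.
        # Assumption: standard exponential backoff, bounded by MAX_RTO.
        self.backoff_cnt += 1
        self.retrans_ts.add(tsval)
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, MAX_RTO)

    def on_icmp_unreachable(self, seg_seq, seg_tsval, snd_una):
        # Steps (6') and (7'): undo one backoff, but only for a quoted segment
        # that matches SND.UNA and carries a TSval of a known retransmission.
        if self.backoff_cnt == 0:
            return False
        if seg_seq != snd_una or seg_tsval not in self.retrans_ts:
            return False
        self.retrans_ts.discard(seg_tsval)
        self.backoff_cnt -= 1
        self.rto = min(self.rto_base * 2 ** self.backoff_cnt, MAX_RTO)
        return True
```

   In this sketch, a second ICMP unreachable message that quotes an
   already consumed TSval is rejected, which is how the variant guards
   against duplicated packets.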

   Step (2) is extended by step (2b):

   (2b)  Store the value of the Timestamp Value field (TSval) of the TCP
         Timestamps option included in the retransmission "RET" sent in
         step (R) into the "RETRANS_TS" data structure:

            RETRANS_TS.add(RET.TSval)

   Step (6) is replaced by step (6'):

   (6')  If "SEG.SEQ == SND.UNA && RETRANS_TS.exists(SEG.TSval)", i.e.,
         if the TCP segment "SEG" eliciting the ICMP unreachable message
         "ICMP_DU" contains the sequence number of a retransmission, and
         the value in its Timestamp Value field (TSval) is valid, i.e.,
         stored in "RETRANS_TS", then

            proceed to step (7');

         else

            proceed to step (3).

   Step (7) is replaced by step (7'):

   (7')  Undo the last retransmission timer backoff:

            RETRANS_TS.remove(SEG.TSval);
            BACKOFF_CNT := BACKOFF_CNT - 1;
            RTO := min(RTO_BASE * 2^(BACKOFF_CNT), MAX_RTO).

   The downside of this variant is twofold. First, the modifications
   come at a cost: the TCP sender is required to store the timestamps
   of all retransmissions sent during one timeout-based loss recovery.
   Second, this variant can only undo a retransmission timer backoff if
   the intermediate router experiencing the link outage implements
   [RFC1812] and chooses to include, beyond the first 64 bits of the
   payload of the triggering datagram, as many additional bytes as are
   needed to include the TCP Timestamps option in the ICMP unreachable
   message.

7.  Interoperability Issues

   This section discusses interoperability issues related to introducing
   TCP-LCD.

7.1.  Detection of TCP Connection Failures

   TCP-LCD may have side-effects on TCP implementations that attempt to
   detect TCP connection failures by counting timeout-based
   retransmissions. [RFC1122] states in Section 4.2.3.5 that a TCP host
   must handle excessive retransmissions of data segments with two
   thresholds R1 and R2 that measure the number of retransmissions that
   have occurred for the same segment.
   Both thresholds might be measured either in time units or as a count
   of retransmissions.

   Due to TCP-LCD's reversion strategy of the retransmission timer, the
   assumption that a certain number of retransmissions corresponds to a
   specific time interval no longer holds, as additional retransmissions
   may be performed during timeout-based loss recovery to detect the end
   of the connectivity disruption. Therefore, a TCP employing TCP-LCD
   MUST either measure the thresholds R1 and R2 in time units or, in
   case R1 and R2 are counters of retransmissions, convert them into
   time intervals that correspond to the time an unmodified TCP would
   need to reach the specified number of retransmissions.

7.2.  Explicit Congestion Notification (ECN)

   With Explicit Congestion Notification (ECN) [RFC3168], ECN-capable
   routers are no longer limited to dropping packets to indicate
   congestion. Instead, they can set the Congestion Experienced (CE)
   codepoint in the IP header to indicate congestion. With TCP-LCD, it
   may happen that during a connectivity disruption, a received ICMP
   unreachable message has been elicited by a timeout-based
   retransmission that was marked with the CE codepoint before reaching
   the router experiencing the link outage. In such a case, a TCP
   sender MUST, in accordance with [RFC3168] (Section 6.1.2),
   additionally reset the retransmission timer in case the algorithm
   undoes a retransmission timer backoff.

7.3.  TCP-LCD and IP Tunnels

   It is worth noting that IP tunnels, including IPsec [RFC4301], IP in
   IP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], and
   others are compatible with TCP-LCD, as long as the received ICMP
   unreachable messages can be demultiplexed and extracted appropriately
   by the TCP sender during timeout-based loss recovery.

   If, for example, end-to-end tunnels like IPsec in transport mode
   [RFC4301] are employed, a TCP sender may receive ICMP unreachable
   messages where additional steps, e.g., decrypting in step (5) of the
   algorithm, are needed to extract the TCP header from these ICMP
   messages. Provided that the received ICMP unreachable message
   contains enough information, i.e., SEG.SEQ is extractable, this
   information can still be used as a valid input for the proposed
   algorithm.

   Likewise, if IP encapsulation like [RFC2003] is used in some part of
   the path between the communicating hosts, the tunnel ingress node may
   receive the ICMP unreachable messages from an intermediate router
   experiencing the link outage. Nevertheless, the tunnel ingress node
   may replay the ICMP unreachable messages in order to inform the TCP
   sender. If enough information is preserved to extract SEG.SEQ, the
   replayed ICMP unreachable messages can still be used in TCP-LCD.

8.  Related Work

   Several methods that address TCP's problems in the presence of
   connectivity disruptions have been proposed in the literature. Some
   of them try to improve TCP's performance by modifying lower layers.
   For example, [SM03] introduces a "smart link layer", which buffers
   one segment for each active connection and replays these segments
   upon connectivity re-establishment. This approach has a serious
   drawback: previously stateless intermediate routers have to be
   modified in order to inspect TCP headers, to track the end-to-end
   connection, and to provide additional buffer space. This leads to an
   additional need for memory and processing power.

   On the other hand, stateless link layer schemes, as proposed in
   [RFC3819], which unconditionally buffer some small number of packets
   may have another problem: if a packet is buffered longer than the
   maximum segment lifetime (MSL) of 2 min [RFC0793], i.e., the
   disconnection lasts longer than MSL, TCP's assumption that such
   segments will never be received will no longer be true, violating
   TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].

   Other approaches, like TCP-F [CRVP01] or the Explicit Link Failure
   Notification (ELFN) [HV02] inform a TCP sender about a disrupted path
   by special messages generated and sent from intermediate routers. In
   the case of a link failure, the TCP sender stops sending segments and
   freezes its retransmission timers. TCP-F stays in this state and
   remains silent until either a "route establishment notification" is
   received or an internal timer expires. In contrast, ELFN
   periodically probes the network to detect connectivity
   re-establishment. Both proposals rely on changes to intermediate
   routers, whereas the scheme proposed in this document is a
   sender-only modification. Moreover, ELFN does not consider
   congestion and may impose serious additional load on the network,
   depending on the probe interval.

   The authors of ATCP [LS01] propose enhancements to identify different
   types of packet loss by introducing a layer between TCP and IP. They
   utilize ICMP destination unreachable messages to set TCP's receiver
   advertised window to zero, thus forcing the TCP sender to perform
   zero window probing with an exponential backoff. ICMP destination
   unreachable messages that arrive during this probing period are
   ignored. This approach is nearly orthogonal to this document, which
   exploits ICMP messages to undo a retransmission timer backoff when
   TCP is already probing. In principle, both mechanisms could be
   combined.
   However, due to security considerations, it does not seem
   appropriate to adopt ATCP's reaction, as discussed in Section 5.6.

   Schuetz et al. [I-D.schuetz-tcpm-tcp-rlci] describe a set of TCP
   extensions that improve TCP's behavior when transmitting over paths
   whose characteristics can change rapidly. Their proposed extensions
   modify the local behavior of TCP and introduce a new TCP option to
   signal locally received connectivity-change indications (CCIs) to
   remote peers. Upon receipt of a CCI, they re-probe the path
   characteristics either by performing a speculative retransmission or
   by sending a single segment of new data, depending on whether the
   connection is currently stalled in exponential backoff or
   transmitting in steady-state, respectively. The authors focus on
   specifying TCP response mechanisms; nevertheless, underlying layers
   would have to be modified to explicitly send CCIs to make these
   immediate responses possible.

9.  IANA Considerations

   This memo includes no request to IANA.

10.  Security Considerations

   Generally, an attacker has only two attack alternatives: to generate
   ICMP unreachable messages in an attempt to make a TCP modified with
   TCP-LCD flood the network, or to suppress legitimate ICMP unreachable
   messages in an attempt to slow down the transmission rate of a TCP
   sender.

   In order to generate ICMP unreachable messages that fit as an input
   for TCP-LCD, an attacker would need to guess the correct four-tuple
   (i.e., Source IP Address, Source TCP port, Destination IP Address,
   and Destination TCP port) and the exact segment sequence number of
   the current timeout-based retransmission. Yet, the correct sequence
   number is generally hard to guess, since a blind guess is correct
   with a probability of only 1/2^32.
   Even if an attacker has information about that sequence number
   (e.g., the attacker can eavesdrop on the retransmissions), the
   attacker's impact on the network load may be considered low, since
   the retransmission frequency is limited by the RTO that was computed
   before TCP entered timeout-based loss recovery. Hence, the highest
   probing frequency is expected to be even lower than once per minimum
   RTO, i.e., 1 s as specified by [RFC2988]. It is important to note
   that an attacker who can correctly guess the four-tuple and the
   segment sequence number can easily launch more serious attacks
   (e.g., hijack the connection), whether or not TCP-LCD is used.

   There may be means by which an attacker can cause the suppression of
   legitimate ICMP unreachable messages (e.g., by flooding the router
   experiencing the link outage to trigger ICMP rate-limiting).
   However, even if the attacker could suppress every legitimate ICMP
   unreachable message, the security impact of such an attack is
   negligible, since the TCP sender using TCP-LCD will behave like a
   regular TCP would. Note that this kind of attack is
   indistinguishable from the case in which a router experiencing a link
   outage does not send ICMP unreachable messages at all (e.g., because
   of local policy).

   In summary, the algorithm proposed in this document is considered to
   be secure.

11.  Acknowledgments

   We would like to thank Lars Eggert, Adrian Farrel, Mark Handley, Kai
   Jakobs, Ilpo Jarvinen, Enrico Marocco, Catherine Meadows, Juergen
   Quittek, Pasi Sarolahti, Tim Shepard, Joe Touch and Carsten Wolff for
   feedback on earlier versions of this document. We also thank Michael
   Faber, Daniel Schaffrath, and Damian Lukowski for implementing and
   testing the algorithm in Linux. Special thanks go to Ilpo Jarvinen
   for giving valuable feedback regarding the Linux implementation.

   This work has been supported by the German National Science
   Foundation (DFG) within the research excellence cluster Ultra
   High-Speed Mobile Information and Communication (UMIC), RWTH Aachen
   University.

12.  References

12.1.  Normative References

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, September 1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

12.2.  Informative References

   [CRVP01]   Chandran, K., Raghunathan, S., Venkatesan, S., and R.
              Prakash, "A feedback-based scheme for improving TCP
              performance in ad hoc wireless networks", IEEE Personal
              Communications vol. 8, no. 1, pp. 34-39, February 2001.

   [HV02]     Holland, G. and N. Vaidya, "Analysis of TCP performance
              over mobile ad hoc networks", Wireless Networks vol. 8,
              no. 2-3, pp. 275-288, March 2002.

   [I-D.eggert-tcpm-tcp-retransmit-now]
              Eggert, L., "TCP Extensions for Immediate
              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
              (work in progress), June 2005.

   [I-D.schuetz-tcpm-tcp-rlci]
              Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami,
              Y., and K. Le, "TCP Response to Lower-Layer Connectivity-
              Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work
              in progress), February 2008.

   [KP87]     Karn, P. and C. Partridge, "Improving Round-Trip Time
              Estimates in Reliable Transport Protocols", Proceedings of
              the Conference on Applications, Technologies,
              Architectures, and Protocols for Computer Communication
              (SIGCOMM'87) pp. 2-7, August 1987.

   [LS01]     Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc
              networks", IEEE Journal on Selected Areas in
              Communications vol. 19, no. 7, pp. 1300-1315, July 2001.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              March 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
              for TCP", RFC 3522, April 2003.

   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
              for TCP", RFC 4015, February 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC5461]  Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
              February 2009.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP", RFC 5682,
              September 2009.

   [RFC5927]  Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010.

   [SESB05]   Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
              "Protocol enhancements for intermittently connected
              hosts", SIGCOMM Computer Communication Review vol. 35,
              no. 3, pp. 5-18, December 2005.

   [SM03]     Scott, J. and G. Mapp, "Link layer-based TCP optimisation
              for disconnecting networks", SIGCOMM Computer
              Communication Review vol. 33, no. 5, pp. 31-42,
              October 2003.

   [Zh86]     Zhang, L., "Why TCP Timers Don't Work Well", Proceedings
              of the Conference on Applications, Technologies,
              Architectures, and Protocols for Computer Communication
              (SIGCOMM'86) pp. 397-405, August 1986.

   [ZimHan09]
              Zimmermann, A., "Make TCP more Robust to Long Connectivity
              Disruptions", Proceedings of the 75th IETF Meeting slides,
              July 2009.

Appendix A.  Changes from previous versions of the draft

   This appendix should be removed by the RFC Editor before publishing
   this document as an RFC.

A.1.  Changes from draft-ietf-tcpm-tcp-lcd-02

   o  Incorporated feedback submitted by Enrico Marocco (Gen-ART Review)

   o  Incorporated feedback submitted by Juergen Quittek (OpsDir Review)

   o  Incorporated feedback submitted by Catherine Meadows (SecDir
      Review)

   o  Incorporated feedback submitted by Adrian Farrel (IESG Review)

A.2.  Changes from draft-ietf-tcpm-tcp-lcd-01

   o  Incorporated feedback submitted by Lars Eggert (AD Review)

A.3.  Changes from draft-ietf-tcpm-tcp-lcd-00

   o  Editorial changes.

   o  Clarified TCP-LCD's behaviour during connection establishment
      (Thanks to Mark Handley).

A.4.  Changes from draft-zimmermann-tcp-lcd-02

   o  Incorporated feedback submitted by Ilpo Jarvinen.

   o  Incorporated feedback submitted by Pasi Sarolahti.

   o  Incorporated feedback submitted by Joe Touch.

   o  Extended and reorganized the discussion (Section 5):

      *  Every discussion item got its own title, so that we have a
         better overview.

      *  Extended Retransmission Ambiguity section. Added also some
         references to the historical retransmission ambiguity problem.

      *  Heavily extended discussion about wrapped sequence numbers (see
         Joe's comments).

      *  Described the influence of packet duplication on the algorithm
         (Thanks to Ilpo).

      *  The section "Protecting Against Misbehaving Routers" is not a
         subsection anymore. Moreover, the section was renamed to
         "Dissolving Ambiguity Issues" and has now real content.

   o  An interoperability issues section (Section 7) was added. In
      particular comments to ECN, ICMPv6, and to the two thresholds R1
      and R2 of [RFC1122] (Section 4.2.3.5) were added.

   o  Miscellaneous editorial changes. In particular, the algorithm has
      a name now: TCP-LCD.

A.5.  Changes from draft-zimmermann-tcp-lcd-01

   o  The algorithm in Section 4.2 was slightly changed. Instead of
      reverting the last retransmission timer backoff by halving the
      RTO, the RTO is recalculated with help of the "BACKOFF_CNT"
      variable. This fixes an issue that occurred when the
      retransmission timer was backed off but bounded by a maximum
      value.
      The algorithm in the previous version of the draft would have
      "reverted" to half of that maximum value instead of using the
      value from before the RTO was doubled (and then bounded).

   o  Miscellaneous editorial changes.

A.6.  Changes from draft-zimmermann-tcp-lcd-00

   o  Miscellaneous editorial changes in Sections 1, 2 and 3.

   o  The document was restructured in Sections 1, 2 and 3 for easier
      reading. The motivation for the algorithm is changed according to
      TCP's problem to disambiguate congestion from non-congestion loss.

   o  Added Section 4.1.

   o  The algorithm in Section 4.2 was restructured and simplified:

      *  The special case of the first received ICMP destination
         unreachable message after an RTO was removed.

      *  The "BACKOFF_CNT" variable was introduced so it is no longer
         possible to perform more reverts than backoffs.

   o  The discussion in Section 5 was improved and expanded according to
      the algorithm changes.

Authors' Addresses

   Alexander Zimmermann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen, 52074
   Germany

   Phone: +49 241 80 21422
   Email: zimmermann@cs.rwth-aachen.de


   Arnd Hannemann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen, 52074
   Germany

   Phone: +49 241 80 21423
   Email: hannemann@nets.rwth-aachen.de