Internet Engineering Task Force                            A. Zimmermann
Internet-Draft                                              A. Hannemann
Intended status: Experimental                     RWTH Aachen University
Expires: February 27, 2010                               August 26, 2009

          Make TCP more Robust to Long Connectivity Disruptions
                       draft-zimmermann-tcp-lcd-02

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 27, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   Disruptions in end-to-end path connectivity which last longer than
   one retransmission timeout cause suboptimal TCP performance.  The
   reason for the performance degradation is that TCP interprets segment
   loss induced by connectivity disruptions as a sign of congestion,
   resulting in repeated backoffs of the retransmission timer.
   This leads in turn to a deferred detection of the re-establishment of
   the connection since TCP waits until the next retransmission timeout
   occurs before attempting the retransmission.

   This document describes how standard ICMP messages can be exploited
   to disambiguate true congestion loss from non-congestion loss caused
   by long connectivity disruptions.  Moreover, a revert strategy of the
   retransmission timer is specified that enables a more prompt
   detection of whether the connectivity to a previously disconnected
   peer node has been restored or not.  The specified algorithm is a TCP
   sender-only modification that effectively improves TCP performance in
   the presence of connectivity disruptions.

Table of Contents

   1.  Terminology
   2.  Introduction
   3.  Connectivity Disruption Indication
   4.  Connectivity Disruption Reaction
     4.1.  Basic Idea
     4.2.  The Algorithm
     4.3.  Discussion
     4.4.  Protecting Against Misbehaving Routers (the Safe Variant)
   5.  Related Work
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgments
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  TODO list
   Appendix B.  Changes from previous versions of the draft
     B.1.  Changes from draft-zimmermann-tcp-lcd-01
     B.2.  Changes from draft-zimmermann-tcp-lcd-00
   Authors' Addresses

1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
   refers to a TCP segment that acknowledges previously unacknowledged
   data.  The Transmission Control Protocol (TCP) sender state variable
   "SND.UNA" and the current segment variable "SEG.SEQ" are used as
   defined in [RFC0793].  SND.UNA holds the segment sequence number of
   the earliest segment that has not been acknowledged by the TCP
   receiver (the oldest outstanding segment).  SEG.SEQ is the segment
   sequence number of a given segment.

   We use both the term "retransmission timer" and the term
   "retransmission timeout (RTO)" as defined in [RFC2988].

2.  Introduction

   Connectivity disruptions can occur in many different situations.  The
   frequency of connectivity disruptions depends on the properties of
   the end-to-end path between the communicating hosts.  While
   connectivity disruptions can occur in traditional wired networks too,
   e.g., simply due to an unplugged network cable, the likelihood of
   occurrence is significantly higher in wireless (multi-hop) networks.
   In particular, end-host mobility, network topology changes, and
   wireless interference are crucial factors.  In the case of the
   Transmission Control Protocol (TCP) [RFC0793], the performance of the
   connection can exhibit a significant reduction compared to a
   permanently connected path [SESB05].
   This is because TCP, which was originally designed to operate in
   fixed and wired networks, generally assumes that the end-to-end path
   connectivity is relatively stable over the connection's lifetime.

   According to Schuetz et al. [I-D.schuetz-tcpm-tcp-rlci], connectivity
   disruptions can be classified into two groups: "short" and "long"
   connectivity disruptions.  A connectivity disruption is short if
   connectivity returns before the retransmission timer fires for the
   first time.  In this case, TCP recovers lost data segments through
   Fast Retransmit and lost acknowledgments (ACK) through successfully
   delivered later ACKs.  Connectivity disruptions are declared as
   "long" for a given TCP connection if the retransmission timer fires
   at least once before connectivity returns.  Whether or not path
   characteristics like the round trip time (RTT) or the available
   bandwidth have changed when the connectivity returns after a
   disruption is another important aspect for TCP's retransmission
   scheme [I-D.schuetz-tcpm-tcp-rlci].

   This document will focus on TCP's behavior in the face of long
   connectivity disruptions in the time "before" connectivity is
   restored.  In particular, this memo does not describe any additional
   modification to detect if the path characteristics remain unchanged
   in order to improve TCP's behavior "after" connectivity is restored.
   Therefore, TCP's congestion control mechanisms
   [I-D.ietf-tcpm-rfc2581bis] will be unchanged.

   When a long connectivity disruption occurs on a TCP connection, the
   TCP sender stops receiving acknowledgments.  After the retransmission
   timer expires, the TCP sender enters the timeout-based loss recovery
   and declares the oldest outstanding segment (SND.UNA) as lost.
   Since TCP tightly couples reliability and congestion control, the
   retransmission of SND.UNA is triggered together with the reduction of
   the sending rate, which is based on the assumption that loss is an
   indication of congestion [I-D.ietf-tcpm-rfc2581bis].  As long as the
   connectivity disruption persists, TCP will repeat the procedure until
   the oldest outstanding segment is successfully acknowledged, or the
   connection times out.  TCP implementations that follow the
   recommended retransmission timeout (RTO) management of RFC 2988
   [RFC2988] double the RTO after each retransmission attempt.  However,
   the RTO growth may be bounded by an upper limit, the maximum RTO,
   which is at least 60s, but may be longer: Linux for example uses
   120s.  If the connectivity is restored between two retransmission
   attempts, TCP still has to wait until the retransmission timer
   expires before resuming transmission, since it simply does not have
   any means to know when the connectivity is re-established.
   Therefore, depending on when connectivity becomes available again,
   this can waste up to one maximum RTO of possible transmission time.

   This retransmission behavior is not efficient, especially in
   scenarios or networks like wireless (multi-hop) networks where
   connectivity disruptions are frequent.  In the ideal case, TCP would
   attempt a retransmission as soon as connectivity to its peer is
   re-established.  This document describes how the standard Internet
   Control Message Protocol (ICMP) can be exploited to identify
   non-congestion loss caused by connectivity disruptions.  A revert
   strategy of the retransmission timer is specified that enables, due
   to higher-frequency retransmissions, a prompt detection of whether
   connectivity to a previously disconnected peer node has been
   restored.
   The specified scheme is a TCP sender-only modification, i.e., neither
   intermediate routers nor the TCP receiver have to be modified.
   Furthermore, in the case the network allows, i.e., no congestion is
   present, the proposed algorithm approaches the ideal behavior.

3.  Connectivity Disruption Indication

   As long as the queue of an intermediate router experiencing a link
   outage is deep enough, i.e., it can buffer all incoming packets, a
   connectivity disruption will only cause variation in delay which is
   handled well by contemporary TCP implementations with the help of
   Eifel [RFC3522] or forward RTO (F-RTO) [I-D.ietf-tcpm-rfc4138bis].
   However, if the link outage lasts too long, the router experiencing
   the link outage is forced to drop packets and finally to discard the
   corresponding route.  Means to detect such link outages comprise
   reacting to failed Address Resolution Protocol (ARP) [RFC0826]
   queries, unsuccessful link sensing, and the like.  However, this is
   solely the responsibility of the respective router.

      Note: The focus of this memo is on introducing a method by which
      ICMP messages may be exploited to improve TCP's performance; how
      different physical and link layer mechanisms underneath the
      network layer may trigger ICMP destination unreachable messages
      is out of scope of this memo.

   The removal of the route usually goes along with a notification to
   the corresponding TCP sender about the dropped packets via ICMP
   destination unreachable messages of code 0 (net unreachable) or code
   1 (host unreachable) [RFC1812].  Therefore, since ICMP destination
   unreachable messages of these codes provide evidence that packets
   were dropped due to a link outage, they can be used by a TCP as an
   indication for a connectivity disruption.

   Note that there are also other ICMP destination unreachable messages
   with different codes.
   Some of them are candidates for connectivity disruption indications
   too, but need further investigation.  Examples are ICMP destination
   unreachable messages with code 5 (source route failed), code 11 (net
   unreachable for TOS), or code 12 (host unreachable for TOS)
   [RFC1812].  On the other hand, codes that flag hard errors are of no
   use for the proposed scheme, since TCP should abort the connection
   when those are received [RFC1122].  In the following, the term "ICMP
   unreachable message" is used as a synonym for ICMP destination
   unreachable messages of code 0 or code 1.

   The accurate interpretation of ICMP unreachable messages as a
   connectivity disruption indication is complicated by the following
   two peculiarities of ICMP messages.  Firstly, they do not necessarily
   operate on the same timescale as the packets, i.e., in the given case
   TCP segments, which elicited them.  When a router drops a packet due
   to a missing route it will not necessarily send an ICMP unreachable
   message immediately, but may rather queue it for later delivery.
   Secondly, ICMP messages are subject to rate limiting, e.g., when a
   router drops a whole window of data due to a link outage, it is
   unlikely to send as many ICMP unreachable messages as it dropped TCP
   segments.  Depending on the load of the router it may even send no
   ICMP unreachable messages at all.  Both peculiarities originate from
   [RFC1812].

   Fortunately, according to [RFC0792] ICMP unreachable messages are
   obliged to contain in their body the Internet Protocol (IP) header
   [RFC0791] of the datagram eliciting the ICMP unreachable messages
   plus the first 64 bits of the payload of that datagram.  Hence, in
   case of TCP both port numbers and the sequence number are included.
   This allows the originating TCP to identify the connection which an
   ICMP unreachable message is reporting an error about.
   Moreover, it allows the originating TCP to identify which segment of
   the respective connection triggered the ICMP unreachable message,
   provided that there are not several segments in flight with the same
   sequence number.  This may very well be the case when TCP is
   recovering lost segments (see Section 4.3).

   A connectivity disruption indication in form of an ICMP unreachable
   message associated with a presumably lost TCP segment provides strong
   evidence that the segment was not dropped due to congestion but
   instead was successfully delivered to the temporary end-point of the
   employed path, i.e., the reporting router.  It therefore did not
   witness any congestion at least on that very part of the path which
   was traveled by both the TCP segment eliciting the ICMP unreachable
   message and the ICMP unreachable message itself.

4.  Connectivity Disruption Reaction

   In Section 4.1 the basic idea of the algorithm is given.  The
   complete algorithm is specified in Section 4.2.  In Section 4.3 the
   algorithm is discussed in detail.

4.1.  Basic Idea

   The goal of the algorithm is the prompt detection of when the
   connectivity to a previously disconnected peer node has been restored
   after a long connectivity disruption while retaining appropriate
   behavior in case of congestion.  The proposed algorithm exploits
   standard ICMP unreachable messages to increase TCP's retransmission
   frequency during timeout-based loss recovery by undoing one
   retransmission timer backoff whenever an ICMP unreachable message
   reports on a presumably lost retransmission.

   This approach has the advantage of appropriately reducing the probing
   rate in case of congestion.  If either the (re-)transmission itself,
   or the corresponding ICMP message is dropped, the conventional
   backoff is performed and not undone, effectively halving the probing
   rate.

4.2.  The Algorithm

   A TCP sender using RFC 2988 [RFC2988] to compute TCP's retransmission
   timer MAY employ the following scheme to avoid over-conservative
   backoffs of the retransmission timer in case of long connectivity
   disruptions.  If a TCP sender does implement the scheme, the
   following steps MUST be taken, but only upon initiation of a
   timeout-based loss recovery, i.e., upon the first timeout of the
   oldest outstanding segment (SND.UNA).  The algorithm MUST NOT be
   re-initiated after a timeout-based loss recovery has already been
   started but not completed.  In particular, it must not be
   re-initiated upon subsequent timeouts for the same segment.

   A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's
   retransmission timer SHOULD NOT use the scheme.  We envision that the
   scheme could be easily adapted to other algorithms than RFC 2988.
   However, we leave this as future work.

   The scheme specified in this document uses the "Backoff_cnt"
   variable, whose initial value is zero.  The variable is used to count
   the number of performed retransmission timer backoffs during one
   timeout-based loss recovery.  Moreover, the "RTO_base" variable is
   used to recover the previous RTO in case the retransmission timer
   backoff was unnecessary.  The variable is initialized with the RTO
   upon initiation of timeout-based loss recovery.

   (1)  Before the variable RTO gets updated when timeout-based loss
        recovery is initiated, set the variable "Backoff_cnt" and the
        variable "RTO_base" as follows:

           Backoff_cnt := 0;
           RTO_base := RTO.

        Proceed to step (R).

   (R)  This is a placeholder for the behavior that a standard TCP must
        execute at this point in case the retransmission timer is
        expired.  In particular if RFC 2988 [RFC2988] is used, steps
        (5.4) - (5.6) of that algorithm go here.  Proceed to step (2).

   (2)  If the retransmission timer was backed off in the previous step
        (R), then increment the variable "Backoff_cnt" by one to account
        for the new backoff:

           Backoff_cnt := Backoff_cnt + 1.

   (3)  Wait either

           for the expiration of the retransmission timer.  When the
           retransmission timer expires, proceed to step (R);

           or for the arrival of an acceptable ACK.  When an acceptable
           ACK arrives, proceed to step (A);

           or for the arrival of an ICMP unreachable message.  When the
           ICMP unreachable message ICMP_DU arrives, proceed to step
           (4).

   (4)  If "Backoff_cnt > 0", i.e., an undoing of the last
        retransmission timer backoff is allowed, then

           proceed to step (5);

        else

           proceed to step (3).

   (5)  Extract the TCP segment header included in the ICMP destination
        unreachable message ICMP_DU:

           SEG := Extract(ICMP_DU).

   (6)  If "SEG.SEQ == SND.UNA", i.e., the ICMP unreachable message
        ICMP_DU reports on the oldest outstanding segment, then undo the
        last retransmission timer backoff:

           Backoff_cnt := Backoff_cnt - 1;
           RTO := RTO_base * 2^(Backoff_cnt).

   (7)  If the retransmission timer expires due to the undoing in the
        previous step (6), then

           proceed to step (R);

        else

           proceed to step (3).

   (A)  This is a placeholder for the standard TCP behavior that must be
        executed at this point in the case an acceptable ACK has
        arrived.  No further processing.

   When a TCP in steady-state detects a segment loss using the
   retransmission timer it enters the timeout-based loss recovery and
   initiates the algorithm (step 1).  It adjusts the slow start
   threshold (ssthresh), sets the congestion window (CWND) to one
   segment, backs off the retransmission timer and retransmits the first
   unacknowledged segment (step R) [I-D.ietf-tcpm-rfc2581bis] [RFC2988].
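
   The steps of the algorithm can be illustrated with a small model.
   The following Python sketch is not part of the specification and is
   not taken from any implementation.  The names LcdState, MAX_RTO, and
   the "elapsed" parameter are hypothetical, the 60 s cap is an assumed
   value, and the retransmission in step (R) is reduced to the timer
   backoff only.

```python
from dataclasses import dataclass

# Assumption: a 60 s upper bound on the RTO (RFC 2988 permits a maximum of
# at least 60 s). Real stacks may use a different value.
MAX_RTO = 60.0


@dataclass
class LcdState:
    """Hypothetical per-connection state for one timeout-based loss recovery."""

    rto: float            # current retransmission timeout, in seconds
    snd_una: int          # sequence number of the oldest outstanding segment
    rto_base: float = 0.0
    backoff_cnt: int = 0

    def start_recovery(self) -> None:
        # Step (1): record the RTO before the first backoff is applied.
        self.backoff_cnt = 0
        self.rto_base = self.rto

    def on_timeout(self) -> None:
        # Step (R), simplified: the retransmission of SND.UNA itself is not
        # modelled; only the exponential backoff is.
        previous = self.rto
        self.rto = min(self.rto * 2, MAX_RTO)
        # Step (2): count the backoff only if the timer was really backed off.
        if self.rto > previous:
            self.backoff_cnt += 1

    def on_icmp_unreachable(self, seg_seq: int, elapsed: float) -> str:
        # Step (4): an undo is only allowed if a backoff is outstanding.
        # Steps (5)/(6): the message must report on SND.UNA.
        if self.backoff_cnt == 0 or seg_seq != self.snd_una:
            return "ignore"
        self.backoff_cnt -= 1
        self.rto = self.rto_base * 2 ** self.backoff_cnt
        # Step (7): `elapsed` is the time since the last retransmission
        # (an assumption about how the running timer is measured).
        return "retransmit" if elapsed >= self.rto else "wait"
```

   In this model a caller would invoke start_recovery() followed by
   on_timeout() on the first timeout, on_timeout() again on each later
   timeout, and on_icmp_unreachable() for each ICMP unreachable message.
   A return value of "retransmit" corresponds to proceeding to step (R),
   "wait" and "ignore" to returning to step (3).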

   In case the retransmission timer expires again (step 3a) a TCP will
   repeat the retransmission of the first unacknowledged segment and
   back off the retransmission timer once more (step R).  If a maximum
   value is placed on the RTO (rule 2.5 in [RFC2988]) and that maximum
   value is already reached, the TCP will not back off the
   retransmission timer in this step and thus "Backoff_cnt" MUST NOT be
   incremented.  However, the "last step" to reach this maximum RTO is
   still considered as a backoff in the scope of this algorithm and
   "Backoff_cnt" MUST be incremented, even if the RTO is not strictly
   doubled.

   If the first received packet after the retransmission(s) is an
   acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow
   start the connection and terminate the algorithm (step A).  Later
   ICMP unreachable messages from the just terminated timeout-based loss
   recovery are of no use and therefore ignored since the ACK clock is
   already restarting due to the successful retransmission.

   On the other hand, if the first received packet after the
   retransmission(s) is an ICMP unreachable message (step 3c), a TCP
   SHOULD, if allowed (step 4), undo one backoff for each ICMP
   unreachable message reporting an error on a retransmission.  To
   decide if an ICMP unreachable message reports on a retransmission,
   the sequence number therein is exploited (step 5, step 6).  The undo
   is done by re-calculating the RTO with the previously reduced
   "Backoff_cnt".  This calculation explicitly matches the exponential
   backoff specified in [RFC2988] (rule 5.5).

   Upon receipt of an ICMP unreachable message which legitimately undoes
   one backoff there is the possibility that this newly started
   retransmission timer has expired already (step 7).  Then, a TCP
   SHOULD retransmit immediately, i.e., an ICMP message clocked
   retransmission.
   In case the newly started retransmission timer has not expired yet,
   TCP MUST wait accordingly.

4.3.  Discussion

   It is important to note that the proposed algorithm only reacts to
   connectivity disruption indications in form of ICMP destination
   unreachable messages during the phase of RTO-induced loss recovery.
   That is, TCP's behavior is not altered when no ICMP unreachable
   messages are received, or the retransmission timer of the TCP sender
   did not yet expire since the last successfully received ACK.  Thereby
   the algorithm is by definition only triggered in the case of long
   connectivity disruptions.

   Only such ICMP unreachable messages which are reporting on the
   sequence number of the retransmission (SND.UNA) are evaluated by the
   proposed algorithm.  All other ICMP unreachable messages are ignored.
   If an ICMP unreachable message arrives for a retransmission it
   provides evidence that neither the retransmission nor the
   corresponding ICMP unreachable message itself did experience any
   congestion.  In other words, there is strong evidence that the
   retransmission was not lost due to congestion, but due to a
   connectivity disruption instead.

   One could argue that if an ICMP unreachable message arrives for an
   RTO-induced retransmission, the RTO should be reset, and the next
   retransmission sent out immediately similar to what is done when an
   ACK arrives after an RTO-induced recovery phase.  This would allow
   for a much higher probing frequency based on the round trip time of
   the router where the connectivity is disrupted.  However, we consider
   our proposed scheme a good trade-off between conservative behavior
   and a fast detection of connectivity re-establishment.

   Of course there is an ambiguity about which (re-)transmission an ICMP
   unreachable message reports on.
   However, for our purposes it is not considered to be a problem,
   because the assumption that such an ICMP message provides evidence
   that one link loss was wrongly considered as a congestion loss still
   holds.  There is also the option to make use of the timestamps option
   to obtain a more strict mapping between segments and ICMP messages
   (see Section 4.4).

   Besides the ambiguity if the first unacknowledged sequence number
   refers to the original transmission or to any of the retransmissions,
   there is another source of ambiguity about the sequence numbers
   contained in the ICMP unreachable messages.  For high bandwidth paths
   like modern gigabit links the sequence space may wrap rather quickly,
   thereby allowing the possibility that a late ICMP unreachable message
   reporting on an old error may coincidentally fit as input in the
   scheme explained above.  As a result, the scheme would wrongly undo
   one backoff.  Chances for this to happen are minuscule, since a
   particular ICMP message would need to contain the exact sequence
   number of SND.UNA, while at the same time TCP is coincidentally in
   timeout-based loss recovery.  Moreover, as the scheme is tailored
   most conservatively, no threat to the network from this issue may
   arise.

   Finally, the scheme explicitly does not call for a differentiation of
   ICMP unreachable messages originating from different routers, as the
   evidence of no congestion still holds even if the reporting router
   changed.

   Another exploitation of ICMP unreachable messages in the context of
   TCP congestion control might seem appropriate in case the ICMP
   unreachable message is received while TCP is in steady-state and the
   message refers to a segment from within the current window of data.
   As the RTT up to the router which generates the ICMP unreachable
   message is likely to be substantially shorter than the overall RTT to
   the destination, the ICMP unreachable message may very well reach the
   originating TCP while it is transmitting the current window of data.
   In case the remaining window is large, it might seem appropriate to
   refrain from transmitting the remaining window as there is timely
   evidence that it will only trigger further ICMP unreachable messages
   at that very router.  Although this might seem appropriate from a
   wastage perspective, it may be counterproductive from a security
   perspective since ICMP messages are easy to spoof, thereby allowing
   an easy attack on the TCP by simply forging such ICMP messages.

   An additional consideration is the following: in the presence of
   multi-path routing even the receipt of a legitimate ICMP unreachable
   message cannot be exploited accurately because there is the option
   that only one of the multiple paths to the destination is suffering
   from a connectivity disruption which causes ICMP unreachable messages
   to be sent.  Then however, there is the possibility that the path
   along which the connectivity disruption occurred contributed
   considerably to the overall bandwidth, such that a congestion
   response is very well reasonable.  However, this is not necessarily
   the case.  Therefore, a TCP has no means except for its inherent
   congestion control to decide on this matter.  All in all, it seems
   that for a connection in steady-state, i.e., not in RTO-induced
   recovery, reacting to ICMP unreachable messages in regard to
   congestion control is not appropriate.  For the case of RTO-based
   retransmissions, however, there is a reasonable congestion response,
   which is skipping further backoffs of the retransmission timer
   because there is no congestion indication - as described above.

4.4.  Protecting Against Misbehaving Routers (the Safe Variant)

   Given that the TCP Timestamps option [I-D.ietf-tcpm-1323bis] is
   enabled for a connection, a TCP sender MAY use the following
   algorithm to protect against misbehaving routers.

5.  Related Work

   In the literature there are several methods that address TCP's
   problems in the presence of connectivity disruptions.  Some of them
   try to improve TCP's performance by modifying lower layers.  For
   example, [SM03] introduces a "smart link layer" that buffers one
   segment for each ongoing connection and replays these segments on
   connectivity re-establishment.  This approach has a serious drawback:
   previously stateless intermediate routers have to be modified in
   order to inspect TCP headers, to track the end-to-end connection and
   to provide additional buffer space.  All in all, these lead to an
   additional need for memory and processing power.

   On the other hand, stateless link layer schemes, as proposed in
   [RFC3819], which unconditionally buffer some small number of packets
   may have another problem: if a packet is buffered longer than the
   maximum segment lifetime (MSL) of 2 min [RFC0793], i.e., the
   disconnection lasts longer than MSL, TCP's assumption that such
   segments will never be received will no longer be true, violating
   TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].

   Other approaches like TCP-F [CRVP01] or the Explicit Link Failure
   Notification (ELFN) [HV02] inform the TCP sender about a disrupted
   path by special messages generated from intermediate routers.  In
   case of a link failure they stop sending segments and freeze TCP's
   retransmission timers.  TCP-F stays in this state and remains silent
   until either a "route establishment notification" is received or an
   internal timer expires.  In contrast, ELFN periodically probes the
   network to detect connectivity re-establishment.
   Both proposals rely on changes to intermediate routers, whereas the
   scheme proposed in this document is a sender-only modification.
   Moreover, ELFN does not consider congestion and may impose serious
   additional load on the network, depending on the probe interval.

   The authors of ATCP [LS01] propose enhancements to identify
   different types of packet loss by introducing a layer between TCP
   and IP.  They utilize ICMP destination unreachable messages to set
   TCP's receiver advertised window to zero, thus forcing the TCP
   sender to perform zero window probing with an exponential backoff.
   ICMP destination unreachable messages that arrive during this
   probing period are ignored.  This approach is nearly orthogonal to
   this document, which exploits ICMP messages to undo a retransmission
   timer backoff when TCP is already probing.  In principle, both
   mechanisms could be combined.  However, due to security
   considerations, it does not seem appropriate to adopt ATCP's
   reaction, as discussed in Section 4.3.

   Schuetz et al. describe in [I-D.schuetz-tcpm-tcp-rlci] a set of TCP
   extensions that improve TCP's behavior when transmitting over paths
   whose characteristics can change on short time-scales.  Their
   proposed extensions modify the local behavior of TCP and introduce a
   new TCP option to signal locally received connectivity-change
   indications (CCIs) to remote peers.  Upon reception of a CCI, they
   re-probe the path characteristics either by performing a speculative
   retransmission or by sending a single segment of new data, depending
   on whether the connection is currently stalled in exponential
   backoff or transmitting in steady-state, respectively.  The authors
   focus on specifying TCP response mechanisms; nevertheless,
   underlying layers would have to be modified to explicitly send CCIs
   to make these immediate responses possible.

6.  IANA Considerations

   This memo includes no request to IANA.

7.  Security Considerations

   The proposed algorithm is considered to be secure.  For example, an
   attacker cannot make a TCP modified with the proposed scheme flood
   the network just by sending forged ICMP unreachable messages in an
   attempt to maliciously shorten the retransmission timer.  An
   attacker would need to guess the correct sequence number of the
   current retransmission, which seems very unlikely.  Even in the case
   of an omniscient attacker, the impact on network load would be low,
   since the retransmission frequency is limited by the RTO that was
   computed before TCP entered the timeout-based loss recovery.  (The
   highest probing frequency is expected to be even lower than once per
   minimum RTO, that is, 1 s as specified by [RFC2988].)

8.  Acknowledgments

   We would like to thank Timothy Shepard and Joe Touch for feedback on
   earlier versions of this draft.  We also thank Michael Faber, Daniel
   Schaffrath, and Damian Lukowski for implementing and testing the
   algorithm in Linux.  Special thanks go to Ilpo Jarvinen, who gave
   valuable feedback regarding the Linux implementation.

   This document was written with the xml2rfc tool described in
   [RFC2629].

9.  References

9.1.  Normative References

   [I-D.ietf-tcpm-1323bis]
              Borman, D., Braden, R., and V. Jacobson, "TCP Extensions
              for High Performance", draft-ietf-tcpm-1323bis-01 (work in
              progress), March 2009.

   [I-D.ietf-tcpm-rfc2581bis]
              Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", draft-ietf-tcpm-rfc2581bis-07 (work in
              progress), July 2009.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, September 1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2988]  Paxson, V.
              and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

9.2.  Informative References

   [CRVP01]   Chandran, K., Raghunathan, S., Venkatesan, S., and R.
              Prakash, "A feedback-based scheme for improving TCP
              performance in ad hoc wireless networks", IEEE Personal
              Communications vol. 8, no. 1, pp. 34-39, February 2001.

   [HV02]     Holland, G. and N. Vaidya, "Analysis of TCP performance
              over mobile ad hoc networks", Wireless Networks vol. 8,
              no. 2-3, pp. 275-288, March 2002.

   [I-D.eggert-tcpm-tcp-retransmit-now]
              Eggert, L., "TCP Extensions for Immediate
              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
              (work in progress), June 2005.

   [I-D.ietf-tcpm-rfc4138bis]
              Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP",
              draft-ietf-tcpm-rfc4138bis-04 (work in progress),
              October 2008.

   [I-D.schuetz-tcpm-tcp-rlci]
              Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami,
              Y., and K. Le, "TCP Response to Lower-Layer Connectivity-
              Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work
              in progress), February 2008.

   [LS01]     Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc
              networks", IEEE Journal on Selected Areas in
              Communications vol. 19, no. 7, pp. 1300-1315, July 2001.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
              for TCP", RFC 3522, April 2003.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
              April 2007.

   [SESB05]   Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
              "Protocol enhancements for intermittently connected
              hosts", SIGCOMM Computer Communication Review vol. 35, no.
              3, pp. 5-18, December 2005.

   [SM03]     Scott, J. and G. Mapp, "Link layer-based TCP optimisation
              for disconnecting networks", SIGCOMM Computer
              Communication Review vol. 33, no. 5, pp. 31-42,
              October 2003.

Appendix A.  TODO list

   o  Extend the Security Sections 4.4 and 7.

   o  Extend the discussion in Section 4.3:

      *  ICMPv6.  See [RFC4443] and [RFC4884].

      *  Explicit Congestion Notification (ECN).

      *  More about congestion in general.

   o  Mention the possible side-effect on TCP implementations that
      measure the thresholds R1 and R2 (Section 4.2.3.5 of [RFC1122]) as
      a count of retransmissions instead of time units.

   o  Discuss the influence of packet duplication on the algorithm
      (Thanks to Ilpo).

Appendix B.  Changes from previous versions of the draft

B.1.  Changes from draft-zimmermann-tcp-lcd-01

   o  The algorithm in Section 4.2 was slightly changed.  Instead of
      reverting the RTO by halving it, it is recalculated with help of
      the "Backoff_cnt" variable.
      This fixes an issue that occurred when the retransmission timer
      was backed off but bounded by a maximum value.  The algorithm in
      the previous version of the draft would have "reverted" to half
      of that maximum value instead of using the value the RTO had
      before it was doubled (and then bounded).

   o  Miscellaneous editorial changes.

   o  Extended the TODO list (Appendix A).

B.2.  Changes from draft-zimmermann-tcp-lcd-00

   o  Miscellaneous editorial changes in Sections 1, 2, and 3.

   o  Sections 1, 2, and 3 were restructured for easier reading.  The
      motivation for the algorithm was changed to reflect TCP's
      inability to distinguish congestion loss from non-congestion
      loss.

   o  Added Section 4.1.

   o  The algorithm in Section 4.2 was restructured and simplified:

      *  The special case of the first received ICMP destination
         unreachable message after an RTO was removed.

      *  The "Backoff_cnt" variable was introduced so it is no longer
         possible to perform more reverts than backoffs.

   o  The discussion in Section 4.3 was improved and expanded according
      to the algorithm changes.

   o  Added Section 4.4.

Authors' Addresses

   Alexander Zimmermann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen, 52074
   Germany

   Phone: +49 241 80 21422
   Email: zimmermann@cs.rwth-aachen.de

   Arnd Hannemann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen, 52074
   Germany

   Phone: +49 241 80 21423
   Email: hannemann@nets.rwth-aachen.de