Internet Engineering Task Force                              I. Jarvinen
INTERNET-DRAFT                                                   M. Kojo
draft-ietf-tcpm-sack-recovery-entry-01.txt        University of Helsinki
Intended status: Standards Track                            8 March 2010
Expires: September 2010

      Using TCP Selective Acknowledgement (SACK) Information to Determine
           Duplicate Acknowledgements for Loss Recovery Initiation

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document describes a TCP sender algorithm to trigger loss
   recovery based on the TCP Selective Acknowledgement (SACK)
   information gathered on a SACK scoreboard instead of simply counting
   the number of arriving duplicate acknowledgements (ACKs) in the
   traditional way.  The given algorithm is more robust to ACK losses,
   ACK reordering, missed duplicate acknowledgements due to delayed
   acknowledgements, and extra duplicate acknowledgements due to
   duplicated segments and out-of-window segments.  The algorithm not
   only allows a timely initiation of TCP loss recovery but also
   reduces false fast retransmits.  It has a low implementation cost on
   top of the SACK scoreboard defined in RFC 3517.

Table of Contents

   1. Introduction
      1.1. Conventions and Terminology
      1.2. Definitions
   2. Algorithm Details
      2.1. Redefined IsLost (SeqNum)
      2.2. The Algorithm
   3. Discussion
      3.1. Small Segment Sender
      3.2. SACK Capability Misbehavior
      3.3. Compatibility with Duplicate ACK based Loss Recovery
           Algorithms
   4. Security Considerations
   5. IANA Considerations
   6. Acknowledgements
   Appendix
   A. Scenarios
      A.1. Basic Case
      A.2. Delayed ACK
      A.3. ACK Loss
      A.4. ACK Reordering
      A.5. Duplicated Packet
      A.6. Mitigation of Blind Throughput Reduction Attack
   References
      Normative References
      Informative References
   AUTHORS' ADDRESSES

TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

   Changes from draft-ietf-tcpm-sack-recovery-entry-00.txt

   * Mention setting of RecoveryPoint explicitly as this algorithm
     depends on it being valid.

   * Changed definition of IsLost (SeqNum) to be less strict.

   * Changed packet ordering in one of the appendix examples; it now
     makes more sense in the context of this algorithm.  Point out in
     the examples which of the transmissions are due to Limited
     Transmit and Fast Retransmit.

   Changes from draft-jarvinen-tcpm-sack-recovery-entry-01.txt

   * Clarified issues that, based on feedback, may confuse the reader.

   * Incorporated handling of cumulative ACKs into the algorithm.

   * RFC 2581 refs -> RFC 5681.

   * Added the early-rexmt ID as a related one; it uses SACK
     information similarly to this algorithm (thanks to Anna
     Brunstrom).

   * More cases added where this algorithm is beneficial in taking
     advantage of SACK block redundancy (thanks to Anna Brunstrom).

   * Discuss differences in how the duplicate ACK counter is managed
     (traditional vs. this algorithm).

   * Added ref and a couple of words about the blind throughput
     reduction attack.

   * Wrote up SACK splitting attacks.  These attacks are quite close to
     the edge in significance.  Should consider just dropping (rather
     insignificant).

   Changes from draft-jarvinen-tcpm-sack-recovery-entry-00.txt

   * TODO items embedded: Improvements with window update, clarify
     dupack counting.

   * Modified ACK reordering scenario in appendix; it now shows a
     scenario where recovery is triggered in a more timely manner.

   * IDnits.

   * Handle small segments case using a duplicate ACK counter parallel
     to the SACK block based detection.

   * Add a placeholder for SACK splitting.

   * Mentioned FACK as some ideas are inherited from there.

END OF SECTION TO BE DELETED.

1. Introduction

   The Transmission Control Protocol (TCP) [RFC793] has two methods for
   triggering retransmissions.  First, the TCP sender relies on
   incoming duplicate acknowledgements (ACKs) [RFC5681], indicating
   receipt of out-of-order segments at the TCP receiver.  After
   receiving a required number of duplicate ACKs (usually three), the
   TCP sender retransmits the first unacknowledged segment and
   continues with a fast recovery algorithm such as Reno [RFC5681],
   NewReno [RFC3782], or SACK-based loss recovery [RFC3517].  Second,
   the TCP sender maintains a retransmission timer that triggers
   retransmission of segments if the timer expires before the segments
   have been acknowledged.

   While the conservative loss recovery algorithm defined in [RFC3517]
   takes full advantage of SACK information during loss recovery, it
   does not consider the very same information during the pre-recovery
   detection phase.  Instead, it simply counts the number of arriving
   duplicate ACKs and relies on that count in deciding when to enter
   loss recovery.  However, this traditional heuristic of simply
   counting duplicate ACKs fails in several cases to correctly
   determine the actual number of valid out-of-order segments the
   receiver has successfully received.
   First, relying on duplicate ACKs alone fails to capture the whole
   picture in case of ACK losses and ACK reordering, resulting in
   delayed or missed initiation of fast retransmit and fast recovery.
   Similarly, the delayed ACK mechanism tends to conceal the first
   duplicate ACK, as the delayed cumulative ACK is combined with the
   first duplicate ACK when the first out-of-order segment arrives at
   the receiver (with an enlarged ACK ratio, such as with ACK
   congestion control [RFC5690], an even more significant portion of
   duplicate ACKs is affected).  Second, segment duplication or
   out-of-window segments increase the risk of falsely triggering loss
   recovery, as they trigger duplicate ACKs.  At worst, this legitimate
   behavior on out-of-window segments can be turned into a blind
   throughput reduction attack [CPNI09].  Third, receiver window
   updates or opposite direction data segments cannot be counted as
   duplicate ACKs with the traditional approach but can still contain
   redundant SACK information that the sender could benefit from in a
   scenario where the actual duplicate ACKs were lost.

   The algorithm specified in this document uses the TCP Selective
   Acknowledgement Option [RFC2018] in the pre-recovery state to
   determine duplicate ACKs and to trigger loss recovery based on the
   information gathered on the SACK scoreboard [RFC3517].  It gives a
   more accurate heuristic for determining the number of out-of-order
   segments that have arrived at the TCP receiver.  The information
   gathered on the SACK scoreboard reveals missing ACKs and allows
   detecting duplicate events.  Therefore, the algorithm enables a
   timely triggering of Fast Retransmit.  In addition, it allows
   Limited Transmit [RFC3042] to be used accurately regardless of lost
   ACKs, and also in cases where the SACK information is piggybacked on
   a cumulative ACK due to delayed ACKs.
   This, in turn, improves the accuracy of the ACK clock.

   This algorithm is close to what the Linux TCP implementation has
   long used in its conservative SACK mode.  A similar approach is
   briefly mentioned in the context of ACK congestion control
   [RFC5690], but as the algorithm in this document is useful more
   generally and not only with ACK congestion control, we specify it
   separately.  We also note that the definition of a duplicate
   acknowledgement already suggests that an incoming ACK can be
   considered a duplicate ACK if it "contains previously unknown SACK
   information" [RFC5681].  In addition, SACK information is used,
   whenever available, for a similar purpose by Early Retransmit
   [AAA+10].

   This algorithm also resembles Forward Acknowledgement (FACK) [MM96],
   but they differ in how the quantity of data outstanding in the
   network is determined.  FACK always assumes that every non-SACKed
   octet below the highest SACKed octet is lost, which is only true if
   no reordering occurs.  Thus, FACK would simply trigger loss recovery
   whenever the highest SACKed octet is more than DupThresh * SMSS
   octets above SND.UNA.

1.1. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119] and indicate requirement levels for protocols.

1.2. Definitions

   The reader is expected to be familiar with the definitions given in
   [RFC5681], [RFC2018], and [RFC3517].

2. Algorithm Details

   In order to use this algorithm, a TCP sender MUST have the TCP
   Selective Acknowledgement Option [RFC2018] enabled and negotiated for
   the TCP connection.  The TCP sender MUST maintain SACK information
   in an appropriate data structure, such as the scoreboard defined in
   [RFC3517].
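
   As an illustration only, the Python sketch below shows one possible
   shape for such a data structure.  The names Scoreboard, update(),
   and merge() are hypothetical; SACKed data is assumed to be kept as
   sorted, non-overlapping half-open [left, right) ranges, matching the
   Left Edge/Right Edge convention of [RFC2018]; HighData is taken to
   be the highest sequence number sent; and sequence number wraparound
   is ignored.  update() is a simplification, not the Update() routine
   of [RFC3517]: it returns how many in-window octets an ACK newly
   SACKed, which is what step 4 of the algorithm needs to know.

```python
def merge(ranges):
    """Return sorted, non-overlapping [left, right) ranges covering the input."""
    merged = []
    for left, right in sorted(ranges):
        if merged and left <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


class Scoreboard:
    """Hypothetical minimal SACK scoreboard (not the RFC 3517 structure)."""

    def __init__(self, snd_una, high_data):
        self.snd_una = snd_una      # SND.UNA: oldest unacknowledged octet
        self.high_data = high_data  # HighData: highest sequence number sent
        self.sacked = []            # SACKed ranges above SND.UNA

    def sacked_octets(self):
        return sum(right - left for left, right in self.sacked)

    def update(self, ack, sack_blocks):
        """Apply one ACK; return the number of newly SACKed in-window octets."""
        if ack > self.snd_una:
            # Cumulative ACK: forget SACK state that is now below SND.UNA.
            self.snd_una = ack
            self.sacked = [(max(l, ack), r) for l, r in self.sacked if r > ack]
        before = self.sacked_octets()
        for left, right in sack_blocks:
            # Clip each block to the window SND.UNA .. HighData.
            left = max(left, self.snd_una)
            right = min(right, self.high_data + 1)
            if left < right:
                self.sacked.append((left, right))
        self.sacked = merge(self.sacked)
        return self.sacked_octets() - before


if __name__ == "__main__":
    sb = Scoreboard(snd_una=4000, high_data=6999)
    print(sb.update(4000, [(4500, 5000)]))  # 500: one new segment SACKed
    print(sb.update(4000, [(4500, 5000)]))  # 0: redundant, e.g. a reordered ACK
    print(sb.update(4000, [(3000, 3500)]))  # 0: DSACK block below SND.UNA
```

   Clipping at SND.UNA is what makes DSACK blocks and other
   below-window reports contribute no new octets, and merging adjacent
   ranges keeps the structure small.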

   This algorithm uses the functions Update() and SetPipe() and the
   variables DupThresh, HighData, HighRxt, Pipe, and RecoveryPoint, as
   defined in [RFC3517].  Note: the definition of IsLost (SeqNum) is
   altered from the one specified in [RFC3517].

2.1. Redefined IsLost (SeqNum)

   IsLost (SeqNum) as defined in [RFC3517] is stricter than necessary
   in counting how many segments the receiver has received past
   SeqNum.  Instead of requiring at least three times SMSS bytes to be
   SACKed, it is enough to have at least two times SMSS bytes plus one
   byte SACKed to confirm that the receiver has received at least three
   segments above SeqNum (and would have generated at least three
   duplicate ACKs): since no segment carries more than SMSS bytes, more
   than two times SMSS SACKed bytes must span at least three segments.
   The less strict definition is:

      IsLost (SeqNum):

         This routine returns whether the given sequence number is
         considered to be lost.  The routine returns true when either
         DupThresh discontiguous SACKed sequences have arrived above
         'SeqNum' or more than (DupThresh - 1) * SMSS bytes with
         sequence numbers greater than 'SeqNum' have been SACKed.
         Otherwise, the routine returns false.

2.2. The Algorithm

   A TCP sender using this algorithm MUST take the following steps upon
   the receipt of any ACK containing SACK information:

   1) If no previous loss event has occurred on the connection OR
      RecoveryPoint is less than SND.UNA (the oldest unacknowledged
      sequence number [RFC793]), continue with the other steps of this
      algorithm.  Otherwise, continue the ongoing loss recovery.

   2) Update the scoreboard via the Update() function as outlined in
      [RFC3517].

   3) If the ACK is a cumulative ACK, reset the duplicate ACK counter
      to zero.

   4) If the ACK contains SACK blocks with previously unknown in-window
      SACK information (i.e., between SND.UNA and HighData, assuming
      SND.UNA has been updated from the acknowledgment number of the
      ACK), increase the duplicate ACK counter.

   5) Determine whether loss recovery should be initiated:

      If IsLost (SND.UNA) returns false AND the sender has received
      fewer than DupThresh duplicate ACKs, go to step 6A.  Otherwise,
      go to step 6B.

   6A) Invoke optional Limited Transmit:

      Set HighRxt to SND.UNA and run SetPipe().  The TCP sender MAY
      transmit previously unsent data segments according to the
      guidelines of Limited Transmit [RFC3042], with the exception
      that the number of octets that can be sent is determined by Pipe
      and cwnd.

      If cwnd - Pipe >= 1 SMSS, the TCP sender can transmit one or
      more segments as follows:

      Send Loop:

      a) If available unsent data exists and the receiver's advertised
         window allows, transmit one segment of up to SMSS octets of
         previously unsent data starting with sequence number
         HighData+1 and update HighData to reflect the transmission of
         the data segment.  Otherwise, exit Send Loop.

      b) Run SetPipe() to re-calculate the number of outstanding
         octets in the network.  If cwnd - Pipe >= 1 SMSS, go to step
         a) of Send Loop.  Otherwise, exit Send Loop.

   6B) Invoke Fast Retransmit and enter loss recovery:

      Initiate a loss recovery phase, per the fast retransmit
      algorithm outlined in [RFC5681], and continue with a fast
      recovery algorithm such as the SACK-based loss recovery
      algorithm outlined in [RFC3517].  This includes setting
      RecoveryPoint to HighData as in step (1) of [RFC3517].

3. Discussion

   In scenarios where neither ACK losses nor reordering occur and the
   first acknowledgement with SACK information is not the ACK held back
   by the delayed acknowledgement mechanism, the new SACK information
   in each duplicate ACK covers a single segment.  Those duplicate ACKs
   cause this algorithm to trigger loss recovery after three duplicate
   acknowledgements and allow transmission of new segments using
   Limited Transmit on the first and second duplicate ACKs.
   This is identical to the behavior that would occur without this
   algorithm (assuming DupThresh is 3 and that all segments are SMSS
   sized).  This scenario, together with other typical scenarios
   describing the behavior of the algorithm, is depicted in Appendix A.

   This algorithm SHOULD also be used with an ACK that contains a
   window update or opposite direction data that could not be
   considered a duplicate ACK in the traditional algorithm.  Such
   behavior is safe because the SACK information can only add more
   information to the current state of the sender; at worst, all
   received information is just redundant.

   Setting HighRxt to SND.UNA in step 6A has no direct relation to this
   algorithm.  It is nevertheless included in the algorithm to avoid
   confusion about how to implement SetPipe() correctly, because
   SetPipe() depends on having a valid HighRxt value [RFC3517].

   The following subsections discuss a set of potential issues to
   consider with the algorithm.

3.1. Small Segment Sender

   If a TCP sender is sending small segments (usually by intentionally
   overriding the Nagle algorithm [RFC896]), the IsLost (SND.UNA) check
   used in step 5 of the algorithm might fail to detect the need for
   loss recovery on the third duplicate acknowledgement, because not
   enough octets have been SACKed to cover more than
   (DupThresh - 1) * SMSS bytes above SND.UNA.  Therefore, an adapted
   duplicate ACK algorithm is needed as a fallback.  Steps 3, 4, and the
   latter condition of step 5 implement the adapted duplicate ACK
   algorithm in parallel to the SACK block based detection.

   The number of duplicate ACKs is an artificial metric to estimate the
   number of segments the receiver already has in its receive buffer.
   How accurately they match depends on the scenario.
   Because of that, the goal of the duplicate ACK counter included in
   this algorithm is not to achieve bug-to-bug compatibility with the
   plain duplicate ACK counter but to estimate more accurately how many
   out-of-order segments the receiver has already queued.  Therefore,
   the duplicate ACK counter used as a fallback mechanism in this
   algorithm differs from the plain duplicate ACK counter.  However,
   such differences indicate a scenario where the plain counter was not
   able to accurately keep track of the receiver state.

   While the fallback algorithm itself does not look at the
   acknowledgment field in order to decide whether an ACK is a
   "duplicate ACK", the duplicate ACK counter is not renamed in this
   document because, in practice, most of the ACKs that increment the
   counter still contain a duplicate acknowledgment number.  In
   contrast to the traditional approach, the only condition that must
   be satisfied to increment the duplicate ACK counter with this
   algorithm is that the acknowledgement MUST contain at least one
   in-window SACK block that covers octets that were not previously
   SACKed [RFC5681].  In cases with ACK losses or delayed ACKs, this
   condition can also match cumulative ACKs, receiver window updates,
   and opposite direction data segments, but the counter can still
   safely be incremented.

   As an alternative to the fallback algorithm, a TCP sender that is
   able to discern segment boundaries accurately can consider full
   segments in IsLost (SeqNum) regardless of segment size.  Such a TCP
   sender can therefore avoid the problem with small segments using
   the IsLost (SND.UNA) check alone, which means that steps 3, 4, and
   the latter condition of step 5 are redundant and not required to be
   implemented.
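
   To make the interplay concrete, the following Python sketch
   combines the redefined IsLost (SeqNum) of Section 2.1 with the
   fallback duplicate ACK counter of steps 3 and 4 and the decision of
   step 5.  It is a sketch under stated assumptions, not a reference
   implementation: EntryDecision and is_lost() are hypothetical names,
   step 1 and the actions of steps 6A and 6B are left out, and the
   caller is assumed to have already run the scoreboard update of step
   2 and to pass in the SACKed ranges above SND.UNA (half-open
   [left, right) tuples), whether the ACK was cumulative, and the
   number of newly SACKed in-window octets.  SMSS is assumed to be 500
   octets, as in the scenarios of Appendix A.

```python
SMSS = 500        # assumed sender maximum segment size, in octets
DUP_THRESH = 3    # DupThresh


def is_lost(seq, sacked, smss=SMSS, dup_thresh=DUP_THRESH):
    """Redefined IsLost (SeqNum) of Section 2.1.

    `sacked` holds SACKed data as sorted, disjoint [left, right) tuples.
    """
    # Keep only SACKed octets with sequence numbers greater than seq.
    above = [(max(left, seq + 1), right)
             for left, right in sacked if right > seq + 1]
    octets = sum(right - left for left, right in above)
    # DupThresh discontiguous SACKed ranges, or more than
    # (DupThresh - 1) * SMSS SACKed octets, above seq.
    return len(above) >= dup_thresh or octets > (dup_thresh - 1) * smss


class EntryDecision:
    """Hypothetical per-connection state for steps 3 to 5 of Section 2.2."""

    def __init__(self):
        self.dupacks = 0  # fallback duplicate ACK counter

    def on_ack(self, snd_una, sacked, cumulative, new_sack_octets):
        """Return "6A" (Limited Transmit) or "6B" (Fast Retransmit)."""
        if cumulative:                # step 3: reset on a cumulative ACK
            self.dupacks = 0
        if new_sack_octets > 0:       # step 4: new in-window SACK info
            self.dupacks += 1
        # Step 5: the SACK-based check and the fallback counter together.
        if not is_lost(snd_una, sacked) and self.dupacks < DUP_THRESH:
            return "6A"
        return "6B"


if __name__ == "__main__":
    decide = EntryDecision()
    # Appendix A.1: each duplicate ACK newly SACKs one 500-octet segment.
    for sacked in ([(4500, 5000)], [(4500, 5500)], [(4500, 6000)]):
        print(decide.on_ack(4000, sacked, False, 500))  # 6A, 6A, then 6B
```

   With 100-octet segments, three ACKs that each SACK one new segment
   leave IsLost (SND.UNA) false (300 SACKed octets in a single range),
   but they bring the counter to DupThresh, so the fallback alone
   selects step 6B.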

   Note: the small segments problem is not unique to this algorithm;
   the SACK-based loss recovery [RFC3517] also encounters it because of
   how IsLost (SeqNum) is defined.

3.2. SACK Capability Misbehavior

   If the receiver misbehaves by advertising SACK capability but never
   sending any SACK blocks when it should, this algorithm fails to
   enter loss recovery, and a retransmission timeout is required for
   recovery.  However, such misbehavior does not allow SACK-based loss
   recovery [RFC3517] to work either, and a TCP sender will in any case
   require a timeout to recover if there was more than one lost data
   segment within the window.

3.3. Compatibility with Duplicate ACK based Loss Recovery Algorithms

   This algorithm SHOULD NOT be used together with a fast recovery
   algorithm that determines the segments that have left the network
   based on the number of arriving duplicate acknowledgements (e.g.,
   NewReno [RFC3782]) instead of the actual segments reported by SACK.
   In the presence of ACK reordering, such an algorithm will count the
   delayed duplicate acknowledgements arriving during fast recovery as
   extra while determining the number of packets that have left the
   network.

   In general, there should be very little reason to combine this
   algorithm with a loss recovery algorithm that is based on inferior,
   non-SACK based information only.

4. Security Considerations

   A malicious TCP receiver may send false SACK information for
   sequence number ranges that it has not received in order to trigger
   Fast Retransmit sooner.  Such behavior would only be useful when
   out-of-order segments have arrived, because otherwise the flow
   undergoes loss recovery with a window reduction.  This kind of lying
   involves guessing which segments will arrive later.
   In case the guess was wrong, the performance of the flow is ruined,
   because the TCP sender will need a retransmission timeout: it will
   not retransmit the falsely SACKed segments until it assumes SACK
   reneging.  On a successful guess, the attacker is able to trigger
   recovery only slightly earlier, since the later segments would have
   allowed reporting the very same regions with SACK anyway.
   Therefore, the gain from this attack is small and hardly justifiable
   considering the drastic effect of a misguess.  Furthermore, a
   similar attack can be made against the duplicate acknowledgment
   based algorithm (even if the new SACK information rule is applied)
   by sending false duplicate acknowledgements with false SACK ranges,
   and trivially without the new SACK information rule.

   A variation of the lying attack sacrifices the reliability of the
   flow, but once reliability is not a concern of the receiver, a
   number of simpler ways exist to attack TCP independently of this
   algorithm.  Thus, this algorithm is not considered to weaken TCP
   security properties against false information.

   Splitting SACK blocks into chunks smaller than the received segments
   allows the receiver to make recovery start sooner because of the
   discontiguous check in IsLost (SeqNum).  However, by doing so the
   receiver neglects the possibility of reordering for little gain.  If
   the segment was just reordered, the sender performs an unnecessary
   window reduction and an unnecessary retransmission of the reordered
   segment.  Another variant of SACK block splitting simply tries to
   increase bandwidth consumption by falsely triggering a burst of
   retransmissions.  However, the difference between sending three
   duplicate ACKs (traditional algorithm) and a single ACK with SACK
   blocks does not offer benefits significant enough to make such an
   attack practical with a small DupThresh value such as three.
   In case the sender keeps track of segment boundaries and applies them
   in IsLost (SeqNum), such an attack will not succeed, as the sender
   cannot be misled into believing that a segment was split into
   multiple chunks.

5. IANA Considerations

   This document has no actions for IANA.

6. Acknowledgements

   The authors would like to thank Alexander Zimmermann and Anna
   Brunstrom for their comments on this document.

Appendix

A. Scenarios

A.1. Basic Case

   In this scenario, no Delayed ACK, ACK loss, reordering, or other
   "abnormal" behavior affects the duplicate ACKs.  For simplicity, all
   segments are SMSS sized.

   Once the TCP receiver gets the first out-of-order segment, it sends
   a duplicate ACK with SACK information about the received octets.
   The following two out-of-order segments trigger a duplicate ACK
   each, with the corresponding range SACKed in addition to the
   previously known information.  The sender gets those duplicate ACKs
   in order, and each of them SACKs a new, previously unknown segment.

   This algorithm triggers loss recovery on the third duplicate ACK
   because IsLost (SeqNum) returns true, as more than
   (DupThresh - 1) * SMSS bytes have become SACKed with that
   acknowledgement; thus the behavior is identical to that of a sender
   using duplicate acknowledgments.  If Limited Transmit is in use,
   each of the first two duplicate ACKs allows a single segment to be
   sent with either algorithm (each ACK SACKs SMSS octets, which
   decreases Pipe by SMSS and allows SMSS worth of new octets to be
   sent).

   ACK            Transmitted   Received    ACK Sent
   Received       Segment       Segment     (Including SACK Blocks)

   1000
                  3000-3499     3000-3499   (delayed ACK)
                  3500-3999     3500-3999   4000
   2000
                  4000-4499     (dropped)
                  4500-4999     4500-4999   4000, SACK=4500-5000
   3000
                  5000-5499     5000-5499   4000, SACK=4500-5500
                  5500-5999     5500-5999   4000, SACK=4500-6000
   4000
                  6000-6499     6000-6499   4000, SACK=4500-6500
                  6500-6999     6500-6999   4000, SACK=4500-7000
   4000, SACK=4500-5000
   (lim. tr.)     7000-7499     7000-7499   4000, SACK=4500-7500
   4000, SACK=4500-5500
   (lim. tr.)     7500-7999     7500-7999   4000, SACK=4500-8000
   4000, SACK=4500-6000
   (fast retr.)   4000-4499     4000-4499   8000
   4000, SACK=4500-6500

A.2. Delayed ACK

   The delayed ACK case occurs when the receiver sends the first ACK
   with SACK information, but the sender does not consider it a
   duplicate ACK, because the previous ACK was sent with a lower
   acknowledgment number while an acknowledgment was held back by the
   delayed ACK mechanism.  Because this acknowledgment contains SACK
   information identical to that in the basic case, the sender can use
   Limited Transmit in the same way as in the basic case and will start
   loss recovery at the third acknowledgment, i.e., with the second
   duplicate acknowledgment.  In the same situation, the duplicate ACK
   based sender will have to wait for one more duplicate ACK to arrive
   to do the same, as the first acknowledgment is fully "wasted".

   Technically, an acknowledgement with a sequence number higher than
   what was previously acknowledged is not a duplicate acknowledgement,
   but the presence of the SACK block tells another story: it reveals
   that the receiver used delayed ACKs, and thus the missing duplicate
   acknowledgement in between.
   The response of a TCP sender taking advantage of such inferred
   duplicate acknowledgements is well within the guidelines of the
   packet conservation principle [Jac88], as it still sends only when
   segments have left the network.

   ACK            Transmitted   Received    ACK Sent
   Received       Segment       Segment     (Including SACK Blocks)

   1500
                  3000-3499     3000-3499   3500
                  3500-3999     3500-3999   (delayed ACK)
   2500
                  4000-4499     (dropped)
                  4500-4999     4500-4999   4000, SACK=4500-5000
   3500
                  5000-5499     5000-5499   4000, SACK=4500-5500
                  5500-5999     5500-5999   4000, SACK=4500-6000
   4000, SACK=4500-5000 (two segments left the network)
                  6000-6499     6000-6499   4000, SACK=4500-6500
   (lim. tr.)     6500-6999     6500-6999   4000, SACK=4500-7000
   4000, SACK=4500-5500
   (lim. tr.)     7000-7499     7000-7499   4000, SACK=4500-7500
   4000, SACK=4500-6000
   (fast retr.)   4000-4499     4000-4499   7500
   4000, SACK=4500-6500

A.3. ACK Loss

   The ACK loss case shares much behavior with the delayed ACK case.
   If the hole at RCV.NXT is filled, the sender will notice that the
   cumulative ACK advanced.  In case of out-of-order segments, the
   first ACK that gets through to the sender includes SACK blocks up to
   the quantity that SACK block redundancy is able to cover.  With this
   algorithm, the sender immediately makes use of all the information
   made available by the incoming ACK.

   ACK            Transmitted   Received    ACK Sent
   Received       Segment       Segment     (Including SACK Blocks)

   1000
                  3000-3499     3000-3499   (delayed ACK)
                  3500-3999     3500-3999   4000
   2000
                  4000-4499     (dropped)
                  4500-4999     4500-4999   4000, SACK=4500-5000
                                            (dropped)
   3000
                  5000-5499     5000-5499   4000, SACK=4500-5500
                  5500-5999     5500-5999   4000, SACK=4500-6000
   4000
                  6000-6499     6000-6499   4000, SACK=4500-6500
                  6500-6999     6500-6999   4000, SACK=4500-7000
   4000, SACK=4500-5500 (two segments left the network)
   (lim. tr.)     7000-7499     7000-7499   4000, SACK=4500-7500
   (lim. tr.)     7500-7999     7500-7999   4000, SACK=4500-8000
   4000, SACK=4500-6000
   (fast retr.)   4000-4499     4000-4499   8000
   4000, SACK=4500-6500

A.4. ACK Reordering

   With ACK reordering, an ACK is postponed.  Due to redundancy, the
   next ACK after the postponed one contains not only its own
   information but also the information of the reordered ACK (similar
   to the ACK loss case).  When the reordered ACK arrives later, the
   sender already knows the information it provides, and therefore no
   actions are taken with this algorithm.

   ACK            Transmitted   Received    ACK Sent
   Received       Segment       Segment     (Including SACK Blocks)

   1000
                  3000-3499     3000-3499   (delayed ACK)
                  3500-3999     3500-3999   4000
   2000
                  4000-4499     (dropped)
                  4500-4999     4500-4999   4000, SACK=4500-5000
                                            (delayed)
   3000
                  5000-5499     5000-5499   4000, SACK=4500-5500
                  5500-5999     5500-5999   4000, SACK=4500-6000
   4000
                  6000-6499     6000-6499   4000, SACK=4500-6500
                  6500-6999     6500-6999   4000, SACK=4500-7000
   4000, SACK=4500-5500 (two segments left the network)
   (lim. tr.)     7000-7499     7000-7499   4000, SACK=4500-7500
   (lim. tr.)     7500-7999     7500-7999   4000, SACK=4500-8000
   4000, SACK=4500-5000 (has only redundant information)
   4000, SACK=4500-6000
   (fast retr.)   4000-4499     4000-4499   8000
   4000, SACK=4500-6500

A.5. Duplicated Packet

   A duplicate packet is received either due to an unnecessary
   retransmission or due to hardware duplication.  It either adds a
   redundant ACK that carries only redundant information, or adds a
   data segment to the stream that will trigger a redundant duplicate
   ACK (possibly with SACK and/or DSACK [RFC2883] information).
   Because neither adds any new SACKed octets at the TCP sender, this
   algorithm will not do anything, whereas a duplicate ACK based sender
   would falsely count it as a duplicate ACK.

   If one of the redundant ACKs is lost, the effect of duplication is
   just cancelled.
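
   The difference can be sketched in Python by counting the same ACK
   stream twice: once by acknowledgment number, as a traditional
   duplicate ACK counter does, and once by newly SACKed octets, as in
   steps 3 and 4 of Section 2.2.  The function names are hypothetical,
   every ACK is assumed to carry no data and no window update, the
   one-entry-per-octet set is chosen for clarity rather than
   efficiency, and clipping at HighData is omitted.

```python
def traditional_dupacks(acks, snd_una):
    """Count duplicate ACKs by acknowledgment number only (traditional way).

    Simplification: every ACK is assumed to carry no data and no window
    update, so only the acknowledgment number is compared.
    """
    count = 0
    for ack, _sack_blocks in acks:
        if ack > snd_una:            # cumulative ACK: new SND.UNA
            snd_una, count = ack, 0
        elif ack == snd_una:
            count += 1
    return count


def sack_dupacks(acks, snd_una):
    """Count ACKs that SACK previously unknown octets (steps 3 and 4)."""
    count = 0
    known = set()                    # SACKed octets, one entry per octet
    for ack, sack_blocks in acks:
        if ack > snd_una:            # step 3: cumulative ACK resets the counter
            snd_una, count = ack, 0
            known = {octet for octet in known if octet >= ack}
        # Octets below SND.UNA (e.g. a DSACK block) are ignored; clipping
        # at HighData is omitted for brevity.
        new = {octet
               for left, right in sack_blocks
               for octet in range(max(left, snd_una), right)} - known
        if new:                      # step 4: previously unknown SACK info
            count += 1
            known |= new
    return count


if __name__ == "__main__":
    # The ACK carrying SACK=4500-5000 is duplicated in the network.
    stream = [
        (4000, [(4500, 5000)]),
        (4000, [(4500, 5000)]),      # duplicate: carries nothing new
        (4000, [(4500, 5500)]),
    ]
    print(traditional_dupacks(stream, 4000), sack_dupacks(stream, 4000))  # 3 2
```

   A DSACK block reporting an already acknowledged segment falls below
   SND.UNA and is likewise ignored by the SACK-based count.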

   It would be possible for the sender to detect this case using DSACK
   alone.

A.6. Mitigation of Blind Throughput Reduction Attack

   In case an attacker knows or is able to guess the 4-tuple of a TCP
   connection, it may apply a blind throughput reduction attack
   [CPNI09].  In this attack, TCP is tricked into sending duplicate
   ACKs to the other endpoint by segments that most likely fall outside
   the receive window, which is considerably easier to achieve than
   matching the sequence numbers.  If at least DupThresh duplicate ACKs
   can be triggered in a row without any legitimate segment that
   advances the acknowledged sequence number, the other end acts on
   the false congestion signal and halves its window.

   With this algorithm, such duplicate ACKs are filtered out because
   they do not carry any new in-window SACK blocks (a DSACK block
   [RFC2883] might be present, but it does not cover in-window octets).

References

Normative References

   [RFC793]   Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow,
              "TCP Selective Acknowledgment Options", RFC 2018,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3042]  Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              January 2001.

   [RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang,
              "A Conservative Selective Acknowledgment (SACK)-based
              Loss Recovery Algorithm for TCP", RFC 3517, April 2003.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

Informative References

   [AAA+10]   Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J.,
              and P. Hurtig, "Early Retransmit for TCP and SCTP",
              Internet-Draft, draft-ietf-tcpm-early-rexmt-04,
              January 2010.

   [CPNI09]   "Security Assessment of the Transmission Control Protocol
              (TCP)", Available at:
              http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-TCP.pdf

   [Jac88]    Jacobson, V., "Congestion Avoidance and Control", In
              Proceedings of ACM SIGCOMM '88, August 1988.

   [MM96]     Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
              Refining TCP Congestion Control", In Proceedings of
              SIGCOMM '96, August 1996.

   [RFC896]   Nagle, J., "Congestion Control in IP/TCP Internetworks",
              RFC 896, January 1984.

   [RFC2883]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
              Extension to the Selective Acknowledgement (SACK) Option
              for TCP", RFC 2883, July 2000.

   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

   [RFC5690]  Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
              Acknowledgement Congestion Control to TCP", RFC 5690,
              February 2010.

AUTHORS' ADDRESSES

   Ilpo Jarvinen
   University of Helsinki
   P.O. Box 68
   FI-00014 UNIVERSITY OF HELSINKI
   Finland
   Email: ilpo.jarvinen@helsinki.fi

   Markku Kojo
   University of Helsinki
   P.O. Box 68
   FI-00014 UNIVERSITY OF HELSINKI
   Finland
   Email: kojo@cs.helsinki.fi