Internet Engineering Task Force                              I. Jarvinen
INTERNET-DRAFT                                                   M. Kojo
draft-ietf-tcpm-sack-recovery-entry-00.txt        University of Helsinki
Intended status: Standards Track                         19 October 2009
Expires: April 2010

   Using TCP Selective Acknowledgement (SACK) Information to Determine
        Duplicate Acknowledgements for Loss Recovery Initiation

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire in April 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.

Abstract

   This document describes a TCP sender algorithm that triggers loss
   recovery based on the TCP Selective Acknowledgement (SACK)
   information gathered on a SACK scoreboard, instead of simply
   counting the number of arriving duplicate acknowledgements (ACKs) in
   the traditional way.  The algorithm is more robust to ACK losses,
   ACK reordering, duplicate acknowledgements missed due to delayed
   acknowledgements, and extra duplicate acknowledgements caused by
   duplicated segments and out-of-window segments.  The algorithm not
   only allows a timely initiation of TCP loss recovery but also
   reduces false fast retransmits.  It has a low implementation cost on
   top of the SACK scoreboard defined in RFC 3517.

Table of Contents

   1. Introduction
      1.1. Conventions and Terminology
      1.2. Definitions
   2. Algorithm Details
   3. Discussion
      3.1. Small Segment Sender
      3.2. One Segment is Small
      3.3. SACK Capability Misbehavior
      3.4. Compatibility with Duplicate ACK based Loss Recovery
           Algorithms
   4. Security Considerations
   5. IANA Considerations
   6. Acknowledgements
   Appendix
   A. Scenarios
      A.1. Basic Case
      A.2. Delayed ACK
      A.3. ACK Losses
      A.4. ACK Reordering
      A.5. Packet Duplication
      A.6. Mitigation of Blind Throughput Reduction Attack
   References
      Normative References
      Informative References
   AUTHORS' ADDRESSES

TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

   Changes from draft-jarvinen-tcpm-sack-recovery-entry-01.txt

   * Clarified issues that, based on feedback, may cause confusion for
     the reader.

   * Incorporated handling of cumulative ACKs into the algorithm.

   * Updated references from RFC 2581 to RFC 5681.

   * Added the early-rexmt ID as a related one; it uses SACK
     information similarly to this algorithm (thanks to Anna
     Brunstrom).

   * Added more cases where this algorithm is beneficial in taking
     advantage of SACK block redundancy (thanks to Anna Brunstrom).

   * Discussed differences in how the duplicate ACK counter is managed
     (traditional vs. this algorithm).

   * Added a reference and a couple of words about the blind
     throughput reduction attack.

   * Wrote up SACK splitting attacks.  These attacks are quite close to
     the edge in significance; should consider just dropping them
     (rather insignificant).

   Changes from draft-jarvinen-tcpm-sack-recovery-entry-00.txt

   * TODO items embedded: improvements with window update, clarify
     dupack counting.

   * Modified the ACK reordering scenario in the appendix; it now shows
     a scenario where recovery is triggered in a more timely manner.

   * IDnits fixes.

   * Handle the small segments case using a duplicate ACK counter in
     parallel to the SACK block based detection.

   * Added a placeholder for SACK splitting.

   * Mentioned FACK, as some ideas are inherited from there.

END OF SECTION TO BE DELETED.

1. Introduction

   The Transmission Control Protocol (TCP) [RFC793] has two methods for
   triggering retransmissions.  First, the TCP sender relies on
   incoming duplicate acknowledgements (ACKs) [RFC5681], which indicate
   receipt of out-of-order segments at the TCP receiver.  After
   receiving a required number of duplicate ACKs (usually three), the
   TCP sender retransmits the first unacknowledged segment and
   continues with a fast recovery algorithm such as Reno [RFC5681],
   NewReno [RFC3782], or SACK-based loss recovery [RFC3517].  Second,
   the TCP sender maintains a retransmission timer that triggers
   retransmission of segments if the timer expires before the segments
   have been acknowledged.

   While the conservative loss recovery algorithm defined in [RFC3517]
   takes full advantage of SACK information during loss recovery, it
   does not consider the very same information during the pre-recovery
   detection phase.  Instead, it simply counts the number of arriving
   duplicate ACKs and relies on that count in deciding when to enter
   loss recovery.  However, this traditional heuristic of simply
   counting duplicate ACKs fails in several cases to correctly
   determine the actual number of valid out-of-order segments the
   receiver has successfully received.  First, relying on duplicate
   ACKs alone fails to capture the whole picture in the case of ACK
   losses and ACK reordering, resulting in a delayed or missed
   initiation of fast retransmit and fast recovery.  Similarly, the
   delayed ACK mechanism tends to conceal the first duplicate ACK,
   because the delayed cumulative ACK is combined with the first
   duplicate ACK when the first out-of-order segment arrives at the
   receiver (with an enlarged ACK ratio, such as with ACK congestion
   control [FARI08], an even more significant portion of duplicate
   ACKs is affected).
   Second, segment duplication and out-of-window segments increase the
   risk of falsely triggering loss recovery, as they trigger duplicate
   ACKs.  At worst, this legitimate response to out-of-window segments
   can be turned into a blind throughput reduction attack [CPNI09].
   Third, receiver window updates and data segments flowing in the
   opposite direction cannot be counted as duplicate ACKs with the
   traditional approach, but they can still carry redundant SACK
   information that the sender could benefit from in a scenario where
   the actual duplicate ACKs were lost.

   The algorithm specified in this document uses the TCP Selective
   Acknowledgement Option [RFC2018] to determine duplicate ACKs and to
   trigger loss recovery based on the information gathered on the SACK
   scoreboard [RFC3517].  It works in the pre-recovery state, giving a
   more accurate heuristic for determining the number of out-of-order
   segments that have arrived at the TCP receiver.  The information
   gathered on the scoreboard reveals missing ACKs and allows detecting
   duplicate events.  Therefore, the algorithm enables a timely
   triggering of Fast Retransmit.  In addition, it allows the use of
   Limited Transmit [RFC3042] regardless of lost ACKs, and also in
   cases where the SACK information is piggybacked on a cumulative ACK
   due to delayed ACKs.  This, in turn, keeps the ACK clock running
   more accurately.

   This algorithm is close to what the Linux TCP implementation has
   used for a very long time when in conservative SACK mode.  A similar
   approach is briefly mentioned along with ACK congestion control
   [FARI08], but as the usefulness of the algorithm in this document is
   more general and not limited to ACK congestion control, we specify
   it separately.
   We also note that the definition of a duplicate acknowledgement
   already suggests that an incoming ACK can be considered a duplicate
   ACK if it "contains previously unknown SACK information" [RFC5681].
   In addition, SACK information is used, whenever available, for a
   similar purpose by Early Retransmit [AAA+09].

   This algorithm also resembles Forward Acknowledgement (FACK) [MM96],
   but the two differ in how the quantity of data outstanding in the
   network is determined.  FACK always assumes that every non-SACKed
   octet below the highest SACKed octet is lost, which is only true if
   no reordering occurs.  Thus, FACK would simply trigger loss recovery
   whenever the highest SACKed octet is more than DupThresh segments
   above SND.UNA.

1.1. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119] and indicate requirement levels for protocols.

1.2. Definitions

   The reader is expected to be familiar with the definitions given in
   [RFC5681], [RFC2018], and [RFC3517].

2. Algorithm Details

   In order to use this algorithm, a TCP sender MUST have the TCP
   Selective Acknowledgement Option [RFC2018] enabled and negotiated
   for the TCP connection.  A TCP sender MUST maintain SACK information
   in an appropriate data structure, such as the scoreboard defined in
   [RFC3517].

   This algorithm uses the functions IsLost(SeqNum), Update(), and
   SetPipe() and the variables DupThresh, HighData, HighRxt, Pipe, and
   RecoveryPoint, as defined in [RFC3517].
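
   To make the [RFC3517] primitives concrete, the following Python
   sketch models the scoreboard as a sorted list of disjoint SACKed
   ranges and implements Update() and IsLost() on top of it.  IsLost()
   follows the [RFC3517] definition: it is true when DupThresh
   discontiguous SACKed ranges, or DupThresh * SMSS SACKed octets, lie
   above SeqNum.  This is an illustrative sketch only.  The class and
   method names, the half-open [start, end) representation, and
   SMSS = 500 octets (the segment size used in Appendix A) are all
   assumptions.  "Discontiguous" is interpreted here as separate ranges
   after adjacent blocks are coalesced.  A sender that tracks segment
   boundaries would count segments instead.

```python
from dataclasses import dataclass, field

SMSS = 500        # assumption: SMSS in octets (segment size of Appendix A)
DUP_THRESH = 3    # DupThresh


@dataclass
class Scoreboard:
    """Hypothetical minimal scoreboard: SACKed octets are kept as sorted,
    disjoint half-open ranges [start, end), like SACK block edges."""
    ranges: list = field(default_factory=list)

    def _above(self, seq):
        # Parts of SACKed ranges with sequence numbers greater than seq.
        return [(max(s, seq + 1), e) for s, e in self.ranges if e > seq + 1]

    def octets_above(self, seq):
        return sum(e - s for s, e in self._above(seq))

    def update(self, snd_una, high_data, blocks):
        """Update(): add SACK blocks, dropping octets below SND.UNA or
        above HighData; return the number of newly SACKed octets."""
        before = self.octets_above(snd_una - 1)
        clipped = [(max(s, snd_una), min(e, high_data + 1))
                   for s, e in self.ranges + list(blocks)]
        merged = []
        for s, e in sorted(c for c in clipped if c[0] < c[1]):
            if merged and s <= merged[-1][1]:
                # Overlapping or adjacent: coalesce into one range.
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.ranges = merged
        return self.octets_above(snd_una - 1) - before

    def is_lost(self, seq):
        """IsLost(SeqNum): true if DupThresh discontiguous SACKed ranges,
        or at least DupThresh * SMSS SACKed octets, lie above seq."""
        above = self._above(seq)
        return (len(above) >= DUP_THRESH
                or sum(e - s for s, e in above) >= DUP_THRESH * SMSS)


if __name__ == "__main__":
    sb = Scoreboard()
    for right in (5000, 5500, 6000):     # SACK blocks seen in Appendix A.1
        new = sb.update(snd_una=4000, high_data=6999, blocks=[(4500, right)])
        print(new, sb.is_lost(4000))     # prints 500 False, 500 False, 500 True

    small = Scoreboard()                 # three contiguous 100-octet segments
    small.update(snd_una=4000, high_data=4999, blocks=[(4100, 4400)])
    print(small.is_lost(4000))           # prints False
```

   The second part of the example shows the small segment problem
   discussed in Section 3.1.  Three contiguous 100-octet segments form
   a single 300-octet SACKed range, so IsLost(SND.UNA) stays false even
   though three segments have arrived out of order.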

   A TCP sender using this algorithm MUST take the following steps:

   1) Upon the receipt of any ACK containing SACK information:

      If no previous loss event has occurred on the connection OR
      RecoveryPoint is less than SND.UNA (the oldest unacknowledged
      sequence number [RFC793]), continue with the other steps of this
      algorithm.  Otherwise, continue the ongoing loss recovery.

   2) Update the scoreboard via the Update() function as outlined in
      [RFC3517].

   3) If the ACK is a cumulative ACK, reset the duplicate ACK counter to
      zero.

   4) If the ACK contains SACK blocks with previously unknown in-window
      SACK information (i.e., between SND.UNA and HighData, assuming
      SND.UNA has been updated from the acknowledgment number of the
      ACK), increment the duplicate ACK counter.

   5) Determine whether loss recovery should be initiated:

      If IsLost(SND.UNA) returns false AND the sender has received
      fewer than DupThresh duplicate ACKs, go to step 6A.  Otherwise,
      go to step 6B.

   6A) Invoke optional Limited Transmit:

      Set HighRxt to SND.UNA and run SetPipe().  The TCP sender MAY
      transmit previously unsent data segments according to the
      guidelines of Limited Transmit [RFC3042], with the exception
      that the number of octets that can be sent is determined by Pipe
      and cwnd.

      If cwnd - pipe >= 1 SMSS, the TCP sender can transmit one or
      more segments as follows:

      Send Loop:

      a) If available unsent data exists and the receiver's advertised
         window allows, transmit one segment of up to SMSS octets of
         previously unsent data starting with sequence number
         HighData+1 and update HighData to reflect the transmission of
         the data segment.  Otherwise, exit Send Loop.

      b) Run SetPipe() to re-calculate the number of outstanding
         octets in the network.  If cwnd - pipe >= 1 SMSS, go to step
         a) of Send Loop.  Otherwise, exit Send Loop.

   6B) Invoke Fast Retransmit and enter loss recovery:

      Initiate a loss recovery phase per the fast retransmit algorithm
      outlined in [RFC5681], and continue with a fast recovery
      algorithm, such as the SACK-based loss recovery algorithm
      outlined in [RFC3517].

3. Discussion

   In scenarios where neither ACK losses nor reordering occur and the
   first acknowledgement with SACK information is not the ACK held by
   the delayed acknowledgement mechanism, the new SACK information in
   each duplicate ACK covers a single segment.  In such a case, this
   algorithm triggers loss recovery after three duplicate
   acknowledgements and allows transmission of a single new segment
   using Limited Transmit on the first and second duplicate ACKs.  This
   is identical to the behavior that would occur without this algorithm
   (assuming DupThresh is 3 and that all segments are SMSS sized).
   This scenario, together with other scenarios describing the
   behavior of the algorithm, is depicted in Appendix A.

   This algorithm SHOULD also be used with an ACK that contains a
   window update or opposite-direction data and therefore could not be
   considered a duplicate ACK in the traditional algorithm.  Such
   behavior is safe because the SACK information can only add more
   information to the current state of the sender; at worst, all
   received information is just redundant.

   Setting HighRxt to SND.UNA in step 6A has no direct relation to this
   algorithm.  It is nevertheless included to avoid confusion about how
   to implement SetPipe() correctly, because SetPipe() depends on
   having a valid HighRxt value [RFC3517].

   A set of potential issues to consider with the algorithm is
   discussed below.

3.1. Small Segment Sender

   If a TCP sender is sending small segments (usually by intentionally
   overriding the Nagle algorithm [RFC896]), the IsLost(SND.UNA) check
   used in step 5 of the algorithm might fail to detect the need for
   loss recovery on the third duplicate acknowledgement, because not
   enough octets have been SACKed to cover DupThresh * SMSS bytes above
   SND.UNA.  Therefore, the traditional duplicate ACK algorithm is
   needed as a fallback.  Steps 3 and 4 and the latter condition of
   step 5 implement the traditional algorithm in parallel to the SACK
   block based detection.

   The number of duplicate ACKs is an artificial metric for estimating
   the number of segments the receiver already holds in its receive
   buffer; how accurately the two match depends on the scenario.
   Because of that, the goal of the duplicate ACK counter included in
   this algorithm is not bug-for-bug compatibility with the plain
   duplicate ACK counter, but a more accurate estimate of how many
   out-of-order segments the receiver has already queued.  Therefore,
   the duplicate ACK counter used as a fallback mechanism in this
   algorithm differs from the plain duplicate ACK counter.  However,
   such differences indicate a scenario where the plain counter was not
   able to accurately keep track of the receiver state.

   While the fallback algorithm itself does not look into the
   acknowledgment field in order to decide whether an ACK is a
   "duplicate ACK", the duplicate ACK counter is not renamed in this
   document, as in practice most ACKs that increment the counter still
   carry a duplicate acknowledgment number.  In contrast to the
   traditional approach, the only condition that must be satisfied to
   increment the duplicate ACK counter with this algorithm is that the
   acknowledgement MUST contain at least one in-window SACK block that
   covers octets that were not previously SACKed [RFC5681].
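
   The per-ACK procedure of Section 2, including the duplicate ACK
   counter rule described above, can be sketched in Python as follows.
   This is a simplified, hypothetical model and not a reference
   implementation.  The class and method names are illustrative.  It
   also assumes that every segment is SMSS = 500 octets as in
   Appendix A, that cwnd is held constant, and that unsent data and
   the receiver window are unlimited.  SetPipe() is reduced to
   outstanding minus SACKed octets.  That reduction holds in step 6A
   because IsLost(SND.UNA) is false there; the HighRxt term of SetPipe()
   is ignored.  On entering recovery, RecoveryPoint is set to HighData
   as in [RFC3517].

```python
from dataclasses import dataclass, field
from typing import Optional, Union

SMSS = 500       # assumption: all segments are SMSS octets, as in Appendix A
DUP_THRESH = 3   # DupThresh


@dataclass
class SackEntrySender:
    """Hypothetical sketch of steps 1-6 of Section 2 for one connection.

    snd_nxt is HighData + 1; sacked holds sorted, disjoint half-open
    ranges [start, end) of SACKed octets.  Data and receiver window are
    assumed to be unlimited, and cwnd is held constant.
    """
    snd_una: int
    snd_nxt: int
    cwnd: int
    sacked: list = field(default_factory=list)
    dupacks: int = 0
    recovery_point: Optional[int] = None  # None: no previous loss event

    def _above(self, seq):
        """(number of ranges, number of octets) SACKed above seq."""
        parts = [(max(s, seq + 1), e) for s, e in self.sacked if e > seq + 1]
        return len(parts), sum(e - s for s, e in parts)

    def _is_lost(self, seq):
        """IsLost(SeqNum) following the RFC 3517 definition."""
        ranges, octets = self._above(seq)
        return ranges >= DUP_THRESH or octets >= DUP_THRESH * SMSS

    def _update(self, blocks):
        """Update(): merge in-window blocks, return newly SACKed octets."""
        before = self._above(self.snd_una - 1)[1]
        clipped = [(max(s, self.snd_una), min(e, self.snd_nxt))
                   for s, e in self.sacked + list(blocks)]
        merged = []
        for s, e in sorted(c for c in clipped if c[0] < c[1]):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.sacked = merged
        return self._above(self.snd_una - 1)[1] - before

    def on_ack(self, ack, blocks) -> Union[int, str]:
        """Process one ACK carrying SACK blocks.

        Returns "recovery" if step 6B applies (or recovery is ongoing),
        otherwise the number of new segments sent in step 6A.
        """
        cumulative = ack > self.snd_una
        self.snd_una = max(self.snd_una, ack)
        # Step 1: while RecoveryPoint >= SND.UNA, recovery is ongoing.
        rp = self.recovery_point
        if rp is not None and rp >= self.snd_una:
            return "recovery"
        newly_sacked = self._update(blocks)          # step 2
        if cumulative:                               # step 3
            self.dupacks = 0
        if newly_sacked > 0:                         # step 4
            self.dupacks += 1
        # Step 5: SACK-based test plus the duplicate ACK counter fallback.
        if self._is_lost(self.snd_una) or self.dupacks >= DUP_THRESH:
            self.recovery_point = self.snd_nxt - 1   # RecoveryPoint = HighData
            return "recovery"                        # step 6B
        # Step 6A: IsLost(SND.UNA) is false, so no outstanding octet is
        # lost and SetPipe() reduces to outstanding minus SACKed octets
        # (the HighRxt term is ignored; nothing has been retransmitted).
        pipe = (self.snd_nxt - self.snd_una) - self._above(self.snd_una - 1)[1]
        sent = 0
        while self.cwnd - pipe >= SMSS:              # Send Loop
            self.snd_nxt += SMSS                     # one new SMSS segment
            pipe += SMSS
            sent += 1
        return sent


if __name__ == "__main__":
    # Appendix A.2 (delayed ACK), starting after the cumulative ACK 3500.
    s = SackEntrySender(snd_una=3500, snd_nxt=6000, cwnd=2500)
    print([s.on_ack(4000, [(4500, e)]) for e in (5000, 5500, 6000)])
    # prints [2, 1, 'recovery']
```

   The usage example replays Appendix A.2 from the cumulative ACK 3500
   onwards.  The starting state, including cwnd = 2500 octets, is an
   assumption read off that timeline.  The sketch sends two new
   segments on the first ACK carrying SACK information and one on the
   next.  It enters loss recovery on the third acknowledgement, at
   which point a plain duplicate ACK counter would have seen only two
   duplicate ACKs.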

   In cases with ACK losses or delayed ACKs, the SACK-based increment
   condition can also match cumulative ACKs, receiver window updates,
   and opposite-direction data segments, but the counter can still
   safely be incremented.

   As an alternative to the fallback algorithm, a TCP sender that is
   able to discern segment boundaries accurately can consider full
   segments in IsLost() regardless of segment size.  Such a TCP sender
   can therefore avoid the problem with small segments using the
   IsLost(SND.UNA) check alone.  Steps 3 and 4 and the latter condition
   of step 5 are then redundant and do not have to be implemented.

   Note: the small segments problem is not unique to this algorithm;
   the SACK-based loss recovery [RFC3517] also encounters it because of
   how IsLost() is defined.

3.2. One Segment is Small

   A variant of the small segment sender case is the case where only
   one of the SACKed segments is smaller than SMSS (possible even with
   Nagle enabled).  If a TCP sender lacks the ability to use the
   improved method of discerning segment boundaries but still wants
   robustness against ACK losses in this case, it MAY extend the
   condition in step 5 with the test:

      SACKed octets > SMSS * (DupThresh - 1)

3.3. SACK Capability Misbehavior

   If the receiver misbehaves by advertising SACK capability but never
   sending any SACK blocks when it should, this algorithm fails to
   enter loss recovery, and a retransmission timeout is required for
   recovery.  However, such misbehavior does not allow SACK-based loss
   recovery [RFC3517] to work either, and a TCP sender will require a
   timeout to recover anyway.

3.4. Compatibility with Duplicate ACK based Loss Recovery Algorithms

   This algorithm SHOULD NOT be used together with a fast recovery
   algorithm that determines the segments that have left the network
   based on the number of arriving duplicate acknowledgements (e.g.,
   NewReno [RFC3782]) instead of the actual segments reported by SACK.
   In the presence of ACK reordering, such an algorithm will count the
   delayed duplicate acknowledgements arriving during fast recovery as
   extra while determining the number of packets that have left the
   network.

   In general, there should be very little reason to combine this
   algorithm with a loss recovery algorithm that is based only on
   inferior, non-SACK information.

4. Security Considerations

   A malicious TCP receiver may send false SACK information for
   sequence number ranges that it has not received in order to trigger
   Fast Retransmit sooner.  Such behavior would only be useful when
   out-of-order segments have arrived, because otherwise the flow
   undergoes loss recovery with a window reduction.  This kind of lying
   involves guessing which segments will arrive later.  If the guess
   was wrong, the performance of the flow is ruined: the TCP sender
   will need a retransmission timeout, as it will not retransmit the
   segments until it assumes SACK reneging.  On a successful guess, the
   attacker is able to trigger the recovery slightly earlier, but the
   later segments would have allowed reporting the very same regions
   with SACK anyway.  Therefore, the gain from this attack is small and
   hardly justifiable considering the drastic effect of a misguess.
   Also, a similar attack can be made against the duplicate
   acknowledgment based algorithm by sending false duplicate
   acknowledgements with false SACK ranges.  This works even if the new
   SACK information rule is applied, and trivially so without it.

   A variation of the lying attack sacrifices the reliability of the
   flow.  However, once reliability is not a concern for the receiver,
   a number of simpler ways exist to attack TCP independently of this
   algorithm.  Thus, this algorithm is not considered to weaken TCP
   security properties against false information.

   Splitting SACK blocks into chunks smaller than the received segment
   size allows the receiver to make recovery start sooner because of
   the discontiguity check in IsLost().  However, by doing so the
   receiver neglects the possibility of reordering for little gain: if
   the segment was just reordered, the sender performs an unnecessary
   window reduction and an unnecessary retransmission of the reordered
   segment.  Another variant of SACK block splitting simply tries to
   increase bandwidth consumption.  With a small DupThresh value such
   as three, however, sending a single ACK with split SACK blocks
   instead of three duplicate ACKs (traditional algorithm) gains too
   little to make such an attack practical.  If the sender keeps track
   of segment boundaries and applies them in IsLost(), these attacks
   will not succeed, as the sender cannot be misled into believing that
   a segment was split into multiple chunks.

5. IANA Considerations

   This document has no actions for IANA.

6. Acknowledgements

   The authors would like to thank Alexander Zimmermann and Anna
   Brunstrom for their comments on this document.

Appendix

A. Scenarios

A.1. Basic Case

   In this scenario, no delayed ACKs, ACK losses, reordering, or other
   "abnormal" behavior occur.  For simplicity, all segments are SMSS
   sized.

   Once the TCP receiver gets the first out-of-order segment, it sends
   a duplicate ACK with SACK information about the received octets.
   The following two out-of-order segments each trigger a duplicate
   ACK, with the corresponding range SACKed in addition to the
   previously known information.  The sender gets those duplicate ACKs
   in order, and each of them SACKs a new, previously unknown segment.

   This algorithm triggers loss recovery on the third duplicate ACK,
   because IsLost(SND.UNA) returns true once DupThresh * SMSS bytes
   above SND.UNA have been SACKed, which happens on that same
   acknowledgement.  Thus, the behavior is identical to that of a
   sender using duplicate acknowledgments.  If Limited Transmit is in
   use, the first two duplicate ACKs each allow a single segment to be
   sent with either algorithm.  Each ACK SACKs SMSS octets, which
   reduces Pipe by SMSS and allows SMSS worth of new octets to be sent.

   ACK            Transmitted   Received      ACK Sent
   Received       Segment       Segment       (Including SACK Blocks)

   1000
                  3000-3499     3000-3499     (delayed ACK)
                  3500-3999     3500-3999     4000
   2000
                  4000-4499     (dropped)
                  4500-4999     4500-4999     4000, SACK=4500-5000
   3000
                  5000-5499     5000-5499     4000, SACK=4500-5500
                  5500-5999     5500-5999     4000, SACK=4500-6000
   4000
                  6000-6499     6000-6499     4000, SACK=4500-6500
                  6500-6999     6500-6999     4000, SACK=4500-7000
   4000, SACK=4500-5000
                  7000-7499     7000-7499     4000, SACK=4500-7500
   4000, SACK=4500-5500
                  7500-7999     7500-7999     4000, SACK=4500-8000
   4000, SACK=4500-6000
                  4000-4499     4000-4499     8000
   4000, SACK=4500-6500

A.2. Delayed ACK

   In the basic case with delayed ACKs, the receiver sends the first
   ACK with SACK information.  However, the previous ACK carried a
   lower acknowledgment number, because an acknowledgment was held by
   the delayed ACK mechanism.  The sender therefore will not consider
   this first ACK a duplicate ACK.
   Because the ACK contains SACK information identical to that in the
   basic case, the sender can use Limited Transmit with the same
   segments as in the basic case.  It will start loss recovery at the
   third acknowledgment, i.e., with the second duplicate
   acknowledgment.  In the same situation, a duplicate ACK based sender
   has to wait for one more duplicate ACK to arrive before doing the
   same, as the first acknowledgment is fully "wasted".

   Technically, an acknowledgement with a sequence number higher than
   what was previously acknowledged is not a duplicate acknowledgement.
   However, the presence of the SACK block tells another story: it
   reveals that the receiver used delayed ACK, and thus that a
   duplicate acknowledgement is missing in between.  The response of a
   TCP sender taking advantage of such inferred duplicate
   acknowledgements is well within the guidelines of the packet
   conservation principle [Jac88], as it still sends only when segments
   have left the network.

   ACK            Transmitted   Received      ACK Sent
   Received       Segment       Segment       (Including SACK Blocks)

   1500
                  3000-3499     3000-3499     3500
                  3500-3999     3500-3999     (delayed ACK)
   2500
                  4000-4499     (dropped)
                  4500-4999     4500-4999     4000, SACK=4500-5000
   3500
                  5000-5499     5000-5499     4000, SACK=4500-5500
                  5500-5999     5500-5999     4000, SACK=4500-6000
   4000, SACK=4500-5000
                  6000-6499     6000-6499     4000, SACK=4500-6500
                  6500-6999     6500-6999     4000, SACK=4500-7000
   4000, SACK=4500-5500
                  7000-7499     7000-7499     4000, SACK=4500-7500
   4000, SACK=4500-6000
                  4000-4499     4000-4499     7500
   4000, SACK=4500-6500

A.3. ACK Losses

   The ACK loss case shares much of its behavior with the delayed ACK
   case.  If the hole at RCV.NXT is filled, the sender will notice that
   the cumulative ACK advanced.  In the case of out-of-order segments,
   the first ACK that gets through to the sender includes as many SACK
   blocks as SACK block redundancy is able to cover.
   With this algorithm, the sender immediately makes use of all the
   information made available by the incoming ACK.

   ACK            Transmitted   Received      ACK Sent
   Received       Segment       Segment       (Including SACK Blocks)

   1000
                  3000-3499     3000-3499     (delayed ACK)
                  3500-3999     3500-3999     4000
   2000
                  4000-4499     (dropped)
                  4500-4999     4500-4999     4000, SACK=4500-5000
                                              (dropped)
   3000
                  5000-5499     5000-5499     4000, SACK=4500-5500
                  5500-5999     5500-5999     4000, SACK=4500-6000
   4000
                  6000-6499     6000-6499     4000, SACK=4500-6500
                  6500-6999     6500-6999     4000, SACK=4500-7000
   4000, SACK=4500-5500 (two segments left the network)
                  7000-7499     7000-7499     4000, SACK=4500-7500
                  7500-7999     7500-7999     4000, SACK=4500-8000
   4000, SACK=4500-6000
                  4000-4499     4000-4499     8000
   4000, SACK=4500-6500

A.4. ACK Reordering

   With ACK reordering, an ACK is postponed.  Due to SACK block
   redundancy, the next ACK after the postponed one contains not only
   its own information but also the information of the reordered ACK
   (similar to the ACK losses case).  When the reordered ACK then
   arrives, the sender already knows the information it provides, and
   therefore this algorithm takes no action.

   ACK            Transmitted   Received      ACK Sent
   Received       Segment       Segment       (Including SACK Blocks)

   1000
                  3000-3499     3000-3499     (delayed ACK)
                  3500-3999     3500-3999     4000
   2000
                  4000-4499     (dropped)
                  4500-4999     4500-4999     4000, SACK=4500-5000
                                              (delayed)
   3000
                  5000-5499     5000-5499     4000, SACK=4500-5500
                  5500-5999     5500-5999     4000, SACK=4500-6000
   4000
                  6000-6499     6000-6499     4000, SACK=4500-6500
                  6500-6999     6500-6999     4000, SACK=4500-7000
   4000, SACK=4500-5500
                  7000-7499     7000-7499     4000, SACK=4500-7500
                  7500-7999     7500-7999     4000, SACK=4500-8000
   4000, SACK=4500-6000
                  4000-4499     4000-4499     8000
   4000, SACK=4500-5000 (has only redundant information)
   4000, SACK=4500-6500

A.5. Packet Duplication

   Packet duplication happens due to either unnecessary retransmission
   or duplication in network hardware.  It adds either a redundant ACK
   carrying only redundant information, or a data segment that
   triggers a redundant duplicate ACK (possibly with SACK and/or DSACK
   [RFC2883] information).  Because neither adds any new SACKed octets
   at the sender, this algorithm does nothing, while a duplicate ACK
   based sender would falsely consider it a duplicate ACK.

   If one of the redundant ACKs is lost, the effect of duplication is
   simply negated.

   It is possible for the sender to detect this case using DSACK alone.

A.6. Mitigation of Blind Throughput Reduction Attack

   If an attacker knows or is able to guess the 4-tuple of a TCP
   connection, it may apply a blind throughput reduction attack
   [CPNI09].  In this attack, TCP is tricked into sending duplicate
   ACKs to the other endpoint by means of out-of-window segments, which
   is considerably easier to achieve than matching valid sequence
   numbers.  If DupThresh or more duplicate ACKs can be triggered in a
   row without any legitimate segment that advances the acknowledged
   sequence number, the other end acts on that false congestion signal
   and halves its window.

   With this algorithm, such duplicate ACKs are filtered out because
   they do not carry any new in-window SACK blocks (DSACK [RFC2883]
   blocks might be present, though).

References

Normative References

   [RFC793]   Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow,
              "TCP Selective Acknowledgment Options", RFC 2018,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3042]  Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              January 2001.

   [RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
              Conservative Selective Acknowledgment (SACK)-based Loss
              Recovery Algorithm for TCP", RFC 3517, April 2003.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

Informative References

   [AAA+09]   Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J.,
              and P. Hurtig, "Early Retransmit for TCP and SCTP",
              Internet-Draft, draft-ietf-tcpm-early-rexmt-01,
              January 2009.

   [CPNI09]   "Security Assessment of the Transmission Control Protocol
              (TCP)".  Available at:
              http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-TCP.pdf

   [FARI08]   Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
              Acknowledgement Congestion Control to TCP",
              Internet-Draft, draft-floyd-tcpm-ackcc-06, July 2009.

   [Jac88]    Jacobson, V., "Congestion Avoidance and Control", In
              Proceedings of ACM SIGCOMM '88.

   [MM96]     Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
              Refining TCP Congestion Control", Proceedings of
              SIGCOMM '96, Stanford, CA, August 1996.

   [RFC896]   Nagle, J., "Congestion Control in IP/TCP Internetworks",
              RFC 896, January 1984.

   [RFC2883]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
              Extension to the Selective Acknowledgement (SACK) Option
              for TCP", RFC 2883, July 2000.

   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

AUTHORS' ADDRESSES

   Ilpo Jarvinen
   University of Helsinki
   P.O. Box 68
   FI-00014 UNIVERSITY OF HELSINKI
   Finland
   Email: ilpo.jarvinen@helsinki.fi

   Markku Kojo
   University of Helsinki
   P.O. Box 68
   FI-00014 UNIVERSITY OF HELSINKI
   Finland
   Email: kojo@cs.helsinki.fi