idnits 2.17.1 draft-ietf-tcpm-3517bis-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 4 instances of too long lines in the document, the longest one being 6 characters in excess of 72. ** There is 1 instance of lines with control characters in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (January 26, 2012) is 4468 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'A' is mentioned on line 151, but not defined == Missing Reference: 'B' is mentioned on line 151, but not defined == Unused Reference: 'RFC2026' is defined on line 596, but no explicit reference was found in the text == Unused Reference: 'Jac90' is defined on line 628, but no explicit reference was found in the text == Unused Reference: 'RFC3042' is defined on line 644, but no explicit reference was found in the text ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) -- Duplicate reference: RFC2018, mentioned in 'Errata1610', was also mentioned in 'RFC2018'. -- Obsolete informational reference (is this intentional?): RFC 3782 (Obsoleted by RFC 6582) -- Obsolete informational reference (is this intentional?): RFC 3517 (Obsoleted by RFC 6675) Summary: 4 errors (**), 0 flaws (~~), 7 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force E. Blanton 2 INTERNET-DRAFT Purdue University 3 draft-ietf-tcpm-3517bis-01.txt M. Allman 4 ICSI 5 L. Wang 6 Juniper Networks 7 I. Jarvinen 8 M. Kojo 9 University of Helsinki 10 Y. Nishida 11 WIDE Project 12 January 26, 2012 14 A Conservative Selective Acknowledgment (SACK)-based 15 Loss Recovery Algorithm for TCP 17 Status of this Memo 19 This Internet-Draft is submitted to IETF in full conformance with 20 the provisions of BCP 78 and BCP 79. 22 Internet-Drafts are working documents of the Internet Engineering 23 Task Force (IETF), its areas, and its working groups. Note that 24 other groups may also distribute working documents as Internet- 25 Drafts. 
27 Internet-Drafts are draft documents valid for a maximum of six 28 months and may be updated, replaced, or obsoleted by other documents 29 at any time. It is inappropriate to use Internet-Drafts as 30 reference material or to cite them other than as "work in progress." 32 The list of current Internet-Drafts can be accessed at 33 http://www.ietf.org/ietf/1id-abstracts.txt. 35 The list of Internet-Draft Shadow Directories can be accessed at 36 http://www.ietf.org/shadow.html. 38 This Internet-Draft will expire on May 22, 2012. 40 Copyright Notice 42 Copyright (c) 2012 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with 50 respect to this document. Code Components extracted from this 51 document must include Simplified BSD License text as described in 52 Section 4.e of the Trust Legal Provisions and are provided without 53 warranty as described in the Simplified BSD License. 55 Abstract 57 This document presents a conservative loss recovery algorithm for TCP 58 that is based on the use of the selective acknowledgment (SACK) TCP 59 option. The algorithm presented in this document conforms to the 60 spirit of the current congestion control specification (RFC 5681), 61 but allows TCP senders to recover more effectively when multiple 62 segments are lost from a single flight of data. 64 1 Introduction 66 This document presents a conservative loss recovery algorithm for TCP 67 that is based on the use of the selective acknowledgment (SACK) TCP 68 option. 
While the TCP SACK [RFC2018] is being steadily deployed in 69 the Internet [All00], there is evidence that hosts are not using the 70 SACK information when making retransmission and congestion control 71 decisions [PF01]. The goal of this document is to outline one 72 straightforward method for TCP implementations to use SACK 73 information to increase performance. 75 [RFC5681] allows advanced loss recovery algorithms to be used by TCP 76 [RFC793] provided that they follow the spirit of TCP's congestion 77 control algorithms [RFC5681, RFC2914]. [RFC3782] outlines one such 78 advanced recovery algorithm called NewReno. This document outlines a 79 loss recovery algorithm that uses the SACK [RFC2018] TCP option to 80 enhance TCP's loss recovery. The algorithm outlined in this 81 document, heavily based on the algorithm detailed in [FF96], is a 82 conservative replacement of the fast recovery algorithm [Jac90, 83 RFC5681]. The algorithm specified in this document is a 84 straightforward SACK-based loss recovery strategy that follows the 85 guidelines set in [RFC5681] and can safely be used in TCP 86 implementations. Alternate SACK-based loss recovery methods can be 87 used in TCP as implementers see fit (as long as the alternate 88 algorithms follow the guidelines provided in [RFC5681]). Please 89 note, however, that the SACK-based decisions in this document (such 90 as what segments are to be sent at what time) are largely decoupled 91 from the congestion control algorithms, and as such can be treated as 92 separate issues if so desired. 94 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 95 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 96 document are to be interpreted as described in BCP 14, RFC 2119 97 [RFC2119]. 99 2 Definitions 101 The reader is expected to be familiar with the definitions given in 102 [RFC5681]. 104 The reader is assumed to be familiar with selective acknowledgments 105 as specified in [RFC2018]. 
107 For the purposes of explaining the SACK-based loss recovery algorithm
108 we define six variables that a TCP sender stores:

110    "HighACK" is the sequence number of the highest byte of data that
111    has been cumulatively ACKed at a given point.

113    "HighData" is the highest sequence number transmitted at a given
114    point.

116    "HighRxt" is the highest sequence number which has been
117    retransmitted during the current loss recovery phase.

119    "RescueRxt" is the highest sequence number which has been
120    retransmitted optimistically to prevent stalling of the ACK clock
121    when there is loss at the end of the window and no new data is
122    available for transmission.

124    "Pipe" is a sender's estimate of the number of bytes outstanding
125    in the network. This is used during recovery for limiting the
126    sender's sending rate. The pipe variable allows TCP to use a
127    fundamentally different congestion control than specified in
128    [RFC5681]. The algorithm is often referred to as the "pipe
129    algorithm".

131    "DupAcks" is the number of duplicate acknowledgments received
132    since the last cumulative acknowledgment.

134 For the purposes of this specification we define a "duplicate
135 acknowledgment" as a segment that arrives carrying a SACK block which
136 identifies previously unacknowledged and un-SACKed octets between
137 HighACK and HighData. Note that an ACK which carries new
138 SACK data is counted as a duplicate acknowledgment under this
139 definition even if it carries new data, changes the advertised
140 window, or moves the cumulative acknowledgment point, which is
141 different from the definition of duplicate acknowledgment
142 in [RFC5681].

144 We define a variable "DupThresh" that holds the number of duplicate
145 acknowledgments required to trigger a retransmission. Per [RFC5681]
146 this threshold is defined to be 3 duplicate acknowledgments.
147 However, implementers should consult any updates to [RFC5681] to
148 determine the current value for DupThresh (or method for determining
149 its value).

151 Finally, a range of sequence numbers [A,B] is said to "cover"
152 sequence number S if A <= S <= B.

154 3 Keeping Track of SACK Information

156 For a TCP sender to implement the algorithm defined in the next
157 section it must keep a data structure to store incoming selective
158 acknowledgment information on a per connection basis. Such a data
159 structure is commonly called the "scoreboard". The specifics of the
160 scoreboard data structure are out of scope for this document (as long
161 as the implementation can perform all functions required by this
162 specification).

164 Note that this document refers to keeping account of (marking)
165 individual octets of data transferred across a TCP connection. A
166 real-world implementation of the scoreboard would likely prefer to
167 manage this data as sequence number ranges. The algorithms presented
168 here allow this, but require the ability to mark arbitrary sequence
169 number ranges as having been selectively acknowledged.

171 Finally, note that the algorithm in this document assumes a
172 sender that is not keeping track of segment boundaries after
173 transmitting a segment. It is possible that a sender that did
174 keep this extra state could use a more refined and
175 precise algorithm than the one presented herein; however, we
176 leave this as future work.

178 4 Processing and Acting Upon SACK Information

180 For the purposes of the algorithm defined in this document the
181 scoreboard SHOULD implement the following functions:

183 Update ():

185    Given the information provided in an ACK, each octet that is
186    cumulatively ACKed or SACKed should be marked accordingly in the
187    scoreboard data structure, and the total number of octets SACKed
188    should be recorded.
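As an illustration only, Update () could be sketched as follows. This
is not part of the specification: the class name Scoreboard, its field
names, and the choice to store SACKed data as sorted, disjoint
half-open ranges are all assumptions made for this example, and
sequence number wraparound is ignored for brevity. SACK blocks are
taken as (left edge, right edge) pairs in the sense of [RFC2018],
where the right edge is the sequence number following the last octet
of the block.

```python
class Scoreboard:
    """Hypothetical scoreboard: SACKed octets kept as sorted,
    disjoint half-open ranges [start, end)."""

    def __init__(self, high_ack):
        self.high_ack = high_ack   # highest cumulatively ACKed octet (HighACK)
        self.sacked = []           # list of (start, end) tuples
        self.sacked_octets = 0     # total number of octets currently SACKed

    def update(self, ack_no, sack_blocks):
        """Record one incoming ACK.

        ack_no is the TCP ACK field (next octet expected by the receiver);
        sack_blocks is an iterable of (left_edge, right_edge) pairs.
        """
        # Advance the cumulative ACK point; it never moves backwards.
        self.high_ack = max(self.high_ack, ack_no - 1)
        floor = self.high_ack + 1

        merged = []
        for start, end in sorted(self.sacked + list(sack_blocks)):
            # Octets at or below HighACK are cumulatively ACKed, so they
            # no longer need to be tracked as SACKed.
            start = max(start, floor)
            if start >= end:
                continue
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent range: extend the previous one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))

        self.sacked = merged
        self.sacked_octets = sum(end - start for start, end in merged)
```

Keeping ranges rather than per-octet marks follows the observation in
Section 3 that a real implementation would likely manage sequence
number ranges. The sketch only tracks SACK state; it never releases
SACKed data from a retransmission buffer.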
190    Note: SACK information is advisory and therefore SACKed data MUST
191    NOT be removed from TCP's retransmission buffer until the data is
192    cumulatively acknowledged [RFC2018].

194 IsLost (SeqNum):

196    This routine returns whether the given sequence number is
197    considered to be lost. The routine returns true when either
198    DupThresh discontiguous SACKed sequences have arrived above
199    'SeqNum' or more than (DupThresh - 1) * SMSS bytes with sequence
200    numbers greater than 'SeqNum' have been SACKed. Otherwise, the
201    routine returns false.

203 SetPipe ():

205    This routine traverses the sequence space from HighACK to HighData
206    and MUST set the "pipe" variable to an estimate of the number of
207    octets that are currently in transit between the TCP sender and
208    the TCP receiver. After initializing pipe to zero the following
209    steps are taken for each octet 'S1' in the sequence space between
210    HighACK and HighData that has not been SACKed:

212    (a) If IsLost (S1) returns false:

214        Pipe is incremented by 1 octet.

216        The effect of this condition is that pipe is incremented for
217        packets that have not been SACKed and have not been determined
218        to have been lost (i.e., those segments that are still assumed
219        to be in the network).

221    (b) If S1 <= HighRxt:

223        Pipe is incremented by 1 octet.

225        The effect of this condition is that pipe is incremented for
226        the retransmission of the octet.

228    Note that octets retransmitted without being considered lost are
229    counted twice by the above mechanism.

231 NextSeg ():

233    This routine uses the scoreboard data structure maintained by the
234    Update() function to determine what to transmit based on the SACK
235    information that has arrived from the data receiver (and hence
236    been marked in the scoreboard).
NextSeg () MUST return the
237    sequence number range of the next segment that is to be
238    transmitted, per the following rules:

240    (1) If there exists a smallest unSACKed sequence number 'S2' that
241        meets the following three criteria for determining loss, the
242        sequence range of one segment of up to SMSS octets starting
243        with S2 MUST be returned.

245        (1.a) S2 is greater than HighRxt.

247        (1.b) S2 is less than the highest octet covered by any
248              received SACK.

250        (1.c) IsLost (S2) returns true.

252    (2) If no sequence number 'S2' per rule (1) exists but there
253        exists available unsent data and the receiver's advertised
254        window allows, the sequence range of one segment of up to SMSS
255        octets of previously unsent data starting with sequence number
256        HighData+1 MUST be returned.

258    (3) If the conditions for rules (1) and (2) fail, but there exists
259        an unSACKed sequence number 'S3' that meets the criteria for
260        detecting loss given in steps (1.a) and (1.b) above
261        (specifically excluding step (1.c)) then one segment of up to
262        SMSS octets starting with S3 SHOULD be returned.

264    (4) If the conditions for (1), (2), and (3) fail, but there
265        exists outstanding unSACKed data, we provide the
266        opportunity for a single "rescue" retransmission per entry
267        into loss recovery. If HighACK is greater than RescueRxt
268        (or RescueRxt is undefined), then one segment of up to
269        SMSS octets which MUST include the highest outstanding
270        unSACKed sequence number SHOULD be returned, and RescueRxt
271        set to RecoveryPoint. HighRxt MUST NOT be updated.

273        Note that rules (3) and (4) are a sort of retransmission "last
274        resort". They allow for retransmission of sequence numbers
275        even when the sender has less certainty that a segment has been
276        lost than with rule (1). Retransmitting segments via rules
277        (3) and (4) will help sustain TCP's ACK clock and therefore
278        can potentially help avoid retransmission timeouts.
However,
279        in sending these segments the sender has two copies of the
280        same data considered to be in the network (and also in the
281        Pipe estimate, in the case of (3)). When an ACK or SACK
282        arrives covering this retransmitted segment, the sender cannot
283        be sure exactly how much data left the network (one of the two
284        transmissions of the packet or both transmissions of the
285        packet). Therefore the sender may underestimate Pipe by
286        considering both segments to have left the network when it is
287        possible that only one of the two has.

289    (5) If the conditions for each of (1), (2), (3), and (4) are not
290        met, then NextSeg () MUST indicate failure, and no segment is
291        returned.

293 Note: The SACK-based loss recovery algorithm outlined in this
294 document requires more computational resources than previous TCP loss
295 recovery strategies. However, we believe the scoreboard data
296 structure can be implemented in a reasonably efficient manner (both
297 in terms of computational complexity and memory usage) in most TCP
298 implementations.

300 5 Algorithm Details

302 Upon the receipt of any ACK containing SACK information, the
303 scoreboard MUST be updated via the Update () routine.

305 If the incoming ACK is a cumulative acknowledgment, the TCP MUST
306 reset DupAcks to zero.

308 If the incoming ACK is a duplicate acknowledgment per the definition
309 in Section 2 (regardless of its status as a cumulative acknowledgment),
310 and the TCP is not currently in loss recovery, the TCP MUST increase
311 DupAcks by one and take the following steps:

313 (1) If DupAcks >= DupThresh, go to step (4).

315     Note: This check covers the case when a TCP receives SACK
316     information for multiple segments smaller than SMSS, which can
317     potentially prevent IsLost() (next step) from declaring a segment
318     as lost.
320 (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns
321     true---indicating at least three segments have arrived above
322     the current cumulative acknowledgment point, which is taken
323     to indicate loss---go to step (4).

325 (3) The TCP MAY transmit previously unsent data segments as per
326     Limited Transmit [RFC5681], except that the number of octets
327     which may be sent is governed by Pipe and cwnd as follows:

329     (3.1) Set HighRxt to HighACK.

331     (3.2) Run SetPipe ().

333     (3.3) If (cwnd - pipe) >= 1 SMSS, there exists previously
334           unsent data, and the receiver's advertised window
335           allows, transmit up to 1 SMSS of data starting with the
336           octet HighData+1 and update HighData to reflect this
337           transmission, then return to (3.2).

339     (3.4) Terminate processing of this ACK.

341 (4) Invoke Fast Retransmit and enter loss recovery as follows:

343     (4.1) RecoveryPoint = HighData

345           When the TCP sender receives a cumulative ACK for this
346           data octet the loss recovery phase is terminated.

348     (4.2) ssthresh = cwnd = (FlightSize / 2)

350           The congestion window (cwnd) and slow start threshold
351           (ssthresh) are reduced to half of FlightSize per
352           [RFC5681]. Additionally, note that [RFC5681] requires
353           any segments sent as part of the Limited Transmit
354           mechanism not be counted in FlightSize for the purpose
355           of the above equation.

357     (4.3) Retransmit the first data segment presumed dropped -- the
358           segment starting with sequence number HighACK + 1. To
359           prevent repeated retransmission of the same data or a
360           premature rescue retransmission, set both HighRxt and
361           RescueRxt to the highest sequence number in the
362           retransmitted segment.
364     (4.4) Run SetPipe ()

366           Set a "pipe" variable to the number of outstanding
367           octets currently "in the pipe"; this is the data which
368           has been sent by the TCP sender but for which no
369           cumulative or selective acknowledgment has been
370           received and the data has not been determined to have
371           been dropped in the network. It is assumed that the
372           data is still traversing the network path.

374     (4.5) In order to take advantage of potential additional
375           available cwnd, proceed to step (C) below.

377 Once a TCP is in the loss recovery phase the following procedure MUST
378 be used for each arriving ACK:

380 (A) An incoming cumulative ACK for a sequence number greater than
381     RecoveryPoint signals the end of loss recovery and the loss
382     recovery phase MUST be terminated. Any information contained in
383     the scoreboard for sequence numbers greater than the new value of
384     HighACK SHOULD NOT be cleared when leaving the loss recovery
385     phase.

387 (B) Upon receipt of an ACK that does not cover RecoveryPoint the
388     following actions MUST be taken:

390     (B.1) Use Update () to record the new SACK information conveyed
391           by the incoming ACK.

393     (B.2) Use SetPipe () to re-calculate the number of octets still
394           in the network.

396 (C) If cwnd - pipe >= 1 SMSS the sender SHOULD transmit one or more
397     segments as follows:

399     (C.1) The scoreboard MUST be queried via NextSeg () for the
400           sequence number range of the next segment to transmit (if any),
401           and the given segment sent. If NextSeg () returns failure (no
402           data to send) return without sending anything (i.e., terminate
403           steps C.1 -- C.5).

405     (C.2) If any of the data octets sent in (C.1) are below HighData,
406           HighRxt MUST be set to the highest sequence number of the
407           retransmitted segment unless NextSeg () rule (4) was invoked for
408           this retransmission.
410     (C.3) If any of the data octets sent in (C.1) are above HighData,
411           HighData must be updated to reflect the transmission of
412           previously unsent data.

414     (C.4) The estimate of the amount of data outstanding in the
415           network must be updated by incrementing pipe by the number of
416           octets transmitted in (C.1).

418     (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)

420 Note that steps (A) and (C) can potentially send a burst of
421 back-to-back segments into the network if the incoming cumulative
422 acknowledgment is for more than SMSS octets of data, or if incoming
423 SACK blocks indicate that more than SMSS octets of data have been
424 lost in the second half of the window.

426 5.1 Retransmission Timeouts

428 In order to avoid memory deadlocks, the TCP receiver is allowed
429 to discard data that has already been selectively acknowledged.
430 As a result, [RFC2018] suggests that a TCP sender SHOULD expunge
431 the SACK information gathered from a receiver upon a
432 retransmission timeout "since the timeout might indicate that the
433 data receiver has reneged." Additionally, a TCP sender MUST
434 "ignore prior SACK information in determining which data to
435 retransmit." However, since the publication of [RFC2018] this
436 has come to be viewed by some as too strong. It has been
437 suggested that, as long as robust tests for reneging are present,
438 an implementation can retain and use SACK information across a
439 timeout event [Errata1610]. While this document does not change
440 the specification in [RFC2018], we note that implementers should
441 consult any updates to [RFC2018] on this subject. Further, a
442 SACK TCP sender SHOULD utilize all SACK information made
443 available during the loss recovery following an RTO.

445 If an RTO occurs during loss recovery as specified in this document,
446 RecoveryPoint MUST be set to HighData.
Further, the new value of 447 RecoveryPoint MUST be preserved and the loss recovery algorithm 448 outlined in this document MUST be terminated. In addition, a new 449 recovery phase (as described in section 5) MUST NOT be initiated 450 until HighACK is greater than or equal to the new value of 451 RecoveryPoint. 453 As described in Sections 4 and 5, Update () SHOULD continue to be 454 used appropriately upon receipt of ACKs. This will allow the 455 recovery period after an RTO to benefit from all available 456 information provided by the receiver, even if SACK information 457 was expunged due to the RTO. 459 If there are segments missing from the receiver's buffer 460 following processing of the retransmitted segment, the 461 corresponding ACK will contain SACK information. In this case, a 462 TCP sender SHOULD use this SACK information when determining what 463 data should be sent in each segment following an RTO. The exact 464 algorithm for this selection is not specified in this document 465 (specifically NextSeg () is inappropriate during loss recovery 466 after an RTO). A relatively straightforward approach to "filling 467 in" the sequence space reported as missing should be a reasonable 468 approach. 470 6 Managing the RTO Timer 472 The standard TCP RTO estimator is defined in [RFC6298]. Due to the 473 fact that the SACK algorithm in this document can have an impact on 474 the behavior of the estimator, implementers may wish to consider how 475 the timer is managed. [RFC6298] calls for the RTO timer to be 476 re-armed each time an ACK arrives that advances the cumulative ACK 477 point. Because the algorithm presented in this document can keep the 478 ACK clock going through a fairly significant loss event, 479 (comparatively longer than the algorithm described in [RFC5681]), on 480 some networks the loss event could last longer than the RTO. 
In this
481 case the RTO timer would expire prematurely and a segment that need
482 not be retransmitted would be resent.

484 Therefore implementers have the latitude to use the standard
486 [RFC6298]-style RTO management; optionally, a more careful variant
487 that re-arms the RTO timer on each retransmission sent during
488 recovery MAY be used. This provides a more conservative timer than
489 specified in [RFC6298], and so may not always be an attractive
490 alternative. However, in some cases it may prevent needless
491 retransmissions, go-back-N transmission and further reduction of the
492 congestion window.

494 7 Research

496 The algorithm specified in this document is analyzed in [FF96], which
497 shows that the above algorithm is effective in reducing transfer time
498 over standard TCP Reno [RFC5681] when multiple segments are dropped
499 from a window of data (especially as the number of drops increases).
500 [AHKO97] shows that the algorithm defined in this document can
501 greatly improve throughput in connections traversing satellite
502 channels.

504 8 Security Considerations

506 The algorithm presented in this document shares security considerations
507 with [RFC5681]. A key difference is that an algorithm based on SACKs
508 is more robust against attackers forging duplicate ACKs to force the
509 TCP sender to reduce cwnd. With SACKs, TCP senders have an
510 additional check on whether or not a particular ACK is legitimate.
511 While not fool-proof, SACK does provide some amount of protection in
512 this area.

514 Similarly, [CPNI309] sketches a variant of a blind attack [RFC5961]
515 whereby an attacker can spoof out-of-window data to a TCP endpoint,
516 causing it to respond to the legitimate peer with a duplicate
517 cumulative ACK, per [RFC793].
Adding a SACK-based requirement to 518 trigger loss recovery effectively mitigates this attack, as the 519 duplicate ACKs caused by out-of-window segments will not contain SACK 520 information indicating reception of previously un-SACKED in-window 521 data. 523 9 Changes Relative to RFC 3517 525 The state variable "DupAcks" has been added to the list of variables 526 maintained by this algorithm, and its usage specified. 528 The function IsLost () has been modified to require that more than 529 (DupThresh - 1) * SMSS octets have been SACKed above a given sequence 530 number as indication that it is lost, changed from at least 531 (DupThresh * SMSS). This retains the requirement that at least three 532 segments following the sequence number in question have been SACKed, 533 while improving detection in the event that the sender has 534 outstanding segments which are smaller than SMSS. 536 The definition of a "duplicate acknowledgment" has been modified to 537 utilize the SACK information in detecting loss. Duplicate cumulative 538 acknowledgments can be caused by either loss or reordering in the 539 network. To disambiguate loss and reordering TCP's fast retransmit 540 algorithm [RFC5681] waits until three duplicate ACKs arrive to 541 trigger loss recovery. This notion was then the basis for the 542 algorithm specified in [RFC3517]. However, with SACK information 543 there is no need to rely blindly on the cumulative acknowledgment 544 field. We can leverage the additional information present in the 545 SACK blocks to understand that three segments have arrived at the 546 receiver which lie above a gap in the sequence space, and can use 547 that to trigger loss recovery. This notion was used in [RFC3517] 548 during loss recovery, and the change in this document is that the 549 notion is also used to enter a loss recovery phase. 551 The state variable "RescueRxt" has been added to the list of 552 variables maintained by the algorithm, and its usage specified. 
This 553 variable is used to allow for one extra retransmission per entry into 554 loss recovery, in order to keep the ACK clock going under certain 555 circumstances involving loss at the end of the window. This 556 mechanism allows for no more than one segment of no larger than 1 557 SMSS to be optimistically retransmitted per loss recovery. 559 Rule (3) of NextSeg() has been changed from MAY to SHOULD, to 560 appropriately reflect the opinion of the authors and working group 561 that it should be left in, rather than out, if an implementor does 562 not have a compelling reason to do otherwise. 564 Acknowledgments 566 The authors wish to thank Sally Floyd for encouraging [RFC3517] 567 and commenting on early drafts. The algorithm described in this 568 document is loosely based on an algorithm outlined by Kevin Fall 569 and Sally Floyd in [FF96], although the authors of this document 570 assume responsibility for any mistakes in the above text. 572 [RFC3517] was co-authored by Kevin Fall, who provided crucial input 573 to that document and hence this follow-on work. 575 Murali Bashyam, Ken Calvert, Tom Henderson, Reiner Ludwig, 576 Jamshid Mahdavi, Matt Mathis, Shawn Ostermann, Vern Paxson and 577 Venkat Venkatsubra provided valuable feedback on earlier versions 578 of this document. 580 We thank Matt Mathis and Jamshid Mahdavi for implementing the 581 scoreboard in ns and hence guiding our thinking in keeping track 582 of SACK state. 584 The first author would like to thank Ohio University and the Ohio 585 University Internetworking Research Group for supporting the bulk of 586 his work on this project. 588 Normative References 590 [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 591 793, September 1981. 593 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP 594 Selective Acknowledgment Options", RFC 2018, October 1996. 596 [RFC2026] Bradner, S., "The Internet Standards Process -- Revision 597 3", BCP 9, RFC 2026, October 1996. 
599 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 600 Requirement Levels", BCP 14, RFC 2119, March 1997. 602 [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion 603 Control", RFC 5681, September 2009. 605 Informative References 607 [AHKO97] Mark Allman, Chris Hayes, Hans Kruse, Shawn Ostermann, "TCP 608 Performance Over Satellite Links", Proceedings of the Fifth 609 International Conference on Telecommunications Systems, 610 Nashville, TN, March, 1997. 612 [All00] Mark Allman, "A Web Server's View of the Transport Layer", 613 ACM Computer Communication Review, 30(5), October 2000. 615 [CPNI309] Fernando Gont, "Security Assessment of the Transmission 616 Control Protocol (TCP)", CPNI Technical Note 3/2009, 617 http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-TCP.pdf, 618 February 2009. 620 [Errata1610] Matt Mathis, "RFC Errata Report 1610 for RFC 2018", 621 http://www.rfc-editor.org/errata_search.php?eid=1610, 622 Verified 2008-12-09. 624 [FF96] Kevin Fall and Sally Floyd, "Simulation-based Comparisons 625 of Tahoe, Reno and SACK TCP", Computer Communication 626 Review, July 1996. 628 [Jac90] Van Jacobson, "Modified TCP Congestion Avoidance Algorithm", 629 Technical Report, LBL, April 1990. 631 [PF01] Jitendra Padhye, Sally Floyd "Identifying the TCP Behavior 632 of Web Servers", ACM SIGCOMM, August 2001. 634 [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno 635 Modification to TCP's Fast Recovery Algorithm", RFC 3782, 636 April 2004. 638 [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC 639 2914, September 2000. 641 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing 642 TCP's Retransmission Timer", RFC 6298, June 2011. 644 [RFC3042] Allman, M., Balakrishnan, H, and S. Floyd, "Enhancing TCP's 645 Loss Recovery Using Limited Transmit", RFC 3042, January 646 2001. 648 [RFC3517] Blanton, E., Allman, M., Fall, K., and L. 
Wang, "A 649 Conservative Selective Acknowledgment (SACK)-based Loss 650 Recovery Algorithm for TCP", RFC 3517, April 2003. 652 [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's 653 Robustness to Blind In-Window Attacks", RFC 5961, August 654 2010. 656 Authors' Addresses 658 Ethan Blanton 659 Purdue University Computer Sciences 660 305 N. University St. 661 West Lafayette, IN 47907 663 EMail: eblanton@cs.purdue.edu 665 Mark Allman 666 International Computer Science Institute 667 1947 Center St. Suite 600 668 Berkeley, CA 94704 670 Phone: 440-235-1792 671 EMail: mallman@icir.org 672 http://www.icir.org/mallman 674 Lili Wang 675 Juniper Networks 676 10 Technology Park Drive 677 Westford, MA 01886 679 EMail: liliw@juniper.net 681 Ilpo Jarvinen 682 University of Helsinki 683 P.O. Box 68 684 FI-00014 UNIVERSITY OF HELSINKI 685 Finland 687 Email: ilpo.jarvinen@helsinki.fi 689 Markku Kojo 690 University of Helsinki 691 P.O. Box 68 692 FI-00014 UNIVERSITY OF HELSINKI 693 Finland 695 Email: kojo@cs.helsinki.fi 696 Yoshifumi Nishida 697 WIDE Project 698 Endo 5322 699 Fujisawa, Kanagawa 252-8520 700 Japan 702 Email: nishida@wide.ad.jp
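Illustrative sketch of IsLost () and SetPipe ()

The IsLost () and SetPipe () routines defined in Section 4 can be
sketched as below. This is a minimal illustration under stated
assumptions, not a definitive implementation: the function names,
parameter names, and the representation of SACKed data as sorted,
disjoint, non-adjacent half-open ranges [start, end) are hypothetical,
sequence number wraparound is ignored, and SetPipe () is written as
the literal per-octet walk from the text rather than an efficient
range-based computation.

```python
DUP_THRESH = 3  # value of DupThresh per [RFC5681]


def is_lost(seq, sacked, smss, dup_thresh=DUP_THRESH):
    """Return True if octet `seq` is considered lost.

    sacked: sorted, disjoint, non-adjacent ranges [start, end) of
    SACKed octets (an assumed scoreboard representation).
    """
    # Keep only the part of each SACKed range that lies above `seq`.
    above = [(max(start, seq + 1), end)
             for start, end in sacked if end > seq + 1]
    octets_above = sum(end - start for start, end in above)
    return (len(above) >= dup_thresh
            or octets_above > (dup_thresh - 1) * smss)


def set_pipe(high_ack, high_data, high_rxt, sacked, smss):
    """Estimate the number of octets in transit (the "pipe" variable)."""
    def is_sacked(seq):
        return any(start <= seq < end for start, end in sacked)

    pipe = 0
    for s1 in range(high_ack + 1, high_data + 1):
        if is_sacked(s1):
            continue
        if not is_lost(s1, sacked, smss):
            pipe += 1   # rule (a): still assumed to be in the network
        if s1 <= high_rxt:
            pipe += 1   # rule (b): the retransmission of this octet
    return pipe


# Five 100-octet segments (octets 1000..1499) are outstanding; the
# first is missing, the next three have been SACKed.
sacked = [(1100, 1400)]
print(is_lost(1000, sacked, smss=100))
print(set_pipe(999, 1499, high_rxt=999, sacked=sacked, smss=100))
print(set_pipe(999, 1499, high_rxt=1099, sacked=sacked, smss=100))
```

In this example the first segment is declared lost because more than
(DupThresh - 1) * SMSS octets have been SACKed above it, so before any
retransmission only the last, unSACKed segment counts toward pipe.
Once the first segment has been retransmitted and HighRxt covers it,
rule (b) adds that retransmission to the estimate.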