idnits 2.17.1

draft-ietf-tcpm-3517bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 6 characters in excess of 72.

  -- The draft header indicates that this document obsoletes RFC3517, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 26, 2012) is 4407 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'A' is mentioned on line 157, but not defined

  == Missing Reference: 'B' is mentioned on line 157, but not defined

  == Unused Reference: 'RFC2026' is defined on line 604, but no explicit
     reference was found in the text

  == Unused Reference: 'Jac90' is defined on line 636, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  -- Duplicate reference: RFC2018, mentioned in 'Errata1610', was also
     mentioned in 'RFC2018'.
  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)

  -- Obsolete informational reference (is this intentional?): RFC 3517
     (Obsoleted by RFC 6675)

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1   TCPM Working Group                                        E. Blanton
2   INTERNET-DRAFT                                     Purdue University
3   draft-ietf-tcpm-3517bis-02.txt                             M. Allman
4   Obsoletes: 3517                                                 ICSI
5   Intended status: Standards Track                             L. Wang
6   Expires: September 2012                             Juniper Networks
7                                                            I. Jarvinen
8                                                                M. Kojo
9                                                 University of Helsinki
10                                                            Y. Nishida
11                                                          WIDE Project
12                                                        March 26, 2012

14           A Conservative Selective Acknowledgment (SACK)-based
15                     Loss Recovery Algorithm for TCP

17  Status of this Memo

19     This Internet-Draft is submitted to IETF in full conformance with
20     the provisions of BCP 78 and BCP 79.

22     Internet-Drafts are working documents of the Internet Engineering
23     Task Force (IETF), its areas, and its working groups. Note that
24     other groups may also distribute working documents as Internet-
25     Drafts.

27     Internet-Drafts are draft documents valid for a maximum of six
28     months and may be updated, replaced, or obsoleted by other documents
29     at any time. It is inappropriate to use Internet-Drafts as
30     reference material or to cite them other than as "work in progress."

32     The list of current Internet-Drafts can be accessed at
33     http://www.ietf.org/ietf/1id-abstracts.txt.

35     The list of Internet-Draft Shadow Directories can be accessed at
36     http://www.ietf.org/shadow.html.

38     This Internet-Draft will expire on September 23, 2012.

40  Copyright Notice

42     Copyright (c) 2012 IETF Trust and the persons identified as the
43     document authors. All rights reserved.
45     This document is subject to BCP 78 and the IETF Trust's Legal
46     Provisions Relating to IETF Documents
47     (http://trustee.ietf.org/license-info) in effect on the date of
48     publication of this document. Please review these documents
49     carefully, as they describe your rights and restrictions with
50     respect to this document. Code Components extracted from this
51     document must include Simplified BSD License text as described in
52     Section 4.e of the Trust Legal Provisions and are provided without
53     warranty as described in the Simplified BSD License.

55  Abstract

57     This document presents a conservative loss recovery algorithm for TCP
58     that is based on the use of the selective acknowledgment (SACK) TCP
59     option. The algorithm presented in this document conforms to the
60     spirit of the current congestion control specification (RFC 5681),
61     but allows TCP senders to recover more effectively when multiple
62     segments are lost from a single flight of data.

64  1   Introduction

66     This document presents a conservative loss recovery algorithm for TCP
67     that is based on the use of the selective acknowledgment (SACK) TCP
68     option. While the TCP SACK [RFC2018] is being steadily deployed in
69     the Internet [All00], there is evidence that hosts are not using the
70     SACK information when making retransmission and congestion control
71     decisions [PF01]. The goal of this document is to outline one
72     straightforward method for TCP implementations to use SACK
73     information to increase performance.

75     [RFC5681] allows advanced loss recovery algorithms to be used by TCP
76     [RFC793] provided that they follow the spirit of TCP's congestion
77     control algorithms [RFC5681, RFC2914]. [RFC3782] outlines one such
78     advanced recovery algorithm called NewReno. This document outlines a
79     loss recovery algorithm that uses the SACK [RFC2018] TCP option to
80     enhance TCP's loss recovery. The algorithm outlined in this
81     document, heavily based on the algorithm detailed in [FF96], is a
82     conservative replacement of the fast recovery algorithm [Jac90,
83     RFC5681]. The algorithm specified in this document is a
84     straightforward SACK-based loss recovery strategy that follows the
85     guidelines set in [RFC5681] and can safely be used in TCP
86     implementations. Alternate SACK-based loss recovery methods can be
87     used in TCP as implementers see fit (as long as the alternate
88     algorithms follow the guidelines provided in [RFC5681]). Please
89     note, however, that the SACK-based decisions in this document (such
90     as what segments are to be sent at what time) are largely decoupled
91     from the congestion control algorithms, and as such can be treated as
92     separate issues if so desired.

94     This document represents a revision of [RFC3517] to address several
95     situations that are not handled explicitly in that document. A
96     summary of the changes between this document and [RFC3517] can be
97     found in Section 9.

99     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
100    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
101    document are to be interpreted as described in BCP 14, RFC 2119
102    [RFC2119].

104 2   Definitions

106    The reader is expected to be familiar with the definitions given in
108    [RFC5681].

110    The reader is assumed to be familiar with selective acknowledgments
111    as specified in [RFC2018].

113    For the purposes of explaining the SACK-based loss recovery algorithm
114    we define six variables that a TCP sender stores:

116       "HighACK" is the sequence number of the highest byte of data that
117       has been cumulatively ACKed at a given point.

119       "HighData" is the highest sequence number transmitted at a given
120       point.

122       "HighRxt" is the highest sequence number which has been
123       retransmitted during the current loss recovery phase.
125       "RescueRxt" is the highest sequence number which has been
126       retransmitted optimistically to prevent stalling of the ACK clock
127       when there is loss at the end of the window and no new data is
128       available for transmission.

130       "Pipe" is a sender's estimate of the number of bytes outstanding
131       in the network. This is used during recovery for limiting the
132       sender's sending rate. The pipe variable allows TCP to use a
133       fundamentally different congestion control than specified in
134       [RFC5681]. The algorithm is often referred to as the "pipe
135       algorithm".

137       "DupAcks" is the number of duplicate acknowledgments received
138       since the last cumulative acknowledgment.

140    For the purposes of this specification we define a "duplicate
141    acknowledgment" as a segment that arrives carrying a SACK block that
142    identifies previously unacknowledged and un-SACKed octets between
143    HighACK and HighData. Note that an ACK which carries new
144    SACK data is counted as a duplicate acknowledgment under this
145    definition even if it carries new data, changes the advertised
146    window, or moves the cumulative acknowledgment point, which is
147    different from the definition of duplicate acknowledgment
148    in [RFC5681].

150    We define a variable "DupThresh" that holds the number of duplicate
151    acknowledgments required to trigger a retransmission. Per [RFC5681]
152    this threshold is defined to be 3 duplicate acknowledgments.
153    However, implementers should consult any updates to [RFC5681] to
154    determine the current value for DupThresh (or method for determining
155    its value).

157    Finally, a range of sequence numbers [A,B] is said to "cover"
158    sequence number S if A <= S <= B.

160 3   Keeping Track of SACK Information

161    For a TCP sender to implement the algorithm defined in the next
162    section it must keep a data structure to store incoming selective
163    acknowledgment information on a per connection basis. Such a data
164    structure is commonly called the "scoreboard". The specifics of the
165    scoreboard data structure are out of scope for this document (as long
166    as the implementation can perform all functions required by this
167    specification).

169    Note that this document refers to keeping account of (marking)
170    individual octets of data transferred across a TCP connection. A
171    real-world implementation of the scoreboard would likely prefer to
172    manage this data as sequence number ranges. The algorithms presented
173    here allow this, but require the ability to mark arbitrary sequence
174    number ranges as having been selectively acknowledged.

176    Finally, note that the algorithm in this document assumes a
177    sender that is not keeping track of segment boundaries after
178    transmitting a segment. It is possible that a sender that did
179    keep this extra state may be able to use a more refined and
180    precise algorithm than the one presented herein; however, we
181    leave this as future work.

183 4   Processing and Acting Upon SACK Information

185    For the purposes of the algorithm defined in this document the
186    scoreboard SHOULD implement the following functions:

188    Update ():

190       Given the information provided in an ACK, each octet that is
191       cumulatively ACKed or SACKed should be marked accordingly in the
192       scoreboard data structure, and the total number of octets SACKed
193       should be recorded.

195       Note: SACK information is advisory and therefore SACKed data MUST
196       NOT be removed from TCP's retransmission buffer until the data is
197       cumulatively acknowledged [RFC2018].

199    IsLost (SeqNum):

201       This routine returns whether the given sequence number is
202       considered to be lost. The routine returns true when either
203       DupThresh discontiguous SACKed sequences have arrived above
204       'SeqNum' or more than (DupThresh - 1) * SMSS bytes with sequence
205       numbers greater than 'SeqNum' have been SACKed. Otherwise, the
206       routine returns false.
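
The Update () and IsLost () routines described above can be sketched as
follows. This is a minimal illustration and not part of the specification.
It assumes a scoreboard kept as inclusive (start, end) octet ranges, in the
sense of "cover" from Section 2, takes the cumulative ACK as the highest
octet acknowledged (HighACK), and ignores sequence number wraparound. The
class name, method names, and the SMSS value are hypothetical.

```python
from dataclasses import dataclass, field

SMSS = 1460       # assumed sender maximum segment size, in octets
DUP_THRESH = 3    # DupThresh per [RFC5681]


@dataclass
class Scoreboard:
    """Hypothetical scoreboard storing SACKed data as inclusive ranges."""
    high_ack: int = 0
    sacked: list = field(default_factory=list)  # sorted, disjoint (start, end)

    def update(self, cum_ack, sack_blocks):
        """Update (): record a cumulative ACK and SACK blocks (inclusive)."""
        self.high_ack = max(self.high_ack, cum_ack)
        merged = []
        for start, end in sorted(self.sacked + list(sack_blocks)):
            # Octets at or below HighACK are cumulatively ACKed already.
            start = max(start, self.high_ack + 1)
            if start > end:
                continue
            if merged and start <= merged[-1][1] + 1:
                # Overlapping or adjacent: extend the previous range.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self.sacked = merged

    @property
    def total_sacked(self):
        """Total number of octets currently marked as SACKed."""
        return sum(end - start + 1 for start, end in self.sacked)

    def is_lost(self, seq):
        """IsLost (SeqNum): apply the two tests given in the text."""
        above = [(max(s, seq + 1), e) for s, e in self.sacked if e > seq]
        octets_above = sum(e - s + 1 for s, e in above)
        return (len(above) >= DUP_THRESH
                or octets_above > (DUP_THRESH - 1) * SMSS)
```

In this sketch, three discontiguous SACKed ranges above a sequence number
mark it as lost regardless of their size, while a single contiguous range
does so only once it holds more than (DupThresh - 1) * SMSS octets.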
208    SetPipe ():

210       This routine traverses the sequence space from HighACK to HighData
211       and MUST set the "pipe" variable to an estimate of the number of
212       octets that are currently in transit between the TCP sender and
213       the TCP receiver. After initializing pipe to zero the following
214       steps are taken for each octet 'S1' in the sequence space between
215       HighACK and HighData that has not been SACKed:

217       (a) If IsLost (S1) returns false:

219          Pipe is incremented by 1 octet.

221          The effect of this condition is that pipe is incremented for
222          packets that have not been SACKed and have not been determined
223          to have been lost (i.e., those segments that are still assumed
224          to be in the network).

226       (b) If S1 <= HighRxt:

228          Pipe is incremented by 1 octet.

230          The effect of this condition is that pipe is incremented for
231          the retransmission of the octet.

233       Note that octets retransmitted without being considered lost are
234       counted twice by the above mechanism.

236    NextSeg ():

238       This routine uses the scoreboard data structure maintained by the
239       Update() function to determine what to transmit based on the SACK
240       information that has arrived from the data receiver (and hence
241       been marked in the scoreboard). NextSeg () MUST return the
242       sequence number range of the next segment that is to be
243       transmitted, per the following rules:

245       (1) If there exists a smallest unSACKed sequence number 'S2' that
246           meets the following three criteria for determining loss, the
247           sequence range of one segment of up to SMSS octets starting
248           with S2 MUST be returned.

250           (1.a) S2 is greater than HighRxt.

252           (1.b) S2 is less than the highest octet covered by any
253                 received SACK.

255           (1.c) IsLost (S2) returns true.
257       (2) If no sequence number 'S2' per rule (1) exists but there
258           exists available unsent data and the receiver's advertised
259           window allows, the sequence range of one segment of up to SMSS
260           octets of previously unsent data starting with sequence number
261           HighData+1 MUST be returned.

263       (3) If the conditions for rules (1) and (2) fail, but there exists
264           an unSACKed sequence number 'S3' that meets the criteria for
265           detecting loss given in steps (1.a) and (1.b) above
266           (specifically excluding step (1.c)) then one segment of up to
267           SMSS octets starting with S3 SHOULD be returned.

269       (4) If the conditions for (1), (2), and (3) fail, but there
270           exists outstanding unSACKed data, we provide the
271           opportunity for a single "rescue" retransmission per entry
272           into loss recovery. If HighACK is greater than RescueRxt
273           (or RescueRxt is undefined), then one segment of up to
274           SMSS octets which MUST include the highest outstanding
275           unSACKed sequence number SHOULD be returned, and RescueRxt
276           set to RecoveryPoint. HighRxt MUST NOT be updated.

278           Note that rules (3) and (4) are a sort of retransmission "last
279           resort". They allow for retransmission of sequence numbers
280           even when the sender has less certainty that a segment has been
281           lost than with rule (1). Retransmitting segments via rules
282           (3) and (4) will help sustain TCP's ACK clock and therefore
283           can potentially help avoid retransmission timeouts. However,
284           in sending these segments the sender has two copies of the
285           same data considered to be in the network (and also in the
286           Pipe estimate, in the case of (3)). When an ACK or SACK
287           arrives covering this retransmitted segment, the sender cannot
288           be sure exactly how much data left the network (one of the two
289           transmissions of the packet or both transmissions of the
290           packet). Therefore the sender may underestimate Pipe by
291           considering both segments to have left the network when it is
292           possible that only one of the two has.

294       (5) If the conditions for each of (1), (2), (3), and (4) are not
295           met, then NextSeg () MUST indicate failure, and no segment is
296           returned.

298    Note: The SACK-based loss recovery algorithm outlined in this
299    document requires more computational resources than previous TCP loss
300    recovery strategies. However, we believe the scoreboard data
301    structure can be implemented in a reasonably efficient manner (both
302    in terms of computational complexity and memory usage) in most TCP
303    implementations.

305 5   Algorithm Details

307    Upon the receipt of any ACK containing SACK information, the
308    scoreboard MUST be updated via the Update () routine.

310    If the incoming ACK is a cumulative acknowledgment, the TCP MUST
311    reset DupAcks to zero.

313    If the incoming ACK is a duplicate acknowledgment per the definition
314    in Section 2 (regardless of its status as a cumulative
315    acknowledgment), and the TCP is not currently in loss recovery, the
316    TCP MUST increase DupAcks by one and take the following steps:

318    (1) If DupAcks >= DupThresh, go to step (4).

320        Note: This check covers the case when a TCP receives SACK
321        information for multiple segments smaller than SMSS, which can
322        potentially prevent IsLost() (next step) from declaring a segment
323        as lost.

325    (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns
326        true---indicating at least three segments have arrived above
327        the current cumulative acknowledgment point, which is taken
328        to indicate loss---go to step (4).

330    (3) The TCP MAY transmit previously unsent data segments as per
331        Limited Transmit [RFC5681], except that the number of octets
332        which may be sent is governed by Pipe and cwnd as follows:

334        (3.1) Set HighRxt to HighACK.

336        (3.2) Run SetPipe ().
338        (3.3) If (cwnd - pipe) >= 1 SMSS, there exists previously
339              unsent data, and the receiver's advertised window
340              allows, transmit up to 1 SMSS of data starting with the
341              octet HighData+1 and update HighData to reflect this
342              transmission, then return to (3.2).

344        (3.4) Terminate processing of this ACK.

346    (4) Invoke Fast Retransmit and enter loss recovery as follows:

348        (4.1) RecoveryPoint = HighData

350              When the TCP sender receives a cumulative ACK for this
351              data octet the loss recovery phase is terminated.

353        (4.2) ssthresh = cwnd = (FlightSize / 2)

355              The congestion window (cwnd) and slow start threshold
356              (ssthresh) are reduced to half of FlightSize per
357              [RFC5681]. Additionally, note that [RFC5681] requires
358              any segments sent as part of the Limited Transmit
359              mechanism not be counted in FlightSize for the purpose
360              of the above equation.

362        (4.3) Retransmit the first data segment presumed dropped -- the
363              segment starting with sequence number HighACK + 1. To
364              prevent repeated retransmission of the same data or a
365              premature rescue retransmission, set both HighRxt and
366              RescueRxt to the highest sequence number in the
367              retransmitted segment.

369        (4.4) Run SetPipe ()

371              Set a "pipe" variable to the number of outstanding
372              octets currently "in the pipe"; this is the data which
373              has been sent by the TCP sender but for which no
374              cumulative or selective acknowledgment has been
375              received and the data has not been determined to have
376              been dropped in the network. It is assumed that the
377              data is still traversing the network path.

379        (4.5) In order to take advantage of potential additional
380              available cwnd, proceed to step (C) below.
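
Steps (4.1) through (4.4), together with the octet-by-octet SetPipe ()
computation from Section 4, can be sketched as follows. This is an
illustration under stated assumptions, not a definitive implementation.
The SenderState class, the function names, the SMSS value, and the
is_sacked and is_lost callbacks (standing in for scoreboard queries) are
all hypothetical. Integer division is assumed for FlightSize / 2, and
sequence number wraparound is ignored.

```python
from dataclasses import dataclass

SMSS = 1000  # assumed segment size in octets (illustrative value)


@dataclass
class SenderState:
    """Hypothetical container for the sender variables of Section 2."""
    high_ack: int
    high_data: int
    cwnd: int
    ssthresh: int = 0
    high_rxt: int = 0
    rescue_rxt: int = 0
    recovery_point: int = 0
    pipe: int = 0


def set_pipe(st, is_sacked, is_lost):
    """SetPipe (): walk each un-SACKed octet above HighACK up to HighData."""
    pipe = 0
    for s1 in range(st.high_ack + 1, st.high_data + 1):
        if is_sacked(s1):
            continue
        if not is_lost(s1):      # (a) still presumed to be in the network
            pipe += 1
        if s1 <= st.high_rxt:    # (b) a retransmission of this octet was sent
            pipe += 1
    return pipe


def enter_recovery(st, flight_size, is_sacked, is_lost):
    """Steps (4.1)-(4.4); returns the inclusive octet range to retransmit."""
    st.recovery_point = st.high_data               # (4.1)
    st.ssthresh = st.cwnd = flight_size // 2       # (4.2)
    first = st.high_ack + 1                        # (4.3)
    last = min(first + SMSS - 1, st.high_data)
    st.high_rxt = st.rescue_rxt = last
    st.pipe = set_pipe(st, is_sacked, is_lost)     # (4.4)
    return first, last


# Example: five 1000-octet segments sent, the first lost, the next three
# SACKed, and the fifth still in flight.
st = SenderState(high_ack=0, high_data=5000, cwnd=5000)
rng = enter_recovery(st, 5000,
                     is_sacked=lambda s: 1001 <= s <= 4000,
                     is_lost=lambda s: s <= 1000)
```

In this example the retransmitted range is octets 1 through 1000, cwnd
and ssthresh become 2500, and pipe is 2000 (1000 octets for the
retransmission plus 1000 for the un-SACKed fifth segment). Since
cwnd - pipe is then less than 1 SMSS, step (C) would send nothing more
until further ACKs arrive.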
382    Once a TCP is in the loss recovery phase the following procedure MUST
383    be used for each arriving ACK:

385    (A) An incoming cumulative ACK for a sequence number greater than
386        RecoveryPoint signals the end of loss recovery and the loss
387        recovery phase MUST be terminated. Any information contained in
388        the scoreboard for sequence numbers greater than the new value of
389        HighACK SHOULD NOT be cleared when leaving the loss recovery
390        phase.

392    (B) Upon receipt of an ACK that does not cover RecoveryPoint the
393        following actions MUST be taken:

395        (B.1) Use Update () to record the new SACK information conveyed
396              by the incoming ACK.

398        (B.2) Use SetPipe () to re-calculate the number of octets still
399              in the network.

401    (C) If cwnd - pipe >= 1 SMSS the sender SHOULD transmit one or more
402        segments as follows:

404        (C.1) The scoreboard MUST be queried via NextSeg () for the
405              sequence number range of the next segment to transmit (if any),
406              and the given segment sent. If NextSeg () returns failure (no
407              data to send) return without sending anything (i.e., terminate
408              steps C.1 -- C.5).

410        (C.2) If any of the data octets sent in (C.1) are below HighData,
411              HighRxt MUST be set to the highest sequence number of the
412              retransmitted segment unless NextSeg () rule (4) was invoked for
413              this retransmission.

415        (C.3) If any of the data octets sent in (C.1) are above HighData,
416              HighData must be updated to reflect the transmission of
417              previously unsent data.

419        (C.4) The estimate of the amount of data outstanding in the
420              network must be updated by incrementing pipe by the number of
421              octets transmitted in (C.1).
423        (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)

425    Note that steps (A) and (C) can potentially send a burst of
426    back-to-back segments into the network if the incoming cumulative
427    acknowledgment is for more than SMSS octets of data, or if incoming
428    SACK blocks indicate that more than SMSS octets of data have been
429    lost in the second half of the window.

431 5.1  Retransmission Timeouts

433    In order to avoid memory deadlocks, the TCP receiver is allowed
434    to discard data that has already been selectively acknowledged.
435    As a result, [RFC2018] suggests that a TCP sender SHOULD expunge
436    the SACK information gathered from a receiver upon a
437    retransmission timeout "since the timeout might indicate that the
438    data receiver has reneged." Additionally, a TCP sender MUST
439    "ignore prior SACK information in determining which data to
440    retransmit." However, since the publication of [RFC2018] this
441    has come to be viewed by some as too strong. It has been
442    suggested that, as long as robust tests for reneging are present,
443    an implementation can retain and use SACK information across a
444    timeout event [Errata1610]. While this document does not change
445    the specification in [RFC2018], we note that implementers should
446    consult any updates to [RFC2018] on this subject. Further, a
447    SACK TCP sender SHOULD utilize all SACK information made
448    available during the loss recovery following an RTO.

450    If an RTO occurs during loss recovery as specified in this document,
451    RecoveryPoint MUST be set to HighData. Further, the new value of
452    RecoveryPoint MUST be preserved and the loss recovery algorithm
453    outlined in this document MUST be terminated. In addition, a new
454    recovery phase (as described in section 5) MUST NOT be initiated
455    until HighACK is greater than or equal to the new value of
456    RecoveryPoint.

458    As described in Sections 4 and 5, Update () SHOULD continue to be
459    used appropriately upon receipt of ACKs. This will allow the
460    recovery period after an RTO to benefit from all available
461    information provided by the receiver, even if SACK information
462    was expunged due to the RTO.

464    If there are segments missing from the receiver's buffer
465    following processing of the retransmitted segment, the
466    corresponding ACK will contain SACK information. In this case, a
467    TCP sender SHOULD use this SACK information when determining what
468    data should be sent in each segment following an RTO. The exact
469    algorithm for this selection is not specified in this document
470    (specifically NextSeg () is inappropriate during loss recovery
471    after an RTO). A relatively straightforward approach to "filling
472    in" the sequence space reported as missing should be a reasonable
473    approach.

475 6   Managing the RTO Timer

477    The standard TCP RTO estimator is defined in [RFC6298]. Due to the
478    fact that the SACK algorithm in this document can have an impact on
479    the behavior of the estimator, implementers may wish to consider how
480    the timer is managed. [RFC6298] calls for the RTO timer to be
481    re-armed each time an ACK arrives that advances the cumulative ACK
482    point. Because the algorithm presented in this document can keep the
483    ACK clock going through a fairly significant loss event
484    (comparatively longer than the algorithm described in [RFC5681]), on
485    some networks the loss event could last longer than the RTO. In this
486    case the RTO timer would expire prematurely and a segment that need
487    not be retransmitted would be resent.

489    Therefore, we give implementers the latitude to use the standard
490    [RFC6298]-style RTO management; optionally, a more careful variant
491    that re-arms the RTO timer on each retransmission sent during
492    recovery MAY be used instead. This provides a more conservative timer
493    than specified in [RFC6298], and so may not always be an attractive
494    alternative. However, in some cases it may prevent needless
495    retransmissions, go-back-N transmission and further reduction of the
496    congestion window.

498 7   Research

500    The algorithm specified in this document is analyzed in [FF96], which
501    shows that the above algorithm is effective in reducing transfer time
502    over standard TCP Reno [RFC5681] when multiple segments are dropped
503    from a window of data (especially as the number of drops increases).
504    [AHKO97] shows that the algorithm defined in this document can
505    greatly improve throughput in connections traversing satellite
506    channels.

508 8   Security Considerations

510    The algorithm presented in this document shares security
511    considerations with [RFC5681]. A key difference is that an algorithm
512    based on SACKs is more robust against attackers forging duplicate ACKs
513    to force the TCP sender to reduce cwnd. With SACKs, TCP senders have an
514    additional check on whether or not a particular ACK is legitimate.
515    While not fool-proof, SACK does provide some amount of protection in
516    this area.

518    Similarly, [CPNI309] sketches a variant of a blind attack [RFC5961]
519    whereby an attacker can spoof out-of-window data to a TCP endpoint,
520    causing it to respond to the legitimate peer with a duplicate
521    cumulative ACK, per [RFC793]. Adding a SACK-based requirement to
522    trigger loss recovery effectively mitigates this attack, as the
523    duplicate ACKs caused by out-of-window segments will not contain SACK
524    information indicating reception of previously un-SACKed in-window
525    data.

527 9   Changes Relative to RFC 3517

529    The state variable "DupAcks" has been added to the list of variables
530    maintained by this algorithm, and its usage specified.

532    The function IsLost () has been modified to require that more than
533    (DupThresh - 1) * SMSS octets have been SACKed above a given sequence
534    number as indication that it is lost, changed from at least
535    (DupThresh * SMSS). This retains the requirement that at least three
536    segments following the sequence number in question have been SACKed,
537    while improving detection in the event that the sender has
538    outstanding segments which are smaller than SMSS.

540    The definition of a "duplicate acknowledgment" has been modified to
541    utilize the SACK information in detecting loss. Duplicate cumulative
542    acknowledgments can be caused by either loss or reordering in the
543    network. To disambiguate loss and reordering TCP's fast retransmit
544    algorithm [RFC5681] waits until three duplicate ACKs arrive to
545    trigger loss recovery. This notion was then the basis for the
546    algorithm specified in [RFC3517]. However, with SACK information
547    there is no need to rely blindly on the cumulative acknowledgment
548    field. We can leverage the additional information present in the
549    SACK blocks to understand that three segments have arrived at the
550    receiver which lie above a gap in the sequence space, and can use
551    that to trigger loss recovery. This notion was used in [RFC3517]
552    during loss recovery, and the change in this document is that the
553    notion is also used to enter a loss recovery phase.

555    The state variable "RescueRxt" has been added to the list of
556    variables maintained by the algorithm, and its usage specified. This
557    variable is used to allow for one extra retransmission per entry into
558    loss recovery, in order to keep the ACK clock going under certain
559    circumstances involving loss at the end of the window. This
560    mechanism allows for no more than one segment of no larger than 1
561    SMSS to be optimistically retransmitted per loss recovery.

563    Rule (3) of NextSeg() has been changed from MAY to SHOULD, to
564    appropriately reflect the opinion of the authors and working group
565    that it should be left in, rather than out, if an implementer does
566    not have a compelling reason to do otherwise.
568 10  IANA Considerations

570    This document has no actions for IANA.

572 Acknowledgments

574    The authors wish to thank Sally Floyd for encouraging [RFC3517]
575    and commenting on early drafts. The algorithm described in this
576    document is loosely based on an algorithm outlined by Kevin Fall
577    and Sally Floyd in [FF96], although the authors of this document
578    assume responsibility for any mistakes in the above text.

580    [RFC3517] was co-authored by Kevin Fall, who provided crucial input
581    to that document and hence this follow-on work.

583    Murali Bashyam, Ken Calvert, Tom Henderson, Reiner Ludwig,
584    Jamshid Mahdavi, Matt Mathis, Shawn Ostermann, Vern Paxson and
585    Venkat Venkatsubra provided valuable feedback on earlier versions
586    of this document.

588    We thank Matt Mathis and Jamshid Mahdavi for implementing the
589    scoreboard in ns and hence guiding our thinking in keeping track
590    of SACK state.

592    The first author would like to thank Ohio University and the Ohio
593    University Internetworking Research Group for supporting the bulk of
594    his work on RFC 3517, from which this document is derived.

596 Normative References

598    [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC
599        793, September 1981.

601    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
602        Selective Acknowledgment Options", RFC 2018, October 1996.

604    [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
605        3", BCP 9, RFC 2026, October 1996.

607    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
608        Requirement Levels", BCP 14, RFC 2119, March 1997.

610    [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
611        Control", RFC 5681, September 2009.

613 Informative References

615    [AHKO97] Mark Allman, Chris Hayes, Hans Kruse, Shawn Ostermann, "TCP
616        Performance Over Satellite Links", Proceedings of the Fifth
617        International Conference on Telecommunications Systems,
618        Nashville, TN, March, 1997.
620    [All00] Mark Allman, "A Web Server's View of the Transport Layer",
621        ACM Computer Communication Review, 30(5), October 2000.

623    [CPNI309] Fernando Gont, "Security Assessment of the Transmission
624        Control Protocol (TCP)", CPNI Technical Note 3/2009,
625        http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-TCP.pdf,
626        February 2009.

628    [Errata1610] Matt Mathis, "RFC Errata Report 1610 for RFC 2018",
629        http://www.rfc-editor.org/errata_search.php?eid=1610,
630        Verified 2008-12-09.

632    [FF96] Kevin Fall and Sally Floyd, "Simulation-based Comparisons
633        of Tahoe, Reno and SACK TCP", Computer Communication
634        Review, July 1996.

636    [Jac90] Van Jacobson, "Modified TCP Congestion Avoidance
637        Algorithm", Technical Report, LBL, April 1990.

639    [PF01] Jitendra Padhye, Sally Floyd, "Identifying the TCP Behavior
640        of Web Servers", ACM SIGCOMM, August 2001.

642    [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
643        Modification to TCP's Fast Recovery Algorithm", RFC 3782,
644        April 2004.

646    [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC
647        2914, September 2000.

649    [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing
650        TCP's Retransmission Timer", RFC 6298, June 2011.

652    [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
653        Conservative Selective Acknowledgment (SACK)-based Loss
654        Recovery Algorithm for TCP", RFC 3517, April 2003.

656    [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
657        Robustness to Blind In-Window Attacks", RFC 5961, August
658        2010.

660 Authors' Addresses

662    Ethan Blanton
663    Purdue University Computer Sciences
664    305 N. University St.
665    West Lafayette, IN 47907

667    EMail: elb@psg.com

669    Mark Allman
670    International Computer Science Institute
671    1947 Center St. Suite 600
672    Berkeley, CA 94704

674    Phone: 440-235-1792
675    EMail: mallman@icir.org
676    http://www.icir.org/mallman

678    Lili Wang
679    Juniper Networks
680    10 Technology Park Drive
681    Westford, MA 01886

683    EMail: liliw@juniper.net

685    Ilpo Jarvinen
686    University of Helsinki
687    P.O. Box 68
688    FI-00014 UNIVERSITY OF HELSINKI
689    Finland

691    Email: ilpo.jarvinen@helsinki.fi

693    Markku Kojo
694    University of Helsinki
695    P.O. Box 68
696    FI-00014 UNIVERSITY OF HELSINKI
697    Finland

699    Email: kojo@cs.helsinki.fi

701    Yoshifumi Nishida
702    WIDE Project
703    Endo 5322
704    Fujisawa, Kanagawa 252-8520
705    Japan

707    Email: nishida@wide.ad.jp