Internet Engineering Task Force                            Ethan Blanton
INTERNET DRAFT                                           Ohio University
File: draft-allman-tcp-sack-07.txt                           Mark Allman
                                                            BBN/NASA GRC
                                                              July, 2001
                                                  Expires: January, 2002


       A Conservative SACK-based Loss Recovery Algorithm for TCP

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document presents a conservative loss recovery algorithm for
   TCP that is based on the use of the selective acknowledgment TCP
   option. The algorithm presented in this document conforms to the
   spirit of the current congestion control specification, but allows
   TCP senders to recover more effectively when multiple segments are
   lost from a single flight of data.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1 Introduction

   This document presents a conservative loss recovery algorithm for
   TCP that is based on the use of the selective acknowledgment TCP
   option. While the TCP selective acknowledgment (SACK) option
   [RFC2018] is being steadily deployed in the Internet [All00] there
   is evidence that hosts are not using the SACK information when
   making retransmission and congestion control decisions [PF00]. The
   goal of this document is to outline one straightforward method for
   TCP implementations to use SACK information to increase performance.

   [RFC2581] allows advanced loss recovery algorithms to be used by TCP
   [RFC793] provided that they follow the spirit of TCP's congestion
   control algorithms [RFC2581,RFC2914]. [RFC2582] outlines one such
   advanced recovery algorithm called NewReno. This document outlines
   a loss recovery algorithm that uses the selective acknowledgment
   (SACK) [RFC2018] TCP option to enhance TCP's loss recovery. The
   algorithm outlined in this document, heavily based on the algorithm
   detailed in [FF96], is a conservative replacement of the fast
   recovery algorithm [Jac90,RFC2581]. The algorithm specified in this
   document is a straightforward SACK-based loss recovery strategy that
   follows the guidelines set in [RFC2581] and can safely be used in
   TCP implementations. Alternate SACK-based loss recovery methods can
   be used in TCP as implementers see fit (as long as the alternate
   algorithms follow the guidelines provided in [RFC2581]).
   Please note, however, that the SACK-based decisions in this document
   (such as what segments are to be sent at what time) are largely
   decoupled from the congestion control algorithms, and as such can be
   treated as separate issues if so desired.

2 Definitions

   The reader is expected to be familiar with the definitions given in
   [RFC2581].

   For the purposes of explaining the SACK-based loss recovery
   algorithm we define two variables that a TCP sender stores:

      ``HighACK'' is the sequence number of the highest cumulative ACK
      received at a given point.

      ``HighData'' is the highest sequence number transmitted at a
      given point.

   For the purposes of this specification we define a ``duplicate
   acknowledgment'' as an acknowledgment (ACK) whose cumulative ACK
   number is equal to the current value of HighACK, as described in
   [RFC2581].

   We define a variable ``DupThresh'' that holds the number of
   duplicate acknowledgments required to trigger a retransmission. Per
   [RFC2581] this threshold is defined to be 3 duplicate
   acknowledgments. However, implementers should consult any updates
   to [RFC2581] to determine the current value for DupThresh (or method
   for determining its value).

3 Keeping Track of SACK Information

   For a TCP sender to implement the algorithm defined in the next
   section it must keep a data structure to store incoming selective
   acknowledgment information on a per connection basis. Such a data
   structure is commonly called the ``scoreboard''. Note that while
   this document speaks of marking and keeping track of octets, a
   real world implementation would probably want to keep track of
   octet ranges or otherwise collapse the data while ensuring that
   arbitrary ranges are still markable.
   For the purposes of the algorithm defined in this document the
   scoreboard SHOULD implement the following functions:

   Update ():

      Each octet that is cumulatively ACKed or SACKed should be marked
      accordingly in the scoreboard data structure, and the total
      number of octets SACKed should be recorded.

      Note: SACK information is advisory and therefore SACKed data
      MUST NOT be removed from TCP's retransmission buffer until the
      data is cumulatively acknowledged [RFC2018].

   MarkRetran ():

      When a retransmission is sent, the scoreboard MUST be updated
      with this information so that data is not repeatedly
      retransmitted by the SACK-based algorithm outlined in this
      document. Note: If a retransmission is lost it will be repaired
      using TCP's retransmission timer.

   NextSeg ():

      This routine MUST return the sequence number range of the oldest
      segment that has not been cumulatively ACKed or SACKed and has
      not been retransmitted, per the following rules:

      (1) Look for the lowest sequence number that is not ACKed or
          SACKed and has not been retransmitted. If such a sequence
          number ``S'' exists, this routine MUST return a sequence
          number range of up to 1 SMSS octets in size starting at
          octet S.

      (2) If we fail to find a segment per rule 1, but the connection
          has unsent data available to be transmitted, NextSeg () MUST
          return a sequence number range corresponding to one segment
          of this new data.

      (3) If rules 1 and 2 fail, NextSeg () MUST indicate this and no
          data will be sent.

   AmountSACKed ():

      This routine MUST return the total number of octets which fall
      between HighACK and HighData that have been selectively
      acknowledged by the receiver.

   LeftNetwork ():

      This function MUST return the number of octets in the given
      sequence number range that have left the network. The
      algorithm checks each octet in the given range and separately
      tracks two quantities.
      The first is the number of retransmitted octets. The second
      value that is tracked is the number of octets that have not
      been SACKed. This value represents the number of octets that
      have not yet been removed from the pipe estimate but are now
      known to have left the network. Note: it is possible to have
      octets that fit both categories. In this case, the octets MUST
      be counted in both categories. After checking the sequence
      number range given, this routine returns the sum of the two
      counters.

   Note: The SACK-based loss recovery algorithm outlined in this
   document requires more computational resources than previous TCP
   loss recovery strategies. However, we believe the scoreboard data
   structure can be implemented in a reasonably efficient manner (both
   in terms of computational complexity and memory usage) in most TCP
   implementations.

4 Algorithm Details

   Upon the receipt of any ACK containing SACK information, the
   scoreboard MUST be updated via the Update () routine.

   Upon the receipt of the first (DupThresh - 1) duplicate ACKs,
   the scoreboard is also to be updated as normal. Note: The first
   and second duplicate ACKs can also be used to trigger the
   transmission of previously unsent segments using the Limited
   Transmit mechanism [RFC3042].
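
   As a rough illustration of the scoreboard routines from Section 3
   that the update rules above rely on, the Python sketch below tracks
   state per octet in two sets. It is not part of the specification:
   the class name, the per-octet sets, the `unsent` parameter, the
   half-open (start, end) ranges, the treatment of HighData as one
   past the last octet sent, and the lack of sequence number
   wraparound handling are all simplifying assumptions made for this
   example. A real implementation would more likely store ranges.

```python
class Scoreboard:
    """Illustrative per-octet scoreboard (hypothetical, not normative)."""

    def __init__(self, high_ack, high_data, smss):
        self.high_ack = high_ack    # HighACK: next octet the receiver expects
        self.high_data = high_data  # assumed: one past the last octet sent
        self.smss = smss            # sender maximum segment size, in octets
        self.sacked = set()         # octets reported in SACK blocks
        self.retran = set()         # octets already retransmitted

    def update(self, cum_ack, sack_blocks=()):
        """Update (): record an ACK; return the count of newly SACKed octets."""
        self.high_ack = max(self.high_ack, cum_ack)
        before = len(self.sacked)
        for left, right in sack_blocks:  # assumed half-open [left, right)
            self.sacked.update(range(max(left, self.high_ack),
                                     min(right, self.high_data)))
        return len(self.sacked) - before

    def mark_retran(self, start, end):
        """MarkRetran (): remember that [start, end) was retransmitted."""
        self.retran.update(range(start, end))

    def next_seg(self, unsent=0):
        """NextSeg (): next range to send as (start, end), or None."""
        # Rule (1): oldest octet not ACKed, SACKed, or retransmitted.
        for s in range(self.high_ack, self.high_data):
            if s not in self.sacked and s not in self.retran:
                limit = min(s + self.smss, self.high_data)
                end = s
                while (end < limit and end not in self.sacked
                       and end not in self.retran):
                    end += 1
                return (s, end)
        # Rule (2): otherwise one segment of previously unsent data.
        if unsent > 0:
            return (self.high_data, self.high_data + min(self.smss, unsent))
        # Rule (3): nothing to send.
        return None

    def amount_sacked(self):
        """AmountSACKed (): SACKed octets between HighACK and HighData."""
        return sum(1 for o in self.sacked
                   if self.high_ack <= o < self.high_data)

    def left_network(self, start, end):
        """LeftNetwork (): retransmitted octets plus octets never SACKed."""
        retransmitted = sum(1 for o in range(start, end) if o in self.retran)
        not_sacked = sum(1 for o in range(start, end) if o not in self.sacked)
        return retransmitted + not_sacked
```

   With HighACK = 1000, HighData = 5000, an SMSS of 1000 and a SACK
   block covering octets 2000 through 2999, next_seg() in this sketch
   returns (1000, 2000); once that range has been passed to
   mark_retran(), it returns (3000, 4000).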

   When a TCP sender receives the duplicate ACK corresponding to
   DupThresh ACKs, the scoreboard MUST be updated with the new SACK
   information (via Update ()) and a loss recovery phase SHOULD be
   initiated, per the fast retransmit algorithm outlined in [RFC2581],
   and the following steps MUST be taken:

   (1) Set a ``pipe'' variable to the number of outstanding octets
       (i.e., octets that have been sent but not yet acknowledged), per
       the following equation:

          pipe = HighData - HighACK - AmountSACKed ()

       This variable represents the amount of data currently ``in the
       pipe''; this is the data which has been sent by the TCP sender
       but not acknowledged by the TCP receiver. This data can be
       assumed to still be traversing the network path.

   (2) Set a ``RecoveryPoint'' variable to HighData. When the TCP
       sender receives a cumulative ACK for this data octet the loss
       recovery phase is terminated.

   (3) The congestion window (cwnd) is reduced to half of FlightSize
       per [RFC2581]. The value of the slow start threshold (ssthresh)
       is set to the halved value of cwnd.

   (4) Retransmit the first data segment not covered by HighACK. Use
       the MarkRetran () function to mark the sequence number range as
       having been retransmitted in the scoreboard. In order to take
       advantage of potential additional available cwnd, proceed to
       step (D) below.

   Once a TCP is in the loss recovery phase the following procedure
   MUST be used for each arriving ACK:

   (A) An incoming cumulative ACK for a sequence number greater than
       RecoveryPoint signals the end of loss recovery and the loss
       recovery phase MUST be terminated. The scoreboard SHOULD NOT be
       cleared when leaving the loss recovery phase.

   (B) Upon receipt of a duplicate ACK the following actions MUST be
       taken:

       (B.1) Use Update () to record the new SACK information conveyed
             by the incoming ACK.

       (B.2) The pipe variable is decremented by the number of newly
             SACKed data octets conveyed in the incoming ACK, as that
             is the amount of new data presumed to have left the
             network.

   (C) When a ``partial ACK'' (an ACK that increases the HighACK point,
       but does not terminate loss recovery) arrives, the following
       actions MUST be performed:

       (C.1) Before updating HighACK based on the received cumulative
             ACK, save HighACK as OldHighACK.

       (C.2) The scoreboard MUST be updated based on the cumulative ACK
             and any new SACK information that is included in the ACK
             via the Update () routine.

       (C.3) The value of pipe MUST be decremented by the number of
             octets returned by the LeftNetwork () routine when given
             the sequence number range OldHighACK-HighACK.

   (D) While pipe is less than cwnd and the receiver's advertised window
       permits, the TCP sender SHOULD transmit one or more segments
       as follows:

       (D.1) The scoreboard MUST be queried via NextSeg () for the
             sequence number range of the next segment to transmit, and
             the given segment is sent.

       (D.2) The pipe variable MUST be incremented by the number of
             data octets sent in (D.1).

       (D.3) If any of the data octets sent in (D.1) are below HighData,
             they MUST be marked as retransmitted via MarkRetran ().

       (D.4) If cwnd - pipe is greater than 1 SMSS, return to (D.1).

4.1 Retransmission Timeouts

   Keeping track of SACK information depends on the TCP sender having
   an accurate measure of the current state of the network, the
   conditions of this connection, and the state of the receiver's
   buffer.
   Due to these limitations, [RFC2018] suggests that a TCP
   sender SHOULD expunge the SACK information gathered from a receiver
   upon a retransmission timeout ``since the timeout might indicate
   that the data receiver has reneged.'' Additionally, a TCP sender
   MUST ``ignore prior SACK information in determining which data to
   retransmit.'' However, a SACK TCP sender SHOULD still use all SACK
   information made available during the slow start phase of loss
   recovery following an RTO.

   As described in Sections 3 and 4, Update () and MarkRetran () SHOULD
   continue to be used appropriately upon receipt of ACKs and
   retransmissions, respectively. This will allow the slow start
   recovery period to benefit from all available information provided
   by the receiver, despite the fact that SACK information was expunged
   due to the RTO.

   If there are segments missing from the receiver's buffer following
   processing of the retransmitted segment, the corresponding ACK will
   contain SACK information. In this case, a TCP sender SHOULD use
   this SACK information by using the NextSeg () routine to determine
   what data should be sent in each segment of the slow start.

5 Research

   The algorithm specified in this document is analyzed in [FF96],
   which shows that the above algorithm is effective in reducing
   transfer time over standard TCP Reno [RFC2581] when multiple
   segments are dropped from a window of data (especially as the number
   of drops increases). [AHKO97] shows that the algorithm defined in
   this document can greatly improve throughput in connections
   traversing satellite channels.

6 Security Considerations

   The algorithm presented in this document shares security
   considerations with [RFC2581]. A key difference is that an
   algorithm based on SACKs is more robust against attackers forging
   duplicate ACKs to force the TCP sender to reduce cwnd.
   With SACKs, TCP senders have an additional check on whether or not
   a particular ACK is legitimate. While not fool-proof, SACK does
   provide some amount of protection in this area.

Acknowledgments

   The authors wish to thank Sally Floyd for encouraging this document
   and commenting on an early draft. The algorithm described in this
   document is largely based on an algorithm outlined by Kevin Fall and
   Sally Floyd in [FF96], although the authors of this document assume
   responsibility for any mistakes in the above text. Murali Bashyam,
   Kevin Fall, Jamshid Mahdavi, Matt Mathis, Vern Paxson, Venkat
   Venkatsubra, Reiner Ludwig and Shawn Ostermann provided valuable
   feedback on earlier versions of this document. Finally, we thank
   Matt Mathis and Jamshid Mahdavi for implementing the scoreboard in
   ns and hence guiding our thinking in keeping track of SACK state.

References

   [AHKO97]  Mark Allman, Chris Hayes, Hans Kruse, Shawn Ostermann. TCP
             Performance Over Satellite Links. Proceedings of the Fifth
             International Conference on Telecommunications Systems,
             Nashville, TN, March, 1997.

   [All00]   Mark Allman. A Web Server's View of the Transport Layer.
             ACM Computer Communication Review, 30(5), October 2000.

   [FF96]    Kevin Fall and Sally Floyd. Simulation-based Comparisons
             of Tahoe, Reno and SACK TCP. Computer Communication
             Review, July 1996.

   [Jac90]   Van Jacobson. Modified TCP Congestion Avoidance Algorithm.
             Technical Report, LBL, April 1990.

   [PF00]    Jitendra Padhye, Sally Floyd. TBIT, the TCP Behavior
             Inference Tool, October 2000. http://www.aciri.org/tbit/.

   [RFC793]  Jon Postel, Transmission Control Protocol, STD 7, RFC 793,
             September 1981.

   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow. TCP Selective
             Acknowledgment Options. RFC 2018, October 1996.

   [RFC2026] Scott Bradner.
             The Internet Standards Process -- Revision 3, RFC 2026,
             October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens, TCP
             Congestion Control, RFC 2581, April 1999.

   [RFC2582] Sally Floyd and Tom Henderson. The NewReno Modification
             to TCP's Fast Recovery Algorithm, RFC 2582, April 1999.

   [RFC2914] Sally Floyd. Congestion Control Principles, RFC 2914,
             September 2000.

   [RFC3042] Mark Allman, Hari Balakrishnan, Sally Floyd. Enhancing
             TCP's Loss Recovery Using Limited Transmit. RFC 3042,
             January 2001.

Authors' Addresses

   Ethan Blanton
   Ohio University Internetworking Research Lab
   Stocker Center
   Athens, OH 45701
   eblanton@irg.cs.ohiou.edu

   Mark Allman
   BBN Technologies/NASA Glenn Research Center
   Lewis Field
   21000 Brookpark Rd. MS 54-5
   Cleveland, OH 44135
   Phone: 216-433-6586
   Fax: 216-433-8705
   mallman@bbn.com
   http://roland.grc.nasa.gov/~mallman
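
   As a non-normative illustration of the pipe arithmetic in Section 4,
   the Python sketch below works through steps (1)-(3), (B.2) and (D)
   with plain integers. The function names are hypothetical, and the
   sketch assumes that HighData is one past the last octet sent (so
   FlightSize equals HighData - HighACK), that every transmitted
   segment is a full SMSS, and that the receiver's advertised window
   is never the limit. The floor of 2*SMSS on the halved window is
   taken from [RFC2581] rather than from the text of step (3).

```python
def enter_recovery(high_ack, high_data, amount_sacked, smss):
    """Steps (1)-(3): return (pipe, recovery_point, cwnd, ssthresh)."""
    flight_size = high_data - high_ack               # assumed FlightSize
    pipe = high_data - high_ack - amount_sacked      # step (1)
    recovery_point = high_data                       # step (2)
    # Step (3): half of FlightSize; the 2*SMSS floor follows RFC 2581.
    cwnd = max(flight_size // 2, 2 * smss)
    ssthresh = cwnd
    return pipe, recovery_point, cwnd, ssthresh


def on_duplicate_ack(pipe, newly_sacked):
    """Step (B.2): newly SACKed octets are presumed to have left."""
    return pipe - newly_sacked


def send_allowed(pipe, cwnd, smss):
    """Step (D): return (segments_sent, new_pipe) for full-size segments."""
    sent = 0
    if pipe < cwnd:
        while True:
            pipe += smss              # (D.1) send, (D.2) grow pipe
            sent += 1
            if cwnd - pipe <= smss:   # (D.4) loops only if > 1 SMSS
                break
    return sent, pipe


# Ten 1000-octet segments outstanding, three of them SACKed.
pipe, recovery_point, cwnd, ssthresh = enter_recovery(1000, 11000, 3000, 1000)
print(pipe, recovery_point, cwnd, ssthresh)  # 7000 11000 5000 5000
print(send_allowed(pipe, cwnd, 1000))        # (0, 7000): pipe >= cwnd
```

   In this example no data can be sent on entering recovery because
   pipe (7000) exceeds cwnd (5000); transmission resumes only after
   further duplicate ACKs have reduced pipe below cwnd via step (B.2).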