idnits 2.17.1

draft-allman-tcp-sack-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 2 instances of lines with control characters in the document.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 98: '...d in this document the scoreboard MUST...'

     RFC 2119 keyword, line 111: '... MUST NOT be removed from TCP's...'

     RFC 2119 keyword, line 116: '...s sent, the scoreboard MUST be updated...'

     RFC 2119 keyword, line 124: '... This routine MUST return the seque...'

     RFC 2119 keyword, line 130: '...number ``S'' exists, this routine MUST...'

     (20 more instances...)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2018' is mentioned on line 112, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. 'AHKO97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'All00'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FF96'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jac90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PF00'

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782)

     Summary: 9 errors (**), 0 flaws (~~), 2 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1   Internet Engineering Task Force                              Mark Allman
2   INTERNET DRAFT                                              NASA GRC/BBN
3   File: draft-allman-tcp-sack-01.txt                         Ethan Blanton
4                                                            Ohio University
5                                                              January, 2001
6                                                        Expires: July, 2001

8          A Conservative SACK-based Loss Recovery Algorithm for TCP

10  Status of this Memo

12     This document is an Internet-Draft and is in full conformance with
13     all provisions of Section 10 of RFC2026.

15     Internet-Drafts are working documents of the Internet Engineering
16     Task Force (IETF), its areas, and its working groups. Note that
17     other groups may also distribute working documents as
18     Internet-Drafts.

20     Internet-Drafts are draft documents valid for a maximum of six
21     months and may be updated, replaced, or obsoleted by other documents
22     at any time. It is inappropriate to use Internet-Drafts as
23     reference material or to cite them other than as "work in progress."

25     The list of current Internet-Drafts can be accessed at
26     http://www.ietf.org/ietf/1id-abstracts.txt

28     The list of Internet-Draft Shadow Directories can be accessed at
29     http://www.ietf.org/shadow.html.

31  Abstract

33     This document presents a conservative loss recovery algorithm for
34     TCP that is based on the use of the selective acknowledgment TCP
35     option. The algorithm presented in this document conforms to the
36     spirit of the current congestion control specification, but allows
37     TCP senders to recover more effectively when multiple segments are
38     lost from a single flight of data.

40  1 Introduction

42     This document presents a conservative loss recovery algorithm for
43     TCP that is based on the use of the selective acknowledgment TCP
44     option. While the TCP selective acknowledgment (SACK) option
45     [RFC2018] is being steadily deployed in the Internet [All00] there
46     is evidence that hosts are not using the SACK information when
47     making retransmission and congestion control decisions [PF00]. The
48     goal of this document is to outline one straightforward method for
49     TCP implementations to use SACK information to increase performance.

51     [RFC2581] allows advanced loss recovery algorithms to be used by TCP
52     [RFC793] provided that they follow the spirit of TCP's congestion
53     control algorithms [RFC2581,RFC2914]. [RFC2582] outlines one such
54     advanced recovery algorithm called NewReno. This document outlines
55     a loss recovery algorithm that uses the selective acknowledgment
56     (SACK) [RFC2018] TCP option to enhance TCP's loss recovery. The
57     algorithm outlined in this document, heavily based on the algorithm
58     detailed in [FF96], is a conservative replacement of the fast
59     recovery algorithm [Jac90,RFC2581]. The algorithm specified in this
60     document is a straightforward SACK-based loss recovery strategy that
61     follows the guidelines set in [RFC2581] and can safely be used in
62     TCP implementations. Alternate SACK-based loss recovery methods can
63     be used in TCP as implementers see fit (as long as the alternate
64     algorithms follow the guidelines provided in [RFC2581]).

66  2 Definitions

68     The reader is expected to be familiar with the definitions given in
69     [RFC2581].

71     For the purposes of explaining the SACK-based loss recovery
72     algorithm we define two variables that a TCP sender stores:

74        ``HighACK'' is the sequence number of the highest cumulative ACK
75        received at a given point.

77        ``HighData'' is the highest sequence number transmitted at a
78        given point.

80     For the purposes of this specification we define a ``duplicate
81     acknowledgment'' as an acknowledgment (ACK) whose cumulative ACK
82     number is equal to the current value of HighACK and also conveys new
83     selective acknowledgment information for segment(s) above HighACK.

85     We define a variable ``DupThresh'' that holds the number of
86     duplicate acknowledgments required to trigger a retransmission. Per
87     [RFC2581] this threshold is defined to be 3 duplicate
88     acknowledgments. However, implementers should consult any updates
89     to [RFC2581] to determine the current value for DupThresh (or method
90     for determining its value).

92  3 Keeping Track of SACK Information

94     For a TCP sender to implement the algorithm defined in the next
95     section it must keep a data structure to store incoming selective
96     acknowledgment information on a per connection basis. Such a data
97     structure is commonly called the ``scoreboard''. For the purposes
98     of the algorithm defined in this document the scoreboard MUST
99     implement the following functions:

101    Update ():

103       Each octet that is cumulatively ACKed or SACKed should be marked
104       accordingly in the scoreboard data structure, and the total
105       number of octets SACKed should be recorded. For each octet that
106       has not been cumulatively acknowledged, a ``DupSACK'' counter is
107       kept indicating how many times an octet of greater sequence
108       number has been SACKed.

110       Note: SACK information is advisory and therefore SACKed data
111       MUST NOT be removed from TCP's retransmission buffer until the
112       data is cumulatively acknowledged [RFC2018].

114    MarkRetran ():

116       When a retransmission is sent, the scoreboard MUST be updated
117       with this information so that data is not repeatedly
118       retransmitted by the SACK-based algorithm outlined in this
119       document. Note: If a retransmission is lost it will be repaired
120       using TCP's retransmission timer.

122    NextSeg ():

124       This routine MUST return the sequence number range of the oldest
125       segment that has not been cumulatively ACKed or SACKed and not
126       been retransmitted, per the following rules:

128       (1) Look for the lowest sequence number that is not ACKed or
129           SACKed, but has a DupSACK counter of at least DupThresh. If
130           such a sequence number ``S'' exists, this routine MUST
131           return a sequence number range starting at octet S.

133       (2) If we fail to find a segment per rule 1, but the connection
134           has unsent data available to be transmitted, NextSeg () MUST
135           return a sequence number range corresponding to one segment
136           of this new data.

138       (3) If rules 1 and 2 fail, this routine MUST return a segment
139           that has not been ACKed or SACKed but may not meet the
140           DupThresh requirement in 1.

142       (4) Finally, if rules 1-3 fail, NextSeg () MUST indicate this
143           and no data will be sent.

145    AmountSACKed ():

147       This routine MUST return the number of octets selectively
148       acknowledged by the receiver.

150    LeftNetwork ():

152       This function MUST return the number of octets in the given
153       sequence number range that have left the network. The algorithm
154       checks each octet in the given range and separately keeps track
155       of the number of retransmitted octets and the number of octets
156       that are cumulatively ACKed but were not SACKed. Note: it is
157       possible to have octets that fit both categories. In this case,
158       the octets MUST be counted in both categories. After checking
159       the sequence number range given this routine returns the sum of
160       the two counters.

162    Note: The SACK-based loss recovery algorithm outlined in this
163    document requires more computational resources than previous TCP
164    loss recovery strategies. However, we believe the scoreboard data
165    structure can be implemented in a reasonably efficient manner (both
166    in terms of computational complexity and memory usage) in most TCP
167    implementations.

169 4 Algorithm Details

171    Upon the receipt of the first DupThresh - 1 duplicate ACKs, the
172    scoreboard MUST be updated per the selective acknowledgment
173    information contained in the ACK (via the Update () routine). Note:
174    The first and second duplicate ACKs can also be used to trigger the
175    transmission of previously unsent segments using the Limited
176    Transmit mechanism [ABF00].

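As an illustration only, the scoreboard routines required in Section 3 might be sketched as follows. This is not part of the specification and makes several assumptions that the text does not state: state is kept per octet in plain Python sets, sequence-number wraparound is ignored, HighACK is treated as the first octet not yet cumulatively ACKed, HighData as one past the last octet sent, and the DupSACK counter is advanced once per ACK that SACKs new data above an octet. The names Scoreboard, update, mark_retran, next_seg, amount_sacked and left_network, and the tiny SMSS value, are hypothetical.

```python
from dataclasses import dataclass, field

DUP_THRESH = 3  # DupThresh, the [RFC2581] value
SMSS = 4        # hypothetical, deliberately tiny segment size in octets


@dataclass
class Scoreboard:
    high_ack: int = 0   # assumption: first octet not yet cumulatively ACKed
    high_data: int = 0  # assumption: one past the last octet transmitted
    sacked: set = field(default_factory=set)     # octets reported in SACK blocks
    retran: set = field(default_factory=set)     # octets already retransmitted
    dup_sack: dict = field(default_factory=dict)  # octet -> DupSACK counter

    def update(self, ack, sack_blocks=()):
        """Update (): record a cumulative ACK and (start, end) SACK blocks,
        end exclusive. Returns the number of newly SACKed octets."""
        self.high_ack = max(self.high_ack, ack)
        new = {o for s, e in sack_blocks for o in range(s, e)
               if o >= self.high_ack and o not in self.sacked}
        self.sacked |= new
        if new:
            # Assumed reading of the DupSACK rule: one count per ACK that
            # SACKs new data above a still-missing octet.
            for o in range(self.high_ack, max(new)):
                if o not in self.sacked:
                    self.dup_sack[o] = self.dup_sack.get(o, 0) + 1
        return len(new)

    def mark_retran(self, start, end):
        """MarkRetran (): remember that [start, end) was retransmitted."""
        self.retran.update(range(start, end))

    def amount_sacked(self):
        """AmountSACKed (): SACKed octets at or above HighACK."""
        return sum(1 for o in self.sacked if o >= self.high_ack)

    def left_network(self, start, end):
        """LeftNetwork (): retransmitted octets plus octets that were never
        SACKed, for a range that is now cumulatively ACKed. An octet in both
        categories is counted twice, as the text requires."""
        retransmitted = sum(1 for o in range(start, end) if o in self.retran)
        not_sacked = sum(1 for o in range(start, end) if o not in self.sacked)
        return retransmitted + not_sacked

    def _eligible(self, octet, need_dup_thresh):
        if octet in self.sacked or octet in self.retran:
            return False
        return not need_dup_thresh or self.dup_sack.get(octet, 0) >= DUP_THRESH

    def _hole(self, need_dup_thresh):
        # Lowest eligible octet, extended to at most one SMSS of eligible data.
        for s in range(self.high_ack, self.high_data):
            if self._eligible(s, need_dup_thresh):
                e = s
                while (e < self.high_data and e - s < SMSS
                       and self._eligible(e, need_dup_thresh)):
                    e += 1
                return (s, e)
        return None

    def next_seg(self, unsent=0):
        """NextSeg (): (start, end) of the next segment, or None."""
        seg = self._hole(True)                   # rule (1)
        if seg is None and unsent > 0:           # rule (2): new data
            seg = (self.high_data, self.high_data + min(SMSS, unsent))
        if seg is None:                          # rule (3): any hole
            seg = self._hole(False)
        return seg                               # rule (4): None, send nothing
```

For example, with octets 0-19 outstanding and three duplicate ACKs whose SACK blocks grow from 4-8 to 4-16, amount_sacked() in this sketch returns 12 and next_seg() selects the range (0, 4) under rule (1). The per-octet sets are chosen for readability; a real implementation would more likely track ranges.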
178    When a TCP sender receives the duplicate ACK corresponding to
179    DupThresh ACKs, the scoreboard MUST be updated with the new SACK
180    information (via Update ()) and a loss recovery phase SHOULD be
181    initiated, per the fast retransmit algorithm outlined in [RFC2581],
182    and the following steps MUST be taken:

184    (1) Set a ``pipe'' variable to the number of outstanding octets
185        (i.e., octets that have been sent but not yet acknowledged), per
186        the following equation:

188            pipe = HighData - HighACK - AmountSACKed ()

190    (2) Set a ``RecoveryPoint'' variable to HighData. When the TCP
191        sender receives a cumulative ACK for this data octet the loss
192        recovery phase is terminated.

194    (3) The congestion window (cwnd) is reduced to half its current
195        value. The value of the slow start threshold (ssthresh) is set
196        to the halved value of cwnd.

198    (4) Retransmit the first data segment not covered by HighACK. Use
199        the MarkRetran () function to mark the sequence number range as
200        having been retransmitted in the scoreboard.

202    Once a TCP is in the loss recovery phase the following procedure
203    MUST be used for each arriving ACK:

205    (A) An incoming cumulative ACK for a sequence number greater than or
206        equal to RecoveryPoint signals the end of loss recovery and the
207        loss recovery phase MUST be terminated.

209    (B) Upon receipt of a duplicate ACK the following actions MUST be
210        taken:

212        (B.1) Use Update () to record the new SACK information conveyed
213              by the incoming ACK.

215        (B.2) The pipe variable is decremented by the number of newly
216              SACKed data octets conveyed in the incoming ACK, as that is
217              the amount of new data that has left the network.

219    (C) When a ``partial ACK'' (an ACK that increases the HighACK point,
220        but does not terminate loss recovery) arrives, the following
221        actions MUST be performed:

223        (C.1) Before updating HighACK based on the received cumulative
224              ACK, save HighACK as OldHighACK.

226        (C.2) The scoreboard MUST be updated based on the cumulative ACK
227              and any new SACK information that is included in the ACK via
228              the Update () routine.

230        (C.3) The value of pipe MUST be decremented by the number of
231              octets returned by the LeftNetwork () routine when given the
232              sequence number range OldHighACK-HighACK.

234    (D) While pipe is less than cwnd and the receiver's advertised
235        window permits, the TCP sender SHOULD transmit one or more
236        segments as follows:

238        (D.1) The scoreboard MUST be queried via NextSeg () for the
239              sequence number range of the next segment to transmit, and
240              the given segment is sent.

242        (D.2) The pipe variable MUST be incremented by the number of
243              data octets sent in (D.1).

245        (D.3) If any of the data octets sent in (D.1) are below HighData,
246              they MUST be marked as retransmitted via MarkRetran ().

248        (D.4) If cwnd - pipe is greater than 1 SMSS, return to (D.1).

250 5 Research

252    The algorithm specified in this document is analyzed in [FF96],
253    which shows that the above algorithm is effective in reducing
254    transfer time over standard TCP Reno [RFC2581] when multiple
255    segments are dropped from a window of data (especially as the number
256    of drops increases). [AHKO97] shows that the algorithm defined in
257    this document can greatly improve throughput in connections
258    traversing satellite channels.

260 6 Security Considerations

262    The algorithm presented in this paper shares security considerations
263    with [RFC2581]. A key difference is that an algorithm based on
264    SACKs is more robust against attackers forging duplicate ACKs to
265    force the TCP sender to reduce cwnd. With SACKs TCP senders have an
266    additional check on whether the ACK is legitimate or not. While not
267    fool-proof, SACK provides some amount of protection in this area.

269 Acknowledgments

271    The authors wish to thank Sally Floyd for encouraging this document
272    and commenting on an early draft. The algorithm described in this
273    document is largely based on an algorithm outlined by Kevin Fall and
274    Sally Floyd in [FF96] (although the authors of this document assume
275    responsibility for any mistakes in the above). Murali Bashyam,
276    Jamshid Mahdavi, Matt Mathis and Vern Paxson provided valuable
277    feedback on earlier versions of this document. Finally, we thank
278    Matt Mathis and Jamshid Mahdavi for implementing the scoreboard in
279    ns and hence guiding our thinking in keeping track of SACK state.

281 References

283    [ABF00] Mark Allman, Hari Balakrishnan, Sally Floyd. Enhancing
284        TCP's Loss Recovery Using Limited Transmit, August
285        2000. Internet-Draft draft-ietf-tsvwg-limited-xmit-00.txt (work
286        in progress).

288    [AHKO97] Mark Allman, Chris Hayes, Hans Kruse, Shawn Ostermann. TCP
289        Performance Over Satellite Links. Proceedings of the Fifth
290        International Conference on Telecommunications Systems,
291        Nashville, TN, March, 1997.

293    [All00] Mark Allman. A Web Server's View of the Transport Layer. ACM
294        Computer Communication Review, 30(5), October 2000.

296    [FF96] Kevin Fall and Sally Floyd. Simulation-based Comparisons of
297        Tahoe, Reno and SACK TCP. Computer Communication Review, July
298        1996.

300    [Jac90] Van Jacobson. Modified TCP Congestion Avoidance Algorithm.
301        Technical Report, LBL, April 1990.

303    [PF00] Jitendra Padhye, Sally Floyd. TBIT, the TCP Behavior
304        Inference Tool, October 2000. http://www.aciri.org/tbit/.

306    [RFC793] Jon Postel, Transmission Control Protocol, STD 7, RFC 793,
307        September 1981.

309    [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens, TCP
310        Congestion Control, RFC 2581, April 1999.

312    [RFC2582] Sally Floyd and Tom Henderson. The NewReno Modification
313        to TCP's Fast Recovery Algorithm, RFC 2582, April 1999.

315    [RFC2914] Sally Floyd. Congestion Control Principles, RFC 2914,
316        September 2000.

318 Authors' Addresses:

320    Mark Allman
321    NASA Glenn Research Center/BBN Technologies
322    Lewis Field
323    21000 Brookpark Rd. MS 54-2
324    Cleveland, OH 44135
325    Phone: 216-433-6586
326    Fax: 216-433-8705
327    mallman@grc.nasa.gov
328    http://roland.grc.nasa.gov/~mallman

330    Ethan Blanton
331    Ohio University Internetworking Research Lab
332    Stocker Center
333    Athens, OH 45701
334    eblanton@cs.ohiou.edu