Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                                              BBN/NASA GRC
File: draft-allman-tcp-abc-02.txt                         November, 2001
                                                      Expires: May, 2002

         TCP Congestion Control with Appropriate Byte Counting

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document proposes a small modification to the way TCP increases
   its congestion window. Rather than the traditional method of
   increasing the congestion window by a constant amount for each
   arriving acknowledgment, we suggest basing the increase on the
   number of previously unacknowledged bytes each ACK covers. This
   change improves the performance of TCP and closes a security hole
   that TCP receivers can use to induce the sender into increasing the
   sending rate too rapidly.

1 Introduction

   This document proposes a modified algorithm for increasing TCP's
   congestion window (cwnd) that improves performance and security.
   Rather than increasing a TCP's congestion window based on the number
   of acknowledgments (ACKs) that arrive at the data sender, the
   congestion window is increased based on the number of bytes
   acknowledged by the arriving ACKs. The algorithm improves
   performance by mitigating the impact of delayed ACKs on the growth
   of cwnd. At the same time, the algorithm provides more appropriate
   cwnd growth in response to ACKs that cover only small amounts of
   data (less than a full segment size). More appropriate cwnd growth
   can both improve performance and prevent inappropriate cwnd growth
   in response to a misbehaving receiver.

   Much of the language in this document is taken from [RFC2581].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   This document is organized as follows. Section 2 outlines the
   modified algorithm for increasing TCP's congestion window.
   Section 3 discusses the advantages of using the modified algorithm.
   Section 4 discusses the disadvantages of the approach outlined in
   this document. Section 5 outlines some of the fairness issues that
   must be considered for the modified algorithm. Section 6 discusses
   security considerations. Section 7 offers conclusions.

2 A Modified Algorithm for Increasing the Congestion Window

   As originally outlined in [Jac88] and specified in [RFC2581], TCP
   uses two algorithms for increasing the congestion window (cwnd).
   During steady state, TCP uses the Congestion Avoidance algorithm to
   linearly increase the value of cwnd. At the beginning of a
   transfer, after a retransmission timeout or after a long idle period
   (in some implementations), TCP uses the Slow Start algorithm to
   increase cwnd exponentially. According to RFC 2581, slow start bases
   the cwnd increase on the number of incoming acknowledgments. During
   congestion avoidance, RFC 2581 allows more latitude in increasing
   cwnd, but traditionally implementations have based the increase on
   the number of arriving ACKs. In the following two subsections, we
   detail modifications to these algorithms to increase cwnd based on
   the number of bytes being acknowledged by each arriving ACK, rather
   than by the number of ACKs that arrive. We call these changes
   ``Appropriate Byte Counting'' (ABC) [All99].

2.1 Congestion Avoidance

   RFC 2581 specifies that cwnd should be increased by 1 segment per
   round-trip time (RTT) during the congestion avoidance phase of a
   transfer. Traditionally, TCPs have approximated this increase by
   increasing cwnd by 1/cwnd for each arriving ACK. This algorithm
   opens cwnd by roughly 1 segment per RTT if the receiver ACKs each
   incoming segment and no ACK loss occurs.
   However, if the receiver implements delayed ACKs [RFC1122], the
   receiver returns roughly half as many ACKs, which causes the sender
   to open cwnd more conservatively (by approximately 1 segment every
   second RTT). The approach that we suggest is to store the number of
   bytes that have been ACKed in a bytes_acked variable in the TCP
   control block. When bytes_acked becomes greater than or equal to
   the value of the congestion window, bytes_acked is reduced by the
   value of cwnd. Next, cwnd is incremented by a full-sized segment
   (SMSS). The algorithm suggested above is specifically allowed by
   RFC 2581 during congestion avoidance because it opens the window by
   at most 1 segment per RTT.

2.2 Slow Start

   RFC 2581 states that the sender increments the congestion window by
   at most 1*SMSS bytes for each arriving acknowledgment during slow
   start. We propose that a TCP sender SHOULD increase cwnd by the
   number of previously unacknowledged bytes ACKed by each incoming
   acknowledgment, provided the increase is not more than L bytes.
   Choosing the limit on the increase, L, is discussed in the next
   subsection. When the number of previously unacknowledged bytes
   ACKed is less than or equal to 1*SMSS bytes, or L is less than or
   equal to 1*SMSS bytes, this proposal is no more aggressive (and
   possibly less aggressive) than allowed by RFC 2581. However,
   increasing cwnd by more than 1*SMSS bytes in response to a single
   ACK is more aggressive than allowed by RFC 2581. We believe the
   more aggressive version of the slow start algorithm still falls
   under the ``conservation of packets'' principle outlined in [Jac88]
   and is safe for experimentation in shared networks provided an
   appropriate limit is applied (see section 2.3).

2.3 Choosing the Limit

   The limit, L, chosen for the cwnd increase during slow start
   controls the aggressiveness of the algorithm.
   Choosing L=1*SMSS bytes provides behavior that is no more aggressive
   than allowed by RFC 2581. However, ABC with L=1*SMSS bytes is more
   conservative in a number of key ways (as discussed in section 3).
   We therefore believe that ABC SHOULD be used even though TCP stacks
   will see little performance benefit with L=1*SMSS bytes.

   A very large L could potentially lead to large line-rate bursts of
   traffic in the face of a large amount of ACK loss or in the case
   when the receiver sends ``stretch ACKs'' (ACKs for more than the two
   full-sized segments allowed by the delayed ACK algorithm) [Pax97].

   This document specifies that TCP implementations MAY use L=2*SMSS
   bytes and MUST NOT use L > 2*SMSS bytes. This choice balances
   between being conservative (L=1*SMSS bytes) and potentially being
   very aggressive. In addition, L=2*SMSS bytes exactly balances the
   negative impact of the delayed ACK algorithm (as discussed in more
   detail in section 3.2). Note that when L=2*SMSS bytes, cwnd growth
   is roughly the same as the case when the standard algorithms are
   used in conjunction with a receiver that transmits an ACK for each
   incoming segment.

   The exception to the above suggestion is during a slow start phase
   that follows a retransmission timeout (RTO). In this situation, a
   TCP MUST use L=1*SMSS bytes as specified in RFC 2581, since ACKs for
   large amounts of previously unacknowledged data are common during
   this phase of a transfer. These ACKs do not necessarily indicate
   how much data has left the network in the last RTT and therefore ABC
   cannot accurately determine how much to increase cwnd. As an
   example, say segment N is dropped by the network and segments N+1
   and N+2 arrive successfully at the receiver. The sender will
   receive only two duplicate ACKs and therefore must rely on the
   retransmission timer (RTO) to detect the loss.
   When the RTO expires, segment N is retransmitted. The ACK sent in
   response to the retransmission will be for segment N+2. However,
   this ACK does not indicate that three segments have left the network
   in the last RTT, but rather that only a single segment left the
   network. Therefore, the appropriate cwnd increment is at most
   1*SMSS bytes.

2.4 RTO Implications

   [Jac88] shows that increases in cwnd of more than a factor of two in
   succeeding RTTs can cause spurious retransmissions on slow links
   where the bandwidth dominates the RTT, assuming the RTO estimator
   given in [Jac88] and [RFC2988]. ABC stays within this limit of no
   more than doubling cwnd in successive RTTs by capping the increase
   (no matter what L is employed) by the number of previously
   unacknowledged bytes covered by each incoming ACK.

3 Advantages

   This section outlines several advantages of using the ABC algorithm
   to increase cwnd, rather than the standard ACK counting algorithm
   given in [RFC2581].

3.1 More Appropriate Congestion Window Increase

   The ABC algorithm outlined in section 2 increases TCP's cwnd in
   proportion to the amount of data actually sent into the network.
   ACK counting, on the other hand, increments cwnd by a constant upon
   the arrival of each ACK. For instance, consider a telnet connection
   in which ACKs generally cover only a few bytes of data, but cwnd is
   increased by 1*SMSS bytes for each ACK received. When a large
   amount of data needs to be transmitted (e.g., displaying a large
   file), the data is sent in one large burst because cwnd has grown by
   1*SMSS bytes per ACK rather than based on the actual amount of
   capacity used. Such a line-rate burst of data can potentially cause
   a large amount of segment loss.

   Congestion Window Validation (CWV) [RFC2861] also mitigates the
   above problem. CWV limits the amount of unused cwnd a TCP
   connection can accumulate.
   ABC can be used in conjunction with CWV to obtain an accurate
   measure of the network path.

3.2 Mitigate the Impact of Delayed ACKs and Lost ACKs

   Delayed ACKs [RFC1122,RFC2581] allow a TCP receiver to refrain from
   sending an ACK for each incoming segment. However, a receiver
   SHOULD send an ACK for every second full-sized segment that arrives.
   Furthermore, a receiver MUST NOT withhold an ACK for more than 500
   ms. By reducing the number of ACKs sent to the data originator, the
   receiver slows the growth of the congestion window under an ACK
   counting system. Using ABC with L=2*SMSS bytes can roughly negate
   the negative impact imposed by delayed ACKs by allowing cwnd to be
   increased for ACKs that were withheld by the receiver. This allows
   the congestion window to grow in a manner similar to the case when
   the receiver ACKs each incoming segment, but without adding extra
   traffic to the network. Simulation studies have shown increased
   throughput when a TCP sender uses ABC when compared to the standard
   ACK counting algorithm [All99], especially for short transfers that
   never leave the initial slow start period.

   Note that delayed ACKs should not be an issue during slow
   start-based loss recovery, as RFC 2581 recommends that receivers not
   delay ACKs that cover out-of-order segments. Therefore, as
   discussed above, ABC with L > 1*SMSS bytes is inappropriate for such
   slow start-based loss recovery and MUST NOT be used.

3.3 Prevents Attacks from Misbehaving Receivers

   [SCWA99] outlines several methods for a receiver to induce a TCP
   sender into violating congestion control and transmitting data at a
   potentially inappropriate rate. One of the outlined attacks is
   ``ACK Splitting''. This scheme involves the receiver sending
   multiple ACKs for each incoming data segment, each ACKing only a
   small portion of the original TCP data segment.
   Since TCP senders have traditionally used ACK counting to increase
   cwnd, ACK splitting causes inappropriately rapid cwnd growth and, in
   turn, a potentially inappropriate sending rate. A TCP sender that
   uses ABC can prevent this attack from being used to undermine
   standard congestion control because the cwnd increase is based on
   the number of bytes ACKed, rather than the number of ACKs received.

   To prevent misbehaving receivers from inducing inappropriate sender
   behavior we suggest TCP implementations use ABC, even if L=1*SMSS
   bytes (i.e., not allowing ABC to provide more aggressive cwnd growth
   than allowed by RFC 2581).

4 Disadvantages

   The main disadvantages of using ABC with L=2*SMSS bytes are an
   increase in the burstiness of TCP and a small increase in the
   overall loss rate. [All98] discusses the two ways that ABC
   increases the burstiness of the TCP sender. First, the ``micro
   burstiness'' of the connection is increased. In other words, the
   number of segments sent in response to each incoming ACK is
   increased by at most 1 segment when using ABC with L=2*SMSS bytes in
   conjunction with a receiver that is sending delayed ACKs. During
   slow start this translates into an increase from sending 2
   back-to-back segments to sending 3 back-to-back segments in response
   to an ACK for a single segment. Similarly, it is an increase from 3
   segments to 4 segments when receiving a delayed ACK for two
   outstanding segments. Note that ACK loss can cause larger bursts.
   However, ABC only increases the burst size by at most 1*SMSS bytes
   per ACK received when compared to the standard behavior. This
   slight increase in the burstiness should only cause problems for
   devices that have very small buffers. In addition, ABC increases
   the ``macro burstiness'' of the TCP sender in response to delayed
   ACKs.
   Rather than increasing cwnd by roughly 1.5 times per RTT, ABC
   roughly doubles the congestion window every RTT. However, doubling
   cwnd every RTT fits within the spirit of slow start, as originally
   outlined in [Jac88].

   With the increased burstiness comes a modest increase in the loss
   rate for a TCP connection employing ABC (see the next section for a
   short discussion on the fairness of ABC to non-ABC flows). The
   additional loss is directly attributable to the increased
   aggressiveness of ABC. During slow start, cwnd is increased more
   rapidly and therefore when loss occurs cwnd is larger and more drops
   are likely. Similarly, a congestion avoidance cycle takes roughly
   half as long when using ABC and delayed ACKs when compared to an ACK
   counting implementation. In other words, a TCP sender reaches the
   capacity of the network path, drops a packet and reduces the
   congestion window by half roughly twice as often when using ABC.
   However, as discussed above, in spite of the additional loss an ABC
   TCP sender generally obtains better overall performance than a
   non-ABC TCP [All99].

5 Fairness Considerations

   [All99] presents several simulations conducted to measure the impact
   of ABC on competing traffic (both ABC and non-ABC). The experiments
   show that while ABC increases the drop rate for the connection using
   ABC, competing traffic is not greatly affected. The experiments
   show that standard TCP and ABC both obtain roughly the same
   throughput regardless of the variant of the competing traffic. The
   simulations also reaffirm that ABC outperforms non-ABC TCP in an
   environment with varying types of TCP connections. On the other
   hand, the simulations presented in [All99] are not necessarily
   realistic and therefore we are encouraging more experimentation in
   the Internet.

6 Security Considerations

   As discussed in section 3.3, ABC protects a TCP from a misbehaving
   receiver that induces the sender into transmitting at an
   inappropriate rate with an ``ACK splitting'' attack. This, in turn,
   protects the network from an overly aggressive sender.

7 Conclusions

   We RECOMMEND that all TCP stacks be modified to use ABC with
   L=1*SMSS bytes. This change does not increase the aggressiveness of
   TCP. Furthermore, simulations of ABC with L=2*SMSS bytes show a
   promising performance improvement that we encourage researchers to
   experiment with in the Internet.

Acknowledgments

   This draft has benefited from discussions with and encouragement
   from Sally Floyd. Van Jacobson and Reiner Ludwig provided valuable
   input on the implications of byte counting on the RTO.

References

   [All98] Mark Allman. TCP Byte Counting Refinements. ACM Computer
       Communication Review, 29(3), July 1999.

   [All99] Mark Allman. TCP Byte Counting Refinements. ACM Computer
       Communication Review, 29(3), July 1999.

   [Jac88] Van Jacobson. Congestion Avoidance and Control. ACM
       SIGCOMM 1988.

   [Pax97] Vern Paxson. Automated Packet Trace Analysis of TCP
       Implementations. ACM SIGCOMM, September 1997.

   [RFC1122] B. Braden, ed. Requirements for Internet Hosts --
       Communication Layers, October 1989. RFC 1122.

   [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
       Requirement Levels, March 1997. BCP 14, RFC 2119.

   [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP
       Congestion Control, April 1999. RFC 2581.

   [RFC2861] Mark Handley, Jitendra Padhye, Sally Floyd. TCP
       Congestion Window Validation, June 2000. RFC 2861.

   [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission
       Timer, November 2000. RFC 2988.

   [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, Tom
       Anderson. TCP Congestion Control with a Misbehaving Receiver.
       ACM Computer Communication Review, 29(5), October 1999.

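   The window increase rules of sections 2.1 through 2.3 can be
   sketched in a few lines of Python. This is an illustration under
   stated assumptions, not a reference implementation: cwnd, ssthresh,
   bytes_acked, SMSS and L come from the text, while the class name,
   the method names, the after_rto flag, the default values and the
   choice to clear bytes_acked on a timeout are hypothetical. The
   timeout response follows RFC 2581.

```python
from dataclasses import dataclass


@dataclass
class AbcSender:
    """Sketch of sender-side ABC state; all sizes are in bytes.

    The class, method and flag names are hypothetical, as are the
    default values below.
    """
    smss: int = 1460
    cwnd: int = 2 * 1460
    ssthresh: int = 65535
    limit: int = 2 * 1460      # L; the text forbids L > 2*SMSS
    bytes_acked: int = 0       # congestion avoidance accumulator
    after_rto: bool = False    # True during slow start that follows an RTO

    def on_ack(self, newly_acked: int) -> None:
        """Process an ACK covering newly_acked previously unACKed bytes."""
        if newly_acked <= 0:
            return             # duplicate ACK: no window growth in this sketch
        if self.cwnd < self.ssthresh:
            # Slow start: grow by the bytes ACKed, capped at L.
            # After an RTO the cap is 1*SMSS, otherwise at most 2*SMSS.
            cap = self.smss if self.after_rto else min(self.limit, 2 * self.smss)
            self.cwnd += min(newly_acked, cap)
        else:
            # Congestion avoidance: one SMSS per cwnd worth of ACKed bytes.
            self.after_rto = False
            self.bytes_acked += newly_acked
            if self.bytes_acked >= self.cwnd:
                self.bytes_acked -= self.cwnd
                self.cwnd += self.smss

    def on_rto(self, flight_size: int) -> None:
        """RFC 2581 timeout response; slow start then runs with L=1*SMSS."""
        self.ssthresh = max(flight_size // 2, 2 * self.smss)
        self.cwnd = self.smss
        self.bytes_acked = 0   # assumption: the text does not specify this
        self.after_rto = True
```

   With SMSS=1000, cwnd=2000, ssthresh=4000 and L=2000, a stretch ACK
   for 5000 bytes in slow start raises cwnd by only 2000 bytes, and
   after on_rto() an ACK covering three segments raises cwnd by a
   single segment, matching the example in section 2.3.
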
Author's Address

   Mark Allman
   BBN Technologies/NASA Glenn Research Center
   Lewis Field
   21000 Brookpark Rd. MS 54-5
   Cleveland, OH 44135
   Phone: 216-433-6586
   Fax: 216-433-8705
   mallman@bbn.com
   http://roland.grc.nasa.gov/~mallman
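
   The per-RTT growth figures cited in sections 3.2, 3.3 and 4 can be
   checked with a rough calculation. The sketch below is a
   simplification, not a simulation: it assumes one round of slow
   start with eight 1000-byte segments in flight, no loss, and no new
   ACKs triggered within the round. The function name and the three
   receiver models are hypothetical.

```python
def slow_start_cwnd(cwnd, smss, ack_sizes, limit=None):
    """Return cwnd (bytes) after one round of slow start ACKs.

    ack_sizes lists the previously unacknowledged bytes each ACK covers.
    limit=None models ACK counting (+1*SMSS per ACK); otherwise ABC is
    modeled and each ACK adds min(bytes covered, limit).
    """
    for acked in ack_sizes:
        cwnd += smss if limit is None else min(acked, limit)
    return cwnd


SMSS = 1000                   # illustrative segment size
CWND = 8 * SMSS               # eight segments in flight this round

every_segment = [SMSS] * 8    # receiver ACKs each segment
delayed = [2 * SMSS] * 4      # receiver ACKs every second segment
split = [SMSS // 4] * 32      # ACK splitting: four ACKs per segment

results = {
    "ACK counting, ACK per segment": slow_start_cwnd(CWND, SMSS, every_segment),
    "ACK counting, delayed ACKs": slow_start_cwnd(CWND, SMSS, delayed),
    "ABC L=1*SMSS, delayed ACKs": slow_start_cwnd(CWND, SMSS, delayed, SMSS),
    "ABC L=2*SMSS, delayed ACKs": slow_start_cwnd(CWND, SMSS, delayed, 2 * SMSS),
    "ACK counting, ACK splitting": slow_start_cwnd(CWND, SMSS, split),
    "ABC L=2*SMSS, ACK splitting": slow_start_cwnd(CWND, SMSS, split, 2 * SMSS),
}

for name, value in results.items():
    print(f"{name}: cwnd grows {value / CWND:.1f}x")
```

   Under these assumptions, ACK counting with delayed ACKs grows cwnd
   by a factor of 1.5 per round while ABC with L=2*SMSS bytes doubles
   it, and ACK splitting inflates ACK counting to a factor of 5 while
   ABC still only doubles cwnd.
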