INTERNET-DRAFT                                        W. Richard Stevens
Expires: August 26, 1996                                   February 1996


                 TCP Slow Start, Congestion Avoidance,
             Fast Retransmit, and Fast Recovery Algorithms

Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups.  Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the internet-drafts Shadow
   Directories on:

        ftp.is.co.za (Africa)
        nic.nordu.net (Europe)
        ds.internic.net (US East Coast)
        ftp.isi.edu (US West Coast)
        munnari.oz.au (Pacific Rim)

Abstract

   Modern implementations of TCP contain four intertwined algorithms
   that have never been fully documented as Internet standards: slow
   start, congestion avoidance, fast retransmit, and fast recovery.
   [2] and [3] provide some details on these algorithms, [4] provides
   examples of the algorithms in action, and [5] provides the source
   code for the 4.4BSD implementation.  RFC 1122 requires that a TCP
   must implement slow start and congestion avoidance (Section 4.2.2.15
   of [1]), citing [2] as the reference, but fast retransmit and fast
   recovery were implemented after RFC 1122.
   The purpose of this Internet Draft is to document these four
   algorithms for the Internet.

Acknowledgments

   Much of this memo is taken from "TCP/IP Illustrated, Volume 1: The
   Protocols" by W. Richard Stevens (Addison-Wesley, 1994) and "TCP/IP
   Illustrated, Volume 2: The Implementation" by Gary R. Wright and W.
   Richard Stevens (Addison-Wesley, 1995).  This material is used with
   the permission of Addison-Wesley.

1. Slow Start

   Old TCPs would start a connection with the sender injecting multiple
   segments into the network, up to the window size advertised by the
   receiver.  While this is OK when the two hosts are on the same LAN,
   if there are routers and slower links between the sender and the
   receiver, problems can arise.  Some intermediate router must queue
   the packets, and it's possible for that router to run out of space.
   [2] shows how this naive approach can reduce the throughput of a TCP
   connection drastically.

   The algorithm to avoid this is called slow start.  It operates by
   observing that the rate at which new packets should be injected into
   the network is the rate at which the acknowledgments are returned by
   the other end.

   Slow start adds another window to the sender's TCP: the congestion
   window, called "cwnd".  When a new connection is established with a
   host on another network, the congestion window is initialized to one
   segment (i.e., the segment size announced by the other end, or the
   default, typically 536 or 512).  Each time an ACK is received, the
   congestion window is increased by one segment.  The sender can
   transmit up to the minimum of the congestion window and the
   advertised window.  The congestion window is flow control imposed by
   the sender, while the advertised window is flow control imposed by
   the receiver.
   The former is based on the sender's assessment of perceived network
   congestion; the latter is related to the amount of available buffer
   space at the receiver for this connection.

   The sender starts by transmitting one segment and waiting for its
   ACK.  When that ACK is received, the congestion window is
   incremented from one to two, and two segments can be sent.  When
   each of those two segments is acknowledged, the congestion window is
   increased to four.  This provides an exponential increase, although
   it is not exactly exponential because the receiver may delay its
   ACKs, typically sending one ACK for every two segments that it
   receives.

   At some point the capacity of the internet can be reached, and an
   intermediate router will start discarding packets.  This tells the
   sender that its congestion window has gotten too large.

   Early implementations performed slow start only if the other end was
   on a different network.  Current implementations always perform slow
   start.

2. Congestion Avoidance

   Congestion can occur when data arrives on a big pipe (a fast LAN)
   and gets sent out a smaller pipe (a slower WAN).  Congestion can
   also occur when multiple input streams arrive at a router whose
   output capacity is less than the sum of the inputs.  Congestion
   avoidance is a way to deal with lost packets.  It is described in
   [2].

   The assumption of the algorithm is that packet loss caused by damage
   is very small (much less than 1%); therefore the loss of a packet
   signals congestion somewhere in the network between the source and
   destination.  There are two indications of packet loss: a timeout
   occurring and the receipt of duplicate ACKs.

   Congestion avoidance and slow start are independent algorithms with
   different objectives.  But when congestion occurs TCP must slow down
   its transmission rate of packets into the network, and then invoke
   slow start to get things going again.
   In practice they are implemented together.

   Congestion avoidance and slow start require that two variables be
   maintained for each connection: a congestion window, cwnd, and a
   slow start threshold size, ssthresh.  The combined algorithm
   operates as follows:

   1. Initialization for a given connection sets cwnd to one segment
      and ssthresh to 65535 bytes.

   2. The TCP output routine never sends more than the minimum of cwnd
      and the receiver's advertised window.

   3. When congestion occurs (indicated by a timeout or the reception
      of duplicate ACKs), one-half of the current window size (the
      minimum of cwnd and the receiver's advertised window, but at
      least two segments) is saved in ssthresh.  Additionally, if the
      congestion is indicated by a timeout, cwnd is set to one segment
      (i.e., slow start).

   4. When new data is acknowledged by the other end, increase cwnd,
      but the way it increases depends on whether TCP is performing
      slow start or congestion avoidance.

      If cwnd is less than or equal to ssthresh, TCP is in slow start;
      otherwise TCP is performing congestion avoidance.  Slow start
      continues until TCP is halfway to where it was when congestion
      occurred (since it recorded half of the window size that caused
      the problem in step 3), and then congestion avoidance takes
      over.

      Slow start has cwnd begin at one segment, and be incremented by
      one segment every time an ACK is received.  As mentioned
      earlier, this opens the window exponentially: send one segment,
      then two, then four, and so on.  Congestion avoidance dictates
      that cwnd be incremented by 1/cwnd (with cwnd counted in
      segments) each time an ACK is received.  This is an additive
      increase, compared to slow start's exponential increase.
      The increase in cwnd should be at most one segment each
      round-trip time (regardless of how many ACKs are received in
      that RTT), whereas slow start increments cwnd by the number of
      ACKs received in a round-trip time.

   Many implementations incorrectly add a small fraction of the segment
   size (typically the segment size divided by 8) during congestion
   avoidance.  This is wrong and should not be emulated in future
   releases.

3. Fast Retransmit

   Modifications to the congestion avoidance algorithm were proposed in
   1990 [3].  Before describing the change, note that TCP may generate
   an immediate acknowledgment (a duplicate ACK) when an out-of-order
   segment is received (Section 4.2.2.21 of [1], with a note that one
   reason for doing so was for the experimental fast-retransmit
   algorithm).  This duplicate ACK should not be delayed.  The purpose
   of this duplicate ACK is to let the other end know that a segment
   was received out of order, and to tell it what sequence number is
   expected.

   Since TCP does not know whether a duplicate ACK is caused by a lost
   segment or just a reordering of segments, it waits for a small
   number of duplicate ACKs to be received.  It is assumed that if
   there is just a reordering of the segments, there will be only one
   or two duplicate ACKs before the reordered segment is processed,
   which will then generate a new ACK.  If three or more duplicate ACKs
   are received in a row, it is a strong indication that a segment has
   been lost.  TCP then performs a retransmission of what appears to be
   the missing segment, without waiting for a retransmission timer to
   expire.  This is the fast retransmit algorithm.

4. Fast Recovery

   After fast retransmit sends what appears to be the missing segment,
   congestion avoidance, but not slow start, is performed.  This is the
   fast recovery algorithm.
   It is an improvement that allows high throughput under moderate
   congestion, especially for large windows.

   The reason for not performing slow start in this case is that the
   receipt of the duplicate ACKs tells TCP more than just that a packet
   has been lost.  Since the receiver can only generate the duplicate
   ACK when another segment is received, that segment has left the
   network and is in the receiver's buffer.  That is, there is still
   data flowing between the two ends, and TCP does not want to reduce
   the flow abruptly by going into slow start.

   The fast retransmit and fast recovery algorithms are usually
   implemented together as follows.

   1. When the third duplicate ACK in a row is received, set ssthresh
      to one-half the current congestion window, cwnd, but no less
      than two segments.  Retransmit the missing segment.  Set cwnd to
      ssthresh plus 3 times the segment size.  This inflates the
      congestion window by the number of segments that have left the
      network and which the other end has cached (3).

   2. Each time another duplicate ACK arrives, increment cwnd by the
      segment size.  This inflates the congestion window for the
      additional segment that has left the network.  Transmit a
      packet, if allowed by the new value of cwnd.

   3. When the next ACK arrives that acknowledges new data, set cwnd
      to ssthresh (the value set in step 1).  This ACK should be the
      acknowledgment of the retransmission from step 1, one round-trip
      time after the retransmission.  Additionally, this ACK should
      acknowledge all the intermediate segments sent between the lost
      packet and the receipt of the first duplicate ACK.  This step is
      congestion avoidance, since TCP is down to one-half the rate it
      was at when the packet was lost.

   The fast retransmit algorithm first appeared in the 4.3BSD Tahoe
   release, but it was incorrectly followed by slow start.
   The fast recovery algorithm appeared in the 4.3BSD Reno release.

5. Security Considerations

   Security considerations are not discussed in this memo.

6. References

   [1]  B. Braden, ed., "Requirements for Internet Hosts --
        Communication Layers," RFC 1122, Oct. 1989.

   [2]  V. Jacobson, "Congestion Avoidance and Control," Computer
        Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
        ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

   [3]  V. Jacobson, "Modified TCP Congestion Avoidance Algorithm,"
        end2end-interest mailing list, April 30, 1990.
        ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.

   [4]  W. R. Stevens, "TCP/IP Illustrated, Volume 1: The Protocols,"
        Addison-Wesley, 1994.

   [5]  G. R. Wright, W. R. Stevens, "TCP/IP Illustrated, Volume 2:
        The Implementation," Addison-Wesley, 1995.

Author's Address:

   W. Richard Stevens
   1202 E. Paseo del Zorro
   Tucson, AZ 85718

   Phone: 520-297-9416

   EMail: rstevens@noao.edu

Expires: August 26, 1996
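
   The combined algorithm of Section 2 and the three steps of Section 4
   can be restated as a short Python sketch.  It is illustrative only
   and is not the 4.4BSD code of [5].  It rests on assumptions that are
   not in the text above: the class name CongestionState and its method
   names are hypothetical; cwnd and ssthresh are kept in bytes, so the
   per-ACK congestion avoidance increase of 1/cwnd segments is written
   as mss*mss/cwnd bytes; the receiver's advertised window is held
   constant; and retransmission itself is left to the caller.

```python
class CongestionState:
    """Hypothetical per-connection sender state; all sizes in bytes."""

    def __init__(self, mss=536, rwnd=65535):
        self.mss = mss            # segment size (assumed constant)
        self.rwnd = rwnd          # advertised window (assumed constant)
        self.cwnd = mss           # Section 2, step 1: one segment
        self.ssthresh = 65535     # Section 2, step 1
        self.dup_acks = 0         # consecutive duplicate ACKs seen
        self.in_fast_recovery = False

    def send_window(self):
        # Section 2, step 2: never send more than min(cwnd, rwnd).
        return min(self.cwnd, self.rwnd)

    def on_timeout(self):
        # Section 2, step 3: save half the current window (at least
        # two segments) in ssthresh, then restart from one segment.
        self.ssthresh = max(self.send_window() // 2, 2 * self.mss)
        self.cwnd = self.mss
        self.dup_acks = 0
        self.in_fast_recovery = False

    def on_new_ack(self):
        # An ACK that acknowledges new data.
        self.dup_acks = 0
        if self.in_fast_recovery:
            # Section 4, step 3: deflate the window to ssthresh.
            self.cwnd = self.ssthresh
            self.in_fast_recovery = False
        elif self.cwnd <= self.ssthresh:
            # Slow start: one segment per ACK.
            self.cwnd += self.mss
        else:
            # Congestion avoidance: about one segment per round trip.
            # The floor of 1 byte is an assumption of this sketch.
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def on_dup_ack(self):
        """Return True when the caller should retransmit the segment."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Section 4, step 1: halve, retransmit, inflate by three.
            self.ssthresh = max(self.cwnd // 2, 2 * self.mss)
            self.cwnd = self.ssthresh + 3 * self.mss
            self.in_fast_recovery = True
            return True
        if self.dup_acks > 3:
            # Section 4, step 2: one more segment has left the network.
            self.cwnd += self.mss
        return False
```

   For example, with a 1000-byte segment size, seven new ACKs open cwnd
   from 1000 to 8000 bytes.  The third duplicate ACK then sets ssthresh
   to 4000 and cwnd to 7000, and the next ACK of new data deflates cwnd
   to 4000.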