idnits 2.17.1 draft-swami-tsvwg-tcp-dclor-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1.a on line 18. -- Found old boilerplate from RFC 3978, Section 5.5 on line 367. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 344. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 351. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 357. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission of drafts without verbatim RFC 3978 boilerplate is not accepted. The following non-3978 patterns matched text found in the document. That text should be removed or replaced: This document is an Internet-Draft and is subject to all provisions of Section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. 
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 176: '... the TCP SACK option [4] is enabled, only then it SHOULD follow the...' RFC 2119 keyword, line 188: '... The TCP sender MUST record the time ...' RFC 2119 keyword, line 197: '...ally, the sender MUST NOT update the S...' RFC 2119 keyword, line 205: '...PTR), the sender SHOULD send one *new*...' RFC 2119 keyword, line 208: '... then the sender MUST retransmit no mo...' (5 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: 1. The TCP sender MUST record the time when the first timeout took place, and when the first ACK after the timeout was received. 
Based on these times (or through some other means) it should compute the number of unbacked-off timeouts that must have taken place during this time period. Let's call this number N-RTO. The sender should also keep the highest sequence number of data packet that was sent in a variable called SS_PTR. The sender should also keep a counter called dclor_cntr, which allows the sender to send new data while it's waiting for the ACK or SACK of SS_PTR. Additionally, the sender MUST NOT update the SS_TRHESH value due to spurious timeouts (i.e., the spurious timeout algorithm should leave SS_THRESH values unaltered). 2. Once the Spurious Timeout is confirmed, the TCP sender should set cwnd = max( 2, pipe-size/2^N-RTO). ( where pipe-size is the packets in flight at the time when spurious timeout was confirmed.) Additionally, it should set dclor_cntr = 0. 3. For each ACK or SACK < SS_PTR (i.e., a SACK block whose left edge is < SS_PTR), the sender SHOULD send one *new* data packet if it is present and if dclor_cntr < cwnd and (rwnd < SND.NXT -SND.UNA). If (rwnd >= SND.NXT - SND.UNA) or if there is no new data to send, then the sender MUST retransmit no more than one packet per RTO from the tail of the retransmission queue regardless of the value of dclor_cntr. Moreover, for each *new* packet sent, dclor_cntr should be incremented by one. For ACK/ SACK < SS_PTR, the sender MUST not initiate any loss recovery algorithm nor should it update cwnd value. Additionally, the SS_THRESH should be left unchanged for all these ACKs. 4. If the sender receives a pure ACK > SS_PTR, it should update cwnd = cwnd+1, and follow normal TCP behavior. (Note that this means that none of the stalled packets were lost so we don't need to change SS_THRESH value). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. 
If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 2, 2004) is 7169 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '8' is defined on line 308, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2581 (ref. '1') (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 3517 (ref. '2') (Obsoleted by RFC 6675) ** Obsolete normative reference: RFC 2861 (ref. '5') (Obsoleted by RFC 7661) ** Downref: Normative reference to an Experimental RFC: RFC 3522 (ref. '6') == Outdated reference: A later version (-06) exists of draft-ietf-tsvwg-tcp-eifel-response-05 ** Obsolete normative reference: RFC 2988 (ref. '8') (Obsoleted by RFC 6298) -- Possible downref: Non-RFC (?) normative reference: ref. '9' Summary: 13 errors (**), 0 flaws (~~), 5 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Y. Swami 3 Internet-Draft K. Le 4 Expires: March 3, 2005 Nokia Research Center, Dallas 5 September 2, 2004 7 Decorrelated Loss Recovery (DCLOR) Using SACK Option for Spurious 8 Timeouts 9 draft-swami-tsvwg-tcp-dclor-04 11 Status of this Memo 13 This document is an Internet-Draft and is subject to all provisions 14 of section 3 of RFC 3667. 
By submitting this Internet-Draft, each 15 author represents that any applicable patent or other IPR claims of 16 which he or she is aware have been or will be disclosed, and any of 17 which he or she becomes aware will be disclosed, in accordance with 18 RFC 3668. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as 23 Internet-Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six months 26 and may be updated, replaced, or obsoleted by other documents at any 27 time. It is inappropriate to use Internet-Drafts as reference 28 material or to cite them other than as "work in progress." 30 The list of current Internet-Drafts can be accessed at 31 http://www.ietf.org/ietf/1id-abstracts.txt. 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 This Internet-Draft will expire on March 3, 2005. 38 Copyright Notice 40 Copyright (C) The Internet Society (2004). 42 Abstract 44 A spurious timeout in TCP forces the sender to unnecessarily 45 retransmit one complete congestion window of data into the network. 46 In addition, the congestion state of the network could change 47 substantially after a spurious timeout. In this draft we propose a 48 conservative congestion response algorithm after a spurious timeout 49 that takes network state into account. 51 1. Introduction 53 The response of a TCP sender after a retransmission timeout is 54 governed by the underlying assumption that a mid-stream timeout can 55 occur only if there is heavy congestion--manifested as packet 56 loss--in the network. TCP therefore assumes that a timeout is a 57 sufficient indication to a) recover all the packets in flight and b) 58 initiate a congestion response (slow start in this case) suited 59 for heavy congestion scenarios.
61 Although the assumption that a timeout can occur only if there is 62 severe congestion is valid for traditional wireline networks, it does 63 not hold for some other types of networks--networks where 64 packets can be stalled "in the network" for a significant duration 65 without being discarded. In cellular networks, for example, the link 66 layer can experience a relatively long disruption due to errors, and 67 the link layer protocol can keep all packets buffered as long as the 68 link layer disruption lasts. 70 In this document we present an alternative approach to loss recovery 71 and congestion control that "De-Correlates" Loss Recovery from 72 congestion after a spurious timeout. The algorithm described here follows 73 the congestion control principles of [1], [3], and [5], but unlike the 74 present go-back-N loss recovery algorithm after timeout, DCLOR 75 sends only those segments that were actually lost in the network. 77 2. Terminology 79 The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT," 80 "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," "OPTIONAL," and 81 "silently ignore" in this document are to be interpreted as described 82 in RFC 2119. 84 3. Problem Description 86 Let us assume that a TCP sender has sent N packets, p(1) ... p(N), 87 into the network and it's waiting for the ACK of p(1). Due to bad 88 network conditions or some other problem, these packets are 89 excessively delayed at some intermediary node RTR-1. This excessive 90 delay forces the TCP sender to time out and enter slow start. 92 As far as the sender is concerned, a timeout is always interpreted as 93 heavy congestion. The TCP sender therefore makes the assumption that 94 all packets between p(1) and p(N) were lost in the network. To 95 recover from this misconstrued loss, the sender retransmits p1(1) and 96 waits for the ACK a(1) (where px(k) represents the xth 97 retransmission of the packet with sequence number k).
99 After some time, when the network conditions at RTR-1 100 improve, the queued packets are finally dispatched to their 101 intended recipient. In response, the TCP receiver generates the ACK 102 a(1). When the TCP sender receives a(1), it's fooled into believing 103 that a(1) was generated in response to the retransmitted packet 104 p1(1), while in reality a(1) was generated in response to the 105 originally transmitted packet p(1). When the sender receives a(1), 106 it increases its congestion window to two, and retransmits p1(2) and 107 p1(3). As the sender receives more acknowledgments, it continues 108 with retransmissions and finally starts sending new data. Here we 109 only analyze the congestion control behavior after a spurious 110 timeout. Our scheme can be used in conjunction with the detection 111 schemes in [6] and [9]. 113 To analyze network congestion after a spurious timeout, we compute the 114 worst-case packet loss in the system--assuming only TCP 115 connections to be present. After the timeout (real or spurious), the 116 TCP sender sets its SS_THRESH to N/2. Therefore, for the first N/2 117 ACKs received (i.e., ACKs a(1) to a(N/2)), the TCP sender will grow 118 its congestion window by one per ACK until it reaches the SS_THRESH value of N/2. 119 For each of these ACKs, the TCP sender sends 2 packets. Therefore, by 120 the end of the slow start, the TCP sender would have sent 2*(N/2) = N 121 packets into the network. For the remaining N/2 ACKs (i.e., ACKs 122 a(N/2+1) to a(N)) the TCP sender will remain in the 123 congestion avoidance phase and send one packet for each ACK 124 received--sending N/2 more data segments. The net amount of data 125 sent is therefore N/2 + N = 3N/2. 127 Please note that the entire 3N/2 packets are injected into the 128 network within a time period less than or equal to one RTT in most cases. 129 The number of data segments that left the network during this time is 130 only N.
Therefore, the packet conservation principle has been 131 compromised, and of the 3N/2 packets injected into the network, N/2 132 packets will be lost with a very high probability. These N/2 lost 133 packets, however, need not come from the same connection, and such a 134 data-burst will unnecessarily penalize all the competing TCP 135 connections that share the same bottleneck router. 137 Now let's assume there are M competing TCP connections that share the 138 same bottleneck router(s) with C(0) (the connections are numbered C(0) 139 ... C(M), where C(0) is the stalled connection). During the period of time while C(0) is stalled, the 140 TCP sender does not use its network resources--the buffer space--on 141 the bottleneck router(s). The competing connections, C(1) ... C(M), 142 however see this lack of activity as resource availability and start 143 growing their window by at least one segment per RTT during this time 144 period (by virtue of linear window increase during congestion 145 avoidance phase). For simplicity, we assume that each of 146 these connections has the same round trip time of RTT, and the idle 147 time for C(0) is k*RTT (where k > RTO/RTT). Under these assumptions, 148 each of these competing connections will increase its congestion 149 window by k segments. Therefore the number of packets lost in the 150 network due to slow start following a spurious timeout can be as high 151 as: N/2 + M*k. 153 The Eifel response algorithm [7] solves the problem of N/2 packet 154 loss, by restoring the congestion window to its value immediately 155 before the spurious timeout. Based on the above equation, however, 156 we note that the congestion state of the network not only depends 157 upon the old window size, but also upon the duration of the spurious 158 timeout. In our response algorithm, we therefore take the time 159 duration of the spurious timeout into account by reducing the data rate 160 by half for every RTO.
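The arithmetic above can be checked with a short sketch. This is only an illustration under stated assumptions: N is taken to be even, the sender is assumed to restart slow start with SS_THRESH = N/2 exactly as in the analysis, and the function names are hypothetical and not part of DCLOR.

```python
def burst_after_spurious_timeout(n):
    """Return (packets_injected, excess) while the N delayed ACKs arrive.

    Assumes n is even and SS_THRESH = N/2, as in the analysis above.
    """
    ssthresh = n // 2
    slow_start_sent = 2 * ssthresh      # two packets per ACK for a(1)..a(N/2)
    cong_avoid_sent = n - ssthresh      # one packet per ACK for a(N/2+1)..a(N)
    injected = slow_start_sent + cong_avoid_sent
    excess = injected - n               # only N segments left the network
    return injected, excess


def worst_case_loss(n, m, k):
    """Upper bound N/2 + M*k on packets lost after the spurious timeout."""
    return n // 2 + m * k


print(burst_after_spurious_timeout(20))   # (30, 10)
print(worst_case_loss(20, m=4, k=3))      # 22
```

With N = 20, the sender injects 30 packets while only 20 leave the network, and with 4 competing connections idle for 3 round trips the bound grows to 22 packets.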
Please note that this scheme works well only when 161 the number of competing connections M does not vary too much while 162 C(0) is stalled. A more conservative response algorithm should 163 reduce the data rate to INIT_WINDOW if M is not bounded. 165 In addition to the above congestion and packet loss issues, the 166 current response after spurious timeouts is inefficient, in the sense 167 that it unnecessarily retransmits data that is not lost, but simply 168 stalled. Such unnecessary retransmission is an issue when bandwidth 169 resources are at a premium, such as over a cellular link, where spectrum 170 is scarce and expensive. 172 4. DCLOR Response Algorithm 174 A TCP sender should follow [6] or [9] (or any other algorithm) to 175 detect a spurious timeout. If the spurious timeout is confirmed and 176 the TCP SACK option [4] is enabled, only then it SHOULD follow the 177 DCLOR algorithm. 179 The basic idea of this algorithm is that the ACKs received for the 180 stalled packets don't provide sufficient information about the 181 end-to-end congestion state of the network. Therefore, the sender 182 reduces the congestion window by 1/2 for every RTO, and waits for the ACK 183 or SACK of a new data packet before increasing its congestion 184 window. Additionally, while the sender is waiting for the ACK/SACK 185 of new data, it's allowed to send cwnd (the updated cwnd) worth of 186 new data into the network. 188 1. The TCP sender MUST record the time when the first timeout took 189 place, and when the first ACK after the timeout was received. 190 Based on these times (or through some other means) it should 191 compute the number of unbacked-off timeouts that must have taken 192 place during this time period. Let's call this number N-RTO. 193 The sender should also keep the highest sequence number of any data 194 packet that was sent in a variable called SS_PTR.
The sender 195 should also keep a counter called dclor_cntr, which allows the 196 sender to send new data while it's waiting for the ACK or SACK of 197 SS_PTR. Additionally, the sender MUST NOT update the SS_THRESH 198 value due to spurious timeouts (i.e., the spurious timeout 199 algorithm should leave SS_THRESH values unaltered). 200 2. Once the spurious timeout is confirmed, the TCP sender should set 201 cwnd = max(2, pipe-size / 2^(N-RTO)), where N-RTO is the timeout count from step 1 and pipe-size is the 202 number of packets in flight at the time when the spurious timeout was 203 confirmed. Additionally, it should set dclor_cntr = 0. 204 3. For each ACK or SACK < SS_PTR (i.e., a SACK block whose left edge 205 is < SS_PTR), the sender SHOULD send one *new* data packet if it 206 is present and if dclor_cntr < cwnd and (rwnd < SND.NXT - 207 SND.UNA). If (rwnd >= SND.NXT - SND.UNA) or if there is no new 208 data to send, then the sender MUST retransmit no more than one 209 packet per RTO from the tail of the retransmission queue 210 regardless of the value of dclor_cntr. Moreover, for each *new* 211 packet sent, dclor_cntr should be incremented by one. For 212 ACK/SACK < SS_PTR, the sender MUST NOT initiate any loss recovery 213 algorithm nor should it update the cwnd value. Additionally, 214 SS_THRESH should be left unchanged for all these ACKs. 215 4. If the sender receives a pure ACK > SS_PTR, it should update cwnd 216 = cwnd + 1, and follow normal TCP behavior. (Note that this means 217 that none of the stalled packets were lost, so we don't need to 218 change the SS_THRESH value.) 220 5. If the sender receives a SACK block whose left edge is greater 221 than SS_PTR, then it should traverse the retransmission queue 222 from SND.UNA to the left edge of the SACK block, and mark all 223 unsacked packets as lost. Additionally, it should set cwnd = 224 cwnd + 1 and reset SS_THRESH to 1/2 the pipe-size. Beyond this 225 point, the sender MUST recover lost packets based on [2]. 227 5.
Data Delivery To Upper Layers 229 If a TCP sender loses its entire congestion window's worth of data, 230 sending new data after the timeout prevents a TCP receiver from 231 forwarding the new data to the upper layers immediately. However, 232 once the SACK for this new data is received, the TCP sender will send 233 the first lost segment. This essentially means that data delivery to 234 the upper layers could be delayed by at most one RTT when all the 235 packets are lost in the network. 237 This, however, does not affect the throughput of the connection in 238 any way. If a timeout has occurred, then the data delivery to the 239 upper layers has already been excessively delayed. Delaying it by 240 another round trip is not a serious problem. Please note that 241 reliability and timeliness are conflicting goals, and one cannot 242 gain on one without sacrificing something on the other. 244 6. SACK Reneging 246 The TCP SACK information is meant to be advisory, and a TCP receiver 247 is allowed--though strongly discouraged--to discard data blocks the 248 receiver has already SACKed [4]. Please note, however, that even if 249 the TCP receiver discards the data block it received, it MUST still 250 send the SACK block for at least the most recent data received. 251 Therefore, in spite of SACK reneging, DCLOR will work without any 252 deadlocks. 254 A SACK implementation is also allowed not to send a SACK block even 255 though the TCP sender and receiver might have agreed to the 256 SACK-Permitted option at the start of the connection. In these cases, 257 however, if the receiver sends one SACK block, it must send SACK 258 blocks for the rest of the connection. Because of this 259 leniency in implementation, it's possible that a TCP 260 receiver may agree on the SACK-Permitted option, and yet not send any 261 SACK blocks.
To make DCLOR robust under these circumstances, DCLOR 262 SHOULD NOT be invoked unless the sender has seen at least one SACK 263 block before the timeout. We, however, believe that once the 264 SACK-Permitted option is accepted, the TCP receiver MUST send a SACK 265 block--even though that block might finally be discarded. Otherwise, 266 the SACK-Permitted option is completely redundant and serves little 267 purpose. To the best of our knowledge, almost all SACK 268 implementations send a SACK block if they have accepted the 269 SACK-Permitted option. 271 7. Security Considerations 273 DCLOR does not open TCP to new attacks. 275 8. Acknowledgments 277 We would like to thank Shashikant Maheshwari, Pasi Sarolahti, and 278 Mika Liljeberg for their comments and suggestions on a previous 279 version of this draft. Special thanks to Jani Hirsimaki for 280 thoroughly reviewing the document and providing feedback on the 281 algorithm. 283 9. References 285 [1] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion Control", 286 RFC 2581, April 1999. 288 [2] Blanton, E., Allman, M., Fall, K. and L. Wang, "Conservative 289 SACK-based Loss Recovery Algorithm for TCP", RFC 3517, April 290 2003. 292 [3] Floyd, S., "Congestion Control Principles", RFC 2914, September 293 2000. 295 [4] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP 296 Selective Acknowledgment Options", RFC 2018, October 1996. 298 [5] Handley, M., Padhye, J. and S. Floyd, "TCP Congestion Window 299 Validation", RFC 2861, June 2000. 301 [6] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm", RFC 302 3522, April 2003. 304 [7] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for 305 TCP.", Internet draft; work in progress, 306 draft-ietf-tsvwg-tcp-eifel-response-05.txt, March 2004. 308 [8] Paxson, V. and M. Allman, "Computing TCP's Retransmission 309 Timer", RFC 2988, November 2000. 311 [9] Sarolahti, P. and M.
Kojo, "F-RTO: A TCP RTO Recovery Algorithm 312 for Avoiding Unnecessary Retransmissions.", Internet draft; work 313 in progress, July 2004. 315 Authors' Addresses 317 Yogesh Prem Swami 318 Nokia Research Center, Dallas 319 6000 Connection Drive 320 Irving, TX 75039 321 USA 323 Phone: +1 972 374 0669 324 EMail: yogesh.swami@nokia.com 326 Khiem Le 327 Nokia Research Center, Dallas 328 6000 Connection Drive 329 Irving, TX 75039 330 USA 332 Phone: +1 972 894 4882 333 EMail: khiem.le@nokia.com 335 Intellectual Property Statement 337 The IETF takes no position regarding the validity or scope of any 338 Intellectual Property Rights or other rights that might be claimed to 339 pertain to the implementation or use of the technology described in 340 this document or the extent to which any license under such rights 341 might or might not be available; nor does it represent that it has 342 made any independent effort to identify any such rights. Information 343 on the procedures with respect to rights in RFC documents can be 344 found in BCP 78 and BCP 79. 346 Copies of IPR disclosures made to the IETF Secretariat and any 347 assurances of licenses to be made available, or the result of an 348 attempt made to obtain a general license or permission for the use of 349 such proprietary rights by implementers or users of this 350 specification can be obtained from the IETF on-line IPR repository at 351 http://www.ietf.org/ipr. 353 The IETF invites any interested party to bring to its attention any 354 copyrights, patents or patent applications, or other proprietary 355 rights that may cover technology that may be required to implement 356 this standard. Please address the information to the IETF at 357 ietf-ipr@ietf.org. 
359 Disclaimer of Validity 361 This document and the information contained herein are provided on an 362 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 363 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 364 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 365 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 366 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 367 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 369 Copyright Statement 371 Copyright (C) The Internet Society (2004). This document is subject 372 to the rights, licenses and restrictions contained in BCP 78, and 373 except as set forth therein, the authors retain all their rights. 375 Acknowledgment 377 Funding for the RFC Editor function is currently provided by the 378 Internet Society.
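The response steps of Section 4 can be sketched as a small piece of sender state. This is a minimal illustration, not a definitive implementation. It makes several assumptions that are not in the draft: sequence numbers are counted in whole packets rather than bytes, the rwnd test of step 3 is computed by the caller and passed in as a flag, the window division is an integer division, a sender that has new data but has used up dclor_cntr simply sends nothing, and all names (DclorState, start_dclor, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DclorState:
    cwnd: int            # congestion window, in packets
    ss_thresh: int       # SS_THRESH, left untouched by the spurious timeout
    ss_ptr: int          # highest packet number sent before the timeout
    dclor_cntr: int = 0  # new packets sent while waiting for SS_PTR


def start_dclor(pipe_size, n_rto, ss_thresh, ss_ptr):
    """Step 2: halve the window once per elapsed RTO, with a floor of 2."""
    cwnd = max(2, pipe_size // 2 ** n_rto)   # integer division is an assumption
    return DclorState(cwnd=cwnd, ss_thresh=ss_thresh, ss_ptr=ss_ptr)


def on_ack_below_ss_ptr(state, has_new_data, window_allows_new_data):
    """Step 3: choose an action; cwnd and SS_THRESH are never changed here."""
    if not has_new_data or not window_allows_new_data:
        return "retransmit_tail"   # caller limits this to one packet per RTO
    if state.dclor_cntr < state.cwnd:
        state.dclor_cntr += 1
        return "send_new"
    return "wait"                  # assumption: counter used up, send nothing


def on_pure_ack_above_ss_ptr(state):
    """Step 4: no stalled packet was lost, so only cwnd grows."""
    state.cwnd += 1


def on_sack_above_ss_ptr(state, pipe_size, snd_una, left_edge, sacked):
    """Step 5: mark unsacked packets in [snd_una, left_edge) as lost."""
    lost = [seq for seq in range(snd_una, left_edge) if seq not in sacked]
    state.cwnd += 1
    state.ss_thresh = pipe_size // 2
    return lost                    # to be recovered as described in [2]


state = start_dclor(pipe_size=16, n_rto=2, ss_thresh=8, ss_ptr=100)
actions = [on_ack_below_ss_ptr(state, True, True) for _ in range(5)]
print(state.cwnd, actions)
```

With a pipe-size of 16 and two elapsed RTOs the window starts at 4, so the first four ACKs below SS_PTR each release one new packet and the fifth releases none, while cwnd and SS_THRESH stay unchanged until an ACK or SACK above SS_PTR arrives.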