idnits 2.17.1 draft-swami-tsvwg-tcp-dclor-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5 on line 368. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 345. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 352. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 358. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 174: '... 
the TCP SACK option [4] is enabled, only then it SHOULD follow the...' RFC 2119 keyword, line 186: '... The TCP sender MUST record the time ...' RFC 2119 keyword, line 195: '...ally, the sender MUST NOT update the S...' RFC 2119 keyword, line 205: '...PTR), the sender SHOULD send one *new*...' RFC 2119 keyword, line 208: '... then the sender MUST retransmit no mo...' (5 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: 3. For each ACK or SACK < SS_PTR (i.e., a SACK block whose left edge is < SS_PTR), the sender SHOULD send one *new* data packet if it is present and if dclor_cntr < cwnd and (rwnd < SND.NXT -SND.UNA). If (rwnd >= SND.NXT - SND.UNA) or if there is no new data to send, then the sender MUST retransmit no more than one packet per RTO from the tail of the retransmission queue regardless of the value of dclor_cntr. Moreover, for each *new* packet sent, dclor_cntr should be incremented by one. For ACK/ SACK < SS_PTR, the sender MUST not initiate any loss recovery algorithm nor should it update cwnd value. Additionally, the SS_THRESH should be left unchanged for all these ACKs. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) 
-- The document date (September 27, 2005) is 6786 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '8' is defined on line 309, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2581 (ref. '1') (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 3517 (ref. '2') (Obsoleted by RFC 6675) ** Obsolete normative reference: RFC 2861 (ref. '5') (Obsoleted by RFC 7661) ** Downref: Normative reference to an Experimental RFC: RFC 3522 (ref. '6') -- No information found for draft-ietf-tsvwg- - is the name correct? -- Possible downref: Normative reference to a draft: ref. '7' ** Obsolete normative reference: RFC 2988 (ref. '8') (Obsoleted by RFC 6298) -- Possible downref: Non-RFC (?) normative reference: ref. '9' Summary: 11 errors (**), 0 flaws (~~), 4 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Y. Swami 3 Internet-Draft K. Le 4 Expires: March 31, 2006 Nokia Research Center, Dallas 5 September 27, 2005 7 Decorrelated Loss Recovery (DCLOR) Using SACK Option for Spurious 8 Timeouts 9 draft-swami-tsvwg-tcp-dclor-06 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. 
Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on March 31, 2006. 36 Copyright Notice 38 Copyright (C) The Internet Society (2005). 40 Abstract 42 A spurious timeout in TCP forces the sender to unnecessarily 43 retransmit one complete congestion window of data into the network. 44 In addition, the congestion state of the network could change 45 substantially after a spurious timeout. In this draft we propose a 46 conservative congestion response algorithm after a spurious timeout 47 that takes network state into account. 49 1. Introduction 51 The response of a TCP sender after a retransmission timeout is 52 governed by the underlying assumption that a mid-stream timeout can 53 occur only if there is heavy congestion--manifested as packet 54 loss--in the network. TCP therefore assumes that a timeout is a 55 sufficient indication to a) recover all the packets in flight, and b) 56 initiate a congestion response (slow start in this case) suited 57 for heavy congestion scenarios. 59 Although the assumption that a timeout can occur only if there is 60 severe congestion is valid for traditional wireline networks, it does 61 not hold for some other types of networks--networks where 62 packets can be stalled "in the network" for a significant duration 63 without being discarded. 
In cellular networks, for example, the link 64 layer can experience a relatively long disruption due to errors, and 65 the link layer protocol can keep all packets buffered as long as the 66 link layer disruption lasts. 68 In this document we present an alternative approach to loss recovery 69 and congestion control that "De-Correlates" Loss Recovery from 70 congestion after a spurious timeout. The algorithm described here follows 71 the congestion control principles of [1], [3], and [5], but unlike the 72 present go-back-N loss recovery algorithm after timeout, DCLOR sends 73 only those segments that were actually lost in the network. 75 2. Terminology 77 The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT," 78 "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," "OPTIONAL," and 79 "silently ignore" in this document are to be interpreted as described 80 in RFC 2119. 82 3. Problem Description 84 Let us assume that a TCP sender has sent N packets, p(1) ... p(N), 85 into the network and is waiting for the ACK of p(1). Due to bad 86 network conditions or some other problem, these packets are 87 excessively delayed at some intermediary node RTR-1. This excessive 88 delay forces the TCP sender to time out and enter slow start. 90 As far as the sender is concerned, a timeout is always interpreted as 91 heavy congestion. The TCP sender therefore makes the assumption that 92 all packets from p(1) to p(N) were lost in the network. To 93 recover from this misconstrued loss, the sender retransmits p1(1) and 94 waits for the ACK a(1) (where px(k) represents the xth 95 retransmission of the packet with sequence number k). 97 After some period of time, when the network conditions at RTR-1 98 improve, the queued packets are finally dispatched to their 99 intended recipient. In response, the TCP receiver generates the ACK 100 a(1). 
When the TCP sender receives a(1), it's fooled into believing 101 that a(1) was generated in response to the retransmitted packet 102 p1(1), while in reality a(1) was generated in response to the 103 originally transmitted packet p(1). When the sender receives a(1), 104 it increases its congestion window to two, and retransmits p1(2) and 105 p1(3). As the sender receives more acknowledgments, it continues 106 with retransmissions and finally starts sending new data. Here we 107 only analyze the congestion control behavior after a spurious 108 timeout. Our scheme can be used in conjunction with the detection 109 schemes in [6] and [9]. 111 To analyze network congestion after a spurious timeout, we compute the 112 worst-case packet loss in the system--assuming that only TCP 113 connections are present. After the timeout (real or spurious), the 114 TCP sender sets its SS_THRESH to N/2. Therefore, for the first N/2 115 ACKs received (i.e., ACKs a(1) to a(N/2)), the TCP sender will grow 116 its congestion window by one per ACK and reach the SS_THRESH value of N/2. 117 For each ACK received, the TCP sender sends 2 packets. Therefore, by 118 the end of the slow start, the TCP sender would have sent 2*(N/2) 119 packets into the network. For the remaining N/2 ACKs (i.e., ACKs 120 a(N/2+1) to a(N)) the TCP sender will remain in the 121 congestion avoidance phase and send one packet for each ACK 122 received--sending N/2 more data segments. The net amount of data 123 sent is therefore 2*(N/2) + N/2 = 3N/2. 125 Please note that all 3N/2 packets are injected into the 126 network within a time period less than or equal to one RTT in most cases. 127 The number of data segments that left the network during this time is 128 only N. Therefore, the packet conservation principle has been 129 compromised, and of the 3N/2 packets injected into the network, N/2 130 packets will be lost with a very high probability. 
These N/2 lost 131 packets, however, need not come from the same connection, and such a 132 data-burst will unnecessarily penalize all the competing TCP 133 connections that share the same bottleneck router. 135 Now let's assume there are M competing TCP connections that share the 136 same bottleneck router(s) with the stalled connection C(0) (the connections are numbered C(0) 137 ... C(M)). During the period of time while C(0) is stalled, the 138 TCP sender does not use its network resources--the buffer space--on 139 the bottleneck router(s). The competing connections, C(1) ... C(M), 140 however, see this lack of activity as resource availability and start 141 growing their windows by at least one segment per RTT during this time 142 period (by virtue of linear window increase during the congestion 143 avoidance phase). For simplicity, we assume that each of 144 these connections has the same round trip time of RTT, and the idle 145 time for C(0) is k*RTT (where k > RTO/RTT). Under these assumptions, 146 each of these competing connections will increase its congestion 147 window by k segments. Therefore the number of packets lost in the 148 network due to slow start following a spurious timeout can be as high 149 as: N/2 + M*k. 151 The Eifel response algorithm [7] solves the problem of N/2 packet 152 loss by restoring the congestion window to its value immediately 153 before the spurious timeout. Based on the above equation, however, 154 we note that the congestion state of the network depends not only 155 upon the old window size, but also upon the duration of the spurious 156 timeout. In our response algorithm, we therefore take the 157 duration of the spurious timeout into account by reducing the data rate 158 by half every RTO. Please note that this scheme works well only when 159 the number of competing connections M does not vary too much while 160 C(0) is stalled. A more conservative response algorithm should 161 reduce the data rate to INIT_WINDOW if M is not bounded. 
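The worst-case estimate N/2 + M*k derived in this section can be reproduced with a short calculation. The Python sketch below is only an illustration of that arithmetic under the same idealized assumptions as the text (every stalled packet is eventually ACKed, all connections share one RTT); the function names and the sample values N = 20, M = 4, k = 3 are hypothetical and do not come from the draft.

```python
def slow_start_burst(n: int) -> int:
    """Packets sent in response to the n ACKs of the stalled flight."""
    # Slow start: two packets per ACK for the first n/2 ACKs.
    slow_start = 2 * (n // 2)
    # Congestion avoidance: one packet per ACK for the remaining ACKs.
    cong_avoid = n - n // 2
    return slow_start + cong_avoid


def worst_case_loss(n: int, m: int, k: int) -> int:
    """Estimate N/2 + M*k: excess burst plus growth of m competitors."""
    # Only n packets left the network, so the rest of the burst is excess.
    excess = slow_start_burst(n) - n
    # Each competitor grew its window by k segments during the k*RTT stall.
    growth = m * k
    return excess + growth


if __name__ == "__main__":
    print(slow_start_burst(20))        # size of the burst, 3N/2
    print(worst_case_loss(20, 4, 3))   # N/2 + M*k
```

With these sample values the burst is 30 packets and the estimated loss is 22 packets, of which 12 are attributable to the window growth of the competing connections rather than to the old window size of C(0).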
163 In addition to the above congestion and packet loss issues, the 164 current response after spurious timeouts is inefficient, in the sense 165 that it unnecessarily retransmits data that is not lost, but simply 166 stalled. Such unnecessary retransmission is an issue when bandwidth 167 resources are at a premium, such as over a cellular link, where spectrum 168 is scarce and expensive. 170 4. DCLOR Response Algorithm 172 A TCP sender should follow [6] or [9] (or any other algorithm) to 173 detect a spurious timeout. If the spurious timeout is confirmed and 174 the TCP SACK option [4] is enabled, only then it SHOULD follow the 175 DCLOR algorithm. 177 The basic idea of this algorithm is that the ACKs received for the 178 stalled packets don't provide sufficient information about the end- 179 to-end congestion state of the network. Therefore, the sender 180 reduces the congestion window by half for every RTO, and waits for the ACK 181 or SACK of a new data packet before increasing its congestion 182 window. Additionally, while the sender is waiting for the ACK/SACK 183 of new data, it's allowed to send cwnd (the updated cwnd) worth of 184 new data into the network. 186 1. The TCP sender MUST record the time when the first timeout took 187 place, and when the first ACK after the timeout was received. 188 Based on these times (or through some other means) it should 189 compute the number of unbacked-off timeouts that must have taken 190 place during this time period. Let's call this number N-RTO. 191 The sender should also keep the highest sequence number of the data 192 packets sent so far in a variable called SS_PTR. The sender 193 should also keep a counter called dclor_cntr, which allows the 194 sender to send new data while it's waiting for the ACK or SACK of 195 SS_PTR. Additionally, the sender MUST NOT update the SS_THRESH 196 value due to spurious timeouts (i.e., the spurious timeout 197 algorithm should leave SS_THRESH values unaltered). 199 2. 
Once the spurious timeout is confirmed, the TCP sender should set 200 cwnd = max(2, pipe-size/2^(N-RTO)), where pipe-size is the 201 number of packets in flight at the time when the spurious timeout was 202 confirmed. Additionally, it should set dclor_cntr = 0. 204 3. For each ACK or SACK < SS_PTR (i.e., a SACK block whose left edge 205 is < SS_PTR), the sender SHOULD send one *new* data packet if it 206 is present and if dclor_cntr < cwnd and (rwnd < SND.NXT - 207 SND.UNA). If (rwnd >= SND.NXT - SND.UNA) or if there is no new 208 data to send, then the sender MUST retransmit no more than one 209 packet per RTO from the tail of the retransmission queue 210 regardless of the value of dclor_cntr. Moreover, for each *new* 211 packet sent, dclor_cntr should be incremented by one. For ACK/ 212 SACK < SS_PTR, the sender MUST NOT initiate any loss recovery 213 algorithm nor should it update the cwnd value. Additionally, the 214 SS_THRESH should be left unchanged for all these ACKs. 216 4. If the sender receives a pure ACK > SS_PTR, it should update cwnd 217 = cwnd+1, and follow normal TCP behavior. (Note that this means 218 that none of the stalled packets were lost, so we don't need to 219 change the SS_THRESH value.) 221 5. If the sender receives a SACK block whose left edge is greater 222 than SS_PTR, then it should traverse the retransmission queue 223 from SND.UNA to the left edge of the SACK block, and mark all 224 unsacked packets as lost. Additionally, it should set cwnd = 225 cwnd + 1 and reset SS_THRESH to 1/2 the pipe-size. Beyond this 226 point, the sender MUST recover lost packets based on [2]. 228 5. Data Delivery To Upper Layers 230 If a TCP sender loses its entire congestion window worth of data, 231 sending new data after timeout prevents a TCP receiver from 232 forwarding the new data to the upper layers immediately. However, 233 once the SACK for this new data is received, the TCP sender will send 234 the first lost segment. 
This essentially means that data delivery to 235 the upper layers could be delayed by at most one RTT when all the 236 packets are lost in the network. 238 This, however, does not affect the throughput of the connection in 239 any way. If a timeout has occurred, then the data delivery to the 240 upper layers has already been excessively delayed. Delaying it by 241 another round trip is not a serious problem. Please note that 242 reliability and timeliness are two conflicting issues and one cannot 243 gain on one without sacrificing something on the other. 245 6. SACK Reneging 247 The TCP SACK information is meant to be advisory, and a TCP receiver 248 is allowed--though strongly discouraged--to discard data blocks the 249 receiver has already SACKed [4]. Please note, however, that even if 250 the TCP receiver discards the data block it received, it MUST still 251 send the SACK block for at least the most recent data received. 252 Therefore, in spite of SACK reneging, DCLOR will work without any 253 deadlocks. 255 A SACK implementation is also allowed not to send a SACK block even 256 though the TCP sender and receiver might have agreed to the SACK- 257 Permitted option at the start of the connection. In these cases, 258 however, if the receiver sends one SACK block, it must send SACK 259 blocks for the rest of the connection. Because of the above- 260 mentioned leniency in implementation, it is possible that a TCP 261 receiver may agree on the SACK-Permitted option, and yet not send any 262 SACK blocks. To make DCLOR robust under these circumstances, DCLOR 263 SHOULD NOT be invoked unless the sender has seen at least one SACK 264 block before the timeout. We, however, believe that once the SACK- 265 Permitted option is accepted, the TCP receiver MUST send a SACK 266 block--even though that block might finally be discarded. Otherwise, 267 the SACK-Permitted option is completely redundant and serves little 268 purpose. 
To the best of our knowledge, almost all SACK 269 implementations send a SACK block if they have accepted the SACK- 270 Permitted option. 272 7. Security Considerations 274 DCLOR does not open TCP to new attacks. 276 8. Acknowledgments 278 We would like to thank Shashikant Maheshwari, Pasi Sarolahti, and 279 Mika Liljeberg for their comments and suggestions on a previous 280 version of this draft. Special thanks to Jani Hirsimaki for 281 thoroughly reviewing the document and providing feedback on the 282 algorithm. 284 9. References 286 [1] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion 287 Control", RFC 2581, April 1999. 289 [2] Blanton, E., Allman, M., Fall, K., and L. Wang, "Conservative 290 SACK-based Loss Recovery Algorithm for TCP", RFC 3517, 291 April 2003. 293 [3] Floyd, S., "Congestion Control Principles", RFC 2914, 294 September 2000. 296 [4] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 297 Selective Acknowledgment Options", RFC 2018, October 1996. 299 [5] Handley, M., Padhye, J., and S. Floyd, "TCP Congestion Window 300 Validation", RFC 2861, June 2000. 302 [6] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm", 303 RFC 3522, April 2003. 305 [7] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for 306 TCP", Internet draft; work in progress, draft-ietf-tsvwg- tcp- 307 eifel-response-05.txt, March 2004. 309 [8] Paxson, V. and M. Allman, "Computing TCP's Retransmission 310 Timer", RFC 2988, November 2000. 312 [9] Sarolahti, P. and M. Kojo, "F-RTO: A TCP RTO Recovery Algorithm 313 for Avoiding Unnecessary Retransmissions", Internet draft; work 314 in progress, July 2004. 
316 Authors' Addresses 318 Yogesh Prem Swami 319 Nokia Research Center, Dallas 320 6000 Connection Drive 321 Irving, TX 75039 322 USA 324 Phone: +1 972 374 0669 325 Email: yogesh.swami@nokia.com 327 Khiem Le 328 Nokia Research Center, Dallas 329 6000 Connection Drive 330 Irving, TX 75039 331 USA 333 Phone: +1 972 894 4882 334 Email: khiem.le@nokia.com 336 Intellectual Property Statement 338 The IETF takes no position regarding the validity or scope of any 339 Intellectual Property Rights or other rights that might be claimed to 340 pertain to the implementation or use of the technology described in 341 this document or the extent to which any license under such rights 342 might or might not be available; nor does it represent that it has 343 made any independent effort to identify any such rights. Information 344 on the procedures with respect to rights in RFC documents can be 345 found in BCP 78 and BCP 79. 347 Copies of IPR disclosures made to the IETF Secretariat and any 348 assurances of licenses to be made available, or the result of an 349 attempt made to obtain a general license or permission for the use of 350 such proprietary rights by implementers or users of this 351 specification can be obtained from the IETF on-line IPR repository at 352 http://www.ietf.org/ipr. 354 The IETF invites any interested party to bring to its attention any 355 copyrights, patents or patent applications, or other proprietary 356 rights that may cover technology that may be required to implement 357 this standard. Please address the information to the IETF at 358 ietf-ipr@ietf.org. 
360 Disclaimer of Validity 362 This document and the information contained herein are provided on an 363 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 364 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 365 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 366 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 367 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 368 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 370 Copyright Statement 372 Copyright (C) The Internet Society (2005). This document is subject 373 to the rights, licenses and restrictions contained in BCP 78, and 374 except as set forth therein, the authors retain all their rights. 376 Acknowledgment 378 Funding for the RFC Editor function is currently provided by the 379 Internet Society.