| < draft-ietf-tcpm-rfc2581bis-04.txt | draft-ietf-tcpm-rfc2581bis-05.txt > | |||
|---|---|---|---|---|
| Network Working Group M. Allman | Network Working Group M. Allman | |||
| Internet-Draft V. Paxson | Internet-Draft V. Paxson | |||
| Expires: October 2008 ICSI | Expires: October 2009 ICSI | |||
| E. Blanton | E. Blanton | |||
| Purdue University | Purdue University | |||
| April 2008 | May 2009 | |||
| TCP Congestion Control | TCP Congestion Control | |||
| draft-ietf-tcpm-rfc2581bis-04.txt | draft-ietf-tcpm-rfc2581bis-05.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | This Internet-Draft is submitted to IETF in full conformance with | |||
| applicable patent or other IPR claims of which he or she is aware | the provisions of BCP 78 and BCP 79. This document may contain | |||
| have been or will be disclosed, and any of which he or she becomes | material from IETF Documents or IETF Contributions published or made | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | publicly available before November 10, 2008. The person(s) | |||
| | controlling the copyright in some of this material may not have | |||
| | granted the IETF Trust the right to allow modifications of such | |||
| | material outside the IETF Standards Process. Without obtaining an | |||
| | adequate license from the person(s) controlling the copyright in | |||
| | such materials, this document may not be modified outside the IETF | |||
| | Standards Process, and derivative works of it may not be created | |||
| | outside the IETF Standards Process, except to format it for | |||
| | publication as an RFC or to translate it into languages other than | |||
| | English. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as | other groups may also distribute working documents as | |||
| Internet-Drafts. | Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
| months and may be updated, replaced, or obsoleted by other documents | months and may be updated, replaced, or obsoleted by other documents | |||
| at any time. It is inappropriate to use Internet-Drafts as | at any time. It is inappropriate to use Internet-Drafts as | |||
| reference material or to cite them other than as "work in progress." | reference material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| | Copyright Statement | |||
| | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
| | document authors. All rights reserved. | |||
| | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| | Provisions Relating to IETF Documents in effect on the date of | |||
| | publication of this document (http://trustee.ietf.org/license-info). | |||
| | Please review these documents carefully, as they describe your | |||
| | rights and restrictions with respect to this document. | |||
| | This document may contain material from IETF Documents or IETF | |||
| | Contributions published or made publicly available before November | |||
| | 10, 2008. The person(s) controlling the copyright in some of this | |||
| | material may not have granted the IETF Trust the right to allow | |||
| | modifications of such material outside the IETF Standards Process. | |||
| | Without obtaining an adequate license from the person(s) controlling | |||
| | the copyright in such materials, this document may not be modified | |||
| | outside the IETF Standards Process, and derivative works of it may | |||
| | not be created outside the IETF Standards Process, except to format | |||
| | it for publication as an RFC or to translate it into languages other | |||
| | than English. | |||
| Abstract | Abstract | |||
| This document defines TCP's four intertwined congestion control | This document defines TCP's four intertwined congestion control | |||
| algorithms: slow start, congestion avoidance, fast retransmit, and | algorithms: slow start, congestion avoidance, fast retransmit, and | |||
| fast recovery. In addition, the document specifies how TCP should | fast recovery. In addition, the document specifies how TCP should | |||
| begin transmission after a relatively long idle period, as well as | begin transmission after a relatively long idle period, as well as | |||
| discussing various acknowledgment generation methods. | discussing various acknowledgment generation methods. | |||
| 1. Introduction | 1. Introduction | |||
| This document specifies four TCP [RFC793] congestion control | This document specifies four TCP [RFC793] congestion control | |||
| algorithms: slow start, congestion avoidance, fast retransmit and | algorithms: slow start, congestion avoidance, fast retransmit and | |||
| fast recovery. These algorithms were devised in [Jac88] and | fast recovery. These algorithms were devised in [Jac88] and | |||
| [Jac90]. Their use with TCP is standardized in [RFC1122]. | [Jac90]. Their use with TCP is standardized in [RFC1122]. | |||
| Additional early work in additive-increase, multiplicative-decrease | Additional early work in additive-increase, multiplicative-decrease | |||
| congestion control is given in [CJ89]. | congestion control is given in [CJ89]. | |||
| This document obsoletes [RFC2581] which in turned obsoleted | Note that [Ste94] provides examples of these algorithms in action | |||
| [RFC2001]. | and [WS95] provides an explanation of the source code for the BSD | |||
| | implementation of these algorithms. | |||
| In addition to specifying the congestion control algorithms, this | In addition to specifying these congestion control algorithms, this | |||
| document specifies what TCP connections should do after a relatively | document specifies what TCP connections should do after a relatively | |||
| long idle period, as well as specifying and clarifying some of the | long idle period, as well as specifying and clarifying some of the | |||
| issues pertaining to TCP ACK generation. | issues pertaining to TCP ACK generation. | |||
| Note that [Ste94] provides examples of these algorithms in action | This document obsoletes [RFC2581], which in turn obsoleted | |||
| and [WS95] provides an explanation of the source code for the BSD | [RFC2001]. | |||
| implementation of these algorithms. | ||||
| This document is organized as follows. Section 2 provides various | This document is organized as follows. Section 2 provides various | |||
| definitions which will be used throughout the document. Section 3 | definitions which will be used throughout the document. Section 3 | |||
| provides a specification of the congestion control | provides a specification of the congestion control | |||
| algorithms. Section 4 outlines concerns related to the congestion | algorithms. Section 4 outlines concerns related to the congestion | |||
| control algorithms and finally, section 5 outlines security | control algorithms and finally, section 5 outlines security | |||
| considerations. | considerations. | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| skipping to change at page 6, line 28 | skipping to change at page 7, line 6 | |||
| Implementation Note: Older implementations have an additional | Implementation Note: Older implementations have an additional | |||
| additive constant on the right-hand side of equation (3). This is | additive constant on the right-hand side of equation (3). This is | |||
| incorrect and can actually lead to diminished performance [RFC2525]. | incorrect and can actually lead to diminished performance [RFC2525]. | |||
| Implementation Note: Some implementations maintain cwnd in units of | Implementation Note: Some implementations maintain cwnd in units of | |||
| bytes, while others in units of full-sized segments. The latter | bytes, while others in units of full-sized segments. The latter | |||
| will find equation (3) difficult to use, and may prefer to use the | will find equation (3) difficult to use, and may prefer to use the | |||
| counting approach discussed in the previous paragraph. | counting approach discussed in the previous paragraph. | |||
| When a TCP sender detects segment loss using the retransmission | When a TCP sender detects segment loss using the retransmission | |||
| timer and the given segment has not yet been retransmitted, the | timer and the given segment has not yet been resent by way of the | |||
| value of ssthresh MUST be set to no more than the value given in | retransmission timer, the value of ssthresh MUST be set to no more | |||
| equation 4: | than the value given in equation 4: | |||
| ssthresh = max (FlightSize / 2, 2*SMSS) (4) | ssthresh = max (FlightSize / 2, 2*SMSS) (4) | |||
| where, as discussed above, FlightSize is the amount of outstanding | where, as discussed above, FlightSize is the amount of outstanding | |||
| data in the network. | data in the network. | |||
| On the other hand, when a TCP sender detects segment loss using the | On the other hand, when a TCP sender detects segment loss using the | |||
| retransmission timer and the given segment has already been | retransmission timer and the given segment has already been | |||
| retransmitted by way of the retransmission timer at least once, the | retransmitted by way of the retransmission timer at least once, the | |||
| value of ssthresh is held constant. | value of ssthresh is held constant. | |||
| skipping to change at page 8, line 33 | skipping to change at page 9, line 11 | |||
| that has left the network. | that has left the network. | |||
| Note: [SCWA99] discusses a receiver-based attack whereby many | Note: [SCWA99] discusses a receiver-based attack whereby many | |||
| bogus duplicate ACKs are sent to the data sender in order to | bogus duplicate ACKs are sent to the data sender in order to | |||
| artificially inflate cwnd and cause a higher than appropriate | artificially inflate cwnd and cause a higher than appropriate | |||
| sending rate to be used. A TCP MAY therefore limit the number | sending rate to be used. A TCP MAY therefore limit the number | |||
| of times cwnd is artificially inflated during loss recovery | of times cwnd is artificially inflated during loss recovery | |||
| to the number of outstanding segments (or, an approximation | to the number of outstanding segments (or, an approximation | |||
| thereof). | thereof). | |||
| | Note: When an advanced loss recovery mechanism (such as outlined | |||
| | in section 4.3) is not in use, this increase in FlightSize can | |||
| | cause equation 4 to slightly inflate cwnd and ssthresh, as some | |||
| | of the segments between SND.UNA and SND.NXT are assumed to have | |||
| | left the network but are still reflected in FlightSize. | |||
| 5. When previously unsent data is available and the new value of | 5. When previously unsent data is available and the new value of | |||
| cwnd and the receiver's advertised window allow, a TCP SHOULD | cwnd and the receiver's advertised window allow, a TCP SHOULD | |||
| send 1*SMSS bytes of previously unsent data. | send 1*SMSS bytes of previously unsent data. | |||
| 6. When the next ACK arrives that acknowledges previously | 6. When the next ACK arrives that acknowledges previously | |||
| unacknowledged data, a TCP MUST set cwnd to ssthresh (the value | unacknowledged data, a TCP MUST set cwnd to ssthresh (the value | |||
| set in step 2). This is termed "deflating" the window. | set in step 2). This is termed "deflating" the window. | |||
| This ACK should be the acknowledgment elicited by the | This ACK should be the acknowledgment elicited by the | |||
| retransmission from step 3, one RTT after the retransmission | retransmission from step 3, one RTT after the retransmission | |||
| skipping to change at page 11, line 19 | skipping to change at page 12, line 4 | |||
| We RECOMMEND that TCP implementers employ some form of advanced loss | We RECOMMEND that TCP implementers employ some form of advanced loss | |||
| recovery that can cope with multiple losses in a window of data. | recovery that can cope with multiple losses in a window of data. | |||
| The algorithms detailed in [RFC3782] and [RFC3517] conform to the | The algorithms detailed in [RFC3782] and [RFC3517] conform to the | |||
| general principles outlined above. We note that while these are not | general principles outlined above. We note that while these are not | |||
| the only two algorithms that conform to the above general principles | the only two algorithms that conform to the above general principles | |||
| these two algorithms have been vetted by the community and are | these two algorithms have been vetted by the community and are | |||
| currently on the standards track. | currently on the standards track. | |||
| 5. Security Considerations | 5. Security Considerations | |||
| This document requires a TCP to diminish its sending rate in the | This document requires a TCP to diminish its sending rate in the | |||
| presence of retransmission timeouts and the arrival of duplicate | presence of retransmission timeouts and the arrival of duplicate | |||
| acknowledgments. An attacker can therefore impair the performance | acknowledgments. An attacker can therefore impair the performance | |||
| of a TCP connection by either causing data packets or their | of a TCP connection by either causing data packets or their | |||
| acknowledgments to be lost, or by forging excessive duplicate | acknowledgments to be lost, or by forging excessive duplicate | |||
| acknowledgments. Causing two congestion control events back-to-back | acknowledgments. | |||
| will often cut ssthresh to its minimum value of 2*SMSS, causing the | ||||
| connection to immediately enter the slower-performing congestion | ||||
| avoidance phase. | ||||
| In response to the ACK division attack outlined in [SCWA99] this | In response to the ACK division attack outlined in [SCWA99] this | |||
| document RECOMMENDS increasing the congestion window based on the | document RECOMMENDS increasing the congestion window based on the | |||
| number of bytes newly acknowledged in each arriving ACK rather than | number of bytes newly acknowledged in each arriving ACK rather than | |||
| by a particular constant on each arriving ACK (as outlined in | by a particular constant on each arriving ACK (as outlined in | |||
| section 3.1). | section 3.1). | |||
| The Internet to a considerable degree relies on the correct | The Internet to a considerable degree relies on the correct | |||
| implementation of these algorithms in order to preserve network | implementation of these algorithms in order to preserve network | |||
| stability and avoid congestion collapse. An attacker could cause | stability and avoid congestion collapse. An attacker could cause | |||
| TCP endpoints to respond more aggressively in the face of congestion | TCP endpoints to respond more aggressively in the face of congestion | |||
| by forging excessive duplicate acknowledgments or excessive | by forging excessive duplicate acknowledgments or excessive | |||
| acknowledgments for new data. Conceivably, such an attack could | acknowledgments for new data. Conceivably, such an attack could | |||
| drive a portion of the network into congestion collapse. | drive a portion of the network into congestion collapse. | |||
| 6. Changes Between RFC 2001 and RFC 2581 | 6. Changes Between RFC 2001 and RFC 2581 | |||
| [RFC2001] has been extensively rewritten editorially and it is not | [RFC2001] was extensively rewritten editorially and it is not | |||
| feasible to itemize the list of changes between [RFC2001] and | feasible to itemize the list of changes between [RFC2001] and | |||
| [RFC2581]. The intention of [RFC2581] is to not change any of the | [RFC2581]. The intention of [RFC2581] was to not change any of the | |||
| recommendations given in [RFC2001], but to further clarify cases | recommendations given in [RFC2001], but to further clarify cases | |||
| that were not discussed in detail in [RFC2001]. Specifically, | that were not discussed in detail in [RFC2001]. Specifically, | |||
| [RFC2581] suggests what TCP connections should do after a relatively | [RFC2581] suggested what TCP connections should do after a | |||
| long idle period, as well as specifying and clarifying some of the | relatively long idle period, as well as specified and clarified | |||
| issues pertaining to TCP ACK generation. Finally, the allowable | some of the issues pertaining to TCP ACK generation. Finally, the | |||
| upper bound for the initial congestion window has also been raised | allowable upper bound for the initial congestion window was raised | |||
| from one to two segments. | from one to two segments. | |||
| 7. Changes Relative to RFC 2581 | 7. Changes Relative to RFC 2581 | |||
| A specific definition for "duplicate acknowledgment" has been | A specific definition for "duplicate acknowledgment" has been | |||
| added, based on the definition used by BSD TCP. | added, based on the definition used by BSD TCP. | |||
| The document now notes that what to do with duplicate ACKs after the | The document now notes that what to do with duplicate ACKs after the | |||
| retransmission timer has fired is future work and explicitly | retransmission timer has fired is future work and explicitly | |||
| unspecified in this document. | unspecified in this document. | |||
| The initial window requirements were changed to allow Larger | The initial window requirements were changed to allow Larger | |||
| Initial Windows as standardized in [RFC3390]. Additionally, the | Initial Windows as standardized in [RFC3390]. Additionally, the | |||
| steps to take when an initial window is discovered to be too large | steps to take when an initial window is discovered to be too large | |||
| skipping to change at page 13, line 55 | skipping to change at page 14, line 37 | |||
| [CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease | [CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease | |||
| Algorithms for Congestion Avoidance in Computer Networks", | Algorithms for Congestion Avoidance in Computer Networks", | |||
| Journal of Computer Networks and ISDN Systems, vol. 17, no. 1, | Journal of Computer Networks and ISDN Systems, vol. 17, no. 1, | |||
| pp. 1-14, June 1989. | pp. 1-14, June 1989. | |||
| [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of | [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of | |||
| Tahoe, Reno and SACK TCP", Computer Communication Review, July | Tahoe, Reno and SACK TCP", Computer Communication Review, July | |||
| 1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z. | 1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z. | |||
| [Flo94] Floyd, S., "TCP and Successive Fast Retransmits. Technical | ||||
| report", October 1994. | ||||
| ftp://ftp.ee.lbl.gov/papers/fastretrans.ps. | ||||
| [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion | [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion | |||
| Control Scheme for TCP", In ACM SIGCOMM, August 1996. | Control Scheme for TCP", In ACM SIGCOMM, August 1996. | |||
| [HTH98] Hughes, A., Touch, J. and J. Heidemann, "Issues in TCP | [HTH98] Hughes, A., Touch, J. and J. Heidemann, "Issues in TCP | |||
| Slow-Start Restart After Idle", Work in Progress. | Slow-Start Restart After Idle", Work in Progress. | |||
| [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer | [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer | |||
| Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. | Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. | |||
| ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. | ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. | |||
| skipping to change at page 16, line 12 | skipping to change at page 16, line 42 | |||
| International Computer Science Institute (ICSI) | International Computer Science Institute (ICSI) | |||
| 1947 Center Street | 1947 Center Street | |||
| Suite 600 | Suite 600 | |||
| Berkeley, CA 94704-1198 | Berkeley, CA 94704-1198 | |||
| Phone: +1 510/642-4274 x302 | Phone: +1 510/642-4274 x302 | |||
| EMail: vern@icir.org | EMail: vern@icir.org | |||
| http://www.icir.org/vern/ | http://www.icir.org/vern/ | |||
| Ethan Blanton | Ethan Blanton | |||
| Purdue University Computer Sciences | Purdue University Computer Sciences | |||
| 1398 Computer Science Building | 305 North University Street | |||
| West Lafayette, IN 47907 | West Lafayette, IN 47907 | |||
| EMail: eblanton@cs.purdue.edu | EMail: eblanton@cs.purdue.edu | |||
| http://www.cs.purdue.edu/homes/eblanton/ | http://www.cs.purdue.edu/homes/eblanton/ | |||
| Intellectual Property Statement | ||||
| The IETF takes no position regarding the validity or scope of any | ||||
| Intellectual Property Rights or other rights that might be claimed | ||||
| to pertain to the implementation or use of the technology described | ||||
| in this document or the extent to which any license under such | ||||
| rights might or might not be available; nor does it represent that | ||||
| it has made any independent effort to identify any such rights. | ||||
| Information on the procedures with respect to rights in RFC | ||||
| documents can be found in BCP 78 and BCP 79. | ||||
| Copies of IPR disclosures made to the IETF Secretariat and any | ||||
| assurances of licenses to be made available, or the result of an | ||||
| attempt made to obtain a general license or permission for the use | ||||
| of such proprietary rights by implementers or users of this | ||||
| specification can be obtained from the IETF on-line IPR repository | ||||
| at http://www.ietf.org/ipr. | ||||
| The IETF invites any interested party to bring to its attention any | ||||
| copyrights, patents or patent applications, or other proprietary | ||||
| rights that may cover technology that may be required to implement | ||||
| this standard. Please address the information to the IETF at | ||||
| ietf-ipr@ietf.org. | ||||
| Disclaimer of Validity | ||||
| This document and the information contained herein are provided | ||||
| on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE | ||||
| REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE | ||||
| IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL | ||||
| WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY | ||||
| WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE | ||||
| ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS | ||||
| FOR A PARTICULAR PURPOSE. | ||||
| Copyright Statement | ||||
| Copyright (C) The IETF Trust (2008). This document is subject to | ||||
| the rights, licenses and restrictions contained in BCP 78, and | ||||
| except as set forth therein, the authors retain all their rights. | ||||
| Acknowledgment | Acknowledgment | |||
| Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is currently provided by the | |||
| Internet Society. | Internet Society. | |||
| End of changes. 19 change blocks. | ||||
| 74 lines changed or deleted | 63 lines changed or added | |||
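
The retransmission-timer rule that both revisions state around equation (4) can be sketched as follows. This is a minimal illustration, not text from either draft: the function name, the parameter names, and the use of integer division for FlightSize / 2 are assumptions. The drafts only require that ssthresh be set to "no more than" the value of equation (4); the sketch returns that upper bound itself.

```python
def ssthresh_after_rto(flight_size, smss, ssthresh, resent_by_rto_before):
    """Return the ssthresh (in bytes) to use after the retransmission timer fires.

    flight_size          -- bytes sent but not yet acknowledged (FlightSize)
    smss                 -- sender maximum segment size in bytes (SMSS)
    ssthresh             -- current slow start threshold in bytes
    resent_by_rto_before -- True if this segment was already resent by way of
                            the retransmission timer at least once
                            (hypothetical flag, tracked by the caller)
    """
    if resent_by_rto_before:
        # Repeated timeout for the same segment: ssthresh is held constant.
        return ssthresh
    # First timeout for this segment: upper bound from equation (4).
    # Integer division is an assumption made for byte-based accounting.
    return max(flight_size // 2, 2 * smss)
```

For example, with SMSS = 1460 and ten full-sized segments in flight, a first timeout gives 7300 bytes; with only 2000 bytes in flight, the 2*SMSS floor applies and the result is 2920 bytes.
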