| < draft-ietf-tsvwg-newreno-01.txt | draft-ietf-tsvwg-newreno-02.txt > | |||
|---|---|---|---|---|
| Internet Engineering Task Force S. Floyd | Internet Engineering Task Force S. Floyd | |||
| INTERNET DRAFT ICSI | INTERNET DRAFT ICSI | |||
| draft-ietf-tsvwg-newreno-01.txt T. Henderson | draft-ietf-tsvwg-newreno-02.txt T. Henderson | |||
| Boeing | Boeing | |||
| A. Gurtov | A. Gurtov | |||
| U. Helsinki | U. Helsinki | |||
| September 2003 | November 2003 | |||
| The NewReno Modification to TCP's Fast Recovery Algorithm | The NewReno Modification to TCP's Fast Recovery Algorithm | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 40 ¶ | skipping to change at page 1, line 40 ¶ | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| Abstract | Abstract | |||
| RFC 2581 [RFC2581] documents the following four intertwined TCP | RFC 2581 [RFC2581] documents the following four intertwined TCP | |||
| congestion control algorithms: Slow Start, Congestion Avoidance, Fast | congestion control algorithms: Slow Start, Congestion Avoidance, Fast | |||
| Retransmit, and Fast Recovery. RFC 2581 [RFC2581] explicitly allows | Retransmit, and Fast Recovery. RFC 2581 [RFC2581] explicitly allows | |||
| certain modifications of these algorithms, including modifications | certain modifications of these algorithms, including modifications | |||
| that use the TCP Selective Acknowledgement (SACK) option [RFC2018], | that use the TCP Selective Acknowledgement (SACK) option | |||
| and modifications that respond to "partial acknowledgments" (ACKs | [RFC2018,RFC3517], and modifications that respond to "partial | |||
| which cover new data, but not all the data outstanding when loss was | acknowledgments" (ACKs which cover new data, but not all the data | |||
| detected) in the absence of SACK. The NewReno mechanism uses an | outstanding when loss was detected) in the absence of SACK. The | |||
| algorithm for responding to partial acknowledgments that was first | NewReno mechanism uses an algorithm for responding to partial | |||
| proposed by Janey Hoe in [Hoe95]. | acknowledgments that was first proposed by Janey Hoe in [Hoe95]. | |||
| RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental | RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental | |||
| in 1999. This document is a small revision of RFC 2582 intended to | in 1999. This document is a small revision of RFC 2582 intended to | |||
| advance the NewReno mechanisms to Proposed Standard. RFC 2581 notes | advance the NewReno mechanisms to Proposed Standard. RFC 2581 notes | |||
| that the Fast Retransmit/Fast Recovery algorithm specified in that | that the Fast Retransmit/Fast Recovery algorithm specified in that | |||
| document does not recover very efficiently from multiple losses in a | document does not recover very efficiently from multiple losses in a | |||
| single flight of packets, and that RFC 2582 contains one set of | single flight of packets, and that RFC 2582 contains one set of | |||
| modifications to address this problem. | modifications to address this problem. | |||
| NOTE TO THE RFC EDITOR: PLEASE REMOVE THIS SECTION UPON PUBLICATION. | NOTE TO THE RFC EDITOR: PLEASE REMOVE THIS SECTION UPON PUBLICATION. | |||
| Changes from draft-ietf-tsvwg-newreno-01.txt: | ||||
| * Some improvements in phrasing suggested by Mark Allman. | ||||
| Changes from draft-ietf-tsvwg-newreno-00.txt: | Changes from draft-ietf-tsvwg-newreno-00.txt: | |||
| * In Section 8, added a cautionary note about using the duplicate | * In Section 8, added a cautionary note about using the duplicate | |||
| acknowledgment counter as a flag for whether Fast Recovery is in | acknowledgment counter as a flag for whether Fast Recovery is in | |||
| effect. | effect. | |||
| * In Section 8, added a note about pulling along "recover" with | * In Section 8, added a note about pulling along "recover" with | |||
| "snd_una" when Fast Recovery is not in effect. | "snd_una" when Fast Recovery is not in effect. | |||
| * Added a discussion in Section 6 about heuristics for distinguishing | * Added a discussion in Section 6 about heuristics for distinguishing | |||
| skipping to change at page 6, line 43 ¶ | skipping to change at page 6, line 46 ¶ | |||
| "recover", then the ACK acknowledges all the intermediate | "recover", then the ACK acknowledges all the intermediate | |||
| segments sent between the original transmission of the lost | segments sent between the original transmission of the lost | |||
| segment and the receipt of the third duplicate ACK. Set cwnd to | segment and the receipt of the third duplicate ACK. Set cwnd to | |||
| either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh, | either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh, | |||
| where ssthresh is the value set in step 1; this is termed | where ssthresh is the value set in step 1; this is termed | |||
| "deflating" the window. (We note that "FlightSize" in step 1 | "deflating" the window. (We note that "FlightSize" in step 1 | |||
| referred to the amount of data outstanding in step 1, when Fast | referred to the amount of data outstanding in step 1, when Fast | |||
| Recovery was entered, while "FlightSize" in step 5 refers to the | Recovery was entered, while "FlightSize" in step 5 refers to the | |||
| amount of data outstanding in step 5, when Fast Recovery is | amount of data outstanding in step 5, when Fast Recovery is | |||
| exited.) If the second option is selected, the implementation | exited.) If the second option is selected, the implementation | |||
| should take measures to avoid a possible burst of data, in case | is encouraged to take measures to avoid a possible burst of | |||
| the amount of data outstanding in the network was much less than | data, in case the amount of data outstanding in the network was | |||
| the new congestion window allows. A simple mechanism is to limit | much less than the new congestion window allows. A simple | |||
| the number of data packets that can be sent in response to a | mechanism is to limit the number of data packets that can be | |||
| single acknowledgement. (This is known as "maxburst_" in the NS | sent in response to a single acknowledgement. (This is known | |||
| simulator). Exit the Fast Recovery procedure. | as "maxburst_" in the NS simulator). Exit the Fast Recovery | |||
| procedure. | ||||
| Partial acknowledgements: | Partial acknowledgements: | |||
| If this ACK does *not* acknowledge all of the data up to and | If this ACK does *not* acknowledge all of the data up to and | |||
| including "recover", then this is a partial ACK. In this case, | including "recover", then this is a partial ACK. In this case, | |||
| retransmit the first unacknowledged segment. Deflate the | retransmit the first unacknowledged segment. Deflate the | |||
| congestion window by the amount of new data acknowledged, then | congestion window by the amount of new data acknowledged by the | |||
| add back one SMSS (if the partial ACK acknowledges at least one | cumulative acknowledgement field. If the partial ACK | |||
| SMSS of new data) and send a new segment if permitted by the new | acknowledges at least one SMSS of new data, then add back SMSS | |||
| value of cwnd. This "partial window deflation" attempts to | bytes to the congestion window. As in Step 3, this artificially | |||
| ensure that, when Fast Recovery eventually ends, approximately | inflates the congestion window in order to reflect the additional | |||
| ssthresh amount of data will be outstanding in the network. Do | segment that has left the network. Send a new segment if | |||
| not exit the Fast Recovery procedure (i.e., if any duplicate ACKs | permitted by the new value of cwnd. This "partial window | |||
| subsequently arrive, execute Steps 3 and 4 above). | deflation" attempts to ensure that, when Fast Recovery eventually | |||
| ends, approximately ssthresh amount of data will be outstanding | ||||
| in the network. Do not exit the Fast Recovery procedure (i.e., | ||||
| if any duplicate ACKs subsequently arrive, execute Steps 3 and | ||||
| 4 above). | ||||
| For the first partial ACK that arrives during Fast Recovery, also | For the first partial ACK that arrives during Fast Recovery, also | |||
| reset the retransmit timer. | reset the retransmit timer. Timer management is discussed in | |||
| more detail in Section 4. | ||||
| 6) Retransmit timeouts: | 6) Retransmit timeouts: | |||
| After a retransmit timeout, record the highest sequence number | After a retransmit timeout, record the highest sequence number | |||
| transmitted in the variable "recover" and exit the Fast | transmitted in the variable "recover" and exit the Fast | |||
| Recovery procedure if applicable. | Recovery procedure if applicable. | |||
| Step 1 specifies a check that the Cumulative Acknowledgement field | Step 1 specifies a check that the Cumulative Acknowledgement field | |||
| covers more than "recover". Because the acknowledgement field | covers more than "recover". Because the acknowledgement field | |||
| contains the sequence number that the sender next expects to receive, | contains the sequence number that the sender next expects to receive, | |||
| the acknowledgement "ack_number" covers more than "recover" when: | the acknowledgement "ack_number" covers more than "recover" when: | |||
| ack_number - one > recover. | ack_number - 1 > recover. | |||
| Note that in Step 5, the congestion window is deflated after a | Note that in Step 5, the congestion window is deflated after a | |||
| partial acknowledgement is received. The congestion window was | partial acknowledgement is received. The congestion window was | |||
| likely to have been inflated considerably when the partial | likely to have been inflated considerably when the partial | |||
| acknowledgement was received. In addition, depending on the original | acknowledgement was received. In addition, depending on the original | |||
| pattern of packet losses, the partial acknowledgement might | pattern of packet losses, the partial acknowledgement might | |||
| acknowledge nearly a window of data. In this case, if the congestion | acknowledge nearly a window of data. In this case, if the congestion | |||
| window was not deflated, the data sender might be able to send nearly | window was not deflated, the data sender might be able to send nearly | |||
| a window of data back-to-back. | a window of data back-to-back. | |||
| This document does not specify the sender's response to duplicate | This document does not specify the sender's response to duplicate | |||
| ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked. | ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked. | |||
| This is addressed in other documents, such as those describing the | This is addressed in other documents, such as those describing the | |||
| Limited Transmit procedure [RFC3042]. This document also does not | Limited Transmit procedure [RFC3042]. This document also does not | |||
| address issues of adjusting the duplicate acknowledgement threshold, | address issues of adjusting the duplicate acknowledgement threshold, | |||
| but assumes the threshold of three duplicate acknowledgements | but assumes the threshold specified in the IETF standards; the | |||
| currently specified in RFC 2581. | current standard is RFC 2581, which specifies a threshold of three | |||
| duplicate acknowledgements. | ||||
| As a final note, we would observe that in the absence of the SACK | As a final note, we would observe that in the absence of the SACK | |||
| option, the data sender is working from limited information. When | option, the data sender is working from limited information. When | |||
| the issue of recovery from multiple dropped packets from a single | the issue of recovery from multiple dropped packets from a single | |||
| window of data is of particular importance, the best alternative | window of data is of particular importance, the best alternative | |||
| would be to use the SACK option. | would be to use the SACK option. | |||
| 4. Resetting the Retransmit Timer in Response to Partial | 4. Resetting the Retransmit Timer in Response to Partial | |||
| Acknowledgements. | Acknowledgements. | |||
| skipping to change at page 9, line 47 ¶ | skipping to change at page 10, line 8 ¶ | |||
| SACK, recovering as quickly as slow-start introduces the likelihood | SACK, recovering as quickly as slow-start introduces the likelihood | |||
| of unnecessarily retransmitting packets, and this could significantly | of unnecessarily retransmitting packets, and this could significantly | |||
| complicate the recovery mechanisms. | complicate the recovery mechanisms. | |||
| We note that the response to partial acknowledgements specified in | We note that the response to partial acknowledgements specified in | |||
| Section 3 of this document and in RFC 2582 differs from the response | Section 3 of this document and in RFC 2582 differs from the response | |||
| in [FF96], even though both approaches only retransmit one packet in | in [FF96], even though both approaches only retransmit one packet in | |||
| response to a partial acknowledgement. Step 5 of Section 3 specifies | response to a partial acknowledgement. Step 5 of Section 3 specifies | |||
| that the TCP sender responds to a partial ACK by deflating the | that the TCP sender responds to a partial ACK by deflating the | |||
| congestion window by the amount of new data acknowledged, then adding | congestion window by the amount of new data acknowledged, then adding | |||
| back one SMSS if the partial ACK acknowledges at least one SMSS of | back SMSS bytes if the partial ACK acknowledges at least SMSS bytes | |||
| new data, and sending a new segment if permitted by the new value of | of new data, and sending a new segment if permitted by the new value | |||
| cwnd. Thus, only one previously-sent packet is retransmitted in | of cwnd. Thus, only one previously-sent packet is retransmitted in | |||
| response to each partial acknowledgement, but additional new packets | response to each partial acknowledgement, but additional new packets | |||
| might be transmitted as well, depending on the amount of new data | might be transmitted as well, depending on the amount of new data | |||
| acknowledged by the partial acknowledgement. In contrast, the | acknowledged by the partial acknowledgement. In contrast, the | |||
| variant of NewReno illustrated in [FF96] simply set the congestion | variant of NewReno illustrated in [FF96] simply set the congestion | |||
| window to ssthresh when a partial acknowledgement was received. The | window to ssthresh when a partial acknowledgement was received. The | |||
| approach in [FF96] is more conservative, and does not attempt to | approach in [FF96] is more conservative, and does not attempt to | |||
| accurately track the actual number of outstanding packets after a | accurately track the actual number of outstanding packets after a | |||
| partial acknowledgement is received. While either of these | partial acknowledgement is received. While either of these | |||
| approaches gives acceptable performance, the variant specified in | approaches gives acceptable performance, the variant specified in | |||
| Section 3 recovers more smoothly when multiple packets are dropped | Section 3 recovers more smoothly when multiple packets are dropped | |||
| skipping to change at page 12, line 25 ¶ | skipping to change at page 12, line 34 ¶ | |||
| If the ACK-based heuristic is used, then following the advancement of | If the ACK-based heuristic is used, then following the advancement of | |||
| the cumulative acknowledgement field, the sender stores the value of | the cumulative acknowledgement field, the sender stores the value of | |||
| previous cumulative acknowledgement as prev_highest_ack and stores | previous cumulative acknowledgement as prev_highest_ack and stores | |||
| the latest cumulative ACK as highest_ack. In addition, the following | the latest cumulative ACK as highest_ack. In addition, the following | |||
| step is performed if Step 1 in Section 3 fails, before proceeding to | step is performed if Step 1 in Section 3 fails, before proceeding to | |||
| Step 1B. | Step 1B. | |||
| 1*) If the Cumulative Acknowledgement field didn't cover more than | 1*) If the Cumulative Acknowledgement field didn't cover more than | |||
| "recover", check to see if the congestion window is greater | "recover", check to see if the congestion window is greater | |||
| than one SMSS and the difference between highest_ack and | than SMSS bytes and the difference between highest_ack and | |||
| prev_highest_ack is at most four SMSS. If true, duplicate | prev_highest_ack is at most 4*SMSS bytes. If true, duplicate | |||
| ACKs indicate a lost segment (proceed to Step 1A in Section | ACKs indicate a lost segment (proceed to Step 1A in Section | |||
| 3). Otherwise, duplicate ACKs likely result from unnecessary | 3). Otherwise, duplicate ACKs likely result from unnecessary | |||
| retransmissions (proceed to Step 1B in Section 3). | retransmissions (proceed to Step 1B in Section 3). | |||
| The congestion window check serves to protect against fast retransmit | The congestion window check serves to protect against fast retransmit | |||
| immediately after a retransmit timeout, similar to the | immediately after a retransmit timeout, similar to the | |||
| "exitFastRetrans_" variable in NS. Examples of applying the ACK | "exitFastRetrans_" variable in NS. Examples of applying the ACK | |||
| heuristic are in validation tests "./test-all-newreno | heuristic are in validation tests "./test-all-newreno | |||
| newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in | newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in | |||
| directory "tcl/test" of the NS simulator. | directory "tcl/test" of the NS simulator. | |||
| If several ACKs are lost, the sender can see a jump in the cumulative | If several ACKs are lost, the sender can see a jump in the cumulative | |||
| ACK of more than three segments and the heuristic can fail. A | ACK of more than three segments and the heuristic can fail. A | |||
| validation test for this scenario is "./test-all-newreno | validation test for this scenario is "./test-all-newreno | |||
| newreno_rto_loss_ackf". The ACK heuristic is more likely to fail if | newreno_rto_loss_ackf". RFC 2581 recommends that a receiver should | |||
| the receiver uses delayed ACKs, because then a smaller number of ACK | send duplicate ACKs for every out-of-order data packet, such as a | |||
| losses are needed to produce a sufficient jump in the cumulative ACK. | data packet received during Fast Recovery. The ACK heuristic is more | |||
| likely to fail if the receiver does not follow this advice, because | ||||
| then a smaller number of ACK losses are needed to produce a | ||||
| sufficient jump in the cumulative ACK. | ||||
| 6.2. Timestamp Heuristic. | 6.2. Timestamp Heuristic. | |||
| If this heuristic is used, the sender stores the timestamp of the | If this heuristic is used, the sender stores the timestamp of the | |||
| last acknowledged segment. In addition, the second paragraph of step | last acknowledged segment. In addition, the second paragraph of step | |||
| 1 in Section 3 is replaced as follows: | 1 in Section 3 is replaced as follows: | |||
| 1**) If the Cumulative Acknowledgement field didn't cover more than | 1**) If the Cumulative Acknowledgement field didn't cover more than | |||
| "recover", check to see if the echoed timestamp equals the | "recover", check to see if the echoed timestamp equals the | |||
| stored timestamp. If true, duplicate ACKs indicate a lost | stored timestamp. If true, duplicate ACKs indicate a lost | |||
| skipping to change at page 13, line 29 ¶ | skipping to change at page 13, line 41 ¶ | |||
| 7. Implementation Issues for the Data Receiver | 7. Implementation Issues for the Data Receiver | |||
| [RFC2581] specifies that "Out-of-order data segments SHOULD be | [RFC2581] specifies that "Out-of-order data segments SHOULD be | |||
| acknowledged immediately, in order to accelerate loss recovery." | acknowledged immediately, in order to accelerate loss recovery." | |||
| Neal Cardwell has noted that some data receivers do not send an | Neal Cardwell has noted that some data receivers do not send an | |||
| immediate acknowledgement when they send a partial acknowledgment, | immediate acknowledgement when they send a partial acknowledgment, | |||
| but instead wait first for their delayed acknowledgement timer to | but instead wait first for their delayed acknowledgement timer to | |||
| expire [C98]. As [C98] notes, this severely limits the potential | expire [C98]. As [C98] notes, this severely limits the potential | |||
| benefit from NewReno by delaying the receipt of the partial | benefit from NewReno by delaying the receipt of the partial | |||
| acknowledgement at the data sender. Our recommendation is that the | acknowledgement at the data sender. Echoing RFC 2581, our | |||
| data receiver send an immediate acknowledgement for an out-of-order | recommendation is that the data receiver send an immediate | |||
| segment, even when that out-of-order segment fills a hole in the | acknowledgement for an out-of-order segment, even when that out-of- | |||
| buffer. | order segment fills a hole in the buffer. | |||
| 8. Implementation Issues for the Data Sender | 8. Implementation Issues for the Data Sender | |||
| In Section 3, Step 5 above, it is noted that implementations should | In Section 3, Step 5 above, it is noted that implementations should | |||
| take measures to avoid a possible burst of data when leaving Fast | take measures to avoid a possible burst of data when leaving Fast | |||
| Recovery, in case the amount of new data that the sender is eligible | Recovery, in case the amount of new data that the sender is eligible | |||
| to send due to the new value of the congestion window is large. This | to send due to the new value of the congestion window is large. This | |||
| can arise during NewReno when ACKs are lost or treated as pure window | can arise during NewReno when ACKs are lost or treated as pure window | |||
| updates, thereby causing the sender to underestimate the number of | updates, thereby causing the sender to underestimate the number of | |||
| new segments that can be sent during the recovery procedure. | new segments that can be sent during the recovery procedure. | |||
| skipping to change at page 19, line 31 ¶ | skipping to change at page 19, line 31 ¶ | |||
| "http://www.acm.org/sigcomm/sigcomm97/program.html". | "http://www.acm.org/sigcomm/sigcomm97/program.html". | |||
| [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/". | [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/". | |||
| [PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web | [PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web | |||
| Servers", June 2001, SIGCOMM 2001. | Servers", June 2001, SIGCOMM 2001. | |||
| [RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for | [RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for | |||
| High Performance,", RFC 1323, May 1992. | High Performance,", RFC 1323, May 1992. | |||
| [RFC3517] E. Blanton, M. Allman, K. Fall, and L. Wang, "A | ||||
| Conservative Selective Acknowledgment (SACK)-based Loss Recovery | ||||
| Algorithm for TCP", RFC 3517, April 2003. | ||||
| [RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for | [RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for | |||
| TCP, RFC 3522, April 2003. | TCP, RFC 3522, April 2003. | |||
| 15. Security Considerations | 15. Security Considerations | |||
| RFC 2581 discusses general security considerations concerning TCP | RFC 2581 discusses general security considerations concerning TCP | |||
| congestion control. This document describes a specific algorithm | congestion control. This document describes a specific algorithm | |||
| that conforms with the congestion control requirements of RFC 2581, | that conforms with the congestion control requirements of RFC 2581, | |||
| and so those considerations apply to this algorithm, too. There are | and so those considerations apply to this algorithm, too. There are | |||
| no known additional security concerns for this specific algorithm. | no known additional security concerns for this specific algorithm. | |||
| End of changes. 14 change blocks. | ||||
| 38 lines changed or deleted | 56 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||