| draft-floyd-incr-init-win-01.txt | draft-floyd-incr-init-win-02.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force Mark Allman | TCP Implementation Working Group M. Allman | |||
| INTERNET DRAFT NASA Lewis/Sterling Software | INTERNET DRAFT NASA Lewis/Sterling Software | |||
| File: draft-floyd-incr-init-win-01.txt Sally Floyd | File: draft-floyd-incr-init-win-02.txt S. Floyd | |||
| LBL | LBNL | |||
| Craig Partridge | C. Partridge | |||
| BBN Technologies | BBN Technologies | |||
| March, 1998 | April, 1998 | |||
| Expires: September, 1998 | ||||
| Increasing TCP's Initial Window | Increasing TCP's Initial Window | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft. Internet-Drafts are working | This document is an Internet-Draft. Internet-Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
| and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
| working documents as Internet-Drafts. | working documents as Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
| months and may be updated, replaced, or obsoleted by other documents | months and may be updated, replaced, or obsoleted by other documents | |||
| at any time. It is inappropriate to use Internet-Drafts as | at any time. It is inappropriate to use Internet-Drafts as | |||
| reference material or to cite them other than as ``work in | reference material or to cite them other than as ``work in | |||
| progress.'' | progress.'' | |||
| To learn the current status of any Internet-Draft, please check the | To view the entire list of current Internet-Drafts, please check | |||
| ``1id-abstracts.txt'' listing contained in the Internet-Drafts | the "1id-abstracts.txt" listing contained in the Internet-Drafts | |||
| Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), | Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net | |||
| munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or | (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au | |||
| ftp.isi.edu (US West Coast). | (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu | |||
|  | (US West Coast). | |||
| Abstract | Abstract | |||
| This is a note to suggest changing the permitted initial window in | This document specifies an increase in the permitted initial window | |||
| TCP from 1 segment to roughly 4K bytes. This draft considers the | for TCP from one segment to roughly 4K bytes. This document | |||
| advantages and disadvantages of such a change, as well as outlining | discusses the advantages and disadvantages of such a change, | |||
| some experimental results that indicate the costs and benefits of | outlining experimental results that indicate the costs and benefits | |||
| making such a change to TCP, and pointing out remaining research | of such a change to TCP. | |||
| questions. | ||||
| Terminology | ||||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL | ||||
| NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | ||||
| "OPTIONAL" in this document are to be interpreted as described in | ||||
| RFC 2119 [RFC2119]. | ||||
| 1. TCP Modification | 1. TCP Modification | |||
| This draft suggests allowing the initial window used by a TCP | This document specifies an increase in the permitted upper bound | |||
| connection to increase from 1 segment to between 2 and 4 segments. | for TCP's initial window from one segment to between two | |||
| In most cases, this will result in an initial window of roughly 4K | and four segments. In most cases, this change results in an upper | |||
| bytes (although given a large segment size, the initial window could | bound on the initial window of roughly 4K bytes (although given a | |||
| be significantly larger than 4K bytes). The proposed initial window | large segment size, the permitted initial window of two segments | |||
| size is given in (1): | could be significantly larger than 4K bytes). The upper bound for | |||
|  | the initial window is given more precisely in (1): | |||
| min (4*MSS, max (2*MSS, 4380 bytes)) (1) | min (4*MSS, max (2*MSS, 4380 bytes)) (1) | |||
| Or, more specifically the initial window size is based on the | Equivalently, the upper bound for the initial window size | |||
| maximum segment size (MSS), as follows: | is based on the maximum segment size (MSS), as follows: | |||
| MSS <= 1095 bytes: | If (MSS <= 1095 bytes) | |||
| win = 4 * MSS | then win <= 4 * MSS; | |||
| 1095 bytes < MSS < 2190 bytes: | If (1095 bytes < MSS < 2190 bytes) | |||
| win = 4380 | then win <= 4380; | |||
| MSS >= 2190 bytes: | If (2190 bytes <= MSS) | |||
| win = 2 * MSS | then win <= 2 * MSS; | |||
| This increased initial window would be optional: a TCP MAY | This increased initial window is optional: a TCP MAY | |||
| start with a larger initial window (a MAY, not a SHOULD). | start with a larger initial window (a MAY, not a SHOULD). | |||
| This change would only apply to the initial window of the | This upper bound for the initial window size represents a change | |||
| connection, in the first round trip time (RTT) of transmission | from RFC 2001 [S97], which specifies that the congestion window be | |||
| following the TCP three-way handshake. That is, the SYN/ACK in the | initialized to one segment. If implementation experience proves | |||
| three way handshake should not increase the initial window size | successful, then the intent is for this change to be incorporated | |||
| above that outlined in equation (1). However, if the SYN or SYN/ACK | into a revision to RFC 2001. | |||
| is lost the initial window used after a correctly transmitted SYN | ||||
| MUST be 1 segment. | ||||
| Some TCP implementations use slow start to re-start transmission | This change applies to the initial window of the connection in the | |||
| after a long idle period. In this case, the initial window used | first round trip time (RTT) of transmission following the TCP | |||
| should be the same as the initial window used at the beginning of | three-way handshake. Neither the SYN/ACK nor its acknowledgment | |||
| the transfer. The change proposed in this document would not change | (ACK) in the three-way handshake should increase the initial window | |||
| the behavior after a retransmit timeout, when the sender would | size above that outlined in equation (1). If the SYN or SYN/ACK is | |||
| continue to slow start from an initial window of one segment. | lost, the initial window used by a sender after a correctly | |||
|  | transmitted SYN MUST be one segment. | |||
| 2. Advantages of Larger Initial Windows | TCP implementations use slow start in as many as three different ways: | |||
|  | (1) to start a new connection (the initial window); (2) to restart | |||
|  | a transmission after a long idle period (the restart window); and | |||
|  | (3) to restart after a retransmit timeout (the loss window). The | |||
|  | change proposed in this document affects the value of the initial | |||
|  | window. Optionally, a TCP MAY set the restart window to the | |||
|  | same value used for the initial window. These changes do NOT | |||
|  | change the loss window, which must remain one segment (to permit the lowest | |||
|  | possible window size in the case of severe congestion). | |||
| 1. For connections transmitting only a small amount of data, a | 2. Implementation Issues | |||
| larger initial window would reduce the transmission time | ||||
| (assuming moderate segment drop rates). For many email (SMTP | ||||
| [Pos82]) and web page (HTTP [BLFN96, FJGFBL97]) transfers that | ||||
| are less than 4K bytes, the larger initial window would reduce | ||||
| the data transfer time to a single RTT. | ||||
| 2. For connections that will be able to use large congestion | When larger initial windows are implemented along with Path MTU | |||
| windows, this modification eliminates up to three RTTs and a | Discovery [MD90], and the MSS being used is found to be too large, | |||
| delayed ACK timeout during the initial slow-start phase. This | the congestion window `cwnd' SHOULD be reduced to prevent large | |||
| would be of particular benefit for high-bandwidth | bursts of smaller segments. Specifically, `cwnd' SHOULD be reduced | |||
| large-propagation-delay TCP connections, such as those over | by the ratio of the old segment size to the new segment size. | |||
| satellite links. | ||||
| 3. When the initial window is 1 segment, a receiver employing | When larger initial windows are implemented along with Path MTU | |||
| delayed acknowledgments (ACK) [Bra89] is forced to wait for a | Discovery [MD90], alternatives are to set the "Don't Fragment" (DF) | |||
| timeout before generating an ACK. With a larger initial window, | bit in all segments in the initial window, or to set the "Don't | |||
| the receiver will be able to generate an ACK after the second | Fragment" (DF) bit in one of the segments. It is an open question | |||
| data segment arrives. This eliminates the need to wait on the | which of these two alternatives is better; we hope that | |||
| timeout (0.1 seconds, or more). | implementation experience will shed light on this. In the first | |||
|  | case of setting the DF bit in all segments, if the initial packets | |||
|  | are too large, then all of the initial packets will be dropped in | |||
|  | the network. In the second case of setting the DF bit in only one | |||
|  | segment, if the initial packets are too large, then all but one of | |||
|  | the initial packets will be fragmented in the network. When the | |||
|  | second case is followed, setting the DF bit in the last segment in | |||
|  | the initial window provides the least chance for needless | |||
|  | retransmissions when the initial segment size is found to be too | |||
|  | large, because it minimizes the chances of duplicate ACKs | |||
|  | triggering a Fast Retransmit. However, more attention needs to be | |||
|  | paid to the interaction between larger initial windows and Path MTU | |||
|  | Discovery. | |||
| 3. Implementation Issues | The larger initial window proposed in this document is not intended | |||
|  | as an encouragement for web browsers to open multiple simultaneous | |||
|  | TCP connections all with large initial windows. When web browsers | |||
|  | open simultaneous TCP connections to the same destination, this | |||
|  | works against TCP's congestion control mechanisms [FF98], | |||
|  | regardless of the size of the initial window. Combining this | |||
|  | behavior with larger initial windows further increases the | |||
|  | unfairness to other traffic in the network. | |||
| When larger initial windows are implemented along with Path MTU | 3. Advantages of Larger Initial Windows | |||
| Discovery [MD90], only one of the segments in the initial window | ||||
| should have the "Don't Fragment" (DF) bit set. Preliminary analysis | ||||
| indicates that setting the DF bit in the last segment in the initial | ||||
| window provides the least chance for needless retransmissions and | ||||
| large line-rate bursts of segments when the initial segment size is | ||||
| found to be too large. In addition, if the MSS being used is found | ||||
| to be too large, the cwnd should be reduced to prevent large bursts | ||||
| of smaller segments. Specifically, cwnd should be reduced by the | ||||
| ratio of the old segment size to the new segment size. However, | ||||
| more attention needs to be paid to the interaction between larger | ||||
| initial windows and Path MTU Discovery. | ||||
| The larger initial window proposed in this document SHOULD NOT be | 1. When the initial window is one segment, a receiver employing | |||
| viewed as an encouragement for web browsers to open multiple | delayed ACKs [Bra89] is forced to wait for a timeout before | |||
| simultaneous TCP connections all with larger initial windows. (Web | generating an ACK. With an initial window of at least two | |||
| browsers should not open four simultaneous TCP connections to the | segments, the receiver will generate an ACK after the second | |||
| same destination in any case, because this works against TCP's | data segment arrives. This eliminates the wait on the timeout | |||
| congestion control mechanisms [FF98]). | (often up to 200 msec). | |||
|  | 2. For connections transmitting only a small amount of data, a | |||
|  | larger initial window reduces the transmission time (assuming | |||
|  | moderate segment drop rates). For many email (SMTP [Pos82]) | |||
|  | and web page (HTTP [BLFN96, FJGFBL97]) transfers that are less | |||
|  | than 4K bytes, the larger initial window would reduce the data | |||
|  | transfer time to a single RTT. | |||
|  | 3. For connections that will be able to use large congestion | |||
|  | windows, this modification eliminates up to three RTTs and a | |||
|  | delayed ACK timeout during the initial slow-start phase. This | |||
|  | would be of particular benefit for high-bandwidth | |||
|  | large-propagation-delay TCP connections, such as those over | |||
|  | satellite links. | |||
| 4. Disadvantages of Larger Initial Windows for the Individual | 4. Disadvantages of Larger Initial Windows for the Individual | |||
| Connection | Connection | |||
| In high-congestion environments, particularly for routers that have | In high-congestion environments, particularly for routers that have | |||
| a bias against bursty traffic (as in the typical Drop Tail router | a bias against bursty traffic (as in the typical Drop Tail router | |||
| queues), a TCP connection can sometimes be better off starting with | queues), a TCP connection can sometimes be better off starting with | |||
| an initial window of one segment. There are scenarios where a TCP | an initial window of one segment. There are scenarios where a TCP | |||
| connection slow-starting from an initial window of one segment might | connection slow-starting from an initial window of one segment might | |||
| not have segments dropped, while a TCP connection starting with an | not have segments dropped, while a TCP connection starting with an | |||
| initial window of four segments might experience unnecessary | initial window of four segments might experience unnecessary | |||
| retransmits due to the inability of the router to handle small | retransmits due to the inability of the router to handle small | |||
| bursts. This could result in an unnecessary retransmit timeout. | bursts. This could result in an unnecessary retransmit timeout. | |||
| For a large-window connection that is able to recover without a | For a large-window connection that is able to recover without a | |||
| retransmit timeout, this could result in an unnecessarily-early | retransmit timeout, this could result in an unnecessarily-early | |||
| transition from the slow-start to the congestion-avoidance phase of | transition from the slow-start to the congestion-avoidance phase of | |||
| the window increase algorithm. These premature segment drops should | the window increase algorithm. These premature segment drops should | |||
| not happen in uncongested networks, or in moderately-congested | not occur in uncongested networks with sufficient buffering or in | |||
| networks where the congested router used active queue management | moderately-congested networks where the congested router uses | |||
| (such as Random Early Detection [FJ93]). | active queue management (such as Random Early Detection [FJ93]). | |||
| Some TCP connections will receive better performance with the higher | Some TCP connections will receive better performance with the higher | |||
| initial window even if the burstiness of the initial window results | initial window even if the burstiness of the initial window results | |||
| in premature segment drops. This will be true if (1) the TCP | in premature segment drops. This will be true if (1) the TCP | |||
| connection recovers from the segment drop without a retransmit | connection recovers from the segment drop without a retransmit | |||
| timeout, and (2) the TCP connection is ultimately limited to a small | timeout, and (2) the TCP connection is ultimately limited to a small | |||
| congestion window by either network congestion or by the receiver's | congestion window by either network congestion or by the receiver's | |||
| advertised window. | advertised window. | |||
| 5. Disadvantages of Larger Initial Windows for the Network | 5. Disadvantages of Larger Initial Windows for the Network | |||
| We consider two separate potential dangers for the network. The | In terms of the potential for congestion collapse, we consider two | |||
| first danger would be a scenario where a large number of segments on | separate potential dangers for the network. The first danger would | |||
| congested links were duplicate or unnecessarily-retransmitted | be a scenario where a large number of segments on congested links | |||
| segments that had already been received at the receiver. The second | were duplicate segments that had already been received at the | |||
| danger would be a scenario where a large number of segments on | receiver. The second danger would be a scenario where a large | |||
| congested links were segments that would be dropped later in the | number of segments on congested links were segments that would be | |||
| network before reaching their final destination. | dropped later in the network before reaching their final | |||
|  | destination. | |||
| Unnecessarily-retransmitted segments: | In terms of the negative effect on other traffic in the network, a | |||
|  | potential disadvantage of larger initial windows would be that they | |||
|  | increase the general packet drop rate in the network. We discuss | |||
|  | these three issues below. | |||
| As described in the previous section, the larger initial window | Duplicate segments: | |||
| could occasionally result in a segment dropped from the initial | ||||
| window, when that segment might not have been dropped if the | As described in the previous section, the larger initial window | |||
| sender had slow-started from an initial window of one segment. | could occasionally result in a segment dropped from the initial | |||
| However, Appendix A shows that even in this case, the larger | window, when that segment might not have been dropped if the | |||
| initial window would not result in a large number of | sender had slow-started from an initial window of one segment. | |||
| unnecessarily-retransmitted segments. | However, Appendix A shows that even in this case, the larger | |||
|  | initial window would not result in the transmission of a large | |||
|  | number of duplicate segments. | |||
| Segments dropped later in the network: | Segments dropped later in the network: | |||
| How much would the larger initial window for TCP increase the | How much would the larger initial window for TCP increase the | |||
| number of segments on congested links that would be dropped | number of segments on congested links that would be dropped | |||
| before reaching their final destination? This is a problem that | before reaching their final destination? This is a problem that | |||
| can only occur for connections with multiple congested links, | can only occur for connections with multiple congested links, | |||
| where some segments might use scarce bandwidth on the first | where some segments might use scarce bandwidth on the first | |||
| congested link along the path, only to be dropped later along | congested link along the path, only to be dropped later along | |||
| the path. | the path. | |||
| First, many of the TCP connections will have only one congested | First, many of the TCP connections will have only one congested | |||
| link along the path. Segments dropped from these connections do | link along the path. Segments dropped from these connections do | |||
| not ``waste'' scarce bandwidth, and do not contribute to | not ``waste'' scarce bandwidth, and do not contribute to | |||
| congestion collapse. | congestion collapse. | |||
| However, some network paths will have multiple congested links, | However, some network paths will have multiple congested links, | |||
| and segments dropped from the initial window could use scarce | and segments dropped from the initial window could use scarce | |||
| bandwidth along the earlier congested links before being dropped | bandwidth along the earlier congested links before ultimately | |||
| on subsequent congested links. To the extent that the drop rate | being dropped on subsequent congested links. To the extent | |||
| is independent of the initial window used by TCP segments, the | that the drop rate is independent of the initial window used by | |||
| problem of congested links carrying segments that will be | TCP segments, the problem of congested links carrying segments | |||
| dropped before reaching their destination will be similar for | that will be dropped before reaching their destination will be | |||
| TCP connections that start by sending four segments or one | similar for TCP connections that start by sending four segments | |||
| segment. | or one segment. | |||
| For a network with a high segment drop rate, increasing the | An increased packet drop rate: | |||
| initial TCP congestion window could increase the segment drop | ||||
| rate even further. This is in part because routers with drop | ||||
| tail queue management have difficulties with bursty traffic in | ||||
| times of congestion. However, this should be a second-order | ||||
| effect. Given uncorrelated arrivals for TCP connections, the | ||||
| larger initial TCP congestion window should generally not | ||||
| significantly increase the segment drop rate. | ||||
| 6. Network Changes | For a network with a high segment drop rate, increasing the TCP | |||
|  | initial window could increase the segment drop rate even | |||
|  | further. This is in part because routers with Drop Tail queue | |||
|  | management have difficulties with bursty traffic in times of | |||
|  | congestion. However, given uncorrelated arrivals for TCP | |||
|  | connections, the larger TCP initial window should not | |||
|  | significantly increase the segment drop rate. Simulation-based | |||
|  | explorations of these issues are discussed in Section 7.2. | |||
| There are other changes in the network that make a larger initial | These potential dangers for the network are explored in simulations | |||
| window less of a problem. These include the increasing deployment | and experiments described in Section 7. Our judgement | |||
| of higher-speed links where 4K bytes is a rather small quantity of | is that, while there are dangers of congestion collapse in the | |||
| data and the deployment of queue management mechanisms such as RED | current Internet (see [FF98] for a discussion of the dangers of | |||
| that are more tolerant of transient traffic bursts. The current | congestion collapse from an increased deployment of UDP connections | |||
| dangers of congestion collapse most likely now come not from a 4K | without end-to-end congestion control), there is no such danger to | |||
| initial burst from TCP connections, but from the increased | the network from increasing the TCP initial window to 4K bytes. | |||
| deployment of UDP connections without end-to-end congestion control. | ||||
| 7. Concerns | 6. Typical Levels of Burstiness for TCP Traffic | |||
| All the experiments (see section 8) with larger initial windows have | Larger TCP initial windows would not dramatically increase the | |||
| tested how the larger window affects the TCP connection that uses | burstiness of TCP traffic in the Internet today, because such | |||
| the larger window. No one has thoroughly studied the impact of the | traffic is already fairly bursty. Bursts of two and three segments | |||
| larger window on other TCP connections. In particular, no one has a | are already typical of TCP [Flo97]: a delayed ACK (covering two | |||
| thorough set of answers about what happens when a TCP bursts a | previously unacknowledged segments) received during congestion | |||
| larger initial window into or across a path already being shared by | avoidance causes the congestion window to slide and two segments to | |||
| a set of established TCP connections. | be sent. The same delayed ACK received during slow start causes | |||
|  | the window to slide by two segments and then be incremented by one | |||
|  | segment, resulting in a three-segment burst. While not necessarily | |||
|  | typical, bursts of four and five segments for TCP are not rare. | |||
|  | Assuming delayed ACKs, a single dropped ACK causes the subsequent | |||
|  | ACK to cover four previously unacknowledged segments. During | |||
|  | congestion avoidance this leads to a four-segment burst and during | |||
|  | slow start a five-segment burst is generated. | |||
| Part of the reason for this omission is the assumption that the | There are also changes in progress that reduce the performance | |||
| effect is small. For example, in much of the Internet bursts of 2 | problems posed by moderate traffic bursts. One such change is the | |||
| and 3 segments are common and bursts of 4 and 5 segments are not | deployment of higher-speed links in some parts of the network, | |||
| rare. A delayed ACK (covering two previously unacknowledged | where a burst of 4K bytes can represent a small quantity of data. | |||
| segments) received during congestion avoidance causes the window to | A second change, for routers with sufficient buffering, is the | |||
| slide and 2 segments to be sent. The same delayed ACK received | deployment of queue management mechanisms such as RED, which is | |||
| during slow start causes the window to slide by 2 segments and then | designed to be tolerant of transient traffic bursts. | |||
| be incremented by 1 segment, leading to a 3 segment burst. Assuming | ||||
| delayed ACKs, a single dropped ACK causes the subsequent ACK to | ||||
| cover 4 previously unacknowledged segments. During congestion | ||||
| avoidance this leads to a 4 segment burst and during slow start a 5 | ||||
| segment burst is generated. | ||||
| However, there are some common scenarios where a larger initial | 7. Simulations and Experimental Results | |||
| window might have an effect. One example is low speed tail circuits | ||||
| with routers with small buffers. For instance, imagine a dialup | ||||
| link connecting routers each of which have a handful of buffers. | ||||
| Further imagine the link is already being shared by a few TCP | ||||
| connections. Then a new connection launches a large initial window, | ||||
| causing losses. How long will it be before the connections resume | ||||
| sharing the link fairly? Are there any signs of a capture effect, | ||||
| in which the new TCP gets a large fraction of the bandwidth? (A | ||||
| capture effect could ensure that, say, an SMTP server got more | ||||
| bandwidth than a long running FTP). | ||||
| Another scenario of concern is heavily loaded links. For instance, | 7.1 Studies of TCP Connections using Larger Initial Windows | |||
| a couple of years ago, one of the trans-Atlantic links was so | ||||
| heavily loaded that the correct congestion window size for a | ||||
| connection was about one segment. In this environment, new | ||||
| connections using larger initial windows would be starting with | ||||
| windows that were four times too big. What would the effects be? | ||||
| Do connections thrash? | ||||
| 8. Experimental Results | This section surveys simulations and experiments that have been | |||
|  | used to explore the effect of larger initial windows on the TCP | |||
|  | connection using that larger window. The first set of experiments | |||
|  | explores performance over satellite links. Larger initial windows | |||
|  | have been shown to improve performance of TCP connections over | |||
|  | satellite channels [All97b]. In this study, an initial window of | |||
|  | four segments (512 byte MSS) resulted in throughput improvements of | |||
|  | up to 30% (depending upon transfer size). [HAGT98] shows that the | |||
|  | use of larger initial windows results in a decrease in transfer | |||
|  | time in HTTP tests over the ACTS satellite system. A study | |||
|  | involving simulations of a large number of HTTP transactions over | |||
|  | hybrid fiber coax (HFC) indicates that the use of larger initial | |||
|  | windows decreases the time required to load WWW pages [Nic97]. | |||
| 8.1 Studies of TCP Connections using Larger Initial Windows | A second set of experiments has explored TCP performance over | |||
|  | dialup modem links. In experiments over a 28.8 kbps dialup channel | |||
|  | [All97a, AHO98], a four-segment initial window decreased the | |||
|  | transfer time of a 16 KB file by roughly 10%, with no accompanying | |||
|  | increase in the drop rate. A particular area of concern has been | |||
|  | TCP performance over low speed tail circuits (e.g., dialup modem | |||
|  | links) with routers with small buffers. A simulation study [SP97] | |||
|  | investigated the effects of using a larger initial window on a host | |||
|  | connected by a slow modem link and a router with a 3 packet | |||
|  | buffer. The study concluded that for the scenario investigated, | |||
|  | the use of larger initial windows was not harmful to TCP | |||
|  | performance. Questions have been raised concerning the effects of | |||
|  | larger initial windows on the transfer time for short transfers in | |||
|  | this environment, but these effects have not been quantified. A | |||
|  | question has also been raised concerning the possible effect on | |||
|  | existing TCP connections sharing the link. | |||
| A number of studies have been done using larger initial windows. | 7.2 Studies of Networks using Larger Initial Windows | |||
| The first study considers the effects on the global Internet, as | ||||
| well as on slow dialup modem links [All97a]. These test results | ||||
| show that for 16 KB transfers to 100 Internet hosts, 4 segment | ||||
| initial windows resulted in an increase in the drop rate of 0.04 | ||||
| segments/transfer. While the drop rate increased slightly, the | ||||
| transfer time was reduced by roughly 25% for transfers using a 4 | ||||
| segment (512 byte MSS) initial window when compared to an initial | ||||
| window of 1 segment. Tests over a 28.8 kbps dialup channel showed no | ||||
| increase in the drop rate and a transfer time decrease of roughly | ||||
| 10% over standard TCP when using a 4 segment initial window. | ||||
| In another study, larger initial windows have been shown to improve | This section surveys simulations and experiments investigating the | |||
| performance over satellite channels [All97b]. In this study, an | impact of the larger window on other TCP connections sharing the | |||
| initial window of 4 segments (512 byte MSS) resulted in throughput | path. Experiments in [All97a, AHO98] show that for 16 KB transfers | |||
| improvements of up to 30% (depending upon transfer size). | to 100 Internet hosts, four-segment initial windows resulted in a | |||
|  | small increase in the drop rate of 0.04 segments/transfer. While | |||
|  | the drop rate increased slightly, the transfer time was reduced by | |||
|  | roughly 25% for transfers using the four-segment (512 byte MSS) | |||
|  | initial window when compared to an initial window of one segment. | |||
| Next, a study involving simulations of a large number of HTTP | One scenario of concern is heavily loaded links. For | |||
| transactions over hybrid fiber coax (HFC) indicates that the use of | instance, a couple of years ago, one of the trans-Atlantic links | |||
| larger initial windows decreases the time required to load WWW pages | was so heavily loaded that the correct congestion window size for a | |||
| [Nic97]. [HAGT98] also shows that the use of larger initial windows | connection was about one segment. In this environment, new | |||
| results in a decrease in transfer time in HTTP tests over the ACTS | connections using larger initial windows would be starting with | |||
| satellite system. | windows that were four times too big. What would the effects be? | |||
| Do connections thrash? | ||||
| A study investigated the effects of using a larger initial window on | A simulation study in [PN98] explores the impact of a larger | |||
| a host connected by a slow modem link and a router with a 3 packet | initial window on competing network traffic. In this | |||
| buffer [SP97]. This study found that in this environment, larger | investigation, HTTP and FTP flows share a single congested gateway | |||
| initial windows slightly improved performance. | (where the number of HTTP and FTP flows varies from one simulation | |||
| set to another). For each simulation set, the paper examines | ||||
| aggregate link utilization and packet drop rates, median web page | ||||
| delay, and network power for the FTP transfers. The larger initial | ||||
| window generally resulted in increased throughput, | ||||
| slightly-increased packet drop rates, and an increase in overall | ||||
| network power. With the exception of one scenario, the larger | ||||
| initial window resulted in an increase in the drop rate of less | ||||
| than 1% above the loss rate experienced when using a one-segment | ||||
| initial window; in this scenario, the drop rate increased from | ||||
| 3.5% with one-segment initial windows, to 4.5% with four-segment | ||||
| initial windows. The overall conclusions were that increasing the | ||||
| TCP initial window to three packets (or 4380 bytes) helps to | ||||
| improve perceived performance. | ||||
| 8.2 Studies of Networks using Larger Initial Windows | Morris [Mor97] investigated larger initial windows in a very | |||
| congested network with transfers of size 20K. The loss rate in | ||||
| networks where all TCP connections use an initial window of four | ||||
| segments is shown to be 1-2% greater than in a network where all | ||||
| connections use an initial window of one segment. This | ||||
| relationship held in scenarios where the loss rates with | ||||
| one-segment initial windows ranged from 1% to 11%. In addition, in | ||||
| networks where connections used an initial window of four segments, | ||||
| TCP connections spent more time waiting for the retransmit timer | ||||
| (RTO) to expire to resend a segment than was spent when using an | ||||
| initial window of one segment. The time spent waiting for the RTO | ||||
| timer to expire represents idle time when no useful work was being | ||||
| accomplished for that connection. These results show that in a | ||||
| very congested environment, where each connection's share of the | ||||
| bottleneck bandwidth is close to one segment, using a larger | ||||
| initial window can cause a perceptible increase in both loss rates | ||||
| and retransmit timeouts. | ||||
| A simulation study of how the use of a larger initial window impacts | 8. Security Considerations | |||
| competing network traffic is outlined in [PN98]. In this | ||||
| investigation, a number of HTTP and FTP flows were sharing a | ||||
| congested gateway (the exact number of flows was varied in this | ||||
| study). The study showed improvement in HTTP transfer times on the | ||||
| order of 30% in many scenarios. In addition, a larger initial | ||||
| window slightly increased the segment drop rate (only one scenario | ||||
| increased the drop rate more than 1% above the loss rate experienced | ||||
| when using an initial window of 1 segment). | ||||
| Morris [Mor97] investigated larger initial windows in a very congested | This document discusses the initial congestion window permitted | |||
| network. The loss rate in networks where all TCP connections use an | for TCP connections. Changing this value does not raise any known | |||
| initial window of 4 segments is shown to be 1-2% greater than in a | new security issues with TCP. | |||
| network where all connections use an initial window of 1 segment. | ||||
| In addition, in networks where connections used an initial window of | ||||
| 4 segments, roughly 5-10% more time was spent waiting for the | ||||
| retransmit timer (RTO) to expire to resend a segment than was spent | ||||
| when using an initial window of 1 segment. The time spent waiting | ||||
| for the RTO timer to expire represents idle time when no useful work | ||||
| was being accomplished. These results show that in a very congested | ||||
| environment, where each connection's share of the bottleneck | ||||
| bandwidth is close to 1 segment, using a larger initial window | ||||
| degrades performance. | ||||
| 9. Conclusion | 9. Conclusion | |||
| This draft suggests a small change to TCP that may be beneficial to | This document proposes a small change to TCP that may be beneficial to | |||
| short lived TCP connections and those over links with long RTTs | short-lived TCP connections and those over links with long RTTs | |||
| (saving several RTTs during the initial slow-start phase). | (saving several RTTs during the initial slow-start phase). | |||
| 10. Acknowledgments | 10. Acknowledgments | |||
| We would like to acknowledge Tim Shepard and the members of the | We would like to acknowledge Vern Paxson, Tim Shepard, members of | |||
| End-to-End-Interest Mailing List for continuing discussions of these | the End-to-End-Interest Mailing List, and members of the IETF TCP | |||
| issues. | Implementation Working Group for continuing discussions of these | |||
| issues and for feedback on this document. | ||||
| References | 11. References | |||
| [All97a] Mark Allman. An Evaluation of TCP with Larger Initial | [All97a] Mark Allman. An Evaluation of TCP with Larger Initial | |||
| Windows. 40th IETF Meeting -- TCP Implementations WG. | Windows. 40th IETF Meeting -- TCP Implementations WG. | |||
| December, 1997. Washington, DC. | December, 1997. Washington, DC. | |||
| [AHO98] Mark Allman, Chris Hayes, and Shawn Ostermann, An | ||||
| Evaluation of TCP with Larger Initial Windows, March 1998. | ||||
| Submitted to ACM Computer Communication Review. URL | ||||
| "http://gigahertz.lerc.nasa.gov/~mallman/papers/initwin.ps". | ||||
| [All97b] Mark Allman. Improving TCP Performance Over Satellite | [All97b] Mark Allman. Improving TCP Performance Over Satellite | |||
| Channels. Master's thesis, Ohio University, June 1997. | Channels. Master's thesis, Ohio University, June 1997. | |||
| [BLFN96] Tim Berners-Lee, R. Fielding, and H. Nielsen. Hypertext | [BLFN96] Tim Berners-Lee, R. Fielding, and H. Nielsen. Hypertext | |||
| Transfer Protocol -- HTTP/1.0, May 1996. RFC 1945. | Transfer Protocol -- HTTP/1.0, May 1996. RFC 1945. | |||
| [Bra89] Robert Braden. Requirements for Internet Hosts -- | [Bra89] Robert Braden. Requirements for Internet Hosts -- | |||
| Communication Layers, October 1989. RFC 1122. | Communication Layers, October 1989. RFC 1122. | |||
| [FF96] Fall, K., and Floyd, S., Simulation-based Comparisons of | [FF96] Fall, K., and Floyd, S., Simulation-based Comparisons of | |||
| Tahoe, Reno, and SACK TCP. Computer Communication Review, | Tahoe, Reno, and SACK TCP. Computer Communication Review, | |||
| 26(3), July 1996. | 26(3), July 1996. | |||
| [FF98] Sally Floyd, Kevin Fall. Promoting the Use of End-to-End | [FF98] Sally Floyd, Kevin Fall. Promoting the Use of End-to-End | |||
| Congestion Control in the Internet. Submitted to IEEE | Congestion Control in the Internet. Submitted to IEEE | |||
| Transactions on Networking. | Transactions on Networking. URL | |||
| "http://www-nrg.ee.lbl.gov/floyd/end2end-paper.html". | ||||
| [FJGFBL97] R. Fielding, Jeffrey C. Mogul, Jim Gettys, H. Frystyk, | [FJGFBL97] R. Fielding, Jeffrey C. Mogul, Jim Gettys, H. Frystyk, | |||
| and Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, | and Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, | |||
| January 1997. RFC 2068. | January 1997. RFC 2068. | |||
| [FJ93] Floyd, S., and Jacobson, V., Random Early Detection gateways | [FJ93] Floyd, S., and Jacobson, V., Random Early Detection gateways | |||
| for Congestion Avoidance. IEEE/ACM Transactions on Networking, | for Congestion Avoidance. IEEE/ACM Transactions on Networking, | |||
| V.1 N.4, August 1993, p. 397-413. | V.1 N.4, August 1993, p. 397-413. | |||
| [Flo94] Floyd, S., TCP and Explicit Congestion Notification. | [Flo94] Floyd, S., TCP and Explicit Congestion Notification. | |||
| Computer Communication Review, 24(5):10-23, October 1994. | Computer Communication Review, 24(5):10-23, October 1994. | |||
| [Flo96] Floyd, S., Issues of TCP with SACK. Technical report, January | [Flo96] Floyd, S., Issues of TCP with SACK. Technical report, January | |||
| 1996. Available from http://www-nrg.ee.lbl.gov/floyd/. | 1996. Available from http://www-nrg.ee.lbl.gov/floyd/. | |||
| [HAGT98] Hans Kruse, Mark Allman, Jim Griner, Diepchi Tran. HTTP | [Flo97] Floyd, S., Increasing TCP's Initial Window. Viewgraphs, | |||
| 40th IETF Meeting - TCP Implementations WG. December, 1997. | ||||
| URL "ftp://ftp.ee.lbl.gov/talks/sf-tcp-ietf97.ps". | ||||
| [KAGT98] Hans Kruse, Mark Allman, Jim Griner, Diepchi Tran. HTTP | ||||
| Page Transfer Rates Over Geo-Stationary Satellite Links. March | Page Transfer Rates Over Geo-Stationary Satellite Links. March | |||
| 1998. Proceedings of the Sixth International Conference on | 1998. Proceedings of the Sixth International Conference on | |||
| Telecommunication Systems. To Appear. | Telecommunication Systems. URL | |||
| "http://gigahertz.lerc.nasa.gov/~mallman/papers/nash98.ps". | ||||
| [MD90] Jeffrey C. Mogul and Steve Deering. Path MTU Discovery, | [MD90] Jeffrey C. Mogul and Steve Deering. Path MTU Discovery, | |||
| November 1990. RFC 1191. | November 1990. RFC 1191. | |||
| [MMFR96] Matt Mathis, Jamshid Mahdavi, Sally Floyd and Allyn | [MMFR96] Matt Mathis, Jamshid Mahdavi, Sally Floyd and Allyn | |||
| Romanow. TCP Selective Acknowledgment Options, October 1996. | Romanow. TCP Selective Acknowledgment Options, October 1996. | |||
| RFC 2018. | RFC 2018. | |||
| [Mor97] Robert Morris. Private communication. | [Mor97] Robert Morris. Private communication, 1997. Cited for | |||
| acknowledgement purposes only. | ||||
| [Nic97] Kathleen Nichols. Improving Network Simulation with | [Nic97] Kathleen Nichols. Improving Network Simulation with | |||
| Feedback. Com21, Inc. Technical Report. Available from | Feedback. Com21, Inc. Technical Report. Available from | |||
| http://www.com21.com/pages/papers/068.pdf. | http://www.com21.com/pages/papers/068.pdf. | |||
| [PN98] Poduri, K., and Nichols, K., Simulation Studies of Increased | [PN98] Poduri, K., and Nichols, K., Simulation Studies of Increased | |||
| Initial TCP Window Size, February 1998. Internet-Draft | Initial TCP Window Size, February 1998. Internet-Draft | |||
| draft-ietf-tcpimpl-poduri-00.txt (work in progress). | draft-ietf-tcpimpl-poduri-00.txt (work in progress). | |||
| [Pos82] Jon Postel. Simple Mail Transfer Protocol, August 1982. | [Pos82] Jon Postel. Simple Mail Transfer Protocol, August 1982. | |||
| RFC 821. | RFC 821. | |||
| [RF97] Ramakrishnan, K.K., and Floyd, S., A Proposal to Add Explicit | [RF97] Ramakrishnan, K.K., and Floyd, S., A Proposal to Add Explicit | |||
| Congestion Notification (ECN) to IPv6 and to TCP. Internet-Draft | Congestion Notification (ECN) to IPv6 and to TCP. Internet-Draft | |||
| draft-kksjf-ecn-00.txt (work in progress). November 1997. | draft-kksjf-ecn-00.txt (work in progress). November 1997. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | ||||
| [S97] W. Stevens, TCP Slow Start, Congestion Avoidance, Fast | ||||
| Retransmit, and Fast Recovery Algorithms. RFC 2001, Proposed | ||||
| Standard, January 1997. | ||||
| [SP97] Tim Shepard and Craig Partridge. When TCP Starts Up With | [SP97] Tim Shepard and Craig Partridge. When TCP Starts Up With | |||
| Four Packets Into Only Three Buffers, July 1997. Internet-Draft | Four Packets Into Only Three Buffers, July 1997. Internet-Draft | |||
| draft-shepard-TCP-4-packets-3-buff-00.txt (work in progress). | draft-shepard-TCP-4-packets-3-buff-00.txt (work in progress). | |||
| Appendix A | 12. Authors' Addresses | |||
| In the current environment (without Explicit Congestion Notification | ||||
| [Flo94] [RF97]), all TCPs use segment drops as indications from the | ||||
| network about the limits of available bandwidth. The change to a | ||||
| larger initial window should not result in a large number of | ||||
| unnecessarily-retransmitted segments. | ||||
| If a segment is dropped from the initial window, there are three | ||||
| different ways for TCP to recover: (1) Slow-starting from a window | ||||
| of one segment, as is done after a retransmit timeout, or after Fast | ||||
| Retransmit in Tahoe TCP; (2) Fast Recovery without selective | ||||
| acknowledgments (SACK), as is done after three duplicate ACKs in | ||||
| Reno TCP; and (3) Fast Recovery with SACK, for TCP where both the | ||||
| sender and the receiver support the SACK option [MMFR96]. In all | ||||
| three cases, if a single segment is dropped from the initial window, | ||||
| there are no unnecessarily-retransmitted segments. Note that for a | ||||
| TCP sending four 512-byte segments in the initial window, a single | ||||
| segment drop will not require a retransmit timeout, but can be | ||||
| recovered from using the Fast Retransmit algorithm. In addition, a | ||||
| single segment dropped from an initial window of three segments may | ||||
| be repaired using the fast retransmit algorithm, depending on which | ||||
| segment is dropped and whether or not delayed ACKs are used. For | ||||
| example, dropping the first segment of a three-segment initial | ||||
| window will always require waiting for a timeout. However, dropping | ||||
| the third segment will always allow recovery via the fast retransmit | ||||
| algorithm. | ||||
| We now consider the case when multiple segments are dropped from the | ||||
| initial window. Using the first recovery method, slow-starting from | ||||
| a window of one segment, the number of unnecessarily-retransmitted | ||||
| segments is limited [FF96]. In the second case of Fast Recovery | ||||
| without SACK, multiple segment drops from a window of data generally | ||||
| result in a retransmit timeout. Again, the number of | ||||
| unnecessarily-retransmitted segments is small. In the third case, | ||||
| of Fast Recovery with SACK, there can only be | ||||
| unnecessarily-retransmitted segments if a precise pattern of ACK | ||||
| segments is also lost [Flo96], or if segments are | ||||
| seriously-reordered in the network. In any case, the number of | ||||
| unnecessarily-retransmitted segments due to a larger initial window | ||||
| should be small. | ||||
| Authors' Addresses | ||||
| Mark Allman | Mark Allman | |||
| NASA Lewis Research Center/Sterling Software | NASA Lewis Research Center/Sterling Software | |||
| 21000 Brookpark Road | 21000 Brookpark Road | |||
| MS 54-2 | MS 54-2 | |||
| Cleveland, OH 44135 | Cleveland, OH 44135 | |||
| mallman@lerc.nasa.gov | mallman@lerc.nasa.gov | |||
| http://gigahertz.lerc.nasa.gov/~mallman/ | http://gigahertz.lerc.nasa.gov/~mallman/ | |||
| Sally Floyd | Sally Floyd | |||
| Lawrence Berkeley National Laboratory | Lawrence Berkeley National Laboratory | |||
| One Cyclotron Road | One Cyclotron Road | |||
| Berkeley, CA 94720 | Berkeley, CA 94720 | |||
| floyd@ee.lbl.gov | floyd@ee.lbl.gov | |||
| Craig Partridge | Craig Partridge | |||
| BBN Technologies | BBN Technologies | |||
| 10 Moulton Street | 10 Moulton Street | |||
| Cambridge, MA 02138 | Cambridge, MA 02138 | |||
| craig@bbn.com | craig@bbn.com | |||
| 13. Appendix - Duplicate Segments | ||||
| In the current environment (without Explicit Congestion | ||||
| Notification [Flo94] [RF97]), all TCPs use segment drops as | ||||
| indications from the network about the limits of available | ||||
| bandwidth. We argue here that the change to a larger initial | ||||
| window should not result in the sender retransmitting | ||||
| a large number of duplicate segments that have already been | ||||
| received at the receiver. | ||||
| If one segment is dropped from the initial window, there are three | ||||
| different ways for TCP to recover: (1) Slow-starting from a window | ||||
| of one segment, as is done after a retransmit timeout, or after Fast | ||||
| Retransmit in Tahoe TCP; (2) Fast Recovery without selective | ||||
| acknowledgments (SACK), as is done after three duplicate ACKs in | ||||
| Reno TCP; and (3) Fast Recovery with SACK, for TCP where both the | ||||
| sender and the receiver support the SACK option [MMFR96]. In all | ||||
| three cases, if a single segment is dropped from the initial window, | ||||
| no duplicate segments (i.e., segments that have already been | ||||
| received at the receiver) are transmitted. Note that for a | ||||
| TCP sending four 512-byte segments in the initial window, a single | ||||
| segment drop will not require a retransmit timeout, but can be | ||||
| recovered from using the Fast Retransmit algorithm (unless the | ||||
| retransmit timer expires prematurely). In addition, a single | ||||
| segment dropped from an initial window of three segments might be | ||||
| repaired using the fast retransmit algorithm, depending on which | ||||
| segment is dropped and whether or not delayed ACKs are used. For | ||||
| example, dropping the first segment of a three-segment initial | ||||
| window will always require waiting for a timeout. However, | ||||
| dropping the third segment will always allow recovery via the fast | ||||
| retransmit algorithm, as long as no ACKs are lost. | ||||
| Next we consider scenarios where the initial window contains | ||||
| two to four segments, and at least two of those segments are dropped. | ||||
| If all segments in the initial window are dropped, then clearly | ||||
| no duplicate segments are retransmitted, as the receiver has not yet | ||||
| received any segments. (It is still a possibility that these dropped | ||||
| segments used scarce bandwidth on the way to their drop point; | ||||
| this issue was discussed in Section 5.) | ||||
| When two segments are dropped from an initial window of three | ||||
| segments, the sender will only send a duplicate segment if the | ||||
| first two of the three segments were dropped, and the sender does | ||||
| not receive a packet with the SACK option acknowledging the third | ||||
| segment. | ||||
| When two segments are dropped from an initial window of four | ||||
| segments, an examination of the six possible scenarios (which we | ||||
| don't go through here) shows that, depending on the position of the | ||||
| dropped packets, in the absence of SACK the sender might send one | ||||
| duplicate segment. There are no scenarios in which the sender | ||||
| sends two duplicate segments. | ||||
| When three segments are dropped from an initial window of four segments, | ||||
| then, in the absence of SACK, it is possible that one duplicate | ||||
| segment will be sent, depending on the position of the dropped segments. | ||||
| The summary is that in the absence of SACK, there are some | ||||
| scenarios with multiple segment drops from the initial window where | ||||
| one duplicate segment will be transmitted. There are no scenarios | ||||
| where more than one duplicate segment will be transmitted. Our | ||||
| conclusion is that the number of duplicate segments transmitted as | ||||
| a result of a larger initial window should be small. | ||||
| 14. Full Copyright Statement | ||||
| [This section would be filled in with the standard template if | ||||
| this document advances to an RFC.] | ||||
| End of changes. 52 change blocks. | ||||
| 269 lines changed or deleted | 301 lines changed or added | |||
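The Conclusion states that a larger initial window can save several RTTs during the initial slow-start phase. The arithmetic behind that claim can be sketched in Python under idealized assumptions that the draft does not state: no losses, a receiver that ACKs every segment (so the congestion window doubles each round trip), and no limit from the receiver window or the bottleneck. The function name `slow_start_rounds` is hypothetical. With delayed ACKs the window grows more slowly, so the actual savings may differ.

```python
def slow_start_rounds(total_segments: int, initial_window: int) -> int:
    """Hypothetical helper: number of round trips (flights) needed to send
    total_segments in slow start, assuming no loss, one ACK per segment
    (cwnd doubles every RTT), and no receiver or bottleneck limit."""
    if total_segments <= 0 or initial_window <= 0:
        raise ValueError("segment counts must be positive")
    sent, cwnd, rounds = 0, initial_window, 0
    while sent < total_segments:
        sent += cwnd   # one full window is sent per round trip
        cwnd *= 2      # each ACK adds one segment, doubling cwnd per RTT
        rounds += 1
    return rounds


# 16 KB transfers with a 512-byte MSS are 16384 / 512 = 32 segments.
print(slow_start_rounds(32, 1), slow_start_rounds(32, 4))
```

For the 16 KB, 512-byte-MSS transfers described in the studies above, this prints `6 4`: in this idealized model a four-segment initial window finishes two round trips sooner than a one-segment window. The count excludes connection setup and the final ACK.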
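The appendix argues that a single drop from a four-segment initial window can be repaired by Fast Retransmit, while with a three-segment window the outcome depends on which segment is lost and on whether delayed ACKs are used. The following Python sketch replays the first few round trips to check those cases. It is a hypothetical model, not code from the draft, and it assumes: drops occur only in the initial window; ACKs are never lost or reordered; the sender uses slow start and Reno's threshold of three duplicate ACKs, with no other way to send new data on duplicate ACKs; the receiver ACKs out-of-order segments immediately and, with delayed ACKs, ACKs every second in-order segment; and any pending delayed ACK is sent before the next flight arrives. It returns False when the sender stalls and must wait for the retransmit timer.

```python
def recovered_by_fast_retransmit(iw, dropped, delayed_acks):
    """Hypothetical model: True if losing the segments in `dropped`
    (numbered 1..iw) from the initial window triggers Fast Retransmit,
    False if the sender stalls until the retransmit timer expires."""
    if not dropped or not all(1 <= s <= iw for s in dropped):
        raise ValueError("dropped must be non-empty segment numbers in 1..iw")
    # Sender state, in units of segments.
    cwnd, snd_una, snd_nxt, dupacks = iw, 1, iw + 1, 0
    # Receiver state.
    rcv_nxt, pending, buffered = 1, 0, set()

    flight = list(range(1, iw + 1))       # the initial window
    while flight:
        acks = []
        for seg in flight:
            if seg in dropped:            # only initial-window segments can match
                continue
            buffered.add(seg)
            old = rcv_nxt
            while rcv_nxt in buffered:    # advance past in-order data
                rcv_nxt += 1
            if rcv_nxt == old or rcv_nxt - old > 1 or not delayed_acks:
                # Out-of-order segment (duplicate ACK), a filled hole, or an
                # ACK-every-segment receiver: ACK immediately.
                acks.append(rcv_nxt)
                pending = 0
            else:
                pending += 1
                if pending == 2:          # delayed ACK: every second segment
                    acks.append(rcv_nxt)
                    pending = 0
        if pending:                       # assume the delayed-ACK timer fires now
            acks.append(rcv_nxt)
            pending = 0

        flight = []
        for ack in acks:
            if ack > snd_una:             # new ACK: slow start adds one segment
                snd_una, cwnd, dupacks = ack, cwnd + 1, 0
            else:                         # duplicate ACK
                dupacks += 1
                if dupacks == 3:
                    return True           # Fast Retransmit of segment snd_una
            while snd_nxt < snd_una + cwnd:   # send new data the window allows
                flight.append(snd_nxt)
                snd_nxt += 1
    return False                          # nothing in flight: wait for the RTO
```

In this model, with delayed ACKs and a three-segment window, dropping segment 1 or 2 returns False and dropping segment 3 returns True; without delayed ACKs, dropping segment 2 also returns True. With a four-segment window every single drop returns True, matching the appendix. The sketch stops at the first retransmission and does not model what happens afterwards.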