| < draft-ietf-tcpimpl-prob-04.txt | draft-ietf-tcpimpl-prob-05.txt > | |||
|---|---|---|---|---|
| Network Working Group V. Paxson, Editor | Network Working Group V. Paxson, Editor | |||
| Internet Draft M. Allman | Internet Draft M. Allman | |||
| S. Dawson | S. Dawson | |||
| W. Fenner | ||||
| J. Griner | J. Griner | |||
| I. Heavens | I. Heavens | |||
| K. Lahey | K. Lahey | |||
| J. Semke | J. Semke | |||
| B. Volz | B. Volz | |||
| Expiration Date: February 1999 August 1998 | Expiration Date: May 1999 November 1998 | |||
| Known TCP Implementation Problems | Known TCP Implementation Problems | |||
| <draft-ietf-tcpimpl-prob-04.txt> | <draft-ietf-tcpimpl-prob-05.txt> | |||
| 1. Status of this Memo | 1. Status of this Memo | |||
| This document is an Internet Draft. Internet Drafts are working | This document is an Internet Draft. Internet Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
| and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
| working documents as Internet Drafts. | working documents as Internet Drafts. | |||
| Internet Drafts are draft documents valid for a maximum of six | Internet Drafts are draft documents valid for a maximum of six | |||
| months, and may be updated, replaced, or obsoleted by other documents | months, and may be updated, replaced, or obsoleted by other documents | |||
| skipping to change at page 1, line 38 ¶ | skipping to change at page 2, line 5 ¶ | |||
| To view the entire list of current Internet-Drafts, please check the | To view the entire list of current Internet-Drafts, please check the | |||
| "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow | "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow | |||
| Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern | Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern | |||
| Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific | Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific | |||
| Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). | Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). | |||
| This memo provides information for the Internet community. This memo | This memo provides information for the Internet community. This memo | |||
| does not specify an Internet standard of any kind. Distribution of | does not specify an Internet standard of any kind. Distribution of | |||
| this memo is unlimited. | this memo is unlimited. | |||
| ID Known TCP Implementation Problems November 1998 | ||||
| Table of Contents | ||||
| 1. STATUS OF THIS MEMO.............................................1 | ||||
| 2. INTRODUCTION....................................................2 | ||||
| 3. KNOWN IMPLEMENTATION PROBLEMS...................................4 | ||||
| 3.1 No initial slow start........................................4 | ||||
| 3.2 No slow start after retransmission timeout...................6 | ||||
| 3.3 Uninitialized CWND...........................................9 | ||||
| 3.4 Inconsistent retransmission.................................11 | ||||
| 3.5 Failure to retain above-sequence data.......................14 | ||||
| 3.6 Extra additive constant in congestion avoidance.............18 | ||||
| 3.7 Initial RTO too low.........................................24 | ||||
| 3.8 Failure of window deflation after loss recovery.............27 | ||||
| 3.9 Excessively short keepalive connection timeout..............30 | ||||
| 3.10 Failure to back off retransmission timeout..................32 | ||||
| 3.11 Insufficient interval between keepalives....................35 | ||||
| 3.12 Window probe deadlock.......................................38 | ||||
| 3.13 Stretch ACK violation.......................................42 | ||||
| 3.14 Retransmission sends multiple packets.......................45 | ||||
| 3.15 Failure to send FIN notification promptly...................48 | ||||
| 3.16 Failure to send a RST after Half Duplex Close...............50 | ||||
| 3.17 Failure to RST on close with data pending...................53 | ||||
| 3.18 Options missing from TCP MSS calculation....................57 | ||||
| 4. SECURITY CONSIDERATIONS........................................59 | ||||
| 5. ACKNOWLEDGEMENTS...............................................59 | ||||
| 6. REFERENCES.....................................................60 | ||||
| 7. AUTHORS' ADDRESSES.............................................62 | ||||
| 2. Introduction | 2. Introduction | |||
| This memo catalogs a number of known TCP implementation problems. | This memo catalogs a number of known TCP implementation problems. | |||
| The goal in doing so is to improve conditions in the existing | The goal in doing so is to improve conditions in the existing | |||
| Internet by enhancing the quality of current TCP/IP implementations. | Internet by enhancing the quality of current TCP/IP implementations. | |||
| It is hoped that both performance and correctness issues can be | It is hoped that both performance and correctness issues can be | |||
| resolved by making implementors aware of the problems and their | resolved by making implementors aware of the problems and their | |||
| solutions. In the long term, it is hoped that this will provide a | solutions. In the long term, it is hoped that this will provide a | |||
| reduction in unnecessary traffic on the network, the rate of | reduction in unnecessary traffic on the network, the rate of | |||
| connection failures due to protocol errors, and load on network | connection failures due to protocol errors, and load on network | |||
| ID Known TCP Implementation Problems August 1998 | ||||
| servers due to time spent processing both unsuccessful connections | servers due to time spent processing both unsuccessful connections | |||
| and retransmitted data. This will help to ensure the stability of | and retransmitted data. This will help to ensure the stability of | |||
| the global Internet. | the global Internet. | |||
| Each problem is defined as follows: | Each problem is defined as follows: | |||
| Name of Problem | Name of Problem | |||
| The name associated with the problem. In this memo, the name is | The name associated with the problem. In this memo, the name is | |||
| given as a subsection heading. | given as a subsection heading. | |||
| Classification | Classification | |||
| One or more problem categories for which the problem is | One or more problem categories for which the problem is | |||
| classified. Categories used so far: "congestion control", | classified: "congestion control", "performance", "reliability", | |||
| "performance", "reliability", "resource management". Others | "resource management". | |||
| anticipated: "security", "interoperability", "configuration". | ||||
| Description | Description | |||
| A definition of the problem, succinct but including necessary | A definition of the problem, succinct but including necessary | |||
| background material. | background material. | |||
| Significance | Significance | |||
| A brief summary of the sorts of environments for which the | A brief summary of the sorts of environments for which the | |||
| problem is significant. | problem is significant. | |||
| Implications | Implications | |||
| Why the problem is viewed as a problem. | Why the problem is viewed as a problem. | |||
| Relevant RFCs | Relevant RFCs | |||
| Brief discussion of the RFCs with respect to which the problem | The RFCs defining the TCP specification with which the problem | |||
| is viewed as an implementation error. These RFCs often qualify | conflicts. These RFCs often qualify behavior using terms such | |||
| behavior using terms such as MUST, SHOULD, MAY, and others | as MUST, SHOULD, MAY, and others written capitalized. See RFC | |||
| written capitalized. See RFC 2119 for the exact interpretation | 2119 for the exact interpretation of these terms. | |||
| of these terms. | ||||
| Trace file demonstrating the problem | Trace file demonstrating the problem | |||
| One or more ASCII trace files demonstrating the problem, if | One or more ASCII trace files demonstrating the problem, if | |||
| applicable. These may in the future be replaced with URLs to | applicable. | |||
| on-line traces. | ||||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| One or more examples of how correct behavior appears in a trace, | One or more examples of how correct behavior appears in a trace, | |||
| if applicable. These may in the future be replaced with URLs to | if applicable. | |||
| on-line traces. | ||||
| References | References | |||
| References that further discuss the problem. | References that further discuss the problem. | |||
| How to detect | How to detect | |||
| How to test an implementation to see if it exhibits the problem. | How to test an implementation to see if it exhibits the problem. | |||
| This discussion may include difficulties and subtleties | This discussion may include difficulties and subtleties | |||
| associated with causing the problem to manifest itself, and with | associated with causing the problem to manifest itself, and with | |||
| interpreting traces to detect the presence of the problem (if | interpreting traces to detect the presence of the problem (if | |||
| applicable). In the future, this may include URLs for | applicable). | |||
| diagnostic tools. | ||||
| How to fix | How to fix | |||
| For known causes of the problem, how to correct the | For known causes of the problem, how to correct the | |||
| implementation. | implementation. | |||
| Implementation specifics | | |||
| If it is viewed as beneficial to document particular | ||||
| implementations exhibiting the problem, and if the corresponding | ||||
| implementors approve, then this section gives the specifics of | ||||
| those implementations, along with a contact address for the | ||||
| implementors. | ||||
| 3. Known implementation problems | 3. Known implementation problems | |||
| 3.1. | 3.1. | |||
| Name of Problem | Name of Problem | |||
| No initial slow start | No initial slow start | |||
| Classification | Classification | |||
| Congestion control | Congestion control | |||
| Description | Description | |||
| When a TCP begins transmitting data, it is required by RFC 1122, | When a TCP begins transmitting data, it is required by RFC 1122, | |||
| 4.2.2.15, to engage in a "slow start" by initializing its | 4.2.2.15, to engage in a "slow start" by initializing its | |||
| congestion window, cwnd, to one packet (one segment of the maximum | congestion window, cwnd, to one packet (one segment of the maximum | |||
| size). (Note that an experimental change to TCP, documented in | size). (Note that an experimental change to TCP, documented in | |||
| [Allman98], allows an initial value somewhat larger than one | [RFC2414], allows an initial value somewhat larger than one | |||
| packet.) It subsequently increases cwnd by one packet for each ACK | packet.) It subsequently increases cwnd by one packet for each ACK | |||
| it receives for new data. The minimum of cwnd and the receiver's | it receives for new data. The minimum of cwnd and the receiver's | |||
| advertised window bounds the highest sequence number the TCP can | advertised window bounds the highest sequence number the TCP can | |||
| transmit. A TCP that fails to initialize and increment cwnd in | transmit. A TCP that fails to initialize and increment cwnd in | |||
| this fashion exhibits "No initial slow start". | this fashion exhibits "No initial slow start". | |||
| Significance | Significance | |||
| In congested environments, detrimental to the performance of other | In congested environments, detrimental to the performance of other | |||
| connections, and possibly to the connection itself. | connections, and possibly to the connection itself. | |||
| Implications | Implications | |||
| A TCP failing to slow start when beginning a connection results in | A TCP failing to slow start when beginning a connection results in | |||
| traffic bursts that can stress the network, leading to excessive | traffic bursts that can stress the network, leading to excessive | |||
| queueing delays and packet loss. | queueing delays and packet loss. | |||
| Implementations exhibiting this problem might do so because they | Implementations exhibiting this problem might do so because they | |||
| suffer from the general problem of not including the required | suffer from the general problem of not including the required | |||
| congestion window. These implementations will also suffer from "No | congestion window. These implementations will also suffer from "No | |||
| slow start after retransmission timeout". | slow start after retransmission timeout". | |||
| There are different shades of "No initial slow start". From the | There are different shades of "No initial slow start". From the | |||
| perspective of stressing the network, the worst is a connection | perspective of stressing the network, the worst is a connection | |||
| that simply always sends based on the receiver's advertised window, | that simply always sends based on the receiver's advertised window, | |||
| with no notion of a separate congestion window. Another form is | with no notion of a separate congestion window. Another form is | |||
| described in "Uninitialized CWND" below. | described in "Uninitialized CWND" below. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 requires use of slow start. RFC 2001 gives the specifics | RFC 1122 requires use of slow start. RFC 2001 gives the specifics | |||
| of slow start. | of slow start. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump/BPF recording at the connection responder. No | Made using tcpdump [Jacobson89] recording at the connection | |||
| losses reported. | responder. No losses reported by the packet filter. | |||
| 10:40:42.244503 B > A: S 1168512000:1168512000(0) win 32768 | 10:40:42.244503 B > A: S 1168512000:1168512000(0) win 32768 | |||
| <mss 1460,nop,wscale 0> (DF) [tos 0x8] | <mss 1460,nop,wscale 0> (DF) [tos 0x8] | |||
| 10:40:42.259908 A > B: S 3688169472:3688169472(0) | 10:40:42.259908 A > B: S 3688169472:3688169472(0) | |||
| ack 1168512001 win 32768 <mss 1460> | ack 1168512001 win 32768 <mss 1460> | |||
| 10:40:42.389992 B > A: . ack 1 win 33580 (DF) [tos 0x8] | 10:40:42.389992 B > A: . ack 1 win 33580 (DF) [tos 0x8] | |||
| 10:40:42.664975 A > B: P 1:513(512) ack 1 win 32768 | 10:40:42.664975 A > B: P 1:513(512) ack 1 win 32768 | |||
| 10:40:42.700185 A > B: . 513:1973(1460) ack 1 win 32768 | 10:40:42.700185 A > B: . 513:1973(1460) ack 1 win 32768 | |||
| 10:40:42.718017 A > B: . 1973:3433(1460) ack 1 win 32768 | 10:40:42.718017 A > B: . 1973:3433(1460) ack 1 win 32768 | |||
| 10:40:42.762945 A > B: . 3433:4893(1460) ack 1 win 32768 | 10:40:42.762945 A > B: . 3433:4893(1460) ack 1 win 32768 | |||
| skipping to change at page 5, line 4 ¶ | skipping to change at page 5, line 36 ¶ | |||
| After the third packet, the connection is established. A, the | After the third packet, the connection is established. A, the | |||
| connection responder, begins transmitting to B, the connection | connection responder, begins transmitting to B, the connection | |||
| initiator. Host A quickly sends 6 packets comprising 7812 bytes, | initiator. Host A quickly sends 6 packets comprising 7812 bytes, | |||
| even though the SYN exchange agreed upon an MSS of 1460 bytes | even though the SYN exchange agreed upon an MSS of 1460 bytes | |||
| (implying an initial congestion window of 1 segment corresponds to | (implying an initial congestion window of 1 segment corresponds to | |||
| 1460 bytes), and so A should have sent at most 1460 bytes. | 1460 bytes), and so A should have sent at most 1460 bytes. | |||
| The ACKs sent by B to A in the last two lines indicate that this | The ACKs sent by B to A in the last two lines indicate that this | |||
| trace is not a measurement error (slow start really occurring but | trace is not a measurement error (slow start really occurring but | |||
| the corresponding ACKs having been dropped by the packet filter). | the corresponding ACKs having been dropped by the packet filter). | |||
| A second trace confirmed that the problem is repeatable. | A second trace confirmed that the problem is repeatable. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using tcpdump/BPF recording at the connection originator. No | Made using tcpdump recording at the connection originator. No | |||
| losses reported. | losses reported by the packet filter. | |||
| 12:35:31.914050 C > D: S 1448571845:1448571845(0) win 4380 <mss 1460> | 12:35:31.914050 C > D: S 1448571845:1448571845(0) win 4380 <mss 1460> | |||
| 12:35:32.068819 D > C: S 1755712000:1755712000(0) ack 1448571846 win 4096 | 12:35:32.068819 D > C: S 1755712000:1755712000(0) ack 1448571846 win 4096 | |||
| 12:35:32.069341 C > D: . ack 1 win 4608 | 12:35:32.069341 C > D: . ack 1 win 4608 | |||
| 12:35:32.075213 C > D: P 1:513(512) ack 1 win 4608 | 12:35:32.075213 C > D: P 1:513(512) ack 1 win 4608 | |||
| 12:35:32.286073 D > C: . ack 513 win 4096 | 12:35:32.286073 D > C: . ack 513 win 4096 | |||
| 12:35:32.287032 C > D: . 513:1025(512) ack 1 win 4608 | 12:35:32.287032 C > D: . 513:1025(512) ack 1 win 4608 | |||
| 12:35:32.287506 C > D: . 1025:1537(512) ack 1 win 4608 | 12:35:32.287506 C > D: . 1025:1537(512) ack 1 win 4608 | |||
| 12:35:32.432712 D > C: . ack 1537 win 4096 | 12:35:32.432712 D > C: . ack 1537 win 4096 | |||
| 12:35:32.433690 C > D: . 1537:2049(512) ack 1 win 4608 | 12:35:32.433690 C > D: . 1537:2049(512) ack 1 win 4608 | |||
| 12:35:32.434481 C > D: . 2049:2561(512) ack 1 win 4608 | 12:35:32.434481 C > D: . 2049:2561(512) ack 1 win 4608 | |||
| 12:35:32.435032 C > D: . 2561:3073(512) ack 1 win 4608 | 12:35:32.435032 C > D: . 2561:3073(512) ack 1 win 4608 | |||
| 12:35:32.594526 D > C: . ack 3073 win 4096 | 12:35:32.594526 D > C: . ack 3073 win 4096 | |||
| 12:35:32.595465 C > D: . 3073:3585(512) ack 1 win 4608 | 12:35:32.595465 C > D: . 3073:3585(512) ack 1 win 4608 | |||
| 12:35:32.595947 C > D: . 3585:4097(512) ack 1 win 4608 | 12:35:32.595947 C > D: . 3585:4097(512) ack 1 win 4608 | |||
| 12:35:32.596414 C > D: . 4097:4609(512) ack 1 win 4608 | 12:35:32.596414 C > D: . 4097:4609(512) ack 1 win 4608 | |||
| 12:35:32.596888 C > D: . 4609:5121(512) ack 1 win 4608 | 12:35:32.596888 C > D: . 4609:5121(512) ack 1 win 4608 | |||
| 12:35:32.733453 D > C: . ack 4097 win 4096 | 12:35:32.733453 D > C: . ack 4097 win 4096 | |||
| References | References | |||
| skipping to change at page 6, line 5 ¶ | skipping to change at page 6, line 31 ¶ | |||
| immediately in a packet trace or a sequence plot, as illustrated | immediately in a packet trace or a sequence plot, as illustrated | |||
| above. | above. | |||
| How to fix | How to fix | |||
| If the root problem is that the implementation lacks a notion of a | If the root problem is that the implementation lacks a notion of a | |||
| congestion window, then unfortunately this requires significant | congestion window, then unfortunately this requires significant | |||
| work to fix. However, doing so is important, as such | work to fix. However, doing so is important, as such | |||
| implementations also exhibit "No slow start after retransmission | implementations also exhibit "No slow start after retransmission | |||
| timeout". | timeout". | |||
| 3.2. | 3.2. | |||
| Name of Problem | Name of Problem | |||
| No slow start after retransmission timeout | No slow start after retransmission timeout | |||
| Classification | Classification | |||
| Congestion control | Congestion control | |||
| Description | Description | |||
| When a TCP experiences a retransmission timeout, it is required by | When a TCP experiences a retransmission timeout, it is required by | |||
| RFC 1122, 4.2.2.15, to engage in "slow start" by initializing its | RFC 1122, 4.2.2.15, to engage in "slow start" by initializing its | |||
| congestion window, cwnd, to one packet (one segment of the maximum | congestion window, cwnd, to one packet (one segment of the maximum | |||
| size). It subsequently increases cwnd by one packet for each ACK | size). It subsequently increases cwnd by one packet for each ACK | |||
| it receives for new data until it reaches the "congestion | it receives for new data until it reaches the "congestion | |||
| avoidance" threshold, ssthresh, at which point the congestion | avoidance" threshold, ssthresh, at which point the congestion | |||
| avoidance algorithm for updating the window takes over. A TCP that | avoidance algorithm for updating the window takes over. A TCP that | |||
| fails to enter slow start upon a timeout exhibits "No slow start | fails to enter slow start upon a timeout exhibits "No slow start | |||
| after retransmission timeout". | after retransmission timeout". | |||
| Significance | Significance | |||
| In congested environments, severely detrimental to the performance | In congested environments, severely detrimental to the performance | |||
| of other connections, and also the connection itself. | of other connections, and also the connection itself. | |||
| Implications | Implications | |||
| Entering slow start upon timeout forms one of the cornerstones of | Entering slow start upon timeout forms one of the cornerstones of | |||
| Internet congestion stability, as outlined in [Jacobson88]. If | Internet congestion stability, as outlined in [Jacobson88]. If | |||
| TCPs fail to do so, the network becomes at risk of suffering | TCPs fail to do so, the network becomes at risk of suffering | |||
| "congestion collapse" [RFC896]. | "congestion collapse" [RFC896]. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 requires use of slow start after loss. RFC 2001 gives the | RFC 1122 requires use of slow start after loss. RFC 2001 gives the | |||
| specifics of how to implement slow start. RFC 896 describes | specifics of how to implement slow start. RFC 896 describes | |||
| congestion collapse. | congestion collapse. | |||
| The retransmission timeout discussed here should not be confused | The retransmission timeout discussed here should not be confused | |||
| with the separate "fast recovery" retransmission mechanism | with the separate "fast recovery" retransmission mechanism | |||
| discussed in RFC 2001. | discussed in RFC 2001. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump/BPF recording at the sending TCP (A). No losses | Made using tcpdump recording at the sending TCP (A). No losses | |||
| reported. | reported by the packet filter. | |||
| 10:40:59.090612 B > A: . ack 357125 win 33580 (DF) [tos 0x8] | 10:40:59.090612 B > A: . ack 357125 win 33580 (DF) [tos 0x8] | |||
| 10:40:59.222025 A > B: . 357125:358585(1460) ack 1 win 32768 | 10:40:59.222025 A > B: . 357125:358585(1460) ack 1 win 32768 | |||
| 10:40:59.868871 A > B: . 357125:358585(1460) ack 1 win 32768 | 10:40:59.868871 A > B: . 357125:358585(1460) ack 1 win 32768 | |||
| 10:41:00.016641 B > A: . ack 364425 win 33580 (DF) [tos 0x8] | 10:41:00.016641 B > A: . ack 364425 win 33580 (DF) [tos 0x8] | |||
| 10:41:00.036709 A > B: . 364425:365885(1460) ack 1 win 32768 | 10:41:00.036709 A > B: . 364425:365885(1460) ack 1 win 32768 | |||
| 10:41:00.045231 A > B: . 365885:367345(1460) ack 1 win 32768 | 10:41:00.045231 A > B: . 365885:367345(1460) ack 1 win 32768 | |||
| 10:41:00.053785 A > B: . 367345:368805(1460) ack 1 win 32768 | 10:41:00.053785 A > B: . 367345:368805(1460) ack 1 win 32768 | |||
| 10:41:00.062426 A > B: . 368805:370265(1460) ack 1 win 32768 | 10:41:00.062426 A > B: . 368805:370265(1460) ack 1 win 32768 | |||
| 10:41:00.071074 A > B: . 370265:371725(1460) ack 1 win 32768 | 10:41:00.071074 A > B: . 370265:371725(1460) ack 1 win 32768 | |||
| 10:41:00.079794 A > B: . 371725:373185(1460) ack 1 win 32768 | 10:41:00.079794 A > B: . 371725:373185(1460) ack 1 win 32768 | |||
| 10:41:00.089304 A > B: . 373185:374645(1460) ack 1 win 32768 | 10:41:00.089304 A > B: . 373185:374645(1460) ack 1 win 32768 | |||
| 10:41:00.097738 A > B: . 374645:376105(1460) ack 1 win 32768 | 10:41:00.097738 A > B: . 374645:376105(1460) ack 1 win 32768 | |||
| 10:41:00.106409 A > B: . 376105:377565(1460) ack 1 win 32768 | 10:41:00.106409 A > B: . 376105:377565(1460) ack 1 win 32768 | |||
| skipping to change at page 7, line 29 ¶ | skipping to change at page 8, line 5 ¶ | |||
| 10:41:00.132016 A > B: . 380485:381945(1460) ack 1 win 32768 | 10:41:00.132016 A > B: . 380485:381945(1460) ack 1 win 32768 | |||
| 10:41:00.141635 A > B: . 381945:383405(1460) ack 1 win 32768 | 10:41:00.141635 A > B: . 381945:383405(1460) ack 1 win 32768 | |||
| 10:41:00.150094 A > B: . 383405:384865(1460) ack 1 win 32768 | 10:41:00.150094 A > B: . 383405:384865(1460) ack 1 win 32768 | |||
| 10:41:00.158552 A > B: . 384865:386325(1460) ack 1 win 32768 | 10:41:00.158552 A > B: . 384865:386325(1460) ack 1 win 32768 | |||
| 10:41:00.167053 A > B: . 386325:387785(1460) ack 1 win 32768 | 10:41:00.167053 A > B: . 386325:387785(1460) ack 1 win 32768 | |||
| 10:41:00.175518 A > B: . 387785:389245(1460) ack 1 win 32768 | 10:41:00.175518 A > B: . 387785:389245(1460) ack 1 win 32768 | |||
| 10:41:00.210835 A > B: . 389245:390705(1460) ack 1 win 32768 | 10:41:00.210835 A > B: . 389245:390705(1460) ack 1 win 32768 | |||
| 10:41:00.226108 A > B: . 390705:392165(1460) ack 1 win 32768 | 10:41:00.226108 A > B: . 390705:392165(1460) ack 1 win 32768 | |||
| 10:41:00.241524 B > A: . ack 389245 win 8760 (DF) [tos 0x8] | 10:41:00.241524 B > A: . ack 389245 win 8760 (DF) [tos 0x8] | |||
| The first packet indicates the ack point is 357125. 130 msec after | The first packet indicates the ack point is 357125. 130 msec after | |||
| receiving the ACK, A transmits the packet after the ACK point, | receiving the ACK, A transmits the packet after the ACK point, | |||
| 357125:358585. 640 msec after this transmission, it retransmits | 357125:358585. 640 msec after this transmission, it retransmits | |||
| 357125:358585, in an apparent retransmission timeout. At this | 357125:358585, in an apparent retransmission timeout. At this | |||
| point, A's cwnd should be one MSS, or 1460 bytes, as A enters slow | point, A's cwnd should be one MSS, or 1460 bytes, as A enters slow | |||
| start. The trace is consistent with this possibility. | start. The trace is consistent with this possibility. | |||
| B replies with an ACK of 364425, indicating that A has filled a | B replies with an ACK of 364425, indicating that A has filled a | |||
| sequence hole. At this point, A's cwnd should be 1460*2 = 2920 | sequence hole. At this point, A's cwnd should be 1460*2 = 2920 | |||
| bytes, since in slow start receiving an ACK advances cwnd by MSS. | bytes, since in slow start receiving an ACK advances cwnd by MSS. | |||
| However, A then launches 19 consecutive packets, which is | However, A then launches 19 consecutive packets, which is | |||
| inconsistent with slow start. | inconsistent with slow start. | |||
| A second trace confirmed that the problem is repeatable. | A second trace confirmed that the problem is repeatable. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using tcpdump/BPF recording at the sending TCP (C). No losses | Made using tcpdump recording at the sending TCP (C). No losses | |||
| reported. | reported by the packet filter. | |||
| 12:35:48.442538 C > D: P 465409:465921(512) ack 1 win 4608 | 12:35:48.442538 C > D: P 465409:465921(512) ack 1 win 4608 | |||
| 12:35:48.544483 D > C: . ack 461825 win 4096 | 12:35:48.544483 D > C: . ack 461825 win 4096 | |||
| 12:35:48.703496 D > C: . ack 461825 win 4096 | 12:35:48.703496 D > C: . ack 461825 win 4096 | |||
| 12:35:49.044613 C > D: . 461825:462337(512) ack 1 win 4608 | 12:35:49.044613 C > D: . 461825:462337(512) ack 1 win 4608 | |||
| 12:35:49.192282 D > C: . ack 465921 win 2048 | 12:35:49.192282 D > C: . ack 465921 win 2048 | |||
| 12:35:49.192538 D > C: . ack 465921 win 4096 | 12:35:49.192538 D > C: . ack 465921 win 4096 | |||
| 12:35:49.193392 C > D: P 465921:466433(512) ack 1 win 4608 | 12:35:49.193392 C > D: P 465921:466433(512) ack 1 win 4608 | |||
| 12:35:49.194726 C > D: P 466433:466945(512) ack 1 win 4608 | 12:35:49.194726 C > D: P 466433:466945(512) ack 1 win 4608 | |||
| 12:35:49.350665 D > C: . ack 466945 win 4096 | 12:35:49.350665 D > C: . ack 466945 win 4096 | |||
| 12:35:49.351694 C > D: . 466945:467457(512) ack 1 win 4608 | 12:35:49.351694 C > D: . 466945:467457(512) ack 1 win 4608 | |||
| 12:35:49.352168 C > D: . 467457:467969(512) ack 1 win 4608 | 12:35:49.352168 C > D: . 467457:467969(512) ack 1 win 4608 | |||
| 12:35:49.352643 C > D: . 467969:468481(512) ack 1 win 4608 | 12:35:49.352643 C > D: . 467969:468481(512) ack 1 win 4608 | |||
| 12:35:49.506000 D > C: . ack 467969 win 3584 | 12:35:49.506000 D > C: . ack 467969 win 3584 | |||
| After C transmits the first packet shown to D, it takes no action | After C transmits the first packet shown to D, it takes no action | |||
| in response to D's ACKs for 461825, because the first packet | in response to D's ACKs for 461825, because the first packet | |||
| skipping to change at page 8, line 30 ¶ | skipping to change at page 9, line 5 ¶ | |||
| congestion window is now MSS (512 bytes). | congestion window is now MSS (512 bytes). | |||
| D acks 465921, indicating that C's retransmission filled a sequence | D acks 465921, indicating that C's retransmission filled a sequence | |||
| hole. This ACK advances C's cwnd from 512 to 1024. Very shortly | hole. This ACK advances C's cwnd from 512 to 1024. Very shortly | |||
| after, D acks 465921 again in order to update the offered window | after, D acks 465921 again in order to update the offered window | |||
| from 2048 to 4096. This ACK does not advance cwnd since it is not | from 2048 to 4096. This ACK does not advance cwnd since it is not | |||
| for new data. Very shortly after, C responds to the newly enlarged | for new data. Very shortly after, C responds to the newly enlarged | |||
| window by transmitting two packets. D acks both, advancing cwnd | window by transmitting two packets. D acks both, advancing cwnd | |||
| from 1024 to 1536. C in turn transmits three packets. | from 1024 to 1536. C in turn transmits three packets. | |||
| References | References | |||
| This problem is documented in [Paxson97]. | This problem is documented in [Paxson97]. | |||
| How to detect | How to detect | |||
| Packet loss is common enough in the Internet that generally it is | Packet loss is common enough in the Internet that generally it is | |||
| not difficult to find an Internet path that will force | not difficult to find an Internet path that will force | |||
| retransmission due to packet loss. | retransmission due to packet loss. | |||
| If the effective window prior to loss is large enough, however, | If the effective window prior to loss is large enough, however, | |||
| then the TCP may retransmit using the "fast recovery" mechanism | then the TCP may retransmit using the "fast recovery" mechanism | |||
| skipping to change at page 9, line 4 ¶ | skipping to change at page 9, line 30 ¶ | |||
| may lead to the transmission of new data, above both the ack point | may lead to the transmission of new data, above both the ack point | |||
| and the highest sequence transmitted so far. An absence of three | and the highest sequence transmitted so far. An absence of three | |||
| duplicate ACKs prior to retransmission suffices to distinguish | duplicate ACKs prior to retransmission suffices to distinguish | |||
| between timeout and fast recovery retransmissions. In the face of | between timeout and fast recovery retransmissions. In the face of | |||
| only observing fast recovery retransmissions, generally it is not | only observing fast recovery retransmissions, generally it is not | |||
| difficult to repeat the data transfer until observing a timeout | difficult to repeat the data transfer until observing a timeout | |||
| retransmission. | retransmission. | |||
| Once armed with a trace exhibiting a timeout retransmission, | Once armed with a trace exhibiting a timeout retransmission, | |||
| determining whether the TCP follows slow start is done by computing | determining whether the TCP follows slow start is done by computing | |||
| the correct progression of cwnd and comparing it to the amount of | the correct progression of cwnd and comparing it to the amount of | |||
| data transmited by the TCP subsequent to the timeout rtransmission. | data transmitted by the TCP subsequent to the timeout | |||
| | retransmission. | |||
| How to fix | How to fix | |||
| If the root problem is that the implementation lacks a notion of a | If the root problem is that the implementation lacks a notion of a | |||
| congestion window, then unfortunately this requires significant | congestion window, then unfortunately this requires significant | |||
| work to fix. However, doing so is critical, for reasons outlined | work to fix. However, doing so is critical, for reasons outlined | |||
| above. | above. | |||
| 3.3. | 3.3. | |||
| Name of Problem | Name of Problem | |||
| Uninitialized CWND | Uninitialized CWND | |||
| Classification | Classification | |||
| Congestion control | Congestion control | |||
| Description | Description | |||
| As described above for "No initial slow start", when a TCP | As described above for "No initial slow start", when a TCP | |||
| connection begins cwnd is initialized to one segment (or perhaps a | connection begins cwnd is initialized to one segment (or perhaps a | |||
| few segments, if experimenting with [Allman98]). One particular | few segments, if experimenting with [RFC2414]). One particular | |||
| form of "No initial slow start", worth separate mention as the bug | form of "No initial slow start", worth separate mention as the bug | |||
| is fairly widely deployed, is "Uninitialized CWND". That is, while | is fairly widely deployed, is "Uninitialized CWND". That is, while | |||
| the TCP implements the proper slow start mechanism, it fails to | the TCP implements the proper slow start mechanism, it fails to | |||
| initialize cwnd properly, so slow start in fact fails to occur. | initialize cwnd properly, so slow start in fact fails to occur. | |||
| The particular bug occurs when, during the connection establishment | One way the bug can occur is if, during the connection | |||
| handshake, the SYN ACK packet arrives without an MSS option. The | establishment handshake, the SYN ACK packet arrives without an MSS | |||
| faulty implementation uses receipt of the MSS option to initialize | option. The faulty implementation uses receipt of the MSS option | |||
| cwnd to one segment; if the option fails to arrive, then cwnd is | to initialize cwnd to one segment; if the option fails to arrive, | |||
| instead initialized to a very large value. | then cwnd is instead initialized to a very large value. | |||
| Significance | Significance | |||
| In congested environments, detrimental to the performance of other | In congested environments, detrimental to the performance of other | |||
| connections, and likely to the connection itself. The burst can be | connections, and likely to the connection itself. The burst can be | |||
| so large (see below) that it has deleterious effects even in | so large (see below) that it has deleterious effects even in | |||
| uncongested environments. | uncongested environments. | |||
| Implications | Implications | |||
| A TCP exhibiting this behavior is stressing the network with a | A TCP exhibiting this behavior is stressing the network with a | |||
| large burst of packets, which can cause loss in the network. | large burst of packets, which can cause loss in the network. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 requires use of slow start. RFC 2001 gives the specifics | RFC 1122 requires use of slow start. RFC 2001 gives the specifics | |||
| of slow start. | of slow start. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| This trace was made using tcpdump/BPF running on host A. Host A is | This trace was made using tcpdump running on host A. Host A is the | |||
| the sender and host B is the receiver. The advertised window and | sender and host B is the receiver. The advertised window and | |||
| timestamp options have been omitted for clarity, except for the | timestamp options have been omitted for clarity, except for the | |||
| first segment sent by host A. Note that A sends an MSS option in | first segment sent by host A. Note that A sends an MSS option in | |||
| its initial SYN but B does not include one in its reply. | its initial SYN but B does not include one in its reply. | |||
| 16:56:02.226937 A > B: S 237585307:237585307(0) win 8192 | 16:56:02.226937 A > B: S 237585307:237585307(0) win 8192 | |||
| <mss 536,nop,wscale 0,nop,nop,timestamp[|tcp]> | <mss 536,nop,wscale 0,nop,nop,timestamp[|tcp]> | |||
| 16:56:02.557135 B > A: S 1617216000:1617216000(0) | 16:56:02.557135 B > A: S 1617216000:1617216000(0) | |||
| ack 237585308 win 16384 | ack 237585308 win 16384 | |||
| 16:56:02.557788 A > B: . ack 1 win 8192 | 16:56:02.557788 A > B: . ack 1 win 8192 | |||
| 16:56:02.566014 A > B: . 1:537(536) ack 1 | 16:56:02.566014 A > B: . 1:537(536) ack 1 | |||
| 16:56:02.566557 A > B: . 537:1073(536) ack 1 | 16:56:02.566557 A > B: . 537:1073(536) ack 1 | |||
| 16:56:02.567120 A > B: . 1073:1609(536) ack 1 | 16:56:02.567120 A > B: . 1073:1609(536) ack 1 | |||
| 16:56:02.567662 A > B: P 1609:2049(440) ack 1 | 16:56:02.567662 A > B: P 1609:2049(440) ack 1 | |||
| 16:56:02.568349 A > B: . 2049:2585(536) ack 1 | 16:56:02.568349 A > B: . 2049:2585(536) ack 1 | |||
| 16:56:02.568909 A > B: . 2585:3121(536) ack 1 | 16:56:02.568909 A > B: . 2585:3121(536) ack 1 | |||
| [54 additional burst segments deleted for brevity] | [54 additional burst segments deleted for brevity] | |||
| 16:56:02.936638 A > B: . 32065:32601(536) ack 1 | 16:56:02.936638 A > B: . 32065:32601(536) ack 1 | |||
| 16:56:03.018685 B > A: . ack 1 | 16:56:03.018685 B > A: . ack 1 | |||
| After the three-way handshake, host A bursts 61 segments into the | After the three-way handshake, host A bursts 61 segments into the | |||
| network, before duplicate ACKs on the first segment cause a | network, before duplicate ACKs on the first segment cause a | |||
| retransmission to occur. Since host A did not wait for the ACK on | retransmission to occur. Since host A did not wait for the ACK on | |||
| the first segment before sending additional segments, it is | the first segment before sending additional segments, it is | |||
| exhibiting "Uninitialized CWND" | exhibiting "Uninitialized CWND" | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| skipping to change at page 11, line 5 ¶ | skipping to change at page 11, line 29 ¶ | |||
| References | References | |||
| This problem is documented in [Paxson97]. | This problem is documented in [Paxson97]. | |||
| How to detect | How to detect | |||
| This problem can be detected by examining a packet trace recorded | This problem can be detected by examining a packet trace recorded | |||
| at either the sender or the receiver. However, the bug can be | at either the sender or the receiver. However, the bug can be | |||
| difficult to induce because it requires finding a remote TCP peer | difficult to induce because it requires finding a remote TCP peer | |||
| that does not send an MSS option in its SYN ACK. | that does not send an MSS option in its SYN ACK. | |||
| How to fix | How to fix | |||
| This problem can be fixed by ensuring that cwnd is initialized upon | This problem can be fixed by ensuring that cwnd is initialized upon | |||
| receipt of a SYN ACK, even if the SYN ACK does not contain an MSS | receipt of a SYN ACK, even if the SYN ACK does not contain an MSS | |||
| option. | option. | |||
| 3.4. | 3.4. | |||
| Name of Problem | Name of Problem | |||
| Inconsistent retransmission | Inconsistent retransmission | |||
| Classification | Classification | |||
| Reliability | Reliability | |||
| Description | Description | |||
| If, for a given sequence number, a sending TCP retransmits | If, for a given sequence number, a sending TCP retransmits | |||
| different data than previously sent for that sequence number, then | different data than previously sent for that sequence number, then | |||
| a strong possibility arises that the receiving TCP will reconstruct | a strong possibility arises that the receiving TCP will reconstruct | |||
| a different byte stream than that sent by the sending application, | a different byte stream than that sent by the sending application, | |||
| depending on which instance of the sequence number it accepts. | depending on which instance of the sequence number it accepts. | |||
| Such a sending TCP exhibits "Inconsistent retransmission". | Such a sending TCP exhibits "Inconsistent retransmission". | |||
| Significance | Significance | |||
| Critical for all environments. | Critical for all environments. | |||
| Implications | Implications | |||
| Reliable delivery of data is a fundamental property of TCP. | Reliable delivery of data is a fundamental property of TCP. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 793, section 1.5, discusses the central role of reliability in | RFC 793, section 1.5, discusses the central role of reliability in | |||
| TCP operation. | TCP operation. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump/BPF recording at the receiving TCP (B). No | Made using tcpdump recording at the receiving TCP (B). No losses | |||
| losses reported. | reported by the packet filter. | |||
| 12:35:53.145503 A > B: FP 90048435:90048461(26) ack 393464682 win 4096 | 12:35:53.145503 A > B: FP 90048435:90048461(26) ack 393464682 win 4096 | |||
| 4500 0042 9644 0000 | 4500 0042 9644 0000 | |||
| 3006 e4c2 86b1 0401 83f3 010a b2a4 0015 | 3006 e4c2 86b1 0401 83f3 010a b2a4 0015 | |||
| 055e 07b3 1773 cb6a 5019 1000 68a9 0000 | 055e 07b3 1773 cb6a 5019 1000 68a9 0000 | |||
| data starts here>504f 5254 2031 3334 2c31 3737*2c34 2c31 | data starts here>504f 5254 2031 3334 2c31 3737*2c34 2c31 | |||
| 2c31 3738 2c31 3635 0d0a | 2c31 3738 2c31 3635 0d0a | |||
| 12:35:53.146479 B > A: R 393464682:393464682(0) win 8192 | 12:35:53.146479 B > A: R 393464682:393464682(0) win 8192 | |||
| 12:35:53.851714 A > B: FP 90048429:90048463(34) ack 393464682 win 4096 | 12:35:53.851714 A > B: FP 90048429:90048463(34) ack 393464682 win 4096 | |||
| 4500 004a 965b 0000 | 4500 004a 965b 0000 | |||
| 3006 e4a3 86b1 0401 83f3 010a b2a4 0015 | 3006 e4a3 86b1 0401 83f3 010a b2a4 0015 | |||
| 055e 07ad 1773 cb6a 5019 1000 8bd3 0000 | 055e 07ad 1773 cb6a 5019 1000 8bd3 0000 | |||
| data starts here>5041 5356 0d0a 504f 5254 2031 3334 2c31 | data starts here>5041 5356 0d0a 504f 5254 2031 3334 2c31 | |||
| 3737*2c31 3035 2c31 3431 2c34 2c31 3539 | 3737*2c31 3035 2c31 3431 2c34 2c31 3539 | |||
| 0d0a | 0d0a | |||
| The sequence numbers shown in this trace are absolute and not | The sequence numbers shown in this trace are absolute and not | |||
| adjusted to reflect the ISN. The 4-digit hex values show a dump of | adjusted to reflect the ISN. The 4-digit hex values show a dump of | |||
| the packet's IP and TCP headers, as well as payload. A first sends | the packet's IP and TCP headers, as well as payload. A first sends | |||
| to B data for 90048435:90048461. The corresponding data begins | to B data for 90048435:90048461. The corresponding data begins | |||
| with hex words 504f, 5254, etc. | with hex words 504f, 5254, etc. | |||
| skipping to change at page 12, line 30 ¶ | skipping to change at page 13, line 5 ¶ | |||
| A then sends 90048429:90048463, which includes six sequence | A then sends 90048429:90048463, which includes six sequence | |||
| positions below the earlier transmission, all 26 positions of the | positions below the earlier transmission, all 26 positions of the | |||
| earlier transmission, and two additional sequence positions. | earlier transmission, and two additional sequence positions. | |||
| The retransmission disagrees starting just after sequence 90048447, | The retransmission disagrees starting just after sequence 90048447, | |||
| annotated above with a leading '*'. These two bytes were | annotated above with a leading '*'. These two bytes were | |||
| originally transmitted as hex 2c34 but retransmitted as hex 2c31. | originally transmitted as hex 2c34 but retransmitted as hex 2c31. | |||
| Subsequent positions disagree as well. | Subsequent positions disagree as well. | |||
| This behavior has been observed in other traces involving different | This behavior has been observed in other traces involving different | |||
| hosts. It is unknown how to repeat it. | hosts. It is unknown how to repeat it. | |||
| In this instance, no corruption would occur, since B has already | In this instance, no corruption would occur, since B has already | |||
| indicated it will not accept further packets from A. | indicated it will not accept further packets from A. | |||
| A second example illustrates a slightly different instance of the | A second example illustrates a slightly different instance of the | |||
| problem. The tracing again was made with tcpdump/BPF at the | problem. The tracing again was made with tcpdump at the receiving | |||
| receiving TCP (D). | TCP (D). | |||
| 22:23:58.645829 C > D: P 185:212(27) ack 565 win 4096 | 22:23:58.645829 C > D: P 185:212(27) ack 565 win 4096 | |||
| 4500 0043 90a3 0000 | 4500 0043 90a3 0000 | |||
| 3306 0734 cbf1 9eef 83f3 010a 0525 0015 | 3306 0734 cbf1 9eef 83f3 010a 0525 0015 | |||
| a3a2 faba 578c 70a4 5018 1000 9a53 0000 | a3a2 faba 578c 70a4 5018 1000 9a53 0000 | |||
| data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538 | data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538 | |||
| 2c32 3339 2c35 2c34 330d 0a | 2c32 3339 2c35 2c34 330d 0a | |||
| 22:23:58.646805 D > C: . ack 184 win 8192 | 22:23:58.646805 D > C: . ack 184 win 8192 | |||
| 4500 0028 beeb 0000 | 4500 0028 beeb 0000 | |||
| 3e06 ce06 83f3 010a cbf1 9eef 0015 0525 | 3e06 ce06 83f3 010a cbf1 9eef 0015 0525 | |||
| 578c 70a4 a3a2 fab9 5010 2000 342f 0000 | 578c 70a4 a3a2 fab9 5010 2000 342f 0000 | |||
| 22:31:36.532244 C > D: FP 186:213(27) ack 565 win 4096 | 22:31:36.532244 C > D: FP 186:213(27) ack 565 win 4096 | |||
| 4500 0043 9435 0000 | 4500 0043 9435 0000 | |||
| 3306 03a2 cbf1 9eef 83f3 010a 0525 0015 | 3306 03a2 cbf1 9eef 83f3 010a 0525 0015 | |||
| a3a2 fabb 578c 70a4 5019 1000 9a51 0000 | a3a2 fabb 578c 70a4 5019 1000 9a51 0000 | |||
| data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538 | data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538 | |||
| 2c32 3339 2c35 2c34 330d 0a | 2c32 3339 2c35 2c34 330d 0a | |||
| In this trace, sequence numbers are relative. C sends 185:212, but | In this trace, sequence numbers are relative. C sends 185:212, but | |||
| D only sends an ACK for 184 (so sequence number 184 is missing). C | D only sends an ACK for 184 (so sequence number 184 is missing). C | |||
| then sends 186:213. The packet payload is identical to the | then sends 186:213. The packet payload is identical to the | |||
| previous payload, but the base sequence number is one higher, | previous payload, but the base sequence number is one higher, | |||
| resulting in an inconsistent retransmission. | resulting in an inconsistent retransmission. | |||
| Neither trace exhibits checksum errors. | Neither trace exhibits checksum errors. | |||
| skipping to change at page 13, line 29 ¶ | skipping to change at page 14, line 5 ¶ | |||
| References | References | |||
| None known. | None known. | |||
| How to detect | How to detect | |||
| This problem unfortunately can be very difficult to detect, since | This problem unfortunately can be very difficult to detect, since | |||
| available experience indicates it is quite rare that it is | available experience indicates it is quite rare that it is | |||
| manifested. No "trigger" has been identified that can be used to | manifested. No "trigger" has been identified that can be used to | |||
| reproduce the problem. | reproduce the problem. | |||
| How to fix | How to fix | |||
| In the absence of a known "trigger", we cannot always assess how to | In the absence of a known "trigger", we cannot always assess how to | |||
| fix the problem. | fix the problem. | |||
| In one implementation (not the one illustrated above), the problem | In one implementation (not the one illustrated above), the problem | |||
| manifested itself when (1) the sender received a zero window and | manifested itself when (1) the sender received a zero window and | |||
| stalled; (2) eventually an ACK arrived that offered a window larger | stalled; (2) eventually an ACK arrived that offered a window larger | |||
| than that in effect at the time of the stall; (3) the sender | than that in effect at the time of the stall; (3) the sender | |||
| transmitted out of the buffer of data it held at the time of the | transmitted out of the buffer of data it held at the time of the | |||
| stall, but (4) failed to limit this transfer to the buffer length, | stall, but (4) failed to limit this transfer to the buffer length, | |||
| skipping to change at page 14, line 5 ¶ | skipping to change at page 14, line 30 ¶ | |||
| retransmitted the corresponding sequence numbers, at that point it | retransmitted the corresponding sequence numbers, at that point it | |||
| sent the correct data, resulting in an inconsistent retransmission. | sent the correct data, resulting in an inconsistent retransmission. | |||
| Note that this instance of the problem reflects a more general | Note that this instance of the problem reflects a more general | |||
| problem, that of initially transmitting incorrect data. | problem, that of initially transmitting incorrect data. | |||
| 3.5. | 3.5. | |||
| Name of Problem | Name of Problem | |||
| Failure to retain above-sequence data | Failure to retain above-sequence data | |||
| Classification | Classification | |||
| Congestion control, performance | Congestion control, performance | |||
| Description | Description | |||
| When a TCP receives an "above sequence" segment, meaning one with a | When a TCP receives an "above sequence" segment, meaning one with a | |||
| sequence number exceeding RCV.NXT but below RCV.NXT+RCV.WND, it | sequence number exceeding RCV.NXT but below RCV.NXT+RCV.WND, it | |||
| SHOULD queue the segment for later delivery (RFC 1122, 4.2.2.20). | SHOULD queue the segment for later delivery (RFC 1122, 4.2.2.20). | |||
| A TCP that fails to do so is said to exhibit "Failure to retain | (See RFC 793 for the definition of RCV.NXT and RCV.WND.) A TCP | |||
| above-sequence data". | that fails to do so is said to exhibit "Failure to retain above- | |||
| | sequence data". | |||
| It may sometimes be appropriate for a TCP to discard above-sequence | It may sometimes be appropriate for a TCP to discard above-sequence | |||
| data to reclaim memory. If they do so only rarely, then we would | data to reclaim memory. If they do so only rarely, then we would | |||
| not consider them to exhibit this problem. Instead, the particular | not consider them to exhibit this problem. Instead, the particular | |||
| concern is with TCPs that always discard above-sequence data. | concern is with TCPs that always discard above-sequence data. | |||
| Significance | Significance | |||
| In environments prone to packet loss, detrimental to the | In environments prone to packet loss, detrimental to the | |||
| performance of both other connections and the connection itself. | performance of both other connections and the connection itself. | |||
| Implications | Implications | |||
| In times of congestion, a failure to retain above-sequence data | In times of congestion, a failure to retain above-sequence data | |||
| will lead to numerous otherwise-unnecessary retransmissions, | will lead to numerous otherwise-unnecessary retransmissions, | |||
| aggravating the congestion and potentially reducing performance by | aggravating the congestion and potentially reducing performance by | |||
| a large factor. | a large factor. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 revises RFC 793 by upgrading the latter's MAY to a SHOULD | RFC 1122 revises RFC 793 by upgrading the latter's MAY to a SHOULD | |||
| on this issue. | on this issue. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump/BPF recording at the receiving TCP. No losses | Made using tcpdump recording at the receiving TCP. No losses | |||
| reported. | reported by the packet filter. | |||
| B is the TCP sender, A the receiver. A exhibits failure to retain | B is the TCP sender, A the receiver. A exhibits failure to retain | |||
| above sequence data: | above sequence data: | |||
| 10:38:10.164860 B > A: . 221078:221614(536) ack 1 win 33232 [tos 0x8] | 10:38:10.164860 B > A: . 221078:221614(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.170809 B > A: . 221614:222150(536) ack 1 win 33232 [tos 0x8] | 10:38:10.170809 B > A: . 221614:222150(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.177183 B > A: . 222150:222686(536) ack 1 win 33232 [tos 0x8] | 10:38:10.177183 B > A: . 222150:222686(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.225039 A > B: . ack 222686 win 25800 | 10:38:10.225039 A > B: . ack 222686 win 25800 | |||
| Here B has sent up to (relative) sequence 222686 in-sequence, and A | Here B has sent up to (relative) sequence 222686 in-sequence, and A | |||
| accordingly acknowledges. | accordingly acknowledges. | |||
| 10:38:10.268131 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8] | 10:38:10.268131 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.337995 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8] | 10:38:10.337995 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.344065 B > A: . 224294:224830(536) ack 1 win 33232 [tos 0x8] | 10:38:10.344065 B > A: . 224294:224830(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.350169 B > A: . 224830:225366(536) ack 1 win 33232 [tos 0x8] | 10:38:10.350169 B > A: . 224830:225366(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.356362 B > A: . 225366:225902(536) ack 1 win 33232 [tos 0x8] | 10:38:10.356362 B > A: . 225366:225902(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.362445 B > A: . 225902:226438(536) ack 1 win 33232 [tos 0x8] | 10:38:10.362445 B > A: . 225902:226438(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.368579 B > A: . 226438:226974(536) ack 1 win 33232 [tos 0x8] | 10:38:10.368579 B > A: . 226438:226974(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.374732 B > A: . 226974:227510(536) ack 1 win 33232 [tos 0x8] | 10:38:10.374732 B > A: . 226974:227510(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.380825 B > A: . 227510:228046(536) ack 1 win 33232 [tos 0x8] | 10:38:10.380825 B > A: . 227510:228046(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.387027 B > A: . 228046:228582(536) ack 1 win 33232 [tos 0x8] | 10:38:10.387027 B > A: . 228046:228582(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.393053 B > A: . 228582:229118(536) ack 1 win 33232 [tos 0x8] | 10:38:10.393053 B > A: . 228582:229118(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:10.399193 B > A: . 229118:229654(536) ack 1 win 33232 [tos 0x8] | 10:38:10.399193 B > A: . 229118:229654(536) ack 1 win 33232 [tos 0x8] | |||
| skipping to change at page 15, line 30 ¶ | skipping to change at page 16, line 5 ¶ | |||
| sequence because 222686:223222 was dropped. The packets do however | sequence because 222686:223222 was dropped. The packets do however | |||
| fit within the offered window of 25800. A does not generate any | fit within the offered window of 25800. A does not generate any | |||
| duplicate ACKs for them. | duplicate ACKs for them. | |||
| The trace contributor (V. Paxson) verified that these 13 packets | The trace contributor (V. Paxson) verified that these 13 packets | |||
| had valid IP and TCP checksums. | had valid IP and TCP checksums. | |||
| 10:38:11.917728 B > A: . 222686:223222(536) ack 1 win 33232 [tos 0x8] | 10:38:11.917728 B > A: . 222686:223222(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:11.930925 A > B: . ack 223222 win 32232 | 10:38:11.930925 A > B: . ack 223222 win 32232 | |||
| B times out for 222686:223222 and retransmits it. Upon receiving | B times out for 222686:223222 and retransmits it. Upon receiving | |||
| it, A only acknowledges 223222. Had it retained the valid above- | it, A only acknowledges 223222. Had it retained the valid above- | |||
| sequence packets, it would instead have ack'd 230190. | sequence packets, it would instead have ack'd 230190. | |||
| 10:38:12.048438 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8] | 10:38:12.048438 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:12.054397 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8] | 10:38:12.054397 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8] | |||
| 10:38:12.068029 A > B: . ack 224294 win 31696 | 10:38:12.068029 A > B: . ack 224294 win 31696 | |||
| B retransmits two more packets, and A only acknowledges them. This | B retransmits two more packets, and A only acknowledges them. This | |||
| pattern continues as B retransmits the entire set of previously- | pattern continues as B retransmits the entire set of previously- | |||
| received packets. | received packets. | |||
| A second trace confirmed that the problem is repeatable. | A second trace confirmed that the problem is repeatable. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using tcpdump/BPF recording at the receiving TCP (C). No | Made using tcpdump recording at the receiving TCP (C). No losses | |||
| losses reported. | reported by the packet filter. | |||
| 09:11:25.790417 D > C: . 33793:34305(512) ack 1 win 61440 | 09:11:25.790417 D > C: . 33793:34305(512) ack 1 win 61440 | |||
| 09:11:25.791393 D > C: . 34305:34817(512) ack 1 win 61440 | 09:11:25.791393 D > C: . 34305:34817(512) ack 1 win 61440 | |||
| 09:11:25.792369 D > C: . 34817:35329(512) ack 1 win 61440 | 09:11:25.792369 D > C: . 34817:35329(512) ack 1 win 61440 | |||
| 09:11:25.792369 D > C: . 35329:35841(512) ack 1 win 61440 | 09:11:25.792369 D > C: . 35329:35841(512) ack 1 win 61440 | |||
| 09:11:25.793345 D > C: . 36353:36865(512) ack 1 win 61440 | 09:11:25.793345 D > C: . 36353:36865(512) ack 1 win 61440 | |||
| 09:11:25.794321 C > D: . ack 35841 win 59904 | 09:11:25.794321 C > D: . ack 35841 win 59904 | |||
| A sequence hole occurs because 35841:36353 has been dropped. | A sequence hole occurs because 35841:36353 has been dropped. | |||
| 09:11:25.794321 D > C: . 36865:37377(512) ack 1 win 61440 | 09:11:25.794321 D > C: . 36865:37377(512) ack 1 win 61440 | |||
| 09:11:25.794321 C > D: . ack 35841 win 59904 | 09:11:25.794321 C > D: . ack 35841 win 59904 | |||
| 09:11:25.795297 D > C: . 37377:37889(512) ack 1 win 61440 | 09:11:25.795297 D > C: . 37377:37889(512) ack 1 win 61440 | |||
| 09:11:25.795297 C > D: . ack 35841 win 59904 | 09:11:25.795297 C > D: . ack 35841 win 59904 | |||
| 09:11:25.796273 C > D: . ack 35841 win 61440 | 09:11:25.796273 C > D: . ack 35841 win 61440 | |||
| 09:11:25.798225 D > C: . 37889:38401(512) ack 1 win 61440 | 09:11:25.798225 D > C: . 37889:38401(512) ack 1 win 61440 | |||
| 09:11:25.799201 C > D: . ack 35841 win 61440 | 09:11:25.799201 C > D: . ack 35841 win 61440 | |||
| 09:11:25.807009 D > C: . 38401:38913(512) ack 1 win 61440 | 09:11:25.807009 D > C: . 38401:38913(512) ack 1 win 61440 | |||
| skipping to change at page 16, line 29 ¶ | skipping to change at page 17, line 4 ¶ | |||
| 09:11:25.884113 D > C: . 52737:53249(512) ack 1 win 61440 | 09:11:25.884113 D > C: . 52737:53249(512) ack 1 win 61440 | |||
| 09:11:25.884113 C > D: . ack 35841 win 61440 | 09:11:25.884113 C > D: . ack 35841 win 61440 | |||
| Each additional, above-sequence packet C receives from D elicits a | Each additional, above-sequence packet C receives from D elicits a | |||
| duplicate ACK for 35841. | duplicate ACK for 35841. | |||
| 09:11:25.887041 D > C: . 35841:36353(512) ack 1 win 61440 | 09:11:25.887041 D > C: . 35841:36353(512) ack 1 win 61440 | |||
| 09:11:25.887041 C > D: . ack 53249 win 44032 | 09:11:25.887041 C > D: . ack 53249 win 44032 | |||
| D retransmits 35841:36353 and C acknowledges receipt of data all | D retransmits 35841:36353 and C acknowledges receipt of data all | |||
| the way up to 53249. | the way up to 53249. | |||
| References | References | |||
| This problem is documented in [Paxson97]. | This problem is documented in [Paxson97]. | |||
| How to detect | How to detect | |||
| Packet loss is common enough in the Internet that generally it is | Packet loss is common enough in the Internet that generally it is | |||
| not difficult to find an Internet path that will result in some | not difficult to find an Internet path that will result in some | |||
| above-sequence packets arriving. A TCP that exhibits "Failure to | above-sequence packets arriving. A TCP that exhibits "Failure to | |||
| retain ..." may not generate duplicate ACKs for these packets. | retain ..." may not generate duplicate ACKs for these packets. | |||
| skipping to change at page 17, line 4 ¶ | skipping to change at page 17, line 31 ¶ | |||
| above-sequence is acknowledged. | above-sequence is acknowledged. | |||
| Two considerations in detecting this problem using a packet trace | Two considerations in detecting this problem using a packet trace | |||
| are that it is easiest to do so with a trace made at the TCP | are that it is easiest to do so with a trace made at the TCP | |||
| receiver, in order to unambiguously determine which packets arrived | receiver, in order to unambiguously determine which packets arrived | |||
| successfully, and that such packets may still be correctly | successfully, and that such packets may still be correctly | |||
| discarded if they arrive with checksum errors. The latter can be | discarded if they arrive with checksum errors. The latter can be | |||
| tested by capturing the entire packet contents and performing the | tested by capturing the entire packet contents and performing the | |||
| IP and TCP checksum algorithms to verify their integrity; or by | IP and TCP checksum algorithms to verify their integrity; or by | |||
| confirming that the packets arrive with the same checksum and | confirming that the packets arrive with the same checksum and | |||
| contents as that with which they were sent, with a presumption that | contents as that with which they were sent, with a presumption that | |||
| the sending TCP correctly calculates checksums for the packets it | the sending TCP correctly calculates checksums for the packets it | |||
| transmits. | transmits. | |||
| It is considerably easier to verify that an implementation does NOT | It is considerably easier to verify that an implementation does NOT | |||
| exhibit this problem. This can be done by recording a trace at the | exhibit this problem. This can be done by recording a trace at the | |||
| data sender, and observing that sometimes after a retransmission | data sender, and observing that sometimes after a retransmission | |||
| the receiver acknowledges a higher sequence number than just that | the receiver acknowledges a higher sequence number than just that | |||
| which was retransmitted. | which was retransmitted. | |||
| How to fix | How to fix | |||
| If the root problem is that the implementation lacks buffer, then | If the root problem is that the implementation lacks buffer, then | |||
| unfortunately this requires significant work to fix. However, | unfortunately this requires significant work to fix. However, | |||
| doing so is important, for reasons outlined above. | doing so is important, for reasons outlined above. | |||
| 3.6. | 3.6. | |||
| Name of Problem | Name of Problem | |||
| Extra additive constant in congestion avoidance | Extra additive constant in congestion avoidance | |||
| Classification | Classification | |||
| Congestion control / performance | Congestion control / performance | |||
| Description | Description | |||
| RFC 1122 section 4.2.2.15 states that TCP MUST implement Jacobson's | RFC 1122 section 4.2.2.15 states that TCP MUST implement Jacobson's | |||
| skipping to change at page 18, line 5 ¶ | skipping to change at page 18, line 35 ¶ | |||
| Some TCP implementations add an additional fraction of a segment | Some TCP implementations add an additional fraction of a segment | |||
| (typically MSS/8) to cwnd for each ACK received for new data | (typically MSS/8) to cwnd for each ACK received for new data | |||
| [Stevens94, Wright95]: | [Stevens94, Wright95]: | |||
| (MSS * MSS / cwnd) + MSS/8 | (MSS * MSS / cwnd) + MSS/8 | |||
| These implementations exhibit "Extra additive constant in | These implementations exhibit "Extra additive constant in | |||
| congestion avoidance". | congestion avoidance". | |||
| Significance | Significance | |||
| May be detrimental to performance even in completely uncongested | May be detrimental to performance even in completely uncongested | |||
| environments (see Implications). | environments (see Implications). | |||
| In congested environments, may also be detrimental to the | In congested environments, may also be detrimental to the | |||
| performance of other connections. | performance of other connections. | |||
| Implications | Implications | |||
| The extra additive term allows a TCP to more aggressively open its | The extra additive term allows a TCP to more aggressively open its | |||
| congestion window (quadratic rather than linear increase). For | congestion window (quadratic rather than linear increase). For | |||
| congested networks, this can increase the loss rate experienced by | congested networks, this can increase the loss rate experienced by | |||
| all connections sharing a bottleneck with the aggressive TCP. | all connections sharing a bottleneck with the aggressive TCP. | |||
| However, even for completely uncongested networks, the extra | However, even for completely uncongested networks, the extra | |||
| additive term can lead to diminished performance, as follows. In | additive term can lead to diminished performance, as follows. In | |||
| congestion avoidance, a TCP sender probes the network path to | congestion avoidance, a TCP sender probes the network path to | |||
| determine its available capacity, which often equates to the number | determine its available capacity, which often equates to the number | |||
| of buffers available at a bottleneck link. With linear congestion | of buffers available at a bottleneck link. With linear congestion | |||
| avoidance, the TCP only probes for sufficient capacity (buffer) to | avoidance, the TCP only probes for sufficient capacity (buffer) to | |||
| hold one extra packet per RTT. | hold one extra packet per RTT. | |||
| Thus, when it exceeds the available capacity, generally only one | Thus, when it exceeds the available capacity, generally only one | |||
| packet will be lost (since on the previous RTT it already found | packet will be lost (since on the previous RTT it already found | |||
| that the path could sustain a window with one less packet in | that the path could sustain a window with one less packet in | |||
| flight). If the congestion window is sufficiently large, then the | flight). If the congestion window is sufficiently large, then the | |||
| TCP will recover from this single loss using fast retransmission | TCP will recover from this single loss using fast retransmission | |||
| and avoid an expensive (in terms of performance) retransmission | and avoid an expensive (in terms of performance) retransmission | |||
| timeout. | timeout. | |||
| However, when the additional additive term is used, then cwnd can | However, when the additional additive term is used, then cwnd can | |||
| increase by more than one packet per RTT, in which case the TCP | increase by more than one packet per RTT, in which case the TCP | |||
| probes more aggressively. If in the previous RTT it had reached | probes more aggressively. If in the previous RTT it had reached | |||
| the available capacity of the path, then the excess due to the | the available capacity of the path, then the excess due to the | |||
| increase will again be lost, but now this will result in multiple | extra increase will again be lost, but now this will result in | |||
| losses from the flight instead of a single loss. TCPs that do not | multiple losses from the flight instead of a single loss. TCPs | |||
| utilize SACK [RFC2018] generally will not recover from multiple | that do not utilize SACK [RFC2018] generally will not recover from | |||
| losses without incurring a retransmission timeout [Fall96,Hoe96], | multiple losses without incurring a retransmission timeout | |||
| significantly diminishing performance. | [Fall96,Hoe96], significantly diminishing performance. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 requires use of the "congestion avoidance" algorithm. RFC | RFC 1122 requires use of the "congestion avoidance" algorithm. RFC | |||
| 2001 outlines the fast retransmit/fast recovery algorithms. RFC | 2001 outlines the fast retransmit/fast recovery algorithms. RFC | |||
| 2018 discusses the SACK option. | 2018 discusses the SACK option. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Recorded using tcpdump running on the same FDDI LAN as host A. | Recorded using tcpdump running on the same FDDI LAN as host A. | |||
| Host A is the sender and host B is the receiver. The connection | Host A is the sender and host B is the receiver. The connection | |||
| establishment specified an MSS of 4,312 bytes and a window scale | establishment specified an MSS of 4,312 bytes and a window scale | |||
| factor of 4. We omit the establishment and the first 2.5 MB of | factor of 4. We omit the establishment and the first 2.5 MB of | |||
| data transfer, as the problem is best demonstrated when the window | data transfer, as the problem is best demonstrated when the window | |||
| has grown to a large value. At the beginning of the trace excerpt, | has grown to a large value. At the beginning of the trace excerpt, | |||
| the congestion window is 31 packets. The connection is never | the congestion window is 31 packets. The connection is never | |||
| receiver-window limited, so we omit window advertisements from the | receiver-window limited, so we omit window advertisements from the | |||
| trace for clarity. | trace for clarity. | |||
| 11:42:07.697951 B > A: . ack 2383006 | 11:42:07.697951 B > A: . ack 2383006 | |||
| 11:42:07.699388 A > B: . 2508054:2512366(4312) | 11:42:07.699388 A > B: . 2508054:2512366(4312) | |||
| 11:42:07.699962 A > B: . 2512366:2516678(4312) | 11:42:07.699962 A > B: . 2512366:2516678(4312) | |||
| 11:42:07.700012 B > A: . ack 2391630 | 11:42:07.700012 B > A: . ack 2391630 | |||
| 11:42:07.701081 A > B: . 2516678:2520990(4312) | 11:42:07.701081 A > B: . 2516678:2520990(4312) | |||
| 11:42:07.701656 A > B: . 2520990:2525302(4312) | 11:42:07.701656 A > B: . 2520990:2525302(4312) | |||
| 11:42:07.701739 B > A: . ack 2400254 | 11:42:07.701739 B > A: . ack 2400254 | |||
| 11:42:07.702685 A > B: . 2525302:2529614(4312) | 11:42:07.702685 A > B: . 2525302:2529614(4312) | |||
| 11:42:07.703257 A > B: . 2529614:2533926(4312) | 11:42:07.703257 A > B: . 2529614:2533926(4312) | |||
| 11:42:07.703295 B > A: . ack 2408878 | 11:42:07.703295 B > A: . ack 2408878 | |||
| 11:42:07.704414 A > B: . 2533926:2538238(4312) | 11:42:07.704414 A > B: . 2533926:2538238(4312) | |||
| 11:42:07.704989 A > B: . 2538238:2542550(4312) | 11:42:07.704989 A > B: . 2538238:2542550(4312) | |||
| 11:42:07.705040 B > A: . ack 2417502 | 11:42:07.705040 B > A: . ack 2417502 | |||
| 11:42:07.705935 A > B: . 2542550:2546862(4312) | 11:42:07.705935 A > B: . 2542550:2546862(4312) | |||
| 11:42:07.706506 A > B: . 2546862:2551174(4312) | 11:42:07.706506 A > B: . 2546862:2551174(4312) | |||
| 11:42:07.706544 B > A: . ack 2426126 | 11:42:07.706544 B > A: . ack 2426126 | |||
| skipping to change at page 20, line 4 ¶ | skipping to change at page 20, line 39 ¶ | |||
| 11:42:07.712898 A > B: . 2585670:2589982(4312) | 11:42:07.712898 A > B: . 2585670:2589982(4312) | |||
| 11:42:07.712938 B > A: . ack 2460622 | 11:42:07.712938 B > A: . ack 2460622 | |||
| 11:42:07.713926 A > B: . 2589982:2594294(4312) | 11:42:07.713926 A > B: . 2589982:2594294(4312) | |||
| 11:42:07.714501 A > B: . 2594294:2598606(4312) | 11:42:07.714501 A > B: . 2594294:2598606(4312) | |||
| 11:42:07.714547 B > A: . ack 2469246 | 11:42:07.714547 B > A: . ack 2469246 | |||
| 11:42:07.715747 A > B: . 2598606:2602918(4312) | 11:42:07.715747 A > B: . 2598606:2602918(4312) | |||
| 11:42:07.716287 A > B: . 2602918:2607230(4312) | 11:42:07.716287 A > B: . 2602918:2607230(4312) | |||
| 11:42:07.716328 B > A: . ack 2477870 | 11:42:07.716328 B > A: . ack 2477870 | |||
| 11:42:07.717146 A > B: . 2607230:2611542(4312) | 11:42:07.717146 A > B: . 2607230:2611542(4312) | |||
| 11:42:07.717717 A > B: . 2611542:2615854(4312) | 11:42:07.717717 A > B: . 2611542:2615854(4312) | |||
| 11:42:07.717762 B > A: . ack 2486494 | 11:42:07.717762 B > A: . ack 2486494 | |||
| 11:42:07.718754 A > B: . 2615854:2620166(4312) | 11:42:07.718754 A > B: . 2615854:2620166(4312) | |||
| 11:42:07.719331 A > B: . 2620166:2624478(4312) | 11:42:07.719331 A > B: . 2620166:2624478(4312) | |||
| 11:42:07.719906 A > B: . 2624478:2628790(4312) ** | 11:42:07.719906 A > B: . 2624478:2628790(4312) ** | |||
| 11:42:07.719958 B > A: . ack 2495118 | 11:42:07.719958 B > A: . ack 2495118 | |||
| 11:42:07.720500 A > B: . 2628790:2633102(4312) | 11:42:07.720500 A > B: . 2628790:2633102(4312) | |||
| 11:42:07.721080 A > B: . 2633102:2637414(4312) | 11:42:07.721080 A > B: . 2633102:2637414(4312) | |||
| 11:42:07.721739 B > A: . ack 2503742 | 11:42:07.721739 B > A: . ack 2503742 | |||
| 11:42:07.722348 A > B: . 2637414:2641726(4312) | 11:42:07.722348 A > B: . 2637414:2641726(4312) | |||
| 11:42:07.722918 A > B: . 2641726:2646038(4312) | 11:42:07.722918 A > B: . 2641726:2646038(4312) | |||
| 11:42:07.769248 B > A: . ack 2512366 | 11:42:07.769248 B > A: . ack 2512366 | |||
| The receiver's acknowledgment policy is one ACK per two packets | The receiver's acknowledgment policy is one ACK per two packets | |||
| received. Thus, for each ACK arriving at host A, two new packets | received. Thus, for each ACK arriving at host A, two new packets | |||
| are sent, except when cwnd increases due to congestion avoidance, | are sent, except when cwnd increases due to congestion avoidance, | |||
| in which case three new packets are sent. | in which case three new packets are sent. | |||
| With an ack-every-two-packets policy, cwnd should only increase one | With an ack-every-two-packets policy, cwnd should only increase one | |||
| MSS per 2 RTT. However, at the point marked "*" the window | MSS per 2 RTT. However, at the point marked "*" the window | |||
| increases after 7 ACKs have arrived, and then again at "**" after 6 | increases after 7 ACKs have arrived, and then again at "**" after 6 | |||
| more ACKs. | more ACKs. | |||
| While we do not have space to show the effect, this trace suffered | While we do not have space to show the effect, this trace suffered | |||
| from repeated timeout retransmissions due to multiple packet losses | from repeated timeout retransmissions due to multiple packet losses | |||
| during a single RTT. | during a single RTT. | |||
| skipping to change at page 21, line 4 ¶ | skipping to change at page 21, line 38 ¶ | |||
| aggressive with opening the window). | aggressive with opening the window). | |||
| 14:22:21.236757 B > A: . ack 5194679 | 14:22:21.236757 B > A: . ack 5194679 | |||
| 14:22:21.238192 A > B: . 5319727:5324039(4312) | 14:22:21.238192 A > B: . 5319727:5324039(4312) | |||
| 14:22:21.238770 A > B: . 5324039:5328351(4312) | 14:22:21.238770 A > B: . 5324039:5328351(4312) | |||
| 14:22:21.238821 B > A: . ack 5203303 | 14:22:21.238821 B > A: . ack 5203303 | |||
| 14:22:21.240158 A > B: . 5328351:5332663(4312) | 14:22:21.240158 A > B: . 5328351:5332663(4312) | |||
| 14:22:21.240738 A > B: . 5332663:5336975(4312) | 14:22:21.240738 A > B: . 5332663:5336975(4312) | |||
| 14:22:21.270422 B > A: . ack 5211927 | 14:22:21.270422 B > A: . ack 5211927 | |||
| 14:22:21.271883 A > B: . 5336975:5341287(4312) | 14:22:21.271883 A > B: . 5336975:5341287(4312) | |||
| 14:22:21.272458 A > B: . 5341287:5345599(4312) | 14:22:21.272458 A > B: . 5341287:5345599(4312) | |||
| 14:22:21.279099 B > A: . ack 5220551 | 14:22:21.279099 B > A: . ack 5220551 | |||
| 14:22:21.280539 A > B: . 5345599:5349911(4312) | 14:22:21.280539 A > B: . 5345599:5349911(4312) | |||
| 14:22:21.281118 A > B: . 5349911:5354223(4312) | 14:22:21.281118 A > B: . 5349911:5354223(4312) | |||
| 14:22:21.281183 B > A: . ack 5229175 | 14:22:21.281183 B > A: . ack 5229175 | |||
| 14:22:21.282348 A > B: . 5354223:5358535(4312) | 14:22:21.282348 A > B: . 5354223:5358535(4312) | |||
| 14:22:21.283029 A > B: . 5358535:5362847(4312) | 14:22:21.283029 A > B: . 5358535:5362847(4312) | |||
| 14:22:21.283089 B > A: . ack 5237799 | 14:22:21.283089 B > A: . ack 5237799 | |||
| 14:22:21.284213 A > B: . 5362847:5367159(4312) | 14:22:21.284213 A > B: . 5362847:5367159(4312) | |||
| 14:22:21.284779 A > B: . 5367159:5371471(4312) | 14:22:21.284779 A > B: . 5367159:5371471(4312) | |||
| 14:22:21.285976 B > A: . ack 5246423 | 14:22:21.285976 B > A: . ack 5246423 | |||
| 14:22:21.287465 A > B: . 5371471:5375783(4312) | 14:22:21.287465 A > B: . 5371471:5375783(4312) | |||
| 14:22:21.288036 A > B: . 5375783:5380095(4312) | 14:22:21.288036 A > B: . 5375783:5380095(4312) | |||
| 14:22:21.288073 B > A: . ack 5255047 | 14:22:21.288073 B > A: . ack 5255047 | |||
| 14:22:21.289155 A > B: . 5380095:5384407(4312) | 14:22:21.289155 A > B: . 5380095:5384407(4312) | |||
| 14:22:21.289725 A > B: . 5384407:5388719(4312) | 14:22:21.289725 A > B: . 5384407:5388719(4312) | |||
| 14:22:21.289762 B > A: . ack 5263671 | 14:22:21.289762 B > A: . ack 5263671 | |||
| 14:22:21.291090 A > B: . 5388719:5393031(4312) | 14:22:21.291090 A > B: . 5388719:5393031(4312) | |||
| 14:22:21.291662 A > B: . 5393031:5397343(4312) | 14:22:21.291662 A > B: . 5393031:5397343(4312) | |||
| 14:22:21.291701 B > A: . ack 5272295 | 14:22:21.291701 B > A: . ack 5272295 | |||
| 14:22:21.292870 A > B: . 5397343:5401655(4312) | 14:22:21.292870 A > B: . 5397343:5401655(4312) | |||
| 14:22:21.293441 A > B: . 5401655:5405967(4312) | 14:22:21.293441 A > B: . 5401655:5405967(4312) | |||
| 14:22:21.293481 B > A: . ack 5280919 | 14:22:21.293481 B > A: . ack 5280919 | |||
| 14:22:21.294476 A > B: . 5405967:5410279(4312) | 14:22:21.294476 A > B: . 5405967:5410279(4312) | |||
| 14:22:21.295053 A > B: . 5410279:5414591(4312) | 14:22:21.295053 A > B: . 5410279:5414591(4312) | |||
| 14:22:21.295106 B > A: . ack 5289543 | 14:22:21.295106 B > A: . ack 5289543 | |||
| skipping to change at page 22, line 4 ¶ | skipping to change at page 22, line 39 ¶ | |||
| 14:22:21.309525 A > B: . 5449087:5453399(4312) | 14:22:21.309525 A > B: . 5449087:5453399(4312) | |||
| 14:22:21.310101 A > B: . 5453399:5457711(4312) | 14:22:21.310101 A > B: . 5453399:5457711(4312) | |||
| 14:22:21.310144 B > A: . ack 5332663 *** | 14:22:21.310144 B > A: . ack 5332663 *** | |||
| 14:22:21.311615 A > B: . 5457711:5462023(4312) | 14:22:21.311615 A > B: . 5457711:5462023(4312) | |||
| 14:22:21.312198 A > B: . 5462023:5466335(4312) | 14:22:21.312198 A > B: . 5462023:5466335(4312) | |||
| 14:22:21.341876 B > A: . ack 5341287 | 14:22:21.341876 B > A: . ack 5341287 | |||
| 14:22:21.343451 A > B: . 5466335:5470647(4312) | 14:22:21.343451 A > B: . 5466335:5470647(4312) | |||
| 14:22:21.343985 A > B: . 5470647:5474959(4312) | 14:22:21.343985 A > B: . 5470647:5474959(4312) | |||
| 14:22:21.350304 B > A: . ack 5349911 | 14:22:21.350304 B > A: . ack 5349911 | |||
| 14:22:21.351852 A > B: . 5474959:5479271(4312) | 14:22:21.351852 A > B: . 5474959:5479271(4312) | |||
| 14:22:21.352430 A > B: . 5479271:5483583(4312) | 14:22:21.352430 A > B: . 5479271:5483583(4312) | |||
| 14:22:21.352484 B > A: . ack 5358535 | 14:22:21.352484 B > A: . ack 5358535 | |||
| 14:22:21.353574 A > B: . 5483583:5487895(4312) | 14:22:21.353574 A > B: . 5483583:5487895(4312) | |||
| 14:22:21.354149 A > B: . 5487895:5492207(4312) | 14:22:21.354149 A > B: . 5487895:5492207(4312) | |||
| 14:22:21.354205 B > A: . ack 5367159 | 14:22:21.354205 B > A: . ack 5367159 | |||
| 14:22:21.355467 A > B: . 5492207:5496519(4312) | 14:22:21.355467 A > B: . 5492207:5496519(4312) | |||
| 14:22:21.356039 A > B: . 5496519:5500831(4312) | 14:22:21.356039 A > B: . 5496519:5500831(4312) | |||
| 14:22:21.357361 B > A: . ack 5375783 | 14:22:21.357361 B > A: . ack 5375783 | |||
| 14:22:21.358855 A > B: . 5500831:5505143(4312) | 14:22:21.358855 A > B: . 5500831:5505143(4312) | |||
| 14:22:21.359424 A > B: . 5505143:5509455(4312) | 14:22:21.359424 A > B: . 5505143:5509455(4312) | |||
| 14:22:21.359465 B > A: . ack 5384407 | 14:22:21.359465 B > A: . ack 5384407 | |||
| 14:22:21.360605 A > B: . 5509455:5513767(4312) | 14:22:21.360605 A > B: . 5509455:5513767(4312) | |||
| 14:22:21.361181 A > B: . 5513767:5518079(4312) | 14:22:21.361181 A > B: . 5513767:5518079(4312) | |||
| 14:22:21.361225 B > A: . ack 5393031 | 14:22:21.361225 B > A: . ack 5393031 | |||
| 14:22:21.362485 A > B: . 5518079:5522391(4312) | 14:22:21.362485 A > B: . 5518079:5522391(4312) | |||
| 14:22:21.363057 A > B: . 5522391:5526703(4312) | 14:22:21.363057 A > B: . 5522391:5526703(4312) | |||
| 14:22:21.363096 B > A: . ack 5401655 | 14:22:21.363096 B > A: . ack 5401655 | |||
| 14:22:21.364236 A > B: . 5526703:5531015(4312) | 14:22:21.364236 A > B: . 5526703:5531015(4312) | |||
| 14:22:21.364810 A > B: . 5531015:5535327(4312) | 14:22:21.364810 A > B: . 5531015:5535327(4312) | |||
| 14:22:21.364867 B > A: . ack 5410279 | 14:22:21.364867 B > A: . ack 5410279 | |||
| 14:22:21.365819 A > B: . 5535327:5539639(4312) | 14:22:21.365819 A > B: . 5535327:5539639(4312) | |||
| 14:22:21.366386 A > B: . 5539639:5543951(4312) | 14:22:21.366386 A > B: . 5539639:5543951(4312) | |||
| 14:22:21.366427 B > A: . ack 5418903 | 14:22:21.366427 B > A: . ack 5418903 | |||
| 14:22:21.367586 A > B: . 5543951:5548263(4312) | 14:22:21.367586 A > B: . 5543951:5548263(4312) | |||
| 14:22:21.368158 A > B: . 5548263:5552575(4312) | 14:22:21.368158 A > B: . 5548263:5552575(4312) | |||
| skipping to change at page 23, line 4 ¶ | skipping to change at page 23, line 39 ¶ | |||
| 14:22:21.381947 A > B: . 5587071:5591383(4312) **** | 14:22:21.381947 A > B: . 5587071:5591383(4312) **** | |||
| "***" marks the end of the first round trip. Note that cwnd did | "***" marks the end of the first round trip. Note that cwnd did | |||
| not increase (as evidenced by each ACK eliciting two new data | not increase (as evidenced by each ACK eliciting two new data | |||
| packets). Only at "****", which comes near the end of the second | packets). Only at "****", which comes near the end of the second | |||
| round trip, does cwnd increase by one packet. | round trip, does cwnd increase by one packet. | |||
| This trace did not suffer any timeout retransmissions. It | This trace did not suffer any timeout retransmissions. It | |||
| transferred the same amount of data as the first trace in about | transferred the same amount of data as the first trace in about | |||
| half as much time. This difference is repeatable between hosts A | half as much time. This difference is repeatable between hosts A | |||
| and B. | and B. | |||
| References | References | |||
| [Stevens94] and [Wright95] discuss this problem. The problem of | [Stevens94] and [Wright95] discuss this problem. The problem of | |||
| Reno TCP failing to recover from multiple losses except via a | Reno TCP failing to recover from multiple losses except via a | |||
| retransmission timeout is discussed in [Fall96,Hoe96]. | retransmission timeout is discussed in [Fall96,Hoe96]. | |||
| How to detect | How to detect | |||
| If source code is available, that is generally the easiest way to | If source code is available, that is generally the easiest way to | |||
| detect this problem. Search for each modification to the cwnd | detect this problem. Search for each modification to the cwnd | |||
| variable; (at least) one of these will be for congestion avoidance, | variable; (at least) one of these will be for congestion avoidance, | |||
| and inspection of the related code should immediately identify the | and inspection of the related code should immediately identify the | |||
| problem if present. | problem if present. | |||
| The problem can also be detected by closely examining packet traces | The problem can also be detected by closely examining packet traces | |||
| taken near the sender. During congestion avoidance, cwnd will | taken near the sender. During congestion avoidance, cwnd will | |||
| increase by an additional segment upon the receipt of (typically) | increase by an additional segment upon the receipt of (typically) | |||
| eight acknowledgements without a loss. This increase is in | eight acknowledgements without a loss. This increase is in | |||
| addition to the one segment increase per round trip time (or two | addition to the one segment increase per round trip time (or two | |||
| round trip times if the receiver is using delayed ACKs). | round trip times if the receiver is using delayed ACKs). | |||
| Furthermore, graphs of the sequence number vs. time, taken from | Furthermore, graphs of the sequence number vs. time, taken from | |||
| packet traces, are normally linear during congestion avoidance. | packet traces, are normally linear during congestion avoidance. | |||
| When viewing packet traces of transfers from senders exhibiting | When viewing packet traces of transfers from senders exhibiting | |||
| skipping to change at page 24, line 5 ¶ | skipping to change at page 24, line 35 ¶ | |||
| of new data is received. | of new data is received. | |||
| 3.7. | 3.7. | |||
| Name of Problem | Name of Problem | |||
| Initial RTO too low | Initial RTO too low | |||
| Classification | Classification | |||
| Performance | Performance | |||
| Description | Description | |||
| When a TCP first begins transmitting data, it lacks the RTT | When a TCP first begins transmitting data, it lacks the RTT | |||
| measurements necessary to have computed an adaptive retransmission | measurements necessary to have computed an adaptive retransmission | |||
| timeout (RTO). RFC 1122, 4.2.3.1, states that a TCP SHOULD | timeout (RTO). RFC 1122, 4.2.3.1, states that a TCP SHOULD | |||
| initialize RTO to 3 seconds. A TCP that uses a lower value | initialize RTO to 3 seconds. A TCP that uses a lower value | |||
| exhibits "Initial RTO too low". | exhibits "Initial RTO too low". | |||
| Significance | Significance | |||
| In environments with large RTTs (where "large" means any value | In environments with large RTTs (where "large" means any value | |||
| larger than the initial RTO), TCPs will experience very poor | larger than the initial RTO), TCPs will experience very poor | |||
| performance. | performance. | |||
| Implications | Implications | |||
| Whenever RTO < RTT, very poor performance can result as packets are | Whenever RTO < RTT, very poor performance can result as packets are | |||
| unnecessarily retransmitted (because RTO will expire before an ACK | unnecessarily retransmitted (because RTO will expire before an ACK | |||
| for the packet can arrive) and the connection enters slow start and | for the packet can arrive) and the connection enters slow start and | |||
| congestion avoidance. Generally, the algorithms for computing RTO | congestion avoidance. Generally, the algorithms for computing RTO | |||
| avoid this problem by adding a positive term to the estimated RTT. | avoid this problem by adding a positive term to the estimated RTT. | |||
| However, when a connection first begins it must use some estimate | However, when a connection first begins it must use some estimate | |||
| for RTO, and if it picks a value less than RTT, the above problems | for RTO, and if it picks a value less than RTT, the above problems | |||
| will arise. | will arise. | |||
| Furthermore, when the initial RTO < RTT, it can take a long time | Furthermore, when the initial RTO < RTT, it can take a long time | |||
| for the TCP to correct the problem by adapting the RTT estimate, | for the TCP to correct the problem by adapting the RTT estimate, | |||
| skipping to change at page 25, line 4 ¶ | skipping to change at page 25, line 37 ¶ | |||
| The following trace file was taken using tcpdump at host A, the | The following trace file was taken using tcpdump at host A, the | |||
| data sender. The advertised window and SYN options have been | data sender. The advertised window and SYN options have been | |||
| omitted for clarity. | omitted for clarity. | |||
| 07:52:39.870301 A > B: S 2786333696:2786333696(0) | 07:52:39.870301 A > B: S 2786333696:2786333696(0) | |||
| 07:52:40.548170 B > A: S 130240000:130240000(0) ack 2786333697 | 07:52:40.548170 B > A: S 130240000:130240000(0) ack 2786333697 | |||
| 07:52:40.561287 A > B: P 1:513(512) ack 1 | 07:52:40.561287 A > B: P 1:513(512) ack 1 | |||
| 07:52:40.753466 A > B: . 1:513(512) ack 1 | 07:52:40.753466 A > B: . 1:513(512) ack 1 | |||
| 07:52:41.133687 A > B: . 1:513(512) ack 1 | 07:52:41.133687 A > B: . 1:513(512) ack 1 | |||
| 07:52:41.458529 B > A: . ack 513 | 07:52:41.458529 B > A: . ack 513 | |||
| 07:52:41.458686 A > B: . 513:1025(512) ack 1 | 07:52:41.458686 A > B: . 513:1025(512) ack 1 | |||
| 07:52:41.458797 A > B: P 1025:1537(512) ack 1 | 07:52:41.458797 A > B: P 1025:1537(512) ack 1 | |||
| 07:52:41.541633 B > A: . ack 513 | 07:52:41.541633 B > A: . ack 513 | |||
| 07:52:41.703732 A > B: . 513:1025(512) ack 1 | 07:52:41.703732 A > B: . 513:1025(512) ack 1 | |||
| 07:52:42.044875 B > A: . ack 513 | 07:52:42.044875 B > A: . ack 513 | |||
| 07:52:42.173728 A > B: . 513:1025(512) ack 1 | 07:52:42.173728 A > B: . 513:1025(512) ack 1 | |||
| 07:52:42.330861 B > A: . ack 1537 | 07:52:42.330861 B > A: . ack 1537 | |||
| 07:52:42.331129 A > B: . 1537:2049(512) ack 1 | 07:52:42.331129 A > B: . 1537:2049(512) ack 1 | |||
| 07:52:42.331262 A > B: P 2049:2561(512) ack 1 | 07:52:42.331262 A > B: P 2049:2561(512) ack 1 | |||
| 07:52:42.623673 A > B: . 1537:2049(512) ack 1 | 07:52:42.623673 A > B: . 1537:2049(512) ack 1 | |||
| 07:52:42.683203 B > A: . ack 1537 | 07:52:42.683203 B > A: . ack 1537 | |||
| 07:52:43.044029 B > A: . ack 1537 | 07:52:43.044029 B > A: . ack 1537 | |||
| 07:52:43.193812 A > B: . 1537:2049(512) ack 1 | 07:52:43.193812 A > B: . 1537:2049(512) ack 1 | |||
| Note from the SYN/SYN-ack exchange, the RTT is over 600 msec. | Note from the SYN/SYN-ACK exchange, the RTT is over 600 msec. | |||
| However, from the elapsed time between the third and fourth lines | However, from the elapsed time between the third and fourth lines | |||
| (the first packet being sent and then retransmitted), it is | (the first packet being sent and then retransmitted), it is | |||
| apparent the RTO was initialized to under 200 msec. The next line | apparent the RTO was initialized to under 200 msec. The next line | |||
| shows that this value has doubled to 400 msec (correct exponential | shows that this value has doubled to 400 msec (correct exponential | |||
| backoff of RTO), but that still does not suffice to avoid an | backoff of RTO), but that still does not suffice to avoid an | |||
| unnecessary retransmission. | unnecessary retransmission. | |||
| Finally, an ACK from B arrives for the first segment. Later two | Finally, an ACK from B arrives for the first segment. Later two | |||
| more duplicate ACKs for 513 arrive, indicating that both the | more duplicate ACKs for 513 arrive, indicating that both the | |||
| original and the two retransmissions arrived at B. (Indeed, a | original and the two retransmissions arrived at B. (Indeed, a | |||
| concurrent trace at B showed that no packets were lost during the | concurrent trace at B showed that no packets were lost during the | |||
| skipping to change at page 26, line 4 ¶ | skipping to change at page 26, line 38 ¶ | |||
| omitted for clarity. | omitted for clarity. | |||
| 17:30:32.090299 C > D: S 2031744000:2031744000(0) | 17:30:32.090299 C > D: S 2031744000:2031744000(0) | |||
| 17:30:32.900325 D > C: S 262737964:262737964(0) ack 2031744001 | 17:30:32.900325 D > C: S 262737964:262737964(0) ack 2031744001 | |||
| 17:30:32.900326 C > D: . ack 1 | 17:30:32.900326 C > D: . ack 1 | |||
| 17:30:32.910326 C > D: . 1:513(512) ack 1 | 17:30:32.910326 C > D: . 1:513(512) ack 1 | |||
| 17:30:34.150355 D > C: . ack 513 | 17:30:34.150355 D > C: . ack 513 | |||
| 17:30:34.150356 C > D: . 513:1025(512) ack 1 | 17:30:34.150356 C > D: . 513:1025(512) ack 1 | |||
| 17:30:34.150357 C > D: . 1025:1537(512) ack 1 | 17:30:34.150357 C > D: . 1025:1537(512) ack 1 | |||
| 17:30:35.170384 D > C: . ack 1025 | 17:30:35.170384 D > C: . ack 1025 | |||
| 17:30:35.170385 C > D: . 1537:2049(512) ack 1 | 17:30:35.170385 C > D: . 1537:2049(512) ack 1 | |||
| 17:30:35.170386 C > D: . 2049:2561(512) ack 1 | 17:30:35.170386 C > D: . 2049:2561(512) ack 1 | |||
| 17:30:35.320385 D > C: . ack 1537 | 17:30:35.320385 D > C: . ack 1537 | |||
| 17:30:35.320386 C > D: . 2561:3073(512) ack 1 | 17:30:35.320386 C > D: . 2561:3073(512) ack 1 | |||
| 17:30:35.320387 C > D: . 3073:3585(512) ack 1 | 17:30:35.320387 C > D: . 3073:3585(512) ack 1 | |||
| 17:30:35.730384 D > C: . ack 2049 | 17:30:35.730384 D > C: . ack 2049 | |||
| The initital SYN/SYN-ack exchange shows that RTT is more than 800 | The initial SYN/SYN-ACK exchange shows that RTT is more than 800 | |||
| msec, and for some subsequent packets it rises above 1 second, but | msec, and for some subsequent packets it rises above 1 second, but | |||
| C's retransmit timer does not ever expire. | C's retransmit timer does not ever expire. | |||
| References | References | |||
| This problem is documented in [Paxson97]. | This problem is documented in [Paxson97]. | |||
| How to detect | How to detect | |||
| This problem is readily detected by inspecting a packet trace of | This problem is readily detected by inspecting a packet trace of | |||
| the startup of a TCP connection made over a long-delay path. It | the startup of a TCP connection made over a long-delay path. It | |||
| can be diagnosed from either a sender-side or receiver-side trace. | can be diagnosed from either a sender-side or receiver-side trace. | |||
| Long-delay paths can often be found by locating remote sites on | Long-delay paths can often be found by locating remote sites on | |||
| other continents. | other continents. | |||
| How to fix | How to fix | |||
| As this problem arises from a faulty initialization, one hopes | As this problem arises from a faulty initialization, one hopes | |||
| fixing it requires a one-line change to the TCP source code. | fixing it requires a one-line change to the TCP source code. | |||
| skipping to change at page 27, line 4 ¶ | skipping to change at page 27, line 35 ¶ | |||
| Description | Description | |||
| The fast recovery algorithm allows TCP senders to continue to | The fast recovery algorithm allows TCP senders to continue to | |||
| transmit new segments during loss recovery. First, fast | transmit new segments during loss recovery. First, fast | |||
| retransmission is initiated after a TCP sender receives three | retransmission is initiated after a TCP sender receives three | |||
| duplicate ACKs. At this point, a retransmission is sent and cwnd | duplicate ACKs. At this point, a retransmission is sent and cwnd | |||
| is halved. The fast recovery algorithm then allows additional | is halved. The fast recovery algorithm then allows additional | |||
| segments to be sent when sufficient additional duplicate ACKs | segments to be sent when sufficient additional duplicate ACKs | |||
| arrive. Some implementations of fast recovery compute when to send | arrive. Some implementations of fast recovery compute when to send | |||
| additional segments by artificially incrementing cwnd, first by | additional segments by artificially incrementing cwnd, first by | |||
| three segments to account for the three duplicate ACKs that | three segments to account for the three duplicate ACKs that | |||
| triggered fast retransmission, and subsequently by 1 MSS for each | triggered fast retransmission, and subsequently by 1 MSS for each | |||
| new duplicate ACK that arrives. When cwnd allows, the sender | new duplicate ACK that arrives. When cwnd allows, the sender | |||
| transmits new data segments. | transmits new data segments. | |||
| When an ACK arrives that covers new data, cwnd is to be reduced by | When an ACK arrives that covers new data, cwnd is to be reduced by | |||
| the amount by which it was artificially increased. However, some | the amount by which it was artificially increased. However, some | |||
| TCP implementations fail to "deflate" the window, causing an | TCP implementations fail to "deflate" the window, causing an | |||
| inappropriate amount of data to be sent into the network after | inappropriate amount of data to be sent into the network after | |||
| recovery. One cause of this problem is the "header prediction" | recovery. One cause of this problem is the "header prediction" | |||
| code, which is used to handle incoming segments that require little | code, which is used to handle incoming segments that require little | |||
| work. In some implementations of TCP, the header prediction code | work. In some implementations of TCP, the header prediction code | |||
| does not check to make sure cwnd has not been artificially | does not check to make sure cwnd has not been artificially | |||
| inflated, and therefore does not reduce the artificially increased | inflated, and therefore does not reduce the artificially increased | |||
| cwnd when appropriate. | cwnd when appropriate. | |||
| Significance | Significance | |||
| TCP senders that exhibit this problem will transmit a burst of data | TCP senders that exhibit this problem will transmit a burst of data | |||
| immediately after recovery, which can degrade performance, as well | immediately after recovery, which can degrade performance, as well | |||
| as network stability. Effectively, the sender does not reduce the | as network stability. Effectively, the sender does not reduce the | |||
| size of cwnd as much as it should (to half its value when loss was | size of cwnd as much as it should (to half its value when loss was | |||
| detected), if at all. This can harm the performance of the TCP | detected), if at all. This can harm the performance of the TCP | |||
| connection itself, as well as competing TCP flows. | connection itself, as well as competing TCP flows. | |||
| Implications | Implications | |||
| A TCP sender exhibiting this problem does not reduce cwnd | A TCP sender exhibiting this problem does not reduce cwnd | |||
| skipping to change at page 28, line 4 ¶ | skipping to change at page 28, line 35 ¶ | |||
| The following trace file was taken using tcpdump at host A, the | The following trace file was taken using tcpdump at host A, the | |||
| data sender. The advertised window (which never changed) has been | data sender. The advertised window (which never changed) has been | |||
| omitted for clarity, except for the first packet sent by each host. | omitted for clarity, except for the first packet sent by each host. | |||
| 08:22:56.825635 A.7505 > B.7505: . 29697:30209(512) ack 1 win 4608 | 08:22:56.825635 A.7505 > B.7505: . 29697:30209(512) ack 1 win 4608 | |||
| 08:22:57.038794 B.7505 > A.7505: . ack 27649 win 4096 | 08:22:57.038794 B.7505 > A.7505: . ack 27649 win 4096 | |||
| 08:22:57.039279 A.7505 > B.7505: . 30209:30721(512) ack 1 | 08:22:57.039279 A.7505 > B.7505: . 30209:30721(512) ack 1 | |||
| 08:22:57.321876 B.7505 > A.7505: . ack 28161 | 08:22:57.321876 B.7505 > A.7505: . ack 28161 | |||
| 08:22:57.322356 A.7505 > B.7505: . 30721:31233(512) ack 1 | 08:22:57.322356 A.7505 > B.7505: . 30721:31233(512) ack 1 | |||
| 08:22:57.347128 B.7505 > A.7505: . ack 28673 | 08:22:57.347128 B.7505 > A.7505: . ack 28673 | |||
| 08:22:57.347572 A.7505 > B.7505: . 31233:31745(512) ack 1 | 08:22:57.347572 A.7505 > B.7505: . 31233:31745(512) ack 1 | |||
| 08:22:57.347782 A.7505 > B.7505: . 31745:32257(512) ack 1 | 08:22:57.347782 A.7505 > B.7505: . 31745:32257(512) ack 1 | |||
| 08:22:57.936393 B.7505 > A.7505: . ack 29185 | 08:22:57.936393 B.7505 > A.7505: . ack 29185 | |||
| 08:22:57.936864 A.7505 > B.7505: . 32257:32769(512) ack 1 | 08:22:57.936864 A.7505 > B.7505: . 32257:32769(512) ack 1 | |||
| 08:22:57.950802 B.7505 > A.7505: . ack 29697 win 4096 | 08:22:57.950802 B.7505 > A.7505: . ack 29697 win 4096 | |||
| 08:22:57.951246 A.7505 > B.7505: . 32769:33281(512) ack 1 | 08:22:57.951246 A.7505 > B.7505: . 32769:33281(512) ack 1 | |||
| 08:22:58.169422 B.7505 > A.7505: . ack 29697 | 08:22:58.169422 B.7505 > A.7505: . ack 29697 | |||
| 08:22:58.638222 B.7505 > A.7505: . ack 29697 | 08:22:58.638222 B.7505 > A.7505: . ack 29697 | |||
| 08:22:58.643312 B.7505 > A.7505: . ack 29697 | 08:22:58.643312 B.7505 > A.7505: . ack 29697 | |||
| 08:22:58.643669 A.7505 > B.7505: . 29697:30209(512) ack 1 | 08:22:58.643669 A.7505 > B.7505: . 29697:30209(512) ack 1 | |||
| 08:22:58.936436 B.7505 > A.7505: . ack 29697 | 08:22:58.936436 B.7505 > A.7505: . ack 29697 | |||
| 08:22:59.002614 B.7505 > A.7505: . ack 29697 | 08:22:59.002614 B.7505 > A.7505: . ack 29697 | |||
| 08:22:59.003026 A.7505 > B.7505: . 33281:33793(512) ack 1 | 08:22:59.003026 A.7505 > B.7505: . 33281:33793(512) ack 1 | |||
| 08:22:59.682902 B.7505 > A.7505: . ack 33281 | 08:22:59.682902 B.7505 > A.7505: . ack 33281 | |||
| 08:22:59.683391 A.7505 > B.7505: P 33793:34305(512) ack 1 | 08:22:59.683391 A.7505 > B.7505: P 33793:34305(512) ack 1 | |||
| 08:22:59.683748 A.7505 > B.7505: P 34305:34817(512) ack 1 | 08:22:59.683748 A.7505 > B.7505: P 34305:34817(512) ack 1 *** | |||
| 08:22:59.684043 A.7505 > B.7505: P 34817:35329(512) ack 1 | 08:22:59.684043 A.7505 > B.7505: P 34817:35329(512) ack 1 | |||
| 08:22:59.684266 A.7505 > B.7505: P 35329:35841(512) ack 1 | 08:22:59.684266 A.7505 > B.7505: P 35329:35841(512) ack 1 | |||
| 08:22:59.684567 A.7505 > B.7505: P 35841:36353(512) ack 1 | 08:22:59.684567 A.7505 > B.7505: P 35841:36353(512) ack 1 | |||
| 08:22:59.684810 A.7505 > B.7505: P 36353:36865(512) ack 1 | 08:22:59.684810 A.7505 > B.7505: P 36353:36865(512) ack 1 | |||
| 08:22:59.685094 A.7505 > B.7505: P 36865:37377(512) ack 1 | 08:22:59.685094 A.7505 > B.7505: P 36865:37377(512) ack 1 | |||
| The first 12 lines of the trace show incoming ACKs clocking out a | The first 12 lines of the trace show incoming ACKs clocking out a | |||
| window of data segments. At this point in the transfer, cwnd is 7 | window of data segments. At this point in the transfer, cwnd is 7 | |||
| segments. The next 4 lines of the trace show 3 duplicate ACKs | segments. The next 4 lines of the trace show 3 duplicate ACKs | |||
| arriving from the receiver, followed by a retransmission from the | arriving from the receiver, followed by a retransmission from the | |||
| sender. At this point, cwnd is halved (to 3 segments) and | sender. At this point, cwnd is halved (to 3 segments) and | |||
| artificially incremented by the three duplicate ACKs that have | artificially incremented by the three duplicate ACKs that have | |||
| arrived, making cwnd 6 segments. The next two lines show 2 more | arrived, making cwnd 6 segments. The next two lines show 2 more | |||
| duplicate ACKs arriving, each of which increases cwnd by 1 segment. | duplicate ACKs arriving, each of which increases cwnd by 1 segment. | |||
| So, after these two duplicate ACKs arrive the cwnd is 8 segments | So, after these two duplicate ACKs arrive the cwnd is 8 segments | |||
| and the sender has permission to send 1 new segment (since there | and the sender has permission to send 1 new segment (since there | |||
| are 7 segments outstanding). The next line in the trace shows this | are 7 segments outstanding). The next line in the trace shows this | |||
| new segment being transmitted. The next packet shown in the trace | new segment being transmitted. The next packet shown in the trace | |||
| is an ACK from host B that covers the first 7 outstanding segments | is an ACK from host B that covers the first 7 outstanding segments | |||
| (all but the segment sent during recovery). This should cause cwnd | (all but the new segment sent during recovery). This should cause | |||
| to be reduced to 3 segments and 2 segments to be transmitted (since | cwnd to be reduced to 3 segments and 2 segments to be transmitted | |||
| there is already 1 outstanding segment in the network). However, | (since there is already 1 outstanding segment in the network). | |||
| as shown by the last 7 lines of the trace, cwnd is not reduced, | However, as shown by the last 7 lines of the trace, cwnd is not | |||
| causing a line-rate burst of 7 new segments. | reduced, causing a line-rate burst of 7 new segments. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| The trace would appear identical to the one above, only it would | The trace would appear identical to the one above, only it would | |||
| stop after: | stop after the line marked "***", because at this point host A | |||
| would correctly reduce cwnd after recovery, allowing only 2 | ||||
| 08:22:59.683748 A.7505 > B.7505: P 34305:34817(512) ack 1 | segments to be transmitted, rather than producing a burst of 7 | |||
| segments. | ||||
| because at this point host A would correctly reduce cwnd after | ||||
| recovery, allowing only 2 segments to be transmited, rather than | ||||
| producing a burst of 7 segments. | ||||
| References | References | |||
| This problem is documented and the performance implications | This problem is documented and the performance implications | |||
| analyzed in [Brakmo95]. | analyzed in [Brakmo95]. | |||
| How to detect | How to detect | |||
| Failure of window deflation after loss recovery can be found by | Failure of window deflation after loss recovery can be found by | |||
| examining sender-side packet traces recorded during periods of | examining sender-side packet traces recorded during periods of | |||
| moderate loss (so cwnd can grow large enough to allow for fast | moderate loss (so cwnd can grow large enough to allow for fast | |||
| recovery when loss occurs). | recovery when loss occurs). | |||
| How to fix | How to fix | |||
| When this bug is caused by incorrect header prediction, the fix is | When this bug is caused by incorrect header prediction, the fix is | |||
| to add a predicate to the header prediction test that checks to see | to add a predicate to the header prediction test that checks to see | |||
| whether cwnd is inflated; if so, the header prediction test fails | whether cwnd is inflated; if so, the header prediction test fails | |||
| and the usual ACK processing occurs, which (in this case) takes | and the usual ACK processing occurs, which (in this case) takes | |||
| care to deflate the window. | care to deflate the window. See [Brakmo95] for details. | |||
| 3.9. | 3.9. | |||
| Name of Problem | Name of Problem | |||
| Excessively short keepalive connection timeout | Excessively short keepalive connection timeout | |||
| Classification | Classification | |||
| Reliability | Reliability | |||
| Description | Description | |||
| Keep-alive is a mechanism for checking whether an idle connection | Keep-alive is a mechanism for checking whether an idle connection | |||
| is still alive. According to RFC-1122, keepalive should only be | is still alive. According to RFC 1122, keepalive should only be | |||
| invoked in server applications that might otherwise hang | invoked in server applications that might otherwise hang | |||
| indefinitely and consume resources unnecessarily if a client | indefinitely and consume resources unnecessarily if a client | |||
| crashes or aborts a connection during a network failure. | crashes or aborts a connection during a network failure. | |||
| RFC-1122 also specifies that if a keep-alive mechanism is | RFC 1122 also specifies that if a keep-alive mechanism is | |||
| implemented it MUST NOT interpret failure to respond to any | implemented it MUST NOT interpret failure to respond to any | |||
| specific probe as a dead connection. The RFC does not specify a | specific probe as a dead connection. The RFC does not specify a | |||
| particular mechanism for timing out a connection when no response | particular mechanism for timing out a connection when no response | |||
| is received for keepalive probes. However, if the mechanism does | is received for keepalive probes. However, if the mechanism does | |||
| not allow ample time for recovery from network congestion or delay, | not allow ample time for recovery from network congestion or delay, | |||
| connections may be timed out unnecessarily. | connections may be timed out unnecessarily. | |||
| Significance | Significance | |||
| In congested networks, can lead to unwarranted termination of | In congested networks, can lead to unwarranted termination of | |||
| connections. | connections. | |||
| Implications | Implications | |||
| It is possible for the network connection between two peer machines | It is possible for the network connection between two peer machines | |||
| to become congested or to exhibit packet loss at the time that a | to become congested or to exhibit packet loss at the time that a | |||
| keep-alive probe is sent on a connection. If the keep-alive | keep-alive probe is sent on a connection. If the keep-alive | |||
| mechanism does not allow sufficient time before dropping | mechanism does not allow sufficient time before dropping | |||
| connections in the face of unacknowledged probes, connections may | connections in the face of unacknowledged probes, connections may | |||
| be dropped even when both peers of a connection are still alive. | be dropped even when both peers of a connection are still alive. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 specifies that the keep-alive mechanism may be provided. | RFC 1122 specifies that the keep-alive mechanism may be provided. | |||
| It does not specify a mechanism for determining dead connections | It does not specify a mechanism for determining dead connections | |||
| when keepalive probes are not acknowledged. | when keepalive probes are not acknowledged. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using the Orchestra tool at the peer of the machine using | Made using the Orchestra tool at the peer of the machine using | |||
| keep-alive. After connection establishment, incoming keep-alives | keep-alive. After connection establishment, incoming keep-alives | |||
| were dropped by Orchestra to simulate a dead connection. | were dropped by Orchestra to simulate a dead connection. | |||
| 22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN | 22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN | |||
| 22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK | 22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK | |||
| 22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK | 22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK | |||
| skipping to change at page 31, line 4 ¶ | skipping to change at page 31, line 33 ¶ | |||
| The initial three packets are the SYN exchange for connection | The initial three packets are the SYN exchange for connection | |||
| setup. About two hours later, the keepalive timer fires because | setup. About two hours later, the keepalive timer fires because | |||
| the connection has been idle. Keepalive probes are transmitted a | the connection has been idle. Keepalive probes are transmitted a | |||
| total of 5 times, with a 1 second spacing between probes, after | total of 5 times, with a 1 second spacing between probes, after | |||
| which the connection is dropped. This is problematic because a 5 | which the connection is dropped. This is problematic because a 5 | |||
| second network outage at the time of the first probe results in the | second network outage at the time of the first probe results in the | |||
| connection being killed. | connection being killed. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using the Orchestra tool at the peer of the machine using | Made using the Orchestra tool at the peer of the machine using | |||
| keep-alive. After connection establishment, incoming keep-alives | keep-alive. After connection establishment, incoming keep-alives | |||
| were dropped by Orchestra to simulate a dead connection. | were dropped by Orchestra to simulate a dead connection. | |||
| 16:01:52.130000 A > B: 1804412929:0 win 4096 datasz 4 SYN | 16:01:52.130000 A > B: 1804412929:0 win 4096 datasz 4 SYN | |||
| 16:01:52.360000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK | 16:01:52.360000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK | |||
| 16:01:52.410000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK | 16:01:52.410000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK | |||
| (two hours elapse) | (two hours elapse) | |||
| 18:01:57.170000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:01:57.170000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:03:12.220000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:03:12.220000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:04:27.270000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:04:27.270000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:05:42.320000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:05:42.320000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:06:57.370000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:06:57.370000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:08:12.420000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:08:12.420000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:09:27.480000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:09:27.480000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:10:43.290000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:10:43.290000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:11:57.580000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | 18:11:57.580000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK | |||
| 18:13:12.630000 A > B: 1804412929:16512002 win 4096 datasz 0 RST ACK | 18:13:12.630000 A > B: 1804412929:16512002 win 4096 datasz 0 RST ACK | |||
| In this trace, when the keep-alive timer expires, 9 keepalive | In this trace, when the keep-alive timer expires, 9 keepalive | |||
| probes are sent at 75 second intervals. 75 seconds after the last | probes are sent at 75 second intervals. 75 seconds after the last | |||
| probe is sent, a final RST segment is sent indicating that the | probe is sent, a final RST segment is sent indicating that the | |||
| connection has been closed. This implementation waits about 11 | connection has been closed. This implementation waits about 11 | |||
| minutes before timing out the connection, while the first | minutes before timing out the connection, while the first | |||
| implementation shown allows only 5 seconds. | implementation shown allows only 5 seconds. | |||
| References | References | |||
| This problem is documented in [Dawson97]. | This problem is documented in [Dawson97]. | |||
| How to detect | How to detect | |||
| skipping to change at page 32, line 4 ¶ | skipping to change at page 32, line 32 ¶ | |||
| or if the keepalive mechanism violates the specification (see | or if the keepalive mechanism violates the specification (see | |||
| Insufficient interval between keepalives problem). In this | Insufficient interval between keepalives problem). In this | |||
| example, suppressing the response of the peer to keepalive probes | example, suppressing the response of the peer to keepalive probes | |||
| was accomplished using the Orchestra toolkit, which can be | was accomplished using the Orchestra toolkit, which can be | |||
| configured to drop packets. It could also have been done by | configured to drop packets. It could also have been done by | |||
| creating a connection, turning on keepalive, and disconnecting the | creating a connection, turning on keepalive, and disconnecting the | |||
| network connection at the receiver machine. | network connection at the receiver machine. | |||
| How to fix | How to fix | |||
| This problem can be fixed by using a different method for timing | This problem can be fixed by using a different method for timing | |||
| out keepalives that allows a longer period of time to elapse before | out keepalives that allows a longer period of time to elapse before | |||
| dropping the connection. For example, the algorithm for timing out | dropping the connection. For example, the algorithm for timing out | |||
| on dropped data could be used. Another possibility is an algorithm | on dropped data could be used. Another possibility is an algorithm | |||
| such as the one shown in the trace above, which sends 9 probes at | such as the one shown in the trace above, which sends 9 probes at | |||
| 75 second intervals and then waits an additional 75 seconds for a | 75 second intervals and then waits an additional 75 seconds for a | |||
| response before closing the connection. | response before closing the connection. | |||
| 3.10. | 3.10. | |||
| Name of Problem | Name of Problem | |||
| Failure to back off retransmission timeout | Failure to back off retransmission timeout | |||
| Classification | Classification | |||
| Congestion control / reliability | Congestion control / reliability | |||
| Description | Description | |||
| The retransmission timeout is used to determine when a packet has | The retransmission timeout is used to determine when a packet has | |||
| been dropped in the network. When this timeout has expired without | been dropped in the network. When this timeout has expired without | |||
| the arrival of an ACK, the segment is retransmitted. Each time a | the arrival of an ACK, the segment is retransmitted. Each time a | |||
| segment is retransmitted, the timeout is adjusted according to an | segment is retransmitted, the timeout is adjusted according to an | |||
| exponential backoff algorithm, doubling each time. If a TCP fails | exponential backoff algorithm, doubling each time. If a TCP fails | |||
| to receive an ACK after numerous attempts at retransmitting the | to receive an ACK after numerous attempts at retransmitting the | |||
| same segment, it terminates the connection. A TCP that fails to | same segment, it terminates the connection. A TCP that fails to | |||
| double its retransmission timeout upon repeated timeouts is said to | double its retransmission timeout upon repeated timeouts is said to | |||
| exhibit "Failure to back off retransmission timeout". | exhibit "Failure to back off retransmission timeout". | |||
| Significance | Significance | |||
| skipping to change at page 33, line 5 ¶ | skipping to change at page 33, line 32 ¶ | |||
| Implications | Implications | |||
| It is possible for the network connection between two TCP peers to | It is possible for the network connection between two TCP peers to | |||
| become congested or to exhibit packet loss at the time that a | become congested or to exhibit packet loss at the time that a | |||
| retransmission is sent on a connection. If the retransmission | retransmission is sent on a connection. If the retransmission | |||
| mechanism does not allow sufficient time before dropping | mechanism does not allow sufficient time before dropping | |||
| connections in the face of unacknowledged segments, connections may | connections in the face of unacknowledged segments, connections may | |||
| be dropped even when, by waiting longer, the connection could have | be dropped even when, by waiting longer, the connection could have | |||
| continued. | continued. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 specifies mandatory exponential backoff of the | RFC 1122 specifies mandatory exponential backoff of the | |||
| retransmission timeout, and the termination of connections after | retransmission timeout, and the termination of connections after | |||
| some period of time (at least 100 seconds). | some period of time (at least 100 seconds). | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump on an intermediate host: | Made using tcpdump on an intermediate host: | |||
| 16:51:12.671727 A > B: S 510878852:510878852(0) win 16384 | 16:51:12.671727 A > B: S 510878852:510878852(0) win 16384 | |||
| 16:51:12.672479 B > A: S 2392143687:2392143687(0) ack 510878853 win 16384 | 16:51:12.672479 B > A: S 2392143687:2392143687(0) ack 510878853 win 16384 | |||
| 16:51:12.672581 A > B: . ack 1 win 16384 | 16:51:12.672581 A > B: . ack 1 win 16384 | |||
| 16:51:15.244171 A > B: P 1:3(2) ack 1 win 16384 | 16:51:15.244171 A > B: P 1:3(2) ack 1 win 16384 | |||
| 16:51:15.244933 B > A: . ack 3 win 17518 (DF) | 16:51:15.244933 B > A: . ack 3 win 17518 (DF) | |||
| <receiving host disconnected> | <receiving host disconnected> | |||
| 16:51:19.381176 A > B: P 3:5(2) ack 1 win 16384 | 16:51:19.381176 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:20.162016 A > B: P 3:5(2) ack 1 win 16384 | 16:51:20.162016 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:21.161936 A > B: P 3:5(2) ack 1 win 16384 | 16:51:21.161936 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:22.161914 A > B: P 3:5(2) ack 1 win 16384 | 16:51:22.161914 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:23.161914 A > B: P 3:5(2) ack 1 win 16384 | 16:51:23.161914 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:24.161879 A > B: P 3:5(2) ack 1 win 16384 | 16:51:24.161879 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:25.161857 A > B: P 3:5(2) ack 1 win 16384 | 16:51:25.161857 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:26.161836 A > B: P 3:5(2) ack 1 win 16384 | 16:51:26.161836 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:27.161814 A > B: P 3:5(2) ack 1 win 16384 | 16:51:27.161814 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:28.161791 A > B: P 3:5(2) ack 1 win 16384 | 16:51:28.161791 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:29.161769 A > B: P 3:5(2) ack 1 win 16384 | 16:51:29.161769 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:30.161750 A > B: P 3:5(2) ack 1 win 16384 | 16:51:30.161750 A > B: P 3:5(2) ack 1 win 16384 | |||
| 16:51:31.161727 A > B: P 3:5(2) ack 1 win 16384 | 16:51:31.161727 A > B: P 3:5(2) ack 1 win 16384 | |||
| skipping to change at page 34, line 4 ¶ | skipping to change at page 34, line 34 ¶ | |||
| second for 12 seconds, and then the connection is terminated with a | second for 12 seconds, and then the connection is terminated with a | |||
| RST. This is problematic because a 12 second pause in connectivity | RST. This is problematic because a 12 second pause in connectivity | |||
| could result in the termination of a connection. | could result in the termination of a connection. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Again, a tcpdump taken from a third host: | Again, a tcpdump taken from a third host: | |||
| 16:59:05.398301 A > B: S 2503324757:2503324757(0) win 16384 | 16:59:05.398301 A > B: S 2503324757:2503324757(0) win 16384 | |||
| 16:59:05.399673 B > A: S 2492674648:2492674648(0) ack 2503324758 win 16384 | 16:59:05.399673 B > A: S 2492674648:2492674648(0) ack 2503324758 win 16384 | |||
| 16:59:05.399866 A > B: . ack 1 win 17520 | 16:59:05.399866 A > B: . ack 1 win 17520 | |||
| 16:59:06.538107 A > B: P 1:3(2) ack 1 win 17520 | 16:59:06.538107 A > B: P 1:3(2) ack 1 win 17520 | |||
| 16:59:06.540977 B > A: . ack 3 win 17518 (DF) | 16:59:06.540977 B > A: . ack 3 win 17518 (DF) | |||
| <receiving host disconnected> | <receiving host disconnected> | |||
| 16:59:13.121542 A > B: P 3:5(2) ack 1 win 17520 | 16:59:13.121542 A > B: P 3:5(2) ack 1 win 17520 | |||
| 16:59:14.010928 A > B: P 3:5(2) ack 1 win 17520 | 16:59:14.010928 A > B: P 3:5(2) ack 1 win 17520 | |||
| 16:59:16.010979 A > B: P 3:5(2) ack 1 win 17520 | 16:59:16.010979 A > B: P 3:5(2) ack 1 win 17520 | |||
| 16:59:20.011229 A > B: P 3:5(2) ack 1 win 17520 | 16:59:20.011229 A > B: P 3:5(2) ack 1 win 17520 | |||
| 16:59:28.011896 A > B: P 3:5(2) ack 1 win 17520 | 16:59:28.011896 A > B: P 3:5(2) ack 1 win 17520 | |||
| skipping to change at page 34, line 28 ¶ | skipping to change at page 35, line 5 ¶ | |||
| 17:00:16.015766 A > B: P 3:5(2) ack 1 win 17520 | 17:00:16.015766 A > B: P 3:5(2) ack 1 win 17520 | |||
| 17:01:20.021308 A > B: P 3:5(2) ack 1 win 17520 | 17:01:20.021308 A > B: P 3:5(2) ack 1 win 17520 | |||
| 17:02:24.027752 A > B: P 3:5(2) ack 1 win 17520 | 17:02:24.027752 A > B: P 3:5(2) ack 1 win 17520 | |||
| 17:03:28.034569 A > B: P 3:5(2) ack 1 win 17520 | 17:03:28.034569 A > B: P 3:5(2) ack 1 win 17520 | |||
| 17:04:32.041567 A > B: P 3:5(2) ack 1 win 17520 | 17:04:32.041567 A > B: P 3:5(2) ack 1 win 17520 | |||
| 17:05:36.048264 A > B: P 3:5(2) ack 1 win 17520 | 17:05:36.048264 A > B: P 3:5(2) ack 1 win 17520 | |||
| 17:06:40.054900 A > B: P 3:5(2) ack 1 win 17520 | 17:06:40.054900 A > B: P 3:5(2) ack 1 win 17520 | |||
| 17:07:44.061306 A > B: R 5:5(0) ack 1 win 17520 | 17:07:44.061306 A > B: R 5:5(0) ack 1 win 17520 | |||
| In this trace, when the retransmission timer expires, 12 | In this trace, when the retransmission timer expires, 12 | |||
| retransmissions are sent at exponentially-increasing intervals, | retransmissions are sent at exponentially-increasing intervals, | |||
| until the interval value reaches 64 seconds, at which time the | until the interval value reaches 64 seconds, at which time the | |||
| interval stops growing. 64 seconds after the last retransmission, | interval stops growing. 64 seconds after the last retransmission, | |||
| a final RST segment is sent indicating that the connection has been | a final RST segment is sent indicating that the connection has been | |||
| closed. This implementation waits about 9 minutes before timing | closed. This implementation waits about 9 minutes before timing | |||
| out the connection, while the first implementation shown allows | out the connection, while the first implementation shown allows | |||
| only 12 seconds. | only 12 seconds. | |||
| References | References | |||
| None known. | None known. | |||
| How to detect | How to detect | |||
| A simple transfer can be eaily interrupted by disconnecting the | A simple transfer can be easily interrupted by disconnecting the | |||
| receiving host from the network. tcpdump or another appropriate | receiving host from the network. tcpdump or another appropriate | |||
| tool should show the retransmissions being sent. Several trials in | tool should show the retransmissions being sent. Several trials in | |||
| a low-rtt environment may be required to demonstrate the bug. | a low-rtt environment may be required to demonstrate the bug. | |||
| How to fix | How to fix | |||
| For one of the implementations studied, this problem seemed to be | For one of the implementations studied, this problem seemed to be | |||
| the result of an error introduced with the addition of the Brakmo- | the result of an error introduced with the addition of the Brakmo- | |||
| Peterson RTO algorithm [Brakmo95], which can return a value of zero | Peterson RTO algorithm [Brakmo95], which can return a value of zero | |||
| where the older Jacobson algorithm would always have a minimum | where the older Jacobson algorithm always returns a positive value. | |||
| value of three. Brakmo and Peterson specified an additional step | Brakmo and Peterson specified an additional step of min(rtt + 2, | |||
| of min(rtt + 2, RTO) to avoid problems with this. Unfortunately, | RTO) to avoid problems with this. Unfortunately, in the | |||
| in the implementation this step was omitted when calculating the | implementation this step was omitted when calculating the | |||
| exponential backoff for the RTO. This results in an RTO of 0 | exponential backoff for the RTO. This results in an RTO of 0 | |||
| seconds being multiplied by the backoff, yielding again zero, and | seconds being multiplied by the backoff, yielding again zero, and | |||
| then being subjected to a later MAX operation that increases it to | then being subjected to a later MAX operation that increases it to | |||
| 1 second, regardless of the backoff factor. | 1 second, regardless of the backoff factor. | |||
| A similar TCP persist failure has the same cause. | A similar TCP persist failure has the same cause. | |||
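The ordering problem described under "How to fix" can be sketched in C. This is a minimal illustration and not the code of any implementation studied: the names rto_buggy, rto_fixed, clamp_rto, RTO_FLOOR and RTO_CEILING are hypothetical, the units are assumed to be whole seconds, and the 1 second floor and 64 second ceiling are chosen only to match the traces above. Applying the floor before the multiplication is one assumed way to avoid the problem; it is not the min(rtt + 2, RTO) step itself.

```c
#include <assert.h>

/* Hypothetical limits, in seconds, chosen to match the traces above. */
enum { RTO_FLOOR = 1, RTO_CEILING = 64 };

/* Clamp an RTO value into [RTO_FLOOR, RTO_CEILING]. */
static int clamp_rto(int rto)
{
    if (rto < RTO_FLOOR)
        return RTO_FLOOR;
    if (rto > RTO_CEILING)
        return RTO_CEILING;
    return rto;
}

/* Buggy ordering: a base RTO of 0 is multiplied by the backoff first,
 * so the product stays 0 and the later floor always yields 1 second. */
int rto_buggy(int base_rto, int nrexmt)
{
    assert(base_rto >= 0 && base_rto <= RTO_CEILING);
    assert(nrexmt >= 0 && nrexmt < 16);
    return clamp_rto(base_rto << nrexmt);
}

/* Assumed fix: apply the floor to the base RTO before backing off,
 * so each retransmission doubles the interval up to the ceiling. */
int rto_fixed(int base_rto, int nrexmt)
{
    assert(base_rto >= 0 && base_rto <= RTO_CEILING);
    assert(nrexmt >= 0 && nrexmt < 16);
    return clamp_rto(clamp_rto(base_rto) << nrexmt);
}
```

With a base RTO of 0, rto_buggy() returns 1 second for every retransmission count, which corresponds to the once-per-second retransmissions in the first trace, while rto_fixed() doubles the interval until it reaches the ceiling.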
| 3.11. | 3.11. | |||
| Name of Problem | Name of Problem | |||
| Insufficient interval between keepalives | Insufficient interval between keepalives | |||
| Classification | Classification | |||
| Reliability | Reliability | |||
| Description | Description | |||
| Keep-alive is a mechanism for checking whether an idle connection | Keep-alive is a mechanism for checking whether an idle connection | |||
| is still alive. According to RFC-1122, keep-alive may be included | is still alive. According to RFC 1122, keep-alive may be included | |||
| in an implementation. If it is included, the interval between | in an implementation. If it is included, the interval between | |||
| keep-alive packets MUST be configurable, and MUST default to no | keep-alive packets MUST be configurable, and MUST default to no | |||
| less than two hours. | less than two hours. | |||
| Significance | Significance | |||
| In congested networks, can lead to unwarranted termination of | In congested networks, can lead to unwarranted termination of | |||
| connections. | connections. | |||
| Implications | Implications | |||
| According to RFC-1122, keep-alive is not required of | According to RFC 1122, keep-alive is not required of | |||
| implementations because it could: (1) cause perfectly good | implementations because it could: (1) cause perfectly good | |||
| connections to break during transient Internet failures; (2) | connections to break during transient Internet failures; (2) | |||
| consume unnecessary bandwidth ("if no one is using the connection, | consume unnecessary bandwidth ("if no one is using the connection, | |||
| who cares if it is still good?"); and (3) cost money for an | who cares if it is still good?"); and (3) cost money for an | |||
| Internet path that charges for packets. Regarding this last point, | Internet path that charges for packets. Regarding this last point, | |||
| we note that in addition the presence of dial-on-demand links in | we note that in addition the presence of dial-on-demand links in | |||
| the route can greatly magnify the cost penalty of excess | the route can greatly magnify the cost penalty of excess | |||
| keepalives, potentially forcing a full-time connection on a link | keepalives, potentially forcing a full-time connection on a link | |||
| that would otherwise only be connected a few minutes a day. | that would otherwise only be connected a few minutes a day. | |||
| If keepalive is provided the RFC states that the required inter- | If keepalive is provided the RFC states that the required inter- | |||
| keepalive distance MUST default to no less than two hours. If it | keepalive distance MUST default to no less than two hours. If it | |||
| does not, the probability of connections breaking increases, the | does not, the probability of connections breaking increases, the | |||
| bandwidth used due to keepalives increases, and cost increases over | bandwidth used due to keepalives increases, and cost increases over | |||
| paths which charge per packet. | paths which charge per packet. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 specifies that the keep-alive mechanism may be provided. | RFC 1122 specifies that the keep-alive mechanism may be provided. | |||
| It also specifies the two hour minimum for the default interval | It also specifies the two hour minimum for the default interval | |||
| between keepalive probes. | between keepalive probes. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using the Orchestra tool at the peer of the machine using | Made using the Orchestra tool at the peer of the machine using | |||
| keep-alive. Machine A was configured to use default settings for | keep-alive. Machine A was configured to use default settings for | |||
| the keepalive timer. | the keepalive timer. | |||
| 11:36:32.910000 A > B: 3288354305:0 win 28672 datasz 4 SYN | 11:36:32.910000 A > B: 3288354305:0 win 28672 datasz 4 SYN | |||
| 11:36:32.930000 B > A: 896001:3288354306 win 4096 datasz 4 SYN ACK | 11:36:32.930000 B > A: 896001:3288354306 win 4096 datasz 4 SYN ACK | |||
| 11:36:32.950000 A > B: 3288354306:896002 win 28672 datasz 0 ACK | 11:36:32.950000 A > B: 3288354306:896002 win 28672 datasz 0 ACK | |||
| 11:50:01.190000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | 11:50:01.190000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | |||
| 11:50:01.210000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | 11:50:01.210000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | |||
| 12:03:29.410000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | 12:03:29.410000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | |||
| 12:03:29.430000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | 12:03:29.430000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | |||
| 12:16:57.630000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | 12:16:57.630000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | |||
| 12:16:57.650000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | 12:16:57.650000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | |||
| 12:30:25.850000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | 12:30:25.850000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | |||
| 12:30:25.870000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | 12:30:25.870000 B > A: 896002:3288354306 win 4096 datasz 0 ACK | |||
| 12:43:54.070000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | 12:43:54.070000 A > B: 3288354305:896002 win 28672 datasz 0 ACK | |||
| skipping to change at page 37, line 4 ¶ | skipping to change at page 37, line 32 ¶ | |||
| timer fires again in about 13 more minutes. This behavior | timer fires again in about 13 more minutes. This behavior | |||
| continues indefinitely until the connection is closed, and is a | continues indefinitely until the connection is closed, and is a | |||
| violation of the specification. | violation of the specification. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using the Orchestra tool at the peer of the machine using | Made using the Orchestra tool at the peer of the machine using | |||
| keep-alive. Machine A was configured to use default settings for | keep-alive. Machine A was configured to use default settings for | |||
| the keepalive timer. | the keepalive timer. | |||
| 17:37:20.500000 A > B: 34155521:0 win 4096 datasz 4 SYN | 17:37:20.500000 A > B: 34155521:0 win 4096 datasz 4 SYN | |||
| 17:37:20.520000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK | 17:37:20.520000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK | |||
| 17:37:20.540000 A > B: 34155522:6272002 win 4096 datasz 0 ACK | 17:37:20.540000 A > B: 34155522:6272002 win 4096 datasz 0 ACK | |||
| 19:37:25.430000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | 19:37:25.430000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | |||
| 19:37:25.450000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | 19:37:25.450000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | |||
| 21:37:30.560000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | 21:37:30.560000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | |||
| 21:37:30.570000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | 21:37:30.570000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | |||
| 23:37:35.580000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | 23:37:35.580000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | |||
| skipping to change at page 37, line 29 ¶ | skipping to change at page 38, line 4 ¶ | |||
| 01:37:40.620000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | 01:37:40.620000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | |||
| 01:37:40.640000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | 01:37:40.640000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | |||
| 03:37:45.590000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | 03:37:45.590000 A > B: 34155521:6272002 win 4096 datasz 0 ACK | |||
| 03:37:45.610000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | 03:37:45.610000 B > A: 6272002:34155522 win 4096 datasz 0 ACK | |||
| The initial three packets are the SYN exchange for connection | The initial three packets are the SYN exchange for connection | |||
| setup. Just over two hours later, the keepalive timer fires | setup. Just over two hours later, the keepalive timer fires | |||
| because the connection is idle. The keepalive is acknowledged, and | because the connection is idle. The keepalive is acknowledged, and | |||
| the timer fires again just over two hours later. This behavior | the timer fires again just over two hours later. This behavior | |||
| continues indefinitely until the connection is closed. | continues indefinitely until the connection is closed. | |||
| References | References | |||
| This problem is documented in [Dawson97]. | This problem is documented in [Dawson97]. | |||
| How to detect | How to detect | |||
| For implementations manifesting this problem, it shows up on a | For implementations manifesting this problem, it shows up on a | |||
| packet trace. If the connection is left idle, the keepalive probes | packet trace. If the connection is left idle, the keepalive probes | |||
| will arrive closer together than the two hour minimum. | will arrive closer together than the two hour minimum. | |||
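The check described under "How to detect" can be sketched as a small C helper that is fed the times of successive keepalive probes taken from a packet trace. The function name count_short_keepalive_gaps and the constant KEEPALIVE_MIN_SECS are hypothetical, and the sketch assumes the probe times have already been converted to whole seconds and sorted.

```c
#include <assert.h>
#include <stddef.h>

/* Two hours: the minimum default keepalive interval cited from RFC 1122. */
#define KEEPALIVE_MIN_SECS 7200L

/* Count the gaps between consecutive keepalive probes that are shorter
 * than the two hour minimum.  probe_secs holds n probe times in seconds,
 * in non-decreasing order. */
size_t count_short_keepalive_gaps(const long *probe_secs, size_t n)
{
    size_t short_gaps = 0;

    for (size_t i = 1; i < n; i++) {
        assert(probe_secs[i] >= probe_secs[i - 1]);
        if (probe_secs[i] - probe_secs[i - 1] < KEEPALIVE_MIN_SECS)
            short_gaps++;
    }
    return short_gaps;
}
```

Applied to the first trace, probes at 11:50:01, 12:03:29 and 12:16:57 are 808 seconds apart, so both gaps are counted. In the second trace, probes at 19:37:25 and 21:37:30 are 7205 seconds apart and none are counted.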
| 3.12. | 3.12. | |||
| Name of Problem | Name of Problem | |||
| Window probe deadlock | ||||
| Classification | ||||
| Reliability | ||||
| Description | ||||
| When an application reads a single byte from a full window, the | ||||
| window should not be updated, in order to avoid Silly Window | ||||
| Syndrome (SWS; see [RFC813]). If the remote peer uses a single | ||||
| byte of data to probe the window, that byte can be accepted into | ||||
| the buffer. In some implementations, at this point a negative | ||||
| argument to a signed comparison causes all further new data to be | ||||
| considered outside the window; consequently, it is discarded (after | ||||
| sending an ACK to resynchronize). These discards include the ACKs | ||||
| for the data packets sent by the local TCP, so the TCP will | ||||
| consider the data unacknowledged. | ||||
| Consequently, the application may be unable to complete sending new | ||||
| data to the remote peer, because it has exhausted the transmit | ||||
| buffer available to its local TCP, and buffer space is never being | ||||
| freed because incoming ACKs that would do so are being discarded. | ||||
| If the application does not read any more data, which may happen | ||||
| due to its failure to complete such sends, then deadlock results. | ||||
| Significance | ||||
| It's relatively rare for applications to use TCP in a manner that | ||||
| can exercise this problem. Most applications only transmit bulk | ||||
| data if they know the other end is prepared to receive the data. | ||||
| However, if a client fails to consume data, putting the server in | ||||
| persist mode, and then consumes a small amount of data, it can | ||||
| mistakenly compute a negative window. At this point the client | ||||
| will discard all further packets from the server, including ACKs of | ||||
| the client's own data, since they are not inside the (impossibly- | ||||
| sized) window. If subsequently the client consumes enough data to | ||||
| then send a window update to the server, the situation will be | ||||
| rectified. That is, this situation can only happen if the client | ||||
| consumes 1 < N < MSS bytes, so as not to cause a window update, and | ||||
| then starts its own transmission towards the server of more than a | ||||
| window's worth of data. | ||||
| Implications | ||||
| TCP connections will hang and eventually time out. | ||||
| Relevant RFCs | ||||
| RFC 793 describes zero window probing. RFC 813 describes Silly | ||||
| Window Syndrome. | ||||
| Trace file demonstrating it | ||||
| Trace made from a version of tcpdump modified to print out the | ||||
| sequence number attached to an ACK even if it's dataless. An | ||||
| unmodified tcpdump would not print seq:seq(0); however, for this | ||||
| bug, the sequence number in the ACK is important for unambiguously | ||||
| determining how the TCP is behaving. | ||||
| [ Normal connection startup and data transmission from B to A. | ||||
| Options, including MSS of 16344 in both directions, omitted | ||||
| for clarity. ] | ||||
| 16:07:32.327616 A > B: S 65360807:65360807(0) win 8192 | ||||
| 16:07:32.327304 B > A: S 65488807:65488807(0) ack 65360808 win 57344 | ||||
| 16:07:32.327425 A > B: . 1:1(0) ack 1 win 57344 | ||||
| 16:07:32.345732 B > A: P 1:2049(2048) ack 1 win 57344 | ||||
| 16:07:32.347013 B > A: P 2049:16385(14336) ack 1 win 57344 | ||||
| 16:07:32.347550 B > A: P 16385:30721(14336) ack 1 win 57344 | ||||
| 16:07:32.348683 B > A: P 30721:45057(14336) ack 1 win 57344 | ||||
| 16:07:32.467286 A > B: . 1:1(0) ack 45057 win 12288 | ||||
| 16:07:32.467854 B > A: P 45057:57345(12288) ack 1 win 57344 | ||||
| [ B fills up A's offered window ] | ||||
| 16:07:32.667276 A > B: . 1:1(0) ack 57345 win 0 | ||||
| [ B probes A's window with a single byte ] | ||||
| 16:07:37.467438 B > A: . 57345:57346(1) ack 1 win 57344 | ||||
| [ A resynchronizes without accepting the byte ] | ||||
| 16:07:37.467678 A > B: . 1:1(0) ack 57345 win 0 | ||||
| [ B probes A's window again ] | ||||
| 16:07:45.467438 B > A: . 57345:57346(1) ack 1 win 57344 | ||||
| [ A resynchronizes and accepts the byte (per the ack field) ] | ||||
| 16:07:45.667250 A > B: . 1:1(0) ack 57346 win 0 | ||||
| [ The application on A has started generating data. The first | ||||
| packet A sends is small due to a memory allocation bug. ] | ||||
| 16:07:51.358459 A > B: P 1:2049(2048) ack 57346 win 0 | ||||
| [ B acks A's first packet ] | ||||
| 16:07:51.467239 B > A: . 57346:57346(0) ack 2049 win 57344 | ||||
| [ This looks as though A accepted B's ACK and is sending | ||||
| another packet in response to it. In fact, A is trying | ||||
| to resynchronize with B, and happens to have data to send | ||||
| and can send it because the first small packet didn't use | ||||
| up cwnd. ] | ||||
| 16:07:51.467698 A > B: . 2049:14337(12288) ack 57346 win 0 | ||||
| [ B acks all of the data that A has sent ] | ||||
| 16:07:51.667283 B > A: . 57346:57346(0) ack 14337 win 57344 | ||||
| [ A tries to resynchronize. Notice that by the packets | ||||
| seen on the network, A and B *are* in fact synchronized; | ||||
| A only thinks that they aren't. ] | ||||
| 16:07:51.667477 A > B: . 14337:14337(0) ack 57346 win 0 | ||||
| [ A's retransmit timer fires, and B acks all of the data. | ||||
| A once again tries to resynchronize. ] | ||||
| 16:07:52.467682 A > B: . 1:14337(14336) ack 57346 win 0 | ||||
| 16:07:52.468166 B > A: . 57346:57346(0) ack 14337 win 57344 | ||||
| 16:07:52.468248 A > B: . 14337:14337(0) ack 57346 win 0 | ||||
| [ A's retransmit timer fires again, and B acks all of the data. | ||||
| A once again tries to resynchronize. ] | ||||
| 16:07:55.467684 A > B: . 1:14337(14336) ack 57346 win 0 | ||||
| 16:07:55.468172 B > A: . 57346:57346(0) ack 14337 win 57344 | ||||
| 16:07:55.468254 A > B: . 14337:14337(0) ack 57346 win 0 | ||||
| Trace file demonstrating correct behavior | ||||
| Made between the same two hosts after applying the bug fix | ||||
| mentioned below (and using the same modified tcpdump). | ||||
| [ Connection starts up with data transmission from B to A. | ||||
| Note that due to a separate bug (the fact that A and B | ||||
| are communicating over a loopback driver), B erroneously | ||||
| skips slow start. ] | ||||
| 17:38:09.510854 A > B: S 3110066585:3110066585(0) win 16384 | ||||
| 17:38:09.510926 B > A: S 3110174850:3110174850(0) ack 3110066586 win 57344 | ||||
| 17:38:09.510953 A > B: . 1:1(0) ack 1 win 57344 | ||||
| 17:38:09.512956 B > A: P 1:2049(2048) ack 1 win 57344 | ||||
| 17:38:09.513222 B > A: P 2049:16385(14336) ack 1 win 57344 | ||||
| 17:38:09.513428 B > A: P 16385:30721(14336) ack 1 win 57344 | ||||
| 17:38:09.513638 B > A: P 30721:45057(14336) ack 1 win 57344 | ||||
| 17:38:09.519531 A > B: . 1:1(0) ack 45057 win 12288 | ||||
| 17:38:09.519638 B > A: P 45057:57345(12288) ack 1 win 57344 | ||||
| [ B fills up A's offered window ] | ||||
| 17:38:09.719526 A > B: . 1:1(0) ack 57345 win 0 | ||||
| [ B probes A's window with a single byte. A resynchronizes | ||||
| without accepting the byte ] | ||||
| 17:38:14.499661 B > A: . 57345:57346(1) ack 1 win 57344 | ||||
| 17:38:14.499724 A > B: . 1:1(0) ack 57345 win 0 | ||||
| [ B probes A's window again. A resynchronizes and accepts | ||||
| the byte, as indicated by the ack field ] | ||||
| 17:38:19.499764 B > A: . 57345:57346(1) ack 1 win 57344 | ||||
| 17:38:19.519731 A > B: . 1:1(0) ack 57346 win 0 | ||||
| [ B probes A's window with a single byte. A resynchronizes | ||||
| without accepting the byte ] | ||||
| 17:38:24.499865 B > A: . 57346:57347(1) ack 1 win 57344 | ||||
| 17:38:24.499934 A > B: . 1:1(0) ack 57346 win 0 | ||||
| [ The application on A has started generating data. | ||||
| B acks A's data and A accepts the ACKs and the | ||||
| data transfer continues ] | ||||
| 17:38:28.530265 A > B: P 1:2049(2048) ack 57346 win 0 | ||||
| 17:38:28.719914 B > A: . 57346:57346(0) ack 2049 win 57344 | ||||
| 17:38:28.720023 A > B: . 2049:16385(14336) ack 57346 win 0 | ||||
| 17:38:28.720089 A > B: . 16385:30721(14336) ack 57346 win 0 | ||||
| 17:38:28.720370 B > A: . 57346:57346(0) ack 30721 win 57344 | ||||
| 17:38:28.720462 A > B: . 30721:45057(14336) ack 57346 win 0 | ||||
| 17:38:28.720526 A > B: P 45057:59393(14336) ack 57346 win 0 | ||||
| 17:38:28.720824 A > B: P 59393:73729(14336) ack 57346 win 0 | ||||
| 17:38:28.721124 B > A: . 57346:57346(0) ack 73729 win 47104 | ||||
| 17:38:28.721198 A > B: P 73729:88065(14336) ack 57346 win 0 | ||||
| 17:38:28.721379 A > B: P 88065:102401(14336) ack 57346 win 0 | ||||
| 17:38:28.721557 A > B: P 102401:116737(14336) ack 57346 win 0 | ||||
| 17:38:28.721863 B > A: . 57346:57346(0) ack 116737 win 36864 | ||||
| References | ||||
| None known. | ||||
| How to detect | ||||
| Initiate a connection from a client to a server. Have the server | ||||
| continuously send data until its buffers have been full for long | ||||
| enough to exhaust the window. Next, have the client read 1 byte | ||||
| and then delay for long enough that the server TCP sends a window | ||||
| probe. Now have the client start sending data. At this point, if | ||||
| it ignores the server's ACKs, then the client's TCP suffers from | ||||
| the problem. | ||||
| How to fix | ||||
| In one implementation known to exhibit the problem (derived from | ||||
| 4.3-Reno), the problem was introduced when the macro MAX() was | ||||
| replaced by the function call max() for computing the amount of | ||||
| space in the receive window: | ||||
| tp->rcv_wnd = max(win, (int)(tp->rcv_adv - tp->rcv_nxt)); | ||||
| When data has been received into a window beyond what has been | ||||
| advertised to the other side, rcv_nxt > rcv_adv, making this | ||||
| negative. It's clear from the (int) cast that this is intended, | ||||
| but the unsigned max() function sign-extends so the negative number | ||||
| is "larger". The fix is to change max() to imax(): | ||||
| tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt)); | ||||
| 4.3-Tahoe and before did not have this bug, since they used the macro | ||||
| MAX() for this calculation. | ||||
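The signed versus unsigned comparison can be reproduced outside the kernel. The sketch below is a standalone illustration under stated assumptions: max_unsigned() and imax_signed() are hypothetical stand-ins for the kernel's max() and imax(), seq_diff() is a helper introduced here, and the fixed-width types are assumed for portability rather than copied from any implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's max() and imax(). */
static uint32_t max_unsigned(uint32_t a, uint32_t b) { return a > b ? a : b; }
static int32_t imax_signed(int32_t a, int32_t b) { return a > b ? a : b; }

/* Signed distance a - b between two 32-bit sequence numbers. */
static int32_t seq_diff(uint32_t a, uint32_t b)
{
    uint32_t d = a - b;   /* wraps modulo 2^32 */

    if (d <= (uint32_t)INT32_MAX)
        return (int32_t)d;
    return -(int32_t)(UINT32_MAX - d) - 1;
}

/* Buggy form: both arguments are treated as unsigned, so a distance
 * of -1 becomes 0xffffffff and wins the comparison. */
uint32_t rcv_wnd_buggy(int32_t win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
    assert(win >= 0);
    return max_unsigned((uint32_t)win, (uint32_t)seq_diff(rcv_adv, rcv_nxt));
}

/* Fixed form: the comparison is signed, so the negative distance loses. */
uint32_t rcv_wnd_fixed(int32_t win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
    assert(win >= 0);
    return (uint32_t)imax_signed(win, seq_diff(rcv_adv, rcv_nxt));
}
```

With values loosely modeled on the trace (rcv_adv of 57345, rcv_nxt of 57346, and win of 0), the buggy form yields 0xffffffff while the fixed form yields 0.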
| 3.13. | ||||
| Name of Problem | ||||
| Stretch ACK violation | Stretch ACK violation | |||
| Classification | Classification | |||
| Congestion Control/Performance | Congestion Control/Performance | |||
| Description | Description | |||
| To improve efficiency (both computer and network) a data receiver | To improve efficiency (both computer and network) a data receiver | |||
| may refrain from sending an ACK for each incoming segment, | may refrain from sending an ACK for each incoming segment, | |||
| according to [RFC1122]. However, an ACK should not be delayed an | according to [RFC1122]. However, an ACK should not be delayed an | |||
| inordinate amount of time. Specifically, ACKs MUST be sent for | inordinate amount of time. Specifically, ACKs SHOULD be sent for | |||
| every second full-sized segment that arrives. If a second full- | every second full-sized segment that arrives. If a second full- | |||
| sized segment does not arrive within a given timeout (of no more | sized segment does not arrive within a given timeout (of no more | |||
| than 0.5 seconds), an ACK must be transmitted, according to | than 0.5 seconds), an ACK should be transmitted, according to | |||
| [RFC1122]. A TCP receiver which does not generate an ACK for every | [RFC1122]. A TCP receiver which does not generate an ACK for every | |||
| second full-sized segment exhibits a "Stretch ACK Violation". | second full-sized segment exhibits a "Stretch ACK Violation". | |||
| Significance | Significance | |||
| TCP receivers exhibiting this behavior will cause TCP senders to | TCP receivers exhibiting this behavior will cause TCP senders to | |||
| generate burstier traffic, which can degrade performance in | generate burstier traffic, which can degrade performance in | |||
| congested environments. In addition, generating fewer ACKs | congested environments. In addition, generating fewer ACKs | |||
| increases the amount of time needed by the slow start algorithm to | increases the amount of time needed by the slow start algorithm to | |||
| open the congestion window to an appropriate point, which | open the congestion window to an appropriate point, which | |||
| diminishes performance in environments with large bandwidth-delay | diminishes performance in environments with large bandwidth-delay | |||
| products. Finally, generating fewer ACKs may cause needless | products. Finally, generating fewer ACKs may cause needless | |||
| retransmission timeouts in lossy environments, as it increases the | retransmission timeouts in lossy environments, as it increases the | |||
| possibility that an entire window of ACKs is lost, forcing a | possibility that an entire window of ACKs is lost, forcing a | |||
| retransmission timeout. | retransmission timeout. | |||
| Implications | Implications | |||
| When not in loss recovery, every ACK received by a TCP sender | When not in loss recovery, every ACK received by a TCP sender | |||
| triggers the transmission of new data segments. The burst size is | triggers the transmission of new data segments. The burst size is | |||
| determined by the number of previously unacknowledged segments each | determined by the number of previously unacknowledged segments each | |||
| ACK covers. Therefore, a TCP receiver ACKing more than 2 segments | ACK covers. Therefore, a TCP receiver ack'ing more than 2 segments | |||
| at a time causes the sending TCP to generate a larger burst of | at a time causes the sending TCP to generate a larger burst of | |||
| traffic upon receipt of the ACK. This large burst of traffic can | traffic upon receipt of the ACK. This large burst of traffic can | |||
| overwhelm an intervening gateway, leading to higher drop rates for | overwhelm an intervening gateway, leading to higher drop rates for | |||
| both the connection and other connections passing through the | both the connection and other connections passing through the | |||
| congested gateway. | congested gateway. | |||
| In addition, the TCP slow start algorithm increases the congestion | In addition, the TCP slow start algorithm increases the congestion | |||
| window by 1 segment for each ACK received. Therefore, increasing | window by 1 segment for each ACK received. Therefore, increasing | |||
| the ACK interval (thus decreasing the rate at which ACKs are | the ACK interval (thus decreasing the rate at which ACKs are | |||
| transmitted) increases the amount of time it takes slow start to | transmitted) increases the amount of time it takes slow start to | |||
| increase the congestion window to an appropriate operating point, | increase the congestion window to an appropriate operating point, | |||
| and the connection consequently suffers from reduced performance. | and the connection consequently suffers from reduced performance. | |||
| This is especially true for connections using large windows. | This is especially true for connections using large windows. | |||
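The effect of the ACK interval on slow start can be estimated with a rough model. The sketch below is a simplification and not a description of any real TCP: it assumes an initial congestion window of 1 segment, no loss, one ACK per segs_per_ack segments in each round trip (with any leftover segments acknowledged by a delayed-ACK timer within the same round trip), and an increase of 1 segment per ACK. The function name slow_start_rounds is hypothetical.

```c
#include <assert.h>

/* Estimate how many round trips slow start needs to grow the congestion
 * window from 1 segment to at least target_segments, when the receiver
 * sends one ACK for every segs_per_ack segments. */
int slow_start_rounds(int target_segments, int segs_per_ack)
{
    int cwnd = 1;    /* congestion window, in segments */
    int rounds = 0;  /* round trips elapsed */

    assert(target_segments >= 1 && target_segments <= (1 << 20));
    assert(segs_per_ack >= 1);

    while (cwnd < target_segments) {
        /* Number of ACKs generated for this window, rounded up. */
        int acks = (cwnd + segs_per_ack - 1) / segs_per_ack;

        cwnd += acks;    /* slow start: one segment per ACK */
        rounds++;
    }
    return rounds;
}
```

Under these assumptions, reaching a 64 segment window takes 6 round trips with an ACK for every segment, 10 with an ACK for every second segment, and 15 when the receiver ACKs only every fourth segment.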
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 outlines delayed ACKs as a recommended mechanism. | RFC 1122 outlines delayed ACKs as a recommended mechanism. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Trace file taken using tcpdump at host B, the data receiver (and | Trace file taken using tcpdump at host B, the data receiver (and | |||
| ACK originator). The advertised window (which never changed) and | ACK originator). The advertised window (which never changed) and | |||
| timestamp options have been omitted for clarity, except for the | timestamp options have been omitted for clarity, except for the | |||
| first packet sent by A: | first packet sent by A: | |||
| 12:09:24.820187 A.1174 > B.3999: . 2049:3497(1448) ack 1 | 12:09:24.820187 A.1174 > B.3999: . 2049:3497(1448) ack 1 | |||
| win 33580 <nop,nop,timestamp 2249877 2249914> [tos 0x8] | win 33580 <nop,nop,timestamp 2249877 2249914> [tos 0x8] | |||
| 12:09:24.824147 A.1174 > B.3999: . 3497:4945(1448) ack 1 | 12:09:24.824147 A.1174 > B.3999: . 3497:4945(1448) ack 1 | |||
| 12:09:24.832034 A.1174 > B.3999: . 4945:6393(1448) ack 1 | 12:09:24.832034 A.1174 > B.3999: . 4945:6393(1448) ack 1 | |||
| 12:09:24.832222 B.3999 > A.1174: . ack 6393 | 12:09:24.832222 B.3999 > A.1174: . ack 6393 | |||
| 12:09:24.934837 A.1174 > B.3999: . 6393:7841(1448) ack 1 | 12:09:24.934837 A.1174 > B.3999: . 6393:7841(1448) ack 1 | |||
| skipping to change at page 39, line 52 ¶ | skipping to change at page 45, line 4 ¶ | |||
| Trace file taken using tcpdump at host B, the data receiver (and | Trace file taken using tcpdump at host B, the data receiver (and | |||
| ACK originator), again with window and timestamp information | ACK originator), again with window and timestamp information | |||
| omitted except for the first packet: | omitted except for the first packet: | |||
| 12:06:53.627320 A.1172 > B.3999: . 1449:2897(1448) ack 1 | 12:06:53.627320 A.1172 > B.3999: . 1449:2897(1448) ack 1 | |||
| win 33580 <nop,nop,timestamp 2249575 2249612> [tos 0x8] | win 33580 <nop,nop,timestamp 2249575 2249612> [tos 0x8] | |||
| 12:06:53.634773 A.1172 > B.3999: . 2897:4345(1448) ack 1 | 12:06:53.634773 A.1172 > B.3999: . 2897:4345(1448) ack 1 | |||
| 12:06:53.634961 B.3999 > A.1172: . ack 4345 | 12:06:53.634961 B.3999 > A.1172: . ack 4345 | |||
| 12:06:53.737326 A.1172 > B.3999: . 4345:5793(1448) ack 1 | 12:06:53.737326 A.1172 > B.3999: . 4345:5793(1448) ack 1 | |||
| 12:06:53.744401 A.1172 > B.3999: . 5793:7241(1448) ack 1 | 12:06:53.744401 A.1172 > B.3999: . 5793:7241(1448) ack 1 | |||
| 12:06:53.744592 B.3999 > A.1172: . ack 7241 | 12:06:53.744592 B.3999 > A.1172: . ack 7241 | |||
| 12:06:53.752287 A.1172 > B.3999: . 7241:8689(1448) ack 1 | 12:06:53.752287 A.1172 > B.3999: . 7241:8689(1448) ack 1 | |||
| 12:06:53.847332 A.1172 > B.3999: . 8689:10137(1448) ack 1 | 12:06:53.847332 A.1172 > B.3999: . 8689:10137(1448) ack 1 | |||
| 12:06:53.847525 B.3999 > A.1172: . ack 10137 | 12:06:53.847525 B.3999 > A.1172: . ack 10137 | |||
| This trace shows the TCP receiver (host B) ack'ing every second | This trace shows the TCP receiver (host B) ack'ing every second | |||
| full-sized packet, according to [RFC1122]. This is the same | full-sized packet, according to [RFC1122]. This is the same | |||
| implementation shown above, with slight modifications that allow | implementation shown above, with slight modifications that allow | |||
| the receiver to take the length of the options into account when | the receiver to take the length of the options into account when | |||
| deciding when to transmit an ACK. | deciding when to transmit an ACK. | |||
| References | References | |||
| This problem is documented in [Allman97] and [Paxson97]. | This problem is documented in [Allman97] and [Paxson97]. | |||
| How to detect | How to detect | |||
| Stretch ACK violations show up immediately in receiver-side packet | Stretch ACK violations show up immediately in receiver-side packet | |||
| traces of bulk transfers, as shown above. However, packet traces | traces of bulk transfers, as shown above. However, packet traces | |||
| made on the sender side of the TCP connection may lead to | made on the sender side of the TCP connection may lead to | |||
| ambiguities when diagnosing this problem due to the possibility of | ambiguities when diagnosing this problem due to the possibility of | |||
| lost ACKs. | lost ACKs. | |||
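As a rough illustration of the receiver-side check, the sketch below counts how many full-sized segments each pure ACK newly covers. The helper name `segments_per_ack` and the regular expression are assumptions made for this example; the sample lines are copied from the correct-behavior trace above, and the segment size of 1448 bytes is read from that trace.

```python
import re

# Matches pure ACK lines such as "12:06:53.634961 B.3999 > A.1172: . ack 4345".
ACK_ONLY = re.compile(r"^\S+ (\S+) > (\S+): \. ack (\d+)")


def segments_per_ack(lines, receiver, segment_size):
    """For each pure ACK after the first, return newly ACKed bytes / segment_size."""
    ratios = []
    previous_ack = None
    for line in lines:
        match = ACK_ONLY.match(line)
        if not match or not match.group(1).startswith(receiver + "."):
            continue  # data segment, or an ACK from the other host
        ack = int(match.group(3))
        if previous_ack is not None:
            ratios.append((ack - previous_ack) / segment_size)
        previous_ack = ack
    return ratios


TRACE = [
    "12:06:53.634773 A.1172 > B.3999: . 2897:4345(1448) ack 1",
    "12:06:53.634961 B.3999 > A.1172: . ack 4345",
    "12:06:53.737326 A.1172 > B.3999: . 4345:5793(1448) ack 1",
    "12:06:53.744401 A.1172 > B.3999: . 5793:7241(1448) ack 1",
    "12:06:53.744592 B.3999 > A.1172: . ack 7241",
    "12:06:53.752287 A.1172 > B.3999: . 7241:8689(1448) ack 1",
    "12:06:53.847332 A.1172 > B.3999: . 8689:10137(1448) ack 1",
    "12:06:53.847525 B.3999 > A.1172: . ack 10137",
]

ratios = segments_per_ack(TRACE, receiver="B", segment_size=1448)
stretch = [r for r in ratios if r > 2]  # more than two full-sized segments per ACK
```

A ratio above 2 in a receiver-side trace would suggest a stretch ACK; in a sender-side trace the same ratio may simply reflect a lost ACK, as noted above.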
| 3.13. | 3.14. | |||
| Name of Problem | Name of Problem | |||
| Retransmission sends multiple packets | Retransmission sends multiple packets | |||
| Classification | Classification | |||
| Congestion control | Congestion control | |||
| Description | Description | |||
| When a TCP retransmits a segment due to a timeout expiration or | When a TCP retransmits a segment due to a timeout expiration or | |||
| beginning a fast retransmission sequence, it should only transmit a | beginning a fast retransmission sequence, it should only transmit a | |||
| single segment. A TCP that transmits more than one segment | single segment. A TCP that transmits more than one segment | |||
| exhibits "Retransmission Sends Multiple Packets". | exhibits "Retransmission Sends Multiple Packets". | |||
| Instances of this problem have been known to occur due to | Instances of this problem have been known to occur due to | |||
| miscomputations involving the use of TCP options. TCP options | miscomputations involving the use of TCP options. TCP options | |||
| increase the TCP header beyond its usual size of 20 bytes. The | increase the TCP header beyond its usual size of 20 bytes. The | |||
| total size of the header must be taken into account when retransmitting | total size of the header must be taken into account when retransmitting | |||
| a packet. If a TCP sender does not account for the length of the | a packet. If a TCP sender does not account for the length of the | |||
| TCP options when determining how much data to retransmit, it will | TCP options when determining how much data to retransmit, it will | |||
| send too much data to fit into a single packet. In this case, the | send too much data to fit into a single packet. In this case, the | |||
| correct retransmission will be followed by a short segment | correct retransmission will be followed by a short segment | |||
| (tinygram) containing data that may not need to be retransmitted. | (tinygram) containing data that may not need to be retransmitted. | |||
| A specific case is a TCP using the RFC 1323 timestamp option, which | A specific case is a TCP using the RFC 1323 timestamp option, which | |||
| adds 12 bytes to the standard 20-byte TCP header. On | adds 12 bytes to the standard 20-byte TCP header. On | |||
| retransmission of a packet, the 12 byte option is incorrectly | retransmission of a packet, the 12 byte option is incorrectly | |||
| interpreted as part of the data portion of the segment. A standard | interpreted as part of the data portion of the segment. A standard | |||
| TCP header and a new 12-byte option are added to the data, which | TCP header and a new 12-byte option are added to the data, which | |||
| yields a transmission of 12 bytes more data than contained in the | yields a transmission of 12 bytes more data than contained in the | |||
| original segment. This overflow causes a smaller packet, with 12 | original segment. This overflow causes a smaller packet, with 12 | |||
| data bytes, to be transmitted. | data bytes, to be transmitted. | |||
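The arithmetic of the timestamp case can be sketched as follows. This is an illustration only, not code from any TCP stack: `packetize` is a hypothetical helper, and the 460-byte payload limit is borrowed from the trace in this section.

```python
def packetize(nbytes, max_payload):
    """Split nbytes of data into per-packet payload sizes of at most max_payload."""
    sizes = []
    while nbytes > 0:
        chunk = min(nbytes, max_payload)
        sizes.append(chunk)
        nbytes -= chunk
    return sizes


TIMESTAMP_OPTION_LEN = 12   # RFC 1323 timestamp option, including padding
MAX_PAYLOAD = 460           # data bytes per segment in the trace (assumption)
original_payload = 460      # size of the segment being retransmitted

# Correct sender: only the original data is retransmitted.
correct = packetize(original_payload, MAX_PAYLOAD)

# Buggy sender: the 12 option bytes are miscounted as data, so the
# retransmission overflows into a second, 12-byte packet (the tinygram).
buggy = packetize(original_payload + TIMESTAMP_OPTION_LEN, MAX_PAYLOAD)
```

Here `correct` holds a single 460-byte packet, while `buggy` holds a 460-byte packet followed by a 12-byte tinygram.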
| Significance | Significance | |||
| This problem is somewhat serious for congested environments because | This problem is somewhat serious for congested environments because | |||
| the TCP implementation injects more packets into the network than | the TCP implementation injects more packets into the network than | |||
| is appropriate. However, since a tinygram is only sent in response | is appropriate. However, since a tinygram is only sent in response | |||
| to a fast retransmit or a timeout, it does not affect the sustained | to a fast retransmit or a timeout, it does not affect the sustained | |||
| sending rate. | sending rate. | |||
| Implications | Implications | |||
| A TCP exhibiting this behavior is stressing the network with more | A TCP exhibiting this behavior is stressing the network with more | |||
| traffic than appropriate, and stressing routers by increasing the | traffic than appropriate, and stressing routers by increasing the | |||
| number of packets they must process. The redundant tinygram will | number of packets they must process. The redundant tinygram will | |||
| also elicit a duplicate ack from the receiver, resulting in yet | also elicit a duplicate ACK from the receiver, resulting in yet | |||
| another unnecessary transmission. | another unnecessary transmission. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 requires use of slow start after loss; RFC 2001 explicates | RFC 1122 requires use of slow start after loss; RFC 2001 explicates | |||
| slow start; RFC 1323 describes the timestamp option that has been | slow start; RFC 1323 describes the timestamp option that has been | |||
| observed to lead to some implementations exhibiting this problem. | observed to lead to some implementations exhibiting this problem. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump/BPF recording at a machine on the same subnet as | Made using tcpdump recording at a machine on the same subnet as | |||
| Host A. Host A is the sender and Host B is the receiver. The | Host A. Host A is the sender and Host B is the receiver. The | |||
| advertised window and timestamp options have been omitted for | advertised window and timestamp options have been omitted for | |||
| clarity, except for the first segment sent by host A. In addition, | clarity, except for the first segment sent by host A. In addition, | |||
| portions of the trace file not pertaining to the packet in question | portions of the trace file not pertaining to the packet in question | |||
| have been removed (missing packets are denoted by ``[...]'' in the | have been removed (missing packets are denoted by ``[...]'' in the | |||
| trace). | trace). | |||
| 11:55:22.701668 A > B: . 7361:7821(460) ack 1 | 11:55:22.701668 A > B: . 7361:7821(460) ack 1 | |||
| win 49324 <nop,nop,timestamp 3485348 3485113> | win 49324 <nop,nop,timestamp 3485348 3485113> | |||
| 11:55:22.702109 A > B: . 7821:8281(460) ack 1 | 11:55:22.702109 A > B: . 7821:8281(460) ack 1 | |||
| [...] | [...] | |||
| 11:55:23.112405 B > A: . ack 7821 | 11:55:23.112405 B > A: . ack 7821 | |||
| 11:55:23.113069 A > B: . 12421:12881(460) ack 1 | 11:55:23.113069 A > B: . 12421:12881(460) ack 1 | |||
| 11:55:23.113511 A > B: . 12881:13341(460) ack 1 | 11:55:23.113511 A > B: . 12881:13341(460) ack 1 | |||
| 11:55:23.333077 B > A: . ack 7821 | 11:55:23.333077 B > A: . ack 7821 | |||
| 11:55:23.336860 B > A: . ack 7821 | 11:55:23.336860 B > A: . ack 7821 | |||
| 11:55:23.340638 B > A: . ack 7821 | 11:55:23.340638 B > A: . ack 7821 | |||
| 11:55:23.341290 A > B: . 7821:8281(460) ack 1 | 11:55:23.341290 A > B: . 7821:8281(460) ack 1 | |||
| 11:55:23.341317 A > B: . 8281:8293(12) ack 1 | 11:55:23.341317 A > B: . 8281:8293(12) ack 1 | |||
| 11:55:23.498242 B > A: . ack 7821 | 11:55:23.498242 B > A: . ack 7821 | |||
| 11:55:23.506850 B > A: . ack 7821 | 11:55:23.506850 B > A: . ack 7821 | |||
| 11:55:23.510630 B > A: . ack 7821 | 11:55:23.510630 B > A: . ack 7821 | |||
| skipping to change at page 42, line 45 ¶ | skipping to change at page 48, line 5 ¶ | |||
| omitted. | omitted. | |||
| References | References | |||
| [Brakmo95] | [Brakmo95] | |||
| How to detect | How to detect | |||
| This problem can be detected by examining a packet trace of the TCP | This problem can be detected by examining a packet trace of the TCP | |||
| connections of a machine using TCP options, during which a packet | connections of a machine using TCP options, during which a packet | |||
| is retransmitted. | is retransmitted. | |||
| 3.14. | 3.15. | |||
| Name of Problem | Name of Problem | |||
| Failure to send FIN notification promptly | Failure to send FIN notification promptly | |||
| Classification | Classification | |||
| Performance | Performance | |||
| Description | Description | |||
| When an application closes a connection, the corresponding TCP | When an application closes a connection, the corresponding TCP | |||
| should send the FIN notification promptly to its peer (unless | should send the FIN notification promptly to its peer (unless | |||
| prevented by the congestion window). If a TCP implementation | prevented by the congestion window). If a TCP implementation | |||
| delays in sending the FIN notification, for example due to waiting | delays in sending the FIN notification, for example due to waiting | |||
| until unacknowledged data has been acknowledged, then it is said to | until unacknowledged data has been acknowledged, then it is said to | |||
| exhibit "Failure to send FIN notification promptly". | exhibit "Failure to send FIN notification promptly". | |||
| skipping to change at page 43, line 39 ¶ | skipping to change at page 48, line 44 ¶ | |||
| Implications | Implications | |||
| Can diminish total throughput as seen at the application layer, | Can diminish total throughput as seen at the application layer, | |||
| because connection termination takes longer to complete. | because connection termination takes longer to complete. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 793 indicates that a receiver should treat an incoming FIN flag | RFC 793 indicates that a receiver should treat an incoming FIN flag | |||
| as implying the push function. | as implying the push function. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump (no losses reported). | Made using tcpdump (no losses reported by the packet filter). | |||
| 10:04:38.68 A > B: S 1031850376:1031850376(0) win 4096 | 10:04:38.68 A > B: S 1031850376:1031850376(0) win 4096 | |||
| <mss 1460,wscale 0,eol> (DF) | <mss 1460,wscale 0,eol> (DF) | |||
| 10:04:38.71 B > A: S 596916473:596916473(0) ack 1031850377 | 10:04:38.71 B > A: S 596916473:596916473(0) ack 1031850377 | |||
| win 8760 <mss 1460> (DF) | win 8760 <mss 1460> (DF) | |||
| 10:04:38.73 A > B: . ack 1 win 4096 (DF) | 10:04:38.73 A > B: . ack 1 win 4096 (DF) | |||
| 10:04:41.98 A > B: P 1:4(3) ack 1 win 4096 (DF) | 10:04:41.98 A > B: P 1:4(3) ack 1 win 4096 (DF) | |||
| 10:04:42.15 B > A: . ack 4 win 8757 (DF) | 10:04:42.15 B > A: . ack 4 win 8757 (DF) | |||
| 10:04:42.23 A > B: P 4:7(3) ack 1 win 4096 (DF) | 10:04:42.23 A > B: P 4:7(3) ack 1 win 4096 (DF) | |||
| 10:04:42.25 B > A: P 1:11(10) ack 7 win 8754 (DF) | 10:04:42.25 B > A: P 1:11(10) ack 7 win 8754 (DF) | |||
| 10:04:42.32 A > B: . ack 11 win 4096 (DF) | 10:04:42.32 A > B: . ack 11 win 4096 (DF) | |||
| 10:04:42.33 B > A: P 11:51(40) ack 7 win 8754 (DF) | 10:04:42.33 B > A: P 11:51(40) ack 7 win 8754 (DF) | |||
| 10:04:42.51 A > B: . ack 51 win 4096 (DF) | 10:04:42.51 A > B: . ack 51 win 4096 (DF) | |||
| 10:04:42.53 B > A: F 51:51(0) ack 7 win 8754 (DF) | 10:04:42.53 B > A: F 51:51(0) ack 7 win 8754 (DF) | |||
| 10:04:42.56 A > B: FP 7:7(0) ack 52 win 4096 (DF) | 10:04:42.56 A > B: FP 7:7(0) ack 52 win 4096 (DF) | |||
| 10:04:42.58 B > A: . ack 8 win 8754 (DF) | 10:04:42.58 B > A: . ack 8 win 8754 (DF) | |||
| Machine B in the trace above does not send out a FIN notification | Machine B in the trace above does not send out a FIN notification | |||
| promptly if there is any data outstanding. It instead waits for | promptly if there is any data outstanding. It instead waits for | |||
| all unacknowledged data to be acknowledged before sending the FIN | all unacknowledged data to be acknowledged before sending the FIN | |||
| segment. The connection was closed at 10:04:42.33 after requesting | segment. The connection was closed at 10:04:42.33 after requesting | |||
| 40 bytes to be sent. However, the FIN notification isn't sent | 40 bytes to be sent. However, the FIN notification isn't sent | |||
| until 10:04:42.51, after the (delayed) acknowledgement of the 40 | until 10:04:42.51, after the (delayed) acknowledgement of the 40 | |||
| bytes of data. | bytes of data. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using tcpdump (no losses reported). | Made using tcpdump (no losses reported by the packet filter). | |||
| 10:27:53.85 C > D: S 419744533:419744533(0) win 4096 | 10:27:53.85 C > D: S 419744533:419744533(0) win 4096 | |||
| <mss 1460,wscale 0,eol> (DF) | <mss 1460,wscale 0,eol> (DF) | |||
| 10:27:53.92 D > C: S 10082297:10082297(0) ack 419744534 | 10:27:53.92 D > C: S 10082297:10082297(0) ack 419744534 | |||
| win 8760 <mss 1460> (DF) | win 8760 <mss 1460> (DF) | |||
| 10:27:53.95 C > D: . ack 1 win 4096 (DF) | 10:27:53.95 C > D: . ack 1 win 4096 (DF) | |||
| 10:27:54.42 C > D: P 1:4(3) ack 1 win 4096 (DF) | 10:27:54.42 C > D: P 1:4(3) ack 1 win 4096 (DF) | |||
| 10:27:54.62 D > C: . ack 4 win 8757 (DF) | 10:27:54.62 D > C: . ack 4 win 8757 (DF) | |||
| 10:27:54.76 C > D: P 4:7(3) ack 1 win 4096 (DF) | 10:27:54.76 C > D: P 4:7(3) ack 1 win 4096 (DF) | |||
| 10:27:54.89 D > C: P 1:11(10) ack 7 win 8754 (DF) | 10:27:54.89 D > C: P 1:11(10) ack 7 win 8754 (DF) | |||
| skipping to change at page 44, line 46 ¶ | skipping to change at page 50, line 5 ¶ | |||
| 10:27:55.01 C > D: FP 7:7(0) ack 52 win 4096 (DF) | 10:27:55.01 C > D: FP 7:7(0) ack 52 win 4096 (DF) | |||
| 10:27:55.09 D > C: . ack 8 win 8754 (DF) | 10:27:55.09 D > C: . ack 8 win 8754 (DF) | |||
| Here, Machine D sends a FIN with 40 bytes of data even before the | Here, Machine D sends a FIN with 40 bytes of data even before the | |||
| original 10 octets have been acknowledged. This is correct behavior | original 10 octets have been acknowledged. This is correct behavior | |||
| as it provides for the highest performance. | as it provides for the highest performance. | |||
| References | References | |||
| This problem is documented in [Dawson97]. | This problem is documented in [Dawson97]. | |||
| How to detect | How to detect | |||
| For implementations manifesting this problem, it shows up on a | For implementations manifesting this problem, it shows up on a | |||
| packet trace. | packet trace. | |||
| 3.15. | 3.16. | |||
| Name of Problem | Name of Problem | |||
| Failure to send a RST after Half Duplex Close | Failure to send a RST after Half Duplex Close | |||
| Classification | Classification | |||
| Resource management | Resource management | |||
| Description | Description | |||
| RFC 1122 4.2.2.13 states that a TCP SHOULD send a RST if data is | RFC 1122 4.2.2.13 states that a TCP SHOULD send a RST if data is | |||
| received after "half duplex close", i.e. if it cannot be delivered | received after "half duplex close", i.e. if it cannot be delivered | |||
| skipping to change at page 45, line 46 ¶ | skipping to change at page 51, line 4 ¶ | |||
| client TCP does not consume the pending data or tear down the | client TCP does not consume the pending data or tear down the | |||
| connection: the window decreases to zero, since the client cannot | connection: the window decreases to zero, since the client cannot | |||
| pass the data to the application, and the server sends probe | pass the data to the application, and the server sends probe | |||
| segments. The client acknowledges the probe segments with a zero | segments. The client acknowledges the probe segments with a zero | |||
| window. As mandated in RFC1122 4.2.2.17, the probe segments are | window. As mandated in RFC1122 4.2.2.17, the probe segments are | |||
| transmitted forever. Server connection state remains in | transmitted forever. Server connection state remains in | |||
| CLOSE_WAIT, and eventually server processes are exhausted. | CLOSE_WAIT, and eventually server processes are exhausted. | |||
| Note that there are two bugs. First, probe segments should be | Note that there are two bugs. First, probe segments should be | |||
| ignored if the window can never subsequently increase. Second, a | ignored if the window can never subsequently increase. Second, a | |||
| RST should be sent when data is received after half duplex close. | RST should be sent when data is received after half duplex close. | |||
| Fixing the first bug, but not the second, results in the probe | Fixing the first bug, but not the second, results in the probe | |||
| segments eventually timing out the connection, but the server | segments eventually timing out the connection, but the server | |||
| remains in CLOSE_WAIT for a significant and unnecessary period. | remains in CLOSE_WAIT for a significant and unnecessary period. | |||
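The second fix can be sketched as a single decision made when a segment arrives. The names below (`Action`, `on_segment`, `app_can_read`) are hypothetical and do not come from any real stack; the sketch only restates the RFC 1122 4.2.2.13 recommendation in executable form.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver data to the application"
    ACK_ONLY = "process as a pure ACK"
    SEND_RST = "send RST and tear down the connection"


def on_segment(payload_len, app_can_read):
    """Decide how to handle an incoming segment.

    app_can_read is False after a half duplex close, i.e. when received
    data can no longer be delivered to the user (assumed flag).
    """
    if payload_len == 0:
        return Action.ACK_ONLY
    if not app_can_read:
        # Data arrived that can never be consumed: reset rather than
        # advertising a zero window and answering probes indefinitely.
        return Action.SEND_RST
    return Action.DELIVER
```

With this rule the server leaves CLOSE_WAIT as soon as its next data segment draws a RST, rather than waiting for its probe segments to time out.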
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 sections 4.2.2.13 and 4.2.2.17. | RFC 1122 sections 4.2.2.13 and 4.2.2.17. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using an unknown network analyzer. No drop information | Made using an unknown network analyzer. No drop information | |||
| available. | available. | |||
| client.1391 > server.8080: S 0:1(0) ack: 0 win: 2000 <mss: 5b4> | client.1391 > server.8080: S 0:1(0) ack: 0 win: 2000 <mss: 5b4> | |||
| server.8080 > client.1391: SA 8c01:8c02(0) ack: 1 win: 8000 <mss:100> | server.8080 > client.1391: SA 8c01:8c02(0) ack: 1 win: 8000 <mss:100> | |||
| client.1391 > server.8080: PA | client.1391 > server.8080: PA | |||
| skipping to change at page 46, line 48 ¶ | skipping to change at page 52, line 5 ¶ | |||
| client.1391 > server.8080: FPA | client.1391 > server.8080: FPA | |||
| [ server ACKs the FIN and enters CLOSE_WAIT ] | [ server ACKs the FIN and enters CLOSE_WAIT ] | |||
| server.8080 > client.1391: [DF] A | server.8080 > client.1391: [DF] A | |||
| [ client enters FIN_WAIT_2 ] | [ client enters FIN_WAIT_2 ] | |||
| server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | |||
| [ server continues to try to send its data ] | [ server continues to try to send its data ] | |||
| client.1391 > server.8080: PA < window = 0 > | client.1391 > server.8080: PA < window = 0 > | |||
| server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | |||
| client.1391 > server.8080: PA < window = 0 > | client.1391 > server.8080: PA < window = 0 > | |||
| server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | |||
| client.1391 > server.8080: PA < window = 0 > | client.1391 > server.8080: PA < window = 0 > | |||
| server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | |||
| client.1391 > server.8080: PA < window = 0 > | client.1391 > server.8080: PA < window = 0 > | |||
| server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000 | |||
| client.1391 > server.8080: PA < window = 0 > | client.1391 > server.8080: PA < window = 0 > | |||
| [ ... repeat ad exhaustium ... ] | [ ... repeat ad exhaustium ... ] | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using an unknown network analyzer. No drop information | Made using an unknown network analyzer. No drop information | |||
| skipping to change at page 47, line 47 ¶ | skipping to change at page 53, line 4 ¶ | |||
| client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | |||
| server > client D=59500 S=80 Ack=597 Seq=118939 Len=1460 Win=8760 | server > client D=59500 S=80 Ack=597 Seq=118939 Len=1460 Win=8760 | |||
| client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | |||
| server > client D=59500 S=80 Ack=597 Seq=120399 Len=892 Win=8760 | server > client D=59500 S=80 Ack=597 Seq=120399 Len=892 Win=8760 | |||
| client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | |||
| server > client D=59500 S=80 Ack=597 Seq=121291 Len=1460 Win=8760 | server > client D=59500 S=80 Ack=597 Seq=121291 Len=1460 Win=8760 | |||
| client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0 | |||
| "client" sends a number of RSTs, one in response to each incoming | "client" sends a number of RSTs, one in response to each incoming | |||
| packet from "server". One might wonder why "server" keeps sending | packet from "server". One might wonder why "server" keeps sending | |||
| data packets after it has received a RST from "client"; the | data packets after it has received a RST from "client"; the | |||
| explanation is that "server" had already transmitted all five of | explanation is that "server" had already transmitted all five of | |||
| the data packets before receiving the first RST from "client", so | the data packets before receiving the first RST from "client", so | |||
| it is too late to avoid transmitting them. | it is too late to avoid transmitting them. | |||
| How to detect | How to detect | |||
| The problem can be detected by inspecting packet traces of a large, | The problem can be detected by inspecting packet traces of a large, | |||
| interrupted bulk transfer. | interrupted bulk transfer. | |||
| 3.16. | 3.17. | |||
| Name of Problem | Name of Problem | |||
| Failure to RST on close with data pending | Failure to RST on close with data pending | |||
| Classification | Classification | |||
| Resource management | Resource management | |||
| Description | Description | |||
| When an application closes a connection in such a way that it can | When an application closes a connection in such a way that it can | |||
| no longer read any received data, the TCP SHOULD, per section | no longer read any received data, the TCP SHOULD, per section | |||
| skipping to change at page 48, line 44 ¶ | skipping to change at page 54, line 4 ¶ | |||
| This problem is most significant for endpoints that engage in large | This problem is most significant for endpoints that engage in large | |||
| numbers of connections, as their ability to do so will be curtailed | numbers of connections, as their ability to do so will be curtailed | |||
| as they leak away resources. | as they leak away resources. | |||
| Implications | Implications | |||
| Failure to reset the connection can lead to permanently hung | Failure to reset the connection can lead to permanently hung | |||
| connections, in which the remote endpoint takes no further action | connections, in which the remote endpoint takes no further action | |||
| to tear down the connection because it is waiting on the local TCP | to tear down the connection because it is waiting on the local TCP | |||
| to first take some action. This is particularly the case if the | to first take some action. This is particularly the case if the | |||
| local TCP also allows the advertised window to go to zero, and | local TCP also allows the advertised window to go to zero, and | |||
| fails to tear down the connection when the remote TCP engages in | fails to tear down the connection when the remote TCP engages in | |||
| "persist" probes (see example below). | "persist" probes (see example below). | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 section 4.2.2.13. Also, 4.2.2.17 for the zero-window | RFC 1122 section 4.2.2.13. Also, 4.2.2.17 for the zero-window | |||
| probing discussion below. | probing discussion below. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Made using tcpdump. No drop information available. | Made using tcpdump. No drop information available. | |||
| 13:11:46.04 A > B: S 458659166:458659166(0) win 4096 | 13:11:46.04 A > B: S 458659166:458659166(0) win 4096 | |||
| <mss 1460,wscale 0,eol> (DF) | <mss 1460,wscale 0,eol> (DF) | |||
| 13:11:46.04 B > A: S 792320000:792320000(0) ack 458659167 | 13:11:46.04 B > A: S 792320000:792320000(0) ack 458659167 | |||
| win 4096 | win 4096 | |||
| 13:11:46.04 A > B: . ack 1 win 4096 (DF) | 13:11:46.04 A > B: . ack 1 win 4096 (DF) | |||
| skipping to change at page 49, line 47 ¶ | skipping to change at page 54, line 52 ¶ | |||
| 13:12:06.37 A > B: . ack 2 win 4096 (DF) | 13:12:06.37 A > B: . ack 2 win 4096 (DF) | |||
| 13:12:11.78 A > B: . 4096:4097(1) ack 2 win 4096 (DF) | 13:12:11.78 A > B: . 4096:4097(1) ack 2 win 4096 (DF) | |||
| 13:12:11.78 B > A: . ack 4097 win 0 | 13:12:11.78 B > A: . ack 4097 win 0 | |||
| 13:12:24.59 A > B: . 4096:4097(1) ack 2 win 4096 (DF) | 13:12:24.59 A > B: . 4096:4097(1) ack 2 win 4096 (DF) | |||
| 13:12:24.60 B > A: . ack 4097 win 0 | 13:12:24.60 B > A: . ack 4097 win 0 | |||
| 13:12:50.22 A > B: . 4096:4097(1) ack 2 win 4096 (DF) | 13:12:50.22 A > B: . 4096:4097(1) ack 2 win 4096 (DF) | |||
| 13:12:50.22 B > A: . ack 4097 win 0 | 13:12:50.22 B > A: . ack 4097 win 0 | |||
| Machine B in the trace above does not drop received data when the | Machine B in the trace above does not drop received data when the | |||
| socket is "closed" by the application (in this case, the | socket is "closed" by the application (in this case, the | |||
| application process was terminated). This occured at approximately | application process was terminated). This occurred at approximately | |||
| 13:12:06.36 and resulted in the FIN being sent in response to the | 13:12:06.36 and resulted in the FIN being sent in response to the | |||
| close. However, because there is no longer an application to | close. However, because there is no longer an application to | |||
| deliver the data to, the TCP should have instead sent a RST. | deliver the data to, the TCP should have instead sent a RST. | |||
| Note: Machine A's zero-window probing is also broken. It is | Note: Machine A's zero-window probing is also broken. It is | |||
| resending old data, rather than new data. Section 3.7 in RFC 793 | resending old data, rather than new data. Section 3.7 in RFC 793 | |||
| and Section 4.2.2.17 in RFC 1122 discuss zero-window probing. | and Section 4.2.2.17 in RFC 1122 discuss zero-window probing. | |||
| Trace file demonstrating better behavior | Trace file demonstrating better behavior | |||
| Made using tcpdump. No drop information available. | Made using tcpdump. No drop information available. | |||
| Better, but still not fully correct, behavior, per the discussion | Better, but still not fully correct, behavior, per the discussion | |||
| below. We show this behavior because it has been observed for a | below. We show this behavior because it has been observed for a | |||
| number of different TCP implementations. | number of different TCP implementations. | |||
| 13:48:29.24 C > D: S 73445554:73445554(0) win 4096 | 13:48:29.24 C > D: S 73445554:73445554(0) win 4096 | |||
| skipping to change at page 50, line 46 ¶ | skipping to change at page 56, line 4 ¶ | |||
| window opened again (since it discarded the previously received | window opened again (since it discarded the previously received | |||
| data). Machine C promptly sends more data, causing Machine D to | data). Machine C promptly sends more data, causing Machine D to | |||
| reset the connection since it cannot deliver the data to the | reset the connection since it cannot deliver the data to the | |||
| application. Ideally, Machine D SHOULD send a RST instead of | application. Ideally, Machine D SHOULD send a RST instead of | |||
| dropping the data and re-opening the receive window. | dropping the data and re-opening the receive window. | |||
| Note: Machine C's zero-window probing is broken, the same as in the | Note: Machine C's zero-window probing is broken, the same as in the | |||
| example above. | example above. | |||
| Trace file demonstrating correct behavior | Trace file demonstrating correct behavior | |||
| Made using tcpdump. No losses reported. | ||||
| Made using tcpdump. No losses reported by the packet filter. | ||||
| 14:12:02.19 E > F: S 1143360000:1143360000(0) win 4096 | 14:12:02.19 E > F: S 1143360000:1143360000(0) win 4096 | |||
| 14:12:02.19 F > E: S 1002988443:1002988443(0) ack 1143360001 | 14:12:02.19 F > E: S 1002988443:1002988443(0) ack 1143360001 | |||
| win 4096 <mss 1460> (DF) | win 4096 <mss 1460> (DF) | |||
| 14:12:02.19 E > F: . ack 1 win 4096 | 14:12:02.19 E > F: . ack 1 win 4096 | |||
| 14:12:10.43 E > F: . 1:513(512) ack 1 win 4096 | 14:12:10.43 E > F: . 1:513(512) ack 1 win 4096 | |||
| 14:12:10.61 F > E: . ack 513 win 3584 (DF) | 14:12:10.61 F > E: . ack 513 win 3584 (DF) | |||
| 14:12:10.61 E > F: . 513:1025(512) ack 1 win 4096 | 14:12:10.61 E > F: . 513:1025(512) ack 1 win 4096 | |||
| 14:12:10.61 E > F: . 1025:1537(512) ack 1 win 4096 | 14:12:10.61 E > F: . 1025:1537(512) ack 1 win 4096 | |||
| 14:12:10.81 F > E: . ack 1537 win 2560 (DF) | 14:12:10.81 F > E: . ack 1537 win 2560 (DF) | |||
| 14:12:10.81 E > F: . 1537:2049(512) ack 1 win 4096 | 14:12:10.81 E > F: . 1537:2049(512) ack 1 win 4096 | |||
| 14:12:10.81 E > F: . 2049:2561(512) ack 1 win 4096 | 14:12:10.81 E > F: . 2049:2561(512) ack 1 win 4096 | |||
| 14:12:10.81 E > F: . 2561:3073(512) ack 1 win 4096 | 14:12:10.81 E > F: . 2561:3073(512) ack 1 win 4096 | |||
| 14:12:11.01 F > E: . ack 3073 win 1024 (DF) | 14:12:11.01 F > E: . ack 3073 win 1024 (DF) | |||
| 14:12:11.01 E > F: . 3073:3585(512) ack 1 win 4096 | 14:12:11.01 E > F: . 3073:3585(512) ack 1 win 4096 | |||
| skipping to change at page 51, line 43 ¶ | skipping to change at page 57, line 5 ¶ | |||
| When doing so, there can be an ambiguity (if only looking at the | When doing so, there can be an ambiguity (if only looking at the | |||
| trace) as to whether the receiving TCP did indeed have unread data | trace) as to whether the receiving TCP did indeed have unread data | |||
| that it could now no longer deliver. To provoke this to happen, it | that it could now no longer deliver. To provoke this to happen, it | |||
| may help to suspend the receiving application so that it fails to | may help to suspend the receiving application so that it fails to | |||
| consume any data, eventually exhausting the advertised window. At | consume any data, eventually exhausting the advertised window. At | |||
| this point, since the advertised window is zero, we know that the | this point, since the advertised window is zero, we know that the | |||
| receiving TCP has undelivered data buffered up. Terminating the | receiving TCP has undelivered data buffered up. Terminating the | |||
| application process then should suffice to test the correctness of | application process then should suffice to test the correctness of | |||
| the TCP's behavior. | the TCP's behavior. | |||
| 3.17. | 3.18. | |||
| Name of Problem | Name of Problem | |||
| Options missing from TCP MSS calculation | Options missing from TCP MSS calculation | |||
| Classification | Classification | |||
| Reliability / performance | Reliability / performance | |||
| Description | Description | |||
| When a TCP determines how much data to send per packet, it | When a TCP determines how much data to send per packet, it | |||
| calculates a segment size based on the MTU of the path. It must | calculates a segment size based on the MTU of the path. It must | |||
| then subtract from that MTU the size of the IP and TCP headers in | then subtract from that MTU the size of the IP and TCP headers in | |||
| the packet. If IP options and TCP options are not taken into | the packet. If IP options and TCP options are not taken into | |||
| account correctly in this calculation, the resulting segment size | account correctly in this calculation, the resulting segment size | |||
| may be too large. TCPs that do so are said to exhibit "Options | may be too large. TCPs that do so are said to exhibit "Options | |||
| missing from TCP MSS calculation". | missing from TCP MSS calculation". | |||
| Significance | Significance | |||
| skipping to change at page 52, line 41 ¶ | skipping to change at page 58, line 4 ¶ | |||
| send it out the interface. It instead informs the TCP layer of the | send it out the interface. It instead informs the TCP layer of the | |||
| correct MTU size of the interface; the TCP layer again miscomputes | correct MTU size of the interface; the TCP layer again miscomputes | |||
| the MSS by failing to take into account the size of IP options; and | the MSS by failing to take into account the size of IP options; and | |||
| the problem repeats, with no data flowing. | the problem repeats, with no data flowing. | |||
| Relevant RFCs | Relevant RFCs | |||
| RFC 1122 describes the calculation of the effective send MSS. RFC | RFC 1122 describes the calculation of the effective send MSS. RFC | |||
| 1191 describes Path MTU discovery. | 1191 describes Path MTU discovery. | |||
| Trace file demonstrating it | Trace file demonstrating it | |||
| Trace file taken using tcpdump on host C. The first trace | Trace file taken using tcpdump on host C. The first trace | |||
| demonstrates the fragmentation that occurs without path MTU | demonstrates the fragmentation that occurs without path MTU | |||
| discovery: | discovery: | |||
| 13:55:25.488728 A.65528 > C.discard: | 13:55:25.488728 A.65528 > C.discard: | |||
| P 567833:569273(1440) ack 1 win 17520 | P 567833:569273(1440) ack 1 win 17520 | |||
| <nop,nop,timestamp 3839 1026342> | <nop,nop,timestamp 3839 1026342> | |||
| (frag 20828:1472@0+) | (frag 20828:1472@0+) | |||
| (ttl 62, optlen=8 LSRR{B#} NOP) | (ttl 62, optlen=8 LSRR{B#} NOP) | |||
| 13:55:25.488943 A > C: | 13:55:25.488943 A > C: | |||
| (frag 20828:8@1472) | (frag 20828:8@1472) | |||
| (ttl 62, optlen=8 LSRR{B#} NOP) | (ttl 62, optlen=8 LSRR{B#} NOP) | |||
| 13:55:25.489052 C.discard > A.65528: | 13:55:25.489052 C.discard > A.65528: | |||
| . ack 566385 win 60816 | . ack 566385 win 60816 | |||
| <nop,nop,timestamp 1026345 3839> (DF) | <nop,nop,timestamp 1026345 3839> (DF) | |||
| (ttl 60, id 41266) | (ttl 60, id 41266) | |||
| Host A repeatedly sends 1440-octet data segments, but these are | Host A repeatedly sends 1440-octet data segments, but these are | |||
| skipping to change at page 53, line 42 ¶ | skipping to change at page 58, line 52 ¶ | |||
| 13:55:44.333206 C.discard > A.65527: | 13:55:44.333206 C.discard > A.65527: | |||
| S 1271629000:1271629000(0) ack 1018235391 win 60816 | S 1271629000:1271629000(0) ack 1018235391 win 60816 | |||
| <mss 1460,nop,wscale 0,nop,nop,timestamp 1026383 3876> (DF) | <mss 1460,nop,wscale 0,nop,nop,timestamp 1026383 3876> (DF) | |||
| (ttl 60, id 41427) | (ttl 60, id 41427) | |||
| This is all of the activity seen on this connection. Eventually | This is all of the activity seen on this connection. Eventually | |||
| host C will time out attempting to establish the connection. | host C will time out attempting to establish the connection. | |||
| How to detect | How to detect | |||
| The "netcat" utility is useful for generating source routed | The "netcat" utility [Hobbit96] is useful for generating source | |||
| packets: | routed packets: | |||
| 1% nc C discard | 1% nc C discard | |||
| (interactive typing) | (interactive typing) | |||
| ^C | ^C | |||
| 2% nc C discard < /dev/zero | 2% nc C discard < /dev/zero | |||
| ^C | ^C | |||
| 3% nc -g B C discard | 3% nc -g B C discard | |||
| (interactive typing) | (interactive typing) | |||
| ^C | ^C | |||
| 4% nc -g B C discard < /dev/zero | 4% nc -g B C discard < /dev/zero | |||
| ^C | ^C | |||
| Lines 1 through 3 should generate appropriate packets, which can be | Lines 1 through 3 should generate appropriate packets, which can be | |||
| verified using tcpdump. If the problem is present, line 4 should | verified using tcpdump. If the problem is present, line 4 should | |||
| generate one of the two kinds of packet traces shown. | generate one of the two kinds of packet traces shown. | |||
| How to fix | How to fix | |||
| The implementation should ensure that the effective send MSS | The implementation should ensure that the effective send MSS | |||
| calculation includes a term for the IP and TCP options, as mandated | calculation includes a term for the IP and TCP options, as mandated | |||
| by RFC 1122. | by RFC 1122. | |||
| 4. Security Considerations | 4. Security Considerations | |||
| This version of this memo does not discuss any security-related | This memo does not discuss any specific security-related TCP | |||
| implementation problems. Future versions most likely will, so | implementation problems, as the working group decided to pursue | |||
| security considerations will require revisiting. | documenting those in a separate document. Some of the implementation | |||
| problems discussed here, however, can be used for denial-of-service | ||||
| attacks. Those classified as congestion control present | ||||
| opportunities to subvert TCPs used for legitimate data transfer into | ||||
| excessively loading network elements. Those classified as | ||||
| "performance", "reliability" and "resource management" may be | ||||
| exploitable for launching surreptitious denial-of-service attacks | ||||
| against the user of the TCP. Both of these types of attacks can be | ||||
| extremely difficult to detect because in most respects they look | ||||
| identical to legitimate network traffic. | ||||
| 5. Acknowledgements | 5. Acknowledgements | |||
| Thanks to numerous correspondents on the tcp-impl mailing list for | Thanks to numerous correspondents on the tcp-impl mailing list for | |||
| their input: Steve Alexander, Mark Allman, Larry Backman, Jerry Chu, | their input: Steve Alexander, Larry Backman, Jerry Chu, Alan Cox, | |||
| Alan Cox, Kevin Fall, Richard Fox, Jim Gettys, Rick Jones, Allison | Kevin Fall, Richard Fox, Jim Gettys, Rick Jones, Allison Mankin, Neal | |||
| Mankin, Neal McBurnett, Perry Metzger, der Mouse, Thomas Narten, | McBurnett, Perry Metzger, der Mouse, Thomas Narten, Andras Olah, | |||
| Andras Olah, Steve Parker, Francesco Potorti`, Luigi Rizzo, Allyn | Steve Parker, Francesco Potorti`, Luigi Rizzo, Allyn Romanow, Al | |||
| Romanow, Jeff Semke, Al Smith, Jerry Toporek, Joe Touch, and Curtis | Smith, Jerry Toporek, Joe Touch, and Curtis Villamizar. | |||
| Villamizar. | ||||
| Thanks also to Josh Cohen for the traces documenting the "Failure to | Thanks also to Josh Cohen for the traces documenting the "Failure to | |||
| send a RST after Half Duplex Close" problem. | ||||
| send a RST after Half Duplex Close" problem; and to John Polstra, who | ||||
| analyzed the "Window probe deadlock" problem. | ||||
| 6. References | 6. References | |||
| [Allman97] | [Allman97] | |||
| M. Allman, "Fixing Two BSD TCP Bugs," Technical Report CR-204151, | M. Allman, "Fixing Two BSD TCP Bugs," Technical Report CR-204151, | |||
| NASA Lewis Research Center, October 1997. | NASA Lewis Research Center, Oct. 1997. | |||
| http://gigahertz.lerc.nasa.gov/~mallman/papers/bug.ps | http://gigahertz.lerc.nasa.gov/~mallman/papers/bug.ps | |||
| [Allman98] | [RFC2414] | |||
| M. Allman, S. Floyd and C. Partridge, "Increasing TCP's Initial | M. Allman, S. Floyd and C. Partridge, "Increasing TCP's Initial | |||
| Window," Internet-Draft draft-floyd-incr-init-win-03.txt, May 1998. | Window," Sep. 1998. | |||
| [RFC1122] | [RFC1122] | |||
| R. Braden, Editor, "Requirements for Internet Hosts -- | R. Braden, Editor, "Requirements for Internet Hosts -- | |||
| Communication Layers," Oct. 1989. | Communication Layers," Oct. 1989. | |||
| [RFC2119] | [RFC2119] | |||
| S. Bradner, "Key words for use in RFCs to Indicate Requirement | S. Bradner, "Key words for use in RFCs to Indicate Requirement | |||
| Levels," Mar. 1997. | Levels," Mar. 1997. | |||
| [Brakmo95] | [Brakmo95] | |||
| L. Brakmo and L. Peterson, "Performance Problems in BSD4.4 TCP," | L. Brakmo and L. Peterson, "Performance Problems in BSD4.4 TCP," | |||
| ACM Computer Communication Review, 25(5):69-86, 1995. | ACM Computer Communication Review, 25(5):69-86, 1995. | |||
| [RFC813] | ||||
| D. Clark, "Window and Acknowledgement Strategy in TCP," Jul. 1982. | ||||
| [Dawson97] | [Dawson97] | |||
| S. Dawson, F. Jahanian, and T. Mitton, "Experiments on Six | S. Dawson, F. Jahanian, and T. Mitton, "Experiments on Six | |||
| Commercial TCP Implementations Using a Software Fault Injection | Commercial TCP Implementations Using a Software Fault Injection | |||
| Tool," to appear in Software Practice & Experience, 1997. A | Tool," to appear in Software Practice & Experience, 1997. A | |||
| technical report version of this paper can be obtained at | technical report version of this paper can be obtained at | |||
| ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz. | ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz. | |||
| [Fall96] | [Fall96] | |||
| K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe, Reno, | K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe, Reno, | |||
| and SACK TCP," ACM Computer Communication Review, 26(3):5-21, 1996. | and SACK TCP," ACM Computer Communication Review, 26(3):5-21, 1996. | |||
| [Hobbit96] | ||||
| Hobbit, Avian Research, netcat, available via anonymous ftp to | ||||
| ftp.avian.org, 1996. | ||||
| [Hoe96] | [Hoe96] | |||
| J. Hoe, "Improving the Start-up Behavior of a Congestion Control | J. Hoe, "Improving the Start-up Behavior of a Congestion Control | |||
| Scheme for TCP," Proc. SIGCOMM '96. | Scheme for TCP," Proc. SIGCOMM '96. | |||
| [Jacobson88] | [Jacobson88] | |||
| V. Jacobson, "Congestion Avoidance and Control," Proc. SIGCOMM '88. | V. Jacobson, "Congestion Avoidance and Control," Proc. SIGCOMM '88. | |||
| ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z | ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z | |||
| [Jacobson89] | ||||
| V. Jacobson, C. Leres, and S. McCanne, tcpdump, available via | ||||
| anonymous ftp to ftp.ee.lbl.gov, Jun. 1989. | ||||
| [RFC2018] | [RFC2018] | |||
| M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective | M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective | |||
| Acknowledgement Options," Oct. 1996. | Acknowledgement Options," Oct. 1996. | |||
| [RFC1191] | [RFC1191] | |||
| J. Mogul and S. Deering, "Path MTU discovery," Nov. 1990. | J. Mogul and S. Deering, "Path MTU discovery," Nov. 1990. | |||
| [RFC896] | [RFC896] | |||
| J. Nagle, "Congestion Control in IP/TCP Internetworks," Jan. 1984. | J. Nagle, "Congestion Control in IP/TCP Internetworks," Jan. 1984. | |||
| [Paxson97] | [Paxson97] | |||
| V. Paxson, "Automated Packet Trace Analysis of TCP | V. Paxson, "Automated Packet Trace Analysis of TCP | |||
| Implementations," Proc. SIGCOMM '97, available from | Implementations," Proc. SIGCOMM '97, available from | |||
| ftp://ftp.ee.lbl.gov/papers/vp-tcpanaly-sigcomm97.ps.Z. | ftp://ftp.ee.lbl.gov/papers/vp-tcpanaly-sigcomm97.ps.Z. | |||
| [RFC793] | [RFC793] | |||
| J. Postel, Editor, "Transmission Control Protocol," Sep. 1981. | J. Postel, Editor, "Transmission Control Protocol," Sep. 1981. | |||
| [RFC2001] | [RFC2001] | |||
| W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, | W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, | |||
| and Fast Recovery Algorithms," Jan. 1997. | and Fast Recovery Algorithms," Jan. 1997. | |||
| [Stevens94] | [Stevens94] | |||
| W. Stevens, "TCP/IP Illustrated, Volume 1", Addison-Wesley | W. Stevens, "TCP/IP Illustrated, Volume 1", Addison-Wesley | |||
| Publishing Company, Reading, Massachusetts, 1994. | Publishing Company, Reading, Massachusetts, 1994. | |||
| [Wright95] | [Wright95] | |||
| G. Wright and W. Stevens, "TCP/IP Illustrated, Volume 2", Addison- | G. Wright and W. Stevens, "TCP/IP Illustrated, Volume 2", Addison- | |||
| Wesley Publishing Company, Reading, Massachusetts, 1995. | Wesley Publishing Company, Reading, Massachusetts, 1995. | |||
| 7. Authors' Addresses | 7. Authors' Addresses | |||
| Vern Paxson <vern@ee.lbl.gov> | Vern Paxson <vern@ee.lbl.gov> | |||
| Network Research Group | Network Research Group | |||
| Lawrence Berkeley National Laboratory | Lawrence Berkeley National Laboratory | |||
| Berkeley, CA 94720 | Berkeley, CA 94720 | |||
| USA | USA | |||
| Phone: +1 510/486-7504 | Phone: +1 510/486-7504 | |||
| Mark Allman <mallman@lerc.nasa.gov> | Mark Allman <mallman@lerc.nasa.gov> | |||
| skipping to change at page 56, line 44 ¶ | skipping to change at page 62, line 32 ¶ | |||
| Phone: +1 216/433-6586 | Phone: +1 216/433-6586 | |||
| Scott Dawson <sdawson@eecs.umich.edu> | Scott Dawson <sdawson@eecs.umich.edu> | |||
| Real-Time Computing Laboratory | Real-Time Computing Laboratory | |||
| EECS Building | EECS Building | |||
| University of Michigan | University of Michigan | |||
| Ann Arbor, MI 48109-2122 | Ann Arbor, MI 48109-2122 | |||
| USA | USA | |||
| Phone: +1 313/763-5363 | Phone: +1 313/763-5363 | |||
| William C. Fenner <fenner@parc.xerox.com> | ||||
| Xerox PARC | ||||
| 3333 Coyote Hill Road | ||||
| Palo Alto, CA 94304 | ||||
| USA | ||||
| Phone: +1 650/812-4816 | ||||
| Jim Griner <jgriner@lerc.nasa.gov> | Jim Griner <jgriner@lerc.nasa.gov> | |||
| NASA Lewis Research Center | NASA Lewis Research Center | |||
| 21000 Brookpark Road | 21000 Brookpark Road | |||
| MS 54-2 | MS 54-2 | |||
| Cleveland, OH 44135 | Cleveland, OH 44135 | |||
| USA | USA | |||
| Phone: +1 216/433-5787 | Phone: +1 216/433-5787 | |||
| Ian Heavens <ian@spider.com> | Ian Heavens <ian@spider.com> | |||
| Spider Software Ltd. | Spider Software Ltd. | |||
| 8 John's Place, Leith | 8 John's Place, Leith | |||
| Edinburgh EH6 7EL | Edinburgh EH6 7EL | |||
| UK | UK | |||
| Phone: +44 131/475-7015 | Phone: +44 131/475-7015 | |||
| Kevin Lahey <kml@nas.nasa.gov> | Kevin Lahey <kml@nas.nasa.gov> | |||
| NASA Ames Research Center/MRJ | NASA Ames Research Center/MRJ | |||
| MS 258-6 | MS 258-6 | |||
| Moffett Field, CA 94035 | Moffett Field, CA 94035 | |||
| USA | USA | |||
| Phone: +1 650/604-4334 | Phone: +1 650/604-4334 | |||
| Jeff Semke <semke@psc.edu> | Jeff Semke <semke@psc.edu> | |||
| Pittsburgh Supercomputing Center | Pittsburgh Supercomputing Center | |||
| 4400 Fifth Ave | 4400 Fifth Ave | |||
| Pittsburgh, PA 15213 | Pittsburgh, PA 15213 | |||
| End of changes. 167 change blocks. | ||||
| 259 lines changed or deleted | 521 lines changed or added | |||
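
Illustrative sketch for the "Options missing from TCP MSS calculation" problem. The fix described in that entry is to include a term for the IP and TCP options in the effective send MSS calculation, as RFC 1122 (Section 4.2.2.6) specifies. The Python below is a minimal sketch of that arithmetic, not code from any TCP implementation. The function names, the parameter names, and the numbers in the demonstration are assumptions chosen for illustration, and MMS_S is assumed here to be the MTU minus the 20-octet fixed IPv4 header.

```python
# Sketch of the RFC 1122 effective send MSS calculation:
#   Eff.snd.MSS = min(SendMSS + 20, MMS_S) - TCPhdrsize - IPoptionsize
# All names below are hypothetical; they are not taken from any real stack.

IP_FIXED_HEADER = 20   # octets in an IPv4 header carrying no options
TCP_FIXED_HEADER = 20  # octets in a TCP header carrying no options


def effective_send_mss(peer_mss, mtu, tcp_option_len=0, ip_option_len=0):
    """Return the largest data payload that fits in one unfragmented packet.

    peer_mss       -- MSS value received from the remote host (SendMSS)
    mtu            -- MTU of the outgoing interface or path, in octets
    tcp_option_len -- octets of TCP options sent in each segment
    ip_option_len  -- octets of IP options sent in each packet
    """
    # Assumption: MMS_S is the MTU less the fixed IPv4 header.
    mms_s = mtu - IP_FIXED_HEADER
    tcp_hdr_size = TCP_FIXED_HEADER + tcp_option_len
    return min(peer_mss + TCP_FIXED_HEADER, mms_s) - tcp_hdr_size - ip_option_len


def packet_size(data_len, tcp_option_len=0, ip_option_len=0):
    """Return the total IP packet size for a segment carrying data_len octets."""
    return (IP_FIXED_HEADER + ip_option_len
            + TCP_FIXED_HEADER + tcp_option_len + data_len)


if __name__ == "__main__":
    mtu, peer_mss = 576, 536          # illustrative values only
    tcp_opts, ip_opts = 12, 8         # e.g. padded timestamps, a short source route

    correct = effective_send_mss(peer_mss, mtu, tcp_opts, ip_opts)
    broken = effective_send_mss(peer_mss, mtu)   # options left out of the sum

    for label, mss in (("with options", correct), ("options missing", broken)):
        size = packet_size(mss, tcp_opts, ip_opts)
        verdict = "fits" if size <= mtu else "exceeds MTU"
        print(f"{label}: MSS={mss} packet={size} ({verdict})")
```

With these illustrative inputs, the calculation that accounts for options yields a packet of exactly the MTU, while the one that omits them yields a packet 20 octets too large. Such a packet is either fragmented or, when DF is set, cannot be sent at all, which matches the two kinds of failure the entry describes.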