| < draft-gont-tcpm-tcp-soft-errors-01.txt | draft-gont-tcpm-tcp-soft-errors-02.txt > | |||
|---|---|---|---|---|
| TCP Maintenance and Minor F. Gont | TCP Maintenance and Minor F. Gont | |||
| Extensions (tcpm) UTN/FRH | Extensions (tcpm) UTN/FRH | |||
| Internet-Draft October 24, 2004 | Internet-Draft September 12, 2005 | |||
| Expires: April 24, 2005 | Expires: March 16, 2006 | |||
| TCP's Reaction to Soft Errors | TCP's Reaction to Soft Errors | |||
| draft-gont-tcpm-tcp-soft-errors-01.txt | draft-gont-tcpm-tcp-soft-errors-02.txt | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is subject to all provisions | By submitting this Internet-Draft, each author represents that any | |||
| of section 3 of RFC 3667. By submitting this Internet-Draft, each | applicable patent or other IPR claims of which he or she is aware | |||
| author represents that any applicable patent or other IPR claims of | have been or will be disclosed, and any of which he or she becomes | |||
| which he or she is aware have been or will be disclosed, and any of | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| which he or she become aware will be disclosed, in accordance with | ||||
| RFC 3668. | ||||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as | other groups may also distribute working documents as Internet- | |||
| Internet-Drafts. | Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on April 24, 2005. | This Internet-Draft will expire on March 16, 2006. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2004). | Copyright (C) The Internet Society (2005). | |||
| Abstract | Abstract | |||
| This document discusses problems that may arise due to TCP's reaction | This document discusses the problem of long delays between connection | |||
| to soft errors. In particular, it discusses the problem of long | establishment attempts that may arise in a number of scenarios, | |||
| delays in connection establishment attempts that may arise in a | including that in which dual stack nodes that have IPv6 enabled by | |||
| number of scenarios, including that in which dual stack nodes that | default are deployed in IPv4 or mixed IPv4 and IPv6 environments. | |||
| have IPv6 enabled by default are deployed in IPv4 or mixed IPv4 and | Additionally, it describes a modification to TCP's reaction to soft | |||
| IPv6 environments. This document discusses this potential problem, | errors that has been implemented in a variety of TCP/IP stacks to | |||
| and proposes to change TCP's reaction to soft errors to work around | help overcome this problem. | |||
| this problem. It does not try to specify whether IPv6 should be | ||||
| enabled by default or not. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Error Handling in TCP . . . . . . . . . . . . . . . . . . . . 3 | 2. Error Handling in TCP . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2.1 Reaction to Hard Errors . . . . . . . . . . . . . . . . . 4 | 2.1. Reaction to Hard Errors . . . . . . . . . . . . . . . . . 4 | |||
| 2.2 Reaction to Soft Errors . . . . . . . . . . . . . . . . . 4 | 2.2. Reaction to Soft Errors . . . . . . . . . . . . . . . . . 4 | |||
| 3. Problems arising from TCP's reaction to soft errors . . . . . 4 | 3. Problems that may arise from TCP's reaction to soft errors . . 5 | |||
| 3.1 General Discussion . . . . . . . . . . . . . . . . . . . . 4 | 3.1. General Discussion . . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.2 Problems that arise with Dual Stack IPv6 on by Default . . 5 | 3.2. Problems that may arise with Dual Stack IPv6 on by | |||
| 4. Changing TCP's reaction to soft errors . . . . . . . . . . . . 6 | Default . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 4. A workaround for long delays between | ||||
| connection-establishment attempts . . . . . . . . . . . . . . 6 | ||||
| 5. Possible drawbacks . . . . . . . . . . . . . . . . . . . . . . 6 | 5. Possible drawbacks . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 5.1 Non-deterministic transient network failures . . . . . . . 6 | 5.1. Non-deterministic transient network failures . . . . . . . 7 | |||
| 5.2 Deterministic transient network failures . . . . . . . . . 7 | 5.2. Deterministic transient network failures . . . . . . . . . 7 | |||
| 6. Future work . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 6. Future work . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | |||
| 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 10.1 Normative References . . . . . . . . . . . . . . . . . . . . 9 | 10.1. Normative References . . . . . . . . . . . . . . . . . . . 8 | |||
| 10.2 Informative References . . . . . . . . . . . . . . . . . . . 9 | 10.2. Informative References . . . . . . . . . . . . . . . . . . 9 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . 10 | Appendix A. Other possible solutions . . . . . . . . . . . . . . 9 | |||
| A. Other possible solutions . . . . . . . . . . . . . . . . . . . 10 | A.1. A more conservative approach . . . . . . . . . . . . . . . 10 | |||
| A.1 A more conservative approach . . . . . . . . . . . . . . . 10 | A.2. Asynchronous Application Notification . . . . . . . . . . 10 | |||
| A.2 Asynchronous Application Notification . . . . . . . . . . 11 | A.3. Issuing several connection requests in parallel . . . . . 11 | |||
| A.3 Issuing several connection requests in parallel . . . . . 11 | Appendix B. Changes from draft-gont-tcpm-tcp-soft-errors-01 . . . 11 | |||
| B. Changes from draft-gont-tcpm-tcp-soft-errors-00 . . . . . . . 12 | Appendix C. Changes from draft-gont-tcpm-tcp-soft-errors-00 . . . 11 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 13 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 14 | ||||
| 1. Introduction | 1. Introduction | |||
| The handling of network failures can be separated into two different | The handling of network failures can be separated into two different | |||
| actions: fault isolation and fault recovery. Fault isolation is the | actions: fault isolation and fault recovery. Fault isolation is the | |||
| actions that hosts and routers take to determine that there is some | actions that hosts and routers take to determine that there is some | |||
| network failure. Fault recovery, on the other hand, is the actions | network failure. Fault recovery, on the other hand, is the actions | |||
| that hosts and routers will perform to isolate and survive a network | that hosts and routers will perform to isolate and survive a network | |||
| failure.[8] | failure.[8] | |||
| skipping to change at page 3, line 27 ¶ | skipping to change at page 3, line 27 ¶ | |||
| over the network. | over the network. | |||
| When a host is signalled of a network error, there is still the issue | When a host is signalled of a network error, there is still the issue | |||
| of what to do to let communication survive, if possible, the network | of what to do to let communication survive, if possible, the network | |||
| failure. The fault recovery strategy may depend on the type of | failure. The fault recovery strategy may depend on the type of | |||
| network failure taking place, and the time the error condition is | network failure taking place, and the time the error condition is | |||
| detected. | detected. | |||
| This document analyzes the fault recovery policy of TCP [2], and the | This document analyzes the fault recovery policy of TCP [2], and the | |||
| problems that may arise due to TCP's policy of reaction to soft | problems that may arise due to TCP's policy of reaction to soft | |||
| errors. In particular, it analyzes the problems that may arise in | errors. Among others, it analyzes the problems that may arise in | |||
| scenarios where dual stack nodes that have IPv6 enabled by default | scenarios where dual stack nodes that have IPv6 enabled by default | |||
| are deployed in IPv4 or mixed IPv4 and IPv6 environments. | are deployed in IPv4 or mixed IPv4 and IPv6 environments. | |||
| Additionally, it documents a modification to TCP's policy of reaction | ||||
| to ICMP messages indicating "soft errors", that has been implemented | ||||
| in a variety of TCP/IP stacks to help overcome the problems discussed | ||||
| in this document. | ||||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in RFC 2119 [3]. | document are to be interpreted as described in RFC 2119 [3]. | |||
| 2. Error Handling in TCP | 2. Error Handling in TCP | |||
| Network errors can be divided into soft and hard errors. Soft errors | Network errors can be divided into soft and hard errors. Soft errors | |||
| are considered to be transient network failures, which will hopefully | are considered to be transient network failures, which will hopefully | |||
| be solved in the near term. Hard errors, on the other hand, are | be solved in the near term. Hard errors, on the other hand, are | |||
| considered to reflect permanent network conditions, which are | considered to reflect permanent network error conditions, which are | |||
| unlikely to be solved in the near future. | unlikely to be solved in the near future. | |||
| Therefore, it may make sense for the fault recovery action to be | Therefore, it may make sense for the fault recovery action to be | |||
| different depending on the type of error being detected. | different depending on the type of error being detected. | |||
| When there is a network failure that's not signalled to the sending | When there is a network failure that's not signalled to the sending | |||
| host, such as a gateway corrupting packets, TCP's fault recovery | host, such as a gateway corrupting packets, TCP's fault recovery | |||
| action is to repeatedly retransmit the segment until either it gets | action is to repeatedly retransmit the segment until either it gets | |||
| acknowledged, or the connection times out. In case the connection | acknowledged, or the connection times out. In case the connection | |||
| times out before the segment is acknowledged, TCP won't be able to | times out before the segment is acknowledged, TCP won't be able to | |||
| provide more information than the timeout condition. | provide more information than the timeout condition. | |||
| In case a host does receive an ICMP error message about a current TCP | In case a host does receive an ICMP error message meant for an | |||
| connection, the IP layer will pass this message up to TCP to raise | ongoing TCP connection, the IP layer will pass this message up to | |||
| awareness of the network failure. [4] | the corresponding TCP instance to raise awareness of the network failure. | |||
| [4] | ||||
| TCP's reaction will depend on the type of error being signalled. | TCP's reaction to ICMP messages will depend on the type of error | |||
| being signalled. | ||||
| 2.1 Reaction to Hard Errors | 2.1. Reaction to Hard Errors | |||
| When receiving a segment with the RST bit set, or an ICMP error | When receiving a segment with the RST bit set, or an ICMP error | |||
| message indicating a hard error condition, TCP will simply abort the | message indicating a hard error condition, TCP will simply abort the | |||
| corresponding connection, regardless of the state the connection is | corresponding connection, regardless of the state the connection is | |||
| in. | in. | |||
| The "Requirements for Internet Hosts -- Communication Layers" RFC [4] | The "Requirements for Internet Hosts -- Communication Layers" RFC [4] | |||
| states, in section 4.2.3.9, that TCP SHOULD abort connections when | states, in section 4.2.3.9, that TCP SHOULD abort connections when | |||
| receiving ICMP error messages that indicate hard errors. This policy | receiving ICMP error messages that indicate hard errors. This policy | |||
| is based on the premise that, as hard errors indicate network | is based on the premise that, as hard errors indicate network error | |||
| conditions that won't change in the near term, it will not be | conditions that won't change in the near term, it will not be | |||
| possible for TCP to recover from this type of network failure. | possible for TCP to recover from this type of network failure. | |||
| 2.2 Reaction to Soft Errors | 2.2. Reaction to Soft Errors | |||
| The "Requirements for Internet Hosts -- Communication Layers" RFC [4] | ||||
| states, in section 4.2.3.9, that TCP MUST NOT abort connections when | ||||
| receiving ICMP error messages that indicate soft errors. | ||||
| If an ICMP error message is received that indicates a soft error, TCP | If an ICMP error message is received that indicates a soft error, TCP | |||
| will just record this information [9], and repeatedly retransmit the | will just record this information [9], and repeatedly retransmit the | |||
| data until either they get acknowledged or the connection times out. | data until either they get acknowledged or the connection times out. | |||
| This policy is based on the premise that, as soft errors are | ||||
| transient network failures that will hopefully be solved in the near | The "Requirements for Internet Hosts -- Communication Layers" RFC [4] | |||
| term, one of the retransmissions will succeed. | states, in section 4.2.3.9, that TCP MUST NOT abort connections when | |||
| receiving ICMP error messages that indicate soft errors. This policy | ||||
| is based on the premise that, as soft errors are transient network | ||||
| failures that will hopefully be solved in the near term, one of the | ||||
| retransmissions will succeed. | ||||
| In case the connection timer expires, and an ICMP error message had | In case the connection timer expires, and an ICMP error message had | |||
| been received before the timeout, TCP will use this information to | been received before the timeout, TCP will use this information to | |||
| provide the user with a more specific error message. [9] | provide the user with a more specific error message. [9] | |||
| This handling of soft errors exploits the valuable feature of the | This handling of soft errors exploits the valuable feature of the | |||
| Internet that for many network failures, the network can be | Internet that for many network failures, the network can be | |||
| dynamically reconstructed without any disruption of the endpoints. | dynamically reconstructed without any disruption of the endpoints. | |||
| 3. Problems arising from TCP's reaction to soft errors | 3. Problems that may arise from TCP's reaction to soft errors | |||
| 3.1 General Discussion | 3.1. General Discussion | |||
| Even though TCP's fault recovery strategy in the presence of soft | Even though TCP's fault recovery strategy in the presence of soft | |||
| errors allows for TCP connections to survive transient network | errors allows for TCP connections to survive transient network | |||
| failures, there are scenarios in which this policy may cause | failures, there are scenarios in which this policy may cause | |||
| undesirable effects. | undesirable effects. | |||
| For example, consider the case where an application on a local host | For example, consider the case in which an application on a local | |||
| is trying to communicate with a destination whose name resolves to | host is trying to communicate with a destination whose name resolves | |||
| several IP addresses. The application on the local host will try to | to several IP addresses. The application on the local host will try | |||
| establish a connection with the destination host, cycling through the | to establish a connection with the destination host, cycling through | |||
| list of IP addresses, until one succeeds [5]. Suppose that some (but | the list of IP addresses, until one succeeds [5]. Suppose that some | |||
| not all) of the addresses in the returned list are permanently | (but not all) of the addresses in the returned list are permanently | |||
| unreachable. If they are the first IP addresses in the list, the | unreachable. If they are the first IP addresses in the list, the | |||
| application will usually try to use these addresses first. | application will usually try to use these addresses first. | |||
| As discussed in Section 2, this unreachability condition may or may | As discussed in Section 2, this unreachability condition may or may | |||
| not be signalled to the sending host. If the local TCP is not | not be signalled to the sending host. If the local TCP is not | |||
| signalled of the error condition, it will repeatedly retransmit the | signalled of the error condition, it will repeatedly retransmit the | |||
| SYN segment, until the connection times out. If unreachability is | SYN segment, until the connection times out. If unreachability is | |||
| signalled by some intermediate router to the local TCP by means of an | signalled by some intermediate router to the local TCP by means of an | |||
| ICMP error message, the local TCP will just record the error message | ICMP error message, the local TCP will just record the error message | |||
| and will still repeatedly retransmit the SYN segment until the | and will still repeatedly retransmit the SYN segment until the | |||
| connection timer expires. The "Requirements For Internet Hosts -- | connection timer expires. The "Requirements For Internet Hosts -- | |||
| Communication Layers" RFC [4] states that this timer MUST be large | Communication Layers" RFC [4] states that this timer MUST be large | |||
| enough to provide retransmission of the SYN segment for at least 3 | enough to provide retransmission of the SYN segment for at least 3 | |||
| minutes. This would mean that the application on the local host | minutes. This would mean that the application on the local host | |||
| would spend several minutes for each unreachable address it tries to | would spend several minutes for each unreachable address it tries to | |||
| use for a connection attempt. These long delays in connection | use for a connection attempt. These long delays between connection | |||
| establishment attempts would be inappropriate for interactive | establishment attempts would be inappropriate for interactive | |||
| applications such as the web. [10][11] | applications such as the web. [10] [11] | |||
| 3.2 Problems that arise with Dual Stack IPv6 on by Default | 3.2. Problems that may arise with Dual Stack IPv6 on by Default | |||
| A scenario in which this type of problem may occur is that where dual | Another scenario in which this type of problem may occur is that | |||
| stack nodes that have IPv6 enabled by default are deployed in IPv4 or | where dual stack nodes that have IPv6 enabled by default are deployed | |||
| mixed IPv4 and IPv6 environments, and the IPv6 connectivity is | in IPv4 or mixed IPv4 and IPv6 environments, and the IPv6 | |||
| non-existent [6]. | connectivity is non-existent [6]. | |||
| As discussed in [6], there are two possible variants of this | As discussed in [6], there are two possible variants of this | |||
| scenario, which differ in whether the lack of connectivity is | scenario, which differ in whether the lack of connectivity is | |||
| signalled to the sending node, or not. | signalled to the sending node, or not. | |||
| In cases where packets sent to a destination are silently dropped and | In cases where packets sent to a destination are silently dropped and | |||
| no ICMPv6 [7] errors are generated, there is very little that can be | no ICMPv6 [7] errors are generated, there is very little that can be | |||
| done other than waiting for the existing connection timeout mechanism | done other than waiting for the existing connection timeout mechanism | |||
| in TCP, or an application timeout, to be triggered. | in TCP, or an application timeout, to be triggered. | |||
| In cases where a node has no default routers and Neighbor | In cases where a node has no default routers and Neighbor | |||
| Unreachability Detection (NUD) fails for destinations assumed to be | Unreachability Detection (NUD) fails for destinations assumed to be | |||
| on-link, or where firewalls or other systems that enforce scope | on-link, or where firewalls or other systems that enforce scope | |||
| boundaries send ICMPv6 errors, the sending node will be signalled of | boundaries send ICMPv6 errors, the sending node will be signalled of | |||
| the unreachability problem. As discussed in Section 2.2, TCP | the unreachability problem. However, as discussed in Section 2.2, | |||
| implementations will not abort connections when receiving ICMP error | TCP implementations will not abort connections when receiving ICMP | |||
| messages that indicate soft errors. However, it would be desirable | error messages that indicate soft errors. | |||
| for TCP implementations to use this information to avoid the long | ||||
| delays in connection attempts described in Section 3.1. | ||||
| 4. Changing TCP's reaction to soft errors | 4. A workaround for long delays between connection-establishment | |||
| attempts | ||||
| As discussed in Section 1, it may make sense for the fault recovery | As discussed in Section 1, it may make sense for the fault recovery | |||
| action to depend not only on the type of error being reported, but | action to depend not only on the type of error being reported, but | |||
| also on the time the error is reported. For example, one could infer | also on the time the error is reported. For example, one could infer | |||
| that when an error arrives in response to opening a new connection, | that when an error arrives in response to opening a new connection, | |||
| it is probably caused by opening the connection improperly, rather | it is probably caused by opening the connection improperly, rather | |||
| than by a transient network failure. [8] | than by a transient network failure. [8] | |||
| This document proposes to change TCP's reaction to soft errors as a | ||||
| workaround to the potential problems described in Section 3.1. | ||||
| TCP SHOULD abort a connection in the SYN-SENT or the SYN-RECEIVED | A variety of TCP/IP stacks have modified TCP's reaction to soft | |||
| state if it receives an ICMP "Destination Unreachable" message that | errors, to make it abort a connection in the SYN-SENT or the SYN- | |||
| indicates a soft error about that connection. | RECEIVED state if it receives an ICMP "Destination Unreachable" | |||
| message that indicates a soft error about that connection. | ||||
| The "Requirements for Internet Hosts -- Communication Layers" RFC [4] | The "Requirements for Internet Hosts -- Communication Layers" RFC [4] | |||
| states, in section 4.2.3.9., that the ICMP "Destination Unreachable" | states, in section 4.2.3.9., that the ICMP "Destination Unreachable" | |||
| messages that indicate soft errors are ICMP codes 0 (network | messages that indicate soft errors are ICMP codes 0 (network | |||
| unreachable), 1 (host unreachable), and 5 (source route failed). | unreachable), 1 (host unreachable), and 5 (source route failed). | |||
| Even though ICMPv6 didn't exist when [4] was written, one could | Even though ICMPv6 didn't exist when [4] was written, one could | |||
| extrapolate the concept of soft errors to ICMPv6 Type 1 Codes 0 (no | extrapolate the concept of soft errors to ICMPv6 Type 1 Codes 0 (no | |||
| route to destination) and 3 (address unreachable). | route to destination) and 3 (address unreachable). | |||
| This workaround has been implemented in the Linux kernel since | It must be noted that this behaviour violates section 4.2.3.9 of [4], | |||
| version 2.0.0 (released in 1996), and has therefore been tested in | since it states that as these Unreachable messages indicate soft | |||
| real-world scenarios. | error conditions, TCP MUST NOT abort the corresponding connection. | |||
| A system-wide toggle that allows system administrators to disable the | ||||
| proposed fix MAY be provided. By default, this toggle SHOULD enable | ||||
| the proposed fix. | ||||
| Appendix A.1 discusses a more conservative approach than the one | This workaround has been implemented, for example, in the Linux | |||
| introduced in this section. | kernel since version 2.0.0 (released in 1996) [12]. Appendix A.1 | |||
| discusses a more conservative approach than the one introduced in | ||||
| this section. | ||||
| 5. Possible drawbacks | 5. Possible drawbacks | |||
| The following subsections discuss some of the possible drawbacks | The following subsections discuss some of the possible drawbacks | |||
| arising from the use of the proposed fix. | arising from the use of the modification to TCP's reaction to soft | |||
| errors described in Section 4. | ||||
| 5.1 Non-deterministic transient network failures | 5.1. Non-deterministic transient network failures | |||
| In case there's a transient network failure affecting all of the | In case there's a transient network failure affecting all of the | |||
| addresses returned by the name-to-address translation function, all | addresses returned by the name-to-address translation function, all | |||
| destinations could be unreachable for some short period of time. In | destinations could be unreachable for some short period of time. In | |||
| such a scenario, the application could quickly cycle through all the | such a scenario, the application could quickly cycle through all the | |||
| IP addresses in the list and return an error, when it could have let | IP addresses in the list and return an error, when it could have let | |||
| TCP retry a destination a few seconds later when the transient | TCP retry a destination a few seconds later, when the transient | |||
| problem could have been mitigated. | problem could have disappeared. | |||
| However, it must be noted that non-interactive applications, such as | However, it must be noted that non-interactive applications, such as | |||
| a Mail Transfer Agent (MTA), usually must implement application-layer | a Mail Transfer Agent (MTA), usually must implement application-layer | |||
| retry mechanisms, and thus are able to handle these scenarios | retry mechanisms, and thus are able to handle these scenarios | |||
| appropriately. | appropriately. For interactive applications, the user would likely | |||
| not be satisfied with a connection attempt that succeeds only after | ||||
| For interactive applications, the user would likely not be satisfied | several seconds, anyway. [13] | |||
| with a connection attempt that succeeds only after several seconds, | ||||
| anyway. [12] | ||||
| 5.2 Deterministic transient network failures | 5.2. Deterministic transient network failures | |||
| There are some scenarios in which transient network failures could be | There are some scenarios in which transient network failures could be | |||
| deterministic. For example, consider the case in which upstream | deterministic. For example, consider the case in which upstream | |||
| network connectivity is triggered by network use. In this scenario, | network connectivity is triggered by network use. In this scenario, | |||
| the connection triggering the upstream connectivity would | the connection triggering the upstream connectivity would | |||
| deterministically receive ICMP Destination Unreachables while the | deterministically receive ICMP Destination Unreachables while the | |||
| upstream connectivity is being activated, and thus would be aborted. | upstream connectivity is being activated, and thus would be aborted. | |||
| As discussed in Section 5.1, applications usually implement mechanisms | As discussed in Section 5.1, applications usually implement mechanisms | |||
| to handle these scenarios appropriately. Also, connection attempts | to handle these scenarios appropriately. Also, connection attempts | |||
| are usually preceded by a UDP-based DNS name-to-address lookup. | are usually preceded by a UDP-based DNS name-to-address lookup. | |||
| Thus, unless the name-to-address mapping has been cached by a local | Thus, unless the name-to-address mapping has been cached by a local | |||
| nameserver or resolver, it will be the DNS query that will trigger | nameserver or resolver, it will be the DNS query that will trigger | |||
| the upstream network connectivity, and thus the corresponding | the upstream network connectivity, and thus the corresponding | |||
| connection will not be aborted. | connection will not be aborted. | |||
| In any case, the system-wide toggle described in Section 4 | ||||
| could be used in these specific scenarios to override the default | ||||
| behaviour so that connections in the SYN-SENT or SYN-RECEIVED states | ||||
| are not aborted upon receipt of ICMP error messages that indicate | ||||
| "soft errors". | ||||
| 6. Future work | 6. Future work | |||
| A Higher-Level API would be useful for isolating applications from | A Higher-Level API would be useful for isolating applications from | |||
| protocol details. The API could contain the intelligence required to | protocol details. The API could contain the intelligence required to | |||
| resolve the hostname, try each destination address, etc. One could | resolve the hostname, try each destination address, etc. One could | |||
| even argue that this document wouldn't have existed if application | even argue that this document wouldn't have existed if application | |||
| programmers had been using a Higher-Level API. However, the time | programmers had been using a Higher-Level API. However, such an API | |||
| frame in which this Higher Level API would kick in would be quite | would need to be designed, standardized, implemented, deployed, and | |||
| different than that of the proposed work-around: such an API would | ||||
| need to be designed, standardized, implemented, deployed, and | ||||
| documented even before application programmers start (if ever) to use | documented even before application programmers start (if ever) to use | |||
| it. Therefore, while it is an interesting long-term solution, it is | it. | |||
| inappropriate for providing a short term fix. | ||||
| 7. Security Considerations | 7. Security Considerations | |||
| This document proposes to make TCP abort a connection in the SYN-SENT | This document describes a modification to TCP's reaction to soft | |||
| or the SYN-RECEIVED states when it receives an ICMP "Destination | errors that has been implemented in a variety of TCP/IP stacks. This | |||
| Unreachable" message that indicates a "soft error" about that | modification makes TCP abort a connection in the SYN-SENT or the SYN- | |||
| connection. While this could be used to reset valid connections, it | RECEIVED states when it receives an ICMP "Destination Unreachable" | |||
| must be noted that this behaviour is specified only for connections | message that indicates a "soft error" about that connection. While | |||
| in the SYN-SENT or the SYN-RECEIVED states, and thus the window of | this modification could be exploited to reset valid connections, it | |||
| exposure is very short. Furthermore, in order for this type of attack to | must be noted that this behaviour is meant only for connections in | |||
| succeed, the attacker should be able to guess the four-tuple that | the SYN-SENT or the SYN-RECEIVED states, and thus the window of | |||
| identifies the target TCP connection. A discussion on this issue can | exposure is very short. | |||
| be found in [13]. | ||||
| In any case, it must be noted that the workaround proposed in this | In any case, it must be noted that the workaround discussed in this | |||
| document neither strengthens nor weakens TCP's resistance to attack. | document neither strengthens nor weakens TCP's resistance to attack. | |||
| An attacker wishing to reset valid connections could perform the | An attacker wishing to reset ongoing TCP connections could perform | |||
| attack by sending any of the ICMP error messages that indicate "hard | the attack by sending any of the ICMP error messages that indicate | |||
| errors", not only for connections in the SYN-SENT or the SYN-RECEIVED | "hard errors", not only for connections in the SYN-SENT or the SYN- | |||
| states, but for connections in any state. | RECEIVED states, but for connections in any state. | |||
| A discussion of the use of ICMP to perform a variety of attacks | A discussion of the use of ICMP to perform a variety of attacks | |||
| against TCP, and some proposed counter-measures that eliminate or | against TCP, and a number of proposed counter-measures that eliminate | |||
| greatly minimize the impact of these attacks can be found in [14]. | or greatly minimize the impact of these attacks can be found in [14]. | |||
| A discussion of the security issues arising from the use of ICMPv6 | A discussion of the security issues arising from the use of ICMPv6 | |||
| can be found in [7]. | can be found in [7]. | |||
| 8. Acknowledgements | 8. Acknowledgements | |||
| The author wishes to thank Michael Kerrisk, Eddie Kohler, Mika | The author wishes to thank Michael Kerrisk, Eddie Kohler, Mika | |||
| Liljeberg, Pasi Sarolahti, and Pekka Savola, for contributing many | Liljeberg, Pasi Sarolahti, Pekka Savola, and Joe Touch, for | |||
| valuable comments. | contributing many valuable comments. | |||
| 9. Contributors | 9. Contributors | |||
| Mika Liljeberg was the first to describe how their implementation | Mika Liljeberg was the first to describe how their implementation | |||
| treated soft errors. Based on that, the solution discussed in | treated soft errors. Based on that, the solution discussed in | |||
| Section 4 was documented in [6] by Sebastien Roy, Alain Durand and | Section 4 was documented in [6] by Sebastien Roy, Alain Durand and | |||
| James Paugh. | James Paugh. | |||
| 10. References | 10. References | |||
| 10.1 Normative References | ||||
| 10.1. Normative References | ||||
| [1] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, | [1] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, | |||
| September 1981. | September 1981. | |||
| [2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, | [2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, | |||
| September 1981. | September 1981. | |||
| [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement | [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement | |||
| Levels", BCP 14, RFC 2119, March 1997. | Levels", BCP 14, RFC 2119, March 1997. | |||
| [4] Braden, R., "Requirements for Internet Hosts - Communication | [4] Braden, R., "Requirements for Internet Hosts - Communication | |||
| Layers", STD 3, RFC 1122, October 1989. | Layers", STD 3, RFC 1122, October 1989. | |||
| [5] Braden, R., "Requirements for Internet Hosts - Application and | [5] Braden, R., "Requirements for Internet Hosts - Application and | |||
| Support", STD 3, RFC 1123, October 1989. | Support", STD 3, RFC 1123, October 1989. | |||
| [6] Roy, S., Durand, A. and J. Paugh, "Issues with Dual Stack IPv6 | [6] Roy, S., Durand, A., and J. Paugh, "Issues with Dual Stack IPv6 | |||
| on by Default", draft-ietf-v6ops-v6onbydefault-03 (work in | on by Default", draft-ietf-v6ops-v6onbydefault-03 (work in | |||
| progress), July 2004. | progress), July 2004. | |||
| [7] Conta, A. and S. Deering, "Internet Control Message Protocol | [7] Conta, A. and S. Deering, "Internet Control Message Protocol | |||
| (ICMPv6) for the Internet Protocol Version 6 (IPv6) | (ICMPv6) for the Internet Protocol Version 6 (IPv6) | |||
| Specification", RFC 2463, December 1998. | Specification", RFC 2463, December 1998. | |||
| 10.2 Informative References | 10.2. Informative References | |||
| [8] Clark, D., "Fault isolation and recovery", RFC 816, July 1982. | [8] Clark, D., "Fault isolation and recovery", RFC 816, July 1982. | |||
| [9] Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols", Addison-Wesley, | [9] Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols", Addison-Wesley, | |||
| 1994. | 1994. | |||
| [10] Shneiderman, B., "Response Time and Display Rate in Human | [10] Shneiderman, B., "Response Time and Display Rate in Human | |||
| Performance with Computers", ACM Computing Surveys, 1984. | Performance with Computers", ACM Computing Surveys, 1984. | |||
| [11] Thadani, A., "Interactive User Productivity", IBM Systems | [11] Thadani, A., "Interactive User Productivity", IBM Systems | |||
| Journal No. 1, 1981. | Journal No. 1, 1981. | |||
| [12] Guynes, J., "Impact of System Response Time on State Anxiety", | [12] The Linux Project, "http://www.kernel.org". | |||
| Communications of the ACM, 1988. | ||||
| [13] Watson, P., "Slipping in the Window: TCP Reset Attacks", 2004 | [13] Guynes, J., "Impact of System Response Time on State Anxiety", | |||
| CanSecWest Conference, 2004. | Communications of the ACM, 1988. | |||
| [14] Gont, F., "ICMP attacks against TCP", | [14] Gont, F., "ICMP attacks against TCP", | |||
| draft-gont-tcpm-icmp-attacks-01 (work in progress), September | draft-gont-tcpm-icmp-attacks-04 (work in progress), | |||
| 2004. | September 2005. | |||
| [15] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2: The | [15] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2: The | |||
| Implementation", Addison-Wesley, 1994. | Implementation", Addison-Wesley, 1994. | |||
| Author's Address | ||||
| Fernando Gont | ||||
| Universidad Tecnologica Nacional | ||||
| Evaristo Carriego 2644 | ||||
| Haedo, Provincia de Buenos Aires 1706 | ||||
| Argentina | ||||
| Phone: +54 11 4650 8472 | ||||
| EMail: fernando@gont.com.ar | ||||
| URI: http://www.gont.com.ar | ||||
| Appendix A. Other possible solutions | Appendix A. Other possible solutions | |||
| A.1. A more conservative approach | ||||
| A.1 A more conservative approach | ||||
| A more conservative approach would be to abort a connection in the | A more conservative approach would be to abort a connection in the | |||
| SYN-SENT or SYN-RECEIVED states only after an ICMP Destination | SYN-SENT or SYN-RECEIVED states only after an ICMP Destination | |||
| Unreachable has been received a specified number of times, and the | Unreachable has been received a specified number of times, and the | |||
| SYN segment has been retransmitted more than some specified number of | SYN segment has been retransmitted more than some specified number of | |||
| times. | times. | |||
| Two new parameters would have to be introduced to TCP, to be used | Two new parameters would have to be introduced to TCP, to be used | |||
| only during the connection-establishment phase: MAXSYNREXMIT and | only during the connection-establishment phase: MAXSYNREXMIT and | |||
| MAXSOFTERROR. MAXSYNREXMIT would speficy the number of times the SYN | MAXSOFTERROR. MAXSYNREXMIT would specify the number of times the SYN | |||
| segment would have to be retransmitted before a connection is | segment would have to be retransmitted before a connection is | |||
| aborted. MAXSOFTERROR would specify the number of ICMP messages | aborted. MAXSOFTERROR would specify the number of ICMP messages | |||
| indicating soft errors that would have to be received before a | indicating soft errors that would have to be received before a | |||
| connection is aborted. | connection is aborted. | |||
| Two additional variables would be introduced in implementations to | Two additional variables would need to be introduced to store | |||
| store additional state information during the | additional state information during the connection-establishment | |||
| connection-establishment phase: "nsynrexmit" and "nsofterror". Both | phase: "nsynrexmit" and "nsofterror". Both would be initialized to | |||
| would be initialized to zero. "nsynrexmit" would be incremented by | zero. "nsynrexmit" would be incremented by one every time the SYN | |||
| one every time the SYN segment is retransmitted. "nsofterror" would | segment is retransmitted. "nsofterror" would be incremented by one | |||
| be incremented by one every time an ICMP message that indicates a | every time an ICMP message that indicates a soft error is received. | |||
| soft error is received. | ||||
| A connection in the SYN-SENT or SYN-RECEIVED states would be aborted | A connection in the SYN-SENT or SYN-RECEIVED states would be aborted | |||
| if nsynrexmit was greater than MAXSYNREXMIT and "nsofterror" was | if nsynrexmit was greater than MAXSYNREXMIT and "nsofterror" was | |||
| simultaneously greater than MAXSOFTERROR. | simultaneously greater than MAXSOFTERROR. | |||
| This approach would give the network more time to solve the | This approach would give the network more time to solve the | |||
| connectivity problem. However, it should be noted that depending on | connectivity problem. However, it should be noted that depending on | |||
| the values chosen for the MAXSYNREXMIT and MAXSOFTERROR parameters, | the values chosen for the MAXSYNREXMIT and MAXSOFTERROR parameters, | |||
| this approach could still lead to long delays in connection | this approach could still lead to long delays in connection | |||
| establishment attempts. For example, BSD systems abort connections | establishment attempts. For example, BSD systems abort connections | |||
| in the SYN-SENT or the SYN-RECEIVED state when a second ICMP error is | in the SYN-SENT or the SYN-RECEIVED state when a second ICMP error is | |||
| received, and the SYN segment has been retransmitted more than three | received, and the SYN segment has been retransmitted more than three | |||
| times. They also set up a "connection-establishment timer" that | times. They also set up a "connection-establishment timer" that | |||
| imposes an upper limit on the time the connection establishment | imposes an upper limit on the time the connection establishment | |||
| attempt has to succeed, which expires after 75 seconds [15]. Even | attempt has to succeed, which expires after 75 seconds [15]. Even | |||
| when this policy is better than the three-minute timeout policy | when this policy may be better than the three-minute timeout policy | |||
| specified in [4], it is still inappropriate for handling the | specified in [4], it may still be inappropriate for handling the | |||
| potential problems described in this document. This more | potential problems described in this document. This more | |||
| conservative approach has been implemented in BSD systems since, at | conservative approach has been implemented in BSD systems since, at | |||
| least, 1994 [15]. | least, 1994 [15]. | |||
| A.2 Asynchronous Application Notification | A.2. Asynchronous Application Notification | |||
| In section 4.2.4.1, [4] states that there MUST be a mechanism for | In section 4.2.4.1, [4] states that there MUST be a mechanism for | |||
| reporting soft TCP error conditions to the application. Such a | reporting soft TCP error conditions to the application. Such a | |||
| mechanism (assuming one is implemented) could be used by applications | mechanism (assuming one is implemented) could be used by applications | |||
| to cycle through the destination IP addresses. However, this | to cycle through the destination IP addresses. However, this | |||
| approach would increase application complexity, and would take a long | approach would increase application complexity, and would take a long | |||
| time to kick in, as it requires every existing application to be | time to kick in, as it would require every existing application to be | |||
| modified. Thus, it is inappropriate for providing a short term fix. | modified. | |||
| A.3 Issuing several connection requests in parallel | A.3. Issuing several connection requests in parallel | |||
| For those scenarios in which a domain name maps to several IP | For those scenarios in which a domain name maps to several IP | |||
| addresses, several connection requests could be issued in parallel, | addresses, several connection requests could be issued in parallel, | |||
| each one to a different destination IP address. The host would then | each one to a different destination IP address. The host would then | |||
| use the first connection attempt to succeed, eliminating the | use the first connection attempt to succeed, eliminating the | |||
| potential delay in establishing a connection with the destination | potential delay in establishing a connection with the destination | |||
| host. However, this would mean that every attempt to connect to a | host. However, this would mean that every attempt to connect to a | |||
| multihomed host would imply sending several SYN segments, making it | multihomed host would imply sending several SYN segments, making it | |||
| hard for network operators to distinguish valid connection attempts | hard for network operators to distinguish valid connection attempts | |||
| from those performing Denial of Service (DoS) attacks. | from those performing Denial of Service (DoS) attacks. | |||
| An alternative approach would be as follows. A host would issue a | An alternative approach would be as follows. A host would issue a | |||
| connection request to the first IP address in the list returned by | connection request to the first IP address in the list returned by | |||
| the name-to-address mapping function. If this connection request | the name-to-address mapping function. If this connection request | |||
| doesn't succeed in some time, a connection request to the second IP | doesn't succeed in some time, a connection request to the second IP | |||
| address in the list would be issued in parallel. If none of these | address in the list would be issued in parallel. If none of these | |||
| connection requests succeeds in some time, and there are still more | connection requests succeeds in some time, and there are still more | |||
| addresses left in the list, they would be tried in the same way. | addresses left in the list, they would be tried in the same way. | |||
| While this approach would, in principle, avoid the problems of the | While this approach would, in principle, avoid the problems of the | |||
| previous approach, it might be hard to define the time interval to | previous approach, it might be hard to define the time interval to | |||
| wait before issuing each parallel connection. A short time interval | wait before issuing each parallel connection request. A short time | |||
| would lead to the problems caused by the previous approach, while a | interval would lead to the problems caused by the previous approach, | |||
| long time interval would likely still lead to long delays in | while a long time interval would likely still lead to long delays in | |||
| establishing a connection with the destination host. | establishing a connection with the destination host. | |||
| In any case, it must be noted that both approaches have the same | In any case, it must be noted that both approaches have the same | |||
| drawbacks as the solution described in Appendix A.2: they would | drawbacks as the solution described in Appendix A.2: they would | |||
| increase application complexity, and would take too long to begin to | increase application complexity, and would take too long to begin to | |||
| be used by applications. Thus, they would be inappropriate for | be used by applications. | |||
| providing a short-term fix. | ||||
| Appendix B. Changes from draft-gont-tcpm-tcp-soft-errors-00 | Appendix B. Changes from draft-gont-tcpm-tcp-soft-errors-01 | |||
| o Changed wording to describe the mechanism, rather than proposing | ||||
| it | ||||
| o Miscellaneous editorial changes | ||||
| Appendix C. Changes from draft-gont-tcpm-tcp-soft-errors-00 | ||||
| o Added reference to the Linux implementation in Section 4 | o Added reference to the Linux implementation in Section 4 | |||
| o Added Section 5 | o Added Section 5 | |||
| o Added Section 6 | o Added Section 6 | |||
| o Added Appendix A.1 | o Added Appendix A.1 | |||
| o Moved section "Asynchronous Application Notification" to Appendix | o Moved section "Asynchronous Application Notification" to | |||
| A.2 | Appendix A.2 | |||
| o Added Appendix A.3 | o Added Appendix A.3 | |||
| o Miscellaneous editorial changes | o Miscellaneous editorial changes | |||
| Author's Address | ||||
| Fernando Gont | ||||
| Universidad Tecnologica Nacional | ||||
| Evaristo Carriego 2644 | ||||
| Haedo, Provincia de Buenos Aires 1706 | ||||
| Argentina | ||||
| Phone: +54 11 4650 8472 | ||||
| Email: fernando@gont.com.ar | ||||
| URI: http://www.gont.com.ar | ||||
| Intellectual Property Statement | Intellectual Property Statement | |||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
| made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
| on the procedures with respect to rights in RFC documents can be | on the procedures with respect to rights in RFC documents can be | |||
| found in BCP 78 and BCP 79. | found in BCP 78 and BCP 79. | |||
| skipping to change at page 13, line 41 ¶ | skipping to change at page 14, line 41 ¶ | |||
| This document and the information contained herein are provided on an | This document and the information contained herein are provided on an | |||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | |||
| ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | |||
| INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | |||
| INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | |||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | |||
| Copyright Statement | Copyright Statement | |||
| Copyright (C) The Internet Society (2004). This document is subject | Copyright (C) The Internet Society (2005). This document is subject | |||
| to the rights, licenses and restrictions contained in BCP 78, and | to the rights, licenses and restrictions contained in BCP 78, and | |||
| except as set forth therein, the authors retain all their rights. | except as set forth therein, the authors retain all their rights. | |||
| Acknowledgment | Acknowledgment | |||
| Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is currently provided by the | |||
| Internet Society. | Internet Society. | |||
| End of changes. 67 change blocks. | ||||
| 181 lines changed or deleted | 174 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
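
The counter logic described in Appendix A.1 can be illustrated with a minimal sketch. It is not taken from either revision of the draft or from any real TCP stack. The class name `SynState`, its method names, and the threshold values are assumptions made for illustration; the draft names MAXSYNREXMIT and MAXSOFTERROR but does not assign them values. The values below are chosen only so that the behaviour resembles the BSD example quoted in the text (abort on the second ICMP error once the SYN has been retransmitted more than three times).

```python
from dataclasses import dataclass

# Hypothetical thresholds: the draft defines these parameters but gives no values.
MAXSYNREXMIT = 3   # SYN retransmissions that must be exceeded before aborting
MAXSOFTERROR = 1   # ICMP soft errors that must be exceeded before aborting

# States in which the more conservative approach applies.
HANDSHAKE_STATES = ("SYN-SENT", "SYN-RECEIVED")


@dataclass
class SynState:
    """Extra per-connection state kept during connection establishment."""
    state: str = "SYN-SENT"
    nsynrexmit: int = 0   # incremented each time the SYN is retransmitted
    nsofterror: int = 0   # incremented each time an ICMP soft error arrives

    def should_abort(self) -> bool:
        # Abort only if both counters exceed their limits simultaneously,
        # and only while the connection is still being established.
        return (self.state in HANDSHAKE_STATES
                and self.nsynrexmit > MAXSYNREXMIT
                and self.nsofterror > MAXSOFTERROR)

    def on_syn_retransmit(self) -> bool:
        """Record a SYN retransmission; return True if the connection should abort."""
        self.nsynrexmit += 1
        return self.should_abort()

    def on_soft_error(self) -> bool:
        """Record an ICMP soft error; return True if the connection should abort."""
        self.nsofterror += 1
        return self.should_abort()
```

With these assumed values, two soft errors alone do not abort the attempt; the abort is signalled only once the fourth SYN retransmission has also been recorded. A connection in any other state is never aborted by this check, whatever the counters hold.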