TCP Maintenance and Minor                                M. Jethanandani
Extensions                                                 Cisco Systems
Internet-Draft                                                M. Bashyam
Intended status: Informational                      Ocarina Systems, Inc
Expires: September 8, 2007                                 March 7, 2007


               Improving TCP robustness in persist state
                    draft-mahesh-persist-timeout-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 8, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document describes how a TCP connection can remain in persist
   state indefinitely, and the Denial of Service (DoS) implications for
   the system if there is no mechanism to recover from this condition.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction
   2.  Denial of Service
   3.  Solution
   4.  Role of Application
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  An Appendix
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   RFC 1122 [RFC1122] Section 4.2.2.17, page 92 says that:

      A TCP MAY keep its offered receive window closed indefinitely.
      As long as the receiving TCP continues to send acknowledgments in
      response to the probe segments, the sending TCP MUST allow the
      connection to stay open.

   The RFC goes on to say that it is important to remember that ACK
   (acknowledgment) segments that contain no data are not reliably
   transmitted by TCP. The RFC therefore requires (MUST) that zero
   window probing be supported, to prevent a connection from hanging
   forever if an ACK segment that re-opens the window is lost.

   While it is clear why the sender needs to continue to probe the
   receiver, it is not clear why this process needs to be indefinite,
   particularly if the receiver reliably responds with an ACK and a
   window of zero.

   The particular situation we ran into was with a gaming client that
   would receive regular updates of the ensuing game from the server.
   At some point the client decided to pause the game, effectively
   telling the application to stop reading data from the TCP
   connection. Another example of such a setup is HTTP-based Web
   conferencing.
   The problem is applicable to TCP and to TCP-derived transport
   protocols such as SCTP.

   When the client stops reading data, the server continues to send
   data until the advertised window drops to zero, at which point the
   connection enters persist state. Since the server has more buffered
   data for the client, it will continue to probe the receiver.

   However, it is not clear what the sender is supposed to do if the
   receiver never exits this state. It is quite possible that the
   receiving end continues to advertise a zero window for an extended
   period of time, which could result in the sender holding on to a
   large number of buffers. If the sender is servicing several such
   clients, the effect compounds to the extent that the system runs out
   of buffers and/or connection resources. At that point the sender
   cannot service new legitimate connections, and even the existing
   connections start seeing degraded service.

   It is not possible to rely on application control to recover from
   this scenario, as described in the following sections of this
   document.

   Allowing TCP to persist indefinitely makes the end point vulnerable
   to a DoS attack. We therefore suggest that a TCP end point SHOULD
   NOT persist for an indefinite amount of time.

2.  Denial of Service

   One possible DoS attack is for clients to open a large number of
   connections that ultimately enter persist state, causing TCP to run
   out of resources.

   A client can also open its receive window briefly, and by a small
   amount, just enough to make the server take the connection out of
   persist state. To prevent this, and only when the administrator has
   opted to use the solution described below, a threshold check is
   applied: the advertised receive window must be at least one Maximum
   Segment Size (MSS) before the connection is taken out of persist
   state.

3.  Solution

   The current behavior of a connection in persist state SHALL continue
   to exist as the default behavior. We are proposing an option that
   places an upper bound on the persist state, expressed either as an
   absolute time limit or as a set number of retries.

   To enable an upper bound on the persist state, the administrator MAY
   configure an option. The option SHOULD be configured as a time or as
   a number of retries. If both are configured, whichever limit is
   reached first takes effect.

   If the configured option is a time, it specifies how long the
   connection is allowed to stay in persist state. This option is
   called persist-state-expiry-time. When the connection enters persist
   state, i.e. the receiver advertises a window of zero, the current
   time is saved in the connection entry. This value is called
   persist-entry-time. Thereafter, every time the persist timer expires
   (and before it is restarted), or when an ACK is received that
   continues to advertise a zero window, a check is done to make sure
   that the difference between the current time and persist-entry-time
   is not more than persist-state-expiry-time. If it is more, the
   connection is reset and the connection resources are reclaimed by
   TCP. If, at any time after the connection has gone into persist
   state and before it is reset, the receiver advertises a non-zero
   window, persist-entry-time is cleared.

   If the configured option is a number of retries, it specifies the
   number of retries that will be made before the connection is
   aborted. This option is called persist-state-expiry-retries. When
   the connection enters persist state, i.e. the receiver advertises a
   window of zero, the count of retries in the connection entry, called
   persist-state-retry-count, is cleared.
   Thereafter, every time the persist timer expires (and before it is
   restarted), or when an ACK is received that continues to advertise a
   zero window, a check is done to make sure that
   persist-state-retry-count does not exceed
   persist-state-expiry-retries. If it does, the connection is reset
   and the connection resources are reclaimed by TCP. Otherwise,
   persist-state-retry-count is incremented by one. If, at any time
   after the connection has gone into persist state and before it is
   reset, the receiver advertises a non-zero window,
   persist-state-retry-count is cleared. The
   persist-state-expiry-retries option is more coarse grained than the
   persist-state-expiry-time option.

   An application can suggest a persist-state-expiry-time or a
   persist-state-expiry-retries value to TCP. The application-suggested
   values override the default values that TCP would otherwise use.
   These values apply only to that application and to the connections
   on which they have been suggested, not to all TCP connections.

   The default values should allow the sender to send probes a few
   times. More experimentation is required to come up with the default
   values.

   However, TCP may find that, in spite of implementing the solution
   suggested above, it is still running out of resources because there
   are too many connections in persist state. A smaller value of
   persist-state-expiry-time or persist-state-expiry-retries would help
   clear some of these connections sooner. Alternatively, one scheme
   that TCP can use to decide which connections to clear is to look at
   the connections that are holding the maximum number of buffers for
   the longest amount of time.
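
   The time-based and retry-based checks described in this section,
   together with a ranking of persist-state connections by buffered
   data and time spent in persist state, might be sketched as follows.
   This is a minimal illustration in Python, not an implementation
   taken from any TCP stack: the class PersistConn, its methods and the
   function rank_for_reclaim are hypothetical names, and only the
   field names persist_entry_time and persist_state_retry_count mirror
   terms used in this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersistConn:
    """Per-connection persist bookkeeping (hypothetical, for illustration)."""
    name: str
    send_queue_bytes: int = 0
    persist_entry_time: Optional[float] = None  # persist-entry-time
    persist_state_retry_count: int = 0          # persist-state-retry-count

    def enter_persist(self, now: float) -> None:
        # Receiver advertised a zero window: save the entry time and
        # clear the retry count.
        self.persist_entry_time = now
        self.persist_state_retry_count = 0

    def window_opened(self) -> None:
        # Receiver advertised a non-zero window: clear the bookkeeping.
        self.persist_entry_time = None
        self.persist_state_retry_count = 0

    def should_reset(self, now: float,
                     expiry_time: Optional[float] = None,
                     expiry_retries: Optional[int] = None) -> bool:
        """Run on each persist timer expiry or zero-window ACK.

        expiry_time stands for persist-state-expiry-time and
        expiry_retries for persist-state-expiry-retries; None means the
        option is not configured. Whichever limit is hit first wins.
        """
        if self.persist_entry_time is None:
            return False  # not in persist state
        if (expiry_time is not None
                and now - self.persist_entry_time > expiry_time):
            return True
        if expiry_retries is not None:
            if self.persist_state_retry_count > expiry_retries:
                return True
            self.persist_state_retry_count += 1
        return False


def rank_for_reclaim(conns: List[PersistConn], now: float) -> List[PersistConn]:
    """Order persist-state connections by send queue size multiplied by
    time spent in persist state, largest first."""
    stuck = [c for c in conns if c.persist_entry_time is not None]
    return sorted(
        stuck,
        key=lambda c: c.send_queue_bytes * (now - c.persist_entry_time),
        reverse=True,
    )
```

   In this sketch a caller would invoke should_reset() from the persist
   timer handler and from ACK processing, and reset the connection when
   it returns True. Passing None for both limits keeps the default
   behavior of persisting indefinitely.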
   A list of connections ordered by the product of the TCP send queue
   size and the time elapsed since the connection entered persist state
   (the current timestamp minus the time of entry) gives TCP an idea of
   which connections are holding the most resources.

4.  Role of Application

   In order to understand whether the application can play a role in
   solving this problem, one needs to understand the current behavior
   of the application vis-a-vis TCP.

   Applications today do not know whether a connection is stuck in
   persist state. In most cases the application is not even aware of
   why TCP is not sending any more data. It cannot distinguish between
   segments being dropped because of network issues and the send window
   not advancing because the other end has closed its window.

   Trying to keep the application apprised of what is causing the
   problem only takes care of that particular connection and that
   particular application. It does not take care of all applications
   and all connections that might be in persist state.

   TCP in most cases will not signal that a connection is blocked. This
   is particularly true if there are buffers available or the
   application has no more data to send.

   If the application were to poll TCP to get the information, it is
   not clear how often it would need to poll. As described before, TCP
   may stop sending data for several reasons, and in most cases the
   polling will show that the connection is not even in persist state.

   It is also possible for applications to write data and exit before
   the data is sent. An example of such an application is an HTTP
   server. When an HTTP server receives an HTTP request such as a GET,
   the server responds with data and goes ahead and closes the socket
   even before TCP has finished sending all the data. In that case, TCP
   has no application it can inform to take action on a connection
   stuck in persist state.

   There are cases where the system is application agnostic.
   A classic case of this is a TCP proxy. In that particular case,
   there is no end application that can be informed of the state of the
   connection so that it can take action.

   Resources like TCP buffers are system-wide resources and are not
   tied to any particular application. TCP needs to be able to monitor
   buffer usage on a per-connection basis for it to detect and drop
   packets on connections that are taking up a lot of buffers. TCP
   cannot rely on an application to perform the task of looking at
   buffers system wide.

   Applications have a role to play in solving this problem. They can
   register for an asynchronous notification when the TCP connection
   enters or exits persist state. They can use the notification
   mechanism to implement their own scheme for deciding which persist
   connections to clear. They can also suggest timeout or retry values
   to TCP.

   It is quite possible that the application that is encountering the
   problem has not implemented the scheme mentioned above. Since the
   impact of a connection in persist state is system wide, all
   applications would have to implement the option for the solution to
   be effective. Even one application that has not implemented the
   option can cause the entire system to be impacted. It is also not
   possible to get every application to implement detection of persist
   state and have it clear the connection.

   However, TCP can look at the persist state system wide. TCP already
   keeps track of connections in persist state. The advantage of doing
   this in TCP is that, once enabled, the entire system, including all
   the applications, benefits. Moreover, resources like buffers, which
   are system wide, can be monitored by TCP to determine when to reset
   a connection and reclaim the resources.

5.  IANA Considerations

   This document makes no request of IANA.

6.  Security Considerations

   This document discusses one security consideration: the possible DoS
   attacks described in Section 2.

7.  Acknowledgements

   Thanks to Anantha Ramaiah, who spent countless hours reviewing,
   commenting and proposing changes to the draft. Thanks also to Fred
   Baker for providing his feedback on the draft.

8.  References

8.1.  Normative References

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

Appendix A.  An Appendix

Authors' Addresses

   Mahesh Jethanandani
   Cisco Systems
   170 West Tasman Drive
   San Jose, California 95134
   USA

   Phone: +1-408-527-8230
   Fax:   +1-408-527-0147
   Email: mahesh@cisco.com
   URI:   www.cisco.com


   Murali Bashyam
   Ocarina Systems, Inc
   Fremont, CA
   USA

   Email: mbashyam@ocarinatech.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).