Network Working Group                                      Reiner Ludwig
INTERNET-DRAFT                                         Ericsson Research
Expires: January 2002                                         July, 2001

                       TCP Retransmit (RXT) Flag

Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This document proposes a solution to TCP's retransmission ambiguity problem. It is based on using a single bit, named the Retransmit (RXT) flag, taken from the Reserved field of the TCP header. The TCP sender sets the RXT flag in segments containing retransmitted data. In response to such a segment, the TCP receiver sends an immediate pure ACK with the RXT flag set. By inspecting the RXT flag of the first new ACK that arrives after a retransmit, the TCP sender can detect whether the retransmit was spurious.

Ludwig                                                          [Page 1]

INTERNET-DRAFT             TCP Retransmit (RXT) Flag          July, 2001

Terminology

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119].

We use the term "pure ACK" to refer to an ACK that is not piggybacked onto a data segment.

We use the term "immediate ACK" to refer to an ACK that the TCP receiver sends immediately in response to the arrival of a data segment.
That is, the receiver does not wait for the delayed-ACK timer [RFC1122] to expire.

We use the term "original ACK" to refer to an ACK that corresponds to the first-time transmission of a data segment.

We use the term "new ACK" to refer to an ACK that acknowledges outstanding data.

These terms are not exclusive. Thus, an ACK could, e.g., be a new, original, immediate, and pure ACK all at the same time, or it could, e.g., be any one of these but none of the others. Furthermore, we use the term "duplicate ACK" as defined in [WS95].

We use the name "snd_una" to denote the lowest previously unsent sequence number, i.e., one more than the highest sequence number sent so far. Note that [WS95] calls this quantity snd_max and uses snd_una for the oldest unacknowledged sequence number. Throughout this document, snd_una always carries the former meaning.

1. Introduction

The retransmission ambiguity problem [KP87] is the TCP sender's inability to distinguish an ACK that was triggered by the first-time transmission of a data segment from an ACK that was triggered by the retransmit of that segment. TCP's retransmission ambiguity problem inevitably fools a TCP sender into unnecessarily retransmitting data (go-back-N style) after it has taken a spurious timeout [LK00].

For many paths through the Internet this does not create a serious problem as long as the TCP sender implements the retransmission timer specified in [RFC2988], because that retransmission timer is extremely conservative [LS00]. However, across paths that include links that may often provide only intermittent connectivity, spurious timeouts occur more frequently [GU01]. For example, wireless access links may often be subject to handovers and resource preemption, or the mobile transmitter may move through a radio coverage hole. Such disruptive events can easily trigger a spurious timeout despite a conservative retransmission timer. Here, the unnecessary go-back-N retransmits do create a serious problem: they decrease end-to-end throughput, place useless load on a potentially congested network, and waste transmission (battery) power.

Independent of path characteristics, the unnecessary retransmits after a spurious timeout create a strong incentive to stick with a conservative retransmission timer. That is, they discourage developing and deploying more aggressive retransmission timers, which would improve the performance of interactive request/response-style applications in particular. Typically, such applications cannot benefit from TCP's fast retransmit algorithm, since they do not put enough data in flight to trigger the required number of duplicate ACKs. Hence, TCP has to rely on its timeout-based error recovery.

This motivates the need for an efficient solution to the retransmission ambiguity in TCP. As such a solution, we propose a marking scheme, named the RXT scheme, based on the RXT flag. Based on the RXT scheme, we define the Eifel algorithm as one way of detecting spurious retransmits, and compare it with other known alternatives. [Note that the original proposal of the Eifel algorithm [LK00] included the TCP sender's response to a detected spurious retransmit. We have dropped that response part and leave it to future documents (see, e.g., [EA01] and [Lud01]). Thus, the Eifel algorithm as defined here is a pure detection scheme for spurious retransmits.]

The unnecessary go-back-N retransmits after a spurious timeout cannot be avoided with either the SACK [RFC2018] or the D-SACK [RFC2883] option, because the SACK/D-SACK information would arrive too late at a TCP sender that has taken a spurious timeout. More precisely, the ACKs carrying the SACK/D-SACK option would only arrive after the original ACKs have already clocked out the unnecessary retransmits. The exception would be the unlikely event that all of those original ACKs got lost.

It has been shown how the timestamp option [RFC1323] could be used to solve this problem [LK00].
However, the price for that solution is the extra overhead added by the timestamp option field: 12 bytes for every data segment and for every ACK. In addition, the presence of timestamps in a TCP flow effectively disables the widely deployed TCP/IP header compression code [RFC1144]. This document proposes an alternative solution. It is based on using a single bit, named the Retransmit (RXT) flag, taken from the Reserved field of the TCP header.

2. Definition of the RXT Flag

We define bit 6 of the Reserved field of the TCP header as the RXT flag. The location of the 6-bit Reserved field in the TCP header is shown in Figure 3 of [RFC793]. Bits are numbered as in that figure, i.e., the Reserved field occupies bits 4 through 9 of its 32-bit header word. Bits 8 and 9 have been assigned to Explicit Congestion Notification (ECN) [RFC2481], while bit 7 is under discussion to be assigned to the nonce scheme proposed in [WES01].

3. Initial Handshake

When a TCP sends a SYN segment, it MAY set the RXT flag. For a SYN segment, the setting of the RXT flag is defined as an indication that the TCP sending the SYN segment wishes to participate in the RXT scheme as both a sender and a receiver.

When a TCP receives a SYN segment with the RXT flag set, it MAY set the RXT flag when it sends the SYN-ACK segment. For a SYN-ACK segment, the setting of the RXT flag is defined as an indication that the TCP sending the SYN-ACK segment agrees to participate in the RXT scheme as both a sender and a receiver.

Note that setting the RXT flag in either the SYN or the SYN-ACK segment is not an indication that the segment is a retransmit.

4. TCP Sender

If both TCPs have agreed to participate in the RXT scheme, the TCP sender SHOULD set the RXT flag in segments containing retransmitted data. In all other cases, it SHOULD reset the RXT flag in data segments it sends.
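To illustrate Sections 2 through 4, the Python sketch below shows three things. It shows where the RXT flag would sit in the 16-bit word at byte offset 12 of the TCP header. It shows how a TCP could record agreement during the SYN/SYN-ACK exchange. It also shows how a sender could then mark retransmitted data. This is an illustrative sketch under stated assumptions, not a reference implementation. The helper names (bit_mask, set_rxt, has_rxt, RxtEndpoint) are hypothetical, and flags are handled as a plain 16-bit integer. Each endpoint simply exercises the MAY options of Section 3 whenever its hypothetical wants_rxt policy is true.

```python
import struct

# Bits are numbered as in Figure 3 of RFC 793: in the 16-bit word at byte
# offset 12 of the TCP header, bit 0 is the most significant bit. Bits 0-3
# hold Data Offset, bits 4-9 the Reserved field, bits 10-15 URG..FIN.
def bit_mask(bit: int) -> int:
    return 1 << (15 - bit)

RXT = bit_mask(6)   # 0x0200: RXT flag proposed in this draft
CWR = bit_mask(8)   # 0x0080: ECN
ECE = bit_mask(9)   # 0x0040: ECN
ACK = bit_mask(11)  # 0x0010
SYN = bit_mask(14)  # 0x0002


def has_rxt(header: bytes) -> bool:
    """Return True if the RXT flag is set in a raw TCP header."""
    (word,) = struct.unpack_from("!H", header, 12)
    return bool(word & RXT)


def set_rxt(header: bytearray, on: bool) -> None:
    """Set (on=True) or reset (on=False) the RXT flag in place."""
    (word,) = struct.unpack_from("!H", header, 12)
    word = (word | RXT) if on else (word & ~RXT & 0xFFFF)
    struct.pack_into("!H", header, 12, word)


class RxtEndpoint:
    """Hypothetical per-connection state for the RXT handshake (Section 3)."""

    def __init__(self, wants_rxt: bool) -> None:
        self.wants_rxt = wants_rxt  # local policy: exercise the MAY options?
        self.rxt_agreed = False     # True once both TCPs agreed to participate

    def syn_flags(self) -> int:
        # On a SYN, RXT means "wishes to participate", not "retransmit".
        return SYN | (RXT if self.wants_rxt else 0)

    def on_syn(self, syn_flags: int) -> int:
        """Passive side: process a SYN and return the SYN-ACK flags."""
        self.rxt_agreed = self.wants_rxt and bool(syn_flags & RXT)
        return SYN | ACK | (RXT if self.rxt_agreed else 0)

    def on_syn_ack(self, syn_ack_flags: int) -> None:
        """Active side: agreement holds only if the SYN-ACK carries RXT."""
        self.rxt_agreed = self.wants_rxt and bool(syn_ack_flags & RXT)

    def data_flags(self, is_retransmit: bool) -> int:
        """Section 4: set RXT on retransmitted data only if agreed."""
        return ACK | (RXT if self.rxt_agreed and is_retransmit else 0)
```

Consider a header with a data offset of 5 and the ACK flag set. Setting RXT changes byte 12 from 0x50 to 0x52, so the flag is the second-lowest bit of that byte, directly above bit 7. If either side declines during the handshake, data_flags never sets RXT, which matches the "reset in all other cases" rule of Section 4.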

5. TCP Receiver

If both TCPs have agreed to participate in the RXT scheme, the TCP receiver SHOULD send an immediate pure ACK with the RXT flag set in response to a data segment that arrived with the RXT flag set. In all other situations where a pure ACK is sent, the TCP receiver SHOULD reset the RXT flag.

6. Using the RXT Flag to Detect Spurious Retransmits

In this section, we propose the Eifel algorithm as one way of detecting spurious retransmits, and compare it with other known alternatives.

The Eifel algorithm resolves the retransmission ambiguity in TCP and thereby offers the TCP sender a fast way to detect spurious retransmits. More precisely, the Eifel algorithm decides whether a retransmit was spurious as early as the first ACK that acknowledges that retransmit. Being able to decide upon the first new ACK is crucial to avoid the unnecessary go-back-N retransmits that typically occur after a spurious timeout.

To resolve the retransmission ambiguity, the Eifel algorithm relies on a segment marking scheme. The RXT scheme proposed in Sections 3-5 is such a scheme. An alternative scheme based on timestamps is outlined in Section 6.3.1.

Note that the original proposal of the Eifel algorithm [LK00] included the TCP sender's response to a detected spurious retransmit. We have dropped that response part and leave it to future documents (see, e.g., [EA01] and [Lud01]). Thus, the Eifel algorithm as defined here is a pure detection scheme for spurious retransmits.

6.1. Events Causing Spurious Retransmits

The following events can falsely trigger TCP's error recovery (causing a so-called spurious retransmit) and congestion control algorithms:

- spurious timeouts
- packet reordering
- packet duplication

Generally speaking, a spurious timeout is a timeout that would not have occurred had the sender "waited longer". This can have a number of reasons.
The typical reason is that a data segment itself or the first new original ACK got excessively delayed in the network. Another reason could be the loss of a whole series of original ACKs from a flight of ACKs, which may cause an aggressive retransmission timer to expire prematurely. However, this is an unlikely event as long as a conservative retransmission timer such as [RFC2988] is used. Yet another reason would be a situation where the third duplicate ACK for a segment arrives after the TCP sender has already retransmitted that segment due to a timeout. In some TCP implementations, the arrival of that third duplicate ACK may then trigger a spurious fast retransmit. This last case will not be addressed further in this document (but may be in [Lud01]).

Packet reordering can occur because of the connectionless nature of IP [RFC791], which does not guarantee in-order delivery of packets. This results in a spurious fast retransmit if three or more data segments arrive at the TCP receiver ahead of an earlier segment, assuming that at least three duplicate ACKs arrive back at the TCP sender. The reason is that a TCP receiver generates a duplicate ACK for each segment that arrives out of order, and three consecutive duplicate ACKs trigger the TCP sender's fast retransmit algorithm. This assumes that the TCP sender uses the recommended value of three for the duplicate ACK threshold [RFC2581].

Likewise, packet duplication in the network may also result in a spurious fast retransmit. This happens if duplication of data segments or ACKs causes three or more duplicate ACKs to arrive at the TCP sender.

6.2. The RXT-Flag-based Eifel Algorithm

When a retransmit is sent, the TCP sender stores the current value of snd_una in the variable snd_una_prev.
By inspecting the first ACK acknowledging a retransmit, a TCP sender determines as follows whether the retransmit was spurious. If

   (1) the ACK's acknowledgment number is less than or equal to snd_una_prev, and
   (2) the ACK's RXT flag is not set,

then the last retransmit was spurious.

The range check in the first condition ensures that only those ACKs are considered that correspond to segments that were outstanding at the time the retransmit was sent. If such an ACK does not have the RXT flag set, then it is an original ACK, which could not have been sent in response to the retransmit. Thus, with a high degree of certainty (see the following paragraph), the first-time transmission of the retransmitted segment did arrive at the TCP receiver. In this case, the original ACK must have been triggered either by that first-time transmission or by another first-time transmission that followed. The latter could happen because of the delayed-ACK scheme or the loss of earlier original ACKs.

It is possible to construct a pathological case where this algorithm fails, i.e., it concludes that a retransmit was spurious when in fact it was not. This could happen after a genuine loss of a data segment if

- the corresponding retransmit arrived at the TCP receiver in place of the first-time transmission, i.e., jumping ahead of all data segments that were sent between the first-time transmission and the retransmit, and
- the ACK for the retransmit got lost, i.e., did not arrive at the TCP sender.

We believe that this case is unlikely enough to be neglected, especially since it does not seem conceivable how a malicious TCP receiver could exploit this situation to its benefit. Furthermore, it seems difficult to devise an alternative detection algorithm that can decide as early as the first ACK acknowledging a retransmit but that does not fail in this pathological case. However, as mentioned before, being able to decide upon the first new ACK is crucial after a spurious timeout.
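The detection rule of Section 6.2, together with the receiver behavior of Section 5, can be sketched as follows. This is an illustrative Python sketch under assumptions, not a normative implementation. The names (seq_leq, Ack, receiver_rxt_policy, EifelRxtDetector) are hypothetical. Both TCPs are assumed to have agreed on the RXT scheme during the handshake. Timers, SACK, and any response to a detected spurious retransmit are left out. snd_una carries this draft's meaning, the lowest previously unsent sequence number, and sequence numbers are compared modulo 2^32.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_leq(a: int, b: int) -> bool:
    """True if a <= b in 32-bit sequence-number space."""
    return (b - a) % SEQ_MOD < (1 << 31)


@dataclass
class Ack:
    ack_no: int  # value of the Acknowledgment Number field
    rxt: bool    # RXT flag as set by the TCP receiver


def receiver_rxt_policy(data_has_rxt: bool, agreed: bool) -> Tuple[bool, bool]:
    """Section 5: return (must ACK immediately, RXT flag on that pure ACK).

    (False, False) means the RXT rule imposes nothing: the receiver's usual
    ACK policy applies and any pure ACK it sends has the RXT flag reset.
    """
    if agreed and data_has_rxt:
        return True, True
    return False, False


class EifelRxtDetector:
    """Sender-side state for RXT-flag-based Eifel detection (sketch)."""

    def __init__(self, oldest_unacked: int) -> None:
        # Oldest unacknowledged sequence number, advanced by new ACKs.
        self.oldest_unacked = oldest_unacked
        # Set while a retransmit awaits a decision, otherwise None.
        self.snd_una_prev: Optional[int] = None

    def on_retransmit(self, snd_una: int) -> None:
        # snd_una in this draft's sense: the lowest previously unsent
        # sequence number at the moment the retransmit is sent.
        self.snd_una_prev = snd_una

    def on_ack(self, ack: Ack) -> Optional[bool]:
        """Return True/False on the first new ACK after a retransmit, else None."""
        if seq_leq(ack.ack_no, self.oldest_unacked):
            return None  # duplicate or old ACK: not a new ACK
        self.oldest_unacked = ack.ack_no
        if self.snd_una_prev is None:
            return None  # no retransmit awaiting a decision
        # Condition (1): range check. Condition (2): RXT flag not set.
        spurious = seq_leq(ack.ack_no, self.snd_una_prev) and not ack.rxt
        self.snd_una_prev = None  # decide only once, upon the first new ACK
        return spurious
```

Consider a sender that has sent up to sequence number 5000 and times out on the segment starting at 1000. In the spurious-timeout case, a delayed original ACK for 2000 arrives without the RXT flag, so on_ack reports the retransmit as spurious. In the genuine-loss case, any original ACKs are duplicates of 1000 and are ignored. The first new ACK is then the receiver's immediate ACK for the retransmit, which carries the RXT flag and is therefore not reported as spurious. The pathological case described above is not detected by this sketch, just as it is not detected by the algorithm.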

Weighing this rare failure case against the benefit of an early decision, we believe that it is safe to recommend the following.

A TCP sender MAY use the RXT-Flag-based Eifel algorithm to detect that the last timeout(s) it has taken for the currently oldest outstanding segment was/were spurious.

A TCP sender MAY use the RXT-Flag-based Eifel algorithm to detect that the last fast retransmit it has sent for the currently oldest outstanding segment was spurious.

6.3. Alternatives for Detecting Spurious Retransmits

6.3.1. The Timestamp-based Eifel Algorithm

This is described in [Lud01] and will be moved here later. It is planned to recommend the following.

A TCP sender MAY use the Timestamp-based Eifel algorithm to detect that the last timeout(s) it has taken for the currently oldest outstanding segment was/were spurious.

A TCP sender MAY use the Timestamp-based Eifel algorithm to detect that the last fast retransmit it has sent for the currently oldest outstanding segment was spurious.

6.3.2. Using the SACK/DSACK Option

To be completed later.

6.4. Evaluating the Alternatives (Pros & Cons)

To be completed later.

6.4.1. Reliability

6.4.2. Responsiveness

- SACK/DSACK comes too late to prevent go-back-N retransmits.
- DSACK seems to be the best candidate to deal with packet duplication (suppress fast retransmit).

6.4.3. Protocol Overhead

(Including interaction with header compression code: timestamps disable [RFC1144], the RXT flag does not.)

6.4.4. Robustness Against ACK Loss

If the single DSACK is lost, then DSACK-based schemes fail => not so robust. This is different with the RXT scheme, since here it is the flight of ACKs without the RXT flag set that matters => some of those ACKs may get lost and the Eifel algorithm will still work.

6.4.5. Robustness Against Lying Receivers

7. Security Considerations

The RXT scheme does not alter TCP's congestion control behavior [RFC2581], and neither the TCP sender nor the TCP receiver would obviously benefit from lying about the RXT flag. Hence, there seem to be no security concerns.

Acknowledgments

Many thanks to Keith Sklower for helping to develop the tools that allowed the study of spurious timeouts. Many thanks to Randy Katz, Michael Meyer, Stephan Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, and Andrei Gurtov for discussions around the Eifel algorithm, which includes the RXT scheme.

References

[RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, RFC 2581, April 1999.

[EA01] E. Blanton, M. Allman, Adjusting the Duplicate ACK Threshold to Avoid Spurious Retransmits, work in progress, July 2001.

[RFC1122] R. Braden, Requirements for Internet Hosts - Communication Layers, RFC 1122, October 1989.

[RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997.

[RFC1144] V. Jacobson, Compressing TCP/IP Headers for Low-Speed Serial Links, RFC 1144, February 1990.

[RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High Performance, RFC 1323, May 1992.

[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective Acknowledgement Options, RFC 2018, October 1996.

[RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, An Extension to the Selective Acknowledgement (SACK) Option for TCP, RFC 2883, July 2000.

[GU01] A. Gurtov, Effect of Delays on TCP Performance, In Proceedings of IFIP Personal Wireless Communications 2001.

[KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates in Reliable Transport Protocols, In Proceedings of ACM SIGCOMM 87.

[LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions, ACM Computer Communication Review, Vol. 30, No.
      1, January 2000, available at
      http://www.acm.org/sigcomm/ccr/archive/2000/jan00/ccr-200001-ludwig.html
      (easier to study when viewed or printed in color).

[LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM Computer Communication Review, Vol. 30, No. 3, July 2000.

[Lud01] R. Ludwig, TCP's Response after Detecting a Spurious Timeout (was: "The Eifel Algorithm for TCP"), work in progress, July 2001.

[RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, RFC 2988, November 2000.

[RFC791] J. Postel, Internet Protocol, RFC 791, September 1981.

[RFC793] J. Postel, Transmission Control Protocol, RFC 793, September 1981.

[RFC2481] K. Ramakrishnan, S. Floyd, A Proposal to add Explicit Congestion Notification (ECN) to IP, RFC 2481, January 1999.

[WES01] D. Wetherall, D. Ely, N. Spring, Robust ECN Signaling with Nonces, work in progress, July 2001.

[WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 (The Implementation), Addison Wesley, January 1995.

Author's Address

Reiner Ludwig
Ericsson Research (EED)
Ericsson Allee 1
52134 Herzogenrath, Germany

Phone: +49 2407 575 719
Fax: +49 2407 575 400
Reiner.Ludwig@Ericsson.com

This Internet-Draft expires in January 2002.