Network Working Group                                      Reiner Ludwig
INTERNET-DRAFT                                         Ericsson Research
Expires: January 2002                                         July, 2001


                       TCP Retransmit (RXT) Flag
                <draft-ludwig-tsvwg-tcp-rxt-flag-01.txt>


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Abstract

   This document proposes a solution to TCP's retransmission ambiguity
   problem. It is based on using a single bit, named the Retransmit
   (RXT) flag, taken from the Reserved field of the TCP header. The TCP
   sender sets the RXT flag in segments containing retransmitted data.
   In response to such a segment, the TCP receiver sends an immediate
   pure ACK with the RXT flag set. By inspecting the RXT flag of the
   first new ACK that arrives after a retransmit, the TCP sender can
   detect whether the retransmit was spurious.


Ludwig                                                          [Page 1]

INTERNET-DRAFT         TCP Retransmit (RXT) Flag             July, 2001


Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

   We use the term 'pure ACK' to refer to an ACK that is not
   piggybacked onto a data segment. We use the term 'immediate ACK' to
   refer to an ACK that the TCP receiver sends immediately in response
   to the arrival of a data segment, that is, without waiting for the
   delayed-ACK timer [RFC1122] to expire. We use the term 'original
   ACK' to refer to an ACK that corresponds to the first-time
   transmission of a data segment. We use the term 'new ACK' to refer
   to an ACK that acknowledges outstanding data. These terms are not
   mutually exclusive. Thus, an ACK could, e.g., be a new, original,
   immediate, and pure ACK all at the same time, or it could be any one
   of these but none of the others.

   Furthermore, we use the term 'duplicate ACK' as defined in [WS95].

   We borrow the definition of 'snd_una' from [WS95], which defines
   snd_una as the lowest previously unsent sequence number.


1. Introduction

   The retransmission ambiguity problem [KP87] is the TCP sender's
   inability to distinguish an ACK that was triggered by the first-time
   transmission of a data segment from the ACK that was triggered by the
   retransmit of that segment. TCP's retransmission ambiguity problem
   inevitably fools a TCP sender into unnecessarily retransmitting data
   (go-back-N style) after it has taken a spurious timeout [LK00]. For
   many paths through the Internet this does not create a serious
   problem as long as the TCP sender implements the retransmission timer
   specified in [RFC2988]. This is because that retransmission timer is
   extremely conservative [LS00].

   However, across paths that include links which often provide only
   intermittent connectivity, spurious timeouts occur more frequently
   [GU01]. For example, wireless access links may often be subject to
   handovers and resource preemption, or the mobile transmitter may
   pass through a radio coverage hole. Such disruptive events may
   easily trigger a spurious timeout despite a conservative
   retransmission timer. Here, the unnecessary go-back-N retransmits do
   create a serious problem since they decrease end-to-end throughput,
   impose useless load on a potentially congested network, and waste
   transmission (battery) power.

   Independent of path characteristics, the unnecessary retransmits
   after a spurious timeout create a strong incentive to stick with a
   conservative retransmission timer. That is, they discourage
   developing and deploying more aggressive retransmission timers,
   which would increase the performance of interactive
   request/response-style applications in particular. Typically, such
   applications cannot benefit from TCP's fast retransmit algorithm
   since they do not put sufficient data in flight to trigger the
   required number of duplicate ACKs. Hence, TCP has to rely on its
   timeout-based error recovery.

   This motivates the need for finding an efficient solution to resolve
   the retransmission ambiguity in TCP. We propose a marking scheme,
   named the RXT scheme, based on the RXT flag as such a solution. Based
   on the RXT scheme, we define the Eifel algorithm as one way of
   detecting spurious retransmits, and compare it with other known
   alternatives.

       [Note that the original proposal of the Eifel algorithm [LK00]
       included the TCP sender's response to a detected spurious
       retransmit. We have dropped that response part and leave that to
       future documents (see, e.g., [EA01] and [Lud01]). Thus, the
       Eifel algorithm as defined here is a pure detection scheme for
       spurious retransmits.]

   The unnecessary go-back-N retransmits after a spurious timeout
   mentioned above can be avoided with neither the SACK [RFC2018] nor
   the D-SACK [RFC2883] option. This is because the SACK/D-SACK
   information would arrive too late at a TCP sender that has taken a
   spurious timeout. More precisely, the ACKs carrying the SACK/D-SACK
   option would only arrive after the original ACKs have already
   clocked out the unnecessary retransmits. The exception would be the
   unlikely event that all of those original ACKs got lost.

   It has been shown how the timestamp option [RFC1323] could be used to
   solve this problem [LK00]. However, the price for that solution is
   the extra overhead added by the timestamp option field: 12 bytes for
   every data segment and for every ACK. In addition, the presence of
   timestamps in a TCP flow effectively disables widely deployed TCP/IP
   header compression code [RFC1144].

   This document proposes an alternative solution. It is based on using
   a single bit, named the Retransmit (RXT) flag, taken from the
   Reserved field of the TCP header.


2. Definition of the RXT Flag

   We define bit 6 in the Reserved field of the TCP header as the RXT
   flag. The location of the 6-bit Reserved field in the TCP header is
   shown in Figure 3 of [RFC793]. Bits 8 and 9 of the Reserved field
   have been assigned to Explicit Congestion Notification (ECN)
   [RFC2481], while bit 7 is under discussion to be assigned to the
   nonce scheme proposed in [WES01].
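
   As a minimal illustration, the following Python sketch reads and
   writes the RXT bit in a raw TCP header. It assumes that the bit
   numbers above follow Figure 3 of [RFC793], i.e., that they count
   from the most significant bit of the 16-bit word at octets 12-13
   of the header, which puts bit 6 at mask 0x0200. The names RXT_MASK,
   rxt_is_set, and with_rxt are hypothetical, and the sketch does not
   recompute the TCP checksum.

```python
import struct

# Assumption: bit 6 is counted from the most significant bit of the
# 16-bit word at octets 12-13 of the TCP header (RFC 793, Figure 3).
RXT_MASK = 1 << (15 - 6)  # 0x0200


def rxt_is_set(tcp_header: bytes) -> bool:
    """Return True if the RXT bit is set in a raw TCP header."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return bool(word & RXT_MASK)


def with_rxt(tcp_header: bytes, on: bool) -> bytes:
    """Return a copy of the header with the RXT bit set or cleared.

    The TCP checksum is NOT recomputed here.
    """
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    word = (word | RXT_MASK) if on else (word & ~RXT_MASK & 0xFFFF)
    out = bytearray(tcp_header)
    struct.pack_into("!H", out, 12, word)
    return bytes(out)


# A 20-byte header with Data Offset = 5 and only the ACK control bit set.
hdr = bytearray(20)
struct.pack_into("!H", hdr, 12, (5 << 12) | 0x0010)
marked = with_rxt(bytes(hdr), True)
```

   Under the same numbering assumption, the ECN bits 8 and 9 fall on
   masks 0x0080 and 0x0040, so the RXT bit does not collide with them.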


3. Initial Handshake

   When a TCP sends a SYN segment, it MAY set the RXT flag. For a SYN
   segment, the setting of the RXT flag is defined as an indication that
   the TCP sending the SYN segment wishes to participate in the RXT
   scheme as both a sender and a receiver.

   When a TCP receives a SYN segment with the RXT flag set, it MAY
   set the RXT flag when it sends the SYN-ACK segment. For a SYN-ACK
   segment, the setting of the RXT flag is defined as an indication that
   the TCP sending the SYN-ACK segment agrees to participate in the RXT
   scheme as both a sender and a receiver.

   Note that setting the RXT flag in either the SYN or the SYN-ACK
   segment is not an indication that the segment is a retransmit.
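
   The handshake rules above can be sketched as two small predicates.
   This is only an illustration; the function and parameter names are
   hypothetical. Treating a SYN-ACK that carries the RXT flag in reply
   to a SYN that did not as "no agreement" is our assumption, since the
   text above does not cover that case.

```python
def synack_rxt(syn_had_rxt: bool, willing: bool) -> bool:
    """RXT flag for the SYN-ACK: set only in reply to a SYN that had it
    set, and only if the local TCP is willing to participate."""
    return syn_had_rxt and willing


def rxt_agreed(syn_had_rxt: bool, synack_had_rxt: bool) -> bool:
    """Both TCPs participate only if SYN and SYN-ACK both carried RXT."""
    return syn_had_rxt and synack_had_rxt


# The passive opener agrees, so the RXT scheme is enabled.
enabled = rxt_agreed(True, synack_rxt(True, willing=True))
```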


4. TCP Sender

   If both TCPs have agreed to participate in the RXT scheme, the TCP
   sender SHOULD set the RXT flag in segments containing retransmitted
   data.

   In all other cases, it SHOULD clear the RXT flag in data segments it
   sends.
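
   A sender-side sketch of this rule follows. How an implementation
   recognizes a retransmit is not specified here; the sketch assumes
   that a segment carries retransmitted data if it starts below
   snd_max, a hypothetical variable holding the highest sequence number
   sent so far plus one. Sequence number wraparound is ignored.

```python
def data_segment_rxt(agreed: bool, seg_seq: int, snd_max: int) -> bool:
    """Value of the RXT flag for an outgoing data segment.

    agreed   -- both TCPs agreed to the RXT scheme in the handshake
    seg_seq  -- sequence number of the first byte in the segment
    snd_max  -- assumed: highest sequence number sent so far, plus one
    """
    is_retransmit = seg_seq < snd_max
    return agreed and is_retransmit
```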


5. TCP Receiver

   If both TCPs have agreed to participate in the RXT scheme, the TCP
   receiver SHOULD send an immediate pure ACK with the RXT flag set in
   response to a data segment that arrived with the RXT flag set.

   In all other situations where a pure ACK is sent, the TCP receiver
   SHOULD clear the RXT flag.
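
   The receiver rule can be sketched in the same way. The names
   AckPolicy and ack_policy are hypothetical. A value of False for
   'immediate' only means that the RXT scheme itself does not force an
   immediate ACK; the usual ACK rules still apply.

```python
from typing import NamedTuple


class AckPolicy(NamedTuple):
    immediate: bool  # True: bypass the delayed-ACK timer
    rxt: bool        # value of the RXT flag in the pure ACK


def ack_policy(agreed: bool, segment_had_rxt: bool) -> AckPolicy:
    """Decide how to acknowledge an arriving data segment."""
    if agreed and segment_had_rxt:
        return AckPolicy(immediate=True, rxt=True)
    # Otherwise the normal ACK rules (e.g., delayed ACKs) apply, and
    # any pure ACK that is sent carries a cleared RXT flag.
    return AckPolicy(immediate=False, rxt=False)
```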


6. Using the RXT Flag to Detect Spurious Retransmits

   In this section, we propose the Eifel algorithm as one way of
   detecting spurious retransmits, and compare it with other known
   alternatives. The Eifel algorithm is a solution to resolve the
   retransmission ambiguity in TCP. It thereby offers the TCP sender a
   fast way to detect spurious retransmits. More precisely, the Eifel
   algorithm decides whether a retransmit was spurious as soon as the
   first ACK that acknowledges that retransmit arrives. Being able to
   decide upon the first new ACK is crucial to avoid the unnecessary
   go-back-N retransmits that typically occur after a spurious timeout.


   To resolve the retransmission ambiguity, the Eifel algorithm relies
   on a segment marking scheme. The RXT scheme proposed in Sections 3-5
   is such a scheme. An alternative scheme based on timestamps is
   outlined in Section 6.3.1.

   Note that the original proposal of the Eifel algorithm [LK00]
   included the TCP sender's response to a detected spurious retransmit.
   We have dropped that response part and leave that to future documents
   (see, e.g., [EA01] and [Lud01]). Thus, the Eifel algorithm as defined
   here is a pure detection scheme for spurious retransmits.

6.1. Events Causing Spurious Retransmits

   The following events are reasons for falsely triggering TCP's error
   recovery (causing a so-called spurious retransmit) and congestion
   control algorithms:

     - spurious timeouts

     - packet reordering

     - packet duplication

   Generally speaking, a spurious timeout is a timeout that would not
   have occurred had the sender "waited longer". This can have a number
   of reasons. The typical reason is that a data segment itself or the
   first new original ACK got excessively delayed in the network.
   Another reason could be the loss of a series of original ACKs from
   the entire flight of ACKs. This may cause an aggressive
   retransmission timer to expire prematurely. However, this is an
   unlikely event as long as a conservative retransmission timer such as
   [RFC2988] is used. Yet another reason would be a situation where the
   third duplicate ACK for a segment arrives after the TCP sender has
   already retransmitted that segment due to a timeout. In some TCP
   implementations, the arrival of that third duplicate ACK may then
   trigger a spurious fast retransmit. This last reason for a spurious
   timeout will not be further addressed in this document (it may be
   addressed in [Lud01]).

   Packet reordering can occur due to the connectionless nature of IP
   [RFC791], which does not guarantee in-order delivery of packets.
   This results in a spurious fast retransmit if three or more data
   segments arrive out-of-order at the TCP receiver (assuming that at
   least three duplicate ACKs arrive back at the TCP sender). The reason
   is that a TCP receiver generates a duplicate ACK for each segment
   that arrives out-of-order, and three consecutive duplicate ACKs
   trigger the TCP sender's fast retransmit algorithm. This assumes that
   the TCP sender uses the recommended value of three for the duplicate
   ACK threshold [RFC2581].
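
   The effect of reordering can be illustrated with a toy model. The
   sketch below is not taken from any TCP implementation. It assumes
   that segments are numbered 1, 2, 3, ..., that the receiver sends one
   cumulative ACK per arriving segment (no delayed ACKs), and that no
   ACKs are lost. All names are hypothetical.

```python
DUPACK_THRESHOLD = 3  # recommended value per [RFC2581]


def acks_for_arrivals(arrivals, first_expected=1):
    """Cumulative ACKs emitted for segments arriving in the given order.

    Each ACK carries the number of the next segment the receiver expects.
    """
    received = set()
    expected = first_expected
    acks = []
    for seg in arrivals:
        received.add(seg)
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks


def fast_retransmit_triggered(acks, last_ack):
    """True once DUPACK_THRESHOLD consecutive duplicate ACKs are seen.

    last_ack is the highest cumulative ACK the sender has seen so far.
    """
    dupacks = 0
    for ack in acks:
        if ack == last_ack:
            dupacks += 1
            if dupacks >= DUPACK_THRESHOLD:
                return True
        elif ack > last_ack:
            last_ack, dupacks = ack, 0
    return False


# Segment 1 is overtaken by segments 2, 3, and 4 but is not lost.
reordered = acks_for_arrivals([2, 3, 4, 1])
spurious = fast_retransmit_triggered(reordered, last_ack=1)
```

   In this model, a segment that is overtaken by only one other segment
   produces a single duplicate ACK and does not trigger a fast
   retransmit.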


   Likewise, packet duplication in the network may also result in a
   spurious fast retransmit. This happens if duplication of data
   segments or ACKs causes three or more duplicate ACKs to arrive at
   the TCP sender.


6.2. The RXT-Flag-based Eifel Algorithm

   When a retransmit is sent, the TCP sender stores in 'snd_una_prev'
   the current value of snd_una.

   By inspecting the first ACK acknowledging a retransmit, a TCP sender
   determines as follows whether the retransmit was spurious: if (1) the
   ACK's acknowledgment number is less than or equal to snd_una_prev,
   and (2) the ACK's RXT flag is not set, then the last retransmit was
   spurious.

   The range check in the first condition ensures that only those ACKs
   are considered that correspond to segments that were outstanding at
   the time the retransmit was sent. If such an ACK does not have the
   RXT flag set, then this ACK is an original ACK which could not have
   been sent in response to the retransmit. Thus, with a high degree of
   certainty (see the following paragraph) the first-time transmission
   of the retransmitted segment did arrive at the TCP receiver. In this
   case, the original ACK must have been either triggered by that first-
   time transmission or another first-time transmission that followed.
   The latter could happen because of the delayed-ACK scheme or loss of
   earlier original ACKs.
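
   The detection rule can be sketched as follows. This is a minimal
   illustration, not a complete implementation: the class and method
   names are hypothetical, snd_una is used as defined in the
   Terminology section, sequence number wraparound is ignored, and the
   caller is assumed to pass in only the first ACK that acknowledges
   the retransmit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EifelDetector:
    # None means that no retransmit is awaiting evaluation.
    snd_una_prev: Optional[int] = None

    def on_retransmit(self, snd_una: int) -> None:
        """Remember snd_una at the time the retransmit is sent."""
        self.snd_una_prev = snd_una

    def on_first_ack(self, ack_no: int, rxt_flag: bool) -> Optional[bool]:
        """Evaluate the first ACK acknowledging a retransmit.

        Returns True if the last retransmit was spurious, False if not,
        and None if no retransmit is awaiting evaluation.
        """
        if self.snd_una_prev is None:
            return None
        # Condition (1): range check. Condition (2): RXT flag not set.
        spurious = ack_no <= self.snd_una_prev and not rxt_flag
        self.snd_una_prev = None
        return spurious


det = EifelDetector()
det.on_retransmit(snd_una=5000)
# An original ACK (RXT flag not set) arrives first.
verdict = det.on_first_ack(ack_no=2000, rxt_flag=False)
```

   Had the first ACK carried the RXT flag, the same call would have
   returned False, since that ACK was sent in response to the
   retransmit.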

   It is possible to construct a pathological case where this algorithm
   fails, i.e., it concludes that a retransmit was spurious when in fact
   it was not. This could happen after a genuine loss of a data segment
   if

     - the corresponding retransmit arrived at the TCP receiver in
       place of the first-time transmission, i.e., jumping ahead of all
       data segments that were sent between the first-time transmission
       and the retransmit, and if

     - the ACK for the retransmit got lost, i.e., did not arrive at the
       TCP sender.

   We believe that this case is unlikely enough to be neglected,
   especially since it does not seem conceivable how a malicious TCP
   receiver could exploit this situation to its benefit. Furthermore, it
   seems difficult to devise an alternative detection algorithm that is
   able to decide already upon the first ACK acknowledging a retransmit,
   but that does not fail in this pathological case. However, as
   mentioned before, being able to decide upon the first new ACK is
   crucial after a spurious timeout.

   Therefore, we believe that it is safe to recommend the following.


   A TCP sender MAY use the RXT-Flag-based Eifel algorithm to detect
   that the last timeout(s) it has taken for the currently oldest
   outstanding segment was/were spurious.

   A TCP sender MAY use the RXT-Flag-based Eifel algorithm to detect
   that the last fast retransmit it has sent for the currently oldest
   outstanding segment was spurious.


6.3. Alternatives for Detecting Spurious Retransmits

6.3.1 The Timestamp-based Eifel Algorithm

   This is described in [Lud01] and will be moved here later. It is
   planned to recommend the following.

   A TCP sender MAY use the Timestamp-based Eifel algorithm to detect
   that the last timeout(s) it has taken for the currently oldest
   outstanding segment was/were spurious.

   A TCP sender MAY use the Timestamp-based Eifel algorithm to detect
   that the last fast retransmit it has sent for the currently oldest
   outstanding segment was spurious.


6.3.2 Using the SACK/DSACK Option

   To be completed later.

6.4. Evaluating the Alternatives (Pros & Cons)

   To be completed later.

6.4.1 Reliability

6.4.2 Responsiveness

   - SACK/DSACK comes too late to prevent go-back-N retransmits

   - DSACK seems to be the best candidate to deal with packet
     duplication (suppress fast retransmit)

6.4.3 Protocol Overhead

   (including interaction with header compression code: timestamps
   disable RFC1144, the RXT flag does not)

6.4.4 Robustness Against ACK Loss


   If the single DSACK is lost, then DSACK-based schemes fail; they are
   therefore not very robust against ACK loss.

   This is different with the RXT scheme, since here it is the flight
   of ACKs without the RXT flag set that matters. Some of those ACKs may
   get lost and the Eifel algorithm will still work.

6.4.5 Robustness Against Lying Receivers


7. Security Considerations

   The RXT scheme does not alter TCP's congestion control behavior
   [RFC2581], and there is no obvious benefit for either the TCP sender
   or the TCP receiver to lie about the RXT flag. Hence, there seem to
   be no security concerns.


Acknowledgments

   Many thanks to Keith Sklower for helping to develop the tools that
   allowed the study of spurious timeouts. Many thanks to Randy Katz,
   Michael Meyer, Stephan Baucke, Sally Floyd, Vern Paxson, Mark Allman,
   Ethan Blanton, and Andrei Gurtov for discussions around the Eifel
   algorithm which includes the RXT scheme.

References

   [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
             RFC 2581, April 1999.

   [EA01]    E. Blanton, M. Allman, Adjusting the Duplicate ACK
             Threshold to Avoid Spurious Retransmits, work in progress,
             July 2001.

   [RFC1122] R. Braden, Requirements for Internet Hosts - Communication
             Layers, RFC 1122, October 1989.

   [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
             Requirement Levels, RFC 2119, March 1997.

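   [RFC1144] V. Jacobson, Compressing TCP/IP Headers for Low-Speed
             Serial Links, RFC 1144, February 1990.

   [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
             Performance, RFC 1323, May 1992.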

   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective
             Acknowledgement Options, RFC 2018, October 1996.

   [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
             An Extension to the Selective Acknowledgement (SACK) Option
             for TCP, RFC 2883, July 2000.


   [GU01]    A. Gurtov, Effect of Delays on TCP Performance, In
             Proceedings of IFIP Personal Wireless Conference '2001.

   [KP87]    P. Karn, C. Partridge, Improving Round-Trip Time Estimates
             in Reliable Transport Protocols, In Proceedings of ACM
             SIGCOMM 87.

   [LK00]    R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
             Robust Against Spurious Retransmissions, ACM Computer
             Communication Review, Vol. 30, No. 1, January 2000,
             available at http://www.acm.org/sigcomm/ccr/archive/2000/
             jan00/ccr-200001-ludwig.html (easier studied when
             viewed/printed in color).

   [LS00]    R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM
             Computer Communication Review, Vol. 30, No. 3, July 2000.

   [Lud01]   R. Ludwig, TCP's Response after Detecting a Spurious
             Timeout (was: "The Eifel Algorithm for TCP"), work in
             progress, July 2001.

   [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
             RFC 2988, November 2000.

   [RFC791]  J. Postel, Internet Protocol, RFC 791, September 1981.

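   [RFC793]  J. Postel, Transmission Control Protocol, RFC 793,
             September 1981.

   [RFC2481] K. Ramakrishnan, S. Floyd, A Proposal to add Explicit
             Congestion Notification (ECN) to IP, RFC 2481, January
             1999.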

   [WES01]   D. Wetherall, D. Ely, N. Spring, Robust ECN Signaling with
             Nonces, work in progress, July 2001.

   [WS95]    G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
             (The Implementation), Addison Wesley, January 1995.


Author's Address

     Reiner Ludwig
     Ericsson Research (EED)
     Ericsson Allee 1
     52134 Herzogenrath, Germany
     Phone: +49 2407 575 719
     Fax:   +49 2407 575 400
     Reiner.Ludwig@Ericsson.com


This Internet-Draft expires in January 2002.

