<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
    <!ENTITY rfc2119 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
    <!ENTITY rfc2018 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2018.xml'>
    <!ENTITY rfc3782 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3782.xml'>
    <!ENTITY rfc2581 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2581.xml'>
    <!ENTITY rfc3517 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3517.xml'>
]>

<rfc category="std" ipr="trust200902" docName="draft-nishida-tcpm-rescue-retransmission-00">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>

    <front>
        <title abbrev="Rescue Retransmission for SACK Recovery">
        Rescue Retransmission for SACK-based Loss Recovery Algorithm
	</title>

        <author initials='Y.N' surname="Nishida" fullname='Yoshifumi Nishida'>
	  <organization> WIDE Project
            </organization>
	  <address><postal>
<street>Endo 5322</street>
<city>Fujisawa</city> <region>Kanagawa </region><code>252-8520</code>
<country>Japan</country>
	    </postal>
	  <email>nishida@wide.ad.jp</email>
	  </address>
        </author>
        <date/>
        <abstract><t>
	    This memo describes an issue in the SACK-based loss recovery algorithm
        specified in RFC3517 and proposes a simple modification that avoids
        unnecessary retransmission timeouts and thereby improves performance.
         </t></abstract>
    </front>

    <middle>
	<section title="Introduction">
      <t>
      RFC3517 <xref target="RFC3517"/> defines a conservative loss recovery algorithm
      based on the use of the selective acknowledgment (SACK) TCP option
      <xref target="RFC2018"/>. It is designed to follow the guidelines set in
      RFC2581 <xref target="RFC2581"/> so that it can be used safely in TCP implementations.

      However, in some situations, the loss recovery algorithm in RFC3517 fails
      to retransmit segments even though the connection still has available pipe size.
      This failure to retransmit can cause unnecessary timeouts, which can lead
      to performance degradation.

	  This document describes the issue and proposes a simple modification
      to solve this problem. The proposed solution allows SACK-based TCP to attain
      the same performance as NewReno <xref target="RFC3782"/>.

      </t>
	</section>

	<section title="Conventions and Terminology">
      <t>
	  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
	  document are to be interpreted as described in <xref target="RFC2119"/>.
	  </t>

	</section>

	<section title="Problem Description">
     <t>
     In RFC3517, when a sender receives DupThresh duplicate ACKs,
     it enters the loss recovery phase. In the loss recovery phase, whenever the sender receives
     an ACK, it recalculates the pipe size by calling Update() and SetPipe(),
     and determines which segment should be sent by calling NextSeg().
     However, there are some situations where NextSeg() returns no segment although the
     available pipe size is not zero. This behavior results from the following logic in
     NextSeg(). When NextSeg() tries to find segments to be retransmitted, it uses
     IsLost(), which reports whether a given sequence number is most likely lost. To increase
     accuracy, IsLost() determines that the octet with sequence number 'SeqNum' is lost when either
     DupThresh discontiguous SACKed sequences have arrived above 'SeqNum' or
     (DupThresh * SMSS) bytes with sequence numbers greater than 'SeqNum' have been SACKed.
     If IsLost() identifies no lost segment, NextSeg() uses new (previously unsent) segments
     for the next transmission.
     </t>
     <t>
     In this logic, a problem can arise when a sender does not have new segments to send.
     In this case, if IsLost() identifies no lost segment, NextSeg() cannot find a segment for the
     next transmission, and transmission is delayed until one of the following
     events happens:
         <list style='symbols'>
         <t> ACKs arrive and IsLost() identifies newly lost segments </t>
         <t> The application feeds data to TCP </t>
         <t> The retransmission timer expires </t>
         </list>
     However, in some situations, such as when the window size is small,
     the number of arriving ACKs might not be enough to identify lost segments.
     In addition, applications might feed data intermittently or might have no more
     data to feed. In this case, TCP needs a timer expiration to retransmit segments
     even though there is enough available pipe size to send a segment.
     </t>
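     <t>
     As an illustration only, the IsLost() test described above can be sketched in Python
     as follows. The sketch is not taken from RFC3517: the function name is_lost, the
     representation of the SACK scoreboard as a list of disjoint half-open (start, end)
     octet ranges, and the value SMSS = 1000 are assumptions made for this example.
     </t>
     <figure><artwork><![CDATA[
```python
SMSS = 1000        # assumed sender maximum segment size (octets)
DUP_THRESH = 3     # DupThresh

def is_lost(seq, sacked):
    """Return True if octet 'seq' is considered lost.

    'sacked' is an assumed scoreboard format: a list of disjoint,
    non-adjacent half-open (start, end) ranges of SACKed octets.
    """
    # SACKed ranges (clipped) that hold octets above 'seq'
    above = [(max(s, seq + 1), e) for s, e in sacked if e > seq + 1]
    nbytes = sum(e - s for s, e in above)
    return len(above) >= DUP_THRESH or nbytes >= DUP_THRESH * SMSS

# Two SACKed segments above octet 1000 are not enough ...
print(is_lost(1000, [(2000, 4000)]))
# ... but three SACKed segments (DupThresh * SMSS octets) are.
print(is_lost(1000, [(2000, 5000)]))
# Nothing has been SACKed above octet 5000, so it is not "lost".
print(is_lost(5000, [(2000, 5000)]))
```
]]></artwork></figure>
     <t>
     In this sketch, an octet that has no SACKed data above it is never reported as lost,
     which is the case that leaves NextSeg() without a retransmission candidate.
     </t>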

     </section> 

<section title="Possible Scenario">
 <t>
   This section describes a possible scenario in which the issue described in this document occurs.
 </t>
   <t>
    The following is a hypothetical tcpdump log.
    </t>
          <figure>
	    <preamble></preamble>
	    <artwork><![CDATA[
    1  10:41:00.000001 A > B: . 1000:2000(1000) ack 1 win 32768
    2  10:41:00.001001 A > B: . 2000:3000(1000) ack 1 win 32768
    3  10:41:00.002001 A > B: . 3000:4000(1000) ack 1 win 32768
    4  10:41:00.003001 A > B: . 4000:5000(1000) ack 1 win 32768
    5  10:41:00.004001 A > B: . 5000:6000(1000) ack 1 win 32768
    6  10:41:00.010001 B > A: . ack 1000 win 16384 < sack {2000:3000} >
    7  10:41:00.011001 B > A: . ack 1000 win 16384 < sack {2000:4000} >
    8  10:41:00.012001 B > A: . ack 1000 win 16384 < sack {2000:5000} >
    9  10:41:00.015001 A > B: . 1000:2000(1000) ack 1 win 32768
   10  10:41:00.018001 B > A: . ack 5000 win 16384

	    ]]></artwork>
	    <postamble></postamble>
	  </figure>
   <t>
    In this example, A sends data segments to B. At the beginning of the log,
    the cwnd of A is 5 SMSS (SMSS = 1000 octets), hence A sends 5 segments to B (lines 1-5).
    Here, if the segments sent in line 1 (segment 1000:2000) and
    line 5 (segment 5000:6000) are lost, B sends 3 duplicate ACKs
    (lines 6-8) to request retransmission of segment 1000:2000.
    At line 8, A has received DupThresh duplicate ACKs and retransmits the lost segment (line 9).
    At this time, A enters the loss recovery phase and sets cwnd to 2.5 SMSS (half of the FlightSize of 5 SMSS).
    At line 10, A receives the ACK triggered by the arrival of segment 1000:2000.
    Upon reception of the ACK at line 10, A performs the following steps to determine
    whether there are segments that can be sent.
      <list style='numbers'>
      <t> Update the pipe size by calling Update() and SetPipe().
          Since HighACK is 5000, HighData is 6000 and IsLost(5000) returns false,
          the value of pipe is set to 1000.
          </t>
      <t> Because cwnd - pipe &gt;= 1 SMSS, it decides to send one or more segments. </t>
      <t> Call NextSeg() to determine which segments to send. </t>
      </list>
   </t>
   <t>
   Now, if A has no unsent data, the only segment that can be sent is segment 5000:6000.
   NextSeg() checks whether this segment can be sent by applying the following rules; however,
   none of them applies.
      <list style='numbers'>
      <t> Rule (1) cannot be applied to this segment because (1.b) and (1.c) return false. </t>
      <t> Rule (2) cannot be applied since there is no available unsent data. </t>
      <t> Rule (3) cannot be applied to this segment because (1.b) returns false. </t>
      </list>
   Hence NextSeg() returns no segment in this case,
   which means TCP has no segment to send until a timeout occurs.
   When there are multiple packet losses in a window and TCP has no new data to send
   at the moment, TCP can fall into this situation.
   </t>
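   <t>
   The steps above can be reproduced with the following Python sketch of SetPipe() and
   NextSeg(), evaluated on the state of A after the ACK at line 10. It is a simplified
   illustration rather than a definitive implementation. Its assumptions are: the function
   names, a SACK scoreboard kept as half-open (start, end) octet ranges, HighACK, HighData
   and HighRxt treated as exclusive upper bounds that match the tcpdump numbering
   (so HighRxt is 2000 after line 9), per-octet loops, and a cwnd of 2.5 SMSS.
   </t>
   <figure><artwork><![CDATA[
```python
SMSS, DUP_THRESH = 1000, 3   # values used in the scenario

def is_sacked(seq, sacked):
    return any(s <= seq < e for s, e in sacked)

def is_lost(seq, sacked):
    # SACKed ranges (clipped) that hold octets above 'seq'
    above = [(max(s, seq + 1), e) for s, e in sacked if e > seq + 1]
    nbytes = sum(e - s for s, e in above)
    return len(above) >= DUP_THRESH or nbytes >= DUP_THRESH * SMSS

def set_pipe(high_ack, high_data, high_rxt, sacked):
    pipe = 0
    for seq in range(high_ack, high_data):
        if is_sacked(seq, sacked):
            continue
        if not is_lost(seq, sacked):
            pipe += 1          # (a) not considered lost
        if seq < high_rxt:
            pipe += 1          # (b) retransmitted in this recovery
    return pipe

def next_seg(high_ack, high_data, high_rxt, sacked, unsent):
    top = max((e for _, e in sacked), default=high_ack)
    # unSACKed octets meeting criteria (1.a) and (1.b)
    cands = [q for q in range(high_ack, high_data)
             if not is_sacked(q, sacked) and high_rxt <= q < top]
    for q in cands:                       # rule (1): needs (1.c)
        if is_lost(q, sacked):
            return (q, min(q + SMSS, high_data))
    if unsent:                            # rule (2): new data
        return (high_data, high_data + SMSS)
    if cands:                             # rule (3): (1.c) skipped
        return (cands[0], min(cands[0] + SMSS, high_data))
    return None                           # no rule applies

# State of A after the ACK at line 10 (cwnd = 2.5 SMSS assumed)
cwnd = 2500
pipe = set_pipe(5000, 6000, 2000, [])
seg = next_seg(5000, 6000, 2000, [], unsent=False)
print(pipe, cwnd - pipe >= SMSS, seg)
```
]]></artwork></figure>
   <t>
   Under these assumptions the sketch reports a pipe of 1000 octets, confirms that
   cwnd - pipe is at least 1 SMSS, and returns no segment from next_seg(), which matches
   the stall described above.
   </t>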
</section>
<section title="Proposed Fix">
<t>
   To solve the problem described above, we propose introducing one variable, RescueRxt,
   for the TCP sender and adding the following logic to NextSeg() as the fourth rule.
</t>
<t>
	<list style="hanging" hangIndent="4">
     <t hangText="(4)" >
         If the conditions for rules (1), (2) and (3) fail, but there exists
         unSACKed data, one segment of up to SMSS octets MAY be returned
         if RescueRxt is not set. The returned segment MUST include the highest
         unSACKed sequence number.
     </t>
     <t>
         When a segment is returned by this rule, RescueRxt MUST be set to the
         sequence number of the highest octet of the segment. Also, HighRxt MUST NOT be updated.

    </t></list>

   In addition to this rule, the TCP sender MUST reset RescueRxt when it receives a cumulative ACK
   for a sequence number greater than RescueRxt.
</t>
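<t>
   A minimal Python sketch of rule (4) and of the RescueRxt reset is given below for
   illustration. The class name RescueSender, its method names, the use of None for
   "RescueRxt is not set", and the half-open (start, end) octet ranges are all assumptions
   of this example and are not defined by this document or by RFC3517. For simplicity the
   returned range is the last SMSS octets ending at the highest unSACKed octet, and rules
   (1) to (3) are assumed to have failed before rule4() is called.
</t>
<figure><artwork><![CDATA[
```python
SMSS = 1000   # assumed segment size in octets

class RescueSender:
    """Hypothetical holder for the RescueRxt variable."""

    def __init__(self):
        self.rescue_rxt = None     # None means RescueRxt is not set

    def rule4(self, high_ack, high_data, sacked):
        """Try rule (4); call only after rules (1)-(3) failed."""
        if self.rescue_rxt is not None:
            return None            # RescueRxt still set: suppress
        unsacked = [q for q in range(high_ack, high_data)
                    if not any(s <= q < e for s, e in sacked)]
        if not unsacked:
            return None            # no unSACKed data exists
        top = unsacked[-1]         # highest unSACKed sequence number
        start = max(high_ack, top + 1 - SMSS)
        self.rescue_rxt = top      # highest octet of the segment
        # HighRxt is deliberately left untouched here.
        return (start, top + 1)    # half-open, at most SMSS octets

    def on_cumulative_ack(self, ack):
        """Reset RescueRxt when the cumulative ACK exceeds it."""
        if self.rescue_rxt is not None and ack > self.rescue_rxt:
            self.rescue_rxt = None

a = RescueSender()
first = a.rule4(5000, 6000, [])    # scenario state after line 10
second = a.rule4(5000, 6000, [])   # suppressed: RescueRxt is set
a.on_cumulative_ack(6000)          # segment 5000:6000 is ACKed
print(first, second, a.rescue_rxt)
```
]]></artwork></figure>
<t>
   Applied to the scenario in the previous section, the first call returns segment
   5000:6000, a second call before any cumulative ACK returns nothing, and the
   cumulative ACK for 6000 clears RescueRxt again.
</t>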
</section>

<section title="Discussion">
<t>
The simplest approach to address this issue is to send unSACKed data
whenever the conditions for rules (1), (2) and (3) fail, as long as there is available pipe size.
A similar approach is also proposed in
<xref target="I-D.scheffenegger-tcpm-sack-loss-recovery"/>.
However, this approach can cause many unnecessary retransmissions when segments
are reordered but not lost.
</t>
<t>
The fix proposed in this document allows TCP to retransmit one segment per RTT
when all the data TCP has available is unSACKed and it is not certain whether that data is lost.
Since the objective of this algorithm is to avoid retransmission
timeouts and maintain ACK clocking, not to fill the unused pipe,
sending one segment per RTT is enough for this purpose.
By sending this one segment, the sending TCP has a good chance of receiving
additional ACKs from the receiver, which can trigger further retransmissions
in the next RTT.
The variable RescueRxt ensures that a retransmission by this algorithm happens
only once per RTT. This logic can drastically reduce the number of unnecessary
retransmissions in case of reordering.
</t>
</section>
<section title="Acknowledgements">
<t>
The authors gratefully acknowledge Richard Scheffenegger, who originally identified
the issue described in this document and gave insightful comments.
The authors would also like to thank Mark Allman and Ethan Blanton for their
careful review of the initial idea of the logic and their valuable feedback.
</t>
</section>

<section title="Security Considerations">
  <t>
  This document only proposes a simple modification to the algorithm in RFC3517. There are
   no known additional security concerns for this algorithm.
  </t>
        </section>
        <section title="IANA Considerations">
	  <t>
   This document does not create any new registries or modify the rules
   for any existing registries managed by IANA.
	  </t>
	  </section>
    </middle>

    <back>
        <references title='Normative References'> 
          &rfc2119; &rfc2018; &rfc3782; &rfc2581; &rfc3517;
        </references>
        <references title='Informative References'> 
		<?rfc include="reference.I-D.draft-scheffenegger-tcpm-sack-loss-recovery-00.xml" ?>
        </references>
    </back>

</rfc>
