idnits 2.17.1 

draft-ietf-tcpm-sack-recovery-entry-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (8 March 2010) is 5162 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 3517 (Obsoleted by RFC 6675)

  -- Obsolete informational reference (is this intentional?): RFC  896
     (Obsoleted by RFC 7805)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)


     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                              I. Jarvinen
2	INTERNET-DRAFT                                                   M. Kojo
3	draft-ietf-tcpm-sack-recovery-entry-01.txt        University of Helsinki
4	Intended status: Standards Track                            8 March 2010
5	Expires: September 2010

7	  Using TCP Selective Acknowledgement (SACK) Information to Determine
8	        Duplicate Acknowledgements for Loss Recovery Initiation

10	Status of this Memo

12	    This Internet-Draft is submitted to IETF in full conformance with
13	    the provisions of BCP 78 and BCP 79.

15	    Internet-Drafts are working documents of the Internet Engineering
16	    Task Force (IETF), its areas, and its working groups.  Note that
17	    other groups may also distribute working documents as Internet-
18	    Drafts.

20	    Internet-Drafts are draft documents valid for a maximum of six
21	    months and may be updated, replaced, or obsoleted by other documents
22	    at any time.  It is inappropriate to use Internet-Drafts as
23	    reference material or to cite them other than as "work in progress."

25	    The list of current Internet-Drafts can be accessed at
26	    http://www.ietf.org/ietf/1id-abstracts.txt.

28	    The list of Internet-Draft Shadow Directories can be accessed at
29	    http://www.ietf.org/shadow.html.

31	    This Internet-Draft will expire in September 2010.

33	Copyright Notice

35	    Copyright (c) 2010 IETF Trust and the persons identified as the
36	    document authors.  All rights reserved.

38	    This document is subject to BCP 78 and the IETF Trust's Legal
39	    Provisions Relating to IETF Documents
40	    (http://trustee.ietf.org/license-info) in effect on the date of
41	    publication of this document. Please review these documents
42	    carefully, as they describe your rights and restrictions with
43	    respect to this document. Code Components extracted from this
44	    document must include Simplified BSD License text as described in
45	    Section 4.e of the Trust Legal Provisions and are provided without
46	    warranty as described in the Simplified BSD License.

48	Abstract

50	    This document describes a TCP sender algorithm to trigger loss
51	    recovery based on the TCP Selective Acknowledgement (SACK)
52	    information gathered on a SACK scoreboard instead of simply counting
53	    the number of arriving duplicate acknowledgements (ACKs) in the
54	    traditional way.  The given algorithm is more robust to ACK losses,
55	    ACK reordering, missed duplicate acknowledgements due to delayed
56	    acknowledgements, and extra duplicate acknowledgements due to
57	    duplicated segments and out-of-window segments. The algorithm allows
58	    not only a timely initiation of TCP loss recovery but also reduces
59	    false fast retransmits.  It has a low implementation cost on top of
60	    the SACK scoreboard defined in RFC 3517.

62	                             Table of Contents

64	    1. Introduction
65	       1.1. Conventions and Terminology
66	       1.2. Definitions
67	    2. Algorithm Details
68	       2.1. Redefined IsLost (SeqNum)
69	       2.2. The Algorithm
70	    3. Discussion
71	       3.1. Small Segment Sender
72	       3.2. SACK Capability Misbehavior
73	       3.3. Compatibility with Duplicate ACK based Loss
74	            Recovery Algorithms
75	    4. Security Considerations
76	    5. IANA Considerations
77	    6. Acknowledgements
78	    Appendix
79	    A. Scenarios
80	       A.1. Basic Case
81	       A.2. Delayed ACK
82	       A.3. ACK Loss
83	       A.4. ACK Reordering
84	       A.5. Duplicated Packet
85	       A.6. Mitigation of Blind Throughput Reduction
86	            Attack
87	    References
88	    Normative References
89	    Informative References
90	    AUTHORS' ADDRESSES

91	    TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

93	    Changes from draft-ietf-tcpm-sack-recovery-entry-00.txt

95	    * Mention setting of RecoveryPoint explicitly as this algorithm
96	    depends on it being valid.

98	    * Changed definition of IsLost (SeqNum) to be less strict.

100	    * Changed packet ordering in one of the appendix examples, now it
101	    makes more sense in the context of this algorithm.  Point out in the
102	    examples which of the transmissions are due to Limited Transmit and
103	    Fast retransmit.

105	    Changes from draft-jarvinen-tcpm-sack-recovery-entry-01.txt

107	    * Clarified issues that based on feedback may cause confusion for
108	    the reader.

110	    * Incorporated handling of cumulative ACKs into the algorithm

112	    * 2581 refs -> 5681

114	    * Added early-rexmt ID as a related one, it uses SACK information
115	    similar to this algorithm (Thanks to Anna Brunstrom).

117	    * More cases added where this algorithm is beneficial in taking
118	    advantage of SACK block redundancy (thanks to Anna Brunstrom).

120	    * Discuss on differences how duplicate ACK counter is managed
121	    (traditional vs. this algorithm)

123	    * Added ref and couple of words about blind throughput reduction
124	    attack

126	    * Wrote SACK splitting attacks. These attacks are quite close to the
127	    edge in significance. Should consider just dropping (rather
128	    insignificant).

130	    Changes from draft-jarvinen-tcpm-sack-recovery-entry-00.txt

132	    * TODO items embedded: Improvements with window update, clarify
133	    dupack counting

135	    * Modified ACK reordering scenario in appendix, shows now a scenario
136	    where recovery is triggered in a more timely manner.

138	    * IDnits
139	    * Handle small segments case using duplicate ACKs counter paraller
140	    to the SACK blocks based detection.

142	    * Add a placeholder for SACK splitting

144	    * Mentioned FACK as some ideas are inherited from there

146	    END OF SECTION TO BE DELETED.

148	1.  Introduction

150	    The Transmission Control Protocol (TCP) [RFC793] has two methods for
151	    triggering retransmissions.  First, the TCP sender relies on
152	    incoming duplicate acknowledgements (ACKs) [RFC5681], indicating
153	    receipt of out-of-order segments at the TCP receiver. After
154	    receiving a required number of duplicate ACKs (usually three), the
155	    TCP sender retransmits the first unacknowledged segment and
156	    continues with a fast recovery algorithm such as Reno [RFC5681],
157	    NewReno [RFC3782] or SACK-based loss recovery [RFC3517].  Second,
158	    the TCP sender maintains a retransmission timer that triggers
159	    retransmission of segments, if the retransmission timer expires
160	    before the segments have been acknowledged.

162	    While the conservative loss recovery algorithm defined in [RFC3517]
163	    takes full advantage of SACK information during a loss recovery, it
164	    does not consider the very same information during the pre-recovery
165	    detection phase. Instead, it simply counts the number of arriving
166	    duplicate ACKs and leans on the number of duplicate ACKs in deciding
167	    when to enter loss recovery. However, this traditional heuristics of
168	    simply counting the number of duplicate ACKs to trigger a loss
169	    recovery fails in several cases to determine correctly the actual
170	    number of valid out-of-order segments the receiver has successfully
171	    received.  First, trusting on duplicate ACKs alone utterly fails to
172	    get hold of the whole picture in case of ACK losses and ACK
173	    reordering, resulting in delayed or missed initiation of fast
174	    retransmit and fast recovery. Similarly, the delayed ACK mechanism
175	    tends to conceal the first duplicate ACK as the delayed cumulative
176	    ACK becomes combined with the first duplicate ACK when the first
177	    out-of-order segment arrives at the receiver (in case of an enlarged
178	    ACK ratio such as with ACK congestion control [RFC5690], even more
179	    significant portion is affected).  Second, segment duplication or
180	    out-of-window segments increase the risk of falsely triggering loss
181	    recovery as they trigger duplicate ACKs. At worst, this legitimate
182	    behavior on out-of-window segments can be turned into a blind
183	    throughput reduction attack [CPNI09].  Third, receiver window
184	    updates or opposite direction data segments cannot be counted as
185	    duplicate ACKs with the traditional approach but can still contain
186	    redundant SACK information that the sender could benefit from in a
187	    scenario where the actual duplicate ACKs where lost.

189	    The algorithm specified in this document uses TCP Selective
190	    Acknowledgement Option [RFC2018] in the pre-recovery state to
191	    determine duplicate ACKs and to trigger loss recovery based on the
192	    information gathered on the SACK scoreboard [RFC3517].  It gives a
193	    more accurate heuristic for determining the number of out-of-order
194	    segments that have arrived at the TCP receiver.  The information
195	    gathered on the SACK scoreboard reveals missing ACKs and allows
196	    detecting duplicate events. Therefore, the algorithm enables a
197	    timely triggering of Fast Retransmit. In addition, it allows the use
198	    of Limited Transmit [RFC3042] accurately regardless of lost ACKs and
199	    also in the cases where the SACK information is piggybacked to a
200	    cumulative ACK due to delayed ACKs.  This, in turn, improves the ACK
201	    clock accuracy.

203	    This algorithm is close to what Linux TCP implementation has used
204	    for a very long time when in conservative SACK mode. A similar
205	    approach is briefly mentioned along ACK congestion control [RFC5690]
206	    but as the usefulness of the algorithm in this document is more
207	    general and not limited to ACK congestion control we specify it
208	    separately. We also note that the definition of a duplicate
209	    acknowledgement already suggests that an incoming ACK can be
210	    considered as a duplicate ACK if it "contains previously unknown
211	    SACK information" [RFC5681]. In addition, SACK information is used,
212	    whenever available, for similar purpose by Early Retransmit
213	    [AAA+10].

215	    This algorithm also resembles Forward Acknowledgement (FACK) [MM96]
216	    but they differ in how the quantity of data outstanding in the
217	    network is determined. FACK always assumes that every non-SACKed
218	    octet below the highest SACKed octet is lost which is only true if
219	    no reordering occurs. Thus it would simply trigger loss recovery
220	    whenever the highest SACKed octet is more than dupThresh * SMSS
221	    octets above SND.UNA.

223	1.1.  Conventions and Terminology

225	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
226	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
227	    document are to be interpreted as described in BCP 14, RFC 2119
228	    [RFC2119] and indicate requirement levels for protocols.

230	1.2.  Definitions

232	    The reader is expected to be familiar with the definitions given in
233	    [RFC5681], [RFC2018], and [RFC3517].

235	2.  Algorithm Details

237	    In order to use this algorithm, a TCP sender MUST have TCP Selective
238	    Acknowledgement Option [RFC2018] enabled and negotiated for the TCP
239	    connection. The TCP sender MUST maintain SACK information in an
240	    appropriate data structure such as scoreboard defined in [RFC3517].
241	    This algorithm uses functions Update(), and SetPipe () and variables
242	    DupThresh, HighData, HighRxt, Pipe, and RecoveryPoint, as defined in
243	    [RFC3517]. Note: the definition of IsLost (SeqNum) is altered from
244	    the one specified in [RFC3517].

246	2.1.  Redefined IsLost (SeqNum)

248	    IsLost (SeqNum) defined in [RFC3517] is stricter than necessary in
249	    counting how many segments the receiver has received past SeqNum.
250	    Instead of requiring at least three times SMSS bytes to be SACKed,
251	    it is enough to have at least two times SMSS bytes plus one byte
252	    SACKed to confirm that the receiver has received at least three
253	    segments above SeqNum (and would have generated at least three
254	    duplicate ACKs). The less strict definition is:

256	    IsLost (SeqNum):

258	        This routine returns whether the given sequence number is
259	        considered to be lost.  The routine returns true when either
260	        DupThresh discontiguous SACKed sequences have arrived above
261	        'SeqNum' or more than (DupThresh - 1) * SMSS bytes with sequence
262	        numbers greater than 'SeqNum' have been SACKed.  Otherwise, the
263	        routine returns false.

265	2.2.  The Algorithm

267	    A TCP sender using this algorithm MUST take the following steps upon
268	    the receipt of any ACK containing SACK information:

270	    1)  If no previous loss event has occurred on the connection OR
271	        RecoveryPoint is less than SND.UNA (the oldest unacknowledged
272	        sequence number [RFC793]), continue with the other steps of
273	        this algorithm. Otherwise, continue the ongoing loss recovery.

275	    2)  Update the scoreboard via the Update () function as outlined
276	        in [RFC3517].

278	    3)  If ACK is a cumulative ACK, reset duplicate ACK counter to zero.

280	    4)  If ACK contains SACK blocks with previously unknown in-window
281	        SACK information (i.e., between SND.UNA and HighData, assuming
282	        SND.UNA has been updated from the acknowledgment number of the
283	        ACK), increase duplicate ACK counter.

285	    5)  Determinate if a loss recovery should be initiated:

287	        If IsLost (SND.UNA) returns false AND the sender has received
288	        less than DupThresh duplicate ACKs, goto step 6A. Otherwise goto
289	        step 6B.

291	    6A) Invoke optional Limited Transmit:

293	        Set HighRxt to SND.UNA and run SetPipe(). The TCP sender MAY
294	        transmit previously unsent data segments according the
295	        guidelines of Limited Transmit [RFC3042], with the exception
296	        that the amount of octets that can be send is determined by Pipe
297	        and cwnd.

299	        If cwnd - Pipe >= 1 SMSS, the TCP sender can transmit one or
300	        more segments as follows:

302	        Send Loop:

304	        a) If available unsent data exists and the receiver's advertised
305	           window allows, transmit one segment of up to SMSS octets of
306	           previously unsent data starting with sequence number
307	           HighData+1 and update HighData to reflect the transmission of
308	           the data segment. Otherwise, exit Send Loop.

310	        b) Run SetPipe() to re-calculate the number of outstanding
311	           octets in the network. If cwnd - Pipe >= 1 SMSS, go to step
312	           a) of Send Loop.  Otherwise, exit Send Loop.

314	    6B) Invoke Fast Retransmit and enter loss recovery:

316	        Initiate a loss recovery phase, per the fast retransmit
317	        algorithm outlined in [RFC5681], and continue with a fast
318	        recovery algorithm such as the SACK-based loss recovery
319	        algorithm outlined in [RFC3517].  This includes setting
320	        RecoveryPoint to HighData as in step (1) of [RFC3517].

322	3.  Discussion

324	    In scenarios where no ACK losses nor reordering occur and the first
325	    acknowledgement with SACK information is not the ACK held due to
326	    delayed acknowledgements mechanism, the new SACK information with
327	    each duplicate ACK covers a single segment. Those duplicate ACKs
328	    cause this algorithm to trigger loss recovery after three duplicate
329	    acknowledgements and will allow transmission of new segments using
330	    Limited Transmit on the first and second duplicate ACK. This is
331	    identical to the behavior that would occur without this algorithm
332	    (assuming DupThresh is 3 and that all segments are SMSS sized). This
333	    scenario together with other typical scenarios describing the
334	    behavior of the algorithm are depicted in Appendix A.

336	    This algorithm SHOULD be used also with an ACK that contains a
337	    window update or opposite direction data that could not be
338	    considered as a duplicate ACK in the traditional algorithm. Such
339	    behavior is safe because the SACK information can only add more
340	    information to the current state of the sender; at worst, all
341	    received information is just redundant.

343	    Setting HighRxt to SND.UNA in Step 6A has no direct relation to this
344	    algorithm. Yet it is included in the algorithm to avoid confusion in
345	    how to implement SetPipe() correctly because it depends on having a
346	    valid HighRxt value [RFC3517].

348	    A set of potential issues to consider with the algorithm are
349	    discussed in the following.

351	3.1.  Small Segment Sender

353	    If a TCP sender is sending small segments (usually intentionally
354	    overriding Nagle algorithm [RFC896]), the IsLost (SND.UNA) used in
355	    step 5 of the algorithm might fail to detect the need for loss
356	    recovery on the third duplicate acknowledgement because not enough
357	    octets have been SACKed to cover more than (DupThresh - 1) * SMSS
358	    bytes above SND.UNA.  Therefore, an adapted duplicate ACK algorithm
359	    is needed as a fallback. Steps 3, 4 and the latter condition of step
360	    5 implement the adapted duplicate ACK algorithm in parallel to the
361	    SACK block based detection.

363	    The number of duplicate ACKs is an artificial metric to estimate the
364	    number of segments the receiver has already in its receive buffer.
365	    How accurately they match depends on the scenario. Because of that,
366	    the goal of the duplicate ACK counter included into this algorithm
367	    is not to achieve bug-to-bug compatibility with the plain duplicate
368	    ACK counter but to estimate how many out-of-order segments the
369	    receiver has already queued in a more accurate way. Therefore, the
370	    duplicate ACK counter used as a fallback mechanism in this algorithm
371	    differs from the plain duplicate ACK counter. However, such
372	    differences indicate a scenario where the plain counter was not able
373	    to accurately keep track of the receiver state.

375	    While the fallback algorithm itself does not look into
376	    acknowledgment field in order to make a decision whether ACK is a
377	    "duplicate ACK", the duplicate ACK counter is not renamed in this
378	    document as in practice most of ACKs that increment the counter
379	    would still contain a duplicate acknowledgment number.  In contrast
380	    to the traditional approach, only condition that must be satisfied
381	    to increment the duplicate ACK counter with this algorithm is that
382	    the acknowledgement MUST contain at least one in-window SACK block
383	    that covers octets that were not previously SACKed [RFC5681]. In
384	    cases with ACK losses or delayed ACKs this condition can also match
385	    to cumulative ACKs, receiver window updates and opposite direction
386	    data segments but still the counter can safely be incremented.

388	    Alternatively to the fallback algorithm, a TCP sender that is able
389	    to discern segment boundaries accurately can consider full segments
390	    in IsLost (SeqNum) regardless of segment size.  Therefore, such a
391	    TCP sender can avoid the problem with small segments using IsLost
392	    (SND.UNA) check alone which means that Steps 3, 4 and the latter
393	    condition of step 5 are redundant and not required to be
394	    implemented.

396	    Note: the small segments problem is not unique to this algorithm but
397	    also the SACK-based loss recovery [RFC3517] encounters it because of
398	    how IsLost (SeqNum) is defined.

400	3.2.  SACK Capability Misbehavior

402	    If the receiver represents such a SACK misbehavior that it
403	    advertises SACK capability but never sends any SACK blocks when it
404	    should, this algorithm fails to enter loss recovery and
405	    retransmission timeout is required for recovery. However, such
406	    misbehavior does not allow SACK-based loss recovery [RFC3517] to
407	    work either, and a TCP sender will anyway require a timeout to
408	    recover if there was more than one lost data segment within the
409	    window.

411	3.3.  Compatibility with Duplicate ACK based Loss Recovery Algorithms

413	    This algorithm SHOULD NOT be used together with a fast recovery
414	    algorithm that determines the segments that have left the network
415	    based on the number of arriving duplicate acknowledgements (e.g.,
416	    NewReno [RFC3782]), instead of the actual segments reported by SACK.
417	    In presence of ACK reordering such an algorithm will count the
418	    delayed duplicate acknowledgements during the fast recovery
419	    algorithm as extra while determining the number of packets that have
420	    left the network.

422	    In general there should be very little reason to combine this
423	    algorithm with a loss recovery algorithm that is based on inferior,
424	    non-SACK based information only.

426	4.  Security Considerations

428	    A malicious TCP receiver may send false SACK information for
429	    sequence number ranges which it has not received in order to trigger
430	    Fast Retransmit sooner. Such behavior would only be useful when out-
431	    of-order segments have arrived because otherwise the flow undergoes
432	    a loss recovery with a window reduction. This kind of lying involves
433	    guessing which segments will arrive later. In case the guess was
434	    wrong, the performance of the flow is ruined because the TCP sender
435	    will need a retransmission timeout as it will not retransmit the
436	    segments until it assumes SACK reneging. On a successful guess the
437	    attacker is able to trigger the recovery slightly earlier. The later
438	    segments would have allowed reporting the very same regions with
439	    SACK anyway. Therefore, the gain from this attack is small, hardly
440	    justifiable considering the drastic effect of a misguess.
441	    Furthermore, a similar attack can be made with the duplicate
442	    acknowledgment based algorithm (even if the new SACK information
443	    rule is applied) by sending false duplicate acknowledgements with
444	    false SACK ranges, and trivially without the new SACK information
445	    rule.

447	    A variation of the lying attack discards reliability of the flow but
448	    as soon as the reliability is not a concern of the receiver, a
449	    number of simpler ways exist to attack TCP independently of this
450	    algorithm. Thus this algorithm is not considered to weaken TCP
451	    security properties against false information.

453	    Splitting SACK blocks into a smaller than the received segment sized
454	    chunks allows the receiver to enable recovery to start sooner
455	    because of IsLost (SeqNum) discontiguous check. However, by doing so
456	    the receiver neglects the possiblity of reordering for a little
457	    gain. If the segment was just reordered, the sender performs
458	    unnecessary window reduction and unnecessary retransmission of the
459	    reordered segment. Another variant of SACK block splitting simply
460	    tries to increase consumption of bandwidth by triggering a burst of
461	    retransmissions falsely. However,  the difference between sending
462	    three duplicate ACKs (traditional algorithm) and a single ACK with
463	    SACK blocks will not offer significant benefits to make such an
464	    attack practical with a small DupThresh value such as three.  In
465	    case the sender keeps track of segment boundaries and applies them
466	    in IsLost (SeqNum), such attack will not succeed as the sender
467	    cannot be mislead to believe that a segment was split into multiple
468	    chunks.

470	5.  IANA Considerations

472	    This document has no actions for IANA.

474	6.  Acknowledgements

476	    The authors would like to thank Alexander Zimmermann and Anna
477	    Brunstrom for the comments on this document.

479	Appendix

481	A.  Scenarios

483	A.1.  Basic Case

485	    In this scenario no Delayed ACK, ACK losses, reordering or other
486	    "abnormal" behavior happens. For simplicity all the segments are
487	    SMSS sized.

489	    Once the TCP receiver gets first out-of-order segment, it sends a
490	    duplicate ACK with SACK information about the received octets. The
491	    following two out-of-order segments trigger a duplicate ACK each,
492	    with the corresponding range SACKed in addition to the previously
493	    know information. The sender gets those duplicate ACKs in-order,
494	    each of them will SACK a new previously unknown segment.

496	    This algorithm triggers loss recovery on third duplicate ACK because
497	    IsLost (SeqNum) returns true as more than (DupThresh - 1) * SMSS
498	    bytes become SACKed on the same acknowledgement, thus the behavior
499	    is identical to that of a sender which is using duplicate
500	    acknowledgments.  If Limited Transmit is in use, two first duplicate
501	    ACKs allow a single segment to be sent with either of the algorithms
502	    (Pipe is decremented by SMSS by the SACKed octets per ACK allowing
503	    SMSS worth of new octets).

505	        ACK           Transmitted    Received    ACK Sent
506	        Received      Segment        Segment     (Including SACK Blocks)

508	        1000
509	                      3000-3499      3000-3499   (delayed ACK)
510	                      3500-3999      3500-3999   4000
511	        2000
512	                      4000-4499      (dropped)
513	                      4500-4999      4500-4999   4000, SACK=4500-5000
514	        3000
515	                      5000-5499      5000-5499   4000, SACK=4500-5500
516	                      5500-5999      5500-5999   4000, SACK=4500-6000
517	        4000
518	                      6000-6499      6000-6499   4000, SACK=4500-6500
519	                      6500-6999      6500-6999   4000, SACK=4500-7000
520	        4000, SACK=4500-5000
521	         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
522	        4000, SACK=4500-5500
523	         (lim. tr.)   7500-7999      7500-7999   4000, SACK=4500-8000
524	        4000, SACK=4500-6000
525	         (fast retr.) 4000-4499      4000-4499   8000
526	        4000, SACK=4500-6500

528	A.2.  Delayed ACK

530	    The case with delayed ACK occurs when the receiver sends the first
531	    ACK with SACK information but since the previous ACK was sent with a
532	    lower sequence number because an acknowledgment is held by delayed
533	    ACK, the sender will not considered it as duplicate ACK. Because the
534	    segment contains SACK information that is identical to the basic
535	    case, the sender can use Limited Transmit with the same segments as
536	    in the basic case and will start loss recovery at the third
537	    acknowledgment, i.e., with the second duplicate acknowledgment. In
538	    the same situation the duplicate ACK based sender will have to wait
539	    for one more duplicate ACK to arrive to do the same as the first
540	    acknowledgment is fully "wasted".

542	    Technically an acknowledgement with a sequence number higher than
543	    what was previously acknowledged is not a duplicate acknowledgement
544	    but a presence of the SACK block tells another story revealing the
545	    receiver which used delayed ACK, and thus the missing duplicate
546	    acknowledgement in between. The response of a TCP sender taking
547	    advantage of such inferred duplicate acknowledgements is well within
548	    the guidelines of packet conservation principle [Jac88] as it still
549	    sends only when segments have left the network.

551	        ACK           Transmitted    Received    ACK Sent
552	        Received      Segment        Segment     (Including SACK Blocks)

554	        1500
555	                      3000-3499      3000-3499   3500
556	                      3500-3999      3500-3999   (delayed ACK)
557	        2500
558	                      4000-4499      (dropped)
559	                      4500-4999      4500-4999   4000, SACK=4500-5000
560	        3500
561	                      5000-5499      5000-5499   4000, SACK=4500-5500
562	                      5500-5999      5500-5999   4000, SACK=4500-6000
563	        4000, SACK=4500-5000 (two segments left the network)
564	                      6000-6499      6000-6499   4000, SACK=4500-6500
565	         (lim. tr.)   6500-6999      6500-6999   4000, SACK=4500-7000
566	        4000, SACK=4500-5500
567	         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
568	        4000, SACK=4500-6000
569	         (fast retr.) 4000-4499      4000-4499   7500
570	        4000, SACK=4500-6500
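    A minimal Python sketch can replay the "ACK Received" column of the
    table above and report where each entry rule would start loss
    recovery.  It assumes 500-octet segments, dupThresh = 3, and an
    entry rule of "at least dupThresh segments SACKed above the
    cumulative ACK"; the function replay_entry() and its data layout are
    hypothetical and not taken from any TCP implementation.

```python
# Hypothetical sketch: compare when loss recovery starts in the
# delayed ACK example under two rules.  Assumptions (not from any real
# TCP stack): every segment is SMSS = 500 octets and dupThresh = 3.
SMSS = 500
DUP_THRESH = 3

# (cumulative ACK, [SACK blocks as (start, end)]) in arrival order,
# copied from the "ACK Received" column of the table.
A2_ACKS = [
    (1500, []),
    (2500, []),
    (3500, []),
    (4000, [(4500, 5000)]),
    (4000, [(4500, 5500)]),
    (4000, [(4500, 6000)]),
    (4000, [(4500, 6500)]),
]


def replay_entry(acks):
    """Return (SACK-counting entry index, dupack-counting entry index)."""
    snd_una = None
    sacked = set()   # start sequence numbers of SACKed segments
    dupacks = 0
    sack_entry = dup_entry = None
    for i, (cum_ack, blocks) in enumerate(acks):
        # Duplicate ACK rule: an ACK that does not advance snd_una
        # counts; an ACK that advances it resets the counter.
        dupacks = dupacks + 1 if cum_ack == snd_una else 0
        snd_una = cum_ack
        for start, end in blocks:
            sacked.update(range(max(start, snd_una), end, SMSS))
        # Forget segments now covered by the cumulative ACK.
        sacked = {s for s in sacked if s >= snd_una}
        # SACK rule: enter recovery once DUP_THRESH segments are SACKed.
        if sack_entry is None and len(sacked) >= DUP_THRESH:
            sack_entry = i
        if dup_entry is None and dupacks >= DUP_THRESH:
            dup_entry = i
    return sack_entry, dup_entry


print(replay_entry(A2_ACKS))  # prints (5, 6)
```

    Index 5 is the ACK "4000, SACK=4500-6000" that triggers the fast
    retransmit in the table.  The duplicate ACK counter reaches
    dupThresh only at index 6, because the ACK at index 3 advances the
    cumulative ACK and is not counted.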

572	A.3.  ACK Loss

574	    The ACK loss case behaves much like the delayed ACK case.  If the
575	    hole at RCV.NXT is filled, the sender notices that the cumulative
576	    ACK has advanced.  If lost ACKs reported out-of-order segments, the
577	    first ACK that gets through to the sender includes their SACK
578	    blocks, up to the number of blocks that SACK block redundancy can
579	    cover.  With this algorithm, the sender immediately makes use of all
580	    the information made available by the incoming ACK.

582	        ACK           Transmitted    Received    ACK Sent
583	        Received      Segment        Segment     (Including SACK Blocks)

585	        1000
586	                      3000-3499      3000-3499   (delayed ACK)
587	                      3500-3999      3500-3999   4000
588	        2000
589	                      4000-4499      (dropped)
590	                      4500-4999      4500-4999   4000, SACK=4500-5000
591	                                                 (dropped)
592	        3000
593	                      5000-5499      5000-5499   4000, SACK=4500-5500
594	                      5500-5999      5500-5999   4000, SACK=4500-6000

596	        4000
597	                      6000-6499      6000-6499   4000, SACK=4500-6500
598	                      6500-6999      6500-6999   4000, SACK=4500-7000
599	        4000, SACK=4500-5500 (two segments left the network)
600	         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
601	         (lim. tr.)   7500-7999      7500-7999   4000, SACK=4500-8000
602	        4000, SACK=4500-6000
603	         (fast retr.) 4000-4499      4000-4499   8000
604	        4000, SACK=4500-6500
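    The following Python sketch, assuming 500-octet segments,
    dupThresh = 3, and one Limited Transmit segment per newly SACKed
    segment, shows how the sender in this example could derive its
    action from the number of newly SACKed segments in each incoming
    ACK.  The state dictionary and process_ack() are hypothetical.

```python
# Hypothetical sketch: per-ACK actions of a sender that counts newly
# SACKed segments.  Assumed: SMSS = 500 octets, dupThresh = 3, and one
# Limited Transmit segment per newly SACKed segment.
SMSS = 500
DUP_THRESH = 3


def process_ack(state, cum_ack, blocks):
    """Update the scoreboard; return (action, newly SACKed segments)."""
    if cum_ack > state["snd_una"]:
        state["snd_una"] = cum_ack
        state["sacked"] = {s for s in state["sacked"] if s >= cum_ack}
    new = 0
    for start, end in blocks:
        for seg in range(max(start, state["snd_una"]), end, SMSS):
            if seg not in state["sacked"]:
                state["sacked"].add(seg)
                new += 1
    if state["in_recovery"]:
        action = "in recovery"        # left to the recovery algorithm
    elif new == 0:
        action = "no action"
    elif len(state["sacked"]) >= DUP_THRESH:
        state["in_recovery"] = True
        action = "fast retransmit"
    else:
        action = "limited transmit"   # send `new` new segments
    return action, new


state = {"snd_una": 0, "sacked": set(), "in_recovery": False}
# ACKs that reach the sender in the ACK loss example; the first SACK
# carrying ACK, "4000, SACK=4500-5000", was dropped.
A3_ACKS = [(1000, []), (2000, []), (3000, []), (4000, []),
           (4000, [(4500, 5500)]), (4000, [(4500, 6000)]),
           (4000, [(4500, 6500)])]
results = [process_ack(state, ack, blocks) for ack, blocks in A3_ACKS]
print(results[4:])
# [('limited transmit', 2), ('fast retransmit', 1), ('in recovery', 1)]
```

    A sender counting duplicate ACKs would treat the ACK at index 4 as
    its first duplicate ACK and would send only one Limited Transmit
    segment in response.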

606	A.4.  ACK Reordering

608	    With ACK reordering, an ACK is postponed.  Due to SACK block
609	    redundancy, the ACK sent after the postponed one carries not only
610	    its own information but also that of the reordered ACK (similar to
611	    the ACK loss case).  When the reordered ACK arrives later, the
612	    sender already knows the information it provides, and therefore
613	    this algorithm takes no action.

615	        ACK           Transmitted    Received    ACK Sent
616	        Received      Segment        Segment     (Including SACK Blocks)

618	        1000
619	                      3000-3499      3000-3499   (delayed ACK)
620	                      3500-3999      3500-3999   4000
621	        2000
622	                      4000-4499      (dropped)
623	                      4500-4999      4500-4999   4000, SACK=4500-5000
624	                                                 (delayed)
625	        3000
626	                      5000-5499      5000-5499   4000, SACK=4500-5500
627	                      5500-5999      5500-5999   4000, SACK=4500-6000
628	        4000
629	                      6000-6499      6000-6499   4000, SACK=4500-6500
630	                      6500-6999      6500-6999   4000, SACK=4500-7000
631	        4000, SACK=4500-5500 (two segments left the network)
632	         (lim. tr.)   7000-7499      7000-7499   4000, SACK=4500-7500
633	         (lim. tr.)   7500-7999      7500-7999   4000, SACK=4500-8000
634	        4000, SACK=4500-5000 (has only redundant information)
635	        4000, SACK=4500-6000
636	         (fast retr.) 4000-4499      4000-4499   8000
637	        4000, SACK=4500-6500
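    The property that matters here is that the late ACK adds no newly
    SACKed octets.  A minimal Python sketch, assuming a scoreboard kept
    as a sorted list of merged, non-overlapping octet ranges (a
    hypothetical representation, as is add_sack_blocks()), shows this
    for the three SACK-carrying ACKs of the example in arrival order.

```python
# Hypothetical sketch: count newly SACKed octets using a scoreboard of
# merged, non-overlapping [start, end) octet ranges.


def add_sack_blocks(scoreboard, snd_una, blocks):
    """Merge SACK blocks into scoreboard (in place) and return the
    number of octets that were not SACKed before."""
    # Drop or trim ranges already covered by the cumulative ACK.
    scoreboard[:] = [(max(s, snd_una), e) for s, e in scoreboard if e > snd_una]
    before = sum(e - s for s, e in scoreboard)
    ranges = sorted(scoreboard + [(max(s, snd_una), e)
                                  for s, e in blocks if e > snd_una])
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    scoreboard[:] = merged
    return sum(e - s for s, e in merged) - before


scoreboard = []
# SACK blocks of the ACKs with cumulative ACK 4000, in arrival order;
# the reordered "4000, SACK=4500-5000" arrives second.
arrivals = [[(4500, 5500)], [(4500, 5000)], [(4500, 6000)]]
A4_NEW = [add_sack_blocks(scoreboard, 4000, blocks) for blocks in arrivals]
print(A4_NEW)  # prints [1000, 0, 500]
```

    Counting in octets over merged ranges keeps the sketch independent
    of segment boundaries; a redundant ACK simply merges into ranges
    that are already present.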

639	A.5.  Duplicated Packet

641	    A duplicate packet is received either due to an unnecessary
642	    retransmission or due to duplication by network hardware.  A
643	    duplicated ACK adds a redundant ACK that carries only information
644	    already known to the sender, while a duplicated data segment
645	    triggers a redundant duplicate ACK (possibly with SACK and/or DSACK
646	    [RFC2883] information).  Because neither marks any new octets as
647	    SACKed at the TCP sender, this algorithm takes no action, whereas a
648	    duplicate ACK based sender would wrongly count it toward dupThresh.

650	    If one of the two redundant ACKs is lost, the duplication simply
651	    cancels out.

653	    It would be possible for the sender to detect this case using DSACK
654	    alone.
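    The Python sketch below identifies a DSACK block using the rules of
    [RFC2883] and counts newly SACKed segments for the two kinds of
    redundant duplicate ACK that a duplicated data segment can trigger.
    It assumes 500-octet segments and a scoreboard kept as a set of
    segment start sequence numbers; is_dsack() and newly_sacked() are
    hypothetical helpers.

```python
# Hypothetical sketch: identify a DSACK block with the RFC 2883 rules
# and count newly SACKed segments, ignoring the DSACK block.  Assumed:
# full-sized 500-octet segments, scoreboard = set of segment starts.
SMSS = 500


def is_dsack(cum_ack, blocks):
    """RFC 2883: the first block is a D-SACK if the cumulative ACK covers
    it or if it is contained in the second block."""
    if not blocks:
        return False
    start, end = blocks[0]
    if end <= cum_ack:
        return True
    return len(blocks) > 1 and blocks[1][0] <= start and end <= blocks[1][1]


def newly_sacked(sacked, cum_ack, blocks):
    """Add non-DSACK blocks to the scoreboard; return new segments."""
    if is_dsack(cum_ack, blocks):
        blocks = blocks[1:]
    new = 0
    for start, end in blocks:
        for seg in range(max(start, cum_ack), end, SMSS):
            if seg not in sacked:
                sacked.add(seg)
                new += 1
    return new


sacked = {4500}   # 4500-4999 already SACKed; cumulative ACK is 4000
# Redundant duplicate ACKs caused by duplicated data segments:
dup_of_old = (4000, [(3500, 4000), (4500, 5000)])     # copy of 3500-3999
dup_of_sacked = (4000, [(4500, 5000), (4500, 5000)])  # copy of 4500-4999
print([newly_sacked(sacked, *ack) for ack in (dup_of_old, dup_of_sacked)])
# prints [0, 0]
```

    Neither ACK yields a newly SACKed segment, while a duplicate ACK
    counter would count both.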

656	A.6.  Mitigation of Blind Throughput Reduction Attack

658	    If an attacker knows or is able to guess the 4-tuple of a TCP
659	    connection, it may launch a blind throughput reduction attack
660	    [CPNI09].  In this attack, TCP is tricked into sending duplicate
661	    ACKs to the other endpoint by injecting segments that likely fall
662	    outside the receive window, which is considerably easier than
663	    matching valid sequence numbers.  If at least dupThresh duplicate
664	    ACKs can be triggered in a row without any legitimate segment
665	    advancing the acknowledged sequence number, the other end acts on
666	    the false congestion signal and halves its congestion window.

668	    With this algorithm, such duplicate ACKs are filtered out because
669	    they carry no new in-window SACK blocks (a DSACK block [RFC2883]
670	    might be present, but it does not cover in-window octets).
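    The filtering can be sketched in Python as follows.  The sketch
    assumes 500-octet segments, dupThresh = 3, and that an "in-window"
    SACK block is one lying between SND.UNA and SND.NXT at the sender;
    in_window() and false_entry() are hypothetical names.

```python
# Hypothetical sketch: a blind attacker triggers duplicate ACKs with
# out-of-window segments.  Assumed: SMSS = 500, dupThresh = 3, and
# "in-window" meaning within [SND.UNA, SND.NXT] at the sender.
SMSS = 500
DUP_THRESH = 3


def in_window(blocks, snd_una, snd_nxt):
    """Keep only SACK blocks inside the sender's outstanding window."""
    return [(s, e) for s, e in blocks if snd_una <= s < e <= snd_nxt]


def false_entry(acks, snd_una, snd_nxt):
    """Return (dupack rule enters recovery, SACK rule enters recovery)."""
    dupacks = 0
    sacked = set()
    for cum_ack, blocks in acks:
        if cum_ack == snd_una:
            dupacks += 1
        for s, e in in_window(blocks, snd_una, snd_nxt):
            sacked.update(range(s, e, SMSS))
    return dupacks >= DUP_THRESH, len(sacked) >= DUP_THRESH


# ACKs elicited by injected segments: no SACK block, or only a DSACK
# block below SND.UNA for injected old sequence numbers.
attack_acks = [(4000, []), (4000, [(1000, 1500)]), (4000, []),
               (4000, [(1000, 1500)])]
print(false_entry(attack_acks, snd_una=4000, snd_nxt=8000))
# prints (True, False)
```

    With these four ACKs, the duplicate ACK counter crosses dupThresh
    while no SACKed segment is recorded, so only the duplicate ACK based
    sender would react to the false congestion signal.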

672	References

674	Normative References

676	    [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
677	              793, September 1981.

679	    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow,
680	              "TCP Selective Acknowledgment Options", RFC 2018,
681	              October 1996.

683	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
684	              Requirement Levels", BCP 14, RFC 2119, March 1997.

686	    [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
687	              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
688	              January 2001.

690	    [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang,
691	              "A Conservative Selective Acknowledgment (SACK)-based
692	              Loss Recovery Algorithm for TCP", RFC 3517, April 2003.

694	    [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
695	              Control", RFC 5681, September 2009.

697	Informative References

699	    [AAA+10]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J.,
700	              and P. Hurtig, "Early Retransmit for TCP and SCTP",
701	              Internet-Draft, draft-ietf-tcpm-early-rexmt-04, January
702	              2010.

704	    [CPNI09]  Gont, F., "Security Assessment of the Transmission Control
705	              Protocol (TCP)", CPNI, 2009.  Available at:
706	              http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-
707	              TCP.pdf

709	    [Jac88]   Jacobson, V., "Congestion Avoidance and Control", In
710	              Proceedings of ACM SIGCOMM '88, August 1988.

712	    [MM96]    Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
713	              Refining TCP Congestion Control", In Proceedings of ACM
714	              SIGCOMM '96, August 1996.

716	    [RFC896]  Nagle, J., "Congestion Control in IP/TCP Internetworks",
717	              RFC 896, January 1984.

719	    [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
720	              Extension to the Selective Acknowledgement (SACK) Option
721	              for TCP", RFC 2883, July 2000.

723	    [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
724	              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
725	              April 2004.

727	    [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
728	              Acknowledgement Congestion Control to TCP", RFC 5690,
729	              February 2010.

731	AUTHORS' ADDRESSES

733	    Ilpo Jarvinen
734	    University of Helsinki
735	    P.O. Box 68
736	    FI-00014 UNIVERSITY OF HELSINKI
737	    Finland
738	    Email: ilpo.jarvinen@helsinki.fi

740	    Markku Kojo
741	    University of Helsinki
742	    P.O. Box 68
743	    FI-00014 UNIVERSITY OF HELSINKI
744	    Finland
745	    Email: kojo@cs.helsinki.fi