idnits 2.17.1 

draft-ietf-tcpm-sack-recovery-entry-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (19 October 2009) is 5303 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 3517 (Obsoleted by RFC 6675)

  -- Obsolete informational reference (is this intentional?): RFC  896
     (Obsoleted by RFC 7805)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)


     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                              I. Jarvinen
2	INTERNET-DRAFT                                                   M. Kojo
3	draft-ietf-tcpm-sack-recovery-entry-00.txt        University of Helsinki
4	Intended status: Standards Track                         19 October 2009
5	Expires: April 2010

7	  Using TCP Selective Acknowledgement (SACK) Information to Determine
8	        Duplicate Acknowledgements for Loss Recovery Initiation

10	Status of this Memo

12	    This Internet-Draft is submitted to IETF in full conformance with
13	    the provisions of BCP 78 and BCP 79.

15	    Internet-Drafts are working documents of the Internet Engineering
16	    Task Force (IETF), its areas, and its working groups.  Note that
17	    other groups may also distribute working documents as Internet-
18	    Drafts.

20	    Internet-Drafts are draft documents valid for a maximum of six
21	    months and may be updated, replaced, or obsoleted by other documents
22	    at any time.  It is inappropriate to use Internet-Drafts as
23	    reference material or to cite them other than as "work in progress."

25	    The list of current Internet-Drafts can be accessed at
26	    http://www.ietf.org/ietf/1id-abstracts.txt.

28	    The list of Internet-Draft Shadow Directories can be accessed at
29	    http://www.ietf.org/shadow.html.

31	    This Internet-Draft will expire on April 2010.

33	Copyright Notice

35	    Copyright (c) 2009 IETF Trust and the persons identified as the
36	    document authors.  All rights reserved.

38	    This document is subject to BCP 78 and the IETF Trust's Legal
39	    Provisions Relating to IETF Documents in effect on the date of
40	    publication of this document (http://trustee.ietf.org/license-info).
41	    Please review these documents carefully, as they describe your
42	    rights and restrictions with respect to this document.

44	Abstract

46	    This document describes a TCP sender algorithm to trigger loss
47	    recovery based on the TCP Selective Acknowledgement (SACK)
48	    information gathered on a SACK scoreboard instead of simply counting
49	    the number of arriving duplicate acknowledgements (ACKs) in the
50	    traditional way.  The given algorithm is more robust to ACK losses,
51	    ACK reordering, missed duplicate acknowledgements due to delayed
52	    acknowledgements, and extra duplicate acknowledgements due to
53	    duplicated segments and out-of-window segments. The algorithm allows
54	    not only a timely initiation of TCP loss recovery but also reduces
55	    false fast retransmits.  It has a low implementation cost on top of
56	    the SACK scoreboard defined in RFC 3517.

58	                             Table of Contents

60	    1. Introduction
61	       1.1. Conventions and Terminology
62	       1.2. Definitions
63	    2. Algorithm Details
64	    3. Discussion
65	       3.1. Small Segment Sender
66	       3.2. One Segment is Small
67	       3.3. SACK Capability Misbehavior
68	       3.4. Compatibility with Duplicate ACK based Loss
69	            Recovery Algorithms
70	    4. Security Considerations
71	    5. IANA Considerations
72	    6. Acknowledgements
73	    Appendix
74	    A. Scenarios
75	       A.1. Basic Case
76	       A.2. Delayed ACK
77	       A.3. ACK Losses
78	       A.4. ACK Reordering
79	       A.5. Packet Duplication
80	       A.6. Mitigation of Blind Throughput Reduction
81	            Attack
82	    References
83	    Normative References
84	    Informative References
85	    AUTHORS' ADDRESSES
86	    TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION:

88	    Changes from draft-jarvinen-tcpm-sack-recovery-entry-01.txt

90	    * Clarified issues that based on feedback may cause confusion for
91	    the reader.

93	    * Incorporated handling of cumulative ACKs into the algorithm

95	    * 2581 refs -> 5681

97	    * Added early-rexmt ID as a related one, it uses SACK information
98	    similar to this algorithm (Thanks to Anna Brunstrom).

100	    * More cases added where this algorithm is beneficial in taking
101	    advantage of SACK block redundancy (thanks to Anna Brunstrom).

103	    * Discuss differences in how the duplicate ACK counter is managed
104	    (traditional vs. this algorithm)

106	    * Added ref and couple of words about blind throughput reduction
107	    attack

109	    * Wrote SACK splitting attacks. These attacks are quite close to the
110	    edge in significance. Should consider just dropping (rather
111	    insignificant).

113	    Changes from draft-jarvinen-tcpm-sack-recovery-entry-00.txt

115	    * TODO items embedded: Improvements with window update, clarify
116	    dupack counting

118	    * Modified ACK reordering scenario in appendix, shows now a scenario
119	    where recovery is triggered in a more timely manner.

121	    * IDnits

123	    * Handle small segments case using a duplicate ACK counter in
124	    parallel to the SACK block based detection.

126	    * Add a placeholder for SACK splitting

128	    * Mentioned FACK as some ideas are inherited from there

130	    END OF SECTION TO BE DELETED.

132	1.  Introduction

134	    The Transmission Control Protocol (TCP) [RFC793] has two methods for
135	    triggering retransmissions.  First, the TCP sender relies on
136	    incoming duplicate acknowledgements (ACKs) [RFC5681], indicating
137	    receipt of out-of-order segments at the TCP receiver. After
138	    receiving a required number of duplicate ACKs (usually three), the
139	    TCP sender retransmits the first unacknowledged segment and
140	    continues with a fast recovery algorithm such as Reno [RFC5681],
141	    NewReno [RFC3782] or SACK-based loss recovery [RFC3517].  Second,
142	    the TCP sender maintains a retransmission timer that triggers
143	    retransmission of segments, if the retransmission timer expires
144	    before the segments have been acknowledged.

146	    While the conservative loss recovery algorithm defined in [RFC3517]
147	    takes full advantage of SACK information during a loss recovery, it
148	    does not consider the very same information during the pre-recovery
149	    detection phase. Instead, it simply counts the number of arriving
150	    duplicate ACKs and leans on the number of duplicate ACKs in deciding
151	    when to enter loss recovery. However, this traditional heuristic of
152	    simply counting the number of duplicate ACKs to trigger a loss
153	    recovery fails in several cases to determine correctly the actual
154	    number of valid out-of-order segments the receiver has successfully
155	    received.  First, relying on duplicate ACKs alone gives an
156	    incomplete picture in case of ACK losses and ACK
157	    reordering, resulting in delayed or missed initiation of fast
158	    retransmit and fast recovery. Similarly, the delayed ACK mechanism
159	    tends to conceal the first duplicate ACK as the delayed cumulative
160	    ACK becomes combined with the first duplicate ACK when the first
161	    out-of-order segment arrives at the receiver (in case of an enlarged
162	    ACK ratio such as with ACK congestion control [FARI08], an even more
163	    significant portion is affected).  Second, segment duplication or
164	    out-of-window segments increase the risk of falsely triggering loss
165	    recovery as they trigger duplicate ACKs. At worst, this legitimate
166	    behavior on out-of-window segments can be turned into a blind
167	    throughput reduction attack [CPNI09].  Third, receiver window
168	    updates or opposite direction data segments cannot be counted as
169	    duplicate ACKs with the traditional approach but can still contain
170	    redundant SACK information that the sender could benefit from in a
171	    scenario where the actual duplicate ACKs were lost.

173	    The algorithm specified in this document uses TCP Selective
174	    Acknowledgement Option [RFC2018] to determine duplicate ACKs and to
175	    trigger loss recovery based on the information gathered on the SACK
176	    scoreboard [RFC3517]. It works in the pre-recovery state giving a
177	    more accurate heuristic for determining the number of out-of-order
178	    segments arrived at the TCP receiver.  The information gathered on
179	    the scoreboard reveals missing ACKs and allows detecting duplicate
180	    events. Therefore, the algorithm enables a timely triggering of Fast
181	    Retransmit. In addition, it allows the use of Limited Transmit
182	    [RFC3042] regardless of lost ACKs and also in the cases where the
183	    SACK information is piggybacked to a cumulative ACK due to delayed
184	    ACKs.  This, in turn, allows keeping the ACK clock running more
185	    accurately.

187	    This algorithm is close to what the Linux TCP implementation has
188	    used for a very long time when in conservative SACK mode. A similar
189	    approach is briefly mentioned alongside ACK congestion control
190	    [FARI08], but as the usefulness of the algorithm in this document
191	    is more general and not limited to ACK congestion control, we
192	    specify it separately. We also note that the definition of a
193	    duplicate acknowledgement already suggests that an incoming ACK can
194	    be considered a duplicate ACK if it "contains previously unknown
195	    SACK information" [RFC5681]. In addition, SACK information is used,
196	    whenever available, for a similar purpose by Early Retransmit
197	    [AAA+09].

199	    This algorithm also resembles Forward Acknowledgement (FACK) [MM96]
200	    but they differ in how the quantity of data outstanding in the
201	    network is determined. FACK always assumes that every non-SACKed
202	    octet below the highest SACKed octet is lost, which is only true if
203	    no reordering occurs. Thus it would simply trigger loss recovery
204	    whenever the highest SACKed octet is more than DupThresh segments
205	    above SND.UNA.
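
    The difference can be illustrated with a small sketch. The function
    names and the 500-octet SMSS are assumptions made for illustration
    only. The FACK-style test follows the description in the paragraph
    above, not any particular implementation, and the scoreboard test
    assumes the IsLost() rule of [RFC3517] applied to SND.UNA.

```python
SMSS = 500        # assumed segment size in octets
DUP_THRESH = 3    # DupThresh


def fack_trigger(snd_una, sack_blocks):
    """FACK-style test: the highest SACKed octet is more than DupThresh
    segments above SND.UNA. Blocks are half-open (start, end) ranges."""
    if not sack_blocks:
        return False
    highest = max(end for _, end in sack_blocks)
    return highest - snd_una > DUP_THRESH * SMSS


def scoreboard_trigger(sack_blocks):
    """IsLost(SND.UNA)-style test: DupThresh discontiguous SACKed ranges or
    DupThresh * SMSS SACKed octets. Blocks are assumed to be merged and to
    lie above SND.UNA."""
    sacked = sum(end - start for start, end in sack_blocks)
    return len(sack_blocks) >= DUP_THRESH or sacked >= DUP_THRESH * SMSS


# A single segment arrives far ahead of the others, e.g. due to reordering.
blocks = [(6000, 6500)]
print(fack_trigger(4000, blocks))     # True: 2500 octets above SND.UNA
print(scoreboard_trigger(blocks))     # False: only 500 octets SACKed
```

    In this example the FACK-style test would already enter recovery,
    whereas the scoreboard-based test waits for more evidence.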

207	1.1.  Conventions and Terminology

209	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
210	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
211	    document are to be interpreted as described in BCP 14, RFC 2119
212	    [RFC2119] and indicate requirement levels for protocols.

214	1.2.  Definitions

216	    The reader is expected to be familiar with the definitions given in
217	    [RFC5681], [RFC2018], and [RFC3517].

219	2.  Algorithm Details

221	    In order to use this algorithm, a TCP sender MUST have TCP Selective
222	    Acknowledgement Option [RFC2018] enabled and negotiated for the TCP
223	    connection. A TCP sender MUST maintain SACK information in an
224	    appropriate data structure such as the scoreboard defined in [RFC3517].

226	    This algorithm uses the functions IsLost(SeqNum), Update(), and
227	    SetPipe() and the variables DupThresh, HighData, HighRxt, Pipe, and
228	    RecoveryPoint, as defined in [RFC3517].

230	    A TCP sender using this algorithm MUST take the following steps:

232	    1)  Upon the receipt of any ACK containing SACK information:

234	        If no previous loss event has occurred on the connection OR
235	        RecoveryPoint is less than SND.UNA (the oldest unacknowledged
236	        sequence number [RFC793]), continue with the other steps of this
237	        algorithm. Otherwise, continue the ongoing loss recovery.

239	    2)  Update the scoreboard via the Update() function as outlined in
240	        [RFC3517].

242	    3)  If ACK is a cumulative ACK, reset duplicate ACK counter to zero.

244	    4)  If ACK contains SACK blocks with previously unknown in-window
245	        (i.e., between SND.UNA and HighData, assuming SND.UNA has been
246	        updated from the acknowledgment number of the ACK) SACK
247	        information, increment the duplicate ACK counter.

249	    5)  Determine whether a loss recovery should be initiated:

251	        If IsLost(SND.UNA) returns false AND the sender has received
252	        fewer than DupThresh duplicate ACKs, go to step 6A. Otherwise, go
253	        to step 6B.

255	    6A) Invoke optional Limited Transmit:

257	        Set HighRxt to SND.UNA and run SetPipe(). The TCP sender MAY
258	        transmit previously unsent data segments according to the
259	        guidelines of Limited Transmit [RFC3042], with the exception
260	        that the number of octets that can be sent is determined by Pipe
261	        and cwnd.

263	        If cwnd - pipe >= 1 SMSS, the TCP sender can transmit one or
264	        more segments as follows:

266	        Send Loop:

268	        a) If available unsent data exists and the receiver's advertised
269	           window allows, transmit one segment of up to SMSS octets of
270	           previously unsent data starting with sequence number
271	           HighData+1 and update HighData to reflect the transmission of
272	           the data segment. Otherwise, exit Send Loop.

274	        b) Run SetPipe() to re-calculate the number of outstanding
275	           octets in the network. If cwnd - pipe >= 1 SMSS, go to step
276	           a) of Send Loop.  Otherwise, exit Send Loop.

278	    6B) Invoke Fast Retransmit and enter loss recovery:

280	        Initiate a loss recovery phase, per the fast retransmit
281	        algorithm outlined in [RFC5681] and continue with a fast
282	        recovery algorithm, such as the SACK-based loss recovery
283	        algorithm outlined in [RFC3517].
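
    The decision logic of steps 2 to 6 can be sketched as follows. This
    is a minimal illustration under stated assumptions, not a reference
    implementation: the class and method names are hypothetical, the
    scoreboard is reduced to a list of merged half-open ranges, HighData
    is treated as one past the last octet sent, SMSS is set to 500
    octets, and the RecoveryPoint check of step 1 as well as the Send
    Loop of step 6A are omitted. IsLost() follows the rule of [RFC3517]
    (DupThresh discontiguous SACKed sequences or DupThresh * SMSS SACKed
    octets above the sequence number).

```python
SMSS = 500         # assumed segment size in octets
DUP_THRESH = 3     # DupThresh


def merge(ranges):
    """Merge half-open [start, end) ranges into a sorted, non-overlapping list."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def octets(ranges):
    """Total number of octets covered by non-overlapping ranges."""
    return sum(end - start for start, end in ranges)


class PreRecoverySender:
    def __init__(self, snd_una, high_data):
        self.snd_una = snd_una
        self.high_data = high_data   # assumption: one past the last octet sent
        self.sacked = []             # simplified scoreboard of SACKed ranges
        self.dup_acks = 0

    def clip(self, ranges):
        """Keep only the in-window part of each range (SND.UNA to HighData)."""
        clipped = [(max(s, self.snd_una), min(e, self.high_data)) for s, e in ranges]
        return merge((s, e) for s, e in clipped if s < e)

    def is_lost(self, seq):
        above = [(s, e) for s, e in self.sacked if s > seq]
        return len(above) >= DUP_THRESH or octets(above) >= DUP_THRESH * SMSS

    def on_ack(self, ack, sack_blocks):
        """Process one ACK carrying SACK blocks and return the action of step 6."""
        # SND.UNA is advanced first so that "in-window" is well defined.
        if ack > self.snd_una:                   # step 3: cumulative ACK
            self.snd_una = ack
            self.dup_acks = 0
        known = self.clip(self.sacked)
        self.sacked = self.clip(known + list(sack_blocks))   # step 2
        if octets(self.sacked) > octets(known):  # step 4: new in-window SACK info
            self.dup_acks += 1
        if not self.is_lost(self.snd_una) and self.dup_acks < DUP_THRESH:
            return "limited_transmit"            # step 6A
        return "fast_retransmit"                 # step 6B


sender = PreRecoverySender(snd_una=4000, high_data=8000)
for blocks in ([(4500, 5000)], [(4500, 5500)], [(4500, 6000)]):
    print(sender.on_ack(4000, blocks))
```

    With these three ACKs the sketch reports Limited Transmit twice and
    then Fast Retransmit. If the first ACK is lost, the second ACK of
    the remaining two already satisfies IsLost(SND.UNA) although the
    duplicate ACK counter has only reached two.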

285	3.  Discussion

287	    In scenarios where neither ACK loss nor reordering occurs and the
288	    first acknowledgement with SACK information is not the ACK held by
289	    the delayed acknowledgement mechanism, the new SACK information with
290	    each duplicate ACK covers a single segment. In such a case, this
291	    algorithm will trigger loss recovery after three duplicate
292	    acknowledgements and will allow transmission of a single new segment
293	    using Limited Transmit on the first and second duplicate ACK. This
294	    is identical to the behavior that would occur without this algorithm
295	    (assuming DupThresh is 3 and that all segments are SMSS sized). This
296	    scenario, together with other scenarios describing the behavior of
297	    the algorithm, is depicted in Appendix A.

299	    This algorithm SHOULD be used also with an ACK that contains a
300	    window update or opposite direction data that could not be
301	    considered as a duplicate ACK in the traditional algorithm. Such
302	    behavior is safe because the SACK information can only add more
303	    information to the current state of the sender; at worst, all
304	    received information is just redundant.

306	    Setting HighRxt to SND.UNA in Step 6A has no direct relation to this
307	    algorithm. Yet it is included in the algorithm to avoid confusion in
308	    how to implement SetPipe() correctly because it depends on having a
309	    valid HighRxt value [RFC3517].

311	    Potential issues to consider with the algorithm are discussed in
312	    the following subsections.

314	3.1.  Small Segment Sender

316	    If a TCP sender is sending small segments (usually intentionally
317	    overriding Nagle algorithm [RFC896]), the IsLost(SND.UNA) used in
318	    step 5 of the algorithm might fail to detect the need for loss
319	    recovery on the third duplicate acknowledgement because not enough
320	    octets have been SACKed to cover DupThresh * SMSS bytes above
321	    SND.UNA.  Therefore, the traditional duplicate ACK algorithm is
322	    needed as a fallback. Steps 3, 4 and the latter condition of step 5
323	    implement the traditional algorithm in parallel to the SACK block
324	    based detection.
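
    The following sketch illustrates why the fallback is needed. The
    function names, the 1460-octet SMSS, and the 100-octet segment size
    are assumptions chosen for the example. IsLost() is reduced to the
    two conditions of [RFC3517] evaluated for SND.UNA.

```python
SMSS = 1460       # assumed sender maximum segment size in octets
DUP_THRESH = 3    # DupThresh


def is_lost(sacked_blocks):
    """IsLost(SND.UNA) for merged half-open SACK blocks above SND.UNA."""
    sacked_octets = sum(end - start for start, end in sacked_blocks)
    return (len(sacked_blocks) >= DUP_THRESH
            or sacked_octets >= DUP_THRESH * SMSS)


def should_enter_recovery(sacked_blocks, dup_acks):
    """Step 5: IsLost(SND.UNA) or the fallback duplicate ACK counter."""
    return is_lost(sacked_blocks) or dup_acks >= DUP_THRESH


# Three 100-octet segments were SACKed contiguously above the hole at
# SND.UNA, one per duplicate ACK.
blocks = [(1100, 1400)]
print(is_lost(blocks))                    # False: one block, 300 octets
print(should_enter_recovery(blocks, 3))   # True: counter reached DupThresh
```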

326	    The number of duplicate ACKs is an artificial metric to estimate the
327	    number of segments the receiver has already in its receive buffer.
328	    How accurately they match depends on the scenario. Because of that,
329	    the goal of the duplicate ACK counter included in this algorithm
330	    is not to achieve bug-to-bug compatibility with the plain duplicate
331	    ACK counter but to estimate how many out-of-order segments the
332	    receiver has already queued in a more accurate way. Therefore, the
333	    duplicate ACK counter used as a fallback mechanism in this algorithm
334	    differs from the plain duplicate ACK counter. However, such
335	    differences indicate a scenario where the plain counter was not able
336	    to accurately keep track of the receiver state.

338	    While the fallback algorithm itself does not look at the
339	    acknowledgment field to decide whether an ACK is a "duplicate ACK",
340	    the duplicate ACK counter is not renamed in this document, as in
341	    practice most ACKs that increment the counter would still contain a
342	    duplicate acknowledgment number.  In contrast to the traditional
343	    approach, the only condition that must be satisfied to increment
344	    the duplicate ACK counter with this algorithm is that the
345	    acknowledgement MUST contain at least one in-window SACK block that
346	    covers octets that were not previously SACKed [RFC5681]. In cases
347	    with ACK losses or delayed ACKs, this condition can also be met by
348	    cumulative ACKs, receiver window updates, and opposite direction
349	    data segments, but the counter can still safely be incremented.

351	    As an alternative to the fallback algorithm, a TCP sender that is
352	    able to discern segment boundaries accurately can consider full
353	    segments in IsLost() regardless of segment size.  Therefore, such a
354	    TCP sender can avoid the problem with small segments using the
355	    IsLost(SND.UNA) check alone, which means that Steps 3, 4 and the
356	    latter condition of step 5 are redundant and do not have to be
357	    implemented.
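
    One possible reading of such a boundary-aware check is sketched
    below. The function name and the representation of the transmitted
    segments are hypothetical; the document does not prescribe how
    segment boundaries are stored. The sketch counts only segments that
    are completely covered by SACK information.

```python
DUP_THRESH = 3    # DupThresh


def is_lost_by_segments(seq, segments, sacked_blocks):
    """Return True once DupThresh whole segments above seq are fully SACKed.

    segments:      half-open (start, end) ranges as originally transmitted
    sacked_blocks: half-open (start, end) SACKed ranges, assumed merged
    """
    def fully_sacked(segment):
        return any(start <= segment[0] and segment[1] <= end
                   for start, end in sacked_blocks)

    count = sum(1 for segment in segments
                if segment[0] > seq and fully_sacked(segment))
    return count >= DUP_THRESH


# Four 100-octet segments; the first one (at SND.UNA = 1000) is missing.
segments = [(1000, 1100), (1100, 1200), (1200, 1300), (1300, 1400)]
print(is_lost_by_segments(1000, segments, [(1100, 1400)]))   # True
print(is_lost_by_segments(1000, segments, [(1100, 1300)]))   # False
```

    Because only whole segments are counted, a single segment reported
    in several partial SACK blocks does not contribute more than once
    in this sketch.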

359	    Note: the small segments problem is not unique to this algorithm;
360	    SACK-based loss recovery [RFC3517] also encounters it because of
361	    how IsLost() is defined.

363	3.2.  One Segment is Small

365	    A variant of the small segment sender case is the case where only
366	    one of the SACKed segments is smaller than SMSS (possible even with
367	    Nagle enabled).  If the TCP sender lacks the ability to use the
368	    improved method of discerning segment boundaries but still wants
369	    robustness against ACK losses in this case, it MAY extend the
370	    condition in Step 5 with the test:

372	        SACKed octets > SMSS * (DupThresh - 1)
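
    A worked example of the extended condition, assuming an SMSS of 1460
    octets and DupThresh of 3 (the function name and the numbers are
    illustrative only):

```python
SMSS = 1460       # assumed sender maximum segment size in octets
DUP_THRESH = 3    # DupThresh


def extended_step5(sacked_octets, is_lost, dup_acks):
    """Step 5 extended with the optional test of Section 3.2."""
    return (is_lost
            or dup_acks >= DUP_THRESH
            or sacked_octets > SMSS * (DUP_THRESH - 1))


# Two full-sized segments and one 200-octet segment have been SACKed, but
# ACK losses let only a single ACK through to the sender.
sacked = 1460 + 1460 + 200
print(sacked >= DUP_THRESH * SMSS)                          # False
print(extended_step5(sacked, is_lost=False, dup_acks=1))    # True
```

    Here 3120 SACKed octets fall short of DupThresh * SMSS (4380), so
    the octet rule of IsLost() does not fire, while the extended test
    (3120 > 2920) does.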

374	3.3.  SACK Capability Misbehavior

376	    If the receiver misbehaves by advertising SACK capability but never
377	    sending any SACK blocks when it should, this algorithm fails to
378	    enter loss recovery and a retransmission timeout is required for
379	    recovery. However, such misbehavior does not allow SACK-based loss
380	    recovery [RFC3517] to work either, and a TCP sender will require a
381	    timeout to recover in any case.

384	3.4.  Compatibility with Duplicate ACK based Loss Recovery Algorithms

386	    This algorithm SHOULD NOT be used together with a fast recovery
387	    algorithm that determines the segments that have left the network
388	    based on the number of arriving duplicate acknowledgements (e.g.,
389	    NewReno [RFC3782]), instead of the actual segments reported by SACK.
390	    In the presence of ACK reordering, such an algorithm will count the
391	    delayed duplicate acknowledgements during the fast recovery
392	    algorithm as extra while determining the number of packets that have
393	    left the network.

395	    In general there should be very little reason to combine this
396	    algorithm with a loss recovery algorithm that is based on inferior,
397	    non-SACK based information only.

399	4.  Security Considerations

401	    A malicious TCP receiver may send false SACK information for
402	    sequence number ranges which it has not received in order to trigger
403	    Fast Retransmit sooner. Such behavior would only be useful when out-
404	    of-order segments have arrived because otherwise the flow undergoes
405	    a loss recovery with a window reduction. This kind of lying involves
406	    guessing which segments will arrive later. In case the guess was
407	    wrong, the performance of the flow is ruined because the TCP sender
408	    will need a retransmission timeout as it will not retransmit the
409	    segments until it assumes SACK reneging. On a successful guess the
410	    attacker is able to trigger the recovery slightly earlier. The later
411	    segments would have allowed reporting the very same regions with
412	    SACK anyway. Therefore, the gain from this attack is small, hardly
413	    justifiable considering the drastic effect of a misguess. Also, a
414	    similar attack can be made with the duplicate acknowledgment based
415	    algorithm (even if the new SACK information rule is applied) by
416	    sending false duplicate acknowledgements with false SACK ranges, and
417	    trivially without the new SACK information rule.

419	    A variation of the lying attack discards reliability of the flow,
420	    but once reliability is not a concern of the receiver, a
421	    number of simpler ways exist to attack TCP independently of this
422	    algorithm. Thus this algorithm is not considered to weaken TCP
423	    security properties against false information.

425	    Splitting SACK blocks into chunks smaller than the received segment
426	    allows the receiver to make recovery start sooner because of the
427	    discontiguous check in IsLost(). However, by doing so the receiver
428	    neglects the possibility of reordering for little gain. If the
429	    segment was just reordered, the sender performs an unnecessary
430	    window reduction and an unnecessary retransmission of the reordered
431	    segment. Another variant of SACK block splitting simply tries to
432	    increase consumption of bandwidth, but with a small DupThresh value
433	    such as three, the difference between sending three duplicate ACKs
434	    (traditional algorithm) and a single ACK with SACK blocks does not
435	    offer enough benefit to make such an attack practical. In case
436	    the sender keeps track of segment boundaries and applies them in
437	    IsLost(), these attacks will not succeed as the sender cannot be
438	    misled into believing that a segment was split into multiple chunks.

440	5.  IANA Considerations

442	    This document has no actions for IANA.

444	6.  Acknowledgements

446	    The authors would like to thank Alexander Zimmermann and Anna
447	    Brunstrom for the comments on this document.

449	Appendix
450	A.  Scenarios

452	A.1.  Basic Case

454	    In this scenario no Delayed ACK, ACK losses, reordering or other
455	    "abnormal" behavior happens. For simplicity all the segments are
456	    SMSS sized.

458	    Once the TCP receiver gets the first out-of-order segment, it sends
459	    a duplicate ACK with SACK information about the received octets. The
460	    following two out-of-order segments trigger a duplicate ACK each,
461	    with the corresponding range SACKed in addition to the previously
462	    known information. The sender gets those duplicate ACKs in order,
463	    and each of them SACKs a new, previously unknown segment.

465	    This algorithm triggers loss recovery on the third duplicate ACK
466	    because IsLost returns true once DupThresh * SMSS bytes have been
467	    SACKed above SND.UNA on that acknowledgement; thus the behavior is
468	    identical to that of a sender which is using duplicate
469	    acknowledgments.  If Limited Transmit is in use, the first two
470	    duplicate ACKs each allow a single segment to be sent with either
471	    algorithm (the SACKed octets decrement Pipe by SMSS per ACK,
472	    allowing SMSS worth of new octets).

474	        ACK           Transmitted    Received    ACK Sent
475	        Received      Segment        Segment     (Including SACK Blocks)

477	        1000
478	                      3000-3499      3000-3499   (delayed ACK)
479	                      3500-3999      3500-3999   4000
480	        2000
481	                      4000-4499      (dropped)
482	                      4500-4999      4500-4999   4000, SACK=4500-5000
483	        3000
484	                      5000-5499      5000-5499   4000, SACK=4500-5500
485	                      5500-5999      5500-5999   4000, SACK=4500-6000
486	        4000
487	                      6000-6499      6000-6499   4000, SACK=4500-6500
488	                      6500-6999      6500-6999   4000, SACK=4500-7000
489	        4000, SACK=4500-5000
490	                      7000-7499      7000-7499   4000, SACK=4500-7500
491	        4000, SACK=4500-5500
492	                      7500-7999      7500-7999   4000, SACK=4500-8000
493	        4000, SACK=4500-6000
494	                      4000-4499      4000-4499   8000
495	        4000, SACK=4500-6500

497	A.2.  Delayed ACK

499	    In the basic case with delayed ACKs, the receiver sends the first
500	    ACK with SACK information, but since the previous ACK was sent with
501	    a lower sequence number because an acknowledgment was held by
502	    delayed ACK, the sender will not consider it a duplicate ACK.
503	    Because the segment contains SACK information that is identical to
504	    the basic case, the sender can use Limited Transmit with the same
505	    segments as in the basic case and will start loss recovery at the
506	    third acknowledgment, i.e., with the second duplicate
507	    acknowledgment. In the same situation the duplicate ACK based
508	    sender will have to wait for one more duplicate ACK to arrive to do
509	    the same, as the first acknowledgment is fully "wasted".

511	    Technically an acknowledgement with a sequence number higher than
512	    what was previously acknowledged is not a duplicate acknowledgement,
513	    but the presence of the SACK block tells another story, revealing
514	    that the receiver used delayed ACK and thus that a duplicate
515	    acknowledgement is missing in between. The response of a TCP sender
516	    taking advantage of such inferred duplicate acknowledgements is well
517	    within the guidelines of the packet conservation principle [Jac88]
518	    as it still sends only when segments have left the network.

520	        ACK           Transmitted    Received    ACK Sent
521	        Received      Segment        Segment     (Including SACK Blocks)

523	        1500
524	                      3000-3499      3000-3499   3500
525	                      3500-3999      3500-3999   (delayed ACK)
526	        2500
527	                      4000-4499      (dropped)
528	                      4500-4999      4500-4999   4000, SACK=4500-5000
529	        3500
530	                      5000-5499      5000-5499   4000, SACK=4500-5500
531	                      5500-5999      5500-5999   4000, SACK=4500-6000
532	        4000, SACK=4500-5000
533	                      6000-6499      6000-6499   4000, SACK=4500-6500
534	                      6500-6999      6500-6999   4000, SACK=4500-7000
535	        4000, SACK=4500-5500
536	                      7000-7499      7000-7499   4000, SACK=4500-7500
537	        4000, SACK=4500-6000
538	                      4000-4499      4000-4499   7500
539	        4000, SACK=4500-6500
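
    The counting difference in this scenario can be reproduced with a
    short sketch. Both functions are illustrative assumptions: the first
    counts duplicates by acknowledgment number only, the second applies
    steps 3 and 4 of Section 2 and tracks SACKed octets in a plain set,
    which is only practical for small traces such as this one.

```python
def traditional_count(acks):
    """Count duplicate ACKs by acknowledgment number only."""
    highest, dups = None, 0
    for ack, _blocks in acks:
        if highest is None or ack > highest:
            highest, dups = ack, 0
        elif ack == highest:
            dups += 1
    return dups


def sack_based_count(acks):
    """Count ACKs that carry previously unknown in-window SACK information."""
    highest, dups, known = None, 0, set()
    for ack, blocks in acks:
        if highest is None or ack > highest:      # step 3: cumulative ACK
            highest, dups = ack, 0
        reported = {octet for start, end in blocks
                    for octet in range(max(start, highest), end)}
        if reported - known:                      # step 4: new SACK info
            dups += 1
        known |= reported
    return dups


# ACKs received by the sender in the table above, up to the third ACK that
# carries SACK information: (acknowledgment number, SACK blocks).
trace = [
    (3500, []),
    (4000, [(4500, 5000)]),
    (4000, [(4500, 5500)]),
    (4000, [(4500, 6000)]),
]
print(traditional_count(trace))   # 2
print(sack_based_count(trace))    # 3
```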

541	A.3.  ACK Losses

543	    This case with ACK loss shares much behavior with the case with
544	    delayed ACK. If the hole at RCV.NXT is filled, the sender will
545	    notice that the cumulative ACK advanced.  In case of out-of-order
546	    segments, the first ACK which gets through to the sender includes
547	    SACK blocks up to the quantity the SACK block redundancy is able to
548	    cover.  With this algorithm the sender immediately makes use of all
549	    the information that is made available by the incoming ACK.

551	        ACK           Transmitted    Received    ACK Sent
552	        Received      Segment        Segment     (Including SACK Blocks)

554	        1000
555	                      3000-3499      3000-3499   (delayed ACK)
556	                      3500-3999      3500-3999   4000
557	        2000
558	                      4000-4499      (dropped)
559	                      4500-4999      4500-4999   4000, SACK=4500-5000
560	                                                 (dropped)
561	        3000
562	                      5000-5499      5000-5499   4000, SACK=4500-5500
563	                      5500-5999      5500-5999   4000, SACK=4500-6000
564	        4000
565	                      6000-6499      6000-6499   4000, SACK=4500-6500
566	                      6500-6999      6500-6999   4000, SACK=4500-7000
567	        4000, SACK=4500-5500 (two segments left the network)
568	                      7000-7499      7000-7499   4000, SACK=4500-7500
569	                      7500-7999      7500-7999   4000, SACK=4500-8000
570	        4000, SACK=4500-6000
571	                      4000-4499      4000-4499   8000
572	        4000, SACK=4500-6500
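
    The sender-side view of the trace above can be sketched in Python.
    This is only an illustration under stated assumptions, not the
    algorithm specified in this document: the helper newly_sacked() and
    the list-of-intervals scoreboard are hypothetical names introduced
    for the example, and sequence number wraparound is ignored.  The
    sketch counts how many previously unknown octets each arriving ACK
    reports as SACKed.

```python
def newly_sacked(scoreboard, block):
    """Return (new_octets, updated_scoreboard) for one SACK block.

    scoreboard: sorted list of disjoint half-open (left, right)
    intervals already known to be SACKed (hypothetical structure).
    block: the (left, right) SACK block carried by the arriving ACK.
    """
    left, right = block
    covered = 0                      # octets of the block already known
    merged = []
    new_left, new_right = left, right
    for start, end in scoreboard:
        if end < left or start > right:
            merged.append((start, end))          # no overlap, keep as is
        else:
            covered += max(0, min(end, right) - max(start, left))
            new_left = min(new_left, start)      # absorb into the block
            new_right = max(new_right, end)
    merged.append((new_left, new_right))
    merged.sort()
    return (right - left) - covered, merged


# SACK blocks in the order the sender receives them in the trace above.
# The ACK carrying SACK=4500-5000 was dropped and never arrives.
scoreboard = []
new_octets = []
for block in [(4500, 5500), (4500, 6000), (4500, 6500)]:
    count, scoreboard = newly_sacked(scoreboard, block)
    new_octets.append(count)

print(new_octets)   # [1000, 500, 500]
print(scoreboard)   # [(4500, 6500)]
```

    The first ACK that gets through reports 1000 new octets, that is,
    the two segments that left the network, even though it is a single
    duplicate ACK.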

574	A.4.  ACK Reordering

576	    With ACK reordering, an ACK is postponed.  Due to redundancy, the
577	    next ACK after the postponed one contains not only its own
578	    information but also the information of the reordered ACK (similar
579	    to the ACK loss case).  When the reordered ACK then arrives, the
580	    sender already knows the information it provides, and therefore
581	    this algorithm takes no action.

583	        ACK           Transmitted    Received    ACK Sent
584	        Received      Segment        Segment     (Including SACK Blocks)

586	        1000
587	                      3000-3499      3000-3499   (delayed ACK)
588	                      3500-3999      3500-3999   4000
589	        2000
590	                      4000-4499      (dropped)
591	                      4500-4999      4500-4999   4000, SACK=4500-5000
592	                                                 (delayed)
593	        3000
594	                      5000-5499      5000-5499   4000, SACK=4500-5500
595	                      5500-5999      5500-5999   4000, SACK=4500-6000
596	        4000
597	                      6000-6499      6000-6499   4000, SACK=4500-6500
598	                      6500-6999      6500-6999   4000, SACK=4500-7000
599	        4000, SACK=4500-5500
600	                      7000-7499      7000-7499   4000, SACK=4500-7500
601	                      7500-7999      7500-7999   4000, SACK=4500-8000
602	        4000, SACK=4500-6000
603	                      4000-4499      4000-4499   8000
604	        4000, SACK=4500-5000 (has only redundant information)
605	        4000, SACK=4500-6500
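
    The following Python sketch illustrates why the late ACK changes
    nothing.  It is a simplified model, not this document's algorithm:
    new_octets_per_ack() is a hypothetical helper, SACKed octets are
    kept in a plain set, and the in-order arrival sequence is an
    assumed comparison case that does not appear in the trace.

```python
def new_octets_per_ack(sack_blocks):
    """For each arriving SACK block, count the octets not seen before."""
    sacked = set()
    counts = []
    for left, right in sack_blocks:
        block = set(range(left, right))
        counts.append(len(block - sacked))   # only previously unknown octets
        sacked |= block
    return counts


# Arrival order from the trace above: SACK=4500-5000 arrives late.
reordered = [(4500, 5500), (4500, 6000), (4500, 5000), (4500, 6500)]
# Assumed comparison case: the same ACKs arriving in sending order.
in_order = [(4500, 5000), (4500, 5500), (4500, 6000), (4500, 6500)]

reordered_counts = new_octets_per_ack(reordered)
in_order_counts = new_octets_per_ack(in_order)

print(reordered_counts)  # [1000, 500, 0, 500]
print(in_order_counts)   # [500, 500, 500, 500]
```

    Both orders account for the same 2000 octets in total.  The
    reordered ACK contributes zero new octets, so a sender that reacts
    only to newly SACKed octets has nothing to do when it arrives.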

607	A.5.  Packet Duplication

609	    Packet duplication is caused by either unnecessary retransmission
610	    or hardware duplication.  It adds to the stream either a redundant
611	    ACK carrying only redundant information or a data segment that
612	    will trigger a redundant duplicate ACK (possibly with SACK and/or
613	    DSACK [RFC2883] information).  Because neither adds any new SACKed
614	    octets at the sender, this algorithm does nothing, whereas duplicate
615	    ACK counting would falsely treat it as a duplicate ACK.

617	    If one of the redundant ACKs is lost, the effect of duplication is
618	    just negated.

620	    It is possible for the sender to detect this case using DSACK alone.
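
    As an illustration of DSACK-based detection, the Python sketch
    below applies the check described in [RFC2883]: the first SACK
    block is a DSACK block if it lies below the cumulative ACK or is
    contained in the second SACK block.  The function name is_dsack()
    and the example ACKs are hypothetical, and sequence number
    wraparound is ignored for brevity.

```python
def is_dsack(cum_ack, sack_blocks):
    """Return True if the first SACK block reports duplicate data.

    cum_ack: cumulative acknowledgment number of the ACK.
    sack_blocks: list of half-open (left, right) blocks, in the order
    they appear in the SACK option.
    """
    if not sack_blocks:
        return False
    left, right = sack_blocks[0]
    if left < cum_ack:
        return True                  # duplicate of cumulatively ACKed data
    if len(sack_blocks) > 1:
        left2, right2 = sack_blocks[1]
        return left2 <= left and right <= right2   # subset of second block
    return False


# Hypothetical ACKs, using the sequence numbers of the earlier traces.
plain_sack = is_dsack(4000, [(4500, 5000)])
dup_above_hole = is_dsack(4000, [(4500, 5000), (4500, 7500)])
dup_below_ack = is_dsack(8000, [(3500, 4000)])

print(plain_sack, dup_above_hole, dup_below_ack)  # False True True
```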

622	A.6.  Mitigation of Blind Throughput Reduction Attack

624	    If an attacker knows or is able to guess the 4-tuple of a TCP
625	    connection, it may mount a blind throughput reduction attack
626	    [CPNI09].  In this attack TCP is tricked into sending duplicate
627	    ACKs to the other endpoint by means of out-of-window segments,
628	    which is considerably easier to achieve than matching the sequence
629	    numbers.  If dupThresh or more duplicate ACKs can be triggered in
630	    a row without any legitimate segment that advances the
631	    acknowledged sequence number, the other end acts on that false
632	    congestion signal and halves the window.

634	    With this algorithm such duplicate ACKs are filtered because they do
635	    not have any new in-window SACK blocks (DSACK [RFC2883] might be
636	    present though).
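
    The filtering effect can be sketched in Python as follows.  This is
    a simplified model under stated assumptions, not the algorithm of
    this document: summarize() is a hypothetical helper, every ACK
    whose cumulative ACK equals snd_una is counted as a duplicate ACK,
    the window is taken to be [snd_una, snd_nxt), and the two ACK
    sequences are invented for the example.

```python
def summarize(snd_una, snd_nxt, acks):
    """Return (dup_ack_count, in_window_sacked_octets) for a run of ACKs.

    acks: list of (cum_ack, sack_blocks) pairs, where sack_blocks is a
    list of half-open (left, right) blocks.
    """
    dup_acks = 0
    sacked = set()
    for cum_ack, blocks in acks:
        if cum_ack == snd_una:
            dup_acks += 1                    # classic duplicate ACK counting
        for left, right in blocks:
            lo, hi = max(left, snd_una), min(right, snd_nxt)
            if lo < hi:                      # keep only the in-window part
                sacked |= set(range(lo, hi))
    return dup_acks, len(sacked)


# Assumed attack: three ACKs triggered by out-of-window segments, the
# last one carrying a DSACK block for old, already acknowledged data.
attack = [(4000, []), (4000, []), (4000, [(1000, 1500)])]
# Assumed genuine loss of 4000-4499 with later segments arriving.
loss = [(4000, [(4500, 5000)]), (4000, [(4500, 5500)]),
        (4000, [(4500, 6000)])]

attack_result = summarize(4000, 8000, attack)
loss_result = summarize(4000, 8000, loss)

print(attack_result)  # (3, 0)
print(loss_result)    # (3, 1500)
```

    Both runs contain three duplicate ACKs, but only the genuine loss
    yields any in-window SACKed octets, which is what lets the two
    cases be told apart.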

638	References

640	Normative References

642	    [RFC793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
643	              793, September 1981.

645	    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow,
646	              "TCP Selective Acknowledgment Options", RFC 2018,
647	              October 1996.

649	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
650	              Requirement Levels", BCP 14, RFC 2119, March 1997.

652	    [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
653	              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
654	              January 2001.

656	    [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang,
657	              "A Conservative Selective Acknowledgment (SACK)-based
658	              Loss Recovery Algorithm for TCP", RFC 3517, April 2003.

660	    [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
661	              Control", RFC 5681, September 2009.

663	Informative References

665	    [AAA+09]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J.,
666	              and P. Hurtig, "Early Retransmit for TCP and SCTP",
667	              Internet-Draft, draft-ietf-tcpm-early-rexmt-01, January
668	              2009.

670	    [CPNI09]  Security Assessment of the Transmission Control Protocol
671	              (TCP).  Available at:
672	              http://www.cpni.gov.uk/Docs/tn-03-09-security-assessment-
673	              TCP.pdf

675	    [FARI08]  Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
676	              Acknowledgement Congestion Control to TCP",
677	              Internet-Draft, draft-floyd-tcpm-ackcc-06, July 2009.

679	    [Jac88]   Jacobson, V., "Congestion Avoidance and Control", In
680	              Proc. ACM SIGCOMM 88.

682	    [MM96]    Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
683	              Refining TCP Congestion Control", Proceedings of
684	              SIGCOMM'96, August 1996, Stanford, CA.

686	    [RFC896]  Nagle, J., "Congestion Control in IP/TCP Internetworks",
687	              RFC 896, January 1984.

689	    [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
690	              Extension to the Selective Acknowledgement (SACK) Option
691	              for TCP", RFC 2883, July 2000.

693	    [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
694	              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
695	              April 2004.

697	AUTHORS' ADDRESSES

699	    Ilpo Jarvinen
700	    University of Helsinki
701	    P.O. Box 68
702	    FI-00014 UNIVERSITY OF HELSINKI
703	    Finland
704	    Email: ilpo.jarvinen@helsinki.fi

706	    Markku Kojo
707	    University of Helsinki
708	    P.O. Box 68
709	    FI-00014 UNIVERSITY OF HELSINKI
710	    Finland
711	    Email: kojo@cs.helsinki.fi