[tcpm] Detect Lost Retransmit with SACK
Hi Richard,
first of all, welcome to the list :-)
Since your question is not really related to the poll, I changed the
title...
Comments inline.
On 09.11.2009, at 13:57, Scheffenegger, Richard wrote:
Hi Alexander et al.,
This will be the first post to this group, so excuse me if I act
inappropriately.
I'm curious about one little tidbit which has been bugging me for
the better part of the last two months, and which is closely related
to TCP SACK operations (thus it might belong to this thread?)
The implicit assumption for TCP fast recovery is that packet loss
happens randomly (i.e. to different segments each time) with low
correlation between the drop events. Also, a drop event is used as an
implicit signal to indicate congestion. So far, so good.
It seems to me, that the focus of most developments has been the
internet environment - where statistical assumptions like the above
mentioned arguably hold true.
However, certain high-speed LANs seem to exhibit characteristics
which don't play well with these implicit assumptions (uncorrelated
packet loss) - the smaller the network, the more deviation from a
"well seasoned" link (exhibiting some form of congestion) is likely
to occur.
Also, as has been noted in prior research, many internet routers do
use more "tcp-friendly" RED or WRED queue policies, over the
simplistic TailDrop most often encountered in LANs (default policy
of L2 switches and L3 routers).
In one extreme, I have found a (misbehaving?) TCP stack/host, which
sends out a burst of segments (4-6) @ 10GbE wirespeed, which
immediately causes queue buffer overload and TailDrop in the first
hop L2 switch, when two such high performance hosts try to establish
a high speed communication. In other words, the hosts themselves
seem to make sure that there is a high correlation between TCP
(fast) recovery and further packet loss.
But what puzzles me the most - even with SACK-enabled TCP stacks,
virtually no implementation can detect / act upon the loss of a
retransmitted segment during fast recovery. This is despite the
fact that the stipulations in RFC3517 require the receiver to
make the information needed to detect such an event implicitly
available to the sender: the first SACK option has to reflect the
last segment, which triggered this SACK.
Together with the scoreboard held at the sender, it should be rather
easy to find out if the left edge of the lowest hole (relative to
stream octets) closes.
What do you mean by "left edge of the lowest hole"? Do you mean SND.UNA?
If an ACK covers SND.UNA then it is a cumulative ACK.
If that left edge stays constant for "DupThresh" number of ACKs,
which reduce the overall number of octets in holes (any one hole
might close due to the retransmitted packets still received), AND
the sender retransmits beginning with the lowest hole first, this
would be a clear indication of another segment retransmit loss...
Sorry, I don't understand. If we have 20 segments in flight and one
segment gets lost, you will retransmit the oldest outstanding segment
after 3 DUPACKs.
Then, assuming no reordering and no further loss, you will get 17
DUPACKs (without Limited Transmit) before your hole is closed.
What am I missing here?
Can you give me an example?
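
A minimal sketch of the heuristic described above, given here only as
one possible reading of it. It assumes that the "left edge of the
lowest hole" equals SND.UNA, that DupThresh is 3, and that each ACK is
summarised as (SND.UNA, list of SACKed ranges). The class and function
names are hypothetical and not taken from any TCP stack. The counter
only advances when an ACK shrinks the total hole octets while the left
edge stays put, so the single-loss case with a long run of DUPACKs
does not trigger it.

```python
DUP_THRESH = 3   # assumed: reuse the fast-retransmit threshold
MSS = 1000       # assumed segment size, in octets


def hole_octets(snd_una, sacked):
    """Octets still missing between snd_una and the highest SACKed octet."""
    total, edge = 0, snd_una
    for start, end in sorted(sacked):
        if start > edge:
            total += start - edge
        edge = max(edge, end)
    return total


class LostRetransmitDetector:
    """Hypothetical helper illustrating the heuristic; not a real stack API."""

    def __init__(self):
        self.left_edge = None
        self.prev_holes = None
        self.count = 0

    def on_ack(self, snd_una, sacked):
        holes = hole_octets(snd_una, sacked)
        if snd_una != self.left_edge:
            # Left edge moved (or first ACK seen): restart the count.
            self.left_edge, self.count = snd_una, 0
        elif holes < self.prev_holes:
            # Another hole shrank while the lowest one did not close.
            self.count += 1
        self.prev_holes = holes
        return self.count >= DUP_THRESH


def run(acks):
    det = LostRetransmitDetector()
    return [det.on_ack(una, sacked) for una, sacked in acks]


# One loss (segment 0) out of 20: the SACK block only grows to the right.
single_loss = [(0, [(MSS, k * MSS)]) for k in range(2, 21)]

# Segments 0, 2, 4, 6 lost out of 10; retransmit of 0 is lost again,
# retransmits of 2, 4 and 6 arrive and close their holes one by one.
multi_loss = [
    (0, [(1000, 2000), (3000, 4000), (5000, 6000), (7000, 10000)]),
    (0, [(1000, 4000), (5000, 6000), (7000, 10000)]),
    (0, [(1000, 6000), (7000, 10000)]),
    (0, [(1000, 10000)]),
]

print(any(run(single_loss)))   # single loss: never flagged
print(run(multi_loss)[-1])     # lost retransmit flagged on the third shrinking ACK
```

The sketch ignores reordering and only covers the loss of the lowest
retransmission; both would need separate treatment.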
Even a less speedy detection logic would work for SACK-enabled
sessions: once the fast recovery is finished from the sender's point
of view, if the receiver still complains about missing segments
(indicated by having the SACK rightmost edge - in the first slot
SACK option - at a segment higher than when fast recovery started),
another round of fast recovery could be invoked, rather than waiting
for RTO.
Of course, the first approach would be better for low cwnd sessions
with only very few segments in transit - and both could be combined
with the proposed sack recovery speed-ups... (Reducing DupThresh for
low cwnd sessions / when little data is being sent).
Congestion control should react to this event (it does now, but only
one RTO later...), and the SACK retransmit vector (HighRxt) should be
reset, using LimitedTransmit for sending out the retransmission
segments once cwnd + pipe allows; any retransmitted segments still in
the network will close their respective SACK holes before the new
HighRxt advances to them.
And RTO should be reduced (I guess to nearly zero, between
SACK-enabled hosts).
I have run numerous tests, to check the behavior of different TCP
Stacks (FreeBSD 4.2 - 8.0; windows xp, vista, 7, 2003; Linux 2.6.16
and others).
All these stacks seem to exhibit this issue. What I don't know yet
is the percentage of multi-loss segment events triggering RTO - but
I assume that the majority of RTOs happen because of this.
In LAN environments (e.g. 10 GbE over 1 km @ 2 ms latency due to the
L2 hops in between) featuring relatively few streams, the effect of
any single RTO can be quite tremendous - taking considerable
theoretical bandwidth away from the session (e.g. 1 sec minimum RTO
equals 1.2 GB; even with more recent RTO values around 0.2 - 0.4
sec, each RTO is still a few hundred MB of "lost" capacity under
optimal circumstances).
Nevertheless, I can't imagine that I am the first one to bring up
this issue (despite having failed to find any study of this
effect). :)
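
As a rough check of these figures, the capacity left unused during one
RTO can be estimated as RTO times link rate. The sketch below assumes
the raw 10 Gbit/s line rate with no framing or header overhead, and
the function name is made up for this illustration.

```python
LINK_RATE_BPS = 10e9  # assumed: raw 10 GbE line rate, overhead ignored


def idle_capacity_bytes(rto_seconds, link_rate_bps=LINK_RATE_BPS):
    """Bytes the link could have carried while the sender waits for RTO."""
    return rto_seconds * link_rate_bps / 8


for rto in (1.0, 0.4, 0.2):
    mb = idle_capacity_bytes(rto) / 1e6
    print(f"RTO {rto:.1f} s -> about {mb:.0f} MB unused")
```

Under these assumptions a 1 s RTO corresponds to about 1.25 GB, and
0.2 - 0.4 s to about 250 - 500 MB, in line with the figures quoted
above.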
One more clarification, which came up after I looked at the FreeBSD
implementation of Limited Transmit; this might be a nit-pick, but
when RFC 3042 is active, shouldn't ABC also be used during
LimitedTransmit / FastRecovery?
Why? One reason for ABC is lying receivers (ACK Division). So, the
worst case is Slow-Start...
(FreeBSD MAIN is increasing cwnd by 1 mss for each new ACK, instead
of by the amount of data in that ACK...)
What do you describe here? Slow-Start?
RFC 3042 says: "The congestion window (cwnd) MUST NOT be changed when
these new segments are transmitted."
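
To make the ABC point concrete, here is a small sketch contrasting a
per-ACK increase with byte counting during slow start, using the
min(bytes acked, L * SMSS) rule from RFC 3465 with L = 2. The SMSS
value, the initial window and all function names are assumptions for
the illustration; it models neither FreeBSD nor Limited Transmit.

```python
SMSS = 1460   # assumed sender maximum segment size, in bytes
ABC_L = 2     # byte-counting limit L as suggested in RFC 3465


def per_ack_increase(cwnd, bytes_acked):
    """Classic slow start: one SMSS per ACK, however little it covers."""
    return cwnd + SMSS


def abc_increase(cwnd, bytes_acked, limit=ABC_L):
    """Appropriate Byte Counting: grow by the bytes actually acknowledged."""
    return cwnd + min(bytes_acked, limit * SMSS)


def grow(increase, acks, cwnd=2 * SMSS):
    for bytes_acked in acks:
        cwnd = increase(cwnd, bytes_acked)
    return cwnd


honest = [SMSS]              # one ACK covering one full segment
divided = [SMSS // 4] * 4    # the same data acknowledged in four pieces

print(grow(per_ack_increase, honest), grow(per_ack_increase, divided))
print(grow(abc_increase, honest), grow(abc_increase, divided))
```

With per-ACK counting the divided ACKs inflate cwnd by four segments
instead of one; with byte counting both receivers yield the same
window.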
Thanks a lot!
Best regards,
Alex
Richard Scheffenegger
Field Escalation Engineer
NetApp Global Support
NetApp
+43 1 3676811 3146 Office (2143 3146 - internal)
+43 676 654 3146 Mobile
www.netapp.com
Franz-Klein-Gasse 5
1190 Wien
* To: "tcpm at ietf.org WG Extensions" <tcpm at ietf.org>
* Subject: [tcpm] Should draft-ietf-tcpm-sack-recovery-entry update
RFC 3517 (SACK-TCP)
* From: Alexander Zimmermann <alexander.zimmermann at nets.rwth-aachen.de>
* Date: Wed, 21 Oct 2009 12:22:50 +0200
_____
Hi folks,
based on the fact that the draft "draft-ietf-tcpm-sack-recovery-
entry" is adopted as WG item now and intended to be a "standards
track" document, I would like to start a poll/discussion whether the
draft should update RFC 3517 or not? Moreover, should we produce a
separate document or an update of RFC 3517?
a) separate document, do not update RFC 3517
b) separate document, update RFC 3517
c) RFC3517bis, obsolete RFC 3517
//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimmermann at cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
_______________________________________________
tcpm mailing list
tcpm at ietf.org
https://www.ietf.org/mailman/listinfo/tcpm