Re: [tcpm] Detect Lost Retransmit mit SACK



See inline

Richard Scheffenegger
Field Escalation Engineer
NetApp Global Support 
NetApp
+43 1 3676811 3146 Office (2143 3146 - internal)
+43 676 654 3146 Mobile
www.netapp.com 
Franz-Klein-Gasse 5
1190 Wien 

 

> -----Original Message-----
> From: Arnd Hannemann [mailto:hannemann at i4.informatik.rwth-aachen.de] 
> Sent: Tuesday, 10 November 2009 17:43
> To: Scheffenegger, Richard
> Cc: Alexander Zimmermann; tcpm at ietf.org
> Subject: Re: Detect Lost Retransmit mit SACK
> 
> Hi Richard,
> 
> 
> Scheffenegger, Richard wrote:
::
> > 
> >   A1) With TSOpt, timestamps could be used to "remember" when
> >       retransmission started;
> >       when an ACK with TSecr > (TSOpt + RTT) is seen by
> >       the sender, it can
> >       re-arm the DUPACK detector.
> 
> This is your non-SACK scenario?
> Please note that a TCP receiver will NOT echo TSVal from 
> out-of-order segments. So this won't work.

Interesting; this doesn't seem to be what certain (most?) stacks are
doing. They seem to always echo, in the TSecr field of the ACK, the
TSopt/TS.recent value of the segment that triggered the ACK...

Or did you mean to say that the receiver will not respond with
TSopt, but rather with TS.recent?

If so, then this makes no difference: the sender will detect the
first ACK from the receiver that carries the TS in use when the
retransmission was started (well, actually TS+1, as multiple segments
are likely to carry the same TS). This will be the sign for the sender
that at least one RTT has elapsed (that was the intention for having
timestamps in the first place, right?). If the returned ACK does not
cover the first retransmitted segment by that time, it's quite
reasonable to assume that it got lost again.
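
To make the check above concrete, here is a minimal sketch in Python. The
names RetransmitWatch and retransmit_presumed_lost are made up for
illustration and do not come from any stack. It assumes that the echoed
timestamp keeps advancing during recovery (the very point under discussion
in this thread), and it ignores sequence and timestamp wraparound.

```python
from dataclasses import dataclass


@dataclass
class RetransmitWatch:
    """Hypothetical state recorded when fast retransmit starts."""
    ts_at_rexmit: int    # sender's timestamp clock when the retransmission was sent
    rexmit_end_seq: int  # sequence number just past the first retransmitted segment


def retransmit_presumed_lost(watch: RetransmitWatch, ack_seq: int, ts_ecr: int) -> bool:
    """True once an ACK echoes a timestamp newer than the one in use when
    the retransmission started, while the cumulative ACK still does not
    cover the first retransmitted segment."""
    rtt_elapsed = ts_ecr > watch.ts_at_rexmit        # the "TS+1" rule
    hole_still_open = ack_seq < watch.rexmit_end_seq
    return rtt_elapsed and hole_still_open


# Usage: segment 5000..6460 was retransmitted at timestamp 1000.
watch = RetransmitWatch(ts_at_rexmit=1000, rexmit_end_seq=6460)
print(retransmit_presumed_lost(watch, ack_seq=5000, ts_ecr=1000))  # same tick, keep waiting
print(retransmit_presumed_lost(watch, ack_seq=5000, ts_ecr=1001))  # one RTT later, hole still open
```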

Again, keep in mind that I'm looking at LANs; a TS in 10 ms increments
(100 Hz TCP clock) is likely to be put into a few hundred or thousand
packets...
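
As a rough back-of-the-envelope check of that figure, the following sketch
computes how many full-sized packets fit into one timestamp tick. The
1500-byte packet size and the two link rates are assumptions chosen for
illustration; framing overhead is ignored.

```python
def packets_per_tick(link_bps: float, tick_seconds: float, packet_bytes: int) -> float:
    """Number of packets of the given size a saturated link can carry
    during one timestamp clock tick."""
    return link_bps * tick_seconds / (packet_bytes * 8)


# 10 ms tick (100 Hz TCP clock), 1500-byte packets (assumed)
print(round(packets_per_tick(1e9, 0.010, 1500)))   # 1 Gbit/s  -> 833
print(round(packets_per_tick(10e9, 0.010, 1500)))  # 10 Gbit/s -> 8333
```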

If there is reason to believe that reordering can occur with delays >>
RTT, then yes, this simple detection logic would be fooled and falsely
trigger a retransmission.

However, if the reordering is >> 2*RTT, then it's very likely that the
newly retransmitted segment arrives at the receiver first (over the
faster path), and again this would only serve to keep the
application-visible latency (the time between the points at which TCP
can deliver some / any new data up the stack) as low as possible.



> >   A2) With SACK, tracking SND.NXT and HOLEBYTES (number of 
> >       octets in all current holes)
> >       can be employed to track RTT (they need to be 
> >       initialized when the first
> >       retransmitted segment is sent). When an ACK contains 
> >       a SACK for >= SND.NXT, or the
> >       HOLEBYTES are smaller than when retransmission started, the 
> >       DUPACK detector can be re-armed.
> 
> You should specify what you mean by SND.NXT.
> We are in recovery, so we may very well send out new segments.
> I don't see why the number of holes in the SACK scoreboard has
> anything to do with RTT.


SND.NXT is the next sequence number to be sent, i.e. the first octet
which the sender has never before sent out to the network. Isn't that
the meaning of SND.NXT according to RFC 793?

And I was not talking about the number of holes, but rather the amount
of data (in sum) missing between SND.UNA and FACK (defined by the
rightmost offset in the SACK scoreboard, i.e. covering the highest
segment ever received).

Monitoring the number of holes will gain nothing (when there is
reordering, for example).

But if any one retransmission makes it through, the amount of un-SACKed
data in the sum total of all holes in the scoreboard will decrease,
indicating that at least one RTT has passed since the start of
fast retransmit.
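
A minimal sketch of how HOLEBYTES and the re-arm condition from A2 could be
computed. The function names and the representation of the scoreboard as a
list of half-open (start, end) ranges are assumptions made for this
example; sequence wraparound is ignored.

```python
def hole_bytes(snd_una: int, sacked: list[tuple[int, int]]) -> int:
    """Sum of un-SACKed octets between SND.UNA and FACK, where FACK is the
    right edge of the highest SACKed range. Ranges may be unsorted or
    overlapping."""
    total = 0
    cursor = snd_una
    for start, end in sorted(sacked):
        if end <= cursor:
            continue                 # range already covered
        if start > cursor:
            total += start - cursor  # gap between cursor and this range is a hole
        cursor = end
    return total


def rtt_has_passed(snd_nxt_at_rexmit: int, hole_bytes_at_rexmit: int,
                   snd_una: int, sacked: list[tuple[int, int]]) -> bool:
    """Re-arm condition: data at or beyond the recorded SND.NXT was SACKed,
    or the holes have shrunk since retransmission started."""
    sack_beyond_nxt = any(end > snd_nxt_at_rexmit for _, end in sacked)
    holes_shrunk = hole_bytes(snd_una, sacked) < hole_bytes_at_rexmit
    return sack_beyond_nxt or holes_shrunk


# Usage: two holes of 1000 octets each when retransmission starts.
before = hole_bytes(1000, [(2000, 3000), (4000, 5000)])
print(before)                                            # 2000
# First hole filled, SND.UNA advances to 3000.
print(rtt_has_passed(5000, before, 3000, [(4000, 5000)]))  # True
```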



> 
> > 
> >   A3) If neither TSOpt nor SACK is used, revert to current behavior
> >       (RTO timeout to detect a lost retransmit).
> > 
> > B) DUPACK detector would need to be enhanced over what is currently
> >    in the RFCs (but I think most of that is already in Drafts, i.e.
> >    http://tools.ietf.org/html/draft-jarvinen-tcpm-sack-recovery-entry-01
> >    ).
> > 
> >  B1) With SACK enabled, fully duplicate ACKs (ACK and SACK have been
> >      seen before, and prev_HOLEBYTES == HOLEBYTES) can be discarded;
> >      lost ACKs would count for two
> >      (prev_HOLEBYTES < HOLEBYTES - SMSS).
> 
> Now B1 is SACK enabled scenario? You try to confuse me, right?
> All this makes little sense to me.


I agree, it would have made sense to label points B1 and B2 the other
way around to stay more consistent; that's one of the reasons why I
like to discuss this :)

> 
> > 
> >  B2) With TSOpt, if another RTT passes without advancing SND.UNA, 
> >      increase DupAcks += DupThresh
> > 
> >  When DupThresh # of ACKs are received, reset HighRxt = SND.UNA,
> >  DupAcks = 0; *)
> 
> See above...
> non-SACK is not going to work.


See above - TSOpt is only used for the sole purpose of detecting when
an RTT has passed. It doesn't matter if TSecr contains TSopt or
TS.recent...
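
For illustration, the B2 rule quoted above could look roughly like this.
RecoveryState and on_rtt_elapsed are hypothetical names, and how "one RTT
has passed" is detected is left to the caller; this is only a sketch of the
counter handling, not of any existing implementation.

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    """Hypothetical subset of sender state used during loss recovery."""
    snd_una: int
    high_rxt: int        # highest sequence number retransmitted so far
    dup_acks: int = 0
    dup_thresh: int = 3


def on_rtt_elapsed(state: RecoveryState, snd_una_one_rtt_ago: int) -> bool:
    """Call once per elapsed RTT. If SND.UNA did not advance, credit
    DupThresh duplicate ACKs; once the threshold is reached, rewind
    HighRxt so the segment at SND.UNA becomes eligible for retransmission
    again. Returns True when HighRxt was rewound."""
    if state.snd_una == snd_una_one_rtt_ago:
        state.dup_acks += state.dup_thresh
    if state.dup_acks >= state.dup_thresh:
        state.high_rxt = state.snd_una
        state.dup_acks = 0
        return True
    return False
```

Duplicate ACKs arriving in between would increment dup_acks by one each, so
both paths end in the same threshold check.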
 
> 
> > There are two comments I would like to make:
> > 
> > A) Latency: if the sender first sends out the entire range of holes
> >    before reacting to a detectable lost retransmitted packet, the
> >    latency observed by the application on the receiver side will
> >    increase needlessly. It might take a number of RTTs under certain
> >    circumstances before all the holes have been retransmitted once...
> 
> Show me an example where a lost retransmission is
> detectable while the sender did not send out retransmissions
> for every other hole.
> Sounds weird.

Can I attach trace files to posts on this list, or should I send them
to you directly?

I assume that on the receiving side, the application is only delivered
data which has been ACKed (not merely SACKed). If the sender waits until
RTO before re-sending a retransmission (sorry about these
multiple re-s :) ), the application will stall (not get any
new data) for at least RTO+2*RTT, and certain latency-sensitive
applications will time out before that. Again: I'm not talking about
Internet applications but rather LAN HPC applications with timeouts on
the order of a few seconds at most.

Making progress (on the receiver) within that application timeout is
usually enough to avoid running into it.


> 
> 
> > 
> > B) Complexity: As soon as any congestion event is encountered, TCP
> >    won't be in fastpath mode any longer; the added computational
> >    complexity to (reliably, see my check on FACK with Linux 2.6.18
> >    yesterday) trigger another retransmission of an already
> >    retransmitted segment is negligible compared to current best
> >    practice (waiting for RTO), when you need to move data around as
> >    quickly (low latency AND high bandwidth) as possible. As mentioned,
> >    my background is not so much the global Internet (with
> >    statistically valid implicit assumptions), but rather high-speed,
> >    short-range, (lossy) LANs, with only very few active TCP sessions
> >    at any one time. The hosts I deal with might have as few as 10-20
> >    TCP sessions, of which only 1-4 are pushing data, but those might
> >    run over dedicated 10GbE ports (overloading L2 switches in the
> >    process, and exhibiting burst loss (correlated packet loss)).
> 
> This probably varies highly from case to case. If you have a
> huge bandwidth-delay product, even SACK scoreboard handling
> will have computational complexity which might outweigh its benefits.
> As you seem to have the equipment at hand, you could do
> some CPU load and throughput measurements with 10GbE and SACK
> versus non-SACK flows.

Well, I was doing some tests like this:

http://www.ibm.com/developerworks/linux/library/l-tcp-sack/index.html

But the CPU impact (on top of what is required for 10G) was negligible.

But I have to admit, I haven't had the time yet to test a SACK scoreboard
with every other byte marked missing (which could mean some 0.5 million
entries).

But it seems that most implementations feature sorted scoreboards of
limited (16/128) size.

Traversing those is - again in my case of 10G LANs - still many orders
of magnitude faster than waiting for RTO.

I personally consider RTOs to be harmful (when there are other means to
recover with). :) (Perhaps once I have worked through all the issues
raised here, I'll submit a draft titled "TCP RTO considered harmful"
next April? :)


> 
> 
> > 
> > 
> > Thus my view might deviate from a "typical" view in certain aspects.
> > I might even be in error and would like to hear why some things
> > can't work as depicted. However, I would place more emphasis on the
> > timely recovery of any lost (or re-lost) segment than on making sure
> > that no single segment is retransmitted needlessly; better to
> > retransmit a little (!) too much too soon, instead of waiting for
> > RTO :) I want to point out the "little" though, as overwhelming an
> > already lossy network with too many retransmissions (i.e. setting
> > RTO = 0/1 tick, with a TCP clock running in msec) will only cause
> > more loss...
> > 
> 
> I think it is a good idea to detect lost retransmissions and
> I think it is possible with SACK. I already proposed a
> potential solution ;-) "Just" store the highest seq for each
> retransmission, and check incoming SACKs against those. My main
> concern is, as already said, that this might get quite
> expensive. But you are free to evaluate this more deeply ;-)
> Anyway, maybe we should focus on your other thread...
> 
> Best regards,
> Arnd
> 
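
The "store highest seq for each retransmission" idea quoted above could be
sketched as follows. This is only an illustration under assumptions: the
function name, the dict keyed by the retransmitted segment's start sequence
number, and the half-open SACK ranges are all made up here, entries are
assumed to be removed by the caller once they are cumulatively ACKed, and
reordering longer than one RTT would still fool the check.

```python
def lost_retransmissions(rexmit_marks: dict[int, int],
                         sacked: list[tuple[int, int]]) -> list[int]:
    """rexmit_marks maps the start sequence number of each retransmitted
    segment to SND.NXT at the moment it was retransmitted. A
    retransmission is presumed lost when data sent after it has been
    SACKed while the retransmitted segment itself is still missing."""
    highest_sacked = max((end for _, end in sacked), default=0)
    lost = []
    for seg_start, snd_nxt_then in sorted(rexmit_marks.items()):
        still_missing = not any(s <= seg_start < e for s, e in sacked)
        if still_missing and highest_sacked > snd_nxt_then:
            lost.append(seg_start)
    return lost


# Usage: segment at 1000 was retransmitted when SND.NXT was 6000.
marks = {1000: 6000}
print(lost_retransmissions(marks, [(2000, 6000)]))  # [] - nothing newer SACKed yet
print(lost_retransmissions(marks, [(2000, 7460)]))  # [1000] - newer data arrived first
```

The cost concern raised above shows up in the loop: each incoming SACK is
checked against every outstanding retransmission, so a bounded or sorted
structure would matter at high rates.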

Note: Messages sent to this list are the opinions of the senders and do not imply endorsement by the IETF.