Re: [tcpm] Detect Lost Retransmit mit SACK


Hi Richard,

it has happened: I'm completely lost in this double-threaded thread. I really suggest we stop
this ping-pong discussion, since IMHO nobody can follow us anymore...
My suggestion is that you

* start writing a short I-D with at least a pseudo-code algorithm
* and plenty of *well-indented* examples, like Ilpo does in his I-D
	=> basic case, multiple (burst) loss, ACK loss, reordering, packet duplication, ...

With such a document, it will be much easier to follow your thoughts.
What do you think?

Alex

On 10.11.2009 at 19:28, Scheffenegger, Richard wrote:

> 
> See inline
> 
> Richard Scheffenegger
> Field Escalation Engineer
> NetApp Global Support 
> NetApp
> +43 1 3676811 3146 Office (2143 3146 - internal)
> +43 676 654 3146 Mobile
> www.netapp.com 
> Franz-Klein-Gasse 5
> 1190 Wien 
> 
> 
> 
>> -----Original Message-----
>> From: Arnd Hannemann [mailto:hannemann at i4.informatik.rwth-aachen.de] 
>> Sent: Tuesday, 10 November 2009 17:43
>> To: Scheffenegger, Richard
>> Cc: Alexander Zimmermann; tcpm at ietf.org
>> Subject: Re: Detect Lost Retransmit mit SACK
>> 
>> Hi Richard,
>> 
>> 
>> Scheffenegger, Richard wrote:
> ::
>>> 
>>>  A1) With TSOpt, timestamps could be used to "remember" when
>>>      retransmission started. When an ACK with TSecr > (TSOpt + RTT)
>>>      is seen by the sender, it can re-arm the DUPACK detector.
>> 
>> This is your non-SACK scenario?
>> Please note that a TCP receiver will NOT echo TSVal from 
>> out-of-order segments. So this won't work.
> 
> Interesting; this doesn't seem to be what certain (most?) stacks are
> doing - they seem to always echo in the ACK's TSecr the value of the
> TSopt/TS.recent that triggered the ACK...
>
> Or did you mean to say that the receiver will not respond with
> TSopt, but rather with TS.recent?
>
> If so, then this makes no difference - the sender will detect the
> first ACK from the receiver that echoes the TS in use when the
> retransmission was started (well, actually TS+1, as multiple segments
> are likely to carry the same TS). This will be the sign for the sender
> that at least one RTT has elapsed (that was the intention for having
> timestamps in the first place, right?). If the returned ACK does not
> cover the first retransmitted segment by that time, it's quite
> reasonable to assume that it got lost again.
> 
> Again, keep in mind that I'm looking at LANs; a TS in 10 ms increments
> (100 Hz TCP clock) is likely to be put into a few hundred or a few
> thousand packets...
>
> If there is reason to believe that reordering can occur with delays much
> larger than the RTT, then yes, this simple detection logic would be
> fooled and would falsely trigger a retransmission.
>
> However, IF the reordering is >> 2*RTT, then it's very likely that the
> newly retransmitted segment arrives at the receiver first (over the
> faster path), and again this would only serve to keep the
> application-visible latency as low as possible (between the points in
> time where TCP can deliver some / any new data up the stack...)
> 
> 
> 
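
A minimal sketch of the timestamp check Richard describes above, for readers who want to see it spelled out. It assumes, as Richard does, that the echoed timestamp keeps advancing during recovery; Arnd's objection is that a receiver following the TS.recent rules will not echo timestamps from out-of-order segments, in which case the check would not fire. The class and method names are hypothetical, and sequence-number and timestamp wraparound are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetransmitRttMarker:
    """Remembers when a retransmission started and flags a presumed re-loss."""
    ts_at_rexmit: Optional[int] = None    # sender timestamp clock when the retransmission left
    rexmit_end_seq: Optional[int] = None  # sequence number just past the retransmitted segment

    def on_retransmit(self, ts_now: int, seg_end_seq: int) -> None:
        self.ts_at_rexmit = ts_now
        self.rexmit_end_seq = seg_end_seq

    def on_ack(self, tsecr: int, ack_seq: int) -> bool:
        """Return True if the retransmitted segment is presumed lost again."""
        if self.ts_at_rexmit is None:
            return False
        if ack_seq >= self.rexmit_end_seq:
            # The cumulative ACK covers the retransmitted segment: it arrived.
            self.ts_at_rexmit = None
            return False
        # "TS+1": segments sent just before the retransmission may share its
        # clock tick, so only a strictly newer echo proves one RTT has passed.
        return tsecr > self.ts_at_rexmit
```

With a 100 Hz timestamp clock the check has a granularity of one tick (10 ms), which on a LAN can be many RTTs.
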
>>>  A2) With SACK, tracking SND.NXT and HOLEBYTES (number of octets in
>>>      all current holes) can be employed to track RTT (they need to
>>>      be initialized when the first retransmitted segment is sent).
>>>      When an ACK contains a SACK for >= SND.NXT, or the HOLEBYTES
>>>      are smaller than when retransmission started, the DUPACK
>>>      detector can be re-armed.
>> 
>> You should specify what you mean by SND.NXT.
>> We are in recovery, so we may very well send out new segments.
>> I don't see why the number of holes in the SACK scoreboard has
>> anything to do with RTT.
> 
> 
> SND.NXT is the rightmost segment which the sender has never before 
> sent out to the network. Isn't that the meaning of Snd.nxt according 
> to RFC793?
> 
> And I was not talking about the number of holes, but rather the amount
> of data (in sum) between FACK (defined by the rightmost offset in the
> SACK scoreboard, i.e. covering the highest segment ever received) and
> SND.UNA.
>
> Monitoring the number of holes gains nothing (when there is
> reordering, for example).
>
> But if any one retransmission makes it through, the amount of unSACKed
> data summed over all holes in the scoreboard will decrease,
> indicating that at least one RTT has passed since the start of
> FastRetransmit.
> 
> 
> 
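
The HOLEBYTES bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: SACK blocks are given as (left, right) pairs with an exclusive right edge, sequence-number wraparound is ignored, and the function names `hole_bytes` and `rtt_elapsed` are hypothetical, not taken from any stack.

```python
def hole_bytes(snd_una, sack_blocks):
    """Sum of unSACKed octets between snd_una and the highest SACKed octet."""
    # Clip blocks to the window above snd_una and walk them in order.
    blocks = sorted((max(left, snd_una), right)
                    for left, right in sack_blocks if right > snd_una)
    total = 0
    cursor = snd_una
    for left, right in blocks:
        if left > cursor:
            total += left - cursor  # gap between the previous edge and this block
        cursor = max(cursor, right)
    return total


def rtt_elapsed(snd_una, sack_blocks, snd_nxt_at_start, hole_bytes_at_start):
    """True once the A2 condition holds: data at or beyond the SND.NXT recorded
    at the start of retransmission has been SACKed, or the holes have shrunk."""
    sacked_beyond = any(right > snd_nxt_at_start for _, right in sack_blocks)
    return sacked_beyond or hole_bytes(snd_una, sack_blocks) < hole_bytes_at_start
```

For example, with SND.UNA = 1000 and SACK blocks (2000, 3000) and (4000, 5000), the holes sum to 2000 octets; once the retransmission of 1000..2000 is ACKed, SND.UNA moves to 3000 and the sum drops to 1000.
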
>> 
>>> 
>>>  A3) If neither TSOpt nor SACK is used, revert to current behavior
>>>      (RTO timeout to detect a lost retransmit).
>>>
>>> B) DUPACK detector would need to be enhanced over what is currently
>>>    in the RFCs (but I think most of that is already in drafts, i.e.
>>>    http://tools.ietf.org/html/draft-jarvinen-tcpm-sack-recovery-entry-01 ).
>>>
>>> B1) With SACK enabled, fully duplicate ACKs (ACK and SACK have been
>>>     seen before, and prev_HOLEBYTES == HOLEBYTES) can be discarded;
>>>     lost ACKs would count for two
>>>     (prev_HOLEBYTES < HOLEBYTES - SMSS).
>> 
>> Now B1 is SACK enabled scenario? You try to confuse me, right?
>> All this makes little sense to me.
> 
> 
> I agree, it would have made sense to label points B1 and B2 the other
> way around to stay more consistent; that's one of the reasons why I like
> to discuss this :)
> 
>> 
>>> 
>>> B2) With TSOpt, if another RTT passes without advancing SND.UNA, 
>>>     increase DupAcks += DupThresh
>>> 
>>> When DupThresh # of ACKs are received, reset HighRxt = SND.UNA, 
>>> DupAcks = 0; *)
>> 
>> See above...
>> non-SACK is not going to work.
> 
> 
> See above - TSOpt is used only for its primary purpose: detecting when
> an RTT has passed. It doesn't matter if TSecr contains TSopt or
> TS.recent...
> 
>> 
>>> There are two comments I would like to make:
>>> 
>>> A) Latency: if the sender first sends out the entire range of holes
>>>    before reacting to a detectable lost retransmitted packet, the
>>>    latency observed by the application on the receiver side will
>>>    increase needlessly. It might take a number of RTTs under certain
>>>    circumstances before all the holes have been retransmitted once...
>> 
>> Show me an example where a lost retransmission is
>> detectable while the sender has not yet sent out retransmissions
>> for every other hole.
>> Sounds weird.
> 
> Can I attach trace files to posts on this list, or shall I send them
> to you directly?
>
> I assume that on the receiving side, the application is only delivered
> data which has been ACKed (not SACKed). If the sender waits until
> RTO before retrying to re-send a retransmission (sorry about these
> multiple re-s :) ), the application will stall (not get any
> new data) for at least RTO+2*RTT - and certain latency-sensitive
> applications will time out before that. Again: I'm not talking about
> Internet applications but rather LAN HPC applications with timeouts on
> the order of a few seconds at most.
>
> Advancing (on the receiver) within that application timeout is
> usually enough to avoid running into the application timeout.
> 
> 
>> 
>> 
>>> 
>>> B) Complexity: As soon as any congestion event is encountered, TCP
>>>    won't be in fastpath mode any longer; the added computational
>>>    complexity to (reliably, see my check on FACK with Linux 2.6.18
>>>    yesterday) trigger another retransmission of an already
>>>    retransmitted segment is negligible, compared to current best
>>>    practice (waiting for RTO), when you need to move data around as
>>>    quickly (low latency AND high bandwidth) as possible. As
>>>    mentioned, my background is not so much the global Internet (with
>>>    statistically valid implicit assumptions), but rather high-speed,
>>>    short-range, (lossy) LANs, with only very few active TCP sessions
>>>    at any one time. The hosts I deal with might have as few as 10-20
>>>    TCP sessions, of which only 1-4 are pushing data, but those might
>>>    run over dedicated 10GbE ports (overloading L2 switches in the
>>>    process, and exhibiting burst loss (correlated packet loss) ).
>> 
>> This probably varies greatly from case to case. If you have a
>> huge bandwidth-delay product, even SACK scoreboard handling
>> will have a computational complexity which might outweigh its benefits.
>> As you seem to have the equipment at hand, you could do
>> some CPU load and throughput measurements with 10GbE and SACK
>> versus non-SACK flows.
> 
> Well, I was doing some tests like this:
> 
> http://www.ibm.com/developerworks/linux/library/l-tcp-sack/index.html
> 
> But the CPU impact (on top of what is required for 10G) was negligible.
>
> But I have to admit, I haven't had the time yet to test a SACK scoreboard
> with every other byte marked missing (which could mean some 0.5 million
> entries).
>
> But it seems that most implementations feature sorted scoreboards of
> limited (16/128) size.
>
> Traversing those is - again in my case of 10G LANs - still many orders
> of magnitude faster than waiting for RTO.
>
> I personally consider RTOs to be harmful (when there are other means to
> recover with). :) (Perhaps once I have worked through all the issues
> raised here, I'll submit a draft titled "TCP RTO considered harmful"
> next April? :)
> 
> 
>> 
>> 
>>> 
>>> 
>>> Thus my view might deviate from a "typical" view in certain aspects.
>>> I might even be in error and would like to hear why some things
>>> can't work as depicted. However, I would place more emphasis on the
>>> timely recovery of any lost (or re-lost) segment than on making sure
>>> that no single segment is retransmitted needlessly; better to
>>> retransmit a little (!) too much too soon than to wait for RTO :)
>>> I want to point out the "little" though, as overwhelming an already
>>> lossy network with too many retransmissions (i.e. setting RTO = 0/1
>>> tick, with a TCP clock running in msec) will only cause more loss...
>>>
>>> 
>> 
>> I think it is a good idea to detect lost retransmissions and
>> I think it is possible with SACK. I already proposed a
>> potential solution ;-) "Just" store the highest seq for each
>> retransmission, and check incoming SACKs against those. My main
>> concern is, as already said, that this might get quite
>> expensive. But you are free to evaluate this more deeply ;-)
>> Anyway, maybe we should focus on your other thread...
>> 
>> Best regards,
>> Arnd
>> 
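
One possible reading of Arnd's "store highest seq for each retransmission" proposal, sketched under assumptions: "highest seq" is taken to mean SND.NXT at the moment of retransmission, SACK blocks are (left, right) pairs with an exclusive right edge, reordering and sequence wraparound are ignored, and the class name `LostRetransmitDetector` is hypothetical. If data sent after the retransmission has been SACKed while the retransmitted segment is still missing, the retransmission is presumed lost.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LostRetransmitDetector:
    # start seq of a retransmitted segment -> SND.NXT when it was retransmitted
    pending: Dict[int, int] = field(default_factory=dict)

    def on_retransmit(self, seg_start: int, snd_nxt: int) -> None:
        self.pending[seg_start] = snd_nxt

    def on_ack(self, snd_una: int, sack_blocks: List[Tuple[int, int]]) -> List[int]:
        """Return start seqs of retransmissions presumed lost again."""
        # Forget retransmissions that were cumulatively ACKed or SACKed.
        for seq in list(self.pending):
            sacked = any(left <= seq < right for left, right in sack_blocks)
            if seq < snd_una or sacked:
                del self.pending[seq]
        highest_sacked = max((right for _, right in sack_blocks), default=snd_una)
        # Data sent after the retransmission got through, the retransmission did not.
        lost = sorted(seq for seq, mark in self.pending.items()
                      if highest_sacked > mark)
        for seq in lost:
            del self.pending[seq]
        return lost
```

The per-ACK cost here is linear in the number of outstanding retransmissions and SACK blocks, which is the expense Arnd is worried about for large windows.
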

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimmermann at cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//

