Re: [tcpm] Detect Lost Retransmit mit SACK
Hi Richard,
it has happened: I'm completely lost in this double-threaded thread. I really suggest we stop
this ping-pong discussion, since IMHO nobody can follow us.
My suggestion is that you
* start writing a short I-D with at least a pseudocode algorithm
* and a lot of *well-indented* examples, like Ilpo does in his I-D
=> basic case, multiple (burst) loss, ACK loss, reordering, packet duplication, ...
With such a document, it will be much easier to follow your thoughts.
What do you think?
Alex
On 10.11.2009 at 19:28, Scheffenegger, Richard wrote:
>
> See inline
>
> Richard Scheffenegger
> Field Escalation Engineer
> NetApp Global Support
> NetApp
> +43 1 3676811 3146 Office (2143 3146 - internal)
> +43 676 654 3146 Mobile
> www.netapp.com
> Franz-Klein-Gasse 5
> 1190 Wien
>
>
>
>> -----Original Message-----
>> From: Arnd Hannemann [mailto:hannemann at i4.informatik.rwth-aachen.de]
>> Sent: Tuesday, 10 November 2009 17:43
>> To: Scheffenegger, Richard
>> Cc: Alexander Zimmermann; tcpm at ietf.org
>> Subject: Re: Detect Lost Retransmit mit SACK
>>
>> Hi Richard,
>>
>>
>> Scheffenegger, Richard wrote:
> ::
>>>
>>> A1) With TSOpt, timestamps could be used to "remember" when
>>> retransmission started.
>>> When an ACK with TSecr > (TSOpt + RTT) is seen by
>>> the sender, it can
>>> re-arm the DUPACK detector.
>>
>> This is your non-SACK scenario?
>> Please note that a TCP receiver will NOT echo TSVal from
>> out-of-order segments. So this won't work.
>
> Interesting; this doesn't seem to be what certain (most?) stacks are
> doing - they always seem to echo in the ACK's Tsecr the value of
> Tsopt/TS.recent of the segment triggering the ACK...
>
> Or was your intention to say that the receiver will not respond with
> Tsopt, but rather TS.recent?
>
> If so, then this makes no difference - the sender will detect the
> first ACK from the receiver which echoes the TS in use when the
> retransmission was started (well, actually TS+1, as multiple segments
> are likely to carry the same TS). This will be the sign for the sender
> that at least one RTT has elapsed (that was the intention for having
> timestamps in the first place, right?). If the returned ACK does not
> cover the first retransmitted segment by that time, it's quite
> reasonable to assume that it got lost again.
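
[Editorial sketch] The check described in the paragraph above can be written down roughly as follows. This is only a minimal illustration in Python, not code from any stack: the names RetransmitProbe, ts_at_retransmit and rexmit_end_seq are hypothetical, sequence-number and timestamp wraparound are ignored, and it assumes the echoed timestamp actually advances while the hole is open, which is the very point being disputed in this thread.

```python
from dataclasses import dataclass


@dataclass
class RetransmitProbe:
    """State remembered when the first retransmission is sent (hypothetical names)."""
    ts_at_retransmit: int  # sender's timestamp clock value when retransmission started
    rexmit_end_seq: int    # sequence number just past the first retransmitted segment


def retransmit_probably_lost(probe: RetransmitProbe, ack_tsecr: int, ack_seq: int) -> bool:
    """True once an ACK echoes a later timestamp than the one in use when the
    retransmission started, while the cumulative ACK still does not cover the
    retransmitted segment."""
    # "TS+1": many segments share one clock tick, so require a strictly later tick.
    one_rtt_elapsed = ack_tsecr > probe.ts_at_retransmit
    still_missing = ack_seq < probe.rexmit_end_seq
    return one_rtt_elapsed and still_missing
```

With a 100 Hz timestamp clock the tick granularity (10 ms) is far coarser than a LAN RTT, so in this sketch the check fires at the first ACK echoing a later tick, not after exactly one RTT.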
>
> Again, keep in mind that I'm looking at LANs; a TS in 10 ms increments
> (100 Hz TCP clock) is likely to be put into a few hundred or thousand
> packets...
>
> If there is reason to believe that reordering can occur with delays >>
> RTT, then yes, this simple detection logic would be fooled and falsely
> trigger a retransmission.
>
> However, IF the reordering is >> 2*RTT, then it's very likely that the
> newly retransmitted segment arrives at the receiver first (over the
> faster path), and again this would only serve to keep the
> application-visible latency as low as possible (between the points in
> time where TCP can deliver some / any new data up the stack...)
>
>
>
>>> A2) With SACK, tracking SND.NXT and HOLEBYTES (number of
>>> octets in all current holes) can be employed to track RTT
>>> (they need to be initialized when the first retransmitted
>>> segment is sent). When an ACK contains a SACK for >= SND.NXT,
>>> or the HOLEBYTES are smaller than when retransmission started,
>>> the DUPACK detector can be re-armed.
>>
>> You should specify what you mean by SND.NXT.
>> We are in recovery, so we may very well send out new segments.
>> I don't see why the number of holes in the SACK scoreboard has
>> anything to do with RTT.
>
>
> SND.NXT is the rightmost segment which the sender has never before
> sent out to the network. Isn't that the meaning of SND.NXT according
> to RFC793?
>
> And I was not talking about the # of holes, but rather the amount of
> data (in sum) between FACK (defined by the rightmost offset in the SACK
> scoreboard, i.e. covering the highest ever received segment) and
> SND.UNA.
>
> Monitoring holes will gain nothing (when there is reordering, for
> example).
>
> But if any one retransmission makes it through, the amount of unSACKed
> data in the sum total of all holes in the scoreboard will decrease -
> indicating that at least one RTT has passed since the start of
> FastRetransmit.
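
[Editorial sketch] To make the HOLEBYTES quantity concrete, here is one way it could be computed from a SACK scoreboard. This is a minimal Python illustration under stated assumptions: hole_bytes is a hypothetical helper, SACK blocks are modeled as half-open (start, end) byte ranges, and sequence-number wraparound is ignored.

```python
from typing import List, Tuple


def hole_bytes(snd_una: int, sack_blocks: List[Tuple[int, int]]) -> int:
    """Sum of un-SACKed octets between SND.UNA and the highest SACKed offset (FACK)."""
    # Drop blocks that lie entirely below SND.UNA and clip the rest to it.
    blocks = sorted((max(start, snd_una), end)
                    for start, end in sack_blocks if end > snd_una)
    total = 0
    cursor = snd_una  # everything below cursor is already accounted for
    for start, end in blocks:
        if start > cursor:
            total += start - cursor  # gap between cursor and this block is a hole
        cursor = max(cursor, end)
    return total
```

For example, with SND.UNA = 1000 and SACK blocks 2000-3000 and 4000-5000 the helper yields 2000 octets of holes. If the retransmission of 1000-2000 arrives, SND.UNA advances to 3000 and the value drops to 1000, which is the decrease used above as the sign that a retransmission got through.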
>
>
>
>>
>>>
>>> A3) If neither TSOpt nor SACK is used, revert to current behavior
>>> (RTO timeout to detect a lost retransmit).
>>>
>>> B) DUPACK detector would need to be enhanced over what is currently
>>> in the RFCs (but I think most of that is already in drafts, i.e.
>>> http://tools.ietf.org/html/draft-jarvinen-tcpm-sack-recovery-entry-01
>>> ).
>>>
>>> B1) With SACK enabled, fully duplicate ACKs (ACK and SACK have been
>>> seen before, and prev_HOLEBYTES == HOLEBYTES) can be discarded;
>>> lost ACKs would count for two
>>> (prev_HOLEBYTES < HOLEBYTES - SMSS).
>>
>> Now B1 is SACK enabled scenario? You try to confuse me, right?
>> All this makes little sense to me.
>
>
> I agree, it would have made sense to label points B1 and B2 the other
> way around to stay more consistent; that's one of the reasons why I
> like to discuss this :)
>
>>
>>>
>>> B2) With TSOpt, if another RTT passes without advancing SND.UNA,
>>> increase DupAcks += DupThresh
>>>
>>> When DupThresh # of ACKs are received, reset HighRxt = SND.UNA,
>>> DupAcks = 0; *)
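
[Editorial sketch] The counter handling in B2 can be illustrated as below. This is only a rough Python reading of the two sentences above, not a specification: DupAckState and the function names are hypothetical, DUP_THRESH = 3 is the conventional value and an assumption here, and how "another RTT has passed" is detected is left to the caller.

```python
from dataclasses import dataclass

DUP_THRESH = 3  # conventional DupThresh; an assumption for this sketch


@dataclass
class DupAckState:
    snd_una: int       # oldest unacknowledged sequence number
    high_rxt: int      # highest sequence number retransmitted so far
    dup_acks: int = 0  # duplicate ACK counter


def maybe_rearm(state: DupAckState) -> bool:
    """When DupThresh is reached, reset HighRxt to SND.UNA and clear the counter."""
    if state.dup_acks >= DUP_THRESH:
        state.high_rxt = state.snd_una
        state.dup_acks = 0
        return True  # caller may retransmit starting from SND.UNA again
    return False


def on_dup_ack(state: DupAckState) -> bool:
    """An ordinary duplicate ACK counts once."""
    state.dup_acks += 1
    return maybe_rearm(state)


def on_rtt_without_progress(state: DupAckState) -> bool:
    """B2: a full RTT without SND.UNA advancing counts as DupThresh duplicate ACKs."""
    state.dup_acks += DUP_THRESH
    return maybe_rearm(state)
```

In this reading, an RTT without progress always re-arms immediately, while ordinary duplicate ACKs need DupThresh of them to have the same effect.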
>>
>> See above...
>> non-SACK is not going to work.
>
>
> See above - TSOpt is only used for the primary purpose of detecting
> when an RTT has passed. It doesn't matter if Tsecr contains TSopt or
> TS.recent...
>
>>
>>> There are two comments I would like to make:
>>>
>>> A) Latency: if the sender first sends out the entire range of holes
>>> before reacting to a detectable lost retransmitted packet, the
>>> latency observed by the application on the receiver side will
>>> increase needlessly. It might take a number of RTTs under certain
>>> circumstances before all the holes have been retransmitted once...
>>
>> Show me an example where a lost retransmission is
>> detectable while the sender has not yet sent out retransmissions
>> for every other hole.
>> Sounds weird.
>
> Can I attach trace files to posts on this list, or shall I send them
> to you directly?
>
> I assume that on the receiving side, the application is only delivered
> data which has been ACKed (not SACKed). If the sender waits until
> RTO before retrying to re-send a retransmission (sorry about these
> multiple re-s :) ), the application will at least stall (not get any
> new data) for RTO+2*RTT - and certain latency-sensitive applications
> will time out before that. Again: I'm not talking about Internet
> applications but rather LAN HPC applications with timeouts on the
> order of a few seconds at most.
>
> Advancing (on the receiver) within that application timeout is
> usually enough to not run into the application timeout.
>
>
>>
>>
>>>
>>> B) Complexity: As soon as any congestion event is encountered, TCP
>>> won't be in fastpath mode any longer; the added computational
>>> complexity to (reliably, see my check on FACK with Linux 2.6.18
>>> yesterday) trigger another retransmission of an already
>>> retransmitted segment is negligible compared to current best
>>> practice (waiting for RTO), when you need to move data around as
>>> quickly (low latency AND high bandwidth) as possible. As mentioned,
>>> my background is not so much the global Internet (with statistically
>>> valid implicit assumptions), but rather high-speed, short-range,
>>> (lossy) LANs, with only very few active TCP sessions at any one
>>> time. The hosts I deal with might have as few as 10-20 TCP sessions,
>>> of which only 1-4 are pushing data, but those might run over
>>> dedicated 10GbE ports (overloading L2 switches in the process, and
>>> exhibiting burst loss (correlated packet loss) ).
>>
>> This probably varies greatly from case to case. If you have a
>> huge bandwidth-delay product, even SACK scoreboard handling
>> will have computational complexity which might outweigh its benefits.
>> As you seem to have the equipment at hand, you could do
>> some CPU load and throughput measurements with 10GbE and SACK
>> versus non-SACK flows.
>
> Well, I was doing some tests like this:
>
> http://www.ibm.com/developerworks/linux/library/l-tcp-sack/index.html
>
> But the CPU impact (on top of what is required for 10G) was negligible.
>
> But I have to admit, I haven't had the time yet to test a SACK scoreboard
> with every other byte marked missing (which could mean some 0.5 million
> entries).
>
> But it seems that most implementations feature sorted scoreboards of
> limited (16/128) size.
>
> Traversing those is - again in my case of 10G LANs - still many orders
> of magnitude faster than waiting for RTO.
>
> I personally consider RTOs to be harmful (when there are other means to
> recover with). :) (Perhaps once I have worked through all the issues
> raised here, I'll submit a draft with "TCP RTO considered harmful" next
> April? :)
>
>
>>
>>
>>>
>>>
>>> Thus my view might deviate from a "typical" view in certain aspects.
>>> I might even be in error and would like to hear why some things
>>> can't work as depicted. However, I would place more emphasis on the
>>> timely recovery of any lost (or re-lost) segment over making sure
>>> that no single segment is retransmitted needlessly; better to
>>> retransmit a little (!) too much too soon, instead of waiting for
>>> RTO :) I want to point out the "little" though, as overwhelming an
>>> already lossy network with too many retransmissions (i.e. setting
>>> RTO = 0/1 tick, with a TCP clock running in msec) will only cause
>>> more loss...
>>>
>>
>> I think it is a good idea to detect lost retransmissions and
>> I think it is possible with SACK. I already proposed a
>> potential solution ;-) "Just" store the highest seq for each
>> retransmission, and check incoming SACKs against those. My main
>> concern is, as already said, that this might get quite
>> expensive. But you are free to evaluate this more deeply ;-)
>> Anyway, maybe we should focus on your other thread...
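
[Editorial sketch] One possible reading of the "store highest seq for each retransmission" idea is sketched below in Python. It is an illustration only, not the proposal's actual algorithm: LostRetransmitDetector and its methods are hypothetical names, "highest seq" is taken to mean SND.NXT at the moment of the retransmission, SACK blocks are half-open (start, end) ranges, and both reordering and sequence wraparound are ignored.

```python
from typing import Dict, List, Tuple


class LostRetransmitDetector:
    """Remember, per retransmitted segment, how far the sender had sent at that time."""

    def __init__(self) -> None:
        # segment start -> (segment end, SND.NXT when the retransmission was sent)
        self._marks: Dict[int, Tuple[int, int]] = {}

    def on_retransmit(self, seg_start: int, seg_end: int, snd_nxt: int) -> None:
        self._marks[seg_start] = (seg_end, snd_nxt)

    def on_ack(self, ack_seq: int, sack_blocks: List[Tuple[int, int]]) -> List[int]:
        """Return start sequence numbers of retransmissions that appear lost."""
        lost = []
        for seg_start, (seg_end, mark) in list(self._marks.items()):
            covered = ack_seq >= seg_end or any(
                s <= seg_start and seg_end <= e for s, e in sack_blocks)
            if covered:
                del self._marks[seg_start]  # the retransmission arrived
            elif any(e > mark for _, e in sack_blocks):
                # Data sent after the retransmission has been SACKed while the
                # retransmitted segment itself is still missing.
                lost.append(seg_start)
        return sorted(lost)
```

The cost concern raised above shows up here as the loop over all remembered retransmissions on every ACK; a real implementation would presumably bound or index this state.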
>>
>> Best regards,
>> Arnd
>>
//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimmermann at cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//