
Re: [dhcwg] max-unacked-bndupd



On Aug 17, 2006, at 1:53 PM, David W. Hankins wrote:
On Thu, Aug 17, 2006 at 11:33:09AM -0700, Damien Neil wrote:
I'm also curious as to the reason for the focus on ensuring that
CONTACT messages are processed in a timely fashion.  In the case
where the failover connection is clogged with BNDUPD messages,
there's no need to send CONTACTs--since the tSend timer will be reset
every time a BNDUPD message is sent, and the tReceive timer will be
reset every time a BNDUPD message is read.  CONTACT messages are only

I believe the timers would be reset for any message, not specific ones (don't know if you meant that to be specific or not):

Right; I specified BNDUPDs because they're the only other type of message that's likely to be sent at that time.
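
To make the timer behaviour concrete, here is a minimal sketch of the bookkeeping as described above: any message sent resets tSend, and any message read resets tReceive. The class and method names, and the timeout values in the usage lines, are hypothetical and not taken from the draft.

```python
class FailoverTimers:
    """Hypothetical model of one peer's tSend/tReceive bookkeeping."""

    def __init__(self, now=0):
        self.last_sent = now
        self.last_received = now

    def on_send(self, msg_type, now):
        # Any outgoing message resets tSend, not only CONTACT.
        self.last_sent = now

    def on_receive(self, msg_type, now):
        # Any incoming message resets tReceive, not only CONTACT.
        self.last_received = now

    def need_contact(self, now, t_send):
        # A CONTACT is only needed if nothing was sent for t_send seconds.
        return now - self.last_sent >= t_send

    def peer_timed_out(self, now, t_receive):
        # The peer is presumed down if nothing was read for t_receive seconds.
        return now - self.last_received >= t_receive


# Usage: a steady stream of BNDUPDs (one per second, assumed) keeps both
# timers fresh, so no CONTACT is required while the stream lasts.
timers = FailoverTimers()
for t in range(1, 31):
    timers.on_send("BNDUPD", t)
    timers.on_receive("BNDUPD", t)
busy_needs_contact = timers.need_contact(30, t_send=10)
idle_needs_contact = timers.need_contact(45, t_send=10)
```

With these assumed values, `busy_needs_contact` is false while BNDUPDs are flowing, and `idle_needs_contact` becomes true only after the link has been quiet for the full tSend interval.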



Imagine a system has 10 buckets for update messages internally,
no matter how big the tcp buffer sizes are.  It fills all 10,
but none of them are being processed - a lock on the database
has held up processing, let's imagine, so none of them can complete.

The server isn't down, it can still be answering DHCP clients (say
from a memory cache of said database) - it just has...commitment
problems.

Okay. Let's call this server A, and its peer server B.

There's no worry that server B will conclude that server A is down, so long as A continues to send CONTACT messages.

If the TCP connection from B to A blocks, there's still no worry, as long as A reads another message in a timely fashion. The fact that B can't send more messages isn't a concern, since there are messages queued in A's receive buffer--when it goes to read a message, it will find one, and reset its tReceive timer.

There's only a problem if A stops reading messages because it has too many pending BNDUPDs. In this case, A may conclude that B is down. This is the scenario that the max-unacked-bndupd option is intended to prevent.
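
A rough simulation of this scenario, under assumptions that are mine rather than the draft's: time advances in one-second ticks, B sends one message per tick (a BNDUPD while it is under its unacked limit, otherwise a CONTACT), A's database is locked so nothing is ever acked, and A stops reading entirely once all of its internal buckets are full. The function name and default values are hypothetical.

```python
def a_times_out(max_unacked, buckets, t_receive=60, duration=300):
    """Return True if A's tReceive expires while its database is locked."""
    pending = 0        # BNDUPDs held in A's internal buckets
    unacked = 0        # BNDUPDs B has sent and not yet seen acked
    last_received = 0  # last tick at which A read any message
    for now in range(1, duration + 1):
        if now - last_received >= t_receive:
            return True      # A concludes that B is down
        if pending >= buckets:
            continue         # A has stopped reading; nothing resets tReceive
        if unacked < max_unacked:
            # B is still allowed to send a BNDUPD; A reads and queues it.
            unacked += 1
            pending += 1
        # Otherwise B is at its limit and sends a CONTACT, which A reads
        # without consuming a bucket.
        last_received = now
    return False


# Limit equal to A's capacity: A fills up, stops reading, and times out.
at_capacity = a_times_out(max_unacked=10, buckets=10)
# Limit just below A's capacity: A always has room to keep reading.
below_capacity = a_times_out(max_unacked=9, buckets=10)
```

In this model the outcome depends only on how max-unacked-bndupd compares with A's internal capacity; the TCP buffer size does not appear anywhere.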


Imagine a different system: Peer A reads a single failover message every second, processes it, and continues. 600 BNDUPD messages are sent to it, followed by a CONTACT. It will take A ten minutes to reach and process that CONTACT--but there is no chance that A will conclude that peer B is down, since A is processing the other messages and resetting tReceive on each one.
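
The arithmetic can be checked with a short replay of that message stream. This is only a sketch: the function name is hypothetical, and it assumes A reads exactly one message per second and resets tReceive on every message read.

```python
def replay(messages, seconds_per_message=1):
    """Return (longest gap between tReceive resets, time CONTACT is read)."""
    now = 0
    last_received = 0
    longest_gap = 0
    contact_time = None
    for msg in messages:
        now += seconds_per_message
        # Every message read resets tReceive, whatever its type.
        longest_gap = max(longest_gap, now - last_received)
        last_received = now
        if msg == "CONTACT" and contact_time is None:
            contact_time = now
    return longest_gap, contact_time


longest_gap, contact_time = replay(["BNDUPD"] * 600 + ["CONTACT"])
# longest_gap is 1 second; contact_time is 601 seconds (about ten minutes)
```

The CONTACT is not read until roughly ten minutes in, yet tReceive never goes more than one second without a reset.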



So I think the text is correct: it has more to do with the remote
system's processing (e.g., database) blocking than TCP blocking, but the
result, and what's trying to be avoided, is TCP blocking.

I think that TCP blocking is unimportant; only the remote system processing (database, etc.) is important. If the TCP connection blocks, then the remote peer unquestionably has pending messages available to reset its tReceive timer--problems only occur when the remote peer stops reading additional messages, regardless of whether the TCP channel has blocked or not.


To look at this in a different light: If a peer has a TCP receive buffer of infinite size, such that the TCP connection never blocks, should it send a very high value for max-unacked-bndupd? Certainly not, if its internal processing blocks after half a dozen BNDUPDs.

                - Damien



_______________________________________________
dhcwg mailing list
dhcwg at ietf.org
https://www1.ietf.org/mailman/listinfo/dhcwg