[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Comments on draft-chandra-ospf-manet-ext-01.txt



Russ White wrote:

"The first Hello packet with a particular SCS number MUST contain the
full state associated with that SCS number, and thus MUST NOT set the 'N'
bit in the State Check Sequence TLV."

Assuming that the Hello is multicast to all neighbors, does the "full
state" include all neighbors?  Or does it include only the neighbors that
recently changed state?


"Full state" should probably be changed here. It's actually the full change state, or rather all the changes occurring since the last SCS change.

From Section 2.3.6 (Receiving Hellos), I see that the initial Hello for a
new SCS number (with the N bit not set) must be received correctly by all
neighbors, and a neighbor must send a Hello request if it discovers that
it missed the initial Hello for a given SCS number. This concerns me,
because in a MANET, a node will typically miss a large percentage of
Hellos, and thus will have to unicast a large number of Hello requests to
each neighbor.

It is possible to design the Hello protocol so that it is OK to miss 2
out of 3 Hellos (for example) without having to send a Hello request.
TBRPF (RFC 3684) does by reporting each neighbor change in 3 (or k)
consecutive Hellos, so that each neighbor will either learn about the
state change, or will declare the neighbor lost after missing 3
consecutive Hellos from the neighbor. (To detect this, the Hello sequence
number is incremented with each transmitted Hello.)


If you do this, you don't need the sequence numbers at all. I'm concerned about this on several fronts, though....

First, if you are losing two out of three hellos on a regular basis, the
link quality is likely to be so low that it's unusable for passing data
traffic. Second, even if you lose two out of three hellos, you must also
always miss the specific hello with a state change. So, the only place I
see advertising all state changes for three hellos as being more efficient
than the incremental hello mechanism outlined in the draft is this: If you
always have a state change in at least one out of every three hello's, and
the hello with the state change information is always lost. While this
seems possible in some short term situations, I find it stretching to think
this would happen so consistently that it's always better to send the state
change information three times.

It depends on the scenario, being more beneficial to do this in very
dense networks with highly mobile nodes. If a node has 100 highly mobile
neighbors, it could easily have a state change with almost every hello,
and if a node misses 25% of the hellos from its neighbors, it would have
to unicast 25 hello requests for each hello interval. But it occurred to
me that you could do this anyway, i.e., send each state change 2 or 3
times(with the N bit not set), as an option. Simulations would be
helpful to evaluate the benefit of doing this.


Another issue is that, because different nodes can have different
transmission ranges, it is possible for a node to have many 1-way
neighbors that never become 2-way. Unfortunately, your incremental Hellos
will have to report all of these 1-way neighbors forever in Hellos. In
TBRPF, when a 1-way neighbor is discovered, it is reported in at most 3
Hellos. (The correctness proof for TBRPF shows that this works, and TBRPF
has been throroughly tested.)


?? Then how does TBRPF form an adjacency in this situation?

A---B

A sees B, but B doesn't see A. This condition lasts for 90 seconds, long
enough for A to report B, without B seeing A's hello's. Now, B moves
somewhat closer, and starts reporting A in it's hellos. Of course, at this
point, B has no idea A can see it, and thus two way state will never be
reached.


Sure it will, because node A will then see that B is reporting A in its hellos, and will again report B in its hellos. This is step 4b in Section 7.4 of RFC 3684. (However, this does not prove correctness. The correctness proof is nontrival.)



If there's a solution to this, then we could easily adapt it to the
incremental hello solution, as well, just for one-way neighbors, without
using the "advertise three times and drop" rule. You could state that as
soon as A sees its own router id in B's hellos, it begins to advertise in
its hello's until full state is reached. Note you couldn't use the three
and drop rule in this case, because it's entirely possible the first three
hello's A originates with B's router id before two way state is acheived
could be dropped, thus leaving the two devices forever in one way state,
with no way out.


That won't happen with TBRPF, because if B misses those 3 hellos from A, then B will declare A to be lost, and will report the loss of A in 3 hellos (and A will either receive one of those 3 hellos or will declare B to be lost). You can see how the proof might be nontrival, so I am including it at the end of this message.


1.   If the node is an active overlapping relay for the adjacent
    speaker, then the router MUST immediately relay any information
    received from the adjacent speaker.


This could cause inifinite looping, since two nodes could be overlapping
relays (MPRs) for each other. I guess you mean either: (a) the LSA is
relayed if it has not already been relayed, or (b) the LSA is relayed if
it has not already been received. The inefficiency of doing (a) has
already been discussed on this mailing list, as well as the possible
incorrectness of doing (b).


?? The normal OSPF flooding rules are applied here, so you would not get into a flooding loop.


OK, then you are indeed doing (b), which should work fine in your case (since you allow non-MPR nodes to relay LSAs when necessary).

Richard




:-)

Russ

__________________________________
riw at cisco.com CCIE <>< Grace Alone


Correctness Proof for TBRPF Neighbor Discovery

The TBRPF Neighbor Discovery (TND) protocol, described in RFC 3684,
uses differential HELLO messages, which report only *changes*
in the status of links. This results in HELLO messages that
are much smaller than those of other link-state protocols such
as OSPF, in which each node must report all of its neighbors in
HELLO messages at least once per HELLO interval.

Since TND uses differential HELLO messages, its correctness is not
obvious, because HELLO messages are not sent reliably, and thus a
HELLO that reports a change in the status of a link to a neighbor
might not be received by the neighbor.

We first review some aspects of TND that will help to understand
the proof.  Although TND supports multiple interfaces, TND is run
separately on each interface, so without loss of generality we
consider only one interface.

In TND, each node must broadcast (to all neighbors) at least one
HELLO message every HELLO_INTERVAL seconds.  Each HELLO message
includes three (possibly empty) lists: the NEIGHBOR REQUEST list
includes neighbors whose status has recently changed to 1-WAY,
the NEIGHBOR REPLY list includes neighbors whose status has recently
changed to 2-WAY, and the NEIGHBOR_LOST list includes neighbors
whose status has recently changed to LOST.

When a node i changes the status of a link (i,j), it includes the
neighbor address j in the appropriate list
(NEIGHBOR REQUEST/REPLY/LOST) in NBR_HOLD_COUNT (NHC, typically 3)
consecutive HELLOs, or until it receives a reply from node j
(whichever comes first). This ensures that node j will either
receive one of these HELLOs, or will miss NHC consecutive HELLOs
from node i. In the latter case, node j will declare the link
(j,i) to be LOST.

TND employs hysteresis as follows.  A node i must receive at least
HELLO_ACQUIRE_COUNT (HAC, typically 2) of the last
HELLO_ACQUIRE_WINDOW (HAW, typically 3) HELLOs sent by node j
before changing the status of link (i,j) from LOST (or nonexistent)
to 1-WAY or 2-WAY.  However, once the link (i,j) becomes 1-WAY
or 2-WAY, node i changes its status to LOST only if it misses NHC
consecutive HELLOs from node j, or if it receives a HELLO from
node j which includes node i in the NEIGHBOR LOST list.
(A node detects that NHC consecutive HELLOs were missed either
based on the HELLO sequence number of by not receiving any
HELLO for NBR_HOLD_TIME seconds).

Definition: We say that link (i,j) satisfies the 1-WAY condition
at time t, if there exists a time t' < t such that node i received
HAC of the last HAW HELLOs from node j at time t', and node i did
not miss any NHC consecutive HELLOs from node j during the time
interval [t', t].

It is clear from the description of TND that if a link (i,j) satisfies
the 1-WAY condition, then node i considers the link to be either 1-WAY
or 2-WAY. It is also clear that if a link (i,j) does not satisfy the
1-WAY condition, then node i considers the link to be LOST (or
nonexistent).  However, it is not clear, for example, that node i
will correctly detect a change in the status of link (i,j) from 2-WAY
to 1-WAY. This example is covered by the unidirectional case in
the following theorem, which proves the correctness of TND by
considering the three possible cases: bidirectional, unidirectional,
and lost in both directions.

Theorem. Consider two nodes i and j at a given time t.
(a) Bidirectional case: If links (i,j) and (j,i) have both satisfied
the 1-WAY property for at least 2*NHC*HELLO_INTERVAL seconds, then
both nodes i and j consider the link between them to be 2-WAY.

(b) Unidirectional case: If link (i,j) has satisfied the 1-WAY
property for at least NHC HELLO intervals, and (j,i) has failed
to satisfy the 1-WAY property for at least HNC*HELLO_INTERVAL
seconds, then node i considers the link (i,j) to be 1-WAY, and
node j considers the link (j,i) to be LOST.

(c) LOST in both directions: If links (i,j) and (j,i) both fail to
satisfy the 1-WAY property, then both nodes i and j consider the
link between them to be LOST.

Proof. Case (c) is trivial since, as mentioned above, if link (i,j)
does not satisfy the 1-WAY property, then node i must consider
the link to be LOST.

Now consider case (a). Since links (i,j) and (j,i) both satisfy the
1-WAY condition at time t, both nodes i and j must consider the
link between them to be 1-WAY or 2-WAY.  Let t' be the last time
that either node changed the link status from LOST/nonexistent
to 1-WAY or 2-WAY, and assume without loss of generality that this
status change occured at node i.  (There must be such a time since
the status of each link is initially LOST/nonexistent.)
It follows that (i,j) and (j,i) both satisfied the 1-WAY condition
during the interval [t', t], which by the Theorem's assumption
implies that t - t' is at least 2*NHC*HELLO_INTERVAL seconds. Since
the link status at node j was 1-WAY or 2-WAY at time t', node i must
have received a HELLO message from node j at time t', which could
not have included node i in the LOST list.  In response, node i
included node j in the REQUEST or REPLY list in its next NHC HELLOs
after time t', or until it saw itself in the REPLY list of a HELLO
message sent by node j.  If the latter case occured, then the status
at node j at some time after t' was 2-WAY, and upon receiving this
message, node i then changed the link status to 2-WAY which would
imply that the link status at both nodes is still 2-WAY at time t,
since a link status cannot change from 2-WAY unless one of the two
nodes declares the neighbor LOST.

If the former case occured, then node j would have received one of
the NHC HELLO messages (by time t' + NHC*HELLO_INTERVAL), since
otherwise it would have changed the link status to LOST.  Node j
would therefore respond by setting the link status to 2-WAY and
including node i in the REPLY list in its next NHC HELLOs, and
node i would have received one of these NHC HELLOs (by time
t' + 2*NHC*HELLO_INTERVAL < t), since otherwise it would have changed
the link status to LOST.  Upon receiving one of these HELLOs, node i
would have changed the link status to 2-WAY. Again, this implies that
the link status at both nodes is still 2-WAY at time t, since a state
cannot transition from 2-WAY unless one of the two nodes declares
the neighbor LOST.

Now consider case (b). Since link (j,i) fails to satisfy the 1-WAY
condition, then (as mentioned above) node j must consider the link
to be LOST at time t.  We consider two subcases: (b1) link (j,i)
never satisfied the 1-WAY condition, i.e., was always LOST;
and (b2) link (j,i) satisfied the 1-WAY condition at some time in
the past.  If subcase (b1) is true, then node j would never include
node i in the REQUEST or REPLY list of any transmitted HELLO.
As a result, node i could not have changed the status of link (i,j)
to 2-WAY.  Therefore, since link (i,j) satisfies the 1-WAY condition
(implying that node i must consider the link to be 1-WAY or 2-WAY),
node i must consider the link to be 1-WAY.

Now suppose subcase (b2) is true, and let t' be the last time that
node j changed the status of link (j,i) from 1-WAY or 2-WAY to LOST.
Then node j must have included node i in the LOST list in its next
NHC HELLOs after time t'. Node i either received one of these HELLOs,
or missed NHC consecutive HELLOs; in either case node i must have
considered the link to be LOST at some time between time t' and
time t.  Since node j considered the link to be LOST during the
entire interval [t', t], it could not have included node i in the
REQUEST or REPLY list of any transmitted HELLO during this interval.
Therefore, as in subcase (b1), node i could not have changed the status
of link (i,j) to 2-WAY.  Therefore, since link (i,j) satisfies the
1-WAY condition, node i must consider the link to be 1-WAY.
This establishes case (b) and completes the proof of the theorem.