[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Comments on draft-spagnolo-manet-ospf-design-00.txt



Gary,

Please see my comments below.

Gary Pei wrote:

Dear Richard,
Thank you very much for your detailed comments. Please see my in-line
comments.

On Mon, 26 Jul 2004 09:07:06 -0700, Richard Ogier <ogier at EARTHLINK.NET>
wrote:

Please consider my comments below for the draft
http://www.ietf.org/internet-drafts/draft-spagnolo-manet-ospf-design-00.txt
to be presented next week in San Diego.

Overall, the draft presents some nice analysis, simulation results,
and conclusions, and I was happy to see that ideas from my draft
http://www.ietf.org/internet-drafts/draft-ogier-manet-ospf-extension-01.txt
(recently updated) were considered. However, I disagree with some of
the conclusions as noted below.

My comments are divided into the following 5 items:

1. Simulation Model
2. Originator-based LSA or ACK supression
3. Simulation results for reliable vs. unreliable flooding
4. Reliable flooding without ACKs, using periodic DDESC packets
5. Differential Hellos and differential LSAs

Items 2 and 3 are a bit long, so I first summarize them, and later
provide more details in items 2a and 3a.

1. Simulation Model

Although the analysis considered up to 100 nodes, the simulation
results considered only 20 nodes, which is quite limited.
The high mobility case (for 20 nodes) had a neighbor change rate of
0.14, or one neighbor change every 7.14 sec on average.
If there are 100 nodes and thus 5 times as many neighbors, then the
neighbor change rate should increase to about 0.70, or one neighbor
change every 1.43 sec, so that an LSA would be generated almost every
MinLSInterval (5 sec), which would trigger the "periodic update" mode
(possibly with non-ackable LSAs) in any of the hybrid flooding schemes
we have considered (whether or not prediction of neighbor changes is
used). Thus, your results are highly dependent on your choice of
number of nodes and mobility level.


The neighbor change rate is more closely related to node density than the network size. It is not necessarily true that 100 nodes will have 5 times higher neighbor changing rate. In fact, I am expecting that the neighbor changing rate remains constant if we keep the density the same for both 100 nodes and 20 nodes network.


*If* you keep the density constant, then I agree. I was talking about the case where the density (average number of neighbors) increases 5 fold. For example, Figure 1 assumes that avg-neighbors = #nodes/2. I don't think we want to assume that the number of neighbors will never exceed 10!



In addition, a larger number of nodes would result in larger Hellos,
thus making differential Hellos more beneficial in a highly mobile
network (in which a smaller Hello interval would be used). As indicated
in your analysis (Figure 1), using 100 nodes and a 2 second Hello
interval results in about 62 times as much Hello overhead as using 20
nodes and a 10 second Hello interval (when full Hellos are used).


Figure 1 assumes the area is same for all network sizes. The hello overhead is really affected by the density not by the network size. Thus, the dense network may benefit from differential hellos.

We agree on this. (When I said a larger number of nodes, I again meant a
larger number
of neighbor nodes.)



Also, the only measure you used for data packets was delivery ratio,
which was about .78 in your high mobility scenario.
MANET simulations often assume that link-layer failure detection
is used, which typically results in a delivery ratio close to 0.99
for even faster mobility than you considered. It would be good
to run simulations that assume link-layer failure detection is used,
and that consider other performance measures such as delay.



I believe that the delivery ratios (as well as delay) are very scenario dependent. In the simulation, each node sends packets to *every* other nodes using CBR traffic generator, the packet losses are due to the following reasons: (1) no possible routes physically exist at the time of delivery; (2) packets are dropped because the link exceeds the retransmission attempt limits due to fading etc. that effects an existing link; Note in this case, regardless whether the you have link-layer failure detection or not, the packet will be dropped.


Not necessarily. It is possible for the packet to be rerouted on another link, which we did in our ns-2 simulations of TBRPF. Also, link-layer failure detection allows much faster response to link failures (e.g., a few milliseconds) than waiting for RouterDeadInterval (several seconds).


(3) route-change dissemination latency causes packets routed via failure links. In this case, it may be true that the link-layer failure detection can speed up the route convergence if efficient layer 2/layer 3 interface is implemented. However, PTMP only relies on Hello protocol to detect the link change. Thus the link-layer failure detection will not help unless the OSPF wireless extension considers the link-layer failure indication from link layer. So link-layer failure detection may only help case 3.

I don't understand where 0.99 comes from. Sure, there exist 0.99 scenarios,
but it should be neither general nor typical.


I meant 0.99 or better, which we typically obtained in our TBRPF simulations when link-layer failure detection was used, even when nodes were moving 20 m/s.




2. Originator-based LSA or ACK supression (summary)

Although I mentioned receiver-based ACK suppression in version 00
of my draft, I later stated that I preferred originator-based
methods (described in version 01 of my draft), so I will focus on
such methods. However, I will also mention that your simulation
result in Figure 10, which shows increased overhead for my ACK
suppression method, does not use the parameters that I would use.
(I would have used a threshold parameter that suppresses ACKs only
when a new LSA is originated almost every MinLSInterval - not true
in your case - which should not increase overhead. However, this is
not important since we both prefer originator-based methods.)

Regarding originator-based methods, we seem to agree that using
an option bit to indicate a non-ackable LSA can be useful and
should be investigated.

However, I disagree with you that using an exponential moving average
(EMA) to measure the rate of LSA changes is not a good approach.


Please see my comments under 2.a below.

The following bullet from your draft refers to this approach:

 "The approach adapts to the link changes passively by predicting
  the future LSA change based upon the LSA's history.  In the MANET
  environment, it is difficult to predict future link changes based
  on the past.  Looking at both fixed and exponentially weighted
  windows with different window sizes and weights, we could not
  predict when future LSAs would be created.  In fact the
  distribution of interarrival times looked almost as if the
  interarrivals were exponentially distributed (modulo the
  MinLSInterval)."

My goal was to measure the long-term or medium-term average
rate of LSA changes, and to base the decision of using reliable
or periodic flooding on such a measurement.  (The threshold should
be chosen so that periodic flooding is used only if it generates
less overhead than reliable flooding.)  Thus, I was interested in
a quasi-static scheme based on an average rate, rather than a scheme
that reacts dynamically to short-term variations in the (exponentially
distributed) interarrival times. Your prediction scheme is of the
latter type, which tries to do better than the quasi-static approach
by making a short-term prediction of the interarrival times.

Although using prediction of LSA changes can result in better
performance when the LSA change rate happens to be near the threshold
as in your case (so that your predicted measurement frequently
crosses the threshold), using an EMA based on LSA history can also
be quite effective in reducing overhead (when used by the originator
to decide whether to flood LSAs periodically vs. reliably) if
done properly, and could be a good choice if it is decided that
using a prediction technique is too complex.

Simulations could be conducted to compare these two approaches
for the hybrid reliable/unreliable LSA flooding method described
in item 2a below (and in my updated draft), but we need to
consider a large range of scenarios, and avoid drawing conclusions
based only on a few scenarios.


3. Simulation results for reliable vs. unreliable flooding (summary)

Your simulation results in Figure 17 compare the overhead
for reliable vs. unreliable flooding, where the reliable
flooding incorporates several of the proposed optimization
techniques (source-independent CDS, source LSA suppression,
multicast ACKS, and retransmit timer backoff.)
Your results show that, for the high mobility case, unreliable
flooding generates much less overhead than reliable flooding,
with no significant difference in delivery ratio.

As I explain below (item 3a), a hybrid reliable/unreliable flooding
method can be used that performs the same as unreliable (periodic)
flooding when mobility is high, and never performs significantly worse
than unreliable flooding.  Also, reliable flooding (or a hybrid
solution) will generate much less overhead than periodic flooding
in *non-mobile* wireless networks, which is an important
advantage in low-bandwidth sensor networks, for example.
(Thus, the fact that unreliable flooding overhead is independent
of mobility can be a *bad* thing, for non-mobile networks.)


Please see my comments in 3a. BTW, I don't think it is good idea to run OSPF in sensor networks.



Even so, it would make sense to run OSPF in non-mobile wireless
networks, and periodic flooding would be inefficient in such networks.



Also, I discuss below (item 3a) other possible ways to reduce overhead
in reliable flooding. For example, the CDS algorithm you are using may
not be the best one. I have been running simulations to compare
different CDS algorithms, and may announce my results soon. Maybe you
can run simulations to evaluate the algorithm that I found to be the
most promising.  Other people may also suggest CDS algorithms, so
it would be good to compare several of them.
It is known that a good CDS will not result in more transmissions
of a given LSA than MPRs, i.e., a source-independent CDS
will be no larger than a source-dependent CDS (based on MPRs).


4. Reliable flooding without ACKs, using periodic DDESC packets

In Section 6.5 of your draft, you mentioned the methods of INRIA [12]
and Ogier [5] to achieve reliable flooding without ACKs, based on
periodic sequence numbers, similar to IS-IS.
However, in Sections 7 and 9.2 you mentioned only [12] for
adjacency forming.  What I suggested in my draft is a simple
application of the IS-IS method of periodic Sequence Numbers packets,
which translates to *periodic* DDESC packets in OSPF that are
*broadcast* to all neighbors. This method is equivalent to using
LSA NACKs instead of LSA ACKs.  (The NACKs would actually be the LSR
packets sent in response to DDESC packets.)
If a CDS is used, then only CDS nodes need to send full DDESC
packets. (If MPRs are used, then each MPR only needs to include
a subset of LSA headers in the DDESC packets it sends, as described
in my draft.)  I agree with you that [12] is a big departure from
OSPF, but what I propose is less of a departure from OSPF since
it does not require any new message types. (An option bit could be
used to distinguish a periodic DDESC packet from a regular one.)

Also, you propose using INRIA's method [12] only for external LSAs,
while using ACKs or unreliable flooding for internal LSAs, the
rationale being that there will be more external LSAs than internal
LSAs. (Although the manet could be configured as a stub area, you do
not mention this in your draft.)
However, it would be nice to have a single unified solution that
is as scalable for internal LSAs (for very large manets) as for
external LSAs. The method of periodic DDESC packets that I propose
is one possible way to achieve such a unified solution.
Simulations should be conducted to compare these alternatives.


5. Differential Hellos and differential LSAs

As discussed above, Hellos can generate a significant amount of
overhead in dense, highly mobile networks where a relatively
small Hello interval is desirable. (That is why we used differential
Hellos in TBRPF.)  Therefore, I think that the design should include
differential Hellos.

For the same reason, I think that differential LSAs should be
given serious consideration, since this can also reduce overhead
significantly in dense, highly mobile networks.
(That is why we used differental LSAs in TBRPF.)
Of course, simulations are needed to evaluate this.

----------------

I will now provide more details for items 2 and 3 above.

2a. Originator-based LSA or ACK supression (details)

As I said above, using prediction of neighbor changes can be helpful,
but simply using an EMA to measure the rate of neighbor changes
can also be quite effective if done properly. In fact, I believe
it can achieve the same reduction in overhead as using prediction
(if done properly). Using prediction may help to reduce LSA latency,
by allowing an LSA to be originated earlier if it is predicted that
no further link change is likely to occur soon.
(However, your draft concludes that LSA latency is not a problem
even if unreliable, periodic flooding is used.)

In the hybrid reliable/unreliable flooding method that I suggest
(described in my updated draft) the originator decides
when to generate periodic non-ackable LSAs (unreliable/periodic mode)
vs. event-driven ackable LSAs (reliable/event-driven mode), using
an EMA that estimates the probability that a link change will
occur within the next MinLSInterval (which I think is the same
as your LSA_WAIT_TIME, which I assume is 5 seconds).
When the EMA goes above a certain threshold (say 0.9), the
originator goes to unreliable mode and generates a non-ackable
LSA periodically every PERIODIC_WAIT_TIME (say 10 seconds).
Since the LSAs are non-ackable, no ACKs or retransmitted LSAs
are sent in this case, so going to unreliable mode can only
decrease overhead. The originator goes to reliable mode when
the EMA goes below the threshold (possibly a different threshold
for hysteresis). The last LSA sent in periodic mode must be
ackable, so that subsequent LSAs need to be originated only
when there is a link change.


I think we are talking about the same thing from your above description. Here you suggest to estimate the probability. In simulation, the link change time is estimated. If the link change interarrival time is indeed exponentially distributed, the probability that a link change will occur in next T seconds is independent from history and should be constant. I don't see why EMA is good approach in this case.


If it is a constant, then I am simply suggesting using an EMA to measure this constant. (But of course, it will actually change as the number of neighbors changes.) As I said in item 2, I suggest using using a quasi-static method to decide whether to use reliable or unreliable flooding, depending on the average interrarival time. Thus, reliable flooding will be used if link changes are infrequent, and unreliable/periodic flooding will be used if link changes are frequent. The threshold is chosen so that unreliable flooding is used only if it generates less overhead than reliable flooding. This seems like a good approach to me. You are claiming that it helps to predict future link changes (which is a more dynamic approach and can result in frequent transitions between periodic and event-driven flooding), but I am claiming that (if the EMA approach is done correctly), prediction will not result in significantly less overhead (but could improve LSA latency). Simulations are needed to compare these methods. (You compared using prediction to using an EMA based on LSA history, but not for the method I am now proposing.) Also, simulations should be run for a wider range of scenarios (e.g., different densities and node speeds). If such simulations show that using prediction helps significantly, then maybe the additional complexity is justified. Otherwise, the simpler quasi-static approach may be preferable. (However, methods that avoid LS ACKs completely, such as the method of periodic DDESC packets, may be more efficient than any of these adaptive flooding methods.)



Frankly, I was assuming that PERIODIC_WAIT_TIME = MinLSInterval,
since I don't think it is a good idea to increase LSA latency
when link changes are frequent.  If this causes too much
overhead, then the period used for unreliable mode can be
adjusted based on the measured overhead of originated LSAs,
using an EMA. (If the main goal is to control overhead, then
I don't think it helps much to use prediction.)


3a. Simulation results for reliable vs. unreliable flooding (details)

In item 3 above, I claimed that a hybrid reliable/unreliable flooding
method can be used that performs the same as unreliable flooding when
mobility is high, and never performs significantly worse than
unreliable flooding. This can be achieved using something like the
hybrid method described in item 2a above, except for the DDESC and LSR
overhead, which I discuss below.  The key is for the originator-based
method to go to reliable/event-driven mode ONLY when doing so will
generate less overhead than unreliable/periodic mode.

The LSR overhead was negligible in your results of Figure 17, but
the DDESC overhead (which was a significant 7.81 kbps) can
be reduced by using one or more of the following techniques:

- Only CDS nodes need to send a full DDESC (as described in my draft).
 Since a CDS typically consists of less than 10% of the
 nodes, this could reduce DDESC overhead by 90% or more.

- A DDESC packet only needs to contain headers of ACKable LSAs, and
 thus could be EMPTY if all LSAs are non-ackable (in unreliable mode).

- If the method of sending periodic DD packets is used (similar to
 IS-IS, as suggested in my draft), then only CDS nodes would send
 periodic DDESC packets.


I agree that adaptive flooding is the right approach. The key difficulty is however the conditions on which each node decides to change it flooding mode. I think these conditions should be identified and quantified. It seems that no proposals are currently addressing this issue.

We have discussed several ideas that can be compared via simulations.
These include source-based LSA suppression, with or without non-ackable
LSAs, and with or without prediction; and methods based on IS-IS that
avoids ACKs (e.g., periodic DDESC packets).

Regards,
Richard Ogier



Regards,
Gary Pei