Re: [pim] Assert Mechanism and Join Expiry Timer (ET) interaction
--- John Zwiebel <jzwiebel at cisco.com> wrote:
>
> On Jul 5, 2006, at 7:17 PM, Saurabh Goel wrote:
>
> >
> > (The discussion below assumes only SPT Vs SPT or RPT
> > Vs RPT Asserts and not SPT Vs RPT)
> >
> > However, the Assert mechanism doesn't seem to be
> > resilient to route-flaps but rather seems to be
> > sensitive to it because of the following mechanisms:
>
> "route flaps" can mean a single event where the metrics
> change in a network and then things immediately settle down,
> or they could change and then change back again. Or it could
> mean continuous flip-flopping.
>
> In the first two cases, I think the spec is fine.
> In the last case, nothing is going to work.
>
> >
> > 1. We know that upstream Assert Loser transitions to
> > No Info and enables forwarding data on receiving a
> > Join from a downstream. If the downstreams are
> > flapping their RPFs, it would keep triggering new
> > Assert wars. The rationale for this behavior is quick
> > convergence as specified in section 4.6.5
> >
> > "6. Behavior: An assert loser that receives a Join(S,G)
> > with an Upstream Neighbor Address that is its primary IP
> > address on that interface cancels the (S,G) Assert Timer.
> >
> > Rationale: This is necessary in order to have rapid
> > convergence in the event that the downstream router that
> > initially sent a join to the prior Assert winner has
> > undergone a topology change."
>
> Remember the downstream routers also follow the assert exchange
> and will RPF to the winner. If they think the winner is the
> wrong one, they should be in a position to override the upstream
> routers. Unicast metrics do not converge on all routers at the
> same time. (which was why I brought up the
> prune-the-old-RPF problem)
>
The downstream routers are always supposed to lose, because they always
compare their own metric (which is infinite, because CouldAssert() is
always false for them) with the upstream winner's metric. The Assert
mechanism in itself never allows the downstreams to compare the metric of
an 'upstream loser' against the winner's.
I guess the only way a downstream affects winner election is 'indirectly',
when it finds that its unicast routing indicates a different RPF than its
old RPF (not RPF', which is the Assert winner). This makes it send a Prune
to the old RPF (probably redundant with respect to data forwarding if the
old RPF is already an assert loser) and not to RPF', and a Join to the new
RPF. In general, it is this Join to the new RPF that causes another Assert
war. Please correct me if I am wrong.
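
To make the comparison above concrete, here is a minimal sketch of the
assert metric ordering as I read it from the PIM-SM spec: lower RPT bit,
lower preference and lower route metric win, and the higher address breaks
ties. The names AssertMetric, INFINITE_METRIC and is_better are
hypothetical, and the addresses and metric values are made up for
illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertMetric:
    rpt_bit: int        # 0 for (S,G) asserts, 1 for (*,G) asserts
    preference: float   # metric preference of the unicast route
    metric: float       # unicast route metric
    ip_address: int     # sender's address on the LAN, as an integer


# Assumed metric of a router for which CouldAssert is false
# (for example, a downstream router on the LAN).
INFINITE_METRIC = AssertMetric(1, math.inf, math.inf, 0)


def is_better(a: AssertMetric, b: AssertMetric) -> bool:
    """Return True if a beats b.

    Lower rpt_bit, preference and metric win; the higher IP address
    breaks a tie, so the address is negated for the tuple comparison.
    """
    key_a = (a.rpt_bit, a.preference, a.metric, -a.ip_address)
    key_b = (b.rpt_bit, b.preference, b.metric, -b.ip_address)
    return key_a < key_b


# Illustrative values only.
upstream_winner = AssertMetric(0, 110, 20, 0x0A000001)
upstream_loser = AssertMetric(0, 110, 30, 0x0A000002)
```

With these values the downstream's infinite metric never beats either
upstream router, so the downstream can only influence the election
indirectly, by changing whom it sends its Joins to.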
> There is a small window of time where a join might be sent
> at the same time an assert is being sent. Canceling the assert
> timer will cause the assert to happen again which will get all
> the downstream routers to RPF to the winner (hopefully, things
> could still break, but we're getting to a point of diminishing
> returns.) Convergence here means all the downstream routers
> agree on which one is the winner.
>
> >
> > 2. Also, the upstream Assert Losers have a mechanism
> > (orthogonal to (1) above) to convert back to NoInfo
> > state and enables data forwarding if its metric
> > becomes better than the current Winner.
>
> Which causes an assert exchange and all the routers on the
> LAN converge on the same winner. A good thing.
>
> > (However,
> > convergence is slower when the Winner metric gets
> > worse because the Loser finds this out only in the
> > next Assert Winner message).
>
> If convergence means getting the mpackets via the "best" path,
> you are right. But if your goal is to just get the mpackets
> without interruption, it isn't important which system is
> forwarding on to the LAN as long as one of them is. (I can't
> offer an opinion on whether duplicate packets or lost packets
> is worse since "it depends")
>
> >
> > Both these mechanisms enable rapid convergence but
> > also make it sensitive to the route flaps. Even if (1)
> > were absent, say in case of static routes on
> > downstreams, (2) still makes it sensitive to
> > route-flaps on the upstreams.
> >
> > After the initial Assert - assuming the (1) is not in
> > effect (static routes on downstreams or downstreams
> > already converged) - (2) remains in effect only till
> > the Join Expiry Timer (ET) on the Assert Loser is
> > alive. The question I have is that why do we do away
> > with (2) after ET expires? Is it desirable to amend
> > this behavior?
>
> Are you forgetting that the assert winner is supposed to
> send a new assert 3 seconds before it times out on the
> loser? If it doesn't do this, then it no longer has
> forwarding state. If the old winner isn't forwarding
> on to the LAN, who is?
Here I am talking about the Join ET (which gets refreshed on a Join) and
not the Assert Timer (AT), which is refreshed on losers by an Assert Winner
message.
Assuming the RPT vs. RPT case for the sake of specificity, the Join ET
directly affects the macro joins(*,G), and the AT on the upstream loser
affects lost_assert(*,G).
joins(*,G), which depends on the ET (which is never refreshed while the
upstream is in the loser state), determines the outcome of
CouldAssert(*,G,I) irrespective of the value of the AT. On the upstream
loser, if the ET expires, CouldAssert(*,G,I) becomes false, and so does
AssertTrackingDesired(*,G,I), which forces it to transition to the NoInfo
state even though the AT never expired.
This means that we lose mechanism (2) (though (1) is still in place) on the
upstream loser discussed earlier within the max value of the ET (which is
the Holdtime advertised in the Join message it last received before losing
the Assert). To restate my question: do we do this on purpose, and if so,
what are the reasons? Or did we just choose to live with it to avoid
complexity? I can see that this behavior may be good enough if the route
flaps stabilize within the ET interval.
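
The chain of dependencies I describe above can be sketched roughly as
follows. This is a deliberately simplified model, not the spec's full
macros: it assumes joins(*,G) is the only term that puts interface I into
CouldAssert(*,G,I), and that CouldAssert is the only term keeping
AssertTrackingDesired(*,G,I) true (no local receivers, no pim_include, I
not the RPF interface unless stated). The names LoserView and next_state
are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoserView:
    join_et_running: bool           # Join Expiry Timer (ET) on interface I
    assert_timer_running: bool      # Assert Timer (AT), refreshed by the winner
    is_rpf_interface: bool = False  # True if I is the RPF interface toward RP(G)


def could_assert(v: LoserView) -> bool:
    # Simplification: I is in joins(*,G) only while the Join ET is running.
    return v.join_et_running and not v.is_rpf_interface


def assert_tracking_desired(v: LoserView) -> bool:
    # Simplification: only the CouldAssert term is modelled here.
    return could_assert(v)


def next_state(v: LoserView) -> str:
    """State of an upstream assert loser after re-evaluating its timers."""
    if not v.assert_timer_running:
        return "NoInfo"  # AT expired
    if not assert_tracking_desired(v):
        return "NoInfo"  # ET expired, although the AT is still running
    return "Loser"
```

In this model next_state(LoserView(join_et_running=False,
assert_timer_running=True)) gives "NoInfo", which is the case I am asking
about: the loser leaves the Loser state on ET expiry alone, and with it
goes the ability to react to its own metric becoming better.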
BTW, avoiding prune-the-old-RPF in this case (not all cases) can actually
cause us to lose the opportunity to refresh the ET on the
'going-to-be-loser' when a downstream triggers an Assert by sending a Join
to the 'going-to-be-winner' and a Prune to the 'going-to-be-loser'. The
Prune override to the going-to-be-loser actually refreshes the ET just
before the assert ensues, giving mechanism (2) the maximum time.
>
> In all likelihood the loser won't get to this point
> because it won't have an OIF either. Having not specifically
> tracked this (what I consider rare) case, I could be in error.
> But the downstream routers should have converged on the winner
> and should be sending joins to the winner.
>
> >
> > Thanks.
> >
>