Routing Area Working Group (RTGWG) MINUTES

Meeting:  IETF 78, Monday, July 26, 2010
Location: 0.9 Athens, 13:00-15:00
Chairs:   John Scudder, Alia Atlas
Minutes:  Andrew Lange, John Scudder & Alia Atlas

=======================================================================

draft-filsfils-rtgwg-lfa-applicability-00
Clarence Filsfils

Clarence Filsfils: "It's boring, but that's the point of the document."

Clarence Filsfils: In the future, the authors plan to add an analysis
of multicast and a short section on capacity planning, including a
section on how to use a planning tool to plan for LFA. In many
topologies, LFA becomes completely deterministic because of the
topology.

Dave McDysan: It looked interesting, and the access topology is easier
to analyze. In Michael Menth's draft, he looks at utilization. Did you
analyze high utilization under failures?

Clarence Filsfils: The forthcoming planning section will cover that.
This is something that the planning tool is going to analyze. The
beauty of the LFA is that there are not two choices; the LFA pushes
the traffic onto the next best path. In steady state, the standard
capacity planning process applies to ensure load. In the
access/aggregation, there isn't enough meshiness to create
over-utilization. For the core, the only way to analyze it is in this
planning tool. We may also need diagnostics on routers to help with
this; TBD.

Dave McDysan: Are you planning to look at dual failures? These occur
during a maintenance window with less traffic, where a failure occurs.

Clarence Filsfils: I'll write down the comments, since this echoes
another conversation we had. This is a good conversation to have with
Thomas Telkamp, who works on a planning tool for this. In
access/aggregation it is not relevant: if you have two uncorrelated
failures in the same pocket, it is disconnected; if they are in two
separate access/sub-aggregation regions, they are unrelated.

Stewart Bryant: A way of looking at Dave's question is to look at the
prefix density on each link. That would give a first approximation.

Clarence Filsfils: Unfortunately, prefix density is not correlated
with traffic density. Think about peering points.

Stewart Bryant: It would be a first-order approximation.

Clarence Filsfils: Worth doing, but it is the worst case that is worth
doing.

Jeff Tantsura: Are you going to propose multicast? In the multicast
analysis, are you going to send a join on the secondary interface for
PIM?

Clarence Filsfils: That would be another draft. We are working on an
idea for multicast. It's obvious something must be done on the
multicast side. It's too early to say.

Jeff Tantsura: One idea is to send the PIM join on the secondary LFA
interface.

Clarence Filsfils: We have another thing, but it is too early to say.

Michael Menth: You looked at link and node failures; what kinds of
LFAs?

Clarence Filsfils: We analyzed for per-link and per-node failures.

Michael Menth: Do you use link-protecting, node-protecting, and
downstream LFAs, since some can cause loops in some topologies?

Clarence Filsfils: We can compute per-link or per-prefix LFAs; each
prefix can have a different backup. The LFAs then have attributes, and
there are many ways of characterizing them. We analyze the two
algorithms and then we want to see what the relevance of these
algorithms is: does it protect against link failures, does it protect
against node failures, and does it create loops? For each topology, we
report whether each algorithm is link-protecting and node-protecting
and whether it creates micro-loops, since these are what are most
relevant to SPs.

Adoption: Many have read, many support, and none oppose.

John Scudder: We'll ask again on the list, but it's safe to say we'll
make this a working group document.
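For reference, the per-prefix LFA attributes discussed above
(link-protecting, node-protecting, downstream) reduce to the
inequalities of RFC 5286. A minimal sketch in Python, assuming a
dist(a, b) helper that returns the IGP shortest-path distance between
two routers (the helper and function names are illustrative, not from
the draft):

    # Minimal sketch of the LFA feasibility checks (per RFC 5286).
    # dist(a, b) is assumed to return the IGP shortest-path distance
    # from a to b; S is the computing router, N a candidate neighbor,
    # E the primary next hop, and D the destination.

    def is_loop_free_alternate(dist, S, N, D):
        # Basic loop-free condition: N does not send traffic for D
        # back through S.
        return dist(N, D) < dist(N, S) + dist(S, D)

    def is_node_protecting(dist, N, E, D):
        # Node-protection condition: N reaches D without going through
        # the primary next hop E.
        return dist(N, D) < dist(N, E) + dist(E, D)

    def is_downstream(dist, S, N, D):
        # Downstream condition: N is strictly closer to D than S is.
        return dist(N, D) < dist(S, D)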
Routing Optimization with IP FRR
draft-menth-ipfrr-routing-optimization-00
Michael Menth

The focus is on mesh networks, in contrast to the previous talk, which
focuses on access networks.

Begins with a discussion of LFAs and then unique shortest paths. Brief
description of what an LFA is. Different types of LFAs are diagrammed
on the slides: some are link-protecting, some are node-protecting, and
some are downstream, which protect against multiple failures. LFAs are
available, but don't always offer 100% coverage. Mentions some similar
work; references are in the slides. This work was done in parallel and
there may be some overlap. Looked at Max Utilization (all about TE)
and Failure Coverage (maximize coverage), and at different scenarios
using each of these LFA types. The COST239 network is used; a diagram
is in the slides. Walks through the table on the slides, showing the
differences between the LFA types, traffic utilization (normalized),
and coverage percentage.

John Scudder: To clarify, can I understand those numbers as addressing
how much of the network you have available for user traffic?

Michael Menth: The outcome depends on the input, so we normalized the
traffic inputs. When we sorted for max utilization, the coverage
increases, per the chart, but the total capacity available drops.
Minimizing the maximum link utilization also tends to improve the
failure coverage.

Then goes through the Failure Coverage scenario, and sees that you can
only reach 90% in this scenario, since there is already one node...

Dave McDysan: (question on slide 17) You've computed the link costs
and then computed for the failures?

Michael Menth: We optimized the link metrics to minimize the max
utilization, and similarly optimized the link metrics to get the best
failure coverage.

Dave McDysan: Where did the traffic come from?

Michael Menth: The traffic was determined based on a gravity model.

Dave McDysan: But this network had its link capacities.

John Scudder: Are you saying that if you were designing this network,
you'd consider the capacity of links to be an adjustable figure?

Dave McDysan: Yes.

Michael Menth: This is a theoretical network that is used for
analysis, so capacity is a fixed value, and there are homogeneous link
capacities in this network. Coverage and utilization can be
conflicting goals. Presents a chart showing maximum link utilization
under failure versus failure coverage for LFAs, for one set of optimal
metrics. Ran a Pareto optimization. In this network you couldn't get
both.

Dave McDysan: But if you added capacity, you could?

Michael Menth: Quite possibly, but capacity here was fixed.

Summary: what sorts of LFAs can be used? Are all types okay, or are
there certain ones to avoid? Question to the audience: what do you
think of the different types of LFAs?

Stewart Bryant: In the draft, you said we had to do all this because
there was no ECMP available in the repair mechanism. It would be very
easy to add ECMP around it: it would be easy to add LFAs, multiple
not-via, or use the entropy label for MPLS.

Michael Menth: These are two separate presentations; that question is
about the second part, which I haven't presented yet. This is just
about LFAs.

Dave McDysan: We see that link failures are much more frequent than
node failures. Another question: what is the latency of the results
from this? What latency results from the underlying links and nodes?
This is something interesting for adjusting link weights.

Michael Menth: Let's discuss offline.

Stewart Bryant: I'm confused about something: the choice of LFA is a
local decision. The answer to "what sort do you want" is purely a
private matter.

John Scudder: Isn't the question "what do you think you'll be using,
so I can study that more"?

Michael Menth: Should an LFA be used even if it creates a micro-loop?

Jeff Tantsura: In general, I think it is acceptable. If the hardware
and software can handle it, it is better to use the more complicated
LFAs. If not, then not...
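For context, the gravity model Michael mentioned earlier generates a
demand between each node pair in proportion to the product of per-node
traffic "masses". A minimal sketch, with the masses and total volume
assumed as inputs (names are illustrative, not from the draft):

    def gravity_traffic_matrix(mass, total_traffic):
        # Demand from i to j is proportional to mass[i] * mass[j];
        # the matrix is scaled so that all demands sum to total_traffic.
        nodes = list(mass)
        raw = {(i, j): mass[i] * mass[j]
               for i in nodes for j in nodes if i != j}
        scale = total_traffic / sum(raw.values())
        return {pair: volume * scale for pair, volume in raw.items()}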
New topic: multiple shortest paths / ECMP
draft-menth-ipfrr-routing-optimization-00 - Part 2
Michael Menth

ECMP distributes traffic across all equal-cost paths. If it is not
enabled, then one path is used; there is a tiebreaker, it is not
predictable which path will be taken, and this can cause problems for
traffic engineering. There are multiple ways to TE a network, and
without ECMP, if the planning tool and the router pick different
paths, there can be an issue. We saw examples where link utilization
is 200% greater than expected. The solution to this is the "unique
shortest path": if you design the network so that you don't have ECMP,
then there is only one way the traffic can go. We then looked at this.
Unique shortest paths do exist, based on our experiments. The ability
to engineer them depends on how large a maximum link cost you allow; a
chart in the presentation shows this. Then we mapped the USP and SSP
conditions to max link utilization. So we can get an optimal network,
and a network with unique shortest paths.

John Scudder: Are you considering multipath? I would expect to get
better utilization with multipath.

Michael Menth: In most topologies we can get the same efficiencies
with USP or SSP as with multipath.

Ruediger Volk: Do you think that networks with lots of parallel links
are relevant?

Michael Menth: We didn't work with networks with parallel links.

Ruediger Volk: Do you think that would have different results?

Michael Menth: Definitely...

What does this have to do with not-via? Description of not-via
tunneling from the PLR to the NNHOP. When there are ECMPs on the
not-via backup path, we cannot know which of the multipaths the backup
not-via tunnel will take.

Alia Atlas (at mic): LDP label stack hashing can provide ECMP,
including for the not-via tunnel.

Michael Menth: Oh, so you compute the hash down the stack?

Stewart Bryant: Yes.
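For context, the "unique shortest path" design discussed above can be
checked directly from a metric assignment: no destination should be
reachable from any source over two or more equal-cost paths. A minimal
sketch, assuming the topology is given as an adjacency dict of
{node: {neighbor: metric}} (names are illustrative, not from the
draft):

    import heapq

    def has_unique_shortest_paths(graph):
        # Returns True if, for every source, no node is reached by
        # more than one equal-cost shortest path (i.e. the metric
        # assignment yields unique shortest paths).
        for src in graph:
            dist = {src: 0}
            eq_preds = {src: 1}  # equal-cost shortest-path predecessors seen
            heap = [(0, src)]
            done = set()
            while heap:
                d, u = heapq.heappop(heap)
                if u in done:
                    continue
                done.add(u)
                for v, metric in graph[u].items():
                    nd = d + metric
                    if v not in dist or nd < dist[v]:
                        dist[v] = nd
                        eq_preds[v] = 1
                        heapq.heappush(heap, (nd, v))
                    elif nd == dist[v]:
                        eq_preds[v] += 1  # a second equal-cost path exists
            if any(n > 1 for n in eq_preds.values()):
                return False
        return True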
Composite Links
draft-ietf-rtgwg-cl-requirement-01
Dave McDysan

This is a completely new draft since the last meeting.

John Scudder: It's certainly not unprecedented to put numeric values
into requirements drafts. The 50ms figure has entered into legend, but
if there are real hard targets that actually have to be met, it's
legitimate to put them into the draft.

Dave McDysan: The first action is to create an appendix about what
services require what convergence times (scrubbed a bit) for
quantitative examples of SLAs.

Ilya: Question about DR#7: does it cover the case with two ECMPs
between LSRs, or is that out of scope? I would like to be able to
build a TE tunnel between the two LSRs and have a bandwidth reserved
that can only be handled by that.

Dave McDysan: Only for the rapid restoration... Would this candidate
solution approach meet these requirement cases? I think it would meet
these requirements.

John Scudder: To clarify: you want to signal an LSP that has a
capacity larger than either component link....

Dave McDysan: We do have a requirement to address this; Curtis put it
in on the mailing list. It was left out of this draft, since it was
solution oriented.

John Scudder: What do you think we should do? How close to baked do
you think it is?

Dave McDysan: I think it is pretty close to baked. There's been some
chit-chat about being clearer on definitions, quantifying some of the
convergence time requirements, the idea of link bundling to reduce
convergence time, and being clear on the cases of multi-area networks.

Curtis Villamizar: No particular area is hurting a lot, but we had
discussions among the authors and we put it on the mailing list two
weeks ago. We need to see what discussion we get.

John Scudder: Sometimes discussions only happen at WG last call.

Dave McDysan: I tried to find the areas where we still need work.

John Scudder: Sounds like there's another revision to do, and it's not
super-contentious. Do your revision and let's try to last-call it
before the next IETF. Let's get this part of the work published, and
then we can start moving forward on the other stuff.

Dave McDysan: There was some good discussion...

John Scudder: One comment on the document: it seemed to mirror a
disagreement between Tony and Curtis on the list about requirements
having to do with aggregation of routing information. Is that a
solution or a requirement? Is it that you should converge within a
certain time, or is it that one needs aggregation? Curtis said
"everyone knows you need aggregation to scale." I like it better down
in the derived requirements. Everyone does know we need to aggregate
to scale, but maybe it doesn't need to be Requirement #2.

Dave McDysan: We state the aggregation as a derived requirement and
not as a functional requirement. We need to decide if it is a
solution.

Curtis Villamizar: The requirement is that the solution should
aggregate unless it can be demonstrated that the solution doesn't
impact convergence.

Action 2: State aggregation as a derived requirement.

John Scudder: Let's be pragmatic enough to get this done.

STOP-TIME: 14:42