Rtg Area Working Group - IETF 81

Alia - full agenda today.  

Alvaro is new co-chair.

Alia reads the Note Well.

Agenda bashing:

1) composite links
2) fast reroute with max red trees
3) OTV
4) if time MPLS FRR using LDP extend.

Alia trying to get more scribes.

Blue sheets etc.

charter update being planned.  only discussed Monday so new to the group.  Goal is primarily 2 things:
1) emphasise this is WG for hop by hop routing/FRR for unicast & multicast
2) add milestones for composite links.
please discuss on list

draft status:

1) LFA applicability in last call - ending Aug 1st.  No comments yet on list.  Please read.

2) draft-karan caused much interest last time.  charter update brings
in scope but need co-ordination with PIM WG.

3) notvia will be done as informational

4) ordered FIB looking for opinions.  interest in further work now?
Or just document what's known until we get interest in proceeding
further?


Composite Link Framework (Ning - had been planned to be Curtis).

want to update on this draft, and what's changed.  Been published on
WG list.  Also updates as to new things we need to take to
requirements draft.

requirements completed WG last call.  Framework been in progress for a
while and still an individual draft.  Want it adopted as WG draft.
Been stable but there's a considerable new addition so co-authors are
thinking that part of the new addition (management plane requirements)
needs putting in the requirements.  So will go ahead and add 4 new
requirements onto the requirements draft and ask for a respin there.
Hopefully quite a small change.

update for framework itself.  Covers overview of usage of CL but also
CL control plane.  now listed out extensions needed (e.g. IGP-TE,
RSVP-TE).  One note of caution - don't be scared by long list as
tweaks are fairly minor.  Hopefully framework will catch everything so
the scope of work is clear.

Added various stuff to framework.  Curtis got delayed.  done more like
an individual draft but intention was to merge into this one.
Additions are things like arch summary, tradeoffs, challenges.
Co-authors will merge the 2 docs after this meeting, and then ask for
WG adoption.

Some of the wording might need clarification so will do that when the
merge gets done.

Some new mechanisms proposed.  To meet requirements draft we need to
leverage other active IETF work (e.g. MPLS loss-delay).  Some of that
work is not even WG draft stage yet.  So there's various states
there. But the solution here will rely on that other work so we want
to point to it.

Also the section on required protocol extensions/mechanisms is not
complete.  Need to resync that when the merge happens and check no
gaps.  no intention to specify those solutions in this draft.  But if
people have input on what may be needed then please provide it.

in terms of moving forward the co-authors will merge, publish on list,
ask WG to adopt.  Will cite existing work and need agreement that
meets requirements.  Need to complete list of new protocol work.

when we started composite link (years ago) use cases were well defined
and narrow.  But now the scope is broader.  So might be useful to
revise some of the old use cases from earlier revisions and then
expand them so operators can understand the broader scope of the
composite link work.


Fast Reroute with Maximally Redundant Trees (Alia).

this is work that Alia has been doing with others from Juniper, Cisco, Ericsson.

will talk about motivation and architecture then Gabor will talk algorithm etc.

Basic issue is that LFA isn't enough - doesn't give 100% coverage.
it's NP-hard to figure out how to extend the network when you have a
single failure.  After failure coverage decreases.  it's important to
get 100% coverage.  NotVia can guarantee coverage but requires lots of
network state and complexity.  Research done to reduce state but seems
to Alia that we need a solution that's perhaps more complex than LFA
(hard to get simpler than that) but is simpler than NotVia in terms of
state and network complexity.  And of course as LFA is deployed we
keep hearing more calls for 100% coverage to improve it.  Also of
course unicast isn't enough.  So important to talk about multicast -
not just FRR but also live-live.  Drafts on that were discussed last
IETF.  And need link plus node protection.  SRLG is nice, but not
really ready yet.
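
To illustrate the coverage point, here is a minimal sketch that counts
how many source/destination pairs have an LFA, using the basic
loop-free condition from RFC 5286: neighbour N of S is loop-free for
destination D if dist(N,D) < dist(N,S) + dist(S,D).  The ring topology
and all function names are assumptions for illustration; they are not
from the presentation.

```python
import heapq

def ring(n):
    """Unit-cost ring of n routers (illustrative topology)."""
    return {i: {(i - 1) % n: 1, (i + 1) % n: 1} for i in range(n)}

def dijkstra(graph, src):
    """Shortest-path distances from src over the link state database."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def lfa_coverage(graph):
    """Return (protected, total) over all ordered pairs (S, D)."""
    dist = {n: dijkstra(graph, n) for n in graph}
    protected = total = 0
    for s in graph:
        for d in graph:
            if s == d:
                continue
            total += 1
            # Pick one primary next hop: lowest-numbered neighbour on a shortest path.
            primary = min(n for n, c in graph[s].items()
                          if c + dist[n][d] == dist[s][d])
            # Any other neighbour satisfying the loop-free inequality is an LFA.
            if any(dist[n][d] < dist[n][s] + dist[s][d]
                   for n in graph[s] if n != primary):
                protected += 1
    return protected, total

print(lfa_coverage(ring(5)))  # (10, 20): only half the pairs have an LFA
print(lfa_coverage(ring(3)))  # (6, 6): a triangle is fully covered
```

Ring topologies are a common case where link-protecting LFAs are
missing for many destinations.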

what if we could always compute 2 link and node disjoint paths between
any 2 nodes in the network?  Neither one necessarily is the shortest
path.  But now you have an alternate that isn't failure specific.  2
destination routed trees per destination, and one of those will always
work.  Lots of research done (30 papers or more) into maximally
redundant trees.  there's an algorithm that gives you those trees as
long as your network supports it (i.e. is 2-connected).  What makes
this useful is that it works when your network isn't 2-connected.
It'll give you the paths that are "as disjoint as possible given your
network".  Example of where maintenance happens, or failure takes a
while to repair.  So now network isn't 2-connected.  But you still
want as much protection as the network will allow.  And want 100%
coverage whenever topology permits.  Alia shows example where network
isn't 2 connected and 2 trees HAVE to share one link.  What we can do
is have an algorithm that works as well as possible in that case.
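
A minimal sketch of the 2-connectivity point: a brute-force check
that finds the cut nodes and cut links which any pair of trees would
have to share.  The topology (two triangles joined by one link) and
the function names are assumptions for illustration; this is not the
MRT algorithm itself.

```python
def connected(adj, skip_node=None, skip_link=None):
    """True if the graph stays connected without skip_node / skip_link."""
    nodes = [n for n in adj if n != skip_node]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v == skip_node or v in seen:
                continue
            if skip_link is not None and {u, v} == set(skip_link):
                continue
            seen.add(v)
            stack.append(v)
    return len(seen) == len(nodes)

def cut_nodes(adj):
    """Nodes whose removal partitions the network."""
    return sorted(n for n in adj if not connected(adj, skip_node=n))

def cut_links(adj):
    """Links whose removal partitions the network."""
    links = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    return sorted(l for l in links if not connected(adj, skip_link=l))

# Two triangles A-B-C and D-E-F joined by the single link C-D.
bowtie = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
print(cut_nodes(bowtie))  # ['C', 'D']
print(cut_links(bowtie))  # [('C', 'D')]
```

A network with no cut nodes and no cut links is 2-connected; here any
two trees spanning both triangles have to share link C-D.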

so pair of disjoint maximally redundant trees per IGP-area
destination.  Blue and Red MRTs.  Not one algorithm - there's a
spectrum with tradeoffs between computational complexity and
optimality of path selection.

just like Dijkstra this runs on link state database.  no signalling.
handles any network.  flexibility on path goodness/complexity of
computation.

Various usability goals.  The key thing is to handle real networks
with 100% coverage.

so how do you use this? It doesn't replace LFA, it
supplements/augments it to get 100%.  So if you have an LFA then use
it.  Otherwise you have the blue and red MRTs.  You can pick one based
on primary next-hop's failure.  Draft talks about various stuff -
e.g. how to recompute vs swapping blue to red.  And of course you
pre-select that MRT.
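
The selection order described above could be sketched as follows.
The rule used to choose between blue and red (take the MRT whose path
avoids the primary next hop) is an assumption made for illustration;
the minutes do not record the draft's exact rule.  The function name
and argument shapes are hypothetical.

```python
def select_alternate(primary_next_hop, lfa, blue_path, red_path):
    """Pre-select the alternate for one destination.

    lfa        -- an LFA neighbour, or None if no LFA exists
    blue_path  -- hops from this router to the destination on the blue MRT
    red_path   -- hops from this router to the destination on the red MRT
    """
    if lfa is not None:
        return ("LFA", lfa)             # if you have an LFA then use it
    if primary_next_hop not in blue_path:
        return ("BLUE", blue_path[0])   # blue MRT avoids the failure
    if primary_next_hop not in red_path:
        return ("RED", red_path[0])     # red MRT avoids the failure
    return None                         # topology does not permit protection

# Primary next hop is B; no LFA; blue goes via B, red goes around it.
print(select_alternate("B", None, ["B", "D"], ["C", "E", "D"]))  # ('RED', 'C')
```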

there's an issue that we don't have enough spare bits left to steal in
the header.  So instead of that we have to do some kind of
encapsulation.  e.g. use GRE tunnels or use a different LDP label.
Advertise additional loopbacks.  e.g. have loopback label for FEC and
then labels for blue and red MRTs.  Could conserve label space with
topology-id labels, but have an option here for topology encoded in
labels.  No new hardware in MPLS case (but need context-based labels).

What about multicast live-live?  Want 2 disjoint trees from source to
all other nodes.  So that's an MRT.  So extend PIM and mLDP to say you
want to join Blue or Red MRT when you join a group.  Up to the
receiver to decide which stream to forward on.  Still some work in
progress in that packets might have to identify topology for cases
where you aren't 2-connected.

also issues for how we do this for non live-live multicast.  The PLR
doesn't know where the next-next hops are in normal mLDP or PIM.  Need
to specify which next hops have already joined and their associated
labels.  Also merge point doesn't know which interface traffic will
come in on.  2 ways to handle this:
1) upstream backup join.  PLR is the one who knows the information
(the merge point doesn't know all that). So the PLR sends an upstream
backup join to each merge point.
2) encapsulated backup traffic.  So if traffic is encapsulated it must
be the alternate, not the regular traffic.

key point that differs between multicast and unicast.  PLR doesn't
know if it's a link or node failure.  For unicast you just use the
node protection alternate. But in multicast your next hop might have
receivers hanging off it so you want to send traffic to both link and
node protection alternates.  Send traffic there until a configured
timeout.  The merge point can decide whether to accept alternate
traffic based on whether its primary upstream link is up (if it is
then the link protection will work, if not then the node protection
has to be used).  Also if you join to new topology you have a new
primary upstream link so you can start dropping alternate traffic on
the floor.
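
The PLR and merge point behaviour above can be sketched as two small
decision functions.  Function names, arguments and the timeout value
are hypothetical; this is a reading of the minutes, not the draft's
specified procedure.

```python
def plr_targets(link_alternate, node_alternates, seconds_since_failure, timeout):
    """PLR cannot tell link from node failure, so it sends to both kinds
    of alternate until the configured timeout expires."""
    if seconds_since_failure >= timeout:
        return []
    return [link_alternate] + list(node_alternates)

def merge_point_action(arrived_on_alternate, primary_upstream_up):
    """Merge point accepts alternate traffic only while its primary
    upstream link is down."""
    if not arrived_on_alternate:
        return "forward"
    # Upstream link up: the upstream node is alive, link protection is
    # delivering traffic the normal way, so the alternate copy is dropped.
    return "drop" if primary_upstream_up else "forward"

print(plr_targets("alt-to-F", ["alt-to-MP1", "alt-to-MP2"], 0.2, 5.0))
print(merge_point_action(arrived_on_alternate=True, primary_upstream_up=False))  # forward
print(merge_point_action(arrived_on_alternate=True, primary_upstream_up=True))   # drop
```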

Question from Russ White.  Wants Alia to talk about computational complexity.

Alia - you have 2 extra labels per IGP destination.  Fast computation is
linear.  More optimal takes more time.

Stewart Bryant - when you say "per-destination" you need to worry
about multi-homed prefixes as well?  Is it 3 per destination?

Alia - yes.  I won't leave that out.

Alia - where do you replicate with multicast.  PLR? (ingress
replication).  Can send along appropriate alternate to MP (which
recognises it by encapsulation) and then can forward based on state of
primary uplink.  Standard issues of ingress replication if you have
lots of MPs.  But simple and doesn't create lots of extra state.  Does
require tunnelling as no multicast tree on path to alternates.  Second
option is to replicate along an alternate tree.  This now creates
state in the network.  The PLR would send upstream back-up joins
unicast to a merge point, intercepted along the way, specify what kind
of alternate it is.  We need to merge the alternate trees for
different PLRs for the same S,G.  Drawback is unnecessary replication
but reduces state etc.  Also can reduce state by merging alternate
trees from the same S regardless of G as candidate set of
next-next-hops is the same.

Showing replication on alternate tree.  Problem is have different
alternate trees from different PLRs intersecting at the same point.
Showing alternate trees one hop on from the source.  Basic problem is
that when a node gets alternate traffic it may not know which
alternate tree it came from.  If merge trees then that node will send
to output links for all trees.

Stewart - question as to how the egress knows which tree is sending it
traffic.

Alia - back to the point that the node looks at its upstream to see if
it is up.  Can discuss offline. Tight for schedule.

Also cases where primary multicast tree and alternate tree can
intersect.  So traffic can come over a link from both a primary link
and an alternate.  So need a way to mark traffic as being on alternate
tree.

Some work in progress - e.g. ABR knowing MRTs per area.  How do we
reduce that state?

Various algorithmic work in progress.  Exact details on e.g. how to
handle broadcast interfaces, unequal link costs, etc.

Gabor - talking algorithm.  showing a way to find the MRTs.  will
assume network is 2-connected as we're short of time.  Basic model is
partial ordering of nodes in the network.  idea is that the root is
the smallest and the largest node.  but partial as can't always say
that one node is greater than another or not.  good thing with the
partial order is we can find redundant trees.  if you have always
increasing and always decreasing then the two paths can only have the
root in common.  If you have 2 trees they must be redundant as long as
you increase on one and decrease on the other.  Also true if you vary
the next hop.  If you have 2 greater neighbours you can use both of
them for ECMP.
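
A minimal sketch of the partial-order idea on a small 2-connected
example.  Links are directed from the lesser node to the greater one,
with the root R acting as both the smallest node (its out-links) and
the largest node (its in-links).  The topology, the ordering and the
names are assumptions for illustration; how the ordering is computed
is not shown here.

```python
ROOT = "R"
# HIGHER[x] lists x's greater neighbours: R < A < B < C < R, plus chord A < C.
HIGHER = {"R": ["A"], "A": ["B", "C"], "B": ["C"], "C": ["R"]}

def reverse(higher):
    """Derive each node's lesser neighbours from the greater-neighbour map."""
    lower = {n: [] for n in higher}
    for low, highs in higher.items():
        for high in highs:
            lower[high].append(low)
    return lower

LOWER = reverse(HIGHER)

def walk(start, next_hops, root=ROOT):
    """Follow the first listed next hop until the root is reached."""
    path = [start]
    while path[-1] != root:
        path.append(next_hops[path[-1]][0])
    return path

for node in ("A", "B", "C"):
    blue = walk(node, HIGHER)   # always increasing
    red = walk(node, LOWER)     # always decreasing
    shared = set(blue[1:-1]) & set(red[1:-1])
    print(node, blue, red, "shared interior nodes:", sorted(shared))
```

Every node on the increasing path is greater than the start and every
node on the decreasing path is lesser, so the two paths can only meet
at the root.  Node A has two greater neighbours (B and C), so either
could serve as the increasing next hop.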

So how do we find those?  worked example.

the first path is a cycle from root to root so both directions are ok.

Showing graphs where compute for shortest path.  average paths compared to shortest paths.

As Alia said there are lots of papers.  That's why we're here.  Want
comments to find the correct algorithm.

Next steps - want to continue the work, get feedback on tradeoffs
etc. and then look to become a WG draft (this seems to be a good
starting point for architecture).

comments on list please (no time now)


Dhanahnjaya (OTV)

OTV is L2/L3 virtualisation providing L2 LAN extension for enterprise
sites.  Lets sites have L2 or L3 connectivity.  Efficient multi-point
protocol.  Works over any kind of core (L2, L3, MPLS, etc.)  Also
simple to provision/manage (key for enterprises).

Basic model is MAC routing.  Advertising L2 reachability.  MACs from
one site advertised to another using a routing protocol so no need to
flood unknowns through the core.  Traffic is encapsulated in IP across
the core but no pre-built tunnels required.  just forward IP packets
using unicast/multicast routing.

This is an overlay across the core.  Edge devices discover each other
(generally by joining an IP multicast group).  Then create adjacencies
and exchange multicast routing information.  Unicast MACs, lists of
active sources etc.  Only exists on the edge so transparent to core
and intra-site.  Acts as an L2 switch on the site side (STP etc.).
Can be a host or a router from the core's perspective.  May have to
advertise particular IP addresses to the core.  No STP transported
across the core (constrained to the sites - and therefore constrains
failure domain).

Forwarding:

When L2 unicast arrives at the edge and if it's at another site you
get a next-hop which is the IP address of the edge device at that
site. So then send packet out (tunnelled).  Unicast can use ECMP.
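
The unicast lookup can be sketched as below.  The table layout, the
addresses and the handling of unknown MACs (flooded inside the site
only, never into the core) are assumptions drawn from the description
above, not from the OTV drafts.

```python
def forward_unicast(mac_table, dst_mac):
    """Decide what an edge device does with an L2 unicast frame.

    mac_table maps a MAC to ("local", port) for MACs learned in this
    site, or ("remote", edge_ip) for MACs learned via the routing protocol.
    """
    entry = mac_table.get(dst_mac)
    if entry is None:
        # Unknowns are not flooded through the core.
        return ("flood-site-only", None)
    kind, where = entry
    if kind == "remote":
        # Next hop is the IP address of the edge device at the other site.
        return ("encapsulate-to", where)
    return ("switch-to-port", where)

table = {
    "00:00:00:00:00:01": ("local", "eth1"),
    "00:00:00:00:00:02": ("remote", "192.0.2.10"),
}
print(forward_unicast(table, "00:00:00:00:00:02"))  # ('encapsulate-to', '192.0.2.10')
print(forward_unicast(table, "00:00:00:00:00:99"))  # ('flood-site-only', None)
```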

Multicast uses a limited set of core multicast groups (typically SSM).
Site specific groups map onto one or more of these core groups.  Edge
device with active source sends out into the core.  Receiver edge
devices join this group in the core and do multicast routing to send
only to interested devices.  Broadcast also uses multicast.  All edge
devices join the group for that.

Supports multihoming.  One authoritative edge device per site. that's
the only one that sends traffic.  It's elected.  Site traffic can still
be load-balanced.  E.g. per-VLAN AED.

How do we do MAC mobility?  Use a control plane mechanism.  MACs are
advertised with a default metric.  When it moves the MAC is
readvertised with a lower metric.  Once original advertisement is
removed (as a result of the ED seeing that) the new ED bumps the
metric back up.
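
The mobility sequence can be sketched with a toy route table in which
the lowest metric wins.  The metric values, class and function names
are hypothetical.

```python
DEFAULT_METRIC = 10  # hypothetical value

class MacRib:
    """Toy table of MAC advertisements: mac -> {edge_device: metric}."""
    def __init__(self):
        self.adverts = {}

    def advertise(self, mac, edge, metric=DEFAULT_METRIC):
        self.adverts.setdefault(mac, {})[edge] = metric

    def withdraw(self, mac, edge):
        self.adverts.get(mac, {}).pop(edge, None)

    def best(self, mac):
        candidates = self.adverts.get(mac)
        return min(candidates, key=candidates.get) if candidates else None

def move_mac(rib, mac, old_edge, new_edge):
    """Replay the move; return the preferred edge device at each step."""
    steps = []
    rib.advertise(mac, new_edge, DEFAULT_METRIC - 1)  # readvertise with lower metric
    steps.append(rib.best(mac))
    rib.withdraw(mac, old_edge)                       # old ED sees it and withdraws
    steps.append(rib.best(mac))
    rib.advertise(mac, new_edge, DEFAULT_METRIC)      # new ED bumps metric back up
    steps.append(rib.best(mac))
    return steps

rib = MacRib()
rib.advertise("00:00:00:00:00:01", "ED1")
print(move_mac(rib, "00:00:00:00:00:01", "ED1", "ED2"))  # ['ED2', 'ED2', 'ED2']
```

The lower metric makes the new location win while both advertisements
still exist.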

Showing the encapsulation (over UDP using IPv4 or v6 - and with a
well-known destination port and dynamic assignment of source ports
based on a hash of the transported frame).  OTV header in the packet
plus instance ID to handle overlapping sets of LANs.
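
A minimal sketch of choosing the UDP source port from a hash of the
transported frame, so that different flows are spread across ECMP
paths in the core.  The hash function (CRC32), the port range and the
destination port value are placeholders chosen for illustration; the
minutes do not specify them.

```python
import zlib

DST_PORT = 8472            # placeholder for the well-known destination port
SRC_PORT_BASE = 49152      # assumed: use the dynamic port range
SRC_PORT_COUNT = 16384     # 49152..65535

def source_port(frame: bytes) -> int:
    """Map the transported frame to a UDP source port deterministically."""
    return SRC_PORT_BASE + zlib.crc32(frame) % SRC_PORT_COUNT

frame = bytes.fromhex("000000000002" "000000000001" "0800") + b"payload"
print(source_port(frame), DST_PORT)
```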

Routing:

any routing protocol would be ok.  Need to autodiscover, form
adjacencies, exchange routing (and other) information. OTV uses IS-IS.
Overlay forms a logical network.  Edge devices run at L2 on overlay.
Leveraging IS-IS extensions (and defining some more).

Showing the drafts (one on OTV and one on IS-IS extensions for OTV).

Comments:

Florin - when you presented in IS-IS we discussed doing this in L2VPN.

Alia - wasn't possible for the presenter to do this.

Florin - overlap is L2VPN and TRILL.  Draft says that.  Encaps has
instance ID.  So it's a L2VPN.

Dino - state intention.  No plan to bring into a WG.

Wim - fits in L2VPN as it supports IP.

Stewart - doesn't as isn't L2VPN.  This isn't TRILL either.

Wim - let's discuss in L2VPN to see what's missing and see what the best way forward is.

Florin - we extended L2VPN to data-centres.  We have proposals there. Clear overlap.

Dino - restating intention.  This is Cisco presenting an option for
L2VPN.  If a WG wants this work to be done ask us and we will answer -
but not planned to put this in a WG.

Lou Berger - confused by Dino's statement.  You're presenting just in
case the IETF is interested?

Dino - presenting to get comments on tech.

Lou - but not to get progress in IETF?  So why is it an Internet draft?

John - we normally standardise in IETF. If you don't want to
standardise then why do that?

Dino - we want to make it public and we don't want codepoints to
clash.