Rtg Area Working Group - IETF 81

Alia - full agenda today. Alvaro is new co-chair. Alia reads the Note Well.

Agenda bashing:
1) composite links
2) fast reroute with max red trees
3) OTV
4) if time, MPLS FRR using LDP extend.

Alia trying to get more scribes. Blue sheets etc.

Charter update being planned. Only discussed Monday so new to the group. Goal is primarily 2 things:
1) emphasise this is WG for hop by hop routing/FRR for unicast & multicast
2) add milestones for composite links.
Please discuss on list.

Draft status:
1) LFA applicability in last call - ending Aug 1st. No comments yet on list. Please read.
2) draft-karan caused much interest last time. Charter update brings in scope but need co-ordination with PIM WG.
3) notvia will be done as informational.
4) ordered FIB looking for opinions. Interest in further work now? Or just document what's known until we get interest in proceeding further?

Composite Link Framework (Ning - had been planned to be Curtis).

Want to update on this draft, and what's changed. Been published on WG list. Also updates as to new things we need to take to requirements draft. Requirements completed WG last call. Framework been in progress for a while and still an individual draft. Want it adopted as WG draft. Been stable but there's a considerable new addition so co-authors are thinking that part of the new addition (management plane requirements) needs putting in the requirements. So will go ahead and add 4 new requirements onto the requirements draft and ask for a respin there. Hopefully quite a small change.

Update for framework itself. Covers overview of usage of CL but also CL control plane. Now listed out extensions needed (e.g. IGP-TE, RSVP-TE). One note of caution - don't be scared by long list as tweaks are fairly minor. Hopefully framework will catch everything so the scope of work is clear.

Added various stuff to framework. Curtis got delayed. Done more like an individual draft but intention was to merge into this one.
Additions are things like arch summary, tradeoffs, challenges. Co-authors will merge the 2 docs after this meeting, and then ask for WG adoption. Some of the wording might need clarification so will do that when the merge gets done.

Some new mechanisms proposed. To meet requirements draft we need to leverage other active IETF work (e.g. MPLS loss-delay). Some of that work is not even at WG draft stage yet. So there's various states there. But the solution here will rely on that other work so we want to point to it. Also the section on required protocol extensions/mechanisms is not complete. Need to resync that when the merge happens and check no gaps. No intention to specify those solutions in this draft. But if people have input on what may be needed then please provide it.

In terms of moving forward the co-authors will merge, publish on list, ask WG to adopt. Will cite existing work and need agreement that it meets requirements. Need to complete list of new protocol work.

When we started composite link (years ago) use cases were well defined and narrow. But now the scope is broader. So might be useful to revise some of the old use cases from earlier revisions and then expand them so operators can understand the broader scope of the composite link work.

Fast Reroute with Maximally Redundant Trees (Alia).

This is work that Alia has been doing with others from Juniper, Cisco, Ericsson. Will talk about motivation and architecture, then Gabor will talk algorithm etc.

Basic issue is that LFA isn't enough - doesn't give 100% coverage. It's NP-hard to figure out how to extend the network when you have a single failure. After failure coverage decreases. It's important to get 100% coverage. NotVia can guarantee coverage but requires lots of network state and complexity. Research done to reduce state but seems to Alia that we need a solution that's perhaps more complex than LFA (hard to get simpler than that) but is simpler than NotVia in terms of state and network complexity.
And of course as LFA is deployed we keep hearing more calls for 100% coverage to improve it.

Also of course unicast isn't enough. So important to talk about multicast - not just FRR but also live-live. Drafts on that were discussed last IETF. And need link plus node protection. SRLG is nice, but not really ready yet.

What if we could always compute 2 link and node disjoint paths between any 2 nodes in the network? Neither one necessarily is the shortest path. But now you have an alternate that isn't failure specific. 2 destination routed trees per destination, and one of those will always work.

Lots of research done (30 papers or more) into maximally redundant trees. There's an algorithm that gives you those trees as long as your network supports it (i.e. is 2-connected). What makes this useful is that it works when your network isn't 2-connected. It'll give you the paths that are "as disjoint as possible given your network". Example of where maintenance happens, or failure takes a while to repair. So now network isn't 2-connected. But you still want as much protection as the network will allow. And want 100% coverage whenever topology permits.

Alia shows example where network isn't 2-connected and 2 trees HAVE to share one link. What we can do is have an algorithm that works as well as possible in that case.

So pair of disjoint maximally redundant trees per IGP-area destination. Blue and Red MRTs. Not one algorithm - there's a spectrum with tradeoffs between computational complexity and optimality of path selection. Just like Dijkstra this runs on link state database. No signalling. Handles any network. Flexibility on path goodness/complexity of computation. Various usability goals. The key thing is to handle real networks with 100% coverage.

So how do you use this? It doesn't replace LFA, it supplements/augments it to get 100%. So if you have an LFA then use it. Otherwise you have the blue and red MRTs. You can pick one based on primary next-hop's failure.
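
Scribe's illustration (not from the slides): a minimal Python sketch of the "use the LFA if you have one, otherwise pick blue or red" rule described above. The function name, the path representation (hops after this router, ending at the destination) and the tie-breaking are assumptions made for illustration; the draft's actual selection rules were not captured in these minutes.

```python
def select_alternate(primary_nh, lfa_nh, blue_path, red_path):
    """Pick a pre-computed alternate for one destination.

    primary_nh: primary next hop whose failure we protect against.
    lfa_nh:     loop-free alternate next hop, or None if no LFA exists.
    blue_path / red_path: hops after this router on each MRT, ending at
                the destination (assumed to be pre-computed elsewhere).
    Returns (kind, next_hop), or None if nothing avoids the primary.
    """
    # An LFA, when available, is always preferred; MRTs only fill the gaps.
    if lfa_nh is not None:
        return ("LFA", lfa_nh)

    candidates = [("BLUE", blue_path), ("RED", red_path)]

    # Node protection: prefer the tree that avoids the primary next hop.
    for colour, path in candidates:
        if path and primary_nh not in path:
            return (colour, path[0])

    # Link protection only (e.g. primary next hop is the destination):
    # accept a tree whose first hop differs from the primary next hop.
    for colour, path in candidates:
        if path and path[0] != primary_nh:
            return (colour, path[0])

    return None


# Hypothetical topology: S reaches D via A (primary) or via B then C.
print(select_alternate("A", None, ["A", "D"], ["B", "C", "D"]))
```

In this hypothetical example the blue tree runs through the failed primary next hop A, so the red tree via B is selected.
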
Draft talks about various stuff - e.g. how to recompute vs swapping blue to red. And of course you pre-select that MRT.

There's an issue that we don't have enough spare bits left to steal in the header. So instead of that we have to do some kind of encapsulation, e.g. use GRE tunnels or use a different LDP label. Advertise additional loopbacks, e.g. have loopback label for FEC and then labels for blue and red MRTs. Could conserve label space with topology-id labels, but have an option here for topology encoded in labels. No new hardware in MPLS case (but need context-based labels).

What about multicast live-live? Want 2 disjoint trees from source to all other nodes. So that's an MRT. So extend PIM and mLDP to say you want to join Blue or Red MRT when you join a group. Up to the receiver to decide which stream to forward on. Still some work in progress in that packets might have to identify topology for cases where you aren't 2-connected.

Also issues for how we do this for non live-live multicast. The PLR doesn't know where the next-next hops are in normal mLDP or PIM. Need to specify which next hops have already joined and their associated labels. Also merge point doesn't know which interface traffic will come in on. 2 ways to handle this:
1) upstream backup join. PLR is the one who knows the information (the merge point doesn't know all that). So the PLR sends an upstream backup join to each merge point.
2) encapsulated backup traffic. So if traffic is encapsulated it must be the alternate, not the regular traffic.

Key point that differs between multicast and unicast: PLR doesn't know if it's a link or node failure. For unicast you just use the node protection alternate. But in multicast your next hop might have receivers hanging off it so you want to send traffic to both link and node protection alternates. Send traffic there until a configured timeout.
The merge point can decide whether to accept alternate traffic based on whether its primary upstream link is up (if it is then the link protection will work, if not then the node protection has to be used). Also if you join to new topology you have a new primary upstream link so you can start dropping alternate traffic on the floor.

Question from Russ White. Wants Alia to talk about computational complexity.
Alia - you have 2 extra labels per IGP destination. Fast computation is linear. More optimal takes more time.
Stewart Bryant - when you say "per-destination" you need to worry about multi-homed prefixes as well? Is it 3 per destination?
Alia - yes. I won't leave that out.

Alia - where do you replicate with multicast? PLR? (ingress replication). Can send along appropriate alternate to MP (which recognises it by encapsulation) and then can forward based on state of primary uplink. Standard issues of ingress replication if you have lots of MPs. But simple and doesn't create lots of extra state. Does require tunnelling as no multicast tree on path to alternates.

Second option is to replicate along an alternate tree. This now creates state in the network. The PLR would send upstream back-up joins unicast to a merge point, intercepted along the way, specifying what kind of alternate it is. We need to merge the alternate trees for different PLRs for the same S,G. Drawback is unnecessary replication but reduces state etc. Also can reduce state by merging alternate trees from the same S regardless of G, as candidate set of next-next-hops is the same.

Showing replication on alternate tree. Problem is have different alternate trees from different PLRs intersecting at the same point. Showing alternate trees one hop on from the source. Basic problem is that when a node gets alternate traffic it may not know which alternate tree it came from. If you merge trees then that node will send to output links for all trees.
Stewart question as to how the egress knows which tree is sending it traffic.
Alia - back to the point that the node looks at its upstream to see if it is up. Can discuss offline. Tight for schedule.

Also cases where primary multicast tree and alternate tree can intersect. So traffic can come over a link from both a primary link and an alternate. So need a way to mark traffic as being on alternate tree.

Some work in progress - e.g. ABR knowing MRTs per area. How do we reduce that state? Various algorithmic work in progress. Exact details on e.g. how to handle broadcast interfaces, unequal link costs, etc.

Gabor - talking algorithm.

Showing a way to find the MRTs. Will assume network is 2-connected as we're short of time. Basic model is partial ordering of nodes in the network. Idea is that the root is both the smallest and the largest node. But the order is partial, as can't always say whether one node is greater than another. Good thing with the partial order is we can find redundant trees. If you have one path always increasing and one always decreasing then the two paths can only have the root in common. If you have 2 trees they must be redundant as long as you increase on one and decrease on the other. Also true if you vary the next hop. If you have 2 greater neighbours you can use both of them for ECMP.

So how do we find those? Worked example. The first path is a cycle from root to root so both directions are ok.

Showing graphs where compute for shortest path. Average paths compared to shortest paths.

As Alia said there are lots of papers. That's why we're here. Want comments to find the correct algorithm.

Next steps - want to continue the work, get feedback on tradeoffs etc. and then look to become a WG draft (this seems to be a good starting point for architecture). Comments on list please (no time now).

Dhanahnjaya (OTV)

OTV is L2/L3 virtualisation providing L2 LAN extension for enterprise sites. Lets sites have L2 or L3 connectivity. Efficient multi-point protocol.
Works over any kind of core (L2, L3, MPLS, etc.). Also simple to provision/manage (key for enterprises).

Basic model is MAC routing. Advertising L2 reachability. MACs from one site advertised to another using a routing protocol so no need to flood unknowns through the core. Traffic is encapsulated in IP across the core but no pre-built tunnels required. Just forward IP packets using unicast/multicast routing.

This is an overlay across the core. Edge devices discover each other (generally by joining an IP multicast group). Then create adjacencies and exchange multicast routing information, unicast MACs, lists of active sources etc. Only exists on the edge so transparent to core and intra-site. Acts as an L2 switch on the site side (STP etc.). Can be a host or a router from the core's perspective. May have to advertise particular IP addresses to the core. No STP transported across the core (constrained to the sites - and therefore constrains failure domain).

Forwarding: when L2 unicast arrives at the edge and the destination is at another site, you get a next-hop which is the IP address of the edge device at that site. So then send packet out (tunnelled). Unicast can use ECMP. Multicast uses a limited set of core multicast groups (typically SSM). Site specific groups map onto one or more of these core groups. Edge device with active source sends out into the core. Receiver edge devices join this group in the core and do multicast routing to send only to interested devices. Broadcast also uses multicast. All edge devices join the group for that.

Supports multihoming. One authoritative edge device (AED) per site. That's the only one that sends traffic. It's elected. Site traffic can still be load-balanced, e.g. per-VLAN AED.

How do we do MAC mobility? Use a control plane mechanism. MACs are advertised with a default metric. When it moves the MAC is readvertised with a lower metric.
Once original advertisement is removed (as a result of the original ED seeing that) the new ED bumps the metric back up.

Showing the encapsulation (over UDP using IPv4 or v6 - and with a well-known destination port and dynamic assignment of source ports based on a hash of the transported frame). OTV header in the packet plus instance ID to handle overlapping sets of LANs.

Routing: any routing protocol would be ok. Need to autodiscover, form adjacencies, exchange routing (and other) information. OTV uses IS-IS. Overlay forms a logical network. Edge devices run at L2 on overlay. Leveraging IS-IS extensions (and defining some more).

Showing the drafts (one on OTV and one on IS-IS extensions for OTV).

Comments:
Florian - when you presented in IS-IS we discussed doing this in L2VPN.
Alia - wasn't possible for the presenter to do this.
Florian - overlap is L2VPN and TRILL. Draft says that. Encaps has instance ID. So it's a L2VPN.
Dino - stating intention. No plan to bring into a WG.
Wim - fits in L2VPN as it supports IP.
Stewart - doesn't as isn't L2VPN. This isn't TRILL either.
Wim - let's discuss in L2VPN to see what's missing and see what the best way forward is.
Florin - we extended L2VPN to data-centres. We have proposals there. Clear overlap.
Dino - restating intention. This is Cisco presenting an option for L2VPN. If a WG wants this work to be done ask us and we will answer - but not planned to put this in a WG.
Lou Berger - confused by Dino's statement. You're presenting just in case the IETF is interested?
Dino - presenting to get comments on tech.
Lou - but not to get progress in IETF? So why is it an Internet draft?
John - we normally standardise in IETF. If you don't want to standardise then why do that?
Dino - we want to make it public and we don't want codepoints to clash.
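
Scribe's illustration (not from the slides): the MAC mobility sequence described in the OTV presentation above (advertise at a default metric, re-advertise from the new site at a lower metric, old edge device withdraws, new edge device restores the default) might be sketched in Python as follows. The class, function names and metric values are hypothetical and chosen only for illustration; the real mechanism runs inside the IS-IS based control plane.

```python
DEFAULT_METRIC = 10  # assumed value; the minutes only say "default metric"
MOVE_METRIC = 5      # assumed value; any metric lower than the default


class Overlay:
    """Toy view of the MAC routes that every edge device (ED) learns."""

    def __init__(self):
        self.adverts = {}  # mac -> {edge_device: metric}

    def advertise(self, mac, ed, metric=DEFAULT_METRIC):
        self.adverts.setdefault(mac, {})[ed] = metric

    def withdraw(self, mac, ed):
        self.adverts.get(mac, {}).pop(ed, None)

    def best(self, mac):
        """Edge device with the lowest metric for this MAC, if any."""
        routes = self.adverts.get(mac, {})
        return min(routes, key=routes.get) if routes else None


def move_mac(overlay, mac, old_ed, new_ed):
    # 1) New ED sees the MAC locally and advertises it with a lower metric.
    overlay.advertise(mac, new_ed, MOVE_METRIC)
    # 2) Old ED sees the better advertisement and withdraws its own.
    overlay.withdraw(mac, old_ed)
    # 3) New ED sees the withdrawal and returns to the default metric.
    overlay.advertise(mac, new_ed, DEFAULT_METRIC)


overlay = Overlay()
overlay.advertise("00:11:22:33:44:55", "ED1")
move_mac(overlay, "00:11:22:33:44:55", "ED1", "ED2")
print(overlay.best("00:11:22:33:44:55"))
```

Returning to the default metric at the end means a later move can again be signalled by advertising a lower metric.
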