Got a number of drafts already in the AD queue: PBB-VPLS PE model and PBB VPLS Interop. There are a couple of issues with references etc.
VPLS multicast has gone to AD but requires substantial changes.
Stewart - two are at the top of my queue. One is the VPLS multicast, where Yakov did a huge amount of editing to simplify it. I need a strong cup of coffee to go through it to check it's clear. The other requires a revised ID or editor's note. There was a good comment from IESG review. Giles gave us text but it didn't work as we don't put references in the abstract. But as a general note, as shepherds and authors, when comments come back from IESG do look at them. It's not that you have to address them to the satisfaction of the responder, but to the satisfaction of the responsible AD. Ideally either resolve each comment or put a polite note that it's not going to be addressed.
Nabil - VPLS Multicast, referred to by Stewart, has been around a long time and gone through two WG LCs. Yakov has done a good piece of work addressing the comments.
E-VPN requirements is now being progressed too. Still have a journey there but it's moving. Goal is that once that's done we'll move onto the solution drafts.
Also have the VPLS MIB. Comments from MIB doctor being addressed before it goes to the ADs.
IPLS is very old - 8 or 9 years? Lots of discussion about what to do. Had agreed to publish as informational and based on IESG may be historic. Andrew is shepherding it.
E-Tree requirements. Passed WG LC. Checked IPR - took a while. Please when we poll for IPR do respond to the mailing list if you're an author. We can't progress any draft until every author confirms there's no outstanding IPR.
Stewart - saves us a great deal of embarrassment if this is done before things go to RFC.
Nabil - also had comments (on E-Tree requirements) to authors. They've not had time to address before the IETF cut-off. Hoping to send to ADs after IETF.
There's also VPWS OAM Interworking. Also quite old. Had been dormant. Restarted work in 2012 and also polled for implementation (there are implementations out there). We need to pick this up with the authors (it has expired).
Multicast. A few drafts:
1) LDP VPLS broadcast extensions. Held waiting for P2MP PWE work. PWE3 and L2VPN need to decide if we need to continue the work. There are authors willing to do so, but the wider question is on the need. But if we do we need the PWE work to progress first.
2) PIM snooping. Jeff Zhang and others. Will be presented today. Hoping to progress this before Vancouver.
3) VPMS. There’s a framework doc but again dependent on P2MP PWE. So like the LDP VPLS broadcast draft it’s waiting on the P2MP PWE work.
Various E-VPN drafts. Both WG drafts here and unlisted individual IDs. Adopted drafts:
1) E-VPN solution – in version 4. Good progress, lots of work by the authors. Being presented today.
2) PBB-EVPN. Also being presented today.
3) TRILL EVPN. Not sure what to do. We’d adopted one draft which expired. There’s another draft out there.
Sam Aldrin – we’re going to do a new rev and ask WG to look at it to see if it’s ready for WG LC.
Nabil – E-Tree is another body of work. 3 drafts:
1) requirements (in last call)
2) framework. Looking to progress that once requirements progresses
3) E-Tree solution (for VPLS). Lots of debate and we picked the 2 VLAN model. Lots of discussion early on. Need to get back to this after framework.
Want to get E-Tree work off our plate before Vancouver.
Then we have a few drafts on multihoming and convergence.
BGP VPLS Multihoming will be presented today by Senad.
The MAC withdraw optimization and Macflush loop detection drafts. Those are quite old. Been last called. Not enough comments on mailing list so have to decide how to progress. Lots of feedback to the authors from the chairs. Ali did expert review. Authors have done edits. But didn’t get feedback at last call. Big push here from ADs with concern about silent calls where there’s no substantive comment. Are people really paying attention? This is happening across WGs. We’re going to try to be stricter on that in L2VPN. Before we adopt an individual ID as a WG draft we want to get substantial commitment from WG to review the draft, as well as to work on it. Hopefully that commitment will continue after the draft is adopted.
Florin – on the MAC withdraw optimization. There was good discussion early on. Got stabilized. Was deployed. So we ought to progress.
Ali – the reason we don’t get enough comments is because it takes so long to get the drafts through the pipe.
Stewart – so should we publish as historic?
Nabil – we’ve been discussing this recently. I agree with Florin on MAC withdraw optimization. There were substantial comments between rev 6 and rev 8. There were a few authors. Didn’t get comments at last call. But checked implementation and authors who were vendors said they had implemented it. But not sure on interoperability of implementations. On loop detection we need to discuss more. Again the authors have taken comments from chairs on board. So we thank them for that and we want to see how best to progress. Might be informational. Might be historic. You’ll help us chairs do our job if you’re active in WG LCs. To Ali’s comment we recognise that drafts sit in the pipe too long. Many people have moved on to new things by the time they get to the end. Wider WG interest will also have dwindled by the time we get to the end. We’re going to do a better job on E-VPN etc. but we’re also clearing the VPLS stuff off our plate. The E-VPN stuff is current, and our goal is to get to the point where the sole focus of the WG is E-VPN.
The inter-domain redundancy draft was agreed to be published as a BCP. Protocol extensions originally in the draft were taken out. Was Last Called 2 weeks ago. Silent. But as this is a BCP the standard is a bit lower. Not a crucial issue for interop etc. But we’d still like to get comments/feedback from the WG. So we will probably extend the Last Call for a week after IETF. So hopefully we’ll get more feedback than the zero we have so far.
Terminology updated since last time – specific implementations rely on general VE NLRI to do multi-homing. Document was a bit confusing so went and redefined what to do with the various NLRIs. Have specified both NLRIs clearly. VE NLRI is used for auto-discovery and PW stitching to the instance and is non-zero. Multi-homing NLRI represents the CE state and is used for state transitioning so have called it the CE NLRI. Used for multi-homing but can also be used for e.g. multicast optimization.
Have added section on the provisioning model. Details on how apart from multi-homing you can infer things from CE NLRI changes for multicast optimization.
Have clearly defined the D and F bits. Got text from the old Kothari draft. Specifically D bit is applied to VE NLRI to say what the significance is (D implies this is the whole VPLS instance). D bit in the CE NLRI represents status of the CE. The F bit’s role is with the VE NLRI.
Also redefined macflush. Had explicit and implicit macflush. Brought this from Kothari draft too. Got rid of explicit flush as not implemented much. Not had much feedback from WG. Understand this is an old draft but we want feedback before the last call. In particular there are operators who use this. Have covered backwards compatibility etc.
Nabil – Senad would you please ask the WG for feedback?
Major change is a new PIM relay mode between PIM snooping and proxy. Also re-arranged some paragraphs and addressed some comments from Giles. Still discussing some comments.
First let’s discuss IGMP snooping and proxy. Snooping means you snoop joins and leaves for membership information and then send them as is. In proxy you consume the messages and generate new ones based on learned membership information. But in both cases only send to router ports (to avoid join suppression).
In the previous version PIM snooped messages were flooded everywhere. So unlike IGMP snooping have to disable join suppression on the CE routers. For proxy it’s similar to IGMP proxy. Consume messages and generate new ones.
PIM relay is like IGMP snooping. You snoop but only send the messages to upstream ports. So no need to disable join suppression on the CE routers. Providers may not want to disable join suppression as it’s hard to co-ordinate with customers. Doesn’t require upstream FSM. No flooding here. And compared to proxy lets the upstream see all the downstream joins. We’re not sure if PIM relay is the right name. If you have any suggestions please let us know!
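The three PIM behaviours compared above can be sketched as a port-selection rule. This is only an illustration of the description in these minutes, not text from the draft: the function name `egress_for_join`, the mode strings and the port names are all hypothetical.

```python
def egress_for_join(mode, ingress_port, all_ports, upstream_ports):
    """Return (ports, regenerated) for a PIM join received on ingress_port.

    ports       -- set of ports the join (or its replacement) is sent to
    regenerated -- True if the PE consumes the join and originates a new one
    """
    if mode == "pim_snooping":
        # Previous draft version: snoop, then flood the join as is,
        # which is why CE routers had to disable join suppression.
        return set(all_ports) - {ingress_port}, False
    if mode == "pim_relay":
        # Snoop, then send the unmodified join to upstream ports only.
        return set(upstream_ports) - {ingress_port}, False
    if mode == "pim_proxy":
        # Consume the join and generate a new one towards upstream ports.
        return set(upstream_ports) - {ingress_port}, True
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical PE with two attachment circuits and two pseudowires.
ports = ["ac1", "ac2", "pw1", "pw2"]
upstream = ["pw1"]
for mode in ("pim_snooping", "pim_relay", "pim_proxy"):
    print(mode, egress_for_join(mode, "ac1", ports, upstream))
```

In this sketch relay and proxy pick the same ports. The difference is whether the upstream router sees the original downstream joins (relay) or a join originated by the PE (proxy).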
Making slow progress and plan to last call by the London meeting. Requesting review and comments. Will also ask the PIM group to comment. So far only had comments from Giles. Resolved some but still discussing others.
The background is that both L3 and VPLS multicast use PMSIs as a conceptual overlay on the P network instantiated by provider tunnels and used to carry customer multicast traffic. The PMSIs are advertised in PMSI AD routes – in particular there are PMSI tunnel attributes that carry the tunnel info. Both the L3 MVPN MIB and VPLS multicast MIB have related objects.
We’ve extracted those common definitions into this MIB. Has two things:
1) textual convention for tunnel types.
2) PMSI tunnel attribute table – indexed on flags, type, label and tunnel attribute ID (as signalled in PMSI AD routes). Last two attributes in the table entry are tunnel pointer and tunnel interface. If a particular provider tunnel exists in a MIB table there will be a row pointer to it. Likewise if there is an interface for the tunnel there will be a row pointer to the interface. Both the L3 MVPN MIB and the VPLS multicast MIB have row pointers to entries in this table.
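As a rough picture of the table just described, the index and the two pointer columns could be modelled as below. This is a sketch for discussion only: the class and field names are hypothetical and are not the object names used in the MIB, and the values are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Index as described in the minutes: flags, tunnel type, label, tunnel attribute ID.
PmsiIndex = Tuple[int, int, int, bytes]


@dataclass
class PmsiTunnelAttributeEntry:
    flags: int
    tunnel_type: int
    label: int
    tunnel_id: bytes
    tunnel_pointer: Optional[str] = None    # row pointer to the tunnel's MIB row, if one exists
    tunnel_interface: Optional[str] = None  # row pointer to the tunnel's interface, if one exists

    def index(self) -> PmsiIndex:
        return (self.flags, self.tunnel_type, self.label, self.tunnel_id)


table: Dict[PmsiIndex, PmsiTunnelAttributeEntry] = {}

# Arbitrary illustrative values; "ifIndex.42" is a made-up pointer string.
entry = PmsiTunnelAttributeEntry(flags=0, tunnel_type=6, label=0,
                                 tunnel_id=bytes([192, 0, 2, 1]),
                                 tunnel_interface="ifIndex.42")
table[entry.index()] = entry

# A row in the L3 MVPN MIB or the VPLS multicast MIB would point at this index.
found = table[(0, 6, 0, bytes([192, 0, 2, 1]))]
print(found.tunnel_interface)
```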
L3 MVPN MIB is a WG draft for L3VPN. We plan to request last call in November. To do that we need to progress this draft too so we’re requesting comment/review from this WG – as we need consensus in both WGs.
Nabil – are you presenting this in L3VPN?
Jeffrey – already presented last time around.
Updates for rev 04. 3 categories of changes (added sections, removed sections, clarified others). Did a good scrub to get this ready for WG LC. So any refs that said “to be done later” etc. were removed.
Modified ES route – added originating router’s IP address to cover multi-homing across ASes. Added section 16.2 for sticky/static MAC (e.g. operator wants to configure MAC of a server statically so no other VM can pretend to be that MAC). New flag in ESI extended community to indicate whether a MAC is static. If a MAC is static and another PE learns it locally it’ll alert the operator and stop learning it. Added section 9.6 for interop with single-homed PEs. Thanks to Thomas Morin here. Even single-homed PEs need to execute some of the multi-homing procedures (those for Ether AD route for fast convergence, aliasing, and backup path). Also added section 19 on frame ordering. Important topic for Ethernet. When the most significant nibble of dest MAC is 0x04 or 0x06 it can alias with IP so P routers that do ECMP may treat this as IP payload. So as a result a control word is mandated before the Ethernet header.
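The frame ordering point can be illustrated with a small check on the first nibble after the label stack. This is a sketch only: the function names are hypothetical, and the 4-byte control word with a zero first nibble is an assumption modelled on the pseudowire control word, not a layout taken from the draft.

```python
def first_nibble(payload: bytes) -> int:
    """Most significant nibble of the first byte following the label stack."""
    return payload[0] >> 4


def may_be_mistaken_for_ip(payload: bytes) -> bool:
    # A P router guessing the payload type sees 4 (IPv4) or 6 (IPv6) here.
    return first_nibble(payload) in (0x4, 0x6)


def with_control_word(frame: bytes) -> bytes:
    # Assumption: a 4-byte control word whose first nibble is zero.
    return bytes(4) + frame


# Destination MAC 40:00:00:00:00:01, source MAC 00:aa:bb:cc:dd:ee, EtherType 0x0800.
eth = bytes.fromhex("400000000001" "00aabbccddee" "0800")
print(may_be_mistaken_for_ip(eth))                     # True
print(may_be_mistaken_for_ip(with_control_word(eth)))  # False
```

With the control word in front, the first nibble no longer depends on the destination MAC, so a P router doing ECMP has no reason to parse the frame as IP.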
Removed 5 sections – leaving those for future work. MP2MP LSPs, multicast for selective trees, multicast with explicit tracking, LACP state sync and local repair. In the future, if there’s interest, we might do new drafts on those topics.
Changes. EVPN is now EVPN, not E-VPN. EVI was sometimes used for forwarding table and sometimes as EVPN instance. So now it’s always EVPN instance (can span across multiple PEs). The table itself is now “MAC VRF”. So updated the doc to make sure terminology for the two is consistent. Also clarified relationship between the EVIs, the VLANs and the VIDs. Can be 1:1 or 1:many mapping from EVI to VLAN. VLAN can have multiple VIDs. A VID just identifies a VLAN attachment. Also clarified MAC VRF relationship to bridge domains etc. Clarified prefix vs. address for MAC and IP in EVPN MAC route. Changed MPLS label field in EVPN MAC route to be just one label (not a stack) as nobody was using the stack. Expanded the description of default GW for inter-subnet forwarding. And added a paragraph to explain why PEs don’t need to compare ESIs in cases of MAC mobility amongst single-homed CEs.
Believe have done a good scrub and that this is ready for WG LC.
Robin (Huawei) – lots of multicast sections have been removed. Suggest that the multicast part of E-VPN is moved to a separate draft.
Ali – we already cover inclusive multicast and ingress replication here. The only stuff we don’t cover is selective multicast. That’s not the primary utilization by operators. We may cover just that in a separate draft.
Robin – relating to the label stack. The ESI label is a single one instead of a stack. That’s a change?
Ali – you still have the stack in your frame. But the label for your VPN instance is now one label instead of multiple labels. But there will still be a stack with tunnel label, EVPN instance, Entropy etc. and if needed a split horizon filtering label. What has changed is the EVPN instance is now one label.
Robin – RFC3107 defines label stack for labelled BGP for IP prefixes. Not a single label. So I think this is an issue. I have done work on MPLS big labels.
Ali – this is consistent with L3VPN where there’s one label for the VPN instance.
Robin – but this is RFC3107.
Ali – if you do 3107 that’s fine. But the L3VPN draft has a single label.
Robin – there’s an inconsistency.
Ali – bottom line is we want draft to be simple. Vendors need to be able to implement and interoperate.
Nabil – if there are more comments on that let’s take them to the list. I suspect we will have an interop-fest at some point.
Wim – regarding the prefix part we’d like to discuss. There’s a draft about it. Multiple ways to do it. We’d like to have the WG understand the different scenarios so we can pick the right solution. I’d propose we do that after Jorge presents at the end of the session.
Jorge – draft looks good. I am struggling with (apart from the prefix thing) the MAC mobility section. You can have a route with just a MAC, or with a MAC and an IPv4 address or MAC and IPv6 address. So based on draft if you get an advertisement with a higher sequence number you should withdraw the MAC route. But it doesn’t say which one.
Ali – the BGP prefix consists of MAC and IP address. So it’s the MAC and IP address that gets compared in the MAC route. So there’s no issue. If you don’t have a prefix then all you have is the MAC.
Jorge – but you can still have more than one prefix in the RIB. If you advertise the MAC with the IP for the ARP resolution you can have two or more routes in the RIB. If you just get an advertisement for the MAC without any IP do you remove all the routes?
Nabil – so if you get a MAC only then what do you do to the ARP entries?
Florin – do you compare the sequence number across different routes? Or per route?
Ali – when you do MAC mobility it consists of both MAC and IP. When there’s a MAC move the PE to which the VM is attached will be able to know both MAC and IP so will advertise both.
Nabil – can we defer this to the end?
Ali – you can envision scenarios where the PE only sees the MAC and no IP. So there’s an aliasing issue and we need to decide how to address it.
Jorge – yes, we need to clarify that.
Lucy – there seem to be quite substantial changes here so I think it’s too early for WG LC.
Ali – those are not substantial changes. If you compare this rev to earlier ones it has been fairly stable since rev -02. Quite a while. This is fine tuning – e.g. ES across multiple ASes. Sticky MAC was just a paragraph we added. If you go through this 40 page document those changes are a few paragraphs.
Nabil – your comment is about readiness for last call. We need to raise the issues which could get in the way. Won’t be immediately after IETF. We need to go through the list. It’s good to get feedback to prepare the draft to be ready for last call.
Ali – we want to be proactive. Whatever changes we could think of we incorporated.
Lucy – you introduced control word for load balancing. But EVPN doesn’t use PWE.
Ali – it’s very simple. No negotiation. We’re mandating it.
Shahram Davari – you’ll accept comments before starting last call?
Ali – absolutely.
Changes are clarified B-MAC assignment for multi-homing with per-ISID load-balancing. Have active/active and active/standby. Per-ISID LB is active/standby. Multiple B-MAC addresses for the same Ethernet Segment (one per PE). These can be one per PE or one per site per PE. Covered both options. Also added important section on failure handling. For active/active is simple, but for active/standby need to know what to do in various scenarios.
Also want to accept comments before WG LC.
The change since rev-01 is that 01 only covered homogeneous scenarios where all PEs have IRB. In some cases IRB may only exist in some PEs – either due to capability or preference to control policies centrally. So added new scenario to the draft – assumes that not all NVEs have IRB but that it might be available centrally at a gateway. Had 5 or 6 scenarios before and added one more.
N.B. The draft is called “inter-subnet-forwarding”. The presentation says “inter-subnet-switching”.
Draft talks about how you connect VXLAN islands using EVPN with VXLAN doing learning/data-plane.
We added all-active multi-homing scenario (i.e. active/active). So 3 new sections for that. Also clarified how the Ethernet tag is set in the EVPN MAC route. If it’s a 1:1 mapping from VNI to VLAN then set to zero. If VNI corresponds to multiple VLANs (rare scenario) then pass the VNI in the Ethernet MAC route.
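The Ethernet tag rule stated above is small enough to write down directly. A minimal sketch, with a hypothetical function name and parameters:

```python
def ethernet_tag_for_route(vni: int, vlans_mapped_to_vni: int) -> int:
    """Ethernet tag to place in the EVPN MAC route, per the rule in these minutes."""
    if vlans_mapped_to_vni < 1:
        raise ValueError("a VNI must map to at least one VLAN")
    if vlans_mapped_to_vni == 1:
        return 0      # 1:1 VNI-to-VLAN mapping: tag is zero
    return vni        # VNI corresponds to several VLANs: carry the VNI


print(ethernet_tag_for_route(vni=5000, vlans_mapped_to_vni=1))  # 0
print(ethernet_tag_for_route(vni=5000, vlans_mapped_to_vni=3))  # 5000
```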
EVPN core network connects VXLAN islands.
Want active/active load balancing so unicasts can arrive at any PE on a per-flow basis. Multicast needs to be blocked properly. Scenario is that VXLAN network looks like Ethernet to EVPN. EVPN does the DF election and load balancing. Only difference here is that normal active/active is defined for a device, not a network. With a device running LACP in active/active mode the multicast only gets blocked in one direction (device does a hash and only picks a single link). When you have a VXLAN network the multicast arrives on both PEs so the blocking needs to be symmetric – need to block core to site and site to core. Defined how we prevent flip-flopping. So have anycast address for the VTEP at the PEs.
Using PIM Bidir. One way to do it is to have multiple trees – one per spine. NVE can select any tree for load balancing. Each PE joins all the trees.
PEs use DF election to decide for a given VNI whether it’s a DF or not. Once we do DF election there’s no loop or duplication.
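The minutes do not record which election procedure the draft uses. Purely as an illustration, and assuming a modulo-based carving over the PE addresses sorted in increasing order (the style used for service carving in baseline EVPN), a per-VNI election could look like this. The function name and addresses are hypothetical.

```python
import ipaddress


def designated_forwarder(vni: int, pe_addresses: list) -> str:
    """Pick the DF for a VNI among the PEs attached to the same VXLAN island.

    Assumption: modulo-based carving over the PEs sorted by IP address.
    The procedure actually specified in the draft may differ.
    """
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[vni % len(ordered)]


pes = ["192.0.2.2", "192.0.2.1"]
for vni in (100, 101):
    print(vni, designated_forwarder(vni, pes))
```

Every PE runs the same computation on the same inputs, so exactly one PE considers itself DF for a given VNI, which is what removes loops and duplicates.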
The draft describes how to connect VXLAN islands over EVPN and how to have it in an all-active mode. How to do DF election. Encourage you all to read and give feedback.
Florin – I noticed the control word stuff in EVPN. It’s not needed for VXLAN though, right? So only define for MPLS encaps?
Ali – yes it’s only needed for MPLS encap, and is mandated for it.
Florin – I think you need it for NVGRE but not VXLAN.
Ali – NVGRE does ECMP based on GRE key.
Nabil – there’s room for 8 bits to do it but nobody supports it.
Ali – but P routers don’t go beyond the GRE header to do load balancing. Either the P routers understand NVGRE or they don’t. But they don’t try to do load balancing if they don’t support it so we don’t need control word for NVGRE.
Nabil – I agree.
Florin – for VXLAN. We agreed to use the Ethernet tag for the VNID (as a forwarding component). I think in EVPN we need to go back and take that out of the route.
Ali – no, the beauty here is we made it consistent. In the EVPN baseline we’ve defined VLAN, VLAN-bundle, and VLAN-aware bundle services. The only case where the Ethernet tag is non-zero is VLAN-aware bundle. So it’s the same here. If it’s VNI based set it to zero. But if we bundle multiple VNIs then you carry the VNI in the Ethernet tag.
Florin – for VXLAN the VNID was a data plane component like an MPLS label (we identify the VNID that needs to go in the data plane). Goes in the Ethernet tag when we do the advertisement.
Ali – we used to do that. But now we set it to zero. It’s one to one mapping. The RT that is carried with this route already defines the VPN ID. You don’t need to carry it.
Nabil – that requires some discussion amongst the authors.
Lucy – going back to the diagram on slide 3. This is an overlayed BGP control plane with the underlay for VXLAN being IGP.
Ali – the underlay IGP gets terminated so not carried over the overlay. You terminate VXLAN too. This is for data-plane learning. There’s another draft for control plane learning. This operates just like normal EVPN in the data plane. You learn the CMAC addresses and advertise them using the overlay BGP. On the other side you again do data-plane learning on the VXLAN side.
Lucy – so NVE1 can’t see NVE3 through the underlay IGP?
Ali – no.
Nabil – one way to think of this is that VXLAN is an attachment circuit to the EVI.
Lucy – this drawing is confusing.
Aligned with EVPN-PBB. Removed refs to 802.1Qbp as will do it in a separate document. All issues raised during discussions have been resolved so we feel draft is ready for WG adoption.
Defines mLDP extensions for signalling MDTs over EVPN. Could be used for VLAN type FECs but mostly for SPB and I-SID. Define here how to do this better and want WG to discuss so we get feedback. Since last rev have clarified how to use VLAN aware/unaware RTs.
Nabil – have you had any feedback so far?
Jeff – no.
Nabil – would be useful to get feedback then. Also might need to take this to MPLS WG.
Jeff – I’ve already asked Ice for some feedback.
This is a brand new draft talking about segment EVPN. Co-authored by 3 people.
Proposing an enhanced EVPN solution which satisfies the PBB EVPN requirements but without having to implement PBB EVPN. Only for IP/MPLS. Draft also describes pain points for PBB-EVPN implementation and operation.
This solution requires each CE site to have its own Ethernet segment (includes single-homed CE sites). In EVPN only have ES for multi-homed CE sites. Have a unique ESI and a global ES label for each ES. Draft shows how to generate the ES for single-homed CE (use the same mechanism used for multi-homed CE). Draft suggests using the RR to assign the global label.
Have tunnel label, then EVI label, then source ES Global Label before payload. So compared to PBB EVPN it is much more efficient. At the egress the PE can learn C-MAC to VLAN tag, EVI and source ES bindings. PE can then build a forwarding table based on data-plane learning and control-plane info. So no need for the control plane to advertise C-MACs (just like PBB EVPN).
This improves the split horizon by using ES global label instead of ESI label. Works for both P2MP and MP2MP (EVPN has issue in the MP2MP case). Also simplifies the implementation. This also supports all the usual EVPN functions. And it has the efficiency/implementation of EVPN rather than that of PBB EVPN but meets the PBB EVPN requirements.
Would like feedback and discussion of whether we make this a WG item.
Ali – this is similar to PBB EVPN except for one minor problem: it doesn’t work. This kind of solution is not new either. In L2VPN when we looked at scaling VPLS MACs there was a similar proposal and the WG decided in favour of the PBB encaps. We had lots of discussion. So it’s worth looking at the discussion from 2006. Secondly – when you talk about the global label, there is no global label in MPLS. If you go to the MPLS WG they will chase you out. Thirdly you can’t assume there will always be a RR. And even if it exists it won’t work across AS boundaries. Don’t trivialise these issues. The PBB EVPN draft has been out for more than 2 years. It’s ready for WG LC. I recommend the WG energy and time is spent on things that have been decided and improve those rather than coming up with a new iteration of something we discussed 8 years ago.
Lucy – with regard to your first comment when new things come they take time.
Ali – the difference between this and EVPN is that EVPN was a new proposal. This was discussed 8 years ago.
Lucy – but this is for EVPN, not VPLS
Ali – it’s the same thing. This has been discussed.
Nabil – let’s separate these out. Removing PBB encap doesn’t give you hiding of customer MACs. If you isolate that then the global label space is another issue. How will it be programmed – what if 2 RRs, what if multi-AS. Worth taking this to the mailing list.
Ali – one other thing. PBB has been a standard since 2006/2007. Multiple vendors have it. Very easy to integrate PBB with EVPN and take advantage of both. Also even with this proposal you’re talking about replacing the BMAC. There’s an advantage in the BMAC DA in that you can distribute the BMACs across the linecards and don’t need them on the core-facing linecards. I recommend you go and revisit these discussions.
Lucy – I’d be glad to discuss with you.
Ali – I’d prefer to concentrate my time and not revisit stuff we already know is not worthwhile. We have work to do to improve our current technology. There are still loopholes we want to close in our solutions.
Robin (Huawei) – comments regarding PBB and MPLS encapsulations. Both PBB and MPLS already exist. So maybe there are two options – PBB MPLS and this MPLS-based solution. I don’t think global MPLS labels are mature enough. I have written drafts in MPLS WG on global labels and recommend you read those. Also it’s not a revolutionary change to use a centralised controller to assign global labels. Regarding Ali’s point, we can concentrate on PBB EVPN. But for MPLS-based carrier networks that use seamless MPLS we can solve all issues based on MPLS instead of incorporating PBB.
Shahram – I’d support Ali. We don’t want two solutions to the same problem. There are so many RFCs to implement we don’t want to implement more than one just because this one is slightly more efficient (even if we assume it works perfectly).
Lucy – Read the draft. We have VPLS and now we have EVPN. So why did we need EVPN?
Ali – we have VPLS and PBB VPLS. PBB VPLS deployed by a number of carriers. The comment about carriers preferring MPLS isn’t true. Carriers want something that addresses the requirements. That’s why PBB VPLS has been deployed. The same applies here to EVPN. There are carriers who will deploy PBB EVPN. Let’s not reinvent the wheel for nothing. And we haven’t thought about all the issues I mentioned. PBB EVPN is a WG doc. Is it a good use of our time to propose another solution? We’ve done a full evaluation and decided on PBB EVPN. If there’s no merit then the WG shouldn’t consider multiple solutions for the same thing.
Nabil – we’re not adopting this yet. It’s an idea, and we don’t want to block new ideas. Let’s allow discussion to happen on-list and then we’ll poll the WG for how to proceed. From what I read I don’t think this replaces PBB EVPN. But that’s just me with my chair hat off.
Ali – so what’s it trying to address?
Nabil – let’s take it to the mailing list.
Wim – I also believe PBB EVPN is good. Sure, this is slightly more efficient in terms of encaps but it introduces a number of issues (e.g. global labels) so I’d like us to stay with PBB EVPN and not adopt this.
Nabil – being clear we’ve adopted PBB EVPN and that will continue as people who deploy PBB need it. We’ll take this to the mailing list.
Robin/Zhenbin – PBB VPLS uses full mesh of LDP. PBB EVPN uses BGP and has centralized control point, so there’s a difference.
When we implement EVPN there are some points that can be optimized. Example here on slide 2. PE3 is DF for CE2. But when do ingress replication we set up tunnels from PE1 to PE2, PE3 and PE4. Multicast traffic gets sent to all 3 but only needed at PE3. Proposing an optimization here.
New extended community for multicast state. State 0 means state is active (EVI can send traffic to the CE). State 1 means state is inactive (EVI cannot send multicast traffic to the CE).
Advertise this community with the multicast route. If multicast is active will send this with state of 0, else with state of 1. So then only send BUM traffic to PEs advertising multicast state of 0. Either not set up a tunnel or not use the tunnel.
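The ingress PE behaviour for the multicast route case can be sketched as a simple filter. The state values follow the description above (0 active, 1 inactive); the function name and the route dictionary are hypothetical, and the sketch does not cover the Ethernet AD route case.

```python
ACTIVE, INACTIVE = 0, 1  # multicast state values as described above


def bum_replication_list(multicast_routes: dict) -> list:
    """Return the PEs that should receive ingress-replicated BUM traffic.

    multicast_routes maps a remote PE name to the multicast state it advertised.
    """
    return sorted(pe for pe, state in multicast_routes.items() if state == ACTIVE)


# Slide 2 example: PE3 is the DF for CE2, so PE2 and PE4 are taken to be inactive.
routes_seen_by_pe1 = {"PE2": INACTIVE, "PE3": ACTIVE, "PE4": INACTIVE}
print(bum_replication_list(routes_seen_by_pe1))  # ['PE3']
```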
Also advertise this with the Ethernet AD route. If the ES of the EVI on the leaf PE doesn’t forward BUM traffic it will set this state to 0. If it can send it it will set it to 1. Ingress PE can collect state and determine if leaf PE can forward so again can save bandwidth.
Showing example of how ingress PE can avoid nonsense BUM forwarding. Also showing egress local protection for P2MP LSP. If failure happens on PE it will use the inactive one. Multicast advertisement in multicast route can facilitate provision of the egress protection.
Looking for comments/feedback on this.
Ali – this is a solution for EVPN to optimise ingress replication so non-DF PE doesn’t get BUM traffic. This was already in the EVPN draft (called “source quench”). We decided that because of the additional complexity it introduced, plus the use cases we had from carriers indicated that your PE won’t be connected to one device and for some of the other devices you’d want to do load balancing as the other PE will be the DF. So because of the deployment scenario this optimization wasn’t justified so we took it out. But if a customer comes and needs it we can put it back. I can understand with the draft Lucy presented it’s hard to track stuff from 8 years ago. But this was only from 2 or 3 years ago.
Zhenbin – I’ll check this.
EVPN is being deployed for data-center interconnection and can also be deployed for metro networks. Existing metro networks use Y.1731 or MPLS-TP OAM on PWEs for performance monitoring. EVPN will also need performance monitoring.
Challenge in EVPN (as in IP-VPN) is identifying the source of received packets – as the same label is advertised to all other PEs.
Suggested solution is a point to point connection in E-VPN. Concept is an EVI to EVI tunnel (ET). Egress PE can use the ET to identify the ingress EVI.
Control plane mechanisms:
1) Ethernet AD route used for VPN membership auto-discovery.
2) ET label allocation – need a different ET label for each remote EVI.
So now can use the ET label to differentiate traffic from different PEs.
New route type in EVPN needed for this.
2 possible approaches:
1) additional ET label
2) replace MAC label with ET label
ET label can only identify egress EVI so in the 2nd approach the egress PE must look in MAC table to figure out how to forward the traffic. But the benefit is that the label stack is the same depth as for normal EVPN.
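The depth trade-off between the two approaches can be shown with a toy label stack. This is a sketch under assumptions: the minutes do not say where the additional ET label sits in approach 1, so placing it below the MAC label is a guess, and the function name and label values are invented for illustration.

```python
def label_stack(approach: int, tunnel: int, mac: int, et: int) -> list:
    """Top-to-bottom label stack for unicast traffic under each approach.

    Assumption: in approach 1 the ET label sits below the MAC label.
    """
    if approach == 1:
        return [tunnel, mac, et]   # additional ET label
    if approach == 2:
        return [tunnel, et]        # ET label replaces the MAC label
    raise ValueError("approach must be 1 or 2")


normal_evpn = [1000, 2000]  # tunnel label, MAC label (illustrative values)
print(len(label_stack(1, 1000, 2000, 3000)))                      # 3
print(len(label_stack(2, 1000, 2000, 3000)) == len(normal_evpn))  # True
```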
The ET label enables RFC6374 to be used for EVPN PM.
Looking for comments/feedback and will revise the draft.
Greg Mirsky – you propose to change encaps for all data traffic?
Zhenbin – if you want to implement this you can add one ET label.
Greg - you don’t need this for OAM packets if you’re going to use MPLS-TP OAM.
Zhenbin – this is for data traffic, not OAM.
Greg – you can use synthetic measurement, you don’t need to do passive measurement. You can create OAM packets and then no need to change encapsulation. No issue for delay measurement as that’s done with OAM packets. Yes, you might be limited in OAM mechanisms but why not use synthetic measurement.
Zhenbin – I think this method may be more accurate.
Greg – I think that’s questionable.
Sam Aldrin – there’s already a draft called EVPN OAM requirements and framework. So why do we need this one? Is your draft a solution? If so you need to change the title.
Zhenbin – this is solution.
Sam – with the new label how do you ensure the data and OAM follow the same path.
Zhenbin – just like MPLS-TP, ECMP is not taken into account.
Sam – but you can’t ensure on the P routers that traffic follows the same path. Are you planning this label only for PM?
Zhenbin – even the normal forwarding can use this encapsulation. It’s a question of local operation. You can use either encaps for normal traffic.
Sam – OAM has to verify the data path. If you have to do that then you need it to take the same path as data. If you encapsulate with a different label stack you can’t ensure that.
Ali – was going to make the same comment. OAM is verification of the data path for fault and performance. I don’t see that here. VPLS OAM for fault and performance uses the same kind of pseudowire as traffic to ensure that.
Jakob Heitz – you can’t send a different label to different PEs in BGP. You may not have a direct session to the other PE. Could have RRs, confederations etc.
Zhenbin – this FEC is two EVIs. You send this label to all PEs. If this EVI is not on this PE it will drop it. Same as VPN where you send a prefix with different RT.
Ali – RT is different from label.
Zhenbin – it’s the same. This is also control plane.
Ali – you can do RT constraint with a label.
Zhenbin – you can refer to the L3VPN PM extension. I’ve proposed that draft in L3VPN.
Nabil – we haven’t yet done work on EVPN OAM. So before we look at solutions we should step back and see if the WG is ready to take on the OAM requirements and framework (to Sam’s point). Define what needs to be done and then come back to solutions. Good to talk through it right now but we’re not ready to take solutions on. We need to capture things like the requirement for OAM to have congruence with the data plane in the requirements.
Showing scenario for a multi-homed network in EVPN. Reliability is improved by multi-homing bridged network to the EVPN network. If PE doesn’t participate in control protocol of bridged network then elect DF per VLAN. If PE does participate then can do active/active MAC-based load balancing.
Showing the DF mechanism. All traffic on a given VLAN goes via the same PE.
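A minimal sketch of per-VLAN DF election. The minutes do not say which election rule the presenter assumed; this sketch assumes the modulo-based service carving specified in RFC 7432, where the PEs on a segment are ordered by IP address and the PE at position `V mod N` is DF for VLAN `V`. The function name `elect_df` and the example addresses are hypothetical.

```python
import ipaddress


def elect_df(pe_addresses, vlan_id):
    """Return the PE that is DF for vlan_id (modulo-based rule, assumed)."""
    # Order the candidate PEs by numeric IP address, lowest first.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # The PE at ordinal (V mod N) is the DF for VLAN V.
    return ordered[vlan_id % len(ordered)]


pes = ["192.0.2.2", "192.0.2.1"]
for vlan in (10, 11):
    # Every PE runs the same computation, so all traffic for a given VLAN
    # is forwarded by the same PE.
    print(vlan, elect_df(pes, vlan))
```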
Now showing active/active load-balancing. Both PEs participate in control protocol of bridged network. So some traffic goes through one PE and some through the other. Can either emulate MSTP root bridge or can tunnel bridge control plane protocol.
Showing the emulated MSTP root bridge solution. Both PEs use the same bridge ID so they emulate one bridge. Only applicable for STP/MSTP. Can’t use this for G.8032.
Now showing bridge control plane protocol tunneling solution. Pre-establish a tunnel between the PEs. The tunnel enables MAC based load balancing.
Details on the two scenarios.
Protocol extension to EVPN. M bit indicates MAC-based all-active multi-homing with no DF election. So if M bit set to 1 then skip the DF election.
Asking if WG can put this in EVPN base protocol.
Ali – some problems with this:
1) scalability. Number of MSTP instances. If PE participates in MSTP that can be a scaling issue.
2) BPDU tunneling – convergence time is a function of the provider’s network (the access time). Also split brain issue.
These same problems came up in IEEE when connecting .1Q to PBB etc. The solution is Layer 2 Gateway Protocol. Already implemented.
Weiguo – PE only needs to implement MSTP for bridge networks, so scalability is not an issue.
Ali – it’s an issue because you connect a bunch of access networks to each PE and each network has its own MSTP instance. All this stuff has been looked at.
Weiguo – many vendors already do this for VPLS multi-homing though.
Ali – yes, tunneling was proposed 10 years ago. We’ve moved on since then. There are better mechanisms. L2 Gateway Protocol is standardized by IEEE. Or you can do DF election – no need to participate in MSTP. Participating in MSTP requires the provider network to be provisioned with all the parameters for each MSTP instance and that’s a burden.
Weiguo – scale may not be an issue e.g. for enterprises. Also MPLS-TP in VPLS over PW is also deployed by many carriers. So this is similar for EVPN.
Ali – all I’m trying to say is this is my third comment with respect to this kind of thing. I’d encourage people to look at work that’s already been done.
Weiguo – MAC-based load balancing is already in the EVPN requirements.
Ali – it’s not mandatory.
Nabil – these are valid comments but let’s take them to the list.
Two problems here:
1) duplicate delivery of flooded traffic to multi-homed CE. DF election can be used to prevent duplicate copies.
2) Loop and echo forwarding among multi-homed PEs. Caused if CE sends traffic to non-DF PE. Split horizon can prevent this in MPLS networks.
Challenge in NVO3 case that each PE needs to assign a unique IP address for each Ethernet segment to enable split horizon. IP address allocation scalability issue.
Proposed extension using LAGID. LAGID is similar to ESI label in MPLS. Carry the LAGID in reserved bits in VXLAN/NVGRE header. Then DF NVE can use LAGID.
Showing where LAGID can be put in the VXLAN or NVGRE header.
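A rough sketch of how an NVE might use a received LAGID when flooding traffic from the overlay onto its local segments, combining the DF check (problem 1) with split horizon (problem 2). The data layout, the function name `flood_targets`, and the use of `None` for "no LAGID" are assumptions; the header bit positions proposed in the draft are not recorded here and are not modelled.

```python
def flood_targets(local_segments, rx_lagid):
    """Pick local segments that should receive a flooded frame from the overlay.

    local_segments maps a segment name to {"lagid": int or None, "is_df": bool};
    a lagid of None marks a single-homed segment (assumed convention).
    rx_lagid is the LAGID carried in the received header, or None.
    """
    targets = []
    for name, seg in local_segments.items():
        multi_homed = seg["lagid"] is not None
        if multi_homed and seg["lagid"] == rx_lagid:
            continue  # split horizon: frame originated on this same segment
        if multi_homed and not seg["is_df"]:
            continue  # only the DF floods towards a multi-homed CE
        targets.append(name)
    return targets


segments = {
    "es1": {"lagid": 7, "is_df": True},
    "es2": {"lagid": 9, "is_df": False},
    "single": {"lagid": None, "is_df": False},
}
print(flood_targets(segments, 7))
print(flood_targets(segments, None))
```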
Nabil – this is focussed on NVO3. Need to be driven by requirements from NVO3. I don’t see the context here – what is the NVO3 network? Has to come via NVO3 work. If NVO3 implements EVPN or VPLS then we can address that here.
Ali – are you talking about EVPN control plane for NVO3? Have you read EVPN overlay draft? It exactly addresses this and without defining any new extended communities or anything. It’s called local bias. Avoids one IP per segment. Described in a couple of paragraphs.
Discussion of complexity of base EVPN draft at last IETF so suggested to write a draft on usage. This is it. Explaining procedures using a simple example network. Want feedback from the WG.
Nabil – we took this as an action item in Orlando. Giving a guide for which parts of EVPN apply where so people know where to focus. Will be informational.
Not presented due to time constraints.
In data-center the basic construct is a L2 domain. Need EVPN to get rid of issues with L2 domains in data-center. Another requirement is to have inter-subnet connectivity in the data-center. Must be flexible. So subnets can be behind an IRB interface (as in the diagram) or can be behind a virtual appliance. Diagram shows example of virtual appliances. A virtual appliance can be a L2 or L3 firewall, a NAT router, or many other things. The appliances have characteristics – such as they don’t run dynamic routing protocols, they have basic redundancy mechanisms based on a floating IP, and they can move. Puts constraints on the EVPN control plane – need to advertise prefixes on behalf of the virtual appliances and need to advertise prefixes with different overlay next hops (IRB interfaces, floating IPs, MAC addresses, ESIs).
Solution is a new route type in EVPN for prefix-advertisement (route-type 5). The route-type 2 (MAC advertisement) has 2 functions – advertises MACs and MAC/IP tuples for ARP. Route-type 5 will advertise prefixes independently of route-type 2 and can have the different next-hop types.
Gives us a clean identification of a prefix. And can ignore routes if can’t process prefixes. No MAC information so just look at IP prefixes. Flexible next-hop types (IRB, floating IP, MAC, ESI). Decouples prefix advertisement from MAC mobility. E.g. if MAC advertised from NVE1 and NVE2 then you have mobility (VM has moved). But if you advertise a prefix doesn’t mean you have mobility. Prefix is decoupled. Supports virtual appliance resiliency procedures (draft explains floating IP case).
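A small sketch of the "clean identification" argument: a PE that cannot process prefixes can discard them by route type alone. The route-type numbers 2 and 5 come from the presentation; the `EvpnRoute` class, its fields, and the example values are hypothetical and do not reflect the draft's encoding.

```python
from dataclasses import dataclass, field

MAC_ADVERTISEMENT = 2  # route-type 2: MACs and MAC/IP tuples
IP_PREFIX = 5          # route-type 5: proposed prefix advertisement


@dataclass
class EvpnRoute:
    route_type: int
    # Free-form illustration of the route contents (not the wire format).
    fields: dict = field(default_factory=dict)


def accept_routes(routes, supported_types):
    """Keep only routes whose type this PE knows how to process."""
    return [r for r in routes if r.route_type in supported_types]


received = [
    EvpnRoute(MAC_ADVERTISEMENT, {"mac": "00:aa:bb:cc:dd:01", "ip": "10.0.0.5"}),
    # Next-hop types listed in the minutes: IRB, floating IP, MAC, ESI.
    EvpnRoute(IP_PREFIX, {"prefix": "10.1.0.0/24", "next_hop_type": "floating IP"}),
]

# An L2-only PE ignores route-type 5 without parsing it.
l2_only = accept_routes(received, {MAC_ADVERTISEMENT})
print([r.route_type for r in l2_only])
```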
Would like WG feedback. Open to keep it as a separate draft or merge it into something else.
Ali – we’ve had many hours of discussion amongst EVPN co-authors on this subject. Slight modification here relative to what has been discussed. Overlay IP address moved into extended community (used to be part of the newly defined route). I still take the position that we don’t need to define a new route to convey IP prefixes. This can simply be accommodated by setting the MAC to zero in the existing EVPN MAC route. No ambiguity – if set to zero then it means “this is just a prefix”. You can advertise individual IP addresses with MACs. I was going to incorporate this in the new rev of the overlay draft but wanted to wait to discuss with the co-authors. We can use BGP remote next-hop to advertise the overlay MAC and IP addresses associated with that IP prefix. Only difference between us is that I don’t think we need a separate route.
Jorge – my comment is that AD route looks like MAC route but is different type as function is different. It’s pretty much the same here – advertising IP prefixes is a different function and needs a different route type.
Ali – if you look at MAC route we can advertise it currently with MAC or with MAC+IP. Advertising it with IP is yet another way to use it – it fits right in and there’s no ambiguity. If you advertise without MAC it’s treated as a prefix. As I’ve said all along I don’t want to define a new route when there is no good reason for it.
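For contrast, a sketch of the alternative Ali describes, where the existing MAC route is reused and an all-zero MAC signals "this is just a prefix". The function name `classify_mac_route` and the string form of the addresses are assumptions for illustration only.

```python
ZERO_MAC = "00:00:00:00:00:00"


def classify_mac_route(mac, ip=None):
    """Interpret an EVPN MAC route under the zero-MAC convention (sketch)."""
    if mac == ZERO_MAC:
        if ip is None:
            raise ValueError("a zero-MAC route must carry an IP prefix")
        return "prefix"  # no MAC: treated as a prefix advertisement
    # Existing uses of the route: MAC only, or MAC plus IP.
    return "mac+ip" if ip is not None else "mac"


print(classify_mac_route("00:aa:bb:cc:dd:01"))
print(classify_mac_route("00:aa:bb:cc:dd:01", "10.0.0.5"))
print(classify_mac_route(ZERO_MAC, "10.1.0.0/24"))
```

Under this convention every receiver has to inspect the MAC field to tell the cases apart, which is the point of disagreement with the separate route-type proposal.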
Florin – I think we have different functions that EVPN will perform moving forward. MAC forwarding. ARP mediation. Those are ready and we can move ahead. IP advertisement needs more work. We should try to decouple those 3 functions and move the first 2 forward. We have combinations with MAC mobility where it might move or be load balanced in active/active. The same can happen with IP prefixes. Multiple combinations between these possibilities need to be analysed. Doing it with MAC route type won’t provide something that will work in the long term. Once we decide we need a different type of function then we should put it in a different route type.
Nabil – with my chair’s hat off I really like the decoupling more than the overloading of things. Makes boundaries cleaner. But also sympathise that adding a new route adds to RIB table and adds other complexities. But will leave it to implementers to comment on that. I think it will be good to comment on that a bit. Second point on analysis (and I understand what you’re saying about the combinatorial problem) is that it would help if we could step back and list scenarios that can create complexity and that could lead to some work that is either standalone or bundled back into EVPN draft. I think your goal here is to highlight the problem and then potentially put it back into EVPN. We need to look at the complexity if we overload semantics with existing route types.
Ali – I agree with Florin that we need to do analysis. We’ll discuss that offline. But at least based on the scenarios in my draft it covers all those. If there are scenarios we haven’t captured we can talk about it. We’ve had lots of discussion and gone through lots of scenarios. At least everyone will be able to see the other side’s point of view.
Keyur – what are the justifications for defining a new route type vs using an existing one in the author’s mind?
Jorge – slide 4 addressed that – clean identification, PE that doesn’t do L3 can just look at route type and ignore it, if advertising prefixes don’t need a MAC address.
Wim – if you do floating IP where 2 NVEs share the same prefix and a MAC moves from one to the other then with route type 2 you readvertise all the routes and convergence is bad. In this case because it’s decoupled you just readvertise MAC route and leave the prefixes alone – you get indirection. To use ESI next hop you can’t do that with the procedures that are there today. Real use-cases that are hard to accommodate in the existing draft. If you accommodate them it impacts the cloud management systems work – but those systems don’t understand ESI. We’re looking at use-cases and trying to figure out the best way to accommodate in EVPN.
Jorge – floating IP use-case is in the draft.
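The indirection Wim describes for the floating IP case can be sketched as a two-step lookup: prefixes point at the floating IP, and the floating IP is resolved through a MAC/IP route to the NVE currently hosting it. The table layout, the function name `resolve`, and all addresses are hypothetical.

```python
def resolve(prefix, prefix_routes, mac_ip_routes):
    """Resolve a prefix to (mac, nve) via its floating-IP next hop."""
    floating_ip = prefix_routes[prefix]  # prefix route: prefix -> floating IP
    return mac_ip_routes[floating_ip]    # MAC/IP route: floating IP -> (MAC, NVE)


prefix_routes = {
    "10.1.0.0/24": "192.0.2.100",
    "10.2.0.0/24": "192.0.2.100",
}
mac_ip_routes = {"192.0.2.100": ("00:aa:bb:cc:dd:01", "NVE1")}
print(resolve("10.1.0.0/24", prefix_routes, mac_ip_routes))

# The appliance moves to NVE2: only the single MAC/IP route is re-advertised.
mac_ip_routes["192.0.2.100"] = ("00:aa:bb:cc:dd:01", "NVE2")
# Both prefixes now resolve to NVE2 with the prefix routes left untouched.
print(resolve("10.1.0.0/24", prefix_routes, mac_ip_routes))
print(resolve("10.2.0.0/24", prefix_routes, mac_ip_routes))
```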