------------------------------------
Monday L2VPN Session (5th Nov 2012)
------------------------------------

------------------------
WG & Doc Status - chairs
------------------------

Today is the "traditional" l2vpn discussion; Thursday is the "NVO3"
meeting.

Recharter discussion: E-tree, E-VPN and "l2vpn in the data-centre" are
now in scope. The E-tree requirements draft was adopted as a WG document.
Following the re-charter, we can now progress solutions.

Doc status:
Giles mea culpa on delays getting docs shepherded.
Shepherd write-up done:
- draft-ietf-l2vpn-arp-mediation-18 (passed IESG LC, cross-posted to
  6man for comments)
Shepherd write-up in progress:
- draft-ietf-l2vpn-vpls-mib-07
Being progressed as AD-sponsored:
- draft-kompella-l2vpn-l2vpn-07

multicast:
vpls
- draft-ietf-l2vpn-mcast-09 (LC comments received)
- draft-ietf-l2vpn-ldp-vpls-broadcast-extn-02 (WG LC once P2MP PWE
  progresses)
vpms
- draft-ietf-l2vpn-vpms-frmwk-requirements-04 (IETF LC, no comments;
  were planning expert review, but polled the room. No interest by
  non-authors, so will ask the list too - and will drop the draft if
  there is no interest.)

service convergence / multi-homing:
- draft-ietf-l2vpn-vpls-ldp-mac-opt-05 (some comments from Daniel Cohn,
  WG LC post IETF)
- draft-ietf-l2vpn-vpls-macflush-ld-01 (last call got no comments)
- draft-ietf-l2vpn-vpls-multihoming-03 (Kireeti presenting today)

scalability:
- draft-ietf-l2vpn-pbb-vpls-pe-model-03 (lots of comments, mostly from
  Dave Allan. Florin resolving.)
- draft-ietf-l2vpn-pbb-vpls-interop-01 (LC was ok)
Will progress both drafts once comments are addressed.

OAM:
- draft-ietf-l2vpn-vpws-iw-oam (will last call post IETF)

NVO3: L2 service over IP tunnels.
Background: the re-charter brought data-centres into scope. Some work,
like E-VPN and PBB-EVPN, is addressing scalability/resiliency for data
centres. But there is also work coming referred to as NVO3: Network
Virtualisation over Layer 3 (IP). We had planned a BoF, but based on
the ADs' advice we're taking it into Thursday's L2VPN session. L2
services over IP tunnelling, addressing service-ID scaling, with
additional requirements in the problem statement. Based on Thursday's
session we'll see what the outcome is.

-----------------------------------
Kireeti Kompella, VPLS Multi-homing
-----------------------------------

The draft has been stable... so, time to shake things up.

BGP DF election as defined today has two problems:
- route oscillation
- longer-than-needed convergence and unnecessary traffic blackholing

Route oscillation:
- how to select a route from two equal routes
- you have a race condition here
(A sketch of the standard BGP tie-break appears at the end of this
section.)

Enke Chen: this has been solved in the RR spec. You favour the path
with the shortest cluster list.
Kireeti: he's looking forward to the solution, which we aren't talking
about yet. Many of these problems have been solved in BGP. The issue is
that we did an L2VPN-specific BGP route selection. There were some
reasons for that, but many of the problems are already solved in BGP.
One of the IDR chairs asked if we can use BGP route selection to solve
this. Kireeti thinks we need something in-between: BGP route selection
with some VPLS-specific bits.

Long convergence + blackholing:
The current DF election rules (IDR terminology is "route selection")
say nothing about choosing between VPLS routes learned via EBGP and
IBGP.

Keyur Patel: I know you're not talking solutions, but I think best
external and add-path cover this.
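A minimal sketch of the tie-breaking point Enke raised (illustrative
only; the attribute names and selection order below are simplified and
hypothetical, not taken from the draft): standard BGP route selection
breaks ties between otherwise-equal routes deterministically, e.g. by
shorter CLUSTER_LIST (RFC 4456) and then lowest originator, so every PE
picks the same winner and the race condition disappears.

    from dataclasses import dataclass, field

    @dataclass
    class VplsRoute:                     # hypothetical, simplified attributes
        ve_id: int                       # VPLS site / VE identifier
        local_pref: int
        cluster_list: list = field(default_factory=list)
        originator_id: str = "0.0.0.0"

    def select(routes):
        # Higher LOCAL_PREF wins; ties broken by shorter CLUSTER_LIST,
        # then lowest ORIGINATOR_ID, so the choice is deterministic on
        # every PE rather than a race between two "equal" routes.
        return min(routes, key=lambda r: (-r.local_pref,
                                          len(r.cluster_list),
                                          r.originator_id))

    # Two otherwise-equal routes for the same multi-homed site: without
    # the final tie-breaks, PEs could flip between them (oscillation).
    a = VplsRoute(ve_id=1, local_pref=100,
                  cluster_list=["1.1.1.1"], originator_id="10.0.0.1")
    b = VplsRoute(ve_id=1, local_pref=100,
                  cluster_list=[], originator_id="10.0.0.2")
    assert select([a, b]) is b           # shorter CLUSTER_LIST wins everywhere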
VPLS DF election vs BGP route selection:
DF election focuses on choosing among multi-homed sites. However, there
are times when one sees the same route from multiple sites. We have an
easy way of doing add-path: use route distinguishers. I said we MUST
use the same RDs for multi-homed sites, but you should have beaten me
up. We need to fix this, but be aware that existing deployments may use
the same RD. Whether it's IP-VPN, EVPN, L2VPN or VPLS, we should use
different RDs for multi-homed situations.

------------------------------------------------
Yuanlong Jiang, VPLS PE model for E-Tree support
------------------------------------------------

Changes in draft-05.
E-tree progress in other SDOs (MEF, IEEE, ITU-T).

Discussions:
- 3 different approaches proposed for e-tree in VPLS (dual VLAN, dual
  PW, control word)
- the dual-VLAN approach (sketched after this discussion) works in all
  scenarios, while the other 2 need extra workarounds or a redesign of
  the forwarding plane for H-VPLS with q-in-q spokes and PEs with
  multi-stage bridge modules per RFC 6246
- when the AC is untagged or tagged with a C-VLAN, the VLAN mechanism
  is simple
- when the AC is S-VLAN tagged, it is still out of scope of MEF and
  draft-l2vpn-vpls-etree-req (WG considerations: S-VLAN translation,
  use of PBB-VPLS)
- do we really need more options (dual flow labels, dual PWs, CW,
  etc.)?

Can we progress this as a WG item?

Ali Sajassi: when we first talked about this, I suggested the use of
private VLAN. This draft is in line with that, and it's the preferred
way. I'm in favor of this approach. One area that needs better coverage
is PBB-VPLS. Then we can move forward on this. I can help with
PBB-VPLS.
Dave Allan: In general supportive. Clarification: when the AC is
S-tagged this poses a problem, but I didn't quite grasp what the issue
is?
Wim Henderickx: The issue is if you want to do S-VLAN transparency. We
are proposing to translate the S-tag in the domain in which the E-tree
is active.
Dave: that still falls within .1Q standards.
Nabil: We need to take this to the mailing list for discussion. We're
going to go for 1 solution. And we want to be consistent with MEF and
IEEE for long-term interop.
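A rough sketch of the dual-VLAN idea discussed above (the VLAN values
and function names are hypothetical, not from the draft): frames are
tagged with a "root" or "leaf" VLAN at ingress according to the
originating AC's role, and the egress PE only has to filter
leaf-originated frames towards leaf ACs.

    # Dual-VLAN E-Tree sketch: one VLAN marks root-originated frames,
    # another marks leaf-originated frames.
    ROOT_VLAN, LEAF_VLAN = 100, 101      # hypothetical values

    def ingress_tag(ac_role):
        """Ingress PE: pick the VLAN that encodes the originator's role."""
        return ROOT_VLAN if ac_role == "root" else LEAF_VLAN

    def egress_permit(frame_vlan, dest_ac_role):
        """Egress PE: only leaf-to-leaf traffic is filtered."""
        return not (frame_vlan == LEAF_VLAN and dest_ac_role == "leaf")

    assert egress_permit(ingress_tag("root"), "leaf")       # root -> leaf: ok
    assert egress_permit(ingress_tag("leaf"), "root")       # leaf -> root: ok
    assert not egress_permit(ingress_tag("leaf"), "leaf")   # leaf -> leaf: dropped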
----------------------
Ali Sajassi, PBB E-VPN
----------------------

This is the 3rd rev of the draft; will discuss history and changes.

History:
rev 01 - review of EVPN and requirements, basic solution overview and
encoding
rev 02 - added new BGP extended community for MAC mobility
rev 03 - added new BGP route for TRILL, added ARP suppression

Solution overview:
Very similar to E-VPN MAC address advertisement. The difference is the
use of PBB encap (a sketch at the end of this section illustrates it):
the B-MAC represents the site ID and therefore simplifies a bunch of
procedures. The forwarding table gets built just like E-VPN; the PE
does encap just like PBB-VPLS.

Overview of advantages:
1 - MAC advertisement route scalability
2 - C-MAC mobility with MAC sub-netting; C-MACs are not managed but
    B-MACs can be aggregated
3 - C-MAC learning and confinement; when done in the control plane
    they're always in the RIB, but with learning they're only there
    when active
4 - interworking with TRILL and 802.1aq/.1Qbp networks, and C-MAC
    transparency
5 - per-site policy; BGP policy per MAC gives a very nice set of
    per-site policies
6 - avoiding C-MAC flushing
7 - avoiding transient loops for known unicast when doing egress MAC
    lookup

BGP encoding:
The Ethernet A-D route is not needed and is not used, which results in
simplification. MAC mobility extended community. A TRILL nickname
advertisement route supports interconnection of TRILL islands over
PBB-EVPN while maintaining the "independence" of each island; it is
similar to the Ethernet MAC route, but the Ethernet field is replaced
by a TRILL RBridge nickname field.

Operation for TRILL/.1Qbp over MPLS:
TRILL nicknames or .1aq/.1Qbp B-MACs are exchanged among different
IS-IS islands using BGP, which provides "independence" among
TRILL/.1Qbp islands. It is assumed that TRILL nicknames or B-MAC
addresses are globally unique in the network. The imposition/
disposition operation for TRILL frames is similar to that for B-MAC
frames, except the MPLS label is associated with the TRILL nickname
instead of the B-MAC. In the case of TRILL we have 16 bits; otherwise
we have more bits.

ARP suppression:
Similar to E-VPN in operational principle. The difference is that E-VPN
advertises the MAC/IP binding in the control plane, whereas PBB-EVPN
does it in the data plane: PBB-EVPN MES nodes snoop ARP
requests/responses, cache them, and reply locally.

This draft has been around for a while. The authors feel good about the
coverage in this document and think it is ready for WG LC. But we know
it's pending the requirements draft - which is moving forwards. Want to
move forward with E-VPN and PBB-EVPN.

Lucy Yong: This adds lots of new stuff. Interesting - interworking
between PBB and TRILL. When we forward in the dataplane, is it just a
TRILL packet?
Ali: the assumption is that islands are homogeneous, so there's no
forwarding between islands of different types (TRILL and SPB).
Nabil: the key is that BGP is another level in the routing hierarchy.
You don't have a contiguous IS-IS network.
Ali: yes - DC operators want to keep the control planes for different
DCs independent.
Dave McDysan: Have you looked at multi-homing, and the potential for
oscillation (like Kireeti talked about)?
Ali: I think we need to look at it.
Nabil: We need to document the behaviour of an ARP proxy (there's a
draft in ARMD on that). The behaviour is not the same as the ARP proxy
defined previously in the IETF. Need to take it up with the ADs. Raised
as an issue in the ops area.
Dave Allan: I don't see the B-VID gone into extensively in the draft.
Is 802.1aq multi-pathing supported across this?
Ali: The hand-off is an I-tagged interface (no B-VID). Mapping of
I-SIDs to P2MP LSPs is a local decision, e.g. 1:1 for inclusive, N:1
for aggregate inclusive.
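A minimal sketch of the PBB encapsulation behind PBB-EVPN's scaling
argument (the field layout is simplified and all values are
hypothetical): the PE wraps the customer frame in a backbone header, so
BGP advertises only the site's B-MAC while the many C-MACs stay in
data-plane learning.

    from dataclasses import dataclass

    @dataclass
    class CustomerFrame:
        c_dst: str          # C-MAC DA
        c_src: str          # C-MAC SA
        payload: bytes

    @dataclass
    class PbbFrame:
        b_dst: str          # backbone DA (B-MAC)
        b_src: str          # backbone SA (B-MAC) - what BGP advertises
        i_sid: int          # 24-bit service instance ID
        inner: CustomerFrame

    def pbb_encap(frame, local_bmac, remote_bmac, i_sid):
        # One B-MAC can front thousands of C-MACs at the same site,
        # which is the "MAC advertisement route scalability" advantage
        # listed above.
        return PbbFrame(b_dst=remote_bmac, b_src=local_bmac,
                        i_sid=i_sid, inner=frame)

    f = CustomerFrame("00:aa:00:00:00:02", "00:aa:00:00:00:01", b"data")
    print(pbb_encap(f, "02:bb:00:00:00:01", "02:bb:00:00:00:02", 0x10000))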
---------------------------------
Lucy Yong, TRILL over an MPLS PSN
---------------------------------

Describes a use-case for RBridge sites interconnected by an IP/MPLS
network: point-to-point and multiple-access TRILL links, and a new
hierarchical L2VPN network.

TRILL background / overview.

Use cases:
- point-to-point interconnection: Eth or PPP PW in IP/MPLS, Eth or PPP
  RBridge ports at the AC. PPP saves ~16 bytes per frame.
- multi-access link interconnection: use VPLS to bridge the LAN between
  RBridge ports.
- hierarchical L2VPN: RBridges at the low-tier network, IP/MPLS at the
  top tier. We do forwarding in the RBridges, not MPLS. A very scalable
  solution for l2vpn.

Next step: feedback, work on auto-configuration.

Ali Sajassi: This draft talks about connecting RBridges over PW or
VPLS. Not sure if we need a draft or not, but if we do we need to clean
up what's in there. In terms of naming, it's "TRILL over VPLS or VPWS",
so the name "over MPLS" needs to be modified, as it can be over IP or
MPLS. The last slide mentioned hierarchy. There is no hierarchy -
you're just using VPLS to connect a bunch of TRILL islands.
Lucy: Both VPWS and VPLS are over MPLS. There's another draft showing
TRILL over IP.
Ali: but truly it's VPLS or VPWS. Shouldn't narrow the scope just
because there's another draft. Will send my other comments on the list
so you can address them.
Florin Balus: On the last slide, you have RB/PE. Would the PE run
TRILL?
Lucy: yes. It's reasonable for the PE to run TRILL, as the RBridge is a
router that runs IS-IS etc.
Florin: It's interesting if you go there; you need to discuss how the
TRILL+PE model should work.
Tissa: Should we have 2 solutions for TRILL over MPLS? What's the
process here?
Giles: The goal tends to be to converge on 1 solution, but there are
different ways to get to that goal. E.g. with E-Tree we might need to
get the authors together. Depends on context.
Nabil: Clarification - the question might not have been entirely
accurate. It comes back to what Ali was saying: it comes back to the
service. VPLS, VPWS, E-VPN. E-VPN isn't the same as VPLS. It's up to
the WG whether we want to address TRILL interworking with E-VPN, with
VPLS, or with VPWS.
Lucy: Ali's draft treats TRILL campuses as islands; here we
interconnect them together.
Ali: Yes, this proposal is VPLS as shared media to connect RBridges. My
proposal is the only one for interconnecting them as independent
islands.

--------------------------------------------------------
Lucy Yong, Shortest Path Bridging (SPB) over an MPLS PSN
--------------------------------------------------------

A similar draft to the TRILL one. This is SPB over MPLS.

Document overview:
SPB overview, IEEE 802.1aq. An SPB-enabled device is called an SPT
bridge (because it computes shortest path trees). SPBV (SPB - VID mode)
provides SPB with no configuration for customer LANs. SPBM (SPB - MAC
mode) provides an SPB backbone LAN for connected customer bridges or
end stations. A project to support multi-pathing is in progress
(802.1Qbp).

Use cases:
- point to point using Eth or PPP PW over IP/MPLS
- multi-site interconnection, using a mesh to interconnect SPT sites;
  can be a full or partial mesh
- SPT bridges at the low-tier network, IP/MPLS at the top-tier network;
  a very scalable solution for l2vpn

Ali: How do you use a PPP PW? For TRILL it makes sense because we can
skip the local Eth header. For SPB it doesn't make sense because we
need the outer MAC, so a PPP PW doesn't make sense. (Cut and paste
error?) Second, while VPLS is applicable to TRILL, it's not applicable
here, because SPB expects point-to-point connectivity. Third, there are
lots of similarities between these 2 drafts, TRILL and SPB.
Cover these in one draft.
Dave Allan: Whether you call it VPLS or a mesh of VPWS, it's starting
to look alike.
Ali: for TRILL it should be VPLS and VPWS. For SPB it should be VPWS
only.
Florin: Making PEs unaware of TRILL or SPB makes this unrealistic for
deployment because of flood containment etc. The only valid use case is
where the PE is aware of SPB or TRILL. Keep the drafts separate so that
the issues can be discussed.
Ali: In the PBB-VPLS interop draft, we cover scenarios where PBB
connects to VPLS. No issue describing how it works. The same thing
applies here. You don't have to only cover the case where the functions
are co-located in the PE.

---------------------------
Xiaohu Xu, VPLS using IS-IS
---------------------------

Cloud data-centre network requirements:
- flat L2 networking, for VM mobility and clustering
- scalability: more than 4k VLANs, MAC table scalability
- maximise available bandwidth

Deploy VPLS in the data-centre. The good news: VPLS could meet most
requirements; in addition, VPLS is a proven L2VPN technology. The bad
news: VPLS can't meet the requirement of simplified provisioning and
operation - separate protocols for VPLS (LDP and/or BGP), full mesh of
PWs.

Why not a lightweight VPLS? IS-IS could be extended to support a
lightweight VPLS. We need an IS-IS TLV for VPLS.

Auto-discovery: each PE router automatically discovers the other PE
routers that are part of a given VPLS instance, identified by a
globally unique VPLS ID.

Signalling: the extended IS-IS TLV could be transparent.

Dataplane: wouldn't change; the only change is to data-driven MAC
learning. Alternatively the MACs could be learned by the control plane.

Delivering multicast/broadcast/unknown unicast: 2 options - ingress
replication or p-multicast tree mode; operators can make the tradeoff.

MAC scalability on PE routers: PBB can be used with VPLS - e.g. done at
ToR switches - so the VPLS routers only need to learn B-MAC addresses.

Next steps: we need more comments.

Ali Sajassi: My comment is basically the same as last time. With all
due respect, this doesn't make sense. There are already existing
solutions in L2VPN that do inter- and intra-DC. This only works
intra-DC. There are other intra-DC solutions like TRILL, SPB etc.
Xiaohu: The current solution has shortcomings - e.g. simplified
provisioning.
Ali: there are other ways to do that - automating how we derive the BGP
RT/AD (specified in E-VPN). The operator doesn't have to configure the
BGP AD information manually.
Xiaohu: Is a full mesh of PWs a problem?
Ali: The pain point is ingress replication.
Xiaohu: also pain with a full mesh in the core.
Ali: If you want to get rid of PWs you also need to look at the bigger
picture. Look at active/active multi-homing etc. E-VPN/PBB-EVPN.
Florin Balus: This pretty much goes to different signalling for VPLS
labels and changes the forwarding paradigm. I see no real benefits,
nothing added for L2VPN. These are major changes without benefit. You
need to describe what's missing from the overall L2VPN suite, not just
VPLS.
Xiaohu: don't you think that PBB-EVPN is also the same kind of
significant change?
Florin: PBB-EVPN comes on top of what we have, and it's covering the
intra-DC issues. Why should I do both PBB-EVPN and IS-IS VPLS? PBB-EVPN
is more comprehensive.
Giles: back of the queue!
Ilya: Are you planning a new IS-IS instance, or putting this in the
existing IS-IS instance you're likely to have? Have you considered the
amount of info you're putting into the database that will be flooded?
It may challenge existing IS-IS implementations.
Xiaohu: You could say the same of TRILL or SPB.
Ali: Thanks to Wim for letting me take his place in line. EVPN and
PBB-EVPN etc. already have a full set of requirements - a complete set
that covers everything (including intra- and inter-DC). But this is not
as complete and is only intra-DC. What are we trying to achieve here
beyond what we already have?
Xiaohu: will you replace TRILL/SPB if you put E-VPN or PBB-EVPN
intra-DC?
Ali: if you do TRILL or SPB then PBB-EVPN can connect them. But if I
want MPLS to the edge we already have a solution (E-VPN or PBB-EVPN).
Wim: At the end of the day we should look at 2 things: either we have
TRILL and SPB in the DC, and they solve all the problems we have so
far, with E-VPN/PBB-EVPN on the wide-area side. What don't those
solutions fix?
Xiaohu: simplified provisioning is solved by TRILL or SPB, but they're
not IP-based solutions. This is.
Wim: TRILL is an IP-based solution. It uses an IP-based forwarding
paradigm.
Xiaohu: it's similar to IP, but it's not IP.
Sue Hares: Ali, why do you think it's important for intra-DC to be the
same as inter-DC?
Ali: I was not saying it was important to be the same. I apologise if
that was the perception. It can be different. If it is different, TRILL
and SPB are defined and do the job. I was saying that if it needs to be
the same, we have existing L2VPN solutions.
Sue: As an example use case, did you have any examples where there is
anything other than the desire to have similarity, or is there a
technical advantage of one over the other?
Ali: I guess "beauty is in the eye of the beholder".

--------------------------------------
Thursday L2VPN Session (8th Nov 2012)
--------------------------------------

Giles: The key is that by the end of today we get to a point where we
decide whether there's a problem to solve, what it is, and where we
solve it. [Showing the IETF "note well" slide and the NVO3 slide for
Thursday.] The ADs have decided to put this work here for now. 3
questions to ask:
1) What is the problem we've got here? And is the problem statement a
   good starting point?
2) Is there any work that needs to be done, and what is it? Protocol
   work?
3) Is the IETF the right place to be doing this work?
We are explicitly not discussing whether this is in L2VPN. The ADs will
take that away.
Nabil: as we go through the presentations please think actively about
these questions. We want to address them in the open mic session at the
end.
Lou: can you elaborate on the last line of the slide?
Giles: it's not right for us in L2VPN to decide if this is in L2VPN or
not. We want the ADs' guidance. They're taking that away at the end of
the week.
Lou: the opposite choice to what was made in L3VPN, where they
discussed if it was in L3VPN.
Giles: that was where Stewart marched to the mic in L3VPN and said
"we're not deciding that here and now".
Lou: just want to work out where this fits vs the L3VPN and SDN
discussions.
Stewart: This is one of 3 sessions this week to gather data. At the end
we will have a better picture. Then we can decide whether we re-charter
a group, take it out of one group's charter, start a WG, or start many
WGs. No commitment to do anything beyond figuring out if this fits in
the IETF and where.
Lou: it would have been better if we'd said that yesterday. But it
makes sense.
Stewart: that was said at the start of L3VPN.

------------------------------------------------------------------
Cloud Networking: Framework and VPN Applicability. (Nabil/Florin)
------------------------------------------------------------------

Nabil: the scope is requirements for large-scale multi-tenant DCs and
cloud networks, and how existing/evolving Ethernet/L2VPN/L3VPN
technologies meet those requirements.
The deck has scenarios and concludes with the challenges/gaps that need
work.

Scenarios: data-centres and interconnection to end-tenants and to the
public. Define generic reference architectures; implementations can
vary. The data-centre gateway provides interconnection to the outside
world (tenant connections as well as inter-DC). Core layer. Top-of-rack
element (aggregates server blades) - switching or routing capabilities
are an implementation/design issue. Virtual switch on the server blade.

Requirements:
1) virtualisation
2) scalability. Lots of tenants - a big namespace. Used to use 802.1Q:
   12 bits -> 4K services. Want much larger; there seems to be
   consensus on 24 bits / 16M. Lots of VMs/MACs/IPs.
3) multicast/broadcast containment
4) overall network convergence in the presence of VM mobility/movement
5) resource optimisation - optimise bandwidth in the DC and across the
   WAN. FIB optimisation. Router/switch control plane.
6) path optimisation, especially with VM mobility. Minimise packet loss
   during transitions (ideally zero packet loss). Do redirection during
   the transition? Not optimum routing, but necessary during the
   transition; then converge to optimum.
7) resiliency
8) VM mobility/movement. Maintain existing sessions on VMs. Generally
   keep the MAC/IP address as you move the VM. Expand/shrink L2/L3
   domains across DCs (VMs are mobile, so the network attachment point
   changes). Optimal forwarding during the transient state. Applies to
   data-plane or control-plane learning.
9) minimal network configuration. Could be provisioning or on-net
   control. Ease of management.
10) OAM per service, for troubleshooting
11) ease of introduction of new technologies
12) allow for different network models - different entities operating
    different parts of the network (DC, WAN etc.)

Dave McDysan: it would be helpful to describe where things are L2, L3
or both.
Nabil: the draft talks about L2 and L3, but for this meeting the focus
is L2.

Florin (taking over from Nabil): covering the solution space, trying
not to scare people with too much terminology, going from simple to
more complex, and pinpointing the related drafts so people can dive
into those. The draft covers L3 (BGP VPN), IPsec (a bit, needs
expanding), and also L2 (covering now).

Trying to start with one solution - PBB with L2VPN. Replace the 12-bit
tag with the 24-bit I-SID, standardised by IEEE in 2008. The format is
like a VLAN tag; it's in merchant silicon and widely deployed. 4K VLANs
-> 16M I-SIDs (a sketch below illustrates the difference). I-SID over
native Ethernet is supported; there are also networks doing I-SID over
MPLS, with lots of deployments for Ethernet services. I-SID over IP
uses L2TPv3 or GRE (can carry PWE); needs more work, maybe
optimisations (see the NVO3 drafts). Need a gateway to interconnect
customer sites to existing L2/L3VPNs.

Slides covering specific service-level interoperability. Can present an
I-SID tag to the IP-VPN as an interface (as for a VLAN interface), so
it interops with IP-VPN. Ditto VPLS. The DC gateway doesn't need to see
VM MACs. Also showing a DC based on I-SIDs interoperating with a DC
based on VLANs - standardised and running today.

Discussions in L2VPN on the control plane: IEEE defined how to operate
over STP or MC-LAG, or IS-IS (SPB). L2VPN can use VPLS with
split-horizon, or use BGP (the correct mechanism for inter-AS or
inter-provider): E-VPN and PBB-EVPN. IS-IS and BGP are routing options;
can do ECMP and active-active, and fast convergence using the MPLS
toolset. SPB uses IS-IS.

Auto-discovery is important. Good toolset intra-provider: route
targets, NMS etc. Need to allow the VM to initiate auto-discovery -
mention VDP (802.1Qbg), IGMP, ASDN?
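A quick sketch of the service-ID scaling point above (the I-TAG layout
is simplified, with the DEI/UCA flag bits omitted, and the helper is
hypothetical): a 12-bit VLAN ID gives 4K services, while the 24-bit
I-SID carried in the 802.1ah I-TAG gives 16M.

    import struct

    ITAG_TPID = 0x88E7   # 802.1ah backbone service instance tag

    def build_itag(i_sid, pcp=0):
        """Pack a simplified I-TAG: TPID, then 32 bits holding the
        3-bit PCP (flag bits omitted) and the 24-bit I-SID."""
        assert 0 <= i_sid < 2**24        # 16,777,216 service instances
        return struct.pack("!HI", ITAG_TPID, (pcp << 29) | i_sid)

    print(2**12, "services with a VLAN ID;", 2**24, "with an I-SID")
    print(build_itag(0xABCDEF).hex())    # -> 88e700abcdef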
Pointers on discussing VM mobility from an L2VPN perspective. There are
some paragraphs in E-VPN and PBB-EVPN too. Also ARP suppression drafts
in ARMD.

Table showing toolsets etc. Not much deployment of IP tunnelling - need
a good discussion here. PWE over L2TPv3 (standardised in PWE3) or GRE
(has implementations for L2VPN and L3VPN).

All agree service scaling can be solved with a 24-bit tag. MAC scaling
is addressed using an overlay. Flood containment is discussed for the
Ethernet and MPLS cases; not much IP deployment. Also multicast
efficiency. Convergence/ECMP is addressed with IS-IS or BGP or a
combination. VPN interop is a big piece - it would be good to bring
this back into L2VPN to ensure alignment with deployments. VM mobility
is work in progress. In the IP case there is discussion of IP multicast
for flood containment and multicast efficiency.

Potential work items:
1) IP tunnelling. How do we optimise? Need the 24-bit I-Tag format to
   interop.
2) network auto-provisioning. Discuss how we bootstrap from the VM.
3) broadcast/multicast over an IP core requires work. Ingress
   replication. Aggregated tree. Selective tree? Even if you don't need
   multicast right now in the DC, that doesn't mean you won't need it
   later. Also, for flood containment you may need to build trees per
   tenant.
4) Ali had a good comment that we need tunnel and service address
   translation across cloud providers and network/service providers.

Giles: now's a good time for specific questions on this draft. General
questions at the open mic later.
Dave Allan: you seem to touch on .1ah and then go to .1Qbg (which is a
long way out). No discussion of .1aq.
Florin: I meant .1aq. Ali is editor of Qbg.
Linda: 1) good draft. 2) Big description of address resolution. Need to
point out this has been studied in ARMD - lots of statistics etc. on
the server impact of ARP/ND. The conclusion is that it's not a big
impact for servers (high CPU power); the issue is the routers. Better
to reference ARMD instead of putting text here - that way it's
consistent. 3) auto-provisioning for auto-discovery for the VM: IEEE
has Qbg to allow a virtual machine to send its profile. Use that.
Florin: This is just a platform to start discussions.
Adrian: thank you for putting this together. The draft seems to
flip-flop between "cloud" and "data centre". Would like it if you can
compare/contrast what the difference is.
Nabil: a cloud is a set of data centres and the connections between
them.
Donald Eastlake: do you want me to add TRILL to the draft?
Florin: we wanted to put in placeholders for people to add stuff. No
time to put text into the first revision. So yes, please add it.
Ali: quick clarification - I'm not editor of Qbg; Ben Mack-Crane is.
For IP encaps we can take advantage of IP's multipoint-to-multipoint
nature. The source is identified in the header. A PWE identifies the
source and service ID and tells you what the packet is; you don't need
that in IP. I did a draft 9 years ago on "M-VPLS" ("Multicast-VPLS"):
do MAC learning against IP addresses. L2TPv3 was the encaps du jour so
it used that. Written with VPLS terminology in mind, but the mechanism
was very like VXLAN.
Florin: there's a draft from Pedro in L3VPN using XMPP.
Nabil: this draft was to show what exists and how it works. Please
let's avoid the solution space. Want Thomas to present and to have the
open mic discussing what needs to be done.
Kireeti: I keep seeing I-SID, PBB, VLANs. We have to stop conceding to
layer 2 and start moving to layer 3 (applause) and lots of problems
will go away.
Florin: we need an L2 service. Are you talking L2 vs L3 tunnelling?
Kireeti: Tell VMware to do L3 vMotion. That's the end of it.
Giles: that's clearly not in scope of the IETF!
Warren Kumari: this presupposes a data-centre built as a large layer 2
network with VLANs. There are other designs.
Florin: not necessarily talking about a flat L2 domain in the core.
It's a matter of IP addressing or MAC-in-MAC addressing. L3 routing
protocols run in the core either way (IS-IS or BGP). Some people will
use IP backbones.

---------------------------------------
Problem Statement: NVO3 (Thomas Narten)
---------------------------------------

Starting with a level-set. Important themes for everyone before the
Q&A. We believe there is work that needs to be done, and that the IETF
is the right place. Focus on problems. Framework - not a specific
solution. There is industry support for this already. Don't want to
talk process. Don't want to get stuck on requirements for too long.
Industry is going ahead. Short window to engage here.

High-level view: imagine a data-centre, cloud provider etc. Multiple
tenants. Tenants may not trust each other. A tenant wants a virtual
network and puts VMs in that virtual network. The provider sells this
"network as a service". It's just one part of a bigger system.

Tenant requirements:
1) VMs think this is a real network. They don't care about subnets,
   transport etc. They just want to do their job - send/receive
   Ethernet frames (generally IP traffic). Not just VMs: could be
   servers, in which case the technology lives in a real switch, not a
   hypervisor.
2) address space isolation. Fully isolated virtual networks. Traffic is
   contained in the virtual network except for well-defined entry
   points (e.g. firewalls, public Internet, VPNs etc.)

The data-centre provider wants to place VMs anywhere in the data-centre
with no physical constraints, and without subnet boundary issues. You
can do this with one giant L2 domain, but that's not tenable any more -
fine for small data-centres, not for larger ones. There are approaches
to making L2 scale better (TRILL, SPB, etc.), but no magic bullet. ARMD
issues. The desire to put VMs anywhere and the manageability of L2
domains are in conflict.

Want to separate logical network attributes from physical. It is
complex to tie VLAN info to a VM (as done today). Want to describe the
properties you need and abstract that from the underlying realisation.
We've done that for server virtualisation, so a VM can say "I want this
kind of architecture and this much memory". Needs to scale across the
entire data-centre. A data-centre issue, not a VPN one. Data-centres
may have nearly 100k switches, 1M machines and 10x that number of VMs.
Map the logical view dynamically to the physical view.

Summary of requirements:
1) multi-tenancy
2) VM placement anywhere
3) on-demand elastic provisioning of resources (stretch the virtual
   network on demand - e.g. in <1 sec with no loss of packets)
4) small forwarding tables in switches. An issue with large L2 domains.
   Tunnelling approach - want switches to know minimal info.
5) decouple logical and physical network configuration
6) scale to millions of VMs and beyond

Today we have servers, L2 VLANs, IP cores. "Squishy" boundaries - the
details don't matter. Today a VPN comes in, terminates in a box and
then goes through to a VLAN. Not planning to change this model.

The NVO3 approach. The motivation is the internal part of the
data-centre (not the VPNs coming in) - a different perspective. Better
scaling than current L2. Spans the whole data-centre (and remote ones -
though that's not the focus). Clean boundary between the overlay and
VPNs. Highly dynamic changes: reconfigure sub-second (no long pauses
when moving VMs). Overlay-based: the data-centre is IP or L2-based, and
this is layered on top using a shim header. Encaps -> tunnel across ->
decaps.
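A hedged sketch of the overlay flow just described (a generic shim, not
VXLAN or NVGRE specifically; the 3-byte layout is hypothetical): the
ingress device prepends a header carrying a 24-bit virtual network
identifier, tunnels the frame over the IP core, and the far end uses
the identifier only to pick the right virtual network on decap.

    import struct

    def encap(vnid, inner_frame):
        assert 0 <= vnid < 2**24                 # "large enough space"
        shim = struct.pack("!I", vnid)[1:]       # 24-bit VNID shim
        return shim + inner_frame                # then sent over UDP/IP or GRE

    def decap(packet):
        vnid = int.from_bytes(packet[:3], "big") # infrastructure ignores
        return vnid, packet[3:]                  # this; only egress reads it

    vnid, frame = decap(encap(0x123456, b"tenant ethernet frame"))
    assert vnid == 0x123456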
The overlay header carries a virtual network identifier (VNID) =
"tenant identifier". A large enough space (e.g. 24 bits - compatible
with other technologies) that we won't run out. A VM packet is handed
to the hypervisor (or switch in a non-VM environment), encapsulated,
tunnelled to the far-end hypervisor and decapsulated. The VM thinks
it's just sending Ethernet frames. The hypervisor has to know which
remote hypervisor to send to. Tunnelling just uses the outer header.
The tenant ID is generally ignored by the infrastructure (it is just
used at the far end to decapsulate).

Need an overlay header. The exact details don't matter, but it needs to
carry the tenant identifier and the client frame. Multiple
encapsulations are out there already; probably not productive to try to
pick one. The interesting part for the IETF is the control plane.

Dimitri Stiliadis: there's a need for gateways - firewalls, NATs etc.
We need a standard encaps if we want the gateways to work. Having
gateways support multiple encapsulations is an issue.
Thomas: it's an issue if people don't implement a common set. Not
necessarily a problem - most implementations can support multiple. Not
an absolute requirement to pick one.
Dimitri: it will make gateways more complex. And what if there's no
hypervisor? You need a signalling mechanism to tell the top-of-rack
switch what to do.
Dave McDysan: using the VNID to enforce segregation is an important
requirement. NVGRE talks about ways of enforcing policy to ensure
separation - we need to make sure one configuration error doesn't
create unexpected communication. NVGRE talks about that; your draft
doesn't.

Control plane tasks: need a way to populate the tables mapping from
inner to outer destination, and also a way of delivering
multi-destination frames (broadcast etc.) Need a registration mechanism
so that when a VM connects it can associate with the right VNID;
likewise for detaching a VM from a VNID and then rejoining elsewhere.
Timescales must be appropriate: update cached info in a timely manner.
Don't want blackholes when a VM moves.

2 address mapping approaches. The first is learning - reuse IEEE
bridging: flood unknown unicasts, e.g. assign a multicast group address
and use that for flooding over IP multicast (likewise for broadcast and
multicast). A simple approach, and we understand the limitations. But
there are scaling limitations.

Ali: if learning is done in the data plane from MAC to tunnel address,
then is that a control plane?
Thomas: that's a definition question. The point is that it works.
Marc Lasserre: you show non-congruent unicast and multicast paths. That
may lead to packet reordering.
Thomas: yes, that might be an issue. Need to see how much of an issue
it is in practice.

The second model is the directory-based approach. Have a centralised
database with a copy of all mappings from e.g. MAC to IP tunnel
endpoint. When an edge device needs to send a packet, it queries the
directory if it doesn't already have an entry. Update the directory
when you create, update or move VMs. Need a way to invalidate old cache
information. "Centralised" is a bit of a misnomer - you need some kind
of replication and fault tolerance for resilience. We need to select an
approach. But it's not research - it's engineering.

Igor Gashinsky: some people are pushing bindings instead of queries.
Thomas: you can imagine multiple different approaches to solve this
problem. But learning is not sufficient.
Igor: absolutely. I've been advocating this for a while. But most
people are going for a push rather than queries. The central system
needs to know what to update, but it doesn't have to worry about cache
invalidation if it pushes everything.
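A sketch of the directory-based mapping approach described above (the
interfaces are hypothetical): edge devices cache inner-MAC ->
outer-tunnel-endpoint bindings from the directory, and, per Igor's
push-vs-query point, the directory here pushes invalidations to edge
caches when a VM moves so no edge blackholes traffic.

    class MappingDirectory:
        """Database of {(vnid, inner_mac): outer_ip}. "Centralised" is
        a misnomer: in practice it would be replicated for fault
        tolerance, as noted above."""
        def __init__(self):
            self.db = {}
            self.subscribers = []      # edge caches to invalidate on a move

        def register(self, vnid, mac, outer_ip):   # VM created or moved
            self.db[(vnid, mac)] = outer_ip
            for cache in self.subscribers:         # push invalidation
                cache.pop((vnid, mac), None)

        def lookup(self, vnid, mac):
            return self.db.get((vnid, mac))

    directory = MappingDirectory()
    edge_cache = {}
    directory.subscribers.append(edge_cache)

    key = (0x123456, "00:aa:00:00:00:01")
    directory.register(*key, outer_ip="10.1.1.1")
    edge_cache[key] = directory.lookup(*key)       # query on first miss
    directory.register(*key, outer_ip="10.2.2.2")  # VM moves: stale entry
    assert key not in edge_cache                   # ... is invalidated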
Thomas: yes, there are other ways to do this than being query-based.

Big picture: a number of things are needed for the overall solution;
NVO3 just does the overlay part. Some areas don't have a standards
component - e.g. VM orchestration systems already handle moving VMs
around. But they need a hook for telling the network that a VM is
joining and providing it the VNID.

2 existing proposals - VXLAN and NVGRE. An existence proof that there's
desire and momentum. The author lists show significant vendor backing.
If the IETF doesn't engage, the work will happen anyway.

Related work: TRILL is L2, so complementary. Great for scaling L2, but
some people want L3. IEEE has SPB (again L2-based). ARMD is looking at
this space but is not chartered to do protocol work. L2VPN comes from
the SP perspective; NVO3 comes from the data-centre operator
perspective - a really fundamental difference. L2VPN is about tying
together L2s at different sites to make them look like one big L2. But
that has scaling issues (e.g. multicast/broadcast). Not the right
approach for all. One option is to push L2VPN deeper into data-centres.
Some operators may want that; others may not. NVO3 and L2VPN are both
approaches - neither is the only approach. NVO3 interfaces with L2VPN
at a clean boundary.

People see NVO3 as L2 over L3. It's a bit more general: "carry L3 when
we can and L2 when we must". 90% of the control plane issues are
agnostic to whether we tunnel over L2 or L3. E.g. there are proposals
in TRILL to do directory-based mapping; we can use the same approach
for both. If we replace VLANs with an L2 overlay we can still interface
with VPNs the same way: instead of mapping to a VLAN you map to a
tenant ID. So not a VPN-based approach, but a complementary one.

Summary: driven by intra-DC but can span data-centres. The cost/benefit
of overlays is compelling to some data-centre operators. Mostly works
with existing equipment - if you're fully virtualised you can do this
entirely in hypervisors, so no hardware updates (clearly not the common
case; with non-virtualised servers you need the switches to implement
it). A different deployment path to deploying TRILL or SPB - not to say
one is better than the other, just different tradeoffs. Again, major
vendors are committed, so we have a short window for the IETF to
engage.

Ping Pan: You want to have a VM, an ID, and map it to a network. A
comment: in your slides you mention orchestration, and that is the
objective of the SDN BoF this afternoon. So orchestration is part of
the IETF.
Thomas Narten: yes, agreed. I mentioned it, but we are not looking at
working on that in NVO3 here.
John Scudder: You have bullet points (first and last slides) that say
"hurry, hurry, hurry". I'll offer an aphorism: "short-cuts lead to long
delays". The latter 2/3 of the presentation demonstrates that - lots of
wheel-reinventing. We need to take time to look at the requirements so
we don't reinvent. My second point is that you say existing solutions
don't scale, with which I don't agree.
Thomas: I'm looking at what a lot of people have said about this topic
over the years. I'm not trying to rush this through; we have had many
discussions to get to this point. On the scale issue, that is for the
market to sort out - there is concern from operators that existing
mechanisms don't scale.
John: If you did not want the discussion on scaling you should not have
made it part of the thesis.
Thomas: What I'm hearing from at least some data-centre operators is
that there is a problem here, but this is not the place to discuss
that.
Marc Lasserre: On the point of learning rather than a centralised
controller, I would like to echo Ping's point that this belongs to the
IETF. It's either data plane or centralised.
Thomas: It sounds like you are agreeing with me, but I'm saying
orchestration is involved and that is not part of the IETF. I just want
to do the parts that need IETF standards work.
Marc: We need to clarify what the orchestration is. Second point: you
say NVO3 is very different to existing technologies, but I disagree. We
do overlays already.
Thomas: Our approach is L3-based, and some providers are saying this is
important to them.
Erik Nordmark: We talk about scaling this up, but do we want to think
about scaling this up and down? We have different requirements.
Thomas: Agreed.
Erik: What is the essence of the work we'd standardise at the IETF? Is
it an architecture that allows multiple control planes running over
multiple encaps? E.g. learn at low scale and use a database at high
scale. Or different encaps for different data-centres.
Thomas Narten: The work is to design the mapping systems and the work
surrounding that, as a common framework rather than lots of different
approaches.
Florin: You come at it from the server perspective; let me come at it
from the VPN gateway one. To interconnect to customer sites outside the
data-centre you need a VPN instance.
Stewart (stepping in to the mic): the discussion is getting general;
questions need to be on Thomas's slides. Please no other questions.
Giles: agrees with Stewart; we need to get the discussion back to
Thomas's slides.
Ali Sajassi: the underlying requirements have significant overlap, and
much of this, like directory services, has all been discussed before in
L2 and L3. Have you seen the work that has been done and discussed
before?
Giles: is there a question here relating to understanding Thomas's
slides?
Stewart: Ali, you've got one sentence to ask your question!
Ali Sajassi: Do you have visibility of the work done in L2VPN already?
Thomas Narten: I agree there is some overlap in requirements, but DC
people are saying that existing L2VPN approaches are not what they
want. We are proposing an alternative.
Ali Sajassi: current RFCs don't align to what you want here. But the
approaches and tradeoffs have been discussed.
Stewart: we need people to ask one simple question in one sentence!
Warren Kumari: I'm upset with this. You've got a clear problem
statement etc. I want to rant about something.
Thomas Morin: How do you glue VPNs created using NVO3 to the outside
world?
Thomas Narten: Something similar to what you do today with VLANs. The
overlay has an edge. Perhaps a dedicated box, or not.
Thomas Morin: are you saying it must be a separate box?
Thomas Narten: There's room for that if you want, but it's not an IETF
question.
Wim Henderickx: The overlay is a problem we need to solve, but we also
need a gateway to talk to the outside world. Let's not do a v4/v6 and
make something incompatible.
Thomas: I don't think your v4/v6 analogy is wrong.
Wim: Please can we reuse the VPN identifier which is already there, so
we can interwork? We have a 24-bit identifier already. Why invent a new
one?
Thomas: Are you asking for a global namespace?
Wim: No, I'm asking for us to reuse the VPN identifier that's already
defined. And also not to have interconnection to VPNs as a secondary
priority, as we will need to interwork the two worlds.
Thomas: I'm saying it's secondary for this group's benefit. This is a
VPN group. That's not what's driving NVO3.
Igor Gashinsky: The draft says you also want to do tenant-specific
routing in the VNIs.
Thomas Narten: Yes we do.
Igor: Awesome work, the best I have seen; what we have today does not
scale. We need to pursue this.
Maciej: I like the comments in the slides - e.g. "L3 when we can, L2
when we must", scaling to 10's of millions of hosts, only storing
infrastructure state in the network, technology reuse. We already have
a technology for this, which is IP, so yes... more IP please.
Dave McDysan: a question on the slides about directory services (which,
as Igor clarified, could include push) and learning: would you do both
in the same DC, or just one or the other?
Thomas Narten: I think the same control plane and encap throughout one
domain. But you could have independent overlays interconnected by one
router but sharing a namespace.
Dave: Thank you. And I'd say "yes" to the questions on the slides.
Paul Unbehagen: You refer to SP versus Data Centre Operator; what is
the difference? Can you clarify what a DC Operator is?
Thomas Narten: Someone like an enterprise is an example of a DCO, as
opposed to an SP.

Giles: Time for general questions now.

Chris Liljenstolpe: I agree with Igor and thank Thomas for the work. I
agree we need to do this in the IETF, and agree in principle we should
reuse VPN technology; however, I would prefer to force change on the
gateways rather than on the 10's of thousands of switching elements in
the data-centre.
Ali: In terms of the overlay approach, I can see the benefit if you
assume the IP network is multicast-enabled. So I am not saying we must
impose VPN work on this space, but I think L2VPN is the right place to
do the work.
Giles: we're specifically not discussing that.
Stewart: please - we want to come out with answers to our 3 questions,
not discuss which working group will do it.
Giles: we're also not doing protocol design. You were talking about
multicast etc. Let's please avoid that.
Florin: We should put VPN on the same page to modify the priorities.
You need to expand so we can look at L3 over L3 as well as L2 over L3.
For item number 2 we need to look at L3 transport as well as Ethernet
and MPLS. And for number 3 we need to work on this, as there's a lot of
expertise in L2, L3 and other WGs.
Dave Allan: concern that we will replicate other solutions. Let's pick
a problem to solve that builds on existing work, and don't duplicate.
Wim: If we look at the existing work in L2VPN, we find issues when
running this over IP (e.g. multicast). But L2VPN gives you a good base
to solve problems, and we perhaps need to make extensions to it. The
IETF has to solve that problem. Also make use of the IEEE environment,
as they have done a lot of work on bridging/switching. L2VPN has been
reusing IEEE toolsets up to now, so let's extend that to the problems
highlighted today.
Thomas Morin: The problem statement seems to put aside the remote DC
and the DC interconnects; it needs more work. On multicast, don't
reinvent the wheel.
Ping: Talking about number 3: VXLAN has some implications on this
problem and should be considered - Red Hat is putting it in the kernel,
for example. This is perhaps a management problem. We should look at
this problem and reframe it going forward; however, I agree we should
look at this problem at the IETF.
Giles: Stewart is suggesting reframing question 3 as "should we do this
at an interim meeting".
Ping: yes we should, with network and DC people, and figure out the way
forward.
Thomas Narten: do you mean on just this topic, or other topics?
Stewart: it was an idle thought, but yes, we should get through the
questions here and take a straw poll around this at the end.
Lou Berger: Agrees there is some work here for the IETF. There are
generic requirements that apply to all underlying technologies, so we
should make sure the way we approach this at the IETF allows any number
of underlying technologies.
David Black: The slides only came together in the last 24 hours. There
is complementarity between the two decks proposed today. We need to try
to get a handle on the control and management complexity before we have
a 5 x 5 x "number of VPN technologies" integration problem. Let's get
this down to 1:1, not N:N, integrations - and do it now, before people
start on point solutions.
Nabil: We need to make sure we understand what "control plane" means,
as there are two perspectives.
David Black: We need one framework, and not to end up with a pile of
point solutions.
Adrian Farrel (wearing no hats): Thank you for the work. I agree with
Lou that we need more abstraction. I do have a problem with the problem
statement regarding data-centre scaling: how big are we going in
future? 10's of millions or 100's of millions? I also agree with Thomas
Morin that we have lots of existing wheels, so we don't want to
reinvent.
Marc: The problem statement needs to include more of the L3 issues
around sending L2 over L3, as the L2 traffic already contains L3. The
issue is just framing.
Erik Nordmark: "Yes, yes and yes". We need to focus from the DC
perspective and not from the VPN view. Don't just think about Ethernet
over IP, but also IP over IP. The hypervisor may get decoupled over
time.
Dino: 1) Yes, this is a great starting point. 2) If the encap/decap
point is in the middle of the network we need the VMs to be able to
tell the network what instance ID they need. 3) Yes, the IETF should
work on this, and we should move to one and only one encapsulation to
minimise cost in vendor equipment. We should standardise signalling and
encapsulation.
Chris Wright: 1) Yes, this is a good set of problem statements. 2) I
don't believe that multiple encaps is a huge problem. The control plane
is more expensive to a hypervisor vendor than bit-twiddling.
Igor: This is awesome work, and the problem needs to be solved. If the
IETF is not interested there are other groups who will pick this up.
Manuel Paul: The problem statement is good enough, but it would be
better if we can converge the two presentations. Work to be done on
scaling issues. Struggling to see if NVO3 really addresses VPN
interworking so a customer can get into the data-centre. Not enough to
come from just the data-centre perspective.

Wrapping up:
Giles - question to room: Who has read the problem statement? Lots of
hands up for yes, a few for no.
Giles - question to room: Do people think this is a good starting
point? Some hands; some hands at the back for no.
Stewart - question to room: who thinks we need to take more DC input to
this? Medium hands.
Stewart - question to room: Is VPN a good starting point? Same number
of hands.
George - IETF needs more DC clue, and DC needs a home here.
Giles - question to room: Framing formats - minimal framing formats? A
few hands.
Giles - question to room: Framing formats - not relevant? Fewer hands.
Thomas Narten - what does "minimal frame formats" mean? One isn't
achievable?
Giles - question to room: Should we drive to one and only one more
sophisticated control plane? Few people.
Giles - so who is ok with lots of control planes? Few people.
Giles - question to room: do we need more work on understanding control
planes? A significant number of people.
Florin - Need to drive to a minimal set. We already have multiple
tunnelling flavours. We should have one service ID, and one control
plane as much as we can. But it's probably too early.
Nabil - "one control plane" could be confusing. Do we mean management
plane, routing protocol, signalling protocol? Need to be clear what the
control plane is.
Stewart - no time now to debate.
Kireeti - I think the IETF should work on this, but the more questions
I hear from the front, the less convinced I am that we should work on
it.
Pat Thaler - "Control plane" is a toolbox to handle learning vs
download or query etc. Potentially different tools, for example for
small/large deployments, but kept to a minimal set. Don't want 20
different ways to do the same thing.
Lou Berger - perhaps the questions are at too low a level. It's clear
we have a problem we don't fully understand. Higher-level problem and
set of requirements. Nabil used the word "framework". We need to figure
out the framework and then plug solutions into it. Can't preclude the
existing solution set. Hope we can avoid inventing new control planes
and data planes.
Giles - question to the room: Is one format for service ID across layer
2 and layer 3 transport a good thing? And is 16M scale enough?
Thomas Narten - way too early to know that.
Giles - Feels like consensus that it's too early to answer detail
questions.
Thomas Narten - feels like there's lots of support for the problem
statement, though some didn't like it.
Stewart - feels like consensus that there is work to be done and that
the IETF should do it.
Stewart - question to the room: Do we understand the material enough to
use email to start on this, or do we need an interim meeting? Lots of
hands for "yes, let's have a meeting" and similar for "no".
Stewart - would anyone be prepared to host any meeting we have? 2
hands.
Stewart - we've gathered some data and will get more this afternoon at
the SDN BoF. Then we will arrive at a decision as to how to move
forward.
Thomas - an interim meeting on NVO3 only, or on the broader space?
Stewart - it would cover all of the data-centre space. "Do we need an
interim on the data-centre problem?" Seemed to be consensus for a
meeting.
Steve Blake - which mailing list shall we use?
Stewart - we'll start a new list. Doing it on NVO3 assumes a solution.
Linda - I thought there was a DC-ops mailing list?
Warren - yes, there was a DC-ops BoF and a non-IETF list that got no
traffic.
Linda - there is the ARMD mailing list for address resolution scaling
issues.
Stewart - this needs to cover scaling in all its aspects.
Ping - go to the routing area mailing list?
Stewart - not just a routing problem!

*MEETING CLOSED*