L2VPN Workgroup                                              Ali Sajassi
INTERNET-DRAFT                                               Samer Salam
Intended Status: Standards Track                            Samir Thoria
                                                                   Cisco

Wim Henderickx
Jorge Rabadan                                              Yakov Rekhter
Alcatel-Lucent                                                John Drake
                                                                 Juniper

Florin Balus
Nuage Networks                                                 Lucy Yong
                                                            Linda Dunbar
Dennis Cai                                                        Huawei
Cisco

Expires: January 4, 2015                                    July 4, 2014


                Integrated Routing and Bridging in EVPN
          draft-sajassi-l2vpn-evpn-inter-subnet-forwarding-04


Abstract

   EVPN provides an extensible and flexible multi-homing VPN solution
   for intra-subnet connectivity among hosts/VMs over an MPLS/IP
   network. However, there are scenarios in which inter-subnet
   forwarding among hosts/VMs across different IP subnets is required,
   while maintaining the multi-homing capabilities of EVPN. This
   document describes an Integrated Routing and Bridging (IRB) solution
   based on EVPN to address such requirements.


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Sajassi et al.          Expires January 4, 2015                 [Page 1]

INTERNET DRAFT  Integrated Routing & Bridging in EVPN  February 13, 2014


Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1  Introduction
   2  Inter-Subnet Forwarding Scenarios
     2.1 Switching among Subnets within a DC
     2.2 Switching among EVIs in different DCs without route
         aggregation
     2.3 Switching among EVIs in different DCs with route aggregation
     2.4 Switching among IP-VPN sites and EVIs with route aggregation
   3  Default L3 Gateway Addressing
     3.1 Homogeneous Environment
     3.2 Heterogeneous Environment
   4  Operational Models for Asymmetric Inter-Subnet Forwarding
     4.1 Among EVPN NVEs within a DC
     4.2 Among EVPN NVEs in Different DCs Without Route Aggregation
     4.3 Among EVPN NVEs in Different DCs with Route Aggregation
     4.4 Among IP-VPN Sites and EVPN NVEs with Route Aggregation
     4.5 Use of Centralized Gateway
   5  Operational Models for Symmetric Inter-Subnet Forwarding
     5.1 IRB forwarding on NVEs without core-facing IRB Interface
       5.1.1 Control Plane Operation for IRB forwarding without
             core-facing I/F
       5.1.2 Data Plane Operation for IRB forwarding without
             core-facing I/F
     5.2 IRB forwarding on NVEs with core-facing IRB Interface
       5.2.1 Control Plane Operation for IRB forwarding with
             core-facing I/F
       5.2.2 Data Plane Operation for IRB forwarding with
             core-facing I/F
   6  BGP Encoding
   7  VM Mobility
     7.1 VM Mobility & Optimum Forwarding for VM's Outbound Traffic
     7.2 VM Mobility & Optimum Forwarding for VM's Inbound Traffic
       7.2.1 Mobility without Route Aggregation
       7.2.2 Mobility with Route Aggregation
   8  Acknowledgements
   9  Security Considerations
   10 IANA Considerations
   11 References
     11.1 Normative References
     11.2 Informative References
   Authors' Addresses


Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   IRB: Integrated Routing and Bridging

   IRB Interface: A virtual interface that connects the bridging module
   and the routing module on an NVE.

   NVE: Network Virtualization Endpoint

   TS: Tenant System


1  Introduction

   EVPN provides an extensible and flexible multi-homing VPN solution
   for intra-subnet connectivity among hosts/VMs over an MPLS/IP
   network. However, there are scenarios where, in addition to
   intra-subnet forwarding, inter-subnet forwarding is required among
   hosts/VMs across different IP subnets at the EVPN PE nodes (also
   referred to as EVPN NVE nodes throughout this document), while
   maintaining the multi-homing capabilities of EVPN. This document
   describes an Integrated Routing and Bridging (IRB) solution based on
   EVPN to address such requirements.

   Inter-subnet communication is traditionally achieved at centralized
   L3 Gateway (L3GW) nodes where all the inter-subnet communication
   policies are enforced. When two Tenant Systems (TSes) belonging to
   two different subnets connected to the same PE node want to talk to
   each other, their traffic needs to be backhauled from the PE node
   all the way to the centralized gateway nodes, where inter-subnet
   switching is performed, and then back to the PE node. For today's
   large multi-tenant data centers, this scheme is very inefficient and
   sometimes impractical.

   In order to overcome the drawbacks of the centralized approach, IRB
   functionality is needed on the PE nodes (i.e., NVE devices) as close
   to the TSes as possible, to avoid unnecessary hairpinning of user
   traffic. Under this design, all traffic between hosts attached to
   one NVE can be routed and bridged locally, thus avoiding the traffic
   hairpinning issue at the centralized L3GW.

   There can be scenarios where both centralized and decentralized
   approaches may be preferred simultaneously.
   For example, NVEs may locally switch inter-subnet traffic belonging
   to one tenant or one security zone, while backhauling inter-subnet
   traffic between two different tenants or security zones to the
   centralized gateway nodes, which perform the switching after the
   traffic has been subjected to Firewall or Deep Packet Inspection
   (DPI) processing.

   Some TSes run non-IP protocols in conjunction with their IP traffic.
   It is therefore important to handle both kinds of traffic optimally,
   e.g., to bridge non-IP traffic and to route IP traffic.

   Therefore, the solution needs to meet the following requirements:

   R1: The solution MUST allow for inter-subnet traffic to be locally
   switched at NVEs.

   R2: The solution MUST allow for both inter-subnet and intra-subnet
   traffic belonging to the same tenant to be locally routed and
   bridged, respectively. The solution MUST provide IP routing for
   inter-subnet traffic and Ethernet bridging for intra-subnet traffic.

   R3: The solution MUST support bridging non-IP traffic.

   R4: The solution MUST allow inter-subnet switching to be disabled on
   a per-VLAN basis on NVEs where the traffic needs to be backhauled to
   another node (e.g., for performing FW or DPI functionality).


2  Inter-Subnet Forwarding Scenarios

   The inter-subnet forwarding scenarios performed by an EVPN NVE can be
   divided into the following five categories. The last scenario, along
   with its corresponding solution, is described in
   [EVPN-IPVPN-INTEROP]. The solutions for the first four scenarios are
   the focus of this document.

   1. Switching among EVIs (subnets) within a DC

   2. Switching among EVIs (subnets) in different DCs without route
      aggregation

   3. Switching among EVIs (subnets) in different DCs with route
      aggregation

   4. Switching among IP-VPN sites and EVPN instances with route
      aggregation

   5. Switching among IP-VPN sites and EVPN instances without route
      aggregation

   In the above scenarios, the term "route aggregation" refers to the
   case where a node situated at the WAN edge of the data center network
   behaves as a default gateway for all the destinations that are
   outside the data center. The absence of route aggregation refers to
   the scenario where NVEs within a data center maintain individual
   (host) routes to destinations that are outside of the data center.

   In case (4), the WAN edge node also performs route aggregation for
   all the destinations within its own data center, and acts as an
   interworking unit between EVPN and IP VPN (it implements both EVPN
   and IP VPN functionality).

                                       +---+    Enterprise Site 1
                                       |PE1|----- H1
                                       +---+
                                         /
                                   ,---------.      Enterprise Site 2
                                 ,'           `.    +---+
                  ,---------.   /(    MPLS/IP   )---|PE2|-----  H2
                 '   DCN 3   `./  `.   Core   ,'    +---+
                  `-+------+'       `-+------+'
                  __/__              / /    \ \
                 :NVE4 :       +---+          \ \
                 '-----'  ,----|GW |.          \ \
                    |   ,'     +---+ `.      ,---------.
                   VM6 (     DCN 1     )   ,'           `.
                        `.           ,'   (     DCN 2     )
                          `-+------+'      `.           ,'
                          __/__              `-+------+'
                         :NVE1 :             __/__  __\__
                         '-----'            :NVE2 : :NVE3 :
                          |   |             '-----' '-----'
                         VM1 VM2             |  |      |
                                            VM3 VM4   VM5

                  Figure 2: Interoperability Use-Cases

   In what follows, we will describe scenarios 1 through 4 in more
   detail.

2.1 Switching among Subnets within a DC

   In this scenario, connectivity is required between hosts (e.g., VMs)
   in the same data center, where those hosts belong to different IP
   subnets. All these subnets belong to the same tenant or are part of
   the same IP VPN. Each subnet is associated with a single EVPN
   instance, where each such EVI is realized by a collection of MAC-VRFs
   (one per NVE) residing on the NVEs configured for that EVI.

   As an example, consider VM3 and VM5 of Figure 2 above. Assume that
   connectivity is required between these two VMs where VM3 belongs to
   IP subnet 3 (SN3) whereas VM5 belongs to IP subnet 5 (SN5).
   Both SN3 and SN5 belong to the same tenant (e.g., are part of the
   same IP VPN). NVE2 has an EVI3 associated with SN3, and this EVI is
   represented by a MAC-VRF which is connected to an IP-VRF (for that
   IP VPN) via an IRB interface. Likewise, NVE3 has an EVI5 associated
   with SN5, and this EVI is represented by a MAC-VRF which is
   connected to an IP-VRF (for the same IP VPN) via an IRB interface.

2.2 Switching among EVIs in different DCs without route aggregation

   This case is similar to that of Section 2.1 above, except that the
   hosts belong to different data centers that are interconnected over
   a WAN (e.g., an MPLS/IP PSN). The data centers in question here are
   seamlessly interconnected to the WAN, i.e., the WAN edge devices do
   not maintain any host/VM-specific addresses in the forwarding path
   (e.g., there are no WAN edge GWs between these DCs).

   As an example, consider VM3 and VM6 of Figure 2 above. Assume that
   connectivity is required between these two VMs where VM3 belongs to
   SN3 whereas VM6 belongs to SN6. NVE2 has an EVI3 associated with SN3
   and NVE4 has an EVI6 associated with SN6. Both SN3 and SN6 are part
   of the same IP VPN.

2.3 Switching among EVIs in different DCs with route aggregation

   In this scenario, connectivity is required between hosts (e.g., VMs)
   in different data centers, and those hosts belong to different IP
   subnets. What makes this case different from that of Section 2.2 is
   that (in the context of a given IP-VRF) at least one of the data
   centers in question has a gateway as the WAN edge switch. Because of
   that, the NVEs' IP-VRFs within each data center need not maintain
   (host) routes to individual VMs outside of the data center.

   As an example, consider VM1 and VM5 of Figure 2 above.
   Assume that connectivity is required between these two VMs where VM1
   belongs to SN1 whereas VM5 belongs to SN5, and that both SN1 and SN5
   belong to the same IP VPN. NVE3 has an EVI5 associated with SN5, and
   this EVI is represented by the MAC-VRF which is connected to the
   IP-VRF via an IRB interface. NVE1 has an EVI1 associated with SN1,
   and this EVI is represented by the MAC-VRF which is connected to the
   IP-VRF representing the same IP VPN. Due to the gateway at the edge
   of DCN 1, NVE1's IP-VRF does not need to hold the address of VM5;
   instead, it has a default route with the GW as the next hop.

2.4 Switching among IP-VPN sites and EVIs with route aggregation

   In this scenario, connectivity is required between hosts (e.g., VMs)
   in a data center and hosts in an enterprise site that belongs to a
   given IP-VPN. The NVE within the data center is an EVPN NVE, whereas
   the enterprise site has an IP-VPN PE. Furthermore, the data center
   in question has a gateway as the WAN edge switch. Because of that,
   the NVE in the data center does not need to maintain individual IP
   prefixes advertised by enterprise sites (by IP-VPN PEs).

   As an example, consider end-station H1 and VM2 of Figure 2. Assume
   that connectivity is required between the end-station and the VM,
   where VM2 belongs to SN2, which is realized using EVPN, whereas H1
   belongs to an IP VPN site connected to PE1 (PE1 maintains an IP-VRF
   associated with that IP VPN). NVE1 has an EVI2 associated with SN2.
   Moreover, EVI2 on NVE1 is connected to an IP-VRF associated with
   that IP VPN. PE1 originates a VPN-IP route that covers H1. The
   gateway at the edge of DCN 1 performs the interworking function
   between IP-VPN and EVPN.
   As a result, a default route in the IP-VRF on NVE1, pointing to the
   gateway as the next hop, and a route to VM2 (or possibly SN2) in
   PE1's IP-VRF are sufficient for connectivity between H1 and VM2. In
   this scenario, NVE1's IP-VRF does not need to maintain a route to H1
   because it has the default route to the gateway.


3  Default L3 Gateway Addressing

3.1 Homogeneous Environment

   This is an environment where all NVEs to which an EVPN instance
   could potentially be attached (or moved) perform inter-subnet
   switching. Therefore, inter-subnet traffic can be locally switched
   by the EVPN NVE connecting the VMs belonging to different subnets.

   To support such inter-subnet forwarding, the NVE behaves as an IP
   Default Gateway from the perspective of the attached end-stations
   (e.g., VMs). Two models are possible:

   1. All the EVIs of a given EVPN instance use the same anycast
      default gateway IP address and the same anycast default gateway
      MAC address. On each NVE, these default gateway IP and MAC
      addresses correspond to the IRB interface of the EVI associated
      with that EVPN instance.

   2. Each EVI of a given EVPN instance uses its own default gateway IP
      and MAC addresses, and these addresses are aliased to the same
      conceptual gateway through the use of the Default Gateway
      extended community as specified in [EVPN], which is carried in
      the EVPN MAC Advertisement routes. On each NVE, these default
      gateway IP and MAC addresses correspond to the IRB interface of
      the EVI associated with that EVPN instance.

   Both of these models enable a packet forwarding paradigm for
   asymmetric IRB forwarding where a packet can bypass the VRF
   processing on the egress (i.e., disposition) NVE. The egress NVE
   merely needs to perform a lookup in the associated EVI and forward
   the Ethernet frames unmodified, i.e., without rewriting the source
   MAC address.
   This is different from symmetric IRB forwarding, where a packet is
   forwarded through the bridging module followed by the routing module
   on the ingress NVE, and then forwarded through the routing module
   followed by the bridging module on the egress NVE.

   It is worth noting that if the applications running on the hosts
   (e.g., VMs) employ or rely on any form of MAC security, then the
   first model (i.e., using anycast addresses) would be required to
   ensure that the applications receive traffic from the same source
   MAC address that they are sending to.

3.2 Heterogeneous Environment

   In large data centers with thousands of servers and ToR (or Access)
   switches, some of these switches may not have the capability of
   maintaining or enforcing policies for inter-subnet switching. Even
   though policies among multiple subnets belonging to the same tenant
   can be simpler, hosts belonging to one tenant can also send traffic
   to peers belonging to different tenants or security zones. An L3GW
   not only needs to enforce policies for communication among subnets
   belonging to a single tenant, but also needs to know how to handle
   traffic destined towards peers in different tenants. Therefore,
   there can be a mixed environment where an NVE performs inter-subnet
   switching for some EVPN instances but not others.


4  Operational Models for Asymmetric Inter-Subnet Forwarding

4.1 Among EVPN NVEs within a DC

   When an EVPN MAC advertisement route is received by the NVE, the IP
   address associated with the route is used to populate the IP-VRF
   table, whereas the MAC address associated with the route is used to
   populate both the MAC-VRF table and the adjacency associated with
   the IP route in the IP-VRF table.

   When an Ethernet frame is received by an ingress NVE, it performs a
   lookup on the destination MAC address in the associated MAC-VRF for
   that EVI.
   If the MAC address corresponds to its IRB Interface MAC address, the
   ingress NVE deduces that the packet MUST be inter-subnet routed.
   Hence, the ingress NVE performs an IP lookup in the associated
   IP-VRF table. The lookup identifies both the next-hop (i.e., egress)
   NVE to which the packet must be forwarded and an adjacency that
   contains a MAC rewrite and an MPLS label stack. The MAC rewrite
   holds the MAC address associated with the destination host (as
   populated by the EVPN MAC route), instead of the MAC address of the
   next-hop NVE. The ingress NVE then rewrites the destination MAC
   address in the packet with the address specified in the adjacency.
   It also rewrites the source MAC address with its IRB Interface MAC
   address. The ingress NVE then forwards the frame to the next-hop
   (i.e., egress) NVE after encapsulating it with the MPLS label stack.
   Note that this label stack includes the LSP label as well as the EVI
   label that was advertised by the egress NVE. When the MPLS
   encapsulated packet is received by the egress NVE, it uses the EVI
   label to identify the MAC-VRF table. It then performs a MAC lookup
   in that table, which yields the outbound interface to which the
   Ethernet frame must be forwarded. Figure 2 below depicts the packet
   flow, where NVE1 and NVE2 are the ingress and egress NVEs,
   respectively.

          NVE1                NVE2
     +------------+      +------------+
     | ...   ...  |      | ...   ...  |
     |(EVI)-(VRF) |      |(VRF)-(EVI) |
     | .|.   .|.  |      | ...  |..|  |
     +------------+      +------------+
        ^     v                 ^  V
        |     |                 |  |
  VM1->-+     +-->--------------+  +->-VM2

     Figure 2: Inter-Subnet Forwarding Among EVPN NVEs within a DC

   Note that the forwarding behavior on the egress NVE is similar to
   EVPN intra-subnet forwarding. In other words, all the packet
   processing associated with the inter-subnet forwarding semantics is
   confined to the ingress NVE, which is why it is called Asymmetric
   IRB.
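
   The asymmetric forwarding steps above can be sketched roughly as
   follows. This is only an illustrative model, not part of the
   specification: the class and function names, the dictionary-based
   tables, the addresses, and the label values are all hypothetical,
   and an exact-match dictionary lookup stands in for the real IP
   lookup in the IP-VRF.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    dst_ip: str


@dataclass
class Adjacency:
    next_hop_nve: str             # egress NVE
    host_mac: str                 # destination host MAC, from the EVPN MAC route
    label_stack: Tuple[int, int]  # (LSP label, EVI label advertised by egress NVE)


def ingress_route(frame: Frame, irb_mac: str,
                  ip_vrf: Dict[str, Adjacency]
                  ) -> Optional[Tuple[str, Tuple[int, int], Frame]]:
    """Ingress NVE: all inter-subnet processing happens here."""
    if frame.dst_mac != irb_mac:
        return None  # not addressed to the IRB interface: bridge it instead
    adj = ip_vrf[frame.dst_ip]  # exact match stands in for the IP lookup
    # Rewrite destination MAC to the host MAC and source MAC to the IRB MAC.
    rewritten = Frame(src_mac=irb_mac, dst_mac=adj.host_mac,
                      dst_ip=frame.dst_ip)
    return adj.next_hop_nve, adj.label_stack, rewritten


def egress_bridge(evi_label: int, frame: Frame,
                  mac_vrfs: Dict[int, Dict[str, str]]) -> str:
    """Egress NVE: the EVI label selects the MAC-VRF; a MAC lookup
    yields the outbound interface. No VRF processing is involved."""
    return mac_vrfs[evi_label][frame.dst_mac]


# Hypothetical example data (documentation addresses, made-up labels).
IRB_MAC = "00:00:5e:00:53:01"
ip_vrf = {"192.0.2.5": Adjacency("NVE2", "00:00:5e:00:53:05", (1001, 2005))}
mac_vrfs = {2005: {"00:00:5e:00:53:05": "port-to-VM2"}}

frame_in = Frame("00:00:5e:00:53:03", IRB_MAC, "192.0.2.5")
next_hop, labels, frame_out = ingress_route(frame_in, IRB_MAC, ip_vrf)
out_port = egress_bridge(labels[1], frame_out, mac_vrfs)
```

   In this sketch the egress side never consults an IP table, which
   mirrors the point that the routing semantics are confined to the
   ingress NVE.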

   It should also be noted that [EVPN] provides different levels of
   granularity for the EVI label. Besides identifying a bridge domain
   table, it can be used to identify the egress interface or a
   destination MAC address on that interface. If the EVI label is used
   for egress interface or destination MAC address identification, then
   no MAC lookup is needed in the egress EVI and the packet can be
   forwarded directly to the egress interface based on the EVI label
   lookup alone.

4.2 Among EVPN NVEs in Different DCs Without Route Aggregation

   When an EVPN MAC advertisement route is received by the NVE, the IP
   address associated with the route is used to populate the IP-VRF
   table, whereas the MAC address associated with the route is used to
   populate both the MAC-VRF table and the adjacency associated with
   the IP route in the IP-VRF table.

   When an Ethernet frame is received by an ingress NVE, it performs a
   lookup on the destination MAC address in the associated EVI. If the
   MAC address corresponds to its IRB Interface MAC address, the
   ingress NVE deduces that the packet MUST be inter-subnet routed.
   Hence, the ingress NVE performs an IP lookup in the associated
   IP-VRF table. The lookup identifies both the next-hop (i.e., egress)
   Gateway to which the packet must be forwarded and an adjacency that
   contains a MAC rewrite and an MPLS label stack. The MAC rewrite
   holds the MAC address associated with the destination host (as
   populated by the EVPN MAC route), instead of the MAC address of the
   next-hop Gateway. The ingress NVE then rewrites the destination MAC
   address in the packet with the address specified in the adjacency.
   It also rewrites the source MAC address with its IRB Interface MAC
   address. The ingress NVE then forwards the frame to the next-hop
   (i.e., egress) Gateway after encapsulating it with the MPLS label
   stack.
   Note that this label stack includes the LSP label as well as an EVI
   label. The EVI label is advertised either by the ingress Gateway, if
   inter-AS option B is used, or by the egress NVE, if inter-AS option
   C is used. When the MPLS encapsulated packet is received by the
   ingress Gateway, the processing again differs depending on whether
   inter-AS option B or option C is employed: in the former case, the
   ingress Gateway swaps the EVI label in the packet with the EVI label
   value received from the egress Gateway. In the latter case, the
   ingress Gateway does not modify the EVI label and performs normal
   label switching on the LSP label. Similarly, for option B, the
   egress Gateway swaps the EVI label with the value advertised by the
   egress NVE, whereas for option C, the egress Gateway does not modify
   the EVI label and performs normal label switching on the LSP label.
   When the MPLS encapsulated packet is received by the egress NVE, it
   uses the EVI label to identify the bridge-domain table. It then
   performs a MAC lookup in that table, which yields the outbound
   interface to which the Ethernet frame must be forwarded. Figure 3
   below depicts the packet flow.

         NVE1            GW1             GW2             NVE2
    +------------+  +------------+  +------------+  +------------+
    | ...   ...  |  |    ...     |  |    ...     |  | ...   ...  |
    |(EVI)-(VRF) |  |   [LS ]    |  |   [LS ]    |  |(VRF)-(EVI) |
    | .|.   .|.  |  |    |..|    |  |    |..|    |  | ...   |..| |
    +------------+  +------------+  +------------+  +------------+
       ^     v           ^  V            ^  V               ^  V
       |     |           |  |            |  |               |  |
 VM1->-+     +-->--------+  +------------+  +---------------+  +->-VM2

   Figure 3: Inter-Subnet Forwarding Among EVPN NVEs in Different DCs
                       without Route Aggregation

4.3 Among EVPN NVEs in Different DCs with Route Aggregation

   In this scenario, the NVEs within a given data center do not have
   entries for the MAC/IP addresses of hosts in remote data centers.
   Rather, the NVEs have a default IP route pointing to the WAN gateway
   for each VRF. This is accomplished by the WAN gateway advertising,
   for a given EVPN that spans multiple DCs, a default VPN-IP route
   that is imported by the NVEs of that EVPN that are in the gateway's
   own DC.

   When an Ethernet frame is received by an ingress NVE, it performs a
   lookup on the destination MAC address in the associated MAC-VRF
   table. If the MAC address corresponds to the IRB Interface MAC
   address, the ingress NVE deduces that the packet MUST be
   inter-subnet routed. Hence, the ingress NVE performs an IP lookup in
   the associated IP-VRF table. The lookup, in this case, matches the
   default route which points to the local WAN gateway. The ingress NVE
   then rewrites the destination MAC address in the packet with the IRB
   Interface MAC address of the local WAN gateway. It also rewrites the
   source MAC address with its own IRB Interface MAC address. The
   ingress NVE then forwards the frame to the WAN gateway after
   encapsulating it with the MPLS label stack. Note that this label
   stack includes the LSP label as well as the IP-VPN label that was
   advertised by the local WAN gateway. When the MPLS encapsulated
   packet is received by the local WAN gateway, it uses the IP-VPN
   label to identify the IP-VRF table. It then performs an IP lookup in
   that table. The lookup identifies both the remote WAN gateway (of
   the remote data center) to which the packet must be forwarded and an
   adjacency that contains a MAC rewrite and an MPLS label stack. The
   MAC rewrite holds the MAC address associated with the ultimate
   destination host (as populated by the EVPN MAC route). The local WAN
   gateway then rewrites the destination MAC address in the packet with
   the address specified in the adjacency. It also rewrites the source
   MAC address with its IRB Interface MAC address. The local WAN
   gateway then forwards the frame to the remote WAN gateway after
   encapsulating it with the MPLS label stack.
   Note that this label stack includes the LSP label as well as an EVI
   label that was advertised by the remote WAN gateway. When the MPLS
   encapsulated packet is received by the remote WAN gateway, it simply
   swaps the EVI label and forwards the packet to the egress NVE. This
   implies that GW1 needs to keep the remote host MAC addresses along
   with the corresponding EVI labels in the adjacency entries of the
   IP-VRF table. The egress NVE then performs a MAC lookup in the
   MAC-VRF (identified by the received EVI label) to determine the
   outbound port on which to send the traffic.

   Figure 4 below depicts the forwarding model.

         NVE1            GW1             GW2             NVE2
    +------------+  +------------+  +------------+  +------------+
    | ...   ...  |  | ...   ...  |  |    ...     |  | ...   ...  |
    |(EVI)-(VRF) |  |(VRF)-(EVI) |  |   [LS ]    |  |(VRF)-(EVI) |
    | .|.   .|.  |  | |..|       |  |   |...|    |  | ...   |..| |
    +------------+  +------------+  +------------+  +------------+
       ^     v        ^  V              ^   V               ^  V
       |     |        |  |              |   |               |  |
 VM1->-+     +-->-----+  +--------------+   +---------------+  +->-VM2

   Figure 4: Inter-Subnet Forwarding Among EVPN NVEs in Different DCs
                        with Route Aggregation

4.4 Among IP-VPN Sites and EVPN NVEs with Route Aggregation

   In this scenario, the NVEs within a given data center do not have
   entries for the IP addresses of hosts in remote enterprise sites.
   Rather, the NVEs have a default IP route pointing to the WAN gateway
   for each IP-VRF.

   When an Ethernet frame is received by an ingress NVE, it performs a
   lookup on the destination MAC address in the associated MAC-VRF
   table. If the MAC address corresponds to the IRB Interface MAC
   address, the ingress NVE deduces that the packet MUST be
   inter-subnet routed. Hence, the ingress NVE performs an IP lookup in
   the associated IP-VRF table. The lookup, in this case, matches the
   default route which points to the local WAN gateway.
   The ingress NVE then rewrites the destination MAC address in the
   packet with the IRB Interface MAC address of the local WAN gateway.
   It also rewrites the source MAC address with its own IRB Interface
   MAC address. The ingress NVE then forwards the frame to the local
   WAN gateway after encapsulating it with the MPLS label stack. Note
   that this label stack includes the LSP label as well as the IP-VPN
   label that was advertised by the local WAN gateway. When the MPLS
   encapsulated packet is received by the local WAN gateway, it uses
   the IP-VPN label to identify the VRF table. It then performs an IP
   lookup in that table. The lookup identifies the next-hop ASBR to
   which the packet must be forwarded. The local gateway in this case
   strips the Ethernet encapsulation, performs an IP lookup in its
   IP-VRF, and forwards the IP packet to the ASBR using a label stack
   comprising an LSP label and an IP-VPN label that was advertised by
   the ASBR. When the MPLS encapsulated packet is received by the ASBR,
   it simply swaps the IP-VPN label with the one advertised by the
   egress PE. This implies that the remote WAN gateway must allocate
   the VPN label at least at the granularity of a (VRF, egress PE)
   tuple. The ASBR then forwards the packet to the egress PE. The
   egress PE then performs an IP lookup in the IP-VRF (identified by
   the received IP-VPN label) to determine where to forward the
   traffic.

   Figure 5 below depicts the forwarding model.

         NVE1            GW1             ASBR            NVE2
    +------------+  +------------+  +------------+  +------------+
    | ...   ...  |  | ...   ...  |  |    ...     |  |        ... |
    |(EVI)-(VRF) |  |(VRF)-(EVI) |  |   [LS ]    |  |       (VRF)|
    | .|.   .|.  |  | |..|       |  |   |...|    |  |       |..| |
    +------------+  +------------+  +------------+  +------------+
       ^     v        ^  V              ^   V               ^  V
       |     |        |  |              |   |               |  |
 VM1->-+     +-->-----+  +--------------+   +---------------+  +->-H1

   Figure 5: Inter-Subnet Forwarding Among IP-VPN Sites and EVPN NVEs
                        with Route Aggregation

4.5 Use of Centralized Gateway

   In this scenario, the NVEs within a given data center need to
   forward traffic in L2 to a centralized L3GW for one of the following
   reasons: a) they do not have IRB capabilities, or b) they do not
   have the required policy for switching traffic between different
   tenants or security zones. The centralized L3GW performs both the
   IRB function for switching traffic among different EVPN instances
   and the interworking function when traffic needs to be switched
   between IP-VPN sites and EVPN instances.


5  Operational Models for Symmetric Inter-Subnet Forwarding

   The following sections describe the main symmetric IRB forwarding
   scenarios.

5.1 IRB forwarding on NVEs without core-facing IRB Interface

   In this scenario, for a given tenant or IP-VPN, an NVE has an
   access-facing EVI for each of the tenant's subnets (VLANs) for which
   it is configured. Assuming VLAN-based service, which is typically
   the case for VxLAN and NVGRE encapsulation, each of these EVIs
   represents a MAC-VRF with one bridge domain. In the case of MPLS
   encapsulation with VLAN-aware bundling, each EVI may represent a
   MAC-VRF with multiple bridge domains (one bridge domain per VLAN).
   The EVIs (or MAC-VRFs) on an NVE for a given tenant are connected to
   an IP-VRF corresponding to that tenant (or IP-VPN) via their
   associated IRB interfaces.

   Since in this scenario there is no core-facing IRB interface, there
   is no need for a core-facing EVI or MAC-VRF.
The advantage of not having a core-facing IRB interface is operational simplicity: there is no need to configure an IRB interface and associate a MAC-VRF with it, and no additional BGP MAC address advertisements are needed. The disadvantage is that no QoS or security policies can be enforced on the core-facing traffic on a per-tenant basis.

Since VxLAN and NVGRE encapsulations require an inner Ethernet header (inner MAC SA/DA), and since the TS MAC address cannot be used for inter-subnet traffic, the ingress NVE's MAC address is used as the inner MAC SA. It should be noted that if there were a core-facing IRB interface, the MAC address of that IRB interface would be used as the inner MAC SA instead. The NVE's MAC address is the device MAC address, and the same MAC address is used across all EVIs and IP-VPNs.

Figure 6 below illustrates this scenario, where a given tenant (e.g., IP-VPN) has three subnets represented by EVI-1, EVI-2, and EVI-3 across two NVEs. There are five TSes connected to these three EVIs: TS1 and TS5 are connected to EVI-1 on NVE1, TS4 is connected to EVI-1 on NVE2, TS2 is connected to EVI-2 on NVE1, and TS3 is connected to EVI-3 on NVE2. When TS1, TS5, and TS4 exchange traffic with each other, only the L2 forwarding (bridging) part of the IRB solution is used, because all these TSes sit on the same subnet. However, when TS1 wants to exchange traffic with TS2 or TS3, which belong to different subnets, both the bridging and routing parts of the IRB solution are used. The following subsections describe the control-plane and data-plane operations for this IRB scenario in detail.
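The bridging-versus-routing distinction and the choice of inner MAC SA described above can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the specification: the `TS` record, the `TENANT` table, and the helpers `forwarding_mode` and `inner_mac_sa` are hypothetical names introduced here. Only the TS-to-EVI attachments are taken from the example in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TS:
    """A tenant system and its attachment point (hypothetical record)."""
    name: str
    evi: str  # EVI (subnet) the TS is attached to
    nve: str  # NVE the TS sits behind


# Attachments taken from the example in the text.
TENANT = {
    ts.name: ts
    for ts in (
        TS("TS1", "EVI-1", "NVE1"),
        TS("TS5", "EVI-1", "NVE1"),
        TS("TS4", "EVI-1", "NVE2"),
        TS("TS2", "EVI-2", "NVE1"),
        TS("TS3", "EVI-3", "NVE2"),
    )
}


def forwarding_mode(src: str, dst: str) -> str:
    """Same EVI (subnet): bridging only. Different EVIs: bridging and routing."""
    same_subnet = TENANT[src].evi == TENANT[dst].evi
    return "bridging" if same_subnet else "bridging+routing"


def inner_mac_sa(nve_device_mac: str, core_irb_mac: Optional[str] = None) -> str:
    """Inner MAC SA for inter-subnet traffic: the core-facing IRB MAC if such
    an interface exists, otherwise the NVE's device MAC (this scenario)."""
    return core_irb_mac if core_irb_mac is not None else nve_device_mac
```

With this table, `forwarding_mode("TS1", "TS4")` selects bridging only even though the two TSes sit behind different NVEs, because the decision depends on the subnet (EVI) and not on the NVE.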

               NVE1          +---------+
          +-------------+    |         |
  TS1-----|         MACx|    |         |        NVE2
 (IP1/M1) |(EVI-1)      |    |         |   +-------------+
  TS5-----|       \     |    |  MPLS/  |   |MACy  (EVI-3)|-----TS3
 (IP5/M5) |        \    |    | VxLAN/  |   |     /       | (IP3/M3)
          |        (VRF)|----|  NVGRE  |---|(VRF)        |
          |       /     |    |         |   |     \       |
  TS2-----|(EVI-2)      |    |         |   |      (EVI-1)|-----TS4
 (IP2/M2) +-------------+    |         |   +-------------+ (IP4/M4)
                             |         |
                             |         |
                             +---------+

Figure 6: IRB forwarding on NVEs without core-facing IRB Interface

5.1.1 Control Plane Operation for IRB forwarding without core-facing I/F

Each NVE advertises an RT-2 (MAC/IP Advertisement Route) for each of its TSes with the following fields set:

- RD and ESI per [EVPN]
- Ethernet Tag = 0; assuming VLAN-based service
- MAC Address Length = 48
- MAC Address = Mi; where i = 1, 2, 3, 4, or 5 in the above example
- IP Address Length = 32 or 128
- IP Address = IPi; where i = 1, 2, 3, 4, or 5 in the above example
- Label-1 = MPLS Label or VNID corresponding to the EVI
- Label-2 = MPLS Label or VNID corresponding to the IP-VRF

Each RT-2 route is advertised with two RTs (one corresponding to the EVI and the other corresponding to the IP-VPN) and with a new BGP attribute (section 6) that includes the tunnel type and the MAC address of the NVE (e.g., MACx for NVE1 or MACy for NVE2).

Upon receiving this advertisement, the receiving NVE performs the following:

- It uses the Route Targets corresponding to the EVI and the IP-VPN for importing this route into the corresponding MAC-VRF and IP-VRF tables.

- It imports the MAC address into the MAC-VRF with the BGP Next Hop address as the underlay tunnel destination address (e.g., VTEP DA for VxLAN encapsulation) and Label-1 as the EVI VNID for VxLAN encapsulation or the EVPN label for MPLS encapsulation.

- It imports the IP address into the IP-VRF with the NVE's MAC address (from the new BGP attribute) as the inner MAC DA, the BGP Next Hop address as the underlay tunnel destination address (e.g., VTEP DA for VxLAN encapsulation), and Label-2 as the IP-VPN VNID for VxLAN encapsulation or the IP-VPN label for MPLS encapsulation.

5.1.2 Data Plane Operation for IRB forwarding without core-facing I/F

The following description of the data-plane operation covers only the logical functions; the actual implementation may differ. Let us consider the data-plane operation when TS1 in subnet-1 (EVI-1) on NVE1 wants to send traffic to TS3 in subnet-3 (EVI-3) on NVE2.

- TS1 sends an Ethernet frame with the MAC DA corresponding to the EVI-1 IRB interface of NVE1, and the VLAN-tag corresponding to EVI-1.

- Upon receiving the Ethernet frame, NVE1 uses the VLAN-tag to identify the MAC-VRF corresponding to EVI-1. It then looks up the MAC DA and forwards the frame to its IRB interface.

- The Ethernet header of the frame is stripped and the packet is fed to the IP-VRF, where an IP lookup is performed on the destination address. This lookup yields a MAC address to be used as the inner MAC DA for VxLAN/NVGRE encapsulation, an IP address to be used as the VTEP DA for VxLAN encap or a tunnel label for MPLS encap, and a VPN-ID to be used as the VNID for VxLAN encap or as the IP-VPN label for MPLS encap.

- The packet is then encapsulated with the proper header based on the above information. The inner MAC SA and the VTEP SA are set to the NVE's MAC and IP addresses, respectively. The packet is then forwarded to the egress NVE.

- On the egress NVE, if the packet is VxLAN encapsulated, the VxLAN header is removed. Since the inner MAC DA is that of the egress NVE, the NVE knows that it needs to perform an IP lookup.
  It uses the VNID to identify the IP-VRF table and then performs an IP lookup, which yields the MAC address of the destination TS (TS3) and the access-facing IRB interface over which the packet needs to be sent.

- The IP packet is encapsulated with an Ethernet header with the MAC SA set to the MAC address of NVE2 (MACy) and the MAC DA set to the MAC address of the destination TS (TS3). The packet is sent to the corresponding MAC-VRF and, after a lookup of the MAC DA, is forwarded to the destination TS (TS3) over the corresponding interface.

5.2 IRB forwarding on NVEs with core-facing IRB Interface

The only difference between this scenario and the previous one is that there is a core-facing IRB interface per tenant (or IP-VPN) on each NVE. Each core-facing IRB interface has a MAC address and an IP address associated with it, and it allows QoS/security policies to be configured on a per-tenant basis on this interface. Furthermore, it allows for better OAM coverage (e.g., fault isolation) by running OAM on this interface. Other than that, the functionality is the same as in the solution described in section 5.1. This core-facing IRB interface results in additional control-plane processing (e.g., BGP route advertisements) and additional data-plane processing, as detailed in the next two sub-sections.
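As a baseline for the comparison with section 5.1, the RT-2 import procedure of section 5.1.1 and the ingress IP-VRF lookup of section 5.1.2 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the specified behavior: the names `Rt2`, `Nve`, `import_rt2`, and `ingress_lookup`, the route-target strings, the VNID values, and the placeholder addresses are all hypothetical. Only the VxLAN case is modeled, and host addresses are matched exactly rather than by longest prefix.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rt2:
    """Subset of an RT-2 as used in section 5.1.1 (hypothetical record)."""
    mac: str         # TS MAC address (Mi)
    ip: str          # TS IP address (IPi)
    label1: int      # VNID corresponding to the EVI
    label2: int      # VNID corresponding to the IP-VRF
    next_hop: str    # BGP Next Hop, used as VTEP DA
    nve_mac: str     # advertising NVE's MAC, from the new BGP attribute
    rts: frozenset   # route targets: one for the EVI, one for the IP-VPN


@dataclass
class Nve:
    evi_rts: dict    # RT -> name of a locally configured EVI (assumed layout)
    ipvpn_rt: str    # RT of the tenant's IP-VPN
    mac_vrf: dict = field(default_factory=dict)  # (evi, mac) -> (vtep_da, vnid)
    ip_vrf: dict = field(default_factory=dict)   # ip -> (inner_mac_da, vtep_da, vnid)

    def import_rt2(self, r: Rt2) -> None:
        # MAC goes into the MAC-VRF of each locally configured EVI whose RT matches.
        for rt in r.rts:
            if rt in self.evi_rts:
                self.mac_vrf[(self.evi_rts[rt], r.mac)] = (r.next_hop, r.label1)
        # IP goes into the IP-VRF with the remote NVE's MAC as inner MAC DA.
        if self.ipvpn_rt in r.rts:
            self.ip_vrf[r.ip] = (r.nve_mac, r.next_hop, r.label2)

    def ingress_lookup(self, dst_ip: str) -> tuple:
        """IP-VRF lookup on the ingress NVE: (inner MAC DA, VTEP DA, VNID)."""
        return self.ip_vrf[dst_ip]


# NVE1 has EVI-1 and EVI-2 configured; NVE2 advertises TS3 (EVI-3) and TS4 (EVI-1).
nve1 = Nve(evi_rts={"rt:evi-1": "EVI-1", "rt:evi-2": "EVI-2"}, ipvpn_rt="rt:ipvpn")
nve1.import_rt2(Rt2("M3", "IP3", 3003, 9000, "NVE2-VTEP", "MACy",
                    frozenset({"rt:evi-3", "rt:ipvpn"})))
nve1.import_rt2(Rt2("M4", "IP4", 1001, 9000, "NVE2-VTEP", "MACy",
                    frozenset({"rt:evi-1", "rt:ipvpn"})))
encap_for_ts3 = nve1.ingress_lookup("IP3")
```

In this sketch NVE1 has no EVI-3, so the TS3 route populates only the IP-VRF, while the TS4 route populates both the EVI-1 MAC-VRF and the IP-VRF. With a core-facing IRB interface, the IP-VRF entry would instead point at the remote core-facing IRB interface, as the following sub-sections describe.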

                   NVE1                +--------------+
          +---------------------+      |              |
  TS1-----|(EVI-1)              |      |              |
 (IP1/M1) |       \             |      |              |
          |        (VRF)-(EVI-x)|------|              |
          |       /             |      |              |
  TS2-----|(EVI-2)              |      |    MPLS/     |
 (IP2/M2) +---------------------+      |    VxLAN/    |
                                       |    NVGRE     |
          +---------------------+      |              |
  TS3-----|(EVI-1)              |      |              |
 (IP3/M3) |       \             |      |              |
          |        (VRF)-(EVI-x)|------|              |
          |       /             |      |              |
  TS4-----|(EVI-3)              |      |              |
 (IP4/M4) +---------------------+      |              |
                   NVE2                +--------------+

Figure 7: IRB forwarding on NVEs with core-facing IRB Interface

5.2.1 Control Plane Operation for IRB forwarding with core-facing I/F

Each NVE advertises an RT-2 (MAC/IP Advertisement Route) for each of its TSes, and it also advertises a single RT-2 for the core-facing IRB interface (which is per tenant or per IP-VPN). The fields of the RT-2 for each TS are set as follows:

- RD and ESI per [EVPN]
- Ethernet Tag = 0; assuming VLAN-based service
- MAC Address Length = 48
- MAC Address = Mi; MAC address of the TS
- IP Address Length = 32 or 128
- IP Address = IPi; IP address of the TS
- Label-1 = MPLS Label or VNID corresponding to the access-facing EVI

Furthermore, this RT-2 is also advertised with two RTs (one corresponding to the EVI and the other corresponding to the IP-VPN) as described in section 5.1.1. The main difference in terms of BGP advertisement for this per-TS RT-2 is that it is advertised with a new BGP attribute (section 6) that includes the tunnel type and the IP address of the core-facing IRB interface (which is per tenant).

Upon receiving this per-TS RT-2 advertisement, the receiving NVE performs the following:

- It uses the Route Targets corresponding to the EVI and the IP-VPN for importing this route into the corresponding MAC-VRF and IP-VRF tables, similar to section 5.1.1.

- It imports the MAC address into the MAC-VRF just like section 5.1.1.

- It imports the IP address into the IP-VRF with the next hop pointing to the IP address of the core-facing IRB interface (carried in the new BGP attribute).

The RT-2 for the core-facing IRB interface is advertised with an RT corresponding to the core-facing EVI (e.g., EVI-x). This RT-2 is also advertised as a sticky MAC per section 15.2 of [EVPN] in order to ensure that misconfiguration is caught quickly. The fields of this RT-2 are set as follows:

- RD per [EVPN]
- ESI = 0
- Ethernet Tag = 0
- MAC Address Length = 48
- MAC Address = Ma; MAC address of the core-facing IRB interface
- IP Address Length = 32 or 128
- IP Address = IPa; IP address of the core-facing IRB interface
- Label-1 = MPLS Label or VNID corresponding to the core-facing EVI

Upon receiving the RT-2 advertisement corresponding to the core-facing IRB interface, the receiving NVE performs the following:

- It uses the Route Target corresponding to EVI-x to identify the MAC-VRF associated with EVI-x.

- It imports the MAC address into the MAC-VRF associated with EVI-x with the BGP Next Hop address as the underlay tunnel destination address (e.g., VTEP DA for VxLAN encapsulation) and Label-1 as the EVI VNID for VxLAN encapsulation or the EVPN label for MPLS encapsulation.

- It imports the (MAC, IP) pair associated with the core-facing IRB interface into the overlay ARP table. This overlay ARP table is used to resolve the per-TS IP addresses imported into the IP-VRF table previously.

5.2.2 Data Plane Operation for IRB forwarding with core-facing I/F

The following description of the data-plane operation covers only the logical functions; the actual implementation may differ. Let us consider the data-plane operation when TS1 in subnet-1 (EVI-1) on NVE1 wants to send traffic to TS4 in subnet-3 (EVI-3) on NVE2.

- TS1 sends an Ethernet frame with the MAC DA corresponding to the EVI-1 IRB interface of NVE1, and the VLAN-tag corresponding to EVI-1, just like section 5.1.2.

- Upon receiving the Ethernet frame, the ingress NVE1 uses the VLAN-tag to identify the MAC-VRF corresponding to EVI-1. It then looks up the MAC DA and forwards the frame to its IRB interface, just like section 5.1.2.

- The Ethernet header of the frame is stripped and the packet is fed to the IP-VRF, where an IP lookup is performed on the destination address. This lookup yields a MAC address (corresponding to the destination core-facing IRB interface) and the local core-facing IRB interface over which the packet is sent.

- The packet is encapsulated with an Ethernet header where the MAC SA is set to that of the local core-facing IRB interface and the MAC DA is set to that of the remote core-facing IRB interface. The packet is then sent to the core-facing EVI of the ingress NVE.

- A MAC DA lookup is performed in the core-facing EVI of the ingress NVE. This lookup yields an IP address to be used as the VTEP DA for VxLAN encap or a tunnel label for MPLS encap, and a VPN-ID to be used as the VNID for VxLAN encap or as the IP-VPN label for MPLS encap.

- The packet is then encapsulated with the proper header based on the above information and is forwarded to the egress NVE.

- On the egress NVE, if the packet is VxLAN encapsulated, the VxLAN header is removed and the resultant Ethernet frame is fed into the core-facing MAC-VRF associated with that tenant based on the VNID.

- The MAC DA lookup yields the core-facing IRB interface of the egress NVE over which the frame is sent. Next, the Ethernet header is removed and a lookup is performed based on the IP DA in the associated IP-VRF for that tenant. The IP lookup yields the MAC address of the destination TS (TS4) and the access-facing IRB interface over which the packet needs to be sent.

- The IP packet is encapsulated with an Ethernet header with the MAC SA set to that of the access-facing IRB interface of the egress NVE (NVE2) and the MAC DA set to the MAC address of the destination TS (TS4). The packet is sent to the corresponding MAC-VRF and, after a lookup of the MAC DA, is forwarded to the destination TS (TS4) over the corresponding interface.

6 BGP Encoding

A new BGP attribute with the following encoding is introduced.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Tunnel Type (2 Octets)     |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Addr len    |         Address (IPv4, MAC, or IPv6)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Tunnel Type (2 octets): identifies the type of tunneling technology being signaled. This document defines the following types:

- VXLAN: Tunnel Type = 8
- NVGRE: Tunnel Type = 9
- GTP: Tunnel Type = 10

Unknown types MUST be ignored and skipped upon receipt.

Length (2 octets): the total number of octets of the value field.

Address Length - Addr len (1 octet): length of the Address field in octets. Set to 4 for an IPv4 address, 6 for a MAC address, and 16 for an IPv6 address.

7 VM Mobility

7.1 VM Mobility & Optimum Forwarding for VM's Outbound Traffic

Optimum forwarding for the VM's outbound traffic, upon VM mobility, can be achieved using either the anycast default Gateway MAC and IP addresses, or the address aliasing discussed in [DC-MOBILITY].

7.2 VM Mobility & Optimum Forwarding for VM's Inbound Traffic

For optimum forwarding of the VM's inbound traffic, upon VM mobility, all the NVEs and/or IP-VPN PEs need to know the up-to-date location of the VM. Two scenarios must be considered, as discussed next.

In what follows, we use the following terminology:

- source NVE refers to the NVE behind which the VM used to reside prior to the VM mobility event.

- target NVE refers to the new NVE behind which the VM has moved after the mobility event.

7.2.1 Mobility without Route Aggregation

In this scenario, when a target NVE detects that a MAC mobility event has occurred, it initiates the MAC mobility handshake in BGP as specified in [EVPN]. The WAN Gateways, acting as ASBRs in this case, re-advertise the MAC route of the target NVE with the MAC Mobility extended community attribute unmodified. Because the WAN Gateway for a given data center re-advertises BGP routes received from the WAN into the data center, the source NVE will receive the MAC Advertisement route of the target NVE (with the next hop attribute adjusted depending on which inter-AS option is employed). The source NVE will then withdraw its original MAC Advertisement route as a result of evaluating the Sequence Number field of the MAC Mobility extended community in the received MAC Advertisement route. This is per the procedures already defined in [EVPN].

7.2.2 Mobility with Route Aggregation

This section will be completed in the next revision.

8 Acknowledgements

The authors would like to thank Sami Boutros for his valuable comments.

9 Security Considerations

10 IANA Considerations

11 References

11.1 Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

11.2 Informative References

[EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN", draft-ietf-l2vpn-evpn-04.txt, work in progress, July, 2014.

[EVPN-IPVPN-INTEROP] Sajassi et al., "EVPN Seamless Interoperability with IP-VPN", draft-sajassi-l2vpn-evpn-ipvpn-interop-01, work in progress, October, 2012.

[DC-MOBILITY] Aggarwal et al., "Data Center Mobility based on BGP/MPLS, IP Routing and NHRP", draft-raggarwa-data-center-mobility-05.txt, work in progress, June, 2013.

Authors' Addresses

Ali Sajassi
Cisco
Email: sajassi@cisco.com

Samer Salam
Cisco
Email: ssalam@cisco.com

Yakov Rekhter
Juniper Networks
Email: yakov@juniper.net

John E. Drake
Juniper Networks
Email: jdrake@juniper.net

Lucy Yong
Huawei Technologies
Email: lucy.yong@huawei.com

Linda Dunbar
Huawei Technologies
Email: linda.dunbar@huawei.com

Wim Henderickx
Alcatel-Lucent
Email: wim.henderickx@alcatel-lucent.com

Florin Balus
Alcatel-Lucent
Email: Florin.Balus@alcatel-lucent.com

Samir Thoria
Cisco
Email: sthoria@cisco.com