draft-boutros-bess-vxlan-evpn-00.txt vs. draft-boutros-bess-vxlan-evpn-01.txt
INTERNET-DRAFT                                           Sami Boutros
Intended Status: Informational                                 VMware
                                                          Ali Sajassi
                                                          Samer Salam
                                                           Dennis Cai
                                                         Samir Thoria
                                                        Cisco Systems
                                                         Tapraj Singh
                                                           John Drake
                                                     Juniper Networks
                                                        Jeff Tantsura
                                                             Ericsson
Expires: September 17, 2016                            March 16, 2016
(-00: Sami Boutros listed under Cisco Systems; Expires: January 5,
2016; dated July 4, 2015)
VXLAN DCI Using EVPN
draft-boutros-bess-vxlan-evpn-01.txt

Abstract
This document describes how Ethernet VPN (E-VPN) technology can be
used to interconnect VXLAN or NVGRE networks over an MPLS/IP network,
in order to provide intra-subnet connectivity at Layer 2 and
control-plane separation among the interconnected VXLAN or NVGRE
networks. In this document, the learning of host MAC addresses in the
VXLAN or NVGRE networks is limited to data-plane learning.
Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
skipping to change at page 2, line 5
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice

Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. (-00: Copyright (c) 2015.)

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 36
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Control Plane Separation among VXLAN/NVGRE Networks . . . . 4
2.2 All-Active Multi-homing . . . . . . . . . . . . . . . . . . 5
2.3 Layer 2 Extension of VNIs/VSIDs over the MPLS/IP Network . . 5
2.4 Support for Integrated Routing and Bridging (IRB) . . . . . 5
3. Solution Overview . . . . . . . . . . . . . . . . . . . . . . . 5
3.1. Redundancy and All-Active Multi-homing . . . . . . . . . . 6
4. EVPN Routes . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.1. BGP MAC Advertisement Route . . . . . . . . . . . . . . . 7
4.2. Ethernet Auto-Discovery Route . . . . . . . . . . . . . . 8
4.3. Per VPN Route Targets . . . . . . . . . . . . . . . . . . 8
4.4 Inclusive Multicast Route . . . . . . . . . . . . . . . . . 8
4.5. Unicast Forwarding . . . . . . . . . . . . . . . . . . . . 8
4.6. Handling Multicast . . . . . . . . . . . . . . . . . . . . 9
4.6.2. Multicast Stitching with Per-VNI Load Balancing . . . . 9
4.6.2.1 PIM SM operation . . . . . . . . . . . . . . . . . . 10
5. NVGRE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
6. Use Cases Overview . . . . . . . . . . . . . . . . . . . . . . 11
6.1. Homogeneous Network DCI interconnect Use cases . . . . . . 12
6.1.1. VNI Base Mode EVPN Service Use Case . . . . . . . . . . 12
6.1.2. VNI Bundle Service Use Case Scenario . . . . . . . . . 13
6.1.3. VNI Translation Use Case . . . . . . . . . . . . . . 13
6.2. Heterogeneous Network DCI Use Cases Scenarios . . . . . . . 13
6.2.1. VXLAN VLAN Interworking Over EVPN Use Case Scenario . . 13
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14
8. Security Considerations . . . . . . . . . . . . . . . . . . . 14
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 14
10.1 Normative References . . . . . . . . . . . . . . . . . . . 14
10.2 Informative References . . . . . . . . . . . . . . . . . . 14
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15
1 Introduction

[EVPN] introduces a solution for multipoint L2VPN services, with
advanced multi-homing capabilities, using a BGP control plane over the
core MPLS/IP network. [VXLAN] defines a tunneling scheme to overlay
Layer 2 networks on top of Layer 3 networks. [VXLAN] allows for
optimal forwarding of Ethernet frames with support for multipathing
of unicast and multicast traffic. VXLAN uses UDP/IP encapsulation for
skipping to change at page 4, line 40
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

LDP:   Label Distribution Protocol
MAC:   Media Access Control
MPLS:  Multiprotocol Label Switching
OAM:   Operations, Administration and Maintenance
PE:    Provider Edge Node
PW:    PseudoWire
TLV:   Type, Length, and Value
VPLS:  Virtual Private LAN Services
VXLAN: Virtual eXtensible Local Area Network
VTEP:  VXLAN Tunnel End Point
VNI:   VXLAN Network Identifier (or VXLAN Segment ID)
ToR:   Top of Rack switch
LACP:  Link Aggregation Control Protocol
2. Requirements

2.1. Control Plane Separation among VXLAN/NVGRE Networks

It is required to maintain control-plane separation for the underlay
networks (e.g., among the various VXLAN/NVGRE networks) being
interconnected over the MPLS/IP network. This ensures the following
characteristics:
skipping to change at page 6, line 23
+-----+  |         |---|PE1 |        |PE3 |--|         |  +-----+
|VTEP1|--|         |   +----+        +----+  |         |--|VTEP3|
+-----+  |  VXLAN  |   +----+        +----+  |  VXLAN  |  +-----+
+-----+  |         |---|PE2 |        |PE4 |--|         |  +-----+
|VTEP2|--|         |   +----+Backbone+----+  |         |--|VTEP4|
+-----+  +---------+     +--------------+    +---------+  +-----+
|<--- Underlay IGP ---->|<-Overlay BGP->|<--- Underlay IGP --->| CP
|<----- VXLAN --------->|<EVPN/PBB-EVPN>|<------ VXLAN ------->| DP
                        |<----MPLS----->|

Legend: CP = Control Plane View   DP = Data Plane View

Figure 1: Interconnecting VXLAN Networks with VXLAN-EVPN
3.1. Redundancy and All-Active Multi-homing

When a VXLAN network is multi-homed to two or more PEs, and provided
that these PEs have the same IGP distance to a given NVE, the
solution MUST support load-balancing of traffic between the NVE and
skipping to change at page 7, line 22
a) It prevents any flip-flopping in the forwarding tables for the
   MAC-to-VTEP associations.

b) It enables load-balancing via ECMP for DCI traffic among the
   multi-homed PEs.
In the baseline [EVPN] draft, all-active multi-homing is described
for a multi-homed device (MHD) using [LACP], and single-active
multi-homing is described for a multi-homed network (MHN) using
[802.1Q]. In this draft, all-active multi-homing is described for a
VXLAN MHN. This requires some changes to the filtering of BUM
traffic, which are described in detail in the multicast section
(Section 4.6.2).
Text in -01:

The filtering used for BUM traffic in all-active multi-homing in
[EVPN] is asymmetric: BUM traffic from the MPLS/IP network toward the
multi-homed site is filtered on the non-DF PE(s) and passes only
through the DF PE, while BUM traffic originating from the multi-homed
site is not filtered. No site-to-core filtering is needed there
because of Ethernet link aggregation: the MHD hashes a given BUM flow
onto a single link. In this solution, however, BUM traffic can arrive
at every multi-homed PE in both the core-to-site and site-to-core
directions, so the filtering needs to be symmetric, just like the
filtering of BUM traffic for single-active multi-homing (on a per
service instance/VLAN basis).

Text in -00:

The filtering used for BUM traffic in all-active multi-homing in
[EVPN] is asymmetric. BUM traffic from the MPLS/IP network to the
multi-homed site is dropped by the non-DF PE(s) and only sent to the
multi-homed site by the DF, while BUM traffic from the multi-homed
site to the MPLS/IP network may be sent by any PE to the MPLS/IP
network. This is because [EVPN] assumes all-active multi-homing is
used in conjunction with MHD, in which the CE is attached to multiple
PEs via a LAG and hashes the frames of a given BUM flow to the same
PE, ensuring that those frames are only sent to the MPLS/IP network
by one PE.

However, this document assumes that all-active multi-homing is used
in conjunction with MHN, which means that the frames of a given BUM
flow are sent to all PEs attached to the multi-homed site. In order
to avoid duplication, only the DF can send BUM traffic from the
multi-homed site to the MPLS/IP network; the non-DF PE(s) MUST drop
BUM traffic received from the multi-homed site.

If PIM Bidir is used within the multi-homed site, BUM traffic from
the MPLS/IP network to the multi-homed site is dropped by the non-DF
PE(s) and only sent to the multi-homed site by the DF. If PIM SM is
used within the multi-homed site, BUM traffic from the MPLS/IP
network to the multi-homed site is sent to the multi-homed site by
multiple PEs attached to it.
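The direction-dependent BUM filtering discussed in this section can be summarized as a small decision function. The Python sketch below is illustrative and not part of the specification. The names `Direction` and `forward_bum`, the string values "mhd"/"mhn" and "bidir"/"sm", and the treatment of the PIM SM case are all assumptions, modeled on the core-to-site behavior described for PIM SM in Section 4.6.2.1.

```python
from enum import Enum


class Direction(Enum):
    SITE_TO_CORE = "site-to-core"
    CORE_TO_SITE = "core-to-site"


def forward_bum(direction: Direction, is_df: bool, attachment: str,
                site_pim: str = "bidir") -> bool:
    """Hypothetical check: does this PE forward a BUM frame of a given VNI?

    attachment: "mhd" (all-active multi-homed device on a LAG, as in [EVPN])
                or "mhn" (all-active multi-homed VXLAN network, this document).
    site_pim:   multicast mode used in the VXLAN underlay: "bidir" or "sm".
    """
    if attachment not in ("mhd", "mhn"):
        raise ValueError("attachment must be 'mhd' or 'mhn'")
    if direction is Direction.SITE_TO_CORE:
        # An MHD hashes each BUM flow onto one link, so any PE may forward it;
        # an MHN delivers the flow to every PE, so only the DF forwards it.
        return True if attachment == "mhd" else is_df
    # Core-to-site: normally only the DF forwards. With PIM SM in the VXLAN
    # network, every PE forwards and the routers' RPF checks keep one copy.
    if attachment == "mhn" and site_pim == "sm":
        return True
    return is_df
```

In this sketch, the only asymmetric case left is the baseline MHD case. For a VXLAN MHN using PIM Bidir, the same DF test applies in both directions.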
4. EVPN Routes

This solution leverages the same BGP routes and attributes defined in
[EVPN], adapted as follows:

4.1. BGP MAC Advertisement Route

This route and its associated modes are used to distribute the
customer MAC addresses learned in the data plane over the VXLAN
tunnel in the case of EVPN, or the provider Backbone MAC addresses in
the case of PBB-EVPN.
skipping to change at page 8, line 49
- Ethernet Tag ID is set to zero for VNI-based mode and to the VNI
  for VNI-aware bundle mode.

- Originating Router's IP Address is set to one of the PE's IP
  addresses.

All other fields are set as defined in [EVPN].

See Section 4.6, "Handling Multicast".
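The Python sketch below shows how a PE might fill in the two Inclusive Multicast route fields listed above, depending on the service mode. The names `ServiceMode`, `InclusiveMulticastFields`, and `imet_fields` are hypothetical. Only IPv4 originating addresses are handled, and every other route field is left as defined in [EVPN].

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import IPv4Address


class ServiceMode(Enum):
    VNI_BASED = "vni-based"                # one VNI per EVI
    VNI_AWARE_BUNDLE = "vni-aware-bundle"  # several VNIs per EVI


@dataclass(frozen=True)
class InclusiveMulticastFields:
    """Only the two fields discussed above; RD, route targets and the
    PMSI tunnel attribute are set as in [EVPN] and omitted here."""
    ethernet_tag_id: int
    originating_router_ip: IPv4Address


def imet_fields(mode: ServiceMode, vni: int, pe_ip: str) -> InclusiveMulticastFields:
    if not 0 <= vni < 1 << 24:
        raise ValueError("a VNI is a 24-bit value")
    # Zero for VNI-based mode, the VNI itself for VNI-aware bundle mode.
    tag = 0 if mode is ServiceMode.VNI_BASED else vni
    # Any one of the PE's IP addresses may serve as the originating router IP.
    return InclusiveMulticastFields(tag, IPv4Address(pe_ip))
```

For example, `imet_fields(ServiceMode.VNI_AWARE_BUNDLE, 5010, "192.0.2.1")` carries an Ethernet Tag ID of 5010, while the VNI-based mode yields 0 for the same VNI.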
4.5. Unicast Forwarding

Host MAC addresses are learned in the data plane from the VXLAN
network and associated with the corresponding VTEP, which is
identified by the source IP address. Across the MPLS/IP core, host
MAC addresses are learned in the control plane if EVPN is used, or in
the data plane if PBB-EVPN is used. When host MAC addresses are
learned in the data plane over the MPLS/IP core (in the case of
PBB-EVPN), they are associated with their corresponding B-MAC
addresses.
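The two learning paths above can be pictured as one table with two kinds of entries: data-plane learning from the VXLAN side and control-plane learning from EVPN over the core. The Python sketch below is hypothetical and covers the EVPN case only. The `MacTable` class is illustrative, and entries are keyed by (VNI, MAC), which assumes one bridge domain per VNI. The PBB-EVPN association of host MACs with B-MACs is not modeled.

```python
from ipaddress import IPv4Address


class MacTable:
    """Hypothetical PE MAC table, keyed by (VNI, MAC) for simplicity."""

    def __init__(self):
        self._entries = {}

    def learn_from_vxlan(self, vni, src_mac, outer_src_ip):
        # Data-plane learning: the host MAC is bound to the VTEP identified
        # by the outer source IP address of the received VXLAN packet.
        self._entries[(vni, src_mac.lower())] = ("vxlan", IPv4Address(outer_src_ip))

    def learn_from_evpn(self, vni, mac, remote_pe_ip):
        # Control-plane learning over the MPLS/IP core (EVPN case): the MAC
        # is bound to the remote PE that advertised it in a MAC route.
        self._entries[(vni, mac.lower())] = ("mpls", IPv4Address(remote_pe_ip))

    def lookup(self, vni, dst_mac):
        # ("vxlan", vtep_ip), ("mpls", pe_ip), or None for an unknown MAC.
        return self._entries.get((vni, dst_mac.lower()))
```

A `lookup` result of `("vxlan", ...)` leads to VXLAN encapsulation toward a local VTEP, and `("mpls", ...)` leads to MPLS encapsulation toward a remote PE.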
L2 unicast traffic destined to the VXLAN network is encapsulated with
the IP/UDP header and the VXLAN header carrying the corresponding
customer bridge VNI. L2 unicast traffic destined to the MPLS/IP
network is encapsulated with the MPLS label.
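As a concrete illustration of the VXLAN side, the sketch below builds and parses the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit set, reserved bits, a 24-bit VNI, and a final reserved byte. It is a minimal Python sketch under these assumptions: the function names are hypothetical, the outer IP/UDP headers are added by the sending stack, and the MPLS encapsulation toward the core is not shown.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN (RFC 7348)
VXLAN_FLAG_I = 0x08    # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("a VNI is a 24-bit value")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    # Outer IP/UDP (destination port VXLAN_UDP_PORT) is assumed to be added
    # by the sending stack; only the VXLAN part is built here.
    return vxlan_header(vni) + inner_frame


def vxlan_vni(packet: bytes) -> int:
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set")
    return vni_word >> 8
```

For example, `vxlan_header(5010)` produces the bytes `08 00 00 00 00 13 92 00`.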
4.6. Handling Multicast

Each VXLAN network independently builds its P2MP or MP2MP shared
multicast trees. A P2MP or MP2MP tree is built for one or more VNIs
local to the VXLAN network.

In the MPLS/IP network, multiple options are available for the
delivery of multicast traffic:

- Ingress replication
- LSM with Inclusive trees
- LSM with Aggregate Inclusive trees
- LSM with Selective trees
- LSM with Aggregate Selective trees
skipping to change at page 9, line 50
2. Avoiding Forwarding Loops: In the case of VXLAN network
multi-homing, the solution must ensure that a multicast frame
forwarded by a given PE to the MPLS core is not forwarded back by
another PE (in the same VXLAN network) to the VXLAN network of origin.
The same applies to traffic in the core-to-site direction.

The following approach of per-VNI load balancing can guarantee proper
stitching that meets the above requirements.
4.6.2. Multicast Stitching with Per-VNI Load Balancing

To set up multicast trees in the VXLAN network for DC applications,
PIM Bidir can be of special interest because it significantly reduces
the amount of multicast state in the network. Furthermore, it avoids
any special processing for the RPF check, since PIM Bidir does not
require any RPF check. The RP for PIM Bidir can be any of the spine
nodes. Multiple trees can be built (e.g., one tree rooted at each
spine node) for efficient load-balancing within the network. All PEs
participating in the multi-homing of the VXLAN network join all the
trees. Therefore, for a given tree, all PEs receive BUM traffic. DF
election procedures of [EVPN] are used to ensure that only traffic
skipping to change at page 10, line 39
traffic received from the VXLAN/NVGRE network and the host MAC SA is
learned against the source VTEP address.

The PE nodes connected to a multi-homed VXLAN network perform BGP DF
election to decide which PE node is responsible for forwarding
multicast traffic associated with a given VNI. A PE forwards
multicast traffic for a given VNI only when it is the DF for that
VNI. This forwarding rule applies in both the site-to-core and the
core-to-site directions.
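This document reuses the DF election of [EVPN] without restating it. The sketch below assumes the default service-carving procedure specified for EVPN in RFC 7432: the PEs of the redundancy group are ordered by increasing IP address, and the PE at index V mod N is the DF for Ethernet tag V. It further assumes that the VNI takes the place of V. The function names are hypothetical, and all PE addresses are assumed to belong to the same IP family.

```python
from ipaddress import ip_address
from typing import Sequence


def df_for_vni(vni: int, pe_addresses: Sequence[str]) -> str:
    """Service carving per VNI: order PEs by IP address, DF index = VNI mod N."""
    if not pe_addresses:
        raise ValueError("no PEs in the redundancy group")
    ordered = sorted(pe_addresses, key=ip_address)  # numeric, not string, order
    return ordered[vni % len(ordered)]


def forwards_bum(local_pe: str, vni: int, pe_addresses: Sequence[str]) -> bool:
    # Same rule for site-to-core and core-to-site BUM traffic of this VNI.
    return df_for_vni(vni, pe_addresses) == local_pe
```

With PEs 192.0.2.1 and 192.0.2.2, VNI 5010 is carved to 192.0.2.1 and VNI 5011 to 192.0.2.2. The two PEs therefore split the VNIs between them, which is the per-VNI load balancing this section relies on.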
4.6.2.1 PIM SM operation

Text in -01:

With PIM SM, multicast traffic from core to site could be dropped,
since a transit router may decide that the RPF path toward the
anycast source address leads to a PE node that is not the DF.
Therefore the PE nodes, whether DF or not, have to forward multicast
traffic from core to site.

The operation works as follows:

Initially, the PE nodes connected to the multi-homed VXLAN network,
as well as the VTEPs, join toward the RP for the multicast group of a
particular VXLAN segment (VNI).

When BUM traffic needs to be flooded from core to site, all the PE
nodes connected to the multi-homed VXLAN network send PIM register
messages to the RP. The multicast flow is identified as (anycast
address, group) in the register message, and the source address of
the PIM-SM register message should be a unique address on the PE
node, not the anycast address.

Upon receiving the register message, the RP sends a join for
(anycast address, group), routed toward the closest PE, which could
be either the DF or a non-DF. This PE switches to sending traffic
natively. Upon receiving the native traffic, the RP sends
register-stop messages to the other PEs that keep sending register
messages, given that only one PE gets the (anycast address, group)
join.

When VTEPs receive traffic from the RP, they send an (anycast
address, group) join, routed toward the PE closest to each VTEP. This
starts native forwarding on multiple PE nodes connected to the VXLAN
network, but each VTEP or transit router accepts multicast traffic
from only one of the multi-homed PE nodes.

If PIM state times out because multicast traffic stops for a period
of time, the next flooded packet triggers the above process again.

Note that before the RP receives the first natively sent packet from
one particular PE node connected to the multi-homed VXLAN network,
all packets encapsulated in the register messages from all PEs are
forwarded by the RP, causing duplication.

A possible optimization is for all PE nodes connected to the
multi-homed VXLAN network to send null-registers periodically to
maintain the PIM state at the RP, instead of encapsulating flooded
packets in register messages.

The site-to-core operations for flooding BUM traffic are still
subject to DF election per VNI, as described above.

Text in -00:

In some situations, it may be desirable to use PIM SM in a VXLAN
network's underlay network. However, because all of the PEs attached
to a multi-homed site use the same IP anycast address, if only one PE
(the DF) sent BUM traffic from the MPLS/IP network to the multi-homed
site, the RPF check in PIM SM would cause any router in the VXLAN
network's underlay network whose shortest path to that IP anycast
address was to a different PE to drop this BUM traffic. Conversely,
if BUM traffic from the MPLS/IP network to the multi-homed site is
sent to the multi-homed site by multiple PEs attached to it, the RPF
check in PIM SM would cause any router in the VXLAN network's
underlay network to forward only one copy of a given BUM packet.

The following is a description of operations with respect to a given
VNI in the VXLAN network.

- All PEs attached to a multi-homed site join toward the RP for the
  multicast group for that VNI.

- When the first BUM packet is received from the MPLS/IP network, all
  PEs attached to the multi-homed site send PIM register messages to
  the RP. The multicast flow is identified as (anycast address,
  group) in the register message, and the source address for the
  PIM-SM register message is a unique address on the PE, typically
  the sending interface address.

- Upon receiving a register message, the RP sends a join for the
  (anycast address, group), routed toward the closest PE, and that PE
  switches to sending BUM traffic natively. Upon receiving the native
  BUM traffic, the RP sends register-stop messages to any PEs that
  continue sending register messages (because only one PE gets the
  (anycast address, group) join and switches to native forwarding).

- After VTEPs receive traffic from the RP, they send an (anycast
  address, group) join, routed toward the closest PE (with respect to
  each VTEP). This may start native forwarding on multiple PEs, but
  each VTEP or router in the VXLAN network's underlay network only
  accepts BUM traffic from one of the PEs attached to the multi-homed
  site.

- If BUM traffic stops for a long time, the relevant PIM state times
  out, and the next BUM packet for that multicast group triggers the
  above steps again.

Note that before the RP receives the first natively sent packet from
one particular PE, all packets encapsulated in the register messages
from all PEs are forwarded by the RP, causing duplication. This
should be transient and stops as soon as the first native packet is
received by the RP. If the transient duplication is a concern, then
null-register SHOULD be used at the beginning (instead of
encapsulating BUM traffic in register messages), but that leads to
transient loss of initial packets.

To avoid packet loss and duplication, the PEs attached to the
multi-homed site SHOULD send null-register periodically as soon as
initial provisioning is completed, to pre-build and maintain the
relevant PIM state.
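The core-to-site register/join sequence described in this section can be traced with a toy model. This Python sketch is hypothetical and deliberately simplified. IGP costs are given as plain dictionaries, and ties are broken by PE name rather than by real RPF tie-breaking rules. Timers, null-registers, and the transient duplication before Register-Stop are not modeled.

```python
from typing import Dict, List, Tuple


def pim_sm_core_to_site(
    rp_cost: Dict[str, int],
    vtep_cost: Dict[str, Dict[str, int]],
) -> Tuple[str, List[str], Dict[str, str]]:
    """Toy model of the (anycast address, group) register/join sequence.

    rp_cost:   IGP cost from the RP to each multi-homed PE.
    vtep_cost: for each VTEP, IGP cost from that VTEP to each multi-homed PE.
    """
    def closest(costs: Dict[str, int]) -> str:
        return min(costs, key=lambda pe: (costs[pe], pe))

    # Every PE, DF or not, sends a PIM Register (sourced from its own unique
    # address) for the flow (anycast address, group) to the RP.
    registering = set(rp_cost)
    # The RP's (anycast address, group) join is routed to the closest PE,
    # which then switches to native forwarding toward the RP.
    native_to_rp = closest(rp_cost)
    # Once native traffic arrives, the other PEs receive Register-Stop.
    register_stop = sorted(registering - {native_to_rp})
    # Each VTEP joins toward its own closest PE and, because of the RPF
    # check, accepts the flow only from that PE.
    accepts_from = {vtep: closest(costs) for vtep, costs in vtep_cost.items()}
    return native_to_rp, register_stop, accepts_from
```

Consider RP costs {PE1: 10, PE2: 20}, with VTEP1 closer to PE1 and VTEP2 closer to PE2. PE1 forwards natively to the RP and PE2 receives Register-Stop. Both PEs end up forwarding natively toward the VTEPs, yet each VTEP accepts the flow from only one of them.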
5. NVGRE
As with VXLAN, all of the above specification applies to NVGRE,
replacing the VNI with the Virtual Subnet Identifier (VSID) and the
VTEP with the NVGRE Endpoint.
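For NVGRE, the VSID takes the place of the VNI. As an illustration, the sketch below builds the 8-byte NVGRE header laid out in RFC 7637. The header is GRE with the Key Present bit set and protocol type 0x6558 (Transparent Ethernet Bridging), and its Key field carries the 24-bit VSID followed by an 8-bit FlowID. The function names are hypothetical, and the outer IP header is not shown.

```python
import struct

GRE_FLAG_K = 0x2000     # Key Present bit in the first 16 bits of the GRE header
ETHERTYPE_TEB = 0x6558  # Transparent Ethernet Bridging


def nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """8-byte NVGRE header: GRE with K=1, protocol 0x6558, Key = VSID(24) | FlowID(8)."""
    if not 0 <= vsid < 1 << 24:
        raise ValueError("a VSID is a 24-bit value")
    if not 0 <= flow_id < 1 << 8:
        raise ValueError("FlowID is an 8-bit value")
    return struct.pack("!HHI", GRE_FLAG_K, ETHERTYPE_TEB, (vsid << 8) | flow_id)


def nvgre_vsid(header: bytes) -> int:
    flags, proto, key = struct.unpack("!HHI", header[:8])
    if not flags & GRE_FLAG_K or proto != ETHERTYPE_TEB:
        raise ValueError("not an NVGRE header")
    return key >> 8
```

`nvgre_header(5010)` produces `20 00 65 58 00 13 92 00`. The VSID sits at the same byte offsets (4 to 6) as the VNI in a VXLAN header, which is one reason the procedures carry over with only the identifier renamed.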
6. Use Cases Overview

6.1. Homogeneous Network DCI interconnect Use cases

This covers the DCI interconnection of two or more VXLAN-based data
centers over an MPLS-enabled EVPN core.

6.1.1. VNI Base Mode EVPN Service Use Case

This use case handles the EVPN service where there is a one-to-one
mapping between a VNI and an EVI. The Ethernet Tag ID of the EVPN BGP
NLRI should be set to zero. The bridge domain (BD) ID can be derived
from the RT associated with the EVI/VNI.
+---+                                                       +---+
| H1|  +---+ +-----+ +---+ +---------+ +---+ +-----+ +---+  | H3|
| M1|--|   |-|     |-|PE1|-|         |-|PE3|-|     |-|   |--| M3|
+---+  |NVE| |     | +---+ |MPLS Core| +---+ |     | |NVE|  +---+
+---+  | 1 | |VXLAN|       | (EVPN)  |       |VXLAN| | 2 |  +---+
| H2|  |   | |     | +---+ |         | +---+ |     | |   |  | H4|
| M2|--|   |-|     |-|PE2|-|         |-|PE4|-|     |-|   |--| M4|
+---+  +---+ +-----+ +---+ +---------+ +---+ +-----+ +---+  +---+

+--------+------+--------+------+--------+------+--------+--------+
|Original|VXLAN |Original|MPLS  |Original|VXLAN |Original|Original|
|Ethernet|Header|Ethernet|Header|Ethernet|Header|Ethernet|Ethernet|
|Frame   |      |Frame   |      |Frame   |      |Frame   |Frame   |
+--------+------+--------+------+--------+------+--------+--------+
|<---Data Center Site1-->|<--EVPN Core-->|<---Data Center Site2-->|

Figure 2: VNI Base Service Packet Flow

VNI base service (one VNI mapped to one EVI).
Hosts H1, H2, H3 and H4 are hosts and there associated MAC addresses Hosts H1, H2, H3 and H4 are hosts and there associated MAC addresses
are M1, M2, M3 and M4. PE1, PE2, PE3 and PE4 are the VXLAN-EVPN are M1, M2, M3 and M4. PE1, PE2, PE3 and PE4 are the VXLAN-EVPN
gateways. NVE1 and NVE2 are the originators of the VXLAN based gateways. NVE1 and NVE2 are the originators of the VXLAN based
network. network.
When host H1 in Data Center Site1 communicates with H3 in Data Center When host H1 in Data Center Site1 communicates with H3 in Data Center
Site2, H1 forms a Layer 2 packet with source IP address IP1 and Site2, H1 forms a Layer 2 packet with source IP address IP1 and
skipping to change at page 13, line 23 skipping to change at page 13, line 7
MAC lookup, the frame needs to be sent to the VXLAN network. VXLAN MAC lookup, the frame needs to be sent to the VXLAN network. VXLAN
encapsulation is added to the original Ethernet frame, and the frame is encapsulation is added to the original Ethernet frame, and the frame is
sent over the VXLAN tunnel. The frame arrives at PE1. PE1 (i.e., the sent over the VXLAN tunnel. The frame arrives at PE1. PE1 (i.e., the
VXLAN gateway) identifies that the frame is a VXLAN frame. The VXLAN VXLAN gateway) identifies that the frame is a VXLAN frame. The VXLAN
header is decapsulated, and a destination MAC lookup is done in the header is decapsulated, and a destination MAC lookup is done in the
bridge-domain table of the EVI. The destination MAC lookup resolves to bridge-domain table of the EVI. The destination MAC lookup resolves to
the EVPN unicast next hop (NH). This NH is used to identify the labels the EVPN unicast next hop (NH). This NH is used to identify the labels
(tunnel label and service label) to be pushed for transport over the (tunnel label and service label) to be pushed for transport over the
EVPN core. Similar processing is done on the other side of the DCI. EVPN core. Similar processing is done on the other side of the DCI.
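The PE1 processing steps just described can be sketched in Python. This is a minimal, non-normative sketch: it assumes the input starts at the 8-byte VXLAN header of RFC 7348 (outer IP/UDP already removed), that the VNI base service maps each VNI to exactly one EVI, and that the MPLS label stack entries use a TTL of 64. The names `EvpnNextHop`, `vni_to_evi`, `bridge_tables` and `pe_ingress` are hypothetical and are not defined by this document.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid (RFC 7348)


@dataclass(frozen=True)
class EvpnNextHop:
    """Hypothetical EVPN unicast next hop learned from BGP."""
    tunnel_label: int   # transport label toward the remote PE
    service_label: int  # EVPN service label advertised by the remote PE


def decap_vxlan(packet: bytes) -> tuple[int, bytes]:
    """Strip the 8-byte VXLAN header; return (VNI, inner Ethernet frame)."""
    if len(packet) < 8:
        raise ValueError("truncated VXLAN header")
    flags, _reserved, word = struct.unpack("!B3sI", packet[:8])
    if not flags & VXLAN_FLAG_I:
        raise ValueError("VNI flag not set")
    return word >> 8, packet[8:]  # VNI is the upper 24 bits of the last word


def mpls_entry(label: int, bottom: bool, ttl: int = 64) -> bytes:
    """Encode one MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    return struct.pack("!I", (label << 12) | (int(bottom) << 8) | ttl)


def pe_ingress(packet: bytes,
               vni_to_evi: dict[int, str],
               bridge_tables: dict[str, dict[bytes, EvpnNextHop]]) -> bytes:
    """VXLAN -> EVPN/MPLS forwarding at a gateway PE (VNI base service)."""
    vni, frame = decap_vxlan(packet)
    evi = vni_to_evi[vni]                # one VNI maps to one EVI
    nh = bridge_tables[evi][frame[:6]]   # destination MAC lookup in the EVI
    # Tunnel label on top, service label at the bottom of the stack.
    return (mpls_entry(nh.tunnel_label, bottom=False)
            + mpls_entry(nh.service_label, bottom=True)
            + frame)
```

In this sketch a destination MAC miss raises `KeyError`; a real gateway would instead apply its unknown-unicast handling, which is not modeled here.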
7.1.2. VNI Bundle Service Use Case Scenario 6.1.2. VNI Bundle Service Use Case Scenario
In the case of the VNI-aware bundle service mode, multiple VNIs In the case of the VNI-aware bundle service mode, multiple VNIs
are mapped to one EVI. The Ethernet Tag ID must be set to the VNI ID are mapped to one EVI. The Ethernet Tag ID must be set to the VNI ID
in the EVPN BGP NLRIs. MPLS label allocation in this use case in the EVPN BGP NLRIs. MPLS label allocation in this use case
scenario can be done either per EVI or per (EVI, VNI ID) pair. If MPLS scenario can be done either per EVI or per (EVI, VNI ID) pair. If MPLS
labels are allocated per EVI, then in the data path a VLAN tag needs labels are allocated per EVI, then in the data path a VLAN tag needs
to be pushed to identify the bridge domain at the egress PE, so that to be pushed to identify the bridge domain at the egress PE, so that
the destination MAC address lookup can be done in that bridge domain. the destination MAC address lookup can be done in that bridge domain.
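The two MPLS label allocation options for VNI bundle service, and their consequence for the data path, can be illustrated with the following Python sketch. It is an illustration under stated assumptions, not a specified algorithm: `LabelAllocator` and `mac_route` are hypothetical names, the route is reduced to a plain dictionary rather than an encoded EVPN NLRI, and labels are handed out sequentially starting at 16 because MPLS label values 0 to 15 are reserved.

```python
from __future__ import annotations

import itertools


class LabelAllocator:
    """Hypothetical allocator covering both label allocation modes."""

    def __init__(self, per_vni: bool, first_label: int = 16) -> None:
        self.per_vni = per_vni  # True: per (EVI, VNI ID); False: per EVI
        self._next = itertools.count(first_label)  # labels 0-15 are reserved
        self._labels: dict[tuple, int] = {}

    def label_for(self, evi: str, vni: int) -> int:
        key = (evi, vni) if self.per_vni else (evi,)
        if key not in self._labels:
            self._labels[key] = next(self._next)
        return self._labels[key]

    def needs_vlan_tag(self) -> bool:
        # With a per-EVI label, the label alone cannot tell the egress PE
        # which bridge domain (VNI) to use, so a VLAN tag must be pushed.
        return not self.per_vni


def mac_route(evi: str, vni: int, mac: str, alloc: LabelAllocator) -> dict:
    """Simplified MAC advertisement: the Ethernet Tag ID carries the VNI ID."""
    return {"evi": evi, "ethernet_tag_id": vni, "mac": mac,
            "label": alloc.label_for(evi, vni)}
```

With `per_vni=False`, every VNI in the bundle shares one label, which is why `needs_vlan_tag()` returns `True` in that mode.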
7.1.3. VNI Translation Use Case 6.1.3. VNI Translation Use Case
+---+ +---+ +---+ +---+
| H1| +---+ +-------+ +---+ +----------+ +---+ +-------+ +---+ | H3| | H1| +---+ +-------+ +---+ +----------+ +---+ +-------+ +---+ | H3|
| M1|-+ +-+ +-+PE1+-+ +-+PE3+-+ +-+ +-| M3| | M1|-+ +-+ +-+PE1+-+ +-+PE3+-+ +-+ +-| M3|
+---+ | | | | +---+ |MPLS Core | +---+ | | | | +---+ +---+ | | | | +---+ |MPLS Core | +---+ | | | | +---+
+---+ |NVE| | VXLAN | | (EVPN) | | VXLAN | |NVE| +---+ +---+ |NVE| | VXLAN | | (EVPN) | | VXLAN | |NVE| +---+
| H2| | 1 | | | +---+ | | +---+ | | | 2 | | H4| | H2| | 1 | | | +---+ | | +---+ | | | 2 | | H4|
| M2|-+ +-+ +-+PE2+-+ +-+PE4+-+ +-+ +-| M4| | M2|-+ +-+ +-+PE2+-+ +-+PE4+-+ +-+ +-| M4|
+---+ +---+ +-------+ +---+ +----------+ +---+ +-------+ +---+ +---+ +---+ +---+ +-------+ +---+ +----------+ +---+ +-------+ +---+ +---+
|<----VNI ID A--->|<-------EVI-A------->|<----VNI_ID_B--->| |<----VNI ID A--->|<-------EVI-A------->|<----VNI_ID_B--->|
Figure 3 VNI Translation Use Case Scenarios.
Figure 3 VNI Translation Use Case Scenarios.
There are two or more Data Center sites. These Data Center sites There are two or more Data Center sites. These Data Center sites
might use different VNI IDs for the same service. For example, Service A might use different VNI IDs for the same service. For example, Service A
uses "VNI_ID_A" at Data Center Site1 and "VNI_ID_B" for the same service uses "VNI_ID_A" at Data Center Site1 and "VNI_ID_B" for the same service
at Data Center Site2. VNI ID A is terminated at the ingress EVPN PE, and at Data Center Site2. VNI ID A is terminated at the ingress EVPN PE, and
VNI ID B is encapsulated at the egress EVPN PE. VNI ID B is encapsulated at the egress EVPN PE.
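The VNI translation described above can be sketched in Python: the ingress PE terminates the site-local VNI and maps it to the EVI, and the egress PE re-encapsulates the frame with its own site-local VNI for that EVI. The VNI values 1000 and 2000 used in testing and the names `vxlan_encap`, `vxlan_decap` and `translate` are hypothetical; the MPLS leg across the EVPN core and the outer IP/UDP headers are omitted for brevity.

```python
from __future__ import annotations

import struct

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid (RFC 7348)


def vxlan_encap(vni: int, frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header carrying the 24-bit VNI."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3sI", VXLAN_FLAG_I, bytes(3), vni << 8) + frame


def vxlan_decap(packet: bytes) -> tuple[int, bytes]:
    """Return (VNI, inner frame) from a VXLAN-encapsulated packet."""
    flags, _reserved, word = struct.unpack("!B3sI", packet[:8])
    if not flags & VXLAN_FLAG_I:
        raise ValueError("VNI flag not set")
    return word >> 8, packet[8:]


def translate(packet: bytes,
              ingress_vni_to_evi: dict[int, str],
              egress_evi_to_vni: dict[str, int]) -> bytes:
    """Terminate VNI ID A at the ingress PE; encapsulate VNI ID B at egress."""
    vni_a, frame = vxlan_decap(packet)   # ingress PE (e.g. PE1)
    evi = ingress_vni_to_evi[vni_a]      # the core identifies the service by EVI
    vni_b = egress_evi_to_vni[evi]       # egress PE (e.g. PE3) local VNI
    return vxlan_encap(vni_b, frame)
```

In this sketch each PE only needs its own VNI-to-EVI mapping; the value used at the other site never appears in the local tables.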
7.2. Heterogeneous Network DCI Use Case Scenarios 6.2. Heterogeneous Network DCI Use Case Scenarios
Data Center sites are upgraded gradually, so a heterogeneous network Data Center sites are upgraded gradually, so a heterogeneous network
DCI solution is required to support migration from traditional data DCI solution is required to support migration from traditional data
centers to VXLAN-based data centers. For example, Data Center Site1 centers to VXLAN-based data centers. For example, Data Center Site1
is upgraded to VXLAN, but Data Center Sites 2 and 3 are still is upgraded to VXLAN, but Data Center Sites 2 and 3 are still
Layer 2/VLAN-based data centers. For these use cases, VXLAN-VLAN Layer 2/VLAN-based data centers. For these use cases, VXLAN-VLAN
interworking needs to be provided over the EVPN core. interworking needs to be provided over the EVPN core.
7.2.1. VXLAN VLAN Interworking Over EVPN Use Case Scenario 6.2.1. VXLAN VLAN Interworking Over EVPN Use Case Scenario
The new data center site is a VXLAN-based data center site, but the The new data center site is a VXLAN-based data center site, but the
older data center sites are still VLAN-based. older data center sites are still VLAN-based.
+---+ +---+ +---+ +---+
| H1| +---+ +------+ +---+ +---------+ +---+ +-------+ +---+ | H3| | H1| +---+ +------+ +---+ +---------+ +---+ +-------+ +---+ | H3|
| M1|-+ +-+ +-+PE1+-+ +-+PE3+-+ +-+ +-| M3| | M1|-+ +-+ +-+PE1+-+ +-+PE3+-+ +-+ +-| M3|
+---+ | | | | +---+ |MPLS Core| +---+ | | | | +---+ +---+ | | | | +---+ |MPLS Core| +---+ | | | | +---+
+---+ |NVE| |VXLAN | | (EVPN) | | L2 | |NVE| +---+ +---+ |NVE| |VXLAN | | (EVPN) | | L2 | |NVE| +---+
| H2| | 1 | | | +---+ | | +---+ |Network| | 2 | | H4| | H2| | 1 | | | +---+ | | +---+ |Network| | 2 | | H4|
| M2|-+ +-+ +-+PE2+-+ +-+PE4+-+ +-+ +-| M4| | M2|-+ +-+ +-+PE2+-+ +-+PE4+-+ +-+ +-| M4|
+---+ +---+ +------+ +---+ +---------+ +---+ +-------+ +---+ +---+ +---+ +---+ +------+ +---+ +---------+ +---+ +-------+ +---+ +---+
|<--Data Center Site1->|<---EVPN Core--->|<--Data Center Site2-->| |<--Data Center Site1->|<---EVPN Core--->|<--Data Center Site2-->|
+-----+ +------+-----+ +------+------+-----+ +------+-----+ +-----+ +-----+ +------+-----+ +------+------+-----+ +------+-----+ +-----+
|L2 | |VXLAN |L2 | |MPLS |VLAN |L2 | |VLAN |L2 | |L2 | |L2 | |VXLAN |L2 | |MPLS |VLAN |L2 | |VLAN |L2 | |L2 |
|Frame| |Header|Frame| |Header|Header|Frame| |Header|Frame| |Frame| |Frame| |Header|Frame| |Header|Header|Frame| |Header|Frame| |Frame|
+-----+ +------+-----+ +------+------+-----+ +------+-----+ +-----+ +-----+ +------+-----+ +------+------+-----+ +------+-----+ +-----+
Figure 5 VXLAN VLAN interworking over EVPN Use Case Figure 5 VXLAN VLAN interworking over EVPN Use Case.
If a service is represented by VXLAN at one data center site and If a service is represented by VXLAN at one data center site and
by VLAN at other data center sites, then it is recommended to by VLAN at other data center sites, then it is recommended to
model the service as a VNI base EVPN service. The BGP NLRIs will model the service as a VNI base EVPN service. The BGP NLRIs will
always advertise the Ethernet Tag ID as '0' in BGP routes. The always advertise the Ethernet Tag ID as '0' in BGP routes. The
advantage of this approach is that no VNI normalization is required advantage of this approach is that no VNI normalization is required
in the EVPN core. VNI ID A is terminated at the ingress EVPN PE, in the EVPN core. VNI ID A is terminated at the ingress EVPN PE,
and "VLAN ID B" is encapsulated at the egress EVPN PE. and "VLAN ID B" is encapsulated at the egress EVPN PE.
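On the VLAN side, the egress PE behavior described above amounts to inserting an IEEE 802.1Q tag carrying "VLAN ID B" into the frame received from the EVPN core. The following is a minimal Python sketch, assuming the MPLS labels have already been popped and the EVI identified; `push_vlan`, `egress_to_vlan_site` and the VLAN value 200 used in testing are hypothetical, and the priority bits default to 0. Since the routes carry an Ethernet Tag ID of '0', this sketch treats "VLAN ID B" as a purely local mapping from the EVI at the egress PE.

```python
from __future__ import annotations

import struct

TPID_8021Q = 0x8100  # IEEE 802.1Q Tag Protocol Identifier


def push_vlan(frame: bytes, vlan_id: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must be in 0..7")
    tci = (pcp << 13) | vlan_id  # PCP(3) | DEI(1)=0 | VID(12)
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def egress_to_vlan_site(evi: str, frame: bytes,
                        evi_to_vlan: dict[str, int]) -> bytes:
    """Egress PE toward a VLAN-based site: map the EVI to the local VLAN ID."""
    return push_vlan(frame, evi_to_vlan[evi])
```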
8. Acknowledgements 7. Acknowledgements
The authors would like to acknowledge Wen Lin's contributions to this The authors would like to acknowledge Wen Lin's contributions to this
document. document.
8. Security Considerations
9. Security Considerations
There are no additional security aspects that need to be discussed There are no additional security aspects that need to be discussed
here. here.
9. IANA Considerations
10. IANA Considerations 10. References
TBD
11. References
11.1 Normative References 10.1 Normative References
[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
11.2 Informative References 10.2 Informative References
[EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN", draft-ietf- [EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN", RFC 7432,
l2vpn-evpn-00.txt, work in progress, February 2012. February 2015.
[TRILL] Sajassi et al., TRILL-EVPN draft-ietf-l2vpn-trill-evpn-00, [PBB-EVPN] Sajassi et al., "Provider Backbone Bridging Combined with
work in progress, June 2012. Ethernet VPN (PBB-EVPN)", RFC 7623, September 2015.
[VXLAN] Mahalingam, Dutt et al., "A Framework for Overlaying [VXLAN] Mahalingam, Dutt et al., "A Framework for Overlaying
Virtualized Layer 2 Networks over Layer 3 Networks", draft-mahalingam- Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, August
dutt-dcops-vxlan-02.txt, work in progress, August 2012. 2014.
[NVGRE] Sridharan et al., "Network Virtualization using Generic [NVGRE] Sridharan et al., "Network Virtualization using Generic
Routing Encapsulation", draft-sridharan-virtualization-nvgre-01.txt, Routing Encapsulation", RFC 7637, September 2015.
work in progress, July 2012.
Authors' Addresses Authors' Addresses
Sami Boutros Sami Boutros
Cisco Systems VMware, Inc.
EMail: sboutros@vmware.com
EMail: sboutros@cisco.com
Ali Sajassi Ali Sajassi
Cisco Systems Cisco Systems
EMail: sajassi@cisco.com EMail: sajassi@cisco.com
Samer Salam Samer Salam
Cisco Systems Cisco Systems
EMail: ssalam@cisco.com EMail: ssalam@cisco.com
Dennis Cai Dennis Cai
Cisco Systems Cisco Systems
EMail: dcai@cisco.com EMail: dcai@cisco.com
Tapraj Singh Tapraj Singh
Juniper Networks Juniper Networks
Email: tsingh@juniper.net Email: tsingh@juniper.net
John Drake John Drake
Juniper Networks Juniper Networks
Email: jdrake@juniper.net Email: jdrake@juniper.net
Samir Thoria Samir Thoria
Cisco Cisco
 End of changes. 51 change blocks. 
162 lines changed or deleted 137 lines changed or added
