idnits 2.17.1 

draft-ietf-bess-evpn-overlay-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 27, 2017) is 2577 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5512 (Obsoleted by RFC 9012)

  == Outdated reference: A later version (-10) exists of
     draft-ietf-bess-dci-evpn-overlay-04

  == Outdated reference: A later version (-22) exists of
     draft-ietf-idr-tunnel-encaps-03

  == Outdated reference: A later version (-13) exists of
     draft-ietf-nvo3-vxlan-gpe-03


     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	BESS Workgroup                                       A. Sajassi (Editor)
3	INTERNET-DRAFT                                                     Cisco
4	Intended Status: Standards Track                       J. Drake (Editor)
5	                                                                 Juniper
6	                                                                N. Bitar
7	                                                                   Nokia
8	                                                              R. Shekhar
9	                                                                 Juniper
10	                                                               J. Uttaro
11	                                                                    AT&T
12	                                                           W. Henderickx
13	                                                                   Nokia

15	Expires: September 27, 2017                               March 27, 2017

17	         A Network Virtualization Overlay Solution using EVPN
18	                    draft-ietf-bess-evpn-overlay-08

20	Abstract

22	   This document describes how Ethernet VPN (EVPN) can be used as a
23	   Network Virtualization Overlay (NVO) solution and explores the
24	   various tunnel encapsulation options over IP and their impact on the
25	   EVPN control-plane and procedures. In particular, the following
26	   encapsulation options are analyzed: VXLAN, NVGRE, and MPLS over GRE.

28	Status of this Memo

30	   This Internet-Draft is submitted to IETF in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF), its areas, and its working groups.  Note that
35	   other groups may also distribute working documents as
36	   Internet-Drafts.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   The list of current Internet-Drafts can be accessed at
44	   http://www.ietf.org/1id-abstracts.html

46	   The list of Internet-Draft Shadow Directories can be accessed at
47	   http://www.ietf.org/shadow.html

49	Copyright and License Notice

51	   Copyright (c) 2017 IETF Trust and the persons identified as the
52	   document authors. All rights reserved.

54	   This document is subject to BCP 78 and the IETF Trust's Legal
55	   Provisions Relating to IETF Documents
56	   (http://trustee.ietf.org/license-info) in effect on the date of
57	   publication of this document. Please review these documents
58	   carefully, as they describe your rights and restrictions with respect
59	   to this document. Code Components extracted from this document must
60	   include Simplified BSD License text as described in Section 4.e of
61	   the Trust Legal Provisions and are provided without warranty as
62	   described in the Simplified BSD License.

64	Table of Contents

66	   1  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . .  4
67	   2  Specification of Requirements . . . . . . . . . . . . . . . . .  5
68	   3  Terminology . . . . . . . . . . . . . . . . . . . . . . . . . .  5
69	   4 EVPN Features  . . . . . . . . . . . . . . . . . . . . . . . . .  6
70	   5 Encapsulation Options for EVPN Overlays  . . . . . . . . . . . .  7
71	     5.1 VXLAN/NVGRE Encapsulation  . . . . . . . . . . . . . . . . .  7
72	       5.1.1 Virtual Identifiers Scope  . . . . . . . . . . . . . . .  8
73	         5.1.1.1 Data Center Interconnect with Gateway  . . . . . . .  8
74	         5.1.1.2 Data Center Interconnect without Gateway . . . . . .  9
75	       5.1.2 Virtual Identifiers to EVI Mapping . . . . . . . . . . .  9
76	         5.1.2.1 Auto Derivation of RT  . . . . . . . . . . . . . . . 10
77	       5.1.3  Constructing EVPN BGP Routes  . . . . . . . . . . . . . 11
78	     5.2 MPLS over GRE  . . . . . . . . . . . . . . . . . . . . . . . 13
79	   6  EVPN with Multiple Data Plane Encapsulations  . . . . . . . . . 13
80	   7  Single-Homing NVEs - NVE Residing in Hypervisor . . . . . . . . 14
81	     7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE
82	         Encapsulation  . . . . . . . . . . . . . . . . . . . . . . . 14
83	     7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation  . . 15
84	   8  Multi-Homing NVEs - NVE Residing in ToR Switch  . . . . . . . . 16
85	     8.1  EVPN Multi-Homing Features  . . . . . . . . . . . . . . . . 16
86	       8.1.1 Multi-homed Ethernet Segment Auto-Discovery  . . . . . . 16
87	       8.1.2 Fast Convergence and Mass Withdraw . . . . . . . . . . . 16
88	       8.1.3 Split-Horizon  . . . . . . . . . . . . . . . . . . . . . 17
89	       8.1.4 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 17
90	       8.1.5 DF Election  . . . . . . . . . . . . . . . . . . . . . . 18
91	     8.2 Impact on EVPN BGP Routes & Attributes . . . . . . . . . . . 18
92	     8.3 Impact on EVPN Procedures  . . . . . . . . . . . . . . . . . 18
93	       8.3.1 Split Horizon  . . . . . . . . . . . . . . . . . . . . . 19
94	       8.3.2 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 20
95	       8.3.3 Unknown Unicast Traffic Designation  . . . . . . . . . . 20
96	   9 Support for Multicast  . . . . . . . . . . . . . . . . . . . . . 20
97	   10 Data Center Interconnections - DCI  . . . . . . . . . . . . . . 21
98	     10.1 DCI using GWs . . . . . . . . . . . . . . . . . . . . . . . 22
99	     10.2 DCI using ASBRs . . . . . . . . . . . . . . . . . . . . . . 22
100	       10.2.1 ASBR Functionality with Single-Homing NVEs  . . . . . . 23
101	       10.2.2 ASBR Functionality with Multi-Homing NVEs . . . . . . . 23
102	   11  Acknowledgement  . . . . . . . . . . . . . . . . . . . . . . . 26
103	   12  Security Considerations  . . . . . . . . . . . . . . . . . . . 26
104	   13  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 27
105	   14  References . . . . . . . . . . . . . . . . . . . . . . . . . . 27
106	     14.1  Normative References . . . . . . . . . . . . . . . . . . . 27
107	     14.2  Informative References . . . . . . . . . . . . . . . . . . 27
108	   Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
109	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 29

111	1  Introduction

113	   In the context of this document, a Network Virtualization Overlay
114	   (NVO) is a solution to address the requirements of a multi-tenant
115	   data center, especially one with virtualized hosts, e.g., Virtual
116	   Machines (VMs) or virtual workloads. The key requirements of such a
117	   solution, as described in [Problem-Statement], are:

119	   - Isolation of network traffic per tenant

121	   - Support for a large number of tenants (tens or hundreds of
122	   thousands)

124	   - Extending L2 connectivity among different VMs belonging to a given
125	   tenant segment (subnet) across different PODs within a data center or
126	   between different data centers

128	   - Allowing a given VM to move between different physical points of
129	   attachment within a given L2 segment

131	   The underlay network for NVO solutions is assumed to provide IP
132	   connectivity between NVO endpoints (NVEs).

134	   This document describes how Ethernet VPN (EVPN) can be used as an NVO
135	   solution and explores applicability of EVPN functions and procedures.
136	   In particular, it describes the various tunnel encapsulation options
137	   for EVPN over IP, and their impact on the EVPN control-plane and
138	   procedures for two main scenarios:

140	   a) single-homing NVEs - when a NVE resides in the hypervisor, and
141	   b) multi-homing NVEs - when a NVE resides in a Top of Rack (ToR)
142	   device

144	   The possible encapsulation options for EVPN overlays that are
145	   analyzed in this document are:

147	   - VXLAN and NVGRE
148	   - MPLS over GRE

150	   Before getting into the description of the different encapsulation
151	   options for EVPN over IP, it is important to highlight the EVPN
152	   solution's main features, how those features are currently supported,
153	   and any impact that the encapsulation has on those features.

155	2  Specification of Requirements

157	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
158	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
159	   document are to be interpreted as described in [KEYWORDS].

161	3  Terminology

163	   Most of the terminology used in this documents comes from [RFC7432]
164	   and [NVO3-FRWK].

166	   NVO: Network Virtualization Overlay

168	   NVE: Network Virtualization Endpoint

170	   VNI:  Virtual Network Identifier (for VXLAN)

172	   VSID: Virtual Subnet Identifier (for NVGRE)

174	   EVPN: Ethernet VPN

176	   EVI: An EVPN instance spanning the Provider Edge (PE) devices
177	   participating in that EVPN.

179	   MAC-VRF: A Virtual Routing and Forwarding table for Media Access
180	   Control (MAC) addresses on a PE.

182	   Ethernet Segment (ES): When a customer site (device or network) is
183	   connected to one or more PEs via a set of Ethernet links, then that
184	   set of links is referred to as an 'Ethernet segment'.

186	   Ethernet Segment Identifier (ESI): A unique non-zero identifier that
187	   identifies an Ethernet segment is called an 'Ethernet Segment
188	   Identifier'.

190	   Ethernet Tag: An Ethernet tag identifies a particular broadcast
191	   domain, e.g., a VLAN.  An EVPN instance consists of one or more
192	   broadcast domains.

194	   PE: Provider Edge device.

196	   Single-Active Redundancy Mode: When only a single PE, among all the
197	   PEs attached to an Ethernet segment, is allowed to forward traffic
198	   to/from that Ethernet segment for a given VLAN, then the Ethernet
199	   segment is defined to be operating in Single-Active redundancy mode.

201	   All-Active Redundancy Mode: When all PEs attached to an Ethernet
202	   segment are allowed to forward known unicast traffic to/from that
203	   Ethernet segment for a given VLAN, then the Ethernet segment is
204	   defined to be operating in All-Active redundancy mode.

206	4 EVPN Features

208	   EVPN was originally designed to support the requirements detailed in
209	   [RFC7209] and therefore has the following attributes which directly
210	   address control plane scaling and ease of deployment issues.

212	   1)  Control plane information is distributed with BGP and Broadcast
213	   and Multicast traffic is sent using a shared multicast tree or with
214	   ingress replication.

216	   2)  Control plane learning is used for MAC (and IP) addresses instead
217	   of data plane learning. The latter requires the flooding of unknown
218	   unicast and ARP frames; whereas, the former does not require any
219	   flooding.

221	   3) Route Reflectors are used to reduce a full mesh of BGP sessions
222	   among PE devices to a single BGP session between a PE and the RR.
223	   Furthermore, RR hierarchy can be leveraged to scale the number of BGP
224	   routes on the RR.

226	   4)  Auto-discovery via BGP is used to discover PE devices
227	   participating in a given VPN, PE devices participating in a given
228	   redundancy group, tunnel encapsulation types, multicast tunnel type,
229	   multicast members, etc.

231	   5)  All-Active multihoming is used.  This allows a given customer
232	   device (CE) to have multiple links to multiple PEs, and traffic
233	   to/from that CE fully utilizes all of these links.

235	   6)  When a link between a CE and a PE fails, the PEs for that EVI are
236	   notified of the failure via the withdrawal of a single EVPN route.
237	   This allows those PEs to remove the withdrawing PE as a next hop for
238	   every MAC address associated with the failed link.  This is termed
239	   'mass withdrawal'.

241	   7)  BGP route filtering and constrained route distribution are
242	   leveraged to ensure that the control plane traffic for a given EVI is
243	   only distributed to the PEs in that EVI.

245	   8) When a 802.1Q interface is used between a CE and a PE, each of the
246	   VLAN ID (VID) on that interface can be mapped onto a bridge table
247	   (for upto 4094 such bridge tables). All these bridge tables may be
248	   mapped onto a single MAC-VRF (in case of VLAN-aware bundle service).

250	   9)  VM Mobility mechanisms ensure that all PEs in a given EVI know
251	   the ES with which a given VM, as identified by its MAC and IP
252	   addresses, is currently associated.

254	   10)  Route Targets are used to allow the operator (or customer) to
255	   define a spectrum of logical network topologies including mesh, hub &
256	   spoke, and extranets (e.g., a VPN whose sites are owned by different
257	   enterprises), without the need for proprietary software or the aid of
258	   other virtual or physical devices.

260	   Because the design goal for NVO is millions of instances per common
261	   physical infrastructure, the scaling properties of the control plane
262	   for NVO are extremely important.   EVPN and the extensions described
263	   herein, are designed with this level of scalability in mind.

265	5 Encapsulation Options for EVPN Overlays

267	5.1 VXLAN/NVGRE Encapsulation

269	   Both VXLAN and NVGRE are examples of technologies that provide a data
270	   plane encapsulation which is used to transport a packet over the
271	   common physical IP infrastructure between Network Virtualization
272	   Edges (NVEs) - e.g., VXLAN Tunnel End Points (VTEPs) in VXLAN
273	   network. Both of these technologies include the identifier of the
274	   specific NVO instance, Virtual Network Identifier (VNI) in VXLAN and
275	   Virtual Subnet Identifier (VSID) in NVGRE, in each packet. In the
276	   remainder of this document we use VNI as the representation for NVO
277	   instance with the understanding that VSID can equally be used if the
278	   encapsulation is NVGRE unless it is stated otherwise.

280	   Note that a Provider Edge (PE) is equivalent to a NVE/VTEP.

282	   VXLAN encapsulation is based on UDP, with an 8-byte header following
283	   the UDP header. VXLAN provides a 24-bit VNI, which typically provides
284	   a one-to-one mapping to the tenant VLAN ID, as described in
285	   [RFC7348]. In this scenario, the ingress VTEP does not include an
286	   inner VLAN tag on the encapsulated frame, and the egress VTEP
287	   discards the frames with an inner VLAN tag. This mode of operation in
288	   [RFC7348] maps to VLAN Based Service in [RFC7432], where a tenant
289	   VLAN ID gets mapped to an EVPN instance (EVI).

291	   VXLAN also provides an option of including an inner VLAN tag in the
292	   encapsulated frame, if explicitly configured at the VTEP. This mode
293	   of operation can map to VLAN Bundle Service in [RFC7432] because all
294	   the tenant's tagged frames map to a single bridge table / MAC-VRF,
295	   and the inner VLAN tag is not used for lookup by the disposition PE
296	   when performing VXLAN decapsulation as described in section 6 of
298	   [RFC7348].

300	   [NVGRE] encapsulation is based on GRE encapsulation and it mandates
301	   the inclusion of the optional GRE Key field which carries the VSID.
302	   There is a one-to-one mapping between the VSID and the tenant VLAN
303	   ID, as described in [NVGRE] and the inclusion of an inner VLAN tag is
304	   prohibited. This mode of operation in [NVGRE] maps to VLAN Based
305	   Service in [RFC7432].

307	   As described in the next section there is no change to the encoding
308	   of EVPN routes to support VXLAN or NVGRE encapsulation except for the
309	   use of the BGP Encapsulation extended community to indicate the
310	   encapsulation type (e.g., VxLAN or NVGRE). However, there is
311	   potential impact to the EVPN procedures depending on where the NVE is
312	   located (i.e., in hypervisor or TOR) and whether multi-homing
313	   capabilities are required.

315	5.1.1 Virtual Identifiers Scope

317	   Although VNIs are defined as 24-bit globally unique values, there are
318	   scenarios in which it is desirable to use a locally significant value
319	   for VNI, especially in the context of data center interconnect:

321	5.1.1.1 Data Center Interconnect with Gateway

323	   In the case where NVEs in different data centers need to be
324	   interconnected, and the NVEs need to use VNIs as a globally unique
325	   identifiers within a data center, then a Gateway needs to be employed
326	   at the edge of the data center network. This is because the Gateway
327	   will provide the functionality of translating the VNI when crossing
328	   network boundaries, which may align with operator span of control
329	   boundaries. As an example, consider the network of Figure 1 below.
330	   Assume there are three network operators: one for each of the DC1,
331	   DC2 and WAN networks. The Gateways at the edge of the data centers
332	   are responsible for translating the VNIs between the values used in
333	   each of the data center networks and the values used in the WAN.

335	                             +--------------+
336	                             |              |
337	           +---------+       |     WAN      |       +---------+
338	   +----+  |        +---+  +----+        +----+  +---+        |  +----+
339	   |NVE1|--|        |   |  |WAN |        |WAN |  |   |        |--|NVE3|
340	   +----+  |IP      |GW |--|Edge|        |Edge|--|GW | IP     |  +----+
341	   +----+  |Fabric  +---+  +----+        +----+  +---+ Fabric |  +----+
342	   |NVE2|--|         |       |              |       |         |--|NVE4|
343	   +----+  +---------+       +--------------+       +---------+  +----+

345	   |<------ DC 1 ------>                          <------ DC2  ------>|

347	            Figure 1: Data Center Interconnect with Gateway

349	5.1.1.2 Data Center Interconnect without Gateway

351	   In the case where NVEs in different data centers need to be
352	   interconnected, and the NVEs need to use locally assigned VNIs (e.g.,
353	   similar to MPLS labels), then there may be no need to employ Gateways
354	   at the edge of the data center network. More specifically, the VNI
355	   value that is used by the transmitting NVE is allocated by the NVE
356	   that is receiving the traffic (in other words, this is similar to
357	   "downstream assigned" MPLS label). This allows the VNI space to be
358	   decoupled between different data center networks without the need for
359	   a dedicated Gateway at the edge of the data centers. This topics is
360	   covered in section 10.2.

362	                              +--------------+
363	                              |              |
364	              +---------+     |     WAN      |    +---------+
365	      +----+  |         |   +----+        +----+  |         |  +----+
366	      |NVE1|--|         |   |ASBR|        |ASBR|  |         |--|NVE3|
367	      +----+  |IP Fabric|---|    |        |    |--|IP Fabric|  +----+
368	      +----+  |         |   +----+        +----+  |         |  +----+
369	      |NVE2|--|         |     |              |    |         |--|NVE4|
370	      +----+  +---------+     +--------------+    +---------+  +----+

372	      |<------ DC 1 ----->                        <---- DC2  ------>|

374	              Figure 2: Data Center Interconnect with ASBR

376	5.1.2 Virtual Identifiers to EVI Mapping

378	   When the EVPN control plane is used in conjunction with VXLAN (or
379	   NVGRE encapsulation), two options for mapping the VXLAN VNI (or NVGRE
380	   VSID) to an EVI are possible:

382	   1. Option 1: Single Broadcast Domain per EVI

384	   In this option, a single Ethernet broadcast domain (e.g., subnet)
385	   represented by a VNI is mapped to a unique EVI. This corresponds to
386	   the VLAN Based service in [RFC7432], where a tenant-facing interface,
387	   logical interface (e.g., represented by a VLAN ID) or physical, gets
388	   mapped to an EVPN instance (EVI). As such, a BGP RD and RT are needed
389	   per VNI on every NVE. The advantage of this model is that it allows
390	   the BGP RT constraint mechanisms to be used in order to limit the
391	   propagation and import of routes to only the NVEs that are interested
392	   in a given VNI. The disadvantage of this model may be the
393	   provisioning overhead if RD and RT are not derived automatically from
394	   VNI.

396	   In this option, the MAC-VRF table is identified by the RT in the
397	   control plane and by the VNI in the data-plane. In this option, the
398	   specific MAC-VRF table corresponds to only a single bridge table.

400	   2. Option 2: Multiple Broadcast Domains per EVI

402	   In this option, multiple subnets each represented by a unique VNI are
403	   mapped to a single EVI. For example, if a tenant has multiple
404	   segments/subnets each represented by a VNI, then all the VNIs for
405	   that tenant are mapped to a single EVI - e.g., the EVI in this case
406	   represents the tenant and not a subnet . This corresponds to the
407	   VLAN-aware bundle service in [RFC7432]. The advantage of this model
408	   is that it doesn't require the provisioning of RD/RT per VNI.
409	   However, this is a moot point when compared to option 1 where auto-
410	   derivation is used. The disadvantage of this model is that routes
411	   would be imported by NVEs that may not be interested in a given VNI.

413	   In this option the MAC-VRF table is identified by the RT in the
414	   control plane and a specific bridge table for that MAC-VRF is
415	   identified by the <RT, Ethernet Tag ID> in the control plane. In this
416	   option, the VNI in the data-plane is sufficient to identify a
417	   specific bridge table.

419	5.1.2.1 Auto Derivation of RT

421	   When the option of a single VNI per EVI is used, in order to simplify
422	   configuration, the RT used for EVPN can be auto-derived. RD can be
423	   auto generated as described in [RFC7432] and RT can be auto-derived
424	   as described next.

426	   Since a gateway PE as depicted in figure-1 participates in both the
427	   DCN and WAN BGP sessions, it is important that when RT values are
428	   auto-derived from VNIs, there is no conflict in RT spaces between DCN
429	   and WAN networks assuming that both are operating within the same AS.
430	   Also, there can be scenarios where both VXLAN and NVGRE
431	   encapsulations may be needed within the same DCN and their
432	   corresponding VNIs are administered independently which means VNI
433	   spaces can overlap. In order to ensure that no such conflict in RT
434	   spaces arises, RT values for DCNs are auto-derived as follow:

436	    0                   1                   2                   3
437	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
438	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
439	   |              AS #             |A| TYPE| D-ID  | Service ID    |
440	   +-----------------------------------------------+---------------+
441	   |       Service ID (Cont.)      |
442	   +-------------------------------+

444	   - 2 bytes of global admin field of the RT is set to the AS number.

446	   - Three least significant bytes of the local admin field of the RT is
447	   set to the VNI, VSID, I-SID, or VID.

449	   - The most significant bit of the local admin field of the RT is set
450	   as follow:
451	            0: auto-derived
452	            1: manually-derived

454	   - The next 3 bits of the most significant byte of the local admin
455	   field of the RT identifies the space in which the other 3 bytes are
456	   defined. The following spaces are defined:
457	            0 : VID (802.1Q VLAN ID)
458	            1 : VXLAN
459	            2 : NVGRE
460	            3 : I-SID
461	            4 : EVI
462	            5 : dual-VID (QinQ VLAN ID)

464	   - The remaining 4 bits of the most significant byte of the local
465	   admin field of the RT identifies the domain-id. The default value of
466	   domain-id is zero indicating that only a single numbering space exist
467	   for a given technology. However, if there are more than one number
468	   space exist for a given technology (e.g., overlapping VXLAN spaces),
469	   then each of the number spaces need to be identify by their
470	   corresponding domain-id starting from 1.

472	5.1.3  Constructing EVPN BGP Routes
473	   In EVPN, an MPLS label for instance identifying forwarding table is
474	   distributed by the egress PE via the EVPN control plane and is placed
475	   in the MPLS header of a given packet by the ingress PE. This label is
476	   used upon receipt of that packet by the egress PE for disposition of
477	   that packet. This is very similar to the use of the VNI by the egress
478	   NVE, with the difference being that an MPLS label has local
479	   significance while a VNI typically has global significance.
480	   Accordingly, and specifically to support the option of locally-
481	   assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement
482	   route, the MPLS label field in the Ethernet AD per EVI route, and the
483	   MPLS label field in the PMSI Tunnel Attribute of the Inclusive
484	   Multicast Ethernet Tag (IMET) route are used to carry the VNI. For
485	   the balance of this memo, the above MPLS label fields will be
486	   referred to as the VNI field. The VNI field is used for both local
487	   and global VNIs, and for either case the entire 24-bit field is used
488	   to encode the VNI value.

490	   For the VLAN-based service (a single VNI per MAC-VRF), the Ethernet
491	   Tag field in the MAC/IP Advertisement, Ethernet AD per EVI, and IMET
492	   route MUST be set to zero just as in the VLAN Based service in
493	   [RFC7432].

495	   For the VLAN-aware bundle service (multiple VNIs per MAC-VRF with
496	   each VNI associated with its own bridge table), the Ethernet Tag
497	   field in the MAC Advertisement, Ethernet AD per EVI, and IMET route
498	   MUST identify a bridge table within a MAC-VRF and the set of Ethernet
499	   Tags for that EVI needs to be configured consistently on all PEs
500	   within that EVI.  For locally-assigned VNIs, the value advertised in
501	   the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware
502	   bundle service in [RFC7432]. Such setting must be done consistently
503	   on all PE devices participating in that EVI within a given domain.
504	   For global VNIs, the value advertised in the Ethernet Tag field
505	   SHOULD be set to a VNI as long as it matches the existing semantics
506	   of the Ethernet Tag, i.e., it identifies a bridge table within a MAC-
507	   VRF and the set of VNIs are configured consistently on each PE in
508	   that EVI.

510	   In order to indicate which type of data plane encapsulation (i.e.,
511	   VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP
512	   Encapsulation extended community defined in [TUNNEL-ENCAP] and
513	   [RFC5512] is included with all EVPN routes (i.e. MAC Advertisement,
514	   Ethernet AD per EVI, Ethernet AD per ESI, Inclusive Multicast
515	   Ethernet Tag, and Ethernet Segment) advertised by an egress PE. Five
516	   new values have been assigned by IANA to extend the list of
517	   encapsulation types defined in [TUNNEL-ENCAP] and they are listed in
518	   section 13.

520	   The MPLS encapsulation tunnel type, listed in section 13, is needed
521	   in order to distinguish between an advertising node that only
522	   supports non-MPLS encapsulations and one that supports MPLS and non-
523	   MPLS encapsulations. An advertising node that only supports MPLS
524	   encapsulation does not need to advertise any encapsulation tunnel
525	   types;  i.e.,  if the BGP Encapsulation extended community is not
526	   present, then either MPLS encapsulation or a statically configured
527	   encapsulation is assumed.

529	   The Ethernet Segment and Ethernet AD per ESI routes MAY be advertised
530	   with multiple encapsulation types as long as they use the same EVPN
531	   multi-homing procedures (section 8.3.1, Split Horizon) - e.g., the
532	   mix of VXLAN and NVGRE encapsulation types is a valid one but not the
533	   mix of VXLAN and MPLS encapsulation types.

535	   The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
536	   be set to the IPv4 or IPv6 address of the NVE. The remaining fields
537	   in each route are set as per [RFC7432].

539	   Note that the procedure defined here to use the MPLS Label field to
540	   carry the VNI in the presence of a Tunnel Encapsulation Extended
541	   Community specifying the use of a VNI, is aligned with the procedures
542	   described in section 8.2.2.2 of [tunnel-encap] ("When a Valid VNI has
543	   not been Signaled").

545	5.2 MPLS over GRE

547	   The EVPN data-plane is modeled as an EVPN MPLS client layer sitting
548	   over an MPLS PSN-tunnel server layer. Some of the EVPN functions
549	   (split-horizon, aliasing, and backup-path) are tied to the MPLS
550	   client layer. If MPLS over GRE encapsulation is used, then the EVPN
551	   MPLS client layer can be carried over an IP PSN tunnel transparently.
552	   Therefore, there is no impact to the EVPN procedures and associated
553	   data-plane operation.

555	   The existing standards for MPLS over GRE encapsulation as defined by
556	   [RFC4023] can be used for this purpose; however, when it is used in
557	   conjunction with EVPN the GRE key field SHOULD be present, and SHOULD
558	   be used to provide a 32-bit entropy field. The Checksum and Sequence
559	   Number fields are not needed and their corresponding C and S bits
560	   MUST be set to zero. A PE capable of supporting this encapsulation,
561	   should advertise its EVPN routes along with the Tunnel Encapsulation
562	   extended community indicating MPLS over GRE encapsulation, as
563	   described in previous section.

565	6  EVPN with Multiple Data Plane Encapsulations

567	   The use of the BGP Encapsulation extended community per [TUNNEL-
568	   ENCAP] and [RFC5512] allows each NVE in a given EVI to know each of
569	   the encapsulations supported by each of the other NVEs in that EVI.
570	   i.e., each of the NVEs in a given EVI may support multiple data plane
571	   encapsulations.  An ingress NVE can send a frame to an egress NVE
572	   only if the set of encapsulations advertised by the egress NVE forms
573	   a non-empty intersection with the set of encapsulations supported by
574	   the ingress NVE, and it is at the discretion of the ingress NVE which
575	   encapsulation to choose from this intersection.   (As noted in
576	   section 5.1.3, if the BGP Encapsulation extended community is not
577	   present, then the default MPLS encapsulation or a locally configured
578	   encapsulation is assumed.)
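   The following is a minimal Python sketch of this choice at the
   ingress NVE. It makes two assumptions:

   - Encapsulations are plain strings.
   - "At the discretion of the ingress NVE" is realized as a local
     preference order. This is a hypothetical policy.

   A missing BGP Encapsulation extended community is modeled as None,
   which falls back to the default (MPLS) or a configured value.

```python
from __future__ import annotations
from typing import Optional

DEFAULT_ENCAPSULATION = "MPLS"


def choose_encapsulation(local_preference: list[str],
                         advertised: Optional[set[str]],
                         default: str = DEFAULT_ENCAPSULATION) -> Optional[str]:
    """Pick the encapsulation toward one egress NVE.

    local_preference: encapsulations the ingress NVE supports, most
    preferred first (hypothetical local policy).
    advertised: encapsulations in the egress NVE's BGP Encapsulation
    extended communities, or None if no such community was attached.
    """
    # No community attached: assume the default or configured encapsulation.
    remote = {default} if advertised is None else advertised
    for encap in local_preference:
        if encap in remote:
            return encap
    return None  # empty intersection: frames cannot be sent to this NVE
```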

580	   An ingress node that uses shared multicast trees for sending
581	   broadcast or multicast frames MAY maintain distinct trees for each
582	   different encapsulation type.

584	   It is the responsibility of the operator of a given EVI to ensure
585	   that all of the NVEs in that EVI support at least one common
586	   encapsulation. If this condition is violated, it could result in
587	   service disruption or failure.  The use of the BGP Encapsulation
588	   extended community provides a method to detect when this condition is
589	   violated but the actions to be taken are at the discretion of the
590	   operator and are outside the scope of this document.
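   The actions taken on a violation are outside the scope of this
   document. Detecting a violation, however, is a simple set
   computation. The Python sketch below shows one such check. The
   tooling and the function names are hypothetical, and the input is
   assumed to be the encapsulations each NVE advertised for the EVI.

```python
from __future__ import annotations
from itertools import combinations


def common_encapsulations(per_nve: dict[str, set[str]]) -> set[str]:
    """Encapsulations supported by every NVE in the EVI."""
    sets = list(per_nve.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def unreachable_pairs(per_nve: dict[str, set[str]]) -> list[tuple[str, str]]:
    """NVE pairs that share no encapsulation and so cannot exchange frames."""
    return [(a, b) for a, b in combinations(sorted(per_nve), 2)
            if not (per_nve[a] & per_nve[b])]
```

   Note that the condition stated above is stronger than pairwise
   compatibility. For example, {VXLAN}, {MPLS}, and {VXLAN, MPLS} have
   no encapsulation in common, yet only the first two NVEs are unable to
   reach each other.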

592	7  Single-Homing NVEs - NVE Residing in Hypervisor

594	   When an NVE and its hosts/VMs are co-located in the same physical
595	   device, e.g., when they reside in a server, the links between them
596	   are virtual and they typically share fate;  i.e., the subject
597	   hosts/VMs are typically not multi-homed or if they are multi-homed,
598	   the multi-homing is a purely local matter to the server hosting the
599	   VM and the NVEs, and need not be "visible" to any other NVEs residing
600	   on other servers, and thus does not require any specific protocol
601	   mechanisms.  The most common case of this is when the NVE resides on
602	   the hypervisor.

604	   In the sub-sections that follow, we will discuss the impact on EVPN
605	   procedures for the case when the NVE resides on the hypervisor and
606	   the VXLAN (or NVGRE) encapsulation is used.

608	7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulation

610	   In scenarios where different groups of data centers are under
611	   different administrative domains, and these data centers are
612	   connected via one or more backbone core providers as described in
613	   [NVO3-FRWK], the RD must be a unique value per EVI or per NVE as
614	   described in [RFC7432]. In other words, whenever there is more than
615	   one administrative domain for a global VNI, a unique RD MUST be
616	   used; likewise, whenever the VNI value has local significance, a
617	   unique RD MUST be used. Therefore, it is recommended to use a unique
618	   RD as described in [RFC7432] at all times.
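   [RFC7432] recommends a Type 1 RD [RFC4364]. Its value field holds the
   PE's IP address followed by a number unique to the PE. The Python
   sketch below encodes and formats such an RD. The helper names are
   hypothetical, and an IPv4 address is assumed.

```python
import ipaddress
import struct


def type1_rd(pe_ipv4: str, assigned_number: int) -> bytes:
    """8-octet Type 1 RD: 2-octet type (1), IPv4 address, 2-octet number."""
    if not 0 <= assigned_number <= 0xFFFF:
        raise ValueError("assigned number must fit in 16 bits")
    return struct.pack("!H4sH", 1, ipaddress.IPv4Address(pe_ipv4).packed,
                       assigned_number)


def format_rd(rd: bytes) -> str:
    """Render a Type 1 RD in the usual <address>:<number> notation."""
    rd_type, addr, number = struct.unpack("!H4sH", rd)
    if rd_type != 1:
        raise ValueError("not a Type 1 RD")
    return f"{ipaddress.IPv4Address(addr)}:{number}"
```

   The NVE's address makes the RD unique across NVEs, and the assigned
   number makes it unique across EVIs on the same NVE. Together they
   keep the RD unique even when VNI values are only locally significant.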

620	   When the NVEs reside on the hypervisor, the EVPN BGP routes and
621	   attributes associated with multi-homing are no longer required. This
622	   reduces the required routes and attributes to the following subset of
623	   four out of eight:

625	   - MAC/IP Advertisement Route
626	   - Inclusive Multicast Ethernet Tag Route
627	   - MAC Mobility Extended Community
628	   - Default Gateway Extended Community

630	   However, as noted in section 8.6 of [RFC7432], in order to enable a
631	   single-homing ingress NVE to take advantage of fast convergence,
632	   aliasing, and backup-path when interacting with multi-homed egress
633	   NVEs attached to a given Ethernet segment, the single-homing ingress
634	   NVE SHOULD be able to receive and process Ethernet AD per ES and
635	   Ethernet AD per EVI routes.

637	7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation

639	   When the NVEs reside on the hypervisors, the EVPN procedures
640	   associated with multi-homing are no longer required. This limits the
641	   procedures on the NVE to the following subset of the EVPN procedures:

643	   1. Local learning of MAC addresses received from the VMs per section
644	   9.1 of [RFC7432].

646	   2. Advertising locally learned MAC addresses in BGP using the MAC/IP
647	   Advertisement routes.

649	   3. Performing remote learning using BGP per Section 9.2 of
650	   [RFC7432].

652	   4. Discovering other NVEs and constructing the multicast tunnels
653	   using the Inclusive Multicast Ethernet Tag routes.

655	   5. Handling MAC address mobility events per the procedures of Section
656	   15 of [RFC7432].
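   The Python sketch below models a simplified MAC-VRF for a hypervisor
   NVE, covering local learning, remote learning, and mobility. It
   assumes the sequence-number rule of the [RFC7432] MAC Mobility
   procedures:

   - A MAC learned locally after being learned from a remote NVE is
     advertised with that sequence number plus one.
   - A route with a higher sequence number is preferred.

   Tie-breaking between equal sequence numbers and duplicate-MAC
   detection are deliberately omitted. The class and method names are
   hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class MacEntry:
    next_hop: str  # IP address of the NVE behind which the MAC resides
    seq: int       # MAC Mobility sequence number (0 if the community is absent)


class MacVrf:
    """Simplified MAC-VRF for a single-homed (hypervisor) NVE."""

    def __init__(self, local_ip: str) -> None:
        self.local_ip = local_ip
        self.table: dict[str, MacEntry] = {}

    def learn_local(self, mac: str) -> Optional[int]:
        """Local learning; returns the sequence number to advertise, or None."""
        entry = self.table.get(mac)
        if entry is not None and entry.next_hop == self.local_ip:
            return None  # already local and already advertised
        # A MAC previously learned from a remote NVE has moved here.
        seq = 0 if entry is None else entry.seq + 1
        self.table[mac] = MacEntry(self.local_ip, seq)
        return seq

    def learn_remote(self, mac: str, nve_ip: str, seq: int = 0) -> bool:
        """Remote learning from a received MAC/IP Advertisement route."""
        entry = self.table.get(mac)
        if entry is None or seq > entry.seq:
            # If the old entry was local, the NVE would now withdraw its route.
            self.table[mac] = MacEntry(nve_ip, seq)
            return True
        return False  # stale or equal sequence number: keep the current entry
```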

658	   However, as noted in section 8.6 of [RFC7432], in order to enable a
659	   single-homing ingress NVE to take advantage of fast convergence,
660	   aliasing, and backup-path when interacting with multi-homed egress
661	   NVEs attached to a given Ethernet segment, a single-homing ingress
662	   NVE SHOULD implement the ingress node processing of Ethernet AD per
663	   ES and Ethernet AD per EVI routes as defined in sections 8.2 Fast
664	   Convergence and 8.4 Aliasing and Backup-Path of [RFC7432].

666	8  Multi-Homing NVEs - NVE Residing in ToR Switch

668	   In this section, we discuss the scenario where the NVEs reside in the
669	   Top of Rack (ToR) switches AND the servers (where VMs are residing)
670	   are multi-homed to these ToR switches. The multi-homing NVEs operate
671	   in All-Active or Single-Active redundancy mode. If the servers are
672	   single-homed to the ToR switches, then the scenario becomes similar
673	   to that where the NVE resides on the hypervisor, as discussed in
674	   Section 7, as far as the required EVPN functionality is concerned.

676	   [RFC7432] defines a set of BGP routes, attributes and procedures to
677	   support multi-homing. We first describe these functions and
678	   procedures, then discuss which of these are impacted by the VxLAN
679	   (or NVGRE) encapsulation and what modifications are required. As
680	   will be seen later in this section, the only EVPN procedure that is
681	   impacted by non-MPLS overlay encapsulation (e.g., VxLAN or NVGRE),
682	   which provides space for only one ID rather than a stack of labels, is
683	   that of split-horizon filtering for multi-homed Ethernet Segments
684	   described in section 8.3.1.

686	8.1  EVPN Multi-Homing Features

688	   In this section, we will recap the multi-homing features of EVPN to
689	   highlight the encapsulation dependencies. The section only describes
690	   the features and functions at a high level. For more details, the
691	   reader should refer to [RFC7432].

693	8.1.1 Multi-homed Ethernet Segment Auto-Discovery

695	   EVPN NVEs (or PEs) connected to the same Ethernet Segment (e.g., the
696	   same server via LAG) can automatically discover each other with
697	   minimal to no configuration through the exchange of BGP routes.

699	8.1.2 Fast Convergence and Mass Withdraw

701	   EVPN defines a mechanism to efficiently and quickly signal, to remote
702	   NVEs, the need to update their forwarding tables upon the occurrence
703	   of a failure in connectivity to an Ethernet segment (e.g., a link or
704	   a port failure). This is done by having each NVE advertise an
705	   Ethernet A-D Route per Ethernet segment for each locally attached
706	   segment. Upon a failure in connectivity to the attached segment, the
707	   NVE withdraws the corresponding Ethernet A-D route. This triggers all
708	   NVEs that receive the withdrawal to update their next-hop adjacencies
709	   for all MAC addresses associated with the Ethernet segment in
710	   question. If no other NVE had advertised an Ethernet A-D route for
711	   the same segment, then the NVE that received the withdrawal simply
712	   invalidates the MAC entries for that segment. Otherwise, the NVE
713	   updates the next-hop adjacency list accordingly.
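   The Python sketch below shows why mass withdraw is efficient. It
   keeps per-ES state at a remote NVE, so that one withdrawal updates
   every MAC on the segment at once.

   As a simplification, each MAC bound to a non-zero ESI is assumed to
   resolve to all NVEs that advertised an Ethernet A-D per ES route for
   that segment, as it would with aliasing. The class and method names
   are hypothetical.

```python
from __future__ import annotations
from collections import defaultdict


class RemoteEsState:
    """Per-ES state kept by a remote NVE (simplified)."""

    def __init__(self) -> None:
        self.es_nves: dict[str, set[str]] = defaultdict(set)  # ESI -> advertising NVEs
        self.mac_esi: dict[str, str] = {}                     # MAC -> ESI

    def ad_per_es(self, esi: str, nve: str) -> None:
        self.es_nves[esi].add(nve)

    def mac_route(self, mac: str, esi: str) -> None:
        self.mac_esi[mac] = esi

    def next_hops(self, mac: str) -> list[str]:
        esi = self.mac_esi.get(mac)
        return sorted(self.es_nves.get(esi, ())) if esi else []

    def withdraw_ad_per_es(self, esi: str, nve: str) -> str:
        """A single withdrawal updates every MAC on the ES at once."""
        self.es_nves[esi].discard(nve)
        if self.es_nves[esi]:
            return "updated"  # MACs now resolve to the remaining NVEs
        # No NVE left on this ES: invalidate all of its MAC entries.
        for mac in [m for m, e in self.mac_esi.items() if e == esi]:
            del self.mac_esi[mac]
        return "invalidated"
```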

715	8.1.3 Split-Horizon

717	   If a server that is multi-homed to two or more NVEs (represented by
718	   an Ethernet segment ES1) operating in all-active redundancy mode
719	   sends a BUM packet (i.e., Broadcast, Unknown unicast, or Multicast)
720	   to one of these NVEs, then it is important to ensure the packet is not
721	   looped back to the server via another NVE connected to this server.
722	   The filtering mechanism on the NVE to prevent such loops and packet
723	   duplication is called "split horizon filtering".

725	8.1.4 Aliasing and Backup-Path

727	   In the case where a station is multi-homed to multiple NVEs, it is
728	   possible that only a single NVE learns a set of the MAC addresses
729	   associated with traffic transmitted by the station. This leads to a
730	   situation where remote NVEs receive MAC advertisement routes, for
731	   these addresses, from a single NVE even though multiple NVEs are
732	   connected to the multi-homed station. As a result, the remote NVEs
733	   are not able to effectively load-balance traffic among the NVEs
734	   connected to the multi-homed Ethernet segment. This could be the
735	   case, e.g., when the NVEs perform data-path learning on the
736	   access, and the load-balancing function on the station hashes traffic
737	   from a given source MAC address to a single NVE. Another scenario
738	   where this occurs is when the NVEs rely on control plane learning on
739	   the access (e.g., using ARP), since ARP traffic will be hashed to a
740	   single link in the LAG.

742	   To alleviate this issue, EVPN introduces the concept of Aliasing.
743	   This refers to the ability of an NVE to signal that it has
744	   reachability to a given locally attached Ethernet segment, even when
745	   it has learnt no MAC addresses from that segment. The Ethernet A-D
746	   route per EVI is used to that end. Remote NVEs which receive MAC
747	   advertisement routes with non-zero ESI SHOULD consider the MAC
748	   address as reachable via all NVEs that advertise reachability to the
749	   relevant Segment using Ethernet A-D routes with the same ESI and with
750	   the Single-Active flag reset.

752	   Backup-Path is a closely related function, albeit it applies to the
753	   case where the redundancy mode is Single-Active. In this case, the
754	   NVE signals that it has reachability to a given locally attached
755	   Ethernet Segment using the Ethernet A-D route as well. Remote NVEs
756	   which receive the MAC advertisement routes, with non-zero ESI, SHOULD
757	   consider the MAC address as reachable via the advertising NVE.

759	   Furthermore, the remote NVEs SHOULD install a Backup-Path, for said
760	   MAC, to the NVE which had advertised reachability to the relevant
761	   Segment using an Ethernet A-D route with the same ESI and with the
762	   Single-Active flag set.
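   The two rules above can be sketched in Python. The sketch reduces
   each NVE's Ethernet A-D advertisement for a segment to three values:
   the NVE, the ESI, and the Single-Active flag. The class and function
   names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

ZERO_ESI = "00:00:00:00:00:00:00:00:00:00"


@dataclass(frozen=True)
class EthernetAdRoute:
    nve: str
    esi: str
    single_active: bool  # Single-Active flag as received with the A-D route


def resolve(mac_nve: str, mac_esi: str, ad_routes: list[EthernetAdRoute]):
    """Return (primary next hops, backup next hops) for one MAC route."""
    if mac_esi == ZERO_ESI:
        return {mac_nve}, set()  # single-homed MAC: no aliasing or backup
    same_es = [r for r in ad_routes if r.esi == mac_esi]
    # Aliasing: every NVE advertising the ES with the flag reset.
    primary = {mac_nve} | {r.nve for r in same_es if not r.single_active}
    # Backup-Path: NVEs advertising the ES with the flag set.
    backup = {r.nve for r in same_es if r.single_active} - primary
    return primary, backup
```

   In All-Active mode, a MAC learned only by pe1 still resolves to
   {pe1, pe2}. In Single-Active mode, it resolves to pe1, with pe2
   installed as the backup path.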

764	8.1.5 DF Election

766	   If a host is multi-homed to two or more NVEs on an Ethernet segment
767	   operating in all-active redundancy mode, then for a given EVI only
768	   one of these NVEs, termed the Designated Forwarder (DF) is
769	   responsible for sending it broadcast, multicast, and, if configured
770	   for that EVI, unknown unicast frames.

772	   This is required in order to prevent duplicate delivery of multi-
773	   destination frames to a multi-homed host or VM, in case of all-active
774	   redundancy.

776	   In NVEs where .1Q tagged frames are received from hosts, the DF
777	   election SHOULD be performed based on host VLAN IDs (VIDs) per
778	   section 8.5 of [RFC7432]. Furthermore, multi-homing PEs of a given
779	   Ethernet Segment MAY perform DF election using configured IDs such as
780	   VNI, EVI, normalized VIDs, etc., as long as the IDs are configured
781	   consistently across the multi-homing PEs.

783	   In GWs where VxLAN encapsulated frames are received, the DF election
784	   is performed on VNIs. Again, it is assumed that for a given Ethernet
785	   Segment, VNIs are unique and consistent (e.g., no duplicate VNIs
786	   exist).
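   The default DF election of section 8.5 of [RFC7432] works as follows.
   The PEs on the Ethernet Segment are ordered by IP address. The PE
   with ordinal i is then the DF for tag V when (V mod N) = i, where N
   is the number of PEs. The Python sketch below shows this. The tag is
   a VID or, at GWs, a VNI as described above. All addresses are assumed
   to be of one address family.

```python
from __future__ import annotations
import ipaddress


def elect_df(pe_addresses: list[str], tag: int) -> str:
    """Service carving: order PEs by IP address; DF is ordinal (tag mod N)."""
    # Sort numerically; sorting the strings would put "192.0.2.10" before "192.0.2.9".
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[tag % len(ordered)]
```

   Every PE runs the same computation on the same inputs and reaches the
   same result. This is why the IDs must be configured consistently
   across the multi-homing PEs.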

788	8.2 Impact on EVPN BGP Routes & Attributes

790	   Since multi-homing is supported in this scenario, the entire set
791	   of BGP routes and attributes defined in [RFC7432] is used. The
792	   setting of the Ethernet Tag field in the MAC Advertisement, Ethernet
793	   AD per EVI, and Inclusive Multicast routes follows that of section
794	   5.1.3. Furthermore, the setting of the VNI field in the MAC
795	   Advertisement and Ethernet AD per EVI routes follows that of section
796	   5.1.3.

798	8.3 Impact on EVPN Procedures

800	   Two cases need to be examined here, depending on whether the NVEs are
801	   operating in Single-Active or in All-Active redundancy mode.

803	   First, let's consider the case of Single-Active redundancy mode, where
804	   the hosts are multi-homed to a set of NVEs, however, only a single
805	   NVE is active at a given point of time for a given VNI. In this case,
806	   the aliasing is not required and the split-horizon filtering may not
807	   be required, but other functions such as multi-homed Ethernet segment
808	   auto-discovery, fast convergence and mass withdraw, backup path, and
809	   DF election are required.

811	   Second, let's consider the case of All-Active redundancy mode. In
812	   this case, out of all the EVPN multi-homing features listed in
813	   section 8.1, the use of the VXLAN or NVGRE encapsulation impacts the
814	   split-horizon and aliasing features, since those two rely on the MPLS
815	   client layer. Given that this MPLS client layer is absent with these
816	   types of encapsulations, alternative procedures and mechanisms are
817	   needed to provide the required functions. Those are discussed in
818	   detail next.

820	8.3.1 Split Horizon

822	   In EVPN, an MPLS label is used for split-horizon filtering to support
823	   All-Active multi-homing where an ingress NVE adds a label
824	   corresponding to the site of origin (aka ESI Label) when
825	   encapsulating the packet. The egress NVE checks the ESI label when
826	   attempting to forward a multi-destination frame out an interface, and
827	   if the label corresponds to the same site identifier (ESI) associated
828	   with that interface, the packet gets dropped. This prevents the
829	   occurrence of forwarding loops.

831	   Since the VXLAN or NVGRE encapsulation does not include this ESI
832	   label, other means of performing the split-horizon filtering function
833	   MUST be devised. The following approach is recommended for split-
834	   horizon filtering when VXLAN (or NVGRE) encapsulation is used.

836	   Every NVE tracks the IP address(es) associated with the other NVE(s)
837	   with which it has shared multi-homed Ethernet Segments. When the NVE
838	   receives a multi-destination frame from the overlay network, it
839	   examines the source IP address in the tunnel header (which
840	   corresponds to the ingress NVE) and filters out the frame on all
841	   local interfaces connected to Ethernet Segments that are shared with
842	   the ingress NVE. With this approach, it is required that the ingress
843	   NVE performs replication locally to all directly attached Ethernet
844	   Segments (regardless of the DF Election state) for all flooded
845	   traffic ingressing from the access interfaces (i.e., from the hosts).
846	   This approach is referred to as "Local Bias", and has the advantage
847	   that only a single IP address needs to be used per NVE for split-
848	   horizon filtering, as opposed to requiring an IP address per Ethernet
849	   Segment per NVE.
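   The "Local Bias" behavior can be sketched in Python as follows.

   On the ingress side, a flooded frame from an access port is
   replicated to every other local port, regardless of DF state.

   On the egress side, a frame arriving from the overlay is handled per
   local port:

   - It is never sent to a segment the local NVE shares with the
     tunnel's source IP.
   - It is sent to other multi-homed segments only if this NVE is the
     DF. Applying DF filtering there is an assumption based on the
     usual DF rules.

   Port names, per-port DF state, and the ES-to-peer map are
   hypothetical inputs.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccessPort:
    name: str
    esi: Optional[str] = None  # None for a single-homed port
    is_df: bool = True         # DF state for this ES and EVI (hypothetical input)


@dataclass
class LocalBiasNve:
    ports: list[AccessPort]
    # ESI -> IP addresses of the other NVEs attached to that ES.
    es_peers: dict[str, set[str]] = field(default_factory=dict)

    def flood_from_access(self, in_port: str) -> list[str]:
        """Local bias: replicate to all other local ports, ignoring DF state."""
        return [p.name for p in self.ports if p.name != in_port]

    def flood_from_overlay(self, tunnel_src_ip: str) -> list[str]:
        """Egress handling of a multi-destination frame from the overlay."""
        out = []
        for p in self.ports:
            if p.esi is not None and tunnel_src_ip in self.es_peers.get(p.esi, set()):
                continue  # ES shared with the ingress NVE: it already delivered locally
            if not p.is_df:
                continue  # DF filtering for ESs not shared with the ingress NVE
            out.append(p.name)
        return out
```

   Only the tunnel source IP is consulted. This is what allows a single
   IP address per NVE to suffice for split-horizon filtering.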

851	   In order to prevent unhealthy interactions between the split horizon
852	   procedures defined in [RFC7432] and the local bias procedures
853	   described in this document, a mix of MPLS over GRE encapsulations on
854	   the one hand and VXLAN/NVGRE encapsulations on the other on a given
855	   Ethernet Segment is prohibited.

857	8.3.2 Aliasing and Backup-Path

859	   The Aliasing and the Backup-Path procedures for VXLAN/NVGRE
860	   encapsulation are very similar to the ones for MPLS. In case of MPLS,
861	   Ethernet A-D route per EVI is used for Aliasing when the
862	   corresponding Ethernet Segment operates in All-Active multi-homing,
863	   and the same route is used for Backup-Path when the corresponding
864	   Ethernet Segment operates in Single-Active multi-homing. In case of
865	   VxLAN/NVGRE, the same route is used for the Aliasing and the Backup-
866	   Path with the difference that the Ethernet Tag and VNI fields in
867	   the Ethernet A-D per EVI route are set as described in section 5.1.3.

869	8.3.3 Unknown Unicast Traffic Designation

871	   In EVPN, when an ingress PE uses ingress replication to flood unknown
872	   unicast traffic to egress PEs, the ingress PE uses a different EVPN
873	   MPLS label (from the one used for known unicast traffic) to identify
874	   such BUM traffic. The egress PEs use this label to identify such BUM
875	   traffic and thus apply DF filtering for All-Active multi-homed sites.
876	   In the absence of unknown unicast traffic designation, and when
877	   unknown unicast flooding is enabled, there can be transient duplicate
878	   traffic to All-Active multi-homed sites under the following
879	   condition: the host MAC address is learned by the egress PE(s) and
880	   advertised to the ingress PE; however, the MAC advertisement has not
881	   been received or processed by the ingress PE, resulting in the host
882	   MAC address being unknown on the ingress PE but known on the
883	   egress PE(s). Therefore, when a packet destined to that host MAC
884	   address arrives on the ingress PE, the ingress PE floods it via
885	   ingress replication to all the egress PE(s); since the MAC address is
886	   known to the egress PE(s), each of them forwards it as known unicast
887	   (i.e., without DF filtering), and multiple copies are sent to the
888	   All-Active multi-homed site. It should be noted that such transient
889	   packet duplication only happens when a) the destination host is
890	   multi-homed via All-Active redundancy mode, b) flooding of unknown
891	   unicast is enabled in the network, c) ingress replication is used,
892	   and d) traffic for the destination host arrives on the ingress PE
893	   before it learns the host MAC address via BGP EVPN advertisement. In
894	   order to prevent such packet duplication (however low its probability
895	   may be), the ingress PE MAY use a flag bit in the VxLAN header to
896	   indicate the BUM traffic type. Bit 6 of the flag field in the VxLAN
897	   header is used for this purpose per section 3.1 of [VXLAN-GPE].
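   The Python sketch below shows how an ingress PE might set this flag
   and how an egress PE might test it. It assumes the following
   VXLAN-GPE first-octet layout, with bits numbered from the most
   significant bit (bit 0):

      |R|R|Ver|I|P|B|O|

   Under this layout, bit 6 (the B bit) has mask 0x02. The sketch also
   assumes Next Protocol value 0x03 (Ethernet). Both assumptions reflect
   our reading of the VXLAN-GPE draft, and later revisions may differ.

```python
import struct

# Flag bits of the first header octet, numbered from the MSB (bit 0).
I_BIT, P_BIT, B_BIT = 4, 5, 6
NEXT_PROTOCOL_ETHERNET = 0x03


def mask(bit: int) -> int:
    return 0x80 >> bit


def build_gpe_header(vni: int, bum: bool) -> bytes:
    """8 octets: flags, 2 reserved, Next Protocol, 24-bit VNI, 1 reserved."""
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits")
    flags = mask(I_BIT) | mask(P_BIT)  # VNI valid, Next Protocol present
    if bum:
        flags |= mask(B_BIT)           # bit 6: BUM (e.g., flooded unknown unicast)
    return (struct.pack("!B2xB", flags, NEXT_PROTOCOL_ETHERNET)
            + vni.to_bytes(3, "big") + b"\x00")


def is_bum(header: bytes) -> bool:
    """Egress check: apply DF filtering only when the B bit is set."""
    return bool(header[0] & mask(B_BIT))
```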

898	9 Support for Multicast

899	   The EVPN Inclusive Multicast Ethernet Tag (IMET) route is used to
900	   discover the multicast tunnels among the endpoints associated with a
901	   given EVI (e.g., a given VNI) for VLAN-based service and a given
902	   <EVI,VLAN> for VLAN-aware bundle service. All fields of this route are
903	   set as described in section 5.1.3. The Originating Router's IP
904	   address field is set to the NVE's IP address. This route is tagged
905	   with the PMSI Tunnel attribute, which is used to encode the type of
906	   multicast tunnel to be used as well as the multicast tunnel
907	   identifier. The tunnel encapsulation is encoded by adding the BGP
908	   Encapsulation extended community as per section 5.1.1. For example,
909	   the PMSI Tunnel attribute may indicate the multicast tunnel is of
910	   type PIM-SM; whereas, the BGP Encapsulation extended community may
911	   indicate the encapsulation for that tunnel is of type VxLAN. The
912	   following tunnel types as defined in [RFC6514] can be used in the
913	   PMSI tunnel attribute for VXLAN/NVGRE:

915	         + 3 - PIM-SSM Tree
916	         + 4 - PIM-SM Tree
917	         + 5 - BIDIR-PIM Tree
918	         + 6 - Ingress Replication

920	   Except for Ingress Replication, this multicast tunnel is used by the
921	   PE originating the route for sending multicast traffic to other PEs,
922	   and is used by PEs that receive this route for receiving the traffic
923	   originated by hosts connected to the PE that originated the route.

925	   In the scenario where the multicast tunnel is a tree, both the
926	   Inclusive as well as the Aggregate Inclusive variants may be used. In
927	   the former case, a multicast tree is dedicated to a VNI, whereas in
928	   the latter, a multicast tree is shared among multiple VNIs. For VNI-
929	   based service, the Aggregate Inclusive mode is accomplished by having
930	   the NVEs advertise multiple IMET routes with different Route Targets
931	   (one per VNI) but with the same tunnel identifier encoded in the PMSI
932	   tunnel attribute. For VNI-aware bundle service, the Aggregate
933	   Inclusive mode is accomplished by having the NVEs advertise multiple
934	   IMET routes with different VNIs encoded in the Ethernet Tag field, but
935	   with the same tunnel identifier encoded in the PMSI Tunnel attribute.
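   The Python sketch below shows how a receiver could see which IMET
   routes share a multicast tree, that is, which VNIs use Aggregate
   Inclusive mode. The IMET route model is simplified: Route Targets
   and the Ethernet Tag are folded into a single VNI value, and the
   tunnel identifier is reduced to a string such as a multicast group.
   Ingress Replication is skipped because it is not a tree.

```python
from __future__ import annotations
from dataclasses import dataclass

# PMSI tunnel types from [RFC6514] listed above for VXLAN/NVGRE.
PMSI_TYPES = {3: "PIM-SSM Tree", 4: "PIM-SM Tree", 5: "BIDIR-PIM Tree",
              6: "Ingress Replication"}
INGRESS_REPLICATION = 6


@dataclass(frozen=True)
class ImetRoute:
    vni: int
    tunnel_type: int
    tunnel_id: str  # simplified tunnel identifier, e.g. a multicast group


def trees_by_tunnel(routes: list[ImetRoute]) -> dict[tuple[int, str], set[int]]:
    """Group VNIs by tree; a tree carrying several VNIs is Aggregate Inclusive."""
    trees: dict[tuple[int, str], set[int]] = {}
    for r in routes:
        if r.tunnel_type not in PMSI_TYPES:
            raise ValueError(f"unsupported PMSI tunnel type {r.tunnel_type}")
        if r.tunnel_type == INGRESS_REPLICATION:
            continue  # not a tree
        trees.setdefault((r.tunnel_type, r.tunnel_id), set()).add(r.vni)
    return trees
```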

937	10 Data Center Interconnections - DCI

939	   For DCI, the following two main scenarios are considered when
940	   connecting data centers running EVPN overlay (as described here) over
941	   an MPLS/IP core network:

943	   - Scenario 1: DCI using GWs
944	   - Scenario 2: DCI using ASBRs

945	   The following two subsections describe the operations for each of
946	   these scenarios.

948	10.1 DCI using GWs

950	   This is the typical scenario for interconnecting data centers over
951	   a WAN. In this scenario, EVPN routes are terminated and processed in
952	   each GW and MAC/IP routes are always re-advertised from DC to WAN but
953	   from WAN to DC, they are not re-advertised if unknown MAC address
954	   (and default IP address) are utilized in NVEs. In this scenario, each
955	   GW maintains a MAC-VRF (and/or IP-VRF) for each EVI. The main
956	   advantage of this approach is that NVEs do not need to maintain MAC
957	   and IP addresses from any remote data centers when default IP route
958	   and unknown MAC routes are used - i.e., they only need to maintain
959	   routes that are local to their own DC. When default IP route and
960	   unknown MAC route are used, any unknown IP and MAC packets from NVEs
961	   are forwarded to the GWs where all the VPN MAC and IP routes are
962	   maintained. This approach reduces the size of MAC-VRF and IP-VRF
963	   significantly at NVEs. Furthermore, it results in a faster
964	   convergence time upon a link or NVE failure in a multi-homed network
965	   or device redundancy scenario, because the failure-related BGP routes
966	   (such as mass withdraw messages) do not need to get propagated all the
967	   way to the remote NVEs in the remote DCs. This approach is described
968	   in detail in section 3.4 of [DCI-EVPN-OVERLAY].

970	10.2 DCI using ASBRs

972	   This approach can be considered as the opposite of the first approach
973	   and it favors simplification at DCI devices over NVEs such that
974	   larger MAC-VRF (and IP-VRF) tables need to be maintained on NVEs;
975	   whereas, DCI devices don't need to maintain any MAC (and IP)
976	   forwarding tables. Furthermore, DCI devices do not need to terminate
977	   and process routes related to multi-homing but rather relay these
978	   messages for the establishment of an end-to-end LSP. In other
979	   words, DCI devices in this approach operate similarly to ASBRs for
980	   inter-AS option B - section 10 of [RFC4364]. This requires locally
981	   assigned VNIs to be used just like downstream assigned MPLS VPN label
982	   where for all practical purposes the VNIs function like 24-bit VPN
983	   labels. This approach is equally applicable to data centers (or
984	   Carrier Ethernet networks) with MPLS encapsulation.

986	   In inter-AS option B, when an ASBR receives an EVPN route from its
987	   DC over iBGP and re-advertises it to other ASBRs, it re-advertises the
988	   EVPN route by re-writing the BGP next hop to itself, thus losing the
989	   identity of the PE that originated the advertisement. This re-write
990	   of BGP next-hop impacts the EVPN Mass Withdraw route (Ethernet A-D
991	   per ES) and its procedure adversely. However, it does not impact EVPN
992	   Aliasing mechanism/procedure because when the Aliasing routes (Ether
993	   A-D per EVI) are advertised, the receiving PE first resolves a MAC
994	   address for a given EVI into its corresponding <ES,EVI> and
995	   subsequently, it resolves the <ES,EVI> into multiple paths (and their
996	   associated next hops) via which the <ES,EVI> is reachable. Since
997	   Aliasing and MAC routes are both advertised on a per-EVI basis and they
998	   use the same RD and RT (per EVI), the receiving PE can associate them
999	   together on a per BGP path basis (e.g., per originating PE) and thus
1000	   perform recursive route resolution - e.g., a MAC is reachable via an
1001	   <ES,EVI> which in turn, is reachable via a set of BGP paths, thus the
1002	   MAC is reachable via the set of BGP paths. Since on a per EVI basis,
1003	   the association of MAC routes and the corresponding Aliasing route is
1004	   fixed and determined by the same RD and RT, there is no ambiguity
1005	   when the BGP next hop for these routes is re-written as these routes
1006	   pass through ASBRs - i.e., the receiving PE may receive multiple
1007	   Aliasing routes for the same EVI from a single next hop (a single
1008	   ASBR), and it can still create multiple paths toward that <ES, EVI>.
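   The recursive resolution described above can be sketched in Python.
   The key point is that the path list for <ES,EVI> contains one path
   per received Aliasing route, distinguished by the route's RD, rather
   than one path per BGP next hop. Two PEs reached through the same ASBR
   therefore remain two paths.

   Each path is assumed to carry the VNI or label as received, which may
   have been reassigned by the ASBR. The RD strings and names are
   hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class AliasingRoute:  # Ethernet A-D per EVI route as seen by the receiving PE
    rd: str           # RD of the originating PE, unchanged by the ASBRs
    esi: str
    evi: int
    next_hop: str     # rewritten to the ASBR by inter-AS option B
    label: int        # VNI or MPLS label for this path, as received


def resolve_paths(esi: str, evi: int, routes: list[AliasingRoute]) -> list[tuple[str, int]]:
    """<ES,EVI> -> path list: one path per BGP route (RD), not per next hop."""
    per_rd = {r.rd: (r.next_hop, r.label) for r in routes if (r.esi, r.evi) == (esi, evi)}
    return [per_rd[rd] for rd in sorted(per_rd)]


def resolve_mac(mac_to_es_evi: dict[str, tuple[str, int]], mac: str,
                routes: list[AliasingRoute]) -> list[tuple[str, int]]:
    """MAC -> <ES,EVI> -> set of BGP paths."""
    esi, evi = mac_to_es_evi[mac]
    return resolve_paths(esi, evi, routes)
```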

1010	   However, when the BGP next hop address corresponding to the
1011	   originating PE is re-written, the association between the Mass
1012	   Withdraw route (Ether A-D per ES) and its corresponding MAC routes
1013	   cannot be made based on their RDs and RTs because the RD for Mass
1014	   Withdraw route is different than the one for the MAC routes.
1015	   Therefore, the functionality needed at the ASBRs and the receiving
1016	   PEs depends on whether the Mass Withdraw route is originated and
1017	   whether there is a need to handle route resolution ambiguity for this
1018	   route. The following two subsections describe the functionality
1019	   needed by the ASBRs and the receiving PEs depending on whether the
1020	   NVEs reside in hypervisors or in ToRs.

1022	10.2.1 ASBR Functionality with Single-Homing NVEs

1024	   When NVEs reside in hypervisors as described in section 7.1, there is
1025	   no multi-homing and thus there is no need for the originating NVE to
1026	   send Ethernet A-D per ES or Ethernet A-D per EVI routes. However, as
1027	   noted in section 7, in order to enable a single-homing ingress NVE to
1028	   take advantage of fast convergence, aliasing, and backup-path when
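   The following Python sketch reproduces this effect. The BGP table is
   keyed by RD, while the L2 RIB is assumed to hold only (ESI, BGP next
   hop) pairs. The RD strings are hypothetical.

```python
from __future__ import annotations


def l2_rib(ad_per_es: dict[str, tuple[str, str]]) -> set[tuple[str, str]]:
    """Project Ethernet A-D per ES routes (keyed by RD) onto (ESI, next hop)."""
    return set(ad_per_es.values())


# PE3's BGP table: two routes, distinct RDs, identical (ESI, next hop) after ASBR rewrite.
bgp = {"PE1-RD": ("ES1", "ASBR2"), "PE2-RD": ("ES1", "ASBR2")}
before = l2_rib(bgp)
del bgp["PE2-RD"]  # PE2 withdraws its Ethernet A-D per ES route
after = l2_rib(bgp)
print(before == after)  # True: the L2 RIB is unchanged, so no mass withdraw

# The same event within one AS, where the next hops are not rewritten:
bgp_same_as = {"PE1-RD": ("ES1", "PE1"), "PE2-RD": ("ES1", "PE2")}
before_same_as = l2_rib(bgp_same_as)
del bgp_same_as["PE2-RD"]
print(before_same_as == l2_rib(bgp_same_as))  # False: PE2 is removed for ES1
```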
1029	   interacting with multi-homing egress NVEs attached to a given
1030	   Ethernet segment, the single-homing NVE SHOULD be able to receive and
1031	   process Ethernet AD per ES and Ethernet AD per EVI routes. The
1032	   handling of these routes is described in the next section.

1034	10.2.2 ASBR Functionality with Multi-Homing NVEs

1036	   When NVEs reside in ToRs and operate in multi-homing redundancy mode,
1037	   then as described in section 8, there is a need for the originating
1038	   multi-homing NVE to send Ethernet A-D per ES route(s) (used for mass
1039	   withdraw) and Ethernet A-D per EVI routes (used for aliasing). As
1040	   described above, the re-write of BGP next-hop by ASBRs creates
1041	   ambiguities when Ethernet A-D per ES routes are received by the
1042	   remote NVE in a different AS because the receiving NVE cannot
1043	   associate that route with the MAC/IP routes of that Ethernet Segment
1044	   advertised by the same originating NVE. This ambiguity inhibits the
1045	   function of mass-withdraw per ES by the receiving NVE in a different
1046	   AS.

1048	   As an example, consider a scenario where a CE is multi-homed to PE1
1049	   and PE2, where these PEs are connected via ASBR1 and then ASBR2 to
1050	   the remote PE3. Furthermore, consider that PE1 receives M1 from the
1051	   CE but PE2 does not. Therefore, PE1 advertises Eth A-D per ES1, Eth
1052	   A-D per EVI1, and M1, whereas PE2 only advertises Eth A-D per ES1
1053	   and Eth A-D per EVI1. ASBR1 receives all five advertisements and passes them to
1054	   ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them
1055	   to the remote PE3 with itself as the BGP next hop. PE3 receives these
1056	   five routes where all of them have the same BGP next-hop (i.e.,
1057	   ASBR2). Furthermore, the two Ether A-D per ES routes received by PE3
1058	   have the same info - i.e., same ESI and the same BGP next hop.
1059	   Although both of these routes are maintained by the BGP process in
1060	   PE3 (because they have different RDs and thus treated as different
1061	   BGP routes), information from only one of them is used in the L2
1062	   routing table (L2 RIB).

1064	                      PE1
1065	                     /   \
1066	                    CE     ASBR1---ASBR2---PE3
1067	                     \   /
1068	                      PE2

1070	                      Figure 1: Inter-AS Option B
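   The per-EVI withdrawal above can be sketched in Python. All MACs of
   an <ES, EVI, BD> share one path list, so removing a single path moves
   all of them at once.

   Because every path arrives with the ASBR as its next hop, the sketch
   assumes the receiving PE identifies the withdrawing PE by the RD of
   its Ethernet AD Per EVI route. An example would be a Type 1 RD
   containing the PE's address. The class name and RD strings are
   hypothetical.

```python
from __future__ import annotations
from collections import defaultdict


class PerEviPathLists:
    """Path lists shared by all MACs of an <ES, EVI, BD> (simplified)."""

    def __init__(self) -> None:
        # <ES, EVI, BD> -> {originating PE's RD: (BGP next hop, label)}
        self.paths: dict[tuple, dict[str, tuple[str, int]]] = defaultdict(dict)
        self.mac_key: dict[str, tuple] = {}

    def add_ad_per_evi(self, key: tuple, rd: str, next_hop: str, label: int) -> None:
        self.paths[key][rd] = (next_hop, label)

    def add_mac(self, mac: str, key: tuple) -> None:
        self.mac_key[mac] = key

    def withdraw_ad_per_evi(self, key: tuple, rd: str) -> None:
        # One update to the shared path list affects every MAC of this key.
        self.paths[key].pop(rd, None)

    def next_hops(self, mac: str) -> list[tuple[str, int]]:
        return sorted(self.paths.get(self.mac_key[mac], {}).values())
```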

1072	   Now, when the AC between the PE2 and the CE fails and PE2 sends NLRI
1073	   withdrawal for Ether A-D per ES route and this withdrawal gets
1074	   propagated and received by the PE3, the BGP process in PE3 removes
1075	   the corresponding BGP route; however, it doesn't remove the
1076	   associated info (namely ESI and BGP next hop) from the L2 routing
1077	   table (L2 RIB) because it still has the other Ether A-D per ES route
1078	   (originated from PE1) with the same info. That is why the mass-
1079	   withdraw mechanism does not work when doing DCI with inter-AS option
1080	   B. However, as described previously, the aliasing function works and
1081	   so does "mass-withdraw per EVI" (which is associated with withdrawing
1082	   the EVPN route associated with Aliasing - i.e., Ether A-D per EVI
1083	   route).

1085	   In the above example, the PE3 receives two Aliasing routes with the
1086	   same BGP next hop (ASBR2) but different RDs. One of the Aliasing routes
1087	   has the same RD as the advertised MAC route (M1). PE3 follows the
1088	   route resolution procedure specified in [RFC7432] upon receiving the
1089	   two Aliasing routes - i.e., it resolves M1 to <ES, EVI1> and
1090	   subsequently it resolves <ES,EVI1> to a BGP path list with two paths
1091	   along with the corresponding VNIs/MPLS labels (one associated with
1092	   PE1 and the other associated with PE2). It should be noted that even
1093	   though both paths are advertised by the same BGP next hop (ASBR2),
1094	   the receiving PE3 can handle them properly. Therefore, M1 is
1095	   reachable via two paths. This creates two end-to-end LSPs, from PE3
1096	   to PE1 and from PE3 to PE2, for M1 such that when PE3 wants to
1097	   forward traffic destined to M1, it can load-balance between the two
1098	   LSPs. Although route resolution for Aliasing routes with the same BGP
1099	   next hop is not explicitly mentioned in [RFC7432], this is the
1100	   expected operation and thus it is elaborated here.

1102	   When the AC between the PE2 and the CE fails and PE2 sends NLRI
1103	   withdrawal for Ether A-D per EVI routes and these withdrawals get
1104	   propagated and received by the PE3, the PE3 removes the Aliasing
1105	   route and updates the path list - i.e., it removes the path
1106	   corresponding to the PE2. Therefore, all the corresponding MAC routes
1107	   for that <ES,EVI> that point to that path list will now have the
1108	   updated path list with a single path associated with PE1. This action
1109	   can be considered as the mass-withdraw at the per-EVI level. The
1110	   mass-withdraw at per-EVI level has longer convergence time than the
1111	   mass-withdraw at per-ES level; however, it is much faster than the
1112	   convergence time when the withdraw is done on a per-MAC basis.

   If a PE becomes detached from a given ES, then in addition to
   withdrawing its previously advertised Ethernet AD Per ES routes, it
   MUST also withdraw its previously advertised Ethernet AD Per EVI
   routes for that ES.  For a remote PE that is separated from the
   withdrawing PE by one or more EVPN inter-AS option B ASBRs, the
   withdrawal of the Ethernet AD Per ES routes is not actionable.
   However, a remote PE is able to correlate a previously advertised
   Ethernet AD Per EVI route with any MAC/IP Advertisement routes also
   advertised by the withdrawing PE for that <ES, EVI, BD>.  Hence, when
   it receives the withdrawal of an Ethernet AD Per EVI route, it SHOULD
   remove the withdrawing PE as a next-hop for all MAC addresses
   associated with that <ES, EVI, BD>.
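   The receiving-PE behavior described above, including the fallback
   from per-ES to per-EVI mass-withdraw across inter-AS option B, can
   be sketched in Python as below. The function name, the
   via_option_b flag, and the next_hops table keyed by <ES, EVI, BD>
   are hypothetical modeling choices; the sketch only illustrates which
   withdrawal a remote PE acts on.

```python
from enum import Enum


class RouteType(Enum):
    AD_PER_ES = "Ethernet A-D per ES"
    AD_PER_EVI = "Ethernet A-D per EVI"


def on_withdraw(route_type, via_option_b, pe, es, evi, bd, next_hops):
    """Process the withdrawal of an Ethernet A-D route at a remote PE.

    next_hops maps (es, evi, bd) -> set of PEs currently used as next
    hops for the MAC addresses of that <ES, EVI, BD>; it is updated in
    place. Returns a label describing the action taken.
    """
    if route_type is RouteType.AD_PER_ES:
        if via_option_b:
            # Learned across an inter-AS option B ASBR: not actionable.
            return "ignored"
        # Same AS: mass-withdraw pe for every <EVI, BD> on this ES.
        for (key_es, _evi, _bd), pes in next_hops.items():
            if key_es == es:
                pes.discard(pe)
        return "per-ES"
    # Per-EVI withdrawal: correlate with the MAC/IP routes of <ES, EVI, BD>.
    next_hops.get((es, evi, bd), set()).discard(pe)
    return "per-EVI"


# PE3's view in the example: PE2's routes arrive through ASBR2 (option B).
nh = {("ES1", 1, 1): {"PE1", "PE2"}}
print(on_withdraw(RouteType.AD_PER_ES, True, "PE2", "ES1", 1, 1, nh))   # ignored
print(on_withdraw(RouteType.AD_PER_EVI, True, "PE2", "ES1", 1, 1, nh))  # per-EVI
print(nh)  # {('ES1', 1, 1): {'PE1'}}
```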

   In the previous example, when the AC between PE2 and the CE fails,
   PE2 will withdraw its Ethernet AD Per ES and Per EVI routes.  When
   PE3 receives the withdrawal of an Ethernet AD Per EVI route, it
   removes PE2 as a valid next-hop for all MAC addresses associated with
   the corresponding <ES, EVI, BD>.  Therefore, all the MAC next-hops
   for that <ES, EVI, BD> will now have a single next-hop, namely the
   LSP to PE1.

   In summary, aliasing (and backup path) functionality should work as
   is for inter-AS option B without requiring any additional
   functionality in ASBRs or PEs. However, the mass-withdraw
   functionality falls back from per-ES mode to per-EVI mode for
   inter-AS option B, i.e., PEs receiving mass-withdraw routes from the
   same AS take action on Ethernet A-D per ES routes, whereas PEs
   receiving mass-withdraw routes from a different AS take action on
   Ethernet A-D per EVI routes.

11  Acknowledgement

   The authors would like to thank Aldrin Isaac, David Smith, John
   Mullooly, and Thomas Nadeau for their valuable comments and feedback.
   The authors would also like to thank Jakob Heitz for his contribution
   on section 10.2.

12  Security Considerations

   This document uses IP-based tunnel technologies to support data
   plane transport.  Consequently, the security considerations of those
   tunnel technologies apply.  This document defines support for VXLAN
   [RFC7348] and NVGRE [NVGRE] encapsulations.  The security
   considerations from those documents, as well as [RFC4301], apply to
   the data plane aspects of this document.

   As with [RFC5512], any modification of the information that is used
   to form encapsulation headers, to choose a tunnel type, or to choose
   a particular tunnel for a particular payload type may lead to user
   data packets getting misrouted, misdelivered, and/or dropped.

   More broadly, the security considerations for the transport of IP
   reachability information using BGP are discussed in [RFC4271] and
   [RFC4272], and are equally applicable for the extensions described
   in this document.

   If the integrity of the BGP session is not itself protected, then an
   imposter could mount a denial-of-service attack by establishing
   numerous BGP sessions and forcing an IPsec SA to be created for each
   one.  However, as such an imposter could wreak havoc on the entire
   routing system, this particular sort of attack is probably not of
   any special importance.

   It should be noted that a BGP session may itself be transported over
   an IPsec tunnel.  Such IPsec tunnels can provide additional security
   to a BGP session.  The management of such IPsec tunnels is outside
   the scope of this document.

13  IANA Considerations

   IANA has allocated the following BGP Tunnel Encapsulation Attribute
   Tunnel Types:

   8        VXLAN Encapsulation
   9        NVGRE Encapsulation
   10       MPLS Encapsulation
   11       MPLS in GRE Encapsulation
   12       VXLAN GPE Encapsulation
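   A receiver parsing the Tunnel Type field could map the code points
   above with a simple lookup table, as in the following hypothetical
   Python sketch. The dictionary and function names are illustrative;
   only the five values listed in this section are included, and any
   other value is reported as unknown here.

```python
# BGP Tunnel Encapsulation Attribute Tunnel Types listed in this section.
TUNNEL_TYPES = {
    8: "VXLAN Encapsulation",
    9: "NVGRE Encapsulation",
    10: "MPLS Encapsulation",
    11: "MPLS in GRE Encapsulation",
    12: "VXLAN GPE Encapsulation",
}


def tunnel_type_name(code):
    """Return the name for a Tunnel Type code point, or mark it unknown."""
    return TUNNEL_TYPES.get(code, f"unknown ({code})")


print(tunnel_type_name(8))   # VXLAN Encapsulation
print(tunnel_type_name(99))  # unknown (99)
```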

14  References

14.1  Normative References

   [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border
              Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC5512]  Mohapatra, P. and E. Rosen, "The BGP Encapsulation
              Subsequent Address Family Identifier (SAFI) and the BGP
              Tunnel Encapsulation Attribute", RFC 5512, April 2009.

   [RFC7432]  Sajassi, A., Ed., et al., "BGP MPLS-Based Ethernet VPN",
              RFC 7432, February 2015.

14.2  Informative References

   [RFC7209]  Sajassi, A., et al., "Requirements for Ethernet VPN
              (EVPN)", RFC 7209, May 2014.

   [RFC7348]  Mahalingam, M., et al., "VXLAN: A Framework for Overlaying
              Virtualized Layer 2 Networks over Layer 3 Networks",
              RFC 7348, August 2014.

   [RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
              RFC 4272, January 2006.

   [NVGRE]    Garg, P., et al., "NVGRE: Network Virtualization using
              Generic Routing Encapsulation", RFC 7637, September 2015.

   [Problem-Statement]
              Narten, T., et al., "Problem Statement: Overlays for
              Network Virtualization", RFC 7364, October 2014.

   [NVO3-FRWK]
              Lasserre, M., et al., "Framework for DC Network
              Virtualization", RFC 7365, October 2014.

   [DCI-EVPN-OVERLAY]
              Rabadan, J., et al., "Interconnect Solution for EVPN
              Overlay networks", draft-ietf-bess-dci-evpn-overlay-04,
              work in progress, February 29, 2016.

   [TUNNEL-ENCAP]
              Rosen, E., et al., "The BGP Tunnel Encapsulation
              Attribute", draft-ietf-idr-tunnel-encaps-03, work in
              progress, May 31, 2016.

   [VXLAN-GPE]
              Maino, F., et al., "Generic Protocol Extension for VXLAN",
              draft-ietf-nvo3-vxlan-gpe-03, work in progress,
              October 25, 2016.

   [RFC4364]  Rosen, E., et al., "BGP/MPLS IP Virtual Private Networks
              (VPNs)", RFC 4364, February 2006.

   [RFC4023]  Worster, T., et al., "Encapsulating MPLS in IP or Generic
              Routing Encapsulation (GRE)", RFC 4023, March 2005.

   [RFC6514]  Aggarwal, R., et al., "BGP Encodings and Procedures for
              Multicast in MPLS/BGP IP VPNs", RFC 6514, February 2012.

Contributors

   S. Salam
   K. Patel
   D. Rao
   S. Thoria
   D. Cai
   Cisco

   Y. Rekhter
   A. Issac
   Wen Lin
   Nischal Sheth
   Juniper

   L. Yong
   Huawei

Authors' Addresses

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com

   John Drake
   Juniper Networks
   Email: jdrake@juniper.net

   Nabil Bitar
   Nokia
   Email: nabil.bitar@nokia.com

   R. Shekhar
   Juniper
   Email: rshekhar@juniper.net

   James Uttaro
   AT&T
   Email: uttaro@att.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@nokia.com
1299	   e-mail: wim.henderickx@nokia.com