idnits 2.17.1 

draft-mackenzie-bess-evpn-l3mh-proto-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  == There are 9 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'SHOULD not' in this paragraph:
     
     Local ARP/ND learning will trigger a RT-2 route sync to any peer
     PE. There is no need for local MAC learning or sync over the L3
     interface, only adjacencies.  The MAC-only RT-2 route SHOULD not be
     advertised to peer PE.

  -- The document date (July 11, 2021) is 1018 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-21) exists of
     draft-ietf-bess-evpn-igmp-mld-proxy-09

  == Outdated reference: A later version (-06) exists of
     draft-sajassi-bess-evpn-ac-aware-bundling-03


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	BESS Working Group                                     M. MacKenzie, Ed.
3	Internet-Draft                                              P. Brissette
4	Intended status: Standards Track                                   Cisco
5	Expires: January 12, 2022                                  S. Matsushima
6	                                                                Softbank
7	                                                           July 11, 2021

9	               EVPN multi-homing support for L3 services
10	                draft-mackenzie-bess-evpn-l3mh-proto-00

12	Abstract

14	   This document describes how the higher network availability and
15	   load-balancing benefits of EVPN Multi-Chassis Link Aggregation
16	   Group (MC-LAG) multi-homing can be extended to various L3 services
17	   delivered by EVPN.

19	Requirements Language

21	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
22	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
23	   document are to be interpreted as described in RFC 2119 [RFC2119] and
24	   RFC 8174 [RFC8174].

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on January 12, 2022.

43	Copyright Notice

45	   Copyright (c) 2021 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (https://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	     1.1.  Problems with unicast load-balancing from core to CE
62	     1.2.  Problems with multicast from core to CE
63	     1.3.  Problems with IGP adjacencies over the LAG port
64	     1.4.  Problems with supporting multiple subnets on same ES in
65	           all active mode
66	     1.5.  Acronyms
67	     1.6.  Requirements
68	   2.  Solution
69	     2.1.  Mapping of L3VRF to EVPN EVI
70	     2.2.  Mapping for L3 Interface to ESI
71	     2.3.  Mapping for L3 Sub-Interface to Attachment Circuit ID
72	     2.4.  Route sync for ARP/ND
73	       2.4.1.  Local adjacency (ARP/ND) learning
74	       2.4.2.  Remote ARP/ND learning
75	     2.5.  Route sync for IGMP
76	       2.5.1.  Local IGMP Join/Leave learning
77	       2.5.2.  Remote IGMP Join/Leave learning
78	     2.6.  Customer Subnet Route sync using Route-type(5)
79	     2.7.  Mapping for VLAN to ETAG
80	   3.  Extensions to RT-2, RT-5, RT-7 and RT-8
81	   4.  Convergence Considerations
82	   5.  Overall Advantages
83	   6.  Security Considerations
84	   7.  IANA Considerations
85	   8.  References
86	     8.1.  Normative References
87	     8.2.  Informative References
88	   Authors' Addresses

90	1.  Introduction

92	   Resilient L3VPN service to a CE requires multiple service PEs to run
93	   a MC-LAG mechanism, which previously required a proprietary ICL
94	   control plane link between them.

96	   This proposed extension to [RFC7432] brings EVPN-based MC-LAG all-
97	   active multi-homing load-balancing to various services (L2 and L3)
98	   delivered by EVPN.  Although this solution is also applicable to some
99	   L2 service use cases (for example, Centralized Gateway), this
100	   document focuses on the L3VPN [RFC4364] use case to provide examples.

102	   EVPN MC-LAG is completely transparent to a CE device, and provides
103	   link and node level redundancy with load-balancing using the existing
104	   BGP control plane required by the L3 services.

106	   For example, the L3VPN service can be MPLS, VXLAN or SRv6 based, and
107	   does not require EVPN signaling to remote neighbors.  The EVPN
108	   signaling will be limited to the redundant service PEs sharing an
109	   Ethernet Segment Identifier (ESI).  This will be used to synchronize
110	   ARP/ND, multicast Join/Leave, and IGP routes, replacing the need for
111	   an ICL.

113	                       +-----+
114	                       | PE3 |
115	                       +-----+
116	                    +-----------+
117	                    |  MPLS/IP  |
118	                    |  CORE     |
119	                    +-----------+
120	                  +-----+   +-----+
121	                  | PE1 |   | PE2 |
122	                  +-----+   +-----+
123	                     |         |
124	                     I1       I2
125	                       \     /
126	                        \   /
127	                        +---+
128	                        |CE1|
129	                        +---+

131	                      Figure 1: EVPN MC-LAG Topology

133	   Figure 1 shows a MC-LAG multi-homing topology where PE1 and PE2 are
134	   part of the same redundancy group providing multi-homing to CE1 via
135	   interfaces I1 and I2.  Interfaces I1 and I2 are Bundle-Ethernet
136	   interfaces running LACP.  The CE device can be a layer-2 or layer-3
137	   device connecting to the redundant PEs over a single LACP LAG port.
138	   In the case of a layer-3 CE device, this document addresses the case
139	   of an IGP adjacency between the PEs and the CE; further study is
140	   needed to support BGP PE-to-CE protocols.  The core, shown as IP or
141	   MPLS enabled, provides a wide range of L3 services.  MC-LAG
142	   multi-homing functionality is decoupled from those services in the
143	   core and focuses on providing multi-homing to the CE.

145	   To deliver resilient layer-3 services and provide traffic load-
146	   balancing towards the access, the two service PEs will advertise
147	   layer-3 reachability towards the layer-3 core and will both be
148	   eligible to receive traffic and forward it towards the access.

150	1.1.  Problems with unicast load-balancing from core to CE

152	   The layer-2 hashing performed by the CE over its LAG port means that
153	   it is possible for only one service PE to populate its ARP/ND cache.
154	   Take for example PE1 and PE2 from Figure 1.  If CE1's ARP/ND
155	   responses always happen to hash over I1 towards PE1, then PE2's
156	   ARP/ND table will be empty.  Since unicast traffic from remote PEs
157	   can be received by either service PE, traffic that reaches PE2 will
158	   not find an ARP/ND entry matching the host IP address and will be
159	   dropped until ARP/ND resolves the adjacency.

161	   If the CE's hash implementation always directs the ARP/ND response
162	   towards PE1, the resolution on PE2 will never happen and traffic
163	   load-balanced to PE2 will be black-holed.

165	   The route sync solution is described in Section 2.4.
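
   The failure mode described in this section can be illustrated with a
   small simulation.  This is a sketch only: the CRC32-based hash, the
   gateway MAC assumed to be shared by both PEs, and the MAC and IP
   values are assumptions made for illustration and are not specified
   by this document.

```python
import zlib

# LAG member links from Figure 1 and the PE each one terminates on.
LINKS = ["I1", "I2"]
LINK_TO_PE = {"I1": "PE1", "I2": "PE2"}

# Hypothetical addresses: the CE, and a gateway MAC assumed to be
# shared by both PEs on the bundle.
CE_MAC = "00:00:5e:00:53:01"
GW_MAC = "00:00:5e:00:53:fe"
CE_IP = "192.0.2.10"


def pick_link(src_mac, dst_mac):
    """Hypothetical CE hash: CRC32 of the MAC pair modulo the link count."""
    key = (src_mac + "-" + dst_mac).encode()
    return LINKS[zlib.crc32(key) % len(LINKS)]


def learn_arp(replies):
    """Deliver `replies` ARP responses from the CE; return each PE's cache."""
    caches = {"PE1": set(), "PE2": set()}
    for _ in range(replies):
        link = pick_link(CE_MAC, GW_MAC)
        caches[LINK_TO_PE[link]].add(CE_IP)
    return caches


caches = learn_arp(100)
for pe, cache in sorted(caches.items()):
    print(pe, sorted(cache))
```

   Because every response carries the same header fields, the hash is
   deterministic and exactly one PE learns the adjacency, however many
   responses are sent.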

167	1.2.  Problems with multicast from core to CE

169	   Similar to the unicast behavior above, multicast IGMP join messages
170	   sent by the CE over the LAG may always hash to a single PE.

172	   When PIM runs on both redundant layer-3 PEs that both service
173	   multicast for the same access segment, PIM elects only one of the PEs
174	   as the PIM Designated Router (DR) using the PIM DR election algorithm
175	   [RFC7761].  The PIM DR is responsible for tracking local multicast
176	   listeners and forwarding traffic to those listeners.  The PIM DR is
177	   also responsible for sending local Join/Prune messages towards the RP
178	   or source.

180	   For example, if in Figure 1 PE2 is elected PIM DR, but CE IGMP
181	   join messages are hashed to I1 towards PE1, then multicast traffic
182	   will not be attracted to this service pair as PE2 will not send a PIM
183	   Join on behalf of the CE.

185	   In order to ensure that the PIM DR always has all the MCAST route(s)
186	   and is able to forward PIM Join/Prune messages towards the RP,
187	   BGP-EVPN multicast route-sync will be leveraged to synchronize
188	   learned MCAST route(s) to the DR.

190	   When a fail-over occurs, multicast state will already be programmed
191	   on the newly elected DR service PE, which assumes responsibility for
192	   the routing and forwarding of all the traffic.

194	   The multicast route sync solution is described in Section 2.5.
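
   The effect of multicast route sync on the DR can be sketched as a
   set union.  The names below (local_joins, dr_join_state), the choice
   of PE2 as DR, and the group addresses are hypothetical; the sketch
   ignores leave processing and timers.

```python
# IGMP joins as hashed by the CE: here every join lands on PE1 (via I1).
local_joins = {
    "PE1": {"233.252.0.1", "233.252.0.2"},
    "PE2": set(),
}
DR = "PE2"  # assumed outcome of the PIM DR election


def dr_join_state(local, dr, sync_enabled):
    """Groups for which the DR would originate PIM Joins towards the RP."""
    state = set(local[dr])
    if sync_enabled:
        # Multicast route sync delivers the peers' joins to the DR.
        for pe, groups in local.items():
            if pe != dr:
                state |= groups
    return state


print(sorted(dr_join_state(local_joins, DR, sync_enabled=False)))
print(sorted(dr_join_state(local_joins, DR, sync_enabled=True)))
```

   Without sync the DR has no groups to join on behalf of the CE; with
   sync it holds the joins received by its peer.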

196	1.3.  Problems with IGP adjacencies over the LAG port

198	   A layer-3 CE device/router that connects to the redundant PEs may
199	   establish an IGP adjacency on the bundle port.  In this case, the
200	   adjacency will be formed with only one of the PEs and IGP customer
201	   route(s) will only be present on that PE.

203	   This defeats the load-balancing benefits of redundant PEs for this
204	   use case, as only one PE will be aware of the customer routes and
205	   advertise them to the core.

207	                     <---------+
208	                               | IGP Adj
209	       +-------+               |
210	       |       | 1.1.1.1/24    |
211	       | PE1   +-----------+   |
212	       |       |           |   |
213	       |       |           |   +
214	       +-------+           |
215	                           |
216	           +               |  +------+
217	     RT5   |             L |  | CE   +------>H1
218	     Sync  |             A +->+      |
219	           v             G |  |      |
220	                           |  |      +------>R1
221	       +-------+           |  +------+
222	       |       |           |    1.1.1.2/2
223	       | PE2   +-----------+
224	       |       | 1.1.1.1/24
225	       |       |
226	       +-------+

228	                   Figure 2: IGP Adjacency over LAG Port

230	   Figure 2 provides an example of this use case, where the CE forms an
231	   IGP adjacency with PE1 (for example, IS-IS or OSPF), and advertises
232	   its H1 and R1 routes into the IP-VRF of PE1.  PE1 may then
233	   redistribute these IGP routes into the core as an L3 service.  Any
234	   remote PEs will only be aware of the service from PE1, and cannot
235	   load-balance through PE2 as well.

237	   Further study is required in order to support the case of BGP PE to
238	   CE protocols.

240	   A solution to this is described in Section 2.6.

242	1.4.  Problems with supporting multiple subnets on same ES in all active
243	      mode

245	   In the case where the L3 service is L3VPN such as [RFC4364], it is
246	   likely the CE device could be a layer-2 switch supporting multiple
247	   subnets through the use of VLANs.  In addition, each VLAN may be
248	   associated with a different customer VRF.

250	   When ARP/ND routes are synchronized between the PEs for ARP proxy
251	   support using RT-2, a problem arises similar to the one described in
252	   Section 1.1 of [I-D.sajassi-bess-evpn-ac-aware-bundling].  The PE
253	   receiving RT-2 is unable to determine which sub-interface the ARP/ND
254	   entry is associated with.

256	   When IGMP routes are synchronized between the PEs using RT-7 and
257	   RT-8, a problem arises similar to the one described in Section 1.2 of
258	   [I-D.sajassi-bess-evpn-ac-aware-bundling].  The PE receiving RT-7 and
259	   RT-8 is unable to determine which sub-interface the IGMP join is
260	   associated with.

262	   This document proposes to use the solution defined by Section 4 of
263	   [I-D.sajassi-bess-evpn-ac-aware-bundling] to solve both these cases.
264	   All route sync messages (RT-2, RT-5, RT-7, RT-8) will carry an
265	   Attachment Circuit Identifier Extended Community to signal which sub-
266	   interface the routes were learnt on.

268	1.5.  Acronyms

270	   BD:  Broadcast Domain.  As per [RFC7432], an EVI consists of a single
271	      or multiple BDs.  In case of VLAN-bundle and VLAN-aware bundle
272	      service model, an EVI contains multiple BDs.

274	   DF:  Designated Forwarder

276	   DR:  Designated Router

278	   EC:  BGP Extended Community

280	   ES:  Ethernet Segment.  When a customer site (device or network) is
281	      connected to one or more PEs via a set of Ethernet links, then
282	      that set of links is referred to as an 'Ethernet Segment'.

284	   ESI:  Ethernet Segment Identifier.  A unique non-zero identifier that
285	      identifies an Ethernet Segment is called an 'Ethernet Segment
286	      Identifier'.

288	   ETAG:  Ethernet Tag. An Ethernet tag identifies a particular
289	      broadcast domain, e.g., a VLAN.  An EVPN instance consists of one
290	      or more broadcast domains.

292	   EVI:  An EVPN instance spanning the Provider Edge (PE) devices
293	      participating in that EVPN

295	   ICL:  Inter Chassis Link

297	   IGMP:  Internet Group Management Protocol

299	   IP-VRF:  A VPN Routing and Forwarding table for IP routes on a PE.
300	      The IP routes could be populated by EVPN and IP-VPN address
301	      families.  An IP-VRF is also an instantiation of a layer 3 VPN in
302	      a PE.

304	   L3AA:  All-Active Redundancy Mode for Layer 3 services.  When all PEs
305	      attached to an Ethernet segment are allowed to forward known
306	      unicast traffic to/from that Ethernet segment for a given VLAN,
307	      then the Ethernet segment is defined to be operating in All-Active
308	      redundancy mode.

310	   MAC-VRF:  A Virtual Routing and Forwarding table for Media Access
311	      Control (MAC) addresses on a PE.  A MAC-VRF is also an
312	      instantiation of an EVI in a PE

314	   MC-LAG:  Multi-Chassis Link Aggregation Group.

316	   PE:  Provider Edge.

318	   PIM:  Protocol Independent Multicast

320	   RT-2:  EVPN route type 2, i.e., MAC/IP advertisement route, as
321	      defined in [RFC7432].

323	   RT-5:  EVPN route type 5, i.e., IP Prefix route, as defined in
324	      Section 3 of [I-D.ietf-bess-evpn-prefix-advertisement]

326	   RT-7:  EVPN route type 7, i.e., Multicast Join Synch Route, as
327	      defined in Section 9.2 of [I-D.ietf-bess-evpn-igmp-mld-proxy]

329	   RT-8:  EVPN route type 8, i.e., Multicast Leave Synch Route, as
330	      defined in Section 9.3 of [I-D.ietf-bess-evpn-igmp-mld-proxy]

332	1.6.  Requirements

334	   1.  The multi-homing solution MUST support Layer-3 access interface

336	   2.  The multi-homing solution MUST support Layer-3 access sub-
337	       interface

339	   3.  The solution MUST support unicast and multicast VPN services

341	   4.  The solution SHOULD support IGP synchronization

343	   5.  The solution SHOULD support unicast and multicast GRT services

345	   6.  The solution MUST support all-active load-balancing mode

347	   7.  The solution MAY support single-active load-balancing mode

349	   8.  The solution MUST support port-active load-balancing mode

351	2.  Solution

352	   +------
353	   |     +-------+ .1 10.0.0.1/24
354	   | PE1 || BE1  +---------------------------------+
355	   |     || ESI-1|                                 |
356	   |     ||      | .2 10.0.0.1/24                  |
357	   |     ||      +-------------------------+       |
358	   |     +-------+                         |       |
359	   |     |                                 |       |
360	   |     +-------+ 10.0.1.1/24             |       |
361	   |     || BE2  +------------------+      |       |
362	   |     || ESI-2|                  |      |       |
363	   |     ||      |                 +v----+ |       |
364	   |     ||      |                 |CE1  | |       |
365	   |     +-------+                 |.2   | |       |
366	   +------                         |CUST1| |       |
367	                                   +^----+ |       |
368	   +------                          |     +v-----+-v----+
369	   |     +-------+ 10.0.1.1/24      |     |SW1   |      +-->H1(.2)
370	   | PE2 || BE2  +------------------+     |CUST2 |CUST1 |
371	   |     || ESI-2|                        +^-----+-^----+
372	   |     ||      |                         |       |
373	   |     ||      |                         |       |
374	   |     +-------+                         |       |
375	   |     |                                 |       |
376	   |     +-------+ .2 10.0.0.1/24          |       |
377	   |     || BE1  +-------------------------+       |
378	   |     || ESI-1|                                 |
379	   |     ||      | .1 10.0.0.1/24                  |
380	   |     ||      +---------------------------------+
381	   |     +-------+
382	   +------

384	   PE(1,2):
385	   CUST1-VRF: EVI 1
386	   CUST2-VRF: EVI 2

388	   SW1:
389	   CUST1-Subnet1: 10.0.0.2/24 (VLAN 1)
390	   CUST2-Subnet1: 10.0.0.2/24 (VLAN 2)

392	   CE1:
393	   CUST1-Subnet2: 10.0.1.2/24

395	         Figure 3: ARP/ND MAC-IP route-sync over different VRF(s)

397	   Consider the Figure 3 topology, where two AC-aware bundling service
398	   interfaces are supported.  On the first bundling interface, BE1, PE1
399	   and PE2 share a LAG interface with switch 1 (SW1) and carry two
400	   separate (but overlapping) subnets for customer 1 and customer 2.
401	   CUST1 Subnet 1 resolves over sub-interface VLAN 1 (.1), and CUST2
402	   Subnet 1 resolves over sub-interface VLAN 2 (.2).

404	   On the second bundling interface, BE2, both PEs share a LAG interface
405	   with Customer Edge device 1 (CE1) and carry only a single customer
406	   (CUST1) subnet on the native VLAN.

408	   Main interface BE1 on PE1 and PE2 is shared by customers 1 and 2, and
409	   is represented by ESI-1.

411	   Main interface BE2 on PE1 and PE2 is only used by customer 1, and is
412	   represented by ESI-2.

414	   Focusing on CUST1 for now, two cases are visible.

416	   Case 1: For CE1, if its ARP responses hash towards PE2, then PE1 will
417	   be unaware of its presence.  For PE2 to synchronize this information
418	   to PE1, in addition to CE1's IP address (10.0.1.2) and MAC address
419	   (m1), two additional unique identifiers are needed:
420	   (1) IP-VRF: the CUST1 VRF is represented by EVI ID 1.
421	   (2) Interface: the BE2 interface is represented by ESI-2.

423	   Case 2: For Host 1 (H1), if its ARP responses hash towards PE2, then
424	   PE1 will be unaware of its presence.  For PE2 to synchronize this
425	   information to PE1, in addition to H1's IP address (10.0.0.2) and MAC
426	   address (m2), three additional unique identifiers are required:
427	   (1) IP-VRF: the CUST1 VRF is represented by EVI ID 1.
428	   (2) Main interface: the BE1 interface is represented by ESI-1.
429	   (3) Sub-interface: Subnet/VLAN 1 is represented by Attachment Circuit ID 1.
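
   The two cases can be modeled as a record whose identifying context
   is the tuple (EVI, ESI, Attachment Circuit ID).  This is a sketch
   only: the AdjacencySync type, the MAC label "m3", and the use of
   Attachment Circuit ID 2 for VLAN 2 are assumptions made for
   illustration.

```python
from typing import NamedTuple, Optional


class AdjacencySync(NamedTuple):
    evi: int               # identifies the IP-VRF
    esi: str               # identifies the main (bundle) interface
    ac_id: Optional[int]   # identifies the sub-interface; None on native VLAN
    ip: str
    mac: str


# Case 1: CE1 behind BE2 (native VLAN, so no Attachment Circuit ID).
case1 = AdjacencySync(evi=1, esi="ESI-2", ac_id=None, ip="10.0.1.2", mac="m1")
# Case 2: H1 behind BE1, sub-interface VLAN 1.
case2 = AdjacencySync(evi=1, esi="ESI-1", ac_id=1, ip="10.0.0.2", mac="m2")
# CUST2 host with the same IP on the same ES; "m3" is a made-up MAC label.
cust2 = AdjacencySync(evi=2, esi="ESI-1", ac_id=2, ip="10.0.0.2", mac="m3")


def context(entry):
    """The identifiers that place an adjacency in a VRF and (sub-)interface."""
    return (entry.evi, entry.esi, entry.ac_id)


# Same IP and same ES, yet the two entries have different contexts.
print(case2.ip == cust2.ip, context(case2) != context(cust2))
```

   The overlapping 10.0.0.2 addresses of CUST1 and CUST2 share ESI-1
   but remain distinguishable because their EVI and Attachment Circuit
   ID differ.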

431	2.1.  Mapping of L3VRF to EVPN EVI

433	   A separate EVPN instance will be configured for each layer-3 VRF and
434	   be marked for route-sync only.  Each L3-VRF will have a unique
435	   associated EVI ID.  The multi-homed peer PEs MUST have the same
436	   configured EVI to layer-3 VRF mapping.  This mapping also extends to
437	   the GRT, where a unique EVI ID can be assigned to support non-VPN
438	   layer-3 services.  Mis-configuration detection across peering PEs is
439	   left for further study.

441	   When an EVPN instance is created as route-sync only, a MAC-VRF table
442	   is created to store all advertised routes.  Local MAC learning may be
443	   disabled as this feature does not require MAC-only RT-2
444	   advertisements.

446	   This EVI is applicable to the multi-homed peer PEs only.

448	   The EVPN instance will be responsible for populating the following
449	   layer-3 VRF tables with routes synced from the peer PE:

451	   o  ARP/ND

453	   o  IGMP

455	   o  IP (for customer subnets learned from IGP adjacency)

457	   In the Figure 3 example, route-syncs from VRF CUST1 will carry an
458	   EVI-RT BGP Extended Community (EC) with EVI 1, and those from VRF
459	   CUST2 will carry EVI 2.

461	2.2.  Mapping for L3 Interface to ESI

463	   The ESI represents the L3 LAG interface between PE and CEs.  This ESI
464	   is signalled using RT-4 with the ES-Import Route Target as described
465	   in Section 8.1.1 of [RFC7432] so that the service PE peers can
466	   discover each other's common ES.

468	   In the example of Figure 3, route-syncs from interface BE1 carry an
469	   ES-Import RT EC with ESI 1.

471	2.3.  Mapping for L3 Sub-Interface to Attachment Circuit ID

473	   The Attachment Circuit ID represents the sub-interface subnet on the
474	   L3 LAG interface between PE and CEs.  The AC-ID is signalled using
475	   RT-2, RT-5, RT-7 and RT-8 by attaching the Attachment Circuit ID
476	   Extended Community as described in Section 6.1 of
477	   [I-D.sajassi-bess-evpn-ac-aware-bundling].

479	   In the example of Figure 3, route-syncs from sub-interface BE1.1
480	   (VLAN 1) carry an Attachment-Circuit-ID EC with ID 1.

482	2.4.  Route sync for ARP/ND

484	   This document proposes solving the issue described in Section 1.1
485	   using RT-2 IP/MAC route sync as described in Section 10 of [RFC7432]
486	   with a modification described below.

488	2.4.1.  Local adjacency (ARP/ND) learning

490	   Local ARP/ND learning will trigger a RT-2 route sync to any peer PE.
491	   There is no need for local MAC learning or sync over the L3
492	   interface, only adjacencies.  The MAC-only RT-2 route SHOULD NOT be
493	   advertised to the peer PE.

495	   Section 9.1 of [RFC7432] describes different mechanisms to learn
496	   adjacency routes locally.

498	   o  An ARP/ND Sync route MUST carry exactly one ES-Import Route Target
499	      extended community, the one that corresponds to the ES on which
500	      the ARP or ND was received.

502	   o  It MUST also carry exactly one EVI-RT EC, the one that corresponds
503	      to the EVI on which the ARP or ND was received.  The EVI maps to
504	      the layer-3 VRF.  For details on how to encode and construct the
505	      EVI-RT EC, see Section 9.5 of [I-D.ietf-bess-evpn-igmp-mld-proxy].

507	   o  In the case where the PE supports AC-aware bundling, it MUST also
508	      carry one Attachment Circuit ID Extended Community.  The circuit
509	      ID maps to the sub-interface (or subnet) on which this route was
510	      received.  For details on how to encode and construct this
511	      Extended Community, see Section 6.1 of
512	      [I-D.sajassi-bess-evpn-ac-aware-bundling].
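
   The rules above can be sketched as a builder and a checker for an
   abstract sync route.  This is an illustration only: the dict layout,
   the function names, and the community labels are hypothetical, the
   ESI label stands in for the ES-Import Route Target value, and no
   wire encoding is attempted.

```python
def build_arp_nd_sync(ip, mac, esi, evi, ac_id=None):
    """Build an abstract RT-2 sync route as a plain dict (no wire encoding)."""
    ecs = [("es-import-rt", esi), ("evi-rt", evi)]
    if ac_id is not None:
        ecs.append(("ac-id", ac_id))
    return {"type": 2, "ip": ip, "mac": mac, "ext_communities": ecs}


def check_sync_route(route, ac_aware):
    """Return a list of rule violations; an empty list means the route is OK."""
    kinds = [kind for kind, _ in route["ext_communities"]]
    problems = []
    if kinds.count("es-import-rt") != 1:
        problems.append("need exactly one ES-Import RT")
    if kinds.count("evi-rt") != 1:
        problems.append("need exactly one EVI-RT EC")
    if ac_aware and kinds.count("ac-id") != 1:
        problems.append("need one Attachment Circuit ID EC")
    return problems


# H1 learned on BE1.1 (Figure 3): ESI-1, EVI 1, Attachment Circuit ID 1.
good = build_arp_nd_sync("10.0.0.2", "m2", esi="ESI-1", evi=1, ac_id=1)
# A malformed route that carries only the ES-Import RT.
bad = {"type": 2, "ip": "10.0.0.2", "mac": "m2",
       "ext_communities": [("es-import-rt", "ESI-1")]}

print(check_sync_route(good, ac_aware=True))
print(check_sync_route(bad, ac_aware=True))
```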

514	2.4.2.  Remote ARP/ND learning

516	   When consuming a remote layer-3 RT-2 sync route:

518	   o  BGP only imports layer-3 sync route(s) when both ES-Import and
519	      EVI-RT extended communities match those locally configured

521	   o  The layer-3 VRF is derived from the matching EVI

523	   o  The main interface is derived from the ESI

525	   o  The VLAN / sub-interface is derived from the AC-ID provided in the
526	      Attachment-Circuit-ID extended community

528	   o  The combination of ES Import and EVI RT will allow BGP to import
529	      layer-3 sync route(s) to only PE(s) that are attached to the same
530	      ESI and have the respective EVI.
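
   The import steps above can be sketched as a chain of lookups on the
   receiving PE.  The mapping tables, the sub-interface name BE1.2, and
   the choice to ignore a route with an unknown Attachment Circuit ID
   are assumptions made for illustration; the route is modeled as a
   plain dict.

```python
# Hypothetical local configuration on PE1 for the Figure 3 topology.
EVI_TO_VRF = {1: "CUST1-VRF", 2: "CUST2-VRF"}
ESI_TO_MAIN_IF = {"ESI-1": "BE1", "ESI-2": "BE2"}
AC_TO_SUB_IF = {("BE1", 1): "BE1.1", ("BE1", 2): "BE1.2"}


def import_sync_route(route):
    """Return (vrf, interface) for an accepted route, or None to ignore it."""
    vrf = EVI_TO_VRF.get(route["evi"])
    main_if = ESI_TO_MAIN_IF.get(route["esi"])
    if vrf is None or main_if is None:
        return None  # ES-Import or EVI-RT does not match local config
    ac_id = route.get("ac_id")
    if ac_id is None:
        return (vrf, main_if)  # no AC-ID: adjacency is on the main interface
    sub_if = AC_TO_SUB_IF.get((main_if, ac_id))
    if sub_if is None:
        return None  # assumed behavior for an unknown attachment circuit
    return (vrf, sub_if)


h1 = {"evi": 1, "esi": "ESI-1", "ac_id": 1, "ip": "10.0.0.2", "mac": "m2"}
ce1 = {"evi": 1, "esi": "ESI-2", "ip": "10.0.1.2", "mac": "m1"}
other = {"evi": 1, "esi": "ESI-9", "ip": "10.0.0.2", "mac": "m2"}

for route in (h1, ce1, other):
    print(import_sync_route(route))
```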

532	2.5.  Route sync for IGMP

534	   This document proposes solving the issue described in Section 1.2
535	   using RT-7 and RT-8 route sync as described by
536	   [I-D.ietf-bess-evpn-igmp-mld-proxy].

538	   Local IGMP join and leave will trigger a RT-7/8 route sync to peer
539	   PE.

541	2.5.1.  Local IGMP Join/Leave learning

543	   An IGMP Join or Leave will trigger a RT-7/8 route sync to any peer PE.

545	   Section 9.1 of [RFC7432] describes different mechanisms to learn
546	   adjacency routes locally.

548	   o  A Multicast Join or Leave Sync route MUST carry exactly one ES-
549	      Import Route Target extended community, the one that corresponds
550	      to the ES on which the IGMP Join or Leave was received.

552	   o  It MUST also carry exactly one EVI-RT EC, the one that corresponds
553	      to the EVI on which the IGMP Join or Leave was received.  The EVI
554	      maps to the layer-3 VRF.  See Section 9.5 of
555	      [I-D.ietf-bess-evpn-igmp-mld-proxy] for details on how to encode
556	      and construct the EVI-RT EC.

558	   o  In the case where the PE supports AC-aware bundling, it MUST also
559	      carry one Attachment Circuit ID Extended Community.  The circuit
560	      ID maps to the sub-interface (or subnet) on which this route was
561	      received.  For details on how to encode and construct this
562	      Extended Community, see Section 6.1 of
563	      [I-D.sajassi-bess-evpn-ac-aware-bundling].

565	   o  The combination of ES Import and EVI RT will allow BGP to import
566	      Multicast Join and Leave sync route(s) to only PE(s) that are
567	      attached to the same ESI and have the respective EVI.

569	2.5.2.  Remote IGMP Join/Leave learning

571	   When consuming a remote multicast RT-7 or RT-8 sync route:

573	   o  BGP only imports multicast sync route(s) when both ES-Import and
574	      EVI-RT extended communities match those locally configured

576	   o  The layer-3 VRF is derived from the matching EVI

578	   o  The main interface is derived from the ESI

580	   o  The VLAN / sub-interface is derived from the AC-ID provided in the
581	      Attachment-Circuit-ID extended community

583	2.6.  Customer Subnet Route sync using Route-type(5)

585	   Section 3 of [I-D.ietf-bess-evpn-prefix-advertisement] provides a
586	   mechanism to synchronize layer-3 customer subnets between the PEs in
587	   order to solve the problem described in Section 1.3.

589	   Using Figure 2 as an example, if PE1 forms the IGP adjacency with CE,
590	   it will be the only PE with knowledge of the customer subnet R1.  BGP
591	   on PE1 will then advertise R1 to remote PEs using L3-VPN signalling.

593	   Although PE2 has the same ES connection to the CE, and could provide
594	   load balancing to remote PEs, due to it not having formed an IGP
595	   adjacency with CE it is not aware of the customer subnet R1.

597	   This can be solved by PE1 signaling R1 to PE2 using a RT-5 sync
598	   route.  BGP on PE2 can then advertise this customer subnet R1 towards
599	   the core as if it were locally learned through IGP, and provide load-
600	   balancing from the remote PEs.

602	   The route-type(5) will carry the ESI as well as the gateway address
603	   GW (prefix next-hop address).

605	   The same mapping mechanism will be used as for ARP/ND and IGMP sync,
606	   where the EVI will determine the L3-VRF, the ESI carried with
607	   route-type(5) will provide the main interface, and the gateway
608	   address will provide the nexthop.
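
   The mapping described in this section can be sketched as the
   installation of a synced prefix into a per-VRF table on PE2.  This
   is an illustration only: the VRF name, the ESI and bundle names, the
   dict layout, and the documentation prefix used for R1 are all
   assumptions; the gateway is the CE address shown in Figure 2.

```python
import ipaddress

EVI_TO_VRF = {1: "CUST1-VRF"}       # hypothetical VRF name
ESI_TO_MAIN_IF = {"ESI-1": "BE1"}   # hypothetical ESI and bundle names


def install_rt5(rt5, rib):
    """Install a synced prefix into the VRF RIB; return True if accepted."""
    vrf = EVI_TO_VRF.get(rt5["evi"])
    out_if = ESI_TO_MAIN_IF.get(rt5["esi"])
    if vrf is None or out_if is None:
        return False
    prefix = ipaddress.ip_network(rt5["prefix"])
    next_hop = ipaddress.ip_address(rt5["gw"])
    rib.setdefault(vrf, {})[prefix] = (next_hop, out_if)
    return True


rib = {}
# R1 is given a documentation prefix here; the gateway is the CE address.
r1 = {"evi": 1, "esi": "ESI-1", "prefix": "198.51.100.0/24", "gw": "1.1.1.2"}
print(install_rt5(r1, rib), rib)
```

   Once the prefix is present in its VRF table, PE2 has what it needs
   to advertise R1 towards the core alongside PE1.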

610	2.7.  Mapping for VLAN to ETAG

612	   Another possible signalling of VLAN/sub-interface between service PE
613	   peers is to use the Ethernet Tag (ETAG) ID value in RT-2, RT-5, RT-7
614	   and RT-8 as opposed to the Attachment Circuit Extended Community.

616	   This will not work with VLAN-aware bundling mode, but as that is a
617	   layer-2 mode this should not prevent the use of ETAG for L3 services.

619	3.  Extensions to RT-2, RT-5, RT-7 and RT-8

621	   This document proposes extending the use case of Extended Communities
622	   already defined in other drafts for the route types RT-2, RT-5, RT-7
623	   and RT-8.

625	   o  EVI-RT Extended Community as defined in Section 9.5 of
626	      [I-D.ietf-bess-evpn-igmp-mld-proxy].

628	   o  Attachment Circuit ID Extended Community as defined in Section 6.1
629	      of [I-D.sajassi-bess-evpn-ac-aware-bundling].

631	4.  Convergence Considerations

632	5.  Overall Advantages

634	   The use of EVPN MC-LAG all-active multi-homing brings the following
635	   benefits to L3 BGP services:

637	   o  Open-standards-based, per-interface all-active redundancy
638	      mechanism that eliminates the need to run ICCP and LDP.

640	   o  Agnostic of underlay technology (MPLS, VXLAN, SRv6) and associated
641	      services (L3, L3-VPN)

643	   o  Replaces legacy MC-LAG ICCP-based solution, and offers the
644	      following additional benefits:

646	      *  Fast convergence with mass-withdraw is possible with EVPN, no
647	         equivalent in ICCP

649	   o  Requires signalling already defined in existing EVPN RFCs
650	      [RFC7432] and drafts [I-D.ietf-bess-evpn-igmp-mld-proxy],
651	      [I-D.sajassi-bess-evpn-ac-aware-bundling], and
652	      [I-D.ietf-bess-evpn-prefix-advertisement]

654	   o  Removes the need for an ICL

656	6.  Security Considerations

658	   The same Security Considerations described in [RFC7432] are valid for
659	   this document.

661	7.  IANA Considerations

663	   There are no IANA considerations.

665	8.  References

667	8.1.  Normative References

669	   [I-D.ietf-bess-evpn-igmp-mld-proxy]
670	              Sajassi, A., Thoria, S., Mishra, M., Patel, K., Drake, J.,
671	              and W. Lin, "IGMP and MLD Proxy for EVPN", draft-ietf-
672	              bess-evpn-igmp-mld-proxy-09 (work in progress), April
673	              2021.

675	   [I-D.ietf-bess-evpn-prefix-advertisement]
676	              Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A.
677	              Sajassi, "IP Prefix Advertisement in EVPN", draft-ietf-
678	              bess-evpn-prefix-advertisement-11 (work in progress), May
679	              2018.

681	   [I-D.sajassi-bess-evpn-ac-aware-bundling]
682	              Sajassi, A., Mishra, M., Thoria, S., Brissette, P.,
683	              Rabadan, J., and J. Drake, "AC-Aware Bundling Service
684	              Interface in EVPN", draft-sajassi-bess-evpn-ac-aware-
685	              bundling-03 (work in progress), February 2021.

687	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
688	              Requirement Levels", BCP 14, RFC 2119,
689	              DOI 10.17487/RFC2119, March 1997,
690	              <https://www.rfc-editor.org/info/rfc2119>.

692	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
693	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
694	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

696	8.2.  Informative References

698	   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
699	              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
700	              2006, <https://www.rfc-editor.org/info/rfc4364>.

702	   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
703	              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
704	              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
705	              2015, <https://www.rfc-editor.org/info/rfc7432>.

707	   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
708	              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
709	              Multicast - Sparse Mode (PIM-SM): Protocol Specification
710	              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
711	              2016, <https://www.rfc-editor.org/info/rfc7761>.

713	Authors' Addresses

715	   Michael MacKenzie (editor)
716	   Cisco Systems

718	   Email: mimacken@cisco.com

720	   Patrice Brissette
721	   Cisco Systems

723	   Email: pbrisset@cisco.com

724	   Satoru Matsushima
725	   Softbank

727	   Email: satoru.matsushima@g.softbank.co.jp