idnits 2.17.1 

draft-ietf-bess-evpn-irb-mcast-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 24, 2021) is 1061 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-14) exists of
     draft-ietf-bess-evpn-bum-procedure-updates-08

  == Outdated reference: A later version (-21) exists of
     draft-ietf-bess-evpn-igmp-mld-proxy-09

  == Outdated reference: A later version (-15) exists of
     draft-ietf-bess-evpn-inter-subnet-forwarding-13

  == Outdated reference: A later version (-12) exists of
     draft-ietf-bess-evpn-optimized-ir-07

  == Outdated reference: A later version (-13) exists of
     draft-ietf-bess-evpn-pref-df-07

  == Outdated reference: A later version (-14) exists of
     draft-ietf-bier-evpn-04


     Summary: 0 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	BESS                                                              W. Lin
3	Internet-Draft                                                  Z. Zhang
4	Intended status: Standards Track                                J. Drake
5	Expires: November 25, 2021                                 E. Rosen, Ed.
6	                                                  Juniper Networks, Inc.
7	                                                              J. Rabadan
8	                                                                   Nokia
9	                                                              A. Sajassi
10	                                                           Cisco Systems
11	                                                            May 24, 2021

13	        EVPN Optimized Inter-Subnet Multicast (OISM) Forwarding
14	                   draft-ietf-bess-evpn-irb-mcast-06

16	Abstract

18	   Ethernet VPN (EVPN) provides a service that allows a single Local
19	   Area Network (LAN), comprising a single IP subnet, to be divided into
20	   multiple "segments".  Each segment may be located at a different
21	   site, and the segments are interconnected by an IP or MPLS backbone.
22	   Intra-subnet traffic (either unicast or multicast) always appears to
23	   the end users to be bridged, even when it is actually carried over the
24	   IP or MPLS backbone.  When a single "tenant" owns multiple such LANs,
25	   EVPN also allows IP unicast traffic to be routed between those LANs.
26	   This document specifies new procedures that allow inter-subnet IP
27	   multicast traffic to be routed among the LANs of a given tenant,
28	   while still making intra-subnet IP multicast traffic appear to be
29	   bridged.  These procedures can provide optimal routing of the inter-
30	   subnet multicast traffic, and do not require any such traffic to
31	   leave a given router and then reenter that same router.  These
32	   procedures also accommodate IP multicast traffic that needs to travel
33	   to or from systems that are outside the EVPN domain.

35	Status of This Memo

37	   This Internet-Draft is submitted in full conformance with the
38	   provisions of BCP 78 and BCP 79.

40	   Internet-Drafts are working documents of the Internet Engineering
41	   Task Force (IETF).  Note that other groups may also distribute
42	   working documents as Internet-Drafts.  The list of current Internet-
43	   Drafts is at https://datatracker.ietf.org/drafts/current/.

45	   Internet-Drafts are draft documents valid for a maximum of six months
46	   and may be updated, replaced, or obsoleted by other documents at any
47	   time.  It is inappropriate to use Internet-Drafts as reference
48	   material or to cite them other than as "work in progress."
49	   This Internet-Draft will expire on November 25, 2021.

51	Copyright Notice

53	   Copyright (c) 2021 IETF Trust and the persons identified as the
54	   document authors.  All rights reserved.

56	   This document is subject to BCP 78 and the IETF Trust's Legal
57	   Provisions Relating to IETF Documents
58	   (https://trustee.ietf.org/license-info) in effect on the date of
59	   publication of this document.  Please review these documents
60	   carefully, as they describe your rights and restrictions with respect
61	   to this document.  Code Components extracted from this document must
62	   include Simplified BSD License text as described in Section 4.e of
63	   the Trust Legal Provisions and are provided without warranty as
64	   described in the Simplified BSD License.

66	Table of Contents

68	   1.  Introduction
69	     1.1.  Background
70	       1.1.1.  Segments, Broadcast Domains, and Tenants
71	       1.1.2.  Inter-BD (Inter-Subnet) IP Traffic
72	       1.1.3.  EVPN and IP Multicast
73	       1.1.4.  BDs, MAC-VRFs, and EVPN Service Models
74	     1.2.  Need for EVPN-aware Multicast Procedures
75	     1.3.  Additional Requirements That Must be Met by the Solution
76	     1.4.  Terminology
77	     1.5.  Model of Operation: Overview
78	       1.5.1.  Control Plane
79	       1.5.2.  Data Plane
80	   2.  Detailed Model of Operation
81	     2.1.  Supplementary Broadcast Domain
82	     2.2.  Detecting When a Route is About/For/From a Particular BD
83	     2.3.  Use of IRB Interfaces at Ingress PE
84	     2.4.  Use of IRB Interfaces at an Egress PE
85	     2.5.  Announcing Interest in (S,G)
86	     2.6.  Tunneling Frames from Ingress PE to Egress PEs
87	     2.7.  Advanced Scenarios
88	   3.  EVPN-aware Multicast Solution Control Plane
89	     3.1.  Supplementary Broadcast Domain (SBD) and Route Targets
90	     3.2.  Advertising the Tunnels Used for IP Multicast
91	       3.2.1.  Constructing Routes for the SBD
92	       3.2.2.  Ingress Replication
93	       3.2.3.  Assisted Replication
94	         3.2.3.1.  Automatic SBD Matching
95	       3.2.4.  BIER
96	       3.2.5.  Inclusive P2MP Tunnels
97	         3.2.5.1.  Using the BUM Tunnels as IP Multicast Inclusive
98	                   Tunnels
99	         3.2.5.2.  Using Wildcard S-PMSI A-D Routes to Advertise
100	                   Inclusive Tunnels Specific to IP Multicast
101	       3.2.6.  Selective Tunnels
102	     3.3.  Advertising SMET Routes
103	   4.  Constructing Multicast Forwarding State
104	     4.1.  Layer 2 Multicast State
105	       4.1.1.  Constructing the OIF List
106	       4.1.2.  Data Plane: Applying the OIF List to an (S,G) Frame
107	         4.1.2.1.  Eligibility of an AC to Receive a Frame
108	         4.1.2.2.  Applying the OIF List
109	     4.2.  Layer 3 Forwarding State
110	   5.  Interworking with non-OISM EVPN-PEs
111	     5.1.  IPMG Designated Forwarder
112	     5.2.  Ingress Replication
113	       5.2.1.  Ingress PE is non-OISM
114	       5.2.2.  Ingress PE is OISM
115	     5.3.  P2MP Tunnels
116	   6.  Traffic to/from Outside the EVPN Tenant Domain
117	     6.1.  Layer 3 Interworking via EVPN OISM PEs
118	       6.1.1.  General Principles
119	       6.1.2.  Interworking with MVPN
120	         6.1.2.1.  MVPN Sources with EVPN Receivers
121	           6.1.2.1.1.  Identifying MVPN Sources
122	           6.1.2.1.2.  Joining a Flow from an MVPN Source
123	         6.1.2.2.  EVPN Sources with MVPN Receivers
124	           6.1.2.2.1.  General procedures
125	           6.1.2.2.2.  Any-Source Multicast (ASM) Groups
126	           6.1.2.2.3.  Source on Multihomed Segment
127	         6.1.2.3.  Obtaining Optimal Routing of Traffic Between MVPN
128	                   and EVPN
129	         6.1.2.4.  Selecting the MEG SBD-DR
130	       6.1.3.  Interworking with 'Global Table Multicast'
131	       6.1.4.  Interworking with PIM
132	         6.1.4.1.  Source Inside EVPN Domain
133	         6.1.4.2.  Source Outside EVPN Domain
134	     6.2.  Interworking with PIM via an External PIM Router
135	   7.  Using an EVPN Tenant Domain as an Intermediate (Transit)
136	       Network for Multicast traffic
137	   8.  IANA Considerations
138	   9.  Security Considerations
139	   10. Acknowledgements
140	   11. References
141	     11.1.  Normative References
142	     11.2.  Informative References
143	   Appendix A.  Integrated Routing and Bridging
144	   Authors' Addresses

146	1.  Introduction

148	1.1.  Background

150	   Ethernet VPN (EVPN) [RFC7432] provides a Layer 2 VPN (L2VPN)
151	   solution, which allows an IP backbone provider to offer ethernet
152	   service to a set of customers, known as "tenants".

154	   In this section (as well as in
155	   [I-D.ietf-bess-evpn-inter-subnet-forwarding]), we provide some
156	   essential background information on EVPN.

158	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
159	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
160	   "OPTIONAL" in this document are to be interpreted as described in BCP
161	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
162	   capitals, as shown here.

164	1.1.1.  Segments, Broadcast Domains, and Tenants

166	   One of the key concepts of EVPN is the Broadcast Domain (BD).  A BD
167	   is essentially an emulated ethernet.  Each BD belongs to a single
168	   tenant.  A BD typically consists of multiple ethernet "segments", and
169	   each segment may be attached to a different EVPN Provider Edge
170	   (EVPN-PE) router.  EVPN-PE routers are often referred to as "Network
171	   Virtualization Endpoints" or NVEs.  However, this document will use
172	   the term "EVPN-PE", or, when the context is clear, just "PE".

174	   In this document, we use the term "segment" to mean the same as
175	   "Ethernet Segment" or "ES" in [RFC7432].

177	   Attached to each segment are "Tenant Systems" (TSes).  A TS may be
178	   any type of system, physical or virtual, host or router, etc., that
179	   can attach to an ethernet.

181	   When two TSes are on the same segment, traffic between them does not
182	   pass through an EVPN-PE.  When two TSes are on different segments of
183	   the same BD, traffic between them does pass through an EVPN-PE.

185	   When two TSes, say TS1 and TS2, are on the same BD, then:

187	   o  If TS1 knows the MAC address of TS2, TS1 can send unicast ethernet
188	      frames to TS2.  TS2 will receive the frames unaltered.

190	   o  If TS1 broadcasts an ethernet frame, TS2 will receive the
191	      unaltered frame.

193	   o  If TS1 multicasts an ethernet frame, TS2 will receive the
194	      unaltered frame, as long as TS2 has been provisioned to receive
195	      ethernet multicasts.

197	   When we say that TS2 receives an unaltered frame from TS1, we mean
198	   that the frame still contains TS1's MAC address, and that no
199	   alteration of the frame's payload (and consequently, no alteration of
200	   the payload's IP header) has been made.

202	   EVPN allows a single segment to be attached to multiple PE routers.
203	   This is known as "EVPN multi-homing".  Suppose a given segment is
204	   attached to both PE1 and PE2, and suppose PE1 receives a frame from
205	   that segment.  It may be necessary for PE1 to send the frame over the
206	   backbone to PE2.  EVPN has procedures to ensure that such a frame
207	   cannot be sent by PE2 back to its originating segment.  This is
208	   particularly important for multicast, because a frame arriving at PE1
209	   from a given segment will already have been seen by all the systems
210	   on that segment that need to see it.  If the frame were sent back to
211	   the originating segment by PE2, receivers on that segment would
212	   receive the packet twice.  Even worse, the frame might be sent back
213	   to PE1, which could cause an infinite loop.
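
   The filtering requirement above can be pictured with a minimal
   sketch.  It does not reproduce the actual mechanism of [RFC7432];
   it only assumes that the egress PE can tell which segment a frame
   originally came from.  The names AttachmentCircuit, eligible_acs,
   and the segment identifiers are hypothetical and chosen for
   illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttachmentCircuit:
    name: str
    segment: str  # hypothetical identifier of the attached segment


def eligible_acs(acs, origin_segment):
    """Return the ACs a PE may forward to, skipping the originating segment."""
    return [ac for ac in acs if ac.segment != origin_segment]


# PE2 is attached to seg-A (shared with PE1) and to seg-B.
pe2_acs = [AttachmentCircuit("ac1", "seg-A"), AttachmentCircuit("ac2", "seg-B")]

# A frame entered the EVPN at PE1 from seg-A and was tunneled to PE2.
# PE2 must not send it back onto seg-A.
out = eligible_acs(pe2_acs, origin_segment="seg-A")
```

   In this sketch PE2 forwards the frame only on ac2, so receivers on
   seg-A do not see a duplicate and the frame cannot loop back to PE1.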

215	1.1.2.  Inter-BD (Inter-Subnet) IP Traffic

217	   If a given tenant has multiple BDs, the tenant may wish to allow IP
218	   communication among these BDs.  Such a set of BDs is known as an
219	   "EVPN Tenant Domain" or just a "Tenant Domain".

221	   If tenant systems TS1 and TS2 are not in the same BD, then they do
222	   not receive unaltered ethernet frames from each other.  In order for
223	   TS1 to send traffic to TS2, TS1 encapsulates an IP datagram inside an
224	   ethernet frame, and uses ethernet to send these frames to an IP
225	   router.  The router decapsulates the IP datagram, does the IP
226	   processing, and re-encapsulates the datagram for ethernet.  The MAC
227	   source address field now has the MAC address of the router, not of
228	   TS1.  The TTL field of the IP datagram should be decremented by
229	   exactly 1, even if the frame needs to be sent from one PE to another.
230	   The structure of the provider's IP backbone is thus hidden from the
231	   tenants.
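
   The difference between intra-BD and inter-BD delivery described
   above can be sketched as follows.  This is an illustration under
   stated assumptions, not an implementation: the Frame type, the
   bridge and route functions, and all addresses are hypothetical, and
   raising an error on TTL expiry is a simplification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    ip_src: str
    ip_dst: str
    ttl: int


def bridge(frame):
    """Intra-BD delivery: the receiver gets the frame unaltered."""
    return frame


def route(frame, router_mac, next_hop_mac):
    """Inter-BD delivery: exactly one IP hop, however many PEs are crossed."""
    if frame.ttl <= 1:
        raise ValueError("TTL expired")  # simplified handling (assumption)
    return replace(
        frame,
        src_mac=router_mac,    # source MAC is now the router's, not TS1's
        dst_mac=next_hop_mac,  # re-encapsulated for the destination BD
        ttl=frame.ttl - 1,     # decremented by exactly 1
    )


ts1_frame = Frame("00:11:22:33:44:01", "00:00:5e:00:01:01",
                  "192.0.2.10", "198.51.100.20", ttl=64)
routed = route(ts1_frame, router_mac="00:00:5e:00:01:01",
               next_hop_mac="00:11:22:33:44:02")
```

   The IP source and destination addresses are untouched in both
   cases; only the routed frame shows a new source MAC and a TTL that
   is lower by one.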

233	   EVPN accommodates the need for inter-BD communication within a Tenant
234	   Domain by providing an integrated L2/L3 service for unicast IP
235	   traffic.  EVPN's Integrated Routing and Bridging (IRB) functionality
236	   is specified in [I-D.ietf-bess-evpn-inter-subnet-forwarding].  Each
237	   BD in a Tenant Domain is assumed to be a single IP subnet, and each
238	   IP subnet within a given Tenant Domain is assumed to be a single
239	   BD.  EVPN's IRB functionality allows IP traffic to travel from one BD
240	   to another, and ensures that proper IP processing (e.g., TTL
241	   decrement) is done.

243	   A brief overview of IRB, including the notion of an "IRB interface",
244	   can be found in Appendix A.  As explained there, an IRB interface is
245	   a sort of virtual interface connecting an L3 routing instance to a
246	   BD.  A BD may have multiple attachment circuits (ACs) to a given PE,
247	   where each AC connects to a different ethernet segment of the BD.
248	   However, these ACs are not visible to the L3 routing function; from
249	   the perspective of an L3 routing instance, a PE has just one
250	   interface to each BD, viz., the IRB interface for that BD.

252	   The "L3 routing instance" depicted in Appendix A is associated with a
253	   single Tenant Domain, and may be thought of as an IP-VRF for that
254	   Tenant Domain.

256	1.1.3.  EVPN and IP Multicast

258	   [I-D.ietf-bess-evpn-inter-subnet-forwarding] and
259	   [I-D.ietf-bess-evpn-prefix-advertisement] cover inter-subnet
260	   (inter-BD) IP unicast forwarding, but they do not cover inter-subnet
261	   IP multicast forwarding.

263	   [RFC7432] covers intra-subnet (intra-BD) ethernet multicast.  The
264	   intra-subnet ethernet multicast procedures of [RFC7432] are used for
265	   ethernet Broadcast traffic, for ethernet unicast traffic whose MAC
266	   Destination Address field contains an Unknown address, and for
267	   ethernet traffic whose MAC Destination Address field contains an
268	   ethernet Multicast MAC address.  These three classes of traffic are
269	   known collectively as "BUM traffic" (Broadcast/Unknown-Unicast/
270	   Multicast), and the procedures for handling BUM traffic are known as
271	   "BUM procedures".
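
   As a rough illustration of the three BUM classes, the sketch below
   classifies a frame by its MAC Destination Address.  The function
   names and the known_macs set (standing in for a BD's learned MAC
   table, with lowercase entries) are assumptions made for this
   example.  The multicast test relies on the group bit, i.e., the
   least significant bit of the first octet of the address.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def classify(dst_mac, known_macs):
    """Classify a frame by destination MAC; known_macs holds lowercase MACs."""
    dst = dst_mac.lower()
    if dst == BROADCAST:
        return "broadcast"
    first_octet = int(dst.split(":")[0], 16)
    if first_octet & 0x01:  # group bit set: ethernet multicast address
        return "multicast"
    if dst not in known_macs:
        return "unknown-unicast"
    return "known-unicast"


def is_bum(dst_mac, known_macs):
    """True if the frame would be handled by the BUM procedures."""
    return classify(dst_mac, known_macs) != "known-unicast"
```

   For instance, a frame sent to 01:00:5e:01:02:03 is classified as
   multicast, while a frame sent to a unicast address is BUM traffic
   only if that address is absent from known_macs.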

273	   [I-D.ietf-bess-evpn-igmp-mld-proxy] extends the intra-subnet ethernet
274	   multicast procedures by adding procedures that are specific to, and
275	   optimized for, the use of IP multicast within a subnet.  However, that
276	   document does not cover inter-subnet IP multicast.

278	   The purpose of this document is to specify procedures for EVPN that
279	   provide optimized IP multicast functionality within an EVPN tenant
280	   domain.  This document also specifies procedures that allow IP
281	   multicast packets to be sourced from or destined to systems outside
282	   the Tenant Domain.  We refer to the entire set of these procedures as
283	   "OISM" (Optimized Inter-Subnet Multicast) procedures.

285	   In order to support the OISM procedures specified in this document,
286	   an EVPN-PE MUST also support
287	   [I-D.ietf-bess-evpn-inter-subnet-forwarding] and
289	   [I-D.ietf-bess-evpn-igmp-mld-proxy].  (However, certain of the
290	   procedures in [I-D.ietf-bess-evpn-igmp-mld-proxy] are modified when
291	   OISM is supported.)

293	1.1.4.  BDs, MAC-VRFs, and EVPN Service Models

295	   [RFC7432] defines the notion of "MAC-VRF".  A MAC-VRF contains one or
296	   more "Bridge Tables" (see section 3 of [RFC7432] for a discussion of
297	   this terminology), each of which represents a single Broadcast
298	   Domain.

300	   In the IRB model (outlined in Appendix A) an L3 routing instance has
301	   one IRB interface per BD, NOT one per MAC-VRF.  This document does
302	   not distinguish between a "Broadcast Domain" and a "Bridge Table",
303	   and will use the terms interchangeably (or will use the acronym "BD"
304	   to refer to either).  The way the BDs are grouped into MAC-VRFs is
305	   not relevant to the procedures specified in this document.

307	   Section 6 of [RFC7432] also defines several different EVPN service
308	   models:

310	   o  In the "vlan-based service", each MAC-VRF contains one "bridge
311	      table", where the bridge table corresponds to a particular Virtual
312	      LAN (VLAN).  (See section 3 of [RFC7432] for a discussion of this
313	      terminology.)  Thus each VLAN is treated as a BD.

315	   o  In the "vlan bundle service", each MAC-VRF contains one bridge
316	      table, where the bridge table corresponds to a set of VLANs.  Thus
317	      a set of VLANs are treated as constituting a single BD.

319	   o  In the "vlan-aware bundle service", each MAC-VRF may contain
320	      multiple bridge tables, where each bridge table corresponds to one
321	      BD.  If a MAC-VRF contains several bridge tables, then it
322	      corresponds to several BDs.

324	   The procedures of this document are intended to work for all these
325	   service models.
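
   The relationship between MAC-VRFs, bridge tables, and BDs in the
   three service models can be pictured with a small data model.  This
   is a sketch only; the class names, instance names, and VLAN numbers
   are hypothetical, and giving each bridge table of the vlan-aware
   bundle a single VLAN is an assumption of the example.

```python
from dataclasses import dataclass, field


@dataclass
class BridgeTable:
    name: str
    vlans: frozenset  # VLAN IDs mapped to this bridge table


@dataclass
class MacVrf:
    name: str
    bridge_tables: list = field(default_factory=list)


def broadcast_domains(mac_vrfs):
    """One BD (and hence one IRB interface) per bridge table, not per MAC-VRF."""
    return [bt.name for vrf in mac_vrfs for bt in vrf.bridge_tables]


# vlan-based service: one bridge table, one VLAN.
vlan_based = MacVrf("vrf-a", [BridgeTable("bd-10", frozenset({10}))])

# vlan bundle service: one bridge table covering a set of VLANs.
vlan_bundle = MacVrf("vrf-b", [BridgeTable("bd-20s", frozenset({20, 21, 22}))])

# vlan-aware bundle service: several bridge tables, each its own BD.
vlan_aware = MacVrf("vrf-c", [BridgeTable("bd-30", frozenset({30})),
                              BridgeTable("bd-31", frozenset({31}))])
```

   Three MAC-VRFs yield four BDs here, which is why the grouping of
   BDs into MAC-VRFs does not matter to the procedures that follow.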

327	1.2.  Need for EVPN-aware Multicast Procedures

329	   Inter-subnet IP multicast among a set of BDs can be achieved, in a
330	   non-optimal manner, without any specific EVPN procedures.  For
331	   instance, if a particular tenant has n BDs among which it wants to
332	   send IP multicast traffic, it can simply attach a conventional
333	   multicast router to all n BDs.  Or more generally, as long as each BD
334	   has at least one IP multicast router, and the IP multicast routers
335	   communicate multicast control information with each other,
336	   conventional IP multicast procedures will work normally, and no
337	   special EVPN functionality is needed.

339	   However, that technique does not provide optimal routing for
340	   multicast.  In conventional multicast routing, for a given multicast
341	   flow, there is only one multicast router on each BD that is permitted
342	   to send traffic of that flow to the BD.  If that BD has receivers for
343	   a given flow, but the source of the flow is not on that BD, then the
344	   flow must pass through that multicast router.  This leads to the
345	   "hair-pinning" problem described (for unicast) in Appendix A.

347	   For example, consider an (S,G) flow that is sourced by a TS S and
348	   needs to be received by TSes R1 and R2.  Suppose S is on a segment of
349	   BD1, R1 is on a segment of BD2, but both are attached to PE1.
350	   Suppose also that the tenant has a multicast router, attached to a
351	   segment of BD1 and to a segment of BD2.  However, the segments to
352	   which that router is attached are both attached to PE2.  Then the
353	   flow from S to R1 would have to follow the path:
354	   S-->PE1-->PE2-->Tenant Multicast Router-->PE2-->PE1-->R1.  Obviously,
355	   the path S-->PE1-->R1 would be preferred.

357	   Now suppose that there is a second receiver, R2.  R2 is attached to a
358	   third BD, BD3.  However, it is attached to a segment of BD3 that is
359	   attached to PE1.  And suppose also that the Tenant Multicast Router
360	   is attached to a segment of BD3 that attaches to PE2.  In this case,
361	   the Tenant Multicast Router will make two copies of the packet, one
362	   for BD2 and one for BD3.  PE2 will send both copies back to PE1.  Not
363	   only is the routing sub-optimal, but PE2 sends multiple copies of the
364	   same packet to PE1.  This is a further sub-optimality.
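
   The cost of the example above can be made concrete by counting how
   many copies of one packet cross the backbone.  The sketch below is
   a simplified model, not a procedure of this document: the function
   names are hypothetical, each receiver BD is assumed to have its
   receivers behind a single PE, and the "optimal" count assumes that
   routing is done at the ingress PE with one copy per remote PE.

```python
def backbone_copies_via_tenant_router(source_pe, router_pe, receivers):
    """receivers maps each receiver BD to the PE where its receivers attach."""
    copies = 0
    if source_pe != router_pe:
        copies += 1  # the source-BD frame is carried to the router's PE
    for pe in receivers.values():
        if pe != router_pe:
            copies += 1  # one routed copy per receiver BD is sent back
    return copies


def backbone_copies_optimal(source_pe, receivers):
    """Assumed ideal: at most one copy per remote PE that has receivers."""
    return len({pe for pe in receivers.values() if pe != source_pe})


# S is on BD1 at PE1; R1 (BD2) and R2 (BD3) are also at PE1;
# the Tenant Multicast Router attaches to all three BDs via PE2.
receivers = {"BD2": "PE1", "BD3": "PE1"}
hairpinned = backbone_copies_via_tenant_router("PE1", "PE2", receivers)
optimal = backbone_copies_optimal("PE1", receivers)
```

   Under these assumptions the hair-pinned path moves three copies
   across the backbone (one from PE1 to PE2, two from PE2 to PE1),
   while the preferred path needs none.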

366	   This is only an example; many more examples of sub-optimal multicast
367	   routing can easily be given.  To eliminate sub-optimal routing and
368	   extra copies, it is necessary to have a multicast solution that is
369	   EVPN-aware, and that can use its knowledge of the internal structure
370	   of a Tenant Domain to ensure that multicast traffic gets routed
371	   optimally.  The procedures of this document allow us to avoid all
372	   such sub-optimalities when routing inter-subnet multicasts within a
373	   Tenant Domain.

375	1.3.  Additional Requirements That Must be Met by the Solution

377	   In addition to providing optimal routing of multicast flows within a
378	   Tenant Domain, the EVPN-aware multicast solution is intended to
379	   satisfy the following requirements:

381	   o  The solution must integrate well with the procedures specified in
382	      [I-D.ietf-bess-evpn-igmp-mld-proxy].  That is, an integrated set
383	      of procedures must handle both intra-subnet multicast and
384	      inter-subnet multicast.

386	   o  With regard to intra-subnet multicast, the solution MUST maintain
387	      the integrity of multicast ethernet service.  This means:

389	      *  If a source and a receiver are on the same subnet, the MAC
390	         source address (SA) of the multicast frame sent by the source
391	         will not get rewritten.

393	      *  If a source and a receiver are on the same subnet, no IP
394	         processing of the ethernet payload is done.  The IP TTL is not
395	         decremented, the header checksum is not changed, no
396	         fragmentation is done, etc.

398	   o  On the other hand, if a source and a receiver are on different
399	      subnets, the frame received by the receiver will not have the MAC
400	      Source address of the source, as the frame will appear to have
401	      come from a multicast router.  Also, proper processing of the IP
402	      header is done, e.g., TTL decrement by 1, header checksum
403	      modification, possibly fragmentation, etc.

405	   o  If a Tenant Domain contains several BDs, it MUST be possible for a
406	      multicast flow (even when the multicast group address is an "any
407	      source multicast" (ASM) address), to have sources in one of those
408	      BDs and receivers in one or more of the other BDs, without
409	      requiring the presence of any system performing PIM Rendezvous
410	      Point (RP) functions ([RFC7761]).  Multicast throughout a Tenant
411	      Domain must not require the tenant systems to be aware of any
412	      underlying multicast infrastructure.

414	   o  Sometimes a MAC address used by one TS on a particular BD is also
415	      used by another TS on a different BD.  Inter-subnet routing of
416	      multicast traffic MUST NOT make any assumptions about the
417	      uniqueness of a MAC address across several BDs.

419	   o  If two EVPN-PEs attached to the same Tenant Domain both support
420	      the OISM procedures, each may receive inter-subnet multicasts from
421	      the other, even if the egress PE is not attached to any segment of
422	      the BD from which the multicast packets are being sourced.  It
423	      MUST NOT be necessary to provision the egress PE with knowledge of
424	      the ingress BD.

426	   o  There must be a procedure that allows EVPN-PE routers
427	      supporting OISM procedures to send/receive multicast traffic to/
428	      from EVPN-PE routers that support only [RFC7432], but that do not
429	      support the OISM procedures or even the procedures of
430	      [I-D.ietf-bess-evpn-inter-subnet-forwarding].  However, when
431	      interworking with such routers (which we call "non-OISM PE
432	      routers"), optimal routing may not be achievable.

434	   o  It MUST be possible to support scenarios in which multicast flows
435	      with sources inside a Tenant Domain have "external" receivers,
436	      i.e., receivers that are outside the domain.  It must also be
437	      possible to support scenarios where multicast flows with external
438	      sources (sources outside the Tenant Domain) have receivers inside
439	      the domain.

441	      This presupposes that unicast routes to multicast sources outside
442	      the domain can be distributed to EVPN-PEs attached to the domain,
443	      and that unicast routes to multicast sources within the domain can
444	      be distributed outside the domain.

446	      Of particular importance are the scenario in which the external
447	      sources and/or receivers are reachable via L3VPN/MVPN, and the
448	      scenario in which external sources and/or receivers are reachable
449	      via IP/PIM.

451	      The solution for external interworking MUST allow for deployment
452	      scenarios in which EVPN does not need to export a host route for
453	      every multicast source.

455	   o  The solution for external interworking must not presuppose that
456	      the same tunneling technology is used within both the EVPN domain
457	      and the external domain.  For example, MVPN interworking must be
458	      possible when MVPN is using MPLS P2MP tunneling, and EVPN is using
459	      Ingress Replication or VXLAN tunneling.

461	   o  The solution must not be overly dependent on the details of a
462	      small set of use cases, but must be adaptable to new use cases as
463	      they arise.  (That is, the solution must be robust.)
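
   One of the requirements above is that no assumption be made about
   the uniqueness of a MAC address across BDs.  A minimal way to
   picture this is a MAC table whose key includes the BD.  The MacTable
   class, its methods, and the attachment names below are hypothetical
   and serve only as an illustration.

```python
class MacTable:
    """MAC lookups scoped by BD, so the same MAC may appear in several BDs."""

    def __init__(self):
        self._entries = {}

    def learn(self, bd, mac, attachment):
        # The key is the (BD, MAC) pair, never the MAC alone.
        self._entries[(bd, mac.lower())] = attachment

    def lookup(self, bd, mac):
        return self._entries.get((bd, mac.lower()))


table = MacTable()
table.learn("BD1", "00:aa:bb:cc:dd:01", "PE1/ac1")
table.learn("BD2", "00:AA:BB:CC:DD:01", "PE2/ac4")  # same MAC, different BD
```

   Both entries coexist, and a lookup for the same MAC address returns
   a different attachment depending on the BD in which it is made.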

465	1.4.  Terminology

467	   In this document we make frequent use of the following terminology:

469	   o  OISM: Optimized Inter-Subnet Multicast.  EVPN-PEs that follow the
470	      procedures of this document will be known as "OISM" PEs.  EVPN-PEs
471	      that do not follow the procedures of this document will be known
472	      as "non-OISM" PEs.

474	   o  IP Multicast Packet: An IP packet whose IP Destination Address
475	      field is a multicast address that is not a link-local address.
476	      (Link-local addresses are IPv4 addresses in the 224.0.0.0/24 range
477	      and IPv6 addresses in the ff02::/16 range.)

479	   o  IP Multicast Frame: An ethernet frame whose payload is an IP
480	      multicast packet (as defined above).

482	   o  (S,G) Multicast Packet: An IP multicast packet whose IP Source
483	      Address field contains S and whose IP Destination Address field
484	      contains G.

486	   o  (S,G) Multicast Frame: An IP multicast frame whose payload
487	      contains S in its IP Source Address field and G in its IP
488	      Destination Address field.

490	   o  Broadcast Domain (BD): an emulated ethernet, such that two systems
491	      on the same BD will receive each other's link-local broadcasts.

493	      Note that EVPN supports service models in which a single EVPN
494	      Instance (EVI) contains only one BD, and service models in which a
495	      single EVI contains multiple BDs.  Both types of service model are
496	      supported by this draft.  In all models, a given BD belongs to
497	      only one EVI.

499	   o  Designated Forwarder (DF).  As defined in [RFC7432], an ethernet
500	      segment may be multi-homed (attached to more than one PE).  An
501	      ethernet segment may also contain multiple BDs, of one or more
502	      EVIs.  For each such EVI, one of the PEs attached to the segment
503	      becomes that EVI's DF for that segment.  Since a BD may belong to
504	      only one EVI, we can speak unambiguously of the BD's DF for a
505	      given segment.

507	      When the text makes it clear that we are speaking in the context
508	      of a given BD, we will frequently use the term "a segment's DF" to
509	      mean the given BD's DF for that segment.

511	   o  AC: Attachment Circuit.  An AC connects the bridging function of
512	      an EVPN-PE to an ethernet segment of a particular BD.  ACs are not
513	      visible at the router (L3) layer.

515	      If a given ethernet segment, attached to a given PE, contains n
516	      BDs, we will say that the PE has n ACs to that segment.

518	   o  L3 Gateway: An L3 Gateway is a PE that connects an EVPN tenant
519	      domain to an external multicast domain by performing both the OISM
520	      procedures and the Layer 3 multicast procedures of the external
521	      domain.

523	   o  PEG (PIM/EVPN Gateway): An L3 Gateway that connects an EVPN Tenant
524	      Domain to an external multicast domain whose Layer 3 multicast
525	      procedures are those of PIM ([RFC7761]).

527	   o  MEG (MVPN/EVPN Gateway): An L3 Gateway that connects an EVPN Tenant
528	      Domain to an external multicast domain whose Layer 3 multicast
529	      procedures are those of MVPN ([RFC6513], [RFC6514]).

531	   o  IPMG (IP Multicast Gateway): A PE that is used for interworking
532	      OISM EVPN-PEs with non-OISM EVPN-PEs.

534	   o  DR (Designated Router): A PE that has special responsibilities for
535	      handling multicast on a given BD.

537	   o  FHR (First Hop Router): The FHR is a PIM router ([RFC7761]) with
538	      special responsibilities.  It is the first multicast router to see
539	      (S,G) packets from source S, and if G is an "Any Source Multicast
540	      (ASM)" group, the FHR is responsible for sending PIM Register
541	      messages to the PIM Rendezvous Point for group G.

543	   o  LHR (Last Hop Router): The LHR is a PIM router ([RFC7761]) with
544	      special responsibilities.  Generally it is attached to a LAN, and
545	      it determines whether there are any hosts on the LAN that need to
546	      receive a given multicast flow.  If so, it creates and sends the
547	      PIM Join messages that are necessary to draw the flow.

549	   o  EC (Extended Community).  A BGP Extended Communities attribute
550	      ([RFC4360], [RFC7153]) is a BGP path attribute that consists of
551	      one or more extended communities.

553	   o  RT (Route Target): A Route Target is a particular kind of BGP
554	      Extended Community.  A BGP Extended Community consists of a type
555	      field, a sub-type field, and a value field.  Certain type/sub-type
556	      combinations indicate that a particular Extended Community is an
557	      RT.  RT1 and RT2 are considered to be the same RT if and only if
558	      they have the same type, same sub-type, and same value fields.

560	   o  Use of the "C-" prefix.  In many documents on VPN multicast, the
561	      prefix "C-" appears before any address or wildcard that refers to
562	      an address or addresses in a tenant's address space, rather than
563	      to an address or addresses in the address space of the backbone
564	      network.  This document omits the "C-" prefix in many cases where
565	      it is clear from the context that the reference is to the tenant's
566	      address space.

568	   This document also assumes familiarity with the terminology of
569	   [RFC4364], [RFC6514], [RFC7432], [RFC7761],
570	   [I-D.ietf-bess-evpn-igmp-mld-proxy],
571	   [I-D.ietf-bess-evpn-prefix-advertisement] and
572	   [I-D.ietf-bess-evpn-bum-procedure-updates].

574	1.5.  Model of Operation: Overview

576	1.5.1.  Control Plane

578	   In this section, and in the remainder of this document, we assume the
579	   reader is familiar with the procedures of IGMP/MLD (see [RFC2236] and
580	   [RFC2710]), by which hosts announce their interest in receiving
581	   particular multicast flows.

583	   Consider a Tenant Domain consisting of a set of k BDs: BD1, ..., BDk.
584	   To support the OISM procedures, each Tenant Domain must also be
585	   associated with a "Supplementary Broadcast Domain" (SBD).  An SBD is
586	   treated in the control plane as a real BD, but it does not have any
587	   ACs.  The SBD has several uses; these will be described later in this
588	   document (see Section 2.1 and Section 3).

590	   Each PE that attaches to one or more of the BDs in a given tenant
591	   domain will be provisioned to recognize that those BDs are part of
592	   the same Tenant Domain.  Note that a given PE does not need to be
593	   configured with all the BDs of a given Tenant Domain.  In general, a
594	   PE will only be attached to a subset of the BDs in a given Tenant
595	   Domain, and will be configured only with that subset of BDs.
596	   However, each PE attached to a given Tenant Domain must be configured
597	   with the SBD for that Tenant Domain.

599	   Suppose a particular segment of a particular BD is attached to PE1.
600	   [RFC7432] specifies that PE1 must originate an Inclusive Multicast
601	   Ethernet Tag (IMET) route for that BD, and that the IMET route must
602	   be propagated to all other PEs attached to the same BD.  If the given
603	   segment contains a host that has interest in receiving a particular
604	   multicast flow, either an (S,G) flow or a (*,G) flow, PE1 will learn
605	   of that interest by participating in the IGMP/MLD procedures, as
606	   specified in [I-D.ietf-bess-evpn-igmp-mld-proxy].  In this case, we
607	   will say that:

609	   o  PE1 is interested in receiving the flow;

611	   o  The AC attaching the interested host to PE1 is also said to be
612	      interested in the flow;

614	   o  The BD containing an AC that is interested in a particular flow is
615	      also said to be interested in that flow.

617	   Once PE1 determines that it has an AC that is interested in receiving
618	   a particular flow or set of flows, it originates one or more
619	   Selective Multicast Ethernet Tag (SMET) routes to advertise that
620	   interest.

622	   Note that each IMET or SMET route is "for" a particular BD.  The
623	   notion of a route being "for" a particular BD is explained in
624	   Section 2.2.

626	   When OISM is being supported, the procedures of
627	   [I-D.ietf-bess-evpn-igmp-mld-proxy] are modified as follows:

629	   o  The IMET route originated by a particular PE for a particular BD
630	      is distributed to all other PEs attached to the Tenant Domain
631	      containing that BD, even to those PEs that are not attached to
632	      that particular BD.

634	   o  The SMET routes originated by a particular PE are originated on a
635	      per-Tenant-Domain basis, rather than on a per-BD basis.  That is,
636	      the SMET routes are considered to be for the Tenant Domain's SBD,
637	      rather than for any of its ordinary BDs.  These SMET routes are
638	      distributed to all the PEs attached to the Tenant Domain.

640	      In this way, each PE attached to a given Tenant Domain learns,
641	      from each other PE attached to the same Tenant Domain, the set of
642	      flows that are of interest to each of those other PEs.

644	   An OISM PE that is provisioned with several BDs in the same Tenant
645	   Domain MUST originate an IMET route for each such BD.  To indicate
646	   its support of [I-D.ietf-bess-evpn-igmp-mld-proxy], it SHOULD attach
647	   the EVPN Multicast Flags Extended Community to each such IMET route,
648	   but it MUST attach the EC to at least one such IMET route.

650	   Suppose PE1 is provisioned with both BD1 and BD2, and is provisioned
651	   to consider them to be part of the same Tenant Domain.  It is
652	   possible that PE1 will receive from PE2 both an IMET route for BD1
653	   and an IMET route for BD2.  If either of these IMET routes has the
654	   EVPN Multicast Flags Extended Community, PE1 MUST assume that PE2 is
655	   supporting the procedures of [I-D.ietf-bess-evpn-igmp-mld-proxy] for
656	   ALL BDs in the Tenant Domain.

658	   If a PE supports OISM functionality, it indicates that by setting the
659	   "OISM-supported" flag in the Multicast Flags Extended Community that
660	   it attaches to some or all of its IMET routes.  An OISM PE SHOULD
661	   attach this EC with the OISM-supported flag set to all the IMET
662	   routes it originates.  However, if PE1 imports IMET routes from PE2,
663	   and at least one of PE2's IMET routes indicates that PE2 is an OISM
664	   PE, PE1 MUST assume that PE2 is following OISM procedures.
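
   The two "at least one IMET route" rules above can be sketched in
   Python as follows.  This is only an illustration: the ImetRoute
   class, its field names, and the peer_capabilities function are
   hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImetRoute:
    """Hypothetical, simplified view of an imported IMET route."""
    originator: str                       # PE that originated the route
    bd: str                               # BD the route is "for"
    has_multicast_flags_ec: bool = False  # EVPN Multicast Flags EC attached
    oism_supported: bool = False          # "OISM-supported" flag set in that EC


def peer_capabilities(imported_routes, peer):
    """Infer what a peer supports across the whole Tenant Domain."""
    from_peer = [r for r in imported_routes if r.originator == peer]
    return {
        # One IMET route carrying the EC is enough to assume IGMP/MLD
        # proxy support for ALL BDs in the Tenant Domain.
        "igmp_mld_proxy": any(r.has_multicast_flags_ec for r in from_peer),
        # One IMET route with the OISM-supported flag is enough to
        # assume the peer follows OISM procedures.
        "oism": any(r.has_multicast_flags_ec and r.oism_supported
                    for r in from_peer),
    }
```

   For example, if PE2 attaches the EC (with the OISM-supported flag
   set) only to its IMET route for BD1, PE1 still treats PE2 as an OISM
   PE for BD2.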

666	1.5.2.  Data Plane

668	   Suppose PE1 has an AC to a segment in BD1, and PE1 receives from that
669	   AC an (S,G) multicast frame (as defined in Section 1.4).

671	   There may be other ACs of PE1 on which TSes have indicated an
672	   interest (via IGMP/MLD) in receiving (S,G) multicast packets.  PE1 is
673	   responsible for sending the received multicast packet out those ACs.
674	   There are two cases to consider:

676	   o  Intra-Subnet Forwarding: In this case, an AC with
677	      interest in (S,G) is connected to a segment that is part of the
678	      source BD, BD1.  If the segment is not multi-homed, or if PE1 is
679	      the Designated Forwarder (DF) (see [RFC7432]) for that segment,
680	      PE1 sends the multicast frame on that AC without changing the MAC
681	      SA.  The IP header is not modified at all; in particular, the TTL
682	      is not decremented.

684	   o  Inter-Subnet Forwarding: An AC with interest in (S,G) is connected
685	      to a segment of BD2, where BD2 is different than BD1.  If PE1 is
686	      the DF for that segment (or if the segment is not multi-homed),
687	      PE1 decapsulates the IP multicast packet, performs any necessary
688	      IP processing (including TTL decrement), then re-encapsulates the
689	      packet appropriately for BD2.  PE1 then sends the packet on the
690	      AC.  Note that after re-encapsulation, the MAC SA will be PE1's
691	      MAC address on BD2.  The IP TTL will have been decremented by 1.
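
   The two local forwarding cases above can be sketched in Python as
   follows.  The Frame and AC classes, their fields, and the
   pe_mac_on_bd mapping are hypothetical names used only for
   illustration.  The sketch assumes the caller passes only ACs other
   than the one on which the frame arrived, and it omits TTL expiry
   handling.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    mac_sa: str  # source MAC address of the ethernet frame
    ttl: int     # TTL of the IP multicast packet carried in the frame


@dataclass(frozen=True)
class AC:
    name: str
    bd: str
    interested: bool          # a TS on this AC asked for (S,G) via IGMP/MLD
    multi_homed: bool = False
    pe_is_df: bool = False    # meaningful only when multi_homed is True


def forward_on_local_acs(frame, source_bd, acs, pe_mac_on_bd):
    """Return {AC name: frame to send} for a frame received on source_bd."""
    out = {}
    for ac in acs:
        if not ac.interested:
            continue
        if ac.multi_homed and not ac.pe_is_df:
            continue  # only the DF sends onto a multi-homed segment
        if ac.bd == source_bd:
            # Intra-subnet: MAC SA and IP header are left untouched.
            out[ac.name] = frame
        else:
            # Inter-subnet: the PE's MAC on the target BD becomes the
            # MAC SA, and the TTL is decremented by 1.
            out[ac.name] = replace(frame, mac_sa=pe_mac_on_bd[ac.bd],
                                   ttl=frame.ttl - 1)
    return out
```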

693	   In addition, there may be other PEs that are interested in (S,G)
694	   traffic.  Suppose PE2 is such a PE.  Then PE1 tunnels a copy of the
695	   IP multicast frame (with its original MAC SA, and with no alteration
696	   of the payload's IP header) to PE2.  The tunnel encapsulation
697	   contains information that PE2 can use to associate the frame with an
698	   "apparent source BD".  If the actual source BD of the frame is BD1,
699	   then:

701	   o  If PE2 is attached to BD1, the tunnel encapsulation used to send
702	      the frame to PE2 will cause PE2 to identify BD1 as the apparent
703	      source BD.

705	   o  If PE2 is not attached to BD1, the tunnel encapsulation used to
706	      send the frame to PE2 will cause PE2 to identify the SBD as the
707	      apparent source BD.

709	   Note that the tunnel encapsulation used for a particular BD will have
710	   been advertised in an IMET route or S-PMSI route
711	   ([I-D.ietf-bess-evpn-bum-procedure-updates]) for that BD.  That route
712	   carries a PMSI Tunnel attribute, which specifies how packets
713	   originating from that BD are encapsulated.  This information enables
714	   the PE receiving a tunneled packet to identify the apparent source BD
715	   as stated above.  See Section 3.2 for more details.

717	   When PE2 receives the tunneled frame, it will forward it on any of
718	   its ACs that have interest in (S,G).

720	   If PE2 determines from the tunnel encapsulation that the apparent
721	   source BD is BD1, then

723	   o  For those ACs that connect PE2 to BD1, the intra-subnet forwarding
724	      procedure described above is used, except that it is now PE2, not
725	      PE1, carrying out that procedure.  Unmodified EVPN procedures from
726	      [RFC7432] are used to ensure that a packet originating from a
727	      multi-homed segment is never sent back to that segment.

729	   o  For those ACs that do not connect to BD1, the inter-subnet
730	      forwarding procedure described above is used, except that it is
731	      now PE2, not PE1, carrying out that procedure.

733	   If the tunnel encapsulation identifies the apparent source BD as the
734	   SBD, PE2 applies the inter-subnet forwarding procedures described
735	   above to all of its ACs that have interest in the flow.
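
   The egress behavior described above can be summarized by a minimal
   Python sketch.  The function names are hypothetical.  The sketch
   models only the outcome (which apparent source BD PE2 ends up with,
   and which procedure it applies per AC), not the tunnel
   encapsulation that conveys the information.

```python
def apparent_source_bd(actual_source_bd, attached_bds, sbd):
    """Apparent source BD as seen by an egress PE.

    attached_bds is the set of ordinary BDs the egress PE is attached to.
    """
    return actual_source_bd if actual_source_bd in attached_bds else sbd


def egress_procedure(ac_bd, apparent_bd):
    """Procedure applied by the egress PE for an interested AC in ac_bd."""
    # The SBD has no ACs, so when the apparent source BD is the SBD
    # every interested AC gets inter-subnet treatment.
    return "intra-subnet" if ac_bd == apparent_bd else "inter-subnet"
```

   With PE2 attached to BD2 and BD3 only, a frame whose actual source
   BD is BD1 has the SBD as its apparent source BD, so all of PE2's
   interested ACs receive it via inter-subnet forwarding.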

737	   These procedures ensure that an IP multicast frame travels from its
738	   ingress PE to all egress PEs that are interested in receiving it.
739	   While in transit, the frame retains its original MAC SA, and the
740	   payload of the frame retains its original IP header.  Note that in
741	   all cases, when an IP multicast packet is sent from one BD to
742	   another, these procedures cause its TTL to be decremented by 1.

744	   So far we have assumed that an IP multicast packet arrives at its
745	   ingress PE over an AC that belongs to one of the BDs in a given
746	   Tenant Domain.  However, it is possible for a packet to arrive at its
747	   ingress PE in other ways.  Since an EVPN-PE supporting IRB has an
748	   IP-VRF, it is possible that the IP-VRF will have a "VRF interface"
749	   that is not an IRB interface.  For example, there might be a VRF
750	   interface that is actually a physical link to an external ethernet
751	   switch, or to a directly attached host, or to a router.  When an
752	   EVPN-PE, say PE1, receives a packet through such means, we will say
753	   that the packet has an "external" source (i.e., a source "outside the
754	   Tenant Domain").  There are also other scenarios in which a multicast
755	   packet might have an external source, e.g., it might arrive over an
756	   MVPN tunnel from an L3VPN PE.  In such cases, we will still refer to
757	   PE1 as the "ingress EVPN-PE".

759	   When an EVPN-PE, say PE1, receives an externally sourced multicast
760	   packet, and there are receivers for that packet inside the Tenant
761	   Domain, it does the following:

763	   o  Suppose PE1 has an AC in BD1 that has interest in (S,G).  Then PE1
764	      encapsulates the packet for BD1, filling in the MAC SA field with
765	      PE1's own MAC address on BD1.  It sends the resulting frame on the
766	      AC.

768	   o  Suppose some other EVPN-PE, say PE2, has interest in (S,G).  PE1
769	      encapsulates the packet for ethernet, filling in the MAC SA field
770	      with PE1's own MAC address on the SBD.  PE1 then tunnels the
771	      packet to PE2.  The tunnel encapsulation will identify the
772	      apparent source BD as the SBD.  Since the apparent source BD is
773	      the SBD, PE2 will know to treat the frame as an inter-subnet
774	      multicast.

776	   When ingress replication is used to transmit IP multicast frames from
777	   an ingress EVPN-PE to a set of egress PEs, then of course the ingress
778	   PE has to send multiple copies of the frame.  Each copy is the
779	   original ethernet frame; decapsulation and IP processing take place
780	   only at the egress PE.

782	   If a Point-to-Multipoint (P2MP) tree or BIER ([I-D.ietf-bier-evpn])
783	   is used to transmit an IP multicast frame from an ingress PE to a set
784	   of egress PEs, then the ingress PE only has to send one copy of the
785	   frame to each of its next hops.  Again, each egress PE receives the
786	   original frame and does any necessary IP processing.
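
   The difference in the number of copies the ingress PE sends can be
   illustrated with a small Python sketch.  The function name, the
   tunnel type strings, and the next_hop_of mapping are assumptions
   made for the example.

```python
def copies_sent_by_ingress(tunnel_type, next_hop_of):
    """Count the copies of one frame that the ingress PE transmits.

    next_hop_of maps each interested egress PE to the ingress PE's
    next hop toward that egress PE.
    """
    if tunnel_type == "IR":
        # Ingress replication: one copy per egress PE.
        return len(next_hop_of)
    if tunnel_type in ("P2MP", "BIER"):
        # P2MP tree or BIER: one copy per distinct next hop.
        return len(set(next_hop_of.values()))
    raise ValueError("unknown tunnel type: " + tunnel_type)
```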

788	2.  Detailed Model of Operation

790	   The model described in Section 1.5.2 can be expressed more precisely
791	   using the notion of "IRB interface" (see Appendix A).  For a given
792	   Tenant Domain:

794	   o  A given PE has one IRB for each BD to which it is attached.  This
795	      IRB interface connects L3 routing to that BD.  When IP multicast
796	      packets are sent or received on the IRB interfaces, the semantics
797	      of the interface is modified from the semantics described in
798	      Appendix A.  See Section 2.3 for the details of the modification.

800	   o  Each PE also has an IRB interface that connects L3 routing to the
801	      SBD.  The semantics of this interface is different than the
802	      semantics of the IRB interface to the real BDs.  See Section 2.3.

804	   In this section we assume that PIM is not enabled on the IRB
805	   interfaces.  In general, it is not necessary to enable PIM on the IRB
806	   interfaces unless there are PIM routers on one of the Tenant Domain's
807	   BDs, or unless there is some other scenario requiring a Tenant
808	   Domain's L3 routing instance to become a PIM adjacency of some other
809	   system.  These cases will be discussed in Section 7.

811	2.1.  Supplementary Broadcast Domain

813	   Suppose a given Tenant Domain contains three BDs (BD1, BD2, BD3) and
814	   two PEs (PE1, PE2).  PE1 attaches to BD1 and BD2, while PE2 attaches
815	   to BD2 and BD3.

817	   To carry out the procedures described above, all the PEs attached to
818	   the Tenant Domain must be provisioned with the SBD for that tenant
819	   domain.  A Route Target (RT) must be associated with the SBD, and
820	   provisioned on each of those PEs.  We will refer to that RT as the
821	   "SBD-RT".

823	   A Tenant Domain is also configured with an IP-VRF
824	   ([I-D.ietf-bess-evpn-inter-subnet-forwarding]), and the IP-VRF is
825	   associated with an RT.  This RT MAY be the same as the SBD-RT.

827	   Suppose an (S,G) multicast frame originating on BD1 has a receiver on
828	   BD3.  PE1 will transmit the packet to PE2 as a frame, and the
829	   encapsulation will identify the frame's source BD as BD1.  Since PE2
830	   is not provisioned with BD1, it will treat the packet as if its
831	   source BD were the SBD.  That is, a packet can be transmitted from
832	   BD1 to BD3 even though its ingress PE is not configured for BD3, and/
833	   or its egress PE is not configured for BD1.

835	   EVPN supports service models in which a given EVPN Instance (EVI) can
836	   contain only one BD.  It also supports service models in which a
837	   given EVI can contain multiple BDs.  No matter which service model is
838	   being used for a particular tenant, it is highly RECOMMENDED that an
839	   EVI containing only the SBD be provisioned for that tenant.

841	   If, for some reason, it is not feasible to provision an EVI that
842	   contains only the SBD, it is possible to put the SBD in an EVI that
843	   contains other BDs.  However, in that case, the SBD-RT MUST be
844	   different than the RT associated with any other BD.  Otherwise the
845	   procedures of this document (as detailed in Sections 2.2 and 3.1)
846	   will not produce correct results.

848	2.2.  Detecting When a Route is About/For/From a Particular BD

850	   In this document, we frequently say that a particular multicast route
851	   is "about" a particular BD, or is "from" a particular BD, or is "for"
852	   a particular BD or is "related to" a particular BD or "is associated
853	   with" a particular BD.  These terms are used interchangeably.
854	   Subsequent sections of this document explain when various routes must
855	   be originated for particular BDs.  In this section, we explain how
856	   the PE originating a route marks the route to indicate which BD it is
857	   about.  We also explain how a PE receiving the route determines which
858	   BD the route is about.

860	   In EVPN, each BD is assigned a Route Target (RT).  An RT is a BGP
861	   extended community that can be attached to the BGP routes used by the
862	   EVPN control plane.  In some EVPN service models, each BD is assigned
863	   a unique RT.  In other service models, a set of BDs (all in the same
864	   EVI) may be assigned the same RT.  The RT that is assigned to the SBD
865	   is called the "SBD-RT".

867	   In those service models that allow a set of BDs to share a single RT,
868	   each BD is assigned a non-zero Tag ID.  The Tag ID appears in the
869	   Network Layer Reachability Information (NLRI) of many of the BGP
870	   routes that are used by the EVPN control plane.

872	   A given route may be about the SBD, or about an "ordinary BD" (a BD
873	   that is not the SBD).  An RT that has been assigned to an ordinary BD
874	   will be known as an "ordinary BD-RT".

876	   When constructing an IMET, SMET, S-PMSI or Leaf
877	   ([I-D.ietf-bess-evpn-bum-procedure-updates]) route that is about a
878	   given BD, the following rules apply:

880	   o  If the route is about an ordinary BD, say BD1, then

882	      *  the route MUST carry the ordinary BD-RT associated with BD1,
883	         and

885	      *  the route MUST NOT carry any RT that is associated with an
886	         ordinary BD other than BD1.

888	   o  If the route is about the SBD, the route MUST carry the SBD-RT,
889	      and MUST NOT carry any RT that is associated with any other BD.

891	   o  As detailed in subsequent sections, under certain circumstances a
892	      route that is about BD1 may carry both the RT of BD1 and also the
893	      SBD-RT.
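
   The RT construction rules above can be sketched in Python as
   follows.  The function name, the rt_of mapping (BD name to RT), and
   the also_carry_sbd_rt parameter are hypothetical; the circumstances
   under which a route about an ordinary BD also carries the SBD-RT are
   specified in later sections and are not modeled here.

```python
def rts_for_route(bd, sbd, rt_of, also_carry_sbd_rt=False):
    """Return the set of RTs carried by a route that is about `bd`."""
    if bd == sbd:
        # A route about the SBD carries the SBD-RT and no other BD's RT.
        return {rt_of[sbd]}
    # A route about an ordinary BD carries that BD's RT and never the
    # RT of a different ordinary BD.
    rts = {rt_of[bd]}
    if also_carry_sbd_rt:
        rts.add(rt_of[sbd])
    return rts
```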

895	   The IMET route for the SBD MUST carry a Multicast Flags Extended
896	   Community, in which an "OISM SBD" flag is set.

898	   The IMET route for a BD other than the SBD SHOULD carry an EVI-RT EC
899	   as defined in [I-D.ietf-bess-evpn-igmp-mld-proxy].  The EC is
900	   constructed from the SBD-RT, to indicate the BD's corresponding SBD.
901	   This allows all PEs to check that they have consistent SBD
902	   provisioning and allows an AR-replicator to automatically determine a
903	   BD's corresponding SBD without any provisioning, as explained in
904	   Section 3.2.3.1.

906	   When receiving an IMET, SMET, S-PMSI or Leaf route, it is necessary
907	   for the receiving PE to determine the BD to which the route belongs.

909	   This is done by examining the RTs carried by the route, as well as
910	   the Tag ID field of the route's NLRI.  There are several cases to
911	   consider.  Some of these cases are error cases that arise when the
912	   route has not been properly constructed.

914	   When one of the error cases is detected, the route MUST be regarded
915	   as a malformed route, and the "treat-as-withdraw" procedure of
916	   [RFC7606] MUST be applied.  Note though that these error cases are
917	   only detectable by EVPN procedures at the receiving PE; BGP
918	   procedures at intermediate nodes will generally not detect the
919	   existence of such error cases, and in general SHOULD NOT attempt to
920	   do so.

922	   Case 1:  The receiving PE recognizes more than one of the route's RTs
923	            as being an SBD-RT (i.e., the route carries SBD-RTs of more
924	            than one Tenant Domain).

926	            This is an error case; the route has not been properly
927	            constructed.

929	   Case 2:  The receiving PE recognizes one of the route's RTs as being
930	            associated with an ordinary BD, and recognizes one of the
931	            route's other RTs as being associated with a different
932	            ordinary BD.

934	            This is an error case; the route has not been properly
935	            constructed.

937	   Case 3:  The receiving PE recognizes one of the route's RTs as being
938	            associated with an ordinary BD in a particular Tenant
939	            Domain, and recognizes another of the route's RTs as being
940	            associated with the SBD of a different Tenant Domain.

942	            This is an error case; the route has not been properly
943	            constructed.

945	   Case 4:  The receiving PE does not recognize any of the route's RTs
946	            as being associated with an ordinary BD in any of its tenant
947	            domains, but does recognize one of the RTs as the SBD-RT of
948	            one of its Tenant Domains.

950	            In this case, the receiving PE associates the route with the SBD
951	            of that Tenant Domain.  This association is made even if the
952	            Tag ID field of the route's NLRI is not the Tag ID of the
953	            SBD.

955	            This is a normal use case where either (a) the route is for
956	            a BD to which the receiving PE is not attached, or (b) the
957	            route is for the SBD.  In either case, the receiving PE
958	            associates the route with the SBD.

960	   Case 5:  The receiving PE recognizes exactly one of the RTs as an
961	            ordinary BD-RT that is associated with one of the PE's EVIs,
962	            say EVI-1.  The receiving PE also recognizes one of the RTs
963	            as being the SBD-RT of the Tenant Domain containing EVI-1.

965	            In this case, the route is associated with the BD in EVI-1
966	            that is identified (in the context of EVI-1) by the Tag ID
967	            field of the route's NLRI.  (If EVI-1 contains only a single
968	            BD, the Tag ID is likely to be zero.)

970	            This is the case where the route is for a BD to which the
971	            receiving PE is attached, but the route also carries the
972	            SBD-RT.  In this case, the receiving PE associates the route
973	            with the ordinary BD, not with the SBD.
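
   Cases 1 through 5 can be sketched as a single Python function.  The
   TenantDomain class and classify_route function are hypothetical.
   The sketch assumes that each ordinary BD-RT known to the PE
   identifies one EVI, so that recognizing two ordinary BD-RTs implies
   two different ordinary BDs.  A "malformed" result corresponds to the
   error cases for which treat-as-withdraw applies; combinations that
   the cases above do not cover are reported as "unspecified".

```python
from dataclasses import dataclass, field


@dataclass
class TenantDomain:
    """Hypothetical view of one Tenant Domain as provisioned on a PE."""
    name: str
    sbd_rt: str
    # ordinary BD-RT -> {Tag ID: BD name} for the EVI that uses that RT
    evis: dict = field(default_factory=dict)


def classify_route(route_rts, tag_id, domains):
    """Map a received route to ("sbd", domain), ("bd", name), or an error."""
    sbd_hits = [td for td in domains if td.sbd_rt in route_rts]
    bd_hits = [(td, rt) for td in domains for rt in td.evis
               if rt in route_rts]

    if len(sbd_hits) > 1:
        return ("malformed", "case 1")  # SBD-RTs of several Tenant Domains
    if len(bd_hits) > 1:
        return ("malformed", "case 2")  # RTs of two different ordinary BDs
    if bd_hits and sbd_hits and bd_hits[0][0] is not sbd_hits[0]:
        return ("malformed", "case 3")  # BD-RT and SBD-RT domains differ
    if sbd_hits and not bd_hits:
        # Case 4: associate with the SBD, whatever the Tag ID is.
        return ("sbd", sbd_hits[0].name)
    if sbd_hits and bd_hits:
        # Case 5: the Tag ID selects the BD within the matching EVI.
        td, rt = bd_hits[0]
        bd = td.evis[rt].get(tag_id)
        if bd is not None:
            return ("bd", bd)
    return ("unspecified", None)
```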

975	   N.B.: According to the above rules, the mapping from BD to RT is a
976	   many-to-one or one-to-one mapping.  A route that an EVPN-PE
977	   originates for a particular BD carries that BD's RT, and an EVPN-PE
978	   that receives the route associates it with a BD as described above.
979	   However, RTs are not used only to help identify the BD to which a
980	   route belongs; they may also be used by BGP to determine the path along
981	   which the route is distributed, and to determine which PEs receive
982	   the route.  There may be cases where it is desirable to originate a
983	   route about a particular BD, but have that route distributed to only
984	   some of the EVPN-PEs attached to that BD.  Or one might want the
985	   route distributed to some intermediate set of systems, where it might
986	   be modified or replaced before being propagated further.  Such
987	   situations are outside the scope of this document.

989	   Additionally, there may be situations where it is desirable to
990	   exchange routes among two or more different Tenant Domains ("EVPN
991	   Extranet").  Such situations are outside the scope of this document.

993	2.3.  Use of IRB Interfaces at Ingress PE

995	   When an (S,G) multicast frame is received from an AC belonging to a
996	   particular BD, say BD1:

998	   1.  The frame is sent unchanged to other EVPN-PEs that are interested
999	       in (S,G) traffic.  The encapsulation used to send the frame to
1000	       the other EVPN-PEs depends on the tunnel type being used for
1001	       multicast transmission.  (For our purposes, we consider Ingress
1002	       Replication (IR), Assisted Replication (AR) and BIER to be
1003	       "tunnel types", even though IR, AR and BIER do not actually use
1004	       P2MP tunnels.)  At the egress PE, the apparent source BD of the
1005	       frame can be inferred from the tunnel encapsulation.  If the
1006	       egress PE is not attached to the actual source BD, it will infer
1007	       that the apparent source BD is the SBD.

1009	       Note that the inter-PE transmission of a multicast frame
1010	       among EVPN-PEs of the same Tenant Domain does NOT involve the IRB
1011	       interfaces, as long as the multicast frame was received over an
1012	       AC attached to one of the Tenant Domain's BDs.

1014	   2.  The frame is also sent up the IRB interface that attaches BD1 to
1015	       the Tenant Domain's L3 routing instance in this PE.  That is, the
1016	       L3 routing instance, behaving as if it were a multicast router,
1017	       receives the IP multicast frames that arrive at the PE from its
1018	       local ACs.  The L3 routing instance decapsulates the frame's
1019	       payload to extract the IP multicast packet, decrements the IP
1020	       TTL, adjusts the header checksum, and does any other necessary IP
1021	       processing (e.g., fragmentation).

1023	   3.  The L3 routing instance keeps track of which BDs have local
1024	       receivers for (S,G) traffic.  (A "local receiver" is a TS,
1025	       reachable via a local AC, that has expressed interest in (S,G)
1026	       traffic.)  If the L3 routing instance has an IRB interface to
1027	       BD2, and it knows that BD2 has a LOCAL receiver interested in
1028	       (S,G) traffic, it encapsulates the packet in an ethernet header
1029	       for BD2, putting its own MAC address in the MAC SA field.  Then
1030	       it sends the packet down the IRB interface to BD2.

1032	   If a packet is sent from the L3 routing instance to a particular BD
1033	   via the IRB interface (step 3 in the above list), and if the BD in
1034	   question is NOT the SBD, the packet is sent ONLY to LOCAL ACs of that
1035	   BD.  If the packet needs to go to other PEs, it has already been sent
1036	   to them in step 1.  Note that this is a change in the IRB interface
1037	   semantics from what is described in
1038	   [I-D.ietf-bess-evpn-inter-subnet-forwarding] and Figure 2.

1040	   If a given locally attached segment is multi-homed, existing EVPN
1041	   procedures ensure that a packet is not sent by a given PE to that
1042	   segment unless the PE is the DF for that segment.  Those procedures
1043	   also ensure that a packet is never sent by a PE to its segment of
1044	   origin.  Thus EVPN segment multi-homing is fully supported; duplicate
1045	   delivery to a segment or looping on a segment are thereby prevented,
1046	   without the need for any new procedures to be defined in this
1047	   document.

1049	   What if an IP multicast packet is received from outside the tenant
1050	   domain?  For instance, perhaps PE1's IP-VRF for a particular tenant
1051	   domain also has a physical interface leading to an external switch,
1052	   host, or router, and PE1 receives an IP multicast packet or frame on
1053	   that interface.  Or perhaps the packet is from an L3VPN, or a
1054	   different EVPN Tenant Domain.

1056	   Such a packet is first processed by the L3 routing instance, which
1057	   decrements TTL and does any other necessary IP processing.  Then the
1058	   packet is sent into the Tenant Domain by sending it down the IRB
1059	   interface to the SBD of that Tenant Domain.  This requires
1060	   encapsulating the packet in an ethernet header.  The MAC SA field
1061	   will contain the PE's own MAC on the SBD.

1063	   An IP multicast packet sent by the L3 routing instance down the IRB
1064	   interface to the SBD is treated as if it had arrived from a local AC,
1065	   and steps 1-3 are applied.  Note that the semantics of sending a
1066	   packet down the IRB interface to the SBD are thus slightly different
1067	   than the semantics of sending a packet down other IRB interfaces.  IP
1068	   multicast packets sent down the SBD's IRB interface may be
1069	   distributed to other PEs, but IP multicast packets sent down other
1070	   IRB interfaces are distributed only to local ACs.
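
   The difference between the two kinds of IRB interface can be
   sketched in Python.  The function and parameter names are
   hypothetical.  The sketch models only where a packet sent down a
   given IRB interface is distributed on that BD; the subsequent
   delivery to local receivers on ordinary BDs (steps 2 and 3) is not
   modeled.

```python
def irb_down_distribution(bd, sbd, interested_local_acs, interested_remote_pes):
    """Where an IP multicast packet goes when sent down the IRB to `bd`.

    interested_local_acs maps each ordinary BD to its interested local ACs.
    """
    if bd == sbd:
        # The SBD has no ACs; the frame may be distributed to other PEs.
        return {"local_acs": [], "remote_pes": sorted(interested_remote_pes)}
    # Ordinary BD: local ACs only, never other PEs.
    return {"local_acs": sorted(interested_local_acs.get(bd, [])),
            "remote_pes": []}
```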

1072	   If a PE sends a link-local multicast packet down the SBD IRB
1073	   interface, that packet will be distributed (as an ethernet frame) to
1074	   other PEs of the Tenant Domain, but will not appear on any of the
1075	   actual BDs.

1077	2.4.  Use of IRB Interfaces at an Egress PE

1079	   Suppose an egress EVPN-PE receives an (S,G) multicast frame from the
1080	   frame's ingress EVPN-PE.  As described above, the packet will arrive
1081	   as an ethernet frame over a tunnel from the ingress PE, and the
1082	   tunnel encapsulation will identify the source BD of the ethernet
1083	   frame.

1085	   We define the notion of the frame's "apparent source BD" as follows.
1086	   If the egress PE is attached to the actual source BD, the actual
1087	   source BD is the apparent source BD.  If the egress PE is not
1088	   attached to the actual source BD, the SBD is the apparent source BD.

1090	   The egress PE now takes the following steps:

1092	   1.  If the egress PE has ACs belonging to the apparent source BD of
1093	       the frame, it sends the frame unchanged to any ACs of that BD
1094	       that have interest in (S,G) packets.  The MAC SA of the frame is
1095	       not modified, and the IP header of the frame's payload is not
1096	       modified in any way.

1098	   2.  The frame is also sent to the L3 routing instance by being sent
1099	       up the IRB interface that attaches the L3 routing instance to the
1100	       apparent source BD.  Steps 2 and 3 of Section 2.3 are then
1101	       applied.

1103	2.5.  Announcing Interest in (S,G)

1105	   [I-D.ietf-bess-evpn-igmp-mld-proxy] defines procedures used by an
1106	   egress PE to announce its interest in a multicast flow or set of
1107	   flows.  If an egress PE determines it has LOCAL receivers in a
1108	   particular BD, say BD1, that are interested in a particular set of
1109	   flows, it originates one or more SMET routes for BD1.  Each SMET
1110	   route specifies a particular (S,G) or (*,G) flow.  By originating an
1111	   SMET route for BD1, a PE is announcing "I have receivers for (S,G) or
1112	   (*,G) in BD1".  Such an SMET route carries the Route Target (RT) for
1113	   BD1, ensuring that it will be distributed to all PEs that are
1114	   attached to BD1.

1116	   The OISM procedures for originating SMET routes differ slightly from
1117	   those in [I-D.ietf-bess-evpn-igmp-mld-proxy].  In most cases, the
1118	   SMET routes are considered to be for the SBD, rather than for the BD
1119	   containing local receivers.  These SMET routes carry the SBD-RT, and
1120	   do not carry any ordinary BD-RT.  Details on the processing of SMET
1121	   routes can be found in Section 3.3.

1123	   Since the SMET routes carry the SBD-RT, every ingress PE attached to
1124	   a particular Tenant Domain will learn of all other PEs (attached to
1125	   the same Tenant Domain) that have interest in a particular set of
1126	   flows.  Note that a PE that receives a given SMET route does not
1127	   necessarily have any BDs (other than the SBD) in common with the PE
1128	   that originates that SMET route.

1130	   If all the sources and receivers for a given (*,G) are in the Tenant
1131	   Domain, inter-subnet "Any Source Multicast" traffic will be properly
1132	   routed without requiring any Rendezvous Points, shared trees, or
1133	   other complex aspects of multicast routing infrastructure.  Suppose,
1134	   for example, that:

1136	   o  PE1 has a local receiver, on BD1, for (*,G)

1138	   o  PE2 has a local source, on BD2, for (*,G).

1140	   PE1 will originate an SMET(*,G) route for the SBD, and PE2 will
1141	   receive that route, even if PE2 is not attached to BD1.  PE2 will
1142	   thus know to forward (S,G) traffic to PE1.  PE1 does not need to do
1143	   any "source discovery".  (This does assume that source S does not
1144	   send the same (S,G) datagram on two different BDs, and that the
1145	   Tenant Domain does not contain two or more sources with the same IP
1146	   address S.  The use of multicast sources that have IP "anycast"
1147	   addresses is outside the scope of this document.)

1148	   If some PE attached to the Tenant Domain does not support [I-D.ietf-
1149	   bess-evpn-igmp-mld-proxy], it will be assumed to be interested in all
1150	   flows.  Whether a particular remote PE supports [I-D.ietf-bess-evpn-
1151	   igmp-mld-proxy] is determined by the presence of the Multicast Flags
1152	   Extended Community in its IMET route; this is specified in [I-D.ietf-
1153	   bess-evpn-igmp-mld-proxy].
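
	   The way an ingress PE might combine received SMET routes with the
	   "interested in all flows" rule can be sketched as follows.  This is
	   a non-normative Python sketch; the function name, the use of "*"
	   as a wildcard marker, and the two dictionaries are assumptions
	   made for illustration only.

```python
def interested_pes(flow, smet_entries, supports_proxy):
    """Return the remote PEs to which an (S,G) flow must be sent.

    flow           -- (source, group) tuple
    smet_entries   -- hypothetical map: PE name -> set of (source, group)
                      entries learned from that PE's SMET routes, where
                      "*" stands for a wildcard
    supports_proxy -- hypothetical map: PE name -> True if the PE's IMET
                      route carried the Multicast Flags Extended Community
    """
    source, group = flow
    wanted = {(source, group), ("*", group), ("*", "*")}
    result = set()
    for pe, supported in supports_proxy.items():
        if not supported:
            # No igmp-mld-proxy support: assume interest in all flows.
            result.add(pe)
        elif smet_entries.get(pe, set()) & wanted:
            result.add(pe)
    return result
```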

1155	2.6.  Tunneling Frames from Ingress PE to Egress PEs

1157	   [RFC7432] specifies the procedures for setting up and using "BUM
1158	   tunnels".  A BUM tunnel is a tunnel used to carry traffic on a
1159	   particular BD if that traffic is (a) broadcast traffic, or (b)
1160	   unicast traffic with an unknown MAC DA, or (c) Ethernet multicast
1161	   traffic.

1163	   This document allows the BUM tunnels to be used as the default
1164	   tunnels for transmitting IP multicast frames.  It also allows a
1165	   separate set of tunnels to be used, instead of the BUM tunnels, as
1166	   the default tunnels for carrying IP multicast frames.  Let's call
1167	   these "IP Multicast Tunnels".

1169	   When the tunneling is done via Ingress Replication or via BIER, this
1170	   difference is of no significance.  However, when P2MP tunnels are
1171	   used, there is a significant advantage to having separate IP
1172	   multicast tunnels.

1174	   Other things being equal, it is desirable for an ingress PE to
1175	   transmit a copy of a given (S,G) multicast frame on only one P2MP
1176	   tunnel.  All egress PEs interested in (S,G) packets then have to join
1177	   that tunnel.  If the source BD and PE for an (S,G) frame are BD1 and
1178	   PE1 respectively, and if PE2 has receivers on BD2 for (S,G), then PE2
1179	   must join the P2MP LSP on which PE1 transmits the (S,G) frame.  PE2
1180	   must join this P2MP LSP even if PE2 is not attached to the source BD
1181	   (BD1).  If PE1 were transmitting the multicast frame on its BD1 BUM
1182	   tunnel, then PE2 would have to join the BD1 BUM tunnel, even though
1183	   PE2 has no BD1 attachment circuits.  This would cause PE2 to pull all
1184	   the BUM traffic from BD1, most of which it would just have to
1185	   discard.  Thus we RECOMMEND that the default IP multicast tunnels be
1186	   distinct from the BUM tunnels.

1188	   Notwithstanding the above, link-local IP multicast traffic MUST
1189	   always be carried on the BUM tunnels, and ONLY on the BUM tunnels.
1190	   Link-local IP multicast traffic consists of IPv4 traffic with a
1191	   destination address prefix of 224.0.0/24 and IPv6 traffic with a
1192	   destination address prefix of FF02/16.  In this document, the terms
1193	   "IP multicast packet" and "IP multicast frame" are defined in
1194	   Section 1.4 so as to exclude the link-local traffic.

1196	   Note that it is also possible to use "selective tunnels" to carry
1197	   particular multicast flows (see Section 3.2).  When an (S,G) frame is
1198	   transmitted on a selective tunnel, it is not transmitted on the BUM
1199	   tunnel or on the default IP Multicast tunnel.
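
	   The tunnel choice described in this section can be sketched in
	   Python as follows.  This is a non-normative sketch.  It assumes
	   that the IPv4 link-local multicast range is 224.0.0.0/24 (the
	   Local Network Control Block); the function names, the tunnel name
	   strings, and the "selective_tunnels" map are hypothetical.

```python
import ipaddress

# Assumed link-local multicast ranges (see the lead-in above).
IPV4_LINK_LOCAL = ipaddress.ip_network("224.0.0.0/24")
IPV6_LINK_LOCAL = ipaddress.ip_network("ff02::/16")


def is_link_local_multicast(dst):
    """Return True if dst is a link-local IP multicast address."""
    addr = ipaddress.ip_address(dst)
    net = IPV4_LINK_LOCAL if addr.version == 4 else IPV6_LINK_LOCAL
    return addr in net


def choose_tunnel(dst, flow, selective_tunnels, separate_ip_mcast=True):
    """Pick the tunnel on which an ingress PE transmits a frame.

    selective_tunnels -- hypothetical map: (S,G) flow -> tunnel name
    separate_ip_mcast -- True if IP Multicast Tunnels distinct from the
                         BUM tunnels are provisioned (as RECOMMENDED)
    """
    if is_link_local_multicast(dst):
        return "bum"                    # always and only the BUM tunnel
    if flow in selective_tunnels:
        return selective_tunnels[flow]  # selective tunnel takes the flow
    return "ip-multicast" if separate_ip_mcast else "bum"
```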

1201	2.7.  Advanced Scenarios

1203	   There are some deployment scenarios that require special procedures:

1205	   1.  Some multicast sources or receivers are attached to PEs that
1206	       support [RFC7432], but do not support this document or
1207	       [I-D.ietf-bess-evpn-inter-subnet-forwarding].  To interoperate
1208	       with these "non-OISM PEs", it is necessary to have one or more
1209	       gateway PEs that interface the tunnels discussed in this document
1210	       with the BUM tunnels of the legacy PEs.  This is discussed in
1211	       Section 5.

1213	   2.  Sometimes multicast traffic originates from outside the EVPN
1214	       domain, or needs to be sent outside the EVPN domain.  This is
1215	       discussed in Section 6.  An important special case of this,
1216	       integration with MVPN, is discussed in Section 6.1.2.

1218	   3.  In some scenarios, one or more of the tenant systems is a PIM
1219	       router, and the Tenant Domain is used as a transit network
1220	       that is part of a larger multicast domain.  This is discussed in
1221	       Section 7.

1223	3.  EVPN-aware Multicast Solution Control Plane

1225	3.1.  Supplementary Broadcast Domain (SBD) and Route Targets

1227	   As discussed in Section 2.1, every Tenant Domain is associated with a
1228	   single Supplementary Broadcast Domain (SBD).  Recall that a Tenant
1229	   Domain is defined to be a set of BDs that can freely send and receive
1230	   IP multicast traffic to/from each other.  If an EVPN-PE has one or
1231	   more ACs in a BD of a particular Tenant Domain, and if the EVPN-PE
1232	   supports the procedures of this document, that EVPN-PE MUST be
1233	   provisioned with the SBD of that Tenant Domain.

1235	   At each EVPN-PE attached to a given Tenant Domain, there is an IRB
1236	   interface leading from the L3 routing instance of that Tenant Domain
1237	   to the SBD.  However, the SBD has no ACs.

1239	   Each SBD is provisioned with a Route Target (RT).  All the EVPN-PEs
1240	   supporting a given SBD are provisioned with that RT as an import RT.
1241	   That RT MUST NOT be the same as the RT associated with any other BD.

1243	   We will use the term "SBD-RT" to denote the RT that has been assigned
1244	   to the SBD.  Routes carrying this RT will be propagated to all
1245	   EVPN-PEs in the same Tenant Domain as the originator.

1247	   Section 2.2 specifies the rules by which an EVPN-PE that receives a
1248	   route determines whether a received route "belongs to" a particular
1249	   ordinary BD or SBD.

1251	   Section 2.2 also specifies additional rules that must be followed
1252	   when constructing routes that belong to a particular BD, including
1253	   the SBD.

1255	   The SBD SHOULD be in an EVPN Instance (EVI) of its own.  Even if the
1256	   SBD is not in an EVI of its own, the SBD-RT MUST be different from
1257	   the RT associated with any other BD.  This restriction is necessary
1258	   in order for the rules of Sections 2.2 and 3.1 to work correctly.

1260	   Note that an SBD, just like any other BD, is associated on each
1261	   EVPN-PE with a MAC-VRF.  Per [RFC7432], each MAC-VRF is associated
1262	   with a Route Distinguisher (RD).  When constructing a route that is
1263	   "about" an SBD, an EVPN-PE will place the RD of the associated
1264	   MAC-VRF in the "Route Distinguisher" field of the NLRI.  (If the
1265	   Tenant Domain has several MAC-VRFs on a given PE, the EVPN-PE has a
1266	   choice of which RD to use.)

1268	   If Assisted Replication (AR, see [I-D.ietf-bess-evpn-optimized-ir])
1269	   is used, each AR-REPLICATOR for a given Tenant Domain must be
1270	   provisioned with the SBD of that Tenant Domain, even if the
1271	   AR-REPLICATOR does not have any L3 routing instance.

1273	3.2.  Advertising the Tunnels Used for IP Multicast

1275	   The procedures used for advertising the tunnels that carry IP
1276	   multicast traffic depend upon the type of tunnel being used.  If the
1277	   tunnel type is neither Ingress Replication, Assisted Replication, nor
1278	   BIER, there are procedures for advertising both "inclusive tunnels"
1279	   and "selective tunnels".

1281	   When IR, AR or BIER are used to transmit IP multicast packets across
1282	   the core, there are no P2MP tunnels.  Once an ingress EVPN-PE
1283	   determines the set of egress EVPN-PEs for a given flow, the IMET
1284	   routes contain all the information needed to transport packets of
1285	   that flow to the egress PEs.

1287	   If AR is used, the ingress EVPN-PE is also an AR-LEAF and the IMET
1288	   route coming from the selected AR-REPLICATOR contains the information
1289	   needed.  The AR-REPLICATOR will behave as an ingress EVPN-PE when
1290	   sending a flow to the egress EVPN-PEs.

1292	   If the tunneling technique requires P2MP tunnels to be set up (e.g.,
1293	   RSVP-TE P2MP, mLDP, PIM), some of the tunnels may be selective
1294	   tunnels and some may be inclusive tunnels.

1296	   Selective P2MP tunnels are always advertised by the ingress PE using
1297	   S-PMSI A-D routes ([I-D.ietf-bess-evpn-bum-procedure-updates]).

1299	   For inclusive tunnels, there is a choice between using a BD's
1300	   ordinary "BUM tunnel" [RFC7432] as the default inclusive tunnel for
1301	   carrying IP multicast traffic, or using a separate IP multicast
1302	   tunnel as the default inclusive tunnel for carrying IP multicast.  In
1303	   the former case, the inclusive tunnel is advertised in an IMET route.
1304	   In the latter case, the inclusive tunnel is advertised in a (C-*,C-*)
1305	   S-PMSI A-D route ([I-D.ietf-bess-evpn-bum-procedure-updates]).
1306	   Details may be found in subsequent sections.

1308	3.2.1.  Constructing Routes for the SBD

1310	   There are situations in which an EVPN-PE needs to originate IMET,
1311	   SMET, and/or SPMSI routes for the SBD.  Throughout this document, we
1312	   will refer to such routes respectively as "SBD-IMET routes",
1313	   "SBD-SMET routes", and "SBD-SPMSI routes".  Subsequent sections
1314	   detail the conditions under which these routes need to be originated.

1316	   When an EVPN-PE needs to originate an SBD-IMET, SBD-SMET, or
1317	   SBD-SPMSI route, it constructs the route as follows:

1319	   o  the RD field of the route's NLRI is set to the RD of the MAC-VRF
1320	      that is associated with the SBD;

1322	   o  the SBD-RT is attached to the route;

1324	   o  the "Tag ID" field of the route's NLRI is set to the Tag ID that
1325	      has been assigned to the SBD.  This is most likely 0 if a
1326	      VLAN-based or VLAN-bundle service is being used, but non-zero if a
1327	      VLAN-aware bundle service is being used.

1329	3.2.2.  Ingress Replication

1331	   When Ingress Replication (IR) is used to transport IP multicast
1332	   frames of a given Tenant Domain, each EVPN-PE attached to that Tenant
1333	   Domain MUST originate an SBD-IMET route (see Section 3.2.1).

1335	   The SBD-IMET route MUST carry a PMSI Tunnel attribute (PTA), and the
1336	   MPLS label field of the PTA MUST specify a downstream-assigned MPLS
1337	   label that maps uniquely (in the context of the originating EVPN-PE)
1338	   to the SBD.

1340	   Following the procedures of [RFC7432], an EVPN-PE MUST also originate
1341	   an IMET route for each BD to which it is attached.  Each of these
1342	   IMET routes carries a PTA specifying a downstream-assigned label that
1343	   maps uniquely, in the context of the originating EVPN-PE, to the BD
1344	   in question.  These IMET routes need not carry the SBD-RT.

1346	   When an ingress EVPN-PE needs to use IR to send an IP multicast frame
1347	   from a particular source BD to an egress EVPN-PE, the ingress PE
1348	   determines whether the egress PE has originated an IMET route for
1349	   that BD.  If so, that IMET route contains the MPLS label that the
1350	   egress PE has assigned to the source BD.  The ingress PE uses that
1351	   label when transmitting the packet to the egress PE.  Otherwise, the
1352	   ingress PE uses the label that the egress PE has assigned to the SBD
1353	   (in the SBD-IMET route originated by the egress).

1355	   Note that the set of IMET routes originated by a given egress PE, and
1356	   installed by a given ingress PE, may change over time.  If the egress
1357	   PE withdraws its IMET route for the source BD, the ingress PE MUST
1358	   stop using the label carried in that IMET route, and instead MUST use
1359	   the label carried in the SBD-IMET route from that egress PE.
1360	   Implementors must also take into account that an IMET route from a
1361	   particular PE for a particular BD may arrive after that PE's SBD-IMET
1362	   route.
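
	   The label selection rule for Ingress Replication can be sketched
	   as a lookup that is re-evaluated against the currently installed
	   IMET routes.  This is a non-normative Python sketch; the function
	   name, the "SBD" placeholder, and the dictionary layout are
	   assumptions made for illustration.

```python
def ir_label(egress_pe, source_bd, imet_labels, sbd="SBD"):
    """Return the MPLS label to use toward egress_pe, or None.

    imet_labels -- hypothetical map: (PE name, BD name) -> label taken
                   from the PTA of the currently installed IMET route
    """
    label = imet_labels.get((egress_pe, source_bd))
    if label is not None:
        return label                    # egress PE is attached to the BD
    # Otherwise fall back to the label from the egress PE's SBD-IMET route.
    return imet_labels.get((egress_pe, sbd))
```

	   Because the lookup is performed against the current route table,
	   withdrawing the IMET route for the source BD causes the SBD label
	   to be used, and a late-arriving IMET route for the source BD takes
	   precedence as soon as it is installed.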

1364	3.2.3.  Assisted Replication

1366	   When Assisted Replication is used to transport IP multicast frames of
1367	   a given Tenant Domain, each EVPN-PE (including the AR-REPLICATOR)
1368	   attached to the Tenant Domain MUST originate an SBD-IMET route (see
1369	   Section 3.2.1).

1371	   An AR-REPLICATOR attached to a given Tenant Domain is considered to
1372	   be an EVPN-PE of that Tenant Domain.  It is attached to all the BDs
1373	   in the Tenant Domain, but it does not necessarily have L3 routing
1374	   instances.

1376	   As with Ingress Replication, the SBD-IMET route carries a PTA where
1377	   the MPLS label field specifies the downstream-assigned MPLS label
1378	   that identifies the SBD.  However, the AR-REPLICATOR and AR-LEAF
1379	   EVPN-PEs will set the PTA's flags differently, as per
1380	   [I-D.ietf-bess-evpn-optimized-ir].

1382	   In addition, each EVPN-PE originates an IMET route for each BD to
1383	   which it is attached.  As in the case of Ingress Replication, these
1384	   routes carry the downstream-assigned MPLS labels that identify the
1385	   BDs and do not carry the SBD-RT.

1387	   When an ingress EVPN-PE, acting as AR-LEAF, needs to send an IP
1388	   multicast frame from a particular source BD to an egress EVPN-PE, the
1389	   ingress PE determines whether there is any AR-REPLICATOR that
1390	   originated an IMET route for that BD.  After the AR-REPLICATOR
1391	   selection (if there is more than one), the AR-LEAF uses the label
1392	   contained in the IMET route of the AR-REPLICATOR when transmitting
1393	   packets to it.  The AR-REPLICATOR receives the packet and, based on
1394	   the procedures specified in [I-D.ietf-bess-evpn-optimized-ir] and in
1395	   Section 3.2.2 of this document, transmits the packets to the egress
1396	   EVPN-PEs using the labels contained in the received IMET routes for
1397	   either the source BD or the SBD.

1399	   If an ingress AR-LEAF for a given BD has not received any IMET route
1400	   for that BD from an AR-REPLICATOR, the ingress AR-LEAF follows the
1401	   procedures in Section 3.2.2.

1403	3.2.3.1.  Automatic SBD Matching

1405	   Each PE needs to know a BD's corresponding SBD.  Configuring that
1406	   information in each BD is one way, but it requires repetitive
1407	   configuration and consistency checks (to make sure that all the BDs
1408	   of the same tenant are configured with the same SBD).  A better way
1409	   is to configure the SBD information in the L3 routing instance so
1410	   that all related BDs will derive the SBD information.

1412	   An AR-REPLICATOR also needs to know the same information, though it
1413	   does not necessarily have an L3 routing instance.  However, from the
1414	   EVI-RT EC in a BD's IMET route, an AR-REPLICATOR can derive the
1415	   corresponding SBD of that BD without any configuration.

1417	3.2.4.  BIER

1419	   When BIER is used to transport multicast packets of a given Tenant
1420	   Domain, and a given EVPN-PE attached to that Tenant Domain is a
1421	   possible ingress EVPN-PE for traffic originating outside that Tenant
1422	   Domain, the given EVPN-PE MUST originate an SBD-IMET route (see
1423	   Section 3.2.1).

1425	   In addition, IMET routes that are originated for other BDs in the
1426	   Tenant Domain MUST carry the SBD-RT.

1428	   Each IMET route (including but not limited to the SBD-IMET route)
1429	   MUST carry a PMSI Tunnel attribute (PTA).  The MPLS label field of
1430	   the PTA MUST specify an upstream-assigned MPLS label that maps
1431	   uniquely (in the context of the originating EVPN-PE) to the BD for
1432	   which the route is originated.

1434	   Suppose an ingress EVPN-PE, PE1, needs to use BIER to tunnel an IP
1435	   multicast frame to a set of egress EVPN-PEs.  And suppose the frame's
1436	   source BD is BD1.  The frame is encapsulated as follows:

1438	   o  A four-octet MPLS label stack entry ([RFC3032]) is prepended to
1439	      the frame.  The Label field is set to the upstream-assigned label
1440	      that PE1 has assigned to BD1.

1442	   o  The resulting MPLS packet is then encapsulated in a BIER
1443	      encapsulation ([RFC8296], [I-D.ietf-bier-evpn]).  The BIER
1444	      BitString is set to identify the egress EVPN-PEs.  The BIER
1445	      "proto" field is set to the value for "MPLS packet with
1446	      upstream-assigned label at top of stack".

1448	   Note: It is possible that the packet being tunneled from PE1
1449	   originated outside the Tenant Domain.  In this case, the actual
1450	   source BD (BD1) is considered to be the SBD, and the
1451	   upstream-assigned label it carries will be the label that PE1
1452	   assigned to the SBD, and advertised in its SBD-IMET route.

1454	   Suppose an egress PE, PE2, receives such a BIER packet.  The BFIR-id
1455	   field of the BIER header allows PE2 to determine that the ingress PE
1456	   is PE1.  There are then two cases to consider:

1458	   1.  PE2 has received and installed an IMET route for BD1 from PE1.

1460	       In this case, the BIER packet will be carrying the
1461	       upstream-assigned label that is specified in the PTA of that IMET
1462	       route.  This enables PE2 to determine the "apparent source BD"
1463	       (as defined in Section 2.4).

1465	   2.  PE2 has not received and installed an IMET route for BD1 from
1466	       PE1.

1468	       In this case, PE2 will not recognize the upstream-assigned label
1469	       carried in the BIER packet.  PE2 MUST discard the packet.

1471	   Further details on the use of BIER to support EVPN can be found in
1472	   [I-D.ietf-bier-evpn].
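
	   The two egress cases above can be sketched as a single lookup.
	   This is a non-normative Python sketch; the function name, the
	   "SBD" placeholder, and both maps are assumptions made for
	   illustration and are not part of the BIER or EVPN specifications.

```python
def bier_egress_lookup(bfir_id, label, bfir_to_pe, installed_imet,
                       attached_bds, sbd="SBD"):
    """Return the apparent source BD, or None if the packet is discarded.

    bfir_to_pe     -- hypothetical map: BFIR-id -> ingress PE name
    installed_imet -- hypothetical map: (ingress PE, upstream-assigned
                      label) -> BD of the installed IMET route
    attached_bds   -- set of BDs to which this egress PE is attached
    """
    ingress_pe = bfir_to_pe.get(bfir_id)
    bd = installed_imet.get((ingress_pe, label))
    if bd is None:
        # Case 2: the label is not recognized, so discard the packet.
        return None
    # Case 1: apply the Section 2.4 definition of apparent source BD.
    return bd if bd in attached_bds else sbd
```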

1474	3.2.5.  Inclusive P2MP Tunnels

1476	3.2.5.1.  Using the BUM Tunnels as IP Multicast Inclusive Tunnels

1478	   The procedures in this section apply only when

1480	   (a)  it is desired to use the BUM tunnels to carry IP multicast
1481	        traffic across the backbone, and

1483	   (b)  the BUM tunnels are P2MP tunnels (i.e., neither IR, AR, nor BIER
1484	        are being used to transport the BUM traffic).

1486	   In this case, an IP multicast frame (whether inter-subnet or
1487	   intra-subnet) will be carried across the backbone in the BUM tunnel
1488	   belonging to its source BD.  Each EVPN-PE attached to a given Tenant
1489	   Domain needs to join the BUM tunnels for every BD in the Tenant
1490	   Domain, even those BDs to which the EVPN-PE is not locally attached.
1491	   This ensures that an IP multicast packet from any source BD can reach
1492	   all PEs attached to the Tenant Domain.

1494	   Note that this will cause all the BUM traffic from a given BD in a
1495	   Tenant Domain to be sent to all PEs that attach to that Tenant
1496	   Domain, even the PEs that don't attach to the given BD.  To avoid
1497	   this, it is RECOMMENDED that the BUM tunnels not be used as IP
1498	   Multicast inclusive tunnels, and that the procedures of
1499	   Section 3.2.5.2 be used instead.

1501	   If a PE is a possible ingress EVPN-PE for traffic originating outside
1502	   the Tenant Domain, the PE MUST originate an SBD-IMET route (see
1503	   Section 3.2.1).  This route MUST carry a PTA specifying the P2MP
1504	   tunnel used for transmitting IP multicast packets that originate
1505	   outside the tenant domain.  All EVPN-PEs of the Tenant Domain MUST
1506	   join the tunnel specified in the PTA of an SBD-IMET route:

1508	   o  If the tunnel is an RSVP-TE P2MP tunnel, the originator of the
1509	      route MUST use RSVP-TE P2MP procedures to add each PE of the
1510	      Tenant Domain to the tunnel, even PEs that have not originated an
1511	      SBD-IMET route.

1513	   o  If the tunnel is an mLDP or PIM tunnel, each PE importing the
1514	      SBD-IMET route MUST add itself to the tunnel, using mLDP or PIM
1515	      procedures, respectively.

1517	   Whether or not a PE originates an SBD-IMET route, it will of course
1518	   originate an IMET route for each BD to which it is attached.  Each of
1519	   these IMET routes MUST carry the SBD-RT, as well as the RT for the BD
1520	   to which it belongs.

1522	   If a received IMET route is not the SBD-IMET route, it will also be
1523	   carrying the RT for its source BD.  The route's NLRI will carry the
1524	   Tag ID for the source BD.  From the RT and the Tag ID, any PE
1525	   receiving the route can determine the route's source BD.

1527	   If the MPLS label field of the PTA contains zero, the specified P2MP
1528	   tunnel is used only to carry frames of a single source BD.

1530	   If the MPLS label field of the PTA does not contain zero, it MUST
1531	   contain an upstream-assigned MPLS label that maps uniquely (in the
1532	   context of the originating EVPN-PE) to the source BD (or, in the case
1533	   of an SBD-IMET route, to the SBD).  The tunnel may then be used to
1534	   carry frames of multiple source BDs.  The apparent source BD of a
1535	   particular packet is inferred from the label carried by the packet.

1537	   IP multicast traffic originating outside the Tenant Domain is
1538	   transmitted with the label corresponding to the SBD, as specified in
1539	   the ingress EVPN-PE's SBD-IMET route.

1541	3.2.5.2.  Using Wildcard S-PMSI A-D Routes to Advertise Inclusive
1542	          Tunnels Specific to IP Multicast

1544	   The procedures of this section apply when (and only when) it is
1545	   desired to transmit IP multicast traffic on an inclusive tunnel, but
1546	   not on the same tunnel used to transmit BUM traffic.

1548	   However, these procedures do NOT apply when the tunnel type is
1549	   Ingress Replication or BIER, EXCEPT in the case where it is necessary
1550	   to interwork between non-OISM PEs and OISM PEs, as specified in
1551	   Section 5.

1553	   Each EVPN-PE attached to the given Tenant Domain MUST originate an
1554	   SBD-SPMSI A-D route.  The NLRI of that route MUST contain (C-*,C-*)
1555	   (see [RFC6625]).  Additional rules for constructing that route are
1556	   given in Section 3.2.1.

1558	   In addition, an EVPN-PE MUST originate an S-PMSI A-D route containing
1559	   (C-*,C-*) in its NLRI for each of the other BDs, in the given Tenant
1560	   Domain, to which it is attached.  All such routes MUST carry the
1561	   SBD-RT.  This ensures that those routes are imported by all EVPN-PEs
1562	   attached to the Tenant Domain.

1564	   A PE receiving these routes follows the procedures of Section 2.2 to
1565	   determine which BD the route is for.

1567	   If the MPLS label field of the PTA contains zero, the specified
1568	   tunnel is used only to carry frames of a single source BD.

1570	   If the MPLS label field of the PTA does not contain zero, it MUST
1571	   specify an upstream-assigned MPLS label that maps uniquely (in the
1572	   context of the originating EVPN-PE) to the source BD.  The tunnel may
1573	   be used to carry frames of multiple source BDs, and the apparent
1574	   source BD for a particular packet is inferred from the label carried
1575	   by the packet.

1577	   The EVPN-PE advertising these S-PMSI A-D routes is specifying
1578	   the default tunnel that it will use (as ingress PE) for transmitting
1579	   IP multicast packets.  The upstream-assigned label allows an egress
1580	   PE to determine the apparent source BD of a given packet.

1582	3.2.6.  Selective Tunnels

1584	   An ingress EVPN-PE for a given multicast flow or set of flows can
1585	   always assign the flow to a particular P2MP tunnel by originating an
1586	   S-PMSI A-D route whose NLRI identifies the flow or set of flows.  The
1587	   NLRI of the route could be (C-*,C-G), or (C-S,C-G).  The S-PMSI A-D
1588	   route MUST carry the SBD-RT, so that it is imported by all EVPN-PEs
1589	   attached to the Tenant Domain.

1591	   An S-PMSI A-D route is "for" a particular source BD.  It MUST carry
1592	   the RT associated with that BD, and it MUST have the Tag ID for that
1593	   BD in its NLRI.

1595	   When an EVPN-PE imports an S-PMSI A-D route, it applies the rules of
1596	   Section 2.2 to associate the route with a particular BD.

1598	   Each such route MUST contain a PTA, as specified in Section 3.2.5.2.

1600	   An egress EVPN-PE interested in the specified flow or flows MUST join
1601	   the specified tunnel.  Procedures for joining the specified tunnel
1602	   are specific to the tunnel type.  (Note that if the tunnel type is
1603	   RSVP-TE P2MP LSP, the Leaf Information Required (LIR) flag of the PTA
1604	   SHOULD NOT be set.  An ingress OISM PE knows which OISM EVPN PEs are
1605	   interested in any given flow, and hence can add them to the RSVP-TE
1606	   P2MP tunnel that carries such flows.)

1608	   If the PTA does not specify a non-zero MPLS label, the apparent
1609	   source BD of any packets that arrive on that tunnel is considered to
1610	   be the BD associated with the route that carries the PTA.  If the PTA
1611	   does specify a non-zero MPLS label, the apparent source BD of any
1612	   packets that arrive on that tunnel carrying the specified label is
1613	   considered to be the BD associated with the route that carries the
1614	   PTA.

1616	   It should be noted that when either IR or BIER is used, there is no
1617	   need for an ingress PE to use S-PMSI A-D routes to assign specific
1618	   flows to selective tunnels.  The procedures of Section 3.3, along
1619	   with the procedures of Section 3.2.2, Section 3.2.3, or
1620	   Section 3.2.4, provide the functionality of selective tunnels without
1621	   the need to use S-PMSI A-D routes.

1623	3.3.  Advertising SMET Routes

1625	   [I-D.ietf-bess-evpn-igmp-mld-proxy] allows an egress EVPN-PE to
1626	   express its interest in a particular multicast flow or set of flows
1627	   by originating an SMET route.  The NLRI of the SMET route identifies
1628	   the flow or set of flows as (C-*,C-*) or (C-*,C-G) or (C-S,C-G).

1630	   Each SMET route belongs to a particular BD.  The Tag ID for the BD
1631	   appears in the NLRI of the route, and the route carries the RT
1632	   associated with that BD.  From this <RT, tag> pair, other EVPN-PEs
1633	   can identify the BD to which a received SMET route belongs.
1634	   (Remember though that the route may be carrying multiple RTs.)

1636	   There are three cases to consider:

1638	   o  Case 1: It is known that no BD of a Tenant Domain contains a
1639	      multicast router.

1641	      In this case, an egress PE advertises its interest in a flow or
1642	      set of flows by originating an SMET route that belongs to the SBD.
1643	      We refer to this as an SBD-SMET route.  The SBD-SMET route carries
1644	      the SBD-RT, and has the Tag ID for the SBD in its NLRI.  SMET
1645	      routes for the individual BDs are not needed, because there is no
1646	      need for a PE that receives an SMET route to send a corresponding
1647	      IGMP Join message out any of its ACs.

1649	   o  Case 2: It is known that more than one BD of a Tenant Domain may
1650	      contain a multicast router.

1652	      This is very like Case 1.  An egress PE advertises its interest in
1653	      a flow or set of flows by originating an SBD-SMET route.  The
1654	      SBD-SMET route carries the SBD-RT, and has the Tag ID for the SBD
1655	      in its NLRI.

1657	      In this case, it is important to be sure that SMET routes for the
1658	      individual BDs are not originated.  Suppose, for example, that PE1
1659	      had local receivers for a given flow on both BD1 and BD2, and that
1660	      it originated SMET routes for both those BDs.  Then PEs receiving
1661	      those SMET routes might send IGMP Joins on both those BDs.  This
1662	      could cause externally sourced multicast traffic to enter the
1663	      Tenant Domain at both BDs, which could result in duplication of
1664	      data.

1666	      N.B.: If it is possible that more than one BD contains a tenant
1667	      multicast router, then in order to receive multicast data
1668	      originating from outside EVPN, the PEs MUST follow the procedures
1669	      of Section 6.

1671	   o  Case 3: It is known that only a single BD of a Tenant Domain
1672	      contains a multicast router.

1674	      Suppose that an egress PE is attached to a BD on which there might
1675	      be a tenant multicast router.  (The tenant router is not
1676	      necessarily on a segment that is attached to that PE.)  And
1677	      suppose that the PE has one or more ACs attached to that BD which
1678	      are interested in a given multicast flow.  In this case, IN
1679	      ADDITION to the SMET route for the SBD, the egress PE MAY
1680	      originate an SMET route for that BD.  This will enable the ingress
1681	      PE(s) to send IGMP/MLD messages on ACs for the BD, as specified in
1682	      [I-D.ietf-bess-evpn-igmp-mld-proxy].  As long as that is the only
1683	      BD on which there is a tenant multicast router, there is no
1684	      possibility of duplication of data.

1686	   This document does not specify procedures for dynamically determining
1687	   which of the three cases applies to a given deployment; the PEs of a
1688	   given Tenant Domain MUST be provisioned to know which case applies.
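
	   Given the provisioned case, the set of SMET routes an egress PE
	   originates for one flow can be sketched as follows.  This is a
	   non-normative Python sketch; the function name, the "SBD"
	   placeholder, and the parameters are assumptions made for
	   illustration.  The sketch always exercises the MAY of Case 3.

```python
def smet_routes_to_originate(case, flow, local_bds_with_receivers,
                             router_bd=None, sbd="SBD"):
    """Return (BD, flow) pairs for which SMET routes are originated.

    case      -- provisioned case number (1, 2, or 3)
    router_bd -- in Case 3, the single BD that may contain a tenant
                 multicast router (hypothetical parameter)
    """
    if case not in (1, 2, 3):
        raise ValueError("case must be 1, 2, or 3")
    routes = [(sbd, flow)]              # SBD-SMET route in every case
    if case == 3 and router_bd in local_bds_with_receivers:
        # Optional per-BD SMET route, in addition to the SBD-SMET route.
        routes.append((router_bd, flow))
    return routes
```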

1690	   As detailed in [I-D.ietf-bess-evpn-igmp-mld-proxy], an SMET route
1691	   carries a Multicast Flags EC containing flags indicating whether it
1692	   is to result in the propagation of IGMP v1, v2, or v3 messages on the
1693	   ACs of the BD to which the SMET route belongs.  These flags SHOULD be
1694	   set to zero in an SBD-SMET route.

1696	   Note that a PE only needs to originate the set of SBD-SMET routes
1697	   that are needed to pull in all the traffic in which it is interested.
1698	   Suppose PE1 has ACs attached to BD1 that are interested in (C-*,C-G)
1699	   traffic, and ACs attached to BD2 that are interested in (C-S,C-G)
1700	   traffic.  A single SBD-SMET route specifying (C-*,C-G) will pull in
1701	   all the necessary flows.

1703	   As another example, suppose the ACs attached to BD1 are interested in
1704	   (C-*,C-G) but not in (C-S,C-G), while the ACs attached to BD2 are
1705	   interested in (C-S,C-G).  A single SBD-SMET route specifying
1706	   (C-*,C-G) will pull in all the necessary flows.

1708	   In other words, to determine the set of SBD-SMET routes that have to
1709	   be sent for a given C-G, the PE has to merge the IGMP/MLD state for
1710	   all the BDs (of the given Tenant Domain) to which it is attached.
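
   The merge described above can be sketched as follows.  This is a
   minimal, non-normative illustration; the function name and the data
   layout (a mapping from BD name to a set of (source, group) pairs,
   with "*" standing for C-*) are assumptions made for the example.  It
   deliberately ignores IGMPv3/MLDv2 source-exclusion state.

```python
def sbd_smet_routes(state_per_bd):
    """Merge per-BD IGMP/MLD interest into a minimal set of SBD-SMET routes.

    state_per_bd: dict mapping a BD name to a set of (source, group)
    pairs, where source "*" denotes wildcard interest (C-*,C-G).
    This layout is hypothetical, chosen only for illustration.
    """
    sources_by_group = {}
    for flows in state_per_bd.values():
        for source, group in flows:
            sources_by_group.setdefault(group, set()).add(source)

    routes = set()
    for group, sources in sources_by_group.items():
        if "*" in sources:
            # A single (C-*,C-G) route pulls in every source for C-G.
            routes.add(("*", group))
        else:
            routes.update((source, group) for source in sources)
    return routes
```

   With BD1 interested in ("*", "G") and BD2 interested in ("S", "G"),
   the sketch yields the single route ("*", "G"), matching the first
   example above.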

1712	   Per [I-D.ietf-bess-evpn-igmp-mld-proxy], importing an SMET route for
1713	   a particular BD will cause IGMP/MLD state to be instantiated for the
1714	   IRB interface to that BD.  This applies as well when the BD is the
1715	   SBD.

1717	   However, traffic that originates in one of the actual BDs of a
1718	   particular Tenant Domain MUST NOT be sent down the IRB interface that
1719	   connects the L3 routing instance of that Tenant Domain to the SBD.
1720	   That would cause duplicate delivery of traffic, since such traffic
1721	   will have already been distributed throughout the Tenant Domain.
1722	   Therefore, when setting up the IGMP/MLD state based on SBD-SMET
1723	   routes, care must be taken to ensure that the IRB interface to the
1724	   SBD is not added to the Outgoing Interface (OIF) list if the traffic
1725	   originates within the Tenant Domain.

1727	   There are some multicast scenarios that make use of "anycast
1728	   sources".  For example, two different sources may share the same
1729	   anycast IP address, say S1, and each may transmit an (S1,G) multicast
1730	   flow.  In such a scenario, the two (S1,G) flows are typically
1731	   identical.  Ordinary PIM procedures will cause only one of the flows to
1732	   be delivered to each receiver that has expressed interest in either
1733	   (*,G) or (S1,G).  However, the OISM procedures described in this
1734	   document will result in both of the (S1,G) flows being distributed in
1735	   the Tenant Domain, and duplicate delivery will result.  Therefore, if
1736	   there are receivers for (*,G) in a given Tenant Domain, there MUST
1737	   NOT be anycast sources for G within that Tenant Domain.  (This
1738	   restriction can be lifted by defining additional procedures; however,
1739	   that is outside the scope of this document.)

1741	4.  Constructing Multicast Forwarding State

1743	4.1.  Layer 2 Multicast State

1745	   An EVPN-PE maintains "layer 2 multicast state" for each BD to which
1746	   it is attached.

1748	   Let PE1 be an EVPN-PE, and BD1 be a BD to which it is attached.  At
1749	   PE1, BD1's layer 2 multicast state for a given (C-S,C-G) or (C-*,C-G)
1750	   governs the disposition of an IP multicast packet that is received by
1751	   BD1's layer 2 multicast function on an EVPN-PE.

1753	   An IP multicast (S,G) packet is considered to have been received by
1754	   BD1's layer 2 multicast function in PE1 in the following cases:

1756	   o  The packet is the payload of an ethernet frame received by PE1
1757	      from an AC that attaches to BD1.

1759	   o  The packet is the payload of an ethernet frame whose apparent
1760	      source BD is BD1, and which is received by PE1 over a tunnel
1761	      from another EVPN-PE.

1763	   o  The packet is received from BD1's IRB interface (i.e., has been
1764	      transmitted by PE1's L3 routing instance down BD1's IRB
1765	      interface).

1767	   According to the procedures of this document, all transmission of IP
1768	   multicast packets from one EVPN-PE to another is done at layer 2.
1769	   That is, the packets are transmitted as ethernet frames, according to
1770	   the layer 2 multicast state.

1772	   Each layer 2 multicast state (S,G) or (*,G) contains a set of "output
1773	   interfaces" (OIF list).  The disposition of an (S,G) multicast frame
1774	   received by BD1's layer 2 multicast function is determined as
1775	   follows:

1777	   o  The OIF list is taken from BD1's layer 2 (S,G) state, or if there
1778	      is no such (S,G) state, then from BD1's (*,G) state.  (If neither
1779	      state exists, the OIF list is considered to be null.)

1781	   o  The rules of Section 4.1.2 are applied to the OIF list.  This will
1782	      generally result in the frame being transmitted to some, but not
1783	      all, elements of the OIF list.

1785	   Note that there is no RPF check at layer 2.
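
   The OIF list selection in the first bullet above can be sketched as
   follows.  This is a non-normative illustration; the dictionary keyed
   by (source, group), with "*" as the wildcard source, is an assumed
   representation of a BD's layer 2 multicast state.

```python
def lookup_oif_list(l2_state, source, group):
    """Return the OIF list for an (S,G) frame from one BD's layer 2 state.

    l2_state: dict mapping (source, group) to a set of output
    interfaces (hypothetical layout).  The (S,G) state is preferred,
    then the (*,G) state; if neither exists the list is null (empty).
    """
    if (source, group) in l2_state:
        return l2_state[(source, group)]
    return l2_state.get(("*", group), frozenset())
```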

1787	4.1.1.  Constructing the OIF List

1789	   In this document, we have extended the procedures of
1790	   [I-D.ietf-bess-evpn-igmp-mld-proxy] so that IMET and SMET routes for
1791	   a particular BD are distributed not just to PEs that attach to that
1792	   BD, but to PEs that attach to any BD in the Tenant Domain.  In this
1793	   way, each PE attached to a given Tenant Domain learns, from each
1794	   other PE attached to the same Tenant Domain, the set of flows that
1795	   are of interest to each of those other PEs.  (If some PE attached to
1796	   the Tenant Domain does not support
1797	   [I-D.ietf-bess-evpn-igmp-mld-proxy], it will be assumed to be
1798	   interested in all flows.  Whether a particular remote PE supports
1799	   [I-D.ietf-bess-evpn-igmp-mld-proxy] is determined by the presence of
1800	   an Extended Community in its IMET route; this is specified in
1801	   [I-D.ietf-bess-evpn-igmp-mld-proxy].)  If a set of remote PEs are
1802	   interested in a particular flow, the tunnels used to reach those PEs
1803	   are added to the OIF list of the multicast states corresponding to
1804	   that flow.

1806	   An EVPN-PE may run IGMP/MLD procedures on each of its ACs, in order
1807	   to determine the set of flows of interest to each AC.  (An AC is said
1808	   to be interested in a given flow if it connects to a segment that has
1809	   tenant systems interested in that flow.)  If IGMP/MLD procedures are
1810	   not being run on a given AC, that AC is considered to be interested
1811	   in all flows.  For each BD, the set of ACs interested in a given flow
1812	   is determined, and the ACs of that set are added to the OIF list of
1813	   that BD's multicast state for that flow.

1815	   The OIF list for each multicast state must also contain the IRB
1816	   interface for the BD to which the state belongs.

1818	   Implementors should note that the OIF list of a multicast state will
1819	   change from time to time as ACs and/or remote PEs either become
1820	   interested in, or lose interest in, particular multicast flows.
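
   As a rough, non-normative sketch, the construction rules of this
   section could be modeled as shown below.  All field names
   ("tunnel", "supports_proxy", "runs_igmp_mld", "flows", and so on)
   are hypothetical, and interest is modeled as simple set membership
   rather than full (S,G)/(*,G) matching.

```python
def build_oif_list(bd, flow, remote_pes, local_acs):
    """Build the OIF list of BD `bd`'s layer 2 state for `flow`.

    remote_pes: list of dicts with keys "tunnel", "supports_proxy",
                and "flows" (flows learned from that PE's SMET routes).
    local_acs:  list of dicts with keys "name", "bd", "runs_igmp_mld",
                and "flows" (flows learned via IGMP/MLD on that AC).
    """
    oif = set()
    for pe in remote_pes:
        # A PE without IGMP/MLD proxy support is assumed to want all flows.
        if not pe["supports_proxy"] or flow in pe["flows"]:
            oif.add(("tunnel", pe["tunnel"]))
    for ac in local_acs:
        if ac["bd"] != bd:
            continue
        # An AC on which IGMP/MLD is not run is assumed to want all flows.
        if not ac["runs_igmp_mld"] or flow in ac["flows"]:
            oif.add(("ac", ac["name"]))
    # The IRB interface of the BD itself is always part of the list.
    oif.add(("irb", bd))
    return oif
```

   In a real implementation this computation would be re-run, or
   updated incrementally, whenever interest changes.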

1822	4.1.2.  Data Plane: Applying the OIF List to an (S,G) Frame

1824	   When an (S,G) multicast frame is received by the layer 2 multicast
1825	   function of a given EVPN-PE, say PE1, its disposition depends (a) upon
1826	   the way it was received, (b) upon the OIF list of the corresponding
1827	   multicast state (see Section 4.1.1), (c) upon the "eligibility" of an
1828	   AC to receive a given frame (see Section 4.1.2.1), and (d) upon its
1829	   apparent source BD (see Section 3.2 for information about determining
1830	   the apparent source BD of a frame received over a tunnel from another
1831	   PE).

1833	4.1.2.1.  Eligibility of an AC to Receive a Frame

1835	   A given (S,G) multicast frame is eligible to be transmitted by a
1836	   given PE, say PE1, on a given AC, say AC1, only if one of the
1837	   following conditions holds:

1839	   1.  ESI labels are being used, PE1 is the DF for the segment to which
1840	       AC1 is connected, and the frame did not originate from that same
1841	       segment (as determined by the ESI label), or

1843	   2.  The ingress PE for the frame is a remote PE, say PE2, local bias
1844	       is being used, and PE2 is not connected to the same segment as
1845	       AC1.
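
   The two conditions above can be written as a single predicate.  The
   sketch below is non-normative, and every parameter name is an
   assumption introduced for the example.

```python
def ac_is_eligible(*, esi_labels_used, pe_is_df, frame_from_same_segment,
                   ingress_is_remote, local_bias_used,
                   ingress_on_same_segment):
    """Return True if a frame may be transmitted on a given AC.

    Condition 1: ESI labels are in use, this PE is the DF for the AC's
    segment, and the frame did not originate from that segment.
    Condition 2: the ingress PE is remote, local bias is in use, and
    the ingress PE is not connected to the AC's segment.
    """
    condition_1 = (esi_labels_used and pe_is_df
                   and not frame_from_same_segment)
    condition_2 = (ingress_is_remote and local_bias_used
                   and not ingress_on_same_segment)
    return condition_1 or condition_2
```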

1847	4.1.2.2.  Applying the OIF List

1849	   Assume a given (S,G) multicast frame has been received by a given PE,
1850	   say PE1.  PE1 determines the apparent source BD of the frame, finds
1851	   the layer 2 (S,G) state for that BD (or the (*,G) state if there is
1852	   no (S,G) state), and takes the OIF list from that state.  (Note that
1853	   if PE1 is not attached to the actual source BD, the apparent source
1854	   BD will be the SBD.)

1856	   Suppose PE1 has determined the frame's apparent source BD to be BD1
1857	   (which may or may not be the SBD).  There are the following cases to
1858	   consider:

1860	   1.  The frame was received by PE1 from a local AC, say AC1, that
1861	       attaches to BD1.

1863	       a.  The frame MUST be sent out all local ACs of BD1 that appear
1864	           in the OIF list, except for AC1 itself.

1866	       b.  The frame MUST also be delivered to any other EVPN-PEs that
1867	           have interest in it.  This is achieved as follows:

1869	           i.    If (a) AR is being used, and (b) PE1 is an AR-LEAF, and
1870	                 (c) the OIF list is non-null, PE1 MUST send the frame
1871	                 to the AR-REPLICATOR.

1873	           ii.   Otherwise the frame MUST be sent on all tunnels in the
1874	                 OIF list.

1876	       c.  The frame MUST be sent to the local L3 routing instance by
1877	           being sent up the IRB interface of BD1.  It MUST NOT be sent
1878	           up any other IRB interfaces.

1880	   2.  The frame was received by PE1 over a tunnel from another PE.
1881	       (See Section 3.2 for the rules to determine the apparent source
1882	       BD of a packet received from another PE.  Note that if PE1 is not
1883	       attached to the source BD, it will regard the SBD as the apparent
1884	       source BD.)

1886	       a.  The frame MUST be sent out all local ACs in the OIF list that
1887	           connect to BD1 and that are eligible (per Section 4.1.2.1) to
1888	           receive the frame.

1890	       b.  The frame MUST be sent up the IRB interface of the apparent
1891	           source BD.  (Note that this may be the SBD.)  The frame MUST
1892	           NOT be sent up any other IRB interfaces.

1894	       c.  If PE1 is not an AR-REPLICATOR, it MUST NOT send the frame to
1895	           any other EVPN-PEs.  However, if PE1 is an AR-REPLICATOR, it
1896	           MUST send the frame to all tunnels in the OIF list, except
1897	           for the tunnel over which the frame was received.

1899	   3.  The frame was received by PE1 from the BD1 IRB interface (i.e.,
1900	       the frame has been transmitted by PE1's L3 routing instance down
1901	       the BD1 IRB interface), and BD1 is NOT the SBD.

1903	       a.  The frame MUST be sent out all local ACs in the OIF list that
1904	           are eligible (per Section 4.1.2.1) to receive the frame.

1906	       b.  The frame MUST NOT be sent to any other EVPN-PEs.

1908	       c.  The frame MUST NOT be sent up any IRB interfaces.

1910	   4.  The frame was received from the SBD IRB interface (i.e., has been
1911	       transmitted by PE1's L3 routing instance down the SBD IRB
1912	       interface).

1914	       a.  The frame MUST be sent on all tunnels in the OIF list.  This
1915	           causes the frame to be delivered to any other EVPN-PEs that
1916	           have interest in it.

1918	       b.  The frame MUST NOT be sent on any local ACs.

1920	       c.  The frame MUST NOT be sent up any IRB interfaces.
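
   The four cases above can be summarized in a small dispatch
   function.  This is a non-normative sketch under several
   assumptions: OIF entries are modeled as ("ac", name), ("tunnel",
   name), or ("irb", bd) tuples; the OIF list passed in is the one
   taken from BD1's state, so every AC in it attaches to BD1; an
   "ar_leaf" flag implies that AR is in use; and "eligible" stands in
   for the test of Section 4.1.2.1.

```python
def apply_oif_list(oif, bd1, received_from, *, sbd="SBD", in_ac=None,
                   in_tunnel=None, ar_leaf=False, ar_replicator=False,
                   eligible=lambda ac: True):
    """Decide where a frame with apparent source BD `bd1` is sent.

    received_from is "ac", "tunnel", or "irb" (hypothetical encoding).
    """
    acs = {entry[1] for entry in oif if entry[0] == "ac"}
    tunnels = {entry[1] for entry in oif if entry[0] == "tunnel"}
    out = {"acs": set(), "tunnels": set(), "irb_up": None,
           "to_ar_replicator": False}

    if received_from == "ac":                       # case 1
        out["acs"] = acs - {in_ac}                  # 1a: not back out AC1
        if ar_leaf and oif:
            out["to_ar_replicator"] = True          # 1b(i)
        else:
            out["tunnels"] = tunnels                # 1b(ii)
        out["irb_up"] = bd1                         # 1c
    elif received_from == "tunnel":                 # case 2
        out["acs"] = {ac for ac in acs if eligible(ac)}   # 2a
        out["irb_up"] = bd1                         # 2b
        if ar_replicator:
            out["tunnels"] = tunnels - {in_tunnel}  # 2c
    elif received_from == "irb" and bd1 != sbd:     # case 3
        out["acs"] = {ac for ac in acs if eligible(ac)}   # 3a only
    elif received_from == "irb":                    # case 4 (SBD IRB)
        out["tunnels"] = tunnels                    # 4a only
    else:
        raise ValueError("unknown receive path: %r" % (received_from,))
    return out
```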

1922	4.2.  Layer 3 Forwarding State

1924	   If an EVPN-PE is performing IGMP/MLD procedures on the ACs of a given
1925	   BD, it processes those messages at layer 2 to help form the layer 2
1926	   multicast state.  It also sends those messages up that BD's IRB
1927	   interface to the L3 routing instance of a particular tenant domain.
1928	   This causes layer 3 (C-S,C-G) or (C-*,C-G) state to be created/
1929	   updated.

1931	   A layer 3 multicast state has both an Input Interface (IIF) and an
1932	   OIF list.

1934	   To set the IIF of a (C-S,C-G) state, the EVPN-PE must determine the
1935	   source BD of C-S.  This is done by looking up C-S in the local
1936	   MAC-VRF(s) of the given Tenant Domain.

1938	   If the source BD is present on the PE, the IIF is set to the IRB
1939	   interface that attaches to that BD.  Otherwise the IIF is set to the
1940	   SBD IRB interface.

1942	   For (C-*,C-G) states, traffic can arrive from any BD, so the IIF
1943	   needs to be set to a wildcard value meaning "any IRB interface".
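
   The IIF selection rules above can be sketched as follows.  This is
   a non-normative illustration; the MAC-VRF lookup is modeled as a
   plain dictionary from locally present BD names to the set of
   sources known in each, and the wildcard value "any-irb" is a
   hypothetical placeholder.

```python
ANY_IRB = "any-irb"  # hypothetical wildcard meaning "any IRB interface"


def select_iif(source, hosts_by_local_bd, sbd="SBD"):
    """Choose the IIF for a layer 3 (C-S,C-G) or (C-*,C-G) state.

    hosts_by_local_bd: dict mapping each BD present on this PE to the
    set of source addresses found in that BD's MAC-VRF (assumed model).
    """
    if source == "*":
        return ANY_IRB                 # (C-*,C-G): traffic from any BD
    for bd, hosts in hosts_by_local_bd.items():
        if source in hosts:
            return ("irb", bd)         # source BD is present on this PE
    return ("irb", sbd)                # otherwise use the SBD IRB
```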

1945	   The OIF list of these states includes one or more of the IRB
1946	   interfaces of the Tenant Domain.  In general, maintenance of the OIF
1947	   list does not require any EVPN-specific procedures.  However, there
1948	   is one EVPN-specific rule:

1950	      If the IIF is one of the IRB interfaces (or the wildcard meaning
1951	      "any IRB interface"), then the SBD IRB interface MUST NOT be added
1952	      to the OIF list.  Traffic originating from within a particular
1953	      EVPN Tenant Domain must not be sent down the SBD IRB interface, as
1954	      such traffic has already been distributed to all EVPN-PEs attached
1955	      to that Tenant Domain.

1957	   Please also see Section 6.1.1, which states a modification of this
1958	   rule for the case where OISM is interworking with external Layer 3
1959	   multicast routing.
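
   The EVPN-specific OIF rule stated above (without the modification
   of Section 6.1.1) can be sketched as follows.  This is a
   non-normative illustration; the interface names and the "any-irb"
   wildcard value are assumptions made for the example.

```python
def l3_oif_list(iif, interested_irbs, tenant_irbs, sbd_irb,
                any_irb="any-irb"):
    """Return the layer 3 OIF list, applying the SBD exclusion rule.

    interested_irbs: IRB interfaces that would otherwise be OIFs.
    tenant_irbs:     all IRB interfaces of the Tenant Domain.
    """
    oif = set(interested_irbs)
    if iif == any_irb or iif in tenant_irbs:
        # Traffic sourced inside the Tenant Domain has already been
        # distributed at layer 2, so it must not go down the SBD IRB.
        oif.discard(sbd_irb)
    return oif
```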

1961	5.  Interworking with non-OISM EVPN-PEs

1963	   It is possible that a given Tenant Domain will be attached to both
1964	   OISM PEs and non-OISM PEs.  Inter-subnet IP multicast should be
1965	   possible and fully functional even if not all PEs attaching to a
1966	   Tenant Domain can be upgraded to support OISM functionality.

1968	   Note that the non-OISM PEs are not required to have IRB support, or
1969	   support for [I-D.ietf-bess-evpn-igmp-mld-proxy].  It is however
1970	   advantageous for the non-OISM PEs to support
1971	   [I-D.ietf-bess-evpn-igmp-mld-proxy].

1973	   In this section, we will use the following terminology:

1975	   o  PE-S: the ingress PE for an (S,G) flow.

1977	   o  PE-R: an egress PE for an (S,G) flow.

1979	   o  BD-S: the source BD for an (S,G) flow.  PE-S must have one or more
1980	      ACs attached to BD-S, at least one of which attaches to host S.

1982	   o  BD-R: a BD that contains a host interested in the flow.  The host
1983	      is attached to PE-R via an AC that belongs to BD-R.

1985	   To allow OISM PEs to interwork with non-OISM PEs, a given Tenant
1986	   Domain needs to contain one or more "IP Multicast Gateways" (IPMGs).
1987	   An IPMG is an OISM PE with special responsibilities regarding the
1988	   interworking between OISM and non-OISM PEs.

1990	   If a PE is functioning as an IPMG, it MUST signal this fact by
1991	   setting the "IPMG" flag in the Multicast Flags EC that it attaches to
1992	   its IMET routes.  An IPMG SHOULD attach this EC with the IPMG flag
1993	   set to all IMET routes it originates.  However, if PE1 imports any
1994	   IMET route from PE2 that has the EC present with the "IPMG" flag set,
1995	   then PE1 will assume that PE2 is an IPMG.

1997	   An IPMG Designated Forwarder (IPMG-DF) selection procedure is used to
1998	   ensure that, at any given time, there is exactly one active IPMG-DF
1999	   for any given BD.  Details of the IPMG-DF selection procedure are in
2000	   Section 5.1.  The IPMG-DF for a given BD, say BD-S, has special
2001	   functions to perform when it receives (S,G) frames on that BD:

2003	   o  If the frames are from a non-OISM PE-S:

2005	      *  The IPMG-DF forwards them to OISM PEs that do not attach to
2006	         BD-S but have interest in (S,G).

2008	         Note that OISM PEs that do attach to BD-S will have received
2009	         the frames on the BUM tunnel from the non-OISM PE-S.

2011	      *  The IPMG-DF forwards them to non-OISM PEs that have interest in
2012	         (S,G) on ACs that do not belong to BD-S.

2014	         Note that if a non-OISM PE has multiple BDs other than BD-S
2015	         with interest in (S,G), it will receive one copy of the frame
2016	         for each such BD.  This is necessary because the non-OISM PEs
2017	         cannot move IP multicast traffic from one BD to another.

2019	   o  If the frames are from an OISM PE, the IPMG-DF forwards them to
2020	      non-OISM PEs that have interest in (S,G) on ACs that do not belong
2021	      to BD-S.

2023	      If a non-OISM PE has interest in (S,G) on an AC belonging to BD-S,
2024	      it will have received a copy of the (S,G) frame, encapsulated for
2025	      BD-S, from the OISM PE-S.  (See Section 3.2.2.)  If the non-OISM
2026	      PE has interest in (S,G) on one or more ACs belonging to
2027	      BD-R1,...,BD-Rk where the BD-Ri are distinct from BD-S, the
2028	      IPMG-DF needs to send it a copy of the frame for each BD-Ri.

2030	   If an IPMG receives a frame on a BD for which it is not the IPMG-DF,
2031	   it just follows normal OISM procedures.

2033	   This section specifies several sets of procedures:

2035	   o  the procedures that the IPMG-DF for a given BD needs to follow
2036	      when receiving, on that BD, an IP multicast frame from a non-OISM
2037	      PE;

2039	   o  the procedures that the IPMG-DF for a given BD needs to follow
2040	      when receiving, on that BD, an IP multicast frame from an OISM PE;

2042	   o  the procedures that an OISM PE needs to follow when receiving, on
2043	      a given BD, an IP multicast frame from a non-OISM PE, when the
2044	      OISM PE is not the IPMG-DF for that BD.

2046	   To enable OISM/non-OISM interworking in a given Tenant Domain, the
2047	   Tenant Domain MUST have some EVPN-PEs that can function as IPMGs.  An
2048	   IPMG must be configured with the SBD.  It must also be configured
2049	   with every BD of the Tenant Domain that exists on any of the non-OISM
2050	   PEs of that domain.  (Operationally, it may be simpler to configure
2051	   the IPMG with all the BDs of the Tenant Domain.)
2052	   A non-OISM PE of course only needs to be configured with BDs for
2053	   which it has ACs.  An OISM PE that is not an IPMG only needs to be
2054	   configured with the SBD and with the BDs for which it has ACs.

2056	   An IPMG MUST originate a wildcard SMET route (with (C-*,C-*) in the
2057	   NLRI) for each BD in the Tenant Domain.  This will cause it to
2058	   receive all the IP multicast traffic that is sourced in the Tenant
2059	   Domain.  Note that non-OISM nodes that do not support
2060	   [I-D.ietf-bess-evpn-igmp-mld-proxy] will send all the multicast
2061	   traffic from a given BD to all PEs attached to that BD, even if those
2062	   PEs do not originate an SMET route.

2064	   The interworking procedures vary somewhat depending upon whether
2065	   packets are transmitted from PE to PE via Ingress Replication (IR) or
2066	   via Point-to-Multipoint (P2MP) tunnels.  We do not consider the use
2067	   of BIER in this section, due to the low likelihood of there being a
2068	   non-OISM PE that supports BIER.

2070	5.1.  IPMG Designated Forwarder

2072	   Every PE that is eligible for selection as an IPMG-DF for a
2073	   particular BD originates both an IMET route for that BD and an
2074	   SBD-IMET route.  As stated in Section 5, these SBD-IMET routes carry
2075	   a Multicast Flags EC with the IPMG Flag set.

2077	   These SBD-IMET routes SHOULD also carry a DF Election EC.  The DF
2078	   Election EC and its use are specified in [RFC8584].  When the route
2079	   is originated, the AC-DF bit in the DF Election EC SHOULD be set to
2080	   zero.  This bit is not used when selecting an IPMG-DF, i.e., it MUST
2081	   be ignored by the receiver of an SBD-IMET route.

2083	   In the context of a given Tenant Domain, to select the IPMG-DF for a
2084	   particular BD, say BD1, the IPMGs of the Tenant Domain perform the
2085	   following procedure:

2087	   o  From the set of received SBD-IMET routes for the given tenant
2088	      domain, determine the candidate set of PEs that support IPMG
2089	      functionality for that domain.

2091	   o  Eliminate from that candidate set any PEs from which an IMET route
2092	      for BD1 has not been received.

2094	   o  Select a DF Election algorithm as specified in [RFC8584].  Some of
2095	      the possible algorithms can be found, e.g., in [RFC8584],
2096	      [RFC7432], and [I-D.ietf-bess-evpn-pref-df].

2098	   o  Apply the DF Election Algorithm (see [RFC8584]) to the candidate
2099	      set of PEs.  The "winner" becomes the IPMG-DF for BD1.
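
   The candidate-set steps above can be sketched as follows.  This is
   a non-normative illustration with hypothetical route fields ("pe",
   "ipmg_flag").  The election itself is passed in as a function,
   since the algorithm is whichever one is selected per [RFC8584]; the
   "modulo_election" helper is only a placeholder in the style of the
   default modulo-based algorithm of [RFC7432], and its input value v
   is an assumption, not something this document defines.

```python
import ipaddress


def select_ipmg_df(sbd_imet_routes, bd_imet_originators, elect):
    """Select the IPMG-DF for one BD.

    sbd_imet_routes:     list of dicts {"pe": address, "ipmg_flag": bool}
    bd_imet_originators: addresses of PEs whose IMET route for the BD
                         has been received.
    elect:               callable taking the ordered candidate list.
    """
    # Candidates are the PEs whose SBD-IMET route has the IPMG flag set.
    candidates = {r["pe"] for r in sbd_imet_routes if r["ipmg_flag"]}
    # Drop PEs from which no IMET route for the BD has been received.
    candidates &= set(bd_imet_originators)
    if not candidates:
        return None
    ordered = sorted(candidates, key=ipaddress.ip_address)
    return elect(ordered)


def modulo_election(v):
    """Placeholder election: pick the (v mod N)-th candidate."""
    return lambda ordered: ordered[v % len(ordered)]
```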

2101	   Note that even if a given PE supports MEG (Section 6.1.2) and/or PEG
2102	   (Section 6.1.4) functionality, as well as IPMG functionality, its
2103	   SBD-IMET routes carry only one DF Election EC.

2105	5.2.  Ingress Replication

2107	   The procedures of this section are used when Ingress Replication is
2108	   used to transmit packets from one PE to another.

2110	   When a non-OISM PE-S transmits a multicast frame from BD-S to another
2111	   PE, PE-R, PE-S will use the encapsulation specified in the BD-S IMET
2112	   route that was originated by PE-R.  This encapsulation will include
2113	   the label that appears in the "MPLS label" field of the PMSI Tunnel
2114	   attribute (PTA) of the IMET route.  If the tunnel type is VXLAN, the
2115	   "label" is actually a Virtual Network Identifier (VNI); for other
2116	   tunnel types, the label is an MPLS label.  In either case, we will
2117	   speak of the transmitted frames as carrying a label that was assigned
2118	   to a particular BD by the PE-R to which the frame is being
2119	   transmitted.

2121	   To support OISM/non-OISM interworking, an OISM PE-R MUST originate,
2122	   for each of its BDs, both an IMET route and an S-PMSI (C-*,C-*) A-D
2123	   route.  Note that even when IR is being used, interworking between
2124	   OISM and non-OISM PEs requires the OISM PEs to follow the rules of
2125	   Section 3.2.5.2, as modified below.

2127	   Non-OISM PEs will not understand S-PMSI A-D routes.  So when a
2128	   non-OISM PE-S transmits an IP multicast frame with a particular
2129	   source BD to an IPMG, it encapsulates the frame using the label
2130	   specified in that IPMG's BD-S IMET route.  (This is just the
2131	   procedure of [RFC7432].)

2133	   The (C-*,C-*) S-PMSI A-D route originated by a given OISM PE will
2134	   have a PTA that specifies IR.

2136	   o  If MPLS tunneling is being used, the MPLS label field SHOULD
2137	      contain a non-zero value, and the LIR flag SHOULD be zero.  (The
2138	      case where the MPLS label field is zero or the LIR flag is set is
2139	      outside the scope of this document.)

2141	   o  If the tunnel encapsulation is VXLAN, the MPLS label field MUST
2142	      contain a non-zero value, and the LIR flag MUST be zero.

2144	   When an OISM PE-S transmits an IP multicast frame to an IPMG, it will
2145	   use the label specified in that IPMG's (C-*,C-*) S-PMSI A-D route.

2147	   When a PE originates both an IMET route and a (C-*,C-*) S-PMSI A-D
2148	   route, the values of the MPLS label field in the respective PTAs must
2149	   be distinct.  Further, each MUST map uniquely (in the context of the
2150	   originating PE) to the route's BD.

2152	   As a result, an IPMG receiving an MPLS-encapsulated IP multicast
2153	   frame can always tell by the label whether the frame's ingress PE is
2154	   an OISM PE or a non-OISM PE.  When an IPMG receives a VXLAN-
2155	   encapsulated IP multicast frame it may need to determine the identity
2156	   of the ingress PE from the outer IP encapsulation; it can then
2157	   determine whether the ingress PE is an OISM PE or a non-OISM PE by
2158	   looking up the IMET route from that PE.
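
   For the MPLS case, the label-based distinction can be sketched as
   follows.  This is a non-normative illustration; the two
   label-to-BD tables are an assumed representation of the labels
   that the receiving IPMG advertised in its IMET routes and in its
   (C-*,C-*) S-PMSI A-D routes.

```python
def classify_mpls_ingress(label, imet_label_to_bd, spmsi_label_to_bd):
    """Classify a received frame's ingress PE from its MPLS label.

    Returns ("non-OISM", bd) if the label came from an IMET route, or
    ("OISM", bd) if it came from a (C-*,C-*) S-PMSI A-D route.
    """
    # The two label sets must be distinct for the lookup to be unambiguous.
    overlap = imet_label_to_bd.keys() & spmsi_label_to_bd.keys()
    if overlap:
        raise ValueError("labels reused across route types: %s"
                         % sorted(overlap))
    if label in spmsi_label_to_bd:
        return ("OISM", spmsi_label_to_bd[label])
    if label in imet_label_to_bd:
        return ("non-OISM", imet_label_to_bd[label])
    raise KeyError(label)
```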

2160	   Suppose an IPMG receives an IP multicast frame from another EVPN-PE
2161	   in the Tenant Domain, and the IPMG is not the IPMG-DF for the frame's
2162	   source BD.  Then the IPMG performs only the ordinary OISM functions;
2163	   it does not perform the IPMG-specific functions for that frame.  In
2164	   the remainder of this section, when we discuss the procedures applied
2165	   by an IPMG when it receives an IP multicast frame, we are presuming
2166	   that the source BD of the frame is a BD for which the IPMG is the
2167	   IPMG-DF.

2169	   We have two basic cases to consider: (1) a frame's ingress PE is a
2170	   non-OISM node, and (2) a frame's ingress PE is an OISM node.

2172	5.2.1.  Ingress PE is non-OISM

2174	   In this case, a non-OISM PE, PE-S, has received an (S,G) multicast
2175	   frame over an AC that is attached to a particular BD, BD-S.  By
2176	   virtue of normal EVPN procedures, PE-S has sent a copy of the frame
2177	   to every PE-R (both OISM and non-OISM) in the Tenant Domain that is
2178	   attached to BD-S.  If the non-OISM node supports
2179	   [I-D.ietf-bess-evpn-igmp-mld-proxy], only PEs that have expressed
2180	   interest in (S,G) receive the frame.  The IPMG will have expressed
2181	   interest via a (C-*,C-*) SMET route and thus receives the frame.

2183	   Any OISM PE (including an IPMG) receiving the frame will apply normal
2184	   OISM procedures.  As a result it will deliver the frame to any of its
2185	   local ACs (in BD-S or in any other BD) that have interest in (S,G).

2187	   An OISM PE that is also the IPMG-DF for a particular BD, say BD-S,
2188	   has additional procedures that it applies to frames received on BD-S
2189	   from non-OISM PEs:

2191	   1.   When the IPMG-DF for BD-S receives an (S,G) frame from a
2192	        non-OISM node, it MUST forward a copy of the frame to every OISM
2193	        PE that is NOT attached to BD-S but has interest in (S,G).  The
2194	        copy sent to a given OISM PE-R must carry the label that PE-R
2195	        has assigned to the SBD in an S-PMSI A-D route.  The IPMG MUST
2196	        NOT do any IP processing of the frame's IP payload.  TTL
2197	        decrement and other IP processing will be done by PE-R, per the
2198	        normal OISM procedures.  There is no need for the IPMG to
2199	        include an ESI label in the frame's tunnel encapsulation,
2200	        because it is already known that the frame's source BD has no
2201	        presence on PE-R.  There is also no need for the IPMG to modify
2202	        the frame's MAC SA.

2204	   2.   In addition, when the IPMG-DF for BD-S receives an (S,G) frame
2205	        from a non-OISM node, it may need to forward copies of the frame
2206	        to other non-OISM nodes.  Before it does so, it MUST decapsulate
2207	        the (S,G) packet, and do the IP processing (e.g., TTL
2208	        decrement).  Suppose PE-R is a non-OISM node that has an AC to
2209	        BD-R, where BD-R is not the same as BD-S, and that AC has
2210	        interest in (S,G).  The IPMG must then encapsulate the (S,G)
2211	        packet (after the IP processing has been done) in an ethernet
2212	        header.  The MAC SA field will have the MAC address of the
2213	        IPMG's IRB interface to BD-R.  The IPMG then sends the frame to
2214	        PE-R.  The tunnel encapsulation will carry the label that PE-R
2215	        advertised in its IMET route for BD-R.  There is no need to
2216	        include an ESI label, as the source and destination BDs are
2217	        known to be different.

2219	        Note that if a non-OISM PE-R has several BDs (other than BD-S)
2220	        with local ACs that have interest in (S,G), the IPMG will send
2221	        it one copy for each such BD.  This is necessary because the
2222	        non-OISM PE cannot move packets from one BD to another.
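
   Items 1 and 2 above can be sketched as a function that lists the
   copies the IPMG-DF needs to send.  This is a non-normative
   illustration; all field names are hypothetical, and interest in
   (S,G) is assumed to have been determined already.

```python
def ipmg_df_copies_from_non_oism(bd_s, oism_pes, non_oism_pes):
    """List the copies an IPMG-DF sends for a frame from a non-OISM PE.

    oism_pes:     dicts with "name", "bds", "interested", "sbd_label"
                  (the label the PE assigned to the SBD in its S-PMSI
                  A-D route).
    non_oism_pes: dicts with "name", "interested_bds", "imet_labels"
                  (per-BD labels from the PE's IMET routes).
    """
    copies = []
    for pe in oism_pes:
        # Item 1: OISM PEs not attached to BD-S get the unmodified
        # frame on the SBD, with no IP processing at the IPMG.
        if pe["interested"] and bd_s not in pe["bds"]:
            copies.append({"to": pe["name"], "bd": "SBD",
                           "label": pe["sbd_label"],
                           "ip_processing": False})
    for pe in non_oism_pes:
        # Item 2: one routed copy per interested BD other than BD-S.
        for bd_r in sorted(pe["interested_bds"]):
            if bd_r != bd_s:
                copies.append({"to": pe["name"], "bd": bd_r,
                               "label": pe["imet_labels"][bd_r],
                               "ip_processing": True})
    return copies
```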

2224	   There may be deployment scenarios in which every OISM PE is
2225	   configured with every BD that is present on any non-OISM PE.  In such
2226	   scenarios, the procedures of item 1 above will not actually result in
2227	   the transmission of any packets.  Hence if it is known a priori that
2228	   this deployment scenario exists for a given tenant domain, the
2229	   procedures of item 1 above can be disabled.

2231	5.2.2.  Ingress PE is OISM

2233	   In this case, an OISM PE, PE-S, has received an (S,G) multicast frame
2234	   over an AC that attaches to a particular BD, BD-S.

2236	   By virtue of receiving all the IMET routes about BD-S, PE-S will know
2237	   all the PEs attached to BD-S.  By virtue of normal OISM procedures:

2239	   o  PE-S will send a copy of the frame to every OISM PE-R (including
2240	      the IPMG) in the Tenant Domain that is attached to BD-S and has
2241	      interest in (S,G).  The copy sent to a given PE-R carries the
2242	      label that the PE-R has assigned to BD-S in its (C-*,C-*)
2243	      S-PMSI A-D route.

2245	   o  PE-S will also transmit a copy of the (S,G) frame to every OISM
2246	      PE-R that has interest in (S,G) but is not attached to BD-S.  The
2247	      copy will contain the label that the PE-R has assigned to the SBD.
2248	      (As in Section 5.2.1, an IPMG is assumed to have indicated
2249	      interest in all multicast flows.)

2251	   o  PE-S will also transmit a copy of the (S,G) frame to every
2252	      non-OISM PE-R that is attached to BD-S.  It does this using the
2253	      label advertised by that PE-R in its IMET route for BD-S.
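
   The three bullets above can be sketched as a label-selection
   function for the OISM PE-S.  This is a non-normative illustration;
   the field names are hypothetical, and the SBD label is assumed to
   come from the PE-R's S-PMSI A-D route for the SBD.

```python
def oism_ingress_copies(bd_s, pes, sbd="SBD"):
    """List (PE name, label) pairs for the copies an OISM PE-S sends.

    pes: dicts with "name", "oism", "bds", and either
         "interested" and "spmsi_labels" (OISM PEs), or
         "imet_labels" (non-OISM PEs); labels are keyed by BD.
    """
    copies = []
    for pe in pes:
        if pe["oism"]:
            if not pe["interested"]:
                continue
            # Attached to BD-S: use the BD-S label; otherwise the SBD label.
            bd = bd_s if bd_s in pe["bds"] else sbd
            copies.append((pe["name"], pe["spmsi_labels"][bd]))
        elif bd_s in pe["bds"]:
            # Non-OISM PE attached to BD-S: use its IMET label for BD-S.
            copies.append((pe["name"], pe["imet_labels"][bd_s]))
    return copies
```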

2255	   The PE-Rs follow their normal procedures.  An OISM PE that receives
2256	   the (S,G) frame on BD-S applies the OISM procedures to deliver the
2257	   frame to its local ACs, as necessary.  A non-OISM PE that receives
2258	   the (S,G) frame on BD-S delivers the frame only to its local BD-S
2259	   ACs, as necessary.

2261	   Suppose that a non-OISM PE-R has interest in (S,G) on a BD, BD-R,
2262	   that is different than BD-S.  If the non-OISM PE-R is attached to
2263	   BD-S, the OISM PE-S will forward it the original (S,G) multicast
2264	   frame, but the non-OISM PE-R will not be able to send the frame to
2265	   ACs that are not in BD-S.  If PE-R is not even attached to BD-S, the
2266	   OISM PE-S will not send it a copy of the frame at all, because PE-R
2267	   is not attached to the SBD.  In these cases, the IPMG needs to relay
2268	   the (S,G) multicast traffic from OISM PE-S to non-OISM PE-R.

2270	   When the IPMG-DF for BD-S receives an (S,G) frame from an OISM PE-S,
2271	   it has to forward it to every non-OISM PE-R that has interest in
2272	   (S,G) on a BD-R that is different than BD-S.  The IPMG MUST
2273	   decapsulate the IP multicast packet, do the IP processing, re-
2274	   encapsulate it for BD-R (changing the MAC SA to the IPMG's own MAC
2275	   address on BD-R), and send a copy of the frame to PE-R.  Note that a
2276	   given non-OISM PE-R will receive multiple copies of the frame, if it
2277	   has multiple BDs on which there is interest in the frame.

2279	5.3.  P2MP Tunnels

2281	   When IR is used to distribute the multicast traffic among the
2282	   EVPN-PEs, the procedures of Section 5.2 ensure that there will be no
2283	   duplicate delivery of multicast traffic.  That is, no egress PE will
2284	   ever send a frame twice on any given AC.  If P2MP tunnels are being
2285	   used to distribute the multicast traffic, it is necessary to have
2286	   additional procedures to prevent duplicate delivery.

2288	   At the present time, it is not clear that there will be a use case in
2289	   which OISM nodes need to interwork with non-OISM nodes that use P2MP
2290	   tunnels.  If it is determined that there is such a use case,
2291	   procedures for it will be included in a future revision of this
2292	   document.

2294	6.  Traffic to/from Outside the EVPN Tenant Domain

2296	   In this section, we discuss scenarios where a multicast source
2297	   outside a given EVPN Tenant Domain sends traffic to receivers inside
2298	   the domain (as well as, possibly, to receivers outside the domain).
2299	   This requires the OISM procedures to interwork with various layer 3
2300	   multicast routing procedures.

2302	   We assume in this section that the Tenant Domain is not being used as
2303	   an intermediate transit network for multicast traffic; that is, we do
2304	   not consider the case where the Tenant Domain contains multicast
2305	   routers that will receive traffic from sources outside the domain and
2306	   forward the traffic to receivers outside the domain.  The transit
2307	   scenario is considered in Section 7.

2309	   We can divide the non-transit scenarios into two classes:

2311	   1.   One or more of the EVPN PE routers provide the functionality
2312	        needed to interwork with layer 3 multicast routing procedures.

2314	   2.   A single BD in the Tenant Domain contains external multicast
2315	        routers ("tenant multicast routers"), and those tenant multicast
2316	        routers are used to interwork, on behalf of the entire Tenant
2317	        Domain, with layer 3 multicast routing procedures.

2319	6.1.  Layer 3 Interworking via EVPN OISM PEs

2321	6.1.1.  General Principles

2323	   Sometimes it is necessary to interwork an EVPN Tenant Domain with an
2324	   external layer 3 multicast domain (the "external domain").  This is
2325	   needed to allow EVPN tenant systems to receive multicast traffic from
2326	   sources ("external sources") outside the EVPN Tenant Domain.  It is
2327	   also needed to allow receivers ("external receivers") outside the
2328	   EVPN Tenant Domain to receive traffic from sources inside the Tenant
2329	   Domain.

2331	   In order to allow interworking between an EVPN Tenant Domain and an
2332	   external domain, one or more OISM PEs must be "L3 Gateways".  An L3
2333	   Gateway participates both in the OISM procedures and in the L3
2334	   multicast routing procedures of the external domain.

2336	   An L3 Gateway that has interest in receiving (S,G) traffic must be
2337	   able to determine the best route to S.  If an L3 Gateway has interest
2338	   in (*,G), it must be able to determine the best route to G's RP.  In
2339	   these interworking scenarios, the L3 Gateway must be running a layer
2340	   3 unicast routing protocol.  Via this protocol, it imports unicast
2341	   routes (either IP routes or VPN-IP routes) from routers other than
2342	   EVPN PEs.  And since there may be multicast sources inside the EVPN
2343	   Tenant Domain, the EVPN PEs also need to export, either as IP routes
2344	   or as VPN-IP routes (depending upon the external domain), unicast
2345	   routes to those sources.

2347	   When selecting the best route to a multicast source or RP, an L3
2348	   Gateway might have a choice between an EVPN route and an IP/VPN-IP
2349	   route.  When such a choice exists, the L3 Gateway SHOULD always
2350	   prefer the EVPN route.  This will ensure that when traffic originates
2351	   in the Tenant Domain and has a receiver in the Tenant Domain, the
2352	   path to that receiver will remain within the EVPN Tenant Domain, even
2353	   if the source is also reachable via a routed path.  This also
2354	   provides protection against sub-optimal routing that might occur if
2355	   two EVPN PEs export IP/VPN-IP routes and each imports the other's IP/
2356	   VPN-IP routes.

2358	   Section 4.2 discusses the way layer 3 multicast states are
2359	   constructed by OISM PEs.  These layer 3 multicast states have IRB
2360	   interfaces as their IIF and OIF list entries, and are the basis for
2361	   interworking OISM with other layer 3 multicast procedures such as
2362	   MVPN or PIM.  From the perspective of the layer 3 multicast
2363	   procedures running in a given L3 Gateway, an EVPN Tenant Domain is a
2364	   set of IRB interfaces.

2366	   When interworking an EVPN Tenant Domain with an external domain, the
2367	   L3 Gateway's layer 3 multicast states will not only have IRB
2368	   interfaces as IIF and OIF list entries, but also other "interfaces"
2369	   that lead outside the Tenant Domain.  For example, when interworking
2370	   with MVPN, the multicast states may have MVPN tunnels as well as IRB
2371	   interfaces as IIF or OIF list members.  When interworking with PIM,
2372	   the multicast states may have PIM-enabled non-IRB interfaces as IIF
2373	   or OIF list members.

2375	   As long as a Tenant Domain is not being used as an intermediate
2376	   transit network for IP multicast traffic, it is not necessary to
2377	   enable PIM on its IRB interfaces.

2379	   In general, an L3 Gateway has the following responsibilities:

2381	   o  It exports, to the external domain, unicast routes to those
2382	      multicast sources in the EVPN Tenant Domain that are locally
2383	      attached to the L3 Gateway.

2385	   o  It imports, from the external domain, unicast routes to multicast
2386	      sources that are in the external domain.

2388	   o  It executes the procedures necessary to draw externally sourced
2389	      multicast traffic that is of interest to locally attached
2390	      receivers in the EVPN Tenant Domain.  When such traffic is
2391	      received, the traffic is sent down the IRB interfaces of the BDs
2392	      on which the locally attached receivers reside.

2394	   One of the L3 Gateways in a given Tenant Domain becomes the "DR" for
2395	   the SBD.  (See Section 6.1.2.4.)  This L3 gateway has the following
2396	   additional responsibilities:

2398	   o  It exports, to the external domain, unicast routes to multicast
2399	      sources in the EVPN Tenant Domain that are not locally
2400	      attached to any L3 gateway.

2402	   o  It imports, from the external domain, unicast routes to multicast
2403	      sources that are in the external domain.

2405	   o  It executes the procedures necessary to draw externally sourced
2406	      multicast traffic that is of interest to receivers in the EVPN
2407	      Tenant Domain that are not locally attached to an L3 gateway.
2408	      When such traffic is received, the traffic is sent down the SBD
2409	      IRB interface.  OISM procedures already described in this document
2410	      will then ensure that the IP multicast traffic gets distributed
2411	      throughout the Tenant Domain to any EVPN PEs that have interest in
2412	      it.  Thus to an OISM PE that is not an L3 gateway the externally
2413	      sourced traffic will appear to have been sourced on the SBD.

2415	   In order for this to work, some special care is needed when an L3
2416	   gateway creates or modifies a layer 3 (*,G) multicast state.  Suppose
2417	   group G has both external sources (sources outside the EVPN Tenant
2418	   Domain) and internal sources (sources inside the EVPN tenant domain).
2419	   Section 4.2 states that when there are internal sources, the SBD IRB
2420	   interface must not be added to the OIF list of the (*,G) state.
2421	   Traffic from internal sources will already have been delivered to all
2422	   the EVPN PEs that have interest in it.  However, if the OIF list of
2423	   the (*,G) state does not contain its SBD IRB interface, then traffic
2424	   from external sources will not get delivered to other EVPN PEs.

2426	   One way of handling this is the following.  When an L3 gateway
2427	   receives (S,G) traffic from other than an IRB interface, and the
2428	   traffic corresponds to a layer 3 (*,G) state, the L3 gateway can
2429	   create (S,G) state.  The IIF will be set to the external interface
2430	   over which the traffic is expected.  The OIF list will contain the
2431	   SBD IRB interface, as well as the IRB interfaces of any other BDs
2432	   attached to the PEG DR that have locally attached receivers with
2433	   interest in the (S,G) traffic.  The (S,G) state will ensure that the
2434	   external traffic is sent down the SBD IRB interface.  The following
2435	   text will assume this procedure; however other implementation
2436	   techniques may also be possible.
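
   The (S,G) state creation described above can be sketched as follows.
   This is only an illustration under stated assumptions: the names
   McastState and on_data_packet, and the interface labels used with
   them, are hypothetical and are not defined by this document.
   Removing the arrival interface from the OIF list is an assumption of
   the sketch, not a stated requirement.

```python
from dataclasses import dataclass, field


@dataclass
class McastState:
    source: str                 # "*" for a (*,G) state
    group: str
    iif: str                    # incoming interface
    oifs: set = field(default_factory=set)


def on_data_packet(states, source, group, arrival_if, irb_ifs, sbd_irb):
    """Create (or reuse) an (S,G) state for externally sourced traffic.

    Returns None when the traffic arrived on an IRB interface (internal
    source) or when no (*,G) state matches.
    """
    if arrival_if in irb_ifs:
        return None             # internal source: normal OISM handling
    if (source, group) in states:
        return states[(source, group)]
    star_g = states.get(("*", group))
    if star_g is None:
        return None
    # Copy the (*,G) OIF list, add the SBD IRB interface, and never send
    # traffic back out of the interface it arrived on (sketch assumption).
    oifs = (set(star_g.oifs) | {sbd_irb}) - {arrival_if}
    sg = McastState(source, group, iif=arrival_if, oifs=oifs)
    states[(source, group)] = sg
    return sg
```

   Note that the (*,G) state itself is left untouched, so traffic from
   internal sources is still not sent down the SBD IRB interface.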

2438	   If a particular BD is attached to several L3 Gateways, one of the L3
2439	   Gateways becomes the DR for that BD.  (See Section 6.1.2.4.)  If the
2440	   interworking scenario requires FHR functionality, it is generally the
2441	   DR for a particular BD that is responsible for performing that
2442	   functionality on behalf of the source hosts on that BD.  (E.g., if
2443	   the interworking scenario requires that PIM Register messages be sent
2444	   by a FHR, the DR for a given BD would send the PIM Register messages
2445	   for sources on that BD.)  Note though that the DR for the SBD does
2446	   not perform FHR functionality on behalf of external sources.

2448	   An optional alternative is to have each L3 gateway perform FHR
2449	   functionality for locally attached sources.  Then the DR would only
2450	   have to perform FHR functionality on behalf of sources that are
2451	   locally attached to itself AND sources that are not attached to any
2452	   L3 gateway.

2454	   N.B.: If it is possible that more than one BD contains a tenant
2455	   multicast router, then a PE receiving an SMET route for that BD MUST
2456	   NOT reconstruct IGMP Join Reports from the SMET route, and MUST NOT
2457	   transmit any such IGMP Join Reports on its local ACs attaching to
2458	   that BD.  Otherwise, multicast traffic may be duplicated.

2460	6.1.2.  Interworking with MVPN

2462	   In this section, we specify the procedures necessary to allow EVPN
2463	   PEs running OISM procedures to interwork with L3VPN PEs that run BGP-
2464	   based MVPN ([RFC6514]) procedures.  More specifically, the procedures
2465	   herein allow a given EVPN Tenant Domain to become part of an L3VPN/
2466	   MVPN, and support multicast flows where either:

2468	   o  The source of a given multicast flow is attached to an ethernet
2469	      segment whose BD is part of an EVPN Tenant Domain, and one or more
2470	      receivers of the flow are attached to the network via L3VPN/MVPN.
2471	      (Other receivers may be attached to the network via EVPN.)

2473	   o  The source of a given multicast flow is attached to the network
2474	      via L3VPN/MVPN, and one or more receivers of the flow are attached
2475	      to an ethernet segment that is part of an EVPN tenant domain.
2476	      (Other receivers may be attached via L3VPN/MVPN.)

2478	   In this interworking model, existing L3VPN/MVPN PEs are unaware that
2479	   certain sources or receivers are part of an EVPN Tenant Domain.  The
2480	   existing L3VPN/MVPN nodes run only their standard procedures and are
2481	   entirely unaware of EVPN.  Interworking is achieved by having some or
2482	   all of the EVPN PEs function as L3 Gateways running L3VPN/MVPN
2483	   procedures, as detailed in the following sub-sections.

2485	   In this section, we assume that there are no tenant multicast routers
2486	   on any of the EVPN-attached ethernet segments.  (There may of course
2487	   be multicast routers in the L3VPN.)  Consideration of the case where
2488	   there are tenant multicast routers is deferred until Section 7.

2490	   To support MVPN/EVPN interworking, we introduce the notion of an
2491	   MVPN/EVPN Gateway, or MEG.

2493	   A MEG is an L3 Gateway (see Section 6.1.1), hence is both an OISM PE
2494	   and an L3VPN/MVPN PE.  For a given EVPN Tenant Domain it will have an
2495	   IP-VRF.  If the Tenant Domain is part of an L3VPN/MVPN, the IP-VRF
2496	   also serves as an L3VPN VRF ([RFC4364]).  The IRB interfaces of the
2497	   IP-VRF are considered to be "VRF interfaces" of the L3VPN VRF.  The
2498	   L3VPN VRF may also have other local VRF interfaces that are not EVPN
2499	   IRB interfaces.

2501	   The VRF on the MEG will import VPN-IP routes ([RFC4364]) from other
2502	   L3VPN Provider Edge (PE) routers.  It will also export VPN-IP routes
2503	   to other L3VPN PE routers.  In order to do so, it must be
2504	   appropriately configured with the Route Targets used in the L3VPN to
2505	   control the distribution of the VPN-IP routes.  These Route Targets
2506	   will in general be different than the Route Targets used for
2507	   controlling the distribution of EVPN routes, as there is no need to
2508	   distribute EVPN routes to L3VPN-only PEs and no reason to distribute
2509	   L3VPN/MVPN routes to EVPN-only PEs.

2511	   Note that the RDs in the imported VPN-IP routes will not necessarily
2512	   conform to the EVPN rules (as specified in [RFC7432]) for creating
2513	   RDs.  Therefore a MEG MUST NOT expect the RDs of the VPN-IP routes to
2514	   be of any particular format other than what is required by the L3VPN/
2515	   MVPN specifications.

2517	   The VPN-IP routes that a MEG exports to L3VPN are subnet routes and/
2518	   or host routes for the multicast sources that are part of the EVPN
2519	   tenant domain.  The exact set of routes that need to be exported is
2520	   discussed in Section 6.1.2.2.

2522	   Each IMET route originated by a MEG SHOULD carry a Multicast Flags
2523	   Extended Community with the "MEG" flag set, indicating that the
2524	   originator of the IMET route is a MEG.  However, PE1 will consider
2525	   PE2 to be a MEG if PE1 imports at least one IMET route from PE2 that
2526	   carries the Multicast Flags EC with the MEG flag set.

2528	   All the MEGs of a given Tenant Domain attach to the SBD of that
2529	   domain, and one of them is selected to be the SBD's Designated Router
2530	   (the "MEG SBD-DR") for the domain.  The selection procedure is
2531	   discussed in Section 6.1.2.4.

2533	   In this model of operation, MVPN procedures and EVPN procedures are
2534	   largely independent.  In particular, there is no assumption that MVPN
2535	   and EVPN use the same kind of tunnels.  Thus no special procedures
2536	   are needed to handle the common scenarios where, e.g., EVPN uses
2537	   VXLAN tunnels but MVPN uses MPLS P2MP tunnels, or where EVPN uses
2538	   Ingress Replication but MVPN uses MPLS P2MP tunnels.

2540	   Similarly, no special procedures are needed to prevent duplicate data
2541	   delivery on ethernet segments that are multi-homed.

2543	   The MEG does have some special procedures (described below) for
2544	   interworking between EVPN and MVPN; these have to do with selection
2545	   of the Upstream PE for a given multicast source, with the exporting
2546	   of VPN-IP routes, and with the generation of MVPN C-multicast routes
2547	   triggered by the installation of SMET routes.

2549	6.1.2.1.  MVPN Sources with EVPN Receivers

2551	6.1.2.1.1.  Identifying MVPN Sources

2553	   Consider a multicast source S.  It is possible that a MEG will import
2554	   both an EVPN unicast route to S and a VPN-IP route (or an ordinary IP
2555	   route), where the prefix length of each route is the same.  In order
2556	   to draw (S,G) multicast traffic for any group G, the MEG SHOULD use
2557	   the EVPN route rather than the VPN-IP or IP route to determine the
2558	   "Upstream PE" (see section 5 of [RFC6513]).

2560	   Doing so ensures that when an EVPN tenant system desires to receive a
2561	   multicast flow from another EVPN tenant system, the traffic from the
2562	   source to that receiver stays within the EVPN domain.  This prevents
2563	   problems that might arise if there is a unicast route via L3VPN to S,
2564	   but no multicast routers along the routed path.  This also prevents
2565	   problems that might arise as a result of the fact that the MEGs will
2566	   import each other's VPN-IP routes.

2568	   In Section 6.1.2.1.2, we describe the procedures to be used when
2569	   the selected route to S is a VPN-IP route.
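
   As an illustration of the preference described in this section, the
   following sketch picks the route used to determine the Upstream PE.
   The Route type, the "kind" labels, and select_upstream_route are
   hypothetical names.  The sketch assumes that longest-prefix match is
   applied first and that the EVPN preference acts as a tie-breaker
   among routes of equal prefix length.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g. "10.1.1.0/24"
    kind: str        # "evpn", "vpn-ip", or "ip" (illustrative labels)
    next_hop: str


def select_upstream_route(source, routes):
    """Return the route to `source` used for Upstream PE selection."""
    addr = ip_address(source)
    matching = [r for r in routes if addr in ip_network(r.prefix)]
    if not matching:
        return None
    # Longest prefix wins; among equal-length prefixes, prefer EVPN.
    return max(
        matching,
        key=lambda r: (ip_network(r.prefix).prefixlen, r.kind == "evpn"),
    )
```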

2571	6.1.2.1.2.  Joining a Flow from an MVPN Source

2573	   Consider a tenant system, R, on a particular BD, BD-R.  Suppose R
2574	   wants to receive (S,G) multicast traffic, where source S is not
2575	   attached to any PE in the EVPN Tenant Domain, but is attached to an
2576	   MVPN PE.

2578	   o  Suppose R is on a singly homed ethernet segment of BD-R, and that
2579	      segment is attached to PE1, where PE1 is a MEG.  PE1 learns via
2580	      IGMP/MLD listening that R is interested in (S,G).  PE1 determines
2581	      from its VRF that there is no route to S within the Tenant Domain
2582	      (i.e., no EVPN RT-2 route with S's IP address), but that there is
2583	      a route to S via L3VPN (i.e., the VRF contains a subnet or host
2584	      route to S that was received as a VPN-IP route).  PE1 thus
2585	      originates (if it hasn't already) an MVPN C-multicast Source Tree
2586	      Join(S,G) route.  The route is constructed according to normal
2587	      MVPN procedures.

2589	      The layer 2 multicast state is constructed as specified in
2590	      Section 4.1.

2592	      In the layer 3 multicast state, the IIF is the appropriate MVPN
2593	      tunnel, and the IRB interface to BD-R is added to the OIF list.

2595	      When PE1 receives (S,G) traffic from the appropriate MVPN tunnel,
2596	      it performs IP processing of the traffic, and then sends the
2597	      traffic down its IRB interface to BD-R.  Following normal OISM
2598	      procedures, the (S,G) traffic will be encapsulated for ethernet
2599	      and sent out the AC to which R is attached.

2601	   o  Suppose R is on a singly homed ethernet segment of BD-R, and that
2602	      segment is attached to PE1, where PE1 is an OISM PE but is NOT a
2603	      MEG.  PE1 learns via IGMP/MLD listening that R is interested in
2604	      (S,G).  PE1 follows normal OISM procedures, originating an SBD-
2605	      SMET route for (S,G); this route will be received by all the MEGs
2606	      of the Tenant Domain, including the MEG SBD-DR.  The MEG SBD-DR
2607	      can determine from PE1's IMET routes whether PE1 is itself a MEG.
2608	      If PE1 is not a MEG, the MEG SBD-DR will originate (if it hasn't
2609	      already) an MVPN C-multicast Source Tree Join(S,G) route.  This
2610	      will cause the MEG SBD-DR to receive (S,G) traffic on an MVPN
2611	      tunnel.

2613	      The layer 2 multicast state is constructed as specified in
2614	      Section 4.1.

2616	      In the layer 3 multicast state, the IIF is the appropriate MVPN
2617	      tunnel, and the IRB interface to the SBD is added to the OIF list.

2619	      When the MEG SBD-DR receives (S,G) traffic on an MVPN tunnel, it
2620	      performs IP processing of the traffic, and then sends the traffic
2621	      down its IRB interface to the SBD.  Following normal OISM
2622	      procedures, the traffic will be encapsulated for ethernet and
2623	      delivered to all PEs in the Tenant Domain that have interest in
2624	      (S,G), including PE1.

2626	   o  If R is on a multi-homed ethernet segment of BD-R, one of the PEs
2627	      attached to the segment will be its DF (following normal EVPN
2628	      procedures), and the DF will know (via IGMP/MLD listening or the
2629	      procedures of [I-D.ietf-bess-evpn-igmp-mld-proxy]) that a tenant
2630	      system reachable via one of its local ACs to BD-R is interested in
2631	      (S,G) traffic.  The DF is responsible for originating an SBD-SMET
2632	      route for (S,G), following normal OISM procedures.  If the DF is a
2633	      MEG, it MUST originate the corresponding MVPN C-multicast Source
2634	      Tree Join(S,G) route; if the DF is not a MEG, the MEG SBD-DR
2635	      MUST originate the C-multicast route when it receives the SMET
2636	      route.

2638	      Optionally, if the non-DF is a MEG, it MAY originate the
2639	      corresponding MVPN C-multicast Source Tree Join(S,G) route.  This
2640	      will cause the traffic to flow to both the DF and the non-DF, but
2641	      only the DF will forward the traffic out an AC.  This allows for
2642	      quicker recovery if the DF's local AC to R fails.

2644	   o  If R is attached to a non-OISM PE, it will receive the traffic via
2645	      an IPMG, as specified in Section 5.
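
   The cases above can be summarized as a single decision about whether
   a given PE originates the MVPN C-multicast Source Tree Join(S,G)
   route.  The sketch below is one possible reading, not a normative
   procedure; Interest, originates_c_multicast_join, and the
   join_as_non_df knob (modelling the optional non-DF behavior) are
   hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Interest:
    via: str                          # "local" (IGMP/MLD on an AC) or "sbd-smet"
    evpn_route_to_source: bool        # an EVPN route to S exists
    vpn_ip_route_to_source: bool      # a VPN-IP route to S exists
    i_am_df: bool = True              # True also for singly homed segments
    smet_originator_is_meg: bool = False


def originates_c_multicast_join(is_meg, is_sbd_dr, interest, join_as_non_df=False):
    """Decide whether this PE originates the C-multicast Join(S,G)."""
    if not is_meg:
        return False                  # only MEGs run MVPN procedures
    if interest.evpn_route_to_source or not interest.vpn_ip_route_to_source:
        return False                  # S is not reached via L3VPN
    if interest.via == "local":
        # The DF originates the Join; a non-DF MEG may do so optionally.
        return interest.i_am_df or join_as_non_df
    if interest.via == "sbd-smet":
        # The MEG SBD-DR acts on behalf of PEs that are not MEGs.
        return is_sbd_dr and not interest.smet_originator_is_meg
    return False
```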

2647	   If an EVPN-attached receiver is interested in (*,G) traffic, and if
2648	   it is possible for there to be sources of (*,G) traffic that are
2649	   attached only to L3VPN nodes, the MEGs will have to know the group-
2650	   to-RP mappings.  That will enable them to originate MVPN C-multicast
2651	   Shared Tree Join(*,G) routes and to send them towards the RP.  (Since
2652	   we are assuming in this section that there are no tenant multicast
2653	   routers attached to the EVPN Tenant Domain, the RP must be attached
2654	   via L3VPN.  Alternatively, the MEG itself could be configured to
2655	   function as an RP for group G.)

2657	   The layer 2 multicast states are constructed as specified in
2658	   Section 4.1.

2660	   In the layer 3 (*,G) multicast state, the IIF is the appropriate MVPN
2661	   tunnel.  A MEG will add to the (*,G) OIF list its IRB interfaces for
2662	   any BDs containing locally attached receivers.  If there are
2663	   receivers attached to other EVPN PEs, then whenever (S,G) traffic
2664	   from an external source matches a (*,G) state, the MEG will create
2665	   (S,G) state, with the MVPN tunnel as the IIF, the OIF list copied
2666	   from the (*,G) state, and the SBD IRB interface added to the OIF
2667	   list.  (Please see the discussion in Section 6.1.1 regarding the
2668	   inclusion of the SBD IRB interface in a (*,G) state; the SBD IRB
2669	   interface is used in the OIF list only for traffic from external
2670	   sources.)

2672	   Normal MVPN procedures will then result in the MEG getting the (*,G)
2673	   traffic from all the multicast sources for G that are attached via
2674	   L3VPN.  This traffic arrives on MVPN tunnels.  When the MEG removes
2675	   the traffic from these tunnels, it does the IP processing.  If there
2676	   are any receivers on a given BD, BD-R, that are attached via local
2677	   EVPN ACs, the MEG sends the traffic down its BD-R IRB interface.  If
2678	   there are any other EVPN PEs that are interested in the (*,G)
2679	   traffic, the MEG sends the traffic down the SBD IRB interface.
2680	   Normal OISM procedures then distribute the traffic as needed to other
2681	   EVPN-PEs.

2683	6.1.2.2.  EVPN Sources with MVPN Receivers

2685	6.1.2.2.1.  General procedures

2687	   Consider the case where an EVPN tenant system S is sending IP
2688	   multicast traffic to group G, and there is a receiver R for the (S,G)
2689	   traffic that is attached to the L3VPN, but not attached to the EVPN
2690	   Tenant Domain.  (We assume in this document that the L3VPN/MVPN-only
2691	   nodes will not have any special procedures to deal with the case
2692	   where a source is inside an EVPN domain.)

2694	   In this case, an L3VPN PE through which R can be reached has to send
2695	   an MVPN C-multicast Join(S,G) route to one of the MEGs that is
2696	   attached to the EVPN Tenant Domain.  For this to happen, the L3VPN PE
2697	   must have imported a VPN-IP route for S (either a host route or a
2698	   subnet route) from a MEG.

2700	   If a MEG determines that there is a multicast source transmitting on
2701	   one of its ACs, the MEG SHOULD originate a VPN-IP host route for that
2702	   source.  This determination SHOULD be made by examining the IP
2703	   multicast traffic that arrives on the ACs.  (It MAY be made by
2704	   provisioning.)  A MEG SHOULD NOT export a VPN-IP host route for any
2705	   IP address that is not known to be a multicast source (unless it has
2706	   some other reason for exporting such a route).  The VPN-IP host route
2707	   for a given multicast source MUST be withdrawn if the source goes
2708	   silent for a configurable period of time, or if it can be determined
2709	   that the source is no longer reachable via a local AC.
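
   A minimal sketch of the host route lifecycle described above is
   shown below.  HostRouteTracker and its methods are hypothetical
   names, timestamps are plain numbers in seconds, and the silence
   timeout stands for the configurable period mentioned in the text.

```python
class HostRouteTracker:
    """Tracks which local multicast sources have a VPN-IP host route."""

    def __init__(self, silence_timeout):
        self.silence_timeout = silence_timeout
        self.last_seen = {}           # source IP -> time of last packet

    def on_multicast_packet(self, source_ip, now):
        """Record traffic from a local AC; True means originate a route."""
        is_new = source_ip not in self.last_seen
        self.last_seen[source_ip] = now
        return is_new

    def on_source_unreachable(self, source_ip):
        """Source no longer reachable via a local AC; True means withdraw."""
        return self.last_seen.pop(source_ip, None) is not None

    def expire(self, now):
        """Return sources whose host routes must be withdrawn for silence."""
        silent = [s for s, t in self.last_seen.items()
                  if now - t >= self.silence_timeout]
        for s in silent:
            del self.last_seen[s]
        return sorted(silent)
```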

2711	   A MEG SHOULD also originate a VPN-IP subnet route for each of the BDs
2712	   in the Tenant Domain.

2714	   VPN-IP routes exported by a MEG must carry any attributes or extended
2715	   communities that are required by L3VPN and MVPN.  In particular, a
2716	   VPN-IP route exported by a MEG must carry a VRF Route Import Extended
2717	   Community corresponding to the IP-VRF from which it is exported, and
2718	   a Source AS Extended Community.

2720	   As a result, if S is attached to a MEG, the L3VPN nodes will direct
2721	   their MVPN C-multicast Join routes to that MEG.  Normal MVPN
2722	   procedures will cause the traffic to be delivered to the L3VPN nodes.
2723	   The layer 3 multicast state for (S,G) will have the MVPN tunnel on
2724	   its OIF list.  The IIF will be the IRB interface leading to the BD
2725	   containing S.

2727	   If S is not attached to a MEG, the L3VPN nodes will direct their
2728	   C-multicast Join routes to whichever MEG appears to be on the best
2729	   route to S's subnet.  Upon receiving the C-multicast Join, that MEG
2730	   will originate an EVPN SMET route for (S,G).  As a result, the MEG
2731	   will receive the (S,G) traffic at layer 2 via the OISM procedures.
2732	   The (S,G) traffic will be sent up the appropriate IRB interface, and
2733	   the layer 3 MVPN procedures will ensure that the traffic is delivered
2734	   to the L3VPN nodes that have requested it.  The layer 3 multicast
2735	   state for (S,G) will have the MVPN tunnel in the OIF list, and the
2736	   IIF will be one of the following:

2738	   o  If S belongs to a BD that is attached to the MEG, the IIF will be
2739	      the IRB interface to that BD;

2741	   o  Otherwise the IIF will be the SBD IRB interface.

2743	   Note that this works even if S is attached to a non-OISM PE, per the
2744	   procedures of Section 5.
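
   The IIF choice described above can be sketched as follows.  The
   function name, the mapping from BD to IRB interface, and the
   interface labels are hypothetical and serve only to illustrate the
   rule.

```python
def select_iif(source_bd, local_irb_by_bd, sbd_irb):
    """Pick the IIF of the layer 3 (S,G) state on a MEG.

    local_irb_by_bd maps each BD attached to this MEG to its IRB
    interface.  If the source's BD is not attached, the SBD IRB
    interface is used.
    """
    return local_irb_by_bd.get(source_bd, sbd_irb)


def build_sg_state(source_bd, local_irb_by_bd, sbd_irb, mvpn_tunnel):
    """Return (iif, oifs) for (S,G) traffic requested by L3VPN nodes."""
    iif = select_iif(source_bd, local_irb_by_bd, sbd_irb)
    return iif, {mvpn_tunnel}
```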

2746	6.1.2.2.2.  Any-Source Multicast (ASM) Groups

2748	   Suppose the MEG SBD-DR learns that one of the PEs in its Tenant
2749	   Domain is interested in (*,G) traffic, where G is an Any-Source
2750	   Multicast (ASM) group.  If there are no tenant multicast routers, the
2751	   MEG SBD-DR SHOULD perform the "First Hop Router" (FHR) functionality
2752	   for group G on behalf of the Tenant Domain, as described in
2753	   [RFC7761].  This means that the MEG SBD-DR must know the identity of
2754	   the Rendezvous Point (RP) for each group, must send Register messages
2755	   to the Rendezvous Point, etc.

2757	   If the MEG SBD-DR is to be the FHR for the Tenant Domain, it must see
2758	   all the multicast traffic that is sourced from within the domain and
2759	   destined to an ASM group address.  The MEG can ensure this by
2760	   originating an SBD-SMET route for (*,*).

2762	   (As a possible optimization, an SBD-SMET route for (*, "any ASM
2763	   group") may be defined in a future revision of this draft.)

2765	   In some deployment scenarios, it may be preferred that the MEG that
2766	   receives the (S,G) traffic over an AC be the one that provides the
2767	   FHR functionality.  This behavior is OPTIONAL.  If this option is
2768	   used, it MUST be ensured that the MEG DR does not provide the FHR
2769	   functionality for (S,G) traffic whose source is attached to another
2770	   MEG; FHR functionality for (S,G) traffic from a particular source S
2771	   MUST be provided by only a single router.

2773	   Other deployment scenarios are also possible.  For example, one might
2774	   want to configure the MEGs to themselves be RPs.  In this case, the
2775	   RPs would have to exchange with each other information about which
2776	   sources are active.  The method of exchanging such information is
2777	   outside the scope of this document.

2779	6.1.2.2.3.  Source on Multihomed Segment

2781	   Suppose S is attached to a segment that is all-active multi-homed to
2782	   PE1 and PE2.  If S is transmitting to two groups, say G1 and G2, it
2783	   is possible that PE1 will receive the (S,G1) traffic from S while PE2
2784	   receives the (S,G2) traffic from S.

2786	   This creates an issue for MVPN/EVPN interworking, because there is no
2787	   way to cause L3VPN/MVPN nodes to select PE1 as the ingress PE for
2788	   (S,G1) traffic while selecting PE2 as the ingress PE for (S,G2)
2789	   traffic.

2791	   However, the following procedure ensures that the IP multicast
2792	   traffic will still flow, even if the L3VPN/MVPN nodes pick the
2793	   "wrong" EVPN-PE as the Upstream PE for (say) the (S,G1) traffic.

2795	   Suppose S is on an ethernet segment, belonging to BD1, that is
2796	   multi-homed to both PE1 and PE2, where PE1 is a MEG.  And suppose
2797	   that IP multicast traffic from S to G travels over the AC that
2798	   attaches the segment to PE2.  If PE1 receives a C-multicast Source
2799	   Tree Join (S,G) route, it MUST originate an SMET route for (S,G).
2800	   Normal OISM procedures will then cause PE2 to send the (S,G) traffic
2801	   to PE1 on an EVPN IP multicast tunnel.  Normal OISM procedures will
2802	   also cause PE1 to send the (S,G) traffic up its BD1 IRB interface.
2803	   Normal MVPN procedures will then cause PE1 to forward the traffic on
2804	   an MVPN tunnel.  In this case, the routing is not optimal, but the
2805	   traffic does flow correctly.

2807	6.1.2.3.  Obtaining Optimal Routing of Traffic Between MVPN and EVPN

2809	   The routing of IP multicast traffic between MVPN nodes and EVPN nodes
2810	   will be optimal as long as there is a MEG along the optimal route.
2811	   There are various deployment strategies that can be used to obtain
2812	   optimal routing between MVPN and EVPN.

2814	   In one such scenario, a Tenant Domain will have a small number of
2815	   strategically placed MEGs.  For example, a Data Center may have a
2816	   small number of MEGs that connect it to a wide-area network.  Then
2817	   the optimal route into or out of the Data Center would be through the
2818	   MEGs.

2820	   In this scenario, the MEGs do not need to originate VPN-IP host
2821	   routes for the multicast sources; they only need to originate VPN-IP
2822	   subnet routes.  The internal structure of the EVPN is completely
2823	   hidden from the MVPN node.  EVPN actions such as MAC Mobility and
2824	   Mass Withdrawal ([RFC7432]) have zero impact on the MVPN control
2825	   plane.

2827	   While this deployment scenario provides the most optimal routing and
2828	   has the least impact on the installed base of MVPN nodes, it does
2829	   complicate network planning considerations.

2831	   Another way of providing routing that is close to optimal is to turn
2832	   each EVPN PE into a MEG.  Then routing of MVPN-to-EVPN traffic is
2833	   optimal.  However, routing of EVPN-to-MVPN traffic is not guaranteed
2834	   to be optimal when a source host is on a multi-homed ethernet segment
2835	   (as discussed in Section 6.1.2.2.3).

2837	   The obvious disadvantage of this method is that it requires every
2838	   EVPN PE to be a MEG.

2840	   The procedures specified in this document allow an operator to add
2841	   MEG functionality to any subset of its EVPN OISM PEs.  This allows an
2842	   operator to make whatever trade-offs it deems appropriate between
2843	   optimal routing and MEG deployment.

2845	6.1.2.4.  Selecting the MEG SBD-DR

2847	   Every PE that is eligible for selection as the MEG SBD-DR originates
2848	   an SBD-IMET route.  As stated in Section 6.1.2, these SBD-IMET routes
2849	   carry a Multicast Flags EC with the MEG Flag set.

2851	   These SBD-IMET routes SHOULD also carry a DF Election EC.  The DF
2852	   Election EC and its use are specified in [RFC8584].  When the route
2853	   is originated, the AC-DF bit in the DF Election EC SHOULD be set to
2854	   zero.  This bit is not used when selecting a MEG SBD-DR, i.e., it
2855	   MUST be ignored by the receiver of an SBD-IMET route.

2857	   In the context of a given Tenant Domain, to select the MEG SBD-DR,
2858	   the MEGs of the Tenant Domain perform the following procedure:

2860	   o  From the set of received SBD-IMET routes for the given tenant
2861	      domain, determine the candidate set of PEs that support MEG
2862	      functionality for that domain.

2864	   o  Select a DF Election algorithm as specified in [RFC8584].  Some of
2865	      the possible algorithms can be found, e.g., in [RFC7432],
2866	      [RFC8584], and [I-D.ietf-bess-evpn-pref-df].

2868	   o  Apply the DF Election Algorithm (see [RFC8584]) to the candidate
2869	      set of PEs.  The "winner" becomes the MEG SBD-DR.

2871	   Note that if a given PE supports IPMG (Section 5) or PEG
2872	   (Section 6.1.4) functionality as well as MEG functionality, its
2873	   SBD-IMET routes carry only one DF Election EC.
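
   The selection procedure above can be sketched as follows, using the
   default modulo-based algorithm of [RFC7432] as the DF Election
   algorithm.  This is an illustration only: the dictionary layout of
   the routes and the function names are hypothetical, and the use of
   an Ethernet Tag value associated with the SBD as the input to the
   modulo operation is an assumption of the sketch.  All originator
   addresses are assumed to be of the same address family.

```python
from ipaddress import ip_address


def candidate_megs(sbd_imet_routes):
    """Originators of SBD-IMET routes carrying the MEG flag, ascending."""
    originators = {r["originator"] for r in sbd_imet_routes if r["meg_flag"]}
    return sorted(originators, key=ip_address)


def select_meg_sbd_dr(sbd_imet_routes, ethernet_tag):
    """Apply the modulo rule: the PE with ordinal (V mod N) is the winner."""
    candidates = candidate_megs(sbd_imet_routes)
    if not candidates:
        return None
    return candidates[ethernet_tag % len(candidates)]
```

   Because every MEG runs the same computation on the same set of
   imported SBD-IMET routes, all MEGs are expected to arrive at the
   same winner.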

2875	6.1.3.  Interworking with 'Global Table Multicast'

2877	   If multicast service to the outside sources and/or receivers is
2878	   provided via the BGP-based "Global Table Multicast" (GTM) procedures
2879	   of [RFC7716], the procedures of Section 6.1.2 can easily be adapted
2880	   for EVPN/GTM interworking.  The way to adapt the MVPN procedures to
2881	   GTM is explained in [RFC7716].

2883	6.1.4.  Interworking with PIM

2885	   As we have been discussing, there may be receivers in an EVPN tenant
2886	   domain that are interested in multicast flows whose sources are
2887	   outside the EVPN Tenant Domain.  Or there may be receivers outside an
2888	   EVPN Tenant Domain that are interested in multicast flows whose
2889	   sources are inside the Tenant Domain.

2891	   If the outside sources and/or receivers are part of an MVPN,
2892	   interworking procedures are covered in Section 6.1.2.

2894	   There are also cases where an external source or receiver is
2895	   attached via IP, and the layer 3 multicast routing is done via PIM.
2896	   In this case, the interworking between the "PIM domain" and the EVPN
2897	   tenant domain is done at L3 Gateways that perform "PIM/EVPN Gateway"
2898	   (PEG) functionality.  A PEG is very similar to a MEG, except that its
2899	   layer 3 multicast routing is done via PIM rather than via BGP.

2901	   If external sources or receivers for a given group are attached to a
2902	   PEG via a layer 3 interface, that interface should be treated as a
2903	   VRF interface attached to the Tenant Domain's L3VPN VRF.  The layer 3
2904	   multicast routing instance for that Tenant Domain will either run PIM
2905	   on the VRF interface or will listen for IGMP/MLD messages on that
2906	   interface.  If the external receiver is attached elsewhere on an IP
2907	   network, the PE has to enable PIM on its interfaces to the backbone
2908	   network.  In both cases, the PE needs to perform PEG functionality,
2909	   and its IMET routes must carry the Multicast Flags EC with the PEG
2910	   flag set.

2912	   For each BD on which there is a multicast source or receiver, one of
2913	   the PEGs will become the PEG DR.  DR selection can be done using the
2914	   same procedures specified in Section 6.1.2.4, except with "PEG"
2915	   substituted for "MEG".

2917	   As long as there are no tenant multicast routers within the EVPN
2918	   Tenant Domain, the PEGs do not need to run PIM on their IRB
2919	   interfaces.

2921	6.1.4.1.  Source Inside EVPN Domain

2923	   If a PEG receives a PIM Join(S,G) from outside the EVPN tenant
2924	   domain, it may find it necessary to create (S,G) state.  The PE needs
2925	   to determine whether S is within the Tenant Domain.  If S is not
2926	   within the EVPN Tenant Domain, the PE carries out normal layer 3
2927	   multicast routing procedures.  If S is within the EVPN tenant domain,
2928	   the IIF of the (S,G) state is set as follows:

2930	   o  if S is on a BD that is attached to the PE, the IIF is the PE's
2931	      IRB interface to that BD;

2933	   o  if S is not on a BD that is attached to the PE, the IIF is the
2934	      PE's IRB interface to the SBD.

2936	   When the PE creates such an (S,G) state, it MUST originate (if it
2937	   hasn't already) an SBD-SMET route for (S,G).  This will cause it to
2938	   pull the (S,G) traffic via layer 2.  When the traffic arrives over an
2939	   EVPN tunnel, it gets sent up an IRB interface where the layer 3
2940	   multicast routing determines the packet's disposition.  The SBD-SMET
2941	   route is withdrawn when the (S,G) state no longer exists (unless
2942	   there is some other reason for not withdrawing it).
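
The IIF rules and the SBD-SMET origination described above can be sketched as follows.  The `PegState` structure, the `irb-<BD>` interface names, and the source-to-BD map are hypothetical; a real PEG would derive this information from its EVPN and IP routing state.

```python
from dataclasses import dataclass, field


@dataclass
class PegState:
    """Hypothetical view of what a PEG knows about the Tenant Domain."""
    local_bds: set                                   # BDs attached to this PE
    source_bd: dict = field(default_factory=dict)    # source IP -> BD (in-domain sources only)
    sbd_smets: set = field(default_factory=set)      # (S,G) with an originated SBD-SMET


def handle_external_join(peg, s, g):
    """Return the IIF for (S,G), or None if S is outside the Tenant Domain."""
    bd = peg.source_bd.get(s)
    if bd is None:
        # S is not within the Tenant Domain: normal layer 3
        # multicast routing procedures apply.
        return None
    # S on a locally attached BD: IRB interface to that BD.
    # Otherwise: IRB interface to the SBD.
    iif = f"irb-{bd}" if bd in peg.local_bds else "irb-SBD"
    # Set semantics give "originate if it hasn't already".
    peg.sbd_smets.add((s, g))
    return iif
```

Withdrawal of the SBD-SMET route when the (S,G) state disappears is not shown.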

2944	   If there are no tenant multicast routers within the EVPN tenant domain,
2945	   there cannot be an RP in the Tenant Domain, so a PEG does not have to
2946	   handle externally arriving PIM Join(*,G) messages.

2948	   The PEG DR for a particular BD MUST act as a First Hop Router for
2949	   that BD.  It will examine all (S,G) traffic on the BD, and whenever G
2950	   is an ASM group, the PEG DR will send Register messages to the RP for
2951	   G.  This means that the PEG DR will need to pull all the (S,G)
2952	   traffic originating on a given BD, by originating an SMET (*,*) route
2953	   for that BD.  If a PEG DR is the DR for all the BDs, it SHOULD
2954	   originate just an SBD-SMET (*,*) route rather than an SMET (*,*)
2955	   route for each BD.
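
The choice between per-BD SMET (*,*) routes and a single SBD-SMET (*,*) route can be sketched as follows.  The function name and the tuple representation of a route are hypothetical, chosen only for illustration.

```python
def wildcard_smets(dr_bds, all_bds):
    """Return the (*,*) routes a PEG should originate, given the BDs for
    which it is the PEG DR and the full set of BDs in the Tenant Domain."""
    dr_bds, all_bds = set(dr_bds), set(all_bds)
    if dr_bds and dr_bds == all_bds:
        # DR for every BD: one SBD-SMET (*,*) route suffices.
        return [("SBD", "*", "*")]
    # Otherwise one SMET (*,*) route per BD for which this PEG is DR.
    return [(bd, "*", "*") for bd in sorted(dr_bds)]
```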

2957	   The rules for exporting IP routes to multicast sources are the same
2958	   as those specified for MEGs in Section 6.1.2.2, except that the
2959	   exported routes will be IP routes rather than VPN-IP routes, and it
2960	   is not necessary to attach the VRF Route Import EC or the Source AS
2961	   EC.

2963	   When a source is on a multi-homed segment, the same issue discussed
2964	   in Section 6.1.2.2.3 exists.  Suppose S is on an ethernet segment,
2965	   belonging to BD1, that is multi-homed to both PE1 and PE2, where PE1
2966	   is a PEG.  And suppose that IP multicast traffic from S to G travels
2967	   over the AC that attaches the segment to PE2.  If PE1 receives an
2968	   external PIM Join (S,G) route, it MUST originate an SMET route for
2969	   (S,G).  Normal OISM procedures will cause PE2 to send the (S,G)
2970	   traffic to PE1 on an EVPN IP multicast tunnel.  Normal OISM
2971	   procedures will also cause PE1 to send the (S,G) traffic up its BD1
2972	   IRB interface.  Normal PIM procedures will then cause PE1 to forward
2973	   the traffic along a PIM tree.  In this case, the routing is not
2974	   optimal, but the traffic does flow correctly.

2976	6.1.4.2.  Source Outside EVPN Domain

2978	   By means of normal OISM procedures, a PEG learns whether there are
2979	   receivers in the Tenant Domain that are interested in receiving (*,G)
2980	   or (S,G) traffic.  The PEG must determine whether S (or the RP for G)
2981	   is outside the EVPN Tenant Domain.  If so, and if there is a receiver
2982	   on BD1 interested in receiving such traffic, the PEG DR for BD1 is
2983	   responsible for originating a PIM Join(S,G) or Join(*,G) control
2984	   message.

2986	   An alternative would be to allow any PEG that is directly attached to
2987	   a receiver to originate the PIM Joins.  Then the PEG DR would only
2988	   have to originate PIM Joins on behalf of receivers that are not
2989	   attached to a PEG.  However, if this is done, it is necessary for the
2990	   PEGs to run PIM on all their IRB interfaces, so that the PIM Assert
2991	   procedures can be used to prevent duplicate delivery to a given BD.

2993	   The IIF for the layer 3 (S,G) or (*,G) state is determined by normal
2994	   PIM procedures.  If a receiver is on BD1, and the PEG DR is attached
2995	   to BD1, its IRB interface to BD1 is added to the OIF list.  This
2996	   ensures that any receivers locally attached to the PEG DR will
2997	   receive the traffic.  If there are receivers attached to other EVPN
2998	   PEs, then whenever (S,G) traffic from an external source matches a
2999	   (*,G) state, the PEG will create (S,G) state.  The IIF will be set to
3000	   whatever external interface the traffic is expected to arrive on
3001	   (copied from the (*,G) state), the OIF list is copied from the (*,G)
3002	   state, and the SBD IRB interface is added to the OIF list.
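
As a minimal sketch of the last step, the (S,G) state can be derived from the matching (*,G) state as shown below.  The `MrouteState` type and the interface names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MrouteState:
    """Hypothetical layer 3 multicast state entry."""
    iif: str
    oifs: frozenset


def sg_state_from_star_g(star_g, sbd_irb="irb-SBD"):
    """Copy the IIF and OIF list from the (*,G) state and add the SBD
    IRB interface, so receivers behind other EVPN PEs get the traffic."""
    return MrouteState(iif=star_g.iif, oifs=star_g.oifs | {sbd_irb})
```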

3004	6.2.  Interworking with PIM via an External PIM Router

3006	   Section 6.1 describes how to use an OISM PE router as the gateway to
3007	   a non-EVPN multicast domain, when the EVPN tenant domain is not being
3008	   used as an intermediate transit network for multicast.  An
3009	   alternative approach is to have one or more external PIM routers
3010	   (perhaps operated by a tenant) on one of the BDs of the tenant
3011	   domain.  We will refer to this BD as the "gateway BD".

3013	   In this model:

3015	   o  The EVPN Tenant Domain is treated as a stub network attached to
3016	      the external PIM routers.

3018	   o  The external PIM routers follow normal PIM procedures, and provide
3019	      the FHR and LHR functionality for the entire Tenant Domain.

3021	   o  The OISM PEs do not run PIM.

3023	   o  There MUST NOT be more than one gateway BD.

3025	   o  If an OISM PE not attached to the gateway BD has interest in a
3026	      given multicast flow, it conveys that interest, following normal
3027	      OISM procedures, by originating an SBD-SMET route for that flow.

3029	   o  If a PE attached to the gateway BD receives an SBD-SMET, it may
3030	      need to generate and transmit a corresponding IGMP/MLD Join out
3031	      one or more of its ACs.  (Procedures for generating an IGMP/MLD
3032	      Join as a result of receiving an SMET route are given in
3033	      [I-D.ietf-bess-evpn-igmp-mld-proxy].)  The PE MUST know which BD
3034	      is the Gateway BD and MUST NOT transmit an IGMP/MLD Join to any
3035	      other BDs.  Furthermore, even if a particular AC is part of that
3036	      BD, the PE SHOULD NOT transmit an IGMP/MLD Join on that AC unless
3037	      an external PIM router is attached via that AC.

3039	      As a result, IGMP/MLD messages will be seen by the external PIM
3040	      routers on the gateway BD, and those external PIM routers will
3041	      send PIM Join messages externally as required.  Traffic of the
3042	      given multicast flow will then be received by one of the external
3043	      PIM routers, and that traffic will be forwarded by that router to
3044	      the gateway BD.

3046	      The normal OISM procedures will then cause the given multicast
3047	      flow to be tunneled to any PEs of the EVPN Tenant Domain that have
3048	      interest in the flow.  PEs attached to the gateway BD will see the
3049	      flow as originating from the gateway BD; other PEs will see the
3050	      flow as originating from the SBD.

3052	   o  An OISM PE attached to a gateway BD MUST set its layer 2 multicast
3053	      state to indicate that each AC to the gateway BD has interest in
3054	      all multicast flows.  It MUST also originate an SMET route for
3055	      (*,*).  The procedures for originating SMET routes are discussed
3056	      in Section 2.5.

3058	      This will cause the OISM PEs attached to the gateway BD to receive
3059	      all the IP multicast traffic that is sourced within the EVPN
3060	      tenant domain, and to transmit that traffic to the gateway BD,
3061	      where the external PIM routers will see it.  This enables the
3062	      external PIM routers to perform FHR functions on behalf of the
3063	      entire Tenant Domain.  (Of course, if the gateway BD has a
3064	      multi-homed segment, only the PE that is the DF for that segment
3065	      will transmit the multicast traffic to the segment.)
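
The restriction on where a PE transmits IGMP/MLD Joins after receiving an SBD-SMET route can be sketched as a simple filter.  The `AttachmentCircuit` record and its `has_external_pim_router` field are hypothetical; how a PE learns that an external PIM router sits behind an AC (configuration, snooping of PIM Hellos, or otherwise) is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttachmentCircuit:
    """Hypothetical description of one AC on a PE."""
    name: str
    bd: str
    has_external_pim_router: bool


def acs_for_proxy_join(acs, gateway_bd):
    """ACs on which an IGMP/MLD Join may be sent: only ACs of the
    gateway BD, and only those with an external PIM router attached."""
    return [
        ac.name
        for ac in acs
        if ac.bd == gateway_bd and ac.has_external_pim_router
    ]
```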

3067	7.  Using an EVPN Tenant Domain as an Intermediate (Transit) Network for
3068	    Multicast traffic

3070	   In this section, we consider the scenario where one or more BDs of an
3071	   EVPN Tenant Domain are being used to carry IP multicast traffic for
3072	   which the source and at least one receiver are not part of the tenant
3073	   domain.  That is, one or more BDs of the Tenant Domain are
3074	   intermediate "links" of a larger multicast tree created by PIM.

3076	   We define a "tenant multicast router" as a multicast router, running
3077	   PIM, that is:

3079	   1.  attached to one or more BDs of the Tenant Domain, but

3081	   2.  not an EVPN PE router.

3083	   In order for an EVPN Tenant Domain to be used as a transit network for
3084	   IP multicast, one or more of its BDs must have tenant multicast routers,
3085	   and an OISM PE attached to such a BD MUST be provisioned to
3086	   enable PIM on its IRB interface to that BD.  (This is true even if
3087	   none of the tenant routers is on a segment attached to the PE.)
3088	   Further, all the OISM PEs (even ones not attached to a BD with tenant
3089	   multicast routers) MUST be provisioned to enable PIM on their SBD IRB
3090	   interfaces.

3092	   If PIM is enabled on a particular BD, the DR Selection procedure of
3093	   Section 6.1.2.4 MUST be replaced by the normal PIM DR Election
3094	   procedure of [RFC7761].  Note that this may result in one of the
3095	   tenant routers being selected as the DR, rather than one of the OISM
3096	   PE routers.  In this case, First Hop Router and Last Hop Router
3097	   functionality will not be performed by any of the EVPN PEs.

3099	   A PIM control message on a particular BD is considered to be a
3100	   link-local multicast message, and as such is sent transparently from
3101	   PE to PE via the BUM tunnel for that BD.  This is true whether the
3102	   control message was received from an AC, or whether it was received
3103	   from the local layer 3 routing instance via an IRB interface.

3105	   A PIM Join/Prune message contains three fields that are relevant to
3106	   the present discussion:

3108	   o  Upstream Neighbor
3109	   o  Group Address (G)

3111	   o  Source Address (S), omitted in the case of (*,G) Join/Prune
3112	      messages.

3114	   We will generally speak of a PIM Join as a "Join(S,G)" or a
3115	   "Join(*,G)" message, and will use the term "Join(X,G)" to mean
3116	   "either Join(S,G) or Join(*,G)".  In the context of a Join(X,G), we
3117	   will use the term "X" to mean "S in the case of (S,G), or G's RP in
3118	   the case of (*,G)".

3120	   Suppose BD1 contains two tenant multicast routers, C1 and C2.
3121	   Suppose C1 is on a segment attached to PE1, and C2 is on a segment
3122	   attached to PE2.  When C1 sends a PIM Join(X,G) to BD1, the Upstream
3123	   Neighbor field might be set to either PE1, PE2, or C2.  C1 chooses
3124	   the Upstream Neighbor based on its unicast routing.  Typically, it
3125	   will choose as the Upstream Neighbor the PIM router on BD1 that is
3126	   "closest" (according to the unicast routing) to X.  Note that this
3127	   will not necessarily be PE1.  PE1 may not even be visible to the
3128	   unicast routing algorithm used by the tenant routers.  Even if it is,
3129	   it is unlikely to be the PIM router that is closest to X.  So we need
3130	   to consider the following two cases:

3132	   1.   C1 sends a PIM Join(X,G) to BD1, with PE1 as the Upstream
3133	        Neighbor.

3135	        PE1's PIM routing instance will see the Join arrive on the BD1
3136	        IRB interface.  If X is not within the Tenant Domain, PE1
3137	        handles the Join according to normal PIM procedures.  This will
3138	        generally result in PE1 selecting an Upstream Neighbor and
3139	        sending it a Join(X,G).

3141	        If X is within the Tenant Domain, but is attached to some other
3142	        PE, PE1 sends (if it hasn't already) an SBD-SMET route for
3143	        (X,G).  The IIF of the layer 3 (X,G) state will be the SBD IRB
3144	        interface, and the OIF list will include the IRB interface to
3145	        BD1.

3147	        The SBD-SMET route will pull the (X,G) traffic to PE1, and the
3148	        (X,G) state will result in the (X,G) traffic being forwarded to
3149	        C1.

3151	        If X is within the Tenant Domain, but is attached to PE1 itself,
3152	        no SBD-SMET route is sent.  The IIF of the layer 3 (X,G) state
3153	        will be the IRB interface to X's BD, and the OIF list will
3154	        include the IRB interface to BD1.

3156	   2.   C1 sends a PIM Join(X,G) to BD1, with either PE2 or C2 as the
3157	        Upstream Neighbor.

3159	        PE1's PIM routing instance will see the Join arrive on the BD1
3160	        IRB interface.  If neither X nor Upstream Neighbor is within the
3161	        tenant domain, PE1 handles the Join according to normal PIM
3162	        procedures.  This will NOT result in PE1 sending a Join(X,G).

3164	        If either X or Upstream Neighbor is within the Tenant Domain,
3165	        PE1 sends (if it hasn't already) an SBD-SMET route for (X,G).
3166	        The IIF of the layer 3 (X,G) state will be the SBD IRB
3167	        interface, and the OIF list will include the IRB interface to
3168	        BD1.

3170	        The SBD-SMET route will pull the (X,G) traffic to PE1, and the
3171	        (X,G) state will result in the (X,G) traffic being forwarded to
3172	        C1.
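
The two cases above can be summarized as a small decision function.  This is only a sketch of the rules as stated in this section: the `JoinOutcome` record, the boolean inputs, and the `irb-<BD>` interface names are hypothetical, and the determination of whether X or the Upstream Neighbor lies within the Tenant Domain is assumed to be done elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JoinOutcome:
    """Hypothetical summary of how PE1 reacts to a Join(X,G) seen on BD1."""
    normal_pim: bool            # handled by normal PIM procedures only
    send_sbd_smet: bool         # originate an SBD-SMET route for (X,G)
    iif: Optional[str] = None   # IIF of the layer 3 (X,G) state
    oif: Optional[str] = None   # interface included in the OIF list


def pe1_on_join(upstream_is_pe1, x_in_domain, upstream_in_domain=False,
                x_bd_if_local=None, arrival_irb="irb-BD1", sbd_irb="irb-SBD"):
    if upstream_is_pe1:
        # Case 1: PE1 is the Upstream Neighbor.
        if not x_in_domain:
            return JoinOutcome(normal_pim=True, send_sbd_smet=False)
        if x_bd_if_local is not None:
            # X is attached to PE1 itself: no SBD-SMET route is sent.
            return JoinOutcome(False, False, f"irb-{x_bd_if_local}", arrival_irb)
        # X is attached to some other PE: pull traffic via the SBD.
        return JoinOutcome(False, True, sbd_irb, arrival_irb)
    # Case 2: PE2 or C2 is the Upstream Neighbor.
    if not x_in_domain and not upstream_in_domain:
        # Normal PIM procedures; PE1 does not send a Join(X,G).
        return JoinOutcome(normal_pim=True, send_sbd_smet=False)
    return JoinOutcome(False, True, sbd_irb, arrival_irb)
```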

3174	8.  IANA Considerations

3176	   IANA is requested to assign new flags in the "Multicast Flags
3177	   Extended Community Flags" registry.  These flags are:

3179	   o  IPMG

3181	   o  MEG

3183	   o  PEG

3185	   o  OISM SBD

3187	   o  OISM-supported

3189	9.  Security Considerations

3191	   This document uses protocols and procedures defined in the normative
3192	   references, and inherits the security considerations of those
3193	   references.

3195	   This document adds flags or Extended Communities (ECs) to a number of
3196	   BGP routes, in order to signal that particular nodes support the
3197	   OISM, IPMG, MEG, and/or PEG functionalities that are defined in this
3198	   document.  Incorrect addition, removal, or modification of those
3199	   flags and/or ECs will cause the procedures defined herein to
3200	   malfunction, in which case loss or diversion of data traffic is
3201	   possible.

3203	10.  Acknowledgements

3205	   The authors thank Vikram Nagarajan and Princy Elizabeth for their
3206	   work on Section 6.2 and Section 3.2.3.1.  The authors also benefited
3207	   tremendously from discussions with Aldrin Isaac on EVPN multicast
3208	   optimizations.

3210	11.  References

3212	11.1.  Normative References

3214	   [I-D.ietf-bess-evpn-bum-procedure-updates]
3215	              Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A.
3216	              Sajassi, "Updates on EVPN BUM Procedures", draft-ietf-
3217	              bess-evpn-bum-procedure-updates-08 (work in progress),
3218	              November 2019.

3220	   [I-D.ietf-bess-evpn-igmp-mld-proxy]
3221	              Sajassi, A., Thoria, S., Mishra, M., Patel, K., Drake, J.,
3222	              and W. Lin, "IGMP and MLD Proxy for EVPN", draft-ietf-
3223	              bess-evpn-igmp-mld-proxy-09 (work in progress), April
3224	              2021.

3226	   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
3227	              Sajassi, A., Salam, S., Thoria, S., Drake, J. E., and J.
3228	              Rabadan, "Integrated Routing and Bridging in EVPN", draft-
3229	              ietf-bess-evpn-inter-subnet-forwarding-13 (work in
3230	              progress), February 2021.

3232	   [I-D.ietf-bess-evpn-optimized-ir]
3233	              Rabadan, J., Sathappan, S., Lin, W., Katiyar, M., and A.
3234	              Sajassi, "Optimized Ingress Replication solution for
3235	              EVPN", draft-ietf-bess-evpn-optimized-ir-07 (work in
3236	              progress), July 2020.

3238	   [I-D.ietf-bess-evpn-prefix-advertisement]
3239	              Rabadan, J., Henderickx, W., Drake, J. E., Lin, W., and A.
3240	              Sajassi, "IP Prefix Advertisement in EVPN", draft-ietf-
3241	              bess-evpn-prefix-advertisement-11 (work in progress), May
3242	              2018.

3244	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
3245	              Requirement Levels", BCP 14, RFC 2119,
3246	              DOI 10.17487/RFC2119, March 1997,
3247	              <https://www.rfc-editor.org/info/rfc2119>.

3249	   [RFC2236]  Fenner, W., "Internet Group Management Protocol, Version
3250	              2", RFC 2236, DOI 10.17487/RFC2236, November 1997,
3251	              <https://www.rfc-editor.org/info/rfc2236>.

3253	   [RFC2710]  Deering, S., Fenner, W., and B. Haberman, "Multicast
3254	              Listener Discovery (MLD) for IPv6", RFC 2710,
3255	              DOI 10.17487/RFC2710, October 1999,
3256	              <https://www.rfc-editor.org/info/rfc2710>.

3258	   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
3259	              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
3260	              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001,
3261	              <https://www.rfc-editor.org/info/rfc3032>.

3263	   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
3264	              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
3265	              February 2006, <https://www.rfc-editor.org/info/rfc4360>.

3267	   [RFC6625]  Rosen, E., Ed., Rekhter, Y., Ed., Hendrickx, W., and R.
3268	              Qiu, "Wildcards in Multicast VPN Auto-Discovery Routes",
3269	              RFC 6625, DOI 10.17487/RFC6625, May 2012,
3270	              <https://www.rfc-editor.org/info/rfc6625>.

3272	   [RFC7153]  Rosen, E. and Y. Rekhter, "IANA Registries for BGP
3273	              Extended Communities", RFC 7153, DOI 10.17487/RFC7153,
3274	              March 2014, <https://www.rfc-editor.org/info/rfc7153>.

3276	   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
3277	              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
3278	              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
3279	              2015, <https://www.rfc-editor.org/info/rfc7432>.

3281	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
3282	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
3283	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

3285	   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
3286	              J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
3287	              VPN Designated Forwarder Election Extensibility",
3288	              RFC 8584, DOI 10.17487/RFC8584, April 2019,
3289	              <https://www.rfc-editor.org/info/rfc8584>.

3291	11.2.  Informative References

3293	   [I-D.ietf-bess-evpn-pref-df]
3294	              Rabadan, J., Sathappan, S., Przygienda, T., Lin, W.,
3295	              Drake, J., Sajassi, A., and S. Mohanty, "Preference-based
3296	              EVPN DF Election", draft-ietf-bess-evpn-pref-df-07 (work
3297	              in progress), March 2021.

3299	   [I-D.ietf-bier-evpn]
3300	              Zhang, Z., Przygienda, A., Sajassi, A., and J. Rabadan,
3301	              "EVPN BUM Using BIER", draft-ietf-bier-evpn-04 (work in
3302	              progress), December 2020.

3304	   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
3305	              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
3306	              2006, <https://www.rfc-editor.org/info/rfc4364>.

3308	   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
3309	              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
3310	              2012, <https://www.rfc-editor.org/info/rfc6513>.

3312	   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
3313	              Encodings and Procedures for Multicast in MPLS/BGP IP
3314	              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012,
3315	              <https://www.rfc-editor.org/info/rfc6514>.

3317	   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
3318	              Patel, "Revised Error Handling for BGP UPDATE Messages",
3319	              RFC 7606, DOI 10.17487/RFC7606, August 2015,
3320	              <https://www.rfc-editor.org/info/rfc7606>.

3322	   [RFC7716]  Zhang, J., Giuliano, L., Rosen, E., Ed., Subramanian, K.,
3323	              and D. Pacella, "Global Table Multicast with BGP Multicast
3324	              VPN (BGP-MVPN) Procedures", RFC 7716,
3325	              DOI 10.17487/RFC7716, December 2015,
3326	              <https://www.rfc-editor.org/info/rfc7716>.

3328	   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
3329	              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
3330	              Multicast - Sparse Mode (PIM-SM): Protocol Specification
3331	              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
3332	              2016, <https://www.rfc-editor.org/info/rfc7761>.

3334	   [RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
3335	              Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
3336	              for Bit Index Explicit Replication (BIER) in MPLS and Non-
3337	              MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January
3338	              2018, <https://www.rfc-editor.org/info/rfc8296>.

3340	Appendix A.  Integrated Routing and Bridging

3342	   This Appendix provides a short tutorial on the interaction of routing
3343	   and bridging.  First it shows the traditional model, where bridging
3344	   and routing are performed in separate boxes.  Then it shows the model
3345	   specified in [I-D.ietf-bess-evpn-inter-subnet-forwarding], where a
3346	   single box contains both routing and bridging functions.  The latter
3347	   model is presupposed in the body of this document.

3349	   Figure 1 shows a "traditional" router that only does routing and has
3350	   no L2 bridging capabilities.  There are two LANs, LAN1 and LAN2.
3351	   LAN1 is realized by switch1, LAN2 by switch2.  The router has an
3352	   interface, "lan1" that attaches to LAN1 (via switch1) and an
3353	   interface "lan2" that attaches to LAN2 (via switch2).  Each interface
3354	   is configured, as an IP interface, with an IP address and a subnet
3355	   mask.

3357	               +-------+        +--------+        +-------+
3358	               |       |    lan1|        |lan2    |       |
3359	       H1 -----+Switch1+--------+ Router1+--------+Switch2+------H3
3360	               |       |        |        |        |       |
3361	       H2 -----|       |        |        |        |       |
3362	               +-------+        +--------+        +-------+
3363	           |_________________|              |__________________|
3364	               LAN1                              LAN2

3366	             Figure 1: Conventional Router with LAN Interfaces

3368	   IP traffic (unicast or multicast) that remains within a single subnet
3369	   never reaches the router.  For instance, if H1 emits an ethernet
3370	   frame with H2's MAC address in the ethernet destination address
3371	   field, the frame will go from H1 to Switch1 to H2, without ever
3372	   reaching the router.  Since the frame is never seen by a router, the
3373	   IP datagram within the frame remains entirely unchanged; e.g., its
3374	   TTL is not decremented.  The ethernet Source and Destination MAC
3375	   addresses are not changed either.

3377	   If H1 wants to send a unicast IP datagram to H3, which is on a
3378	   different subnet, H1 has to be configured with the IP address of a
3379	   "default router".  Let's assume that H1 is configured with an IP
3380	   address of Router1 as its default router address.  H1 compares H3's
3381	   IP address with its own IP address and IP subnet mask, and determines
3382	   that H3 is on a different subnet.  So the packet has to be routed.
3383	   H1 uses ARP to map Router1's IP address to a MAC address on LAN1.  H1
3384	   then encapsulates the datagram in an ethernet frame, using Router1's
3385	   MAC address as the destination MAC address, and sends the frame to
3386	   Router1.
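
The on-link test that H1 performs can be illustrated with Python's standard `ipaddress` module.  The function name and the addresses in the usage note are invented for the example.

```python
import ipaddress


def needs_routing(own_ip, prefix_len, dst_ip):
    """True if dst_ip lies outside the sender's own subnet, in which case
    the frame is addressed to the default router's MAC address."""
    subnet = ipaddress.ip_interface(f"{own_ip}/{prefix_len}").network
    return ipaddress.ip_address(dst_ip) not in subnet
```

For instance, with H1 at 10.0.1.10/24, a destination of 10.0.1.20 is on-link, while 10.0.2.30 must be sent via the default router.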

3388	   Router1 then receives the frame over its lan1 interface.  Router1
3389	   sees that the frame is addressed to it, so it removes the ethernet
3390	   encapsulation and processes the IP datagram.  The datagram is not
3391	   addressed to Router1, so it must be forwarded further.  Router1 does
3392	   a lookup of the datagram's IP destination field, and determines that
3393	   the destination (H3) can be reached via Router1's lan2 interface.
3394	   Router1 now performs the IP processing of the datagram: it decrements
3395	   the IP TTL, adjusts the IP header checksum (if present), may fragment
3396	   the packet if necessary, etc.  Then the datagram (or its fragments)
3397	   are encapsulated in an ethernet header, with Router1's MAC address on
3398	   LAN2 as the MAC Source Address, and H3's MAC address on LAN2 (which
3399	   Router1 determines via ARP) as the MAC Destination Address.  Finally
3400	   the packet is sent out the lan2 interface.

3402	   If H1 has an IP multicast datagram to send (i.e., an IP datagram
3403	   whose Destination Address field is an IP Multicast Address), it
3404	   encapsulates it in an ethernet frame whose MAC Destination Address is
3405	   computed from the IP Destination Address.
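
This document does not spell out the computation.  The sketch below uses the standard mappings, which are not restated anywhere in this text: for IPv4, the prefix 01:00:5e followed by the low-order 23 bits of the group address; for IPv6, the prefix 33:33 followed by the low-order 32 bits.  The function name is hypothetical.

```python
import ipaddress


def multicast_mac(group):
    """Map an IP multicast group address to its ethernet multicast MAC."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if addr.version == 4:
        low = int(addr) & 0x7FFFFF            # low-order 23 bits
        octets = [0x01, 0x00, 0x5E,
                  (low >> 16) & 0x7F, (low >> 8) & 0xFF, low & 0xFF]
    else:
        low = int(addr) & 0xFFFFFFFF          # low-order 32 bits
        octets = [0x33, 0x33] + list(low.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)
```

Because only 23 bits of an IPv4 group address are carried over, several groups share one MAC address; 224.0.0.1 and 224.128.0.1 both map to 01:00:5e:00:00:01.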

3407	   If H2 is a receiver for that multicast address, H2 will receive a
3408	   copy of the frame, unchanged, from H1.  The MAC Source Address in the
3409	   ethernet encapsulation does not change, the IP TTL field does not get
3410	   decremented, etc.

3412	   If H3 is a receiver for that multicast address, the datagram must be
3413	   routed to H3.  In order for this to happen, Router1 must be
3414	   configured as a multicast router, and it must accept traffic sent to
3415	   ethernet multicast addresses.  Router1 will receive H1's multicast
3416	   frame on its lan1 interface, will remove the ethernet encapsulation,
3417	   and will determine how to dispatch the IP datagram based on Router1's
3418	   multicast forwarding states.  If Router1 knows that there is a
3419	   receiver for the multicast datagram on LAN2, it makes a copy of the
3420	   datagram, decrements the TTL (and performs any other necessary IP
3421	   processing), then encapsulates the datagram in an ethernet frame for
3422	   LAN2.  The MAC Source Address for this frame will be Router1's MAC
3423	   Source Address on LAN2.  The MAC Destination Address is computed from
3424	   the IP Destination Address.  Finally, the frame is sent out Router1's
3425	   LAN2 interface.

3427	   Figure 2 shows an Integrated Router/Bridge that supports the routing/
3428	   bridging integration model of
3429	   [I-D.ietf-bess-evpn-inter-subnet-forwarding].

3431	                +------------------------------------------+
3432	                |         Integrated Router/Bridge         |

3434	                +-------+        +--------+        +-------+
3435	                |       |    IRB1|   L3   |IRB2    |       |
3436	        H1 -----+  BD1  +--------+Routing +--------+  BD2  +------H3
3437	                |       |        |Instance|        |       |
3438	        H2 -----|       |        |        |        |       |
3439	                +-------+        +--------+        +-------+
3440	           |___________________|            |____________________|
3441	                      LAN1                              LAN2

3443	                    Figure 2: Integrated Router/Bridge

3445	   In Figure 2, a single box consists of one or more "L3 Routing
3446	   Instances".  The routing/forwarding tables of a given routing
3447	   instance are known as an IP-VRF
3448	   ([I-D.ietf-bess-evpn-inter-subnet-forwarding]).  In the context of
3449	   EVPN, it is convenient to think of each routing instance as
3450	   representing the routing of a particular tenant.  Each IP-VRF is
3451	   attached to one or more interfaces.

3453	   When several EVPN PEs have a routing instance of the same tenant
3454	   domain, those PEs advertise IP routes to the attached hosts.  This is
3455	   done as specified in [I-D.ietf-bess-evpn-inter-subnet-forwarding].

3457	   The integrated router/bridge shown in Figure 2 also attaches to a
3458	   number of "Broadcast Domains" (BDs).  Each BD performs the functions
3459	   that are performed by the bridges in Figure 1.  To the L3 routing
3460	   instance, each BD appears to be a LAN.  The interface attaching a
3461	   particular BD to a particular IP-VRF is known as an "IRB Interface".
3462	   From the perspective of L3 routing, each BD is a subnet.  Thus each
3463	   IRB interface is configured with a MAC address (which is the router's
3464	   MAC address on the corresponding LAN), as well as an IP address and
3465	   subnet mask.

3467	   The integrated router/bridge shown in Figure 2 may have multiple ACs
3468	   to each BD.  These ACs are visible only to the bridging function, not
3469	   to the routing instance.  To the L3 routing instance, there is just
3470	   one "interface" to each BD.

3472	   If the L3 routing instance represents the IP routing of a particular
3473	   tenant, the BDs attached to that routing instance are BDs belonging
3474	   to that same tenant.

3476	   Bridging and routing now proceed exactly as in the case of Figure 1,
3477	   except that BD1 replaces Switch1, BD2 replaces Switch2, interface
3478	   IRB1 replaces interface lan1, and interface IRB2 replaces interface
3479	   lan2.

3481	   It is important to understand that an IRB interface connects an L3
3482	   routing instance to a BD, NOT to a "MAC-VRF".  (See [RFC7432] for the
3483	   definition of "MAC-VRF".)  A MAC-VRF may contain several BDs, as long
3484	   as no MAC address appears in more than one BD.  From the perspective
3485	   of the L3 routing instance, each individual BD is an individual IP
3486	   subnet; whether each BD has its own MAC-VRF or not is irrelevant to
3487	   the L3 routing instance.
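
   As an illustration only, the constraint that no MAC address appears
   in more than one BD of a MAC-VRF can be checked with a few lines of
   Python.  This is a sketch under stated assumptions: the mapping of
   BD names to MAC address sets, the function name, and the example
   addresses are hypothetical and are not defined by this document or
   by [RFC7432].

```python
from collections import defaultdict


def macs_in_multiple_bds(mac_vrf):
    """Return {mac: sorted BD names} for MACs found in more than one BD.

    mac_vrf is a hypothetical mapping of BD name -> set of MAC addresses.
    """
    seen = defaultdict(set)
    for bd_name, macs in mac_vrf.items():
        for mac in macs:
            seen[mac.lower()].add(bd_name)
    return {mac: sorted(bds) for mac, bds in seen.items() if len(bds) > 1}


# Illustrative MAC-VRFs holding two BDs each (all addresses are made up).
ok_vrf = {"BD1": {"00:00:5e:00:53:01"}, "BD2": {"00:00:5e:00:53:02"}}
bad_vrf = {"BD1": {"00:00:5e:00:53:01"}, "BD2": {"00:00:5e:00:53:01"}}
```

   An empty result means the BDs may share the MAC-VRF; either way,
   the L3 routing instance still treats each BD as its own subnet.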

3489	   Figure 3 illustrates IRB when a pair of BDs (subnets) are attached to
3490	   two different PE routers.  In this example, each BD has two segments,
3491	   and one segment of each BD is attached to one PE router.

3493	                +------------------------------------------+
3494	                |        Integrated Router/Bridges         |

3496	                +-------+        +--------+        +-------+
3497	                |       |    IRB1|        |IRB2    |       |
3498	        H1 -----+  BD1  +--------+   PE1  +--------+  BD2  +------H3
3499	                |(Seg-1)|        |(L3 Rtg)|        |(Seg-1)|
3500	        H2 -----|       |        |        |        |       |
3501	                +-------+        +--------+        +-------+
3502	           |___________________|     |       |____________________|
3503	                      LAN1           |                   LAN2
3504	                                     |
3505	                                     |
3506	                +-------+        +--------+        +-------+
3507	                |       |    IRB1|        |IRB2    |       |
3508	        H4 -----+  BD1  +--------+   PE2  +--------+  BD2  +------H5
3509	                |(Seg-2)|        |(L3 Rtg)|        |(Seg-2)|
3510	                |       |        |        |        |       |
3511	                +-------+        +--------+        +-------+

3513	        Figure 3: Integrated Router/Bridges with Distributed Subnet

3515	   If H1 needs to send an IP packet to H4, it determines from its IP
3516	   address and subnet mask that H4 is on the same subnet as H1.
3517	   Although H1 and H4 are not attached to the same PE router, EVPN
3518	   provides ethernet communication among all hosts that are on the same
3519	   BD.  H1 thus uses ARP to find H4's MAC address, and sends an ethernet
3520	   frame with H4's MAC address in the Destination MAC address field.
3521	   The frame is received at PE1, but since the Destination MAC address
3522	   is not PE1's MAC address, PE1 assumes that the frame is to remain on
3523	   BD1.  Therefore the packet inside the frame is NOT decapsulated, and
3524	   is NOT sent up the IRB interface to PE1's routing instance.  Rather,
3525	   standard EVPN intra-subnet procedures (as detailed in [RFC7432]) are
3526	   used to deliver the frame to PE2, which then sends it to H4.
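
   The two decisions in the preceding paragraph can be sketched in
   Python: the sending host's same-subnet test, and the PE's choice
   between bridging and routing a unicast frame.  This is a minimal
   sketch, not a definitive implementation; the function names and all
   addresses (BD1 as 192.0.2.0/25, BD2 as 192.0.2.128/25, and the MAC
   addresses) are assumptions made for the example.

```python
import ipaddress


def same_subnet(src_if, dst_ip):
    """Host-side test: is dst_ip on the subnet of the sender's interface?"""
    iface = ipaddress.ip_interface(src_if)
    return ipaddress.ip_address(dst_ip) in iface.network


def pe_action(dest_mac, irb_mac):
    """PE-side test: only frames addressed to the IRB MAC go up to L3."""
    return "route" if dest_mac.lower() == irb_mac.lower() else "bridge"


# Made-up addressing for Figure 3.
H1_IF = "192.0.2.10/25"             # H1 on BD1
H4_IP = "192.0.2.20"                # H4 on BD1
H5_IP = "192.0.2.150"               # H5 on BD2
PE1_IRB1_MAC = "00:00:5e:00:53:01"  # PE1's MAC address on BD1
H4_MAC = "00:00:5e:00:53:44"
```

   With these values, same_subnet(H1_IF, H4_IP) is true, so H1 uses ARP
   to resolve H4, and pe_action(H4_MAC, PE1_IRB1_MAC) yields "bridge",
   so the frame stays on BD1.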

3528	   If H1 needs to send an IP packet to H5, it determines from its IP
3529	   address and subnet mask that H5 is NOT on the same subnet as H1.
3530	   Assuming that H1 has been configured with the IP address of PE1 as
3531	   its default router, H1 sends the packet in an ethernet frame with
3532	   PE1's MAC address in its Destination MAC Address field.  PE1 receives
3533	   the frame, and sees that the frame is addressed to it.  PE1 thus
3534	   sends the frame up its IRB1 interface to the L3 routing instance.
3535	   Appropriate IP processing is done (e.g., TTL decrement).  The L3
3536	   routing instance determines that the "next hop" for H5 is PE2, so the
3537	   packet is encapsulated (e.g., in MPLS) and sent across the backbone
3538	   to PE2's routing instance.  PE2 will see that the packet's
3539	   destination, H5, is on BD2 segment-2, and will send the packet down
3540	   its IRB2 interface.  This causes the IP packet to be encapsulated in
3541	   an ethernet frame with PE2's MAC address (on BD2) in the Source
3542	   Address field and H5's MAC address in the Destination Address field.

3544	   Note that if H1 has an IP packet to send to H3, the forwarding of the
3545	   packet is handled entirely within PE1.  PE1's routing instance sees
3546	   the packet arrive on its IRB1 interface, and then transmits the
3547	   packet by sending it down its IRB2 interface.
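
   The routing instance's handling of the H1-to-H5 and H1-to-H3 cases
   can be sketched as a lookup in PE1's IP-VRF.  This is an
   illustration under stated assumptions: the table layout, the
   function name, the action strings, and the addresses are
   hypothetical, and the drop rule for an expiring TTL is ordinary IP
   router behavior rather than something this document specifies.

```python
def route_packet(ip_vrf, dst_ip, ttl):
    """Return (action, target, new_ttl) for a packet sent up an IRB.

    ip_vrf is a hypothetical table: destination IP -> ("irb", name) for
    a locally attached host, or ("remote", pe_name) for a host that is
    reached through another PE.
    """
    if ttl <= 1:
        return ("drop", None, 0)  # TTL would expire at this hop
    kind, target = ip_vrf[dst_ip]
    if kind == "irb":
        return ("send_down_irb", target, ttl - 1)
    return ("encapsulate_to_pe", target, ttl - 1)


# PE1's view in Figure 3 (addresses are made up).
PE1_VRF = {
    "192.0.2.130": ("irb", "IRB2"),    # H3, on PE1's segment of BD2
    "192.0.2.150": ("remote", "PE2"),  # H5, on PE2's segment of BD2
}
```

   A packet for H3 is sent down IRB2 without leaving PE1, while a
   packet for H5 is encapsulated and sent across the backbone to PE2.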

3549	   Often, all the hosts in a particular Tenant Domain will be
3550	   provisioned with the same value of the default router IP address.
3551	   This IP address can be assigned, as an "anycast address", to all the
3552	   EVPN PEs attached to that Tenant Domain.  Thus although all hosts are
3553	   provisioned with the same "default router address", the actual
3554	   default router for a given host will be one of the PEs that is
3555	   attached to the same ethernet segment as the host.  This provisioning
3556	   method ensures that IP packets from a given host are handled by the
3557	   closest EVPN PE that supports IRB.

3559	   In the topology of Figure 3, one could imagine that H1 is configured
3560	   with a default router address that belongs to PE2 but not to PE1.
3561	   Inter-subnet routing would still work, but IP packets from H1 to H3
3562	   would then follow the non-optimal path H1-->PE1-->PE2-->PE1-->H3.
3563	   Sending traffic on this sort of path, where it leaves a router and
3564	   then comes back to the same router, is sometimes known as
3565	   "hairpinning".  Similarly, if PE2 supports IRB but PE1 does not, the
3566	   same non-optimal path from H1 to H3 would have to be followed.  To
3567	   avoid hairpinning, each EVPN PE needs to support IRB.
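
   The effect of the default router choice on the PE-level path can be
   sketched as follows.  This is a toy model for illustration only; the
   function names and the idea of describing a path as a list of PE
   names are assumptions of the example, not part of this document.

```python
def l3_path(src_pe, default_router_pe, dst_pe):
    """PE-level path of an inter-subnet packet.

    src_pe: PE to which the sending host attaches.
    default_router_pe: PE that owns the host's default router address.
    dst_pe: PE to which the destination host attaches.
    """
    path = [src_pe]
    for hop in (default_router_pe, dst_pe):
        if hop != path[-1]:
            path.append(hop)
    return path


def is_hairpin(path):
    """True if some PE appears more than once on the path."""
    return len(set(path)) < len(path)


# H1 to H3 in Figure 3, with the default router on PE2 or on PE1.
via_pe2 = l3_path("PE1", "PE2", "PE1")
via_pe1 = l3_path("PE1", "PE1", "PE1")
```

   Here via_pe2 is the hairpinned path PE1, PE2, PE1, whereas via_pe1
   (the anycast default router case) never leaves PE1.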

3569	   It is worth pointing out the way IRB interfaces interact with
3570	   multicast traffic.  Referring again to Figure 3, suppose PE1 and PE2
3571	   are functioning as IP multicast routers.  Suppose also that H3
3572	   transmits a multicast packet, and both H1 and H4 are interested in
3573	   receiving that packet.  PE1 will receive the packet from H3 via its
3574	   IRB2 interface.  The ethernet encapsulation from BD2 is removed, the
3575	   IP header processing is done, and the packet is then reencapsulated
3576	   for BD1, with PE1's MAC address in the MAC Source Address field.
3577	   Then the packet is sent down the IRB1 interface.  Layer 2 procedures
3578	   (as defined in [RFC7432]) would then be used to deliver a copy of the
3579	   packet locally to H1, and remotely to H4.

3581	   Please be aware that this document modifies the semantics, described
3582	   in the previous paragraph, of sending/receiving multicast traffic on
3583	   an IRB interface.  This is explained in Section 1.5.1 and subsequent
3584	   sections.

3586	Authors' Addresses

3588	   Wen Lin
3589	   Juniper Networks, Inc.
3590	   10 Technology Park Drive
3591	   Westford, Massachusetts  01886
3592	   United States

3594	   EMail: wlin@juniper.net

3596	   Zhaohui Zhang
3597	   Juniper Networks, Inc.
3598	   10 Technology Park Drive
3599	   Westford, Massachusetts  01886
3600	   United States

3602	   EMail: zzhang@juniper.net

3604	   John Drake
3605	   Juniper Networks, Inc.
3606	   1194 N. Mathilda Ave
3607	   Sunnyvale, CA  94089
3608	   United States

3610	   EMail: jdrake@juniper.net

3611	   Eric C. Rosen (editor)
3612	   Juniper Networks, Inc.
3613	   10 Technology Park Drive
3614	   Westford, Massachusetts  01886
3615	   United States

3617	   EMail: erosen52@gmail.com

3619	   Jorge Rabadan
3620	   Nokia
3621	   777 E. Middlefield Road
3622	   Mountain View, CA  94043
3623	   United States

3625	   EMail: jorge.rabadan@nokia.com

3627	   Ali Sajassi
3628	   Cisco Systems
3629	   170 West Tasman Drive
3630	   San Jose, CA  95134
3631	   United States

3633	   EMail: sajassi@cisco.com