BESS                                                              W. Lin
Internet-Draft                                                  Z. Zhang
Intended status: Standards Track                                J. Drake
Expires: November 25, 2021                                 E. Rosen, Ed.
                                                  Juniper Networks, Inc.
                                                              J. Rabadan
                                                                   Nokia
                                                              A. Sajassi
                                                           Cisco Systems
                                                            May 24, 2021


        EVPN Optimized Inter-Subnet Multicast (OISM) Forwarding
                   draft-ietf-bess-evpn-irb-mcast-06

Abstract

   Ethernet VPN (EVPN) provides a service that allows a single Local Area Network (LAN), comprising a single IP subnet, to be divided into multiple "segments". Each segment may be located at a different site, and the segments are interconnected by an IP or MPLS backbone. Intra-subnet traffic (either unicast or multicast) always appears to the end users to be bridged, even when it is actually carried over the IP or MPLS backbone. When a single "tenant" owns multiple such LANs, EVPN also allows IP unicast traffic to be routed between those LANs. This document specifies new procedures that allow inter-subnet IP multicast traffic to be routed among the LANs of a given tenant, while still making intra-subnet IP multicast traffic appear to be bridged. These procedures can provide optimal routing of the inter-subnet multicast traffic, and do not require any such traffic to leave a given router and then reenter that same router. These procedures also accommodate IP multicast traffic that needs to travel to or from systems that are outside the EVPN domain.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
   It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 25, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Background
       1.1.1.  Segments, Broadcast Domains, and Tenants
       1.1.2.  Inter-BD (Inter-Subnet) IP Traffic
       1.1.3.  EVPN and IP Multicast
       1.1.4.  BDs, MAC-VRFs, and EVPN Service Models
     1.2.  Need for EVPN-aware Multicast Procedures
     1.3.  Additional Requirements That Must be Met by the Solution
     1.4.  Terminology
     1.5.  Model of Operation: Overview
       1.5.1.  Control Plane
       1.5.2.  Data Plane
   2.  Detailed Model of Operation
     2.1.  Supplementary Broadcast Domain
     2.2.  Detecting When a Route is About/For/From a Particular BD
     2.3.  Use of IRB Interfaces at Ingress PE
     2.4.  Use of IRB Interfaces at an Egress PE
     2.5.  Announcing Interest in (S,G)
     2.6.  Tunneling Frames from Ingress PE to Egress PEs
     2.7.  Advanced Scenarios
   3.  EVPN-aware Multicast Solution Control Plane
     3.1.  Supplementary Broadcast Domain (SBD) and Route Targets
     3.2.  Advertising the Tunnels Used for IP Multicast
       3.2.1.  Constructing Routes for the SBD
       3.2.2.  Ingress Replication
       3.2.3.  Assisted Replication
         3.2.3.1.  Automatic SBD Matching
       3.2.4.  BIER
       3.2.5.  Inclusive P2MP Tunnels
         3.2.5.1.  Using the BUM Tunnels as IP Multicast Inclusive Tunnels
         3.2.5.2.  Using Wildcard S-PMSI A-D Routes to Advertise Inclusive Tunnels Specific to IP Multicast
       3.2.6.  Selective Tunnels
     3.3.  Advertising SMET Routes
   4.  Constructing Multicast Forwarding State
     4.1.  Layer 2 Multicast State
       4.1.1.  Constructing the OIF List
       4.1.2.  Data Plane: Applying the OIF List to an (S,G) Frame
         4.1.2.1.  Eligibility of an AC to Receive a Frame
         4.1.2.2.  Applying the OIF List
     4.2.  Layer 3 Forwarding State
   5.  Interworking with non-OISM EVPN-PEs
     5.1.  IPMG Designated Forwarder
     5.2.  Ingress Replication
       5.2.1.  Ingress PE is non-OISM
       5.2.2.  Ingress PE is OISM
     5.3.  P2MP Tunnels
   6.  Traffic to/from Outside the EVPN Tenant Domain
     6.1.  Layer 3 Interworking via EVPN OISM PEs
       6.1.1.  General Principles
       6.1.2.  Interworking with MVPN
         6.1.2.1.  MVPN Sources with EVPN Receivers
           6.1.2.1.1.  Identifying MVPN Sources
           6.1.2.1.2.  Joining a Flow from an MVPN Source
         6.1.2.2.  EVPN Sources with MVPN Receivers
           6.1.2.2.1.  General procedures
           6.1.2.2.2.  Any-Source Multicast (ASM) Groups
           6.1.2.2.3.  Source on Multihomed Segment
         6.1.2.3.  Obtaining Optimal Routing of Traffic Between MVPN and EVPN
         6.1.2.4.  Selecting the MEG SBD-DR
       6.1.3.  Interworking with 'Global Table Multicast'
       6.1.4.  Interworking with PIM
         6.1.4.1.  Source Inside EVPN Domain
         6.1.4.2.  Source Outside EVPN Domain
     6.2.  Interworking with PIM via an External PIM Router
   7.  Using an EVPN Tenant Domain as an Intermediate (Transit) Network for Multicast traffic
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Integrated Routing and Bridging
   Authors' Addresses

1.  Introduction

1.1.  Background

   Ethernet VPN (EVPN) [RFC7432] provides a Layer 2 VPN (L2VPN) solution that allows an IP backbone provider to offer an ethernet service to a set of customers, known as "tenants".

   This section provides some essential background information on EVPN; additional background can be found in [I-D.ietf-bess-evpn-inter-subnet-forwarding].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.1.1.  Segments, Broadcast Domains, and Tenants

   One of the key concepts of EVPN is the Broadcast Domain (BD). A BD is essentially an emulated ethernet. Each BD belongs to a single tenant. A BD typically consists of multiple ethernet "segments", and each segment may be attached to a different EVPN Provider Edge (EVPN-PE) router. EVPN-PE routers are often referred to as "Network Virtualization Endpoints" or NVEs. However, this document will use the term "EVPN-PE", or, when the context is clear, just "PE".

   In this document, we use the term "segment" to mean the same as "Ethernet Segment" or "ES" in [RFC7432].

   Attached to each segment are "Tenant Systems" (TSes). A TS may be any type of system, physical or virtual, host or router, etc., that can attach to an ethernet.

   When two TSes are on the same segment, traffic between them does not pass through an EVPN-PE. When two TSes are on different segments of the same BD, traffic between them does pass through an EVPN-PE.

   When two TSes, say TS1 and TS2, are on the same BD, then:

   o  If TS1 knows the MAC address of TS2, TS1 can send unicast ethernet frames to TS2.
      TS2 will receive the frames unaltered.

   o  If TS1 broadcasts an ethernet frame, TS2 will receive the unaltered frame.

   o  If TS1 multicasts an ethernet frame, TS2 will receive the unaltered frame, as long as TS2 has been provisioned to receive ethernet multicasts.

   When we say that TS2 receives an unaltered frame from TS1, we mean that the frame still contains TS1's MAC address as its MAC source address, and that no alteration of the frame's payload (and consequently, no alteration of the payload's IP header) has been made.

   EVPN allows a single segment to be attached to multiple PE routers. This is known as "EVPN multi-homing". Suppose a given segment is attached to both PE1 and PE2, and suppose PE1 receives a frame from that segment. It may be necessary for PE1 to send the frame over the backbone to PE2. EVPN has procedures to ensure that such a frame cannot be sent by PE2 back to its originating segment. This is particularly important for multicast, because a frame arriving at PE1 from a given segment will already have been seen by all the systems on that segment that need to see it. If the frame were sent back to the originating segment by PE2, receivers on that segment would receive the packet twice. Even worse, the frame might be sent back to PE1, which could cause an infinite loop.

1.1.2.  Inter-BD (Inter-Subnet) IP Traffic

   If a given tenant has multiple BDs, the tenant may wish to allow IP communication among these BDs. Such a set of BDs is known as an "EVPN Tenant Domain" or just a "Tenant Domain".

   If tenant systems TS1 and TS2 are not in the same BD, then they do not receive unaltered ethernet frames from each other. In order for TS1 to send traffic to TS2, TS1 encapsulates an IP datagram inside an ethernet frame, and uses ethernet to send these frames to an IP router.
   The router decapsulates the IP datagram, does the IP processing, and re-encapsulates the datagram for ethernet. The MAC source address field now has the MAC address of the router, not of TS1. The TTL field of the IP datagram should be decremented by exactly 1, even if the frame needs to be sent from one PE to another. The structure of the provider's IP backbone is thus hidden from the tenants.

   EVPN accommodates the need for inter-BD communication within a Tenant Domain by providing an integrated L2/L3 service for unicast IP traffic. EVPN's Integrated Routing and Bridging (IRB) functionality is specified in [I-D.ietf-bess-evpn-inter-subnet-forwarding]. Each BD in a Tenant Domain is assumed to be a single IP subnet, and each IP subnet within a given Tenant Domain is assumed to be a single BD. EVPN's IRB functionality allows IP traffic to travel from one BD to another, and ensures that proper IP processing (e.g., TTL decrement) is done.

   A brief overview of IRB, including the notion of an "IRB interface", can be found in Appendix A. As explained there, an IRB interface is a sort of virtual interface connecting an L3 routing instance to a BD. A BD may have multiple attachment circuits (ACs) to a given PE, where each AC connects to a different ethernet segment of the BD. However, these ACs are not visible to the L3 routing function; from the perspective of an L3 routing instance, a PE has just one interface to each BD, namely, the IRB interface for that BD.

   The "L3 routing instance" depicted in Appendix A is associated with a single Tenant Domain, and may be thought of as an IP-VRF for that Tenant Domain.

1.1.3.  EVPN and IP Multicast

   [I-D.ietf-bess-evpn-inter-subnet-forwarding] and [I-D.ietf-bess-evpn-prefix-advertisement] cover inter-subnet (inter-BD) IP unicast forwarding, but they do not cover inter-subnet IP multicast forwarding.
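   OISM has to preserve, for multicast, the distinction drawn in Section 1.1.2 between bridged (intra-BD) and routed (inter-BD) delivery. The following Python sketch illustrates that distinction under a simplified frame model. The `Frame` class, its fields, and the `deliver()` function are hypothetical and are not part of any EVPN specification. The sketch also omits the header checksum update and the fragmentation that real IP forwarding may perform.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical, simplified ethernet frame carrying an IP datagram."""
    mac_sa: str  # MAC source address
    bd: str      # Broadcast Domain the frame is on
    ip_src: str  # IP Source Address of the payload
    ip_dst: str  # IP Destination Address of the payload
    ttl: int     # IP TTL of the payload


def deliver(frame: Frame, dest_bd: str, router_mac: str) -> Optional[Frame]:
    """Return the frame as a TS on dest_bd would receive it, or None if dropped.

    Same BD (bridged): the frame arrives unaltered, even if it crossed the
    provider backbone from one PE to another.
    Different BD (routed): the MAC SA becomes the router's MAC address and
    the TTL is decremented by exactly 1, however many PEs are traversed.
    """
    if dest_bd == frame.bd:
        return frame  # bridged: no MAC or IP header changes
    if frame.ttl <= 1:
        return None  # decrementing would reach 0, so the datagram is discarded
    return replace(frame, mac_sa=router_mac, bd=dest_bd, ttl=frame.ttl - 1)


if __name__ == "__main__":
    f = Frame("00:00:5e:00:53:01", "BD1", "192.0.2.1", "198.51.100.7", ttl=64)
    print(deliver(f, "BD1", router_mac="00:00:5e:00:53:fe") == f)  # True
    print(deliver(f, "BD2", router_mac="00:00:5e:00:53:fe").ttl)   # 63
```

   The example addresses come from the documentation ranges. The TTL decrement does not depend on how many PEs the frame crosses, which keeps the backbone invisible to the tenant.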
   [RFC7432] covers intra-subnet (intra-BD) ethernet multicast. The intra-subnet ethernet multicast procedures of [RFC7432] are used for ethernet Broadcast traffic, for ethernet unicast traffic whose MAC Destination Address field contains an Unknown address, and for ethernet traffic whose MAC Destination Address field contains an ethernet Multicast MAC address. These three classes of traffic are known collectively as "BUM traffic" (Broadcast/Unknown-Unicast/Multicast), and the procedures for handling BUM traffic are known as "BUM procedures".

   [I-D.ietf-bess-evpn-igmp-mld-proxy] extends the intra-subnet ethernet multicast procedures by adding procedures that are specific to, and optimized for, the use of IP multicast within a subnet. However, that document does not cover inter-subnet IP multicast.

   The purpose of this document is to specify procedures for EVPN that provide optimized IP multicast functionality within an EVPN Tenant Domain. This document also specifies procedures that allow IP multicast packets to be sourced from or destined to systems outside the Tenant Domain. We refer to the entire set of these procedures as "OISM" (Optimized Inter-Subnet Multicast) procedures.

   In order to support the OISM procedures specified in this document, an EVPN-PE MUST also support [I-D.ietf-bess-evpn-inter-subnet-forwarding] and [I-D.ietf-bess-evpn-igmp-mld-proxy]. (However, certain of the procedures in [I-D.ietf-bess-evpn-igmp-mld-proxy] are modified when OISM is supported.)

1.1.4.  BDs, MAC-VRFs, and EVPN Service Models

   [RFC7432] defines the notion of "MAC-VRF". A MAC-VRF contains one or more "Bridge Tables" (see Section 3 of [RFC7432] for a discussion of this terminology), each of which represents a single Broadcast Domain.

   In the IRB model (outlined in Appendix A), an L3 routing instance has one IRB interface per BD, NOT one per MAC-VRF.
   This document does not distinguish between a "Broadcast Domain" and a "Bridge Table", and will use the terms interchangeably (or will use the acronym "BD" to refer to either). The way the BDs are grouped into MAC-VRFs is not relevant to the procedures specified in this document.

   Section 6 of [RFC7432] also defines several different EVPN service models:

   o  In the "vlan-based service", each MAC-VRF contains one "bridge table", where the bridge table corresponds to a particular Virtual LAN (VLAN). (See Section 3 of [RFC7432] for a discussion of this terminology.) Thus each VLAN is treated as a BD.

   o  In the "vlan bundle service", each MAC-VRF contains one bridge table, where the bridge table corresponds to a set of VLANs. Thus a set of VLANs is treated as constituting a single BD.

   o  In the "vlan-aware bundle service", each MAC-VRF may contain multiple bridge tables, where each bridge table corresponds to one BD. If a MAC-VRF contains several bridge tables, then it corresponds to several BDs.

   The procedures of this document are intended to work for all these service models.

1.2.  Need for EVPN-aware Multicast Procedures

   Inter-subnet IP multicast among a set of BDs can be achieved, in a non-optimal manner, without any specific EVPN procedures. For instance, if a particular tenant has n BDs among which it wants to send IP multicast traffic, it can simply attach a conventional multicast router to all n BDs. Or, more generally, as long as each BD has at least one IP multicast router, and the IP multicast routers communicate multicast control information with each other, conventional IP multicast procedures will work normally, and no special EVPN functionality is needed.

   However, that technique does not provide optimal routing for multicast.
   In conventional multicast routing, for a given multicast flow, there is only one multicast router on each BD that is permitted to send traffic of that flow to the BD. If that BD has receivers for a given flow, but the source of the flow is not on that BD, then the flow must pass through that multicast router. This leads to the "hair-pinning" problem described (for unicast) in Appendix A.

   For example, consider an (S,G) flow that is sourced by a TS S and needs to be received by TSes R1 and R2. Suppose S is on a segment of BD1, R1 is on a segment of BD2, but both are attached to PE1. Suppose also that the tenant has a multicast router, attached to a segment of BD1 and to a segment of BD2. However, the segments to which that router is attached are both attached to PE2. Then the flow from S to R1 would have to follow the path S-->PE1-->PE2-->Tenant Multicast Router-->PE2-->PE1-->R1. Obviously, the path S-->PE1-->R1 would be preferred.

   Now consider the second receiver, R2. R2 is attached to a third BD, BD3. However, it is attached to a segment of BD3 that is attached to PE1. Suppose also that the Tenant Multicast Router is attached to a segment of BD3 that attaches to PE2. In this case, the Tenant Multicast Router will make two copies of the packet, one for BD2 and one for BD3. PE2 will send both copies back to PE1. Not only is the routing sub-optimal, but PE2 sends multiple copies of the same packet to PE1. This is a further sub-optimality.

   This is only an example; many more examples of sub-optimal multicast routing can easily be given. To eliminate sub-optimal routing and extra copies, it is necessary to have a multicast solution that is EVPN-aware, and that can use its knowledge of the internal structure of a Tenant Domain to ensure that multicast traffic gets routed optimally.
   The procedures of this document allow us to avoid all such sub-optimalities when routing inter-subnet multicasts within a Tenant Domain.

1.3.  Additional Requirements That Must be Met by the Solution

   In addition to providing optimal routing of multicast flows within a Tenant Domain, the EVPN-aware multicast solution is intended to satisfy the following requirements:

   o  The solution must integrate well with the procedures specified in [I-D.ietf-bess-evpn-igmp-mld-proxy]. That is, an integrated set of procedures must handle both intra-subnet multicast and inter-subnet multicast.

   o  With regard to intra-subnet multicast, the solution MUST maintain the integrity of multicast ethernet service. This means:

      *  If a source and a receiver are on the same subnet, the MAC source address (SA) of the multicast frame sent by the source will not get rewritten.

      *  If a source and a receiver are on the same subnet, no IP processing of the ethernet payload is done. The IP TTL is not decremented, the header checksum is not changed, no fragmentation is done, etc.

   o  On the other hand, if a source and a receiver are on different subnets, the frame received by the receiver will not have the MAC source address of the source, as the frame will appear to have come from a multicast router. Also, proper processing of the IP header is done, e.g., TTL decrement by 1, header checksum modification, possibly fragmentation, etc.

   o  If a Tenant Domain contains several BDs, it MUST be possible for a multicast flow (even when the multicast group address is an "any source multicast" (ASM) address) to have sources in one of those BDs and receivers in one or more of the other BDs, without requiring the presence of any system performing PIM Rendezvous Point (RP) functions ([RFC7761]).
      Multicast throughout a Tenant Domain must not require the tenant systems to be aware of any underlying multicast infrastructure.

   o  Sometimes a MAC address used by one TS on a particular BD is also used by another TS on a different BD. Inter-subnet routing of multicast traffic MUST NOT make any assumptions about the uniqueness of a MAC address across several BDs.

   o  If two EVPN-PEs attached to the same Tenant Domain both support the OISM procedures, each may receive inter-subnet multicasts from the other, even if the egress PE is not attached to any segment of the BD from which the multicast packets are being sourced. It MUST NOT be necessary to provision the egress PE with knowledge of the ingress BD.

   o  There must be a procedure that allows EVPN-PE routers supporting OISM procedures to send/receive multicast traffic to/from EVPN-PE routers that support only [RFC7432], but that do not support the OISM procedures or even the procedures of [I-D.ietf-bess-evpn-inter-subnet-forwarding]. However, when interworking with such routers (which we call "non-OISM PE routers"), optimal routing may not be achievable.

   o  It MUST be possible to support scenarios in which multicast flows with sources inside a Tenant Domain have "external" receivers, i.e., receivers that are outside the domain. It must also be possible to support scenarios where multicast flows with external sources (sources outside the Tenant Domain) have receivers inside the domain.

      This presupposes that unicast routes to multicast sources outside the domain can be distributed to EVPN-PEs attached to the domain, and that unicast routes to multicast sources within the domain can be distributed outside the domain.
      Two scenarios are of particular importance: one in which the external sources and/or receivers are reachable via L3VPN/MVPN, and one in which they are reachable via IP/PIM.

      The solution for external interworking MUST allow for deployment scenarios in which EVPN does not need to export a host route for every multicast source.

   o  The solution for external interworking must not presuppose that the same tunneling technology is used within both the EVPN domain and the external domain. For example, MVPN interworking must be possible when MVPN is using MPLS P2MP tunneling, and EVPN is using Ingress Replication or VXLAN tunneling.

   o  The solution must not be overly dependent on the details of a small set of use cases, but must be adaptable to new use cases as they arise. (That is, the solution must be robust.)

1.4.  Terminology

   In this document we make frequent use of the following terminology:

   o  OISM: Optimized Inter-Subnet Multicast. EVPN-PEs that follow the procedures of this document will be known as "OISM" PEs. EVPN-PEs that do not follow the procedures of this document will be known as "non-OISM" PEs.

   o  IP Multicast Packet: An IP packet whose IP Destination Address field is a multicast address that is not a link-local address. (Link-local addresses are IPv4 addresses in the 224.0.0.0/24 range and IPv6 addresses in the FF02::/16 range.)

   o  IP Multicast Frame: An ethernet frame whose payload is an IP multicast packet (as defined above).

   o  (S,G) Multicast Packet: An IP multicast packet whose IP Source Address field contains S and whose IP Destination Address field contains G.

   o  (S,G) Multicast Frame: An IP multicast frame whose payload contains S in its IP Source Address field and G in its IP Destination Address field.
   o  Broadcast Domain (BD): An emulated ethernet, such that two systems on the same BD will receive each other's link-local broadcasts.

      Note that EVPN supports service models in which a single EVPN Instance (EVI) contains only one BD, and service models in which a single EVI contains multiple BDs. Both types of service model are supported by this draft. In all models, a given BD belongs to only one EVI.

   o  Designated Forwarder (DF): As defined in [RFC7432], an ethernet segment may be multi-homed (attached to more than one PE). An ethernet segment may also contain multiple BDs, of one or more EVIs. For each such EVI, one of the PEs attached to the segment becomes that EVI's DF for that segment. Since a BD may belong to only one EVI, we can speak unambiguously of the BD's DF for a given segment.

      When the text makes it clear that we are speaking in the context of a given BD, we will frequently use the term "a segment's DF" to mean the given BD's DF for that segment.

   o  AC: Attachment Circuit. An AC connects the bridging function of an EVPN-PE to an ethernet segment of a particular BD. ACs are not visible at the router (L3) layer.

      If a given ethernet segment, attached to a given PE, contains n BDs, we will say that the PE has n ACs to that segment.

   o  L3 Gateway: An L3 Gateway is a PE that connects an EVPN Tenant Domain to an external multicast domain by performing both the OISM procedures and the Layer 3 multicast procedures of the external domain.

   o  PEG (PIM/EVPN Gateway): An L3 Gateway that connects an EVPN Tenant Domain to an external multicast domain whose Layer 3 multicast procedures are those of PIM ([RFC7761]).

   o  MEG (MVPN/EVPN Gateway): An L3 Gateway that connects an EVPN Tenant Domain to an external multicast domain whose Layer 3 multicast procedures are those of MVPN ([RFC6513], [RFC6514]).
   o  IPMG (IP Multicast Gateway): A PE that is used for interworking OISM EVPN-PEs with non-OISM EVPN-PEs.

   o  DR (Designated Router): A PE that has special responsibilities for handling multicast on a given BD.

   o  FHR (First Hop Router): The FHR is a PIM router ([RFC7761]) with special responsibilities. It is the first multicast router to see (S,G) packets from source S, and if G is an "Any Source Multicast (ASM)" group, the FHR is responsible for sending PIM Register messages to the PIM Rendezvous Point for group G.

   o  LHR (Last Hop Router): The LHR is a PIM router ([RFC7761]) with special responsibilities. Generally it is attached to a LAN, and it determines whether there are any hosts on the LAN that need to receive a given multicast flow. If so, it creates and sends the PIM Join messages that are necessary to draw the flow.

   o  EC (Extended Community): A BGP Extended Communities attribute ([RFC4360], [RFC7153]) is a BGP path attribute that consists of one or more extended communities.

   o  RT (Route Target): A Route Target is a particular kind of BGP Extended Community. A BGP Extended Community consists of a type field, a sub-type field, and a value field. Certain type/sub-type combinations indicate that a particular Extended Community is an RT. RT1 and RT2 are considered to be the same RT if and only if they have the same type, same sub-type, and same value fields.

   o  Use of the "C-" prefix: In many documents on VPN multicast, the prefix "C-" appears before any address or wildcard that refers to an address or addresses in a tenant's address space, rather than to an address or addresses in the address space of the backbone network. This document omits the "C-" prefix in many cases where it is clear from the context that the reference is to the tenant's address space.
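   The following Python sketch turns two of the definitions above into code. It is an illustration, not a normative encoding. The first function decides whether a packet's IP Destination Address makes it an "IP Multicast Packet". The second decides whether two Route Targets are the same RT by comparing type, sub-type, and value. The sketch assumes the IPv4 link-local multicast range is 224.0.0.0/24 (the Local Network Control Block) and the IPv6 range is ff02::/16. The `ExtendedCommunity` class and both function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

# Assumed link-local multicast ranges (IPv4 Local Network Control Block and
# IPv6 link-local scope); packets to these are not "IP Multicast Packets".
LINK_LOCAL_MCAST = (
    ipaddress.ip_network("224.0.0.0/24"),
    ipaddress.ip_network("ff02::/16"),
)


def is_ip_multicast_packet(ip_dst: str) -> bool:
    """True if ip_dst is a multicast address that is not link-local."""
    addr = ipaddress.ip_address(ip_dst)
    if not addr.is_multicast:
        return False
    # Membership tests across IP versions simply return False.
    return not any(addr in net for net in LINK_LOCAL_MCAST)


@dataclass(frozen=True)
class ExtendedCommunity:
    """Hypothetical model of a BGP Extended Community."""
    ec_type: int   # high-order type octet
    sub_type: int  # sub-type octet
    value: bytes   # remaining six octets


def same_rt(rt1: ExtendedCommunity, rt2: ExtendedCommunity) -> bool:
    """RT1 and RT2 are the same RT iff type, sub-type, and value all match."""
    return (rt1.ec_type, rt1.sub_type, rt1.value) == (
        rt2.ec_type, rt2.sub_type, rt2.value)


if __name__ == "__main__":
    for dst in ("239.1.1.1", "224.0.0.5", "ff02::1", "ff0e::1", "192.0.2.1"):
        print(dst, is_ip_multicast_packet(dst))
    # Type 0x00, sub-type 0x02: a two-octet-AS-specific Route Target (RFC 4360).
    a = ExtendedCommunity(0x00, 0x02, bytes.fromhex("fc0000000064"))
    b = ExtendedCommunity(0x00, 0x02, bytes.fromhex("fc0000000064"))
    print(same_rt(a, b))  # True
```

   same_rt compares all three fields. Two Extended Communities whose value octets are identical are therefore not the same RT if their type or sub-type differs.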
   This document also assumes familiarity with the terminology of [RFC4364], [RFC6514], [RFC7432], [RFC7761], [I-D.ietf-bess-evpn-igmp-mld-proxy], [I-D.ietf-bess-evpn-prefix-advertisement], and [I-D.ietf-bess-evpn-bum-procedure-updates].

1.5.  Model of Operation: Overview

1.5.1.  Control Plane

   In this section, and in the remainder of this document, we assume the reader is familiar with the procedures of IGMP/MLD (see [RFC2236] and [RFC2710]), by which hosts announce their interest in receiving particular multicast flows.

   Consider a Tenant Domain consisting of a set of k BDs: BD1, ..., BDk. To support the OISM procedures, each Tenant Domain must also be associated with a "Supplementary Broadcast Domain" (SBD). An SBD is treated in the control plane as a real BD, but it does not have any ACs. The SBD has several uses; these will be described later in this document (see Section 2.1 and Section 3).

   Each PE that attaches to one or more of the BDs in a given Tenant Domain will be provisioned to recognize that those BDs are part of the same Tenant Domain. Note that a given PE does not need to be configured with all the BDs of a given Tenant Domain. In general, a PE will only be attached to a subset of the BDs in a given Tenant Domain, and will be configured only with that subset of BDs. However, each PE attached to a given Tenant Domain must be configured with the SBD for that Tenant Domain.

   Suppose a particular segment of a particular BD is attached to PE1. [RFC7432] specifies that PE1 must originate an Inclusive Multicast Ethernet Tag (IMET) route for that BD, and that the IMET route must be propagated to all other PEs attached to the same BD.
If the given segment contains a host that has interest in receiving a particular multicast flow, either an (S,G) flow or a (*,G) flow, PE1 will learn of that interest by participating in the IGMP/MLD procedures, as specified in [I-D.ietf-bess-evpn-igmp-mld-proxy]. In this case, we will say that:

o PE1 is interested in receiving the flow;

o The AC attaching the interested host to PE1 is also said to be interested in the flow;

o The BD containing an AC that is interested in a particular flow is also said to be interested in that flow.

Once PE1 determines that it has an AC that is interested in receiving a particular flow or set of flows, it originates one or more Selective Multicast Ethernet Tag (SMET) routes to advertise that interest.

Note that each IMET or SMET route is "for" a particular BD. The notion of a route being "for" a particular BD is explained in Section 2.2.

When OISM is being supported, the procedures of [I-D.ietf-bess-evpn-igmp-mld-proxy] are modified as follows:

o The IMET route originated by a particular PE for a particular BD is distributed to all other PEs attached to the Tenant Domain containing that BD, even to those PEs that are not attached to that particular BD.

o The SMET routes originated by a particular PE are originated on a per-Tenant-Domain basis, rather than on a per-BD basis. That is, the SMET routes are considered to be for the Tenant Domain's SBD, rather than for any of its ordinary BDs. These SMET routes are distributed to all the PEs attached to the Tenant Domain.

In this way, each PE attached to a given Tenant Domain learns, from each other PE attached to the same Tenant Domain, the set of flows that are of interest to each of those other PEs.

An OISM PE that is provisioned with several BDs in the same Tenant Domain MUST originate an IMET route for each such BD.
To indicate its support of [I-D.ietf-bess-evpn-igmp-mld-proxy], it SHOULD attach the EVPN Multicast Flags Extended Community to each such IMET route, but it MUST attach the EC to at least one such IMET route.

Suppose PE1 is provisioned with both BD1 and BD2, and is provisioned to consider them to be part of the same Tenant Domain. It is possible that PE1 will receive from PE2 both an IMET route for BD1 and an IMET route for BD2. If either of these IMET routes has the EVPN Multicast Flags Extended Community, PE1 MUST assume that PE2 is supporting the procedures of [I-D.ietf-bess-evpn-igmp-mld-proxy] for ALL BDs in the Tenant Domain.

If a PE supports OISM functionality, it indicates that by setting the "OISM-supported" flag in the Multicast Flags Extended Community that it attaches to some or all of its IMET routes. An OISM PE SHOULD attach this EC with the OISM-supported flag set to all the IMET routes it originates. However, if PE1 imports IMET routes from PE2, and at least one of PE2's IMET routes indicates that PE2 is an OISM PE, PE1 MUST assume that PE2 is following OISM procedures.

1.5.2. Data Plane

Suppose PE1 has an AC to a segment in BD1, and PE1 receives from that AC an (S,G) multicast frame (as defined in Section 1.4).

There may be other ACs of PE1 on which TSes have indicated an interest (via IGMP/MLD) in receiving (S,G) multicast packets. PE1 is responsible for sending the received multicast packet out those ACs. There are two cases to consider:

o Intra-Subnet Forwarding: In this case, an AC with interest in (S,G) is connected to a segment that is part of the source BD, BD1. If the segment is not multi-homed, or if PE1 is the Designated Forwarder (DF) (see [RFC7432]) for that segment, PE1 sends the multicast frame on that AC without changing the MAC SA. The IP header is not modified at all; in particular, the TTL is not decremented.

o Inter-Subnet Forwarding: An AC with interest in (S,G) is connected to a segment of BD2, where BD2 is different from BD1. If PE1 is the DF for that segment (or if the segment is not multi-homed), PE1 decapsulates the IP multicast packet, performs any necessary IP processing (including TTL decrement), then re-encapsulates the packet appropriately for BD2. PE1 then sends the packet on the AC. Note that after re-encapsulation, the MAC SA will be PE1's MAC address on BD2, and the IP TTL will have been decremented by 1.

In addition, there may be other PEs that are interested in (S,G) traffic. Suppose PE2 is such a PE. Then PE1 tunnels a copy of the IP multicast frame (with its original MAC SA, and with no alteration of the payload's IP header) to PE2. The tunnel encapsulation contains information that PE2 can use to associate the frame with an "apparent source BD". If the actual source BD of the frame is BD1, then:

o If PE2 is attached to BD1, the tunnel encapsulation used to send the frame to PE2 will cause PE2 to identify BD1 as the apparent source BD.

o If PE2 is not attached to BD1, the tunnel encapsulation used to send the frame to PE2 will cause PE2 to identify the SBD as the apparent source BD.

Note that the tunnel encapsulation used for a particular BD will have been advertised in an IMET route or S-PMSI route ([I-D.ietf-bess-evpn-bum-procedure-updates]) for that BD. That route carries a PMSI Tunnel attribute, which specifies how packets originating from that BD are encapsulated. This information enables the PE receiving a tunneled packet to identify the apparent source BD as stated above. See Section 3.2 for more details.

When PE2 receives the tunneled frame, it will forward it on any of its ACs that have interest in (S,G).
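The forwarding decisions described above can be summarized in a short Python sketch. It assumes that a PE is described by the set of BDs it is attached to, its MAC address on each BD, and a list of local ACs with their BD, flow interest, and DF status; these data structures and names are hypothetical illustrations, not part of any EVPN specification or implementation. The same function is meant to apply at the ingress PE (where the apparent source BD is the actual source BD) and at an egress PE. It models the DF check, but it does not model the [RFC7432] split-horizon filtering for multi-homed source segments beyond never sending a frame back out the AC it arrived on.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, List, Set, Tuple

SBD = "SBD"  # hypothetical label for the Tenant Domain's Supplementary BD
Flow = Tuple[str, str]  # (S, G); S == "*" denotes a (*,G) interest


@dataclass(frozen=True)
class Frame:
    mac_sa: str   # Ethernet source MAC address
    ttl: int      # IP TTL of the payload
    sg: Flow


@dataclass(frozen=True)
class AC:
    name: str
    bd: str
    interest: FrozenSet[Flow]
    multihomed: bool = False
    is_df: bool = True


def apparent_source_bd(actual_source_bd: str, attached_bds: Set[str]) -> str:
    # A PE that is not attached to the actual source BD treats the SBD
    # as the apparent source BD.
    return actual_source_bd if actual_source_bd in attached_bds else SBD


def local_forwarding(frame: Frame, apparent_bd: str, acs: List[AC],
                     pe_macs: Dict[str, str], arrival_ac: str = "") -> Dict[str, Frame]:
    """Return the frame this PE sends on each interested local AC."""
    source, group = frame.sg
    out: Dict[str, Frame] = {}
    for ac in acs:
        if ac.name == arrival_ac:
            continue  # never send a frame back out the AC it arrived on
        if (source, group) not in ac.interest and ("*", group) not in ac.interest:
            continue  # AC has no interest in this flow
        if ac.multihomed and not ac.is_df:
            continue  # only the DF sends on a multi-homed segment
        if ac.bd == apparent_bd:
            # Intra-subnet forwarding: bridged, MAC SA and IP header unchanged.
            out[ac.name] = frame
        else:
            # Inter-subnet forwarding: routed; MAC SA becomes this PE's MAC
            # on the AC's BD and the IP TTL is decremented by 1.
            out[ac.name] = replace(frame, mac_sa=pe_macs[ac.bd], ttl=frame.ttl - 1)
    return out
```

At an egress PE that is not attached to BD1, apparent_source_bd returns the SBD; since the SBD has no ACs, every interested AC then receives the routed (TTL-decremented) copy, which matches the rule that such a PE applies inter-subnet forwarding to all of its interested ACs.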
If PE2 determines from the tunnel encapsulation that the apparent source BD is BD1, then:

o For those ACs that connect PE2 to BD1, the intra-subnet forwarding procedure described above is used, except that it is now PE2, not PE1, carrying out that procedure. Unmodified EVPN procedures from [RFC7432] are used to ensure that a packet originating from a multi-homed segment is never sent back to that segment.

o For those ACs that do not connect to BD1, the inter-subnet forwarding procedure described above is used, except that it is now PE2, not PE1, carrying out that procedure.

If the tunnel encapsulation identifies the apparent source BD as the SBD, PE2 applies the inter-subnet forwarding procedures described above to all of its ACs that have interest in the flow.

These procedures ensure that an IP multicast frame travels from its ingress PE to all egress PEs that are interested in receiving it. While in transit, the frame retains its original MAC SA, and the payload of the frame retains its original IP header. Note that in all cases, when an IP multicast packet is sent from one BD to another, these procedures cause its TTL to be decremented by 1.

So far we have assumed that an IP multicast packet arrives at its ingress PE over an AC that belongs to one of the BDs in a given Tenant Domain. However, it is possible for a packet to arrive at its ingress PE in other ways. Since an EVPN-PE supporting IRB has an IP-VRF, it is possible that the IP-VRF will have a "VRF interface" that is not an IRB interface. For example, there might be a VRF interface that is actually a physical link to an external ethernet switch, or to a directly attached host, or to a router. When an EVPN-PE, say PE1, receives a packet through such means, we will say that the packet has an "external" source (i.e., a source "outside the Tenant Domain").
There are also other scenarios in which a multicast packet might have an external source; e.g., it might arrive over an MVPN tunnel from an L3VPN PE. In such cases, we will still refer to PE1 as the "ingress EVPN-PE".

When an EVPN-PE, say PE1, receives an externally sourced multicast packet, and there are receivers for that packet inside the Tenant Domain, it does the following:

o Suppose PE1 has an AC in BD1 that has interest in (S,G). Then PE1 encapsulates the packet for BD1, filling in the MAC SA field with PE1's own MAC address on BD1. It sends the resulting frame on the AC.

o Suppose some other EVPN-PE, say PE2, has interest in (S,G). PE1 encapsulates the packet in an ethernet header, filling in the MAC SA field with PE1's own MAC address on the SBD. PE1 then tunnels the resulting frame to PE2. The tunnel encapsulation will identify the apparent source BD as the SBD. Since the apparent source BD is the SBD, PE2 will know to treat the frame as inter-subnet multicast.

When ingress replication is used to transmit IP multicast frames from an ingress EVPN-PE to a set of egress PEs, the ingress PE has to send multiple copies of the frame. Each copy is the original ethernet frame; decapsulation and IP processing take place only at the egress PE.

If a Point-to-Multipoint (P2MP) tree or BIER ([I-D.ietf-bier-evpn]) is used to transmit an IP multicast frame from an ingress PE to a set of egress PEs, then the ingress PE only has to send one copy of the frame to each of its next hops. Again, each egress PE receives the original frame and does any necessary IP processing.

2. Detailed Model of Operation

The model described in Section 1.5.2 can be expressed more precisely using the notion of "IRB interface" (see Appendix A). For a given Tenant Domain:

o A given PE has one IRB interface for each BD to which it is attached. This IRB interface connects L3 routing to that BD. When IP multicast packets are sent or received on the IRB interfaces, the semantics of the interface are modified from the semantics described in Appendix A. See Section 2.3 for the details of the modification.

o Each PE also has an IRB interface that connects L3 routing to the SBD. The semantics of this interface are different from the semantics of the IRB interfaces to the real BDs. See Section 2.3.

In this section we assume that PIM is not enabled on the IRB interfaces. In general, it is not necessary to enable PIM on the IRB interfaces unless there are PIM routers on one of the Tenant Domain's BDs, or unless there is some other scenario requiring a Tenant Domain's L3 routing instance to become a PIM adjacency of some other system. These cases are discussed in Section 7.

2.1. Supplementary Broadcast Domain

Suppose a given Tenant Domain contains three BDs (BD1, BD2, BD3) and two PEs (PE1, PE2). PE1 attaches to BD1 and BD2, while PE2 attaches to BD2 and BD3.

To carry out the procedures described above, all the PEs attached to the Tenant Domain must be provisioned with the SBD for that Tenant Domain. A Route Target (RT) must be associated with the SBD, and provisioned on each of those PEs. We will refer to that RT as the "SBD-RT".

A Tenant Domain is also configured with an IP-VRF ([I-D.ietf-bess-evpn-inter-subnet-forwarding]), and the IP-VRF is associated with an RT. This RT MAY be the same as the SBD-RT.

Suppose an (S,G) multicast frame originating on BD1 has a receiver on BD3. PE1 will transmit the packet to PE2 as a frame, and the encapsulation will identify the frame's source BD as BD1. Since PE2 is not provisioned with BD1, it will treat the packet as if its source BD were the SBD.
That is, a packet can be transmitted from BD1 to BD3 even though its ingress PE is not configured for BD3, and/or its egress PE is not configured for BD1.

EVPN supports service models in which a given EVPN Instance (EVI) can contain only one BD. It also supports service models in which a given EVI can contain multiple BDs. No matter which service model is being used for a particular tenant, it is highly RECOMMENDED that an EVI containing only the SBD be provisioned for that tenant.

If, for some reason, it is not feasible to provision an EVI that contains only the SBD, it is possible to put the SBD in an EVI that contains other BDs. However, in that case, the SBD-RT MUST be different from the RT associated with any other BD. Otherwise the procedures of this document (as detailed in Sections 2.2 and 3.1) will not produce correct results.

2.2. Detecting When a Route is About/For/From a Particular BD

In this document, we frequently say that a particular multicast route is "about" a particular BD, or is "from" a particular BD, or is "for" a particular BD, or is "related to" a particular BD, or "is associated with" a particular BD. These terms are used interchangeably. Subsequent sections of this document explain when various routes must be originated for particular BDs. In this section, we explain how the PE originating a route marks the route to indicate which BD it is about. We also explain how a PE receiving the route determines which BD the route is about.

In EVPN, each BD is assigned a Route Target (RT). An RT is a BGP extended community that can be attached to the BGP routes used by the EVPN control plane. In some EVPN service models, each BD is assigned a unique RT. In other service models, a set of BDs (all in the same EVI) may be assigned the same RT. The RT that is assigned to the SBD is called the "SBD-RT".
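The provisioning constraint above (the SBD-RT MUST differ from the RT of every other BD, even when the SBD shares an EVI with other BDs) lends itself to a simple configuration-time check. The following Python sketch assumes a hypothetical per-Tenant-Domain configuration that maps each BD name to its RT string; the RT notation used is illustrative only, and the sketch is an illustration of the rule rather than a configuration model from any product.

```python
from typing import Dict, List


def check_sbd_rt(bd_rts: Dict[str, str], sbd: str) -> List[str]:
    """List violations of the rule that the SBD-RT differs from every other BD's RT.

    bd_rts maps each BD of one Tenant Domain, including the SBD, to its
    provisioned RT (hypothetical input format; the RT notation is illustrative).
    """
    if sbd not in bd_rts:
        return [f"SBD {sbd!r} is not provisioned"]
    sbd_rt = bd_rts[sbd]
    # Only reuse of the SBD-RT is a violation; ordinary BDs may share an RT
    # in service models that distinguish them by Tag ID.
    return [f"{bd} reuses the SBD-RT {sbd_rt}"
            for bd, rt in sorted(bd_rts.items())
            if bd != sbd and rt == sbd_rt]
```

Two ordinary BDs that share one RT (as in service models that distinguish BDs by Tag ID) pass this check; only an ordinary BD that reuses the SBD-RT is flagged.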
In those service models that allow a set of BDs to share a single RT, each BD is assigned a non-zero Tag ID. The Tag ID appears in the Network Layer Reachability Information (NLRI) of many of the BGP routes that are used by the EVPN control plane.

A given route may be about the SBD, or about an "ordinary BD" (a BD that is not the SBD). An RT that has been assigned to an ordinary BD will be known as an "ordinary BD-RT".

When constructing an IMET, SMET, S-PMSI or Leaf ([I-D.ietf-bess-evpn-bum-procedure-updates]) route that is about a given BD, the following rules apply:

o If the route is about an ordinary BD, say BD1, then

   * the route MUST carry the ordinary BD-RT associated with BD1, and

   * the route MUST NOT carry any RT that is associated with an ordinary BD other than BD1.

o If the route is about the SBD, the route MUST carry the SBD-RT, and MUST NOT carry any RT that is associated with any other BD.

o As detailed in subsequent sections, under certain circumstances a route that is about BD1 may carry both the RT of BD1 and also the SBD-RT.

The IMET route for the SBD MUST carry a Multicast Flags Extended Community, in which an "OISM SBD" flag is set.

The IMET route for a BD other than the SBD SHOULD carry an EVI-RT EC as defined in [I-D.ietf-bess-evpn-igmp-mld-proxy]. The EC is constructed from the SBD-RT, to indicate the BD's corresponding SBD. This allows all PEs to check that they have consistent SBD provisioning, and allows an AR-REPLICATOR to automatically determine a BD's corresponding SBD without any provisioning, as explained in Section 3.2.3.1.

When receiving an IMET, SMET, S-PMSI or Leaf route, it is necessary for the receiving PE to determine the BD to which the route belongs. This is done by examining the RTs carried by the route, as well as the Tag ID field of the route's NLRI. There are several cases to consider.
Some of these cases are error cases that arise when the route has not been properly constructed.

When one of the error cases is detected, the route MUST be regarded as a malformed route, and the "treat-as-withdraw" procedure of [RFC7606] MUST be applied. Note, though, that these error cases are only detectable by EVPN procedures at the receiving PE; BGP procedures at intermediate nodes will generally not detect the existence of such error cases, and in general SHOULD NOT attempt to do so.

Case 1: The receiving PE recognizes more than one of the route's RTs as being an SBD-RT (i.e., the route carries SBD-RTs of more than one Tenant Domain).

        This is an error case; the route has not been properly constructed.

Case 2: The receiving PE recognizes one of the route's RTs as being associated with an ordinary BD, and recognizes one of the route's other RTs as being associated with a different ordinary BD.

        This is an error case; the route has not been properly constructed.

Case 3: The receiving PE recognizes one of the route's RTs as being associated with an ordinary BD in a particular Tenant Domain, and recognizes another of the route's RTs as being associated with the SBD of a different Tenant Domain.

        This is an error case; the route has not been properly constructed.

Case 4: The receiving PE does not recognize any of the route's RTs as being associated with an ordinary BD in any of its Tenant Domains, but does recognize one of the RTs as the SBD-RT of one of its Tenant Domains.

        In this case, the receiving PE associates the route with the SBD of that Tenant Domain. This association is made even if the Tag ID field of the route's NLRI is not the Tag ID of the SBD.

        This is a normal use case where either (a) the route is for a BD to which the receiving PE is not attached, or (b) the route is for the SBD. In either case, the receiving PE associates the route with the SBD.

Case 5: The receiving PE recognizes exactly one of the RTs as an ordinary BD-RT that is associated with one of the PE's EVIs, say EVI-1. The receiving PE also recognizes one of the RTs as being the SBD-RT of the Tenant Domain containing EVI-1.

        In this case, the route is associated with the BD in EVI-1 that is identified (in the context of EVI-1) by the Tag ID field of the route's NLRI. (If EVI-1 contains only a single BD, the Tag ID is likely to be zero.)

        This is the case where the route is for a BD to which the receiving PE is attached, but the route also carries the SBD-RT. In this case, the receiving PE associates the route with the ordinary BD, not with the SBD.

N.B.: According to the above rules, the mapping from BD to RT is a many-to-one or one-to-one mapping. A route that an EVPN-PE originates for a particular BD carries that BD's RT, and an EVPN-PE that receives the route associates it with a BD as described above. However, RTs are not used only to help identify the BD to which a route belongs; they may also be used by BGP to determine the path along which the route is distributed, and to determine which PEs receive the route. There may be cases where it is desirable to originate a route about a particular BD, but have that route distributed to only some of the EVPN-PEs attached to that BD. Or one might want the route distributed to some intermediate set of systems, where it might be modified or replaced before being propagated further. Such situations are outside the scope of this document.

Additionally, there may be situations where it is desirable to exchange routes among two or more different Tenant Domains ("EVPN Extranet"). Such situations are outside the scope of this document.

2.3. Use of IRB Interfaces at Ingress PE

When an (S,G) multicast frame is received from an AC belonging to a particular BD, say BD1:

1. The frame is sent unchanged to other EVPN-PEs that are interested in (S,G) traffic. The encapsulation used to send the frame to the other EVPN-PEs depends on the tunnel type being used for multicast transmission. (For our purposes, we consider Ingress Replication (IR), Assisted Replication (AR) and BIER to be "tunnel types", even though IR, AR and BIER do not actually use P2MP tunnels.) At the egress PE, the apparent source BD of the frame can be inferred from the tunnel encapsulation. If the egress PE is not attached to the actual source BD, it will infer that the apparent source BD is the SBD.

   Note that the inter-PE transmission of a multicast frame among EVPN-PEs of the same Tenant Domain does NOT involve the IRB interfaces, as long as the multicast frame was received over an AC attached to one of the Tenant Domain's BDs.

2. The frame is also sent up the IRB interface that attaches BD1 to the Tenant Domain's L3 routing instance in this PE. That is, the L3 routing instance, behaving as if it were a multicast router, receives the IP multicast frames that arrive at the PE from its local ACs. The L3 routing instance decapsulates the frame's payload to extract the IP multicast packet, decrements the IP TTL, adjusts the header checksum, and does any other necessary IP processing (e.g., fragmentation).

3. The L3 routing instance keeps track of which BDs have local receivers for (S,G) traffic. (A "local receiver" is a TS, reachable via a local AC, that has expressed interest in (S,G) traffic.) If the L3 routing instance has an IRB interface to BD2, and it knows that BD2 has a LOCAL receiver interested in (S,G) traffic, it encapsulates the packet in an ethernet header for BD2, putting its own MAC address in the MAC SA field. Then it sends the packet down the IRB interface to BD2.

If a packet is sent from the L3 routing instance to a particular BD via the IRB interface (step 3 in the above list), and if the BD in question is NOT the SBD, the packet is sent ONLY to LOCAL ACs of that BD. If the packet needs to go to other PEs, it has already been sent to them in step 1. Note that this is a change in the IRB interface semantics from what is described in [I-D.ietf-bess-evpn-inter-subnet-forwarding] and Figure 2.

If a given locally attached segment is multi-homed, existing EVPN procedures ensure that a packet is not sent by a given PE to that segment unless the PE is the DF for that segment. Those procedures also ensure that a packet is never sent by a PE to its segment of origin. Thus EVPN segment multi-homing is fully supported; duplicate delivery to a segment and looping on a segment are prevented without the need for any new procedures to be defined in this document.

What if an IP multicast packet is received from outside the Tenant Domain? For instance, perhaps PE1's IP-VRF for a particular Tenant Domain also has a physical interface leading to an external switch, host, or router, and PE1 receives an IP multicast packet or frame on that interface. Or perhaps the packet is from an L3VPN, or a different EVPN Tenant Domain.

Such a packet is first processed by the L3 routing instance, which decrements the TTL and does any other necessary IP processing. Then the packet is sent into the Tenant Domain by sending it down the IRB interface to the SBD of that Tenant Domain.
This requires encapsulating the packet in an ethernet header. The MAC SA field will contain the PE's own MAC address on the SBD.

An IP multicast packet sent by the L3 routing instance down the IRB interface to the SBD is treated as if it had arrived from a local AC, and steps 1-3 are applied. Note that the semantics of sending a packet down the IRB interface to the SBD are thus slightly different from the semantics of sending a packet down other IRB interfaces. IP multicast packets sent down the SBD's IRB interface may be distributed to other PEs, but IP multicast packets sent down other IRB interfaces are distributed only to local ACs.

If a PE sends a link-local multicast packet down the SBD IRB interface, that packet will be distributed (as an ethernet frame) to other PEs of the Tenant Domain, but will not appear on any of the actual BDs.

2.4. Use of IRB Interfaces at an Egress PE

Suppose an egress EVPN-PE receives an (S,G) multicast frame from the frame's ingress EVPN-PE. As described above, the packet will arrive as an ethernet frame over a tunnel from the ingress PE, and the tunnel encapsulation will identify the source BD of the ethernet frame.

We define the notion of the frame's "apparent source BD" as follows. If the egress PE is attached to the actual source BD, the actual source BD is the apparent source BD. If the egress PE is not attached to the actual source BD, the SBD is the apparent source BD.

The egress PE now takes the following steps:

1. If the egress PE has ACs belonging to the apparent source BD of the frame, it sends the frame unchanged to any ACs of that BD that have interest in (S,G) packets. The MAC SA of the frame is not modified, and the IP header of the frame's payload is not modified in any way.

2. The frame is also sent to the L3 routing instance by being sent up the IRB interface that attaches the L3 routing instance to the apparent source BD. Steps 2 and 3 of Section 2.3 are then applied.

2.5. Announcing Interest in (S,G)

[I-D.ietf-bess-evpn-igmp-mld-proxy] defines procedures used by an egress PE to announce its interest in a multicast flow or set of flows. If an egress PE determines it has LOCAL receivers in a particular BD, say BD1, that are interested in a particular set of flows, it originates one or more SMET routes for BD1. Each SMET route specifies a particular (S,G) or (*,G) flow. By originating an SMET route for BD1, a PE is announcing "I have receivers for (S,G) or (*,G) in BD1". Such an SMET route carries the Route Target (RT) for BD1, ensuring that it will be distributed to all PEs that are attached to BD1.

The OISM procedures for originating SMET routes differ slightly from those in [I-D.ietf-bess-evpn-igmp-mld-proxy]. In most cases, the SMET routes are considered to be for the SBD, rather than for the BD containing local receivers. These SMET routes carry the SBD-RT, and do not carry any ordinary BD-RT. Details on the processing of SMET routes can be found in Section 3.3.

Since the SMET routes carry the SBD-RT, every ingress PE attached to a particular Tenant Domain will learn of all other PEs (attached to the same Tenant Domain) that have interest in a particular set of flows. Note that a PE that receives a given SMET route does not necessarily have any BDs (other than the SBD) in common with the PE that originates that SMET route.

If all the sources and receivers for a given (*,G) are in the Tenant Domain, inter-subnet "Any Source Multicast" traffic will be properly routed without requiring any Rendezvous Points, shared trees, or other complex aspects of multicast routing infrastructure.
Suppose, for example, that:

o PE1 has a local receiver, on BD1, for (*,G);

o PE2 has a local source, on BD2, for (*,G).

PE1 will originate an SMET(*,G) route for the SBD, and PE2 will receive that route, even if PE2 is not attached to BD1. PE2 will thus know to forward (S,G) traffic to PE1. PE1 does not need to do any "source discovery". (This does assume that source S does not send the same (S,G) datagram on two different BDs, and that the Tenant Domain does not contain two or more sources with the same IP address S. The use of multicast sources that have IP "anycast" addresses is outside the scope of this document.)

If some PE attached to the Tenant Domain does not support [I-D.ietf-bess-evpn-igmp-mld-proxy], it will be assumed to be interested in all flows. Whether a particular remote PE supports [I-D.ietf-bess-evpn-igmp-mld-proxy] is determined by the presence of the Multicast Flags Extended Community in its IMET route; this is specified in [I-D.ietf-bess-evpn-igmp-mld-proxy].

2.6. Tunneling Frames from Ingress PE to Egress PEs

[RFC7432] specifies the procedures for setting up and using "BUM tunnels". A BUM tunnel is a tunnel used to carry traffic on a particular BD if that traffic is (a) broadcast traffic, (b) unicast traffic with an unknown MAC DA, or (c) ethernet multicast traffic.

This document allows the BUM tunnels to be used as the default tunnels for transmitting IP multicast frames. It also allows a separate set of tunnels to be used, instead of the BUM tunnels, as the default tunnels for carrying IP multicast frames. Let's call these "IP Multicast Tunnels".

When the tunneling is done via Ingress Replication or via BIER, this difference is of no significance. However, when P2MP tunnels are used, there is a significant advantage to having separate IP multicast tunnels.
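The ingress-side consequence of Section 2.5 can be sketched as follows. Because SMET routes are for the SBD, an ingress PE can keep one table of SMET interest for the whole Tenant Domain, independent of which BDs it shares with each remote PE. The Python sketch below assumes hypothetical inputs: a mapping from each remote PE to the (S,G) and (*,G) entries it has advertised in SMET routes, and the set of remote PEs whose IMET routes did not carry the Multicast Flags Extended Community (and so are treated as interested in all flows). It ignores the source-filtering details that SMET routes can carry.

```python
from typing import Dict, Iterable, Set, Tuple

Flow = Tuple[str, str]  # (S, G); S == "*" denotes a (*,G) entry


def egress_pes(sg: Flow, smet_interest: Dict[str, Set[Flow]],
               non_proxy_pes: Iterable[str]) -> Set[str]:
    """Remote PEs to which an ingress PE sends frames of flow sg (sketch)."""
    source, group = sg
    # A remote PE is interested if it advertised SMET (S,G) or SMET (*,G).
    wanted = {pe for pe, flows in smet_interest.items()
              if (source, group) in flows or ("*", group) in flows}
    # PEs that do not support the IGMP/MLD proxy procedures are assumed
    # to be interested in all flows.
    return wanted | set(non_proxy_pes)
```

In the example above, PE2 would find PE1's SMET(*,G) entry in its table and so include PE1 for any (S,G) with that G, even though PE2 may share no ordinary BD with PE1.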
Other things being equal, it is desirable for an ingress PE to transmit a copy of a given (S,G) multicast frame on only one P2MP tunnel. All egress PEs interested in (S,G) packets then have to join that tunnel. If the source BD and PE for an (S,G) frame are BD1 and PE1 respectively, and if PE2 has receivers on BD2 for (S,G), then PE2 must join the P2MP LSP on which PE1 transmits the (S,G) frame. PE2 must join this P2MP LSP even if PE2 is not attached to the source BD (BD1). If PE1 were transmitting the multicast frame on its BD1 BUM tunnel, then PE2 would have to join the BD1 BUM tunnel, even though PE2 has no BD1 attachment circuits. This would cause PE2 to pull all the BUM traffic from BD1, most of which it would just have to discard. Thus we RECOMMEND that the default IP multicast tunnels be distinct from the BUM tunnels.

Notwithstanding the above, link-local IP multicast traffic MUST always be carried on the BUM tunnels, and ONLY on the BUM tunnels. Link-local IP multicast traffic consists of IPv4 traffic with a destination address prefix of 224.0.0.0/24 and IPv6 traffic with a destination address prefix of FF02::/16. In this document, the terms "IP multicast packet" and "IP multicast frame" are defined in Section 1.4 so as to exclude the link-local traffic.

Note that it is also possible to use "selective tunnels" to carry particular multicast flows (see Section 3.2). When an (S,G) frame is transmitted on a selective tunnel, it is not transmitted on the BUM tunnel or on the default IP Multicast tunnel.

2.7. Advanced Scenarios

There are some deployment scenarios that require special procedures:

1. Some multicast sources or receivers are attached to PEs that support [RFC7432], but do not support this document or [I-D.ietf-bess-evpn-inter-subnet-forwarding]. To interoperate with these "non-OISM PEs", it is necessary to have one or more gateway PEs that interface the tunnels discussed in this document with the BUM tunnels of the legacy PEs. This is discussed in Section 5.

2. Sometimes multicast traffic originates from outside the EVPN domain, or needs to be sent outside the EVPN domain. This is discussed in Section 6. An important special case of this, integration with MVPN, is discussed in Section 6.1.2.

3. In some scenarios, one or more of the tenant systems is a PIM router, and the Tenant Domain is used as a transit network that is part of a larger multicast domain. This is discussed in Section 7.

3. EVPN-aware Multicast Solution Control Plane

3.1. Supplementary Broadcast Domain (SBD) and Route Targets

As discussed in Section 2.1, every Tenant Domain is associated with a single Supplementary Broadcast Domain (SBD). Recall that a Tenant Domain is defined to be a set of BDs that can freely send and receive IP multicast traffic to/from each other. If an EVPN-PE has one or more ACs in a BD of a particular Tenant Domain, and if the EVPN-PE supports the procedures of this document, that EVPN-PE MUST be provisioned with the SBD of that Tenant Domain.

At each EVPN-PE attached to a given Tenant Domain, there is an IRB interface leading from the L3 routing instance of that Tenant Domain to the SBD. However, the SBD has no ACs.

Each SBD is provisioned with a Route Target (RT). All the EVPN-PEs supporting a given SBD are provisioned with that RT as an import RT. That RT MUST NOT be the same as the RT associated with any other BD.

We will use the term "SBD-RT" to denote the RT that has been assigned to the SBD. Routes carrying this RT will be propagated to all EVPN-PEs in the same Tenant Domain as the originator.
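The SBD-RT uniqueness requirement is what allows the Section 2.2 rules for associating a received IMET, SMET, S-PMSI or Leaf route with a BD to work. Those rules (Cases 1 through 5) can be sketched in Python as below. The sketch assumes a hypothetical local view held by the receiving PE: a map from each ordinary BD-RT to its (Tenant Domain, EVI), a map from each EVI to its BDs keyed by Tag ID, a map from each SBD-RT to its Tenant Domain, and the SBD name of each Tenant Domain. An error result stands for the "treat-as-withdraw" handling of [RFC7606]. Two behaviors are assumptions of this sketch rather than rules stated in Section 2.2: a route with a recognized ordinary BD-RT but no SBD-RT falls back to ordinary association by RT and Tag ID, and a route with no recognized RT is reported as not imported.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

TREAT_AS_WITHDRAW = "treat-as-withdraw"


@dataclass
class LocalView:
    bd_rts: Dict[str, Tuple[str, str]]  # ordinary BD-RT -> (tenant domain, EVI)
    evi_bds: Dict[str, Dict[int, str]]  # EVI -> {Tag ID: BD name}
    sbd_rts: Dict[str, str]             # SBD-RT -> tenant domain
    sbds: Dict[str, str]                # tenant domain -> SBD name


def classify_route(rts: Set[str], tag_id: int, view: LocalView) -> Optional[str]:
    """Return the BD a received route is about, TREAT_AS_WITHDRAW, or None."""
    sbd_seen = [rt for rt in rts if rt in view.sbd_rts]
    bd_seen = [rt for rt in rts if rt in view.bd_rts]

    if len(sbd_seen) > 1:                    # Case 1: SBD-RTs of several domains
        return TREAT_AS_WITHDRAW
    if len(bd_seen) > 1:                     # Case 2: RTs of two ordinary BDs
        return TREAT_AS_WITHDRAW
    if bd_seen and sbd_seen:
        td, evi = view.bd_rts[bd_seen[0]]
        if td != view.sbd_rts[sbd_seen[0]]:  # Case 3: BD and SBD of different domains
            return TREAT_AS_WITHDRAW
        return view.evi_bds[evi].get(tag_id)  # Case 5: the ordinary BD wins
    if sbd_seen:                             # Case 4: SBD, regardless of Tag ID
        return view.sbds[view.sbd_rts[sbd_seen[0]]]
    if bd_seen:
        # Outside Cases 1-5 (no SBD-RT): assumed ordinary per-BD association.
        _, evi = view.bd_rts[bd_seen[0]]
        return view.evi_bds[evi].get(tag_id)
    return None                              # no recognized RT: not imported
```

If the SBD-RT were reused as an ordinary BD-RT, the same RT would appear in both bd_rts and sbd_rts, and a route for that BD could no longer be told apart from a route for the SBD; that is the failure the uniqueness rule prevents.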
   Section 2.2 specifies the rules by which an EVPN-PE that receives a route determines whether that route "belongs to" a particular ordinary BD or to the SBD.

   Section 2.2 also specifies additional rules that must be followed when constructing routes that belong to a particular BD, including the SBD.

   The SBD SHOULD be in an EVPN Instance (EVI) of its own. Even if the SBD is not in an EVI of its own, the SBD-RT MUST be different from the RT associated with any other BD. This restriction is necessary in order for the rules of Sections 2.2 and 3.1 to work correctly.

   Note that an SBD, just like any other BD, is associated on each EVPN-PE with a MAC-VRF. Per [RFC7432], each MAC-VRF is associated with a Route Distinguisher (RD). When constructing a route that is "about" an SBD, an EVPN-PE will place the RD of the associated MAC-VRF in the "Route Distinguisher" field of the NLRI. (If the Tenant Domain has several MAC-VRFs on a given PE, the EVPN-PE has a choice of which RD to use.)

   If Assisted Replication (AR, see [I-D.ietf-bess-evpn-optimized-ir]) is used, each AR-REPLICATOR for a given Tenant Domain must be provisioned with the SBD of that Tenant Domain, even if the AR-REPLICATOR does not have any L3 routing instance.

3.2.  Advertising the Tunnels Used for IP Multicast

   The procedures used for advertising the tunnels that carry IP multicast traffic depend upon the type of tunnel being used. If the tunnel type is neither Ingress Replication, Assisted Replication, nor BIER, there are procedures for advertising both "inclusive tunnels" and "selective tunnels".

   When IR, AR, or BIER is used to transmit IP multicast packets across the core, there are no P2MP tunnels. Once an ingress EVPN-PE determines the set of egress EVPN-PEs for a given flow, the IMET routes contain all the information needed to transport packets of that flow to the egress PEs.

   If AR is used, the ingress EVPN-PE is also an AR-LEAF, and the IMET route coming from the selected AR-REPLICATOR contains the information needed. The AR-REPLICATOR will behave as an ingress EVPN-PE when sending a flow to the egress EVPN-PEs.

   If the tunneling technique requires P2MP tunnels to be set up (e.g., RSVP-TE P2MP, mLDP, PIM), some of the tunnels may be selective tunnels and some may be inclusive tunnels.

   Selective P2MP tunnels are always advertised by the ingress PE using S-PMSI A-D routes ([I-D.ietf-bess-evpn-bum-procedure-updates]).

   For inclusive tunnels, there is a choice between using a BD's ordinary "BUM tunnel" [RFC7432] as the default inclusive tunnel for carrying IP multicast traffic, or using a separate IP multicast tunnel as the default inclusive tunnel for carrying IP multicast. In the former case, the inclusive tunnel is advertised in an IMET route. In the latter case, the inclusive tunnel is advertised in a (C-*,C-*) S-PMSI A-D route ([I-D.ietf-bess-evpn-bum-procedure-updates]). Details may be found in subsequent sections.

3.2.1.  Constructing Routes for the SBD

   There are situations in which an EVPN-PE needs to originate IMET, SMET, and/or S-PMSI A-D routes for the SBD. Throughout this document, we will refer to such routes respectively as "SBD-IMET routes", "SBD-SMET routes", and "SBD-SPMSI routes". Subsequent sections detail the conditions under which these routes need to be originated.
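   The choice, described at the start of Section 3.2, of which route advertises a tunnel used for IP multicast can be summarized as a small decision function. This is a sketch only. The `TunnelType` enumeration and the returned strings are illustrative labels rather than protocol code points. The interworking exception of Section 3.2.5.2 (IR or BIER with non-OISM PEs, Section 5) is deliberately not modeled.

```python
from enum import Enum


class TunnelType(Enum):
    INGRESS_REPLICATION = "IR"
    ASSISTED_REPLICATION = "AR"
    BIER = "BIER"
    RSVP_TE_P2MP = "RSVP-TE P2MP"
    MLDP = "mLDP"
    PIM = "PIM"


# Tunnel types for which no P2MP tunnels are set up (Section 3.2).
NO_P2MP_TUNNELS = {
    TunnelType.INGRESS_REPLICATION,
    TunnelType.ASSISTED_REPLICATION,
    TunnelType.BIER,
}


def route_advertising_tunnel(tunnel: TunnelType, selective: bool,
                             bum_tunnel_is_default: bool) -> str:
    """Name the route type that advertises an IP multicast tunnel."""
    if tunnel in NO_P2MP_TUNNELS:
        # IMET routes carry all the needed information; SMET routes supply
        # per-flow selectivity instead of S-PMSI A-D routes (Section 3.2.6).
        return "IMET"
    if selective:
        return "S-PMSI A-D (C-*,C-G) or (C-S,C-G)"
    if bum_tunnel_is_default:
        return "IMET"                  # BUM tunnel reused (Section 3.2.5.1)
    return "S-PMSI A-D (C-*,C-*)"      # separate default tunnel (Section 3.2.5.2)
```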
   When an EVPN-PE needs to originate an SBD-IMET, SBD-SMET, or SBD-SPMSI route, it constructs the route as follows:

   o  the RD field of the route's NLRI is set to the RD of the MAC-VRF that is associated with the SBD;

   o  the SBD-RT is attached to the route;

   o  the "Tag ID" field of the route's NLRI is set to the Tag ID that has been assigned to the SBD. This is most likely 0 if a VLAN-based or VLAN-bundle service is being used, but non-zero if a VLAN-aware bundle service is being used.

3.2.2.  Ingress Replication

   When Ingress Replication (IR) is used to transport IP multicast frames of a given Tenant Domain, each EVPN-PE attached to that Tenant Domain MUST originate an SBD-IMET route (see Section 3.2.1).

   The SBD-IMET route MUST carry a PMSI Tunnel attribute (PTA), and the MPLS label field of the PTA MUST specify a downstream-assigned MPLS label that maps uniquely (in the context of the originating EVPN-PE) to the SBD.

   Following the procedures of [RFC7432], an EVPN-PE MUST also originate an IMET route for each BD to which it is attached. Each of these IMET routes carries a PTA specifying a downstream-assigned label that maps uniquely, in the context of the originating EVPN-PE, to the BD in question. These IMET routes need not carry the SBD-RT.

   When an ingress EVPN-PE needs to use IR to send an IP multicast frame from a particular source BD to an egress EVPN-PE, the ingress PE determines whether the egress PE has originated an IMET route for that BD. If so, that IMET route contains the MPLS label that the egress PE has assigned to the source BD. The ingress PE uses that label when transmitting the packet to the egress PE. Otherwise, the ingress PE uses the label that the egress PE has assigned to the SBD (in the SBD-IMET route originated by the egress PE).
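   The ingress-side label choice of Section 3.2.2 can be sketched as follows. The sketch also covers the fallback required when an egress PE withdraws its per-BD IMET route, which is described just after this rule. The `ImetRoute` and `IrLabelTable` names, and the use of the string "SBD" to mark SBD-IMET routes, are hypothetical simplifications. A real implementation would identify the BD of each received route using the rules of Section 2.2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImetRoute:
    """Hypothetical, simplified view of an installed IMET route."""
    originator: str  # egress PE that originated the route
    bd: str          # BD the route belongs to; "SBD" marks an SBD-IMET route
    label: int       # downstream-assigned label carried in the PTA


class IrLabelTable:
    """IMET routes installed at an ingress PE, keyed by (originator, BD)."""

    def __init__(self) -> None:
        self._routes: dict[tuple[str, str], ImetRoute] = {}

    def install(self, route: ImetRoute) -> None:
        self._routes[(route.originator, route.bd)] = route

    def withdraw(self, originator: str, bd: str) -> None:
        self._routes.pop((originator, bd), None)

    def label_for(self, egress_pe: str, source_bd: str) -> Optional[int]:
        """Label to use when sending a frame of `source_bd` to `egress_pe`.

        The egress PE's IMET route for the source BD wins; otherwise its
        SBD-IMET route is used. Because the lookup is made at send time, a
        withdrawn per-BD route, or one that arrives after the SBD-IMET
        route, is handled without extra bookkeeping. None means no usable
        route has been installed yet.
        """
        route = (self._routes.get((egress_pe, source_bd))
                 or self._routes.get((egress_pe, "SBD")))
        return route.label if route else None
```

   For example, once PE2's SBD-IMET route with label 300 is installed, frames from BD1 go to PE2 with label 300. When PE2's IMET route for BD1 (say label 101) arrives, they switch to 101. If that route is later withdrawn, they fall back to 300.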
   Note that the set of IMET routes originated by a given egress PE, and installed by a given ingress PE, may change over time. If the egress PE withdraws its IMET route for the source BD, the ingress PE MUST stop using the label carried in that IMET route, and instead MUST use the label carried in the SBD-IMET route from that egress PE. Implementors must also take into account that an IMET route from a particular PE for a particular BD may arrive after that PE's SBD-IMET route. When it does arrive, the ingress PE starts using the label carried in that IMET route, per the rule above.

3.2.3.  Assisted Replication

   When Assisted Replication is used to transport IP multicast frames of a given Tenant Domain, each EVPN-PE (including the AR-REPLICATOR) attached to the Tenant Domain MUST originate an SBD-IMET route (see Section 3.2.1).

   An AR-REPLICATOR attached to a given Tenant Domain is considered to be an EVPN-PE of that Tenant Domain. It is attached to all the BDs in the Tenant Domain, but it does not necessarily have L3 routing instances.

   As with Ingress Replication, the SBD-IMET route carries a PTA where the MPLS label field specifies the downstream-assigned MPLS label that identifies the SBD. However, the AR-REPLICATOR and AR-LEAF EVPN-PEs will set the PTA's flags differently, as per [I-D.ietf-bess-evpn-optimized-ir].

   In addition, each EVPN-PE originates an IMET route for each BD to which it is attached. As in the case of Ingress Replication, these routes carry the downstream-assigned MPLS labels that identify the BDs and do not carry the SBD-RT.

   When an ingress EVPN-PE, acting as AR-LEAF, needs to send an IP multicast frame from a particular source BD to an egress EVPN-PE, the ingress PE determines whether there is any AR-REPLICATOR that originated an IMET route for that BD. After selecting an AR-REPLICATOR (if there is more than one), the AR-LEAF uses the label contained in the IMET route of that AR-REPLICATOR when transmitting packets to it. The AR-REPLICATOR receives the packet and, based on the procedures specified in [I-D.ietf-bess-evpn-optimized-ir] and in Section 3.2.2 of this document, transmits the packets to the egress EVPN-PEs using the labels contained in the received IMET routes for either the source BD or the SBD.

   If an ingress AR-LEAF for a given BD has not received any IMET route for that BD from an AR-REPLICATOR, the ingress AR-LEAF follows the procedures in Section 3.2.2.

3.2.3.1.  Automatic SBD Matching

   Each PE needs to know a BD's corresponding SBD. Configuring that information in each BD is one way to achieve this, but it requires repetitive configuration and a consistency check (to make sure that all the BDs of the same tenant are configured with the same SBD). A better way is to configure the SBD information in the L3 routing instance, so that all related BDs derive the SBD information from it.

   An AR-REPLICATOR also needs to know the same information, though it does not necessarily have an L3 routing instance. However, from the EVI-RT EC in a BD's IMET route, an AR-REPLICATOR can derive the corresponding SBD of that BD without any configuration.

3.2.4.  BIER

   When BIER is used to transport multicast packets of a given Tenant Domain, and a given EVPN-PE attached to that Tenant Domain is a possible ingress EVPN-PE for traffic originating outside that Tenant Domain, the given EVPN-PE MUST originate an SBD-IMET route (see Section 3.2.1).

   In addition, IMET routes that are originated for other BDs in the Tenant Domain MUST carry the SBD-RT.

   Each IMET route (including but not limited to the SBD-IMET route) MUST carry a PMSI Tunnel attribute (PTA). The MPLS label field of the PTA MUST specify an upstream-assigned MPLS label that maps uniquely (in the context of the originating EVPN-PE) to the BD for which the route is originated.

   Suppose an ingress EVPN-PE, PE1, needs to use BIER to tunnel an IP multicast frame to a set of egress EVPN-PEs, and suppose the frame's source BD is BD1. The frame is encapsulated as follows:

   o  A four-octet MPLS label stack entry ([RFC3032]) is prepended to the frame. The Label field is set to the upstream-assigned label that PE1 has assigned to BD1.

   o  The resulting MPLS packet is then encapsulated in a BIER encapsulation ([RFC8296], [I-D.ietf-bier-evpn]). The BIER BitString is set to identify the egress EVPN-PEs. The BIER "proto" field is set to the value for "MPLS packet with upstream-assigned label at top of stack".

   Note: It is possible that the packet being tunneled from PE1 originated outside the Tenant Domain. In this case, the actual source BD (BD1) is considered to be the SBD, and the upstream-assigned label that the packet carries will be the label that PE1 assigned to the SBD and advertised in its SBD-IMET route.

   Suppose an egress PE, PE2, receives such a BIER packet. The BFIR-id field of the BIER header allows PE2 to determine that the ingress PE is PE1. There are then two cases to consider:

   1.  PE2 has received and installed an IMET route for BD1 from PE1.

       In this case, the BIER packet will be carrying the upstream-assigned label that is specified in the PTA of that IMET route. This enables PE2 to determine the "apparent source BD" (as defined in Section 2.4).

   2.  PE2 has not received and installed an IMET route for BD1 from PE1.

       In this case, PE2 will not recognize the upstream-assigned label carried in the BIER packet. PE2 MUST discard the packet.
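   The label handling in the BIER procedure above can be sketched as follows. Only the four-octet [RFC3032] label stack entry is built and parsed. Construction of the BIER header itself (BitString, BFIR-id, and "proto" field per [RFC8296]) is omitted. The TC of 0 and TTL of 255 are assumptions, since this document does not specify them. Setting the bottom-of-stack bit assumes no further label (for example, an ESI label) follows. The `installed` table, which maps (BFIR-id, upstream-assigned label) pairs learned from IMET routes to BDs, is a hypothetical structure.

```python
import struct
from typing import Optional


def encode_lse(label: int, tc: int = 0, bottom: bool = True, ttl: int = 255) -> bytes:
    """Encode one 4-octet MPLS label stack entry (RFC 3032):
    Label (20 bits) | TC (3 bits) | S (1 bit) | TTL (8 bits)."""
    if not 0 <= label < (1 << 20):
        raise ValueError("MPLS label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def decode_lse(data: bytes) -> tuple[int, int, bool, int]:
    """Return (label, tc, bottom_of_stack, ttl) from the first four octets."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF


def bier_payload(frame: bytes, upstream_label: int) -> bytes:
    """Ingress PE1: prepend the label PE1 assigned to the frame's source BD
    (or to the SBD, for traffic from outside the Tenant Domain)."""
    return encode_lse(upstream_label) + frame


def apparent_source_bd(bfir_id: int, payload: bytes,
                       installed: dict[tuple[int, int], str]) -> Optional[str]:
    """Egress PE2: map (BFIR-id, upstream-assigned label) to a BD using the
    IMET routes it has installed; None means the packet MUST be discarded."""
    label, _tc, _bos, _ttl = decode_lse(payload)
    return installed.get((bfir_id, label))
```

   The lookup is keyed by BFIR-id as well as by label because an upstream-assigned label is unique only in the context of the PE that assigned it.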
   Further details on the use of BIER to support EVPN can be found in [I-D.ietf-bier-evpn].

3.2.5.  Inclusive P2MP Tunnels

3.2.5.1.  Using the BUM Tunnels as IP Multicast Inclusive Tunnels

   The procedures in this section apply only when

   (a)  it is desired to use the BUM tunnels to carry IP multicast traffic across the backbone, and

   (b)  the BUM tunnels are P2MP tunnels (i.e., neither IR, AR, nor BIER is being used to transport the BUM traffic).

   In this case, an IP multicast frame (whether inter-subnet or intra-subnet) will be carried across the backbone in the BUM tunnel belonging to its source BD. Each EVPN-PE attached to a given Tenant Domain needs to join the BUM tunnels for every BD in the Tenant Domain, even those BDs to which the EVPN-PE is not locally attached. This ensures that an IP multicast packet from any source BD can reach all PEs attached to the Tenant Domain.

   Note that this will cause all the BUM traffic from a given BD in a Tenant Domain to be sent to all PEs that attach to that Tenant Domain, even the PEs that don't attach to the given BD. To avoid this, it is RECOMMENDED that the BUM tunnels not be used as IP multicast inclusive tunnels, and that the procedures of Section 3.2.5.2 be used instead.

   If a PE is a possible ingress EVPN-PE for traffic originating outside the Tenant Domain, the PE MUST originate an SBD-IMET route (see Section 3.2.1). This route MUST carry a PTA specifying the P2MP tunnel used for transmitting IP multicast packets that originate outside the Tenant Domain. All EVPN-PEs of the Tenant Domain MUST join the tunnel specified in the PTA of an SBD-IMET route:

   o  If the tunnel is an RSVP-TE P2MP tunnel, the originator of the route MUST use RSVP-TE P2MP procedures to add each PE of the Tenant Domain to the tunnel, even PEs that have not originated an SBD-IMET route.

   o  If the tunnel is an mLDP or PIM tunnel, each PE importing the SBD-IMET route MUST add itself to the tunnel, using mLDP or PIM procedures, respectively.

   Whether or not a PE originates an SBD-IMET route, it will of course originate an IMET route for each BD to which it is attached. Each of these IMET routes MUST carry the SBD-RT, as well as the RT for the BD to which it belongs.

   If a received IMET route is not the SBD-IMET route, it will also be carrying the RT for its source BD. The route's NLRI will carry the Tag ID for the source BD. From the RT and the Tag ID, any PE receiving the route can determine the route's source BD.

   If the MPLS label field of the PTA contains zero, the specified P2MP tunnel is used only to carry frames of a single source BD.

   If the MPLS label field of the PTA does not contain zero, it MUST contain an upstream-assigned MPLS label that maps uniquely (in the context of the originating EVPN-PE) to the source BD (or, in the case of an SBD-IMET route, to the SBD). The tunnel may then be used to carry frames of multiple source BDs. The apparent source BD of a particular packet is inferred from the label carried by the packet.

   IP multicast traffic originating outside the Tenant Domain is transmitted with the label corresponding to the SBD, as specified in the ingress EVPN-PE's SBD-IMET route.

3.2.5.2.  Using Wildcard S-PMSI A-D Routes to Advertise Inclusive Tunnels Specific to IP Multicast

   The procedures of this section apply when (and only when) it is desired to transmit IP multicast traffic on an inclusive tunnel, but not on the same tunnel used to transmit BUM traffic.

   However, these procedures do NOT apply when the tunnel type is Ingress Replication or BIER, EXCEPT in the case where it is necessary to interwork between non-OISM PEs and OISM PEs, as specified in Section 5.
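   The PTA label rules of Section 3.2.5.1 determine how an egress PE infers the apparent source BD of a packet arriving on a P2MP tunnel. A zero label means a single-BD tunnel, and a non-zero label means a per-packet upstream-assigned label. Section 3.2.5.2 applies the same rules to wildcard S-PMSI A-D routes. The following is a minimal sketch, assuming a hypothetical string identifier for each tunnel. It ignores how the label is actually popped from the packet.

```python
from typing import Optional


class ApparentSourceBdTable:
    """Hypothetical egress-side table built from received IMET or
    S-PMSI A-D routes that carry a PTA."""

    def __init__(self) -> None:
        self._single_bd: dict[str, str] = {}             # tunnel -> BD (PTA label 0)
        self._by_label: dict[tuple[str, int], str] = {}  # (tunnel, label) -> BD

    def learn(self, tunnel: str, pta_label: int, bd: str) -> None:
        """Record the PTA of a route that belongs to `bd` (possibly the SBD)."""
        if pta_label == 0:
            # Zero label: the tunnel carries frames of this one source BD only.
            self._single_bd[tunnel] = bd
        else:
            # Non-zero: upstream-assigned label, unique only in the context of
            # the originating PE, so it is scoped to the tunnel rooted there.
            self._by_label[(tunnel, pta_label)] = bd

    def resolve(self, tunnel: str, packet_label: Optional[int] = None) -> Optional[str]:
        """Apparent source BD of a received packet, or None if unknown."""
        if packet_label is None:
            return self._single_bd.get(tunnel)
        return self._by_label.get((tunnel, packet_label))
```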
   Each EVPN-PE attached to the given Tenant Domain MUST originate an SBD-SPMSI A-D route. The NLRI of that route MUST contain (C-*,C-*) (see [RFC6625]). Additional rules for constructing that route are given in Section 3.2.1.

   In addition, an EVPN-PE MUST originate an S-PMSI A-D route containing (C-*,C-*) in its NLRI for each of the other BDs in the given Tenant Domain to which it is attached. All such routes MUST carry the SBD-RT. This ensures that those routes are imported by all EVPN-PEs attached to the Tenant Domain.

   A PE receiving these routes follows the procedures of Section 2.2 to determine which BD the route is for.

   If the MPLS label field of the PTA contains zero, the specified tunnel is used only to carry frames of a single source BD.

   If the MPLS label field of the PTA does not contain zero, it MUST specify an upstream-assigned MPLS label that maps uniquely (in the context of the originating EVPN-PE) to the source BD. The tunnel may be used to carry frames of multiple source BDs, and the apparent source BD for a particular packet is inferred from the label carried by the packet.

   The EVPN-PE advertising these S-PMSI A-D routes is specifying the default tunnel that it will use (as ingress PE) for transmitting IP multicast packets. The upstream-assigned label allows an egress PE to determine the apparent source BD of a given packet.

3.2.6.  Selective Tunnels

   An ingress EVPN-PE for a given multicast flow or set of flows can always assign the flow to a particular P2MP tunnel by originating an S-PMSI A-D route whose NLRI identifies the flow or set of flows. The NLRI of the route could be (C-*,C-G) or (C-S,C-G). The S-PMSI A-D route MUST carry the SBD-RT, so that it is imported by all EVPN-PEs attached to the Tenant Domain.

   An S-PMSI A-D route is "for" a particular source BD. It MUST carry the RT associated with that BD, and it MUST have the Tag ID for that BD in its NLRI.

   When an EVPN-PE imports an S-PMSI A-D route, it applies the rules of Section 2.2 to associate the route with a particular BD.

   Each such route MUST contain a PTA, as specified in Section 3.2.5.2.

   An egress EVPN-PE interested in the specified flow or flows MUST join the specified tunnel. Procedures for joining the specified tunnel are specific to the tunnel type. (Note that if the tunnel type is RSVP-TE P2MP LSP, the Leaf Information Required (LIR) flag of the PTA SHOULD NOT be set. An ingress OISM PE knows which OISM EVPN-PEs are interested in any given flow, and hence can add them to the RSVP-TE P2MP tunnel that carries such flows.)

   If the PTA does not specify a non-zero MPLS label, the apparent source BD of any packets that arrive on that tunnel is considered to be the BD associated with the route that carries the PTA. If the PTA does specify a non-zero MPLS label, the apparent source BD of any packets that arrive on that tunnel carrying the specified label is considered to be the BD associated with the route that carries the PTA.

   It should be noted that when either IR or BIER is used, there is no need for an ingress PE to use S-PMSI A-D routes to assign specific flows to selective tunnels. The procedures of Section 3.3, along with the procedures of Section 3.2.2, Section 3.2.3, or Section 3.2.4, provide the functionality of selective tunnels without the need to use S-PMSI A-D routes.

3.3.  Advertising SMET Routes

   [I-D.ietf-bess-evpn-igmp-mld-proxy] allows an egress EVPN-PE to express its interest in a particular multicast flow or set of flows by originating an SMET route. The NLRI of the SMET route identifies the flow or set of flows as (C-*,C-*), (C-*,C-G), or (C-S,C-G).
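   With IR or BIER, the SMET routes described above take the place of selective tunnels. The ingress PE sends an (S,G) frame only to the egress PEs whose SMET routes cover that flow. A sketch of that matching follows. The `SmetRoute` structure is hypothetical, and None stands for a C-* wildcard. Treating PEs that do not support [I-D.ietf-bess-evpn-igmp-mld-proxy] as interested in every flow follows Section 4.1.1.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SmetRoute:
    """Hypothetical view of an installed SMET route; None stands for C-*."""
    originator: str
    source: Optional[str]
    group: Optional[str]

    def covers(self, s: str, g: str) -> bool:
        # (C-*,C-*) matches every flow, (C-*,C-G) matches any source sending
        # to G, and (C-S,C-G) matches exactly one flow.
        return ((self.group is None or self.group == g)
                and (self.source is None or self.source == s))


def egress_pes_for(s: str, g: str, routes: Iterable[SmetRoute],
                   pes_without_smet: Iterable[str] = ()) -> set[str]:
    """Set of egress PEs to which an (S,G) frame needs to be replicated."""
    interested = {r.originator for r in routes if r.covers(s, g)}
    # PEs that do not support SMET are assumed to want all flows.
    interested.update(pes_without_smet)
    return interested
```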
   Each SMET route belongs to a particular BD. The Tag ID for the BD appears in the NLRI of the route, and the route carries the RT associated with that BD. From this pair, other EVPN-PEs can identify the BD to which a received SMET route belongs. (Remember, though, that the route may be carrying multiple RTs.)

   There are three cases to consider:

   o  Case 1: It is known that no BD of a Tenant Domain contains a multicast router.

      In this case, an egress PE advertises its interest in a flow or set of flows by originating an SMET route that belongs to the SBD. We refer to this as an SBD-SMET route. The SBD-SMET route carries the SBD-RT, and has the Tag ID for the SBD in its NLRI. SMET routes for the individual BDs are not needed, because there is no need for a PE that receives an SMET route to send a corresponding IGMP Join message out any of its ACs.

   o  Case 2: It is known that more than one BD of a Tenant Domain may contain a multicast router.

      This is very similar to Case 1. An egress PE advertises its interest in a flow or set of flows by originating an SBD-SMET route. The SBD-SMET route carries the SBD-RT, and has the Tag ID for the SBD in its NLRI.

      In this case, it is important to ensure that SMET routes for the individual BDs are not originated. Suppose, for example, that PE1 had local receivers for a given flow on both BD1 and BD2, and that it originated SMET routes for both those BDs. Then PEs receiving those SMET routes might send IGMP Joins on both those BDs. This could cause externally sourced multicast traffic to enter the Tenant Domain at both BDs, which could result in duplication of data.

      N.B.: If it is possible that more than one BD contains a tenant multicast router, then in order to receive multicast data originating from outside EVPN, the PEs MUST follow the procedures of Section 6.

   o  Case 3: It is known that only a single BD of a Tenant Domain contains a multicast router.

      Suppose that an egress PE is attached to a BD on which there might be a tenant multicast router. (The tenant router is not necessarily on a segment that is attached to that PE.) And suppose that the PE has one or more ACs attached to that BD which are interested in a given multicast flow. In this case, IN ADDITION to the SMET route for the SBD, the egress PE MAY originate an SMET route for that BD. This will enable the ingress PE(s) to send IGMP/MLD messages on ACs for the BD, as specified in [I-D.ietf-bess-evpn-igmp-mld-proxy]. As long as that is the only BD on which there is a tenant multicast router, there is no possibility of duplication of data.

   This document does not specify procedures for dynamically determining which of the three cases applies to a given deployment; the PEs of a given Tenant Domain MUST be provisioned to know which case applies.

   As detailed in [I-D.ietf-bess-evpn-igmp-mld-proxy], an SMET route carries a Multicast Flags EC containing flags indicating whether it is to result in the propagation of IGMP v1, v2, or v3 messages on the ACs of the BD to which the SMET route belongs. These flags SHOULD be set to zero in an SBD-SMET route.

   Note that a PE only needs to originate the set of SBD-SMET routes that are needed to pull in all the traffic in which it is interested. Suppose PE1 has ACs attached to BD1 that are interested in (C-*,C-G) traffic, and ACs attached to BD2 that are interested in (C-S,C-G) traffic. A single SBD-SMET route specifying (C-*,C-G) will pull in all the necessary flows.

   As another example, suppose the ACs attached to BD1 are interested in (C-*,C-G) but not in (C-S,C-G), while the ACs attached to BD2 are interested in (C-S,C-G). A single SBD-SMET route specifying (C-*,C-G) will pull in all the necessary flows.

   In other words, to determine the set of SBD-SMET routes that have to be sent for a given C-G, the PE has to merge the IGMP/MLD state for all the BDs (of the given Tenant Domain) to which it is attached.

   Per [I-D.ietf-bess-evpn-igmp-mld-proxy], importing an SMET route for a particular BD will cause IGMP/MLD state to be instantiated for the IRB interface to that BD. This applies as well when the BD is the SBD.

   However, traffic that originates in one of the actual BDs of a particular Tenant Domain MUST NOT be sent down the IRB interface that connects the L3 routing instance of that Tenant Domain to the SBD. That would cause duplicate delivery of traffic, since such traffic will have already been distributed throughout the Tenant Domain. Therefore, when setting up the IGMP/MLD state based on SBD-SMET routes, care must be taken to ensure that the IRB interface to the SBD is not added to the Outgoing Interface (OIF) list if the traffic originates within the Tenant Domain.

   There are some multicast scenarios that make use of "anycast sources". For example, two different sources may share the same anycast IP address, say S1, and each may transmit an (S1,G) multicast flow. In such a scenario, the two (S1,G) flows are typically identical. Ordinary PIM procedures will cause only one of the flows to be delivered to each receiver that has expressed interest in either (*,G) or (S1,G). However, the OISM procedures described in this document will result in both of the (S1,G) flows being distributed in the Tenant Domain, and duplicate delivery will result. Therefore, if there are receivers for (*,G) in a given Tenant Domain, there MUST NOT be anycast sources for G within that Tenant Domain. (This restriction can be lifted by defining additional procedures; however, that is outside the scope of this document.)

4.  Constructing Multicast Forwarding State

4.1.  Layer 2 Multicast State

   An EVPN-PE maintains "layer 2 multicast state" for each BD to which it is attached.

   Let PE1 be an EVPN-PE, and BD1 be a BD to which it is attached. At PE1, BD1's layer 2 multicast state for a given (C-S,C-G) or (C-*,C-G) governs the disposition of an IP multicast packet that is received by BD1's layer 2 multicast function on PE1.

   An IP multicast (S,G) packet is considered to have been received by BD1's layer 2 multicast function in PE1 in the following cases:

   o  The packet is the payload of an Ethernet frame received by PE1 from an AC that attaches to BD1.

   o  The packet is the payload of an Ethernet frame whose apparent source BD is BD1, and which is received by PE1 over a tunnel from another EVPN-PE.

   o  The packet is received from BD1's IRB interface (i.e., has been transmitted by PE1's L3 routing instance down BD1's IRB interface).

   According to the procedures of this document, all transmission of IP multicast packets from one EVPN-PE to another is done at layer 2. That is, the packets are transmitted as Ethernet frames, according to the layer 2 multicast state.

   Each layer 2 multicast state (S,G) or (*,G) contains a set of "output interfaces" (OIF list). The disposition of an (S,G) multicast frame received by BD1's layer 2 multicast function is determined as follows:

   o  The OIF list is taken from BD1's layer 2 (S,G) state, or if there is no such (S,G) state, then from BD1's (*,G) state. (If neither state exists, the OIF list is considered to be null.)

   o  The rules of Section 4.1.2 are applied to the OIF list. This will generally result in the frame being transmitted to some, but not all, elements of the OIF list.

   Note that there is no RPF check at layer 2.

4.1.1.  Constructing the OIF List

   In this document, we have extended the procedures of [I-D.ietf-bess-evpn-igmp-mld-proxy] so that IMET and SMET routes for a particular BD are distributed not just to PEs that attach to that BD, but to PEs that attach to any BD in the Tenant Domain. In this way, each PE attached to a given Tenant Domain learns, from each other PE attached to the same Tenant Domain, the set of flows that are of interest to each of those other PEs. (If some PE attached to the Tenant Domain does not support [I-D.ietf-bess-evpn-igmp-mld-proxy], it will be assumed to be interested in all flows. Whether a particular remote PE supports [I-D.ietf-bess-evpn-igmp-mld-proxy] is determined by the presence of an Extended Community in its IMET route; this is specified in [I-D.ietf-bess-evpn-igmp-mld-proxy].) If a set of remote PEs is interested in a particular flow, the tunnels used to reach those PEs are added to the OIF list of the multicast states corresponding to that flow.

   An EVPN-PE may run IGMP/MLD procedures on each of its ACs, in order to determine the set of flows of interest to each AC. (An AC is said to be interested in a given flow if it connects to a segment that has tenant systems interested in that flow.) If IGMP/MLD procedures are not being run on a given AC, that AC is considered to be interested in all flows. For each BD, the set of ACs interested in a given flow is determined, and the ACs of that set are added to the OIF list of that BD's multicast state for that flow.

   The OIF list for each multicast state must also contain the IRB interface for the BD to which the state belongs.
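   The layer 2 lookup of Section 4.1 and the per-case disposition rules specified in Section 4.1.2.2 can be sketched together as follows. All names are hypothetical. The IRB interface that Section 4.1.1 places in every OIF list is modeled implicitly through `up_irb`. AC eligibility (Section 4.1.2.1) is passed in as a predicate rather than computed. This sketch reads "the OIF list is non-null" in case 1(b)(i) as "the OIF list contains at least one tunnel"; that reading is an interpretation, not something the text states.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Optional


class RxFrom(Enum):
    LOCAL_AC = auto()   # case 1: from a local AC of the apparent source BD
    TUNNEL = auto()     # case 2: over a tunnel from another PE
    BD_IRB = auto()     # case 3: down the IRB interface of an ordinary BD
    SBD_IRB = auto()    # case 4: down the SBD IRB interface


@dataclass
class Oif:
    acs: set[str] = field(default_factory=set)
    tunnels: set[str] = field(default_factory=set)


@dataclass
class Actions:
    to_acs: set[str] = field(default_factory=set)
    to_tunnels: set[str] = field(default_factory=set)
    up_irb: Optional[str] = None  # BD whose IRB interface the frame is sent up


State = dict[tuple[Optional[str], str], Oif]  # (S, or None for *, G) -> OIF


def lookup_oif(state: State, s: str, g: str) -> Oif:
    """Use the (S,G) state if present, else (*,G), else a null OIF list."""
    return state.get((s, g)) or state.get((None, g)) or Oif()


def dispose(rx: RxFrom, bd: str, oif: Oif, *,
            ingress_ac: Optional[str] = None,
            rx_tunnel: Optional[str] = None,
            eligible: Callable[[str], bool] = lambda ac: True,
            ar_leaf: bool = False,
            ar_replicator: bool = False,
            replicator: str = "AR-REPLICATOR") -> Actions:
    """Apply the OIF list to a frame whose apparent source BD is `bd`."""
    act = Actions()
    if rx is RxFrom.LOCAL_AC:
        act.to_acs = oif.acs - {ingress_ac}              # 1a: all OIF ACs but AC1
        if ar_leaf and oif.tunnels:
            act.to_tunnels = {replicator}                # 1b(i)
        else:
            act.to_tunnels = set(oif.tunnels)            # 1b(ii)
        act.up_irb = bd                                  # 1c: only BD1's IRB
    elif rx is RxFrom.TUNNEL:
        act.to_acs = {ac for ac in oif.acs if eligible(ac)}  # 2a
        act.up_irb = bd                                  # 2b: may be the SBD
        if ar_replicator:
            act.to_tunnels = oif.tunnels - {rx_tunnel}   # 2c
    elif rx is RxFrom.BD_IRB:
        act.to_acs = {ac for ac in oif.acs if eligible(ac)}  # 3a; nothing else
    else:  # RxFrom.SBD_IRB
        act.to_tunnels = set(oif.tunnels)                # 4a; no ACs, no IRB
    return act
```

   Keeping the receive case explicit mirrors the structure of Section 4.1.2.2. This makes it easy to check that only frames sent down the SBD IRB interface are forwarded to other PEs from L3, and that frames from other PEs are re-sent on tunnels only by an AR-REPLICATOR.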
   Implementors should note that the OIF list of a multicast state will change from time to time as ACs and/or remote PEs either become interested in, or lose interest in, particular multicast flows.

4.1.2.  Data Plane: Applying the OIF List to an (S,G) Frame

   When an (S,G) multicast frame is received by the layer 2 multicast function of a given EVPN-PE, say PE1, its disposition depends upon (a) the way it was received, (b) the OIF list of the corresponding multicast state (see Section 4.1.1), (c) the "eligibility" of an AC to receive a given frame (see Section 4.1.2.1), and (d) its apparent source BD (see Section 3.2 for information about determining the apparent source BD of a frame received over a tunnel from another PE).

4.1.2.1.  Eligibility of an AC to Receive a Frame

   A given (S,G) multicast frame is eligible to be transmitted by a given PE, say PE1, on a given AC, say AC1, only if one of the following conditions holds:

   1.  ESI labels are being used, PE1 is the DF for the segment to which AC1 is connected, and the frame did not originate from that same segment (as determined by the ESI label), or

   2.  The ingress PE for the frame is a remote PE, say PE2, local bias is being used, and PE2 is not connected to the same segment as AC1.

4.1.2.2.  Applying the OIF List

   Assume a given (S,G) multicast frame has been received by a given PE, say PE1. PE1 determines the apparent source BD of the frame, finds the layer 2 (S,G) state for that BD (or the (*,G) state if there is no (S,G) state), and takes the OIF list from that state. (Note that if PE1 is not attached to the actual source BD, the apparent source BD will be the SBD.)

   Suppose PE1 has determined the frame's apparent source BD to be BD1 (which may or may not be the SBD). There are the following cases to consider:

   1.  The frame was received by PE1 from a local AC, say AC1, that attaches to BD1.

       a.  The frame MUST be sent out all local ACs of BD1 that appear in the OIF list, except for AC1 itself.

       b.  The frame MUST also be delivered to any other EVPN-PEs that have interest in it. This is achieved as follows:

           i.   If (a) AR is being used, and (b) PE1 is an AR-LEAF, and (c) the OIF list is non-null, PE1 MUST send the frame to the AR-REPLICATOR.

           ii.  Otherwise, the frame MUST be sent on all tunnels in the OIF list.

       c.  The frame MUST be sent to the local L3 routing instance by being sent up the IRB interface of BD1. It MUST NOT be sent up any other IRB interfaces.

   2.  The frame was received by PE1 over a tunnel from another PE. (See Section 3.2 for the rules to determine the apparent source BD of a packet received from another PE. Note that if PE1 is not attached to the source BD, it will regard the SBD as the apparent source BD.)

       a.  The frame MUST be sent out all local ACs in the OIF list that connect to BD1 and that are eligible (per Section 4.1.2.1) to receive the frame.

       b.  The frame MUST be sent up the IRB interface of the apparent source BD. (Note that this may be the SBD.) The frame MUST NOT be sent up any other IRB interfaces.

       c.  If PE1 is not an AR-REPLICATOR, it MUST NOT send the frame to any other EVPN-PEs. However, if PE1 is an AR-REPLICATOR, it MUST send the frame on all tunnels in the OIF list, except for the tunnel over which the frame was received.

   3.  The frame was received by PE1 from the BD1 IRB interface (i.e., the frame has been transmitted by PE1's L3 routing instance down the BD1 IRB interface), and BD1 is NOT the SBD.

       a.  The frame MUST be sent out all local ACs in the OIF list that are eligible (per Section 4.1.2.1) to receive the frame.

       b.  The frame MUST NOT be sent to any other EVPN-PEs.

       c.  The frame MUST NOT be sent up any IRB interfaces.

   4.  The frame was received from the SBD IRB interface (i.e., has been transmitted by PE1's L3 routing instance down the SBD IRB interface).

       a.  The frame MUST be sent on all tunnels in the OIF list. This causes the frame to be delivered to any other EVPN-PEs that have interest in it.

       b.  The frame MUST NOT be sent on any local ACs.

       c.  The frame MUST NOT be sent up any IRB interfaces.

4.2.  Layer 3 Forwarding State

   If an EVPN-PE is performing IGMP/MLD procedures on the ACs of a given BD, it processes those messages at layer 2 to help form the layer 2 multicast state. It also sends those messages up that BD's IRB interface to the L3 routing instance of a particular Tenant Domain. This causes layer 3 (C-S,C-G) or (C-*,C-G) state to be created or updated.

   A layer 3 multicast state has both an Input Interface (IIF) and an OIF list.

   To set the IIF of a (C-S,C-G) state, the EVPN-PE must determine the source BD of C-S. This is done by looking up C-S in the local MAC-VRF(s) of the given Tenant Domain.

   If the source BD is present on the PE, the IIF is set to the IRB interface that attaches to that BD. Otherwise, the IIF is set to the SBD IRB interface.

   For (C-*,C-G) states, traffic can arrive from any BD, so the IIF needs to be set to a wildcard value meaning "any IRB interface".

   The OIF list of these states includes one or more of the IRB interfaces of the Tenant Domain. In general, maintenance of the OIF list does not require any EVPN-specific procedures. However, there is one EVPN-specific rule:

      If the IIF is one of the IRB interfaces (or the wildcard meaning "any IRB interface"), then the SBD IRB interface MUST NOT be added to the OIF list.
      Traffic originating from within a particular EVPN Tenant Domain must not be sent down the SBD IRB interface, as such traffic has already been distributed to all EVPN-PEs attached to that Tenant Domain.

   Please also see Section 6.1.1, which states a modification of this rule for the case where OISM is interworking with external layer 3 multicast routing.

5.  Interworking with non-OISM EVPN-PEs

   It is possible that a given Tenant Domain will be attached to both OISM PEs and non-OISM PEs. Inter-subnet IP multicast should be possible and fully functional even if not all PEs attaching to a Tenant Domain can be upgraded to support OISM functionality.

   Note that the non-OISM PEs are not required to have IRB support, or support for [I-D.ietf-bess-evpn-igmp-mld-proxy]. It is, however, advantageous for the non-OISM PEs to support [I-D.ietf-bess-evpn-igmp-mld-proxy].

   In this section, we will use the following terminology:

   o  PE-S: the ingress PE for an (S,G) flow.

   o  PE-R: an egress PE for an (S,G) flow.

   o  BD-S: the source BD for an (S,G) flow. PE-S must have one or more ACs attached to BD-S, at least one of which attaches to host S.

   o  BD-R: a BD that contains a host interested in the flow. The host is attached to PE-R via an AC that belongs to BD-R.

   To allow OISM PEs to interwork with non-OISM PEs, a given Tenant Domain needs to contain one or more "IP Multicast Gateways" (IPMGs). An IPMG is an OISM PE with special responsibilities regarding the interworking between OISM and non-OISM PEs.

   If a PE is functioning as an IPMG, it MUST signal this fact by setting the "IPMG" flag in the Multicast Flags EC that it attaches to its IMET routes. An IPMG SHOULD attach this EC with the IPMG flag set to all IMET routes it originates.
   However, if PE1 imports any IMET route from PE2 that has the EC present with the "IPMG" flag set, then PE1 will assume that PE2 is an IPMG.

   An IPMG Designated Forwarder (IPMG-DF) selection procedure is used to ensure that, at any given time, there is exactly one active IPMG-DF for any given BD. Details of the IPMG-DF selection procedure are in Section 5.1. The IPMG-DF for a given BD, say BD-S, has special functions to perform when it receives (S,G) frames on that BD:

   o  If the frames are from a non-OISM PE-S:

      *  The IPMG-DF forwards them to OISM PEs that do not attach to BD-S but have interest in (S,G).

         Note that OISM PEs that do attach to BD-S will have received the frames on the BUM tunnel from the non-OISM PE-S.

      *  The IPMG-DF forwards them to non-OISM PEs that have interest in (S,G) on ACs that do not belong to BD-S.

         Note that if a non-OISM PE has multiple BDs other than BD-S with interest in (S,G), it will receive one copy of the frame for each such BD. This is necessary because the non-OISM PEs cannot move IP multicast traffic from one BD to another.

   o  If the frames are from an OISM PE, the IPMG-DF forwards them to non-OISM PEs that have interest in (S,G) on ACs that do not belong to BD-S.

      If a non-OISM PE has interest in (S,G) on an AC belonging to BD-S, it will have received a copy of the (S,G) frame, encapsulated for BD-S, from the OISM PE-S. (See Section 3.2.2.) If the non-OISM PE has interest in (S,G) on one or more ACs belonging to BD-R1, ..., BD-Rk, where each BD-Ri is distinct from BD-S, the IPMG-DF needs to send it a copy of the frame for each BD-Ri.

   If an IPMG receives a frame on a BD for which it is not the IPMG-DF, it just follows normal OISM procedures.
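   The IPMG-DF duties listed above can be summarized as a small decision procedure. The following Python sketch is illustrative only. The "Peer" record, the BD names, and the "ipmg_df_copies" function are hypothetical. The sketch assumes that the IPMG-DF already knows, from received IMET and SMET routes, which peers are OISM PEs, which BDs each peer is attached to, and on which BDs each peer has interest in (S,G). Each returned tuple gives the target PE, the BD whose label the copy carries, and whether the IPMG-DF performs IP processing before sending the copy, following Sections 5.2.1 and 5.2.2.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# (target PE, BD whose label the copy carries, True if the IPMG-DF routes it)
Copy = Tuple[str, str, bool]


@dataclass(frozen=True)
class Peer:
    """Hypothetical view that an IPMG-DF keeps of another PE in the Tenant Domain."""
    name: str
    is_oism: bool                   # learned from the peer's IMET routes
    bds: FrozenSet[str]             # BDs to which the peer is attached
    interested_bds: FrozenSet[str]  # BDs on which the peer has ACs interested in (S,G)


def ipmg_df_copies(bd_s: str, ingress_is_oism: bool, peers: List[Peer]) -> List[Copy]:
    """Copies sent by the IPMG-DF for bd_s for one (S,G) frame received on bd_s."""
    copies: List[Copy] = []
    for peer in peers:
        if peer.is_oism:
            # Relay to OISM PEs only when PE-S is a non-OISM PE, and only to
            # those without BD-S. The copy carries the peer's SBD label and is
            # not routed here, because the receiving OISM PE does the IP processing.
            if not ingress_is_oism and bd_s not in peer.bds and peer.interested_bds:
                copies.append((peer.name, "SBD", False))
        else:
            # Non-OISM PEs cannot move traffic between BDs. Send one routed
            # copy per interested BD other than BD-S, using that BD's IMET label.
            for bd_r in sorted(peer.interested_bds - {bd_s}):
                copies.append((peer.name, bd_r, True))
    return copies


if __name__ == "__main__":
    peers = [
        Peer("oism-a", True, frozenset({"BD1", "SBD"}), frozenset({"BD1"})),
        Peer("oism-b", True, frozenset({"BD2", "SBD"}), frozenset({"BD2"})),
        Peer("legacy-c", False, frozenset({"BD1", "BD2", "BD3"}),
             frozenset({"BD1", "BD2", "BD3"})),
    ]
    # [('oism-b', 'SBD', False), ('legacy-c', 'BD2', True), ('legacy-c', 'BD3', True)]
    print(ipmg_df_copies("BD1", ingress_is_oism=False, peers=peers))
    # [('legacy-c', 'BD2', True), ('legacy-c', 'BD3', True)]
    print(ipmg_df_copies("BD1", ingress_is_oism=True, peers=peers))
```

   The non-OISM targets are the same for both kinds of ingress PE. Only the relay to OISM PEs that lack BD-S depends on the ingress PE being a non-OISM PE.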
   This section specifies several sets of procedures:

   o  the procedures that the IPMG-DF for a given BD needs to follow when receiving, on that BD, an IP multicast frame from a non-OISM PE;

   o  the procedures that the IPMG-DF for a given BD needs to follow when receiving, on that BD, an IP multicast frame from an OISM PE;

   o  the procedures that an OISM PE needs to follow when receiving, on a given BD, an IP multicast frame from a non-OISM PE, when the OISM PE is not the IPMG-DF for that BD.

   To enable OISM/non-OISM interworking in a given Tenant Domain, the Tenant Domain MUST have some EVPN-PEs that can function as IPMGs. An IPMG must be configured with the SBD. It must also be configured with every BD of the Tenant Domain that exists on any of the non-OISM PEs of that domain. (Operationally, it may be simpler to configure the IPMG with all the BDs of the Tenant Domain.) A non-OISM PE, of course, only needs to be configured with BDs for which it has ACs. An OISM PE that is not an IPMG only needs to be configured with the SBD and with the BDs for which it has ACs.

   An IPMG MUST originate a wildcard SMET route (with (C-*,C-*) in the NLRI) for each BD in the Tenant Domain. This will cause it to receive all the IP multicast traffic that is sourced in the Tenant Domain. Note that non-OISM nodes that do not support [I-D.ietf-bess-evpn-igmp-mld-proxy] will send all the multicast traffic from a given BD to all PEs attached to that BD, even if those PEs do not originate an SMET route.

   The interworking procedures vary somewhat depending upon whether packets are transmitted from PE to PE via Ingress Replication (IR) or via Point-to-Multipoint (P2MP) tunnels. We do not consider the use of BIER in this section, due to the low likelihood of there being a non-OISM PE that supports BIER.

5.1.  IPMG Designated Forwarder

   Every PE that is eligible for selection as an IPMG-DF for a particular BD originates both an IMET route for that BD and an SBD-IMET route. As stated in Section 5, these SBD-IMET routes carry a Multicast Flags EC with the IPMG Flag set.

   These SBD-IMET routes SHOULD also carry a DF Election EC. The DF Election EC and its use are specified in [RFC8584]. When the route is originated, the AC-DF bit in the DF Election EC SHOULD be set to zero. This bit is not used when selecting an IPMG-DF, i.e., it MUST be ignored by the receiver of an SBD-IMET route.

   In the context of a given Tenant Domain, to select the IPMG-DF for a particular BD, say BD1, the IPMGs of the Tenant Domain perform the following procedure:

   o  From the set of received SBD-IMET routes for the given Tenant Domain, determine the candidate set of PEs that support IPMG functionality for that domain.

   o  Eliminate from that candidate set any PEs from which an IMET route for BD1 has not been received.

   o  Select a DF Election algorithm as specified in [RFC8584]. Some of the possible algorithms can be found, e.g., in [RFC8584], [RFC7432], and [I-D.ietf-bess-evpn-pref-df].

   o  Apply the DF Election algorithm (see [RFC8584]) to the candidate set of PEs. The "winner" becomes the IPMG-DF for BD1.

   Note that even if a given PE supports MEG (Section 6.1.2) and/or PEG (Section 6.1.4) functionality, as well as IPMG functionality, its SBD-IMET routes carry only one DF Election EC.

5.2.  Ingress Replication

   The procedures of this section are used when Ingress Replication is used to transmit packets from one PE to another.

   When a non-OISM PE-S transmits a multicast frame from BD-S to another PE, PE-R, PE-S will use the encapsulation specified in the BD-S IMET route that was originated by PE-R.
   This encapsulation will include the label that appears in the "MPLS label" field of the PMSI Tunnel attribute (PTA) of the IMET route. If the tunnel type is VXLAN, the "label" is actually a Virtual Network Identifier (VNI); for other tunnel types, the label is an MPLS label. In either case, we will speak of the transmitted frames as carrying a label that was assigned to a particular BD by the PE-R to which the frame is being transmitted.

   To support OISM/non-OISM interworking, an OISM PE-R MUST originate, for each of its BDs, both an IMET route and an S-PMSI (C-*,C-*) A-D route. Note that even when IR is being used, interworking between OISM and non-OISM PEs requires the OISM PEs to follow the rules of Section 3.2.5.2, as modified below.

   Non-OISM PEs will not understand S-PMSI A-D routes. So when a non-OISM PE-S transmits an IP multicast frame with a particular source BD to an IPMG, it encapsulates the frame using the label specified in that IPMG's BD-S IMET route. (This is just the procedure of [RFC7432].)

   The (C-*,C-*) S-PMSI A-D route originated by a given OISM PE will have a PTA that specifies IR.

   o  If MPLS tunneling is being used, the MPLS label field SHOULD contain a non-zero value, and the LIR flag SHOULD be zero. (The case where the MPLS label field is zero or the LIR flag is set is outside the scope of this document.)

   o  If the tunnel encapsulation is VXLAN, the MPLS label field MUST contain a non-zero value, and the LIR flag MUST be zero.

   When an OISM PE-S transmits an IP multicast frame to an IPMG, it will use the label specified in that IPMG's (C-*,C-*) S-PMSI A-D route.

   When a PE originates both an IMET route and a (C-*,C-*) S-PMSI A-D route, the values of the MPLS label field in the respective PTAs must be distinct.
   Further, each MUST map uniquely (in the context of the originating PE) to the route's BD.

   As a result, an IPMG receiving an MPLS-encapsulated IP multicast frame can always tell by the label whether the frame's ingress PE is an OISM PE or a non-OISM PE. When an IPMG receives a VXLAN-encapsulated IP multicast frame, it may need to determine the identity of the ingress PE from the outer IP encapsulation; it can then determine whether the ingress PE is an OISM PE or a non-OISM PE by looking at the IMET route from that PE.

   Suppose an IPMG receives an IP multicast frame from another EVPN-PE in the Tenant Domain, and the IPMG is not the IPMG-DF for the frame's source BD. Then the IPMG performs only the ordinary OISM functions; it does not perform the IPMG-specific functions for that frame. In the remainder of this section, when we discuss the procedures applied by an IPMG when it receives an IP multicast frame, we are presuming that the source BD of the frame is a BD for which the IPMG is the IPMG-DF.

   We have two basic cases to consider: (1) a frame's ingress PE is a non-OISM node, and (2) a frame's ingress PE is an OISM node.

5.2.1.  Ingress PE is non-OISM

   In this case, a non-OISM PE, PE-S, has received an (S,G) multicast frame over an AC that is attached to a particular BD, BD-S. By virtue of normal EVPN procedures, PE-S has sent a copy of the frame to every PE-R (both OISM and non-OISM) in the Tenant Domain that is attached to BD-S. If the non-OISM node supports [I-D.ietf-bess-evpn-igmp-mld-proxy], only PEs that have expressed interest in (S,G) receive the frame. The IPMG will have expressed interest via a (C-*,C-*) SMET route and thus receives the frame.

   Any OISM PE (including an IPMG) receiving the frame will apply normal OISM procedures.
   As a result it will deliver the frame to any of its local ACs (in BD-S or in any other BD) that have interest in (S,G).

   An OISM PE that is also the IPMG-DF for a particular BD, say BD-S, has additional procedures that it applies to frames received on BD-S from non-OISM PEs:

   1.  When the IPMG-DF for BD-S receives an (S,G) frame from a non-OISM node, it MUST forward a copy of the frame to every OISM PE that is NOT attached to BD-S but has interest in (S,G). The copy sent to a given OISM PE-R must carry the label that PE-R has assigned to the SBD in an S-PMSI A-D route. The IPMG MUST NOT do any IP processing of the frame's IP payload. TTL decrement and other IP processing will be done by PE-R, per the normal OISM procedures. There is no need for the IPMG to include an ESI label in the frame's tunnel encapsulation, because it is already known that the frame's source BD has no presence on PE-R. There is also no need for the IPMG to modify the frame's MAC SA.

   2.  In addition, when the IPMG-DF for BD-S receives an (S,G) frame from a non-OISM node, it may need to forward copies of the frame to other non-OISM nodes. Before it does so, it MUST decapsulate the (S,G) packet, and do the IP processing (e.g., TTL decrement). Suppose PE-R is a non-OISM node that has an AC to BD-R, where BD-R is not the same as BD-S, and that AC has interest in (S,G). The IPMG must then encapsulate the (S,G) packet (after the IP processing has been done) in an ethernet header. The MAC SA field will have the MAC address of the IPMG's IRB interface to BD-R. The IPMG then sends the frame to PE-R. The tunnel encapsulation will carry the label that PE-R advertised in its IMET route for BD-R. There is no need to include an ESI label, as the source and destination BDs are known to be different.
       Note that if a non-OISM PE-R has several BDs (other than BD-S) with local ACs that have interest in (S,G), the IPMG will send it one copy for each such BD. This is necessary because the non-OISM PE cannot move packets from one BD to another.

   There may be deployment scenarios in which every OISM PE is configured with every BD that is present on any non-OISM PE. In such scenarios, the procedures of item 1 above will not actually result in the transmission of any packets. Hence, if it is known a priori that this deployment scenario exists for a given tenant domain, the procedures of item 1 above can be disabled.

5.2.2.  Ingress PE is OISM

   In this case, an OISM PE, PE-S, has received an (S,G) multicast frame over an AC that attaches to a particular BD, BD-S.

   By virtue of receiving all the IMET routes about BD-S, PE-S will know all the PEs attached to BD-S. By virtue of normal OISM procedures:

   o  PE-S will send a copy of the frame to every OISM PE-R (including the IPMG) in the Tenant Domain that is attached to BD-S and has interest in (S,G). The copy sent to a given PE-R carries the label that the PE-R has assigned to BD-S in its (C-*,C-*) S-PMSI A-D route.

   o  PE-S will also transmit a copy of the (S,G) frame to every OISM PE-R that has interest in (S,G) but is not attached to BD-S. The copy will contain the label that the PE-R has assigned to the SBD. (As in Section 5.2.1, an IPMG is assumed to have indicated interest in all multicast flows.)

   o  PE-S will also transmit a copy of the (S,G) frame to every non-OISM PE-R that is attached to BD-S. It does this using the label advertised by that PE-R in its IMET route for BD-S.

   The PE-Rs follow their normal procedures. An OISM PE that receives the (S,G) frame on BD-S applies the OISM procedures to deliver the frame to its local ACs, as necessary.
   A non-OISM PE that receives the (S,G) frame on BD-S delivers the frame only to its local BD-S ACs, as necessary.

   Suppose that a non-OISM PE-R has interest in (S,G) on a BD, BD-R, that is different than BD-S. If the non-OISM PE-R is attached to BD-S, the OISM PE-S will forward to it the original (S,G) multicast frame, but the non-OISM PE-R will not be able to send the frame to ACs that are not in BD-S. If PE-R is not even attached to BD-S, the OISM PE-S will not send it a copy of the frame at all, because PE-R is not attached to the SBD. In these cases, the IPMG needs to relay the (S,G) multicast traffic from OISM PE-S to non-OISM PE-R.

   When the IPMG-DF for BD-S receives an (S,G) frame from an OISM PE-S, it has to forward it to every non-OISM PE-R that has interest in (S,G) on a BD-R that is different than BD-S. The IPMG MUST decapsulate the IP multicast packet, do the IP processing, re-encapsulate it for BD-R (changing the MAC SA to the IPMG's own MAC address on BD-R), and send a copy of the frame to PE-R. Note that a given non-OISM PE-R will receive multiple copies of the frame if it has multiple BDs on which there is interest in the frame.

5.3.  P2MP Tunnels

   When IR is used to distribute the multicast traffic among the EVPN-PEs, the procedures of Section 5.2 ensure that there will be no duplicate delivery of multicast traffic. That is, no egress PE will ever send a frame twice on any given AC. If P2MP tunnels are being used to distribute the multicast traffic, it is necessary to have additional procedures to prevent duplicate delivery.

   At the present time, it is not clear that there will be a use case in which OISM nodes need to interwork with non-OISM nodes that use P2MP tunnels. If it is determined that there is such a use case, procedures for it will be included in a future revision of this document.

6.  Traffic to/from Outside the EVPN Tenant Domain

   In this section, we discuss scenarios where a multicast source outside a given EVPN Tenant Domain sends traffic to receivers inside the domain (as well as, possibly, to receivers outside the domain). This requires the OISM procedures to interwork with various layer 3 multicast routing procedures.

   We assume in this section that the Tenant Domain is not being used as an intermediate transit network for multicast traffic; that is, we do not consider the case where the Tenant Domain contains multicast routers that will receive traffic from sources outside the domain and forward the traffic to receivers outside the domain. The transit scenario is considered in Section 7.

   We can divide the non-transit scenarios into two classes:

   1.  One or more of the EVPN PE routers provide the functionality needed to interwork with layer 3 multicast routing procedures.

   2.  A single BD in the Tenant Domain contains external multicast routers ("tenant multicast routers"), and those tenant multicast routers are used to interwork, on behalf of the entire Tenant Domain, with layer 3 multicast routing procedures.

6.1.  Layer 3 Interworking via EVPN OISM PEs

6.1.1.  General Principles

   Sometimes it is necessary to interwork an EVPN Tenant Domain with an external layer 3 multicast domain (the "external domain"). This is needed to allow EVPN tenant systems to receive multicast traffic from sources ("external sources") outside the EVPN Tenant Domain. It is also needed to allow receivers ("external receivers") outside the EVPN Tenant Domain to receive traffic from sources inside the Tenant Domain.

   In order to allow interworking between an EVPN Tenant Domain and an external domain, one or more OISM PEs must be "L3 Gateways".
   An L3 Gateway participates both in the OISM procedures and in the L3 multicast routing procedures of the external domain.

   An L3 Gateway that has interest in receiving (S,G) traffic must be able to determine the best route to S. If an L3 Gateway has interest in (*,G), it must be able to determine the best route to G's RP. In these interworking scenarios, the L3 Gateway must be running a layer 3 unicast routing protocol. Via this protocol, it imports unicast routes (either IP routes or VPN-IP routes) from routers other than EVPN PEs. And since there may be multicast sources inside the EVPN Tenant Domain, the EVPN PEs also need to export, either as IP routes or as VPN-IP routes (depending upon the external domain), unicast routes to those sources.

   When selecting the best route to a multicast source or RP, an L3 Gateway might have a choice between an EVPN route and an IP/VPN-IP route. When such a choice exists, the L3 Gateway SHOULD always prefer the EVPN route. This will ensure that when traffic originates in the Tenant Domain and has a receiver in the Tenant Domain, the path to that receiver will remain within the EVPN Tenant Domain, even if the source is also reachable via a routed path. This also provides protection against sub-optimal routing that might occur if two EVPN PEs export IP/VPN-IP routes and each imports the other's IP/VPN-IP routes.

   Section 4.2 discusses the way layer 3 multicast states are constructed by OISM PEs. These layer 3 multicast states have IRB interfaces as their IIF and OIF list entries, and are the basis for interworking OISM with other layer 3 multicast procedures such as MVPN or PIM. From the perspective of the layer 3 multicast procedures running in a given L3 Gateway, an EVPN Tenant Domain is a set of IRB interfaces.
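   The route preference described above can be illustrated with a short Python sketch of how an L3 Gateway might choose a route to a multicast source or RP. The sketch makes one assumption: ordinary longest-prefix match is applied first, and the EVPN preference only breaks ties between routes of equal prefix length. This matches how Section 6.1.2.1.1 states the rule for MEGs. The "Route" record, its "origin" strings, and the "best_route_to" function are hypothetical. Only the standard "ipaddress" module is used.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical unicast route as seen by an L3 Gateway."""
    prefix: str    # e.g. "10.1.1.0/24"
    origin: str    # "evpn", "vpn-ip" or "ip" (illustrative labels only)
    next_hop: str


def best_route_to(addr: str, routes: Iterable[Route]) -> Optional[Route]:
    """Longest-prefix match; among equal-length matches, prefer an EVPN route."""
    ip = ipaddress.ip_address(addr)
    candidates = [r for r in routes if ip in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    # Compare prefix length first, then True (EVPN) beats False (IP/VPN-IP).
    return max(candidates,
               key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen,
                              r.origin == "evpn"))


if __name__ == "__main__":
    routes = [
        Route("10.1.0.0/16", "vpn-ip", "l3vpn-pe9"),
        Route("10.1.1.0/24", "vpn-ip", "meg-2"),  # another gateway's export
        Route("10.1.1.0/24", "evpn", "pe1"),
    ]
    print(best_route_to("10.1.1.5", routes).next_hop)  # pe1
    print(best_route_to("10.1.2.5", routes).next_hop)  # l3vpn-pe9
```

   The "meg-2" entry models the case above in which two EVPN PEs each import the other's IP/VPN-IP routes. The EVPN route still wins, so the path to the source stays inside the Tenant Domain.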
   When interworking an EVPN Tenant Domain with an external domain, the L3 Gateway's layer 3 multicast states will not only have IRB interfaces as IIF and OIF list entries, but also other "interfaces" that lead outside the Tenant Domain. For example, when interworking with MVPN, the multicast states may have MVPN tunnels as well as IRB interfaces as IIF or OIF list members. When interworking with PIM, the multicast states may have PIM-enabled non-IRB interfaces as IIF or OIF list members.

   As long as a Tenant Domain is not being used as an intermediate transit network for IP multicast traffic, it is not necessary to enable PIM on its IRB interfaces.

   In general, an L3 Gateway has the following responsibilities:

   o  It exports, to the external domain, unicast routes to those multicast sources in the EVPN Tenant Domain that are locally attached to the L3 Gateway.

   o  It imports, from the external domain, unicast routes to multicast sources that are in the external domain.

   o  It executes the procedures necessary to draw externally sourced multicast traffic that is of interest to locally attached receivers in the EVPN Tenant Domain. When such traffic is received, the traffic is sent down the IRB interfaces of the BDs on which the locally attached receivers reside.

   One of the L3 Gateways in a given Tenant Domain becomes the "DR" for the SBD. (See Section 6.1.2.4.) This L3 gateway has the following additional responsibilities:

   o  It exports, to the external domain, unicast routes to multicast sources in the EVPN Tenant Domain that are not locally attached to any L3 gateway.

   o  It imports, from the external domain, unicast routes to multicast sources that are in the external domain.
   o  It executes the procedures necessary to draw externally sourced multicast traffic that is of interest to receivers in the EVPN Tenant Domain that are not locally attached to an L3 gateway. When such traffic is received, the traffic is sent down the SBD IRB interface. OISM procedures already described in this document will then ensure that the IP multicast traffic gets distributed throughout the Tenant Domain to any EVPN PEs that have interest in it. Thus, to an OISM PE that is not an L3 gateway, the externally sourced traffic will appear to have been sourced on the SBD.

   In order for this to work, some special care is needed when an L3 gateway creates or modifies a layer 3 (*,G) multicast state. Suppose group G has both external sources (sources outside the EVPN Tenant Domain) and internal sources (sources inside the EVPN Tenant Domain). Section 4.2 states that when there are internal sources, the SBD IRB interface must not be added to the OIF list of the (*,G) state. Traffic from internal sources will already have been delivered to all the EVPN PEs that have interest in it. However, if the OIF list of the (*,G) state does not contain the SBD IRB interface, then traffic from external sources will not get delivered to other EVPN PEs.

   One way of handling this is the following. When an L3 gateway receives (S,G) traffic from other than an IRB interface, and the traffic corresponds to a layer 3 (*,G) state, the L3 gateway can create (S,G) state. The IIF will be set to the external interface over which the traffic is expected. The OIF list will contain the SBD IRB interface, as well as the IRB interfaces of any other BDs attached to the PEG DR that have locally attached receivers with interest in the (S,G) traffic. The (S,G) state will ensure that the external traffic is sent down the SBD IRB interface. The following
The following 2435 text will assume this procedure; however other implementation 2436 techniques may also be possible. 2438 If a particular BD is attached to several L3 Gateways, one of the L3 2439 Gateways becomes the DR for that BD. (See Section 6.1.2.4.) If the 2440 interworking scenario requires FHR functionality, it is generally the 2441 DR for a particular BD that is responsible for performing that 2442 functionality on behalf of the source hosts on that BD. (E.g., if 2443 the interworking scenario requires that PIM Register messages be sent 2444 by a FHR, the DR for a given BD would send the PIM Register messages 2445 for sources on that BD.) Note though that the DR for the SBD does 2446 not perform FHR functionality on behalf of external sources. 2448 An optional alternative is to have each L3 gateway perform FHR 2449 functionality for locally attached sources. Then the DR would only 2450 have to perform FHR functionality on behalf of sources that are 2451 locally attached to itself AND sources that are not attached to any 2452 L3 gateway. 2454 N.B.: If it is possible that more than one BD contains a tenant 2455 multicast router, then a PE receiving an SMET route for that BD MUST 2456 NOT reconstruct IGMP Join Reports from the SMET route, and MUST NOT 2457 transmit any such IGMP Join Reports on its local ACs attaching to 2458 that BD. Otherwise, multicast traffic may be duplicated. 2460 6.1.2. Interworking with MVPN 2462 In this section, we specify the procedures necessary to allow EVPN 2463 PEs running OISM procedures to interwork with L3VPN PEs that run BGP- 2464 based MVPN ([RFC6514]) procedures. 
   More specifically, the procedures herein allow a given EVPN Tenant Domain to become part of an L3VPN/MVPN, and support multicast flows where either:

   o  The source of a given multicast flow is attached to an ethernet segment whose BD is part of an EVPN Tenant Domain, and one or more receivers of the flow are attached to the network via L3VPN/MVPN. (Other receivers may be attached to the network via EVPN.)

   o  The source of a given multicast flow is attached to the network via L3VPN/MVPN, and one or more receivers of the flow are attached to an ethernet segment that is part of an EVPN Tenant Domain. (Other receivers may be attached via L3VPN/MVPN.)

   In this interworking model, existing L3VPN/MVPN PEs are unaware that certain sources or receivers are part of an EVPN Tenant Domain. The existing L3VPN/MVPN nodes run only their standard procedures and are entirely unaware of EVPN. Interworking is achieved by having some or all of the EVPN PEs function as L3 Gateways running L3VPN/MVPN procedures, as detailed in the following sub-sections.

   In this section, we assume that there are no tenant multicast routers on any of the EVPN-attached ethernet segments. (There may, of course, be multicast routers in the L3VPN.) Consideration of the case where there are tenant multicast routers is deferred until Section 7.

   To support MVPN/EVPN interworking, we introduce the notion of an MVPN/EVPN Gateway, or MEG.

   A MEG is an L3 Gateway (see Section 6.1.1) and hence is both an OISM PE and an L3VPN/MVPN PE. For a given EVPN Tenant Domain, it will have an IP-VRF. If the Tenant Domain is part of an L3VPN/MVPN, the IP-VRF also serves as an L3VPN VRF ([RFC4364]). The IRB interfaces of the IP-VRF are considered to be "VRF interfaces" of the L3VPN VRF. The L3VPN VRF may also have other local VRF interfaces that are not EVPN IRB interfaces.
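   A MEG's VRF contains both IRB interfaces and non-IRB VRF interfaces, for example MVPN tunnels. For this reason, both the SBD IRB rule of Section 4.2 and the external-source (S,G) state described in Section 6.1.1 apply to it. The following Python sketch shows one way to apply them. It follows the technique that Section 6.1.1 presents as one option among several. The interface names (such as "irb.sbd" and "mvpn-tunnel-1"), the "irb." naming convention, and all functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

ANY_IRB = "any-irb"  # hypothetical wildcard IIF of a (C-*,C-G) state
SBD_IRB = "irb.sbd"  # hypothetical name of the SBD IRB interface


@dataclass
class L3State:
    source: str                 # "*" for a (C-*,C-G) state
    group: str
    iif: str
    oifs: Set[str] = field(default_factory=set)


def is_irb(ifname: str) -> bool:
    # Assumed naming convention: IRB interfaces are named "irb.<BD>".
    return ifname == ANY_IRB or ifname.startswith("irb.")


def add_oif(state: L3State, ifname: str) -> bool:
    """Add ifname to the OIF list unless the Section 4.2 rule forbids it."""
    if ifname == SBD_IRB and is_irb(state.iif):
        # Traffic arriving on an IRB interface is internal to the Tenant
        # Domain and has already reached every interested EVPN PE.
        return False
    state.oifs.add(ifname)
    return True


def external_sg_state(star_g: L3State, source: str, ext_iif: str,
                      receiver_irbs: Set[str]) -> L3State:
    """(S,G) state for traffic matching star_g that arrives on a non-IRB interface."""
    if is_irb(ext_iif):
        raise ValueError("external traffic must arrive on a non-IRB interface")
    sg = L3State(source, star_g.group, ext_iif)
    add_oif(sg, SBD_IRB)  # allowed, because the IIF is not an IRB interface
    for irb in sorted(receiver_irbs):
        add_oif(sg, irb)
    return sg


if __name__ == "__main__":
    star_g = L3State("*", "233.252.0.1", ANY_IRB)
    add_oif(star_g, "irb.bd1")
    print(add_oif(star_g, SBD_IRB))  # False: SBD IRB kept out of (*,G)
    sg = external_sg_state(star_g, "198.51.100.7", "mvpn-tunnel-1", {"irb.bd1"})
    print(sorted(sg.oifs))           # ['irb.bd1', 'irb.sbd']
```

   In this example, the (*,G) state keeps the SBD IRB interface out of its OIF list. The (S,G) state created for the external source does include it, so externally sourced traffic goes down the SBD IRB interface and reaches the other EVPN PEs.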
   The VRF on the MEG will import VPN-IP routes ([RFC4364]) from other L3VPN Provider Edge (PE) routers. It will also export VPN-IP routes to other L3VPN PE routers. In order to do so, it must be appropriately configured with the Route Targets used in the L3VPN to control the distribution of the VPN-IP routes. These Route Targets will in general be different than the Route Targets used for controlling the distribution of EVPN routes, as there is no need to distribute EVPN routes to L3VPN-only PEs and no reason to distribute L3VPN/MVPN routes to EVPN-only PEs.

   Note that the RDs in the imported VPN-IP routes will not necessarily conform to the EVPN rules (as specified in [RFC7432]) for creating RDs. Therefore a MEG MUST NOT expect the RDs of the VPN-IP routes to be of any particular format other than what is required by the L3VPN/MVPN specifications.

   The VPN-IP routes that a MEG exports to L3VPN are subnet routes and/or host routes for the multicast sources that are part of the EVPN tenant domain. The exact set of routes that need to be exported is discussed in Section 6.1.2.2.

   Each IMET route originated by a MEG SHOULD carry a Multicast Flags Extended Community with the "MEG" flag set, indicating that the originator of the IMET route is a MEG. However, PE1 will consider PE2 to be a MEG if PE1 imports at least one IMET route from PE2 that carries the Multicast Flags EC with the MEG flag set.

   All the MEGs of a given Tenant Domain attach to the SBD of that domain, and one of them is selected to be the SBD's Designated Router (the "MEG SBD-DR") for the domain. The selection procedure is discussed in Section 6.1.2.4.

   In this model of operation, MVPN procedures and EVPN procedures are largely independent. In particular, there is no assumption that MVPN and EVPN use the same kind of tunnels.
Thus, no special procedures are needed to handle the common scenarios where, e.g., EVPN uses VXLAN tunnels but MVPN uses MPLS P2MP tunnels, or where EVPN uses Ingress Replication but MVPN uses MPLS P2MP tunnels.

Similarly, no special procedures are needed to prevent duplicate data delivery on ethernet segments that are multi-homed.

The MEG does have some special procedures (described below) for interworking between EVPN and MVPN; these concern the selection of the Upstream PE for a given multicast source, the exporting of VPN-IP routes, and the generation of MVPN C-multicast routes triggered by the installation of SMET routes.

6.1.2.1. MVPN Sources with EVPN Receivers

6.1.2.1.1. Identifying MVPN Sources

Consider a multicast source S. It is possible that a MEG will import both an EVPN unicast route to S and a VPN-IP route (or an ordinary IP route), where the prefix length of each route is the same. In order to draw (S,G) multicast traffic for any group G, the MEG SHOULD use the EVPN route rather than the VPN-IP or IP route to determine the "Upstream PE" (see Section 5 of [RFC6513]).

Doing so ensures that when an EVPN tenant system wants to receive a multicast flow from another EVPN tenant system, the traffic from the source to that receiver stays within the EVPN domain. This prevents problems that might arise if there is a unicast route via L3VPN to S but no multicast routers along the routed path. It also prevents problems that might arise from the fact that the MEGs will import each other's VPN-IP routes.

In Section 6.1.2.1.2, we describe the procedures to be used when the selected route to S is a VPN-IP route.

6.1.2.1.2. Joining a Flow from an MVPN Source

Consider a tenant system, R, on a particular BD, BD-R.
Suppose R wants to receive (S,G) multicast traffic, where source S is not attached to any PE in the EVPN Tenant Domain, but is attached to an MVPN PE.

o  Suppose R is on a singly homed ethernet segment of BD-R, and that segment is attached to PE1, where PE1 is a MEG. PE1 learns via IGMP/MLD listening that R is interested in (S,G). PE1 determines from its VRF that there is no route to S within the Tenant Domain (i.e., no EVPN RT-2 route with S's IP address), but that there is a route to S via L3VPN (i.e., the VRF contains a subnet or host route to S that was received as a VPN-IP route). PE1 thus originates (if it hasn't already) an MVPN C-multicast Source Tree Join(S,G) route. The route is constructed according to normal MVPN procedures.

   The layer 2 multicast state is constructed as specified in Section 4.1.

   In the layer 3 multicast state, the IIF is the appropriate MVPN tunnel, and the IRB interface to BD-R is added to the OIF list.

   When PE1 receives (S,G) traffic from the appropriate MVPN tunnel, it performs IP processing of the traffic and then sends the traffic down its IRB interface to BD-R. Following normal OISM procedures, the (S,G) traffic will be encapsulated for ethernet and sent out the AC to which R is attached.

o  Suppose R is on a singly homed ethernet segment of BD-R, and that segment is attached to PE1, where PE1 is an OISM PE but is NOT a MEG. PE1 learns via IGMP/MLD listening that R is interested in (S,G). PE1 follows normal OISM procedures, originating an SBD-SMET route for (S,G); this route will be received by all the MEGs of the Tenant Domain, including the MEG SBD-DR. The MEG SBD-DR can determine from PE1's IMET routes whether PE1 is itself a MEG. If PE1 is not a MEG, the MEG SBD-DR will originate (if it hasn't already) an MVPN C-multicast Source Tree Join(S,G) route.
   This will cause the MEG SBD-DR to receive (S,G) traffic on an MVPN tunnel.

   The layer 2 multicast state is constructed as specified in Section 4.1.

   In the layer 3 multicast state, the IIF is the appropriate MVPN tunnel, and the IRB interface to the SBD is added to the OIF list.

   When the MEG SBD-DR receives (S,G) traffic on an MVPN tunnel, it performs IP processing of the traffic and then sends the traffic down its IRB interface to the SBD. Following normal OISM procedures, the traffic will be encapsulated for ethernet and delivered to all PEs in the Tenant Domain that have interest in (S,G), including PE1.

o  If R is on a multi-homed ethernet segment of BD-R, one of the PEs attached to the segment will be its DF (following normal EVPN procedures), and the DF will know (via IGMP/MLD listening or the procedures of [I-D.ietf-bess-evpn-igmp-mld-proxy]) that a tenant system reachable via one of its local ACs to BD-R is interested in (S,G) traffic. The DF is responsible for originating an SBD-SMET route for (S,G), following normal OISM procedures. If the DF is a MEG, it MUST originate the corresponding MVPN C-multicast Source Tree Join(S,G) route; if the DF is not a MEG, the MEG SBD-DR MUST originate the C-multicast route when it receives the SMET route.

   Optionally, if the non-DF is a MEG, it MAY originate the corresponding MVPN C-multicast Source Tree Join(S,G) route. This will cause the traffic to flow to both the DF and the non-DF, but only the DF will forward the traffic out an AC. This allows for quicker recovery if the DF's local AC to R fails.

o  If R is attached to a non-OISM PE, it will receive the traffic via an IPMG, as specified in Section 5.
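The choice of which PE originates the MVPN C-multicast Source Tree Join(S,G), combined with the route preference of Section 6.1.2.1.1, can be sketched as follows. This is a hedged illustration under simplifying assumptions: a route is reduced to a prefix length, a kind label, and a next-hop PE; longest-prefix match is assumed to take precedence over the EVPN preference; and the optional Join from a non-DF MEG is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix_len: int
    kind: str            # "evpn", "vpn-ip" or "ip" (hypothetical labels)
    next_hop_pe: str


def select_route_to_source(routes: list[Route]) -> Optional[Route]:
    """Longest prefix wins; at equal length an EVPN route beats VPN-IP/IP."""
    if not routes:
        return None
    return max(routes, key=lambda r: (r.prefix_len, r.kind == "evpn"))


def join_originator(interested_pe: str, interested_pe_is_meg: bool,
                    meg_sbd_dr: str, routes_to_s: list[Route]) -> Optional[str]:
    """Return the PE that originates the C-multicast Source Tree Join(S,G).

    interested_pe is the PE that learned of R's interest: the PE attached
    to R's singly homed segment, or the DF of R's multi-homed segment.
    """
    best = select_route_to_source(routes_to_s)
    if best is None or best.kind == "evpn":
        return None              # S reached inside EVPN (or unknown): no MVPN Join
    if interested_pe_is_meg:
        return interested_pe     # the MEG joins directly
    return meg_sbd_dr            # triggered by the SBD-SMET route it receives
```

In this model the optional Join from a non-DF MEG would be an additional originator, not a replacement for the one returned here.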
If an EVPN-attached receiver is interested in (*,G) traffic, and if it is possible for there to be sources of (*,G) traffic that are attached only to L3VPN nodes, the MEGs will have to know the group-to-RP mappings. That will enable them to originate MVPN C-multicast Shared Tree Join(*,G) routes and to send them towards the RP. (Since we are assuming in this section that there are no tenant multicast routers attached to the EVPN Tenant Domain, the RP must be attached via L3VPN. Alternatively, the MEG itself could be configured to function as an RP for group G.)

The layer 2 multicast states are constructed as specified in Section 4.1.

In the layer 3 (*,G) multicast state, the IIF is the appropriate MVPN tunnel. A MEG will add to the (*,G) OIF list its IRB interfaces for any BDs containing locally attached receivers. If there are receivers attached to other EVPN PEs, then whenever (S,G) traffic from an external source matches a (*,G) state, the MEG will create (S,G) state, with the MVPN tunnel as the IIF, the OIF list copied from the (*,G) state, and the SBD IRB interface added to the OIF list. (Please see the discussion in Section 6.1.1 regarding the inclusion of the SBD IRB interface in a (*,G) state; the SBD IRB interface is used in the OIF list only for traffic from external sources.)

Normal MVPN procedures will then result in the MEG getting the (*,G) traffic from all the multicast sources for G that are attached via L3VPN. This traffic arrives on MVPN tunnels. When the MEG takes the traffic off these tunnels, it performs the IP processing. If there are any receivers on a given BD, BD-R, that are attached via local EVPN ACs, the MEG sends the traffic down its BD-R IRB interface. If there are any other EVPN PEs that are interested in the (*,G) traffic, the MEG sends the traffic down the SBD IRB interface.
Normal OISM procedures then distribute the traffic as needed to other EVPN PEs.

6.1.2.2. EVPN Sources with MVPN Receivers

6.1.2.2.1. General Procedures

Consider the case where an EVPN tenant system S is sending IP multicast traffic to group G, and there is a receiver R for the (S,G) traffic that is attached to the L3VPN but not to the EVPN Tenant Domain. (We assume in this document that the L3VPN/MVPN-only nodes will not have any special procedures to deal with the case where a source is inside an EVPN domain.)

In this case, an L3VPN PE through which R can be reached has to send an MVPN C-multicast Join(S,G) route to one of the MEGs that is attached to the EVPN Tenant Domain. For this to happen, the L3VPN PE must have imported a VPN-IP route for S (either a host route or a subnet route) from a MEG.

If a MEG determines that there is a multicast source transmitting on one of its ACs, the MEG SHOULD originate a VPN-IP host route for that source. This determination SHOULD be made by examining the IP multicast traffic that arrives on the ACs. (It MAY be made by provisioning.) A MEG SHOULD NOT export a VPN-IP host route for any IP address that is not known to be a multicast source (unless it has some other reason for exporting such a route). The VPN-IP host route for a given multicast source MUST be withdrawn if the source goes silent for a configurable period of time, or if it can be determined that the source is no longer reachable via a local AC.

A MEG SHOULD also originate a VPN-IP subnet route for each of the BDs in the Tenant Domain.

VPN-IP routes exported by a MEG must carry any attributes or extended communities that are required by L3VPN and MVPN.
In particular, a VPN-IP route exported by a MEG must carry a VRF Route Import Extended Community corresponding to the IP-VRF from which it is exported, and a Source AS Extended Community.

As a result, if S is attached to a MEG, the L3VPN nodes will direct their MVPN C-multicast Join routes to that MEG. Normal MVPN procedures will cause the traffic to be delivered to the L3VPN nodes. The layer 3 multicast state for (S,G) will have the MVPN tunnel on its OIF list. The IIF will be the IRB interface leading to the BD containing S.

If S is not attached to a MEG, the L3VPN nodes will direct their C-multicast Join routes to whichever MEG appears to be on the best route to S's subnet. Upon receiving the C-multicast Join, that MEG will originate an EVPN SMET route for (S,G). As a result, the MEG will receive the (S,G) traffic at layer 2 via the OISM procedures. The (S,G) traffic will be sent up the appropriate IRB interface, and the layer 3 MVPN procedures will ensure that the traffic is delivered to the L3VPN nodes that have requested it. The layer 3 multicast state for (S,G) will have the MVPN tunnel in the OIF list, and the IIF will be one of the following:

o  If S belongs to a BD that is attached to the MEG, the IIF will be the IRB interface to that BD;

o  Otherwise, the IIF will be the SBD IRB interface.

Note that this works even if S is attached to a non-OISM PE, per the procedures of Section 5.

6.1.2.2.2. Any-Source Multicast (ASM) Groups

Suppose the MEG SBD-DR learns that one of the PEs in its Tenant Domain is interested in (*,G) traffic, where G is an Any-Source Multicast (ASM) group. If there are no tenant multicast routers, the MEG SBD-DR SHOULD perform the "First Hop Router" (FHR) functionality for group G on behalf of the Tenant Domain, as described in [RFC7761].
This means that the MEG SBD-DR must know the identity of the Rendezvous Point (RP) for each group, must send Register messages to the RP, etc.

If the MEG SBD-DR is to be the FHR for the Tenant Domain, it must see all the multicast traffic that is sourced from within the domain and destined to an ASM group address. The MEG can ensure this by originating an SBD-SMET route for (*,*).

(As a possible optimization, an SBD-SMET route for (*, "any ASM group") may be defined in a future revision of this draft.)

In some deployment scenarios, it may be preferable for the MEG that receives the (S,G) traffic over an AC to be the one that provides the FHR functionality. This behavior is OPTIONAL. If this option is used, it MUST be ensured that the MEG SBD-DR does not provide the FHR functionality for (S,G) traffic from a source that is attached to another MEG; FHR functionality for (S,G) traffic from a particular source S MUST be provided by only a single router.

Other deployment scenarios are also possible. For example, one might want to configure the MEGs to themselves be RPs. In this case, the RPs would have to exchange with each other information about which sources are active. The method of exchanging such information is outside the scope of this document.

6.1.2.2.3. Source on Multihomed Segment

Suppose S is attached to a segment that is all-active multi-homed to PE1 and PE2. If S is transmitting to two groups, say G1 and G2, it is possible that PE1 will receive the (S,G1) traffic from S while PE2 receives the (S,G2) traffic from S.

This creates an issue for MVPN/EVPN interworking, because there is no way to cause L3VPN/MVPN nodes to select PE1 as the ingress PE for (S,G1) traffic while selecting PE2 as the ingress PE for (S,G2) traffic.
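The root of the issue is that an MVPN node chooses the Upstream PE from its unicast route to S alone; the group address plays no part in that choice. The following hypothetical Python sketch (the function names and the dictionaries standing in for routing state are illustrative assumptions) computes which groups end up with a selected Upstream PE that does not itself receive the (S,G) traffic over its AC.

```python
def upstream_pe(source: str, best_route_pe: dict[str, str]) -> str:
    """An MVPN node's Upstream PE choice depends only on the route to S."""
    return best_route_pe[source]


def mismatched_groups(source: str,
                      ingress_pe_by_group: dict[str, str],
                      best_route_pe: dict[str, str]) -> list[str]:
    """Groups whose (S,G) traffic arrives over an AC at a PE other than
    the Upstream PE that the MVPN nodes selected for S."""
    chosen = upstream_pe(source, best_route_pe)
    return sorted(g for g, pe in ingress_pe_by_group.items() if pe != chosen)
```

With S's route pointing at PE1, PE1 receiving (S,G1) and PE2 receiving (S,G2), the function returns `["G2"]`: whichever PE is selected, some group's traffic is not seen locally by that PE.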
However, the following procedure ensures that the IP multicast traffic will still flow, even if the L3VPN/MVPN nodes pick the "wrong" EVPN PE as the Upstream PE for (say) the (S,G1) traffic.

Suppose S is on an ethernet segment, belonging to BD1, that is multi-homed to both PE1 and PE2, where PE1 is a MEG. And suppose that IP multicast traffic from S to G travels over the AC that attaches the segment to PE2. If PE1 receives a C-multicast Source Tree Join(S,G) route, it MUST originate an SMET route for (S,G). Normal OISM procedures will then cause PE2 to send the (S,G) traffic to PE1 on an EVPN IP multicast tunnel. Normal OISM procedures will also cause PE1 to send the (S,G) traffic up its BD1 IRB interface. Normal MVPN procedures will then cause PE1 to forward the traffic on an MVPN tunnel. In this case, the routing is not optimal, but the traffic does flow correctly.

6.1.2.3. Obtaining Optimal Routing of Traffic Between MVPN and EVPN

The routing of IP multicast traffic between MVPN nodes and EVPN nodes will be optimal as long as there is a MEG along the optimal route. There are various deployment strategies that can be used to obtain optimal routing between MVPN and EVPN.

In one such scenario, a Tenant Domain will have a small number of strategically placed MEGs. For example, a Data Center may have a small number of MEGs that connect it to a wide-area network. Then the optimal route into or out of the Data Center would be through the MEGs.

In this scenario, the MEGs do not need to originate VPN-IP host routes for the multicast sources; they only need to originate VPN-IP subnet routes. The internal structure of the EVPN is completely hidden from the MVPN nodes. EVPN actions such as MAC Mobility and Mass Withdrawal ([RFC7432]) have zero impact on the MVPN control plane.
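The export rules of Section 6.1.2.2.1, together with the subnet-only variant used by strategically placed MEGs, can be sketched as a small piece of state. This is a minimal, hypothetical Python model: it assumes IPv4 sources (hence /32 host routes), a silence period in seconds, and caller-supplied timestamps; it does not model reachability checks or the VRF Route Import and Source AS ECs the routes must carry.

```python
from dataclasses import dataclass, field


@dataclass
class MegExportState:
    silence_timeout: float                  # configurable silence period (s)
    bd_subnets: dict[str, str]              # BD name -> subnet prefix
    last_seen: dict[str, float] = field(default_factory=dict)

    def observe_source(self, src_ip: str, now: float) -> None:
        """Record multicast traffic from src_ip arriving on a local AC."""
        self.last_seen[src_ip] = now

    def routes_to_export(self, now: float,
                         subnet_routes_only: bool = False) -> set[str]:
        """Subnet routes for every BD, plus host routes for active sources."""
        routes = set(self.bd_subnets.values())
        if subnet_routes_only:
            return routes               # e.g., a few MEGs at a Data Center edge
        for src, seen in self.last_seen.items():
            if now - seen < self.silence_timeout:
                routes.add(f"{src}/32")  # silent sources drop out (withdrawn)
        return routes
```

Recomputing the exported set periodically and diffing it against the previous set would yield the advertisements and withdrawals; that step is left out of the sketch.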
While this deployment scenario provides optimal routing and has the least impact on the installed base of MVPN nodes, it does complicate network planning.

Another way of providing routing that is close to optimal is to turn each EVPN PE into a MEG. Then routing of MVPN-to-EVPN traffic is optimal. However, routing of EVPN-to-MVPN traffic is not guaranteed to be optimal when a source host is on a multi-homed ethernet segment (as discussed in Section 6.1.2.2.3).

The obvious disadvantage of this method is that it requires every EVPN PE to be a MEG.

The procedures specified in this document allow an operator to add MEG functionality to any subset of its EVPN OISM PEs. This allows an operator to make whatever trade-offs it deems appropriate between optimal routing and MEG deployment.

6.1.2.4. Selecting the MEG SBD-DR

Every PE that is eligible for selection as the MEG SBD-DR originates an SBD-IMET route. As stated in Section 5, these SBD-IMET routes carry a Multicast Flags EC with the MEG flag set.

These SBD-IMET routes SHOULD also carry a DF Election EC. The DF Election EC and its use are specified in [RFC8584]. When the route is originated, the AC-DF bit in the DF Election EC SHOULD be set to zero. This bit is not used when selecting a MEG SBD-DR, i.e., it MUST be ignored by the receiver of an SBD-IMET route.

In the context of a given Tenant Domain, to select the MEG SBD-DR, the MEGs of the Tenant Domain perform the following procedure:

o  From the set of received SBD-IMET routes for the given Tenant Domain, determine the candidate set of PEs that support MEG functionality for that domain.

o  Select a DF Election algorithm as specified in [RFC8584]. Some of the possible algorithms can be found, e.g., in [RFC7432], [RFC8584], and [I-D.ietf-bess-evpn-pref-df].
o  Apply the DF Election algorithm (see [RFC8584]) to the candidate set of PEs. The "winner" becomes the MEG SBD-DR.

Note that if a given PE supports IPMG (Section 5) or PEG (Section 6.1.4) functionality as well as MEG functionality, its SBD-IMET routes carry only one DF Election EC.

6.1.3. Interworking with 'Global Table Multicast'

If multicast service to the outside sources and/or receivers is provided via the BGP-based "Global Table Multicast" (GTM) procedures of [RFC7716], the procedures of Section 6.1.2 can easily be adapted for EVPN/GTM interworking. The way to adapt the MVPN procedures to GTM is explained in [RFC7716].

6.1.4. Interworking with PIM

As we have been discussing, there may be receivers in an EVPN Tenant Domain that are interested in multicast flows whose sources are outside the EVPN Tenant Domain. Or there may be receivers outside an EVPN Tenant Domain that are interested in multicast flows whose sources are inside the Tenant Domain.

If the outside sources and/or receivers are part of an MVPN, interworking procedures are covered in Section 6.1.2.

There are also cases where an external source or receiver is attached via IP, and the layer 3 multicast routing is done via PIM. In this case, the interworking between the "PIM domain" and the EVPN Tenant Domain is done at L3 Gateways that perform "PIM/EVPN Gateway" (PEG) functionality. A PEG is very similar to a MEG, except that its layer 3 multicast routing is done via PIM rather than via BGP.

If external sources or receivers for a given group are attached to a PEG via a layer 3 interface, that interface should be treated as a VRF interface attached to the Tenant Domain's L3VPN VRF. The layer 3 multicast routing instance for that Tenant Domain will either run PIM on the VRF interface or will listen for IGMP/MLD messages on that interface.
If the external receiver is attached elsewhere on an IP network, the PE has to enable PIM on its interfaces to the backbone network. In both cases, the PE needs to perform PEG functionality, and its IMET routes must carry the Multicast Flags EC with the PEG flag set.

For each BD on which there is a multicast source or receiver, one of the PEGs will become the PEG DR. DR selection can be done using the same procedures specified in Section 6.1.2.4, except with "PEG" substituted for "MEG".

As long as there are no tenant multicast routers within the EVPN Tenant Domain, the PEGs do not need to run PIM on their IRB interfaces.

6.1.4.1. Source Inside EVPN Domain

If a PEG receives a PIM Join(S,G) from outside the EVPN Tenant Domain, it may find it necessary to create (S,G) state. The PE needs to determine whether S is within the Tenant Domain. If S is not within the EVPN Tenant Domain, the PE carries out normal layer 3 multicast routing procedures. If S is within the EVPN Tenant Domain, the IIF of the (S,G) state is set as follows:

o  If S is on a BD that is attached to the PE, the IIF is the PE's IRB interface to that BD;

o  If S is not on a BD that is attached to the PE, the IIF is the PE's IRB interface to the SBD.

When the PE creates such an (S,G) state, it MUST originate (if it hasn't already) an SBD-SMET route for (S,G). This will cause it to pull the (S,G) traffic via layer 2. When the traffic arrives over an EVPN tunnel, it is sent up an IRB interface, where the layer 3 multicast routing determines the packet's disposition. The SBD-SMET route is withdrawn when the (S,G) state no longer exists (unless there is some other reason for not withdrawing it).
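The IIF rule and the SBD-SMET origination and withdrawal just described can be sketched as follows. This hypothetical Python model (the names `PegState`, `on_external_join`, `on_state_removed`, the "IRB:<BD>" interface labels, and the table of tenant sources are assumptions) always withdraws the SBD-SMET route when the (S,G) state goes away, ignoring the "some other reason" exception in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PegState:
    local_bds: set[str]                     # BDs attached to this PEG
    tenant_sources: dict[str, str]          # source IP -> BD, inside the domain
    iif: dict[tuple[str, str], str] = field(default_factory=dict)
    sbd_smet: set[tuple[str, str]] = field(default_factory=set)

    def on_external_join(self, s: str, g: str) -> Optional[str]:
        """Handle an external PIM Join(S,G); return the IIF chosen."""
        bd = self.tenant_sources.get(s)
        if bd is None:
            return None    # S outside the domain: normal L3 multicast routing
        iif = f"IRB:{bd}" if bd in self.local_bds else "IRB:SBD"
        self.iif[(s, g)] = iif
        self.sbd_smet.add((s, g))   # originate SBD-SMET if not already done
        return iif

    def on_state_removed(self, s: str, g: str) -> None:
        """Delete (S,G) state and withdraw its SBD-SMET route."""
        self.iif.pop((s, g), None)
        self.sbd_smet.discard((s, g))
```

Because the SBD-SMET routes are kept in a set, a second Join for the same (S,G) does not originate a duplicate route, matching the "if it hasn't already" wording.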
If there are no tenant multicast routers within the EVPN Tenant Domain, there cannot be an RP in the Tenant Domain, so a PEG does not have to handle externally arriving PIM Join(*,G) messages.

The PEG DR for a particular BD MUST act as the First Hop Router for that BD. It will examine all (S,G) traffic on the BD, and whenever G is an ASM group, the PEG DR will send Register messages to the RP for G. This means that the PEG DR will need to pull all the (S,G) traffic originating on a given BD, by originating an SMET (*,*) route for that BD. If a PEG DR is the DR for all the BDs, it SHOULD originate just an SBD-SMET (*,*) route rather than an SMET (*,*) route for each BD.

The rules for exporting IP routes to multicast sources are the same as those specified for MEGs in Section 6.1.2.2, except that the exported routes will be IP routes rather than VPN-IP routes, and it is not necessary to attach the VRF Route Import EC or the Source AS EC.

When a source is on a multi-homed segment, the same issue discussed in Section 6.1.2.2.3 exists. Suppose S is on an ethernet segment, belonging to BD1, that is multi-homed to both PE1 and PE2, where PE1 is a PEG. And suppose that IP multicast traffic from S to G travels over the AC that attaches the segment to PE2. If PE1 receives an external PIM Join(S,G) message, it MUST originate an SMET route for (S,G). Normal OISM procedures will cause PE2 to send the (S,G) traffic to PE1 on an EVPN IP multicast tunnel. Normal OISM procedures will also cause PE1 to send the (S,G) traffic up its BD1 IRB interface. Normal PIM procedures will then cause PE1 to forward the traffic along a PIM tree. In this case, the routing is not optimal, but the traffic does flow correctly.

6.1.4.2. Source Outside EVPN Domain

By means of normal OISM procedures, a PEG learns whether there are receivers in the Tenant Domain that are interested in receiving (*,G) or (S,G) traffic. The PEG must determine whether S (or the RP for G) is outside the EVPN Tenant Domain. If so, and if there is a receiver on BD1 interested in receiving such traffic, the PEG DR for BD1 is responsible for originating a PIM Join(S,G) or Join(*,G) control message.

An alternative would be to allow any PEG that is directly attached to a receiver to originate the PIM Joins. Then the PEG DR would only have to originate PIM Joins on behalf of receivers that are not attached to a PEG. However, if this is done, it is necessary for the PEGs to run PIM on all their IRB interfaces, so that the PIM Assert procedures can be used to prevent duplicate delivery to a given BD.

The IIF for the layer 3 (S,G) or (*,G) state is determined by normal PIM procedures. If a receiver is on BD1, and the PEG DR is attached to BD1, its IRB interface to BD1 is added to the OIF list. This ensures that any receivers locally attached to the PEG DR will receive the traffic. If there are receivers attached to other EVPN PEs, then whenever (S,G) traffic from an external source matches a (*,G) state, the PEG will create (S,G) state. The IIF is set to whatever external interface the traffic is expected to arrive on (copied from the (*,G) state), the OIF list is copied from the (*,G) state, and the SBD IRB interface is added to the OIF list.

6.2. Interworking with PIM via an External PIM Router

Section 6.1 describes how to use an OISM PE router as the gateway to a non-EVPN multicast domain, when the EVPN Tenant Domain is not being used as an intermediate transit network for multicast.
An alternative approach is to have one or more external PIM routers (perhaps operated by a tenant) on one of the BDs of the Tenant Domain. We will refer to this BD as the "gateway BD".

In this model:

o  The EVPN Tenant Domain is treated as a stub network attached to the external PIM routers.

o  The external PIM routers follow normal PIM procedures, and provide the FHR and LHR functionality for the entire Tenant Domain.

o  The OISM PEs do not run PIM.

o  There MUST NOT be more than one gateway BD.

o  If an OISM PE not attached to the gateway BD has interest in a given multicast flow, it conveys that interest, following normal OISM procedures, by originating an SBD-SMET route for that flow.

o  If a PE attached to the gateway BD receives an SBD-SMET route, it may need to generate and transmit a corresponding IGMP/MLD Join out one or more of its ACs. (Procedures for generating an IGMP/MLD Join as a result of receiving an SMET route are given in [I-D.ietf-bess-evpn-igmp-mld-proxy].) The PE MUST know which BD is the gateway BD and MUST NOT transmit an IGMP/MLD Join to any other BD. Furthermore, even if a particular AC is part of that BD, the PE SHOULD NOT transmit an IGMP/MLD Join on that AC unless an external PIM router is attached via that AC.

   As a result, IGMP/MLD messages will be seen by the external PIM routers on the gateway BD, and those external PIM routers will send PIM Join messages externally as required. Traffic of the given multicast flow will then be received by one of the external PIM routers, and that traffic will be forwarded by that router to the gateway BD.

   The normal OISM procedures will then cause the given multicast flow to be tunneled to any PEs of the EVPN Tenant Domain that have interest in the flow.
   PEs attached to the gateway BD will see the flow as originating from the gateway BD; other PEs will see the flow as originating from the SBD.

o  An OISM PE attached to a gateway BD MUST set its layer 2 multicast state to indicate that each AC to the gateway BD has interest in all multicast flows. It MUST also originate an SMET route for (*,*). The procedures for originating SMET routes are discussed in Section 2.5.

   This will cause the OISM PEs attached to the gateway BD to receive all the IP multicast traffic that is sourced within the EVPN Tenant Domain, and to transmit that traffic to the gateway BD, where the external PIM routers will see it. This enables the external PIM routers to perform FHR functions on behalf of the entire Tenant Domain. (Of course, if the gateway BD has a multi-homed segment, only the PE that is the DF for that segment will transmit the multicast traffic to the segment.)

7. Using an EVPN Tenant Domain as an Intermediate (Transit) Network for Multicast Traffic

In this section, we consider the scenario where one or more BDs of an EVPN Tenant Domain are being used to carry IP multicast traffic for which the source and at least one receiver are not part of the Tenant Domain. That is, one or more BDs of the Tenant Domain are intermediate "links" of a larger multicast tree created by PIM.

We define a "tenant multicast router" as a multicast router, running PIM, that is:

1. attached to one or more BDs of the Tenant Domain, but

2. not an EVPN PE router.

In order for an EVPN Tenant Domain to be used as a transit network for IP multicast, one or more of its BDs must have tenant multicast routers, and an OISM PE that is attached to such a BD MUST be provisioned to enable PIM on its IRB interface to that BD. (This is true even if none of the tenant routers is on a segment attached to the PE.)
Further, all the OISM PEs (even ones not attached to a BD with tenant multicast routers) MUST be provisioned to enable PIM on their SBD IRB interfaces.

If PIM is enabled on a particular BD, the DR selection procedure of Section 6.1.2.4 MUST be replaced by the normal PIM DR Election procedure of [RFC7761]. Note that this may result in one of the tenant routers being selected as the DR, rather than one of the OISM PE routers. In this case, First Hop Router and Last Hop Router functionality will not be performed by any of the EVPN PEs.

A PIM control message on a particular BD is considered to be a link-local multicast message, and as such is sent transparently from PE to PE via the BUM tunnel for that BD. This is true whether the control message was received from an AC or from the local layer 3 routing instance via an IRB interface.

A PIM Join/Prune message contains three fields that are relevant to the present discussion:

o  Upstream Neighbor

o  Group Address (G)

o  Source Address (S), omitted in the case of (*,G) Join/Prune messages.

We will generally speak of a PIM Join as a "Join(S,G)" or a "Join(*,G)" message, and will use the term "Join(X,G)" to mean "either Join(S,G) or Join(*,G)". In the context of a Join(X,G), we will use the term "X" to mean "S in the case of (S,G), or G's RP in the case of (*,G)".

Suppose BD1 contains two tenant multicast routers, C1 and C2. Suppose C1 is on a segment attached to PE1, and C2 is on a segment attached to PE2. When C1 sends a PIM Join(X,G) to BD1, the Upstream Neighbor field might be set to PE1, PE2, or C2. C1 chooses the Upstream Neighbor based on its unicast routing. Typically, it will choose as the Upstream Neighbor the PIM router on BD1 that is "closest" (according to the unicast routing) to X.
Note that this will not necessarily be PE1. PE1 may not even be visible to the unicast routing algorithm used by the tenant routers. Even if it is, it is unlikely to be the PIM router that is closest to X. So we need to consider the following two cases:

1. C1 sends a PIM Join(X,G) to BD1, with PE1 as the Upstream Neighbor.

   PE1's PIM routing instance will see the Join arrive on the BD1 IRB interface. If X is not within the Tenant Domain, PE1 handles the Join according to normal PIM procedures. This will generally result in PE1 selecting an Upstream Neighbor and sending it a Join(X,G).

   If X is within the Tenant Domain, but is attached to some other PE, PE1 sends (if it hasn't already) an SBD-SMET route for (X,G). The IIF of the layer 3 (X,G) state will be the SBD IRB interface, and the OIF list will include the IRB interface to BD1.

   The SBD-SMET route will pull the (X,G) traffic to PE1, and the (X,G) state will result in the (X,G) traffic being forwarded to C1.

   If X is within the Tenant Domain, but is attached to PE1 itself, no SBD-SMET route is sent. The IIF of the layer 3 (X,G) state will be the IRB interface to X's BD, and the OIF list will include the IRB interface to BD1.

2. C1 sends a PIM Join(X,G) to BD1, with either PE2 or C2 as the Upstream Neighbor.

   PE1's PIM routing instance will see the Join arrive on the BD1 IRB interface. If neither X nor the Upstream Neighbor is within the Tenant Domain, PE1 handles the Join according to normal PIM procedures. This will NOT result in PE1 sending a Join(X,G).

   If either X or the Upstream Neighbor is within the Tenant Domain, PE1 sends (if it hasn't already) an SBD-SMET route for (X,G). The IIF of the layer 3 (X,G) state will be the SBD IRB interface, and the OIF list will include the IRB interface to BD1.
   The SBD-SMET route will pull the (X,G) traffic to PE1, and the (X,G) state will result in the (X,G) traffic being forwarded to C1.

8. IANA Considerations

IANA is requested to assign new flags in the "Multicast Flags Extended Community Flags" registry. These flags are:

o IPMG

o MEG

o PEG

o OISM SBD

o OISM-supported

9. Security Considerations

This document uses protocols and procedures defined in the normative references, and inherits the security considerations of those references.

This document adds flags or Extended Communities (ECs) to a number of BGP routes, in order to signal that particular nodes support the OISM, IPMG, MEG, and/or PEG functionalities that are defined in this document. Incorrect addition, removal, or modification of those flags and/or ECs will cause the procedures defined herein to malfunction, in which case loss or diversion of data traffic is possible.

10. Acknowledgements

The authors thank Vikram Nagarajan and Princy Elizabeth for their work on Section 6.2 and Section 3.2.3.1. The authors also benefited tremendously from discussions with Aldrin Isaac on EVPN multicast optimizations.

11. References

11.1. Normative References

[I-D.ietf-bess-evpn-bum-procedure-updates]
   Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A. Sajassi, "Updates on EVPN BUM Procedures", draft-ietf-bess-evpn-bum-procedure-updates-08 (work in progress), November 2019.

[I-D.ietf-bess-evpn-igmp-mld-proxy]
   Sajassi, A., Thoria, S., Mishra, M., Patel, K., Drake, J., and W. Lin, "IGMP and MLD Proxy for EVPN", draft-ietf-bess-evpn-igmp-mld-proxy-09 (work in progress), April 2021.

[I-D.ietf-bess-evpn-inter-subnet-forwarding]
   Sajassi, A., Salam, S., Thoria, S., Drake, J. E., and J.
   Rabadan, "Integrated Routing and Bridging in EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-13 (work in progress), February 2021.

[I-D.ietf-bess-evpn-optimized-ir]
   Rabadan, J., Sathappan, S., Lin, W., Katiyar, M., and A. Sajassi, "Optimized Ingress Replication solution for EVPN", draft-ietf-bess-evpn-optimized-ir-07 (work in progress), July 2020.

[I-D.ietf-bess-evpn-prefix-advertisement]
   Rabadan, J., Henderickx, W., Drake, J. E., Lin, W., and A. Sajassi, "IP Prefix Advertisement in EVPN", draft-ietf-bess-evpn-prefix-advertisement-11 (work in progress), May 2018.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC2236]
   Fenner, W., "Internet Group Management Protocol, Version 2", RFC 2236, DOI 10.17487/RFC2236, November 1997.

[RFC2710]
   Deering, S., Fenner, W., and B. Haberman, "Multicast Listener Discovery (MLD) for IPv6", RFC 2710, DOI 10.17487/RFC2710, October 1999.

[RFC3032]
   Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

[RFC4360]
   Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February 2006.

[RFC6625]
   Rosen, E., Ed., Rekhter, Y., Ed., Hendrickx, W., and R. Qiu, "Wildcards in Multicast VPN Auto-Discovery Routes", RFC 6625, DOI 10.17487/RFC6625, May 2012.

[RFC7153]
   Rosen, E. and Y. Rekhter, "IANA Registries for BGP Extended Communities", RFC 7153, DOI 10.17487/RFC7153, March 2014.

[RFC7432]
   Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

[RFC8174]
   Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[RFC8584]
   Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet VPN Designated Forwarder Election Extensibility", RFC 8584, DOI 10.17487/RFC8584, April 2019.

11.2. Informative References

[I-D.ietf-bess-evpn-pref-df]
   Rabadan, J., Sathappan, S., Przygienda, T., Lin, W., Drake, J., Sajassi, A., and S. Mohanty, "Preference-based EVPN DF Election", draft-ietf-bess-evpn-pref-df-07 (work in progress), March 2021.

[I-D.ietf-bier-evpn]
   Zhang, Z., Przygienda, A., Sajassi, A., and J. Rabadan, "EVPN BUM Using BIER", draft-ietf-bier-evpn-04 (work in progress), December 2020.

[RFC4364]
   Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006.

[RFC6513]
   Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February 2012.

[RFC6514]
   Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

[RFC7606]
   Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP UPDATE Messages", RFC 7606, DOI 10.17487/RFC7606, August 2015.

[RFC7716]
   Zhang, J., Giuliano, L., Rosen, E., Ed., Subramanian, K., and D. Pacella, "Global Table Multicast with BGP Multicast VPN (BGP-MVPN) Procedures", RFC 7716, DOI 10.17487/RFC7716, December 2015.

[RFC7761]
   Fenner, B., Handley, M., Holbrook, H., Kouvelas, I., Parekh, R., Zhang, Z., and L.
   Zheng, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March 2016.

[RFC8296]
   Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation for Bit Index Explicit Replication (BIER) in MPLS and Non-MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January 2018.

Appendix A. Integrated Routing and Bridging

This Appendix provides a short tutorial on the interaction of routing and bridging. First it shows the traditional model, where bridging and routing are performed in separate boxes. Then it shows the model specified in [I-D.ietf-bess-evpn-inter-subnet-forwarding], where a single box contains both routing and bridging functions. The latter model is presupposed in the body of this document.

Figure 1 shows a "traditional" router that only does routing and has no L2 bridging capabilities. There are two LANs, LAN1 and LAN2. LAN1 is realized by Switch1, LAN2 by Switch2. The router has an interface, "lan1", that attaches to LAN1 (via Switch1) and an interface, "lan2", that attaches to LAN2 (via Switch2). Each interface is configured, as an IP interface, with an IP address and a subnet mask.

           +-------+        +--------+        +-------+
           |       |    lan1|        |lan2    |       |
   H1 -----+Switch1+--------+ Router1+--------+Switch2+------H3
           |       |        |        |        |       |
   H2 -----|       |        |        |        |       |
           +-------+        +--------+        +-------+
     |_________________|                  |__________________|
            LAN1                                  LAN2

              Figure 1: Conventional Router with LAN Interfaces

IP traffic (unicast or multicast) that remains within a single subnet never reaches the router. For instance, if H1 emits an ethernet frame with H2's MAC address in the ethernet destination address field, the frame will go from H1 to Switch1 to H2, without ever reaching the router.
Since the frame is never seen by a router, the IP datagram within the frame remains entirely unchanged; e.g., its TTL is not decremented. The ethernet Source and Destination MAC addresses are not changed either.

If H1 wants to send a unicast IP datagram to H3, which is on a different subnet, H1 has to be configured with the IP address of a "default router". Let's assume that H1 is configured with an IP address of Router1 as its default router address. H1 compares H3's IP address with its own IP address and IP subnet mask, and determines that H3 is on a different subnet. So the packet has to be routed. H1 uses ARP to map Router1's IP address to a MAC address on LAN1. H1 then encapsulates the datagram in an ethernet frame, using Router1's MAC address as the destination MAC address, and sends the frame to Router1.

Router1 then receives the frame over its lan1 interface. Router1 sees that the frame is addressed to it, so it removes the ethernet encapsulation and processes the IP datagram. The datagram is not addressed to Router1, so it must be forwarded further. Router1 does a lookup of the datagram's IP destination field, and determines that the destination (H3) can be reached via Router1's lan2 interface. Router1 now performs the IP processing of the datagram: it decrements the IP TTL, adjusts the IP header checksum (if present), fragments the packet if necessary, etc. Then the datagram (or each of its fragments) is encapsulated in an ethernet header, with Router1's MAC address on LAN2 as the MAC Source Address, and H3's MAC address on LAN2 (which Router1 determines via ARP) as the MAC Destination Address. Finally, the packet is sent out the lan2 interface.

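As a rough illustration of the unicast steps above, the following Python sketch models H1's same-subnet test and Router1's per-hop rewrite of a frame. It is a hypothetical sketch, not part of any specification: the `Frame` type, the `arp_table` dictionary, and all addresses are invented for the example, and IP header checksum update and fragmentation are left out.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical ethernet frame carrying an IPv4 datagram (only the fields discussed)."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    ttl: int


def same_subnet(host_interface: str, dst_ip: str) -> bool:
    """Host-side test: is dst_ip inside the subnet given by the host's address and mask?"""
    return ipaddress.ip_address(dst_ip) in ipaddress.ip_interface(host_interface).network


def route(frame: Frame, out_mac: str, arp_table: Dict[str, str]) -> Optional[Frame]:
    """Router1's handling of a unicast frame addressed to its own MAC on lan1.

    The IP addresses are left unchanged; the TTL is decremented and both MAC
    addresses are rewritten for the outgoing LAN. Returns None if the TTL
    expires. Checksum update and fragmentation are not modeled.
    """
    if frame.ttl <= 1:
        return None  # decrementing would reach zero: the datagram is discarded
    return replace(
        frame,
        src_mac=out_mac,                   # Router1's MAC address on LAN2
        dst_mac=arp_table[frame.dst_ip],   # H3's MAC address, learned via ARP
        ttl=frame.ttl - 1,
    )


if __name__ == "__main__":
    h1 = "10.0.1.10/24"                        # hypothetical H1 address and mask
    print(same_subnet(h1, "10.0.1.20"))        # H2: True, so the frame is bridged
    print(same_subnet(h1, "10.0.2.30"))        # H3: False, so it is sent to Router1
    f = Frame("02:00:00:00:00:01", "02:00:00:00:01:01", "10.0.1.10", "10.0.2.30", 64)
    out = route(f, "02:00:00:00:02:01", {"10.0.2.30": "02:00:00:00:00:03"})
    print(out.ttl, out.src_mac, out.dst_mac)   # 63 02:00:00:00:02:01 02:00:00:00:00:03
```

The source and destination IP addresses pass through `route` unchanged; only the TTL and the ethernet header differ, which is what distinguishes routed delivery from the bridged H1-to-H2 case.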
If H1 has an IP multicast datagram to send (i.e., an IP datagram whose Destination Address field is an IP Multicast Address), it encapsulates it in an ethernet frame whose MAC Destination Address is computed from the IP Destination Address.

If H2 is a receiver for that multicast address, H2 will receive a copy of the frame, unchanged, from H1. The MAC Source Address in the ethernet encapsulation does not change, the IP TTL field does not get decremented, etc.

If H3 is a receiver for that multicast address, the datagram must be routed to H3. In order for this to happen, Router1 must be configured as a multicast router, and it must accept traffic sent to ethernet multicast addresses. Router1 will receive H1's multicast frame on its lan1 interface, will remove the ethernet encapsulation, and will determine how to dispatch the IP datagram based on Router1's multicast forwarding states. If Router1 knows that there is a receiver for the multicast datagram on LAN2, it makes a copy of the datagram, decrements the TTL (and performs any other necessary IP processing), then encapsulates the datagram in an ethernet frame for LAN2. The MAC Source Address for this frame will be Router1's MAC address on LAN2. The MAC Destination Address is computed from the IP Destination Address. Finally, the frame is sent out Router1's lan2 interface.

Figure 2 shows an Integrated Router/Bridge that supports the routing/bridging integration model of [I-D.ietf-bess-evpn-inter-subnet-forwarding].

           +------------------------------------------+
           |   Integrated Router/Bridge               |

           +-------+        +--------+        +-------+
           |       |    IRB1|   L3   |IRB2    |       |
   H1 -----+  BD1  +--------+Routing +--------+  BD2  +------H3
           |       |        |Instance|        |       |
   H2 -----|       |        |        |        |       |
           +-------+        +--------+        +-------+
    |___________________|                |____________________|
            LAN1                                  LAN2

                    Figure 2: Integrated Router/Bridge

In Figure 2, a single box contains one or more "L3 Routing Instances". The routing/forwarding tables of a given routing instance are known as an IP-VRF ([I-D.ietf-bess-evpn-inter-subnet-forwarding]). In the context of EVPN, it is convenient to think of each routing instance as representing the routing of a particular tenant. Each IP-VRF is attached to one or more interfaces.

When several EVPN PEs have a routing instance of the same Tenant Domain, those PEs advertise IP routes to the attached hosts. This is done as specified in [I-D.ietf-bess-evpn-inter-subnet-forwarding].

The integrated router/bridge shown in Figure 2 also attaches to a number of "Broadcast Domains" (BDs). Each BD performs the functions that are performed by the switches in Figure 1. To the L3 routing instance, each BD appears to be a LAN. The interface attaching a particular BD to a particular IP-VRF is known as an "IRB Interface". From the perspective of L3 routing, each BD is a subnet. Thus each IRB interface is configured with a MAC address (which is the router's MAC address on the corresponding LAN), as well as an IP address and subnet mask.

The integrated router/bridge shown in Figure 2 may have multiple ACs to each BD. These ACs are visible only to the bridging function, not to the routing instance. To the L3 routing instance, there is just one "interface" to each BD.

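The model just described can be sketched in Python, purely for illustration. The classes and the `classify` function below are hypothetical: an IRB interface holds the router's MAC address and its IP address and mask on one BD, ACs belong to the BD only, and an IP data frame is sent up the IRB interface if it is addressed to the IRB MAC address, or (when the routing instance is a multicast router) if it carries an ethernet multicast address such as one produced by the standard IPv4 mapping (prefix 01:00:5e followed by the low-order 23 bits of the group address). The sketch assumes IPv4 only, reflects the traditional behavior described in this Appendix rather than the modified OISM semantics, and does not model ARP or other control traffic.

```python
import ipaddress
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Action(Enum):
    BRIDGE = auto()            # frame stays on the BD (layer 2 only)
    ROUTE = auto()             # frame goes up the IRB interface to the L3 instance
    BRIDGE_AND_ROUTE = auto()  # multicast: delivered on the BD and also sent up the IRB


@dataclass
class IrbInterface:
    mac: str       # the router's MAC address on this BD
    address: str   # IP address and mask, e.g. "10.0.1.1/24"

    @property
    def subnet(self):
        # From the L3 point of view, the BD is this subnet.
        return ipaddress.ip_interface(self.address).network


@dataclass
class BroadcastDomain:
    name: str
    irb: IrbInterface
    acs: List[str] = field(default_factory=list)  # seen by bridging only, not by L3


def ipv4_multicast_mac(group: str) -> str:
    """01:00:5e followed by the low-order 23 bits of the IPv4 group address."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)


def is_group_mac(mac: str) -> bool:
    """True if the I/G bit (low-order bit of the first octet) is set."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def classify(bd: BroadcastDomain, dst_mac: str, multicast_router: bool) -> Action:
    """Decide what the integrated router/bridge does with an IP data frame on `bd`."""
    dst = dst_mac.lower()
    if dst == bd.irb.mac.lower():
        return Action.ROUTE
    if is_group_mac(dst) and multicast_router:
        return Action.BRIDGE_AND_ROUTE
    return Action.BRIDGE


if __name__ == "__main__":
    bd1 = BroadcastDomain("BD1", IrbInterface("02:00:00:00:01:01", "10.0.1.1/24"), ["ac1", "ac2"])
    print(bd1.irb.subnet)                                        # 10.0.1.0/24
    print(classify(bd1, "02:00:00:00:01:01", False))             # Action.ROUTE
    print(classify(bd1, "02:00:00:00:00:04", True))              # Action.BRIDGE
    print(classify(bd1, ipv4_multicast_mac("239.1.1.1"), True))  # Action.BRIDGE_AND_ROUTE
```

The `multicast_router` flag stands for the requirement, stated above for Router1, that the router be configured as a multicast router and accept frames sent to ethernet multicast addresses; without it, a multicast frame is simply bridged within the BD.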
If the L3 routing instance represents the IP routing of a particular tenant, the BDs attached to that routing instance are BDs belonging to that same tenant.

Bridging and routing now proceed exactly as in the case of Figure 1, except that BD1 replaces Switch1, BD2 replaces Switch2, interface IRB1 replaces interface lan1, and interface IRB2 replaces interface lan2.

It is important to understand that an IRB interface connects an L3 routing instance to a BD, NOT to a "MAC-VRF". (See [RFC7432] for the definition of "MAC-VRF".) A MAC-VRF may contain several BDs, as long as no MAC address appears in more than one BD. From the perspective of the L3 routing instance, each individual BD is an individual IP subnet; whether each BD has its own MAC-VRF or not is irrelevant to the L3 routing instance.

Figure 3 illustrates IRB when a pair of BDs (subnets) are attached to two different PE routers. In this example, each BD has two segments, and one segment of each BD is attached to each PE router.

           +------------------------------------------+
           |   Integrated Router/Bridges              |

           +-------+        +--------+        +-------+
           |       |    IRB1|        |IRB2    |       |
   H1 -----+  BD1  +--------+  PE1   +--------+  BD2  +------H3
           |(Seg-1)|        |(L3 Rtg)|        |(Seg-1)|
   H2 -----|       |        |        |        |       |
           +-------+        +--------+        +-------+
    |___________________|        |       |____________________|
            LAN1                 |                LAN2
                                 |
                                 |
           +-------+        +--------+        +-------+
           |       |    IRB1|        |IRB2    |       |
   H4 -----+  BD1  +--------+  PE2   +--------+  BD2  +------H5
           |(Seg-2)|        |(L3 Rtg)|        |(Seg-2)|
           |       |        |        |        |       |
           +-------+        +--------+        +-------+

          Figure 3: Integrated Router/Bridges with Distributed Subnet

If H1 needs to send an IP packet to H4, it determines from its IP address and subnet mask that H4 is on the same subnet as H1.
Although H1 and H4 are not attached to the same PE router, EVPN provides ethernet communication among all hosts that are on the same BD. H1 thus uses ARP to find H4's MAC address, and sends an ethernet frame with H4's MAC address in the Destination MAC address field. The frame is received at PE1, but since the Destination MAC address is not PE1's MAC address, PE1 assumes that the frame is to remain on BD1. Therefore the packet inside the frame is NOT decapsulated, and is NOT sent up the IRB interface to PE1's routing instance. Rather, standard EVPN intra-subnet procedures (as detailed in [RFC7432]) are used to deliver the frame to PE2, which then sends it to H4.

If H1 needs to send an IP packet to H5, it determines from its IP address and subnet mask that H5 is NOT on the same subnet as H1. Assuming that H1 has been configured with the IP address of PE1 as its default router, H1 sends the packet in an ethernet frame with PE1's MAC address in its Destination MAC Address field. PE1 receives the frame, and sees that the frame is addressed to it. PE1 thus sends the frame up its IRB1 interface to the L3 routing instance. Appropriate IP processing is done (e.g., TTL decrement). The L3 routing instance determines that the "next hop" for H5 is PE2, so the packet is encapsulated (e.g., in MPLS) and sent across the backbone to PE2's routing instance. PE2 will see that the packet's destination, H5, is on BD2 segment-2, and will send the packet down its IRB2 interface. This causes the IP packet to be encapsulated in an ethernet frame with PE2's MAC address (on BD2) in the Source Address field and H5's MAC address in the Destination Address field.

Note that if H1 has an IP packet to send to H3, the forwarding of the packet is handled entirely within PE1.
PE1's routing instance sees the packet arrive on its IRB1 interface, and then transmits the packet by sending it down its IRB2 interface.

Often, all the hosts in a particular Tenant Domain will be provisioned with the same value of the default router IP address. This IP address can be assigned, as an "anycast address", to all the EVPN PEs attached to that Tenant Domain. Thus although all hosts are provisioned with the same "default router address", the actual default router for a given host will be one of the PEs that is attached to the same ethernet segment as the host. This provisioning method ensures that IP packets from a given host are handled by the closest EVPN PE that supports IRB.

In the topology of Figure 3, one could imagine that H1 is configured with a default router address that belongs to PE2 but not to PE1. Inter-subnet routing would still work, but IP packets from H1 to H3 would then follow the non-optimal path H1-->PE1-->PE2-->PE1-->H3. Sending traffic on this sort of path, where it leaves a router and then comes back to the same router, is sometimes known as "hairpinning". Similarly, if PE2 supports IRB but PE1 does not, the same non-optimal path from H1 to H3 would have to be followed. To avoid hairpinning, each EVPN PE needs to support IRB.

It is worth pointing out the way IRB interfaces interact with multicast traffic. Referring again to Figure 3, suppose PE1 and PE2 are functioning as IP multicast routers. Suppose also that H3 transmits a multicast packet, and both H1 and H4 are interested in receiving that packet. PE1 will receive the packet from H3 via its IRB2 interface. The ethernet encapsulation from BD2 is removed, the IP header processing is done, and the packet is then reencapsulated for BD1, with PE1's MAC address in the MAC Source Address field.
Then the packet is sent down the IRB1 interface. Layer 2 procedures (as defined in [RFC7432]) would then be used to deliver a copy of the packet locally to H1, and remotely to H4.

Please be aware that this document modifies the semantics, described in the previous paragraph, of sending/receiving multicast traffic on an IRB interface. This is explained in Section 1.5.1 and subsequent sections.

Authors' Addresses

Wen Lin
Juniper Networks, Inc.
10 Technology Park Drive
Westford, Massachusetts 01886
United States

EMail: wlin@juniper.net

Zhaohui Zhang
Juniper Networks, Inc.
10 Technology Park Drive
Westford, Massachusetts 01886
United States

EMail: zzhang@juniper.net

John Drake
Juniper Networks, Inc.
1194 N. Mathilda Ave
Sunnyvale, CA 94089
United States

EMail: jdrake@juniper.net

Eric C. Rosen (editor)
Juniper Networks, Inc.
10 Technology Park Drive
Westford, Massachusetts 01886
United States

EMail: erosen52@gmail.com

Jorge Rabadan
Nokia
777 E. Middlefield Road
Mountain View, CA 94043
United States

EMail: jorge.rabadan@nokia.com

Ali Sajassi
Cisco Systems
170 West Tasman Drive
San Jose, CA 95134
United States

EMail: sajassi@cisco.com