BESS                                                              W. Lin
Internet-Draft                                                  Z. Zhang
Intended status: Standards Track                                J. Drake
Expires: April 27, 2018                                    E. Rosen, Ed.
                                                  Juniper Networks, Inc.
                                                              J. Rabadan
                                                                   Nokia
                                                              A. Sajassi
                                                           Cisco Systems
                                                        October 24, 2017


        EVPN Optimized Inter-Subnet Multicast (OISM) Forwarding
                    draft-lin-bess-evpn-irb-mcast-04

Abstract

   Ethernet VPN (EVPN) provides a service that allows a single Local
   Area Network (LAN), i.e., a single IP subnet, to be distributed over
   multiple sites.  The sites are interconnected by an IP or MPLS
   backbone.  Intra-subnet traffic (either unicast or multicast) always
   appears to the end users to be bridged, even when it is actually
   carried over the IP backbone.  When a single "tenant" owns multiple
   such LANs, EVPN also allows IP unicast traffic to be routed between
   those LANs.  This document specifies new procedures that allow
   inter-subnet IP multicast traffic to be routed among the LANs of a
   given tenant, while still making intra-subnet IP multicast traffic
   appear to be bridged.  These procedures can provide optimal routing
   of the inter-subnet multicast traffic, and do not require any such
   traffic to leave a given router and then reenter that same router.
   These procedures also accommodate IP multicast traffic that needs to
   travel to or from systems that are outside the EVPN domain.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Background
       1.1.1.  Segments, Broadcast Domains, and Tenants
       1.1.2.  Inter-BD (Inter-Subnet) IP Traffic
       1.1.3.  EVPN and IP Multicast
       1.1.4.  BDs, MAC-VRFs, and EVPN Service Models
     1.2.  Need for EVPN-aware Multicast Procedures
     1.3.  Additional Requirements That Must be Met by the Solution
     1.4.  Terminology
     1.5.  Model of Operation: Overview
       1.5.1.  Control Plane
       1.5.2.  Data Plane
   2.  Detailed Model of Operation
     2.1.  Supplementary Broadcast Domain
     2.2.  When is a Route About/For/From a Particular BD
     2.3.  Use of IRB Interfaces at Ingress PE
     2.4.  Use of IRB Interfaces at an Egress PE
     2.5.  Announcing Interest in (S,G)
     2.6.  Tunneling Frames from Ingress PE to Egress PEs
     2.7.  Advanced Scenarios
   3.  EVPN-aware Multicast Solution Control Plane
     3.1.  Supplementary Broadcast Domain (SBD) and Route Targets
     3.2.  Advertising the Tunnels Used for IP Multicast
       3.2.1.  Constructing SBD Routes
         3.2.1.1.  Constructing an SBD-IMET Route
         3.2.1.2.  Constructing an SBD-SMET Route
         3.2.1.3.  Constructing an SBD-SPMSI Route
       3.2.2.  Ingress Replication
       3.2.3.  Assisted Replication
       3.2.4.  BIER
       3.2.5.  Inclusive P2MP Tunnels
         3.2.5.1.  Using the BUM Tunnels as IP Multicast Inclusive
                   Tunnels
           3.2.5.1.1.  RSVP-TE P2MP
           3.2.5.1.2.  mLDP or PIM
         3.2.5.2.  Using Wildcard S-PMSI A-D Routes to Advertise
                   Inclusive Tunnels Specific to IP Multicast
       3.2.6.  Selective Tunnels
     3.3.  Advertising SMET Routes
   4.  Constructing Multicast Forwarding State
     4.1.  Layer 2 Multicast State
       4.1.1.  Constructing the OIF List
       4.1.2.  Data Plane: Applying the OIF List to an (S,G) Frame
         4.1.2.1.  Eligibility of an AC to Receive a Frame
         4.1.2.2.  Applying the OIF List
     4.2.  Layer 3 Forwarding State
   5.  Interworking with non-OISM EVPN-PEs
     5.1.  IPMG Designated Forwarder
     5.2.  Ingress Replication
       5.2.1.  Ingress PE is non-OISM
       5.2.2.  Ingress PE is OISM
     5.3.  P2MP Tunnels
   6.  Traffic to/from Outside the EVPN Tenant Domain
     6.1.  Layer 3 Interworking via EVPN OISM PEs
       6.1.1.  General Principles
       6.1.2.  Interworking with MVPN
         6.1.2.1.  MVPN Sources with EVPN Receivers
           6.1.2.1.1.  Identifying MVPN Sources
           6.1.2.1.2.  Joining a Flow from an MVPN Source
         6.1.2.2.  EVPN Sources with MVPN Receivers
           6.1.2.2.1.  General procedures
           6.1.2.2.2.  Any-Source Multicast (ASM) Groups
           6.1.2.2.3.  Source on Multihomed Segment
         6.1.2.3.  Obtaining Optimal Routing of Traffic Between MVPN
                   and EVPN
         6.1.2.4.  DR Selection
       6.1.3.  Interworking with 'Global Table Multicast'
       6.1.4.  Interworking with PIM
         6.1.4.1.  Source Inside EVPN Domain
         6.1.4.2.  Source Outside EVPN Domain
     6.2.  Interworking with PIM via an External PIM Router
   7.  Using an EVPN Tenant Domain as an Intermediate (Transit)
       Network for Multicast traffic
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Integrated Routing and Bridging
   Authors' Addresses

1.  Introduction

1.1.  Background

   Ethernet VPN (EVPN) [RFC7432] provides a Layer 2 VPN (L2VPN)
   solution, which allows an IP backbone provider to offer ethernet
   service to a set of customers, known as "tenants".

   In this section (as well as in [EVPN-IRB]), we provide some essential
   background information on EVPN.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.1.1.  Segments, Broadcast Domains, and Tenants

   One of the key concepts of EVPN is the Broadcast Domain (BD).  A BD
   is essentially an emulated ethernet.  Each BD belongs to a single
   tenant.  A BD typically consists of multiple ethernet "segments", and
   each segment may be attached to a different EVPN Provider Edge
   (EVPN-PE) router.  EVPN-PE routers are often referred to as "Network
   Virtualization Endpoints" or NVEs.  However, this document will use
   the term "EVPN-PE", or, when the context is clear, just "PE".

   In this document, we use the term "segment" to mean the same as
   "Ethernet Segment" or "ES" in [RFC7432].

   Attached to each segment are "Tenant Systems" (TSes).  A TS may be
   any type of system, physical or virtual, host or router, etc., that
   can attach to an ethernet.

   When two TSes are on the same segment, traffic between them does not
   pass through an EVPN-PE.  When two TSes are on different segments of
   the same BD, traffic between them does pass through an EVPN-PE.

   When two TSes, say TS1 and TS2, are on the same BD, then:

   o  If TS1 knows the MAC address of TS2, TS1 can send unicast ethernet
      frames to TS2.  TS2 will receive the frames unaltered.  That is,
      TS1's MAC address will be in the MAC Source Address field.
      If the frame contains an IP datagram, the IP header is not
      modified in any way during the transmission.

   o  If TS1 broadcasts an ethernet frame, TS2 will receive the
      unaltered frame.

   o  If TS1 multicasts an ethernet frame, TS2 will receive the
      unaltered frame, as long as TS2 has been provisioned to receive
      ethernet multicasts.

   When we say that TS2 receives an unaltered frame from TS1, we mean
   that the frame still contains TS1's MAC address, and that no
   alteration of the frame's payload has been done.

   EVPN allows a single segment to be attached to multiple PE routers.
   This is known as "EVPN multi-homing".  EVPN has procedures to ensure
   that a frame from a given segment, arriving at a particular PE
   router, cannot be returned to that segment via a different PE router.
   This is particularly important for multicast, because a frame
   arriving at a PE from a given segment will already have been seen by
   all systems on the segment that need to see it.  If the frame were
   sent back to the originating segment, receivers on that segment would
   receive the packet twice.  Even worse, the frame might be sent back
   to a PE, which could cause an infinite loop.

1.1.2.  Inter-BD (Inter-Subnet) IP Traffic

   If a given tenant has multiple BDs, the tenant may wish to allow IP
   communication among these BDs.  Such a set of BDs is known as an
   "EVPN Tenant Domain" or just a "Tenant Domain".

   If tenant systems TS1 and TS2 are not in the same BD, then they do
   not receive unaltered ethernet frames from each other.  In order for
   TS1 to send traffic to TS2, TS1 encapsulates an IP datagram inside an
   ethernet frame, and uses ethernet to send the frame to an IP router.
   The router decapsulates the IP datagram, does the IP processing, and
   re-encapsulates the datagram for ethernet.  The MAC source address
   field now has the MAC address of the router, not of TS1.
   The TTL field of the IP datagram should be decremented by exactly 1;
   this hides the structure of the provider's IP backbone from the
   tenants.

   EVPN accommodates the need for inter-BD communication within a Tenant
   Domain by providing an integrated L2/L3 service for unicast IP
   traffic.  EVPN's Integrated Routing and Bridging (IRB) functionality
   is specified in [EVPN-IRB].  Each BD in a Tenant Domain is assumed to
   be a single IP subnet, and each IP subnet within a given Tenant
   Domain is assumed to be a single BD.  EVPN's IRB functionality allows
   IP traffic to travel from one BD to another, and ensures that proper
   IP processing (e.g., TTL decrement) is done.

   A brief overview of IRB, including the notion of an "IRB interface",
   can be found in Appendix A.  As explained there, an IRB interface is
   a sort of virtual interface connecting an L3 routing instance to a
   BD.  A BD may have multiple attachment circuits (ACs) to a given PE,
   where each AC connects to a different ethernet segment of the BD.
   However, these ACs are not visible to the L3 routing function; from
   the perspective of an L3 routing instance, a PE has just one
   interface to each BD, viz., the IRB interface for that BD.

   The "L3 routing instance" depicted in Appendix A is associated with a
   single Tenant Domain, and may be thought of as an IP-VRF for that
   Tenant Domain.

1.1.3.  EVPN and IP Multicast

   [EVPN-IRB] and [EVPN_IP_Prefix] cover inter-subnet (inter-BD) IP
   unicast forwarding, but they do not cover inter-subnet IP multicast
   forwarding.

   [RFC7432] covers intra-subnet (intra-BD) ethernet multicast.  The
   intra-subnet ethernet multicast procedures of [RFC7432] are used for
   ethernet Broadcast traffic, for ethernet unicast traffic whose MAC
   Destination Address field contains an Unknown address, and for
   ethernet traffic whose MAC Destination Address field contains an
   ethernet Multicast MAC address.
   These three classes of traffic are known collectively as "BUM
   traffic" (Broadcast/UnknownUnicast/Multicast), and the procedures for
   handling BUM traffic are known as "BUM procedures".

   [IGMP-Proxy] extends the intra-subnet ethernet multicast procedures
   by adding procedures that are specific to, and optimized for, the use
   of IP multicast within a subnet.  However, that document does not
   cover inter-subnet IP multicast.

   The purpose of this document is to specify procedures for EVPN that
   provide optimized IP multicast functionality within an EVPN Tenant
   Domain.  This document also specifies procedures that allow IP
   multicast packets to be sourced from or destined to systems outside
   the Tenant Domain.  We refer to the entire set of these procedures as
   "OISM" (Optimized Inter-Subnet Multicast) procedures.

   In order to support the OISM procedures specified in this document,
   an EVPN-PE MUST also support [EVPN-IRB] and [IGMP-Proxy].

1.1.4.  BDs, MAC-VRFs, and EVPN Service Models

   [RFC7432] defines the notion of "MAC-VRF".  A MAC-VRF contains one or
   more "Bridge Tables" (see section 3 of [RFC7432] for a discussion of
   this terminology), each of which represents a single Broadcast
   Domain.

   In the IRB model (outlined in Appendix A) an L3 routing instance has
   one IRB interface per BD, NOT one per MAC-VRF.  The procedures of
   this document are intended to work with all the EVPN service models.
   This document does not distinguish between a "Broadcast Domain" and a
   "Bridge Table", and will use the terms interchangeably (or will use
   the acronym "BD" to refer to either).  The way the BDs are grouped
   into MAC-VRFs is not relevant to the procedures specified in this
   document.
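
   As an informal illustration of the relationship just described (it
   is not part of the specification), the following Python sketch models
   each MAC-VRF as a container of BDs and derives one IRB interface per
   BD.  The class name, the function name, and the "irb." interface
   naming are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MacVrf:
    """A MAC-VRF holding one or more bridge tables; each is one BD."""
    name: str
    bds: List[str]


def irb_interfaces(mac_vrfs: List[MacVrf]) -> List[str]:
    """Return one (hypothetical) IRB interface name per BD.

    The grouping of BDs into MAC-VRFs does not affect the result:
    the L3 routing instance sees one interface per BD.
    """
    return ["irb." + bd for vrf in mac_vrfs for bd in vrf.bds]


# Two MAC-VRFs, three BDs in total: three IRB interfaces, not two.
tenant = [MacVrf("vrf-a", ["BD1"]), MacVrf("vrf-b", ["BD2", "BD3"])]
interfaces = irb_interfaces(tenant)
```

   Regrouping the same three BDs into a different number of MAC-VRFs
   would leave the list of IRB interfaces unchanged.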

   Section 6 of [RFC7432] also defines several different EVPN service
   models:

   o  In the "vlan-based service", each MAC-VRF contains one "bridge
      table", where the bridge table corresponds to a particular Virtual
      LAN (VLAN).  (See section 3 of [RFC7432] for a discussion of this
      terminology.)  Thus each VLAN is treated as a BD.

   o  In the "vlan bundle service", each MAC-VRF contains one bridge
      table, where the bridge table corresponds to a set of VLANs.  Thus
      a set of VLANs is treated as constituting a single BD.

   o  In the "vlan-aware bundle service", each MAC-VRF may contain
      multiple bridge tables, where each bridge table corresponds to one
      BD.  If a MAC-VRF contains several bridge tables, then it
      corresponds to several BDs.

   The procedures of this document are intended to work for all these
   service models.

1.2.  Need for EVPN-aware Multicast Procedures

   Inter-subnet IP multicast among a set of BDs can be achieved, in a
   non-optimal manner, without any specific EVPN procedures.  For
   instance, if a particular tenant has n BDs among which it wants to
   send IP multicast traffic, it can simply attach a conventional
   multicast router to all n BDs.  Or more generally, as long as each BD
   has at least one IP multicast router, and the IP multicast routers
   communicate multicast control information with each other,
   conventional IP multicast procedures will work normally, and no
   special EVPN functionality is needed.

   However, that technique does not provide optimal routing for
   multicast.  In conventional multicast routing, for a given multicast
   flow, there is only one multicast router on each BD that is permitted
   to send traffic of that flow to the BD.  If that BD has receivers for
   a given flow, but the source of the flow is not on that BD, then the
   flow must pass through that multicast router.  This leads to the
   "hair-pinning" problem described (for unicast) in Appendix A.
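
   The cost of hair-pinning can be illustrated informally by counting
   how many times a flow crosses the backbone between PEs.  The Python
   sketch below assumes a single tenant multicast router that serves
   both the source BD and the receiver BD; the function and parameter
   names are hypothetical and are not part of any specification.

```python
def crossings_via_router(source_pe: str, router_pe: str,
                         receiver_pe: str) -> int:
    """Backbone crossings when the flow must pass through the tenant
    multicast router: source PE -> router's PE -> receiver's PE."""
    return int(source_pe != router_pe) + int(router_pe != receiver_pe)


def crossings_direct(source_pe: str, receiver_pe: str) -> int:
    """Backbone crossings when the source's PE can route the flow
    towards the receiver without visiting a tenant router."""
    return int(source_pe != receiver_pe)


# Source and receiver both behind PE1, tenant router behind PE2.
hairpinned = crossings_via_router("PE1", "PE2", "PE1")
direct = crossings_direct("PE1", "PE1")
```

   In this topology the conventional path crosses the backbone twice,
   whereas the direct path does not cross it at all.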

   For example, consider an (S,G) flow that is sourced by a TS S and
   needs to be received by TSes R1 and R2.  Suppose S is on a segment of
   BD1, R1 is on a segment of BD2, but both are attached to PE1.
   Suppose also that the tenant has a multicast router, attached to a
   segment of BD1 and to a segment of BD2.  However, the segments to
   which that router is attached are both attached to PE2.  Then the
   flow from S to R1 would have to follow the path:
   S-->PE1-->PE2-->Tenant Multicast Router-->PE2-->PE1-->R1.  Obviously,
   the path S-->PE1-->R1 would be preferred.

   Now suppose that there is a second receiver, R2.  R2 is attached to a
   third BD, BD3.  However, it is attached to a segment of BD3 that is
   attached to PE1.  And suppose also that the Tenant Multicast Router
   is attached to a segment of BD3 that attaches to PE2.  In this case,
   the Tenant Multicast Router will make two copies of the packet, one
   for BD2 and one for BD3.  PE2 will send both copies back to PE1.  Not
   only is the routing sub-optimal, but PE2 sends multiple copies of the
   same packet to PE1.  This is a further sub-optimality.

   This is only an example; many more examples of sub-optimal multicast
   routing can easily be given.  To eliminate sub-optimal routing and
   extra copies, it is necessary to have a multicast solution that is
   EVPN-aware, and that can use its knowledge of the internal structure
   of a Tenant Domain to ensure that multicast traffic gets routed
   optimally.  The procedures of this document allow us to avoid all
   such sub-optimalities when routing inter-subnet multicasts within a
   Tenant Domain.

1.3.  Additional Requirements That Must be Met by the Solution

   In addition to providing optimal routing of multicast flows within a
   Tenant Domain, the EVPN-aware multicast solution is intended to
   satisfy the following requirements:

   o  The solution must integrate well with the procedures specified in
      [IGMP-Proxy].  That is, an integrated set of procedures must
      handle both intra-subnet multicast and inter-subnet multicast.

   o  With regard to intra-subnet multicast, the solution MUST maintain
      the integrity of multicast ethernet service.  This means:

      *  If a source and a receiver are on the same subnet, the MAC
         source address (SA) of the multicast frame sent by the source
         will not get rewritten.

      *  If a source and a receiver are on the same subnet, no IP
         processing of the ethernet payload is done.  The IP TTL is not
         decremented, the header checksum is not changed, no
         fragmentation is done, etc.

   o  On the other hand, if a source and a receiver are on different
      subnets, the frame received by the receiver will not have the MAC
      Source address of the source, as the frame will appear to have
      come from a multicast router.  Also, proper processing of the IP
      header is done, e.g., TTL decrement by 1, header checksum
      modification, possibly fragmentation, etc.

   o  If a Tenant Domain contains several BDs, it MUST be possible for a
      multicast flow (even when the multicast group address is an "any
      source multicast" (ASM) address), to have sources in one of those
      BDs and receivers in one or more of the other BDs, without
      requiring the presence of any system performing PIM Rendezvous
      Point (RP) functions ([RFC7761]).  Multicast throughout a Tenant
      Domain must not require the tenant systems to be aware of any
      underlying multicast infrastructure.

   o  Sometimes a MAC address used by one TS on a particular BD is also
      used by another TS on a different BD.
      Inter-subnet routing of multicast traffic MUST NOT make any
      assumptions about the uniqueness of a MAC address across several
      BDs.

   o  If two EVPN-PEs attached to the same Tenant Domain both support
      the OISM procedures, each may receive inter-subnet multicasts from
      the other, even if the egress PE is not attached to any segment of
      the BD from which the multicast packets are being sourced.  It
      MUST NOT be necessary to provision the egress PE with knowledge of
      the ingress BD.

   o  There must be a procedure that allows EVPN-PE routers supporting
      OISM procedures to send/receive multicast traffic to/from EVPN-PE
      routers that support only [RFC7432], but that do not support the
      OISM procedures or even the procedures of [EVPN-IRB].  However,
      when interworking with such routers (which we call "non-OISM PE
      routers"), optimal routing may not be achievable.

   o  It MUST be possible to support scenarios in which multicast flows
      with sources inside a Tenant Domain have "external" receivers,
      i.e., receivers that are outside the domain.  It must also be
      possible to support scenarios where multicast flows with external
      sources (sources outside the Tenant Domain) have receivers inside
      the domain.

      This presupposes that unicast routes to multicast sources outside
      the domain can be distributed to EVPN-PEs attached to the domain,
      and that unicast routes to multicast sources within the domain can
      be distributed outside the domain.

      Of particular importance are the scenario in which the external
      sources and/or receivers are reachable via L3VPN/MVPN, and the
      scenario in which external sources and/or receivers are reachable
      via IP/PIM.

      The solution for external interworking MUST allow for deployment
      scenarios in which EVPN does not need to export a host route for
      every multicast source.

   o  The solution for external interworking must not presuppose that
      the same tunneling technology is used within both the EVPN domain
      and the external domain.  For example, MVPN interworking must be
      possible when MVPN is using MPLS P2MP tunneling, and EVPN is using
      Ingress Replication or VXLAN tunneling.

   o  The solution must not be overly dependent on the details of a
      small set of use cases, but must be adaptable to new use cases as
      they arise.  (That is, the solution must be robust.)

1.4.  Terminology

   In this document we make frequent use of the following terminology:

   o  OISM: Optimized Inter-Subnet Multicast.  EVPN-PEs that follow the
      procedures of this document will be known as "OISM" PEs.  EVPN-PEs
      that do not follow the procedures of this document will be known
      as "non-OISM" PEs.

   o  IP Multicast Packet: An IP packet whose IP Destination Address
      field is a multicast address that is not a link-local address.
      (Link-local addresses are IPv4 addresses in the 224.0.0.0/24 range
      and IPv6 addresses in the FF02::/16 range.)

   o  IP Multicast Frame: An ethernet frame whose payload is an IP
      multicast packet (as defined above).

   o  (S,G) Multicast Packet: An IP multicast packet whose IP Source
      Address field contains S and whose IP Destination Address field
      contains G.

   o  (S,G) Multicast Frame: An IP multicast frame whose payload
      contains S in its IP Source Address field and G in its IP
      Destination Address field.

   o  Broadcast Domain (BD): An emulated ethernet, such that two systems
      on the same BD will receive each other's link-local broadcasts.

      Note that EVPN supports models in which a single EVPN Instance
      (EVI) contains only one BD, and models in which a single EVI
      contains multiple BDs.  Both models are supported by this draft.
      However, a given BD belongs to only one EVI.

   o  Designated Forwarder (DF).
      As defined in [RFC7432], an ethernet segment may be multi-homed
      (attached to more than one PE).  An ethernet segment may also
      contain multiple BDs, of one or more EVIs.  For each such EVI, one
      of the PEs attached to the segment becomes that EVI's DF for that
      segment.  Since a BD may belong to only one EVI, we can speak
      unambiguously of the BD's DF for a given segment.

      When the text makes it clear that we are speaking in the context
      of a given BD, we will frequently use the term "a segment's DF" to
      mean the given BD's DF for that segment.

   o  AC: Attachment Circuit.  An AC connects the bridging function of
      an EVPN-PE to an ethernet segment of a particular BD.  ACs are not
      visible at the router (L3) layer.

   o  L3 Gateway: An L3 Gateway is a PE that connects an EVPN Tenant
      Domain to an external multicast domain by performing both the OISM
      procedures and the Layer 3 multicast procedures of the external
      domain.

   o  PEG (PIM/EVPN Gateway): An L3 Gateway that connects an EVPN Tenant
      Domain to an external multicast domain whose Layer 3 multicast
      procedures are those of PIM ([RFC7761]).

   o  MEG (MVPN/EVPN Gateway): An L3 Gateway that connects an EVPN
      Tenant Domain to an external multicast domain whose Layer 3
      multicast procedures are those of MVPN ([RFC6513], [RFC6514]).

   o  IPMG (IP Multicast Gateway): A PE that is used for interworking
      OISM EVPN-PEs with non-OISM EVPN-PEs.

   o  DR (Designated Router): A PE that has special responsibilities for
      handling multicast on a given BD.

   o  Use of the "C-" prefix.  In many documents on VPN multicast, the
      prefix "C-" appears before any address or wildcard that refers to
      an address or addresses in a tenant's address space, rather than
      to an address or addresses in the address space of the backbone
      network.
      This document omits the "C-" prefix in many cases where it is
      clear from the context that the reference is to the tenant's
      address space.

   This document also assumes familiarity with the terminology of
   [RFC4364], [RFC6514], [RFC7432], [RFC7761], [IGMP-Proxy],
   [EVPN_IP_Prefix] and [EVPN-BUM].

1.5.  Model of Operation: Overview

1.5.1.  Control Plane

   In this section, and in the remainder of this document, we assume the
   reader is familiar with the procedures of IGMP/MLD (see [RFC2236] and
   [RFC2710]), by which hosts announce their interest in receiving
   particular multicast flows.

   Consider a Tenant Domain consisting of a set of k BDs: BD1, ..., BDk.
   To support the OISM procedures, each Tenant Domain must also be
   associated with a "Supplementary Broadcast Domain" (SBD).  An SBD is
   treated in the control plane as a real BD, but it does not have any
   ACs.  The SBD has several uses, which are described later in this
   document.  (See Section 2.1.)

   Each PE that attaches to one or more of the BDs in a given Tenant
   Domain will be provisioned to recognize that those BDs are part of
   the same Tenant Domain.  Note that a given PE does not need to be
   configured with all the BDs of a given Tenant Domain.  In general, a
   PE will only be attached to a subset of the BDs in a given Tenant
   Domain, and will be configured only with that subset of BDs.
   However, each PE attached to a given Tenant Domain must be configured
   with the SBD for that Tenant Domain.

   Suppose a particular segment of a particular BD is attached to PE1.
   [RFC7432] specifies that PE1 must originate an Inclusive Multicast
   Ethernet Tag (IMET) route for that BD, and that the IMET route must
   be propagated to all other PEs attached to the same BD.
   If the given segment contains a host that has interest in receiving a
   particular multicast flow, either an (S,G) flow or a (*,G) flow, PE1
   will learn of that interest by participating in the IGMP/MLD
   procedures, as specified in [IGMP-Proxy].  In this case, we will say
   that:

   o  PE1 is interested in receiving the flow;

   o  The AC attaching the interested host to PE1 is also said to be
      interested in the flow;

   o  The BD containing an AC that is interested in a particular flow is
      also said to be interested in that flow.

   Once PE1 determines that it has interest in receiving a particular
   flow or set of flows, it uses the procedures of [IGMP-Proxy] to
   advertise its interest in those flows.  It advertises its interest in
   a given flow by originating a Selective Multicast Ethernet Tag (SMET)
   route.  An SMET route is propagated to the other PEs that attach to
   the same BD.

   OISM PEs MUST follow the procedures of [IGMP-Proxy].  In this
   document, we extend the procedures of [IGMP-Proxy] so that IMET and
   SMET routes for a particular BD are distributed not just to PEs that
   attach to that BD, but to PEs that attach to any BD in the Tenant
   Domain.

   In this way, each PE attached to a given Tenant Domain learns, from
   each other PE attached to the same Tenant Domain, the set of flows
   that are of interest to each of those other PEs.

   An OISM PE that is provisioned with several BDs in the same Tenant
   Domain may originate an IMET route for each such BD.  To indicate its
   support of [IGMP-Proxy], it MUST attach the EVPN Multicast Flags
   Extended Community to each such IMET route.

   Suppose PE1 is provisioned with both BD1 and BD2, and is provisioned
   to consider them to be part of the same Tenant Domain.  It is
   possible that PE1 will receive from PE2 both an IMET route for BD1
   and an IMET route for BD2.
If either of these IMET routes has the 605 EVPN Multicast Flags Extended Community, PE1 MUST assume that PE2 is 606 supporting the procedures of [IGMP-Proxy] for ALL BDs in the Tenant 607 Domain. 609 If a PE supports OISM functionality, it MUST indicate that by 610 attaching an "OISM-supported" flag or Extended Community (EC) to all 611 its IMET routes. (Details to be specified in next revision.) An 612 OISM PE SHOULD attach this flag or EC to all the IMET routes it 613 originates. However, if PE1 imports IMET routes from PE2, and at 614 least one of PE2's IMET routes indicates that PE2 is an OISM PE, PE1 615 will assume that PE2 is following OISM procedures. 617 1.5.2. Data Plane 619 Suppose PE1 has an AC to a segment in BD1, and PE1 receives from that 620 AC an (S,G) multicast frame (as defined in Section 1.4). 622 There may be other ACs of PE1 on which TSes have indicated an 623 interest (via IGMP/MLD) in receiving (S,G) multicast packets. PE1 is 624 responsible for sending the received multicast packet out those ACs. 625 There are two cases to consider: 627 o Intra-Subnet Forwarding: In this case, an attachment AC with 628 interest in (S,G) is connected to a segment that is part of the 629 source BD, BD1. If the segment is not multi-homed, or if PE1 is 630 the Designated Forwarder (DF) (see [RFC7432]) for that segment, 631 PE1 sends the multicast frame on that AC without changing the MAC 632 SA. The IP header is not modified at all; in particular, the TTL 633 is not decremented. 635 o Inter-Subnet Forwarding: An AC with interest in (S,G) is connected 636 to a segment of BD2, where BD2 is different than BD1. If PE1 is 637 the DF for that segment (or if the segment is not multi-homed), 638 PE1 decapsulates the IP multicast packet, performs any necessary 639 IP processing (including TTL decrement), then re-encapsulates the 640 packet appropriately for BD2. PE1 then sends the packet on the 641 AC. 
Note that after re-encapsulation, the MAC SA will be PE1's 642 MAC address on BD2. The IP TTL will have been decremented by 1. 644 In addition, there may be other PEs that are interested in (S,G) 645 traffic. Suppose PE2 is such a PE. Then PE1 tunnels a copy of the 646 IP multicast frame (with its original MAC SA, and with no alteration 647 of the payload's IP header). The tunnel encapsulation contains 648 information that PE2 can use to associate the frame with a source BD. 649 If the source BD is BD1: 651 o If PE2 is attached to BD1, the tunnel encapsulation used to send 652 the frame to PE2 will cause PE2 to identify BD1 as the source BD. 654 o If PE2 is not attached to BD1, the tunnel encapsulation used to 655 send the frame to PE2 will cause PE2 to identify the SBD as the 656 source BD. 658 The way in which the tunnel encapsulation identifies the source BD is 659 of course dependent on the type of tunnel that is used. This will be 660 specified later in this document. 662 When PE2 receives the tunneled frame, it will forward it on any of 663 its ACs that have interest in (S,G). 665 If PE2 determines from the tunnel encapsulation that the source BD is 666 BD1, then 668 o For those ACs that connect PE2 to BD1, the intra-subnet forwarding 669 procedure described above is used, except that it is now PE2, not 670 PE1, carrying out that procedure. Unmodified EVPN procedures from 671 [RFC7432] are used to ensure that a packet originating from a 672 multi-homed segment is never sent back to that segment. 674 o For those ACs that do not connect to BD1, the inter-subnet 675 forwarding procedure described above is used, except that it is 676 now PE2, not PE1, carrying out that procedure. 678 If the tunnel encapsulation identifies the source BD as the SBD, PE2 679 applies the inter-subnet forwarding procedures described above to all 680 of its ACs that have interest in the flow. 
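The egress-side behavior described above can be summarized in a short sketch. It is an illustration only, not part of the specification: the names AC, SBD and egress_actions are hypothetical, the source BD is assumed to have already been identified from the tunnel encapsulation, and the split-horizon procedures of [RFC7432] are not modeled.

```python
from dataclasses import dataclass

SBD = "SBD"  # hypothetical name standing for the Supplementary Broadcast Domain


@dataclass(frozen=True)
class AC:
    """A local attachment circuit (illustrative model, not from the draft)."""
    name: str
    bd: str            # the BD this AC belongs to (never the SBD, which has no ACs)
    interested: bool   # True if a host on this AC has signaled interest in (S,G)
    is_df: bool = True # True if the segment is single-homed or this PE is its DF


def egress_actions(source_bd, acs):
    """Return (AC name, procedure) pairs for a frame received over a tunnel.

    source_bd is the BD identified by the tunnel encapsulation: either a BD
    to which this PE is attached, or SBD.
    """
    actions = []
    for ac in acs:
        if not (ac.interested and ac.is_df):
            continue
        if ac.bd == source_bd:
            # Same BD: frame is bridged, MAC SA and IP TTL left unchanged.
            actions.append((ac.name, "intra-subnet"))
        else:
            # Different BD (always the case when source_bd is the SBD):
            # decapsulate, decrement TTL, re-encapsulate for the AC's BD.
            actions.append((ac.name, "inter-subnet"))
    return actions
```

For example, with interested ACs in BD1 and BD2, a frame whose source BD is BD1 is bridged onto the BD1 AC and routed onto the BD2 AC, while a frame whose source BD is the SBD is routed onto both.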
682 These procedures ensure that an IP multicast frame travels from its 683 ingress PE to all egress PEs that are interested in receiving it. 684 While in transit, the frame retains its original MAC SA, and the 685 payload of the frame retains its original IP header. Note that in 686 all cases, when an IP multicast packet is sent from one BD to 687 another, these procedures cause its TTL to be decremented by 1. 689 So far we have assumed that an IP multicast packet arrives at its 690 ingress PE over an AC that belongs to one of the BDs in a given 691 Tenant Domain. However, it is possible for a packet to arrive at its 692 ingress PE in other ways. Since an EVPN-PE supporting IRB has an 693 IP-VRF, it is possible that the IP-VRF will have a "VRF interface" 694 that is not an IRB interface. For example, there might be a VRF 695 interface that is actually a physical link to an external ethernet 696 switch, or to a directly attached host, or to a router. When an 697 EVPN-PE, say PE1, receives a packet through such means, we will say 698 that the packet has an "external" source (i.e., a source "outside the 699 tenant domain"). There are also other scenarios in which a multicast 700 packet might have an external source, e.g., it might arrive over an 701 MVPN tunnel from an L3VPN PE. In such cases, we will still refer to 702 PE1 as the "ingress EVPN-PE". 704 When an EVPN-PE, say PE1, receives an externally sourced multicast 705 packet, and there are receivers for that packet inside the Tenant 706 Domain, it does the following: 708 o Suppose PE1 has an AC in BD1 that has interest in (S,G). Then PE1 709 encapsulates the packet for BD1, filling in the MAC SA field with 710 the MAC address of PE1 itself on BD1. It sends the resulting 711 frame on the AC. 713 o Suppose some other EVPN-PE, say PE2, has interest in (S,G). PE1 714 encapsulates the packet for ethernet, filling in the MAC SA field 715 with PE1's own MAC address on the SBD. PE1 then tunnels the 716 packet to PE2. 
The tunnel encapsulation will identify the source 717 BD as the SBD. Since the source BD is the SBD, PE2 will know to 718 treat the frame as an inter-subnet multicast. 720 When ingress replication is used to transmit IP multicast frames from 721 an ingress EVPN-PE to a set of egress PEs, then of course the ingress 722 PE has to send multiple copies of the frame. Each copy is the 723 original ethernet frame; decapsulation and IP processing take place 724 only at the egress PE. 726 If a Point-to-Multipoint (P2MP) tree or BIER ([EVPN-BIER]) is used to 727 transmit an IP multicast frame from an ingress PE to a set of egress 728 PEs, then the ingress PE only has to send one copy of the frame to 729 each of its next hops. Again, each egress PE receives the original 730 frame and does any necessary IP processing. 732 2. Detailed Model of Operation 734 The model described in Section 1.5.2 can be expressed more precisely 735 using the notion of "IRB interface" (see Appendix A). However, this 736 requires that the semantics of the IRB interface be modified for 737 multicast packets. It is also necessary to have an IRB interface 738 that connects the L3 routing instance of a particular Tenant Domain 739 (in a particular PE) to the SBD of that Tenant Domain. 741 In this section we assume that PIM is not enabled on the IRB 742 interfaces. In general, it is not necessary to enable PIM on the IRB 743 interfaces unless there are PIM routers on one of the Tenant Domain's 744 BDs, or unless there is some other scenario requiring a Tenant 745 Domain's L3 routing instance to become a PIM adjacency of some other 746 system. These cases will be discussed in Section 7. 748 2.1. Supplementary Broadcast Domain 750 Suppose a given Tenant Domain contains three BDs (BD1, BD2, BD3) and 751 two PEs (PE1, PE2). PE1 attaches to BD1 and BD2, while PE2 attaches 752 to BD2 and BD3. 
754 To carry out the procedures described above, all the PEs attached to 755 the Tenant Domain must be provisioned to have the SBD for that tenant 756 domain. An RT must be associated with the SBD, and provisioned on 757 each of those PEs. We will refer to that RT as the "SBD-RT". 759 A Tenant Domain is also configured with an IP-VRF ([EVPN-IRB]), and 760 the IP-VRF is associated with an RT. This RT MAY be the same as the 761 SBD-RT. 763 Suppose an (S,G) multicast frame originating on BD1 has a receiver on 764 BD3. PE1 will transmit the packet to PE2 as a frame, and the 765 encapsulation will identify the frame's source BD as BD1. Since PE2 766 is not provisioned with BD1, it will treat the packet as if its 767 source BD were the SBD. That is, a packet can be transmitted from 768 BD1 to BD3 even though its ingress PE is not configured for BD3, and/ 769 or its egress PE is not configured for BD1. 771 EVPN supports service models in which a given EVPN Instance (EVI) can 772 contain only one BD. It also supports service models in which a 773 given EVI can contain multiple BDs. The SBD can be treated either as 774 its own EVI, or it can be treated as one BD within an EVI that 775 contains multiple BDs. The procedures specified in this document 776 accommodate both cases. 778 2.2. When is a Route About/For/From a Particular BD 780 In this document, we will frequently say that a particular route is 781 "about" a particular BD, or is "from" a particular BD, or is "for" a 782 particular BD or is "related to" a particular BD. These terms are 783 used interchangeably. In this section, we explain exactly what that 784 means. 786 In EVPN, each BD is assigned an RT. In some service models, each BD 787 is assigned a unique RT. In other service models, a set of BDs (all 788 in the same Tenant Domain) may be assigned the same RT. (An RT is 789 actually assigned to a MAC-VRF, and hence is shared by all the BDs 790 that share the MAC-VRF.) 
The RT is a BGP extended community that may 791 be attached to the BGP routes used by the EVPN control plane. 793 In those service models that allow a set of BDs to share a single RT, 794 each BD is assigned a non-zero Tag ID. The Tag ID appears in the 795 Network Layer Reachability Information (NLRI) of many of the BGP 796 routes that are used by the EVPN control plane. 798 A route is about a particular BD if it carries the RT that has been 799 assigned to that BD, and its NLRI contains the Tag ID that has been 800 assigned to that BD. 802 Note that a route that is about a particular BD may also carry 803 additional RTs. 805 2.3. Use of IRB Interfaces at Ingress PE 807 When an (S,G) multicast frame is received from an AC belonging to a 808 particular BD, say BD1: 810 1. The frame is sent unchanged to other EVPN-PEs that are interested 811 in (S,G) traffic. The encapsulation used to send the frame to 812 the other EVPN-PEs depends on the tunnel type being used for 813 multicast transmission. (For our purposes, we consider Ingress 814 Replication (IR), Assisted Replication (AR) and BIER to be 815 "tunnel types", even though IR, AR and BIER do not actually use 816 P2MP tunnels.) At the egress PE, the source BD of the frame can 817 be inferred from the tunnel encapsulation. If the egress PE is 818 not attached to the real source BD, it will infer that the source 819 BD is the SBD. 821 Note that the inter-PE transmission of a multicast frame 822 among EVPN-PEs of the same Tenant Domain does NOT involve the IRB 823 interfaces, as long as the multicast frame was received over an 824 AC attached to one of the Tenant Domain's BDs. 826 2. The frame is also sent up the IRB interface that attaches BD1 to 827 the Tenant Domain's L3 routing instance in this PE. That is, the 828 L3 routing instance, behaving as if it were a multicast router, 829 receives the IP multicast frames that arrive at the PE from its 830 local ACs.
The L3 routing instance decapsulates the frame's 831 payload to extract the IP multicast packet, decrements the IP 832 TTL, adjusts the header checksum, and does any other necessary IP 833 processing (e.g., fragmentation). 835 3. The L3 routing instance keeps track of which BDs have local 836 receivers for (S,G) traffic. (A "local receiver" is a tenant 837 system, reachable via a local attachment circuit that has 838 expressed interest in (S,G) traffic.) If the L3 routing instance 839 has an IRB interface to BD2, and it knows that BD2 has a LOCAL 840 receiver interested in (S,G) traffic, it encapsulates the packet 841 in an ethernet header for BD2, putting its own MAC address in the 842 MAC SA field. Then it sends the packet down the IRB interface to 843 BD2. 845 If a packet is sent from the L3 routing instance to a particular BD 846 via the IRB interface (step 3 in the above list), and if the BD in 847 question is NOT the SBD, the packet is sent ONLY to LOCAL ACs of that 848 BD. If the packet needs to go to other PEs, it has already been sent 849 to them in step 1. Note that this is a change in the IRB interface 850 semantics from what is described in [EVPN-IRB] and Figure 2. 852 Existing EVPN procedures ensure that a packet is not sent by a given 853 PE to a given locally attached segment unless the PE is the DF for 854 that segment. Those procedures also ensure that a packet is never 855 sent by a PE to its segment of origin. Thus EVPN segment multi- 856 homing is fully supported; duplicate delivery to a segment or looping 857 on a segment are thereby prevented, without the need for any new 858 procedures to be defined in this document. 860 What if an IP multicast packet is received from outside the tenant 861 domain? For instance, perhaps PE1's IP-VRF for a particular tenant 862 domain also has a physical interface leading to an external switch, 863 host, or router, and PE1 receives an IP multicast packet or frame on 864 that interface. 
Or perhaps the packet is from an L3VPN, or a 865 different EVPN Tenant Domain. 867 Such a packet is first processed by the L3 routing instance, which 868 decrements TTL and does any other necessary IP processing. Then the 869 packet is sent into the Tenant Domain by sending it down the IRB 870 interface to the SBD of that Tenant Domain. This requires 871 encapsulating the packet in an ethernet header, with the PE's own MAC 872 address, on the SBD, in the MAC SA field. 874 An IP multicast packet sent by the L3 routing instance down the IRB 875 interface to the SBD is treated as if it had arrived from a local AC, 876 and steps 1-3 are applied. Note that the semantics of sending a 877 packet down the IRB interface to the SBD are thus slightly different 878 than the semantics of sending a packet down other IRB interfaces. IP 879 multicast packets sent down the SBD's IRB interface may be 880 distributed to other PEs, but IP multicast packets sent down other 881 IRB interfaces are distributed only to local ACs. 883 If a PE sends a link-local multicast packet down the SBD IRB 884 interface, that packet will be distributed (as an ethernet frame) to 885 other PEs of the Tenant Domain, but will not appear on any of the 886 actual BDs. 888 2.4. Use of IRB Interfaces at an Egress PE 890 Suppose an egress EVPN-PE receives an (S,G) multicast frame from the 891 frame's ingress EVPN-PE. As described above, the packet will arrive 892 as an ethernet frame over a tunnel from the ingress PE, and the 893 tunnel encapsulation will identify the source BD of the ethernet 894 frame. 896 We define the notion of the frame's "inferred source BD" as follows. 897 If the egress PE is attached to the actual source BD, the actual 898 source BD is the inferred source BD. If the egress PE is not 899 attached to the actual source BD, the inferred source BD is the SBD. 901 The egress PE now takes the following steps: 903 1. 
If the egress PE has ACs belonging to the inferred source BD of 904 the frame, it sends the frame unchanged to any ACs of that BD 905 that have interest in (S,G) packets. The MAC SA of the frame is 906 not modified, and the IP header of the frame's payload is not 907 modified in any way. 909 2. The frame is also sent to the L3 routing instance by being sent 910 up the IRB interface that attaches the L3 routing instance to the 911 inferred source BD. Steps 2 and 3 of Section 2.3 are then 912 applied. 914 2.5. Announcing Interest in (S,G) 916 [IGMP-Proxy] defines the procedures used by an egress PE to announce 917 its interest in a multicast flow or set of flows. This is done by 918 originating an SMET route. If an egress PE determines it has LOCAL 919 receivers in a particular BD that are interested in a particular set 920 of flows, it originates one or more SMET routes for that BD. The 921 SMET route specifies a flow or set of flows, and identifies the 922 egress PE. The SMET route is specific to a particular BD. A PE that 923 originates an SMET route is announcing "I have receivers for (S,G) or 924 (*,G) in BD-x". 926 In [IGMP-Proxy], an SMET route for a particular BD carries a Route 927 Target (RT) that ensures it will be distributed to all PEs that are 928 attached to that BD. In this document, it is REQUIRED that an SMET 929 route also carry the RT that is assigned to the SBD. This ensures 930 that every ingress PE attached to a particular Tenant Domain will 931 learn of all other PEs (attached to the same Tenant Domain) that have 932 interest in a particular set of flows. Note that it is not necessary 933 for the ingress PE to have any BDs other than the SBD in common with 934 the egress PEs. 936 Since the SMET routes from any BD in a given Tenant Domain are 937 propagated to all PEs of that Tenant Domain, an (S,G) receiver on one 938 BD can receive (S,G) packets that originate in a different BD. 
939 Within an EVPN domain, a given IP source address can only be on one 940 BD. Therefore inter-subnet multicasting can be done, within the 941 Tenant Domain, without requiring any Rendezvous Points, shared trees, 942 or other complex aspects of multicast routing infrastructure. (Note 943 that while the MAC addresses do not have to be unique across all the 944 BDs in a Tenant Domain, the IP addresses do have to be unique across 945 all those BDs.) 947 If some PE attached to the Tenant Domain does not support [IGMP- 948 Proxy], it will be assumed to be interested in all flows. (Whether a 949 particular remote PE supports [IGMP-Proxy] is determined by the 950 presence of the Multicast Flags Extended Community in its IMET route; 951 this is specified in [IGMP-Proxy].) 953 2.6. Tunneling Frames from Ingress PE to Egress PEs 955 [RFC7432] specifies the procedures for setting up and using "BUM 956 tunnels". A BUM tunnel is a tunnel used to carry traffic on a 957 particular BD if that traffic is (a) broadcast traffic, or (b) 958 unicast traffic with an unknown MAC DA, or (c) ethernet multicast 959 traffic. 961 This document allows the BUM tunnels to be used as the default 962 tunnels for transmitting intra-subnet IP multicast frames. It also 963 allows a separate set of tunnels to be used, instead of the BUM 964 tunnels, as the default tunnels for carrying intra-subnet IP 965 multicast frames. Let's call these "IP Multicast Tunnels". 967 When the tunneling is done via Ingress Replication or via BIER, this 968 difference is of no significance. However, when P2MP tunnels are 969 used, there is a significant advantage to having separate IP 970 multicast tunnels. 972 It is desirable for an ingress PE to transmit a copy of a given (S,G) 973 multicast frame on only one tunnel. All egress PEs interested in 974 (S,G) packets must then join that tunnel.
If the source BD/PE for an 975 (S,G) packet is BD1/PE1, and PE2 has receivers for (S,G) on BD2, PE2 976 must join the P2MP LSP on which PE1 transmits the frame. PE2 must 977 join this P2MP LSP even if PE2 is not attached to the source BD 978 (BD1). If PE1 were transmitting the multicast frame on its BD1 BUM 979 tunnel, then PE2 would have to join the BD1 BUM tunnel, even though 980 PE2 has no BD1 attachment circuits. This would cause PE2 to pull all 981 the BUM traffic from BD1, most of which it would just have to 982 discard. Thus we RECOMMEND that the default IP multicast tunnels be 983 distinct from the BUM tunnels. 985 Whether or not the default IP multicast tunnels are distinct from the 986 BUM tunnels, selective tunnels for particular multicast flows can 987 still be used. Traffic sent on a selective tunnel would not be sent 988 on the default tunnel. 990 Notwithstanding the above, link-local IP multicast traffic MUST 991 always be carried on the BUM tunnels, and ONLY on the BUM tunnels. 992 Link-local IP multicast traffic consists of IPv4 traffic with a 993 destination address prefix of 224.0.0/24 and IPv6 traffic with a 994 destination address prefix of FF02/16. In this document, the terms 995 "IP multicast packet" and "IP multicast frame" are defined in 996 Section 1.4 so as to exclude the link-local traffic. 998 2.7. Advanced Scenarios 1000 There are some deployment scenarios that require special procedures: 1002 1. Some multicast sources or receivers are attached to PEs that 1003 support [RFC7432], but do not support this document or 1004 [EVPN-IRB]. To interoperate with these "non-OISM PEs", it is 1005 necessary to have one or more gateway PEs that interface the 1006 tunnels discussed in this document with the BUM tunnels of the 1007 legacy PEs. This is discussed in Section 5. 1009 2. Sometimes multicast traffic originates from outside the EVPN 1010 domain, or needs to be sent outside the EVPN domain. This is 1011 discussed in Section 6.
An important special case of this, 1012 integration with MVPN, is discussed in Section 6.1.2. 1014 3. In some scenarios, one or more of the tenant systems is a PIM 1015 router, and the Tenant Domain is used as a transit network 1016 that is part of a larger multicast domain. This is discussed in 1017 Section 7. 1019 3. EVPN-aware Multicast Solution Control Plane 1021 3.1. Supplementary Broadcast Domain (SBD) and Route Targets 1023 Every Tenant Domain is associated with a single Supplementary 1024 Broadcast Domain (SBD), as discussed in Section 2.1. Recall that a 1025 Tenant Domain is defined to be a set of BDs that can freely send and 1026 receive IP multicast traffic to/from each other. If an EVPN-PE has 1027 one or more ACs in a BD of a particular Tenant Domain, and if the 1028 EVPN-PE supports the procedures of this document, that EVPN-PE must 1029 be provisioned with the SBD of that Tenant Domain. 1031 At each EVPN-PE attached to a given Tenant Domain, there is an IRB 1032 interface leading from the L3 routing instance of that Tenant Domain 1033 to the SBD. However, the SBD has no ACs. 1035 The SBD may be in an EVPN Instance (EVI) of its own, or it may be one 1036 of several BDs (of the same Tenant Domain) in an EVI. 1038 Each SBD is provisioned with a Route Target (RT). All the EVPN-PEs 1039 supporting a given SBD are provisioned with that RT as an import RT. 1041 Each SBD is also provisioned with a "Tag ID" (see Section 6 of 1042 [RFC7432]). 1044 o If the SBD is the only BD in its EVI, the mapping from RT to SBD 1045 is one-to-one. The Tag ID is zero. 1047 o If the SBD is one of several BDs in its EVI, it may have its own 1048 RT, or it may share an RT with one or more of those other BDs. In 1049 either case, it must be assigned a non-zero Tag ID. The mapping 1050 from (RT, Tag ID) to SBD is always one-to-one. 1052 We will use the term "SBD-RT" to denote the RT that has been assigned 1053 to an SBD.
Routes carrying this RT will be propagated to all 1054 EVPN-PEs in the same Tenant Domain as the originator. 1056 An EVPN-PE that receives a route can always determine whether a 1057 received route "belongs to" a particular SBD, by seeing if that route 1058 carries the SBD-RT and has the Tag ID of the SBD in its NLRI. 1060 If the VLAN-based service model is being used for a particular Tenant 1061 Domain, and thus each BD is in a distinct EVI, it is natural to have 1062 the SBD be in a distinct EVI as well. If the VLAN-aware bundle 1063 service is being used, it is natural to include the SBD in the same 1064 EVI that contains the other BDs. However, it is not required to do 1065 so; the SBD can still be placed in an EVI of its own, if that is 1066 desired. 1068 Note that an SBD, just like any other BD, is associated on each 1069 EVPN-PE with a MAC-VRF. Per [RFC7432], each MAC-VRF is associated 1070 with a Route Distinguisher (RD). When constructing a route that is 1071 "about" an SBD, an EVPN-PE will place the RD of the associated 1072 MAC-VRF in the "Route Distinguisher" field of the NLRI. (If the 1073 Tenant Domain has several MAC-VRFs on a given PE, the EVPN-PE has a 1074 choice of which RD to use.) 1076 If Assisted Replication (AR, see [EVPN-AR]) is used, each 1077 AR-REPLICATOR for a given Tenant Domain must be provisioned with the 1078 SBD of that Tenant Domain, even if the AR-REPLICATOR does not have 1079 any L3 routing instance. 1081 3.2. Advertising the Tunnels Used for IP Multicast 1083 The procedures used for advertising the tunnels that carry IP 1084 multicast traffic depend upon the type of tunnel being used. If the 1085 tunnel type is neither Ingress Replication, Assisted Replication, nor 1086 BIER, there are procedures for advertising both "inclusive tunnels" 1087 and "selective tunnels". 1089 When IR, AR or BIER are used to transmit IP multicast packets across 1090 the core, there are no P2MP tunnels. 
Once an ingress EVPN-PE 1091 determines the set of egress EVPN-PEs for a given flow, the IMET 1092 routes contain all the information needed to transport packets of 1093 that flow to the egress PEs. 1095 If AR is used, the ingress EVPN-PE is also an AR-LEAF and the IMET 1096 route coming from the selected AR-REPLICATOR contains the information 1097 needed. The AR-REPLICATOR will behave as an ingress EVPN-PE when 1098 sending a flow to the egress EVPN-PEs. 1100 If the tunneling technique requires P2MP tunnels to be set up (e.g., 1101 RSVP-TE P2MP, mLDP, PIM), some of the tunnels may be selective 1102 tunnels and some may be inclusive tunnels. 1104 Selective tunnels are always advertised by the ingress PE using 1105 S-PMSI A-D routes ([EVPN-BUM]). 1107 For inclusive tunnels, there is a choice between using a BD's 1108 ordinary "BUM tunnel" [RFC7432] as the default inclusive tunnel for 1109 carrying IP multicast traffic, or using a separate IP multicast 1110 tunnel as the default inclusive tunnel for carrying IP multicast. In 1111 the former case, the inclusive tunnel is advertised in an IMET route. 1112 In the latter case, the inclusive tunnel is advertised in a (C-*,C-*) 1113 S-PMSI A-D route ([EVPN-BUM]). Details may be found in subsequent 1114 sections. 1116 3.2.1. Constructing SBD Routes 1118 3.2.1.1. Constructing an SBD-IMET Route 1120 In general, an EVPN-PE originates an IMET route for each real BD. 1121 Whether an EVPN-PE has to originate an IMET route for the SBD (of a 1122 particular Tenant Domain) depends upon the type of tunnels being used 1123 to carry EVPN multicast traffic across the backbone. In some cases, 1124 an IMET route does not need to be originated for the SBD, but the 1125 other IMET routes have to carry the SBD-RT as well as any other RTs 1126 they would ordinarily carry (per [RFC7432]). 1128 Subsequent sections will specify when it is necessary for an EVPN-PE 1129 to originate an IMET route for the SBD.
We will refer to such a 1130 route as an "SBD-IMET route". 1132 When an EVPN-PE needs to originate an SBD-IMET route that is "for" 1133 the SBD, it constructs the route as follows: 1135 o the RD field of the route's NLRI is set to the RD of the MAC-VRF 1136 that is associated with the SBD; 1138 o a Route Target Extended Community containing the value of the 1139 SBD-RT is attached to that route; 1141 o the "Tag ID" field of the NLRI is set to the Tag ID that has been 1142 assigned to the SBD. This is most likely 0 if a VLAN-based or 1143 VLAN-bundle service is being used and non-zero if a VLAN-aware 1144 bundle service is being used. 1146 3.2.1.2. Constructing an SBD-SMET Route 1148 An EVPN-PE can originate an SMET route to indicate that it has 1149 receivers, on a specified BD, for a specified multicast flow. In 1150 some scenarios, an EVPN-PE must originate an SMET route that is for 1151 the SBD, which we will call an "SBD-SMET route". Whether an EVPN-PE 1152 has to originate an SMET route for the SBD (of a particular tenant 1153 domain) depends upon various factors, detailed in subsequent 1154 sections. 1156 When an EVPN-PE needs to originate an SBD-SMET route that is "for" 1157 the SBD, it constructs the route as follows: 1159 o the RD field of the route's NLRI is set to the RD of the MAC-VRF 1160 that is associated with the SBD; 1162 o a Route Target Extended Community containing the value of the 1163 SBD-RT is attached to that route; 1165 o the "Tag ID" field of the NLRI is set to the Tag ID that has been 1166 assigned to the SBD. This is most likely 0 if a VLAN-based or 1167 VLAN-bundle service is being used and non-zero if a VLAN-aware 1168 bundle service is being used. 1170 3.2.1.3. Constructing an SBD-SPMSI Route 1172 An EVPN-PE can originate an S-PMSI A-D route (see [EVPN-BUM]) to 1173 indicate that it is going to use a particular P2MP tunnel to carry 1174 the traffic of particular IP multicast flows. 
In general, an S-PMSI 1175 A-D route is specific to a particular BD. In some scenarios, an 1176 EVPN-PE must originate an S-PMSI A-D route that is for the SBD, which 1177 we will call an "SBD-SPMSI route". Whether an EVPN-PE has to 1178 originate an S-PMSI A-D route for the SBD (of a particular Tenant Domain) 1179 depends upon various factors, detailed in subsequent sections. 1181 When an EVPN-PE needs to originate an SBD-SPMSI route that is "for" 1182 the SBD, it constructs the route as follows: 1184 o the RD field of the route's NLRI is set to the RD of the MAC-VRF 1185 that is associated with the SBD; 1187 o a Route Target Extended Community containing the value of the 1188 SBD-RT is attached to that route; 1190 o the "Tag ID" field of the NLRI is set to the Tag ID that has been 1191 assigned to the SBD. This is most likely 0 if a VLAN-based or 1192 VLAN-bundle service is being used and non-zero if a VLAN-aware 1193 bundle service is being used. 1195 3.2.2. Ingress Replication 1197 When Ingress Replication (IR) is used to transport IP multicast 1198 frames of a given Tenant Domain, each EVPN-PE attached to that Tenant 1199 Domain MUST originate an SBD-IMET route, as described in 1200 Section 3.2.1.1. 1202 The SBD-IMET route MUST carry a PMSI Tunnel attribute (PTA), and the 1203 MPLS label field of the PTA MUST specify a downstream-assigned MPLS 1204 label that maps uniquely (in the context of the originating EVPN-PE) 1205 to the SBD. 1207 An EVPN-PE MUST also originate an IMET route for each BD to which it 1208 is attached, following the procedures of [RFC7432]. Each of these 1209 IMET routes carries a PTA specifying a downstream-assigned label 1210 that maps uniquely (in the context of the originating EVPN-PE) to the 1211 BD in question. These IMET routes need not carry the SBD-RT.
1213 When an ingress EVPN-PE needs to use IR to send an IP multicast frame 1214 from a particular source BD to an egress EVPN-PE, the ingress PE 1215 determines whether the egress PE has originated an IMET route for 1216 that BD. If so, that IMET route contains the MPLS label that the 1217 egress PE has assigned to the source BD. The ingress PE uses that 1218 label when transmitting the packet to the egress PE. Otherwise, the 1219 ingress PE uses the label that the egress PE has assigned to the SBD 1220 (in the SBD-IMET route originated by the egress). 1222 Note that the set of IMET routes originated by a given egress PE, and 1223 installed by a given ingress PE, will change over time. If the 1224 egress PE withdraws its IMET route for the source BD, the ingress PE 1225 must stop using the label carried in that IMET route, and start using 1226 the label carried in the SBD-IMET route from that egress PE. 1228 3.2.3. Assisted Replication 1230 When Assisted Replication is used to transport IP multicast frames of 1231 a given Tenant Domain, each EVPN-PE (including the AR-REPLICATOR) 1232 attached to the Tenant Domain MUST originate an SBD-IMET route, as 1233 described in Section 3.2.1.1. 1235 An AR-REPLICATOR attached to a given Tenant Domain is considered to 1236 be an EVPN-PE of that Tenant Domain. It is attached to all the BDs 1237 in the Tenant Domain, but it has no IRB interfaces. 1239 As with Ingress Replication, the SBD-IMET route carries a PTA where 1240 the MPLS label field specifies the downstream-assigned MPLS label 1241 that identifies the SBD. However, the AR-REPLICATOR and AR-LEAF 1242 EVPN-PEs will set the PTA's flags differently, as per [EVPN-AR]. 1244 In addition, each EVPN-PE originates an IMET route for each BD to 1245 which it is attached. As in the case of Ingress Replication, these 1246 routes carry the downstream-assigned MPLS labels that identify the 1247 BDs and do not carry the SBD-RT. 
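The label selection rule that Section 3.2.2 defines for Ingress Replication (use the egress PE's label for the source BD if it has advertised one, otherwise fall back to its SBD label) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name ir_label, the dictionary representation of installed IMET routes, and the "SBD" key are all hypothetical and do not come from this document.

```python
def ir_label(egress_imet_labels, source_bd, sbd="SBD"):
    """Pick the MPLS label an ingress PE uses toward one egress PE.

    egress_imet_labels maps a BD name to the downstream-assigned label
    carried in the IMET route currently installed from that egress PE.
    The entry under the sbd key models the egress PE's SBD-IMET route.
    """
    if source_bd in egress_imet_labels:
        # The egress PE is attached to the source BD: use its BD label.
        return egress_imet_labels[source_bd]
    # Otherwise use the SBD label. A KeyError here means the egress PE
    # has not originated an SBD-IMET route, which this sketch treats as
    # an error rather than guessing at a behavior.
    return egress_imet_labels[sbd]
```

Withdrawal of the egress PE's IMET route for the source BD is modeled by deleting that dictionary entry; the next lookup then returns the SBD label, matching the required switchover.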
1249 When an ingress EVPN-PE, acting as AR-LEAF, needs to send an IP 1250 multicast frame from a particular source BD to an egress EVPN-PE, the 1251 ingress PE determines whether there is any AR-REPLICATOR that 1252 originated an IMET route for that BD. After the AR-REPLICATOR 1253 selection (if there is more than one), the AR-LEAF uses the label 1254 contained in the IMET route of the AR-REPLICATOR when transmitting 1255 packets to it. The AR-REPLICATOR receives the packets and, based on 1256 the procedures specified in [EVPN-AR], transmits the packets to the 1257 egress EVPN-PEs using the labels contained in the IMET routes 1258 received from the egress PEs. 1260 If an ingress AR-LEAF for a given BD has not received any IMET route 1261 for that BD from an AR-REPLICATOR, the ingress AR-LEAF follows the 1262 procedures in Section 3.2.2. 1264 3.2.4. BIER 1266 When BIER is used to transport multicast packets of a given Tenant 1267 Domain, each EVPN-PE attached to that Tenant Domain MUST originate an 1268 SBD-IMET route, as described in Section 3.2.1.1. 1270 In addition, IMET routes that are originated for other BDs in the 1271 Tenant Domain MUST carry the SBD-RT. 1273 Each IMET route (including but not limited to the SBD-IMET route) 1274 MUST carry a PMSI Tunnel attribute (PTA). The MPLS label field of 1275 the PTA MUST specify an upstream-assigned MPLS label that maps 1276 uniquely (in the context of the originating EVPN-PE) to the BD for 1277 which the route is originated. 1279 When an ingress EVPN-PE uses BIER to send an IP multicast packet 1280 (inside an ethernet frame) from a particular source BD to a set of 1281 egress EVPN-PEs, the ingress PE follows the BIER encapsulation with 1282 the upstream-assigned label it has assigned to the source BD. (This 1283 label will come from the originated SBD-IMET route ONLY if the 1284 traffic originated from outside the Tenant Domain.)
An egress PE can 1285 determine from that label whether the packet's source BD is one of 1286 the BDs to which the egress PE is attached. 1288 Further details on the use of BIER to support EVPN can be found in 1289 [EVPN-BIER]. 1291 3.2.5. Inclusive P2MP Tunnels 1293 3.2.5.1. Using the BUM Tunnels as IP Multicast Inclusive Tunnels 1295 The procedures in this section apply only when it is desired to use 1296 the BUM tunnels to carry IP multicast traffic across the backbone. 1297 In this case, an IP multicast frame (whether inter-subnet or 1298 intra-subnet) will be carried across the backbone in the BUM tunnel 1299 belonging to its source BD. An EVPN-PE attached to a given Tenant 1300 Domain will then need to join the BUM tunnels for each BD in the 1301 Tenant Domain, even if the EVPN-PE is not attached to all of those 1302 BDs. The reason is that an IP multicast packet from any source BD 1303 might be needed by an EVPN-PE that is not attached to that source 1304 BD. 1306 Note that this will cause BUM traffic from a given BD in a Tenant 1307 Domain to be sent to all PEs that attach to that tenant domain, even 1308 the PEs that don't attach to the given BD. To avoid this, it is 1309 RECOMMENDED that the BUM tunnels not be used as IP Multicast 1310 inclusive tunnels, and that the procedures of Section 3.2.5.2 be used 1311 instead. 1313 3.2.5.1.1. RSVP-TE P2MP 1315 When BUM tunnels created by RSVP-TE P2MP are used to transport IP 1316 multicast frames of a given Tenant Domain, each EVPN-PE attached to 1317 that Tenant Domain MUST originate an SBD-IMET route, as described in 1318 Section 3.2.1.1. 1320 In addition, IMET routes that are originated for other BDs in the 1321 Tenant Domain MUST carry the SBD-RT. 1323 Each IMET route (including but not limited to the SBD-IMET route) 1324 MUST carry a PMSI Tunnel attribute (PTA). 1326 If a received IMET route is not the SBD-IMET route, it will also be 1327 carrying the RT for its source BD.
The route's NLRI will carry the 1328 Tag ID for the source BD. From the RT and the Tag ID, any PE 1329 receiving the route can determine the route's source BD. 1331 If the MPLS label field of the PTA contains zero, the specified 1332 RSVP-TE P2MP tunnel is used only to carry frames of a single source 1333 BD. 1335 If the MPLS label field of the PTA does not contain zero, it MUST 1336 contain an upstream-assigned MPLS label that maps uniquely (in the 1337 context of the originating EVPN-PE) to the source BD (or, in the case 1338 of an SBD-IMET route, the SBD). The tunnel may be used to carry 1339 frames of multiple source BDs, and the source BD for a particular 1340 packet is inferred from the label carried by the packet. 1342 IP multicast traffic originating outside the Tenant Domain is 1343 transmitted with the label corresponding to the SBD, as specified in 1344 the ingress EVPN-PE's SBD-IMET route. 1346 3.2.5.1.2. mLDP or PIM 1348 When either mLDP or PIM is used to transport multicast packets of a 1349 given Tenant Domain, an EVPN-PE attached to that tenant domain 1350 originates an SBD-IMET route only if it is the ingress PE for IP 1351 multicast traffic originating outside the tenant domain. Such 1352 traffic is treated as having the SBD as its source BD. 1354 An EVPN-PE MUST originate an IMET route for each BD to which it is 1355 attached. These IMET routes MUST carry the SBD-RT of the Tenant 1356 Domain to which the BD belongs. Each such IMET route must also carry 1357 the RT of the BD to which it belongs. 1359 When an IMET route (other than the SBD-IMET route) is received by an 1360 egress PE, the route will be carrying the RT for its source BD and 1361 the route's NLRI will contain the Tag ID for that source BD. This 1362 allows any PE receiving the route to determine the source BD 1363 associated with the route. 1365 If the MPLS label field of the PTA contains zero, the specified mLDP 1366 or PIM tunnel is used only to carry frames of a single source BD.
1368 If the MPLS label field of the PTA does not contain zero, it MUST 1369 contain an upstream-assigned MPLS label that maps uniquely (in the 1370 context of the originating EVPN-PE) to the source BD. The tunnel may 1371 be used to carry frames of multiple source BDs, and the source BD for 1372 a particular packet is inferred from the label carried by the packet. 1374 The EVPN-PE advertising these IMET routes is specifying the default 1375 tunnel that it will use (as ingress PE) for transmitting IP multicast 1376 packets. The upstream-assigned label allows an egress PE to 1377 determine the source BD of a given packet. 1379 The procedures of this section apply whenever the tunnel technology 1380 is based on the construction of the multicast trees in a "receiver- 1381 driven" manner; mLDP and PIM are two ways of constructing trees in a 1382 receiver-driven manner. 1384 3.2.5.2. Using Wildcard S-PMSI A-D Routes to Advertise Inclusive 1385 Tunnels Specific to IP Multicast 1387 The procedures of this section apply when (and only when) it is 1388 desired to transmit IP multicast traffic on an inclusive tunnel, but 1389 not on the same tunnel used to transmit BUM traffic. 1391 However, these procedures do NOT apply when the tunnel type is 1392 Ingress Replication or BIER, EXCEPT in the case where it is necessary 1393 to interwork between non-OISM PEs and OISM PEs, as specified in 1394 Section 5. 1396 Each EVPN-PE attached to the given Tenant Domain MUST originate an 1397 SBD-SPMSI A-D route. The NLRI of that route MUST contain (C-*,C-*) 1398 (see [RFC6625]). Additional rules for constructing that route are 1399 given in Section 3.2.1.3. 1401 In addition, an EVPN-PE MUST originate an S-PMSI A-D route containing 1402 (C-*,C-*) in its NLRI for each of the other BDs in the Tenant Domain 1403 to which it is attached. All such routes MUST carry the SBD-RT. 1404 This ensures that those routes are imported by all EVPN-PEs attached 1405 to the Tenant Domain. 
1407 The route carrying the PTA will also be carrying the RT for that 1408 source BD, and the route's NLRI will contain the Tag ID for that 1409 source BD. This allows any PE receiving the route to determine the 1410 source BD associated with the route. 1412 If the MPLS label field of the PTA contains zero, the specified 1413 tunnel is used only to carry frames of a single source BD. 1415 If the MPLS label field of the PTA does not contain zero, it MUST 1416 specify an upstream-assigned MPLS label that maps uniquely (in the 1417 context of the originating EVPN-PE) to the source BD. The tunnel may 1418 be used to carry frames of multiple source BDs, and the source BD for 1419 a particular packet is inferred from the label carried by the packet. 1421 The EVPN-PE advertising these S-PMSI A-D routes is specifying 1422 the default tunnel that it will use (as ingress PE) for transmitting 1423 IP multicast packets. The upstream-assigned label allows an egress 1424 PE to determine the source BD of a given packet. 1426 3.2.6. Selective Tunnels 1428 An ingress EVPN-PE for a given multicast flow or set of flows can 1429 always assign the flow to a particular P2MP tunnel by originating an 1430 S-PMSI A-D route whose NLRI identifies the flow or set of flows. The 1431 NLRI of the route could be (C-*,C-G), or (C-S,C-G). The S-PMSI A-D 1432 route MUST carry the SBD-RT, so that it is imported by all EVPN-PEs 1433 attached to the Tenant Domain. 1435 An S-PMSI A-D route is "for" a particular source BD. It MUST carry 1436 the RT associated with that BD, and it MUST have the Tag ID for that 1437 BD in its NLRI. 1439 Each such route MUST contain a PTA, as specified in Section 3.2.5.2. 1441 An egress EVPN-PE interested in the specified flow or flows MUST join 1442 the specified tunnel. Procedures for joining the specified tunnel 1443 are specific to the tunnel type.
(Note that if the tunnel type is 1444 RSVP-TE P2MP LSP, the Leaf Information Required (LIR) flag of the PTA 1445 SHOULD NOT be set. An ingress OISM PE knows which OISM EVPN PEs are 1446 interested in any given flow, and hence can add them to the RSVP-TE 1447 P2MP tunnel that carries such flows.) 1449 When an EVPN-PE imports an S-PMSI A-D route, it infers the source BD 1450 from the RTs and the Tag ID. If the EVPN-PE is not attached to the 1451 source BD, the tunnel it specifies is treated as belonging to the 1452 SBD. That is, packets arriving on that tunnel are treated as having 1453 been sourced in the SBD. Note that a packet is only considered to 1454 have arrived on the specified tunnel if the packet carries the 1455 upstream-assigned label specified in the PTA, or if there is no 1456 upstream-assigned label specified in the PTA. 1458 It should be noted that when either IR or BIER is used, there is no 1459 need for an ingress PE to use S-PMSI A-D routes to assign specific 1460 flows to selective tunnels. The procedures of Section 3.3, along 1461 with the procedures of Section 3.2.2, Section 3.2.3, or 1462 Section 3.2.4, provide the functionality of selective tunnels without 1463 the need to use S-PMSI A-D routes. 1465 3.3. Advertising SMET Routes 1467 [IGMP-Proxy] allows an egress EVPN-PE to express its interest in a 1468 particular multicast flow or set of flows by originating an SMET 1469 route. The NLRI of the SMET route identifies the flow or set of 1470 flows as (C-*,C-*) or (C-*,C-G) or (C-S,C-G). 1472 Each SMET route belongs to a particular BD. The Tag ID for the BD 1473 appears in the NLRI of the route, and the route carries the RT 1474 associated with that BD. From this pair, other EVPN-PEs 1475 can identify the BD to which a received SMET route belongs. 1476 (Remember though that the route may be carrying multiple RTs.) 1478 There are two cases to consider: 1480 1.
Case 1: When it is known that no BD of a Tenant Domain contains a 1481 multicast router. 1483 In this case, an egress PE can advertise its interest in a flow 1484 or set of flows by originating a single SMET route. The SMET 1485 route will belong to the SBD. We refer to this as an SBD-SMET 1486 route. The SBD-SMET route carries the SBD-RT, and has the Tag ID 1487 for the SBD in its NLRI. SMET routes for the individual BDs are 1488 not needed. 1490 2. Case 2: When it is possible that a BD of a Tenant Domain contains 1491 a multicast router. 1493 Suppose that an egress PE is attached to a BD on which there 1494 might be a tenant multicast router. (The tenant router is not 1495 necessarily on a segment that is attached to that PE.) And 1496 suppose that the PE has one or more ACs attached to that BD which 1497 are interested in a given multicast flow. In this case, IN 1498 ADDITION to the SMET route for the SBD, the egress PE MUST 1499 originate an SMET route for that BD. This will enable the 1500 ingress PE(s) to send IGMP/MLD messages on ACs for the BD, as 1501 specified in [IGMP-Proxy]. 1503 If an SMET route is not an SBD-SMET route, and if the SMET route 1504 is for (C-S,C-G) (i.e., no wildcard source), and if the EVPN-PE 1505 originating it knows the source BD of C-S, it MAY put only the RT 1506 for that BD on the route. Otherwise, the route MUST carry the 1507 SBD-RT, so that it gets distributed to all the EVPN-PEs attached 1508 to the tenant domain. 1510 As detailed in [IGMP-Proxy], an SMET route carries flags saying 1511 whether it is to result in the propagation of IGMP v1, v2, or v3 1512 messages on the ACs of the BD to which the SMET route belongs. These 1513 flags SHOULD be set to zero in an SBD-SMET route. 1515 Note that a PE only needs to originate the set of SBD-SMET routes that 1516 are needed to pull in all the traffic in which it is interested.
1517 Suppose PE1 has ACs attached to BD1 that are interested in (C-*,C-G) 1518 traffic, and ACs attached to BD2 that are interested in (C-S,C-G) 1519 traffic. A single SBD-SMET route specifying (C-*,C-G) will pull in 1520 all the necessary flows. 1522 As another example, suppose the ACs attached to BD1 are interested in 1523 (C-*,C-G) but not in (C-S,C-G), while the ACs attached to BD2 are 1524 interested in (C-S,C-G). A single SBD-SMET route specifying 1525 (C-*,C-G) will pull in all the necessary flows. 1527 In other words, to determine the set of SBD-SMET routes that have to 1528 be sent for a given C-G, the PE has to merge the IGMP/MLD state for 1529 all the BDs (of the given Tenant Domain) to which it is attached. 1531 Per [IGMP-Proxy], importing an SMET route for a particular BD will 1532 cause IGMP/MLD state to be instantiated for the IRB interface to that 1533 BD. This applies as well when the BD is the SBD. 1535 However, traffic originating in a BD of a particular Tenant Domain 1536 MUST NOT be sent down the IRB interface that connects the L3 routing 1537 instance of that Tenant Domain to the SBD of that Tenant Domain. 1538 That would cause duplicate delivery of traffic, since traffic 1539 arriving at L3 over the IRB interface from the SBD has already been 1540 distributed throughout the Tenant Domain. When setting up the IGMP/ 1541 MLD state based on SBD-SMET routes, care must be taken to ensure that 1542 the IRB interface to the SBD is not added to the Outgoing Interface 1543 (OIF) list if the traffic originates within the Tenant Domain. 1545 4. Constructing Multicast Forwarding State 1547 4.1. Layer 2 Multicast State 1549 An EVPN-PE maintains "layer 2 multicast state" for each BD to which 1550 it is attached. 1552 Let PE1 be an EVPN-PE, and BD1 be a BD to which it is attached. 
At 1553 PE1, BD1's layer 2 multicast state for a given (C-S,C-G) or (C-*,C-G) 1554 governs the disposition of an IP multicast packet that is received by 1555 BD1's layer 2 multicast function on an EVPN-PE. 1557 An IP multicast (S,G) packet is considered to have been received by 1558 BD1's layer 2 multicast function in PE1 in the following cases: 1560 o The packet is the payload of an ethernet frame received by PE1 1561 from an AC that attaches to BD1. 1563 o The packet is the payload of an ethernet frame whose source BD is 1564 BD1, and which is received by PE1 over a tunnel from another 1565 EVPN-PE. 1567 o The packet is received from BD1's IRB interface (i.e., has been 1568 transmitted by PE1's L3 routing instance down BD1's IRB 1569 interface). 1571 According to the procedures of this document, all transmission of IP 1572 multicast packets from one EVPN-PE to another is done at layer 2. 1573 That is, the packets are transmitted as ethernet frames, according to 1574 the layer 2 multicast state. 1576 Each layer 2 multicast state (S,G) or (*,G) contains a set of "output 1577 interfaces" (OIF list). The disposition of an (S,G) multicast frame 1578 received by BD1's layer 2 multicast function is determined as 1579 follows: 1581 o The OIF list is taken from BD1's layer 2 (S,G) state, or if there 1582 is no such (S,G) state, then from BD1's (*,G) state. (If neither 1583 state exists, the OIF list is considered to be null.) 1585 o The rules of Section 4.1.2 are applied to the OIF list. This will 1586 generally result in the frame being transmitted to some, but not 1587 all, elements of the OIF list. 1589 Note that there is no RPF check at layer 2. 1591 4.1.1. Constructing the OIF List 1593 In this document, we have extended the procedures of [IGMP-Proxy] so 1594 that IMET and SMET routes for a particular BD are distributed not 1595 just to PEs that attach to that BD, but to PEs that attach to any BD 1596 in the Tenant Domain.
In this way, each PE attached to a given 1597 Tenant Domain learns, from each other PE attached to the same Tenant 1598 Domain, the set of flows that are of interest to each of those other 1599 PEs. (If some PE attached to the Tenant Domain does not support 1600 [IGMP-Proxy], it will be assumed to be interested in all flows. 1601 Whether a particular remote PE supports [IGMP-Proxy] is determined by 1602 the presence of an Extended Community in its IMET route; this is 1603 specified in [IGMP-Proxy].) If a set of remote PEs are interested in 1604 a particular flow, the tunnels used to reach those PEs are added to 1605 the OIF list of the multicast states corresponding to that flow. 1607 An EVPN-PE may run IGMP/MLD procedures on each of its ACs, in order 1608 to determine the set of flows of interest to each AC. (An AC is said 1609 to be interested in a given flow if it connects to a segment that has 1610 tenant systems interested in that flow.) If IGMP/MLD procedures are 1611 not being run on a given AC, that AC is considered to be interested 1612 in all flows. For each BD, the set of ACs interested in a given flow 1613 is determined, and the ACs of that set are added to the OIF list of 1614 that BD's multicast state for that flow. 1616 The OIF list for each multicast state must also contain the IRB 1617 interface for the BD to which the state belongs. 1619 Implementors should note that the OIF list of a multicast state will 1620 change from time to time as ACs and/or remote PEs either become 1621 interested in, or lose interest in, particular multicast flows. 1623 4.1.2. 
Data Plane: Applying the OIF List to an (S,G) Frame 1625 When an (S,G) multicast frame is received by the layer 2 multicast 1626 function of a given EVPN-PE, say PE1, its disposition depends (a) upon the 1627 way it was received, (b) upon the OIF list of the corresponding 1628 multicast state (see Section 4.1.1), (c) upon the "eligibility" of an 1629 AC to receive a given frame (see Section 4.1.2.1), and (d) upon its 1630 source BD (see Section 3.2 for information about determining the 1631 source BD of a frame received over a tunnel from another PE). 1633 4.1.2.1. Eligibility of an AC to Receive a Frame 1635 A given (S,G) multicast frame is eligible to be transmitted by a 1636 given PE, say PE1, on a given AC, say AC1, only if one of the 1637 following conditions holds: 1639 1. ESI labels are being used, PE1 is the DF for the segment to which 1640 AC1 is connected, and the frame did not originate from that same 1641 segment (as determined by the ESI label), or 1643 2. The ingress PE for the frame is a remote PE, say PE2, local bias 1644 is being used, and PE2 is not connected to the same segment as 1645 AC1. 1647 4.1.2.2. Applying the OIF List 1649 Assume a given (S,G) multicast frame has been received by a given PE, 1650 say PE1. PE1 determines the source BD of the frame, finds the layer 1651 2 (S,G) state for the source BD (or the (*,G) state if there is no 1652 (S,G) state), and takes the OIF list from that state. Note that if 1653 PE1 is not attached to the actual source BD, it will treat the frame 1654 as if its source BD is the SBD. 1656 Suppose PE1 has determined the frame's source BD to be BD1 (which may 1657 or may not be the SBD). There are the following cases to consider: 1659 1. The frame was received by PE1 from a local AC, say AC1, that 1660 attaches to BD1. 1662 a. The frame MUST be sent out all local ACs of BD1 that appear 1663 in the OIF list, except for AC1 itself. 1665 b.
The frame MUST also be delivered to any other EVPN-PEs that 1666 have interest in it. This is achieved as follows: 1668 i. If (a) AR is being used, and (b) PE1 is an AR-LEAF, and 1669 (c) the OIF list is non-null, PE1 MUST send the frame 1670 to the AR-REPLICATOR. 1672 ii. Otherwise the frame MUST be sent on all tunnels in the 1673 OIF list. 1675 c. The frame MUST be sent to the local L3 routing instance by 1676 being sent up the IRB interface of BD1. It MUST NOT be sent 1677 up any other IRB interfaces. 1679 2. The frame was received by PE1 over a tunnel from another PE. 1680 (See Section 3.2 for the rules to determine the source BD of a 1681 packet received from another PE. Note that if PE1 is not 1682 attached to the source BD, it will regard the SBD as the source 1683 BD.) 1685 a. The frame MUST be sent out all local ACs in the OIF list that 1686 connect to BD1 and that are eligible (per Section 4.1.2.1) to 1687 receive the frame. 1689 b. The frame MUST be sent up the IRB interface of the source BD. 1690 (Note that this may be the SBD.) The frame MUST NOT be sent 1691 up any other IRB interfaces. 1693 c. If PE1 is not an AR-REPLICATOR, it MUST NOT send the frame to 1694 any other EVPN-PEs. However, if PE1 is an AR-REPLICATOR, it 1695 MUST send the frame to all tunnels in the OIF list, except 1696 for the tunnel over which the frame was received. 1698 3. The frame was received by PE1 from the BD1 IRB interface (i.e., 1699 the frame has been transmitted by PE1's L3 routing instance down 1700 the BD1 IRB interface), and BD1 is NOT the SBD. 1702 a. The frame MUST be sent out all local ACs in the OIF list that 1703 are eligible (per Section 4.1.2.1) to receive the frame. 1705 b. The frame MUST NOT be sent to any other EVPN-PEs. 1707 c. The frame MUST NOT be sent up any IRB interfaces. 1709 4. The frame was received from the SBD IRB interface (i.e., has been 1710 transmitted by PE1's L3 routing instance down the SBD IRB 1711 interface). 1713 a.
The frame MUST be sent on all tunnels in the OIF list. This 1714 causes the frame to be delivered to any other EVPN-PEs that 1715 have interest in it. 1717 b. The frame MUST NOT be sent on any local ACs. 1719 c. The frame MUST NOT be sent up any IRB interfaces. 1721 4.2. Layer 3 Forwarding State 1723 If an EVPN-PE is performing IGMP/MLD procedures on the ACs of a given 1724 BD, it processes those messages at layer 2 to help form the layer 2 1725 multicast state. It also sends those messages up that BD's IRB 1726 interface to the L3 routing instance of a particular tenant domain. 1727 This causes (C-S,C-G) or (C-*,C-G) L3 state to be created/ 1728 updated. 1730 A layer 3 multicast state has both an Input Interface (IIF) and an 1731 OIF list. 1733 To set the IIF of a (C-S,C-G) state, the EVPN-PE must determine the 1734 source BD of C-S. This is done by looking up S in the local 1735 MAC-VRF(s) of the given Tenant Domain. 1737 If the source BD is present on the PE, the IIF is set to the IRB 1738 interface that attaches to that BD. Otherwise the IIF is set to the 1739 SBD IRB interface. 1741 For (C-*,C-G) states, traffic can arrive from any BD, so the IIF 1742 needs to be set to a wildcard value meaning "any IRB interface". 1744 The OIF list of these states includes one or more of the IRB 1745 interfaces of the Tenant Domain. In general, maintenance of the OIF 1746 list does not require any EVPN-specific procedures. However, there 1747 is one EVPN-specific rule: 1749 If the IIF is one of the IRB interfaces (or the wildcard meaning 1750 "any IRB interface"), then the SBD IRB interface MUST NOT be added 1751 to the OIF list. Traffic originating from within a particular 1752 EVPN Tenant Domain must not be sent down the SBD IRB interface, as 1753 such traffic has already been distributed to all EVPN-PEs attached 1754 to that Tenant Domain.
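The IIF selection and the EVPN-specific OIF rule of Section 4.2 can be sketched as below. This is a minimal illustration under stated assumptions, not a definitive implementation: interfaces are modelled as plain strings, and the names `ANY_IRB`, `SBD_IRB`, `irb`, `select_iif`, and `build_oif_list` are all hypothetical. A `source_bd` of `None` is assumed to mean that C-S was not found in any local MAC-VRF.

```python
from typing import Iterable, Optional, Set

ANY_IRB = "*"        # hypothetical wildcard meaning "any IRB interface"
SBD_IRB = "irb-SBD"  # hypothetical name of the SBD IRB interface


def irb(bd: str) -> str:
    """Hypothetical naming scheme for the IRB interface of a BD."""
    return f"irb-{bd}"


def select_iif(source_bd: Optional[str], local_bds: Set[str],
               wildcard_source: bool) -> str:
    """Choose the IIF for a layer 3 (C-S,C-G) or (C-*,C-G) state."""
    if wildcard_source:
        # (C-*,C-G): traffic can arrive from any BD.
        return ANY_IRB
    if source_bd in local_bds:
        # The source BD is present on this PE.
        return irb(source_bd)
    # The source BD is not present locally: use the SBD IRB interface.
    return SBD_IRB


def build_oif_list(iif: str, interested: Iterable[str],
                   tenant_irbs: Set[str]) -> Set[str]:
    """Apply the rule that the SBD IRB is excluded when the IIF is an IRB."""
    from_tenant_domain = iif == ANY_IRB or iif in tenant_irbs
    return {
        oif for oif in interested
        if not (from_tenant_domain and oif == SBD_IRB)
    }
```

In the sketch, `tenant_irbs` is assumed to contain every IRB interface of the Tenant Domain, including the SBD IRB interface, so the SBD IRB interface can only appear in an OIF list when the IIF is some other kind of interface.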
1756 Please also see Section 6.1.1, which states a modification of this 1757 rule for the case where OISM is interworking with external Layer 3 1758 multicast routing. 1760 5. Interworking with non-OISM EVPN-PEs 1762 It is possible that a given Tenant Domain will be attached to both 1763 OISM PEs and non-OISM PEs. Inter-subnet IP multicast should be 1764 possible and fully functional even if not all PEs attaching to a 1765 Tenant Domain can be upgraded to support OISM functionality. 1767 Note that the non-OISM PEs are not required to have IRB support, or 1768 support for [IGMP-Proxy]. It is however advantageous for the 1769 non-OISM PEs to support [IGMP-Proxy]. 1771 In this section, we will use the following terminology: 1773 o PE-S: the ingress PE for an (S,G) flow. 1775 o PE-R: an egress PE for an (S,G) flow. 1777 o BD-S: the source BD for an (S,G) flow. PE-S must have one or more 1778 ACs attached to BD-S, at least one of which attaches to host S. 1780 o BD-R: a BD that contains a host interested in the flow. The host 1781 is attached to PE-R via an AC that belongs to BD-R. 1783 To allow OISM PEs to interwork with non-OISM PEs, a given Tenant 1784 Domain needs to contain one or more "IP Multicast Gateways" (IPMGs). 1785 An IPMG is an OISM PE with special responsibilities regarding the 1786 interworking between OISM and non-OISM PEs. 1788 If a PE is functioning as an IPMG, it MUST signal this fact by 1789 attaching a particular flag or EC (details to be determined) to its 1790 IMET routes. An IPMG SHOULD attach this flag or EC to all IMET 1791 routes it originates. However, if PE1 imports any IMET route from 1792 PE2 that has the "IPMG" flag or EC present, then PE1 will assume 1793 that PE2 is an IPMG. 1795 An IPMG Designated Forwarder (IPMG-DF) selection procedure is used to 1796 ensure that, at any given time, there is exactly one active IPMG-DF 1797 for any given BD. Details of the IPMG-DF selection procedure are in 1798 Section 5.1.
The IPMG-DF for a given BD, say BD-S, has special 1799 functions to perform when it receives (S,G) frames on that BD: 1801 o If the frames are from a non-OISM PE-S: 1803 * The IPMG-DF forwards them to OISM PEs that do not attach to 1804 BD-S but have interest in (S,G). 1806 Note that OISM PEs that do attach to BD-S will have received 1807 the frames on the BUM tunnel from the non-OISM PE-S. 1809 * The IPMG-DF forwards them to non-OISM PEs that have interest in 1810 (S,G) on ACs that do not belong to BD-S. 1812 Note that if a non-OISM PE has multiple BDs other than BD-S 1813 with interest in (S,G), it will receive one copy of the frame 1814 for each such BD. This is necessary because the non-OISM PEs 1815 cannot move IP multicast traffic from one BD to another. 1817 o If the frames are from an OISM PE, the IPMG-DF forwards them to 1818 non-OISM PEs that have interest in (S,G) on ACs that do not belong 1819 to BD-S. 1821 If a non-OISM PE has interest in (S,G) on an AC belonging to BD-S, 1822 it will have received a copy of the (S,G) frame, encapsulated for 1823 BD-S, from the OISM PE-S. (See Section 3.2.2.) If the non-OISM 1824 PE has interest in (S,G) on one or more ACs belonging to 1825 BD-R1,...,BD-Rk where the BD-Ri are distinct from BD-S, the 1826 IPMG-DF needs to send it a copy of the frame for each BD-Ri. 1828 If an IPMG receives a frame on a BD for which it is not the IPMG-DF, 1829 it just follows normal OISM procedures. 1831 This section specifies several sets of procedures: 1833 o the procedures that the IPMG-DF for a given BD needs to follow 1834 when receiving, on that BD, an IP multicast frame from a non-OISM 1835 PE; 1837 o the procedures that the IPMG-DF for a given BD needs to follow 1838 when receiving, on that BD, an IP multicast frame from an OISM PE; 1840 o the procedures that an OISM PE needs to follow when receiving, on 1841 a given BD, an IP multicast frame from a non-OISM PE, when the 1842 OISM PE is not the IPMG-DF for that BD.
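The special IPMG-DF functions listed earlier in this section amount to deciding which extra copies of an (S,G) frame have to be generated. The sketch below illustrates that decision only; it is not a definitive implementation. The `RemotePE` record, the `SBD` marker, and the function `ipmg_df_copies` are hypothetical, the list of PEs is assumed to exclude both the ingress PE and the IPMG-DF itself, and the use of the SBD for copies sent to OISM PEs is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

SBD = "SBD"  # hypothetical marker: copy is sent in the context of the SBD


@dataclass(frozen=True)
class RemotePE:
    name: str
    is_oism: bool
    attached_bds: FrozenSet[str]
    # BDs on which this PE has ACs interested in the (S,G) flow.
    interested_bds: FrozenSet[str]


def ipmg_df_copies(ingress_is_oism: bool, bd_s: str,
                   pes: List[RemotePE]) -> List[Tuple[str, str]]:
    """Return (PE name, BD) pairs for the copies the IPMG-DF of bd_s sends."""
    copies: List[Tuple[str, str]] = []
    for pe in pes:
        if pe.is_oism:
            # OISM PEs need help only when the ingress PE is non-OISM and
            # they are not attached to BD-S (otherwise they already got the
            # frame from the ingress PE).
            if (not ingress_is_oism and bd_s not in pe.attached_bds
                    and pe.interested_bds):
                copies.append((pe.name, SBD))
        else:
            # Non-OISM PEs cannot move traffic between BDs, so they get one
            # copy per interested BD other than BD-S.
            for bd in sorted(pe.interested_bds - {bd_s}):
                copies.append((pe.name, bd))
    return copies
```

For example, a non-OISM PE with interested ACs in BD-S and in two other BDs receives two copies from the IPMG-DF, one for each of the other BDs, in addition to the copy it receives directly from the ingress PE on BD-S.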
1844 To enable OISM/non-OISM interworking in a given Tenant Domain, the 1845 Tenant Domain MUST have some EVPN-PEs that can function as IPMGs. An 1846 IPMG must be configured with the SBD. It must also be configured 1847 with every BD of the Tenant Domain that exists on any of the non-OISM 1848 PEs of that domain. (Operationally, it may be simpler to configure 1849 the IPMG with all the BDs of the Tenant Domain.) 1851 A non-OISM PE of course only needs to be configured with BDs for 1852 which it has ACs. An OISM PE that is not an IPMG only needs to be 1853 configured with the SBD and with the BDs for which it has ACs. 1855 An IPMG MUST originate a wildcard SMET route (with (C-*,C-*) in the 1856 NLRI) for each BD in the Tenant Domain. This will cause it to 1857 receive all the IP multicast traffic that is sourced in the Tenant 1858 Domain. Note that non-OISM nodes that do not support [IGMP-Proxy] 1859 will send all the multicast traffic from a given BD to all PEs 1860 attached to that BD, even if those PEs do not originate an SMET 1861 route. 1863 The interworking procedures vary somewhat depending upon whether 1864 packets are transmitted from PE to PE via Ingress Replication (IR) or 1865 via Point-to-Multipoint (P2MP) tunnels. We do not consider the use 1866 of BIER in this section, due to the low likelihood of there being a 1867 non-OISM PE that supports BIER. 1869 5.1. IPMG Designated Forwarder 1871 Each IPMG MUST be configured with an "IPMG dummy ethernet segment" 1872 that has no ACs. 1874 EVPN supports a number of procedures that can be used to select the 1875 Designated Forwarder (DF) for a particular BD on a particular 1876 ethernet segment. Some of the possible procedures can be found, 1877 e.g., in [RFC7432], [EVPN-DF-NEW], and [EVPN-DF-WEIGHTED]. Whatever 1878 procedure is in use in a given deployment can be adapted to select an 1879 IPMG-DF for a given BD, as follows. 
1881 Each IPMG will originate an Ethernet Segment route for the IPMG dummy 1882 ethernet segment. It MUST carry a Route Target derived from the 1883 corresponding Ethernet Segment Identifier. Thus only IPMGs will 1884 import the route. 1886 Once the set of IPMGs is known, it is also possible to determine the 1887 set of BDs supported by each IPMG. The DF selection procedure can 1888 then be used to choose a DF for each BD. (The conditions under which 1889 the IPMG-DF for a given BD changes depend upon the DF selection 1890 algorithm that is in use.) 1892 5.2. Ingress Replication 1894 The procedures of this section are used when Ingress Replication is 1895 used to transmit packets from one PE to another. 1897 When a non-OISM PE-S transmits a multicast frame from BD-S to another 1898 PE, PE-R, PE-S will use the encapsulation specified in the BD-S IMET 1899 route that was originated by PE-R. This encapsulation will include 1900 the label that appears in the "MPLS label" field of the PMSI Tunnel 1901 attribute (PTA) of the IMET route. If the tunnel type is VXLAN, the 1902 "label" is actually a Virtual Network Identifier (VNI); for other 1903 tunnel types, the label is an MPLS label. In either case, we will 1904 speak of the transmitted frames as carrying a label that was assigned 1905 to a particular BD by the PE-R to which the frame is being 1906 transmitted. 1908 To support OISM/non-OISM interworking, an OISM PE-R MUST originate, 1909 for each of its BDs, both an IMET route and an S-PMSI (C-*,C-*) A-D 1910 route. Note that even when IR is being used, interworking between 1911 OISM and non-OISM PEs requires the OISM PEs to follow the rules of 1912 Section 3.2.5.2, as modified below. 1914 Non-OISM PEs will not understand S-PMSI A-D routes. So when a 1915 non-OISM PE-S transmits an IP multicast frame with a particular 1916 source BD to an IPMG, it encapsulates the frame using the label 1917 specified in that IPMG's BD-S IMET route.
(This is just the 1918 procedure of [RFC7432].) 1920 The (C-*,C-*) S-PMSI A-D route originated by a given OISM PE will 1921 have a PTA that specifies IR. 1923 o If MPLS tunneling is being used, the MPLS label field SHOULD 1924 contain a non-zero value, and the LIR flag SHOULD be zero. (The 1925 case where the MPLS label field is zero or the LIR flag is set is 1926 outside the scope of this document.) 1928 o If the tunnel encapsulation is VXLAN, the MPLS label field MUST 1929 contain a non-zero value, and the LIR flag MUST be zero. 1931 When an OISM PE-S transmits an IP multicast frame to an IPMG, it will 1932 use the label specified in that IPMG's (C-*,C-*) S-PMSI A-D route. 1934 When a PE originates both an IMET route and a (C-*,C-*) S-PMSI A-D 1935 route, the values of the MPLS label field in the respective PTAs must 1936 be distinct. Further, each MUST map uniquely (in the context of the 1937 originating PE) to the route's BD. 1939 As a result, an IPMG receiving an MPLS-encapsulated IP multicast 1940 frame can always tell by the label whether the frame's ingress PE is 1941 an OISM PE or a non-OISM PE. When an IPMG receives a VXLAN- 1942 encapsulated IP multicast frame it may need to determine the identity 1943 of the ingress PE from the outer IP encapsulation; it can then 1944 determine whether the ingress PE is an OISM PE or a non-OISM PE by 1945 looking at the IMET route from that PE. 1947 Suppose an IPMG receives an IP multicast frame from another EVPN-PE 1948 in the Tenant Domain, and the IPMG is not the IPMG-DF for the frame's 1949 source BD. Then the IPMG performs only the ordinary OISM functions; 1950 it does not perform the IPMG-specific functions for that frame. In 1951 the remainder of this section, when we discuss the procedures applied 1952 by an IPMG when it receives an IP multicast frame, we are presuming 1953 that the source BD of the frame is a BD for which the IPMG is the 1954 IPMG-DF.
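As an illustration only, the label rules of this section can be sketched in Python for the MPLS case, where the label alone identifies both the source BD and the kind of ingress PE. The class and method names (IpmgLabelTable, advertise, classify) are hypothetical and are not defined by this document; the VXLAN case, which may also need the outer IP source address, is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class IpmgLabelTable:
    """Labels one IPMG has advertised, keyed by label value (illustrative)."""
    by_label: dict = field(default_factory=dict)  # label -> (bd, route kind)

    def advertise(self, bd: str, imet_label: int, spmsi_label: int) -> None:
        # The IMET label and the (C-*,C-*) S-PMSI label must be distinct,
        # and each label must map to exactly one BD on this PE.
        if imet_label == spmsi_label:
            raise ValueError("IMET and S-PMSI labels must be distinct")
        for label in (imet_label, spmsi_label):
            if label in self.by_label:
                raise ValueError(f"label {label} is already assigned")
        self.by_label[imet_label] = (bd, "IMET")
        self.by_label[spmsi_label] = (bd, "S-PMSI")

    def classify(self, label: int):
        """Return (source BD, kind of ingress PE) for a received label."""
        bd, kind = self.by_label[label]
        # Non-OISM PEs only ever use the IMET label; OISM PEs use the
        # label from the (C-*,C-*) S-PMSI A-D route.
        return bd, ("non-OISM" if kind == "IMET" else "OISM")
```

With such a table, an IPMG that advertised labels 1001 (IMET) and 2001 (S-PMSI) for a BD would treat a frame arriving with label 1001 as coming from a non-OISM ingress PE, and a frame arriving with label 2001 as coming from an OISM ingress PE.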
1956 We have two basic cases to consider: (1) a frame's ingress PE is a 1957 non-OISM node, and (2) a frame's ingress PE is an OISM node. 1959 5.2.1. Ingress PE is non-OISM 1961 In this case, a non-OISM PE, PE-S, has received an (S,G) multicast 1962 frame over an AC that is attached to a particular BD, BD-S. By 1963 virtue of normal EVPN procedures, PE-S has sent a copy of the frame 1964 to every PE-R (both OISM and non-OISM) in the Tenant Domain that is 1965 attached to BD-S. If the non-OISM node supports [IGMP-Proxy], only 1966 PEs that have expressed interest in (S,G) receive the frame. The 1967 IPMG will have expressed interest via a (C-*,C-*) SMET route and thus 1968 receives the frame. 1970 Any OISM PE (including an IPMG) receiving the frame will apply normal 1971 OISM procedures. As a result it will deliver the frame to any of its 1972 local ACs (in BD-S or in any other BD) that have interest in (S,G). 1974 An OISM PE that is also the IPMG-DF for a particular BD, say BD-S, 1975 has additional procedures that it applies to frames received on BD-S 1976 from non-OISM PEs: 1978 1. When the IPMG-DF for BD-S receives an (S,G) frame from a 1979 non-OISM node, it MUST forward a copy of the frame to every OISM 1980 PE that is NOT attached to BD-S but has interest in (S,G). The 1981 copy sent to a given OISM PE-R must carry the label that PE-R 1982 has assigned to the SBD in an S-PMSI A-D route. The IPMG MUST 1983 NOT do any IP processing of the frame's IP payload. TTL 1984 decrement and other IP processing will be done by PE-R, per the 1985 normal OISM procedures. There is no need for the IPMG to 1986 include an ESI label in the frame's tunnel encapsulation, 1987 because it is already known that the frame's source BD has no 1988 presence on PE-R. There is also no need for the IPMG to modify 1989 the frame's MAC SA. 1991 2. 
In addition, when the IPMG-DF for BD-S receives an (S,G) frame 1992 from a non-OISM node, it may need to forward copies of the frame 1993 to other non-OISM nodes. Before it does so, it MUST decapsulate 1994 the (S,G) packet, and do the IP processing (e.g., TTL 1995 decrement). Suppose PE-R is a non-OISM node that has an AC to 1996 BD-R, where BD-R is not the same as BD-S, and that AC has 1997 interest in (S,G). The IPMG must then encapsulate the (S,G) 1998 packet (after the IP processing has been done) in an ethernet 1999 header. The MAC SA field will have the MAC address of the 2000 IPMG's IRB interface to BD-R. The IPMG then sends the frame to 2001 PE-R. The tunnel encapsulation will carry the label that PE-R 2002 advertised in its IMET route for BD-R. There is no need to 2003 include an ESI label, as the source and destination BDs are 2004 known to be different. 2006 Note that if a non-OISM PE-R has several BDs (other than BD-S) 2007 with local ACs that have interest in (S,G), the IPMG will send 2008 it one copy for each such BD. This is necessary because the 2009 non-OISM PE cannot move packets from one BD to another. 2011 There may be deployment scenarios in which every OISM PE is 2012 configured with every BD that is present on any non-OISM PE. In such 2013 scenarios, the procedures of item 1 above will not actually result in 2014 the transmission of any packets. Hence if it is known a priori that 2015 this deployment scenario exists for a given tenant domain, the 2016 procedures of item 1 above can be disabled. 2018 5.2.2. Ingress PE is OISM 2020 In this case, an OISM PE, PE-S, has received an (S,G) multicast frame 2021 over an AC that attaches to a particular BD, BD-S. 2023 By virtue of receiving all the IMET routes about BD-S, PE-S will know 2024 all the PEs attached to BD-S. 
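A minimal sketch of how PE-S might derive the set of PEs attached to BD-S from the IMET routes it has received is given here. The ImetRoute type and its "oism" field are assumptions made for illustration; this section only says that the IMET route from a PE allows one to determine whether that PE is an OISM PE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImetRoute:
    originator: str  # PE that originated the route
    bd: str          # BD that the route is about
    oism: bool       # assumed marker: True if the originator supports OISM


def pes_attached_to(bd, imet_routes):
    """Return (oism_pes, non_oism_pes) attached to `bd`, per the IMET routes."""
    oism, non_oism = set(), set()
    for route in imet_routes:
        if route.bd == bd:
            (oism if route.oism else non_oism).add(route.originator)
    return oism, non_oism
```

The two sets correspond to the two kinds of PE-R that are treated differently in the procedures of this section.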
By virtue of normal OISM procedures: 2026 o PE-S will send a copy of the frame to every OISM PE-R (including 2027 the IPMG) in the Tenant Domain that is attached to BD-S and has 2028 interest in (S,G). The copy sent to a given PE-R carries the 2029 label that the PE-R has assigned to BD-S in its (C-*,C-*) 2030 S-PMSI A-D route. 2032 o PE-S will also transmit a copy of the (S,G) frame to every OISM 2033 PE-R that has interest in (S,G) but is not attached to BD-S. The 2034 copy will contain the label that the PE-R has assigned to the SBD. 2035 (As in Section 5.2.1, an IPMG is assumed to have indicated 2036 interest in all multicast flows.) 2038 o PE-S will also transmit a copy of the (S,G) frame to every 2039 non-OISM PE-R that is attached to BD-S. It does this using the 2040 label advertised by that PE-R in its IMET route for BD-S. 2042 The PE-Rs follow their normal procedures. An OISM PE that receives 2043 the (S,G) frame on BD-S applies the OISM procedures to deliver the 2044 frame to its local ACs, as necessary. A non-OISM PE that receives 2045 the (S,G) frame on BD-S delivers the frame only to its local BD-S 2046 ACs, as necessary. 2048 Suppose that a non-OISM PE-R has interest in (S,G) on a BD, BD-R, 2049 that is different than BD-S. If the non-OISM PE-R is attached to 2050 BD-S, the OISM PE-S will forward it the original (S,G) multicast 2051 frame, but the non-OISM PE-R will not be able to send the frame to 2052 ACs that are not in BD-S. If PE-R is not even attached to BD-S, the 2053 OISM PE-S will not send it a copy of the frame at all, because PE-R 2054 is not attached to the SBD. In these cases, the IPMG needs to relay 2055 the (S,G) multicast traffic from OISM PE-S to non-OISM PE-R. 2057 When the IPMG-DF for BD-S receives an (S,G) frame from an OISM PE-S, 2058 it has to forward it to every non-OISM PE-R that has interest in 2059 (S,G) on a BD-R that is different than BD-S.
The IPMG MUST 2060 decapsulate the IP multicast packet, do the IP processing, re- 2061 encapsulate it for BD-R (changing the MAC SA to the IPMG's own MAC 2062 address on BD-R), and send a copy of the frame to PE-R. Note that a 2063 given non-OISM PE-R will receive multiple copies of the frame, if it 2064 has multiple BDs on which there is interest in the frame. 2066 5.3. P2MP Tunnels 2068 When IR is used to distribute the multicast traffic among the 2069 EVPN-PEs, the procedures of Section 5.2 ensure that there will be no 2070 duplicate delivery of multicast traffic. That is, no egress PE will 2071 ever send a frame twice on any given AC. If P2MP tunnels are being 2072 used to distribute the multicast traffic, it is necessary to have 2073 additional procedures to prevent duplicate delivery. 2075 At the present time, it is not clear that there will be a use case in 2076 which OISM nodes need to interwork with non-OISM nodes that use P2MP 2077 tunnels. If it is determined that there is such a use case, 2078 procedures for it will be included in a future revision of this 2079 document. 2081 6. Traffic to/from Outside the EVPN Tenant Domain 2083 In this section, we discuss scenarios where a multicast source 2084 outside a given EVPN Tenant Domain sends traffic to receivers inside 2085 the domain (as well as, possibly, to receivers outside the domain). 2086 This requires the OISM procedures to interwork with various layer 3 2087 multicast routing procedures. 2089 We assume in this section that the Tenant Domain is not being used as 2090 an intermediate transit network for multicast traffic; that is, we do 2091 not consider the case where the Tenant Domain contains multicast 2092 routers that will receive traffic from sources outside the domain and 2093 forward the traffic to receivers outside the domain. The transit 2094 scenario is considered in Section 7. 2096 We can divide the non-transit scenarios into two classes: 2098 1.
One or more of the EVPN PE routers provide the functionality 2099 needed to interwork with layer 3 multicast routing procedures. 2101 2. One BD in the Tenant Domain contains external multicast routers 2102 ("tenant multicast routers") that are used to interwork the 2103 entire Tenant Domain with layer 3 multicast routing procedures. 2105 6.1. Layer 3 Interworking via EVPN OISM PEs 2107 6.1.1. General Principles 2109 Sometimes it is necessary to interwork an EVPN Tenant Domain with an 2110 external layer 3 multicast domain (the "external domain"). This is 2111 needed to allow EVPN tenant systems to receive multicast traffic from 2112 sources ("external sources") outside the EVPN Tenant Domain. It is 2113 also needed to allow receivers ("external receivers") outside the 2114 EVPN Tenant Domain to receive traffic from sources inside the Tenant 2115 Domain. 2117 In order to allow interworking between an EVPN Tenant Domain and an 2118 external domain, one or more OISM PEs must be "L3 Gateways". An L3 2119 Gateway participates both in the OISM procedures and in the L3 2120 multicast routing procedures of the external domain. 2122 An L3 Gateway that has interest in receiving (S,G) traffic must be 2123 able to determine the best route to S. If an L3 Gateway has interest 2124 in (*,G), it must be able to determine the best route to G's RP. In 2125 these interworking scenarios, the L3 Gateway must be running a layer 2126 3 unicast routing protocol. Via this protocol, it imports unicast 2127 routes (either IP routes or VPN-IP routes) from routers other than 2128 EVPN PEs. And since there may be multicast sources inside the EVPN 2129 Tenant Domain, the EVPN PEs also need to export, either as IP routes 2130 or as VPN-IP routes (depending upon the external domain), unicast 2131 routes to those sources. 2133 When selecting the best route to a multicast source or RP, an L3 2134 Gateway might have a choice between an EVPN route and an IP/VPN-IP 2135 route. 
When such a choice exists, the L3 Gateway SHOULD always 2136 prefer the EVPN route. This will ensure that when traffic originates 2137 in the Tenant Domain and has a receiver in the tenant domain, the 2138 path to that receiver will remain within the EVPN tenant domain, even 2139 if the source is also reachable via a routed path. This also 2140 provides protection against sub-optimal routing that might occur if 2141 two EVPN PEs export IP/VPN-IP routes and each imports the other's IP/ 2142 VPN-IP routes. 2144 Section 4.2 discusses the way layer 3 multicast states are 2145 constructed by OISM PEs. These layer 3 multicast states have IRB 2146 interfaces as their IIF and OIF list entries, and are the basis for 2147 interworking OISM with other layer 3 multicast procedures such as 2148 MVPN or PIM. From the perspective of the layer 3 multicast 2149 procedures running in a given L3 Gateway, an EVPN Tenant Domain is a 2150 set of IRB interfaces. 2152 When interworking an EVPN Tenant Domain with an external domain, the 2153 L3 Gateway's layer 3 multicast states will not only have IRB 2154 interfaces as IIF and OIF list entries, but also other "interfaces" 2155 that lead outside the Tenant Domain. For example, when interworking 2156 with MVPN, the multicast states may have MVPN tunnels as well as IRB 2157 interfaces as IIF or OIF list members. When interworking with PIM, 2158 the multicast states may have PIM-enabled non-IRB interfaces as IIF 2159 or OIF list members. 2161 As long as a Tenant Domain is not being used as an intermediate 2162 transit network for IP multicast traffic, it is not necessary to 2163 enable PIM on its IRB interfaces. 2165 In general, an L3 Gateway has the following responsibilities: 2167 o It exports, to the external domain, unicast routes to those 2168 multicast sources in the EVPN Tenant Domain that are locally 2169 attached to the L3 Gateway. 
2171 o It imports, from the external domain, unicast routes to multicast 2172 sources that are in the external domain. 2174 o It executes the procedures necessary to draw externally sourced 2175 multicast traffic that is of interest to locally attached 2176 receivers in the EVPN Tenant Domain. When such traffic is 2177 received, the traffic is sent down the IRB interfaces of the BDs 2178 on which the locally attached receivers reside. 2180 One of the L3 Gateways in a given Tenant Domain becomes the "DR" for 2181 the SBD. (See Section 6.1.2.4.) This L3 gateway has the following 2182 additional responsibilities: 2184 o It exports, to the external domain, unicast routes to multicast 2185 sources in the EVPN Tenant Domain that are not locally 2186 attached to any L3 gateway. 2188 o It imports, from the external domain, unicast routes to multicast 2189 sources that are in the external domain. 2191 o It executes the procedures necessary to draw externally sourced 2192 multicast traffic that is of interest to receivers in the EVPN 2193 Tenant Domain that are not locally attached to an L3 gateway. 2194 When such traffic is received, the traffic is sent down the SBD 2195 IRB interface. OISM procedures already described in this document 2196 will then ensure that the IP multicast traffic gets distributed 2197 throughout the Tenant Domain to any EVPN PEs that have interest in 2198 it. Thus to an OISM PE that is not an L3 gateway the externally 2199 sourced traffic will appear to have been sourced on the SBD. 2201 In order for this to work, some special care is needed when an L3 2202 gateway creates or modifies a layer 3 (*,G) multicast state. Suppose 2203 group G has both external sources (sources outside the EVPN Tenant 2204 Domain) and internal sources (sources inside the EVPN tenant domain). 2205 Section 4.2 states that when there are internal sources, the SBD IRB 2206 interface must not be added to the OIF list of the (*,G) state.
2207 Traffic from internal sources will already have been delivered to all 2208 the EVPN PEs that have interest in it. However, if the OIF list of 2209 the (*,G) state does not contain its SBD IRB interface, then traffic 2210 from external sources will not get delivered to other EVPN PEs. 2212 One way of handling this is the following. When an L3 gateway 2213 receives (S,G) traffic from other than an IRB interface, and the 2214 traffic corresponds to a layer 3 (*,G) state, the L3 gateway can 2215 create (S,G) state. The IIF will be set to the external interface 2216 over which the traffic is expected. The OIF list will contain the 2217 SBD IRB interface, as well as the IRB interfaces of any other BDs 2218 attached to the PEG DR that have locally attached receivers with 2219 interest in the (S,G) traffic. The (S,G) state will ensure that the 2220 external traffic is sent down the SBD IRB interface. The following 2221 text will assume this procedure; however other implementation 2222 techniques may also be possible. 2224 If a particular BD is attached to several L3 Gateways, one of the L3 2225 Gateways becomes the DR for that BD. (See Section 6.1.2.4.) If the 2226 interworking scenario requires FHR functionality, it is generally the 2227 DR for a particular BD that is responsible for performing that 2228 functionality on behalf of the source hosts on that BD. (E.g., if 2229 the interworking scenario requires that PIM Register messages be sent 2230 by an FHR, the DR for a given BD would send the PIM Register messages 2231 for sources on that BD.) Note though that the DR for the SBD does 2232 not perform FHR functionality on behalf of external sources. 2234 An optional alternative is to have each L3 gateway perform FHR 2235 functionality for locally attached sources. Then the DR would only 2236 have to perform FHR functionality on behalf of sources that are 2237 locally attached to itself AND sources that are not attached to any 2238 L3 gateway. 2240 6.1.2.
Interworking with MVPN 2242 In this section, we specify the procedures necessary to allow EVPN 2243 PEs running OISM procedures to interwork with L3VPN PEs that run BGP- 2244 based MVPN ([RFC6514]) procedures. More specifically, the procedures 2245 herein allow a given EVPN Tenant Domain to become part of an L3VPN/ 2246 MVPN, and support multicast flows where either: 2248 o The source of a given multicast flow is attached to an ethernet 2249 segment whose BD is part of an EVPN Tenant Domain, and one or more 2250 receivers of the flow are attached to the network via L3VPN/MVPN. 2251 (Other receivers may be attached to the network via EVPN.) 2253 o The source of a given multicast flow is attached to the network 2254 via L3VPN/MVPN, and one or more receivers of the flow are attached 2255 to an ethernet segment that is part of an EVPN tenant domain. 2256 (Other receivers may be attached via L3VPN/MVPN.) 2258 In this interworking model, existing L3VPN/MVPN PEs are unaware that 2259 certain sources or receivers are part of an EVPN Tenant Domain. The 2260 existing L3VPN/MVPN nodes run only their standard procedures and are 2261 entirely unaware of EVPN. Interworking is achieved by having some or 2262 all of the EVPN PEs function as L3 Gateways running L3VPN/MVPN 2263 procedures, as detailed in the following sub-sections. 2265 In this section, we assume that there are no tenant multicast routers 2266 on any of the EVPN-attached ethernet segments. (There may of course 2267 be multicast routers in the L3VPN.) Consideration of the case where 2268 there are tenant multicast routers is deferred until Section 7. 2270 To support MVPN/EVPN interworking, we introduce the notion of an 2271 MVPN/EVPN Gateway, or MEG. 2273 A MEG is an L3 Gateway (see Section 6.1.1), hence is both an OISM PE 2274 and an L3VPN/MVPN PE. For a given EVPN Tenant Domain it will have an 2275 IP-VRF. If the Tenant Domain is part of an L3VPN/MVPN, the IP-VRF 2276 also serves as an L3VPN VRF ([RFC4364]).
The IRB interfaces of the 2277 IP-VRF are considered to be "VRF interfaces" of the L3VPN VRF. The 2278 L3VPN VRF may also have other local VRF interfaces that are not EVPN 2279 IRB interfaces. 2281 The VRF on the MEG will import VPN-IP routes ([RFC4364]) from other 2282 L3VPN Provider Edge (PE) routers. It will also export VPN-IP routes 2283 to other L3VPN PE routers. In order to do so, it must be 2284 appropriately configured with the Route Targets used in the L3VPN to 2285 control the distribution of the VPN-IP routes. These Route Targets 2286 will in general be different than the Route Targets used for 2287 controlling the distribution of EVPN routes, as there is no need to 2288 distribute EVPN routes to L3VPN-only PEs and no reason to distribute 2289 L3VPN/MVPN routes to EVPN-only PEs. 2291 Note that the RDs in the imported VPN-IP routes will not necessarily 2292 conform to the EVPN rules (as specified in [RFC7432]) for creating 2293 RDs. Therefore a MEG MUST NOT expect the RDs of the VPN-IP routes to 2294 be of any particular format other than what is required by the L3VPN/ 2295 MVPN specifications. 2297 The VPN-IP routes that a MEG exports to L3VPN are subnet routes and/ 2298 or host routes for the multicast sources that are part of the EVPN 2299 tenant domain. The exact set of routes that need to be exported is 2300 discussed in Section 6.1.2.2. 2302 Each IMET route originated by a MEG SHOULD carry a flag or Extended 2303 Community (to be determined) indicating that the originator of the 2304 IMET route is a MEG. However, PE1 will consider PE2 to be a MEG if 2305 PE1 imports at least one IMET route from PE2 that carries the flag or 2306 EC. 2308 All the MEGs of a given Tenant Domain attach to the SBD of that 2309 domain, and one of them is selected to be the SBD's Designated Router 2310 (DR) for the domain. The selection procedure is discussed in 2311 Section 6.1.2.4. 
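The rule for recognizing a MEG ("at least one imported IMET route carrying the flag or EC") can be illustrated with a short Python sketch. Since the flag or Extended Community is still to be determined, the boolean marker below is purely a placeholder, and the function name known_megs is hypothetical.

```python
def known_megs(imported_imet_routes):
    """Return the set of PEs that the local PE regards as MEGs.

    `imported_imet_routes` is an iterable of (originator, has_meg_marker)
    pairs, one per imported IMET route. A PE counts as a MEG as soon as
    any one of its imported IMET routes carries the marker.
    """
    return {pe for pe, has_meg_marker in imported_imet_routes if has_meg_marker}
```

Note that a PE whose IMET routes are only partly marked is still treated as a MEG under this reading.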
2313 In this model of operation, MVPN procedures and EVPN procedures are 2314 largely independent. In particular, there is no assumption that MVPN 2315 and EVPN use the same kind of tunnels. Thus no special procedures 2316 are needed to handle the common scenarios where, e.g., EVPN uses 2317 VXLAN tunnels but MVPN uses MPLS P2MP tunnels, or where EVPN uses 2318 Ingress Replication but MVPN uses MPLS P2MP tunnels. 2320 Similarly, no special procedures are needed to prevent duplicate data 2321 delivery on ethernet segments that are multi-homed. 2323 The MEG does have some special procedures (described below) for 2324 interworking between EVPN and MVPN; these have to do with selection 2325 of the Upstream PE for a given multicast source, with the exporting 2326 of VPN-IP routes, and with the generation of MVPN C-multicast routes 2327 triggered by the installation of SMET routes. 2329 6.1.2.1. MVPN Sources with EVPN Receivers 2331 6.1.2.1.1. Identifying MVPN Sources 2333 Consider a multicast source S. It is possible that a MEG will import 2334 both an EVPN unicast route to S and a VPN-IP route (or an ordinary IP 2335 route), where the prefix length of each route is the same. In order 2336 to draw (S,G) multicast traffic for any group G, the MEG SHOULD use 2337 the EVPN route rather than the VPN-IP or IP route to determine the 2338 "Upstream PE" (see section 5 of [RFC6513]). 2340 Doing so ensures that when an EVPN tenant system desires to receive a 2341 multicast flow from another EVPN tenant system, the traffic from the 2342 source to that receiver stays within the EVPN domain. This prevents 2343 problems that might arise if there is a unicast route via L3VPN to S, 2344 but no multicast routers along the routed path. This also prevents 2345 problems that might arise as a result of the fact that the MEGs will 2346 import each other's VPN-IP routes. 2348 In Section 6.1.2.1.2, we describe the procedures to be used when 2349 the selected route to S is a VPN-IP route.
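The preference for the EVPN route when selecting the route to S can be sketched as follows. This is only one possible reading: it assumes that ordinary longest-prefix matching is applied first and that the EVPN preference acts as the tie-breaker among routes of equal prefix length. The Route type and the function name select_route_to_source are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # e.g. "10.1.1.0/24" or "10.1.1.5/32"
    kind: str      # "EVPN", "VPN-IP" or "IP"
    next_hop: str  # PE (or router) the route was learned from


def select_route_to_source(source, routes):
    """Pick the route used to determine the Upstream PE for `source`."""
    addr = ipaddress.ip_address(source)
    matching = [r for r in routes
                if addr in ipaddress.ip_network(r.prefix)]
    if not matching:
        return None
    # Longest prefix wins; among equal prefix lengths, an EVPN route is
    # preferred over a VPN-IP or IP route.
    return max(
        matching,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen,
                       r.kind == "EVPN"),
    )
```

If the selected route is of kind "VPN-IP", the source is treated as an MVPN source and the procedures of Section 6.1.2.1.2 apply.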
2351 6.1.2.1.2. Joining a Flow from an MVPN Source 2353 Suppose a tenant system R wants to receive (S,G) multicast traffic, 2354 where source S is not attached to any PE in the EVPN Tenant Domain, 2355 but is attached to an MVPN PE. 2357 o Suppose R is on a singly homed ethernet segment of BD-R, and that 2358 segment is attached to PE1, where PE1 is a MEG. PE1 learns via 2359 IGMP/MLD listening that R is interested in (S,G). PE1 determines 2360 from its VRF that there is no route to S within the Tenant Domain 2361 (i.e., no EVPN RT-2 route with S's IP address), but that there is 2362 a route to S via L3VPN (i.e., the VRF contains a subnet or host 2363 route to S that was received as a VPN-IP route). PE1 thus 2364 originates (if it hasn't already) an MVPN C-multicast Source Tree 2365 Join(S,G) route. The route is constructed according to normal 2366 MVPN procedures. 2368 The layer 2 multicast state is constructed as specified in 2369 Section 4.1. 2371 In the layer 3 multicast state, the IIF is the appropriate MVPN 2372 tunnel, and the IRB interface to BD-R is added to the OIF list. 2374 When PE1 receives (S,G) traffic from the appropriate MVPN tunnel, 2375 it performs IP processing of the traffic, and then sends the 2376 traffic down its IRB interface to BD-R. Following normal OISM 2377 procedures, the (S,G) traffic will be encapsulated for ethernet 2378 and sent out the AC to which R is attached. 2380 o Suppose R is on a singly homed ethernet segment of BD-R, and that 2381 segment is attached to PE1, where PE1 is an OISM PE but is NOT a 2382 MEG. PE1 learns via IGMP/MLD listening that R is interested in 2383 (S,G). PE1 follows normal OISM procedures, originating an SMET 2384 route in BD-R for (S,G). Since this route will carry the SBD-RT, 2385 it will be received by the MEG that is the DR for the Tenant 2386 Domain. The MEG DR can determine from PE1's IMET route whether 2387 PE1 is itself a MEG. 
If PE1 is not a MEG, the MEG DR will 2388 originate (if it hasn't already) an MVPN C-multicast Source Tree 2389 Join(S,G) route. This will cause the DR MEG to receive (S,G) 2390 traffic on an MVPN tunnel. 2392 The layer 2 multicast state is constructed as specified in 2393 Section 4.1. 2395 In the layer 3 multicast state, the IIF is the appropriate MVPN 2396 tunnel, and the IRB interface to the SBD is added to the OIF list. 2398 When the DR MEG receives (S,G) traffic on an MVPN tunnel, it 2399 performs IP processing of the traffic, and then sends the traffic 2400 down its IRB interface to the SBD. Following normal OISM 2401 procedures, the traffic will be encapsulated for ethernet and 2402 delivered to all PEs in the Tenant Domain that have interest in 2403 (S,G), including PE1. 2405 o If R is on a multi-homed ethernet segment of BD-R, one of the PEs 2406 attached to the segment will be its DF (following normal EVPN 2407 procedures), and the DF will know (via the procedures of 2408 [IGMP-Proxy]) that a tenant system reachable via one of its local 2409 ACs to BD-R is interested in (S,G) traffic. The DF is responsible 2410 for originating an SMET route for (S,G), following normal OISM 2411 procedures. If the DF is a MEG, it will originate the 2412 corresponding MVPN C-multicast Source Tree Join(S,G) route; if the 2413 DF is not a MEG, the MEG that is the DR will originate the 2414 C-multicast route when it receives the SMET route. 2416 o If R is attached to a non-OISM PE, it will receive the traffic via 2417 an IPMG, as specified in Section 5. 2419 If an EVPN-attached receiver is interested in (*,G) traffic, and if 2420 it is possible for there to be sources of (*,G) traffic that are 2421 attached only to L3VPN nodes, the MEGs will have to know the group- 2422 to-RP mappings. That will enable them to originate MVPN C-multicast 2423 Shared Tree Join(*,G) routes and to send them towards the RP.
(Since 2424 we are assuming in this section that there are no tenant multicast 2425 routers attached to the EVPN Tenant Domain, the RP must be attached 2426 via L3VPN. Alternatively, the MEG itself could be configured to 2427 function as an RP for group G.) 2429 The layer 2 multicast states are constructed as specified in 2430 Section 4.1. 2432 In the layer 3 (*,G) multicast state, the IIF is the appropriate MVPN 2433 tunnel. A MEG will add to the (*,G) OIF list its IRB interfaces for 2434 any BDs containing locally attached receivers. If there are 2435 receivers attached to other EVPN PEs, then whenever (S,G) traffic 2436 from an external source matches a (*,G) state, the MEG will create 2437 (S,G) state, with the MVPN tunnel as the IIF, the OIF list copied 2438 from the (*,G) state, and the SBD IRB interface added to the OIF 2439 list. (Please see the discussion in Section 6.1.1 regarding the 2440 inclusion of the SBD IRB interface in a (*,G) state; the SBD IRB 2441 interface is used in the OIF list only for traffic from external 2442 sources.) 2444 Normal MVPN procedures will then result in the MEG getting the (*,G) 2445 traffic from all the multicast sources for G that are attached via 2446 L3VPN. This traffic arrives on MVPN tunnels. When the MEG removes 2447 the traffic from these tunnels, it does the IP processing. If there 2448 are any receivers on a given BD, BD-R, that are attached via local 2449 EVPN ACs, the MEG sends the traffic down its BD-R IRB interface. If 2450 there are any other EVPN PEs that are interested in the (*,G) 2451 traffic, the MEG sends the traffic down the SBD IRB interface. 2452 Normal OISM procedures then distribute the traffic as needed to other 2453 EVPN-PEs. 2455 6.1.2.2. EVPN Sources with MVPN Receivers 2457 6.1.2.2.1. 
General procedures 2459 Consider the case where an EVPN tenant system S is sending IP 2460 multicast traffic to group G, and there is a receiver R for the (S,G) 2461 traffic that is attached to the L3VPN, but not attached to the EVPN 2462 Tenant Domain. (We assume in this document that the L3VPN/MVPN-only 2463 nodes will not have any special procedures to deal with the case 2464 where a source is inside an EVPN domain.) 2466 In this case, an L3VPN PE through which R can be reached has to send 2467 an MVPN C-multicast Join(S,G) route to one of the MEGs that is 2468 attached to the EVPN Tenant Domain. For this to happen, the L3VPN PE 2469 must have imported a VPN-IP route for S (either a host route or a 2470 subnet route) from a MEG. 2472 If a MEG determines that there is a multicast source transmitting on 2473 one of its ACs, the MEG SHOULD originate a VPN-IP host route for that 2474 source. This determination SHOULD be made by examining the IP 2475 multicast traffic that arrives on the ACs. (It MAY be made by 2476 provisioning.) A MEG SHOULD NOT export a VPN-IP host route for any 2477 IP address that is not known to be a multicast source (unless it has 2478 some other reason for exporting such a route). The VPN-IP host route 2479 for a given multicast source MUST be withdrawn if the source goes 2480 silent for a configurable period of time, or if it can be determined 2481 that the source is no longer reachable via a local AC. 2483 A MEG SHOULD also originate a VPN-IP subnet route for each of the BDs 2484 in the Tenant Domain. 2486 VPN-IP routes exported by a MEG must carry any attributes or extended 2487 communities that are required by L3VPN and MVPN. In particular, a 2488 VPN-IP route exported by a MEG must carry a VRF Route Import Extended 2489 Community corresponding to the IP-VRF from which it is imported, and 2490 a Source AS Extended Community.
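The host-route origination and withdrawal rules given earlier in this section can be sketched as a small bookkeeping class. The class HostRouteTracker, its method names, and the use of explicit timestamps are assumptions made for illustration; the sketch covers only sources learned from data-plane traffic, not provisioned ones, and does not build the VPN-IP routes themselves.

```python
class HostRouteTracker:
    """Tracks multicast sources seen on local ACs (illustrative only)."""

    def __init__(self, silence_timeout: float):
        # Configurable period after which a silent source is withdrawn.
        self.silence_timeout = silence_timeout
        self.last_seen = {}  # source IP -> time of last multicast frame

    def on_multicast_frame(self, source: str, now: float) -> bool:
        """Record traffic from `source`; True means "originate a host route"."""
        is_new = source not in self.last_seen
        self.last_seen[source] = now
        return is_new

    def on_source_unreachable(self, source: str) -> bool:
        """Source no longer reachable via a local AC; True means "withdraw"."""
        return self.last_seen.pop(source, None) is not None

    def expire(self, now: float) -> list:
        """Return the sources whose host routes must be withdrawn as silent."""
        stale = sorted(s for s, t in self.last_seen.items()
                       if now - t >= self.silence_timeout)
        for s in stale:
            del self.last_seen[s]
        return stale
```

A MEG would call expire() periodically and withdraw the VPN-IP host route of each returned source; a source that later resumes sending is treated as new again.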
2492 As a result, if S is attached to a MEG, the L3VPN nodes will direct 2493 their MVPN C-multicast Join routes to that MEG. Normal MVPN 2494 procedures will cause the traffic to be delivered to the L3VPN nodes. 2495 The layer 3 multicast state for (S,G) will have the MVPN tunnel on 2496 its OIF list. The IIF will be the IRB interface leading to the BD 2497 containing S. 2499 If S is not attached to a MEG, the L3VPN nodes will direct their 2500 C-multicast Join routes to whichever MEG appears to be on the best 2501 route to S's subnet. Upon receiving the C-multicast Join, that MEG 2502 will originate an EVPN SMET route for (S,G). As a result, the MEG 2503 will receive the (S,G) traffic at layer 2 via the OISM procedures. 2504 The (S,G) traffic will be sent up the appropriate IRB interface, and 2505 the layer 3 MVPN procedures will ensure that the traffic is delivered 2506 to the L3VPN nodes that have requested it. The layer 3 multicast 2507 state for (S,G) will have the MVPN tunnel in the OIF list, and the 2508 IIF will be one of the following: 2510 o If S belongs to a BD that is attached to the MEG, the IIF will be 2511 the IRB interface to that BD; 2513 o Otherwise the IIF will be the SBD IRB interface. 2515 Note that this works even if S is attached to a non-OISM PE, per the 2516 procedures of Section 5. 2518 6.1.2.2.2. Any-Source Multicast (ASM) Groups 2520 Suppose the MEG DR learns that one of the PEs in its Tenant Domain is 2521 interested in (*,G) traffic, where G is an Any-Source Multicast 2522 (ASM) group. If there are no tenant multicast routers, the MEG DR 2523 SHOULD perform the "First Hop Router" (FHR) functionality for group G 2524 on behalf of the Tenant Domain, as described in [RFC7761]. This 2525 means that the MEG DR must know the identity of the Rendezvous Point 2526 (RP) for each group, must send Register messages to the Rendezvous 2527 Point, etc.
2529 If the MEG DR is to be the FHR for the Tenant Domain, it must see all 2530 the multicast traffic that is sourced from within the domain and 2531 destined to an ASM group address. The MEG can ensure this by 2532 originating an SBD-SMET route for (*,*). As an optimization, an 2533 SBD-SMET route for (*, "any ASM group"), or even (*, "any ASM group 2534 that might have MVPN sources") can be defined. 2536 In some deployment scenarios, it may be preferred that the MEG that 2537 receives the (S,G) traffic over an AC be the one that provides the FHR 2538 functionality. In that case, the MEG DR would not need to provide the 2539 FHR functionality for (S,G) traffic whose source is attached to another MEG. 2541 Other deployment scenarios are also possible. For example, one might 2542 want to configure the MEGs to themselves be RPs. In this case, the 2543 RPs would have to exchange with each other information about which 2544 sources are active. The method for exchanging such information is 2545 outside the scope of this document. 2547 6.1.2.2.3. Source on Multihomed Segment 2549 Suppose S is attached to a segment that is all-active multi-homed to 2550 PE1 and PE2. If S is transmitting to two groups, say G1 and G2, it 2551 is possible that PE1 will receive the (S,G1) traffic from S while PE2 2552 receives the (S,G2) traffic from S. 2554 This creates an issue for MVPN/EVPN interworking, because there is no 2555 way to cause L3VPN/MVPN nodes to select PE1 as the ingress PE for 2556 (S,G1) traffic while selecting PE2 as the ingress PE for (S,G2) 2557 traffic. 2559 However, the following procedure ensures that the IP multicast 2560 traffic will still flow, even if the L3VPN/MVPN nodes pick the 2561 "wrong" EVPN-PE as the Upstream PE for (say) the (S,G1) traffic. 2563 Suppose S is on an ethernet segment, belonging to BD1, that is 2564 multi-homed to both PE1 and PE2, where PE1 is a MEG.
And suppose 2565 that IP multicast traffic from S to G travels over the AC that 2566 attaches the segment to PE2. If PE1 receives a C-multicast Source 2567 Tree Join (S,G) route, it MUST originate an SMET route for (S,G). 2568 Normal OISM procedures will then cause PE2 to send the (S,G) traffic 2569 to PE1 on an EVPN IP multicast tunnel. Normal OISM procedures will 2570 also cause PE1 to send the (S,G) traffic up its BD1 IRB interface. 2571 Normal MVPN procedures will then cause PE1 to forward the traffic on 2572 an MVPN tunnel. In this case, the routing is not optimal, but the 2573 traffic does flow correctly. 2575 6.1.2.3. Obtaining Optimal Routing of Traffic Between MVPN and EVPN 2577 The routing of IP multicast traffic between MVPN nodes and EVPN nodes 2578 will be optimal as long as there is a MEG along the optimal route. 2579 There are various deployment strategies that can be used to obtain 2580 optimal routing between MVPN and EVPN. 2582 In one such scenario, a Tenant Domain will have a small number of 2583 strategically placed MEGs. For example, a Data Center may have a 2584 small number of MEGs that connect it to a wide-area network. Then 2585 the optimal route into or out of the Data Center would be through the 2586 MEGs. 2588 In this scenario, the MEGs do not need to originate VPN-IP host 2589 routes for the multicast sources; they only need to originate VPN-IP 2590 subnet routes. The internal structure of the EVPN is completely 2591 hidden from the MVPN node. EVPN actions such as MAC Mobility and 2592 Mass Withdrawal ([RFC7432]) have zero impact on the MVPN control 2593 plane. 2595 While this deployment scenario provides the most optimal routing and 2596 has the least impact on the installed base of MVPN nodes, it does 2597 complicate network planning considerations. 2599 Another way of providing routing that is close to optimal is to turn 2600 each EVPN PE into a MEG. Then routing of MVPN-to-EVPN traffic is 2601 optimal.
However, routing of EVPN-to-MVPN traffic is not guaranteed 2602 to be optimal when a source host is on a multi-homed ethernet segment 2603 (as discussed in Section 6.1.2.2). 2605 The obvious disadvantage of this method is that it requires every 2606 EVPN PE to be a MEG. 2608 The procedures specified in this document allow an operator to add 2609 MEG functionality to any subset of his EVPN OISM PEs. This allows an 2610 operator to make whatever trade-offs he deems appropriate between 2611 optimal routing and MEG deployment. 2613 6.1.2.4. DR Selection 2615 Each MEG MUST be configured with an "MEG dummy ethernet segment" that 2616 has no ACs. 2618 EVPN supports a number of procedures that can be used to select the 2619 Designated Forwarder (DF) for a particular BD on a particular 2620 ethernet segment. Some of the possible procedures can be found, 2621 e.g., in [RFC7432], [EVPN-DF-NEW], and [EVPN-DF-WEIGHTED]. Whatever 2622 procedure is in use in a given deployment can be adapted to select a 2623 MEG DR for a given BD, as follows. 2625 Each MEG will originate an Ethernet Segment route for the MEG dummy 2626 ethernet segment. It MUST carry a Route Target derived from the 2627 corresponding Ethernet Segment Identifier. Thus only MEGs will 2628 import the route. 2630 Once the set of MEGs is known, it is also possible to determine the 2631 set of BDs supported by each MEG. The DF selection procedure can 2632 then be used to choose a MEG DR for the SBD. (The conditions under 2633 which the MEG DR changes depend upon the DF selection algorithm that 2634 is in use.) 2636 These procedures can also be used to select a DR for each BD. 2638 6.1.3. Interworking with 'Global Table Multicast' 2640 If multicast service to the outside sources and/or receivers is 2641 provided via the BGP-based "Global Table Multicast" (GTM) procedures 2642 of [RFC7716], the procedures of Section 6.1.2 can easily be adapted 2643 for EVPN/GTM interworking.
The way to adapt the MVPN procedures to 2644 GTM is explained in [RFC7716]. 2646 6.1.4. Interworking with PIM 2648 As we have been discussing, there may be receivers in an EVPN tenant 2649 domain that are interested in multicast flows whose sources are 2650 outside the EVPN Tenant Domain. Or there may be receivers outside an 2651 EVPN Tenant Domain that are interested in multicast flows whose 2652 sources are inside the Tenant Domain. 2654 If the outside sources and/or receivers are part of an MVPN, 2655 interworking procedures are covered in Section 6.1.2. 2657 There are also cases where an external source or receiver is 2658 attached via IP, and the layer 3 multicast routing is done via PIM. 2659 In this case, the interworking between the "PIM domain" and the EVPN 2660 tenant domain is done at L3 Gateways that perform "PIM/EVPN Gateway" 2661 (PEG) functionality. A PEG is very similar to a MEG, except that its 2662 layer 3 multicast routing is done via PIM rather than via BGP. 2664 If external sources or receivers for a given group are attached to a 2665 PEG via a layer 3 interface, that interface should be treated as a 2666 VRF interface attached to the Tenant Domain's L3VPN VRF. The layer 3 2667 multicast routing instance for that Tenant Domain will either run PIM 2668 on the VRF interface or will listen for IGMP/MLD messages on that 2669 interface. If the external receiver is attached elsewhere on an IP 2670 network, the PE has to enable PIM on its interfaces to the backbone 2671 network. In both cases, the PE needs to perform PEG functionality, 2672 and its IMET routes must carry a flag or EC identifying it as a PEG. 2674 For each BD on which there is a multicast source or receiver, one of 2675 the PEGs will become the PEG DR. DR selection can be done using the 2676 same procedures specified in Section 6.1.2.4.
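As an illustration of DR selection, the following minimal sketch assumes that the deployment uses the default modulo-based DF election of [RFC7432] (candidates ordered by increasing IP address, ordinal = V mod N). The function name select_dr, the use of a numeric per-BD tag as the value V, and the example addresses are assumptions made for this sketch; a deployment using [EVPN-DF-NEW] or [EVPN-DF-WEIGHTED] would select differently.

```python
import ipaddress


def select_dr(candidate_pes, bd_tag):
    """Pick a DR for one BD among the candidate gateways (MEGs or PEGs).

    candidate_pes: IP addresses (strings) of the gateways known to
                   support the BD (hypothetical input format).
    bd_tag:        numeric identifier of the BD, used as the value V
                   in the modulo computation (an assumption).
    """
    if not candidate_pes:
        raise ValueError("no candidate gateways for this BD")
    # Order numerically, not lexically, so "192.0.2.9" sorts before "192.0.2.10".
    ordered = sorted(candidate_pes, key=lambda a: int(ipaddress.ip_address(a)))
    # The gateway whose ordinal equals (V mod N) becomes the DR.
    return ordered[bd_tag % len(ordered)]
```

Because every gateway runs the same computation over the same candidate set, all of them arrive at the same DR without further signaling.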
2678 As long as there are no tenant multicast routers within the EVPN 2679 Tenant Domain, the PEGs do not need to run PIM on their IRB 2680 interfaces. 2682 6.1.4.1. Source Inside EVPN Domain 2684 If a PEG receives a PIM Join(S,G) from outside the EVPN tenant 2685 domain, it may find it necessary to create (S,G) state. The PE needs 2686 to determine whether S is within the Tenant Domain. If S is not 2687 within the EVPN Tenant Domain, the PE carries out normal layer 3 2688 multicast routing procedures. If S is within the EVPN tenant domain, 2689 the IIF of the (S,G) state is set as follows: 2691 o if S is on a BD that is attached to the PE, the IIF is the PE's 2692 IRB interface to that BD; 2694 o if S is not on a BD that is attached to the PE, the IIF is the 2695 PE's IRB interface to the SBD. 2697 When the PE creates such an (S,G) state, it MUST originate (if it 2698 hasn't already) an SBD-SMET route for (S,G). This will cause it to 2699 pull the (S,G) traffic via layer 2. When the traffic arrives over an 2700 EVPN tunnel, it gets sent up an IRB interface where the layer 3 2701 multicast routing determines the packet's disposition. The SBD-SMET 2702 route is withdrawn when the (S,G) state no longer exists (unless 2703 there is some other reason for not withdrawing it). 2705 If there are no tenant multicast routers within the EVPN tenant domain, 2706 there cannot be an RP in the Tenant Domain, so a PEG does not have to 2707 handle externally arriving PIM Join(*,G) messages. 2709 The PEG DR for a particular BD MUST act as a First Hop Router for 2710 that BD. It will examine all (S,G) traffic on the BD, and whenever G 2711 is an ASM group, the PEG DR will send Register messages to the RP for 2712 G. This means that the PEG DR will need to pull all the (S,G) 2713 traffic originating on a given BD, by originating an SMET (*,*) route 2714 for that BD.
If a PEG DR is the DR for all the BDs, it SHOULD 2715 originate just an SBD-SMET (*,*) route rather than an SMET (*,*) 2716 route for each BD. 2718 The rules for exporting IP routes to multicast sources are the same 2719 as those specified for MEGs in Section 6.1.2.2, except that the 2720 exported routes will be IP routes rather than VPN-IP routes, and it 2721 is not necessary to attach the VRF Route Import EC or the Source AS 2722 EC. 2724 When a source is on a multi-homed segment, the same issue discussed 2725 in Section 6.1.2.2.3 exists. Suppose S is on an ethernet segment, 2726 belonging to BD1, that is multi-homed to both PE1 and PE2, where PE1 2727 is a PEG. And suppose that IP multicast traffic from S to G travels 2728 over the AC that attaches the segment to PE2. If PE1 receives an 2729 external PIM Join(S,G) message, it MUST originate an SMET route for 2730 (S,G). Normal OISM procedures will cause PE2 to send the (S,G) 2731 traffic to PE1 on an EVPN IP multicast tunnel. Normal OISM 2732 procedures will also cause PE1 to send the (S,G) traffic up its BD1 2733 IRB interface. Normal PIM procedures will then cause PE1 to forward 2734 the traffic along a PIM tree. In this case, the routing is not 2735 optimal, but the traffic does flow correctly. 2737 6.1.4.2. Source Outside EVPN Domain 2739 By means of normal OISM procedures, a PEG learns whether there are 2740 receivers in the Tenant Domain that are interested in receiving (*,G) 2741 or (S,G) traffic. The PEG must determine whether S (or the RP for G) 2742 is outside the EVPN Tenant Domain. If so, and if there is a receiver 2743 on BD1 interested in receiving such traffic, the PEG DR for BD1 is 2744 responsible for originating a PIM Join(S,G) or Join(*,G) control 2745 message. 2747 An alternative would be to allow any PEG that is directly attached to 2748 a receiver to originate the PIM Joins.
Then the PEG DR would only 2749 have to originate PIM Joins on behalf of receivers that are not 2750 attached to a PEG. However, if this is done, it is necessary for the 2751 PEGs to run PIM on all their IRB interfaces, so that the PIM Assert 2752 procedures can be used to prevent duplicate delivery to a given BD. 2754 The IIF for the layer 3 (S,G) or (*,G) state is determined by normal 2755 PIM procedures. If a receiver is on BD1, and the PEG DR is attached 2756 to BD1, its IRB interface to BD1 is added to the OIF list. This 2757 ensures that any receivers locally attached to the PEG DR will 2758 receive the traffic. If there are receivers attached to other EVPN 2759 PEs, then whenever (S,G) traffic from an external source matches a 2760 (*,G) state, the PEG will create (S,G) state. The IIF will be set to 2761 whatever external interface the traffic is expected to arrive on 2762 (copied from the (*,G) state), the OIF list is copied from the (*,G) 2763 state, and the SBD IRB interface is added to the OIF list. 2765 6.2. Interworking with PIM via an External PIM Router 2767 Section 6.1 describes how to use an OISM PE router as the gateway to 2768 a non-EVPN multicast domain, when the EVPN tenant domain is not being 2769 used as an intermediate transit network for multicast. An 2770 alternative approach is to have one or more external PIM routers 2771 (perhaps operated by a tenant) on one of the BDs of the tenant 2772 domain. We will refer to this BD as the "gateway BD". 2774 In this model: 2776 o The EVPN Tenant Domain is treated as a stub network attached to 2777 the external PIM routers. 2779 o The external PIM routers follow normal PIM procedures, and provide 2780 the FHR and LHR functionality for the entire Tenant Domain. 2782 o The OISM PEs do not run PIM. 2784 o If an OISM PE not attached to the gateway BD has interest in a 2785 given multicast flow, it conveys that interest to the OISM PEs 2786 that are attached to the gateway BD.
This is done by following 2787 normal OISM procedures. As a result, IGMP/MLD messages will be seen 2788 by the external PIM routers on the gateway BD, and those external 2789 PIM routers will send PIM Join messages externally as required. 2790 Traffic of the given multicast flow will then be received by one 2791 of the external PIM routers, and that traffic will be forwarded by 2792 that router to the gateway BD. 2794 The normal OISM procedures will then cause the given multicast 2795 flow to be tunneled to any PEs of the EVPN Tenant Domain that have 2796 interest in the flow. PEs attached to the gateway BD will see the 2797 flow as originating from the gateway BD; other PEs will see the 2798 flow as originating from the SBD. 2800 o An OISM PE attached to a gateway BD MUST set its layer 2 multicast 2801 state to indicate that each AC to the gateway BD has interest in 2802 all multicast flows. It MUST also originate an SMET route for 2803 (*,*). The procedures for originating SMET routes are discussed 2804 in Section 2.5. 2806 o This will cause the OISM PEs attached to the gateway BD to receive 2807 all the IP multicast traffic that is sourced within the EVPN 2808 tenant domain, and to transmit that traffic to the gateway BD, 2809 where the external PIM routers will see it. (Of course, if the 2810 gateway BD has a multi-homed segment, only the PE that is the DF 2811 for that segment will transmit the multicast traffic to the 2812 segment.) 2814 7. Using an EVPN Tenant Domain as an Intermediate (Transit) Network for 2815 Multicast traffic 2817 In this section, we consider the scenario where one or more BDs of an 2818 EVPN Tenant Domain are being used to carry IP multicast traffic for 2819 which the source and at least one receiver are not part of the tenant 2820 domain. That is, one or more BDs of the Tenant Domain are 2821 intermediate "links" of a larger multicast tree created by PIM.
2823 We define a "tenant multicast router" as a multicast router, running 2824 PIM, that is: 2826 attached to one or more BDs of the Tenant Domain, but 2828 is not an EVPN PE router. 2830 In order for an EVPN Tenant Domain to be used as a transit network for IP 2831 multicast, one or more of its BDs must have tenant multicast routers, 2832 and an OISM PE attached to such a BD MUST be provisioned to 2833 enable PIM on its IRB interface to that BD. (This is true even if 2834 none of the tenant routers is on a segment attached to the PE.) 2835 Further, all the OISM PEs (even ones not attached to a BD with tenant 2836 multicast routers) MUST be provisioned to enable PIM on their SBD IRB 2837 interfaces. 2839 If PIM is enabled on a particular BD, the DR Selection procedure of 2840 Section 6.1.2.4 MUST be replaced by the normal PIM DR Election 2841 procedure of [RFC7761]. Note that this may result in one of the 2842 tenant routers being selected as the DR, rather than one of the OISM 2843 PE routers. In this case, First Hop Router and Last Hop Router 2844 functionality will not be performed by any of the EVPN PEs. 2846 A PIM control message on a particular BD is considered to be a 2847 link-local multicast message, and as such is sent transparently from 2848 PE to PE via the BUM tunnel for that BD. This is true whether the 2849 control message was received from an AC, or whether it was received 2850 from the local layer 3 routing instance via an IRB interface. 2852 A PIM Join/Prune message contains three fields that are relevant to 2853 the present discussion: 2855 o Upstream Neighbor 2857 o Group Address (G) 2859 o Source Address (S), omitted in the case of (*,G) Join/Prune 2860 messages. 2862 We will generally speak of a PIM Join as a "Join(S,G)" or a 2863 "Join(*,G)" message, and will use the term "Join(X,G)" to mean 2864 "either Join(S,G) or Join(*,G)".
In the context of a Join(X,G), we 2865 will use the term "X" to mean "S in the case of (S,G), or G's RP in 2866 the case of (*,G)". 2868 Suppose BD1 contains two tenant multicast routers, C1 and C2. 2869 Suppose C1 is on a segment attached to PE1, and C2 is on a segment 2870 attached to PE2. When C1 sends a PIM Join(X,G) to BD1, the Upstream 2871 Neighbor field might be set to either PE1, PE2, or C2. C1 chooses 2872 the Upstream Neighbor based on its unicast routing. Typically, it 2873 will choose as the Upstream Neighbor the PIM router on BD1 that is 2874 "closest" (according to the unicast routing) to X. Note that this 2875 will not necessarily be PE1. PE1 may not even be visible to the 2876 unicast routing algorithm used by the tenant routers. Even if it is, 2877 it is unlikely to be the PIM router that is closest to X. So we need 2878 to consider the following two cases: 2880 C1 sends a PIM Join(X,G) to BD1, with PE1 as the Upstream 2881 Neighbor. 2883 PE1's PIM routing instance will see the Join arrive on the BD1 IRB 2884 interface. If X is not within the Tenant Domain, PE1 handles the 2885 Join according to normal PIM procedures. This will generally 2886 result in PE1 selecting an Upstream Neighbor and sending it a 2887 Join(X,G). 2889 If X is within the Tenant Domain, but is attached to some other 2890 PE, PE1 sends (if it hasn't already) an SBD-SMET route for (X,G). 2891 The IIF of the layer 3 (X,G) state will be the SBD IRB interface, 2892 and the OIF list will include the IRB interface to BD1. 2894 The SBD-SMET route will pull the (X,G) traffic to PE1, and the 2895 (X,G) state will result in the (X,G) traffic being forwarded to 2896 C1. 2898 If X is within the Tenant Domain, but is attached to PE1 itself, 2899 no SBD-SMET route is sent. The IIF of the layer 3 (X,G) state 2900 will be the IRB interface to X's BD, and the OIF list will include 2901 the IRB interface to BD1. 
2903 C1 sends a PIM Join(X,G) to BD1, with either PE2 or C2 as the 2904 Upstream Neighbor. 2906 PE1's PIM routing instance will see the Join arrive on the BD1 IRB 2907 interface. If neither X nor Upstream Neighbor is within the 2908 tenant domain, PE1 handles the Join according to normal PIM 2909 procedures. This will NOT result in PE1 sending a Join(X,G). 2911 If either X or Upstream Neighbor is within the Tenant Domain, PE1 2912 sends (if it hasn't already) an SBD-SMET route for (X,G). The IIF 2913 of the layer 3 (X,G) state will be the SBD IRB interface, and the 2914 OIF list will include the IRB interface to BD1. 2916 The SBD-SMET route will pull the (X,G) traffic to PE1, and the 2917 (X,G) state will result in the (X,G) traffic being forwarded to 2918 C1. 2920 8. IANA Considerations 2922 To be supplied. 2924 9. Security Considerations 2926 This document uses protocols and procedures defined in the normative 2927 references, and inherits the security considerations of those 2928 references. 2930 This document adds flags or Extended Communities (ECs) to a number of 2931 BGP routes, in order to signal that particular nodes support the 2932 OISM, IPMG, MEG, and/or PEG functionalities that are defined in this 2933 document. Incorrect addition, removal, or modification of those 2934 flags and/or ECs will cause the procedures defined herein to 2935 malfunction, in which case loss or diversion of data traffic is 2936 possible. 2938 10. Acknowledgements 2940 The authors thank Vikram Nagarajan and Princy Elizabeth for their 2941 work on Section 6.2. The authors also benefited tremendously from 2942 discussions with Aldrin Isaac on EVPN multicast optimizations. 2944 11. References 2946 11.1. Normative References 2948 [EVPN-AR] Rabadan, J., Ed., "Optimized Ingress Replication solution 2949 for EVPN", internet-draft ietf-bess-evpn-optimized-ir- 2950 02.txt, August 2017. 2952 [EVPN-BUM] 2953 Zhang, Z., Lin, W., Rabadan, J., and K. 
Patel, "Updates on 2954 EVPN BUM Procedures", internet-draft ietf-bess-evpn-bum- 2955 procedure-updates-01.txt, December 2016. 2957 [EVPN-IRB] 2958 Sajassi, A., Salam, S., Thoria, S., Drake, J., Rabadan, 2959 J., and L. Yong, "Integrated Routing and Bridging in 2960 EVPN", internet-draft draft-ietf-bess-evpn-inter-subnet- 2961 forwarding-03.txt, February 2017. 2963 [EVPN_IP_Prefix] 2964 Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A. 2965 Sajassi, "IP Prefix Advertisement in EVPN", internet- 2966 draft ietf-bess-evpn-prefix-advertisement-05.txt, July 2967 2017. 2969 [IGMP-Proxy] 2970 Sajassi, A., Thoria, S., Patel, K., Yeung, D., Drake, J., 2971 and W. Lin, "IGMP and MLD Proxy for EVPN", internet-draft 2972 draft-ietf-bess-evpn-igmp-mld-proxy-00.txt, March 2017. 2974 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2975 Requirement Levels", BCP 14, RFC 2119, 2976 DOI 10.17487/RFC2119, March 1997, 2977 . 2979 [RFC2236] Fenner, W., "Internet Group Management Protocol, Version 2980 2", RFC 2236, DOI 10.17487/RFC2236, November 1997, 2981 . 2983 [RFC2710] Deering, S., Fenner, W., and B. Haberman, "Multicast 2984 Listener Discovery (MLD) for IPv6", RFC 2710, 2985 DOI 10.17487/RFC2710, October 1999, 2986 . 2988 [RFC6625] Rosen, E., Ed., Rekhter, Y., Ed., Hendrickx, W., and R. 2989 Qiu, "Wildcards in Multicast VPN Auto-Discovery Routes", 2990 RFC 6625, DOI 10.17487/RFC6625, May 2012, 2991 . 2993 [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., 2994 Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based 2995 Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2996 2015, . 2998 11.2. Informative References 3000 [EVPN-BIER] 3001 Zhang, Z., Przygienda, A., Sajassi, A., and J. Rabadan, 3002 "Updates on EVPN BUM Procedures", internet-draft ietf- 3003 zzhang-bier-evpn-00.txt, June 2017. 3005 [EVPN-DF-NEW] 3006 Mohanty, S., Patel, K., Sajassi, A., Drake, J., and T. 
3007 Przygienda, "A new Designated Forwarder Election for the 3008 EVPN", internet-draft ietf-bess-evpn-df-election-02.txt, 3009 April 2017. 3011 [EVPN-DF-WEIGHTED] 3012 Rabadan, J., Sathappan, S., Przygienda, T., Lin, W., 3013 Drake, J., Sajassi, A., and S. Mohanty, "Preference-based 3014 EVPN DF Election", internet-draft ietf-bess-evpn-pref-df- 3015 00.txt, June 2017. 3017 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 3018 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 3019 2006, . 3021 [RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/ 3022 BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February 3023 2012, . 3025 [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 3026 Encodings and Procedures for Multicast in MPLS/BGP IP 3027 VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, 3028 . 3030 [RFC7716] Zhang, J., Giuliano, L., Rosen, E., Ed., Subramanian, K., 3031 and D. Pacella, "Global Table Multicast with BGP Multicast 3032 VPN (BGP-MVPN) Procedures", RFC 7716, 3033 DOI 10.17487/RFC7716, December 2015, 3034 . 3036 [RFC7761] Fenner, B., Handley, M., Holbrook, H., Kouvelas, I., 3037 Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent 3038 Multicast - Sparse Mode (PIM-SM): Protocol Specification 3039 (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March 3040 2016, . 3042 Appendix A. Integrated Routing and Bridging 3044 This Appendix provides a short tutorial on the interaction of routing 3045 and bridging. First it shows the traditional model, where bridging 3046 and routing are performed in separate boxes. Then it shows the model 3047 specified in [EVPN-IRB], where a single box contains both routing and 3048 bridging functions. The latter model is presupposed in the body of 3049 this document. 3051 Figure 1 shows a "traditional" router that only does routing and has 3052 no L2 bridging capabilities. There are two LANs, LAN1 and LAN2. 3053 LAN1 is realized by switch1, LAN2 by switch2. 
The router has an 3054 interface, "lan1" that attaches to LAN1 (via switch1) and an 3055 interface "lan2" that attaches to LAN2 (via switch2). Each interface 3056 is configured, as an IP interface, with an IP address and a subnet 3057 mask. 3059 +-------+ +--------+ +-------+ 3060 | | lan1| |lan2 | | 3061 H1 -----+Switch1+--------+ Router1+--------+Switch2+------H3 3062 | | | | | | 3063 H2 -----| | | | | | 3064 +-------+ +--------+ +-------+ 3065 |_________________| |__________________| 3066 LAN1 LAN2 3068 Figure 1: Conventional Router with LAN Interfaces 3070 IP traffic (unicast or multicast) that remains within a single subnet 3071 never reaches the router. For instance, if H1 emits an ethernet 3072 frame with H2's MAC address in the ethernet destination address 3073 field, the frame will go from H1 to Switch1 to H2, without ever 3074 reaching the router. Since the frame is never seen by a router, the 3075 IP datagram within the frame remains entirely unchanged; e.g., its 3076 TTL is not decremented. The ethernet Source and Destination MAC 3077 addresses are not changed either. 3079 If H1 wants to send a unicast IP datagram to H3, which is on a 3080 different subnet, H1 has to be configured with the IP address of a 3081 "default router". Let's assume that H1 is configured with an IP 3082 address of Router1 as its default router address. H1 compares H3's 3083 IP address with its own IP address and IP subnet mask, and determines 3084 that H3 is on a different subnet. So the packet has to be routed. 3085 H1 uses ARP to map Router1's IP address to a MAC address on LAN1. H1 3086 then encapsulates the datagram in an ethernet frame, using Router1's 3087 MAC address as the destination MAC address, and sends the frame to 3088 Router1. 3090 Router1 then receives the frame over its lan1 interface. Router1 3091 sees that the frame is addressed to it, so it removes the ethernet 3092 encapsulation and processes the IP datagram.
The datagram is not 3093 addressed to Router1, so it must be forwarded further. Router1 does 3094 a lookup of the datagram's IP destination field, and determines that 3095 the destination (H3) can be reached via Router1's lan2 interface. 3096 Router1 now performs the IP processing of the datagram: it decrements 3097 the IP TTL, adjusts the IP header checksum (if present), may fragment 3098 the packet if necessary, etc. Then the datagram (or each of its fragments) 3099 is encapsulated in an ethernet header, with Router1's MAC address on 3100 LAN2 as the MAC Source Address, and H3's MAC address on LAN2 (which 3101 Router1 determines via ARP) as the MAC Destination Address. Finally 3102 the packet is sent out the lan2 interface. 3104 If H1 has an IP multicast datagram to send (i.e., an IP datagram 3105 whose Destination Address field is an IP Multicast Address), it 3106 encapsulates it in an ethernet frame whose MAC Destination Address is 3107 computed from the IP Destination Address. 3109 If H2 is a receiver for that multicast address, H2 will receive a 3110 copy of the frame, unchanged, from H1. The MAC Source Address in the 3111 ethernet encapsulation does not change, the IP TTL field does not get 3112 decremented, etc. 3114 If H3 is a receiver for that multicast address, the datagram must be 3115 routed to H3. In order for this to happen, Router1 must be 3116 configured as a multicast router, and it must accept traffic sent to 3117 ethernet multicast addresses. Router1 will receive H1's multicast 3118 frame on its lan1 interface, will remove the ethernet encapsulation, 3119 and will determine how to dispatch the IP datagram based on Router1's 3120 multicast forwarding states. If Router1 knows that there is a 3121 receiver for the multicast datagram on LAN2, it makes a copy of the 3122 datagram, decrements the TTL (and performs any other necessary IP 3123 processing), and then encapsulates the datagram in an ethernet frame for 3124 LAN2.
The MAC Source Address for this frame will be Router1's MAC 3125 address on LAN2. The MAC Destination Address is computed from 3126 the IP Destination Address. Finally, the frame is sent out Router1's 3127 lan2 interface. 3129 Figure 2 shows an Integrated Router/Bridge that supports the routing/ 3130 bridging integration model of [EVPN-IRB]. 3132 +------------------------------------------+ 3133 | Integrated Router/Bridge | 3135 +-------+ +--------+ +-------+ 3136 | | IRB1| L3 |IRB2 | | 3137 H1 -----+ BD1 +--------+Routing +--------+ BD2 +------H3 3138 | | |Instance| | | 3139 H2 -----| | | | | | 3140 +-------+ +--------+ +-------+ 3141 |___________________| |____________________| 3142 LAN1 LAN2 3144 Figure 2: Integrated Router/Bridge 3146 In Figure 2, a single box consists of one or more "L3 Routing 3147 Instances". The routing/forwarding table of a given routing 3148 instance is known as an IP-VRF ([EVPN-IRB]). In the context of EVPN, 3149 it is convenient to think of each routing instance as representing 3150 the routing of a particular tenant. Each IP-VRF is attached to one 3151 or more interfaces. 3153 When several EVPN PEs have a routing instance of the same tenant 3154 domain, those PEs advertise IP routes to the attached hosts. This is 3155 done as specified in [EVPN-IRB]. 3157 The integrated router/bridge shown in Figure 2 also attaches to a 3158 number of "Broadcast Domains" (BDs). Each BD performs the functions 3159 that are performed by the bridges in Figure 1. To the L3 routing 3160 instance, each BD appears to be a LAN. The interface attaching a 3161 particular BD to a particular IP-VRF is known as an "IRB Interface". 3162 From the perspective of L3 routing, each BD is a subnet. Thus each 3163 IRB interface is configured with a MAC address (which is the router's 3164 MAC address on the corresponding LAN), as well as an IP address and 3165 subnet mask. 3167 The integrated router/bridge shown in Figure 2 may have multiple ACs 3168 to each BD.
These ACs are visible only to the bridging function, not 3169 to the routing instance. To the L3 routing instance, there is just 3170 one "interface" to each BD. 3172 If the L3 routing instance represents the IP routing of a particular 3173 tenant, the BDs attached to that routing instance are BDs belonging 3174 to that same tenant. 3176 Bridging and routing now proceed exactly as in the case of Figure 1, 3177 except that BD1 replaces Switch1, BD2 replaces Switch2, interface 3178 IRB1 replaces interface lan1, and interface IRB2 replaces interface 3179 lan2. 3181 It is important to understand that an IRB interface connects an L3 3182 routing instance to a BD, NOT to a "MAC-VRF". (See [RFC7432] for the 3183 definition of "MAC-VRF".) A MAC-VRF may contain several BDs, as long 3184 as no MAC address appears in more than one BD. From the perspective 3185 of the L3 routing instance, each individual BD is an individual IP 3186 subnet; whether each BD has its own MAC-VRF or not is irrelevant to 3187 the L3 routing instance. 3189 Figure 3 illustrates IRB when a pair of BDs (subnets) are attached to 3190 two different PE routers. In this example, each BD has two segments, 3191 and one segment of each BD is attached to one PE router. 

           +------------------------------------------+
           |        Integrated Router/Bridges         |

           +-------+        +--------+        +-------+
           |       |    IRB1|        |IRB2    |       |
   H1 -----+  BD1  +--------+  PE1   +--------+  BD2  +------H3
           |(Seg-1)|        |(L3 Rtg)|        |(Seg-1)|
   H2 -----|       |        |        |        |       |
           +-------+        +--------+        +-------+
     |___________________|       |      |____________________|
             LAN1                |               LAN2
                                 |
                                 |
           +-------+        +--------+        +-------+
           |       |    IRB1|        |IRB2    |       |
   H4 -----+  BD1  +--------+  PE2   +--------+  BD2  +------H5
           |(Seg-2)|        |(L3 Rtg)|        |(Seg-2)|
           |       |        |        |        |       |
           +-------+        +--------+        +-------+

        Figure 3: Integrated Router/Bridges with Distributed Subnet

   If H1 needs to send an IP packet to H4, it determines from its IP
   address and subnet mask that H4 is on the same subnet as H1.
   Although H1 and H4 are not attached to the same PE router, EVPN
   provides ethernet communication among all hosts that are on the same
   BD. H1 thus uses ARP to find H4's MAC address, and sends an ethernet
   frame with H4's MAC address in the Destination MAC address field.
   The frame is received at PE1, but since the Destination MAC address
   is not PE1's MAC address, PE1 assumes that the frame is to remain on
   BD1. Therefore the packet inside the frame is NOT decapsulated, and
   is NOT sent up the IRB interface to PE1's routing instance. Rather,
   standard EVPN intra-subnet procedures (as detailed in [RFC7432]) are
   used to deliver the frame to PE2, which then sends it to H4.

   If H1 needs to send an IP packet to H5, it determines from its IP
   address and subnet mask that H5 is NOT on the same subnet as H1.
   Assuming that H1 has been configured with the IP address of PE1 as
   its default router, H1 sends the packet in an ethernet frame with
   PE1's MAC address in its Destination MAC Address field. PE1 receives
   the frame, and sees that the frame is addressed to it.
   PE1 thus
   sends the frame up its IRB1 interface to the L3 routing instance.
   Appropriate IP processing is done (e.g., TTL decrement). The L3
   routing instance determines that the "next hop" for H5 is PE2, so the
   packet is encapsulated (e.g., in MPLS) and sent across the backbone
   to PE2's routing instance. PE2 will see that the packet's
   destination, H5, is on BD2 segment-2, and will send the packet down
   its IRB2 interface. This causes the IP packet to be encapsulated in
   an ethernet frame with PE2's MAC address (on BD2) in the Source
   Address field and H5's MAC address in the Destination Address field.

   Note that if H1 has an IP packet to send to H3, the forwarding of the
   packet is handled entirely within PE1. PE1's routing instance sees
   the packet arrive on its IRB1 interface, and then transmits the
   packet by sending it down its IRB2 interface.

   Often, all the hosts in a particular Tenant Domain will be
   provisioned with the same value of the default router IP address.
   This IP address can be assigned, as an "anycast address", to all the
   EVPN PEs attached to that Tenant Domain. Thus although all hosts are
   provisioned with the same "default router address", the actual
   default router for a given host will be one of the PEs that is
   attached to the same ethernet segment as the host. This provisioning
   method ensures that IP packets from a given host are handled by the
   closest EVPN PE that supports IRB.

   In the topology of Figure 3, one could imagine that H1 is configured
   with a default router address that belongs to PE2 but not to PE1.
   Inter-subnet routing would still work, but IP packets from H1 to H3
   would then follow the non-optimal path H1-->PE1-->PE2-->PE1-->H3.
   Sending traffic on this sort of path, where it leaves a router and
   then comes back to the same router, is sometimes known as
   "hairpinning".
   Similarly, if PE2 supports IRB but PE1 does not, the
   same non-optimal path from H1 to H3 would have to be followed. To
   avoid hairpinning, each EVPN PE needs to support IRB.

   It is worth pointing out the way IRB interfaces interact with
   multicast traffic. Referring again to Figure 3, suppose PE1 and PE2
   are functioning as IP multicast routers. Suppose also that H3
   transmits a multicast packet, and both H1 and H4 are interested in
   receiving that packet. PE1 will receive the packet from H3 via its
   IRB2 interface. The ethernet encapsulation from BD2 is removed, the
   IP header processing is done, and the packet is then reencapsulated
   for BD1, with PE1's MAC address in the MAC Source Address field.
   Then the packet is sent down the IRB1 interface. Layer 2 procedures
   (as defined in [RFC7432]) would then be used to deliver a copy of the
   packet locally to H1, and remotely to H4.

   Please be aware that this document modifies the semantics, described
   in the previous paragraph, of sending/receiving multicast traffic on
   an IRB interface. This is explained in Section 1.5.1 and subsequent
   sections.

Authors' Addresses

   Wen Lin
   Juniper Networks, Inc.

   EMail: wlin@juniper.net

   Zhaohui Zhang
   Juniper Networks, Inc.

   EMail: zzhang@juniper.net

   John Drake
   Juniper Networks, Inc.

   EMail: jdrake@juniper.net

   Eric C. Rosen (editor)
   Juniper Networks, Inc.

   EMail: erosen@juniper.net

   Jorge Rabadan
   Nokia

   EMail: jorge.rabadan@nokia.com

   Ali Sajassi
   Cisco Systems

   EMail: sajassi@cisco.com