BESS                                                              W. Lin
Internet-Draft                                                  Z. Zhang
Intended status: Standards Track                                J. Drake
Expires: August 17, 2018                                   E. Rosen, Ed.
                                                  Juniper Networks, Inc.
                                                              J. Rabadan
                                                                   Nokia
                                                              A. Sajassi
                                                           Cisco Systems
                                                       February 13, 2018

        EVPN Optimized Inter-Subnet Multicast (OISM) Forwarding
                   draft-ietf-bess-evpn-irb-mcast-00

Abstract

   Ethernet VPN (EVPN) provides a service that allows a single Local
   Area Network (LAN), i.e., a single IP subnet, to be distributed over
   multiple sites. The sites are interconnected by an IP or MPLS
   backbone. Intra-subnet traffic (either unicast or multicast) always
   appears to the end users to be bridged, even when it is actually
   carried over the IP backbone. When a single "tenant" owns multiple
   such LANs, EVPN also allows IP unicast traffic to be routed between
   those LANs. This document specifies new procedures that allow
   inter-subnet IP multicast traffic to be routed among the LANs of a
   given tenant, while still making intra-subnet IP multicast traffic
   appear to be bridged. These procedures can provide optimal routing
   of the inter-subnet multicast traffic, and do not require any such
   traffic to leave a given router and then reenter that same router.
   These procedures also accommodate IP multicast traffic that needs to
   travel to or from systems that are outside the EVPN domain.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 17, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Background
       1.1.1.  Segments, Broadcast Domains, and Tenants
       1.1.2.  Inter-BD (Inter-Subnet) IP Traffic
       1.1.3.  EVPN and IP Multicast
       1.1.4.  BDs, MAC-VRFs, and EVPN Service Models
     1.2.  Need for EVPN-aware Multicast Procedures
     1.3.  Additional Requirements That Must be Met by the Solution
     1.4.  Terminology
     1.5.  Model of Operation: Overview
       1.5.1.  Control Plane
       1.5.2.  Data Plane
   2.  Detailed Model of Operation
     2.1.  Supplementary Broadcast Domain
     2.2.  When is a Route About/For/From a Particular BD
     2.3.  Use of IRB Interfaces at Ingress PE
     2.4.  Use of IRB Interfaces at an Egress PE
     2.5.  Announcing Interest in (S,G)
     2.6.  Tunneling Frames from Ingress PE to Egress PEs
     2.7.  Advanced Scenarios
   3.  EVPN-aware Multicast Solution Control Plane
     3.1.  Supplementary Broadcast Domain (SBD) and Route Targets
     3.2.  Advertising the Tunnels Used for IP Multicast
       3.2.1.  Constructing SBD Routes
         3.2.1.1.  Constructing an SBD-IMET Route
         3.2.1.2.  Constructing an SBD-SMET Route
         3.2.1.3.  Constructing an SBD-SPMSI Route
       3.2.2.  Ingress Replication
       3.2.3.  Assisted Replication
       3.2.4.  BIER
       3.2.5.  Inclusive P2MP Tunnels
         3.2.5.1.  Using the BUM Tunnels as IP Multicast Inclusive
                   Tunnels
           3.2.5.1.1.  RSVP-TE P2MP
           3.2.5.1.2.  mLDP or PIM
         3.2.5.2.  Using Wildcard S-PMSI A-D Routes to Advertise
                   Inclusive Tunnels Specific to IP Multicast
       3.2.6.  Selective Tunnels
     3.3.  Advertising SMET Routes
   4.  Constructing Multicast Forwarding State
     4.1.  Layer 2 Multicast State
       4.1.1.  Constructing the OIF List
       4.1.2.  Data Plane: Applying the OIF List to an (S,G) Frame
         4.1.2.1.  Eligibility of an AC to Receive a Frame
         4.1.2.2.  Applying the OIF List
     4.2.  Layer 3 Forwarding State
   5.  Interworking with non-OISM EVPN-PEs
     5.1.  IPMG Designated Forwarder
     5.2.  Ingress Replication
       5.2.1.  Ingress PE is non-OISM
       5.2.2.  Ingress PE is OISM
     5.3.  P2MP Tunnels
   6.  Traffic to/from Outside the EVPN Tenant Domain
     6.1.  Layer 3 Interworking via EVPN OISM PEs
       6.1.1.  General Principles
       6.1.2.  Interworking with MVPN
         6.1.2.1.  MVPN Sources with EVPN Receivers
           6.1.2.1.1.  Identifying MVPN Sources
           6.1.2.1.2.  Joining a Flow from an MVPN Source
         6.1.2.2.  EVPN Sources with MVPN Receivers
           6.1.2.2.1.  General procedures
           6.1.2.2.2.  Any-Source Multicast (ASM) Groups
           6.1.2.2.3.  Source on Multihomed Segment
         6.1.2.3.  Obtaining Optimal Routing of Traffic Between MVPN
                   and EVPN
         6.1.2.4.  DR Selection
       6.1.3.  Interworking with 'Global Table Multicast'
       6.1.4.  Interworking with PIM
         6.1.4.1.  Source Inside EVPN Domain
         6.1.4.2.  Source Outside EVPN Domain
     6.2.  Interworking with PIM via an External PIM Router
   7.  Using an EVPN Tenant Domain as an Intermediate (Transit)
       Network for Multicast traffic
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Integrated Routing and Bridging
   Authors' Addresses

1.  Introduction

1.1.  Background

   Ethernet VPN (EVPN) [RFC7432] provides a Layer 2 VPN (L2VPN)
   solution, which allows an IP backbone provider to offer ethernet
   service to a set of customers, known as "tenants".

   In this section (as well as in [EVPN-IRB]), we provide some essential
   background information on EVPN.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.1.1.  Segments, Broadcast Domains, and Tenants

   One of the key concepts of EVPN is the Broadcast Domain (BD). A BD
   is essentially an emulated ethernet. Each BD belongs to a single
   tenant. A BD typically consists of multiple ethernet "segments", and
   each segment may be attached to a different EVPN Provider Edge
   (EVPN-PE) router. EVPN-PE routers are often referred to as "Network
   Virtualization Endpoints" or NVEs. However, this document will use
   the term "EVPN-PE", or, when the context is clear, just "PE".

   In this document, we use the term "segment" to mean the same as
   "Ethernet Segment" or "ES" in [RFC7432].

   Attached to each segment are "Tenant Systems" (TSes). A TS may be
   any type of system, physical or virtual, host or router, etc., that
   can attach to an ethernet.

   When two TSes are on the same segment, traffic between them does not
   pass through an EVPN-PE. When two TSes are on different segments of
   the same BD, traffic between them does pass through an EVPN-PE.

   When two TSes, say TS1 and TS2, are on the same BD, then:

   o  If TS1 knows the MAC address of TS2, TS1 can send unicast ethernet
      frames to TS2.
      TS2 will receive the frames unaltered. That is, TS1's MAC address
      will be in the MAC Source Address field. If the frame contains an
      IP datagram, the IP header is not modified in any way during the
      transmission.

   o  If TS1 broadcasts an ethernet frame, TS2 will receive the
      unaltered frame.

   o  If TS1 multicasts an ethernet frame, TS2 will receive the
      unaltered frame, as long as TS2 has been provisioned to receive
      ethernet multicasts.

   When we say that TS2 receives an unaltered frame from TS1, we mean
   that the frame still contains TS1's MAC address, and that no
   alteration of the frame's payload has been done.

   EVPN allows a single segment to be attached to multiple PE routers.
   This is known as "EVPN multi-homing". EVPN has procedures to ensure
   that a frame from a given segment, arriving at a particular PE
   router, cannot be returned to that segment via a different PE router.
   This is particularly important for multicast, because a frame
   arriving at a PE from a given segment will already have been seen by
   all systems on the segment that need to see it. If the frame were
   sent back to the originating segment, receivers on that segment would
   receive the packet twice. Even worse, the frame might be sent back
   to a PE, which could cause an infinite loop.

1.1.2.  Inter-BD (Inter-Subnet) IP Traffic

   If a given tenant has multiple BDs, the tenant may wish to allow IP
   communication among these BDs. Such a set of BDs is known as an
   "EVPN Tenant Domain" or just a "Tenant Domain".

   If tenant systems TS1 and TS2 are not in the same BD, then they do
   not receive unaltered ethernet frames from each other. In order for
   TS1 to send traffic to TS2, TS1 encapsulates an IP datagram inside an
   ethernet frame, and uses ethernet to send these frames to an IP
   router. The router decapsulates the IP datagram, does the IP
   processing, and re-encapsulates the datagram for ethernet.
   The MAC source address field now has the MAC address of the router,
   not of TS1. The TTL field of the IP datagram should be decremented
   by exactly 1; this hides the structure of the provider's IP backbone
   from the tenants.

   EVPN accommodates the need for inter-BD communication within a Tenant
   Domain by providing an integrated L2/L3 service for unicast IP
   traffic. EVPN's Integrated Routing and Bridging (IRB) functionality
   is specified in [EVPN-IRB]. Each BD in a Tenant Domain is assumed to
   be a single IP subnet, and each IP subnet within a given Tenant
   Domain is assumed to be a single BD. EVPN's IRB functionality allows
   IP traffic to travel from one BD to another, and ensures that proper
   IP processing (e.g., TTL decrement) is done.

   A brief overview of IRB, including the notion of an "IRB interface",
   can be found in Appendix A. As explained there, an IRB interface is
   a sort of virtual interface connecting an L3 routing instance to a
   BD. A BD may have multiple attachment circuits (ACs) to a given PE,
   where each AC connects to a different ethernet segment of the BD.
   However, these ACs are not visible to the L3 routing function; from
   the perspective of an L3 routing instance, a PE has just one
   interface to each BD, viz., the IRB interface for that BD.

   The "L3 routing instance" depicted in Appendix A is associated with a
   single Tenant Domain, and may be thought of as an IP-VRF for that
   Tenant Domain.

1.1.3.  EVPN and IP Multicast

   [EVPN-IRB] and [EVPN_IP_Prefix] cover inter-subnet (inter-BD) IP
   unicast forwarding, but they do not cover inter-subnet IP multicast
   forwarding.

   [RFC7432] covers intra-subnet (intra-BD) ethernet multicast.
   The intra-subnet ethernet multicast procedures of [RFC7432] are used
   for ethernet Broadcast traffic, for ethernet unicast traffic whose
   MAC Destination Address field contains an Unknown address, and for
   ethernet traffic whose MAC Destination Address field contains an
   ethernet Multicast MAC address. These three classes of traffic are
   known collectively as "BUM traffic"
   (Broadcast/UnknownUnicast/Multicast), and the procedures for handling
   BUM traffic are known as "BUM procedures".

   [IGMP-Proxy] extends the intra-subnet ethernet multicast procedures
   by adding procedures that are specific to, and optimized for, the use
   of IP multicast within a subnet. However, that document does not
   cover inter-subnet IP multicast.

   The purpose of this document is to specify procedures for EVPN that
   provide optimized IP multicast functionality within an EVPN tenant
   domain. This document also specifies procedures that allow IP
   multicast packets to be sourced from or destined to systems outside
   the Tenant Domain. We refer to the entire set of these procedures as
   "OISM" (Optimized Inter-Subnet Multicast) procedures.

   In order to support the OISM procedures specified in this document,
   an EVPN-PE MUST also support [EVPN-IRB] and [IGMP-Proxy].

1.1.4.  BDs, MAC-VRFs, and EVPN Service Models

   [RFC7432] defines the notion of "MAC-VRF". A MAC-VRF contains one or
   more "Bridge Tables" (see section 3 of [RFC7432] for a discussion of
   this terminology), each of which represents a single Broadcast
   Domain.

   In the IRB model (outlined in Appendix A), an L3 routing instance has
   one IRB interface per BD, NOT one per MAC-VRF. The procedures of
   this document are intended to work with all the EVPN service models.
   This document does not distinguish between a "Broadcast Domain" and a
   "Bridge Table", and will use the terms interchangeably (or will use
   the acronym "BD" to refer to either).
   The way the BDs are grouped into MAC-VRFs is not relevant to the
   procedures specified in this document.

   Section 6 of [RFC7432] also defines several different EVPN service
   models:

   o  In the "vlan-based service", each MAC-VRF contains one "bridge
      table", where the bridge table corresponds to a particular Virtual
      LAN (VLAN). (See section 3 of [RFC7432] for a discussion of this
      terminology.) Thus each VLAN is treated as a BD.

   o  In the "vlan bundle service", each MAC-VRF contains one bridge
      table, where the bridge table corresponds to a set of VLANs. Thus
      a set of VLANs is treated as constituting a single BD.

   o  In the "vlan-aware bundle service", each MAC-VRF may contain
      multiple bridge tables, where each bridge table corresponds to one
      BD. If a MAC-VRF contains several bridge tables, then it
      corresponds to several BDs.

   The procedures of this document are intended to work for all these
   service models.

1.2.  Need for EVPN-aware Multicast Procedures

   Inter-subnet IP multicast among a set of BDs can be achieved, in a
   non-optimal manner, without any specific EVPN procedures. For
   instance, if a particular tenant has n BDs among which it wants to
   send IP multicast traffic, it can simply attach a conventional
   multicast router to all n BDs. Or more generally, as long as each BD
   has at least one IP multicast router, and the IP multicast routers
   communicate multicast control information with each other,
   conventional IP multicast procedures will work normally, and no
   special EVPN functionality is needed.

   However, that technique does not provide optimal routing for
   multicast. In conventional multicast routing, for a given multicast
   flow, there is only one multicast router on each BD that is permitted
   to send traffic of that flow to the BD.
   If that BD has receivers for a given flow, but the source of the flow
   is not on that BD, then the flow must pass through that multicast
   router. This leads to the "hair-pinning" problem described (for
   unicast) in Appendix A.

   For example, consider an (S,G) flow that is sourced by a TS S and
   needs to be received by TSes R1 and R2. Suppose S is on a segment of
   BD1, R1 is on a segment of BD2, but both are attached to PE1.
   Suppose also that the tenant has a multicast router, attached to a
   segment of BD1 and to a segment of BD2. However, the segments to
   which that router is attached are both attached to PE2. Then the
   flow from S to R1 would have to follow the path:
   S-->PE1-->PE2-->Tenant Multicast Router-->PE2-->PE1-->R1. Obviously,
   the path S-->PE1-->R1 would be preferred.

   Now suppose that there is a second receiver, R2. R2 is attached to a
   third BD, BD3. However, it is attached to a segment of BD3 that is
   attached to PE1. And suppose also that the Tenant Multicast Router
   is attached to a segment of BD3 that attaches to PE2. In this case,
   the Tenant Multicast Router will make two copies of the packet, one
   for BD2 and one for BD3. PE2 will send both copies back to PE1. Not
   only is the routing sub-optimal, but PE2 sends multiple copies of the
   same packet to PE1. This is a further sub-optimality.

   This is only an example; many more examples of sub-optimal multicast
   routing can easily be given. To eliminate sub-optimal routing and
   extra copies, it is necessary to have a multicast solution that is
   EVPN-aware, and that can use its knowledge of the internal structure
   of a Tenant Domain to ensure that multicast traffic gets routed
   optimally. The procedures of this document allow us to avoid all
   such sub-optimalities when routing inter-subnet multicasts within a
   Tenant Domain.

1.3.  Additional Requirements That Must be Met by the Solution

   In addition to providing optimal routing of multicast flows within a
   Tenant Domain, the EVPN-aware multicast solution is intended to
   satisfy the following requirements:

   o  The solution must integrate well with the procedures specified in
      [IGMP-Proxy]. That is, an integrated set of procedures must
      handle both intra-subnet multicast and inter-subnet multicast.

   o  With regard to intra-subnet multicast, the solution MUST maintain
      the integrity of multicast ethernet service. This means:

      *  If a source and a receiver are on the same subnet, the MAC
         source address (SA) of the multicast frame sent by the source
         will not get rewritten.

      *  If a source and a receiver are on the same subnet, no IP
         processing of the ethernet payload is done. The IP TTL is not
         decremented, the header checksum is not changed, no
         fragmentation is done, etc.

   o  On the other hand, if a source and a receiver are on different
      subnets, the frame received by the receiver will not have the MAC
      Source address of the source, as the frame will appear to have
      come from a multicast router. Also, proper processing of the IP
      header is done, e.g., TTL decrement by 1, header checksum
      modification, possibly fragmentation, etc.

   o  If a Tenant Domain contains several BDs, it MUST be possible for a
      multicast flow (even when the multicast group address is an "any
      source multicast" (ASM) address), to have sources in one of those
      BDs and receivers in one or more of the other BDs, without
      requiring the presence of any system performing PIM Rendezvous
      Point (RP) functions ([RFC7761]). Multicast throughout a Tenant
      Domain must not require the tenant systems to be aware of any
      underlying multicast infrastructure.

   o  Sometimes a MAC address used by one TS on a particular BD is also
      used by another TS on a different BD.
      Inter-subnet routing of multicast traffic MUST NOT make any
      assumptions about the uniqueness of a MAC address across several
      BDs.

   o  If two EVPN-PEs attached to the same Tenant Domain both support
      the OISM procedures, each may receive inter-subnet multicasts from
      the other, even if the egress PE is not attached to any segment of
      the BD from which the multicast packets are being sourced. It
      MUST NOT be necessary to provision the egress PE with knowledge of
      the ingress BD.

   o  There must be a procedure that allows EVPN-PE routers supporting
      OISM procedures to send/receive multicast traffic to/from EVPN-PE
      routers that support only [RFC7432] and do not support the OISM
      procedures or even the procedures of [EVPN-IRB]. However, when
      interworking with such routers (which we call "non-OISM PE
      routers"), optimal routing may not be achievable.

   o  It MUST be possible to support scenarios in which multicast flows
      with sources inside a Tenant Domain have "external" receivers,
      i.e., receivers that are outside the domain. It must also be
      possible to support scenarios where multicast flows with external
      sources (sources outside the Tenant Domain) have receivers inside
      the domain.

      This presupposes that unicast routes to multicast sources outside
      the domain can be distributed to EVPN-PEs attached to the domain,
      and that unicast routes to multicast sources within the domain can
      be distributed outside the domain.

      Of particular importance are the scenario in which the external
      sources and/or receivers are reachable via L3VPN/MVPN, and the
      scenario in which external sources and/or receivers are reachable
      via IP/PIM.

      The solution for external interworking MUST allow for deployment
      scenarios in which EVPN does not need to export a host route for
      every multicast source.

   o  The solution for external interworking must not presuppose that
      the same tunneling technology is used within both the EVPN domain
      and the external domain. For example, MVPN interworking must be
      possible when MVPN is using MPLS P2MP tunneling, and EVPN is using
      Ingress Replication or VXLAN tunneling.

   o  The solution must not be overly dependent on the details of a
      small set of use cases, but must be adaptable to new use cases as
      they arise. (That is, the solution must be robust.)

1.4.  Terminology

   In this document we make frequent use of the following terminology:

   o  OISM: Optimized Inter-Subnet Multicast. EVPN-PEs that follow the
      procedures of this document will be known as "OISM" PEs. EVPN-PEs
      that do not follow the procedures of this document will be known
      as "non-OISM" PEs.

   o  IP Multicast Packet: An IP packet whose IP Destination Address
      field is a multicast address that is not a link-local address.
      (Link-local addresses are IPv4 addresses in the 224.0.0/24 range
      and IPv6 addresses in the FF02/16 range.)

   o  IP Multicast Frame: An ethernet frame whose payload is an IP
      multicast packet (as defined above).

   o  (S,G) Multicast Packet: An IP multicast packet whose IP Source
      Address field contains S and whose IP Destination Address field
      contains G.

   o  (S,G) Multicast Frame: An IP multicast frame whose payload
      contains S in its IP Source Address field and G in its IP
      Destination Address field.

   o  Broadcast Domain (BD): an emulated ethernet, such that two systems
      on the same BD will receive each other's link-local broadcasts.

      Note that EVPN supports models in which a single EVPN Instance
      (EVI) contains only one BD, and models in which a single EVI
      contains multiple BDs. Both models are supported by this draft.
      However, a given BD belongs to only one EVI.

   o  Designated Forwarder (DF).
      As defined in [RFC7432], an ethernet segment may be multi-homed
      (attached to more than one PE). An ethernet segment may also
      contain multiple BDs, of one or more EVIs. For each such EVI, one
      of the PEs attached to the segment becomes that EVI's DF for that
      segment. Since a BD may belong to only one EVI, we can speak
      unambiguously of the BD's DF for a given segment.

      When the text makes it clear that we are speaking in the context
      of a given BD, we will frequently use the term "a segment's DF" to
      mean the given BD's DF for that segment.

   o  AC: Attachment Circuit. An AC connects the bridging function of
      an EVPN-PE to an ethernet segment of a particular BD. ACs are not
      visible at the router (L3) layer.

   o  L3 Gateway: An L3 Gateway is a PE that connects an EVPN tenant
      domain to an external multicast domain by performing both the OISM
      procedures and the Layer 3 multicast procedures of the external
      domain.

   o  PEG (PIM/EVPN Gateway): An L3 Gateway that connects an EVPN tenant
      domain to an external multicast domain whose Layer 3 multicast
      procedures are those of PIM ([RFC7761]).

   o  MEG (MVPN/EVPN Gateway): An L3 Gateway that connects an EVPN
      tenant domain to an external multicast domain whose Layer 3
      multicast procedures are those of MVPN ([RFC6513], [RFC6514]).

   o  IPMG (IP Multicast Gateway): A PE that is used for interworking
      OISM EVPN-PEs with non-OISM EVPN-PEs.

   o  DR (Designated Router): A PE that has special responsibilities for
      handling multicast on a given BD.

   o  Use of the "C-" prefix. In many documents on VPN multicast, the
      prefix "C-" appears before any address or wildcard that refers to
      an address or addresses in a tenant's address space, rather than
      to an address or addresses in the address space of the backbone
      network.
      This document omits the "C-" prefix in many cases where it is
      clear from the context that the reference is to the tenant's
      address space.

   This document also assumes familiarity with the terminology of
   [RFC4364], [RFC6514], [RFC7432], [RFC7761], [IGMP-Proxy],
   [EVPN_IP_Prefix] and [EVPN-BUM].

1.5.  Model of Operation: Overview

1.5.1.  Control Plane

   In this section, and in the remainder of this document, we assume the
   reader is familiar with the procedures of IGMP/MLD (see [RFC2236] and
   [RFC2710]), by which hosts announce their interest in receiving
   particular multicast flows.

   Consider a Tenant Domain consisting of a set of k BDs: BD1, ..., BDk.
   To support the OISM procedures, each Tenant Domain must also be
   associated with a "Supplementary Broadcast Domain" (SBD). An SBD is
   treated in the control plane as a real BD, but it does not have any
   ACs. The SBD has several uses, which are described later in this
   document. (See Section 2.1.)

   Each PE that attaches to one or more of the BDs in a given tenant
   domain will be provisioned to recognize that those BDs are part of
   the same Tenant Domain. Note that a given PE does not need to be
   configured with all the BDs of a given Tenant Domain. In general, a
   PE will only be attached to a subset of the BDs in a given Tenant
   Domain, and will be configured only with that subset of BDs.
   However, each PE attached to a given Tenant Domain must be configured
   with the SBD for that Tenant Domain.

   Suppose a particular segment of a particular BD is attached to PE1.
   [RFC7432] specifies that PE1 must originate an Inclusive Multicast
   Ethernet Tag (IMET) route for that BD, and that the IMET must be
   propagated to all other PEs attached to the same BD.
   If the given segment contains a host that has interest in receiving a
   particular multicast flow, either an (S,G) flow or a (*,G) flow, PE1
   will learn of that interest by participating in the IGMP/MLD
   procedures, as specified in [IGMP-Proxy]. In this case, we will say
   that:

   o  PE1 is interested in receiving the flow;

   o  The AC attaching the interested host to PE1 is also said to be
      interested in the flow;

   o  The BD containing an AC that is interested in a particular flow is
      also said to be interested in that flow.

   Once PE1 determines that it has interest in receiving a particular
   flow or set of flows, it uses the procedures of [IGMP-Proxy] to
   advertise its interest in those flows. It advertises its interest in
   a given flow by originating a Selective Multicast Ethernet Tag (SMET)
   route. An SMET route is propagated to the other PEs that attach to
   the same BD.

   OISM PEs MUST follow the procedures of [IGMP-Proxy]. In this
   document, we extend the procedures of [IGMP-Proxy] so that IMET and
   SMET routes for a particular BD are distributed not just to PEs that
   attach to that BD, but to PEs that attach to any BD in the Tenant
   Domain.

   In this way, each PE attached to a given Tenant Domain learns, from
   each other PE attached to the same Tenant Domain, the set of flows
   that are of interest to each of those other PEs.

   An OISM PE that is provisioned with several BDs in the same Tenant
   Domain may originate an IMET route for each such BD. To indicate its
   support of [IGMP-Proxy], it MUST attach the EVPN Multicast Flags
   Extended Community to each such IMET route.

   Suppose PE1 is provisioned with both BD1 and BD2, and is provisioned
   to consider them to be part of the same Tenant Domain. It is
   possible that PE1 will receive from PE2 both an IMET route for BD1
   and an IMET route for BD2.
   If either of these IMET routes has the
   EVPN Multicast Flags Extended Community, PE1 MUST assume that PE2 is
   supporting the procedures of [IGMP-Proxy] for ALL BDs in the Tenant
   Domain.

   If a PE supports OISM functionality, it MUST indicate that by
   attaching an "OISM-supported" flag or Extended Community (EC) to all
   its IMET routes. (Details to be specified in next revision.) An
   OISM PE SHOULD attach this flag or EC to all the IMET routes it
   originates. However, if PE1 imports IMET routes from PE2, and at
   least one of PE2's IMET routes indicates that PE2 is an OISM PE, PE1
   will assume that PE2 is following OISM procedures.

1.5.2. Data Plane

   Suppose PE1 has an AC to a segment in BD1, and PE1 receives from that
   AC an (S,G) multicast frame (as defined in Section 1.4).

   There may be other ACs of PE1 on which TSes have indicated an
   interest (via IGMP/MLD) in receiving (S,G) multicast packets. PE1 is
   responsible for sending the received multicast packet out those ACs.
   There are two cases to consider:

   o  Intra-Subnet Forwarding: In this case, an AC with
      interest in (S,G) is connected to a segment that is part of the
      source BD, BD1. If the segment is not multi-homed, or if PE1 is
      the Designated Forwarder (DF) (see [RFC7432]) for that segment,
      PE1 sends the multicast frame on that AC without changing the MAC
      SA. The IP header is not modified at all; in particular, the TTL
      is not decremented.

   o  Inter-Subnet Forwarding: An AC with interest in (S,G) is connected
      to a segment of BD2, where BD2 is different from BD1. If PE1 is
      the DF for that segment (or if the segment is not multi-homed),
      PE1 decapsulates the IP multicast packet, performs any necessary
      IP processing (including TTL decrement), then re-encapsulates the
      packet appropriately for BD2. PE1 then sends the packet on the
      AC.
      Note that after re-encapsulation, the MAC SA will be PE1's
      MAC address on BD2. The IP TTL will have been decremented by 1.

   In addition, there may be other PEs that are interested in (S,G)
   traffic. Suppose PE2 is such a PE. Then PE1 tunnels a copy of the
   IP multicast frame to PE2 (with its original MAC SA, and with no
   alteration of the payload's IP header). The tunnel encapsulation
   contains information that PE2 can use to associate the frame with a
   source BD. If the source BD is BD1:

   o  If PE2 is attached to BD1, the tunnel encapsulation used to send
      the frame to PE2 will cause PE2 to identify BD1 as the source BD.

   o  If PE2 is not attached to BD1, the tunnel encapsulation used to
      send the frame to PE2 will cause PE2 to identify the SBD as the
      source BD.

   The way in which the tunnel encapsulation identifies the source BD is
   of course dependent on the type of tunnel that is used. This will be
   specified later in this document.

   When PE2 receives the tunneled frame, it will forward it on any of
   its ACs that have interest in (S,G).

   If PE2 determines from the tunnel encapsulation that the source BD is
   BD1, then

   o  For those ACs that connect PE2 to BD1, the intra-subnet forwarding
      procedure described above is used, except that it is now PE2, not
      PE1, carrying out that procedure. Unmodified EVPN procedures from
      [RFC7432] are used to ensure that a packet originating from a
      multi-homed segment is never sent back to that segment.

   o  For those ACs that do not connect to BD1, the inter-subnet
      forwarding procedure described above is used, except that it is
      now PE2, not PE1, carrying out that procedure.

   If the tunnel encapsulation identifies the source BD as the SBD, PE2
   applies the inter-subnet forwarding procedures described above to all
   of its ACs that have interest in the flow.
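
   The egress behavior described above can be illustrated with a short
   Python sketch. It is not part of the specification. The names
   AttachmentCircuit, egress_actions and the "SBD" marker are
   hypothetical, and the sketch assumes the tunnel encapsulation has
   already been mapped to a source BD. It does not model the [RFC7432]
   split-horizon check for the segment of origin.

```python
from dataclasses import dataclass

SBD = "SBD"  # hypothetical marker for the Supplementary Broadcast Domain


@dataclass(frozen=True)
class AttachmentCircuit:
    name: str
    bd: str              # BD to which this AC belongs
    interested: bool     # a TS on this AC has asked for (S,G)
    is_df: bool = True   # PE is DF for the segment, or segment is single-homed


def egress_actions(source_bd, acs):
    """Return (ac_name, action) pairs for one tunneled (S,G) frame.

    source_bd is the BD identified by the tunnel encapsulation: a BD to
    which this PE is attached, or SBD if it is not attached to the real
    source BD.
    """
    actions = []
    for ac in acs:
        if not (ac.interested and ac.is_df):
            continue
        if source_bd != SBD and ac.bd == source_bd:
            # Intra-subnet: MAC SA and IP header (including TTL) unchanged.
            actions.append((ac.name, "bridge"))
        else:
            # Inter-subnet: TTL decremented, MAC SA rewritten for ac.bd.
            actions.append((ac.name, "route"))
    return actions
```

   With a source BD of BD1, interested ACs in BD1 are bridged and
   interested ACs in other BDs are routed. With a source BD of SBD,
   every interested AC is routed.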

   These procedures ensure that an IP multicast frame travels from its
   ingress PE to all egress PEs that are interested in receiving it.
   While in transit, the frame retains its original MAC SA, and the
   payload of the frame retains its original IP header. Note that in
   all cases, when an IP multicast packet is sent from one BD to
   another, these procedures cause its TTL to be decremented by 1.

   So far we have assumed that an IP multicast packet arrives at its
   ingress PE over an AC that belongs to one of the BDs in a given
   Tenant Domain. However, it is possible for a packet to arrive at its
   ingress PE in other ways. Since an EVPN-PE supporting IRB has an
   IP-VRF, it is possible that the IP-VRF will have a "VRF interface"
   that is not an IRB interface. For example, there might be a VRF
   interface that is actually a physical link to an external ethernet
   switch, or to a directly attached host, or to a router. When an
   EVPN-PE, say PE1, receives a packet through such means, we will say
   that the packet has an "external" source (i.e., a source "outside the
   tenant domain"). There are also other scenarios in which a multicast
   packet might have an external source, e.g., it might arrive over an
   MVPN tunnel from an L3VPN PE. In such cases, we will still refer to
   PE1 as the "ingress EVPN-PE".

   When an EVPN-PE, say PE1, receives an externally sourced multicast
   packet, and there are receivers for that packet inside the Tenant
   Domain, it does the following:

   o  Suppose PE1 has an AC in BD1 that has interest in (S,G). Then PE1
      encapsulates the packet for BD1, filling in the MAC SA field with
      the MAC address of PE1 itself on BD1. It sends the resulting
      frame on the AC.

   o  Suppose some other EVPN-PE, say PE2, has interest in (S,G). PE1
      encapsulates the packet for ethernet, filling in the MAC SA field
      with PE1's own MAC address on the SBD. PE1 then tunnels the
      packet to PE2.
      The tunnel encapsulation will identify the source
      BD as the SBD. Since the source BD is the SBD, PE2 will know to
      treat the frame as an inter-subnet multicast.

   When ingress replication is used to transmit IP multicast frames from
   an ingress EVPN-PE to a set of egress PEs, then of course the ingress
   PE has to send multiple copies of the frame. Each copy is the
   original ethernet frame; decapsulation and IP processing take place
   only at the egress PE.

   If a Point-to-Multipoint (P2MP) tree or BIER ([EVPN-BIER]) is used to
   transmit an IP multicast frame from an ingress PE to a set of egress
   PEs, then the ingress PE only has to send one copy of the frame to
   each of its next hops. Again, each egress PE receives the original
   frame and does any necessary IP processing.

2. Detailed Model of Operation

   The model described in Section 1.5.2 can be expressed more precisely
   using the notion of "IRB interface" (see Appendix A). However, this
   requires that the semantics of the IRB interface be modified for
   multicast packets. It is also necessary to have an IRB interface
   that connects the L3 routing instance of a particular Tenant Domain
   (in a particular PE) to the SBD of that Tenant Domain.

   In this section we assume that PIM is not enabled on the IRB
   interfaces. In general, it is not necessary to enable PIM on the IRB
   interfaces unless there are PIM routers on one of the Tenant Domain's
   BDs, or unless there is some other scenario requiring a Tenant
   Domain's L3 routing instance to become a PIM adjacency of some other
   system. These cases will be discussed in Section 7.

2.1. Supplementary Broadcast Domain

   Suppose a given Tenant Domain contains three BDs (BD1, BD2, BD3) and
   two PEs (PE1, PE2). PE1 attaches to BD1 and BD2, while PE2 attaches
   to BD2 and BD3.

   To carry out the procedures described above, all the PEs attached to
   the Tenant Domain must be provisioned to have the SBD for that tenant
   domain. An RT must be associated with the SBD, and provisioned on
   each of those PEs. We will refer to that RT as the "SBD-RT".

   A Tenant Domain is also configured with an IP-VRF ([EVPN-IRB]), and
   the IP-VRF is associated with an RT. This RT MAY be the same as the
   SBD-RT.

   Suppose an (S,G) multicast frame originating on BD1 has a receiver on
   BD3. PE1 will transmit the packet to PE2 as a frame, and the
   encapsulation will identify the frame's source BD as BD1. Since PE2
   is not provisioned with BD1, it will treat the packet as if its
   source BD were the SBD. That is, a packet can be transmitted from
   BD1 to BD3 even though its ingress PE is not configured for BD3,
   and/or its egress PE is not configured for BD1.

   EVPN supports service models in which a given EVPN Instance (EVI) can
   contain only one BD. It also supports service models in which a
   given EVI can contain multiple BDs. The SBD can be treated either as
   its own EVI, or it can be treated as one BD within an EVI that
   contains multiple BDs. The procedures specified in this document
   accommodate both cases.

2.2. When is a Route About/For/From a Particular BD

   In this document, we will frequently say that a particular route is
   "about" a particular BD, or is "from" a particular BD, or is "for" a
   particular BD or is "related to" a particular BD. These terms are
   used interchangeably. In this section, we explain exactly what that
   means.

   In EVPN, each BD is assigned an RT. In some service models, each BD
   is assigned a unique RT. In other service models, a set of BDs (all
   in the same Tenant Domain) may be assigned the same RT. (An RT is
   actually assigned to a MAC-VRF, and hence is shared by all the BDs
   that share the MAC-VRF.)
   The RT is a BGP extended community that may
   be attached to the BGP routes used by the EVPN control plane.

   In those service models that allow a set of BDs to share a single RT,
   each BD is assigned a non-zero Tag ID. The Tag ID appears in the
   Network Layer Reachability Information (NLRI) of many of the BGP
   routes that are used by the EVPN control plane.

   A route is about a particular BD if it carries the RT that has been
   assigned to that BD, and its NLRI contains the Tag ID that has been
   assigned to that BD.

   Note that a route that is about a particular BD may also carry
   additional RTs.

2.3. Use of IRB Interfaces at Ingress PE

   When an (S,G) multicast frame is received from an AC belonging to a
   particular BD, say BD1:

   1.  The frame is sent unchanged to other EVPN-PEs that are interested
       in (S,G) traffic. The encapsulation used to send the frame to
       the other EVPN-PEs depends on the tunnel type being used for
       multicast transmission. (For our purposes, we consider Ingress
       Replication (IR), Assisted Replication (AR) and BIER to be
       "tunnel types", even though IR, AR and BIER do not actually use
       P2MP tunnels.) At the egress PE, the source BD of the frame can
       be inferred from the tunnel encapsulation. If the egress PE is
       not attached to the real source BD, it will infer that the source
       BD is the SBD.

       Note that the inter-PE transmission of a multicast frame
       among EVPN-PEs of the same Tenant Domain does NOT involve the IRB
       interfaces, as long as the multicast frame was received over an
       AC attached to one of the Tenant Domain's BDs.

   2.  The frame is also sent up the IRB interface that attaches BD1 to
       the Tenant Domain's L3 routing instance in this PE. That is, the
       L3 routing instance, behaving as if it were a multicast router,
       receives the IP multicast frames that arrive at the PE from its
       local ACs.
       The L3 routing instance decapsulates the frame's
       payload to extract the IP multicast packet, decrements the IP
       TTL, adjusts the header checksum, and does any other necessary IP
       processing (e.g., fragmentation).

   3.  The L3 routing instance keeps track of which BDs have local
       receivers for (S,G) traffic. (A "local receiver" is a tenant
       system, reachable via a local attachment circuit that has
       expressed interest in (S,G) traffic.) If the L3 routing instance
       has an IRB interface to BD2, and it knows that BD2 has a LOCAL
       receiver interested in (S,G) traffic, it encapsulates the packet
       in an ethernet header for BD2, putting its own MAC address in the
       MAC SA field. Then it sends the packet down the IRB interface to
       BD2.

   If a packet is sent from the L3 routing instance to a particular BD
   via the IRB interface (step 3 in the above list), and if the BD in
   question is NOT the SBD, the packet is sent ONLY to LOCAL ACs of that
   BD. If the packet needs to go to other PEs, it has already been sent
   to them in step 1. Note that this is a change in the IRB interface
   semantics from what is described in [EVPN-IRB] and Figure 2.

   Existing EVPN procedures ensure that a packet is not sent by a given
   PE to a given locally attached segment unless the PE is the DF for
   that segment. Those procedures also ensure that a packet is never
   sent by a PE to its segment of origin. Thus EVPN segment
   multi-homing is fully supported; duplicate delivery to a segment or
   looping on a segment are thereby prevented, without the need for any
   new procedures to be defined in this document.

   What if an IP multicast packet is received from outside the tenant
   domain? For instance, perhaps PE1's IP-VRF for a particular tenant
   domain also has a physical interface leading to an external switch,
   host, or router, and PE1 receives an IP multicast packet or frame on
   that interface.
   Or perhaps the packet is from an L3VPN, or a
   different EVPN Tenant Domain.

   Such a packet is first processed by the L3 routing instance, which
   decrements TTL and does any other necessary IP processing. Then the
   packet is sent into the Tenant Domain by sending it down the IRB
   interface to the SBD of that Tenant Domain. This requires
   encapsulating the packet in an ethernet header, with the PE's own MAC
   address, on the SBD, in the MAC SA field.

   An IP multicast packet sent by the L3 routing instance down the IRB
   interface to the SBD is treated as if it had arrived from a local AC,
   and steps 1-3 are applied. Note that the semantics of sending a
   packet down the IRB interface to the SBD are thus slightly different
   than the semantics of sending a packet down other IRB interfaces. IP
   multicast packets sent down the SBD's IRB interface may be
   distributed to other PEs, but IP multicast packets sent down other
   IRB interfaces are distributed only to local ACs.

   If a PE sends a link-local multicast packet down the SBD IRB
   interface, that packet will be distributed (as an ethernet frame) to
   other PEs of the Tenant Domain, but will not appear on any of the
   actual BDs.

2.4. Use of IRB Interfaces at an Egress PE

   Suppose an egress EVPN-PE receives an (S,G) multicast frame from the
   frame's ingress EVPN-PE. As described above, the packet will arrive
   as an ethernet frame over a tunnel from the ingress PE, and the
   tunnel encapsulation will identify the source BD of the ethernet
   frame.

   We define the notion of the frame's "inferred source BD" as follows.
   If the egress PE is attached to the actual source BD, the actual
   source BD is the inferred source BD. If the egress PE is not
   attached to the actual source BD, the inferred source BD is the SBD.

   The egress PE now takes the following steps:

   1.
       If the egress PE has ACs belonging to the inferred source BD of
       the frame, it sends the frame unchanged to any ACs of that BD
       that have interest in (S,G) packets. The MAC SA of the frame is
       not modified, and the IP header of the frame's payload is not
       modified in any way.

   2.  The frame is also sent to the L3 routing instance by being sent
       up the IRB interface that attaches the L3 routing instance to the
       inferred source BD. Steps 2 and 3 of Section 2.3 are then
       applied.

2.5. Announcing Interest in (S,G)

   [IGMP-Proxy] defines the procedures used by an egress PE to announce
   its interest in a multicast flow or set of flows. This is done by
   originating an SMET route. If an egress PE determines it has LOCAL
   receivers in a particular BD that are interested in a particular set
   of flows, it originates one or more SMET routes for that BD. The
   SMET route specifies a flow or set of flows, and identifies the
   egress PE. The SMET route is specific to a particular BD. A PE that
   originates an SMET route is announcing "I have receivers for (S,G) or
   (*,G) in BD-x".

   In [IGMP-Proxy], an SMET route for a particular BD carries a Route
   Target (RT) that ensures it will be distributed to all PEs that are
   attached to that BD. In this document, it is REQUIRED that an SMET
   route also carry the RT that is assigned to the SBD. This ensures
   that every ingress PE attached to a particular Tenant Domain will
   learn of all other PEs (attached to the same Tenant Domain) that have
   interest in a particular set of flows. Note that it is not necessary
   for the ingress PE to have any BDs other than the SBD in common with
   the egress PEs.

   Since the SMET routes from any BD in a given Tenant Domain are
   propagated to all PEs of that Tenant Domain, an (S,G) receiver on one
   BD can receive (S,G) packets that originate in a different BD.
   Within an EVPN domain, a given IP source address can only be on one
   BD. Therefore inter-subnet multicasting can be done, within the
   Tenant Domain, without requiring any Rendezvous Points, shared trees,
   or other complex aspects of multicast routing infrastructure. (Note
   that while the MAC addresses do not have to be unique across all the
   BDs in a Tenant Domain, the IP addresses do have to be unique across
   all those BDs.)

   If some PE attached to the Tenant Domain does not support
   [IGMP-Proxy], it will be assumed to be interested in all flows.
   (Whether a particular remote PE supports [IGMP-Proxy] is determined
   by the presence of the Multicast Flags Extended Community in its IMET
   route; this is specified in [IGMP-Proxy].)

2.6. Tunneling Frames from Ingress PE to Egress PEs

   [RFC7432] specifies the procedures for setting up and using "BUM
   tunnels". A BUM tunnel is a tunnel used to carry traffic on a
   particular BD if that traffic is (a) broadcast traffic, or (b)
   unicast traffic with an unknown MAC DA, or (c) ethernet multicast
   traffic.

   This document allows the BUM tunnels to be used as the default
   tunnels for transmitting intra-subnet IP multicast frames. It also
   allows a separate set of tunnels to be used, instead of the BUM
   tunnels, as the default tunnels for carrying intra-subnet IP
   multicast frames. Let's call these "IP Multicast Tunnels".

   When the tunneling is done via Ingress Replication or via BIER, this
   difference is of no significance. However, when P2MP tunnels are
   used, there is a significant advantage to having separate IP
   multicast tunnels.

   It is desirable for an ingress PE to transmit a copy of a given (S,G)
   multicast frame on only one tunnel. All egress PEs interested in
   (S,G) packets must then join that tunnel.
   If the source BD/PE for an
   (S,G) packet is BD1/PE1, and PE2 has receivers for (S,G) on BD2, PE2
   must join the P2MP LSP on which PE1 transmits the frame. PE2 must
   join this P2MP LSP even if PE2 is not attached to the source BD
   (BD1). If PE1 were transmitting the multicast frame on its BD1 BUM
   tunnel, then PE2 would have to join the BD1 BUM tunnel, even though
   PE2 has no BD1 attachment circuits. This would cause PE2 to pull all
   the BUM traffic from BD1, most of which it would just have to
   discard. Thus we RECOMMEND that the default IP multicast tunnels be
   distinct from the BUM tunnels.

   Whether or not the default IP multicast tunnels are distinct from the
   BUM tunnels, selective tunnels for particular multicast flows can
   still be used. Traffic sent on a selective tunnel would not be sent
   on the default tunnel.

   Notwithstanding the above, link-local IP multicast traffic MUST
   always be carried on the BUM tunnels, and ONLY on the BUM tunnels.
   Link-local IP multicast traffic consists of IPv4 traffic with a
   destination address prefix of 224.0.0/24 and IPv6 traffic with a
   destination address prefix of FF02/16. In this document, the terms
   "IP multicast packet" and "IP multicast frame" are defined in
   Section 1.4 so as to exclude the link-local traffic.

2.7. Advanced Scenarios

   There are some deployment scenarios that require special procedures:

   1.  Some multicast sources or receivers are attached to PEs that
       support [RFC7432], but do not support this document or
       [EVPN-IRB]. To interoperate with these "non-OISM PEs", it is
       necessary to have one or more gateway PEs that interface the
       tunnels discussed in this document with the BUM tunnels of the
       legacy PEs. This is discussed in Section 5.

   2.  Sometimes multicast traffic originates from outside the EVPN
       domain, or needs to be sent outside the EVPN domain. This is
       discussed in Section 6.
       An important special case of this,
       integration with MVPN, is discussed in Section 6.1.2.

   3.  In some scenarios, one or more of the tenant systems is a PIM
       router, and the Tenant Domain is used as a transit network
       that is part of a larger multicast domain. This is discussed in
       Section 7.

3. EVPN-aware Multicast Solution Control Plane

3.1. Supplementary Broadcast Domain (SBD) and Route Targets

   Every Tenant Domain is associated with a single Supplementary
   Broadcast Domain (SBD), as discussed in Section 2.1. Recall that a
   Tenant Domain is defined to be a set of BDs that can freely send and
   receive IP multicast traffic to/from each other. If an EVPN-PE has
   one or more ACs in a BD of a particular Tenant Domain, and if the
   EVPN-PE supports the procedures of this document, that EVPN-PE must
   be provisioned with the SBD of that Tenant Domain.

   At each EVPN-PE attached to a given Tenant Domain, there is an IRB
   interface leading from the L3 routing instance of that Tenant Domain
   to the SBD. However, the SBD has no ACs.

   The SBD may be in an EVPN Instance (EVI) of its own, or it may be one
   of several BDs (of the same Tenant Domain) in an EVI.

   Each SBD is provisioned with a Route Target (RT). All the EVPN-PEs
   supporting a given SBD are provisioned with that RT as an import RT.

   Each SBD is also provisioned with a "Tag ID" (see Section 6 of
   [RFC7432]).

   o  If the SBD is the only BD in its EVI, the mapping from RT to SBD
      is one-to-one. The Tag ID is zero.

   o  If the SBD is one of several BDs in its EVI, it may have its own
      RT, or it may share an RT with one or more of those other BDs. In
      either case, it must be assigned a non-zero Tag ID. The mapping
      from the (RT, Tag ID) pair to the SBD is always one-to-one.

   We will use the term "SBD-RT" to denote the RT that has been assigned
   to an SBD.
   Routes carrying this RT will be propagated to all
   EVPN-PEs in the same Tenant Domain as the originator.

   An EVPN-PE that receives a route can always determine whether a
   received route "belongs to" a particular SBD, by seeing if that route
   carries the SBD-RT and has the Tag ID of the SBD in its NLRI.

   If the VLAN-based service model is being used for a particular Tenant
   Domain, and thus each BD is in a distinct EVI, it is natural to have
   the SBD be in a distinct EVI as well. If the VLAN-aware bundle
   service is being used, it is natural to include the SBD in the same
   EVI that contains the other BDs. However, it is not required to do
   so; the SBD can still be placed in an EVI of its own, if that is
   desired.

   Note that an SBD, just like any other BD, is associated on each
   EVPN-PE with a MAC-VRF. Per [RFC7432], each MAC-VRF is associated
   with a Route Distinguisher (RD). When constructing a route that is
   "about" an SBD, an EVPN-PE will place the RD of the associated
   MAC-VRF in the "Route Distinguisher" field of the NLRI. (If the
   Tenant Domain has several MAC-VRFs on a given PE, the EVPN-PE has a
   choice of which RD to use.)

   If Assisted Replication (AR, see [EVPN-AR]) is used, each
   AR-REPLICATOR for a given Tenant Domain must be provisioned with the
   SBD of that Tenant Domain, even if the AR-REPLICATOR does not have
   any L3 routing instance.

3.2. Advertising the Tunnels Used for IP Multicast

   The procedures used for advertising the tunnels that carry IP
   multicast traffic depend upon the type of tunnel being used. If the
   tunnel type is neither Ingress Replication, Assisted Replication, nor
   BIER, there are procedures for advertising both "inclusive tunnels"
   and "selective tunnels".

   When IR, AR or BIER are used to transmit IP multicast packets across
   the core, there are no P2MP tunnels.
   Once an ingress EVPN-PE
   determines the set of egress EVPN-PEs for a given flow, the IMET
   routes contain all the information needed to transport packets of
   that flow to the egress PEs.

   If AR is used, the ingress EVPN-PE is also an AR-LEAF and the IMET
   route coming from the selected AR-REPLICATOR contains the information
   needed. The AR-REPLICATOR will behave as an ingress EVPN-PE when
   sending a flow to the egress EVPN-PEs.

   If the tunneling technique requires P2MP tunnels to be set up (e.g.,
   RSVP-TE P2MP, mLDP, PIM), some of the tunnels may be selective
   tunnels and some may be inclusive tunnels.

   Selective tunnels are always advertised by the ingress PE using
   S-PMSI A-D routes ([EVPN-BUM]).

   For inclusive tunnels, there is a choice between using a BD's
   ordinary "BUM tunnel" [RFC7432] as the default inclusive tunnel for
   carrying IP multicast traffic, or using a separate IP multicast
   tunnel as the default inclusive tunnel for carrying IP multicast. In
   the former case, the inclusive tunnel is advertised in an IMET route.
   In the latter case, the inclusive tunnel is advertised in a (C-*,C-*)
   S-PMSI A-D route ([EVPN-BUM]). Details may be found in subsequent
   sections.

3.2.1. Constructing SBD Routes

3.2.1.1. Constructing an SBD-IMET Route

   In general, an EVPN-PE originates an IMET route for each real BD.
   Whether an EVPN-PE has to originate an IMET route for the SBD (of a
   particular Tenant Domain) depends upon the type of tunnels being used
   to carry EVPN multicast traffic across the backbone. In some cases,
   an IMET route does not need to be originated for the SBD, but the
   other IMET routes have to carry the SBD-RT as well as any other RTs
   they would ordinarily carry (per [RFC7432]).

   Subsequent sections will specify when it is necessary for an EVPN-PE
   to originate an IMET route for the SBD.
   We will refer to such a
   route as an "SBD-IMET route".

   When an EVPN-PE needs to originate an SBD-IMET route that is "for"
   the SBD, it constructs the route as follows:

   o  the RD field of the route's NLRI is set to the RD of the MAC-VRF
      that is associated with the SBD;

   o  a Route Target Extended Community containing the value of the
      SBD-RT is attached to that route;

   o  the "Tag ID" field of the NLRI is set to the Tag ID that has been
      assigned to the SBD. This is most likely 0 if a VLAN-based or
      VLAN-bundle service is being used and non-zero if a VLAN-aware
      bundle service is being used.

3.2.1.2. Constructing an SBD-SMET Route

   An EVPN-PE can originate an SMET route to indicate that it has
   receivers, on a specified BD, for a specified multicast flow. In
   some scenarios, an EVPN-PE must originate an SMET route that is for
   the SBD, which we will call an "SBD-SMET route". Whether an EVPN-PE
   has to originate an SMET route for the SBD (of a particular tenant
   domain) depends upon various factors, detailed in subsequent
   sections.

   When an EVPN-PE needs to originate an SBD-SMET route that is "for"
   the SBD, it constructs the route as follows:

   o  the RD field of the route's NLRI is set to the RD of the MAC-VRF
      that is associated with the SBD;

   o  a Route Target Extended Community containing the value of the
      SBD-RT is attached to that route;

   o  the "Tag ID" field of the NLRI is set to the Tag ID that has been
      assigned to the SBD. This is most likely 0 if a VLAN-based or
      VLAN-bundle service is being used and non-zero if a VLAN-aware
      bundle service is being used.

3.2.1.3. Constructing an SBD-SPMSI Route

   An EVPN-PE can originate an S-PMSI A-D route (see [EVPN-BUM]) to
   indicate that it is going to use a particular P2MP tunnel to carry
   the traffic of particular IP multicast flows.
   In general, an S-PMSI
   A-D route is specific to a particular BD. In some scenarios, an
   EVPN-PE must originate an S-PMSI A-D route that is for the SBD, which
   we will call an "SBD-SPMSI route". Whether an EVPN-PE has to
   originate an SBD-SPMSI route for the SBD (of a particular Tenant
   Domain) depends upon various factors, detailed in subsequent
   sections.

   When an EVPN-PE needs to originate an SBD-SPMSI route that is "for"
   the SBD, it constructs the route as follows:

   o  the RD field of the route's NLRI is set to the RD of the MAC-VRF
      that is associated with the SBD;

   o  a Route Target Extended Community containing the value of the
      SBD-RT is attached to that route;

   o  the "Tag ID" field of the NLRI is set to the Tag ID that has been
      assigned to the SBD. This is most likely 0 if a VLAN-based or
      VLAN-bundle service is being used and non-zero if a VLAN-aware
      bundle service is being used.

3.2.2. Ingress Replication

   When Ingress Replication (IR) is used to transport IP multicast
   frames of a given Tenant Domain, each EVPN-PE attached to that Tenant
   Domain MUST originate an SBD-IMET route, as described in
   Section 3.2.1.1.

   The SBD-IMET route MUST carry a PMSI Tunnel attribute (PTA), and the
   MPLS label field of the PTA MUST specify a downstream-assigned MPLS
   label that maps uniquely (in the context of the originating EVPN-PE)
   to the SBD.

   An EVPN-PE MUST also originate an IMET route for each BD to which it
   is attached, following the procedures of [RFC7432]. Each of these
   IMET routes carries a PTA that specifies a downstream-assigned label
   that maps uniquely (in the context of the originating EVPN-PE) to the
   BD in question. These IMET routes need not carry the SBD-RT.
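
   The three construction rules shared by SBD-IMET, SBD-SMET and
   SBD-SPMSI routes (Sections 3.2.1.1 through 3.2.1.3), together with
   the "belongs to an SBD" test of Section 3.1, can be illustrated with
   a short Python sketch. It is not part of the specification and does
   not encode any BGP wire format. The names SbdConfig, EvpnRoute,
   make_sbd_route and belongs_to_sbd, and the string forms used for RDs
   and RTs, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SbdConfig:
    rd: str       # RD of the MAC-VRF associated with the SBD
    sbd_rt: str   # RT provisioned for the SBD (the "SBD-RT")
    tag_id: int   # zero if the SBD is the only BD in its EVI


@dataclass(frozen=True)
class EvpnRoute:
    route_type: str
    rd: str
    tag_id: int
    route_targets: frozenset


def make_sbd_route(route_type, sbd):
    """Build the SBD-specific fields of an IMET, SMET or S-PMSI route."""
    if route_type not in ("IMET", "SMET", "S-PMSI"):
        raise ValueError("unsupported route type: " + route_type)
    return EvpnRoute(
        route_type=route_type,
        rd=sbd.rd,                              # RD of the SBD's MAC-VRF
        tag_id=sbd.tag_id,                      # Tag ID assigned to the SBD
        route_targets=frozenset({sbd.sbd_rt}),  # route carries the SBD-RT
    )


def belongs_to_sbd(route, sbd):
    """A route is about the SBD if it has the SBD-RT and the SBD's Tag ID."""
    return sbd.sbd_rt in route.route_targets and route.tag_id == sbd.tag_id
```

   A real implementation would also fill in the route-type-specific
   fields (for example, the PTA of an IMET route), which this sketch
   leaves out.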

   When an ingress EVPN-PE needs to use IR to send an IP multicast frame
   from a particular source BD to an egress EVPN-PE, the ingress PE
   determines whether the egress PE has originated an IMET route for
   that BD. If so, that IMET route contains the MPLS label that the
   egress PE has assigned to the source BD. The ingress PE uses that
   label when transmitting the packet to the egress PE. Otherwise, the
   ingress PE uses the label that the egress PE has assigned to the SBD
   (in the SBD-IMET route originated by the egress).

   Note that the set of IMET routes originated by a given egress PE, and
   installed by a given ingress PE, will change over time. If the
   egress PE withdraws its IMET route for the source BD, the ingress PE
   must stop using the label carried in that IMET route, and start using
   the label carried in the SBD-IMET route from that egress PE.

3.2.3. Assisted Replication

   When Assisted Replication is used to transport IP multicast frames of
   a given Tenant Domain, each EVPN-PE (including the AR-REPLICATOR)
   attached to the Tenant Domain MUST originate an SBD-IMET route, as
   described in Section 3.2.1.1.

   An AR-REPLICATOR attached to a given Tenant Domain is considered to
   be an EVPN-PE of that Tenant Domain. It is attached to all the BDs
   in the Tenant Domain, but it has no IRB interfaces.

   As with Ingress Replication, the SBD-IMET route carries a PTA where
   the MPLS label field specifies the downstream-assigned MPLS label
   that identifies the SBD. However, the AR-REPLICATOR and AR-LEAF
   EVPN-PEs will set the PTA's flags differently, as per [EVPN-AR].

   In addition, each EVPN-PE originates an IMET route for each BD to
   which it is attached. As in the case of Ingress Replication, these
   routes carry the downstream-assigned MPLS labels that identify the
   BDs and do not carry the SBD-RT.
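
   The label selection rule of Section 3.2.2, which relies on the same
   per-BD and SBD labels described here, can be illustrated with a short
   Python sketch. It is not part of the specification. The function
   select_ir_label and the dictionary it takes are hypothetical: the
   dictionary is assumed to map each BD name (and the key "SBD") to the
   label carried in the IMET route currently installed from one egress
   PE.

```python
def select_ir_label(source_bd, egress_imet_labels, sbd="SBD"):
    """Pick the MPLS label an ingress PE uses toward one egress PE.

    egress_imet_labels holds only the labels from IMET routes that are
    currently installed; a withdrawn route is simply absent.
    """
    if source_bd in egress_imet_labels:
        # The egress PE is attached to the source BD: use its BD label.
        return egress_imet_labels[source_bd]
    # Otherwise fall back to the label from the egress PE's SBD-IMET route.
    # A KeyError here means no SBD-IMET route has been installed.
    return egress_imet_labels[sbd]
```

   Because the lookup is done against the currently installed routes,
   withdrawing the IMET route for the source BD moves traffic to the SBD
   label with no separate state to update.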
1251 When an ingress EVPN-PE, acting as AR-LEAF, needs to send an IP 1252 multicast frame from a particular source BD to an egress EVPN-PE, the 1253 ingress PE determines whether there is any AR-REPLICATOR that 1254 originated an IMET route for that BD. After the AR-REPLICATOR 1255 selection (if there is more than one), the AR-LEAF uses the label 1256 contained in the IMET route of the AR-REPLICATOR when transmitting 1257 packets to it. The AR-REPLICATOR receives the packets and, based on 1258 the procedures specified in [EVPN-AR], transmits the packets to the 1259 egress EVPN-PEs using the labels contained in the IMET routes 1260 received from the egress PEs. 1262 If an ingress AR-LEAF for a given BD has not received any IMET route 1263 for that BD from an AR-REPLICATOR, the ingress AR-LEAF follows the 1264 procedures in Section 3.2.2. 1266 3.2.4. BIER 1268 When BIER is used to transport multicast packets of a given Tenant 1269 Domain, each EVPN-PE attached to that Tenant Domain MUST originate an 1270 SBD-IMET route, as described in Section 3.2.1.1. 1272 In addition, IMET routes that are originated for other BDs in the 1273 Tenant Domain MUST carry the SBD-RT. 1275 Each IMET route (including but not limited to the SBD-IMET route) 1276 MUST carry a PMSI Tunnel attribute (PTA). The MPLS label field of 1277 the PTA MUST specify an upstream-assigned MPLS label that maps 1278 uniquely (in the context of the originating EVPN-PE) to the BD for 1279 which the route is originated. 1281 When an ingress EVPN-PE uses BIER to send an IP multicast packet 1282 (inside an ethernet frame) from a particular source BD to a set of 1283 egress EVPN-PEs, the ingress PE follows the BIER encapsulation with 1284 the upstream-assigned label it has assigned to the source BD. (This 1285 label will come from the originated SBD-IMET route ONLY if the 1286 traffic originated from outside the Tenant Domain.)
An egress PE can 1287 determine from that label whether the packet's source BD is one of 1288 the BDs to which the egress PE is attached. 1290 Further details on the use of BIER to support EVPN can be found in 1291 [EVPN-BIER]. 1293 3.2.5. Inclusive P2MP Tunnels 1295 3.2.5.1. Using the BUM Tunnels as IP Multicast Inclusive Tunnels 1297 The procedures in this section apply only when it is desired to use 1298 the BUM tunnels to carry IP multicast traffic across the backbone. 1299 In this case, an IP multicast frame (whether inter-subnet or 1300 intra-subnet) will be carried across the backbone in the BUM tunnel 1301 belonging to its source BD. An EVPN-PE attached to a given Tenant 1302 Domain will then need to join the BUM tunnels for each BD in the 1303 Tenant Domain, even if the EVPN-PE is not attached to all of those 1304 BDs. The reason is that an IP multicast packet from any source BD 1305 might be needed by an EVPN-PE that is not attached to that source 1306 BD. 1308 Note that this will cause BUM traffic from a given BD in a Tenant 1309 Domain to be sent to all PEs that attach to that tenant domain, even 1310 the PEs that don't attach to the given BD. To avoid this, it is 1311 RECOMMENDED that the BUM tunnels not be used as IP Multicast 1312 inclusive tunnels, and that the procedures of Section 3.2.5.2 be used 1313 instead. 1315 3.2.5.1.1. RSVP-TE P2MP 1317 When BUM tunnels created by RSVP-TE P2MP are used to transport IP 1318 multicast frames of a given Tenant Domain, each EVPN-PE attached to 1319 that Tenant Domain MUST originate an SBD-IMET route, as described in 1320 Section 3.2.1.1. 1322 In addition, IMET routes that are originated for other BDs in the 1323 Tenant Domain MUST carry the SBD-RT. 1325 Each IMET route (including but not limited to the SBD-IMET route) 1326 MUST carry a PMSI Tunnel attribute (PTA). 1328 If a received IMET route is not the SBD-IMET route, it will also be 1329 carrying the RT for its source BD.
The route's NLRI will carry the 1330 Tag ID for the source BD. From the RT and the Tag ID, any PE 1331 receiving the route can determine the route's source BD. 1333 If the MPLS label field of the PTA contains zero, the specified 1334 RSVP-TE P2MP tunnel is used only to carry frames of a single source 1335 BD. 1337 If the MPLS label field of the PTA does not contain zero, it MUST 1338 contain an upstream-assigned MPLS label that maps uniquely (in the 1339 context of the originating EVPN-PE) to the source BD (or, in the case 1340 of an SBD-IMET route, the SBD). The tunnel may be used to carry 1341 frames of multiple source BDs, and the source BD for a particular 1342 packet is inferred from the label carried by the packet. 1344 IP multicast traffic originating outside the Tenant Domain is 1345 transmitted with the label corresponding to the SBD, as specified in 1346 the ingress EVPN-PE's SBD-IMET route. 1348 3.2.5.1.2. mLDP or PIM 1350 When either mLDP or PIM is used to transport multicast packets of a 1351 given Tenant Domain, an EVPN-PE attached to that tenant domain 1352 originates an SBD-IMET route only if it is the ingress PE for IP 1353 multicast traffic originating outside the tenant domain. Such 1354 traffic is treated as having the SBD as its source BD. 1356 An EVPN-PE MUST originate an IMET route for each BD to which it is 1357 attached. These IMET routes MUST carry the SBD-RT of the Tenant 1358 Domain to which the BD belongs. Each such IMET route must also carry 1359 the RT of the BD to which it belongs. 1361 When an IMET route (other than the SBD-IMET route) is received by an 1362 egress PE, the route will be carrying the RT for its source BD and 1363 the route's NLRI will contain the Tag ID for that source BD. This 1364 allows any PE receiving the route to determine the source BD 1365 associated with the route. 1367 If the MPLS label field of the PTA contains zero, the specified mLDP 1368 or PIM tunnel is used only to carry frames of a single source BD.
1370 If the MPLS label field of the PTA does not contain zero, it MUST 1371 contain an upstream-assigned MPLS label that maps uniquely (in the 1372 context of the originating EVPN-PE) to the source BD. The tunnel may 1373 be used to carry frames of multiple source BDs, and the source BD for 1374 a particular packet is inferred from the label carried by the packet. 1376 The EVPN-PE advertising these IMET routes is specifying the default 1377 tunnel that it will use (as ingress PE) for transmitting IP multicast 1378 packets. The upstream-assigned label allows an egress PE to 1379 determine the source BD of a given packet. 1381 The procedures of this section apply whenever the tunnel technology 1382 is based on the construction of the multicast trees in a "receiver- 1383 driven" manner; mLDP and PIM are two ways of constructing trees in a 1384 receiver-driven manner. 1386 3.2.5.2. Using Wildcard S-PMSI A-D Routes to Advertise Inclusive 1387 Tunnels Specific to IP Multicast 1389 The procedures of this section apply when (and only when) it is 1390 desired to transmit IP multicast traffic on an inclusive tunnel, but 1391 not on the same tunnel used to transmit BUM traffic. 1393 However, these procedures do NOT apply when the tunnel type is 1394 Ingress Replication or BIER, EXCEPT in the case where it is necessary 1395 to interwork between non-OISM PEs and OISM PEs, as specified in 1396 Section 5. 1398 Each EVPN-PE attached to the given Tenant Domain MUST originate an 1399 SBD-SPMSI A-D route. The NLRI of that route MUST contain (C-*,C-*) 1400 (see [RFC6625]). Additional rules for constructing that route are 1401 given in Section 3.2.1.3. 1403 In addition, an EVPN-PE MUST originate an S-PMSI A-D route containing 1404 (C-*,C-*) in its NLRI for each of the other BDs in the Tenant Domain 1405 to which it is attached. All such routes MUST carry the SBD-RT. 1406 This ensures that those routes are imported by all EVPN-PEs attached 1407 to the Tenant Domain. 
1409 The route carrying the PTA will also be carrying the RT for that 1410 source BD, and the route's NLRI will contain the Tag ID for that 1411 source BD. This allows any PE receiving the route to determine the 1412 source BD associated with the route. 1414 If the MPLS label field of the PTA contains zero, the specified 1415 tunnel is used only to carry frames of a single source BD. 1417 If the MPLS label field of the PTA does not contain zero, it MUST 1418 specify an upstream-assigned MPLS label that maps uniquely (in the 1419 context of the originating EVPN-PE) to the source BD. The tunnel may 1420 be used to carry frames of multiple source BDs, and the source BD for 1421 a particular packet is inferred from the label carried by the packet. 1423 The EVPN-PE advertising these S-PMSI A-D routes is specifying 1424 the default tunnel that it will use (as ingress PE) for transmitting 1425 IP multicast packets. The upstream-assigned label allows an egress 1426 PE to determine the source BD of a given packet. 1428 3.2.6. Selective Tunnels 1430 An ingress EVPN-PE for a given multicast flow or set of flows can 1431 always assign the flow to a particular P2MP tunnel by originating an 1432 S-PMSI A-D route whose NLRI identifies the flow or set of flows. The 1433 NLRI of the route could be (C-*,C-G) or (C-S,C-G). The S-PMSI A-D 1434 route MUST carry the SBD-RT, so that it is imported by all EVPN-PEs 1435 attached to the Tenant Domain. 1437 An S-PMSI A-D route is "for" a particular source BD. It MUST carry 1438 the RT associated with that BD, and it MUST have the Tag ID for that 1439 BD in its NLRI. 1441 Each such route MUST contain a PTA, as specified in Section 3.2.5.2. 1443 An egress EVPN-PE interested in the specified flow or flows MUST join 1444 the specified tunnel. Procedures for joining the specified tunnel 1445 are specific to the tunnel type.
(Note that if the tunnel type is 1446 RSVP-TE P2MP LSP, the Leaf Information Required (LIR) flag of the PTA 1447 SHOULD NOT be set. An ingress OISM PE knows which OISM EVPN PEs are 1448 interested in any given flow, and hence can add them to the RSVP-TE 1449 P2MP tunnel that carries such flows.) 1451 When an EVPN-PE imports an S-PMSI A-D route, it infers the source BD 1452 from the RTs and the Tag ID. If the EVPN-PE is not attached to the 1453 source BD, the tunnel it specifies is treated as belonging to the 1454 SBD. That is, packets arriving on that tunnel are treated as having 1455 been sourced in the SBD. Note that a packet is only considered to 1456 have arrived on the specified tunnel if the packet carries the 1457 upstream-assigned label specified in the PTA, or if there is no 1458 upstream-assigned label specified in the PTA. 1460 It should be noted that when either IR or BIER is used, there is no 1461 need for an ingress PE to use S-PMSI A-D routes to assign specific 1462 flows to selective tunnels. The procedures of Section 3.3, along 1463 with the procedures of Section 3.2.2, Section 3.2.3, or 1464 Section 3.2.4, provide the functionality of selective tunnels without 1465 the need to use S-PMSI A-D routes. 1467 3.3. Advertising SMET Routes 1469 [IGMP-Proxy] allows an egress EVPN-PE to express its interest in a 1470 particular multicast flow or set of flows by originating an SMET 1471 route. The NLRI of the SMET route identifies the flow or set of 1472 flows as (C-*,C-*) or (C-*,C-G) or (C-S,C-G). 1474 Each SMET route belongs to a particular BD. The Tag ID for the BD 1475 appears in the NLRI of the route, and the route carries the RT 1476 associated with that BD. From this pair, other EVPN-PEs 1477 can identify the BD to which a received SMET route belongs. 1478 (Remember though that the route may be carrying multiple RTs.) 1480 There are two cases to consider: 1482 1.
Case 1: When it is known that no BD of a Tenant Domain contains a 1483 multicast router. 1485 In this case, an egress PE can advertise its interest in a flow 1486 or set of flows by originating a single SMET route. The SMET 1487 route will belong to the SBD. We refer to this as an SBD-SMET 1488 route. The SBD-SMET route carries the SBD-RT, and has the Tag ID 1489 for the SBD in its NLRI. SMET routes for the individual BDs are 1490 not needed. 1492 2. Case 2: When it is possible that a BD of a Tenant Domain contains 1493 a multicast router. 1495 Suppose that an egress PE is attached to a BD on which there 1496 might be a tenant multicast router. (The tenant router is not 1497 necessarily on a segment that is attached to that PE.) And 1498 suppose that the PE has one or more ACs attached to that BD which 1499 are interested in a given multicast flow. In this case, IN 1500 ADDITION to the SMET route for the SBD, the egress PE MUST 1501 originate an SMET route for that BD. This will enable the 1502 ingress PE(s) to send IGMP/MLD messages on ACs for the BD, as 1503 specified in [IGMP-Proxy]. 1505 If an SMET route is not an SBD-SMET route, and if the SMET route 1506 is for (C-S,C-G) (i.e., no wildcard source), and if the EVPN-PE 1507 originating it knows the source BD of C-S, it MAY put only the RT 1508 for that BD on the route. Otherwise, the route MUST carry the 1509 SBD-RT, so that it gets distributed to all the EVPN-PEs attached 1510 to the tenant domain. 1512 As detailed in [IGMP-Proxy], an SMET route carries flags saying 1513 whether it is to result in the propagation of IGMP v1, v2, or v3 1514 messages on the ACs of the BD to which the SMET route belongs. These 1515 flags SHOULD be set to zero in an SBD-SMET route. 1517 Note that a PE only needs to originate the set of SBD-SMET routes that 1518 are needed to pull in all the traffic in which it is interested.
1519 Suppose PE1 has ACs attached to BD1 that are interested in (C-*,C-G) 1520 traffic, and ACs attached to BD2 that are interested in (C-S,C-G) 1521 traffic. A single SBD-SMET route specifying (C-*,C-G) will pull in 1522 all the necessary flows. 1524 As another example, suppose the ACs attached to BD1 are interested in 1525 (C-*,C-G) but not in (C-S,C-G), while the ACs attached to BD2 are 1526 interested in (C-S,C-G). A single SBD-SMET route specifying 1527 (C-*,C-G) will pull in all the necessary flows. 1529 In other words, to determine the set of SBD-SMET routes that have to 1530 be sent for a given C-G, the PE has to merge the IGMP/MLD state for 1531 all the BDs (of the given Tenant Domain) to which it is attached. 1533 Per [IGMP-Proxy], importing an SMET route for a particular BD will 1534 cause IGMP/MLD state to be instantiated for the IRB interface to that 1535 BD. This applies as well when the BD is the SBD. 1537 However, traffic originating in a BD of a particular Tenant Domain 1538 MUST NOT be sent down the IRB interface that connects the L3 routing 1539 instance of that Tenant Domain to the SBD of that Tenant Domain. 1540 That would cause duplicate delivery of traffic, since traffic 1541 arriving at L3 over the IRB interface from the SBD has already been 1542 distributed throughout the Tenant Domain. When setting up the IGMP/ 1543 MLD state based on SBD-SMET routes, care must be taken to ensure that 1544 the IRB interface to the SBD is not added to the Outgoing Interface 1545 (OIF) list if the traffic originates within the Tenant Domain. 1547 4. Constructing Multicast Forwarding State 1549 4.1. Layer 2 Multicast State 1551 An EVPN-PE maintains "layer 2 multicast state" for each BD to which 1552 it is attached. 1554 Let PE1 be an EVPN-PE, and BD1 be a BD to which it is attached. 
At 1555 PE1, BD1's layer 2 multicast state for a given (C-S,C-G) or (C-*,C-G) 1556 governs the disposition of an IP multicast packet that is received by 1557 BD1's layer 2 multicast function on an EVPN-PE. 1559 An IP multicast (S,G) packet is considered to have been received by 1560 BD1's layer 2 multicast function in PE1 in the following cases: 1562 o The packet is the payload of an ethernet frame received by PE1 1563 from an AC that attaches to BD1. 1565 o The packet is the payload of an ethernet frame whose source BD is 1566 BD1, and which is received by PE1 over a tunnel from another 1567 EVPN-PE. 1569 o The packet is received from BD1's IRB interface (i.e., has been 1570 transmitted by PE1's L3 routing instance down BD1's IRB 1571 interface). 1573 According to the procedures of this document, all transmission of IP 1574 multicast packets from one EVPN-PE to another is done at layer 2. 1575 That is, the packets are transmitted as ethernet frames, according to 1576 the layer 2 multicast state. 1578 Each layer 2 multicast state (S,G) or (*,G) contains a set of "output 1579 interfaces" (OIF list). The disposition of an (S,G) multicast frame 1580 received by BD1's layer 2 multicast function is determined as 1581 follows: 1583 o The OIF list is taken from BD1's layer 2 (S,G) state, or if there 1584 is no such (S,G) state, then from BD1's (*,G) state. (If neither 1585 state exists, the OIF list is considered to be null.) 1587 o The rules of Section 4.1.2 are applied to the OIF list. This will 1588 generally result in the frame being transmitted to some, but not 1589 all, elements of the OIF list. 1591 Note that there is no RPF check at layer 2. 1593 4.1.1. Constructing the OIF List 1595 In this document, we have extended the procedures of [IGMP-Proxy] so 1596 that IMET and SMET routes for a particular BD are distributed not 1597 just to PEs that attach to that BD, but to PEs that attach to any BD 1598 in the Tenant Domain.
In this way, each PE attached to a given 1599 Tenant Domain learns, from each other PE attached to the same Tenant 1600 Domain, the set of flows that are of interest to each of those other 1601 PEs. (If some PE attached to the Tenant Domain does not support 1602 [IGMP-Proxy], it will be assumed to be interested in all flows. 1603 Whether a particular remote PE supports [IGMP-Proxy] is determined by 1604 the presence of an Extended Community in its IMET route; this is 1605 specified in [IGMP-Proxy].) If a set of remote PEs are interested in 1606 a particular flow, the tunnels used to reach those PEs are added to 1607 the OIF list of the multicast states corresponding to that flow. 1609 An EVPN-PE may run IGMP/MLD procedures on each of its ACs, in order 1610 to determine the set of flows of interest to each AC. (An AC is said 1611 to be interested in a given flow if it connects to a segment that has 1612 tenant systems interested in that flow.) If IGMP/MLD procedures are 1613 not being run on a given AC, that AC is considered to be interested 1614 in all flows. For each BD, the set of ACs interested in a given flow 1615 is determined, and the ACs of that set are added to the OIF list of 1616 that BD's multicast state for that flow. 1618 The OIF list for each multicast state must also contain the IRB 1619 interface for the BD to which the state belongs. 1621 Implementors should note that the OIF list of a multicast state will 1622 change from time to time as ACs and/or remote PEs either become 1623 interested in, or lose interest in, particular multicast flows. 1625 4.1.2. 
Data Plane: Applying the OIF List to an (S,G) Frame 1627 When an (S,G) multicast frame is received by the layer 2 multicast 1628 function of a given EVPN-PE, say PE1, its disposition depends (a) upon the 1629 way it was received, (b) upon the OIF list of the corresponding 1630 multicast state (see Section 4.1.1), (c) upon the "eligibility" of an 1631 AC to receive a given frame (see Section 4.1.2.1), and (d) upon its 1632 source BD (see Section 3.2 for information about determining the 1633 source BD of a frame received over a tunnel from another PE). 1635 4.1.2.1. Eligibility of an AC to Receive a Frame 1637 A given (S,G) multicast frame is eligible to be transmitted by a 1638 given PE, say PE1, on a given AC, say AC1, only if one of the 1639 following conditions holds: 1641 1. ESI labels are being used, PE1 is the DF for the segment to which 1642 AC1 is connected, and the frame did not originate from that same 1643 segment (as determined by the ESI label), or 1645 2. The ingress PE for the frame is a remote PE, say PE2, local bias 1646 is being used, and PE2 is not connected to the same segment as 1647 AC1. 1649 4.1.2.2. Applying the OIF List 1651 Assume a given (S,G) multicast frame has been received by a given PE, 1652 say PE1. PE1 determines the source BD of the frame, finds the layer 1653 2 (S,G) state for the source BD (or the (*,G) state if there is no 1654 (S,G) state), and takes the OIF list from that state. Note that if 1655 PE1 is not attached to the actual source BD, it will treat the frame 1656 as if its source BD is the SBD. 1658 Suppose PE1 has determined the frame's source BD to be BD1 (which may 1659 or may not be the SBD). There are the following cases to consider: 1661 1. The frame was received by PE1 from a local AC, say AC1, that 1662 attaches to BD1. 1664 a. The frame MUST be sent out all local ACs of BD1 that appear 1665 in the OIF list, except for AC1 itself. 1667 b.
The frame MUST also be delivered to any other EVPN-PEs that 1668 have interest in it. This is achieved as follows: 1670 i. If (a) AR is being used, and (b) PE1 is an AR-LEAF, and 1671 (c) the OIF list is non-null, PE1 MUST send the frame 1672 to the AR-REPLICATOR. 1674 ii. Otherwise the frame MUST be sent on all tunnels in the 1675 OIF list. 1677 c. The frame MUST be sent to the local L3 routing instance by 1678 being sent up the IRB interface of BD1. It MUST NOT be sent 1679 up any other IRB interfaces. 1681 2. The frame was received by PE1 over a tunnel from another PE. 1682 (See Section 3.2 for the rules to determine the source BD of a 1683 packet received from another PE. Note that if PE1 is not 1684 attached to the source BD, it will regard the SBD as the source 1685 BD.) 1687 a. The frame MUST be sent out all local ACs in the OIF list that 1688 connect to BD1 and that are eligible (per Section 4.1.2.1) to 1689 receive the frame. 1691 b. The frame MUST be sent up the IRB interface of the source BD. 1692 (Note that this may be the SBD.) The frame MUST NOT be sent 1693 up any other IRB interfaces. 1695 c. If PE1 is not an AR-REPLICATOR, it MUST NOT send the frame to 1696 any other EVPN-PEs. However, if PE1 is an AR-REPLICATOR, it 1697 MUST send the frame to all tunnels in the OIF list, except 1698 for the tunnel over which the frame was received. 1700 3. The frame was received by PE1 from the BD1 IRB interface (i.e., 1701 the frame has been transmitted by PE1's L3 routing instance down 1702 the BD1 IRB interface), and BD1 is NOT the SBD. 1704 a. The frame MUST be sent out all local ACs in the OIF list that 1705 are eligible (per Section 4.1.2.1) to receive the frame. 1707 b. The frame MUST NOT be sent to any other EVPN-PEs. 1709 c. The frame MUST NOT be sent up any IRB interfaces. 1711 4. The frame was received from the SBD IRB interface (i.e., has been 1712 transmitted by PE1's L3 routing instance down the SBD IRB 1713 interface). 1715 a.
The frame MUST be sent on all tunnels in the OIF list. This 1716 causes the frame to be delivered to any other EVPN-PEs that 1717 have interest in it. 1719 b. The frame MUST NOT be sent on any local ACs. 1721 c. The frame MUST NOT be sent up any IRB interfaces. 1723 4.2. Layer 3 Forwarding State 1725 If an EVPN-PE is performing IGMP/MLD procedures on the ACs of a given 1726 BD, it processes those messages at layer 2 to help form the layer 2 1727 multicast state. It also sends those messages up that BD's IRB 1728 interface to the L3 routing instance of a particular tenant domain. 1729 This causes (C-S,C-G) or (C-*,C-G) L3 state to be created/ 1730 updated. 1732 A layer 3 multicast state has both an Input Interface (IIF) and an 1733 OIF list. 1735 To set the IIF of a (C-S,C-G) state, the EVPN-PE must determine the 1736 source BD of C-S. This is done by looking up C-S in the local 1737 MAC-VRF(s) of the given Tenant Domain. 1739 If the source BD is present on the PE, the IIF is set to the IRB 1740 interface that attaches to that BD. Otherwise the IIF is set to the 1741 SBD IRB interface. 1743 For (C-*,C-G) states, traffic can arrive from any BD, so the IIF 1744 needs to be set to a wildcard value meaning "any IRB interface". 1746 The OIF list of these states includes one or more of the IRB 1747 interfaces of the Tenant Domain. In general, maintenance of the OIF 1748 list does not require any EVPN-specific procedures. However, there 1749 is one EVPN-specific rule: 1751 If the IIF is one of the IRB interfaces (or the wildcard meaning 1752 "any IRB interface"), then the SBD IRB interface MUST NOT be added 1753 to the OIF list. Traffic originating from within a particular 1754 EVPN Tenant Domain must not be sent down the SBD IRB interface, as 1755 such traffic has already been distributed to all EVPN-PEs attached 1756 to that Tenant Domain.
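The IIF selection and the EVPN-specific OIF rule of Section 4.2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the interface naming scheme ("irb.<BD>"), the wildcard token, and both function names are hypothetical, and the source BD is assumed to have been found already by the MAC-VRF lookup.

```python
from typing import Iterable, List, Optional, Set

ANY_IRB = "irb.*"    # hypothetical wildcard meaning "any IRB interface"
SBD_IRB = "irb.SBD"  # hypothetical name of the SBD IRB interface


def choose_iif(source_bd: Optional[str], local_bds: Set[str],
               wildcard_source: bool = False) -> str:
    """Return the IIF for a layer 3 (C-S,C-G) or (C-*,C-G) state."""
    if wildcard_source:
        # (C-*,C-G): traffic can arrive from any BD.
        return ANY_IRB
    if source_bd is not None and source_bd in local_bds:
        # The source BD is present on this PE: use that BD's IRB interface.
        return "irb." + source_bd
    # The source BD is not present on this PE: traffic arrives via the SBD.
    return SBD_IRB


def build_oif_list(iif: str, wanted: Iterable[str],
                   tenant_irbs: Set[str]) -> List[str]:
    """Apply the rule that the SBD IRB is never an OIF for intra-domain traffic."""
    iif_is_irb = iif == ANY_IRB or iif in tenant_irbs
    return [oif for oif in wanted if not (iif_is_irb and oif == SBD_IRB)]


if __name__ == "__main__":
    irbs = {"irb.BD1", "irb.BD2", SBD_IRB}
    iif = choose_iif("BD1", {"BD1", "BD2"})
    print(iif, build_oif_list(iif, ["irb.BD2", SBD_IRB], irbs))
```

In this sketch an IIF that is not one of the Tenant Domain's IRB interfaces (for example, an interface toward an external source) leaves the SBD IRB interface in the OIF list, since the exclusion applies only to traffic that originated within the Tenant Domain.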
1758 Please also see Section 6.1.1, which states a modification of this 1759 rule for the case where OISM is interworking with external Layer 3 1760 multicast routing. 1762 5. Interworking with non-OISM EVPN-PEs 1764 It is possible that a given Tenant Domain will be attached to both 1765 OISM PEs and non-OISM PEs. Inter-subnet IP multicast should be 1766 possible and fully functional even if not all PEs attaching to a 1767 Tenant Domain can be upgraded to support OISM functionality. 1769 Note that the non-OISM PEs are not required to have IRB support, or 1770 support for [IGMP-Proxy]. It is however advantageous for the 1771 non-OISM PEs to support [IGMP-Proxy]. 1773 In this section, we will use the following terminology: 1775 o PE-S: the ingress PE for an (S,G) flow. 1777 o PE-R: an egress PE for an (S,G) flow. 1779 o BD-S: the source BD for an (S,G) flow. PE-S must have one or more 1780 ACs attached to BD-S, at least one of which attaches to host S. 1782 o BD-R: a BD that contains a host interested in the flow. The host 1783 is attached to PE-R via an AC that belongs to BD-R. 1785 To allow OISM PEs to interwork with non-OISM PEs, a given Tenant 1786 Domain needs to contain one or more "IP Multicast Gateways" (IPMGs). 1787 An IPMG is an OISM PE with special responsibilities regarding the 1788 interworking between OISM and non-OISM PEs. 1790 If a PE is functioning as an IPMG, it MUST signal this fact by 1791 attaching a particular flag or EC (details to be determined) to its 1792 IMET routes. An IPMG SHOULD attach this flag or EC to all IMET 1793 routes it originates. However, if PE1 imports any IMET route from 1794 PE2 that has the "IPMG" flag or EC present, then PE1 will assume 1795 that PE2 is an IPMG. 1797 An IPMG Designated Forwarder (IPMG-DF) selection procedure is used to 1798 ensure that, at any given time, there is exactly one active IPMG-DF 1799 for any given BD. Details of the IPMG-DF selection procedure are in 1800 Section 5.1.
The IPMG-DF for a given BD, say BD-S, has special 1801 functions to perform when it receives (S,G) frames on that BD: 1803 o If the frames are from a non-OISM PE-S: 1805 * The IPMG-DF forwards them to OISM PEs that do not attach to 1806 BD-S but have interest in (S,G). 1808 Note that OISM PEs that do attach to BD-S will have received 1809 the frames on the BUM tunnel from the non-OISM PE-S. 1811 * The IPMG-DF forwards them to non-OISM PEs that have interest in 1812 (S,G) on ACs that do not belong to BD-S. 1814 Note that if a non-OISM PE has multiple BDs other than BD-S 1815 with interest in (S,G), it will receive one copy of the frame 1816 for each such BD. This is necessary because the non-OISM PEs 1817 cannot move IP multicast traffic from one BD to another. 1819 o If the frames are from an OISM PE, the IPMG-DF forwards them to 1820 non-OISM PEs that have interest in (S,G) on ACs that do not belong 1821 to BD-S. 1823 If a non-OISM PE has interest in (S,G) on an AC belonging to BD-S, 1824 it will have received a copy of the (S,G) frame, encapsulated for 1825 BD-S, from the OISM PE-S. (See Section 3.2.2.) If the non-OISM 1826 PE has interest in (S,G) on one or more ACs belonging to 1827 BD-R1,...,BD-Rk where the BD-Ri are distinct from BD-S, the 1828 IPMG-DF needs to send it a copy of the frame for BD-Ri. 1830 If an IPMG receives a frame on a BD for which it is not the IPMG-DF, 1831 it just follows normal OISM procedures. 1833 This section specifies several sets of procedures: 1835 o the procedures that the IPMG-DF for a given BD needs to follow 1836 when receiving, on that BD, an IP multicast frame from a non-OISM 1837 PE; 1839 o the procedures that the IPMG-DF for a given BD needs to follow 1840 when receiving, on that BD, an IP multicast frame from an OISM PE; 1842 o the procedures that an OISM PE needs to follow when receiving, on 1843 a given BD, an IP multicast frame from a non-OISM PE, when the 1844 OISM PE is not the IPMG-DF for that BD. 
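The IPMG-DF forwarding responsibilities listed earlier in this section can be summarized with a small Python sketch. It is a simplified model, not a normative procedure. The RemotePE record, its field names, and the "SBD" marker are assumptions made for this example, and the list of PEs passed in is assumed to exclude both the IPMG-DF itself and the ingress PE.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class RemotePE:
    name: str
    is_oism: bool
    attached_bds: Set[str]    # BDs configured on this PE
    interested_bds: Set[str]  # BDs with ACs interested in (S,G)


def ipmg_df_copies(bd_s: str, ingress_is_oism: bool,
                   pes: List[RemotePE]) -> List[Tuple[str, str]]:
    """Return (PE name, BD) pairs, one per copy the IPMG-DF for bd_s sends."""
    copies: List[Tuple[str, str]] = []
    for pe in pes:
        if pe.is_oism:
            # OISM PEs need help only when the ingress PE is non-OISM and
            # they are not attached to BD-S; the copy is sent for the SBD.
            if (not ingress_is_oism and bd_s not in pe.attached_bds
                    and pe.interested_bds):
                copies.append((pe.name, "SBD"))
        else:
            # Non-OISM PEs cannot move traffic between BDs, so they get
            # one copy per interested BD other than BD-S.
            for bd in sorted(pe.interested_bds - {bd_s}):
                copies.append((pe.name, bd))
    return copies


if __name__ == "__main__":
    pes = [
        RemotePE("PE2", True, {"BD2"}, {"BD2"}),
        RemotePE("PE3", True, {"BD1", "BD2"}, {"BD2"}),
        RemotePE("PE4", False, {"BD1", "BD2", "BD3"}, {"BD1", "BD2", "BD3"}),
    ]
    print(ipmg_df_copies("BD1", ingress_is_oism=False, pes=pes))
    print(ipmg_df_copies("BD1", ingress_is_oism=True, pes=pes))
```

In the example, PE3 receives nothing from the IPMG-DF because it is attached to BD-S and already gets the frame directly from the ingress PE.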
1846 To enable OISM/non-OISM interworking in a given Tenant Domain, the 1847 Tenant Domain MUST have some EVPN-PEs that can function as IPMGs. An 1848 IPMG must be configured with the SBD. It must also be configured 1849 with every BD of the Tenant Domain that exists on any of the non-OISM 1850 PEs of that domain. (Operationally, it may be simpler to configure 1851 the IPMG with all the BDs of the Tenant Domain.) 1853 A non-OISM PE of course only needs to be configured with BDs for 1854 which it has ACs. An OISM PE that is not an IPMG only needs to be 1855 configured with the SBD and with the BDs for which it has ACs. 1857 An IPMG MUST originate a wildcard SMET route (with (C-*,C-*) in the 1858 NLRI) for each BD in the Tenant Domain. This will cause it to 1859 receive all the IP multicast traffic that is sourced in the Tenant 1860 Domain. Note that non-OISM nodes that do not support [IGMP-Proxy] 1861 will send all the multicast traffic from a given BD to all PEs 1862 attached to that BD, even if those PEs do not originate an SMET 1863 route. 1865 The interworking procedures vary somewhat depending upon whether 1866 packets are transmitted from PE to PE via Ingress Replication (IR) or 1867 via Point-to-Multipoint (P2MP) tunnels. We do not consider the use 1868 of BIER in this section, due to the low likelihood of there being a 1869 non-OISM PE that supports BIER. 1871 5.1. IPMG Designated Forwarder 1873 Each IPMG MUST be configured with an "IPMG dummy ethernet segment" 1874 that has no ACs. 1876 EVPN supports a number of procedures that can be used to select the 1877 Designated Forwarder (DF) for a particular BD on a particular 1878 ethernet segment. Some of the possible procedures can be found, 1879 e.g., in [RFC7432], [EVPN-DF-NEW], and [EVPN-DF-WEIGHTED]. Whatever 1880 procedure is in use in a given deployment can be adapted to select an 1881 IPMG-DF for a given BD, as follows. 
1883 Each IPMG will originate an Ethernet Segment route for the IPMG dummy 1884 ethernet segment. It MUST carry a Route Target derived from the 1885 corresponding Ethernet Segment Identifier. Thus only IPMGs will 1886 import the route. 1888 Once the set of IPMGs is known, it is also possible to determine the 1889 set of BDs supported by each IPMG. The DF selection procedure can 1890 then be used to choose a DF for each BD. (The conditions under which 1891 the IPMG-DF for a given BD changes depend upon the DF selection 1892 algorithm that is in use.) 1894 5.2. Ingress Replication 1896 The procedures of this section are used when Ingress Replication is 1897 used to transmit packets from one PE to another. 1899 When a non-OISM PE-S transmits a multicast frame from BD-S to another 1900 PE, PE-R, PE-S will use the encapsulation specified in the BD-S IMET 1901 route that was originated by PE-R. This encapsulation will include 1902 the label that appears in the "MPLS label" field of the PMSI Tunnel 1903 attribute (PTA) of the IMET route. If the tunnel type is VXLAN, the 1904 "label" is actually a Virtual Network Identifier (VNI); for other 1905 tunnel types, the label is an MPLS label. In either case, we will 1906 speak of the transmitted frames as carrying a label that was assigned 1907 to a particular BD by the PE-R to which the frame is being 1908 transmitted. 1910 To support OISM/non-OISM interworking, an OISM PE-R MUST originate, 1911 for each of its BDs, both an IMET route and an S-PMSI (C-*,C-*) A-D 1912 route. Note that even when IR is being used, interworking between 1913 OISM and non-OISM PEs requires the OISM PEs to follow the rules of 1914 Section 3.2.5.2, as modified below. 1916 Non-OISM PEs will not understand S-PMSI A-D routes. So when a 1917 non-OISM PE-S transmits an IP multicast frame with a particular 1918 source BD to an IPMG, it encapsulates the frame using the label 1919 specified in that IPMG's BD-S IMET route.
(This is just the 1920 procedure of [RFC7432].) 1922 The (C-*,C-*) S-PMSI A-D route originated by a given OISM PE will 1923 have a PTA that specifies IR. 1925 o If MPLS tunneling is being used, the MPLS label field SHOULD 1926 contain a non-zero value, and the LIR flag SHOULD be zero. (The 1927 case where the MPLS label field is zero or the LIR flag is set is 1928 outside the scope of this document.) 1930 o If the tunnel encapsulation is VXLAN, the MPLS label field MUST 1931 contain a non-zero value, and the LIR flag MUST be zero. 1933 When an OISM PE-S transmits an IP multicast frame to an IPMG, it will 1934 use the label specified in that IPMG's (C-*,C-*) S-PMSI A-D route. 1936 When a PE originates both an IMET route and a (C-*,C-*) S-PMSI A-D 1937 route, the values of the MPLS label field in the respective PTAs must 1938 be distinct. Further, each MUST map uniquely (in the context of the 1939 originating PE) to the route's BD. 1941 As a result, an IPMG receiving an MPLS-encapsulated IP multicast 1942 frame can always tell by the label whether the frame's ingress PE is 1943 an OISM PE or a non-OISM PE. When an IPMG receives a VXLAN- 1944 encapsulated IP multicast frame it may need to determine the identity 1945 of the ingress PE from the outer IP encapsulation; it can then 1946 determine whether the ingress PE is an OISM PE or a non-OISM PE by 1947 looking up the IMET route from that PE. 1949 Suppose an IPMG receives an IP multicast frame from another EVPN-PE 1950 in the Tenant Domain, and the IPMG is not the IPMG-DF for the frame's 1951 source BD. Then the IPMG performs only the ordinary OISM functions; 1952 it does not perform the IPMG-specific functions for that frame. In 1953 the remainder of this section, when we discuss the procedures applied 1954 by an IPMG when it receives an IP multicast frame, we are presuming 1955 that the source BD of the frame is a BD for which the IPMG is the 1956 IPMG-DF.
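As an informal illustration of the label rules above (not part of the specification), the following Python sketch shows how an IPMG might keep the labels it advertises in its IMET routes and in its (C-*,C-*) S-PMSI A-D routes in two disjoint tables, and then use those tables to infer the source BD and the kind of ingress PE for a received MPLS-encapsulated frame. The class name, method names, and label values are hypothetical. The VXLAN case, in which the ingress PE is identified from the outer IP encapsulation, is not modeled.

```python
class IpmgLabelTable:
    """Hypothetical per-IPMG record of the IR labels it has advertised."""

    def __init__(self):
        self.imet = {}   # label -> BD, from the PTA of each IMET route
        self.spmsi = {}  # label -> BD, from the PTA of each (C-*,C-*) S-PMSI A-D route

    def advertise(self, bd, imet_label, spmsi_label):
        # The two labels must be distinct, and each label must map to
        # exactly one BD in the context of this (originating) PE.
        in_use = self.imet.keys() | self.spmsi.keys()
        if imet_label == spmsi_label or imet_label in in_use or spmsi_label in in_use:
            raise ValueError("label already in use or labels not distinct")
        self.imet[imet_label] = bd
        self.spmsi[spmsi_label] = bd

    def classify(self, label):
        """Return (source BD, kind of ingress PE) for a received label."""
        if label in self.imet:
            # Non-OISM PEs only understand IMET routes.
            return self.imet[label], "non-OISM"
        if label in self.spmsi:
            # OISM PEs use the label from the (C-*,C-*) S-PMSI A-D route.
            return self.spmsi[label], "OISM"
        raise KeyError(label)


table = IpmgLabelTable()
table.advertise("BD-S", imet_label=1001, spmsi_label=2001)
print(table.classify(1001))
print(table.classify(2001))
```

Keeping the two label spaces disjoint is what allows the lookup to be unambiguous. A real implementation would presumably fold this into its label forwarding table rather than a separate structure.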
1958 We have two basic cases to consider: (1) a frame's ingress PE is a 1959 non-OISM node, and (2) a frame's ingress PE is an OISM node. 1961 5.2.1. Ingress PE is non-OISM 1963 In this case, a non-OISM PE, PE-S, has received an (S,G) multicast 1964 frame over an AC that is attached to a particular BD, BD-S. By 1965 virtue of normal EVPN procedures, PE-S has sent a copy of the frame 1966 to every PE-R (both OISM and non-OISM) in the Tenant Domain that is 1967 attached to BD-S. If the non-OISM node supports [IGMP-Proxy], only 1968 PEs that have expressed interest in (S,G) receive the frame. The 1969 IPMG will have expressed interest via a (C-*,C-*) SMET route and thus 1970 receives the frame. 1972 Any OISM PE (including an IPMG) receiving the frame will apply normal 1973 OISM procedures. As a result it will deliver the frame to any of its 1974 local ACs (in BD-S or in any other BD) that have interest in (S,G). 1976 An OISM PE that is also the IPMG-DF for a particular BD, say BD-S, 1977 has additional procedures that it applies to frames received on BD-S 1978 from non-OISM PEs: 1980 1. When the IPMG-DF for BD-S receives an (S,G) frame from a 1981 non-OISM node, it MUST forward a copy of the frame to every OISM 1982 PE that is NOT attached to BD-S but has interest in (S,G). The 1983 copy sent to a given OISM PE-R must carry the label that PE-R 1984 has assigned to the SBD in an S-PMSI A-D route. The IPMG MUST 1985 NOT do any IP processing of the frame's IP payload. TTL 1986 decrement and other IP processing will be done by PE-R, per the 1987 normal OISM procedures. There is no need for the IPMG to 1988 include an ESI label in the frame's tunnel encapsulation, 1989 because it is already known that the frame's source BD has no 1990 presence on PE-R. There is also no need for the IPMG to modify 1991 the frame's MAC SA. 1993 2. 
In addition, when the IPMG-DF for BD-S receives an (S,G) frame 1994 from a non-OISM node, it may need to forward copies of the frame 1995 to other non-OISM nodes. Before it does so, it MUST decapsulate 1996 the (S,G) packet, and do the IP processing (e.g., TTL 1997 decrement). Suppose PE-R is a non-OISM node that has an AC to 1998 BD-R, where BD-R is not the same as BD-S, and that AC has 1999 interest in (S,G). The IPMG must then encapsulate the (S,G) 2000 packet (after the IP processing has been done) in an ethernet 2001 header. The MAC SA field will have the MAC address of the 2002 IPMG's IRB interface to BD-R. The IPMG then sends the frame to 2003 PE-R. The tunnel encapsulation will carry the label that PE-R 2004 advertised in its IMET route for BD-R. There is no need to 2005 include an ESI label, as the source and destination BDs are 2006 known to be different. 2008 Note that if a non-OISM PE-R has several BDs (other than BD-S) 2009 with local ACs that have interest in (S,G), the IPMG will send 2010 it one copy for each such BD. This is necessary because the 2011 non-OISM PE cannot move packets from one BD to another. 2013 There may be deployment scenarios in which every OISM PE is 2014 configured with every BD that is present on any non-OISM PE. In such 2015 scenarios, the procedures of item 1 above will not actually result in 2016 the transmission of any packets. Hence if it is known a priori that 2017 this deployment scenario exists for a given tenant domain, the 2018 procedures of item 1 above can be disabled. 2020 5.2.2. Ingress PE is OISM 2022 In this case, an OISM PE, PE-S, has received an (S,G) multicast frame 2023 over an AC that attaches to a particular BD, BD-S. 2025 By virtue of receiving all the IMET routes about BD-S, PE-S will know 2026 all the PEs attached to BD-S. 
By virtue of normal OISM procedures: 2028 o PE-S will send a copy of the frame to every OISM PE-R (including 2029 the IPMG) in the Tenant Domain that is attached to BD-S and has 2030 interest in (S,G). The copy sent to a given PE-R carries the 2031 label that the PE-R has assigned to BD-S in its (C-*,C-*) 2032 S-PMSI A-D route. 2034 o PE-S will also transmit a copy of the (S,G) frame to every OISM 2035 PE-R that has interest in (S,G) but is not attached to BD-S. The 2036 copy will contain the label that the PE-R has assigned to the SBD. 2037 (As in Section 5.2.1, an IPMG is assumed to have indicated 2038 interest in all multicast flows.) 2040 o PE-S will also transmit a copy of the (S,G) frame to every 2041 non-OISM PE-R that is attached to BD-S. It does this using the 2042 label advertised by that PE-R in its IMET route for BD-S. 2044 The PE-Rs follow their normal procedures. An OISM PE that receives 2045 the (S,G) frame on BD-S applies the OISM procedures to deliver the 2046 frame to its local ACs, as necessary. A non-OISM PE that receives 2047 the (S,G) frame on BD-S delivers the frame only to its local BD-S 2048 ACs, as necessary. 2050 Suppose that a non-OISM PE-R has interest in (S,G) on a BD, BD-R, 2051 that is different than BD-S. If the non-OISM PE-R is attached to 2052 BD-S, the OISM PE-S will forward it the original (S,G) multicast 2053 frame, but the non-OISM PE-R will not be able to send the frame to 2054 ACs that are not in BD-S. If PE-R is not even attached to BD-S, the 2055 OISM PE-S will not send it a copy of the frame at all, because PE-R 2056 is not attached to the SBD. In these cases, the IPMG needs to relay 2057 the (S,G) multicast traffic from OISM PE-S to non-OISM PE-R. 2059 When the IPMG-DF for BD-S receives an (S,G) frame from an OISM PE-S, 2060 it has to forward it to every non-OISM PE-R that has interest in 2061 (S,G) on a BD-R that is different than BD-S.
The IPMG MUST 2062 decapsulate the IP multicast packet, do the IP processing, re- 2063 encapsulate it for BD-R (changing the MAC SA to the IPMG's own MAC 2064 address on BD-R), and send a copy of the frame to PE-R. Note that a 2065 given non-OISM PE-R will receive multiple copies of the frame, if it 2066 has multiple BDs on which there is interest in the frame. 2068 5.3. P2MP Tunnels 2070 When IR is used to distribute the multicast traffic among the 2071 EVPN-PEs, the procedures of Section 5.2 ensure that there will be no 2072 duplicate delivery of multicast traffic. That is, no egress PE will 2073 ever send a frame twice on any given AC. If P2MP tunnels are being 2074 used to distribute the multicast traffic, it is necessary to have 2075 additional procedures to prevent duplicate delivery. 2077 At the present time, it is not clear that there will be a use case in 2078 which OISM nodes need to interwork with non-OISM nodes that use P2MP 2079 tunnels. If it is determined that there is such a use case, 2080 procedures for it will be included in a future revision of this 2081 document. 2083 6. Traffic to/from Outside the EVPN Tenant Domain 2085 In this section, we discuss scenarios where a multicast source 2086 outside a given EVPN Tenant Domain sends traffic to receivers inside 2087 the domain (as well as, possibly, to receivers outside the domain). 2088 This requires the OISM procedures to interwork with various layer 3 2089 multicast routing procedures. 2091 We assume in this section that the Tenant Domain is not being used as 2092 an intermediate transit network for multicast traffic; that is, we do 2093 not consider the case where the Tenant Domain contains multicast 2094 routers that will receive traffic from sources outside the domain and 2095 forward the traffic to receivers outside the domain. The transit 2096 scenario is considered in Section 7. 2098 We can divide the non-transit scenarios into two classes: 2100 1.
One or more of the EVPN PE routers provide the functionality 2101 needed to interwork with layer 3 multicast routing procedures. 2103 2. One BD in the Tenant Domain contains external multicast routers 2104 ("tenant multicast routers") that are used to interwork the 2105 entire Tenant Domain with layer 3 multicast routing procedures. 2107 6.1. Layer 3 Interworking via EVPN OISM PEs 2109 6.1.1. General Principles 2111 Sometimes it is necessary to interwork an EVPN Tenant Domain with an 2112 external layer 3 multicast domain (the "external domain"). This is 2113 needed to allow EVPN tenant systems to receive multicast traffic from 2114 sources ("external sources") outside the EVPN Tenant Domain. It is 2115 also needed to allow receivers ("external receivers") outside the 2116 EVPN Tenant Domain to receive traffic from sources inside the Tenant 2117 Domain. 2119 In order to allow interworking between an EVPN Tenant Domain and an 2120 external domain, one or more OISM PEs must be "L3 Gateways". An L3 2121 Gateway participates both in the OISM procedures and in the L3 2122 multicast routing procedures of the external domain. 2124 An L3 Gateway that has interest in receiving (S,G) traffic must be 2125 able to determine the best route to S. If an L3 Gateway has interest 2126 in (*,G), it must be able to determine the best route to G's RP. In 2127 these interworking scenarios, the L3 Gateway must be running a layer 2128 3 unicast routing protocol. Via this protocol, it imports unicast 2129 routes (either IP routes or VPN-IP routes) from routers other than 2130 EVPN PEs. And since there may be multicast sources inside the EVPN 2131 Tenant Domain, the EVPN PEs also need to export, either as IP routes 2132 or as VPN-IP routes (depending upon the external domain), unicast 2133 routes to those sources. 2135 When selecting the best route to a multicast source or RP, an L3 2136 Gateway might have a choice between an EVPN route and an IP/VPN-IP 2137 route. 
When such a choice exists, the L3 Gateway SHOULD always 2138 prefer the EVPN route. This will ensure that when traffic originates 2139 in the Tenant Domain and has a receiver in the tenant domain, the 2140 path to that receiver will remain within the EVPN tenant domain, even 2141 if the source is also reachable via a routed path. This also 2142 provides protection against sub-optimal routing that might occur if 2143 two EVPN PEs export IP/VPN-IP routes and each imports the other's IP/ 2144 VPN-IP routes. 2146 Section 4.2 discusses the way layer 3 multicast states are 2147 constructed by OISM PEs. These layer 3 multicast states have IRB 2148 interfaces as their IIF and OIF list entries, and are the basis for 2149 interworking OISM with other layer 3 multicast procedures such as 2150 MVPN or PIM. From the perspective of the layer 3 multicast 2151 procedures running in a given L3 Gateway, an EVPN Tenant Domain is a 2152 set of IRB interfaces. 2154 When interworking an EVPN Tenant Domain with an external domain, the 2155 L3 Gateway's layer 3 multicast states will not only have IRB 2156 interfaces as IIF and OIF list entries, but also other "interfaces" 2157 that lead outside the Tenant Domain. For example, when interworking 2158 with MVPN, the multicast states may have MVPN tunnels as well as IRB 2159 interfaces as IIF or OIF list members. When interworking with PIM, 2160 the multicast states may have PIM-enabled non-IRB interfaces as IIF 2161 or OIF list members. 2163 As long as a Tenant Domain is not being used as an intermediate 2164 transit network for IP multicast traffic, it is not necessary to 2165 enable PIM on its IRB interfaces. 2167 In general, an L3 Gateway has the following responsibilities: 2169 o It exports, to the external domain, unicast routes to those 2170 multicast sources in the EVPN Tenant Domain that are locally 2171 attached to the L3 Gateway. 
2173 o It imports, from the external domain, unicast routes to multicast 2174 sources that are in the external domain. 2176 o It executes the procedures necessary to draw externally sourced 2177 multicast traffic that is of interest to locally attached 2178 receivers in the EVPN Tenant Domain. When such traffic is 2179 received, the traffic is sent down the IRB interfaces of the BDs 2180 on which the locally attached receivers reside. 2182 One of the L3 Gateways in a given Tenant Domain becomes the "DR" for 2183 the SBD. (See Section 6.1.2.4.) This L3 gateway has the following 2184 additional responsibilities: 2186 o It exports, to the external domain, unicast routes to multicast 2187 sources in the EVPN Tenant Domain that are not locally 2188 attached to any L3 gateway. 2190 o It imports, from the external domain, unicast routes to multicast 2191 sources that are in the external domain. 2193 o It executes the procedures necessary to draw externally sourced 2194 multicast traffic that is of interest to receivers in the EVPN 2195 Tenant Domain that are not locally attached to an L3 gateway. 2196 When such traffic is received, the traffic is sent down the SBD 2197 IRB interface. OISM procedures already described in this document 2198 will then ensure that the IP multicast traffic gets distributed 2199 throughout the Tenant Domain to any EVPN PEs that have interest in 2200 it. Thus to an OISM PE that is not an L3 gateway the externally 2201 sourced traffic will appear to have been sourced on the SBD. 2203 In order for this to work, some special care is needed when an L3 2204 gateway creates or modifies a layer 3 (*,G) multicast state. Suppose 2205 group G has both external sources (sources outside the EVPN Tenant 2206 Domain) and internal sources (sources inside the EVPN tenant domain). 2207 Section 4.2 states that when there are internal sources, the SBD IRB 2208 interface must not be added to the OIF list of the (*,G) state.
2209 Traffic from internal sources will already have been delivered to all 2210 the EVPN PEs that have interest in it. However, if the OIF list of 2211 the (*,G) state does not contain its SBD IRB interface, then traffic 2212 from external sources will not get delivered to other EVPN PEs. 2214 One way of handling this is the following. When an L3 gateway 2215 receives (S,G) traffic from other than an IRB interface, and the 2216 traffic corresponds to a layer 3 (*,G) state, the L3 gateway can 2217 create (S,G) state. The IIF will be set to the external interface 2218 over which the traffic is expected. The OIF list will contain the 2219 SBD IRB interface, as well as the IRB interfaces of any other BDs 2220 attached to the PEG DR that have locally attached receivers with 2221 interest in the (S,G) traffic. The (S,G) state will ensure that the 2222 external traffic is sent down the SBD IRB interface. The following 2223 text will assume this procedure; however other implementation 2224 techniques may also be possible. 2226 If a particular BD is attached to several L3 Gateways, one of the L3 2227 Gateways becomes the DR for that BD. (See Section 6.1.2.4.) If the 2228 interworking scenario requires FHR functionality, it is generally the 2229 DR for a particular BD that is responsible for performing that 2230 functionality on behalf of the source hosts on that BD. (E.g., if 2231 the interworking scenario requires that PIM Register messages be sent 2232 by a FHR, the DR for a given BD would send the PIM Register messages 2233 for sources on that BD.) Note though that the DR for the SBD does 2234 not perform FHR functionality on behalf of external sources. 2236 An optional alternative is to have each L3 gateway perform FHR 2237 functionality for locally attached sources. Then the DR would only 2238 have to perform FHR functionality on behalf of sources that are 2239 locally attached to itself AND sources that are not attached to any 2240 L3 gateway. 2242 6.1.2.
Interworking with MVPN 2244 In this section, we specify the procedures necessary to allow EVPN 2245 PEs running OISM procedures to interwork with L3VPN PEs that run BGP- 2246 based MVPN ([RFC6514]) procedures. More specifically, the procedures 2247 herein allow a given EVPN Tenant Domain to become part of an L3VPN/ 2248 MVPN, and support multicast flows where either: 2250 o The source of a given multicast flow is attached to an ethernet 2251 segment whose BD is part of an EVPN Tenant Domain, and one or more 2252 receivers of the flow are attached to the network via L3VPN/MVPN. 2253 (Other receivers may be attached to the network via EVPN.) 2255 o The source of a given multicast flow is attached to the network 2256 via L3VPN/MVPN, and one or more receivers of the flow are attached 2257 to an ethernet segment that is part of an EVPN tenant domain. 2258 (Other receivers may be attached via L3VPN/MVPN.) 2260 In this interworking model, existing L3VPN/MVPN PEs are unaware that 2261 certain sources or receivers are part of an EVPN Tenant Domain. The 2262 existing L3VPN/MVPN nodes run only their standard procedures and are 2263 entirely unaware of EVPN. Interworking is achieved by having some or 2264 all of the EVPN PEs function as L3 Gateways running L3VPN/MVPN 2265 procedures, as detailed in the following sub-sections. 2267 In this section, we assume that there are no tenant multicast routers 2268 on any of the EVPN-attached ethernet segments. (There may of course 2269 be multicast routers in the L3VPN. Consideration of the case where 2270 there are tenant multicast routers is deferred till Section 7.) 2272 To support MVPN/EVPN interworking, we introduce the notion of an 2273 MVPN/EVPN Gateway, or MEG. 2275 A MEG is an L3 Gateway (see Section 6.1.1), hence is both an OISM PE 2276 and an L3VPN/MVPN PE. For a given EVPN Tenant Domain it will have an 2277 IP-VRF. If the Tenant Domain is part of an L3VPN/MVPN, the IP-VRF 2278 also serves as an L3VPN VRF ([RFC4364]).
The IRB interfaces of the 2279 IP-VRF are considered to be "VRF interfaces" of the L3VPN VRF. The 2280 L3VPN VRF may also have other local VRF interfaces that are not EVPN 2281 IRB interfaces. 2283 The VRF on the MEG will import VPN-IP routes ([RFC4364]) from other 2284 L3VPN Provider Edge (PE) routers. It will also export VPN-IP routes 2285 to other L3VPN PE routers. In order to do so, it must be 2286 appropriately configured with the Route Targets used in the L3VPN to 2287 control the distribution of the VPN-IP routes. These Route Targets 2288 will in general be different than the Route Targets used for 2289 controlling the distribution of EVPN routes, as there is no need to 2290 distribute EVPN routes to L3VPN-only PEs and no reason to distribute 2291 L3VPN/MVPN routes to EVPN-only PEs. 2293 Note that the RDs in the imported VPN-IP routes will not necessarily 2294 conform to the EVPN rules (as specified in [RFC7432]) for creating 2295 RDs. Therefore a MEG MUST NOT expect the RDs of the VPN-IP routes to 2296 be of any particular format other than what is required by the L3VPN/ 2297 MVPN specifications. 2299 The VPN-IP routes that a MEG exports to L3VPN are subnet routes and/ 2300 or host routes for the multicast sources that are part of the EVPN 2301 tenant domain. The exact set of routes that need to be exported is 2302 discussed in Section 6.1.2.2. 2304 Each IMET route originated by a MEG SHOULD carry a flag or Extended 2305 Community (to be determined) indicating that the originator of the 2306 IMET route is a MEG. However, PE1 will consider PE2 to be a MEG if 2307 PE1 imports at least one IMET route from PE2 that carries the flag or 2308 EC. 2310 All the MEGs of a given Tenant Domain attach to the SBD of that 2311 domain, and one of them is selected to be the SBD's Designated Router 2312 (DR) for the domain. The selection procedure is discussed in 2313 Section 6.1.2.4. 
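The rule above for recognizing a MEG from its IMET routes can be sketched informally in Python as follows. Since the flag or Extended Community is still to be determined, it is represented here by a plain boolean. The `ImetRoute` type and the `is_meg` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImetRoute:
    originator: str  # the PE that originated the route
    bd: str          # the BD to which the route belongs
    meg_flag: bool   # stand-in for the (to be determined) flag or EC


def is_meg(pe, imported_imet_routes):
    """A PE is treated as a MEG if at least one imported IMET route
    originated by that PE carries the MEG indication."""
    return any(r.originator == pe and r.meg_flag for r in imported_imet_routes)


routes = [
    ImetRoute("PE2", "BD-1", meg_flag=False),
    ImetRoute("PE2", "SBD", meg_flag=True),
    ImetRoute("PE3", "BD-1", meg_flag=False),
]
print(is_meg("PE2", routes))
print(is_meg("PE3", routes))
```

Note that a single flagged route is sufficient, so a MEG that omits the indication on some of its IMET routes is still recognized.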
2315 In this model of operation, MVPN procedures and EVPN procedures are 2316 largely independent. In particular, there is no assumption that MVPN 2317 and EVPN use the same kind of tunnels. Thus no special procedures 2318 are needed to handle the common scenarios where, e.g., EVPN uses 2319 VXLAN tunnels but MVPN uses MPLS P2MP tunnels, or where EVPN uses 2320 Ingress Replication but MVPN uses MPLS P2MP tunnels. 2322 Similarly, no special procedures are needed to prevent duplicate data 2323 delivery on ethernet segments that are multi-homed. 2325 The MEG does have some special procedures (described below) for 2326 interworking between EVPN and MVPN; these have to do with selection 2327 of the Upstream PE for a given multicast source, with the exporting 2328 of VPN-IP routes, and with the generation of MVPN C-multicast routes 2329 triggered by the installation of SMET routes. 2331 6.1.2.1. MVPN Sources with EVPN Receivers 2333 6.1.2.1.1. Identifying MVPN Sources 2335 Consider a multicast source S. It is possible that a MEG will import 2336 both an EVPN unicast route to S and a VPN-IP route (or an ordinary IP 2337 route), where the prefix length of each route is the same. In order 2338 to draw (S,G) multicast traffic for any group G, the MEG SHOULD use 2339 the EVPN route rather than the VPN-IP or IP route to determine the 2340 "Upstream PE" (see section 5 of [RFC6513]). 2342 Doing so ensures that when an EVPN tenant system desires to receive a 2343 multicast flow from another EVPN tenant system, the traffic from the 2344 source to that receiver stays within the EVPN domain. This prevents 2345 problems that might arise if there is a unicast route via L3VPN to S, 2346 but no multicast routers along the routed path. This also prevents 2347 problems that might arise as a result of the fact that the MEGs will 2348 import each other's VPN-IP routes. 2350 In Section 6.1.2.1.2, we describe the procedures to be used when 2351 the selected route to S is a VPN-IP route.
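The route preference of Section 6.1.2.1.1 can be illustrated with the following Python sketch. It assumes that the candidate routes are first compared by prefix length (longest match) and that the EVPN route wins only when prefix lengths are equal; the longest-match step is an assumption of this sketch, as the text above addresses only the equal-length case. The `Route` type, the `kind` tags, and the function name are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # e.g., "10.1.1.0/24"
    kind: str      # "evpn", "vpn-ip", or "ip" (hypothetical tags)
    next_hop: str  # the PE from which the route was learned


def select_upstream_route(source, routes):
    """Pick the route used to determine the Upstream PE for source S."""
    addr = ipaddress.ip_address(source)
    matching = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not matching:
        return None
    # Longest prefix first (assumption); on a tie, prefer the EVPN route.
    return max(
        matching,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, r.kind == "evpn"),
    )


routes = [
    Route("10.1.1.5/32", "vpn-ip", "MEG2"),
    Route("10.1.1.5/32", "evpn", "PE7"),
    Route("10.1.0.0/16", "vpn-ip", "L3VPN-PE1"),
]
print(select_upstream_route("10.1.1.5", routes))
print(select_upstream_route("10.1.9.9", routes))
```

If the selected route is of kind "evpn", the source is inside the Tenant Domain and ordinary OISM procedures apply. Otherwise the procedures of Section 6.1.2.1.2 are used.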
2353 6.1.2.1.2. Joining a Flow from an MVPN Source 2355 Suppose a tenant system R wants to receive (S,G) multicast traffic, 2356 where source S is not attached to any PE in the EVPN Tenant Domain, 2357 but is attached to an MVPN PE. 2359 o Suppose R is on a singly homed ethernet segment of BD-R, and that 2360 segment is attached to PE1, where PE1 is a MEG. PE1 learns via 2361 IGMP/MLD listening that R is interested in (S,G). PE1 determines 2362 from its VRF that there is no route to S within the Tenant Domain 2363 (i.e., no EVPN RT-2 route with S's IP address), but that there is 2364 a route to S via L3VPN (i.e., the VRF contains a subnet or host 2365 route to S that was received as a VPN-IP route). PE1 thus 2366 originates (if it hasn't already) an MVPN C-multicast Source Tree 2367 Join(S,G) route. The route is constructed according to normal 2368 MVPN procedures. 2370 The layer 2 multicast state is constructed as specified in 2371 Section 4.1. 2373 In the layer 3 multicast state, the IIF is the appropriate MVPN 2374 tunnel, and the IRB interface to BD-R is added to the OIF list. 2376 When PE1 receives (S,G) traffic from the appropriate MVPN tunnel, 2377 it performs IP processing of the traffic, and then sends the 2378 traffic down its IRB interface to BD-R. Following normal OISM 2379 procedures, the (S,G) traffic will be encapsulated for ethernet 2380 and sent out the AC to which R is attached. 2382 o Suppose R is on a singly homed ethernet segment of BD-R, and that 2383 segment is attached to PE1, where PE1 is an OISM PE but is NOT a 2384 MEG. PE1 learns via IGMP/MLD listening that R is interested in 2385 (S,G). PE1 follows normal OISM procedures, originating an SMET 2386 route in BD-R for (S,G). Since this route will carry the SBD-RT, 2387 it will be received by the MEG that is the DR for the Tenant 2388 Domain. The MEG DR can determine from PE1's IMET route whether 2389 PE1 is itself a MEG. 
If PE1 is not a MEG, the MEG DR will 2390 originate (if it hasn't already) an MVPN C-multicast Source Tree 2391 Join(S,G) route. This will cause the DR MEG to receive (S,G) 2392 traffic on an MVPN tunnel. 2394 The layer 2 multicast state is constructed as specified in 2395 Section 4.1. 2397 In the layer 3 multicast state, the IIF is the appropriate MVPN 2398 tunnel, and the IRB interface to the SBD is added to the OIF list. 2400 When the DR MEG receives (S,G) traffic on an MVPN tunnel, it 2401 performs IP processing of the traffic, and then sends the traffic 2402 down its IRB interface to the SBD. Following normal OISM 2403 procedures, the traffic will be encapsulated for ethernet and 2404 delivered to all PEs in the Tenant Domain that have interest in 2405 (S,G), including PE1. 2407 o If R is on a multi-homed ethernet segment of BD-R, one of the PEs 2408 attached to the segment will be its DF (following normal EVPN 2409 procedures), and the DF will know (via the procedures of 2410 [IGMP-Proxy]) that a tenant system reachable via one of its local 2411 ACs to BD-R is interested in (S,G) traffic. The DF is responsible 2412 for originating an SMET route for (S,G), following normal OISM 2413 procedures. If the DF is a MEG, it will originate the 2414 corresponding MVPN C-multicast Source Tree Join(S,G) route; if the 2415 DF is not a MEG, the MEG that is the DR will originate the 2416 C-multicast route when it receives the SMET route. 2418 o If R is attached to a non-OISM PE, it will receive the traffic via 2419 an IPMG, as specified in Section 5. 2421 If an EVPN-attached receiver is interested in (*,G) traffic, and if 2422 it is possible for there to be sources of (*,G) traffic that are 2423 attached only to L3VPN nodes, the MEGs will have to know the group- 2424 to-RP mappings. That will enable them to originate MVPN C-multicast 2425 Shared Tree Join(*,G) routes and to send them towards the RP.
(Since 2426 we are assuming in this section that there are no tenant multicast 2427 routers attached to the EVPN Tenant Domain, the RP must be attached 2428 via L3VPN. Alternatively, the MEG itself could be configured to 2429 function as an RP for group G.) 2431 The layer 2 multicast states are constructed as specified in 2432 Section 4.1. 2434 In the layer 3 (*,G) multicast state, the IIF is the appropriate MVPN 2435 tunnel. A MEG will add to the (*,G) OIF list its IRB interfaces for 2436 any BDs containing locally attached receivers. If there are 2437 receivers attached to other EVPN PEs, then whenever (S,G) traffic 2438 from an external source matches a (*,G) state, the MEG will create 2439 (S,G) state, with the MVPN tunnel as the IIF, the OIF list copied 2440 from the (*,G) state, and the SBD IRB interface added to the OIF 2441 list. (Please see the discussion in Section 6.1.1 regarding the 2442 inclusion of the SBD IRB interface in a (*,G) state; the SBD IRB 2443 interface is used in the OIF list only for traffic from external 2444 sources.) 2446 Normal MVPN procedures will then result in the MEG getting the (*,G) 2447 traffic from all the multicast sources for G that are attached via 2448 L3VPN. This traffic arrives on MVPN tunnels. When the MEG removes 2449 the traffic from these tunnels, it does the IP processing. If there 2450 are any receivers on a given BD, BD-R, that are attached via local 2451 EVPN ACs, the MEG sends the traffic down its BD-R IRB interface. If 2452 there are any other EVPN PEs that are interested in the (*,G) 2453 traffic, the MEG sends the traffic down the SBD IRB interface. 2454 Normal OISM procedures then distribute the traffic as needed to other 2455 EVPN-PEs. 2457 6.1.2.2. EVPN Sources with MVPN Receivers 2459 6.1.2.2.1. 
General procedures 2461 Consider the case where an EVPN tenant system S is sending IP 2462 multicast traffic to group G, and there is a receiver R for the (S,G) 2463 traffic that is attached to the L3VPN, but not attached to the EVPN 2464 Tenant Domain. (We assume in this document that the L3VPN/MVPN-only 2465 nodes will not have any special procedures to deal with the case 2466 where a source is inside an EVPN domain.) 2468 In this case, an L3VPN PE through which R can be reached has to send 2469 an MVPN C-multicast Join(S,G) route to one of the MEGs that is 2470 attached to the EVPN Tenant Domain. For this to happen, the L3VPN PE 2471 must have imported a VPN-IP route for S (either a host route or a 2472 subnet route) from a MEG. 2474 If a MEG determines that there is a multicast source transmitting on 2475 one of its ACs, the MEG SHOULD originate a VPN-IP host route for that 2476 source. This determination SHOULD be made by examining the IP 2477 multicast traffic that arrives on the ACs. (It MAY be made by 2478 provisioning.) A MEG SHOULD NOT export a VPN-IP host route for any 2479 IP address that is not known to be a multicast source (unless it has 2480 some other reason for exporting such a route). The VPN-IP host route 2481 for a given multicast source MUST be withdrawn if the source goes 2482 silent for a configurable period of time, or if it can be determined 2483 that the source is no longer reachable via a local AC. 2485 A MEG SHOULD also originate a VPN-IP subnet route for each of the BDs 2486 in the Tenant Domain. 2488 VPN-IP routes exported by a MEG must carry any attributes or extended 2489 communities that are required by L3VPN and MVPN. In particular, a 2490 VPN-IP route exported by a MEG must carry a VRF Route Import Extended 2491 Community corresponding to the IP-VRF from which it is imported, and 2492 a Source AS Extended Community.
2494 As a result, if S is attached to a MEG, the L3VPN nodes will direct 2495 their MVPN C-multicast Join routes to that MEG. Normal MVPN 2496 procedures will cause the traffic to be delivered to the L3VPN nodes. 2497 The layer 3 multicast state for (S,G) will have the MVPN tunnel on 2498 its OIF list. The IIF will be the IRB interface leading to the BD 2499 containing S. 2501 If S is not attached to a MEG, the L3VPN nodes will direct their 2502 C-multicast Join routes to whichever MEG appears to be on the best 2503 route to S's subnet. Upon receiving the C-multicast Join, that MEG 2504 will originate an EVPN SMET route for (S,G). As a result, the MEG 2505 will receive the (S,G) traffic at layer 2 via the OISM procedures. 2506 The (S,G) traffic will be sent up the appropriate IRB interface, and 2507 the layer 3 MVPN procedures will ensure that the traffic is delivered 2508 to the L3VPN nodes that have requested it. The layer 3 multicast 2509 state for (S,G) will have the MVPN tunnel in the OIF list, and the 2510 IIF will be one of the following: 2512 o If S belongs to a BD that is attached to the MEG, the IIF will be 2513 the IRB interface to that BD; 2515 o Otherwise the IIF will be the SBD IRB interface. 2517 Note that this works even if S is attached to a non-OISM PE, per the 2518 procedures of Section 5. 2520 6.1.2.2.2. Any-Source Multicast (ASM) Groups 2522 Suppose the MEG DR learns that one of the PEs in its Tenant Domain is 2523 interested in (*,G) traffic, where G is an Any-Source Multicast 2524 (ASM) group. If there are no tenant multicast routers, the MEG DR 2525 SHOULD perform the "First Hop Router" (FHR) functionality for group G 2526 on behalf of the Tenant Domain, as described in [RFC7761]. This 2527 means that the MEG DR must know the identity of the Rendezvous Point 2528 (RP) for each group, must send Register messages to the Rendezvous 2529 Point, etc.
2531 If the MEG DR is to be the FHR for the Tenant Domain, it must see all 2532 the multicast traffic that is sourced from within the domain and 2533 destined to an ASM group address. The MEG can ensure this by 2534 originating an SBD-SMET route for (*,*). As an optimization, an 2535 SBD-SMET route for (*, "any ASM group"), or even (*, "any ASM group 2536 that might have MVPN sources") can be defined. 2538 In some deployment scenarios, it may be preferred that the MEG that 2539 receives the (S,G) traffic over an AC be the one that provides the FHR 2540 functionality. In that case, the MEG DR would not need to provide the 2541 FHR functionality for (S,G) traffic whose source is attached to another MEG. 2543 Other deployment scenarios are also possible. For example, one might 2544 want to configure the MEGs to themselves be RPs. In this case, the 2545 RPs would have to exchange with each other information about which 2546 sources are active. The method of exchanging such information is 2547 outside the scope of this document. 2549 6.1.2.2.3. Source on Multihomed Segment 2551 Suppose S is attached to a segment that is all-active multi-homed to 2552 PE1 and PE2. If S is transmitting to two groups, say G1 and G2, it 2553 is possible that PE1 will receive the (S,G1) traffic from S while PE2 2554 receives the (S,G2) traffic from S. 2556 This creates an issue for MVPN/EVPN interworking, because there is no 2557 way to cause L3VPN/MVPN nodes to select PE1 as the ingress PE for 2558 (S,G1) traffic while selecting PE2 as the ingress PE for (S,G2) 2559 traffic. 2561 However, the following procedure ensures that the IP multicast 2562 traffic will still flow, even if the L3VPN/MVPN nodes pick the 2563 "wrong" EVPN-PE as the Upstream PE for (say) the (S,G1) traffic. 2565 Suppose S is on an ethernet segment, belonging to BD1, that is 2566 multi-homed to both PE1 and PE2, where PE1 is a MEG.
And suppose 2567 that IP multicast traffic from S to G travels over the AC that 2568 attaches the segment to PE2. If PE1 receives a C-multicast Source 2569 Tree Join (S,G) route, it MUST originate an SMET route for (S,G). 2570 Normal OISM procedures will then cause PE2 to send the (S,G) traffic 2571 to PE1 on an EVPN IP multicast tunnel. Normal OISM procedures will 2572 also cause PE1 to send the (S,G) traffic up its BD1 IRB interface. 2573 Normal MVPN procedures will then cause PE1 to forward the traffic on 2574 an MVPN tunnel. In this case, the routing is not optimal, but the 2575 traffic does flow correctly. 2577 6.1.2.3. Obtaining Optimal Routing of Traffic Between MVPN and EVPN 2579 The routing of IP multicast traffic between MVPN nodes and EVPN nodes 2580 will be optimal as long as there is a MEG along the optimal route. 2581 There are various deployment strategies that can be used to obtain 2582 optimal routing between MVPN and EVPN. 2584 In one such scenario, a Tenant Domain will have a small number of 2585 strategically placed MEGs. For example, a Data Center may have a 2586 small number of MEGs that connect it to a wide-area network. Then 2587 the optimal route into or out of the Data Center would be through the 2588 MEGs. 2590 In this scenario, the MEGs do not need to originate VPN-IP host 2591 routes for the multicast sources; they only need to originate VPN-IP 2592 subnet routes. The internal structure of the EVPN is completely 2593 hidden from the MVPN node. EVPN actions such as MAC Mobility and 2594 Mass Withdrawal ([RFC7432]) have zero impact on the MVPN control 2595 plane. 2597 While this deployment scenario provides the most optimal routing and 2598 has the least impact on the installed base of MVPN nodes, it does 2599 complicate network planning considerations. 2601 Another way of providing routing that is close to optimal is to turn 2602 each EVPN PE into a MEG. Then routing of MVPN-to-EVPN traffic is 2603 optimal.
However, routing of EVPN-to-MVPN traffic is not guaranteed 2604 to be optimal when a source host is on a multi-homed ethernet segment 2605 (as discussed in Section 6.1.2.2.3). 2607 The obvious disadvantage of this method is that it requires every 2608 EVPN PE to be a MEG. 2610 The procedures specified in this document allow an operator to add 2611 MEG functionality to any subset of his EVPN OISM PEs. This allows an 2612 operator to make whatever trade-offs he deems appropriate between 2613 optimal routing and MEG deployment. 2615 6.1.2.4. DR Selection 2617 Each MEG MUST be configured with a "MEG dummy ethernet segment" that 2618 has no ACs. 2620 EVPN supports a number of procedures that can be used to select the 2621 Designated Forwarder (DF) for a particular BD on a particular 2622 ethernet segment. Some of the possible procedures can be found, 2623 e.g., in [RFC7432], [EVPN-DF-NEW], and [EVPN-DF-WEIGHTED]. Whatever 2624 procedure is in use in a given deployment can be adapted to select a 2625 MEG DR for a given BD, as follows. 2627 Each MEG will originate an Ethernet Segment route for the MEG dummy 2628 ethernet segment. It MUST carry a Route Target derived from the 2629 corresponding Ethernet Segment Identifier. Thus only MEGs will 2630 import the route. 2632 Once the set of MEGs is known, it is also possible to determine the 2633 set of BDs supported by each MEG. The DF selection procedure can 2634 then be used to choose a MEG DR for the SBD. (The conditions under 2635 which the MEG DR changes depend upon the DF selection algorithm that 2636 is in use.) 2638 These procedures can also be used to select a DR for each BD. 2640 6.1.3. Interworking with 'Global Table Multicast' 2642 If multicast service to the outside sources and/or receivers is 2643 provided via the BGP-based "Global Table Multicast" (GTM) procedures 2644 of [RFC7716], the procedures of Section 6.1.2 can easily be adapted 2645 for EVPN/GTM interworking.
The way to adapt the MVPN procedures to 2646 GTM is explained in [RFC7716]. 2648 6.1.4. Interworking with PIM 2650 As we have been discussing, there may be receivers in an EVPN tenant 2651 domain that are interested in multicast flows whose sources are 2652 outside the EVPN Tenant Domain. Or there may be receivers outside an 2653 EVPN Tenant Domain that are interested in multicast flows whose 2654 sources are inside the Tenant Domain. 2656 If the outside sources and/or receivers are part of an MVPN, 2657 interworking procedures are covered in Section 6.1.2. 2659 There are also cases where an external source or receiver is 2660 attached via IP, and the layer 3 multicast routing is done via PIM. 2661 In this case, the interworking between the "PIM domain" and the EVPN 2662 tenant domain is done at L3 Gateways that perform "PIM/EVPN Gateway" 2663 (PEG) functionality. A PEG is very similar to a MEG, except that its 2664 layer 3 multicast routing is done via PIM rather than via BGP. 2666 If external sources or receivers for a given group are attached to a 2667 PEG via a layer 3 interface, that interface should be treated as a 2668 VRF interface attached to the Tenant Domain's L3VPN VRF. The layer 3 2669 multicast routing instance for that Tenant Domain will either run PIM 2670 on the VRF interface or will listen for IGMP/MLD messages on that 2671 interface. If the external receiver is attached elsewhere on an IP 2672 network, the PE has to enable PIM on its interfaces to the backbone 2673 network. In both cases, the PE needs to perform PEG functionality, 2674 and its IMET routes must carry a flag or EC identifying it as a PEG. 2676 For each BD on which there is a multicast source or receiver, one of 2677 the PEGs will become the PEG DR. DR selection can be done using the 2678 same procedures specified in Section 6.1.2.4.
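As an illustration of how a DF selection procedure might be adapted for DR selection, the sketch below uses the default modulo-based algorithm of [RFC7432], in which the candidate PEs are ordered by IP address and the PE at ordinal (V mod N) is chosen. Using a numeric BD identifier in place of the VLAN/Ethernet Tag value V is an assumption made here for illustration, as are the function and parameter names; a deployment using [EVPN-DF-NEW] or [EVPN-DF-WEIGHTED] would substitute that algorithm instead.

```python
import ipaddress


def select_dr(candidate_ips, bd_id):
    """Pick a DR among the gateways that originated the dummy-ES route.

    candidate_ips: originator IP addresses of the candidate MEGs/PEGs
                   (hypothetical input; learned from Ethernet Segment routes).
    bd_id:         non-negative integer identifying the BD (assumed to play
                   the role of the VLAN value in the RFC 7432 algorithm).
    Returns the selected IP address, or None if there are no candidates.
    """
    if not candidate_ips:
        return None
    # Order candidates by numeric IP address value, lowest first.
    ordered = sorted(candidate_ips, key=lambda ip: int(ipaddress.ip_address(ip)))
    # The candidate at ordinal (bd_id mod N) is selected.
    return ordered[bd_id % len(ordered)]
```

Because every gateway imports the same set of Ethernet Segment routes, each one computes the same ordered list and therefore arrives at the same DR without further signaling.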
2680 As long as there are no tenant multicast routers within the EVPN 2681 Tenant Domain, the PEGs do not need to run PIM on their IRB 2682 interfaces. 2684 6.1.4.1. Source Inside EVPN Domain 2686 If a PEG receives a PIM Join(S,G) from outside the EVPN tenant 2687 domain, it may find it necessary to create (S,G) state. The PE needs 2688 to determine whether S is within the Tenant Domain. If S is not 2689 within the EVPN Tenant Domain, the PE carries out normal layer 3 2690 multicast routing procedures. If S is within the EVPN tenant domain, 2691 the IIF of the (S,G) state is set as follows: 2693 o if S is on a BD that is attached to the PE, the IIF is the PE's 2694 IRB interface to that BD; 2696 o if S is not on a BD that is attached to the PE, the IIF is the 2697 PE's IRB interface to the SBD. 2699 When the PE creates such an (S,G) state, it MUST originate (if it 2700 hasn't already) an SBD-SMET route for (S,G). This will cause it to 2701 pull the (S,G) traffic via layer 2. When the traffic arrives over an 2702 EVPN tunnel, it gets sent up an IRB interface where the layer 3 2703 multicast routing determines the packet's disposition. The SBD-SMET 2704 route is withdrawn when the (S,G) state no longer exists (unless 2705 there is some other reason for not withdrawing it). 2707 If there are no tenant multicast routers within the EVPN tenant domain, 2708 there cannot be an RP in the Tenant Domain, so a PEG does not have to 2709 handle externally arriving PIM Join(*,G) messages. 2711 The PEG DR for a particular BD MUST act as a First Hop Router for 2712 that BD. It will examine all (S,G) traffic on the BD, and whenever G 2713 is an ASM group, the PEG DR will send Register messages to the RP for 2714 G. This means that the PEG DR will need to pull all the (S,G) 2715 traffic originating on a given BD, by originating an SMET (*,*) route 2716 for that BD.
If a PEG DR is the DR for all the BDs, it SHOULD 2717 originate just an SBD-SMET (*,*) route rather than an SMET (*,*) 2718 route for each BD. 2720 The rules for exporting IP routes to multicast sources are the same 2721 as those specified for MEGs in Section 6.1.2.2, except that the 2722 exported routes will be IP routes rather than VPN-IP routes, and it 2723 is not necessary to attach the VRF Route Import EC or the Source AS 2724 EC. 2726 When a source is on a multi-homed segment, the same issue discussed 2727 in Section 6.1.2.2.3 exists. Suppose S is on an ethernet segment, 2728 belonging to BD1, that is multi-homed to both PE1 and PE2, where PE1 2729 is a PEG. And suppose that IP multicast traffic from S to G travels 2730 over the AC that attaches the segment to PE2. If PE1 receives an 2731 external PIM Join(S,G) message, it MUST originate an SMET route for 2732 (S,G). Normal OISM procedures will cause PE2 to send the (S,G) 2733 traffic to PE1 on an EVPN IP multicast tunnel. Normal OISM 2734 procedures will also cause PE1 to send the (S,G) traffic up its BD1 2735 IRB interface. Normal PIM procedures will then cause PE1 to forward 2736 the traffic along a PIM tree. In this case, the routing is not 2737 optimal, but the traffic does flow correctly. 2739 6.1.4.2. Source Outside EVPN Domain 2741 By means of normal OISM procedures, a PEG learns whether there are 2742 receivers in the Tenant Domain that are interested in receiving (*,G) 2743 or (S,G) traffic. The PEG must determine whether S (or the RP for G) 2744 is outside the EVPN Tenant Domain. If so, and if there is a receiver 2745 on BD1 interested in receiving such traffic, the PEG DR for BD1 is 2746 responsible for originating a PIM Join(S,G) or Join(*,G) control 2747 message. 2749 An alternative would be to allow any PEG that is directly attached to 2750 a receiver to originate the PIM Joins.
Then the PEG DR would only 2751 have to originate PIM Joins on behalf of receivers that are not 2752 attached to a PEG. However, if this is done, it is necessary for the 2753 PEGs to run PIM on all their IRB interfaces, so that the PIM Assert 2754 procedures can be used to prevent duplicate delivery to a given BD. 2756 The IIF for the layer 3 (S,G) or (*,G) state is determined by normal 2757 PIM procedures. If a receiver is on BD1, and the PEG DR is attached 2758 to BD1, its IRB interface to BD1 is added to the OIF list. This 2759 ensures that any receivers locally attached to the PEG DR will 2760 receive the traffic. If there are receivers attached to other EVPN 2761 PEs, then whenever (S,G) traffic from an external source matches a 2762 (*,G) state, the PEG will create (S,G) state. The IIF will be set to 2763 whatever external interface the traffic is expected to arrive on 2764 (copied from the (*,G) state), the OIF list is copied from the (*,G) 2765 state, and the SBD IRB interface added to the OIF list. 2767 6.2. Interworking with PIM via an External PIM Router 2769 Section 6.1 describes how to use an OISM PE router as the gateway to 2770 a non-EVPN multicast domain, when the EVPN tenant domain is not being 2771 used as an intermediate transit network for multicast. An 2772 alternative approach is to have one or more external PIM routers 2773 (perhaps operated by a tenant) on one of the BDs of the tenant 2774 domain. We will refer to this BD as the "gateway BD". 2776 In this model: 2778 o The EVPN Tenant Domain is treated as a stub network attached to 2779 the external PIM routers. 2781 o The external PIM routers follow normal PIM procedures, and provide 2782 the FHR and LHR functionality for the entire Tenant Domain. 2784 o The OISM PEs do not run PIM. 2786 o If an OISM PE not attached to the gateway BD has interest in a 2787 given multicast flow, it conveys that interest to the OISM PEs 2788 that are attached to the gateway BD. 
This is done by following 2789 normal OISM procedures. As a result, IGMP/MLD messages will be seen 2790 by the external PIM routers on the gateway BD, and those external 2791 PIM routers will send PIM Join messages externally as required. 2792 Traffic of the given multicast flow will then be received by one 2793 of the external PIM routers, and that traffic will be forwarded by 2794 that router to the gateway BD. 2796 The normal OISM procedures will then cause the given multicast 2797 flow to be tunneled to any PEs of the EVPN Tenant Domain that have 2798 interest in the flow. PEs attached to the gateway BD will see the 2799 flow as originating from the gateway BD; other PEs will see the 2800 flow as originating from the SBD. 2802 o An OISM PE attached to a gateway BD MUST set its layer 2 multicast 2803 state to indicate that each AC to the gateway BD has interest in 2804 all multicast flows. It MUST also originate an SMET route for 2805 (*,*). The procedures for originating SMET routes are discussed 2806 in Section 2.5. 2808 o This will cause the OISM PEs attached to the gateway BD to receive 2809 all the IP multicast traffic that is sourced within the EVPN 2810 tenant domain, and to transmit that traffic to the gateway BD, 2811 where the external PIM routers will see it. (Of course, if the 2812 gateway BD has a multi-homed segment, only the PE that is the DF 2813 for that segment will transmit the multicast traffic to the 2814 segment.) 2816 7. Using an EVPN Tenant Domain as an Intermediate (Transit) Network for 2817 Multicast traffic 2819 In this section, we consider the scenario where one or more BDs of an 2820 EVPN Tenant Domain are being used to carry IP multicast traffic for 2821 which the source and at least one receiver are not part of the tenant 2822 domain. That is, one or more BDs of the Tenant Domain are 2823 intermediate "links" of a larger multicast tree created by PIM.
2825 We define a "tenant multicast router" as a multicast router, running 2826 PIM, that is: 2828 attached to one or more BDs of the Tenant Domain, but 2830 is not an EVPN PE router. 2832 In order for an EVPN Tenant Domain to be used as a transit network for IP 2833 multicast, one or more of its BDs must have tenant multicast routers, 2834 and an OISM PE that attaches to such a BD MUST be provisioned to 2835 enable PIM on its IRB interface to that BD. (This is true even if 2836 none of the tenant routers is on a segment attached to the PE.) 2837 Further, all the OISM PEs (even ones not attached to a BD with tenant 2838 multicast routers) MUST be provisioned to enable PIM on their SBD IRB 2839 interfaces. 2841 If PIM is enabled on a particular BD, the DR Selection procedure of 2842 Section 6.1.2.4 MUST be replaced by the normal PIM DR Election 2843 procedure of [RFC7761]. Note that this may result in one of the 2844 tenant routers being selected as the DR, rather than one of the OISM 2845 PE routers. In this case, First Hop Router and Last Hop Router 2846 functionality will not be performed by any of the EVPN PEs. 2848 A PIM control message on a particular BD is considered to be a 2849 link-local multicast message, and as such is sent transparently from 2850 PE to PE via the BUM tunnel for that BD. This is true whether the 2851 control message was received from an AC, or whether it was received 2852 from the local layer 3 routing instance via an IRB interface. 2854 A PIM Join/Prune message contains three fields that are relevant to 2855 the present discussion: 2857 o Upstream Neighbor 2859 o Group Address (G) 2861 o Source Address (S), omitted in the case of (*,G) Join/Prune 2862 messages. 2864 We will generally speak of a PIM Join as a "Join(S,G)" or a 2865 "Join(*,G)" message, and will use the term "Join(X,G)" to mean 2866 "either Join(S,G) or Join(*,G)".
In the context of a Join(X,G), we 2867 will use the term "X" to mean "S in the case of (S,G), or G's RP in 2868 the case of (*,G)". 2870 Suppose BD1 contains two tenant multicast routers, C1 and C2. 2871 Suppose C1 is on a segment attached to PE1, and C2 is on a segment 2872 attached to PE2. When C1 sends a PIM Join(X,G) to BD1, the Upstream 2873 Neighbor field might be set to either PE1, PE2, or C2. C1 chooses 2874 the Upstream Neighbor based on its unicast routing. Typically, it 2875 will choose as the Upstream Neighbor the PIM router on BD1 that is 2876 "closest" (according to the unicast routing) to X. Note that this 2877 will not necessarily be PE1. PE1 may not even be visible to the 2878 unicast routing algorithm used by the tenant routers. Even if it is, 2879 it is unlikely to be the PIM router that is closest to X. So we need 2880 to consider the following two cases: 2882 C1 sends a PIM Join(X,G) to BD1, with PE1 as the Upstream 2883 Neighbor. 2885 PE1's PIM routing instance will see the Join arrive on the BD1 IRB 2886 interface. If X is not within the Tenant Domain, PE1 handles the 2887 Join according to normal PIM procedures. This will generally 2888 result in PE1 selecting an Upstream Neighbor and sending it a 2889 Join(X,G). 2891 If X is within the Tenant Domain, but is attached to some other 2892 PE, PE1 sends (if it hasn't already) an SBD-SMET route for (X,G). 2893 The IIF of the layer 3 (X,G) state will be the SBD IRB interface, 2894 and the OIF list will include the IRB interface to BD1. 2896 The SBD-SMET route will pull the (X,G) traffic to PE1, and the 2897 (X,G) state will result in the (X,G) traffic being forwarded to 2898 C1. 2900 If X is within the Tenant Domain, but is attached to PE1 itself, 2901 no SBD-SMET route is sent. The IIF of the layer 3 (X,G) state 2902 will be the IRB interface to X's BD, and the OIF list will include 2903 the IRB interface to BD1. 
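The first case above (C1 names PE1 as the Upstream Neighbor) can be summarized in a small sketch. The function name, the interface naming scheme ("irb:BD1", "irb:SBD"), and the way X's attachment is passed in are all assumptions made for illustration; returning None stands in for "handle the Join according to normal PIM procedures".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JoinResult:
    iif: str              # incoming interface of the layer 3 (X,G) state
    oif: str              # interface added to the OIF list
    send_sbd_smet: bool   # whether an SBD-SMET route for (X,G) is originated


def handle_join_as_upstream(x_in_tenant_domain: bool,
                            x_local_bd: Optional[str],
                            arrival_bd: str) -> Optional[JoinResult]:
    """Decide the (X,G) state when this PE is the Join's Upstream Neighbor.

    x_in_tenant_domain: True if X (S, or G's RP) is within the Tenant Domain.
    x_local_bd:         BD via which X attaches to this PE, or None if X is
                        attached to some other PE (hypothetical lookup result).
    arrival_bd:         BD whose IRB interface the Join arrived on.
    """
    if not x_in_tenant_domain:
        return None  # normal PIM procedures apply
    oif = f"irb:{arrival_bd}"
    if x_local_bd is None:
        # X is attached to another PE: pull the traffic via an SBD-SMET route.
        return JoinResult(iif="irb:SBD", oif=oif, send_sbd_smet=True)
    # X is attached to this PE itself: no SBD-SMET route is needed.
    return JoinResult(iif=f"irb:{x_local_bd}", oif=oif, send_sbd_smet=False)
```

For example, a Join arriving on BD1 for a source attached to another PE yields an IIF of the SBD IRB interface and an OIF of the BD1 IRB interface, matching the text above.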
2905 C1 sends a PIM Join(X,G) to BD1, with either PE2 or C2 as the 2906 Upstream Neighbor. 2908 PE1's PIM routing instance will see the Join arrive on the BD1 IRB 2909 interface. If neither X nor Upstream Neighbor is within the 2910 tenant domain, PE1 handles the Join according to normal PIM 2911 procedures. This will NOT result in PE1 sending a Join(X,G). 2913 If either X or Upstream Neighbor is within the Tenant Domain, PE1 2914 sends (if it hasn't already) an SBD-SMET route for (X,G). The IIF 2915 of the layer 3 (X,G) state will be the SBD IRB interface, and the 2916 OIF list will include the IRB interface to BD1. 2918 The SBD-SMET route will pull the (X,G) traffic to PE1, and the 2919 (X,G) state will result in the (X,G) traffic being forwarded to 2920 C1. 2922 8. IANA Considerations 2924 To be supplied. 2926 9. Security Considerations 2928 This document uses protocols and procedures defined in the normative 2929 references, and inherits the security considerations of those 2930 references. 2932 This document adds flags or Extended Communities (ECs) to a number of 2933 BGP routes, in order to signal that particular nodes support the 2934 OISM, IPMG, MEG, and/or PEG functionalities that are defined in this 2935 document. Incorrect addition, removal, or modification of those 2936 flags and/or ECs will cause the procedures defined herein to 2937 malfunction, in which case loss or diversion of data traffic is 2938 possible. 2940 10. Acknowledgements 2942 The authors thank Vikram Nagarajan and Princy Elizabeth for their 2943 work on Section 6.2. The authors also benefited tremendously from 2944 discussions with Aldrin Isaac on EVPN multicast optimizations. 2946 11. References 2948 11.1. Normative References 2950 [EVPN-AR] Rabadan, J., Ed., "Optimized Ingress Replication solution 2951 for EVPN", internet-draft ietf-bess-evpn-optimized-ir- 2952 02.txt, August 2017. 2954 [EVPN-BUM] 2955 Zhang, Z., Lin, W., Rabadan, J., and K. 
Patel, "Updates on 2956 EVPN BUM Procedures", internet-draft ietf-bess-evpn-bum- 2957 procedure-updates-02.txt, September 2017. 2959 [EVPN-IRB] 2960 Sajassi, A., Salam, S., Thoria, S., Drake, J., Rabadan, 2961 J., and L. Yong, "Integrated Routing and Bridging in 2962 EVPN", internet-draft draft-ietf-bess-evpn-inter-subnet- 2963 forwarding-03.txt, February 2017. 2965 [EVPN_IP_Prefix] 2966 Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A. 2967 Sajassi, "IP Prefix Advertisement in EVPN", internet- 2968 draft ietf-bess-evpn-prefix-advertisement-09.txt, November 2969 2017. 2971 [IGMP-Proxy] 2972 Sajassi, A., Thoria, S., Patel, K., Yeung, D., Drake, J., 2973 and W. Lin, "IGMP and MLD Proxy for EVPN", internet-draft 2974 draft-ietf-bess-evpn-igmp-mld-proxy-00.txt, March 2017. 2976 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2977 Requirement Levels", BCP 14, RFC 2119, 2978 DOI 10.17487/RFC2119, March 1997, 2979 . 2981 [RFC2236] Fenner, W., "Internet Group Management Protocol, Version 2982 2", RFC 2236, DOI 10.17487/RFC2236, November 1997, 2983 . 2985 [RFC2710] Deering, S., Fenner, W., and B. Haberman, "Multicast 2986 Listener Discovery (MLD) for IPv6", RFC 2710, 2987 DOI 10.17487/RFC2710, October 1999, 2988 . 2990 [RFC6625] Rosen, E., Ed., Rekhter, Y., Ed., Hendrickx, W., and R. 2991 Qiu, "Wildcards in Multicast VPN Auto-Discovery Routes", 2992 RFC 6625, DOI 10.17487/RFC6625, May 2012, 2993 . 2995 [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., 2996 Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based 2997 Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2998 2015, . 3000 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 3001 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 3002 May 2017, . 3004 11.2. Informative References 3006 [EVPN-BIER] 3007 Zhang, Z., Przygienda, A., Sajassi, A., and J. Rabadan, 3008 "EVPN BUM Using BIER", internet-draft ietf-bier-evpn- 3009 00.txt, August 2017. 
3011 [EVPN-DF-NEW] 3012 Mohanty, S., Patel, K., Sajassi, A., Drake, J., and T. 3013 Przygienda, "A new Designated Forwarder Election for the 3014 EVPN", internet-draft ietf-bess-evpn-df-election-03.txt, 3015 October 2017. 3017 [EVPN-DF-WEIGHTED] 3018 Rabadan, J., Sathappan, S., Przygienda, T., Lin, W., 3019 Drake, J., Sajassi, A., and S. Mohanty, "Preference-based 3020 EVPN DF Election", internet-draft ietf-bess-evpn-pref-df- 3021 00.txt, June 2017. 3023 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 3024 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 3025 2006, . 3027 [RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/ 3028 BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February 3029 2012, . 3031 [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 3032 Encodings and Procedures for Multicast in MPLS/BGP IP 3033 VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, 3034 . 3036 [RFC7716] Zhang, J., Giuliano, L., Rosen, E., Ed., Subramanian, K., 3037 and D. Pacella, "Global Table Multicast with BGP Multicast 3038 VPN (BGP-MVPN) Procedures", RFC 7716, 3039 DOI 10.17487/RFC7716, December 2015, 3040 . 3042 [RFC7761] Fenner, B., Handley, M., Holbrook, H., Kouvelas, I., 3043 Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent 3044 Multicast - Sparse Mode (PIM-SM): Protocol Specification 3045 (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March 3046 2016, . 3048 Appendix A. Integrated Routing and Bridging 3050 This Appendix provides a short tutorial on the interaction of routing 3051 and bridging. First it shows the traditional model, where bridging 3052 and routing are performed in separate boxes. Then it shows the model 3053 specified in [EVPN-IRB], where a single box contains both routing and 3054 bridging functions. The latter model is presupposed in the body of 3055 this document. 3057 Figure 1 shows a "traditional" router that only does routing and has 3058 no L2 bridging capabilities. 
There are two LANs, LAN1 and LAN2. 3059 LAN1 is realized by switch1, LAN2 by switch2. The router has an 3060 interface, "lan1", that attaches to LAN1 (via switch1) and an 3061 interface "lan2" that attaches to LAN2 (via switch2). Each interface 3062 is configured, as an IP interface, with an IP address and a subnet 3063 mask. 3065 +-------+ +--------+ +-------+ 3067 | | lan1| |lan2 | | 3069 H1 -----+Switch1+--------+ Router1+--------+Switch2+------H3 3071 | | | | | | 3073 H2 -----| | | | | | 3075 +-------+ +--------+ +-------+ 3077 |_________________| |__________________| 3079 LAN1 LAN2 3081 Figure 1: Conventional Router with LAN Interfaces 3083 IP traffic (unicast or multicast) that remains within a single subnet 3084 never reaches the router. For instance, if H1 emits an ethernet 3085 frame with H2's MAC address in the ethernet destination address 3086 field, the frame will go from H1 to Switch1 to H2, without ever 3087 reaching the router. Since the frame is never seen by a router, the 3088 IP datagram within the frame remains entirely unchanged; e.g., its 3089 TTL is not decremented. The ethernet Source and Destination MAC 3090 addresses are not changed either. 3092 If H1 wants to send a unicast IP datagram to H3, which is on a 3093 different subnet, H1 has to be configured with the IP address of a 3094 "default router". Let's assume that H1 is configured with an IP 3095 address of Router1 as its default router address. H1 compares H3's 3096 IP address with its own IP address and IP subnet mask, and determines 3097 that H3 is on a different subnet. So the packet has to be routed. 3098 H1 uses ARP to map Router1's IP address to a MAC address on LAN1. H1 3099 then encapsulates the datagram in an ethernet frame, using Router1's 3100 MAC address as the destination MAC address, and sends the frame to 3101 Router1. 3103 Router1 then receives the frame over its lan1 interface.
Router1 3104 sees that the frame is addressed to it, so it removes the ethernet 3105 encapsulation and processes the IP datagram. The datagram is not 3106 addressed to Router1, so it must be forwarded further. Router1 does 3107 a lookup of the datagram's IP destination field, and determines that 3108 the destination (H3) can be reached via Router1's lan2 interface. 3109 Router1 now performs the IP processing of the datagram: it decrements 3110 the IP TTL, adjusts the IP header checksum (if present), may fragment 3111 the packet if necessary, etc. Then the datagram (or its fragments) 3112 are encapsulated in an ethernet header, with Router1's MAC address on 3113 LAN2 as the MAC Source Address, and H3's MAC address on LAN2 (which 3114 Router1 determines via ARP) as the MAC Destination Address. Finally 3115 the packet is sent out the lan2 interface. 3117 If H1 has an IP multicast datagram to send (i.e., an IP datagram 3118 whose Destination Address field is an IP Multicast Address), it 3119 encapsulates it in an ethernet frame whose MAC Destination Address is 3120 computed from the IP Destination Address. 3122 If H2 is a receiver for that multicast address, H2 will receive a 3123 copy of the frame, unchanged, from H1. The MAC Source Address in the 3124 ethernet encapsulation does not change, the IP TTL field does not get 3125 decremented, etc. 3127 If H3 is a receiver for that multicast address, the datagram must be 3128 routed to H3. In order for this to happen, Router1 must be 3129 configured as a multicast router, and it must accept traffic sent to 3130 ethernet multicast addresses. Router1 will receive H1's multicast 3131 frame on its lan1 interface, will remove the ethernet encapsulation, 3132 and will determine how to dispatch the IP datagram based on Router1's 3133 multicast forwarding states.
If Router1 knows that there is a 3134 receiver for the multicast datagram on LAN2, it makes a copy of the 3135 datagram, decrements the TTL (and performs any other necessary IP 3136 processing), then encapsulates the datagram in an ethernet frame for 3137 LAN2. The MAC Source Address for this frame will be Router1's MAC 3138 address on LAN2. The MAC Destination Address is computed from 3139 the IP Destination Address. Finally, the frame is sent out Router1's 3140 lan2 interface. 3142 Figure 2 shows an Integrated Router/Bridge that supports the routing/ 3143 bridging integration model of [EVPN-IRB]. 3145 +------------------------------------------+ 3147 | Integrated Router/Bridge | 3149 +-------+ +--------+ +-------+ 3151 | | IRB1| L3 |IRB2 | | 3153 H1 -----+ BD1 +--------+Routing +--------+ BD2 +------H3 3155 | | |Instance| | | 3157 H2 -----| | | | | | 3159 +-------+ +--------+ +-------+ 3161 |___________________| |____________________| 3163 LAN1 LAN2 3165 Figure 2: Integrated Router/Bridge 3167 In Figure 2, a single box consists of one or more "L3 Routing 3168 Instances". The routing/forwarding tables of a given routing 3169 instance are known as an IP-VRF ([EVPN-IRB]). In the context of EVPN, 3170 it is convenient to think of each routing instance as representing 3171 the routing of a particular tenant. Each IP-VRF is attached to one 3172 or more interfaces. 3174 When several EVPN PEs have a routing instance of the same tenant 3175 domain, those PEs advertise IP routes to the attached hosts. This is 3176 done as specified in [EVPN-IRB]. 3178 The integrated router/bridge shown in Figure 2 also attaches to a 3179 number of "Broadcast Domains" (BDs). Each BD performs the functions 3180 that are performed by the bridges in Figure 1. To the L3 routing 3181 instance, each BD appears to be a LAN. The interface attaching a 3182 particular BD to a particular IP-VRF is known as an "IRB Interface". 3183 From the perspective of L3 routing, each BD is a subnet.
Thus each 3184 IRB interface is configured with a MAC address (which is the router's 3185 MAC address on the corresponding LAN), as well as an IP address and 3186 subnet mask. 3188 The integrated router/bridge shown in Figure 2 may have multiple ACs 3189 to each BD. These ACs are visible only to the bridging function, not 3190 to the routing instance. To the L3 routing instance, there is just 3191 one "interface" to each BD. 3193 If the L3 routing instance represents the IP routing of a particular 3194 tenant, the BDs attached to that routing instance are BDs belonging 3195 to that same tenant. 3197 Bridging and routing now proceed exactly as in the case of Figure 1, 3198 except that BD1 replaces Switch1, BD2 replaces Switch2, interface 3199 IRB1 replaces interface lan1, and interface IRB2 replaces interface 3200 lan2. 3202 It is important to understand that an IRB interface connects an L3 3203 routing instance to a BD, NOT to a "MAC-VRF". (See [RFC7432] for the 3204 definition of "MAC-VRF".) A MAC-VRF may contain several BDs, as long 3205 as no MAC address appears in more than one BD. From the perspective 3206 of the L3 routing instance, each individual BD is an individual IP 3207 subnet; whether each BD has its own MAC-VRF or not is irrelevant to 3208 the L3 routing instance. 3210 Figure 3 illustrates IRB when a pair of BDs (subnets) are attached to 3211 two different PE routers. In this example, each BD has two segments, 3212 and one segment of each BD is attached to one PE router. 

           +------------------------------------------+
           |        Integrated Router/Bridges         |
           +-------+        +--------+        +-------+
           |       |    IRB1|        |IRB2    |       |
   H1 -----+  BD1  +--------+  PE1   +--------+  BD2  +------H3
           |(Seg-1)|        |(L3 Rtg)|        |(Seg-1)|
   H2 -----|       |        |        |        |       |
           +-------+        +--------+        +-------+
   |___________________|        |        |____________________|
           LAN1                 |                LAN2
                                |
                                |
           +-------+        +--------+        +-------+
           |       |    IRB1|        |IRB2    |       |
   H4 -----+  BD1  +--------+  PE2   +--------+  BD2  +------H5
           |(Seg-2)|        |(L3 Rtg)|        |(Seg-2)|
           |       |        |        |        |       |
           +-------+        +--------+        +-------+

       Figure 3: Integrated Router/Bridges with Distributed Subnet

   If H1 needs to send an IP packet to H4, it determines from its IP
   address and subnet mask that H4 is on the same subnet as H1.
   Although H1 and H4 are not attached to the same PE router, EVPN
   provides ethernet communication among all hosts that are on the same
   BD. H1 thus uses ARP to find H4's MAC address, and sends an ethernet
   frame with H4's MAC address in the Destination MAC address field.
   The frame is received at PE1, but since the Destination MAC address
   is not PE1's MAC address, PE1 assumes that the frame is to remain on
   BD1. Therefore the packet inside the frame is NOT decapsulated, and
   is NOT sent up the IRB interface to PE1's routing instance. Rather,
   standard EVPN intra-subnet procedures (as detailed in [RFC7432]) are
   used to deliver the frame to PE2, which then sends it to H4.

   If H1 needs to send an IP packet to H5, it determines from its IP
   address and subnet mask that H5 is NOT on the same subnet as H1.
   Assuming that H1 has been configured with the IP address of PE1 as
   its default router, H1 sends the packet in an ethernet frame with
   PE1's MAC address in its Destination MAC Address field. PE1 receives
   the frame, and sees that the frame is addressed to it.
   PE1 thus sends the frame up its IRB1 interface to the L3 routing
   instance. Appropriate IP processing is done (e.g., TTL decrement).
   The L3 routing instance determines that the "next hop" for H5 is
   PE2, so the packet is encapsulated (e.g., in MPLS) and sent across
   the backbone to PE2's routing instance. PE2 will see that the
   packet's destination, H5, is on BD2 segment-2, and will send the
   packet down its IRB2 interface. This causes the IP packet to be
   encapsulated in an ethernet frame with PE2's MAC address (on BD2) in
   the Source Address field and H5's MAC address in the Destination
   Address field.

   Note that if H1 has an IP packet to send to H3, the forwarding of the
   packet is handled entirely within PE1. PE1's routing instance sees
   the packet arrive on its IRB1 interface, and then transmits the
   packet by sending it down its IRB2 interface.

   Often, all the hosts in a particular Tenant Domain will be
   provisioned with the same value of the default router IP address.
   This IP address can be assigned, as an "anycast address", to all the
   EVPN PEs attached to that Tenant Domain. Thus although all hosts are
   provisioned with the same "default router address", the actual
   default router for a given host will be one of the PEs that is
   attached to the same ethernet segment as the host. This provisioning
   method ensures that IP packets from a given host are handled by the
   closest EVPN PE that supports IRB.

   In the topology of Figure 3, one could imagine that H1 is configured
   with a default router address that belongs to PE2 but not to PE1.
   Inter-subnet routing would still work, but IP packets from H1 to H3
   would then follow the non-optimal path H1-->PE1-->PE2-->PE1-->H3.
   Sending traffic on this sort of path, where it leaves a router and
   then comes back to the same router, is sometimes known as
   "hairpinning".
   Similarly, if PE2 supports IRB but PE1 does not, the same non-optimal
   path from H1 to H3 would have to be followed. To avoid hairpinning,
   each EVPN PE needs to support IRB.

   It is worth pointing out the way IRB interfaces interact with
   multicast traffic. Referring again to Figure 3, suppose PE1 and PE2
   are functioning as IP multicast routers. Suppose also that H3
   transmits a multicast packet, and both H1 and H4 are interested in
   receiving that packet. PE1 will receive the packet from H3 via its
   IRB2 interface. The ethernet encapsulation from BD2 is removed, the
   IP header processing is done, and the packet is then reencapsulated
   for BD1, with PE1's MAC address in the MAC Source Address field.
   Then the packet is sent down the IRB1 interface. Layer 2 procedures
   (as defined in [RFC7432]) would then be used to deliver a copy of the
   packet locally to H1, and remotely to H4.

   Please be aware that this document modifies the semantics, described
   in the previous paragraph, of sending/receiving multicast traffic on
   an IRB interface. This is explained in Section 1.5.1 and subsequent
   sections.

Authors' Addresses

   Wen Lin
   Juniper Networks, Inc.

   EMail: wlin@juniper.net


   Zhaohui Zhang
   Juniper Networks, Inc.

   EMail: zzhang@juniper.net


   John Drake
   Juniper Networks, Inc.

   EMail: jdrake@juniper.net


   Eric C. Rosen (editor)
   Juniper Networks, Inc.

   EMail: erosen@juniper.net


   Jorge Rabadan
   Nokia

   EMail: jorge.rabadan@nokia.com


   Ali Sajassi
   Cisco Systems

   EMail: sajassi@cisco.com