BESS                                                            Z. Zhang
Internet-Draft                                                    W. Lin
Updates: 7432 (if approved)                             Juniper Networks
Intended status: Standards Track                              J. Rabadan
Expires: April 10, 2022                                            Nokia
                                                                K. Patel
                                                                  Arrcus
                                                              A. Sajassi
                                                           Cisco Systems
                                                         October 7, 2021


                     Updates on EVPN BUM Procedures
             draft-ietf-bess-evpn-bum-procedure-updates-11

Abstract

   This document specifies procedure updates for broadcast, unknown
   unicast, and multicast (BUM) traffic in Ethernet VPNs (EVPN),
   including selective multicast, and provider tunnel segmentation.
   This document updates RFC 7432.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 10, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
     2.1.  Tunnel Segmentation
       2.1.1.  Reasons for Tunnel Segmentation
   3.  Additional Route Types of EVPN NLRI
     3.1.  Per-Region I-PMSI A-D route
     3.2.  S-PMSI A-D route
     3.3.  Leaf A-D route
   4.  Selective Multicast
   5.  Inter-AS Segmentation
     5.1.  Changes to Section 7.2.2 of [RFC7117]
     5.2.  I-PMSI Leaf Tracking
     5.3.  Backward Compatibility
       5.3.1.  Designated ASBR Election
   6.  Inter-Region Segmentation
     6.1.  Area/AS vs. Region
     6.2.  Per-region Aggregation
     6.3.  Use of S-NH-EC
     6.4.  Ingress PE's I-PMSI Leaf Tracking
   7.  Multi-homing Support
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Contributors
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Terminology

   The reader is expected to be familiar with EVPN and MVPN concepts
   and terminology. For convenience, the following terms are briefly
   explained.

   o  PMSI [RFC6513]: P-Multicast Service Interface - a conceptual
      interface for a PE to send customer multicast traffic to all or
      some PEs in the same VPN.

   o  I-PMSI: Inclusive PMSI - to all PEs in the same VPN.

   o  S-PMSI: Selective PMSI - to some of the PEs in the same VPN.

   o  Leaf Auto-Discovery (A-D) routes [RFC6513]: Used for explicit
      leaf tracking. They are triggered by S-PMSI A-D routes and
      targeted at the triggering route's originator.

   o  IMET A-D route [RFC7432]: Inclusive Multicast Ethernet Tag A-D
      route. The EVPN equivalent of the MVPN Intra-AS I-PMSI A-D route.

   o  SMET A-D route [I-D.ietf-bess-evpn-igmp-mld-proxy]: Selective
      Multicast Ethernet Tag A-D route. The EVPN equivalent of the MVPN
      Leaf A-D route, but unsolicited and untargeted.

2.  Introduction

   [RFC7117] specifies procedures for Multicast in Virtual Private LAN
   Service (VPLS Multicast) using both inclusive tunnels and selective
   tunnels with or without inter-AS segmentation, similar to the
   Multicast VPN (MVPN) procedures specified in [RFC6513] and
   [RFC6514]. [RFC7524] specifies inter-area tunnel segmentation
   procedures for both VPLS Multicast and MVPN.

   [RFC7432] specifies BGP MPLS-Based Ethernet VPN (EVPN) procedures,
   including those handling broadcast, unknown unicast, and multicast
   (BUM) traffic. Many details are deferred to [RFC7117], yet feature
   gaps remain, such as selective tunnels and tunnel segmentation
   (Section 2.1).

   This document fills those gaps by covering the use of selective
   and segmented tunnels in EVPN.
   It follows the same editorial choice as [RFC7432]: only
   changes/additions to the relevant [RFC7117] and [RFC7524]
   procedures are specified, instead of repeating the text.
   Note that these changes/additions are to be applied to EVPN only, and
   are not updates to [RFC7117] or [RFC7524].

   MVPN uses the terms I-PMSI and S-PMSI A-D routes. For consistency
   and convenience, this document uses the same I/S-PMSI terms for VPLS
   and EVPN. In particular, EVPN's Inclusive Multicast Ethernet Tag
   route and VPLS's VPLS A-D route carrying a PTA (PMSI Tunnel
   Attribute) for BUM traffic are both referred to as I-PMSI A-D
   routes. Depending on the context, the terms may be used
   interchangeably.

2.1.  Tunnel Segmentation

   MVPN provider tunnels and EVPN/VPLS BUM provider tunnels, which are
   referred to as MVPN/EVPN/VPLS provider tunnels in this document for
   simplicity, can be segmented for technical or administrative reasons,
   which are summarized in Section 2.1.1 of this document. [RFC6513]
   and [RFC6514] cover MVPN inter-AS segmentation, [RFC7117] covers VPLS
   multicast inter-AS segmentation, and [RFC7524] (Seamless MPLS
   Multicast) covers inter-area segmentation for both MVPN and VPLS.

   With tunnel segmentation, different segments of an end-to-end tunnel
   may have different encapsulation overhead. However, the largest
   overhead of the tunnel caused by an encapsulation method on a
   particular segment is not different from the case of a non-segmented
   tunnel with that encapsulation method. This is similar to the case
   of a network with different link types.

   There is a difference between MVPN and VPLS multicast inter-AS
   segmentation. For simplicity, EVPN will use the same procedures as
   in MVPN. All ASBRs can re-advertise their choice of the best route.
   Each can become the root of its intra-AS segment and inject traffic
   it receives from its upstream, while each downstream PE/ASBR will
   only pick one of the upstream ASBRs as its upstream. This is also
   the behavior for VPLS in the case of inter-area segmentation.

   For inter-area segmentation, [RFC7524] requires the use of the
   Inter-area P2MP Segmented Next-Hop Extended Community (S-NH-EC), and
   the setting of the "Leaf Information Required" L flag in the PTA in
   certain situations. Either of these could be optional in the case of
   EVPN. Removing these requirements would make the segmentation
   procedures transparent to ingress and egress PEs.

   [RFC7524] assumes that segmentation happens at area borders.
   However, it could be at "regional" borders, where a region could be a
   sub-area, or even an entire AS plus its external links (Section 6).
   That would allow for more flexible deployment scenarios (e.g., for
   single-area provider networks). This document extends the inter-area
   segmentation to inter-region segmentation for EVPN.

2.1.1.  Reasons for Tunnel Segmentation

   Tunnel segmentation may be required and/or desired because of
   administrative and/or technical reasons.

   For example, an MVPN/VPLS/EVPN network may span multiple providers
   and Inter-AS Option-B has to be used, in which the end-to-end
   provider tunnels have to be segmented at and stitched by the ASBRs.
   Different providers may use different tunnel technologies (e.g.,
   provider A uses Ingress Replication [RFC7988], provider B uses
   RSVP-TE P2MP [RFC4875], while provider C uses mLDP [RFC6388]). Even
   if they use the same tunnel technology, such as RSVP-TE P2MP, it may
   be impractical to set up the tunnels across provider boundaries.

   The same situations may apply between the ASes and/or areas of a
   single provider. For example, the backbone area may use RSVP-TE P2MP
   tunnels while non-backbone areas may use mLDP tunnels.

   Segmentation can also be used to divide an AS/area into smaller
   regions, so that control plane state and/or forwarding plane
   state/burden can be limited to that of individual regions. For
   example, instead of Ingress Replicating to 100 PEs in the entire AS,
   with inter-area segmentation [RFC7524] a PE only needs to replicate
   to local PEs and ABRs. The ABRs will further replicate to their
   downstream PEs and ABRs. This not only reduces the forwarding plane
   burden, but also reduces the leaf tracking burden in the control
   plane.

   Smaller regions also have the benefit that, in case of tunnel
   aggregation, it is easier to find congruence among the segments of
   different constituent (service) tunnels and the resulting aggregation
   (base) tunnel in a region. This leads to better bandwidth
   efficiency, because the more congruent they are, the fewer leaves of
   the base tunnel need to discard traffic when a service tunnel's
   segment does not need to receive the traffic (yet it is receiving the
   traffic due to aggregation).

   Another advantage of smaller regions is smaller BIER sub-domains.
   In BIER [RFC8279], a newer multicast architecture, packets carry a
   BitString in which the bits correspond to the edge routers that need
   to receive traffic. Smaller sub-domains mean smaller BitStrings can
   be used without having to send multiple copies of the same packet.

3.  Additional Route Types of EVPN NLRI

   [RFC7432] defines the format of the EVPN NLRI as follows:

               +-----------------------------------+
               | Route Type (1 octet)              |
               +-----------------------------------+
               | Length (1 octet)                  |
               +-----------------------------------+
               | Route Type specific (variable)    |
               +-----------------------------------+

   So far eight types have been defined in [RFC7432],
   [I-D.ietf-bess-evpn-prefix-advertisement], and
   [I-D.ietf-bess-evpn-igmp-mld-proxy]:

      + 1 - Ethernet Auto-Discovery (A-D) route
      + 2 - MAC/IP Advertisement route
      + 3 - Inclusive Multicast Ethernet Tag route
      + 4 - Ethernet Segment route
      + 5 - IP Prefix Route
      + 6 - Selective Multicast Ethernet Tag Route
      + 7 - Multicast Join Synch Route
      + 8 - Multicast Leave Synch Route

   This document defines three additional route types:

      + 9 - Per-Region I-PMSI A-D route
      + 10 - S-PMSI A-D route
      + 11 - Leaf A-D route

   The "Route Type specific" field of the type 9 and type 10 EVPN NLRIs
   starts with a type 1 RD, whose Administrator sub-field MUST match
   that of the RD in all non-Leaf A-D (Section 3.3) EVPN routes from the
   same advertising router for a given EVI.

3.1.  Per-Region I-PMSI A-D route

   The Per-Region I-PMSI A-D route has the following format. Its usage
   is discussed in Section 6.2.

               +-----------------------------------+
               | RD (8 octets)                     |
               +-----------------------------------+
               | Ethernet Tag ID (4 octets)        |
               +-----------------------------------+
               | Region ID (8 octets)              |
               +-----------------------------------+

   The Region ID identifies the region and is encoded in the same way
   as an Extended Community, as detailed in Section 6.2.

3.2.  S-PMSI A-D route

   The S-PMSI A-D route has the following format:

               +-----------------------------------+
               | RD (8 octets)                     |
               +-----------------------------------+
               | Ethernet Tag ID (4 octets)        |
               +-----------------------------------+
               | Multicast Source Length (1 octet) |
               +-----------------------------------+
               | Multicast Source (Variable)       |
               +-----------------------------------+
               | Multicast Group Length (1 octet)  |
               +-----------------------------------+
               | Multicast Group (Variable)        |
               +-----------------------------------+
               |Originator's Addr Length (1 octet) |
               +-----------------------------------+
               |Originator's Addr (4 or 16 octets) |
               +-----------------------------------+

   Other than the addition of the Ethernet Tag ID and Originator's Addr
   Length fields, it is identical to the S-PMSI A-D route as defined in
   [RFC7117]. The procedures in [RFC7117] also apply (including
   wildcard functionality), except that the granularity level is per
   Ethernet Tag.

3.3.  Leaf A-D route

   The Route Type specific field of a Leaf A-D route consists of the
   following:

               +-----------------------------------+
               | Route Key (variable)              |
               +-----------------------------------+
               |Originator's Addr Length (1 octet) |
               +-----------------------------------+
               |Originator's Addr (4 or 16 octets) |
               +-----------------------------------+

   A Leaf A-D route is originated in response to a PMSI route, which
   could be an Inclusive Multicast Ethernet Tag route, a per-region
   I-PMSI A-D route, an S-PMSI A-D route, or another type of route
   defined in the future that triggers Leaf A-D routes. The Route Key
   is the "Route Type Specific" field of the route for which this Leaf
   A-D route is generated.

   The general procedures of Leaf A-D route are first specified in
   [RFC6514] for MVPN. The principles apply to VPLS and EVPN as well.
   [RFC7117] has details for VPLS Multicast, and this document points
   out some specifics for EVPN, e.g., in Section 5.

4.  Selective Multicast

   [I-D.ietf-bess-evpn-igmp-mld-proxy] specifies procedures for EVPN
   selective forwarding of IP multicast using SMET routes. It assumes
   selective forwarding is always used with IR for all flows (though the
   same signaling can also be used for an ingress PE to find out the set
   of egress PEs for selective forwarding with BIER). An NVE proxies
   the IGMP/MLD state that it learns on its ACs to (C-S,C-G) or
   (C-*,C-G) SMET routes and advertises them to other NVEs, and a
   receiving NVE converts the SMET routes back to IGMP/MLD messages and
   sends them out of its ACs. The receiving NVE also uses the SMET
   routes to identify which NVEs need to receive traffic for a
   particular (C-S,C-G) or (C-*,C-G) to achieve selective forwarding
   using IR or BIER.

   With the above procedures, selective forwarding is done for all flows
   and the SMET routes are advertised for all flows. An operator may
   not want to track all of that (C-S,C-G) or (C-*,C-G) state on the
   NVEs, and the multicast traffic pattern may allow inclusive
   forwarding for most flows, with selective forwarding needed only for
   a few high-rate flows. For that, or for tunnel types other than
   IR/BIER, the S-PMSI/Leaf A-D procedures defined for Selective
   Multicast for VPLS in [RFC7117] are used. Except that different
   route types and formats are specified with the EVPN SAFI for S-PMSI
   A-D and Leaf A-D routes (Section 3), all procedures in [RFC7117]
   with respect to Selective Multicast apply to EVPN as well, including
   wildcard procedures. In a nutshell, a source NVE advertises S-PMSI
   A-D routes to announce the tunnels used for certain flows, and
   receiving NVEs either join the announced PIM/mLDP tunnel or respond
   with Leaf A-D routes if the Leaf Information Required flag is set in
   the S-PMSI A-D route's PTA (so that the source NVE can include them
   as tunnel leaves).

   An optimization to the [RFC7117] procedures may be applied. Even if
   a source NVE sets the L flag to request Leaf A-D routes, an egress
   NVE MAY omit the Leaf A-D route if it has already advertised a
   corresponding SMET route, and the source NVE MUST use that in lieu of
   the Leaf A-D route.

   The optional optimizations specified for MVPN in [RFC8534] are also
   applicable to EVPN when the S-PMSI/Leaf A-D route procedures are
   used for EVPN selective multicast forwarding.

5.  Inter-AS Segmentation

5.1.  Changes to Section 7.2.2 of [RFC7117]

   The first paragraph of Section 7.2.2.2 of [RFC7117] says:

   "... The best route procedures ensure that if multiple
   ASBRs, in an AS, receive the same Inter-AS A-D route from their EBGP
   neighbors, only one of these ASBRs propagates this route in Internal
   BGP (IBGP). This ASBR becomes the root of the intra-AS segment of
   the inter-AS tree and ensures that this is the only ASBR that accepts
   traffic into this AS from the inter-AS tree."

   The above VPLS behavior requires complicated VPLS-specific
   procedures for the ASBRs to reach agreement. For EVPN, a different
   approach is used and the above quoted text is not applicable to
   EVPN.

   With the different approach for EVPN, each ASBR re-advertises its
   received Inter-AS A-D route to its IBGP peers and becomes the root of
   an intra-AS segment of the inter-AS tree. The intra-AS segment
   rooted at one ASBR is disjoint from any intra-AS segment rooted
   at another ASBR. This is the same as the procedures for S-PMSI in
   [RFC7117] itself.
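   As an informal illustration of why the intra-AS segments rooted at
   different ASBRs are disjoint, the Python sketch below lets each
   downstream PE pick exactly one of the re-advertising ASBRs as its
   upstream. The function names, the input layout, and the selection
   rule (lowest IGP cost, then lowest address) are assumptions that
   merely stand in for the normal BGP best-route selection; none of
   them is defined by this document.

```python
from collections import defaultdict
from ipaddress import ip_address


def pick_upstream(igp_cost_to_asbr):
    """Pick one upstream ASBR for a downstream PE.

    Assumption: lowest IGP cost wins, ties are broken by lowest address.
    This only stands in for the BGP best-route selection.
    """
    return min(
        igp_cost_to_asbr,
        key=lambda asbr: (igp_cost_to_asbr[asbr], ip_address(asbr)),
    )


def intra_as_segments(costs_per_pe):
    """Group downstream PEs by the single ASBR each one selected."""
    segments = defaultdict(set)
    for pe, igp_cost_to_asbr in costs_per_pe.items():
        segments[pick_upstream(igp_cost_to_asbr)].add(pe)
    return dict(segments)


# Hypothetical AS with two ASBRs that both re-advertised the same route.
costs = {
    "PE-a": {"192.0.2.1": 10, "192.0.2.2": 20},
    "PE-b": {"192.0.2.1": 30, "192.0.2.2": 5},
    "PE-c": {"192.0.2.1": 7, "192.0.2.2": 7},  # tie, lowest address wins
}
segments = intra_as_segments(costs)
```

   Because every PE appears under exactly one ASBR, no PE receives the
   same traffic from two segments, whichever concrete selection rule is
   in use.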

   The following bullet in Section 7.2.2.2 of [RFC7117] does not apply
   to EVPN.

   +  If the ASBR uses ingress replication to instantiate the intra-AS
      segment of the inter-AS tunnel, the re-advertised route MUST NOT
      carry the PMSI Tunnel attribute.

   The following bullet in Section 7.2.2.2 of [RFC7117]:

   +  If the ASBR uses a P-multicast tree to instantiate the intra-AS
      segment of the inter-AS tunnel, the PMSI Tunnel attribute MUST
      contain the identity of the tree that is used to instantiate the
      segment (note that the ASBR could create the identity of the tree
      prior to the actual instantiation of the segment). If, in order
      to instantiate the segment, the ASBR needs to know the leaves of
      the tree, then the ASBR obtains this information from the A-D
      routes received from other PEs/ASBRs in the ASBR's own AS.

   is changed to the following when applied to EVPN:

   "The PMSI Tunnel attribute MUST specify the tunnel for the segment.
   If and only if, in order to establish the tunnel, the ASBR needs to
   know the leaves of the tree, then the ASBR MUST set the L flag to
   1 in the PTA to trigger Leaf A-D routes from egress PEs and
   downstream ASBRs. It MUST be (auto-)configured with an import RT,
   which controls acceptance of leaf A-D routes by the ASBR."

   Accordingly, the following paragraph in Section 7.2.2.4 of [RFC7117]:

   "If the received Inter-AS A-D route carries the PMSI Tunnel attribute
   with the Tunnel Identifier set to RSVP-TE P2MP LSP, then the ASBR
   that originated the route MUST establish an RSVP-TE P2MP LSP with the
   local PE/ASBR as a leaf. This LSP MAY have been established before
   the local PE/ASBR receives the route, or it MAY be established after
   the local PE receives the route."

   is changed to the following when applied to EVPN:

   "If the received Inter-AS A-D route has the L flag set in its PTA,
   then a receiving PE MUST originate a corresponding Leaf A-D route,
   while a receiving ASBR MUST originate a corresponding Leaf A-D route
   if and only if it received and imported one or more corresponding
   Leaf A-D routes from its downstream IBGP or EBGP peers, or it has
   non-null downstream forwarding state for the PIM/mLDP tunnel that
   instantiates its downstream intra-AS segment. The targeted ASBR for
   the Leaf A-D route, which (re-)advertised the Inter-AS A-D route,
   MUST establish a tunnel to the leaves discovered by the Leaf A-D
   routes."

5.2.  I-PMSI Leaf Tracking

   An ingress PE does not set the L flag in its Inclusive Multicast
   Ethernet Tag (IMET) A-D route's PTA, even with Ingress Replication or
   RSVP-TE P2MP tunnels. It does not rely on the Leaf A-D routes to
   discover leaves in its AS, and Section 11.2 of [RFC7432] explicitly
   states that the L flag must be set to zero.

   An implementation of [RFC7432] might have used the Originating
   Router's IP Address field of the IMET A-D routes to determine the
   leaves, or might have used the Next Hop field instead. Within the
   same AS, both will lead to the same result.

   With segmentation, an ingress PE MUST determine the leaves in its AS
   from the BGP next hops in all its received IMET A-D routes, so it
   does not have to set the L flag to request Leaf A-D routes. PEs
   within the same AS will all have different next hops in their IMET
   A-D routes (hence will all be considered as leaves), and PEs from
   other ASes will have the next hop in their IMET A-D routes set to
   addresses of ASBRs in this local AS, hence only those ASBRs will be
   considered as leaves (as proxies for those PEs in other ASes).
   Note that in case of Ingress Replication, when an ASBR re-advertises
   IMET A-D routes to IBGP peers, it MUST advertise the same label for
   all those routes for the same Ethernet Tag ID and the same EVI. When
   an ingress PE builds its flooding list, multiple routes might have
   the same (nexthop, label) tuple and they MUST only be added as a
   single branch in the flooding list.

5.3.  Backward Compatibility

   The above procedures assume that all PEs are upgraded to support the
   segmentation procedures:

   o  An ingress PE uses the Next Hop instead of the Originating
      Router's IP Address to determine leaves for the I-PMSI tunnel.

   o  An egress PE sends Leaf A-D routes in response to I-PMSI routes,
      if the PTA has the L flag set (by the re-advertising ASBRs).

   o  In case of Ingress Replication, when an ingress PE builds its
      flooding list, multiple I-PMSI routes may have the same (nexthop,
      label) tuple and only a single branch for those will be added in
      the flooding list.

   If a deployment has legacy PEs that do not support the above, then
   a legacy ingress PE would include all PEs (including those in remote
   ASes) as leaves of the inclusive tunnel and try to send traffic to
   them directly (no segmentation), which is either undesired or not
   possible; a legacy egress PE would not send Leaf A-D routes so the
   ASBRs would not know to send external traffic to them.

   To address this backward compatibility problem, the following
   procedure can be used (see Section 6.2 for per-PE/AS/region I-PMSI
   A-D routes):

   o  An upgraded PE indicates in its per-PE I-PMSI A-D route that it
      supports the new procedures. This is done by setting a flag bit
      in the EVPN Multicast Flags Extended Community.

   o  All per-PE I-PMSI A-D routes are restricted to the local AS and
      not propagated to external peers.

   o  The ASBRs in an AS originate per-region I-PMSI A-D routes and
      advertise them to their external peers to announce the tunnels
      used to carry traffic from the local AS to other ASes. Depending
      on the types of tunnels being used, the L flag in the PTA may be
      set, in which case the downstream ASBRs and upgraded PEs will send
      Leaf A-D routes to pull traffic from their upstream ASBRs. In a
      particular downstream AS, one of the ASBRs is elected, based on
      the per-region I-PMSI A-D routes for a particular source AS, to
      send traffic from that source AS to legacy PEs in the downstream
      AS. The traffic arrives at the elected ASBR on the tunnel
      announced in the best per-region I-PMSI A-D route for the source
      AS, which the ASBR has selected from all those that it received
      over EBGP or IBGP sessions. The election procedure is described
      in Section 5.3.1.

   o  In an ingress/upstream AS, if and only if an ASBR has active
      downstream receivers (PEs and ASBRs), which are learned either
      explicitly via Leaf A-D routes or implicitly via PIM join or mLDP
      label mapping, the ASBR originates a per-PE I-PMSI A-D route
      (i.e., a regular Inclusive Multicast Ethernet Tag route) into the
      local AS, and stitches incoming per-PE I-PMSI tunnels into its
      per-region I-PMSI tunnel. With this, it gets traffic from local
      PEs and sends it to other ASes via the tunnel announced in its
      per-region I-PMSI A-D route.

   Note that, even if there is no backward compatibility issue, the use
   of per-region I-PMSI has the benefit of keeping all per-PE I-PMSI A-D
   routes in their local ASes, greatly reducing the flooding of the
   routes and their corresponding Leaf A-D routes (when needed), and the
   number of inter-AS tunnels.

5.3.1.  Designated ASBR Election

   When an ASBR re-advertises a per-region I-PMSI A-D route into an AS
   in which a designated ASBR needs to be used to forward traffic to the
   legacy PEs in the AS, it MUST include a DF Election EC. The EC and
   its use are specified in [RFC8584]. The AC-DF bit in the DF Election
   EC MUST be cleared. If it is known that no legacy PEs exist in the
   AS, the ASBR SHOULD NOT include the EC and SHOULD remove the DF
   Election EC if one is carried in the per-region I-PMSI A-D routes
   that it receives. Note that this is done for each set of per-region
   I-PMSI A-D routes with the same NLRI.

   Based on the procedures in [RFC8584], an election algorithm is
   determined according to the DF Election ECs carried in the set of
   per-region I-PMSI routes of the same NLRI re-advertised into the AS.
   The algorithm is then applied to a candidate list, which is the set
   of ASBRs that re-advertised the per-region I-PMSI routes of the same
   NLRI carrying the DF Election EC.

6.  Inter-Region Segmentation

6.1.  Area/AS vs. Region

   [RFC7524] is for MVPN/VPLS inter-area segmentation and does not
   explicitly cover EVPN. However, if "area" is replaced by "region"
   and "ABR" is replaced by "RBR" (Regional Border Router), then
   everything still works and can be applied to EVPN as well.

   A region can be a sub-area, or can be an entire AS including its
   external links. Instead of automatic region definition based on IGP
   areas, a region would be defined as a BGP peer group. In fact, even
   with IGP area based region definition, a BGP peer group listing the
   PEs and ABRs in an area is still needed.
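   The candidate list construction of Section 5.3.1 can be sketched in
   Python as follows. The route representation (dicts with "nlri",
   "asbr", and "has_df_ec" keys) and both function names are
   hypothetical. The modulo rule shown is only the default DF election
   of [RFC7432], and using the Ethernet Tag ID as its input here is an
   assumption; the algorithm actually applied is whichever one the DF
   Election ECs select per [RFC8584].

```python
from ipaddress import ip_address


def candidate_asbrs(routes, nlri):
    """ASBRs that re-advertised per-region I-PMSI routes of this NLRI
    carrying the DF Election EC, ordered by numeric address."""
    asbrs = {
        r["asbr"] for r in routes if r["nlri"] == nlri and r["has_df_ec"]
    }
    return sorted(asbrs, key=ip_address)


def default_modulo_election(candidates, ethernet_tag):
    """Assumed default algorithm: ordinal = tag mod number of candidates."""
    if not candidates:
        return None
    return candidates[ethernet_tag % len(candidates)]


# Hypothetical routes seen inside one downstream AS.
routes = [
    {"nlri": "region-100/tag-5", "asbr": "192.0.2.2", "has_df_ec": True},
    {"nlri": "region-100/tag-5", "asbr": "192.0.2.1", "has_df_ec": True},
    {"nlri": "region-100/tag-5", "asbr": "192.0.2.3", "has_df_ec": False},
    {"nlri": "region-200/tag-5", "asbr": "192.0.2.4", "has_df_ec": True},
]
candidates = candidate_asbrs(routes, "region-100/tag-5")
designated = default_modulo_election(candidates, ethernet_tag=5)
```

   Keeping the algorithm as a separate function reflects that the
   candidate list is fixed by this document while the algorithm is
   negotiated through the DF Election EC.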

   Consider the following example diagram:

         ---------             ------             ---------
        /         \           /      \           /         \
       /           \         /        \         /           \
      | PE1 o    ASBR1 -- ASBR2      ASBR3 -- ASBR4    o PE2 |
       \           /         \        /         \           /
        \         /           \      /           \         /
         ---------             ------             ---------
          AS 100               AS 200               AS 300
      |------------|--------|----------|--------|------------|
         segment1   segment2  segment3  segment4   segment5

   The inter-AS segmentation procedures specified so far ([RFC6513],
   [RFC6514], [RFC7117], and Section 5 of this document) require all
   ASBRs to be involved, and Ingress Replication is used between two
   ASBRs in different ASes.

   In the above diagram, it is possible that ASBR1/4 do not support
   segmentation, and the provider tunnels in AS 100/300 can actually
   extend across the external link. In this case, the inter-region
   segmentation procedures can be used instead - a region is the entire
   (AS100 + ASBR1-ASBR2 link) or (AS300 + ASBR3-ASBR4 link). ASBR2/3
   would be the RBRs, and ASBR1/4 will just be transit core routers
   with respect to provider tunnels.

   As illustrated in the diagram below, ASBR2/3 will establish a
   multihop EBGP session with either an RR or directly with PEs in the
   neighboring AS. I/S-PMSI A-D routes from ingress PEs will not be
   processed by ASBR1/4. When ASBR2 re-advertises the routes into AS
   200, it changes the next hop to its own address and changes the PTA
   to specify the tunnel type/identification in its own AS. When ASBR3
   re-advertises I/S-PMSI A-D routes into the neighboring AS 300, it
   changes the next hop to its own address and changes the PTA to
   specify the tunnel type/identification in the neighboring region 3.
   Now the segment is rooted at ASBR3 and extends across the external
   link to PEs.

         ---------             ------             ---------
        /   RR....\.mh-ebgp   /      \    mh-ebgp/....RR   \
       /    :      \      `. /        \ .'      /      :    \
      | PE1 o    ASBR1 -- ASBR2      ASBR3 -- ASBR4    o PE2 |
       \           /         \        /         \           /
        \         /           \      /           \         /
         ---------             ------             ---------
          AS 100               AS 200               AS 300
      |---------------------|----------|---------------------|
             segment 1       segment 2        segment 3

6.2.  Per-region Aggregation

   Notice that every I/S-PMSI route from each PE will be propagated
   throughout all the ASes or regions. They may also trigger
   corresponding Leaf A-D routes depending on the types of tunnels used
   in each region. This may result in too many routes and corresponding
   tunnels. To address this concern, the I-PMSI routes from all PEs in
   an AS/region can be aggregated into a single I-PMSI route originated
   from the RBRs, and traffic from all those individual I-PMSI tunnels
   will be switched into the single I-PMSI tunnel. This is like the
   MVPN Inter-AS I-PMSI route originated by ASBRs.

   The MVPN Inter-AS I-PMSI A-D route is better described as a per-AS
   I-PMSI A-D route, in contrast to the (per-PE) Intra-AS I-PMSI
   A-D routes originated by each PE. This document calls it a
   per-region I-PMSI A-D route, because the aggregation may be applied
   at the regional level. The per-PE I-PMSI routes will not be
   propagated to other regions. If multiple RBRs are connected to a
   region, then each will advertise such a route, with the same route
   key (Section 3.1). Similar to the per-PE I-PMSI A-D routes, RBRs/PEs
   in a downstream region will each select a best one from all those
   re-advertised by the upstream RBRs, hence will only receive traffic
   injected by one of them.

   MVPN does not aggregate S-PMSI routes from all PEs in an AS like it
   does for I-PMSI routes, because the number of PEs that will
   advertise S-PMSI routes for the same (s,g) or (*,g) is small. This
   is also the case for EVPN, i.e., there are no per-region S-PMSI
   routes.
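   The Region ID encodings listed later in this section can be sketched
   in Python as follows. The helper names are hypothetical. The type
   octets used (0x00, 0x02, 0x01) are the standard values for
   Transitive Two-Octet AS-Specific, Four-Octet AS-Specific, and
   IPv4-Address-Specific ECs; the default sub-type for the area ID
   case is an arbitrary choice made here, since the text allows any
   sub-type.

```python
import struct
from ipaddress import IPv4Address

SUBTYPE_SOURCE_AS = 0x09  # "Source AS" sub-type named in this section


def region_id_from_as(asn):
    """Encode an AS number as an 8-octet, EC-style Region ID."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number out of range")
    if asn <= 0xFFFF:
        # Two-Octet AS-Specific: type, sub-type, 2-octet AS, 4-octet local
        return struct.pack("!BBHI", 0x00, SUBTYPE_SOURCE_AS, asn, 0)
    # Four-Octet AS-Specific: type, sub-type, 4-octet AS, 2-octet local
    return struct.pack("!BBIH", 0x02, SUBTYPE_SOURCE_AS, asn, 0)


def region_id_from_area(area_id, subtype=0x00):
    """Encode an area ID as an IPv4-Address-Specific style Region ID.

    The sub-type default is an assumption; any sub-type is permitted.
    """
    area = IPv4Address(area_id).packed
    return struct.pack("!BB4sH", 0x01, subtype, area, 0)


two_octet = region_id_from_as(65000)
four_octet = region_id_from_as(4200000000)
area = region_id_from_area("0.0.0.1")
```

   Each helper returns exactly 8 octets, matching the Region ID field
   size in Section 3.1. RBRs of the same region need to produce the
   same value, so the choice between these encodings has to be
   consistent within a region.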

   Notice that per-region I-PMSI routes can also be used to address
   the backward compatibility issue, as discussed in Section 5.3.

   The Region ID in the per-region I-PMSI route's NLRI is encoded like
   an EC.  For example, the Region ID can encode an AS number or area ID
   in the following EC format:

   o  For a two-octet AS number, a Transitive Two-Octet AS-Specific EC
      of sub-type 0x09 (Source AS), with the Global Administrator
      sub-field set to the AS number and the Local Administrator
      sub-field set to 0.

   o  For a four-octet AS number, a Transitive Four-Octet AS-Specific EC
      of sub-type 0x09 (Source AS), with the Global Administrator
      sub-field set to the AS number and the Local Administrator
      sub-field set to 0.

   o  For an area ID, a Transitive IPv4-Address-Specific EC of any
      sub-type, with the Global Administrator sub-field set to the area
      ID and the Local Administrator sub-field set to 0.

   Use of other EC encodings MAY be allowed as long as the encoding
   uniquely identifies the region and the RBRs for the same region use
   the same Region ID.

6.3.  Use of S-NH-EC

   [RFC7524] specifies the use of S-NH-EC because it does not allow ABRs
   to change the BGP next hop when they re-advertise I/S-PMSI A-D routes
   to downstream areas.  That is only to be consistent with the MVPN
   Inter-AS I-PMSI A-D routes, whose next hop must not be changed when
   they're re-advertised by the segmenting ABRs for reasons specific to
   MVPN.  For EVPN, it is perfectly fine to change the next hop when
   RBRs re-advertise the I/S-PMSI A-D routes, instead of relying on
   S-NH-EC.  As a result, this document specifies that RBRs change the
   BGP next hop when they re-advertise I/S-PMSI A-D routes and do not
   use S-NH-EC.
   The advantage of this is that neither ingress nor egress PEs need
   to understand/use S-NH-EC, and a consistent procedure (based on the
   BGP next hop) is used for both inter-AS and inter-region
   segmentation.

   If a downstream PE/RBR needs to originate Leaf A-D routes, it
   constructs an IP-based Route Target Extended Community by placing the
   IP address carried in the Next Hop of the received I/S-PMSI A-D route
   in the Global Administrator field of the Community, with the Local
   Administrator field of this Community set to 0, and sets the
   Extended Communities attribute of the Leaf A-D route to that
   Community.

   Similar to [RFC7524], the upstream RBR MUST (auto-)configure an RT
   with the Global Administrator field set to the Next Hop in the
   re-advertised I/S-PMSI A-D route and with the Local Administrator
   field set to 0.  With this, the mechanisms specified in [RFC4684]
   for constrained BGP route distribution can be used along with this
   specification to ensure that only the needed PE/ABR will have to
   process a given Leaf A-D route.

6.4.  Ingress PE's I-PMSI Leaf Tracking

   [RFC7524] specifies that when an ingress PE/ASBR (re-)advertises a
   VPLS I-PMSI A-D route, it sets the L flag to 1 in the route's PTA.
   Similar to the inter-AS case, this is not actually needed for EVPN.
   To be consistent with the inter-AS case, the ingress PE does not set
   the L flag in its originated I-PMSI A-D routes, and determines the
   leaves based on the BGP next hops in its received I-PMSI A-D routes,
   as specified in Section 5.2.

   The same backward compatibility issue exists, and the same solution
   as in the inter-AS case applies, as specified in Section 5.3.

7.  Multi-homing Support

   To support multi-homing with segmentation, ESI labels SHOULD be
   allocated from a "Domain-wide Common Block" (DCB)
   [I-D.ietf-bess-mvpn-evpn-aggregation-label] for all tunnel types
   including Ingress Replication.
   Via means outside the scope of this document, PEs know that ESI
   labels are from the DCB, and existing multi-homing procedures work
   as is, whether a multi-homed Ethernet Segment spans across
   segmentation regions or not.

   Not using DCB-allocated ESI labels is outside the scope of this
   document.

8.  IANA Considerations

   IANA has temporarily assigned the following new EVPN route types:

   o  9 - Per-Region I-PMSI A-D route

   o  10 - S-PMSI A-D route

   o  11 - Leaf A-D route

   This document requests IANA to assign one flag bit from the EVPN
   Multicast Flags Extended Community:

   o  Bit-S - The router supports the segmentation procedures defined in
      this document

9.  Security Considerations

   The Selective Forwarding procedures via S-PMSI/Leaf A-D routes in
   this document are based on the same procedures for MVPN [RFC6514] and
   VPLS Multicast [RFC7117].  The tunnel segmentation procedures in this
   document are based on similar procedures for MVPN inter-AS
   [RFC6514] and inter-area [RFC7524] tunnel segmentation, and
   procedures for VPLS Multicast [RFC7117] inter-AS tunnel segmentation.
   They do not introduce new security concerns besides what has been
   discussed in [RFC6514], [RFC7117], [RFC7432], and [RFC7524].

10.  Acknowledgements

   The authors thank Eric Rosen, John Drake, and Ron Bonica for their
   comments and suggestions.

11.  Contributors

   The following also contributed to this document through their earlier
   work in EVPN selective multicast.

   Junlin Zhang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: jackey.zhang@huawei.com

   Zhenbin Li
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: lizhenbin@huawei.com

12.  References

12.1.  Normative References

   [I-D.ietf-bess-evpn-igmp-mld-proxy]
              Sajassi, A., Thoria, S., Mishra, M., Drake, J., and W.
              Lin, "IGMP and MLD Proxy for EVPN",
              draft-ietf-bess-evpn-igmp-mld-proxy-13 (work in progress),
              September 2021.

   [I-D.ietf-bess-mvpn-evpn-aggregation-label]
              Zhang, Z., Rosen, E., Lin, W., Li, Z., and I. Wijnands,
              "MVPN/EVPN Tunnel Aggregation with Common Labels",
              draft-ietf-bess-mvpn-evpn-aggregation-label-06 (work in
              progress), April 2021.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7117]  Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and
              C. Kodeboniya, "Multicast in Virtual Private LAN Service
              (VPLS)", RFC 7117, DOI 10.17487/RFC7117, February 2014.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC7524]  Rekhter, Y., Rosen, E., Aggarwal, R., Morin, T.,
              Grosclaude, I., Leymann, N., and S. Saad, "Inter-Area
              Point-to-Multipoint (P2MP) Segmented Label Switched Paths
              (LSPs)", RFC 7524, DOI 10.17487/RFC7524, May 2015.

   [RFC7988]  Rosen, E., Ed., Subramanian, K., and Z. Zhang, "Ingress
              Replication Tunnels in Multicast VPN", RFC 7988,
              DOI 10.17487/RFC7988, October 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8534]  Dolganow, A., Kotalwar, J., Rosen, E., Ed., and Z. Zhang,
              "Explicit Tracking with Wildcard Routes in Multicast
              VPN", RFC 8534, DOI 10.17487/RFC8534, February 2019.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S.
              Sathappan, "Framework for Ethernet VPN Designated
              Forwarder Election Extensibility", RFC 8584,
              DOI 10.17487/RFC8584, April 2019.

12.2.  Informative References

   [I-D.ietf-bess-evpn-prefix-advertisement]
              Rabadan, J., Henderickx, W., Drake, J. E., Lin, W., and A.
              Sajassi, "IP Prefix Advertisement in EVPN",
              draft-ietf-bess-evpn-prefix-advertisement-11 (work in
              progress), May 2018.

   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
              R., Patel, K., and J. Guichard, "Constrained Route
              Distribution for Border Gateway Protocol/MultiProtocol
              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
              Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684,
              November 2006.

   [RFC4875]  Aggarwal, R., Ed., Papadimitriou, D., Ed., and S.
              Yasukawa, Ed., "Extensions to Resource Reservation
              Protocol - Traffic Engineering (RSVP-TE) for
              Point-to-Multipoint TE Label Switched Paths (LSPs)", RFC
              4875, DOI 10.17487/RFC4875, May 2007.

   [RFC6388]  Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B.
              Thomas, "Label Distribution Protocol Extensions for
              Point-to-Multipoint and Multipoint-to-Multipoint Label
              Switched Paths", RFC 6388, DOI 10.17487/RFC6388, November
              2011.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in
              MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513,
              February 2012.

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net

   Wen Lin
   Juniper Networks

   EMail: wlin@juniper.net

   Jorge Rabadan
   Nokia

   EMail: jorge.rabadan@nokia.com

   Keyur Patel
   Arrcus

   EMail: keyur@arrcus.com

   Ali Sajassi
   Cisco Systems

   EMail: sajassi@cisco.com