BESS                                                            Z. Zhang
Internet-Draft                                                    W. Lin
Updates: 7432 (if approved)                             Juniper Networks
Intended status: Standards Track                              J. Rabadan
Expires: May 22, 2022                                              Nokia
                                                                K. Patel
                                                                  Arrcus
                                                              A. Sajassi
                                                           Cisco Systems
                                                       November 18, 2021


                     Updates on EVPN BUM Procedures
             draft-ietf-bess-evpn-bum-procedure-updates-14

Abstract

   This document specifies updated procedures for handling broadcast,
   unknown unicast, and multicast (BUM) traffic in Ethernet VPNs (EVPN),
   including selective multicast and provider tunnel segmentation.
   This document updates RFC 7432.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 22, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
     2.1.  Tunnel Segmentation
       2.1.1.  Reasons for Tunnel Segmentation
   3.  Additional Route Types of EVPN NLRI
     3.1.  Per-Region I-PMSI A-D route
     3.2.  S-PMSI A-D route
     3.3.  Leaf A-D route
   4.  Selective Multicast
   5.  Inter-AS Segmentation
     5.1.  Differences from Section 7.2.2 of [RFC7117] When Applied
           to EVPN
     5.2.  I-PMSI Leaf Tracking
     5.3.  Backward Compatibility
       5.3.1.  Designated ASBR Election
   6.  Inter-Region Segmentation
     6.1.  Area/AS vs. Region
     6.2.  Per-region Aggregation
     6.3.  Use of S-NH-EC
     6.4.  Ingress PE's I-PMSI Leaf Tracking
   7.  Multi-homing Support
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Contributors
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Terminology

   It is expected that the audience is familiar with MVPN [RFC6513]
   [RFC6514], VPLS Multicast [RFC7117], and EVPN [RFC7432] concepts and
   terminology. For convenience, the following terms are briefly
   explained.

   o  PMSI [RFC6513]: P-Multicast Service Interface - a conceptual
      interface for a PE to send customer multicast traffic to all or
      some PEs in the same VPN.

   o  I-PMSI: Inclusive PMSI - to all PEs in the same VPN.

   o  S-PMSI: Selective PMSI - to some of the PEs in the same VPN.

   o  I/S-PMSI A-D Route: Auto-Discovery routes used to announce the
      tunnels that instantiate an I/S-PMSI.

   o  Leaf Auto-Discovery (A-D) routes [RFC6513]: Used for explicit
      leaf tracking. Triggered by I/S-PMSI A-D routes and targeted at
      the triggering route's (re-)advertiser. The NLRI of a Leaf A-D
      route embeds the entire NLRI of the triggering PMSI A-D route.

   o  IMET A-D route [RFC7432]: Inclusive Multicast Ethernet Tag A-D
      route. The EVPN equivalent of the MVPN Intra-AS I-PMSI A-D route,
      used to announce the tunnels that instantiate an I-PMSI.

   o  SMET A-D route [I-D.ietf-bess-evpn-igmp-mld-proxy]: Selective
      Multicast Ethernet Tag A-D route. The EVPN equivalent of the MVPN
      Leaf A-D route, but unsolicited and untargeted.

   o  PMSI Tunnel Attribute (PTA): An optional transitive BGP attribute
      that may be attached to PMSI/Leaf A-D routes to provide
      information for a PMSI tunnel.

2.  Introduction

   [RFC7117] specifies procedures for Multicast in Virtual Private LAN
   Service (VPLS Multicast) using both inclusive tunnels and selective
   tunnels with or without inter-as segmentation, similar to the
   Multicast VPN (MVPN) procedures specified in [RFC6513] and [RFC6514].
   [RFC7524] specifies inter-area tunnel segmentation procedures for
   both VPLS Multicast and MVPN.

   [RFC7432] specifies BGP MPLS-Based Ethernet VPN (EVPN) procedures,
   including those handling broadcast, unknown unicast, and multicast
   (BUM) traffic. For many details it refers to [RFC7117], yet it
   leaves feature gaps such as selective tunnels and tunnel
   segmentation (Section 2.1).

   This document aims to fill those gaps by covering the use of
   selective and segmented tunnels in EVPN. It follows the same
   editorial choice as [RFC7432] and only specifies differences from
   relevant procedures in [RFC7117] and [RFC7524], instead of repeating
   the text. Note that these differences are applicable to EVPN only,
   and are not updates to [RFC7117] or [RFC7524].

   MVPN, VPLS, and EVPN all need to discover other PEs in the same
   L3/L2 VPN and announce the inclusive tunnels. MVPN introduced the
   I-PMSI concept and uses the I-PMSI A-D route for that. EVPN uses the
   Inclusive Multicast Ethernet Tag (IMET) A-D route, but VPLS just
   adds a PMSI Tunnel Attribute (PTA) to the existing VPLS A-D route
   for that purpose. For selective tunnels, they all use the same
   term, S-PMSI A-D routes.

   Many places in this document involve the I-PMSI concept, which is
   the same for all three technologies. For consistency and
   convenience, EVPN's IMET route and the VPLS A-D route carrying a PTA
   for BUM traffic may both be referred to as I-PMSI A-D routes,
   depending on the context.

2.1.  Tunnel Segmentation

   MVPN provider tunnels and EVPN/VPLS BUM provider tunnels, which are
   referred to as MVPN/EVPN/VPLS provider tunnels in this document for
   simplicity, can be segmented for technical or administrative reasons,
   which are summarized in Section 2.1.1 of this document.
   [RFC6513] and [RFC6514] cover MVPN inter-as segmentation, [RFC7117]
   covers VPLS multicast inter-as segmentation, and [RFC7524] (Seamless
   MPLS Multicast) covers inter-area segmentation for both MVPN and
   VPLS.

   With tunnel segmentation, different segments of an end-to-end tunnel
   may have different encapsulation overhead. However, the largest
   overhead of the tunnel caused by an encapsulation method on a
   particular segment is not different from the case of a non-segmented
   tunnel with that encapsulation method. This is similar to the case
   of a network with different link types.

   There is a difference between MVPN and VPLS multicast inter-as
   segmentation (the VPLS approach is briefly described in Section 5.1).
   For simplicity, EVPN will use the same procedures as in MVPN. All
   ASBRs can re-advertise their choice of the best route. Each can
   become the root of its intra-AS segment and inject traffic it
   receives from its upstream, while each downstream PE/ASBR will only
   pick one of the upstream ASBRs as its upstream. This is also the
   behavior even for VPLS in case of inter-area segmentation.

   For inter-area segmentation, [RFC7524] requires the use of the
   Inter-area P2MP Segmented Next-Hop Extended Community (S-NH-EC), and
   the setting of the "Leaf Information Required" (L) flag in the PTA
   in certain situations. In the EVPN case, the requirements around
   S-NH-EC and the PTA "L" flag differ from [RFC7524] to make the
   segmentation procedures transparent to ingress and egress PEs.

   [RFC7524] assumes that segmentation happens at area borders.
   However, it could be at "regional" borders, where a region could be a
   sub-area, or even an entire AS plus its external links (Section 6.1).
   That would allow for more flexible deployment scenarios (e.g., for
   single-area provider networks). This document extends the inter-area
   segmentation to inter-region segmentation for EVPN.

2.1.1.  Reasons for Tunnel Segmentation

   Tunnel segmentation may be required and/or desired because of
   administrative and/or technical reasons.

   For example, an MVPN/VPLS/EVPN network may span multiple providers
   and the end-to-end provider tunnels have to be segmented at and
   stitched by the ASBRs. Different providers may use different tunnel
   technologies (e.g., provider A uses Ingress Replication [RFC7988],
   provider B uses RSVP-TE P2MP [RFC4875] while provider C uses mLDP
   [RFC6388]). Even if they use the same tunnel technology like RSVP-TE
   P2MP, it may be impractical to set up the tunnels across provider
   boundaries.

   The same situations may apply between the ASes and/or areas of a
   single provider. For example, the backbone area may use RSVP-TE P2MP
   tunnels while non-backbone areas may use mLDP tunnels.

   Segmentation can also be used to divide an AS/area into smaller
   regions, so that control plane state and/or forwarding plane state/
   burden can be limited to that of individual regions. For example,
   instead of Ingress Replicating to 100 PEs in the entire AS, with
   inter-area segmentation [RFC7524] a PE only needs to replicate to
   local PEs and ABRs. The ABRs will further replicate to their
   downstream PEs and ABRs. This not only reduces the forwarding plane
   burden, but also reduces the leaf tracking burden in the control
   plane.

   Smaller regions also have the benefit that, in case of tunnel
   aggregation, it is easier to find congruence among the segments of
   different constituent (service) tunnels and the resulting aggregation
   (base) tunnel in a region. This leads to better bandwidth
   efficiency, because the more congruent they are, the fewer leaves of
   the base tunnel need to discard traffic when a service tunnel's
   segment does not need to receive the traffic (yet it is receiving the
   traffic due to aggregation).

   Another advantage of the smaller region is smaller BIER [RFC8279]
   sub-domains. With BIER, packets carry a BitString, in which the bits
   correspond to edge routers that need to receive traffic. Smaller
   sub-domains mean that smaller BitStrings can be used without having
   to send multiple copies of the same packet.

3.  Additional Route Types of EVPN NLRI

   [RFC7432] defines the format of EVPN NLRI as follows:

                 +-----------------------------------+
                 |      Route Type (1 octet)         |
                 +-----------------------------------+
                 |        Length (1 octet)           |
                 +-----------------------------------+
                 | Route Type specific (variable)    |
                 +-----------------------------------+

   So far eight route types have been defined in [RFC7432],
   [I-D.ietf-bess-evpn-prefix-advertisement], and
   [I-D.ietf-bess-evpn-igmp-mld-proxy]:

      + 1 - Ethernet Auto-Discovery (A-D) route
      + 2 - MAC/IP Advertisement route
      + 3 - Inclusive Multicast Ethernet Tag route
      + 4 - Ethernet Segment route
      + 5 - IP Prefix Route
      + 6 - Selective Multicast Ethernet Tag Route
      + 7 - Multicast Join Synch Route
      + 8 - Multicast Leave Synch Route

   This document defines three additional route types:

      + 9 - Per-Region I-PMSI A-D route
      + 10 - S-PMSI A-D route
      + 11 - Leaf A-D route

   The "Route Type specific" field of the type 9 and type 10 EVPN NLRIs
   starts with a type 1 RD, whose Administrator sub-field MUST match
   that of the RD in all current non-Leaf A-D (Section 3.3) EVPN routes
   from the same advertising router for a given EVI.

3.1.  Per-Region I-PMSI A-D route

   The Per-region I-PMSI A-D route has the following format. Its usage
   is discussed in Section 6.2.

                 +-----------------------------------+
                 |      RD (8 octets)                |
                 +-----------------------------------+
                 |  Ethernet Tag ID (4 octets)       |
                 +-----------------------------------+
                 |  Region ID (8 octets)             |
                 +-----------------------------------+

   The Region ID identifies the region and is encoded in the same way
   as an Extended Community, as detailed in Section 6.2.

3.2.  S-PMSI A-D route

   The S-PMSI A-D route has the following format:

                 +-----------------------------------+
                 |      RD (8 octets)                |
                 +-----------------------------------+
                 |  Ethernet Tag ID (4 octets)       |
                 +-----------------------------------+
                 | Multicast Source Length (1 octet) |
                 +-----------------------------------+
                 |  Multicast Source (Variable)      |
                 +-----------------------------------+
                 |  Multicast Group Length (1 octet) |
                 +-----------------------------------+
                 |  Multicast Group (Variable)       |
                 +-----------------------------------+
                 |Originator's Addr Length (1 octet) |
                 +-----------------------------------+
                 |Originator's Addr (4 or 16 octets) |
                 +-----------------------------------+

   Other than the addition of Ethernet Tag ID and Originator's Addr
   Length, it is identical to the S-PMSI A-D route as defined in
   [RFC7117]. The procedures in [RFC7117] also apply (including
   wildcard functionality), except that the granularity level is per
   Ethernet Tag.

3.3.  Leaf A-D route

   The Route Type specific field of a Leaf A-D route consists of the
   following:

                 +-----------------------------------+
                 |     Route Key (variable)          |
                 +-----------------------------------+
                 |Originator's Addr Length (1 octet) |
                 +-----------------------------------+
                 |Originator's Addr (4 or 16 octets) |
                 +-----------------------------------+

   A Leaf A-D route is originated in response to a PMSI route, which
   could be an Inclusive Multicast Ethernet Tag route, a per-region
   I-PMSI A-D route, an S-PMSI A-D route, or some other type of route,
   defined in the future, that triggers Leaf A-D routes. The Route Key
   is the NLRI of the route for which this Leaf A-D route is generated.

   The general procedures of Leaf A-D route are first specified in
   [RFC6514] for MVPN. The principles apply to VPLS and EVPN as well.
   [RFC7117] has details for VPLS Multicast, and this document points
   out some specifics for EVPN, e.g., in Section 5.

4.  Selective Multicast

   [I-D.ietf-bess-evpn-igmp-mld-proxy] specifies procedures for EVPN
   selective forwarding of IP multicast using SMET routes. It assumes
   selective forwarding is always used with IR for all flows (though the
   same signaling can also be used for an ingress PE to find out the set
   of egress PEs for selective forwarding with BIER). An NVE proxies
   the IGMP/MLD state that it learns on its ACs to (C-S,C-G) or
   (C-*,C-G) SMET routes that it advertises to other NVEs, and a
   receiving NVE converts the SMET routes back to IGMP/MLD messages and
   sends them out of its ACs. The receiving NVE also uses the SMET
   routes to identify which NVEs need to receive traffic for a
   particular (C-S,C-G) or (C-*,C-G) to achieve selective forwarding
   using IR or BIER.

   With the above procedures, selective forwarding is done for all flows
   and the SMET routes are advertised for all flows.
   It is possible
   that an operator may not want to track all those (C-S,C-G) or
   (C-*,C-G) states on the NVEs, and the multicast traffic pattern
   allows inclusive forwarding for most flows while selective forwarding
   is needed only for a few high-rate flows. For that, or for tunnel
   types other than IR/BIER, S-PMSI/Leaf A-D procedures defined for
   Selective Multicast for VPLS in [RFC7117] are used. Except that
   different route types and formats are specified with the EVPN SAFI
   for S-PMSI A-D and Leaf A-D routes (Section 3), all procedures in
   [RFC7117] with respect to Selective Multicast apply to EVPN as well,
   including wildcard procedures. In a nutshell, a source NVE
   advertises S-PMSI A-D routes to announce the tunnels used for certain
   flows, and receiving NVEs either join the announced PIM/mLDP tunnel
   or respond with Leaf A-D routes if the Leaf Information Required flag
   is set in the S-PMSI A-D route's PTA (so that the source NVE can
   include them as tunnel leaves).

   An optimization to the [RFC7117] procedures may be applied. Even if
   a source NVE sets the L flag to request Leaf A-D routes, an egress
   NVE MAY omit the Leaf A-D route if it has already advertised a
   corresponding SMET route, and the source NVE MUST use that in lieu of
   the Leaf A-D route.

   The optional optimizations specified for MVPN in [RFC8534] are also
   applicable to EVPN when the S-PMSI/Leaf A-D route procedures are
   used for EVPN selective multicast forwarding.

5.  Inter-AS Segmentation

5.1.  Differences from Section 7.2.2 of [RFC7117] When Applied to EVPN

   The first paragraph of Section 7.2.2.2 of [RFC7117] says:

   "... The best route procedures ensure that if multiple
   ASBRs, in an AS, receive the same Inter-AS A-D route from their EBGP
   neighbors, only one of these ASBRs propagates this route in Internal
   BGP (IBGP).
   This ASBR becomes the root of the intra-AS segment of
   the inter-AS tree and ensures that this is the only ASBR that accepts
   traffic into this AS from the inter-AS tree."

   The above VPLS behavior requires complicated VPLS-specific procedures
   for the ASBRs to reach agreement. For EVPN, a different approach is
   used and the above quoted text is not applicable to EVPN.

   With the different approach for EVPN/MVPN, each ASBR will
   re-advertise its received Inter-AS A-D route to its IBGP peers and
   become the root of an intra-AS segment of the inter-AS tree. The
   intra-AS segment rooted at one ASBR is disjoint from another intra-AS
   segment rooted at another ASBR. This is the same as the procedures
   for S-PMSI in [RFC7117] itself.

   The following bullet in Section 7.2.2.2 of [RFC7117] does not apply
   to EVPN.

   +  If the ASBR uses ingress replication to instantiate the intra-AS
      segment of the inter-AS tunnel, the re-advertised route MUST NOT
      carry the PMSI Tunnel attribute.

   The following bullet in Section 7.2.2.2 of [RFC7117]:

   +  If the ASBR uses a P-multicast tree to instantiate the intra-AS
      segment of the inter-AS tunnel, the PMSI Tunnel attribute MUST
      contain the identity of the tree that is used to instantiate the
      segment (note that the ASBR could create the identity of the tree
      prior to the actual instantiation of the segment). If, in order
      to instantiate the segment, the ASBR needs to know the leaves of
      the tree, then the ASBR obtains this information from the A-D
      routes received from other PEs/ASBRs in the ASBR's own AS.

   is changed to the following when applied to EVPN:

   "The PMSI Tunnel attribute MUST specify the tunnel for the segment.
   If and only if, in order to establish the tunnel, the ASBR needs to
   know the leaves of the tree, then the ASBR MUST set the L flag to
   1 in the PTA to trigger Leaf A-D routes from egress PEs and
   downstream ASBRs.
   It MUST be (auto-)configured with an import RT,
   which controls acceptance of leaf A-D routes by the ASBR."

   Accordingly, the following paragraph in Section 7.2.2.4 of [RFC7117]:

   "If the received Inter-AS A-D route carries the PMSI Tunnel attribute
   with the Tunnel Identifier set to RSVP-TE P2MP LSP, then the ASBR
   that originated the route MUST establish an RSVP-TE P2MP LSP with the
   local PE/ASBR as a leaf. This LSP MAY have been established before
   the local PE/ASBR receives the route, or it MAY be established after
   the local PE receives the route."

   is changed to the following when applied to EVPN:

   "If the received Inter-AS A-D route has the L flag set in its PTA,
   then a receiving PE MUST originate a corresponding Leaf A-D route,
   while a receiving ASBR MUST originate a corresponding Leaf A-D route
   if and only if it received and imported one or more corresponding
   Leaf A-D routes from its downstream IBGP or EBGP peers, or it has
   non-null downstream forwarding state for the PIM/mLDP tunnel that
   instantiates its downstream intra-AS segment. The targeted ASBR for
   the Leaf A-D route, which (re-)advertised the Inter-AS A-D route,
   MUST establish a tunnel to the leaves discovered by the Leaf A-D
   routes."

5.2.  I-PMSI Leaf Tracking

   An ingress PE does not set the L flag in its Inclusive Multicast
   Ethernet Tag (IMET) A-D route's PTA, even with Ingress Replication or
   RSVP-TE P2MP tunnels. It does not rely on the Leaf A-D routes to
   discover leaves in its AS, and Section 11.2 of [RFC7432] explicitly
   states that the L flag must be set to zero.

   An implementation of [RFC7432] might have used the Originating
   Router's IP Address field of the IMET A-D routes to determine the
   leaves, or might have used the Next Hop field instead. Within the
   same AS, both will lead to the same result.

   With segmentation, an ingress PE MUST determine the leaves in its AS
   from the BGP next hops in all its received IMET A-D routes, so it
   does not have to set the L flag to request Leaf A-D routes. PEs
   within the same AS will all have different next hops in their IMET
   A-D routes (hence will all be considered as leaves), and PEs from
   other ASes will have the next hop in their IMET A-D routes set to
   addresses of ASBRs in this local AS, hence only those ASBRs will be
   considered as leaves (as proxies for those PEs in other ASes). Note
   that in case of Ingress Replication, when an ASBR re-advertises IMET
   A-D routes to IBGP peers, it MUST advertise the same label for all
   of those routes that have the same Ethernet Tag ID and the same EVI.
   Otherwise, duplicated copies will be sent by the ingress PE and
   received by egress PEs in other regions. For the same reason, when
   an ingress PE builds its flooding list, if multiple routes have the
   same (nexthop, label) tuple they MUST only be added as a single
   branch in the flooding list.

5.3.  Backward Compatibility

   The above procedures assume that all PEs are upgraded to support the
   segmentation procedures:

   o  An ingress PE uses the Next Hop and not Originating Router's IP
      Address to determine leaves for the I-PMSI tunnel.

   o  An egress PE sends Leaf A-D routes in response to I-PMSI routes,
      if the PTA has the L flag set by the re-advertising ASBR.

   o  In case of Ingress Replication, when an ingress PE builds its
      flooding list, multiple I-PMSI routes may have the same (nexthop,
      label) tuple and only a single branch for those will be added in
      the flooding list.
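
   The flooding-list rule above can be illustrated with a minimal
   sketch. It is not part of the specification: the ImetRoute class,
   its field names, and the build_flooding_list function are
   hypothetical, and the addresses and labels are documentation values
   chosen for illustration. It assumes Ingress Replication and that an
   ASBR re-advertises remote IMET A-D routes with its own address as
   next hop and a common label.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class ImetRoute:
    """Hypothetical, simplified view of a received IMET (I-PMSI) A-D route."""
    originator: str  # Originating Router's IP Address from the NLRI
    next_hop: str    # BGP next hop of the received route
    label: int       # MPLS label from the PTA (Ingress Replication)


def build_flooding_list(routes):
    """Return one (next_hop, label) branch per distinct tuple.

    Leaves are keyed on the BGP next hop, not on the originator, so
    remote PEs re-advertised by the same ASBR with the same label
    collapse into a single branch.
    """
    branches = []
    seen = set()
    for route in routes:
        # Normalize the address so textual variants compare equal.
        key = (ip_address(route.next_hop), route.label)
        if key not in seen:
            seen.add(key)
            branches.append((route.next_hop, route.label))
    return branches


routes = [
    ImetRoute("192.0.2.1", "192.0.2.1", 1001),       # local PE
    ImetRoute("192.0.2.2", "192.0.2.2", 1002),       # local PE
    ImetRoute("198.51.100.1", "192.0.2.254", 3000),  # remote PE via ASBR
    ImetRoute("198.51.100.2", "192.0.2.254", 3000),  # remote PE via ASBR
]
flooding_list = build_flooding_list(routes)
print(flooding_list)
```

   In this sketch the two remote PEs share the ASBR's next hop and
   label, so the ingress PE sends a single copy toward the ASBR rather
   than one per remote PE. Keying on the originator instead would have
   produced four branches.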

   If a deployment has legacy PEs that do not support the above, then
   a legacy ingress PE would include all PEs (including those in remote
   ASes) as leaves of the inclusive tunnel and try to send traffic to
   them directly (no segmentation), which is either undesired or not
   possible; a legacy egress PE would not send Leaf A-D routes so the
   ASBRs would not know to send external traffic to them.

   If this backward compatibility problem needs to be addressed, the
   following procedure MUST be used (see Section 6.2 for per-PE/AS/
   region I-PMSI A-D routes):

   o  An upgraded PE indicates in its per-PE I-PMSI A-D route that it
      supports the new procedures. This is done by setting a flag bit
      in the EVPN Multicast Flags Extended Community.

   o  All per-PE I-PMSI A-D routes are restricted to the local AS and
      not propagated to external peers.

   o  The ASBRs in an AS originate per-region I-PMSI A-D routes and
      advertise them to their external peers to specify tunnels used to
      carry traffic from the local AS to other ASes. Depending on the
      types of tunnels being used, the L flag in the PTA may be set, in
      which case the downstream ASBRs and upgraded PEs will send Leaf
      A-D routes to pull traffic from their upstream ASBRs. In a
      particular downstream AS, one of the ASBRs is elected, based on
      the per-region I-PMSI A-D routes for a particular source AS, to
      send traffic from that source AS to legacy PEs in the downstream
      AS. The traffic arrives at the elected ASBR on the tunnel
      announced in the best per-region I-PMSI A-D route for the source
      AS, which the ASBR has selected from among all those that it
      received over EBGP or IBGP sessions. The election procedure is
      described in Section 5.3.1.

   o  In an ingress/upstream AS, if and only if an ASBR has active
      downstream receivers (PEs and ASBRs), which are learned either
      explicitly via Leaf A-D routes or implicitly via PIM join or mLDP
      label mapping, the ASBR originates a per-PE I-PMSI A-D route
      (i.e., a regular Inclusive Multicast Ethernet Tag route) into the
      local AS, and stitches incoming per-PE I-PMSI tunnels into its
      per-region I-PMSI tunnel. With this, it gets traffic from local
      PEs and sends it to other ASes via the tunnel announced in its
      per-region I-PMSI A-D route.

   Note that, even if there is no backward compatibility issue, the use
   of per-region I-PMSI has the benefit of keeping all per-PE I-PMSI A-D
   routes in their local ASes, greatly reducing the flooding of the
   routes and their corresponding Leaf A-D routes (when needed), and the
   number of inter-as tunnels.

5.3.1.  Designated ASBR Election

   When an ASBR re-advertises a per-region I-PMSI A-D route into an AS
   in which a designated ASBR needs to be used to forward traffic to the
   legacy PEs in the AS, it MUST include a DF Election EC. The EC and
   its use are specified in [RFC8584]. The AC-DF bit in the DF Election
   EC MUST be cleared. If it is known that no legacy PEs exist in the
   AS, the ASBR MUST NOT include the EC and MUST remove the DF Election
   EC if one is carried in the per-region I-PMSI A-D routes that it
   receives. Note that this is done for each set of per-region I-PMSI
   A-D routes with the same NLRI.

   Based on the procedures in [RFC8584], an election algorithm is
   determined according to the DF Election ECs carried in the set of
   per-region I-PMSI routes of the same NLRI re-advertised into the AS.
   The algorithm is then applied to a candidate list, which is the set
   of ASBRs that re-advertised the per-region I-PMSI routes of the same
   NLRI carrying the DF Election EC.

6.  Inter-Region Segmentation

6.1.  Area/AS vs. Region

   [RFC7524] is for MVPN/VPLS inter-area segmentation and does not
   explicitly cover EVPN. However, if "area" is replaced by "region"
   and "ABR" is replaced by "RBR" (Regional Border Router) then
   everything still works, and can be applied to EVPN as well.

   A region can be a sub-area, or can be an entire AS including its
   external links. Instead of automatic region definition based on IGP
   areas, a region would be defined as a BGP peer group. In fact, even
   with IGP area based region definition, a BGP peer group listing the
   PEs and ABRs in an area is still needed.

   Consider the following example diagram for inter-as segmentation:

           ---------            ------            ---------
          /         \          /      \          /         \
         /           \        /        \        /           \
        | PE1 o    ASBR1 -- ASBR2    ASBR3 -- ASBR4    o PE2 |
         \           /        \        /        \           /
          \         /          \      /          \         /
           ---------            ------            ---------
            AS 100              AS 200             AS 300
        |-----------|--------|---------|--------|------------|
          segment1   segment2  segment3 segment4   segment5

   The inter-as segmentation procedures specified so far ([RFC6513]
   [RFC6514], [RFC7117], and Section 5 of this document) require all
   ASBRs to be involved, and Ingress Replication is used between two
   ASBRs in different ASes.

   In the above diagram, it is possible that ASBR1/4 do not support
   segmentation, and the provider tunnels in AS 100/300 can actually
   extend across the external link. In this case, the inter-region
   segmentation procedures can be used instead - a region is the entire
   (AS100 + ASBR1-ASBR2 link) or (AS300 + ASBR3-ASBR4 link). ASBR2/3
   would be the RBRs, and ASBR1/4 will just be transit core routers
   with respect to provider tunnels.

   As illustrated in the diagram below, ASBR2/3 will establish a
   multihop EBGP session either with an RR or directly with PEs in the
   neighboring AS. I/S-PMSI A-D routes from ingress PEs will not be
   processed by ASBR1/4.
   When ASBR2 re-advertises the routes into AS
   200, it changes the next hop to its own address and changes the PTA
   to specify the tunnel type/identification in its own AS. When ASBR3
   re-advertises I/S-PMSI A-D routes into the neighboring AS 300, it
   changes the next hop to its own address and changes the PTA to
   specify the tunnel type/identification in the neighboring region.
   Now the segment is rooted at ASBR3 and extends across the external
   link to PEs.

           ---------            ------            ---------
          / RR......\.mh-ebgp  /      \   mh-ebgp/......RR \
         /  :        \   `.   /        \   .'   /        :  \
        | PE1 o    ASBR1 -- ASBR2    ASBR3 -- ASBR4    o PE2 |
         \           /        \        /        \           /
          \         /          \      /          \         /
           ---------            ------            ---------
            AS 100              AS 200             AS 300
        |-------------------|----------|---------------------|
              segment 1      segment 2        segment 3

6.2.  Per-region Aggregation

   Notice that every I/S-PMSI route from each PE will be propagated
   throughout all the ASes or regions. They may also trigger
   corresponding Leaf A-D routes depending on the types of tunnels used
   in each region. This may result in too many routes and
   corresponding tunnels. To address this concern, the I-PMSI routes
   from all PEs in an AS/region can be aggregated into a single I-PMSI
   route originated from the RBRs, and traffic from all those
   individual I-PMSI tunnels will be switched into the single I-PMSI
   tunnel. This is like the MVPN Inter-AS I-PMSI route originated by
   ASBRs.

   The MVPN Inter-AS I-PMSI A-D route could more aptly be called a
   per-AS I-PMSI A-D route, in contrast to the (per-PE) Intra-AS I-PMSI
   A-D routes originated by each PE. This document calls it a
   per-region I-PMSI A-D route, since the aggregation may be applied at
   the regional level. The per-PE I-PMSI routes will not be propagated
   to other regions. If multiple RBRs are connected to a region, then
   each will advertise such a route, with the same Region ID and
   Ethernet Tag ID (Section 3.1).
   Similar to the per-PE I-PMSI A-D routes, RBRs/PEs in a downstream
   region will each select the best route from all those re-advertised
   by the upstream RBRs and hence will only receive traffic injected
   by one of them.

   MVPN does not aggregate S-PMSI routes from all PEs in an AS as it
   does for I-PMSI routes, because the number of PEs that will
   advertise S-PMSI routes for the same (s,g) or (*,g) is small.  This
   is also the case for EVPN, i.e., there are no per-region S-PMSI
   routes.

   Notice that per-region I-PMSI routes can also be used to address
   the backward compatibility issue, as discussed in Section 5.3.

   The Region ID in the per-region I-PMSI route's NLRI is encoded like
   an EC.  For example, the Region ID can encode an AS number or area
   ID in one of the following EC formats:

   o  For a two-octet AS number, a Transitive Two-Octet AS-Specific EC
      of sub-type 0x09 (Source AS), with the Global Administrator
      sub-field set to the AS number and the Local Administrator
      sub-field set to 0.

   o  For a four-octet AS number, a Transitive Four-Octet AS-Specific
      EC of sub-type 0x09 (Source AS), with the Global Administrator
      sub-field set to the AS number and the Local Administrator
      sub-field set to 0.

   o  For an area ID, a Transitive IPv4-Address-Specific EC of any
      sub-type, with the Global Administrator sub-field set to the
      area ID and the Local Administrator sub-field set to 0.

   Other EC encodings MAY be used as long as they uniquely identify
   the region and the RBRs for the same region use the same Region ID.

6.3.  Use of S-NH-EC

   [RFC7524] specifies the use of S-NH-EC because it does not allow
   ABRs to change the BGP next hop when they re-advertise I/S-PMSI A-D
   routes to downstream areas.
   That is only to be consistent with the MVPN Inter-AS I-PMSI A-D
   routes, whose next hop must not be changed when they are
   re-advertised by the segmenting ABRs for reasons specific to MVPN.
   For EVPN, it is perfectly fine to change the next hop when RBRs
   re-advertise the I/S-PMSI A-D routes, instead of relying on
   S-NH-EC.  As a result, this document specifies that RBRs change the
   BGP next hop when they re-advertise I/S-PMSI A-D routes and do not
   use S-NH-EC.  The advantage of this is that neither ingress nor
   egress PEs need to understand/use S-NH-EC, and a consistent
   procedure (based on BGP next hop) is used for both inter-AS and
   inter-region segmentation.

   If a downstream PE/RBR needs to originate Leaf A-D routes, it
   constructs an IP-based Route Target Extended Community by placing
   the IP address carried in the Next Hop of the received I/S-PMSI A-D
   route in the Global Administrator field of the Community, with the
   Local Administrator field of this Community set to 0, and sets the
   Extended Communities attribute of the Leaf A-D route to that
   Community.

   Similar to [RFC7524], the upstream RBR MUST (auto-)configure an RT
   with the Global Administrator field set to the Next Hop in the
   re-advertised I/S-PMSI A-D route and with the Local Administrator
   field set to 0.  With this, the mechanisms specified in [RFC4684]
   for constrained BGP route distribution can be used along with this
   specification to ensure that only the needed PE/ABR will have to
   process a given Leaf A-D route.

6.4.  Ingress PE's I-PMSI Leaf Tracking

   [RFC7524] specifies that when an ingress PE/ASBR (re-)advertises a
   VPLS I-PMSI A-D route, it sets the L flag to 1 in the route's PTA.
   Similar to the inter-AS case, this is not actually needed for EVPN.
   To be consistent with the inter-AS case, the ingress PE does not
   set the L flag in its originated I-PMSI A-D routes, and it
   determines the leaves based on the BGP next hops in its received
   I-PMSI A-D routes, as specified in Section 5.2.

   The same backward compatibility issue exists, and the same solution
   as in the inter-AS case applies, as specified in Section 5.3.

7.  Multi-homing Support

   To support multi-homing with segmentation, ESI labels SHOULD be
   allocated from the "Domain-wide Common Block" (DCB)
   [I-D.ietf-bess-mvpn-evpn-aggregation-label] for all tunnel types,
   including Ingress Replication.  Via means outside the scope of this
   document, PEs know that ESI labels are from the DCB, and then
   existing multi-homing procedures work as is (whether or not a
   multi-homed Ethernet Segment spans segmentation regions).

   Not using DCB-allocated ESI labels is outside the scope of this
   document.

8.  IANA Considerations

   IANA has temporarily assigned the following new EVPN route types in
   the EVPN Route Types registry:

   o  9 - Per-Region I-PMSI A-D route

   o  10 - S-PMSI A-D route

   o  11 - Leaf A-D route

   This document requests IANA to assign one flag bit from the EVPN
   Multicast Flags Extended Community Flags registry to be created in
   [I-D.ietf-bess-evpn-igmp-mld-proxy]:

   o  Bit-S - Segmentation Procedure Support

9.  Security Considerations

   The Selective Forwarding procedures via S-PMSI/Leaf A-D routes in
   this document are based on the same procedures for MVPN [RFC6513]
   [RFC6514] and VPLS Multicast [RFC7117].  The tunnel segmentation
   procedures in this document are based on similar procedures for
   MVPN inter-AS [RFC6514] and inter-area [RFC7524] tunnel
   segmentation, and procedures for VPLS Multicast [RFC7117] inter-AS
   tunnel segmentation.
   When applied to EVPN, they do not introduce new security concerns
   beyond those discussed in [RFC6513], [RFC6514], [RFC7117], and
   [RFC7524].  They also do not introduce new security concerns
   compared to [RFC7432].

10.  Acknowledgements

   The authors thank Eric Rosen, John Drake, and Ron Bonica for their
   comments and suggestions.

11.  Contributors

   The following also contributed to this document through their
   earlier work in EVPN selective multicast.

   Junlin Zhang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: jackey.zhang@huawei.com

   Zhenbin Li
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: lizhenbin@huawei.com

12.  References

12.1.  Normative References

   [I-D.ietf-bess-evpn-igmp-mld-proxy]
              Sajassi, A., Thoria, S., Mishra, M., Drake, J., and W.
              Lin, "IGMP and MLD Proxy for EVPN", draft-ietf-bess-evpn-
              igmp-mld-proxy-14 (work in progress), October 2021.

   [I-D.ietf-bess-mvpn-evpn-aggregation-label]
              Zhang, Z., Rosen, E., Lin, W., Li, Z., and I. Wijnands,
              "MVPN/EVPN Tunnel Aggregation with Common Labels", draft-
              ietf-bess-mvpn-evpn-aggregation-label-06 (work in
              progress), April 2021.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7117]  Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and
              C. Kodeboniya, "Multicast in Virtual Private LAN Service
              (VPLS)", RFC 7117, DOI 10.17487/RFC7117, February 2014.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC7524]  Rekhter, Y., Rosen, E., Aggarwal, R., Morin, T.,
              Grosclaude, I., Leymann, N., and S. Saad, "Inter-Area
              Point-to-Multipoint (P2MP) Segmented Label Switched Paths
              (LSPs)", RFC 7524, DOI 10.17487/RFC7524, May 2015.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8534]  Dolganow, A., Kotalwar, J., Rosen, E., Ed., and Z. Zhang,
              "Explicit Tracking with Wildcard Routes in Multicast VPN",
              RFC 8534, DOI 10.17487/RFC8534, February 2019.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
              VPN Designated Forwarder Election Extensibility",
              RFC 8584, DOI 10.17487/RFC8584, April 2019.

12.2.  Informative References

   [I-D.ietf-bess-evpn-prefix-advertisement]
              Rabadan, J., Henderickx, W., Drake, J. E., Lin, W., and A.
              Sajassi, "IP Prefix Advertisement in Ethernet VPN (EVPN)",
              draft-ietf-bess-evpn-prefix-advertisement-11 (work in
              progress), May 2018.

   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
              R., Patel, K., and J. Guichard, "Constrained Route
              Distribution for Border Gateway Protocol/MultiProtocol
              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
              Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684,
              November 2006.

   [RFC4875]  Aggarwal, R., Ed., Papadimitriou, D., Ed., and S.
              Yasukawa, Ed., "Extensions to Resource Reservation
              Protocol - Traffic Engineering (RSVP-TE) for Point-to-
              Multipoint TE Label Switched Paths (LSPs)", RFC 4875,
              DOI 10.17487/RFC4875, May 2007.

   [RFC6388]  Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B.
              Thomas, "Label Distribution Protocol Extensions for Point-
              to-Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6388, DOI 10.17487/RFC6388, November 2011.

   [RFC7988]  Rosen, E., Ed., Subramanian, K., and Z. Zhang, "Ingress
              Replication Tunnels in Multicast VPN", RFC 7988,
              DOI 10.17487/RFC7988, October 2016.

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net

   Wen Lin
   Juniper Networks

   EMail: wlin@juniper.net

   Jorge Rabadan
   Nokia

   EMail: jorge.rabadan@nokia.com

   Keyur Patel
   Arrcus

   EMail: keyur@arrcus.com

   Ali Sajassi
   Cisco Systems

   EMail: sajassi@cisco.com