idnits 2.17.1

draft-ietf-bess-evpn-bum-procedure-updates-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 1 character in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 18, 2019) is 1621 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-21) exists of
     draft-ietf-bess-evpn-igmp-mld-proxy-04

     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

BESS                                                            Z. Zhang
Internet-Draft                                                    W. Lin
Updates: 7432 (if approved)                             Juniper Networks
Intended status: Standards Track                              J. Rabadan
Expires: May 21, 2020                                              Nokia
                                                                K. Patel
                                                                  Arrcus
                                                              A. Sajassi
                                                           Cisco Systems
                                                       November 18, 2019

                     Updates on EVPN BUM Procedures
             draft-ietf-bess-evpn-bum-procedure-updates-08

Abstract

   This document specifies procedure updates for broadcast, unknown
   unicast, and multicast (BUM) traffic in Ethernet VPNs (EVPN),
   including selective multicast, and provider tunnel segmentation.
   This document updates RFC 7432.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 21, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
     2.1.  Reasons for Tunnel Segmentation
   3.  Additional Route Types of EVPN NLRI
     3.1.  Per-Region I-PMSI A-D route
     3.2.  S-PMSI A-D route
     3.3.  Leaf-AD route
   4.  Selective Multicast
   5.  Inter-AS Segmentation
     5.1.  Changes to Section 7.2.2 of [RFC7117]
     5.2.  I-PMSI Leaf Tracking
     5.3.  Backward Compatibility
       5.3.1.  Designated ASBR Election
   6.  Inter-Region Segmentation
     6.1.  Area/AS vs. Region
     6.2.  Per-region Aggregation
     6.3.  Use of S-NH-EC
     6.4.  Ingress PE's I-PMSI Leaf Tracking
   7.  Multi-homing Support
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Contributors
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Terminology

   It is expected that the audience is familiar with EVPN and MVPN
   concepts and terminology.  For convenience, the following terms are
   briefly explained.

   o  PMSI: P-Multicast Service Interface - a conceptual interface for a
      PE to send customer multicast traffic to all or some PEs in the
      same VPN.

   o  I-PMSI: Inclusive PMSI - to all PEs in the same VPN.

   o  S-PMSI: Selective PMSI - to some of the PEs in the same VPN.

   o  Leaf A-D routes: Used for explicit leaf tracking.  Triggered by
      S-PMSI A-D routes and targeted at the triggering route's
      originator.

   o  IMET A-D route: Inclusive Multicast Ethernet Tag A-D route.  The
      EVPN equivalent of MVPN Intra-AS I-PMSI A-D route.

   o  SMET A-D route: Selective Multicast Ethernet Tag A-D route.  The
      EVPN equivalent of MVPN Leaf A-D route but unsolicited and
      untargeted.

2.  Introduction

   [RFC7432] specifies procedures to handle broadcast, unknown unicast,
   and multicast (BUM) traffic in Sections 11, 12, and 16, using the
   Inclusive Multicast Ethernet Tag Route.  Many details are deferred
   to [RFC7117] (VPLS Multicast).  In particular, selective multicast is
   briefly mentioned for Ingress Replication but deferred to [RFC7117].

   [RFC7117] specifies procedures for using both inclusive tunnels and
   selective tunnels, similar to MVPN procedures specified in [RFC6513]
   and [RFC6514].  A new SAFI "MCAST-VPLS" is introduced, with two types
   of NLRIs that match MVPN's S-PMSI A-D routes and Leaf A-D routes.
   The same procedures can be applied to EVPN selective multicast for
   both Ingress Replication and other tunnel types, but new route types
   need to be defined under the same EVPN SAFI.

   MVPN uses terms I-PMSI and S-PMSI A-D Routes.
   For consistency and
   convenience, this document will use the same I/S-PMSI terms for VPLS
   and EVPN.  In particular, EVPN's Inclusive Multicast Ethernet Tag
   Route and VPLS's VPLS A-D route carrying a PTA (PMSI Tunnel
   Attribute) for BUM traffic will both be referred to as I-PMSI A-D
   routes.  Depending on the context, the terms may be used
   interchangeably.

   MVPN provider tunnels and EVPN/VPLS BUM provider tunnels, which are
   referred to as MVPN/EVPN/VPLS provider tunnels in this document for
   simplicity, can be segmented for technical or administrative reasons,
   which are summarized in Section 2.1 of this document.  [RFC6513] and
   [RFC6514] cover MVPN inter-AS segmentation, [RFC7117] covers VPLS
   multicast inter-AS segmentation, and [RFC7524] (Seamless MPLS
   Multicast) covers inter-area segmentation for both MVPN and VPLS.

   There is a difference between MVPN and VPLS multicast inter-AS
   segmentation.  For simplicity, EVPN will use the same procedures as
   in MVPN.  All ASBRs can re-advertise their choice of the best route.
   Each can become the root of its intra-AS segment and inject traffic
   it receives from its upstream, while each downstream PE/ASBR will
   only pick one of the upstream ASBRs as its upstream.  This is also
   the behavior for VPLS in the case of inter-area segmentation.

   For inter-area segmentation, [RFC7524] requires the use of the
   Inter-area P2MP Segmented Next-Hop Extended Community (S-NH-EC), and
   the setting of the "Leaf Information Required" (LIR) flag in the PTA
   in certain situations.  Either of these could be optional in the case
   of EVPN.  Removing these requirements would make the segmentation
   procedures transparent to ingress and egress PEs.

   [RFC7524] assumes that segmentation happens at area borders.
   However, it could be at "regional" borders, where a region could be a
   sub-area, or even an entire AS plus its external links (Section 6).
   That would allow for more flexible deployment scenarios (e.g., for
   single-area provider networks).

   This document specifies, clarifies, and redefines certain EVPN BUM
   procedures and adds others, with the goal of aligning them better
   across MVPN, EVPN, and VPLS.  For brevity, only changes/additions to
   relevant [RFC7117] and [RFC7524] procedures are specified, instead of
   repeating the entire procedures.  Note that these changes apply to
   EVPN only and do not update [RFC7117] or [RFC7524].

2.1.  Reasons for Tunnel Segmentation

   Tunnel segmentation may be required and/or desired for
   administrative and/or technical reasons.

   For example, an MVPN/VPLS/EVPN network may span multiple providers
   and Inter-AS Option-B has to be used, in which the end-to-end
   provider tunnels have to be segmented at and stitched by the ASBRs.
   Different providers may use different tunnel technologies (e.g.,
   provider A uses Ingress Replication [RFC7988], provider B uses RSVP-
   TE P2MP [RFC4875] while provider C uses mLDP [RFC6388]).  Even if
   they use the same tunnel technology like RSVP-TE P2MP, it may be
   impractical to set up the tunnels across provider boundaries.

   The same situations may apply between the ASes and/or areas of a
   single provider.  For example, the backbone area may use RSVP-TE P2MP
   tunnels while non-backbone areas may use mLDP tunnels.

   Segmentation can also be used to divide an AS/area into smaller
   regions, so that control plane state and/or forwarding plane state/
   burden can be limited to that of individual regions.  For example,
   instead of Ingress Replicating to 100 PEs in the entire AS, with
   inter-area segmentation [RFC7524] a PE only needs to replicate to
   local PEs and ABRs.  The ABRs will further replicate to their
   downstream PEs and ABRs.  This not only reduces the forwarding plane
   burden, but also reduces the leaf tracking burden in the control
   plane.
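
   To make the replication saving concrete, the following sketch counts
   the copies an ingress PE sends with and without segmentation.  The
   topology (100 PEs spread evenly over five areas, two ABRs per area)
   and the function names are assumptions chosen purely for
   illustration; they are not taken from [RFC7524] or from the
   procedures in this document.

```python
def ingress_copies_unsegmented(total_pes: int) -> int:
    """Copies sent by an ingress PE that replicates to every other PE."""
    return total_pes - 1


def ingress_copies_segmented(local_pes: int, local_abrs: int) -> int:
    """Copies sent by an ingress PE that replicates only inside its area.

    The ingress PE reaches the other PEs of its own area plus the area's
    ABRs; the ABRs replicate further to their downstream PEs and ABRs.
    """
    return (local_pes - 1) + local_abrs


# Hypothetical topology: 100 PEs, 5 areas of 20 PEs each, 2 ABRs per area.
flat = ingress_copies_unsegmented(100)
segmented = ingress_copies_segmented(local_pes=20, local_abrs=2)
print(flat, segmented)
```

   Under these assumed numbers the ingress PE's replication list, and
   the number of leaves it has to track, shrinks from 99 to 21.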

   Smaller regions also have the benefit that, in case of tunnel
   aggregation, it is easier to find congruence among the segments of
   different constituent (service) tunnels and the resulting aggregation
   (base) tunnel in a region.  This leads to better bandwidth
   efficiency, because the more congruent they are, the fewer leaves of
   the base tunnel need to discard traffic when a service tunnel's
   segment does not need to receive the traffic (yet it is receiving the
   traffic due to aggregation).

   Another advantage of smaller regions is smaller BIER sub-domains.
   In BIER [RFC8279], a newer multicast architecture, packets carry a
   BitString, in which the bits correspond to edge routers that need to
   receive traffic.  Smaller sub-domains mean that smaller BitStrings
   can be used without having to send multiple copies of the same
   packet.

3.  Additional Route Types of EVPN NLRI

   [RFC7432] defines the format of EVPN NLRI as the following:

                 +-----------------------------------+
                 |    Route Type (1 octet)           |
                 +-----------------------------------+
                 |     Length (1 octet)              |
                 +-----------------------------------+
                 | Route Type specific (variable)    |
                 +-----------------------------------+

   So far eight types have been defined in [RFC7432],
   [I-D.ietf-bess-evpn-prefix-advertisement], and
   [I-D.ietf-bess-evpn-igmp-mld-proxy]:

      + 1 - Ethernet Auto-Discovery (A-D) route
      + 2 - MAC/IP Advertisement route
      + 3 - Inclusive Multicast Ethernet Tag route
      + 4 - Ethernet Segment route
      + 5 - IP Prefix Route
      + 6 - Selective Multicast Ethernet Tag Route
      + 7 - Multicast Join Synch Route
      + 8 - Multicast Leave Synch Route

   This document defines three additional route types:

      + 9 - Per-Region I-PMSI A-D route
      + 10 - S-PMSI A-D route
      + 11 - Leaf A-D route

   The "Route Type specific" field of the type 9 and type 10 EVPN NLRIs
   starts with a type 1 RD, whose Administrator sub-field MUST
   match
   that of the RD in all the EVPN routes from the same advertising
   router for a given EVI, except the Leaf A-D route (Section 3.3).

3.1.  Per-Region I-PMSI A-D route

   The Per-region I-PMSI A-D route has the following format.  Its usage
   is discussed in Section 6.2.

                 +-----------------------------------+
                 |      RD (8 octets)                |
                 +-----------------------------------+
                 |  Ethernet Tag ID (4 octets)       |
                 +-----------------------------------+
                 |  Region ID (8 octets)             |
                 +-----------------------------------+

   The Region ID identifies the region and is encoded in the same way
   as an Extended Community, as detailed in Section 6.2.

3.2.  S-PMSI A-D route

   The S-PMSI A-D route has the following format:

                 +-----------------------------------+
                 |      RD (8 octets)                |
                 +-----------------------------------+
                 |  Ethernet Tag ID (4 octets)       |
                 +-----------------------------------+
                 | Multicast Source Length (1 octet) |
                 +-----------------------------------+
                 |  Multicast Source (Variable)      |
                 +-----------------------------------+
                 |  Multicast Group Length (1 octet) |
                 +-----------------------------------+
                 |  Multicast Group (Variable)       |
                 +-----------------------------------+
                 |Originator's Addr Length (1 octet) |
                 +-----------------------------------+
                 |Originator's Addr (4 or 16 octets) |
                 +-----------------------------------+

   Other than the addition of Ethernet Tag ID and Originator's Addr
   Length, it is identical to the S-PMSI A-D route as defined in
   [RFC7117].  The procedures in [RFC7117] also apply (including
   wildcard functionality), except that the granularity level is per
   Ethernet Tag.

3.3.  Leaf-AD route

   The Route Type specific field of a Leaf A-D route consists of the
   following:

                 +-----------------------------------+
                 |      Route Key (variable)         |
                 +-----------------------------------+
                 |Originator's Addr Length (1 octet) |
                 +-----------------------------------+
                 |Originator's Addr (4 or 16 octets) |
                 +-----------------------------------+

   A Leaf A-D route is originated in response to a PMSI route, which
   could be an Inclusive Multicast Ethernet Tag route, a per-region
   I-PMSI A-D route, an S-PMSI A-D route, or another type of route,
   defined in the future, that triggers Leaf A-D routes.  The Route Key
   is the "Route Type Specific" field of the route for which this Leaf
   A-D route is generated.

   The general procedures of Leaf A-D route are first specified in
   [RFC6514] for MVPN.  The principles apply to VPLS and EVPN as well.
   [RFC7117] has details for VPLS Multicast, and this document points
   out some specifics for EVPN, e.g., in Section 5.

4.  Selective Multicast

   [I-D.ietf-bess-evpn-igmp-mld-proxy] specifies procedures for EVPN
   selective forwarding of IP multicast using SMET routes.  It assumes
   selective forwarding is always used with IR or BIER for all flows.
   An NVE proxies the IGMP/MLD state that it learns on its ACs to
   (C-S,C-G) or (C-*,C-G) SMET routes and advertises them to other
   NVEs, and a receiving NVE converts the SMET routes back to IGMP/MLD
   messages and sends them out of its ACs.  The receiving NVE also uses
   the SMET routes to identify which NVEs need to receive traffic for a
   particular (C-S,C-G) or (C-*,C-G) to achieve selective forwarding
   using IR or BIER.

   With the above procedures, selective forwarding is done for all flows
   and the SMET routes are advertised for all flows.
   It is possible
   that an operator may not want to track all those (C-S,C-G) or
   (C-*,C-G) states on the NVEs, and the multicast traffic pattern
   allows inclusive forwarding for most flows while selective forwarding
   is needed only for a few high-rate flows.  For that, or for tunnel
   types other than IR/BIER, S-PMSI/Leaf A-D procedures defined for
   Selective Multicast for VPLS in [RFC7117] are used.  Except that
   different route types and formats are specified with EVPN SAFI for
   S-PMSI A-D and Leaf A-D routes (Section 3), all procedures in
   [RFC7117] with respect to Selective Multicast apply to EVPN as well,
   including wildcard procedures.  In a nutshell, a source NVE
   advertises S-PMSI A-D routes to announce the tunnels used for certain
   flows, and receiving NVEs either join the announced PIM/mLDP tunnel
   or respond with Leaf A-D routes if the Leaf Information Required
   (LIR) flag is set in the S-PMSI A-D route's PTA (so that the source
   NVE can include them as tunnel leaves).

   An optimization to the [RFC7117] procedures may be applied.  Even if
   a source NVE sets the LIR bit to request Leaf A-D routes, an egress
   NVE MAY omit the Leaf A-D route if it has already advertised a
   corresponding SMET route, and the source NVE MUST use that SMET route
   in lieu of the Leaf A-D route.

   The optional optimizations specified for MVPN in [RFC8534] are also
   applicable to EVPN when the S-PMSI/Leaf A-D route procedures are
   used for EVPN selective multicast forwarding.

5.  Inter-AS Segmentation

5.1.  Changes to Section 7.2.2 of [RFC7117]

   The first paragraph of Section 7.2.2.2 of [RFC7117] says:

   "...  The best route procedures ensure that if multiple
   ASBRs, in an AS, receive the same Inter-AS A-D route from their EBGP
   neighbors, only one of these ASBRs propagates this route in Internal
   BGP (IBGP).
   This ASBR becomes the root of the intra-AS segment of
   the inter-AS tree and ensures that this is the only ASBR that accepts
   traffic into this AS from the inter-AS tree."

   The above VPLS behavior requires complicated VPLS-specific procedures
   for the ASBRs to reach agreement.  For EVPN, a different approach is
   used and the above quoted text is not applicable to EVPN.

   With the different approach for EVPN, each ASBR will re-advertise its
   received Inter-AS A-D route to its IBGP peers and become the root of
   an intra-AS segment of the inter-AS tree.  The intra-AS segment
   rooted at one ASBR is disjoint from another intra-AS segment rooted
   at another ASBR.  This is the same as the procedures for S-PMSI in
   [RFC7117] itself.

   The first bullet in that section does not apply to EVPN.

   The second bullet is changed to the following when applied to EVPN:

   "The PMSI Tunnel attribute MUST specify the tunnel for the segment.
   If and only if, in order to establish the tunnel, the ASBR needs to
   know the leaves of the tree, then the ASBR MUST set the LIR flag to 1
   in the PTA to trigger Leaf A-D routes from egress PEs and downstream
   ASBRs.  It MUST be (auto-)configured with an import RT, which
   controls acceptance of Leaf A-D routes by the ASBR."

   Accordingly, the following paragraph in Section 7.2.2.4 of
   [RFC7117]:

   "If the received Inter-AS A-D route carries the PMSI Tunnel attribute
   with the Tunnel Identifier set to RSVP-TE P2MP LSP, then the ASBR
   that originated the route MUST establish an RSVP-TE P2MP LSP with the
   local PE/ASBR as a leaf.  This LSP MAY have been established before
   the local PE/ASBR receives the route, or it MAY be established after
   the local PE receives the route."

   is changed to the following when applied to EVPN:

   "If the received Inter-AS A-D route has the LIR flag set in its PTA,
   then a receiving PE MUST originate a corresponding Leaf A-D route,
   while a receiving ASBR MUST originate a corresponding Leaf A-D route
   if and only if it received and imported one or more corresponding
   Leaf A-D routes from its downstream IBGP or EBGP peers, or it has
   non-null downstream forwarding state for the PIM/mLDP tunnel that
   instantiates its downstream intra-AS segment.  The targeted ASBR for
   the Leaf A-D route, which (re-)advertised the Inter-AS A-D route,
   MUST establish a tunnel to the leaves discovered by the Leaf A-D
   routes."

5.2.  I-PMSI Leaf Tracking

   An ingress PE does not set the LIR flag in its Inclusive Multicast
   Ethernet Tag (IMET) A-D route's PTA, even with Ingress Replication or
   RSVP-TE P2MP tunnels.  It does not rely on the Leaf A-D routes to
   discover leaves in its AS, and Section 11.2 of [RFC7432] explicitly
   states that the LIR flag must be set to zero.

   An implementation of [RFC7432] might have used the Originating
   Router's IP Address field of the IMET A-D routes to determine the
   leaves, or might have used the Next Hop field instead.  Within the
   same AS, both will lead to the same result.

   With segmentation, an ingress PE MUST determine the leaves in its AS
   from the BGP next hops in all its received IMET A-D routes, so it
   does not have to set the LIR bit to request Leaf A-D routes.  PEs
   within the same AS will all have different next hops in their IMET
   A-D routes (hence will all be considered as leaves), and PEs from
   other ASes will have the next hop in their IMET A-D routes set to
   addresses of ASBRs in this local AS, hence only those ASBRs will be
   considered as leaves (as proxies for those PEs in other ASes).
   Note
   that in case of Ingress Replication, when an ASBR re-advertises IMET
   A-D routes to IBGP peers, it MUST advertise the same label for all
   of those routes that have the same Ethernet Tag ID and the same EVI.
   When an ingress PE builds its flooding list, multiple routes might
   have the same (nexthop, label) tuple, and they MUST be added as only
   a single branch in the flooding list.

5.3.  Backward Compatibility

   The above procedures assume that all PEs are upgraded to support the
   segmentation procedures:

   o  An ingress PE uses the Next Hop instead of Originating Router's IP
      Address to determine leaves for the I-PMSI tunnel.

   o  An egress PE sends Leaf A-D routes in response to I-PMSI routes,
      if the PTA has the LIR flag set (by the re-advertising ASBRs).

   o  In case of Ingress Replication, when an ingress PE builds its
      flooding list, multiple I-PMSI routes may have the same (nexthop,
      label) tuple and only a single branch for those will be added in
      the flooding list.

   If a deployment has legacy PEs that do not support the above, then
   a legacy ingress PE would include all PEs (including those in remote
   ASes) as leaves of the inclusive tunnel and try to send traffic to
   them directly (no segmentation), which is either undesired or not
   possible; a legacy egress PE would not send Leaf A-D routes so the
   ASBRs would not know to send external traffic to them.

   To address this backward compatibility problem, the following
   procedure can be used (see Section 6.2 for per-PE/AS/region I-PMSI
   A-D routes):

   o  An upgraded PE indicates in its per-PE I-PMSI A-D route that it
      supports the new procedures.  This is done by setting a flag bit
      in the EVPN Multicast Flags Extended Community.

   o  All per-PE I-PMSI A-D routes are restricted to the local AS and
      not propagated to external peers.

   o  The ASBRs in an AS originate per-region I-PMSI A-D routes and
      advertise them to their external peers to announce the tunnels
      used to carry traffic from the local AS to other ASes.  Depending
      on the types of tunnels being used, the LIR flag in the PTA may be
      set, in which case the downstream ASBRs and upgraded PEs will send
      Leaf A-D routes to pull traffic from their upstream ASBRs.  In a
      particular downstream AS, one of the ASBRs is elected, based on
      the per-region I-PMSI A-D routes for a particular source AS, to
      send traffic from that source AS to legacy PEs in the downstream
      AS.  The traffic arrives at the elected ASBR on the tunnel
      announced in the best per-region I-PMSI A-D route for the source
      AS, which the ASBR has selected from all those that it received
      over EBGP or IBGP sessions.  The election procedure is described
      in Section 5.3.1.

   o  In an ingress/upstream AS, if and only if an ASBR has active
      downstream receivers (PEs and ASBRs), which are learned either
      explicitly via Leaf A-D routes or implicitly via PIM join or mLDP
      label mapping, the ASBR originates a per-PE I-PMSI A-D route
      (i.e., regular Inclusive Multicast Ethernet Tag route) into the
      local AS, and stitches incoming per-PE I-PMSI tunnels into its
      per-region I-PMSI tunnel.  With this, it gets traffic from local
      PEs and sends it to other ASes via the tunnel announced in its
      per-region I-PMSI A-D route.

   Note that, even if there is no backward compatibility issue, the use
   of per-region I-PMSI has the benefit of keeping all per-PE I-PMSI A-D
   routes in their local ASes, greatly reducing the flooding of the
   routes and their corresponding Leaf A-D routes (when needed), and the
   number of inter-AS tunnels.

5.3.1.  Designated ASBR Election

   When an ASBR re-advertises a per-region I-PMSI A-D route into an AS
   in which a designated ASBR needs to be used to forward traffic to the
   legacy PEs in the AS, it SHOULD include a DF Election EC.  The EC and
   its use are specified in [RFC8584].  The AC-DF bit in the DF Election
   EC SHOULD be cleared.  If it is known that no legacy PEs exist in the
   AS, the ASBR SHOULD NOT include the EC and SHOULD remove the DF
   Election EC if one is carried in the per-region I-PMSI A-D routes
   that it receives.  Note that this is done for each set of per-region
   I-PMSI A-D routes with the same NLRI.

   Based on the procedures in [RFC8584], an election algorithm is
   determined according to the DF Election ECs carried in the set of
   per-region I-PMSI routes of the same NLRI re-advertised into the AS.
   The algorithm is then applied to a candidate list, which is the set
   of ASBRs that re-advertised the per-region I-PMSI routes of the same
   NLRI carrying the DF Election EC.

6.  Inter-Region Segmentation

6.1.  Area/AS vs. Region

   [RFC7524] is for MVPN/VPLS inter-area segmentation and does not
   explicitly cover EVPN.  However, if "area" is replaced by "region"
   and "ABR" is replaced by "RBR" (Regional Border Router), then
   everything still works and can be applied to EVPN as well.

   A region can be a sub-area, or can be an entire AS including its
   external links.  Instead of automatic region definition based on IGP
   areas, a region would be defined as a BGP peer group.  In fact, even
   with IGP area based region definition, a BGP peer group listing the
   PEs and ABRs in an area is still needed.

   Consider the following example diagram:

          ---------           ------             ---------
         /         \         /      \           /         \
        /           \       /        \         /           \
       | PE1 o    ASBR1 -- ASBR2    ASBR3 -- ASBR4    o PE2 |
        \           /       \        /         \           /
         \         /         \      /           \         /
          ---------           ------             ---------
            AS 100            AS 200               AS 300
       |-----------|--------|---------|--------|------------|
         segment1   segment2  segment3 segment4  segment5

   The inter-AS segmentation procedures specified so far ([RFC6513]
   [RFC6514], [RFC7117], and Section 5 of this document) require all
   ASBRs to be involved, and Ingress Replication is used between two
   ASBRs in different ASes.

   In the above diagram, it's possible that ASBR1/4 do not support
   segmentation, and the provider tunnels in AS 100/300 can actually
   extend across the external link.  In this case, the inter-region
   segmentation procedures can be used instead: a region is the entire
   (AS100 + ASBR1-ASBR2 link) or (AS300 + ASBR3-ASBR4 link).  ASBR2/3
   would be the RBRs, and ASBR1/4 would just be transit core routers
   with respect to provider tunnels.

   As illustrated in the diagram below, ASBR2/3 will establish a
   multihop EBGP session either with an RR or directly with PEs in the
   neighboring AS.  I/S-PMSI A-D routes from ingress PEs will not be
   processed by ASBR1/4.  When ASBR2 re-advertises the routes into AS
   200, it changes the next hop to its own address and changes the PTA
   to specify the tunnel type/identification in its own AS.  When ASBR3
   re-advertises I/S-PMSI A-D routes into the neighboring AS 300, it
   changes the next hop to its own address and changes the PTA to
   specify the tunnel type/identification in the neighboring region 3.
   Now the segment is rooted at ASBR3 and extends across the external
   link to PEs.

          ---------           ------             ---------
         / RR....\.mh-ebgp   /      \   mh-ebgp/....RR     \
        /    :    \      `. /        \ .'     /     :       \
       | PE1 o    ASBR1 -- ASBR2    ASBR3 -- ASBR4    o PE2 |
        \           /       \        /         \           /
         \         /         \      /           \         /
          ---------           ------             ---------
            AS 100            AS 200               AS 300
       |-------------------|----------|---------------------|
              segment 1      segment 2      segment 3

6.2.  Per-region Aggregation

   Notice that every I/S-PMSI route from each PE will be propagated
   throughout all the ASes or regions.  They may also trigger
   corresponding Leaf A-D routes depending on the types of tunnels used
   in each region.  This may result in too many routes and corresponding
   tunnels.  To address this concern, the I-PMSI routes from all PEs in
   an AS/region can be aggregated into a single I-PMSI route originated
   from the RBRs, and traffic from all those individual I-PMSI tunnels
   will be switched into the single I-PMSI tunnel.  This is like the
   MVPN Inter-AS I-PMSI route originated by ASBRs.

   The MVPN Inter-AS I-PMSI A-D route would be better called a per-AS
   I-PMSI A-D route, to be compared against the (per-PE) Intra-AS I-PMSI
   A-D routes originated by each PE.  In this document we will call it
   a per-region I-PMSI A-D route, in case we want to apply the
   aggregation at regional level.  The per-PE I-PMSI routes will not be
   propagated to other regions.  If multiple RBRs are connected to a
   region, then each will advertise such a route, with the same route
   key (Section 3.1).  Similar to the per-PE I-PMSI A-D routes, RBRs/PEs
   in a downstream region will each select a best one from all those re-
   advertised by the upstream RBRs, hence will only receive traffic
   injected by one of them.

   MVPN does not aggregate S-PMSI routes from all PEs in an AS like it
   does for I-PMSI routes, because the number of PEs that will
   advertise S-PMSI routes for the same (s,g) or (*,g) is small.  This
   is also the case for EVPN, i.e., there are no per-region S-PMSI
   routes.
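
   As an illustration of the route key that the RBRs of one region
   share, the following sketch packs a per-region I-PMSI A-D route NLRI
   using the field layout of Section 3.1, with the Region ID carried as
   a Transitive Two-Octet AS-Specific Extended Community of sub-type
   0x09 (Source AS).  It is a minimal sketch and not a reference
   implementation: the function names, the RD value 192.0.2.1:10, the
   Ethernet Tag ID 0, and AS 100 are assumptions made up for the
   example, and route type 9 is the temporary assignment listed in this
   document.

```python
import ipaddress
import struct

# Route type value as (temporarily) assigned in this document.
ROUTE_TYPE_PER_REGION_IPMSI = 9


def encode_rd_type1(ipv4: str, assigned_number: int) -> bytes:
    """Type 1 RD: 2-octet type (1), 4-octet IPv4 address, 2-octet number."""
    return struct.pack(
        "!H4sH", 1, ipaddress.IPv4Address(ipv4).packed, assigned_number
    )


def encode_region_id_two_octet_as(asn: int) -> bytes:
    """Region ID as a Transitive Two-Octet AS-Specific EC, sub-type 0x09.

    Layout: type 0x00, sub-type 0x09, 2-octet Global Administrator (the
    AS number), 4-octet Local Administrator (0).
    """
    return struct.pack("!BBHI", 0x00, 0x09, asn, 0)


def encode_per_region_ipmsi_nlri(
    rd: bytes, ethernet_tag: int, region_id: bytes
) -> bytes:
    """Route Type, Length, then RD + Ethernet Tag ID + Region ID."""
    if len(rd) != 8 or len(region_id) != 8:
        raise ValueError("RD and Region ID must each be 8 octets")
    body = rd + struct.pack("!I", ethernet_tag) + region_id
    return struct.pack("!BB", ROUTE_TYPE_PER_REGION_IPMSI, len(body)) + body


# Hypothetical values: RD 192.0.2.1:10, Ethernet Tag ID 0, region = AS 100.
rd = encode_rd_type1("192.0.2.1", 10)
region_id = encode_region_id_two_octet_as(100)
nlri = encode_per_region_ipmsi_nlri(rd, 0, region_id)
print(nlri.hex())
```

   Because the RD's Administrator sub-field differs per advertising
   router, two RBRs of the same region would produce the same Ethernet
   Tag ID and Region ID but different RDs in this sketch.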

   Notice that per-region I-PMSI routes can also be used to address
   backward compatibility issues, as discussed in Section 5.3.

   The Region ID in the per-region I-PMSI route's NLRI is encoded like
   an EC.  For example, the Region ID can encode an AS number or area ID
   in the following EC format:

   o  For a two-octet AS number, a Transitive Two-Octet AS-Specific EC
      of sub-type 0x09 (Source AS), with the Global Administrator sub-
      field set to the AS number and the Local Administrator sub-field
      set to 0.

   o  For a four-octet AS number, a Transitive Four-Octet AS-Specific EC
      of sub-type 0x09 (Source AS), with the Global Administrator sub-
      field set to the AS number and the Local Administrator sub-field
      set to 0.

   o  For an area ID, a Transitive IPv4-Address-Specific EC of any sub-
      type, with the Global Administrator sub-field set to the area ID
      and the Local Administrator sub-field set to 0.

   Other EC encodings MAY be used as long as they uniquely identify the
   region and the RBRs for the same region use the same Region ID.

6.3.  Use of S-NH-EC

   [RFC7524] specifies the use of S-NH-EC because it does not allow ABRs
   to change the BGP next hop when they re-advertise I/S-PMSI A-D routes
   to downstream areas.  That is only to be consistent with the MVPN
   Inter-AS I-PMSI A-D routes, whose next hop must not be changed when
   they're re-advertised by the segmenting ABRs for reasons specific to
   MVPN.  For EVPN, it is perfectly fine to change the next hop when
   RBRs re-advertise the I/S-PMSI A-D routes, instead of relying on
   S-NH-EC.  As a result, this document specifies that RBRs change the
   BGP next hop when they re-advertise I/S-PMSI A-D routes and do not
   use S-NH-EC.
   If a downstream PE/RBR needs to originate Leaf A-D routes, it
   constructs an IP-based Route Target Extended Community by placing
   the IP address carried in the Next Hop of the received I/S-PMSI A-D
   route in the Global Administrator field of the Community, with the
   Local Administrator field of this Community set to 0, and sets the
   Extended Communities attribute of the Leaf A-D route to that
   Community.

   The advantage of this is that neither ingress nor egress PEs need to
   understand or use S-NH-EC, and a consistent procedure (based on the
   BGP next hop) is used for both inter-AS and inter-region
   segmentation.

6.4. Ingress PE's I-PMSI Leaf Tracking

   [RFC7524] specifies that when an ingress PE/ASBR (re-)advertises a
   VPLS I-PMSI A-D route, it sets the LIR flag to 1 in the route's PTA.
   As in the inter-AS case, this is not actually needed for EVPN. To be
   consistent with the inter-AS case, the ingress PE does not set the
   LIR flag in its originated I-PMSI A-D routes, and determines the
   leaves based on the BGP next hops in its received I-PMSI A-D routes,
   as specified in Section 5.2.

   The same backward compatibility issue exists, and the same solution
   as in the inter-AS case applies, as specified in Section 5.3.

7. Multi-homing Support

   If multi-homing does not span different ASes or regions, existing
   procedures work with segmentation, and a segmentation point will
   remove the ESI label from the packets. If an ES is multi-homed to
   PEs in different ASes or regions, additional procedures are needed
   to work with segmentation. The procedures are well understood but
   omitted here until the requirement becomes clear.

8.
IANA Considerations

   IANA has temporarily assigned the following new EVPN route types:

   o  9 - Per-Region I-PMSI A-D route

   o  10 - S-PMSI A-D route

   o  11 - Leaf A-D route

   This document requests IANA to assign one flag bit from the EVPN
   Multicast Flags Extended Community:

   o  Bit-S - The router supports the segmentation procedure defined in
      this document

9. Security Considerations

   This document does not seem to introduce new security risks, though
   this may be revised after further review and scrutiny.

10. Acknowledgements

   The authors thank Eric Rosen, John Drake, and Ron Bonica for their
   comments and suggestions.

11. Contributors

   The following also contributed to this document through their
   earlier work in EVPN selective multicast.

   Junlin Zhang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: jackey.zhang@huawei.com

   Zhenbin Li
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: lizhenbin@huawei.com

12. References

12.1. Normative References

   [I-D.ietf-bess-evpn-igmp-mld-proxy]
              Sajassi, A., Thoria, S., Patel, K., Drake, J., and W.
              Lin, "IGMP and MLD Proxy for EVPN",
              draft-ietf-bess-evpn-igmp-mld-proxy-04 (work in
              progress), September 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7117]  Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and
              C. Kodeboniya, "Multicast in Virtual Private LAN Service
              (VPLS)", RFC 7117, DOI 10.17487/RFC7117, February 2014.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC7524]  Rekhter, Y., Rosen, E., Aggarwal, R., Morin, T.,
              Grosclaude, I., Leymann, N., and S.
              Saad, "Inter-Area Point-to-Multipoint (P2MP) Segmented
              Label Switched Paths (LSPs)", RFC 7524,
              DOI 10.17487/RFC7524, May 2015.

   [RFC7988]  Rosen, E., Ed., Subramanian, K., and Z. Zhang, "Ingress
              Replication Tunnels in Multicast VPN", RFC 7988,
              DOI 10.17487/RFC7988, October 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8534]  Dolganow, A., Kotalwar, J., Rosen, E., Ed., and Z. Zhang,
              "Explicit Tracking with Wildcard Routes in Multicast
              VPN", RFC 8534, DOI 10.17487/RFC8534, February 2019.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S. Sathappan, "Framework for
              Ethernet VPN Designated Forwarder Election
              Extensibility", RFC 8584, DOI 10.17487/RFC8584, April
              2019.

12.2. Informative References

   [I-D.ietf-bess-evpn-prefix-advertisement]
              Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A.
              Sajassi, "IP Prefix Advertisement in EVPN",
              draft-ietf-bess-evpn-prefix-advertisement-11 (work in
              progress), May 2018.

   [RFC4875]  Aggarwal, R., Ed., Papadimitriou, D., Ed., and S.
              Yasukawa, Ed., "Extensions to Resource Reservation
              Protocol - Traffic Engineering (RSVP-TE) for
              Point-to-Multipoint TE Label Switched Paths (LSPs)",
              RFC 4875, DOI 10.17487/RFC4875, May 2007.

   [RFC6388]  Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B.
              Thomas, "Label Distribution Protocol Extensions for
              Point-to-Multipoint and Multipoint-to-Multipoint Label
              Switched Paths", RFC 6388, DOI 10.17487/RFC6388,
              November 2011.

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in
              MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513,
              February 2012.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y.
              Rekhter, "BGP Encodings and Procedures for Multicast in
              MPLS/BGP IP VPNs", RFC 6514, DOI 10.17487/RFC6514,
              February 2012.

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net

   Wen Lin
   Juniper Networks

   EMail: wlin@juniper.net

   Jorge Rabadan
   Nokia

   EMail: jorge.rabadan@nokia.com

   Keyur Patel
   Arrcus

   EMail: keyur@arrcus.com

   Ali Sajassi
   Cisco Systems

   EMail: sajassi@cisco.com