idnits 2.17.1 draft-wang-bess-evpn-cmac-overload-reduction-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC7623], [RFC8986], [I-D.ietf-bess-srv6-services]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 449: '... MUST be the same as the Attachment ...' RFC 2119 keyword, line 661: '...e End.DX2AGG SID MUST be the last segm...' RFC 2119 keyword, line 919: '... PE1 MUST use RT-2 route RT2S (RT-2 ...' RFC 2119 keyword, line 925: '...ce. When PE3 receives RT2S, RT2S MUST...' RFC 2119 keyword, line 937: '...by PE1, the RT2S MUST be withdrawn, th...' (1 more instance...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: When PE2 receives RT2S, the MAC entry Mx should be installed with AC2 as its actual outgoing-interface. 
When PE3 receives RT2S, RT2S MUST not be imported into VN-10 because that the ES-Import RT of RT2S can be resolved to a local ES of PE3. -- The document date (30 August 2021) is 968 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC7796' is mentioned on line 346, but not defined == Unused Reference: 'RFC7041' is defined on line 1170, but no explicit reference was found in the text == Unused Reference: 'RFC8365' is defined on line 1183, but no explicit reference was found in the text == Outdated reference: A later version (-21) exists of draft-ietf-bess-evpn-igmp-mld-proxy-12 == Outdated reference: A later version (-21) exists of draft-ietf-bess-evpn-unequal-lb-14 == Outdated reference: A later version (-15) exists of draft-ietf-bess-srv6-services-07 == Outdated reference: A later version (-06) exists of draft-sajassi-bess-evpn-ac-aware-bundling-04 == Outdated reference: A later version (-01) exists of draft-wang-bess-evpn-ac-df-per-evi-00 == Outdated reference: A later version (-17) exists of draft-ietf-bess-evpn-irb-extended-mobility-05 Summary: 2 errors (**), 0 flaws (~~), 11 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 BESS WG Y. Wang 3 Internet-Draft R. Chen 4 Intended status: Standards Track ZTE Corporation 5 Expires: 3 March 2022 30 August 2021 7 Light Weighted EVPN 8 draft-wang-bess-evpn-cmac-overload-reduction-07 10 Abstract 12 SRv6 EVPN [I-D.ietf-bess-srv6-services] is not sufficient for some 13 light-weighted use cases. 
When PBB EVPN [RFC7623] over SRv6 is used 14 to support these light-weighted EVPN services, it is complicated to 15 make use of the SID list to carry a function that is aimed at 16 C-MACs. 18 [RFC8986] defines the End.DX2 function, which can be used 19 in EVPN VPLS. When it is used in EVPN VPLS, the data-plane learning 20 defined for the End.DT2U function can also be transplanted into the End.DX2 21 function. On the basis of such an extended End.DX2 function, SRv6 EVPN 22 can be improved to meet all the requirements of [RFC7623] and to bring 23 some other benefits. Such an SRv6 EVPN is called light-weighted SRv6 24 EVPN, and it is simpler than PBB EVPN over SRv6. 26 It is easy for the light-weighted SRv6 EVPN to carry a SID that is 27 aimed at customer Ethernet packets, because there is no other 28 Ethernet header between the SID list and the customer Ethernet 29 header. These SIDs may be user-defined functions for the customer 30 Ethernet headers. 32 Status of This Memo 34 This Internet-Draft is submitted in full conformance with the 35 provisions of BCP 78 and BCP 79. 37 Internet-Drafts are working documents of the Internet Engineering 38 Task Force (IETF). Note that other groups may also distribute 39 working documents as Internet-Drafts. The list of current Internet- 40 Drafts is at https://datatracker.ietf.org/drafts/current/. 42 Internet-Drafts are draft documents valid for a maximum of six months 43 and may be updated, replaced, or obsoleted by other documents at any 44 time. It is inappropriate to use Internet-Drafts as reference 45 material or to cite them other than as "work in progress." 47 This Internet-Draft will expire on 3 March 2022. 49 Copyright Notice 51 Copyright (c) 2021 IETF Trust and the persons identified as the 52 document authors. All rights reserved.
54 This document is subject to BCP 78 and the IETF Trust's Legal 55 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 56 license-info) in effect on the date of publication of this document. 57 Please review these documents carefully, as they describe your rights 58 and restrictions with respect to this document. Code Components 59 extracted from this document must include Simplified BSD License text 60 as described in Section 4.e of the Trust Legal Provisions and are 61 provided without warranty as described in the Simplified BSD License.
63 Table of Contents
65 1. Introduction
66   1.1. Background
67   1.2. Overview
68   1.3. Terminology
69 2. Requirements
70   2.1. No C-MAC Awareness in the Backbone
71   2.2. Flexible Multi-homing Remains Supported
72   2.3. C-MAC Address Learning and Confinement
73   2.4. No C-MAC Flushing for All-Active ESes
74   2.5. Independent C-MAC Flushing for Single-Active ESes
75   2.6. Independent Convergency per
76   2.7. ESI Route Aggregation in Backbone
77   2.8. Unequal load-balance
78   2.9. AC-aware Service Interface
79   2.10. Ingress Filtering for Unicast Flows of E-Tree Services
80   2.11. AC-Influenced DF Election
81   2.12. Synchronous MAC Entries in All-Active Mode
82 3. Light-Weighted SRv6 EVPN Overview
83   3.1. Use Case
84   3.2. Solution Overview
85     3.2.1. Aggregatable End.DX2 SID
86     3.2.2. The Advertisement of ESI-Prefixes
87   3.3. Packet Walkthrough
88     3.3.1. PE1 Forward ARP Request to PE2/PE3
89     3.3.2. PE2/PE3's Dataplane MAC Learning
90     3.3.3. PE2 Discard ARP Request to H1
91     3.3.4. PE3 Forward ARP Reply to PE1/PE2
92     3.3.5. PE1 Forward ARP Reply to H1
93 4. Decapsulation Optimizations
94   4.1. Decapsulation Aggregation
95   4.2. End.DX2AGG Function and Arg.ACI
96 5. Advanced Considerations
97   5.1. ESI SID Aggregation
98   5.2. ESI/AC SID Advertisement Optimization
99     5.2.1. Advertise ESI-Locators in Underlay Network
100     5.2.2. Using EAD/EVI Route to Advertise AC SIDs
101     5.2.3. Using EAD/ES Route to Advertise ESI SIDs
102     5.2.4. The Reduction of EAD/EVI Routes
103       5.2.4.1. AC-DF per EVI Mode for Light-Weighted EVPNs
104       5.2.4.2. On Receiving Reverse EAD/EVI Routes
105   5.3. Unequal LB Advertisement
106   5.4. AC-aware Bundling Service Interface
107   5.5. C-MAC Flush Notification Procedure
108   5.6. E-Tree Support Considerations
109   5.7. MAC-Synchronization in All-Active Mode
110   5.8. EVPN IRB Support Considerations
111     5.8.1. EVPN IRB Extended Mobility
112     5.8.2. Anycast IRB interfaces
113       5.8.2.1. Constructing GW-list
114       5.8.2.2. Flooding over GW-list
115 6. Comparison with Other Solutions
116   6.1. Detailed Comparisons with PBB EVPN over SRv6
117   6.2. Detailed Comparisons with Anycast Node SID
118 7. Security Considerations
119 8. IANA Considerations
120   8.1. End.DX2AGG SID
121 9. Acknowledgements
122 10. Normative References
123 11. Informative References
124 Appendix A. Explanation for Physical Links of the Use-cases
125   A.1. Failure Detections for P1.2 (or P2.1)
126   A.2. Protection Approaches for N1 (or N2)
127     A.2.1. CCC-Approaches
128       A.2.1.1. CCC Active-Active Protection
129       A.2.1.2. CCC Active-Standby Protection
130     A.2.2. VSI-Approaches
131 Authors' Addresses
133 1. Introduction 135 1.1. Background 137 When there are too many customer MACs (C-MACs), the RRs and/or ASBRs 138 will be overloaded by the RT-2 routes advertised for these MACs according to 139 [RFC7432]. This issue can be solved by light-weighted EVPNs. PBB 140 EVPN [RFC7623] is an MPLS-based light-weighted EVPN solution, but in 141 an SRv6 network, PBB EVPN over SRv6 is not a good choice for a 142 light-weighted EVPN solution. 144 This document proposes some new extensions to 145 [I-D.ietf-bess-srv6-services] to achieve all-active mode ES 146 redundancy on TPEs and to reduce the C-MAC load on RRs and ASBRs at 147 the same time. With the help of these extensions, the new solution will work better than PBB 148 EVPN, especially when no 149 MPLS data plane is deployed.
151 Furthermore, it naturally brings the benefits of high scalability, 152 faster network convergence, and reduced operational complexity. 153 Because of these advantages, we call such solutions light-weighted EVPNs. 155 1.2. Overview 157 In [RFC7432], C-MACs are advertised via RT-2 routes. This behavior 158 is inherited by [I-D.ietf-bess-srv6-services]. However, in order to 159 solve the C-MAC overload problem for RRs and ASBRs, we have to return 160 to PBB-like data-plane C-MAC learning procedures. 162 Section 2 discusses all the requirements for a light-weighted EVPN solution 163 that pushes no C-MAC entries into the backbone network. 164 Note that some of these requirements are not well supported by PBB 165 EVPN. 167 In this document, the light-weighted EVPN solutions are also called 168 EVPN-lite for short. 170 Note that the EVI here corresponds to the I-Component of [RFC7623], 171 not the B-Component. In fact, there are no typical B-Components 172 in EVPN-lite SRv6 solutions. 174 1.3. Terminology 176 Most of the terminology used in this document comes from [RFC7432] 177 and [I-D.ietf-bess-srv6-services], except for the following: 179 * Light-weighted EVPN: The EVPN solution with high scalability and 180 reduced operational complexity. 182 * EVPN-lite: The light-weighted EVPN is also called EVPN-lite for 183 short. 185 * C-MAC: Customer MAC. It is the same as the C-MAC of PBB EVPN. 187 * ISID: A broadcast domain identifier in the PBB I-Component. 189 * LDV: Local Discriminating Value. It is similar to the Local 190 Discriminating Value of a type 3 ESI. 192 * GDV: Global Discriminating Value. An identifier with global 193 uniqueness. 195 * EGD: EVI-GDV, an EVI's Global Discriminator. It is the GDV of an 196 EVPN MAC-VRF and is used to identify that EVPN MAC-VRF in the data 197 plane. Because the EGD is a Global Discriminating Value (GDV) of that 198 EVPN MAC-VRF, EGD is also the abbreviation of EVPN-GDV. For example, 199 the EGD of [RFC7348] is a global VNI.
In this draft, the EGD of 200 an EVI is that MAC-VRF's globally unique VPN ID. 202 * AC SID: The End.DX2 SID of a specified AC. Different ACs on the 203 same ES have different AC SIDs. 205 * ESI SID: An SRv6 SID whose function type is End.DX2AGG. Note that 206 when the ESI is in all-active mode, the ESI SID is the same on all 207 PEs of that ES, according to Section 3.2. In such a case, the ESI 208 SID can also be called an ES anycast SID. Different ACs on the 209 same ES have the same ESI SID with different Arg.ACI values. 211 * ESI IP: The End.DX2AGG SID of a specified ESI, but with an empty 212 Arg.ACI. 214 * ESI Prefix: The IPv6 prefix that covers all AC SIDs of the 215 specified ESI. 217 * Ingress AC SID: The End.DX2 SID of the ingress AC on the ingress PE. 218 Note that the Ingress AC SID is typically encapsulated as the SRv6 219 Source IP in the data plane. 221 * EAD/ES: Ethernet A-D route per ES, or RT-1 per ES route. 223 * EAD/EVI: Ethernet A-D route per EVI, or RT-1 per EVI route. 225 * Arg.ACI: The argument part of a SID of the End.DX2AGG function is 226 called Arg.ACI, because the value of that argument is an 227 AC-ID. 229 * RT-2: MAC/IP Advertisement route. 231 * MAC Entry: An entry in the EVPN MAC table in the data plane. 233 * GRT: Global Routing Table. 235 2. Requirements 237 Light-weighted SRv6 EVPNs should satisfy the 238 following requirements: 240 2.1. No C-MAC Awareness in the Backbone 242 In typical operation, an EVPN PE sends a BGP MAC Advertisement route 243 per C-MAC address. In certain applications, this poses scalability 244 challenges, as is the case in data center interconnect (DCI) 245 scenarios where the number of virtual machines (VMs), and hence the 246 number of C-MAC addresses, can be in the millions. This is called the 247 C-MAC overload of the DC backbone. In such scenarios, it is required to 248 reduce the number of BGP MAC Advertisement routes by relying on an 249 'EVPN-lite' scheme, as is provided by ESI and its equivalents (e.g.
250 Pseudo B-MAC, ESI IP). 252 2.2. Flexible Multi-homing Remains Supported 254 Flexible multi-homing means that different ES instances can have 255 different adjacent PEs. In this document, the set of all PEs adjacent to the same ES 256 instance is called that ES's location-set. In other words, flexible 257 multi-homing means that different ESes can have different location-sets. 259 For example, ES101's location-set is {PE1}, ES102's location-set is 260 {PE2, PE3}, ES103's location-set is {PE1, PE3}, and ES104's 261 location-set is {PE2, PE4}. 263 2.3. C-MAC Address Learning and Confinement 265 In EVPN, all the PE nodes participating in the same EVPN instance are 266 exposed to all the C-MAC addresses learnt by any one of these PE 267 nodes because a C-MAC learnt by one of the PE nodes is advertised in 268 BGP to other PE nodes in that EVPN instance. This is the case even 269 if some of the PE nodes for that EVPN instance are not involved in 270 forwarding traffic to, or from, these C-MAC addresses. Even if an 271 implementation does not install hardware forwarding entries for C-MAC 272 addresses that are not part of active traffic flows on that PE, the 273 device memory is still consumed by keeping record of the C-MAC 274 addresses in the routing information base (RIB) table. In network 275 applications with millions of C-MAC addresses, this introduces a 276 non-trivial waste of PE resources. As such, it is required to confine 277 the scope of visibility of C-MAC addresses to only those PE nodes 278 that are actively involved in forwarding traffic to, or from, these 279 addresses. 281 2.4. No C-MAC Flushing for All-Active ESes 283 Just as in [RFC7623], it is required to avoid C-MAC address flushing 284 upon link, port, or node failure for remote All-Active multihomed 285 segments. 287 Note that when an ES fails on one PE, it may still work well on 288 another PE, so the C-MACs should not be flushed. 290 2.5.
Independent C-MAC Flushing for Single-Active ESes 292 Just as in [RFC7623], upon a link or port failure of a single-active ESI, 293 the C-MACs of other single-active ESes on the same PE will not be 294 flushed. 296 2.6. Independent Convergency per 298 When the physical port of an All-Active ES works well, but a single 299 Ethernet Tag ID (ETI) of that ES fails (illustrated as the 'X' flag 300 in Figure 1) on PE1, the traffic to that ETI of that ES will be 301 re-routed to another adjacent PE of the same ES, but the traffic to other 302 ETIs of the same ES will not be affected. 304 If PE1 is the last active link for that before that 305 failure, C-MAC flush should be triggered on the remote PEs. If that 306 ESI is single-active, C-MAC flush should be triggered on the remote 307 PEs too. 309 Note that when an AC (ES link) fails but the PE node still works well, there 310 should not be steady bypassing traffic either. The steady bypassing 311 problem is discussed in [I-D.wang-bess-evpn-egress-protection]. 313 2.7. ESI Route Aggregation in Backbone 315 In SRv6 EVPN, different sub-interfaces of the same ESI can have 316 different AC SIDs in order to achieve Independent Convergency per 317 . However, only their common prefix (the ESI-Prefix) 318 should be advertised in the underlay network. 320 Note that only the common prefix needs to be advertised in the overlay 321 network before any of these sub-interfaces fails. 323 Note that different ESIs may use the same SRv6 locator. In such a 324 case, these ESI SIDs are aggregated into that anycast SRv6 locator 325 when they are advertised in the underlay network. 327 2.8. Unequal load-balance 329 The light-weighted EVPNs should support the unequal load-balance 330 defined in [I-D.ietf-bess-evpn-unequal-lb]. 332 2.9. AC-aware Service Interface 334 In the AC-aware bundling service interface 335 [I-D.sajassi-bess-evpn-ac-aware-bundling], an ES may have two of its 336 VLANs attached to the same broadcast domain.
These two VLANs 337 may be assigned to the same sub-interface or to different 338 sub-interfaces. 340 2.10. Ingress Filtering for Unicast Flows of E-Tree Services 342 The filtering needed by an E-Tree service for known unicast traffic 343 should be performed at the ingress PE, thus providing very efficient 344 filtering and avoiding sending known unicast traffic over the PSN to 345 be filtered at the egress PE, as is done in traditional E-Tree 346 solutions (i.e., E-Tree for VPLS [RFC7796]). 348 2.11. AC-Influenced DF Election 350 When the EAD/EVI route is not advertised before the corresponding ESI 351 sub-interface fails, the AC-influenced DF Election procedures should 352 elect the right DF both before and after that failure. 354 Note that according to [RFC8584], the AC-influenced DF Election will 355 be incorrect when no EAD/EVI route is advertised, even if no ESI 356 sub-interface has failed at all. 358 The AC-influenced DF Election should support "service carving" as 359 Section 8.5 of [RFC7432] does. 361 2.12. Synchronous MAC Entries in All-Active Mode 363 When a C-MAC Mx is learnt on an attachment circuit AC1 of an 364 all-active Ethernet Segment ES21 on PE1, Mx should not be in unknown 365 unicast state on PE2, which is also adjacent to ES21. In addition, the 366 outgoing interface of PE2's MAC entry for Mx should be AC2, which is 367 an AC of ES21 and has the same VLAN as AC1. 369 3. Light-Weighted SRv6 EVPN Overview 371 3.1. Use Case 373 The physical links of these use cases are described in Appendix A. 374 Here we describe the ACs and broadcast domains. Note that the VN-10/ 375 VN-20/VUN1 in Figure 1 is the VPNx/VPNy/NIz in Figure 5. 377 The Ethernet segment ES21's ESI is ESI21. ES21 is attached to 378 MAC-VRF VN-10 via attachment circuit AC1 on PE1 and 379 via attachment circuit AC2 on PE2. We assign an 380 End.DX2 SID DX2_AC1 to AC1, and we assign an End.DX2 SID DX2_AC2 to 381 AC2.
The Ethernet segment ES3's ESI is ESI3. ES3 is attached to 382 MAC-VRF VN-10 via attachment circuit AC3 on PE3. We assign an End.DX2 383 SID DX2_AC3 to AC3. 385 Note that network instance VUN1 is the (virtual) underlay network of 386 VN-10 and VN-20. Because VN-10 and VN-20 are SRv6 EVPN 387 MAC-VRFs, their underlay network is the SRv6 network of the GRT. 389 +------------------------+ 390 PE1 | | 391 +-------+-------+ ------------> | 392 V1P1 | ___(VN-10) | VUN1: DX2AGG:: | 393 EVC1(V1) V2P1 | /DX2_AC1 \ | through IGP or | 394 SN1--O=------=========< (VUN1) | RT-1 per ES | PE3 395 \ / ESI21 | \___ / | +----+-----+ 396 | | + | X (VN-20) | | | 397 X X | +----------+----+ | | 398 |P3| | | | (VN-10)--+H3 399 | | | | | RT2(VN:C-MAC)| / | 400 V1 | | V2 | DX2AGG::/96 | | EVI-RT | (VUN1) | 401 | | | | V ES-Import RT | \ | 402 | | | | | (VN-20)--+H4 403 | | | +----------+----+ | | 404 \/ + | X___(VN-10) | | | 405 /\ ESI21 | /DX2_AC2 \ | +----+-----+ 406 SN2--O===--===========< (VUN1) | | 407 EVC2(V2) V1P2 | \___ / | | 408 V2P2 | (VN-20) | | 409 +-------+-------+ | 410 PE2 | | 411 +------------------------+ 413 Figure 1: EVPN-lite SRv6 Use Case 415 We use IMET routes to build a broadcast-list. The broadcast-list is 416 used to forward BUM traffic. The data-plane MAC learning for BUM 417 traffic produces the first batch of C-MAC entries. The subsequent 418 C-MAC entries can be learnt from unicast traffic and/or BUM 419 traffic. Note that MAC/IP routes are not used to advertise 420 C-MAC entries as they usually are, so that the RRs and/or SPEs are not 421 overloaded by these C-MACs. 423 3.2. Solution Overview 424 3.2.1. Aggregatable End.DX2 SID 426 When an Ethernet Segment ES21 is attached to an EVI, the 427 attachment-circuit AC1 for that is assigned an End.DX2 SID. 428 Different ACs of the same ESI are assigned different End.DX2 429 SIDs, which are called AC SIDs in this document.
However, these different 430 End.DX2 SIDs must be aggregatable into the same prefix, and 431 this prefix is called the ESI prefix in light-weighted SRv6 EVPNs. 432 The format of aggregatable End.DX2 SIDs is illustrated in the 433 following figure:
435 |<---   ESI-Prefix(128-N bits)   ---->|<----     N bits     --->|
436 +------------+------------+-----------+-------------------------+
437 |   Block    |    Node    |  ESI.LDV  |          AC-ID          |
438 +------------+------------+-----------+-------------------------+
439 |<------ Locator -------->|<------------- Function ------------>|
441 Figure 2: End.DX2 SID Format for Aggregation 443 Note that the ESI.LDV field is the Local Discriminator Value (LDV) of 444 the ESI (especially the type 3/4/5 ESI). The AC-ID field is the 445 identifier of the AC's EVI. The ESI.LDV field and the AC-ID field 446 together form the End.DX2 SID's Function part. 448 Note that in "AC-aware bundling service interface" the AC-ID field 449 MUST be the same as the Attachment Circuit ID of 450 [I-D.sajassi-bess-evpn-ac-aware-bundling]. In other service 451 interfaces, the AC-ID field can also be the EGD of that AC's MAC-VRF. 452 Note that the EGD has a global meaning like a global VNI or a PBB 453 I-SID, while the ordinary AC-ID part of an aggregatable End.DX2 SID 454 typically is only a VLAN-ID on that ES. 456 Note that the ESI IP of an AC is that AC's End.DX2 SID but with a 457 zero AC-ID. The AC SIDs have non-zero AC-IDs, but the ESI IPs always 458 have zero AC-IDs, because an ESI IP identifies an ESI, not an AC. 460 Note that if ESI21 is in single-active mode, DX2_AC1 is different from 461 DX2_AC2, but if ESI21 is in all-active mode, DX2_AC1 is the same as 462 DX2_AC2, and both can be called DX2_SID21 in such a case. 464 3.2.2. The Advertisement of ESI-Prefixes 466 The ESI-Prefixes of DX2_AC1 and DX2_AC2 are defined in Figure 2, and 467 they are called ESI_Prefix1 and ESI_Prefix2 respectively.
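As an illustration only, the relationship between an AC SID, its ESI-Prefix, its ESI IP, and its AC-ID (Figure 2) can be sketched in Python. This is a minimal sketch under stated assumptions: the SID values, the AC-ID width N = 32 bits (chosen to match the /96 prefix shown in Figure 1), and the helper names esi_prefix, esi_ip, and ac_id are hypothetical and are not defined by this document.

```python
import ipaddress

AC_ID_BITS = 32  # assumed value of N; yields a /96 ESI-Prefix


def esi_prefix(ac_sid: str, ac_id_bits: int = AC_ID_BITS) -> ipaddress.IPv6Network:
    """Return the (128 - N)-bit ESI-Prefix covering an aggregatable End.DX2 SID."""
    # strict=False masks the low N bits (the AC-ID) instead of rejecting them.
    return ipaddress.ip_network(f"{ac_sid}/{128 - ac_id_bits}", strict=False)


def esi_ip(ac_sid: str, ac_id_bits: int = AC_ID_BITS) -> ipaddress.IPv6Address:
    """Return the ESI IP: the same SID with a zero AC-ID."""
    return esi_prefix(ac_sid, ac_id_bits).network_address


def ac_id(ac_sid: str, ac_id_bits: int = AC_ID_BITS) -> int:
    """Return the AC-ID carried in the low N bits of the SID."""
    return int(ipaddress.IPv6Address(ac_sid)) & ((1 << ac_id_bits) - 1)


# Hypothetical SIDs built from the IPv6 documentation prefix.
DX2_AC1 = "2001:db8:1:21::a"      # AC1 on PE1, AC-ID 10
DX2_AC2_SA = "2001:db8:2:21::a"   # AC2 on PE2 if ESI21 is single-active
DX2_AC2_AA = DX2_AC1              # AC2 on PE2 if ESI21 is all-active

print(esi_prefix(DX2_AC1), esi_ip(DX2_AC1), ac_id(DX2_AC1))
print(esi_prefix(DX2_AC1) == esi_prefix(DX2_AC2_AA))  # all-active: same prefix
print(esi_prefix(DX2_AC1) == esi_prefix(DX2_AC2_SA))  # single-active: different
```

In this sketch the ESI IP is simply the network address of the ESI-Prefix, which mirrors the statement that an ESI IP always has a zero AC-ID.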
We can use 468 an IGP to advertise these ESI-Prefixes to PE3 in the 469 SRv6 underlay, so the EAD/ES route and the EAD/EVI route are not needed 470 for the SRv6 EVPN described in this section. 472 Note that the SRv6 SID in the IMET route is an End.DT2M SID with a 473 zero argument length. 475 Note that if ESI21 is in single-active mode, ESI_Prefix1 is different 476 from ESI_Prefix2, but if ESI21 is in all-active mode, ESI_Prefix1 is the 477 same as ESI_Prefix2. 479 Note that when the PE1 node fails and the ESI is all-active, the PLR node 480 will do underlay anycast FRR switching for 481 DX2_SID21(=DX2_AC2=DX2_AC1). This results in fast network 482 convergence. 484 Note that when the PE-CE link of ESI21 fails, the IGP route of 485 ESI_Prefix1 will be withdrawn, so there will be no steady bypassing 486 for that ES, but temporary bypassing can be performed to further 487 improve the convergence. 489 When two ESes are attached to the same redundancy group of PEs, they 490 can share the same anycast SRv6 Locator. In such a case, only the 491 common SRv6 Locator is advertised by the underlay network. However, they 492 should have different ESI-Prefixes, because activating ESI SID 493 Aggregation is not recommended, in order to avoid the 494 steady bypass problems described in Section 5.1. 496 The detailed comparisons between light-weighted SRv6 EVPN and PBB 497 EVPN over SRv6 are described in Section 6. 499 3.3. Packet Walkthrough 501 3.3.1. PE1 Forward ARP Request to PE2/PE3 503 * When H1 (of SN1) sends an ARP Request for H3, PE1 will receive the ARP 504 Request BUM1 from AC1 of ESI21. PE1 will forward the ARP Request 505 following the broadcast-list of AC1's MAC-VRF VN-10. The 506 broadcast-list is constructed from the IMET routes from PE2 and PE3. 507 The End.DX2 SID of AC1 is named DX2_AC1. 509 PE1 will forward the ARP Request to PE2 and PE3. The inner SMAC 510 of the ARP Request is M1, which is H1's MAC address.
512 * In this step, PE1 will forward the ARP Request BUM1 to PE2/PE3 513 with the following SRv6 encapsulation: Its underlay Source IP is 514 the End.DX2 SID (DX2_AC1) on PE1 for the ingress AC; its underlay 515 Destination IP is the End.DT2M SID (whose argument length is zero) 516 on PE2/PE3. 518 Note that for single-homed ingress ACs, the underlay SIP will be the End.DT2U SID, because they 519 don't need any dedicated End.DX2 SIDs. 520 The multi-homed ingress ACs with single-active 521 behavior may not be assigned a dedicated ESI-Prefix either. 522 In such situations, the underlay SIP can be the End.DT2U SID too, 523 and the AC SIDs of all single-active 524 ESIs for the same EVI are aggregated into the same End.DT2U SID. 526 3.3.2. PE2/PE3's Dataplane MAC Learning 528 * When PE2 and PE3 receive the ARP Request packet BUM1, they do 529 data-plane MAC learning independently. They will learn that M1 is 530 behind DX2_AC1. 532 Note that when PE2 learns that M1 is behind DX2_AC1, it will 533 also assume that M1 is behind the local AC (AC2) whose End.DX2 SID 534 (DX2_AC2) is the same as DX2_AC1. The local AC may have a 535 higher priority than the remote one. 537 After the data-plane MAC learning, the ARP Request packet BUM1 is 538 broadcast to the local ACs, behind one of which is H3. 540 3.3.3. PE2 Discard ARP Request to H1 542 * On receiving BUM1 from PE1, PE2 uses the ingress ESI information 543 (DX2_AC1) in BUM1 to determine its ingress ESI-Prefix. When ESI21 544 is in all-active mode and PE2 is about to forward the ARP Request to 545 H1, PE2 will find that the AC SID (DX2_AC2) for the outgoing AC 546 (AC2) is of the same ESI-Prefix, so PE2 discards it to keep the ES 547 loop-free. 549 Note that before that ARP Request packet is discarded, its 550 source MAC can be learnt, especially in "AC-aware bundling service 551 interface".
The MAC entry is learnt against DX2_AC1, but it will 552 use the local sub-interface (of the same AC SID) on that ES 553 as its outgoing interface, in order to avoid unknown-unicast 554 flooding. 556 When ESI21 is in single-active mode, the outgoing AC may be in 557 blocking state; otherwise, its corresponding sub-interface on H1 558 is responsible for dropping the packet instead. So although the 559 AC SID (DX2_AC2) for the outgoing AC is not the same as DX2_AC1, 560 no loop will arise in the Ethernet Segment. 562 * In this step, PE2 can compare the ingress AC SID of BUM1 and the 563 AC SID of the outgoing AC directly; no SID-to-ESI lookup is needed. 565 3.3.4. PE3 Forward ARP Reply to PE1/PE2 567 * When H3 replies to H1 for the ARP Request BUM1, PE3 will forward 568 the ARP reply U1 according to the MAC entry M1 learnt previously 569 as above. 571 PE3 will forward the ARP reply U1 to PE1 or PE2 according to 572 the IGP route for DX2_AC1's SRv6 locator. 574 When ESI21 is in all-active mode, DX2_AC1 will be the same as 575 DX2_AC2; in such a case, we call both of them DX2_SID21 instead. 576 The traffic to M1 will be load-balanced between PE1 and PE2, 577 because DX2_SID21's locator is advertised by both PE1 and PE2 578 in the underlay IGP. 580 * In this step, PE3 will forward the ARP reply U1 to PE1 with the 581 following SRv6 encapsulation: Its underlay Source IP is the 582 End.DX2 SID on PE3 for AC3; its underlay Destination IP is the 583 End.DX2 SID (DX2_AC1) on PE1 for AC1 according to the MAC entry 584 M1. 586 Note that if the DIP is just the anycast node SID of PE1 and PE2, 587 when the PE-CE link of ESI21 fails, the traffic will be steadily 588 bypassed until that link recovers. That is why MAC entries 589 should be learnt against AC SIDs. 591 3.3.5. PE1 Forward ARP Reply to H1 593 * When PE1 receives the ARP reply packet U1 from PE3, PE1 first 594 matches the packet to its MAC-VRF VN-10 by U1's destination End.DX2 595 SID.
PE1 will not discard it because the egress AC's AC SID 596 is not the same as the ingress AC SID (which is represented by 597 U1's source IP). 599 * In this step, when PE1 receives the SRv6 encapsulated ARP reply 600 packet U1 from PE3, PE1 first matches the packet to the End.DX2 SID 601 of AC1 by DIP, then matches the packet to AC1's MAC-VRF VN-10. 603 4. Decapsulation Optimizations 605 4.1. Decapsulation Aggregation 607 We want to decapsulate the packets destined to different ESIs of 608 the same EVI using the same forwarding entry. In order to achieve 609 this benefit, the EGD of an AC's EVI can be used as the AC-ID of that AC's AC 610 SID. 612 These AC SIDs are aggregatable End.DX2 SIDs, so we can consider the 613 ESI prefix aggregated from these End.DX2 SIDs as a SID of a new SRv6 function 614 called End.DX2AGG. The format of the End.DX2AGG SID is 615 illustrated in the following figure:
617 |<------ Locator -------->|<- FUNC -->|<------ Arg.ACI -------->|
618 +------------+------------+-----------+-----------------------+-+
619 |   Block    |    Node    |  ESI.LDV  |          EGD          |L|
620 +------------+------------+-----------+-----------------------+-+
622 Figure 3: End.DX2AGG SID Format 624 Note that whether these SIDs are considered as many End.DX2 SIDs 625 or as a single End.DX2AGG SID with different 626 arguments is just a local matter of each PE node's independent 627 choice. Other PEs of the same EVI won't be aware of the difference between 628 these two implementations. 630 A SID with the End.DX2AGG function is called an "ESI SID" in this 631 document. The ESI's ESI-Prefix is the locator and function part of 632 its corresponding ESI SID. The argument part of the ESI SID is the 633 AC-ID of the corresponding AC's End.DX2 SID. The AC-ID plus the 634 ESI.LDV works like the function part of an End.DX2 SID. The argument 635 part of an ESI SID is called Arg.ACI in this document. 637 Note that the Arg.ACI comprises the EGD (EVPN Global Discriminator) and the L 638 bit.
The EGD identifies the EVI of that AC. When that AC is a leaf 639 AC, the L bit is 1, otherwise the L bit is 0. 641 Note that when AC-ID is the EGD, PE2 can still decapsulate the packet 642 following the End.DX2 function or following the End.DX2AGG function. 643 It is just a local matter, while the End.DX2AGG function can reduce 644 the decapsulation forwarding entries. But when AC-ID is that AC's 645 VLAN-IDs, PE2 have to decapsulate the packet following the End.DX2 646 function. 648 4.2. End.DX2AGG Function and Arg.ACI 650 The "Endpoint with decapsulation and Aggregated L2 table forwarding" 651 behavior (End.DX2AGG for short) is a variant of the End.DX2 behavior. 653 Two of the applications of the End.DX2AGG behavior are the EVPN VPLS 654 [RFC7432] and the EVPN ETREE [RFC8317] use-cases. 656 Any SID instance of this behavior is associated with an ESI E. The 657 behavior also takes an argument: "Arg.ACI". This argument provides a 658 local mapping to an EVI V. The outgoing interface corresponds to 659 is OIF, and the EVI V's bridge table is L2 Table T . 661 The End.DX2AGG SID MUST be the last segment in a SR Policy. 663 When N receives a packet whose IPv6 DA is S and S is a local 664 End.DX2AGG SID, the processing is identical to the End.DT2U behavior 665 except for the Upper-layer header processing which is as follows: 667 S01. If (Upper-Layer Header type == 143(Ethernet) ) { 668 S02. Remove the outer IPv6 Header with all its extension headers. 669 S03. Determine the L2 Table T using Arg.ACI. 670 S04. Learn the exposed MAC Source Address in L2 Table T. 671 S05. Find out the OIF, Forward the Ethernet frame to the OIF. 672 S06. } Else { 673 S07. Process as per Section 4.1.1 of [RFC8986]. 674 S08. } 676 Note that the OIF can be found out using the MAC-entries in L2 677 Table T, when the EVI V is an E-LAN service. 679 5. Advanced Considerations 681 5.1. 
ESI SID Aggregation 683 There are obvious difference between "Route Aggregation" and "SID 684 Aggregation" for an ESI. The "ESI Route Aggregation" is that 685 different End.DX2AGG SIDs are advertised by underlay protocols in a 686 common SRv6 locator, but different ESIs still have different 687 End.DX2AGG SIDs. The "ESI SID Aggregation" is that different ESIs 688 use the same SRv6 SID. 690 Note that the "ESI Route Aggregation" is recommanded as long as it is 691 possible, but the "ESI SID Aggregation" can only be used under 692 certain restraints. 694 When two ESes are attached to the same redundancy group of PEs, they 695 can share the same SRv6 SID. But this will bring out some issues 696 too. One of these issues is that they may be attached to different 697 groups of PEs in the future. Another issue is that when only one of 698 the ESes fails, that common SRv6 SID can't be withdrawn by that PE, 699 so the steady bypass of that ES arises immediately after its failture 700 on that PE. If these issues are not so important in some scenarios, 701 The ESI-SID Aggregation may be activated. This is an option. 703 Note that when ESI SID Aggregation is activated, the local-bias ES 704 split-horizon procedures or its variations should be used. 706 Note that ESI SID Aggregation works well with single-active ESIs (see 707 Section 3.3), its steadby bypassing problem will arise with all- 708 active ESIs only. 710 Note that the sub-interfaces of the same ESI may be assigned with 711 different End.DX2 SIDs, and these End.DX2 SIDs can be aggregated into 712 a common prefix, this common prefix is assigned with that ESI. In 713 such case, only the common prefix should be advertised before any of 714 the sub-interfaces fails. But this is not considered as "ESI SID 715 Aggregation", this is "ESI Route Aggregation". 717 5.2. ESI/AC SID Advertisement Optimization 719 5.2.1. 
Advertise ESI-Locators in Underlay Network 721 The End.DX2AGG SIDs can be advertised as an IP prefix in underlay IGP 722 protocols. Although it is the aggregation of many AC SIDs, the ESI 723 SIDs may still be too many for the underlay network. And the core 724 routers who are service-agnostic have to install these ESI prefixes. 726 In order to solve these problems, only the anycast SRv6 locators (say 727 ESI-Locators) of such ESI prefixes should be advertised in the 728 underlay network. 730 Note that in such case the ESI/AC SID typically don't have to be 731 advertised by EVPN routes in overlay network, unless some special 732 features (i.e. unequal load-balance) should be providered together. 734 5.2.2. Using EAD/EVI Route to Advertise AC SIDs 736 When the EAD/EVI routes here are used to advertise AC SIDs, the 737 End.DX2 SIDs are advertised in their SRv6 L2 Service TLVs, not in 738 their next hops. Their next hops will be the node SID of the 739 advertising PE. 741 In such case, the EAD/EVI routes will be installed as overlay routes, 742 and the AC SIDs learnt in the MAC entries is treated as the overlay 743 indexes for recursion. 745 In all-active mode, when an AC of a fails on one PE, all 746 other PEs of that should use EAD/EVI route to advertise 747 its AC SID. 749 5.2.3. Using EAD/ES Route to Advertise ESI SIDs 751 In section 6.1.1 of [I-D.ietf-bess-srv6-services], the SRv6 L2 752 Service TLVs of EAD ES routes just carry the Arg.FE2 infomations. 753 Here the SRv6 L2 Service TLVs of EAD ES routes carry the ESI SIDs. 755 EAD/ES routes will be advertised/imported for EVIs but they should be 756 installed into Global Routing Table (GRT). Because there isn't a 757 dedicated B-component in EVPN-lite SRv6 like that in PBB VPLS and PBB 758 EVPN. The GRT plays a B-Component role in EVPN-lite SRv6. 760 Note that the EAD/ES routes won't be installed as overlay routes like 761 the EAD/EVI routes, because that we want to reduce the forwarding 762 table consumption. 
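
         The relationship between an ESI SID (the value carried by an
         EAD/ES route, with a zero argument) and the full AC SID of one AC
         can be illustrated with a short Python sketch of the End.DX2AGG
         format of Figure 3.  The field widths (64-bit locator, 16-bit
         ESI.LDV, 48-bit Arg.ACI with the L bit as its lowest bit), the
         example addresses and all function names are assumptions made for
         illustration only; they are not specified by this document.

```python
import ipaddress

# Assumed field widths; this document does not specify them here.
LOCATOR_BITS = 64   # Block + Node
FUNC_BITS = 16      # ESI.LDV
ARG_BITS = 48       # Arg.ACI: EGD (upper 47 bits) followed by the L bit
assert LOCATOR_BITS + FUNC_BITS + ARG_BITS == 128


def make_esi_sid(locator: str, esi_ldv: int) -> ipaddress.IPv6Address:
    """Build an ESI SID: locator + ESI.LDV function, Arg.ACI all zero."""
    base = int(ipaddress.IPv6Address(locator))
    if base & ((1 << (FUNC_BITS + ARG_BITS)) - 1):
        raise ValueError("locator must leave function and argument bits clear")
    if not 0 <= esi_ldv < (1 << FUNC_BITS):
        raise ValueError("ESI.LDV out of range")
    return ipaddress.IPv6Address(base | (esi_ldv << ARG_BITS))


def make_ac_sid(esi_sid: ipaddress.IPv6Address, egd: int, leaf: bool) -> ipaddress.IPv6Address:
    """Fill Arg.ACI (EGD + L bit) into an ESI SID to obtain the full AC SID."""
    if not 0 <= egd < (1 << (ARG_BITS - 1)):
        raise ValueError("EGD out of range")
    return ipaddress.IPv6Address(int(esi_sid) | (egd << 1) | int(leaf))


def parse_arg_aci(ac_sid: ipaddress.IPv6Address) -> tuple:
    """Return (EGD, L bit) extracted from the argument part of an AC SID."""
    arg = int(ac_sid) & ((1 << ARG_BITS) - 1)
    return arg >> 1, bool(arg & 1)


# Hypothetical values: documentation prefix, ESI.LDV 0x21, EGD 10.
esi_sid = make_esi_sid("2001:db8:0:1::", 0x21)
root_sid = make_ac_sid(esi_sid, egd=10, leaf=False)
leaf_sid = make_ac_sid(esi_sid, egd=10, leaf=True)
print(esi_sid, root_sid, leaf_sid)
```

         Under these assumptions the root and leaf AC SIDs of the same AC
         differ only in the lowest bit, and both fall under the prefix
         formed by the ESI SID.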
764      Although ESI SIDs are installed into the GRT, they are known only to
765      PE nodes; the transit nodes in the underlay network won't be aware of
766      ESI SIDs (they may be aware of the locators of these SIDs), in order
767      to reduce the FIB consumption.

769      Note that when the EAD/ES route here is used to advertise an ESI SID,
770      the End.DX2AGG SID is advertised in its SRv6 L2 Service TLV, not in
771      its nexthop.  Its nexthop will be the node SID of the advertising PE.

773      Note that in such case, the SRv6 source IP in the dataplane should be
774      set to the entire AC SID of the ingress AC, not just the ESI IP whose
775      AC-ID part is zero.

777   5.2.4.  The Reduction of EAD/EVI Routes

779      In order to solve the problem described in Section 2.6, we may have
780      to advertise AC SIDs in the overlay network.  But the number of AC
781      SIDs may be hundreds of times larger than that of ESI SIDs.  It is
782      necessary for the light-weighted SRv6 EVPNs to reduce the
783      advertisement of AC SIDs.

785      The AC SID of a specified will not be advertised by its
786      PEs, until these PEs know that the fails on at least one of
787      them.

789      Note that the entire AC SID for that can be used as the
790      source IP of the SRv6 encapsulation before that AC SID is advertised
791      via EVPN routes, because when a MAC is learnt over that AC SID,
792      the packet for that MAC can also be forwarded according to the
793      ESI-Prefix or ESI-Locator of the corresponding ESI SID due to the
794      longest match procedures of IP lookup.

796   5.2.4.1.  AC-DF per EVI Mode for Light-Weighted EVPNs

798      When the EAD/EVI routes are not advertised, the AC-influenced
799      DF-Election per [RFC8584] can't work.  So the AC-DF per EVI procedures
800      are required.  The AC-DF per EVI procedures include two steps.  The
801      first step is the AC-DF per EVI capability negotiation procedure, and
802      the second step is the AC-DF per EVI DF-election procedure.
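
         The longest-match fallback described in Section 5.2.4 (a MAC
         learnt against a full AC SID is still reachable through the
         ESI-Prefix or ESI-Locator while the /128 AC SID is not advertised)
         can be sketched in Python as follows.  The prefix lengths,
         addresses and next-hop labels are hypothetical values chosen for
         illustration.

```python
import ipaddress


def longest_match(routes: dict, destination: str):
    """Return the next hop of the most specific prefix covering destination."""
    addr = ipaddress.IPv6Address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in routes.items():
        net = ipaddress.IPv6Network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Hypothetical routing state on the ingress PE.
routes = {
    "2001:db8:0:1::/64": "ESI-Locator (anycast, PE1 or PE2)",
    "2001:db8:0:1:21::/80": "ESI-Prefix of ESI21",
}
learned_sid = "2001:db8:0:1:21::14"   # AC SID seen as source IP when the MAC was learnt

# Before the AC SID itself is advertised, the ESI-Prefix is used.
before = longest_match(routes, learned_sid)

# After an AC failure on one PE, the other PE advertises its AC SID as a /128.
routes["2001:db8:0:1:21::14/128"] = "AC SID via PE2"
after = longest_match(routes, learned_sid)
print(before, "->", after)
```

         No change to the MAC entry is needed in this sketch: the same
         learnt destination resolves to a more specific route once that
         route appears.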
804      The capability negotiation procedures and the DF-election procedures
805      follow [I-D.wang-bess-evpn-ac-df-per-evi].

807   5.2.4.2.  On Receiving Reverse EAD/EVI Routes

809      In all-active mode, when a PE X receives a reverse EAD/EVI route
810      ([I-D.wang-bess-evpn-ac-df-per-evi]), that PE X can use a normal
811      EAD/EVI route to advertise its local AC SID of that .

813      Note that no EAD/EVI route has to be advertised before receiving the
814      corresponding reverse EAD/EVI routes.  This can greatly reduce the
815      number of EAD/EVI routes.

817   5.3.  Unequal LB Advertisement

819      When the ESI SIDs are advertised by EVPN routes for the overlay
820      network according to Section 5.2.3, we can advertise the EVPN Link
821      Bandwidth extended community (see [I-D.ietf-bess-evpn-unequal-lb])
822      along with the ESI SIDs using EAD/ES routes.

824      Note that this extra information (which is advertised along with
825      the EVPN routes) is known to the PEs only.  The underlay network
826      doesn't have to be aware of it.

828      Note that when the EVPN Link Bandwidth extended community is
829      advertised along with the ESI SID, the nexthop of the EAD/ES route
830      should not be set to the anycast ECMP Node SID of the advertising PE
831      (egress PE).  On receiving such an EAD/ES route, the ingress PE may
832      push this EAD/ES route's nexthop onto the End.DX2AGG/End.DX2 SID when
833      constructing the SID stack, if unequal-LB is required.

835      Note that the association between an ESI SID and its corresponding
836      Node SID is also advertised by EAD/ES routes.  In such case, when the
837      ESI SIDs are used as destination IP addresses, they should be hidden
838      in the SRH behind the node SID of the corresponding egress PE router.
839      This encapsulation relies on the EAD/ES routes of the
840      overlay network, so the ESI SIDs must be advertised in the overlay
841      network in such case.
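
         One way an ingress PE might combine the per-PE next hops and the
         EVPN Link Bandwidth values of the EAD/ES routes is sketched below
         in Python.  The weighted selection by flow hash, the route
         structure and every name and value are illustrative assumptions;
         they are not procedures defined by this document or by
         [I-D.ietf-bess-evpn-unequal-lb].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EadEsRoute:
    node_sid: str    # next hop: node SID of the advertising PE (not anycast)
    esi_sid: str     # ESI SID carried in the SRv6 L2 Service TLV
    bandwidth: int   # EVPN Link Bandwidth value, in an arbitrary unit


def pick_route(routes: list, flow_hash: int) -> EadEsRoute:
    """Pick an egress PE with probability proportional to its bandwidth."""
    total = sum(r.bandwidth for r in routes)
    if total <= 0:
        raise ValueError("no usable bandwidth advertised")
    slot = flow_hash % total
    for route in routes:
        if slot < route.bandwidth:
            return route
        slot -= route.bandwidth
    raise AssertionError("unreachable")


def build_sid_list(route: EadEsRoute, ac_sid: str) -> list:
    """Segments in path order: the node SID first, the ESI/AC SID hidden behind it."""
    return [route.node_sid, ac_sid]


# Hypothetical routes for one all-active ES attached to PE1 and PE2.
routes = [
    EadEsRoute("2001:db8:a::1", "2001:db8:0:1:21::", bandwidth=1),
    EadEsRoute("2001:db8:b::1", "2001:db8:0:1:21::", bandwidth=3),
]
chosen = pick_route(routes, flow_hash=5)
print(build_sid_list(chosen, "2001:db8:0:1:21::14"))
```

         With the weights 1 and 3 used here, one flow-hash value in four
         selects the first PE and three in four select the second.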
843      Although these ESI SIDs (that are used as destination IP addresses to
844      PE X) won't be exposed until data packets reach the egress PE X,
845      their ESI-Locator should also be advertised in the underlay network,
846      because their corresponding AC SIDs will be encapsulated as
847      source IPs for some other data packets whose ingress PE is PE X, and
848      these source IPs may be checked by underlay URPF (Unicast Reverse
849      Path Forwarding) procedures.

851   5.4.  AC-aware Bundling Service Interface

853      In AC-aware bundling service interface, the Attachment Circuit ID
854      extended community ([I-D.sajassi-bess-evpn-ac-aware-bundling]) or the
855      ACI-specific SOI extended community
856      ([I-D.wang-bess-evpn-ether-tag-id-usage]) should be used in ARP/ND
857      synchronization.

859      Note that each VLAN of the same AC of the same MAC-VRF will have the
860      same End.DX2 SID.

862      Note that in "AC-aware bundling service interface", the AC-ID inside
863      that DX2_AC1 can help the MAC entry to be installed for the correct
864      outgoing interface.  Such a MAC entry is called the synced MAC
865      entry.

867      Note that the MAC entries which are learnt against a DX2-SID should
868      have lower preference than those received over an RT-2 route, when
869      they are installed into the MAC table.

871   5.5.  C-MAC Flush Notification Procedure

873      The withdrawal of an ESI/AC SID advertisement (as an overlay route)
874      can (if it is the only advertisement of that ESI/AC SID at that time)
875      be used as a flush notification for the C-MACs which were learnt
876      against that ESI/AC SID.

878      Note that in single-active mode, the ESI-Prefixes of DX2_AC1 and
879      DX2_AC2 are different, so each withdrawal of DX2_AC1 or DX2_AC2 will
880      be for the single advertisement of that SID.

882      When "AC-DF per EVI" (Section 5.2.4.1) is used, the reverse EAD/EVI
883      routes can be used to trigger C-MAC flush for specified AC SIDs.  In
884      such case, these reverse EAD/EVI routes should not use the EVI-RT
885      format to carry their EVI's route-target, because the EVI-RT format
886      is not visible to the RT constraint mechanism.

888   5.6.  E-Tree Support Considerations

890      The E-Tree support extensions are similar to Section 5 of [RFC8317],
891      except for the following notable differences: the leaf B-MACs are
892      replaced by leaf ESI-SIDs, and the root B-MACs are replaced by root
893      ESI-SIDs.  The PBB encapsulation is replaced by SRv6 encapsulation,
894      and the B-component is replaced by the underlay GRT.  The B-MAC
895      Advertisement Route is replaced by the EAD/EVI route or EAD/ES route.

897      As illustrated in Figure 3, the root AC-SID and leaf AC-SID of the
898      same AC can be considered as the same ESI-SID with different Arg.ACI.
899      Even the EGD parts of their Arg.ACIs are the same; only the L bits
900      of their Arg.ACIs are different.  The L bit of the leaf AC-SID is set
901      to 1.  The L bit of the root AC-SID is set to 0.

903      On the ingress PE, when the L bit of the destination SID for the DMAC
904      of a data packet is 1, and that data packet's ingress AC is a leaf
905      AC, that data packet should be dropped.

907   5.7.  MAC-Synchronization in All-Active Mode

909      When a host H1 of subnet SN1 sends an ARP Request REQ_P1, REQ_P1
910      will be forwarded by EVC1 to either PE1 or PE2, not to both.  But
911      when H3 sends an ARP Reply REP_P2 to H1, PE3 may load-balance
912      REP_P2 to either PE1 or PE2, not to both.

914      When REQ_P1 is load-balanced (see Appendix A.2.1.1) by EVC1 to PE1,
915      not to PE2, but PE3 load-balances REP_P2 to PE2, the MAC entry of H1
916      would not have been prepared on PE2 for REP_P2.  So the forwarding of
917      REP_P2 will follow the unknown-unicast procedures.

919      PE1 MUST use RT-2 route RT2S (RT-2 for Synchronization only) to
920      advertise the MAC/IP entry of H1 to other PEs (e.g. PE2) on ES21.
921      These RT-2 routes should be advertised along with an EVI-RT
922      ([I-D.ietf-bess-evpn-igmp-mld-proxy]) and an ES-Import RT.

924      When PE2 receives RT2S, the MAC entry Mx should be installed with AC2
925      as its actual outgoing interface.  When PE3 receives RT2S, RT2S MUST
926      NOT be imported into VN-10 because the ES-Import RT of RT2S cannot
927      be resolved to a local ES of PE3.

929      As a result, the synchronized MAC entries will not be
930      imported by their external remote PEs; they are imported just by
931      their internal remote PEs.

933      The IP address field of the NLRI of RT2S can be set to H1's IP
934      address, which is obtained through ARP snooping.  This IP address can
935      be used to trigger ARP probing when PE1 fails.

937      When C-MAC Mx is aged out by PE1, the RT2S MUST be withdrawn, thus
938      PE2's MAC entry of Mx will be deleted.  In such case, ARP probing for
939      Mx should not be triggered, in order not to hold a MAC entry for Mx
940      when Mx will not connect to others for a long time.

942      Note that in other light-weighted EVPNs, the VUN1 may be a
943      backbone-VPLS (B-VPLS); in such case, the IP address field of the
944      NLRI can be used to distinguish the RT-2 routes of C-MACs from the
945      RT-2 routes of B-MACs.

947   5.8.  EVPN IRB Support Considerations

949      The dataplane in this draft is no more complex than that of typical
950      SRv6 EVPN.  So it will work as efficiently as expected in the SRv6
951      EVPN IRB usecase.

953   5.8.1.  EVPN IRB Extended Mobility

955      In the EVPN IRB usecase, [I-D.ietf-bess-evpn-irb-extended-mobility]
956      defines some optional extensions to support some specific IRB
957      usecases.  In these specific IRB usecases, the bindings will
958      change across VM-moves.  These extensions can't be applied to
959      light-weighted EVPNs, just like they can't be applied to PBB EVPNs.

961   5.8.2.  Anycast IRB interfaces

963      When an EVPN IRB interface (on PE1) pings a host H1, the corresponding
964      ICMP Echo Request will be delivered to host H1, whether host H1 is
965      PE1's local host or not.  But if that IRB interface is an anycast IRB
966      interface, and host H1 is a local host of PE2 (not of PE1), naturally
967      the Echo Reply for that Echo Request will be delivered to the nearest
968      anycast IRB interface on PE2 (not on PE1) only.

970   5.8.2.1.  Constructing GW-list

972      The MAC/IP of an anycast IRB interface should be advertised along
973      with a Default Gateway Extended Community.

975      The PEs on which the anycast IRB interface of a subnet resides form
976      the "GW-list" of that subnet.  The "GW-list" of a BD can be
977      constructed from such MAC/IP routes (with the Default Gateway
978      extended community of the corresponding subnet).

980   5.8.2.2.  Flooding over GW-list

982      Echo Replies received by any of the anycast IRB interfaces MUST be
983      flooded over the GW-list of that BD, so that the PE which originated
984      the previous Echo Request can receive the synced Echo Replies.

986      Note that the Echo Replies between two hosts of that BD will not be
987      flooded, because they will not be received by any of the anycast
988      IRB interfaces.

990   6.  Comparison with Other Solutions

992   6.1.  Detailed Comparisons with PBB EVPN over SRv6

994      The "PBB EVPN over SRv6 underlay" solution will be complex, if we
995      load too many things onto it.  Here are some examples:

998      *  The upper-layer header for SRv6 is the PBB header for B-MACs, not
999         the Ethernet header for C-MACs, so the SID list (SR-Path or
1000        network programming instructions) in the SRH can't be constructed
1001        for the sake of the I-Component.  For example, when an SRv6 SID for
1002        MAC-guarding (or something else, just an example) is present in the
1003        SRH for PBB EVPN SRv6, I think it means B-MAC guarding, not C-MAC
1004        guarding.
1006     *  The B-MACs for the all-active ESIs can't be aggregated, but the
1007        SRv6 SIDs for ESIs can be aggregated.  The underlay can advertise
1008        the ESI-Locators only, so the burden of the underlay network may
1009        not be increased too much.  When the underlay routes are
1010        aggregated, the C-MACs can still be learnt against /128 source IPs;
1011        this is an advantage of a light-weighted SRv6 EVPN, which can't be
1012        gained from a PBB header.

1014     *  The B-MACs are for overlay protection (the real overlay is the
1015        I-VPLS, but the B-VPLS is also an overlay network from the
1016        viewpoint of the SRv6 network).  But the SRv6 SIDs for ESIs will
1017        be for underlay protection, which works like egress protection.
1018        They are two different types of protection solutions.

1020     *  Light-weighted SRv6 EVPN can support AC-influenced DF Election,
1021        but PBB EVPN over SRv6 can't.

1023     *  Although PBB EVPN can be transplanted into SRv6 networks along
1024        with the PBB header (say PBB EVPN over SRv6), it seems to be more
1025        complicated to me.  Take the EVPN IRB usecases for example: that
1026        requires seven sequences of header processing, like
1027        (SRv6/B-MAC/C-MAC)(Inner-IP)(C-MAC/B-MAC/SRv6), during the overlay
1028        L3 forwarding.  I think it will be hard enough for some ASICs to
1029        implement it.  When the processing is simplified to
1030        (SRv6/C-MAC)(Inner-IP)(C-MAC/SRv6), it sounds like a step forward,
1031        not backward, IMHO.  We can achieve this goal easily inside the EVPN
1032        framework, as long as data-plane learning can still be considered
1033        as an option after PBB EVPN.

1035     Fortunately, SRv6 is just too young to have a transplantation of PBB
1036     EVPN.  So it will waste nothing for the SRv6 nodes to give up the PBB
1037     header that is never used by these SRv6 nodes.  Note that the SRv6
1038     functions (End.DT2U and End.DT2M) for L2VPNs have had source-IP-based
1039     data-plane learning for a long time already.
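
         The source-IP-based data-plane learning mentioned above, combined
         with steps S01 to S08 of Section 4.2, can be sketched in Python as
         follows.  The packet and table representations, the 48-bit width
         assumed for Arg.ACI and all names are hypothetical
         simplifications; only the order of the steps is taken from
         Section 4.2.

```python
from dataclasses import dataclass, field

ETHERNET = 143   # upper-layer header type tested in S01
ARG_BITS = 48    # assumed width of Arg.ACI (EGD followed by the L bit)


@dataclass
class Packet:
    src_sid: int      # outer IPv6 source address (AC SID of the ingress AC)
    dst_sid: int      # outer IPv6 destination address (local End.DX2AGG SID)
    next_header: int  # upper-layer header type of the outer IPv6 header
    src_mac: str
    dst_mac: str


@dataclass
class L2Table:
    # MAC -> ("ac", local interface name) or ("sid", remote AC SID)
    entries: dict = field(default_factory=dict)


def end_dx2agg(pkt: Packet, tables_by_egd: dict) -> tuple:
    """Simplified End.DX2AGG upper-layer header processing."""
    if pkt.next_header != ETHERNET:                      # S01 / S06
        return ("per-RFC8986-4.1.1", None)               # S07
    # S02: the outer header is removed; only its addresses are used below.
    arg_aci = pkt.dst_sid & ((1 << ARG_BITS) - 1)
    table = tables_by_egd[arg_aci >> 1]                  # S03: EGD selects T
    table.entries[pkt.src_mac] = ("sid", pkt.src_sid)    # S04: learn vs. source SID
    kind, oif = table.entries.get(pkt.dst_mac, (None, None))
    if kind == "ac":                                     # S05: OIF from MAC entry
        return ("forward", oif)
    return ("unknown-unicast", None)                     # simplification


# Hypothetical state: EGD 10 maps to one bridge table with a local host on AC1.
tables = {10: L2Table({"00:00:5e:00:53:01": ("ac", "AC1")})}
reply = Packet(src_sid=0xA, dst_sid=(0x21 << ARG_BITS) | (10 << 1),
               next_header=ETHERNET,
               src_mac="00:00:5e:00:53:03", dst_mac="00:00:5e:00:53:01")
result = end_dx2agg(reply, tables)
print(result, tables[10].entries["00:00:5e:00:53:03"])
```

         In this sketch a single forwarding entry per ESI SID suffices on
         the egress PE, since the bridge table is selected from the
         argument rather than from a per-AC SID.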
1041     Although the extensions in [I-D.ietf-bess-evpn-irb-extended-mobility]
1042     can't be applied to PBB EVPNs or light-weighted EVPNs, this will not
1043     prevent PBB EVPNs and light-weighted EVPNs from supporting typical
1044     IRB use-cases.  Note that these extensions are optional.

1046  6.2.  Detailed Comparisons with Anycast Node SID

1048     Note that the SRv6 Anycast Node SID is the ultimate aggregation of
1049     ESI SIDs.  Such ESI SID aggregation will have some problems as
1050     described in Section 5.1.

1052  7.  Security Considerations

1054     Security considerations will be added in future versions.

1056  8.  IANA Considerations

1058  8.1.  End.DX2AGG SID

1060     IANA is requested to allocate a new code point for the new SRv6
1061     Endpoint Behavior defined in this document.

1063                   +------+-------------+---------------+
1064                   | Type | Description | Reference     |
1065                   +------+-------------+---------------+
1066                   | TBD1 | End.DX2AGG  | This Document |
1067                   +------+-------------+---------------+

1069                            Figure 4: End.DX2AGG

1071  9.  Acknowledgements

1073     The authors would like to thank the following for their comments and
1074     review of this document:

1076     Ye Shu.

1078  10.  Normative References

1080     [I-D.ietf-bess-evpn-igmp-mld-proxy]
1081                Sajassi, A., Thoria, S., Mishra, M. P., Drake, J., and W.
1082                Lin, "IGMP and MLD Proxy for EVPN", Work in Progress,
1083                Internet-Draft, draft-ietf-bess-evpn-igmp-mld-proxy-12, 23
1084                August 2021.

1087     [I-D.ietf-bess-evpn-unequal-lb]
1088                Malhotra, N., Sajassi, A., Rabadan, J., Drake, J.,
1089                Lingala, A., and S. Thoria, "Weighted Multi-Path
1090                Procedures for EVPN Multi-Homing", Work in Progress,
1091                Internet-Draft, draft-ietf-bess-evpn-unequal-lb-14, 14 May
1092                2021.

1095     [I-D.ietf-bess-srv6-services]
1096                Dawra, G., Filsfils, C., Talaulikar, K., Raszuk, R.,
1097                Decraene, B., Zhuang, S., and J. Rabadan, "SRv6 BGP based
1098                Overlay Services", Work in Progress, Internet-Draft,
1099                draft-ietf-bess-srv6-services-07, 11 April 2021.
1103     [I-D.sajassi-bess-evpn-ac-aware-bundling]
1104                Sajassi, A., Brissette, P., Mishra, M., Thoria, S.,
1105                Rabadan, J., and J. Drake, "AC-Aware Bundling Service
1106                Interface in EVPN", Work in Progress, Internet-Draft,
1107                draft-sajassi-bess-evpn-ac-aware-bundling-04, 11 July
1108                2021.

1111     [I-D.wang-bess-evpn-ac-df-per-evi]
1112                Wang, Y., "AC-Influenced DF Election per EVI", Work in
1113                Progress, Internet-Draft, draft-wang-bess-evpn-ac-df-per-
1114                evi-00, 7 May 2021.

1118     [I-D.wang-bess-evpn-ether-tag-id-usage]
1119                Wang, Y., "Ethernet Tag ID Usage Update for Ethernet A-D
1120                per EVI Route", Work in Progress, Internet-Draft, draft-
1121                wang-bess-evpn-ether-tag-id-usage-03, 26 August 2021.

1125     [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
1126                Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
1127                Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
1128                2015.

1130     [RFC7623]  Sajassi, A., Ed., Salam, S., Bitar, N., Isaac, A., and W.
1131                Henderickx, "Provider Backbone Bridging Combined with
1132                Ethernet VPN (PBB-EVPN)", RFC 7623, DOI 10.17487/RFC7623,
1133                September 2015.

1135     [RFC8317]  Sajassi, A., Ed., Salam, S., Drake, J., Uttaro, J.,
1136                Boutros, S., and J. Rabadan, "Ethernet-Tree (E-Tree)
1137                Support in Ethernet VPN (EVPN) and Provider Backbone
1138                Bridging EVPN (PBB-EVPN)", RFC 8317, DOI 10.17487/RFC8317,
1139                January 2018.

1141     [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
1142                J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
1143                VPN Designated Forwarder Election Extensibility",
1144                RFC 8584, DOI 10.17487/RFC8584, April 2019.

1147     [RFC8986]  Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,
1148                D., Matsushima, S., and Z. Li, "Segment Routing over IPv6
1149                (SRv6) Network Programming", RFC 8986,
1150                DOI 10.17487/RFC8986, February 2021.

1153  11.  Informative References

1155     [I-D.ietf-bess-evpn-irb-extended-mobility]
1156                Malhotra, N., Sajassi, A., Pattekar, A., Rabadan, J.,
1157                Lingala, A., and J. Drake, "Extended Mobility Procedures
1158                for EVPN-IRB", Work in Progress, Internet-Draft, draft-
1159                ietf-bess-evpn-irb-extended-mobility-05, 15 March 2021.

1163     [I-D.wang-bess-evpn-egress-protection]
1164                Wang, Y. and R. Chen, "EVPN Egress Protection", Work in
1165                Progress, Internet-Draft, draft-wang-bess-evpn-egress-
1166                protection-04, 29 October 2020.

1170     [RFC7041]  Balus, F., Ed., Sajassi, A., Ed., and N. Bitar, Ed.,
1171                "Extensions to the Virtual Private LAN Service (VPLS)
1172                Provider Edge (PE) Model for Provider Backbone Bridging",
1173                RFC 7041, DOI 10.17487/RFC7041, November 2013.

1176     [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
1177                L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
1178                eXtensible Local Area Network (VXLAN): A Framework for
1179                Overlaying Virtualized Layer 2 Networks over Layer 3
1180                Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

1183     [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
1184                Uttaro, J., and W. Henderickx, "A Network Virtualization
1185                Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
1186                DOI 10.17487/RFC8365, March 2018.

1189  Appendix A.  Explanation for Physical Links of the Use-cases

1191                                         +------------------+
1192                                  PE1    |        P6        |
1193                L2NE1         +----------+---------+        |
1194              +----------+    |  __(P1.1)__(VPNx)  |        |
1195     +---+ P4 |          | P1 | /               \  |        |
1196     |N1 |-----O==------=======<             (NIz) |   P6   |  PE3
1197     +---+    |   \    / |    | \__      __     /  |   +----+-------+
1198              |    |  |  |    |    (P1.2)  (VPNy)  |   |            |
1199              +----|P3|--+    +-----------+--------+   |    (VPNx)--+N3
1200                   |  |                   |            |   /        |
1201              P3.1 |  | P3.2              | P7         | (NIz)--------+N4
1202                   |  |           PE2     |            |   \        |
1203              +----|P3|--+    +-----------+--------+   |    (VPNy)--+N5
1204              |     \/   |    |  __(P2.2)__(VPNy)  |   |            |
1205     +---+    |     /\   |    | /               \  |   +----+-------+
1206     |N2 |-----O====--=========<             (NIz) |   P8   |
1207     +---+ P5 |          | P2 | \__      __     /  |        |
1208              +----------+    |    (P2.1)  (VPNx)  |        |
1209                L2NE2         +----------+---------+        |
1210                                         |        P8        |
1211                                         +------------------+

1213                    Figure 5: Physical Links Illustrated

1215     There are three PEs, two L2NEs (Layer 2 Network Elements) and five
1216     L3NEs (Layer 3 Network Elements) in the above network.  The PEs are
1217     PE1, PE2 and PE3.  The L2NEs are L2NE1 and L2NE2.  The L3NEs are
1218     N1/N2/N3/N4/N5.  They are all illustrated in Figure 5.

1220     There are 9 physical links among these 10 physical devices as
1221     illustrated in Figure 5.  These physical links are called PLi
1222     (i=1,2...8).  The two physical ports of the same physical link PLi
1223     are both called Pi (i=1,2...8).

1225     As illustrated in Figure 5, some of these physical ports may have
1226     subinterfaces.  When a subinterface's VLAN ID is j and it is physical
1227     port Pi's subinterface, that subinterface is called Pi.j.  For
1228     example, P1.2 is a subinterface of physical port P1 and its VLAN ID
1229     is 2.

1231     There are three NIs (Network Instances) among PE1, PE2 and PE3.  They
1232     are VPNx, VPNy and NIz.  Two subinterfaces are attached to VPNx; they
1233     are P1.1 and P2.1.  Two other subinterfaces are attached to VPNy;
1234     they are P1.2 and P2.2.  N3 is also attached to VPNx, while N5 is also
1235     attached to VPNy.
1237     There are two EVCs (Ethernet Virtual Connections) between L2NE1 and
1238     L2NE2; they are EVC1 and EVC2.  L2NE1's EVC1 instance (which is
1239     illustrated as the "O" on L2NE1) has three member interfaces; they
1240     are P4, P1.1 and P3.1, where P3.1 and P1.1 are of the same
1241     protection-group.  L2NE2's EVC1 instance has two member
1242     interfaces; they are P3.1 and P2.1.  L2NE2's EVC2 instance (which
1243     is illustrated as the "O" on L2NE2) has three member interfaces;
1244     they are P5, P2.2 and P3.2, where P3.2 and P2.2 are of the same
1245     protection-group.  L2NE1's EVC2 instance has two member
1246     interfaces; they are P3.2 and P1.2.  L2NE2's EVC1 instance and
1247     L2NE1's EVC2 instance are both CCC (Circuit Cross Connection) local
1248     connections.

1250     VPNx and VPNy are associated to NIz on each PE.

1252  A.1.  Failure Detections for P1.2 (or P2.1)

1254     There is a CFM session CFM1 between P1.2 of PE1 and L2NE2's P3.2;
1255     when physical port P3 fails, the CFM session CFM1 will go down.
1256     There is a CFM session CFM2 between P2.1 of PE2 and L2NE1's P3.1;
1257     when physical port P3 fails, the CFM session CFM2 will go down.

1259  A.2.  Protection Approaches for N1 (or N2)

1260  A.2.1.  CCC-Approaches

1262     L2NE1's EVC1 instance and L2NE2's EVC2 instance are both CCC
1263     local connections too.  In L2NE1's EVC1 instance, P1.1 and P3.1 are
1264     of the same protection-group PG1.  In L2NE2's EVC2 instance, P2.2 and
1265     P3.2 are of the same protection-group PG2.  In PG1, both P1.1 and
1266     P3.1 will receive data packets.  In PG2, both P2.2 and P3.2 will
1267     receive data packets.

1269  A.2.1.1.  CCC Active-Active Protection

1271     L2NE1 (or L2NE2) will load-balance N1's (or N2's) data packets
1272     between P1.1 and P3.1 (or P2.2 and P3.2).

1274  A.2.1.2.  CCC Active-Standby Protection

1276     In PG1, P1.1 is the active path and P3.1 is the backup path.  In PG2,
1277     P2.2 is the active path and P3.2 is the backup path.
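
         The active-standby choice inside a protection group can be
         sketched in Python as follows.  The function names and the
         representation of failures as a set of interface names are
         illustrative assumptions; the rule itself (the backup member is
         used only when the active subinterface or its physical port has
         failed) is the one stated in this section.

```python
def parent_port(subinterface: str) -> str:
    """Physical port of a subinterface named Pi.j, e.g. 'P1.1' -> 'P1'."""
    return subinterface.split(".")[0]


def is_usable(subinterface: str, failed: set) -> bool:
    """A member is usable if neither it nor its physical port has failed."""
    return subinterface not in failed and parent_port(subinterface) not in failed


def select_path(active: str, backup: str, failed: set):
    """Return the member that carries traffic, or None if both are down."""
    if is_usable(active, failed):
        return active
    if is_usable(backup, failed):
        return backup
    return None


# PG1 on L2NE1: P1.1 is the active path, P3.1 is the backup path.
print(select_path("P1.1", "P3.1", set()))      # no failure
print(select_path("P1.1", "P3.1", {"P1"}))     # physical port P1 failed
```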
1279     That is to say, L2NE1 (or L2NE2) will not send N1's (or N2's) data
1280     packets over P3.1 (or P3.2), unless P1.1 (or P2.2) or P1 (or P2) has
1281     failed before that data forwarding.

1283  A.2.2.  VSI-Approaches

1285     L2NE1's EVC1 instance and L2NE2's EVC2 instance are both VSI
1286     instances in this case.  P1.1, P3.1, P2.2 and P3.2 are all individual
1287     ACs in these VSIs.

1289     Note that L2NE2's EVC1 instance and L2NE1's EVC2 instance are still
1290     both CCC local connections in this case, and there is no PG1 or PG2
1291     in this case, and there are no PWs in this case.

1293  Authors' Addresses

1295     Yubao Wang
1296     ZTE Corporation
1297     No.68 of Zijinghua Road, Yuhuatai District
1298     Nanjing
1299     China

1301     Email: wang.yubao2@zte.com.cn

1302     Ran Chen
1303     ZTE Corporation
1304     No. 50 Software Ave, Yuhuatai District
1305     Nanjing
1306     China

1308     Email: chen.ran@zte.com.cn