BESS WG                                                          Y. Wang
Internet-Draft                                                  Z. Zhang
Intended status: Standards Track                         ZTE Corporation
Expires: 1 March 2022                                     28 August 2021

                  EVPN VPWS as VRF Attachment Circuit
                  draft-wz-bess-evpn-vpws-as-vrf-ac-02

Abstract

   When a VRF Attachment Circuit (VRF-AC) is far away from its IP-VRF
   instance, we can deploy an EVPN VPWS ([RFC8214]) between that VRF-AC
   and its IP-VRF instance. From the viewpoint of the IP-VRF instance,
   a local virtual interface takes the place of that remote "VRF-AC".
   The IP address for that VRF-AC is now configured on the virtual
   interface; in other words, the virtual interface is the actual VRF-AC
   of the IP-VRF instance. The virtual interface is also the AC of that
   VPWS instance; in other words, the virtual interface is cross-
   connected to that remote "VRF-AC" by the VPWS instance.

   This document proposes an extension to
   [I-D.ietf-bess-evpn-inter-subnet-forwarding] to support this
   scenario.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 1 March 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Integrated Routing and Cross-connecting
     1.2.  Terminology
   2.  ARP/ND Synching and IP Prefix Synching
     2.1.  Constructing MAC/IP Advertisement Route
       2.1.1.  When CEs are Hosts
       2.1.2.  When CEs are Routers
     2.2.  Constructing Ethernet A-D Route
     2.3.  Constructing IP Prefix Advertisement Route
       2.3.1.  Direct-Prefixes Advertisement
       2.3.2.  Exclusive CE-Prefixes of Each CE
   3.  Packet Walk Through
     3.1.  When CEs are Hosts
     3.2.  When CEs are Routers
   4.  Fast Convergence for Routed Traffic
   5.  Considerations on ABRs and Route Reflectors
   6.  For Common CE-prefixes behind R1 and R2
     6.1.  Solution 1: Independent CE-BGP sessions
     6.2.  Solution 2: ECMP-Merging for RT-5G routes
       6.2.1.  ECMP-Merging by RT-5L
       6.2.2.  ECMP-Merging by RT-2R
     6.3.  Solution 3: RT-5E Routes Advertisement
       6.3.1.  CE-Prefix Advertisement by RT-5E Routes
         6.3.1.1.  When Internal Remote PEs Receive the RT-5E
         6.3.1.2.  When External Remote PEs Receive the RT-5E
         6.3.1.3.  Packet Walk Through
       6.3.2.  The Advertisement of SOI-mapping Routes
       6.3.3.  IP-mapping SOI Extended Community
   7.  Security Considerations
   8.  IANA Considerations
   9.  Normative References
   10. Informative References
   Authors' Addresses

1.  Introduction

   When a VRF Attachment Circuit (VRF-AC) is far away from its IP-VRF
   instance, we can deploy an EVPN VPWS ([RFC8214]) between that VRF-AC
   and its IP-VRF instance.
   From the viewpoint of the IP-VRF instance, a local virtual interface
   takes the place of that remote "VRF-AC". The IP address for that
   VRF-AC is now configured on the virtual interface; in other words,
   the virtual interface is the actual VRF-AC of the IP-VRF instance.
   The virtual interface is also the AC of that VPWS instance; in other
   words, the virtual interface is cross-connected to that remote
   "VRF-AC" by the VPWS instance.

   The requirements of this scenario are described in Section 1.1.

1.1.  Integrated Routing and Cross-connecting

   When an IP-VRF instance and an EVPN VPWS instance are connected by a
   virtual interface, we call the scenario the Integrated Routing and
   Cross-connecting (IRC) use case. The virtual interface connecting
   the EVPN VPWS and the IP-VRF is called an IRC interface, because
   packets received from the virtual interface are routed in the IP-VRF,
   and data packets sent to the virtual interface are cross-connected to
   the remote AC of that EVPN VPWS.

   The IRC use case is illustrated by the following figure:

                           PE1
                 +---------------------+
                 |      IRC1=10.9      |
                 |  +-----+   +------+ |.
                .|  |VPWS1|---|IPVRF1| | .
               . |  +-----+   +------+ |  .
      PE4     .  |                     |   .     PE3
   +--------+.   +---------------------+    +---------+
   |        |               |               |         |
   |+-----+ |               | RT-2          |+------+ |
   ||VPWS1| |               | <10.2, M1>    ||IPVRF1| |
   |+-----+ |               | label2=IPVRF1 |+------+ |
   |   |    |               | label1=VPWS1  |   |     |
   +---|----+.              | RT=VPWS1     .+---|-----+
       |      .      PE2    V             .     |
       |       . +---------------------+ .      |
       |        .|      IRC1=10.9      |.       |
    N1=10.2      |  +-----+   +------+ |     N3=30.2
                 |  |VPWS1|---|IPVRF1| |
   Behind N1:    |  +-----+   +------+ |
    60.0/24      |                     |
    70.0/24      +---------------------+

          Figure 1: ARP/ND Synchronizing for IRC Interfaces

   There are four PE nodes named PE1/PE2/PE3/PE4 in the above network.
   PE4 is a pure EVPN VPWS PE; there may be no IP-VRFs on it.
   PE3 is a pure L3 EVPN PE; there may be no VPWSes or MAC-VRFs on it.
   PE1 and PE2 sit on the border between the EVPN VPWS domain and the L3
   EVPN domain, so each of them is both an EVPN VPWS PE and an L3 EVPN
   PE, and has both EVPN IP-VRFs and EVPN VPWSes.

   Each of N1/N2/N3/N1b may be a host or an IP router. N1, N1b and IRC1
   are in the subnet 10.0.0.0/24, where N1's IP is 10.0.0.2, N1b's IP is
   10.0.0.3 and IRC1's IP is 10.0.0.9 (10.9). N2 and IRC2 (see
   Figure 3) are in the subnet 20.0.0.0/24, where N2's IP is 20.0.0.2
   and IRC2's IP is 20.0.0.9 (20.9). N3 is in the subnet 30.0.0.0/24.
   When N1/N2/N3/N1b is a host, it is also called H1/H2/H3/H1b in this
   document. When N1/N2/N3/N1b is a router, it is also called
   R1/R2/R3/R1b in this document. The MAC addresses of N1/N2/N3/N1b are
   M1/M2/M3/M1b respectively.

   When N1 is a router, there are two subnets behind N1: 60.0/24 and
   70.0/24.

   Note that there may be L2 switches between N1/N2/N3/N4 and their PEs.
   These switches are not shown in Figure 1.

   Note that the IRC interfaces are considered AC interfaces in EVPN
   VPWS instances. At the same time, they are considered VRF-ACs in
   IP-VRF instances.

   When N1 sends an ARP Request REQ_P1, PE4 forwards REQ_P1 to either
   PE1 or PE2, not to both. The IRC1 interfaces on both PE1 and PE2 are
   N1's subnet gateway (SNGW). But when N3 sends an ARP Reply REP_P2 to
   N1, PE3 may load-balance REP_P2 to either PE1 or PE2, not to both.

   When REQ_P1 is load-balanced to PE1 (not to PE2) but PE3 load-
   balances REP_P2 to PE2, the ARP entry of N1 will not yet exist on PE2
   when REP_P2 arrives. The forwarding of REP_P2 will therefore be
   delayed by the missing ARP entry.

   We use RT-2 routes to advertise the ARP entry of N1 from PE2 to PE3.

   Note that an ESI may be assigned to IRC1 and IRC2, but in some
   scenarios it is not necessary to advertise that ESI in the L3 EVPN
   domain.
   In such scenarios the ESI may be advertised in the EVPN VPWS domain
   only.

1.2.  Terminology

   Most of the terminology used in this document comes from [RFC7432]
   and [I-D.ietf-bess-evpn-prefix-advertisement] except for the
   following:

   *  VRF AC: VRF Attachment Circuit, an Attachment Circuit (AC) that
      attaches a CE to an IP-VRF. It is defined in [RFC4364].

   *  IRC: Integrated Routing and Cross-connecting. An IRC interface
      is the virtual interface connecting an IP-VRF and an EVPN VPWS.

   *  L3 EVI: An EVPN instance spanning the Provider Edge (PE) devices
      participating in that EVPN, which contains VRF ACs and may
      contain IRB interfaces or IRC interfaces.

   *  IP-AD/EVI: Ethernet Auto-Discovery route per EVI, where the EVI
      is an IP-VRF.

   *  IP-AD/ES: Ethernet Auto-Discovery route per ES, where the EVI for
      one of its route targets is an IP-VRF.

   *  CE-BGP: The BGP session between PE and CE. Note that a CE-BGP
      route doesn't have an RD or Route Target.

   *  RMAC: Router's MAC, which is signaled in the Router's MAC extended
      community.

   *  RT-2R: A MAC/IP Advertisement Route used in the context of an
      IP-VRF is called an RT-2R in this draft.

   *  RT-5E: An EVPN Prefix Advertisement Route with a non-reserved ESI.

   *  RT-5G: An EVPN Prefix Advertisement Route with a zero ESI and a
      non-zero GW-IP.

   *  RT-5L: An EVPN Prefix Advertisement Route with both zero ESI and
      zero GW-IP, but a valid MPLS label.

   *  SOI: Supplementary Overlay Index (see Section 6.3.3). The SOI is
      used together with an ESI to select IP A-D per EVI routes.

   *  Internal Remote PE: PEx is called an internal remote PE of an
      EVPN route ERy when PEx is on the ES identified by ERy's ESI
      field. When ERy's SOI is not zero, this also means that PEx is
      attached to the Ethernet Tag identified by the <ESI, SOI>.

   *  External Remote PE: PEx is called an external remote PE of an
      EVPN route ERy when PEx is not on the ES identified by ERy's ESI
      field. When ERy's SOI is not zero, PEx may also be a PE that is
      not attached to the Ethernet Tag identified by the <ESI, SOI>.

   *  CE-Prefix: When an IP prefix can be reached through CEx from PEy,
      that IP prefix is called PEy's CE-prefix behind CEx in this
      draft. PEy's CE-prefix behind CEx is also called PEy's CE-prefix
      for short.

   *  Common CE-Prefix: When a CE-Prefix can be reached through either
      CEy or CEz from PEy, it is called a common CE-Prefix of CEy and
      CEz, from the viewpoint of PEy.

   *  Exclusive CE-Prefix: When a CE-Prefix of PEy can be reached
      through CEy, and cannot be reached through other CEs of PEy, it
      is called an exclusive CE-Prefix of CEy, from the viewpoint of
      PEy.

   *  SNGW: Subnet-specific Gateway IP address. The SNGW of a subnet
      is the IP address that the hosts of that subnet use as the next
      hop of their default route.

   *  Intermediate subnet: The subnet that connects a PE and a CE of an
      L3 EVI.

   *  Intermediate SNGW: The SNGW of an intermediate subnet. It will be
      the IP address of an IRC interface in this draft.

   *  Intermediate nexthop: The CE's IP address in the intermediate
      subnet.

   *  Overlay nexthop: The CE-Prefix's nexthop IP address, which is in
      the address space of the L3 EVI.

   *  Original Overlay nexthop: The overlay nexthop that is advertised
      by the CE through a PE-CE routing protocol.

2.  ARP/ND Synching and IP Prefix Synching

   IP-MAC relations of hosts are learnt by PEs on the access side via a
   control plane protocol like ARP.
   When N1 is multihomed to multiple L3 EVPN PE nodes by an All-Active
   EVPN VPWS, N1's host IP/MAC will be learnt and advertised in the
   MAC/IP Advertisement Route only by the PE that receives the ARP
   packet. The MAC/IP Advertisement with a non-zero ESI will be
   received by the other multihomed PEs.

   As a result, after PE2 receives the MAC/IP Advertisement and imports
   it into the VPWS Service Instance, PE2 installs an ARP entry on the
   VPWS Service Instance's IRC interface. Such an ARP entry is called a
   remote synched ARP entry in this document.

   Note that PE3 follows the DGW1 behavior of Section 4.1 of
   [I-D.ietf-bess-evpn-prefix-advertisement] to achieve load balancing
   based on recursive route resolution by the GW-IP Overlay Index.

   When PE3 load-balances traffic towards PE1/PE2, both PE1 and PE2
   will already hold the corresponding ARP entry because of the
   following ARP synching procedures.

2.1.  Constructing MAC/IP Advertisement Route

   The CEs may be hosts or routers, and this affects how the MAC/IPs of
   these CEs should be advertised.

   *  The CEs are Hosts - In this case, there may be many hosts in the
      subnet of an IRC interface.

      It is not necessary for the MAC/IP routes of these hosts to be
      imported by their external remote PEs (e.g. PE3). These MAC/IP
      routes just need to be imported by their internal remote PEs (e.g.
      PE1/PE2).

   *  The CEs are Routers - In this case, there may be only a few
      routers in the subnet of an IRC interface.

      The MAC/IP routes of these routers should be imported by their
      external remote PEs (e.g. PE3), because the GW-IP of the RT-5G
      routes (see Section 2.3) of the CE-prefixes behind these routers
      should be resolved to these MAC/IP routes.
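
   The import rule described by the two cases above can be sketched as
   a small decision function. This is only an illustrative model under
   stated assumptions: the names CeKind, RemotePe and should_import are
   hypothetical, and a PE is treated as "internal" simply when it is
   attached to the ES named by the route's ESI.

```python
from dataclasses import dataclass
from enum import Enum


class CeKind(Enum):
    HOST = "host"
    ROUTER = "router"


@dataclass(frozen=True)
class RemotePe:
    """Hypothetical model of a PE that receives a MAC/IP route."""
    name: str
    attached_esis: frozenset  # ESIs of the Ethernet Segments this PE is on


def should_import(ce_kind: CeKind, route_esi: str, pe: RemotePe) -> bool:
    """Return True if `pe` needs to import the CE's MAC/IP route.

    Internal remote PEs (on the route's ES) always import, so that the
    synched ARP/ND entry can be installed. External remote PEs import
    only when the CE is a router, because RT-5G GW-IPs resolve to it.
    """
    is_internal = route_esi in pe.attached_esis
    if is_internal:
        return True
    return ce_kind is CeKind.ROUTER
```

   For example, with PE1 and PE2 on the ES of IRC1 and PE3 on no ES, a
   host route is imported by PE1/PE2 only, while a router route is
   imported by PE3 as well.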

   This draft introduces a new usage/construction of the MAC/IP
   Advertisement route to enable ARP/ND synching for IP addresses in
   EVPN IRC use cases. The usage/construction of this route remains
   similar to that described in
   [I-D.ietf-bess-evpn-inter-subnet-forwarding], with a few notable
   exceptions as described below.

2.1.1.  When CEs are Hosts

   *  The Route-Distinguisher should be set to the corresponding EVPN-
      VPWS context.

   *  The Ethernet Tag should be set to the VPWS Service Instance
      Identifier of the IRC interface.

   *  The MAC/IP Advertisement SHOULD carry one EVI-RT (for the EVPN
      VPWS instance) and one ES-Import RT (for the ESI of the IRC
      interface).

   *  The ESI can be set to the ESI of the IRC interface or the I-ESI of
      VPWS1's L2 EVI.

      Note that the receiver uses the ESI and Ethernet Tag ID to
      determine the VPWS Service Instance on whose IRC interface the
      synced ARP entry will be installed.

      Note that VPWS1 and VPWS2 are two VPWS Service Instances of the
      same L2 EVPN Instance, so they have different VPWS Service
      Instance Identifiers. We can then assign an I-ESI to that L2 EVI.
      The ESI of the Ethernet A-D per EVI routes for these two VPWS
      Service Instances will be set to this I-ESI. The Ethernet Tag ID
      of each of these Ethernet A-D per EVI routes (for the EVPN VPWS
      domain) will be set to its VPWS Service Instance ID.

   *  The MPLS Label1 should be set to the label of the .

2.1.2.  When CEs are Routers

   *  Route-Distinguisher: The RD of VPWS1's EVI.
   *  Ethernet Tag ID: The same as Section 2.1.1.
   *  SOI: The same as the ET-ID of Section 2.1.1.
   *  Route Target: IPVRF1's export RTs and the EVPN VPWS's export RTs.
   *  ESI: The same as Section 2.1.1.
   *  MPLS Label1: The same as Section 2.1.1.
   *  MPLS Label2: The MPLS Label2 should be set to IPVRF1's EVPN label.
   *  RMAC: The Router's MAC Extended Community attribute SHOULD be
      carried in VXLAN EVPN.

2.2.  Constructing Ethernet A-D Route

   When CEs are hosts, the ESI of the IRC interface is mainly used in
   the EVPN VPWS domain. That ESI typically has nothing to do with the
   fundamental function of the L3 EVPN domain.

   Note that PE3 or PE4 will not import an RT-2 route with an ES-Import
   RT it doesn't recognize.

   Note that the Ethernet A-D route advertisement in the EVPN VPWS
   domain still follows [RFC8214]. The IRC interface is considered an
   ordinary AC in the EVPN VPWS domain.

   When CEs are routers, the <ESI, SOI> of the RT-2R route for the GW-IP
   of the RT-5G routes will be used to do recursive resolution. Thus a
   corresponding IP A-D per EVI route should be advertised for the IRC1
   interface in the context of IPVRF1.

   *  Route-Distinguisher: IPVRF1's RD.
   *  Ethernet Tag ID: IRC1 interface's local VPWS service instance ID.
   *  Route Target: IPVRF1's export RT.
   *  ESI: IRC1's ESI or the I-ESI of VPWS1's L2 EVI.
   *  MPLS Label: IPVRF1's EVPN label.
   *  RMAC: The Router's MAC Extended Community should be set as per
      [I-D.sajassi-bess-evpn-ip-aliasing].

2.3.  Constructing IP Prefix Advertisement Route

   There may be two types of IP prefixes on PE1/PE2: direct-prefixes
   (e.g. the intermediate subnet of an IRC interface) and CE-prefixes.
   The direct-prefixes are the subnets of the PE's own interfaces (e.g.
   the IRC interface). The CE-prefixes are the prefixes behind the CE
   node N1 (especially when N1 is a router).

2.3.1.  Direct-Prefixes Advertisement

   PE1/PE2 can install synced ARP entries on the proper IRC interface
   by means of the RT-2 routes of Section 2. This ensures that both
   PE1 and PE2 know all hosts of the IRC interface's own subnet. So it
   is not necessary for PE1/PE2 to advertise per-host IP prefixes of
   that subnet to PE3 by RT-2 routes.
   It is recommended that PE1/PE2 advertise a single RT-5L route for
   that subnet to PE3 instead. The ESI of these RT-5 routes can simply
   be set to zero, because when PE3 receives such RT-5 routes from both
   PE1 and PE2, PE3 can treat them as ECMP or FRR even when their ESI is
   zero.

2.3.2.  Exclusive CE-Prefixes of Each CE

   There may be two types of CE-Prefixes on PE1/PE2: the common
   CE-prefixes (e.g. SN9) of R1 and R2, and the exclusive CE-prefixes
   (which can only be reached through a specific CE) of R1 or R2. The
   exclusive CE-Prefixes are discussed first; the common CE-prefixes
   are discussed in Section 6.

   Note that N1 may be a host or a router. When it is a router, there
   may be some prefixes behind N1 on PE1. Those prefixes will be learnt
   via a PE-CE routing protocol (e.g. CE-BGP). N1's IP address may be
   considered the overlay nexthop of those prefixes. The overlay
   nexthop of those prefixes will be carried in the RT-5 route's GW-IP
   field. Those RT-5 routes are called RT-5G routes because their
   Overlay Indexes are their GW-IPs (and their ESI and label are zero).

   Note that these RT-5G routes are advertised by PE1 to both PE2 and
   PE3. If the IRC1 interface of PE1 fails, these CE-prefixes will
   converge faster on PE3 through the withdrawal (by PE1) of the
   corresponding IP A-D per EVI route.

   Note that when PE3 receives the withdrawal of the RT-2R of 10.2 from
   PE1, and that RT-2R is the only RT-2R of 10.2, and the <ESI, SOI> of
   the RT-2R can be resolved to an IP A-D per EVI route from another PE
   (e.g. PE2), PE3 should trigger a delayed deletion of that RT-2R, so
   that an ARP/ND refresh can happen on PE2 before the deletion.

3.  Packet Walk Through

   The procedures for local/remote host learning and MAC/IP
   Advertisement route construction are described above.

3.1.  When CEs are Hosts

   When N3 sends a data packet P301 to 10.2, which is a host in the
   subnet of IRC1, P301 will match prefix 10.0/24 on PE3.

   Both PE1 and PE2 have advertised the RT-5L route of 10.0/24 to PE3.
   PE3 may treat them as ECMP or FRR, depending on their route
   attributes. PE3 then forwards P301 to PE1 or PE2, depending on the
   ECMP/FRR procedures.

   We can assume that it is PE2 that receives P301 from PE3. The
   outgoing interface for P301 (whose destination IP is 10.2) is the
   IRC1 interface. The destination MAC is found from the ARP entries
   on IRC1.

   The ARP entry for 10.2 is a synched ARP entry, because N1 sent the
   ARP Request only to PE1. It is installed on the IRC1 interface
   because the RT-2 route's route target matches VPWS1's L2 EVI and the
   RT-2 route's <ESI, Ethernet Tag ID> matches the IRC1 interface's ESI
   and VPWS Service Instance ID.

   P301 is then encapsulated with an Ethernet header and becomes an
   Ethernet packet P301E. The destination MAC address of P301E is N1's
   MAC address, which is determined by that ARP entry. The source MAC
   address of P301E is IRC1's MAC address. P301E is then sent over the
   IRC1 interface.

   After P301E is sent over the IRC1 interface, it is forwarded to PE4
   in the EVPN VPWS instance according to [RFC8214].

3.2.  When CEs are Routers

   When N3 sends a data packet P301b to a host 60.1 located behind
   R1 (N1), P301b will match prefix 60.0/24 on PE3. The RT-5G route for
   60.0/24 will be used to forward P301b. The GW-IP of that RT-5G route
   is 10.2 (R1). So PE3 uses 10.2 to do recursive route resolution and
   matches the RT-2R route of 10.2.

   Note that the recursive route resolution follows the DGW1 behavior of
   Section 4.1 of [I-D.ietf-bess-evpn-prefix-advertisement].

   Both PE1 and PE2 have advertised the IP A-D per EVI route for the
   <ESI, SOI> of the RT-2R route of 10.2.
   PE3 may treat them as ECMP or FRR, depending on whether the ESI is
   all-active or single-active. PE3 can then forward P301b to PE1 or
   PE2, depending on the ECMP/FRR procedures.

   We can assume that it is PE2 that receives P301b from PE3. The
   destination IP of P301b is in prefix 60.0/24. That prefix has been
   installed into IPVRF1 on PE2. PE2 previously received that prefix
   either from a PE-CE routing protocol or from an RT-5G route from PE1.
   The overlay nexthop or GW-IP of prefix 60.0/24 is 10.2, which is a
   host in IRC1's subnet. The outgoing interface for P301b is the IRC1
   interface.

   The ARP entry for 10.2 is found in the same way as in Section 3.1,
   the Ethernet header is added in the same way as in Section 3.1, and
   the packet is forwarded to PE4 in the same way as in Section 3.1.

4.  Fast Convergence for Routed Traffic

   When the IRC1 interface goes down, PE1 will withdraw the RT-5L route
   of 10.0/24. The RT-5G routes of 60.0/24 and 70.0/24 are merely
   changed to the stale state. When PE3 receives the withdrawal of that
   RT-5L route, it stops forwarding the data packets of those two
   subnets to PE1, but continues to forward them to PE2.

5.  Considerations on ABRs and Route Reflectors

   When an ABR or ASBR receives a MAC/IP Advertisement Route that
   contains both an EVI-RT and an ES-Import RT, it should re-advertise
   that route even if the route's MPLS Label1 is null (it should not
   consider the route malformed). When it changes the route's next hop
   to itself, it does not have to allocate a separate new label for
   each RT-2 route's MPLS Label1 field. That field can be rewritten to
   a single preconfigured MPLS label that blackholes the data packets
   received with it. The MPLS Label2 field (if it is not null),
   however, should be rewritten normally along with the next-hop
   rewriting.

6.  For Common CE-prefixes behind R1 and R2

   We can assume that there is a common prefix (SN9) and two exclusive
   prefixes (SN7 and SN8). SN9 is behind both R1 and R2, SN7 is
   particular to R1, and SN8 is particular to R2. That is to say, PE5
   can reach SN9 through either R1 or R2.

6.1.  Solution 1: Independent CE-BGP sessions

   R1 and R2 don't know which prefixes are their common prefixes and
   which are their exclusive prefixes. So R1 establishes its own CE-BGP
   session S1 to PE1, and R2 establishes its own CE-BGP session S2 to
   PE2.

   When R1 (or R2) advertises IP prefixes to PE1 (or PE2), the BGP next
   hop of these prefixes is set to R1's (or R2's) IP address in IRC1's
   (or IRC2's) subnet.

                PE4                      +-----------------------+
          +-------------+     PE1        |                       |
    SN7   |    VPWS1    |         +------+------+ ---------->    |
     +    | +---------+ |         |   (VPN1)    | RT5(SN9)       |
     |    | |        _|_|_________|__ / IRC1    | GW-IP=R1       |
     |    | | PW1   / | |    P    | (VPWS1)     | RT2(R1,M1)     |
     +-R1----O=====<  | |         +-----+-------+                |PE3
     |    | |       \_|_|___            |                     +--+---+
     |    | |         | | B \     +-----+-------+             |      |
     |    | +---------+ |    \____|__(VPWS1)    |             |      |
     |    |             |         |   \ IRC1    | ----------> |      |
    SN9   |             |         |PE5 (VPN1)   | RT2(R1,M1)  |(VPN1)--R3
     |    |    VPWS2    |     ____|__ / IRC2    | RT2(R2,M2)  |      |
     |    | +---------+ |    /    | (VPWS2)     |             |      |
     |    | |        _|_|___/     +-----+-------+             |      |
     |    | | PW2   / | | B             |                     +--+---+
     +-R2----O=====<  | |         +-----+-------+                |
     |    | |       \_|_|_________|__(VPWS2)    | ---------->    |
     |    | |         | |    P    |   \ IRC2    | RT5(SN9)       |
     +    | +---------+ |         |   (VPN1)    | GW-IP=R2       |
    SN8   |             |         +------+------+ RT2(R2,M2)     |
          +-------------+     PE2        |                       |
                                         +-----------------------+

          Figure 2: Common CE-Prefixes and Exclusive CE-Prefixes

   In such a case, the route advertisement is the same as in Section 2
   (on the condition that the CEs are routers).
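
   How PE3 ends up with two Overlay Indexes for SN9 in Figure 2 can be
   sketched as below. This is a deliberately simplified model, not the
   resolution procedure itself: it maps each GW-IP directly to the PEs
   that advertised an RT-2 for it and omits the IP A-D per EVI step.
   The route tuples and the function name resolve_overlay_indexes are
   hypothetical; R1 and R2 are written as 10.2 and 20.2.

```python
from collections import defaultdict

# (advertising PE, prefix, GW-IP), as drawn in Figure 2
RT5G_ROUTES = [("PE1", "SN9", "10.2"), ("PE2", "SN9", "20.2")]

# (advertising PE, host IP), as drawn in Figure 2
RT2_ROUTES = [("PE1", "10.2"), ("PE5", "10.2"),
              ("PE5", "20.2"), ("PE2", "20.2")]


def resolve_overlay_indexes(prefix, rt5g_routes, rt2_routes):
    """Map each GW-IP Overlay Index of `prefix` to candidate PEs."""
    pes_by_ip = defaultdict(list)
    for pe, ip in rt2_routes:
        pes_by_ip[ip].append(pe)

    paths = {}
    for _advertiser, pfx, gw_ip in rt5g_routes:
        # A GW-IP is usable only if some RT-2 route resolves it.
        if pfx == prefix and gw_ip in pes_by_ip:
            paths[gw_ip] = sorted(pes_by_ip[gw_ip])
    return paths
```

   Calling resolve_overlay_indexes("SN9", RT5G_ROUTES, RT2_ROUTES)
   yields one entry per GW-IP, 10.2 via PE1/PE5 and 20.2 via PE2/PE5,
   which is the pair of Overlay Indexes that ECMP can combine.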

   Note that, according to the recursive route resolution behavior of
   Section 4.1 of [I-D.ietf-bess-evpn-prefix-advertisement], if both
   RT-5G routes of SN9 are equally preferable and ECMP is enabled, SN9
   is added to the routing table with both Overlay Index 10.2 and
   Overlay Index 20.2.

6.2.  Solution 2: ECMP-Merging for RT-5G routes

   In some scenarios, R1 and R2 will not have any exclusive prefixes
   (e.g. SN7 or SN8 in Figure 2) at all; in other words, all of their
   prefixes are common prefixes. In such a case, when R1 advertises SN9
   to PE1 over CE-BGP session S1, 10.2 may not be the best choice for
   SN9's BGP next hop.

                                         +-----------------------+
                              PE1        |                       |
                                  +------+------+ ---------->    |
                                  |   (VPN1)    | RT5(SN9)       |
                     _____________|__ / IRC1    | GW-IP=IP201    |
              PW1   /        P    | (VPWS1)     | RT2(IP201)     |
     +-R1----O=====<              +-----+-------+ MAC201         |PE3
     |      VPWS1   \_______            |                     +--+---+
     |                    B \     +-----+-------+             |      |
     |                       \____|__(VPWS1)    |             |      |
     |                            |   \ IRC1    | ----------> |      |
    SN9                           |PE5 (VPN1)   | RT2(IP201)  |(VPN1)--R3
     |                        ____|__ / IRC2    | MAC201      |      |
     |                       /    | (VPWS2)     |             |      |
     |               _______/     +-----+-------+             |      |
     |        PW2   /     B             |                     +--+---+
     +-R2----O=====<              +-----+-------+                |
            VPWS2   \_____________|__(VPWS2)    | ---------->    |
                             P    |   \ IRC2    | RT2(IP201)     |
                                  |   (VPN1)    | MAC201         |
                                  +------+------+                |
                              PE2        |                       |
                                         +-----------------------+

               Figure 3: IP Aliasing of Common CE-Prefixes

   In such a case, we can configure a common anycast loopback address
   (say IP201, whose value is 7.7.7.7) on R1 and R2. Then, when R1
   advertises SN9 to PE1, R1 chooses IP201 as the BGP next hop of the
   advertisement. Thus the RT-5G of SN9 from PE1 will be advertised
   with GW-IP=IP201.

   In such a case, we can configure a static route in VPN1 for IP201 on
   PE1, PE2 and PE5. The static route on PE1 (called SRE1) uses NH1 as
   its overlay next hop.
   The static route on PE2 (called SRE2) uses NH2 as its overlay next
   hop. The static route on PE5 (called SRE5) uses both NH1 and NH2 as
   its overlay next hops.

   If SRE1, SRE2 and SRE5 are advertised by RT-5G routes too, the
   recursive resolution becomes complicated. There are two ways to
   simplify the recursive resolution.

6.2.1.  ECMP-Merging by RT-5L

   Note that IRC1 and IRC2 are on the same I-ES (say ESI512). Thus 10.2
   (say NH1) and 20.2 (say NH2) are behind different Ethernet Tags of
   the same I-ESI. We can assume that the ET-ID of IRC1 is ETI100,
   while the ET-ID of IRC2 is ETI200. Thus 10.2 is behind
   <ESI512, ETI100>, while 20.2 is behind <ESI512, ETI200>.

   Then each of the three PEs separately advertises an RT-5L route (say
   RT5L_201, whose ESI is zero) for IP201 (in fact it is for SRE1, SRE2
   or SRE5).

   Then we advertise an RT-5G route for SN9 (say RT5G_SN9). RT5G_SN9's
   GW-IP is IP201, its ESI is 0, and its ET-ID is 0.

   When PE3 receives RT5G_SN9 and RT5L_201, the GW-IP of RT5G_SN9 can be
   resolved to RT5L_201. The corresponding data packets of RT5G_SN9
   will then be forwarded according to IP201's ECMP paths formed by the
   corresponding RT-5L routes.

   Note that we can use this approach to merge the two ECMP path
   collections (e.g. those of <ESI512, ETI100> and those of
   <ESI512, ETI200>) for the CE-prefixes (e.g. SN9) behind a specified
   anycast IP address (e.g. 7.7.7.7 or IP201, which is the IP address
   of a loopback interface).

6.2.2.  ECMP-Merging by RT-2R

   We can substitute an RT-2 route (say RT2R_201) for RT5L_201
   (Section 6.2.1). RT2R_201's IP address is IP201, its MAC address is
   MAC201, its RD is VPN1's RD, its ESI is 0, and its ET-ID is 0.
   Such RT-2 routes MUST NOT carry any Route Targets of a Broadcast
   Domain. Its MPLS Label2 field should be set to VPN1's EVPN label,
   and its RMAC should be set to the PE's MAC address in VXLAN EVPN.
   Its MPLS Label1 field should be set to a pre-configured value that
   is shared by all such RT-2 routes.

   Note that MAC201 is a pre-configured MAC address for IP201, and
   MAC201 MUST be advertised along with the Sticky flag.

   Note that the differences between RT2R_201 and RT5L_201 exist only
   in the control plane; once they are installed into the FIB of VPN1
   in the data plane, they are the same.

6.3.  Solution 3: RT-5E Routes Advertisement

   For direct-prefixes and exclusive CE-prefixes behind each CE, no
   ESIs need to be advertised along with them.  For the common
   CE-prefixes behind R1 and R2, however, a virtual ESI can be used to
   achieve the ECMP-merging.

6.3.1.  CE-Prefix Advertisement by RT-5E Routes

   This use case differs from Section 6.2 in the following ways:

   *  There are common prefixes behind R1 and R2, but there are also
      other prefixes that can only be reached through R1 or through
      R2.

   *  For the common prefixes behind R1 and R2, the combination of R1
      and R2 can be considered as a vRouter whose two LPUs are R1 and
      R2.  Note that the vRouter is a logical entity that exists only
      for the common prefixes behind R1 and R2; it should not be used
      for other prefixes.

   *  The CE-prefixes are IPv6 prefixes with an IPv6 next hop (NH21).

   *  The vRouter is identified by VR621 (Virtual Router-ID 621).

   *  VR621 can be mapped to an IPv6 address VRID_IP.  The VRID_IP is
      selected from a 96-bit IPv6 prefix VRID_Prefix, and the lowest
      32 bits of the VRID_IP may be set to a constant X.  The lowest
      32 bits of the VRID_Prefix (that is, of its 96 bits) should be
      set to VR621.
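   The mapping in the last item above can be illustrated with a short
   sketch.  It is not part of the specification.  It assumes that the
   upper 64 bits of VRID_Prefix are an operator-chosen base prefix
   (the documentation prefix 2001:db8::/64 is used here purely as a
   placeholder) and that the constant X is 1.  The function names are
   hypothetical and chosen only for this illustration.

```python
import ipaddress

# Assumed values for illustration only; neither is defined by the draft.
BASE_PREFIX = ipaddress.IPv6Network("2001:db8::/64")  # hypothetical upper 64 bits
CONSTANT_X = 1                                        # hypothetical constant X


def build_vrid_prefix(base: ipaddress.IPv6Network, vrid: int) -> ipaddress.IPv6Network:
    """Return the 96-bit VRID_Prefix whose lowest 32 bits carry the VRID."""
    if base.prefixlen != 64:
        raise ValueError("this sketch assumes a 64-bit base prefix")
    if not 0 <= vrid < 2**32:
        raise ValueError("the VRID must fit in 32 bits")
    # The last 32 bits of a /96 are bits 32..63 of the 128-bit address.
    value = int(base.network_address) | (vrid << 32)
    return ipaddress.IPv6Network((value, 96))


def build_vrid_ip(vrid_prefix: ipaddress.IPv6Network, x: int) -> ipaddress.IPv6Address:
    """Return the VRID_IP: the VRID_Prefix with its lowest 32 bits set to X."""
    if vrid_prefix.prefixlen != 96:
        raise ValueError("VRID_Prefix must be a 96-bit prefix")
    if not 0 <= x < 2**32:
        raise ValueError("the constant X must fit in 32 bits")
    return ipaddress.IPv6Address(int(vrid_prefix.network_address) | x)


def vrid_of(vrid_prefix: ipaddress.IPv6Network) -> int:
    """Recover the VRID from the lowest 32 bits of a 96-bit VRID_Prefix."""
    return (int(vrid_prefix.network_address) >> 32) & 0xFFFFFFFF


vrid_prefix = build_vrid_prefix(BASE_PREFIX, 621)
vrid_ip = build_vrid_ip(vrid_prefix, CONSTANT_X)
print(vrid_prefix, vrid_ip, vrid_of(vrid_prefix))
```

   Under these assumptions, a PE that holds the static route for
   VRID_Prefix can derive the VRID_IP from the VRID and X alone, so
   the full IPv6 address never has to be carried in the route.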

                                 +-------------------------------+
                             PE1 |                               |
                          +------+--------+ ---------------->    |
  vRouter                 |  (VRF1:vES)   | RT5(SN9)             |
+---------+        _______|__    / IRC1   | ESI=vES251           |
|         | VPWS1 /  P    |  (VPWS1)      | SOI=VR612            |
|    R1---+---O==<        +-----+---------+ RT1(vES251,VR612)    |PE3
|         |       \___          |           Label=VRF1        +--+--+
|         |         B \   +-----+---------+                   |     |
| VR612   |            \__|__(VPWS1)      |                   |     |
|         |               |      \ IRC1   | ----------------> |     |
|         |           PE5 |  (VRF1:vES)   | RT1(vES251,VR612) |     |
|         |             __|__    / IRC2   | Label=VRF1        |     |
| (NH612) |            /  |  (VPWS2)      |                   |     |
|         |        ___/   +-----+---------+                   |     |
|         | VPWS2 / B           |                             +--+--+
|    R2---+---O==<        +-----+---------+                      |
|         |       \_______|__(VPWS2)      | ---------------->    |
+---+-----+          P    |      \ IRC2   | RT1(vES251,VR612)    |
    |                     |  (VRF1:vES)   | Label=VRF1           |
    |                     +------+--------+                      |
    +                        PE2 |                               |
SN9(Common Prefix)               +-------------------------------+

                        Figure 4: VRID as ET-ID

   *  SOI-mapping Route per each VRID

      A special static route (called the SOI-mapping route) is
      configured for prefix VRID_Prefix on each of PE1, PE2 and PE5.
      These routes are VRID_MR1 (VRID Mapping Route 1), VRID_MR2 and
      VRID_MR5 respectively.  VRID_MR1's next hop is IP102 of R1,
      which is allocated from IRC1's subnet.  VRID_MR2's next hop is
      IP202 of R2, which is allocated from IRC2's subnet.  VRID_MR5's
      next hops are both IP102 and IP202.

   *  I-ESI per L3 EVPN Instance

      Then we can assign an I-ESI (illustrated as vES251 in the
      figure) to that L3 EVI.

      Note that a single RT-1 per ES route will be advertised for
      vES251, because vES251 is dedicated to that L3 EVI.

      The RT-4 route will be advertised for DF-Election of vES251.
      AC-DF mode should be used for vES251.

   *  Ethernet Tag per each vRouter

      Each vRouter of that L3 EVI is considered to be attached to an
      Ethernet Tag of vES251.  The ET-ID of that Ethernet Tag is the
      vRouter's VRID.
      On each PE, the advertisement of the Ethernet A-D per EVI route
      is triggered by the SOI-mapping route (which represents the
      vRouter), where:

      -  RD: VRF1's RD.
      -  ESI: VRF1's I-ESI (vES251).
      -  ET-ID: The vRouter's VRID.
      -  MPLS Label: VRF1's EVPN label.
      -  Route Target: VRF1's eRT (export Route Target).

   *  RT-5E Route per each CE-Prefix

      The CE-Prefixes are advertised using RT-5E routes instead of
      RT-5G routes.

      When PE1 learns a CE-prefix SN9 from the CE-BGP session between
      PE1 and the vRouter, PE1 advertises an RT-5E route RT5E_SN9,
      where:

      -  RD: VRF1's RD.
      -  Ethernet Tag ID: The ET-ID should be set to 0.
      -  ESI: VRF1's I-ESI (vES251).
      -  Supplementary Overlay Index: The VRID of the vRouter that
         advertised the CE-Prefix.  The SOI can be carried in the
         IP-mapping SOI extended community.
      -  MPLS Label: VRF1's EVPN label.
      -  Route Target: VRF1's eRT (export Route Target).

6.3.1.1.  When Internal Remote PEs Receive the RT-5E

   PE5 receives RT5E_SN9, whose VRID_IP matches the local SOI-mapping
   route VRID_MR5.  VRID_MR5 indicates that RT5E_SN9 should be
   installed as if its overlay next hop were the VRID_IP.  The VRID_IP
   can be inferred from the SOI, VRID_MR5 and the constant X.

6.3.1.2.  When External Remote PEs Receive the RT-5E

   PE3 receives RT5E_SN9, whose SOI does not match any local
   SOI-mapping route.  RT5E_SN9 should be installed (as FIB_Entry_6)
   with the pair <ESI, SOI> carried in the route (vES251 and the
   vRouter's VRID) as its Overlay Index.

6.3.1.3.  Packet Walk Through

   When PE3 uses that RT-5E route to forward a data packet DP6, it
   follows [I-D.wang-bess-evpn-ether-tag-id-usage].

   When PE2 receives DP6 from PE3, it forwards DP6 according to
   FIB_Entry_6.

6.3.2.  The Advertisement of SOI-mapping Routes

   VRID_MR1, VRID_MR2 and VRID_MR5 can be advertised using RT-5L
   routes along with EVI-RT and ES-Import RT, which precludes the
   external remote PEs from importing these routes into their IP-VRF.
   These routes do not need to be used on the external remote PEs.

6.3.3.  IP-mapping SOI Extended Community

   The IP-mapping SOI extended community is an extension of the
   Supplementary Overlay Index extended community.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=TBD  |Type=4 |O|Z|F=1| Flags |V|G|Rsv|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        IP-mapping SOI                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 5: IP-mapping SOI Extended Community

   Where:

   IP-mapping SOI:  An SOI that is derived from or mapped to an IP
      address, Router ID, static route, etc.

   V Flag:  IPv6 Flag.  When it is set to 1, it indicates that the SOI
      should be mapped to an overlay IPv6 next hop on internal remote
      PEs; otherwise the SOI should be mapped to an overlay IPv4 next
      hop (whose value is the same as the IP-mapping SOI field) on
      internal remote PEs.

      When the V Flag is 1, on the internal remote PEs, the IP-mapping
      SOI is mapped to an IPv6 address (like the VRID_IP in
      Section 6.3.1, Paragraph 2, Item 5) in the address space of the
      IP-VRF, and is then used in the same way as in the IPv4 case.

      When the V Flag is zero, on the internal remote PEs, the
      IP-mapping SOI does not need to be mapped to an IPv6 address.

   F:  The Format Indicator is set to 1 to indicate that this is a
      type-specific SOI.

   Type:  The Type code is 4, to indicate that this is an IP-mapping
      SOI.

   Rsv:  Reserved for future use.

   G Flag:  When the G Flag is zero, on the external remote PEs, the
      SOI-mapped IP address can be used as if it were the GW-IP field
      of the RT-5 route it belongs to, except that an RT-2 route
      (which is discussed in Appendix B.2 of
      [I-D.wang-bess-evpn-arp-nd-synch-without-irb]) does not have to
      be found before the recursive resolution.
      When the V Flag is 1, the SOI-mapped IP address is an IPv6
      address like the VRID_IP in Section 6.3.1, Paragraph 2, Item 5.
      When the V Flag is 0, the SOI-mapped IP address is the SOI
      itself.

      When the G Flag is set to 1, the advertising PE should advertise
      an RT-5L route for that SOI-mapped IP address, and the RT-5L
      route should not use EVI-RT and ES-Import RT.

   Other fields:  The same as in
      [I-D.wang-bess-evpn-ether-tag-id-usage].

7.  Security Considerations

   TBD.

8.  IANA Considerations

   No IANA considerations are needed.

9.  Normative References

   [I-D.ietf-bess-srv6-services]
              Dawra, G., Filsfils, C., Talaulikar, K., Raszuk, R.,
              Decraene, B., Zhuang, S., and J. Rabadan, "SRv6 BGP
              based Overlay Services", Work in Progress,
              Internet-Draft, draft-ietf-bess-srv6-services-07,
              11 April 2021.

   [I-D.ietf-bess-evpn-prefix-advertisement]
              Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A.
              Sajassi, "IP Prefix Advertisement in EVPN", Work in
              Progress, Internet-Draft,
              draft-ietf-bess-evpn-prefix-advertisement-11,
              18 May 2018.

   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
              Sajassi, A., Salam, S., Thoria, S., Drake, J., and J.
              Rabadan, "Integrated Routing and Bridging in EVPN", Work
              in Progress, Internet-Draft,
              draft-ietf-bess-evpn-inter-subnet-forwarding-15,
              26 July 2021.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP
              MPLS-Based Ethernet VPN", RFC 7432,
              DOI 10.17487/RFC7432, February 2015.

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
              Rabadan, "Virtual Private Wire Service Support in
              Ethernet VPN", RFC 8214, DOI 10.17487/RFC8214,
              August 2017.

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W.
              Henderickx, "A Network Virtualization Overlay Solution
              Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018.

   [I-D.wang-bess-evpn-ether-tag-id-usage]
              Wang, Y., "Ethernet Tag ID Usage Update for Ethernet A-D
              per EVI Route", Work in Progress, Internet-Draft,
              draft-wang-bess-evpn-ether-tag-id-usage-03,
              26 August 2021.

   [I-D.sajassi-bess-evpn-ip-aliasing]
              Sajassi, A., Badoni, G., Warade, P., Pasupula, S., Drake,
              J., and J. Rabadan, "EVPN Support for L3 Fast
              Convergence and Aliasing/Backup Path", Work in Progress,
              Internet-Draft, draft-sajassi-bess-evpn-ip-aliasing-02,
              8 June 2021.

10.  Informative References

   [I-D.wang-bess-evpn-arp-nd-synch-without-irb]
              Wang, Y. and Z. Zhang, "ARP/ND Synching And IP Aliasing
              without IRB", Work in Progress, Internet-Draft,
              draft-wang-bess-evpn-arp-nd-synch-without-irb-07,
              9 August 2021.

Authors' Addresses

   Yubao Wang
   ZTE Corporation
   No. 68 of Zijinghua Road, Yuhuatai District
   Nanjing
   China

   Email: wang.yubao2@zte.com.cn


   Zheng(Sandy) Zhang
   ZTE Corporation
   No. 50 Software Ave, Yuhuatai District
   Nanjing
   China

   Email: zhang.zheng@zte.com.cn