BESS WG                                                          Y. Wang
Internet-Draft                                                  Z. Zhang
Intended status: Standards Track                         ZTE Corporation
Expires: November 11, 2020                                  May 10, 2020


              ARP/ND Synching And IP Aliasing without IRB
            draft-wang-bess-evpn-arp-nd-synch-without-irb-05

Abstract

   This document proposes an extension to RFC 7432 and to the IP
   aliasing procedures of draft-sajassi-bess-evpn-ip-aliasing that
   performs ARP synchronization and IP aliasing for Layer 3 routes. The
   extension is needed for EVPN signalled L3VPN to build a complete IP
   ECMP. The phrase "EVPN signalled L3VPN" means that there may be no
   MAC-VRF or IRB interface in the use case. When there is no MAC-VRF or
   IRB interface, an EVPN signalled L3VPN is also called a "pure L3VPN
   instance", which is a different use case from the one addressed in
   draft-sajassi-bess-evpn-ip-aliasing.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 11, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  ARP/ND Synching and IP Aliasing
     2.1.  Constructing MAC/IP Advertisement Route
     2.2.  Constructing IP-AD/EVI Route
     2.3.  Constructing IP-AD/ES Route
   3.  Fast Convergence for Routed Traffic
   4.  Determining Reachability to Unicast IP Addresses
   5.  Forwarding Unicast Packets
   6.  EVPN signalled L3VPN
     6.1.  RT-5E Advertisement on Distributed L3 GW
     6.2.  Centralized RT-5G Advertisement for Distributed L3
           Forwarding
       6.2.1.  Centralized CE-BGP
       6.2.2.  RT-2E Advertisement from PE1/PE2 to PE3
       6.2.3.  RT-5G Advertisement from PE3 to PE1/PE2
       6.2.4.  RT-2E Advertisement between PE1 and PE2
       6.2.5.  Egress ESI Link Protection between PE1 and PE2
       6.2.6.  Comparing with Distributed RT-5G Advertisement
       6.2.7.  Mass-Withdraw by EAD/ES Route
       6.2.8.  On the Failure of PE3 Node
       6.2.9.  Floating GW-IP between R1 and R2
     6.3.  RT-5L Advertisement
   7.  Load Balancing of Unicast Packets
   8.  Special Considerations for Single-Active ESI
   9.  Security Considerations
   10. IANA Considerations
   11. Normative References
   Authors' Addresses

1. Introduction

   In [I-D.sajassi-bess-evpn-ip-aliasing], an extension to [RFC7432]
   that performs aliasing for Layer 3 routes is proposed for symmetric
   IRB in order to build a complete IP ECMP. Typically, however, the
   same IP-VRF instance may contain both IRB interfaces (which do EVPN
   IRB on a per-MAC-VRF basis) and VRF interfaces. It is necessary to
   apply the EVPN control plane to the VRF interfaces in order to
   support EVPN signalled L3VPN, including both such mixed situations
   and the pure L3VPN instance use case, where the IP-VRF instances may
   contain no IRB interfaces at all.

                                    +---------+
                 +-------------+    |         |
                 |             |    |         |
                /|     PE1     |----|         |   +-------------+
               / |             |    |  MPLS/  |   |             |
          LAG /  +-------------+    |  VxLAN/ |   |     PE3     |---N3
 N1---SW1=====                      |  NVGRE/ |   |             |
     /        \  +-------------+    |  SRv6   |---|             |
   N2          \ |             |    |         |   +-------------+
                \|     PE2     |----|         |
                 |             |    |         |
                 +-------------+    |         |
                                    |         |
                                    |         |
                                    +---------+

       Figure 1: ARP/ND Synchronizing and IP Aliasing without IRB

   There are three CE nodes, N1, N2 and N3, in the above network. Each
   of them may be a host or an IP router. When N1/N2/N3 is a host, it is
   also called H1/H2/H3 in this document. When N1/N2/N3 is a router, it
   is also called R1/R2/R3 in this document.

   Consider a pair of multi-homing PEs, PE1 and PE2. Let there be two
   hosts, H1 and H2, attached to them via an L2 switch SW1. Consider
   another PE, PE3, and a host H3 attached to it. H1 and H2 represent
   subnet SN1, and H3 represents subnet SN2.

   Note that this differs from [I-D.sajassi-bess-evpn-ip-aliasing] in
   the following aspects: there may be no MAC-VRF or IRB interface on
   PE1/PE2/PE3, and it is the IP-VRFs that are called EVPN instances
   instead. Such an EVPN instance can be called a pure L3 EVPN instance,
   or L3 EVI for short. The anycast gateway of H1/H2 is configured on a
   sub-interface on PE1/PE2.

   Note that the communication between H1 and H2 does not pass through
   any of the multi-homing PEs, so PE1/PE2 need not maintain a broadcast
   domain and its IRB interface for SN1.

   Note that SW1 is multi-homed to PE1 and PE2 via a LAG interface,
   which may load-balance traffic across the PEs.

   This draft proposes an extension that performs ARP/ND synchronization
   and IP aliasing for Layer 3 routes, which is needed for an L3 EVI to
   build a complete IP ECMP.

1.1. Terminology

   Most of the terminology used in this document comes from [RFC7432]
   and [I-D.sajassi-bess-evpn-ip-aliasing], except for the following:

   VRF Interface: An interface that connects to a CE for an IP-VRF but
      is not an IRB interface.

   L3 EVI: An EVPN instance spanning the Provider Edge (PE) devices
      participating in that EVPN, which contains VRF interfaces and may
      also contain IRB interfaces.

   IP-AD/EVI: Ethernet Auto-Discovery route per EVI, where the EVI is an
      IP-VRF.

   IP-AD/ES: Ethernet Auto-Discovery route per ES, where the EVI for one
      of its route targets is an IP-VRF.

   CE-BGP: The BGP session between PE and CE. Note that a CE-BGP route
      does not have an RD or Route-Target.

   RMAC: Router's MAC, which is signaled in the Router's MAC extended
      community.

   RT-2E: A MAC/IP Advertisement route with a non-reserved ESI.

   RT-5E: An EVPN Prefix Advertisement route with a non-reserved ESI.

   RT-5G: An EVPN Prefix Advertisement route with a zero ESI and a
      non-zero GW-IP.

   RT-5L: An EVPN Prefix Advertisement route with both a zero ESI and a
      zero GW-IP.

2. ARP/ND Synching and IP Aliasing

   Host IP and MAC routes are learnt by PEs on the access side via a
   control plane protocol such as ARP. When a CE is multihomed to
   multiple PE nodes using a LAG and runs in All-Active Redundancy Mode,
   the host IP is learnt and advertised in a MAC/IP Advertisement only
   by the PE that receives the ARP packet (PE1 in this example). The
   MAC/IP Advertisement, which carries a non-zero ESI, is received by
   both PE2 and PE3.

   As a result, after PE2 receives the MAC/IP Advertisement and imports
   it into the L3 EVI, PE2 installs an ARP entry on the VRF interface
   whose subnet matches the IP address from the MAC/IP Advertisement.
   Such an ARP entry is called a remote synched ARP entry in this
   document.
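
   The installation step above can be sketched in a few lines of
   Python. This is only an illustration under stated assumptions, not
   the procedure of any implementation: the class names, interface
   names and addresses are hypothetical, and the ESI check reflects the
   note in Section 2.1 that the ESI selects the VRF interface on
   PE1/PE2.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_interface


@dataclass
class VrfInterface:
    """Hypothetical model of a downlink VRF interface on PE2."""
    name: str
    address: str                  # interface address with prefix length
    esi: str                      # ESI configured on the interface
    arp: dict = field(default_factory=dict)   # host IP -> (MAC, origin)


@dataclass
class MacIpRoute:
    """Simplified RT-2E: only the fields needed for ARP synching."""
    ip: str
    mac: str
    esi: str


def install_remote_synced_arp(interfaces, route):
    """Install the ARP entry on the matching VRF interface.

    An interface matches when it has the route's ESI and its subnet
    contains the route's IP address. Returns that interface, or None.
    """
    host = ip_address(route.ip)
    for intf in interfaces:
        if intf.esi != route.esi:
            continue
        if host in ip_interface(intf.address).network:
            intf.arp[route.ip] = (route.mac, "remote-synced")
            return intf
    return None


# Example data (documentation address ranges, assumed for illustration).
pe2_interfaces = [
    VrfInterface("sub-if-10", "192.0.2.1/24", esi="ESI1"),
    VrfInterface("sub-if-20", "198.51.100.1/24", esi="ESI2"),
]
rt2e_from_pe1 = MacIpRoute(ip="192.0.2.2", mac="00:00:5e:00:53:01", esi="ESI1")
chosen = install_remote_synced_arp(pe2_interfaces, rt2e_from_pe1)
```

   In this sketch the entry lands on sub-if-10 only, because that is
   the interface whose ESI and subnet both match the received route.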

   Note that the PEs follow [I-D.sajassi-bess-evpn-ip-aliasing] to
   achieve ESI load balancing, except for the construction of the MAC/IP
   Advertisement route and the IP-AD per EVI route.

   When PE3 load-balances traffic towards the multihomed Ethernet
   Segment, both PE1 and PE2 already hold the corresponding ARP entry
   because of the ARP synching procedures.

   Typically there may be both IRB interfaces and VRF interfaces in an
   IP-VRF instance; this is called the "VRF interface in EVPN IRB" use
   case in this document. However, the IRB and VRF interfaces are
   independent of each other in the EVPN control plane. The use case
   here is therefore constrained to a pure L3 EVPN schema, because that
   is enough to describe all the control-plane updates for both the pure
   L3 EVPN use case and the "VRF interface in EVPN IRB" use case.

   In the current EVPN control plane for the "VRF interface in EVPN IRB"
   use case, the VRF interface is considered an "external link" that
   merely inter-operates with the EVPN control plane. This document
   assumes that it is better to apply the EVPN control plane directly to
   the VRF interfaces.

2.1. Constructing MAC/IP Advertisement Route

   This draft introduces a new usage/construction of the MAC/IP
   Advertisement route to enable aliasing for IP addresses in pure L3
   EVPN use cases. The usage/construction of this route remains similar
   to that described in RFC 7432, with a few notable exceptions as
   below.

   * The Route-Distinguisher should be set to that of the corresponding
     L3VPN context.

   * The Ethernet Tag should be set to 0.

   * The MAC/IP Advertisement SHOULD carry one or more IP-VRF
     Route-Target (RT) attributes.

   * The ESI SHOULD be set to the ESI of the VRF interface from which
     the ARP entry is learned.

     Note that the ESI is used to install remote synched ARP entries on
     the corresponding VRF interfaces on PE1/PE2.
     On PE3, however, it is used only to load-balance traffic.

   * The MPLS Label1 should be set to implicit-null in MPLS/SRv6
     encapsulation. For VXLAN encapsulation, the MPLS Label1 should be
     set to 0 instead.

     Note that there may be no MAC-VRF here, and this is outside the
     scope of RFC 7432.

   * The MPLS Label2 should be set to the local label of the IP-VRF in
     MPLS or VXLAN EVPN, but it should be set to implicit-null in SRv6
     EVPN.

     Note that the label may be a VNI label or an MPLS label.

     Note that in SRv6 EVPN an SRv6 L3 Service TLV MAY also be
     advertised along with the route following
     [I-D.dawra-bess-srv6-services]. The SRv6 L2 Service TLV will not be
     advertised along with the route, because no MAC-VRF exists in this
     use case.

   * The RMAC Extended Community attribute SHOULD be carried in VXLAN
     EVPN.

2.2. Constructing IP-AD/EVI Route

   The IP-AD/EVI route is used for two purposes. Between PE1 and PE2, it
   provides egress link protection for the subnet of the downlink VRF
   interface. Between PE1/PE2 and PE3, it lets PE3 load-balance across
   the PEs adjacent to the ES.

   The usage/construction of this route is similar to the IP-AD per EVI
   route described in [I-D.sajassi-bess-evpn-ip-aliasing], with a few
   notable exceptions as below.

   Note that there may be no MAC-VRF here, and this is outside the scope
   of [RFC7432] and [I-D.sajassi-bess-evpn-ip-aliasing].

   Note that the considerations for single-active ESIs differ from those
   of [I-D.sajassi-bess-evpn-ip-aliasing]; they are detailed in
   Section 8.

   In older versions of this document, this Ethernet Auto-Discovery
   route was called the Ethernet Auto-Discovery route per IP-VRF,
   abbreviated as EAD/IP-VRF.

2.3. Constructing IP-AD/ES Route

   The usage/construction of this route remains similar to the IP-AD per
   ES route described in Section 3.1 of
   [I-D.sajassi-bess-evpn-ip-aliasing], with a few notable exceptions as
   explained below.

   There may be no MAC-VRF RTs in the IP-AD/ES route.

   In older versions of this document, this Ethernet Auto-Discovery
   route was called the EAD/ES route.

3. Fast Convergence for Routed Traffic

   The procedures for fast convergence do not change from
   [I-D.sajassi-bess-evpn-ip-aliasing], with a few notable exceptions as
   explained below.

   The local ARP entries and remote synced ARP entries are
   installed/learned on a VRF interface rather than an IRB interface.

   There is no MAC entry.

4. Determining Reachability to Unicast IP Addresses

   The procedures for local/remote host learning and for constructing
   the MAC/IP Advertisement route are described above. The procedures
   for route resolution do not change from
   [I-D.sajassi-bess-evpn-ip-aliasing] and/or
   [I-D.ietf-bess-evpn-prefix-advertisement].

5. Forwarding Unicast Packets

   Because of the nature of the MPLS label or SRv6 SID of an IP-VRF
   instance, when these IP-AD/EVI routes are used in IP-VRF routing and
   forwarding procedures, the inner Ethernet headers are absent from the
   packets transported following these IP-AD/EVI routes.

   Note that in [I-D.sajassi-bess-evpn-ip-aliasing] the IP-AD per EVI
   route carries a "Router's MAC" extended community in case the RMAC is
   not the same among different PEs. In these cases, the inner
   destination MAC of the corresponding data packets from PE3 to PE1/PE2
   must use the RMAC in the IP-AD/EVI route instead, even if there is an
   RMAC in the RT-2E route.

   Note that this is a data-plane update of
   [I-D.ietf-bess-evpn-prefix-advertisement] for both EVPN signalled
   L3VPN and [I-D.sajassi-bess-evpn-ip-aliasing].
   According to Section 4.3 of
   [I-D.ietf-bess-evpn-prefix-advertisement] or Section 3.2.3 of
   [I-D.ietf-bess-evpn-inter-subnet-forwarding], the inner destination
   MAC follows the RMAC of the RT-5E route or RT-2E route. Although PE3
   SHOULD prefer the RMAC in the IP-AD/EVI routes following this
   document, we also suggest including the RMAC in the RT-2E or RT-5E
   route for compatibility.

   When a packet is forwarded following the subnet route of a downlink
   VRF interface and the bypass tunnel is used, no ARP lookup is needed
   because of the RMAC in the IP-AD/EVI route. If the downlink VRF
   interface is up at that time, however, the ARP lookup is used as
   usual to fill in the destination MAC of the packet's Ethernet header.

   Note that packets received from a bypass tunnel can only be forwarded
   to a local downlink VRF interface. In order to prevent a micro-loop
   when the R1 node fails, a few split-horizon filter rules should be
   introduced. In EVPN NVO3, a packet received from a tunnel is not
   allowed to be forwarded to the same tunnel. In SRv6 EVPN, a packet
   received from a locator may, depending on configuration, not be
   allowed to be forwarded to the same locator. In MPLS EVPN, the packet
   may include an extra label to identify its ingress router, as
   proposed in [I-D.wang-bess-evpn-context-label]. In MPLS EVPN, the
   packet may include an extra label to identify that it is forwarded on
   a bypass tunnel; this extra label can be an extended special-purpose
   label or an ESI label.

6. EVPN signalled L3VPN

   EVPN signalled L3VPN can be deployed without EVPN IRB, as MPLS/BGP
   VPNs have been for a long time, but it can also be combined with EVPN
   IRB. EVPN signalled L3VPN without EVPN IRB is not well defined yet,
   so we take the non-IRB use case as an example. The following routes
   and procedures can be used in the EVPN IRB use case too.
   Note that in the EVPN IRB use case, the IRB interfaces are VRF
   interfaces too.

6.1. RT-5E Advertisement on Distributed L3 GW

   PE1/PE2 can install a synced ARP entry on the proper VRF interface
   thanks to the RT-2 route of Section 2.1, so it is not necessary for
   PE1/PE2 to advertise per-host IP prefixes by RT-2 routes. It is
   recommended that PE1/PE2 advertise an RT-5 route per subnet to PE3
   instead. The ESI of these RT-5E routes can be set to the ESI of the
   corresponding VRF interface. If the VRF interface fails, these
   subnets converge faster on PE3 through the withdrawal of the
   corresponding IP-AD/EVI route.

   Note that N1/N2 may be a host or a router. When it is a router, those
   subnets are the subnets behind it. When N1 and N2 are hosts, those
   subnets are the subnets of N1 and N2, whether they are different
   subnets or not.

6.2. Centralized RT-5G Advertisement for Distributed L3 Forwarding

   When N1/N2/N3 is a router, it is called R1/R2/R3 in the following
   figure. Note that Figure 1 only illustrates the physical Ethernet
   links, whereas Figure 2 illustrates the logical L3 adjacencies
   between PE and CE.

                      PE2
+----+         +---------------+
|    | 20.2    | 20.1 +------+ |   ------>
| R2 |===+------------|      | |    RT-2E
|    |   |     |      |IPVRF1| |    20.2               PE3
+----+   |  +---------|      | |    ESI1        +---------------+
Prefix2  |  |  | 10.1 +------+ |                |               |
         |  |  +---------------+                | +-----------+ |
         |  |          ^                        | |  IPVRF1   | |
         |  |          | RT-2E      <--------   | |           | |----R3
         |  |    ESI1  | 10.2         RT-5G     | |  3.3.3.3  | |
         |  |          | ESI1        Prefix1    | +-----------+ |
         |  |          |              10.2      |   ^           |
         |  |  +---------------+                |   |           |
Prefix1  |  |  | 20.1 +------+ |                +---|-----------+
+----+   +--|---------|      | |                    |
|    |      |  |      |IPVRF1| |                    |
| R1 |======+---------|      | |   ------>          |
|    | 10.2    | 10.1 +------+ |    RT-2E           |
+----+         +---------------+    10.2            | CE-BGP
  |                   PE1           ESI1            | Prefix1
  |                                                 | NH=10.2
  |                     CE-BGP                      |
  +------------------------>------------------------+

               Figure 2: Centralized RT-5G Advertisement

   Note that R1/R2 should establish CE-BGP sessions with both PE1 and
   PE2 in case one of them fails. PE1 and PE2 then independently
   advertise RT-5E routes to PE3 for the prefixes they learn from
   CE-BGP. If R1/R2 prefers to establish a single CE-BGP session, it can
   establish that session with PE3 instead. This CE-BGP session is
   called the centralized CE-BGP session. When the centralized CE-BGP
   session is used, RT-5G routes should be used instead.

   Note that the centralized CE-BGP session is used only for route
   advertisement; a distributed Layer 3 forwarding framework is still
   expected.

6.2.1. Centralized CE-BGP

   The CE-BGP session between R1 and PE3 is established between 10.2 and
   3.3.3.3. The CE-BGP session between R2 and PE3 is established between
   20.2 and 3.3.3.3. The IP address 10.2/20.2 is called the uplink
   interface address of R1/R2 in this document. The IP address 3.3.3.3
   is called the centralized loopback address of IPVRF1 in this
   document.
   The IP address 10.1/20.1 is called the downlink VRF interface address
   of PE1/PE2 in this document.

   Note that the downlink VRF interface is a Layer 3 link and need not
   be attached to a BD.

   R1 advertises a BGP route for a prefix behind it (say "Prefix1") to
   PE3 via that CE-BGP session. The nexthop for Prefix1 is R1's uplink
   interface address (say 10.2).

   The route advertisement of R2 is similar to the above advertisement.

   Note that packets from R1/R2 to the centralized loopback address may
   be routed following the default route on R1/R2.

6.2.2. RT-2E Advertisement from PE1/PE2 to PE3

   When PE1 learns the ARP entry of 10.2, it advertises an RT-2E route
   to PE3. The ESI value of the RT-2E route is ESI1, which is the ESI of
   PE1's downlink VRF interface for R1. The RT-2E route is constructed
   following Section 2.1.

   Note that in [RFC7432], when the ESI is single-active, MAC forwarding
   uses only the label and the MPLS nexthop of the RT-2E route, as long
   as they are valid for forwarding. In IP forwarding, however, we
   assume that the ESI is always preferred, even if the ESI is
   single-active. This is similar to Table 1 in Section 3.2 of
   [I-D.ietf-bess-evpn-prefix-advertisement]. The ESI usage in IP
   forwarding is outside the scope of [RFC7432].

   The RT-2E route advertisement of PE2 is similar to the above
   advertisement.

6.2.3. RT-5G Advertisement from PE3 to PE1/PE2

   When PE3 receives Prefix1 from the CE-BGP session, the nexthop for
   Prefix1 is 10.2 and the ESI for 10.2 is ESI1, so PE3 advertises an
   RT-5G route to PE1/PE2 for Prefix1. The GW-IP value of the RT-5G
   route for Prefix1 is 10.2.

   Note that PE3 can load-balance packets for Prefix1 via the IP-AD/EVI
   routes from PE1/PE2, because ESI1 is the ESI for Prefix1's GW-IP.

   The RT-5G route advertisement and packet forwarding for Prefix2 are
   similar to the above.
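
   The recursive resolution described above (prefix to GW-IP, GW-IP to
   ESI, ESI to the PEs that advertised IP-AD/EVI routes) can be
   sketched as follows. This is a minimal illustration, assuming
   simple dictionaries in place of real route tables; the function
   name, the table layout and the label values 1001/2001 are
   hypothetical, and "10.2"/"20.2" are the shorthand addresses of
   Figure 2 used as opaque keys.

```python
def resolve_prefix(prefix, gw_ip_of, esi_of, ip_ad_evi):
    """Return the ECMP path list [(pe, label), ...] for a prefix.

    gw_ip_of:  prefix -> GW-IP (CE-BGP nexthop on PE3, RT-5G elsewhere)
    esi_of:    GW-IP  -> ESI   (from the RT-2E route for that GW-IP)
    ip_ad_evi: ESI    -> [(pe, label), ...] (from IP-AD/EVI routes)
    An empty list means the prefix cannot be resolved yet.
    """
    gw_ip = gw_ip_of.get(prefix)
    if gw_ip is None:
        return []
    esi = esi_of.get(gw_ip)
    if esi is None:
        return []
    return sorted(ip_ad_evi.get(esi, []))


# State as PE3 might hold it for the topology of Figure 2.
gw_ip_of = {"Prefix1": "10.2", "Prefix2": "20.2"}
esi_of = {"10.2": "ESI1", "20.2": "ESI1"}
ip_ad_evi = {"ESI1": [("PE2", 2001), ("PE1", 1001)]}

paths = resolve_prefix("Prefix1", gw_ip_of, esi_of, ip_ad_evi)
```

   With both IP-AD/EVI routes present, Prefix1 resolves to PE1 and PE2,
   which is what allows PE3 to load-balance even though only PE1
   advertised the RT-2E route for 10.2.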

   Note that the centralized loopback address is advertised by PE3 via
   an RT-5L route. The nexthop of the RT-5L route is PE3, and the GW-IP
   value of the RT-5L route is zero. The label of the RT-5L route is
   IPVRF1's label on PE3. The RMAC of the RT-5L route is PE3's MAC when
   the encapsulation is VXLAN.

6.2.4. RT-2E Advertisement between PE1 and PE2

   The RT-2E routes advertised between PE1 and PE2 are used to sync
   their ARP entries to each other in order to avoid ARP misses. The ESI
   value of these two RT-2E routes is ESI1.

   Note that we assume that the ARP entry for 10.2 is learned on PE1
   only, and that the ARP entry for 20.2 is learned on PE2 only. Note
   also that the two downlink VRF interfaces for R1/R2 on PE1/PE2 are
   sub-interfaces of the same physical interface, so they have the same
   ESI.

6.2.5. Egress ESI Link Protection between PE1 and PE2

   The IP-AD/EVI routes between PE1 and PE2 are used for egress link
   protection. The egress link protection follows the second approach in
   Section 6 of [RFC8679].

   Note that although the ARP entry for 10.2 on PE2 is synced from PE1
   via an RT-2E route, the ARP entry on PE2 is installed so that packets
   are primarily forwarded directly to the corresponding downlink VRF
   interface. The bypass tunnel following the IP-AD/EVI route is
   activated only when the downlink VRF interface fails.

6.2.6. Comparing with Distributed RT-5G Advertisement

   When R1/R2 establish CE-BGP sessions with both PE1 and PE2, RT-5G
   routes can be used by PE1/PE2 instead of RT-5E routes. But when R1
   establishes only a single CE-BGP session with PE1, there will be
   trouble when PE1 fails. Even if PE2/PE3 apply a delayed deletion when
   PE1 fails, no delay can be long enough if PE1 never comes up again.

   Note that when there is only a single CE-BGP session, the RT-5E
   advertisement faces the same problem.
   In fact it is even worse when R1 uses different subnets to connect to
   PE1 and PE2, as described in Section 1.2 of
   [I-D.sajassi-bess-evpn-ip-aliasing]. Because an RT-5E route can sync
   only the prefixes and not the nexthops, when PE2 receives an RT-5E
   route from PE1, PE2 cannot resolve the ARP entry for the other uplink
   interface, the one that connects R1 to PE2.

   Note that when R1 uses different subnets to connect to PE1 and PE2,
   it is not necessary to configure a BD for the two subnets connecting
   PE and CE, unlike what is described in Section 1.2 of
   [I-D.sajassi-bess-evpn-ip-aliasing].

6.2.7. Mass-Withdraw by EAD/ES Route

   Assume that R1 and R2 are attached to different IP-VRFs (say IPVRF1
   and IPVRF2 respectively). If the physical interface of the downlink
   VRF interfaces on PE1 fails, PE1 withdraws the IP-AD/ES route of
   ESI1, so PE3 re-routes 10.2 for Prefix1 in IPVRF1 and 20.2 for
   Prefix2 in IPVRF2 at the same time. Data packets for Prefix1 and
   Prefix2 are then sent to PE2 instead.

6.2.8. On the Failure of PE3 Node

   On the failure of PE3, PE1/PE2 should delay the deletion of the RT-5G
   route from PE3. PE3 can use a new BGP attribute to indicate the
   delayed-deletion requirement to PE1/PE2. Otherwise the L3 traffic
   between R1 and R2 will be interrupted. Fortunately, PE3 will
   typically have a redundant node (PE3' in Figure 3), and PE3' can take
   PE3's place when PE3 fails.
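
   The mass-withdraw behaviour of Section 6.2.7 can be sketched as
   below. This is only an illustration under assumed data structures:
   the function name and the nested-dictionary layout are
   hypothetical, and real implementations track considerably more
   state per path.

```python
def withdraw_ip_ad_es(paths_by_vrf, esi, pe):
    """Apply the withdrawal of pe's IP-AD/ES route for esi.

    paths_by_vrf maps IP-VRF name -> {ESI -> [PE, ...]}. The failed PE
    is removed from the path list of the ESI in every IP-VRF at once,
    so no per-prefix or per-host withdrawal is needed.
    Returns the names of the IP-VRFs whose path lists changed.
    """
    touched = []
    for vrf, by_esi in paths_by_vrf.items():
        pes = by_esi.get(esi)
        if pes and pe in pes:
            pes.remove(pe)
            touched.append(vrf)
    return touched


# State as PE3 might hold it: R1 in IPVRF1 and R2 in IPVRF2, both
# reached over sub-interfaces of the same physical link (ESI1).
paths_by_vrf = {
    "IPVRF1": {"ESI1": ["PE1", "PE2"]},
    "IPVRF2": {"ESI1": ["PE1", "PE2"]},
}
touched = withdraw_ip_ad_es(paths_by_vrf, "ESI1", "PE1")
```

   After the call, both IP-VRFs keep only PE2 for ESI1, which matches
   the statement that traffic for Prefix1 and Prefix2 moves to PE2 at
   the same time.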

   Note that from the viewpoint of R1 and R2, PE1, PE2, PE3, PE3' and
   the underlay network between them together can be regarded as the
   following logical router:

        +---------------------------------+
        |                                 |
        |    +----------------------+     |
        |    |      RPU1 (PE3)      |     |
        |    +----------------------+     |
        |                                 |
        |    +----------------------+     |
        |    |     RPU2 (PE3')      |     |
        |    +----------------------+     |
        |                                 |
        |    +----------------------+     |
R1-----------|  Line Card 1 (PE1)   |     |
        |    +----------------------+     |
        |                                 |
        |    +----------------------+     |
R2-----------|  Line Card 2 (PE2)   |     |
        |    +----------------------+     |
        |                                 |
        +---------------------------------+

                Figure 3: The Logical Router Framework

   R1 and R2 connect to the line cards of the logical router, and the
   data packets between R1 and R2 pass only through the line cards, not
   through the RPUs (Routing Processing Units). R1/R2 nevertheless
   establish the BGP session with the RPUs, not with the line cards.
   When RPU1 (actually PE3) fails, the line cards (actually PE1/PE2)
   keep the forwarding state unchanged until RPU1 or RPU2 comes up. The
   delayed deletion on PE1/PE2 for PE3's sake is understandable for the
   same reason.

6.2.9. Floating GW-IP between R1 and R2

   This is similar to Section 4.2 of
   [I-D.ietf-bess-evpn-prefix-advertisement], with a few notable
   differences. There may be no BD on PE1/PE2/PE3. A PE node that does
   not have an IP-VRF instance need not advertise the RT-5G routes here.

6.3. RT-5L Advertisement

   When R1/R2 establish CE-BGP sessions with both PE1 and PE2, it is
   enough for PE1/PE2 to advertise RT-5L routes to PE3. There is no need
   for RT-5G or RT-5E advertisement on PE1/PE2 in that use case.
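
   Which of the three RT-5 flavours a given route is follows directly
   from its ESI and GW-IP fields, as defined in Section 1.1. The sketch
   below illustrates that classification. It assumes the 10-octet ESI
   of RFC 7432, whose reserved values are all-zero and all-ones; the
   function name and the "unclassified" result for the all-ones ESI are
   assumptions of this example, since this document does not name that
   case.

```python
from ipaddress import ip_address

ZERO_ESI = bytes(10)        # reserved all-zero ESI
MAX_ESI = b"\xff" * 10      # reserved all-ones ESI (MAX-ESI)


def classify_rt5(esi: bytes, gw_ip: str) -> str:
    """Name an EVPN Prefix Advertisement route per Section 1.1."""
    if len(esi) != 10:
        raise ValueError("an ESI is 10 octets long")
    if esi not in (ZERO_ESI, MAX_ESI):
        return "RT-5E"              # non-reserved ESI
    if esi == ZERO_ESI:
        # Zero ESI: the GW-IP decides between RT-5G and RT-5L.
        return "RT-5G" if int(ip_address(gw_ip)) != 0 else "RT-5L"
    return "unclassified"           # MAX-ESI is not named in Section 1.1


esi1 = bytes.fromhex("00112233445566778899")   # hypothetical ESI value
kinds = [
    classify_rt5(esi1, "0.0.0.0"),
    classify_rt5(ZERO_ESI, "192.0.2.2"),
    classify_rt5(ZERO_ESI, "0.0.0.0"),
    classify_rt5(ZERO_ESI, "::"),
]
```

   The example covers an RT-5E route, an RT-5G route, and RT-5L routes
   for both the IPv4 and the IPv6 zero GW-IP.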

   Note that when R1/R2 establish CE-BGP sessions with both PE1 and PE2,
   the downlink VRF interface addresses on PE1 and PE2 may be different
   IP addresses of the same subnet.

   Note that when the centralized CE-BGP session is used, the prefixes
   from R3 and the local loopback addresses on PE3 are advertised to
   PE1/PE2 using RT-5L routes too.

7. Load Balancing of Unicast Packets

   This is similar to [I-D.sajassi-bess-evpn-ip-aliasing], with a few
   notable exceptions as explained in Section 6.2.3 and below.

   Note that when the encapsulation is VXLAN, PE3 encapsulates packets
   with the RMAC of the RT-2E route for the corresponding GW-IP address.
   The RMAC of PE1 MAY have the same value as the RMAC of PE2; this can
   be achieved by configuration. When an IP packet is encapsulated with
   a VNI label according to an IP-AD/EVI route, the packet SHOULD be
   encapsulated with a destination MAC taken from the RMAC of the same
   IP-AD/EVI route, if and only if the IP-AD/EVI route has an RMAC of
   its own.

   Note that PE1/PE2 only do egress link protection following the
   IP-AD/EVI and EAD/ES routes. Even if ESI1 is configured as an
   all-active ESI, PE1/PE2 will not load-balance between the local
   downlink VRF interface and the bypass tunnel. The downlink VRF
   interface always has higher priority than the bypass tunnel.

8. Special Considerations for Single-Active ESI

   When R1 is a multi-homed device (MHD) Ethernet Segment and its uplink
   interfaces operate in Linux network-bonding mode 1 (active-backup),
   the Primary flag derived from DF election may cause packet drops on
   R1 because of the nature of Linux bond1.

   In the Linux bond1 use case, we propose that the Layer 2 extended
   community should not be included,
   and that the single-active ESI should have lower priority than the
   MAC/IP route's own MPLS nexthop on PE3. At the same time, the
   downlink VRF interface may still have higher priority than the bypass
   tunnel on PE1/PE2, in order to make convergence faster.

9. Security Considerations

   This document does not introduce any new security considerations
   other than those already discussed in [RFC7432] and [RFC8365].

10. IANA Considerations

   There are no IANA considerations.

11. Normative References

   [I-D.dawra-bess-srv6-services]
              Dawra, G., Filsfils, C., Brissette, P., Agrawal, S.,
              Leddy, J., Voyer, D., Bernier, D., Steinberg, D., Raszuk,
              R., Decraene, B., Matsushima, S., Zhuang, S., and J.
              Rabadan, "SRv6 BGP based Overlay services",
              draft-dawra-bess-srv6-services-02 (work in progress),
              July 2019.

   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
              Sajassi, A., Salam, S., Thoria, S., Drake, J., and J.
              Rabadan, "Integrated Routing and Bridging in EVPN",
              draft-ietf-bess-evpn-inter-subnet-forwarding-08 (work in
              progress), March 2019.

   [I-D.ietf-bess-evpn-prefix-advertisement]
              Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A.
              Sajassi, "IP Prefix Advertisement in EVPN",
              draft-ietf-bess-evpn-prefix-advertisement-11 (work in
              progress), May 2018.

   [I-D.sajassi-bess-evpn-ip-aliasing]
              Sajassi, A., Badoni, G., Warade, P., Pasupula, S., Drake,
              J., and J. Rabadan, "L3 Aliasing and Mass Withdrawal
              Support for EVPN", draft-sajassi-bess-evpn-ip-aliasing-01
              (work in progress), March 2020.

   [I-D.wang-bess-evpn-context-label]
              Wang, Y., "Context Label for MPLS EVPN",
              draft-wang-bess-evpn-context-label-00 (work in progress),
              January 2020.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018.

   [RFC8679]  Shen, Y., Jeganathan, M., Decraene, B., Gredler, H.,
              Michel, C., and H. Chen, "MPLS Egress Protection
              Framework", RFC 8679, DOI 10.17487/RFC8679, December
              2019.

Authors' Addresses

   Yubao(Bob) Wang
   ZTE Corporation
   No. 50 Software Ave, Yuhuatai District
   Nanjing
   China

   Email: yubao.wang2008@hotmail.com


   Zheng(Sandy) Zhang
   ZTE Corporation
   No. 50 Software Ave, Yuhuatai District
   Nanjing
   China

   Email: zzhang_ietf@hotmail.com