idnits 2.17.1 draft-wang-bess-evpn-arp-nd-synch-without-irb-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([I-D.sajassi-bess-evpn-ip-aliasing], [RFC7432]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 202: '... But there SHOULD be no RT-2 adverti...' RFC 2119 keyword, line 203: '... [RFC8214]. So the RT-2 routes from PE2 to PE3 SHOULD not carry any...' RFC 2119 keyword, line 304: '...IP Advertisement SHOULD carry one or m...' RFC 2119 keyword, line 307: '... * The ESI SHOULD be set to the ESI ...' RFC 2119 keyword, line 329: '...6 L3 Service TLV MAY also be advertise...' (4 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'SHOULD not' in this paragraph: We use RT-2 routes to advertise the ARP entry of H1 from PE2 to PE3. But there SHOULD be no RT-2 advertisement in EVPN VPWS according to [RFC8214]. So the RT-2 routes from PE2 to PE3 SHOULD not carry any export-RTs of VPWS1, and the label1 of these RT-2 route will be set to NULL. -- The document date (July 4, 2020) is 1363 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-15) exists of draft-ietf-bess-evpn-inter-subnet-forwarding-08 == Outdated reference: A later version (-22) exists of draft-ietf-idr-tunnel-encaps-15 == Outdated reference: A later version (-09) exists of draft-sajassi-bess-evpn-ip-aliasing-01 == Outdated reference: A later version (-04) exists of draft-wang-bess-evpn-context-label-02 Summary: 2 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 BESS WG Y. Wang 3 Internet-Draft Z. Zhang 4 Intended status: Standards Track ZTE Corporation 5 Expires: January 5, 2021 July 4, 2020 7 ARP/ND Synching And IP Aliasing without IRB 8 draft-wang-bess-evpn-arp-nd-synch-without-irb-06 10 Abstract 12 This document proposes an extension to [RFC7432] and 13 [I-D.sajassi-bess-evpn-ip-aliasing] to do ARP synchronizing and IP 14 aliasing for Layer 3 routes that is needed for EVPN signalled L3VPN 15 to build a complete IP ECMP. The phrase "EVPN signalled L3VPN" means 16 that there may be no MAC-VRF or IRB interface in the use case. 
When 17 there are no MAC-VRF or IRB interface, EVPN signalled L3VPN is also 18 called as "pure L3VPN instance" which is a different usecase from 19 [I-D.sajassi-bess-evpn-ip-aliasing]. 21 Status of This Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at https://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on January 5, 2021. 38 Copyright Notice 40 Copyright (c) 2020 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (https://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with respect 48 to this document. Code Components extracted from this document must 49 include Simplified BSD License text as described in Section 4.e of 50 the Trust Legal Provisions and are provided without warranty as 51 described in the Simplified BSD License. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 56 1.1. EVPN signalled L3VPN . . . . . . . . . . . . . . . . . . 3 57 1.2. Integrated Routing and Cross-connecting . . . . . . . . . 4 58 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 59 2. ARP/ND Synching and IP Aliasing . . . . . . . . . . . . . . . 6 60 2.1. 
Constructing MAC/IP Advertisement Route . . . . . . . . . 7 61 2.2. Constructing IP-AD/EVI Route . . . . . . . . . . . . . . 8 62 2.3. Constructing IP-AD/ES Route . . . . . . . . . . . . . . . 8 63 3. Fast Convergence for Routed Traffic . . . . . . . . . . . . . 9 64 4. Determining Reach-ability to Unicast IP Addresses . . . . . . 9 65 5. Forwarding Unicast Packets . . . . . . . . . . . . . . . . . 9 66 6. RT-5 Routes in EVPN signalled L3VPN . . . . . . . . . . . . . 10 67 6.1. RT-5E Advertisement on Distributed L3 GW . . . . . . . . 10 68 6.2. Centralized RT-5G Advertisement for Distributed L3 69 Forwarding . . . . . . . . . . . . . . . . . . . . . . . 11 70 6.2.1. Centralized CE-BGP . . . . . . . . . . . . . . . . . 12 71 6.2.2. RT-2E Advertisement from PE1/PE2 to PE3 . . . . . . . 12 72 6.2.3. RT-5G Advertisement from PE3 to PE1/PE2 . . . . . . . 12 73 6.2.4. RT-2E Advertisement between PE1 and PE2 . . . . . . . 13 74 6.2.5. Egress ESI Link Protection between PE1 and PE2 . . . 13 75 6.2.6. Comparing with Distributed RT-5G Advertisement . . . 13 76 6.2.7. Mass-Withdraw by EAD/ES Route . . . . . . . . . . . . 14 77 6.2.8. On the Failure of PE3 Node . . . . . . . . . . . . . 14 78 6.2.9. Floating GW-IP between R1 and R2 . . . . . . . . . . 15 79 6.3. RT-5L Advertisement . . . . . . . . . . . . . . . . . . . 15 80 7. Load Balancing of Unicast Packets . . . . . . . . . . . . . . 16 81 8. Special Considerations for Single-Active ESIs . . . . . . . . 16 82 9. Security Considerations . . . . . . . . . . . . . . . . . . . 16 83 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 84 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 85 11.1. Normative References . . . . . . . . . . . . . . . . . . 17 86 11.2. Normative References . . . . . . . . . . . . . . . . . . 18 87 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18 89 1.
Introduction 91 In [I-D.sajassi-bess-evpn-ip-aliasing], an extension to [RFC7432] to 92 do aliasing for Layer 3 routes is proposed for symmetric IRB to build 93 a complete IP ECMP. But typically there may be both IRB 94 interfaces (to do EVPN IRB on a per-MAC-VRF basis) and VRF-ACs in the same 95 IP-VRF instance. It is necessary to apply the EVPN control-plane to 96 the VRF-ACs in order to support EVPN signalled L3VPN, including such 97 mixed situations and the pure L3VPN instance use case, where there may be no IRB 98 interfaces in the IP-VRF instances. 100 There is also an Integrated Routing and Cross-connecting use case, 101 which is described in Section 1.2. 103 1.1. EVPN signalled L3VPN 105 +---------+ 106 +-------------+ | | 107 | | | | 108 /| PE1 |----| | +-------------+ 109 / | | | MPLS/ | | | 110 LAG / +-------------+ | VxLAN/ | | PE3 |---N3 111 N1---SW1===== | NVGRE/ | | | 112 / \ +-------------+ | SRv6 |---| | 113 N2 \ | | | | +-------------+ 114 \| PE2 |----| | 115 | | | | 116 +-------------+ | | 117 | | 118 | | 119 +---------+ 121 Figure 1: ARP/ND Synchronizing and IP Aliasing without IRB 123 There are three CE nodes named N1/N2/N3 in the above network. N1/N2/ 124 N3 may be a host or an IP router. When N1/N2/N3 is a host, it is also 125 called H1/H2/H3 in this document. When N1/N2/N3 is a router, it is 126 also called R1/R2/R3 in this document. 128 Consider a pair of multi-homed PEs, PE1 and PE2. Let there be two 129 hosts H1 and H2 attached to them via an L2 switch SW1. Consider 130 another PE, PE3, and a host H3 attached to it. H1 and H2 represent 131 subnet SN1 and H3 represents subnet SN2. 133 Note that this use case differs from [I-D.sajassi-bess-evpn-ip-aliasing] in 134 the following aspects: there may be no MAC-VRF or IRB interface on 135 PE1/PE2/PE3, and it is the IP-VRFs that are called EVPN instances 136 instead. Such an EVPN instance can be called a pure L3 EVPN instance, or 137 L3 EVI for short.
The anycast gateway of H1/H2 is configured on a 138 sub-interface on PE1/PE2. 140 Note that the communication between H1 and H2 won't pass through any 141 of the multi-homed PEs. So it is not necessary for PE1/PE2 to keep a 142 broadcast domain and its IRB for SN1. 144 Note that SW1 is multi-homed to PE1 and PE2 via a LAG interface, which 145 may load-balance traffic to the PEs. 147 This draft proposes an extension to do ARP/ND synchronizing and IP 148 aliasing for Layer 3 routes that is needed for L3 EVI to build a 149 complete IP ECMP. 151 1.2. Integrated Routing and Cross-connecting 153 When an IP-VRF instance and an EVPN VPWS instance are connected by a 154 virtual-interface, we call such a scenario the Integrated Routing and 155 Cross-connecting (IRC) use-case, where the EVPN VPWS is represented by 156 the term "cross-connecting" and the IP-VRF is represented by the term 157 "Routing". The virtual-interface connecting the EVPN VPWS and the IP-VRF is 158 called the IRC interface. 160 The IRC use case is illustrated by the following figure: 162 PE2 163 +---------------------+ 164 | IRC1=10.1 | 165 | +-----+ +------+ |. 166 .| |VPWS1|---|IPVRF1| | . 167 . | +-----+ | | | . PE4 168 PE1 . | +------+ | .+---------+ 169 +--------+. +---------------------+ |+------+ | 170 |+-----+ | | ||IPVRF1| | 171 ||VPWS1| | | RT-2E || | | 172 |+-----+ | | 10.2 |+------+ | 173 | | | | ESI1 | | | 174 | | | | label2=IPVRF1 | | | 175 +---|----+. | label1=NULL .+---|-----+ 176 | . PE3 V . | 177 | . +---------------------+ . | 178 | .| IRC2=10.1 |. | 179 H1=10.2 | +-----+ +------+ | H4=20.2 180 H2=10.3 | |VPWS1|---|IPVRF1| | 181 H3=10.4 | +-----+ | | | 182 | +------+ | 183 +---------------------+ 185 Figure 2: ARP/ND Synchronizing for IRC Interfaces 187 Note that the IRC interfaces are considered AC interfaces in the EVPN 188 VPWS instance. At the same time, they are considered VRF-ACs in 189 IP-VRF instances.
191 When H1 sends an ARP packet P1, P1 will be forwarded by PE1 to 192 either PE2 or PE3, not to both. Both IRC1 on PE2 and IRC2 on 193 PE3 are H1's subnet gateway (SNGW). But when H4 sends a packet P2 to 194 H1, PE4 may load-balance P2 to either PE2 or PE3, not to 195 both. 197 When P1 is load-balanced to PE2, not to PE3, but PE4 load-balances P2 198 to PE3, the ARP entry of H1 will not have been installed on PE3 for P2. So 199 the forwarding of P2 will be delayed by the ARP miss. 201 We use RT-2 routes to advertise the ARP entry of H1 from PE2 to PE3. 202 But there SHOULD be no RT-2 advertisement in EVPN VPWS according to 203 [RFC8214]. So the RT-2 routes from PE2 to PE3 SHOULD not carry any 204 export-RTs of VPWS1, and the label1 of these RT-2 route will be set 205 to NULL. 207 The NULL value of label1 in MPLS EVPN should be implicit-null. The 208 NULL value of label1 in VXLAN EVPN should be 0. 210 Note that an ESI may be assigned to IRC1 and IRC2, because the ESI of 211 the RT-2 routes will be used to determine the interface on which the ARP 212 entries should be installed. 214 1.3. Terminology 216 Most of the terminology used in this document comes from [RFC7432] 217 and [I-D.sajassi-bess-evpn-ip-aliasing] except for the following: 219 VRF AC: An Attachment Circuit (AC) that attaches a CE to an IP-VRF 220 but is not an IRB interface. 222 IRC: Integrated Routing and Cross-connecting. An IRC interface is 223 the virtual interface connecting an IP-VRF and an EVPN VPWS. 225 VRF Interface: An IRB interface, a VRF-AC, or an IRC interface. 226 Note that a VRF interface is bound to the routing space of an 227 IP-VRF. 229 L3 EVI: An EVPN instance spanning the Provider Edge (PE) devices 230 participating in that EVPN, which contains VRF ACs and may also contain 231 IRB interfaces or IRC interfaces. 233 IP-AD/EVI: Ethernet Auto-Discovery route per EVI, where the EVI is 234 an IP-VRF.
236 IP-AD/ES: Ethernet Auto-Discovery route per ES, where the EVI for one 237 of its route targets is an IP-VRF. 239 CE-BGP: The BGP session between PE and CE. Note that a CE-BGP route 240 doesn't have an RD or Route-Target. 242 RMAC: Router's MAC, which is signaled in the Router's MAC extended 243 community. 245 RT-2E: A MAC/IP Advertisement Route with a non-reserved ESI. 247 RT-5E: An EVPN Prefix Advertisement Route with a non-reserved ESI. 249 RT-5G: An EVPN Prefix Advertisement Route with a zero ESI and a non- 250 zero GW-IP. 252 RT-5L: An EVPN Prefix Advertisement Route with both zero ESI and zero 253 GW-IP. 255 2. ARP/ND Synching and IP Aliasing 257 Host IP and MAC routes are learnt by PEs on the access side via a 258 control plane protocol like ARP. In the case where a CE is multihomed to 259 multiple PE nodes using a LAG and is running in All-Active Redundancy 260 Mode, the Host IP will be learnt and advertised in the MAC/IP 261 Advertisement only by the PE that receives the ARP packet. The MAC/ 262 IP Advertisement with a non-zero ESI will be received by both PE2 and 263 PE3. 265 As a result, after PE2 receives the MAC/IP Advertisement and imports 266 it into the L3 EVI, PE2 installs an ARP entry on the VRF interface 267 whose subnet matches the IP Address from the MAC/IP Advertisement. 268 Such an ARP entry is called a remote synched ARP entry in this document. 270 Note that the PEs follow [I-D.sajassi-bess-evpn-ip-aliasing] to 271 achieve the ESI load balance except for the construction of the MAC/IP 272 Advertisement route and the IP AD per EVI route. 274 When PE3 load-balances the traffic towards the multihomed Ethernet 275 Segment, both PE1 and PE2 will already have the corresponding 276 ARP entry because of the ARP synching procedures. 278 It is important to explain that typically there may be both IRB 279 interfaces and VRF interfaces in an IP-VRF instance, which is called 280 the "VRF interface in EVPN IRB" use-case in this document.
But the 281 IRB/VRF interfaces are independent of each other in the EVPN control plane. 282 So the use-case here is constrained to a pure L3 EVPN scheme, because 283 it is enough to describe all the control-plane updates for both the 284 pure L3 EVPN use-case and the "VRF interface in EVPN IRB" use-case. 286 In the current EVPN control-plane for the "VRF interface in EVPN IRB" use- 287 case, the VRF interface is considered an "external link" and it just 288 inter-operates with the EVPN control-plane. But this document 289 assumes that it is better to apply the EVPN control-plane directly to 290 the VRF interfaces. 292 2.1. Constructing MAC/IP Advertisement Route 294 This draft introduces a new usage/construction of the MAC/IP 295 Advertisement route to enable Aliasing for IP addresses in pure L3 296 EVPN use-cases. The usage/construction of this route remains similar 297 to that described in RFC 7432 with a few notable exceptions as listed below. 299 * The Route-Distinguisher should be set to the corresponding L3VPN 300 context. 302 * The Ethernet Tag should be set to 0. 304 * The MAC/IP Advertisement SHOULD carry one or more IP VRF Route- 305 Target (RT) attributes. 307 * The ESI SHOULD be set to the ESI of the VRF interface from which 308 the ARP entry is learned. 310 Note that the ESI is used to install remote synched ARP entries on the 311 corresponding VRF interfaces on PE1/PE2. But it is only used to load- 312 balance traffic on PE3. 314 * The MPLS Label1 should be set to implicit-null in MPLS/SRv6 315 encapsulation. For VXLAN encapsulation, the MPLS label1 should be 316 set to 0 instead. Note that in the IRC use case, although there is an L2 317 EVPN instance (EVPN VPWS), the EVPN label and export-RT of that EVPN 318 VPWS will not be carried in the MAC/IP route. 320 Note that there may be no MAC-VRF here, and this is outside the scope 321 of RFC 7432. 323 * The MPLS Label2 should be set to the local label of the IP-VRF in 324 MPLS or VXLAN EVPN.
But it should be set to implicit-null in SRv6 325 EVPN. 327 Note that the label may be a VNI or an MPLS label. 329 Note that in SRv6 EVPN an SRv6 L3 Service TLV MAY also be advertised 330 along with the route following [I-D.dawra-bess-srv6-services]. But an 331 SRv6 L2 Service TLV won't be advertised along with the route, 332 because no MAC-VRF exists in the use case. 334 * The RMAC Extended Community attribute SHOULD be carried in VXLAN 335 EVPN. 337 2.2. Constructing IP-AD/EVI Route 339 Note that the IP-AD/EVI Advertisement is used for two purposes. It is 340 used between PE1 and PE2 to do egress link protection for the subnet 341 of the downlink VRF-interface. It is used between PE1/PE2 and PE3 to 342 achieve load balancing across the ES-adjacent PEs. 344 The usage/construction of this route is similar to the IP-AD per EVI 345 route described in [I-D.sajassi-bess-evpn-ip-aliasing] with a few 346 notable exceptions as listed below. 348 Note that there may be no MAC-VRF here, and this is outside the scope 349 of [RFC7432] and [I-D.sajassi-bess-evpn-ip-aliasing]. 351 Note that the Encapsulation Sub-TLV of the Tunnel Encapsulation Attribute 352 per [I-D.ietf-idr-tunnel-encaps] may be used to emphasize that the 353 RMAC in the Encapsulation Sub-TLV will be preferred. 355 Note that, in [I-D.ietf-idr-tunnel-encaps] section 7, when the next 356 hop of BGP UPDATE U1 is router X1 and the best path to router X1 is a 357 BGP route that was advertised in UPDATE U2, and both U1 and U2 have a 358 tunnel encapsulation attribute, the data packet will be carried 359 through a pair of nested tunnels, each corresponding to a tunnel 360 encapsulation attribute. But when U1 is an RT-2E route and U2 is an 361 IP-AD/EVI route, the ESI in the recursion is not considered a 362 "next hop" in the sense of [I-D.ietf-idr-tunnel-encaps] section 7. So only the 363 tunnels in the IP-AD/EVI route will be used, although both of the two 364 EVPN routes have a Tunnel Encapsulation attribute.
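As an informal illustration of the field rules listed in Section 2.1 (not part of the specification), the RT-2E construction can be sketched in Python. The names Encap, Rt2e and build_rt2e, the string form of the RD/ESI/RT values, and the choice to omit the RMAC outside VXLAN are assumptions made only for this sketch; the value 3 is the MPLS implicit-null label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple

IMPLICIT_NULL = 3  # reserved MPLS label value for implicit-null


class Encap(Enum):
    MPLS = "mpls"
    VXLAN = "vxlan"
    SRV6 = "srv6"


@dataclass(frozen=True)
class Rt2e:
    """Hypothetical container for the RT-2E fields discussed in Section 2.1."""
    rd: str                       # RD of the L3VPN context
    esi: str                      # ESI of the VRF interface that learned the ARP entry
    ethernet_tag: int             # always 0
    ip: str
    mac: str
    label1: int
    label2: int
    route_targets: Tuple[str, ...]  # IP-VRF RTs only, no MAC-VRF/VPWS RTs
    rmac: Optional[str]


def build_rt2e(encap: Encap, rd: str, esi: str, ip: str, mac: str,
               ipvrf_label: int, ipvrf_rts: Sequence[str],
               rmac: Optional[str]) -> Rt2e:
    # Label1: 0 for VXLAN, implicit-null for MPLS and SRv6.
    label1 = 0 if encap is Encap.VXLAN else IMPLICIT_NULL
    # Label2: local IP-VRF label (MPLS label or VNI), implicit-null for SRv6.
    label2 = IMPLICIT_NULL if encap is Encap.SRV6 else ipvrf_label
    # RMAC SHOULD be carried for VXLAN; omitting it otherwise is an
    # assumption of this sketch, not a rule stated in the text.
    carried_rmac = rmac if encap is Encap.VXLAN else None
    return Rt2e(rd=rd, esi=esi, ethernet_tag=0, ip=ip, mac=mac,
                label1=label1, label2=label2,
                route_targets=tuple(ipvrf_rts), rmac=carried_rmac)
```

For example, calling build_rt2e with Encap.VXLAN and an IP-VRF VNI of 5001 gives label1 = 0 and label2 = 5001, while Encap.SRV6 gives implicit-null for both labels.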
366 Note that our considerations for single-active ESIs differ from those of 367 [I-D.sajassi-bess-evpn-ip-aliasing]; they are detailed in Section 8. 369 Such an Ethernet Auto-Discovery route is called the Ethernet Auto-Discovery 370 route per IP-VRF, abbreviated as EAD/IP-VRF, in older 371 versions of this document. 373 2.3. Constructing IP-AD/ES Route 375 The usage/construction of this route remains similar to the IP AD per 376 ES route described in [I-D.sajassi-bess-evpn-ip-aliasing] section 3.1 377 with a few notable exceptions as explained below. 379 There may be no MAC-VRF RTs in the IP-AD/ES Route. 381 Such an Ethernet Auto-Discovery route is called the EAD/ES route in older 382 versions of this document. 384 3. Fast Convergence for Routed Traffic 386 The procedures for Fast Convergence do not change from 387 [I-D.sajassi-bess-evpn-ip-aliasing] except for a few notable 388 exceptions as explained below. 390 The local ARP entries and remote synced ARP entries are installed/ 391 learned on a VRF interface rather than an IRB interface. 393 There is no MAC entry. 395 4. Determining Reach-ability to Unicast IP Addresses 397 The procedures for local/remote host learning and MAC/IP 398 Advertisement route construction are described above. The procedures 399 for Route Resolution do not change from 400 [I-D.sajassi-bess-evpn-ip-aliasing] and/or 401 [I-D.ietf-bess-evpn-prefix-advertisement]. 403 5. Forwarding Unicast Packets 405 Because of the nature of the MPLS label or SRv6 SID for an IP-VRF 406 instance, when these IP-AD/EVI routes are used in IP-VRF routing 407 and forwarding procedures, the inner Ethernet headers are absent from 408 the corresponding packets forwarded according to these IP-AD/EVI 409 routes. 411 Note that in [I-D.sajassi-bess-evpn-ip-aliasing] the IP-AD per EVI 412 route carries a "Router's MAC" extended community in case the RMAC is 413 not the same among different PEs.
In these cases, the inner 414 destination MAC of the corresponding data packets from PE3 to PE1/PE2 415 must use the RMAC in the IP-AD/EVI route instead, even if there is an RMAC 416 in the RT-2E route. 418 Note that this is a data-plane update of 419 [I-D.ietf-bess-evpn-prefix-advertisement] for both EVPN signalled 420 L3VPN and [I-D.sajassi-bess-evpn-ip-aliasing]. According to 421 [I-D.ietf-bess-evpn-prefix-advertisement] section 4.3 or 422 [I-D.ietf-bess-evpn-inter-subnet-forwarding] section 3.2.3, the inner 423 destination MAC will follow the RMAC of the RT-5E route or RT-2E route. 424 Although PE3 SHOULD prefer the RMAC in the IP-AD/EVI routes 425 following this document, we also suggest including the RMAC in the 426 RT-2E or RT-5E route for compatibility. 428 When a packet is forwarded following the subnet route of a downlink 429 VRF-interface, and the bypass tunnel is used, the ARP lookup is not 430 needed because of the RMAC in the IP-AD/EVI route. But if the 431 downlink VRF-interface is up at that time, the ARP lookup is used to 432 fill in the destination MAC of the packet's Ethernet header as 433 usual. 435 Note that the packets received from a bypass tunnel can only be 436 forwarded to a local downlink VRF-interface. In order to prevent 437 micro-loops on R1's node failure, a few split-horizon filter rules 438 should be introduced. In EVPN NVO3, a packet received from a 439 tunnel is not allowed to be forwarded to the same tunnel. In SRv6 EVPN, 440 a packet received from a locator may, based on configuration, not be allowed to be forwarded to 441 the same locator. In MPLS EVPN, the packet 442 may include an extra label to identify its ingress router as proposed 443 in [I-D.wang-bess-evpn-context-label]. In MPLS EVPN, the packet may 444 include an extra label to identify that it is forwarded on a bypass 445 tunnel, and the extra label can be an extended special-purpose label 446 or an ESI label. 448 6.
RT-5 Routes in EVPN signalled L3VPN 450 EVPN signalled L3VPN can be deployed without EVPN IRB, as MPLS/ 451 BGP VPNs have been for a long time, but it can also be combined with EVPN 452 IRB. EVPN signalled L3VPN without EVPN IRB is not well defined 453 yet, so we take the non-IRB use case as an example. But the following 454 routes and procedures can be used in the EVPN IRB use case too. Note that 455 in the EVPN IRB use case, the IRB interfaces are VRF-interfaces too. 457 6.1. RT-5E Advertisement on Distributed L3 GW 459 Because PE1/PE2 can install a synced ARP entry on the proper VRF- 460 interface using the RT-2 route of section 2.1, it is 461 not necessary for PE1/PE2 to advertise per-host IP prefixes by RT-2 462 routes. It is recommended that PE1/PE2 advertise an RT-5 route per 463 subnet to PE3 instead. The ESI of these RT-5E routes can be set to 464 the ESI of the corresponding VRF interface. If the VRF interface 465 fails, these subnets will converge faster on PE3 through 466 the withdrawal of the corresponding IP-AD/EVI route. 468 Note that N1/N2 may be a host or a router. When it is a router, those 469 subnets will be the subnets behind it. When N1 and N2 are hosts, 470 those subnets will be the subnets of N1 and N2, whether they are 471 different subnets or not. 473 6.2. Centralized RT-5G Advertisement for Distributed L3 Forwarding 475 When N1/N2/N3 is a router, it is called R1/R2/R3 in the following 476 figure. Note that Figure 1 only illustrates the physical Ethernet 477 links, but Figure 3 illustrates the logical L3 adjacencies between PE 478 and CE as follows.
480 PE2 481 +----+ +---------------+ 482 | | 20.2 | 20.1 +------+ | ------> 483 | R2 |===+------------| | | RT-2E 484 | | | | |IPVRF1| | 20.2 PE3 485 +----+ | +---------| | | ESI1 +---------------+ 486 Prefix2 | | | 10.1 +------+ | | | 487 | | +---------------+ | +-----------+ | 488 | | ^ | | IPVRF1 | | 489 | | | RT-2E <-------- | | |----R3 490 | | ESI1 | 10.2 RT-5G | | 3.3.3.3 | | 491 | | | ESI1 Prefix1 | +-----------+ | 492 | | | 10.2 | ^ | 493 | | +---------------+ | | | 494 Prefix1 | | | 20.1 +------+ | +---|-----------+ 495 +----+ +--|---------| | | | 496 | | | | |IPVRF1| | | 497 | R1 |======+---------| | | ------> | 498 | | 10.2 | 10.1 +------+ | RT-2E | 499 +----+ +---------------+ 10.2 | CE-BGP 500 | PE1 ESI1 | Prefix1 501 | | NH=10.2 502 | CE-BGP | 503 +------------------------>------------------------+ 505 Figure 3: Centralized RT-5G Advertisement 507 Note that R1/R2 should establish CE-BGP sessions with both PE1 and PE2 508 in case one of them fails. PE1 and PE2 will then independently advertise RT-5E routes 509 to PE3 for the prefixes they learn from CE-BGP. If R1/ 510 R2 prefers to establish a single CE-BGP session, it can establish the 511 CE-BGP session with PE3 instead. This CE-BGP session is called 512 the centralized CE-BGP session. But when the centralized CE-BGP 513 session is used, RT-5G routes should be used instead. 515 Note that the centralized CE-BGP session is used only for route 516 advertisement; a distributed Layer 3 forwarding 517 framework is still expected. 519 6.2.1. Centralized CE-BGP 521 The CE-BGP session between R1 and PE3 is established between 10.2 and 522 3.3.3.3. The CE-BGP session between R2 and PE3 is established 523 between 20.2 and 3.3.3.3. The IP address 10.2/20.2 is called the 524 uplink interface address of R1/R2 in this document. The IP address 525 3.3.3.3 is called the centralized loopback address of IPVRF1 in this 526 document.
The IP address 10.1/20.1 is called the downlink VRF- 527 interface address of PE1/PE2 in this document. 529 Note that the downlink VRF-interface is a Layer 3 link and it need not 530 be attached to a BD. 532 R1 advertises a BGP route for a prefix (say "Prefix1") behind it to 533 PE3 via that CE-BGP session. The nexthop for Prefix1 is R1's uplink 534 interface address (say 10.2). 536 The route advertisement of R2 is similar to the above advertisement. 538 Note that the packets from R1/R2 to the centralized loopback address 539 may be routed following the default route on R1/R2. 541 6.2.2. RT-2E Advertisement from PE1/PE2 to PE3 543 When PE1 learns the ARP entry of 10.2, it advertises an RT-2E route to 544 PE3. The ESI value of the RT-2E route is ESI1, which is the ESI of 545 PE1's downlink VRF-interface for R1. The RT-2E route is constructed 546 following section 2.1. 548 Note that in [RFC7432], when the ESI is single-active, MAC 549 forwarding only uses the label and the MPLS nexthop of the RT-2E route 550 as long as they are valid for forwarding. But in RT-5 routes 551 we assume that the ESI is always preferred even if the ESI is single- 552 active. This is similar to [I-D.ietf-bess-evpn-prefix-advertisement] 553 section 3.2 Table 1. The ESI usage in IP forwarding is outside the 554 scope of [RFC7432]. 556 The RT-2E route advertisement of PE2 is similar to the above 557 advertisement. 559 6.2.3. RT-5G Advertisement from PE3 to PE1/PE2 561 When PE3 receives Prefix1 from the CE-BGP session, the nexthop 562 for Prefix1 is 10.2, and the ESI for 10.2 is ESI1. So PE3 advertises 563 an RT-5G route to PE1/PE2 for Prefix1. The GW-IP value of the RT-5G 564 route for Prefix1 is 10.2. 566 Note that PE3 can load-balance packets for Prefix1 via the IP-AD/EVI 567 routes from PE1/PE2, because ESI1 is the ESI for Prefix1's GW-IP. 569 The RT-5 route advertisement and packet forwarding for Prefix2 are 570 similar to the above.
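The recursive resolution that PE3 performs for Prefix1 in this section can be sketched as follows. This is a minimal Python illustration, not a normative procedure: the dictionary-based tables, the function name resolve_next_hops, and the label values 1001/2001 are hypothetical; only Prefix1, 10.2, ESI1, PE1 and PE2 come from the example above.

```python
from typing import Dict, List, Tuple

NextHop = Tuple[str, int]  # (PE name, label taken from that PE's IP-AD/EVI route)


def resolve_next_hops(prefix: str,
                      gw_ip_of_prefix: Dict[str, str],
                      esi_of_gw_ip: Dict[str, str],
                      ip_ad_evi: Dict[str, List[NextHop]]) -> List[NextHop]:
    """Resolve prefix -> GW-IP -> ESI -> set of PEs attached to that ESI."""
    gw_ip = gw_ip_of_prefix[prefix]        # CE-BGP next hop, i.e. the RT-5G GW-IP
    esi = esi_of_gw_ip[gw_ip]              # ESI carried in the RT-2E route for the GW-IP
    return sorted(ip_ad_evi.get(esi, []))  # one ECMP member per IP-AD/EVI route


# State on PE3 for the example in Figure 3 (labels are made up).
gw_ip_of_prefix = {"Prefix1": "10.2"}
esi_of_gw_ip = {"10.2": "ESI1"}
ip_ad_evi = {"ESI1": [("PE1", 1001), ("PE2", 2001)]}

print(resolve_next_hops("Prefix1", gw_ip_of_prefix, esi_of_gw_ip, ip_ad_evi))
# prints [('PE1', 1001), ('PE2', 2001)]
```

If PE1 withdraws its IP-AD/EVI route for ESI1, removing the ("PE1", 1001) entry from the last table leaves PE2 as the only next hop without touching the per-prefix state.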
572 Note that the centralized loopback address is advertised by PE3 via an 573 RT-5L route. The nexthop of the RT-5L route is PE3, and the GW-IP 574 value of the RT-5L route is zero. The label of the RT-5L route is 575 IPVRF1's label on PE3. The RMAC of the RT-5L route is PE3's MAC when 576 the encapsulation is VXLAN. 578 Note that no Tunnel Encapsulation attribute should be carried in an 579 RT-5G route, in order to avoid the nested tunnel encapsulation 580 described in [I-D.ietf-idr-tunnel-encaps] section 7. 582 6.2.4. RT-2E Advertisement between PE1 and PE2 584 The RT-2E route advertisement between PE1 and PE2 is used to sync 585 these ARP entries to each other in order to avoid ARP misses. The 586 ESI value of these two RT-2E routes is ESI1. 588 Note that we assume that the ARP entry for 10.2 will be learned on 589 PE1 only, and 20.2 will be learned on PE2 only. Note that the two 590 downlink VRF-interfaces for R1/R2 on PE1/PE2 are sub-interfaces of 591 the same physical interface. So they have the same ESI. 593 6.2.5. Egress ESI Link Protection between PE1 and PE2 595 The IP-AD/EVI routes between PE1 and PE2 are used to do egress link 596 protection. The egress link protection follows the second approach 597 of [RFC8679] section 6. 599 Note that although the ARP entry for 10.2 on PE2 is synced from PE1 600 via an RT-2E route, the ARP entry on PE2 is installed to forward 601 packets directly to the corresponding downlink VRF-interface 602 as the primary path. The bypass tunnel following the IP-AD/EVI route is only 603 activated when the downlink VRF-interface fails. 605 6.2.6. Comparing with Distributed RT-5G Advertisement 607 When R1/R2 establish CE-BGP sessions with both PE1 and PE2, the RT-5G 608 routes can be used by PE1/PE2 instead of the RT-5E routes. But when 609 R1 establishes only a single CE-BGP session with PE1, there will 610 be problems when PE1 fails.
Even if PE2/PE3 applies a delayed 611 deletion when PE1 fails, no delay can be long enough if PE1 612 never comes up again. 614 Note that when there is only a single CE-BGP session, the RT-5E 615 advertisement will face the same problem. In fact it is even worse when 616 R1 uses different subnets to connect to PE1 and PE2 as described in 617 [I-D.sajassi-bess-evpn-ip-aliasing] section 1.2. Because RT-5E 618 can only sync the prefixes and can't sync the nexthops, when PE2 619 receives an RT-5E route from PE1 the ARP entry for the other uplink 620 interface that connects R1 to PE2 will not be resolved by PE2. 622 Note that when R1 uses different subnets to connect to PE1 and PE2, 623 it is not necessary to configure a BD for the two subnets connecting 624 PE and CE as is described in 625 [I-D.sajassi-bess-evpn-ip-aliasing] section 1.2. 627 Note that the RT-5E route can carry the MAC address from the ARP entry of its 628 overlay nexthop (which is R1's uplink interface), so that 629 when PE2 receives an RT-5E route carrying such a MAC address, no 630 ARP lookup is needed for that route. Such a MAC address can be carried 631 in a new extended community called the GW-MAC extended community. By 632 doing so, when R1 uses different subnets to connect to PE1 and PE2, 633 the RT-5E can be used to sync the prefixes. 635 6.2.7. Mass-Withdraw by EAD/ES Route 637 Assume that R1 and R2 are attached to different IP-VRFs (say 638 IPVRF1 and IPVRF2 respectively). If the physical interface of the 639 downlink VRF-interfaces on PE1 fails, PE1 will withdraw the IP-AD/ES 640 route of ESI1, so PE3 will re-route 10.2 for Prefix1 in IPVRF1 and 641 20.2 for Prefix2 in IPVRF2 at the same time. Then data packets for 642 Prefix1 and Prefix2 will be sent to PE2 instead. 644 6.2.8. On the Failure of PE3 Node 646 On the failure of PE3, PE1/PE2 should delay the deletion of the RT-5G 647 route from PE3.
   PE3 can use a new BGP attribute to indicate the delayed-deletion
   requirement to PE1/PE2.  Without delayed deletion, the L3 traffic
   between R1 and R2 will be interrupted.  Fortunately, PE3 will
   typically have a redundant node (PE3' in Figure 3), and PE3' can
   take PE3's place when PE3 fails.

   Note that from the viewpoint of R1 and R2, PE1, PE2, PE3, PE3' and
   the underlay network between them can together be regarded as the
   following logical router:

            +---------------------------------+
            |                                 |
            |   +----------------------+      |
            |   |      RPU1 (PE3)      |      |
            |   +----------------------+      |
            |                                 |
            |   +----------------------+      |
            |   |     RPU2 (PE3')      |      |
            |   +----------------------+      |
            |                                 |
            |   +----------------------+      |
   R1-----------|  Line Card 1 (PE1)   |      |
            |   +----------------------+      |
            |                                 |
            |   +----------------------+      |
   R2-----------|  Line Card 2 (PE2)   |      |
            |   +----------------------+      |
            |                                 |
            +---------------------------------+

               Figure 4: The Logical Router Framework

   R1 and R2 connect to the line cards of the logical router, and the
   data packets between R1 and R2 pass only through the line cards, not
   through the RPUs (Routing Processing Units).  However, R1/R2
   establish their BGP sessions with the RPUs, not with the line cards.
   When RPU1 (that is, PE3) fails, the line cards (that is, PE1/PE2)
   keep the forwarding state unchanged until RPU1 or RPU2 comes up.
   The delayed deletion on PE1/PE2 for PE3's sake is understandable for
   the same reason.

6.2.9.  Floating GW-IP between R1 and R2

   This use case is similar to Section 4.2 of
   [I-D.ietf-bess-evpn-prefix-advertisement], except for the following
   notable differences.  There may be no BD on PE1/PE2/PE3.  There is
   no need for a PE node that does not have an IP-VRF instance to
   advertise the RT-5G routes here.

6.3.  RT-5L Advertisement

   When R1/R2 establish CE-BGP sessions with both PE1 and PE2, it is
   enough for PE1/PE2 to advertise RT-5L routes to PE3.  There is no
   need for RT-5G or RT-5E advertisement on PE1/PE2 in that use case.

   Note that when R1/R2 establish CE-BGP sessions with both PE1 and
   PE2, the downlink VRF-interface addresses on PE1 and PE2 may be
   different IP addresses of the same subnet.

   Note that when a centralized CE-BGP session is used, the prefixes
   from R3 and the local loopback addresses on PE3 are advertised to
   PE1/PE2 using RT-5L routes too.

7.  Load Balancing of Unicast Packets

   Load balancing is similar to that in
   [I-D.sajassi-bess-evpn-ip-aliasing], except for a few notable
   differences explained in Section 6.2.3 and below.

   Note that when the encapsulation is VXLAN, PE3 encapsulates packets
   with the RMAC of the RT-2E route for the corresponding GW-IP
   address.  The RMAC of PE1 MAY have the same value as the RMAC of
   PE2; this can be achieved by configuration.  When an IP packet is
   encapsulated with a VNI label according to an IP-AD/EVI route, the
   packet SHOULD be encapsulated with a Destination-MAC taken from the
   RMAC of the same IP-AD/EVI route, if and only if the IP-AD/EVI route
   has an RMAC of its own.

   Note that PE1/PE2 only perform egress link protection following the
   IP-AD/EVI and EAD/ES routes.  Even if ESI1 is configured as an
   all-active ESI, PE1/PE2 will not load-balance between the local
   downlink VRF-interface and the bypass tunnel.  The downlink
   VRF-interfaces always have higher priority than the bypass tunnel.

8.  Special Considerations for Single-Active ESIs

   When R1 is multihomed through an Ethernet Segment of the MHD type
   and the uplink interfaces of R1 operate in Linux network-bonding
   mode 1, the Primary flag set according to DF election may cause
   packet drops on R1 because of the nature of Linux bond1.
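   The mismatch described above can be illustrated with a minimal
   sketch.  It is not part of the proposed procedures.  It assumes
   that R1 discards frames arriving on the backup slave of the bond,
   and that the PE attached to the active slave is the one that learns
   R1's ARP entry.  The names Bond1Host, egress_pe_by_primary_flag and
   egress_pe_by_route_nexthop are hypothetical and exist only for this
   illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond1Host:
    """Hypothetical model of R1 with two uplinks in active-backup bonding."""
    active_uplink_pe: str   # PE attached to the currently active slave
    backup_uplink_pe: str   # PE attached to the backup slave

    def accepts_from(self, pe: str) -> bool:
        # Assumption: frames arriving on the backup slave are discarded.
        return pe == self.active_uplink_pe


def egress_pe_by_primary_flag(primary_pe: str) -> str:
    """PE3 follows the Primary flag that results from DF election."""
    return primary_pe


def egress_pe_by_route_nexthop(arp_learned_on: str) -> str:
    """PE3 follows the nexthop of the PE that learned the ARP entry."""
    return arp_learned_on


r1 = Bond1Host(active_uplink_pe="PE2", backup_uplink_pe="PE1")

# DF election does not know R1's bonding state, so it may pick PE1.
via_primary = egress_pe_by_primary_flag(primary_pe="PE1")
# Assumption: ARP traffic leaves R1 on the active slave, so PE2 learns it.
via_nexthop = egress_pe_by_route_nexthop(arp_learned_on=r1.active_uplink_pe)

delivered_via_primary = r1.accepts_from(via_primary)   # False: dropped on R1
delivered_via_nexthop = r1.accepts_from(via_nexthop)   # True: accepted by R1
print(delivered_via_primary, delivered_via_nexthop)
```

   In this sketch the packet drop occurs only when the elected primary
   PE differs from the PE attached to R1's active slave, which is the
   case that the Primary flag alone cannot rule out.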

   In the Linux bond1 use case, we propose that the Layer 2 extended
   community not be included, and that on PE3 the single-active ESI
   have lower priority than the MAC/IP route's own MPLS nexthop.  At
   the same time, the downlink VRF-interface on PE1/PE2 may still have
   higher priority than the bypass tunnel, to make convergence faster.

9.  Security Considerations

   This document does not introduce any new security considerations
   other than those already discussed in [RFC7432] and [RFC8365].

10.  IANA Considerations

   There are no IANA considerations.

11.  References

11.1.  Normative References

   [I-D.dawra-bess-srv6-services]
              Dawra, G., Filsfils, C., Brissette, P., Agrawal, S.,
              Leddy, J., Voyer, D., Bernier, D., Steinberg, D.,
              Raszuk, R., Decraene, B., Matsushima, S., Zhuang, S., and
              J. Rabadan, "SRv6 BGP based Overlay services",
              draft-dawra-bess-srv6-services-02 (work in progress),
              July 2019.

   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
              Sajassi, A., Salam, S., Thoria, S., Drake, J., and J.
              Rabadan, "Integrated Routing and Bridging in EVPN",
              draft-ietf-bess-evpn-inter-subnet-forwarding-08 (work in
              progress), March 2019.

   [I-D.ietf-bess-evpn-prefix-advertisement]
              Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A.
              Sajassi, "IP Prefix Advertisement in EVPN",
              draft-ietf-bess-evpn-prefix-advertisement-11 (work in
              progress), May 2018.

   [I-D.ietf-idr-tunnel-encaps]
              Patel, K., Velde, G., and S. Ramachandra, "The BGP Tunnel
              Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-15
              (work in progress), December 2019.

   [I-D.sajassi-bess-evpn-ip-aliasing]
              Sajassi, A., Badoni, G., Warade, P., Pasupula, S., Drake,
              J., and J. Rabadan, "L3 Aliasing and Mass Withdrawal
              Support for EVPN", draft-sajassi-bess-evpn-ip-aliasing-01
              (work in progress), March 2020.

   [I-D.wang-bess-evpn-context-label]
              Wang, Y. and B.
              Song, "Context Label for MPLS EVPN",
              draft-wang-bess-evpn-context-label-02 (work in progress),
              June 2020.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018.

   [RFC8679]  Shen, Y., Jeganathan, M., Decraene, B., Gredler, H.,
              Michel, C., and H. Chen, "MPLS Egress Protection
              Framework", RFC 8679, DOI 10.17487/RFC8679, December 2019.

11.2.  Normative References

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
              Rabadan, "Virtual Private Wire Service Support in Ethernet
              VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017.

Authors' Addresses

   Yubao(Bob) Wang
   ZTE Corporation
   No. 50 Software Ave, Yuhuatai District
   Nanjing
   China

   Email: yubao.wang2008@hotmail.com


   Zheng(Sandy) Zhang
   ZTE Corporation
   No. 50 Software Ave, Yuhuatai District
   Nanjing
   China

   Email: zzhang_ietf@hotmail.com