idnits 2.17.1

draft-ietf-bess-evpn-inter-subnet-forwarding-15.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line
     does not match the current year

  -- The document date (July 26, 2021) is 1005 days in the past. Is
     this intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  == Missing Reference: 'I-D.ietf-bess-evpn-modes-interop' is mentioned
     on line 569, but not defined

  == Missing Reference: 'RFC9012' is mentioned on line 626, but not
     defined

  ** Downref: Normative reference to an Informational RFC: RFC 7348

  ** Downref: Normative reference to an Informational RFC: RFC 7637

  == Outdated reference: A later version (-17) exists of
     draft-ietf-bess-evpn-irb-extended-mobility-03

  == Outdated reference: A later version (-13) exists of
     draft-ietf-nvo3-vxlan-gpe-10

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment
     (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

------------------------------------------------------------------------

BESS WorkGroup                                                A. Sajassi
Internet-Draft                                                  S. Salam
Intended status: Standards Track                               S. Thoria
Expires: January 27, 2022                                  Cisco Systems
                                                                J. Drake
                                                                 Juniper
                                                              J. Rabadan
                                                                   Nokia
                                                           July 26, 2021

                Integrated Routing and Bridging in EVPN
            draft-ietf-bess-evpn-inter-subnet-forwarding-15

Abstract

   Ethernet VPN (EVPN) provides an extensible and flexible multi-homing
   VPN solution over an MPLS/IP network for intra-subnet connectivity
   among Tenant Systems and End Devices that can be physical or virtual.
   However, there are scenarios for which there is a need for a dynamic
   and efficient inter-subnet connectivity among these Tenant Systems
   and End Devices while maintaining the multi-homing capabilities of
   EVPN. This document describes an Integrated Routing and Bridging
   (IRB) solution based on EVPN to address such requirements.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119] and RFC 8174 [RFC8174] when, and only when, they
   appear in all capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 27, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
   3.  EVPN PE Model for IRB Operation
   4.  Symmetric and Asymmetric IRB
     4.1.  IRB Interface and its MAC and IP addresses
     4.2.  Operational Considerations
   5.  Symmetric IRB Procedures
     5.1.  Control Plane - Advertising PE
     5.2.  Control Plane - Receiving PE
     5.3.  Subnet route advertisement
     5.4.  Data Plane - Ingress PE
     5.5.  Data Plane - Egress PE
   6.  Asymmetric IRB Procedures
     6.1.  Control Plane - Advertising PE
     6.2.  Control Plane - Receiving PE
     6.3.  Data Plane - Ingress PE
     6.4.  Data Plane - Egress PE
   7.  Mobility Procedure
     7.1.  Initiating a gratuitous ARP upon a Move
     7.2.  Sending Data Traffic without an ARP Request
     7.3.  Silent Host
   8.  BGP Encoding
     8.1.  Router's MAC Extended Community
   9.  Operational Models for Symmetric Inter-Subnet Forwarding
     9.1.  IRB forwarding on NVEs for Tenant Systems
       9.1.1.  Control Plane Operation
       9.1.2.  Data Plane Operation
     9.2.  IRB forwarding on NVEs for Subnets behind Tenant Systems
       9.2.1.  Control Plane Operation
       9.2.2.  Data Plane Operation
   10. Acknowledgements
   11. Security Considerations
   12. IANA Considerations
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Terminology

   AC: Attachment Circuit

   ARP: Address Resolution Protocol

   ARP table: A logical view of a forwarding table on a PE that
   maintains an IP to MAC binding entry on an IP interface for both IPv4
   and IPv6. These entries are learned through ARP/ND or through EVPN.

   Broadcast Domain: As per [RFC7432], an EVI consists of a single or
   multiple broadcast domains. In the case of the VLAN-bundle and
   VLAN-based service models (see [RFC7432]), a broadcast domain is
   equivalent to an EVI. In the case of the VLAN-aware bundle service
   model, an EVI contains multiple broadcast domains. Also, in this
   document, broadcast domain and subnet are equivalent terms, and
   wherever "subnet" is used, it means "IP subnet".

   Broadcast Domain Route Target: Refers to the Broadcast Domain
   assigned Route Target [RFC4364].
   In the case of the VLAN-aware bundle
   service model, all the broadcast domain instances in the MAC-VRF
   share the same Route Target.

   Bridge Table: The instantiation of a broadcast domain in a MAC-VRF,
   as per [RFC7432].

   Ethernet NVO tunnel: Refers to Network Virtualization Overlay tunnels
   with Ethernet payload as specified for VXLAN in [RFC7348] and for
   NVGRE in [RFC7637].

   EVI: EVPN Instance spanning the NVE/PE devices that are participating
   on that EVPN, as per [RFC7432].

   EVPN: Ethernet Virtual Private Networks, as per [RFC7432].

   IP NVO tunnel: Refers to Network Virtualization Overlay tunnels with
   IP payload (no MAC header in the payload) as specified for GPE in
   [I-D.ietf-nvo3-vxlan-gpe].

   IP-VRF: A Virtual Routing and Forwarding table for IP routes on an
   NVE/PE. The IP routes could be populated by EVPN and IP-VPN address
   families. An IP-VRF is also an instantiation of a layer 3 VPN in an
   NVE/PE.

   IRB: Integrated Routing and Bridging interface. It connects an IP-VRF
   to a broadcast domain (or subnet).

   MAC-VRF: A Virtual Routing and Forwarding table for Media Access
   Control (MAC) addresses on an NVE/PE, as per [RFC7432]. A MAC-VRF is
   also an instantiation of an EVI in an NVE/PE.

   ND: Neighbor Discovery Protocol

   NVE: Network Virtualization Edge

   NVGRE: Network Virtualization Using Generic Routing Encapsulation,
   as in [RFC7637].

   NVO: Network Virtualization Overlays

   RT-2: EVPN route type 2, i.e., MAC/IP Advertisement route, as defined
   in [RFC7432].

   RT-5: EVPN route type 5, i.e., IP Prefix route, as defined in
   Section 3 of [I-D.ietf-bess-evpn-prefix-advertisement].

   TS: Tenant System

   VA: Virtual Appliance

   VNI: Virtual Network Identifier.
   As in [RFC8365], the term is used
   as a representation of a 24-bit NVO instance identifier, with the
   understanding that VNI will refer to a VXLAN Network Identifier in
   VXLAN, or a Virtual Subnet Identifier in NVGRE, etc., unless it is
   stated otherwise.

   VTEP: VXLAN Tunnel End Point, as in [RFC7348].

   VXLAN: Virtual Extensible LAN, as in [RFC7348].

   This document also assumes familiarity with the terminology of
   [RFC7432], [RFC8365] and [RFC7365].

2.  Introduction

   EVPN [RFC7432] provides an extensible and flexible multi-homing VPN
   solution over an MPLS/IP network for intra-subnet connectivity among
   Tenant Systems (TSes) and End Devices that can be physical or
   virtual, where an IP subnet is represented by an EVPN Instance (EVI)
   for a VLAN-based service or by an (EVI, VLAN) pair for a VLAN-aware
   bundle service. However, there are scenarios for which there is a
   need for dynamic and efficient inter-subnet connectivity among these
   Tenant Systems and End Devices while maintaining the multi-homing
   capabilities of EVPN. This document describes an Integrated Routing
   and Bridging (IRB) solution based on EVPN to address such
   requirements.

   Inter-subnet communication is traditionally achieved at centralized
   L3 Gateway (L3GW) devices where all the inter-subnet forwarding is
   performed and all the inter-subnet communication policies are
   enforced. When two TSes belonging to two different subnets connected
   to the same PE want to communicate with each other, their traffic
   needs to be backhauled from the PE all the way to the centralized
   gateway, where inter-subnet switching is performed, and then back to
   the PE. For today's large multi-tenant data centers, this scheme is
   very inefficient and sometimes impractical.

   In order to overcome the drawback of the centralized layer-3 GW
   approach, IRB functionality is needed on the PEs (also referred to as
   EVPN NVEs) attached to TSes in order to avoid inefficient forwarding
   of tenant traffic (i.e., avoid back-hauling and hair-pinning). When
   a PE with IRB capability receives tenant traffic over an Attachment
   Circuit (AC), it can locally bridge the tenant intra-subnet traffic
   and also locally route the tenant inter-subnet traffic on a
   packet-by-packet basis, thus meeting the requirements for both
   intra-subnet and inter-subnet forwarding and avoiding the non-optimal
   traffic forwarding associated with the centralized layer-3 GW
   approach.

   Some TSes run non-IP protocols in conjunction with their IP traffic.
   Therefore, it is important to handle both kinds of traffic optimally
   - e.g., to bridge non-IP and intra-subnet traffic and to route
   inter-subnet IP traffic. Consequently, the solution needs to meet
   the following requirements:

   R1: The solution must provide each tenant with IP routing of its
   inter-subnet traffic and Ethernet bridging of its intra-subnet
   traffic and non-routable traffic, where non-routable traffic refers
   both to non-IP traffic and to IP traffic whose version differs from
   the IP version configured in the IP-VRF. For example, if an IP-VRF
   in an NVE is configured for IPv6 and that NVE receives IPv4 traffic
   on the corresponding VLAN, then the IPv4 traffic is treated as
   non-routable traffic.

   R2: The solution must allow IP routing of inter-subnet traffic to be
   disabled on a per-VLAN basis on those PEs that are backhauling that
   traffic to another PE for routing.

3.  EVPN PE Model for IRB Operation

   Since this document discusses IRB operation in relationship to EVPN
   MAC-VRF, IP-VRF, EVI, Broadcast Domain, Bridge Table, and IRB
   interfaces, it is important to understand the relationship between
   these components.
   Therefore, the PE model below is used to a) describe these
   components and b) illustrate the relationship among them.

   +-------------------------------------------------------------+
   |                                                             |
   |              +------------------+                    IRB PE |
   | Attachment   | +------------------+                         |
   | Circuit(AC1) | |  +----------+    |                 MPLS/NVO tnl
 ----------------------*Bridge    |    |                    +-----
   |              | |  |Table(BT1)|    |    +-----------+  / \    \
   |              | |  |          *---------*           |<--> |Eth|
   |              | |  |  VLAN x  |    |IRB1|           |  \ /    /
   |              | |  +----------+    |    |           |   +-----
   |              | |     ...          |    |  IP-VRF1  |        |
   |              | |  +----------+    |    |  RD2/RT2  |MPLS/NVO tnl
   |              | |  |Bridge    |    |    |           |   +-----
   |              | |  |Table(BT2)|    |IRB2|           |  / \    \
   |              | |  |          *---------*           |<--> |IP |
 ----------------------*  VLAN y  |    |    +-----------+  \ /    /
   |  AC2         | |  +----------+    |                    +-----
   |              | |    MAC-VRF1      |                         |
   |              +-+    RD1/RT1       |                         |
   |                +------------------+                         |
   |                                                             |
   |                                                             |
   +-------------------------------------------------------------+

                     Figure 1: EVPN IRB PE Model

   A tenant needing IRB services on a PE requires an IP Virtual Routing
   and Forwarding table (IP-VRF) along with one or more MAC Virtual
   Routing and Forwarding tables (MAC-VRFs). An IP-VRF, as defined in
   [RFC4364], is the instantiation of an IP-VPN instance in a PE. A
   MAC-VRF, as defined in [RFC7432], is the instantiation of an EVI
   (EVPN Instance) in a PE. A MAC-VRF consists of one or more bridge
   tables, where each bridge table corresponds to a VLAN (broadcast
   domain). If service interfaces for an EVPN PE are configured in
   VLAN-Based mode (see Section 6.1 of [RFC7432]), then there is only a
   single bridge table per MAC-VRF (per EVI) - i.e., there is only one
   tenant VLAN per EVI. However, if service interfaces for an EVPN PE
   are configured in VLAN-Aware Bundle mode (see Section 6.3 of
   [RFC7432]), then there are several bridge tables per MAC-VRF (per
   EVI) - i.e., there are several tenant VLANs per EVI.
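
   The relationship between service mode and bridge table count can be
   sketched as a small data model. This is only an illustration, not
   part of the specification: the class names, the dictionary layout,
   and the choice to raise an error are all assumptions made for the
   example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class ServiceMode(Enum):
    VLAN_BASED = "vlan-based"                # Section 6.1 of RFC 7432
    VLAN_AWARE_BUNDLE = "vlan-aware-bundle"  # Section 6.3 of RFC 7432


@dataclass
class MacVrf:
    """Hypothetical model of a MAC-VRF, the per-PE instantiation of an EVI."""
    name: str
    mode: ServiceMode
    # One bridge table per tenant VLAN: VLAN ID -> {MAC address: next hop}.
    bridge_tables: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def add_bridge_table(self, vlan_id: int) -> None:
        """Create the bridge table for a VLAN, enforcing the per-mode limit."""
        is_new = vlan_id not in self.bridge_tables
        if self.mode is ServiceMode.VLAN_BASED and is_new and self.bridge_tables:
            raise ValueError("VLAN-Based mode: one bridge table per MAC-VRF")
        self.bridge_tables.setdefault(vlan_id, {})


# VLAN-Based mode: exactly one tenant VLAN per EVI.
vlan_based = MacVrf("MAC-VRF1", ServiceMode.VLAN_BASED)
vlan_based.add_bridge_table(10)

# VLAN-Aware Bundle mode: several tenant VLANs share one EVI.
bundle = MacVrf("MAC-VRF2", ServiceMode.VLAN_AWARE_BUNDLE)
for vlan in (10, 20, 30):
    bundle.add_bridge_table(vlan)
```

   In this sketch a second VLAN added to the VLAN-Based MAC-VRF is
   rejected, while the VLAN-Aware Bundle MAC-VRF accepts all three.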

   Each bridge table is connected to an IP-VRF via an L3 interface
   called an IRB interface. Since a single tenant subnet is typically
   (and in this document) represented by a VLAN (and thus supported by a
   single bridge table), for a given tenant there are as many bridge
   tables as there are subnets, and thus there are also as many IRB
   interfaces between the tenant IP-VRF and the associated bridge
   tables, as shown in the PE model above.

   An IP-VRF is identified by its corresponding route target and route
   distinguisher, and a MAC-VRF is likewise identified by its
   corresponding route target and route distinguisher. If operating in
   EVPN VLAN-Based mode, then a receiving PE that receives an EVPN route
   with a MAC-VRF route target can identify the corresponding bridge
   table; however, if operating in EVPN VLAN-Aware Bundle mode, then the
   receiving PE needs both the MAC-VRF route target and the VLAN ID in
   order to identify the corresponding bridge table.

4.  Symmetric and Asymmetric IRB

   This document defines and describes two types of IRB solutions -
   namely symmetric and asymmetric IRB. The description of symmetric
   and asymmetric IRB procedures relating to data path operations and
   tables in this document is a logical view of data path lookups and
   related tables. Actual implementations, while following this logical
   view, may not strictly adhere to it for performance tradeoffs.
   Specifically,

   o  References to the ARP table in the context of asymmetric IRB are
      to a logical view of a forwarding table that maintains an IP to
      MAC binding entry on a layer 3 interface for both IPv4 and IPv6.
      These entries are not subject to the ARP or ND protocol. For IP
      to MAC bindings learnt via EVPN, an implementation may choose to
      import these bindings directly to the respective forwarding table
      (such as an adjacency/next-hop table) as opposed to importing them
      to ARP or ND protocol tables.

   o  References to a host IP lookup followed by a host MAC lookup in
      the context of asymmetric IRB MAY be collapsed into a single IP
      lookup in a hardware implementation.

   In symmetric IRB, as its name implies, the lookup operation is
   symmetric at both ingress and egress PEs - i.e., both ingress and
   egress PEs perform lookups on both MAC and IP addresses. The ingress
   PE performs a MAC lookup followed by an IP lookup, and the egress PE
   performs an IP lookup followed by a MAC lookup, as depicted in the
   following figure.

                Ingress PE                   Egress PE
          +-------------------+        +------------------+
          |                   |        |                  |
          |    +-> IP-VRF ----|---->---|-----> IP-VRF -+  |
          |    |              |        |               |  |
          |   BT1        BT2  |        |  BT3         BT2 |
          |    |              |        |               |  |
          |    ^              |        |               v  |
          |    |              |        |               |  |
          +-------------------+        +------------------+
               ^                                       |
               |                                       |
         TS1->-+                                       +->-TS2

                       Figure 2: Symmetric IRB

   In symmetric IRB, as shown in Figure 2, the inter-subnet forwarding
   between two PEs is done between their associated IP-VRFs. Therefore,
   the tunnel connecting these IP-VRFs can be either an IP-only tunnel
   (e.g., in the case of MPLS or GPE encapsulation) or an Ethernet NVO
   tunnel (e.g., in the case of VXLAN encapsulation). If it is an
   Ethernet NVO tunnel, TS1's IP packet is encapsulated in an Ethernet
   header consisting of the ingress and egress PEs' MAC addresses -
   i.e., there is no need for the ingress PE to use the destination
   TS2's MAC address. Therefore, in symmetric IRB, there is no need for
   the ingress PE to maintain ARP entries for the association of the
   destination TS2's IP and MAC addresses in its ARP table. Each PE
   participating in symmetric IRB only maintains ARP entries for locally
   connected hosts and maintains MAC-VRFs/bridge tables for only locally
   configured subnets.
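
   The symmetric lookup sequence described above can be sketched as
   follows. This is a simplified illustration under stated assumptions:
   the tables are plain dictionaries, the IP lookup is an exact match
   rather than a longest-prefix match, and every address, name, and
   function is hypothetical.

```python
# All table contents below are made-up values for illustration.
IRB_MAC = "00:00:5e:00:01:01"        # hypothetical anycast gateway MAC
PE2_ROUTER_MAC = "02:00:00:00:00:02"  # hypothetical MAC of the egress PE


def ingress_symmetric(frame, bridge_table, ip_vrf):
    """Ingress PE: MAC lookup in the source bridge table, then IP lookup."""
    lookups = ["MAC"]
    if bridge_table.get(frame["dst_mac"]) != "IRB":
        raise ValueError("frame is not addressed to the IRB interface")
    lookups.append("IP")
    # Exact match for brevity; a real IP-VRF does longest-prefix match.
    egress_pe, egress_pe_mac = ip_vrf[frame["dst_ip"]]
    return egress_pe, egress_pe_mac, lookups


def egress_symmetric(dst_ip, ip_vrf, bridge_table):
    """Egress PE: IP lookup in the IP-VRF, then MAC lookup."""
    lookups = ["IP"]
    host_mac = ip_vrf[dst_ip]          # local host, resolved by local ARP
    lookups.append("MAC")
    attachment_circuit = bridge_table[host_mac]
    return host_mac, attachment_circuit, lookups


ts2_ip, ts2_mac = "10.0.2.2", "00:aa:00:00:00:02"
frame = {"dst_mac": IRB_MAC, "dst_ip": ts2_ip}

# The ingress PE holds no ARP or MAC entry for TS2, only a route to PE2.
pe1_bt1 = {IRB_MAC: "IRB"}
pe1_ip_vrf = {ts2_ip: ("PE2", PE2_ROUTER_MAC)}
egress_pe, inner_dst_mac, ingress_lookups = ingress_symmetric(
    frame, pe1_bt1, pe1_ip_vrf)

# The egress PE has TS2 locally attached.
pe2_ip_vrf = {ts2_ip: ts2_mac}
pe2_bt2 = {ts2_mac: "AC-to-TS2"}
host_mac, ac, egress_lookups = egress_symmetric(ts2_ip, pe2_ip_vrf, pe2_bt2)
```

   Note that over an Ethernet NVO tunnel the inner destination MAC
   chosen by the ingress PE in this sketch is the egress PE's MAC, not
   TS2's MAC.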

   In asymmetric IRB, the lookup operation is asymmetric: the ingress
   PE performs three lookups, whereas the egress PE performs a single
   lookup. That is, the ingress PE performs a MAC lookup, followed by
   an IP lookup, followed by a MAC lookup again, whereas the egress PE
   performs just a single MAC lookup, as depicted in Figure 3 below.

                Ingress PE                   Egress PE
          +-------------------+        +------------------+
          |                   |        |                  |
          |    +-> IP-VRF ->  |        |      IP-VRF      |
          |    |           |  |        |                  |
          |   BT1        BT2  |        |  BT3         BT2 |
          |    |           |  |        |              | | |
          |    |           +--|--->----|--------------+ | |
          |    |              |        |                v |
          +-------------------+        +----------------|-+
               ^                                        |
               |                                        |
         TS1->-+                                        +->-TS2

                       Figure 3: Asymmetric IRB

   In asymmetric IRB, as shown in Figure 3, the inter-subnet forwarding
   between two PEs is done between their associated MAC-VRFs/bridge
   tables. Therefore, the MPLS or NVO tunnel used for inter-subnet
   forwarding MUST be of type Ethernet. Since only a MAC lookup is
   performed at the egress PE (i.e., no IP lookup), TS1's IP packets
   need to be encapsulated with the destination TS2's MAC address. In
   order for the ingress PE to perform such encapsulation, it needs to
   maintain TS2's IP and MAC address association in its ARP table.
   Furthermore, it needs to maintain the destination TS2's MAC address
   in the corresponding bridge table even though it may not have any
   TSes of the corresponding subnet locally attached. In other words,
   each PE participating in asymmetric IRB MUST maintain ARP entries for
   remote hosts (hosts connected to other PEs) as well as maintain
   MAC-VRFs/bridge tables and IRB interfaces for ALL subnets in an
   IP-VRF, including subnets that may not be locally attached.
   Therefore, when applying asymmetric IRB, careful consideration should
   be given to PE scale aspects: the size of the ARP table, the number
   of IRB interfaces, and the number and size of the bridge tables.
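
   The asymmetric lookup sequence can be sketched in the same way. As
   before, this is a simplified illustration: the tables are plain
   dictionaries, the IP lookup is reduced to resolving the destination
   host in the ARP table, and all names and addresses are hypothetical.

```python
# All table contents below are made-up values for illustration.
IRB_MAC = "00:00:5e:00:01:01"   # hypothetical anycast gateway MAC
TS2_IP, TS2_MAC = "10.0.2.2", "00:aa:00:00:00:02"


def ingress_asymmetric(frame, src_bt, arp_table, dst_bt):
    """Ingress PE: MAC lookup, IP lookup, then a second MAC lookup."""
    lookups = ["MAC"]
    if src_bt.get(frame["dst_mac"]) != "IRB":
        raise ValueError("frame is not addressed to the IRB interface")
    lookups.append("IP")
    # The route resolves to the local IRB of the destination subnet, whose
    # ARP table must hold the remote host's (IP, MAC) binding.
    dst_mac = arp_table[frame["dst_ip"]]
    lookups.append("MAC")
    egress_pe = dst_bt[dst_mac]        # remote MAC learned via EVPN
    return dst_mac, egress_pe, lookups


def egress_asymmetric(dst_mac, bridge_table):
    """Egress PE: a single MAC lookup in the destination bridge table."""
    return bridge_table[dst_mac], ["MAC"]


# The ingress PE has no local host in TS2's subnet, yet it still needs
# the destination bridge table (BT2) and an ARP entry for TS2.
pe1_bt1 = {IRB_MAC: "IRB"}
pe1_arp = {TS2_IP: TS2_MAC}
pe1_bt2 = {TS2_MAC: "PE2"}
frame = {"dst_mac": IRB_MAC, "dst_ip": TS2_IP}
inner_dst_mac, egress_pe, ingress_lookups = ingress_asymmetric(
    frame, pe1_bt1, pe1_arp, pe1_bt2)

pe2_bt2 = {TS2_MAC: "AC-to-TS2"}
ac, egress_lookups = egress_asymmetric(inner_dst_mac, pe2_bt2)
```

   The extra ARP and bridge table state on the ingress PE in this
   sketch is the scale cost that the paragraph above warns about.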

   It should be noted that whenever a PE performs a host IP lookup for a
   packet that is routed, the IPv4 TTL or IPv6 hop limit for that packet
   is decremented by one, and if it reaches zero, the packet is
   discarded. In the case of symmetric IRB, the TTL/hop limit is
   decremented by both ingress and egress PEs (once by each), whereas in
   the case of asymmetric IRB, the TTL/hop limit is decremented only
   once, by the ingress PE.

   The following sections define the control and data plane procedures
   for symmetric and asymmetric IRB on ingress and egress PEs. The
   following figure is used to describe these procedures, showing a
   single IP-VRF and a number of broadcast domains on each PE for a
   given tenant. That is, an IP-VRF connects one or more EVIs, each EVI
   contains one MAC-VRF, each MAC-VRF consists of one or more bridge
   tables, one per broadcast domain, and a PE has an associated IRB
   interface for each broadcast domain.

                 PE 1          +---------+
            +-------------+    |         |
    TS1-----|         MACx|    |         |        PE2
   (IP1/M1) |(BT1)        |    |         |   +-------------+
    TS5-----|      \      |    |  MPLS/  |   |MACy  (BT3)  |-----TS3
   (IP5/M5) |IPx/Mx \     |    |  VXLAN/ |   |     /       |  (IP3/M3)
            |    (IP-VRF1)|----|  NVGRE  |---|(IP-VRF1)    |
            |       /     |    |         |   |     \       |
    TS2-----|(BT2) /      |    |         |   |      (BT1)  |-----TS4
   (IP2/M2) |             |    |         |   |             |  (IP4/M4)
            +-------------+    |         |   +-------------+
                               |         |
                               +---------+

                       Figure 4: IRB forwarding

4.1.  IRB Interface and its MAC and IP addresses

   To support inter-subnet forwarding on a PE, the PE acts as an IP
   Default Gateway from the perspective of the attached Tenant Systems.
   The default gateway MAC and IP addresses are configured on each IRB
   interface associated with its subnet, and this configuration falls
   into one of the following two options:

   1.  All the PEs for a given tenant subnet use the same anycast
       default gateway IP and MAC addresses.
       On each PE, these default
       gateway IP and MAC addresses correspond to the IRB interface
       connecting the bridge table associated with the tenant's VLAN to
       the corresponding tenant's IP-VRF.

   2.  Each PE for a given tenant subnet uses the same anycast default
       gateway IP address but its own MAC address. These MAC addresses
       are aliased to the same anycast default gateway IP address
       through the use of the Default Gateway extended community as
       specified in [RFC7432], which is carried in the EVPN MAC/IP
       Advertisement routes. On each PE, this default gateway IP
       address along with its associated MAC addresses corresponds to
       the IRB interface connecting the bridge table associated with the
       tenant's VLAN to the corresponding tenant's IP-VRF.

   It is worth noting that if the applications that are running on the
   TSes are employing or relying on any form of MAC security, then the
   first option (i.e., using an anycast MAC address) should be used to
   ensure that the applications receive traffic from the same IRB
   interface MAC address that they are sending to. If the second option
   is used, then the IRB interface MAC address MUST be the one used in
   the initial ARP reply or ND Neighbor Advertisement (NA) for that TS.

   Although both of these options are applicable to both symmetric and
   asymmetric IRB, option 1 is recommended because of the ease of
   anycast MAC address provisioning, not only on the IRB interface
   associated with a given subnet across all the PEs corresponding to
   that VLAN but also on all IRB interfaces associated with all the
   tenant's subnets across all the PEs corresponding to all the VLANs
   for that tenant. Furthermore, it simplifies the operation, as there
   is no need for Default Gateway extended community advertisement and
   its associated MAC aliasing procedure.
   Yet another advantage is that
   following host mobility, the host does not need to refresh the
   default GW ARP/ND entry.

   If option 1 is used, an implementation MAY choose to auto-derive the
   anycast MAC address. If auto-derivation is used, the anycast MAC
   MUST be auto-derived out of the following ranges (which are defined
   in [RFC5798]):

   o  Anycast IPv4 IRB case: 00-00-5E-00-01-{VRID}

   o  Anycast IPv6 IRB case: 00-00-5E-00-02-{VRID}

   Where the last octet is generated based on a configurable Virtual
   Router ID (VRID, range 1-255). If not explicitly configured, the
   default value for the VRID octet is '1'. Auto-derivation of the
   anycast MAC can only be used if there is certainty that the auto-
   derived MAC does not collide with any customer MAC address.

   In addition to IP anycast addresses, IRB interfaces can be configured
   with non-anycast IP addresses for the purpose of OAM (such as
   traceroute/ping to these interfaces) for both symmetric and
   asymmetric IRB. These IP addresses need to be distributed as VPN
   routes when PEs operate in symmetric IRB mode. However, they don't
   need to be distributed if the PEs are operating in asymmetric IRB
   mode, as the non-anycast IP addresses are configured along with their
   individual MACs and they get distributed via EVPN route type-2
   advertisement.

   For option 1, irrespective of whether only the anycast MAC address or
   both anycast and non-anycast MAC addresses (where the latter is used
   for the purpose of OAM) are configured on the same IRB interface,
   when a TS sends an ARP request or ND Neighbor Solicitation (NS) to
   the PE that it is attached to, the request is sent for the anycast IP
   address of the IRB interface associated with the TS's subnet, and the
   reply uses the anycast MAC address (both as the Source MAC in the
   Ethernet header and as the Sender hardware address in the payload).
   For example, in Figure 4,
   TS1 is configured with the anycast IPx address as its default gateway
   IP address, and thus when it sends an ARP request for IPx (the
   anycast IP address of the IRB interface for BT1), PE1 sends an ARP
   reply with MACx, which is the anycast MAC address of that IRB
   interface. Traffic routed from IP-VRF1 to TS1 uses the anycast MAC
   address as its source MAC address.

4.2.  Operational Considerations

   Symmetric and Asymmetric IRB modes may coexist in the same network,
   and an ingress PE that supports both forwarding modes for a given
   tenant can interwork with egress PEs that support either IRB mode.
   The egress PE will indicate the desired forwarding mode for a given
   host based on the presence of the Label2 field and the IP-VRF route-
   target in the EVPN MAC/IP Advertisement route. If the Label2 field
   of the received MAC/IP Advertisement route for host H1 is non-zero,
   and one of its route-targets identifies the IP-VRF, the ingress PE
   will use Symmetric IRB mode when forwarding packets destined to H1.
   If the Label2 field is zero and the MAC/IP Advertisement route for H1
   does not carry any route-target that identifies the IP-VRF, the
   ingress PE will use Asymmetric IRB mode when forwarding traffic to
   H1.

   As an example that illustrates the previous statement, suppose PE1
   and PE2 need to forward packets from TS2 to TS4 in the example of
   Figure 4. Since both PEs are attached to the bridge table of the
   destination host, Symmetric and Asymmetric IRB modes are both
   possible as long as the ingress PE, PE1, supports both modes. The
   forwarding mode will depend on the mode configured in the egress PE,
   PE2. That is:

   1.  If PE2 is configured for Symmetric IRB mode, PE2 will advertise
       TS4's MAC/IP addresses in a MAC/IP Advertisement route with a
       non-zero Label2 field, e.g., Label2=Lx, and a route-target that
       identifies IP-VRF1 in PE1.
       IP4 will be installed in PE1's
       IP-VRF1, and TS4's ARP and MAC information will also be installed
       in PE1's IRB interface ARP table and BT1, respectively. When a
       packet from TS2 destined to TS4 is looked up in PE1's IP-VRF
       route-table, a longest prefix match lookup will find IP4 in the
       IP-VRF, and PE1 will forward using the Symmetric IRB mode and
       Label Lx.

   2.  However, if PE2 is configured for Asymmetric IRB mode, PE2 will
       advertise TS4's MAC/IP information in a MAC/IP Advertisement
       route with a zero Label2 field and no route-target identifying
       IP-VRF1. In this case, PE2 will install TS4 information in its
       ARP table and BT1. When a packet from TS2 to TS4 arrives at PE1,
       a longest prefix match on IP-VRF1's route-table will yield the
       local IRB interface to BT1, where a subsequent ARP and bridge
       table lookup will provide the information for an Asymmetric
       forwarding mode to PE2.

   Refer to [I-D.ietf-bess-evpn-modes-interop] for more information
   about interoperability between Symmetric and Asymmetric forwarding
   modes.

   The choice between Symmetric and Asymmetric mode is based on the
   operator's preference, and it is a trade-off between scale (better in
   the Symmetric IRB mode) and control plane simplicity (Asymmetric IRB
   mode simplifies the control plane). In cases where a tenant has
   hosts for every subnet attached to all (or most of) the PEs, the ARP
   and MAC entries need to be learned by all PEs anyway, and therefore
   the Asymmetric IRB mode simplifies the forwarding model and saves
   space in the IP-VRF route-table, since host routes are not installed
   in the route-table. However, if the tenant does not need to stretch
   subnets (broadcast domains) to multiple PEs and inter-subnet
   forwarding is needed, the Symmetric IRB model will save ARP and
   bridge table space in all the PEs (in comparison with the Asymmetric
   IRB model).

5.  Symmetric IRB Procedures

5.1.  Control Plane - Advertising PE

   When a PE (e.g., PE1 in Figure 4 above) learns the MAC and IP
   addresses of a TS (e.g., via an ARP request or Neighbor
   Solicitation), it adds the MAC address to the corresponding
   MAC-VRF/bridge table of that tenant's subnet and adds the IP address
   to the IP-VRF for that tenant. Furthermore, it adds this TS's MAC
   and IP address association to its ARP table or NDP cache. It then
   builds an EVPN MAC/IP Advertisement route (type 2) as follows and
   advertises it to other PEs participating in that tenant's VPN.

   o  The Length field of the BGP EVPN NLRI for an EVPN MAC/IP
      Advertisement route MUST be either 40 (if an IPv4 address is
      carried) or 52 (if an IPv6 address is carried).

   o  The Route Distinguisher (RD), Ethernet Segment Identifier,
      Ethernet Tag ID, MAC Address Length, MAC Address, IP Address
      Length, IP Address, and MPLS Label1 fields MUST be set per
      [RFC7432] and [RFC8365].

   o  The MPLS Label2 field is set to either an MPLS label or a VNI
      corresponding to the tenant's IP-VRF. In the case of an MPLS
      label, this field is encoded as 3 octets, where the high-order 20
      bits contain the label value.

   Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length,
   MAC Address, IP Address Length, and IP Address fields are part of the
   route key used by BGP to compare routes. The rest of the fields are
   not part of the route key.

   This route is advertised along with the following two extended
   communities:

   1.  Encapsulation Extended Community

   2.  Router's MAC Extended Community

   This route is advertised with one or more Encapsulation extended
   communities [RFC9012], one for each encapsulation type supported by
   the advertising PE. If one or more encapsulation types require an
   Ethernet frame, a single Router's MAC extended community (Section
   8.1) is also advertised.
This extended community specifies the MAC 630 address to be used as the inner destination MAC address in an 631 Ethernet frame sent to the advertising PE. 633 This route MUST be advertised with two route targets, one 634 corresponding to the MAC-VRF of the tenant's subnet and another 635 corresponding to the tenant's IP-VRF. 637 5.2. Control Plane - Receiving PE 639 When a PE (e.g., PE2 in figure 4 above) receives this EVPN MAC/IP 640 Advertisement route, it performs the following: 642 o The MAC-VRF route target and Ethernet Tag, if the latter is non- 643 zero, are used to identify the correct MAC-VRF and bridge table 644 and if they are found the MAC address is imported. The IP-VRF 645 route target is used to identify the correct IP-VRF and if it is 646 found the IP address is imported. 648 If the MPLS label2 field is non-zero, it means that this route is to 649 be used for symmetric IRB and the MPLS label2 value is to be used 650 when sending a packet for this IP address to the advertising PE. 652 If the receiving PE receives this route with both the MAC-VRF and IP- 653 VRF route targets but the MAC/IP Advertisement route does not include 654 MPLS label2 field and if the receiving PE supports asymmetric IRB 655 mode, then the receiving PE installs the MAC address in the 656 corresponding MAC-VRF and (IP, MAC) association in the ARP table for 657 that tenant (identified by the corresponding IP-VRF route target). 659 If the receiving PE receives this route with both the MAC-VRF and IP- 660 VRF route targets and if the receiving PE does not support either 661 asymmetric or symmetric IRB modes, then if it has the corresponding 662 MAC-VRF, it only imports the MAC address. 
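As an illustration only, the import rules of this section can be sketched as one decision function. The function name, its boolean parameters, and the returned table names are hypothetical. The sketch assumes the route carries both route targets and that the matching MAC-VRF and IP-VRF exist locally. A route without label2 arriving at a PE that supports only symmetric IRB is not covered by the text, so the sketch falls back to a MAC-only import in that case.

```python
def import_actions(label2, supports_symmetric, supports_asymmetric):
    """Return the set of tables a receiving PE populates (hypothetical names)."""
    # The MAC address is imported whenever the MAC-VRF is found.
    actions = {"mac_vrf"}
    if label2 != 0 and supports_symmetric:
        # Non-zero label2: symmetric IRB; label2 is used toward the advertising PE.
        actions.add("ip_vrf")
    elif supports_asymmetric:
        # No usable label2: install the (IP, MAC) association in the ARP table.
        actions.add("arp_table")
    # Otherwise (no IRB support, or a combination the text does not cover): MAC only.
    return actions
```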
664 If the receiving PE receives this route with both the MAC-VRF and IP- 665 VRF route targets and the MAC/IP Advertisement route includes MPLS 666 label2 field but the receiving PE only supports asymmetric IRB mode, 667 then the receiving PE MUST ignore MPLS label2 field and install the 668 MAC address in the corresponding MAC-VRF and (IP, MAC) association in 669 the ARP table for that tenant (identified by the corresponding IP-VRF 670 route target). 672 5.3. Subnet route advertisement 674 In the case of symmetric IRB, a layer-3 subnet and IRB interface 675 corresponding to a MAC-VRF/bridge table is required to be provisioned 676 at a PE only if that PE has locally attached hosts in that subnet. 677 In order to enable inter-subnet routing across PEs in a deployment 678 where not all subnets are provisioned at all PEs participating in an 679 EVPN IRB instance, PEs MUST advertise local subnet routes as EVPN RT- 680 5. These subnet routes are required for bootstrapping host (MAC,IP) 681 learning using gleaning procedures initiated by an inter-subnet data 682 packet. 684 I.e., if a given host's (MAC, IP) association is unknown, and an 685 ingress PE needs to send a packet to that host, then that ingress PE 686 needs to know which egress PEs are attached to the subnet in which 687 the host resides in order to send the packet to one of those PEs, 688 causing the PE receiving the packet to probe for that host. For 689 example, consider a subnet A that is locally attached to PE1 and 690 subnet B that is locally attached to PE2 and to PE3. Host A in 691 subnet A, which is attached to PE1, initiates a data packet destined to 692 host B in subnet B that is attached to PE3. If host B's (MAC, IP) 693 has not yet been learnt either via a gratuitous ARP or via a prior 694 gleaning procedure, a new gleaning procedure MUST be triggered for 695 host B's (MAC, IP) to be learnt and advertised across the EVPN 696 network.
Since host B's subnet is not local to PE1, an IP lookup for 697 host B at PE1 will not trigger this gleaning procedure for host B's 698 (MAC, IP). Therefore, PE1 MUST learn subnet B's prefix route via 699 EVPN RT-5 advertised from PE2 and PE3, so it can route the packet to 700 one of the PEs that have subnet B locally attached. Once the packet 701 is received at PE2 or PE3, and the route lookup yields a glean 702 result, an ARP request is triggered and flooded across the layer-2 703 overlay. This ARP request would be received and replied to by host 704 B, resulting in host B's (MAC, IP) being learnt at PE3 and 705 advertised across the EVPN network. Packets from host A to host B 706 can now be routed directly from PE1 to PE3. Advertisement of local 707 subnet EVPN RT-5 for an IP VRF MAY typically be achieved via 708 provisioning connected route redistribution to BGP. 710 5.4. Data Plane - Ingress PE 712 When an Ethernet frame is received by an ingress PE (e.g., PE1 in 713 figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify 714 the associated MAC-VRF/bridge table and it performs a lookup on the 715 destination MAC address. If the MAC address corresponds to its IRB 716 Interface MAC address, the ingress PE deduces that the packet must be 717 inter-subnet routed. Hence, the ingress PE performs an IP lookup in 718 the associated IP-VRF table. The lookup identifies the BGP next hop of 719 the egress PE along with the tunnel/encapsulation type and the associated 720 MPLS/VNI values. The ingress PE also decrements the TTL/hop limit 721 for that packet by one and if it reaches zero, the ingress PE 722 discards the packet. 724 If the tunnel type is that of MPLS or IP-only NVO tunnel, then the TS's 725 IP packet is sent over the tunnel without any Ethernet header. 726 However, if the tunnel type is that of Ethernet NVO tunnel, then an 727 Ethernet header needs to be added to the TS's IP packet.
The source 728 MAC address of this inner Ethernet header is set to the ingress PE's 729 router MAC address and the destination MAC address of this inner 730 Ethernet header is set to the egress PE's router MAC address learnt 731 via the Router's MAC extended community attached to the route. The MPLS VPN 732 label is set to the received label2 in the route. In the case of 733 Ethernet NVO tunnel type, VNI may be set in one of two ways: 735 o downstream mode: VNI is set to the received label2 in the route 736 which is downstream assigned. 738 o global mode: VNI is set to the received label2 in the route which 739 is domain-wide assigned. This VNI value from received label2 MUST 740 be the same as the locally configured VNI for the IP VRF as all 741 PEs in the NVO MUST be configured with the same IP VRF VNI for 742 this mode of operation. If the received label2 value does not 743 match the locally configured VNI value, the route MUST NOT be used 744 and an error message SHOULD be logged. 746 PEs may be configured to operate in one of these two modes depending 747 on the administrative domain boundaries across PEs participating in 748 the NVO, and PE's capability to support downstream VNI mode. 750 In the case of NVO tunnel encapsulation, the outer source and 751 destination IP addresses are set to the ingress and egress PE BGP 752 next-hop IP addresses respectively. 754 5.5. Data Plane - Egress PE 756 When the tenant's MPLS or NVO encapsulated packet is received over an 757 MPLS or NVO tunnel by the egress PE, the egress PE removes NVO tunnel 758 encapsulation and uses the VPN MPLS label (for MPLS encapsulation) or 759 VNI (for NVO encapsulation) to identify the IP-VRF in which IP lookup 760 needs to be performed. If the VPN MPLS label or VNI identifies a 761 MAC-VRF instead of an IP-VRF, then the procedures in section 6.4 for 762 asymmetric IRB are executed.
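The dispatch just described can be sketched as a lookup on the received label or VNI. This is a minimal sketch, not a prescribed implementation: the function name, the two dictionaries keyed by label/VNI, and the "drop" result for an unknown value are assumptions made for illustration.

```python
def egress_dispatch(label_or_vni, ip_vrfs, mac_vrfs):
    """Decide how the egress PE handles a decapsulated packet.

    ip_vrfs and mac_vrfs are hypothetical tables mapping a VPN MPLS label
    or VNI to the name of the local IP-VRF or MAC-VRF it identifies.
    """
    if label_or_vni in ip_vrfs:
        # Symmetric IRB: perform the IP lookup in the identified IP-VRF.
        return ("route", ip_vrfs[label_or_vni])
    if label_or_vni in mac_vrfs:
        # The value identifies a MAC-VRF: asymmetric procedures (section 6.4).
        return ("bridge", mac_vrfs[label_or_vni])
    # Assumption of this sketch: unknown labels/VNIs are dropped.
    return ("drop", None)
```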
764 The lookup in the IP-VRF identifies a local adjacency to the IRB 765 interface associated with the egress subnet's MAC-VRF/bridge table. 766 The egress PE also decrements the TTL/hop limit for that packet by 767 one and if it reaches zero, the egress PE discards the packet. 769 The egress PE gets the destination TS's MAC address for that TS's IP 770 address from its ARP table or NDP cache, encapsulates the packet 771 with that destination MAC address and a source MAC address 772 corresponding to that IRB interface, and sends the packet to its 773 destination subnet MAC-VRF/bridge table. 775 The destination MAC address lookup in the MAC-VRF/bridge table 776 results in a local adjacency (e.g., local interface) over which the 777 Ethernet frame is sent. 779 6. Asymmetric IRB Procedures 781 6.1. Control Plane - Advertising PE 783 When a PE (e.g., PE1 in figure 4 above) learns the MAC and IP addresses of 784 an attached TS (e.g., via an ARP request or ND Neighbor 785 Solicitation), it populates its MAC-VRF/bridge table, IP-VRF, and ARP 786 table or NDP cache just as in the case for symmetric IRB. It then 787 builds an EVPN MAC/IP Advertisement route (type 2) as follows and 788 advertises it to other PEs participating in that tenant's VPN. 790 o The Length field of the BGP EVPN NLRI for an EVPN MAC/IP 791 Advertisement route MUST be either 37 (if IPv4 address is carried) 792 or 49 (if IPv6 address is carried). 794 o Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet 795 Tag ID, MAC Address Length, MAC Address, IP Address Length, IP 796 Address, and MPLS Label1 fields MUST be set per [RFC7432] and 797 [RFC8365]. 799 o The MPLS Label2 field MUST NOT be included in this route. 801 Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length, 802 MAC Address, IP Address Length, and IP Address fields are part of the 803 route key used by BGP to compare routes. The rest of the fields are 804 not part of the route key.
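The Length values above (37 and 49 without Label2, versus 40 and 52 for symmetric IRB in section 5.1) follow from the fixed field sizes of the MAC/IP Advertisement route NLRI in [RFC7432]. The short sketch below reproduces that arithmetic; the constant and function names are illustrative only.

```python
# Field sizes in octets of the MAC/IP Advertisement route NLRI per [RFC7432].
RD = 8
ESI = 10
ETHERNET_TAG_ID = 4
MAC_ADDRESS_LENGTH = 1
MAC_ADDRESS = 6
IP_ADDRESS_LENGTH = 1
MPLS_LABEL = 3  # size of Label1, and of Label2 when present


def mac_ip_nlri_length(ip_octets, with_label2):
    """Return the NLRI Length for an IPv4 (4) or IPv6 (16) host address."""
    if ip_octets not in (4, 16):
        raise ValueError("IP address must be 4 or 16 octets")
    length = (RD + ESI + ETHERNET_TAG_ID + MAC_ADDRESS_LENGTH + MAC_ADDRESS
              + IP_ADDRESS_LENGTH + ip_octets + MPLS_LABEL)
    if with_label2:
        length += MPLS_LABEL  # symmetric IRB adds Label2
    return length
```

For example, `mac_ip_nlri_length(4, with_label2=False)` gives the asymmetric IPv4 value and `mac_ip_nlri_length(16, with_label2=True)` the symmetric IPv6 value.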
806 This route is advertised along with the following extended community: 808 o Tunnel Type Extended Community 810 For asymmetric IRB mode, Router's MAC extended community is not 811 needed because forwarding is performed using destination TS's MAC 812 address which is carried in this EVPN route type-2 advertisement. 814 This route MUST always be advertised with the MAC-VRF route target. 815 It MAY also be advertised with a second route target corresponding to 816 the IP-VRF. 818 6.2. Control Plane - Receiving PE 820 When a PE (e.g., PE2 in figure 4 above) receives this EVPN MAC/IP 821 Advertisement route, it performs the following: 823 o Using MAC-VRF route target, it identifies the corresponding MAC- 824 VRF and imports the MAC address into it. For asymmetric IRB mode, 825 it is assumed that all PEs participating in a tenant's VPN are 826 configured with all subnets (i.e., all VLANs) and corresponding 827 MAC-VRFs/bridge tables even if there are no locally attached TSes 828 for some of these subnets. The reason for this is because ingress 829 PE needs to do forwarding based on destination TS's MAC address 830 and perform NVO tunnel encapsulation as a property of a lookup in 831 MAC-VRF/bridge table. 833 o If only MAC-VRF route target is used, then the receiving PE uses 834 the MAC-VRF route target to identify the corresponding IP-VRF -- 835 i.e., many MAC-VRF route targets map to the same IP-VRF for a 836 given tenant. In this case, MAC-VRF may be used by the receiving 837 PE to identify the corresponding IP VRF via the IRB interface 838 associated with the subnet MAC-VRF/bridge table. In this case, 839 the MAC-VRF route target may be used by the receiving PE to 840 identify the corresponding IP VRF. 842 o Using MAC-VRF route target, the receiving PE identifies the 843 corresponding ARP table or NDP cache for the tenant and it adds an 844 entry to the ARP table or NDP cache for the TS's MAC and IP 845 address association. 
It should be noted that the tenant's ARP 846 table or NDP cache at the receiving PE is identified by all the 847 MAC-VRF route targets for that tenant. 849 o If IP-VRF route target is included, it may be used to import the 850 route to IP-VRF. If IP-VRF route-target is not included, MAC-VRF 851 is used to derive corresponding IP-VRF for import, as explained in 852 the prior section. In both cases, IP-VRF route is installed with 853 the TS MAC binding included in the received route. 855 If the receiving PE receives the MAC/IP Advertisement route with MPLS 856 label2 field but the receiving PE only supports asymmetric IRB mode, 857 then the receiving PE MUST ignore MPLS label2 field and install the 858 MAC address in the corresponding MAC-VRF and (IP, MAC) association in 859 the ARP table or NDP cache for that tenant (with IRB interface 860 identified by the MAC-VRF). 862 6.3. Data Plane - Ingress PE 864 When an Ethernet frame is received by an ingress PE (e.g., PE1 in 865 figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify 866 the associated MAC-VRF/bridge table and it performs a lookup on the 867 destination MAC address. If the MAC address corresponds to its IRB 868 Interface MAC address, the ingress PE deduces that the packet must be 869 inter-subnet routed. Hence, the ingress PE performs an IP lookup in 870 the associated IP-VRF table. The lookup identifies a local adjacency 871 to the IRB interface associated with the egress subnet's MAC-VRF/ 872 bridge table. The ingress PE also decrements the TTL/hop limit for 873 that packet by one and if it reaches zero, the ingress PE discards 874 the packet. 876 The ingress PE gets the destination TS's MAC address for that TS's IP 877 address from its ARP table or NDP cache, encapsulates the packet 878 with that destination MAC address and a source MAC address 879 corresponding to that IRB interface, and sends the packet to its 880 destination subnet MAC-VRF/bridge table.
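The routing step performed by the ingress PE in asymmetric mode can be sketched as follows. This is a simplified model under stated assumptions: the function name and the dictionary used as ARP table are hypothetical, and dropping a packet whose destination has no ARP/NDP entry is a choice of this sketch, not something the text specifies.

```python
def route_at_ingress(ttl, dst_ip, arp_table, irb_mac):
    """Return (new_ttl, dst_mac, src_mac) for the rewritten frame, or None.

    arp_table is a hypothetical dict mapping a TS IP address to its MAC
    address; irb_mac is the MAC of the IRB interface of the egress subnet.
    """
    ttl -= 1
    if ttl <= 0:
        return None  # TTL/hop limit reached zero: discard
    dst_mac = arp_table.get(dst_ip)
    if dst_mac is None:
        return None  # assumption of this sketch: unresolved destination
    # The frame is then handed to the destination subnet's MAC-VRF/bridge table.
    return (ttl, dst_mac, irb_mac)
```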
882 The destination MAC address lookup in the MAC-VRF/bridge table 883 results in BGP next hop address of egress PE along with label1 (L2 884 VPN MPLS label or VNI). The ingress PE encapsulates the packet using 885 Ethernet NVO tunnel of the choice (e.g., VxLAN or NVGRE) and sends 886 the packet to the egress PE. Because the packet forwarding is 887 between ingress PE's MAC-VRF/bridge table and egress PE's MAC-VRF/ 888 bridge table, the packet encapsulation procedures follow that of 889 [RFC7432] for MPLS and [RFC8365] for VxLAN encapsulations. 891 6.4. Data Plane - Egress PE 893 When a tenant's Ethernet frame is received over an NVO tunnel by the 894 egress PE, the egress PE removes NVO tunnel encapsulation and uses 895 the VPN MPLS label (for MPLS encapsulation) or VNI (for NVO 896 encapsulation) to identify the MAC-VRF/bridge table in which MAC 897 lookup needs to be performed. 899 The MAC lookup results in local adjacency (e.g., local interface) 900 over which the packet needs to get sent. 902 Note that the forwarding behavior on the egress PE is the same as 903 EVPN intra-subnet forwarding described in [RFC7432] for MPLS and 904 [RFC8365] for NVO networks. In other words, all the packet 905 processing associated with the inter-subnet forwarding semantics is 906 confined to the ingress PE for asymmetric IRB mode. 908 It should also be noted that [RFC7432] provides a different level of 909 granularity for the EVPN label. Besides identifying the bridge 910 domain table, it can be used to identify the egress interface or a 911 destination MAC address on that interface. If EVPN label is used for 912 egress interface or individual MAC address identification, then no 913 MAC lookup is needed in the egress PE for MPLS encapsulation and the 914 packet can be directly forwarded to the egress interface just based 915 on EVPN label lookup. 917 7. 
Mobility Procedure 919 When a TS moves from one NVE (aka source NVE) to another NVE (aka 920 target NVE), it is important that the MAC mobility procedures are 921 properly executed and the corresponding MAC-VRF and IP-VRF tables on 922 all participating NVEs are updated. [RFC7432] describes the MAC 923 mobility procedures for L2-only services for both single-homed TS and 924 multi-homed TS. This section describes the incremental procedures 925 and BGP Extended Communities needed to handle the MAC mobility for 926 IRB. In order to place the emphasis on the differences between 927 L2-only and IRB use cases, the incremental procedure is described for 928 single-homed TS with the expectation that the additional steps needed 929 for multi-homed TS can be extended per section 15 of [RFC7432]. 930 This section describes mobility procedures for both symmetric and 931 asymmetric IRB. Although the language used in this section is for 932 IPv4 ARP, it equally applies to IPv6 ND. 934 When a TS moves from a source NVE to a target NVE, it can behave in 935 one of the following three ways: 937 1. TS initiates an ARP request upon a move to the target NVE 939 2. TS sends a data packet without first initiating an ARP request to 940 the target NVE 942 3. TS is a silent host and neither initiates an ARP request nor 943 sends any packets 945 Depending on the expected behavior of the TS, an NVE needs to handle at 946 least the first bullet and should be able to handle the 2nd and the 947 3rd bullet. The following subsections describe the procedures for 948 each of them where it is assumed that the MAC and IP addresses of a 949 TS have a one-to-one relationship (i.e., there is one IP address per 950 MAC address and vice versa). The procedures for host mobility 951 detection in the presence of a many-to-one relationship are outside the 952 scope of this document and are covered in 953 [I-D.ietf-bess-evpn-irb-extended-mobility].
The many-to-one 954 relationship means many host IP addresses corresponding to a single 955 host MAC address or many host MAC addresses corresponding to a single 956 IP address. It should be noted that in the case of IPv6, a Link Local IP 957 address does not count in a many-to-one relationship because that 958 address is confined to a single Ethernet Segment and it is not used for 959 host mobility (i.e., by definition host mobility is between two 960 different Ethernet Segments). Therefore, when an IPv6 host is 961 configured with both a Global Unicast address (or a Unique Local 962 address) and a Link Local address, for the purpose of host mobility, 963 it is considered to have a single IP address. 965 7.1. Initiating a gratuitous ARP upon a Move 967 In this scenario when a TS moves from a source NVE to a target NVE, 968 the TS initiates a gratuitous ARP upon the move to the target NVE. 970 The target NVE, upon receiving this ARP message, updates its MAC-VRF, 971 IP-VRF, and ARP table with the host MAC, IP, and local adjacency 972 information (e.g., local interface). 974 Since this NVE has previously learned the same MAC and IP addresses 975 from the source NVE, it recognizes that there has been a MAC move and 976 it initiates MAC mobility procedures per [RFC7432] by advertising an 977 EVPN MAC/IP Advertisement route with both the MAC and IP addresses 978 filled in (per sections 5.1 and 6.1) along with MAC Mobility Extended 979 Community with the sequence number incremented by one. The target 980 NVE also exercises the MAC duplication detection procedure in section 981 15.1 of [RFC7432]. 983 The source NVE, upon receiving this MAC/IP Advertisement route, 984 realizes that the MAC has moved to the target NVE. It updates its 985 MAC-VRF and IP-VRF table accordingly with the adjacency information 986 of the target NVE.
In the case of the asymmetric IRB, the source NVE 987 also updates its ARP table with the received adjacency information 988 and in the case of the symmetric IRB, the source NVE removes the 989 entry associated with the received (MAC, IP) from its local ARP 990 table. It then withdraws its EVPN MAC/IP Advertisement route. 991 Furthermore, it sends an ARP probe locally to ensure that the MAC is 992 gone. If an ARP response is received, the source NVE updates its ARP 993 entry for that (IP, MAC) and re-advertises an EVPN MAC/IP 994 Advertisement route for that (IP, MAC) along with MAC Mobility 995 Extended Community with the sequence number incremented by one. The 996 source NVE also exercises the MAC duplication detection procedure in 997 section 15.1 of [RFC7432]. 999 All other remote NVE devices upon receiving the MAC/IP Advertisement 1000 route with MAC Mobility extended community compare the sequence 1001 number in this advertisement with the one previously received. If 1002 the new sequence number is greater than the old one, then they update 1003 the MAC/IP addresses of the TS in their corresponding MAC-VRF and IP- 1004 VRF tables to point to the target NVE. Furthermore, upon receiving 1005 the MAC/IP withdraw for the TS from the source NVE, these remote PEs 1006 perform the cleanups for their BGP tables. 1008 7.2. Sending Data Traffic without an ARP Request 1010 In this scenario when a TS moves from a source NVE to a target NVE, 1011 the TS starts sending data traffic without first initiating an ARP 1012 request. 1014 The target NVE upon receiving the first data packet, learns the MAC 1015 address of the TS in the data plane and updates its MAC-VRF table 1016 with the MAC address and the local adjacency information (e.g., local 1017 interface) accordingly. The target NVE realizes that there has been 1018 a MAC move because the same MAC address has been learned remotely 1019 from the source NVE. 
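The move detection at the target NVE can be sketched as a check of the existing MAC-VRF entry at the time the MAC is learned locally. The table layout (a dict with "adjacency", "remote", and "seq" keys) and the function name are hypothetical. Using sequence number 0 for a MAC that was never seen before reflects the [RFC7432] rule that a route without the MAC Mobility Extended Community is treated as having sequence number zero.

```python
def learn_local_mac(mac_vrf, mac, local_port):
    """Record a locally learned MAC; return (moved, sequence_number).

    mac_vrf is a hypothetical dict: MAC -> {"adjacency", "remote", "seq"}.
    """
    entry = mac_vrf.get(mac)
    # A MAC move is detected when the same MAC was learned remotely before.
    moved = entry is not None and entry["remote"]
    if moved:
        seq = entry["seq"] + 1  # MAC Mobility sequence number incremented by one
    elif entry is not None:
        seq = entry["seq"]      # already local: nothing changes
    else:
        seq = 0                 # first time this MAC is seen
    mac_vrf[mac] = {"adjacency": local_port, "remote": False, "seq": seq}
    return moved, seq
```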
1021 If EVPN-IRB NVEs are configured to advertise MAC-only routes in 1022 addition to MAC-and-IP EVPN routes, then the following steps are 1023 taken: 1025 o The target NVE upon learning this MAC address in the data plane, 1026 updates this MAC address entry in the corresponding MAC-VRF with 1027 the local adjacency information (e.g., local interface). It also 1028 recognizes that this MAC has moved and initiates MAC mobility 1029 procedures per [RFC7432] by advertising an EVPN MAC/IP 1030 Advertisement route with only the MAC address filled in along with 1031 MAC Mobility Extended Community with the sequence number 1032 incremented by one. 1034 o The source NVE upon receiving this MAC/IP Advertisement route, 1035 realizes that the MAC has moved to the new NVE. It updates its 1036 MAC-VRF table with the adjacency information for that MAC address 1037 to point to the target NVE and withdraws its EVPN MAC/IP 1038 Advertisement route that has only the MAC address (if it has 1039 advertised such route previously). Furthermore, it searches for 1040 the corresponding MAC-IP entry and sends an ARP probe for this 1041 (MAC,IP) pair. The ARP request message is sent both locally to 1042 all attached TSes in that subnet as well as it is sent to other 1043 NVEs participating in that subnet including the target NVE. Note 1044 that the PE needs to maintain a correlation between MAC and MAC-IP 1045 route entries in the MAC-VRF to accomplish this. 1047 o The target NVE passes the ARP request to its locally attached TSes 1048 and when it receives the ARP response, it updates its IP-VRF and 1049 ARP table with the host (MAC, IP) information. It also sends an 1050 EVPN MAC/IP Advertisement route with both the MAC and IP addresses 1051 filled in along with MAC Mobility Extended Community with the 1052 sequence number set to the same value as the one for MAC-only 1053 advertisement route it sent previously. 
1055 o When the source NVE receives the EVPN MAC/IP Advertisement route, 1056 it updates its IP-VRF table with the new adjacency information 1057 (pointing to the target NVE). In the case of the asymmetric IRB, 1058 the source NVE also updates its ARP table with the received 1059 adjacency information and in the case of the symmetric IRB, the 1060 source NVE removes the entry associated with the received (MAC, 1061 IP) from its local ARP table. Furthermore, it withdraws its 1062 previously advertised EVPN MAC/IP route with both the MAC and IP 1063 address fields filled in. 1065 o All other remote NVE devices upon receiving the MAC/IP 1066 advertisement route with MAC Mobility extended community compare 1067 the sequence number in this advertisement with the one previously 1068 received. If the new sequence number is greater than the old one, 1069 then they update the MAC/IP addresses of the TS in their 1070 corresponding MAC-VRF, IP-VRF, and ARP tables (in the case of 1071 asymmetric IRB) to point to the new NVE. Furthermore, upon 1072 receiving the MAC/IP withdraw for the TS from the old NVE, these 1073 remote PEs perform the cleanups for their BGP tables. 1075 If EVPN-IRB NVEs are configured not to advertise MAC-only routes, 1076 then upon receiving the first data packet, it learns the MAC address 1077 of the TS and updates the MAC entry in the corresponding MAC-VRF 1078 table with the local adjacency information (e.g., local interface). 1079 It also realizes that there has been a MAC move because the same MAC 1080 address has been learned remotely from the source NVE. It uses the 1081 local MAC route to find the corresponding local MAC-IP route, and 1082 sends a unicast ARP request to the host and when receiving an ARP 1083 response, it follows the procedure outlined in section 7.1. 
In the 1084 prior case, where MAC-only routes are also advertised, this procedure 1085 of triggering a unicast ARP probe at the target PE MAY also be used 1086 in addition to the source PE broadcast ARP probing procedure 1087 described earlier for better convergence. 1089 7.3. Silent Host 1091 In this scenario when a TS moves from a source NVE to a target NVE, 1092 the TS is silent and it neither initiates an ARP request nor sends 1093 any data traffic. Therefore, neither the target nor the source NVEs 1094 are aware of the MAC move. 1096 On the source NVE, an age-out timer (for the silent host that has 1097 moved) is used to trigger an ARP probe. This age-out timer can be 1098 either an ARP timer or a MAC age-out timer and this is an implementation 1099 choice. The ARP request is sent both locally to all the attached 1100 TSes on that subnet and to all the remote NVEs 1101 (including the target NVE) participating in that subnet. The source 1102 NVE also withdraws the EVPN MAC/IP Advertisement route with only the 1103 MAC address (if it has previously advertised such a route). 1105 The target NVE passes the ARP request to its locally attached TSes 1106 and when it receives the ARP response, it updates its MAC-VRF, IP- 1107 VRF, and ARP table with the host (MAC, IP) and local adjacency 1108 information (e.g., local interface). It also sends an EVPN MAC/IP 1109 advertisement route with both the MAC and IP address fields filled in 1110 along with MAC Mobility Extended Community with the sequence number 1111 incremented by one. 1113 When the source NVE receives the EVPN MAC/IP Advertisement route, it 1114 updates its IP-VRF table with the new adjacency information (pointing 1115 to the target NVE).
In the case of the asymmetric IRB, the source 1116 NVE also updates its ARP table with the received adjacency 1117 information and in the case of the symmetric IRB, the source NVE 1118 removes the entry associated with the received (MAC, IP) from its 1119 local ARP table. Furthermore, it withdraws its previously advertised 1120 EVPN MAC/IP route with both the MAC and IP address fields filled in. 1122 All other remote NVE devices upon receiving the MAC/IP Advertisement 1123 route with MAC Mobility extended community compare the sequence 1124 number in this advertisement with the one previously received. If 1125 the new sequence number is greater than the old one, then they update 1126 the MAC/IP addresses of the TS in their corresponding MAC-VRF, IP- 1127 VRF, and ARP (in the case of asymmetric IRB) tables to point to the 1128 new NVE. Furthermore, upon receiving the MAC/IP withdraw for the TS 1129 from the old NVE, these remote PEs perform the cleanups for their BGP 1130 tables. 1132 8. BGP Encoding 1134 This document defines one new BGP Extended Community for EVPN. 1136 8.1. Router's MAC Extended Community 1138 A new EVPN BGP Extended Community called Router's MAC is introduced 1139 here. This new extended community is a transitive extended community 1140 with the Type field of 0x06 (EVPN) and the Sub-Type of 0x03. It may 1141 be advertised along with Encapsulation Extended Community defined in 1142 section 4.1 of [I-D.ietf-idr-tunnel-encaps]. 
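A minimal sketch of packing this extended community into its 8 octets (one Type octet, one Sub-Type octet, and the 6-octet MAC address) is shown below. The function name and the colon-separated input format are assumptions of the sketch.

```python
import struct

EVPN_TYPE = 0x06            # Type field (EVPN)
ROUTERS_MAC_SUBTYPE = 0x03  # Sub-Type field (Router's MAC)


def encode_routers_mac(mac):
    """Pack a MAC given as 'aa:bb:cc:dd:ee:ff' into the 8-octet community."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("a MAC address must be 6 octets")
    # Network byte order: Type, Sub-Type, then the 6 octets of the MAC.
    return struct.pack("!BB6s", EVPN_TYPE, ROUTERS_MAC_SUBTYPE, octets)
```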
1144 The Router's MAC Extended Community is encoded as an 8-octet value as 1145 follows:
1147    0                   1                   2                   3
1148    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1149   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1150   | Type=0x06     | Sub-Type=0x03 |        Router's MAC           |
1151   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1152   |                      Router's MAC Cont'd                      |
1153   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1155 Figure 5: Router's MAC Extended Community 1157 This extended community is used to carry the PE's MAC address for 1158 symmetric IRB scenarios and it is sent with EVPN RT-2. The 1159 advertising PE SHALL only attach a single Router's MAC Extended 1160 Community to a route. In case the receiving PE receives more than 1161 one Router's MAC Extended Community with a route, it SHALL process 1162 the first one in the list and not store or propagate the others. 1164 9. Operational Models for Symmetric Inter-Subnet Forwarding 1166 The following sections describe two main symmetric IRB forwarding 1167 scenarios (within a DC -- i.e., intra-DC) along with the 1168 corresponding procedures. In the following scenarios, without loss 1169 of generality, it is assumed that a given tenant is represented by a 1170 single IP-VPN instance. Therefore, on a given PE, a tenant is 1171 represented by a single IP-VRF table and one or more MAC-VRF tables. 1173 9.1. IRB forwarding on NVEs for Tenant Systems 1175 This section covers the symmetric IRB procedures for the scenario 1176 where each Tenant System (TS) is attached to one or more NVEs and its 1177 host IP and MAC addresses are learned by the attached NVEs and are 1178 distributed to all other NVEs that are interested in participating in 1179 both intra-subnet and inter-subnet communications with that TS. 1181 In this scenario, without loss of generality, it is assumed that NVEs 1182 operate in VLAN-based service interface mode with one bridge table 1183 per MAC-VRF.
Thus, for a given tenant, an NVE has one MAC-VRF for 1184 each tenant subnet (e.g., each VLAN) that is configured for extension 1185 via VxLAN or NVGRE encapsulation. In the case of VLAN-aware 1186 bundling, then each MAC-VRF consists of multiple Bridge Tables (e.g., 1187 one bridge table per VLAN). The MAC-VRFs on an NVE for a given 1188 tenant are associated with an IP-VRF corresponding to that tenant (or 1189 IP-VPN instance) via their IRB interfaces. 1191 Since VxLAN and NVGRE encapsulations require inner Ethernet header 1192 (inner MAC SA/DA), and since for inter-subnet traffic, TS MAC address 1193 cannot be used, the ingress NVE's MAC address is used as inner MAC 1194 SA. The NVE's MAC address is the device MAC address and it is common 1195 across all MAC-VRFs and IP-VRFs. This MAC address is advertised 1196 using the new EVPN Router's MAC Extended Community (section 8.1). 1198 Figure 6 below illustrates this scenario where a given tenant (e.g., 1199 an IP-VPN instance) has three subnets represented by MAC-VRF1, MAC- 1200 VRF2, and MAC-VRF3 across two NVEs. There are five TSes that are 1201 associated with these three MAC-VRFs -- i.e., TS1, TS4, and TS5 are 1202 on the same subnet (e.g., same MAC-VRF/VLAN). TS1 and TS5 are 1203 associated with MAC-VRF1 on NVE1, while TS4 is associated with MAC- 1204 VRF1 on NVE2. TS2 is associated with MAC-VRF2 on NVE1, and TS3 is 1205 associated with MAC-VRF3 on NVE2. MAC-VRF1 and MAC-VRF2 on NVE1 are 1206 in turn associated with IP-VRF1 on NVE1 and MAC-VRF1 and MAC-VRF3 on 1207 NVE2 are associated with IP-VRF1 on NVE2. When TS1, TS5, and TS4 1208 exchange traffic with each other, only the L2 forwarding (bridging) 1209 part of the IRB solution is exercised because all these TSes belong 1210 to the same subnet. However, when TS1 wants to exchange traffic with 1211 TS2 or TS3 which belong to different subnets, both bridging and 1212 routing parts of the IRB solution are exercised. 
The following 1213 subsections describe the control and data plane operations for this 1214 IRB scenario in detail.
1216                 NVE1          +---------+
1217            +-------------+    |         |
1218    TS1-----|         MACx|    |         |        NVE2
1219  (IP1/M1)  |(MAC-        |    |         |   +-------------+
1220    TS5-----| VRF1)\      |    |  MPLS/  |   |MACy  (MAC-  |-----TS3
1221  (IP5/M5)  |       \     |    |  VxLAN/ |   |     / VRF3) | (IP3/M3)
1222            |    (IP-VRF1)|----|  NVGRE  |---|(IP-VRF1)    |
1223            |       /     |    |         |   |     \       |
1224    TS2-----|(MAC- /      |    |         |   |      (MAC-  |-----TS4
1225  (IP2/M2)  | VRF2)       |    |         |   |       VRF1) | (IP4/M4)
1226            +-------------+    |         |   +-------------+
1227                               |         |
1228                               +---------+
1230 Figure 6: IRB forwarding on NVEs for Tenant Systems 1232 9.1.1. Control Plane Operation 1234 Each NVE advertises a MAC/IP Advertisement route (i.e., Route Type 2) 1235 for each of its TSes with the following fields set: 1237 o RD and ESI per [RFC7432] 1239 o Ethernet Tag = 0; assuming VLAN-based service 1241 o MAC Address Length = 48 1243 o MAC Address = Mi ; where i = 1,2,3,4, or 5 in the above example 1245 o IP Address Length = 32 or 128 1247 o IP Address = IPi ; where i = 1,2,3,4, or 5 in the above example 1249 o Label1 = MPLS Label or VNI corresponding to MAC-VRF 1251 o Label2 = MPLS Label or VNI corresponding to IP-VRF 1253 Each NVE advertises an EVPN RT-2 route with two Route Targets (one 1254 corresponding to its MAC-VRF and the other corresponding to its IP- 1255 VRF). Furthermore, the EVPN RT-2 is advertised with two BGP Extended 1256 Communities. The first BGP Extended Community identifies the tunnel 1257 type and it is called Encapsulation Extended Community as defined in 1258 [I-D.ietf-idr-tunnel-encaps] and the second BGP Extended Community 1259 includes the MAC address of the NVE (e.g., MACx for NVE1 or MACy for 1260 NVE2) as defined in section 8.1. The Router's MAC Extended community 1261 MUST be added when Ethernet NVO tunnel is used. If IP NVO tunnel 1262 type is used, then there is no need to send this second Extended 1263 Community.
Note that the IP NVO tunnel type is applicable only to symmetric IRB
procedures.

Upon receiving this advertisement, the receiving NVE performs the
following:

o  It uses the Route Targets corresponding to its MAC-VRF and IP-VRF
   to identify these tables and subsequently imports the MAC and IP
   addresses into them, respectively.

o  It imports the MAC address from the MAC/IP Advertisement route into
   the MAC-VRF with the BGP Next Hop address as the underlay tunnel
   destination address (e.g., VTEP DA for VxLAN encapsulation) and
   Label1 as the VNI for VxLAN encapsulation or the EVPN label for
   MPLS encapsulation.

o  If the route carries the new Router's MAC Extended Community, and
   if the receiving NVE uses an Ethernet NVO tunnel, then the
   receiving NVE imports the IP address into the IP-VRF with the
   NVE's MAC address (from the new Router's MAC Extended Community)
   as the inner MAC DA, the BGP Next Hop address as the underlay
   tunnel destination address (the VTEP DA for VxLAN encapsulation),
   and Label2 as the IP-VPN VNI for VxLAN encapsulation.

o  If the receiving NVE uses MPLS encapsulation, then the receiving
   NVE imports the IP address into the IP-VRF with the BGP Next Hop
   address as the underlay tunnel destination address and Label2 as
   the IP-VPN label for MPLS encapsulation.

If the receiving NVE receives an EVPN RT-2 with only Label1 and only a
single Route Target corresponding to the IP-VRF, or if it receives an
EVPN RT-2 with only a single Route Target corresponding to the MAC-VRF
but with both Label1 and Label2, or if it receives an EVPN RT-2 with a
MAC Address Length of zero, then it MUST use the treat-as-withdraw
approach [RFC7606] and SHOULD log an error message.

9.1.2. Data Plane Operation

The following description of the data-plane operation covers only the
logical functions; the actual implementation may differ.
Consider the data-plane operation when TS1 in subnet-1 (MAC-VRF1) on
NVE1 wants to send traffic to TS3 in subnet-3 (MAC-VRF3) on NVE2.

o  NVE1 receives a packet with the MAC DA corresponding to the
   MAC-VRF1 IRB interface on NVE1 (the interface between MAC-VRF1 and
   IP-VRF1) and the VLAN tag corresponding to MAC-VRF1.

o  Upon receiving the packet, NVE1 uses the VLAN tag to identify
   MAC-VRF1. It then looks up the MAC DA and forwards the frame to
   its IRB interface.

o  The Ethernet header of the packet is stripped and the packet is
   fed to the IP-VRF, where an IP lookup is performed on the
   destination IP address. NVE1 also decrements the TTL/hop limit for
   that packet by one and, if it reaches zero, discards the packet.
   This lookup yields the outgoing NVO tunnel and the required
   encapsulation. If the encapsulation is for an Ethernet NVO tunnel,
   then it includes the egress NVE's MAC address as the inner MAC DA,
   the egress NVE's IP address (e.g., BGP Next Hop address) as the
   VTEP DA, and the VPN-ID as the VNI. The inner MAC SA and VTEP SA
   are set to the ingress NVE's MAC and IP addresses, respectively.
   If it is an MPLS encapsulation, then the corresponding EVPN and
   LSP labels are added to the packet. The packet is then forwarded
   to the egress NVE.

o  On the egress NVE, if the packet arrives on an Ethernet NVO tunnel
   (e.g., it is VxLAN encapsulated), then the NVO tunnel header is
   removed. Since the inner MAC DA is the egress NVE's MAC address,
   the egress NVE knows that it needs to perform an IP lookup. It
   uses the VNI to identify the IP-VRF table. If the packet is MPLS
   encapsulated, then the EVPN label lookup identifies the IP-VRF
   table. Next, an IP lookup is performed for the destination TS
   (TS3), which results in an access-facing IRB interface over which
   the packet is sent.
   Before sending the packet over this interface, the ARP table is
   consulted to get the destination TS's MAC address. NVE2 also
   decrements the TTL/hop limit for that packet by one and, if it
   reaches zero, discards the packet.

o  The IP packet is encapsulated with an Ethernet header with the MAC
   SA set to the IRB interface MAC address (i.e., the IRB interface
   between MAC-VRF3 and IP-VRF1 on NVE2) and the MAC DA set to the
   MAC address of the destination TS (TS3). The packet is sent to the
   corresponding MAC-VRF (i.e., MAC-VRF3) and, after a lookup of the
   MAC DA, is forwarded to the destination TS (TS3) over the
   corresponding interface.

In this symmetric IRB scenario, inter-subnet traffic between NVEs will
always use the IP-VRF VNI/MPLS label. For instance, traffic from TS2 to
TS4 will be encapsulated by NVE1 using NVE2's IP-VRF VNI/MPLS label, as
long as TS4's host IP is present in NVE1's IP-VRF.

9.2. IRB forwarding on NVEs for Subnets behind Tenant Systems

This section covers the symmetric IRB procedures for the scenario where
some Tenant Systems (TSes) support one or more subnets and these TSes
are associated with one or more NVEs. Therefore, besides advertising
the MAC/IP addresses of each TS (which can be multi-homed in All-Active
redundancy mode), the associated NVE also needs to advertise the
subnets statically configured on each TS.

The main difference between this solution and the previous one is the
additional advertisement corresponding to each subnet. These subnet
advertisements are accomplished using the EVPN IP Prefix route defined
in [I-D.ietf-bess-evpn-prefix-advertisement]. These subnet prefixes are
advertised with the IP address of their associated TS (which is in the
overlay address space) as their next hop.
The receiving NVEs perform recursive route resolution to resolve the
subnet prefix to its advertising NVE, so that they know which NVE to
forward packets to when those packets are destined for that subnet
prefix.

The advantage of this recursive route resolution is that when a TS
moves from one NVE to another, there is no need to re-advertise any of
the subnet prefixes for that TS. All that is needed is to advertise the
IP/MAC addresses associated with the TS itself and exercise the MAC
mobility procedures for that TS. The recursive route resolution
automatically takes care of the updates for the subnet prefixes of that
TS.

Figure 7 illustrates this scenario, where a given tenant (e.g., an
IP-VPN service) has three subnets represented by MAC-VRF1, MAC-VRF2,
and MAC-VRF3 across two NVEs. There are four TSes associated with these
three MAC-VRFs: TS1 is connected to MAC-VRF1 on NVE1, TS2 is connected
to MAC-VRF2 on NVE1, TS3 is connected to MAC-VRF3 on NVE2, and TS4 is
connected to MAC-VRF1 on NVE2. TS1 has two subnet prefixes (SN1 and
SN2) and TS3 has a single subnet prefix, SN3. The MAC-VRFs on each NVE
are associated with their corresponding IP-VRF using their IRB
interfaces. When TS4 and TS1 exchange intra-subnet traffic, only the L2
forwarding (bridging) part of the IRB solution is used (i.e., the
traffic only goes through their MAC-VRFs); however, when TS3 wants to
forward traffic to SN1 or SN2 sitting behind TS1 (inter-subnet
traffic), both the bridging and routing parts of the IRB solution are
exercised (i.e., the traffic goes through the corresponding MAC-VRFs
and IP-VRFs). If TS4, for example, wants to reach SN1, it uses its
default route and sends the packet to the MAC address associated with
the IRB interface on NVE2; NVE2 then performs an IP lookup in its
IP-VRF and finds an entry for SN1.
The following subsections describe the control-plane and data-plane
operations for this IRB scenario in detail.

                     NVE1         +----------+
SN1--+          +-------------+   |          |
     |--TS1-----|(MAC- \      |   |          |
SN2--+  IP1/M1  | VRF1) \     |   |          |
                |     (IP-VRF)|---|          |
                |       /     |   |          |
        TS2-----|(MAC- /      |   |  MPLS/   |
        IP2/M2  | VRF2)       |   |  VxLAN/  |
                +-------------+   |  NVGRE   |
                +-------------+   |          |
SN3--+--TS3-----|(MAC-\       |   |          |
        IP3/M3  | VRF3)\      |   |          |
                |     (IP-VRF)|---|          |
                |       /     |   |          |
        TS4-----|(MAC- /      |   |          |
        IP4/M4  | VRF1)       |   |          |
                +-------------+   +----------+
                     NVE2

      Figure 7: IRB forwarding on NVEs for subnets behind TSes

Note that in Figure 7 above, SN1 and SN2 are configured on NVE1, which
then advertises each in an IP Prefix route. Similarly, SN3 is
configured on NVE2, which then advertises it in an IP Prefix route.

9.2.1. Control Plane Operation

Each NVE advertises a Route Type 5 (EVPN RT-5, the IP Prefix route
defined in [I-D.ietf-bess-evpn-prefix-advertisement]) for each of its
subnet prefixes, with the IP address of its TS as the next hop (Gateway
Address field), as follows:

o  RD associated with the IP-VRF

o  ESI = 0

o  Ethernet Tag = 0

o  IP Prefix Length = 0 to 32 or 0 to 128

o  IP Prefix = SNi

o  Gateway Address = IPi; IP address of TS

o  MPLS Label = 0

This EVPN RT-5 is advertised with one or more Route Targets associated
with the IP-VRF from which the route is originated.

Each NVE also advertises an EVPN RT-2 (MAC/IP Advertisement route),
along with its associated Route Targets and Extended Communities, for
each of its TSes exactly as described in section 9.1.1.
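
The RT-5 fields listed above, together with the recursive route
resolution described at the start of section 9.2, can be illustrated
with a small sketch. This is a simplified, non-normative model and not
part of the specification: the class names, the exact-match prefix
table (a real IP-VRF performs longest-prefix matching), the addresses
(taken from documentation ranges), the label values, and the third NVE
used for the mobility case are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MacIpRoute:
    """Illustrative subset of an EVPN RT-2 (section 9.1.1)."""
    mac: str
    ip: str
    label1: int        # MPLS label or VNI corresponding to the MAC-VRF
    label2: int        # MPLS label or VNI corresponding to the IP-VRF
    bgp_next_hop: str  # underlay address of the advertising NVE
    router_mac: str    # from the Router's MAC Extended Community


@dataclass(frozen=True)
class IpPrefixRoute:
    """Illustrative subset of an EVPN RT-5 (section 9.2.1)."""
    prefix: str        # SNi
    gateway_ip: str    # IPi, the overlay IP address of the TS
    esi: int = 0
    ethernet_tag: int = 0
    mpls_label: int = 0


@dataclass(frozen=True)
class Resolved:
    vtep_da: str       # underlay tunnel destination address
    inner_mac_da: str  # Router's MAC of the advertising NVE
    vni: int           # Label2 of the resolving RT-2


class IpVrf:
    """Hypothetical receiver-side IP-VRF holding imported routes."""

    def __init__(self) -> None:
        self.hosts: Dict[str, MacIpRoute] = {}
        self.prefixes: Dict[str, IpPrefixRoute] = {}

    def import_rt2(self, route: MacIpRoute) -> None:
        # A newer RT-2 for the same host IP replaces the older one,
        # standing in for the MAC mobility procedures.
        self.hosts[route.ip] = route

    def import_rt5(self, route: IpPrefixRoute) -> None:
        self.prefixes[route.prefix] = route

    def resolve(self, prefix: str) -> Optional[Resolved]:
        # Recursive resolution: prefix -> gateway (TS) IP -> RT-2 -> NVE.
        rt5 = self.prefixes.get(prefix)
        if rt5 is None:
            return None
        rt2 = self.hosts.get(rt5.gateway_ip)
        if rt2 is None:
            return None
        return Resolved(rt2.bgp_next_hop, rt2.router_mac, rt2.label2)


vrf = IpVrf()
# SN1 behind TS1 (IP1); the RT-5 carries IP1 as the Gateway Address.
vrf.import_rt5(IpPrefixRoute(prefix="198.51.100.0/24",
                             gateway_ip="192.0.2.1"))
# RT-2 for TS1, advertised by NVE1.
vrf.import_rt2(MacIpRoute("00:00:5e:00:53:01", "192.0.2.1", 1001, 5000,
                          "203.0.113.1", "00:00:5e:00:53:aa"))
before_move = vrf.resolve("198.51.100.0/24")

# TS1 moves behind another (hypothetical) NVE: only the RT-2 changes.
vrf.import_rt2(MacIpRoute("00:00:5e:00:53:01", "192.0.2.1", 1001, 5000,
                          "203.0.113.3", "00:00:5e:00:53:cc"))
after_move = vrf.resolve("198.51.100.0/24")
```

In this sketch, the RT-5 entry is never touched when the TS moves; only
the RT-2 is replaced, and the prefix then resolves to the new NVE. This
mirrors the stated advantage of recursive resolution.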

Upon receiving the EVPN RT-5 advertisement, the receiving NVE performs
the following:

o  It uses the Route Target to identify the corresponding IP-VRF.

o  It imports the IP prefix, along with the IP address of the
   associated TS as its next hop, into the corresponding IP-VRF, i.e.,
   the IP-VRF configured with an import RT that matches one of the RTs
   carried by the EVPN RT-5 route.

When receiving the EVPN RT-2 advertisement, the receiving NVE imports
the MAC/IP addresses of the TS into the corresponding MAC-VRF and
IP-VRF per section 9.1.1. When both routes exist, recursive route
resolution is performed to resolve the IP prefix (received in the EVPN
RT-5) to its corresponding NVE's IP address (e.g., its BGP next hop).
The BGP next hop will be used as the underlay tunnel destination
address (e.g., VTEP DA for VxLAN encapsulation) and the Router's MAC
will be used as the inner MAC DA for VxLAN encapsulation.

9.2.2. Data Plane Operation

The following description of the data-plane operation covers only the
logical functions; the actual implementation may differ. Consider the
data-plane operation when a host on SN1 sitting behind TS1 wants to
send traffic to a host on SN3 sitting behind TS3.

o  TS1 sends a packet with the MAC DA corresponding to the MAC-VRF1
   IRB interface of NVE1 and the VLAN tag corresponding to MAC-VRF1.

o  Upon receiving the packet, the ingress NVE1 uses the VLAN tag to
   identify MAC-VRF1. It then looks up the MAC DA and forwards the
   frame to its IRB interface, just as in section 9.1.2.

o  The Ethernet header of the packet is stripped and the packet is
   fed to the IP-VRF, where an IP lookup is performed on the
   destination address. This lookup yields the fields needed for
   VxLAN encapsulation, with NVE2's MAC address as the inner MAC DA,
   NVE2's IP address as the VTEP DA, and the VNI.
   The MAC SA is set to NVE1's MAC address and the VTEP SA is set to
   NVE1's IP address. NVE1 also decrements the TTL/hop limit for that
   packet by one and, if it reaches zero, discards the packet.

o  The packet is then encapsulated with the proper header based on
   the above information and is forwarded to the egress NVE (NVE2).

o  On the egress NVE (NVE2), assuming the packet is VxLAN
   encapsulated, the VxLAN and the inner Ethernet headers are removed
   and the resultant IP packet is fed to the IP-VRF associated with
   that VNI.

o  Next, a lookup is performed based on the IP DA (which is in SN3)
   in the associated IP-VRF of NVE2. The IP lookup yields the
   access-facing IRB interface over which the packet needs to be
   sent. Before sending the packet over this interface, the ARP table
   is consulted to get the MAC address of the destination TS (TS3).
   NVE2 also decrements the TTL/hop limit for that packet by one and,
   if it reaches zero, discards the packet.

o  The IP packet is encapsulated with an Ethernet header with the MAC
   SA set to that of the access-facing IRB interface of the egress
   NVE (NVE2) and the MAC DA set to the MAC address of the
   destination TS (TS3). The packet is sent to the corresponding
   MAC-VRF3 and, after a lookup of the MAC DA, is forwarded to the
   destination TS (TS3) over the corresponding interface.

10. Acknowledgements

The authors would like to thank Sami Boutros, Jeffrey Zhang, Krzysztof
Szarkowicz, Lukas Krattiger and Neeraj Malhotra for their valuable
comments. The authors would also like to thank Linda Dunbar, Florin
Balus, Yakov Rekhter, Wim Henderickx, Lucy Yong, and Dennis Cai for
their feedback and contributions.

11. Security Considerations

The security considerations for layer-2 forwarding in this document
follow those of [RFC7432] for MPLS encapsulation and those of [RFC8365]
for VxLAN or NVGRE encapsulations. This section describes additional
considerations.

This document describes a set of procedures for Inter-Subnet Forwarding
of tenant traffic across PEs (or NVEs). These procedures include both
layer-2 forwarding and layer-3 routing on a packet-by-packet basis. The
security considerations for layer-3 routing in this document follow
those of [RFC4365], with the exception of the application of routing
protocols between CEs and PEs. Contrary to [RFC4364], this document
does not describe route distribution techniques between CEs and PEs,
but rather considers the CEs as TSes or VAs that do not run dynamic
routing protocols. This can be considered a security advantage, since
dynamic routing protocols can be blocked on the NVE/PE ACs, not
allowing the tenant to interact with the infrastructure's dynamic
routing protocols.

The VPN scheme described in this document does not provide the quartet
of security properties mentioned in [RFC4365] (confidentiality
protection, source authentication, integrity protection, replay
protection). If these are desired, they must be provided by mechanisms
that are outside the scope of the VPN mechanisms.

In this document, the EVPN RT-5 is used for certain scenarios. This
route uses an Overlay Index that requires a recursive resolution to a
different EVPN route (an EVPN RT-2). Because of this, it is worth
noting that any action that ends up filtering or modifying the EVPN
RT-2 route used to convey the Overlay Indexes will modify the
resolution of the EVPN RT-5 and therefore the forwarding of packets to
the remote subnet.

12. IANA Considerations

IANA has allocated a new transitive extended community Type of 0x06 and
Sub-Type of 0x03 for EVPN Router's MAC Extended Community.

This document has been listed as an additional reference for the MAC/IP
Advertisement route in the EVPN Route Type registry.

13. References

13.1. Normative References

[I-D.ietf-bess-evpn-prefix-advertisement]
           Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A.
           Sajassi, "IP Prefix Advertisement in EVPN",
           draft-ietf-bess-evpn-prefix-advertisement-11 (work in
           progress), May 2018.

[I-D.ietf-idr-tunnel-encaps]
           Patel, K., Velde, G., Sangli, S., and J. Scudder, "The BGP
           Tunnel Encapsulation Attribute",
           draft-ietf-idr-tunnel-encaps-22 (work in progress), January
           2021.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
           Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
           2006.

[RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
           L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
           eXtensible Local Area Network (VXLAN): A Framework for
           Overlaying Virtualized Layer 2 Networks over Layer 3
           Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

[RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
           Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
           Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
           2015.

[RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
           Patel, "Revised Error Handling for BGP UPDATE Messages",
           RFC 7606, DOI 10.17487/RFC7606, August 2015.

[RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
           Virtualization Using Generic Routing Encapsulation",
           RFC 7637, DOI 10.17487/RFC7637, September 2015.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
           2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
           May 2017.

[RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
           Uttaro, J., and W. Henderickx, "A Network Virtualization
           Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
           DOI 10.17487/RFC8365, March 2018.

13.2. Informative References

[I-D.ietf-bess-evpn-irb-extended-mobility]
           Malhotra, N., Sajassi, A., Pattekar, A., Lingala, A.,
           Rabadan, J., and J. Drake, "Extended Mobility Procedures
           for EVPN-IRB",
           draft-ietf-bess-evpn-irb-extended-mobility-03 (work in
           progress), May 2020.

[I-D.ietf-nvo3-vxlan-gpe]
           Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol
           Extension for VXLAN (VXLAN-GPE)",
           draft-ietf-nvo3-vxlan-gpe-10 (work in progress), July 2020.

[RFC4365]  Rosen, E., "Applicability Statement for BGP/MPLS IP
           Virtual Private Networks (VPNs)", RFC 4365,
           DOI 10.17487/RFC4365, February 2006.

[RFC5798]  Nadas, S., Ed., "Virtual Router Redundancy Protocol (VRRP)
           Version 3 for IPv4 and IPv6", RFC 5798,
           DOI 10.17487/RFC5798, March 2010.

[RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
           Rekhter, "Framework for Data Center (DC) Network
           Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
           2014.

Authors' Addresses

Ali Sajassi
Cisco Systems

Email: sajassi@cisco.com


Samer Salam
Cisco Systems

Email: ssalam@cisco.com


Samir Thoria
Cisco Systems

Email: sthoria@cisco.com


John E Drake
Juniper

Email: jdrake@juniper.net


Jorge Rabadan
Nokia

Email: jorge.rabadan@nokia.com