idnits 2.17.1 draft-ietf-bess-evpn-inter-subnet-forwarding-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 3 instances of too long lines in the document, the longest one being 2 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: If the receiving PE receives the MAC/IP Advertisement route with MPLS label2 field and it can support symmetric IRB mode, then it should use the MAC-VRF route target to identify its corresponding MAC-VRF table and import the MAC address. It should use the IP-VRF route target to identify the corresponding IP-VRF table and import the IP address. It MUST not import association into its ARP table. -- The document date (February 3, 2019) is 1902 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC4364' is mentioned on line 1339, but not defined == Missing Reference: 'RFC7365' is mentioned on line 200, but not defined == Missing Reference: 'RFC5798' is mentioned on line 473, but not defined == Missing Reference: 'RFC 7432' is mentioned on line 837, but not defined == Missing Reference: 'RFC4365' is mentioned on line 1338, but not defined == Outdated reference: A later version (-22) exists of draft-ietf-idr-tunnel-encaps-03 == Outdated reference: A later version (-11) exists of draft-ietf-bess-evpn-prefix-advertisement-03 == Outdated reference: A later version (-16) exists of draft-ietf-nvo3-geneve-06 == Outdated reference: A later version (-04) exists of draft-malhotra-bess-evpn-irb-extended-mobility-02 Summary: 1 error (**), 0 flaws (~~), 11 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 L2VPN Workgroup A. Sajassi, Ed. 3 INTERNET-DRAFT S. Salam 4 Intended Status: Standards Track S. Thoria 5 Cisco 6 J. Drake 7 Juniper 8 J. Rabadan 9 Nokia 11 Expires: August 3, 2019 February 3, 2019 13 Integrated Routing and Bridging in EVPN 14 draft-ietf-bess-evpn-inter-subnet-forwarding-06 16 Abstract 18 EVPN provides an extensible and flexible multi-homing VPN solution 19 over an MPLS/IP network for intra-subnet connectivity among Tenant 20 Systems and End Devices that can be physical or virtual. However, 21 there are scenarios for which there is a need for a dynamic and 22 efficient inter-subnet connectivity among these Tenant Systems and 23 End Devices while maintaining the multi-homing capabilities of EVPN. 
24 This document describes an Integrated Routing and Bridging (IRB) 25 solution based on EVPN to address such requirements. 27 Status of this Memo 29 This Internet-Draft is submitted to IETF in full conformance with the 30 provisions of BCP 78 and BCP 79. 32 Internet-Drafts are working documents of the Internet Engineering 33 Task Force (IETF), its areas, and its working groups. Note that 34 other groups may also distribute working documents as 35 Internet-Drafts. 37 Internet-Drafts are draft documents valid for a maximum of six months 38 and may be updated, replaced, or obsoleted by other documents at any 39 time. It is inappropriate to use Internet-Drafts as reference 40 material or to cite them other than as "work in progress." 42 The list of current Internet-Drafts can be accessed at 43 http://www.ietf.org/1id-abstracts.html 45 The list of Internet-Draft Shadow Directories can be accessed at 46 http://www.ietf.org/shadow.html 48 Copyright and License Notice 50 Copyright (c) 2014 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Table of Contents 65 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 66 2 EVPN PE Model for IRB Operation . . . . . . . . . . . . . . . . 7 67 3 Symmetric and Asymmetric IRB . . . . . . . . . . . . . . . . . 8 68 3.1 IRB Interface and its MAC & IP addresses . . . . . . . . . . 11 69 3.2 Symmetric IRB Procedures . . . . . . . . . 
. . . . . . . . . 13 70 3.2.1 Control Plane - Ingress PE . . . . . . . . . . . . . . . 13 71 3.2.2 Control Plane - Egress PE . . . . . . . . . . . . . . . 14 72 3.2.3 Data Plane - Ingress PE . . . . . . . . . . . . . . . . 15 73 3.2.4 Data Plane - Egress PE . . . . . . . . . . . . . . . . . 15 74 3.3 Asymmetric IRB Procedures . . . . . . . . . . . . . . . . . 16 75 3.3.1 Control Plane - Ingress PE . . . . . . . . . . . . . . . 16 76 3.3.2 Control Plane - Egress PE . . . . . . . . . . . . . . . 17 77 3.3.3 Data Plane - Ingress PE . . . . . . . . . . . . . . . . 17 78 3.3.4 Data Plane - Egress PE . . . . . . . . . . . . . . . . . 18 79 4 Mobility Procedure . . . . . . . . . . . . . . . . . . . . . . . 18 80 4.1 Mobility Procedure for Symmetric IRB . . . . . . . . . . . . 19 81 4.1.1 Initiating an ARP Request upon a Move . . . . . . . . . 19 82 4.1.2 Sending Data Traffic without an ARP Request . . . . . . 20 83 4.1.3 Silent Host . . . . . . . . . . . . . . . . . . . . . . 22 84 5 BGP Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . 22 85 5.1 Router's MAC Extended Community . . . . . . . . . . . . . . 22 86 6 Operational Models for Symmetric Inter-Subnet Forwarding . . . . 23 87 6.1 IRB forwarding on NVEs for Tenant Systems . . . . . . . . . 23 88 6.1.1 Control Plane Operation . . . . . . . . . . . . . . . . 25 89 6.1.2 Data Plane Operation . . . . . . . . . . . . . . . . . . 26 90 6.2 IRB forwarding on NVEs for Subnets behind Tenant Systems . . 27 91 6.2.1 Control Plane Operation . . . . . . . . . . . . . . . . 28 92 6.2.2 Data Plane Operation . . . . . . . . . . . . . . . . . . 29 93 7 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 30 94 8 Security Considerations . . . . . . . . . . . . . . . . . . . . 30 95 9 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 31 96 10 References . . . . . . . . . . . . . . . . . . . . . . . . . . 31 97 10.1 Normative References . . . . . . . . . . . . . . . . . . . 
31 98 10.2 Informative References . . . . . . . . . . . . . . . . . . 31 99 11 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 32 100 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 32 102 Terminology 104 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 105 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 106 "OPTIONAL" in this document are to be interpreted as described in BCP 107 14 [RFC2119] [RFC8174] when, and only when, they appear in all 108 capitals, as shown here. 110 AC: Attachment Circuit. 112 ARP: Address Resolution Protocol. 114 BD: Broadcast Domain. As per [RFC7432], an EVI consists of a single 115 or multiple BDs. In case of VLAN-bundle and VLAN-based service models 116 (see [RFC7432]), a BD is equivalent to an EVI. In case of VLAN-aware 117 bundle service model, an EVI contains multiple BDs. Also, in this 118 document, BD and subnet are equivalent terms. 120 BD Route Target: refers to the Broadcast Domain assigned Route Target 121 [RFC4364]. In case of VLAN-aware bundle service model, all the BD 122 instances in the MAC-VRF share the same Route Target. 124 BT: Bridge Table. The instantiation of a BD in a MAC-VRF, as per 125 [RFC7432]. 127 DGW: Data Center Gateway. 129 Ethernet A-D route: Ethernet Auto-Discovery (A-D) route, as per 130 [RFC7432]. 132 Ethernet NVO tunnel: refers to Network Virtualization Overlay tunnels 133 with Ethernet payload. Examples of this type of tunnels are VXLAN or 134 GENEVE. 136 EVI: EVPN Instance spanning the NVE/PE devices that are participating 137 on that EVPN, as per [RFC7432]. 139 EVPN: Ethernet Virtual Private Networks, as per [RFC7432]. 141 GRE: Generic Routing Encapsulation. 143 GW IP: Gateway IP Address. 145 IPL: IP Prefix Length. 147 IP NVO tunnel: it refers to Network Virtualization Overlay tunnels 148 with IP payload (no MAC header in the payload). 150 IP-VRF: A VPN Routing and Forwarding table for IP routes on an 151 NVE/PE. 
The IP routes could be populated by EVPN and IP-VPN address 152 families. An IP-VRF is also an instantiation of a layer 3 VPN in an 153 NVE/PE. 155 IRB: Integrated Routing and Bridging interface. It connects an IP-VRF 156 to a BD (or subnet). 158 MAC-VRF: A Virtual Routing and Forwarding table for Media Access 159 Control (MAC) addresses on an NVE/PE, as per [RFC7432]. A MAC-VRF is 160 also an instantiation of an EVI in an NVE/PE. 162 ML: MAC address length. 164 NDP: Neighbor Discovery Protocol. 166 NVE: Network Virtualization Edge. 168 GENEVE: Generic Network Virtualization Encapsulation, [GENEVE]. 170 NVO: Network Virtualization Overlays. 172 RT-2: EVPN route type 2, i.e., MAC/IP advertisement route, as defined 173 in [RFC7432]. 175 RT-5: EVPN route type 5, i.e., IP Prefix route. As defined in Section 176 3 of [EVPN-PREFIX]. 178 SBD: Supplementary Broadcast Domain. A BD that does not have any ACs, 179 only IRB interfaces, and it is used to provide connectivity among all 180 the IP-VRFs of the tenant. The SBD is only required in IP-VRF- to-IP- 181 VRF use-cases (see Section 4.4.). 183 SN: Subnet. 185 TS: Tenant System. 187 VA: Virtual Appliance. 189 VNI: Virtual Network Identifier. As in [RFC8365], the term is used as 190 a representation of a 24-bit NVO instance identifier, with the 191 understanding that VNI will refer to a VXLAN Network Identifier in 192 VXLAN, or Virtual Network Identifier in GENEVE, etc. unless it is 193 stated otherwise. 195 VTEP: VXLAN Termination End Point, as in [RFC7348]. 197 VXLAN: Virtual Extensible LAN, as in [RFC7348]. 199 This document also assumes familiarity with the terminology of 200 [RFC7432], [RFC8365] and [RFC7365]. 
202 1 Introduction 204 EVPN provides an extensible and flexible multi-homing VPN solution 205 over an MPLS/IP network for intra-subnet connectivity among Tenant 206 Systems (TSes) and End Devices that can be physical or virtual; where 207 an IP subnet is represented by an EVI for a VLAN-based service or by 208 an (EVI, VLAN) for a VLAN-aware bundle service. However, there are 209 scenarios for which there is a need for a dynamic and efficient 210 inter-subnet connectivity among these Tenant Systems and End Devices 211 while maintaining the multi-homing capabilities of EVPN. This 212 document describes an Integrated Routing and Bridging (IRB) solution 213 based on EVPN to address such requirements. 215 The inter-subnet communication is traditionally achieved at 216 centralized L3 Gateway (L3GW) devices where all the inter-subnet 217 forwarding is performed and all the inter-subnet communication 218 policies are enforced. When two TSes that belong to two different 219 subnets but are connected to the same PE node want to communicate with 220 each other, their traffic needs to be back hauled from the PE node 221 all the way to the centralized gateway node where inter-subnet 222 switching is performed and then back to the PE node. For today's 223 large multi-tenant data centers, this scheme is very inefficient and 224 sometimes impractical. 226 In order to overcome the drawback of the centralized L3GW approach, IRB 227 functionality is needed on the PE nodes (also referred to as EVPN 228 NVEs) attached to TSes in order to avoid inefficient forwarding of 229 tenant traffic (i.e., avoid back-hauling and hair-pinning). 
When a PE 230 with IRB capability receives tenant traffic over a single Attachment 231 Circuit (AC), it can not only locally bridge the tenant intra-subnet 232 traffic but can also locally route the tenant inter-subnet traffic on 233 a packet-by-packet basis, thus meeting the requirements for both intra- 234 and inter-subnet forwarding and avoiding the non-optimal traffic 235 forwarding associated with the centralized L3GW approach. 237 Some TSes run non-IP protocols in conjunction with their IP traffic. 238 Therefore, it is important to handle both kinds of traffic optimally 239 - e.g., to bridge non-IP and intra-subnet traffic and to route 240 inter-subnet IP traffic. Therefore, the solution needs to meet the 241 following requirements: 243 R1: The solution MUST allow for both inter-subnet and intra-subnet 244 traffic belonging to the same tenant to be locally routed and bridged 245 respectively. The solution MUST provide IP routing for inter-subnet 246 traffic and Ethernet Bridging for intra-subnet traffic. 248 R2: The solution MUST support bridging for non-IP traffic. 250 R3: The solution MUST allow inter-subnet switching to be disabled on 251 a per VLAN basis on PEs where the traffic needs to be back hauled to 252 another node (i.e., for performing FW or DPI functionality). 254 2 EVPN PE Model for IRB Operation 256 Since this document discusses IRB operation in relation to EVPN 257 MAC-VRF, IP-VRF, EVI, Broadcast Domain (BD), Bridge Table (BT), and IRB 258 interfaces, it is important to understand the relationship among 259 these components. Therefore, the PE model shown 260 below is used to a) describe these components and b) illustrate the 261 relationship among them. 
263 +-------------------------------------------------------------+ 264 | | 265 | +------------------+ IRB PE | 266 | Attachment | +------------------+ | 267 | Circuit(AC1) | | +----------+ | MPLS/NVO tnl 268 ----------------------*Bridge | | +----- 269 | | | |Table(BT1)| | +-----------+ / \ \ 270 | | | | *---------* |<--> |Eth| 271 | | | |Eth-Tag x | |IRB1| | \ / / 272 | | | +----------+ | | | +----- 273 | | | ... | | IP-VRF1 | | 274 | | | +----------+ | | RD2/RT2 |MPLS/NVO tnl 275 | | | |Bridge | | | | +----- 276 | | | |Table(BT2)| |IRB2| | / \ \ 277 | | | | *---------* |<--> |IP | 278 ----------------------*Eth-Tag y | | +-----------+ \ / / 279 | AC2 | | +----------+ | +----- 280 | | | MAC-VRF1 | | 281 | +-+ RD1/RT1 | | 282 | +------------------+ | 283 | | 284 | | 285 +-------------------------------------------------------------+ 287 Figure 1: EVPN IRB PE Model 289 A tenant needing IRB services on a PE requires an IP Virtual Routing 290 and Forwarding table (IP-VRF) along with one or more MAC Virtual 291 Routing and Forwarding tables (MAC-VRFs). An IP-VRF, as defined in 292 [RFC4364], is the instantiation of an IPVPN in a PE. A MAC-VRF, as 293 defined in [RFC7432], is the instantiation of an EVI (EVPN Instance) 294 in a PE. A MAC-VRF can consist of one or more Bridge Tables (BTs) 295 where each BT corresponds to a VLAN (broadcast domain - BD). If 296 service interfaces for an EVPN PE are configured in VLAN-Based mode 297 (i.e., section 6.1 of [RFC7432]), then there is only a single BT per 298 MAC-VRF (per EVI) - i.e., there is only one tenant VLAN per EVI. 299 However, if service interfaces for an EVPN PE are configured in 300 VLAN-Aware Bundle mode (i.e., section 6.3 of [RFC7432]), then there are 301 several BTs per MAC-VRF (per EVI) - i.e., there are several tenant 302 VLANs per EVI. 304 Each BT is connected to an IP-VRF via an L3 interface called an IRB 305 interface. 
Since a single tenant subnet is typically (and in this 306 document) represented by a VLAN (and thus supported by a single BT), 307 for a given tenant there are as many BTs as there are subnets and 308 thus there are also as many IRB interfaces between the tenant IP-VRF 309 and the associated BTs as shown in the PE model above. 311 An IP-VRF is identified by its corresponding route target and route 312 distinguisher, and a MAC-VRF is also identified by its corresponding 313 route target and route distinguisher. If operating in EVPN VLAN-Based 314 mode, then a receiving PE that receives an EVPN route with a MAC-VRF 315 route target can identify the corresponding BT; however, if operating 316 in EVPN VLAN-Aware Bundle mode, then the receiving PE needs both the 317 MAC-VRF route target and VLAN ID in order to identify the 318 corresponding BT. 320 3 Symmetric and Asymmetric IRB 322 This document defines and describes two types of IRB solutions - 323 namely symmetric and asymmetric IRB. In symmetric IRB, as its name 324 implies, the lookup operation is symmetric at both ingress and egress 325 PEs - i.e., both ingress and egress PEs perform lookups on both MAC 326 and IP addresses. The ingress PE performs a MAC lookup followed by an 327 IP lookup and the egress PE performs an IP lookup followed by a MAC 328 lookup as depicted in figure 2. 330 Ingress PE Egress PE 331 +-------------------+ +------------------+ 332 | | | | 333 | +-> IP-VRF ----|---->---|-----> IP-VRF -+ | 334 | | | | | | 335 | BT1 BT2 | | BT3 BT2 | 336 | | | | | | 337 | ^ | | v | 338 | | | | | | 339 +-------------------+ +------------------+ 340 ^ | 341 | | 342 TS1->-+ +->-TS2 343 Figure 2: Symmetric IRB 345 In symmetric IRB as shown in figure 2, the inter-subnet forwarding 346 between two PEs is done between their associated IP-VRFs. 
Therefore, 347 the tunnel connecting these IP-VRFs can be either IP-only tunnel (in 348 case of MPLS or GENEVE encapsulation) or Ethernet NVO tunnel (in case 349 of VxLAN encapsulation). If it is an Ethernet NVO tunnel, the TS's IP 350 packet is encapsulated in an Ethernet header consisting of ingress 351 and egress PEs MAC addresses - i.e., there is no need for ingress PE 352 to use the destination TS's MAC address. Therefore, in symmetric IRB, 353 there is no need for the ingress PE to maintain ARP entries for 354 destination TS IP and MAC addresses association in its ARP table. 355 Each PE participating in symmetric IRB only maintains ARP entries for 356 locally connected hosts and maintains MAC-VRFs/BTs for only locally 357 configured subnets. 359 In asymmetric IRB, the lookup operation is asymmetric and the ingress 360 PE performs three lookups; whereas the egress PE performs a single 361 lookup - i.e., the ingress PE performs a MAC lookup, followed by an 362 IP lookup, followed by a MAC lookup again; whereas, the egress PE 363 performs just a single MAC lookup as depicted in figure 3 below. 365 Ingress PE Egress PE 366 +-------------------+ +------------------+ 367 | | | | 368 | +-> IP-VRF -> | | IP-VRF | 369 | | | | | | 370 | BT1 BT2 | | BT3 BT2 | 371 | | | | | | | | 372 | | +--|--->----|--------------+ | | 373 | | | | v | 374 +-------------------+ +----------------|-+ 375 ^ | 376 | | 377 TS1->-+ +->-TS2 378 Figure 3: Asymmetric IRB 380 In asymmetric IRB as shown in figure-3, the inter-subnet forwarding 381 between two PEs is done between their associated MAC-VRFs/BTs. 382 Therefore, the MPLS or NVO tunnel used for inter-subnet forwarding 383 MUST be of type Ethernet. Since at the egress PE only MAC lookup is 384 performed (e.g., no IP lookup), the TS's IP packets need to be 385 encapsulated with the destination TS's MAC address. 
In order for the 386 ingress PE to perform such encapsulation, it needs to maintain the destination TS's 387 IP and MAC address association in its ARP table. Furthermore, it 388 needs to maintain the destination TS's MAC address in the corresponding 389 BT even though it may not have any TS of the corresponding subnet 390 locally attached. In other words, each PE participating in asymmetric 391 IRB MUST maintain ARP entries for remote hosts (hosts connected to 392 other PEs) as well as MAC-VRFs/BTs for subnets that may 393 not be locally present on that PE. 395 The following subsections define the control and data plane 396 procedures for symmetric and asymmetric IRB on ingress and egress 397 PEs. The following figure is used in the description of these procedures; 398 it shows a single IP-VRF and a number of BTs on each PE for a 399 given tenant. The IP-VRF of the tenant (i.e., IP-VRF1) is connected 400 to each BT via its associated IRB interface. Each BT on a PE is 401 associated with a unique VLAN (i.e., with a BD), which in turn is 402 associated with a single MAC-VRF in the case of VLAN-Based mode; in the case of 403 VLAN-Aware Bundle mode, a number of BTs can be associated with a single MAC-VRF. 404 Whether the service interface on a PE is in 405 VLAN-Based or VLAN-Aware Bundle mode does not impact the IRB 406 operation and procedures. It only impacts the setting of the Ethernet Tag 407 field in EVPN BGP routes as described in [RFC7432]. 
409 PE 1 +---------+ 410 +-------------+ | | 411 TS1-----| MACx| | | PE2 412 (IP1/M1) |(BT1) | | | +-------------+ 413 TS5-----| \ | | MPLS/ | |MACy (BT3) |-----TS3 414 (IP5/M5) |Mx/IPx \ | | VxLAN/ | | / | (IP3/M3) 415 | (IP-VRF1)|----| NVGRE |---|(IP-VRF1) | 416 | / | | | | \ | 417 TS2-----|(BT2) / | | | | (BT1) |-----TS4 418 (IP2/M2) | | | | | | (IP4/M4) 419 +-------------+ | | +-------------+ 420 | | 421 +---------+ 423 Figure 4: IRB forwarding 425 3.1 IRB Interface and its MAC & IP addresses 427 To support inter-subnet forwarding on a PE, the PE acts as an IP 428 Default Gateway from the perspective of the attached Tenant Systems. 429 The default gateway MAC and IP addresses are configured on each IRB 430 interface associated with its subnet, and the configuration falls into one of the 431 following two options: 433 1. All the PEs for a given tenant subnet use the same anycast default 434 gateway IP and MAC addresses. On each PE, these default gateway IP 435 and MAC addresses correspond to the IRB interface connecting the BT 436 associated with the tenant's subnet to the corresponding 437 tenant's IP-VRF. 439 2. Each PE for a given tenant subnet uses the same anycast default 440 gateway IP address but its own MAC address. These MAC addresses are 441 aliased to the same anycast default gateway IP address through the 442 use of the Default Gateway extended community as specified in 443 [RFC7432], which is carried in the EVPN MAC/IP Advertisement routes. 444 On each PE, this default gateway IP address, along with its associated 445 MAC addresses, corresponds to the IRB interface connecting the BT 446 associated with the tenant's subnet to the corresponding 447 tenant's IP-VRF. 449 It is worth noting that if the applications that are running on the 450 TSes are employing or relying on any form of MAC security, then 451 either the first model (i.e., 
using anycast MAC address) should be 452 used to ensure that the applications receive traffic from the same 453 IRB interface MAC address that they are sending to, or if the second 454 model is used, then the IRB interface MAC address MUST be the one 455 used in the initial ARP reply or ND Neighbor Advertisement (NA) for 456 that TS. 458 Although both of these options are equally applicable to both 459 symmetric and asymmetric IRB, option 1 is recommended because of 460 the ease of anycast MAC address provisioning on not only the IRB 461 interface associated with a given subnet across all the PEs 462 corresponding to that EVI but also on all IRB interfaces associated 463 with all the tenant's subnets across all the PEs corresponding to all 464 the EVIs for that tenant. Furthermore, it simplifies the operation as 465 there is no need for Default Gateway extended community advertisement 466 and its associated MAC aliasing procedure. Yet another advantage is 467 that, following host mobility, the host does not need to refresh the 468 default GW ARP/ND entry. 470 If option 1 is used, an implementation MAY choose to auto-derive the 471 anycast MAC address. If auto-derivation is used, the anycast MAC MUST 472 be auto-derived out of the following ranges (which are defined in 473 [RFC5798]): 475 - Anycast IPv4 IRB case: 00-00-5E-00-01-{VRID} (in hex, in Internet 476 standard bit-order) 478 - Anycast IPv6 IRB case: 00-00-5E-00-02-{VRID} (in hex, in Internet 479 standard bit-order) 481 Where the last octet is generated based on a configurable Virtual 482 Router ID (VRID, range 1-255). If not explicitly configured, the 483 default value for the VRID octet is '01'. Auto-derivation of the 484 anycast MAC can only be used if there is certainty that the 485 auto-derived MAC does not collide with any customer MAC address. 
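As an illustration of the auto-derivation rule above, the following Python sketch builds the anycast MAC from a VRID. The function name anycast_irb_mac and its parameters are hypothetical and are not defined by this document; the sketch only encodes the two ranges and the default VRID of 1 stated above, and it does not perform the customer-MAC collision check, which remains a deployment responsibility.

```python
def anycast_irb_mac(vrid: int = 1, ipv6: bool = False) -> str:
    """Return the auto-derived anycast IRB MAC for a VRID (hypothetical helper).

    IPv4 IRB case: 00-00-5E-00-01-{VRID}
    IPv6 IRB case: 00-00-5E-00-02-{VRID}
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # Fifth octet selects the address family: 01 for IPv4, 02 for IPv6.
    family_octet = 0x02 if ipv6 else 0x01
    octets = (0x00, 0x00, 0x5E, 0x00, family_octet, vrid)
    return "-".join(f"{octet:02X}" for octet in octets)


if __name__ == "__main__":
    print(anycast_irb_mac())               # default VRID, IPv4 case
    print(anycast_irb_mac(10, ipv6=True))  # VRID 10, IPv6 case
```

With the default VRID, the IPv4 case yields 00-00-5E-00-01-01, which matches the default value '01' for the VRID octet given above.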
487 In addition to IP anycast addresses, IRB interfaces can be configured 488 with non-anycast IP addresses for the purpose of OAM (such as 489 traceroute/ping to these interfaces) for both symmetric and 490 asymmetric IRB. These IP addresses need to be distributed as VPN 491 routes when PEs are operating in symmetric IRB mode. However, they do not 492 need to be distributed if the PEs are operating in asymmetric IRB 493 mode and the non-anycast IP addresses are configured along with 494 individual MACs. 496 Irrespective of using only the anycast address or both anycast and 497 non-anycast addresses on the same IRB, when a TS sends an ARP request 498 or ND Neighbor Solicitation (NS) to the PE that it is attached to, the 499 request is sent for the anycast IP address of the IRB interface 500 associated with the TS's subnet and the reply will use the anycast MAC 501 address (as both the Source MAC in the Ethernet header and the Sender 502 hardware address in the payload). For example, in figure 4, TS1 is 503 configured with the anycast IPx address as its default gateway IP 504 address and thus when it sends an ARP request for IPx (anycast IP 505 address of the IRB interface for BT1), PE1 sends an ARP reply 506 with MACx, which is the anycast MAC address of that IRB interface. 507 Traffic routed from IP-VRF1 to TS1 SHOULD use the anycast MAC address 508 as the source MAC address. 510 3.2 Symmetric IRB Procedures 512 3.2.1 Control Plane - Ingress PE 514 When a PE (e.g., PE1 in figure 4 above) learns the MAC and IP addresses of 515 a TS (e.g., via an ARP request), it adds the MAC address to the 516 corresponding MAC-VRF/BT of that tenant's subnet and adds the IP 517 address to the IP-VRF for that tenant. Furthermore, it adds this TS's 518 MAC and IP address association to its ARP table. It then builds an 519 EVPN MAC/IP Advertisement route (type 2) as follows and advertises it 520 to other PEs participating in that tenant's VPN. 
522 - The Length field of the BGP EVPN NLRI for an EVPN MAC/IP 523 Advertisement route MUST be either 40 (if IPv4 address is carried) or 524 52 (if IPv6 address is carried). 526 - Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet Tag 527 ID, MAC Address Length, MAC Address, IP Address Length, IP Address, 528 and MPLS Label1 fields MUST be set per [RFC7432] and [RFC8365]. 530 - The MPLS Label2 field is set to either an MPLS label or a VNI 531 corresponding to the tenant's IP-VRF. In case of an MPLS label, this 532 field is encoded as 3 octets, where the high-order 20 bits contain 533 the label value. 535 Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length, 536 MAC Address, IP Address Length, and IP Address fields are part of 537 the route key used by BGP to compare routes. The rest of the fields 538 are not part of the route key. 540 This route is advertised along with the following two extended 541 communities: 543 1) Tunnel Type Extended Community 544 2) Router's MAC Extended Community 546 For symmetric IRB mode, Router's MAC EC is needed to carry the PE's 547 overlay MAC address (e.g., inner MAC address in NVO encapsulation) 548 which is used for IP-VRF to IP-VRF communications with Ethernet NVO 549 tunnel. If MPLS or IP-only NVO tunnel is used, then there is no need 550 to send Router's MAC Extended Community along with this route. 552 This route MUST be advertised with two route targets - one 553 corresponding to the MAC-VRF of the tenant's subnet and another 554 corresponding to the tenant's IP-VRF. 556 3.2.2 Control Plane - Egress PE 558 When a PE (e.g., PE2 in figure 4 above) receives this EVPN MAC/IP 559 Advertisement route advertisement, it performs the following: 561 - Using MAC-VRF Route Target (and Ethernet Tag if different from 562 zero), it identifies the corresponding MAC-VRF (and BT). If the MAC- 563 VRF (and BT) exists (e.g., it is locally configured) then it imports 564 the MAC address into it. 
Otherwise, it does not import the MAC 565 address. 567 - Using the IP-VRF route target, it identifies the corresponding IP-VRF 568 and imports the IP address into it. 570 The inclusion of the MPLS Label2 field in this route signals to the 571 receiving PE that this route is for symmetric IRB mode and that MPLS 572 Label2 needs to be installed in the forwarding path to identify the 573 corresponding IP-VRF. 575 If the receiving PE receives this route with both the MAC-VRF and 576 IP-VRF route targets but the MAC/IP Advertisement route does not include 577 the MPLS Label2 field and if the receiving PE supports asymmetric IRB 578 mode, then the receiving PE installs the MAC address in the 579 corresponding MAC-VRF and the (IP, MAC) association in the ARP table for 580 that tenant (identified by the corresponding IP-VRF route target). 582 If the receiving PE receives this route with both the MAC-VRF and 583 IP-VRF route targets but the MAC/IP Advertisement route does not include 584 the MPLS Label2 field and if the receiving PE does not support asymmetric 585 IRB mode, then if it has the corresponding MAC-VRF, it only imports 586 the MAC address; otherwise, if it does not have the corresponding 587 MAC-VRF, it MUST apply the treat-as-withdraw approach [RFC7606] to the route and log an error 588 message. 590 If the receiving PE receives this route with both the MAC-VRF and 591 IP-VRF route targets and the MAC/IP Advertisement route includes the MPLS 592 Label2 field but the receiving PE only supports asymmetric IRB mode, 593 then the receiving PE MUST ignore the MPLS Label2 field and install the 594 MAC address in the corresponding MAC-VRF and the (IP, MAC) association in 595 the ARP table for that tenant (identified by the corresponding IP-VRF 596 route target). 598 3.2.3 Data Plane - Ingress PE 600 When an Ethernet frame is received by an ingress PE (e.g., PE1 in 601 figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify 602 the associated MAC-VRF/BT and it performs a lookup on the destination 603 MAC address. 
If the MAC address corresponds to its IRB Interface MAC 604 address, the ingress PE deduces that the packet must be inter-subnet 605 routed. Hence, the ingress PE performs an IP lookup in the associated 606 IP-VRF table. The lookup identifies the BGP next hop of the egress PE along 607 with the tunnel/encapsulation type and the associated MPLS/VNI 608 values. 610 If the tunnel type is that of an MPLS or IP-only NVO tunnel, then the TS's 611 IP packet is sent over the tunnel without any Ethernet header. 612 However, if the tunnel type is that of an Ethernet NVO tunnel, then an 613 Ethernet header needs to be added to the TS's IP packet. The source 614 MAC address of this inner Ethernet header is set to the ingress PE's 615 router MAC address and the destination MAC address of this inner 616 Ethernet header is set to the egress PE's router MAC address. The 617 MPLS VPN label or VNI fields are set accordingly and the packet is 618 forwarded to the egress PE. 620 In case of NVO tunnel encapsulation, the outer source and destination 621 IP addresses are set to the ingress and egress PE BGP next-hop IP 622 addresses respectively. 624 3.2.4 Data Plane - Egress PE 626 When the tenant's MPLS or NVO encapsulated packet is received over an 627 MPLS or NVO tunnel by the egress PE, the egress PE removes the NVO tunnel 628 encapsulation and uses the VPN MPLS label (for MPLS encapsulation) or 629 VNI (for NVO encapsulation) to identify the IP-VRF in which the IP lookup 630 needs to be performed. If the VPN MPLS label or VNI identifies a 631 MAC-VRF instead of an IP-VRF, then the procedures in section 3.3.4 for 632 asymmetric IRB are executed. 634 The lookup in the IP-VRF identifies a local adjacency to the IRB 635 interface associated with the egress subnet's MAC-VRF/BT. 
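As an illustration of the symmetric IRB data-plane steps described so far (section 3.2.3 and the first part of section 3.2.4), the following Python sketch models the ingress PE's encapsulation choice and the egress PE's label/VNI demultiplexing. It is a simplified model under stated assumptions, not an implementation: the names IpVrfRoute, ingress_encapsulate, and egress_demux, the tunnel type strings, and the dictionary layout are all hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IpVrfRoute:
    """Outcome of the ingress PE's IP-VRF lookup (illustrative fields only)."""
    egress_next_hop: str                     # BGP next hop of the egress PE
    tunnel_type: str                         # "mpls", "ip-nvo" or "eth-nvo"
    label_or_vni: int                        # MPLS VPN label or VNI
    egress_router_mac: Optional[str] = None  # from the Router's MAC EC


def ingress_encapsulate(route: IpVrfRoute, local_next_hop: str,
                        local_router_mac: str) -> dict:
    """Describe the headers the ingress PE adds to the TS's IP packet."""
    if route.tunnel_type not in ("mpls", "ip-nvo", "eth-nvo"):
        raise ValueError(f"unknown tunnel type: {route.tunnel_type}")
    encap = {"label_or_vni": route.label_or_vni,
             "inner_ethernet": None, "outer_ip": None}
    if route.tunnel_type == "eth-nvo":
        # Ethernet NVO tunnel: inner header carries the PEs' router MACs,
        # not the destination TS's MAC address.
        if route.egress_router_mac is None:
            raise ValueError("Ethernet NVO tunnel needs the egress router MAC")
        encap["inner_ethernet"] = {"src": local_router_mac,
                                   "dst": route.egress_router_mac}
    if route.tunnel_type in ("ip-nvo", "eth-nvo"):
        # NVO encapsulation: outer IP addresses are the PEs' BGP next hops.
        encap["outer_ip"] = {"src": local_next_hop,
                             "dst": route.egress_next_hop}
    return encap


def egress_demux(label_or_vni: int, ip_vrfs: dict, mac_vrfs: dict) -> tuple:
    """Map a received label/VNI to the table in which the lookup is done."""
    if label_or_vni in ip_vrfs:
        return ("ip-lookup", ip_vrfs[label_or_vni])    # symmetric IRB
    if label_or_vni in mac_vrfs:
        return ("mac-lookup", mac_vrfs[label_or_vni])  # asymmetric IRB
    raise KeyError(label_or_vni)
```

In this model, an MPLS or IP-only NVO tunnel leaves inner_ethernet unset, which corresponds to sending the TS's IP packet without any Ethernet header.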

   The egress PE gets the destination TS's MAC address for that TS's IP
   address from its ARP table, encapsulates the packet with that
   destination MAC address and a source MAC address corresponding to
   that IRB interface, and sends the packet to its destination subnet
   MAC-VRF/BT.

   The destination MAC address lookup in the MAC-VRF/BT results in a
   local adjacency (e.g., local interface) over which the Ethernet frame
   is sent.

3.3 Asymmetric IRB Procedures

3.3.1 Control Plane - Ingress PE

   When a PE (e.g., PE1 in figure 4 above) learns the MAC and IP address
   of a TS (e.g., via an ARP request), it populates its MAC-VRF/BT,
   IP-VRF, and ARP table just as in the case for symmetric IRB. It then
   builds an EVPN MAC/IP Advertisement route (type 2) as follows and
   advertises it to other PEs participating in that tenant's VPN.

   - The Length field of the BGP EVPN NLRI for an EVPN MAC/IP
   Advertisement route MUST be either 37 (if IPv4 address is carried) or
   49 (if IPv6 address is carried).

   - Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet Tag
   ID, MAC Address Length, MAC Address, IP Address Length, IP Address,
   and MPLS Label1 fields MUST be set per [RFC7432] and [RFC8365].

   - The MPLS Label2 field MUST NOT be included in this route.

   Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length,
   MAC Address, IP Address Length, and IP Address fields are part of
   the route key used by BGP to compare routes. The rest of the fields
   are not part of the route key.

   This route is advertised along with the following extended
   community:

   1) Tunnel Type Extended Community

   For asymmetric IRB mode, the Router's MAC EC is not needed because
   forwarding is performed using the destination TS's MAC address, which
   is carried in this EVPN route type-2 advertisement.

   This route MUST always be advertised with the MAC-VRF route target.
   It MAY also be advertised with a second route target corresponding to
   the IP-VRF. If only the MAC-VRF route target is used, then the
   receiving PE uses the MAC-VRF route target to identify the
   corresponding IP-VRF - i.e., many MAC-VRF route targets map to the
   same IP-VRF for a given tenant. Since in this asymmetric IRB mode
   each PE is configured with every BD for a tenant, the MAC-VRF route
   target has the same reachability as the IP-VRF route target, and that
   is why the use of the IP-VRF route target is optional for this IRB
   mode.

3.3.2 Control Plane - Egress PE

   When a PE (e.g., PE2 in figure 4 above) receives this EVPN MAC/IP
   Advertisement route, it performs the following:

   - Using MAC-VRF route target, it identifies the corresponding MAC-VRF
   and imports the MAC address into it. For asymmetric IRB mode, it is
   assumed that all PEs participating in a tenant's VPN are configured
   with all subnets and corresponding MAC-VRFs/BTs even if there are no
   locally attached TSes for some of these subnets. The reason is that
   the ingress PE needs to forward based on the destination TS's MAC
   address and perform the proper NVO tunnel encapsulation, both of
   which are the result of a lookup in the MAC-VRF/BT. An implementation
   may choose to consolidate the lookup at the ingress PE's IP-VRF with
   the lookup at the ingress PE's destination subnet MAC-VRF.
   Consideration for such consolidation of lookups is an implementation
   exercise and thus its specification is outside the scope of this
   document.

   - Using MAC-VRF route target, it identifies the corresponding ARP
   table for the tenant and it adds an entry to the ARP table for the
   TS's MAC and IP address association. It should be noted that the
   tenant's ARP table at the receiving PE is identified by all the
   MAC-VRF route targets for that tenant.
   If the IP-VRF route target is included
   with this route advertisement, then it MAY be used for the
   identification of the tenant's ARP table.

   If the receiving PE receives the MAC/IP Advertisement route with the
   MPLS label2 field but the receiving PE only supports asymmetric IRB
   mode, then the receiving PE MUST ignore the MPLS label2 field and
   install the MAC address in the corresponding MAC-VRF and the
   <IP, MAC> association in the ARP table for that tenant (identified by
   either MAC-VRF or IP-VRF route targets).

   If the receiving PE receives the MAC/IP Advertisement route with the
   MPLS label2 field and it can support symmetric IRB mode, then it
   should use the MAC-VRF route target to identify its corresponding
   MAC-VRF table and import the MAC address. It should use the IP-VRF
   route target to identify the corresponding IP-VRF table and import
   the IP address. It MUST NOT import the <IP, MAC> association into its
   ARP table.

3.3.3 Data Plane - Ingress PE

   When an Ethernet frame is received by an ingress PE (e.g., PE1 in
   figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify
   the associated MAC-VRF/BT and it performs a lookup on the destination
   MAC address. If the MAC address corresponds to its IRB Interface MAC
   address, the ingress PE deduces that the packet must be inter-subnet
   routed. Hence, the ingress PE performs an IP lookup in the associated
   IP-VRF table. The lookup identifies a local adjacency to the IRB
   interface associated with the egress subnet's MAC-VRF/BT.

   The ingress PE gets the destination TS's MAC address for that TS's IP
   address from its ARP table, encapsulates the packet with that
   destination MAC address and a source MAC address corresponding to
   that IRB interface, and sends the packet to its destination subnet
   MAC-VRF/BT.

   The destination MAC address lookup in the MAC-VRF/BT results in the
   BGP next hop address of the egress PE along with the VPN MPLS label
   or VNI.
   The ingress PE encapsulates the packet using the Ethernet NVO tunnel
   of choice (e.g., VxLAN or GENEVE) and sends the packet to the egress
   PE. Since the packet forwarding is between the ingress PE's
   MAC-VRF/BT and the egress PE's MAC-VRF/BT, the packet encapsulation
   procedures follow those of [RFC7432] for MPLS and [RFC8365] for VxLAN
   encapsulations.

3.3.4 Data Plane - Egress PE

   When a tenant's Ethernet frame is received over an NVO tunnel by the
   egress PE, the egress PE removes the NVO tunnel encapsulation and
   uses the VPN MPLS label (for MPLS encapsulation) or VNI (for NVO
   encapsulation) to identify the MAC-VRF/BT in which the MAC lookup
   needs to be performed.

   The MAC lookup results in a local adjacency (e.g., local interface)
   over which the packet needs to be sent.

   Note that the forwarding behavior on the egress PE is the same as
   EVPN intra-subnet forwarding described in [RFC7432] for MPLS and
   [RFC8365] for NVO networks. In other words, all the packet processing
   associated with the inter-subnet forwarding semantics is confined to
   the ingress PE for asymmetric IRB mode.

   It should also be noted that [RFC7432] provides different levels of
   granularity for the EVPN label. Besides identifying the bridge domain
   table, it can be used to identify the egress interface or a
   destination MAC address on that interface. If the EVPN label is used
   for egress interface or individual MAC address identification, then
   no MAC lookup is needed in the egress PE for MPLS encapsulation and
   the packet can be directly forwarded to the egress interface just
   based on the EVPN label lookup.

4 Mobility Procedure

   When a TS moves from one NVE (aka source NVE) to another NVE (aka
   target NVE), it is important that the MAC mobility procedures are
   properly executed and the corresponding MAC-VRF and IP-VRF tables on
   all participating NVEs are updated.
   [RFC7432] describes the MAC
   mobility procedures for L2-only services for both single-homed TS and
   multi-homed TS. This section describes the incremental procedures and
   BGP Extended Communities needed to handle the MAC mobility for IRB.
   In order to place the emphasis on the differences between L2-only and
   IRB use cases, the incremental procedure is described for a
   single-homed TS with the expectation that the reader can easily
   extrapolate to a multi-homed TS based on the procedures described in
   section 15 of [RFC7432]. This section describes mobility procedures
   for both symmetric and asymmetric IRB.

4.1 Mobility Procedure for Symmetric IRB

   When a TS moves from a source NVE to a target NVE, it can behave in
   one of the following three ways:

   1) TS initiates an ARP request upon a move to the target NVE

   2) TS sends data packet without first initiating an ARP request to
   the target NVE

   3) TS is a silent host and neither initiates an ARP request nor sends
   any packets

   The following subsections describe the procedures for each of the
   above options. In the following subsections, it is assumed that the
   MAC & IP addresses of a TS have a one-to-one relationship (i.e.,
   there is one IP address per MAC address and vice versa). If there is
   a many-to-one relationship, such that many host IP addresses
   correspond to a single host MAC address or many host MAC addresses
   correspond to a single IP address, then to detect host mobility, the
   procedures in [IRB-EXT-MOBILITY] must be exercised followed by the
   procedures described below.

4.1.1 Initiating an ARP Request upon a Move

   In this scenario when a TS moves from a source NVE to a target NVE,
   the TS initiates an ARP request upon the move (e.g., gratuitous ARP)
   to the target NVE.

   Upon receiving this ARP request, the target NVE updates its MAC-VRF,
   IP-VRF, and ARP table with the host MAC, IP, and local adjacency
   information (e.g., local interface).

   Since this NVE has previously learned the same MAC and IP addresses
   from the source NVE, it recognizes that there has been a MAC move and
   it initiates MAC mobility procedures per [RFC7432] by advertising an
   EVPN MAC/IP route with both the MAC and IP addresses filled in along
   with MAC Mobility Extended Community with the sequence number
   incremented by one. The target NVE also exercises the MAC duplication
   detection procedure in section 15.1 of [RFC7432].

   Upon receiving this MAC/IP advertisement, the source NVE realizes
   that the MAC has moved to the target NVE. It updates its MAC-VRF and
   IP-VRF table accordingly with the adjacency information of the target
   NVE and withdraws its EVPN MAC/IP route. Furthermore, it sends an ARP
   probe locally to ensure that the MAC is gone. If no ARP response is
   received, it then deletes its ARP entry corresponding to that
   <IP, MAC>. If an ARP response is received, the source NVE updates its
   ARP entry timer for that <IP, MAC> and re-advertises an EVPN MAC/IP
   route for that <IP, MAC> along with MAC Mobility Extended Community
   with the sequence number incremented by one. The source NVE also
   exercises the MAC duplication detection procedure in section 15.1 of
   [RFC7432].

   Upon receiving the MAC/IP advertisement route with MAC Mobility
   extended community, all other remote NVE devices compare the sequence
   number in this advertisement with the one previously received. If the
   new sequence number is greater than the old one, then they update the
   MAC/IP addresses of the TS in their corresponding MAC-VRF and IP-VRF
   tables to point to the target NVE. Furthermore, upon receiving the
   MAC/IP withdraw for the TS from the source NVE, these remote PEs
   perform the cleanups for their BGP tables.
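
   The sequence-number comparison performed by the remote NVEs can be
   sketched as follows. This is a minimal illustration and not part of
   the specification: the HostEntry record, the apply_mobility_update
   function, and the dictionary keyed by MAC address are hypothetical
   names chosen for the example, and the handling of equal sequence
   numbers defined in [RFC7432] is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class HostEntry:
    """Hypothetical record of where a TS is currently reachable."""
    next_hop: str   # BGP next hop of the NVE that advertised the TS
    sequence: int   # sequence number from the MAC Mobility Extended Community


def apply_mobility_update(table, mac, next_hop, sequence):
    """Repoint `mac` to `next_hop` only if the advertisement is newer.

    Returns True when the entry was installed or updated. Equal sequence
    numbers are ignored in this sketch; [RFC7432] defines how that case
    is handled.
    """
    current = table.get(mac)
    if current is None or sequence > current.sequence:
        table[mac] = HostEntry(next_hop=next_hop, sequence=sequence)
        return True
    return False


# Example with documentation addresses: the TS is first learned behind
# the source NVE (192.0.2.1) and then moves to the target NVE (192.0.2.2).
hosts = {}
apply_mobility_update(hosts, "00:00:5e:00:53:01", "192.0.2.1", 0)
moved = apply_mobility_update(hosts, "00:00:5e:00:53:01", "192.0.2.2", 1)
stale = apply_mobility_update(hosts, "00:00:5e:00:53:01", "192.0.2.1", 0)
```

   In this sketch the late advertisement from the source NVE carries an
   older sequence number and therefore does not displace the entry that
   points to the target NVE.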

4.1.2 Sending Data Traffic without an ARP Request

   In this scenario when a TS moves from a source NVE to a target NVE,
   the TS starts sending data traffic without first initiating an ARP
   request.

   Upon receiving the first data packet, the target NVE learns the
   MAC address of the TS in the data plane and updates its MAC-VRF table
   with the MAC address and the local adjacency information (e.g., local
   interface) accordingly. The target NVE realizes that there has been a
   MAC move because the same MAC address has been learned remotely from
   the source NVE.

   If EVPN-IRB NVEs are configured to advertise MAC-only routes in
   addition to MAC-and-IP EVPN routes, then the following steps are
   taken:

   - Upon learning this MAC address in the data plane, the target NVE
   updates this MAC address entry in the corresponding MAC-VRF with the
   local adjacency information (e.g., local interface). It also
   recognizes that this MAC has moved and initiates MAC mobility
   procedures per [RFC7432] by advertising an EVPN MAC/IP route with
   only the MAC address filled in along with MAC Mobility Extended
   Community with the sequence number incremented by one.

   - Upon receiving this MAC/IP advertisement, the source NVE realizes
   that the MAC has moved to the new NVE. It updates its MAC-VRF table
   accordingly by updating the adjacency information for that MAC
   address to point to the target NVE and withdraws its EVPN MAC/IP
   route that has only the MAC address (if it has advertised such a
   route previously). Furthermore, it searches its ARP table for this
   MAC and sends an ARP probe for this <IP, MAC> pair. The ARP request
   message is sent both locally to all attached TSes in that subnet and
   to the other NVEs participating in that subnet, including the target
   NVE.

   - The target NVE passes the ARP request to its locally attached TSes
   and when it receives the ARP response, it sends an EVPN MAC/IP
   advertisement route with both the MAC and IP addresses filled in
   along with MAC Mobility Extended Community with the sequence number
   set to the same value as the one for the MAC-only advertisement route
   it sent previously.

   - When the source NVE receives the EVPN MAC/IP advertisement, it
   updates its IP-VRF table with the new adjacency information
   (pointing to the target NVE) and deletes the associated ARP entry
   from its ARP table. Furthermore, it withdraws its previously
   advertised EVPN MAC/IP route with both the MAC and IP addresses.

   - Upon receiving the MAC/IP advertisement route with MAC Mobility
   extended community, all other remote NVE devices compare the
   sequence number in this advertisement with the one previously
   received. If the new sequence number is greater than the old one,
   then they update the MAC/IP addresses of the TS in their
   corresponding MAC-VRF and IP-VRF tables to point to the new NVE.
   Furthermore, upon receiving the MAC/IP withdraw for the TS from the
   old NVE, these remote PEs perform the cleanups for their BGP tables.

   If EVPN-IRB NVEs are configured not to advertise MAC-only routes,
   then upon receiving the first data packet, the target NVE learns the
   MAC address of the TS and updates the MAC entry in the corresponding
   MAC-VRF table with the local adjacency information (e.g., local
   interface). It also realizes that there has been a MAC move because
   the same MAC address has been learned remotely from the source NVE.
   It then sends a unicast ARP request to the host and, upon receiving
   an ARP response, it follows the procedure outlined in section 4.1.1.
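
   For the case where MAC-only routes are advertised, the relationship
   between the two sequence numbers sent by the target NVE can be
   sketched as follows. The MacIpRoute record and both helper functions
   are hypothetical simplifications made for this example; a real route
   carries the full set of RT-2 fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MacIpRoute:
    """Hypothetical, simplified view of an EVPN MAC/IP Advertisement route."""
    mac: str
    ip: Optional[str]   # None models a route with only the MAC address
    sequence: int       # MAC Mobility Extended Community sequence number


def mac_only_route_on_move(mac, last_seen_sequence):
    """Route sent when the moved MAC is learned from data traffic."""
    return MacIpRoute(mac=mac, ip=None, sequence=last_seen_sequence + 1)


def mac_ip_route_after_arp(mac_only, ip):
    """Route sent once the ARP response reveals the IP address.

    The sequence number is reused from the MAC-only route rather than
    incremented a second time.
    """
    return MacIpRoute(mac=mac_only.mac, ip=ip, sequence=mac_only.sequence)


# Example: the source NVE last advertised the MAC with sequence number 4.
first = mac_only_route_on_move("00:00:5e:00:53:01", last_seen_sequence=4)
second = mac_ip_route_after_arp(first, "198.51.100.10")
```

   Both routes in the example carry sequence number 5: the increment
   happens once per move, not once per advertisement.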

4.1.3 Silent Host

   In this scenario when a TS moves from a source NVE to a target NVE,
   the TS is silent and it neither initiates an ARP request nor sends
   any data traffic. Therefore, neither the target nor the source NVE
   is aware of the MAC move.

   On the source NVE, the MAC age-out timer (for the silent host that
   has moved) expires and as a result it triggers an ARP probe on the
   source NVE. The ARP request is sent both locally to all the attached
   TSes on that subnet and to all the remote NVEs (including the target
   NVE) participating in that subnet. The source NVE also withdraws the
   EVPN MAC/IP route with only the MAC address (if it has previously
   advertised such a route).

   The target NVE passes the ARP request to its locally attached TSes
   and when it receives the ARP response, it sends an EVPN MAC/IP
   advertisement route with both the MAC and IP addresses filled in
   along with MAC Mobility Extended Community with the sequence number
   incremented by one.

   When the source NVE receives the EVPN MAC/IP advertisement, it
   updates its IP-VRF table with the new adjacency information
   (pointing to the target NVE) and deletes the associated ARP entry
   from its ARP table. Furthermore, it withdraws its previously
   advertised EVPN MAC/IP route with both the MAC and IP addresses.

   Upon receiving the MAC/IP advertisement route with MAC Mobility
   extended community, all other remote NVE devices compare the sequence
   number in this advertisement with the one previously received. If the
   new sequence number is greater than the old one, then they update the
   MAC/IP addresses of the TS in their corresponding MAC-VRF and IP-VRF
   tables to point to the new NVE. Furthermore, upon receiving the
   MAC/IP withdraw for the TS from the old NVE, these remote PEs perform
   the cleanups for their BGP tables.

5 BGP Encoding

   This document defines one new BGP Extended Community for EVPN.

5.1 Router's MAC Extended Community

   A new EVPN BGP Extended Community called Router's MAC is introduced
   here. This new extended community is a transitive extended community
   with the Type field of 0x06 (EVPN) and the Sub-Type of 0x03. It may
   be advertised along with the BGP Encapsulation Extended Community
   defined in section 4.5 of [TUNNEL-ENCAP].

   The Router's MAC Extended Community is encoded as an 8-octet value as
   follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x03 |         Router's MAC          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Router's MAC Cont'd                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 5: Router's MAC Extended Community

   This extended community is used to carry the PE's MAC address for
   symmetric IRB scenarios and it is sent with RT-2.

6 Operational Models for Symmetric Inter-Subnet Forwarding

   The following sections describe two main symmetric IRB forwarding
   scenarios (within a DC - i.e., intra-DC) along with their
   corresponding procedures. In the following scenarios, without loss of
   generality, it is assumed that a given tenant is represented by a
   single IP-VPN instance. Therefore, on a given PE, a tenant is
   represented by a single IP-VRF table and one or more MAC-VRF tables.

6.1 IRB forwarding on NVEs for Tenant Systems

   This section covers the symmetric IRB procedures for the scenario
   where each Tenant System (TS) is attached to one or more NVEs and its
   host IP and MAC addresses are learned by the attached NVEs and are
   distributed to all other NVEs that are interested in participating in
   both intra-subnet and inter-subnet communications with that TS.
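
   Because the procedures in this section rely on the Router's MAC
   Extended Community of section 5.1, its 8-octet layout is sketched
   here for reference. Only the Type (0x06), the Sub-Type (0x03), and
   the 6-octet MAC address come from section 5.1; the function names and
   the colon-separated string form of the MAC address are assumptions
   made for this illustration.

```python
import struct

EVPN_TYPE = 0x06             # Type field given in section 5.1
ROUTERS_MAC_SUBTYPE = 0x03   # Sub-Type field given in section 5.1


def encode_routers_mac_ec(mac: str) -> bytes:
    """Return the 8-octet value: Type, Sub-Type, then the 6-octet MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 octets")
    return struct.pack("!BB6s", EVPN_TYPE, ROUTERS_MAC_SUBTYPE, octets)


def decode_routers_mac_ec(value: bytes) -> str:
    """Inverse of encode_routers_mac_ec; rejects other extended communities."""
    if len(value) != 8:
        raise ValueError("an extended community is 8 octets long")
    ec_type, sub_type, octets = struct.unpack("!BB6s", value)
    if (ec_type, sub_type) != (EVPN_TYPE, ROUTERS_MAC_SUBTYPE):
        raise ValueError("not a Router's MAC Extended Community")
    return ":".join(f"{b:02x}" for b in octets)


# Example with a documentation MAC address.
encoded = encode_routers_mac_ec("00:00:5e:00:53:0a")
decoded = decode_routers_mac_ec(encoded)
```

   The first two octets of the encoded value are the Type and Sub-Type,
   and the remaining six are the router's MAC address in network order.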

   In this scenario, without loss of generality, it is assumed that NVEs
   operate in VLAN-based service interface mode with one Bridge Table
   (BT) per MAC-VRF. Thus, for a given tenant, an NVE has one MAC-VRF
   for each of the tenant's subnets (e.g., each VLAN) that it is
   configured for, which is typically the case for VxLAN and GENEVE
   encapsulation. In case of VLAN-aware bundling, each MAC-VRF consists
   of multiple Bridge Tables (e.g., one BT per VLAN). The MAC-VRFs on an
   NVE for a given tenant are associated with an IP-VRF corresponding to
   that tenant (or IP-VPN instance) via their IRB interfaces.

   Each NVE MUST support QoS, Security, and OAM policies per IP-VRF
   to/from the core network. This is not to be confused with the QoS,
   Security, and OAM policies per Attachment Circuit (AC) to/from the
   Tenant Systems. How this requirement is met is an implementation
   choice and it is outside the scope of this document.

   Since VxLAN and GENEVE encapsulations require an inner Ethernet
   header (inner MAC SA/DA), and since for inter-subnet traffic the TS
   MAC address cannot be used, the ingress NVE's MAC address is used as
   the inner MAC SA. The NVE's MAC address is the device MAC address and
   it is common across all MAC-VRFs and IP-VRFs. This MAC address is
   advertised using the new EVPN Router's MAC Extended Community
   (section 5.1).

   Figure 6 below illustrates this scenario where a given tenant (e.g.,
   an IP-VPN instance) has three subnets represented by MAC-VRF1,
   MAC-VRF2, and MAC-VRF3 across two NVEs. There are five TSes that are
   associated with these three MAC-VRFs - i.e., TS1, TS4, and TS5 are
   sitting on the same subnet (e.g., same MAC-VRF/VLAN); TS1 and TS5 are
   associated with MAC-VRF1 on NVE1, and TS4 is associated with MAC-VRF1
   on NVE2. TS2 is associated with MAC-VRF2 on NVE1, and TS3 is
   associated with MAC-VRF3 on NVE2.
   MAC-VRF1 and MAC-VRF2 on NVE1 are
   in turn associated with IP-VRF1 on NVE1 and MAC-VRF1 and MAC-VRF3 on
   NVE2 are associated with IP-VRF1 on NVE2. When TS1, TS5, and TS4
   exchange traffic with each other, only the L2 forwarding (bridging)
   part of the IRB solution is exercised because all these TSes sit on
   the same subnet. However, when TS1 wants to exchange traffic with TS2
   or TS3, which belong to different subnets, then both the bridging and
   routing parts of the IRB solution are exercised. The following
   subsections describe the control-plane and data-plane operations for
   this IRB scenario in detail.

                 NVE1          +---------+
            +-------------+    |         |
    TS1-----|         MACx|    |         |        NVE2
   (IP1/M1) |(MAC-        |    |         |   +-------------+
    TS5-----| VRF1)\      |    |  MPLS/  |   |MACy  (MAC-  |-----TS3
   (IP5/M5) |       \     |    |  VxLAN/ |   |     / VRF3) | (IP3/M3)
            |    (IP-VRF1)|----|  NVGRE  |---|(IP-VRF1)    |
            |       /     |    |         |   |     \       |
    TS2-----|(MAC- /      |    |         |   |      (MAC-  |-----TS4
   (IP2/M2) | VRF2)       |    |         |   |       VRF1) | (IP4/M4)
            +-------------+    |         |   +-------------+
                               |         |
                               +---------+

        Figure 6: IRB forwarding on NVEs for Tenant Systems

6.1.1 Control Plane Operation

   Each NVE advertises a MAC/IP Advertisement route (i.e., Route Type 2)
   for each of its TSes with the following fields set:

   - RD and ESI per [RFC7432]
   - Ethernet Tag = 0; assuming VLAN-based service
   - MAC Address Length = 48
   - MAC Address = Mi ; where i = 1,2,3,4, or 5 in the above example
   - IP Address Length = 32 or 128
   - IP Address = IPi ; where i = 1,2,3,4, or 5 in the above example
   - Label-1 = MPLS Label or VNI corresponding to MAC-VRF
   - Label-2 = MPLS Label or VNI corresponding to IP-VRF

   Each NVE advertises an RT-2 route with two Route Targets (one
   corresponding to its MAC-VRF and the other corresponding to its
   IP-VRF). Furthermore, the RT-2 is advertised with two BGP Extended
   Communities.
   The first BGP Extended Community identifies the tunnel
   type per section 4.5 of [TUNNEL-ENCAP] and the second BGP Extended
   Community includes the MAC address of the NVE (e.g., MACx for NVE1 or
   MACy for NVE2) as defined in section 5.1. This second Extended
   Community (for the MAC address of the NVE) is only required when the
   Ethernet NVO tunnel type is used. If the IP NVO tunnel type is used,
   then there is no need to send this second Extended Community. It
   should be noted that the IP NVO tunnel type is only applicable to
   symmetric IRB procedures.

   Upon receiving this advertisement, the receiving NVE performs the
   following:

   - It uses Route Targets corresponding to its MAC-VRF and IP-VRF for
   identifying these tables and subsequently importing the MAC and IP
   addresses into them respectively.

   - It imports the MAC address from the MAC/IP Advertisement route into
   the MAC-VRF with the BGP Next Hop address as the underlay tunnel
   destination address (e.g., VTEP DA for VxLAN encapsulation) and
   Label-1 as the VNI for VxLAN encapsulation or the EVPN label for MPLS
   encapsulation.

   - If the route carries the new Router's MAC Extended Community, and
   if the receiving NVE is using an Ethernet NVO tunnel, then the
   receiving NVE imports the IP address into the IP-VRF with the NVE's
   MAC address (from the new Router's MAC Extended Community) as the
   inner MAC DA, the BGP Next Hop address as the underlay tunnel
   destination address (e.g., VTEP DA for VxLAN encapsulation), and
   Label-2 as the IP-VPN VNI for VxLAN encapsulation.

   - If the receiving NVE is going to use MPLS encapsulation, then the
   receiving NVE imports the IP address into the IP-VRF with the BGP
   Next Hop address as the underlay tunnel destination address, and
   Label-2 as the IP-VPN label for MPLS encapsulation.
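
   The import steps above can be sketched as follows. This is an
   illustration under stated assumptions, not a normative algorithm: the
   Rt2 record, the import_rt2 function, and the dictionary layout of the
   two tables are hypothetical, and the choice to skip the IP-VRF entry
   when an Ethernet NVO tunnel is used but no Router's MAC is present is
   an assumption of the sketch rather than a rule stated in this
   document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rt2:
    """Hypothetical, simplified view of a received RT-2."""
    mac: str
    ip: str
    next_hop: str               # BGP Next Hop address
    label1: int                 # MPLS label or VNI for the MAC-VRF
    label2: int                 # MPLS label or VNI for the IP-VRF
    router_mac: Optional[str]   # from the Router's MAC Extended Community


def import_rt2(route, mac_vrf, ip_vrf, ethernet_nvo):
    """Install one RT-2 into the MAC-VRF and IP-VRF dictionaries."""
    # Bridging entry: MAC -> underlay tunnel destination and Label-1.
    mac_vrf[route.mac] = {"tunnel_dst": route.next_hop, "label": route.label1}

    # Routing entry: host IP -> underlay tunnel destination and Label-2.
    entry = {"tunnel_dst": route.next_hop, "label": route.label2}
    if ethernet_nvo:
        if route.router_mac is None:
            # Assumption of this sketch: without the Router's MAC the
            # inner Ethernet header cannot be built, so skip the entry.
            return
        entry["inner_mac_da"] = route.router_mac
    ip_vrf[route.ip] = entry


# Example with documentation addresses and made-up VNI values.
mac_vrf, ip_vrf = {}, {}
host = Rt2("00:00:5e:00:53:04", "203.0.113.4", "192.0.2.2",
           label1=10001, label2=50001, router_mac="00:00:5e:00:53:fe")
import_rt2(host, mac_vrf, ip_vrf, ethernet_nvo=True)
```

   With MPLS encapsulation (ethernet_nvo=False) the same call installs
   the IP-VRF entry without an inner MAC DA, matching the last bullet
   above.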

   If the receiving NVE receives an RT-2 with only Label-1 and only a
   single Route Target corresponding to the IP-VRF, or if it receives an
   RT-2 with only a single Route Target corresponding to the MAC-VRF but
   with both Label-1 and Label-2, or if it receives an RT-2 with a MAC
   Address Length of zero, then it MUST treat the route as withdrawn per
   the treat-as-withdraw approach of [RFC7606] and log an error message.

6.1.2 Data Plane Operation

   The following description of the data-plane operation describes just
   the logical functions and the actual implementation may differ. Let's
   consider data-plane operation when TS1 in subnet-1 (MAC-VRF1) on NVE1
   wants to send traffic to TS3 in subnet-3 (MAC-VRF3) on NVE2.

   - NVE1 receives a packet with the MAC DA corresponding to the
   MAC-VRF1 IRB interface on NVE1 (the interface between MAC-VRF1 and
   IP-VRF1), and the VLAN-tag corresponding to MAC-VRF1.

   - Upon receiving the packet, NVE1 uses the VLAN-tag to identify
   MAC-VRF1. It then looks up the MAC DA and forwards the frame to its
   IRB interface.

   - The Ethernet header of the packet is stripped and the packet is
   fed to the IP-VRF where an IP lookup is performed on the destination
   IP address. This lookup yields the outgoing NVO tunnel and the
   required encapsulation. If the encapsulation is for an Ethernet NVO
   tunnel, then it includes the egress NVE's MAC address as the inner
   MAC DA, the egress NVE's IP address (e.g., BGP Next Hop address) as
   the VTEP DA, and the VPN-ID as the VNI. The inner MAC SA and VTEP SA
   are set to the ingress NVE's MAC and IP addresses respectively. If it
   is an MPLS encapsulation, then the corresponding EVPN and LSP labels
   are added to the packet. The packet is then forwarded to the egress
   NVE.

   - On the egress NVE, if the packet arrives on an Ethernet NVO tunnel
   (e.g., it is VxLAN encapsulated), then the NVO tunnel header is
   removed.
   Since the inner MAC DA is the egress NVE's MAC address, the
   egress NVE knows that it needs to perform an IP lookup. It uses the
   VNI to identify the IP-VRF table. If the packet is MPLS encapsulated,
   then the EVPN label lookup identifies the IP-VRF table. Next, an IP
   lookup is performed for the destination TS (TS3), which results in
   the access-facing IRB interface over which the packet is sent. Before
   sending the packet over this interface, the ARP table is consulted to
   get the destination TS's MAC address.

   - The IP packet is encapsulated with an Ethernet header with the MAC
   SA set to that of the IRB interface MAC address (i.e., the IRB
   interface between MAC-VRF3 and IP-VRF1 on NVE2) and the MAC DA set to
   that of the destination TS (TS3) MAC address. The packet is sent to
   the corresponding MAC-VRF (i.e., MAC-VRF3) and, after a lookup of the
   MAC DA, is forwarded to the destination TS (TS3) over the
   corresponding interface.

   In this symmetric IRB scenario, inter-subnet traffic between NVEs
   will always use the IP-VRF VNI/MPLS label. For instance, traffic from
   TS2 to TS4 will be encapsulated by NVE1 using NVE2's IP-VRF VNI/MPLS
   label, as long as TS4's host IP is present in NVE1's IP-VRF.

6.2 IRB forwarding on NVEs for Subnets behind Tenant Systems

   This section covers the symmetric IRB procedures for the scenario
   where some Tenant Systems (TSes) support one or more subnets and
   these TSes are associated with one or more NVEs. Therefore, besides
   the advertisement of MAC/IP addresses for each TS, which can be
   multi-homed with All-Active redundancy mode, the associated NVE needs
   to also advertise the subnets statically configured on each TS.

   The main difference between this solution and the previous one is the
   additional advertisement corresponding to each subnet. These subnet
   advertisements are accomplished using the EVPN IP Prefix route
   defined in [EVPN-PREFIX].
   These subnet prefixes are advertised with the IP
   address of their associated TS (which is in overlay address space) as
   their next hop. The receiving NVEs perform recursive route resolution
   to resolve the subnet prefix with its associated ingress NVE so that
   they know which NVE to forward the packets to when they are destined
   for that subnet prefix.

   The advantage of this recursive route resolution is that when a TS
   moves from one NVE to another, there is no need to re-advertise any
   of the subnet prefixes for that TS. All that is needed is to
   advertise the IP/MAC addresses associated with the TS itself and
   exercise MAC mobility procedures for that TS. The recursive route
   resolution automatically takes care of the updates for the subnet
   prefixes of that TS.

   Figure 7 below illustrates this scenario where a given tenant (e.g.,
   an IP-VPN service) has three subnets represented by MAC-VRF1,
   MAC-VRF2, and MAC-VRF3 across two NVEs. There are five TSes
   associated with these three MAC-VRFs - i.e., TS1 and TS5 are
   connected to MAC-VRF1 on NVE1, TS2 is connected to MAC-VRF2 on NVE1,
   TS3 is connected to MAC-VRF3 on NVE2, and TS4 is connected to
   MAC-VRF1 on NVE2. TS1 has two subnet prefixes (SN1 and SN2) and TS3
   has a single subnet prefix, SN3. The MAC-VRFs on each NVE are
   associated with their corresponding IP-VRF using their IRB
   interfaces. When TS4 and TS1 exchange intra-subnet traffic, only the
   L2 forwarding (bridging) part of the IRB solution is used (i.e., the
   traffic only goes through their MAC-VRFs); however, when TS3 wants to
   forward traffic to SN1 or SN2 sitting behind TS1 (inter-subnet
   traffic), then both the bridging and routing parts of the IRB
   solution are exercised (i.e., the traffic goes through the
   corresponding MAC-VRFs and IP-VRFs).
   The following
   subsections describe the control-plane and data-plane operations for
   this IRB scenario in detail.

                        NVE1      +----------+
   SN1--+          +-------------+   |          |
        |--TS1-----|(MAC- \      |   |          |
   SN2--+  IP1/M1  | VRF1) \     |   |          |
                   |     (IP-VRF)|---|          |
                   |       /     |   |          |
           TS2-----|(MAC- /      |   |  MPLS/   |
           IP2/M2  | VRF2)       |   |  VxLAN/  |
                   +-------------+   |  NVGRE   |
                   +-------------+   |          |
   SN3--+--TS3-----|(MAC-\       |   |          |
           IP3/M3  | VRF3)\      |   |          |
                   |     (IP-VRF)|---|          |
                   |       /     |   |          |
           TS4-----|(MAC- /      |   |          |
           IP4/M4  | VRF1)       |   |          |
                   +-------------+   +----------+
                        NVE2

      Figure 7: IRB forwarding on NVEs for subnets behind TSes

6.2.1 Control Plane Operation

   Each NVE advertises a Route Type-5 (RT-5, IP Prefix Route defined in
   [EVPN-PREFIX]) for each of its subnet prefixes with the IP address of
   its TS as the next hop (gateway address field) as follows:

   - RD associated with the IP-VRF
   - ESI = 0
   - Ethernet Tag = 0
   - IP Prefix Length = 32 or 128
   - IP Prefix = SNi
   - Gateway Address = IPi; IP address of TS
   - Label = 0

   This RT-5 is advertised with one or more Route Targets that have been
   configured as "export route targets" of the IP-VRF from which the
   route is originated.

   Each NVE also advertises an RT-2 (MAC/IP Advertisement Route) along
   with its associated Route Targets and Extended Communities for each
   of its TSes exactly as described in section 6.1.1.

   Upon receiving the RT-5 advertisement, the receiving NVE performs the
   following:

   - It uses the Route Target to identify the corresponding IP-VRF.

   - It imports the IP prefix into its corresponding IP-VRF that is
   configured with an import RT that is one of the RTs being carried by
   the RT-5 route, along with the IP address of the associated TS as its
   next hop.
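
   The recursive route resolution that section 6.2 relies on can be
   sketched as follows, once an RT-5 prefix has been imported with the
   TS's IP address as its next hop. The resolve_prefix function and the
   two dictionaries are hypothetical simplifications made for this
   example; longest-prefix match is used here as the usual IP lookup
   rule and is an assumption of the sketch.

```python
import ipaddress


def resolve_prefix(prefix_routes, host_routes, destination):
    """Resolve `destination` to (NVE next hop, Router's MAC), or None.

    prefix_routes: subnet prefix -> gateway address (TS IP), from RT-5
    host_routes:   TS IP -> (BGP next hop, Router's MAC), from RT-2
    """
    dest = ipaddress.ip_address(destination)
    best_net, best_gateway = None, None
    for prefix, gateway in prefix_routes.items():
        net = ipaddress.ip_network(prefix)
        # Keep the longest matching prefix.
        if dest in net and (best_net is None
                            or net.prefixlen > best_net.prefixlen):
            best_net, best_gateway = net, gateway
    if best_gateway is None:
        return None
    # Second step of the recursion: gateway (TS IP) -> advertising NVE.
    return host_routes.get(best_gateway)


# Example with documentation addresses: a subnet behind a TS whose host
# route currently points to the NVE at 192.0.2.1.
prefix_routes = {"198.51.100.0/24": "203.0.113.1"}
host_routes = {"203.0.113.1": ("192.0.2.1", "00:00:5e:00:53:0a")}
before = resolve_prefix(prefix_routes, host_routes, "198.51.100.7")

# The TS moves: only its RT-2 host route changes, the RT-5 entry stays.
host_routes["203.0.113.1"] = ("192.0.2.2", "00:00:5e:00:53:0b")
after = resolve_prefix(prefix_routes, host_routes, "198.51.100.7")
```

   The example shows why no subnet prefix needs to be re-advertised
   after a move: updating the single host route is enough to redirect
   every prefix that resolves through it.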

   When receiving the RT-2 advertisement, the receiving NVE imports the
   MAC/IP addresses of the TS into the corresponding MAC-VRF and IP-VRF
   per section 6.1.1. When both routes exist, recursive route
   resolution is performed to resolve the IP prefix (received in RT-5)
   to its corresponding NVE's IP address (e.g., its BGP next hop). The
   BGP next hop will be used as the underlay tunnel destination address
   (e.g., VTEP DA for VxLAN encapsulation) and the Router's MAC will be
   used as the inner MAC for VxLAN encapsulation.

6.2.2 Data Plane Operation

   The following description of the data-plane operation describes just
   the logical functions; the actual implementation may differ. Let us
   consider the data-plane operation when a host on SN1 sitting behind
   TS1 wants to send traffic to a host on SN3 sitting behind TS3.

   - TS1 sends a packet with the MAC DA corresponding to the MAC-VRF1
     IRB interface of NVE1, and the VLAN-tag corresponding to MAC-VRF1.

   - Upon receiving the packet, the ingress NVE1 uses the VLAN-tag to
     identify MAC-VRF1. It then looks up the MAC DA and forwards the
     frame to its IRB interface just as in section 6.1.1.

   - The Ethernet header of the packet is stripped and the packet is
     fed to the IP-VRF, where an IP lookup is performed on the
     destination address. This lookup yields the fields needed for
     VxLAN encapsulation with NVE2's MAC address as the inner MAC DA,
     NVE2's IP address as the VTEP DA, and the VNI. The MAC SA is set
     to NVE1's MAC address and the VTEP SA is set to NVE1's IP address.

   - The packet is then encapsulated with the proper header based on
     the above info and is forwarded to the egress NVE (NVE2).

   - On the egress NVE (NVE2), assuming the packet is VxLAN
     encapsulated, the VxLAN and the inner Ethernet headers are removed
     and the resultant IP packet is fed to the IP-VRF associated with
     that VNI.

   - Next, a lookup is performed based on the IP DA (which is in SN3)
     in the associated IP-VRF of NVE2. The IP lookup yields the
     access-facing IRB interface over which the packet needs to be
     sent. Before sending the packet over this interface, the ARP table
     is consulted to get the MAC address of the destination TS (TS3).

   - The IP packet is encapsulated with an Ethernet header with the MAC
     SA set to that of the access-facing IRB interface of the egress
     NVE (NVE2) and the MAC DA set to the MAC address of the
     destination TS (TS3). The packet is sent to the corresponding
     MAC-VRF3 and, after a lookup of the MAC DA, is forwarded to the
     destination TS (TS3) over the corresponding interface.

7 Acknowledgements

   The authors would like to thank Sami Boutros, Jeffrey Zhang,
   Krzysztof Szarkowicz, and Neeraj Malhotra for their valuable
   comments. The authors would also like to thank Linda Dunbar for her
   engaging discussions.

8 Security Considerations

   This document describes a set of procedures for Inter-Subnet
   Forwarding of tenant traffic across PEs (or NVEs). These procedures
   include both layer-2 forwarding and layer-3 routing on a packet by
   packet basis. The security considerations for layer-2 forwarding in
   this document follow those of [RFC7432] for MPLS encapsulation and
   those of [RFC8365] for VxLAN or GENEVE encapsulations.

   Furthermore, the security considerations for layer-3 routing in this
   document follow those of [RFC4365], with the exception of the
   application of routing protocols between CEs and PEs. Contrary to
   [RFC4364], this document does not describe route distribution
   techniques between CEs and PEs, but rather considers the CEs as TSes
   or VAs that do not run dynamic routing protocols.
   This can be considered a security advantage, since dynamic routing
   protocols can be blocked on the NVE/PE ACs, not allowing the tenant
   to interact with the infrastructure's dynamic routing protocols.

   In this document, the RT-5 is used for certain scenarios. This route
   uses an Overlay Index that requires a recursive resolution to a
   different EVPN route (an RT-2). Because of this, it is worth noting
   that any action that ends up filtering or modifying the RT-2 route
   used to convey the Overlay Index will modify the resolution of the
   RT-5 and therefore the forwarding of packets to the remote subnet.

9 IANA Considerations

   IANA has allocated a new transitive extended community Type of 0x06
   and Sub-Type of 0x03 for EVPN Router's MAC Extended Community.

10 References

10.1 Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC2119
             Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
             2017.

   [RFC7432] Sajassi et al., "BGP MPLS Based Ethernet VPN", RFC 7432,
             February 2015.

   [RFC8365] Sajassi et al., "A Network Virtualization Overlay Solution
             Using Ethernet VPN (EVPN)", RFC 8365, March 2018.

   [TUNNEL-ENCAP] Rosen et al., "The BGP Tunnel Encapsulation
             Attribute", draft-ietf-idr-tunnel-encaps-03, November
             2016.

   [EVPN-PREFIX] Rabadan et al., "IP Prefix Advertisement in EVPN",
             draft-ietf-bess-evpn-prefix-advertisement-03, September
             2016.

10.2 Informative References

   [RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel,
             "Revised Error Handling for BGP UPDATE Messages", RFC
             7606, August 2015.

   [802.1Q]  "IEEE Standard for Local and metropolitan area networks -
             Media Access Control (MAC) Bridges and Virtual Bridged
             Local Area Networks", IEEE Std 802.1Q(tm), 2014 Edition,
             November 2014.

   [RFC7348] Mahalingam, M., et al., "Virtual eXtensible Local Area
             Network (VXLAN): A Framework for Overlaying Virtualized
             Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI
             10.17487/RFC7348, August 2014.

   [GENEVE]  Gross, J., et al., "Geneve: Generic Network Virtualization
             Encapsulation", Work in Progress,
             draft-ietf-nvo3-geneve-06, March 2018.

   [IRB-EXT-MOBILITY] Malhotra, N., et al., "Extended Mobility
             Procedures for EVPN-IRB", Work in Progress,
             draft-malhotra-bess-evpn-irb-extended-mobility-02,
             February 2018.

11 Contributors

   In addition to the authors listed on the front page, the following
   co-authors have also contributed to this document:

   Florin Balus
   Cisco

   Yakov Rekhter
   Juniper

   Wim Henderickx
   Nokia

   Lucy Yong
   Linda Dunbar
   Huawei

   Dennis Cai
   Alibaba

Authors' Addresses

   Ali Sajassi (Editor)
   Cisco
   Email: sajassi@cisco.com

   Samer Salam
   Cisco
   Email: sslam@cisco.com

   Samir Thoria
   Cisco
   Email: sthoria@cisco.com

   John E. Drake
   Juniper
   Email: jdrake@juniper.net

   Jorge Rabadan
   Nokia
   Email: jorge.rabadan@nokia.com