L2VPN Workgroup                                          A. Sajassi, Ed.
INTERNET-DRAFT                                                  S. Salam
Intended Status: Standards Track                               S. Thoria
                                                                   Cisco
                                                                J. Drake
                                                                 Juniper
                                                              J. Rabadan
                                                                   Nokia

Expires: January 18, 2019                                  July 18, 2018


                Integrated Routing and Bridging in EVPN
            draft-ietf-bess-evpn-inter-subnet-forwarding-05

Abstract

   EVPN provides an extensible and flexible multi-homing VPN solution
   over an MPLS/IP network for intra-subnet connectivity among Tenant
   Systems and End Devices that can be physical or virtual. However,
   some scenarios require dynamic and efficient inter-subnet
   connectivity among these Tenant Systems and End Devices while
   maintaining the multi-homing capabilities of EVPN.
   This document describes an Integrated Routing and Bridging (IRB)
   solution based on EVPN to address such requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1  Introduction
   2  EVPN PE Model for IRB Operation
   3  Symmetric and Asymmetric IRB
     3.1  IRB Interface and its MAC & IP addresses
     3.2  Symmetric IRB Procedures
       3.2.1  Control Plane - Ingress PE
       3.2.2  Control Plane - Egress PE
       3.2.3  Data Plane - Ingress PE
       3.2.4  Data Plane - Egress PE
     3.3  Asymmetric IRB Procedures
       3.3.1  Control Plane - Ingress PE
       3.3.2  Control Plane - Egress PE
       3.3.3  Data Plane - Ingress PE
       3.3.4  Data Plane - Egress PE
   4  Mobility Procedure
     4.1  Mobility Procedure for Symmetric IRB
       4.1.1  Initiating an ARP Request upon a Move
       4.1.2  Sending Data Traffic without an ARP Request
       4.1.3  Silent Host
   5  BGP Encoding
     5.1  Router's MAC Extended Community
   6  Operational Models for Symmetric Inter-Subnet Forwarding
     6.1  IRB forwarding on NVEs for Tenant Systems
       6.1.1  Control Plane Operation
       6.1.2  Data Plane Operation
     6.2  IRB forwarding on NVEs for Subnets behind Tenant Systems
       6.2.1  Control Plane Operation
       6.2.2  Data Plane Operation
   7  Acknowledgements
   8  Security Considerations
   9  IANA Considerations
   10  References
     10.1  Normative References
     10.2  Informative References
   11  Contributors
   Authors' Addresses

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   AC: Attachment Circuit.

   ARP: Address Resolution Protocol.

   BD: Broadcast Domain. As per [RFC7432], an EVI consists of a single
   or multiple BDs. In case of VLAN-bundle and VLAN-based service models
   (see [RFC7432]), a BD is equivalent to an EVI. In case of VLAN-aware
   bundle service model, an EVI contains multiple BDs. Also, in this
   document, BD and subnet are equivalent terms.

   BD Route Target: refers to the Broadcast Domain assigned Route Target
   [RFC4364]. In case of VLAN-aware bundle service model, all the BD
   instances in the MAC-VRF share the same Route Target.

   BT: Bridge Table. The instantiation of a BD in a MAC-VRF, as per
   [RFC7432].

   DGW: Data Center Gateway.

   Ethernet A-D route: Ethernet Auto-Discovery (A-D) route, as per
   [RFC7432].

   Ethernet NVO tunnel: refers to Network Virtualization Overlay tunnels
   with Ethernet payload. Examples of this type of tunnel are VXLAN and
   GENEVE.

   EVI: EVPN Instance spanning the NVE/PE devices that are participating
   in that EVPN, as per [RFC7432].

   EVPN: Ethernet Virtual Private Network, as per [RFC7432].

   GENEVE: Generic Network Virtualization Encapsulation, [GENEVE].

   GRE: Generic Routing Encapsulation.

   GW IP: Gateway IP Address.

   IPL: IP Prefix Length.

   IP NVO tunnel: refers to Network Virtualization Overlay tunnels with
   IP payload (no MAC header in the payload).

   IP-VRF: A VPN Routing and Forwarding table for IP routes on an
   NVE/PE. The IP routes could be populated by EVPN and IP-VPN address
   families. An IP-VRF is also an instantiation of a layer 3 VPN in an
   NVE/PE.

   IRB: Integrated Routing and Bridging interface. It connects an IP-VRF
   to a BD (or subnet).

   MAC-VRF: A Virtual Routing and Forwarding table for Media Access
   Control (MAC) addresses on an NVE/PE, as per [RFC7432]. A MAC-VRF is
   also an instantiation of an EVI in an NVE/PE.

   ML: MAC address length.

   ND: Neighbor Discovery Protocol.

   NVE: Network Virtualization Edge.

   NVO: Network Virtualization Overlays.

   RT-2: EVPN route type 2, i.e., MAC/IP Advertisement route, as defined
   in [RFC7432].

   RT-5: EVPN route type 5, i.e., IP Prefix route, as defined in Section
   3 of [EVPN-PREFIX].

   SBD: Supplementary Broadcast Domain. A BD that does not have any ACs,
   only IRB interfaces, and that is used to provide connectivity among
   all the IP-VRFs of the tenant. The SBD is only required in
   IP-VRF-to-IP-VRF use-cases (see Section 4.4).

   SN: Subnet.

   TS: Tenant System.

   VA: Virtual Appliance.

   VNI: Virtual Network Identifier. As in [RFC8365], the term is used as
   a representation of a 24-bit NVO instance identifier, with the
   understanding that VNI will refer to a VXLAN Network Identifier in
   VXLAN, or Virtual Network Identifier in GENEVE, etc., unless it is
   stated otherwise.

   VTEP: VXLAN Tunnel End Point, as in [RFC7348].

   VXLAN: Virtual Extensible LAN, as in [RFC7348].

   This document also assumes familiarity with the terminology of
   [RFC7432], [RFC8365] and [RFC7365].

1  Introduction

   EVPN provides an extensible and flexible multi-homing VPN solution
   over an MPLS/IP network for intra-subnet connectivity among Tenant
   Systems (TS's) and End Devices that can be physical or virtual, where
   an IP subnet is represented by an EVI for a VLAN-based service or by
   an (EVI, VLAN) pair for a VLAN-aware bundle service. However, some
   scenarios require dynamic and efficient inter-subnet connectivity
   among these Tenant Systems and End Devices while maintaining the
   multi-homing capabilities of EVPN. This document describes an
   Integrated Routing and Bridging (IRB) solution based on EVPN to
   address such requirements.

   Inter-subnet communication is traditionally achieved at centralized
   L3 Gateway (L3GW) devices, where all inter-subnet forwarding is
   performed and all inter-subnet communication policies are enforced.
   When two TS's that belong to different subnets but are connected to
   the same PE node want to communicate with each other, their traffic
   has to be back-hauled from the PE node all the way to the centralized
   gateway node, where inter-subnet switching is performed, and then
   back to the PE node. For today's large multi-tenant data centers,
   this scheme is very inefficient and sometimes impractical.

   To overcome this drawback of the centralized L3GW approach, IRB
   functionality is needed on the PE nodes (also referred to as EVPN
   NVEs) attached to TS's in order to avoid inefficient forwarding of
   tenant traffic (i.e., avoid back-hauling and hair-pinning).
   When a PE with IRB capability receives tenant traffic over a single
   Attachment Circuit (AC), it can not only locally bridge the tenant's
   intra-subnet traffic but also locally route the tenant's inter-subnet
   traffic on a packet-by-packet basis, thus meeting the requirements
   for both intra-subnet and inter-subnet forwarding and avoiding the
   non-optimal forwarding associated with the centralized L3GW approach.

   Some TS's run non-IP protocols in conjunction with their IP traffic.
   It is therefore important to handle both kinds of traffic optimally,
   e.g., to bridge non-IP and intra-subnet traffic and to route
   inter-subnet IP traffic. The solution needs to meet the following
   requirements:

   R1: The solution MUST allow for both inter-subnet and intra-subnet
   traffic belonging to the same tenant to be locally routed and bridged
   respectively. The solution MUST provide IP routing for inter-subnet
   traffic and Ethernet bridging for intra-subnet traffic.

   R2: The solution MUST support bridging for non-IP traffic.

   R3: The solution MUST allow inter-subnet switching to be disabled on
   a per-VLAN basis on PEs where the traffic needs to be back-hauled to
   another node (e.g., for performing FW or DPI functionality).

2  EVPN PE Model for IRB Operation

   Since this document discusses IRB operation in relationship to EVPN
   MAC-VRF, IP-VRF, EVI, Broadcast Domain (BD), Bridge Table (BT), and
   IRB interfaces, it is important to understand the relationship among
   these components. The PE model below a) describes these components
   and b) illustrates the relationship among them.

   +-------------------------------------------------------------+
   |                                                             |
   |              +------------------+                    IRB PE |
   | Attachment   | +------------------+                         |
   | Circuit(AC1) | |  +----------+    |                 MPLS/NVO tnl
 ----------------------*Bridge    |    |                    +-----
   |              | |  |Table(BT1)|    |    +-----------+  / \    \
   |              | |  |          *---------*           |<--> |Eth|
   |              | |  |Eth-Tag x |    |IRB1|           |  \ /    /
   |              | |  +----------+    |    |           |   +-----
   |              | |     ...          |    |  IP-VRF1  |        |
   |              | |  +----------+    |    |  RD2/RT2  |MPLS/NVO tnl
   |              | |  |Bridge    |    |    |           |   +-----
   |              | |  |Table(BT2)|    |IRB2|           |  / \    \
   |              | |  |          *---------*           |<--> |IP |
 ----------------------*Eth-Tag y |    |    +-----------+  \ /    /
   |  AC2         | |  +----------+    |                    +-----
   |              | |    MAC-VRF1      |                         |
   |              +-+    RD1/RT1       |                         |
   |                +------------------+                         |
   |                                                             |
   |                                                             |
   +-------------------------------------------------------------+

                      Figure 1: EVPN IRB PE Model

   A tenant that needs IRB services on a PE requires an IP Virtual
   Routing and Forwarding table (IP-VRF) along with one or more MAC
   Virtual Routing and Forwarding tables (MAC-VRFs). An IP-VRF, as
   defined in [RFC4364], is the instantiation of an IP-VPN in a PE. A
   MAC-VRF, as defined in [RFC7432], is the instantiation of an EVI
   (EVPN Instance) in a PE. A MAC-VRF can consist of one or more Bridge
   Tables (BTs), where each BT corresponds to a VLAN (broadcast domain -
   BD). If service interfaces for an EVPN PE are configured in
   VLAN-Based mode (see Section 6.1 of [RFC7432]), then there is only a
   single BT per MAC-VRF (per EVI) - i.e., there is only one tenant VLAN
   per EVI. However, if service interfaces for an EVPN PE are configured
   in VLAN-Aware Bundle mode (see Section 6.3 of [RFC7432]), then there
   are several BTs per MAC-VRF (per EVI) - i.e., there are several
   tenant VLANs per EVI.

   Each BT is connected to an IP-VRF via an L3 interface called an IRB
   interface.
   Since a single tenant subnet is typically (and in this document)
   represented by a VLAN (and thus supported by a single BT), for a
   given tenant there are as many BTs as there are subnets, and thus
   as many IRB interfaces between the tenant's IP-VRF and the associated
   BTs, as shown in the PE model above.

   An IP-VRF is identified by its corresponding route target and route
   distinguisher, and a MAC-VRF is likewise identified by its
   corresponding route target and route distinguisher. If operating in
   EVPN VLAN-Based mode, a receiving PE that receives an EVPN route with
   a MAC-VRF route target can identify the corresponding BT from the
   route target alone; however, if operating in EVPN VLAN-Aware Bundle
   mode, the receiving PE needs both the MAC-VRF route target and the
   VLAN ID in order to identify the corresponding BT.

3  Symmetric and Asymmetric IRB

   This document defines and describes two types of IRB solutions,
   namely symmetric and asymmetric IRB. In symmetric IRB, as its name
   implies, the lookup operation is symmetric at the ingress and egress
   PEs - i.e., both ingress and egress PEs perform lookups on both MAC
   and IP addresses. The ingress PE performs a MAC lookup followed by an
   IP lookup, and the egress PE performs an IP lookup followed by a MAC
   lookup, as depicted in Figure 2.

                 Ingress PE                   Egress PE
            +-------------------+        +------------------+
            |                   |        |                  |
            |    +-> IP-VRF ----|---->---|-----> IP-VRF -+  |
            |    |              |        |               |  |
            |   BT1        BT2  |        |  BT3         BT2 |
            |    |              |        |               |  |
            |    ^              |        |               v  |
            |    |              |        |               |  |
            +-------------------+        +------------------+
                 ^                                       |
                 |                                       |
           TS1->-+                                       +->-TS2

                        Figure 2: Symmetric IRB

   In symmetric IRB, as shown in Figure 2, the inter-subnet forwarding
   between two PEs is done between their associated IP-VRFs.
   Therefore, the tunnel connecting these IP-VRFs can be either an
   IP-only tunnel (in case of MPLS or GENEVE encapsulation) or an
   Ethernet NVO tunnel (in case of VXLAN encapsulation). If it is an
   Ethernet NVO tunnel, the TS's IP packet is encapsulated in an
   Ethernet header consisting of the ingress and egress PEs' MAC
   addresses - i.e., there is no need for the ingress PE to use the
   destination TS's MAC address. Therefore, in symmetric IRB, there is
   no need for the ingress PE to maintain ARP entries associating the
   destination TS's IP and MAC addresses in its ARP table. Each PE
   participating in symmetric IRB only maintains ARP entries for locally
   connected hosts and maintains MAC-VRFs/BTs for only locally
   configured subnets.

   In asymmetric IRB, the lookup operation is asymmetric: the ingress PE
   performs three lookups, whereas the egress PE performs a single
   lookup. That is, the ingress PE performs a MAC lookup, followed by an
   IP lookup, followed by a MAC lookup again, whereas the egress PE
   performs just a single MAC lookup, as depicted in Figure 3 below.

                 Ingress PE                   Egress PE
            +-------------------+        +------------------+
            |                   |        |                  |
            |    +-> IP-VRF ->  |        |      IP-VRF      |
            |    |           |  |        |                  |
            |   BT1        BT2  |        |  BT3         BT2 |
            |    |           |  |        |              | | |
            |    |           +--|--->----|--------------+ | |
            |    |              |        |                v |
            +-------------------+        +----------------|-+
                 ^                                        |
                 |                                        |
           TS1->-+                                        +->-TS2

                        Figure 3: Asymmetric IRB

   In asymmetric IRB, as shown in Figure 3, the inter-subnet forwarding
   between two PEs is done between their associated MAC-VRFs/BTs.
   Therefore, the MPLS or NVO tunnel used for inter-subnet forwarding
   MUST be of type Ethernet. Since only a MAC lookup is performed at the
   egress PE (i.e., no IP lookup), the TS's IP packets need to be
   encapsulated with the destination TS's MAC address.
   In order for the ingress PE to perform such encapsulation, it needs
   to maintain the TS's IP and MAC address association in its ARP table.
   Furthermore, it needs to maintain the destination TS's MAC address in
   the corresponding BT even though it may not have any TS of the
   corresponding subnet locally attached. In other words, each PE
   participating in asymmetric IRB MUST maintain ARP entries for remote
   hosts (hosts connected to other PEs) as well as MAC-VRFs/BTs for
   subnets that may not be locally present on that PE.

   The following subsections define the control and data plane
   procedures for symmetric and asymmetric IRB on ingress and egress
   PEs. The following figure is used in the description of these
   procedures; it shows a single IP-VRF and a number of BTs on each PE
   for a given tenant. The IP-VRF of the tenant (i.e., IP-VRF1) is
   connected to each BT via its associated IRB interface. Each BT on a
   PE is associated with a unique VLAN (i.e., with a BD), which in turn
   is associated with a single MAC-VRF in the case of VLAN-Based mode;
   in the case of VLAN-Aware Bundle mode, a number of BTs can be
   associated with a single MAC-VRF. Whether the service interface on a
   PE is in VLAN-Based or VLAN-Aware Bundle mode does not impact the IRB
   operation and procedures. It only impacts the setting of the Ethernet
   Tag field in EVPN BGP routes as described in [RFC7432].

                PE 1         +---------+
          +-------------+    |         |
  TS1-----|         MACx|    |         |        PE2
 (IP1/M1) |(BT1)        |    |         |   +-------------+
  TS5-----|      \      |    |  MPLS/  |   |MACy  (BT3)  |-----TS3
 (IP5/M5) |Mx/IPx \     |    |  VXLAN/ |   |     /       | (IP3/M3)
          |    (IP-VRF1)|----|  NVGRE  |---|(IP-VRF1)    |
          |       /     |    |         |   |     \       |
  TS2-----|(BT2) /      |    |         |   |      (BT1)  |-----TS4
 (IP2/M2) |             |    |         |   |             | (IP4/M4)
          +-------------+    |         |   +-------------+
                             |         |
                             +---------+

                        Figure 4: IRB forwarding

3.1  IRB Interface and its MAC & IP addresses

   To support inter-subnet forwarding on a PE, the PE acts as an IP
   default gateway from the perspective of the attached Tenant Systems.
   The default gateway MAC and IP addresses are configured on each IRB
   interface associated with its subnet, using one of the following two
   options:

   1. All the PEs for a given tenant subnet use the same anycast default
   gateway IP and MAC addresses. On each PE, these default gateway IP
   and MAC addresses correspond to the IRB interface connecting the BT
   associated with the tenant's subnet to the corresponding tenant's
   IP-VRF.

   2. Each PE for a given tenant subnet uses the same anycast default
   gateway IP address but its own MAC address. These MAC addresses are
   aliased to the same anycast default gateway IP address through the
   use of the Default Gateway extended community as specified in
   [RFC7432], which is carried in the EVPN MAC/IP Advertisement routes.
   On each PE, this default gateway IP address along with its associated
   MAC addresses corresponds to the IRB interface connecting the BT
   associated with the tenant's subnet to the corresponding tenant's
   IP-VRF.

   It is worth noting that if the applications running on the TS's
   employ or rely on any form of MAC security, then either the first
   option (i.e., using the anycast MAC address) should be used to ensure
   that the applications receive traffic from the same IRB interface MAC
   address that they are sending to, or, if the second option is used,
   the IRB interface MAC address MUST be the one used in the initial ARP
   reply for that TS.

   Although both options are equally applicable to symmetric and
   asymmetric IRB, option 1 is recommended because of the ease of
   provisioning the anycast MAC address, not only on the IRB interface
   associated with a given subnet across all the PEs corresponding to
   that EVI but also on all IRB interfaces associated with all the
   tenant's subnets across all the PEs corresponding to all the EVIs for
   that tenant. Furthermore, it simplifies operation, as there is no
   need for Default Gateway extended community advertisement and its
   associated MAC aliasing procedure. Yet another advantage is that,
   following host mobility, the host does not need to refresh its
   default gateway ARP entry.

   If option 1 is used, an implementation MAY choose to auto-derive the
   anycast MAC address. If auto-derivation is used, the anycast MAC MUST
   be auto-derived out of the following ranges (which are defined in
   [RFC5798]):

   - Anycast IPv4 IRB case: 00-00-5E-00-01-{VRID} (in hex, in Internet
   standard bit-order)

   - Anycast IPv6 IRB case: 00-00-5E-00-02-{VRID} (in hex, in Internet
   standard bit-order)

   Here the last octet is generated from a configurable Virtual Router
   ID (VRID, range 1-255). If not explicitly configured, the default
   value for the VRID octet is '01'. Auto-derivation of the anycast MAC
   can only be used if there is certainty that the auto-derived MAC
   does not collide with any customer MAC address.
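
   The auto-derivation rule above can be illustrated with a short,
   non-normative sketch. The function name and its parameters are
   hypothetical and chosen only for this example; the octet values and
   the default VRID of 1 are taken from the text above.

```python
def derive_anycast_mac(vrid: int = 1, ipv6: bool = False) -> str:
    """Return the auto-derived anycast IRB MAC address for a VRID.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # Fifth octet: 01 for the IPv4 IRB case, 02 for the IPv6 IRB case.
    family_octet = 0x02 if ipv6 else 0x01
    octets = (0x00, 0x00, 0x5E, 0x00, family_octet, vrid)
    return "-".join(f"{octet:02X}" for octet in octets)


print(derive_anycast_mac())                   # default VRID, IPv4 case
print(derive_anycast_mac(vrid=10, ipv6=True))  # VRID 10, IPv6 case
```

   With the default VRID this yields 00-00-5E-00-01-01 for the IPv4 case.
   The sketch does not check for collisions with customer MAC addresses,
   which the text above requires before auto-derivation is used.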

   In addition to IP anycast addresses, IRB interfaces can be configured
   with non-anycast IP addresses for the purpose of OAM (such as
   traceroute/ping to these interfaces) for both symmetric and
   asymmetric IRB. These IP addresses need to be distributed as VPN
   routes when the PEs are operating in symmetric IRB mode. However,
   they don't need to be distributed if the PEs are operating in
   asymmetric IRB mode and the non-anycast IP addresses are configured
   with individual MACs.

   Irrespective of whether only the anycast address or both anycast and
   non-anycast addresses are used on the same IRB interface, when a TS
   sends an ARP request to the PE that it is attached to, the ARP
   request is sent for the anycast IP address of the IRB interface
   associated with the TS's subnet. For example, in Figure 4, TS1 is
   configured with the anycast IPx address as its default gateway IP
   address; thus, when it sends an ARP request for IPx (the anycast IP
   address of the IRB interface for BT1), PE1 sends an ARP reply with
   MACx, which is the anycast MAC address of that IRB interface. Traffic
   routed from IP-VRF1 to TS1 SHOULD use the anycast MAC address as the
   source MAC address.

3.2  Symmetric IRB Procedures

3.2.1  Control Plane - Ingress PE

   When a PE (e.g., PE1 in Figure 4 above) learns the MAC and IP
   addresses of a TS (via an ARP request), it adds the MAC address to
   the corresponding MAC-VRF/BT of that tenant's subnet and adds the IP
   address to the IP-VRF for that tenant. Furthermore, it adds this TS's
   MAC and IP address association to its ARP table. It then builds an
   EVPN MAC/IP Advertisement route (type 2) as follows and advertises it
   to other PEs participating in that tenant's VPN.

   - The Length field of the BGP EVPN NLRI for an EVPN MAC/IP
   Advertisement route MUST be either 40 (if an IPv4 address is carried)
   or 52 (if an IPv6 address is carried).

   - Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet Tag
   ID, MAC Address Length, MAC Address, IP Address Length, IP Address,
   and MPLS Label1 fields MUST be set per [RFC7432] and [RFC8365].

   - The MPLS Label2 field is set to either an MPLS label or a VNI
   corresponding to the tenant's IP-VRF. In case of an MPLS label, this
   field is encoded as 3 octets, where the high-order 20 bits contain
   the label value.

   Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length,
   MAC Address, IP Address Length, and IP Address fields are part of
   the route key used by BGP to compare routes. The rest of the fields
   are not part of the route key.

   This route is advertised along with the following two extended
   communities:

   1) Tunnel Type Extended Community
   2) Router's MAC Extended Community

   For symmetric IRB mode, the Router's MAC Extended Community is needed
   to carry the PE's overlay MAC address (e.g., the inner MAC address in
   NVO encapsulation), which is used for IP-VRF to IP-VRF communication
   over an Ethernet NVO tunnel. If an MPLS or IP-only NVO tunnel is
   used, then there is no need to send the Router's MAC Extended
   Community along with this route.

   This route MUST be advertised with two route targets - one
   corresponding to the MAC-VRF of the tenant's subnet and another
   corresponding to the tenant's IP-VRF.

3.2.2  Control Plane - Egress PE

   When a PE (e.g., PE2 in Figure 4 above) receives this EVPN MAC/IP
   Advertisement route, it performs the following:

   - Using the MAC-VRF route target (and the Ethernet Tag if different
   from zero), it identifies the corresponding MAC-VRF (and BT). If the
   MAC-VRF (and BT) exists (e.g., it is locally configured), then it
   imports the MAC address into it. Otherwise, it does not import the
   MAC address.

   - Using the IP-VRF route target, it identifies the corresponding
   IP-VRF and imports the IP address into it.
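
   The two import steps above can be sketched as follows. This is a
   minimal, non-normative illustration: the dictionary layout, the
   function name, the route target strings, and the "next_hop" field are
   all assumptions made for the example and are not defined by this
   document.

```python
def import_mac_ip_route(route, mac_vrfs, ip_vrfs):
    """Import one received MAC/IP Advertisement route (symmetric IRB).

    Hypothetical data layout, for illustration only:
      route:    {"route_targets": [...], "eth_tag": int,
                 "mac": str, "ip": str, "next_hop": str}
      mac_vrfs: {route_target: {eth_tag: bridge_table_dict}}
      ip_vrfs:  {route_target: ip_vrf_dict}
    Returns the list of table types that were updated.
    """
    updated = []
    for rt in route["route_targets"]:
        # MAC-VRF route target (plus Ethernet Tag) selects the BT.
        # The MAC is imported only if that MAC-VRF/BT exists locally.
        bt = mac_vrfs.get(rt, {}).get(route["eth_tag"])
        if bt is not None:
            bt[route["mac"]] = route["next_hop"]
            updated.append("MAC-VRF")
        # IP-VRF route target selects the IP-VRF for the IP address.
        ip_vrf = ip_vrfs.get(rt)
        if ip_vrf is not None:
            ip_vrf[route["ip"]] = route["next_hop"]
            updated.append("IP-VRF")
    return updated


# PE2 in Figure 4: BT1 and BT3 are configured locally, BT2 is not.
mac_vrfs = {"RT-BT1": {0: {}}, "RT-BT3": {0: {}}}
ip_vrfs = {"RT-IPVRF1": {}}
ts2_route = {"route_targets": ["RT-BT2", "RT-IPVRF1"], "eth_tag": 0,
             "mac": "M2", "ip": "IP2", "next_hop": "PE1"}
print(import_mac_ip_route(ts2_route, mac_vrfs, ip_vrfs))
```

   In this example, PE2 has no MAC-VRF for TS2's subnet, so only the IP
   address is imported (into IP-VRF1), which matches the behavior
   described in the first bullet.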

   The inclusion of the MPLS Label2 field in this route signals to the
   receiving PE that this route is for symmetric IRB mode and that MPLS
   Label2 needs to be installed in the forwarding path to identify the
   corresponding IP-VRF.

   If the receiving PE receives this route with both the MAC-VRF and
   IP-VRF route targets but the MAC/IP Advertisement route does not
   include the MPLS Label2 field, and if the receiving PE supports
   asymmetric IRB mode, then the receiving PE installs the MAC address
   in the corresponding MAC-VRF and the MAC and IP address association
   in the ARP table for that tenant (identified by the corresponding
   IP-VRF route target).

   If the receiving PE receives this route with both the MAC-VRF and
   IP-VRF route targets but the MAC/IP Advertisement route does not
   include the MPLS Label2 field, and if the receiving PE does not
   support asymmetric IRB mode, then, if it has the corresponding
   MAC-VRF, it only imports the MAC address; otherwise, if it doesn't
   have the corresponding MAC-VRF, it MUST treat the route as withdrawn
   [RFC7606] and log an error message.

   If the receiving PE receives this route with both the MAC-VRF and
   IP-VRF route targets and the MAC/IP Advertisement route includes the
   MPLS Label2 field, but the receiving PE only supports asymmetric IRB
   mode, then the receiving PE MUST ignore the MPLS Label2 field and
   install the MAC address in the corresponding MAC-VRF and the MAC and
   IP address association in the ARP table for that tenant (identified
   by the corresponding IP-VRF route target).

3.2.3  Data Plane - Ingress PE

   When an Ethernet frame is received by an ingress PE (e.g., PE1 in
   Figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify
   the associated MAC-VRF/BT and performs a lookup on the destination
   MAC address. If the MAC address corresponds to its IRB interface MAC
   address, the ingress PE deduces that the packet must be inter-subnet
   routed.
   Hence, the ingress PE performs an IP lookup in the associated IP-VRF
   table. The lookup identifies the BGP next hop of the egress PE along
   with the tunnel/encapsulation type and the associated MPLS/VNI
   values.

   If the tunnel type is MPLS or IP-only NVO, then the TS's IP packet is
   sent over the tunnel without any Ethernet header. However, if the
   tunnel type is Ethernet NVO, then an Ethernet header needs to be
   added to the TS's IP packet. The source MAC address of this inner
   Ethernet header is set to the ingress PE's router MAC address, and
   the destination MAC address of this inner Ethernet header is set to
   the egress PE's router MAC address. The MPLS VPN label or VNI fields
   are set accordingly and the packet is forwarded to the egress PE.

   In case of NVO tunnel encapsulation, the outer source and destination
   IP addresses are set to the ingress and egress PE BGP next-hop IP
   addresses respectively.

3.2.4  Data Plane - Egress PE

   When the tenant's MPLS or NVO encapsulated packet is received over an
   MPLS or NVO tunnel by the egress PE, the egress PE removes the NVO
   tunnel encapsulation and uses the VPN MPLS label (for MPLS
   encapsulation) or VNI (for NVO encapsulation) to identify the IP-VRF
   in which the IP lookup needs to be performed. If the VPN MPLS label
   or VNI identifies a MAC-VRF instead of an IP-VRF, then the procedures
   in Section 3.3.4 for asymmetric IRB are executed.

   The lookup in the IP-VRF identifies a local adjacency to the IRB
   interface associated with the egress subnet's MAC-VRF/BT.

   The egress PE gets the destination TS's MAC address for that TS's IP
   address from its ARP table. It then encapsulates the packet with that
   destination MAC address and a source MAC address corresponding to
   that IRB interface, and sends the packet to its destination subnet
   MAC-VRF/BT.
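
   The egress steps described so far in this section can be sketched as
   follows. This is a non-normative illustration under stated
   assumptions: the table layouts, the function name, the VNI value
   5001, and the example addresses are all hypothetical, and the prefix
   match is a simple first-match scan rather than the longest-prefix
   match a real IP-VRF would perform.

```python
import ipaddress


def egress_symmetric_irb(vni, dst_ip, ip_vrfs, arp_table):
    """Return the Ethernet header fields and target BT for a routed packet.

    Hypothetical data layout, for illustration only:
      ip_vrfs:   {vni_or_label: {subnet_prefix: {"bt": str, "irb_mac": str}}}
      arp_table: {ts_ip: ts_mac} for locally attached TS's
    """
    ip_vrf = ip_vrfs[vni]                  # label/VNI identifies the IP-VRF
    addr = ipaddress.ip_address(dst_ip)
    for prefix, irb in ip_vrf.items():     # IP lookup: find the IRB interface
        if addr in ipaddress.ip_network(prefix):
            return {
                "bt": irb["bt"],               # egress subnet's MAC-VRF/BT
                "src_mac": irb["irb_mac"],     # IRB interface MAC address
                "dst_mac": arp_table[dst_ip],  # TS MAC from the ARP table
            }
    raise LookupError(f"no local IRB interface for {dst_ip}")


ip_vrfs = {5001: {"10.0.3.0/24": {"bt": "BT3",
                                  "irb_mac": "00-00-5E-00-01-01"}}}
arp_table = {"10.0.3.3": "M3"}
print(egress_symmetric_irb(5001, "10.0.3.3", ip_vrfs, arp_table))
```

   The returned source and destination MAC addresses form the Ethernet
   header with which the packet is handed to the destination subnet's
   MAC-VRF/BT.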

The destination MAC address lookup in the MAC-VRF/BT results in a local adjacency (e.g., local interface) over which the Ethernet frame is sent.

3.3 Asymmetric IRB Procedures

3.3.1 Control Plane - Ingress PE

When a PE (e.g., PE1 in figure 4 above) learns the MAC and IP address of a TS (e.g., via an ARP request), it populates its MAC-VRF/BT, IP-VRF, and ARP table just as in the case for symmetric IRB. It then builds an EVPN MAC/IP Advertisement route (type 2) as follows and advertises it to other PEs participating in that tenant's VPN.

- The Length field of the BGP EVPN NLRI for an EVPN MAC/IP Advertisement route MUST be either 37 (if an IPv4 address is carried) or 49 (if an IPv6 address is carried).

- Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, IP Address, and MPLS Label1 fields MUST be set per [RFC7432] and [RFC8365].

- The MPLS Label2 field MUST NOT be included in this route.

Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, and IP Address fields are part of the route key used by BGP to compare routes. The rest of the fields are not part of the route key.

This route is advertised along with the following extended community:

1) Tunnel Type Extended Community

For asymmetric IRB mode, the Router's MAC EC is not needed because forwarding is performed using the destination TS's MAC address, which is carried in this EVPN route type-2 advertisement.

This route MUST always be advertised with the MAC-VRF route target. It MAY also be advertised with a second route target corresponding to the IP-VRF. If only the MAC-VRF route target is used, then the receiving PE uses the MAC-VRF route target to identify the corresponding IP-VRF - i.e., many MAC-VRF route targets map to the same IP-VRF for a given tenant.
Since in this asymmetric IRB mode each PE is configured with every BD for a tenant, the MAC-VRF route target has the same reachability as the IP-VRF route target, and that is why the use of the IP-VRF route target is optional for this IRB mode.

3.3.2 Control Plane - Egress PE

When a PE (e.g., PE2 in figure 4 above) receives this EVPN MAC/IP Advertisement route, it performs the following:

- Using the MAC-VRF route target, it identifies the corresponding MAC-VRF and imports the MAC address into it. For asymmetric IRB mode, it is assumed that all PEs participating in a tenant's VPN are configured with all subnets and corresponding MAC-VRFs/BTs even if there are no locally attached TS's for some of these subnets. The reason is that the ingress PE needs to forward based on the destination TS's MAC address and perform the proper NVO tunnel encapsulation, both of which are properties of a lookup in the MAC-VRF/BT. An implementation may choose to consolidate the lookup at the ingress PE's IP-VRF with the lookup at the ingress PE's destination subnet MAC-VRF. Consideration for such consolidation of lookups is an implementation exercise and thus its specification is outside the scope of this document.

- Using the MAC-VRF route target, it identifies the corresponding ARP table for the tenant and adds an entry to the ARP table for the TS's MAC and IP address association. It should be noted that the tenant's ARP table at the receiving PE is identified by all the MAC-VRF route targets for that tenant. If the IP-VRF route target is included with this route advertisement, then it MAY be used for the identification of the tenant's ARP table.
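
The two import steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the route is modeled as a dictionary, and the names import_asymmetric, mac_vrfs, rt_to_tenant, and arp_tables are all hypothetical.

```python
def import_asymmetric(route, mac_vrfs, rt_to_tenant, arp_tables):
    """Import one received MAC/IP Advertisement route (asymmetric IRB).

    route: dict with hypothetical keys "rts", "mac", "ip", "next_hop",
    and "label1".
    mac_vrfs: MAC-VRF route target -> MAC table of that MAC-VRF.
    rt_to_tenant: route target -> tenant; every MAC-VRF route target of a
    tenant (and optionally its IP-VRF route target) maps to one tenant.
    arp_tables: tenant -> ARP table (TS IP -> TS MAC).
    """
    for rt in route["rts"]:
        if rt in mac_vrfs:
            # The MAC-VRF route target identifies the MAC-VRF; import the MAC.
            mac_vrfs[rt][route["mac"]] = (route["next_hop"], route["label1"])
        tenant = rt_to_tenant.get(rt)
        if tenant is not None and route.get("ip"):
            # Any of the tenant's route targets identifies its ARP table.
            arp_tables[tenant][route["ip"]] = route["mac"]
```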

If the receiving PE receives the MAC/IP Advertisement route with the MPLS label2 field but the receiving PE only supports asymmetric IRB mode, then the receiving PE MUST ignore the MPLS label2 field and install the MAC address in the corresponding MAC-VRF and the IP/MAC association in the ARP table for that tenant (identified by either MAC-VRF or IP-VRF route targets).

If the receiving PE receives the MAC/IP Advertisement route with the MPLS label2 field and it can support symmetric IRB mode, then it should use the MAC-VRF route target to identify its corresponding MAC-VRF table and import the MAC address. It should use the IP-VRF route target to identify the corresponding IP-VRF table and import the IP address. It MUST NOT import the IP/MAC association into its ARP table.

3.3.3 Data Plane - Ingress PE

When an Ethernet frame is received by an ingress PE (e.g., PE1 in figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify the associated MAC-VRF/BT and performs a lookup on the destination MAC address. If the MAC address corresponds to its IRB Interface MAC address, the ingress PE deduces that the packet must be inter-subnet routed. Hence, the ingress PE performs an IP lookup in the associated IP-VRF table. The lookup identifies a local adjacency to the IRB interface associated with the egress subnet's MAC-VRF/BT.

The ingress PE gets the destination TS's MAC address for that TS's IP address from its ARP table. It encapsulates the packet with that destination MAC address and a source MAC address corresponding to that IRB interface, and sends the packet to its destination subnet MAC-VRF/BT.

The destination MAC address lookup in the MAC-VRF/BT results in the BGP next hop address of the egress PE along with the VPN MPLS label or VNI. The ingress PE encapsulates the packet using the Ethernet NVO tunnel of choice (e.g., VxLAN or GENEVE) and sends the packet to the egress PE.
Since the packet forwarding is between the ingress PE's MAC-VRF/BT and the egress PE's MAC-VRF/BT, the packet encapsulation procedures follow those of [RFC7432] for MPLS and [RFC8365] for VxLAN encapsulations.

3.3.4 Data Plane - Egress PE

When a tenant's Ethernet frame is received over an NVO tunnel by the egress PE, the egress PE removes the NVO tunnel encapsulation and uses the VPN MPLS label (for MPLS encapsulation) or VNI (for NVO encapsulation) to identify the MAC-VRF/BT in which the MAC lookup needs to be performed.

The MAC lookup results in a local adjacency (e.g., local interface) over which the packet needs to be sent.

Note that the forwarding behavior on the egress PE is the same as EVPN intra-subnet forwarding described in [RFC7432] for MPLS and [RFC8365] for NVO networks. In other words, all the packet processing associated with the inter-subnet forwarding semantics is confined to the ingress PE for asymmetric IRB mode.

It should also be noted that [RFC7432] provides different levels of granularity for the EVPN label. Besides identifying the bridge domain table, it can be used to identify the egress interface or a destination MAC address on that interface. If the EVPN label is used for egress interface or individual MAC address identification, then no MAC lookup is needed in the egress PE for MPLS encapsulation and the packet can be directly forwarded to the egress interface based only on the EVPN label lookup.

4 Mobility Procedure

When a TS moves from one NVE (aka source NVE) to another NVE (aka target NVE), it is important that the MAC mobility procedures are properly executed and the corresponding MAC-VRF and IP-VRF tables on all participating NVEs are updated. [RFC7432] describes the MAC mobility procedures for L2-only services for both single-homed TS and multi-homed TS.
This section describes the incremental procedures and BGP Extended Communities needed to handle the MAC mobility for IRB.

In order to place the emphasis on the differences between L2-only and IRB use cases, the incremental procedure is described for single-homed TS with the expectation that the reader can easily extrapolate to multi-homed TS based on the procedures described in section 15 of [RFC7432]. This section describes mobility procedures for both symmetric and asymmetric IRB.

4.1 Mobility Procedure for Symmetric IRB

When a TS moves from a source NVE to a target NVE, it can behave in one of the following three ways:

1) TS initiates an ARP request upon a move to the target NVE

2) TS sends data packet without first initiating an ARP request to the target NVE

3) TS is a silent host and neither initiates an ARP request nor sends any packets

The following subsections describe the procedures for each of the above options. In the following subsections, it is assumed that the MAC and IP addresses of a TS have a one-to-one relationship (i.e., there is one IP address per MAC address and vice versa). If there is a many-to-one relationship, such that many host IP addresses correspond to a single host MAC address or many host MAC addresses correspond to a single IP address, then to detect host mobility, the procedures in [IRB-EXT-MOBILITY] must be exercised followed by the procedures described below.

4.1.1 Initiating an ARP Request upon a Move

In this scenario, when a TS moves from a source NVE to a target NVE, the TS initiates an ARP request upon the move (e.g., gratuitous ARP) to the target NVE.

The target NVE, upon receiving this ARP request, updates its MAC-VRF, IP-VRF, and ARP table with the host MAC, IP, and local adjacency information (e.g., local interface).

Since this NVE has previously learned the same MAC and IP addresses from the source NVE, it recognizes that there has been a MAC move and it initiates MAC mobility procedures per [RFC7432] by advertising an EVPN MAC/IP route with both the MAC and IP addresses filled in along with the MAC Mobility Extended Community with the sequence number incremented by one.

The source NVE, upon receiving this MAC/IP advertisement, realizes that the MAC has moved to the target NVE. It updates its MAC-VRF and IP-VRF tables accordingly with the adjacency information of the target NVE and withdraws its EVPN MAC/IP route. Furthermore, it sends an ARP probe locally to ensure that the MAC is gone, and it deletes the corresponding ARP entry when there is no ARP response.

All other remote NVE devices, upon receiving the MAC/IP advertisement route with the MAC Mobility extended community, compare the sequence number in this advertisement with the one previously received. If the new sequence number is greater than the old one, then they update the MAC/IP addresses of the TS in their corresponding MAC-VRF and IP-VRF tables to point to the target NVE. Furthermore, upon receiving the MAC/IP withdraw for the TS from the source NVE, these remote PEs perform the cleanups for their BGP tables.

4.1.2 Sending Data Traffic without an ARP Request

In this scenario, when a TS moves from a source NVE to a target NVE, the TS starts sending data traffic without first initiating an ARP request.

The target NVE, upon receiving the first data packet, learns the MAC address of the TS in the data plane and updates its MAC-VRF table with the MAC address and the local adjacency information (e.g., local interface) accordingly. The target NVE realizes that there has been a MAC move because the same MAC address has been learned remotely from the source NVE.
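
The move detection just described, together with the sequence number comparison of section 4.1.1, can be sketched as follows. The table layout and both function names are hypothetical, and sequence number wraparound and the tie-breaking rules of [RFC7432] for equal sequence numbers are deliberately left out of this sketch.

```python
def learn_local_mac(mac_table, mac, local_if):
    """Data-plane learning on the target NVE.

    mac_table maps MAC -> {"adjacency", "remote", "seq"} (assumed layout).
    Returns the sequence number to advertise in the MAC Mobility Extended
    Community when a move is detected, otherwise None.
    """
    old = mac_table.get(mac)
    # A move is detected when the MAC was previously learned remotely.
    moved = old is not None and old["remote"]
    if moved:
        seq = old["seq"] + 1          # incremented by one on a move
    else:
        seq = old["seq"] if old else 0
    mac_table[mac] = {"adjacency": local_if, "remote": False, "seq": seq}
    return seq if moved else None


def accept_remote_update(mac_table, mac, next_hop, seq):
    """Remote NVE: re-point the entry only for a greater sequence number."""
    old = mac_table.get(mac)
    if old is not None and old["seq"] >= seq:
        return False
    mac_table[mac] = {"adjacency": next_hop, "remote": True, "seq": seq}
    return True
```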

If EVPN-IRB NVEs are configured to advertise MAC-only routes in addition to MAC-and-IP EVPN routes, then the following steps are taken:

- The target NVE, upon learning this MAC address in the data plane, updates this MAC address entry in the corresponding MAC-VRF with the local adjacency information (e.g., local interface). It also recognizes that this MAC has moved and initiates MAC mobility procedures per [RFC7432] by advertising an EVPN MAC/IP route with only the MAC address filled in along with the MAC Mobility Extended Community with the sequence number incremented by one.

- The source NVE, upon receiving this MAC/IP advertisement, realizes that the MAC has moved to the new NVE. It updates its MAC-VRF table accordingly by updating the adjacency information for that MAC address to point to the target NVE and withdraws its EVPN MAC/IP route that has only the MAC address (if it has advertised such a route previously). Furthermore, it searches its ARP table for this MAC and sends an ARP probe for this MAC/IP pair. The ARP request message is sent both locally to all attached TS's in that subnet and to other NVEs participating in that subnet, including the target NVE.

- The target NVE passes the ARP request to its locally attached TS's and, when it receives the ARP response, it sends an EVPN MAC/IP advertisement route with both the MAC and IP addresses filled in along with the MAC Mobility Extended Community with the sequence number set to the same value as the one for the MAC-only advertisement route it sent previously.

- When the source NVE receives the EVPN MAC/IP advertisement, it updates its IP-VRF table with the new adjacency information (pointing to the target NVE) and deletes the associated ARP entry from its ARP table. Furthermore, it withdraws its previously advertised EVPN MAC/IP route with both the MAC and IP addresses.

- All other remote NVE devices, upon receiving the MAC/IP advertisement route with the MAC Mobility extended community, compare the sequence number in this advertisement with the one previously received. If the new sequence number is greater than the old one, then they update the MAC/IP addresses of the TS in their corresponding MAC-VRF and IP-VRF tables to point to the new NVE. Furthermore, upon receiving the MAC/IP withdraw for the TS from the old NVE, these remote PEs perform the cleanups for their BGP tables.

If EVPN-IRB NVEs are configured not to advertise MAC-only routes, then upon receiving the first data packet, the target NVE learns the MAC address of the TS and updates the MAC entry in the corresponding MAC-VRF table with the local adjacency information (e.g., local interface). It also realizes that there has been a MAC move because the same MAC address has been learned remotely from the source NVE. It then sends a unicast ARP request to the host and, when receiving an ARP response, it follows the procedure outlined in section 4.1.1.

4.1.3 Silent Host

In this scenario, when a TS moves from a source NVE to a target NVE, the TS is silent and it neither initiates an ARP request nor sends any data traffic. Therefore, neither the target nor the source NVE is aware of the MAC move.

On the source NVE, the MAC age-out timer expires and as a result it triggers an ARP probe on the source NVE. The ARP request is sent both locally to all the attached TS's on that subnet and to all the remote NVEs (including the target NVE) participating in that subnet. The source NVE also withdraws the EVPN MAC/IP route with only the MAC address (if it has previously advertised such a route).
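
As a rough sketch of the source NVE behavior above, the age-out event can be modeled as a function that returns the resulting actions. The function name, the entry layout, and the action labels are hypothetical and exist only for illustration.

```python
def on_mac_age_out(entry, remote_nves, advertised_mac_only):
    """List the actions taken by the source NVE when a MAC entry ages out.

    entry: dict with hypothetical keys "mac" and "ip".
    remote_nves: NVEs participating in the subnet (includes the target NVE).
    advertised_mac_only: True if a MAC-only route was advertised earlier.
    """
    # The ARP probe is sent to locally attached TS's on the subnet...
    actions = [("arp_probe_local", entry["ip"])]
    # ...and to every remote NVE participating in that subnet.
    actions += [("arp_probe_remote", nve, entry["ip"]) for nve in remote_nves]
    if advertised_mac_only:
        # The MAC-only EVPN MAC/IP route is withdrawn if it was advertised.
        actions.append(("withdraw_mac_only_route", entry["mac"]))
    return actions
```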

The target NVE passes the ARP request to its locally attached TS's and, when it receives the ARP response, it sends an EVPN MAC/IP advertisement route with both the MAC and IP addresses filled in along with the MAC Mobility Extended Community with the sequence number incremented by one.

When the source NVE receives the EVPN MAC/IP advertisement, it updates its IP-VRF table with the new adjacency information (pointing to the target NVE) and deletes the associated ARP entry from its ARP table. Furthermore, it withdraws its previously advertised EVPN MAC/IP route with both the MAC and IP addresses.

All other remote NVE devices, upon receiving the MAC/IP advertisement route with the MAC Mobility extended community, compare the sequence number in this advertisement with the one previously received. If the new sequence number is greater than the old one, then they update the MAC/IP addresses of the TS in their corresponding MAC-VRF and IP-VRF tables to point to the new NVE. Furthermore, upon receiving the MAC/IP withdraw for the TS from the old NVE, these remote PEs perform the cleanups for their BGP tables.

5 BGP Encoding

This document defines one new BGP Extended Community for EVPN.

5.1 Router's MAC Extended Community

A new EVPN BGP Extended Community called Router's MAC is introduced here. This new extended community is a transitive extended community with the Type field of 0x06 (EVPN) and the Sub-Type of 0x03. It may be advertised along with the BGP Encapsulation Extended Community defined in section 4.5 of [TUNNEL-ENCAP].
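
A minimal sketch of this encoding is given below, assuming the 8-octet layout of Figure 5 (Type, Sub-Type, then the 6-octet MAC address). The function and constant names are illustrative only and do not come from any particular BGP implementation.

```python
import struct

EVPN_TYPE = 0x06            # Type field (EVPN)
ROUTERS_MAC_SUBTYPE = 0x03  # Sub-Type of the Router's MAC Extended Community


def encode_routers_mac_ec(mac: str) -> bytes:
    """Encode a colon-separated MAC address into the 8-octet community."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("a MAC address must be 6 octets")
    # One octet of Type, one octet of Sub-Type, six octets of MAC address.
    return struct.pack("!BB6s", EVPN_TYPE, ROUTERS_MAC_SUBTYPE, octets)


def decode_routers_mac_ec(ec: bytes) -> str:
    """Decode the 8-octet community back into a colon-separated MAC."""
    ec_type, sub_type, octets = struct.unpack("!BB6s", ec)
    if (ec_type, sub_type) != (EVPN_TYPE, ROUTERS_MAC_SUBTYPE):
        raise ValueError("not a Router's MAC Extended Community")
    return ":".join(f"{b:02x}" for b in octets)
```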

The Router's MAC Extended Community is encoded as an 8-octet value as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06     | Sub-Type=0x03 |        Router's MAC           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Router's MAC Cont'd                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5: Router's MAC Extended Community

This extended community is used to carry the PE's MAC address for symmetric IRB scenarios and it is sent with RT-2.

6 Operational Models for Symmetric Inter-Subnet Forwarding

The following sections describe two main symmetric IRB forwarding scenarios (within a DC - i.e., intra-DC) along with their corresponding procedures. In the following scenarios, without loss of generality, it is assumed that a given tenant is represented by a single IP-VPN instance. Therefore, on a given PE, a tenant is represented by a single IP-VRF table and one or more MAC-VRF tables.

6.1 IRB forwarding on NVEs for Tenant Systems

This section covers the symmetric IRB procedures for the scenario where each Tenant System (TS) is attached to one or more NVEs and its host IP and MAC addresses are learned by the attached NVEs and are distributed to all other NVEs that are interested in participating in both intra-subnet and inter-subnet communications with that TS.

In this scenario, without loss of generality, it is assumed that NVEs operate in VLAN-based service interface mode with one Bridge Table (BT) per MAC-VRF. Thus for a given tenant, an NVE has one MAC-VRF for each of the tenant's subnets (e.g., each VLAN) that is configured on it, which is typically the case for VxLAN and GENEVE encapsulation. In the case of VLAN-aware bundling, each MAC-VRF consists of multiple Bridge Tables (e.g., one BT per VLAN).
The MAC-VRFs on an NVE for a given tenant are associated with an IP-VRF corresponding to that tenant (or IP-VPN instance) via their IRB interfaces.

Each NVE MUST support QoS, Security, and OAM policies per IP-VRF to/from the core network. This is not to be confused with the QoS, Security, and OAM policies per Attachment Circuit (AC) to/from the Tenant Systems. How this requirement is met is an implementation choice and it is outside the scope of this document.

Since VxLAN and GENEVE encapsulations require an inner Ethernet header (inner MAC SA/DA), and since for inter-subnet traffic the TS MAC address cannot be used, the ingress NVE's MAC address is used as the inner MAC SA. The NVE's MAC address is the device MAC address and it is common across all MAC-VRFs and IP-VRFs. This MAC address is advertised using the new EVPN Router's MAC Extended Community (section 5.1).

Figure 6 below illustrates this scenario where a given tenant (e.g., an IP-VPN instance) has three subnets represented by MAC-VRF1, MAC-VRF2, and MAC-VRF3 across two NVEs. There are five TS's that are associated with these three MAC-VRFs - i.e., TS1, TS4, and TS5 are sitting on the same subnet (e.g., same MAC-VRF/VLAN), where TS1 and TS5 are associated with MAC-VRF1 on NVE1 and TS4 is associated with MAC-VRF1 on NVE2. TS2 is associated with MAC-VRF2 on NVE1, and TS3 is associated with MAC-VRF3 on NVE2. MAC-VRF1 and MAC-VRF2 on NVE1 are in turn associated with IP-VRF1 on NVE1, and MAC-VRF1 and MAC-VRF3 on NVE2 are associated with IP-VRF1 on NVE2. When TS1, TS5, and TS4 exchange traffic with each other, only the L2 forwarding (bridging) part of the IRB solution is exercised because all these TS's sit on the same subnet.
However, when TS1 wants to exchange traffic with TS2 or TS3, which belong to different subnets, then both the bridging and routing parts of the IRB solution are exercised. The following subsections describe the control and data plane operations for this IRB scenario in detail.

               NVE1          +---------+
          +-------------+    |         |
  TS1-----|         MACx|    |         |        NVE2
(IP1/M1)  |(MAC-        |    |         |   +-------------+
  TS5-----| VRF1)\      |    |  MPLS/  |   |MACy  (MAC-  |-----TS3
(IP5/M5)  |       \     |    |  VxLAN/ |   |     / VRF3) | (IP3/M3)
          |    (IP-VRF1)|----|  NVGRE  |---|(IP-VRF1)    |
          |       /     |    |         |   |     \       |
  TS2-----|(MAC- /      |    |         |   |      (MAC-  |-----TS4
(IP2/M2)  | VRF2)       |    |         |   |       VRF1) | (IP4/M4)
          +-------------+    |         |   +-------------+
                             |         |
                             +---------+

Figure 6: IRB forwarding on NVEs for Tenant Systems

6.1.1 Control Plane Operation

Each NVE advertises a MAC/IP Advertisement route (i.e., Route Type 2) for each of its TS's with the following fields set:

- RD and ESI per [RFC7432]
- Ethernet Tag = 0; assuming VLAN-based service
- MAC Address Length = 48
- MAC Address = Mi; where i = 1, 2, 3, 4, or 5 in the above example
- IP Address Length = 32 or 128
- IP Address = IPi; where i = 1, 2, 3, 4, or 5 in the above example
- Label-1 = MPLS Label or VNI corresponding to MAC-VRF
- Label-2 = MPLS Label or VNI corresponding to IP-VRF

Each NVE advertises an RT-2 route with two Route Targets (one corresponding to its MAC-VRF and the other corresponding to its IP-VRF). Furthermore, the RT-2 is advertised with two BGP Extended Communities. The first BGP Extended Community identifies the tunnel type per section 4.5 of [TUNNEL-ENCAP] and the second BGP Extended Community includes the MAC address of the NVE (e.g., MACx for NVE1 or MACy for NVE2) as defined in section 5.1. This second Extended Community (for the MAC address of the NVE) is only required when the Ethernet NVO tunnel type is used.
If the IP NVO tunnel type is used, then there is no need to send this second Extended Community. It should be noted that the IP NVO tunnel type is only applicable to symmetric IRB procedures.

Upon receiving this advertisement, the receiving NVE performs the following:

- It uses the Route Targets corresponding to its MAC-VRF and IP-VRF for identifying these tables and subsequently importing the MAC and IP addresses into them respectively.

- It imports the MAC address from the MAC/IP Advertisement route into the MAC-VRF with the BGP Next Hop address as the underlay tunnel destination address (e.g., VTEP DA for VxLAN encapsulation) and Label-1 as the VNI for VxLAN encapsulation or the EVPN label for MPLS encapsulation.

- If the route carries the new Router's MAC Extended Community, and if the receiving NVE is using an Ethernet NVO tunnel, then the receiving NVE imports the IP address into the IP-VRF with the NVE's MAC address (from the new Router's MAC Extended Community) as the inner MAC DA, the BGP Next Hop address as the underlay tunnel destination address (VTEP DA for VxLAN encapsulation), and Label-2 as the IP-VPN VNI for VxLAN encapsulation.

- If the receiving NVE is going to use MPLS encapsulation, then the receiving NVE imports the IP address into the IP-VRF with the BGP Next Hop address as the underlay tunnel destination address, and Label-2 as the IP-VPN label for MPLS encapsulation.

If the receiving NVE receives an RT-2 with only Label-1 and only a single Route Target corresponding to the IP-VRF, or if it receives an RT-2 with only a single Route Target corresponding to the MAC-VRF but with both Label-1 and Label-2, or if it receives an RT-2 with a MAC Address Length of zero, then it MUST treat the route as withdraw [RFC7606] and log an error message.
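
The receive-side checks and imports of this section can be sketched as below. This is an illustration under stated assumptions rather than a definitive implementation: the dictionary keys, the function name import_symmetric_rt2, and the returned tuples are all hypothetical, and error logging is left to the caller.

```python
def import_symmetric_rt2(route, mac_vrf_rts, ip_vrf_rts, ethernet_nvo=True):
    """Classify and import one received RT-2 as described in section 6.1.1.

    route keys (hypothetical): "rts", "mac", "ip", "next_hop", "label1",
    "label2" (None if absent), "router_mac" (None if absent).
    Returns ("withdraw", None) or ("import", entries).
    """
    has_mac_rt = any(rt in mac_vrf_rts for rt in route["rts"])
    has_ip_rt = any(rt in ip_vrf_rts for rt in route["rts"])
    label2 = route.get("label2")
    malformed = (
        not route.get("mac")  # MAC Address Length of zero
        # Only the IP-VRF Route Target, and only Label-1.
        or (has_ip_rt and not has_mac_rt and label2 is None)
        # Only the MAC-VRF Route Target, but both labels.
        or (has_mac_rt and not has_ip_rt and label2 is not None)
    )
    if malformed:
        return ("withdraw", None)  # treat as withdraw per [RFC7606]

    entries = {}
    if has_mac_rt:
        # MAC-VRF entry: BGP Next Hop as tunnel destination, Label-1.
        entries["mac_vrf"] = (route["mac"], route["next_hop"], route["label1"])
    if has_ip_rt and label2 is not None:
        # IP-VRF entry: Label-2, plus the Router's MAC as inner MAC DA
        # when an Ethernet NVO tunnel is used.
        inner_da = route.get("router_mac") if ethernet_nvo else None
        entries["ip_vrf"] = (route["ip"], route["next_hop"], label2, inner_da)
    return ("import", entries)
```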

6.1.2 Data Plane Operation

The following description of the data-plane operation describes just the logical functions and the actual implementation may differ. Let's consider the data-plane operation when TS1 in subnet-1 (MAC-VRF1) on NVE1 wants to send traffic to TS3 in subnet-3 (MAC-VRF3) on NVE2.

- NVE1 receives a packet with the MAC DA corresponding to the MAC-VRF1 IRB interface on NVE1 (the interface between MAC-VRF1 and IP-VRF1), and the VLAN-tag corresponding to MAC-VRF1.

- Upon receiving the packet, NVE1 uses the VLAN-tag to identify MAC-VRF1. It then looks up the MAC DA and forwards the frame to its IRB interface.

- The Ethernet header of the packet is stripped and the packet is fed to the IP-VRF where an IP lookup is performed on the destination IP address. This lookup yields the outgoing NVO tunnel and the required encapsulation. If the encapsulation is for an Ethernet NVO tunnel, then it includes the egress NVE's MAC address as the inner MAC DA, the egress NVE's IP address (e.g., BGP Next Hop address) as the VTEP DA, and the VPN-ID as the VNI. The inner MAC SA and VTEP SA are set to the NVE's MAC and IP addresses respectively. If it is an MPLS encapsulation, then the corresponding EVPN and LSP labels are added to the packet. The packet is then forwarded to the egress NVE.

- On the egress NVE, if the packet arrives on an Ethernet NVO tunnel (e.g., it is VxLAN encapsulated), then the NVO tunnel header is removed. Since the inner MAC DA is the egress NVE's MAC address, the egress NVE knows that it needs to perform an IP lookup. It uses the VNI to identify the IP-VRF table. If the packet is MPLS encapsulated, then the EVPN label lookup identifies the IP-VRF table. Next, an IP lookup is performed for the destination TS (TS3), which results in the access-facing IRB interface over which the packet is sent.
Before sending the packet over this interface, the ARP table is consulted to get the destination TS's MAC address.

- The IP packet is encapsulated with an Ethernet header with the MAC SA set to that of the IRB interface MAC address (i.e., the IRB interface between MAC-VRF3 and IP-VRF1 on NVE2) and the MAC DA set to that of the destination TS (TS3) MAC address. The packet is sent to the corresponding MAC-VRF (i.e., MAC-VRF3) and, after a lookup of the MAC DA, is forwarded to the destination TS (TS3) over the corresponding interface.

In this symmetric IRB scenario, inter-subnet traffic between NVEs will always use the IP-VRF VNI/MPLS label. For instance, traffic from TS2 to TS4 will be encapsulated by NVE1 using NVE2's IP-VRF VNI/MPLS label, as long as TS4's host IP is present in NVE1's IP-VRF.

6.2 IRB forwarding on NVEs for Subnets behind Tenant Systems

This section covers the symmetric IRB procedures for the scenario where some Tenant Systems (TS's) support one or more subnets and these TS's are associated with one or more NVEs. Therefore, besides the advertisement of MAC/IP addresses for each TS, which can be multi-homed with All-Active redundancy mode, the associated NVE also needs to advertise the subnets statically configured on each TS.

The main difference between this solution and the previous one is the additional advertisement corresponding to each subnet. These subnet advertisements are accomplished using the EVPN IP Prefix route defined in [EVPN-PREFIX]. These subnet prefixes are advertised with the IP address of their associated TS (which is in the overlay address space) as their next hop. The receiving NVEs perform recursive route resolution to resolve the subnet prefix with its associated ingress NVE so that they know which NVE to forward the packets to when they are destined for that subnet prefix.
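
The recursive route resolution described above can be sketched as follows. The two tables are modeled as plain dictionaries, and the names resolve_prefix, rt5_routes, and rt2_hosts are hypothetical; the sketch only illustrates the two-step lookup and is not a definitive implementation.

```python
import ipaddress


def resolve_prefix(dst_ip, rt5_routes, rt2_hosts):
    """Resolve dst_ip to (NVE BGP next hop, gateway TS IP), or None.

    rt5_routes: prefix string -> gateway (TS) IP, learned from RT-5.
    rt2_hosts:  TS IP -> BGP next hop of the advertising NVE, from RT-2.
    Both table layouts are assumptions made for this sketch.
    """
    addr = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(prefix), gateway)
        for prefix, gateway in rt5_routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest-prefix match selects the most specific subnet route.
    _, gateway = max(matches, key=lambda m: m[0].prefixlen)
    # Recursive step: the gateway (overlay) address is resolved via the
    # RT-2 of the TS to find the NVE to forward the packets to.
    nve = rt2_hosts.get(gateway)
    return (nve, gateway) if nve is not None else None
```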

The advantage of this recursive route resolution is that when a TS moves from one NVE to another, there is no need to re-advertise any of the subnet prefixes for that TS. All that is needed is to advertise the IP/MAC addresses associated with the TS itself and exercise the MAC mobility procedures for that TS. The recursive route resolution automatically takes care of the updates for the subnet prefixes of that TS.

Figure 7 below illustrates this scenario where a given tenant (e.g., an IP-VPN service) has three subnets represented by MAC-VRF1, MAC-VRF2, and MAC-VRF3 across two NVEs. There are four TS's associated with these three MAC-VRFs - i.e., TS1 is connected to MAC-VRF1 on NVE1, TS2 is connected to MAC-VRF2 on NVE1, TS3 is connected to MAC-VRF3 on NVE2, and TS4 is connected to MAC-VRF1 on NVE2. TS1 has two subnet prefixes (SN1 and SN2) and TS3 has a single subnet prefix, SN3. The MAC-VRFs on each NVE are associated with their corresponding IP-VRF using their IRB interfaces. When TS4 and TS1 exchange intra-subnet traffic, only the L2 forwarding (bridging) part of the IRB solution is used (i.e., the traffic only goes through their MAC-VRFs); however, when TS3 wants to forward traffic to SN1 or SN2 sitting behind TS1 (inter-subnet traffic), then both the bridging and routing parts of the IRB solution are exercised (i.e., the traffic goes through the corresponding MAC-VRFs and IP-VRFs). The following subsections describe the control and data plane operations for this IRB scenario in detail.

                     NVE1         +----------+
SN1--+          +-------------+   |          |
     |--TS1-----|(MAC- \      |   |          |
SN2--+  IP1/M1  | VRF1) \     |   |          |
                |     (IP-VRF)|---|          |
                |       /     |   |          |
        TS2-----|(MAC- /      |   |  MPLS/   |
        IP2/M2  | VRF2)       |   |  VxLAN/  |
                +-------------+   |  NVGRE   |
                +-------------+   |          |
SN3--+--TS3-----|(MAC-\       |   |          |
        IP3/M3  | VRF3)\      |   |          |
                |     (IP-VRF)|---|          |
                |       /     |   |          |
        TS4-----|(MAC- /      |   |          |
        IP4/M4  | VRF1)       |   |          |
                +-------------+   +----------+
                     NVE2

Figure 7: IRB forwarding on NVEs for subnets behind TS's

6.2.1 Control Plane Operation

Each NVE advertises a Route Type-5 (RT-5, IP Prefix Route defined in [EVPN-PREFIX]) for each of its subnet prefixes with the IP address of its TS as the next hop (gateway address field) as follows:

- RD associated with the IP-VRF
- ESI = 0
- Ethernet Tag = 0
- IP Prefix Length = 32 or 128
- IP Prefix = SNi
- Gateway Address = IPi; IP address of TS
- Label = 0

This RT-5 is advertised with one or more Route Targets that have been configured as "export route targets" of the IP-VRF from which the route is originated.

Each NVE also advertises an RT-2 (MAC/IP Advertisement Route) along with its associated Route Targets and Extended Communities for each of its TS's exactly as described in section 6.1.1.

Upon receiving the RT-5 advertisement, the receiving NVE performs the following:

- It uses the Route Target to identify the corresponding IP-VRF.

- It imports the IP prefix into its corresponding IP-VRF, that is, the one configured with an import RT that is one of the RTs carried by the RT-5 route, along with the IP address of the associated TS as its next hop.

When receiving the RT-2 advertisement, the receiving NVE imports the MAC/IP addresses of the TS into the corresponding MAC-VRF and IP-VRF per section 6.1.1.
        When both routes exist, recursive route resolution
1263    is performed to resolve the IP prefix (received in RT-5) to its
1264    corresponding NVE's IP address (e.g., its BGP next hop). The BGP next
1265    hop is used as the underlay tunnel destination address (e.g., VTEP DA
1266    for VxLAN encapsulation) and the Router's MAC is used as the inner
1267    MAC DA for VxLAN encapsulation.

1269 6.2.2 Data Plane Operation

1271    The following description of the data-plane operation covers just
1272    the logical functions; the actual implementation may differ. Let us
1273    consider the data-plane operation when a host on SN1 sitting behind
1274    TS1 wants to send traffic to a host on SN3 sitting behind TS3.

1276    - TS1 sends a packet with the MAC DA corresponding to the MAC-VRF1
1277    IRB interface of NVE1, and the VLAN-tag corresponding to MAC-VRF1.

1279    - Upon receiving the packet, the ingress NVE1 uses the VLAN-tag to
1280    identify MAC-VRF1. It then looks up the MAC DA and forwards the
1281    frame to its IRB interface just as in section 6.1.1.

1283    - The Ethernet header of the packet is stripped and the packet is fed
1284    to the IP-VRF, where an IP lookup is performed on the destination
1285    address. This lookup yields the fields needed for VxLAN
1286    encapsulation, with NVE2's MAC address as the inner MAC DA, NVE2's IP
1287    address as the VTEP DA, and the VNI. The MAC SA is set to NVE1's MAC
1288    address and the VTEP SA is set to NVE1's IP address.

1290    - The packet is then encapsulated with the proper header based on
1291    the above info and is forwarded to the egress NVE (NVE2).

1293    - On the egress NVE (NVE2), assuming the packet is VxLAN
1294    encapsulated, the VxLAN and the inner Ethernet headers are removed
1295    and the resultant IP packet is fed to the IP-VRF associated with that
1296    VNI.

1298    - Next, a lookup is performed based on the IP DA (which is in SN3) in
1299    the associated IP-VRF of NVE2. The IP lookup yields the access-facing
1300    IRB interface over which the packet needs to be sent.
        Before sending the
1301    packet over this interface, the ARP table is consulted to get the
1302    MAC address of the destination TS (TS3).

1304    - The IP packet is encapsulated with an Ethernet header with the MAC
1305    SA set to that of the access-facing IRB interface of the egress NVE
1306    (NVE2) and the MAC DA set to the MAC address of the destination TS
1307    (TS3). The packet is sent to the corresponding MAC-VRF3 and, after a
1308    lookup of the MAC DA, is forwarded to the destination TS (TS3) over
1309    the corresponding interface.

1311 7 Acknowledgements

1313    The authors would like to thank Sami Boutros, Jeffrey Zhang,
1314    Krzysztof Szarkowicz, and Neeraj Malhotra for their valuable
1315    comments.

1317 8 Security Considerations

1319    This document describes a set of procedures for Inter-Subnet
1320    Forwarding of tenant traffic across PEs (or NVEs). These procedures
1321    include both layer-2 forwarding and layer-3 routing on a packet-by-
1322    packet basis. The security considerations for layer-2 forwarding in
1323    this document follow those of [RFC7432] for MPLS encapsulation and
1324    those of [RFC8365] for VxLAN or GENEVE encapsulations.

1326    Furthermore, the security considerations for layer-3 routing in this
1327    document follow those of [RFC4365], except for the application of
1328    routing protocols between CEs and PEs. Contrary to [RFC4364], this
1329    document does not describe route distribution techniques between CEs
1330    and PEs, but rather considers the CEs as TSes or VAs that do not run
1331    dynamic routing protocols. This can be considered a security
1332    advantage, since dynamic routing protocols can be blocked on the
1333    NVE/PE ACs, not allowing the tenant to interact with the
1334    infrastructure's dynamic routing protocols.

1336    In this document, the RT-5 is used for certain scenarios. This route
1337    uses an Overlay Index that requires a recursive resolution to a
1338    different EVPN route (an RT-2).
        Because of this, it is worth noting
1339    that any action that ends up filtering or modifying the RT-2 route
1340    used to convey the Overlay Indexes will modify the resolution of the
1341    RT-5 and therefore the forwarding of packets to the remote subnet.

1343 9 IANA Considerations

1345    IANA has allocated a new transitive extended community Type of 0x06
1346    and Sub-Type of 0x03 for the EVPN Router's MAC Extended Community.

1348 10 References

1350 10.1 Normative References

1352    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1353              Requirement Levels", BCP 14, RFC 2119, March 1997.

1355    [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC2119
1356              Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
1357              2017.

1359    [RFC7432] Sajassi et al., "BGP MPLS Based Ethernet VPN", RFC 7432,
1360              February 2015.

1362    [RFC8365] Sajassi et al., "A Network Virtualization Overlay Solution
1363              Using Ethernet VPN (EVPN)", RFC 8365, March 2018.

1365    [TUNNEL-ENCAP] Rosen et al., "The BGP Tunnel Encapsulation
1366              Attribute", draft-ietf-idr-tunnel-encaps-03, November
1367              2016.

1369    [EVPN-PREFIX] Rabadan et al., "IP Prefix Advertisement in EVPN",
1370              draft-ietf-bess-evpn-prefix-advertisement-03, September
1371              2016.

1373 10.2 Informative References

1375    [RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel,
1376    "Revised Error Handling for BGP UPDATE Messages", RFC 7606, August
1377    2015.

1379    [802.1Q] "IEEE Standard for Local and metropolitan area networks -
1380    Media Access Control (MAC) Bridges and Virtual Bridged Local Area
1381    Networks", IEEE Std 802.1Q(tm), 2014 Edition, November 2014.

1383    [RFC7348] Mahalingam, M., et al., "Virtual eXtensible Local Area
1384    Network (VXLAN): A Framework for Overlaying Virtualized Layer 2
1385    Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348,
1386    August 2014.
1388    [GENEVE] Gross, J., et al., "Geneve: Generic Network Virtualization
1389    Encapsulation", Work in Progress, draft-ietf-nvo3-geneve-06, March
1390    2018.

1392    [IRB-EXT-MOBILITY] Malhotra, N., et al., "Extended Mobility
1393    Procedures for EVPN-IRB", Work in Progress, draft-malhotra-bess-
1394    evpn-irb-extended-mobility-02, February 2018.

1396 11 Contributors

1398    In addition to the authors listed on the front page, the following
1399    co-authors have also contributed to this document:

1401    Florin Balus
1402    Cisco

1404    Yakov Rekhter
1405    Juniper

1407    Wim Henderickx
1408    Nokia

1410    Lucy Yong
1411    Linda Dunbar
1412    Huawei

1414    Dennis Cai
1415    Alibaba

1417 Authors' Addresses

1419    Ali Sajassi (Editor)
1420    Cisco
1421    Email: sajassi@cisco.com

1423    Samer Salam
1424    Cisco
1425    Email: sslam@cisco.com

1427    Samir Thoria
1428    Cisco
1429    Email: sthoria@cisco.com

1431    John E. Drake
1432    Juniper
1433    Email: jdrake@juniper.net

1435    Jorge Rabadan
1436    Nokia
1437    Email: jorge.rabadan@nokia.com