idnits 2.17.1 draft-ietf-bess-evpn-inter-subnet-forwarding-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 3 instances of too long lines in the document, the longest one being 3 characters in excess of 72. ** There are 3 instances of lines with control characters in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (February 8, 2017) is 2634 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'LS' is mentioned on line 625, but not defined == Missing Reference: 'RFC5512' is mentioned on line 1027, but not defined ** Obsolete undefined reference: RFC 5512 (Obsoleted by RFC 9012) == Missing Reference: 'RFC7432' is mentioned on line 1089, but not defined == Outdated reference: A later version (-11) exists of draft-ietf-l2vpn-evpn-04 == Outdated reference: A later version (-02) exists of draft-sajassi-l2vpn-evpn-ipvpn-interop-01 == Outdated reference: A later version (-07) exists of draft-raggarwa-data-center-mobility-05 == Outdated reference: A later version (-03) exists of draft-rabadan-l2vpn-evpn-prefix-advertisement-02 Summary: 3 errors (**), 0 flaws (~~), 8 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 L2VPN Workgroup A. Sajassi, Ed. 3 INTERNET-DRAFT S. Salam 4 Intended Status: Standards Track S. Thoria 5 Cisco 6 J. Drake 7 Juniper 8 J. Rabadan 9 Nokia 10 L. Yong 11 Huawei 13 Expires: August 8, 2017 February 8, 2017 15 Integrated Routing and Bridging in EVPN 16 draft-ietf-bess-evpn-inter-subnet-forwarding-02 18 Abstract 20 EVPN provides an extensible and flexible multi-homing VPN solution 21 for intra-subnet connectivity among hosts/VMs over an MPLS/IP 22 network. However, there are scenarios in which inter-subnet 23 forwarding among hosts/VMs across different IP subnets is required, 24 while maintaining the multi-homing capabilities of EVPN. This 25 document describes an Integrated Routing and Bridging (IRB) solution 26 based on EVPN to address such requirements. 
28 Status of this Memo 30 This Internet-Draft is submitted to IETF in full conformance with the 31 provisions of BCP 78 and BCP 79. 33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF), its areas, and its working groups. Note that 35 other groups may also distribute working documents as 36 Internet-Drafts. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 The list of current Internet-Drafts can be accessed at 44 http://www.ietf.org/1id-abstracts.html 46 The list of Internet-Draft Shadow Directories can be accessed at 47 http://www.ietf.org/shadow.html 49 Copyright and License Notice 51 Copyright (c) 2014 IETF Trust and the persons identified as the 52 document authors. All rights reserved. 54 This document is subject to BCP 78 and the IETF Trust's Legal 55 Provisions Relating to IETF Documents 56 (http://trustee.ietf.org/license-info) in effect on the date of 57 publication of this document. Please review these documents 58 carefully, as they describe your rights and restrictions with respect 59 to this document. Code Components extracted from this document must 60 include Simplified BSD License text as described in Section 4.e of 61 the Trust Legal Provisions and are provided without warranty as 62 described in the Simplified BSD License. 64 Table of Contents 66 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 67 2 Inter-Subnet Forwarding Scenarios . . . . . . . . . . . . . . . 6 68 2.1 Switching among IP subnets within a DC . . . . . . . . . . . 7 69 2.2 Switching among IP subnets in different DCs without GW . . . 8 70 2.3 Switching among IP subnets in different DCs with GW . . . . 8 71 2.4 Switching among IP subnets spread across IP-VPN and EVPN 72 networks with GW . . . . . . . . . . . 
. . . . . . . . . . . 8 73 3 Default L3 Gateway for Tenant System . . . . . . . . . . . . . . 9 74 3.1 Homogeneous Environment . . . . . . . . . . . . . . . . . . 9 75 3.2 Heterogeneous Environment . . . . . . . . . . . . . . . . . 10 76 4 Operational Models for Asymmetric Inter-Subnet Forwarding . . . 10 77 4.1 Among EVPN NVEs within a DC . . . . . . . . . . . . . . . . 10 78 4.2 Among EVPN NVEs in Different DCs Without GW . . . . . . . . 11 79 4.3 Among EVPN NVEs in Different DCs with GW . . . . . . . . . . 13 80 4.4 Among IP-VPN Sites and EVPN NVEs with GW . . . . . . . . . . 14 81 4.5 Use of Centralized Gateway . . . . . . . . . . . . . . . . . 15 82 5 Operational Models for Symmetric Inter-Subnet Forwarding . . . . 16 83 5.1 IRB forwarding on NVEs for Tenant Systems . . . . . . . . . 16 84 5.1.1 Control Plane Operation . . . . . . . . . . . . . . . . 17 85 5.1.2 Data Plane Operation - Inter Subnet . . . . . . . . . . 18 86 5.1.3 TS Move Operation . . . . . . . . . . . . . . . . . . . 19 87 5.2 IRB forwarding on NVEs for Subnets behind Tenant Systems . . 20 88 5.2.1 Control Plane Operation . . . . . . . . . . . . . . . . 22 89 5.2.2 Data Plane Operation . . . . . . . . . . . . . . . . . . 23 90 6 BGP Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . 24 91 6.1 Router's MAC Extended Community . . . . . . . . . . . . . . 24 93 7 TS Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . 24 94 7.1 TS Mobility & Optimum Forwarding for TS Outbound Traffic . . 24 95 7.2 TS Mobility & Optimum Forwarding for TS Inbound Traffic . . 24 96 7.2.1 Mobility without Route Aggregation . . . . . . . . . . . 25 97 8 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 25 98 9 Security Considerations . . . . . . . . . . . . . . . . . . . . 25 99 10 IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 100 11 References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 101 11.1 Normative References . . . . . . . . . . . . . . . . . . . 
25 102 11.2 Informative References . . . . . . . . . . . . . . . . . . 26 103 12 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 26 104 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 26 106 Terminology 108 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 109 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 110 document are to be interpreted as described in RFC 2119 [RFC2119]. 112 Broadcast Domain: In a bridged network, the broadcast domain 113 corresponds to a Virtual LAN (VLAN), where a VLAN is typically 114 represented by a single VLAN ID (VID) but can be represented by 115 several VIDs where Shared VLAN Learning (SVL) is used per [802.1Q]. 117 EVI: An EVPN instance spanning the Provider Edge (PE) devices 118 participating in that EVPN. 120 IRB: Integrated Routing and Bridging 122 MAC-VRF: A Virtual Routing and Forwarding table for Media Access 123 Control (MAC) addresses on a PE for an EVI. 125 Bridge Table: An instantiation of a broadcast domain on a MAC-VRF. 127 IP-VRF: A Virtual Routing and Forwarding table for IP addresses on a 128 PE that is associated with one or more EVIs. 130 IRB Interface: A virtual interface that connects a bridge table in a 131 MAC-VRF to an IP-VRF in an NVE. 133 NVE: Network Virtualization Endpoint 135 TS: Tenant System 137 Ethernet NVO tunnel: A Network Virtualization Overlay tunnel 138 with an Ethernet payload. Examples of this type of tunnel are 139 VxLAN and NVGRE. 141 IP NVO tunnel: A Network Virtualization Overlay tunnel 142 with an IP payload (no MAC header in the payload). Examples of IP NVO 143 tunnels are VxLAN GPE and MPLSoGRE (both with IP payload).
145 1 Introduction 147 EVPN provides an extensible and flexible multi-homing VPN solution 148 for intra-subnet connectivity among Tenant Systems (TS's) over an 149 MPLS/IP network, where an IP subnet is represented by an EVI for a 150 VLAN-based service or by an <EVI, VLAN> for a VLAN-aware bundle 151 service. However, there are scenarios where, in addition to intra- 152 subnet forwarding, inter-subnet forwarding is required among TS's 153 across different IP subnets at EVPN PE nodes, also known as EVPN NVE 154 nodes throughout this document, while maintaining the multi-homing 155 capabilities of EVPN. This document describes an Integrated Routing 156 and Bridging (IRB) solution based on EVPN to address such 157 requirements. 159 Inter-subnet communication is traditionally achieved at 160 centralized L3 Gateway (L3GW) nodes where all the inter-subnet 161 communication policies are enforced. When two Tenant Systems (TS's) 162 belonging to two different subnets but connected to the same PE node 163 want to talk to each other, their traffic needs to be back hauled 164 from the PE node all the way to the centralized gateway nodes, where 165 inter-subnet switching is performed, and then back to the PE node. For 166 today's large multi-tenant data centers, this scheme is very 167 inefficient and sometimes impractical. 169 In order to overcome the drawback of the centralized approach, IRB 170 functionality is needed on the PE nodes (i.e., NVE devices) as close 171 to the TS's as possible to avoid unnecessary hair-pinning of user traffic. 172 Under this design, all traffic between hosts attached 173 to one NVE can be routed and bridged locally, thus avoiding the traffic 174 hair-pinning issue of the centralized L3GW. 176 There can be scenarios where both centralized and distributed 177 approaches may be preferred simultaneously.
For example, NVEs may 178 switch inter-subnet traffic belonging to one tenant or one 179 security zone locally, whereas inter-subnet traffic 180 belonging to two different tenants or security zones is back hauled to the 181 centralized gateway nodes and switched there after the 182 traffic is subjected to Firewall (FW) or Deep Packet Inspection 183 (DPI). 185 Some TS's run non-IP protocols in conjunction with their IP traffic. 186 Therefore, it is important to handle both kinds of traffic optimally 187 - e.g., to bridge non-IP traffic and to route IP traffic. 189 The solution therefore needs to meet the following requirements: 191 R1: The solution MUST allow for inter-subnet traffic to be locally 192 switched at NVEs. 194 R2: The solution MUST allow for both inter-subnet and intra-subnet 195 traffic belonging to the same tenant to be locally routed and bridged 196 respectively. The solution MUST provide IP routing for inter-subnet 197 traffic and Ethernet Bridging for intra-subnet traffic. 199 R3: The solution MUST support bridging of non-IP traffic. 201 R4: The solution MUST allow inter-subnet switching to be disabled on 202 a per VLAN basis on NVEs where the traffic needs to be back hauled to 203 another node (i.e., for performing FW or DPI functionality). 205 2 Inter-Subnet Forwarding Scenarios 207 The inter-subnet forwarding scenarios performed by an EVPN NVE can be 208 divided into the following five categories. The last scenario, along 209 with its corresponding solution, is described in [EVPN-IPVPN- 210 INTEROP]. The first four scenarios are covered in this document. 212 1. Switching among IP subnets within a DC using EVPN 214 2. Switching among IP subnets in different DCs using EVPN without GW 216 3. Switching among IP subnets in different DCs using EVPN with GW 218 4. Switching among IP subnets spread across IP-VPN and EVPN networks 219 with GW 221 5.
Switching among IP subnets spread across IP-VPN and EVPN networks 222 without GW 224 In the above scenarios, the term "GW" refers to the case where a node 225 situated at the WAN edge of the data center network behaves as a 226 default gateway (GW) for all the destinations that are outside the 227 data center. The absence of GW refers to the scenario where NVEs 228 within a data center maintain individual (host) routes for destinations that are 229 outside of the data center. 231 In case (4), the WAN edge node also performs route aggregation 232 for all the destinations within its own data center, and acts as an 233 interworking unit between EVPN and IP VPN (it implements both EVPN 234 and IP-VPN functionality). 236 +---+ Enterprise Site 1 237 |PE1|----- H1 238 +---+ 239 / 240 ,---------. Enterprise Site 2 241 ,' `. +---+ 242 ,---------. /( MPLS/IP )---|PE2|----- H2 243 ' DCN 3 `./ `. Core ,' +---+ 244 `-+------+' `-+------+' 245 __/__ / / \ \ 246 :NVE4 : +---+ \ \ 247 '-----' ,----|GW |. \ \ 248 | ,' +---+ `. ,---------. 249 TS6 ( DCN 1 ) ,' `. 250 `. ,' ( DCN 2 ) 251 `-+------+' `. ,' 252 __/__ `-+------+' 253 :NVE1 : __/__ __\__ 254 '-----' :NVE2 : :NVE3 : 255 | | '-----' '-----' 256 TS1 TS2 | | | 257 TS3 TS4 TS5 259 Figure 2: Interoperability Use-Cases 261 In what follows, we will describe scenarios 1 through 4 in more 262 detail. 264 2.1 Switching among IP subnets within a DC 266 In this scenario, connectivity is required between TS's in the same 267 data center, where those hosts belong to different IP subnets. All 268 these subnets belong to the same tenant or are part of the same IP 269 VPN. Each subnet is associated with a single EVI (or <EVI, VLAN>) 270 realized by a collection of MAC-VRFs (one per NVE) residing on the 271 NVEs configured for that EVI. 273 As an example, consider TS3 and TS5 of Figure 2 above. Assume that 274 connectivity is required between these two TS's where TS3 belongs to 275 the IP-subnet 3 (SN3) whereas TS5 belongs to the IP-subnet 5 (SN5).
276 Both SN3 and SN5 subnets belong to the same tenant. NVE2 has an EVI3 277 associated with the SN3 and this EVI is represented by a MAC-VRF 278 which is associated with an IP-VRF (for that tenant) via an IRB 279 interface. NVE3, in turn, has an EVI5 associated with the SN5 and 280 this EVI is represented by a MAC-VRF which is associated with the 281 same IP-VRF via a different IRB interface. 283 2.2 Switching among IP subnets in different DCs without GW 285 This case is similar to that of section 2.1 above except for the fact 286 that the TS's belong to different data centers that are 287 interconnected over a WAN (e.g., an MPLS/IP PSN). The data centers in 288 question here are seamlessly interconnected to the WAN, i.e., the WAN 289 edge devices do not maintain any TS-specific addresses in the 290 forwarding path - e.g., there are no WAN edge GWs between these DCs. 292 As an example, consider TS3 and TS6 of Figure 2 above. Assume that 293 connectivity is required between these two TS's where TS3 belongs to 294 the SN3 whereas TS6 belongs to the SN6. NVE2 has an EVI3 associated 295 with SN3 and NVE4 has an EVI6 associated with the SN6. Both SN3 and 296 SN6 are part of the same IP-VRF. 298 2.3 Switching among IP subnets in different DCs with GW 300 In this scenario, connectivity is required between TS's in different 301 data centers, and those hosts belong to different IP subnets. What 302 makes this case different from that of Section 2.2 is that at least 303 one of the data centers has a gateway as the WAN edge switch. Because 304 of that, the NVE's IP-VRF within that data center need not maintain 305 (host) routes to individual TS's outside of that data center. 307 As an example, consider a tenant with TS1 and TS5 of Figure 2 above. 308 Assume that connectivity is required between these two TS's where TS1 309 belongs to the SN1 whereas TS5 belongs to the SN5.
NVE3 has an EVI5 310 associated with the SN5 and this EVI is represented by the MAC-VRF 311 which is connected to the IP-VRF via an IRB interface. NVE1 has an 312 EVI1 associated with the SN1 and this EVI is represented by the MAC- 313 VRF which is connected to the IP-VRF representing the same tenant. 314 Due to the gateway at the edge of DCN 1, NVE1's IP-VRF does not need 315 to have the address of TS5 but instead it has a default route in its 316 IP-VRF with the next-hop being the GW. 318 2.4 Switching among IP subnets spread across IP-VPN and EVPN networks 319 with GW 321 In this scenario, connectivity is required between TS's in a data 322 center and hosts in an enterprise site that belongs to a given IP- 323 VPN. The NVE within the data center is an EVPN NVE, whereas the 324 enterprise site has an IP-VPN PE. Furthermore, the data center in 325 question has a gateway as the WAN edge switch. Because of that, the 326 NVE in the data center does not need to maintain individual IP 327 prefixes advertised by enterprise sites (by IP-VPN PEs). 329 As an example, consider end-station H1 and TS2 of Figure 2. Assume 330 that connectivity is required between the end-station and the TS, 331 where TS2 belongs to the SN2 that is realized using EVPN, whereas H1 332 belongs to an IP VPN site connected to PE1 (PE1 maintains an IP-VRF 333 associated with that IP VPN). NVE1 has an EVI2 associated with the 334 SN2. Moreover, EVI2 on NVE1 is connected to an IP-VRF associated with 335 that IP VPN. PE1 originates a VPN-IP route that covers H1. The 336 gateway at the edge of DCN1 performs interworking function between 337 IP-VPN and EVPN. As a result of this, a default route in the IP-VRF 338 on the NVE1, pointing to the gateway as the next hop, and a route to 339 the TS2 (or maybe SN2) on the PE1's IP-VRF are sufficient for the 340 connectivity between H1 and TS2. 
In this scenario, NVE1's IP-VRF 341 does not need to maintain a route to H1 because it has the default 342 route to the gateway. 344 3 Default L3 Gateway for Tenant System 346 3.1 Homogeneous Environment 348 This is an environment where all NVEs to which an EVPN instance could 349 potentially be attached (or moved) perform inter-subnet switching. 350 Therefore, inter-subnet traffic can be locally switched by the EVPN 351 NVE connecting the TS's belonging to different subnets. 353 To support such inter-subnet forwarding, the NVE behaves as an IP 354 Default Gateway from the perspective of the attached TS's. Two models 355 are possible: 357 1. All the NVEs of a given EVPN instance use the same anycast default 358 gateway IP address and the same anycast default gateway MAC address. 359 On each NVE, these default gateway IP and MAC addresses correspond to the 360 IRB interface connecting the MAC-VRF of that EVI to the corresponding 361 IP-VRF. 363 2. Each NVE of a given EVPN instance uses its own default gateway IP 364 and MAC addresses, and these addresses are aliased to the same 365 conceptual gateway through the use of the Default Gateway extended 366 community as specified in [EVPN], which is carried in the EVPN MAC 367 Advertisement routes. On each NVE, these default gateway IP and MAC 368 addresses correspond to the IRB interface connecting the MAC-VRF of 369 that EVI to the corresponding IP-VRF. 371 Both of these models enable a packet forwarding paradigm for both 372 symmetric and asymmetric IRB forwarding. In the case of asymmetric IRB, a 373 packet is forwarded through the MAC-VRF followed by the IP-VRF on the 374 ingress NVE, and then forwarded through the MAC-VRF on the egress 375 (disposition) NVE. The egress NVE merely needs to perform a lookup in 376 the associated MAC-VRF and forward the Ethernet frames unmodified, 377 i.e., without rewriting the source MAC address.
This is different 378 from symmetric IRB forwarding where a packet is forwarded through the 379 MAC-VRF followed by the IP-VRF on the ingress NVE, and then forwarded 380 through the IP-VRF followed by the MAC-VRF on the egress NVE. 382 It is worth noting that if the applications that are running on the 383 TS's are employing or relying on any form of MAC security, then the 384 first model (i.e., using anycast addresses) would be required to 385 ensure that the applications receive traffic from the same source MAC 386 address that they are sending to. 388 3.2 Heterogeneous Environment 390 In large data centers with thousands of servers and ToR (or Access) 391 switches, some of the switches may not have the capability of maintaining or 392 enforcing policies for inter-subnet switching. Even though policies 393 among multiple subnets belonging to the same tenant can be simpler, hosts 394 belonging to one tenant can also send traffic to peers belonging to 395 different tenants or security zones. In such scenarios, a WAN edge PE 396 (e.g., L3GW) may need not only to enforce policies for communication 397 among subnets belonging to a single tenant, but also to 398 know how to handle traffic destined towards peers in different 399 tenants. Therefore, there can be a mixed environment where an NVE 400 performs inter-subnet switching for some EVPN instances and the L3GW 401 does so for others. 403 4 Operational Models for Asymmetric Inter-Subnet Forwarding 405 4.1 Among EVPN NVEs within a DC 407 When an EVPN MAC/IP advertisement route is received by an NVE, the IP 408 address associated with the route is used to populate the IP-VRF 409 table, whereas the MAC address associated with the route is used to 410 populate both the MAC-VRF table and the adjacency associated 411 with the IP route in the IP-VRF table (i.e., ARP table).
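
The table population described in the preceding paragraph can be illustrated with a minimal Python sketch. It is not taken from this document: the class names (MacIpRoute, Adjacency, Nve), the single MAC-VRF per NVE, and the address and label values are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address


@dataclass(frozen=True)
class MacIpRoute:
    """Hypothetical, simplified view of an EVPN MAC/IP advertisement route."""
    mac: str
    ip: str
    next_hop_nve: str
    evpn_label: int


@dataclass(frozen=True)
class Adjacency:
    """ARP-like entry attached to a host route in the IP-VRF."""
    mac_rewrite: str
    next_hop_nve: str
    evpn_label: int


@dataclass
class Nve:
    # Assumption: one MAC-VRF and one IP-VRF, both modelled as plain dicts.
    mac_vrf: dict = field(default_factory=dict)  # MAC -> (next-hop NVE, label)
    ip_vrf: dict = field(default_factory=dict)   # host IP -> Adjacency

    def import_route(self, route: MacIpRoute) -> None:
        # The MAC address populates the MAC-VRF table ...
        self.mac_vrf[route.mac] = (route.next_hop_nve, route.evpn_label)
        # ... and the IP address populates the IP-VRF, with the same MAC
        # stored as the adjacency (MAC rewrite) of that host route.
        self.ip_vrf[ip_address(route.ip)] = Adjacency(
            route.mac, route.next_hop_nve, route.evpn_label
        )


# Illustrative values only (documentation address ranges, made-up label).
nve1 = Nve()
nve1.import_route(MacIpRoute("00:00:5e:00:53:02", "192.0.2.20", "NVE2", 1002))
adj = nve1.ip_vrf[ip_address("192.0.2.20")]
print(adj.mac_rewrite, adj.next_hop_nve, adj.evpn_label)
```

In this sketch the adjacency carries the destination host's MAC rather than the next-hop NVE's MAC, which is the property the asymmetric model relies on.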
413 When an Ethernet frame is received by an ingress NVE, it performs a 414 lookup on the destination MAC address in the associated MAC-VRF for 415 that EVI. If the MAC address corresponds to its IRB Interface MAC 416 address, the ingress NVE deduces that the packet MUST be inter-subnet 417 routed. Hence, the ingress NVE performs an IP lookup in the 418 associated IP-VRF table. The lookup identifies an adjacency that 419 contains a MAC rewrite and in turn the next-hop (i.e., egress) NVE to 420 which the packet must be forwarded and the associated MPLS label 421 stack. The MAC rewrite holds the MAC address associated with the 422 destination host (as populated by the EVPN MAC route), instead of the 423 MAC address of the next-hop NVE. The ingress NVE then rewrites the 424 destination MAC address in the packet with the address specified in 425 the adjacency. It also rewrites the source MAC address with its IRB 426 Interface MAC address. The ingress NVE, then, forwards the frame to 427 the next-hop (i.e. egress) NVE after encapsulating it with the MPLS 428 label stack. Note that this label stack includes the LSP label as 429 well as the EVPN label that was advertised by the egress NVE. When 430 the MPLS encapsulated packet is received by the egress NVE, it uses 431 the EVPN label to identify the MAC-VRF table. It then performs a MAC 432 lookup in that table, which yields the outbound interface to which 433 the Ethernet frame must be forwarded. Figure 2 below depicts the 434 packet flow, where NVE1 and NVE2 are the ingress and egress NVEs, 435 respectively. 437 NVE1 NVE2 438 +------------+ +------------+ 439 | | | | 440 |(MAC - (IP | |(IP - (MAC | 441 | VRF) VRF)| | VRF) VRF)| 442 | | | | | | | | 443 +------------+ +------------+ 444 ^ v ^ V 445 | | | | 446 TS1->-+ +-->--------------+ +->-TS2 448 Figure 2: Inter-Subnet Forwarding Among EVPN NVEs within a DC 450 Note that the forwarding behavior on the egress NVE is similar to 451 EVPN intra-subnet forwarding. 
In other words, all the packet 452 processing associated with the inter-subnet forwarding semantics is 453 confined to the ingress NVE and that is why it is called Asymmetric 454 IRB. 456 It should also be noted that [EVPN] provides different levels of 457 granularity for the EVPN label. Besides identifying the bridge domain 458 table, it can be used to identify the egress interface or a 459 destination MAC address on that interface. If the EVPN label is used for 460 egress interface or destination MAC address identification, then no 461 MAC lookup is needed in the egress EVI and the packet can be directly 462 forwarded to the egress interface based solely on the EVPN label lookup. 464 4.2 Among EVPN NVEs in Different DCs Without GW 466 When an EVPN MAC advertisement route is received by an NVE, the IP 467 address associated with the route is used to populate the IP-VRF 468 table, whereas the MAC address associated with the route is used to 469 populate both the MAC-VRF table and the adjacency associated 470 with the IP route in the IP-VRF table (i.e., ARP table). 472 When an Ethernet frame is received by an ingress NVE, it performs a 473 lookup on the destination MAC address in the associated EVI. If the 474 MAC address corresponds to its IRB Interface MAC address, the ingress 475 NVE deduces that the packet MUST be inter-subnet routed. Hence, the 476 ingress NVE performs an IP lookup in the associated IP-VRF table. The 477 lookup identifies an adjacency that contains a MAC rewrite and in 478 turn the next-hop (i.e., egress) Gateway to which the packet must be 479 forwarded along with the associated MPLS label stack. The MAC rewrite 480 holds the MAC address associated with the destination host (as 481 populated by the EVPN MAC route), instead of the MAC address of the 482 next-hop Gateway. The ingress NVE then rewrites the destination MAC 483 address in the packet with the address specified in the adjacency.
It 484 also rewrites the source MAC address with its IRB Interface MAC 485 address. The ingress NVE, then, forwards the frame to the next-hop 486 (i.e. egress) Gateway after encapsulating it with the MPLS label 487 stack. 489 Note that this label stack includes the LSP label as well as an EVPN 490 label. The EVPN label could be either advertised by the ingress 491 Gateway, if inter-AS option B is used, or advertised by the egress 492 NVE, if inter-AS option C is used. When the MPLS encapsulated packet 493 is received by the ingress Gateway, the processing again differs 494 depending on whether inter-AS option B or option C is employed: in 495 the former case, the ingress Gateway swaps the EVPN label in the 496 packets with the EVPN label value received from the egress Gateway. 497 In the latter case, the ingress Gateway does not modify the EVPN 498 label and performs normal label switching on the LSP label. 499 Similarly on the egress Gateway, for option B, the egress Gateway 500 swaps the EVPN label with the value advertised by the egress NVE. 501 Whereas, for option C, the egress Gateway does not modify the EVPN 502 label, and performs normal label switching on the LSP label. When the 503 MPLS encapsulated packet is received by the egress NVE, it uses the 504 EVPN label to identify the bridge-domain table. It then performs a 505 MAC lookup in that table, which yields the outbound interface to 506 which the Ethernet frame must be forwarded. Figure 3 below depicts 507 the packet flow. 
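
The difference between the inter-AS option B and option C label handling described above can be sketched as follows. This is a deliberately reduced model, not the procedure of any implementation: the function names and all label values are hypothetical, and transport (LSP) label handling is collapsed into simply replacing the outer label at each hop.

```python
from enum import Enum


class InterAs(Enum):
    OPTION_B = "B"
    OPTION_C = "C"


def gateway_forward(stack, mode, next_lsp_label, downstream_evpn_label):
    """Return the (LSP label, EVPN label) pair a gateway sends onward."""
    _incoming_lsp, evpn_label = stack
    if mode is InterAs.OPTION_B:
        # Option B: the gateway swaps the EVPN label with the value it
        # received from the downstream node (egress gateway or egress NVE).
        evpn_label = downstream_evpn_label
    # Option C: the EVPN label is left untouched; only the LSP label changes.
    return (next_lsp_label, evpn_label)


def trace(mode):
    """Label stacks seen on the NVE1->GW1, GW1->GW2, and GW2->NVE2 hops."""
    # Hypothetical EVPN labels advertised by NVE2, GW2, and GW1.
    nve2_evpn, gw2_evpn, gw1_evpn = 1002, 302, 201
    # With option B the ingress NVE uses the label advertised by the ingress
    # gateway; with option C it uses the label advertised by the egress NVE.
    first_evpn = gw1_evpn if mode is InterAs.OPTION_B else nve2_evpn
    hops = [(16, first_evpn)]
    hops.append(gateway_forward(hops[-1], mode, 17, gw2_evpn))   # at GW1
    hops.append(gateway_forward(hops[-1], mode, 18, nve2_evpn))  # at GW2
    return hops


for mode in InterAs:
    print(mode.name, trace(mode))
```

In both modes the packet reaches the egress NVE carrying the EVPN label that NVE advertised; the modes differ only in whether the gateways rewrite that label along the way.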
509 NVE1 GW1 GW2 NVE2 510 +------------+ +------------+ +------------+ +------------+ 511 | | | | | | | | 512 |(MAC - (IP | | [LS] | | [LS] | |(IP - (MAC | 513 | VRF) VRF)| | | | | | VRF) VRF)| 514 | | | | | | | | | | | | | | | | 515 +------------+ +------------+ +------------+ +------------+ 516 ^ v ^ V ^ V ^ V 517 | | | | | | | | 518 TS1->-+ +-->--------+ +------------+ +---------------+ +->-TS2 520 Figure 3: Inter-Subnet Forwarding Among EVPN NVEs in Different DCs 521 without GW 523 4.3 Among EVPN NVEs in Different DCs with GW 525 In this scenario, the NVEs within a given data center do not have 526 entries for the MAC/IP addresses of hosts in remote data centers. 527 Rather, the NVEs have a default IP route pointing to the WAN gateway 528 for each VRF. This is accomplished by the WAN gateway advertising, for 529 a given EVPN that spans multiple DCs, a default VPN-IP route that is 530 imported by the NVEs of that VPN that are in the gateway's own DC. 532 When an Ethernet frame is received by an ingress NVE, it performs a 533 lookup on the destination MAC address in the associated MAC-VRF 534 table. If the MAC address corresponds to the IRB Interface MAC 535 address, the ingress NVE deduces that the packet MUST be inter-subnet 536 routed. Hence, the ingress NVE performs an IP lookup in the 537 associated IP-VRF table. The lookup, in this case, matches the 538 default host route which points to the local WAN gateway. The ingress 539 NVE then rewrites the destination MAC address in the packet with the 540 router's MAC address of the local WAN gateway. It also rewrites the 541 source MAC address with its own IRB Interface MAC address. The 542 ingress NVE, then, forwards the frame to the WAN gateway after 543 encapsulating it with the MPLS label stack. Note that this label 544 stack includes the LSP label as well as the label for the default host 545 route that was advertised by the local WAN gateway.
When the MPLS 546 encapsulated packet is received by the local WAN gateway, it uses the 547 default host route label to identify the IP-VRF table. It then 548 performs an IP lookup in that table. The lookup identifies an 549 adjacency that contains a MAC rewrite and in turn the remote WAN 550 gateway (of the remote data center) to which the packet must be 551 forwarded along with the associated MPLS label stack. The MAC rewrite 552 holds the MAC address associated with the ultimate destination host 553 (as populated by the EVPN MAC route). The local WAN gateway then 554 rewrites the destination MAC address in the packet with the address 555 specified in the adjacency. It also rewrites the source MAC address 556 with its router's MAC address. The local WAN gateway, then, forwards 557 the frame to the remote WAN gateway after encapsulating it with the 558 MPLS label stack. Note that this label stack includes the LSP label 559 as well as an EVPN label that was advertised by the remote WAN 560 gateway. When the MPLS encapsulated packet is received by the remote 561 WAN gateway, it simply swaps the EVPN label and forwards the packet 562 to the egress NVE. This implies that GW1 needs to keep the remote 563 host MAC addresses along with the corresponding EVPN labels in the 564 adjacency entries of the IP-VRF table (i.e., its ARP table). The 565 remote WAN gateway then forwards the packet to the egress NVE. The 566 egress NVE then performs a MAC lookup in the MAC-VRF (identified by 567 the received EVPN label) to determine the outbound port to send the 568 traffic on. 570 Figure 4 below depicts the forwarding model.
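
The two IP lookups in this scenario, a default-route match on the ingress NVE followed by a host-route match on the local WAN gateway, can be sketched with a small longest-prefix-match helper. This is an illustrative sketch only: the table layout, the lookup function, and every address, MAC, and label value are assumptions made for the example.

```python
from ipaddress import ip_address, ip_network


def lookup(ip_vrf, dst):
    """Longest-prefix match over a dict keyed by ip_network objects."""
    addr = ip_address(dst)
    matches = [prefix for prefix in ip_vrf if addr in prefix]
    if not matches:
        return None
    return ip_vrf[max(matches, key=lambda prefix: prefix.prefixlen)]


GW1_ROUTER_MAC = "00:00:5e:00:53:a1"  # hypothetical router's MAC of GW1
TS2_MAC = "00:00:5e:00:53:02"         # hypothetical MAC of the remote host

# The ingress NVE holds only the default route advertised by its local gateway.
nve1_ip_vrf = {
    ip_network("0.0.0.0/0"): {"next_hop": "GW1", "dmac": GW1_ROUTER_MAC},
}

# The local gateway holds the remote host route, whose adjacency carries the
# host MAC and the EVPN label advertised by the remote gateway.
gw1_ip_vrf = {
    ip_network("198.51.100.20/32"): {
        "next_hop": "GW2", "dmac": TS2_MAC, "evpn_label": 302,
    },
}

first = lookup(nve1_ip_vrf, "198.51.100.20")   # lookup on the ingress NVE
second = lookup(gw1_ip_vrf, "198.51.100.20")   # lookup on the local gateway
print(first["next_hop"], second["next_hop"], second["dmac"])
```

The sketch shows why the NVE's IP-VRF stays small in this model: the per-host state (MAC rewrite and EVPN label) lives in the gateway's adjacency entries instead.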
572 NVE1 GW1 GW2 NVE2 573 +------------+ +------------+ +------------+ +------------+ 574 | | | | | | | | 575 |(MAC - (IP | |(IP - (MAC | | [LS] | |(IP - (MAC | 576 | VRF) VRF)| | VRF) VRF)| | | | | | VRF) VRF)| 577 | | | | | | | | | | | | | | | | 578 +------------+ +------------+ +------------+ +------------+ 579 ^ v ^ V ^ V ^ V 580 | | | | | | | | 581 TS1->-+ +-->-----+ +---------------+ +---------------+ +->-TS2 583 Figure 4: Inter-Subnet Forwarding Among EVPN NVEs in Different DCs 584 with GW 586 4.4 Among IP-VPN Sites and EVPN NVEs with GW 588 In this scenario, the NVEs within a given data center do not have 589 entries for the IP addresses of hosts in remote enterprise sites. 590 Rather, the NVEs have a default IP route pointing to the WAN gateway for 591 each IP-VRF. 593 When an Ethernet frame is received by an ingress NVE, it performs a 594 lookup on the destination MAC address in the associated MAC-VRF 595 table. If the MAC address corresponds to the IRB Interface MAC 596 address, the ingress NVE deduces that the packet MUST be inter-subnet 597 routed. Hence, the ingress NVE performs an IP lookup in the 598 associated IP-VRF table. The lookup, in this case, matches the 599 default route which points to the local WAN gateway. The ingress NVE 600 then rewrites the destination MAC address in the packet with the 601 router's MAC address of the local WAN gateway. It also rewrites the 602 source MAC address with its own IRB Interface MAC address. The 603 ingress NVE, then, forwards the frame to the local WAN gateway after 604 encapsulating it with the MPLS label stack. Note that this label 605 stack includes the LSP label as well as the default host route label 606 that was advertised by the local WAN gateway. When the MPLS 607 encapsulated packet is received by the local WAN gateway, it uses the 608 default host route label to identify the IP-VRF table. It then 609 performs an IP lookup in that table.
The lookup identifies the next 610 hop ASBR to which the packet must be forwarded. The local gateway in 611 this case strips the Ethernet encapsulation, performs an IP lookup 612 in its IP-VRF, and forwards the IP packet to the ASBR using a label 613 stack comprising an LSP label and an IP-VPN label that was 614 advertised by the ASBR. When the MPLS encapsulated packet is received 615 by the ASBR, it simply swaps the IP-VPN label with the one advertised 616 by the egress PE. The ASBR then forwards the packet to the egress PE. 617 The egress PE then performs an IP lookup in the IP-VRF (identified by 618 the received IP-VPN label) to determine where to forward the traffic. 620 Figure 5 below depicts the forwarding model. 622 NVE1 GW1 ASBR NVE2 623 +------------+ +------------+ +------------+ +------------+ 624 | | | | | | | | 625 |(MAC - (IP | |(IP - (MAC | | [LS] | | (IP | 626 | VRF) VRF)| | VRF) VRF)| | | | | | VRF)| 627 | | | | | | | | | | | | | | | | 628 +------------+ +------------+ +------------+ +------------+ 629 ^ v ^ V ^ V ^ V 630 | | | | | | | | 631 TS1->-+ +-->-----+ +--------------+ +---------------+ +->-H1 633 Figure 5: Inter-Subnet Forwarding Among IP-VPN Sites and EVPN NVEs 634 with GW 636 4.5 Use of Centralized Gateway 638 In this scenario, the NVEs within a given data center need to forward 639 traffic in L2 to a centralized L3GW for one of the following reasons: a) they 640 don't have IRB capabilities or b) they don't have the required policy for 641 switching traffic between different tenants or security zones. The 642 centralized L3GW performs both the IRB function for switching traffic 643 among different EVPN instances and the interworking 644 function when traffic needs to be switched between IP-VPN sites 645 and EVPN instances. 647 5 Operational Models for Symmetric Inter-Subnet Forwarding 649 The following sections describe the main symmetric IRB forwarding 650 scenarios.
652 5.1 IRB forwarding on NVEs for Tenant Systems 654 This section covers the symmetric IRB procedures for the scenario 655 where each Tenant System (TS) is attached to one or more NVEs and its 656 host IP and MAC addresses are learned by the attached NVEs and are 657 distributed to all other NVEs that are interested in participating in 658 both intra-subnet and inter-subnet communications with that TS. 660 In this scenario, for a given tenant (e.g., an IP-VPN instance), an 661 NVE typically has one MAC-VRF for each of the tenant's subnets (VLANs) that 662 it is configured for. Assuming a VLAN-based service, which is typically the 663 case for VxLAN and NVGRE encapsulation, each MAC-VRF consists of a 664 single bridge domain. In the case of MPLS encapsulation with VLAN-aware 665 bundling, each MAC-VRF consists of multiple bridge domains (one 666 bridge domain per VLAN). The MAC-VRFs on an NVE for a given tenant 667 are associated with an IP-VRF corresponding to that tenant (or IP-VPN 668 instance) via their IRB interfaces. 670 Each NVE MUST support QoS, Security, and OAM policies per IP-VRF 671 to/from the core network. This is not to be confused with the QoS, 672 Security, and OAM policies per Attachment Circuit (AC) to/from the 673 Tenant Systems. How this requirement is met is an implementation 674 choice and it is outside the scope of this document. 676 Since VxLAN and NVGRE encapsulations require an inner Ethernet header 677 (inner MAC SA/DA), and since the TS MAC address 678 cannot be used for inter-subnet traffic, the ingress NVE's MAC address is used as the inner MAC 679 SA. The NVE's MAC address is the device MAC address and it is common 680 across all MAC-VRFs and IP-VRFs. This MAC address is advertised using 681 the new EVPN Router's MAC Extended Community (section 6.1). 683 Figure 6 below illustrates this scenario where a given tenant (e.g., an 684 IP-VPN instance) has three subnets represented by MAC-VRF1, MAC-VRF2, 685 and MAC-VRF3 across two NVEs.
There are five TS's that are associated 686 with these three MAC-VRFs - i.e., TS1, TS4, and TS5 are sitting on 687 the same subnet (e.g., same MAC-VRF/VLAN), where TS1 and TS5 are 688 associated with MAC-VRF1 on NVE1 and TS4 is associated with MAC-VRF1 on 689 NVE2. TS2 is associated with MAC-VRF2 on NVE1, and TS3 is associated 690 with MAC-VRF3 on NVE2. MAC-VRF1 and MAC-VRF2 on NVE1 are in turn 691 associated with IP-VRF1 on NVE1, and MAC-VRF1 and MAC-VRF3 on NVE2 are 692 associated with IP-VRF1 on NVE2. When TS1, TS5, and TS4 exchange 693 traffic with each other, only the L2 forwarding (bridging) part of the 694 IRB solution is exercised because all these TS's sit on the same 695 subnet. However, when TS1 wants to exchange traffic with TS2 or TS3, 696 which belong to different subnets, both the bridging and routing 697 parts of the IRB solution are exercised. The following subsections 698 describe the control-plane and data-plane operations for this IRB scenario 699 in detail. 701 NVE1 +---------+ 702 +-------------+ | | 703 TS1-----| MACx| | | NVE2 704 (IP1/M1) |(MAC- | | | +-------------+ 705 TS5-----| VRF1)\ | | MPLS/ | |MACy (MAC- |-----TS3 706 (IP5/M5) | \ | | VxLAN/ | | / VRF3) | (IP3/M3) 707 | (IP-VRF1)|----| NVGRE |---|(IP-VRF1) | 708 | / | | | | \ | 709 TS2-----|(MAC- / | | | | (MAC- |-----TS4 710 (IP2/M2) | VRF2) | | | | VRF1) | (IP4/M4) 711 +-------------+ | | +-------------+ 712 | | 713 +---------+ 715 Figure 6: IRB forwarding on NVEs for Tenant Systems 717 5.1.1 Control Plane Operation 719 Each NVE advertises a Route Type-2 (RT-2, MAC/IP Advertisement Route) 720 for each of its TS's with the following fields set: 722 - RD and ESI per [EVPN] 723 - Ethernet Tag = 0; assuming VLAN-based service 724 - MAC Address Length = 48 725 - MAC Address = Mi ; where i = 1,2,3,4, or 5 in the above example 726 - IP Address Length = 32 or 128 727 - IP Address = IPi ; where i = 1,2,3,4, or 5 in the above example 728 - Label-1 = MPLS Label or VNID corresponding to MAC-VRF 729 -
Label-2 = MPLS Label or VNID corresponding to IP-VRF 731 Each NVE advertises an RT-2 route with two Route Targets (one 732 corresponding to its MAC-VRF and the other corresponding to its IP- 733 VRF). Furthermore, the RT-2 is advertised with two BGP Extended 734 Communities. The first BGP Extended Community identifies the tunnel 735 type per section 4.5 of [RFC5512] and the second BGP Extended 736 Community includes the MAC address of the NVE (e.g., MACx for NVE1 or 737 MACy for NVE2) as defined in section 6.1. This second Extended 738 Community (for the MAC address of the NVE) is only required when an Ethernet 739 NVO tunnel type is used. If an IP NVO tunnel type is used, then there is 740 no need to send this second Extended Community. 742 Upon receiving this advertisement, the receiving NVE performs the 743 following: 745 - It uses the Route Targets corresponding to its MAC-VRF and IP-VRF for 746 identifying these tables and subsequently importing this route into 747 them. 749 - It imports the MAC address into the MAC-VRF with the BGP Next Hop 750 address as the underlay tunnel destination address (e.g., VTEP DA for 751 VxLAN encapsulation) and Label-1 as the VNID for VxLAN encapsulation or 752 the EVPN label for MPLS encapsulation. 754 - If the route carries the new Router's MAC Extended Community, and 755 if the receiving NVE is using an Ethernet NVO tunnel, then the receiving 756 NVE imports the IP address into the IP-VRF with the NVE's MAC address (from 757 the new Router's MAC Extended Community) as the inner MAC DA, the BGP Next 758 Hop address as the underlay tunnel destination address (e.g., VTEP DA for VxLAN 759 encapsulation), and Label-2 as the IP-VPN VNID for VxLAN encapsulation. 761 - If the receiving NVE is going to use MPLS encapsulation, then the 762 receiving NVE imports the IP address into the IP-VRF with the BGP Next Hop 763 address as the underlay tunnel destination address, and Label-2 as the IP-VPN 764 label for MPLS encapsulation.
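The receive-side steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names Rt2Route, Nve, and import_rt2, the dictionary layout of the tables, and the Route Target strings are hypothetical and are not defined by this document. Validation of malformed routes is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rt2Route:
    """Hypothetical container for the RT-2 fields used in section 5.1.1."""
    mac: str
    ip: str
    label1: int                       # MPLS label or VNID of the MAC-VRF
    label2: Optional[int]             # MPLS label or VNID of the IP-VRF
    route_targets: frozenset
    bgp_next_hop: str                 # underlay tunnel destination address
    router_mac: Optional[str] = None  # from the Router's MAC Extended Community


@dataclass
class Nve:
    """Hypothetical receiving NVE with one MAC-VRF and one IP-VRF."""
    mac_vrf_rt: str
    ip_vrf_rt: str
    ethernet_nvo: bool                # True: Ethernet NVO tunnel, False: MPLS
    mac_vrf: dict = field(default_factory=dict)
    ip_vrf: dict = field(default_factory=dict)


def import_rt2(nve: Nve, route: Rt2Route) -> None:
    """Import one received RT-2 into the MAC-VRF and IP-VRF of an NVE."""
    # MAC address goes into the MAC-VRF, keyed by Label-1.
    if nve.mac_vrf_rt in route.route_targets:
        nve.mac_vrf[route.mac] = {
            "tunnel_dst": route.bgp_next_hop,
            "label": route.label1,
        }

    # IP address goes into the IP-VRF, keyed by Label-2.
    if nve.ip_vrf_rt not in route.route_targets or route.label2 is None:
        return
    if nve.ethernet_nvo:
        # An Ethernet NVO tunnel needs the remote NVE's MAC as inner MAC DA.
        if route.router_mac is None:
            return
        nve.ip_vrf[route.ip] = {
            "inner_mac_da": route.router_mac,
            "tunnel_dst": route.bgp_next_hop,
            "label": route.label2,
        }
    else:
        # MPLS encapsulation: no inner Ethernet header is required.
        nve.ip_vrf[route.ip] = {
            "tunnel_dst": route.bgp_next_hop,
            "label": route.label2,
        }
```

For example, NVE1 in Figure 6 receiving the RT-2 for TS4 from NVE2 would end up with M4 in its MAC-VRF1 table and IP4 in its IP-VRF1 table, both pointing at NVE2 as the tunnel destination.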
766 If the receiving NVE receives an RT-2 with only a single Route Target 767 corresponding to IP-VRF and Label-1, then it must discard this route 768 and log an error. If the receiving NVE receives an RT-2 with only a 769 single Route Target corresponding to MAC-VRF but with both Label-1 770 and Label-2, then it must discard this route and log an error. If the 771 receiving NVE receives an RT-2 with MAC Address Length of zero, then 772 it must discard this route and log an error. 774 5.1.2 Data Plane Operation - Inter Subnet 776 The following description of the data-plane operation describes just 777 the logical functions and the actual implementation may differ. Let's 778 consider data-plane operation when TS1 in subnet-1 (MAC-VRF1) on NVE1 779 wants to send traffic to TS3 in subnet-3 (MAC-VRF3) on NVE2. 781 - TS1 sends a packet with MAC DA corresponding to the MAC-VRF1 IRB 782 interface on NVE1 (the interface between MAC-VRF1 and IP-VRF1), and 783 VLAN-tag corresponding to MAC-VRF1. 785 - Upon receiving the packet, NVE1 uses the VLAN-tag to identify 786 MAC-VRF1. It then looks up the MAC DA and forwards the frame to its 787 IRB interface. 789 - The Ethernet header of the packet is stripped and the packet is 790 fed to the IP-VRF where an IP lookup is performed on the destination 791 address. This lookup yields an outgoing interface and the required 792 encapsulation. If the encapsulation is for an Ethernet NVO tunnel, then 793 it includes a MAC address to be used as inner MAC DA, an IP address 794 to be used as VTEP DA, and a VPN-ID to be used as VNID. 796 - The packet is then encapsulated with the proper header based on 797 the above info. The inner MAC SA and VTEP SA are set to the NVE's MAC and 798 IP addresses, respectively. The packet is then forwarded to the egress 799 NVE. 801 - On the egress NVE, if the packet arrives on an Ethernet NVO tunnel 802 (e.g., it is VxLAN encapsulated), then the VxLAN header is removed.
803 Since the inner MAC DA is the egress NVE's MAC address, the egress 804 NVE knows that it needs to perform an IP lookup. It uses the VNID to 805 identify the IP-VRF table and then performs an IP lookup for the 806 destination TS (TS3) which results in an access-facing IRB interface 807 over which the packet is sent. Before sending the packet over this 808 interface, the ARP table is consulted to get the destination TS's MAC 809 address. 811 - The IP packet is encapsulated with an Ethernet header with the MAC SA 812 set to the IRB interface MAC address and the MAC DA set to the 813 MAC address of the destination TS (TS3). The packet is sent to the 814 corresponding MAC-VRF3 and, after a lookup of the MAC DA, is forwarded to 815 the destination TS (TS3) over the corresponding interface. 817 In this symmetric IRB scenario, inter-subnet traffic between NVEs 818 will always use the IP-VRF VNID/MPLS label. For instance, traffic 819 from TS2 to TS4 will be encapsulated by NVE1 using NVE2's IP-VRF 820 VNID/MPLS label, as long as TS4's host IP is present in NVE1's IP- 821 VRF. 823 5.1.3 TS Move Operation 825 When a TS moves from one NVE to another, it is important that the MAC 826 mobility procedures are properly executed and the corresponding MAC- 827 VRF and IP-VRF tables on all participating NVEs are updated. [EVPN] 828 describes the MAC mobility procedures for L2-only services for both 829 single-homed TS and multi-homed TS. This section describes the 830 incremental procedures and BGP Extended Communities needed to handle 831 the MAC mobility for a mix of L2 and L3 connectivity (aka IRB). In 832 order to place the emphasis on the differences between L2-only versus 833 L2-and-L3 use cases, the incremental procedure is described for 834 single-homed TS with the expectation that the reader can easily 835 extrapolate multi-homed TS based on the procedures described in 836 section 15 of [EVPN]. 838 Let's consider TS1 in Figure 6 above where it moves from NVE1 to NVE2.
In such a move, NVE2 discovers IP1/MAC1 of TS1, realizes that it is 840 a MAC move, and advertises a MAC/IP route per section 5.1.1 above 841 with the MAC Mobility Extended Community. In this IRB use case, both MAC 842 and IP addresses of the TS along with their corresponding VNI/MPLS 843 labels are included in the EVPN MAC/IP Advertisement route. 844 Furthermore, besides the MAC Mobility Extended Community and Route Target 845 corresponding to the MAC-VRF, the following additional BGP Extended 846 Communities are advertised along with the MAC/IP Advertisement route: 848 - Route Target associated with IP-VRF 849 - Router's MAC Extended Community 850 - Tunnel Type Extended Community 852 Since NVE2 learns TS1's MAC/IP addresses locally, it updates its MAC- 853 VRF1 and IP-VRF1 for TS1 with its local interface. 855 If the local learning at NVE1 is performed using control or 856 management planes, then these interactions serve as the trigger for 857 NVE1 to withdraw the MAC/IP addresses associated with TS1. However, 858 if the local learning at NVE1 is performed using data-plane learning, 859 then the reception of the MAC/IP Advertisement route (for TS1) from 860 NVE2 with the MAC Mobility extended community serves as the trigger for 861 NVE1 to withdraw the MAC/IP addresses associated with TS1. 863 All other remote NVE devices, upon receiving the MAC/IP Advertisement 864 route for TS1 from NVE2 with the MAC Mobility extended community, compare 865 the sequence number in this advertisement with the one previously 866 received. If the new sequence number is greater than the old one, 867 then they update the MAC/IP addresses of TS1 in their corresponding 868 MAC-VRFs and IP-VRFs to point to NVE2. Furthermore, upon receiving 869 the MAC/IP withdraw for TS1 from NVE1, these remote NVEs clean up 870 the corresponding entries in their BGP tables.
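The sequence-number comparison performed by the remote NVEs can be sketched as below. This is a minimal Python illustration, not a definitive implementation: the RemoteNve class, the on_mac_ip_advertisement function, and the tuple layout of the table entries are hypothetical names chosen for this example. Handling of equal sequence numbers and of withdraws is not covered by the paragraph above and is left out here.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteNve:
    """Hypothetical remote NVE state: address -> (owning NVE, sequence)."""
    mac_vrf: dict = field(default_factory=dict)
    ip_vrf: dict = field(default_factory=dict)


def on_mac_ip_advertisement(nve: RemoteNve, mac: str, ip: str,
                            from_nve: str, seq: int) -> bool:
    """Repoint the MAC and IP entries of a TS if the advertisement is newer.

    Returns True when the tables were updated.
    """
    # A TS that was never seen before is treated as older than any sequence.
    _, old_seq = nve.mac_vrf.get(mac, (None, -1))
    if seq <= old_seq:
        # Not greater than the previously received sequence number: ignore.
        return False
    # Both the MAC-VRF and the IP-VRF entries now point to the new NVE.
    nve.mac_vrf[mac] = (from_nve, seq)
    nve.ip_vrf[ip] = (from_nve, seq)
    return True
```

With this sketch, an advertisement for TS1 from NVE2 carrying sequence number 1 replaces an earlier entry learned from NVE1 with sequence number 0, while a late copy of the old NVE1 route is ignored.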
872 5.2 IRB forwarding on NVEs for Subnets behind Tenant Systems 874 This section covers the symmetric IRB procedures for the scenario 875 where some Tenant Systems (TS's) support one or more subnets and 876 these TS's are associated with one or more NVEs. Therefore, besides 877 the advertisement of MAC/IP addresses for each TS which can be in the 878 presence of All-Active multi-homing, the associated NVE needs to also 879 advertise the subnets behind each TS. 881 The main difference between this scenario and the previous one is the 882 additional advertisement corresponding to each subnet. These subnet 883 advertisements are accomplished using the EVPN IP Prefix route defined in 884 [EVPN-PREFIX]. These subnet prefixes are advertised with the IP 885 address of their associated TS (which is in overlay address space) as 886 their next hop. The receiving NVEs perform recursive route resolution 887 to resolve the subnet prefix with its associated ingress NVE so that 888 they know which NVE to forward the packets to when they are destined 889 for that subnet prefix. 891 The advantage of this recursive route resolution is that when a TS 892 moves from one NVE to another, there is no need to re-advertise any 893 of the subnet prefixes for that TS. All that is needed is to advertise 894 the IP/MAC addresses associated with the TS itself and exercise MAC 895 mobility procedures for that TS. The recursive route resolution 896 automatically takes care of the updates for the subnet prefixes of 897 that TS. 899 Figure 7 below illustrates this scenario where a given tenant (e.g., an 900 IP-VPN service) has three subnets represented by MAC-VRF1, MAC-VRF2, 901 and MAC-VRF3 across two NVEs. There are four TS's associated with 902 these three MAC-VRFs - i.e., TS1 is connected to MAC-VRF1 on 903 NVE1, TS2 is connected to MAC-VRF2 on NVE1, TS3 is connected to MAC- 904 VRF3 on NVE2, and TS4 is connected to MAC-VRF1 on NVE2.
TS1 has two 905 subnet prefixes (SN1 and SN2) and TS3 has a single subnet prefix, 906 SN3. The MAC-VRFs on each NVE are associated with their corresponding 907 IP-VRF using their IRB interfaces. When TS4 and TS1 exchange intra- 908 subnet traffic, only the L2 forwarding (bridging) part of the IRB 909 solution is used (i.e., the traffic only goes through their MAC- 910 VRFs); however, when TS3 wants to forward traffic to SN1 or SN2 911 sitting behind TS1 (inter-subnet traffic), then both the bridging and 912 routing parts of the IRB solution are exercised (i.e., the traffic 913 goes through the corresponding MAC-VRFs and IP-VRFs). The following 914 subsections describe the control-plane and data-plane operations for this 915 IRB scenario in detail. 917 NVE1 +----------+ 918 SN1--+ +-------------+ | | 919 |--TS1-----|(MAC- \ | | | 920 SN2--+ IP1/M1 | VRF1) \ | | | 921 | (IP-VRF)|---| | 922 | / | | | 923 TS2-----|(MAC- / | | MPLS/ | 924 IP2/M2 | VRF2) | | VxLAN/ | 925 +-------------+ | NVGRE | 926 +-------------+ | | 927 SN3--+--TS3-----|(MAC-\ | | | 928 IP3/M3 | VRF3)\ | | | 929 | (IP-VRF)|---| | 930 | / | | | 931 TS4-----|(MAC- / | | | 932 IP4/M4 | VRF1) | | | 933 +-------------+ +----------+ 934 NVE2 936 Figure 7: IRB forwarding on NVEs for Tenant Systems with configured subnets 938 5.2.1 Control Plane Operation 940 Each NVE advertises a Route Type-5 (RT-5, IP Prefix Route defined in 941 [EVPN-PREFIX]) for each of its subnet prefixes with the IP address of 942 its TS as the next hop (gateway address field) as follows: 944 - RD per VPN 945 - ESI = 0 946 - Ethernet Tag = 0 947 - IP Prefix Length = 32 or 128 948 - IP Prefix = SNi 949 - Gateway Address = IPi; IP address of TS 950 - Label = 0 952 This RT-5 is advertised with a Route Target corresponding to the IP- 953 VPN service. 955 Each NVE also advertises an RT-2 (MAC/IP Advertisement Route) along 956 with its associated Route Targets and Extended Communities for each 957 of its TS's exactly as described in section 5.1.1.
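The recursive route resolution introduced in section 5.2 can be sketched as a two-step lookup: the RT-5 maps a subnet prefix to the overlay IP address of its TS, and the RT-2 maps that TS address to the NVE behind which it currently sits. The Python below is only an illustration under stated assumptions: the function name resolve_nve, the two dictionaries, and all addresses (taken from documentation ranges) are hypothetical.

```python
import ipaddress
from typing import Optional


def resolve_nve(prefix_routes: dict, host_routes: dict,
                dst: str) -> Optional[str]:
    """Resolve a destination IP to the NVE to tunnel to.

    prefix_routes: subnet prefix -> gateway (TS overlay IP), as from RT-5.
    host_routes:   TS overlay IP -> NVE underlay address, as from RT-2.
    """
    addr = ipaddress.ip_address(dst)
    best = None
    # Step 1: longest-prefix match over the subnet prefixes.
    for prefix, gateway in prefix_routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    if best is None:
        return None
    # Step 2: resolve the gateway (the TS) to its current NVE.
    return host_routes.get(best[1])


# SN1 sits behind TS1 (overlay IP 192.0.2.1), which is attached to NVE1.
prefix_routes = {"198.51.100.0/24": "192.0.2.1"}
host_routes = {"192.0.2.1": "203.0.113.1"}
before_move = resolve_nve(prefix_routes, host_routes, "198.51.100.7")

# TS1 moves to NVE2: only the host route changes, the prefix route does not.
host_routes["192.0.2.1"] = "203.0.113.2"
after_move = resolve_nve(prefix_routes, host_routes, "198.51.100.7")
```

The point of the design is visible in the last lines: after the TS move, the prefix table is untouched and the same lookup simply resolves to the new NVE.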
959 Upon receiving the RT-5 advertisement, the receiving NVE performs the 960 following: 962 - It uses the Route Target to identify the corresponding IP-VRF. 963 - It imports the IP prefix into its corresponding IP-VRF with the IP 964 address of the associated TS as its next hop. 966 Upon receiving the RT-2 advertisement, the receiving NVE imports the 967 MAC/IP addresses of the TS into the corresponding MAC-VRF and IP-VRF 968 per section 5.1.1. Furthermore, it performs recursive route 969 resolution to resolve the IP prefix (received in RT-5) to its 970 corresponding NVE's IP address (e.g., its BGP next hop). The BGP next hop 971 will be used as the underlay tunnel destination address (e.g., VTEP DA 972 for VxLAN encapsulation) and the Router's MAC will be used as the inner MAC DA 973 for VxLAN encapsulation. 975 5.2.2 Data Plane Operation 977 The following description of the data-plane operation describes just 978 the logical functions and the actual implementation may differ. Let's 979 consider data-plane operation when a host on SN1 sitting behind TS1 980 wants to send traffic to a host on SN3 sitting behind TS3. 982 - TS1 sends a packet with MAC DA corresponding to the MAC-VRF1 IRB 983 interface of NVE1, and VLAN-tag corresponding to MAC-VRF1. 985 - Upon receiving the packet, the ingress NVE1 uses the VLAN-tag to 986 identify MAC-VRF1. It then looks up the MAC DA and forwards the 987 frame to its IRB interface just like in section 5.1.2. 989 - The Ethernet header of the packet is stripped and the packet is fed 990 to the IP-VRF, where an IP lookup is performed on the destination 991 address. This lookup yields the fields needed for VxLAN encapsulation 992 with NVE2's MAC address as the inner MAC DA, NVE2's IP address as the 993 VTEP DA, and the VNID. MAC SA is set to NVE1's MAC address and VTEP 994 SA is set to NVE1's IP address. 996 - The packet is then encapsulated with the proper header based on 997 the above info and is forwarded to the egress NVE (NVE2).
999 - On the egress NVE (NVE2), assuming the packet is VxLAN 1000 encapsulated, the VxLAN and the inner Ethernet headers are removed 1001 and the resultant IP packet is fed to the IP-VRF associated with that 1002 VNID. 1004 - Next, a lookup is performed based on IP DA (which is in SN3) in the 1005 associated IP-VRF of NVE2. The IP lookup yields the access-facing IRB 1006 interface over which the packet needs to be sent. Before sending the 1007 packet over this interface, the ARP table is consulted to get the 1008 destination TS (TS3) MAC address. 1010 - The IP packet is encapsulated with an Ethernet header with the MAC 1011 SA set to that of the access-facing IRB interface of the egress NVE 1012 (NVE2) and the MAC DA set to the MAC address of the destination TS (TS3). 1013 The packet is sent to the corresponding MAC-VRF3 and, after a 1014 lookup of the MAC DA, is forwarded to the destination TS (TS3) over the 1015 corresponding interface. 1017 6 BGP Encoding 1019 This document defines one new BGP Extended Community for EVPN. 1021 6.1 Router's MAC Extended Community 1023 A new EVPN BGP Extended Community called Router's MAC is introduced 1024 here. This new extended community is a transitive extended community 1025 with the Type field of 0x06 (EVPN) and the Sub-Type of 0x03. It may 1026 be advertised along with the BGP Encapsulation Extended Community defined 1027 in section 4.5 of [RFC5512].
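As a rough illustration of the Type and Sub-Type values given above, the following Python sketch packs a Router's MAC Extended Community into eight octets (Type, Sub-Type, then the six octets of the MAC address) and unpacks it again. The function names and the colon-separated MAC string format are assumptions made for this example only.

```python
EVPN_TYPE = 0x06            # Type field: EVPN
ROUTERS_MAC_SUBTYPE = 0x03  # Sub-Type field: Router's MAC


def encode_routers_mac(mac: str) -> bytes:
    """Build the 8-octet extended community from a MAC like 00:11:22:33:44:55."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("a MAC address must be exactly 6 octets")
    return bytes([EVPN_TYPE, ROUTERS_MAC_SUBTYPE]) + octets


def decode_routers_mac(community: bytes) -> str:
    """Extract the MAC address from an 8-octet Router's MAC community."""
    if len(community) != 8:
        raise ValueError("an extended community must be exactly 8 octets")
    if community[0] != EVPN_TYPE or community[1] != ROUTERS_MAC_SUBTYPE:
        raise ValueError("not a Router's MAC Extended Community")
    return ":".join(f"{octet:02x}" for octet in community[2:])
```

A receiver would typically check the first two octets before interpreting the remaining six as the NVE's MAC address, which is what decode_routers_mac does.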
1029 The Router's MAC Extended Community is encoded as an 8-octet value as 1030 follows: 1032 0 1 2 3 1033 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1034 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1035 | Type=0x06 | Sub-Type=0x03 | Router's MAC | 1036 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1037 | Router's MAC Cont'd | 1038 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1040 This extended community is used to carry the NVE's MAC address for 1041 symmetric IRB scenarios and it is sent with RT-2 as described in 1042 sections 5.1.1 and 5.2.1. 1044 7 TS Mobility 1046 7.1 TS Mobility & Optimum Forwarding for TS Outbound Traffic 1048 Optimum forwarding for the TS outbound traffic, upon TS mobility, can 1049 be achieved using either the anycast default Gateway MAC and IP 1050 addresses, or the address aliasing discussed in [DC- 1051 MOBILITY]. 1053 7.2 TS Mobility & Optimum Forwarding for TS Inbound Traffic 1054 For optimum forwarding of the TS inbound traffic, upon TS mobility, 1055 all the NVEs and/or IP-VPN PEs need to know the up-to-date location 1056 of the TS. Two scenarios must be considered, as discussed next. 1058 In what follows, we use the following terminology: 1060 - source NVE refers to the NVE behind which the TS used to reside 1061 prior to the TS mobility event. 1063 - target NVE refers to the new NVE behind which the TS has moved 1064 after the mobility event. 1066 7.2.1 Mobility without Route Aggregation 1068 In this scenario, when a target NVE detects that a MAC mobility event 1069 has occurred, it initiates the MAC mobility handshake in BGP as 1070 specified in section 5.1.3. The WAN Gateways, acting as ASBRs in this 1071 case, re-advertise the MAC route of the target NVE with the MAC 1072 Mobility extended community attribute unmodified.
Because the WAN 1073 Gateway for a given data center re-advertises BGP routes received 1074 from the WAN into the data center, the source NVE will receive the 1075 MAC Advertisement route of the target NVE (with the next hop 1076 attribute adjusted depending on which inter-AS option is employed). 1077 The source NVE will then withdraw its original MAC Advertisement 1078 route as a result of evaluating the Sequence Number field of the MAC 1079 Mobility extended community in the received MAC Advertisement route. 1080 This is per the procedures already defined in [EVPN]. 1082 8 Acknowledgements 1084 The authors would like to thank Sami Boutros for his valuable 1085 comments. 1087 9 Security Considerations 1089 The security considerations discussed in [RFC7432] apply to this 1090 document. 1092 10 IANA Considerations 1094 IANA has allocated a new transitive extended community Type of 0x06 1095 and Sub-Type of 0x03 for EVPN Router's MAC Extended Community. 1097 11 References 1099 11.1 Normative References 1101 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1102 Requirement Levels", BCP 14, RFC 2119, March 1997. 1104 11.2 Informative References 1106 [EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN", draft-ietf- 1107 l2vpn-evpn-04.txt, work in progress, July, 2014. 1109 [EVPN-IPVPN-INTEROP] Sajassi et al., "EVPN Seamless Interoperability 1110 with IP-VPN", draft-sajassi-l2vpn-evpn-ipvpn-interop-01, work in 1111 progress, October, 2012. 1113 [DC-MOBILITY] Aggarwal et al., "Data Center Mobility based on 1114 BGP/MPLS, IP Routing and NHRP", draft-raggarwa-data-center-mobility- 1115 05.txt, work in progress, June, 2013. 1117 [EVPN-PREFIX] Rabadan et al., "IP Prefix Advertisement in EVPN", 1118 draft-rabadan-l2vpn-evpn-prefix-advertisement-02, July, 2014. 
1120 12 Contributors 1122 In addition to the authors listed on the front page, the following 1123 co-authors have also contributed to this document: 1125 Samer Salam 1126 Florin Balus 1127 Cisco 1129 Yakov Rekhter 1130 Juniper 1132 Wim Henderickx 1133 Nokia 1135 Linda Dunbar 1136 Huawei 1138 Dennis Cai 1139 Alibaba 1141 Authors' Addresses 1143 Ali Sajassi (Editor) 1144 Cisco 1145 Email: sajassi@cisco.com 1146 Samer Salam 1147 Cisco 1148 Email: sslam@cisco.com 1150 Samir Thoria 1151 Cisco 1152 Email: sthoria@cisco.com 1154 John E. Drake 1155 Juniper Networks 1156 Email: jdrake@juniper.net 1158 Lucy Yong 1159 Huawei Technologies 1160 Email: lucy.yong@huawei.com 1162 Jorge Rabadan 1163 Nokia 1164 Email: jorge.rabadan@nokia.com