BESS Workgroup                                           J. Rabadan, Ed.
Internet Draft                                             W. Henderickx
Intended status: Standards Track                                   Nokia

                                                                J. Drake
                                                                  W. Lin
                                                                 Juniper

                                                              A. Sajassi
                                                                   Cisco

Expires: August 31, 2018                               February 27, 2018

                    IP Prefix Advertisement in EVPN
              draft-ietf-bess-evpn-prefix-advertisement-10

Abstract

   EVPN provides a flexible control plane that allows intra-subnet
   connectivity in an MPLS and/or NVO-based network.
   In some networks, there is also a need for dynamic and efficient
   inter-subnet connectivity across Tenant Systems and End Devices that
   can be physical or virtual and do not necessarily participate in
   dynamic routing protocols. This document defines a new EVPN route
   type for the advertisement of IP Prefixes and explains some use-case
   examples where this new route type is used.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 31, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Terminology
   2. Introduction and Problem Statement
     2.1 Inter-Subnet Connectivity Requirements in Data Centers
     2.2 The Requirement for a New EVPN Route Type
   3. The BGP EVPN IP Prefix Route
     3.1 IP Prefix Route Encoding
     3.2 Overlay Indexes and Recursive Lookup Resolution
   4. Overlay Index Use-Cases
     4.1 TS IP Address Overlay Index Use-Case
     4.2 Floating IP Overlay Index Use-Case
     4.3 Bump-in-the-Wire Use-Case
     4.4 IP-VRF-to-IP-VRF Model
       4.4.1 Interface-less IP-VRF-to-IP-VRF Model
       4.4.2 Interface-ful IP-VRF-to-IP-VRF with SBD IRB
       4.4.3 Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB
   5. Conclusions
   6. Security Considerations
   7. IANA Considerations
   8. References
     8.1 Normative References
     8.2 Informative References
   9. Acknowledgments
   10. Contributors
   11. Authors' Addresses

1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   GW IP: Gateway IP Address.

   IPL: IP address length.

   ML: MAC address length.

   NVE: Network Virtualization Edge.

   TS: Tenant System.

   VA: Virtual Appliance.

   RT-2: EVPN route type 2, i.e. MAC/IP advertisement route.

   RT-5: EVPN route type 5, i.e. IP Prefix route.

   AC: Attachment Circuit.

   ARP: Address Resolution Protocol.

   ND: Neighbor Discovery Protocol.

   Ethernet NVO tunnel: refers to Network Virtualization Overlay tunnels
      with an Ethernet payload. Examples of this type of tunnel are
      VXLAN and nvGRE.

   IP NVO tunnel: refers to Network Virtualization Overlay tunnels with
      an IP payload (no MAC header in the payload).

   EVI: EVPN Instance spanning the NVE/PE devices that are participating
      in that EVPN.

   MAC-VRF: A Virtual Routing and Forwarding table for Media Access
      Control (MAC) addresses on an NVE/PE, as per [RFC7432].

   BD: Broadcast Domain. As per [RFC7432], an EVI consists of a single
      or multiple BDs. In case of VLAN-bundle and VLAN-based service
      models (see [RFC7432]), a BD is equivalent to an EVI. In case of
      VLAN-aware bundle service model, an EVI contains multiple BDs.
      Also, in this document, BD and subnet are equivalent terms.

   BD route-target: refers to the route-target assigned to the Broadcast
      Domain. In case of VLAN-aware bundle service model, all the BD
      instances in the MAC-VRF share the same route-target.

   BT: Bridge Table. The instantiation of a BD in a MAC-VRF.

   IP-VRF: A VPN Routing and Forwarding table for IP routes on an
      NVE/PE. The IP routes could be populated by EVPN and IP-VPN
      address families.

   IRB: Integrated Routing and Bridging interface. It connects an IP-VRF
      to a BD (or subnet).

   SBD: Supplementary Broadcast Domain. A BD that does not have any ACs,
      only IRB interfaces, and is used to provide connectivity among all
      the IP-VRFs of the tenant. The SBD is only required in the
      IP-VRF-to-IP-VRF use-cases (see section 4.4).

   VNI: Virtual Network Identifier.

   SN: Subnet.

   DGW: Data Center Gateway.

   GARP: Gratuitous Address Resolution Protocol.

2. Introduction and Problem Statement

   Inter-subnet connectivity is used for certain tenants within the Data
   Center. [EVPN-INTERSUBNET] defines some fairly common inter-subnet
   forwarding scenarios where TSes can exchange packets with TSes
   located in remote subnets. In order to achieve this,
   [EVPN-INTERSUBNET] describes how MAC/IPs encoded in TS RT-2 routes
   are not only used to populate MAC-VRF and overlay ARP tables, but
   also IP-VRF tables with the encoded TS host routes (/32 or /128). In
   some cases, EVPN may advertise IP Prefixes and therefore provide
   aggregation in the IP-VRF tables, as opposed to programming
   individual host routes. This document complements the scenarios
   described in [EVPN-INTERSUBNET] and defines how EVPN may be used to
   advertise IP Prefixes. Interoperability between EVPN and L3VPN
   [RFC4364] IP Prefix routes is out of the scope of this document.

   Section 2.1 describes the inter-subnet connectivity requirements in
   Data Centers. Section 2.2 explains why a new EVPN route type is
   required for IP Prefix advertisements. Once the need for a new EVPN
   route type is justified, sections 3, 4 and 5 will describe this route
   type and how it is used in some specific use cases.

2.1 Inter-Subnet Connectivity Requirements in Data Centers

   [RFC7432] is used as the control plane for a Network Virtualization
   Overlay (NVO3) solution in Data Centers (DC), where Network
   Virtualization Edge (NVE) devices can be located in Hypervisors or
   TORs, as described in [EVPN-OVERLAY].

   If the term Tenant System (TS) is used to designate a physical or
   virtual system identified by a MAC address and possibly IP addresses,
   and connected to a BD by an Attachment Circuit, the following
   considerations apply:

   o The Tenant Systems may be Virtual Machines (VMs) that generate
     traffic from their own MAC and IP.

   o The Tenant Systems may be Virtual Appliance entities (VAs) that
     forward traffic to/from IP addresses of different End Devices
     sitting behind them.

      o These VAs can be firewalls, load balancers, NAT devices, other
        appliances or virtual gateways with virtual routing instances.

      o These VAs do not necessarily participate in dynamic routing
        protocols and hence rely on the EVPN NVEs to advertise the
        routes on their behalf.

      o In all these cases, the VA will forward traffic to other TSes
        using its own source MAC, but the source IP will be the one
        associated to the End Device sitting behind it, or a translated
        IP address (part of a public NAT pool) if the VA is performing
        NAT.

      o Note that the same IP address could exist behind two of these
        TSes. One example of this would be certain appliance resiliency
        mechanisms, where a virtual IP or floating IP can be owned by
        one of the two VAs running the resiliency protocol (the master
        VA). Virtual Router Redundancy Protocol (VRRP), RFC 5798, is
        one particular example of this. Another example is multi-homed
        subnets, i.e. the same subnet is connected to two VAs.

      o Although these VAs provide IP connectivity to VMs and subnets
        behind them, they do not always have their own IP interface
        connected to the EVPN NVE, e.g.
        layer-2 firewalls are examples of VAs not supporting IP
        interfaces.

   Figure 1 illustrates some of the examples described above.

                      NVE1
                  +-----------+
         TS1(VM)--|  (BD-10)  |-----+
           IP1/M1 +-----------+     |              DGW1
                               +---------+    +-------------+
                               |         |----|  (BD-10)    |
  SN1---+            NVE2      |         |    |    IRB1\    |
        |        +-----------+ |         |    |     (IP-VRF)|---+
  SN2---TS2(VA)--|  (BD-10)  |-|         |    +-------------+  _|_
        | IP2/M2 +-----------+ | VXLAN/  |                    (   )
  IP4---+  <-+                 | nvGRE   |         DGW2      ( WAN )
             |                 |         |    +-------------+ (___)
          vIP23 (floating)     |         |----|  (BD-10)    |   |
             |                 +---------+    |    IRB2\    |   |
  SN1---+  <-+       NVE3        |  |  |      |     (IP-VRF)|---+
        | IP3/M3 +-----------+   |  |  |      +-------------+
  SN3---TS3(VA)--|  (BD-10)  |---+  |  |
        |        +-----------+      |  |
  IP5---+                           |  |
                                    |  |
                    NVE4            |  |     NVE5             +--SN5
           +---------------------+  |  | +-----------+        |
  IP6------|  (BD-1)             |  |  +-|  (BD-10)  |--TS4(VA)--SN6
           |       \             |  |    +-----------+        |
           |    (IP-VRF)         |--+                  ESI4   +--SN7
           |      /  \IRB3       |
       |---|  (BD-2)  (BD-10)    |
    SN4|   +---------------------+

                  Figure 1 DC inter-subnet use-cases

   Where:

   NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same BD for a
   particular tenant. BD-10 is comprised of the collection of BD
   instances defined in all the NVEs. All the hosts connected to BD-10
   belong to the same IP subnet. The hosts connected to BD-10 are listed
   below:

   o TS1 is a VM that generates/receives traffic from/to IP1, where IP1
     belongs to the BD-10 subnet.

   o TS2 and TS3 are Virtual Appliances (VA) that send/receive traffic
     from/to the subnets and hosts sitting behind them (SN1, SN2, SN3,
     IP4 and IP5). Their IP addresses (IP2 and IP3) belong to the BD-10
     subnet and they can also generate/receive traffic. When these VAs
     receive packets destined to their own MAC addresses (M2 and M3)
     they will route the packets to the proper subnet or host.
     These VAs do not support routing protocols to advertise the subnets
     connected to them and can move to a different server and NVE when
     the Cloud Management System decides to do so. These VAs may also
     support redundancy mechanisms for some subnets, similar to VRRP,
     where a floating IP is owned by the master VA and only the master
     VA forwards traffic to a given subnet. For example, vIP23 in
     Figure 1 is a floating IP that can be owned by TS2 or TS3,
     depending on which one is the master. Only the master will forward
     traffic to SN1.

   o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3 have
     their own IP addresses that belong to the BD-10 subnet too. These
     IRB interfaces connect the BD-10 subnet to Virtual Routing and
     Forwarding (IP-VRF) instances that can route the traffic to other
     subnets for the same tenant (within the DC or at the other end of
     the WAN).

   o TS4 is a layer-2 VA that provides connectivity to subnets SN5, SN6
     and SN7, but does not have an IP address itself in BD-10. TS4 is
     connected to a physical port on NVE5 assigned to Ethernet Segment
     Identifier 4.

   For a BD that an ingress NVE is attached to, "Overlay Index" is
   defined as an identifier that the ingress EVPN NVE requires in order
   to forward packets to a subnet or host in a remote subnet. As an
   example, vIP23 (Figure 1) is an Overlay Index that any NVE attached
   to BD-10 needs to know in order to forward packets to SN1. The IRB3
   IP address is an Overlay Index required to get to SN4, and ESI4 is an
   Overlay Index needed to forward traffic to SN5. In other words, the
   Overlay Index is a next-hop in the overlay address space that can be
   an IP address, a MAC address or an ESI. When advertised along with an
   IP Prefix, the Overlay Index requires a recursive resolution to find
   out to which egress NVE the EVPN packets need to be sent.

   All the DC use cases in Figure 1 require inter-subnet forwarding and
   therefore, the individual host routes and subnets:

   a) MUST be advertised from the NVEs (since VAs and VMs do not
      participate in dynamic routing protocols) and
   b) MAY be associated to an Overlay Index that can be a VA IP address,
      a floating IP address, a MAC address or an ESI. The Overlay Index
      is further discussed in section 3.2.

2.2 The Requirement for a New EVPN Route Type

   [RFC7432] defines a MAC/IP route (also referred to as RT-2) where a
   MAC address can be advertised together with an IP address length
   (IPL) and IP address (IP). While a variable IPL might have been used
   to indicate the presence of an IP prefix in a route type 2, there are
   several specific use cases in which using this route type to deliver
   IP Prefixes is not suitable.

   One example of such use cases is the "floating IP" example described
   in section 2.1. In this example, the advertisement of the prefixes
   needs to be decoupled from the advertisement of the MAC address of
   either M2 or M3; otherwise the solution becomes highly inefficient
   and does not scale.

   For example, if 1k prefixes are advertised from M2 (using RT-2) and
   the floating IP owner changes from M2 to M3, 1k routes would be
   withdrawn from M2 and 1k routes would be re-advertised from M3.
   However, if a separate route type is used, 1k routes can be
   advertised as associated to the floating IP address (vIP23), and only
   one RT-2 is needed to advertise the ownership of the floating IP,
   i.e. vIP23 and M2 in the route type 2. When the floating IP owner
   changes from M2 to M3, a single RT-2 withdraw/update is required to
   indicate the change. The remote DGW will not change any of the 1k
   prefixes associated to vIP23, but will only update the ARP resolution
   entry for vIP23 (now pointing at M3).
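
   The route counts in the floating IP example above can be checked
   with a short sketch. The function name and its arguments are
   illustrative assumptions and are not defined by this document; the
   sketch only counts the withdrawals and advertisements that the text
   describes.

```python
def routes_touched_on_failover(num_prefixes: int, decoupled: bool) -> int:
    """Count route withdrawals/advertisements when vIP23 moves from M2 to M3.

    Hypothetical helper for illustration only.
    """
    if decoupled:
        # Prefixes point at the floating IP (vIP23), so they are untouched;
        # only the single RT-2 binding vIP23 to a MAC is updated.
        return 1
    # Prefixes are carried in RT-2s keyed by the owner's MAC:
    # each one is withdrawn from M2 and advertised again from M3.
    return 2 * num_prefixes


print(routes_touched_on_failover(1000, decoupled=False))  # 2000
print(routes_touched_on_failover(1000, decoupled=True))   # 1
```

   With 1k prefixes, coupling the prefixes to the MAC touches 2k routes
   per failover, while the decoupled design touches a single RT-2.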

   Other reasons to decouple the IP Prefix advertisement from the MAC/IP
   route are listed below:

   o Clean identification, operation and troubleshooting of IP Prefixes,
     independent of and not subject to the interpretation of the IPL and
     the IP value. For example, a default IP route 0.0.0.0/0 must always
     be easily and clearly distinguished from the absence of IP
     information.

   o In MAC/IP routes, the MAC information is part of the NLRI, so if IP
     Prefixes were to be advertised using MAC/IP routes, the MAC
     information would always be present and part of the route key.

   The following sections describe how EVPN is extended with a route
   type for the advertisement of IP prefixes and how this route is used
   to address the inter-subnet connectivity requirements existing in the
   Data Center.

3. The BGP EVPN IP Prefix Route

   The current BGP EVPN NLRI as defined in [RFC7432] is shown below:

       +-----------------------------------+
       |    Route Type (1 octet)           |
       +-----------------------------------+
       |    Length (1 octet)               |
       +-----------------------------------+
       |    Route Type specific (variable) |
       +-----------------------------------+

                      Figure 2 BGP EVPN NLRI

   Where the route type field can contain one of the following specific
   values (refer to the IANA "EVPN Route Types" registry):

   + 1 - Ethernet Auto-Discovery (A-D) route

   + 2 - MAC/IP advertisement route

   + 3 - Inclusive Multicast Route

   + 4 - Ethernet Segment Route

   This document defines an additional route type that IANA has added to
   the registry, and will be used for the advertisement of IP Prefixes:

   + 5 - IP Prefix Route

   According to Section 5.4 in [RFC7606], a node that doesn't recognize
   the Route Type 5 (RT-5) will ignore it. Therefore an NVE following
   this document can still be attached to a BD to which an NVE ignoring
   RT-5s is also attached. Regular [RFC7432] procedures would apply in
   that case for both NVEs.
   In case two or more NVEs are attached to different BDs of the same
   tenant, they MUST support RT-5 for the proper Inter-Subnet Forwarding
   operation of the tenant.

   The detailed encoding of this route and associated procedures are
   described in the following sections.

3.1 IP Prefix Route Encoding

   An IP Prefix Route Type consists of the following fields:

       +---------------------------------------+
       |              RD (8 octets)            |
       +---------------------------------------+
       |Ethernet Segment Identifier (10 octets)|
       +---------------------------------------+
       |      Ethernet Tag ID (4 octets)       |
       +---------------------------------------+
       |      IP Prefix Length (1 octet)       |
       +---------------------------------------+
       |      IP Prefix (4 or 16 octets)       |
       +---------------------------------------+
       |    GW IP Address (4 or 16 octets)     |
       +---------------------------------------+
       |         MPLS Label (3 octets)         |
       +---------------------------------------+

                 Figure 3 EVPN IP Prefix route NLRI

   Where:

   o RD and Ethernet Tag ID MUST be used as defined in [RFC7432] and
     [EVPN-OVERLAY]. The MPLS Label field is set to either an MPLS label
     or a VNI, as described in [EVPN-OVERLAY] for other EVPN route
     types.

   o The Ethernet Segment Identifier MUST be a non-zero 10-byte
     identifier if the ESI is used as an Overlay Index (see the
     definition of Overlay Index in section 3.2). It MUST be zero
     otherwise. The ESI format is described in [RFC7432].

   o The IP Prefix Length can be set to a value between 0 and 32 (bits)
     for IPv4 and between 0 and 128 for IPv6, and specifies the number
     of bits in the Prefix. The value MUST NOT be greater than 128.

   o The IP Prefix is a 4 or 16-octet field (IPv4 or IPv6). The size of
     this field MUST NOT be 4 octets if the IP Prefix Length value is
     greater than 32 bits.

   o The GW (Gateway) IP Address field is a 4 or 16-octet field (IPv4 or
     IPv6), and will encode a valid IP address as an Overlay Index for
     the IP Prefixes. The GW IP field MUST be zero if it is not used as
     an Overlay Index. Refer to section 3.2 for the definition and use
     of the Overlay Index.

   o The MPLS Label field is encoded as 3 octets, where the high-order
     20 bits contain the label value. When sending, the label value
     SHOULD be zero if recursive resolution based on overlay index is
     used. If the received MPLS Label value is zero, the route MUST
     contain an Overlay Index and the ingress NVE/PE MUST do recursive
     resolution to find the egress NVE/PE. If the received Label is zero
     and the route does not contain an Overlay Index, it MUST be
     treat-as-withdraw [RFC7606]. If the received Label value is
     non-zero, the route will not be used for recursive resolution
     unless a local policy says so.

   o The total route length will indicate the type of prefix (IPv4 or
     IPv6) and the type of GW IP address (IPv4 or IPv6). Note that the
     IP Prefix + the GW IP should have a length of either 64 or 256
     bits, but never 160 bits (IPv4 and IPv6 mixed values are not
     allowed).

   The RD, Ethernet Tag ID, IP Prefix Length and IP Prefix are part of
   the route key used by BGP to compare routes. The rest of the fields
   are not part of the route key.

   An IP Prefix Route MAY be sent along with a Router's MAC Extended
   Community (defined in [EVPN-INTERSUBNET]) to carry the MAC address
   that is used as the overlay index. Note that the MAC address may be
   that of a TS.

3.2 Overlay Indexes and Recursive Lookup Resolution

   RT-5 routes support recursive lookup resolution through the use of
   Overlay Indexes as follows:

   o An Overlay Index can be an ESI, an IP address in the address space
     of the tenant or a MAC address, and it is used by an NVE as the
     next-hop for a given IP Prefix.
     An Overlay Index always needs a recursive route resolution on the
     NVE/PE that installs the RT-5 into one of its IP-VRFs, so that the
     NVE knows to which egress NVE/PE it needs to forward the packets.
     It is important to note that recursive resolution of the Overlay
     Index applies upon installation into an IP-VRF, and not upon BGP
     propagation (for instance, on an ASBR). Also, as a result of the
     recursive resolution, the egress NVE/PE is not necessarily the same
     NVE that originated the RT-5.

   o The Overlay Index is indicated along with the RT-5 in the ESI
     field, GW IP field or Router's MAC Extended Community, depending on
     whether the IP Prefix next-hop is an ESI, IP address or MAC address
     in the tenant space. The Overlay Index for a given IP Prefix is set
     by local policy at the NVE that originates an RT-5 for that IP
     Prefix (typically managed by the Cloud Management System).

   o In order to enable the recursive lookup resolution at the ingress
     NVE, an NVE that is a possible egress NVE for a given Overlay Index
     must originate a route advertising itself as the BGP next hop on
     the path to the system denoted by the Overlay Index. For instance:

     . If an NVE receives an RT-5 that specifies an Overlay Index, the
       NVE cannot use the RT-5 in its IP-VRF unless (or until) it can
       recursively resolve the Overlay Index.
     . If the RT-5 specifies an ESI as the Overlay Index, recursive
       resolution can only be done if the NVE has received and installed
       an RT-1 (Auto-Discovery per-EVI) route specifying that ESI.
     . If the RT-5 specifies a GW IP address as the Overlay Index,
       recursive resolution can only be done if the NVE has received and
       installed an RT-2 (MAC/IP route) specifying that IP address in
       the IP address field of its NLRI.
     . If the RT-5 specifies a MAC address as the Overlay Index,
       recursive resolution can only be done if the NVE has received and
       installed an RT-2 (MAC/IP route) specifying that MAC address in
       the MAC address field of its NLRI.

     Note that the RT-1 or RT-2 routes needed for the recursive
     resolution may arrive before or after the given RT-5 route.

   o Irrespective of the recursive resolution, if there is no IGP or BGP
     route to the BGP next-hop of an RT-5, BGP MUST fail to install the
     RT-5 even if the Overlay Index can be resolved.

   o The ESI and GW IP fields may both be zero, however they MUST NOT
     both be non-zero at the same time. A route containing a non-zero GW
     IP and a non-zero ESI (at the same time) SHOULD be
     treat-as-withdraw [RFC7606].

   o If either the ESI or GW IP are non-zero, then one of them is the
     Overlay Index, regardless of whether the Router's MAC Extended
     Community is present or the value of the Label.

   The indirection provided by the Overlay Index and its recursive
   lookup resolution is required to achieve fast convergence in case of
   a failure of the object represented by the Overlay Index (see the
   example described in section 2.2).

   Table 1 shows the different RT-5 field combinations allowed by this
   specification and what Overlay Index must be used by the receiving
   NVE/PE in each case. Those cases where there is no Overlay Index are
   indicated as "None" in Table 1. If there is no Overlay Index the
   receiving NVE/PE will not perform any recursive resolution, and the
   actual next-hop is given by the RT-5's BGP next-hop.

   +----------+----------+----------+------------+----------------+
   | ESI      | GW IP    | MAC*     | Label      | Overlay Index  |
   |--------------------------------------------------------------|
   | Non-Zero | Zero     | Zero     | Don't Care | ESI            |
   | Non-Zero | Zero     | Non-Zero | Don't Care | ESI            |
   | Zero     | Non-Zero | Zero     | Don't Care | GW IP          |
   | Zero     | Zero     | Non-Zero | Zero       | MAC            |
   | Zero     | Zero     | Non-Zero | Non-Zero   | MAC or None**  |
   | Zero     | Zero     | Zero     | Non-Zero   | None***        |
   +----------+----------+----------+------------+----------------+

           Table 1 - RT-5 fields and Indicated Overlay Index

   Table NOTES:

   *   MAC with Zero value means no Router's MAC extended community is
       present along with the RT-5. Non-Zero indicates that the extended
       community is present and carries a valid MAC address. The
       encoding of a MAC address MUST be the 6-octet MAC address
       specified by [802.1Q] and [802.1D-REV]. Examples of invalid MAC
       addresses are broadcast or multicast MAC addresses. The route
       MUST be treat-as-withdraw in case of an invalid MAC address. The
       presence of the Router's MAC extended community alone is not
       enough to indicate the use of the MAC address as the Overlay
       Index, since the extended community can be used for other
       purposes.

   **  In this case, the Overlay Index may be the RT-5's MAC address or
       None, depending on the local policy of the receiving NVE/PE. Note
       that the advertising NVE/PE that sets the Overlay Index SHOULD
       advertise an RT-2 for the MAC Overlay Index if there are
       receiving NVE/PEs configured to use the MAC as the Overlay Index.
       This case in Table 1 is used in the IP-VRF-to-IP-VRF
       implementations described in 4.4.1 and 4.4.3. The support of a
       MAC Overlay Index in this model is OPTIONAL.

   *** The Overlay Index is None. This is a special case used for
       IP-VRF-to-IP-VRF where the NVE/PEs are connected by IP NVO
       tunnels as opposed to Ethernet NVO tunnels.
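
   The selection rules of Table 1, together with the treat-as-withdraw
   conditions stated in sections 3.1 and 3.2, can be sketched as a
   small decision function. This is a minimal illustration under
   stated assumptions, not an implementation defined by this document:
   the names OverlayIndex, TreatAsWithdraw, indicated_overlay_index and
   the boolean use_mac_policy (standing in for the receiver's local
   policy in the "MAC or None" row) are all hypothetical.

```python
from enum import Enum


class OverlayIndex(Enum):
    ESI = "ESI"
    GW_IP = "GW IP"
    MAC = "MAC"
    NONE = "None"


class TreatAsWithdraw(Exception):
    """Hypothetical signal that the RT-5 is handled as treat-as-withdraw."""


def indicated_overlay_index(esi_nonzero: bool, gw_ip_nonzero: bool,
                            mac_present: bool, label_nonzero: bool,
                            use_mac_policy: bool = False) -> OverlayIndex:
    # A non-zero ESI together with a non-zero GW IP is not allowed
    # (section 3.2 says such a route SHOULD be treat-as-withdraw).
    if esi_nonzero and gw_ip_nonzero:
        raise TreatAsWithdraw("ESI and GW IP are both non-zero")
    # A non-zero ESI or GW IP is the Overlay Index regardless of the
    # Router's MAC Extended Community and of the Label value.
    if esi_nonzero:
        return OverlayIndex.ESI
    if gw_ip_nonzero:
        return OverlayIndex.GW_IP
    if mac_present:
        if not label_nonzero:
            return OverlayIndex.MAC
        # "MAC or None": decided by local policy on the receiving NVE/PE.
        return OverlayIndex.MAC if use_mac_policy else OverlayIndex.NONE
    if label_nonzero:
        return OverlayIndex.NONE
    # Zero label and no Overlay Index: MUST be treat-as-withdraw.
    raise TreatAsWithdraw("zero Label and no Overlay Index")


print(indicated_overlay_index(False, True, False, False).value)
print(indicated_overlay_index(False, False, True, True).value)
```

   The checks are ordered so that the ESI and GW IP fields take
   precedence over the Router's MAC Extended Community, which matches
   the rule that the extended community alone does not indicate a MAC
   Overlay Index.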

   Table 2 shows the different inter-subnet use-cases described in this
   document and the corresponding coding of the Overlay Index in the
   route type 5 (RT-5).

   +---------+---------------------+----------------------------+
   | Section | Use-case            | Overlay Index in the RT-5  |
   +---------+---------------------+----------------------------+
   | 4.1     | TS IP address       | GW IP                      |
   | 4.2     | Floating IP address | GW IP                      |
   | 4.3     | "Bump in the wire"  | ESI or MAC                 |
   | 4.4     | IP-VRF-to-IP-VRF    | GW IP, MAC or None         |
   +---------+---------------------+----------------------------+

    Table 2 - Use-cases and Overlay Indexes for Recursive Resolution

   The above use-cases are representative of the different Overlay
   Indexes supported by RT-5 (GW IP, ESI, MAC or None).

4. Overlay Index Use-Cases

   This section describes some use-cases for the Overlay Index types
   used with the IP Prefix route.

4.1 TS IP Address Overlay Index Use-Case

   Figure 4 illustrates an example of inter-subnet forwarding for
   subnets sitting behind Virtual Appliances (on TS2 and TS3).

  IP4---+            NVE2                          DGW1
        |        +-----------+ +---------+    +-------------+
  SN2---TS2(VA)--|  (BD-10)  |-|         |----|  (BD-10)    |
        | IP2/M2 +-----------+ |         |    |    IRB1\    |
   -+---+                      |         |    |     (IP-VRF)|---+
    |                          |         |    +-------------+  _|_
   SN1                         | VXLAN/  |                    (   )
    |                          | nvGRE   |         DGW2      ( WAN )
   -+---+            NVE3      |         |    +-------------+ (___)
        | IP3/M3 +-----------+ |         |----|  (BD-10)    |   |
  SN3---TS3(VA)--|  (BD-10)  |-|         |    |    IRB2\    |   |
        |        +-----------+ +---------+    |     (IP-VRF)|---+
  IP5---+                                     +-------------+

                   Figure 4 TS IP address use-case

   An example of inter-subnet forwarding between subnet SN1/24 and a
   subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
   DGW2 are running BGP EVPN. TS2 and TS3 do not participate in dynamic
   routing protocols, and they only have a static route to forward the
   traffic to the WAN. SN1/24 is dual-homed to NVE2 and NVE3.

   In this case, a GW IP is used as an Overlay Index. Although a
   different Overlay Index type could have been used, this use-case
   assumes that the operator knows the VA's IP addresses beforehand,
   whereas the VA's MAC address is unknown and the VA's ESI is zero.
   Because of this, the GW IP is the suitable Overlay Index to be used
   with the RT-5s. The NVEs know the GW IP to be used for a given Prefix
   by policy.

   (1) NVE2 advertises the following BGP routes on behalf of TS2:

       o Route type 2 (MAC/IP route) containing: ML=48 (MAC Address
         Length), M=M2 (MAC Address), IPL=32 (IP Address Length),
         IP=IP2 and [RFC5512] BGP Encapsulation Extended Community with
         the corresponding Tunnel-type. The MAC and IP addresses may be
         learned via ARP-snooping (ND-snooping if IPv6).

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP2. The prefix and GW IP are learned by
         policy.

   (2) Similarly, NVE3 advertises the following BGP routes on behalf of
       TS3:

       o Route type 2 (MAC/IP route) containing: ML=48, M=M3, IPL=32,
         IP=IP3 (and BGP Encapsulation Extended Community).

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP3.

   (3) DGW1 and DGW2 import both received routes based on the
       route-targets:

       o Based on the BD-10 route-target in DGW1 and DGW2, the MAC/IP
         route is imported and M2 is added to the BD-10 along with its
         corresponding tunnel information. For instance, if VXLAN is
         used, the VTEP will be derived from the MAC/IP route BGP
         next-hop and VNI from the MPLS Label1 field. IP2 - M2 is added
         to the ARP table. Similarly, M3 is added to BD-10 and IP3 - M3
         to the ARP table.

       o Based on the BD-10 route-target in DGW1 and DGW2, the IP
         Prefix route is also imported and SN1/24 is added to the
         IP-VRF with Overlay Index IP2 pointing at the local BD-10.
         In this example, it is assumed that the RT-5 from NVE2 is
         preferred over the RT-5 from NVE3. If both routes were equally
         preferable and ECMP enabled, SN1/24 would also be added to the
         routing table with Overlay Index IP3.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and Overlay Index=IP2 is found. Since IP2 is an
         Overlay Index, a recursive route resolution is required for
         IP2.

       o IP2 is resolved to M2 in the ARP table, and M2 is resolved to
         the tunnel information given by the BD FIB (e.g. remote VTEP
         and VNI for the VXLAN case).

       o The IP packet destined to IPx is encapsulated with:

           . Source inner MAC = IRB1 MAC.

           . Destination inner MAC = M2.

           . Tunnel information provided by the BD (VNI, VTEP IPs and
             MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel information (VNI for the VXLAN case), the
         BD-10 context is identified for a MAC lookup.

       o Encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE), the packet is
         forwarded to TS2, where it will be properly routed.

   (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
       be applied to the MAC route IP2/M2, as defined in [RFC7432].
       Route type 5 prefixes are not subject to MAC mobility procedures,
       hence no changes in the DGW IP-VRF routing table will occur for
       TS2 mobility, i.e. all the prefixes will still be pointing at IP2
       as Overlay Index. There is an indirection for e.g. SN1/24, which
       still points at Overlay Index IP2 in the routing table, but IP2
       will be simply resolved to a different tunnel, based on the
       outcome of the MAC mobility procedures for the MAC/IP route
       IP2/M2.
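
   The recursive resolution in steps (3) to (6) can be sketched as
   follows. This is only an illustration in Python, not an
   implementation defined by this document: the Dgw class, its method
   names, the addresses and the VNI value are all hypothetical, chosen
   to mirror the IP-VRF, ARP table and BD FIB lookups described above.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Tunnel:
    vtep: str  # remote VTEP, derived from the RT-2 BGP next-hop
    vni: int   # VNI, derived from the RT-2 MPLS Label1 field


class Dgw:
    """Hypothetical DGW state: IP-VRF, ARP table and BD-10 FIB."""

    def __init__(self):
        self.ip_vrf = {}  # prefix -> GW IP Overlay Index (from RT-5)
        self.arp = {}     # IP -> MAC (from RT-2)
        self.bd_fib = {}  # MAC -> Tunnel (from RT-2)

    def import_rt2(self, mac, ip, next_hop, vni):
        self.bd_fib[mac] = Tunnel(next_hop, vni)
        self.arp[ip_address(ip)] = mac

    def import_rt5(self, prefix, gw_ip):
        self.ip_vrf[ip_network(prefix)] = ip_address(gw_ip)

    def resolve(self, dst):
        """Return (inner destination MAC, Tunnel), or None."""
        dst = ip_address(dst)
        matches = [p for p in self.ip_vrf if dst in p]
        if not matches:
            return None
        longest = max(matches, key=lambda p: p.prefixlen)
        overlay_index = self.ip_vrf[longest]
        mac = self.arp.get(overlay_index)  # GW IP -> MAC
        tunnel = self.bd_fib.get(mac)      # MAC -> tunnel
        if tunnel is None:
            return None
        return mac, tunnel


# Example values (assumptions): SN1=192.0.2.0/24, IP2=198.51.100.2,
# M2=00:00:5e:00:53:02, NVE2=10.0.0.2, NVE3=10.0.0.3, VNI=10.
M2 = "00:00:5e:00:53:02"
dgw1 = Dgw()
dgw1.import_rt2(M2, "198.51.100.2", next_hop="10.0.0.2", vni=10)
dgw1.import_rt5("192.0.2.0/24", gw_ip="198.51.100.2")
before = dgw1.resolve("192.0.2.7")

# Step (6): TS2 moves to NVE3. Only the RT-2 state changes; the
# IP-VRF entry for SN1/24 keeps pointing at Overlay Index IP2.
dgw1.import_rt2(M2, "198.51.100.2", next_hop="10.0.0.3", vni=10)
after = dgw1.resolve("192.0.2.7")
print(before, after)
```

   The point of the sketch is the indirection: the prefix entry is
   written once, and mobility is absorbed entirely by the RT-2 derived
   tables.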

   Note that in the opposite direction, TS2 will send traffic based on
   its static-route next-hop information (IRB1 and/or IRB2), and regular
   EVPN procedures will be applied.

4.2 Floating IP Overlay Index Use-Case

   Sometimes Tenant Systems (TS) work in active/standby mode where an
   upstream floating IP - owned by the active TS - is used as the
   Overlay Index to reach some subnets behind it. This redundancy mode,
   already introduced in sections 2.1 and 2.2, is illustrated in Figure
   5.

                        NVE2                          DGW1
                    +-----------+ +---------+    +-------------+
       +---TS2(VA)--|  (BD-10)  |-|         |----|  (BD-10)    |
       |   IP2/M2   +-----------+ |         |    |    IRB1\    |
       |   <-+                    |         |    |     (IP-VRF)|---+
       |     |                    |         |    +-------------+  _|_
      SN1  vIP23 (floating)       |  VXLAN/ |                    (   )
       |     |                    |  nvGRE  |         DGW2      ( WAN )
       |   <-+          NVE3      |         |    +-------------+ (___)
       |   IP3/M3   +-----------+ |         |----|  (BD-10)    |   |
       +---TS3(VA)--|  (BD-10)  |-|         |    |    IRB2\    |   |
                    +-----------+ +---------+    |     (IP-VRF)|---+
                                                 +-------------+

           Figure 5 Floating IP Overlay Index for redundant TS

   In this use-case, a GW IP is used as an Overlay Index for the same
   reasons as in 4.1. However, this GW IP is a floating IP that belongs
   to the active TS. Assuming TS2 is the active TS and owns vIP23:

   (1) NVE2 advertises the following BGP routes for TS2:

       o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
         IP=vIP23 (and BGP Encapsulation Extended Community). The MAC
         and IP addresses may be learned via ARP-snooping.

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=vIP23. The prefix and GW IP are learned
         by policy.

   (2) NVE3 advertises the following BGP route for TS3 (it does not
       advertise an RT-2 for vIP23/M3):

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=vIP23. The prefix and GW IP are learned
         by policy.

   (3) DGW1 and DGW2 import both received routes based on the
       route-target:

       o M2 is added to the BD-10 FIB along with its corresponding
         tunnel information. For the VXLAN use case, the VTEP will be
         derived from the MAC/IP route BGP next-hop and VNI from the
         VNI/VSID field. vIP23 - M2 is added to the ARP table.

       o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay
         index vIP23 pointing at M2 in the local BD-10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and Overlay Index=vIP23 is found. Since vIP23 is
         an Overlay Index, a recursive route resolution for vIP23 is
         required.

       o vIP23 is resolved to M2 in the ARP table, and M2 is resolved
         to the tunnel information given by the BD (remote VTEP and VNI
         for the VXLAN case).

       o The IP packet destined to IPx is encapsulated with:

           . Source inner MAC = IRB1 MAC.

           . Destination inner MAC = M2.

           . Tunnel information provided by the BD FIB (VNI, VTEP IPs
             and MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel information (VNI for the VXLAN case), the
         BD-10 context is identified for a MAC lookup.

       o Encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE), the packet is
         forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3 appoints
       TS3 as the new active TS for SN1, TS3 will now own the floating
       vIP23 and will signal this new ownership (GARP message or
       similar). Upon receiving the new owner's notification, NVE3 will
       issue a route type 2 for M3-vIP23 and NVE2 will withdraw the RT-2
       for M2-vIP23. DGW1 and DGW2 will update their ARP tables with the
       new MAC resolving the floating IP. No changes are made in the
       IP-VRF routing table.
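
   The switchover in step (6) can be sketched as a pair of RT-2 events
   applied to the DGW's ARP table while the RT-5 derived state stays
   untouched. The function name, the event tuples and the symbolic
   values below are hypothetical. The guard that ignores a withdrawal
   for a MAC that is no longer the current binding is an assumption of
   this sketch, added so that the result does not depend on the order in
   which the two routes arrive; this document does not specify it.

```python
def apply_rt2_events(arp, events):
    """Apply RT-2 events, in arrival order, to a copy of an ARP table.

    Each event is (action, ip, mac), where action is "advertise" or
    "withdraw". Assumption of this sketch: a withdrawal removes the
    entry only if it still points at the withdrawn MAC.
    """
    table = dict(arp)
    for action, ip, mac in events:
        if action == "advertise":
            table[ip] = mac
        elif action == "withdraw":
            if table.get(ip) == mac:
                del table[ip]
        else:
            raise ValueError(f"unknown action: {action}")
    return table


ip_vrf = {"SN1/24": "vIP23"}  # RT-5 state: prefix -> Overlay Index
arp = {"vIP23": "M2"}         # RT-2 state while TS2 is active

# TS3 becomes active: NVE3 advertises M3-vIP23, NVE2 withdraws M2-vIP23.
switchover = [("advertise", "vIP23", "M3"), ("withdraw", "vIP23", "M2")]
arp_after = apply_rt2_events(arp, switchover)

overlay_index = ip_vrf["SN1/24"]
print(overlay_index, "->", arp_after[overlay_index])
```

   After the switchover the same IP-VRF entry resolves to M3, which
   matches the statement that no changes are made in the IP-VRF routing
   table.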

4.3 Bump-in-the-Wire Use-Case

   Figure 6 illustrates an example of inter-subnet forwarding for an IP
   Prefix route that carries a subnet SN1. In this use-case, TS2 and TS3
   are layer-2 VA devices without any IP address that can be included as
   an Overlay Index in the GW IP field of the IP Prefix route. Their MAC
   addresses are M2 and M3, respectively, and both devices are connected
   to BD-10. Note that IRB1 and IRB2 (in DGW1 and DGW2 respectively)
   have IP addresses in a subnet different than SN1.

                        NVE2                          DGW1
           M2       +-----------+ +---------+    +-------------+
       +---TS2(VA)--|  (BD-10)  |-|         |----|  (BD-10)    |
       |      ESI23 +-----------+ |         |    |    IRB1\    |
       |        +                 |         |    |     (IP-VRF)|---+
       |        |                 |         |    +-------------+  _|_
      SN1       |                 |  VXLAN/ |                    (   )
       |        |                 |  nvGRE  |         DGW2      ( WAN )
       |        +       NVE3      |         |    +-------------+ (___)
       |      ESI23 +-----------+ |         |----|  (BD-10)    |   |
       +---TS3(VA)--|  (BD-10)  |-|         |    |    IRB2\    |   |
           M3       +-----------+ +---------+    |     (IP-VRF)|---+
                                                 +-------------+

                    Figure 6 Bump-in-the-wire use-case

   Since neither TS2 nor TS3 can participate in any dynamic routing
   protocol and have no IP address assigned, there are two potential
   Overlay Index types that can be used when advertising SN1:

   a) an ESI, i.e. ESI23, that can be provisioned on the attachment
      ports of NVE2 and NVE3, as shown in Figure 6.
   b) or the VA's MAC address, that can be added to NVE2 and NVE3 by
      policy.

   The advantage of using an ESI as Overlay Index, as opposed to the
   VA's MAC address, is that the forwarding to the egress NVE can be
   done purely based on the state of the AC in the ES (notified by the
   AD per-EVI route) and all the EVPN multi-homing redundancy mechanisms
   can be re-used. For instance, the [RFC7432] mass-withdrawal mechanism
   for fast failure detection and propagation can be used. This section
   assumes that an ESI Overlay Index is used in this use-case but it
   does not prevent the use of the VA's MAC address as an Overlay Index.
   If a MAC is used as Overlay Index, the control plane must follow the
   procedures described in section 4.4.3.

   The model supports VA redundancy in a similar way as the one
   described in section 4.2 for the floating IP Overlay Index use-case,
   except that it uses the EVPN Ethernet A-D per-EVI route instead of
   the MAC advertisement route to advertise the location of the Overlay
   Index. The procedure is explained below:

   (1) Assuming TS2 is the active TS in ESI23, NVE2 advertises the
       following BGP routes:

       o Route type 1 (Ethernet A-D route for BD-10) containing:
         ESI=ESI23 and the corresponding tunnel information (VNI/VSID
         field), as well as the BGP Encapsulation Extended Community as
         per [EVPN-OVERLAY].

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0. The Router's MAC Extended
         Community defined in [EVPN-INTERSUBNET] is added and carries
         the MAC address (M2) associated to the TS behind which SN1
         sits. M2 may be learned by policy.

   (2) NVE3 advertises the following BGP route for TS3 (no AD per-EVI
       route is advertised):

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0. The Router's MAC Extended
         Community is added and carries the MAC address (M3) associated
         to the TS behind which SN1 sits. M3 may be learned by policy.

   (3) DGW1 and DGW2 import the received routes based on the
       route-target:

       o The tunnel information to get to ESI23 is installed in DGW1
         and DGW2. For the VXLAN use case, the VTEP will be derived
         from the Ethernet A-D route BGP next-hop and VNI from the
         VNI/VSID field (see [EVPN-OVERLAY]).

       o The RT-5 coming from the NVE that advertised the RT-1 is
         selected and SN1/24 is added to the IP-VRF in DGW1 and DGW2
         with Overlay Index ESI23 and MAC = M2.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and Overlay Index=ESI23 is found. Since ESI23 is
         an Overlay Index, a recursive route resolution is required to
         find the egress NVE where ESI23 resides.

       o The IP packet destined to IPx is encapsulated with:

           . Source inner MAC = IRB1 MAC.

           . Destination inner MAC = M2 (this MAC will be obtained
             from the Router's MAC Extended Community received along
             with the RT-5 for SN1). Note that the Router's MAC
             Extended Community is used in this case to carry the TS'
             MAC address, as opposed to the NVE/PE's MAC address.

           . Tunnel information for the NVO tunnel is provided by the
             Ethernet A-D route per-EVI for ESI23 (VNI and VTEP IP for
             the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel demultiplexer information (VNI for the
         VXLAN case), the BD-10 context is identified for a MAC lookup
         (assuming MAC disposition model) or the VNI may directly
         identify the egress interface (for a label or VNI disposition
         model).

       o Encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE) or a VNI lookup
         (in case of VNI forwarding), the packet is forwarded to TS2,
         where it will be forwarded to SN1.

   (6) If the redundancy protocol running between TS2 and TS3 follows an
       active/standby model and there is a failure, appointing TS3 as
       the new active TS for SN1, TS3 will now own the connectivity to
       SN1 and will signal this new ownership. Upon receiving the new
       owner's notification, NVE3's AC will become active and issue a
       route type 1 for ESI23, whereas NVE2 will withdraw its Ethernet
       A-D route for ESI23. DGW1 and DGW2 will update their tunnel
       information to resolve ESI23. The destination inner MAC will be
       changed to M3.
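
   The ESI-based resolution in steps (3) to (6) can be sketched as
   follows. This is a simplified illustration under stated assumptions:
   the Rt5 and AdPerEvi records, the resolve_esi function and the
   symbolic values are hypothetical, and the advertising NVE is used
   directly as the VTEP, standing in for the BGP next-hop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rt5:
    nve: str         # advertising NVE (stands in for the BGP next-hop)
    prefix: str
    esi: str         # ESI Overlay Index
    router_mac: str  # TS MAC from the Router's MAC Extended Community


@dataclass(frozen=True)
class AdPerEvi:
    nve: str         # advertising NVE, used as the VTEP
    esi: str
    vni: int


def resolve_esi(prefix, rt5_routes, ad_routes) -> Optional[dict]:
    """Select the RT-5 whose NVE also advertised the RT-1 for its ESI."""
    for rt5 in rt5_routes:
        if rt5.prefix != prefix:
            continue
        for ad in ad_routes:
            if ad.esi == rt5.esi and ad.nve == rt5.nve:
                return {
                    "inner_dmac": rt5.router_mac,
                    "vtep": ad.nve,
                    "vni": ad.vni,
                }
    return None


rt5s = [
    Rt5("NVE2", "SN1/24", "ESI23", "M2"),
    Rt5("NVE3", "SN1/24", "ESI23", "M3"),
]

# TS2 active: only NVE2 advertises the A-D per-EVI route for ESI23.
active = resolve_esi("SN1/24", rt5s, [AdPerEvi("NVE2", "ESI23", 10)])

# Step (6): NVE2 withdraws its A-D route and NVE3 advertises one.
failover = resolve_esi("SN1/24", rt5s, [AdPerEvi("NVE3", "ESI23", 10)])
print(active, failover)
```

   Both RT-5s stay in place across the failover; only the A-D per-EVI
   route decides which one is selected, and with it the destination
   inner MAC.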

4.4 IP-VRF-to-IP-VRF Model

   This use-case is similar to the scenario described in "IRB forwarding
   on NVEs for Tenant Systems" in [EVPN-INTERSUBNET], however the new
   requirement here is the advertisement of IP Prefixes as opposed to
   only host routes.

   In the examples described in sections 4.1, 4.2 and 4.3, the BD
   instance can connect IRB interfaces and any other Tenant Systems
   connected to it. EVPN provides connectivity for:

   1. Traffic destined to the IRB or TS IP interfaces as well as

   2. Traffic destined to IP subnets sitting behind the TS, e.g. SN1 or
      SN2.

   In order to provide connectivity for (1), MAC/IP routes (RT-2) are
   needed so that IRB or TS MACs and IPs can be distributed.
   Connectivity type (2) is accomplished by the exchange of IP Prefix
   routes (RT-5) for IPs and subnets sitting behind certain Overlay
   Indexes, e.g. GW IP or ESI or TS MAC.

   In some cases, IP Prefix routes may be advertised for subnets and IPs
   sitting behind an IRB. This use-case is referred to as the
   "IP-VRF-to-IP-VRF" model.

   [EVPN-INTERSUBNET] defines an asymmetric IRB model and a symmetric
   IRB model, based on the required lookups at the ingress and egress
   NVE: the asymmetric model requires an ip-lookup and a mac-lookup at
   the ingress NVE, whereas only a mac-lookup is needed at the egress
   NVE; the symmetric model requires ip and mac lookups at both the
   ingress and egress NVEs. From that perspective, the IP-VRF-to-IP-VRF
   use-case described in this section is a symmetric IRB model.

   Note that, in an IP-VRF-to-IP-VRF scenario, out of the many subnets
   that a tenant may have, it may be the case that only a few are
   attached to a given NVE/PE's IP-VRF. In order to provide inter-subnet
   connectivity among the set of NVE/PEs where the tenant is connected,
   a new "Supplementary Broadcast Domain" (SBD) is created on all of
   them if recursive resolution is needed.
   This SBD is instantiated as a regular BD (with no ACs) in each
   NVE/PE and has an IRB interface that connects the SBD to the
   IP-VRF. The IRB interface's IP or MAC address is used as the
   overlay index for recursive resolution.

   Depending on the existence and characteristics of the SBD and IRB
   interfaces for the IP-VRFs, there are three different
   IP-VRF-to-IP-VRF scenarios identified and described in this document:

   1) Interface-less model: no SBD and no overlay indexes required.
   2) Interface-ful with SBD IRB model: it requires SBD, as well as GW
      IP addresses as overlay indexes.
   3) Interface-ful with unnumbered SBD IRB model: it requires SBD, as
      well as MAC addresses as overlay indexes.

   Inter-subnet IP multicast is outside the scope of this document.

4.4.1 Interface-less IP-VRF-to-IP-VRF Model

   Figure 7 will be used for the description of this model.

             NVE1(M1)
          +------------+
  IP1+----|  (BD-1)    |                DGW1(M3)
          |      \     |    +---------+ +--------+
          |    (IP-VRF)|----|         |-|(IP-VRF)|----+
          |      /     |    |         | +--------+    |
      +---|  (BD-2)    |    |         |              _+_
      |   +------------+    |         |             (   )
   SN1|                     |  VXLAN/ |            ( WAN )--H1
      |      NVE2(M2)       |  nvGRE/ |             (___)
      |   +------------+    |   MPLS  |               +
      +---|  (BD-2)    |    |         | DGW2(M4)      |
          |      \     |    |         | +--------+    |
          |    (IP-VRF)|----|         |-|(IP-VRF)|----+
          |      /     |    +---------+ +--------+
  SN2+----|  (BD-3)    |
          +------------+

             Figure 7 Interface-less IP-VRF-to-IP-VRF model

   In this case:

   a) The NVEs and DGWs must provide connectivity between hosts in SN1,
      SN2, IP1 and hosts sitting at the other end of the WAN, for
      example, H1. It is assumed that the DGWs import/export IP and/or
      VPN-IP routes from/to the WAN.

   b) The IP-VRF instances in the NVE/DGWs are directly connected
      through NVO tunnels, and no IRBs and/or BD instances are
      instantiated to connect the IP-VRFs.

   c) The solution must provide layer-3 connectivity among the IP-VRFs
      for Ethernet NVO tunnels, for instance, VXLAN or nvGRE.

   d) The solution may provide layer-3 connectivity among the IP-VRFs
      for IP NVO tunnels, for example, VXLAN GPE (with IP payload).

   In order to meet the above requirements, the EVPN route type 5 will
   be used to advertise the IP Prefixes, along with the Router's MAC
   Extended Community as defined in [EVPN-INTERSUBNET] if the
   advertising NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will
   advertise an RT-5 for each of its prefixes with the following fields:

       o RD as per [RFC7432].

       o Ethernet Tag ID=0.

       o IP address length and IP address, as explained in the previous
         sections.

       o GW IP address=0.

       o ESI=0

       o MPLS label or VNI corresponding to the IP-VRF.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF) and two BGP extended communities:

       o The first one is the BGP Encapsulation Extended Community, as
         per [RFC5512], identifying the tunnel type.

       o The second one is the Router's MAC Extended Community as per
         [EVPN-INTERSUBNET] containing the MAC address associated to
         the NVE advertising the route. This MAC address identifies the
         NVE/DGW and MAY be re-used for all the IP-VRFs in the NVE. The
         Router's MAC Extended Community must be sent if the route is
         associated to an Ethernet NVO tunnel, for instance, VXLAN. If
         the route is associated to an IP NVO tunnel, for instance
         VXLAN GPE with IP payload, the Router's MAC Extended Community
         should not be sent.

   The following example illustrates the procedure to advertise and
   forward packets to SN1/24 (IPv4 prefix advertised from NVE1):

   (1) NVE1 advertises the following BGP route:

       o Route type 5 (IP Prefix route) containing:

           . IPL=24, IP=SN1, Label=10.

           . GW IP= set to 0.

           . [RFC5512] BGP Encapsulation Extended Community.

           . Router's MAC Extended Community that contains M1.

           . Route-target identifying the tenant (IP-VRF).

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

       o Since GW IP=ESI=0, the Label is a non-zero value and the local
         policy indicates this interface-less model, DGW1 will use the
         Label and next-hop of the RT-5, as well as the MAC address
         conveyed in the Router's MAC Extended Community (as inner
         destination MAC address) to set up the forwarding state and
         later encapsulate the routed IP packets.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24.

       o Since the RT-5 for SN1/24 had a GW IP=ESI=0, a non-zero Label
         and next-hop and the model is interface-less, DGW1 will not
         need a recursive lookup to resolve the route.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = DGW1 MAC, Destination inner MAC = M1, Source outer
         IP (tunnel source IP) = DGW1 IP, Destination outer IP (tunnel
         destination IP) = NVE1 IP. The Source and Destination inner
         MAC addresses are not needed if IP NVO tunnels are used.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP-lookup based on the
         Label (the Destination inner MAC is not needed to identify the
         IP-VRF).

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated to BD-2. A
         subsequent lookup in the ARP table and the BD FIB will provide
         the forwarding information for the packet in BD-2.
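
   The decision in steps (2) and (3) can be sketched as follows. The
   Rt5 record, the interface_less_encap function and the symbolic
   addresses are hypothetical. Two simplifications are assumptions of
   this sketch: the local policy check is not modelled, and the presence
   of a Router's MAC Extended Community is taken to mean an Ethernet NVO
   tunnel, whereas the tunnel type is actually signalled by the BGP
   Encapsulation Extended Community.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rt5:
    prefix: str
    next_hop: str                     # BGP next-hop (egress NVE)
    label: int                        # MPLS label or VNI of the IP-VRF
    gw_ip: Optional[str] = None       # None models GW IP address=0
    esi: Optional[str] = None         # None models ESI=0
    router_mac: Optional[str] = None  # Router's MAC Extended Community


def interface_less_encap(rt5, local_ip, local_mac=None):
    """Build the encapsulation for an RT-5 needing no recursive lookup.

    Returns None when the route does not match the interface-less
    pattern (GW IP = ESI = 0 and a non-zero Label).
    """
    if rt5.gw_ip is not None or rt5.esi is not None or rt5.label == 0:
        return None
    encap = {
        "outer_src_ip": local_ip,
        "outer_dst_ip": rt5.next_hop,
        "label": rt5.label,
    }
    # Inner MACs are only added for Ethernet NVO tunnels (assumed here
    # whenever a Router's MAC was received with the route).
    if rt5.router_mac is not None:
        encap["inner_src_mac"] = local_mac
        encap["inner_dst_mac"] = rt5.router_mac
    return encap


# RT-5 for SN1/24 as advertised by NVE1 in step (1).
rt5 = Rt5("SN1/24", next_hop="NVE1-IP", label=10, router_mac="M1")
encap = interface_less_encap(rt5, "DGW1-IP", "DGW1-MAC")
print(encap)
```

   A route carrying a non-zero GW IP or ESI, or a zero Label, falls
   through to None here, which is where the recursive models of the
   following sections would take over.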

   The model described above is called the Interface-less model since
   the IP-VRFs are connected directly through tunnels and they do not
   require those tunnels to be terminated in SBDs, unlike the models in
   sections 4.4.2 and 4.4.3.

4.4.2 Interface-ful IP-VRF-to-IP-VRF with SBD IRB

   Figure 8 will be used for the description of this model.

              NVE1
          +------------+                       DGW1
  IP10+---+(BD-1)      | +---------------+ +------------+
          |  \         | |               | |            |
          |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
          |  /    IRB(IP1/M1)         IRB(IP3/M3)       |     |
      +---+(BD-2)      | |               | +------------+    _+_
      |   +------------+ |               |                  (   )
   SN1|                  |    VXLAN/     |                 ( WAN )--H1
      |       NVE2       |    nvGRE/     |                  (___)
      |   +------------+ |    MPLS       |     DGW2           +
      +---+(BD-2)      | |               | +------------+     |
          |  \         | |               | |            |     |
          |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
          |  /    IRB(IP2/M2)         IRB(IP4/M4)       |
  SN2+----+(BD-3)      | +---------------+ +------------+
          +------------+

                Figure 8 Interface-ful with SBD IRB model

   In this model:

   a) As in section 4.4.1, the NVEs and DGWs must provide connectivity
      between hosts in SN1, SN2, IP10 and hosts sitting at the other end
      of the WAN.

   b) However, the NVE/DGWs are now connected through Ethernet NVO
      tunnels terminated in the SBD instance. The IP-VRFs use IRB
      interfaces for their connectivity to the SBD.

   c) Each SBD IRB has an IP and a MAC address, where the IP address
      must be reachable from other NVEs or DGWs.

   d) The SBD is attached to all the NVE/DGWs in the tenant domain BDs.

   e) The solution must provide layer-3 connectivity for Ethernet NVO
      tunnels, for instance, VXLAN or nvGRE.

   EVPN type 5 routes will be used to advertise the IP Prefixes, whereas
   EVPN RT-2 routes will advertise the MAC/IP addresses of each SBD IRB
   interface. Each NVE/DGW will advertise an RT-5 for each of its
   prefixes with the following fields:

       o RD as per [RFC7432].

       o Ethernet Tag ID=0.

       o IP address length and IP address, as explained in the previous
         sections.

       o GW IP address=IRB-IP (this is the Overlay Index that will be
         used for the recursive route resolution).

       o ESI=0

       o Label value should be zero since the RT-5 route requires a
         recursive lookup resolution to an RT-2 route. It is ignored on
         reception, and, when forwarding packets, the MPLS label or VNI
         from the RT-2's MPLS Label1 field is used.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF). The Router's MAC Extended Community should not be sent in
   this case.

   The following example illustrates the procedure to advertise and
   forward packets to SN1/24 (IPv4 prefix advertised from NVE1):

   (1) NVE1 advertises the following BGP routes:

       o Route type 5 (IP Prefix route) containing:

           . IPL=24, IP=SN1, Label= SHOULD be set to 0.

           . GW IP=IP1 (SBD IRB's IP)

           . Route-target identifying the tenant (IP-VRF).

       o Route type 2 (MAC/IP route for the SBD IRB) containing:

           . ML=48, M=M1, IPL=32, IP=IP1, Label=10.

           . An [RFC5512] BGP Encapsulation Extended Community.

           . Route-target identifying the SBD. This route-target may be
             the same as the one used with the RT-5.

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

           . Since GW IP is different from zero, the GW IP (IP1) will be
             used as the Overlay Index for the recursive route
             resolution to the RT-2 carrying IP1.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24, which is associated
         to the Overlay Index IP1. The forwarding information is
         derived from the RT-2 received for IP1.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = M3, Destination inner MAC = M1, Source outer IP
         (source VTEP) = DGW1 IP, Destination outer IP (destination
         VTEP) = NVE1 IP.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP-lookup based on the
         Label and the inner MAC DA.

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated to BD-2. A
         subsequent lookup in the ARP table and the BD FIB will provide
         the forwarding information for the packet in BD-2.

   The model described above is called 'Interface-ful with SBD IRB
   model' since the tunnels connecting the DGWs and NVEs need to be
   terminated into the SBD. The SBD is connected to the IP-VRFs via SBD
   IRB interfaces, and that allows the recursive resolution of RT-5s to
   GW IP addresses.

4.4.3 Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB

   Figure 9 will be used for the description of this model. Note that
   this model is similar to the one described in section 4.4.2, only
   without IP addresses on the SBD IRB interfaces.

              NVE1
          +------------+                       DGW1
  IP1+----+(BD-1)      | +---------------+ +------------+
          |  \         | |               | |            |
          |(IP-VRF)-(SBD)|               (SBD)-(IP-VRF) |-----+
          |  /    IRB(M1)|               | IRB(M3)      |     |
      +---+(BD-2)      | |               | +------------+    _+_
      |   +------------+ |               |                  (   )
   SN1|                  |    VXLAN/     |                 ( WAN )--H1
      |       NVE2       |    nvGRE/     |                  (___)
      |   +------------+ |    MPLS       |     DGW2           +
      +---+(BD-2)      | |               | +------------+     |
          |  \         | |               | |            |     |
          |(IP-VRF)-(SBD)|               (SBD)-(IP-VRF) |-----+
          |  /    IRB(M2)|               | IRB(M4)      |
  SN2+----+(BD-3)      | +---------------+ +------------+
          +------------+

          Figure 9 Interface-ful with unnumbered SBD IRB model

   In this model:

   a) As in sections 4.4.1 and 4.4.2, the NVEs and DGWs must provide
      connectivity between hosts in SN1, SN2, IP1 and hosts sitting at
      the other end of the WAN.

   b) As in section 4.4.2, the NVE/DGWs are connected through Ethernet
      NVO tunnels terminated in the SBD instance. The IP-VRFs use IRB
      interfaces for their connectivity to the SBD.

   c) However, each SBD IRB has a MAC address only, and no IP address
      (that is why the model refers to an 'unnumbered' SBD IRB). In this
      model, there is no need to have IP reachability to the SBD IRB
      interfaces themselves and there is a requirement to save IP
      addresses on those interfaces.

   d) As in section 4.4.2, the SBD is composed of all the NVE/DGW BDs of
      the tenant that need inter-subnet-forwarding.

   e) As in section 4.4.2, the solution must provide layer-3
      connectivity for Ethernet NVO tunnels, for instance, VXLAN or
      nvGRE.

   This model will also make use of the RT-5 recursive resolution. EVPN
   type 5 routes will advertise the IP Prefixes along with the Router's
   MAC Extended Community used for the recursive lookup, whereas EVPN
   RT-2 routes will advertise the MAC addresses of each SBD IRB
   interface (this time without an IP).

   Each NVE/DGW will advertise an RT-5 for each of its prefixes with the
   same fields as described in 4.4.2 except for:

       o GW IP address= set to 0.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF) and the Router's MAC Extended Community containing the MAC
   address associated to the SBD IRB interface. This MAC address may be
   re-used for all the IP-VRFs in the NVE.

   The example is similar to the one in section 4.4.2:

   (1) NVE1 advertises the following BGP routes:

       o Route type 5 (IP Prefix route) containing the same values as
         in the example in section 4.4.2, except for:

           . GW IP= SHOULD be set to 0.

           . Router's MAC Extended Community containing M1 (this will be
             used for the recursive lookup to an RT-2).

       o Route type 2 (MAC route for the SBD IRB) with the same values
         as in section 4.4.2 except for:

           . ML=48, M=M1, IPL=0, Label=10.

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

           . The MAC contained in the Router's MAC Extended Community
             sent along with the RT-5 (M1) will be used as the Overlay
             Index for the recursive route resolution to the RT-2
             carrying M1.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24, which is associated
         to the Overlay Index M1. The forwarding information is derived
         from the RT-2 received for M1.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = M3, Destination inner MAC = M1, Source outer IP
         (source VTEP) = DGW1 IP, Destination outer IP (destination
         VTEP) = NVE1 IP.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP-lookup based on the
         Label and the inner MAC DA.

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated to BD-2. A
         subsequent lookup in the ARP table and the BD FIB will provide
         the forwarding information for the packet in BD-2.

   The model described above is called 'Interface-ful with unnumbered
   SBD IRB model': it is the Interface-ful with SBD IRB model of section
   4.4.2, only this time the SBD IRB does not have an IP address.

5. Conclusions

   An EVPN route (type 5) for the advertisement of IP Prefixes is
   described in this document. This new route type has a differentiated
   role from the RT-2 route and addresses the Data Center (or NVO-based
   networks in general) inter-subnet connectivity scenarios described in
   this document.
   Using this new RT-5, an IP Prefix may be advertised along with an
   Overlay Index that can be a GW IP address, a MAC or an ESI, or
   without an Overlay Index, in which case the BGP next-hop will point
   at the egress NVE/ASBR/ABR and the MAC in the Router's MAC Extended
   Community will provide the inner MAC destination address to be used.
   As discussed throughout the document, the EVPN RT-2 does not meet the
   requirements for all the DC use cases, therefore this EVPN route type
   5 is required.

   The EVPN route type 5 decouples the IP Prefix advertisements from the
   MAC/IP route advertisements in EVPN, hence:

   a) Allows the clean and clear advertisements of IPv4 or IPv6 prefixes
      in an NLRI with no MAC addresses.

   b) Since the route type is different from the MAC/IP Advertisement
      route, the current [RFC7432] procedures do not need to be
      modified.

   c) Allows a flexible implementation where the prefix can be linked to
      different types of Overlay/Underlay Indexes: overlay IP address,
      overlay MAC addresses, overlay ESI, underlay BGP next-hops, etc.

   d) An EVPN implementation not requiring IP Prefixes can simply
      discard them by looking at the route type value.

6. Security Considerations

   This document provides a set of procedures to achieve Inter-Subnet
   Forwarding across NVEs or PEs attached to a group of BDs that belong
   to the same tenant (or VPN). The security considerations discussed in
   [RFC7432] apply to the Intra-Subnet Forwarding or communication
   within each of those BDs. In addition, the security considerations in
   [RFC4364] should also be understood, since this document and
   [RFC4364] may be used in similar applications.

   Contrary to [RFC4364], this document does not describe PE/CE route
   distribution techniques, but rather considers the CEs as TSes or VAs
   that do not run dynamic routing protocols.
   This can be considered a security advantage, since dynamic routing
   protocols can be blocked on the NVE/PE ACs.

   In this document, the RT-5 may use a regular BGP Next Hop for its
   resolution or an Overlay Index that requires a recursive resolution
   to a different EVPN route (an RT-2 or an RT-1). In the latter case,
   it is worth noting that any action that ends up filtering or
   modifying the RT-2/RT-1 routes used to convey the Overlay Indexes
   will modify the resolution of the RT-5 and therefore the forwarding
   of packets to the remote subnet.

7. IANA Considerations

   As requested by this document, value 5 in the "EVPN Route Types"
   registry defined by [RFC7432] has been allocated:

      Value     Description         Reference
      5         IP Prefix route     [this document]

8. References

8.1 Normative References

   [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
   Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet
   VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

   [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation
   Subsequent Address Family Identifier (SAFI) and the BGP Tunnel
   Encapsulation Attribute", RFC 5512, DOI 10.17487/RFC5512, April 2009.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March
   1997.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC2119
   Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [EVPN-OVERLAY] Sajassi-Drake et al., "A Network Virtualization
   Overlay Solution using EVPN", draft-ietf-bess-evpn-overlay-12.txt,
   work in progress, February, 2018.

   [EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in
   EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-03.txt, work in
   progress, February, 2017.

8.2 Informative References

   [RFC4364] Rosen, E. and Y.
   Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364,
   DOI 10.17487/RFC4364, February 2006.

   [RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel,
   "Revised Error Handling for BGP UPDATE Messages", RFC 7606, August
   2015.

   [802.1D-REV] "IEEE Standard for Local and metropolitan area networks
   - Media Access Control (MAC) Bridges", IEEE Std. 802.1D, June 2004.

   [802.1Q] "IEEE Standard for Local and metropolitan area networks -
   Media Access Control (MAC) Bridges and Virtual Bridged Local Area
   Networks", IEEE Std 802.1Q(tm), 2014 Edition, November 2014.

9. Acknowledgments

   The authors would like to thank Mukul Katiyar and Jeffrey Zhang for
   their valuable feedback and contributions. The following people also
   helped improve this document with their feedback: Tony Przygienda
   and Thomas Morin. Special THANK YOU to Eric Rosen for his detailed
   review; it really helped improve the readability and clarify the
   concepts. Thank you to Alvaro Retana for his thorough review.

10. Contributors

   In addition to the authors listed on the front page, the following
   co-authors have also contributed to this document:

   Senthil Sathappan
   Florin Balus
   Aldrin Isaac
   Senad Palislamovic
   Samir Thoria

11. Authors' Addresses

   Jorge Rabadan (Editor)
   Nokia
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@nokia.com

   Wim Henderickx
   Nokia
   Email: wim.henderickx@nokia.com

   John E. Drake
   Juniper
   Email: jdrake@juniper.net

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com

   Wen Lin
   Juniper
   Email: wlin@juniper.net