idnits 2.17.1 draft-ietf-bess-evpn-prefix-advertisement-08.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (October 20, 2017) is 2351 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC7432' is mentioned on line 1405, but not defined == Missing Reference: 'RFC4364' is mentioned on line 164, but not defined == Missing Reference: 'RFC7606' is mentioned on line 374, but not defined == Missing Reference: 'RFC5512' is mentioned on line 1194, but not defined ** Obsolete undefined reference: RFC 5512 (Obsoleted by RFC 9012) == Missing Reference: 'RFC2119' is mentioned on line 1396, but not defined == Outdated reference: A later version (-15) exists of draft-ietf-bess-evpn-inter-subnet-forwarding-03 == Outdated reference: A later version (-12) exists of draft-ietf-bess-evpn-overlay-08 Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 BESS Workgroup J. Rabadan, Ed. 3 Internet Draft W. Henderickx 4 Intended status: Standards Track Nokia 6 J. Drake 7 W. Lin 8 Juniper 10 A. Sajassi 11 Cisco 13 Expires: April 23, 2018 October 20, 2017 15 IP Prefix Advertisement in EVPN 16 draft-ietf-bess-evpn-prefix-advertisement-08 18 Abstract 20 EVPN provides a flexible control plane that allows intra-subnet 21 connectivity in an MPLS and/or NVO-based network. In some networks, 22 there is also a need for a dynamic and efficient inter-subnet 23 connectivity across Tenant Systems and End Devices that can be 24 physical or virtual and do not necessarily participate in dynamic 25 routing protocols. This document defines a new EVPN route type for 26 the advertisement of IP Prefixes and explains some use-case examples 27 where this new route-type is used. 29 Status of this Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF), its areas, and its working groups. Note that 36 other groups may also distribute working documents as Internet- 37 Drafts. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 The list of current Internet-Drafts can be accessed at 45 http://www.ietf.org/ietf/1id-abstracts.txt 46 The list of Internet-Draft Shadow Directories can be accessed at 47 http://www.ietf.org/shadow.html 49 This Internet-Draft will expire on April 23, 2018. 51 Copyright Notice 53 Copyright (c) 2017 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 
56     This document is subject to BCP 78 and the IETF Trust's Legal
57     Provisions Relating to IETF Documents
58     (http://trustee.ietf.org/license-info) in effect on the date of
59     publication of this document. Please review these documents
60     carefully, as they describe your rights and restrictions with respect
61     to this document. Code Components extracted from this document must
62     include Simplified BSD License text as described in Section 4.e of
63     the Trust Legal Provisions and are provided without warranty as
64     described in the Simplified BSD License.

66  Table of Contents

68     1. Terminology
69     2. Introduction and Problem Statement
70       2.1 Inter-Subnet Connectivity Requirements in Data Centers
71       2.2 The Requirement for a New EVPN Route Type
72     3. The BGP EVPN IP Prefix Route
73       3.1 IP Prefix Route Encoding
74       3.2 Overlay Indexes and Recursive Lookup Resolution
75     4. Overlay Index Use-Cases
76       4.1 TS IP Address Overlay Index Use-Case
77       4.2 Floating IP Overlay Index Use-Case
78       4.3 Bump-in-the-Wire Use-Case
79       4.4 IP-VRF-to-IP-VRF Model
80         4.4.1 Interface-less IP-VRF-to-IP-VRF Model
81         4.4.2 Interface-ful IP-VRF-to-IP-VRF with SBD IRB
82         4.4.3 Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB
83     5. Conclusions
84     6. Conventions used in this document
85     7. Security Considerations
86     8. IANA Considerations
87     9. References
88       9.1 Normative References
89       9.2 Informative References
90     10. Acknowledgments
91     11. Contributors
92     12. Authors' Addresses

94  1. Terminology

96     GW IP: Gateway IP Address.

98     IPL: IP address length.

100    ML: MAC address length.

102    NVE: Network Virtualization Edge.

104    TS: Tenant System.

106    VA: Virtual Appliance.

108    RT-2: EVPN route type 2, i.e. MAC/IP advertisement route.

110    RT-5: EVPN route type 5, i.e. IP Prefix route.

112    AC: Attachment Circuit.

114    ARP: Address Resolution Protocol.

116    ND: Neighbor Discovery Protocol.

118    Ethernet NVO tunnel: refers to Network Virtualization Overlay
119       tunnels with Ethernet payload. Examples of this type of tunnel
120       are VXLAN and nvGRE.

122    IP NVO tunnel: refers to Network Virtualization Overlay tunnels
123       with IP payload (no MAC header in the payload).

125    EVI: EVPN Instance spanning the NVE/PE devices that are participating
126       in that EVPN.

128    MAC-VRF: A Virtual Routing and Forwarding table for Media Access
129       Control (MAC) addresses on an NVE/PE, as per [RFC7432].

131    BD: Broadcast Domain. As per [RFC7432], an EVI consists of a single
132       or multiple BDs. In case of VLAN-bundle and VLAN-based service
133       models (see [RFC7432]), a BD is equivalent to an EVI. In case of
134       VLAN-aware bundle service model, an EVI contains multiple BDs.
135       Also, in this document, BD and subnet are equivalent terms.

137    BT: Bridge Table. The instantiation of a BD in a MAC-VRF.

139    IP-VRF: A VPN Routing and Forwarding table for IP routes on an
140       NVE/PE. The IP routes could be populated by EVPN and IP-VPN
141       address families.

143    IRB: Integrated Routing and Bridging interface. It connects an IP-VRF
144       to a BD (or subnet).

146    SBD: Supplementary Broadcast Domain.
A BD that does not have any ACs,
147       only IRB interfaces, and is used to provide connectivity among
148       all the IP-VRFs of the tenant. The SBD is only required in
149       IP-VRF-to-IP-VRF use-cases (see section 4.4).

151 2. Introduction and Problem Statement

153    Inter-subnet connectivity is used for certain tenants within the Data
154    Center. [EVPN-INTERSUBNET] defines some fairly common inter-subnet
155    forwarding scenarios where TSes can exchange packets with TSes
156    located in remote subnets. In order to achieve this,
157    [EVPN-INTERSUBNET] describes how MAC/IPs encoded in TS RT-2 routes
158    are not only used to populate MAC-VRF and overlay ARP tables, but
159    also IP-VRF tables with the encoded TS host routes (/32 or /128). In
160    some cases, EVPN may advertise IP Prefixes and therefore provide
161    aggregation in the IP-VRF tables, as opposed to programming individual
162    host routes. This document complements the scenarios described in
163    [EVPN-INTERSUBNET] and defines how EVPN may be used to advertise IP
164    Prefixes. Interoperability between EVPN and L3VPN [RFC4364] IP Prefix
165    routes is out of scope for this document.

167    Section 2.1 describes the inter-subnet connectivity requirements in
168    Data Centers. Section 2.2 explains why a new EVPN route type is
169    required for IP Prefix advertisements. Once the need for a new EVPN
170    route type is justified, sections 3, 4 and 5 will describe this route
171    type and how it is used in some specific use cases.

173 2.1 Inter-Subnet Connectivity Requirements in Data Centers

175    [RFC7432] is used as the control plane for a Network Virtualization
176    Overlay (NVO3) solution in Data Centers (DC), where Network
177    Virtualization Edge (NVE) devices can be located in Hypervisors or
178    TORs, as described in [EVPN-OVERLAY].
180    If we use the term Tenant System (TS) to designate a physical or
181    virtual system identified by MAC and possibly IP addresses, and
182    connected to a BD by an Attachment Circuit, the following
183    considerations apply:

185       o The Tenant Systems may be Virtual Machines (VMs) that generate
186         traffic from their own MAC and IP.

188       o The Tenant Systems may be Virtual Appliance entities (VAs) that
189         forward traffic to/from IP addresses of different End Devices
190         sitting behind them.

192         o These VAs can be firewalls, load balancers, NAT devices, other
193           appliances or virtual gateways with virtual routing instances.

195         o These VAs do not necessarily participate in dynamic routing
196           protocols and hence rely on the EVPN NVEs to advertise the
197           routes on their behalf.

199         o In all these cases, the VA will forward traffic to other TSes
200           using its own source MAC, but the source IP will be the one
201           associated with the End Device sitting behind it, or a
202           translated IP address (part of a public NAT pool) if the VA is
203           performing NAT.

205         o Note that the same IP address could exist behind two of these
206           TSes. One example of this would be certain appliance resiliency
207           mechanisms, where a virtual IP or floating IP can be owned by
208           one of the two VAs running the resiliency protocol (the master
209           VA). Virtual Router Redundancy Protocol (VRRP), RFC5798, is
210           one particular example of this. Another example is multi-homed
211           subnets, i.e. the same subnet is connected to two VAs.

213         o Although these VAs provide IP connectivity to VMs and subnets
214           behind them, they do not always have their own IP interface
215           connected to the EVPN NVE, e.g. layer-2 firewalls are examples
216           of VAs not supporting IP interfaces.

218    Figure 1 illustrates some of the examples described above.
220 NVE1 221 +-----------+ 222 TS1(VM)--| (BD-10) |-----+ 223 IP1/M1 +-----------+ | DGW1 224 +---------+ +-------------+ 225 | |----| (BD-10) | 226 SN1---+ NVE2 | | | IRB1\ | 227 | +-----------+ | | | (IP-VRF)|---+ 228 SN2---TS2(VA)--| (BD-10) |-| | +-------------+ _|_ 229 | IP2/M2 +-----------+ | VXLAN/ | ( ) 230 IP4---+ <-+ | nvGRE | DGW2 ( WAN ) 231 | | | +-------------+ (___) 232 vIP23 (floating) | |----| (BD-10) | | 233 | +---------+ | IRB2\ | | 234 SN1---+ <-+ NVE3 | | | | (IP-VRF)|---+ 235 | IP3/M3 +-----------+ | | | +-------------+ 236 SN3---TS3(VA)--| (BD-10) |---+ | | 237 | +-----------+ | | 238 IP5---+ | | 239 | | 240 NVE4 | | NVE5 +--SN5 241 +---------------------+ | | +-----------+ | 242 IP6------| (BD-1) | | +-| (BD-10) |--TS4(VA)--SN6 243 | \ | | +-----------+ | 244 | (IP-VRF) |--+ ESI4 +--SN7 245 | / \IRB3 | 246 |---| (BD-2) (BD-10) | 247 SN4| +---------------------+ 249 Figure 1 DC inter-subnet use-cases 251 Where: 253 NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same BD for a 254 particular tenant. BD-10 is comprised of the collection of BD 255 instances defined in all the NVEs. All the hosts connected to BD-10 256 belong to the same IP subnet. The hosts connected to BD-10 are listed 257 below: 259 o TS1 is a VM that generates/receives traffic from/to IP1, where IP1 260 belongs to the BD-10 subnet. 262 o TS2 and TS3 are Virtual Appliances (VA) that send/receive traffic 263 from/to the subnets and hosts sitting behind them (SN1, SN2, SN3, 264 IP4 and IP5). Their IP addresses (IP2 and IP3) belong to the BD-10 265 subnet and they can also generate/receive traffic. When these VAs 266 receive packets destined to their own MAC addresses (M2 and M3) 267 they will route the packets to the proper subnet or host. These VAs 268 do not support routing protocols to advertise the subnets connected 269 to them and can move to a different server and NVE when the Cloud 270 Management System decides to do so. 
These VAs may also support
271      redundancy mechanisms for some subnets, similar to VRRP, where a
272      floating IP is owned by the master VA and only the master VA
273      forwards traffic to a given subnet. For example, vIP23 in Figure 1
274      is a floating IP that can be owned by TS2 or TS3 depending on which
275      one is the master. Only the master will forward traffic to SN1.

277    o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3 have
278      their own IP addresses that belong to the BD-10 subnet too. These
279      IRB interfaces connect the BD-10 subnet to Virtual Routing and
280      Forwarding (IP-VRF) instances that can route the traffic to other
281      subnets for the same tenant (within the DC or at the other end of
282      the WAN).

284    o TS4 is a layer-2 VA that provides connectivity to subnets SN5, SN6
285      and SN7, but does not have an IP address itself in BD-10. TS4
286      is connected to a physical port on NVE5 assigned to Ethernet
287      Segment Identifier 4.

289    All the above DC use cases require inter-subnet forwarding and
290    therefore the individual host routes and subnets:

292    a) MUST be advertised from the NVEs (since VAs and VMs do not
293       participate in dynamic routing protocols) and
294    b) MAY be associated with an Overlay Index that can be a VA IP
295       address, a floating IP address, a MAC address or an ESI. An Overlay
296       Index is a next-hop that requires a recursive resolution and is
297       described in section 3.2.

299 2.2 The Requirement for a New EVPN Route Type

301    [RFC7432] defines a MAC/IP route (also referred to as RT-2) where a
302    MAC address can be advertised together with an IP address length (IPL)
303    and IP address (IP). While a variable IPL might have been used to
304    indicate the presence of an IP prefix in a route type 2, there are
305    several specific use cases in which using this route type to deliver
306    IP Prefixes is not suitable.

308    One example of such use cases is the "floating IP" example described
309    in section 2.1.
In this example we need to decouple the advertisement
310    of the prefixes from the advertisement of the MAC address of either
311    M2 or M3; otherwise the solution becomes highly inefficient and does
312    not scale.

314    For example, if we are advertising 1k prefixes from M2 (using RT-2)
315    and the floating IP owner changes from M2 to M3, we would need to
316    withdraw 1k routes from M2 and re-advertise 1k routes from M3.
317    However, if we use a separate route type, we can advertise the 1k
318    routes associated with the floating IP address (vIP23) and only one
319    RT-2 for advertising the ownership of the floating IP, i.e. vIP23 and
320    M2 in the route type 2. When the floating IP owner changes from M2 to
321    M3, a single RT-2 withdraw/update is required to indicate the change.
322    The remote DGW will not change any of the 1k prefixes associated with
323    vIP23, but will only update the ARP resolution entry for vIP23 (now
       pointing at M3).

325    Other reasons to decouple the IP Prefix advertisement from the MAC/IP
326    route are listed below:

328    o Clean identification, operation and troubleshooting of IP Prefixes,
329      independent of and not subject to the interpretation of the IPL and
330      the IP value. E.g.: a default IP route 0.0.0.0/0 must always be
331      easily and clearly distinguished from the absence of IP
332      information.

334    o In MAC/IP routes, the MAC information is part of the NLRI, so if IP
335      Prefixes were to be advertised using MAC/IP routes, the MAC
336      information would always be present and part of the route key.

338    The following sections describe how EVPN is extended with a new route
339    type for the advertisement of IP prefixes and how this route is used
340    to address the current and future inter-subnet connectivity
341    requirements existing in the Data Center.

343 3. The BGP EVPN IP Prefix Route

345    The current BGP EVPN NLRI as defined in [RFC7432] is shown below:

347        +-----------------------------------+
348        |    Route Type (1 octet)           |
349        +-----------------------------------+
350        |     Length (1 octet)              |
351        +-----------------------------------+
352        | Route Type specific (variable)    |
353        +-----------------------------------+

355    Where the route type field can contain one of the following specific
356    values (refer to the IANA "EVPN Route Types" registry):

358    + 1 - Ethernet Auto-Discovery (A-D) route

360    + 2 - MAC/IP advertisement route

361    + 3 - Inclusive Multicast Route

363    + 4 - Ethernet Segment Route

365    This document defines an additional route type that IANA has added to
366    the registry, and will be used for the advertisement of IP Prefixes:

368    + 5 - IP Prefix Route

370    The support for this new route type is OPTIONAL.

372    Since this new route type is OPTIONAL, an implementation not
373    supporting it MUST ignore the route, based on the unknown route type
374    value, as specified by Section 5.4 in [RFC7606].

376    The detailed encoding of this route and associated procedures are
377    described in the following sections.
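
   The generic NLRI layout above (a 1-octet Route Type, a 1-octet
   Length, then a type-specific body) and the rule that an unsupported
   route type is ignored can be sketched as follows. This is only an
   illustrative sketch and not part of the specification: the names
   split_evpn_nlri and EVPN_ROUTE_TYPES are hypothetical, the input is
   assumed to be a byte string holding one or more concatenated EVPN
   NLRIs, and "ignore" is modelled simply as skipping the route.

```python
# Route type values listed in this section (names as given in the text).
EVPN_ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery (A-D) route",
    2: "MAC/IP advertisement route",
    3: "Inclusive Multicast Route",
    4: "Ethernet Segment Route",
    5: "IP Prefix Route",
}


def split_evpn_nlri(data, supported=frozenset(EVPN_ROUTE_TYPES)):
    """Split concatenated EVPN NLRIs into (route_type, body) pairs.

    Assumption for this sketch: routes whose type is not in `supported`
    are skipped, using the Length octet to step over their body.
    """
    routes = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated EVPN NLRI header")
        route_type = data[offset]       # Route Type (1 octet)
        length = data[offset + 1]       # Length (1 octet)
        body = data[offset + 2:offset + 2 + length]
        if len(body) != length:
            raise ValueError("truncated EVPN NLRI body")
        offset += 2 + length
        if route_type not in supported:
            continue                    # unsupported type: ignore the route
        routes.append((route_type, body))
    return routes
```

   For instance, a speaker that supports only route types 1 to 4 could
   call split_evpn_nlri(data, supported=frozenset({1, 2, 3, 4})); any
   type 5 route in the input is then stepped over and the remaining
   routes are still returned.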
379 3.1 IP Prefix Route Encoding

381    An IP Prefix advertisement route NLRI consists of the following
382    fields:

384        +---------------------------------------+
385        |      RD (8 octets)                    |
386        +---------------------------------------+
387        |Ethernet Segment Identifier (10 octets)|
388        +---------------------------------------+
389        |  Ethernet Tag ID (4 octets)           |
390        +---------------------------------------+
391        |  IP Prefix Length (1 octet)           |
392        +---------------------------------------+
393        |  IP Prefix (4 or 16 octets)           |
394        +---------------------------------------+
395        |  GW IP Address (4 or 16 octets)       |
396        +---------------------------------------+
397        |  MPLS Label (3 octets)                |
398        +---------------------------------------+

400    Where:

402    o RD, Ethernet Tag ID and MPLS Label fields will be used as defined
403      in [RFC7432] and [EVPN-OVERLAY].

405    o The Ethernet Segment Identifier will be a non-zero 10-byte
406      identifier if the ESI is used as an Overlay Index (see the
407      definition of Overlay Index in section 3.2). It will be zero
408      otherwise.

410    o The IP Prefix Length can be set to a value between 0 and 32 (bits)
411      for IPv4 and between 0 and 128 for IPv6, and specifies the number
412      of bits in the Prefix.

414    o The IP Prefix will be a 32 or 128-bit field (IPv4 or IPv6). The
415      size of this field does not depend on the value of the IP Prefix
416      Length field.

418    o The GW IP (Gateway IP Address) will be a 32 or 128-bit field (IPv4
419      or IPv6), and will encode an IP address as an Overlay Index for the
420      IP Prefixes. The GW IP field SHOULD be zero if it is not used as an
421      Overlay Index. Refer to section 3.2 for the definition and use of
422      the Overlay Index.

424    o The MPLS Label field is encoded as 3 octets, where the high-order
425      20 bits contain the label value. When sending, the label value
426      SHOULD be zero if recursive resolution based on Overlay Index is
427      used.
If the received MPLS Label value is zero, the route MUST
428      contain an Overlay Index and the ingress NVE/PE MUST do recursive
429      resolution to find the egress NVE/PE. If the received Label value
430      is non-zero, the route will not be used for recursive resolution
431      unless a local policy says so.

433    o The total route length will indicate the type of prefix (IPv4 or
434      IPv6) and the type of GW IP address (IPv4 or IPv6). Note that the
435      IP Prefix + the GW IP should have a length of either 64 or 256
436      bits, but never 160 bits (IPv4 and IPv6 mixed values are not
437      allowed).

439    The RD, Eth-Tag ID, IP Prefix Length and IP Prefix will be part of
440    the route key used by BGP to compare routes. The rest of the fields
441    will not be part of the route key.

443    An IP Prefix Route MAY be sent along with a Router's MAC Extended
444    Community (defined in [EVPN-INTERSUBNET]) to carry the MAC address
445    that is used as the Overlay Index. Note that the MAC address may be
446    that of a TS.

448 3.2 Overlay Indexes and Recursive Lookup Resolution

450    RT-5 routes support recursive lookup resolution through the use of
451    Overlay Indexes as follows:

453    o An Overlay Index can be an ESI, an IP address in the address space
454      of the tenant, or a MAC address, and it is used by an NVE as the
455      next-hop for a given IP Prefix. An Overlay Index always needs a
456      recursive route resolution on the NVE/PE that installs the RT-5 into
457      one of its IP-VRFs, so that the NVE knows to which egress NVE/PE it
458      needs to forward the packets. It is important to note that recursive
459      resolution of the Overlay Index applies upon installation into an
460      IP-VRF, and not upon BGP propagation (for instance, on an ASBR).
461      Also, as a result of the recursive resolution, the egress NVE/PE is
462      not necessarily the same NVE that originated the RT-5.
464 o The Overlay Index is indicated along with the RT-5 in the ESI 465 field, GW IP field or Router's MAC Extended Community, depending on 466 whether the IP Prefix next-hop is an ESI, IP address or MAC address 467 in the tenant space. The Overlay Index for a given IP Prefix is set 468 by local policy at the NVE that originates an RT-5 for that IP 469 Prefix (typically managed by the Cloud Management System). 471 o In order to enable the recursive lookup resolution at the ingress 472 NVE, an NVE that is a possible egress NVE for a given Overlay Index 473 must originate a route advertising itself as the BGP next hop on 474 the path to the system denoted by the Overlay Index. For instance: 476 . If an NVE receives an RT-5 that specifies an Overlay Index, the 477 NVE cannot use the RT-5 in its IP-VRF unless (or until) it can 478 recursively resolve the Overlay Index. 479 . If the RT-5 specifies an ESI as the Overlay Index, recursive 480 resolution can only be done if the NVE has received and installed 481 an RT-1 (Auto-Discovery per-EVI) route specifying that ESI. 482 . If the RT-5 specifies a GW IP address as the Overlay Index, 483 recursive resolution can only be done if the NVE has received and 484 installed an RT-2 (MAC/IP route) specifying that IP address in 485 the IP address field of its NLRI. 486 . If the RT-5 specifies a MAC address as the Overlay Index, 487 recursive resolution can only be done if the NVE has received and 488 installed an RT-2 (MAC/IP route) specifying that MAC address in 489 the MAC address field of its NLRI. 491 Note that the RT-1 or RT-2 routes needed for the recursive 492 resolution may arrive before or after the given RT-5 route. 494 o Irrespective of the recursive resolution, if there is no IGP or BGP 495 route to the BGP next-hop of an RT-5, BGP SHOULD fail to install 496 the RT-5 even if the Overlay Index can be resolved. 498 o The ESI and GW IP fields MAY both be zero, however they MUST NOT 499 both be non-zero at the same time. 
A route containing a non-zero GW
500      IP and a non-zero ESI (at the same time) will be treated as
501      withdraw.

503    The indirection provided by the Overlay Index and its recursive
504    lookup resolution is required to achieve fast convergence in case of
505    a failure of the object represented by the Overlay Index (see the
506    example described in section 2.2).

508    Table 1 shows the different RT-5 field combinations allowed by this
509    specification and what Overlay Index must be used by the receiving
510    NVE/PE in each case. When the Overlay Index is "None" in Table 1, the
511    receiving NVE/PE will not perform any recursive resolution, and the
512    actual next-hop is given by the RT-5's BGP next-hop.

514    +----------+----------+----------+------------+----------------+
515    | ESI      | GW-IP    | MAC*     | Label      | Overlay Index  |
516    +----------+----------+----------+------------+----------------+
517    | Non-Zero | Zero     | Zero     | Don't Care | ESI            |
518    | Non-Zero | Zero     | Non-Zero | Don't Care | ESI            |
519    | Zero     | Non-Zero | Zero     | Don't Care | GW-IP          |
520    | Zero     | Zero     | Non-Zero | Zero       | MAC            |
521    | Zero     | Zero     | Non-Zero | Non-Zero   | MAC or None**  |
522    | Zero     | Zero     | Zero     | Non-Zero   | None***        |
523    +----------+----------+----------+------------+----------------+

525    Table 1 - RT-5 fields and Indicated Overlay Index

527    Table NOTES:

529    *   MAC with Zero value means no Router's MAC extended community is
530        present along with the RT-5. Non-Zero indicates that the extended
531        community is present and carries a valid MAC address. Examples of
532        invalid MAC addresses are broadcast or multicast MAC addresses.
533        The presence of the Router's MAC extended community alone is not
534        enough to indicate the use of the MAC address as the Overlay
535        Index, since the extended community can be used for other
536        purposes.

538    **  In this case, the Overlay Index may be the RT-5's MAC address or
539        None, depending on the local policy of the receiving NVE/PE.
541    *** The Overlay Index is None. This is a special case used for
542        IP-VRF-to-IP-VRF where the NVE/PEs are connected by IP NVO tunnels
543        as opposed to Ethernet NVO tunnels.

545    Table 2 shows the different inter-subnet use-cases described in this
546    document and the corresponding coding of the Overlay Index in the
547    route type 5 (RT-5).

549    +---------+---------------------+----------------------------+
550    | Section | Use-case            | Overlay Index in the RT-5  |
551    +---------+---------------------+----------------------------+
552    | 4.1     | TS IP address       | GW IP                      |
553    | 4.2     | Floating IP address | GW IP                      |
554    | 4.3     | "Bump in the wire"  | ESI or MAC                 |
555    | 4.4     | IP-VRF-to-IP-VRF    | GW IP, MAC or None         |
556    +---------+---------------------+----------------------------+

558    Table 2 - Use-cases and Overlay Indexes for Recursive Resolution

560    The above use-cases are representative of the different Overlay
561    Indexes supported by RT-5 (GW IP, ESI, MAC or None). Any other
562    use-case using a given Overlay Index SHOULD follow the procedures
563    described in this document for the same Overlay Index.

565 4. Overlay Index Use-Cases

567    This section describes some use-cases for the Overlay Index types
568    used with the IP Prefix route.

570 4.1 TS IP Address Overlay Index Use-Case

572    The following figure illustrates an example of inter-subnet
573    forwarding for subnets sitting behind Virtual Appliances (on TS2 and
574    TS3).
576 IP4---+ NVE2 DGW1 577 | +-----------+ +---------+ +-------------+ 578 SN2---TS2(VA)--| (BD-10) |-| |----| (BD-10) | 579 | IP2/M2 +-----------+ | | | IRB1\ | 580 -+---+ | | | (IP-VRF)|---+ 581 | | | +-------------+ _|_ 582 SN1 | VXLAN/ | ( ) 583 | | nvGRE | DGW2 ( WAN ) 584 -+---+ NVE3 | | +-------------+ (___) 585 | IP3/M3 +-----------+ | |----| (BD-10) | | 586 SN3---TS3(VA)--| (BD-10) |-| | | IRB2\ | | 587 | +-----------+ +---------+ | (IP-VRF)|---+ 588 IP5---+ +-------------+ 590 Figure 2 TS IP address use-case 592 An example of inter-subnet forwarding between subnet SN1/24 and a 593 subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and 594 DGW2 are running BGP EVPN. TS2 and TS3 do not participate in dynamic 595 routing protocols, and they only have a static route to forward the 596 traffic to the WAN. We assume SN1/24 is dual-homed to NVE2 and NVE3. 598 In this case, a GW IP is used as an Overlay Index. Although a 599 different Overlay Index type could have been used, this use-case 600 assumes that the operator knows the VA's IP addresses beforehand, 601 whereas the VA's MAC address is unknown and the VA's ESI is zero. 602 Because of this, the GW IP is the suitable Overlay Index to be used 603 with the RT-5s. The NVEs know the GW IP to be used for a given Prefix 604 by policy. 606 (1) NVE2 advertises the following BGP routes on behalf of TS2: 608 o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32, 609 IP=IP2 and [RFC5512] BGP Encapsulation Extended Community with 610 the corresponding Tunnel-type. The MAC and IP addresses may be 611 learned via ARP-snooping (ND-snooping if IPv6). 613 o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, 614 ESI=0, GW IP address=IP2. The prefix and GW IP are learned by 615 policy. 617 (2) Similarly, NVE3 advertises the following BGP routes on behalf of 618 TS3: 620 o Route type 2 (MAC/IP route) containing: ML=48, M=M3, IPL=32, 621 IP=IP3 (and BGP Encapsulation Extended Community). 
623 o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, 624 ESI=0, GW IP address=IP3. 626 (3) DGW1 and DGW2 import both received routes based on the 627 route-targets: 629 o Based on the BD-10 route-target in DGW1 and DGW2, the MAC/IP 630 route is imported and M2 is added to the BD-10 along with its 631 corresponding tunnel information. For instance, if VXLAN is 632 used, the VTEP will be derived from the MAC/IP route BGP next- 633 hop and VNI from the MPLS Label1 field. IP2 - M2 is added to 634 the ARP table. Similarly, M3 is added to BD-10 and IP3 - M3 to 635 the ARP table. 637 o Based on the BD-10 route-target in DGW1 and DGW2, the IP 638 Prefix route is also imported and SN1/24 is added to the IP- 639 VRF with Overlay Index IP2 pointing at the local BD-10. In 640 this example, we assume the RT-5 from NVE2 is preferred over 641 the RT-5 from NVE3. If both routes were equally preferable and 642 ECMP enabled, SN1/24 would also be added to the routing table 643 with Overlay Index IP3. 645 (4) When DGW1 receives a packet from the WAN with destination IPx, 646 where IPx belongs to SN1/24: 648 o A destination IP lookup is performed on the DGW1 IP-VRF 649 routing table and Overlay Index=IP2 is found. Since IP2 is an 650 Overlay Index a recursive route resolution is required for 651 IP2. 653 o IP2 is resolved to M2 in the ARP table, and M2 is resolved to 654 the tunnel information given by the BD FIB (e.g. remote VTEP 655 and VNI for the VXLAN case). 657 o The IP packet destined to IPx is encapsulated with: 659 . Source inner MAC = IRB1 MAC. 661 . Destination inner MAC = M2. 663 . Tunnel information provided by the BD (VNI, VTEP IPs and 664 MACs for the VXLAN case). 666 (5) When the packet arrives at NVE2: 668 o Based on the tunnel information (VNI for the VXLAN case), the 669 BD-10 context is identified for a MAC lookup. 
671 o Encapsulation is stripped-off and based on a MAC lookup 672 (assuming MAC forwarding on the egress NVE), the packet is 673 forwarded to TS2, where it will be properly routed. 675 (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will 676 be applied to the MAC route IP2/M2, as defined in [RFC7432]. 677 Route type 5 prefixes are not subject to MAC mobility procedures, 678 hence no changes in the DGW IP-VRF routing table will occur for 679 TS2 mobility, i.e. all the prefixes will still be pointing at IP2 680 as Overlay Index. There is an indirection for e.g. SN1/24, which 681 still points at Overlay Index IP2 in the routing table, but IP2 682 will be simply resolved to a different tunnel, based on the 683 outcome of the MAC mobility procedures for the MAC/IP route 684 IP2/M2. 686 Note that in the opposite direction, TS2 will send traffic based on 687 its static-route next-hop information (IRB1 and/or IRB2), and regular 688 EVPN procedures will be applied. 690 4.2 Floating IP Overlay Index Use-Case 691 Sometimes Tenant Systems (TS) work in active/standby mode where an 692 upstream floating IP - owned by the active TS - is used as the 693 Overlay Index to get to some subnets behind. This redundancy mode, 694 already introduced in section 2.1 and 2.2, is illustrated in Figure 695 3. 697 NVE2 DGW1 698 +-----------+ +---------+ +-------------+ 699 +---TS2(VA)--| (BD-10) |-| |----| (BD-10) | 700 | IP2/M2 +-----------+ | | | IRB1\ | 701 | <-+ | | | (IP-VRF)|---+ 702 | | | | +-------------+ _|_ 703 SN1 vIP23 (floating) | VXLAN/ | ( ) 704 | | | nvGRE | DGW2 ( WAN ) 705 | <-+ NVE3 | | +-------------+ (___) 706 | IP3/M3 +-----------+ | |----| (BD-10) | | 707 +---TS3(VA)--| (BD-10) |-| | | IRB2\ | | 708 +-----------+ +---------+ | (IP-VRF)|---+ 709 +-------------+ 711 Figure 3 Floating IP Overlay Index for redundant TS 713 In this use-case, a GW IP is used as an Overlay Index for the same 714 reasons as in 4.1. 
   However, this GW IP is a floating IP that belongs to the active TS.
   Assuming TS2 is the active TS and owns IP23:

   (1) NVE2 advertises the following BGP routes for TS2:

       o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
         IP=IP23 (and BGP Encapsulation Extended Community). The MAC
         and IP addresses may be learned via ARP-snooping.

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP23. The prefix and GW IP are learned
         by policy.

   (2) NVE3 advertises the following BGP route for TS3 (it does not
       advertise an RT-2 for IP23/M3):

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP23. The prefix and GW IP are learned
         by policy.

   (3) DGW1 and DGW2 import both received routes based on the
       route-target:

       o M2 is added to the BD-10 FIB along with its corresponding
         tunnel information. For the VXLAN use case, the VTEP will be
         derived from the MAC/IP route BGP next-hop and VNI from the
         VNI/VSID field. IP23 - M2 is added to the ARP table.

       o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay
         Index IP23 pointing at M2 in the local BD-10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and Overlay Index=IP23 is found. Since IP23 is
         an Overlay Index, a recursive route resolution for IP23 is
         required.

       o IP23 is resolved to M2 in the ARP table, and M2 is resolved
         to the tunnel information given by the BD (remote VTEP and
         VNI for the VXLAN case).

       o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC.

         . Destination inner MAC = M2.

         . Tunnel information provided by the BD FIB (VNI, VTEP IPs
           and MACs for the VXLAN case).
   (5) When the packet arrives at NVE2:

       o Based on the tunnel information (VNI for the VXLAN case), the
         BD-10 context is identified for a MAC lookup.

       o Encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE), the packet is
         forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3
       appoints TS3 as the new active TS for SN1, TS3 will now own the
       floating IP23 and will signal this new ownership (GARP message
       or similar). Upon receiving the new owner's notification, NVE3
       will issue a route type 2 for M3-IP23 and NVE2 will withdraw
       the RT-2 for M2-IP23. DGW1 and DGW2 will update their ARP
       tables with the new MAC resolving the floating IP. No changes
       are made in the IP-VRF routing table.

4.3 Bump-in-the-Wire Use-Case

   Figure 5 illustrates an example of inter-subnet forwarding for an
   IP Prefix route that carries a subnet SN1. In this use-case, TS2
   and TS3 are layer-2 VA devices without any IP address that can be
   included as an Overlay Index in the GW IP field of the IP Prefix
   route. Their MAC addresses are M2 and M3 respectively, and they are
   connected to BD-10. Note that IRB1 and IRB2 (in DGW1 and DGW2
   respectively) have IP addresses in a subnet different from SN1.

                     NVE2                          DGW1
              M2 +-----------+ +---------+    +-------------+
    +---TS2(VA)--|  (BD-10)  |-|         |----|  (BD-10)    |
    |      ESI23 +-----------+ |         |    |    IRB1\    |
    |        +                 |         |    |     (IP-VRF)|---+
    |        |                 |         |    +-------------+  _|_
   SN1       |                 |  VXLAN/ |                    (   )
    |        |                 |  nvGRE  |         DGW2      ( WAN )
    |        +       NVE3      |         |    +-------------+ (___)
    |      ESI23 +-----------+ |         |----|  (BD-10)    |   |
    +---TS3(VA)--|  (BD-10)  |-|         |    |    IRB2\    |   |
              M3 +-----------+ +---------+    |     (IP-VRF)|---+
                                              +-------------+

                   Figure 5 Bump-in-the-wire use-case

   Since neither TS2 nor TS3 can participate in any dynamic routing
   protocol and have no IP address assigned, there are two potential
   Overlay Index types that can be used when advertising SN1:

   a) an ESI, i.e. ESI23, that can be provisioned on the attachment
      ports of NVE2 and NVE3, as shown in Figure 5.
   b) or the VA's MAC address, that can be added to NVE2 and NVE3 by
      policy.

   The advantage of using an ESI as Overlay Index as opposed to the
   VA's MAC address, is that the forwarding to the egress NVE can be
   done purely based on the state of the AC in the ES (notified by the
   AD per-EVI route) and all the EVPN multi-homing redundancy
   mechanisms can be re-used. For instance, the [RFC7432]
   mass-withdrawal mechanism for fast failure detection and
   propagation can be used. This section assumes that an ESI Overlay
   Index is used in this use-case but it does not prevent the use of
   the VA's MAC address as an Overlay Index. If a MAC is used as
   Overlay Index, the control plane must follow the procedures
   described in section 4.4.3.

   The model supports VA redundancy in a similar way as the one
   described in section 4.2 for the floating IP Overlay Index
   use-case, except that it uses the EVPN Ethernet A-D per-EVI route
   instead of the MAC advertisement route to advertise the location of
   the Overlay Index.
   The procedure is explained below:

   (1) Assuming TS2 is the active TS in ESI23, NVE2 advertises the
       following BGP routes:

       o Route type 1 (Ethernet A-D route for BD-10) containing:
         ESI=ESI23 and the corresponding tunnel information (VNI/VSID
         field), as well as the BGP Encapsulation Extended Community
         as per [EVPN-OVERLAY].

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0. The Router's MAC Extended
         Community defined in [EVPN-INTERSUBNET] is added and carries
         the MAC address (M2) associated to the TS behind which SN1
         sits. M2 may be learned by policy.

   (2) NVE3 advertises the following BGP route for TS3 (no AD per-EVI
       route is advertised):

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0. The Router's MAC Extended
         Community is added and carries the MAC address (M3)
         associated to the TS behind which SN1 sits. M3 may be learned
         by policy.

   (3) DGW1 and DGW2 import the received routes based on the
       route-target:

       o The tunnel information to get to ESI23 is installed in DGW1
         and DGW2. For the VXLAN use case, the VTEP will be derived
         from the Ethernet A-D route BGP next-hop and VNI from the
         VNI/VSID field (see [EVPN-OVERLAY]).

       o The RT-5 coming from the NVE that advertised the RT-1 is
         selected and SN1/24 is added to the IP-VRF in DGW1 and DGW2
         with Overlay Index ESI23 and MAC = M2.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and Overlay Index=ESI23 is found. Since ESI23
         is an Overlay Index, a recursive route resolution is required
         to find the egress NVE where ESI23 resides.

       o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC.

         . Destination inner MAC = M2 (this MAC will be obtained from
           the Router's MAC Extended Community received along with
           the RT-5 for SN1). Note that the Router's MAC Extended
           Community is used in this case to carry the TS' MAC
           address, as opposed to the NVE/PE's MAC address.

         . Tunnel information for the NVO tunnel is provided by the
           Ethernet A-D route per-EVI for ESI23 (VNI and VTEP IP for
           the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel demultiplexer information (VNI for the
         VXLAN case), the BD-10 context is identified for a MAC lookup
         (assuming MAC disposition model) or the VNI MAY directly
         identify the egress interface (for a label or VNI disposition
         model).

       o Encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE) or a VNI lookup
         (in case of VNI forwarding), the packet is forwarded to TS2,
         where it will be forwarded to SN1.

   (6) If the redundancy protocol running between TS2 and TS3 follows
       an active/standby model and there is a failure, appointing TS3
       as the new active TS for SN1, TS3 will now own the connectivity
       to SN1 and will signal this new ownership. Upon receiving the
       new owner's notification, NVE3's AC will become active and NVE3
       will issue a route type 1 for ESI23, whereas NVE2 will withdraw
       its Ethernet A-D route for ESI23. DGW1 and DGW2 will update
       their tunnel information to resolve ESI23. The destination
       inner MAC will be changed to M3.

4.4 IP-VRF-to-IP-VRF Model

   This use-case is similar to the scenario described in "IRB
   forwarding on NVEs for Tenant Systems" in [EVPN-INTERSUBNET];
   however, the new requirement here is the advertisement of IP
   Prefixes as opposed to only host routes.

   In the examples described in sections 4.1, 4.2 and 4.3, the BD
   instance can connect IRB interfaces and any other Tenant Systems
   connected to it.
   EVPN provides connectivity for:

   1. Traffic destined to the IRB or TS IP interfaces as well as

   2. Traffic destined to IP subnets sitting behind the TS, e.g. SN1
      or SN2.

   In order to provide connectivity for (1), MAC/IP routes (RT-2) are
   needed so that IRB or TS MACs and IPs can be distributed.
   Connectivity type (2) is accomplished by the exchange of IP Prefix
   routes (RT-5) for IPs and subnets sitting behind certain Overlay
   Indexes, e.g. GW IP or ESI or TS MAC.

   In some cases, IP Prefix routes may be advertised for subnets and
   IPs sitting behind an IRB. We refer to this use-case as the
   "IP-VRF-to-IP-VRF" model.

   [EVPN-INTERSUBNET] defines an asymmetric IRB model and a symmetric
   IRB model, based on the required lookups at the ingress and egress
   NVE: the asymmetric model requires an ip-lookup and a mac-lookup at
   the ingress NVE, whereas only a mac-lookup is needed at the egress
   NVE; the symmetric model requires ip and mac lookups at both the
   ingress and the egress NVE. From that perspective, the
   IP-VRF-to-IP-VRF use-case described in this section is a symmetric
   IRB model.

   Note that, in an IP-VRF-to-IP-VRF scenario, out of the many subnets
   that a tenant may have, it may be the case that only a few are
   attached to a given NVE/PE's IP-VRF. In order to provide
   inter-subnet connectivity among the set of NVE/PEs where the tenant
   is connected, a new "Supplementary Broadcast Domain" (SBD) is
   created on all of them if recursive resolution is needed. This SBD
   is instantiated as a regular BD (with no ACs) in each NVE/PE and
   has an IRB interface that connects the SBD to the IP-VRF. The IRB
   interface's IP or MAC address is used as the overlay index for
   recursive resolution.
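
   The recursive resolution of an IP or MAC overlay index mentioned
   above can be pictured with the following minimal sketch. It is an
   illustration only, not part of the specification: the function
   name, the dictionary-based ARP table and BD FIB, and the tunnel
   fields are all hypothetical, and the placeholder strings "IP1" and
   "M1" stand in for real addresses.

```python
from typing import Dict, Tuple

# Hypothetical tables: ARP maps an IRB IP to its MAC (learned from an RT-2
# carrying both), and the BD FIB maps a MAC to tunnel information.
ArpTable = Dict[str, str]
BdFib = Dict[str, Dict[str, object]]


def resolve_overlay_index(
    kind: str, overlay_index: str, arp_table: ArpTable, bd_fib: BdFib
) -> Tuple[str, Dict[str, object]]:
    """Resolve an overlay index to (inner destination MAC, tunnel info).

    kind is "ip" for a GW IP overlay index or "mac" for a MAC overlay index.
    """
    if kind == "ip":
        # A GW IP is first resolved to a MAC through the ARP table.
        mac = arp_table[overlay_index]
    elif kind == "mac":
        # A MAC overlay index is looked up in the BD FIB directly.
        mac = overlay_index
    else:
        raise ValueError("unsupported overlay index kind: " + kind)
    return mac, bd_fib[mac]


# Placeholder state as it might look on a DGW after importing an RT-2.
arp = {"IP1": "M1"}
fib = {"M1": {"vtep": "NVE1", "vni": 10}}
via_ip = resolve_overlay_index("ip", "IP1", arp, fib)
via_mac = resolve_overlay_index("mac", "M1", arp, fib)
```

   Both calls end at the same BD FIB entry; the only difference is the
   extra ARP step when the overlay index is an IP address.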

   Depending on the existence and characteristics of the SBD and IRB
   interfaces for the IP-VRFs, there are three different
   IP-VRF-to-IP-VRF scenarios identified and described in this
   document:

   1) Interface-less model: no SBD and no overlay indexes required.
   2) Interface-ful with SBD IRB model: it requires SBD, as well as GW
      IP addresses as overlay indexes.
   3) Interface-ful with unnumbered SBD IRB model: it requires SBD, as
      well as MAC addresses as overlay indexes.

   Inter-subnet IP multicast is outside the scope of this document.

4.4.1 Interface-less IP-VRF-to-IP-VRF Model

   Figure 6 will be used for the description of this model.

             NVE1(M1)
          +------------+
  IP1+----| (BD-1)     |                 DGW1(M3)
          |      \     |    +---------+ +--------+
          |    (IP-VRF)|----|         |-|(IP-VRF)|----+
          |      /     |    |         | +--------+    |
      +---| (BD-2)     |    |         |              _+_
      |   +------------+    |         |             (   )
   SN1|                     |  VXLAN/ |            ( WAN )--H1
      |      NVE2(M2)       |  nvGRE/ |             (___)
      |   +------------+    |  MPLS   |               +
      +---| (BD-2)     |    |         | DGW2(M4)      |
          |      \     |    |         | +--------+    |
          |    (IP-VRF)|----|         |-|(IP-VRF)|----+
          |      /     |    +---------+ +--------+
  SN2+----| (BD-3)     |
          +------------+

             Figure 6 Interface-less IP-VRF-to-IP-VRF model

   In this case:

   a) The NVEs and DGWs must provide connectivity between hosts in
      SN1, SN2, IP1 and hosts sitting at the other end of the WAN, for
      example, H1. We assume the DGWs import/export IP and/or VPN-IP
      routes from/to the WAN.

   b) The IP-VRF instances in the NVE/DGWs are directly connected
      through NVO tunnels, and no IRBs and/or BD instances are
      instantiated to connect the IP-VRFs.

   c) The solution must provide layer-3 connectivity among the IP-VRFs
      for Ethernet NVO tunnels, for instance, VXLAN or nvGRE.

   d) The solution may provide layer-3 connectivity among the IP-VRFs
      for IP NVO tunnels, for example, VXLAN GPE (with IP payload).

   In order to meet the above requirements, the EVPN route type 5 will
   be used to advertise the IP Prefixes, along with the Router's MAC
   Extended Community as defined in [EVPN-INTERSUBNET] if the
   advertising NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will
   advertise an RT-5 for each of its prefixes with the following
   fields:

       o RD as per [RFC7432].

       o Eth-Tag ID=0.

       o IP address length and IP address, as explained in the
         previous sections.

       o GW IP address=0.

       o ESI=0.

       o MPLS label or VNI corresponding to the IP-VRF.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF) and two BGP extended communities:

       o The first one is the BGP Encapsulation Extended Community, as
         per [RFC5512], identifying the tunnel type.

       o The second one is the Router's MAC Extended Community as per
         [EVPN-INTERSUBNET] containing the MAC address associated to
         the NVE advertising the route. This MAC address identifies
         the NVE/DGW and MAY be re-used for all the IP-VRFs in the
         NVE. The Router's MAC Extended Community MUST be sent if the
         route is associated to an Ethernet NVO tunnel, for instance,
         VXLAN. If the route is associated to an IP NVO tunnel, for
         instance VXLAN GPE with IP payload, the Router's MAC Extended
         Community SHOULD NOT be sent.

   The following example illustrates the procedure to advertise and
   forward packets to SN1/24 (IPv4 prefix advertised from NVE1):

   (1) NVE1 advertises the following BGP route:

       o Route type 5 (IP Prefix route) containing:

         . IPL=24, IP=SN1, Label=10.

         . GW IP SHOULD be set to 0.

         . [RFC5512] BGP Encapsulation Extended Community.

         . Router's MAC Extended Community that contains M1.

         . Route-target identifying the tenant (IP-VRF).

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

       o Since GW IP=ESI=0, the Label is a non-zero value and the
         local policy indicates this interface-less model, DGW1 will
         use the Label and next-hop of the RT-5, as well as the MAC
         address conveyed in the Router's MAC Extended Community (as
         inner destination MAC address) to set up the forwarding state
         and later encapsulate the routed IP packets.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24.

       o Since the RT-5 for SN1/24 had a GW IP=ESI=0, a non-zero Label
         and next-hop, and the model is interface-less, DGW1 will not
         need a recursive lookup to resolve the route.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = DGW1 MAC, Destination inner MAC = M1, Source
         outer IP (tunnel source IP) = DGW1 IP, Destination outer IP
         (tunnel destination IP) = NVE1 IP. The Source and Destination
         inner MAC addresses are not needed if IP NVO tunnels are
         used.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP-lookup based on the
         Label (the Destination inner MAC is not needed to identify
         the IP-VRF).

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated to BD-2. A
         subsequent lookup in the ARP table and the BD FIB will
         provide the forwarding information for the packet in BD-2.

   The model described above is called Interface-less model since the
   IP-VRFs are connected directly through tunnels and do not require
   those tunnels to be terminated in SBDs, unlike the models in
   sections 4.4.2 and 4.4.3. An EVPN IP-VRF-to-IP-VRF implementation
   is REQUIRED to support the ingress and egress procedures described
   in this section.
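
   The ingress behavior of steps (2) and (3) can be sketched as
   follows. This is a minimal illustration under stated assumptions,
   not a definitive implementation: the Rt5 class, its field names,
   the encapsulate() helper and the documentation addresses
   (192.0.2.0/24 standing in for SN1/24, 198.51.100.x for the tunnel
   endpoints) are hypothetical. The local-policy check that selects
   the interface-less model is assumed to have been done already.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Dict, Optional


@dataclass(frozen=True)
class Rt5:
    """Subset of the RT-5 fields used in this section (illustrative names)."""
    prefix: str                       # IP prefix, e.g. "192.0.2.0/24"
    label: int                        # MPLS label or VNI of the IP-VRF
    next_hop: str                     # BGP next-hop (tunnel destination)
    gw_ip: str = "0.0.0.0"            # 0 in the interface-less model
    esi: int = 0                      # 0 in the interface-less model
    router_mac: Optional[str] = None  # Router's MAC Extended Community


def needs_no_recursion(route: Rt5) -> bool:
    """GW IP = ESI = 0 and a non-zero Label: no recursive lookup is needed."""
    return int(ip_address(route.gw_ip)) == 0 and route.esi == 0 and route.label != 0


def encapsulate(route: Rt5, dst_ip: str, local_mac: str, local_ip: str,
                ethernet_nvo: bool = True) -> Dict[str, object]:
    """Build the tunnel header fields for a packet routed via this RT-5."""
    if ip_address(dst_ip) not in ip_network(route.prefix):
        raise ValueError("destination is not covered by the prefix")
    if not needs_no_recursion(route):
        raise ValueError("route requires recursive resolution")
    header: Dict[str, object] = {
        "label": route.label,
        "outer_src_ip": local_ip,
        "outer_dst_ip": route.next_hop,
    }
    if ethernet_nvo:
        # Inner MACs are only needed for Ethernet NVO tunnels (e.g. VXLAN).
        if route.router_mac is None:
            raise ValueError("Router's MAC is required for Ethernet NVO tunnels")
        header["inner_src_mac"] = local_mac
        header["inner_dst_mac"] = route.router_mac
    return header


# Example: the RT-5 from NVE1 (Label=10, Router's MAC standing in for M1).
rt5 = Rt5(prefix="192.0.2.0/24", label=10, next_hop="198.51.100.1",
          router_mac="00:00:5e:00:53:01")
vxlan = encapsulate(rt5, "192.0.2.7", "00:00:5e:00:53:03", "198.51.100.3")
gpe = encapsulate(rt5, "192.0.2.7", "00:00:5e:00:53:03", "198.51.100.3",
                  ethernet_nvo=False)
```

   With ethernet_nvo=False the inner MAC addresses are omitted, which
   mirrors the note above that they are not needed for IP NVO tunnels.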

4.4.2 Interface-ful IP-VRF-to-IP-VRF with SBD IRB

   Figure 7 will be used for the description of this model.

                 NVE1
         +------------+                        DGW1
 IP10+---+(BD-1)      | +---------------+ +------------+
         |  \         | |               | |            |
         |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
         | /    IRB(IP1/M1)           IRB(IP3/M3)      |     |
     +---+(BD-2)      | |               | +------------+    _+_
     |   +------------+ |               |                  (   )
  SN1|                  |     VXLAN/    |                 ( WAN )--H1
     |           NVE2   |     nvGRE/    |                  (___)
     |   +------------+ |     MPLS      |      DGW2          +
     +---+(BD-2)      | |               | +------------+     |
         |  \         | |               | |            |     |
         |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
         | /    IRB(IP2/M2)           IRB(IP4/M4)      |
 SN2+----+(BD-3)      | +---------------+ +------------+
         +------------+

               Figure 7 Interface-ful with SBD IRB model

   In this model:

   a) As in section 4.4.1, the NVEs and DGWs must provide connectivity
      between hosts in SN1, SN2, IP1 and hosts sitting at the other
      end of the WAN.

   b) However, the NVE/DGWs are now connected through Ethernet NVO
      tunnels terminated in the SBD instance. The IP-VRFs use IRB
      interfaces for their connectivity to the SBD.

   c) Each SBD IRB has an IP and a MAC address, where the IP address
      must be reachable from other NVEs or DGWs.

   d) The SBD is attached to all the NVE/DGWs in the tenant domain
      BDs.

   e) The solution must provide layer-3 connectivity for Ethernet NVO
      tunnels, for instance, VXLAN or nvGRE.

   EVPN type 5 routes will be used to advertise the IP Prefixes,
   whereas EVPN RT-2 routes will advertise the MAC/IP addresses of
   each SBD IRB interface. Each NVE/DGW will advertise an RT-5 for
   each of its prefixes with the following fields:

       o RD as per [RFC7432].

       o Eth-Tag ID=0.

       o IP address length and IP address, as explained in the
         previous sections.

       o GW IP address=IRB-IP (this is the Overlay Index that will be
         used for the recursive route resolution).

       o ESI=0.

       o Label value SHOULD be zero since the RT-5 route requires a
         recursive lookup resolution to an RT-2 route. It is ignored
         on reception, and, when forwarding packets, the MPLS label or
         VNI from the RT-2's MPLS Label1 field is used.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF). The Router's MAC Extended Community SHOULD NOT be sent in
   this case.

   The following example illustrates the procedure to advertise and
   forward packets to SN1/24 (IPv4 prefix advertised from NVE1):

   (1) NVE1 advertises the following BGP routes:

       o Route type 5 (IP Prefix route) containing:

         . IPL=24, IP=SN1, Label SHOULD be set to 0.

         . GW IP=IP1 (SBD IRB's IP).

         . Route-target identifying the tenant (IP-VRF).

       o Route type 2 (MAC/IP route for the SBD IRB) containing:

         . ML=48, M=M1, IPL=32, IP=IP1, Label=10.

         . A [RFC5512] BGP Encapsulation Extended Community.

         . Route-target identifying the SBD. This route-target MAY be
           the same as the one used with the RT-5.

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

         . Since GW IP is different from zero, the GW IP (IP1) will be
           used as the Overlay Index for the recursive route
           resolution to the RT-2 carrying IP1.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24, which is associated
         to the Overlay Index IP1. The forwarding information is
         derived from the RT-2 received for IP1.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = M3, Destination inner MAC = M1, Source outer IP
         (source VTEP) = DGW1 IP, Destination outer IP (destination
         VTEP) = NVE1 IP.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP-lookup based on the
         Label and the inner MAC DA.

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated to BD-2. A
         subsequent lookup in the ARP table and the BD FIB will
         provide the forwarding information for the packet in BD-2.

   The model described above is called 'Interface-ful with SBD IRB
   model' since the tunnels connecting the DGWs and NVEs need to be
   terminated into the SBD. The SBD is connected to the IP-VRFs via
   SBD IRB interfaces, and that allows the recursive resolution of
   RT-5s to GW IP addresses. An EVPN IP-VRF-to-IP-VRF implementation
   is REQUIRED to support the ingress and egress procedures described
   in this section.

4.4.3 Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB

   Figure 8 will be used for the description of this model. Note that
   this model is similar to the one described in section 4.4.2, only
   without IP addresses on the SBD IRB interfaces.

                 NVE1
         +------------+                        DGW1
 IP1+----+(BD-1)      | +---------------+ +------------+
         |  \         | |               | |            |
         |(IP-VRF)-(SBD)|               (SBD)-(IP-VRF) |-----+
         | /   IRB(M1)| |             IRB(M3)          |     |
     +---+(BD-2)      | |               | +------------+    _+_
     |   +------------+ |               |                  (   )
  SN1|                  |     VXLAN/    |                 ( WAN )--H1
     |           NVE2   |     nvGRE/    |                  (___)
     |   +------------+ |     MPLS      |      DGW2          +
     +---+(BD-2)      | |               | +------------+     |
         |  \         | |               | |            |     |
         |(IP-VRF)-(SBD)|               (SBD)-(IP-VRF) |-----+
         | /   IRB(M2)| |             IRB(M4)          |
 SN2+----+(BD-3)      | +---------------+ +------------+
         +------------+

          Figure 8 Interface-ful with unnumbered SBD IRB model

   In this model:

   a) As in section 4.4.1 and 4.4.2, the NVEs and DGWs must provide
      connectivity between hosts in SN1, SN2, IP1 and hosts sitting at
      the other end of the WAN.

   b) As in section 4.4.2, the NVE/DGWs are connected through Ethernet
      NVO tunnels terminated in the SBD instance. The IP-VRFs use IRB
      interfaces for their connectivity to the SBD.

   c) However, each SBD IRB has a MAC address only, and no IP address
      (that is why the model refers to an 'unnumbered' SBD IRB). In
      this model, there is no need to have IP reachability to the SBD
      IRB interfaces themselves and there is a requirement to save IP
      addresses on those interfaces.

   d) As in section 4.4.2, the SBD is composed of all the NVE/DGW BDs
      of the tenant that need inter-subnet-forwarding.

   e) As in section 4.4.2, the solution must provide layer-3
      connectivity for Ethernet NVO tunnels, for instance, VXLAN or
      nvGRE.

   This model will also make use of the RT-5 recursive resolution.
   EVPN type 5 routes will advertise the IP Prefixes along with the
   Router's MAC Extended Community used for the recursive lookup,
   whereas EVPN RT-2 routes will advertise the MAC addresses of each
   SBD IRB interface (this time without an IP).

   Each NVE/DGW will advertise an RT-5 for each of its prefixes with
   the same fields as described in 4.4.2 except for:

       o GW IP address SHOULD be set to 0.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF) and the Router's MAC Extended Community containing the MAC
   address associated to the SBD IRB interface. This MAC address MAY
   be re-used for all the IP-VRFs in the NVE.

   The example is similar to the one in section 4.4.2:

   (1) NVE1 advertises the following BGP routes:

       o Route type 5 (IP Prefix route) containing the same values as
         in the example in section 4.4.2, except for:

         . GW IP SHOULD be set to 0.

         . Router's MAC Extended Community containing M1 (this will be
           used for the recursive lookup to an RT-2).

       o Route type 2 (MAC route for the SBD IRB) with the same values
         as in section 4.4.2 except for:

         . ML=48, M=M1, IPL=0, Label=10.

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

         . The MAC contained in the Router's MAC Extended Community
           sent along with the RT-5 (M1) will be used as the Overlay
           Index for the recursive route resolution to the RT-2
           carrying M1.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24, which is associated
         to the Overlay Index M1. The forwarding information is
         derived from the RT-2 received for M1.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = M3, Destination inner MAC = M1, Source outer IP
         (source VTEP) = DGW1 IP, Destination outer IP (destination
         VTEP) = NVE1 IP.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP-lookup based on the
         Label and the inner MAC DA.

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated to BD-2. A
         subsequent lookup in the ARP table and the BD FIB will
         provide the forwarding information for the packet in BD-2.

   The model described above is called Interface-ful with SBD IRB
   model (as in section 4.4.2), only this time the SBD IRB does not
   have an IP address. This model is OPTIONAL for an EVPN
   IP-VRF-to-IP-VRF implementation.

5. Conclusions

   An EVPN route (type 5) for the advertisement of IP Prefixes is
   described in this document. This new route type has a
   differentiated role from the RT-2 route and addresses the Data
   Center (or NVO-based networks in general) inter-subnet connectivity
   scenarios described in this document.
   Using this new RT-5, an IP Prefix may be advertised along with an
   Overlay Index that can be a GW IP address, a MAC or an ESI, or
   without an Overlay Index, in which case the BGP next-hop will point
   at the egress NVE/ASBR/ABR and the MAC in the Router's MAC Extended
   Community will provide the inner MAC destination address to be
   used. As discussed throughout the document, the EVPN RT-2 does not
   meet the requirements for all the DC use cases, therefore this EVPN
   route type 5 is required.

   The EVPN route type 5 decouples the IP Prefix advertisements from
   the MAC/IP route advertisements in EVPN, hence:

   a) It allows the clean and clear advertisement of IPv4 or IPv6
      prefixes in an NLRI with no MAC addresses.

   b) Since the route type is different from the MAC/IP Advertisement
      route, the current [RFC7432] procedures do not need to be
      modified.

   c) It allows a flexible implementation where the prefix can be
      linked to different types of Overlay/Underlay Indexes: overlay
      IP address, overlay MAC addresses, overlay ESI, underlay BGP
      next-hops, etc.

   d) An EVPN implementation not requiring IP Prefixes can simply
      discard them by looking at the route type value.

6. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119
   [RFC2119].

7. Security Considerations

   The security considerations discussed in [RFC7432] apply to this
   document.

8. IANA Considerations

   This document requests the allocation of value 5 in the "EVPN Route
   Types" registry defined by [RFC7432]:

      Value     Description         Reference
      5         IP Prefix route     [this document]

9. References

9.1 Normative References

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
   Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006.

   [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
   Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet
   VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

   [RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel,
   "Revised Error Handling for BGP UPDATE Messages", RFC 7606, August
   2015.

9.2 Informative References

   [EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in
   EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-03.txt, work in
   progress, February, 2017

   [EVPN-OVERLAY] Sajassi-Drake et al., "A Network Virtualization
   Overlay Solution using EVPN", draft-ietf-bess-evpn-overlay-08.txt,
   work in progress, March, 2017

10. Acknowledgments

   The authors would like to thank Mukul Katiyar and Jeffrey Zhang for
   their valuable feedback and contributions. The following people
   also helped improve this document with their feedback: Tony
   Przygienda and Thomas Morin. Special thanks to Eric Rosen for his
   detailed review, which really helped improve the readability and
   clarify the concepts.

11. Contributors

   In addition to the authors listed on the front page, the following
   co-authors have also contributed to this document:

   Senthil Sathappan
   Florin Balus
   Aldrin Isaac
   Senad Palislamovic

12. Authors' Addresses

   Jorge Rabadan (Editor)
   Nokia
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@nokia.com

   Wim Henderickx
   Nokia
   Email: wim.henderickx@nokia.com

   John E. Drake
   Juniper
   Email: jdrake@juniper.net

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com

   Wen Lin
   Juniper
   Email: wlin@juniper.net