BESS Workgroup                                           J. Rabadan, Ed.
Internet Draft                                             W. Henderickx
Intended status: Standards Track                                   Nokia

                                                                J. Drake
                                                                  W. Lin
                                                                 Juniper

                                                              A. Sajassi
                                                                   Cisco

Expires: April 19, 2018                                 October 16, 2017

                    IP Prefix Advertisement in EVPN
              draft-ietf-bess-evpn-prefix-advertisement-06

Abstract

   EVPN provides a flexible control plane that allows intra-subnet
   connectivity in an MPLS and/or NVO-based network. In some networks,
   there is also a need for dynamic and efficient inter-subnet
   connectivity across Tenant Systems and End Devices that can be
   physical or virtual and do not necessarily participate in dynamic
   routing protocols. This document defines a new EVPN route type for
   the advertisement of IP Prefixes and explains some use-case examples
   where this new route type is used.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on April 16, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction and Problem Statement
     2.1  Inter-Subnet Connectivity Requirements in Data Centers
     2.2  The Requirement for a New EVPN Route Type
   3.  The BGP EVPN IP Prefix Route
     3.1  IP Prefix Route Encoding
     3.2  Overlay Indexes and Recursive Lookup Resolution
   4.  Overlay Index Use-Cases
     4.1  TS IP Address Overlay Index Use-Case
     4.2  Floating IP Overlay Index Use-Case
     4.3  Bump-in-the-Wire Use-Case
     4.4  IP-VRF-to-IP-VRF Model
       4.4.1  Interface-less IP-VRF-to-IP-VRF Model
       4.4.2  Interface-ful IP-VRF-to-IP-VRF with SBD-facing IRB
       4.4.3  Interface-ful IP-VRF-to-IP-VRF with Unnumbered
              SBD-facing IRB
   5.  Conclusions
   6.  Conventions used in this document
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1  Normative References
     9.2  Informative References
   10. Acknowledgments
   11. Contributors
   12. Authors' Addresses

1. Terminology

   GW IP: Gateway IP Address.

   IPL: IP address length.

   ML: MAC address length.

   NVE: Network Virtualization Edge.

   TS: Tenant System.

   VA: Virtual Appliance.

   RT-2: EVPN route type 2, i.e., MAC/IP advertisement route.

   RT-5: EVPN route type 5, i.e., IP Prefix route.

   AC: Attachment Circuit.

   ARP: Address Resolution Protocol.

   ND: Neighbor Discovery Protocol.

   Ethernet NVO tunnel: Network Virtualization Overlay tunnels with
   Ethernet payload. Examples of this type of tunnel are VXLAN and
   nvGRE.

   IP NVO tunnel: Network Virtualization Overlay tunnels with IP
   payload (no MAC header in the payload).

   EVI: EVPN Instance spanning the NVE/PE devices that are participating
   in that EVPN.

   MAC-VRF: A Virtual Routing and Forwarding table for Media Access
   Control (MAC) addresses on an NVE/PE, as per [RFC7432].

   BD: Broadcast Domain. As per [RFC7432], an EVI consists of one or
   more BDs. In the case of the VLAN-based and VLAN-bundle service
   models (see [RFC7432]), a BD is equivalent to an EVI. In the case of
   the VLAN-aware bundle service model, an EVI contains multiple BDs.
   Also, in this document, BD and subnet are equivalent terms.

   BT: Bridge Table. The instantiation of a BD in a MAC-VRF.

   IP-VRF: A VPN Routing and Forwarding table for IP routes on an
   NVE/PE. The IP routes could be populated by EVPN and IP-VPN
   address families.

   IRB: Integrated Routing and Bridging interface. It connects an IP-VRF
   to a BD (or subnet).
   SBD: Supplementary Broadcast Domain. A BD that does not have any ACs,
   only IRB interfaces, and that is used to provide connectivity among
   all the IP-VRFs of the tenant. The SBD is only required in
   IP-VRF-to-IP-VRF use-cases (see Section 4.4).

2. Introduction and Problem Statement

   Inter-subnet connectivity is used for certain tenants within the Data
   Center. [EVPN-INTERSUBNET] defines some fairly common inter-subnet
   forwarding scenarios where TSes can exchange packets with TSes
   located in remote subnets. In order to achieve this,
   [EVPN-INTERSUBNET] describes how MAC/IPs encoded in TS RT-2 routes
   are used to populate not only MAC-VRF and overlay ARP tables, but
   also IP-VRF tables with the encoded TS host routes (/32 or /128). In
   some cases, EVPN may advertise IP Prefixes and therefore provide
   aggregation in the IP-VRF tables, as opposed to programming
   individual host routes. This document complements the scenarios
   described in [EVPN-INTERSUBNET] and defines how EVPN may be used to
   advertise IP Prefixes. Interoperability between EVPN and L3VPN
   [RFC4364] IP Prefix routes is out of the scope of this document.

   Section 2.1 describes the inter-subnet connectivity requirements in
   Data Centers. Section 2.2 explains why a new EVPN route type is
   required for IP Prefix advertisements. Once the need for a new EVPN
   route type is justified, Sections 3, 4 and 5 describe this route
   type and how it is used in some specific use cases.

2.1 Inter-Subnet Connectivity Requirements in Data Centers

   [RFC7432] is used as the control plane for a Network Virtualization
   Overlay (NVO3) solution in Data Centers (DC), where Network
   Virtualization Edge (NVE) devices can be located in Hypervisors or
   TORs, as described in [EVPN-OVERLAY].
   If we use the term Tenant System (TS) to designate a physical or
   virtual system identified by a MAC address and possibly IP
   addresses, and connected to a BD by an Attachment Circuit, the
   following considerations apply:

   o  The Tenant Systems may be Virtual Machines (VMs) that generate
      traffic from their own MAC and IP.

   o  The Tenant Systems may be Virtual Appliance entities (VAs) that
      forward traffic to/from IP addresses of different End Devices
      sitting behind them.

   o  These VAs can be firewalls, load balancers, NAT devices, other
      appliances or virtual gateways with virtual routing instances.

   o  These VAs do not necessarily participate in dynamic routing
      protocols and hence rely on the EVPN NVEs to advertise the
      routes on their behalf.

   o  In all these cases, the VA will forward traffic to other TSes
      using its own source MAC, but the source IP will be the one
      associated with the End Device sitting behind it, or a translated
      IP address (part of a public NAT pool) if the VA is performing
      NAT.

   o  Note that the same IP address could exist behind two of these
      TSes. One example of this would be certain appliance resiliency
      mechanisms, where a virtual IP or floating IP can be owned by
      one of the two VAs running the resiliency protocol (the master
      VA). The Virtual Router Redundancy Protocol (VRRP, RFC 5798) is
      one particular example of this. Another example is multi-homed
      subnets, i.e., the same subnet is connected to two VAs.

   o  Although these VAs provide IP connectivity to VMs and subnets
      behind them, they do not always have their own IP interface
      connected to the EVPN NVE; e.g., layer-2 firewalls are examples
      of VAs not supporting IP interfaces.

   Figure 1 illustrates some of the examples described above.
                      NVE1
                  +-----------+
         TS1(VM)--| (BD-10)   |-----+
         IP1/M1   +-----------+     |               DGW1
                                +---------+    +-------------+
                                |         |----| (BD-10)     |
   SN1---+            NVE2      |         |    |   IRB1\     |
         |        +-----------+ |         |    |     (IP-VRF)|---+
   SN2---TS2(VA)--| (BD-10)   |-|         |    +-------------+  _|_
         | IP2/M2 +-----------+ |  VXLAN/ |                    (   )
   IP4---+  <-+                 |  nvGRE  |         DGW2      ( WAN )
              |                 |         |    +-------------+ (___)
      vIP23 (floating)          |         |----| (BD-10)     |   |
              |                 +---------+    |   IRB2\     |   |
   SN1---+  <-+       NVE3        |  |  |      |     (IP-VRF)|---+
         | IP3/M3 +-----------+   |  |  |      +-------------+
   SN3---TS3(VA)--| (BD-10)   |---+  |  |
         |        +-----------+      |  |
   IP5---+                           |  |
                                     |  |
                     NVE4            |  |     NVE5             +--SN5
            +---------------------+  |  | +-----------+        |
   IP6------| (BD-1)              |  |  +-| (BD-10)   |--TS4(VA)--SN6
            |     \               |  |    +-----------+        |
            |    (IP-VRF)         |--+        ESI4             +--SN7
            |    /      \IRB3     |
        |---| (BD-2)     (BD-10)  |
     SN4|   +---------------------+

                   Figure 1 DC inter-subnet use-cases

   Where:

   NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same BD for a
   particular tenant. BD-10 comprises the collection of BD instances
   defined in all the NVEs. All the hosts connected to BD-10 belong to
   the same IP subnet. The hosts connected to BD-10 are listed below:

   o  TS1 is a VM that generates/receives traffic from/to IP1, where IP1
      belongs to the BD-10 subnet.

   o  TS2 and TS3 are Virtual Appliances (VAs) that send/receive traffic
      from/to the subnets and hosts sitting behind them (SN1, SN2, SN3,
      IP4 and IP5). Their IP addresses (IP2 and IP3) belong to the BD-10
      subnet and they can also generate/receive traffic. When these VAs
      receive packets destined to their own MAC addresses (M2 and M3),
      they route the packets to the proper subnet or host. These VAs
      do not support routing protocols to advertise the subnets
      connected to them and can move to a different server and NVE when
      the Cloud Management System decides to do so. These VAs may also
      support redundancy mechanisms for some subnets, similar to VRRP,
      where a floating IP is owned by the master VA and only the master
      VA forwards traffic to a given subnet. E.g., vIP23 in Figure 1 is
      a floating IP that can be owned by TS2 or TS3, depending on which
      one is the master. Only the master will forward traffic to SN1.

   o  Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3 have
      their own IP addresses that belong to the BD-10 subnet too. These
      IRB interfaces connect the BD-10 subnet to Virtual Routing and
      Forwarding (IP-VRF) instances that can route the traffic to other
      subnets for the same tenant (within the DC or at the other end of
      the WAN).

   o  TS4 is a layer-2 VA that provides connectivity to subnets SN5, SN6
      and SN7, but does not itself have an IP address in BD-10. TS4 is
      connected to a physical port on NVE5 assigned to Ethernet Segment
      Identifier 4 (ESI4).

   All the above DC use cases require inter-subnet forwarding; therefore,
   the individual host routes and subnets:

   a) MUST be advertised from the NVEs (since VAs and VMs do not
      participate in dynamic routing protocols), and
   b) MAY be associated with an Overlay Index that can be a VA IP
      address, a floating IP address, a MAC address or an ESI. An
      Overlay Index is a next-hop that requires a recursive resolution;
      it is described in Section 3.2.

2.2 The Requirement for a New EVPN Route Type

   [RFC7432] defines a MAC/IP route (also referred to as RT-2) where a
   MAC address can be advertised together with an IP address length
   (IPL) and IP address (IP). While a variable IPL might have been used
   to indicate the presence of an IP prefix in a route type 2, there
   are several specific use cases in which using this route type to
   deliver IP Prefixes is not suitable.

   One example of such use cases is the "floating IP" example described
   in Section 2.1.
   In this example, we need to decouple the advertisement of the
   prefixes from the advertisement of the MAC address of either M2 or
   M3; otherwise, the solution becomes highly inefficient and does not
   scale.

   E.g., if we are advertising 1k prefixes from M2 (using RT-2) and the
   floating IP owner changes from M2 to M3, we would need to withdraw 1k
   routes from M2 and re-advertise 1k routes from M3. However, if we use
   a separate route type, we can advertise the 1k routes associated with
   the floating IP address (vIP23) and only one RT-2 for advertising the
   ownership of the floating IP, i.e., vIP23 and M2 in the route type 2.
   When the floating IP owner changes from M2 to M3, a single RT-2
   withdraw/update is required to indicate the change. The remote DGW
   will not change any of the 1k prefixes associated with vIP23, but
   will only update the ARP resolution entry for vIP23 (now pointing at
   M3).

   Other reasons to decouple the IP Prefix advertisement from the MAC/IP
   route are listed below:

   o  Clean identification, operation and troubleshooting of IP
      Prefixes, independent of and not subject to the interpretation of
      the IPL and the IP value. E.g., a default IP route 0.0.0.0/0 must
      always be easily and clearly distinguished from the absence of IP
      information.

   o  In MAC/IP routes, the MAC information is part of the NLRI, so if
      IP Prefixes were to be advertised using MAC/IP routes, the MAC
      information would always be present and part of the route key.

   The following sections describe how EVPN is extended with a new route
   type for the advertisement of IP prefixes and how this route is used
   to address the current and future inter-subnet connectivity
   requirements existing in the Data Center.

3. The BGP EVPN IP Prefix Route

   The current BGP EVPN NLRI as defined in [RFC7432] is shown below:

        +-----------------------------------+
        | Route Type (1 octet)              |
        +-----------------------------------+
        | Length (1 octet)                  |
        +-----------------------------------+
        | Route Type specific (variable)    |
        +-----------------------------------+

   Where the route type field can contain one of the following specific
   values (refer to the IANA "EVPN Route Types" registry):

   + 1 - Ethernet Auto-Discovery (A-D) route

   + 2 - MAC/IP advertisement route

   + 3 - Inclusive Multicast Route

   + 4 - Ethernet Segment Route

   This document defines an additional route type that IANA has added to
   the registry and that will be used for the advertisement of IP
   Prefixes:

   + 5 - IP Prefix Route

   The support for this new route type is OPTIONAL.

   Since this new route type is OPTIONAL, an implementation not
   supporting it MUST ignore the route, based on the unknown route type
   value, as specified by Section 5.4 in [RFC7606].

   The detailed encoding of this route and associated procedures are
   described in the following sections.
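   The Length octet is what makes ignoring an unsupported route type
   practical: a receiver can step over an NLRI whose type it does not
   know and stay aligned with the NLRIs that follow. The Python below is
   an illustrative, non-normative sketch; the function name
   iter_evpn_nlri, the ALL_ROUTE_TYPES set and the choice to raise
   ValueError on truncated input are assumptions of this example
   (handling of malformed NLRIs is governed by [RFC7606], not by this
   sketch).

```python
from typing import Iterable, Iterator, Tuple

# Route types listed in this section; 5 is the IP Prefix route (RT-5).
ALL_ROUTE_TYPES = frozenset({1, 2, 3, 4, 5})


def iter_evpn_nlri(data: bytes,
                   supported: Iterable[int] = ALL_ROUTE_TYPES
                   ) -> Iterator[Tuple[int, bytes]]:
    """Yield (route_type, route_type_specific) for each EVPN NLRI in data.

    Each NLRI is Route Type (1 octet), Length (1 octet), then Length
    octets of route-type-specific data. NLRIs whose type is not in
    `supported` are skipped using the Length octet, so the parser stays
    aligned with the following NLRIs.
    """
    supported = frozenset(supported)
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            # Assumption: this sketch simply raises on malformed input.
            raise ValueError("truncated EVPN NLRI header")
        route_type, length = data[offset], data[offset + 1]
        end = offset + 2 + length
        if end > len(data):
            raise ValueError("EVPN NLRI length exceeds the available data")
        if route_type in supported:
            yield route_type, data[offset + 2:end]
        offset = end  # skip over the whole NLRI, supported or not


if __name__ == "__main__":
    # Hypothetical input: a type-5 NLRI with a 3-octet dummy body (a real
    # RT-5 is longer), followed by a type-2 NLRI with a 1-octet body.
    nlri = bytes([5, 3, 0xAA, 0xBB, 0xCC]) + bytes([2, 1, 0x01])
    print(list(iter_evpn_nlri(nlri)))  # [(5, b'\xaa\xbb\xcc'), (2, b'\x01')]
    # A receiver without RT-5 support still parses the RT-2 correctly.
    print(list(iter_evpn_nlri(nlri, supported={1, 2, 3, 4})))  # [(2, b'\x01')]
```

   The second call models an implementation that does not support
   route type 5: the RT-5 is dropped, and the RT-2 behind it is still
   decoded.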
3.1 IP Prefix Route Encoding

   An IP Prefix advertisement route NLRI consists of the following
   fields:

        +---------------------------------------+
        |             RD (8 octets)             |
        +---------------------------------------+
        |Ethernet Segment Identifier (10 octets)|
        +---------------------------------------+
        |      Ethernet Tag ID (4 octets)       |
        +---------------------------------------+
        |      IP Prefix Length (1 octet)       |
        +---------------------------------------+
        |      IP Prefix (4 or 16 octets)       |
        +---------------------------------------+
        |    GW IP Address (4 or 16 octets)     |
        +---------------------------------------+
        |         MPLS Label (3 octets)         |
        +---------------------------------------+

   Where:

   o  The RD, Ethernet Tag ID and MPLS Label fields will be used as
      defined in [RFC7432] and [EVPN-OVERLAY].

   o  The Ethernet Segment Identifier will be a non-zero 10-octet
      identifier if the ESI is used as an Overlay Index (see the
      definition of Overlay Index in Section 3.2). It will be zero
      otherwise.

   o  The IP Prefix Length can be set to a value between 0 and 32 (bits)
      for IPv4 and between 0 and 128 for IPv6, and specifies the number
      of bits in the Prefix.

   o  The IP Prefix will be a 32 or 128-bit field (IPv4 or IPv6). The
      size of this field does not depend on the value of the IP Prefix
      Length field.

   o  The GW IP (Gateway IP Address) will be a 32 or 128-bit field (IPv4
      or IPv6), and will encode an IP address as an Overlay Index for
      the IP Prefixes. The GW IP field SHOULD be zero if it is not used
      as an Overlay Index. Refer to Section 3.2 for the definition and
      use of the Overlay Index.

   o  The MPLS Label field is encoded as 3 octets, where the high-order
      20 bits contain the label value. When sending, the label value
      SHOULD be zero if recursive resolution based on an Overlay Index
      is used. If the received MPLS Label value is zero, the route MUST
      contain an Overlay Index and the ingress NVE/PE MUST do recursive
      resolution to find the egress NVE/PE. If the received Label value
      is non-zero, the route will not be used for recursive resolution
      unless a local policy says so.

   o  The total route length will indicate the type of prefix (IPv4 or
      IPv6) and the type of GW IP address (IPv4 or IPv6). Note that the
      IP Prefix + the GW IP should have a length of either 64 or 256
      bits (i.e., a route Length of 34 or 58 octets, respectively), but
      never 160 bits (IPv4 and IPv6 mixed values are not allowed).

   The RD, Ethernet Tag ID, IP Prefix Length and IP Prefix will be part
   of the route key used by BGP to compare routes. The rest of the
   fields will not be part of the route key.

   An IP Prefix Route MAY be sent along with a Router's MAC Extended
   Community (defined in [EVPN-INTERSUBNET]).

3.2 Overlay Indexes and Recursive Lookup Resolution

   RT-5 routes support recursive lookup resolution through the use of
   Overlay Indexes as follows:

   o  An Overlay Index can be an ESI, an IP address in the address space
      of the tenant, or a MAC address, and it is used by an NVE as the
      next-hop for a given IP Prefix. An Overlay Index always needs a
      recursive route resolution on the NVE/PE that installs the RT-5
      into one of its IP-VRFs, so that the NVE knows to which egress
      NVE/PE it needs to forward the packets. It is important to note
      that recursive resolution of the Overlay Index applies upon
      installation into an IP-VRF, and not upon BGP propagation (for
      instance, on an ASBR). Also, as a result of the recursive
      resolution, the egress NVE/PE is not necessarily the same NVE that
      originated the RT-5.

   o  The Overlay Index is indicated along with the RT-5 in the ESI
      field, GW IP field or Router's MAC Extended Community, depending
      on whether the IP Prefix next-hop is an ESI, IP address or MAC
      address in the tenant space. The Overlay Index for a given IP
      Prefix is set by local policy at the NVE that originates an RT-5
      for that IP Prefix (typically managed by the Cloud Management
      System).

   o  In order to enable the recursive lookup resolution at the ingress
      NVE, an NVE that is a possible egress NVE for a given Overlay
      Index must originate a route advertising itself as the BGP next
      hop on the path to the system denoted by the Overlay Index. For
      instance:

      .  If an NVE receives an RT-5 that specifies an Overlay Index, the
         NVE cannot use the RT-5 in its IP-VRF unless (or until) it can
         recursively resolve the Overlay Index.
      .  If the RT-5 specifies an ESI as the Overlay Index, recursive
         resolution can only be done if the NVE has received and
         installed an RT-1 (Auto-Discovery per-EVI) route specifying
         that ESI.
      .  If the RT-5 specifies a GW IP address as the Overlay Index,
         recursive resolution can only be done if the NVE has received
         and installed an RT-2 (MAC/IP route) specifying that IP address
         in the IP address field of its NLRI.
      .  If the RT-5 specifies a MAC address as the Overlay Index,
         recursive resolution can only be done if the NVE has received
         and installed an RT-2 (MAC/IP route) specifying that MAC
         address in the MAC address field of its NLRI.

      Note that the RT-1 or RT-2 routes needed for the recursive
      resolution may arrive before or after the given RT-5 route.

   o  Irrespective of the recursive resolution, if there is no IGP or
      BGP route to the BGP next-hop of an RT-5, BGP should fail to
      install the RT-5 even if the Overlay Index can be resolved.

   o  The ESI and GW IP fields MAY both be zero; however, they MUST NOT
      both be non-zero at the same time. A route containing a non-zero
      GW IP and a non-zero ESI (at the same time) will be treated as
      withdraw.
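   To make the receive-side rules of Sections 3.1 and 3.2 concrete, the
   Python below decodes the route-type-specific part of an RT-5 and
   derives the Overlay Index following the field combinations listed in
   Table 1. It is a minimal, non-normative sketch. The names
   (decode_rt5, overlay_index, TreatAsWithdraw), rejecting combinations
   that Table 1 does not list, treating a present-but-invalid Router's
   MAC like an absent one, and the local-policy flag for the labelled
   MAC case are all assumptions of this example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Union

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# Route-type-specific length -> size of the IP Prefix and GW IP fields:
# RD(8) + ESI(10) + Ethernet Tag ID(4) + IP Prefix Length(1)
#   + IP Prefix(4|16) + GW IP(4|16) + MPLS Label(3)
ADDR_SIZE_BY_LENGTH = {34: 4, 58: 16}


class TreatAsWithdraw(ValueError):
    """RT-5 carrying both a non-zero ESI and a non-zero GW IP."""


@dataclass(frozen=True)
class IpPrefixRoute:
    rd: bytes
    esi: bytes
    eth_tag: int
    prefix_len: int
    prefix: IPAddr
    gw_ip: IPAddr
    label: int  # high-order 20 bits of the 3-octet MPLS Label field

    def route_key(self):
        # Only these four fields are part of the BGP route key.
        return (self.rd, self.eth_tag, self.prefix_len, self.prefix)


def decode_rt5(body: bytes) -> IpPrefixRoute:
    size = ADDR_SIZE_BY_LENGTH.get(len(body))
    if size is None:  # e.g. 46 octets would mean mixed IPv4/IPv6 fields
        raise ValueError(f"invalid RT-5 length: {len(body)}")
    addr = ipaddress.IPv4Address if size == 4 else ipaddress.IPv6Address
    prefix_len = body[22]
    if prefix_len > size * 8:
        raise ValueError(f"IP Prefix Length {prefix_len} out of range")
    p = 23  # offset of the IP Prefix field
    return IpPrefixRoute(
        rd=body[0:8],
        esi=body[8:18],
        eth_tag=int.from_bytes(body[18:22], "big"),
        prefix_len=prefix_len,
        prefix=addr(body[p:p + size]),
        gw_ip=addr(body[p + size:p + 2 * size]),
        label=int.from_bytes(body[p + 2 * size:p + 2 * size + 3], "big") >> 4,
    )


def _mac_present(router_mac: Optional[bytes]) -> bool:
    # Assumption: absent, all-zero, broadcast or multicast MAC counts as "Zero".
    return (router_mac is not None and len(router_mac) == 6
            and any(router_mac) and not router_mac[0] & 0x01)


def overlay_index(route: IpPrefixRoute, router_mac: Optional[bytes] = None,
                  use_mac_if_labelled: bool = True) -> Optional[str]:
    """Return "ESI", "GW-IP", "MAC", or None (use the BGP next-hop)."""
    esi_set = any(route.esi)
    gw_set = int(route.gw_ip) != 0
    mac_set = _mac_present(router_mac)
    if esi_set and gw_set:
        raise TreatAsWithdraw("non-zero ESI and non-zero GW IP")
    if esi_set:
        return "ESI"                                   # Table 1, rows 1-2
    if gw_set and not mac_set:
        return "GW-IP"                                 # row 3
    if not gw_set and mac_set:
        if route.label == 0:
            return "MAC"                               # row 4
        return "MAC" if use_mac_if_labelled else None  # row 5: local policy
    if not gw_set and not mac_set and route.label != 0:
        return None                                    # row 6: IP NVO
    # Assumption: combinations not listed in Table 1 are rejected.
    raise ValueError("RT-5 field combination not listed in Table 1")
```

   The bytes passed to decode_rt5 are the route-type-specific part of
   the NLRI, i.e., what follows the Route Type and Length octets. Using
   the Length to select IPv4 or IPv6 means a 46-octet (mixed family)
   route is rejected before any field is interpreted.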
   The indirection provided by the Overlay Index and its recursive
   lookup resolution is required to achieve fast convergence in case of
   a failure of the object represented by the Overlay Index (see the
   example described in Section 2.2).

   Table 1 shows the different RT-5 field combinations allowed by this
   specification and what Overlay Index must be used by the receiving
   NVE/PE in each case. When the Overlay Index is "None" in Table 1, the
   receiving NVE/PE will not perform any recursive resolution, and the
   actual next-hop is given by the RT-5's BGP next-hop.

   +----------+----------+----------+------------+-----------------+
   |   ESI    |  GW-IP   |   MAC*   |   Label    |  Overlay Index  |
   +----------+----------+----------+------------+-----------------+
   | Non-Zero |   Zero   |   Zero   | Don't Care |       ESI       |
   | Non-Zero |   Zero   | Non-Zero | Don't Care |       ESI       |
   |   Zero   | Non-Zero |   Zero   | Don't Care |      GW-IP      |
   |   Zero   |   Zero   | Non-Zero |    Zero    |       MAC       |
   |   Zero   |   Zero   | Non-Zero |  Non-Zero  |  MAC or None**  |
   |   Zero   |   Zero   |   Zero   |  Non-Zero  | None (IP NVO)***|
   +----------+----------+----------+------------+-----------------+

            Table 1 - RT-5 fields and Indicated Overlay Index

   Table NOTES:

   *   MAC with Zero value means no Router's MAC Extended Community is
       present along with the RT-5. Non-Zero indicates that the extended
       community is present and carries a valid MAC address. Examples of
       invalid MAC addresses are broadcast or multicast MAC addresses.

   **  In this case, the Overlay Index may be the RT-5's MAC address or
       None, depending on the local policy of the receiving NVE/PE.

   *** The Overlay Index is None. This is a special case used for
       IP-VRF-to-IP-VRF where the NVE/PEs are connected by IP NVO
       tunnels as opposed to Ethernet NVO tunnels.

   Table 2 shows the different inter-subnet use-cases described in this
   document and the corresponding coding of the Overlay Index in the
   route type 5 (RT-5).
   +---------+---------------------+----------------------------+
   | Section | Use-case            | Overlay Index in the RT-5  |
   +---------+---------------------+----------------------------+
   | 4.1     | TS IP address       | GW IP                      |
   | 4.2     | Floating IP address | GW IP                      |
   | 4.3     | "Bump in the wire"  | ESI or MAC                 |
   | 4.4     | IP-VRF-to-IP-VRF    | GW IP, MAC or None         |
   +---------+---------------------+----------------------------+

     Table 2 - Use-cases and Overlay Indexes for Recursive Resolution

   The above use-cases are representative of the different Overlay
   Indexes supported by RT-5 (GW IP, ESI, MAC or None). Any other
   use-case using a given Overlay Index SHOULD follow the procedures
   described in this document for the same Overlay Index.

4. Overlay Index Use-Cases

   This section describes some use-cases for the Overlay Index types
   used with the IP Prefix route.

4.1 TS IP Address Overlay Index Use-Case

   The following figure illustrates an example of inter-subnet
   forwarding for subnets sitting behind Virtual Appliances (on TS2 and
   TS3).

   SN1---+            NVE2                          DGW1
         |        +-----------+ +---------+    +-------------+
   SN2---TS2(VA)--| (BD-10)   |-|         |----| (BD-10)     |
         | IP2/M2 +-----------+ |         |    |   IRB1\     |
   IP4---+                      |         |    |     (IP-VRF)|---+
                                |         |    +-------------+  _|_
                                |  VXLAN/ |                    (   )
                                |  nvGRE  |         DGW2      ( WAN )
   SN1---+            NVE3      |         |    +-------------+ (___)
         | IP3/M3 +-----------+ |         |----| (BD-10)     |   |
   SN3---TS3(VA)--| (BD-10)   |-|         |    |   IRB2\     |   |
         |        +-----------+ +---------+    |     (IP-VRF)|---+
   IP5---+                                     +-------------+

                    Figure 2 TS IP address use-case

   An example of inter-subnet forwarding between subnet SN1/24 and a
   subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
   DGW2 are running BGP EVPN. TS2 and TS3 do not participate in dynamic
   routing protocols, and they only have a static route to forward the
   traffic to the WAN. We assume SN1/24 is dual-homed to NVE2 and NVE3.
   In this case, a GW IP is used as an Overlay Index. Although a
   different Overlay Index type could have been used, this use-case
   assumes that the operator knows the VA's IP addresses beforehand,
   whereas the VA's MAC address is unknown and the VA's ESI is zero.
   Because of this, the GW IP is the suitable Overlay Index to be used
   with the RT-5s. The NVEs know the GW IP to be used for a given Prefix
   by policy.

   (1) NVE2 advertises the following BGP routes on behalf of TS2:

       o  Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
          IP=IP2 and [RFC5512] BGP Encapsulation Extended Community with
          the corresponding Tunnel-type. The MAC and IP addresses may be
          learned via ARP-snooping (ND-snooping if IPv6).

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          ESI=0, GW IP address=IP2. The prefix and GW IP are learned by
          policy.

   (2) Similarly, NVE3 advertises the following BGP routes on behalf of
       TS3:

       o  Route type 2 (MAC/IP route) containing: ML=48, M=M3, IPL=32,
          IP=IP3 (and BGP Encapsulation Extended Community).

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          ESI=0, GW IP address=IP3.

   (3) DGW1 and DGW2 import both received routes based on the
       route-targets:

       o  Based on the BD-10 route-target in DGW1 and DGW2, the MAC/IP
          route is imported and M2 is added to the BD-10 along with its
          corresponding tunnel information. For instance, if VXLAN is
          used, the VTEP will be derived from the MAC/IP route BGP
          next-hop and the VNI from the MPLS Label1 field. IP2 - M2 is
          added to the ARP table. Similarly, M3 is added to BD-10 and
          IP3 - M3 to the ARP table.

       o  Based on the BD-10 route-target in DGW1 and DGW2, the IP
          Prefix route is also imported and SN1/24 is added to the
          IP-VRF with Overlay Index IP2 pointing at the local BD-10. In
          this example, we assume the RT-5 from NVE2 is preferred over
          the RT-5 from NVE3. If both routes were equally preferable and
          ECMP were enabled, SN1/24 would also be added to the routing
          table with Overlay Index IP3.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o  A destination IP lookup is performed on the DGW1 IP-VRF
          routing table and Overlay Index=IP2 is found. Since IP2 is an
          Overlay Index, a recursive route resolution is required for
          IP2.

       o  IP2 is resolved to M2 in the ARP table, and M2 is resolved to
          the tunnel information given by the BD FIB (e.g., remote VTEP
          and VNI for the VXLAN case).

       o  The IP packet destined to IPx is encapsulated with:

          .  Source inner MAC = IRB1 MAC.

          .  Destination inner MAC = M2.

          .  Tunnel information provided by the BD (VNI, VTEP IPs and
             MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

       o  Based on the tunnel information (VNI for the VXLAN case), the
          BD-10 context is identified for a MAC lookup.

       o  Encapsulation is stripped off and, based on a MAC lookup
          (assuming MAC forwarding on the egress NVE), the packet is
          forwarded to TS2, where it will be properly routed.

   (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
       be applied to the MAC route IP2/M2, as defined in [RFC7432].
       Route type 5 prefixes are not subject to MAC Mobility
       procedures, hence no changes in the DGW IP-VRF routing table
       will occur for TS2 mobility, i.e., all the prefixes will still be
       pointing at IP2 as Overlay Index. There is an indirection for,
       e.g., SN1/24, which still points at Overlay Index IP2 in the
       routing table, but IP2 will simply be resolved to a different
       tunnel, based on the outcome of the MAC Mobility procedures for
       the MAC/IP route IP2/M2.

   Note that in the opposite direction, TS2 will send traffic based on
   its static-route next-hop information (IRB1 and/or IRB2), and regular
   EVPN procedures will be applied.
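   The recursive resolution in steps (3) to (6) amounts to three table
   lookups on DGW1: IP-VRF (prefix to Overlay Index), ARP (Overlay Index
   to MAC) and BD FIB (MAC to tunnel). The Python below is a
   hypothetical model for illustration, not an implementation: the
   DgwTables class, the string placeholders for MACs and VTEPs, and the
   documentation addresses in the usage example (192.0.2.0/24 standing
   in for SN1/24, 10.0.0.2 for IP2) are assumptions. Following Section
   3.2, a prefix whose Overlay Index cannot currently be resolved is
   skipped, so a less specific resolvable prefix may match instead.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Tunnel = Tuple[str, int]  # (remote VTEP, VNI); placeholder representation


@dataclass
class DgwTables:
    """Hypothetical model of the tenant tables on DGW1 in Section 4.1."""
    # IP-VRF: prefix -> GW IP Overlay Index (learned from RT-5s)
    ip_vrf: Dict[ipaddress.IPv4Network, ipaddress.IPv4Address] = field(default_factory=dict)
    # ARP table: IP -> MAC (learned from RT-2s)
    arp: Dict[ipaddress.IPv4Address, str] = field(default_factory=dict)
    # BD-10 FIB: MAC -> tunnel information (learned from RT-2s)
    bd_fib: Dict[str, Tunnel] = field(default_factory=dict)

    def resolve(self, overlay_ip: ipaddress.IPv4Address) -> Optional[Tuple[str, Tunnel]]:
        """Recursive step: Overlay Index -> MAC (ARP) -> tunnel (BD FIB)."""
        mac = self.arp.get(overlay_ip)
        if mac is None or mac not in self.bd_fib:
            return None
        return mac, self.bd_fib[mac]

    def lookup(self, dst: str) -> Optional[Tuple[str, Tunnel]]:
        """Longest-prefix match over prefixes whose Overlay Index resolves.

        Returns (destination inner MAC, tunnel), or None if nothing matches.
        """
        addr = ipaddress.ip_address(dst)
        best = None
        for prefix, overlay_ip in self.ip_vrf.items():
            if addr not in prefix:
                continue
            resolved = self.resolve(overlay_ip)
            if resolved is None:
                continue  # unresolved Overlay Index: the RT-5 is not usable
            if best is None or prefix.prefixlen > best[0].prefixlen:
                best = (prefix, resolved)
        return None if best is None else best[1]


if __name__ == "__main__":
    dgw1 = DgwTables()
    dgw1.ip_vrf[ipaddress.ip_network("192.0.2.0/24")] = ipaddress.ip_address("10.0.0.2")
    dgw1.arp[ipaddress.ip_address("10.0.0.2")] = "M2"
    dgw1.bd_fib["M2"] = ("VTEP-NVE2", 10)
    print(dgw1.lookup("192.0.2.7"))  # ('M2', ('VTEP-NVE2', 10))
    # Step (6): MAC Mobility moves M2 to NVE3; only the BD FIB entry changes.
    dgw1.bd_fib["M2"] = ("VTEP-NVE3", 10)
    print(dgw1.lookup("192.0.2.7"))  # ('M2', ('VTEP-NVE3', 10))
```

   Because the IP-VRF entry stores only the Overlay Index, the TS2 move
   in step (6) updates a single BD FIB entry and leaves every prefix
   that points at IP2 untouched.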

4.2 Floating IP Overlay Index Use-Case

   Sometimes Tenant Systems (TS) work in active/standby mode where an
   upstream floating IP, owned by the active TS, is used as the Overlay
   Index to get to some subnets behind it. This redundancy mode, already
   introduced in sections 2.1 and 2.2, is illustrated in Figure 3.

                      NVE2                          DGW1
                  +-----------+ +---------+    +-------------+
     +---TS2(VA)--|  (BD-10)  |-|         |----|  (BD-10)    |
     |  IP2/M2    +-----------+ |         |    |   IRB1\     |
     |    <-+                   |         |    |     (IP-VRF)|---+
     |      |                   |         |    +-------------+  _|_
    SN1   vIP23 (floating)      | VXLAN/  |                    (   )
     |      |                   | nvGRE   |       DGW2        ( WAN )
     |    <-+         NVE3      |         |    +-------------+ (___)
     |  IP3/M3    +-----------+ |         |----|  (BD-10)    |   |
     +---TS3(VA)--|  (BD-10)  |-|         |    |   IRB2\     |   |
                  +-----------+ +---------+    |     (IP-VRF)|---+
                                               +-------------+

              Figure 3 Floating IP Overlay Index for redundant TS

   In this use-case, a GW IP is used as an Overlay Index for the same
   reasons as in section 4.1. However, this GW IP is a floating IP that
   belongs to the active TS. Assuming TS2 is the active TS and owns
   IP23:

   (1) NVE2 advertises the following BGP routes for TS2:

       o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
         IP=IP23 (and BGP Encapsulation Extended Community). The MAC
         and IP addresses may be learned via ARP-snooping.

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP23. The prefix and GW IP are learned by
         policy.

   (2) NVE3 advertises the following BGP route for TS3 (it does not
       advertise an RT-2 for IP23/M3):

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP23. The prefix and GW IP are learned by
         policy.

   (3) DGW1 and DGW2 import both received routes based on the
       route-target:

       o M2 is added to the BD-10 FIB along with its corresponding
         tunnel information.
         For the VXLAN use case, the VTEP will be derived from the
         MAC/IP route BGP next-hop and the VNI from the VNI/VSID
         field. IP23-M2 is added to the ARP table.

       o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay
         Index IP23 pointing at M2 in the local BD-10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and Overlay Index=IP23 is found. Since IP23 is
         an Overlay Index, a recursive route resolution for IP23 is
         required.

       o IP23 is resolved to M2 in the ARP table, and M2 is resolved to
         the tunnel information given by the BD (remote VTEP and VNI
         for the VXLAN case).

       o The IP packet destined to IPx is encapsulated with:

          . Source inner MAC = IRB1 MAC.

          . Destination inner MAC = M2.

          . Tunnel information provided by the BD FIB (VNI, VTEP IPs
            and MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel information (VNI for the VXLAN case), the
         BD-10 context is identified for a MAC lookup.

       o Encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE), the packet is
         forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3 appoints
       TS3 as the new active TS for SN1, TS3 will now own the floating
       IP23 and will signal this new ownership (GARP message or
       similar). Upon receiving the new owner's notification, NVE3 will
       issue a route type 2 for M3-IP23 and NVE2 will withdraw the RT-2
       for M2-IP23. DGW1 and DGW2 will update their ARP tables with the
       new MAC resolving the floating IP. No changes are made in the
       IP-VRF routing table.

4.3 Bump-in-the-Wire Use-Case

   Figure 5 illustrates an example of inter-subnet forwarding for an IP
   Prefix route that carries a subnet SN1.
   In this use-case, TS2 and TS3 are layer-2 VA devices without any IP
   address that can be included as an Overlay Index in the GW IP field
   of the IP Prefix route. Their MAC addresses are M2 and M3,
   respectively, and they are connected to BD-10. Note that IRB1 and
   IRB2 (in DGW1 and DGW2, respectively) have IP addresses in a subnet
   different from SN1.

                      NVE2                          DGW1
           M2     +-----------+ +---------+    +-------------+
     +---TS2(VA)--|  (BD-10)  |-|         |----|  (BD-10)    |
     |      ESI23 +-----------+ |         |    |   IRB1\     |
     |        +                 |         |    |     (IP-VRF)|---+
     |        |                 |         |    +-------------+  _|_
    SN1       |                 | VXLAN/  |                    (   )
     |        |                 | nvGRE   |       DGW2        ( WAN )
     |        +       NVE3      |         |    +-------------+ (___)
     |      ESI23 +-----------+ |         |----|  (BD-10)    |   |
     +---TS3(VA)--|  (BD-10)  |-|         |    |   IRB2\     |   |
           M3     +-----------+ +---------+    |     (IP-VRF)|---+
                                               +-------------+

                   Figure 5 Bump-in-the-wire use-case

   Since neither TS2 nor TS3 can participate in any dynamic routing
   protocol, and neither has an IP address assigned, there are two
   potential Overlay Index types that can be used when advertising SN1:

   a) an ESI, i.e., ESI23, that can be provisioned on the attachment
      ports of NVE2 and NVE3, as shown in Figure 5; or
   b) the VA's MAC address, which can be added to NVE2 and NVE3 by
      policy.

   The advantage of using an ESI as Overlay Index, as opposed to the
   VA's MAC address, is that the forwarding to the egress NVE can be
   done purely based on the state of the AC in the ES (notified by the
   Ethernet A-D per-EVI route), and all the EVPN multi-homing redundancy
   mechanisms can be re-used. For instance, the [RFC7432]
   mass-withdrawal mechanism for fast failure detection and propagation
   can be used. This section assumes that an ESI Overlay Index is used
   in this use-case, but it does not prevent the use of the VA's MAC
   address as an Overlay Index. If a MAC is used as Overlay Index, the
   control plane must follow the procedures described in section 4.4.3.
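
   The benefit of the ESI Overlay Index can be illustrated with a small
   Python sketch. It assumes a hypothetical DGW table (EsiOverlayTable)
   for a single EVI (BD-10), in which RT-5s point at an ESI, Ethernet
   A-D per-EVI routes supply the NVO tunnel per advertising NVE, and the
   withdrawal of an Ethernet A-D per-ES route stands in for the
   [RFC7432] mass-withdrawal. One withdrawal removes the failed NVE for
   every prefix sharing that ESI, with no change to the IP-VRF entries.
   Prefixes, NVE names and tunnel values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

TunnelInfo = Tuple[str, int]  # (VTEP IP, VNI) from an A-D per-EVI route


@dataclass
class EsiOverlayTable:
    """Hypothetical DGW view of ESI Overlay Index resolution."""

    # IP-VRF: prefix -> ESI Overlay Index (from RT-5s)
    ip_vrf: Dict[str, str] = field(default_factory=dict)
    # ESI -> {advertising NVE: tunnel} (from A-D per-EVI routes)
    ad_per_evi: Dict[str, Dict[str, TunnelInfo]] = field(
        default_factory=dict)
    # (ESI, NVE) pairs whose A-D per-ES route has been withdrawn
    es_down: Set[Tuple[str, str]] = field(default_factory=set)

    def add_ad_per_evi(self, esi: str, nve: str, vtep: str,
                       vni: int) -> None:
        self.ad_per_evi.setdefault(esi, {})[nve] = (vtep, vni)

    def withdraw_ad_per_es(self, esi: str, nve: str) -> None:
        # Mass withdrawal: a single withdrawal invalidates nve for every
        # prefix whose Overlay Index is esi; ip_vrf is not modified.
        self.es_down.add((esi, nve))

    def resolve(self, prefix: str) -> List[TunnelInfo]:
        esi = self.ip_vrf[prefix]
        paths = self.ad_per_evi.get(esi, {})
        return sorted(t for nve, t in paths.items()
                      if (esi, nve) not in self.es_down)


table = EsiOverlayTable()
table.add_ad_per_evi("ESI23", "NVE2", "198.51.100.2", 10)
for prefix in ("203.0.113.0/26", "203.0.113.64/26", "203.0.113.128/26"):
    table.ip_vrf[prefix] = "ESI23"

print([table.resolve(p) for p in table.ip_vrf])  # all via NVE2
snapshot = dict(table.ip_vrf)
table.withdraw_ad_per_es("ESI23", "NVE2")
print([table.resolve(p) for p in table.ip_vrf])  # all unresolved now
print(table.ip_vrf == snapshot)                   # IP-VRF untouched
```

   With a MAC Overlay Index instead, each prefix would have to be
   re-resolved through individual MAC/IP route updates, which is the
   trade-off the paragraph above describes.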

   The model supports VA redundancy in a similar way to the one
   described in section 4.2 for the floating IP Overlay Index use-case,
   except that it uses the EVPN Ethernet A-D per-EVI route instead of
   the MAC advertisement route to advertise the location of the Overlay
   Index. The procedure is explained below:

   (1) Assuming TS2 is the active TS in ESI23, NVE2 advertises the
       following BGP routes:

       o Route type 1 (Ethernet A-D route for BD-10) containing:
         ESI=ESI23 and the corresponding tunnel information (VNI/VSID
         field), as well as the BGP Encapsulation Extended Community as
         per [EVPN-OVERLAY].

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0. The Router's MAC Extended
         Community defined in [EVPN-INTERSUBNET] is added and carries
         the MAC address (M2) associated with the TS behind which SN1
         sits. M2 may be learned by policy.

   (2) NVE3 advertises the following BGP route for TS3 (no Ethernet A-D
       per-EVI route is advertised):

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0. The Router's MAC Extended Community
         is added and carries the MAC address (M3) associated with the
         TS behind which SN1 sits. M3 may be learned by policy.

   (3) DGW1 and DGW2 import the received routes based on the
       route-target:

       o The tunnel information to get to ESI23 is installed in DGW1
         and DGW2. For the VXLAN use case, the VTEP will be derived
         from the Ethernet A-D route BGP next-hop and the VNI from the
         VNI/VSID field (see [EVPN-OVERLAY]).

       o The RT-5 coming from the NVE that advertised the RT-1 is
         selected and SN1/24 is added to the IP-VRF in DGW1 and DGW2
         with Overlay Index ESI23 and MAC = M2.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and Overlay Index=ESI23 is found.
         Since ESI23 is an Overlay Index, a recursive route resolution
         is required to find the egress NVE where ESI23 resides.

       o The IP packet destined to IPx is encapsulated with:

          . Source inner MAC = IRB1 MAC.

          . Destination inner MAC = M2 (this MAC will be obtained from
            the Router's MAC Extended Community received along with the
            RT-5 for SN1). Note that the Router's MAC Extended
            Community is used in this case to carry the TS's MAC
            address, as opposed to the NVE/PE's MAC address.

          . Tunnel information for the NVO tunnel is provided by the
            Ethernet A-D per-EVI route for ESI23 (VNI and VTEP IP for
            the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel demultiplexer information (VNI for the
         VXLAN case), the BD-10 context is identified for a MAC lookup
         (assuming a MAC disposition model), or the VNI MAY directly
         identify the egress interface (for a label or VNI disposition
         model).

       o Encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE) or a VNI lookup
         (in case of VNI forwarding), the packet is forwarded to TS2,
         where it will be forwarded to SN1.

   (6) If the redundancy protocol running between TS2 and TS3 follows an
       active/standby model and a failure appoints TS3 as the new active
       TS for SN1, TS3 will now own the connectivity to SN1 and will
       signal this new ownership. Upon receiving the new owner's
       notification, NVE3's AC will become active and NVE3 will issue a
       route type 1 for ESI23, whereas NVE2 will withdraw its Ethernet
       A-D route for ESI23. DGW1 and DGW2 will update their tunnel
       information to resolve ESI23. The destination inner MAC will be
       changed to M3.
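
   The ingress DGW behavior in steps (3) to (6) can be summarized as:
   select the RT-5 whose originator currently advertises the Ethernet
   A-D per-EVI route for the ESI, take the destination inner MAC from
   that RT-5's Router's MAC Extended Community, and take the tunnel from
   the A-D route. The Python sketch below is a simplification under
   those assumptions; the names (PrefixRoute, Encap, build_encap) and
   the symbolic values mirroring Figure 5 are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class PrefixRoute:      # RT-5 as received by the DGW
    prefix: str
    esi: str            # ESI Overlay Index
    originator: str     # advertising NVE
    router_mac: str     # Router's MAC Extended Community (TS MAC here)


@dataclass(frozen=True)
class Encap:
    src_inner_mac: str
    dst_inner_mac: str
    vtep: str
    vni: int


def build_encap(prefix: str,
                rt5s: Iterable[PrefixRoute],
                ad_per_evi: Dict[Tuple[str, str], Tuple[str, int]],
                irb_mac: str) -> Optional[Encap]:
    """Hypothetical helper: ad_per_evi maps (ESI, NVE) -> (VTEP, VNI)."""
    for rt5 in rt5s:
        if rt5.prefix != prefix:
            continue
        tunnel = ad_per_evi.get((rt5.esi, rt5.originator))
        if tunnel is None:
            continue  # this NVE has no active AC in the ES
        vtep, vni = tunnel
        return Encap(irb_mac, rt5.router_mac, vtep, vni)
    return None


rt5s = [PrefixRoute("SN1/24", "ESI23", "NVE2", "M2"),
        PrefixRoute("SN1/24", "ESI23", "NVE3", "M3")]

# TS2 active: only NVE2 advertises the A-D per-EVI route for ESI23.
print(build_encap("SN1/24", rt5s,
                  {("ESI23", "NVE2"): ("VTEP2", 10)}, "IRB1-MAC"))

# After failover: NVE3 advertises it and NVE2 has withdrawn its route.
print(build_encap("SN1/24", rt5s,
                  {("ESI23", "NVE3"): ("VTEP3", 10)}, "IRB1-MAC"))
```

   Both RT-5s stay installed; only the A-D route state decides which
   one supplies the inner destination MAC, so the failover changes M2
   to M3 without any new RT-5.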

4.4 IP-VRF-to-IP-VRF Model

   This use-case is similar to the scenario described in "IRB
   forwarding on NVEs for Tenant Systems" in [EVPN-INTERSUBNET];
   however, the new requirement here is the advertisement of IP
   Prefixes as opposed to only host routes.

   In the examples described in sections 4.1, 4.2 and 4.3, the BD
   instance can connect IRB interfaces and any other Tenant Systems
   connected to it. EVPN provides connectivity for:

   1. Traffic destined to the IRB or TS IP interfaces, as well as

   2. Traffic destined to IP subnets sitting behind the TS, e.g., SN1
      or SN2.

   In order to provide connectivity for (1), MAC/IP routes (RT-2) are
   needed so that IRB or TS MACs and IPs can be distributed.
   Connectivity type (2) is accomplished by the exchange of IP Prefix
   routes (RT-5) for IPs and subnets sitting behind certain Overlay
   Indexes, e.g., GW IP, ESI or TS MAC.

   In some cases, IP Prefix routes may be advertised for subnets and IPs
   sitting behind an IRB. We refer to this use-case as the
   "IP-VRF-to-IP-VRF" model.

   [EVPN-INTERSUBNET] defines an asymmetric IRB model and a symmetric
   IRB model, based on the required lookups at the ingress and egress
   NVE: the asymmetric model requires an IP lookup and a MAC lookup at
   the ingress NVE, whereas only a MAC lookup is needed at the egress
   NVE; the symmetric model requires IP and MAC lookups at both the
   ingress and the egress NVE. From that perspective, the
   IP-VRF-to-IP-VRF use-case described in this section is a symmetric
   IRB model.

   Note that, in an IP-VRF-to-IP-VRF scenario, out of the many subnets
   that a tenant may have, it may be the case that only a few are
   attached to a given NVE/PE's IP-VRF. In order to provide inter-subnet
   connectivity among the set of NVE/PEs where the tenant is connected,
   a new "Supplementary Broadcast Domain" (SBD) is created on all of
   them.
   This SBD is instantiated as a regular BD (with no ACs) in each
   NVE/PE and has IRB interfaces that connect the SBD to the IP-VRF. If
   no recursive resolution is needed, the SBD may not be needed, and the
   IP-VRFs may be connected directly by Ethernet or IP NVO tunnels.
   Depending on the existence and characteristics of the SBD and IRB
   interfaces for the IP-VRFs, there are three different
   IP-VRF-to-IP-VRF scenarios identified and described in this
   document:

   1) Interface-less model: no SBD and no Overlay Indexes are required.
   2) Interface-ful with SBD-facing IRB model: it requires the SBD, as
      well as GW IP addresses as Overlay Indexes.
   3) Interface-ful with unnumbered SBD-facing IRB model: it requires
      the SBD, as well as MAC addresses as Overlay Indexes.

   Inter-subnet IP multicast is outside the scope of this document.

4.4.1 Interface-less IP-VRF-to-IP-VRF Model

   Figure 6 will be used for the description of this model.

             NVE1(M1)
           +------------+
   IP1+----|  (BD-1)    |                 DGW1(M3)
           |     \      |    +---------+ +--------+
           |    (IP-VRF)|----|         |-|(IP-VRF)|----+
           |     /      |    |         | +--------+    |
       +---|  (BD-2)    |    |         |              _+_
       |   +------------+    |         |             (   )
    SN1|                     | VXLAN/  |            ( WAN )--H1
       |     NVE2(M2)        | nvGRE/  |             (___)
       |   +------------+    | MPLS    |               +
       +---|  (BD-2)    |    |         |  DGW2(M4)     |
           |     \      |    |         | +--------+    |
           |    (IP-VRF)|----|         |-|(IP-VRF)|----+
           |     /      |    +---------+ +--------+
   SN2+----|  (BD-3)    |
           +------------+

              Figure 6 Interface-less IP-VRF-to-IP-VRF model

   In this case:

   a) The NVEs and DGWs must provide connectivity between hosts in SN1,
      SN2, IP1 and hosts sitting at the other end of the WAN, for
      example, H1. We assume the DGWs import/export IP and/or VPN-IP
      routes from/to the WAN.

   b) The IP-VRF instances in the NVE/DGWs are directly connected
      through NVO tunnels, and no IRBs and/or BD instances are
      instantiated to connect the IP-VRFs.

   c) The solution must provide layer-3 connectivity among the IP-VRFs
      for Ethernet NVO tunnels, for instance, VXLAN or nvGRE.

   d) The solution may provide layer-3 connectivity among the IP-VRFs
      for IP NVO tunnels, for example, VXLAN GPE (with IP payload).

   In order to meet the above requirements, the EVPN route type 5 will
   be used to advertise the IP Prefixes, along with the Router's MAC
   Extended Community as defined in [EVPN-INTERSUBNET] if the
   advertising NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will
   advertise an RT-5 for each of its prefixes with the following fields:

   o RD as per [RFC7432].

   o Eth-Tag ID=0.

   o IP address length and IP address, as explained in the previous
     sections.

   o GW IP address=0.

   o ESI=0.

   o MPLS label or VNI corresponding to the IP-VRF.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF) and two BGP extended communities:

   o The first one is the BGP Encapsulation Extended Community, as per
     [RFC5512], identifying the tunnel type.

   o The second one is the Router's MAC Extended Community as per
     [EVPN-INTERSUBNET], containing the MAC address associated with the
     NVE advertising the route. This MAC address identifies the
     NVE/DGW and MAY be re-used for all the IP-VRFs in the NVE. The
     Router's MAC Extended Community MUST be sent if the route is
     associated with an Ethernet NVO tunnel, for instance, VXLAN. If
     the route is associated with an IP NVO tunnel, for instance, VXLAN
     GPE with IP payload, the Router's MAC Extended Community SHOULD
     NOT be sent.

   The following example illustrates the procedure to advertise and
   forward packets to SN1/24 (IPv4 prefix advertised from NVE1):

   (1) NVE1 advertises the following BGP route:

       o Route type 5 (IP Prefix route) containing:

          . IPL=24, IP=SN1, Label=10.

          . GW IP SHOULD be set to 0.

          . [RFC5512] BGP Encapsulation Extended Community.

          . Router's MAC Extended Community that contains M1.

          . Route-target identifying the tenant (IP-VRF).

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

       o Since GW IP=ESI=0, the Label is a non-zero value and the local
         policy indicates this interface-less model, DGW1 will use the
         Label and next-hop of the RT-5, as well as the MAC address
         conveyed in the Router's MAC Extended Community (as inner
         destination MAC address), to set up the forwarding state and
         later encapsulate the routed IP packets.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24.

       o Since the RT-5 for SN1/24 had GW IP=ESI=0, a non-zero Label
         and a next-hop, and the model is interface-less, DGW1 will not
         need a recursive lookup to resolve the route.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = DGW1 MAC, Destination inner MAC = M1, Source outer
         IP (tunnel source IP) = DGW1 IP, Destination outer IP (tunnel
         destination IP) = NVE1 IP. The Source and Destination inner
         MAC addresses are not needed if IP NVO tunnels are used.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP lookup based on the
         Label (the Destination inner MAC is not needed to identify the
         IP-VRF).

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated with BD-2. A
         subsequent lookup in the ARP table and the BD FIB will provide
         the forwarding information for the packet in BD-2.
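
   The receive-side decision in step (2) can be sketched in Python as
   follows. This is illustrative only: the class and function names are
   hypothetical, the local policy that selects the interface-less model
   is assumed to have already been applied, and documentation addresses
   stand in for SN1, NVE1 and M1. No recursive lookup is involved: the
   tunnel destination is the BGP next hop, the VNI is the RT-5 Label,
   and the inner destination MAC comes from the Router's MAC Extended
   Community, or is omitted for IP NVO tunnels.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

ZERO_ESI = bytes(10)  # the ESI field is 10 octets


@dataclass(frozen=True)
class PrefixRoute:
    prefix: str
    gw_ip: str                  # "0.0.0.0" or "::" when unused
    esi: bytes                  # all zeros when unused
    label: int                  # MPLS label or VNI of the IP-VRF
    next_hop: str               # BGP next hop (egress NVE/DGW)
    router_mac: Optional[str]   # Router's MAC Extended Community


@dataclass(frozen=True)
class FwdEntry:
    tunnel_dst: str
    vni: int
    inner_dmac: Optional[str]   # None: no inner Ethernet header


def install_interface_less(rt5: PrefixRoute,
                           ethernet_nvo: bool = True) -> FwdEntry:
    """Hypothetical helper: forwarding state for an interface-less RT-5."""
    if int(ipaddress.ip_address(rt5.gw_ip)) != 0 or rt5.esi != ZERO_ESI:
        raise ValueError("Overlay Index present: not interface-less")
    if rt5.label == 0:
        raise ValueError("interface-less RT-5 needs a non-zero Label")
    if not ethernet_nvo:
        return FwdEntry(rt5.next_hop, rt5.label, None)
    if rt5.router_mac is None:
        raise ValueError("Ethernet NVO tunnels need the Router's MAC EC")
    return FwdEntry(rt5.next_hop, rt5.label, rt5.router_mac)


sn1 = PrefixRoute("203.0.113.0/24", "0.0.0.0", ZERO_ESI, 10,
                  "198.51.100.1", "00:00:5e:00:53:01")
print(install_interface_less(sn1))                      # e.g. VXLAN
print(install_interface_less(sn1, ethernet_nvo=False))  # IP NVO tunnel
```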

   The model described above is called the Interface-less model, since
   the IP-VRFs are connected directly through tunnels and do not require
   those tunnels to be terminated in core BDs, as is the case in
   sections 4.4.2 and 4.4.3. An EVPN IP-VRF-to-IP-VRF implementation is
   REQUIRED to support the ingress and egress procedures described in
   this section.

4.4.2 Interface-ful IP-VRF-to-IP-VRF with SBD-facing IRB

   Figure 7 will be used for the description of this model.

                NVE1
           +------------+                        DGW1
   IP10+---+(BD-1)      | +---------------+ +------------+
           |   \        | |               | |            |
           |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
           |   / IRB(IP1/M1)           IRB(IP3/M3)       |     |
       +---+(BD-2)      | |               | +------------+    _+_
       |   +------------+ |               |                  (   )
    SN1|                  |     VXLAN/    |                 ( WAN )--H1
       |        NVE2      |     nvGRE/    |                  (___)
       |   +------------+ |     MPLS      |      DGW2          +
       +---+(BD-2)      | |               | +------------+     |
           |   \        | |               | |            |     |
           |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
           |   / IRB(IP2/M2)           IRB(IP4/M4)       |
   SN2+----+(BD-3)      | +---------------+ +------------+
           +------------+

             Figure 7 Interface-ful with core-facing IRB model

   In this model:

   a) As in section 4.4.1, the NVEs and DGWs must provide connectivity
      between hosts in SN1, SN2, IP10 and hosts sitting at the other end
      of the WAN.

   b) However, the NVE/DGWs are now connected through Ethernet NVO
      tunnels terminated in the SBD instance. The IP-VRFs use IRB
      interfaces for their connectivity to the SBD.

   c) Each SBD-facing IRB has an IP and a MAC address, where the IP
      address must be reachable from other NVEs or DGWs.

   d) The SBD is attached to all the NVE/DGWs in the tenant domain.

   e) The solution must provide layer-3 connectivity for Ethernet NVO
      tunnels, for instance, VXLAN or nvGRE.

   EVPN type 5 routes will be used to advertise the IP Prefixes, whereas
   EVPN RT-2 routes will advertise the MAC/IP addresses of each
   SBD-facing IRB interface.
   Each NVE/DGW will advertise an RT-5 for each of its prefixes with the
   following fields:

   o RD as per [RFC7432].

   o Eth-Tag ID=0.

   o IP address length and IP address, as explained in the previous
     sections.

   o GW IP address=IRB-IP (this is the Overlay Index that will be used
     for the recursive route resolution).

   o ESI=0.

   o Label value SHOULD be zero since the RT-5 route requires a
     recursive lookup resolution to an RT-2 route. It is ignored on
     reception, and, when forwarding packets, the MPLS label or VNI
     from the RT-2's MPLS Label1 field is used.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF). The Router's MAC Extended Community SHOULD NOT be sent in
   this case.

   The following example illustrates the procedure to advertise and
   forward packets to SN1/24 (IPv4 prefix advertised from NVE1):

   (1) NVE1 advertises the following BGP routes:

       o Route type 5 (IP Prefix route) containing:

          . IPL=24, IP=SN1, Label SHOULD be set to 0.

          . GW IP=IP1 (core-facing IRB's IP).

          . Route-target identifying the tenant (IP-VRF).

       o Route type 2 (MAC/IP route for the core-facing IRB)
         containing:

          . ML=48, M=M1, IPL=32, IP=IP1, Label=10.

          . A [RFC5512] BGP Encapsulation Extended Community.

          . Route-target identifying the SBD. This route-target MAY be
            the same as the one used with the RT-5.

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

          . Since GW IP is different from zero, the GW IP (IP1) will be
            used as the Overlay Index for the recursive route resolution
            to the RT-2 carrying IP1.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table.
         The lookup yields SN1/24, which is associated with the Overlay
         Index IP1. The forwarding information is derived from the RT-2
         received for IP1.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = M3, Destination inner MAC = M1, Source outer IP
         (source VTEP) = DGW1 IP, Destination outer IP (destination
         VTEP) = NVE1 IP.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP lookup based on the
         Label and the inner MAC DA.

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated with BD-2. A
         subsequent lookup in the ARP table and the BD FIB will provide
         the forwarding information for the packet in BD-2.

   The model described above is called the 'Interface-ful with
   SBD-facing IRB model', since the tunnels connecting the DGWs and NVEs
   need to be terminated into the SBD. The SBD is connected to the
   IP-VRFs via core-facing IRB interfaces, and that allows the recursive
   resolution of RT-5s to GW IP addresses. An EVPN IP-VRF-to-IP-VRF
   implementation is REQUIRED to support the ingress and egress
   procedures described in this section.

4.4.3 Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD-facing IRB

   Figure 8 will be used for the description of this model. Note that
   this model is similar to the one described in section 4.4.2, only
   without IP addresses on the SBD-facing IRB interfaces.

                NVE1
           +------------+                        DGW1
   IP1+----+(BD-1)      | +---------------+ +------------+
           |   \        | |               | |            |
           |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
           |   /   IRB(M1)|               | IRB(M3)      |     |
       +---+(BD-2)      | |               | +------------+    _+_
       |   +------------+ |               |                  (   )
    SN1|                  |     VXLAN/    |                 ( WAN )--H1
       |        NVE2      |     nvGRE/    |                  (___)
       |   +------------+ |     MPLS      |      DGW2          +
       +---+(BD-2)      | |               | +------------+     |
           |   \        | |               | |            |     |
           |(IP-VRF)-(SBD)|               |(SBD)-(IP-VRF)|-----+
           |   /   IRB(M2)|               | IRB(M4)      |
   SN2+----+(BD-3)      | +---------------+ +------------+
           +------------+

        Figure 8 Interface-ful with unnumbered core-facing IRB model

   In this model:

   a) As in sections 4.4.1 and 4.4.2, the NVEs and DGWs must provide
      connectivity between hosts in SN1, SN2, IP1 and hosts sitting at
      the other end of the WAN.

   b) As in section 4.4.2, the NVE/DGWs are connected through Ethernet
      NVO tunnels terminated in the SBD instance. The IP-VRFs use IRB
      interfaces for their connectivity to the SBD.

   c) However, each SBD-facing IRB has a MAC address only, and no IP
      address (that is why the model refers to an 'unnumbered'
      SBD-facing IRB). In this model, there is no need to have IP
      reachability to the SBD-facing IRB interfaces themselves, and
      there is a requirement to save IP addresses on those interfaces.

   d) As in section 4.4.2, the SBD is composed of all the NVE/DGW BDs of
      the tenant that need inter-subnet forwarding.

   e) As in section 4.4.2, the solution must provide layer-3
      connectivity for Ethernet NVO tunnels, for instance, VXLAN or
      nvGRE.

   This model will also make use of the RT-5 recursive resolution. EVPN
   type 5 routes will advertise the IP Prefixes along with the Router's
   MAC Extended Community used for the recursive lookup, whereas EVPN
   RT-2 routes will advertise the MAC addresses of each SBD-facing IRB
   interface (this time without an IP).
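
   The recursive resolution used by the two interface-ful models can be
   contrasted in a short Python sketch. It assumes hypothetical route
   records in which a zero GW IP is represented as None. With a GW IP
   Overlay Index (section 4.4.2), the RT-5 resolves to the RT-2 carrying
   that IP; without one (this section), it resolves to the RT-2 carrying
   the MAC from the Router's MAC Extended Community. In both cases the
   tunnel and the Label/VNI come from the RT-2, and the RT-5 Label is
   ignored. Names and symbolic values are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class MacIpRoute:            # RT-2 for an SBD-facing IRB
    mac: str
    ip: Optional[str]        # None for an unnumbered IRB (IPL=0)
    label1: int              # MPLS Label1 / VNI
    next_hop: str


@dataclass(frozen=True)
class PrefixRoute:           # RT-5
    prefix: str
    gw_ip: Optional[str]     # None stands for GW IP = 0
    router_mac: Optional[str]
    label: int = 0           # SHOULD be 0 here; ignored on reception


def resolve(rt5: PrefixRoute,
            rt2s: Iterable[MacIpRoute]) -> Optional[Tuple[str, str, int]]:
    """Hypothetical helper: (inner DMAC, tunnel destination, VNI)."""
    for rt2 in rt2s:
        if rt5.gw_ip is not None:
            match = rt2.ip == rt5.gw_ip          # SBD-facing IRB with IP
        else:
            match = rt2.mac == rt5.router_mac    # unnumbered SBD-facing IRB
        if match:
            return rt2.mac, rt2.next_hop, rt2.label1  # RT-5 label unused
    return None


numbered = [MacIpRoute("M1", "IP1", 10, "NVE1")]
unnumbered = [MacIpRoute("M1", None, 10, "NVE1")]

print(resolve(PrefixRoute("SN1/24", "IP1", None), numbered))
print(resolve(PrefixRoute("SN1/24", None, "M1"), unnumbered))
```

   Both calls yield the same forwarding result; the models differ only
   in which RT-5 attribute acts as the Overlay Index.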

   Each NVE/DGW will advertise an RT-5 for each of its prefixes with the
   same fields as described in section 4.4.2, except for:

   o GW IP address SHOULD be set to 0.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF) and the Router's MAC Extended Community containing the MAC
   address associated with the SBD-facing IRB interface. This MAC
   address MAY be re-used for all the IP-VRFs in the NVE.

   The example is similar to the one in section 4.4.2:

   (1) NVE1 advertises the following BGP routes:

       o Route type 5 (IP Prefix route) containing the same values as
         in the example in section 4.4.2, except for:

          . GW IP SHOULD be set to 0.

          . Router's MAC Extended Community containing M1 (this will be
            used for the recursive lookup to an RT-2).

       o Route type 2 (MAC route for the core-facing IRB) with the same
         values as in section 4.4.2, except for:

          . ML=48, M=M1, IPL=0, Label=10.

   (2) DGW1 imports the received routes from NVE1:

       o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
         route-target.

          . The MAC contained in the Router's MAC Extended Community
            sent along with the RT-5 (M1) will be used as the Overlay
            Index for the recursive route resolution to the RT-2
            carrying M1.

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table. The lookup yields SN1/24, which is associated
         with the Overlay Index M1. The forwarding information is
         derived from the RT-2 received for M1.

       o The IP packet destined to IPx is encapsulated with: Source
         inner MAC = M3, Destination inner MAC = M1, Source outer IP
         (source VTEP) = DGW1 IP, Destination outer IP (destination
         VTEP) = NVE1 IP.

   (4) When the packet arrives at NVE1:

       o NVE1 will identify the IP-VRF for an IP lookup based on the
         Label and the inner MAC DA.

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated with BD-2. A
         subsequent lookup in the ARP table and the BD FIB will provide
         the forwarding information for the packet in BD-2.

   The model described above is called the 'Interface-ful with
   unnumbered SBD-facing IRB model': it is the same as the model in
   section 4.4.2, except that the SBD-facing IRB does not have an IP
   address. This model is OPTIONAL for an EVPN IP-VRF-to-IP-VRF
   implementation.

5. Conclusions

   An EVPN route (type 5) for the advertisement of IP Prefixes is
   described in this document. This new route type has a differentiated
   role from the RT-2 route and addresses the Data Center (or NVO-based
   networks in general) inter-subnet connectivity scenarios described
   in this document. Using this new RT-5, an IP Prefix may be advertised
   along with an Overlay Index that can be a GW IP address, a MAC or an
   ESI, or without an Overlay Index, in which case the BGP next-hop will
   point at the egress NVE/ASBR/ABR and the MAC in the Router's MAC
   Extended Community will provide the inner MAC destination address to
   be used. As discussed throughout the document, the EVPN RT-2 does not
   meet the requirements for all the DC use cases; therefore, this EVPN
   route type 5 is required.

   The EVPN route type 5 decouples the IP Prefix advertisements from the
   MAC/IP route advertisements in EVPN, hence:

   a) It allows the clean and clear advertisement of IPv4 or IPv6
      prefixes in an NLRI with no MAC addresses.

   b) Since the route type is different from the MAC/IP Advertisement
      route, the current [RFC7432] procedures do not need to be
      modified.

   c) It allows a flexible implementation where the prefix can be linked
      to different types of Overlay/Underlay Indexes: overlay IP
      address, overlay MAC addresses, overlay ESI, underlay BGP
      next-hops, etc.

   d) An EVPN implementation not requiring IP Prefixes can simply
      discard them by looking at the route type value.

6. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

7. Security Considerations

   The security considerations discussed in [RFC7432] apply to this
   document.

8. IANA Considerations

   This document requests the allocation of value 5 in the "EVPN Route
   Types" registry defined by [RFC7432]:

      Value     Description         Reference
      5         IP Prefix route     [this document]

9. References

9.1 Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
             March 1997.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
             2006.

   [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation
             Subsequent Address Family Identifier (SAFI) and the BGP
             Tunnel Encapsulation Attribute", RFC 5512,
             DOI 10.17487/RFC5512, April 2009.

   [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
             Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
             Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
             2015.

   [RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel,
             "Revised Error Handling for BGP UPDATE Messages", RFC 7606,
             DOI 10.17487/RFC7606, August 2015.

9.2 Informative References

   [EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in
             EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-03.txt,
             work in progress, February 2017.

   [EVPN-OVERLAY] Sajassi-Drake et al., "A Network Virtualization
             Overlay Solution using EVPN",
             draft-ietf-bess-evpn-overlay-08.txt, work in progress, March
             2017.

10. Acknowledgments

   The authors would like to thank Mukul Katiyar and Jeffrey Zhang for
   their valuable feedback and contributions. The following people also
   helped improve this document with their feedback: Tony Przygienda
   and Thomas Morin.
   Special thanks to Eric Rosen for his detailed review, which really
   helped improve the readability and clarify the concepts.

11. Contributors

   In addition to the authors listed on the front page, the following
   co-authors have also contributed to this document:

      Senthil Sathappan
      Florin Balus
      Aldrin Isaac
      Senad Palislamovic

12. Authors' Addresses

   Jorge Rabadan (Editor)
   Nokia
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@nokia.com

   Wim Henderickx
   Nokia
   Email: wim.henderickx@nokia.com

   John E. Drake
   Juniper
   Email: jdrake@juniper.net

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com

   Wen Lin
   Juniper
   Email: wlin@juniper.net