BESS Workgroup                                           J. Rabadan, Ed.
Internet Draft                                             W. Henderickx
                                                         S. Palislamovic
Intended status: Standards Track                          Alcatel-Lucent

J. Drake                                                        A. Isaac
W. Lin                                                         Bloomberg
Juniper
                                                              A. Sajassi
                                                                   Cisco

Expires: March 17, 2016                               September 14, 2015

                    IP Prefix Advertisement in EVPN
              draft-ietf-bess-evpn-prefix-advertisement-02

Abstract

   EVPN provides a flexible control plane that allows intra-subnet
   connectivity in an IP/MPLS and/or an NVO-based network. In NVO
   networks, there is also a need for a dynamic and efficient
   inter-subnet connectivity across Tenant Systems and End Devices that
   can be physical or virtual and may not support their own routing
   protocols. This document defines a new EVPN route type for the
   advertisement of IP Prefixes and explains some use-case examples
   where this new route type is used.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on March 17, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Terminology
   2. Introduction and problem statement
      2.1 Inter-subnet connectivity requirements in Data Centers
      2.2 The requirement for a new EVPN route type
   3. The BGP EVPN IP Prefix route
      3.1 IP Prefix Route encoding
   4. Benefits of using the EVPN IP Prefix route
   5. IP Prefix overlay index use-cases
      5.1 TS IP address overlay index use-case
      5.2 Floating IP overlay index use-case
      5.3 ESI overlay index ("Bump in the wire") use-case
      5.4 IRB forwarding on NVEs for Subnets (IP-VRF-to-IP-VRF)
   6. Conclusions
   7. Conventions used in this document
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1 Normative References
      10.2 Informative References
   11. Acknowledgments
   12. Contributors
   13. Authors' Addresses

1. Terminology

   GW IP: Gateway IP Address

   IPL: IP address length

   IRB: Integrated Routing and Bridging interface

   ML: MAC address length

   NVE: Network Virtualization Edge

   TS: Tenant System

   VA: Virtual Appliance

   RT-2: EVPN route type 2, i.e. MAC/IP advertisement route

   RT-5: EVPN route type 5, i.e. IP Prefix route

   Overlay index: object used in the IP Prefix route, as described in
   this document. It can be an IP address in the tenant space or an ESI,
   and identifies a pointer yielded by the IP route lookup at the
   routing context importing the route. An overlay index always needs a
   recursive route resolution on the NVE receiving the IP Prefix route,
   so that the NVE knows to which egress NVE it needs to forward the
   packets.

   Underlay next-hop: IP address sent by BGP along with any EVPN route,
   i.e. BGP next-hop. It identifies the NVE sending the route and it is
   used at the receiving NVE as the VXLAN destination VTEP or NVGRE
   destination end-point.

2. Introduction and problem statement

   Inter-subnet connectivity is required for certain tenants within the
   Data Center. [EVPN-INTERSUBNET] defines some fairly common
   inter-subnet forwarding scenarios where TSes can exchange packets
   with TSes located in remote subnets. In order to meet this
   requirement, [EVPN-INTERSUBNET] describes how MAC/IPs encoded in TS
   RT-2 routes are not only used to populate MAC-VRF and overlay ARP
   tables, but also IP-VRF tables with the encoded TS host routes (/32
   or /128). In some cases, EVPN may advertise IP Prefixes and therefore
   provide aggregation in the IP-VRF tables, as opposed to programming
   individual host routes. This document complements the scenarios
   described in [EVPN-INTERSUBNET] and defines how EVPN may be used to
   advertise IP Prefixes.

   Section 2.1 describes the inter-subnet connectivity requirements in
   Data Centers.
   Section 2.2 explains why a new EVPN route type is
   required for IP Prefix advertisements. Once the need for a new EVPN
   route type is justified, sections 3, 4 and 5 will describe this route
   type and how it is used in some specific use cases.

2.1 Inter-subnet connectivity requirements in Data Centers

   [RFC7432] is used as the control plane for a Network Virtualization
   Overlay (NVO3) solution in Data Centers (DC), where Network
   Virtualization Edge (NVE) devices can be located in Hypervisors or
   TORs, as described in [EVPN-OVERLAY].

   If we use the term Tenant System (TS) to designate a physical or
   virtual system identified by MAC and IP addresses, and connected to
   an EVPN instance, the following considerations apply:

   o The Tenant Systems may be Virtual Machines (VMs) that generate
     traffic from their own MAC and IP.

   o The Tenant Systems may be Virtual Appliance entities (VAs) that
     forward traffic to/from IP addresses of different End Devices
     sitting behind them.

      o These VAs can be firewalls, load balancers, NAT devices, other
        appliances or virtual gateways with virtual routing instances.

      o These VAs do not have their own routing protocols and hence
        rely on the EVPN NVEs to advertise the routes on their behalf.

      o In all these cases, the VA will forward traffic to the Data
        Center using its own source MAC but the source IP will be the
        one associated to the End Device sitting behind it or a
        translated IP address (part of a public NAT pool) if the VA is
        performing NAT.

      o Note that the same IP address could exist behind two of these
        TS. One example of this would be certain appliance resiliency
        mechanisms, where a virtual IP or floating IP can be owned by
        one of the two VAs running the resiliency protocol (the master
        VA). VRRP is one particular example of this. Another example
        is multi-homed subnets, i.e. the same subnet is connected to
        two VAs.

      o Although these VAs provide IP connectivity to VMs and subnets
        behind them, they do not always have their own IP interface
        connected to the EVPN NVE, e.g. layer-2 firewalls are examples
        of VAs not supporting IP interfaces.

   The following figure illustrates some of the examples described
   above.

                  NVE1
               +-----------+
      TS1(VM)--|(MAC-VRF10)|-----+
        IP1/M1 +-----------+     |               DGW1
                             +---------+    +-------------+
                             |         |----|(MAC-VRF10)  |
SN1---+            NVE2      |         |    |    IRB1\    |
      |        +-----------+ |         |    |     (IP-VRF)|---+
SN2---TS2(VA)--|(MAC-VRF10)|-|         |    +-------------+  _|_
      | IP2/M2 +-----------+ |  VXLAN/ |                    (   )
IP4---+  <-+                 |  nvGRE  |         DGW2      ( WAN )
           |                 |         |    +-------------+ (___)
        vIP23 (floating)     |         |----|(MAC-VRF10)  |   |
           |                 +---------+    |    IRB2\    |   |
SN1---+  <-+       NVE3        |  |  |      |     (IP-VRF)|---+
      | IP3/M3 +-----------+   |  |  |      +-------------+
SN3---TS3(VA)--|(MAC-VRF10)|---+  |  |
      |        +-----------+      |  |
IP5---+                           |  |
                                  |  |
                  NVE4            |  |     NVE5         +--SN5
         +---------------------+  |  | +-----------+        |
IP6------|(MAC-VRF1)           |  |  +-|(MAC-VRF10)|--TS4(VA)--SN6
         |         \           |  |    +-----------+        |
         |       (IP-VRF)      |--+              ESI4       +--SN7
         |     /       \IRB3   |
     |---|(MAC-VRF2)(MAC-VRF10)|
  SN4|   +---------------------+

                 Figure 1 DC inter-subnet use-cases

   Where:

   NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same EVI for a
   particular tenant. EVI-10 is composed of the collection of MAC-VRF10
   instances defined in all the NVEs. All the hosts connected to EVI-10
   belong to the same IP subnet. The hosts connected to EVI-10 are
   listed below:

   o TS1 is a VM that generates/receives traffic from/to IP1, where
     IP1 belongs to the EVI-10 subnet.

   o TS2 and TS3 are Virtual Appliances (VA) that generate/receive
     traffic from/to the subnets and hosts sitting behind them
     (SN1, SN2, SN3, IP4 and IP5). Their IP addresses (IP2 and IP3)
     belong to the EVI-10 subnet and they can also generate/receive
     traffic.
     When these VAs receive packets destined to their own
     MAC addresses (M2 and M3) they will route the packets to the
     proper subnet or host. These VAs do not support routing
     protocols to advertise the subnets connected to them and can
     move to a different server and NVE when the Cloud Management
     System decides to do so. These VAs may also support redundancy
     mechanisms for some subnets, similar to VRRP, where a floating
     IP is owned by the master VA and only the master VA forwards
     traffic to a given subnet. E.g.: vIP23 in figure 1 is a
     floating IP that can be owned by TS2 or TS3 depending on who
     the master is. Only the master will forward traffic to SN1.

   o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3
     have their own IP addresses that belong to the EVI-10 subnet
     too. These IRB interfaces connect the EVI-10 subnet to Virtual
     Routing and Forwarding (IP-VRF) instances that can route the
     traffic to other connected subnets for the same tenant (within
     the DC or at the other end of the WAN).

   o TS4 is a layer-2 VA that provides connectivity to subnets SN5,
     SN6 and SN7, but does not have an IP address itself in the
     EVI-10. TS4 is connected to a physical port on NVE5 assigned
     to Ethernet Segment Identifier 4.

   All the above DC use cases require inter-subnet forwarding and
   therefore the individual host routes and subnets:

   a) MUST be advertised from the NVEs (since VAs and VMs do not run
      routing protocols) and
   b) MAY be associated to an overlay index that can be a VA IP address,
      a floating IP address or an ESI.

2.2 The requirement for a new EVPN route type

   [RFC7432] defines a MAC/IP route (also referred to as RT-2) where a
   MAC address can be advertised together with an IP address length
   (IPL) and IP address (IP).
   While a variable IPL might have been used to
   indicate the presence of an IP prefix in a route type 2, there are
   several specific use cases in which using this route type to deliver
   IP Prefixes is not suitable.

   One example of such use cases is the "floating IP" example described
   in section 2.1. In this example we need to decouple the advertisement
   of the prefixes from the advertisement of the floating IP (vIP23 in
   figure 1) and the MAC associated to it, otherwise the solution
   becomes highly inefficient and does not scale.

   E.g.: if we are advertising 1k prefixes from M2 (using RT-2) and the
   floating IP owner changes from M2 to M3, we would need to withdraw 1k
   routes from M2 and re-advertise 1k routes from M3. However if we use
   a separate route type, we can advertise the 1k routes associated to
   the floating IP address (vIP23) and only one RT-2 for advertising the
   ownership of the floating IP, i.e. vIP23 and M2 in the route type 2.
   When the floating IP owner changes from M2 to M3, a single RT-2
   withdraw/update is required to indicate the change. The remote DGW
   will not change any of the 1k prefixes associated to vIP23, but will
   only update the ARP resolution entry for vIP23 (now pointing at M3).

   Other reasons to decouple the IP Prefix advertisement from the MAC/IP
   route are listed below:

   o Clean identification, operation and troubleshooting of IP
     Prefixes, not subject to interpretation and independent of the
     IPL and the IP value. E.g.: a default IP route 0.0.0.0/0 must
     always be easily and clearly distinguished from the absence of
     IP information.

   o MAC address information must not be compared by BGP when
     selecting between two IP Prefix routes. If IP Prefixes were to
     be advertised using MAC/IP routes, the MAC information would
     always be present and part of the route key.

   o IP Prefix routes must not be subject to MAC/IP route
     procedures such as MAC mobility or aliasing. Prefixes
     advertised from two different ESIs do not mean mobility; MACs
     advertised from two different ESIs do mean mobility. Similarly
     load balancing for IP prefixes is achieved through IP
     mechanisms such as ECMP, and not through MAC route mechanisms
     such as aliasing.

   o NVEs that do not require processing IP Prefixes must have an
     easy way to identify an update with an IP Prefix and ignore
     it, rather than processing the MAC/IP route to find out only
     later that it carries a Prefix that must be ignored.

   The following sections describe how EVPN is extended with a new route
   type for the advertisement of IP prefixes and how this route is used
   to address the current and future inter-subnet connectivity
   requirements existing in the Data Center.

3. The BGP EVPN IP Prefix route

   The current BGP EVPN NLRI as defined in [RFC7432] is shown below:

       +-----------------------------------+
       |    Route Type (1 octet)           |
       +-----------------------------------+
       |     Length (1 octet)              |
       +-----------------------------------+
       | Route Type specific (variable)    |
       +-----------------------------------+

   Where the route type field can contain one of the following specific
   values:

   + 1 - Ethernet Auto-Discovery (A-D) route

   + 2 - MAC/IP advertisement route

   + 3 - Inclusive Multicast Route

   + 4 - Ethernet Segment Route

   This document defines an additional route type that will be used for
   the advertisement of IP Prefixes:

   + 5 - IP Prefix Route

   The support for this new route type is OPTIONAL.

   Since this new route type is OPTIONAL, an implementation not
   supporting it MUST ignore the route, based on the unknown route type
   value.

   The detailed encoding of this route and associated procedures are
   described in the following sections.
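
   The framing above, together with the rule that an implementation
   ignores route types it does not support, can be sketched in Python as
   follows. This is only an illustrative sketch, not a definitive
   implementation: the function name split_evpn_nlri and its supported
   parameter are hypothetical, and the input is assumed to be a byte
   string holding one or more EVPN NLRIs back to back.

```python
import struct

# Route type values listed in this section (1-4 from RFC 7432, 5 from
# this document).
ROUTE_TYPE_NAMES = {
    1: "Ethernet Auto-Discovery (A-D) route",
    2: "MAC/IP advertisement route",
    3: "Inclusive Multicast Route",
    4: "Ethernet Segment Route",
    5: "IP Prefix Route",
}


def split_evpn_nlri(data, supported=frozenset({1, 2, 3, 4})):
    """Yield (route_type, body) for every supported route in `data`.

    Hypothetical helper: a route whose type is not in `supported` is
    skipped using only its Length octet, so its body is never parsed.
    """
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated NLRI header")
        # Route Type (1 octet) followed by Length (1 octet).
        route_type, length = struct.unpack_from("!BB", data, offset)
        body = data[offset + 2 : offset + 2 + length]
        if len(body) != length:
            raise ValueError("truncated NLRI body")
        offset += 2 + length
        if route_type in supported:
            yield route_type, body
```

   A speaker that implements the IP Prefix route would pass a supported
   set that includes 5; one that does not simply never sees the route
   body.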

3.1 IP Prefix Route encoding

   An IP Prefix advertisement route NLRI consists of the following
   fields:

       +---------------------------------------+
       |      RD   (8 octets)                  |
       +---------------------------------------+
       |Ethernet Segment Identifier (10 octets)|
       +---------------------------------------+
       |  Ethernet Tag ID (4 octets)           |
       +---------------------------------------+
       |  IP Prefix Length (1 octet)           |
       +---------------------------------------+
       |  IP Prefix (4 or 16 octets)           |
       +---------------------------------------+
       |  GW IP Address (4 or 16 octets)       |
       +---------------------------------------+
       |  MPLS Label (3 octets)                |
       +---------------------------------------+

   Where:

   o RD, Ethernet Tag ID and MPLS Label fields will be used as
     defined in [RFC7432] and [EVPN-OVERLAY].

   o The Ethernet Segment Identifier will be a non-zero 10-byte
     identifier if the ESI is used as an overlay index. It will be
     zero otherwise.

   o The IP Prefix Length can be set to a value between 0 and 32
     (bits) for IPv4 and between 0 and 128 for IPv6.

   o The IP Prefix will be a 32 or 128-bit field (IPv4 or IPv6).

   o The GW IP (Gateway IP Address) will be a 32 or 128-bit field
     (IPv4 or IPv6), and will encode an overlay IP index for the IP
     Prefixes. The GW IP field can be zero if it is not used as an
     overlay index.

   o The total route length will indicate the type of prefix (IPv4
     or IPv6) and the type of GW IP address (IPv4 or IPv6). Note
     that the IP Prefix + the GW IP should have a length of either
     64 or 256 bits, but never 160 bits (IPv4 and IPv6 mixed values
     are not allowed).

   The Eth-Tag ID, IP Prefix Length and IP Prefix will be part of the
   route key used by BGP to compare routes. The rest of the fields will
   not be part of the route key.

   The route will contain a single overlay index at most, i.e. if the
   ESI field is different from zero, the GW IP field will be zero, and
   vice versa. The following table shows the different inter-subnet
   use-cases described in this document and the corresponding coding of
   the overlay index in the route type 5 (RT-5). The IP-VRF-to-IP-VRF or
   IRB forwarding on NVEs case is a special use-case, where there may be
   no need for overlay index, since the actual next-hop is given by the
   BGP next-hop. When an overlay index is present in the RT-5, the
   receiving NVE will need to perform a recursive route resolution to
   find out to which egress NVE to forward the packets.

   +----------------------------+--------------------------------------+
   | Use-case                   | Overlay Index in the RT-5 BGP update |
   +----------------------------+--------------------------------------+
   | TS IP address              | Overlay GW IP Address                |
   | Floating IP address        | Overlay GW IP Address                |
   | "Bump in the wire"         | ESI                                  |
   | IP-VRF-to-IP-VRF           | Overlay GW IP or N/A                 |
   +----------------------------+--------------------------------------+

4. Benefits of using the EVPN IP Prefix route

   This section clarifies the different functions accomplished by the
   EVPN RT-2 and RT-5 routes, and provides a list of benefits derived
   from using a separate route type for the advertisement of IP Prefixes
   in EVPN.

   [RFC7432] describes the content of the BGP EVPN RT-2 specific NLRI,
   i.e. MAC/IP Advertisement Route, where the IP address length (IPL)
   and IP address (IP) of a specific advertised MAC are encoded. The
   subject of the MAC advertisement route is the MAC address (M) and MAC
   address length (ML) encoded in the route. The MAC mobility and other
   procedures are defined around that MAC address. The IP address
   information carries the host IP address required for the ARP
   resolution of the MAC according to [RFC7432] and the host route to be
   programmed in the IP-VRF [EVPN-INTERSUBNET].
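
   For comparison with the RT-2 content just described, the RT-5
   encoding rules of Section 3.1 can be sketched in Python as follows.
   This is a minimal sketch under stated assumptions, not a definitive
   decoder: the names IpPrefixRoute, decode_rt5, overlay_index and
   route_key are hypothetical, the input is assumed to be the
   route-type-specific part of the NLRI only, and the MPLS Label octets
   are kept raw rather than interpreted.

```python
import ipaddress
from dataclasses import dataclass

# RD (8) + ESI (10) + Ethernet Tag ID (4) + IP Prefix Length (1)
# + MPLS Label (3) octets, i.e. everything except the two addresses.
FIXED_OCTETS = 8 + 10 + 4 + 1 + 3


@dataclass(frozen=True)
class IpPrefixRoute:
    rd: bytes
    esi: bytes
    eth_tag: int
    prefix_len: int
    prefix: object   # ipaddress.IPv4Address or IPv6Address
    gw_ip: object    # same family as prefix
    label: bytes     # raw 3 octets, not interpreted here

    def overlay_index(self):
        """Return ('esi', ...), ('gw_ip', ...) or None (no overlay index)."""
        if any(self.esi):
            return ("esi", self.esi)
        if int(self.gw_ip) != 0:
            return ("gw_ip", self.gw_ip)
        return None

    def route_key(self):
        """Fields Section 3.1 lists as part of the BGP route key."""
        return (self.eth_tag, self.prefix_len, self.prefix)


def decode_rt5(body):
    # The total length tells IPv4 (4 + 4 octets) from IPv6 (16 + 16);
    # any other length, including a mixed 4 + 16, is rejected.
    addr_octets = {FIXED_OCTETS + 8: 4, FIXED_OCTETS + 32: 16}.get(len(body))
    if addr_octets is None:
        raise ValueError("unexpected RT-5 length: %d" % len(body))
    rd, esi = body[0:8], body[8:18]
    eth_tag = int.from_bytes(body[18:22], "big")
    prefix_len = body[22]
    if prefix_len > addr_octets * 8:
        raise ValueError("IP Prefix Length out of range")
    prefix = ipaddress.ip_address(body[23 : 23 + addr_octets])
    gw_ip = ipaddress.ip_address(body[23 + addr_octets : 23 + 2 * addr_octets])
    if any(esi) and int(gw_ip) != 0:
        raise ValueError("route carries both an ESI and a GW IP overlay index")
    return IpPrefixRoute(rd, esi, eth_tag, prefix_len, prefix, gw_ip, body[-3:])
```

   Keeping route_key() free of any MAC information mirrors the point
   made in this section: two RT-5 routes for the same prefix compare as
   the same route regardless of the MAC behind them.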

   The BGP EVPN route type 5 defined in this document, i.e. IP Prefix
   Advertisement route, decouples the advertisement of IP prefixes from
   the advertisement of any MAC address related to it. This brings some
   major benefits to NVO-based networks where certain inter-subnet
   forwarding scenarios are required. Some of those benefits are:

   a) Upon receiving a route type 2 or type 5, an egress NVE can easily
      distinguish MACs and IPs from IP Prefixes. E.g. an IP prefix with
      IPL=32 being advertised from two different ingress NVEs (as RT-5)
      can be identified as such and be imported in the designated
      routing context as two ECMP routes, as opposed to two MACs
      competing for the same IP.

   b) Similarly, upon receiving a route, an ingress NVE not supporting
      processing of IP Prefixes can easily ignore the update, based on
      the route type.

   c) A MAC route includes the ML, M, IPL and IP in the route key that
      is used by BGP to compare routes, whereas for IP Prefix routes,
      only IPL and IP (as well as Ethernet Tag ID) are part of the route
      key. Advertised IP Prefixes are imported into the designated
      routing context, where there is no MAC information associated to
      IP routes. In the example illustrated in figure 1, subnet SN1
      should be advertised by NVE2 and NVE3 and interpreted by DGW1 as
      the same route coming from two different next-hops, regardless of
      the MAC address associated to TS2 or TS3. This is easily
      accomplished in the RT-5 by including only the IP information in
      the route key.

   d) By decoupling the MAC from the IP Prefix advertisement procedures,
      we can leave the IP Prefix advertisements out of the MAC mobility
      procedures defined in [RFC7432] for MACs. In addition, this allows
      us to have an indirection mechanism for IP Prefixes advertised
      from a MAC/IP that can move between hypervisors. E.g. if there are
      1,000 prefixes sitting behind TS2 (figure 1), NVE2 will advertise
      all those prefixes in RT-5 routes associated to the overlay index
      IP2. Should TS2 move to a different NVE, a single MAC/IP
      advertisement route withdraw for the M2/IP2 route from NVE2 will
      invalidate the 1,000 prefixes, as opposed to having to wait for
      each individual prefix to be withdrawn. This may be easily
      accomplished by using IP Prefix routes that are not tied to a MAC
      address, and using a different MAC/IP route to advertise the
      location and resolution of the overlay index to a MAC address.

5. IP Prefix overlay index use-cases

   The IP Prefix route can use a GW IP or an ESI as an overlay index as
   well as no overlay index whatsoever. This section describes some
   use-cases for these index types.

5.1 TS IP address overlay index use-case

   The following figure illustrates an example of inter-subnet
   forwarding for subnets sitting behind Virtual Appliances (on TS2 and
   TS3).

SN1---+            NVE2                          DGW1
      |        +-----------+ +---------+    +-------------+
SN2---TS2(VA)--|(MAC-VRF10)|-|         |----|(MAC-VRF10)  |
      | IP2/M2 +-----------+ |         |    |    IRB1\    |
IP4---+                      |         |    |     (IP-VRF)|---+
                             |         |    +-------------+  _|_
                             |  VXLAN/ |                    (   )
                             |  nvGRE  |         DGW2      ( WAN )
SN1---+            NVE3      |         |    +-------------+ (___)
      | IP3/M3 +-----------+ |         |----|(MAC-VRF10)  |   |
SN3---TS3(VA)--|(MAC-VRF10)|-|         |    |    IRB2\    |   |
      |        +-----------+ +---------+    |     (IP-VRF)|---+
IP5---+                                     +-------------+

                   Figure 2 TS IP address use-case

   An example of inter-subnet forwarding between subnet SN1/24 and a
   subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
   DGW2 are running BGP EVPN. TS2 and TS3 do not support routing
   protocols, only a static route to forward the traffic to the WAN.

   (1) NVE2 advertises the following BGP routes on behalf of TS2:

       o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
         IP=IP2 and [RFC5512] BGP Encapsulation Extended Community with
         the corresponding Tunnel-type.

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP2 (and BGP Encapsulation Extended
         Community).

   (2) NVE3 advertises the following BGP routes on behalf of TS3:

       o Route type 2 (MAC/IP route) containing: ML=48, M=M3, IPL=32,
         IP=IP3 (and BGP Encapsulation Extended Community).

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP3 (and BGP Encapsulation Extended
         Community).

   (3) DGW1 and DGW2 import both received routes based on the
       route-targets:

       o Based on the MAC-VRF10 route-target in DGW1 and DGW2, the
         MAC/IP route is imported and M2 is added to the MAC-VRF10
         along with its corresponding tunnel information. For instance,
         if VXLAN is used, the VTEP will be derived from the MAC/IP
         route BGP next-hop (underlay next-hop) and VNI from the MPLS
         Label1 field. IP2 - M2 is added to the ARP table.

       o Based on the MAC-VRF10 route-target in DGW1 and DGW2, the IP
         Prefix route is also imported and SN1/24 is added to the
         IP-VRF with overlay index IP2 pointing at the local MAC-VRF10.
         Should ECMP be enabled in the IP-VRF, SN1/24 would also be
         added to the routing table with overlay index IP3.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and overlay index=IP2 is found. Since IP2 is an
         overlay index, a recursive route resolution is required for
         IP2.

       o IP2 is resolved to M2 in the ARP table, and M2 is resolved to
         the tunnel information given by the MAC-VRF FIB (e.g. remote
         VTEP and VNI for the VXLAN case).

       o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC.

         . Destination inner MAC = M2.

         . Tunnel information provided by the MAC-VRF (VNI, VTEP IPs
           and MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel information (VNI for the VXLAN case), the
         MAC-VRF10 context is identified for a MAC lookup.

       o Encapsulation is stripped off and based on a MAC lookup
         (assuming MAC forwarding on the egress NVE), the packet is
         forwarded to TS2, where it will be properly routed.

   (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
       be applied to the MAC route IP2/M2, as defined in [RFC7432].
       Route type 5 prefixes are not subject to MAC mobility procedures,
       hence no changes in the DGW IP-VRF routing table will occur for
       TS2 mobility, i.e. all the prefixes will still be pointing at IP2
       as overlay index. There is an indirection for e.g. SN1/24, which
       still points at overlay index IP2 in the routing table, but IP2
       will be simply resolved to a different tunnel, based on the
       outcome of the MAC mobility procedures for the MAC/IP route
       IP2/M2.

   Note that in the opposite direction, TS2 will send traffic based on
   its static-route next-hop information (IRB1 and/or IRB2), and regular
   EVPN procedures will be applied.

5.2 Floating IP overlay index use-case

   Sometimes Tenant Systems (TS) work in active/standby mode where an
   upstream floating IP - owned by the active TS - is used as the
   overlay index to get to some subnets behind. This redundancy mode,
   already introduced in sections 2.1 and 2.2, is illustrated in Figure
   3.
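
   The indirection that a GW IP overlay index provides, both for TS
   mobility in Section 5.1 and for the floating IP in this section, can
   be sketched in Python as follows. This is only an illustrative toy
   model: the class GatewayTables, its attribute names, the string
   labels ("vIP23", "M2", "M3") and the addresses and VNI are
   hypothetical, and real tables are populated from received RT-2 and
   RT-5 routes rather than by hand.

```python
import ipaddress


class GatewayTables:
    """Toy model of the tables a DGW consults for a GW IP overlay index."""

    def __init__(self):
        self.ip_vrf = {}   # ip_network -> GW IP overlay index (from RT-5)
        self.arp = {}      # GW IP -> MAC (from RT-2)
        self.mac_fib = {}  # MAC -> (VTEP, VNI) (from RT-2)

    def resolve(self, dst):
        """Longest-prefix match, then recursive resolution of the index."""
        dst = ipaddress.ip_address(dst)
        matches = [p for p in self.ip_vrf if dst in p]
        if not matches:
            return None
        gw_ip = self.ip_vrf[max(matches, key=lambda p: p.prefixlen)]
        mac = self.arp.get(gw_ip)        # overlay index -> MAC
        tunnel = self.mac_fib.get(mac)   # MAC -> tunnel information
        if tunnel is None:
            return None
        return mac, tunnel


dgw1 = GatewayTables()
# 1,000 prefixes all pointing at the same overlay index (assumed values).
for i in range(1000):
    net = ipaddress.ip_network("10.%d.%d.0/24" % (i // 256, i % 256))
    dgw1.ip_vrf[net] = "vIP23"
dgw1.arp["vIP23"] = "M2"
dgw1.mac_fib["M2"] = ("192.0.2.2", 10)
dgw1.mac_fib["M3"] = ("192.0.2.3", 10)

before = dgw1.resolve("10.1.5.9")
# Ownership change: a single ARP entry is rewritten, no IP-VRF change.
dgw1.arp["vIP23"] = "M3"
after = dgw1.resolve("10.1.5.9")
```

   In this sketch the ownership change touches one ARP entry while the
   1,000 IP-VRF entries stay as they were, which is the scaling argument
   made in Section 2.2.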

                NVE2                           DGW1
             +-----------+ +---------+    +-------------+
+---TS2(VA)--|(MAC-VRF10)|-|         |----|(MAC-VRF10)  |
|     IP2/M2 +-----------+ |         |    |    IRB1\    |
|        <-+               |         |    |     (IP-VRF)|---+
|          |               |         |    +-------------+  _|_
SN1     vIP23 (floating)   |  VXLAN/ |                    (   )
|          |               |  nvGRE  |         DGW2      ( WAN )
|        <-+     NVE3      |         |    +-------------+ (___)
|     IP3/M3 +-----------+ |         |----|(MAC-VRF10)  |   |
+---TS3(VA)--|(MAC-VRF10)|-|         |    |    IRB2\    |   |
             +-----------+ +---------+    |     (IP-VRF)|---+
                                          +-------------+

          Figure 3 Floating IP overlay index for redundant TS

   In this example, assuming TS2 is the active TS and owns IP23:

   (1) NVE2 advertises the following BGP routes for TS2:

       o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
         IP=IP23 (and BGP Encapsulation Extended Community).

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP23 (and BGP Encapsulation Extended
         Community).

   (2) NVE3 advertises the following BGP routes for TS3:

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=0, GW IP address=IP23 (and BGP Encapsulation Extended
         Community).

   (3) DGW1 and DGW2 import both received routes based on the
       route-target:

       o M2 is added to the MAC-VRF10 FIB along with its corresponding
         tunnel information. For the VXLAN use case, the VTEP will be
         derived from the MAC/IP route BGP next-hop and VNI from the
         VNI/VSID field. IP23 - M2 is added to the ARP table.

       o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with overlay
         index IP23 pointing at the local MAC-VRF10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and overlay index=IP23 is found. Since IP23 is
         an overlay index, a recursive route resolution for IP23 is
         required.

       o IP23 is resolved to M2 in the ARP table, and M2 is resolved to
         the tunnel information given by the MAC-VRF (remote VTEP and
         VNI for the VXLAN case).

       o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC.

         . Destination inner MAC = M2.

         . Tunnel information provided by the MAC-VRF FIB (VNI, VTEP
           IPs and MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel information (VNI for the VXLAN case), the
         MAC-VRF10 context is identified for a MAC lookup.

       o Encapsulation is stripped off and based on a MAC lookup
         (assuming MAC forwarding on the egress NVE), the packet is
         forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3 appoints
       TS3 as the new active TS for SN1, TS3 will now own the floating
       IP23 and will signal this new ownership (GARP message or
       similar). Upon receiving the new owner's notification, NVE3 will
       issue a route type 2 for M3-IP23. DGW1 and DGW2 will update their
       ARP tables with the new MAC resolving the floating IP. No changes
       are carried out in the IP-VRF routing table.

5.3 ESI overlay index ("Bump in the wire") use-case

   Figure 5 illustrates an example of inter-subnet forwarding for an IP
   Prefix route that carries a subnet SN1 and uses an ESI as an overlay
   index (ESI23). In this use-case, TS2 and TS3 are layer-2 VA devices
   without any IP address that can be included as an overlay index in
   the GW IP field of the IP Prefix route. Their MAC addresses are M2
   and M3 respectively and are connected to EVI-10. Note that IRB1 and
   IRB2 (in DGW1 and DGW2 respectively) have IP addresses in a subnet
   different from SN1.
710                      NVE2                          DGW1
711               M2 +-----------+ +---------+    +-------------+
712     +---TS2(VA)--|(MAC-VRF10)|-|         |----|(MAC-VRF10)  |
713     |      ESI23 +-----------+ |         |    |    IRB1\    |
714     |        +                 |         |    |     (IP-VRF)|---+
715     |        |                 |         |    +-------------+  _|_
716    SN1       |                 |  VXLAN/ |                    (   )
717     |        |                 |  nvGRE  |         DGW2      ( WAN )
718     |        +       NVE3      |         |    +-------------+ (___)
719     |      ESI23 +-----------+ |         |----|(MAC-VRF10)  |   |
720     +---TS3(VA)--|(MAC-VRF10)|-|         |    |    IRB2\    |   |
721               M3 +-----------+ +---------+    |     (IP-VRF)|---+
722                                               +-------------+

724                    Figure 5 ESI overlay index use-case

726     Since neither TS2 nor TS3 can run any routing protocol or has an IP
727     address assigned, an ESI, i.e. ESI23, will be provisioned on the
728     attachment ports of NVE2 and NVE3. This model supports VA redundancy
729     in a way similar to the one described in section 5.2 for the floating
730     IP overlay index use-case, only using the EVPN Ethernet A-D route
731     instead of the MAC advertisement route to advertise the location of
732     the overlay index. The procedure is explained below:

734     (1) NVE2 advertises the following BGP routes for TS2:

736         o Route type 1 (Ethernet A-D route for EVI-10) containing:
737           ESI=ESI23 and the corresponding tunnel information (VNI/VSID
738           field), as well as the BGP Encapsulation Extended Community as
739           per [EVPN-OVERLAY].

741         o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
742           ESI=ESI23, GW IP address=0 (and BGP Encapsulation Extended
743           Community). The Router's MAC Extended Community defined in
744           [EVPN-INTERSUBNET] is added and carries the MAC address (M2)
745           associated with the TS behind which SN1 sits.

747     (2) NVE3 advertises the following BGP routes for TS3:

749         o Route type 1 (Ethernet A-D route for EVI-10) containing:
750           ESI=ESI23 and the corresponding tunnel information (VNI/VSID
751           field), as well as the BGP Encapsulation Extended Community.
752           Note that if the resiliency mechanism for TS2 and TS3 is in
753           all-active mode, both NVE2 and NVE3 will send the A-D route.
754           Otherwise, that is, if the resiliency is single-active, only the
755           NVE owning the active ESI will advertise the Ethernet A-D
756           route for ESI23.

758         o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
759           ESI=ESI23, GW IP address=0 (and BGP Encapsulation Extended
760           Community). The Router's MAC Extended Community is added and
761           carries the MAC address (M3) associated with the TS behind
762           which SN1 sits.

764     (3) DGW1 and DGW2 import the received routes based on the
765         route-target:

767         o The tunnel information to get to ESI23 is installed in DGW1
768           and DGW2. For the VXLAN use case, the VTEP will be derived
769           from the Ethernet A-D route BGP next-hop and VNI from the
770           VNI/VSID field (see [EVPN-OVERLAY]).

772         o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with overlay
773           index ESI23.

775     (4) When DGW1 receives a packet from the WAN with destination IPx,
776         where IPx belongs to SN1/24:

778         o A destination IP lookup is performed on the DGW1 IP-VRF
779           routing table and overlay index=ESI23 is found. Since ESI23 is
780           an overlay index, a recursive route resolution is required to
781           find the egress NVE where ESI23 resides.

783         o The IP packet destined to IPx is encapsulated with:

785             . Source inner MAC = IRB1 MAC.

787             . Destination inner MAC = M2 (this MAC will be obtained
788               from the Router's MAC Extended Community received along
789               with the RT-5 for SN1).

791             . Tunnel information for the NVO tunnel is provided by the
792               Ethernet A-D route per-EVI for ESI23 (VNI and VTEP IP for
793               the VXLAN case).

795     (5) When the packet arrives at NVE2:

797         o Based on the tunnel information (VNI for the VXLAN case), the
798           MAC-VRF10 context is identified for a MAC lookup (assuming MAC
799           disposition model).

801         o Encapsulation is stripped off and, based on a MAC lookup
802           (assuming MAC forwarding on the egress NVE), the packet is
803           forwarded to TS2, where it will be forwarded to SN1.
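
Steps (3) to (5) above can be sketched in Python as follows. The sketch keeps the RT-5 state and the Ethernet A-D state in separate tables and resolves a prefix only when an A-D route for its ESI is present. Pairing each RT-5 with the A-D route from the same BGP next-hop, so that M2 goes with NVE2 and M3 with NVE3, is an assumption of this example; so are the class and method names, the symbolic MAC labels, and the documentation-range addresses used for SN1 and the VTEPs.

```python
from ipaddress import ip_address, ip_network


class EsiDgw:
    """Toy model of a DGW resolving an ESI overlay index (hypothetical)."""

    def __init__(self, irb_mac):
        self.irb_mac = irb_mac
        self.rt5 = {}  # (prefix, next_hop) -> (esi, router_mac)
        self.ad = {}   # esi -> {next_hop: vni}, from Ethernet A-D routes

    def import_rt5(self, prefix, esi, router_mac, next_hop):
        # Step (3): prefix installed with the ESI as overlay index; the
        # Router's MAC Extended Community is kept for the inner MAC.
        self.rt5[(ip_network(prefix), next_hop)] = (esi, router_mac)

    def import_ad(self, esi, next_hop, vni):
        # Step (3): tunnel information to reach the ESI.
        self.ad.setdefault(esi, {})[next_hop] = vni

    def withdraw_ad(self, esi, next_hop):
        # Single-active case: a non-active NVE has no A-D route.
        self.ad.get(esi, {}).pop(next_hop, None)

    def forward(self, dst_ip):
        """Step (4): return the encapsulation data, or None if unresolved."""
        dst = ip_address(dst_ip)
        candidates = [key for key in self.rt5 if dst in key[0]]
        # Longest prefix first; ties keep the order of import.
        candidates.sort(key=lambda key: key[0].prefixlen, reverse=True)
        for prefix, next_hop in candidates:
            esi, router_mac = self.rt5[(prefix, next_hop)]
            vni = self.ad.get(esi, {}).get(next_hop)
            if vni is not None:  # ESI resolved via an A-D route
                return {"src_mac": self.irb_mac, "dst_mac": router_mac,
                        "vtep": next_hop, "vni": vni}
        return None


# Assumed values: SN1 = 198.51.100.0/24, VNI 10.
dgw1 = EsiDgw(irb_mac="IRB1-MAC")
dgw1.import_rt5("198.51.100.0/24", "ESI23", "M2", "203.0.113.2")  # NVE2
dgw1.import_rt5("198.51.100.0/24", "ESI23", "M3", "203.0.113.3")  # NVE3
dgw1.import_ad("ESI23", "203.0.113.2", 10)  # only NVE2 is active
print(dgw1.forward("198.51.100.7"))
```

With only the A-D route from NVE2 installed, the lookup selects M2 and the NVE2 tunnel even though both RT-5s are present, which matches the single-active behaviour described in step (2).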
805     (6) If the redundancy protocol running between TS2 and TS3 follows an
806         active/standby model and there is a failure, appointing TS3 as
807         the new active TS for SN1, TS3 will now own the connectivity to
808         SN1 and will signal this new ownership. Upon receiving the new
809         owner's notification, NVE3 will issue a route type 1 for ESI23,
810         whereas NVE2 will withdraw its Ethernet A-D route for ESI23. DGW1
811         and DGW2 will update their tunnel information to resolve ESI23.
812         The destination inner MAC will be changed to M3.

814  5.4 IRB forwarding on NVEs for Subnets (IP-VRF-to-IP-VRF)

816     This use-case is similar to the scenario described in "IRB forwarding
817     on NVEs for Tenant Systems" in [EVPN-INTERSUBNET]; however, the new
818     requirement here is the advertisement of IP Prefixes as opposed to
819     only host routes. In the previous examples, the MAC-VRF instance can
820     connect IRB interfaces and any other Tenant Systems connected to it.
821     EVPN provides connectivity for:

823     a) Traffic destined to the IRB IP interfaces as well as

825     b) Traffic destined to IP subnets sitting behind the TS, e.g. SN1 or
826        SN2.

828     In order to provide connectivity for (a), MAC/IP routes (RT-2) are
829     needed so that IRB MACs and IPs can be distributed. Connectivity type
830     (b) is accomplished by the exchange of IP Prefix routes (RT-5) for
831     IPs and subnets sitting behind certain overlay indexes, e.g. GW IP or
832     ESI.

834     In some cases, IP Prefix routes may be advertised for subnets and IPs
835     sitting behind an IRB.
        This use case is depicted in the diagram below
836     and we refer to it as the "IRB forwarding on NVEs for Subnets" or
837     "IP-VRF-to-IP-VRF" use-case:

839                      NVE1
840                 +------------+
841         IP1-----|(MAC-VRF1)  |                   DGW1
842                 |     \   IRB-1(M1)---------+ +--------+
843                 |    (IP-VRF)|----|         |-|(IP-VRF)|----+
844                 |     /      |    |         | +--------+    |
845             |---|(MAC-VRF2)  |    |         |              _|_
846             |   +------------+    |         |             (   )
847          SN1|                     |  VXLAN/ |            ( WAN )
848             |        NVE2         |  nvGRE  |             (___)
849             |   +------------+    |         |               |
850             |---|(MAC-VRF2)  |    |         |    DGW2       |
851                 |     \   IRB-2(M2)         | +--------+    |
852                 |    (IP-VRF)|----|         |-|(IP-VRF)|----+
853                 |     /      |    +---------+ +--------+
854         SN2-----|(MAC-VRF3)  |
855                 +------------+

857            Figure 6 Inter-subnet forwarding on NVEs for Subnets

859     In this case, we need to provide connectivity from/to IP hosts in
860     SN1, SN2, IP1 and hosts sitting at the other end of the WAN.

862     The solution must provide connectivity in this use case, irrespective
863     of whether the data plane between IP-VRFs requires an inner layer-2
864     header.

866     The EVPN route type 5 will be used to advertise the IP Prefixes,
867     along with the Router's MAC Extended Community as defined in
868     [EVPN-INTERSUBNET]. Each NVE/DGW will advertise an RT-5 for each of
869     its prefixes with the following fields:

871         o RD as per [RFC7432].

873         o Eth-Tag ID = 0 assuming VLAN-based service.

875         o IP address length and IP address, as explained in the previous
876           sections.

878         o GW IP address=0 or IRB-IP (see below for further explanation).

880         o ESI=0.

882         o MPLS label or VNI corresponding to the IP-VRF.

884     Each RT-5 will be sent with a route-target identifying the tenant
885     (IP-VRF) and two BGP extended communities:

887         o The first one is the BGP Encapsulation Extended Community, as
888           per [RFC5512], identifying the tunnel type.

890         o The second one is the Router's MAC Extended Community as per
891           [EVPN-INTERSUBNET] containing the MAC address associated with
892           the NVE advertising the route.
              This MAC address identifies the
893           NVE/DGW and MAY be re-used for all the IP-VRFs in the NVE. The
894           Router's MAC Extended Community MUST be sent if the associated
895           RT-5's GW IP Address is zero.

897     If the data plane between IP-VRFs does not require an inner layer-2
898     header (e.g. VXLAN GPE), NVE1 and NVE2 will only send an RT-5 per IP
899     Prefix that they have attached to their respective IP-VRF, e.g. IP1,
900     SN1 and SN2.

902     If the data plane between IP-VRFs requires an inner layer-2 header
903     (e.g. VXLAN or nvGRE), NVE1 and NVE2 will additionally send an RT-2
904     for their IRB interface interconnecting the IP-VRFs for the same
905     tenant. In Figure 6, the IRB interfaces interconnecting IP-VRFs in
906     NVE1 and NVE2 are referred to as IRB-1 and IRB-2 and have the MAC
907     addresses M1 and M2 respectively.

909     The following example illustrates the procedure to advertise and
910     forward packets to SN1/24 (IPv4 prefix advertised from NVE1) for
911     VXLAN tunnels:

913     (1) NVE1 advertises the following BGP routes:

915         o Route type 5 (IP Prefix route) containing:

917             . IPL=24, IP=SN1, VNI=10.

919             . GW IP=0 if IRB-1 is NOT IP-reachable or GW IP=IRB-1-IP1 if
920               IRB-1 is IP-reachable.

922             . [RFC5512] BGP Encapsulation Extended Community with
923               Tunnel-type=VXLAN.

925             . Router's MAC Extended Community that contains M1.

927             . Route-target identifying the tenant (IP-VRF).

929         o Route type 2 (MAC/IP route for IRB-1) containing:

931             . ML=48, M=M1, IPL=0 or 32, VNI=10.

933             . IP=null (if IRB-1 is not IP-reachable) or IRB-1-IP1 (if
934               IRB-1 is IP-reachable).

936             . A [RFC5512] BGP Encapsulation Extended Community with
937               Tunnel-type=VXLAN.

939             . Route-target identifying the tenant. This route-target MAY
940               be the same one used with the RT-5.

942     (2) DGW1 imports the received routes from NVE1:

944         o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
945           route-target.

947             . If GW IP is different from zero, the GW IP - IRB-1-IP1 -
948               will be used as the overlay index for the recursive route
949               resolution to the RT-2 carrying IRB-1-IP1.

951             . If GW IP=0, an implementation MAY use the VNI and next-hop
952               of the RT-5, as well as the MAC address conveyed in the
953               Router's MAC Extended Community (as inner destination MAC
954               address).

956     (3) When DGW1 receives a packet from the WAN with destination IPx,
957         where IPx belongs to SN1/24:

959         o A destination IP lookup is performed on the DGW1 IP-VRF
960           routing table that yields SN1/24.

962             . If RT-5 for SN1/24 had a GW IP=IRB-1-IP1, this GW IP will be
963               used as an overlay index that will be recursively resolved
964               to the tunnel information received from the RT-2.

966             . If the RT-5 for SN1/24 had a GW IP=0, DGW1 MAY not refer to
967               the RT-2.

969         o The IP packet destined to IPx is encapsulated with: Source
970           inner MAC = DGW1 MAC, Destination inner MAC = M1, Source outer
971           IP (source VTEP) = DGW1 IP, Destination outer IP (destination
972           VTEP) = NVE1 IP.

974     (4) When the packet arrives at NVE1:

976         o NVE1 will identify the IP-VRF for an IP-lookup based on the
977           VNI or the VNI and the inner MAC DA (this is implementation
978           specific).

980         o An IP lookup is performed in the routing context, where SN1
981           turns out to be a local subnet associated with MAC-VRF2. A
982           subsequent lookup in the ARP table and the MAC-VRF FIB will
983           provide the forwarding information for the packet in MAC-VRF2.

985  6. Conclusions

987     A new EVPN route type 5 for the advertisement of IP Prefixes is
988     described in this document. This new route type has a differentiated
989     role from the RT-2 route and addresses all the Data Center (or
990     NVO-based networks in general) inter-subnet connectivity scenarios in
991     which an IP Prefix advertisement is required.
        Using this new RT-5, an
992     IP Prefix may be advertised along with an overlay index that can be a
993     GW IP address or an ESI, or without an overlay index, in which case
994     the BGP next-hop will point at the egress NVE and the MAC in the
995     Router's MAC Extended Community will provide the inner MAC
996     destination address to be used. As discussed throughout the document,
997     the EVPN RT-2 does not meet the requirements for all the DC use
998     cases; therefore, this EVPN route type is required.

1000     The EVPN route type 5 decouples the IP Prefix advertisements from the
1001     MAC/IP route advertisements in EVPN, hence:

1003     a) Allows the clean advertisement of IPv4 or IPv6 prefixes in an
1004        NLRI with no MAC addresses in the route key, so that only IP
1005        information is used in BGP route comparisons.

1007     b) Since the route type is different from the MAC/IP Advertisement
1008        route, the advertisement of prefixes will be excluded from all the
1009        procedures defined for the advertisement of VM MACs, e.g. MAC
1010        Mobility or aliasing. As a result, the current EVPN procedures do
1011        not need to be modified.

1013     c) Allows a flexible implementation where the prefix can be linked to
1014        different types of overlay indexes: overlay IP address, overlay
1015        ESI, underlay IP next-hops, etc.

1017     d) An EVPN implementation not requiring IP Prefixes can simply
1018        discard them by looking at the route type value. An unknown route
1019        type MUST be ignored by the receiving NVE/PE.

1021  7. Conventions used in this document

1023     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
1024     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
1025     document are to be interpreted as described in RFC-2119 [RFC2119].

1027  8. Security Considerations
1028  9. IANA Considerations

1030     This document requests the allocation of value 5 in the "EVPN Route
1031     Types" registry defined by [RFC7432] and modification of the registry
1032     as follows:

1034     Value     Description         Reference
1035     5         IP Prefix route     [this document]
1036     6-255     Unassigned

1038  10. References

1040  10.1 Normative References

1042     [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
1043     Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006.

1046     [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
1047     Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet
1048     VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

1051  10.2 Informative References

1053     [EVPN-OVERLAY] Sajassi-Drake et al., "A Network Virtualization
1054     Overlay Solution using EVPN", draft-ietf-bess-evpn-overlay-01.txt,
1055     work in progress, February, 2015

1057     [EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in
1058     EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-00.txt, work in
1059     progress, November, 2014

1061  11. Acknowledgments

1063     The authors would like to thank Mukul Katiyar and Senthil Sathappan
1064     for their valuable feedback and contributions. The following people
1065     also helped improve this document with their feedback: Tony
1066     Przygienda and Thomas Morin.

1068  12. Contributors

1070     In addition to the authors listed on the front page, the following
1071     co-authors have also contributed to this document:

1073     Florin Balus

1075  13. Authors' Addresses

1077     Jorge Rabadan
1078     Alcatel-Lucent
1079     777 E. Middlefield Road
1080     Mountain View, CA 94043 USA
1081     Email: jorge.rabadan@alcatel-lucent.com

1083     Wim Henderickx
1084     Alcatel-Lucent
1085     Email: wim.henderickx@alcatel-lucent.com

1087     Aldrin Isaac
1088     Bloomberg
1089     Email: aisaac71@bloomberg.net

1091     Senad Palislamovic
1092     Alcatel-Lucent
1093     Email: senad.palislamovic@alcatel-lucent.com

1095     John E. Drake
1096     Juniper Networks
1097     Email: jdrake@juniper.net

1099     Ali Sajassi
1100     Cisco
1101     Email: sajassi@cisco.com

1103     Wen Lin
1104     Juniper Networks
1105     Email: wlin@juniper.net