L2VPN Workgroup                                               J. Rabadan
Internet Draft                                             W. Henderickx
                                                         S. Palislamovic
Intended status: Standards Track                          Alcatel-Lucent

J. Drake                                                        F. Balus
Juniper                                                   Nuage Networks

A. Sajassi                                                      A. Isaac
Cisco                                                          Bloomberg

Expires: August 1, 2015                                 January 28, 2015


                    IP Prefix Advertisement in EVPN
              draft-ietf-bess-evpn-prefix-advertisement-00

Abstract

   EVPN provides a flexible control plane that allows intra-subnet
   connectivity in an IP/MPLS and/or an NVO-based network. In NVO
   networks, there is also a need for dynamic and efficient inter-subnet
   connectivity across Tenant Systems and End Devices that can be
   physical or virtual and may not support their own routing protocols.
   This document defines a new EVPN route type for the advertisement of
   IP Prefixes and explains some use-case examples where this new route
   type is used.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 1, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Terminology
   2. Introduction and problem statement
      2.1 Inter-subnet connectivity requirements in Data Centers
      2.2 The requirement for a new EVPN route type
   3. The BGP EVPN IP Prefix route
      3.1 IP Prefix Route encoding
   4. Benefits of using the EVPN IP Prefix route
   5. IP Prefix next-hop use-cases
      5.1 TS IP address next-hop use-case
      5.2 Floating IP next-hop use-case
      5.3 ESI next-hop ("Bump in the wire") use-case
      5.4 IRB forwarding on NVEs for Subnets (IP-VRF-to-IP-VRF)
   6. Conclusions
   7. Conventions used in this document
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1 Normative References
      10.2 Informative References
   11. Acknowledgments
   12. Authors' Addresses

1. Terminology

   GW IP: Gateway IP Address

   IPL: IP address length

   IRB: Integrated Routing and Bridging interface

   ML: MAC address length

   NVE: Network Virtualization Edge

   TS: Tenant System

   VA: Virtual Appliance

   RT-2: EVPN route type 2, i.e., MAC/IP advertisement route

   RT-5: EVPN route type 5, i.e., IP Prefix route

   Overlay next-hop: object used in the IP Prefix route, as described in
   this document. It can be an IP address in the tenant space or an ESI,
   and identifies the next-hop yielded by the IP route lookup at the
   routing context importing the route. An overlay next-hop always needs
   a recursive route resolution on the NVE receiving the IP Prefix
   route, so that the NVE knows to which egress NVE to forward the
   packets.

   Underlay next-hop: IP address sent by BGP along with any EVPN route,
   i.e., the BGP next-hop. It identifies the NVE sending the route and
   is used at the receiving NVE as the VXLAN destination VTEP or NVGRE
   destination end-point.

2. Introduction and problem statement

   Inter-subnet connectivity is required for certain tenants within the
   Data Center. [EVPN-INTERSUBNET] defines some fairly common
   inter-subnet forwarding scenarios where TSes can exchange packets
   with TSes located in remote subnets. In order to meet this
   requirement, [EVPN-INTERSUBNET] describes how MAC/IPs encoded in TS
   RT-2 routes are used not only to populate MAC-VRF and overlay ARP
   tables, but also to populate IP-VRF tables with the encoded TS host
   routes (/32 or /128). In some cases, EVPN may advertise IP Prefixes
   and therefore provide aggregation in the IP-VRF tables, as opposed to
   programming individual host routes. This document complements the
   scenarios described in [EVPN-INTERSUBNET] and defines how EVPN may be
   used to advertise IP Prefixes.

   Section 2.1 describes the inter-subnet connectivity requirements in
   Data Centers.
   Section 2.2 explains why a new EVPN route type is required for IP
   Prefix advertisements. Once the need for a new EVPN route type is
   justified, Sections 3, 4 and 5 describe this route type and how it is
   used in some specific use cases.

2.1 Inter-subnet connectivity requirements in Data Centers

   [EVPN] is used as the control plane for a Network Virtualization
   Overlay (NVO3) solution in Data Centers (DC), where Network
   Virtualization Edge (NVE) devices can be located in Hypervisors or
   TORs, as described in [EVPN-OVERLAYS].

   If we use the term Tenant System (TS) to designate a physical or
   virtual system identified by MAC and IP addresses, and connected to
   an EVPN instance, the following considerations apply:

   o  The Tenant Systems may be Virtual Machines (VMs) that generate
      traffic from their own MAC and IP.

   o  The Tenant Systems may be Virtual Appliance entities (VAs) that
      forward traffic to/from IP addresses of different End Devices
      sitting behind them.

   o  These VAs can be firewalls, load balancers, NAT devices, other
      appliances or virtual gateways with virtual routing instances.

   o  These VAs do not run their own routing protocols and hence rely
      on the EVPN NVEs to advertise the routes on their behalf.

   o  In all these cases, the VA will forward traffic to the Data
      Center using its own source MAC, but the source IP will be the
      one associated with the End Device sitting behind it, or a
      translated IP address (part of a public NAT pool) if the VA is
      performing NAT.

   o  Note that the same IP address could exist behind two of these
      TSes. One example of this would be certain appliance resiliency
      mechanisms, where a virtual IP or floating IP can be owned by
      one of the two VAs running the resiliency protocol (the master
      VA). VRRP is one particular example of this. Another example is
      multi-homed subnets, i.e., the same subnet is connected to two
      VAs.

   o  Although these VAs provide IP connectivity to VMs and subnets
      behind them, they do not always have their own IP interface
      connected to the EVPN NVE; e.g., layer-2 firewalls are examples
      of VAs not supporting IP interfaces.

   The following figure illustrates some of the examples described
   above.

                      NVE1
                  +-----------+
         TS1(VM)--|(MAC-VRF10)|-----+
           IP1/M1 +-----------+     |               DGW1
                                +---------+    +-------------+
                                |         |----|(MAC-VRF10)  |
   SN1---+            NVE2      |         |    |IRB1\        |
         |        +-----------+ |         |    |     (IP-VRF)|---+
   SN2---TS2(VA)--|(MAC-VRF10)|-|         |    +-------------+  _|_
         | IP2/M2 +-----------+ |  VXLAN/ |                    (   )
   IP4---+  <-+                 |  nvGRE  |         DGW2      ( WAN )
              |                 |         |    +-------------+ (___)
            vIP23 (floating)    |         |----|(MAC-VRF10)  |   |
              |                 +---------+    |IRB2\        |   |
   SN1---+  <-+       NVE3        |  |  |      |     (IP-VRF)|---+
         | IP3/M3 +-----------+   |  |  |      +-------------+
   SN3---TS3(VA)--|(MAC-VRF10)|---+  |  |
         |        +-----------+      |  |
   IP5---+                           |  |
                                     |  |
                     NVE4            |  |     NVE5              +--SN5
            +---------------------+  |  | +-----------+         |
   IP6------|(MAC-VRF1)           |  |  +-|(MAC-VRF10)|--TS4(VA)---SN6
            |     \               |  |    +-----------+         |
            |      (IP-VRF)       |--+                 ESI4     +--SN7
            |     /        \IRB3  |
        |---|(MAC-VRF2)(MAC-VRF10)|
     SN4|   +---------------------+

                   Figure 1 DC inter-subnet use-cases

   Where:

   NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same EVI for a
   particular tenant. EVI-10 consists of the collection of MAC-VRF10
   instances defined in all the NVEs. All the hosts connected to EVI-10
   belong to the same IP subnet. The hosts connected to EVI-10 are
   listed below:

   o  TS1 is a VM that generates/receives traffic from/to IP1, where
      IP1 belongs to the EVI-10 subnet.

   o  TS2 and TS3 are Virtual Appliances (VA) that generate/receive
      traffic from/to the subnets and hosts sitting behind them (SN1,
      SN2, SN3, IP4 and IP5). Their IP addresses (IP2 and IP3) belong
      to the EVI-10 subnet and they can also generate/receive traffic.
      When these VAs receive packets destined to their own MAC
      addresses (M2 and M3), they route the packets to the proper
      subnet or host. These VAs do not support routing protocols to
      advertise the subnets connected to them, and can move to a
      different server and NVE when the Cloud Management System decides
      to do so. These VAs may also support redundancy mechanisms for
      some subnets, similar to VRRP, where a floating IP is owned by the
      master VA and only the master VA forwards traffic to a given
      subnet. E.g., vIP23 in Figure 1 is a floating IP that can be owned
      by TS2 or TS3, depending on which one is the master. Only the
      master will forward traffic to SN1.

   o  Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3
      have their own IP addresses that also belong to the EVI-10
      subnet. These IRB interfaces connect the EVI-10 subnet to Virtual
      Routing and Forwarding (VRF) instances that can route the traffic
      to other connected subnets for the same tenant (within the DC or
      at the other end of the WAN).

   o  TS4 is a layer-2 VA that provides connectivity to subnets SN5,
      SN6 and SN7, but does not itself have an IP address in EVI-10.
      TS4 is connected to a physical port on NVE5 assigned to Ethernet
      Segment Identifier 4 (ESI4).

   All the above DC use cases require inter-subnet forwarding, and
   therefore the individual host routes and subnets:

   a) MUST be advertised from the NVEs (since VAs and VMs do not run
      routing protocols) and

   b) MAY be associated with an overlay next-hop that can be a VA IP
      address, a floating IP address or an ESI.

2.2 The requirement for a new EVPN route type

   [EVPN] defines a MAC/IP route (also referred to as RT-2) where a MAC
   address can be advertised together with an IP address length (IPL)
   and IP address (IP).
   While a variable IPL might have been used to indicate the presence of
   an IP prefix in a route type 2, there are several specific use cases
   in which using this route type to deliver IP Prefixes is not
   suitable.

   One example of such use cases is the "floating IP" example described
   in Section 2.1. In this example, we need to decouple the
   advertisement of the prefixes from the advertisement of the floating
   IP (vIP23 in Figure 1) and the MAC associated with it; otherwise the
   solution becomes highly inefficient and does not scale.

   E.g., if we are advertising 1k prefixes from M2 (using RT-2) and the
   floating IP owner changes from M2 to M3, we would need to withdraw 1k
   routes from M2 and re-advertise 1k routes from M3. However, if we use
   a separate route type, we can advertise the 1k routes associated with
   the floating IP address (vIP23) and only one RT-2 for advertising the
   ownership of the floating IP, i.e., vIP23 and M2 in the route type 2.
   When the floating IP owner changes from M2 to M3, a single RT-2
   withdrawal/update is required to indicate the change. The remote DGW
   will not change any of the 1k prefixes associated with vIP23, but
   will only update the ARP resolution entry for vIP23 (now pointing at
   M3).

   Other reasons to decouple the IP Prefix advertisement from the MAC/IP
   route are listed below:

   o  Clean identification, operation and troubleshooting of IP
      Prefixes, not subject to interpretation and independent of the
      IPL and the IP value. E.g., a default IP route 0.0.0.0/0 must
      always be easily and clearly distinguished from the absence of IP
      information.

   o  MAC address information must not be compared by BGP when
      selecting between two IP Prefix routes. If IP Prefixes were to be
      advertised using MAC/IP routes, the MAC information would always
      be present and part of the route key.

   o  IP Prefix routes must not be subject to MAC/IP route procedures
      such as MAC mobility or aliasing. Prefixes advertised from two
      different ESIs do not mean mobility; MACs advertised from two
      different ESIs do mean mobility. Similarly, load balancing for IP
      prefixes is achieved through IP mechanisms such as ECMP, and not
      through MAC route mechanisms such as aliasing.

   o  NVEs that do not need to process IP Prefixes must have an easy
      way to identify an update with an IP Prefix and ignore it, rather
      than processing the MAC/IP route only to find out later that it
      carries a Prefix that must be ignored.

   The following sections describe how EVPN is extended with a new route
   type for the advertisement of IP prefixes and how this route is used
   to address the current and future inter-subnet connectivity
   requirements existing in the Data Center.

3. The BGP EVPN IP Prefix route

   The current BGP EVPN NLRI as defined in [EVPN] is shown below:

      +-----------------------------------+
      |       Route Type (1 octet)        |
      +-----------------------------------+
      |         Length (1 octet)          |
      +-----------------------------------+
      |  Route Type specific (variable)   |
      +-----------------------------------+

   where the Route Type field can contain one of the following values:

      + 1 - Ethernet Auto-Discovery (A-D) route

      + 2 - MAC/IP advertisement route

      + 3 - Inclusive Multicast Route

      + 4 - Ethernet Segment Route

   This document defines an additional route type that will be used for
   the advertisement of IP Prefixes:

      + 5 - IP Prefix Route

   The support for this new route type is OPTIONAL.

   Since this new route type is OPTIONAL, an implementation not
   supporting it MUST ignore the route, based on the unknown route type
   value.

   The detailed encoding of this route and associated procedures are
   described in the following sections.
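
   As an illustration of the rule above, and not part of this
   specification, the following minimal Python sketch walks a buffer of
   EVPN NLRIs using only the Route Type and Length octets of the NLRI
   format shown above, and sets aside route types it does not support
   without parsing them. It assumes `data` holds just the concatenated
   EVPN NLRIs (AFI 25, SAFI 70) taken from an MP_REACH_NLRI attribute;
   the function name `split_evpn_nlris` and the `supported` set are
   hypothetical, and BGP message parsing and error handling beyond
   simple length checks are out of scope.

```python
from typing import Iterable, List, Tuple

# Route types understood by a hypothetical NVE that does not implement RT-5.
SUPPORTED_ROUTE_TYPES = frozenset({1, 2, 3, 4})

Nlri = Tuple[int, bytes]  # (route_type, route-type-specific value)


def split_evpn_nlris(data: bytes,
                     supported: Iterable[int] = SUPPORTED_ROUTE_TYPES
                     ) -> Tuple[List[Nlri], List[Nlri]]:
    """Walk concatenated EVPN NLRIs: Route Type (1) | Length (1) | value.

    Returns (accepted, ignored). Unsupported route types are not parsed;
    the Length octet is enough to step over them.
    """
    supported = frozenset(supported)
    accepted: List[Nlri] = []
    ignored: List[Nlri] = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated EVPN NLRI header")
        route_type, length = data[offset], data[offset + 1]
        end = offset + 2 + length
        if end > len(data):
            raise ValueError("EVPN NLRI length exceeds the buffer")
        value = data[offset + 2:end]
        if route_type in supported:
            accepted.append((route_type, value))
        else:
            # E.g. an RT-5 arriving at an NVE without IP Prefix support.
            ignored.append((route_type, value))
        offset = end
    return accepted, ignored
```

   For instance, a buffer holding a placeholder two-octet type-3 NLRI
   followed by a 34-octet type-5 NLRI returns the type-3 entry in
   `accepted` and the type-5 entry in `ignored`; passing
   supported={1, 2, 3, 4, 5} accepts both.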

3.1 IP Prefix Route encoding

   An IP Prefix advertisement route NLRI consists of the following
   fields:

      +---------------------------------------+
      |             RD (8 octets)             |
      +---------------------------------------+
      |Ethernet Segment Identifier (10 octets)|
      +---------------------------------------+
      |      Ethernet Tag ID (4 octets)       |
      +---------------------------------------+
      |      IP Prefix Length (1 octet)       |
      +---------------------------------------+
      |      IP Prefix (4 or 16 octets)       |
      +---------------------------------------+
      |    GW IP Address (4 or 16 octets)     |
      +---------------------------------------+
      |         MPLS Label (3 octets)         |
      +---------------------------------------+

   where:

   o  RD, Ethernet Tag ID and MPLS Label fields will be used as defined
      in [EVPN] and [EVPN-OVERLAYS].

   o  The Ethernet Segment Identifier will be a non-zero 10-octet
      identifier if the ESI is used as an overlay next-hop. It will be
      zero otherwise.

   o  The IP Prefix Length can be set to a value between 0 and 32
      (bits) for IPv4 and between 0 and 128 for IPv6.

   o  The IP Prefix will be a 32 or 128-bit field (IPv4 or IPv6).

   o  The GW IP (Gateway IP Address) will be a 32 or 128-bit field
      (IPv4 or IPv6), and will encode the overlay IP next-hop for the
      IP Prefixes. The GW IP field can be zero if it is not used as an
      overlay next-hop.

   o  The total route length will indicate the type of prefix (IPv4 or
      IPv6) and the type of GW IP address (IPv4 or IPv6). Note that the
      IP Prefix + the GW IP should have a length of either 64 or 256
      bits (i.e., a total route length of 34 or 58 octets), but never
      160 bits (mixed IPv4 and IPv6 values are not allowed).

   The Ethernet Tag ID, IP Prefix Length and IP Prefix will be part of
   the route key used by BGP to compare routes. The rest of the fields
   will not be part of the route key.

   The route will contain a single overlay next-hop at most, i.e.,
   if the ESI field is different from zero, the GW IP field will be
   zero, and vice versa. The following table shows the different
   inter-subnet use-cases described in this document and the
   corresponding coding of the overlay next-hop in the route type 5
   (RT-5). The IP-VRF-to-IP-VRF (IRB forwarding on NVEs) case is a
   special use-case, where there is no need for an overlay next-hop,
   since the actual next-hop is given by the BGP next-hop. When an
   overlay next-hop is present in the RT-5, the receiving NVE will need
   to perform a recursive route resolution to find out to which egress
   NVE to forward the packets.

   +----------------------------+----------------------------------+
   | Use-case                   | Next-hop in the RT-5 BGP update  |
   +----------------------------+----------------------------------+
   | TS IP address              | GW IP Address                    |
   | Floating IP address        | GW IP Address                    |
   | "Bump in the wire"         | ESI                              |
   | IP-VRF-to-IP-VRF           | BGP next-hop                     |
   +----------------------------+----------------------------------+

4. Benefits of using the EVPN IP Prefix route

   This section clarifies the different functions accomplished by the
   EVPN RT-2 and RT-5 routes, and provides a list of benefits derived
   from using a separate route type for the advertisement of IP Prefixes
   in EVPN.

   [EVPN] describes the content of the BGP EVPN RT-2 specific NLRI,
   i.e., the MAC/IP Advertisement Route, where the IP address length
   (IPL) and IP address (IP) of a specific advertised MAC are encoded.
   The subject of the MAC advertisement route is the MAC address (M) and
   MAC address length (ML) encoded in the route. The MAC mobility and
   other complex procedures are defined around that MAC address. The IP
   address information carries the host IP address required for the ARP
   resolution of the MAC according to [EVPN] and the host route to be
   programmed in the IP-VRF [EVPN-INTERSUBNET].
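
   To make the Section 3.1 encoding concrete next to the RT-2
   description above, the following is a minimal Python sketch of an
   RT-5 encoder and decoder. It covers only the route-type-specific
   value (no Route Type or Length octets), enforces the single overlay
   next-hop and no-mixed-family rules, and returns the route key as
   described in Section 3.1. The function names are hypothetical, and
   the 3-octet MPLS Label field is treated as an opaque 24-bit value:
   how a label or VNI is placed in it follows [EVPN] and [EVPN-OVERLAYS]
   and is not modeled here.

```python
import ipaddress
import struct
from typing import Any, Dict

# Total route length (Length octet) implied by the Section 3.1 layout.
RT5_LEN_V4 = 8 + 10 + 4 + 1 + 4 + 4 + 3     # 34 octets
RT5_LEN_V6 = 8 + 10 + 4 + 1 + 16 + 16 + 3   # 58 octets


def encode_rt5(rd: bytes, esi: bytes, eth_tag: int, prefix: str,
               gw_ip: str, label_field: int) -> bytes:
    """Build the RT-5 route-type-specific value (no Type/Length header)."""
    net = ipaddress.ip_network(prefix)       # strict: host bits must be 0
    gw = ipaddress.ip_address(gw_ip)
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    if net.version != gw.version:
        raise ValueError("mixed IPv4/IPv6 prefix and GW IP not allowed")
    if any(esi) and int(gw) != 0:
        raise ValueError("at most one overlay next-hop (ESI or GW IP)")
    return (rd + esi + struct.pack("!IB", eth_tag, net.prefixlen)
            + net.network_address.packed + gw.packed
            + label_field.to_bytes(3, "big"))  # opaque 24-bit field


def decode_rt5(value: bytes) -> Dict[str, Any]:
    """Parse an RT-5 value; its length tells IPv4 from IPv6."""
    if len(value) == RT5_LEN_V4:
        ip_len, addr_cls = 4, ipaddress.IPv4Address
    elif len(value) == RT5_LEN_V6:
        ip_len, addr_cls = 16, ipaddress.IPv6Address
    else:
        raise ValueError(f"invalid RT-5 length: {len(value)} octets")
    eth_tag, plen = struct.unpack("!IB", value[18:23])
    if plen > ip_len * 8:
        raise ValueError("IP Prefix Length out of range")
    gw_start = 23 + ip_len
    return {
        "rd": value[0:8],
        "esi": value[8:18],
        "gw_ip": addr_cls(value[gw_start:gw_start + ip_len]),
        "label_field": int.from_bytes(value[gw_start + ip_len:], "big"),
        # Section 3.1: only Ethernet Tag ID, IPL and IP Prefix form the key.
        "route_key": (eth_tag, plen, addr_cls(value[23:gw_start])),
    }
```

   Encoding SN1 twice with two different GW IPs (standing in for IP2
   from NVE2 and IP3 from NVE3) yields two values that decode to the
   same route key and differ only in gw_ip, which is the behavior
   described in benefit c) below; an RT-2 key would also carry the MAC.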

   The BGP EVPN route type 5 defined in this document, i.e., the IP
   Prefix Advertisement route, decouples the advertisement of IP
   prefixes from the advertisement of any MAC address related to it.
   This brings some major benefits to NVO-based networks where certain
   inter-subnet forwarding scenarios are required. Some of those
   benefits are:

   a) Upon receiving a route type 2 or type 5, an egress NVE can easily
      distinguish MACs and IPs from IP Prefixes. E.g., an IP prefix with
      IPL=32 being advertised from two different ingress NVEs (as RT-5)
      can be identified as such and be imported in the designated
      routing context as two ECMP routes, as opposed to two MACs
      competing for the same IP.

   b) Similarly, upon receiving a route, an ingress NVE not supporting
      processing of IP Prefixes can easily ignore the update, based on
      the route type.

   c) A MAC route includes the ML, M, IPL and IP in the route key that
      is used by BGP to compare routes, whereas for IP Prefix routes,
      only IPL and IP (as well as Ethernet Tag ID) are part of the route
      key. Advertised IP Prefixes are imported into the designated
      routing context, where there is no MAC information associated with
      IP routes. In the example illustrated in Figure 1, subnet SN1
      should be advertised by NVE2 and NVE3 and interpreted by DGW1 as
      the same route coming from two different next-hops, regardless of
      the MAC address associated with TS2 or TS3. This is easily
      accomplished in the RT-5 by including only the IP information in
      the route key.

   d) By decoupling the MAC from the IP Prefix advertisement procedures,
      we can leave the IP Prefix advertisements out of the MAC mobility
      procedures defined in [EVPN] for MACs. In addition, this allows us
      to have an indirection mechanism for IP Prefixes advertised from a
      MAC/IP that can move between hypervisors. E.g.,
      if there are 1,000 prefixes sitting behind TS2 (Figure 1), NVE2
      will advertise all those prefixes in RT-5 routes associated with
      the next-hop IP2. Should TS2 move to a different NVE, a single MAC
      advertisement route withdrawal for the M2/IP2 route from NVE2 will
      invalidate the 1,000 prefixes, as opposed to having to wait for
      each individual prefix to be withdrawn. This may be easily
      accomplished by using IP Prefix routes that are not tied to a MAC
      address, and using a different MAC/IP route to advertise the
      location and resolution of the overlay next-hop to a MAC address.

5. IP Prefix next-hop use-cases

   The IP Prefix route can use a GW IP or an ESI as an overlay next-hop,
   or no overlay next-hop whatsoever. This section describes some
   use-cases for these next-hop types.

5.1 TS IP address next-hop use-case

   The following figure illustrates an example of inter-subnet
   forwarding for subnets sitting behind Virtual Appliances (on TS2 and
   TS3).

   SN1---+            NVE2                          DGW1
         |        +-----------+ +---------+    +-------------+
   SN2---TS2(VA)--|(MAC-VRF10)|-|         |----|(MAC-VRF10)  |
         | IP2/M2 +-----------+ |         |    |IRB1\        |
   IP4---+                      |         |    |     (IP-VRF)|---+
                                |         |    +-------------+  _|_
                                |  VXLAN/ |                    (   )
                                |  nvGRE  |         DGW2      ( WAN )
   SN1---+            NVE3      |         |    +-------------+ (___)
         | IP3/M3 +-----------+ |         |----|(MAC-VRF10)  |   |
   SN3---TS3(VA)--|(MAC-VRF10)|-|         |    |IRB2\        |   |
         |        +-----------+ +---------+    |     (IP-VRF)|---+
   IP5---+                                     +-------------+

                    Figure 2 TS IP address use-case

   An example of inter-subnet forwarding between subnet SN1/24 and a
   subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
   DGW2 are running BGP EVPN. TS2 and TS3 do not support routing
   protocols, only a static route to forward the traffic to the WAN.
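
   The recursive resolution used in the steps below, and the indirection
   described in benefit d) of Section 4, can be sketched with a toy
   Python model of DGW1's tables: the IP-VRF maps a prefix to its
   overlay next-hop (from RT-5), the ARP table maps that IP to a MAC
   (from RT-2), and the MAC FIB maps the MAC to a tunnel (from the RT-2
   BGP next-hop). The DgwTables class, the documentation-range addresses
   standing in for IP2, M2 and the NVE VTEPs, and the linear
   longest-prefix match are all assumptions made for readability; this
   is not an implementation of any particular DGW.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DgwTables:
    """Toy model of one tenant's tables on a DGW (illustrative only)."""
    ip_vrf: Dict[str, str] = field(default_factory=dict)   # prefix -> GW IP (RT-5)
    arp: Dict[str, str] = field(default_factory=dict)      # GW IP -> MAC (RT-2)
    mac_fib: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # MAC -> (VTEP, VNI)

    def resolve(self, dst: str) -> Optional[Dict[str, object]]:
        """IP lookup followed by the two recursive steps (ARP, MAC FIB)."""
        addr = ipaddress.ip_address(dst)
        matches = [p for p in self.ip_vrf if addr in ipaddress.ip_network(p)]
        if not matches:
            return None
        # Longest-prefix match, done as a linear scan for clarity.
        best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
        gw_ip = self.ip_vrf[best]          # overlay next-hop from the RT-5
        mac = self.arp.get(gw_ip)          # step 1: GW IP -> MAC
        if mac is None or mac not in self.mac_fib:
            return None                    # overlay next-hop not resolved yet
        vtep, vni = self.mac_fib[mac]      # step 2: MAC -> tunnel
        return {"prefix": best, "gw_ip": gw_ip, "inner_dst_mac": mac,
                "vtep": vtep, "vni": vni}


dgw = DgwTables()
# 1,000 hypothetical prefixes behind TS2, all learned via RT-5 with GW IP = IP2.
for i in range(1000):
    dgw.ip_vrf[f"10.{i // 256}.{i % 256}.0/24"] = "192.0.2.2"
dgw.arp["192.0.2.2"] = "00:00:5e:00:53:02"               # IP2 -> M2 (RT-2)
dgw.mac_fib["00:00:5e:00:53:02"] = ("198.51.100.2", 10)  # M2 -> NVE2 VTEP, VNI 10

before = dgw.resolve("10.1.5.7")
# TS2 moves to NVE3: MAC mobility updates only M2's entry; no RT-5 changes.
dgw.mac_fib["00:00:5e:00:53:02"] = ("198.51.100.3", 10)
after = dgw.resolve("10.1.5.7")
```

   Between `before` and `after` only the MAC FIB entry for M2 changes;
   the 1,000 IP-VRF entries still point at IP2, mirroring step (6).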

   (1) NVE2 advertises the following BGP routes on behalf of TS2:

       o  Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
          IP=IP2 and [RFC5512] BGP Encapsulation Extended Community
          with Tunnel-type=VXLAN or NVGRE.

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          ESI=0, GW IP address=IP2 (and BGP Encapsulation Extended
          Community).

   (2) NVE3 advertises the following BGP routes on behalf of TS3:

       o  Route type 2 (MAC/IP route) containing: ML=48, M=M3, IPL=32,
          IP=IP3 (and BGP Encapsulation Extended Community).

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          ESI=0, GW IP address=IP3 (and BGP Encapsulation Extended
          Community).

   (3) DGW1 and DGW2 import both received routes based on the
       route-targets:

       o  Based on the MAC-VRF10 route-target in DGW1 and DGW2, the
          MAC/IP route is imported and M2 is added to the MAC-VRF10
          along with its corresponding tunnel information. For instance,
          if VXLAN is used, the VTEP will be derived from the MAC/IP
          route BGP next-hop (underlay next-hop) and VNI from the
          Ethernet Tag or MPLS fields. IP2 - M2 is added to the ARP
          table.

       o  Based on the MAC-VRF10 route-target in DGW1 and DGW2, the IP
          Prefix route is also imported and SN1/24 is added to the
          designated routing context with next-hop IP2 pointing at the
          local MAC-VRF10. Should ECMP be enabled in the routing
          context, SN1/24 would also be added to the routing table with
          next-hop IP3.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o  A destination IP lookup is performed on the DGW1 IP-VRF
          routing table and next-hop=IP2 is found. Since IP2 is an
          overlay next-hop a recursive route resolution is required for
          IP2.

       o  IP2 is resolved to M2 in the ARP table, and M2 is resolved to
          the tunnel information given by the MAC FIB (remote VTEP and
          VNI for the VXLAN case).

       o  The IP packet destined to IPx is encapsulated with:

          . Source inner MAC = IRB1 MAC

          . Destination inner MAC = M2

          . Tunnel information provided by the MAC-VRF (VNI, VTEP IPs
            and MACs for the VXLAN case)

   (5) When the packet arrives at NVE2:

       o  Based on the tunnel information (VNI for the VXLAN case), the
          MAC-VRF10 context is identified for a MAC lookup.

       o  Encapsulation is stripped off and, based on a MAC lookup
          (assuming MAC forwarding on the egress NVE), the packet is
          forwarded to TS2, where it will be properly routed.

   (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
       be applied to the MAC route IP2/M2, as defined in [EVPN]. Route
       type 5 prefixes are not subject to MAC mobility procedures, hence
       no changes in the DGW VRF routing table will occur for TS2
       mobility, i.e., all the prefixes will still be pointing at IP2 as
       next-hop. There is an indirection for, e.g., SN1/24, which still
       points at next-hop IP2 in the routing table, but IP2 will simply
       be resolved to a different tunnel, based on the outcome of the
       MAC mobility procedures for the MAC/IP route IP2/M2.

   Note that in the opposite direction, TS2 will send traffic based on
   its static-route next-hop information (IRB1 and/or IRB2), and regular
   EVPN procedures will be applied.

5.2 Floating IP next-hop use-case

   Sometimes Tenant Systems (TS) work in active/standby mode, where an
   upstream floating IP, owned by the active TS, is used as the next-hop
   to reach some subnets behind them. This redundancy mode, already
   introduced in Sections 2.1 and 2.2, is illustrated in Figure 3.
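
   In this mode the DGWs receive an RT-5 for SN1 from both NVEs with the
   same GW IP (the floating IP), and, as noted at the end of this
   section, only the RT-5 whose BGP next-hop matches the BGP next-hop of
   the floating IP's RT-2 is considered valid. A minimal Python sketch
   of that check follows; the Rt5 and Rt2 records are hypothetical and
   carry only the fields the rule needs, and the documentation addresses
   stand in for the floating IP, M2/M3 and the NVE2/NVE3 BGP next-hops.

```python
from typing import Dict, List, NamedTuple


class Rt5(NamedTuple):
    prefix: str        # IP Prefix (route key, together with the Ethernet Tag)
    gw_ip: str         # overlay next-hop, here the floating IP
    bgp_next_hop: str  # underlay next-hop: the advertising NVE


class Rt2(NamedTuple):
    ip: str
    mac: str
    bgp_next_hop: str


def valid_rt5(candidates: List[Rt5], rt2_by_ip: Dict[str, Rt2]) -> List[Rt5]:
    """Keep RT-5s whose BGP next-hop matches the RT-2 owning their GW IP."""
    valid = []
    for route in candidates:
        owner = rt2_by_ip.get(route.gw_ip)
        if owner is not None and owner.bgp_next_hop == route.bgp_next_hop:
            valid.append(route)
    return valid


FLOATING_IP = "192.0.2.23"
rt5s = [Rt5("10.0.1.0/24", FLOATING_IP, "198.51.100.2"),   # from NVE2
        Rt5("10.0.1.0/24", FLOATING_IP, "198.51.100.3")]   # from NVE3
rt2s = {FLOATING_IP: Rt2(FLOATING_IP, "00:00:5e:00:53:02", "198.51.100.2")}
before = valid_rt5(rt5s, rt2s)

# TS3 becomes active: NVE3 advertises an RT-2 for M3/IP23 (and, we assume,
# NVE2's RT-2 for M2/IP23 is withdrawn). No RT-5 is touched.
rt2s[FLOATING_IP] = Rt2(FLOATING_IP, "00:00:5e:00:53:03", "198.51.100.3")
after = valid_rt5(rt5s, rt2s)
```

   Before and after the switchover the IP-VRF view is identical (SN1 via
   the floating IP); only the RT-2, and with it the valid RT-5 and the
   ARP entry, change.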

                      NVE2                          DGW1
                  +-----------+ +---------+    +-------------+
     +---TS2(VA)--|(MAC-VRF10)|-|         |----|(MAC-VRF10)  |
     |     IP2/M2 +-----------+ |         |    |IRB1\        |
     |      <-+                 |         |    |     (IP-VRF)|---+
     |        |                 |         |    +-------------+  _|_
    SN1     vIP23 (floating)    |  VXLAN/ |                    (   )
     |        |                 |  nvGRE  |         DGW2      ( WAN )
     |      <-+       NVE3      |         |    +-------------+ (___)
     |     IP3/M3 +-----------+ |         |----|(MAC-VRF10)  |   |
     +---TS3(VA)--|(MAC-VRF10)|-|         |    |IRB2\        |   |
                  +-----------+ +---------+    |     (IP-VRF)|---+
                                               +-------------+

             Figure 3 Floating IP next-hop for redundant TS

   In this example, assuming TS2 is the active TS and owns the floating
   IP (vIP23 in Figure 3, referred to as IP23 below):

   (1) NVE2 advertises the following BGP routes for TS2:

       o  Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
          IP=IP23 (and BGP Encapsulation Extended Community).

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          ESI=0, GW IP address=IP23 (and BGP Encapsulation Extended
          Community).

   (2) NVE3 advertises the following BGP routes for TS3:

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          ESI=0, GW IP address=IP23 (and BGP Encapsulation Extended
          Community).

   (3) DGW1 and DGW2 import the received routes based on the
       route-target:

       o  M2 is added to the MAC-VRF10 MAC FIB along with its
          corresponding tunnel information. For the VXLAN use case, the
          VTEP will be derived from the MAC/IP route BGP next-hop and
          the VNI from the Ethernet Tag or MPLS fields. IP23 - M2 is
          added to the ARP table.

       o  SN1/24 is added to the designated routing context in DGW1 and
          DGW2 with next-hop IP23 pointing at the local MAC-VRF10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o  A destination IP lookup is performed on the DGW1 IP-VRF
          routing table and next-hop=IP23 is found. Since IP23 is an
          overlay next-hop, a recursive route resolution for IP23 is
          required.

       o  IP23 is resolved to M2 in the ARP table, and M2 is resolved to
          the tunnel information given by the MAC-VRF (remote VTEP and
          VNI for the VXLAN case).

       o  The IP packet destined to IPx is encapsulated with:

          . Source inner MAC = IRB1 MAC

          . Destination inner MAC = M2

          . Tunnel information provided by the MAC FIB (VNI, VTEP IPs
            and MACs for the VXLAN case)

   (5) When the packet arrives at NVE2:

       o  Based on the tunnel information (VNI for the VXLAN case), the
          MAC-VRF10 context is identified for a MAC lookup.

       o  Encapsulation is stripped off and, based on a MAC lookup
          (assuming MAC forwarding on the egress NVE), the packet is
          forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3 appoints
       TS3 as the new active TS for SN1, TS3 will now own the floating
       IP23 and will signal this new ownership (GARP message or
       similar). Upon receiving the new owner's notification, NVE3 will
       issue a route type 2 for M3-IP23. DGW1 and DGW2 will update their
       ARP tables with the new MAC resolving the floating IP. No changes
       are carried out in the VRF routing table.

   In the DGW1/2 BGP RIB, there will be two route type 5 routes for SN1
   (from NVE2 and NVE3), but only the one with the same BGP next-hop as
   the IP23 RT-2 BGP next-hop will be valid.

5.3 ESI next-hop ("Bump in the wire") use-case

   The following figure illustrates an example of inter-subnet
   forwarding for a subnet route that uses an ESI as an overlay
   next-hop. In this use-case, TS2 and TS3 are layer-2 VA devices
   without any IP address that can be included as an overlay next-hop in
   the GW IP field of the IP Prefix route.
                       NVE2                         DGW1
                   +-----------+ +---------+    +-------------+
      +---TS2(VA)--|(MAC-VRF10)|-|         |----|(MAC-VRF10)  |
      |  ESI23     +-----------+ |         |    |    IRB1\    |
      |    +                     |         |    |     (IP-VRF)|---+
      |    |                     |         |    +-------------+  _|_
     SN1   |                     |  VXLAN/ |                    (   )
      |    |                     |  nvGRE  |        DGW2       ( WAN )
      |    +           NVE3      |         |    +-------------+ (___)
      |  ESI23     +-----------+ |         |----|(MAC-VRF10)  |   |
      +---TS3(VA)--|(MAC-VRF10)|-|         |    |    IRB2\    |   |
                   +-----------+ +---------+    |     (IP-VRF)|---+
                                                +-------------+
                     Figure 5 ESI next-hop use-case

   Since neither TS2 nor TS3 can run any routing protocol and neither
   has an IP address assigned, an ESI, i.e. ESI23, will be provisioned
   on the attachment ports of NVE2 and NVE3. This model supports VA
   redundancy in a similar way to the one described in section 5.2 for
   the floating IP next-hop use-case, except that the EVPN Ethernet A-D
   route is used instead of the MAC advertisement route to advertise
   the location of the overlay next-hop. The procedure is explained
   below:

   (1) NVE2 advertises the following BGP routes for TS2:

       o Route type 1 (Ethernet A-D route for EVI-10) containing:
         ESI=ESI23 and the corresponding tunnel information (Ethernet
         Tag and/or MPLS label), as well as the BGP Encapsulation
         Extended Community. Assuming the ESI is active on NVE2, NVE2
         will advertise this route.

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0 (and BGP Encapsulation Extended
         Community).

   (2) NVE3 advertises the following BGP routes for TS3:

       o Route type 1 (Ethernet A-D route for EVI-10) containing:
         ESI=ESI23 and the corresponding tunnel information (Ethernet
         Tag and/or MPLS label), as well as the BGP Encapsulation
         Extended Community. NVE3 will advertise this route assuming
         the ESI is active on NVE3. Note that if the resiliency
         mechanism for TS2 and TS3 is in active-active mode, both NVE2
         and NVE3 will send the A-D route.
         Otherwise, that is, if the resiliency is active-standby, only
         the NVE owning the active ESI will advertise the Ethernet A-D
         route for ESI23.

       o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
         ESI=ESI23, GW IP address=0 (and BGP Encapsulation Extended
         Community).

   (3) DGW1 and DGW2 import the received routes based on the route-
       target:

       o The tunnel information to get to ESI23 is installed in DGW1
         and DGW2. For the VXLAN use case, the VTEP will be derived
         from the Ethernet A-D route BGP next-hop and the VNI from the
         Ethernet Tag or MPLS fields (see [EVPN-OVERLAYS]).

       o SN1/24 is added to the designated routing context in DGW1 and
         DGW2 with next-hop ESI23 pointing at the local MAC-VRF10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and next-hop=ESI23 is found. Since ESI23 is an
         overlay next-hop, a recursive route resolution is required to
         find the egress NVE where ESI23 resides.

       o The IP packet destined to IPx is encapsulated with:

          . Source inner MAC = IRB1 MAC

          . Destination inner MAC = M2 (this MAC will be obtained
            after a lookup in the IP-VRF ARP table or in the MAC-VRF10
            FDB table associated with ESI23).

          . Tunnel information provided by the Ethernet A-D route for
            ESI23 (VNI, VTEP IP and MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

       o Based on the tunnel information (VNI for the VXLAN case), the
         MAC-VRF10 context is identified for a MAC lookup (assuming a
         MAC disposition model).

       o The encapsulation is stripped off and, based on a MAC lookup
         (assuming MAC forwarding on the egress NVE), the packet is
         forwarded to TS2, where it will be properly forwarded.

   (6) If the redundancy protocol running between TS2 and TS3 follows an
       active/standby model and a failure causes TS3 to be appointed as
       the new active TS for SN1, TS3 will then own the connectivity to
       SN1 and will signal this new ownership. Upon receiving the new
       owner's notification, NVE3 will issue a route type 1 for ESI23,
       whereas NVE2 will withdraw its Ethernet A-D route for ESI23. DGW1
       and DGW2 will update their tunnel information to resolve ESI23.
       No changes are carried out in the IP-VRF routing table.

   In the DGW1/2 BGP RIB, there will be two route type 5 routes for SN1
   (from NVE2 and NVE3), but only the one whose BGP next-hop matches
   the BGP next-hop of the route type 1 for ESI23 will be valid.

5.4 IRB forwarding on NVEs for Subnets (IP-VRF-to-IP-VRF)

   This use-case is similar to the scenario described in "IRB
   forwarding on NVEs for Tenant Systems" in [EVPN-INTERSUBNET];
   however, the new requirement here is the advertisement of IP
   Prefixes as opposed to only host routes. In the previous examples,
   the MAC-VRF instance can connect IRB interfaces and any other Tenant
   Systems connected to it. EVPN provides connectivity for:

   a) Traffic destined to the IRB IP interfaces, as well as

   b) Traffic destined to IP subnets sitting behind the TS, e.g. SN1 or
      SN2.

   In order to provide connectivity for (a), we need MAC/IP routes
   (RT-2) distributing IRB MACs and IPs. Connectivity type (b) is
   accomplished by the exchange of IP Prefix routes (RT-5) for IPs and
   subnets sitting behind certain overlay next-hops.

   In some cases, subnets may be advertised in IP Prefix routes without
   any overlay next-hop, since the RT-5 itself provides all the
   forwarding information required to send the packets to the egress
   NVE and no recursive route resolution is needed.
   This use case is depicted in the diagram below, and we refer to it
   as the "IRB forwarding on NVEs for Subnets" or "IP-VRF-to-IP-VRF"
   use-case:

                     NVE1
                +------------+
    IP1-----|(MAC-VRF1)  |                   DGW1
            |   \        |    +---------+ +--------+
            |    (IP-VRF)|----|         |-|(IP-VRF)|----+
            |   /        |    |         | +--------+    |
        |---|(MAC-VRF2)  |    |         |              _|_
        |   +------------+    |         |             (   )
     SN1|                     |  VXLAN/ |            ( WAN )
        |        NVE2         |  nvGRE  |             (___)
        |   +------------+    |         |               |
        |---|(MAC-VRF2)  |    |         |    DGW2       |
            |   \        |    |         | +--------+    |
            |    (IP-VRF)|----|         |-|(IP-VRF)|----+
            |   /        |    +---------+ +--------+
    SN2-----|(MAC-VRF3)  |
            +------------+

          Figure 6 Inter-subnet forwarding on NVEs for Subnets

   In this case, we need to provide connectivity from/to IP hosts in
   SN1 and SN2, IP1, and hosts sitting at the other end of the WAN.
   There is no need to define IRB interfaces to interconnect the IP-VRF
   instances among the NVEs for the same tenant. This is why we refer
   to this solution as the "IP-VRF-to-IP-VRF" solution.

   In this case, the EVPN route type 5 will be used to advertise the IP
   Prefixes, along with the Router's MAC Extended Community as defined
   in [EVPN-INTERSUBNET]. Each NVE/DGW will advertise an RT-5 for each
   of its subnet prefixes with the following fields:

       o RD as per [EVPN].

       o Eth-Tag ID = 0, assuming a VLAN-based service.

       o IP address length and IP address, as explained in the previous
         sections.

       o GW IP address=0 and ESI=0, that is, no overlay next-hop is
         required in this use-case, since the BGP next-hop is enough to
         find the egress NVE to forward the packets to.

       o MPLS label or VNID corresponding to the IP-VRF.

   Each RT-5 will be sent with a route-target identifying the tenant
   (IP-VRF) and two BGP extended communities:

       o The first one is the BGP Encapsulation Extended Community, as
         per [RFC5512], identifying the tunnel type.

       o The second one is the Router's MAC Extended Community as per
         [EVPN-INTERSUBNET], containing the MAC address associated with
         the NVE advertising the route. This MAC address identifies the
         NVE/DGW and MAY be re-used for all the IP-VRFs in the node.
         The ingress NVE will use this MAC address as the inner MAC
         destination address in the packets forwarded to the owner of
         the RT-5.

   Example of prefix advertisement for the IPv4 prefix SN1/24 advertised
   from NVE1:

   (1) NVE1 advertises the following BGP route for SN1:

       o Route type 5 (IP Prefix route) containing: Eth-Tag=0, IPL=24,
         IP=SN1, MPLS Label=10. An [RFC5512] BGP Encapsulation Extended
         Community will be sent, where Tunnel-type=VXLAN or NVGRE. A
         Router's MAC Extended Community will also be sent along with
         the RT-5, where the Router's MAC address value will contain
         the NVE1 MAC.

   (2) DGW1 imports the received route from NVE1 and SN1/24 is added to
       the designated IP-VRF. The next-hop for SN1/24 will be given by
       the route type 5 BGP next-hop (NVE1), which is resolved to a
       tunnel. For instance, if the tunnel is VXLAN based, the BGP
       next-hop will be resolved to a VXLAN tunnel where: destination
       VTEP=NVE1 IP, VNI=10, inner destination MAC=NVE1 MAC (derived
       from the Router's MAC Extended Community value).

   (3) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o A destination IP lookup is performed on the DGW1 IP-VRF
         routing table and next-hop="NVE1 IP" is found. The tunnel
         information to encapsulate the packet will be derived from the
         route type 5 received for SN1.

       o The IP packet destined to IPx is encapsulated with: source
         inner MAC = DGW1 MAC, destination inner MAC = NVE1 MAC, source
         outer IP (source VTEP) = DGW1 IP, destination outer IP
         (destination VTEP) = NVE1 IP.

   (4) When the packet arrives at NVE1:

       o Based on the tunnel information (VNI for the VXLAN case), the
         routing context is identified for an IP lookup.

       o An IP lookup is performed in the routing context, where SN1
         turns out to be a local subnet associated with MAC-VRF2. A
         subsequent lookup in the ARP table and the MAC-VRF FIB will
         return the forwarding information for the packet in EVI-2.

6. Conclusions

   This document describes a new EVPN route type 5 for the
   advertisement of IP Prefixes. This new route type has a role
   distinct from that of the RT-2 route and addresses all the Data
   Center (or, in general, NVO-based network) inter-subnet
   connectivity scenarios in which an IP Prefix advertisement is
   required. Using this new RT-5, an IP Prefix may be advertised along
   with an overlay next-hop, which can be a GW IP address or an ESI.
   Alternatively, it may be advertised without an overlay next-hop, in
   which case the BGP next-hop will point at the egress NVE and the MAC
   in the Router's MAC Extended Community will provide the inner MAC
   destination address to be used. As discussed throughout the
   document, the existing EVPN RT-2 does not meet the requirements for
   all the DC use cases; therefore, a new EVPN route type is required.

   This new EVPN route type 5 decouples the IP Prefix advertisements
   from the MAC route advertisements in EVPN, hence:

   a) Allows the clean and clear advertisement of IPv4 or IPv6 prefixes
      in an NLRI with no MAC addresses in the route key, so that only IP
      information is used in BGP route comparisons.

   b) Since the route type is different from the MAC/IP advertisement
      route, the advertisement of prefixes will be excluded from all the
      procedures defined for the advertisement of VM MACs, e.g. MAC
      Mobility or aliasing. As a result, the current EVPN procedures do
      not need to be modified.

   c) Allows a flexible implementation where the prefix can be linked to
      different types of next-hops: overlay IP address, overlay ESI,
      underlay IP next-hops, etc.

   d) An EVPN implementation not requiring IP Prefixes can simply
      discard them by looking at the route type value. An unknown route
      type MUST be ignored by the receiving NVE/PE.

7. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

8. Security Considerations

9. IANA Considerations

10. References

10.1 Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation
             Subsequent Address Family Identifier (SAFI) and the BGP
             Tunnel Encapsulation Attribute", RFC 5512, April 2009.

10.2 Informative References

   [EVPN]    Sajassi et al., "BGP MPLS Based Ethernet VPN",
             draft-ietf-l2vpn-evpn-11.txt, work in progress, October
             2014.

   [EVPN-OVERLAYS] Sajassi-Drake et al., "A Network Virtualization
             Overlay Solution using EVPN",
             draft-ietf-bess-evpn-overlay-00.txt, work in progress,
             November 2014.

   [EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in
             EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-00.txt,
             work in progress, November 2014.

11. Acknowledgments

   The authors would like to thank Mukul Katiyar and Senthil Sathappan
   for their valuable feedback and contributions. The following people
   also helped improve this document with their feedback: Antoni
   Przygienda and Thomas Morin.

12. Authors' Addresses

   Jorge Rabadan
   Alcatel-Lucent
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@alcatel-lucent.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.com

   Florin Balus
   Nuage Networks
   Email: florin@nuagenetworks.net

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   Senad Palislamovic
   Alcatel-Lucent
   Email: senad.palislamovic@alcatel-lucent.com

   John E. Drake
   Juniper Networks
   Email: jdrake@juniper.net

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com