L2VPN Workgroup                                               J. Rabadan
Internet Draft                                             W. Henderickx
                                                         S. Palislamovic
Intended status: Standards Track                          Alcatel-Lucent

                                                                      F.
                                                                   Balus
                                                          Nuage Networks

                                                                A. Isaac
                                                               Bloomberg

Expires: January 16, 2014                                  July 15, 2013


                    IP Prefix Advertisement in E-VPN
            draft-rabadan-l2vpn-evpn-prefix-advertisement-00


Abstract

   E-VPN provides a flexible control plane that allows intra-subnet
   connectivity in an IP/MPLS and/or an NVO-based network. In Data
   Centers, there is also a need for dynamic and efficient inter-subnet
   connectivity across Tenant Systems and End Devices that can be
   physical or virtual and may not support their own routing protocols.
   This document defines a new E-VPN route type for the advertisement of
   IP Prefixes and explains how E-VPN should be used to provide
   inter-subnet connectivity with the flexibility required by the Data
   Center applications.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction and problem statement
      1.1 Inter-subnet connectivity requirements in Data Centers
      1.2 The requirement for advertising IP prefixes in E-VPN
      1.3 The requirement for a new E-VPN route type
   2. The BGP E-VPN IP Prefix route
      2.1. IP Prefix Route encoding
      2.2. BGP remote-next-hop attribute
   3. Procedures associated to the advertisement of IP Prefixes
      3.1. Usage of the MAC advertisement and IP Prefix
           advertisement routes
      3.2. Inter-subnet connectivity for TS
      3.3. Inter-subnet connectivity for redundant TS (floating IP)
      3.4. Inter-subnet connectivity for IRB interfaces
         3.4.1. Inter-subnet connectivity for unnumbered IRB
                interfaces
   4. Conclusions
   5. Conventions used in this document
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Acknowledgments
   10. Authors' Addresses

1. Introduction and problem statement

   Inter-subnet connectivity is required within the Data Center;
   therefore, IP Prefixes must be advertised in the control plane. This
   section explains why IP-VPN [RFC4364] procedures cannot be used for
   such advertisements and why the existing E-VPN MAC route type does
   not meet the Data Center requirements for the advertisement of IP
   Prefixes; hence a new E-VPN route type is proposed.

   Section 1.1 describes the inter-subnet connectivity requirements in
   Data Centers. Sections 1.2 and 1.3 explain why neither IP-VPN nor
   the existing E-VPN route types meet the requirements for IP Prefix
   advertisements. Once the need for a new E-VPN route type is
   justified, sections 2 and 3 describe this route type and how it
   is used in some specific use cases.

1.1 Inter-subnet connectivity requirements in Data Centers

   [E-VPN] is used as the control plane for a Network Virtualization
   Overlay (NVO3) solution in Data Centers (DC), where Network
   Virtualization Edge (NVE) devices can be located in Hypervisors or
   TORs, as described in [E-VPN-OVERLAYS].

   If we use the term Tenant System (TS) to designate a physical or
   virtual system identified by MAC and IP addresses, and connected to
   an E-VPN instance, the following considerations apply:

      o The Tenant Systems may be Virtual Machines (VMs) that generate
        traffic from their own MAC and IP.

      o The Tenant Systems may be Virtual Appliance entities (VAs) that
        forward traffic to/from IP addresses of different End Devices
        sitting behind them.

      o These VAs can be firewalls, load balancers, NAT devices, other
        appliances or virtual gateways with virtual routing instances.

      o These VAs do not have their own routing protocols and hence
        rely on the E-VPN NVEs to advertise the routes on their
        behalf.

      o In all these cases, the VA will forward traffic to the Data
        Center using its own source MAC, but the source IP will be the
        one associated to the End Device sitting behind it or a
        translated IP address (part of a public NAT pool) if the VA is
        performing NAT.

      o Note that the same IP address could exist behind two of these
        TS. One example of this would be certain appliance resiliency
        mechanisms, where a virtual IP or floating IP can be owned by
        one of the two VAs running the resiliency protocol (the master
        VA). VRRP is one particular example of this. Another example
        is multi-homed subnets, i.e. the same subnet is connected to
        two VAs.

   The following figure illustrates some of the examples described
   above.

                     NVE1
                  +--------+
         TS1(VM)--|(EVI-10)|---------+
           IP1/M1 +--------+         |              DGW1
                                +---------+    +-------------+
                                |         |----|(EVI-10)     |
   SN1---+           NVE2       |         |    |   IRB1\     |
         |        +--------+    |         |    |        (VRF)|---+
   SN2---TS2(VA)--|(EVI-10)|----|         |    +-------------+  _|_
         | IP2/M2 +--------+    | VXLAN/  |                    (   )
   IP4---+  <-+                 | NVGRE   |         DGW2      ( WAN )
              |                 |         |    +-------------+ (___)
            vIP23 (floating)    |         |----|(EVI-10)     |   |
              |                 +---------+    |   IRB2\     |   |
   SN1---+  <-+      NVE3         |  |         |        (VRF)|---+
         | IP3/M3 +--------+      |  |         +-------------+
   SN3---TS3(VA)--|(EVI-10)|------+  |
         |        +--------+         |
   IP5---+                           |
                                     |
                     NVE4            |
            +---------------------+  |
   IP6------|(EVI-1)              |  |
            |      \      IRB3    |  |
            |       (VRF)-(EVI-10)|--+
            |      /              |
        |---|(EVI-2)              |
     SN4|   +---------------------+

                  Figure 1 DC inter-subnet use-cases

   Where:

   NVE1, NVE2, NVE3, NVE4, DGW1 and DGW2 share the same E-VPN for a
   particular tenant. EVI-10 is the corresponding E-VPN instance on each
   element, and all the hosts connected to that instance belong to the
   same IP subnet. The hosts connected to E-VPN 10 are listed below:

      o TS1 is a VM that generates/receives traffic from/to IP1, where
        IP1 belongs to the E-VPN 10 subnet.

      o TS2 and TS3 are Virtual Appliances (VA) that generate/receive
        traffic from/to the subnets and hosts sitting behind them
        (SN1, SN2, SN3, IP4 and IP5). Their IP addresses (IP2 and IP3)
        belong to the E-VPN subnet and they can also generate/receive
        traffic. When these VAs receive packets destined to their own
        MAC addresses (M2 and M3) they will route the packets to the
        proper subnet or host. These VAs do not support routing
        protocols to advertise the subnets connected to them and can
        move to a different server and NVE when the Cloud Management
        System decides to do so. These VAs may also support redundancy
        mechanisms for some subnets, similar to VRRP, where a floating
        IP is owned by the master VA and only the master VA forwards
        traffic to a given subnet. E.g.: vIP23 in figure 1 is a
        floating IP that can be owned by TS2 or TS3 depending on who
        the master is. Only the master will forward traffic to SN1.

      o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3
        have their own IP addresses that belong to the E-VPN 10 subnet
        too. These IRB interfaces connect the E-VPN 10 subnet to
        Virtual Routing and Forwarding (VRF) instances that can route
        the traffic to other connected subnets for the same tenant
        (within the DC or at the other end of the WAN). On some
        occasions, the IRB interfaces do not terminate IP traffic
        themselves and therefore they do not need any IP address
        configured. In such cases, we will refer to these special IRB
        interfaces as "unnumbered" IRB interfaces.

   All the above DC use cases use individual IP hosts and subnets for
   intra/inter connectivity. Therefore, their IP addresses MUST be
   advertised:

   a) From the NVEs (since VAs and VMs do not run routing protocols) and
   b) Associated to a next-hop that can be a VA IP address, a floating
      IP address, an IRB IP address or a MAC address.

1.2 The requirement for advertising IP prefixes in E-VPN

   In all the inter-subnet connectivity cases discussed in section 1.1
   there is a need to advertise IP prefixes in the control plane that
   cannot be satisfied by using [RFC4364] due to the following
   requirements, specific to NVO-based Data Centers:

      o The data plane in NVO-based Data Centers is not based on IP
        over a GRE or MPLS tunnel as required by [RFC4364], but on
        Ethernet over an IP tunnel, such as VXLAN or NVGRE.

      o The IP prefixes in the DC must be advertised with a
        flexibility that does not exist in IP-VPNs. For instance:

        a) The advertised next-hop for a given IP prefix can be an
           IRB IP address (see section 3.4), a floating IP address (see
           section 3.3) or even a MAC address (see section 3.4.1). In
           the future, the ESI could also be defined as a next-hop for
           the advertised prefixes.

        b) As stated by [E-VPN-OVERLAYS], VXLAN or NVGRE virtual
           identifiers can have a global or a local scope. The
           implementation MUST support the flexibility to advertise IP
           Prefixes associated to a global identifier (32-bit value
           encoded in the E-VPN Ethernet Tag ID) or a locally
           significant identifier (20-bit value encoded in the MPLS
           label field). At the moment, [RFC4364] can only advertise
           Prefixes associated to a locally significant identifier
           (MPLS label).

      o IP prefixes must be advertised by NVE devices that have no VRF
        instances defined and no capability to process IP-VPN
        prefixes. These NVE devices just support E-VPN and advertise
        IP Prefixes on behalf of some connected Tenant Systems. In
        other words: any attempt to solve this problem by simply using
        [RFC4364] routes requires that any E-VPN deployment be
        accompanied by a concurrent IP-VPN topology, which is not
        possible in most cases.

      o Finally, Data Center providers want to use a single BGP
        Subsequent Address Family (AFI/SAFI) for the advertisement of
        addresses within the Data Center, i.e. BGP E-VPN only, as
        opposed to using E-VPN and IP-VPN in a concurrent topology.
        This minimizes the control plane overhead in TORs and
        Hypervisors and simplifies the operations.

   E-VPN is extended - as described in this document - to advertise IP
   prefixes with the flexibility required by the current and future Data
   Center applications.

1.3 The requirement for a new E-VPN route type

   [E-VPN] defines a MAC route (or route type 2) where a MAC address can
   be advertised together with an IP address length (IPL) and IP address
   (IP). While a variable IPL might be used to indicate the presence of
   an IP prefix in a route type 2, there are several specific use cases
   in which using this route type to deliver IP Prefixes is not
   suitable.

   One example of such use cases is the "floating IP" example described
   in section 1.1. In this example we need to decouple the advertisement
   of the prefixes from the advertisement of the floating IP (vIP23 in
   figure 1) and the MAC associated to it; otherwise the solution
   becomes highly inefficient and does not scale.

   E.g.: if we are advertising 1k prefixes from M2 (using route type 2)
   and the floating IP owner changes from M2 to M3, we would need to
   withdraw 1k routes from M2 and re-advertise 1k routes from M3.
   However, if we use a separate route type, we can advertise the 1k
   routes associated to the floating IP address (vIP23) and only one
   route type 2 for advertising the ownership of the floating IP, i.e.
   vIP23 and M2 in the route type 2. When the floating IP owner changes
   from M2 to M3, a single route type 2 withdraw/update is required to
   indicate the change.
   The remote DGW will not change any of the 1k
   prefixes associated to vIP23, but will only update the ARP resolution
   entry for vIP23 (now pointing at M3).

   Any other attempt to improve the efficiency of the solution when
   using non-MAC-decoupled Prefix advertisements will result in
   dependencies on the Cloud Management System (if ESIs are to be used)
   and changes in the current E-VPN semantics. The DC applications
   require mechanisms to provide IP Prefix resiliency independent of the
   E-VPN procedures.

   Other reasons to decouple the IP Prefix advertisement from the MAC
   route are listed below:

      o Clean identification, operation and troubleshooting of IP
        Prefixes, not subject to interpretation and independent of the
        IPL and the IP value. E.g.: An IP address for ARP resolution
        must always be clearly distinguished from a /32 IP Prefix, and
        a default IP route 0.0.0.0/0 must always be easily and clearly
        distinguished from the absence of IP information.

      o MAC address information must not be compared by BGP when
        selecting between two IP Prefix routes. If IP Prefixes are to
        be advertised using MAC routes, the MAC information is always
        present and part of the route key.

      o IP Prefix routes must not be subject to MAC route procedures
        such as MAC Mobility or aliasing. Prefixes advertised from two
        different ESIs do not mean mobility; MACs advertised from two
        different ESIs do mean mobility. Similarly, load balancing for
        IP prefixes is achieved through IP mechanisms such as ECMP,
        and not through MAC route mechanisms such as aliasing.

      o NVEs that do not require processing IP Prefixes must have an
        easy way to identify an update with an IP Prefix and ignore
        it, rather than processing the MAC route only to find out
        later that it carries a Prefix that must be ignored.
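
   The floating-IP example earlier in this section can be made concrete
   with a small model. The Python sketch below is illustrative only: the
   tuple layout, the function names and the 10.x.y.0/24 prefixes are
   assumptions made for this example and are not part of any E-VPN
   encoding. It counts the routes that have to be withdrawn and newly
   advertised when ownership of vIP23 moves from M2 to M3, first with
   the prefixes carried in route type 2 and then with a separate prefix
   route.

```python
def advertised_routes(owner_mac, prefixes, decoupled):
    """Return the set of routes advertised while owner_mac owns vIP23."""
    if decoupled:
        # Prefixes point at the floating IP; a single route type 2
        # binds the floating IP to the MAC of its current owner.
        routes = {("prefix", p, "vIP23") for p in prefixes}
        routes.add(("type2", owner_mac, "vIP23"))
    else:
        # Prefixes ride in route type 2, so the MAC is in every route.
        routes = {("type2", owner_mac, p) for p in prefixes}
    return routes


def churn(before, after):
    """Return (routes withdrawn, routes newly advertised)."""
    return len(before - after), len(after - before)


# 1000 hypothetical /24 prefixes behind the floating IP.
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]

coupled = churn(advertised_routes("M2", prefixes, decoupled=False),
                advertised_routes("M3", prefixes, decoupled=False))
separate = churn(advertised_routes("M2", prefixes, decoupled=True),
                 advertised_routes("M3", prefixes, decoupled=True))

print(coupled)   # (1000, 1000)
print(separate)  # (1, 1)
```

   In this simplified model the coupled case touches every prefix route,
   while the decoupled case only replaces the single binding between
   vIP23 and its owner's MAC.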

   The following sections describe how E-VPN is extended with a new
   route type for the advertisement of prefixes and how this route is
   used to address the current and future inter-subnet connectivity
   requirements existing in the Data Center.

2. The BGP E-VPN IP Prefix route

   The current BGP E-VPN NLRI as defined in [E-VPN] is shown below:

      +-----------------------------------+
      |      Route Type (1 octet)         |
      +-----------------------------------+
      |        Length (1 octet)           |
      +-----------------------------------+
      |  Route Type specific (variable)   |
      +-----------------------------------+

   Where the route type field can contain one of the following specific
   values:

      + 1 - Ethernet Auto-Discovery (A-D) route

      + 2 - MAC advertisement route

      + 3 - Inclusive Multicast Route

      + 4 - Ethernet Segment Route

   This document defines an additional route type that will be used for
   the advertisement of IP Prefixes:

      + 5 - IP Prefix Route

   The support for this new route type is OPTIONAL.

   By using a separate route type for IP prefix advertisements, there is
   a clean separation of functions between route types, i.e. route type
   2 or MAC Advertisement route will be used for MAC and ARP resolution
   advertisement, whereas route type 5 or IP Prefix route will be used
   for the advertisement of prefixes. Since this new route type is
   OPTIONAL, an implementation not supporting it will easily ignore the
   route, based on the route type value.

   The detailed encoding of this route and associated procedures are
   described in the following sections.

2.1.
IP Prefix Route encoding

   An IP Prefix advertisement route type specific E-VPN NLRI consists of
   the following fields:

      +---------------------------------------+
      |             RD (8 octets)             |
      +---------------------------------------+
      |Ethernet Segment Identifier (10 octets)|
      +---------------------------------------+
      |      Ethernet Tag ID (4 octets)       |
      +---------------------------------------+
      |      IP Address Length (1 octet)      |
      +---------------------------------------+
      |      IP Address (4 or 16 octets)      |
      +---------------------------------------+
      |         MPLS Label (3 octets)         |
      +---------------------------------------+

   Where:

      o RD, Ethernet Tag ID and MPLS Label fields will be used as
        defined in [E-VPN] and [E-VPN-OVERLAYS].

      o The Ethernet Segment Identifier will be zero for IP prefix
        advertisements in this version of the document, and may be
        re-used in the future for other purposes.

      o The IP address length can be set to a value between 0 and 32
        (bits) for IPv4 and between 0 and 128 for IPv6.

      o The IP address will be a 32 or 128-bit field (IPv4 or IPv6).

      o The total route length will indicate the type of prefix (IPv4
        or IPv6).

   The Eth-Tag ID, IP address length and IP address will be part of the
   route key used by BGP to compare routes. The rest of the fields will
   be out of the route key.

2.2. BGP remote-next-hop attribute

   The BGP remote-next-hop attribute [BGP-REMOTE-NH] will be sent along
   with the IP Prefix advertisement to indicate the next-hop behind
   which the advertised prefix is located. The following table shows the
   different types of next-hops defined in this document and their
   corresponding encoding in the BGP remote-next-hop attribute.

      +--------------------+----------------------------------+
      | Prefix next-hop    | Field in the remote-nh attribute |
      +--------------------+----------------------------------+
      | MAC address        | sub-TLV (for VXLAN or NVGRE)     |
      | IRB IP address     | tunnel address (IPv4 or IPv6)    |
      | Floating IP address| tunnel address (IPv4 or IPv6)    |
      +--------------------+----------------------------------+

3. Procedures associated to the advertisement of IP Prefixes

   This section describes the separate function of each E-VPN
   advertisement route: route type 2 for MAC/IP advertisements and route
   type 5 for IP Prefixes.

   After defining the role of each route type and the benefits of using
   a separate route for IP Prefixes, the procedures associated to the
   advertisement of prefixes will be explained in three different use
   cases.

3.1. Usage of the MAC advertisement and IP Prefix advertisement routes

   [E-VPN] describes the content of the BGP E-VPN route type 2 specific
   NLRI, i.e. MAC Advertisement Route, where the IP address length (IPL)
   and IP address (IP) of a specific advertised MAC are encoded. The
   subject of the MAC advertisement route is the MAC address (M) and MAC
   address length (ML) encoded in the route. The MAC mobility and other
   complex procedures are defined around that MAC address. The IP
   address information carries the host IP address required for the ARP
   resolution of the MAC.

   The BGP E-VPN route type 5 defined in this document, i.e. IP Prefix
   Advertisement route, decouples the advertisement of IP prefixes from
   the advertisement of any MAC address related to it. This brings some
   major benefits to NVO-based networks where inter-subnet forwarding is
   required. Some of those benefits are:

   a) Upon receiving a route type 2 or type 5, an egress NVE can easily
      distinguish MACs and IPs for ARP resolution from IP Prefixes. E.g.
      an IP prefix with IPL=32 being advertised from two different
      ingress NVEs (as route type 5) can be identified as such and be
      imported in the designated routing context as two ECMP routes, as
      opposed to two ARP entries competing for the same IP.

   b) Similarly, upon receiving a route, an egress NVE not supporting
      processing IP Prefixes can easily ignore the update, based on the
      route type.

   c) A MAC route includes the ML, M, IPL and IP in the route key that
      is used by BGP to compare routes. Advertised IP Prefixes are
      imported into the designated routing context, where there is no
      MAC information associated to IP routes. In the example
      illustrated in figure 1, subnet SN1 should be advertised by NVE2
      and NVE3 and interpreted by DGW1 as the same route coming from two
      different next-hops, regardless of the MAC address associated to
      TS2 or TS3. This is easily accomplished in the route type 5 by
      including only the IP information in the route key.

   d) By decoupling the MAC from the IP Prefix advertisement procedures,
      we can leave the IP prefix advertisements out of the MAC mobility
      procedures defined in [E-VPN] for MACs. In addition, this allows
      us to have an indirection mechanism for IP prefixes advertised
      from a MAC/IP that can move between hypervisors. E.g. if there are
      1,000 prefixes sitting behind TS2 (figure 1), NVE2 will advertise
      all those prefixes in type 5 routes associated to the next-hop
      IP2. Should TS2 move to a different NVE, a single MAC
      advertisement route withdraw for the M2/IP2 route from NVE2 will
      invalidate the 1,000 prefixes, as opposed to having to wait for
      each individual prefix to be withdrawn. This may be easily
      accomplished by using a different IP Prefix route type that is not
      tied to a MAC address.

3.2.
Inter-subnet connectivity for TS

   The following figure illustrates an example of inter-subnet
   forwarding for subnets sitting behind Virtual Appliances (on TS2 and
   TS3).

   SN1---+           NVE2                           DGW1
         |        +--------+    +---------+    +-------------+
   SN2---TS2(VA)--|(EVI-10)|----|         |----|(EVI-10)     |
         | IP2/M2 +--------+    |         |    |   IRB1\     |
   IP4---+                      |         |    |        (VRF)|---+
                                |         |    +-------------+  _|_
                                | VXLAN/  |                    (   )
                                | NVGRE   |         DGW2      ( WAN )
   SN1---+           NVE3       |         |    +-------------+ (___)
         | IP3/M3 +--------+    |         |----|(EVI-10)     |   |
   SN3---TS3(VA)--|(EVI-10)|----|         |    |   IRB2\     |   |
         |        +--------+    +---------+    |        (VRF)|---+
   IP5---+                                     +-------------+

               Figure 2 Inter-subnet forwarding for TS

   An example of inter-subnet forwarding between subnet SN1/24 and a
   subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
   DGW2 are running BGP E-VPN. TS2 and TS3 do not support routing
   protocols, only a static route to forward the traffic to the WAN.

   (1) NVE2 advertises the following BGP routes on behalf of TS2:

      o Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32,
        IP=IP2

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        remote-nh tunnel address=IP2

   (2) NVE3 advertises the following BGP routes on behalf of TS3:

      o Route type 2 (MAC route) containing: ML=48, M=M3, IPL=32,
        IP=IP3

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        remote-nh tunnel address=IP3

   (3) DGW1 and DGW2 import both received routes based on the RT:

      o Based on the EVI-10 route-target in DGW1 and DGW2, the MAC
        route is imported and M2 is added to the EVI-10 MAC FIB along
        with its corresponding tunnel information. For the VXLAN use
        case, the VTEP will be derived from the MAC route BGP next-hop
        and VNI from the Ethernet Tag or MPLS fields (see
        [E-VPN-OVERLAYS]). IP2 - M2 is added to the ARP table.

      o Based on the EVI-10 route-target in DGW1 and DGW2, the IP
        Prefix route is also imported and SN1/24 is added to the
        designated routing context with next-hop IP2 pointing at the
        local EVI-10. Should ECMP be enabled in the routing context,
        SN1/24 would also be added to the routing table with next-hop
        IP3.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
   where IPx belongs to SN1/24:

      o A destination IP lookup is performed on the DGW1 VRF routing
        table and next-hop=IP2 is found. The tunnel information to
        encapsulate the packet will be derived from the route-type 2
        (MAC route) received for M2/IP2.

      o IP2 is resolved to M2 in the ARP table, and M2 is resolved to
        the tunnel information given by the MAC FIB (remote VTEP and
        VNI for the VXLAN case).

      o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC

         . Destination inner MAC = M2

         . Tunnel information provided by the MAC FIB (VNI, VTEP IPs
           and MACs for the VXLAN case)

   (5) When the packet arrives at NVE2:

      o Based on the tunnel information (VNI for the VXLAN case), the
        EVI-10 context is identified for a MAC lookup.

      o Encapsulation is stripped off and, based on a MAC lookup
        (assuming MAC forwarding on the egress NVE), the packet is
        forwarded to TS2, where it will be properly routed.

   (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
   be applied to the MAC route IP2/M2, as defined in [E-VPN]. Route type
   5 prefixes are not subject to MAC mobility procedures, hence no
   changes in the DGW VRF routing table will occur for TS2 mobility,
   i.e. all the prefixes will still be pointing at IP2 as next-hop.
   There is an indirection for e.g. SN1/24, which still points at
   next-hop IP2 in the routing table, but IP2 will be simply resolved to
   a different tunnel, based on the outcome of the MAC mobility
   procedures for the MAC route IP2/M2.
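
   The lookup chain used by DGW1 in the steps above (routing table, ARP
   table, MAC FIB) and the effect of TS2 moving to NVE3 can be sketched
   in a few lines of Python. This is a minimal illustration under stated
   assumptions, not an implementation: the addresses, the MAC value, the
   VNI and the table and function names are all hypothetical stand-ins
   for SN1, IP2, M2 and the NVE tunnel endpoints.

```python
import ipaddress

# Hypothetical values standing in for SN1, IP2, M2 and the NVE VTEPs.
SN1 = ipaddress.ip_network("198.51.100.0/24")
IP2 = ipaddress.ip_address("192.0.2.2")
M2 = "00:00:5e:00:53:02"
VTEP_NVE2 = "203.0.113.2"
VTEP_NVE3 = "203.0.113.3"
VNI_EVI10 = 10

routing_table = {SN1: IP2}              # learned from the route type 5
arp_table = {IP2: M2}                   # learned from the route type 2
mac_fib = {M2: (VTEP_NVE2, VNI_EVI10)}  # learned from the route type 2


def resolve(dst):
    """Return (inner destination MAC, VTEP, VNI) for a destination IP."""
    dst = ipaddress.ip_address(dst)
    # Longest-prefix match in the routing context.
    matches = [net for net in routing_table if dst in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    next_hop = routing_table[best]      # SN1/24 -> IP2
    mac = arp_table[next_hop]           # IP2 -> M2
    vtep, vni = mac_fib[mac]            # M2 -> tunnel information
    return mac, vtep, vni


before = resolve("198.51.100.7")

# TS2 moves to NVE3: only the MAC route changes, so only the MAC FIB
# entry is updated. The routing table and ARP table are left untouched.
mac_fib[M2] = (VTEP_NVE3, VNI_EVI10)
after = resolve("198.51.100.7")

print(before)
print(after)
```

   The prefix entry is never rewritten in this sketch; the same lookup
   simply ends at a different tunnel endpoint once the MAC FIB entry for
   M2 has been updated.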

   Note that in the opposite direction, TS2 will send traffic based on
   its static-route next-hop information (IRB1 and/or IRB2), and regular
   E-VPN procedures will be applied.

3.3. Inter-subnet connectivity for redundant TS (floating IP)

   Sometimes Tenant Systems (TS) work in active/standby mode where an
   upstream floating IP - owned by the active TS - is used as the
   next-hop to get to some subnets behind. This redundancy mode, already
   introduced in sections 1.1 and 1.3, is illustrated in Figure 3.

                     NVE2                           DGW1
                  +--------+    +---------+    +-------------+
     +---TS2(VA)--|(EVI-10)|----|         |----|(EVI-10)     |
     |     IP2/M2 +--------+    |         |    |   IRB1\     |
     |      <-+                 |         |    |        (VRF)|---+
     |        |                 |         |    +-------------+  _|_
    SN1     vIP23 (floating)    | VXLAN/  |                    (   )
     |        |                 | NVGRE   |         DGW2      ( WAN )
     |      <-+      NVE3       |         |    +-------------+ (___)
     |     IP3/M3 +--------+    |         |----|(EVI-10)     |   |
     +---TS3(VA)--|(EVI-10)|----|         |    |   IRB2\     |   |
                  +--------+    +---------+    |        (VRF)|---+
                                               +-------------+

            Figure 3 Inter-subnet forwarding for redundant TS

   In this example, assuming TS2 is the active TS and owns IP23:

   (1) NVE2 advertises the following BGP routes for TS2:

      o Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32,
        IP=IP23

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        remote-nh tunnel address=IP23

   (2) NVE3 advertises the following BGP routes for TS3:

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        remote-nh tunnel address=IP23

   (3) DGW1 and DGW2 import both received routes based on the RT:

      o M2 is added to the EVI-10 MAC FIB along with its corresponding
        tunnel information. For the VXLAN use case, the VTEP will be
        derived from the MAC route BGP next-hop and VNI from the
        Ethernet Tag or MPLS fields (see [E-VPN-OVERLAYS]). IP23 - M2
        is added to the ARP table.

      o SN1/24 is added to the designated routing context in DGW1 and
        DGW2 with next-hop IP23 pointing at the local EVI-10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
   where IPx belongs to SN1/24:

      o A destination IP lookup is performed on the DGW1 VRF routing
        table and next-hop=IP23 is found. The tunnel information to
        encapsulate the packet will be derived from the route-type 2
        (MAC route) received for M2/IP23.

      o IP23 is resolved to M2 in the ARP table, and M2 is resolved to
        the tunnel information given by the MAC FIB (remote VTEP and
        VNI for the VXLAN case).

      o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC

         . Destination inner MAC = M2

         . Tunnel information provided by the MAC FIB (VNI, VTEP IPs
           and MACs for the VXLAN case)

   (5) When the packet arrives at NVE2:

      o Based on the tunnel information (VNI for the VXLAN case), the
        EVI-10 context is identified for a MAC lookup.

      o Encapsulation is stripped off and, based on a MAC lookup
        (assuming MAC forwarding on the egress NVE), the packet is
        forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3 appoints
   TS3 as the new active TS for SN1, TS3 will now own the floating IP23
   and will signal this new ownership (GARP message or similar). Upon
   receiving the new owner's notification, NVE3 will issue a route type
   2 for M3-IP23. DGW1 and DGW2 will update their ARP tables with the
   new MAC resolving the floating IP. No changes are carried out in the
   VRF routing table.

   In the DGW1/2 BGP RIB, there will be two route type 5 routes for SN1
   (from NVE2 and NVE3) but only the one with the same BGP next-hop as
   the IP23 route type 2 BGP next-hop will be valid.

3.4. Inter-subnet connectivity for IRB interfaces

   In some other cases, the NVEs and DGWs will have just IRB interfaces
   as hosts in the E-VPN instance. Figure 4 illustrates an example.

                  NVE1
         +---------------------+                    DGW1
   IP1---|(EVI-1)              |               +-------------+
         |      \   IRB3       |  +---------+  |(EVI-10)     |
         |       (VRF)-(EVI-10)|--|         |--|   IRB1\     |
         |      /              |  |         |  |        (VRF)|---+
       |-|(EVI-2)              |  |         |  +-------------+  _|_
    SN1| +---------------------+  |         |                  (   )
       | +---------------------+  | VXLAN/  |       DGW2      ( WAN )
       |-|(EVI-2)              |  | nvGRE   |  +-------------+ (___)
         |      \   IRB4       |  |         |  |(EVI-10)     |   |
         |       (VRF)-(EVI-10)|--|         |--|   IRB2\     |   |
         |      /              |  +---------+  |        (VRF)|---+
   SN2---|(EVI-3)              |               +-------------+
         +---------------------+
                  NVE2

         Figure 4 Inter-subnet forwarding for IRB interfaces

   In this case:

   (1) NVE1 advertises the following BGP routes for SN1 resolution:

      o Route type 2 (MAC route) containing: ML=48, M=IRB3-MAC,
        IPL=32, IP=IRB3-IP

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        remote-nh tunnel address=IRB3-IP

   (2) NVE2 advertises the following BGP routes for SN1 resolution:

      o Route type 2 (MAC route) containing: ML=48, M=IRB4-MAC,
        IPL=32, IP=IRB4-IP

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        remote-nh tunnel address=IRB4-IP

   (3) DGW1 and DGW2 import both received routes based on the RT:

      o IRB3-MAC and IRB4-MAC are added to the EVI-10 MAC FIB along
        with their corresponding tunnel information. For the VXLAN use
        case, the VTEP will be derived from the MAC route BGP next-hop
        and the VNI from the Ethernet Tag or MPLS fields (see
        [E-VPN-OVERLAYS]). The IRB3-MAC - IRB3-IP and IRB4-MAC -
        IRB4-IP bindings are added to the ARP table.

      o SN1/24 is added to the designated routing context in DGW1 and
        DGW2 with next-hop IRB3-IP (and/or IRB4-IP) pointing at the
        local EVI-10.

   Forwarding procedures similar to those described in the previous
   use cases are followed.

3.4.1. Inter-subnet connectivity for unnumbered IRB interfaces

   In the previous example, the E-VPN instance can connect IRB
   interfaces and any other Tenant Systems connected to it.
   E-VPN provides connectivity for:

   a) Traffic destined to the IRB IP interfaces, as well as
   b) Traffic destined to IP subnets sitting behind the IRB interfaces,
      e.g. SN1 or SN2.

   In order to provide connectivity for (a), we need MAC routes (route
   type 2) distributing IRB MACs and IPs. Connectivity type (b) is
   accomplished by the exchange of IP Prefix routes (route type 5) for
   IPs and subnets sitting behind IRBs. As discussed in this document,
   prefixes are advertised along with their corresponding remote
   next-hop tunnel address, and those tunnel addresses are used to link
   prefixes to MAC/IPs advertised in MAC routes (type 2).

   In some cases, connectivity type (a) (see above) is not required and
   the E-VPN instance is connecting only IRB interfaces, which are never
   the final destination of any packet. This use case is depicted in the
   diagram below and we refer to it as the "unnumbered IRB interface"
   use case:

                NVE1
           +------------+
   IP1-----|(EVI-1)     |                    DGW1
           |      \     |    +---------+    +-----+
           |       (VRF)|----|         |----|(VRF)|----+
           |      /     |    |         |    +-----+    |
       |---|(EVI-2)     |    |         |              _|_
       |   +------------+    |         |             (   )
    SN1|                     | VXLAN/  |            ( WAN )
       |       NVE2          | nvGRE   |             (___)
       |   +------------+    |         |               |
       |---|(EVI-2)     |    |         |     DGW2      |
           |      \     |    |         |    +-----+    |
           |       (VRF)|----|         |----|(VRF)|----+
           |      /     |    +---------+    +-----+
   SN2-----|(EVI-3)     |
           +------------+

    Figure 5 Inter-subnet forwarding for unnumbered IRB interfaces

   In this case, we need to provide connectivity from/to IP hosts in
   SN1, SN2, IP1 and hosts sitting at the other end of the WAN. The
   E-VPN in the core just connects all the IRBs in NVE1, NVE2, DGW1 and
   DGW2, but there will not be any IP host in this core E-VPN that is
   the final destination of any IP packet.

   Therefore there is no need to define IRB IP addresses (IRBs are not
   represented in the diagram).
   This is the reason why we refer to this solution as the
   "unnumbered Ethernet IRB" solution.

   In this case, the proposal is to use E-VPN type 5 routes and the BGP
   Remote-Next-Hop attribute, where the following information is
   carried:

      o The route type 5 Eth-Tag ID can contain the core instance VNI
        if the VNI is global. Otherwise, for locally significant VNIs,
        an MPLS label field may be added with a 20-bit VNI encoded in
        the label space, as per [E-VPN-OVERLAYS].

      o Route type 5 IP address length and IP address, as explained in
        the previous section.

      o Remote next-hop Tunnel Type is: TBD for VXLAN and TBD for
        NVGRE (TBD by IANA).

      o Remote next-hop Tunnel Address is populated with zeros,
        meaning that the prefix next-hop is an "unnumbered IRB".

      o Remote next-hop sub-TLV (for VXLAN/NVGRE) in the Tunnel
        Parameters field: contains the next-hop MAC address associated
        with the unnumbered IRB interface. This MAC address identifies
        the NVE/DGW and can be re-used for all the VRFs in the node.

   Example of prefix advertisement for the IPv4 prefix SN1/24 advertised
   from NVE1:

   (1) NVE1 advertises the following BGP route for SN1:

      o Route type 5 (IP Prefix route) containing: Eth-Tag=VNI=10
        (assuming global VNI), IPL=24, IP=SN1. In addition, a
        Remote-NH attribute will be sent, where: Tunnel-type=VXLAN or
        NVGRE, and a Sub-TLV will contain a MAC address=NVE1 MAC.

      o As discussed, no MAC route is advertised for this core E-VPN.

   (2) DGW1 imports the received route from NVE1 and SN1/24 is added to
   the designated routing context. The next-hop for SN1/24 will be given
   by the route type 5 BGP next-hop (NVE1), which is resolved to a
   tunnel. For instance, if the tunnel is VXLAN based, the BGP next-hop
   will be resolved to a VXLAN tunnel where: destination-VTEP=NVE1 IP,
   VNI=10, inner destination MAC=NVE1 MAC (derived from the remote-nh
   attribute).

   (3) When DGW1 receives a packet from the WAN with destination IPx,
   where IPx belongs to SN1/24:

      o A destination IP lookup is performed on the DGW1 VRF routing
        table and next-hop="NVE1 IP" is found. The tunnel information
        to encapsulate the packet will be derived from the route type
        5 received for SN1.

      o The IP packet destined to IPx is encapsulated with: Source
        inner MAC = DGW1 MAC, Destination inner MAC = NVE1 MAC, Source
        outer IP (source VTEP) = DGW1 IP, Destination outer IP
        (destination VTEP) = NVE1 IP

   (4) When the packet arrives at NVE1:

      o Based on the tunnel information (VNI for the VXLAN case), the
        routing context is identified for an IP lookup.

      o An IP lookup is performed in the routing context, where SN1
        turns out to be a local subnet associated with EVI-2. A
        subsequent lookup in the ARP table and the EVI-2 MAC FIB will
        return the forwarding information for the packet in EVI-2.

4. Conclusions

   This document proposes a new E-VPN route type 5 for the
   advertisement of IP Prefixes. This new route type has a role
   distinct from that of route type 2, i.e. the MAC advertisement
   route, and addresses all the inter-subnet connectivity scenarios
   required in the Data Center. As discussed throughout the document,
   IP-VPN cannot be used in an NVO-based DC to advertise IP Prefixes,
   and the existing E-VPN route type 2 does not meet the requirements
   for all the DC use cases; therefore, a new E-VPN route type is
   required.

   This new E-VPN route type 5 decouples the IP Prefix advertisements
   from the MAC route advertisements in E-VPN, hence:

   a) Allows the clean announcement of IPv4 or IPv6 prefixes in an NLRI
      with no MAC addresses in the route key, so that only IP
      information is used in BGP route comparisons.

   b) Since the route type is different from the MAC advertisement
      route, the advertisement of prefixes will be excluded from all the
      procedures defined for the advertisement of VM MACs, e.g. MAC
      Mobility or aliasing. As a result, the current E-VPN procedures do
      not need to be modified.

   c) Allows a flexible implementation where the prefix can be linked to
      different types of next-hops: MAC address, IP address, IRB IP
      address, ESI, etc. These MAC or IP addresses do not need to reside
      in the advertising NVE.

   d) An E-VPN implementation not requiring IP Prefixes can simply
      discard them by looking at the route type value.

5. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

6. Security Considerations

7. IANA Considerations

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

8.2. Informative References

   [E-VPN]   Sajassi et al., "BGP MPLS Based Ethernet VPN",
             draft-ietf-l2vpn-evpn-03.txt, work in progress,
             February 2013.

   [E-VPN-OVERLAYS] Sajassi-Drake et al., "A Network Virtualization
             Overlay Solution using E-VPN",
             draft-sd-l2vpn-evpn-overlay-01.txt, work in progress,
             February 2013.

   [BGP-REMOTE-NH] Van de Velde et al., "BGP Remote-Next-Hop",
             draft-vandevelde-idr-remote-next-hop-03.txt, work in
             progress, October 2012.

9. Acknowledgments

   The authors would like to thank Mukul Katiyar and Senthil Sathappan
   for their valuable feedback and contributions.

10. Authors' Addresses

   Jorge Rabadan
   Alcatel-Lucent
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@alcatel-lucent.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.com

   Florin Balus
   Nuage Networks
   Email: florin@nuagenetworks.net

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   Senad Palislamovic
   Alcatel-Lucent
   Email: senad.palislamovic@alcatel-lucent.com