idnits 2.17.1 draft-rabadan-l2vpn-evpn-prefix-advertisement-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (October 21, 2013) is 3837 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC4364' is mentioned on line 299, but not defined == Missing Reference: 'EVPN' is mentioned on line 635, but not defined == Missing Reference: 'RFC2119' is mentioned on line 942, but not defined == Outdated reference: A later version (-11) exists of draft-ietf-l2vpn-evpn-03 == Outdated reference: A later version (-03) exists of draft-sd-l2vpn-evpn-overlay-01 Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 L2VPN Workgroup J. Rabadan 3 Internet Draft W. Henderickx 4 S. Palislamovic 5 Intended status: Standards Track Alcatel-Lucent 7 F. Balus 8 Nuage Networks 10 A. 
Isaac 11 Bloomberg 13 Expires: April 24, 2014 October 21, 2013 15 IP Prefix Advertisement in E-VPN 16 draft-rabadan-l2vpn-evpn-prefix-advertisement-01 18 Abstract 20 E-VPN provides a flexible control plane that allows intra-subnet 21 connectivity in an IP/MPLS and/or an NVO-based network. In Data 22 Centers, there is also a need for dynamic and efficient inter- 23 subnet connectivity across Tenant Systems and End Devices that can be 24 physical or virtual and may not support their own routing protocols. 25 This document defines a new E-VPN route type for the advertisement of 26 IP Prefixes and explains some use-case examples where this new route- 27 type is used. 29 Status of this Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF), its areas, and its working groups. Note that 36 other groups may also distribute working documents as Internet- 37 Drafts. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 The list of current Internet-Drafts can be accessed at 45 http://www.ietf.org/ietf/1id-abstracts.txt 46 The list of Internet-Draft Shadow Directories can be accessed at 47 http://www.ietf.org/shadow.html 49 This Internet-Draft will expire on April 24, 2014. 51 Copyright Notice 53 Copyright (c) 2013 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents 58 (http://trustee.ietf.org/license-info) in effect on the date of 59 publication of this document.
Please review these documents 60 carefully, as they describe your rights and restrictions with respect 61 to this document. Code Components extracted from this document must 62 include Simplified BSD License text as described in Section 4.e of 63 the Trust Legal Provisions and are provided without warranty as 64 described in the Simplified BSD License. 66 Table of Contents 68 1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 3 69 2. Introduction and problem statement . . . . . . . . . . . . . . 3 70 2.1 Inter-subnet connectivity requirements in Data Centers . . . 3 71 2.2 The requirement for advertising IP prefixes in E-VPN . . . . 6 72 2.3 The requirement for a new E-VPN route type . . . . . . . . . 7 73 3. The BGP E-VPN IP Prefix route . . . . . . . . . . . . . . . . . 9 74 3.1 IP Prefix Route encoding . . . . . . . . . . . . . . . . . . 9 75 4. Benefits of using the E-VPN IP Prefix route . . . . . . . . . . 11 76 5. IP Prefix next-hop use-cases . . . . . . . . . . . . . . . . . 12 77 5.1 TS IP address next-hop use-case . . . . . . . . . . . . . . 12 78 5.2 Floating IP next-hop use-case . . . . . . . . . . . . . . . 15 79 5.3 IRB IP next-hop use-case . . . . . . . . . . . . . . . . . . 16 80 5.4 ESI next-hop ("Bump in the wire") use-case . . . . . . . . . 18 81 6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 20 82 7. Conventions used in this document . . . . . . . . . . . . . . . 21 83 8. Security Considerations . . . . . . . . . . . . . . . . . . . . 21 84 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 21 85 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 86 10.1 Normative References . . . . . . . . . . . . . . . . . . . 21 87 10.2 Informative References . . . . . . . . . . . . . . . . . . 21 88 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21 89 12. Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 21 91 1. 
Terminology 93 GW IP: Gateway IP Address 95 IPL: IP address length 97 IRB: Integrated Routing and Bridging interface 99 ML: MAC address length 101 NVE: Network Virtualization Edge 103 TS: Tenant System 105 VA: Virtual Appliance 107 Overlay next-hop: object used in the IP Prefix route, as described in 108 this document. It can be an IP address in the tenant space or an ESI, 109 and identifies the next-hop to be used in IP lookups for a given IP 110 Prefix at the routing context importing the route. 112 Underlay next-hop: IP address sent by BGP along with any E-VPN route, 113 i.e. BGP next-hop. It identifies the NVE sending the route and it is 114 used at the receiving NVE as the VXLAN destination VTEP or NVGRE 115 destination end-point. 117 2. Introduction and problem statement 119 Inter-subnet connectivity is required within the Data Center, 120 and therefore IP Prefixes must be advertised in the control plane. This 121 section explains why IP-VPN [RFC4364] procedures are not recommended 122 for such advertisements and why the existing E-VPN MAC route type 123 does not meet the Data Center requirements for the advertisement of 124 IP Prefixes, hence a new E-VPN route type is proposed. 126 Section 2.1 describes the inter-subnet connectivity requirements in 127 Data Centers. Sections 2.2 and 2.3 explain why neither IP-VPN nor the 128 existing E-VPN route types meet the requirements for IP Prefix 129 advertisements. Once the need for a new E-VPN route type is 130 justified, sections 3 and 5 will describe this route type and how it 131 is used in some specific use cases. 133 2.1 Inter-subnet connectivity requirements in Data Centers 135 [E-VPN] is used as the control plane for a Network Virtualization 136 Overlay (NVO3) solution in Data Centers (DC), where Network 137 Virtualization Edge (NVE) devices can be located in Hypervisors or 138 TORs, as described in [E-VPN-OVERLAYS].
140 If we use the term Tenant System (TS) to designate a physical or 141 virtual system identified by MAC and IP addresses, and connected to 142 an E-VPN instance, the following considerations apply: 144 o The Tenant Systems may be Virtual Machines (VMs) that generate 145 traffic from their own MAC and IP. 147 o The Tenant Systems may be Virtual Appliance entities (VAs) that 148 forward traffic to/from IP addresses of different End Devices 149 sitting behind them. 151 o These VAs can be firewalls, load balancers, NAT devices, other 152 appliances or virtual gateways with virtual routing instances. 154 o These VAs do not have their own routing protocols and hence 155 rely on the E-VPN NVEs to advertise the routes on their 156 behalf. 158 o In all these cases, the VA will forward traffic to the Data 159 Center using its own source MAC but the source IP will be the 160 one associated with the End Device sitting behind it, or a 161 translated IP address (part of a public NAT pool) if the VA is 162 performing NAT. 164 o Note that the same IP address could exist behind two of these 165 TS. One example of this would be certain appliance resiliency 166 mechanisms, where a virtual IP or floating IP can be owned by 167 one of the two VAs running the resiliency protocol (the master 168 VA). VRRP is one particular example of this. Another example 169 is multi-homed subnets, i.e. the same subnet is connected to 170 two VAs. 172 o Although these VAs provide IP connectivity to VMs and subnets 173 behind them, they do not always have their own IP interface 174 connected to the E-VPN NVE, e.g. layer-2 firewalls are 175 examples of VAs not supporting IP interfaces. 177 The following figure illustrates some of the examples described 178 above.
180 NVE1 181 +--------+ 182 TS1(VM)--|(EVI-10)|---------+ 183 IP1/M1 +--------+ | DGW1 184 +---------+ +-------------+ 185 | |----|(EVI-10) | 186 SN1---+ NVE2 | | | IRB1 | 187 | +--------+ | | | (VRF)|---+ 188 SN2---TS2(VA)--|(EVI-10)|----| | +-------------+ _|_ 189 | IP2/M2 +--------+ | VXLAN/ | ( ) 190 IP4---+ <-+ | nvGRE | DGW2 ( WAN ) 191 | | | +-------------+ (___) 192 vIP23 (floating) | |----|(EVI-10) | | 193 | +---------+ | IRB2 | | 194 SN1---+ <-+ NVE3 | | | | (VRF)|---+ 195 | IP3/M3 +--------+ | | | +-------------+ 196 SN3---TS3(VA)--|(EVI-10)|------+ | | 197 | +--------+ | | 198 IP5---+ | | 199 | | 200 NVE4 | | NVE5 +--SN5 201 +---------------------+ | | +--------+ | 202 IP6------|(EVI-1) | | +----|(EVI-10)|--TS4(VA)--SN6 203 | \ IRB3 | | +--------+ | 204 | (VRF)-(EVI-10)|--+ ESI4 +--SN7 205 | / | 206 |---|(EVI-2) | 207 SN4| +---------------------+ 209 Figure 1 DC inter-subnet use-cases 211 Where: 213 NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same E-VPN for 214 a particular tenant. EVI-10 is the corresponding E-VPN instance on 215 each element, and all the hosts connected to that instance belong to 216 the same IP subnet. The hosts connected to E-VPN 10 are listed below: 218 o TS1 is a VM that generates/receives traffic from/to IP1, where 219 IP1 belongs to the E-VPN 10 subnet. 221 o TS2 and TS3 are Virtual Appliances (VA) that generate/receive 222 traffic from/to the subnets and hosts sitting behind them 223 (SN1, SN2, SN3, IP4 and IP5). Their IP addresses (IP2 and IP3) 224 belong to the E-VPN subnet and they can also generate/receive 225 traffic. When these VAs receive packets destined to their own 226 MAC addresses (M2 and M3) they will route the packets to the 227 proper subnet or host. These VAs do not support routing 228 protocols to advertise the subnets connected to them and can 229 move to a different server and NVE when the Cloud Management 230 System decides to do so.
These VAs may also support redundancy 231 mechanisms for some subnets, similar to VRRP, where a floating 232 IP is owned by the master VA and only the master VA forwards 233 traffic to a given subnet. E.g.: vIP23 in figure 1 is a 234 floating IP that can be owned by TS2 or TS3 depending on which 235 is the master. Only the master will forward traffic to SN1. 237 o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3 238 have their own IP addresses that belong to the E-VPN 10 subnet 239 too. These IRB interfaces connect the E-VPN 10 subnet to 240 Virtual Routing and Forwarding (VRF) instances that can route 241 the traffic to other connected subnets for the same tenant 242 (within the DC or at the other end of the WAN). 244 o TS4 is a layer-2 VA that provides connectivity to subnets SN5, 245 SN6 and SN7, but does not have an IP address itself in the E- 246 VPN 10. TS4 is connected to a physical port on NVE5 assigned 247 to Ethernet Segment Identifier 4. 249 All the above DC use cases require inter-subnet forwarding and 250 therefore the individual host routes and subnets MUST be advertised: 252 a) From the NVEs (since VAs and VMs do not run routing protocols) and 253 b) Associated to an overlay next-hop that can be a VA IP address, a 254 floating IP address, an IRB IP address or an ESI. 256 2.2 The requirement for advertising IP prefixes in E-VPN 258 In all the inter-subnet connectivity cases discussed in section 2.1 259 there is a need to advertise IP prefixes. The advertisement of such 260 prefixes must meet certain requirements, specific to NVO-based Data 261 Centers: 263 o The data plane in NVO-based Data Centers is not based on IP 264 over a GRE or MPLS tunnel as required by [RFC4364], but on 265 Ethernet over an IP tunnel, such as VXLAN or NVGRE. 267 o The IP prefixes in the DC must be advertised with a 268 flexibility that does not exist in IP-VPNs today.
For 269 instance: 271 a) The advertised overlay next-hop for a given IP prefix can 272 be an IRB IP address (see section 5.3), a floating IP 273 address (see section 5.2) or even an ESI (see section 5.4). 275 b) As stated by [E-VPN-OVERLAYS], VXLAN or NVGRE virtual 276 identifiers can have a global or a local scope. The 277 implementation MUST support the flexibility to advertise IP 278 Prefixes associated to a global identifier (32-bit value 279 encoded in the E-VPN Ethernet Tag ID) or a locally 280 significant identifier (20-bit value encoded in the MPLS 281 label field). At the moment, [RFC4364] can only advertise 282 Prefixes associated to a locally significant identifier 283 (MPLS label). 285 c) Since an NVE can potentially advertise many Prefixes with 286 different overlay next-hops and different VXLAN/NVGRE 287 identifiers, it is highly desirable to be able to advertise 288 those prefixes with their corresponding overlay next-hop and 289 VXLAN/NVGRE identifier within the same NLRI, for a better 290 BGP update packing. [RFC4364] does not have the capability 291 of advertising a flexible overlay next-hop together with a 292 prefix in the same NLRI. 294 o IP prefixes must be advertised by NVE devices that have no VRF 295 instances defined and no capability to process IP-VPN 296 prefixes. These NVE devices just support E-VPN and advertise 297 IP Prefixes on behalf of some connected Tenant Systems. In 298 other words: any attempt to solve this problem by simply using 299 [RFC4364] routes requires that any EVPN deployment must be 300 accompanied with a concurrent IP-VPN topology, which is not 301 possible in most of the cases. 303 o Finally, Data Center providers want to use a single BGP 304 Subsequent Address Family (AFI/SAFI) for the advertisement of 305 addresses within the Data Center, i.e. BGP E-VPN only, as 306 opposed to using E-VPN and IP-VPN in a concurrent topology. 
307 This minimizes the control plane overhead in TORs and 308 Hypervisors and simplifies the operations. 310 E-VPN is extended - as described in this document - to advertise IP 311 prefixes with the flexibility required by the current and future Data 312 Center applications. 314 2.3 The requirement for a new E-VPN route type 316 [E-VPN] defines a MAC route (or route type 2) where a MAC address can 317 be advertised together with an IP address length (IPL) and IP address 318 (IP). While a variable IPL might be used to indicate the presence of 319 an IP prefix in a route type 2, there are several specific use cases 320 in which using this route type to deliver IP Prefixes is not 321 suitable. 323 One example of such use cases is the "floating IP" example described 324 in section 2.1. In this example we need to decouple the advertisement 325 of the prefixes from the advertisement of the floating IP (vIP23 in 326 figure 1) and MAC associated to it, otherwise the solution gets 327 highly inefficient and does not scale. 329 E.g.: if we are advertising 1k prefixes from M2 (using route type 2) 330 and the floating IP owner changes from M2 to M3, we would need to 331 withdraw 1k routes from M2 and re-advertise 1k routes from M3. 332 However if we use a separate route type, we can advertise the 1k 333 routes associated to the floating IP address (vIP23) and only one 334 route type 2 for advertising the ownership of the floating IP, i.e. 335 vIP23 and M2 in the route type 2. When the floating IP owner changes 336 from M2 to M3, a single route type 2 withdraw/update is required to 337 indicate the change. The remote DGW will not change any of the 1k 338 prefixes associated to vIP23, but will only update the ARP resolution 339 entry for vIP23 (now pointing at M3). 
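The scaling argument in the 1k-prefix example can be sketched in a few lines of Python. This is an illustrative toy model only, not part of the route encoding or of any implementation: the class name Dgw, its methods, the update counter and the 10.x.y.0/24 prefixes are all hypothetical. It assumes 1,000 type 5 routes bound to vIP23 and one type 2 route binding vIP23 to its current owner, and counts how many routes the DGW has to process when ownership moves from M2 to M3.

```python
from dataclasses import dataclass, field


@dataclass
class Dgw:
    """Toy model of a DGW routing context (hypothetical, for illustration)."""
    routes: dict = field(default_factory=dict)  # prefix -> overlay next-hop IP
    arp: dict = field(default_factory=dict)     # overlay next-hop IP -> MAC
    updates: int = 0                            # number of routes processed

    def recv_type5(self, prefix, gw_ip):
        # IP Prefix route: the prefix is bound to an overlay next-hop, not a MAC.
        self.routes[prefix] = gw_ip
        self.updates += 1

    def recv_type2(self, mac, ip):
        # MAC route: (re)binds the overlay next-hop IP to its current owner.
        self.arp[ip] = mac
        self.updates += 1

    def resolve(self, prefix):
        # Two-step lookup: prefix -> overlay next-hop -> MAC.
        return self.arp[self.routes[prefix]]


dgw = Dgw()
# 1,000 distinct example prefixes (assumed values, not from the draft).
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]
for p in prefixes:
    dgw.recv_type5(p, "vIP23")
dgw.recv_type2("M2", "vIP23")

before = dgw.updates
dgw.recv_type2("M3", "vIP23")          # floating IP ownership moves to TS3
failover_updates = dgw.updates - before

print(failover_updates)                # routes processed for the failover
print(dgw.resolve(prefixes[0]))        # MAC now resolving the first prefix
```

In this model the failover touches only the ARP binding for vIP23; none of the 1,000 prefix entries is rewritten, which is the decoupling the paragraph above argues for.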
341 Other reasons to decouple the IP Prefix advertisement from the MAC 342 route are listed below: 344 o Clean identification, operation and troubleshooting of IP 345 Prefixes, not subject to interpretation and independent of the 346 IPL and the IP value. E.g.: An IP address for ARP resolution 347 must always be clearly distinguished from a /32 IP Prefix, or 348 a default IP route 0.0.0.0/0 must always be easily and clearly 349 distinguished from the absence of IP information. 351 o MAC address information must not be compared by BGP when 352 selecting between two IP Prefix routes. If IP Prefixes are to be 353 advertised using MAC routes, the MAC information is always 354 present and part of the route key. 356 o IP Prefix routes must not be subject to MAC route procedures 357 such as MAC Mobility or aliasing. Prefixes advertised from two 358 different ESIs do not mean mobility; MACs advertised from two 359 different ESIs do mean mobility. Similarly, load balancing for 360 IP prefixes is achieved through IP mechanisms such as ECMP, 361 and not through MAC route mechanisms such as aliasing. 363 o NVEs that do not require processing IP Prefixes must have an 364 easy way to identify an update with an IP Prefix and ignore 365 it, rather than processing the MAC route only to find out 366 later that it carries a Prefix that must be ignored. 368 The following sections describe how E-VPN is extended with a new 369 route type for the advertisement of prefixes and how this route is 370 used to address the current and future inter-subnet connectivity 371 requirements existing in the Data Center. 373 3.
The BGP E-VPN IP Prefix route 375 The current BGP E-VPN NLRI as defined in [E-VPN] is shown below: 377 +-----------------------------------+ 378 | Route Type (1 octet) | 379 +-----------------------------------+ 380 | Length (1 octet) | 381 +-----------------------------------+ 382 | Route Type specific (variable) | 383 +-----------------------------------+ 385 Where the route type field can contain one of the following specific 386 values: 388 + 1 - Ethernet Auto-Discovery (A-D) route 390 + 2 - MAC advertisement route 392 + 3 - Inclusive Multicast Route 394 + 4 - Ethernet Segment Route 396 This document defines an additional route type that will be used for 397 the advertisement of IP Prefixes: 399 + 5 - IP Prefix Route 401 The support for this new route type is OPTIONAL. 403 By using a separate route type for IP prefix advertisements, there is 404 a clean separation of functions between route types, i.e. route type 405 2 or MAC Advertisement route will be used for MAC and ARP resolution 406 advertisement, whereas route type 5 or IP Prefix route will be used 407 for the advertisement of prefixes. Since this new route type is 408 OPTIONAL, an implementation not supporting it will easily ignore the 409 route, based on the route type value. 411 The detailed encoding of this route and associated procedures are 412 described in the following sections. 
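The statement that an implementation not supporting route type 5 can ignore it based on the route type value can be illustrated with a minimal Python sketch. It relies only on the generic NLRI layout shown above (1-octet Route Type, 1-octet Length, variable body). The function name split_nlri, the supported parameter and the dummy all-zero bodies are assumptions made for this example; the 34-octet length is simply the sum of the IPv4 field sizes listed in section 3.1 (8 + 10 + 4 + 1 + 4 + 4 + 3).

```python
ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery (A-D) route",
    2: "MAC advertisement route",
    3: "Inclusive Multicast Route",
    4: "Ethernet Segment Route",
    5: "IP Prefix Route",
}


def split_nlri(data: bytes, supported=frozenset({1, 2, 3, 4})):
    """Walk a buffer of back-to-back E-VPN NLRIs (hypothetical helper).

    Returns (route_type, body) tuples for supported route types and skips
    all others, using only the 1-octet type and 1-octet length fields.
    """
    kept, offset = [], 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated NLRI header")
        route_type, length = data[offset], data[offset + 1]
        body = data[offset + 2 : offset + 2 + length]
        if len(body) != length:
            raise ValueError("truncated NLRI body")
        if route_type in supported:
            kept.append((route_type, body))
        offset += 2 + length          # skip to the next NLRI either way
    return kept


# Dummy bodies: all-zero placeholders, not valid route contents.
type5_ipv4 = bytes([5, 34]) + bytes(34)
type2_dummy = bytes([2, 3]) + b"\x00\x01\x02"
buf = type5_ipv4 + type2_dummy

legacy = split_nlri(buf)                                        # no type 5 support
capable = split_nlri(buf, supported=frozenset({1, 2, 3, 4, 5}))

print([ROUTE_TYPES[t] for t, _ in legacy])
print([ROUTE_TYPES[t] for t, _ in capable])
```

The receiver without type 5 support never inspects the body of the IP Prefix route; it uses the Length octet to step over it and continues with the next NLRI.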
414 3.1 IP Prefix Route encoding 416 An IP Prefix advertisement route type specific E-VPN NLRI consists of 417 the following fields: 419 +---------------------------------------+ 420 | RD (8 octets) | 421 +---------------------------------------+ 422 |Ethernet Segment Identifier (10 octets)| 423 +---------------------------------------+ 424 | Ethernet Tag ID (4 octets) | 425 +---------------------------------------+ 426 | IP Address Length (1 octet) | 427 +---------------------------------------+ 428 | IP Address (4 or 16 octets) | 429 +---------------------------------------+ 430 | GW IP Address (4 or 16 octets) | 431 +---------------------------------------+ 432 | MPLS Label (3 octets) | 433 +---------------------------------------+ 435 Where: 437 o RD, Ethernet Tag ID and MPLS Label fields will be used as 438 defined in [E-VPN] and [E-VPN-OVERLAYS]. 440 o The Ethernet Segment Identifier will be a non-zero 10-byte 441 identifier if the ESI is used as an overlay next-hop. It will 442 be zero otherwise. 444 o The IP address length can be set to a value between 0 and 32 445 (bits) for ipv4 and between 0 and 128 for ipv6. 447 o The IP address will be a 32 or 128-bit field (ipv4 or ipv6). 449 o The GW IP (Gateway IP Address) will be a 32 or 128-bit field 450 (ipv4 or ipv6), and will encode the overlay IP next-hop for 451 the IP Prefixes. The GW IP field can be zero if it is not used 452 as an overlay next-hop. 454 o The total route length will indicate the type of prefix (ipv4 455 or ipv6) and the type of GW IP address (ipv4 or ipv6). Note 456 that the IP Address + the GW IP should have a length of either 457 64 or 256 bits, but never 160 bits (ipv4 and ipv6 mixed values 458 are not allowed). 460 The Eth-Tag ID, IP address length and IP address will be part of the 461 route key used by BGP to compare routes. The rest of the fields will 462 be out of the route key. 464 The route will contain a single overlay next-hop, i.e. 
if the ESI 465 field is zero, the GW IP field will not, and vice versa. The 466 following table shows the different inter-subnet use-cases described 467 in this document and the corresponding coding of the overlay next-hop 468 in the route-type 5. 470 +----------------------------+----------------------------------+ 471 | Overlay next-hop use-case | Field in the route-type 5 | 472 +----------------------------+----------------------------------+ 473 | TS IP address | GW IP Address | 474 | Floating IP address | GW IP Address | 475 | IRB IP address | GW IP Address | 476 | "Bump in the wire" | ESI | 477 +----------------------------+----------------------------------+ 479 4. Benefits of using the E-VPN IP Prefix route 481 This section clarifies the different functions accomplished by the E- 482 VPN route-type 2 and route-type 5 routes, and provides a list of 483 benefits derived from using a separate route type for the 484 advertisement of IP Prefixes in E-VPN. 486 [E-VPN] describes the content of the BGP E-VPN route type 2 specific 487 NLRI, i.e. MAC Advertisement Route, where the IP address length (IPL) 488 and IP address (IP) of a specific advertised MAC are encoded. The 489 subject of the MAC advertisement route is the MAC address (M) and MAC 490 address length (ML) encoded in the route. The MAC mobility and other 491 complex procedures are defined around that MAC address. The IP 492 address information carries the host IP address required for the ARP 493 resolution of the MAC. 495 The BGP E-VPN route type 5 defined in this document, i.e. IP Prefix 496 Advertisement route, decouples the advertisement of IP prefixes from 497 the advertisement of any MAC address related to it. This brings some 498 major benefits to NVO-based networks where inter-subnet forwarding is 499 required. Some of those benefits are: 501 a) Upon receiving a route type 2 or type 5, an egress NVE can easily 502 distinguish MACs and IPs for ARP resolution from IP Prefixes. E.g. 
503 an IP prefix with IPL=32 being advertised from two different 504 ingress NVEs (as route type 5) can be identified as such and be 505 imported in the designated routing context as two ECMP routes, as 506 opposed to two ARP entries competing for the same IP. 508 b) Similarly, upon receiving a route, an egress NVE that does not support 509 IP Prefix processing can easily ignore the update, based on the 510 route type. 512 c) A MAC route includes the ML, M, IPL and IP in the route key that 513 is used by BGP to compare routes, whereas for IP Prefix routes, 514 only IPL and IP (as well as Ethernet Tag ID) are part of the route 515 key. Advertised IP Prefixes are imported into the designated 516 routing context, where there is no MAC information associated to 517 IP routes. In the example illustrated in figure 1, subnet SN1 518 should be advertised by NVE2 and NVE3 and interpreted by DGW1 as 519 the same route coming from two different next-hops, regardless of 520 the MAC address associated to TS2 or TS3. This is easily 521 accomplished in the route type 5 by including only the IP 522 information in the route key. 524 d) By decoupling the MAC from the IP Prefix advertisement procedures, 525 we can leave the IP prefix advertisements out of the MAC mobility 526 procedures defined in [E-VPN] for MACs. In addition, this allows 527 us to have an indirection mechanism for IP prefixes advertised 528 from a MAC/IP that can move between hypervisors. E.g. if there are 529 1,000 prefixes sitting behind TS2 (figure 1), NVE2 will advertise 530 all those prefixes in type 5 routes associated to the next-hop 531 IP2. Should TS2 move to a different NVE, a single MAC 532 advertisement route withdrawal for the M2/IP2 route from NVE2 will 533 invalidate the 1,000 prefixes, as opposed to having to wait for each 534 individual prefix to be withdrawn.
This may be easily accomplished 535 by using IP Prefix routes that are not tied to a MAC address, and 536 by using a separate MAC route to advertise the location and resolution 537 of the overlay next-hop to a MAC address. 539 5. IP Prefix next-hop use-cases 541 The IP Prefix route can use a GW IP or an ESI as an overlay next-hop. 542 This section describes some use-cases for both next-hop types. 544 5.1 TS IP address next-hop use-case 546 The following figure illustrates an example of inter-subnet 547 forwarding for subnets sitting behind Virtual Appliances (on TS2 and 548 TS3). 550 SN1---+ NVE2 DGW1 551 | +--------+ +---------+ +-------------+ 552 SN2---TS2(VA)--|(EVI-10)|----| |----|(EVI-10) | 553 | IP2/M2 +--------+ | | | IRB1\ | 554 IP4---+ | | | (VRF)|---+ 555 | | +-------------+ _|_ 556 | VXLAN/ | ( ) 557 | nvGRE | DGW2 ( WAN ) 558 SN1---+ NVE3 | | +-------------+ (___) 559 | IP3/M3 +--------+ | |----|(EVI-10) | | 560 SN3---TS3(VA)--|(EVI-10)|----| | | IRB2\ | | 561 | +--------+ +---------+ | (VRF)|---+ 562 IP5---+ +-------------+ 564 Figure 2 TS IP address use-case 566 An example of inter-subnet forwarding between subnet SN1/24 and a 567 subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and 568 DGW2 are running BGP E-VPN. TS2 and TS3 do not support routing 569 protocols, only a static route to forward the traffic to the WAN.
571 (1) NVE2 advertises the following BGP routes on behalf of TS2: 573 o Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32, 574 IP=IP2 576 o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, 577 ESI=0, GW IP address=IP2 579 (2) NVE3 advertises the following BGP routes on behalf of TS3: 581 o Route type 2 (MAC route) containing: ML=48, M=M3, IPL=32, 582 IP=IP3 584 o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, 585 ESI=0, GW IP address=IP3 587 (3) DGW1 and DGW2 import both received routes based on the RT: 589 o Based on the EVI-10 route-target in DGW1 and DGW2, the MAC 590 route is imported and M2 is added to the EVI-10 MAC FIB along 591 with its corresponding tunnel information. For the VXLAN use 592 case, the VTEP will be derived from the MAC route BGP next-hop 593 (underlay next-hop) and VNI from the Ethernet Tag or MPLS 594 fields (see [E-VPN-OVERLAYS]). IP2 - M2 is added to the ARP 595 table. 597 o Based on the EVI-10 route-target in DGW1 and DGW2, the IP 598 Prefix route is also imported and SN1/24 is added to the 599 designated routing context with next-hop IP2 pointing at the 600 local EVI-10. Should ECMP be enabled in the routing context, 601 SN1/24 would also be added to the routing table with next-hop 602 IP3. 604 (4) When DGW1 receives a packet from the WAN with destination IPx, 605 where IPx belongs to SN1/24: 607 o A destination IP lookup is performed on the DGW1 VRF routing 608 table and next-hop=IP2 is found. The tunnel information to 609 encapsulate the packet will be derived from the route-type 2 610 (MAC route) received for M2/IP2. 612 o IP2 is resolved to M2 in the ARP table, and M2 is resolved to 613 the tunnel information given by the MAC FIB (remote VTEP and 614 VNI for the VXLAN case). 616 o The IP packet destined to IPx is encapsulated with: 618 . Source inner MAC = IRB1 MAC 620 . Destination inner MAC = M2 622 . 
Tunnel information provided by the MAC FIB (VNI, VTEP IPs 623 and MACs for the VXLAN case) 625 (5) When the packet arrives at NVE2: 627 o Based on the tunnel information (VNI for the VXLAN case), the 628 EVI-10 context is identified for a MAC lookup. 630 o Encapsulation is stripped-off and based on a MAC lookup 631 (assuming MAC forwarding on the egress NVE), the packet is 632 forwarded to TS2, where it will be properly routed. 634 (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will 635 be applied to the MAC route IP2/M2, as defined in [EVPN]. Route type 636 5 prefixes are not subject to MAC mobility procedures, hence no 637 changes in the DGW VRF routing table will occur for TS2 mobility, 638 i.e. all the prefixes will still be pointing at IP2 as next-hop. 639 There is an indirection for e.g. SN1/24, which still points at 640 next-hop IP2 in the routing table, but IP2 will be simply resolved to 641 a different tunnel, based on the outcome of the MAC mobility 642 procedures for the MAC route IP2/M2. 644 Note that in the opposite direction, TS2 will send traffic based on 645 its static-route next-hop information (IRB1 and/or IRB2), and regular 646 E-VPN procedures will be applied. 648 5.2 Floating IP next-hop use-case 650 Sometimes Tenant Systems (TS) work in active/standby mode where an 651 upstream floating IP - owned by the active TS - is used as the next- 652 hop to get to some subnets behind. This redundancy mode, already 653 introduced in section 2.1 and 2.3, is illustrated in Figure 3. 
655 NVE2 DGW1 656 +--------+ +---------+ +-------------+ 657 +---TS2(VA)--|(EVI-10)|----| |----|(EVI-10) | 658 | IP2/M2 +--------+ | | | IRB1\ | 659 | <-+ | | | (VRF)|---+ 660 | | | | +-------------+ _|_ 661 SN1 vIP23 (floating) | VXLAN/ | ( ) 662 | | | nvGRE | DGW2 ( WAN ) 663 | <-+ NVE3 | | +-------------+ (___) 664 | IP3/M3 +--------+ | |----|(EVI-10) | | 665 +---TS3(VA)--|(EVI-10)|----| | | IRB2\ | | 666 +--------+ +---------+ | (VRF)|---+ 667 +-------------+ 668 Figure 3 Floating IP next-hop for redundant TS 670 In this example, assuming TS2 is the active TS and owns IP23: 672 (1) NVE2 advertises the following BGP routes for TS2: 674 o Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32, 675 IP=IP23 677 o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, 678 ESI=0, GW IP address=IP23 680 (2) NVE3 advertises the following BGP routes for TS3: 682 o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, 683 ESI=0, GW IP address=IP23 685 (3) DGW1 and DGW2 import both received routes based on the RT: 687 o M2 is added to the EVI-10 MAC FIB along with its corresponding 688 tunnel information. For the VXLAN use case, the VTEP will be 689 derived from the MAC route BGP next-hop and VNI from the 690 Ethernet Tag or MPLS fields (see [E-VPN-OVERLAYS]). IP23 - M2 691 is added to the ARP table. 693 o SN1/24 is added to the designated routing context in DGW1 and 694 DGW2 with next-hop IP23 pointing at the local EVI-10. 696 (4) When DGW1 receives a packet from the WAN with destination IPx, 697 where IPx belongs to SN1/24: 699 o A destination IP lookup is performed on the DGW1 VRF routing 700 table and next-hop=IP23 is found. The tunnel information to 701 encapsulate the packet will be derived from the route-type 2 702 (MAC route) received for M2/IP23. 704 o IP23 is resolved to M2 in the ARP table, and M2 is resolved to 705 the tunnel information given by the MAC FIB (remote VTEP and 706 VNI for the VXLAN case). 
      o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC

         . Destination inner MAC = M2

         . Tunnel information provided by the MAC FIB (VNI, VTEP IPs
           and MACs for the VXLAN case)

   (5) When the packet arrives at NVE2:

      o Based on the tunnel information (VNI for the VXLAN case), the
        EVI-10 context is identified for a MAC lookup.

      o The encapsulation is stripped off and, based on a MAC lookup
        (assuming MAC forwarding on the egress NVE), the packet is
        forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3
   appoints TS3 as the new active TS for SN1, TS3 will now own the
   floating IP23 and will signal this new ownership (GARP message or
   similar). Upon receiving the new owner's notification, NVE3 will
   issue a route type 2 for M3-IP23. DGW1 and DGW2 will update their
   ARP tables with the new MAC resolving the floating IP. No changes
   are carried out in the VRF routing table.

   In the DGW1/2 BGP RIB, there will be two route type 5 routes for
   SN1 (from NVE2 and NVE3), but only the one with the same BGP
   next-hop as the IP23 route type 2 BGP next-hop will be valid.

5.3 IRB IP next-hop use-case

   In some other cases, the NVEs and DGWs will have just IRB
   interfaces as hosts in the E-VPN instance. Figure 4 illustrates an
   example.
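   As a rough, non-normative preview of the procedure that follows
   Figure 4, the Python sketch below builds the route type 2 / route
   type 5 pair that each NVE originates for its IRB interface and the
   state a DGW would derive from them. The type and function names
   (MacRoute, PrefixRoute, irb_routes, build_dgw_state) are
   hypothetical, and keeping both GW IPs as next-hops for SN1/24 is
   one assumed reading of "IRB3-IP (and/or IRB4-IP)"; the field values
   are the ones used in this use-case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacRoute:        # route type 2 (ML=48 and IPL=32 are implied)
    mac: str
    ip: str
    bgp_next_hop: str


@dataclass(frozen=True)
class PrefixRoute:     # route type 5
    prefix: str
    esi: int
    gw_ip: str
    bgp_next_hop: str


def irb_routes(nve, irb, prefix):
    """Routes an NVE originates for a subnet reachable behind its IRB."""
    return (
        MacRoute(mac=f"{irb}-MAC", ip=f"{irb}-IP", bgp_next_hop=nve),
        PrefixRoute(prefix=prefix, esi=0, gw_ip=f"{irb}-IP", bgp_next_hop=nve),
    )


def build_dgw_state(routes):
    """Derive the ARP table, MAC FIB and VRF entries from imported routes."""
    arp, mac_fib, vrf = {}, {}, {}
    for route in routes:
        if isinstance(route, MacRoute):
            arp[route.ip] = route.mac
            mac_fib[route.mac] = route.bgp_next_hop  # VTEP for the VXLAN case
        else:
            # Assumption: every advertised GW IP is kept as a next-hop.
            vrf.setdefault(route.prefix, []).append(route.gw_ip)
    return arp, mac_fib, vrf


routes = [
    *irb_routes("NVE1", "IRB3", "SN1/24"),
    *irb_routes("NVE2", "IRB4", "SN1/24"),
]
arp, mac_fib, vrf = build_dgw_state(routes)

# Each next-hop resolves independently: GW IP -> IRB MAC -> VTEP.
paths = [(gw, arp[gw], mac_fib[arp[gw]]) for gw in vrf["SN1/24"]]
print(paths)
```

   Each GW IP in the sketch resolves through its own route type 2, so
   the prefix route never needs to carry a MAC address.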
                NVE1
      +---------------------+                    DGW1
IP1---|(EVI-1)              |               +-------------+
      |     \     IRB3      |  +---------+  |(EVI-10)     |
      |       (VRF)-(EVI-10)|--|         |--|   IRB1\     |
      |     /               |  |         |  |        (VRF)|---+
   |--|(EVI-2)              |  |         |  +-------------+  _|_
SN1|  +---------------------+  |         |                  (   )
   |  +---------------------+  | VXLAN/  |       DGW2      ( WAN )
   |--|(EVI-2)              |  | nvGRE   |  +-------------+ (___)
      |     \     IRB4      |  |         |  |(EVI-10)     |   |
      |       (VRF)-(EVI-10)|--|         |--|   IRB2\     |   |
      |     /               |  +---------+  |        (VRF)|---+
SN2---|(EVI-3)              |               +-------------+
      +---------------------+
                NVE2

                  Figure 4 IRB IP next-hop use-case

   In this case:

   (1) NVE1 advertises the following BGP routes for SN1 resolution:

      o Route type 2 (MAC route) containing: ML=48, M=IRB3-MAC,
        IPL=32, IP=IRB3-IP

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        ESI=0, GW IP address=IRB3-IP

   (2) NVE2 advertises the following BGP routes for SN1 resolution:

      o Route type 2 (MAC route) containing: ML=48, M=IRB4-MAC,
        IPL=32, IP=IRB4-IP

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        ESI=0, GW IP address=IRB4-IP

   (3) DGW1 and DGW2 import both received routes based on the RT:

      o IRB3-MAC and IRB4-MAC are added to the EVI-10 MAC FIB along
        with their corresponding tunnel information. For the VXLAN use
        case, the VTEP will be derived from the MAC route BGP next-hop
        and VNI from the Ethernet Tag or MPLS fields (see
        [E-VPN-OVERLAYS]). IRB3-MAC - IRB3-IP and IRB4-MAC - IRB4-IP
        are added to the ARP table.

      o SN1/24 is added to the designated routing context in DGW1 and
        DGW2 with next-hop IRB3-IP (and/or IRB4-IP) pointing at the
        local EVI-10.

   Forwarding procedures similar to the ones described in the previous
   use-cases are followed.

5.4 ESI next-hop ("Bump in the wire") use-case

   The following figure illustrates an example of inter-subnet
   forwarding for a subnet route that uses an ESI as an overlay
   next-hop.
   In this use-case, TS2 and TS3 are layer-2 VA devices without any
   IP address that can be included as an overlay next-hop in the GW IP
   field of the IP Prefix route.

                   NVE2                           DGW1
                +--------+    +---------+    +-------------+
   +---TS2(VA)--|(EVI-10)|----|         |----|(EVI-10)     |
   |   ESI23    +--------+    |         |    |   IRB1      |
   |     +                    |         |    |        (VRF)|---+
   |     |                    |         |    +-------------+  _|_
  SN1    |                    | VXLAN/  |                    (   )
   |     |                    | nvGRE   |         DGW2      ( WAN )
   |     +         NVE3       |         |    +-------------+ (___)
   |   ESI23    +--------+    |         |----|(EVI-10)     |   |
   +---TS3(VA)--|(EVI-10)|----|         |    |   IRB2      |   |
                +--------+    +---------+    |        (VRF)|---+
                                             +-------------+

                    Figure 5 ESI next-hop use-case

   Since neither TS2 nor TS3 can run a routing protocol or has an IP
   address assigned, an ESI, i.e. ESI23, will be provisioned on the
   attachment ports of NVE2 and NVE3. This model supports VA
   redundancy in a similar way to the one described in section 5.2 for
   the floating IP next-hop use-case, only using the E-VPN A-D route
   instead of the MAC advertisement route to advertise the location of
   the overlay next-hop. The procedure is explained below:

   (1) NVE2 advertises the following BGP routes for TS2:

      o Route type 1 (A-D route for EVI-10) containing: ESI=ESI23 and
        the corresponding tunnel information (Ethernet Tag and/or MPLS
        label). NVE2 will advertise this route assuming the ESI is
        active on NVE2.

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        ESI=ESI23, GW IP address=0.

   (2) NVE3 advertises the following BGP routes for TS3:

      o Route type 1 (A-D route for EVI-10) containing: ESI=ESI23 and
        the corresponding tunnel information (Ethernet Tag and/or MPLS
        label). NVE3 will advertise this route assuming the ESI is
        active on NVE3. Note that if the resiliency mechanism for TS2
        and TS3 is in active-active mode, both NVE2 and NVE3 will send
        the A-D route.
        Otherwise, that is, if the resiliency is active-standby, only
        the NVE owning the active ESI will advertise the A-D route for
        ESI23.

      o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
        ESI=ESI23, GW IP address=0.

   (3) DGW1 and DGW2 import the received routes based on the RT:

      o The tunnel information to get to ESI23 is installed in DGW1
        and DGW2. For the VXLAN use case, the VTEP will be derived
        from the A-D route BGP next-hop and VNI from the Ethernet Tag
        or MPLS fields (see [E-VPN-OVERLAYS]).

      o SN1/24 is added to the designated routing context in DGW1 and
        DGW2 with next-hop ESI23 pointing at the local EVI-10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
   where IPx belongs to SN1/24:

      o A destination IP lookup is performed on the DGW1 VRF routing
        table and next-hop=ESI23 is found. The tunnel information to
        encapsulate the packet will be derived from the route-type 1
        (A-D route) received for ESI23.

      o The IP packet destined to IPx is encapsulated with:

         . Source inner MAC = IRB1 MAC

         . Destination inner MAC = M2 (this MAC will be looked up in
           the EVI-10 FDB using ESI23 as the key for the lookup).

         . Tunnel information provided by the A-D route for ESI23
           (VNI, VTEP IP and MACs for the VXLAN case).

   (5) When the packet arrives at NVE2:

      o Based on the tunnel information (VNI for the VXLAN case), the
        EVI-10 context is identified for a MAC lookup (assuming a MAC
        disposition model).

      o The encapsulation is stripped off and, based on a MAC lookup
        (assuming MAC forwarding on the egress NVE), the packet is
        forwarded to TS2, where it will be properly forwarded.

   (6) If the redundancy protocol running between TS2 and TS3 follows
   an active/standby model and there is a failure, appointing TS3 as
   the new active TS for SN1, TS3 will now own the connectivity to SN1
   and will signal this new ownership (GARP message or similar).
   Upon receiving the new owner's notification, NVE3 will issue a
   route type 1 for ESI23, whereas NVE2 will withdraw its A-D route
   for ESI23. DGW1 and DGW2 will update their tunnel information to
   resolve ESI23. No changes are carried out in the VRF routing table.

   In the DGW1/2 BGP RIB, there will be two route type 5 routes for
   SN1 (from NVE2 and NVE3), but only the one with the same BGP
   next-hop as the ESI23 route type 1 BGP next-hop will be valid.

6. Conclusions

   A new E-VPN route type 5 for the advertisement of IP Prefixes is
   proposed in this document. This new route type will have a
   differentiated role from the route type 2, i.e. the MAC
   advertisement route, and will address all the inter-subnet
   connectivity scenarios which are required in the Data Center, where
   the overlay next-hop can be an IP address or an ESI. As discussed
   throughout the document, IP-VPN cannot be used in an NVO-based DC
   to advertise IP Prefixes and the existing E-VPN route type 2 does
   not meet the requirements for all the DC use cases; therefore, a
   new E-VPN route type is required.

   This new E-VPN route type 5 decouples the IP Prefix advertisements
   from the MAC route advertisements in E-VPN, hence:

   a) Allows the clean and clear announcement of IPv4 or IPv6 prefixes
   in an NLRI with no MAC addresses in the route key, so that only IP
   information is used in BGP route comparisons.

   b) Since the route type is different from the MAC advertisement
   route, the advertisement of prefixes will be excluded from all the
   procedures defined for the advertisement of VM MACs, e.g. MAC
   Mobility or aliasing. As a result, the current E-VPN procedures do
   not need to be modified.

   c) Allows a flexible implementation where the prefix can be linked
   to different types of next-hops: MAC address, IP address, IRB IP
   address, ESI, etc.
   and these MAC or IP addresses do not need to reside in the
   advertising NVE.

   d) An E-VPN implementation not requiring IP Prefixes can simply
   discard them by looking at the route type value.

7. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in RFC 2119
   [RFC2119].

8. Security Considerations

9. IANA Considerations

10. References

10.1 Normative References

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

10.2 Informative References

   [E-VPN]   Sajassi et al., "BGP MPLS Based Ethernet VPN",
             draft-ietf-l2vpn-evpn-03.txt, work in progress,
             February 2013.

   [E-VPN-OVERLAYS]
             Sajassi-Drake et al., "A Network Virtualization Overlay
             Solution using E-VPN",
             draft-sd-l2vpn-evpn-overlay-01.txt, work in progress,
             February 2013.

11. Acknowledgments

   The authors would like to thank Mukul Katiyar and Senthil
   Sathappan for their valuable feedback and contributions.

12. Authors' Addresses

   Jorge Rabadan
   Alcatel-Lucent
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@alcatel-lucent.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.com

   Florin Balus
   Nuage Networks
   Email: florin@nuagenetworks.net

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   Senad Palislamovic
   Alcatel-Lucent
   Email: senad.palislamovic@alcatel-lucent.com