idnits 2.17.1

draft-ietf-bess-evpn-usage-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([EVPN]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 13, 2014) is 3451 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'VPLS-MCAST' is mentioned on line 240, but not
     defined

  == Missing Reference: 'RFC4761' is mentioned on line 248, but not defined

  == Missing Reference: 'RFC4762' is mentioned on line 248, but not defined

  == Missing Reference: 'RFC6074' is mentioned on line 248, but not defined

  == Missing Reference: 'RFC7209' is mentioned on line 260, but not defined

  == Missing Reference: 'PE-IP' is mentioned on line 364, but not defined

  == Missing Reference: 'AS' is mentioned on line 370, but not defined

  == Missing Reference: 'RFC2119' is mentioned on line 1278, but not defined

  == Unused Reference: 'RFC709' is defined on line 1317, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-11) exists of
     draft-ietf-l2vpn-evpn-10

     Summary: 1 error (**), 0 flaws (~~), 11 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    L2VPN Workgroup                                               J. Rabadan
3    Internet Draft                                           S. Palislamovic
4                                                               W. Henderickx
5    Intended status: Informational                                  F. Balus
6                                                              Alcatel-Lucent

8                                                J. Uttaro           K. Patel
9                                                AT&T              A. Sajassi
10                                                                      Cisco

12                                                                   A. Isaac
13                                                                   T. Boyes
14                                                                  Bloomberg

16   Expires: May 17, 2015                                  November 13, 2014

18           Usage and applicability of BGP MPLS based Ethernet VPN
19                     draft-ietf-bess-evpn-usage-00.txt

21   Abstract

23      This document discusses the usage and applicability of BGP MPLS based
24      Ethernet VPN (EVPN) in a simple and fairly common deployment
25      scenario. The different EVPN procedures will be explained on the
26      example scenario, analyzing the benefits and trade-offs of each
27      option. Along with [EVPN], this document is intended to provide a
28      simplified guide for the deployment of EVPN in Service Provider
29      networks.

31   Status of this Memo

33      This Internet-Draft is submitted in full conformance with the
34      provisions of BCP 78 and BCP 79.

36      Internet-Drafts are working documents of the Internet Engineering
37      Task Force (IETF), its areas, and its working groups. Note that
38      other groups may also distribute working documents as Internet-
39      Drafts.

41      Internet-Drafts are draft documents valid for a maximum of six months
42      and may be updated, replaced, or obsoleted by other documents at any
43      time. It is inappropriate to use Internet-Drafts as reference
44      material or to cite them other than as "work in progress."

45      The list of current Internet-Drafts can be accessed at
46      http://www.ietf.org/ietf/1id-abstracts.txt

48      The list of Internet-Draft Shadow Directories can be accessed at
49      http://www.ietf.org/shadow.html

51      This Internet-Draft will expire on May 17, 2015.

53   Copyright Notice

55      Copyright (c) 2014 IETF Trust and the persons identified as the
56      document authors. All rights reserved.

58      This document is subject to BCP 78 and the IETF Trust's Legal
59      Provisions Relating to IETF Documents
60      (http://trustee.ietf.org/license-info) in effect on the date of
61      publication of this document. Please review these documents
62      carefully, as they describe your rights and restrictions with respect
63      to this document. Code Components extracted from this document must
64      include Simplified BSD License text as described in Section 4.e of
65      the Trust Legal Provisions and are provided without warranty as
66      described in the Simplified BSD License.

68   Table of Contents

70      1. Introduction
71      2. Use-case scenario description
72      3. Provisioning Model
73         3.1. Common provisioning tasks
74            3.1.1. Non-service specific parameters
75            3.1.2. Service specific parameters
76         3.2. Service interface dependent provisioning tasks
77            3.2.1. VLAN-based service interface EVI
78            3.2.2. VLAN-bundle service interface EVI
79            3.2.3. VLAN-aware bundling service interface EVI
80      4. BGP EVPN NLRI usage
81      5. MAC-based forwarding model use-case
82         5.1. EVPN Network Startup procedures
83         5.2. VLAN-based service procedures
84            5.2.1. Service startup procedures
85            5.2.2. Packet walkthrough
86         5.3. VLAN-bundle service procedures
87            5.3.1. Service startup procedures
88            5.3.2. Packet Walkthrough
89         5.4. VLAN-aware bundling service procedures
90            5.4.1.
Service startup procedures
91            5.4.2. Packet Walkthrough
92      6. MPLS-based forwarding model use-case
93         6.1. Impact of MPLS-based forwarding on the EVPN network
94              startup
95         6.2. Impact of MPLS-based forwarding on the VLAN-based service
96              procedures
97         6.3. Impact of MPLS-based forwarding on the VLAN-bundle
98              service procedures
99         6.4. Impact of MPLS-based forwarding on the VLAN-aware service
100             procedures
101     7. Comparison between MAC-based and MPLS-based forwarding models
102     8. Traffic flow optimization
103        8.1. Control Plane Procedures
104           8.1.1. MAC learning options
105           8.1.2. Proxy-ARP/ND
106           8.1.3. Unknown Unicast flooding suppression
107           8.1.4. Optimization of Inter-subnet forwarding
108        8.2. Packet Walkthrough Examples
109           8.2.1. Proxy-ARP example for CE2 to CE3 traffic
110           8.2.2. Flood suppression example for CE1 to CE3 traffic
111           8.2.3. Optimization of inter-subnet forwarding example for
112                  CE3 to CE2 traffic
113     9. Conventions used in this document
114     10. Security Considerations
115     11. IANA Considerations
116     12. References
117        12.1. Normative References
118        12.2. Informative References
119     13.
Acknowledgments
120     14. Authors' Addresses

122  1. Introduction

124     This document complements [EVPN] by discussing the applicability of
125     the technology in a simple and fairly common deployment scenario,
126     which is described in section 2.

128     After describing the topology of the use-case scenario and the
129     characteristics of the service to be deployed, section 3 will
130     describe the provisioning model, comparing the EVPN procedures with
131     the provisioning tasks required for other VPN technologies, such as
132     VPLS or IP-VPN.

134     Once the provisioning model is analyzed, sections 4, 5 and 6 will
135     describe the control plane and data plane procedures in the example
136     scenario, for the two potential disposition/forwarding models: MAC-
137     based and MPLS-based models. While both models can interoperate in
138     the same network, each one has different trade-offs that are analyzed
139     in section 7.

141     Finally, EVPN provides some potential traffic flow optimization tools
142     that are also described in section 8, in the context of the example
143     scenario.

145  2. Use-case scenario description

147     The following figure depicts the scenario that will be referenced
148     throughout the rest of the document.

150                       +--------------+
151                       |              |
152     +----+     +----+ |              | +----+   +----+
153     | CE1|-----|    | |              | |    |---| CE3|
154     +----+    /| PE1| |   IP/MPLS    | | PE3|   +----+
155              / +----+ |   Network    | +----+
156             /         |              |
157            /   +----+ |              |
158     +----+/    |    | |              |
159     | CE2|-----| PE2| |              |
160     +----+     +----+ |              |
161                       +--------------+

163                   Figure 1 EVPN use-case scenario

165     There are three PEs and three CEs considered in this example: PE1,
166     PE2, PE3, as well as CE1, CE2 and CE3. Layer-2 traffic must be
167     extended among the three CEs. The following service requirements are
168     assumed in this scenario:

170     o Redundancy requirements: CE1 and CE3 are single-homed to PE1 and
171       PE3 respectively.
CE2 requires multi-homing connectivity to PE1 and
172       PE2, not only for redundancy purposes, but also for adding more
173       upstream/downstream connectivity bandwidth to/from the network. If
174       CE2 has a single CE-VID (or a few CE-VIDs) the current VPLS
175       multi-homing solutions (based on load-balancing per CE-VID or
176       service) do not provide the optimized link utilization required in
177       this example. Another redundancy requirement that must be met is
178       fast convergence. E.g.: if the link between CE2 and PE1 goes down,
179       a fast convergence mechanism must be supported so that PE3 can
180       immediately send the traffic to PE2, irrespective of the number
181       of affected services and MAC addresses. EVPN provides the
182       flow-based load-balancing multi-homing solution required in this
183       scenario to optimize the upstream/downstream link utilization
184       between CE2 and PE1-PE2. EVPN also provides a fast convergence
185       solution so that PE3 can immediately send the traffic to PE2 upon
186       failure on the link between CE2 and PE1.

188     o Service interface requirements: service definition must be flexible
189       in terms of CE-VID-to-broadcast-domain assignment and service
190       contexts in the core. The following three services are required in
191       this example:

193       EVI100 - It will use VLAN-based service interfaces in the three CEs
194       with a 1:1 mapping (VLAN-to-EVI). The CE-VIDs at the three CEs can
195       be the same, e.g.: VID 100, or different at each CE, e.g.: VID 101
196       in CE1, VID 102 in CE2 and VID 103 in CE3. A single broadcast
197       domain needs to be created for EVI100 in any case; therefore CE-
198       VIDs will require translation at the egress PEs if they are not
199       consistent across the three CEs. The case when the same CE-VID is
200       used across the three CEs for EVI100 is referred to in [EVPN] as the
201       "Unique VLAN" EVPN case. This term will be used throughout this
202       document too.

204       EVI200 - It will use VLAN-bundle service interfaces in CE1, CE2 and
205       CE3, based on an N:1 VLAN-to-EVI mapping. In this case, the service
206       provider just needs to assign a pre-configured number of CE-VIDs on
207       the ingress PE to EVI200, and send the customer frames with the
208       original CE-VIDs. The Service Provider will build a single
209       broadcast domain for the customer. The customer will be responsible
210       for the CE-VID handling.

212       EVI300 - It will use VLAN-aware bundling service interfaces in CE1,
213       CE2 and CE3. At the ingress PE, an N:1 VLAN-to-EVI mapping will be
214       done; however, as opposed to EVI200, a separate core broadcast
215       domain is required per CE-VID. In addition to that, the CE-VIDs can
216       be different (hence CE-VID translation is required). Note that,
217       while the requirements stated for EVI100 and EVI200 might be met
218       with the current VPLS solutions, the VLAN-aware bundling service
219       interfaces required by EVI300 are not supported by the current VPLS
220       tools.

222     NOTE: in section 3.2.1, only EVI100 is used as an example of
223     VLAN-based service provisioning. In sections 5.2 and 6.2, 4k
224     VLAN-based EVIs (EVI1 to EVI4k) are used so that the impact of MAC
225     vs. MPLS disposition models in the control plane can be evaluated. In
226     the same way, EVI200 and EVI300 will be described with a 4k:1 mapping
227     (CE-VIDs-to-EVI mapping) in sections 5.3-5.4 and 6.3-6.4.

229     o BUM (Broadcast, Unknown unicast, Multicast) optimization
230       requirements: The solution must be able to support ingress
231       replication, P2MP MPLS LSPs and MP2MP MPLS LSPs and the user must
232       be able to decide what kind of provider tree will be used by each
233       EVI service. For example, if we assume that EVI100 and EVI200 will
234       not carry much BUM traffic, we can use ingress replication for
235       those service instances. The benefit is that the core will not need
236       to maintain any state for the multicast trees associated to EVI100
237       and EVI200.
On the contrary, if EVI300 is presumably carrying a
238       significant amount of multicast traffic, P2MP MPLS LSPs or MP2MP
239       LSPs can be used for this service. Note that ingress replication
240       and P2MP LSPs are supported by VPLS solutions (see [VPLS-MCAST]),
241       however VPLS solutions do not support MP2MP LSPs, since the source
242       of the tree must be identified for the data plane MAC learning, and
243       that identification is challenging when using MP2MP LSPs. Since
244       EVPN uses the control plane for MAC learning, any type of provider
245       multicast tree is supported in the core.

247     As already outlined above, the current VPLS solutions, based on
248     [RFC4761][RFC4762][RFC6074], cannot meet the full set of requirements
249     above and therefore a new solution is needed. The rest of the
250     document will describe how EVPN can be used to meet those service
251     requirements and even optimize the network further by:

253     o Providing the user with an option to reduce (and even suppress)
254       ARP flooding.

256     o Supporting ARP termination for inter-subnet forwarding.

258  3. Provisioning Model

260     One of the requirements stated in [RFC7209] is the ease of
261     provisioning. BGP parameters and service context parameters should be
262     auto-provisioned so that the addition of a new MAC-VRF to the EVI
263     requires a minimum number of single-sided provisioning touches.
264     However this is only possible in a limited number of cases. This
265     section describes the provisioning tasks required for the services
266     described in section 2, i.e. EVI100 (VLAN-based service interfaces),
267     EVI200 (VLAN-bundle service interfaces) and EVI300 (VLAN-aware
268     bundling service interfaces).

270  3.1. Common provisioning tasks

272     Regardless of the service interface type (VLAN-based, VLAN-bundle or
273     VLAN-aware), the following sub-sections describe the parameters to be
274     provisioned in the three PEs.

276  3.1.1.
Non-service specific parameters

278     The multi-homing function in EVPN requires the provisioning of
279     certain parameters which are not service-specific and that are shared
280     by all the MAC-VRFs in the node using the multi-homing capabilities.
281     In our use-case, these parameters are only provisioned in PE1 and
282     PE2, and are listed below:

284     o Ethernet Segment Identifier (ESI): only the ESI associated to CE2
285       needs to be considered in our example. Single-homed CEs such as CE1
286       and CE3 do not require the provisioning of an ESI (the ESI will be
287       coded as zero in the BGP NLRIs). In our example, a LAG is used
288       between CE2 and PE1-PE2 (since all-active multi-homing is a
289       requirement) therefore the ESI can be auto-derived from the LACP
290       information as described in [EVPN]. Note that the ESI MUST be
291       unique across all the PEs in the network, therefore the
292       auto-provisioning of the ESI is only recommended in case the CEs
293       are managed by the Service Provider. Otherwise the ESI should be
294       manually provisioned (type 0 as in [EVPN]) in order to avoid
295       potential conflicts.

297     o ES-Import Route Target (ES-Import RT): this is the RT that will be
298       sent by PE1 and PE2, along with the ES route. Regardless of how the
299       ESI is provisioned in PE1 and PE2, the ES-Import RT must always be
300       auto-derived from the 6-byte MAC address portion of the ESI value.

302     o Ethernet Segment Route Distinguisher (ES RD): this is the RD to be
303       encoded in the ES route and Ethernet Auto-Discovery (A-D) route to
304       be sent by PE1 and PE2 for the CE2 ESI. This RD should always be
305       auto-derived from the PE IP address, as described in [EVPN].

307     o Multi-homing type: the user must be able to provision the
308       multi-homing type to be used in the network. In our use-case, the
309       multi-homing type will be set to all-active for the CE2 ESI.
This
310       piece of information is encoded in the ESI Label extended community
311       flags and sent by PE1 and PE2 along with the Ethernet A-D route for
312       the CE2 ESI.

314     In our use-case, besides the above parameters, the same LACP
315     parameters will be configured in PE1 and PE2 for the ESI, so that CE2
316     can send different flows to PE1 and PE2 for the same CE-VID as though
317     they were forming a single system from the CE2 perspective.

319  3.1.2. Service specific parameters

321     The following parameters must be provisioned in PE1, PE2 and PE3 per
322     EVI service:

324     o EVI identifier: global identifier per EVI that is shared by all the
325       PEs that are part of the EVI, i.e. PE1, PE2 and PE3 will be
326       provisioned with EVI100, 200 and 300. The EVI identifier can be
327       associated to (or be the same value as) the EVI default Ethernet Tag
328       (4-byte default broadcast domain identifier for the EVI). The
329       Ethernet Tag is different from zero in the EVPN BGP routes only if
330       the service interface type (of the source PE) is VLAN-aware.

332     o EVI Route Distinguisher (EVI RD): This RD is a unique value across
333       all the MAC-VRFs in a PE. Auto-derivation of this RD might be
334       possible depending on the service interface type being used in the
335       EVI. The next section discusses the specifics of each service
336       interface type.

338     o EVI Route Target(s) (EVI RT): one or more RTs can be provisioned
339       per MAC-VRF. The RT(s) imported and exported can be equal or
340       different, just as the RT(s) in IP-VPNs. Auto-derivation of the
341       RT(s) might be possible depending on the service interface type
342       being used in the EVI. The next section discusses the specifics of
343       each service interface type.

345     o CE-VID and port/LAG binding to EVI identifier or Ethernet Tag: see
346       section 3.2.

348  3.2. Service interface dependent provisioning tasks

350     Depending on the service interface type being used in the EVI, a
351     specific CE-VID binding provisioning must be specified.

353  3.2.1.
VLAN-based service interface EVI

355     In our use-case, EVI100 is a VLAN-based service interface EVI.

357     EVI100 can be a "unique-VLAN" EVPN if the CE-VID being used for this
358     service in CE1, CE2 and CE3 is equal, e.g. VID 100. In that case, the
359     VID 100 binding must be provisioned in PE1, PE2 and PE3 for EVI100
360     and the associated port or LAG. The MAC-VRF RD and RT can be auto-
361     derived from the CE-VID:

363     o The auto-derived MAC-VRF RD will be a Type 1 RD, as recommended in
364       [EVPN], and it will be composed of [PE-IP]:[zero-padded-VID];
365       where PE-IP is the IP address of the PE (a loopback address) and
366       [zero-padded-VID] is a 2-byte value where the low order 12 bits are
367       the VID (VID 100 in our example) and the high order 4 bits are
368       zero.

370     o The auto-derived MAC-VRF RT will be composed of [AS]:[zero-padded-
371       VID]; where AS is the Autonomous System that the PE belongs to and
372       [zero-padded-VID] is a 4-byte value where the low order 12 bits are
373       the VID (VID 100 in our example) and the high order 20 bits are
374       zero. Note that auto-deriving the RT implies supporting a basic
375       any-to-any topology in the EVI and using the same import and export
376       RT in the EVI.

378     If EVI100 is not a "unique-VLAN" EVPN, each individual CE-VID must be
379     configured in each PE, and MAC-VRF RDs and RTs cannot be auto-
380     derived, hence they must be provisioned by the user.

382  3.2.2. VLAN-bundle service interface EVI

384     Assuming EVI200 is a VLAN-bundle service interface EVI, and VIDs
385     200-250 are assigned to EVI200, the CE-VID bundle 200-250 must be
386     provisioned on PE1, PE2 and PE3. Note that this model does not allow
387     CE-VID translation and the CEs must use the same CE-VIDs for EVI200.
388     No auto-derived EVI RDs or EVI RTs are possible.

390  3.2.3.
VLAN-aware bundling service interface EVI

392     If EVI300 is a VLAN-aware bundling service interface EVI, CE-VID
393     binding to EVI300 does not have to match on the three PEs (only on
394     PE1 and PE2, since they are part of the same ES). E.g.: PE1 and PE2
395     CE-VID binding to EVI300 can be set to the range 300-310 and PE3 to
396     321-330. Note that each individual CE-VID will be assigned to a core
397     broadcast domain, i.e. Ethernet Tag, which will be encoded in the BGP
398     EVPN routes.

400     Therefore, besides the CE-VID bundle range bound to EVI300 in each
401     PE, associations between each individual CE-VID and the EVPN Ethernet
402     Tag must be provisioned by the user. No auto-derived EVI RDs/RTs are
403     possible.

405  4. BGP EVPN NLRI usage

407     [EVPN] defines four different types of routes and four different
408     extended communities advertised along with the different routes.
409     However not all the PEs in a network must generate and process all
410     the different routes and extended communities. The following table
411     shows the routes that must be exported and imported in the use-case
412     described in this document. "Export", in this context, means that the
413     PE must be capable of generating and exporting a given route,
414     assuming there are no BGP policies to prevent it. In the same way,
415     "Import" means the PE must be capable of importing and processing a
416     given route, assuming the right RTs and policies. "N/A" means neither
417     import nor export actions are required.

419     +-------------------+---------------+---------------+
420     | BGP EVPN routes   | PE1-PE2       | PE3           |
421     +-------------------+---------------+---------------+
422     | ES                | Export/import | N/A           |
423     | A-D per ESI       | Export/import | Import        |
424     | A-D per EVI       | Export/import | Import        |
425     | MAC               | Export/import | Export/import |
426     | Inclusive mcast   | Export/import | Export/import |
427     +-------------------+---------------+---------------+

429     PE3 is only required to export MAC and Inclusive multicast routes and
430     be able to import and process A-D routes, as well as MAC and
431     Inclusive multicast routes. If PE3 did not support importing and
432     processing A-D routes per ESI and per EVI, fast convergence and
433     aliasing functions (respectively) would not be possible in this
434     use-case.

436  5. MAC-based forwarding model use-case

438     This section describes how the BGP EVPN routes are exported and
439     imported by the PEs in our use-case, as well as how traffic is
440     forwarded assuming that PE1, PE2 and PE3 support a MAC-based
441     forwarding model. In order to compare the control and data plane
442     impact in the two forwarding models (MAC-based and MPLS-based) and
443     different service types, we will assume that CE1, CE2 and CE3 need to
444     exchange traffic for up to 4k CE-VIDs.

446  5.1. EVPN Network Startup procedures

448     Before any EVI is provisioned in the network, the following
449     procedures are required:

451     o Infrastructure setup: the proper MPLS infrastructure must be set up
452       among PE1, PE2 and PE3 so that the EVPN services can make use of
453       P2P, P2MP and/or MP2MP LSPs. In addition to the MPLS transport, PE1
454       and PE2 must be properly configured with the same LACP
455       configuration to CE2. Details are provided in [EVPN]. Once the LAG
456       is properly set up, the ESI for the CE2 Ethernet Segment, e.g.
457       ESI12, can be auto-generated by PE1 and PE2 from the LACP
458       information exchanged with CE2 (ESI type 1), as discussed in
459       section 3.1.
Alternatively, the ESI can also be manually
460       provisioned on PE1 and PE2 (ESI type 0). PE1 and PE2 will auto-
461       configure a BGP policy that will import any ES route matching the
462       auto-derived ES-import RT for ESI12.

464     o Ethernet Segment route exchange and DF election: PE1 and PE2 will
465       advertise a BGP Ethernet Segment route for ESI12, where the ES RD
466       and ES-Import RT will be auto-generated as discussed in section
467       3.1.1. PE1 and PE2 will import the ES routes of each other and will
468       run the DF election algorithm for any existing EVI (if any, at this
469       point). PE3 will simply discard the route. Note that the DF
470       election algorithm can support service carving, so that the
471       downstream BUM traffic from the network to CE2 can be load-balanced
472       across PE1 and PE2 on a per-service basis.

474     At the end of this process, the network infrastructure is ready to
475     start deploying EVPN services. PE1 and PE2 are aware of the existence
476     of a shared Ethernet Segment, i.e. ESI12.

478  5.2. VLAN-based service procedures

480     Assuming that the EVPN network must carry traffic among CE1, CE2 and
481     CE3 for up to 4k CE-VIDs, the Service Provider can decide to
482     implement VLAN-based service interface EVIs to accomplish it. In this
483     case, each CE-VID will be individually mapped to a different EVI.
484     While this means a total number of 4k MAC-VRFs is required per PE,
485     the advantages of this approach are the auto-provisioning of most of
486     the service parameters if no VLAN translation is needed (see section
487     3.2.1) and great control over each individual customer broadcast
488     domain. We assume in this section that the range of EVIs from 1 to 4k
489     is provisioned in the network.

491  5.2.1.
Service startup procedures

493     As soon as the EVIs are created in PE1, PE2 and PE3, the following
494     control plane actions are carried out:

496     o Flooding tree setup per EVI (4k routes): Each PE will send one
497       Inclusive Multicast Ethernet Tag route per EVI (up to 4k routes per
498       PE) so that the flooding tree per EVI can be set up. Note that
499       ingress replication, P2MP LSPs or MP2MP LSPs can optionally be
500       signaled in the PMSI Tunnel attribute and the corresponding tree be
501       created.

503     o Ethernet A-D routes per ESI (a set of routes for ESI12): A set of
504       A-D routes with a list of 4k RTs (one per EVI) for ESI12 will be
505       issued from PE1 and PE2 (it has to be a set of routes so that the
506       total number of RTs can be conveyed). This set will also include
507       ESI Label extended communities with the active-standby flag set to
508       zero (all-active multi-homing type) and an ESI Label different from
509       zero (used for split-horizon functions). These routes will be
510       imported by the three PEs, since the RTs match the EVI RTs locally
511       configured. The A-D routes per ESI will be used for fast
512       convergence and split-horizon functions, as discussed in [EVPN].

514     o Ethernet A-D routes per EVI (4k routes): An A-D route per EVI will
515       be sent by PE1 and PE2 for ESI12. Each individual route includes
516       the corresponding EVI RT and an MPLS label to be used by PE3 for
517       the aliasing function. These routes will be imported by the three
518       PEs.

520  5.2.2. Packet walkthrough

522     Once the services are set up, the traffic can start flowing. Assuming
523     there are no MAC addresses learnt yet and that MAC learning at the
524     access is performed in the data plane in our use-case, this is the
525     process followed upon receiving frames from each CE (example for
526     EVI1).

528     (1) BUM frame example from CE1:

530     a) An ARP-request with CE-VID=1 is issued from source MAC CE1-MAC
531        (MAC address coming from CE1 or from a device connected to CE1) to
532        find the MAC address of CE3-IP.

534     b) Based on the CE-VID, the frame is identified to be forwarded in
535        the MAC-VRF-1 (EVI1) context. A source MAC lookup is done in the
536        MAC FIB and the sender's CE1-IP in the proxy-ARP table within the
537        MAC-VRF-1 (EVI1) context. If CE1-MAC/CE1-IP are unknown in both
538        tables, three actions are carried out (assuming the source MAC is
539        accepted by PE1): (1) a forwarding state is added for CE1-MAC
540        associated to the corresponding port and CE-VID, (2) the ARP-
541        request is snooped and the tuple CE1-MAC/CE1-IP is added to the
542        proxy-ARP table and (3) a BGP MAC advertisement route is triggered
543        from PE1 containing the EVI1 RD and RT, ESI=0, Ethernet-Tag=0 and
544        CE1-MAC/CE1-IP along with an MPLS label assigned to MAC-VRF-1 from
545        the PE1 label space. Note that depending on the implementation,
546        the MAC FIB and proxy-ARP learning processes can independently
547        send two BGP MAC advertisements instead of one (one containing
548        only the CE1-MAC and another one containing CE1-MAC/CE1-IP).

550        Since we assume a MAC forwarding model, a label per MAC-VRF is
551        normally allocated and signaled by the three PEs for MAC
552        advertisement routes. Based on the RT, the route is imported by
553        PE2 and PE3 and the forwarding state plus ARP entry are added to
554        their MAC-VRF-1 context. From this moment on, any ARP request from
555        CE2 or CE3 destined to CE1-IP can be answered directly by PE1, PE2
556        or PE3 and ARP flooding for CE1-IP is not needed in the core.

558     c) Since the ARP frame is a broadcast frame, it is forwarded by PE1
559        using the Inclusive multicast tree for EVI1 (CE-VID=1 should be
560        kept if translation is required). Depending on the type of tree,
561        the label stack may vary. E.g.
assuming ingress replication, the
562        packet is replicated to PE2 and PE3 with the downstream allocated
563        labels and the P2P LSP transport labels. No other labels are added
564        to the stack.

566     d) Assuming PE1 is the DF for EVI1 on ESI12, the frame is locally
567        replicated to CE2.

569     e) The MPLS-encapsulated frame gets to PE2 and PE3. Since PE2 is non-
570        DF for EVI1 on ESI12, and there is no other CE connected to PE2,
571        the frame is discarded. At PE3, the frame is de-encapsulated, CE-
572        VID translated if needed and replicated to CE3.

574     Any other type of BUM frame from CE1 would follow the same
575     procedures. BUM frames from CE3 would follow the same procedures too.

577     (2) BUM frame example from CE2:

579     a) An ARP-request with CE-VID=1 is issued from source MAC CE2-MAC to
580        find the MAC address of CE3-IP.

582     b) CE2 will hash the frame and will forward it to e.g. PE2. Based on
583        the CE-VID, the frame is identified to be forwarded in the EVI1
584        context. A source MAC lookup is done in the MAC FIB and the
585        sender's CE2-IP in the proxy-ARP table within the MAC-VRF-1
586        context. If both are unknown, three actions are carried out
587        (assuming the source MAC is accepted by PE2): (1) a forwarding
588        state is added for CE2-MAC associated to the corresponding LAG/ESI
589        and CE-VID, (2) the ARP-request is snooped and the tuple CE2-
590        MAC/CE2-IP is added to the proxy-ARP table and (3) a BGP MAC
591        advertisement route is triggered from PE2 containing the EVI1 RD
592        and RT, ESI=12, Ethernet-Tag=0 and CE2-MAC/CE2-IP along with an
593        MPLS label assigned from the PE2 label space (one label per MAC-
594        VRF). Again, depending on the implementation, the MAC FIB and
595        proxy-ARP learning processes can independently send two BGP MAC
596        advertisements instead of one.

598        Note that, since PE3 is not part of ESI12, it will install a
599        forwarding state for CE2-MAC as long as the A-D routes for ESI12
600        are also active on PE3.
      On the contrary, PE1 is part of ESI12, therefore PE1 will not
      modify the forwarding state for CE2-MAC if it has previously
      learnt CE2-MAC locally attached to ESI12. Otherwise it will add
      forwarding state for CE2-MAC associated to the local ESI12 port.

   c) Assuming PE2 does not have the ARP information for CE3-IP yet,
      and since the ARP is a broadcast frame and PE2 is the non-DF for
      EVI1 on ESI12, the frame is forwarded by PE2 in the Inclusive
      multicast tree for EVI1, adding the ESI label for ESI12 at the
      bottom of the stack. The ESI label has been previously allocated
      and signaled by the A-D routes for ESI12. Note that, as per
      [EVPN], if the result of the CE2 hashing is different and the
      frame is sent to PE1, PE1 SHOULD add the ESI label too (PE1 is
      the DF for EVI1 on ESI12).

   d) The MPLS-encapsulated frame gets to PE1 and PE3. PE1
      de-encapsulates the Inclusive multicast tree label(s) and, based
      on the ESI label at the bottom of the stack, decides not to
      forward the frame to ESI12. It will pop the ESI label and will
      still replicate the frame to CE1, since CE1 is not part of the
      ESI identified by the ESI label. At PE3, the Inclusive multicast
      tree label is popped and the frame forwarded to CE3. If a P2MP
      LSP is used as Inclusive multicast tree for EVI1, PE3 will find
      an ESI label after popping the P2MP LSP label. The ESI label
      will simply be ignored and popped, since CE3 is not part of
      ESI12.

   (3) Unicast frame example from CE3 to CE1:

   a) A unicast frame with CE-VID=1 is issued from source MAC CE3-MAC
      and destination MAC CE1-MAC (we assume PE3 has previously
      resolved an ARP request from CE3 to find the MAC of CE1-IP, and
      has added CE3-MAC/CE3-IP to its proxy-ARP table).

   b) Based on the CE-VID, the frame is identified to be forwarded in
      the EVI1 context.
      A source MAC lookup is done in the MAC FIB within the MAC-VRF-1
      context and this time, since we assume CE3-MAC is known, no
      further actions are carried out as a result of the source
      lookup. A destination MAC lookup is performed next and the label
      stack associated to the MAC CE1-MAC is found (including the
      label associated to MAC-VRF-1 in PE1 and the P2P LSP label to
      get to PE1). The unicast frame is then encapsulated and
      forwarded to PE1.

   c) At PE1, the packet is identified to be part of EVI1 and a
      destination MAC lookup is performed in the MAC-VRF-1 context.
      The labels are popped and the frame forwarded to CE1 with
      CE-VID=1.

   Unicast frames from CE1 to CE3 or from CE2 to CE3 follow the same
   procedures described above.

   (4) Unicast frame example from CE3 to CE2:

   a) A unicast frame with CE-VID=1 is issued from source MAC CE3-MAC
      and destination MAC CE2-MAC (we assume PE3 has previously
      resolved an ARP request from CE3 to find the MAC of CE2-IP).

   b) Based on the CE-VID, the frame is identified to be forwarded in
      the MAC-VRF-1 context. We assume CE3-MAC is known. A destination
      MAC lookup is performed next and PE3 finds CE2-MAC associated to
      PE2 on ESI12, an Ethernet Segment for which PE3 has two active
      A-D routes per ESI (from PE1 and PE2) and two active A-D routes
      for EVI1 (from PE1 and PE2). Based on a hashing function for the
      frame, PE3 may decide to forward the frame using the label stack
      associated to PE2 (label received from the MAC advertisement
      route) or the label stack associated to PE1 (label received from
      the A-D route per EVI for EVI1). Either way, the frame is
      encapsulated and sent to the remote PE.

   c) At PE2 (or PE1), the packet is identified to be part of EVI1
      based on the bottom label, and a destination MAC lookup is
      performed. At either PE (PE2 or PE1), the FIB lookup yields a
      local ESI12 port to which the frame is sent.
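   The next-hop selection that PE3 performs in step b) of example (4)
   can be sketched as follows. This is only an illustrative model, not
   an implementation taken from [EVPN]: the names NextHop,
   candidate_next_hops and pick_next_hop, the dictionary layout of the
   routes and the CRC32-based hash are all assumptions made for the
   example.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class NextHop:
    pe: str      # remote PE to send the frame to
    label: int   # EVPN service label to push for that PE
    source: str  # route that supplied the label


def candidate_next_hops(mac_route, ad_per_es, ad_per_evi):
    """Build the next hops for a MAC learnt on a multi-homed ES.

    mac_route  -- hypothetical dict {"pe": ..., "label": ...} taken from
                  the MAC advertisement route
    ad_per_es  -- set of PEs with an active A-D route per ESI
    ad_per_evi -- dict PE -> label from the A-D route per EVI
    """
    hops = []
    for pe in sorted(ad_per_es):
        if pe == mac_route["pe"]:
            # The advertising PE is reached with the MAC route label.
            hops.append(NextHop(pe, mac_route["label"], "MAC route"))
        elif pe in ad_per_evi:
            # Aliasing: other PEs on the ES use the A-D per EVI label.
            hops.append(NextHop(pe, ad_per_evi[pe], "A-D per EVI"))
    return hops


def pick_next_hop(hops, src_mac, dst_mac):
    """Hash the flow onto one candidate (the hash input is an assumption)."""
    if not hops:
        raise LookupError("no active A-D route per ESI for this segment")
    key = f"{src_mac}->{dst_mac}".encode()
    return hops[zlib.crc32(key) % len(hops)]
```

   With PE1 and PE2 both active on ESI12, the candidate list holds two
   entries and a given flow always hashes to the same one. If the A-D
   route per ESI from PE1 is withdrawn, only the PE2 entry remains.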

   Unicast frames from CE1 to CE2 follow the same procedures. Aliasing
   is possible in this case too, since ESI12 is local to PE1 and load
   balancing through PE1 and PE2 may happen.

5.3. VLAN-bundle service procedures

   Instead of using VLAN-based interfaces, the Service Provider can
   choose to implement VLAN-bundle interfaces to carry the traffic for
   the 4k CE-VIDs among CE1, CE2 and CE3. If that is the case, the 4k
   CE-VIDs can be mapped to the same EVI, e.g. EVI200, at each PE. The
   main advantage of this approach is the low control plane overhead
   (reduced number of routes and labels) and ease of provisioning, at
   the expense of no control over the customer broadcast domains, i.e.
   a single inclusive multicast tree for all the CE-VIDs and no CE-VID
   translation in the Provider network.

5.3.1. Service startup procedures

   As soon as the EVI200 is created in PE1, PE2 and PE3, the following
   control plane actions are carried out:

   o  Flooding tree setup per EVI (one route): Each PE will send one
      Inclusive Multicast Ethernet Tag route per EVI (hence only one
      route per PE) so that the flooding tree per EVI can be set up.
      Note that ingress replication, P2MP LSPs or MP2MP LSPs can
      optionally be signaled in the PMSI Tunnel attribute and the
      corresponding tree be created.

   o  Ethernet A-D routes per ESI (one route for ESI12): A single A-D
      route for ESI12 will be issued from PE1 and PE2. This route will
      include a single RT (RT for EVI200), an ESI Label extended
      community with the active-standby flag set to zero (all-active
      multi-homing type) and an ESI Label different from zero (used by
      the non-DF for split-horizon functions). This route will be
      imported by the three PEs, since the RT matches the EVI200 RT
      locally configured. The A-D routes per ESI will be used for fast
      convergence and split-horizon functions, as described in [EVPN].

   o  Ethernet A-D routes per EVI (one route): An A-D route (EVI200)
      will be sent by PE1 and PE2 for ESI12. This route includes the
      EVI200 RT and an MPLS label to be used by PE3 for the aliasing
      function. This route will be imported by the three PEs.

5.3.2. Packet Walkthrough

   The packet walkthrough for the VLAN-bundle case is similar to the
   one described for EVI1 in the VLAN-based case except for the way
   the CE-VID is handled by the ingress PE and the egress PE:

   o  No VLAN translation is allowed and the CE-VIDs are kept
      untouched from CE to CE, i.e. the ingress CE-VID MUST be kept at
      the imposition PE and at the disposition PE.

   o  The frame is identified to be forwarded in the MAC-VRF-200
      context as long as its CE-VID belongs to the VLAN-bundle defined
      in the PE1/PE2/PE3 port to CE1/CE2/CE3. Our example is a special
      VLAN-bundle case, since the entire CE-VID range is defined in
      the ports, therefore any CE-VID would be part of EVI200.

   Please refer to section 5.2.2 for more information about the
   control plane and forwarding plane interaction for BUM and unicast
   traffic from the different CEs.

5.4. VLAN-aware bundling service procedures

   The last potential service type analyzed in this document is
   VLAN-aware bundling. When this type of service interface is used to
   carry the 4k CE-VIDs among CE1, CE2 and CE3, all the CE-VIDs will
   be mapped to the same EVI, e.g. EVI300. The difference, compared to
   the VLAN-bundle service type in the previous section, is that each
   incoming CE-VID will also be mapped to a different "normalized"
   Ethernet-Tag in addition to EVI300. If no translation is required,
   the Ethernet-Tag will match the CE-VID. Otherwise a translation
   between CE-VID and Ethernet-Tag will be needed at the imposition PE
   and at the disposition PE.
   The main advantage of this approach is the ability to control
   customer broadcast domains while providing a single EVI to the
   customer.

5.4.1. Service startup procedures

   As soon as the EVI300 is created in PE1, PE2 and PE3, the following
   control plane actions are carried out:

   o  Flooding tree setup per EVI per Ethernet-Tag (4k routes): Each
      PE will send one Inclusive Multicast Ethernet Tag route per EVI
      and per Ethernet-Tag (hence 4k routes per PE) so that the
      flooding tree per customer broadcast domain can be set up. Note
      that ingress replication, P2MP LSPs or MP2MP LSPs can optionally
      be signaled in the PMSI Tunnel attribute and the corresponding
      tree be created. In the described use-case, since all the
      CE-VIDs and Ethernet-Tags are defined on the three PEs,
      multicast tree aggregation might make sense in order to save
      forwarding states.

   o  Ethernet A-D routes per ESI (one route for ESI12): A single A-D
      route for ESI12 will be issued from PE1 and PE2. This route will
      include a single RT (RT for EVI300), an ESI Label extended
      community with the active-standby flag set to zero (all-active
      multi-homing type) and an ESI Label different from zero (used by
      the non-DF for split-horizon functions). This route will be
      imported by the three PEs, since the RT matches the EVI300 RT
      locally configured. The A-D routes per ESI will be used for fast
      convergence and split-horizon functions, as described in [EVPN].

   o  Ethernet A-D routes per EVI (one route): An A-D route (EVI300)
      will be sent by PE1 and PE2 for ESI12. This route includes the
      EVI300 RT and an MPLS label to be used by PE3 for the aliasing
      function. This route will be imported by the three PEs.

5.4.2. Packet Walkthrough

   The packet walkthrough for the VLAN-aware case is similar to the
   one described before.
   Compared to the other two cases, VLAN-aware services allow for
   CE-VID translation and for an N:1 CE-VID to EVI mapping. Neither of
   the other two service interfaces supports both at once. Note that
   this model requires qualified learning on the MAC FIBs. Some
   differences compared to the packet walkthrough described in section
   5.2.2 are:

   o  At the ingress PE, the frames are identified to be forwarded in
      the EVI300 context as long as their CE-VID belongs to the range
      defined in the PE port to the CE. In addition, CE-VID=x is
      mapped to a "normalized" Ethernet-Tag=y at the MAC-VRF-300
      (where x and y might be equal if no translation is needed).
      Qualified learning is now required (a different FIB space is
      allocated within MAC-VRF-300 for each Ethernet-Tag).
      Potentially the same MAC could be learnt in two different
      Ethernet-Tag bridge domains of the same MAC-VRF.

   o  Any new locally learnt MAC on the MAC-VRF-300/Ethernet-Tag=y
      interface is advertised by the ingress PE in a MAC advertisement
      route, now using the Ethernet-Tag field (Ethernet-Tag=y) so that
      the remote PE learns the MAC associated to the
      MAC-VRF-300/Ethernet-Tag=y FIB. Note that the Ethernet-Tag field
      is not used in advertisements of MACs learnt on VLAN-based or
      VLAN-bundle service interfaces.

   o  At the ingress PE, BUM frames are sent to the corresponding
      flooding tree for the particular Ethernet-Tag they are mapped
      to. Each individual Ethernet-Tag can have a different flooding
      tree within the same EVI300. For instance, Ethernet-Tag=y can
      use ingress replication to get to the remote PEs whereas
      Ethernet-Tag=z can use a P2MP LSP.

   o  At the egress PE, Ethernet-Tag=y, for a given broadcast domain
      within MAC-VRF-300, can be translated to egress CE-VID=x. That
      is not possible for VLAN-bundle interfaces.
      It is possible for VLAN-based interfaces, but it requires a
      separate EVI per CE-VID.

6. MPLS-based forwarding model use-case

   EVPN supports an alternative forwarding model, usually referred to
   as MPLS-based forwarding or disposition model as opposed to the
   MAC-based forwarding or disposition model described in section 5.
   Using the MPLS-based forwarding model instead of the MAC-based
   model might have an impact on:

   o  The number of forwarding states required

   o  The FIB where the forwarding states are handled: MAC FIB or MPLS
      LFIB.

   The MPLS-based forwarding model avoids the destination MAC lookup
   at the egress PE MAC FIB, at the expense of increasing the number
   of next-hop forwarding states at the egress MPLS LFIB. This also
   has an impact on the control plane and the label allocation model,
   since an MPLS-based disposition PE MUST send as many routes and
   labels as required next-hops in the egress MAC-VRF. This concept is
   equivalent to the forwarding models supported in IP-VPNs at the
   egress PE, where an IP lookup in the IP-VPN FIB might be necessary
   or not depending on the available next-hop forwarding states in the
   LFIB.

   The following sub-sections highlight the impact on the control and
   data plane procedures described in section 5 when an MPLS-based
   forwarding model is used.

   Note that both forwarding models are compatible and interoperable
   in the same network. The implementation of either model in each PE
   is a local decision to the PE node.

6.1. Impact of MPLS-based forwarding on the EVPN network startup

   The MPLS-based forwarding model has no impact on the procedures
   explained in section 5.1.

6.2. Impact of MPLS-based forwarding on the VLAN-based service
     procedures

   Compared to the MAC-based forwarding model, the MPLS-based
   forwarding model has no impact in terms of number of routes, when
   all the service interfaces are VLAN-based.
   The differences for the use-case described in this document are
   summarized in the following list:

   o  Flooding tree setup per EVI (4k routes per PE): no impact
      compared to the MAC-based model.

   o  Ethernet A-D routes per ESI (one set of routes for ESI12 per
      PE): no impact compared to the MAC-based model.

   o  Ethernet A-D routes per EVI (4k routes per PE/ESI): no impact
      compared to the MAC-based model.

   o  MAC-advertisement routes: instead of allocating and advertising
      the same MPLS label for all the new MACs locally learnt on the
      same MAC-VRF, a different label MUST be advertised per CE
      next-hop or MAC so that no MAC FIB lookup is needed at the
      egress PE. In general, this means that a different label at
      least per CE must be advertised, although the PE can decide to
      implement a label per MAC if more granularity (hence less
      scalability) is required in terms of forwarding states. E.g. if
      CE2 sends traffic from two different MACs to PE1, CE2-MAC1 and
      CE2-MAC2, the same MPLS label=x can be re-used for both MAC
      advertisements since they both share the same source ESI12. It
      is up to the PE1 implementation to use a different label per
      individual MAC within the same Ethernet Segment (even if only
      one label per ESI is enough).

   o  PE1, PE2 and PE3 will not add forwarding states to the MAC FIB
      upon learning new local CE MAC addresses on the data plane, but
      will rather add forwarding states to the MPLS LFIB.

6.3. Impact of MPLS-based forwarding on the VLAN-bundle service
     procedures

   Compared to the MAC-based forwarding model, the MPLS-based
   forwarding model has no impact in terms of number of routes when
   all the service interfaces are VLAN-bundle type. The differences
   for the use-case described in this document are summarized in the
   following list:

   o  Flooding tree setup per EVI (one route): no impact compared to
      the MAC-based model.

   o  Ethernet A-D routes per ESI (one route for ESI12 per PE): no
      impact compared to the MAC-based model.

   o  Ethernet A-D routes per EVI (one route per PE/ESI): no impact
      compared to the MAC-based model since no VLAN translation is
      required.

   o  MAC-advertisement routes: instead of allocating and advertising
      the same MPLS label for all the new MACs locally learnt on the
      same MAC-VRF, a different label MUST be advertised per CE
      next-hop or MAC so that no MAC FIB lookup is needed at the
      egress PE. In general, this means that a different label at
      least per CE must be advertised, although the PE can decide to
      implement a label per MAC if more granularity (hence less
      scalability) is required in terms of forwarding states. It is up
      to the PE1 implementation to use a different label per
      individual MAC within the same Ethernet Segment (even if only
      one label per ESI is enough).

   o  PE1, PE2 and PE3 will not add forwarding states to the MAC FIB
      upon learning new local CE MAC addresses on the data plane, but
      will rather add forwarding states to the MPLS LFIB.

6.4. Impact of MPLS-based forwarding on the VLAN-aware service
     procedures

   Compared to the MAC-based forwarding model, the MPLS-based
   forwarding model definitely has an impact in terms of number of A-D
   routes when all the service interfaces are VLAN-aware bundle type.
   The differences for the use-case described in this document are
   summarized in the following list:

   o  Flooding tree setup per EVI (4k routes per PE): no impact
      compared to the MAC-based model.

   o  Ethernet A-D routes per ESI (one route for ESI12 per PE): no
      impact compared to the MAC-based model.

   o  Ethernet A-D routes per EVI (4k routes per PE/ESI): PE1 and PE2
      will send 4k routes for EVI300, one per <ESI, Ethernet-Tag>
      tuple.
      This will allow the egress PE to find out all the forwarding
      information in the MPLS LFIB and even support Ethernet-Tag to
      CE-VID translation at the egress. The MAC-based forwarding model
      would allow the PEs to send a single route per PE/ESI for
      EVI300, since the packet with the embedded Ethernet-Tag would be
      used to perform a MAC lookup and find out the egress CE-VID.

   o  MAC-advertisement routes: instead of allocating and advertising
      the same MPLS label for all the new MACs locally learnt on the
      same MAC-VRF, a different label MUST be advertised per CE
      next-hop or MAC so that no MAC FIB lookup is needed at the
      egress PE. In general, this means that a different label at
      least per CE must be advertised, although the PE can decide to
      implement a label per MAC if more granularity (hence less
      scalability) is required in terms of forwarding states. It is up
      to the PE1 implementation to use a different label per
      individual MAC within the same Ethernet Segment. Note that, in
      this model, the Ethernet-Tag will be set to a non-zero value for
      the MAC-advertisement routes. The same MAC address can be
      announced with different Ethernet-Tag values. This will make the
      advertising PE install two different forwarding states in the
      MPLS LFIB.

   o  PE1, PE2 and PE3 will not add forwarding states to the MAC FIB
      upon learning new local CE MAC addresses on the data plane, but
      will rather add forwarding states to the MPLS LFIB.

7. Comparison between MAC-based and MPLS-based forwarding models

   Both forwarding models are possible in a network deployment and
   each one has its own trade-offs.

   The MAC-based forwarding model can save A-D routes per EVI when
   VLAN-aware bundling services are deployed and therefore reduce the
   control plane overhead. This model also saves a significant amount
   of MPLS labels compared to the MPLS-based forwarding model.
   All the MACs and A-D routes for the same EVI can signal the same
   MPLS label, saving labels from the local PE space. A MAC FIB lookup
   at the egress PE is required in order to do so.

   The MPLS-based forwarding model can save forwarding states at the
   egress PEs if labels per next-hop CE (as opposed to per MAC) are
   implemented. No egress MAC lookup is required. An A-D route per
   <ESI, Ethernet-Tag> is required for VLAN-aware services, as opposed
   to an A-D route per EVI. Also, a different label per next-hop CE
   per MAC-VRF is consumed, as opposed to a single label per MAC-VRF.

   The following table summarizes the implementation details of both
   models for the VLAN-aware bundling service type.

   +-----------------------------+----------------+----------------+
   | 4k CE-VID VLANs             | MAC-based      | MPLS-based     |
   |                             | Model          | Model          |
   +-----------------------------+----------------+----------------+
   | A-D routes/EVI              | 1 per ESI/EVI  | 4k per ESI/EVI |
   | MPLS labels consumed        | 1 per MAC-VRF  | 1 per CE/EVI   |
   | Egress PE Forwarding states | 1 per MAC      | 1 per next-hop |
   | Egress PE Lookups           | 2 (MPLS+MAC)   | 1 (MPLS)       |
   +-----------------------------+----------------+----------------+

   The egress forwarding model is an implementation local to the
   egress PE and is independent of the model supported on the rest of
   the PEs, i.e. in our use-case, PE1, PE2 and PE3 could have either
   egress forwarding model without any dependencies.

8. Traffic flow optimization

   In addition to the procedures described across sections 1 through
   7, EVPN [EVPN] procedures allow for optimized traffic handling in
   order to minimize unnecessary flooding across the entire
   infrastructure. Optimization is provided through specific ARP
   termination and the ability to block unknown unicast flooding.
   Additionally, EVPN procedures allow for intelligent,
   close-to-the-source inter-subnet forwarding and solve the commonly
   known sub-optimal routing problem. Besides the traffic efficiency,
   ingress-based inter-subnet forwarding also optimizes packet
   forwarding rules and implementation at the egress nodes. Details of
   these procedures are outlined in sections 8.1 and 8.2.

8.1. Control Plane Procedures

8.1.1. MAC learning options

   The fundamental premise of [EVPN] is the notion of a different
   approach to MAC address learning compared to traditional IEEE 802.1
   bridge learning methods; specifically EVPN differentiates between
   data and control plane driven learning mechanisms.

   Data driven learning implies that there is no separate
   communication channel used to advertise and propagate MAC
   addresses. Rather, MAC addresses are learned through IEEE defined
   bridge-learning procedures as well as by snooping on DHCP and ARP
   requests. As different MAC addresses show up on different ports,
   the L2 FIB is populated with the appropriate MAC addresses.

   Control plane driven learning implies a communication channel that
   could be either a control-plane protocol or a management-plane
   mechanism. In the context of EVPN, two different learning
   procedures are defined, i.e. local and remote procedures:

   o  Local learning defines the procedures used for learning the MAC
      addresses of network elements locally connected to a MAC-VRF.
      Local learning could be implemented through all three learning
      procedures: control plane, management plane as well as data
      plane. However, the expectation is that for most of the use
      cases, local learning through data plane should be sufficient.

   o  Remote learning defines the procedures used for learning MAC
      addresses of network elements remotely connected to a MAC-VRF,
      i.e. far-end PEs.
      Remote learning procedures defined in [EVPN] advocate using only
      control plane learning; specifically BGP. Through the use of BGP
      EVPN NLRIs, the remote PE has the capability of advertising all
      the MAC addresses present in its local FIB.

8.1.2. Proxy-ARP/ND

   In EVPN, MAC addresses are advertised via the MAC/IP Advertisement
   Route, as discussed in [EVPN]. Optionally an IP address can be
   advertised along with the MAC address announcement. However, there
   are certain rules put in place in terms of IP address usage: if the
   MAC Advertisement Route contains an IP address, and the IP Address
   Length is 32 bits (or 128 in the IPv6 case), this particular IP
   address correlates directly with the advertised MAC address. Such
   advertisement allows us to build a proxy-ARP/ND table populated
   with the IP<>MAC bindings received from all the remote nodes.

   Furthermore, based on these bindings, a local MAC-VRF can now
   provide Proxy-ARP/ND functionality for all ARP requests and ND
   solicitations directed to the IP address pool learned through BGP.
   Therefore, the amount of unnecessary L2 flooding, ARP/ND
   requests/solicitations in this case, can be further reduced by the
   introduction of Proxy-ARP/ND functionality across all EVI MAC-VRFs.

8.1.3. Unknown Unicast flooding suppression

   Given that all locally learned MAC addresses are advertised through
   BGP to all remote PEs, suppressing flooding of any Unknown Unicast
   traffic towards the remote PEs is a feasible network optimization.

   The assumption in the use case is made that any network device that
   appears on a remote MAC-VRF will somehow signal its presence to the
   network. This signaling can be done through e.g. gratuitous ARPs.
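
   The proxy-ARP behavior of section 8.1.2 and the flooding decision
   discussed in this section can be sketched with a toy MAC-VRF model.
   This is a minimal sketch under stated assumptions: the MacVrf
   class, its method names and the returned action strings are
   hypothetical and only illustrate the lookups, they are not defined
   in [EVPN].

```python
from dataclasses import dataclass, field


@dataclass
class MacVrf:
    """Toy MAC-VRF holding a MAC FIB and a proxy-ARP table."""
    flood_unknown_unicast: bool = True
    fib: dict = field(default_factory=dict)        # MAC -> next hop
    proxy_arp: dict = field(default_factory=dict)  # IP  -> MAC

    def import_mac_route(self, mac, next_hop, ip=None):
        """Install a received MAC/IP advertisement route."""
        self.fib[mac] = next_hop
        if ip is not None:
            # A host address (/32 or /128) binds directly to the MAC.
            self.proxy_arp[ip] = mac

    def handle_arp_request(self, target_ip):
        """Reply locally when the binding is known, otherwise flood."""
        mac = self.proxy_arp.get(target_ip)
        if mac is not None:
            return ("reply", mac)
        return ("flood", None)

    def forward_unicast(self, dst_mac):
        """Forward known MACs; flood or drop unknown ones."""
        if dst_mac in self.fib:
            return ("forward", self.fib[dst_mac])
        if self.flood_unknown_unicast:
            return ("flood", None)
        return ("drop", None)
```

   In this sketch, dropping on a FIB miss is only safe when every
   attached device has signaled its presence, which is the assumption
   stated above.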

   Once the remote PE acknowledges the presence of the node in the
   MAC-VRF, it will do two things: install its MAC address in its
   local FIB and advertise this MAC address to all other BGP speakers
   via EVPN NLRI. Therefore, we can assume that any active MAC address
   is propagated and learnt through the entire EVI. Given that MAC
   addresses become pre-populated once nodes are alive on the network,
   there is no need to flood any unknown unicast towards the remote
   PEs. If the owner of a given destination MAC is active, the BGP
   route will be present in the local RIB and FIB, assuming that the
   BGP import policies are successfully applied; otherwise, the owner
   of such destination MAC is not present on the network.

   It is worth noting that unless: a) control or management plane
   learning is performed through the entire EVI or b) all the
   EVI-attached devices signal their presence when they come up (GARPs
   or similar), unknown unicast flooding MUST be enabled.

8.1.4. Optimization of Inter-subnet forwarding

   In a scenario in which both L2 and L3 services are needed over the
   same physical topology, some interaction between EVPN and IP-VPN is
   required. A common way of stitching the two service planes is
   through the use of an IRB interface, which allows for traffic to be
   either routed or bridged depending on its destination MAC address.
   If the destination MAC address is the one of the IRB interface,
   traffic needs to be passed through a routing module and potentially
   be either routed to a remote PE or forwarded to a local subnet. If
   the destination MAC address is not the one of the IRB, the MAC-VRF
   follows standard bridging procedures.

   A typical example of EVPN inter-subnet forwarding would be a
   scenario in which multiple IP subnets are part of a single or
   multiple EVIs, and they all belong to a single IP-VPN.
   In such topologies, it is desired that inter-subnet traffic can be
   efficiently routed without any tromboning effects in the network.
   Due to the overlapping physical and service topology in such
   scenarios, all inter-subnet connectivity will be locally routed
   through the IRB interface.

   In addition to optimizing the traffic patterns in the network,
   local inter-subnet forwarding also greatly reduces the amount of
   processing needed to cross the subnets. Through EVPN MAC
   advertisements, the local PE learns the real destination MAC
   address associated with the remote IP address and the inter-subnet
   forwarding can happen locally. When the packet is received at the
   egress PE, it is directly mapped to an egress MAC-VRF, bypassing
   any egress IP-VPN processing.

   Please refer to [EVPN-INTERSUBNET] for more information about the
   IP inter-subnet forwarding procedures in EVPN.

8.2. Packet Walkthrough Examples

   Assuming that the services are set up according to figure 1 in
   section 2, the following flow optimization processes will take
   place in terms of creating, receiving and forwarding packets across
   the network.

8.2.1. Proxy-ARP example for CE2 to CE3 traffic

   Using figure 1 in section 2, consider EVI 400 residing on PE1, PE2
   and PE3 connecting CE2 and CE3 networks. Also, consider that PE1
   and PE2 are part of the all-active multi-homing ES for CE2, and
   that PE2 is elected designated-forwarder for EVI400. We assume that
   all the PEs implement the proxy-ARP functionality in the
   MAC-VRF-400 context.

   In this scenario, PE3 will not only advertise the MAC addresses
   through the EVPN MAC Advertisement Route but also IP addresses of
   individual hosts, i.e. /32 prefixes, behind CE3.
   Upon receiving the EVPN routes, PE1 and PE2 will install the MAC
   addresses in the MAC-VRF-400 FIB and based on the associated
   received IP addresses, PE1 and PE2 can now build a proxy-ARP table
   within the context of MAC-VRF-400.

   From the forwarding perspective, when a node behind CE2 sends a
   frame destined to a node behind CE3, it will first send an ARP
   request to e.g. PE2 (based on the result of the CE2 hashing).
   Assuming that PE2 has populated its proxy-ARP table for all active
   nodes behind CE3, and that the IP address in the ARP message
   matches the entry in the table, PE2 will respond to the ARP request
   with the actual MAC address on behalf of the node behind CE3.

   Once the nodes behind CE2 learn the actual MAC address of the nodes
   behind CE3, all the MAC-to-MAC communications between the two
   networks will be unicast.

8.2.2. Flood suppression example for CE1 to CE3 traffic

   Using figure 1 in section 2, consider EVI 500 residing on PE1 and
   PE3 connecting CE1 and CE3 networks. Consider that both PE1 and PE3
   have disabled unknown unicast flooding for this specific EVI
   context. Once the network devices behind CE3 come online, PE3 will
   learn their MAC addresses and create local FIB entries for these
   devices. Note that local FIB entries could also be created through
   either a control or management plane between PE and CE.
   Consequently, PE3 will automatically create EVPN Type 2 MAC
   Advertisement Routes and advertise all locally learned MAC
   addresses. The routes will also include the corresponding MPLS
   label.

   Given that PE1 automatically learns and installs all MAC addresses
   behind CE3, its MAC-VRF FIB will already be pre-populated with the
   respective next-hops and label assignments associated with the MAC
   addresses behind CE3.
   As such, as soon as the traffic sent by CE1 to nodes behind CE3 is
   received into the context of EVI 500, PE1 will push the MPLS
   Label(s) onto the original Ethernet frame and send the packet to
   the MPLS network. As usual, once PE3 receives this packet, and
   depending on the forwarding model, PE3 will either do a next-hop
   lookup in the EVI 500 context, or will just forward the traffic
   directly to CE3. In the case that PE1 MAC-VRF-500 does not have a
   MAC entry for a specific destination that CE1 is trying to reach,
   PE1 will drop the frame since unknown unicast flooding is disabled.

   Based on the assumption that all the MAC entries behind the CEs are
   pre-populated through gratuitous-ARP and/or DHCP requests, if one
   specific MAC entry is not present in the MAC-VRF-500 FIB on PE1,
   the owner of that MAC is not alive on the network behind CE3, hence
   the traffic can be dropped at PE1 instead of being flooded and
   consuming network bandwidth.

8.2.3. Optimization of inter-subnet forwarding example for CE3 to CE2
       traffic

   Using figure 1 in section 2, consider that there is an IP-VPN 666
   context residing on PE1, PE2 and PE3 which connects CE1, CE2 and
   CE3 into a single IP-VPN domain. Also consider that there are two
   EVIs present on the PEs, EVI 600 and EVI 60. Each IP subnet is
   associated to a different MAC-VRF context. Thus there is a single
   subnet, subnet 600, between CE1 and CE3 that is established through
   EVI 600. Similarly, there is another subnet, subnet 60, between CE2
   and CE3 that is established through EVI 60. Since both subnets are
   part of the same IP-VPN, there is a mapping of each EVI (or
   individual subnet) to a local IRB interface on the three PEs.

   If a node behind CE2 wants to communicate with a node on the same
   subnet sitting behind CE3, the communication flow will follow the
   standard EVPN procedures, i.e.
a FIB lookup within PE1 (or PE2) followed by the addition of the
corresponding EVPN label to the MPLS label stack (downstream label
allocation from PE3 for EVI 60).

When it comes to crossing the subnet boundaries, the ingress PE
implements local inter-subnet forwarding. For example, when a node
behind CE2 (EVI 60) sends a packet to a node behind CE1 (EVI 600), the
destination IP address will be in subnet 600, but the destination MAC
address will be the address of the source node's default gateway,
which in this case will be an IRB interface on PE1 (connecting EVI 60
to IP-VPN 666). Once PE1 sees the traffic destined to its own MAC
address, it will route the packet to EVI 600, i.e., it will change the
source MAC address to that of the IRB interface in EVI 600 and change
the destination MAC address to the address belonging to the node
behind CE1, which is already populated in the MAC-VRF-600 FIB, either
through data or control plane learning.

An important optimization to be noted is the local inter-subnet
forwarding in lieu of IP-VPN routing. If the node from subnet 60
(behind CE2) is sending a packet to the remote end node on subnet 600
(behind CE3), the mechanism in place still honors the local
inter-subnet (inter-EVI) forwarding.

In our use-case, therefore, when a node from subnet 60 behind CE2
sends traffic to the node on subnet 600 behind CE3, the destination
MAC address is the PE1 MAC-VRF-60 IRB MAC address. However, once the
traffic locally crosses EVIs, to EVI 600, via the IRB interface on
PE1, the source MAC address is changed to that of the IRB interface in
EVI 600 and the destination MAC address is changed to the one
advertised by PE3 via EVPN and already installed in MAC-VRF-600. The
rest of the forwarding through PE1 uses the MAC-VRF-600 forwarding
context and label space.
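
As a rough illustration of the ingress PE behavior described above,
the following Python sketch models local inter-subnet forwarding
between two MAC-VRFs. It is a simplified model under stated
assumptions, not an implementation of any EVPN procedure: the names
Frame, FibEntry, MacVrf and forward, the subnets, the MAC addresses
(taken from documentation ranges) and the label value are all
hypothetical and do not come from this document.

```python
import ipaddress
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    dst_ip: str


@dataclass(frozen=True)
class FibEntry:
    next_hop: str         # remote PE name (hypothetical representation)
    label: Optional[int]  # EVPN MPLS label advertised by that PE


@dataclass
class MacVrf:
    name: str
    subnet: ipaddress.IPv4Network
    irb_mac: str                                             # IRB interface MAC
    fib: Dict[str, FibEntry] = field(default_factory=dict)   # MAC -> FIB entry
    ip_to_mac: Dict[str, str] = field(default_factory=dict)  # IP -> MAC bindings


def forward(frame: Frame, ingress: MacVrf,
            vrfs: List[MacVrf]) -> Optional[Tuple[Frame, FibEntry]]:
    """Return the (possibly rewritten) frame and the FIB entry to use.

    None means this sketch found no forwarding entry; what a real PE
    does in that case depends on its configuration.
    """
    if frame.dst_mac != ingress.irb_mac:
        # Same-subnet traffic: plain MAC lookup in the ingress MAC-VRF.
        entry = ingress.fib.get(frame.dst_mac)
        return (frame, entry) if entry else None

    # Frame addressed to the IRB MAC: route locally to the MAC-VRF
    # whose subnet contains the destination IP address.
    dst = ipaddress.ip_address(frame.dst_ip)
    egress = next((v for v in vrfs if dst in v.subnet), None)
    if egress is None:
        return None
    dst_mac = egress.ip_to_mac.get(frame.dst_ip)
    entry = egress.fib.get(dst_mac) if dst_mac else None
    if entry is None:
        return None
    # Rewrite MACs: source becomes the egress IRB MAC, destination
    # becomes the MAC already installed in the egress MAC-VRF.
    routed = replace(frame, src_mac=egress.irb_mac, dst_mac=dst_mac)
    return routed, entry


# Hypothetical state on PE1: subnet 60 and subnet 600, with one MAC
# behind CE3 installed in MAC-VRF-600 from a route advertised by PE3.
evi60 = MacVrf("MAC-VRF-60", ipaddress.ip_network("192.0.2.0/25"),
               "00:00:5e:00:53:3c")
evi600 = MacVrf("MAC-VRF-600", ipaddress.ip_network("192.0.2.128/25"),
                "00:00:5e:00:53:58")
evi600.ip_to_mac["192.0.2.130"] = "00:00:5e:00:53:03"
evi600.fib["00:00:5e:00:53:03"] = FibEntry(next_hop="PE3", label=600)

result = forward(
    Frame("00:00:5e:00:53:02", evi60.irb_mac, "192.0.2.130"),
    evi60, [evi60, evi600])
```

In this sketch the frame leaves PE1 using the MAC-VRF-600 entry (next
hop PE3 and its label), which mirrors the point made above that the
remainder of the forwarding uses the MAC-VRF-600 context.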

Another very relevant optimization is due to the fact that traffic
between PEs is forwarded through EVPN, rather than through IP-VPN. In
the example described above for traffic from EVI 60 on CE2 to EVI 600
on CE3, there is no need for IP-VPN processing on the egress PE3.
Traffic is forwarded either to the EVI 600 context in PE3 for further
MAC lookup and next-hop processing, or directly to the node behind
CE3, depending on the egress forwarding model being used.

9. Conventions used in this document

In the examples, the following conventions are used:

o CE-VIDs refer to the VLAN tag identifiers being used at CE1, CE2
  and CE3 to tag customer traffic sent to the Service Provider E-VPN
  network

o CE1-MAC, CE2-MAC and CE3-MAC refer to source MAC addresses "behind"
  each CE respectively. Those MAC addresses can belong to the CEs
  themselves or to devices connected to the CEs.

o CE1-IP, CE2-IP and CE3-IP refer to IP addresses associated to the
  above MAC addresses.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].

In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying RFC-2119 significance.

10. Security Considerations

11. IANA Considerations

12. References

12.1. Normative References

[RFC4761] Kompella, K., Ed., and Y. Rekhter, Ed., "Virtual Private LAN
Service (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761,
January 2007.

[RFC4762] Lasserre, M., Ed., and V. Kompella, Ed., "Virtual Private
LAN Service (VPLS) Using Label Distribution Protocol (LDP)
Signaling", RFC 4762, January 2007.

[RFC6074] Rosen, E., Davie, B., Radoaca, V., and W.
Luo, "Provisioning, Auto-Discovery, and Signaling in Layer 2 Virtual
Private Networks (L2VPNs)", RFC 6074, January 2011.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006.

[RFC7209] Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N.,
Henderickx, W., and A. Isaac, "Requirements for Ethernet VPN (EVPN)",
RFC 7209, May 2014.

[RFC7117] Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and C.
Kodeboniya, "Multicast in Virtual Private LAN Service (VPLS)",
RFC 7117, February 2014.

12.2. Informative References

[EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN",
draft-ietf-l2vpn-evpn-10.txt, work in progress, October 2014.

[EVPN-INTERSUBNET] Sajassi et al., "IP Inter-subnet forwarding in
EVPN", draft-sajassi-l2vpn-evpn-inter-subnet-forwarding-05.txt.

13. Acknowledgments

The authors want to thank Giles Heron for his detailed review of the
document. We also thank Stefan Plug for his comments.

This document was prepared using 2-Word-v2.0.template.dot.

14. Authors' Addresses

Jorge Rabadan
Alcatel-Lucent
777 E. Middlefield Road
Mountain View, CA 94043 USA
Email: jorge.rabadan@alcatel-lucent.com

Senad Palislamovic
Alcatel-Lucent
Email: senad.palislamovic@alcatel-lucent.com

Wim Henderickx
Alcatel-Lucent
Email: wim.henderickx@alcatel-lucent.be

Florin Balus
Alcatel-Lucent
Email: Florin.Balus@alcatel-lucent.com

Keyur Patel
Cisco
Email: keyupate@cisco.com

Ali Sajassi
Cisco
Email: sajassi@cisco.com

James Uttaro
AT&T
Email: uttaro@att.com

Aldrin Isaac
Bloomberg
Email: aisaac71@bloomberg.net

Truman Boyes
Bloomberg
Email: tboyes@bloomberg.net