BESS Workgroup                                           J. Rabadan, Ed.
Internet Draft                                           S. Palislamovic
                                                           W. Henderickx
Intended status: Informational                                     Nokia

                                                              A. Sajassi
                                                                   Cisco

                                                               J. Uttaro
                                                                    AT&T

Expires: March 1, 2018                                   August 28, 2017


         Usage and applicability of BGP MPLS based Ethernet VPN
                     draft-ietf-bess-evpn-usage-06

Abstract

   This document discusses the usage and applicability of BGP MPLS based
   Ethernet VPN (EVPN) in a simple and fairly common deployment
   scenario.  The different EVPN procedures are explained using the
   example scenario, analyzing the benefits and trade-offs of each
   option.  This document is intended to provide a simplified guide for
   the deployment of EVPN networks.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on March 1, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include
   Simplified BSD License text as described in Section 4.e of the Trust
   Legal Provisions and are provided without warranty as described in
   the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Use-case scenario description and requirements
     3.1. Service Requirements
     3.2. Why EVPN is chosen to address this use-case
   4. Provisioning Model
     4.1. Common provisioning tasks
       4.1.1. Non-service specific parameters
       4.1.2. Service specific parameters
     4.2. Service interface dependent provisioning tasks
       4.2.1. VLAN-based service interface EVI
       4.2.2. VLAN-bundle service interface EVI
       4.2.3. VLAN-aware bundling service interface EVI
   5. BGP EVPN NLRI usage
   6. MAC-based forwarding model use-case
     6.1. EVPN Network Startup procedures
     6.2. VLAN-based service procedures
       6.2.1. Service startup procedures
       6.2.2. Packet walkthrough
     6.3. VLAN-bundle service procedures
       6.3.1. Service startup procedures
       6.3.2. Packet Walkthrough
     6.4. VLAN-aware bundling service procedures
       6.4.1. Service startup procedures
       6.4.2. Packet Walkthrough
   7. MPLS-based forwarding model use-case
     7.1. Impact of MPLS-based forwarding on the EVPN network startup
     7.2. Impact of MPLS-based forwarding on the VLAN-based service
          procedures
     7.3. Impact of MPLS-based forwarding on the VLAN-bundle service
          procedures
     7.4. Impact of MPLS-based forwarding on the VLAN-aware service
          procedures
   8. Comparison between MAC-based and MPLS-based Egress Forwarding
      Models
   9. Traffic flow optimization
     9.1. Control Plane Procedures
       9.1.1. MAC learning options
       9.1.2. Proxy-ARP/ND
       9.1.3. Unknown Unicast flooding suppression
       9.1.4. Optimization of Inter-subnet forwarding
     9.2. Packet Walkthrough Examples
       9.2.1. Proxy-ARP example for CE2 to CE3 traffic
       9.2.2. Flood suppression example for CE1 to CE3 traffic
       9.2.3. Optimization of inter-subnet forwarding example for CE3
              to CE2 traffic
   10. Security Considerations
   11. IANA Considerations
   12. References
     12.1. Normative References
     12.2. Informative References
   13. Acknowledgments
   14. Contributors
   15. Authors' Addresses

1. Introduction

   This document complements [RFC7432] by discussing the applicability
   of the technology in a simple and fairly common deployment scenario,
   which is described in section 3.

   After describing the topology and requirements of the use-case
   scenario, section 4 describes the provisioning model.

   Once the provisioning model is analyzed, sections 5, 6 and 7
   describe the control plane and data plane procedures in the example
   scenario for the two potential disposition/forwarding models:
   MAC-based and MPLS-based.  While both models can interoperate in the
   same network, each one has different trade-offs, which are analyzed
   in section 8.

   Finally, section 9 describes some traffic flow optimization tools
   that EVPN provides, in the context of the example scenario.

2. Terminology

   The following terminology is used:

   o VID: VLAN Identifier.

   o CE: Customer Edge device.

   o EVI: EVPN Instance.

   o MAC-VRF: A Virtual Routing and Forwarding table for Media Access
     Control (MAC) addresses on a PE.

   o Ethernet Segment (ES): set of links through which a customer site
     (CE) is connected to one or more PEs.  Each ES is identified by an
     Ethernet Segment Identifier (ESI) in the control plane.

   o CE-VIDs refer to the VLAN tag identifiers being used at CE1, CE2
     and CE3 to tag customer traffic sent to the Service Provider EVPN
     network.

   o CE1-MAC, CE2-MAC and CE3-MAC refer to source MAC addresses "behind"
     each CE respectively.  Those MAC addresses can belong to the CEs
     themselves or to devices connected to the CEs.

   o CE1-IP, CE2-IP and CE3-IP refer to IP addresses associated with the
     above MAC addresses.

   o LACP: Link Aggregation Control Protocol.

   o RD: Route Distinguisher.

   o RT: Route Target.

3. Use-case scenario description and requirements

   Figure 1 depicts the scenario that will be referenced throughout the
   rest of the document.

                            +--------------+
                            |              |
          +----+     +----+ |              | +----+   +----+
          | CE1|-----|    | |              | |    |---| CE3|
          +----+    /| PE1| |   IP/MPLS    | | PE3|   +----+
                   / +----+ |   Network    | +----+
                  /         |              |
                 /   +----+ |              |
          +----+/    |    | |              |
          | CE2|-----| PE2| |              |
          +----+     +----+ |              |
                            +--------------+

                    Figure 1 EVPN use-case scenario

   There are three PEs and three CEs considered in this example: PE1,
   PE2, PE3, as well as CE1, CE2 and CE3.  Broadcast Domains must be
   extended among the three CEs.

3.1. Service Requirements

   The following service requirements are assumed in this scenario:

   o Redundancy requirements:

     - CE2 requires multi-homing connectivity to PE1 and PE2, not only
       for redundancy purposes, but also for adding more
       upstream/downstream connectivity bandwidth to/from the network.

     - Fast convergence.  E.g.: if the link between CE2 and PE1 goes
       down, a fast convergence mechanism must be supported so that PE3
       can immediately send the traffic to PE2, irrespective of the
       number of affected services and MAC addresses.

   o Service interface requirements:

     - The service definition must be flexible in terms of
       CE-VID-to-broadcast-domain assignment in the core.

     - The following three EVI services are required in this example:

       EVI100 - It uses VLAN-based service interfaces in the three CEs
       with a 1:1 VLAN-to-EVI mapping.  The CE-VIDs at the three CEs
       can be the same, e.g.: VID 100, or different at each CE, e.g.:
       VID 101 in CE1, VID 102 in CE2 and VID 103 in CE3.  A single
       broadcast domain needs to be created for EVI100 in any case;
       therefore CE-VIDs will require translation at the egress PEs if
       they are not consistent across the three CEs.
       The case when the same CE-VID is used across the three CEs for
       EVI100 is referred to in [RFC7432] as the "Unique VLAN" EVPN
       case.  This term will be used throughout this document too.

       EVI200 - It uses VLAN-bundle service interfaces in CE1, CE2 and
       CE3, based on an N:1 VLAN-to-EVI mapping.  The operator needs to
       pre-configure a range of CE-VIDs and its mapping to the EVI, and
       this mapping should be consistent in all the PEs (no translation
       is supported).  A single broadcast domain is created for the
       customer.  The customer is responsible for keeping the
       separation between users in different CE-VIDs.

       EVI300 - It uses VLAN-aware bundling service interfaces in CE1,
       CE2 and CE3.  As in the EVI200 case, an N:1 VLAN-to-EVI mapping
       is created at the ingress PEs; however, in this case a separate
       broadcast domain is required per CE-VID.  The CE-VIDs can be
       different (hence CE-VID translation is required).

   NOTE: in section 4.2.1, only EVI100 is used as an example of
   VLAN-based service provisioning.  In sections 6.2 and 7.2, 4k
   VLAN-based EVIs (EVI1 to EVI4k) are used so that the impact of MAC
   vs. MPLS disposition models in the control plane can be evaluated.
   In the same way, EVI200 and EVI300 will be described with a 4k:1
   mapping (CE-VIDs-to-EVI mapping) in sections 6.3, 6.4, 7.3 and 7.4.

   o BUM (Broadcast, Unknown unicast, Multicast) optimization
     requirements:

     - The solution must support ingress replication or P2MP MPLS LSPs
       on a per EVI service basis.

     - For example, we can use ingress replication for EVI100 and
       EVI200, assuming those EVIs will not carry much BUM traffic.  In
       contrast, if EVI300 is expected to carry a significant amount of
       multicast traffic, P2MP MPLS LSPs can be used for this service.

     - The benefit of ingress replication compared to P2MP LSPs is that
       the core routers will not need to maintain any multicast state.

3.2. Why EVPN is chosen to address this use-case

   VPLS solutions based on [RFC4761], [RFC4762] and [RFC6074] cannot
   meet the requirements in section 3, whereas EVPN can.

   For example:

   o If CE2 has a single CE-VID (or a few CE-VIDs) the current VPLS
     multi-homing solutions (based on load-balancing per CE-VID or
     service) do not provide the optimized link utilization required in
     this example.  EVPN provides the flow-based load-balancing
     multi-homing solution required in this scenario to optimize the
     upstream/downstream link utilization between CE2 and PE1-PE2.

   o Also, EVPN provides a fast convergence solution that is independent
     of the number of CE-VIDs in the multi-homed PEs.  Upon failure of
     the link between CE2 and PE1, PE3 can immediately send the traffic
     to PE2, based on a single notification message being sent by PE1.
     This is not possible with VPLS solutions.

   o With regard to service interfaces and mapping to broadcast domains,
     while VPLS might meet the requirements for EVI100 and EVI200, the
     VLAN-aware bundling service interfaces required by EVI300 are not
     supported by the current VPLS tools.

   The rest of the document will describe how EVPN can be used to meet
   the service requirements described in section 3, and even optimize
   the network further by:

   o Providing the user with an option to reduce (and even suppress)
     ARP flooding.

   o Supporting ARP termination and inter-subnet forwarding.

4. Provisioning Model

   One of the requirements stated in [RFC7209] is the ease of
   provisioning.  BGP parameters and service context parameters should
   be auto-provisioned so that the addition of a new MAC-VRF to the EVI
   requires a minimum number of single-sided provisioning touches.
   However, this is only possible in a limited number of cases.  This
   section describes the provisioning tasks required for the services
   described in section 3, i.e. EVI100 (VLAN-based service interfaces),
   EVI200 (VLAN-bundle service interfaces) and EVI300 (VLAN-aware
   bundling service interfaces).

4.1. Common provisioning tasks

   Regardless of the service interface type (VLAN-based, VLAN-bundle or
   VLAN-aware), the following sub-sections describe the parameters to be
   provisioned in the three PEs.

4.1.1. Non-service specific parameters

   The multi-homing function in EVPN requires the provisioning of
   certain parameters that are not service-specific and that are shared
   by all the MAC-VRFs in the node using the multi-homing capabilities.
   In our use-case, these parameters are only provisioned or
   auto-derived in PE1 and PE2, and are listed below:

   o Ethernet Segment Identifier (ESI): only the ESI associated with
     CE2 needs to be considered in our example.  Single-homed CEs such
     as CE1 and CE3 do not require the provisioning of an ESI (the ESI
     will be coded as zero in the BGP NLRIs).  In our example, a LAG is
     used between CE2 and PE1-PE2 (since all-active multi-homing is a
     requirement); therefore the ESI can be auto-derived from the LACP
     information as described in [RFC7432].  Note that the ESI must be
     unique across all the PEs in the network; therefore the
     auto-provisioning of the ESI is only recommended in case the CEs
     are managed by the Operator.  Otherwise the ESI should be manually
     provisioned (type 0 as in [RFC7432]) in order to avoid potential
     conflicts.

   o ES-Import Route Target (ES-Import RT): this is the RT that will be
     sent by PE1 and PE2, along with the ES route.  Regardless of how
     the ESI is provisioned in PE1 and PE2, the ES-Import RT must always
     be auto-derived from the 6-byte MAC address portion of the ESI
     value.

   o Ethernet Segment Route Distinguisher (ES RD): this is the RD to be
     encoded in the ES route and Ethernet Auto-Discovery (A-D) route to
     be sent by PE1 and PE2 for the CE2 ESI.
     This RD should always be auto-derived from the PE IP address, as
     described in [RFC7432].

   o Multi-homing type: the user must be able to provision the
     multi-homing type to be used in the network.  In our use-case, the
     multi-homing type will be set to all-active for the CE2 ESI.  This
     piece of information is encoded in the ESI Label extended community
     flags and sent by PE1 and PE2 along with the Ethernet A-D route for
     the CE2 ESI.

   In our use-case, besides the above parameters, the same LACP
   parameters will be configured in PE1 and PE2 for the ESI, so that CE2
   can send different flows to PE1 and PE2 for the same CE-VID as though
   they were forming a single system from the CE2 perspective.

4.1.2. Service specific parameters

   The following parameters must be provisioned in PE1, PE2 and PE3 per
   EVI service:

   o EVI identifier: global identifier per EVI that is shared by all the
     PEs that are part of the EVI, i.e. PE1, PE2 and PE3 will be
     provisioned with EVI100, 200 and 300.  The EVI identifier can be
     associated with (or be the same value as) the EVI default Ethernet
     Tag (4-byte default broadcast domain identifier for the EVI).  The
     Ethernet Tag is different from zero in the EVPN BGP routes only if
     the service interface type (of the source PE) is VLAN-aware Bundle.

   o EVI Route Distinguisher (EVI RD): This RD is a unique value across
     all the MAC-VRFs in a PE.  Auto-derivation of this RD might be
     possible depending on the service interface type being used in the
     EVI.  The next section discusses the specifics of each service
     interface type.

   o EVI Route Target(s) (EVI RT): one or more RTs can be provisioned
     per MAC-VRF.  The RT(s) imported and exported can be equal or
     different, just as the RT(s) in IP-VPNs.  Auto-derivation of the
     RT(s) might be possible depending on the service interface type
     being used in the EVI.
     The next section discusses the specifics of each service interface
     type.

   o CE-VID and port/LAG binding to EVI identifier or Ethernet Tag: see
     section 4.2.

4.2. Service interface dependent provisioning tasks

   Depending on the service interface type being used in the EVI, a
   specific CE-VID binding must be provisioned.

4.2.1. VLAN-based service interface EVI

   In our use-case, EVI100 is a VLAN-based service interface EVI.

   EVI100 can be a "unique-VLAN" service if the CE-VID being used for
   this service in CE1, CE2 and CE3 is identical, e.g. VID 100.  In that
   case, the VID 100 binding must be provisioned in PE1, PE2 and PE3 for
   EVI100 and the associated port or LAG.  The MAC-VRF RD and RT can be
   auto-derived from the CE-VID:

   o The auto-derived MAC-VRF RD will be a Type 1 RD, as recommended in
     [RFC7432], and it will be composed of [PE-IP]:[zero-padded-VID];
     where [PE-IP] is the IP address of the PE (a loopback address) and
     [zero-padded-VID] is a 2-byte value where the low order 12 bits are
     the VID (VID 100 in our example) and the high order 4 bits are
     zero.

   o The auto-derived MAC-VRF RT will be composed of
     [AS]:[zero-padded-VID]; where [AS] is the Autonomous System that
     the PE belongs to and [zero-padded-VID] is a 2-byte or 4-byte value
     where the low order 12 bits are the VID (VID 100 in our example)
     and the high order bits are zero.  Note that auto-deriving the RT
     implies supporting a basic any-to-any topology in the EVI and using
     the same import and export RT in the EVI.

   If EVI100 is not a "unique-VLAN" instance, each individual CE-VID
   must be configured in each PE, and MAC-VRF RDs and RTs cannot be
   auto-derived; hence they must be provisioned by the user.

4.2.2. VLAN-bundle service interface EVI

   Assuming EVI200 is a VLAN-bundle service interface EVI, and VIDs
   200-250 are assigned to EVI200, the CE-VID bundle 200-250 must be
   provisioned on PE1, PE2 and PE3.  Note that this model does not allow
   CE-VID translation and the CEs must use the same CE-VIDs for EVI200.
   No auto-derived EVI RDs or EVI RTs are possible.

4.2.3. VLAN-aware bundling service interface EVI

   If EVI300 is a VLAN-aware bundling service interface EVI, the CE-VID
   binding to EVI300 does not have to match on the three PEs (only on
   PE1 and PE2, since they are part of the same ES).  E.g.: PE1 and PE2
   CE-VID binding to EVI300 can be set to the range 300-310 and PE3 to
   321-330.  Note that each individual CE-VID will be assigned to a
   different broadcast domain, represented by an Ethernet Tag in the
   control plane.

   Therefore, besides the CE-VID bundle range bound to EVI300 in each
   PE, associations between each individual CE-VID and the corresponding
   EVPN Ethernet Tag must be provisioned by the user.  No auto-derived
   EVI RDs/RTs are possible.

5. BGP EVPN NLRI usage

   [RFC7432] defines four different route types and four different
   extended communities.  However, not all the PEs in an EVPN network
   must generate and process all the different routes and extended
   communities.  The following table shows the routes that must be
   exported and imported in the use-case described in this document.
   "Export", in this context, means that the PE must be capable of
   generating and exporting a given route, assuming there are no BGP
   policies to prevent it.  In the same way, "Import" means the PE must
   be capable of importing and processing a given route, assuming the
   right RTs and policies.  "N/A" means neither import nor export
   actions are required.

   +-------------------+---------------+---------------+
   | BGP EVPN routes   | PE1-PE2       | PE3           |
   +-------------------+---------------+---------------+
   | ES                | Export/import | N/A           |
   | A-D per ESI       | Export/import | Import        |
   | A-D per EVI       | Export/import | Import        |
   | MAC               | Export/import | Export/import |
   | Inclusive mcast   | Export/import | Export/import |
   +-------------------+---------------+---------------+

   PE3 is only required to export MAC and Inclusive multicast routes and
   be able to import and process A-D routes, as well as MAC and
   Inclusive multicast routes.  If PE3 did not support importing and
   processing A-D routes per ESI and per EVI, fast convergence and
   aliasing functions (respectively) would not be possible in this
   use-case.

6. MAC-based forwarding model use-case

   This section describes how the BGP EVPN routes are exported and
   imported by the PEs in our use-case, as well as how traffic is
   forwarded assuming that PE1, PE2 and PE3 support a MAC-based
   forwarding model.  In order to compare the control and data plane
   impact in the two forwarding models (MAC-based and MPLS-based) and
   different service types, we will assume that CE1, CE2 and CE3 need to
   exchange traffic for up to 4k CE-VIDs.

6.1. EVPN Network Startup procedures

   Before any EVI is provisioned in the network, the following
   procedures are required:

   o Infrastructure setup: the proper MPLS infrastructure must be set up
     among PE1, PE2 and PE3 so that the EVPN services can make use of
     P2P and P2MP LSPs.  In addition to the MPLS transport, PE1 and PE2
     must be properly configured with the same LACP configuration to
     CE2.  Details are provided in [RFC7432].  Once the LAG is properly
     set up, the ESI for the CE2 Ethernet Segment, e.g. ESI12, can be
     auto-generated by PE1 and PE2 from the LACP information exchanged
     with CE2 (ESI type 1), as discussed in section 4.1.
     Alternatively, the ESI can also be manually provisioned on PE1 and
     PE2 (ESI type 0).  PE1 and PE2 will auto-configure a BGP policy
     that will import any ES route matching the auto-derived ES-Import
     RT for ESI12.

   o Ethernet Segment route exchange and Designated Forwarder (DF)
     election: PE1 and PE2 will advertise a BGP Ethernet Segment route
     for ESI12, where the ES RD and ES-Import RT will be auto-generated
     as discussed in section 4.1.1.  PE1 and PE2 will import the ES
     routes of each other and will run the DF election algorithm for
     any existing EVI (if any, at this point).  PE3 will simply discard
     the route.  Note that the DF election algorithm can support
     service carving, so that the downstream BUM traffic from the
     network to CE2 can be load-balanced across PE1 and PE2 on a
     per-service basis.

   At the end of this process, the network infrastructure is ready to
   start deploying EVPN services.  PE1 and PE2 are aware of the
   existence of a shared Ethernet Segment, i.e. ESI12.

6.2. VLAN-based service procedures

   Assuming that the EVPN network must carry traffic among CE1, CE2 and
   CE3 for up to 4k CE-VIDs, the Service Provider can decide to
   implement VLAN-based service interface EVIs to accomplish it.  In
   this case, each CE-VID will be individually mapped to a different
   EVI.  While this means a total number of 4k MAC-VRFs is required per
   PE, the advantages of this approach are the auto-provisioning of most
   of the service parameters if no VLAN translation is needed (see
   section 4.2.1) and fine-grained control over each individual customer
   broadcast domain.  We assume in this section that the range of EVIs
   from 1 to 4k is provisioned in the network.

6.2.1. Service startup procedures

   As soon as the EVIs are created in PE1, PE2 and PE3, the following
   control plane actions are carried out:

   o Flooding tree setup per EVI (4k routes): Each PE will send one
     Inclusive Multicast Ethernet Tag route per EVI (up to 4k routes per
     PE) so that the flooding tree per EVI can be set up.  Note that
     ingress replication or P2MP LSPs can optionally be signaled in the
     PMSI Tunnel attribute and the corresponding tree be created.

   o Ethernet A-D routes per ESI (a set of routes for ESI12): A set of
     A-D routes with a total list of 4k RTs (one per EVI) for ESI12 will
     be issued from PE1 and PE2 (it has to be a set of routes so that
     the total number of RTs can be conveyed).  As per [RFC7432], each
     Ethernet A-D route per ESI is differentiated from the other routes
     in the set by a different Route Distinguisher (ES RD).  This set
     will also include ESI Label extended communities with the
     active-standby flag set to zero (all-active multi-homing type) and
     an ESI Label different from zero (used for split-horizon
     functions).  These routes will be imported by the three PEs, since
     the RTs match the EVI RTs locally configured.  The A-D routes per
     ESI will be used for fast convergence and split-horizon functions,
     as discussed in [RFC7432].

   o Ethernet A-D routes per EVI (4k routes): An A-D route per EVI will
     be sent by PE1 and PE2 for ESI12.  Each individual route includes
     the corresponding EVI RT and an MPLS label to be used by PE3 for
     the aliasing function.  These routes will be imported by the three
     PEs.

6.2.2. Packet walkthrough

   Once the services are set up, the traffic can start flowing.
   Assuming there are no MAC addresses learned yet and that MAC learning
   at the access is performed in the data plane in our use-case, this is
   the process followed upon receiving frames from each CE (example for
   EVI1).

   (1) BUM frame example from CE1:

   a) An ARP-request with CE-VID=1 is issued from source MAC CE1-MAC
      (MAC address coming from CE1 or from a device connected to CE1) to
      find the MAC address of CE3-IP.

   b) Based on the CE-VID, the frame is identified to be forwarded in
      the MAC-VRF-1 (EVI1) context.  A lookup is done for the source MAC
      in the MAC FIB and for the sender's CE1-IP in the proxy-ARP table,
      both within the MAC-VRF-1 (EVI1) context.  If CE1-MAC/CE1-IP are
      unknown in both tables, three actions are carried out (assuming
      the source MAC is accepted by PE1):

      (1) Forwarding state is added for CE1-MAC associated with the
          corresponding port and CE-VID,

      (2) the ARP-request is snooped and the tuple CE1-MAC/CE1-IP is
          added to the proxy-ARP table and

      (3) a BGP MAC advertisement route is triggered from PE1 containing
          the EVI1 RD and RT, ESI=0, Ethernet-Tag=0 and CE1-MAC/CE1-IP
          along with an MPLS label assigned to MAC-VRF-1 from the PE1
          label space.  Note that depending on the implementation, the
          MAC FIB and proxy-ARP learning processes can independently
          send two BGP MAC advertisements instead of one (one containing
          only the CE1-MAC and another one containing CE1-MAC/CE1-IP).

      Since we assume a MAC forwarding model, a label per MAC-VRF is
      normally allocated and signaled by the three PEs for MAC
      advertisement routes.  Based on the RT, the route is imported by
      PE2 and PE3 and the forwarding state plus ARP entry are added to
      their MAC-VRF-1 context.  From this moment on, any ARP request
      from CE2 or CE3 destined to CE1-IP can be answered directly by
      PE1, PE2 or PE3, and ARP flooding for CE1-IP is not needed in the
      core.

   c) Since the ARP frame is a broadcast frame, it is forwarded by PE1
      using the Inclusive multicast tree for EVI1 (CE-VID=1 tag should
      be kept if translation is required).  Depending on the type of
      tree, the label stack may vary.  E.g. assuming ingress
      replication, the packet is replicated to PE2 and PE3 with the
      downstream allocated labels and the P2P LSP transport labels.  No
      other labels are added to the stack.

   d) Assuming PE1 is the DF for EVI1 on ESI12, the frame is locally
      replicated to CE2.

   e) The MPLS-encapsulated frame gets to PE2 and PE3.  Since PE2 is
      non-DF for EVI1 on ESI12, and there is no other CE connected to
      PE2, the frame is discarded.  At PE3, the frame is
      de-encapsulated, CE-VID translated if needed and forwarded to CE3.

   Any other type of BUM frame from CE1 would follow the same
   procedures.  BUM frames from CE3 would follow the same procedures
   too.

   (2) BUM frame example from CE2:

   a) An ARP-request with CE-VID=1 is issued from source MAC CE2-MAC to
      find the MAC address of CE3-IP.

   b) CE2 will hash the frame and will forward it to e.g. PE2.  Based on
      the CE-VID, the frame is identified to be forwarded in the EVI1
      context.  A lookup is done for the source MAC in the MAC FIB and
      for the sender's CE2-IP in the proxy-ARP table, both within the
      MAC-VRF-1 context.  If both are unknown, three actions are carried
      out (assuming the source MAC is accepted by PE2):

      (1) Forwarding state is added for CE2-MAC associated with the
          corresponding LAG/ESI and CE-VID,

      (2) the ARP-request is snooped and the tuple CE2-MAC/CE2-IP is
          added to the proxy-ARP table and

      (3) a BGP MAC advertisement route is triggered from PE2 containing
          the EVI1 RD and RT, ESI=12, Ethernet-Tag=0 and CE2-MAC/CE2-IP
          along with an MPLS label assigned from the PE2 label space
          (one label per MAC-VRF).  Again, depending on the
          implementation, the MAC FIB and proxy-ARP learning processes
          can independently send two BGP MAC advertisements instead of
          one.

      Note that, since PE3 is not part of ESI12, it will install
      forwarding state for CE2-MAC as long as the A-D routes for ESI12
      are also active on PE3.
On the contrary, PE1 is part of ESI12, 654 therefore PE1 will not modify the forwarding state for CE2-MAC if 655 it has previously learnt CE2-MAC locally attached to ESI12. 657 Otherwise it will add forwarding state for CE2-MAC associated to 658 the local ESI12 port. 660 c) Assuming PE2 does not have the ARP information for CE3-IP yet, and 661 since the ARP is a broadcast frame and PE2 the non-DF for EVI1 on 662 ESI12, the frame is forwarded by PE2 in the Inclusive multicast 663 tree for EVI1, adding the ESI label for ESI12 at the bottom of the 664 stack. The ESI label has been previously allocated and signaled by 665 the A-D routes for ESI12. Note that, as per [RFC7432], if the 666 result of the CE2 hashing is different and the frame sent to PE1, 667 PE1 should add the ESI label too (PE1 is the DF for EVI1 on 668 ESI12). 670 d) The MPLS-encapsulated frame gets to PE1 and PE3. PE1 671 de-encapsulates the Inclusive multicast tree label(s) and based on 672 the ESI label at the bottom of the stack, it decides to not 673 forward the frame to the ESI12. It will pop the ESI label and will 674 replicate it to CE1 though, since CE1 is not part of the ESI 675 identified by the ESI label. At PE3, the Inclusive multicast tree 676 label is popped and the frame forwarded to CE3. If a P2MP LSP is 677 used as Inclusive multicast tree for EVI1, PE3 will find an ESI 678 label after popping the P2MP LSP label. The ESI label will simply 679 be popped, since CE3 is not part of ESI12. 681 (3) Unicast frame example from CE3 to CE1: 683 a) A unicast frame with CE-VID=1 is issued from source MAC CE3-MAC 684 and destination MAC CE1-MAC (we assume PE3 has previously resolved 685 an ARP request from CE3 to find the MAC of CE1-IP, and has added 686 CE3-MAC/CE3-IP to its proxy-ARP table). 688 b) Based on the CE-VID, the frame is identified to be forwarded in 689 the EVI1 context. 
A source MAC lookup is done in the MAC FIB 690 within the MAC-VRF-1 context and this time, since we assume CE3- 691 MAC is known, no further actions are carried out as a result of 692 the source lookup. A destination MAC lookup is performed next and 693 the label stack associated to the MAC CE1-MAC is found (including 694 the label associated to MAC-VRF-1 in PE1 and the P2P LSP label to 695 get to PE1). The unicast frame is then encapsulated and forwarded 696 to PE1. 698 c) At PE1, the packet is identified to be part of EVI1 and a 699 destination MAC lookup is performed in the MAC-VRF-1 context. The 700 labels are popped and the frame forwarded to CE1 with CE-VID=1. 702 Unicast frames from CE1 to CE3 or from CE2 to CE3 follow the same 703 procedures described above. 705 (4) Unicast frame example from CE3 to CE2: 707 a) A unicast frame with CE-VID=1 is issued from source MAC CE3-MAC 708 and destination MAC CE2-MAC (we assume PE3 has previously resolved 709 an ARP request from CE3 to find the MAC of CE2-IP). 711 b) Based on the CE-VID, the frame is identified to be forwarded in 712 the MAC-VRF-1 context. We assume CE3-MAC is known. A destination 713 MAC lookup is performed next and PE3 finds CE2-MAC associated to 714 PE2 on ESI12, an Ethernet Segment for which PE3 has two active A-D 715 routes per ESI (from PE1 and PE2) and two active A-D routes for 716 EVI1 (from PE1 and PE2). Based on a hashing function for the 717 frame, PE3 may decide to forward the frame using the label stack 718 associated to PE2 (label received from the MAC advertisement 719 route) or the label stack associated to PE1 (label received from 720 the A-D route per EVI for EVI1). Either way, the frame is 721 encapsulated and sent to the remote PE. 723 c) At PE2 (or PE1), the packet is identified to be part of EVI1 based 724 on the bottom label, and a destination MAC lookup is performed. At 725 either PE (PE2 or PE1), the FIB lookup yields a local ESI12 port 726 to which the frame is sent. 
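The aliasing choice made by PE3 in step (4) b) can be illustrated with a minimal sketch. It assumes PE3 keeps, per remote MAC, the labels received in MAC advertisement routes and in A-D routes per EVI, plus the set of PEs with an active A-D route per ESI. The class name, field names, label values and the CRC32 flow hash are hypothetical and are not taken from [RFC7432] or from any implementation.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class RemoteMacState:
    """Hypothetical state PE3 holds for a MAC learned on a multi-homed ES."""
    esi: str
    mac_route_labels: dict = field(default_factory=dict)   # PE -> label from MAC route
    ad_per_evi_labels: dict = field(default_factory=dict)  # PE -> label from A-D per EVI
    ad_per_esi_pes: set = field(default_factory=set)       # PEs with active A-D per ESI


def candidate_next_hops(state):
    """Return (PE, label) pairs that can be used to reach the MAC."""
    candidates = []
    for pe in sorted(state.ad_per_esi_pes):
        if pe in state.mac_route_labels:
            # The PE advertised the MAC itself: use the MAC route label.
            candidates.append((pe, state.mac_route_labels[pe]))
        elif pe in state.ad_per_evi_labels:
            # Aliasing: the PE is on the ES but did not advertise the MAC.
            candidates.append((pe, state.ad_per_evi_labels[pe]))
    return candidates


def pick_next_hop(state, flow_key):
    """Hash the flow onto one candidate (the hash choice is an assumption)."""
    candidates = candidate_next_hops(state)
    if not candidates:
        return None
    return candidates[zlib.crc32(flow_key.encode()) % len(candidates)]


# CE2-MAC as seen by PE3: MAC route from PE2, A-D routes from PE1 and PE2.
ce2_mac = RemoteMacState(
    esi="ESI12",
    mac_route_labels={"PE2": 2001},
    ad_per_evi_labels={"PE1": 1001, "PE2": 2001},
    ad_per_esi_pes={"PE1", "PE2"},
)
print(candidate_next_hops(ce2_mac))
print(pick_next_hop(ce2_mac, "CE3-MAC>CE2-MAC"))
```

In this sketch, removing a PE from ad_per_esi_pes (for instance after its A-D route per ESI is withdrawn) takes that PE out of the candidate list without touching the MAC entry.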
728 Unicast frames from CE1 to CE2 follow the same procedures. 730 6.3. VLAN-bundle service procedures 732 Instead of using VLAN-based interfaces, the Operator can choose to 733 implement VLAN-bundle interfaces to carry the traffic for the 4k CE- 734 VIDs among CE1, CE2 and CE3. If that is the case, the 4k CE-VIDs can 735 be mapped to the same EVI, e.g. EVI200, at each PE. The main 736 advantage of this approach is the low control plane overhead (reduced 737 number of routes and labels) and ease of provisioning, at the 738 expense of no control over the customer broadcast domains, i.e. a 739 single inclusive multicast tree for all the CE-VIDs and no CE-VID 740 translation in the Provider network. 742 6.3.1. Service startup procedures 744 As soon as the EVI200 is created in PE1, PE2 and PE3, the following 745 control plane actions are carried out: 747 o Flooding tree setup per EVI (one route): Each PE will send one 748 Inclusive Multicast Ethernet Tag route per EVI (hence only one 749 route per PE) so that the flooding tree per EVI can be set up. Note 750 that ingress replication or P2MP LSPs can optionally be signaled 751 in the PMSI Tunnel attribute and the corresponding tree be 752 created. 754 o Ethernet A-D routes per ESI (one route for ESI12): A single A-D 755 route for ESI12 will be issued by each of PE1 and PE2. This route will 756 include a single RT (RT for EVI200), an ESI Label extended 757 community with the active-standby flag set to zero (all-active 758 multi-homing type) and an ESI Label different from zero (used by 759 the non-DF for split-horizon functions). This route will be 760 imported by the three PEs, since the RT matches the EVI200 RT 761 locally configured. The A-D routes per ESI will be used for fast 762 convergence and split-horizon functions, as described in 763 [RFC7432]. 765 o Ethernet A-D routes per EVI (one route): An A-D route (EVI200) will 766 be sent by PE1 and PE2 for ESI12.
This route includes the EVI200 767 RT and an MPLS label to be used by PE3 for the aliasing function. 768 This route will be imported by the three PEs. 770 6.3.2. Packet Walkthrough 772 The packet walkthrough for the VLAN-bundle case is similar to the one 773 described for EVI1 in the VLAN-based case except for the way the 774 CE-VID is handled by the ingress PE and the egress PE: 776 o No VLAN translation is allowed and the CE-VIDs are kept untouched 777 from CE to CE, i.e. the ingress CE-VID must be kept at the 778 imposition PE and at the disposition PE. 780 o The frame is identified to be forwarded in the MAC-VRF-200 context 781 as long as its CE-VID belongs to the VLAN-bundle defined in the 782 PE1/PE2/PE3 port to CE1/CE2/CE3. Our example is a special VLAN- 783 bundle case, since the entire CE-VID range is defined in the 784 ports, therefore any CE-VID would be part of EVI200. 786 Please refer to section 6.2.2 for more information about the control 787 plane and forwarding plane interaction for BUM and unicast traffic 788 from the different CEs. 790 6.4. VLAN-aware bundling service procedures 792 The last potential service type analyzed in this document is 793 VLAN-aware bundling. When this type of service interface is used to 794 carry the 4k CE-VIDs among CE1, CE2 and CE3, all the CE-VIDs will be 795 mapped to the same EVI, e.g. EVI300. The difference, compared to the 796 VLAN-bundle service type in the previous section, is that each 797 incoming CE-VID will also be mapped to a different "normalized" 798 Ethernet-Tag in addition to EVI300. If no translation is required, 799 the Ethernet-tag will match the CE-VID. Otherwise a translation 800 between CE-VID and Ethernet-tag will be needed at the imposition PE 801 and at the disposition PE. The main advantage of this approach is the 802 ability to control customer broadcast domains while providing a 803 single EVI to the customer. 805 6.4.1. 
Service startup procedures 807 As soon as the EVI300 is created in PE1, PE2 and PE3, the following 808 control plane actions are carried out: 810 o Flooding tree setup per EVI per Ethernet-Tag (4k routes): Each PE 811 will send one Inclusive Multicast Ethernet Tag route per EVI and 812 per Ethernet-Tag (hence 4k routes per PE) so that the flooding 813 tree per customer broadcast domain can be set up. Note that ingress 814 replication or P2MP LSPs can optionally be signaled in the PMSI 815 Tunnel attribute and the corresponding tree be created. In the 816 described use-case, since all the CE-VIDs and Ethernet-Tags are 817 defined on the three PEs, multicast tree aggregation might make 818 sense in order to save forwarding states. 820 o Ethernet A-D routes per ESI (one route for ESI12): A single A-D 821 route for ESI12 will be issued by each of PE1 and PE2. This route will 822 include a single RT (RT for EVI300), an ESI Label extended 823 community with the active-standby flag set to zero (all-active 824 multi-homing type) and an ESI Label different from zero (used by 825 the non-DF for split-horizon functions). This route will be 826 imported by the three PEs, since the RT matches the EVI300 RT 827 locally configured. The A-D routes per ESI will be used for fast 828 convergence and split-horizon functions, as described in 829 [RFC7432]. 831 o Ethernet A-D routes per EVI: a single A-D route (EVI300) may be 832 sent by PE1 and PE2 for ESI12 if no CE-VID translation is 833 required. This route includes the EVI300 RT and an MPLS label to 834 be used by PE3 for the aliasing function. This route will be 835 imported by the three PEs. Note that if CE-VID translation is 836 required, an A-D per EVI route is needed per Ethernet-Tag (4k). 838 6.4.2. Packet Walkthrough 840 The packet walkthrough for the VLAN-aware case is similar to the one 841 described before.
Compared to the other two cases, VLAN-aware 842 services allow for CE-VID translation and for an N:1 CE-VID to EVI 843 mapping. Neither of the two other service interfaces supports both 844 at once. Some differences compared to the packet 845 walkthrough described in section 6.2.2 are: 847 o At the ingress PE, the frames are identified to be forwarded in the 848 EVI300 context as long as their CE-VID belongs to the range defined 849 in the PE port to the CE. In addition, CE-VID=x is mapped to 850 a "normalized" Ethernet-Tag=y at the MAC-VRF-300 (where x and y 851 might be equal if no translation is needed). Qualified learning is 852 now required (a different Bridge Table is allocated within MAC- 853 VRF-300 for each Ethernet-Tag). Potentially the same MAC could be 854 learned in two different Ethernet-Tag Bridge Tables of the same 855 MAC-VRF. 857 o Any new locally learned MAC on the MAC-VRF-300/Ethernet-Tag=y 858 interface is advertised by the ingress PE in a MAC advertisement 859 route, now using the Ethernet-Tag field (Ethernet-Tag=y) so that 860 the remote PE learns the MAC associated to the MAC-VRF- 861 300/Ethernet-Tag=y FIB. Note that the Ethernet-Tag field is not 862 used in advertisements of MACs learned on VLAN-based or VLAN- 863 bundle service interfaces. 865 o At the ingress PE, BUM frames are sent to the corresponding 866 flooding tree for the particular Ethernet-Tag they are mapped to. 867 Each individual Ethernet-Tag can have a different flooding tree 868 within the same EVI300. For instance, Ethernet-Tag=y can use 869 ingress replication to get to the remote PEs whereas Ethernet- 870 Tag=z can use a P2MP LSP. 872 o At the egress PE, Ethernet-Tag=y, for a given broadcast domain 873 within MAC-VRF-300, can be translated to egress CE-VID=x. That is 874 not possible for VLAN-bundle interfaces. It is possible for VLAN- 875 based interfaces, but it requires a separate MAC-VRF per CE-VID. 877 7.
MPLS-based forwarding model use-case 879 EVPN supports an alternative forwarding model, usually referred to as 880 MPLS-based forwarding or disposition model as opposed to the 881 MAC-based forwarding or disposition model described in section 6. 882 Using the MPLS-based forwarding model instead of the MAC-based model might 883 have an impact on: 885 o The number of forwarding states required. 887 o The FIB where the forwarding states are handled: MAC FIB or MPLS 888 LFIB. 890 The MPLS-based forwarding model avoids the destination MAC lookup at 891 the egress PE MAC FIB, at the expense of increasing the number of 892 next-hop forwarding states at the egress MPLS LFIB. This also has an 893 impact on the control plane and the label allocation model, since an 894 MPLS-based disposition PE must send as many routes and labels as 895 required next-hops in the egress MAC-VRF. This concept is equivalent 896 to the forwarding models supported in IP-VPNs at the egress PE, where 897 an IP lookup in the IP-VPN FIB might be necessary or not depending on 898 the available next-hop forwarding states in the LFIB. 900 The following sub-sections highlight the impact on the control and 901 data plane procedures described in section 6 when an MPLS-based 902 forwarding model is used. 904 Note that both forwarding models are compatible and interoperable in 905 the same network. The implementation of either model in each PE is a 906 decision local to the PE node. 908 7.1. Impact of MPLS-based forwarding on the EVPN network startup 910 The MPLS-based forwarding model has no impact on the procedures 911 explained in section 6.1. 913 7.2. Impact of MPLS-based forwarding on the VLAN-based service 914 procedures 916 Compared to the MAC-based forwarding model, the MPLS-based forwarding 917 model has no impact in terms of number of routes, when all the 918 service interfaces are VLAN-based.
The differences for the use-case 919 described in this document are summarized in the following list: 921 o Flooding tree setup per EVI (4k routes per PE): no impact compared 922 to the MAC-based model. 924 o Ethernet A-D routes per ESI (one set of routes for ESI12 per PE): 925 no impact compared to the MAC-based model. 927 o Ethernet A-D routes per EVI (4k routes per PE/ESI): no impact 928 compared to the MAC-based model. 930 o MAC-advertisement routes: instead of allocating and advertising the 931 same MPLS label for all the new MACs locally learnt on the same 932 MAC-VRF, a different label must be advertised per CE next-hop or 933 MAC so that no MAC FIB lookup is needed at the egress PE. In 934 general, this means that a different label at least per CE must be 935 advertised, although the PE can decide to implement a label per MAC 936 if more granularity (hence less scalability) is required in terms 937 of forwarding states. E.g. if CE2 sends traffic from two different 938 MACs to PE1, CE2-MAC1 and CE2-MAC2, the same MPLS label=x can be 939 re-used for both MAC advertisements since they both share the same 940 source ESI12. It is up to the PE1 implementation to use a different 941 label per individual MAC within the same Ethernet Segment (even if only 942 one label per ESI is enough). 944 o PE1, PE2 and PE3 will not add forwarding states to the MAC FIB upon 945 learning new local CE MAC addresses on the data plane, but will 946 rather add forwarding states to the MPLS LFIB. 948 7.3. Impact of MPLS-based forwarding on the VLAN-bundle service 949 procedures 951 Compared to the MAC-based forwarding model, the MPLS-based forwarding 952 model has no impact in terms of number of routes when all the service 953 interfaces are VLAN-bundle type. The differences for the use-case 954 described in this document are summarized in the following list: 956 o Flooding tree setup per EVI (one route): no impact compared to the 957 MAC-based model.
959 o Ethernet A-D routes per ESI (one route for ESI12 per PE): no impact 960 compared to the MAC-based model. 962 o Ethernet A-D routes per EVI (one route per PE/ESI): no impact 963 compared to the MAC-based model since no VLAN translation is 964 required. 966 o MAC-advertisement routes: instead of allocating and advertising the 967 same MPLS label for all the new MACs locally learnt on the same 968 MAC-VRF, a different label must be advertised per CE next-hop or 969 MAC so that no MAC FIB lookup is needed at the egress PE. In 970 general, this means that a different label at least per CE must be 971 advertised, although the PE can decide to implement a label per MAC 972 if more granularity (hence less scalability) is required in terms 973 of forwarding states. It is up to the PE1 implementation to use a 974 different label per individual MAC within the same Ethernet Segment (even 975 if only one label per ESI is enough). 977 o PE1, PE2 and PE3 will not add forwarding states to the MAC FIB upon 978 learning new local CE MAC addresses on the data plane, but will 979 rather add forwarding states to the MPLS LFIB. 981 7.4. Impact of MPLS-based forwarding on the VLAN-aware service 982 procedures 984 Compared to the MAC-based forwarding model, the MPLS-based forwarding 985 model has no impact in terms of number of A-D routes when all the 986 service interfaces are VLAN-aware bundle type. The differences for 987 the use-case described in this document are summarized in the 988 following list: 990 o Flooding tree setup per EVI (4k routes per PE): no impact compared 991 to the MAC-based model. 993 o Ethernet A-D routes per ESI (one route for ESI12 per PE): no impact 994 compared to the MAC-based model. 996 o Ethernet A-D routes per EVI (1 route per ESI or 4k routes per 997 PE/ESI): PE1 and PE2 may send one route per ESI if no CE-VID 998 translation is needed. However, 4k routes are normally sent for EVI300, 999 one per (ESI, Ethernet-Tag) tuple.
This will allow the egress PE 1000 to find out all the forwarding information in the MPLS LFIB and 1001 even support Ethernet-Tag to CE-VID translation at the egress. 1003 o MAC-advertisement routes: instead of allocating and advertising the 1004 same MPLS label for all the new MACs locally learnt on the same 1005 MAC-VRF, a different label must be advertised per CE next-hop or 1006 MAC so that no MAC FIB lookup is needed at the egress PE. In 1007 general, this means that a different label at least per CE must be 1008 advertised, although the PE can decide to implement a label per MAC 1009 if more granularity (hence less scalability) is required in terms 1010 of forwarding states. It is up to the PE1 implementation to use a 1011 different label per individual MAC within the same Ethernet Segment. Note 1012 that the Ethernet-Tag will be set to a non-zero value for the MAC- 1013 advertisement routes. The same MAC address can be announced with 1014 different Ethernet-Tag values. This will make the advertising PE 1015 install two different forwarding states in the MPLS LFIB. 1017 o PE1, PE2 and PE3 will not add forwarding states to the MAC FIB upon 1018 learning new local CE MAC addresses on the data plane, but will 1019 rather add forwarding states to the MPLS LFIB. 1021 8. Comparison between MAC-based and MPLS-based Egress Forwarding Models 1023 Both forwarding models are possible in a network deployment and each 1024 one has its own trade-offs. 1026 Both forwarding models can save A-D routes per EVI when VLAN-aware 1027 bundling services are deployed and no CE-VID translation is required. 1028 While this saves a significant amount of routes, customers normally 1029 require CE-VID translation, hence we assume an A-D per EVI route per 1030 (ESI, Ethernet-Tag) is needed. 1032 The MAC-based model saves a significant amount of MPLS labels 1033 compared to the MPLS-based forwarding model.
All the MACs and A-D 1034 routes for the same EVI can signal the same MPLS label, saving labels 1035 from the local PE space. A MAC FIB lookup at the egress PE is 1036 required in order to do so. 1038 The MPLS-based forwarding model can save forwarding states at the 1039 egress PEs if labels per next hop CE (as opposed to per MAC) are 1040 implemented. No egress MAC lookup is required. However, a different 1041 label per next-hop CE per MAC-VRF is consumed, as opposed to a single 1042 label per MAC-VRF. 1044 The following table summarizes the implementation details of both 1045 models.
1047 +-----------------------------+----------------+----------------+
1048 | 4k CE-VID VLANs             | MAC-based      | MPLS-based     |
1049 |                             | Model          | Model          |
1050 +-----------------------------+----------------+----------------+
1051 | MPLS labels consumed        | 1 per MAC-VRF  | 1 per CE/EVI   |
1052 | Egress PE Forwarding states | 1 per MAC      | 1 per next-hop |
1053 | Egress PE Lookups           | 2 (MPLS+MAC)   | 1 (MPLS)       |
1054 +-----------------------------+----------------+----------------+
1056 The egress forwarding model is an implementation local to the egress 1057 PE and is independent of the model supported on the rest of the PEs, 1058 i.e. in our use-case, PE1, PE2 and PE3 could have either egress 1059 forwarding model without any dependencies. 1061 9. Traffic flow optimization 1063 In addition to the procedures described across sections 3 through 8, 1064 EVPN [RFC7432] procedures allow for optimized traffic handling in 1065 order to minimize unnecessary flooding across the entire 1066 infrastructure. Optimization is provided through specific ARP 1067 termination and the ability to block unknown unicast flooding. 1068 Additionally, EVPN procedures allow for intelligent, close to the 1069 source, inter-subnet forwarding and solve the commonly known sub- 1070 optimal routing problem.
Besides the traffic efficiency, ingress 1071 based inter-subnet forwarding also optimizes packet forwarding rules 1072 and implementation at the egress nodes as well. Details of these 1073 procedures are outlined in sections 9.1 and 9.2. 1075 9.1. Control Plane Procedures 1077 9.1.1. MAC learning options 1079 The fundamental premise of [RFC7432] is the notion of a different 1080 approach to MAC address learning compared to traditional IEEE 802.1 1081 bridge learning methods; specifically EVPN differentiates between 1082 data and control plane driven learning mechanisms. 1084 Data driven learning implies that there is no separate communication 1085 channel used to advertise and propagate MAC addresses. Rather, MAC 1086 addresses are learned through IEEE defined bridge-learning procedures 1087 as well as by snooping on DHCP and ARP requests. As different MAC 1088 addresses show up on different ports, the L2 FIB is populated with 1089 the appropriate MAC addresses. 1091 Control plane driven learning implies a communication channel that 1092 could be either a control-plane protocol or a management-plane 1093 mechanism. In the context of EVPN, two different learning procedures 1094 are defined, i.e. local and remote procedures: 1096 o Local learning defines the procedures used for learning the MAC 1097 addresses of network elements locally connected to a MAC-VRF. 1098 Local learning could be implemented through all three learning 1099 procedures: control plane, management plane as well as data plane. 1100 However, the expectation is that for most of the use cases, local 1101 learning through data plane should be sufficient. 1103 o Remote learning defines the procedures used for learning MAC 1104 addresses of network elements remotely connected to a MAC-VRF, 1105 i.e. far-end PEs. Remote learning procedures defined in [RFC7432] 1106 advocate using only control plane learning; specifically BGP. 
1107 Through the use of BGP EVPN NLRIs, the remote PE has the 1108 capability of advertising all the MAC addresses present in its 1109 local FIB. 1111 9.1.2. Proxy-ARP/ND 1113 In EVPN, MAC addresses are advertised via the MAC/IP Advertisement 1114 Route, as discussed in [RFC7432]. Optionally an IP address can be 1115 advertised along with the MAC address advertisement. However, there 1116 are certain rules put in place in terms of IP address usage: if the 1117 MAC/IP Route contains an IP address, this particular IP address 1118 correlates directly with the advertised MAC address. Such 1119 advertisement allows us to build a proxy-ARP/ND table populated with 1120 the IP<->MAC bindings received from all the remote nodes. 1122 Furthermore, based on these bindings, a local MAC-VRF can now provide 1123 Proxy-ARP/ND functionality for all ARP requests and ND solicitations 1124 directed to the IP address pool learned through BGP. Therefore, the 1125 amount of unnecessary L2 flooding, ARP/ND requests/solicitations in 1126 this case, can be further reduced by the introduction of Proxy-ARP/ND 1127 functionality across all EVI MAC-VRFs. 1129 9.1.3. Unknown Unicast flooding suppression 1131 Given that all locally learned MAC addresses are advertised through 1132 BGP to all remote PEs, suppressing flooding of any Unknown Unicast 1133 traffic towards the remote PEs is a feasible network optimization. 1135 The assumption in the use case is made that any network device that 1136 appears on a remote MAC-VRF will somehow signal its presence to the 1137 network. This signaling can be done through e.g. gratuitous ARPs. 1138 Once the remote PE acknowledges the presence of the node in the MAC- 1139 VRF, it will do two things: install its MAC address in its local FIB 1140 and advertise this MAC address to all other BGP speakers via EVPN 1141 NLRI. Therefore, we can assume that any active MAC address is 1142 propagated and learnt through the entire EVI. 
Given that MAC 1143 addresses become pre-populated - once nodes are alive on the network 1144 - there is no need to flood any unknown unicast towards the remote 1145 PEs. If the owner of a given destination MAC is active, the BGP route 1146 will be present in the local RIB and FIB, assuming that the BGP 1147 import policies are successfully applied; otherwise, the owner of 1148 such destination MAC is not present on the network. 1150 It is worth noting that unless: a) control or management plane 1151 learning is performed through the entire EVI or b) all the EVI- 1152 attached devices signal their presence when they come up (GARPs or 1153 similar), unknown unicast flooding must be enabled. 1155 9.1.4. Optimization of Inter-subnet forwarding 1157 In a scenario in which both L2 and L3 services are needed over the 1158 same physical topology, some interaction between EVPN and IP-VPN is 1159 required. A common way of stitching the two service planes is through 1160 the use of an IRB interface, which allows for traffic to be either 1161 routed or bridged depending on its destination MAC address. If the 1162 destination MAC address is the one of the IRB interface, traffic 1163 needs to be passed through a routing module and potentially be either 1164 routed to a remote PE or forwarded to a local subnet. If the 1165 destination MAC address is not the one of the IRB, the MAC-VRF 1166 follows standard bridging procedures. 1168 A typical example of EVPN inter-subnet forwarding would be a scenario 1169 in which multiple IP subnets are part of a single or multiple EVIs, 1170 and they all belong to a single IP-VPN. In such topologies, it is 1171 desired that inter-subnet traffic can be efficiently routed without 1172 any tromboning effects in the network. Due to the overlapping 1173 physical and service topology in such scenarios, all inter-subnet 1174 connectivity will be locally routed through the IRB interface. 
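The bridge-or-route decision made at the IRB interface can be illustrated with a minimal sketch. It assumes a frame is reduced to its source MAC, destination MAC and destination IP, and that the routing module resolves the next-hop MAC from a simple IP-to-MAC table. All names and addresses are hypothetical, and the "resolve" outcome for an unknown destination IP is an assumption that is not described in this document.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    dst_ip: str


def handle_frame(frame, ingress_irb_mac, egress_irb_mac, ip_to_mac):
    """Return (action, frame) for a frame received in a MAC-VRF with an IRB."""
    if frame.dst_mac.lower() != ingress_irb_mac.lower():
        # Not addressed to the IRB: standard bridging in the MAC-VRF.
        return ("bridge", frame)
    next_mac = ip_to_mac.get(frame.dst_ip)
    if next_mac is None:
        # Assumption: the destination still has to be resolved.
        return ("resolve", frame)
    # Routed: rewrite both MAC addresses for the destination subnet.
    return ("route", replace(frame, src_mac=egress_irb_mac, dst_mac=next_mac))


# Hypothetical addresses for a host behind CE2 sending towards another subnet.
table = {"192.0.2.10": "00:00:5e:00:53:10"}
frame = Frame("00:00:5e:00:53:02", "00:00:5e:00:53:60", "192.0.2.10")
print(handle_frame(frame, "00:00:5e:00:53:60", "00:00:5e:00:53:66", table))
```

A frame whose destination MAC is any other address stays in the MAC-VRF and is returned unchanged with the "bridge" action.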
1176 In addition to optimizing the traffic patterns in the network, local 1177 inter-subnet forwarding also greatly reduces the amount of 1178 processing needed to cross the subnets. Through EVPN MAC 1179 advertisements, the local PE learns the real destination MAC address 1180 associated with the remote IP address and the inter-subnet forwarding 1181 can happen locally. When the packet is received at the egress PE, it 1182 is directly mapped to an egress MAC-VRF, bypassing any egress IP-VPN 1183 processing. 1185 Please refer to [EVPN-INTERSUBNET] for more information about the IP 1186 inter-subnet forwarding procedures in EVPN. 1188 9.2. Packet Walkthrough Examples 1190 Assuming that the services are set up according to Figure 1 in section 1191 3, the following flow optimization processes will take place in terms 1192 of creating, receiving and forwarding packets across the network. 1194 9.2.1. Proxy-ARP example for CE2 to CE3 traffic 1196 Using Figure 1 in section 3, consider EVI 400 residing on PE1, PE2 1197 and PE3 connecting CE2 and CE3 networks. Also, consider that PE1 and 1198 PE2 are part of the all-active multi-homing ES for CE2, and that PE2 1199 is elected designated-forwarder for EVI 400. We assume that all the 1200 PEs implement the proxy-ARP functionality in the MAC-VRF-400 context. 1202 In this scenario, PE3 will not only advertise the MAC addresses 1203 through the EVPN MAC Advertisement Route but also IP addresses of 1204 individual hosts, i.e. /32 prefixes, behind CE3. Upon receiving the 1205 EVPN routes, PE1 and PE2 will install the MAC addresses in the MAC- 1206 VRF-400 FIB and based on the associated received IP addresses, PE1 1207 and PE2 can now build a proxy-ARP table within the context of MAC- 1208 VRF-400. 1210 From the forwarding perspective, when a node behind CE2 sends a frame 1211 destined to a node behind CE3, it will first send an ARP request to 1212 e.g. PE2 (based on the result of the CE2 hashing).
Assuming that PE2 1213 has populated its proxy-ARP table for all active nodes behind 1214 CE3, and that the IP address in the ARP message matches an entry in 1215 the table, PE2 will respond to the ARP request with the actual MAC 1216 address on behalf of the node behind CE3. 1218 Once the nodes behind CE2 learn the actual MAC address of the nodes 1219 behind CE3, all the MAC-to-MAC communications between the two 1220 networks will be unicast. 1222 9.2.2. Flood suppression example for CE1 to CE3 traffic 1224 Using Figure 1 in section 3, consider EVI 500 residing on PE1 and PE3 1225 connecting CE1 and CE3 networks. Consider that both PE1 and PE3 have 1226 disabled unknown unicast flooding for this specific EVI context. Once 1227 the network devices behind CE3 come online, PE3 will learn their MAC 1228 addresses and create local FIB entries for these devices. Note that 1229 local FIB entries could also be created through either a control or 1230 management plane between PE and CE. Consequently, PE3 will 1231 automatically create EVPN Type 2 MAC Advertisement Routes and 1232 advertise all locally learned MAC addresses. The routes will also 1233 include the corresponding MPLS label. 1235 Given that PE1 automatically learns and installs all MAC addresses 1236 behind CE3, its MAC-VRF FIB will already be pre-populated with the 1237 respective next-hops and label assignments associated with the MAC 1238 addresses behind CE3. As such, as soon as the traffic sent by CE1 to 1239 nodes behind CE3 is received into the context of EVI 500, PE1 will 1240 push the MPLS Label(s) onto the original Ethernet frame and send the 1241 packet to the MPLS network. As usual, once PE3 receives this packet, 1242 and depending on the forwarding model, PE3 will either do a next-hop 1243 lookup in the EVI 500 context, or will just forward the traffic 1244 directly to CE3.
In the case that PE1 MAC-VRF-500 does not have a 1245 MAC entry for a specific destination that CE1 is trying to reach, PE1 1246 will drop the frame since unknown unicast flooding is disabled. 1248 Based on the assumption that all the MAC entries behind the CEs are 1249 pre-populated through gratuitous-ARP and/or DHCP requests, if one 1250 specific MAC entry is not present in the MAC-VRF-500 FIB on PE1, the 1251 owner of that MAC is not alive on the network behind CE3, hence 1252 the traffic can be dropped at PE1 instead of being flooded and consuming 1253 network bandwidth. 1255 9.2.3. Optimization of inter-subnet forwarding example for CE3 to CE2 1256 traffic 1258 Using Figure 1 in section 3, consider that there is an IP-VPN 666 1259 context residing on PE1, PE2 and PE3 which connects CE1, CE2 and CE3 1260 into a single IP-VPN domain. Also consider that there are two EVIs 1261 present on the PEs, EVI 600 and EVI 60. Each IP subnet is associated 1262 to a different MAC-VRF context. Thus there is a single subnet, subnet 1263 600, between CE1 and CE3 that is established through EVI 600. 1264 Similarly, there is another subnet, subnet 60, between CE2 and CE3 1265 that is established through EVI 60. Since both subnets are part of 1266 the same IP VPN, there is a mapping of each EVI (or individual 1267 subnet) to a local IRB interface on the three PEs. 1269 If a node behind CE2 wants to communicate with a node on the same 1270 subnet sitting behind CE3, the communication flow will follow the 1271 standard EVPN procedures, i.e. a FIB lookup within PE1 (or PE2) 1272 and the addition of the corresponding EVPN label to the MPLS label stack 1273 (downstream label allocation from PE3 for EVI 60). 1275 When it comes to crossing the subnet boundaries, the ingress PE 1276 implements local inter-subnet forwarding.
   For example, when a node behind CE2 (EVI 60) sends a packet to a
   node behind CE1 (EVI 600), the destination IP address will be in
   subnet 600, but the destination MAC address will be the address of
   the source node's default gateway, which in this case will be an
   IRB interface on PE1 (connecting EVI 60 to IP-VPN 666). Once PE1
   sees the traffic destined to its own MAC address, it will route the
   packet to EVI 600, i.e., it will change the source MAC address to
   the one of the IRB interface in EVI 600 and change the destination
   MAC address to the address belonging to the node behind CE1, which
   is already populated in the MAC-VRF-600 FIB, either through data or
   control plane learning.

   An important optimization to be noted is the local inter-subnet
   forwarding in lieu of IP-VPN routing. If the node from subnet 60
   (behind CE2) is sending a packet to the remote end node on subnet
   600 (behind CE3), the mechanism in place still honors the local
   inter-subnet (inter-EVI) forwarding.

   In our use-case, therefore, when a node from subnet 60 behind CE2
   sends traffic to the node on subnet 600 behind CE3, the destination
   MAC address is the PE1 MAC-VRF-60 IRB MAC address. However, once
   the traffic locally crosses EVIs, to EVI 600, via the IRB interface
   on PE1, the source MAC address is changed to that of the IRB
   interface in EVI 600 and the destination MAC address is changed to
   the one advertised by PE3 via EVPN and already installed in
   MAC-VRF-600. The rest of the forwarding through PE1 uses the
   MAC-VRF-600 forwarding context and label space.

   Another very relevant optimization is due to the fact that traffic
   between PEs is forwarded through EVPN, rather than through IP-VPN.
   In the example described above for traffic from EVI 60 on CE2 to
   EVI 600 on CE3, there is no need for IP-VPN processing on the
   egress PE3.
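   The MAC rewrite performed at the ingress PE when traffic crosses
   from EVI 60 to EVI 600 could be sketched as follows. This is a
   simplified illustration only: the IP prefixes, the MAC addresses,
   the table layout and the function names are all hypothetical, and
   the sketch assumes that the IP-to-MAC bindings of the destination
   hosts are already known to the PE.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    dst_ip: str


# Hypothetical addressing: one IP subnet and one IRB MAC per EVI on PE1.
SUBNET_OF_EVI = {
    60: ipaddress.ip_network("198.51.100.0/24"),
    600: ipaddress.ip_network("203.0.113.0/24"),
}
IRB_MAC = {60: "00:00:5e:00:53:60", 600: "00:00:5e:00:53:66"}

# Hypothetical IP-to-MAC bindings for hosts reachable through each EVI.
HOST_MAC: Dict[int, Dict[str, str]] = {
    600: {"203.0.113.10": "00:00:5e:00:53:0a"},
}


def evi_for_ip(ip: str) -> Optional[int]:
    """Return the EVI whose subnet contains the destination IP, if any."""
    addr = ipaddress.ip_address(ip)
    for evi, subnet in SUBNET_OF_EVI.items():
        if addr in subnet:
            return evi
    return None


def irb_route(frame: Frame, ingress_evi: int) -> Optional[Tuple[int, Frame]]:
    """Route a frame locally between EVIs on the ingress PE.

    Returns (egress EVI, rewritten frame), or None when the frame is not
    addressed to the IRB interface or the destination cannot be resolved.
    """
    if frame.dst_mac != IRB_MAC[ingress_evi]:
        return None  # not sent to the default gateway: no routing here
    egress_evi = evi_for_ip(frame.dst_ip)
    if egress_evi is None or egress_evi == ingress_evi:
        return None
    dst_mac = HOST_MAC.get(egress_evi, {}).get(frame.dst_ip)
    if dst_mac is None:
        return None
    # Source MAC becomes the egress IRB MAC; destination MAC becomes the
    # host MAC. Forwarding then continues in the egress EVI's MAC-VRF.
    routed = replace(frame, src_mac=IRB_MAC[egress_evi], dst_mac=dst_mac)
    return egress_evi, routed


# A node in subnet 60 sends to a node in subnet 600 via its gateway.
original = Frame(src_mac="00:00:5e:00:53:02",
                 dst_mac=IRB_MAC[60],
                 dst_ip="203.0.113.10")
result = irb_route(original, ingress_evi=60)
print(result)
```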
   Traffic is forwarded either to the EVI 600 context in PE3 for
   further MAC lookup and next-hop processing, or directly to the node
   behind CE3, depending on the egress forwarding model being used.

10. Security Considerations

   Please refer to the "Security Considerations" section in [RFC7432].

11. IANA Considerations

   No new IANA considerations are needed.

12. References

12.1. Normative References

   [RFC4761] Kompella, K., Ed., and Y. Rekhter, Ed., "Virtual Private
             LAN Service (VPLS) Using BGP for Auto-Discovery and
             Signaling", RFC 4761, DOI 10.17487/RFC4761, January 2007.

   [RFC4762] Lasserre, M., Ed., and V. Kompella, Ed., "Virtual Private
             LAN Service (VPLS) Using Label Distribution Protocol
             (LDP) Signaling", RFC 4762, DOI 10.17487/RFC4762, January
             2007.

   [RFC6074] Rosen, E., Davie, B., Radoaca, V., and W. Luo,
             "Provisioning, Auto-Discovery, and Signaling in Layer 2
             Virtual Private Networks (L2VPNs)", RFC 6074,
             DOI 10.17487/RFC6074, January 2011.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
             February 2006.

   [RFC7209] Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N.,
             Henderickx, W., and A. Isaac, "Requirements for Ethernet
             VPN (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014.

   [RFC7117] Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and
             C. Kodeboniya, "Multicast in Virtual Private LAN Service
             (VPLS)", RFC 7117, DOI 10.17487/RFC7117, February 2014.

   [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
             Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
             Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
             2015.

12.2. Informative References

   [EVPN-INTERSUBNET] Sajassi et al., "IP Inter-subnet forwarding in
             EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-03.txt

13. Acknowledgments

   The authors want to thank Giles Heron for his detailed review of
   the document. We also thank Stefan Plug, and Eric Wunan for their
   comments.

14. Contributors

   In addition to the authors listed on the front page, the following
   co-authors have also contributed to this document:

   Florin Balus
   Keyur Patel
   Aldrin Isaac
   Truman Boyes

15. Authors' Addresses

   Jorge Rabadan
   Nokia
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@nokia.com

   Senad Palislamovic
   Nokia
   Email: senad.palislamovic@nokia.com

   Wim Henderickx
   Nokia
   Email: wim.henderickx@nokia.com

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com

   James Uttaro
   AT&T
   Email: uttaro@att.com