Network Working Group                                    A. Sajassi, Ed.
INTERNET-DRAFT                                                     Cisco
Category: Standards Track
                                                             R. Aggarwal
J. Drake                                                          Arktan
Juniper Networks
                                                                N. Bitar
W. Henderickx                                                    Verizon
Alcatel-Lucent
                                                            Aldrin Isaac
                                                               Bloomberg

J. Uttaro
AT&T

Expires: September 12, 2014                               March 12, 2014


                      BGP MPLS Based Ethernet VPN
                        draft-ietf-l2vpn-evpn-06


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document describes procedures for BGP MPLS based Ethernet VPNs
   (EVPN).

Table of Contents

   1. Specification of requirements
   2. Terminology
   3. Introduction
   4. BGP MPLS Based EVPN Overview
   5. Ethernet Segment
   6. Ethernet Tag
     6.1 VLAN Based Service Interface
     6.2 VLAN Bundle Service Interface
       6.2.1 Port Based Service Interface
     6.3 VLAN Aware Bundle Service Interface
       6.3.1 Port Based VLAN Aware Service Interface
   7. BGP EVPN NLRI
     7.1. Ethernet Auto-Discovery Route
     7.2. MAC/IP Advertisement Route
     7.3. Inclusive Multicast Ethernet Tag Route
     7.4 Ethernet Segment Route
     7.5 ESI Label Extended Community
     7.6 ES-Import Route Target
     7.7 MAC Mobility Extended Community
     7.8 Default Gateway Extended Community
   8. Multi-homing Functions
     8.1 Multi-homed Ethernet Segment Auto-Discovery
       8.1.1 Constructing the Ethernet Segment Route
     8.2 Fast Convergence
       8.2.1 Constructing the Ethernet A-D per Ethernet Segment (ES)
             Route
         8.2.1.1. Ethernet A-D Route Targets
     8.3 Split Horizon
       8.3.1 ESI Label Assignment
         8.3.1.1 Ingress Replication
         8.3.1.2. P2MP MPLS LSPs
     8.4 Aliasing and Backup-Path
       8.4.1 Constructing the Ethernet A-D per EVPN Instance (EVI)
             Route
         8.4.1.1 Ethernet A-D Route Targets
     8.5 Designated Forwarder Election
     8.6. Interoperability with Single-homing PEs
   9. Determining Reachability to Unicast MAC Addresses
     9.1. Local Learning
     9.2. Remote learning
       9.2.1. Constructing the BGP EVPN MAC/IP Address Advertisement
       9.2.2 Route Resolution
   10. ARP and ND
     10.1 Default Gateway
   11. Handling of Multi-Destination Traffic
     11.1. Construction of the Inclusive Multicast Ethernet Tag Route
     11.2. P-Tunnel Identification
   12. Processing of Unknown Unicast Packets
     12.1. Ingress Replication
     12.2. P2MP MPLS LSPs
   13. Forwarding Unicast Packets
     13.1. Forwarding packets received from a CE
     13.2. Forwarding packets received from a remote PE
       13.2.1. Unknown Unicast Forwarding
       13.2.2. Known Unicast Forwarding
   14. Load Balancing of Unicast Frames
     14.1. Load balancing of traffic from a PE to remote CEs
       14.1.1 Single-Active Redundancy Mode
       14.1.2 All-Active Redundancy Mode
     14.2. Load balancing of traffic between a PE and a local CE
       14.2.1. Data plane learning
       14.2.2. Control plane learning
   15. MAC Mobility
     15.1. MAC Duplication Issue
     15.2. Sticky MAC addresses
   16. Multicast & Broadcast
     16.1. Ingress Replication
     16.2. P2MP LSPs
       16.2.1. Inclusive Trees
   17. Convergence
     17.1. Transit Link and Node Failures between PEs
     17.2. PE Failures
     17.3. PE to CE Network Failures
   18. Frame Ordering
   19. Acknowledgements
   20. Security Considerations
   21. Co-authors
   22. IANA Considerations
   23. References
     23.1 Normative References
     23.2 Informative References
   24. Author's Address

1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Terminology

   Bridge Domain:

   Broadcast Domain:

   CE: Customer Edge device, e.g., a host, router, or switch.

   EVI: An EVPN instance spanning the PEs participating in that VPN.

   MAC-VRF: A Virtual Routing and Forwarding table for MAC addresses on
   a PE for an EVI.

   Ethernet Segment Identifier (ESI): If a CE is multi-homed to two or
   more PEs, the set of Ethernet links that attaches the CE to the PEs
   is an 'Ethernet segment'. Ethernet segments MUST have a unique
   non-zero identifier, the 'Ethernet Segment Identifier'.

   Ethernet Tag: An Ethernet Tag identifies a particular broadcast
   domain, e.g., a VLAN. An EVPN instance consists of one or more
   broadcast domains. Ethernet tag(s) are assigned to the broadcast
   domains of a given EVPN instance by the provider of that EVPN, and
   each PE in that EVPN instance performs a mapping between the
   broadcast domain identifier(s) understood by each of its attached
   CEs and the corresponding Ethernet tag.

   LACP: Link Aggregation Control Protocol

   MP2MP: Multipoint to Multipoint

   P2MP: Point to Multipoint

   P2P: Point to Point

   Single-Active Redundancy Mode: When only a single PE, among a group
   of PEs attached to an Ethernet segment, is allowed to forward traffic
   to/from that Ethernet Segment, the Ethernet segment is defined to be
   operating in Single-Active redundancy mode.

   All-Active Redundancy Mode: When all PEs attached to an Ethernet
   segment are allowed to forward traffic to/from that Ethernet Segment,
   the Ethernet segment is defined to be operating in All-Active
   redundancy mode.

3. Introduction

   This document describes procedures for BGP MPLS based Ethernet VPNs
   (EVPN). The procedures described here are intended to meet the
   requirements specified in [EVPN-REQ]. Please refer to [EVPN-REQ] for
   the detailed requirements and motivation.
   EVPN requires extensions to existing IP/MPLS protocols as described
   in this document. In addition to these extensions, EVPN uses several
   building blocks from existing MPLS technologies.

4. BGP MPLS Based EVPN Overview

   This section provides an overview of EVPN. An EVPN instance comprises
   CEs that are connected to PEs that form the edge of the MPLS
   infrastructure. A CE may be a host, a router, or a switch. The PEs
   provide virtual Layer 2 bridged connectivity between the CEs. There
   may be multiple EVPN instances in the provider's network.

   The PEs may be connected by an MPLS LSP infrastructure, which provides
   the benefits of MPLS technology such as fast reroute, resiliency,
   etc. The PEs may also be connected by an IP infrastructure, in which
   case IP/GRE tunneling or other IP tunneling can be used between the
   PEs. The detailed procedures in this version of this document are
   specified only for MPLS LSPs as the tunneling technology. However,
   these procedures are designed to be extensible to IP tunneling as the
   PSN tunneling technology.

   In an EVPN, MAC learning between PEs occurs not in the data plane (as
   happens with traditional bridging) but in the control plane. Control
   plane learning offers greater control over the MAC learning process,
   such as restricting who learns what, and the ability to apply
   policies. Furthermore, the control plane chosen for advertising MAC
   reachability information is multi-protocol (MP) BGP (similar to IP
   VPNs (RFC 4364)). This provides greater scalability and the ability
   to preserve the "virtualization" or isolation of groups of
   interacting agents (hosts, servers, virtual machines) from each
   other. In EVPN, PEs advertise the MAC addresses learned from the CEs
   that are connected to them, along with an MPLS label, to other PEs in
   the control plane using MP-BGP.
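   To make the contrast with data-plane learning concrete, the following
   Python sketch models a MAC-VRF whose remote entries are installed
   only from control-plane advertisements. It is a toy illustration
   under stated assumptions: BGP itself is abstracted into plain method
   calls, and the names MacVrf, RemoteEntry, on_mac_advertisement and
   on_mac_withdrawal are hypothetical, not part of this specification or
   of any router API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

# Hypothetical key: (Ethernet Tag ID, MAC address in lower case).
MacKey = Tuple[int, str]


@dataclass(frozen=True)
class RemoteEntry:
    next_hop_pe: str  # address of the PE that advertised the MAC
    mpls_label: int   # MPLS label advertised along with the MAC


class MacVrf:
    """Toy MAC-VRF in which remote MACs are learned only via the control plane."""

    def __init__(self) -> None:
        self.local: Dict[MacKey, str] = {}           # MAC -> local attachment port
        self.remote: Dict[MacKey, RemoteEntry] = {}  # MAC -> advertising PE and label

    def learn_local(self, eth_tag: int, mac: str, port: str) -> None:
        # PE-CE learning may use whatever method suits the CE (data plane, ARP, ...).
        self.local[(eth_tag, mac.lower())] = port

    def on_mac_advertisement(self, eth_tag: int, mac: str,
                             next_hop_pe: str, label: int) -> None:
        # Remote entries come from BGP advertisements, never from the
        # source addresses of frames received over the MPLS core.
        self.remote[(eth_tag, mac.lower())] = RemoteEntry(next_hop_pe, label)

    def on_mac_withdrawal(self, eth_tag: int, mac: str) -> None:
        self.remote.pop((eth_tag, mac.lower()), None)

    def lookup(self, eth_tag: int, mac: str) -> Optional[Union[str, RemoteEntry]]:
        key = (eth_tag, mac.lower())
        if key in self.local:
            return self.local[key]
        return self.remote.get(key)


vrf = MacVrf()
vrf.learn_local(0, "00:11:22:33:44:55", "port-1")
vrf.on_mac_advertisement(0, "66:77:88:99:AA:BB", "192.0.2.2", 300016)
print(vrf.lookup(0, "66:77:88:99:aa:bb"))
# RemoteEntry(next_hop_pe='192.0.2.2', mpls_label=300016)
```

   Whether such a table holds every MAC known to the control plane or
   only a cache of active flows is, as noted in this section, a local
   decision; the sketch simply installs everything it is told about.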

   Control plane learning enables load balancing of traffic to and from
   CEs that are multi-homed to multiple PEs. This is in addition to load
   balancing across the MPLS core via multiple LSPs between the same
   pair of PEs. In other words, it allows CEs to connect to multiple
   active points of attachment. It also improves convergence times in
   the event of certain network failures.

   However, learning between PEs and CEs is done by the method best
   suited to the CE: data plane learning, IEEE 802.1x, LLDP, 802.1aq,
   ARP, management plane, or other protocols.

   It is a local decision as to whether the Layer 2 forwarding table on
   a PE is populated with all the MAC destination addresses known to
   the control plane, or whether the PE implements a cache-based scheme.
   For instance, the MAC forwarding table may be populated only with the
   MAC destinations of the active flows transiting a specific PE.

   The policy attributes of EVPN are very similar to those of IP-VPN. An
   EVPN instance requires a Route-Distinguisher (RD) that is unique per
   PE and one or more globally unique Route-Targets (RTs). A CE attaches
   to a MAC-VRF on a PE, on an Ethernet interface that may be configured
   for one or more Ethernet Tags, e.g., VLAN IDs. Some deployment
   scenarios guarantee uniqueness of VLAN IDs across EVPN instances: all
   points of attachment for a given EVPN instance use the same VLAN ID,
   and no other EVPN instance uses this VLAN ID. This document refers to
   this case as a "Unique VLAN EVPN" and describes simplified procedures
   to optimize for it.

5. Ethernet Segment

   If a CE is multi-homed to two or more PEs, the set of Ethernet links
   constitutes an "Ethernet Segment". An Ethernet segment may appear to
   the CE as a Link Aggregation Group (LAG). Ethernet segments have an
   identifier, called the "Ethernet Segment Identifier" (ESI), which is
   encoded as a ten-octet integer.
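   As an aid to reading the layouts that follow, the Python sketch below
   treats an ESI as a ten-octet byte string: it splits the type octet T
   from the nine-octet ESI Value, recognizes the two reserved values
   listed next, and builds Type 1, 3, 4, and 5 values with the octet
   orders given for those types. The function names are hypothetical.
   Type 0 (operator-configured) needs no builder, and Type 2 has the
   same shape as Type 1 but takes the MST root bridge MAC and priority.
   The sketch does not, and cannot locally, check the network-wide
   uniqueness requirement.

```python
import ipaddress
import struct
from typing import Tuple

ESI_LEN = 10
ESI_ZERO = bytes(ESI_LEN)       # reserved: single-homed CE
MAX_ESI = b"\xff" * ESI_LEN     # reserved: MAX-ESI


def parse_mac(mac: str) -> bytes:
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has six octets")
    return octets


def split_esi(esi: bytes) -> Tuple[int, bytes]:
    """Return (T, ESI Value): the type octet and the nine-octet value."""
    if len(esi) != ESI_LEN:
        raise ValueError("an ESI is exactly ten octets")
    return esi[0], esi[1:]


def is_reserved(esi: bytes) -> bool:
    return esi in (ESI_ZERO, MAX_ESI)


def esi_type1_lacp(ce_system_mac: str, ce_port_key: int) -> bytes:
    # T=0x01 | CE LACP System MAC (6) | CE LACP Port Key (2) | 0x00
    return b"\x01" + parse_mac(ce_system_mac) + struct.pack("!H", ce_port_key) + b"\x00"


def esi_type3_mac(system_mac: str, local_disc: int) -> bytes:
    # T=0x03 | System MAC (6) | Local Discriminator (3, low-order octets)
    return b"\x03" + parse_mac(system_mac) + local_disc.to_bytes(3, "big")


def esi_type4_ip(ipv4: str, local_disc: int) -> bytes:
    # T=0x04 | IPv4 address (4) | Local Discriminator (4) | 0x00
    return b"\x04" + ipaddress.IPv4Address(ipv4).packed + struct.pack("!I", local_disc) + b"\x00"


def esi_type5_as(asn: int, local_disc: int) -> bytes:
    # T=0x05 | AS number (4; a two-octet ASN gets 0x0000 high octets) | LD (4) | 0x00
    return b"\x05" + struct.pack("!II", asn, local_disc) + b"\x00"


esi = esi_type1_lacp("00:11:22:33:44:55", 0x0102)
print(esi.hex(), split_esi(esi)[0], is_reserved(esi))
# 01001122334455010200 1 False
```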

   The following two ESI values are reserved:

      - ESI 0 denotes a single-homed CE.

      - ESI {0xFF} (repeated 10 times) is known as MAX-ESI and is
        reserved.

   In general, an Ethernet segment MUST have a non-reserved ESI that is
   unique network wide (i.e., across all EVPN instances on all the PEs).
   If the CE(s) constituting an Ethernet Segment is (are) managed by the
   network operator, then ESI uniqueness should be guaranteed; however,
   if the CE(s) is (are) not managed, then the operator MUST configure a
   network-wide unique ESI for that Ethernet Segment. This is required
   to enable auto-discovery of Ethernet Segments and DF election.

   In a network with managed and non-managed CEs, the ESI has the
   following format:

         +---+---+---+---+---+---+---+---+---+---+
         | T |          ESI Value                |
         +---+---+---+---+---+---+---+---+---+---+

   Where:

   T (ESI Type) is a 1-octet field (most significant octet) that
   specifies the format of the remaining nine octets (ESI Value). The
   following six ESI types can be used:

      - Type 0 (T=0x00) - This type indicates an arbitrary nine-octet ESI
        value, which is managed and configured by the operator.

      - Type 1 (T=0x01) - When IEEE 802.1AX LACP is used between the PEs
        and CEs, this ESI type indicates an auto-generated ESI value
        determined from LACP by concatenating the following parameters:

        + CE LACP System MAC address (six octets). The CE LACP System
          MAC address MUST be encoded in the high order six octets of
          the ESI Value field.

        + CE LACP Port Key (two octets). The CE LACP port key MUST be
          encoded in the two octets next to the System MAC address.

        + The remaining octet will be set to 0x00.

        As far as the CE is concerned, it would treat the multiple PEs
        that it is connected to as the same switch. This allows the CE
        to aggregate links that are attached to different PEs in the
        same bundle.

        This mechanism could be used only if it produces ESIs that
        satisfy the uniqueness requirement specified above.

      - Type 2 (T=0x02) - This type is used in the case of indirectly
        connected hosts via a bridged LAN between the CEs and the PEs.
        The ESI Value is auto-generated and determined based on the
        Layer 2 bridge protocol as follows: If MST is used in the bridged
        LAN, then the value of the ESI is derived by listening to BPDUs
        on the Ethernet segment. To achieve this, the PE is not required
        to run MST. However, the PE must learn the Root Bridge MAC
        address and Bridge Priority of the root of the Internal Spanning
        Tree (IST) by listening to the BPDUs. The ESI Value is
        constructed as follows:

        + Root Bridge MAC address (six octets). The Root Bridge MAC
          address MUST be encoded in the high order six octets of the
          ESI Value field.

        + Root Bridge Priority (two octets). The Root Bridge Priority
          MUST be encoded in the two octets next to the Root Bridge MAC
          address.

        + The remaining octet will be set to 0x00.

        This mechanism could be used only if it produces ESIs that
        satisfy the uniqueness requirement specified above.

      - Type 3 (T=0x03) - This type indicates a MAC-based ESI Value that
        can be auto-generated or configured by the operator. The ESI
        Value is constructed as follows:

        + System MAC address (six octets). The System MAC address MUST
          be encoded in the high order six octets of the ESI Value
          field.

        + Local Discriminator value (three octets). The Local
          Discriminator MUST be encoded in the low order three octets
          of the ESI Value.

        This mechanism could be used only if it produces ESIs that
        satisfy the uniqueness requirement specified above.

      - Type 4 (T=0x04) - This type indicates an IP-based ESI Value that
        can be auto-generated or configured by the operator. The ESI
        Value is constructed as follows:

        + IP address (four octets). This is an IPv4 address owned by
          the system and MUST be encoded in the high order four octets
          of the ESI Value field.

        + Local Discriminator value (four octets). The Local
          Discriminator MUST be encoded in the four octets next to the
          IP address.

        + The low order octet of the ESI Value will be set to 0x00.

        This mechanism could be used only if it produces ESIs that
        satisfy the uniqueness requirement specified above.

      - Type 5 (T=0x05) - This type indicates an AS-based ESI Value that
        can be auto-generated or configured by the operator. The ESI
        Value is constructed as follows:

        + AS number (four octets). This is an AS number owned by the
          system and MUST be encoded in the high order four octets of
          the ESI Value field. If a two-octet AS number is used, the
          extra high order two octets will be 0x0000.

        + Local Discriminator value (four octets). The Local
          Discriminator MUST be encoded in the four octets next to the
          AS number.

        + The low order octet of the ESI Value will be set to 0x00.

        This mechanism could be used only if it produces ESIs that
        satisfy the uniqueness requirement specified above.

6. Ethernet Tag

   An Ethernet Tag identifies a particular broadcast domain, e.g., a
   VLAN, in an EVPN Instance. An EVPN Instance consists of one or more
   broadcast domains (one or more VLANs). VLANs are assigned to a given
   EVPN Instance by the provider of the EVPN service. A given VLAN can
   itself be represented by multiple VLAN IDs (VIDs). In such cases, the
   PEs participating in that VLAN for a given EVPN instance are
   responsible for performing VLAN ID translation to/from locally
   attached CE devices.

   If a VLAN is represented by a single VID across all PE devices
   participating in that VLAN for that EVPN instance, then there is no
   need for VID translation at the PEs.
   Furthermore, some deployment scenarios guarantee uniqueness of VIDs
   across all EVPN instances: all points of attachment for a given EVPN
   instance use the same VID, and no other EVPN instance uses that VID.
   This allows the RT(s) for each EVPN instance to be derived
   automatically from the corresponding VID, as described in section
   8.4.1.1.1 "Auto-Derivation from the Ethernet Tag ID".

   The following subsections discuss the relationship between broadcast
   domains (e.g., VLANs), Ethernet Tags (e.g., VIDs), and MAC-VRFs, as
   well as the setting of the Ethernet Tag Identifier in the various
   EVPN BGP routes (defined in section 7), for the different types of
   service interfaces described in [EVPN-REQ].

   The following Ethernet Tag value is reserved:

      - Ethernet Tag {0xFFFFFFFF} is known as MAX-ET.

6.1 VLAN Based Service Interface

   With this service interface, an EVPN instance consists of only a
   single broadcast domain (e.g., a single VLAN). Therefore, there is a
   one-to-one mapping between a VID on this interface and a MAC-VRF.
   Since a MAC-VRF corresponds to a single VLAN, it consists of a single
   bridge domain corresponding to that VLAN. If the VLAN is represented
   by different VIDs on different PEs, then each PE needs to perform VID
   translation for frames destined to its attached CEs. In such
   scenarios, the Ethernet frames transported over the MPLS/IP network
   SHOULD remain tagged with the originating VID, and a VID translation
   MUST be supported in the data path and MUST be performed on the
   disposition PE. The Ethernet Tag Identifier in all EVPN routes MUST
   be set to 0.

6.2 VLAN Bundle Service Interface

   With this service interface, an EVPN instance corresponds to several
   broadcast domains (e.g., several VLANs); however, only a single
   bridge domain is maintained per MAC-VRF, which means multiple VLANs
   share the same bridge domain.
   This implies that MAC addresses MUST be
   unique across different VLANs for this service to work. In other
   words, there is a many-to-one mapping between VLANs and a MAC-VRF,
   and the MAC-VRF consists of a single bridge domain. Furthermore, a
   single VLAN must be represented by a single VID, i.e., no VID
   translation is allowed for this service interface type. The MPLS
   encapsulated frames MUST remain tagged with the originating VID. Tag
   translation is NOT permitted. The Ethernet Tag Identifier in all EVPN
   routes MUST be set to 0.

6.2.1 Port Based Service Interface

   This service interface is a special case of the VLAN Bundle service
   interface, where all of the VLANs on the port are part of the same
   service and map to the same bundle. The procedures are identical to
   those described in section 6.2.

6.3 VLAN Aware Bundle Service Interface

   With this service interface, an EVPN instance consists of several
   broadcast domains (e.g., several VLANs), with each VLAN having its
   own bridge domain, i.e., multiple bridge domains (one per VLAN) are
   maintained by a single MAC-VRF corresponding to the EVPN instance. In
   the case where a single VLAN is represented by different VIDs on
   different CEs, and thus tag (VID) translation is required, a
   normalized Ethernet Tag (VID) MUST be carried in the MPLS
   encapsulated frames and a tag translation function MUST be supported
   in the data path. This translation MUST be performed in the data path
   on both the imposition and the disposition PEs (translating to the
   normalized tag on the imposition PE and translating to the local tag
   on the disposition PE). The Ethernet Tag Identifier in all EVPN
   routes MUST be set to the normalized Ethernet Tag assigned by the
   EVPN provider.
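   The Ethernet Tag Identifier rules of sections 6.1 through 6.3, and
   the two-step translation required for the VLAN Aware Bundle service
   interface, can be summarized in a short Python sketch. The enum, the
   function ethernet_tag_id, and the class VlanAwareTranslator are
   hypothetical names; the sketch assumes each PE holds a one-to-one map
   between its local VIDs and the provider-assigned normalized tags, and
   it models tag values only, not frame rewriting.

```python
from enum import Enum
from typing import Dict


class ServiceInterface(Enum):
    VLAN_BASED = "vlan-based"         # section 6.1
    VLAN_BUNDLE = "vlan-bundle"       # section 6.2 (and 6.2.1, port based)
    VLAN_AWARE_BUNDLE = "vlan-aware"  # section 6.3 (and 6.3.1, port based)


MAX_ET = 0xFFFFFFFF  # reserved Ethernet Tag value


def ethernet_tag_id(service: ServiceInterface, normalized_tag: int = 0) -> int:
    """Ethernet Tag ID to carry in EVPN routes for a given service interface."""
    if service is ServiceInterface.VLAN_AWARE_BUNDLE:
        if normalized_tag == MAX_ET:
            raise ValueError("MAX-ET is reserved")
        return normalized_tag
    # VLAN Based and VLAN Bundle: the Ethernet Tag ID MUST be 0.
    return 0


class VlanAwareTranslator:
    """Per-PE map between local VIDs and normalized Ethernet Tags (assumed 1:1)."""

    def __init__(self, local_to_normalized: Dict[int, int]) -> None:
        self.to_normalized = dict(local_to_normalized)
        self.to_local = {norm: local for local, norm in local_to_normalized.items()}

    def impose(self, local_vid: int) -> int:
        # Imposition PE: local VID -> normalized tag carried over MPLS.
        return self.to_normalized[local_vid]

    def dispose(self, normalized_tag: int) -> int:
        # Disposition PE: normalized tag -> VID expected by the local CE.
        return self.to_local[normalized_tag]


pe1 = VlanAwareTranslator({10: 100})  # CE on PE1 uses VID 10
pe2 = VlanAwareTranslator({20: 100})  # CE on PE2 uses VID 20
print(ethernet_tag_id(ServiceInterface.VLAN_AWARE_BUNDLE, 100),
      pe2.dispose(pe1.impose(10)))
# 100 20
```

   For the VLAN Based interface the sketch has no translator because,
   per section 6.1, frames keep the originating VID across the core and
   only the disposition PE rewrites it.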

6.3.1 Port Based VLAN Aware Service Interface

   This service interface is a special case of the VLAN Aware Bundle
   service interface, where all of the VLANs on the port are part of the
   same service and map to the same bundle. The procedures are identical
   to those described in section 6.3.

7. BGP EVPN NLRI

   This document defines a new BGP NLRI, called the EVPN NLRI.

   Following is the format of the EVPN NLRI:

                 +-----------------------------------+
                 |    Route Type (1 octet)           |
                 +-----------------------------------+
                 |     Length (1 octet)              |
                 +-----------------------------------+
                 | Route Type specific (variable)    |
                 +-----------------------------------+

   The Route Type field defines the encoding of the rest of the EVPN
   NLRI (Route Type specific EVPN NLRI).

   The Length field indicates the length in octets of the Route Type
   specific field of the EVPN NLRI.

   This document defines the following Route Types:

      + 1 - Ethernet Auto-Discovery (A-D) route
      + 2 - MAC/IP Advertisement route
      + 3 - Inclusive Multicast Ethernet Tag route
      + 4 - Ethernet Segment route

   The detailed encoding and procedures for these route types are
   described in subsequent sections.

   The EVPN NLRI is carried in BGP [RFC4271] using BGP Multiprotocol
   Extensions [RFC4760] with an AFI of 25 (L2VPN) and a SAFI of 70
   (EVPN). The NLRI field in the MP_REACH_NLRI/MP_UNREACH_NLRI attribute
   contains the EVPN NLRI (encoded as specified above).

   In order for two BGP speakers to exchange labeled EVPN NLRI, they
   must use BGP Capabilities Advertisement to ensure that they both are
   capable of properly processing such NLRI. This is done as specified
   in [RFC4760], by using capability code 1 (multiprotocol BGP) with an
   AFI of 25 (L2VPN) and a SAFI of 70 (EVPN).

7.1. Ethernet Auto-Discovery Route

   An Ethernet A-D route type specific EVPN NLRI consists of the
   following:

                +---------------------------------------+
                |      RD   (8 octets)                  |
                +---------------------------------------+
                |Ethernet Segment Identifier (10 octets)|
                +---------------------------------------+
                |  Ethernet Tag ID (4 octets)           |
                +---------------------------------------+
                |  MPLS Label (3 octets)                |
                +---------------------------------------+

   For the purpose of BGP route key processing, only the Ethernet
   Segment ID and the Ethernet Tag ID are considered to be part of the
   prefix in the NLRI. The MPLS Label field is to be treated as a route
   attribute as opposed to being part of the route.

   For procedures and usage of this route, please see section 8.2 "Fast
   Convergence" and section 8.4 "Aliasing and Backup-Path".

7.2. MAC/IP Advertisement Route

   A MAC/IP Advertisement route type specific EVPN NLRI consists of the
   following:

                +---------------------------------------+
                |      RD   (8 octets)                  |
                +---------------------------------------+
                |Ethernet Segment Identifier (10 octets)|
                +---------------------------------------+
                |  Ethernet Tag ID (4 octets)           |
                +---------------------------------------+
                |  MAC Address Length (1 octet)         |
                +---------------------------------------+
                |  MAC Address (6 octets)               |
                +---------------------------------------+
                |  IP Address Length (1 octet)          |
                +---------------------------------------+
                |  IP Address (0 or 4 or 16 octets)     |
                +---------------------------------------+
                |  MPLS Label1 (3 octets)               |
                +---------------------------------------+
                |  MPLS Label2 (0 or 3 octets)          |
                +---------------------------------------+

   For the purpose of BGP route key processing, only the Ethernet Tag
   ID, MAC Address Length, MAC Address, IP Address Length, and IP
   Address fields are considered to be part of the prefix in the NLRI.
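   The route key rule just stated can be illustrated with a Python
   sketch that encodes a MAC/IP Advertisement route inside the generic
   Route Type/Length header of section 7 and then extracts only the key
   fields. Two encoding details are assumptions not spelled out in this
   section: the MAC and IP Address Length fields are written in bits
   (48, 32, or 128), by analogy with the note on the Ethernet Segment
   route, and each 20-bit MPLS label occupies the high-order bits of its
   3-octet field. The function names are hypothetical.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional

ROUTE_TYPE_MAC_IP = 2  # Route Type 2, section 7


def pack_label(label: int) -> bytes:
    # Assumption: 20-bit label in the high-order bits of the 3-octet field.
    if not 0 <= label < (1 << 20):
        raise ValueError("an MPLS label is 20 bits")
    return (label << 4).to_bytes(3, "big")


def encode_mac_ip_route(rd: bytes, esi: bytes, eth_tag: int, mac: bytes,
                        ip: Optional[str], label1: int,
                        label2: Optional[int] = None) -> bytes:
    if (len(rd), len(esi), len(mac)) != (8, 10, 6):
        raise ValueError("RD, ESI and MAC must be 8, 10 and 6 octets")
    ip_bytes = ipaddress.ip_address(ip).packed if ip else b""
    body = b"".join([
        rd, esi, struct.pack("!I", eth_tag),
        bytes([48]), mac,                      # MAC Address Length in bits (assumed)
        bytes([len(ip_bytes) * 8]), ip_bytes,  # IP Address Length in bits (assumed)
        pack_label(label1),
        pack_label(label2) if label2 is not None else b"",
    ])
    # Generic EVPN NLRI header: Route Type (1 octet) and Length (1 octet).
    return bytes([ROUTE_TYPE_MAC_IP, len(body)]) + body


class MacIpKey(NamedTuple):
    eth_tag: int
    mac_len: int
    mac: bytes
    ip_len: int
    ip: bytes


def mac_ip_route_key(nlri: bytes) -> MacIpKey:
    """Return only the fields listed in the text as the route key."""
    if nlri[0] != ROUTE_TYPE_MAC_IP or len(nlri) != 2 + nlri[1]:
        raise ValueError("not a well-formed MAC/IP Advertisement route")
    body = nlri[2:]
    off = 8 + 10  # skip RD and ESI, which are not among the listed key fields
    (eth_tag,) = struct.unpack_from("!I", body, off)
    off += 4
    mac_len = body[off]
    mac = body[off + 1:off + 7]
    ip_len = body[off + 7]
    ip = body[off + 8:off + 8 + ip_len // 8]
    return MacIpKey(eth_tag, mac_len, mac, ip_len, ip)


rd = bytes(8)
mac = bytes.fromhex("001122334455")
a = encode_mac_ip_route(rd, b"\x01" + bytes(9), 100, mac, "192.0.2.10", 16)
b = encode_mac_ip_route(rd, b"\x03" + bytes(9), 100, mac, "192.0.2.10", 17, 18)
print(a != b, mac_ip_route_key(a) == mac_ip_route_key(b))
# True True
```

   The two advertisements differ in ESI and labels yet share a key, so
   BGP treats the second as a replacement of the first rather than as a
   separate route, which is the intended effect of treating the ESI and
   labels as attributes.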

   The Ethernet Segment Identifier and MPLS Label fields are to be
   treated as route attributes as opposed to being part of the "route".

   For procedures and usage of this route, please see section 9
   "Determining Reachability to Unicast MAC Addresses" and section 14
   "Load Balancing of Unicast Frames".

7.3. Inclusive Multicast Ethernet Tag Route

   An Inclusive Multicast Ethernet Tag route type specific EVPN NLRI
   consists of the following:

                +---------------------------------------+
                |      RD   (8 octets)                  |
                +---------------------------------------+
                |  Ethernet Tag ID (4 octets)           |
                +---------------------------------------+
                |  IP Address Length (1 octet)          |
                +---------------------------------------+
                |  Originating Router's IP Addr         |
                |          (4 or 16 octets)             |
                +---------------------------------------+

   For procedures and usage of this route, please see section 11
   "Handling of Multi-Destination Traffic", section 12 "Processing of
   Unknown Unicast Packets", and section 16 "Multicast & Broadcast".

7.4 Ethernet Segment Route

   The Ethernet Segment route is encoded in the EVPN NLRI using the
   Route Type value of 4. The Route Type specific field of the NLRI is
   formatted as follows:

                +---------------------------------------+
                |      RD   (8 octets)                  |
                +---------------------------------------+
                |Ethernet Segment Identifier (10 octets)|
                +---------------------------------------+
                |  IP Address Length (1 octet)          |
                +---------------------------------------+
                |  Originating Router's IP Addr         |
                |          (4 or 16 octets)             |
                +---------------------------------------+

   For procedures and usage of this route, please see section 8.5
   "Designated Forwarder Election". The IP Address Length is in bits.

7.5 ESI Label Extended Community

   This extended community is a new transitive extended community with
   the Type field of 0x06 and the Sub-Type of 0x01.
   It may be advertised along with Ethernet Auto-Discovery routes, and
   it enables split-horizon procedures for multi-homed sites as
   described in section 8.3 "Split Horizon".

   Each ESI Label Extended Community is encoded as an 8-octet value as
   follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x01 | Flags(1 octet)|  Reserved=0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Reserved=0   |          ESI Label                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The low order bit of the flags octet is defined as the
   "Single-Active" bit. A value of 0 means that the multi-homed site is
   operating in All-Active redundancy mode, and a value of 1 means that
   the multi-homed site is operating in Single-Active redundancy mode.

   The second low order bit of the flags octet is defined as the
   "Root-Leaf" bit. A value of 0 means that this label is associated
   with a Root site, whereas a value of 1 means that this label is
   associated with a Leaf site. The other bits must be set to 0.

7.6 ES-Import Route Target

   This is a new transitive Route Target extended community carried with
   the Ethernet Segment route. When used, it enables all the PEs
   connected to the same multi-homed site to import the Ethernet Segment
   routes. The value is derived automatically from the ESI by encoding
   the high order 6-octet portion of the 9-octet ESI Value in the
   ES-Import Route Target.
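   The derivation just described can be sketched in Python: the six
   high-order octets of the nine-octet ESI Value are placed after the
   Type (0x06) and Sub-Type (0x02) octets of the format that follows,
   and a PE builds its import filter from the ESIs of its locally
   attached segments. The function names are hypothetical, and the
   sketch operates on raw 8-octet extended community values rather than
   on any BGP implementation's API.

```python
from typing import Iterable, Set

ES_IMPORT_TYPE = 0x06
ES_IMPORT_SUBTYPE = 0x02


def es_import_rt(esi: bytes) -> bytes:
    """Derive the 8-octet ES-Import Route Target from a ten-octet ESI."""
    if len(esi) != 10:
        raise ValueError("an ESI is exactly ten octets")
    esi_value = esi[1:]  # drop the ESI Type octet
    return bytes([ES_IMPORT_TYPE, ES_IMPORT_SUBTYPE]) + esi_value[:6]


def build_import_filter(local_esis: Iterable[bytes]) -> Set[bytes]:
    # One ES-Import RT per locally attached Ethernet segment.
    return {es_import_rt(esi) for esi in local_esis}


def should_import_es_route(route_communities: Iterable[bytes],
                           import_filter: Set[bytes]) -> bool:
    # Import an Ethernet Segment route only if it carries a matching ES-Import RT.
    return any(ec in import_filter for ec in route_communities)


esi = bytes.fromhex("01001122334455010200")  # Type 1: CE MAC + port key 0x0102
print(es_import_rt(esi).hex())
# 0602001122334455
```

   Because only six of the nine ESI Value octets are used, distinct ESIs
   that share those octets (for example, two Type 1 ESIs built from the
   same CE LACP System MAC with different port keys) map to the same
   RT. In that case the filter admits routes for a segment the PE is not
   attached to, so a receiving PE would still need to compare the ESI
   carried in the route itself.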
The format of this extended community is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x02 |           ES-Import           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       ES-Import Cont'd                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This document expands the definition of the Route Target extended community to allow the value of the high-order octet (Type field) to be 0x06 (in addition to the values specified in [RFC4360]). The value of the low-order octet (Sub-Type field) of 0x02 indicates that this extended community is of type "Route Target". The new Type field value of 0x06 indicates that the structure of this RT is a six-octet value (e.g., a MAC address). A BGP speaker that implements RT-Constrain [RFC4684] MUST apply the RT-Constrain procedures to the ES-Import RT as well.

For procedures and usage of this attribute, see section 8.1 "Multi-homed Ethernet Segment Auto-Discovery".

7.7. MAC Mobility Extended Community

This extended community is a new transitive extended community with a Type field of 0x06 and a Sub-Type of 0x00. It may be advertised along with MAC Advertisement routes. The procedures for using this Extended Community are described in section 16 "MAC Mobility".

The MAC Mobility Extended Community is encoded as an 8-octet value as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x00 | Flags(1 octet)|  Reserved=0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The low-order bit of the Flags octet is defined as the "Sticky/static" flag and may be set to 1.
A value of 1 means that the MAC address is static and cannot move.

7.8. Default Gateway Extended Community

The Default Gateway community is an Extended Community of an Opaque Type (see Section 3.3 of [RFC4360]). It is a transitive community, which means that the first octet is 0x03. The value of the second octet (Sub-Type) is 0x0d (Default Gateway), as defined by IANA, so the combined Type and Sub-Type value is 0x030d. The Value field of this community is reserved (set to 0 by the senders, ignored by the receivers).

8. Multi-homing Functions

This section discusses the functions, procedures, and associated BGP routes used to support multi-homing in EVPN. This covers both multi-homed device (MHD) and multi-homed network (MHN) scenarios.

8.1. Multi-homed Ethernet Segment Auto-Discovery

PEs connected to the same Ethernet segment can automatically discover each other with minimal to no configuration through the exchange of the Ethernet Segment route.

8.1.1. Constructing the Ethernet Segment Route

The Route Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by 0's.

The Ethernet Segment Identifier MUST be set to the ten-octet ESI described in section 5.

The BGP advertisement that advertises the Ethernet Segment route MUST also carry an ES-Import route target, as defined in section 7.6.

Ethernet Segment route filtering MUST be done such that the Ethernet Segment route is imported only by the PEs that are multi-homed to the same Ethernet segment. To that end, each PE that is connected to a particular Ethernet segment constructs an import filtering rule to import a route that carries the ES-Import extended community constructed from the ESI.

8.2. Fast Convergence

In EVPN, MAC address reachability is learnt via the BGP control plane over the MPLS network.
As such, in the absence of any fast protection mechanism, the network convergence time is a function of the number of MAC Advertisement routes that must be withdrawn by the PE encountering a failure. For highly scaled environments, this scheme yields slow convergence.

To alleviate this, EVPN defines a mechanism to efficiently and quickly signal to remote PE nodes the need to update their forwarding tables upon the occurrence of a failure in connectivity to an Ethernet segment. This is done by having each PE advertise a set of Ethernet A-D per Ethernet segment (per ES) routes for each locally attached Ethernet segment (refer to section 8.2.1 below for details on how this route is constructed). Upon a failure in connectivity to the attached segment, the PE withdraws the corresponding Ethernet A-D per ES route(s). This triggers all PEs that receive the withdrawal to update their next-hop adjacencies for all MAC addresses associated with the Ethernet segment in question. If no other PE had advertised an Ethernet A-D route for the same segment, then the PE that received the withdrawal simply invalidates the MAC entries for that segment. Otherwise, the PE updates the next-hop adjacencies to point to the backup PE(s).

8.2.1. Constructing the Ethernet A-D per Ethernet Segment (ES) Route

This section describes the procedures used to construct the Ethernet A-D per ES route, which is used for fast convergence (as discussed above) and for advertising the ESI label used for split-horizon filtering (as discussed in section 8.3). Support of this route is MANDATORY.

The Route Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by a number unique to the PE.

The Ethernet Segment Identifier MUST be a ten-octet entity as described in section 5.
This document does not specify the use of the Ethernet A-D route when the Segment Identifier is set to 0.

The Ethernet Tag ID MUST be set to MAX-ET.

The MPLS label in the NLRI MUST be set to 0.

The "ESI Label Extended Community" MUST be included in the route. If All-Active redundancy mode is desired, then the "Single-Active" bit in the flags of the ESI Label Extended Community MUST be set to 0, and the MPLS label in that extended community MUST be set to a valid MPLS label value. The MPLS label in this extended community is referred to as the ESI label and MUST have the same value in each Ethernet A-D per ES route advertised for the ES. This label MUST be a downstream-assigned MPLS label if the advertising PE is using ingress replication for receiving multicast, broadcast, or unknown unicast traffic from other PEs. If the advertising PE is using P2MP MPLS LSPs for sending multicast, broadcast, or unknown unicast traffic, then this label MUST be an upstream-assigned MPLS label. The usage of this label is described in section 8.3.

If Single-Active redundancy mode is desired, then the "Single-Active" bit in the flags of the ESI Label Extended Community MUST be set to 1 and the ESI label MUST be set to zero.

8.2.1.1. Ethernet A-D Route Targets

Each Ethernet A-D per ES route MUST carry one or more Route Target (RT) attributes. The set of Ethernet A-D per ES routes MUST carry the entire set of RTs for all the EVPN instances to which the Ethernet segment belongs.

8.3. Split Horizon

Consider a CE that is multi-homed to two or more PEs on an Ethernet segment ES1 operating in All-Active redundancy mode.
If the CE sends a broadcast, unknown unicast, or multicast (BUM) packet to one of the non-DF (Designated Forwarder) PEs, say PE1, then PE1 will forward that packet to all or a subset of the other PEs in that EVPN instance, including the DF PE for that Ethernet segment. In this case, the DF PE that the CE is multi-homed to MUST drop the packet and not forward it back to the CE. This filtering is referred to as "split horizon" filtering in this document.

When a set of PEs is operating in Single-Active redundancy mode, the use of this split-horizon filtering mechanism is highly recommended because it prevents transient loops at the time of a failure or recovery impacting the Ethernet segment, e.g., when two PEs both consider themselves the DF for that segment before the DF election procedure settles down.

In order to achieve this split-horizon function, every BUM packet originating from a non-DF PE is encapsulated with an MPLS label that identifies the Ethernet segment of origin (i.e., the segment from which the frame entered the EVPN network). This label is referred to as the ESI label and MUST be distributed by all PEs operating in All-Active redundancy mode using a set of Ethernet A-D per ES routes, per section 8.2.1 above. The ESI label SHOULD be distributed by all PEs operating in Single-Active redundancy mode using a set of Ethernet A-D per ES routes. This route is imported by the PEs connected to the Ethernet segment and also by the PEs that have at least one EVPN instance in common with the Ethernet segment in the route. As described in section 8.2.1, the route MUST carry an ESI Label Extended Community with a valid ESI label. The disposition PE relies on the value of the ESI label to determine whether or not a BUM frame is allowed to egress a specific Ethernet segment.
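A minimal sketch of the disposition-side check described above, assuming the egress PE has already removed the transport and EVPN labels and is left with an optional ESI label. All names are hypothetical: `esi_label_to_segment` stands for the ESI labels this PE itself assigned for its local segments, and `local_segments` for the local segments on which this PE is the DF for the frame's EVPN instance. Dropping a frame whose ESI label this PE did not assign follows the ingress-replication example in section 8.3.1.1; the context-label-space handling used with P2MP LSPs (section 8.3.1.2) is not modeled.

```python
from typing import Dict, List, Optional


def split_horizon_targets(local_segments: List[str],
                          esi_label_to_segment: Dict[int, str],
                          esi_label: Optional[int]) -> List[str]:
    """Local Ethernet segments onto which a received BUM frame may egress."""
    if esi_label is None:
        # No ESI label: the frame carries no segment of origin to filter on.
        return list(local_segments)
    origin = esi_label_to_segment.get(esi_label)
    if origin is None:
        # ESI label not assigned by this PE: drop (ingress-replication rule).
        return []
    # Split horizon: never send the frame back onto its segment of origin.
    return [es for es in local_segments if es != origin]


print(split_horizon_targets(["ES1", "ES2"], {3001: "ES1"}, 3001))  # ['ES2']
```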
8.3.1. ESI Label Assignment

The following subsections describe the assignment procedures for the ESI label, which differ depending on the type of tunnels being used to deliver multi-destination packets in the EVPN network.

8.3.1.1. Ingress Replication

Each PE attached to a given ES that is operating in All-Active or Single-Active redundancy mode and that uses ingress replication to receive BUM traffic advertises a downstream-assigned ESI label in the set of Ethernet A-D per ES routes for that ES. This label MUST be programmed in the platform label space by the advertising PE, and the forwarding entry for this label must result in NOT forwarding packets received with this label onto the Ethernet segment for which the label was distributed.

The rules for the inclusion of the ESI label in a BUM packet by the ingress PE operating in All-Active redundancy mode are as follows:

- A non-DF ingress PE MUST include the ESI label distributed by the DF egress PE in the copy of a BUM packet sent to it.

- An ingress PE (DF or non-DF) SHOULD include the ESI label distributed by each non-DF egress PE in the copy of a BUM packet sent to it.

The rule for the inclusion of the ESI label in a BUM packet by the ingress PE operating in Single-Active redundancy mode is as follows:

- An ingress DF PE SHOULD include the ESI label distributed by the egress PE in the copy of a BUM packet sent to it.

In both All-Active and Single-Active redundancy modes, an ingress PE MUST NOT include an ESI label in the copy of a BUM packet sent to an egress PE that is not attached to the ES through which the BUM packet entered the EVI.

As an example, consider PE1 and PE2 that are multi-homed to CE1 on ES1 and operating in All-Active multi-homing mode. Further consider that PE1 is using P2P or MP2P LSPs to send packets to PE2.
Consider that PE1 is the non-DF for VLAN1 and PE2 is the DF for VLAN1, and PE1 receives a BUM packet from CE1 on VLAN1 on ES1. In this scenario, PE2 distributes an Inclusive Multicast Ethernet Tag route for VLAN1 corresponding to an EVPN instance. So, when PE1 sends a BUM packet that it receives from CE1, it MUST first push onto the MPLS label stack the ESI label that PE2 has distributed for ES1. It MUST then push on the MPLS label distributed by PE2 in the Inclusive Multicast Ethernet Tag route for VLAN1. The resulting packet is further encapsulated in the P2P or MP2P LSP label stack required to transmit the packet to PE2. When PE2 receives this packet, it determines the set of ESIs to replicate the packet to from the top MPLS label, after any P2P or MP2P LSP labels have been removed. If the next label is the ESI label assigned by PE2 for ES1, then PE2 MUST NOT forward the packet onto ES1. If the next label is an ESI label that has not been assigned by PE2, then PE2 MUST drop the packet. It should be noted that in this scenario, if PE2 receives BUM traffic for VLAN1 from CE1, then it should encapsulate the packet with an ESI label received from PE1 when sending it to PE1, in order to avoid any transient loop during a failure scenario impacting ES1 (e.g., port or link failure).

8.3.1.2. P2MP MPLS LSPs

The non-DF PEs attached to a given ES that is operating in All-Active redundancy mode and that use P2MP LSPs to send BUM traffic advertise an upstream-assigned ESI label in the set of Ethernet A-D per ES routes for that ES. This label is upstream assigned by the PE that advertises the route. This label MUST be programmed by the other PEs that are connected to the ESI advertised in the route, in the context label space for the advertising PE.
Further, the forwarding entry for this label must result in NOT forwarding packets received with this label onto the Ethernet segment that the label was distributed for. This label MUST also be programmed by the other PEs that import the route but are not connected to the ESI advertised in the route, in the context label space for the advertising PE. Further, the forwarding entry for this label must be a POP with no other associated action.

The DF PE attached to a given ES that is operating in Single-Active redundancy mode and that uses P2MP LSPs to send BUM traffic should advertise an upstream-assigned ESI label in the set of Ethernet A-D per ES routes for that ES, just as described in the previous paragraph.

As an example, consider PE1 and PE2 that are multi-homed to CE1 on ES1 and operating in All-Active multi-homing mode. Also consider that PE3 belongs to one of the EVPN instances of ES1. Further, assume that PE1, which is the non-DF, uses P2MP MPLS LSPs to send BUM packets. When PE1 sends a BUM packet that it receives from CE1, it MUST first push onto the MPLS label stack the ESI label that it has assigned for the ESI that the packet was received on. The resulting packet is further encapsulated in the P2MP MPLS label stack necessary to transmit the packet to the other PEs. Penultimate hop popping MUST be disabled on the P2MP LSPs used in the MPLS transport infrastructure for EVPN. When PE2 receives this packet, it decapsulates the top MPLS label and forwards the packet using the context label space determined by the top label. If the next label is the ESI label assigned by PE1 to ES1, then PE2 MUST NOT forward the packet onto ES1. When PE3 receives this packet, it decapsulates the top MPLS label and forwards the packet using the context label space determined by the top label.
If the next label is the ESI label assigned by PE1 to ES1 and PE3 is not connected to ES1, then PE3 MUST pop the label and flood the packet over all local ESIs in that EVPN instance. It should be noted that when PE2 sends a BUM frame over a P2MP LSP, it should encapsulate the frame with an ESI label even though it is the DF for that VLAN, in order to avoid any transient loop during a failure scenario impacting ES1 (e.g., port or link failure).

8.4. Aliasing and Backup-Path

In the case where a CE is multi-homed to multiple PE nodes using a LAG with All-Active redundancy, it is possible that only a single PE learns a set of the MAC addresses associated with traffic transmitted by the CE. This leads to a situation where remote PE nodes receive MAC Advertisement routes for these addresses from a single PE, even though multiple PEs are connected to the multi-homed segment. As a result, the remote PEs are not able to effectively load-balance traffic among the PE nodes connected to the multi-homed Ethernet segment. This could be the case, e.g., when the PEs perform data-path learning on the access, and the load-balancing function on the CE hashes traffic from a given source MAC address to a single PE. Another scenario where this occurs is when the PEs rely on control-plane learning on the access (e.g., using ARP), since ARP traffic will be hashed to a single link in the LAG.

To address this issue, EVPN introduces the concept of 'Aliasing', which is the ability of a PE to signal that it has reachability to an EVPN instance on a given ES even when it has learnt no MAC addresses from that EVI/ES. The Ethernet A-D per EVI route is used for this purpose.
A remote PE that receives a MAC Advertisement route with a non-reserved ESI SHOULD consider the advertised MAC address to be reachable via all PEs that have advertised reachability to that MAC address's EVI/ES via the combination of an Ethernet A-D per EVI route for that EVI/ES (and Ethernet Tag, if applicable) AND Ethernet A-D per ES routes for that ES with the 'Single-Active' bit in the flags of the ESI Label Extended Community set to 0.

Note that the Ethernet A-D per EVI route may be received by a remote PE before it receives the set of Ethernet A-D per ES routes. Therefore, in order to handle corner cases and race conditions, the Ethernet A-D per EVI route MUST NOT be used for traffic forwarding by a remote PE until it also receives the associated set of Ethernet A-D per ES routes.

Backup-path is a closely related function, but it is used in Single-Active redundancy mode. In this case, a PE also advertises that it has reachability to a given EVI/ES using the same combination of Ethernet A-D per EVI route and Ethernet A-D per ES route as above, but with the 'Single-Active' bit in the flags of the ESI Label Extended Community set to 1. A remote PE that receives a MAC Advertisement route with a non-reserved ESI SHOULD consider the advertised MAC address to be reachable via any PE that has advertised this combination of Ethernet A-D routes, and it SHOULD install a backup path for that MAC address.

8.4.1. Constructing the Ethernet A-D per EVPN Instance (EVI) Route

This section describes the procedures used to construct the Ethernet A-D per EVPN Instance (EVI) route, which is used for aliasing (as discussed above). Support of this route is OPTIONAL.

The Route Distinguisher (RD) MUST be set to the RD of the EVI that is advertising the NLRI. An RD MUST be assigned for a given EVI on a PE. This RD MUST be unique across all EVIs on a PE.
It is RECOMMENDED to use the Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by a number unique to the PE. This number may be generated by the PE, or, in the Unique VLAN EVPN case, the low-order 12 bits may be the 12-bit VLAN ID, with the remaining high-order 4 bits set to 0.

The Ethernet Segment Identifier MUST be a ten-octet entity as described in section 5. This document does not specify the use of the Ethernet A-D route when the Segment Identifier is set to 0.

The Ethernet Tag ID is the identifier of an Ethernet Tag on the Ethernet segment. This value may be a 12-bit VLAN ID, in which case the low-order 12 bits are set to the VLAN ID and the high-order 20 bits are set to 0. Or it may be another Ethernet Tag used by the EVPN. It MAY be set to the default Ethernet Tag on the Ethernet segment or to the value 0.

Note that the above allows the Ethernet A-D route to be advertised with one of the following granularities:

+ One Ethernet A-D route for a given tuple per EVI. This is applicable when the PE uses MPLS-based disposition.

+ One Ethernet A-D route per (where the Ethernet Tag ID is set to 0). This is applicable when the PE uses MAC-based disposition, or when the PE uses MPLS-based disposition when no VLAN translation is required.

The usage of the MPLS label is described in the section on "Load Balancing of Unicast Packets".

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the IPv4 or IPv6 address of the advertising PE.

8.4.1.1. Ethernet A-D Route Targets

The Ethernet A-D route MUST carry one or more Route Target (RT) attributes. RTs may be configured (as in IP VPNs) or may be derived automatically.
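The Type 1 RD construction used in sections 8.1.1, 8.2.1 and 8.4.1 can be sketched as follows, assuming an IPv4 loopback address and the [RFC4364] Type 1 layout (a 2-octet Type field of 1, a 4-octet IPv4 Administrator subfield and a 2-octet Assigned Number subfield). The helper names are hypothetical. Under these assumptions, the Ethernet Segment route of section 8.1.1 would use an assigned number of 0, and the Unique VLAN EVPN case places the 12-bit VLAN ID in the low-order bits of the assigned number.

```python
import ipaddress


def type1_rd(pe_ip: str, assigned_number: int) -> bytes:
    """Type 1 RD: 2-octet type (1), 4-octet IPv4 address, 2-octet number."""
    if not 0 <= assigned_number <= 0xFFFF:
        raise ValueError("assigned number must fit in 16 bits")
    return ((1).to_bytes(2, "big")
            + ipaddress.IPv4Address(pe_ip).packed
            + assigned_number.to_bytes(2, "big"))


def unique_vlan_rd(pe_ip: str, vlan_id: int) -> bytes:
    """Unique VLAN EVPN case: VLAN ID in the low-order 12 bits, high 4 bits 0."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return type1_rd(pe_ip, vlan_id)


print(type1_rd("192.0.2.1", 100).hex())  # 0001c00002010064
```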
If a PE uses Route Target Constrain [RT-CONSTRAIN], the PE SHOULD advertise all such RTs using Route Target Constrain. The use of RT Constrain allows each Ethernet A-D route to reach only those PEs that are configured to import at least one RT from the set of RTs carried in the Ethernet A-D route.

8.4.1.1.1. Auto-Derivation from the Ethernet Tag ID

For the "Unique VLAN EVPN" scenario, it is highly desirable to auto-derive the RT from the Ethernet Tag ID (VLAN ID) for that EVPN instance. The following is the procedure for performing such auto-derivation.

+ The Global Administrator field of the RT MUST be set to the Autonomous System (AS) number that the PE is associated with.

+ The two-octet VLAN ID MUST be encoded in the lower two octets of the Local Administrator field.

8.5. Designated Forwarder Election

Consider a CE that is a host or a router that is multi-homed directly to more than one PE in an EVPN instance on a given Ethernet segment. One or more Ethernet Tags may be configured on the Ethernet segment. In this scenario, only one of the PEs, referred to as the Designated Forwarder (DF), is responsible for certain actions:

- Sending multicast and broadcast traffic, on a given Ethernet Tag on a particular Ethernet segment, to the CE.

- Flooding unknown unicast traffic (i.e., traffic for which a PE does not know the destination MAC address), on a given Ethernet Tag on a particular Ethernet segment, to the CE, if the environment requires flooding of unknown unicast traffic.

Note that this behavior, which allows selecting a DF at the granularity of for multicast, broadcast and unknown unicast traffic, is the default behavior in this specification.

Note that a CE always sends packets belonging to a specific flow using a single link towards a PE.
For instance, if the CE is a host, then, as mentioned earlier, the host treats the multiple links that it uses to reach the PEs as a Link Aggregation Group (LAG). The CE employs a local hashing function to map traffic flows onto links in the LAG.

If a bridged network is multi-homed to more than one PE in an EVPN network via switches, then the support of All-Active redundancy mode requires the bridged network to be connected to two or more PEs using a LAG.

If a bridged network does not connect to the PEs using a LAG, then only one of the links between the switched bridged network and the PEs must be the active link for a given EVPN instance. In this case, the set of Ethernet A-D per ES routes advertised by each PE MUST have the 'Single-Active' bit in the flags of the ESI Label Extended Community set to 1.

The default procedure for DF election at the granularity of is referred to as "service carving". With service carving, it is possible to elect multiple DFs per Ethernet segment (one per EVI) in order to perform load-balancing of multi-destination traffic destined to a given segment. The load-balancing procedures carve up the EVI space among the PE nodes evenly, in such a way that every PE is the DF for a disjoint set of EVIs. The procedure for service carving is as follows:

1. When a PE discovers the ESI of the attached Ethernet segment, it advertises an Ethernet Segment route with the associated ES-Import extended community attribute.

2. The PE then starts a timer (default value = 3 seconds) to allow the reception of Ethernet Segment routes from other PE nodes connected to the same Ethernet segment. This timer value MUST be the same across all PEs connected to the same Ethernet segment.

3.
When the timer expires, each PE builds an ordered list of the IP addresses of all the PE nodes connected to the Ethernet segment (including itself), in increasing numeric value. Each IP address in this list is extracted from the "Originating Router's IP Address" field of the advertised Ethernet Segment route. Every PE is then given an ordinal indicating its position in the ordered list, starting with 0 as the ordinal for the PE with the numerically lowest IP address. The ordinals are used to determine which PE node will be the DF for a given EVPN instance on the Ethernet segment, using the following rule: assuming a redundancy group of N PE nodes, the PE with ordinal i is the DF for an EVPN instance with an associated Ethernet Tag value V when (V mod N) = i. In the case where multiple Ethernet Tags are associated with a single EVPN instance, the numerically lowest Ethernet Tag value in that EVPN instance MUST be used in the modulo function.

It should be noted that using the "Originating Router's IP Address" field in the Ethernet Segment route to get the PE IP address needed for the ordered list allows a CE to be multi-homed across different ASes if such a need ever arises.

4. The PE that is elected as a DF for a given EVPN instance will unblock traffic for the Ethernet Tags associated with that EVPN instance. Note that the DF PE unblocks multi-destination traffic in the egress direction towards the segment. All non-DF PEs continue to drop multi-destination traffic (for the associated EVPN instances) in the egress direction towards the segment.

In the case of a link or port failure, the affected PE withdraws its Ethernet Segment route. This will re-trigger the service carving procedures on all the PEs in the redundancy group (RG). For PE node failure, or upon PE commissioning or decommissioning, the PEs re-trigger the service carving.
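The ordinal-based rule of step 3 can be sketched as follows, assuming the timer of step 2 has already expired and that all PEs on the segment use addresses of the same IP family (Python's `ipaddress` objects do not compare IPv4 with IPv6 addresses). The function names are hypothetical.

```python
import ipaddress
from typing import Dict, List


def elect_df(pe_addresses: List[str], ethernet_tag: int) -> str:
    """Return the DF for one EVPN instance on an Ethernet segment (sketch).

    pe_addresses: Originating Router's IP Addresses from the Ethernet
    Segment routes received for the segment, including this PE's own.
    ethernet_tag: the (numerically lowest) Ethernet Tag V of the EVI.
    """
    # Ordered list in increasing numeric value; ordinal 0 is the lowest.
    ordered = sorted(set(pe_addresses), key=ipaddress.ip_address)
    n = len(ordered)
    return ordered[ethernet_tag % n]  # the PE with ordinal i = V mod N


def carve(pe_addresses: List[str],
          evi_tags: Dict[str, List[int]]) -> Dict[str, str]:
    """Map each EVI to its DF, using the lowest Ethernet Tag of the EVI."""
    return {evi: elect_df(pe_addresses, min(tags))
            for evi, tags in evi_tags.items()}


pes = ["192.0.2.10", "192.0.2.9", "192.0.2.1"]
print(carve(pes, {"EVI-A": [100], "EVI-B": [101, 205], "EVI-C": [102]}))
# {'EVI-A': '192.0.2.9', 'EVI-B': '192.0.2.10', 'EVI-C': '192.0.2.1'}
```

Sorting on the parsed address rather than the string matters: a lexicographic string sort would place "192.0.2.10" before "192.0.2.9", and PEs that disagreed on the ordering would elect different DFs.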
In the case of Single-Active multi-homing, when a service moves from one PE in the RG to another PE as a result of re-carving, the PE that ends up being the elected DF for the service must trigger a MAC address flush notification towards the associated Ethernet segment. This can be done, e.g., using the IEEE 802.1ak MVRP 'new' declaration.

8.6. Interoperability with Single-homing PEs

Let's refer to PEs that only support single-homed CE devices as single-homing PEs. For single-homing PEs, all the above multi-homing procedures can be omitted; however, to allow single-homing PEs to fully interoperate with multi-homing PEs, some of the multi-homing procedures described above SHOULD be supported even by single-homing PEs:

- procedures related to processing the Ethernet A-D route for the purpose of Fast Convergence (section 8.2 "Fast Convergence"), to let single-homing PEs benefit from fast convergence

- procedures related to processing the Ethernet A-D route for the purpose of Aliasing (section 8.4 "Aliasing and Backup-Path"), to let single-homing PEs benefit from load balancing

- procedures related to processing the Ethernet A-D route for the purpose of Backup-path (section 8.4 "Aliasing and Backup-Path"), to let single-homing PEs benefit from the corresponding convergence improvement

9. Determining Reachability to Unicast MAC Addresses

PEs forward packets that they receive based on the destination MAC address. This implies that PEs must be able to learn how to reach a given destination unicast MAC address.

There are two components to MAC address learning, "local learning" and "remote learning":

9.1. Local Learning

A particular PE must be able to learn the MAC addresses from the CEs that are connected to it. This is referred to as local learning.
The PEs in a particular EVPN instance MUST support local data-plane learning using standard IEEE Ethernet learning procedures. A PE must be capable of learning MAC addresses in the data plane when it receives packets such as the following from the CE network:

- DHCP requests

- ARP requests for its own MAC

- ARP requests for a peer

Alternatively, PEs MAY learn the MAC addresses of the CEs in the control plane or via management-plane integration between the PEs and the CEs.

There are applications where a MAC address that is reachable via a given PE on a locally attached segment (e.g., with ESI X) may move such that it becomes reachable via another PE on another segment (e.g., with ESI Y). This is referred to as "MAC Mobility". Procedures to support this are described in section "MAC Mobility".

9.2. Remote Learning

A particular PE must be able to determine how to send traffic to MAC addresses that belong to or are behind CEs connected to other PEs, i.e., to remote CEs or hosts behind remote CEs. We call such MAC addresses "remote" MAC addresses.

This document requires a PE to learn remote MAC addresses in the control plane. In order to achieve this, each PE advertises the MAC addresses it learns from its locally attached CEs in the control plane to all the other PEs in that EVPN instance, using MP-BGP and, specifically, the MAC Advertisement route.

9.2.1. Constructing the BGP EVPN MAC/IP Address Advertisement

BGP is extended to advertise these MAC addresses using the MAC/IP Advertisement route type in the EVPN NLRI.

The RD MUST be the RD of the EVI that is advertising the NLRI. The procedures for setting the RD for a given EVI are described in section 8.4.1.

The Ethernet Segment Identifier is set to the ten-octet ESI described in section 5.
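Taken together, the field rules of this section (the RD and ESI above, and the Ethernet Tag ID, MAC address, IP address and MPLS label rules in the paragraphs that follow) suggest an encoder like the one below. It is a sketch that assumes the MAC/IP Advertisement route lays out its fields in the order RD, ESI, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, IP Address, MPLS Label1 and optional MPLS Label2 (section 7.2). The function and parameter names are hypothetical, only the Route Type Specific portion of the NLRI is produced, and MPLS Label2 is omitted because its use is for further study.

```python
import ipaddress
from typing import Optional


def encode_label(label: int) -> bytes:
    """3-octet MPLS label field: label value in the high-order 20 bits.

    The remaining low-order 4 bits are simply left as 0 in this sketch.
    """
    if not 0 <= label < (1 << 20):
        raise ValueError("MPLS label must fit in 20 bits")
    return (label << 4).to_bytes(3, "big")


def mac_ip_route(rd: bytes, esi: bytes, ethernet_tag: int, mac: str,
                 label1: int, ip: Optional[str] = None) -> bytes:
    """Route Type Specific portion of a MAC/IP Advertisement route (sketch)."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 octets")
    out = rd + esi + ethernet_tag.to_bytes(4, "big")
    out += bytes([48]) + mac_bytes                # MAC Address Length in bits
    if ip is None:
        out += bytes([0])                         # IP Address Length 0, no IP field
    else:
        packed = ipaddress.ip_address(ip).packed  # 4 (IPv4) or 16 (IPv6) octets
        out += bytes([len(packed) * 8]) + packed  # IP Address Length: 32 or 128
    return out + encode_label(label1)             # MPLS Label2 omitted


route = mac_ip_route(bytes(8), bytes(10), 0, "00:11:22:33:44:55", 16)
print(len(route))  # 33
```

With an IPv4 address the same call yields 37 octets and with IPv6 49, which is why the NLRI Length alone tells a receiver whether an IP address is present and which family it is.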
The Ethernet Tag ID may be zero or may represent a valid Ethernet Tag ID. This field may be non-zero when there are multiple bridge domains in the MAC-VRF (e.g., the PE needs to perform qualified learning for the VLANs in that MAC-VRF).

When the Ethernet Tag ID in the NLRI is set to a non-zero value for a particular bridge domain, this Ethernet Tag may either be the Ethernet Tag value associated with the CE (e.g., VLAN ID) or the Ethernet Tag Identifier (e.g., VLAN ID) assigned by the EVPN provider and mapped to the CE's Ethernet Tag. The latter would be the case if the CE Ethernet Tags (e.g., VLAN IDs) for a particular bridge domain are different on different CEs.

The MAC Address Length field is in bits, and it is set to 48. The encoding of a MAC address MUST be the 6-octet MAC address specified by [802.1D-ORIG] [802.1D-REV].

The IP Address field is optional. By default, the IP Address Length field is set to 0, and the IP Address field is omitted from the route. When a valid IP address needs to be advertised, it is encoded in this route. When an IP address is present, the IP Address Length field is in bits, and it is set to 32 or 128. Other IP Address Length values are outside the scope of this document. The encoding of an IP address MUST be either 4 octets for IPv4 or 16 octets for IPv6. The Length field of the EVPN NLRI (which is in octets and is described in section 7) is sufficient to determine whether an IP address is encoded in this route and, if so, whether the encoded IP address is IPv4 or IPv6.

The MPLS Label1 field is encoded as 3 octets, where the high-order 20 bits contain the label value. The MPLS Label1 MUST be downstream assigned, and it is associated with the MAC address being advertised by the advertising PE.
The advertising PE uses this label, when it receives an MPLS-encapsulated packet, to perform forwarding based on the destination MAC address. The forwarding procedures are specified in sections "Forwarding Unicast Packets" and "Load Balancing of Unicast Frames".

A PE may advertise the same single EVPN label for all MAC addresses in a given EVI. This label assignment methodology is referred to as a per EVI label assignment. Alternatively, a PE may advertise a unique EVPN label per combination. This label assignment methodology is referred to as a per label assignment. As a third option, a PE may advertise a unique EVPN label per MAC address. Each of these methodologies has its tradeoffs. The choice of a particular label assignment methodology is purely local to the PE that originates the route.

Per EVI label assignment requires the fewest EVPN labels, but requires a MAC lookup in addition to an MPLS lookup on an egress PE for forwarding. On the other hand, a unique label per or a unique label per MAC allows an egress PE to forward a packet that it receives from another PE to the connected CE after looking up only the MPLS labels, without having to perform a MAC lookup. This includes the capability to perform appropriate VLAN ID translation on egress to the CE.

The MPLS label2 field is optional; if it is present, it is encoded as 3 octets, where the high-order 20 bits contain the label value. The use of MPLS label2 is for further study.

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the IPv4 or IPv6 address of the advertising PE.

The BGP advertisement for the MAC advertisement route MUST also carry one or more Route Target (RT) attributes.
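The field rules above can be illustrated with a minimal Python sketch that packs the route-type-specific part of a MAC/IP Advertisement route. The sketch assumes three layout details that are not stated in this section: the field order RD, ESI, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, IP Address, MPLS label1 (with label2 omitted), a one-octet route type and one-octet length header as in section 7, and a route type value of 2 for this route. All function names are hypothetical.

```python
import ipaddress
import struct
from typing import Optional

MAC_IP_ADVERTISEMENT = 2  # assumed EVPN route type value for this route


def encode_label(label: int) -> bytes:
    """Place a 20-bit label value in the high-order 20 bits of 3 octets."""
    if not 0 <= label < (1 << 20):
        raise ValueError("MPLS label must fit in 20 bits")
    return (label << 4).to_bytes(3, "big")  # low-order 4 bits left as zero


def encode_mac_ip_route(rd: bytes, esi: bytes, eth_tag: int, mac: str,
                        ip: Optional[str], label1: int) -> bytes:
    """Build the route-specific part of a MAC/IP Advertisement route."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    mac_octets = bytes.fromhex(mac.replace(":", ""))
    if len(mac_octets) != 6:
        raise ValueError("MAC address must be 6 octets")
    body = rd + esi + struct.pack("!I", eth_tag)
    body += bytes([48]) + mac_octets          # MAC Address Length is in bits
    if ip is None:
        body += bytes([0])                    # IP Address Length 0, no address
    else:
        packed = ipaddress.ip_address(ip).packed   # 4 or 16 octets
        body += bytes([len(packed) * 8]) + packed  # length 32 or 128 bits
    return body + encode_label(label1)        # MPLS label2 omitted


def wrap_nlri(route_type: int, body: bytes) -> bytes:
    """Prefix the route type and the length (in octets) of the body."""
    return bytes([route_type, len(body)]) + body
```

Under these assumptions the body is 33 octets with no IP address, 37 octets with an IPv4 address, and 49 octets with an IPv6 address. This is why the NLRI length alone tells a receiver whether an address is present and which family it belongs to.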
RTs may be configured (as in IP VPNs) or may be derived automatically from the Ethernet Tag ID, in the Unique VLAN case, as described in section "Ethernet A-D Route per EVPN".

It is to be noted that this document does not require PEs to create forwarding state for remote MACs when they are learnt in the control plane. When this forwarding state is actually created is a local implementation matter.

9.2.2. Route Resolution

If the Ethernet Segment Identifier field in a received MAC Advertisement route is set to the reserved ESI value of 0 or MAX-ESI, then the receiving PE MUST install forwarding state for the associated MAC address based on the MAC Advertisement route alone.

If the Ethernet Segment Identifier field in a received MAC Advertisement route is set to a non-reserved ESI, and the receiving PE is locally attached to the same ESI, then the PE does not alter its forwarding state based on the received route. This ensures that local routes are preferred to remote routes.

If the Ethernet Segment Identifier field in a received MAC Advertisement route is set to a non-reserved ESI, then the receiving PE MUST install forwarding state for a given MAC address only when both the MAC Advertisement route AND the associated set of Ethernet A-D per ES routes have been received.

To illustrate this with an example, consider two PEs (PE1 and PE2) connected to a multi-homed Ethernet Segment ES1. All-Active redundancy mode is assumed. A given MAC address M1 is learnt by PE1 but not by PE2. On a remote PE, PE3, the following states may arise:

T1- When the MAC Advertisement route from PE1 and the sets of Ethernet A-D per ES routes from PE1 and PE2 are received, PE3 can forward traffic destined to M1 to both PE1 and PE2.

T2- If, after T1, PE1 withdraws its set of Ethernet A-D per ES routes, then PE3 forwards traffic destined to M1 to PE2 only.
T3- If, after T1, PE2 withdraws its set of Ethernet A-D per ES routes, then PE3 forwards traffic destined to M1 to PE1 only.

T4- If, after T1, PE1 withdraws its MAC Advertisement route, then PE3 treats traffic to M1 as unknown unicast. Note that had PE2 also advertised a MAC route for M1 before PE1 withdrew its MAC route, PE3 would have continued forwarding traffic destined to M1 to PE2.

10. ARP and ND

The IP address field in the MAC advertisement route may optionally carry one of the IP addresses associated with the MAC address. This provides an option that can be used to minimize the flooding of ARP or Neighbor Discovery (ND) messages over the MPLS network and to remote CEs. This option also minimizes ARP (or ND) message processing on end-stations/hosts connected to the EVPN network. A PE may learn the IP address associated with a MAC address in the control or management plane between the CE and the PE, or it may learn this binding by snooping certain messages to or from a CE. When a PE learns the IP address associated with a MAC address of a locally connected CE, it may advertise this address to other PEs by including it in the MAC Advertisement route. The IP Address may be an IPv4 address encoded using four octets or an IPv6 address encoded using sixteen octets. For ARP and ND purposes, the IP Address Length field MUST be set to 32 for an IPv4 address or to 128 for an IPv6 address.

If there are multiple IP addresses associated with a MAC address, then multiple MAC advertisement routes MUST be generated, one for each IP address. For instance, this may be the case when there are both an IPv4 and an IPv6 address associated with the MAC address.

When the IP address is dissociated from the MAC address, the MAC advertisement route with that particular IP address MUST be withdrawn.
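The one-route-per-IP-address rule and the withdrawal rule above can be sketched as a small Python helper. It diffs a PE's local MAC-to-IP bindings and reports which MAC/IP routes to advertise and which to withdraw. This is an illustration under simplifying assumptions: routes are reduced to hypothetical (MAC, IP) pairs, the MAC-only route and every other route field are left out, and the receiving PE's table for answering ARP or ND locally is modelled as a plain dictionary. All names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Route = Tuple[str, str]  # hypothetical (MAC address, IP address) pair


def routes_for(bindings: Dict[str, Set[str]]) -> Set[Route]:
    """One MAC/IP route per IP address associated with each MAC address."""
    return {(mac, ip) for mac, ips in bindings.items() for ip in ips}


def route_changes(old: Dict[str, Set[str]],
                  new: Dict[str, Set[str]]) -> Tuple[Set[Route], Set[Route]]:
    """Return (routes to advertise, routes to withdraw) after a binding change."""
    before, after = routes_for(old), routes_for(new)
    return after - before, before - after


def arp_table(received: Set[Route]) -> Dict[str, str]:
    """IP-to-MAC map a receiving PE could use to answer ARP/ND locally."""
    return {ip: mac for mac, ip in received}


def proxy_reply(table: Dict[str, str], target_ip: str) -> Optional[str]:
    """MAC address to place in a proxy reply, or None if no binding is known."""
    return table.get(target_ip)
```

In this sketch, adding an IPv6 address to a MAC address that already has an IPv4 address yields exactly one new route to advertise and none to withdraw. Removing the IPv4 address later yields exactly one route to withdraw.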
When a PE receives an ARP request for an IP address from a CE, and the PE has the MAC address binding for that IP address, the PE SHOULD perform ARP proxy by responding to the ARP request.

10.1. Default Gateway

When a PE needs to perform inter-subnet forwarding where each subnet is represented by a different broadcast domain (e.g., a different VLAN), the inter-subnet forwarding is performed at layer 3, and the PE that performs this function is called the default gateway. In this case, when the PE receives an ARP Request for the IP address of the default gateway, the PE originates an ARP Reply.

Each PE that acts as a default gateway for a given EVPN instance MAY advertise in the EVPN control plane its default gateway MAC address using the MAC advertisement route, and indicate that such route is associated with the default gateway. This is accomplished by requiring the route to carry the Default Gateway extended community defined in section 7.8 ("Default Gateway Extended Community"). The ESI field is set to zero when advertising the MAC route with the Default Gateway extended community.

Unless it is known a priori (by means outside of this document) that all PEs of a given EVPN instance act as a default gateway for that EVPN instance, the MPLS label MUST be set to a valid downstream assigned label.
Furthermore, even if all PEs of a given EVPN instance act as a default gateway for that EVPN instance, if only some (but not all) of these PEs have sufficient (routing) information to provide inter-subnet routing for all the inter-subnet traffic originated within the subnet associated with the EVPN instance, then when such a PE advertises in the EVPN control plane its default gateway MAC address using the MAC advertisement route, and indicates that such route is associated with the default gateway, the route MUST carry a valid downstream assigned label.

If all PEs of a given EVPN instance act as a default gateway for that EVPN instance, and the same default gateway MAC address is used across all gateway devices, then no such advertisement is needed. However, if each default gateway uses a different MAC address, then each default gateway needs to be aware of the other gateways' MAC addresses, hence the need for such advertisement. This is called MAC address aliasing, since a single default gateway can be represented by multiple MAC addresses.

Each PE that receives this route and imports it as per the procedures specified in this document follows the procedures in this section when replying to ARP Requests that it receives, if such Requests are for the IP address in the received EVPN route.

Each PE that acts as a default gateway for a given EVPN instance, and that receives this route and imports it as per the procedures specified in this document, MUST create MAC forwarding state that enables it to apply IP forwarding to the packets destined to the MAC address carried in the route.

11. Handling of Multi-Destination Traffic

Procedures are required for a given PE to send broadcast or multicast traffic, received from a CE encapsulated in a given Ethernet Tag (VLAN) in an EVPN instance, to all the other PEs that span that Ethernet Tag (VLAN) in that EVPN instance. In certain scenarios, described in section "Processing of Unknown Unicast Packets", a given PE may also need to flood unknown unicast traffic to other PEs.

The PEs in a particular EVPN instance may use ingress replication, P2MP LSPs, or MP2MP LSPs to send unknown unicast, broadcast, or multicast traffic to other PEs.

Each PE MUST advertise an "Inclusive Multicast Ethernet Tag route" to enable the above. The following subsection provides the procedures to construct the Inclusive Multicast Ethernet Tag route. Subsequent subsections describe its usage in further detail.

11.1. Construction of the Inclusive Multicast Ethernet Tag Route

The RD MUST be the RD of the EVI that is advertising the NLRI. The procedures for setting the RD for a given EVPN instance on a PE are described in section 8.4.1.

The Ethernet Tag ID is the identifier of the Ethernet Tag. It MAY be set to 0 or to a valid Ethernet Tag value.

The Originating Router's IP Address MUST be set to an IP address of the PE. This address SHOULD be common for all the EVIs on the PE (e.g., this address may be the PE's loopback address). The IP Address Length field is in bits.

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the same IP address as the one carried in the Originating Router's IP Address field.

The BGP advertisement for the Inclusive Multicast Ethernet Tag route MUST also carry one or more Route Target (RT) attributes. The assignment of RTs described in section "Constructing the BGP EVPN MAC/IP Address Advertisement" MUST be followed.

11.2. P-Tunnel Identification

In order to identify the P-Tunnel used for sending broadcast, unknown unicast, or multicast traffic, the Inclusive Multicast Ethernet Tag route MUST carry a "PMSI Tunnel Attribute" as specified in [BGP MVPN].

Depending on the technology used for the P-tunnel for the EVPN instance on the PE, the PMSI Tunnel attribute of the Inclusive Multicast Ethernet Tag route is constructed as follows.

+ If the PE that originates the advertisement uses a P-Multicast tree for the P-tunnel for EVPN, the PMSI Tunnel attribute MUST contain the identity of the tree (note that the PE could create the identity of the tree prior to the actual instantiation of the tree).

+ A PE that uses a P-Multicast tree for the P-tunnel MAY aggregate two or more EVPN instances (EVIs) present on the PE onto the same tree. In this case, in addition to carrying the identity of the tree, the PMSI Tunnel attribute MUST carry an MPLS upstream assigned label that the PE has bound uniquely to the EVI associated with this update (as determined by its RTs).

  If the PE has already advertised Inclusive Multicast Ethernet Tag routes for two or more EVIs that it now desires to aggregate, then the PE MUST re-advertise those routes. The re-advertised routes MUST be the same as the original ones, except for the PMSI Tunnel attribute and the label carried in that attribute.

+ If the PE that originates the advertisement uses ingress replication for the P-tunnel for EVPN, the route MUST include the PMSI Tunnel attribute with the Tunnel Type set to Ingress Replication and the Tunnel Identifier set to a routable address of the PE. The PMSI Tunnel attribute MUST carry a downstream assigned MPLS label. This label is used to demultiplex the broadcast, multicast, or unknown unicast EVPN traffic received over an MP2P tunnel by the PE.
+ The Leaf Information Required flag of the PMSI Tunnel attribute MUST be set to zero, and MUST be ignored on receipt.

12. Processing of Unknown Unicast Packets

The procedures in this document do not require the PEs to flood unknown unicast traffic to other PEs. If PEs learn CE MAC addresses via a control plane protocol, the PEs can then distribute MAC addresses via BGP, and all unicast MAC addresses will be learnt prior to traffic to those destinations.

However, if the destination MAC address of a received packet is not known by the PE, the PE may have to flood the packet. When flooding, one must take into account "split horizon forwarding", whose principles are borrowed from the split horizon forwarding rules in VPLS solutions [RFC4761] and [RFC4762]. When a PE capable of flooding (say PEx) receives a frame with an unknown destination MAC address, it floods the frame. If the frame arrived from an attached CE, PEx must send a copy of the frame to every other attached CE participating in that EVPN instance, on a different ESI than the one it received the frame on, as long as the PE is the DF for the egress ESI. In addition, the PE must flood the frame to all other PEs participating in that EVPN instance. If, on the other hand, the frame arrived from another PE (say PEy), PEx must send a copy of the packet only to attached CEs, as long as it is the DF for the egress ESI. PEx MUST NOT send the frame to other PEs, since PEy would have already done so. Split horizon forwarding rules apply to unknown MAC addresses.

Whether or not to flood packets to unknown destination MAC addresses should be an administrative choice, depending on how learning happens between CEs and PEs.
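The split horizon flooding rules above can be summarised in a short Python sketch. This is an illustration under stated assumptions, not a normative procedure. Attachment, ESI and PE names are hypothetical. A single-homed CE is modelled with `esi=None` and treated as always eligible. The set of ESIs for which this PE is the DF is assumed to be already known from DF election.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Attachment:
    name: str            # hypothetical attachment-circuit name
    esi: Optional[str]   # None models a single-homed CE in this sketch


def flood_targets(ingress: Optional[Attachment],
                  attachments: Iterable[Attachment],
                  df_esis: Set[str],
                  remote_pes: List[str]) -> List[str]:
    """Where PEx floods a frame with an unknown destination MAC address.

    ingress is the local attachment the frame arrived on, or None when the
    frame arrived from another PE (PEy).
    """
    targets = []
    for ac in attachments:
        if ac == ingress:
            continue                      # never back out of the ingress port
        if ac.esi is not None:
            if ingress is not None and ac.esi == ingress.esi:
                continue                  # split horizon: not onto the ingress ESI
            if ac.esi not in df_esis:
                continue                  # only the DF floods onto this ESI
        targets.append(ac.name)
    if ingress is not None:
        targets.extend(remote_pes)        # frames from a CE also go to all PEs
    # Frames from another PE are never sent to other PEs: PEy already did so.
    return targets
```

For a frame that arrived from another PE, the function returns only local attachments. This matches the rule that PEx MUST NOT re-flood such a frame to other PEs.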
The PEs in a particular EVPN instance may use ingress replication using RSVP-TE P2P LSPs or LDP MP2P LSPs for sending unknown unicast traffic to other PEs. Alternatively, they may use RSVP-TE P2MP or LDP P2MP LSPs for sending such traffic to other PEs.

12.1. Ingress Replication

If ingress replication is in use, the P-Tunnel attribute, carried in the Inclusive Multicast Ethernet Tag routes for the EVPN instance, specifies the downstream label that the other PEs can use to send unknown unicast, multicast, or broadcast traffic for that EVPN instance to this particular PE.

The PE that receives a packet with this particular MPLS label MUST treat the packet as a broadcast, multicast, or unknown unicast packet. Furthermore, if the destination MAC address is a unicast MAC address, the PE MUST treat the packet as an unknown unicast packet.

12.2. P2MP MPLS LSPs

The procedures for using P2MP LSPs are very similar to the VPLS procedures [VPLS-MCAST]. The P-Tunnel attribute used by a PE for sending unknown unicast, broadcast, or multicast traffic for a particular EVPN instance is advertised in the Inclusive Multicast Ethernet Tag route as described in section "Handling of Multi-Destination Traffic".

The P-Tunnel attribute specifies the P2MP LSP identifier. This is the equivalent of an Inclusive tree in [VPLS-MCAST]. Note that multiple Ethernet Tags, which may be in different EVPN instances, may use the same P2MP LSP, using upstream labels [VPLS-MCAST]. This is the equivalent of an Aggregate Inclusive tree in [VPLS-MCAST]. When P2MP LSPs are used for flooding unknown unicast traffic, packet re-ordering is possible.

The PE that receives a packet on the P2MP LSP specified in the PMSI Tunnel Attribute MUST treat the packet as a broadcast, multicast, or unknown unicast packet.
Furthermore, if the destination MAC address is a unicast MAC address, the PE MUST treat the packet as an unknown unicast packet.

13. Forwarding Unicast Packets

This section describes procedures for forwarding unicast packets by PEs, where such packets are received either from directly connected CEs or from other PEs.

13.1. Forwarding packets received from a CE

When a PE receives a packet from a CE on a given Ethernet Tag, it must first look up the source MAC address of the packet. In certain environments, the source MAC address MAY be used to authenticate the CE and determine that traffic from the host can be allowed into the network. Source MAC lookup MAY also be used for local MAC address learning.

If the PE decides to forward the packet, the destination MAC address of the packet must be looked up. If the PE has received MAC address advertisements for this destination MAC address from one or more other PEs, or has learned it from locally connected CEs, it is considered a known MAC address. Otherwise, the MAC address is considered an unknown MAC address.

For known MAC addresses, the PE forwards the packet to one of the remote PEs or to a locally attached CE. When forwarding to a remote PE, the packet is encapsulated in the EVPN MPLS label advertised by the remote PE for that MAC address, and in the MPLS LSP label stack to reach the remote PE.

If the MAC address is unknown and the administrative policy on the PE requires flooding of unknown unicast traffic, then:

- The PE MUST flood the packet to other PEs. The PE MUST first encapsulate the packet in the ESI MPLS label as described in section 8.3.
  If ingress replication is used, the packet MUST be replicated one or more times to each remote PE, with the outermost label being the MPLS label advertised by the remote PE in a PMSI Tunnel Attribute in the Inclusive Multicast Ethernet Tag route for an combination. The Ethernet Tag in the route must be the same as the Ethernet Tag associated with the interface on which the ingress PE receives the packet. If P2MP LSPs are being used, the packet MUST be sent on the P2MP LSP of which the PE is the root for the Ethernet Tag in the EVPN instance. If the same P2MP LSP is used for all Ethernet Tags, then all the PEs in the EVPN instance MUST be the leaves of the P2MP LSP. If a distinct P2MP LSP is used for a given Ethernet Tag in the EVPN instance, then only the PEs in that Ethernet Tag MUST be the leaves of the P2MP LSP. The packet MUST be encapsulated in the P2MP LSP label stack.

If the MAC address is unknown and the administrative policy on the PE does not allow flooding of unknown unicast traffic, then:

- The PE MUST drop the packet.

13.2. Forwarding packets received from a remote PE

This section describes the procedures for forwarding known and unknown unicast packets received from a remote PE.

13.2.1. Unknown Unicast Forwarding

When a PE receives an MPLS packet from a remote PE, then, after processing the MPLS label stack, if the top MPLS label ends up being a P2MP LSP label associated with an EVPN instance or, in the case of ingress replication, the downstream label advertised in the P-Tunnel attribute, and after performing the split horizon procedures described in section "Split Horizon":

- If the PE is the designated forwarder of BUM traffic on a particular set of ESIs for the Ethernet Tag, the default behavior is for the PE to flood the packet on these ESIs.
  In other words, the default behavior is for the PE to assume that, for BUM traffic, it is not required to perform a destination MAC address lookup. As an option, the PE may perform a destination MAC lookup to flood the packet to only a subset of the CE interfaces in the Ethernet Tag. For instance, the PE may decide, based on administrative policy, not to flood a BUM packet on certain Ethernet segments even if it is the DF on those Ethernet segments.

- If the PE is not the designated forwarder on any of the ESIs for the Ethernet Tag, the default behavior is for it to drop the packet.

13.2.2. Known Unicast Forwarding

If the top MPLS label ends up being an EVPN label that was advertised in the unicast MAC advertisements, then the PE either forwards the packet based on the CE next-hop forwarding information associated with the label or does a destination MAC address lookup to forward the packet to a CE.

14. Load Balancing of Unicast Frames

This section specifies the load balancing procedures for sending known unicast frames to a multi-homed CE.

14.1. Load balancing of traffic from a PE to remote CEs

Whenever a remote PE imports a MAC advertisement for a given in an EVI, it MUST examine all imported Ethernet A-D routes for that ESI in order to determine the load-balancing characteristics of the Ethernet segment.

14.1.1. Single-Active Redundancy Mode

For a given ES, if the remote PE has imported the set of Ethernet A-D per ES routes from at least one PE where the "Single-Active" flag in the ESI Label Extended Community is set, then the remote PE MUST deduce that the ES is operating in Single-Active redundancy mode. As such, the MAC address will be reachable only via the PE announcing the associated MAC Advertisement route; this PE is referred to as the primary PE.
The other PEs advertising the set of Ethernet A-D per ES routes for the same ES provide backup paths for that ES in case the primary PE encounters a failure, and are referred to as backup PEs. It should be noted that the primary PE for a given is the DF for that .

If the primary PE encounters a failure, it MAY withdraw its set of Ethernet A-D per ES routes for the affected ES prior to withdrawing its set of MAC Advertisement routes.

If there is only one backup PE for a given ES, the remote PE MAY use the primary PE's withdrawal of its set of Ethernet A-D per ES routes as a trigger to update its forwarding entries for the associated MAC addresses to point towards the backup PE. As the backup PE starts learning the MAC addresses over its attached ES, it will start sending MAC Advertisement routes while the failed PE withdraws its routes. This mechanism minimizes the flooding of traffic during fail-over events.

If there is more than one backup PE for a given ES, the remote PE MUST use the primary PE's withdrawal of its set of Ethernet A-D per ES routes as a trigger to start flooding traffic for the associated MAC addresses (as long as flooding of unknown unicast is administratively allowed), as it is not possible to select a single backup PE.

14.1.2. All-Active Redundancy Mode

For a given ES, if the remote PE has imported the set of Ethernet A-D per ES routes from one or more PEs and none of them has the "Single-Active" flag in the ESI Label Extended Community set, then the remote PE MUST deduce that the ES is operating in All-Active redundancy mode.
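The mode deduction and the Single-Active fail-over choice described above can be sketched in Python as follows. The route record, the function names and the `flooding_allowed` parameter are hypothetical. Returning a drop action when flooding is administratively disabled mirrors the handling of unknown unicast in section "Forwarding Unicast Packets". That behaviour is an assumption of this sketch; this section does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class Mode(Enum):
    SINGLE_ACTIVE = "Single-Active"
    ALL_ACTIVE = "All-Active"


@dataclass(frozen=True)
class AdPerEsRoute:
    pe: str              # advertising PE (hypothetical name)
    esi: str
    single_active: bool  # "Single-Active" flag in the ESI Label Extended Community


def redundancy_mode(routes: Iterable[AdPerEsRoute], esi: str) -> Optional[Mode]:
    """Deduce the redundancy mode of an ES from imported A-D per ES routes."""
    flags = [r.single_active for r in routes if r.esi == esi]
    if not flags:
        return None                        # nothing imported for this ES yet
    # One route with the flag set is enough to deduce Single-Active mode.
    return Mode.SINGLE_ACTIVE if any(flags) else Mode.ALL_ACTIVE


def on_primary_withdrawal(backup_pes: List[str],
                          flooding_allowed: bool = True) -> Tuple[str, Optional[str]]:
    """React to the primary PE withdrawing its A-D per ES routes."""
    if not backup_pes:
        raise ValueError("no backup PE is known for this ES")
    if len(backup_pes) == 1:
        return ("repoint", backup_pes[0])  # MAY move the MACs to the backup PE
    # Several backups: no single PE can be chosen, so flood (if allowed).
    return ("flood", None) if flooding_allowed else ("drop", None)
```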
A remote PE that receives a MAC advertisement route with a non-reserved ESI SHOULD consider the advertised MAC address to be reachable via all PEs that have advertised reachability to that MAC address's EVI/ES via the combination of an Ethernet A-D per EVI route for that EVI/ES (and Ethernet Tag, if applicable) AND an Ethernet A-D per ES route for that ES. The remote PE MUST use received MAC Advertisement routes and Ethernet A-D per EVI/per ES routes to construct the set of next-hops for the advertised MAC address.

The remote PE MUST use the MAC advertisement and eligible Ethernet A-D routes to construct the set of next-hops that it can use to send the packet to the destination MAC. Each next-hop comprises an MPLS label stack that is to be used by the egress PE to forward the packet. This label stack is determined as follows:

- If the next-hop is constructed as a result of a MAC route, then this label stack MUST be used. However, if the MAC route doesn't exist, then the next-hop and MPLS label stack are constructed as a result of the Ethernet A-D routes. Note that the following description applies to determining the label stack for a particular next-hop to reach a given PE, from which the remote PE has received and imported Ethernet A-D routes that have the same ESI and Ethernet Tag as those present in the MAC advertisement. The Ethernet A-D routes mentioned in the following description refer to the ones imported from this given PE.

- If a set of Ethernet A-D per ES routes for that ES AND an Ethernet A-D per EVI route exist, then the label from the latter route must be used.

The following example explains the above.

Consider a CE (CE1) that is dual-homed to two PEs (PE1 and PE2) on a LAG interface (ES1), and is sending packets with MAC address MAC1 on VLAN1 (mapped to EVI1).
A remote PE, say PE3, is able to learn that MAC1 is reachable via PE1 and PE2. Both PE1 and PE2 may advertise MAC1 in BGP if they receive packets with MAC1 from CE1. If this is not the case, and MAC1 is advertised only by PE1, PE3 still considers MAC1 as reachable via both PE1 and PE2, as both PE1 and PE2 advertise a set of Ethernet A-D per ES routes for ES1 as well as an Ethernet A-D per EVI route for .

The MPLS label stack to send packets to PE1 is the MPLS LSP stack to get to PE1 plus the EVPN label advertised by PE1 for CE1's MAC.

The MPLS label stack to send packets to PE2 is the MPLS LSP stack to get to PE2 plus the MPLS label in the Ethernet A-D route advertised by PE2 for , if PE2 has not advertised MAC1 in BGP.

We will refer to these label stacks as MPLS next-hops.

The remote PE (PE3) can now load balance the traffic it receives from its CEs, destined for CE1, between PE1 and PE2. PE3 may use N-Tuple flow information to hash traffic into one of the MPLS next-hops for load balancing of IP traffic. Alternatively, PE3 may rely on the source MAC addresses for load balancing.

Note that once PE3 decides to send a particular packet to PE1 or PE2, it can pick one out of multiple possible paths to reach the particular remote PE using regular MPLS procedures. For instance, if the tunneling technology is based on RSVP-TE LSPs, and PE3 decides to send a particular packet to PE1, then PE3 can choose from multiple RSVP-TE LSPs that have PE1 as their destination.

When PE1 or PE2 receives the packet destined for CE1 from PE3, if the packet is a unicast MAC packet, it is forwarded to CE1. If it is a multicast or broadcast MAC packet, then only one of PE1 or PE2 must forward the packet to the CE. Which of PE1 or PE2 forwards this packet to the CE is determined based on which of the two is the DF.
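The next-hop construction in this example can be sketched in Python under several assumptions. Each PE with an Ethernet A-D per ES route for the ES contributes one next-hop. That next-hop uses the label from the PE's own MAC route if it sent one, and otherwise the label from its Ethernet A-D per EVI route. No next-hop set is built until at least one MAC route exists, as in case T4 of section "Route Resolution". Only the EVPN label is modelled; the LSP stack to reach each PE is omitted. Label values and function names are hypothetical, and SHA-256 over the flow tuple is merely a stand-in for whatever hash a real PE applies.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class NextHop:
    pe: str
    evpn_label: int  # LSP label stack to reach the PE is not modelled


def build_next_hops(mac_labels: Dict[str, int],
                    ad_per_evi_labels: Dict[str, int],
                    ad_per_es_pes: Set[str]) -> List[NextHop]:
    """MPLS next-hops for one MAC address on an All-Active ES.

    mac_labels: PE -> label from that PE's MAC Advertisement route
    ad_per_evi_labels: PE -> label from that PE's Ethernet A-D per EVI route
    ad_per_es_pes: PEs whose Ethernet A-D per ES routes for the ES are present
    """
    if not mac_labels:
        return []                 # no MAC route at all: treat as unknown unicast
    hops = []
    for pe in sorted(ad_per_es_pes):
        if pe in mac_labels:
            hops.append(NextHop(pe, mac_labels[pe]))         # MAC route label
        elif pe in ad_per_evi_labels:
            hops.append(NextHop(pe, ad_per_evi_labels[pe]))  # A-D per EVI label
    return hops


def pick_next_hop(hops: List[NextHop], flow: Tuple) -> NextHop:
    """Hash a flow (e.g. an IP 5-tuple) onto one next-hop, stable per flow."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return hops[int.from_bytes(digest[:4], "big") % len(hops)]
```

With PE1 advertising MAC1 and both PEs advertising their Ethernet A-D routes, the sketch yields one next-hop per PE. If PE1 then withdraws its Ethernet A-D per ES routes (case T2), only PE2 remains.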
If the connectivity between the multi-homed CE and one of the PEs that it is attached to fails, the PE MUST withdraw the set of Ethernet A-D per ES routes that had been previously advertised for that ES. When the MAC entry on the PE ages out, the PE MUST withdraw the MAC address from BGP. Note that to aid convergence, the Ethernet Tag A-D routes MAY be withdrawn before the MAC routes. This enables the remote PEs to remove the MPLS next-hop to this particular PE from the set of MPLS next-hops that can be used to forward traffic to the CE. For further details and procedures on withdrawal of EVPN route types in the event of PE to CE failures, please see section "PE to CE Network Failures".

14.2. Load balancing of traffic between a PE and a local CE

A CE may be configured with more than one interface connected to different PEs or to the same PE for load balancing, using a technology such as LAG. The PE(s) and the CE can load balance traffic onto these interfaces using one of the following mechanisms.

14.2.1. Data plane learning

Consider the case where the PEs perform data plane learning for local MAC addresses learned from local CEs. This enables the PE(s) to learn a particular MAC address and associate it with one or more interfaces, if the technology between the PE and the CE supports multi-pathing. The PEs can then load balance traffic destined to that MAC address on the multiple interfaces.

Whether the CE can load balance traffic that it generates on the multiple interfaces depends on the CE implementation.

14.2.2. Control plane learning

The CE can be a host that advertises the same MAC address using a control protocol on both interfaces. This enables the PE(s) to learn the host's MAC address and associate it with one or more interfaces. The PEs can then load balance traffic destined to the host on the multiple interfaces.
The host can also load balance the traffic it generates onto these interfaces, and the PE that receives the traffic employs EVPN forwarding procedures to forward the traffic.

15. MAC Mobility

It is possible for a given host or end-station (as defined by its MAC address) to move from one Ethernet segment to another. This is referred to as "MAC Mobility" or a "MAC move". It differs from the multi-homing situation, in which a given MAC address is reachable via multiple PEs for the same Ethernet segment. In a MAC move, there would be two sets of MAC Advertisement routes, one set with the new Ethernet segment and one set with the previous Ethernet segment, and the MAC address would appear to be reachable via each of these segments.

In order to allow all of the PEs in the EVPN instance to correctly determine the current location of the MAC address, all advertisements of it being reachable via the previous Ethernet segment MUST be withdrawn by the PEs, for the previous Ethernet segment, that had advertised it.

If local learning is performed using the data plane, these PEs will not be able to detect that the MAC address has moved to another Ethernet segment. In that case, the receipt of MAC Advertisement routes with the MAC Mobility extended community attribute from other PEs serves as the trigger for these PEs to withdraw their advertisements. If local learning is performed using the control or management planes, these interactions serve as the trigger for these PEs to withdraw their advertisements.

In a situation where there are multiple moves of a given MAC, possibly between the same two Ethernet segments, there may be multiple withdrawals and re-advertisements.
In order to ensure that all PEs in the EVPN instance receive all of these correctly through the intervening BGP infrastructure, it is necessary to introduce a sequence number into the MAC Mobility extended community attribute.

An implementation MUST handle scenarios where the sequence number wraps around, so that mobility events are still processed correctly.

Every MAC mobility event for a given MAC address will contain a sequence number that is set using the following rules:

- A PE advertising a MAC address for the first time advertises it with no MAC Mobility extended community attribute.

- A PE detecting a locally attached MAC address for which it had previously received a MAC Advertisement route with a different Ethernet segment identifier advertises the MAC address in a MAC Advertisement route tagged with a MAC Mobility extended community attribute. The sequence number in this attribute is one greater than the sequence number in the MAC Mobility attribute of the received MAC Advertisement route. In the case of the first mobility event for a given MAC address, where the received MAC Advertisement route does not carry a MAC Mobility attribute, the value of the sequence number in the received route is assumed to be 0 for the purpose of this processing.

- A PE detecting a locally attached MAC address for which it had previously received a MAC Advertisement route with the same non-zero Ethernet segment identifier advertises it with:

  i. no MAC Mobility extended community attribute, if the received route did not carry said attribute.

  ii. a MAC Mobility extended community attribute with the sequence number equal to the highest of the sequence number(s) in the received MAC Advertisement route(s), if the received route(s) is (are) tagged with a MAC Mobility extended community attribute.
- A PE detecting a locally attached MAC address for which it had previously received a MAC Advertisement route with the same zero Ethernet segment identifier (single-homed scenarios) advertises it with a MAC Mobility extended community attribute with the sequence number set as for a MAC move, i.e., one greater than the sequence number in the received route. In single-homed scenarios, there is no need for ESI comparison; ESI comparison is done for multi-homing only to prevent false detection of a MAC move among the PEs attached to the same multi-homed site.

A PE receiving a MAC Advertisement route for a MAC address with a different Ethernet segment identifier and a higher sequence number than that which it had previously advertised withdraws its MAC Advertisement route. If two (or more) PEs advertise the same MAC address with the same sequence number but different Ethernet segment identifiers, a PE that receives these routes selects the route advertised by the PE with the lowest IP address as the best route. If the PE is the originator of the MAC route and it receives the same MAC address with the same sequence number that it generated, it compares its own IP address with the IP address of the remote PE and selects the lowest IP address. If its own route is not the best one, it withdraws the route.

15.1. MAC Duplication Issue

A situation may arise where the same MAC address is learned by different PEs in the same VLAN because two (or more) hosts are misconfigured with the same (duplicate) MAC address. In such a situation, the traffic originating from these hosts would trigger continuous MAC moves among the PEs attached to these hosts. It is important to recognize such a situation and avoid incrementing the sequence number (in the MAC Mobility attribute) indefinitely.
In order to remedy such a situation, a PE that detects a MAC mobility event by way of local learning starts an M-second timer (default value of M = 180), and if it detects N MAC moves before the timer expires (default value of N = 5), it concludes that a duplicate MAC situation has occurred. The PE MUST alert the operator and stop sending and processing any BGP MAC Advertisement routes for that MAC address until corrective action is taken by the operator. The values of M and N MUST be configurable to allow for flexibility in operator control. Note that the other PEs in the EVPN instance will forward the traffic for the duplicate MAC address to one of the PEs advertising the duplicate MAC address.

15.2. Sticky MAC addresses

There are scenarios in which it is desired to configure some MAC addresses as static so that they are not subjected to MAC move. In such scenarios, these MAC addresses are advertised with the MAC Mobility extended community with the static flag set to 1 and the sequence number set to zero. If a PE receives such advertisements and later learns the same MAC address(es) via local learning, then the PE MUST alert the operator.

16. Multicast & Broadcast

The PEs in a particular EVPN instance may use ingress replication or P2MP LSPs to send multicast traffic to other PEs.

16.1. Ingress Replication

The PEs may use ingress replication for flooding BUM traffic as described in section "Handling of Multi-Destination Traffic". A given broadcast packet must be sent to all the remote PEs. However, a given multicast packet for a multicast flow may be sent to only a subset of the PEs. Specifically, a given multicast flow may be sent to only those PEs that have receivers that are interested in the multicast flow.
Determining which of the PEs have receivers for a given multicast flow is done using explicit tracking described below.

16.2. P2MP LSPs

A PE may use an "Inclusive" tree for sending a BUM packet. This terminology is borrowed from [VPLS-MCAST].

A variety of transport technologies may be used in the SP network. For inclusive P-Multicast trees, these transport technologies include point-to-multipoint LSPs created by RSVP-TE or mLDP.

16.2.1. Inclusive Trees

An Inclusive Tree allows the use of a single multicast distribution tree, referred to as an Inclusive P-Multicast tree, in the SP network to carry all the multicast traffic from a specified set of EVPN instances on a given PE. A particular P-Multicast tree can be set up to carry the traffic originated by sites belonging to a single EVPN instance, or to carry the traffic originated by sites belonging to several EVPN instances. The ability to carry the traffic of more than one EVPN instance on the same tree is termed 'Aggregation', and the tree is called an Aggregate Inclusive P-Multicast tree, or Aggregate Inclusive tree for short. The Aggregate Inclusive tree needs to include every PE that is a member of any of the EVPN instances that are using the tree. This implies that a PE may receive multicast traffic for a multicast stream even if it doesn't have any receivers that are interested in receiving traffic for that stream.

An Inclusive or Aggregate Inclusive tree as defined in this document is a P2MP tree. A P2MP tree is used to carry traffic only for EVPN CEs that are connected to the PE that is the root of the tree.

The procedures for signaling an Inclusive tree are the same as those in [VPLS-MCAST], with the VPLS-AD route replaced with the Inclusive Multicast Ethernet Tag route.
The P-Tunnel attribute [VPLS-MCAST] for an Inclusive tree is advertised with the Inclusive Multicast Ethernet Tag route as described in section "Handling of Multi-Destination Traffic". Note that for an Aggregate Inclusive tree, a PE can "aggregate" multiple EVPN instances on the same P2MP LSP using upstream labels. The procedures for aggregation are the same as those described in [VPLS-MCAST], with VPLS A-D routes replaced by EVPN Inclusive Multicast Ethernet Tag routes.

17. Convergence

This section describes failure recovery from different types of network failures.

17.1. Transit Link and Node Failures between PEs

The use of existing MPLS Fast-Reroute mechanisms can provide failure recovery on the order of 50 ms in the event of transit link and node failures in the infrastructure that connects the PEs.

17.2. PE Failures

Consider a host host1 that is dual-homed to PE1 and PE2. If PE1 fails, a remote PE, PE3, can discover this based on the failure of the BGP session. This failure detection can be in the sub-second range if BFD is used to detect BGP session failure. PE3 can update its forwarding state to start sending all traffic for host1 to only PE2. Note that this failure recovery is potentially faster than what would be possible if data-plane learning were used, since in that case PE3 would have to rely on re-learning of MAC addresses via PE2.

17.3. PE to CE Network Failures

When an Ethernet segment connected to a PE fails or when an Ethernet Tag is decommissioned on an Ethernet segment, the PE MUST withdraw the Ethernet A-D route(s) announced for the <ESI, Ethernet Tag> pairs that are impacted by the failure or decommissioning. In addition, the PE MUST also withdraw the MAC advertisement routes that are impacted by the failure or decommissioning.
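As an illustration only, the selection of routes that the local PE withdraws after such an event might be sketched as below. The route records, the `routes_to_withdraw` helper, and the choice to model "whole segment failed" as `tags=None` are all hypothetical modelling assumptions, not part of this specification or of any router implementation.

```python
"""Hypothetical sketch: routes a PE withdraws after a PE-to-CE failure."""
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class EthernetAdRoute:
    esi: bytes          # 10-octet Ethernet Segment Identifier
    ethernet_tag: int


@dataclass(frozen=True)
class MacAdvRoute:
    esi: bytes
    ethernet_tag: int
    mac: str


def routes_to_withdraw(
    esi: bytes,
    tags: Optional[FrozenSet[int]],
    ad_routes: List[EthernetAdRoute],
    mac_routes: List[MacAdvRoute],
) -> Tuple[List[EthernetAdRoute], List[MacAdvRoute]]:
    """Return the locally originated routes impacted by the event.

    tags=None models failure of the whole Ethernet segment (assumption);
    a set of tags models decommissioning those Ethernet Tags on it.
    """
    def impacted(route_esi: bytes, route_tag: int) -> bool:
        # A route is impacted if it is for the failed segment and, when
        # only some tags are decommissioned, for one of those tags.
        return route_esi == esi and (tags is None or route_tag in tags)

    ad = [r for r in ad_routes if impacted(r.esi, r.ethernet_tag)]
    macs = [r for r in mac_routes if impacted(r.esi, r.ethernet_tag)]
    return ad, macs
```

Both lists are filtered with the same predicate because the MAC advertisement routes to withdraw are those impacted by the same failure or decommissioning as the Ethernet A-D routes.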
The Ethernet A-D routes should be used by an implementation to optimize the withdrawal of MAC advertisement routes. When a PE receives a withdrawal of a particular Ethernet A-D route from a PE, it SHOULD consider all the MAC advertisement routes that are learned from the same <ESI, Ethernet Tag> as in the Ethernet A-D route, from the advertising PE, as having been withdrawn. This optimizes the network convergence times in the event of PE to CE failures.

18. Frame Ordering

In a MAC address, bit-1 of the most significant byte is used for unicast/multicast indication and bit-2 is used for globally unique versus locally administered MAC address. If the value of the 2nd nibble (bits 5 through 8) of the most significant byte of the destination MAC address (which follows the last MPLS label) happens to be 0x4 or 0x6, then the Ethernet frame can be misinterpreted as an IPv4 or IPv6 packet by intermediate P nodes performing ECMP based on deep packet inspection. As a result, packets belonging to the same flow can be load balanced onto different ECMP paths and subjected to different delays, and can therefore arrive at the destination out of order. This out-of-order delivery can happen during steady state, in the absence of any failures, resulting in significant impact to the network operation.

In order to avoid any such misordering, the following rules are applied:

- If a network uses deep packet inspection for its ECMP, then the control word SHOULD be used when sending EVPN encapsulated packets over an MP2P LSP.

- If a network uses entropy labels [RFC6790], then the control word SHOULD NOT be used when sending EVPN encapsulated packets over an MP2P LSP.

- When sending EVPN encapsulated packets over a P2MP LSP or TE P2P LSP, the control word SHOULD NOT be used.
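The rules above, together with the control word layout given in this section, can be sketched as follows. This is a non-authoritative illustration: every name (`LspType`, `first_nibble_looks_like_ip`, `use_control_word`, `encode_control_word`, `decode_control_word`) is hypothetical. The rules do not say what to do when a network uses both DPI-based ECMP and entropy labels, or neither; the sketch assumes the entropy-label rule wins and that no control word is used in the remaining case.

```python
"""Hypothetical sketch of the frame-ordering rules and control word."""
import struct
from enum import Enum


class LspType(Enum):
    MP2P = "mp2p"
    P2MP = "p2mp"
    TE_P2P = "te-p2p"


def first_nibble_looks_like_ip(dest_mac: bytes) -> bool:
    """True if the high-order nibble of the destination MAC's first octet
    (the octet right after the last MPLS label) is 0x4 or 0x6, which a
    DPI-based ECMP hash could read as an IPv4/IPv6 version field."""
    if len(dest_mac) != 6:
        raise ValueError("a MAC address is 6 octets")
    return (dest_mac[0] >> 4) in (0x4, 0x6)


def use_control_word(lsp: LspType, dpi_ecmp: bool, entropy_label: bool) -> bool:
    """Apply the SHOULD / SHOULD NOT rules (precedence is an assumption)."""
    if lsp in (LspType.P2MP, LspType.TE_P2P):
        return False        # SHOULD NOT over P2MP or TE P2P LSPs
    if entropy_label:
        return False        # SHOULD NOT over MP2P when entropy labels are used
    return dpi_ecmp         # SHOULD over MP2P when ECMP uses DPI


def encode_control_word(seq: int = 0) -> bytes:
    """First 4 bits 0, 12 reserved bits sent as 0, then a 16-bit sequence
    number (0 by default), in network byte order."""
    if not 0 <= seq <= 0xFFFF:
        raise ValueError("sequence number is 16 bits")
    return struct.pack("!I", seq)


def decode_control_word(word: bytes) -> int:
    """Return the sequence number; reserved bits are ignored on receipt."""
    (value,) = struct.unpack("!I", word[:4])
    if value >> 28 != 0:
        raise ValueError("first nibble of the control word must be 0")
    return value & 0xFFFF
```

For example, a destination MAC beginning with octet 0x4a would be flagged by `first_nibble_looks_like_ip`, which is the case the control word is meant to protect on MP2P LSPs whose P nodes hash using deep packet inspection.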
The control word is defined as follows:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |0 0 0 0|       Reserved        |       Sequence Number         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In the above diagram, the first 4 bits MUST be set to 0. The rest of the first 16 bits are reserved for future use; they MUST be set to 0 when transmitting and MUST be ignored upon receipt. The next 16 bits provide a sequence number that MUST also be set to zero by default.

19. Acknowledgements

Special thanks to Yakov Rekhter for reviewing this draft several times, for providing valuable comments, and for his very engaging discussions on several topics of this draft that helped shape this document. We would also like to thank Pedro Marques, Kaushik Ghosh, Nischal Sheth, Robert Raszuk, Amit Shukla, and Nadeem Mohammed for discussions that helped shape this document. We would also like to thank Han Nguyen for his comments and support of this work. We would also like to thank Steve Kensil and Reshad Rahman for their reviews. We would like to thank Jorge Rabadan for his contribution to section 5 of this draft. We would like to thank Thomas Morin for his review of this draft and his contribution of section 8.6. Last but not least, many thanks to Jakob Heitz for his help to improve several sections of this draft.

We would also like to thank Clarence Filsfils, Dennis Cai, Quaizar Vohra, Kireeti Kompella, and Apurva Mehta for their contributions to this document.

20. Security Considerations

Security considerations discussed in [RFC4761] and [RFC4762] apply to this document for MAC learning in the data plane over an Attachment Circuit (AC) and for flooding of unknown unicast and ARP messages over the MPLS/IP core.
Security considerations discussed in [RFC4364] apply to this document for MAC learning in the control plane over the MPLS/IP core. This section describes additional considerations.

As mentioned in [RFC4761], there are two aspects to achieving data privacy and protecting against denial-of-service attacks in a VPN: securing the control plane and protecting the forwarding path. Compromise of the control plane could result in a PE sending customer data belonging to some EVPN to another EVPN, or black-holing EVPN customer data, or even sending it to an eavesdropper, none of which is acceptable from a data privacy point of view. In addition, compromise of the control plane could result in black-holing EVPN customer data and could provide opportunities for unauthorized EVPN data usage (e.g., exploiting traffic replication within a multicast tree to amplify a denial-of-service attack based on sending large amounts of traffic).

The mechanisms in this document use BGP for the control plane. Hence, techniques such as those in [RFC5925] help authenticate BGP messages, making it harder to spoof updates (which can be used to divert EVPN traffic to the wrong EVPN instance) or withdrawals (denial-of-service attacks). In the multi-AS methods (b) and (c), this also means protecting the inter-AS BGP sessions between the ASBRs, the PEs, or the Route Reflectors.

Note that [RFC5925] will not help in keeping MPLS labels private; knowing the labels, one can eavesdrop on EVPN traffic. However, this requires access to the data path within an SP network, which is assumed to be composed of trusted nodes/links.

One of the requirements for protecting the data plane is that the MPLS labels be accepted only from valid interfaces. For a PE, valid interfaces comprise links from other routers in the PE's own AS.
For an ASBR, valid interfaces comprise links from other routers in the ASBR's own AS, and links from other ASBRs in ASes that have instances of a given EVPN. It is especially important in the case of multi-AS EVPN instances that EVPN packets be accepted only from valid interfaces.

It is also important to help limit malicious traffic entering a network from an impostor MAC address. The mechanism described in section 15.1 shows how duplicate MAC addresses can be detected and continuous false MAC mobility can be prevented. The mechanism described in section 15.2 shows how MAC addresses can be pinned to a given Ethernet Segment, such that if they appear behind any other Ethernet Segment, the traffic for those MAC addresses is prevented from entering the EVPN network from the other Ethernet Segments.

21. Co-authors

In addition to the authors listed on the front page, the following individuals have also helped to shape this document:

   Keyur Patel
   Samer Salam
   Sami Boutros
   Cisco

   Yakov Rekhter
   Ravi Shekhar
   Juniper Networks

   Florin Balus
   Nuage Networks

22. IANA Considerations

This document defines a new NLRI, called "EVPN", to be carried in BGP using multiprotocol extensions. This NLRI uses the existing AFI of 25 (L2VPN). IANA has assigned it a SAFI value of 70.

23. References

23.1 Normative References

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761, January 2007.

[RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service (VPLS) Using Label Distribution Protocol (LDP) Signaling", RFC 4762, January 2007.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

23.2 Informative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[EVPN-REQ] Sajassi, A., Aggarwal, R., et al., "Requirements for Ethernet VPN", draft-ietf-l2vpn-evpn-req-04.txt, July 2013.

[VPLS-MCAST] Aggarwal, R., et al., "Multicast in VPLS", draft-ietf-l2vpn-vpls-mcast-14.txt, July 2013.

[RT-CONSTRAIN] Marques, P., et al., "Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, November 2006.

[RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP Authentication Option", RFC 5925, June 2010.

[RFC6790] Kompella, K., et al., "The Use of Entropy Labels in MPLS Forwarding", RFC 6790, November 2012.

24. Authors' Addresses

Ali Sajassi
Cisco
Email: sajassi@cisco.com

Rahul Aggarwal
Email: raggarwa_1@yahoo.com

Nabil Bitar
Verizon Communications
Email: nabil.n.bitar@verizon.com

Aldrin Isaac
Bloomberg
Email: aisaac71@bloomberg.net

James Uttaro
AT&T
Email: uttaro@att.com

John Drake
Juniper Networks
Email: jdrake@juniper.net

Wim Henderickx
Alcatel-Lucent
Email: wim.henderickx@alcatel-lucent.com