Network Working Group                                         A. Sajassi
INTERNET-DRAFT                                                     Cisco
Category: Standards Track
                                                             R. Aggarwal
N. Bitar                                                          Arktan
Verizon
                                                           W. Henderickx
J. Drake                                                  Alcatel-Lucent
Juniper Networks
                                                            Aldrin Isaac
                                                               Bloomberg

                                                               J. Uttaro
                                                                    AT&T

Expires: August 12, 2014                               February 12, 2014


                      BGP MPLS Based Ethernet VPN
                        draft-ietf-l2vpn-evpn-05

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document describes procedures for BGP MPLS based Ethernet VPNs
   (EVPN).

Table of Contents

   1. Specification of requirements
   2. Terminology
   3. Introduction
   4. BGP MPLS Based EVPN Overview
   5. Ethernet Segment
   6. Ethernet Tag
     6.1 VLAN Based Service Interface
     6.2 VLAN Bundle Service Interface
       6.2.1 Port Based Service Interface
     6.3 VLAN Aware Bundle Service Interface
       6.3.1 Port Based VLAN Aware Service Interface
   7. BGP EVPN NLRI
     7.1. Ethernet Auto-Discovery Route
     7.2. MAC/IP Advertisement Route
     7.3. Inclusive Multicast Ethernet Tag Route
     7.4 Ethernet Segment Route
     7.5 ESI Label Extended Community
     7.6 ES-Import Route Target
     7.7 MAC Mobility Extended Community
     7.8 Default Gateway Extended Community
   8. Multi-homing Functions
     8.1 Multi-homed Ethernet Segment Auto-Discovery
       8.1.1 Constructing the Ethernet Segment Route
     8.2 Fast Convergence
       8.2.1 Constructing the Ethernet A-D per Ethernet Segment
             (ES) Route
         8.2.1.1. Ethernet A-D Route Targets
     8.3 Split Horizon
       8.3.1 ESI Label Assignment
         8.3.1.1 Ingress Replication
         8.3.1.2. P2MP MPLS LSPs
     8.4 Aliasing and Backup-Path
       8.4.1 Constructing the Ethernet A-D per EVPN Instance (EVI)
             Route
         8.4.1.1 Ethernet A-D Route Targets
     8.5 Designated Forwarder Election
     8.6. Interoperability with Single-homing PEs
   9. Determining Reachability to Unicast MAC Addresses
     9.1. Local Learning
     9.2. Remote learning
       9.2.1. Constructing the BGP EVPN MAC/IP Address
              Advertisement
       9.2.2 Route Resolution
   10. ARP and ND
     10.1 Default Gateway
   11. Handling of Multi-Destination Traffic
     11.1. Construction of the Inclusive Multicast Ethernet Tag
           Route
     11.2. P-Tunnel Identification
   12. Processing of Unknown Unicast Packets
     12.1. Ingress Replication
     12.2. P2MP MPLS LSPs
   13. Forwarding Unicast Packets
     13.1. Forwarding packets received from a CE
     13.2. Forwarding packets received from a remote PE
       13.2.1. Unknown Unicast Forwarding
       13.2.2. Known Unicast Forwarding
   14. Load Balancing of Unicast Frames
     14.1. Load balancing of traffic from an PE to remote CEs
       14.1.1 Single-Active Redundancy Mode
       14.1.2 All-Active Redundancy Mode
     14.2. Load balancing of traffic between an PE and a local CE
       14.2.1. Data plane learning
       14.2.2. Control plane learning
   15. MAC Mobility
     15.1. MAC Duplication Issue
     15.2. Sticky MAC addresses
   16. Multicast & Broadcast
     16.1. Ingress Replication
     16.2. P2MP LSPs
       16.2.1. Inclusive Trees
   17. Convergence
     17.1. Transit Link and Node Failures between PEs
     17.2. PE Failures
     17.3. PE to CE Network Failures
   18. Frame Ordering
   19. Acknowledgements
   20. Security Considerations
   21. Contributors
   22. IANA Considerations
   23. References
     23.1 Normative References
     23.2 Informative References
   24. Author's Address

1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Terminology

   Bridge Domain:

   Broadcast Domain:

   CE: Customer Edge device e.g., host or router or switch

   EVI: An EVPN instance spanning across the PEs participating in that
   VPN

   MAC-VRF: A Virtual Routing and Forwarding table for MAC addresses on
   a PE for an EVI

   Ethernet Segment Identifier (ESI): If a CE is multi-homed to two or
   more PEs, the set of Ethernet links that attaches the CE to the PEs
   is an 'Ethernet segment'. Ethernet segments MUST have a unique
   non-zero identifier, the 'Ethernet Segment Identifier'.

   Ethernet Tag: An Ethernet Tag identifies a particular broadcast
   domain, e.g., a VLAN. An EVPN instance consists of one or more
   broadcast domains. Ethernet tag(s) are assigned to the broadcast
   domains of a given EVPN instance by the provider of that EVPN, and
   each PE in that EVPN instance performs a mapping between broadcast
   domain identifier(s) understood by each of its attached CEs and the
   corresponding Ethernet tag.

   LACP: Link Aggregation Control Protocol

   MP2MP: Multipoint to Multipoint

   P2MP: Point to Multipoint

   P2P: Point to Point

   Single-Active Redundancy Mode: When only a single PE, among a group
   of PEs attached to an Ethernet segment, is allowed to forward traffic
   to/from that Ethernet Segment, then the Ethernet segment is defined
   to be operating in Single-Active redundancy mode.

   All-Active Redundancy Mode: When all PEs attached to an Ethernet
   segment are allowed to forward traffic to/from that Ethernet Segment,
   then the Ethernet segment is defined to be operating in All-Active
   redundancy mode.

3. Introduction

   This document describes procedures for BGP MPLS based Ethernet VPNs
   (EVPN). The procedures described here are intended to meet the
   requirements specified in [EVPN-REQ]. Please refer to [EVPN-REQ] for
   the detailed requirements and motivation.
   EVPN requires extensions to
   existing IP/MPLS protocols as described in this document. In addition
   to these extensions EVPN uses several building blocks from existing
   MPLS technologies.

4. BGP MPLS Based EVPN Overview

   This section provides an overview of EVPN. An EVPN instance comprises
   CEs that are connected to PEs that form the edge of the MPLS
   infrastructure. A CE may be a host, a router or a switch. The PEs
   provide virtual Layer 2 bridged connectivity between the CEs. There
   may be multiple EVPN instances in the provider's network.

   The PEs may be connected by an MPLS LSP infrastructure which provides
   the benefits of MPLS technology such as fast-reroute, resiliency,
   etc. The PEs may also be connected by an IP infrastructure in which
   case IP/GRE tunneling or other IP tunneling can be used between the
   PEs. The detailed procedures in this version of this document are
   specified only for MPLS LSPs as the tunneling technology. However
   these procedures are designed to be extensible to IP tunneling as the
   PSN tunneling technology.

   In an EVPN, MAC learning between PEs occurs not in the data plane (as
   happens with traditional bridging) but in the control plane. Control
   plane learning offers greater control over the MAC learning process,
   such as restricting who learns what, and the ability to apply
   policies. Furthermore, the control plane chosen for advertising MAC
   reachability information is multi-protocol (MP) BGP (similar to IP
   VPNs (RFC 4364)). This provides greater scalability and the ability
   to preserve the "virtualization" or isolation of groups of
   interacting agents (hosts, servers, virtual machines) from each
   other. In EVPN, PEs advertise the MAC addresses learned from the CEs
   that are connected to them, along with an MPLS label, to other PEs in
   the control plane using MP-BGP.
   Control plane learning enables load
   balancing of traffic to and from CEs that are multi-homed to multiple
   PEs. This is in addition to load balancing across the MPLS core via
   multiple LSPs between the same pair of PEs. In other words, it allows
   CEs to connect to multiple active points of attachment. It also
   improves convergence times in the event of certain network failures.

   However, learning between PEs and CEs is done by the method best
   suited to the CE: data plane learning, IEEE 802.1x, LLDP, 802.1aq,
   ARP, management plane or other protocols.

   It is a local decision as to whether the Layer 2 forwarding table on
   a PE is populated with all the MAC destination addresses known to
   the control plane, or whether the PE implements a cache-based scheme.
   For instance, the MAC forwarding table may be populated only with the
   MAC destinations of the active flows transiting a specific PE.

   The policy attributes of EVPN are very similar to those of IP-VPN. An
   EVPN instance requires a Route-Distinguisher (RD) which is unique per
   PE and one or more globally unique Route-Targets (RTs). A CE attaches
   to a MAC-VRF on a PE, on an Ethernet interface which may be
   configured for one or more Ethernet Tags, e.g., VLAN IDs. Some
   deployment scenarios guarantee uniqueness of VLAN IDs across EVPN
   instances: all points of attachment for a given EVPN instance use the
   same VLAN ID, and no other EVPN instance uses this VLAN ID. This
   document refers to this case as a "Unique VLAN EVPN" and describes
   simplified procedures to optimize for it.

5. Ethernet Segment

   If a CE is multi-homed to two or more PEs, the set of Ethernet links
   constitutes an "Ethernet Segment". An Ethernet segment may appear to
   the CE as a Link Aggregation Group (LAG). Ethernet segments have an
   identifier, called the "Ethernet Segment Identifier" (ESI), which is
   encoded as a ten-octet integer.
   The following two ESI values are
   reserved:

   - ESI 0 denotes a single-homed CE.

   - ESI {0xFF} (repeated 10 times) is known as MAX-ESI and is
   reserved.

   In general, an Ethernet segment MUST have a non-reserved ESI that is
   unique network wide (i.e., across all EVPN instances on all the PEs).
   If the CE(s) constituting an Ethernet Segment is (are) managed by the
   network operator, then ESI uniqueness should be guaranteed; however,
   if the CE(s) is (are) not managed, then the operator MUST configure a
   network-wide unique ESI for that Ethernet Segment. This is required
   to enable auto-discovery of Ethernet Segments and DF election.

   In a network with managed and unmanaged CEs, the ESI has the
   following format:

      +---+---+---+---+---+---+---+---+---+---+
      | T |          ESI Value                |
      +---+---+---+---+---+---+---+---+---+---+

   Where:

   T (ESI Type) is a 1-byte field (most significant octet) that
   specifies the format of the remaining nine bytes (ESI Value). The
   following 6 ESI types can be used:

   - Type 0 (T=0x00) - This type indicates an arbitrary nine-octet ESI
   value, which is managed and configured by the operator.

   - Type 1 (T=0x01) - When IEEE 802.1AX LACP is used between the PEs
   and CEs, this ESI type indicates an auto-generated ESI value
   determined from LACP by concatenating the following parameters:

      + CE LACP System MAC address (six octets). The CE LACP System
        MAC address MUST be encoded in the high order six octets of
        the ESI Value field.

      + CE LACP Port Key (two octets). The CE LACP port key MUST be
        encoded in the two octets next to the System MAC address.

      + The remaining octet will be set to 0x00.

      As far as the CE is concerned, it would treat the multiple PEs
      that it is connected to as the same switch. This allows the CE
      to aggregate links that are attached to different PEs in the
      same bundle.
   This mechanism could be used only if it produces ESIs that satisfy
   the uniqueness requirement specified above.

   - Type 2 (T=0x02) - This type is used in the case of indirectly
   connected hosts via a bridged LAN between the CEs and the PEs. The
   ESI Value is auto-generated and determined based on the Layer 2
   bridge protocol as follows: If MST is used in the bridged LAN then
   the value of the ESI is derived by listening to BPDUs on the Ethernet
   segment. To achieve this the PE is not required to run MST. However,
   the PE must learn the Root Bridge MAC address and Bridge Priority of
   the root of the Internal Spanning Tree (IST) by listening to the
   BPDUs. The ESI Value is constructed as follows:

      + Root Bridge MAC address (six octets). The Root Bridge MAC
        address MUST be encoded in the high order six octets of the
        ESI Value field.

      + Root Bridge Priority (two octets). The Root Bridge Priority
        MUST be encoded in the two octets next to the Root Bridge MAC
        address.

      + The remaining octet will be set to 0x00.

   This mechanism could be used only if it produces ESIs that satisfy
   the uniqueness requirement specified above.

   - Type 3 (T=0x03) - This type indicates a MAC-based ESI Value that
   can be auto-generated or configured by the operator. The ESI Value is
   constructed as follows:

      + System MAC address (six octets). The System MAC address MUST
        be encoded in the high order six octets of the ESI Value field.

      + Local Discriminator value (three octets). The Local
        Discriminator MUST be encoded in the low order three octets
        of the ESI Value.

   This mechanism could be used only if it produces ESIs that satisfy
   the uniqueness requirement specified above.

   - Type 4 (T=0x04) - This type indicates an IP-based ESI Value that
   can be auto-generated or configured by the operator. The ESI Value is
   constructed as follows:

      + IP address (four octets).
        This is an IPv4 address owned by
        the system and MUST be encoded in the high order four octets
        of the ESI Value field.

      + Local Discriminator value (four octets). The Local Discriminator
        MUST be encoded in the four octets next to the IP address.

      + The low order octet of the ESI Value will be set to 0x00.

   This mechanism could be used only if it produces ESIs that satisfy
   the uniqueness requirement specified above.

   - Type 5 (T=0x05) - This type indicates an AS-based ESI Value that
   can be auto-generated or configured by the operator. The ESI Value is
   constructed as follows:

      + AS number (four octets). This is an AS number owned by the
        system and MUST be encoded in the high order four octets of the
        ESI Value field. If a two-octet AS number is used, the two
        high order octets of this field will be 0x0000.

      + Local Discriminator value (four octets). The Local Discriminator
        MUST be encoded in the four octets next to the AS number.

      + The low order octet of the ESI Value will be set to 0x00.

   This mechanism could be used only if it produces ESIs that satisfy
   the uniqueness requirement specified above.

6. Ethernet Tag

   An Ethernet Tag identifies a particular broadcast domain, e.g., a
   VLAN, in an EVPN Instance. An EVPN Instance consists of one or more
   broadcast domains (one or more VLANs). VLANs are assigned to a given
   EVPN Instance by the provider of the EVPN service. A given VLAN can
   itself be represented by multiple VLAN IDs (VIDs). In such cases, the
   PEs participating in that VLAN for a given EVPN instance are
   responsible for performing VLAN ID translation to/from locally
   attached CE devices.

   If a VLAN is represented by a single VID across all PE devices
   participating in that VLAN for that EVPN instance, then there is no
   need for VID translation at the PEs.
   Furthermore, some deployment
   scenarios guarantee uniqueness of VIDs across all EVPN instances;
   all points of attachment for a given EVPN instance use the same VID
   and no other EVPN instances use that VID. This allows the RT(s) for
   each EVPN instance to be derived automatically from the corresponding
   VID, as described in section 9.4.1.1.1 "Auto-Derivation from the
   Ethernet Tag ID".

   The following subsections discuss the relationship between broadcast
   domains (e.g., VLANs), Ethernet Tags (e.g., VIDs), and MAC-VRFs as
   well as the setting of the Ethernet Tag Identifier, in the various
   EVPN BGP routes (defined in section 7), for the different types of
   service interfaces described in [EVPN-REQ].

   The following Ethernet Tag value is reserved:

   - Ethernet Tag {0xFFFFFFFF} is known as MAX-ET

6.1 VLAN Based Service Interface

   With this service interface, an EVPN instance consists of only a
   single broadcast domain (e.g., a single VLAN). Therefore, there is a
   one-to-one mapping between a VID on this interface and a MAC-VRF.
   Since a MAC-VRF corresponds to a single VLAN, it consists of a single
   bridge domain corresponding to that VLAN. If the VLAN is represented
   by different VIDs on different PEs, then each PE needs to perform VID
   translation for frames destined to its attached CEs. In such
   scenarios, the Ethernet frames transported over the MPLS/IP network
   SHOULD remain tagged with the originating VID and a VID translation
   MUST be supported in the data path and MUST be performed on the
   disposition PE. The Ethernet Tag Identifier in all EVPN routes MUST
   be set to 0.

6.2 VLAN Bundle Service Interface

   With this service interface, an EVPN instance corresponds to several
   broadcast domains (e.g., several VLANs); however, only a single
   bridge domain is maintained per MAC-VRF which means multiple VLANs
   share the same bridge domain.
   This implies MAC addresses MUST be
   unique across different VLANs for this service to work. In other
   words, there is a many-to-one mapping between VLANs and a MAC-VRF,
   and the MAC-VRF consists of a single bridge domain. Furthermore, a
   single VLAN must be represented by a single VID, i.e., no VID
   translation is allowed for this service interface type. The MPLS
   encapsulated frames MUST remain tagged with the originating VID. Tag
   translation is NOT permitted. The Ethernet Tag Identifier in all EVPN
   routes MUST be set to 0.

6.2.1 Port Based Service Interface

   This service interface is a special case of the VLAN Bundle service
   interface, where all of the VLANs on the port are part of the same
   service and map to the same bundle. The procedures are identical to
   those described in section 6.2.

6.3 VLAN Aware Bundle Service Interface

   With this service interface, an EVPN instance consists of several
   broadcast domains (e.g., several VLANs) with each VLAN having its own
   bridge domain, i.e., multiple bridge domains (one per VLAN) are
   maintained by a single MAC-VRF corresponding to the EVPN instance. In
   the case where a single VLAN is represented by different VIDs on
   different CEs and thus tag (VID) translation is required, a
   normalized Ethernet Tag (VID) MUST be carried in the MPLS
   encapsulated frames and a tag translation function MUST be supported
   in the data path. This translation MUST be performed in the data
   path on both the imposition and the disposition PEs (translating to
   the normalized tag on the imposition PE and to the local tag on the
   disposition PE). The Ethernet Tag Identifier in all EVPN routes MUST
   be set to the normalized Ethernet Tag assigned by the EVPN provider.
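
   As an illustration only, and not as part of the procedures specified
   here, the Ethernet Tag Identifier rules of sections 6.1 through 6.3
   could be sketched as follows. The names ServiceInterface,
   ethernet_tag_id and MAX_ET are hypothetical and chosen for this
   example; rejecting the reserved MAX-ET value as a normalized tag is
   an assumption of the sketch rather than a stated requirement.

```python
from enum import Enum

# Reserved Ethernet Tag value (MAX-ET); the Ethernet Tag ID field is 4 octets.
MAX_ET = 0xFFFFFFFF


class ServiceInterface(Enum):
    """Service interface types of sections 6.1-6.3 (illustrative names)."""
    VLAN_BASED = "vlan-based"
    VLAN_BUNDLE = "vlan-bundle"
    VLAN_AWARE_BUNDLE = "vlan-aware-bundle"


def ethernet_tag_id(interface, normalized_tag=None):
    """Return the Ethernet Tag ID to carry in EVPN routes.

    VLAN Based and VLAN Bundle interfaces always use 0; the VLAN Aware
    Bundle interface uses the provider-assigned normalized tag.
    """
    if interface in (ServiceInterface.VLAN_BASED, ServiceInterface.VLAN_BUNDLE):
        return 0
    if normalized_tag is None:
        raise ValueError("VLAN Aware Bundle needs a normalized Ethernet Tag")
    # Assumption of this sketch: the tag must fit in 4 octets and must not
    # be the reserved MAX-ET value.
    if not 0 <= normalized_tag < MAX_ET:
        raise ValueError("normalized Ethernet Tag out of range")
    return normalized_tag
```

   For example, ethernet_tag_id(ServiceInterface.VLAN_AWARE_BUNDLE,
   normalized_tag=100) yields 100, whereas both of the other interface
   types yield 0 regardless of the VID seen on the attachment circuit.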

6.3.1 Port Based VLAN Aware Service Interface

   This service interface is a special case of the VLAN Aware Bundle
   service interface, where all of the VLANs on the port are part of the
   same service and map to the same bundle. The procedures are identical
   to those described in section 6.3.

7. BGP EVPN NLRI

   This document defines a new BGP NLRI, called the EVPN NLRI.

   Following is the format of the EVPN NLRI:

      +-----------------------------------+
      |    Route Type (1 octet)           |
      +-----------------------------------+
      |     Length (1 octet)              |
      +-----------------------------------+
      | Route Type specific (variable)    |
      +-----------------------------------+

   The Route Type field defines encoding of the rest of the EVPN NLRI
   (Route Type specific EVPN NLRI).

   The Length field indicates the length in octets of the Route Type
   specific field of EVPN NLRI.

   This document defines the following Route Types:

      + 1 - Ethernet Auto-Discovery (A-D) route
      + 2 - MAC advertisement route
      + 3 - Inclusive Multicast Route
      + 4 - Ethernet Segment Route

   The detailed encoding and procedures for these route types are
   described in subsequent sections.

   The EVPN NLRI is carried in BGP [RFC4271] using BGP Multiprotocol
   Extensions [RFC4760] with an AFI of 25 (L2VPN) and a SAFI of 70
   (EVPN). The NLRI field in the MP_REACH_NLRI/MP_UNREACH_NLRI attribute
   contains the EVPN NLRI (encoded as specified above).

   In order for two BGP speakers to exchange labeled EVPN NLRI, they
   must use BGP Capabilities Advertisement to ensure that they both are
   capable of properly processing such NLRI. This is done as specified
   in [RFC4760], by using capability code 1 (multiprotocol BGP) with an
   AFI of 25 (L2VPN) and a SAFI of 70 (EVPN).

7.1. Ethernet Auto-Discovery Route

   An Ethernet A-D route type specific EVPN NLRI consists of the
   following:

      +---------------------------------------+
      |  RD (8 octets)                        |
      +---------------------------------------+
      |Ethernet Segment Identifier (10 octets)|
      +---------------------------------------+
      |  Ethernet Tag ID (4 octets)           |
      +---------------------------------------+
      |  MPLS Label (3 octets)                |
      +---------------------------------------+

   For the purpose of BGP route key processing, only the Ethernet
   Segment ID and the Ethernet Tag ID are considered to be part of the
   prefix in the NLRI. The MPLS Label field is to be treated as a
   route attribute as opposed to being part of the route.

   For procedures and usage of this route please see section 8.2 "Fast
   Convergence" and section 8.4 "Aliasing and Backup-Path".

7.2. MAC/IP Advertisement Route

   A MAC advertisement route type specific EVPN NLRI consists of the
   following:

      +---------------------------------------+
      |  RD (8 octets)                        |
      +---------------------------------------+
      |Ethernet Segment Identifier (10 octets)|
      +---------------------------------------+
      |  Ethernet Tag ID (4 octets)           |
      +---------------------------------------+
      |  MAC Address Length (1 octet)         |
      +---------------------------------------+
      |  MAC Address (6 octets)               |
      +---------------------------------------+
      |  IP Address Length (1 octet)          |
      +---------------------------------------+
      |  IP Address (0 or 4 or 16 octets)     |
      +---------------------------------------+
      |  MPLS Label1 (3 octets)               |
      +---------------------------------------+
      |  MPLS Label2 (0 or 3 octets)          |
      +---------------------------------------+

   For the purpose of BGP route key processing, only the Ethernet Tag
   ID, MAC Address Length, MAC Address, IP Address Length, and IP
   Address fields are considered to be part of the prefix in the
   NLRI.
   The Ethernet Segment Identifier and MPLS Label fields are to be
   treated as route attributes as opposed to being part of the "route".

   For procedures and usage of this route please see section 9
   "Determining Reachability to Unicast MAC Addresses" and section 14
   "Load Balancing of Unicast Frames".

7.3. Inclusive Multicast Ethernet Tag Route

   An Inclusive Multicast Ethernet Tag route type specific EVPN NLRI
   consists of the following:

      +---------------------------------------+
      |  RD (8 octets)                        |
      +---------------------------------------+
      |  Ethernet Tag ID (4 octets)           |
      +---------------------------------------+
      |  IP Address Length (1 octet)          |
      +---------------------------------------+
      |  Originating Router's IP Addr         |
      |          (4 or 16 octets)             |
      +---------------------------------------+

   For procedures and usage of this route please see section 11
   "Handling of Multi-Destination Traffic", section 12 "Processing of
   Unknown Unicast Packets" and section 16 "Multicast & Broadcast".

7.4 Ethernet Segment Route

   The Ethernet Segment Route is encoded in the EVPN NLRI using the
   Route Type value of 4. The Route Type Specific field of the NLRI is
   formatted as follows:

      +---------------------------------------+
      |  RD (8 octets)                        |
      +---------------------------------------+
      |Ethernet Segment Identifier (10 octets)|
      +---------------------------------------+
      |  IP Address Length (1 octet)          |
      +---------------------------------------+
      |  Originating Router's IP Addr         |
      |          (4 or 16 octets)             |
      +---------------------------------------+

   For procedures and usage of this route please see section 8.5
   "Designated Forwarder Election". The IP address length is in bits.

7.5 ESI Label Extended Community

   This extended community is a new transitive extended community with
   a Type field of 0x06 and a Sub-Type of 0x01.
   It may be
   advertised along with Ethernet Auto-Discovery routes and it enables
   split-horizon procedures for multi-homed sites as described in
   section 8.3 "Split Horizon".

   Each ESI Label Extended Community is encoded as an 8-octet value as
   follows:

       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Type=0x06     | Sub-Type=0x01 | Flags(1 octet)|  Reserved=0   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Reserved=0   |          ESI Label                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The low order bit of the flags octet is defined as the
   "Single-Active" bit. A value of 0 means that the multi-homed site is
   operating in All-Active redundancy mode and a value of 1 means that
   the multi-homed site is operating in Single-Active redundancy mode.

   The second low order bit of the flags octet is defined as the
   "Root-Leaf" bit. A value of 0 means that this label is associated
   with a Root site, whereas a value of 1 means that this label is
   associated with a Leaf site. The other bits must be set to 0.

7.6 ES-Import Route Target

   This is a new transitive Route Target extended community carried with
   the Ethernet Segment route. When used, it enables all the PEs
   connected to the same multi-homed site to import the Ethernet Segment
   routes. The value is derived automatically from the ESI by encoding
   the high order 6-byte portion of the 9-byte ESI Value in the
   ES-Import Route Target.
The format of this extended community is as follows:

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06     | Sub-Type=0x02 |           ES-Import           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       ES-Import Cont'd                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This document expands the definition of the Route Target extended
community to allow the value of the high order octet (Type field) to
be 0x06 (in addition to the values specified in RFC4360). The value
of the low order octet (Sub-Type field) of 0x02 indicates that this
extended community is of type "Route Target". The new value for the
Type field of 0x06 indicates that the structure of this RT is a
six-byte value (e.g., a MAC address). A BGP speaker that implements
RT-Constrain (RFC4684) MUST apply the RT-Constrain procedures to the
ES-Import RT as well.

For procedures and usage of this attribute, please see section 8.1
"Multi-homed Ethernet Segment Auto-Discovery".

7.7 MAC Mobility Extended Community

This extended community is a new transitive extended community with
the Type field of 0x06 and the Sub-Type of 0x00. It may be advertised
along with MAC Advertisement routes. The procedures for using this
Extended Community are described in section 16 "MAC Mobility".

The MAC Mobility Extended Community is encoded as an 8-octet value as
follows:

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06     | Sub-Type=0x00 | Flags(1 octet)|  Reserved=0   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The low order bit of the flags octet is defined as the
"Sticky/static" flag and may be set to 1.
A value of 1 means that the MAC address is static and cannot move.

7.8 Default Gateway Extended Community

The Default Gateway community is an Extended Community of an Opaque
Type (see section 3.3 of RFC4360). It is a transitive community,
which means that the first octet is 0x03. The value of the second
octet (Sub-Type) is 0x0d (Default Gateway) as defined by IANA. The
Value field of this community is reserved (set to 0 by the senders,
ignored by the receivers).

8. Multi-homing Functions

This section discusses the functions, procedures and associated BGP
routes used to support multi-homing in EVPN. This covers both
multi-homed device (MHD) and multi-homed network (MHN) scenarios.

8.1 Multi-homed Ethernet Segment Auto-Discovery

PEs connected to the same Ethernet segment can automatically discover
each other with minimal to no configuration through the exchange of
the Ethernet Segment route.

8.1.1 Constructing the Ethernet Segment Route

The Route-Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. The value
field comprises an IP address of the PE (typically, the loopback
address) followed by 0's.

The Ethernet Segment Identifier MUST be set to the ten-octet ESI
described in section 6.

The BGP advertisement that advertises the Ethernet Segment route MUST
also carry an ES-Import extended community attribute, as defined in
section 7.6.

The Ethernet Segment Route filtering MUST be done such that the
Ethernet Segment Route is imported only by the PEs that are
multi-homed to the same Ethernet Segment. To that end, each PE that
is connected to a particular Ethernet segment constructs an import
filtering rule to import a route that carries the ES-Import extended
community, constructed from the ESI.

8.2 Fast Convergence

In EVPN, MAC address reachability is learnt via the BGP control-plane
over the MPLS network.
As such, in the absence of any fast protection mechanism, the network
convergence time is a function of the number of MAC Advertisement
routes that must be withdrawn by the PE encountering a failure. For
highly scaled environments, this scheme yields slow convergence.

To alleviate this, EVPN defines a mechanism to efficiently and
quickly signal, to remote PE nodes, the need to update their
forwarding tables upon the occurrence of a failure in connectivity to
an Ethernet segment. This is done by having each PE advertise a set
of Ethernet A-D per Ethernet segment (per ES) routes for each locally
attached Ethernet segment (refer to section 8.2.1 below for details
on how this route is constructed). Upon a failure in connectivity to
the attached segment, the PE withdraws the corresponding Ethernet A-D
route. This triggers all PEs that receive the withdrawal to update
their next-hop adjacencies for all MAC addresses associated with the
Ethernet segment in question. If no other PE had advertised an
Ethernet A-D route for the same segment, then the PE that received
the withdrawal simply invalidates the MAC entries for that segment.
Otherwise, the PE updates the next-hop adjacencies to point to the
backup PE(s).

8.2.1 Constructing the Ethernet A-D per Ethernet Segment (ES) Route

This section describes the procedures used to construct the Ethernet
A-D per ES route, which is used for fast convergence (as discussed
above) and for advertising the ESI label used for split-horizon
filtering (as discussed in section 8.3). Support of this route is
MANDATORY.

The Route-Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. The value
field comprises an IP address of the PE (typically, the loopback
address) followed by a number unique to the PE.

The Ethernet Segment Identifier MUST be a ten-octet entity as
described in section "Ethernet Segment".
This document does not specify the use of the Ethernet A-D route when
the Segment Identifier is set to 0.

The Ethernet Tag ID MUST be set to MAX-ET.

The MPLS label in the NLRI MUST be set to 0.

The "ESI Label Extended Community" MUST be included in the route. If
All-Active redundancy mode is desired, then the "Single-Active" bit
in the flags of the ESI Label Extended Community MUST be set to 0 and
the MPLS label in that extended community MUST be set to a valid MPLS
label value. The MPLS label in this Extended Community is referred to
as the ESI label and MUST have the same value in each Ethernet A-D
per ES route advertised for the ES. This label MUST be a downstream
assigned MPLS label if the advertising PE is using ingress
replication for receiving multicast, broadcast or unknown unicast
traffic from other PEs. If the advertising PE is using P2MP MPLS LSPs
for sending multicast, broadcast or unknown unicast traffic, then
this label MUST be an upstream assigned MPLS label. The usage of this
label is described in section 8.3.

If Single-Active redundancy mode is desired, then the "Single-Active"
bit in the flags of the ESI Label Extended Community MUST be set to 1
and the ESI label MUST be set to zero.

8.2.1.1. Ethernet A-D Route Targets

Each Ethernet A-D per ES route MUST carry one or more Route Target
(RT) attributes. The set of Ethernet A-D routes per ES MUST carry the
entire set of RTs for all the EVPN instances to which the Ethernet
Segment belongs.

8.3 Split Horizon

Consider a CE that is multi-homed to two or more PEs on an Ethernet
segment ES1 operating in All-Active redundancy mode.
If the CE sends a broadcast, unknown unicast, or multicast (BUM)
packet to one of the non-DF (Designated Forwarder) PEs, say PE1, then
PE1 will forward that packet to all or a subset of the other PEs in
that EVPN instance, including the DF PE for that Ethernet segment. In
this case the DF PE that the CE is multi-homed to MUST drop the
packet and not forward it back to the CE. This filtering is referred
to as "split horizon" filtering in this document.

In order to achieve this split horizon function, every BUM packet
originating from a non-DF PE is encapsulated with an MPLS label that
identifies the Ethernet segment of origin (i.e. the segment from
which the frame entered the EVPN network). This label is referred to
as the ESI label, and MUST be distributed by all PEs when operating
in All-Active redundancy mode using a set of Ethernet A-D per ES
routes per section 8.2.1 above. This route is imported by the PEs
connected to the Ethernet Segment and also by the PEs that have at
least one EVPN instance in common with the Ethernet Segment in the
route. As described in section 8.2.1, the route MUST carry an ESI
Label Extended Community with a valid ESI label. The disposition DF
PE relies on the value of the ESI label to determine whether or not a
BUM frame is allowed to egress a specific Ethernet segment. It should
be noted that if the BUM frame is originated from the DF PE operating
in All-Active multi-homing mode, then the DF PE MAY omit the ESI
label when encapsulating the frame. Furthermore, if the multi-homed
PEs operate in Single-Active redundancy mode, then the packet MUST
NOT be encapsulated with the ESI label and the label value MUST be
set to zero in the ESI Label Extended Community per section 8.2.1
above.
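
As a rough illustration of how the redundancy mode maps onto the ESI Label Extended Community, the following Python sketch builds the 8-octet value. The function name is hypothetical, and placing the 20-bit label in the high-order bits of the 3-octet ESI Label field is an assumption made here by analogy with the MPLS label fields of the NLRI; it is not stated for this community in this document.

```python
import struct

SINGLE_ACTIVE_BIT = 0x01  # low-order bit of the Flags octet


def build_esi_label_ext_community(single_active: bool, esi_label: int = 0) -> bytes:
    """Build the 8-octet ESI Label Extended Community (illustrative helper)."""
    if single_active:
        # Single-Active mode: bit set to 1 and the ESI label set to zero.
        flags, label = SINGLE_ACTIVE_BIT, 0
    else:
        # All-Active mode: bit set to 0 and a valid MPLS label is required.
        # Labels 0-15 are reserved by MPLS, so they are rejected here.
        if not 16 <= esi_label < (1 << 20):
            raise ValueError("All-Active mode requires a valid 20-bit ESI label")
        flags, label = 0, esi_label
    # Assumption: the label sits in the high-order 20 bits of the 3-octet field.
    label_field = (label << 4).to_bytes(3, "big")
    # Type=0x06, Sub-Type=0x01, Flags, two reserved octets, then the ESI label.
    return struct.pack("!BBBH", 0x06, 0x01, flags, 0) + label_field
```

Calling build_esi_label_ext_community(True) yields a community with the Single-Active bit set and an all-zero label, matching the Single-Active rule above.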

8.3.1 ESI Label Assignment

The following subsections describe the assignment procedures for the
ESI label, which differ depending on the type of tunnels being used
to deliver multi-destination packets in the EVPN network.

8.3.1.1 Ingress Replication

The non-DF PEs attached to a given ES that is operating in All-Active
redundancy mode and that use ingress replication to receive BUM
traffic advertise a downstream assigned ESI label in the set of
Ethernet A-D per ES routes for that ES. This label MUST be programmed
in the platform label space by the advertising PE. Further, the
forwarding entry for this label must result in NOT forwarding packets
received with this label onto the Ethernet segment that the label was
distributed for.

Consider PE1 and PE2 that are multi-homed to CE1 on ES1 and operating
in All-Active multi-homing mode. Further consider that PE1 is using
P2P or MP2P LSPs to send packets to PE2. Consider that PE1 is the
non-DF for VLAN1 and PE2 is the DF for VLAN1, and PE1 receives a BUM
packet from CE1 on VLAN1 on ES1. In this scenario, PE2 distributes an
Inclusive Multicast Ethernet Tag route for VLAN1 corresponding to an
EVPN instance. So, when PE1 sends a BUM packet that it receives from
CE1, it MUST first push onto the MPLS label stack the ESI label that
PE2 has distributed for ES1. It MUST then push the MPLS label
distributed by PE2 in the Inclusive Multicast Ethernet Tag route for
VLAN1. The resulting packet is further encapsulated in the P2P or
MP2P LSP label stack required to transmit the packet to PE2. When
PE2 receives this packet, it determines the set of ESIs to replicate
the packet to from the top MPLS label, after any P2P or MP2P LSP
labels have been removed. If the next label is the ESI label assigned
by PE2 for ES1, then PE2 MUST NOT forward the packet onto ES1.
If the next label is an ESI label which has not been assigned by PE2,
then PE2 MUST drop the packet. It should be noted that in this
scenario, if PE2 receives BUM traffic for VLAN1 from CE1, then it
doesn't need to encapsulate the packet with an ESI label when sending
it to PE1, since PE1 can use its DF logic to filter the BUM packets
and thus doesn't need to use split-horizon filtering for ES1.

8.3.1.2. P2MP MPLS LSPs

The non-DF PEs attached to a given ES that is operating in All-Active
redundancy mode and that use P2MP LSPs to send BUM traffic advertise
an upstream assigned ESI label in the set of Ethernet A-D per ES
routes for that ES. This label is upstream assigned by the PE that
advertises the route. This label MUST be programmed by the other PEs
that are connected to the ESI advertised in the route, in the context
label space for the advertising PE. Further, the forwarding entry for
this label must result in NOT forwarding packets received with this
label onto the Ethernet segment that the label was distributed for.
This label MUST also be programmed by the other PEs that import the
route but are not connected to the ESI advertised in the route, in
the context label space for the advertising PE. Further, the
forwarding entry for this label must be a POP with no other
associated action.

Consider PE1 and PE2 that are multi-homed to CE1 on ES1 and operating
in All-Active multi-homing mode. Also consider that PE3 belongs to
one of the EVPN instances of ES1. Further, assume that PE1, which is
the non-DF, uses P2MP MPLS LSPs to send BUM packets. When PE1 sends a
BUM packet that it receives from CE1, it MUST first push onto the
MPLS label stack the ESI label that it has assigned for the ESI that
the packet was received on. The resulting packet is further
encapsulated in the P2MP MPLS label stack necessary to transmit the
packet to the other PEs.
Penultimate hop popping MUST be disabled on the P2MP LSPs used in the
MPLS transport infrastructure for EVPN. When PE2 receives this
packet, it decapsulates the top MPLS label and forwards the packet
using the context label space determined by the top label. If the
next label is the ESI label assigned by PE1 to ES1, then PE2 MUST NOT
forward the packet onto ES1. When PE3 receives this packet, it
decapsulates the top MPLS label and forwards the packet using the
context label space determined by the top label. If the next label is
the ESI label assigned by PE1 to ES1 and PE3 is not connected to ES1,
then PE3 MUST pop the label and flood the packet over all local ESIs
in that EVPN instance. It should be noted that when PE2 sends a BUM
frame over a P2MP LSP, it does not need to encapsulate the frame with
an ESI label because it is the DF for that VLAN.

8.4 Aliasing and Backup-Path

In the case where a CE is multi-homed to multiple PE nodes, using a
LAG with All-Active redundancy, it is possible that only a single PE
learns a set of the MAC addresses associated with traffic transmitted
by the CE. This leads to a situation where remote PE nodes receive
MAC advertisement routes for these addresses from a single PE, even
though multiple PEs are connected to the multi-homed segment. As a
result, the remote PEs are not able to effectively load-balance
traffic among the PE nodes connected to the multi-homed Ethernet
segment. This could be the case, for example, when the PEs perform
data-path learning on the access, and the load-balancing function on
the CE hashes traffic from a given source MAC address to a single PE.
Another scenario where this occurs is when the PEs rely on control
plane learning on the access (e.g. using ARP), since ARP traffic will
be hashed to a single link in the LAG.

To address this issue, EVPN introduces the concept of 'Aliasing',
which is the ability of a PE to signal that it has reachability to an
EVPN instance on a given ES even when it has learnt no MAC addresses
from that EVI/ES. The Ethernet A-D per EVI route is used for this
purpose. A remote PE that receives a MAC advertisement route with a
non-reserved ESI SHOULD consider the advertised MAC address to be
reachable via all PEs that have advertised reachability to that MAC
address's EVI/ES via the combination of an Ethernet A-D per EVI route
for that EVI/ES (and Ethernet Tag if applicable) AND Ethernet A-D per
ES routes for that ES with the 'Single-Active' bit in the flags of
the ESI Label Extended Community set to 0.

Note that the Ethernet A-D per EVI route may be received by a remote
PE before it receives the set of Ethernet A-D per ES routes.
Therefore, in order to handle corner cases and race conditions, the
Ethernet A-D per EVI route MUST NOT be used for traffic forwarding by
a remote PE until it also receives the associated set of Ethernet A-D
per ES routes.

Backup-path is a closely related function, but it is used in
Single-Active redundancy mode. In this case a PE also advertises that
it has reachability to a given EVI/ES using the same combination of
Ethernet A-D per EVI route and Ethernet A-D per ES route as above,
but with the 'Single-Active' bit in the flags of the ESI Label
Extended Community set to 1. A remote PE that receives a MAC
advertisement route with a non-reserved ESI SHOULD consider the
advertised MAC address to be reachable via any PE that has advertised
this combination of Ethernet A-D routes and it SHOULD install a
backup-path for that MAC address.
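
The aliasing and backup-path rules above can be sketched as a next-hop computation on a remote PE. This is only a minimal sketch under stated assumptions: the function name and the dictionary-based data model are hypothetical, Ethernet Tags and Route Targets are ignored, and treating the advertising PE as usable only once its own Ethernet A-D per ES routes are present is an assumption borrowed from the route resolution rules of this document.

```python
def resolve_next_hops(evi, esi, advertiser, ad_per_evi, ad_per_es):
    """Return (active, backup) sets of PEs for a MAC learnt from `advertiser`.

    ad_per_evi: {(evi, esi): set of PEs that sent an Ethernet A-D per EVI route}
    ad_per_es:  {esi: {pe: single_active_bit}} from Ethernet A-D per ES routes
    """
    per_es = ad_per_es.get(esi, {})
    per_evi = ad_per_evi.get((evi, esi), set())
    active, backup = set(), set()
    # Assumption: the advertising PE is usable once its per-ES routes are present.
    if advertiser in per_es:
        active.add(advertiser)
    for pe in per_evi - {advertiser}:
        if pe not in per_es:
            # A per-EVI route MUST NOT be used until the per-ES routes arrive.
            continue
        if per_es[pe]:
            backup.add(pe)  # 'Single-Active' bit set to 1: backup path
        else:
            active.add(pe)  # 'Single-Active' bit set to 0: aliasing
    return active, backup
```

With PE1 advertising the MAC and both PE1 and PE2 advertising the two kinds of Ethernet A-D routes in All-Active mode, the sketch returns both PEs as active next hops; in Single-Active mode PE2 ends up in the backup set instead.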

8.4.1 Constructing the Ethernet A-D per EVPN Instance (EVI) Route

This section describes the procedures used to construct the Ethernet
A-D per EVPN Instance (EVI) route, which is used for aliasing (as
discussed above). Support of this route is OPTIONAL.

The Route-Distinguisher (RD) MUST be set to the RD of the EVI that is
advertising the NLRI. An RD MUST be assigned for a given EVI on a
PE. This RD MUST be unique across all EVIs on a PE. It is
RECOMMENDED to use the Type 1 RD [RFC4364]. The value field comprises
an IP address of the PE (typically, the loopback address) followed by
a number unique to the PE. This number may be generated by the PE.
Or, in the Unique VLAN EVPN case, the low order 12 bits may be the
12-bit VLAN ID, with the remaining high order 4 bits set to 0.

The Ethernet Segment Identifier MUST be a ten-octet entity as
described in section "Ethernet Segment Identifier". This document
does not specify the use of the Ethernet A-D route when the Segment
Identifier is set to 0.

The Ethernet Tag ID is the identifier of an Ethernet Tag on the
Ethernet segment. This value may be a 12-bit VLAN ID, in which case
the low order 12 bits are set to the VLAN ID and the high order 20
bits are set to 0. Or it may be another Ethernet Tag used by the
EVPN. It MAY be set to the default Ethernet Tag on the Ethernet
segment or to the value 0.

Note that the above allows the Ethernet A-D route to be advertised
with one of the following granularities:

   + One Ethernet A-D route for a given <ESI, Ethernet Tag ID> tuple
     per EVI. This is applicable when the PE uses MPLS-based
     disposition.

   + One Ethernet A-D route per <ESI, EVI> (where the Ethernet
     Tag ID is set to 0). This is applicable when the PE uses
     MAC-based disposition, or when the PE uses MPLS-based
     disposition when no VLAN translation is required.
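
The RD and Ethernet Tag ID encodings described in this section can be sketched as follows. The helper names are hypothetical; the Type 1 RD layout (2-octet type, 4-octet IPv4 address, 2-octet number) is the one defined in [RFC4364], and the example covers only the case where the Ethernet Tag is a 12-bit VLAN ID.

```python
import ipaddress
import struct


def type1_rd(pe_ipv4: str, number: int) -> bytes:
    """8-octet Type 1 RD: type (1), PE IPv4 address, number unique to the PE."""
    if not 0 <= number <= 0xFFFF:
        raise ValueError("the assigned number must fit in 2 octets")
    return struct.pack("!H4sH", 1, ipaddress.IPv4Address(pe_ipv4).packed, number)


def ethernet_tag_from_vlan(vlan_id: int) -> bytes:
    """4-octet Ethernet Tag ID: VLAN ID in the low 12 bits, high 20 bits zero."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("a VLAN ID is a 12-bit value")
    return struct.pack("!I", vlan_id)
```

In the Unique VLAN EVPN case, the same VLAN ID could be passed as the number argument of type1_rd, which leaves the high order 4 bits of that field at 0.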

The usage of the MPLS label is described in the section on "Load
Balancing of Unicast Packets".

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
be set to the IPv4 or IPv6 address of the advertising PE.

8.4.1.1 Ethernet A-D Route Targets

The Ethernet A-D route MUST carry one or more Route Target (RT)
attributes. RTs may be configured (as in IP VPNs), or may be derived
automatically.

If a PE uses Route Target Constrain [RT-CONSTRAIN], the PE SHOULD
advertise all such RTs using Route Target Constrain. The use of RT
Constrain allows each Ethernet A-D route to reach only those PEs
that are configured to import at least one RT from the set of RTs
carried in the Ethernet A-D route.

8.4.1.1.1 Auto-Derivation from the Ethernet Tag ID

The following is the procedure for deriving the RT attribute
automatically from the Ethernet Tag ID associated with the
advertisement:

   + The Global Administrator field of the RT MUST be set to the
     Autonomous System (AS) number that the PE belongs to.

   + The Local Administrator field of the RT contains a 4-octet
     number that encodes the Ethernet Tag ID. If the Ethernet Tag ID
     is a two-octet VLAN ID then it MUST be encoded in the lower two
     octets of the Local Administrator field and the higher two
     octets MUST be set to zero.

For the "Unique VLAN EVPN" this results in auto-deriving the RT from
the Ethernet Tag, e.g., VLAN ID for that EVPN.

8.5 Designated Forwarder Election

Consider a CE that is a host or a router that is multi-homed directly
to more than one PE in an EVPN instance on a given Ethernet segment.
One or more Ethernet Tags may be configured on the Ethernet segment.
In this scenario only one of the PEs, referred to as the Designated
Forwarder (DF), is responsible for certain actions:

   - Sending multicast and broadcast traffic, on a given Ethernet
     Tag on a particular Ethernet segment, to the CE.

   - Flooding unknown unicast traffic (i.e. traffic for which a PE
     does not know the destination MAC address), on a given Ethernet
     Tag on a particular Ethernet segment to the CE, if the
     environment requires flooding of unknown unicast traffic.

Note that this behavior, which allows selecting a DF at the
granularity of <ESI, EVI> for multicast, broadcast and unknown
unicast traffic, is the default behavior in this specification.

Note that a CE always sends packets belonging to a specific flow
using a single link towards a PE. For instance, if the CE is a host
then, as mentioned earlier, the host treats the multiple links that
it uses to reach the PEs as a Link Aggregation Group (LAG). The CE
employs a local hashing function to map traffic flows onto links in
the LAG.

If a bridged network is multi-homed to more than one PE in an EVPN
network via switches, then the support of All-Active redundancy mode
requires the bridged network to be connected to two or more PEs using
a LAG.

If a bridged network does not connect to the PEs using LAG, then only
one of the links between the bridged network and the PEs must be the
active link for a given EVPN instance. In this case, the set of
Ethernet A-D per ES routes advertised by each PE MUST have the
'Single-Active' bit in the flags of the ESI Label Extended Community
set to 1.

The default procedure for DF election at the granularity of
<ESI, EVI> is referred to as "service carving".
With service carving, it is possible to elect multiple DFs per
Ethernet Segment (one per EVI) in order to perform load-balancing of
multi-destination traffic destined to a given Segment. The
load-balancing procedures carve up the EVI space among the PE nodes
evenly, in such a way that every PE is the DF for a disjoint set of
EVIs. The procedure for service carving is as follows:

1. When a PE discovers the ESI of the attached Ethernet Segment, it
advertises an Ethernet Segment route with the associated ES-Import
extended community attribute.

2. The PE then starts a timer (default value = 3 seconds) to allow
the reception of Ethernet Segment routes from other PE nodes
connected to the same Ethernet Segment. This timer value MUST be the
same across all PEs connected to the same Ethernet Segment.

3. When the timer expires, each PE builds an ordered list of the IP
addresses of all the PE nodes connected to the Ethernet Segment
(including itself), in increasing numeric value. Each IP address in
this list is extracted from the "Originating Router's IP address"
field of the advertised Ethernet Segment route. Every PE is then
given an ordinal indicating its position in the ordered list,
starting with 0 as the ordinal for the PE with the numerically lowest
IP address. The ordinals are used to determine which PE node will be
the DF for a given EVPN instance on the Ethernet Segment using the
following rule: Assuming a redundancy group of N PE nodes, the PE
with ordinal i is the DF for an EVPN instance with an associated
Ethernet Tag value V when (V mod N) = i. In the case where multiple
Ethernet Tags are associated with a single EVPN instance, then the
numerically lowest Ethernet Tag value in that EVPN instance MUST be
used in the modulo function.
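
The ordinal and modulo rule of step 3 can be illustrated with a short Python sketch. The function name and argument shapes are hypothetical, and the timer and the BGP exchange of steps 1 and 2 are assumed to have already completed; this is an illustration of the arithmetic, not an implementation of the election procedure as a whole.

```python
import ipaddress


def elect_df(originating_ips, ethernet_tags):
    """Return the IP address (as a string) of the DF for one EVPN instance.

    originating_ips: addresses taken from the received Ethernet Segment
                     routes plus the local PE's own address.
    ethernet_tags:   Ethernet Tag values associated with the EVPN instance.
    """
    # Ordered list of PE addresses in increasing numeric value; ordinal 0
    # goes to the numerically lowest address.
    ordered = sorted({ipaddress.ip_address(a) for a in originating_ips}, key=int)
    n = len(ordered)
    # With several Ethernet Tags, the numerically lowest one is used.
    v = min(ethernet_tags)
    # The PE with ordinal i is the DF when (V mod N) = i.
    return str(ordered[v % n])
```

For three PEs 192.0.2.1, 192.0.2.2 and 192.0.2.3 and an EVPN instance with Ethernet Tag 10, the rule gives 10 mod 3 = 1, so the PE with ordinal 1 (192.0.2.2) is the DF. Because every PE sorts the same list, each one reaches the same result independently.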

It should be noted that using the "Originating Router's IP address"
field in the Ethernet Segment route to get the PE IP address needed
for the ordered list allows for a CE to be multi-homed across
different ASes if such a need ever arises.

4. The PE that is elected as a DF for a given EVPN instance will
unblock traffic for the Ethernet Tags associated with that EVPN
instance. Note that the DF PE unblocks multi-destination traffic in
the egress direction towards the Segment. All non-DF PEs continue to
drop multi-destination traffic (for the associated EVPN instances) in
the egress direction towards the Segment.

In the case of link or port failure, the affected PE withdraws its
Ethernet Segment route. This will re-trigger the service carving
procedures on all the PEs in the redundancy group (RG). For PE node
failure, or upon PE commissioning or decommissioning, the PEs
re-trigger the service carving. In the case of Single-Active
multi-homing, when a service moves from one PE in the RG to another
PE as a result of re-carving, the PE that ends up being the elected
DF for the service must trigger a MAC address flush notification
towards the associated Ethernet Segment. This can be done, for
example, using the IEEE 802.1ak MVRP 'new' declaration.

8.6. Interoperability with Single-homing PEs

Let's refer to PEs that only support single-homed CE devices as
single-homing PEs.
For single-homing PEs, all the above multi-homing procedures can be
omitted; however, to allow for single-homing PEs to fully
inter-operate with multi-homing PEs, some of the multi-homing
procedures described above SHOULD be supported even by single-homing
PEs:

- procedures related to processing the Ethernet A-D route for the
purpose of Fast Convergence (8.2 Fast Convergence), to let
single-homing PEs benefit from fast convergence

- procedures related to processing the Ethernet A-D route for the
purpose of Aliasing (8.4 Aliasing and Backup-Path), to let
single-homing PEs benefit from load balancing

- procedures related to processing the Ethernet A-D route for the
purpose of Backup-path (8.4 Aliasing and Backup-Path), to let
single-homing PEs benefit from the corresponding convergence
improvement

9. Determining Reachability to Unicast MAC Addresses

PEs forward packets that they receive based on the destination MAC
address. This implies that PEs must be able to learn how to reach a
given destination unicast MAC address.

There are two components to MAC address learning, "local learning"
and "remote learning":

9.1. Local Learning

A particular PE must be able to learn the MAC addresses from the CEs
that are connected to it. This is referred to as local learning.

The PEs in a particular EVPN instance MUST support local data plane
learning using standard IEEE Ethernet learning procedures. A PE must
be capable of learning MAC addresses in the data plane when it
receives packets such as the following from the CE network:

   - DHCP requests

   - ARP request for its own MAC.

   - ARP request for a peer.

Alternatively PEs MAY learn the MAC addresses of the CEs in the
control plane or via management plane integration between the PEs and
the CEs.

There are applications where a MAC address that is reachable via a
given PE on a locally attached Segment (e.g. with ESI X) may move
such that it becomes reachable via another PE on another Segment
(e.g. with ESI Y). This is referred to as "MAC Mobility". Procedures
to support this are described in section "MAC Mobility".

9.2. Remote learning

A particular PE must be able to determine how to send traffic to MAC
addresses that belong to or are behind CEs connected to other PEs,
i.e. to remote CEs or hosts behind remote CEs. We call such MAC
addresses "remote" MAC addresses.

This document requires a PE to learn remote MAC addresses in the
control plane. In order to achieve this, each PE advertises the MAC
addresses it learns from its locally attached CEs in the control
plane, to all the other PEs in that EVPN instance, using MP-BGP and
specifically the MAC Advertisement route.

9.2.1. Constructing the BGP EVPN MAC/IP Address Advertisement

BGP is extended to advertise these MAC addresses using the MAC/IP
Advertisement route type in the EVPN NLRI.

The RD MUST be the RD of the EVI that is advertising the NLRI. The
procedures for setting the RD for a given EVI are described in
section 8.4.1.

The Ethernet Segment Identifier is set to the ten-octet ESI described
in section "Ethernet Segment".

The Ethernet Tag ID may be zero or may represent a valid Ethernet Tag
ID. This field may be non-zero when there are multiple bridge
domains in the MAC-VRF (e.g., the PE needs to perform qualified
learning for the VLANs in that MAC-VRF).

When the Ethernet Tag ID in the NLRI is set to a non-zero value for a
particular bridge domain, then this Ethernet Tag may either be the
Ethernet tag value associated with the CE, e.g., VLAN ID, or it may
be the Ethernet Tag Identifier, e.g., VLAN ID, assigned by the EVPN
provider and mapped to the CE's Ethernet tag. The latter would be the
case if the CE Ethernet tags, e.g., VLAN ID, for a particular bridge
domain are different on different CEs.

The MAC address length field is in bits and it is typically set to
48. However, this specification enables specifying the MAC address as
a prefix, in which case the MAC address length field is set to the
length of the prefix. This provides the ability to aggregate MAC
addresses if the deployment environment supports that. The encoding
of a MAC address MUST be the 6-octet MAC address specified by
[802.1D-ORIG] [802.1D-REV]. If the MAC address is advertised as a
prefix then the trailing bits of the prefix MUST be set to 0 to
ensure that the entire prefix is encoded as 6 octets.

The IP Address field is optional. By default, the IP Address Length
field is set to 0 and the IP address field is omitted from the route.
When a valid IP address or address prefix needs to be advertised
(e.g., for ARP suppression purposes or for inter-subnet switching),
it is then encoded in this route.

The IP Address Length field is in bits and it is the length of the IP
prefix. This provides the ability to advertise IP address prefixes
when the deployment environment supports that. The encoding of an IP
address MUST be either 4 octets for IPv4 or 16 octets for IPv6. When
the IP address is advertised as a prefix, then the trailing bits of
the prefix MUST be set to 0 to ensure that the entire prefix is
encoded as either 4 or 16 octets.
The length field of the EVPN NLRI (which is in octets and is
described in section 7) is sufficient to determine whether an IP
address/prefix is encoded in this route and if so, whether the
encoded IP address/prefix is IPv4 or IPv6.

The MPLS label1 field is encoded as 3 octets, where the high-order 20
bits contain the label value. The MPLS label1 MUST be downstream
assigned and it is associated with the MAC address being advertised
by the advertising PE. The advertising PE uses this label when it
receives an MPLS-encapsulated packet to perform forwarding based on
the destination MAC address. The forwarding procedures are specified
in sections "Forwarding Unicast Packets" and "Load Balancing of
Unicast Packets".

A PE may advertise the same single EVPN label for all MAC addresses
in a given EVI. This label assignment methodology is referred to as a
per EVI label assignment. Alternatively, a PE may advertise a unique
EVPN label per <ESI, Ethernet Tag> combination. This label assignment
methodology is referred to as a per <ESI, Ethernet Tag> label
assignment. As a third option, a PE may advertise a unique EVPN
label per MAC address. All of these methodologies have their
tradeoffs. The choice of a particular label assignment methodology is
purely local to the PE that originates the route.

Per EVI label assignment requires the least number of EVPN labels,
but requires a MAC lookup in addition to an MPLS lookup on an egress
PE for forwarding. On the other hand, a unique label per
<ESI, Ethernet Tag> or a unique label per MAC allows an egress PE to
forward a packet that it receives from another PE, to the connected
CE, after looking up only the MPLS labels without having to perform a
MAC lookup. This includes the capability to perform appropriate VLAN
ID translation on egress to the CE.
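
The MPLS label and MAC prefix encodings described in this section can be sketched as follows. The helper names are hypothetical, and the example covers only the byte layout of the two fields (a 20-bit label in the high-order bits of 3 octets, and a 6-octet MAC address whose bits beyond the prefix length are cleared).

```python
def encode_mpls_label(label: int) -> bytes:
    """3-octet field with the 20-bit label value in the high-order bits."""
    if not 0 <= label < (1 << 20):
        raise ValueError("an MPLS label is a 20-bit value")
    return (label << 4).to_bytes(3, "big")


def encode_mac_prefix(mac: str, prefix_len: int = 48) -> bytes:
    """6-octet MAC address with all bits beyond prefix_len set to 0."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12 or not 0 <= prefix_len <= 48:
        raise ValueError("expected a 48-bit MAC and a length of 0..48 bits")
    # Keep the leading prefix_len bits and clear the trailing ones.
    mask = ((1 << prefix_len) - 1) << (48 - prefix_len)
    return (int(digits, 16) & mask).to_bytes(6, "big")
```

A full host address uses the default length of 48, in which case the mask leaves every bit untouched.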

   The MPLS label2 field is an optional field and if it is present, then
   it is encoded as 3 octets, where the high-order 20 bits contain the
   label value. The use of MPLS label2 is for further study.

   The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
   be set to the IPv4 or IPv6 address of the advertising PE.

   The BGP advertisement for the MAC advertisement route MUST also carry
   one or more Route Target (RT) attributes. RTs may be configured (as
   in IP VPNs), or may be derived automatically from the Ethernet Tag
   ID, in the Unique VLAN case, as described in section "Ethernet A-D
   Route per EVPN".

   It is to be noted that this document does not require PEs to create
   forwarding state for remote MACs when they are learnt in the control
   plane. When this forwarding state is actually created is a local
   implementation matter.

9.2.2 Route Resolution

   If the Ethernet Segment Identifier field in a received MAC
   Advertisement route is set to the reserved ESI value of 0 or MAX-ESI,
   then the receiving PE MUST install forwarding state for the
   associated MAC Address based on the MAC Advertisement route alone.

   If the Ethernet Segment Identifier field in a received MAC
   Advertisement route is set to a non-reserved ESI, and the receiving
   PE is locally attached to the same ESI, then the PE does not alter
   its forwarding state based on the received route. This ensures that
   local routes are preferred to remote routes.

   If the Ethernet Segment Identifier field in a received MAC
   Advertisement route is set to a non-reserved ESI, then the receiving
   PE MUST install forwarding state for a given MAC address only when
   both the MAC Advertisement route AND the associated set of Ethernet
   A-D per ES routes have been received.

   To illustrate this with an example, consider two PEs (PE1 and PE2)
   connected to a multi-homed Ethernet Segment ES1.
   All-Active
   redundancy mode is assumed. A given MAC address M1 is learnt by PE1
   but not PE2. On a remote PE (PE3), the following states may arise:

   T1- When the MAC Advertisement Route from PE1 and the set of Ethernet
   A-D per ES routes from PE1 and PE2 are received, PE3 can forward
   traffic destined to M1 to both PE1 and PE2.

   T2- If after T1, PE1 withdraws its set of Ethernet A-D per ES routes,
   then PE3 forwards traffic destined to M1 to PE2 only.

   T3- If after T1, PE2 withdraws its set of Ethernet A-D per ES routes,
   then PE3 forwards traffic destined to M1 to PE1 only.

   T4- If after T1, PE1 withdraws its MAC Advertisement route, then PE3
   treats traffic to M1 as unknown unicast. Note, here, that had PE2
   also advertised a MAC route for M1 before PE1 withdrew its MAC
   route, then PE3 would have continued forwarding traffic destined to
   M1 to PE2.

10. ARP and ND

   The IP address field in the MAC advertisement route may optionally
   carry one of the IP addresses associated with the MAC address. This
   provides an option which can be used to minimize the flooding of ARP
   or Neighbor Discovery (ND) messages over the MPLS network and to
   remote CEs. This option also minimizes ARP (or ND) message processing
   on end-stations/hosts connected to the EVPN network. A PE may learn
   the IP address associated with a MAC address in the control or
   management plane between the CE and the PE. Or, it may learn this
   binding by snooping certain messages to or from a CE. When a PE
   learns the IP address associated with a MAC address of a locally
   connected CE, it may advertise this address to other PEs by including
   it in the MAC Advertisement route. The IP Address may be an IPv4
   address encoded using four octets, or an IPv6 address encoded using
   sixteen octets.
   For ARP and ND purposes, the IP Address length field
   MUST be set to 32 for an IPv4 address or to 128 for an IPv6 address.

   If there are multiple IP addresses associated with a MAC address,
   then multiple MAC advertisement routes MUST be generated, one for
   each IP address. For instance, this may be the case when there are
   both an IPv4 and an IPv6 address associated with the MAC address.
   When the IP address is dissociated from the MAC address, then the
   MAC advertisement route with that particular IP address MUST be
   withdrawn.

   When a PE receives an ARP request for an IP address from a CE, and
   if the PE has the MAC address binding for that IP address, the PE
   SHOULD perform ARP proxy by responding to the ARP request.

10.1 Default Gateway

   When a PE needs to perform inter-subnet forwarding where each subnet
   is represented by a different broadcast domain (e.g., different
   VLAN), the inter-subnet forwarding is performed at layer 3 and the
   PE that performs such function is called the default gateway. In
   this case, when the PE receives an ARP Request for the IP address of
   the default gateway, the PE originates an ARP Reply.

   Each PE that acts as a default gateway for a given EVPN instance MAY
   advertise in the EVPN control plane its default gateway MAC address
   using the MAC advertisement route, and indicates that such route is
   associated with the default gateway. This is accomplished by
   requiring the route to carry the Default Gateway extended community
   defined in [Section 8.8 Default Gateway Extended Community]. The ESI
   field is set to zero when advertising the MAC route with the Default
   Gateway extended community.

   Unless it is known a priori (by means outside of this document) that
   all PEs of a given EVPN instance act as a default gateway for that
   EVPN instance, the MPLS label MUST be set to a valid downstream
   assigned label.
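
The ARP proxy and default gateway behavior described so far can be sketched in Python as a small binding table. This is a minimal illustration, not a definitive implementation: the class and method names are hypothetical, addresses are plain strings, and what the PE does when it has no binding is left outside the sketch.

```python
from typing import Dict, Optional


class ArpProxyTable:
    """IP-to-MAC bindings known to a PE for one EVPN instance (hypothetical)."""

    def __init__(self, gateway_ip: Optional[str] = None,
                 gateway_mac: Optional[str] = None) -> None:
        self.bindings: Dict[str, str] = {}
        self.gateway_ip = gateway_ip
        self.gateway_mac = gateway_mac

    def route_received(self, mac: str, ip: Optional[str]) -> None:
        """Record the binding carried in a MAC advertisement route, if any."""
        if ip is not None:
            self.bindings[ip] = mac

    def route_withdrawn(self, mac: str, ip: Optional[str]) -> None:
        """Forget the binding when the route carrying it is withdrawn."""
        if ip is not None and self.bindings.get(ip) == mac:
            del self.bindings[ip]

    def answer_arp_request(self, target_ip: str) -> Optional[str]:
        """Return the MAC to place in an ARP reply, or None if no reply is sent."""
        if self.gateway_ip is not None and target_ip == self.gateway_ip:
            # The PE acts as default gateway and originates the reply itself.
            return self.gateway_mac
        return self.bindings.get(target_ip)
```

Because one MAC advertisement route is generated per IP address, a host with several addresses simply results in several calls to `route_received` with the same MAC address.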

   Furthermore, even if all PEs of a given EVPN instance do act as a
   default gateway for that EVPN instance, but only some, but not all,
   of these PEs have sufficient (routing) information to provide
   inter-subnet routing for all the inter-subnet traffic originated
   within the subnet associated with the EVPN instance, then when such
   PE advertises in the EVPN control plane its default gateway MAC
   address using the MAC advertisement route, and indicates that such
   route is associated with the default gateway, the route MUST carry a
   valid downstream assigned label.

   If all PEs of a given EVPN instance act as a default gateway for that
   EVPN instance, and the same default gateway MAC address is used
   across all gateway devices, then no such advertisement is needed.
   However, if each default gateway uses a different MAC address, then
   each default gateway needs to be aware of other gateways' MAC
   addresses and thus the need for such advertisement. This is called
   MAC address aliasing since a single default GW can be represented by
   multiple MAC addresses.

   Each PE that receives this route and imports it as per procedures
   specified in this document follows the procedures in this section
   when replying to ARP Requests that it receives if such Requests are
   for the IP address in the received EVPN route.

   Each PE that acts as a default gateway for a given EVPN instance that
   receives this route and imports it as per procedures specified in
   this document MUST create MAC forwarding state that enables it to
   apply IP forwarding to the packets destined to the MAC address
   carried in the route.

11. Handling of Multi-Destination Traffic

   Procedures are required for a given PE to send broadcast or multicast
   traffic, received from a CE encapsulated in a given Ethernet Tag
   (VLAN) in an EVPN instance, to all the other PEs that span that
   Ethernet Tag (VLAN) in that EVPN instance. In certain scenarios,
   described in section "Processing of Unknown Unicast Packets", a given
   PE may also need to flood unknown unicast traffic to other PEs.

   The PEs in a particular EVPN instance may use ingress replication,
   P2MP LSPs or MP2MP LSPs to send unknown unicast, broadcast or
   multicast traffic to other PEs.

   Each PE MUST advertise an "Inclusive Multicast Ethernet Tag Route" to
   enable the above. The following subsection provides the procedures to
   construct the Inclusive Multicast Ethernet Tag route. Subsequent
   subsections describe in further detail its usage.

11.1. Construction of the Inclusive Multicast Ethernet Tag Route

   The RD MUST be the RD of the EVI that is advertising the NLRI. The
   procedures for setting the RD for a given EVPN instance on a PE are
   described in section 9.4.1.

   The Ethernet Tag ID is the identifier of the Ethernet Tag. It MAY be
   set to 0 or to a valid Ethernet Tag value.

   The Originating Router's IP address MUST be set to an IP address of
   the PE. This address SHOULD be common for all the EVIs on the PE
   (e.g., this address may be the PE's loopback address). The IP
   Address Length field is in bits.

   The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
   be set to the same IP address as the one carried in the Originating
   Router's IP Address field.

   The BGP advertisement for the Inclusive Multicast Ethernet Tag route
   MUST also carry one or more Route Target (RT) attributes. The
   assignment of RTs described in the section on "Constructing the BGP
   EVPN MAC Address Advertisement" MUST be followed.

11.2. P-Tunnel Identification

   In order to identify the P-Tunnel used for sending broadcast, unknown
   unicast or multicast traffic, the Inclusive Multicast Ethernet Tag
   route MUST carry a "PMSI Tunnel Attribute" as specified in [BGP
   MVPN].

   Depending on the technology used for the P-tunnel for the EVPN
   instance on the PE, the PMSI Tunnel attribute of the Inclusive
   Multicast Ethernet Tag route is constructed as follows.

      + If the PE that originates the advertisement uses a
        P-Multicast tree for the P-tunnel for EVPN, the PMSI
        Tunnel attribute MUST contain the identity of the tree
        (note that the PE could create the identity of the
        tree prior to the actual instantiation of the tree).

      + A PE that uses a P-Multicast tree for the P-tunnel MAY
        aggregate two or more Ethernet Tags in the same or different
        EVIs present on the PE onto the same tree. In this case, in
        addition to carrying the identity of the tree, the PMSI Tunnel
        attribute MUST carry an MPLS upstream assigned label which
        the PE has bound uniquely to the Ethernet Tag for the EVI
        associated with this update (as determined by its RTs).

        If the PE has already advertised Inclusive Multicast
        Ethernet Tag routes for two or more Ethernet Tags that it
        now desires to aggregate, then the PE MUST re-advertise
        those routes. The re-advertised routes MUST be the same
        as the original ones, except for the PMSI Tunnel attribute
        and the label carried in that attribute.

      + If the PE that originates the advertisement uses ingress
        replication for the P-tunnel for EVPN, the route MUST
        include the PMSI Tunnel attribute with the Tunnel Type set to
        Ingress Replication and Tunnel Identifier set to a routable
        address of the PE. The PMSI Tunnel attribute MUST carry a
        downstream assigned MPLS label.
        This label is used to
        demultiplex the broadcast, multicast or unknown unicast EVPN
        traffic received over an MP2P tunnel by the PE.

      + The Leaf Information Required flag of the PMSI Tunnel
        attribute MUST be set to zero, and MUST be ignored on receipt.

12. Processing of Unknown Unicast Packets

   The procedures in this document do not require the PEs to flood
   unknown unicast traffic to other PEs. If PEs learn CE MAC addresses
   via a control plane protocol, the PEs can then distribute MAC
   addresses via BGP, and all unicast MAC addresses will be learnt prior
   to traffic to those destinations.

   However, if a destination MAC address of a received packet is not
   known by the PE, the PE may have to flood the packet. When flooding,
   one must take into account "split horizon forwarding" as follows: The
   principles behind the following procedures are borrowed from the
   split horizon forwarding rules in VPLS solutions [RFC4761] and
   [RFC4762]. When a PE capable of flooding (say PEx) receives a frame
   with an unknown destination MAC address, it floods the frame. If the
   frame arrived from an attached CE, PEx must send a copy of the frame
   to every other attached CE participating in that EVPN instance, on a
   different ESI than the one it received the frame on, as long as the
   PE is the DF for the egress ESI. In addition, the PE must flood the
   frame to all other PEs participating in that EVPN instance. If, on
   the other hand, the frame arrived from another PE (say PEy), PEx must
   send a copy of the packet only to attached CEs as long as it is the
   DF for the egress ESI. PEx MUST NOT send the frame to other PEs,
   since PEy would have already done so. Split horizon forwarding rules
   apply to unknown MAC addresses.
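
The split horizon flooding rules above can be sketched in Python as a function that lists where PEx sends copies of a frame. The data model is hypothetical: each local attachment carries a distinct ESI string and a flag saying whether this PE is the DF for that segment, and the reserved ESI values used for single-homed CEs are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class LocalAttachment:
    """A locally attached CE on one Ethernet segment (hypothetical model)."""
    ce: str
    esi: str
    is_df: bool  # True if this PE is the DF for the segment


def flood_targets(attachments: List[LocalAttachment],
                  remote_pes: Set[str],
                  ingress_esi: Optional[str]) -> List[str]:
    """Return the CEs and PEs that receive a copy of the flooded frame.

    ingress_esi is the ESI the frame arrived on, or None when the frame
    arrived from another PE.
    """
    # Local copies go only to segments where this PE is the DF, and never
    # back onto the segment the frame came from.
    targets = [a.ce for a in attachments
               if a.is_df and a.esi != ingress_esi]
    if ingress_esi is not None:
        # Frame came from a CE: also flood to the other PEs in the instance.
        targets.extend(sorted(remote_pes))
    # Frame came from a PE: it is not sent to other PEs again.
    return targets
```

For example, a frame arriving from a CE on one segment is copied to the other local segments for which the PE is DF and to every remote PE, while the same frame arriving from a remote PE is copied only to local segments.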

   Whether or not to flood packets to unknown destination MAC addresses
   should be an administrative choice, depending on how learning happens
   between CEs and PEs.

   The PEs in a particular EVPN instance may use ingress replication
   using RSVP-TE P2P LSPs or LDP MP2P LSPs for sending unknown unicast
   traffic to other PEs. Or they may use RSVP-TE P2MP or LDP P2MP for
   sending such traffic to other PEs.

12.1. Ingress Replication

   If ingress replication is in use, the P-Tunnel attribute, carried in
   the Inclusive Multicast Ethernet Tag routes for the EVPN instance,
   specifies the downstream label that the other PEs can use to send
   unknown unicast, multicast or broadcast traffic for that EVPN
   instance to this particular PE.

   The PE that receives a packet with this particular MPLS label MUST
   treat the packet as a broadcast, multicast or unknown unicast packet.
   Further, if the destination MAC address is a unicast MAC address,
   the PE MUST treat the packet as an unknown unicast packet.

12.2. P2MP MPLS LSPs

   The procedures for using P2MP LSPs are very similar to VPLS
   procedures [VPLS-MCAST]. The P-Tunnel attribute used by a PE for
   sending unknown unicast, broadcast or multicast traffic for a
   particular EVPN instance is advertised in the Inclusive Multicast
   Ethernet Tag route as described in section "Handling of
   Multi-Destination Traffic".

   The P-Tunnel attribute specifies the P2MP LSP identifier. This is the
   equivalent of an Inclusive tree in [VPLS-MCAST]. Note that multiple
   Ethernet Tags, which may be in different EVPN instances, may use the
   same P2MP LSP, using upstream labels [VPLS-MCAST]. This is the
   equivalent of an Aggregate Inclusive tree in [VPLS-MCAST]. When P2MP
   LSPs are used for flooding unknown unicast traffic, packet
   re-ordering is possible.
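
The classification rule from section 12.1, for a packet that arrives with the label advertised for multi-destination traffic, can be sketched in Python as follows. The function name is hypothetical. The test for a group address (the least significant bit of the first octet) comes from general IEEE 802 MAC addressing rather than from this document.

```python
BROADCAST_MAC = bytes([0xFF] * 6)


def classify_flooded_frame(dest_mac: bytes) -> str:
    """Classify a frame received on the multi-destination label or tunnel."""
    if len(dest_mac) != 6:
        raise ValueError("MAC address must be 6 octets")
    if dest_mac == BROADCAST_MAC:
        return "broadcast"
    if dest_mac[0] & 0x01:
        # Group bit set: the destination is a multicast address.
        return "multicast"
    # A unicast destination on this label is always treated as unknown
    # unicast, whether or not the PE has an entry for the address.
    return "unknown unicast"
```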

   The PE that receives a packet on the P2MP LSP specified in the PMSI
   Tunnel Attribute MUST treat the packet as a broadcast, multicast or
   unknown unicast packet. Further, if the destination MAC address is a
   unicast MAC address, the PE MUST treat the packet as an unknown
   unicast packet.

13. Forwarding Unicast Packets

   This section describes procedures for forwarding unicast packets by
   PEs, where such packets are received from either directly connected
   CEs, or from some other PEs.

13.1. Forwarding packets received from a CE

   When a PE receives a packet from a CE, on a given Ethernet Tag, it
   must first look up the source MAC address of the packet. In certain
   environments the source MAC address MAY be used to authenticate the
   CE and determine that traffic from the host can be allowed into the
   network. Source MAC lookup MAY also be used for local MAC address
   learning.

   If the PE decides to forward the packet, the destination MAC address
   of the packet must be looked up. If the PE has received MAC address
   advertisements for this destination MAC address from one or more
   other PEs or learned it from locally connected CEs, it is considered
   as a known MAC address. Otherwise, the MAC address is considered as
   an unknown MAC address.

   For known MAC addresses the PE forwards this packet to one of the
   remote PEs or to a locally attached CE. When forwarding to a remote
   PE, the packet is encapsulated in the EVPN MPLS label advertised by
   the remote PE, for that MAC address, and in the MPLS LSP label stack
   to reach the remote PE.

   If the MAC address is unknown and if the administrative policy on the
   PE requires flooding of unknown unicast traffic then:

   - The PE MUST flood the packet to other PEs. The PE MUST first
   encapsulate the packet in the ESI MPLS label as described in section
   9.3.
   If ingress replication is used, the packet MUST be replicated
   one or more times to each remote PE with the outermost label being an
   MPLS label determined as follows: This is the MPLS label advertised
   by the remote PE in a PMSI Tunnel Attribute in the Inclusive
   Multicast Ethernet Tag route for an
   combination. The Ethernet Tag in the route must be the same as the
   Ethernet Tag associated with the interface on which the ingress PE
   receives the packet. If P2MP LSPs are being used, the packet MUST be
   sent on the P2MP LSP that the PE is the root of for the Ethernet Tag
   in the EVPN instance. If the same P2MP LSP is used for all Ethernet
   Tags, then all the PEs in the EVPN instance MUST be the leaves of the
   P2MP LSP. If a distinct P2MP LSP is used for a given Ethernet Tag in
   the EVPN instance, then only the PEs in the Ethernet Tag MUST be the
   leaves of the P2MP LSP. The packet MUST be encapsulated in the P2MP
   LSP label stack.

   If the MAC address is unknown then, if the administrative policy on
   the PE does not allow flooding of unknown unicast traffic:

   - The PE MUST drop the packet.

13.2. Forwarding packets received from a remote PE

   This section describes the procedures for forwarding known and
   unknown unicast packets received from a remote PE.

13.2.1. Unknown Unicast Forwarding

   When a PE receives an MPLS packet from a remote PE then, after
   processing the MPLS label stack, if the top MPLS label ends up being
   a P2MP LSP label associated with an EVPN instance or in case of
   ingress replication the downstream label advertised in the P-Tunnel
   attribute, and after performing the split horizon procedures
   described in section "Split Horizon":

   - If the PE is the designated forwarder of BUM traffic on a
   particular set of ESIs for the Ethernet Tag, the default behavior is
   for the PE to flood the packet on these ESIs.
   In other words, the
   default behavior is for the PE to assume that for BUM traffic, it is
   not required to perform a destination MAC address lookup. As an
   option, the PE may perform a destination MAC lookup to flood the
   packet to only a subset of the CE interfaces in the Ethernet Tag. For
   instance the PE may decide to not flood a BUM packet on certain
   Ethernet segments even if it is the DF on the Ethernet segment, based
   on administrative policy.

   - If the PE is not the designated forwarder on any of the ESIs for
   the Ethernet Tag, the default behavior is for it to drop the packet.

13.2.2. Known Unicast Forwarding

   If the top MPLS label ends up being an EVPN label that was advertised
   in the unicast MAC advertisements, then the PE either forwards the
   packet based on CE next-hop forwarding information associated with
   the label or does a destination MAC address lookup to forward the
   packet to a CE.

14. Load Balancing of Unicast Frames

   This section specifies the load balancing procedures for sending
   known unicast frames to a multi-homed CE.

14.1. Load balancing of traffic from a PE to remote CEs

   Whenever a remote PE imports a MAC advertisement for a given
   in an EVI, it MUST examine all imported Ethernet A-D
   routes for that ESI in order to determine the load-balancing
   characteristics of the Ethernet segment.

14.1.1 Single-Active Redundancy Mode

   For a given ES, if the remote PE has imported the set of Ethernet A-D
   per ES routes from at least one PE, where the "Single-Active" flag in
   the ESI Label Extended Community is set, then the remote PE MUST
   deduce that the ES is operating in Single-Active redundancy mode. As
   such, the MAC address will be reachable only via the PE announcing
   the associated MAC Advertisement route - this is referred to as the
   primary PE.
   The other PEs advertising the set of Ethernet A-D per ES
   routes for the same ES provide backup paths for that ES, in case the
   primary PE encounters a failure, and are referred to as backup PEs.
   It should be noted that the primary PE for a given is the
   DF for that .

   If the primary PE encounters a failure, it MAY withdraw its set of
   Ethernet A-D per ES routes for the affected ES prior to withdrawing
   its set of MAC Advertisement routes.

   If there is only one backup PE for a given ES, the remote PE MAY use
   the primary PE's withdrawal of its set of Ethernet A-D per ES routes
   as a trigger to update its forwarding entries, for the associated MAC
   addresses, to point towards the backup PE. As the backup PE starts
   learning the MAC addresses over its attached ES, it will start
   sending MAC Advertisement routes while the failed PE withdraws its
   routes. This mechanism minimizes the flooding of traffic during
   fail-over events.

   If there is more than one backup PE for a given ES, the remote PE
   MUST use the primary PE's withdrawal of its set of Ethernet A-D per
   ES routes as a trigger to start flooding traffic for the associated
   MAC addresses (as long as flooding of unknown unicast is
   administratively allowed), as it is not possible to select a single
   backup PE.

14.1.2 All-Active Redundancy Mode

   For a given ES, if the remote PE has imported the set of Ethernet A-D
   per ES routes from one or more PEs and none of them have the
   "Single-Active" flag in the ESI Label Extended Community set, then
   the remote PE MUST deduce that the ES is operating in All-Active
   redundancy mode.
   A remote PE that receives a MAC advertisement route with a
   non-reserved ESI SHOULD consider the advertised MAC address to be
   reachable via all PEs that have advertised reachability to that MAC
   address's EVI/ES via the combination of an Ethernet A-D per EVI route
   for that EVI/ES (and Ethernet Tag if applicable) AND an Ethernet A-D
   per ES route for that ES. The remote PE MUST use received MAC
   Advertisement routes and Ethernet A-D per EVI/per ES routes to
   construct the set of next-hops for the advertised MAC address.

   The remote PE MUST use the MAC advertisement and eligible Ethernet
   A-D routes to construct the set of next-hops that it can use to send
   the packet to the destination MAC. Each next-hop comprises an MPLS
   label stack that is to be used by the egress PE to forward the
   packet. This label stack is determined as follows:

   - If the next-hop is constructed as a result of a MAC route then this
   label stack MUST be used. However, if the MAC route does not exist,
   then the next-hop and MPLS label stack are constructed as a result of
   the Ethernet A-D routes. Note that the following description applies
   to determining the label stack for a particular next-hop to reach a
   given PE, from which the remote PE has received and imported Ethernet
   A-D routes that have the matching ESI and Ethernet Tag as the one
   present in the MAC advertisement. The Ethernet A-D routes mentioned
   in the following description refer to the ones imported from this
   given PE.

   - If a set of Ethernet A-D per ES routes for that ES AND an Ethernet
   A-D route per EVI exist, then the label from that latter route must
   be used.

   The following example explains the above.

   Consider a CE (CE1) that is dual-homed to two PEs (PE1 and PE2) on a
   LAG interface (ES1), and is sending packets with MAC address MAC1 on
   VLAN1 (mapped to EVI1).
   A remote PE, say PE3, is able to learn that
   MAC1 is reachable via PE1 and PE2. Both PE1 and PE2 may advertise
   MAC1 in BGP if they receive packets with MAC1 from CE1. If this is
   not the case, and if MAC1 is advertised only by PE1, PE3 still
   considers MAC1 as reachable via both PE1 and PE2 as both PE1 and PE2
   advertise a set of Ethernet A-D per ES routes for ES1 as well as an
   Ethernet A-D per EVI route for .

   The MPLS label stack to send the packets to PE1 is the MPLS LSP stack
   to get to PE1 and the EVPN label advertised by PE1 for CE1's MAC.

   The MPLS label stack to send packets to PE2 is the MPLS LSP stack to
   get to PE2 and the MPLS label in the Ethernet A-D route advertised by
   PE2 for , if PE2 has not advertised MAC1 in BGP.

   We will refer to these label stacks as MPLS next-hops.

   The remote PE (PE3) can now load balance the traffic it receives from
   its CEs, destined for CE1, between PE1 and PE2. PE3 may use N-Tuple
   flow information to hash traffic into one of the MPLS next-hops for
   load balancing of IP traffic. Alternatively PE3 may rely on the
   source MAC addresses for load balancing.

   Note that once PE3 decides to send a particular packet to PE1 or PE2
   it can pick one out of multiple possible paths to reach the
   particular remote PE using regular MPLS procedures. For instance, if
   the tunneling technology is based on RSVP-TE LSPs, and PE3 decides to
   send a particular packet to PE1, then PE3 can choose from multiple
   RSVP-TE LSPs that have PE1 as their destination.

   When PE1 or PE2 receives the packet destined for CE1 from PE3, if the
   packet is a unicast MAC packet it is forwarded to CE1. If it is a
   multicast or broadcast MAC packet then only one of PE1 or PE2 must
   forward the packet to the CE. Which of PE1 or PE2 forwards this
   packet to the CE is determined based on which of the two is the DF.
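
The example above can be sketched in Python from the point of view of PE3. This is an illustration under stated assumptions, not a definitive implementation. The function names and the dictionary-based route model are hypothetical, only the EVPN label is tracked (the transport LSP label stack is omitted), and CRC32 over an opaque flow key stands in for whatever N-Tuple hash an implementation actually uses. The sketch also assumes that a PE is eligible only if its Ethernet A-D per ES routes have been imported, following the route resolution rules given earlier.

```python
import zlib
from typing import Dict, List, Set, Tuple


def build_next_hops(mac_labels: Dict[str, int],
                    ad_per_es: Set[str],
                    ad_per_evi_labels: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (PE, EVPN label) pairs usable to reach one MAC on an All-Active ES.

    mac_labels:        PE -> label from its MAC advertisement route for the MAC
    ad_per_es:         PEs whose Ethernet A-D per ES routes were imported
    ad_per_evi_labels: PE -> label from its Ethernet A-D per EVI route
    """
    if not mac_labels:
        return []  # no MAC route from any PE: the MAC is unknown
    next_hops = []
    for pe in sorted(ad_per_es):
        if pe in mac_labels:
            # The PE advertised the MAC itself: use the label from that route.
            next_hops.append((pe, mac_labels[pe]))
        elif pe in ad_per_evi_labels:
            # Otherwise fall back to the label from its A-D per EVI route.
            next_hops.append((pe, ad_per_evi_labels[pe]))
    return next_hops


def pick_next_hop(next_hops: List[Tuple[str, int]],
                  flow_key: bytes) -> Tuple[str, int]:
    """Hash a flow onto one next-hop so that a flow always takes the same path."""
    if not next_hops:
        raise ValueError("no next-hop available")
    return next_hops[zlib.crc32(flow_key) % len(next_hops)]
```

With MAC1 advertised only by PE1 and both PEs advertising the A-D routes, `build_next_hops` yields one entry per PE: PE1 with its MAC route label and PE2 with its A-D per EVI label.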

   If the connectivity between the multi-homed CE and one of the PEs
   that it is attached to fails, the PE MUST withdraw the set of
   Ethernet A-D per ES routes that had been previously advertised for
   that ES. When the MAC entry on the PE ages out, the PE MUST withdraw
   the MAC address from BGP. Note that to aid convergence, the Ethernet
   Tag A-D routes MAY be withdrawn before the MAC routes. This enables
   the remote PEs to remove the MPLS next-hop to this particular PE from
   the set of MPLS next-hops that can be used to forward traffic to the
   CE. For further details and procedures on withdrawal of EVPN route
   types in the event of PE to CE failures please see section "PE to CE
   Network Failures".

14.2. Load balancing of traffic between a PE and a local CE

   A CE may be configured with more than one interface connected to
   different PEs or the same PE for load balancing, using a technology
   such as LAG. The PE(s) and the CE can load balance traffic onto these
   interfaces using one of the following mechanisms.

14.2.1. Data plane learning

   Consider that the PEs perform data plane learning for local MAC
   addresses learned from local CEs. This enables the PE(s) to learn a
   particular MAC address and associate it with one or more interfaces,
   if the technology between the PE and the CE supports multi-pathing.
   The PEs can now load balance traffic destined to that MAC address on
   the multiple interfaces.

   Whether the CE can load balance traffic that it generates on the
   multiple interfaces is dependent on the CE implementation.

14.2.2. Control plane learning

   The CE can be a host that advertises the same MAC address using a
   control protocol on both interfaces. This enables the PE(s) to learn
   the host's MAC address and associate it with one or more interfaces.
   The PEs can now load balance traffic destined to the host on the
   multiple interfaces.
   The host can also load balance the traffic it
   generates onto these interfaces and the PE that receives the traffic
   employs EVPN forwarding procedures to forward the traffic.

15. MAC Mobility

   It is possible for a given host or end-station (as defined by its MAC
   address) to move from one Ethernet segment to another; this is
   referred to as 'MAC Mobility' or 'MAC move' and it is different from
   the multi-homing situation in which a given MAC address is reachable
   via multiple PEs for the same Ethernet segment. In a MAC move, there
   would be two sets of MAC Advertisement routes, one set with the new
   Ethernet segment and one set with the previous Ethernet segment, and
   the MAC address would appear to be reachable via each of these
   segments.

   In order to allow all of the PEs in the EVPN instance to correctly
   determine the current location of the MAC address, all advertisements
   of it being reachable via the previous Ethernet segment MUST be
   withdrawn by the PEs, for the previous Ethernet segment, that had
   advertised it.

   If local learning is performed using the data plane, these PEs will
   not be able to detect that the MAC address has moved to another
   Ethernet segment and the receipt of MAC Advertisement routes, with
   the MAC Mobility extended community attribute, from other PEs serves
   as the trigger for these PEs to withdraw their advertisements. If
   local learning is performed using the control or management planes,
   these interactions serve as the trigger for these PEs to withdraw
   their advertisements.

   In a situation where there are multiple moves of a given MAC,
   possibly between the same two Ethernet segments, there may be
   multiple withdrawals and re-advertisements.
   In order to ensure that
   all PEs in the EVPN instance receive all of these correctly through
   the intervening BGP infrastructure, it is necessary to introduce a
   sequence number into the MAC Mobility extended community attribute.

   An implementation MUST handle the scenarios where the sequence number
   wraps around so that mobility events are processed correctly.

   Every MAC mobility event for a given MAC address will contain a
   sequence number that is set using the following rules:

   - A PE advertising a MAC address for the first time advertises it
   with no MAC Mobility extended community attribute.

   - A PE detecting a locally attached MAC address for which it had
   previously received a MAC Advertisement route with a different
   Ethernet segment identifier advertises the MAC address in a MAC
   Advertisement route tagged with a MAC Mobility extended community
   attribute with a sequence number one greater than the sequence number
   in the MAC mobility attribute of the received MAC Advertisement
   route. In the case of the first mobility event for a given MAC
   address, where the received MAC Advertisement route does not carry a
   MAC Mobility attribute, the value of the sequence number in the
   received route is assumed to be 0 for the purpose of this processing.

   - A PE detecting a locally attached MAC address for which it had
   previously received a MAC Advertisement route with the same non-zero
   Ethernet segment identifier advertises it with:

      i. no MAC Mobility extended community attribute, if the received
      route did not carry said attribute.

      ii. a MAC Mobility extended community attribute with the sequence
      number equal to the highest of the sequence number(s) in the
      received MAC Advertisement route(s), if the received route(s) is
      (are) tagged with a MAC Mobility extended community attribute.
1913 - A PE detecting a locally attached MAC address for which it had 1914 previously received a MAC Advertisement route with the same zero 1915 Ethernet segment identifier (single-homed scenarios) advertises it 1916 with MAC mobility extended community attribute with the sequence 1917 number set properly. In case of single-homed scenarios, there is no 1918 need for ESI comparison. The reason ESI comparison is done for multi- 1919 homing, is to prevent false detection of MAC move among the PEs 1920 attached to the same multi-homed site. 1922 A PE receiving a MAC Advertisement route for a MAC address with a 1923 different Ethernet segment identifier and a higher sequence number 1924 than that which it had previously advertised, withdraws its MAC 1925 Advertisement route. If two (or more) PEs advertise the same MAC 1926 address with same sequence number but different Ethernet segment 1927 identifiers, a PE that receives these routes selects the route 1928 advertised by the PE with lowest IP address as the best route. 1930 15.1. MAC Duplication Issue 1932 A situation may arise where the same MAC address is learned by 1933 different PEs in the same VLAN because of two (or more hosts) being 1934 mis-configured with the same (duplicate) MAC address. In such 1935 situation, the traffic originating from these hosts would trigger 1936 continuous MAC moves among the PEs attached to these hosts. It is 1937 important to recognize such situation and avoid incrementing the 1938 sequence number (in the MAC Mobility attribute) to infinity. In order 1939 to remedy such situation, a PE that detects a MAC mobility event by 1940 way of local learning starts an M-second timer (default value of M = 1941 5) and if it detects N MAC moves before the timer expires (default 1942 value for N = 3), it concludes that a duplicate MAC situation has 1943 occurred. 
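
   The timer-and-counter check described above could be sketched as
   follows. This is only an illustration, not part of the
   specification: the class name DuplicateMacDetector, its methods, and
   the use of caller-supplied timestamps are all hypothetical. It also
   assumes that the move which starts the M-second timer counts as the
   first of the N moves, which the text leaves open.

```python
class DuplicateMacDetector:
    """Hypothetical per-MAC state for the M-second / N-move check."""

    def __init__(self, m_seconds=5.0, n_moves=3):
        # M and N are configurable; 5 and 3 are the stated defaults.
        self.m_seconds = m_seconds
        self.n_moves = n_moves
        self.timer_start = None   # time at which the current timer began
        self.moves = 0            # moves seen while the timer is running
        self.duplicate = False    # latched until reset() is called

    def on_local_move(self, now):
        """Record a locally learned MAC move at time `now` (seconds).

        Returns True once a duplicate MAC situation has been concluded.
        """
        if self.duplicate:
            return True
        expired = (self.timer_start is not None
                   and now - self.timer_start >= self.m_seconds)
        if self.timer_start is None or expired:
            # Assumption: the move that starts the timer counts as move 1.
            self.timer_start = now
            self.moves = 1
        else:
            self.moves += 1
        if self.moves >= self.n_moves:
            self.duplicate = True
        return self.duplicate

    def reset(self):
        """Clear the state, e.g. after corrective action by the operator."""
        self.timer_start = None
        self.moves = 0
        self.duplicate = False
```

   With the default values, three moves reported at t = 0, 1, and 2
   seconds would mark the MAC address as a duplicate, whereas moves at
   t = 0, 3, and 6 seconds would not, because the timer expires before
   the third move.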
   The PE MUST alert the operator and stop sending and processing any
   BGP MAC Advertisement routes for that MAC address until corrective
   action is taken by the operator. The values of M and N MUST be
   configurable to allow for flexibility in operator control. Note that
   the other PEs in the EVPN instance will forward the traffic for the
   duplicate MAC address to one of the PEs advertising the duplicate MAC
   address.

15.2. Sticky MAC addresses

   There are scenarios in which it is desired to configure some MAC
   addresses as static so that they are not subjected to MAC move. In
   such scenarios, these MAC addresses are advertised with a MAC
   Mobility Extended Community in which the static flag is set to 1 and
   the sequence number is set to zero. If a PE receives such
   advertisements and later learns the same MAC address(es) via local
   learning, then the PE MUST alert the operator.

16. Multicast & Broadcast

   The PEs in a particular EVPN instance may use ingress replication or
   P2MP LSPs to send multicast traffic to other PEs.

16.1. Ingress Replication

   The PEs may use ingress replication for flooding BUM traffic as
   described in section "Handling of Multi-Destination Traffic". A given
   broadcast packet must be sent to all the remote PEs. However, a given
   multicast packet for a multicast flow may be sent to only a subset of
   the PEs. Specifically, a given multicast flow may be sent to only
   those PEs that have receivers that are interested in the multicast
   flow. Determining which of the PEs have receivers for a given
   multicast flow is done using explicit tracking described below.

16.2. P2MP LSPs

   A PE may use an "Inclusive" tree for sending a BUM packet. This
   terminology is borrowed from [VPLS-MCAST].

   A variety of transport technologies may be used in the SP network.
   For inclusive P-Multicast trees, these transport technologies include
   point-to-multipoint LSPs created by RSVP-TE or mLDP.

16.2.1. Inclusive Trees

   An Inclusive Tree allows the use of a single multicast distribution
   tree, referred to as an Inclusive P-Multicast tree, in the SP network
   to carry all the multicast traffic from a specified set of EVPN
   instances on a given PE. A particular P-Multicast tree can be set up
   to carry the traffic originated by sites belonging to a single EVPN
   instance, or to carry the traffic originated by sites belonging to
   different EVPN instances. The ability to carry the traffic of more
   than one EVPN instance on the same tree is termed 'Aggregation'. The
   tree needs to include every PE that is a member of any of the EVPN
   instances that are using the tree. This implies that a PE may
   receive multicast traffic for a multicast stream even if it doesn't
   have any receivers that are interested in receiving traffic for that
   stream.

   An Inclusive P-Multicast tree as defined in this document is a P2MP
   tree. A P2MP tree is used to carry traffic only for EVPN CEs that
   are connected to the PE that is the root of the tree.

   The procedures for signaling an Inclusive Tree are the same as those
   in [VPLS-MCAST] with the VPLS A-D route replaced with the Inclusive
   Multicast Ethernet Tag route. The P-Tunnel attribute [VPLS-MCAST] for
   an Inclusive tree is advertised in the Inclusive Multicast route as
   described in section "Handling of Multi-Destination Traffic". Note
   that a PE can "aggregate" multiple inclusive trees for different
   EVPN instances on the same P2MP LSP using upstream labels. The
   procedures for aggregation are the same as those described in
   [VPLS-MCAST], with VPLS A-D routes replaced by EVPN Inclusive
   Multicast routes.

17. Convergence

   This section describes failure recovery from different types of
   network failures.

17.1. Transit Link and Node Failures between PEs

   The use of existing MPLS Fast-Reroute mechanisms can provide failure
   recovery in the order of 50ms, in the event of transit link and node
   failures in the infrastructure that connects the PEs.

17.2. PE Failures

   Consider a host host1 that is dual homed to PE1 and PE2. If PE1
   fails, a remote PE, PE3, can discover this based on the failure of
   the BGP session. This failure detection can be in the sub-second
   range if BFD is used to detect BGP session failure. PE3 can update
   its forwarding state to start sending all traffic for host1 to only
   PE2. It is to be noted that this failure recovery is potentially
   faster than what would be possible if data plane learning were to be
   used, since in that case PE3 would have to rely on re-learning of MAC
   addresses via PE2.

17.3. PE to CE Network Failures

   When an Ethernet segment connected to a PE fails or when an Ethernet
   Tag is decommissioned on an Ethernet segment, then the PE MUST
   withdraw the Ethernet A-D route(s) announced for the (ESI, Ethernet
   Tag) pairs that are impacted by the failure or decommissioning. In
   addition, the PE MUST also withdraw the MAC advertisement routes that
   are impacted by the failure or decommissioning.

   The Ethernet A-D routes should be used by an implementation to
   optimize the withdrawal of MAC advertisement routes. When a PE
   receives a withdrawal of a particular Ethernet A-D route from a PE,
   it SHOULD consider all the MAC advertisement routes that are learned
   from the same (ESI, Ethernet Tag) pair as in the Ethernet A-D route,
   from the advertising PE, as having been withdrawn. This optimizes the
   network convergence times in the event of PE to CE failures.

18. Frame Ordering

   In a MAC address, bit-1 of the most significant byte is used for
   unicast/multicast indication and bit-2 is used for globally unique
   versus locally administered MAC address. If the value of the 2nd
   nibble (bits 5 through 8) of the most significant byte of the
   destination MAC address (which follows the last MPLS label) happens
   to be 0x4 or 0x6, then the Ethernet frame can be misinterpreted as an
   IPv4 or IPv6 packet by intermediate P nodes performing ECMP based on
   deep packet inspection, thus resulting in load balancing packets
   belonging to the same flow on different ECMP paths and subjecting
   them to different delays. Therefore, packets belonging to the same
   flow can arrive at the destination out of order. This out of order
   delivery can happen during steady state in the absence of any
   failures, resulting in significant impact to the network operation.

   In order to avoid any such mis-ordering, the following rules are
   applied:

   - If a network uses deep packet inspection for its ECMP, then the
     control word SHOULD be used when sending EVPN encapsulated packets
     over an MP2P LSP.

   - If a network uses Entropy label [RFC6790], then the control word
     SHOULD NOT be used when sending EVPN encapsulated packets over an
     MP2P LSP.

   - When sending EVPN encapsulated packets over a P2MP LSP or TE P2P
     LSP, the control word SHOULD NOT be used.

   The control word is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0|       Reserved        |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   In the above diagram the first 4 bits MUST be set to 0. The rest of
   the first 16 bits are reserved for future use. They MUST be set to 0
   when transmitting, and MUST be ignored upon receipt.
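
   As a rough illustration of the layout in the diagram, the control
   word could be packed and parsed as sketched below. The function
   names are hypothetical and are not defined by this document. The
   sketch assumes network byte order and fills the 16-bit Sequence
   Number field with zero unless told otherwise. The third helper
   mirrors the nibble test that a P node doing deep packet inspection
   might apply to the first byte after the label stack.

```python
import struct


def encode_control_word(sequence_number=0):
    """Pack the 32-bit control word in network byte order.

    The first 4 bits and the 12 reserved bits are sent as zero; the
    low 16 bits carry the sequence number.
    """
    if not 0 <= sequence_number <= 0xFFFF:
        raise ValueError("sequence number must fit in 16 bits")
    return struct.pack("!HH", 0, sequence_number)


def decode_control_word(data):
    """Return (first_nibble, sequence_number) from the first 4 bytes.

    The 12 reserved bits are ignored, as required on receipt.
    """
    first16, sequence_number = struct.unpack("!HH", data[:4])
    return first16 >> 12, sequence_number


def may_be_mistaken_for_ip(first_byte_after_labels):
    """True if the leading nibble is 0x4 or 0x6 (IPv4 or IPv6 version)."""
    return (first_byte_after_labels >> 4) in (0x4, 0x6)
```

   A frame whose destination MAC address begins with 0x45 would pass
   the may_be_mistaken_for_ip() test, whereas the first byte of an
   encoded control word is always 0x00 and would not.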
   The next 16 bits provide a sequence number that MUST also be set to
   zero by default.

19. Acknowledgements

   Special thanks to Yakov Rekhter for reviewing this draft several
   times and providing valuable comments and for his very engaging
   discussions on several topics of this draft that helped shape this
   document. We would also like to thank Pedro Marques, Kaushik Ghosh,
   Nischal Sheth, Robert Raszuk, Amit Shukla and Nadeem Mohammed for
   discussions that helped shape this document. We would also like to
   thank Han Nguyen for his comments and support of this work. We would
   also like to thank Steve Kensil and Reshad Rahman for their reviews.
   We would like to thank Jorge Rabadan for his contribution to section
   5 of this draft. We would like to thank Thomas Morin for his review
   of this draft and his contribution of section 8.6. Last but not
   least, many thanks to Jakob Heitz for his help in improving several
   sections of this draft.

20. Security Considerations

   Security considerations discussed in [RFC4761] and [RFC4762] apply to
   this document for MAC learning in data-plane over an Attachment
   Circuit (AC) and for flooding of unknown unicast and ARP messages
   over the MPLS/IP core. Security considerations discussed in [RFC4364]
   apply to this document for MAC learning in control-plane over the
   MPLS/IP core. This section describes additional considerations.

   As mentioned in [RFC4761], there are two aspects to achieving data
   privacy and protecting against denial-of-service attacks in a VPN:
   securing the control plane and protecting the forwarding path.
   Compromise of the control plane could result in a PE sending customer
   data belonging to some EVPN to another EVPN, or black-holing EVPN
   customer data, or even sending it to an eavesdropper; none of which
   are acceptable from a data privacy point of view.
   In addition, compromise of the control plane could provide
   opportunities for unauthorized EVPN data usage (e.g., exploiting
   traffic replication within a multicast tree to amplify a
   denial-of-service attack based on sending large amounts of traffic).

   The mechanisms in this document use BGP for the control plane. Hence,
   techniques such as in [RFC5925] help authenticate BGP messages,
   making it harder to spoof updates (which can be used to divert EVPN
   traffic to the wrong EVPN instance) or withdrawals (denial-of-service
   attacks). In the multi-AS methods (b) and (c), this also means
   protecting the inter-AS BGP sessions, between the ASBRs, the PEs, or
   the Route Reflectors.

   Note that [RFC5925] will not help in keeping MPLS labels private --
   knowing the labels, one can eavesdrop on EVPN traffic. However, this
   requires access to the data path within an SP network, which is
   assumed to be composed of trusted nodes/links.

   One of the requirements for protecting the data plane is that the
   MPLS labels be accepted only from valid interfaces. For a PE, valid
   interfaces comprise links from other routers in the PE's own AS. For
   an ASBR, valid interfaces comprise links from other routers in the
   ASBR's own AS, and links from other ASBRs in ASes that have instances
   of a given EVPN. It is especially important in the case of multi-AS
   EVPN instances that one accept EVPN packets only from valid
   interfaces.

   It is also important to help limit malicious traffic into a network
   for an imposter MAC address. The mechanism described in section 15.1
   shows how duplicate MAC addresses can be detected and continuous
   false MAC mobility can be prevented.
   The mechanism described in section 15.2 shows how MAC addresses can
   be pinned to a given Ethernet Segment, such that if they appear
   behind any other Ethernet Segments, the traffic for those MAC
   addresses can be prevented from entering the EVPN network from the
   other Ethernet Segments.

21. Contributors

   In addition to the authors listed above, the following individuals
   also contributed to this document:

   Samer Salam
   Sami Boutros
   Keyur Patel
   Clarence Filsfils
   Dennis Cai
   Cisco

   Ravi Shekhar
   Quaizar Vohra
   Kireeti Kompella
   Apurva Mehta
   Nadeem Mohammad
   Juniper Networks

   Florin Balus
   Nuage Networks

22. IANA Considerations

   This document defines a new NLRI, called "EVPN", to be carried in BGP
   using multiprotocol extensions. This NLRI uses the existing AFI of
   25 (L2VPN). IANA has assigned it a SAFI value of 70.

23. References

23.1 Normative References

   [RFC4364] Rosen, E., Rekhter, Y., et al., "BGP/MPLS IP VPNs",
             RFC 4364, February 2006.

   [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
             (VPLS) Using BGP for Auto-Discovery and Signaling", RFC
             4761, January 2007.

   [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service
             (VPLS) Using Label Distribution Protocol (LDP) Signaling",
             RFC 4762, January 2007.

   [RFC4271] Rekhter, Y., et al., "A Border Gateway Protocol 4 (BGP-4)",
             RFC 4271, January 2006.

   [RFC4760] Bates, T., et al., "Multiprotocol Extensions for BGP-4",
             RFC 4760, January 2007.

23.2 Informative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [EVPN-REQ] Sajassi, A., Aggarwal, R., et al., "Requirements for
             Ethernet VPN", draft-ietf-l2vpn-evpn-req-04.txt, July
             2013.

   [VPLS-MCAST] Aggarwal, R., et al., "Multicast in VPLS",
             draft-ietf-l2vpn-vpls-mcast-14.txt, July 2013.

   [RT-CONSTRAIN] Marques, P., et al., "Constrained Route Distribution
             for Border Gateway Protocol/MultiProtocol Label Switching
             (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks
             (VPNs)", RFC 4684, November 2006.

   [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP
             Authentication Option", RFC 5925, June 2010.

   [RFC6790] Kompella, K., et al., "The Use of Entropy Labels in MPLS
             Forwarding", RFC 6790, November 2012.

24. Authors' Addresses

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com

   Rahul Aggarwal
   Email: raggarwa_1@yahoo.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.com

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   James Uttaro
   AT&T
   200 S. Laurel Avenue
   Middletown, NJ 07748
   USA
   Email: uttaro@att.com

   Nabil Bitar
   Verizon Communications
   Email: nabil.n.bitar@verizon.com

   John Drake
   Juniper Networks
   Email: jdrake@juniper.net