Network Working Group                                         A. Sajassi
INTERNET-DRAFT                                                     Cisco
Category: Standards Track
                                                             R. Aggarwal
N. Bitar                                                          Arktan
Verizon
                                                           W. Henderickx
S. Boutros                                                      F. Balus
K. Patel                                                  Alcatel-Lucent
S. Salam
Cisco                                                       Aldrin Isaac
                                                               Bloomberg
J. Drake
R. Shekhar                                                            J.
                                                                  Uttaro
Juniper Networks                                                    AT&T

Expires: January 15, 2014                                  July 15, 2013


                      BGP MPLS Based Ethernet VPN
                        draft-ietf-l2vpn-evpn-04

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document describes procedures for BGP MPLS based Ethernet VPNs
   (EVPN).

Table of Contents

   1. Specification of requirements
   2. Terminology
   3. Introduction
   4.
      Contributors
   5. BGP MPLS Based EVPN Overview
   6. Ethernet Segment
   7. Ethernet Tag
     7.1 VLAN Based Service Interface
     7.2 VLAN Bundle Service Interface
       7.2.1 Port Based Service Interface
     7.3 VLAN Aware Bundle Service Interface
       7.3.1 Port Based VLAN Aware Service Interface
   8. BGP EVPN NLRI
     8.1. Ethernet Auto-Discovery Route
     8.2. MAC Advertisement Route
     8.3. Inclusive Multicast Ethernet Tag Route
     8.4 Ethernet Segment Route
     8.5 ESI Label Extended Community
     8.6 ES-Import Route Target
     8.7 MAC Mobility Extended Community
     8.8 Default Gateway Extended Community
   9. Multi-homing Functions
     9.1 Multi-homed Ethernet Segment Auto-Discovery
       9.1.1 Constructing the Ethernet Segment Route
     9.2 Fast Convergence
       9.2.1 Constructing the Ethernet A-D Route per Ethernet
             Segment
         9.2.1.1. Ethernet A-D Route Targets
     9.3 Split Horizon
       9.3.1 ESI Label Assignment
         9.3.1.1 Ingress Replication
         9.3.1.2. P2MP MPLS LSPs
     9.4 Aliasing and Backup-Path
       9.4.1 Constructing the Ethernet A-D Route per EVI
         9.4.1.1 Ethernet A-D Route Targets
     9.5 Designated Forwarder Election
     9.6. Interoperability with Single-homing PEs
   10. Determining Reachability to Unicast MAC Addresses
     10.1. Local Learning
     10.2. Remote learning
       10.2.1. Constructing the BGP EVPN MAC Address Advertisement
       10.2.2 Route Resolution
   11. ARP and ND
     11.1 Default Gateway
   12. Handling of Multi-Destination Traffic
     12.1. Construction of the Inclusive Multicast Ethernet Tag
           Route
     12.2. P-Tunnel Identification
   13. Processing of Unknown Unicast Packets
     13.1. Ingress Replication
     13.2. P2MP MPLS LSPs
   14. Forwarding Unicast Packets
     14.1. Forwarding packets received from a CE
     14.2. Forwarding packets received from a remote PE
       14.2.1. Unknown Unicast Forwarding
       14.2.2. Known Unicast Forwarding
   15. Load Balancing of Unicast Frames
     15.1. Load balancing of traffic from an PE to remote CEs
       15.1.1 Single-Active Redundancy Mode
       15.1.2 All-Active Redundancy Mode
     15.2.
           Load balancing of traffic between an PE and a local CE
       15.2.1. Data plane learning
       15.2.2. Control plane learning
   16. MAC Mobility
     16.1. MAC Duplication Issue
     16.2. Sticky MAC addresses
   17. Multicast & Broadcast
     17.1. Ingress Replication
     17.2. P2MP LSPs
       17.2.1. Inclusive Trees
   18. Convergence
     18.1. Transit Link and Node Failures between PEs
     18.2. PE Failures
     18.2. PE to CE Network Failures
   19. Frame Ordering
   20. Acknowledgements
   21. Security Considerations
   22. IANA Considerations
   23. References
     23.1 Normative References
     23.2 Informative References
   24. Author's Address

1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.
Terminology

   Bridge Domain:

   Broadcast Domain:

   CE: Customer Edge device e.g., host or router or switch

   EVI: An EVPN instance spanning across the PEs participating in that
   VPN

   MAC-VRF: A Virtual Routing and Forwarding table for MAC addresses on
   a PE for an EVI

   Ethernet Segment Identifier (ESI): If a CE is multi-homed to two or
   more PEs, the set of Ethernet links that attaches the CE to the PEs
   is an 'Ethernet segment'. Ethernet segments MUST have a unique
   non-zero identifier, the 'Ethernet Segment Identifier'.

   Ethernet Tag: An Ethernet Tag identifies a particular broadcast
   domain, e.g., a VLAN. An EVPN instance consists of one or more
   broadcast domains. Ethernet tag(s) are assigned to the broadcast
   domains of a given EVPN instance by the provider of that EVPN, and
   each PE in that EVPN instance performs a mapping between broadcast
   domain identifier(s) understood by each of its attached CEs and the
   corresponding Ethernet tag.

   LACP: Link Aggregation Control Protocol

   MP2MP: Multipoint to Multipoint

   P2MP: Point to Multipoint

   P2P: Point to Point

   Single-Active Mode: When a device or a network is multi-homed to two
   or more PEs and when only a single PE in such redundancy group can
   forward traffic to/from the multi-homed device or network for a given
   VLAN, then such multi-homing or redundancy is referred to as
   "Single-Active".

   All-Active Mode: When a device is multi-homed to two or more PEs and
   when all PEs in such redundancy group can forward traffic to/from the
   multi-homed device for a given VLAN, then such multi-homing or
   redundancy is referred to as "All-Active".

3. Introduction

   This document describes procedures for BGP MPLS based Ethernet VPNs
   (EVPN). The procedures described here are intended to meet the
   requirements specified in [EVPN-REQ]. Please refer to [EVPN-REQ] for
   the detailed requirements and motivation.
   EVPN requires extensions to
   existing IP/MPLS protocols as described in this document. In addition
   to these extensions EVPN uses several building blocks from existing
   MPLS technologies.

4. Contributors

   In addition to the authors listed above, the following individuals
   also contributed to this document:

   Quaizar Vohra
   Kireeti Kompella
   Apurva Mehta
   Nadeem Mohammad
   Juniper Networks

   Clarence Filsfils
   Dennis Cai
   Cisco

5. BGP MPLS Based EVPN Overview

   This section provides an overview of EVPN. An EVPN instance comprises
   CEs that are connected to PEs that form the edge of the MPLS
   infrastructure. A CE may be a host, a router or a switch. The PEs
   provide virtual Layer 2 bridged connectivity between the CEs. There
   may be multiple EVPN instances in the provider's network.

   The PEs may be connected by an MPLS LSP infrastructure which provides
   the benefits of MPLS technology such as fast-reroute, resiliency,
   etc. The PEs may also be connected by an IP infrastructure in which
   case IP/GRE tunneling or other IP tunneling can be used between the
   PEs. The detailed procedures in this version of this document are
   specified only for MPLS LSPs as the tunneling technology. However
   these procedures are designed to be extensible to IP tunneling as the
   PSN tunneling technology.

   In an EVPN, MAC learning between PEs occurs not in the data plane (as
   happens with traditional bridging) but in the control plane. Control
   plane learning offers greater control over the MAC learning process,
   such as restricting who learns what, and the ability to apply
   policies. Furthermore, the control plane chosen for advertising MAC
   reachability information is multi-protocol (MP) BGP (similar to IP
   VPNs (RFC 4364)).
   This provides greater scalability and the ability
   to preserve the "virtualization" or isolation of groups of
   interacting agents (hosts, servers, virtual machines) from each
   other. In EVPN, PEs advertise the MAC addresses learned from the CEs
   that are connected to them, along with an MPLS label, to other PEs in
   the control plane using MP-BGP. Control plane learning enables load
   balancing of traffic to and from CEs that are multi-homed to multiple
   PEs. This is in addition to load balancing across the MPLS core via
   multiple LSPs between the same pair of PEs. In other words it allows
   CEs to connect to multiple active points of attachment. It also
   improves convergence times in the event of certain network failures.

   However, learning between PEs and CEs is done by the method best
   suited to the CE: data plane learning, IEEE 802.1x, LLDP, 802.1aq,
   ARP, management plane or other protocols.

   It is a local decision as to whether the Layer 2 forwarding table on
   a PE is populated with all the MAC destination addresses known to
   the control plane, or whether the PE implements a cache based scheme.
   For instance the MAC forwarding table may be populated only with the
   MAC destinations of the active flows transiting a specific PE.

   The policy attributes of EVPN are very similar to those of IP-VPN. An
   EVPN instance requires a Route-Distinguisher (RD) which is unique per
   PE and one or more globally unique Route-Targets (RTs). A CE attaches
   to a MAC-VRF on a PE, on an Ethernet interface which may be
   configured for one or more Ethernet Tags, e.g., VLAN IDs. Some
   deployment scenarios guarantee uniqueness of VLAN IDs across EVPN
   instances: all points of attachment for a given EVPN instance use the
   same VLAN ID, and no other EVPN instance uses this VLAN ID. This
   document refers to this case as a "Unique VLAN EVPN" and describes
   simplified procedures to optimize for it.

6.
Ethernet Segment

   If a CE is multi-homed to two or more PEs, the set of Ethernet links
   constitutes an "Ethernet Segment". An Ethernet segment may appear to
   the CE as a Link Aggregation Group (LAG). Ethernet segments have an
   identifier, called the "Ethernet Segment Identifier" (ESI) which is
   encoded as a ten-octet integer. The following two ESI values are
   reserved:

   - ESI 0 denotes a single-homed CE.

   - ESI {0xFF} (repeated 10 times) is known as MAX-ESI and is reserved.

   In general, an Ethernet segment MUST have a non-reserved ESI that is
   unique network wide (i.e., across all EVPN instances on all the PEs).
   If the CE(s) constituting an Ethernet Segment is (are) managed by the
   network operator, then ESI uniqueness should be guaranteed; however,
   if the CE(s) is (are) not managed, then the operator MUST configure a
   network-wide unique ESI for that Ethernet Segment. This is required
   to enable auto-discovery of Ethernet Segments and DF election. The
   ESI can be assigned using various mechanisms:

   1. If IEEE 802.1AX LACP is used between the PEs and CEs, then
      the ESI is determined from LACP by concatenating the following
      parameters:

      + CE LACP System Identifier comprised of two octets of System
        Priority and six octets of System MAC address, where the
        System Priority is encoded in the most significant two octets.
        The CE LACP identifier MUST be encoded in the high order eight
        octets of the ESI.

      + CE LACP two-octet Port Key. The CE LACP port key MUST be
        encoded in the low order two octets of the ESI.

      As far as the CE is concerned, it would treat the multiple PEs
      that it is connected to as the same switch. This allows the CE
      to aggregate links that are attached to different PEs in the
      same bundle.

      This mechanism could be used only if it produces ESIs that satisfy
      the uniqueness requirement specified above.

   2.
      In the case of indirectly connected hosts via a bridged LAN
      between the CEs and the PEs, the ESI is determined based on the
      Layer 2 bridge protocol as follows: If MST is used in the bridged
      LAN then the value of the ESI is derived by listening to BPDUs on
      the Ethernet segment. To achieve this the PE is not required to
      run MST. However, the PE must learn the Root Bridge MAC address
      and Bridge Priority of the root of the Internal Spanning Tree
      (IST) by listening to the BPDUs. The ESI is constructed as
      follows:

      {Bridge Priority (16 bits) , Root Bridge MAC Address (48 bits)}

      This mechanism could be used only if it produces ESIs that satisfy
      the uniqueness requirement specified above.

   3. The ESI may be configured.

7. Ethernet Tag

   An Ethernet Tag identifies a particular broadcast domain, e.g., a
   VLAN, in an EVPN Instance. An EVPN Instance consists of one or more
   broadcast domains (one or more VLANs). VLANs are assigned to a given
   EVPN Instance by the provider of the EVPN service. A given VLAN can
   itself be represented by multiple VLAN IDs (VIDs). In such cases, the
   PEs participating in that VLAN for a given EVPN instance are
   responsible for performing VLAN ID translation to/from locally
   attached CE devices.

   If a VLAN is represented by a single VID across all PE devices
   participating in that VLAN for that EVPN instance, then there is no
   need for VID translation at the PEs. Furthermore, some deployment
   scenarios guarantee uniqueness of VIDs across all EVPN instances;
   all points of attachment for a given EVPN instance use the same VID
   and no other EVPN instances use that VID. This allows the RT(s) for
   each EVPN instance to be derived automatically from the corresponding
   VID, as described in section 9.4.1.1.1 "Auto-Derivation from the
   Ethernet Tag ID".
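
   The LACP-based ESI derivation listed as mechanism 1 in section 6 can
   be sketched as follows. This is a minimal illustration, not a
   normative procedure: the function name esi_from_lacp, its parameter
   names, and the sample values are assumptions introduced here. It
   packs the two-octet System Priority and six-octet System MAC into the
   high order eight octets and the Port Key into the low order two
   octets, then rejects the two reserved ESI values.

```python
import struct

# The two reserved values from section 6: ESI 0 and MAX-ESI.
RESERVED_ESIS = (bytes(10), b"\xff" * 10)


def esi_from_lacp(system_priority: int, system_mac: bytes, port_key: int) -> bytes:
    """Concatenate CE LACP parameters into a ten-octet ESI (hypothetical helper)."""
    if not 0 <= system_priority <= 0xFFFF:
        raise ValueError("system priority must fit in two octets")
    if len(system_mac) != 6:
        raise ValueError("system MAC address must be six octets")
    if not 0 <= port_key <= 0xFFFF:
        raise ValueError("port key must fit in two octets")
    # Network byte order: priority (2) | MAC (6) | port key (2) = 10 octets.
    esi = struct.pack("!H6sH", system_priority, system_mac, port_key)
    if esi in RESERVED_ESIS:
        raise ValueError("derived ESI collides with a reserved value")
    return esi


# Sample values are illustrative only.
example = esi_from_lacp(0x8000, bytes.fromhex("00005e005301"), 0x0021)
print(example.hex())  # 800000005e0053010021
```

   The sketch does not check network-wide uniqueness, which section 6
   leaves to the operator or to the properties of the LACP parameters.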

   The following subsections discuss the relationship between broadcast
   domains (e.g., VLANs), Ethernet Tags (e.g., VIDs), and MAC-VRFs as
   well as the setting of the Ethernet Tag Identifier, in the various
   EVPN BGP routes (defined in section 8), for the different types of
   service interfaces described in [EVPN-REQ].

7.1 VLAN Based Service Interface

   With this service interface, an EVPN instance consists of only a
   single broadcast domain (e.g., a single VLAN). Therefore, there is a
   one to one mapping between a VID on this interface and a MAC-VRF.
   Since a MAC-VRF corresponds to a single VLAN, it consists of a single
   bridge domain corresponding to that VLAN. If the VLAN is represented
   by different VIDs on different PEs, then each PE needs to perform VID
   translation for frames destined to its attached CEs. In such
   scenarios, the Ethernet frames transported over the MPLS/IP network
   SHOULD remain tagged with the originating VID and a VID translation
   MUST be supported in the data path and MUST be performed on the
   disposition PE. The Ethernet Tag Identifier in all EVPN routes MUST
   be set to 0.

7.2 VLAN Bundle Service Interface

   With this service interface, an EVPN instance corresponds to several
   broadcast domains (e.g., several VLANs); however, only a single
   bridge domain is maintained per MAC-VRF which means multiple VLANs
   share the same bridge domain. This implies MAC addresses MUST be
   unique across different VLANs for this service to work. In other
   words, there is a many-to-one mapping between VLANs and a MAC-VRF,
   and the MAC-VRF consists of a single bridge domain. Furthermore, a
   single VLAN must be represented by a single VID, i.e., no VID
   translation is allowed for this service interface type. The MPLS
   encapsulated frames MUST remain tagged with the originating VID. Tag
   translation is NOT permitted.
   The Ethernet Tag Identifier in all EVPN
   routes MUST be set to 0.

7.2.1 Port Based Service Interface

   This service interface is a special case of the VLAN Bundle service
   interface, where all of the VLANs on the port are part of the same
   service and map to the same bundle. The procedures are identical to
   those described in section 7.2.

7.3 VLAN Aware Bundle Service Interface

   With this service interface, an EVPN instance consists of several
   broadcast domains (e.g., several VLANs) with each VLAN having its own
   bridge domain, i.e., multiple bridge domains (one per VLAN) are
   maintained by a single MAC-VRF corresponding to the EVPN instance. In
   the case where a single VLAN is represented by different VIDs on
   different CEs and thus tag (VID) translation is required, a
   normalized Ethernet Tag (VID) MUST be carried in the MPLS
   encapsulated frames and a tag translation function MUST be supported
   in the data path. This translation MUST be performed in the data path
   on both the imposition and the disposition PEs (translating to the
   normalized tag on the imposition PE and translating to the local tag
   on the disposition PE). The Ethernet Tag Identifier in all EVPN
   routes MUST be set to the normalized Ethernet Tag assigned by the
   EVPN provider.

7.3.1 Port Based VLAN Aware Service Interface

   This service interface is a special case of the VLAN Aware Bundle
   service interface, where all of the VLANs on the port are part of the
   same service and map to the same bundle. The procedures are identical
   to those described in section 7.3.

8. BGP EVPN NLRI

   This document defines a new BGP NLRI, called the EVPN NLRI.

   Following is the format of the EVPN NLRI:

                 +-----------------------------------+
                 |    Route Type (1 octet)           |
                 +-----------------------------------+
                 |     Length (1 octet)              |
                 +-----------------------------------+
                 | Route Type specific (variable)    |
                 +-----------------------------------+

   The Route Type field defines encoding of the rest of the EVPN NLRI
   (Route Type specific EVPN NLRI).

   The Length field indicates the length in octets of the Route Type
   specific field of EVPN NLRI.

   This document defines the following Route Types:

      + 1 - Ethernet Auto-Discovery (A-D) route
      + 2 - MAC advertisement route
      + 3 - Inclusive Multicast Ethernet Tag route
      + 4 - Ethernet Segment route

   The detailed encoding and procedures for these route types are
   described in subsequent sections.

   The EVPN NLRI is carried in BGP [RFC4271] using BGP Multiprotocol
   Extensions [RFC4760] with an AFI of 25 (L2VPN) and a SAFI of 70
   (EVPN). The NLRI field in the MP_REACH_NLRI/MP_UNREACH_NLRI attribute
   contains the EVPN NLRI (encoded as specified above).

   In order for two BGP speakers to exchange labeled EVPN NLRI, they
   must use BGP Capabilities Advertisement to ensure that they both are
   capable of properly processing such NLRI. This is done as specified
   in [RFC4760], by using capability code 1 (multiprotocol BGP) with an
   AFI of 25 (L2VPN) and a SAFI of 70 (EVPN).

8.1.
Ethernet Auto-Discovery Route

   An Ethernet A-D route type specific EVPN NLRI consists of the
   following:

                +---------------------------------------+
                |  RD (8 octets)                        |
                +---------------------------------------+
                |Ethernet Segment Identifier (10 octets)|
                +---------------------------------------+
                |  Ethernet Tag ID (4 octets)           |
                +---------------------------------------+
                |  MPLS Label (3 octets)                |
                +---------------------------------------+

   For procedures and usage of this route please see section 9.2 "Fast
   Convergence" and section 9.4 "Aliasing and Backup-Path".

8.2. MAC Advertisement Route

   A MAC advertisement route type specific EVPN NLRI consists of the
   following:

                +---------------------------------------+
                |  RD (8 octets)                        |
                +---------------------------------------+
                |Ethernet Segment Identifier (10 octets)|
                +---------------------------------------+
                |  Ethernet Tag ID (4 octets)           |
                +---------------------------------------+
                |  MAC Address Length (1 octet)         |
                +---------------------------------------+
                |  MAC Address (6 octets)               |
                +---------------------------------------+
                |  IP Address Length (1 octet)          |
                +---------------------------------------+
                |  IP Address (4 or 16 octets)          |
                +---------------------------------------+
                |  MPLS Label (3 octets)                |
                +---------------------------------------+

   For the purpose of BGP route key processing, only the Ethernet Tag
   ID, MAC Address Length, MAC Address, IP Address Length, and IP
   Address fields are considered to be part of the prefix in the
   NLRI. The Ethernet Segment Identifier and MPLS Label fields are to be
   treated as route attributes as opposed to being part of the "route".

   For procedures and usage of this route please see section 10
   "Determining Reachability to Unicast MAC Addresses" and section 15
   "Load Balancing of Unicast Frames".

8.3.
Inclusive Multicast Ethernet Tag Route

   An Inclusive Multicast Ethernet Tag route type specific EVPN NLRI
   consists of the following:

                +---------------------------------------+
                |  RD (8 octets)                        |
                +---------------------------------------+
                |  Ethernet Tag ID (4 octets)           |
                +---------------------------------------+
                |  IP Address Length (1 octet)          |
                +---------------------------------------+
                |  Originating Router's IP Addr         |
                |          (4 or 16 octets)             |
                +---------------------------------------+

   For procedures and usage of this route please see section 12
   "Handling of Multi-Destination Traffic", section 13 "Processing of
   Unknown Unicast Packets" and section 17 "Multicast & Broadcast".

8.4 Ethernet Segment Route

   The Ethernet Segment Route is encoded in the EVPN NLRI using the
   Route Type value of 4. The Route Type Specific field of the NLRI is
   formatted as follows:

                +---------------------------------------+
                |  RD (8 octets)                        |
                +---------------------------------------+
                |Ethernet Segment Identifier (10 octets)|
                +---------------------------------------+
                |  IP Address Length (1 octet)          |
                +---------------------------------------+
                |  Originating Router's IP Addr         |
                |          (4 or 16 octets)             |
                +---------------------------------------+

   For procedures and usage of this route please see section 9.5
   "Designated Forwarder Election".

8.5 ESI Label Extended Community

   This extended community is a new transitive extended community with
   a Type field of 0x06 and a Sub-Type of 0x01. It may be
   advertised along with Ethernet Auto-Discovery routes and it enables
   split-horizon procedures for multi-homed sites as described in
   section 9.3 "Split Horizon".
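
   Returning briefly to the route encodings of sections 8 and 8.4, the
   following sketch shows how the generic EVPN NLRI header (Route Type,
   Length) wraps the Ethernet Segment route fields. It is an
   illustration under stated assumptions rather than a reference
   encoder: the function name ethernet_segment_nlri and its parameters
   are hypothetical, and the IP Address Length field is assumed to be
   expressed in bits (32 or 128), which the text above does not state.

```python
import ipaddress

ROUTE_TYPE_ETHERNET_SEGMENT = 4  # Route Type value from section 8


def ethernet_segment_nlri(rd: bytes, esi: bytes, originator: str) -> bytes:
    """Build an Ethernet Segment route NLRI (hypothetical helper)."""
    if len(rd) != 8:
        raise ValueError("RD must be 8 octets")
    if len(esi) != 10:
        raise ValueError("ESI must be 10 octets")
    address = ipaddress.ip_address(originator).packed  # 4 or 16 octets
    # Assumption: the IP Address Length octet counts bits, not octets.
    specific = rd + esi + bytes([len(address) * 8]) + address
    # Length covers only the Route Type specific part, in octets.
    return bytes([ROUTE_TYPE_ETHERNET_SEGMENT, len(specific)]) + specific


# Sample RD and ESI values are illustrative only.
nlri = ethernet_segment_nlri(
    bytes.fromhex("0001c00002010000"),
    bytes.fromhex("800000005e0053010021"),
    "192.0.2.1",
)
print(nlri.hex())
```

   The same header-plus-body pattern applies to the other three route
   types; only the Route Type value and the field list change.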

   Each ESI Label Extended Community is encoded as an 8-octet value as
   follows:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x01 | Flags(1 octet)|  Reserved=0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Reserved=0   |          ESI Label                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The low order bit of the flags octet is defined as the
   "Active-Standby" bit and may be set to 1. A value of 0 means that the
   multi-homed site is operating in All-Active mode; whereas, a value
   of 1 means that the multi-homed site is operating in Single-Active
   mode.

   The second low order bit of the flags octet is defined as the
   "Root-Leaf" bit. A value of 0 means that this label is associated
   with a Root site; whereas, a value of 1 means that this label is
   associated with a Leaf site. The other bits must be set to 0.

8.6 ES-Import Route Target

   This is a new transitive Route Target extended community carried with
   the Ethernet Segment route. When used, it enables all the PEs
   connected to the same multi-homed site to import the Ethernet Segment
   routes. The value is derived automatically from the ESI by encoding
   the 6-byte MAC address portion of the ESI in the ES-Import Route
   Target.
   The format of this extended community is as follows:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x02 |          ES-Import            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     ES-Import Cont'd                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This document expands the definition of the Route Target extended
   community to allow the value of the high order octet (Type field) to
   be 0x06 (in addition to the values specified in RFC 4360). The value
   of the low order octet (Sub-Type field) of 0x02 indicates that this
   extended community is of type "Route Target". The new value for the
   Type field of 0x06 indicates that the structure of this RT is a
   six-byte value (e.g., a MAC address). A BGP speaker that implements
   RT-Constrain (RFC 4684) MUST apply the RT-Constrain procedures to the
   ES-Import RT as well.

   For procedures and usage of this attribute, please see section 9.1
   "Multi-homed Ethernet Segment Auto-Discovery".

8.7 MAC Mobility Extended Community

   This extended community is a new transitive extended community with
   the Type field of 0x06 and the Sub-Type of 0x00. It may be advertised
   along with MAC Advertisement routes. The procedures for using this
   Extended Community are described in section 16 "MAC Mobility".

   The MAC Mobility Extended Community is encoded as an 8-octet value as
   follows:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x00 | Flags(1 octet)|  Reserved=0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Sequence Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The low order bit of the flags octet is defined as the
   "Sticky/static" flag and may be set to 1.
   A value of 1 means that the
   MAC address is static and cannot move.

8.8 Default Gateway Extended Community

   The Default Gateway community is an Extended Community of an Opaque
   Type (see section 3.3 of RFC 4360). It is a transitive community,
   which means that the first octet is 0x03. The value of the second
   octet (Sub-Type) is 0x0d (Default Gateway) as defined by IANA. The
   Value field of this community is reserved (set to 0 by the senders,
   ignored by the receivers).

9. Multi-homing Functions

   This section discusses the functions, procedures and associated BGP
   routes used to support multi-homing in EVPN. This covers both
   multi-homed device (MHD) as well as multi-homed network (MHN)
   scenarios.

9.1 Multi-homed Ethernet Segment Auto-Discovery

   PEs connected to the same Ethernet segment can automatically discover
   each other with minimal to no configuration through the exchange of
   the Ethernet Segment route.

9.1.1 Constructing the Ethernet Segment Route

   The Route-Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. The value
   field comprises an IP address of the PE (typically, the loopback
   address) followed by 0's.

   The Ethernet Segment Identifier MUST be set to the ten octet ESI
   identifier described in section 6.

   The BGP advertisement that advertises the Ethernet Segment route MUST
   also carry an ES-Import extended community attribute, as defined in
   section 8.6.

   The Ethernet Segment Route filtering MUST be done such that the
   Ethernet Segment Route is imported only by the PEs that are
   multi-homed to the same Ethernet Segment. To that end, each PE that
   is connected to a particular Ethernet segment constructs an import
   filtering rule to import a route that carries the ES-Import extended
   community, constructed from the ESI.

9.2 Fast Convergence

   In EVPN, MAC address reachability is learnt via the BGP control-plane
   over the MPLS network.
As such, in the absence of any fast protection mechanism, the network convergence time is a function of the number of MAC Advertisement routes that must be withdrawn by the PE encountering a failure. For highly scaled environments, this scheme yields slow convergence.

To alleviate this, EVPN defines a mechanism to efficiently and quickly signal, to remote PE nodes, the need to update their forwarding tables upon the occurrence of a failure in connectivity to an Ethernet segment. This is done by having each PE advertise an Ethernet A-D Route per Ethernet segment for each locally attached segment (refer to section 9.2.1 below for details on how this route is constructed). Upon a failure in connectivity to the attached segment, the PE withdraws the corresponding Ethernet A-D route. This triggers all PEs that receive the withdrawal to update their next-hop adjacencies for all MAC addresses associated with the Ethernet segment in question. If no other PE had advertised an Ethernet A-D route for the same segment, then the PE that received the withdrawal simply invalidates the MAC entries for that segment. Otherwise, the PE updates the next-hop adjacencies to point to the backup PE(s).

9.2.1 Constructing the Ethernet A-D Route per Ethernet Segment

This section describes procedures to construct the Ethernet A-D route when a single such route is advertised by a PE for a given Ethernet Segment. This flavor of the Ethernet A-D route is used for fast convergence (as discussed above) as well as for advertising the ESI label used for split-horizon filtering (as discussed in section 9.3). Support of this route flavor is MANDATORY.

The Route-Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by 0.
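
As an illustration of the Type 1 RD used here, the following is a minimal sketch, assuming the RFC 4364 layout of a 2-octet Type field, a 4-octet IPv4 Administrator subfield and a 2-octet Assigned Number subfield. The function name type1_rd and the example address are hypothetical and not part of this specification.

```python
import ipaddress
import struct


def type1_rd(pe_ip: str, assigned_number: int = 0) -> bytes:
    """Encode an 8-octet Type 1 Route Distinguisher.

    Layout assumed from RFC 4364: 2-octet type (1), 4-octet IPv4
    address, 2-octet assigned number (0 for the per-segment route).
    """
    if not 0 <= assigned_number <= 0xFFFF:
        raise ValueError("assigned number must fit in 2 octets")
    address = ipaddress.IPv4Address(pe_ip)
    return struct.pack("!H4sH", 1, address.packed, assigned_number)


# Hypothetical PE loopback address, followed by 0 as required above.
rd = type1_rd("192.0.2.1")
print(rd.hex())
```

The same helper would cover the per-EVI case by passing a non-zero assigned number.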
The Ethernet Segment Identifier MUST be a ten octet entity as described in section "Ethernet Segment". This document does not specify the use of the Ethernet A-D route when the Segment Identifier is set to 0.

The Ethernet Tag ID MUST be set to 0.

The MPLS label in the NLRI MUST be set to 0.

The "ESI Label Extended Community" MUST be included in the route. If All-Active multi-homing is desired, then the "Active-Standby" bit in the flags of the ESI Label Extended Community MUST be set to 0 and the MPLS label in that extended community MUST be set to a valid MPLS label value. The MPLS label in this Extended Community is referred to as an "ESI label". This label MUST be a downstream assigned MPLS label if the advertising PE is using ingress replication for receiving multicast, broadcast or unknown unicast traffic from other PEs. If the advertising PE is using P2MP MPLS LSPs for sending multicast, broadcast or unknown unicast traffic, then this label MUST be an upstream assigned MPLS label. The usage of this label is described in section 9.3.

If the Ethernet Segment is connected to more than one PE and Single-Active multi-homing is desired, then the "Active-Standby" bit in the flags of the ESI Label Extended Community MUST be set to 1 and the ESI label MUST be set to zero.

9.2.1.1. Ethernet A-D Route Targets

The Ethernet A-D route MUST carry one or more Route Target (RT) attributes. These RTs MUST be the set of RTs associated with all the EVPN instances to which the Ethernet Segment, corresponding to the Ethernet A-D route, belongs.

9.3 Split Horizon

Consider a CE that is multi-homed to two or more PEs on an Ethernet segment ES1 operating in All-Active mode.
If the CE sends a broadcast, unknown unicast, or multicast (BUM) packet to one of the non-DF (Designated Forwarder) PEs, say PE1, then PE1 will forward that packet to all or a subset of the other PEs in that EVPN instance, including the DF PE for that Ethernet segment. In this case the DF PE that the CE is multi-homed to MUST drop the packet and not forward it back to the CE. This filtering is referred to as "split horizon" filtering in this document.

In order to achieve this split horizon function, every BUM packet originating from a non-DF PE is encapsulated with an MPLS label that identifies the Ethernet segment of origin (i.e., the segment from which the frame entered the EVPN network). This label is referred to as the ESI label, and MUST be distributed by all PEs when operating in All-Active multi-homing mode using the "Ethernet A-D route per Ethernet Segment" as per the procedures in section 9.2.1 above. This route is imported by the PEs connected to the Ethernet Segment and also by the PEs that have at least one EVPN instance in common with the Ethernet Segment in the route. As described in section 9.2.1, the route MUST carry an ESI Label Extended Community with a valid ESI label. The disposition DF PE relies on the value of the ESI label to determine whether or not a BUM frame is allowed to egress a specific Ethernet segment. It should be noted that if the BUM frame is originated from the DF PE operating in All-Active multi-homing mode, then the DF PE MAY choose not to encapsulate the frame with the ESI label. Furthermore, if the multi-homed PEs operate in active/standby mode, then the packet MUST NOT be encapsulated with the ESI label and the label value MUST be set to zero in the ESI Label Extended Community per section 9.2.1 above.
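
The split-horizon rules above can be sketched as two small decisions: whether an ingress PE pushes the ESI label, and which local segments a disposition PE may flood to. This is an illustrative sketch only; the names Mode, pushes_esi_label and egress_segments are hypothetical, and the sketch assumes a DF PE exercises its option to omit the label.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    ALL_ACTIVE = "all-active"
    SINGLE_ACTIVE = "single-active"  # called active/standby in the text


def pushes_esi_label(mode: Mode, is_df: bool) -> bool:
    """Decide whether the ingress PE encapsulates a BUM frame with the ESI label."""
    if mode is Mode.SINGLE_ACTIVE:
        return False  # MUST NOT encapsulate in active/standby mode
    # In All-Active mode a non-DF PE pushes the label. A DF PE MAY omit
    # it, and this sketch assumes that it does.
    return not is_df


def egress_segments(local_segments: List[str], source_segment: Optional[str]) -> List[str]:
    """Return the local segments a disposition PE may flood a BUM frame to.

    source_segment is the segment identified by the received ESI label,
    or None when the frame carried no ESI label.
    """
    return [s for s in local_segments if s != source_segment]


print(pushes_esi_label(Mode.ALL_ACTIVE, is_df=False))
print(egress_segments(["ES1", "ES2"], "ES1"))
```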
9.3.1 ESI Label Assignment

The following subsections describe the assignment procedures for the ESI label, which differ depending on the type of tunnels being used to deliver multi-destination packets in the EVPN network.

9.3.1.1 Ingress Replication

All PEs operating in an All-Active multi-homing mode that rely on ingress replication for the reception of BUM traffic distribute to the other PEs that belong to the Ethernet segment a downstream assigned "ESI label" in the Ethernet A-D route per ESI. This label MUST be programmed in the platform label space by the advertising PE. Further, the forwarding entry for this label must result in NOT forwarding packets received with this label onto the Ethernet segment that the label was distributed for.

Consider PE1 and PE2 that are multi-homed to CE1 on ES1 and operating in All-Active multi-homing mode. Further consider that PE1 is using P2P or MP2P LSPs to send packets to PE2. Consider that PE1 is the non-DF for VLAN1 and PE2 is the DF for VLAN1, and PE1 receives a BUM packet from CE1 on VLAN1 on ES1. In this scenario, PE2 distributes an Inclusive Multicast Ethernet Tag route for VLAN1 corresponding to an EVPN instance. So, when PE1 sends a BUM packet that it receives from CE1, it MUST first push onto the MPLS label stack the ESI label that PE2 has distributed for ES1. It MUST then push on the MPLS label distributed by PE2 in the Inclusive Multicast Ethernet Tag route for VLAN1. The resulting packet is further encapsulated in the P2P or MP2P LSP label stack required to transmit the packet to PE2. When PE2 receives this packet, it determines the set of ESIs to replicate the packet to from the top MPLS label, after any P2P or MP2P LSP labels have been removed. If the next label is the ESI label assigned by PE2 for ES1, then PE2 MUST NOT forward the packet onto ES1.
If the next label is an ESI label which has not been assigned by PE2, then PE2 MUST drop the packet. It should be noted that in this scenario, if PE2 receives BUM traffic for VLAN1 from CE1, then it doesn't need to encapsulate the packet with an ESI label when sending it to PE1, since PE1 can use its DF logic to filter the BUM packets and thus doesn't need to use split-horizon filtering for ES1.

9.3.1.2. P2MP MPLS LSPs

The non-DF PEs operating in an All-Active multi-homing mode that are using P2MP LSPs for sending BUM traffic distribute to the other PEs that belong to the Ethernet segment, or that have an EVPN instance in common with the Ethernet Segment, an upstream assigned "ESI label" in the Ethernet A-D route. This label is upstream assigned by the PE that advertises the route. This label MUST be programmed by the other PEs that are connected to the ESI advertised in the route, in the context label space for the advertising PE. Further, the forwarding entry for this label must result in NOT forwarding packets received with this label onto the Ethernet segment that the label was distributed for. This label MUST also be programmed by the other PEs that import the route but are not connected to the ESI advertised in the route, in the context label space for the advertising PE. Further, the forwarding entry for this label must be a POP with no other associated action.

Consider PE1 and PE2 that are multi-homed to CE1 on ES1 and operating in All-Active multi-homing mode. Also consider that PE3 belongs to one of the EVPN instances of ES1. Further, assume that PE1, which is the non-DF, is using P2MP MPLS LSPs to send BUM packets. When PE1 sends a BUM packet that it receives from CE1, it MUST first push onto the MPLS label stack the ESI label that it has assigned for the ESI that the packet was received on.
The resulting packet is further encapsulated in the P2MP MPLS label stack necessary to transmit the packet to the other PEs. Penultimate hop popping MUST be disabled on the P2MP LSPs used in the MPLS transport infrastructure for EVPN. When PE2 receives this packet, it decapsulates the top MPLS label and forwards the packet using the context label space determined by the top label. If the next label is the ESI label assigned by PE1 to ES1, then PE2 MUST NOT forward the packet onto ES1. When PE3 receives this packet, it decapsulates the top MPLS label and forwards the packet using the context label space determined by the top label. If the next label is the ESI label assigned by PE1 to ES1 and PE3 is not connected to ES1, then PE3 MUST pop the label and flood the packet over all local ESIs in that EVPN instance. It should be noted that when PE2 sends a BUM frame over a P2MP LSP, it does not need to encapsulate the frame with an ESI label because it is the DF for that VLAN.

9.4 Aliasing and Backup-Path

In the case where a CE is multi-homed to multiple PE nodes, using a LAG with All-Active redundancy, it is possible that only a single PE learns a set of the MAC addresses associated with traffic transmitted by the CE. This leads to a situation where remote PE nodes receive MAC advertisement routes, for these addresses, from a single PE even though multiple PEs are connected to the multi-homed segment. As a result, the remote PEs are not able to effectively load-balance traffic among the PE nodes connected to the multi-homed Ethernet segment. This could be the case, for example, when the PEs perform data-path learning on the access, and the load-balancing function on the CE hashes traffic from a given source MAC address to a single PE. Another scenario where this occurs is when the PEs rely on control plane learning on the access (e.g.
using ARP), since ARP traffic will be hashed to a single link in the LAG.

To alleviate this issue, EVPN introduces the concept of 'Aliasing'. Aliasing refers to the ability of a PE to signal that it has reachability to a given locally attached Ethernet segment, even when it has learnt no MAC addresses from that segment. The Ethernet A-D route per EVI is used to that end. Remote PEs which receive MAC advertisement routes with a non-reserved ESI SHOULD consider the advertised MAC address as reachable via all PEs which have advertised reachability to the relevant Segment using: (1) Ethernet A-D routes per EVI with the same ESI (and Ethernet Tag if applicable) AND (2) Ethernet A-D routes per ESI with the same ESI and with the Active/Standby bit set to 0 in the ESI Label Extended Community.

This flavor of Ethernet A-D route per EVI, associated with aliasing, can arrive at target PEs asynchronously relative to the flavor of Ethernet A-D route associated with split-horizon and mass-withdraw (i.e., per ESI). Therefore, if the Ethernet A-D route per EVI arrives ahead of the Ethernet A-D route per ESI, then the former must NOT be used for traffic forwarding until the latter arrives. This will take care of corner cases and race conditions where the Ethernet A-D route associated with mass-withdraw is withdrawn but a PE still receives the route associated with aliasing.

Backup-Path is a closely related function, although it applies to the case where the redundancy mode is Active/Standby. In this case, the PE advertises that it has reachability to a given locally attached Ethernet Segment using the Ethernet A-D route as well. Remote PEs which receive the MAC advertisement routes, with a non-reserved ESI, MUST consider the MAC address as reachable via the advertising PE.
Furthermore, the remote PEs SHOULD install a Backup-Path, for said MAC, to the PE which had advertised reachability to the relevant Segment using (1) Ethernet A-D routes per EVI with the same ESI (and Ethernet Tag if applicable) AND (2) Ethernet A-D routes per ESI with the same ESI and with the Active/Standby bit set to 1 in the ESI Label Extended Community.

9.4.1 Constructing the Ethernet A-D Route per EVI

This section describes procedures to construct the Ethernet A-D route when one or more such routes are advertised by a PE for a given EVI. This flavor of the Ethernet A-D route is used for aliasing, and support of this route flavor is OPTIONAL.

The Route-Distinguisher (RD) MUST be set to the RD of the EVI that is advertising the NLRI. An RD MUST be assigned for a given EVI on a PE. This RD MUST be unique across all EVIs on a PE. It is RECOMMENDED to use the Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by a number unique to the PE. This number may be generated by the PE. Or, in the Unique VLAN EVPN case, the low order 12 bits may be the 12 bit VLAN ID, with the remaining high order 4 bits set to 0.

The Ethernet Segment Identifier MUST be a ten octet entity as described in section "Ethernet Segment Identifier". This document does not specify the use of the Ethernet A-D route when the Segment Identifier is set to 0.

The Ethernet Tag ID is the identifier of an Ethernet Tag on the Ethernet segment. This value may be a 12 bit VLAN ID, in which case the low order 12 bits are set to the VLAN ID and the high order 20 bits are set to 0. Or it may be another Ethernet Tag used by the EVPN. It MAY be set to the default Ethernet Tag on the Ethernet segment or to the value 0.
Note that the above allows the Ethernet A-D route to be advertised with one of the following granularities:

+ One Ethernet A-D route for a given (ESI, Ethernet Tag ID) tuple per EVI. This is applicable when the PE uses MPLS-based disposition.

+ One Ethernet A-D route per (ESI, EVI) (where the Ethernet Tag ID is set to 0). This is applicable when the PE uses MAC-based disposition, or when the PE uses MPLS-based disposition when no VLAN translation is required.

The usage of the MPLS label is described in the section on "Load Balancing of Unicast Packets".

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the IPv4 or IPv6 address of the advertising PE.

9.4.1.1 Ethernet A-D Route Targets

The Ethernet A-D route MUST carry one or more Route Target (RT) attributes. RTs may be configured (as in IP VPNs), or may be derived automatically.

If a PE uses Route Target Constrain [RT-CONSTRAIN], the PE SHOULD advertise all such RTs using Route Target Constrains. The use of RT Constrains allows each Ethernet A-D route to reach only those PEs that are configured to import at least one RT from the set of RTs carried in the Ethernet A-D route.

9.4.1.1.1 Auto-Derivation from the Ethernet Tag ID

The following is the procedure for deriving the RT attribute automatically from the Ethernet Tag ID associated with the advertisement:

+ The Global Administrator field of the RT MUST be set to the Autonomous System (AS) number that the PE belongs to.

+ The Local Administrator field of the RT contains a 4-octet number that encodes the Ethernet Tag-ID. If the Ethernet Tag-ID is a two octet VLAN ID then it MUST be encoded in the lower two octets of the Local Administrator field and the higher two octets MUST be set to zero.

For the "Unique VLAN EVPN" this results in auto-deriving the RT from the Ethernet Tag, e.g., VLAN ID for that EVPN.
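
The auto-derivation procedure can be sketched as follows. This is a minimal sketch that assumes the Two-Octet AS Specific extended community layout of RFC 4360 (Type 0x00, Sub-Type 0x02, 2-octet Global Administrator, 4-octet Local Administrator), which is the layout a 4-octet Local Administrator field implies. The function name auto_derive_rt and the example AS number are hypothetical.

```python
import struct


def auto_derive_rt(as_number: int, ethernet_tag: int) -> bytes:
    """Build an 8-octet Route Target from an AS number and an Ethernet Tag ID.

    Assumes a 2-octet AS number. A 12-bit or 16-bit VLAN ID naturally
    lands in the lower two octets of the Local Administrator field,
    leaving the higher two octets at zero.
    """
    if not 0 <= as_number <= 0xFFFF:
        raise ValueError("this sketch only handles 2-octet AS numbers")
    if not 0 <= ethernet_tag <= 0xFFFFFFFF:
        raise ValueError("Ethernet Tag ID must fit in 4 octets")
    return struct.pack("!BBHI", 0x00, 0x02, as_number, ethernet_tag)


# Hypothetical private AS 64512 and VLAN ID 100.
rt = auto_derive_rt(64512, 100)
print(rt.hex())
```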
9.5 Designated Forwarder Election

Consider a CE that is a host or a router that is multi-homed directly to more than one PE in an EVPN instance on a given Ethernet segment. One or more Ethernet Tags may be configured on the Ethernet segment. In this scenario only one of the PEs, referred to as the Designated Forwarder (DF), is responsible for certain actions:

- Sending multicast and broadcast traffic, on a given Ethernet Tag on a particular Ethernet segment, to the CE.

- Flooding unknown unicast traffic (i.e., traffic for which a PE does not know the destination MAC address), on a given Ethernet Tag on a particular Ethernet segment to the CE, if the environment requires flooding of unknown unicast traffic.

Note that this behavior, which allows selecting a DF at the granularity of (ESI, EVI) for multicast, broadcast and unknown unicast traffic, is the default behavior in this specification.

Note that a CE always sends packets belonging to a specific flow using a single link towards a PE. For instance, if the CE is a host then, as mentioned earlier, the host treats the multiple links that it uses to reach the PEs as a Link Aggregation Group (LAG). The CE employs a local hashing function to map traffic flows onto links in the LAG.

If a bridged network is multi-homed to more than one PE in an EVPN network via switches, then the support of All-Active points of attachments, as described in this specification, requires the bridged network to be connected to two or more PEs using a LAG. In this case the reasons for doing DF election are the same as those described above when a CE is a host or a router.

If a bridged network does not connect to the PEs using LAG, then only one of the links between the switched bridged network and the PEs must be the active link for a given Ethernet Tag.
In this case, the Ethernet A-D route per Ethernet segment MUST be advertised with the "Active-Standby" flag set to one. Procedures for supporting All-Active points of attachments, when a bridge network connects to the PEs using LAG, are for further study.

The default procedure for DF election at the granularity of (ESI, EVI) is referred to as "service carving". With service carving, it is possible to elect multiple DFs per Ethernet Segment (one per EVI) in order to perform load-balancing of multi-destination traffic destined to a given Segment. The load-balancing procedures carve up the EVI space among the PE nodes evenly, in such a way that every PE is the DF for a disjoint set of EVIs. The procedure for service carving is as follows:

1. When a PE discovers the ESI of the attached Ethernet Segment, it advertises an Ethernet Segment route with the associated ES-Import extended community attribute.

2. The PE then starts a timer (default value = 3 seconds) to allow the reception of Ethernet Segment routes from other PE nodes connected to the same Ethernet Segment. This timer value MUST be the same across all PEs connected to the same Ethernet Segment.

3. When the timer expires, each PE builds an ordered list of the IP addresses of all the PE nodes connected to the Ethernet Segment (including itself), in increasing numeric value. Each IP address in this list is extracted from the "Originator Router's IP address" field of the advertised Ethernet Segment route. Every PE is then given an ordinal indicating its position in the ordered list, starting with 0 as the ordinal for the PE with the numerically lowest IP address.
The ordinals are used to determine which PE node will be the DF for a given EVPN instance on the Ethernet Segment using the following rule: Assuming a redundancy group of N PE nodes, the PE with ordinal i is the DF for an EVPN instance with an associated Ethernet Tag value V when (V mod N) = i. In the case where multiple Ethernet Tags are associated with a single EVPN instance, then the numerically lowest Ethernet Tag value in that EVPN instance MUST be used in the modulo function. For example, with N = 3 PEs, an EVPN instance whose lowest Ethernet Tag value is 10 is assigned to the PE with ordinal 1, since 10 mod 3 = 1.

It should be noted that using the "Originator Router's IP address" field in the Ethernet Segment route to get the PE IP address needed for the ordered list allows for a CE to be multi-homed across different ASes if such a need ever arises.

4. The PE that is elected as a DF for a given EVPN instance will unblock traffic for the Ethernet Tags associated with that EVPN instance. Note that the DF PE unblocks multi-destination traffic in the egress direction towards the Segment. All non-DF PEs continue to drop multi-destination traffic (for the associated EVPN instances) in the egress direction towards the Segment.

In the case of link or port failure, the affected PE withdraws its Ethernet Segment route. This will re-trigger the service carving procedures on all the PEs in the RG. For PE node failure, or upon PE commissioning or decommissioning, the PEs re-trigger the service carving. In the case of Single-Active multi-homing, when a service moves from one PE in the RG to another PE as a result of re-carving, the PE which ends up being the elected DF for the service must trigger a MAC address flush notification towards the associated Ethernet Segment. This can be done, for example, using an IEEE 802.1ak MVRP 'new' declaration.

9.6. Interoperability with Single-homing PEs

Let's refer to PEs that only support single-homed CE devices as single-homing PEs.
For single-homing PEs, all the above multi-homing procedures can be omitted; however, to allow single-homing PEs to fully inter-operate with multi-homing PEs, some of the multi-homing procedures described above SHOULD be supported even by single-homing PEs:

- procedures related to processing the Ethernet A-D route for the purpose of Fast Convergence (9.2 Fast Convergence), to let single-homing PEs benefit from fast convergence

- procedures related to processing the Ethernet A-D route for the purpose of Aliasing (9.4 Aliasing and Backup-Path), to let single-homing PEs benefit from load balancing

- procedures related to processing the Ethernet A-D route for the purpose of Backup-Path (9.4 Aliasing and Backup-Path), to let single-homing PEs benefit from the corresponding convergence improvement

10. Determining Reachability to Unicast MAC Addresses

PEs forward packets that they receive based on the destination MAC address. This implies that PEs must be able to learn how to reach a given destination unicast MAC address.

There are two components to MAC address learning, "local learning" and "remote learning":

10.1. Local Learning

A particular PE must be able to learn the MAC addresses from the CEs that are connected to it. This is referred to as local learning.

The PEs in a particular EVPN instance MUST support local data plane learning using standard IEEE Ethernet learning procedures. A PE must be capable of learning MAC addresses in the data plane when it receives packets such as the following from the CE network:

- DHCP requests

- ARP request for its own MAC.

- ARP request for a peer.

Alternatively, PEs MAY learn the MAC addresses of the CEs in the control plane or via management plane integration between the PEs and the CEs.
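
Local data plane learning can be pictured with the following sketch of a per-tag MAC table, in the spirit of standard Ethernet bridge learning. It is illustrative only; the class name MacTable, its methods and the port names are hypothetical, and ageing of entries is deliberately left out.

```python
from typing import Dict, Optional, Tuple


class MacTable:
    """Minimal learning table keyed by (Ethernet Tag, MAC address)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], str] = {}

    def learn(self, tag: int, src_mac: str, port: str) -> bool:
        """Record the source MAC of a received frame.

        Returns True when a known MAC was seen on a different port,
        i.e. the address has moved locally.
        """
        key = (tag, src_mac.lower())
        previous = self._entries.get(key)
        self._entries[key] = port
        return previous is not None and previous != port

    def lookup(self, tag: int, dst_mac: str) -> Optional[str]:
        """Return the port for a destination MAC, or None if unknown."""
        return self._entries.get((tag, dst_mac.lower()))


table = MacTable()
table.learn(10, "00:11:22:33:44:55", "ce-port-1")  # e.g. from an ARP request
print(table.lookup(10, "00:11:22:33:44:55"))
```

A newly learnt entry is what a PE would then advertise to remote PEs in the control plane.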
There are applications where a MAC address that is reachable via a given PE on a locally attached Segment (e.g., with ESI X) may move such that it becomes reachable via another PE on another Segment (e.g., with ESI Y). This is referred to as "MAC Mobility". Procedures to support this are described in section "MAC Mobility".

10.2. Remote learning

A particular PE must be able to determine how to send traffic to MAC addresses that belong to or are behind CEs connected to other PEs, i.e., to remote CEs or hosts behind remote CEs. We call such MAC addresses "remote" MAC addresses.

This document requires a PE to learn remote MAC addresses in the control plane. In order to achieve this, each PE advertises the MAC addresses it learns from its locally attached CEs in the control plane, to all the other PEs in that EVPN instance, using MP-BGP and specifically the MAC Advertisement route.

10.2.1. Constructing the BGP EVPN MAC Address Advertisement

BGP is extended to advertise these MAC addresses using the MAC Advertisement route type in the EVPN NLRI.

The RD MUST be the RD of the EVI that is advertising the NLRI. The procedures for setting the RD for a given EVI are described in section 9.4.1.

The Ethernet Segment Identifier is set to the ten octet ESI described in section "Ethernet Segment".

The Ethernet Tag ID may be zero or may represent a valid Ethernet Tag ID. This field may be non-zero when there are multiple bridge domains in the MAC-VRF (e.g., the PE needs to perform qualified learning for the VLANs in that MAC-VRF).
When the Ethernet Tag ID in the NLRI is set to a non-zero value, for a particular bridge domain, then this Ethernet Tag may either be the Ethernet tag value associated with the CE, e.g., VLAN ID, or it may be the Ethernet Tag Identifier, e.g., VLAN ID, assigned by the EVPN provider and mapped to the CE's Ethernet tag. The latter would be the case if the CE Ethernet tags, e.g., VLAN ID, for a particular bridge domain are different on different CEs.

The MAC address length field is in bits and it is typically set to 48. However, this specification enables specifying the MAC address as a prefix, in which case the MAC address length field is set to the length of the prefix. This provides the ability to aggregate MAC addresses if the deployment environment supports that. The encoding of a MAC address MUST be the 6-octet MAC address specified by [802.1D-ORIG] [802.1D-REV]. If the MAC address is advertised as a prefix then the trailing bits of the prefix MUST be set to 0 to ensure that the entire prefix is encoded as 6 octets.

The IP Address field is optional. By default, the IP Address Length field is set to 0 and the IP address field is omitted from the route. When a valid IP address or address prefix needs to be advertised (e.g., for ARP suppression purposes or for inter-subnet switching), it is then encoded in this route.

The IP Address Length field is in bits and it is the length of the IP prefix. This provides the ability to advertise IP address prefixes when the deployment environment supports that. The encoding of an IP address MUST be either 4 octets for IPv4 or 16 octets for IPv6. When the IP address is advertised as a prefix, then the trailing bits of the prefix MUST be set to 0 to ensure that the entire prefix is encoded as either 4 or 16 octets.
The length field of EVPN NLRI (which is in octets and is described in section 8) is sufficient to determine whether an IP address/prefix is encoded in this route and if so, whether the encoded IP address/prefix is IPv4 or IPv6.

The MPLS label field carries a single label and it is encoded as 3 octets, where the high-order 20 bits contain the label value. The MPLS label MUST be the downstream assigned label that is used by the PE to forward MPLS-encapsulated Ethernet frames, where the destination MAC address in the Ethernet frame is the MAC address advertised in the above NLRI. The forwarding procedures are specified in sections "Forwarding Unicast Packets" and "Load Balancing of Unicast Packets".

A PE may advertise the same single EVPN label for all MAC addresses in a given EVI. This label assignment methodology is referred to as a per EVI label assignment. Alternatively, a PE may advertise a unique EVPN label per (ESI, Ethernet Tag) combination. This label assignment methodology is referred to as a per (ESI, Ethernet Tag) label assignment. As a third option, a PE may advertise a unique EVPN label per MAC address. All of these methodologies have their tradeoffs. The choice of a particular label assignment methodology is purely local to the PE that originates the route.

Per EVI label assignment requires the least number of EVPN labels, but requires a MAC lookup in addition to an MPLS lookup on an egress PE for forwarding. On the other hand, a unique label per (ESI, Ethernet Tag) or a unique label per MAC allows an egress PE to forward a packet that it receives from another PE, to the connected CE, after looking up only the MPLS labels without having to perform a MAC lookup. This includes the capability to perform appropriate VLAN ID translation on egress to the CE.

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the IPv4 or IPv6 address of the advertising PE.
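
Two of the fixed-width field encodings described in this section, the MAC address prefix and the 3-octet MPLS label, can be sketched as below. This is a sketch under stated assumptions: the helper names are hypothetical, and the low-order 4 bits of the label field are simply left at zero because the text above does not define them.

```python
def encode_mac_prefix(mac: str, length_bits: int = 48) -> bytes:
    """Encode a MAC address or prefix as 6 octets with trailing bits zeroed."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address is 6 octets")
    if not 0 <= length_bits <= 48:
        raise ValueError("MAC address length is between 0 and 48 bits")
    # Keep the leading length_bits bits and clear everything after them.
    mask = ((1 << length_bits) - 1) << (48 - length_bits)
    return (int.from_bytes(raw, "big") & mask).to_bytes(6, "big")


def encode_label(label: int) -> bytes:
    """Encode a 20-bit label value in the high-order bits of 3 octets."""
    if not 0 <= label < (1 << 20):
        raise ValueError("an MPLS label value is 20 bits")
    return (label << 4).to_bytes(3, "big")  # low-order 4 bits assumed zero


def decode_label(field: bytes) -> int:
    """Recover the 20-bit label value from the 3-octet field."""
    if len(field) != 3:
        raise ValueError("the label field is 3 octets")
    return int.from_bytes(field, "big") >> 4


print(encode_mac_prefix("00:11:22:33:44:55", 24).hex())
print(encode_label(16).hex())
```

The same masking approach applies to IPv4 and IPv6 prefixes with widths of 32 and 128 bits.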
The BGP advertisement for the MAC advertisement route MUST also carry one or more Route Target (RT) attributes. RTs may be configured (as in IP VPNs), or may be derived automatically from the Ethernet Tag ID, in the Unique VLAN case, as described in section 9.4.1.1.1 "Auto-Derivation from the Ethernet Tag ID".

It is to be noted that this document does not require PEs to create forwarding state for remote MACs when they are learnt in the control plane. When this forwarding state is actually created is a local implementation matter.

10.2.2 Route Resolution

If the Ethernet Segment Identifier field in a received MAC Advertisement route is set to the reserved ESI value of 0 or MAX-ESI, then the receiving PE MUST install forwarding state for the associated MAC Address based on the MAC Advertisement route alone.

If the Ethernet Segment Identifier field in a received MAC Advertisement route is set to a non-reserved ESI, and the receiving PE is locally attached to the same ESI, then the PE does not alter its forwarding state based on the received route. This ensures that local routes are preferred to remote routes.

If the Ethernet Segment Identifier field in a received MAC Advertisement route is set to a non-reserved ESI, then the receiving PE MUST install forwarding state for a given MAC address only when both the MAC Advertisement route AND the associated Ethernet A-D route per ESI have been received.

To illustrate this with an example, consider two PEs (PE1 and PE2) connected to a multi-homed Ethernet Segment ES1. All-Active redundancy mode is assumed. A given MAC address M1 is learnt by PE1 but not PE2. On a remote PE, PE3, the following states may arise:

T1- When the MAC Advertisement Route from PE1 and the Ethernet A-D routes per ESI from PE1 and PE2 are received, PE3 can forward traffic destined to M1 to both PE1 and PE2.
T2- If after T1, PE1 withdraws its Ethernet A-D route per ESI, then PE3 forwards traffic destined to M1 to PE2 only.

T3- If after T1, PE2 withdraws its Ethernet A-D route per ESI, then PE3 forwards traffic destined to M1 to PE1 only.

T4- If after T1, PE1 withdraws its MAC Advertisement route, then PE3 treats traffic to M1 as unknown unicast. Note, here, that had PE2 also advertised a MAC route for M1 before PE1 withdrew its MAC route, then PE3 would have continued forwarding traffic destined to M1 to PE2.

11. ARP and ND

The IP address field in the MAC advertisement route may optionally carry one of the IP addresses associated with the MAC address. This provides an option which can be used to minimize the flooding of ARP or Neighbor Discovery (ND) messages over the MPLS network and to remote CEs. This option also minimizes ARP (or ND) message processing on end-stations/hosts connected to the EVPN network. A PE may learn the IP address associated with a MAC address in the control or management plane between the CE and the PE. Or, it may learn this binding by snooping certain messages to or from a CE. When a PE learns the IP address associated with a MAC address of a locally connected CE, it may advertise this address to other PEs by including it in the MAC Advertisement route. The IP Address may be an IPv4 address encoded using four octets, or an IPv6 address encoded using sixteen octets. The IP Address length field MUST be set to 32 for an IPv4 address or to 128 for an IPv6 address.

If there are multiple IP addresses associated with a MAC address, then multiple MAC advertisement routes MUST be generated, one for each IP address. For instance, this may be the case when there are both an IPv4 and an IPv6 address associated with the MAC address.
When the IP address is dissociated from the MAC address, then the MAC advertisement route with that particular IP address MUST be withdrawn.

When a PE receives an ARP request for an IP address from a CE, and if the PE has the MAC address binding for that IP address, the PE SHOULD perform ARP proxy by responding to the ARP request.

11.1 Default Gateway

When a PE needs to perform inter-subnet forwarding where each subnet is represented by a different broadcast domain (e.g., different VLAN), the inter-subnet forwarding is performed at layer 3 and the PE that performs such function is called the default gateway. In this case, when the PE receives an ARP Request for the IP address of the default gateway, the PE originates an ARP Reply.

Each PE that acts as a default gateway for a given EVPN instance MAY advertise in the EVPN control plane its default gateway MAC address using the MAC advertisement route, and indicate that such route is associated with the default gateway. This is accomplished by requiring the route to carry the Default Gateway extended community defined in [Section 8.8 Default Gateway Extended Community]. The IP address field (4 octets for IPv4, 16 octets for IPv6) is set to zero when advertising the MAC route with the Default Gateway extended community. Both ESI and Ethernet Tag fields are also set to zero for this advertisement.

Unless it is known a priori (by means outside of this document) that all PEs of a given EVPN instance act as a default gateway for that EVPN instance, the MPLS label MUST be set to a valid downstream assigned label.
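
The ARP handling described in section 11 and at the start of section 11.1 can be sketched as a single lookup. This is only an illustration under stated assumptions: the function name, its parameters, and the use of `None` to mean "do not answer locally" are hypothetical and not defined by this document.

```python
from typing import Dict, Optional


def arp_reply_mac(
    target_ip: str,
    gateway_ip: Optional[str],
    gateway_mac: Optional[str],
    ip_to_mac: Dict[str, str],
) -> Optional[str]:
    """Return the MAC address a PE would answer an ARP request with.

    Hypothetical helper: returns None when the PE has nothing to answer
    with, leaving the request to whatever handling applies otherwise.
    """
    # A PE acting as default gateway originates the ARP Reply itself.
    if gateway_ip is not None and target_ip == gateway_ip:
        return gateway_mac
    # Otherwise perform ARP proxy only if a binding for the IP is known.
    return ip_to_mac.get(target_ip)
```

The `ip_to_mac` table stands in for bindings learned locally or from the IP address field of received MAC advertisement routes.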
Furthermore, even if all PEs of a given EVPN instance do act as a default gateway for that EVPN instance, if only some of these PEs have sufficient (routing) information to provide inter-subnet routing for all the inter-subnet traffic originated within the subnet associated with the EVPN instance, then when such a PE advertises in the EVPN control plane its default gateway MAC address using the MAC advertisement route, and indicates that such route is associated with the default gateway, the route MUST carry a valid downstream assigned label.

If all PEs of a given EVPN instance act as a default gateway for that EVPN instance, and the same default gateway MAC address is used across all gateway devices, then no such advertisement is needed. However, if each default gateway uses a different MAC address, then each default gateway needs to be aware of the other gateways' MAC addresses, hence the need for such advertisement. This is called MAC address aliasing since a single default GW can be represented by multiple MAC addresses.

Each PE that receives this route and imports it as per procedures specified in this document follows the procedures in this section when replying to ARP Requests that it receives if such Requests are for the IP address in the received EVPN route.

Each PE that acts as a default gateway for a given EVPN instance that receives this route and imports it as per procedures specified in this document MUST create MAC forwarding state that enables it to apply IP forwarding to the packets destined to the MAC address carried in the route.

12.
Handling of Multi-Destination Traffic

Procedures are required for a given PE to send broadcast or multicast traffic, received from a CE encapsulated in a given Ethernet Tag (VLAN) in an EVPN instance, to all the other PEs that span that Ethernet Tag (VLAN) in that EVPN instance. In certain scenarios, described in section "Processing of Unknown Unicast Packets", a given PE may also need to flood unknown unicast traffic to other PEs.

The PEs in a particular EVPN instance may use ingress replication, P2MP LSPs or MP2MP LSPs to send unknown unicast, broadcast or multicast traffic to other PEs.

Each PE MUST advertise an "Inclusive Multicast Ethernet Tag Route" to enable the above. The following subsection provides the procedures to construct the Inclusive Multicast Ethernet Tag route. Subsequent subsections describe its usage in further detail.

12.1. Construction of the Inclusive Multicast Ethernet Tag Route

The RD MUST be the RD of the EVI that is advertising the NLRI. The procedures for setting the RD for a given EVPN instance on a PE are described in section 9.4.1.

The Ethernet Tag ID is the identifier of the Ethernet Tag. It MAY be set to 0 or to a valid Ethernet Tag value.

The Originating Router's IP address MUST be set to an IP address of the PE. This address SHOULD be common for all the EVIs on the PE (e.g., this address may be the PE's loopback address). The IP Address Length field is in bits.

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the same IP address as the one carried in the Originating Router's IP Address field.

The BGP advertisement for the Inclusive Multicast Ethernet Tag route MUST also carry one or more Route Target (RT) attributes. The assignment of RTs described in the section on "Constructing the BGP EVPN MAC Address Advertisement" MUST be followed.

12.2.
P-Tunnel Identification

In order to identify the P-Tunnel used for sending broadcast, unknown unicast or multicast traffic, the Inclusive Multicast Ethernet Tag route MUST carry a "PMSI Tunnel Attribute" as specified in [BGP MVPN].

Depending on the technology used for the P-tunnel for the EVPN instance on the PE, the PMSI Tunnel attribute of the Inclusive Multicast Ethernet Tag route is constructed as follows.

+ If the PE that originates the advertisement uses a P-Multicast tree for the P-tunnel for EVPN, the PMSI Tunnel attribute MUST contain the identity of the tree (note that the PE could create the identity of the tree prior to the actual instantiation of the tree).

+ A PE that uses a P-Multicast tree for the P-tunnel MAY aggregate two or more Ethernet Tags in the same or different EVIs present on the PE onto the same tree. In this case, in addition to carrying the identity of the tree, the PMSI Tunnel attribute MUST carry an MPLS upstream assigned label which the PE has bound uniquely to the Ethernet Tag for the EVI associated with this update (as determined by its RTs).

  If the PE has already advertised Inclusive Multicast Ethernet Tag routes for two or more Ethernet Tags that it now desires to aggregate, then the PE MUST re-advertise those routes. The re-advertised routes MUST be the same as the original ones, except for the PMSI Tunnel attribute and the label carried in that attribute.

+ If the PE that originates the advertisement uses ingress replication for the P-tunnel for EVPN, the route MUST include the PMSI Tunnel attribute with the Tunnel Type set to Ingress Replication and Tunnel Identifier set to a routable address of the PE. The PMSI Tunnel attribute MUST carry a downstream assigned MPLS label.
  This label is used to demultiplex the broadcast, multicast or unknown unicast EVPN traffic received over an MP2P tunnel by the PE.

+ The Leaf Information Required flag of the PMSI Tunnel attribute MUST be set to zero, and MUST be ignored on receipt.

13. Processing of Unknown Unicast Packets

The procedures in this document do not require the PEs to flood unknown unicast traffic to other PEs. If PEs learn CE MAC addresses via a control plane protocol, the PEs can then distribute MAC addresses via BGP, and all unicast MAC addresses will be learnt prior to traffic to those destinations.

However, if a destination MAC address of a received packet is not known by the PE, the PE may have to flood the packet. When flooding, one must take into account "split horizon forwarding" as follows: The principles behind the following procedures are borrowed from the split horizon forwarding rules in VPLS solutions [RFC4761] and [RFC4762]. When a PE capable of flooding (say PEx) receives a frame with an unknown destination MAC address, it floods the frame. If the frame arrived from an attached CE, PEx must send a copy of the frame to every other attached CE participating in that EVPN instance, on a different ESI than the one it received the frame on, as long as the PE is the DF for the egress ESI. In addition, the PE must flood the frame to all other PEs participating in that EVPN instance. If, on the other hand, the frame arrived from another PE (say PEy), PEx must send a copy of the packet only to attached CEs as long as it is the DF for the egress ESI. PEx MUST NOT send the frame to other PEs, since PEy would have already done so. Split horizon forwarding rules apply to unknown MAC addresses.
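
The flooding rules above can be summarized in a short sketch. The data structure and function names are hypothetical, and the sketch deliberately ignores the ESI label based filtering described in section "Split Horizon"; it only shows which CEs and PEs receive a copy according to this paragraph.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class LocalAttachment:
    ce: str       # name of the attached CE (illustrative)
    esi: str      # Ethernet segment the CE is attached on
    is_df: bool   # True if this PE is the DF for that segment


def flood_targets(
    attachments: Sequence[LocalAttachment],
    remote_pes: Sequence[str],
    ingress_esi: Optional[str],
    from_remote_pe: bool,
) -> Tuple[List[str], List[str]]:
    """Return (local CEs, remote PEs) that receive a copy of the frame."""
    # Local CEs: only where this PE is DF, and never back onto the
    # segment the frame arrived on (ingress_esi is None for PE-sourced frames).
    ces = [a.ce for a in attachments if a.is_df and a.esi != ingress_esi]
    # Frames received from another PE are never sent on to other PEs.
    pes = [] if from_remote_pe else list(remote_pes)
    return ces, pes
```

Passing `from_remote_pe=True` models the PEy-to-PEx case, in which only locally attached CEs are candidates.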
Whether or not to flood packets to unknown destination MAC addresses should be an administrative choice, depending on how learning happens between CEs and PEs.

The PEs in a particular EVPN instance may use ingress replication using RSVP-TE P2P LSPs or LDP MP2P LSPs for sending unknown unicast traffic to other PEs. Or they may use RSVP-TE P2MP or LDP P2MP for sending such traffic to other PEs.

13.1. Ingress Replication

If ingress replication is in use, the P-Tunnel attribute, carried in the Inclusive Multicast Ethernet Tag routes for the EVPN instance, specifies the downstream label that the other PEs can use to send unknown unicast, multicast or broadcast traffic for that EVPN instance to this particular PE.

The PE that receives a packet with this particular MPLS label MUST treat the packet as a broadcast, multicast or unknown unicast packet. Further, if the MAC address is a unicast MAC address, the PE MUST treat the packet as an unknown unicast packet.

13.2. P2MP MPLS LSPs

The procedures for using P2MP LSPs are very similar to VPLS procedures [VPLS-MCAST]. The P-Tunnel attribute used by a PE for sending unknown unicast, broadcast or multicast traffic for a particular EVPN instance is advertised in the Inclusive Multicast Ethernet Tag route as described in section "Handling of Multi-Destination Traffic".

The P-Tunnel attribute specifies the P2MP LSP identifier. This is the equivalent of an Inclusive tree in [VPLS-MCAST]. Note that multiple Ethernet Tags, which may be in different EVPN instances, may use the same P2MP LSP, using upstream labels [VPLS-MCAST]. This is the equivalent of an Aggregate Inclusive tree in [VPLS-MCAST]. When P2MP LSPs are used for flooding unknown unicast traffic, packet re-ordering is possible.
The PE that receives a packet on the P2MP LSP specified in the PMSI Tunnel Attribute MUST treat the packet as a broadcast, multicast or unknown unicast packet. Further, if the MAC address is a unicast MAC address, the PE MUST treat the packet as an unknown unicast packet.

14. Forwarding Unicast Packets

This section describes procedures for forwarding unicast packets by PEs, where such packets are received from either directly connected CEs, or from some other PEs.

14.1. Forwarding packets received from a CE

When a PE receives a packet from a CE, on a given Ethernet Tag, it must first look up the source MAC address of the packet. In certain environments the source MAC address MAY be used to authenticate the CE and determine that traffic from the host can be allowed into the network. Source MAC lookup MAY also be used for local MAC address learning.

If the PE decides to forward the packet, the destination MAC address of the packet must be looked up. If the PE has received MAC address advertisements for this destination MAC address from one or more other PEs or learned it from locally connected CEs, it is considered as a known MAC address. Otherwise, the MAC address is considered as an unknown MAC address.

For known MAC addresses the PE forwards this packet to one of the remote PEs or to a locally attached CE. When forwarding to a remote PE, the packet is encapsulated in the EVPN MPLS label advertised by the remote PE, for that MAC address, and in the MPLS LSP label stack to reach the remote PE.

If the MAC address is unknown and if the administrative policy on the PE requires flooding of unknown unicast traffic then:

- The PE MUST flood the packet to other PEs. The PE MUST first encapsulate the packet in the ESI MPLS label as described in section 9.3.
  If ingress replication is used, the packet MUST be replicated one or more times to each remote PE with the outermost label being an MPLS label determined as follows: This is the MPLS label advertised by the remote PE in a PMSI Tunnel Attribute in the Inclusive Multicast Ethernet Tag route for an combination. The Ethernet Tag in the route must be the same as the Ethernet Tag associated with the interface on which the ingress PE receives the packet. If P2MP LSPs are being used the packet MUST be sent on the P2MP LSP that the PE is the root of for the Ethernet Tag in the EVPN instance. If the same P2MP LSP is used for all Ethernet Tags, then all the PEs in the EVPN instance MUST be the leaves of the P2MP LSP. If a distinct P2MP LSP is used for a given Ethernet Tag in the EVPN instance, then only the PEs in the Ethernet Tag MUST be the leaves of the P2MP LSP. The packet MUST be encapsulated in the P2MP LSP label stack.

If the MAC address is unknown and the administrative policy on the PE does not allow flooding of unknown unicast traffic, then:

- The PE MUST drop the packet.

14.2. Forwarding packets received from a remote PE

This section describes the procedures for forwarding known and unknown unicast packets received from a remote PE.

14.2.1. Unknown Unicast Forwarding

When a PE receives an MPLS packet from a remote PE then, after processing the MPLS label stack, if the top MPLS label ends up being a P2MP LSP label associated with an EVPN instance or in case of ingress replication the downstream label advertised in the P-Tunnel attribute, and after performing the split horizon procedures described in section "Split Horizon":

- If the PE is the designated forwarder of BUM traffic on a particular set of ESIs for the Ethernet Tag, the default behavior is for the PE to flood the packet on these ESIs.
  In other words, the default behavior is for the PE to assume that for BUM traffic, it is not required to perform a destination MAC address lookup. As an option, the PE may perform a destination MAC lookup to flood the packet to only a subset of the CE interfaces in the Ethernet Tag. For instance the PE may decide to not flood a BUM packet on certain Ethernet segments even if it is the DF on the Ethernet segment, based on administrative policy.

- If the PE is not the designated forwarder on any of the ESIs for the Ethernet Tag, the default behavior is for it to drop the packet.

14.2.2. Known Unicast Forwarding

If the top MPLS label ends up being an EVPN label that was advertised in the unicast MAC advertisements, then the PE either forwards the packet based on CE next-hop forwarding information associated with the label or does a destination MAC address lookup to forward the packet to a CE.

15. Load Balancing of Unicast Frames

This section specifies the load balancing procedures for sending known unicast frames to a multi-homed CE.

15.1. Load balancing of traffic from a PE to remote CEs

Whenever a remote PE imports a MAC advertisement for a given <ESI, Ethernet Tag> in an EVI, it MUST examine all imported Ethernet A-D routes for that ESI in order to determine the load-balancing characteristics of the Ethernet segment.

15.1.1 Single-Active Redundancy Mode

For a given ESI, if the remote PE has imported an Ethernet A-D route per Ethernet Segment from at least one PE, where the "Active-Standby" flag in the ESI Label Extended Community is set, then the remote PE MUST deduce that the Ethernet segment is operating in Single-Active redundancy mode. As such, the MAC address will be reachable only via the PE announcing the associated MAC Advertisement route - this is referred to as the primary PE.
The set of other PE nodes advertising Ethernet A-D routes per Ethernet Segment for the same ESI serve as backup paths, in case the primary PE encounters a failure. These are referred to as the backup PEs. It should be noted that the primary PE for a given is the DF for that .

If the primary PE encounters a failure, it MAY withdraw its Ethernet A-D route for the affected segment prior to withdrawing the entire set of MAC Advertisement routes.

In the case where only a single other backup PE in the network had advertised an Ethernet A-D route for the same ESI, the remote PE can then use the Ethernet A-D route withdrawal as a trigger to update its forwarding entries, for the associated MAC addresses, to point towards the backup PE. As the backup PE starts learning the MAC addresses over its attached Ethernet segment, it will start sending MAC Advertisement routes while the failed PE withdraws its own. This mechanism minimizes the flooding of traffic during fail-over events.

In the case where multiple other backup PEs in the network had advertised an Ethernet A-D route for the same ESI, the remote PE MUST then use the Ethernet A-D route withdrawal as a trigger to start flooding traffic destined to the associated MAC addresses (as long as flooding of unknown unicast is administratively allowed). It is not possible to select a single backup path in this case.

15.1.2 All-Active Redundancy Mode

If for the given ESI, none of the Ethernet A-D routes per Ethernet Segment imported by the remote PE have the "Active-Standby" flag set in the ESI Label Extended Community, then the remote PE MUST treat the Ethernet segment as operating in All-Active redundancy mode.
The remote PE would then treat the MAC address as reachable via all of the PE nodes from which it has received both an Ethernet A-D route per Ethernet Segment as well as an Ethernet A-D route per EVI for the ESI in question. The remote PE MUST use the MAC advertisement and eligible Ethernet A-D routes to construct the set of next-hops that it can use to send the packet to the destination MAC. Each next-hop comprises an MPLS label stack that is to be used by the egress PE to forward the packet. This label stack is determined as follows:

- If the next-hop is constructed as a result of a MAC route then this label stack MUST be used. However, if the MAC route doesn't exist, then the next-hop and MPLS label stack are constructed as a result of the Ethernet A-D routes. Note that the following description applies to determining the label stack for a particular next-hop to reach a given PE, from which the remote PE has received and imported Ethernet A-D routes that have the matching ESI and Ethernet Tag as the one present in the MAC advertisement. The Ethernet A-D routes mentioned in the following description refer to the ones imported from this given PE.

- If an Ethernet A-D route per Ethernet Segment for that ESI exists, together with an Ethernet A-D route per EVI, then the label from that latter route must be used.

The following example explains the above.

Consider a CE (CE1) that is dual-homed to two PEs (PE1 and PE2) on a LAG interface (ES1), and is sending packets with MAC address MAC1 on VLAN1. A remote PE, say PE3, is able to learn that MAC1 is reachable via PE1 and PE2. Both PE1 and PE2 may advertise MAC1 in BGP if they receive packets with MAC1 from CE1.
If this is not the case, and if MAC1 is advertised only by PE1, PE3 still considers MAC1 as reachable via both PE1 and PE2 as both PE1 and PE2 advertise an Ethernet A-D route per ESI for ES1 as well as an Ethernet A-D route per EVI for <ES1, VLAN1>.

The MPLS label stack to send the packets to PE1 is the MPLS LSP stack to get to PE1 and the EVPN label advertised by PE1 for CE1's MAC.

The MPLS label stack to send packets to PE2 is the MPLS LSP stack to get to PE2 and the MPLS label in the Ethernet A-D route advertised by PE2 for <ES1, VLAN1>, if PE2 has not advertised MAC1 in BGP.

We will refer to these label stacks as MPLS next-hops.

The remote PE (PE3) can now load balance the traffic it receives from its CEs, destined for CE1, between PE1 and PE2. PE3 may use N-Tuple flow information to hash traffic into one of the MPLS next-hops for load balancing of IP traffic. Alternatively PE3 may rely on the source MAC addresses for load balancing.

Note that once PE3 decides to send a particular packet to PE1 or PE2 it can pick one out of multiple possible paths to reach the particular remote PE using regular MPLS procedures. For instance, if the tunneling technology is based on RSVP-TE LSPs, and PE3 decides to send a particular packet to PE1, then PE3 can choose from multiple RSVP-TE LSPs that have PE1 as their destination.

When PE1 or PE2 receives the packet destined for CE1 from PE3, if the packet is a unicast MAC packet it is forwarded to CE1. If it is a multicast or broadcast MAC packet then only one of PE1 or PE2 must forward the packet to the CE. Which of PE1 or PE2 forwards this packet to the CE is determined based on which of the two is the DF.
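
The hashing of N-Tuple flow information onto MPLS next-hops in the example above can be sketched as follows. The type and function names are hypothetical, and the choice of SHA-256 over the flow tuple is an assumption made for illustration; this document does not prescribe a hash function.

```python
import hashlib
from typing import NamedTuple, Sequence


class MplsNextHop(NamedTuple):
    pe: str          # egress PE, e.g. "PE1"
    evpn_label: int  # label from the MAC route or the Ethernet A-D route per EVI


def pick_next_hop(flow: tuple, next_hops: Sequence[MplsNextHop]) -> MplsNextHop:
    """Map a flow (any tuple of header fields) to one MPLS next-hop.

    Packets of the same flow always map to the same next-hop, which
    avoids re-ordering within a flow.
    """
    if not next_hops:
        raise ValueError("no MPLS next-hop available for this destination")
    digest = hashlib.sha256(repr(flow).encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]
```

For PE3 in the example, `next_hops` would hold one entry for PE1 and one for PE2; a flow tuple might carry source and destination addresses and ports, or only the source MAC address.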
If the connectivity between the multi-homed CE and one of the PEs that it is attached to fails, the PE MUST withdraw the Ethernet Tag A-D routes, that had been previously advertised, for the Ethernet Segment to the CE. When the MAC entry on the PE ages out, the PE MUST withdraw the MAC address from BGP. Note that to aid convergence, the Ethernet Tag A-D routes MAY be withdrawn before the MAC routes. This enables the remote PEs to remove the MPLS next-hop to this particular PE from the set of MPLS next-hops that can be used to forward traffic to the CE. For further details and procedures on withdrawal of EVPN route types in the event of PE to CE failures please see section "PE to CE Network Failures".

15.2. Load balancing of traffic between a PE and a local CE

A CE may be configured with more than one interface connected to different PEs or the same PE for load balancing, using a technology such as LAG. The PE(s) and the CE can load balance traffic onto these interfaces using one of the following mechanisms.

15.2.1. Data plane learning

Consider that the PEs perform data plane learning for local MAC addresses learned from local CEs. This enables the PE(s) to learn a particular MAC address and associate it with one or more interfaces, if the technology between the PE and the CE supports multi-pathing. The PEs can now load balance traffic destined to that MAC address on the multiple interfaces.

Whether the CE can load balance traffic that it generates on the multiple interfaces is dependent on the CE implementation.

15.2.2. Control plane learning

The CE can be a host that advertises the same MAC address using a control protocol on both interfaces. This enables the PE(s) to learn the host's MAC address and associate it with one or more interfaces.
The PEs can now load balance traffic destined to the host on the multiple interfaces. The host can also load balance the traffic it generates onto these interfaces and the PE that receives the traffic employs EVPN forwarding procedures to forward the traffic.

16. MAC Mobility

It is possible for a given host or end-station (as defined by its MAC address) to move from one Ethernet segment to another; this is referred to as 'MAC Mobility' or 'MAC move' and it is different from the multi-homing situation in which a given MAC address is reachable via multiple PEs for the same Ethernet segment. In a MAC move, there would be two sets of MAC Advertisement routes, one set with the new Ethernet segment and one set with the previous Ethernet segment, and the MAC address would appear to be reachable via each of these segments.

In order to allow all of the PEs in the EVPN instance to correctly determine the current location of the MAC address, all advertisements of it being reachable via the previous Ethernet segment MUST be withdrawn by the PEs, for the previous Ethernet segment, that had advertised it.

If local learning is performed using the data plane, these PEs will not be able to detect that the MAC address has moved to another Ethernet segment and the receipt of MAC Advertisement routes, with the MAC Mobility extended community attribute, from other PEs serves as the trigger for these PEs to withdraw their advertisements. If local learning is performed using the control or management planes, these interactions serve as the trigger for these PEs to withdraw their advertisements.

In a situation where there are multiple moves of a given MAC, possibly between the same two Ethernet segments, there may be multiple withdrawals and re-advertisements.
In order to ensure that all PEs in the EVPN instance receive all of these correctly through the intervening BGP infrastructure, it is necessary to introduce a sequence number into the MAC Mobility extended community attribute.

An implementation MUST handle the scenarios where the sequence number wraps around, so that mobility events are processed correctly.

Every MAC mobility event for a given MAC address will contain a sequence number that is set using the following rules:

- A PE advertising a MAC address for the first time advertises it with no MAC Mobility extended community attribute.

- A PE detecting a locally attached MAC address for which it had previously received a MAC Advertisement route with a different Ethernet segment identifier advertises the MAC address in a MAC Advertisement route tagged with a MAC Mobility extended community attribute with a sequence number one greater than the sequence number in the MAC mobility attribute of the received MAC Advertisement route. In the case of the first mobility event for a given MAC address, where the received MAC Advertisement route does not carry a MAC Mobility attribute, the value of the sequence number in the received route is assumed to be 0 for the purpose of this processing.

- A PE detecting a locally attached MAC address for which it had previously received a MAC Advertisement route with the same non-zero Ethernet segment identifier advertises it with:

  i. no MAC Mobility extended community attribute, if the received route did not carry said attribute.

  ii. a MAC Mobility extended community attribute with the sequence number equal to the highest of the sequence number(s) in the received MAC Advertisement route(s), if the received route(s) is (are) tagged with a MAC Mobility extended community attribute.
- A PE detecting a locally attached MAC address for which it had previously received a MAC Advertisement route with the same zero Ethernet segment identifier (single-homed scenarios) advertises it with a MAC mobility extended community attribute with the sequence number set properly. In the case of single-homed scenarios, there is no need for ESI comparison. The reason ESI comparison is done for multi-homing is to prevent false detection of a MAC move among the PEs attached to the same multi-homed site.

A PE receiving a MAC Advertisement route for a MAC address with a different Ethernet segment identifier and a higher sequence number than that which it had previously advertised, withdraws its MAC Advertisement route. If two (or more) PEs advertise the same MAC address with the same sequence number but different Ethernet segment identifiers, a PE that receives these routes selects the route advertised by the PE with the lowest IP address as the best route.

16.1. MAC Duplication Issue

A situation may arise where the same MAC address is learned by different PEs in the same VLAN because of two (or more) hosts being mis-configured with the same (duplicate) MAC address. In such a situation, the traffic originating from these hosts would trigger continuous MAC moves among the PEs attached to these hosts. It is important to recognize such a situation and avoid incrementing the sequence number (in the MAC Mobility attribute) to infinity. In order to remedy such a situation, a PE that detects a MAC mobility event by way of local learning starts an M-second timer (default value of M = 5) and if it detects N MAC moves before the timer expires (default value for N = 3), it concludes that a duplicate MAC situation has occurred.
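
The detection rule can be approximated with a small sketch. It is not the procedure of this document verbatim: it replaces the single M-second timer with a sliding window of the same length, assumes the move that would start the timer counts towards N, and uses hypothetical class and method names.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class DuplicateMacDetector:
    """Flag a MAC as duplicate after N local moves within M seconds."""

    def __init__(self, window_seconds: float = 5.0, max_moves: int = 3) -> None:
        self.window = window_seconds   # M, default 5
        self.max_moves = max_moves     # N, default 3
        self._moves: Dict[str, Deque[float]] = defaultdict(deque)
        self.frozen: Set[str] = set()  # MACs awaiting operator action

    def record_move(self, mac: str, now: float) -> bool:
        """Record a locally detected move; return True if mac is a duplicate."""
        if mac in self.frozen:
            return True
        moves = self._moves[mac]
        moves.append(now)
        # Forget moves that fall outside the M-second window.
        while moves and now - moves[0] > self.window:
            moves.popleft()
        if len(moves) >= self.max_moves:
            self.frozen.add(mac)
            return True
        return False
```

Both parameters are constructor arguments so that an operator-facing configuration layer could override the defaults.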
The PE MUST alert the operator and stop sending and processing any BGP MAC Advertisement routes for that MAC address until corrective action is taken by the operator. The values of M and N MUST be configurable to allow for flexibility in operator control. Note that the other PEs in the EVPN instance will forward the traffic for the duplicate MAC address to one of the PEs advertising the duplicate MAC address.

16.2. Sticky MAC addresses

There are scenarios in which it is desired to configure some MAC addresses as static so that they are not subjected to MAC move. In such scenarios, these MAC addresses are advertised with the MAC Mobility Extended Community where the static flag is set to 1 and the sequence number is set to zero. If a PE receives such advertisements and later learns the same MAC address(es) via local learning, then the PE MUST alert the operator.

17. Multicast & Broadcast

The PEs in a particular EVPN instance may use ingress replication or P2MP LSPs to send multicast traffic to other PEs.

17.1. Ingress Replication

The PEs may use ingress replication for flooding BUM traffic as described in section "Handling of Multi-Destination Traffic". A given broadcast packet must be sent to all the remote PEs. However a given multicast packet for a multicast flow may be sent to only a subset of the PEs. Specifically a given multicast flow may be sent to only those PEs that have receivers that are interested in the multicast flow. Determining which of the PEs have receivers for a given multicast flow is done using explicit tracking described below.

17.2. P2MP LSPs

A PE may use an "Inclusive" tree for sending a BUM packet. This terminology is borrowed from [VPLS-MCAST].

A variety of transport technologies may be used in the SP network.
   For inclusive P-Multicast trees, these transport technologies
   include point-to-multipoint LSPs created by RSVP-TE or mLDP.

17.2.1. Inclusive Trees

   An Inclusive Tree allows the use of a single multicast distribution
   tree, referred to as an Inclusive P-Multicast tree, in the SP network
   to carry all the multicast traffic from a specified set of EVPN
   instances on a given PE. A particular P-Multicast tree can be set up
   to carry the traffic originated by sites belonging to a single EVPN
   instance, or to carry the traffic originated by sites belonging to
   different EVPN instances. The ability to carry the traffic of more
   than one EVPN instance on the same tree is termed 'Aggregation'. The
   tree needs to include every PE that is a member of any of the EVPN
   instances that are using the tree. This implies that a PE may
   receive multicast traffic for a multicast stream even if it doesn't
   have any receivers that are interested in receiving traffic for that
   stream.

   An Inclusive P-Multicast tree as defined in this document is a P2MP
   tree. A P2MP tree is used to carry traffic only for EVPN CEs that
   are connected to the PE that is the root of the tree.

   The procedures for signaling an Inclusive Tree are the same as those
   in [VPLS-MCAST] with the VPLS A-D route replaced with the Inclusive
   Multicast Ethernet Tag route. The P-Tunnel attribute [VPLS-MCAST] for
   an Inclusive tree is advertised in the Inclusive Multicast route as
   described in section "Handling of Multi-Destination Traffic". Note
   that a PE can "aggregate" multiple inclusive trees for different
   EVPN instances on the same P2MP LSP using upstream labels. The
   procedures for aggregation are the same as those described in
   [VPLS-MCAST], with VPLS A-D routes replaced by EVPN Inclusive
   Multicast routes.

18.
Convergence

   This section describes failure recovery from different types of
   network failures.

18.1. Transit Link and Node Failures between PEs

   The use of existing MPLS Fast-Reroute mechanisms can provide failure
   recovery on the order of 50 ms in the event of transit link and node
   failures in the infrastructure that connects the PEs.

18.2. PE Failures

   Consider a host, host1, that is dual-homed to PE1 and PE2. If PE1
   fails, a remote PE, PE3, can discover this based on the failure of
   the BGP session. This failure detection can be in the sub-second
   range if BFD is used to detect BGP session failure. PE3 can update
   its forwarding state to start sending all traffic for host1 to only
   PE2. Note that this failure recovery is potentially faster than what
   would be possible if data plane learning were used, since in that
   case PE3 would have to rely on re-learning of MAC addresses via PE2.

18.3. PE to CE Network Failures

   When an Ethernet segment connected to a PE fails or when an Ethernet
   Tag is decommissioned on an Ethernet segment, then the PE MUST
   withdraw the Ethernet A-D route(s) announced for the <ESI, Ethernet
   Tags> that are impacted by the failure or decommissioning. In
   addition, the PE MUST also withdraw the MAC advertisement routes that
   are impacted by the failure or decommissioning.

   The Ethernet A-D routes should be used by an implementation to
   optimize the withdrawal of MAC advertisement routes. When a PE
   receives a withdrawal of a particular Ethernet A-D route from a PE,
   it SHOULD consider all the MAC advertisement routes that are learned
   from the same <ESI, Ethernet Tag> as in the Ethernet A-D route, from
   the advertising PE, as having been withdrawn. This optimizes the
   network convergence times in the event of PE to CE failures.

19.
Frame Ordering

   In a MAC address, bit-1 of the most significant byte is used for
   unicast/multicast indication and bit-2 is used for globally unique
   versus locally administered MAC address. If the value of the 2nd
   nibble (bits 5 through 8) of the most significant byte of the
   destination MAC address (which follows the last MPLS label) happens
   to be 0x4 or 0x6, then the Ethernet frame can be misinterpreted as an
   IPv4 or IPv6 packet by intermediate P nodes performing ECMP. Those
   nodes may then load-balance packets belonging to the same flow onto
   different ECMP paths, thus subjecting them to different delays.
   Therefore, packets belonging to the same flow can arrive at the
   destination out of order. This out-of-order delivery can happen
   during steady state, in the absence of any failures, resulting in
   significant impact to the network operation.

   In order to avoid any such mis-ordering, the usage of the control
   word SHALL adhere to the following rules:

   - A PE MUST use the control word when sending EVPN encapsulated
     packets over a MP2P or a P2P LSP.

   - A PE MUST NOT use the control word when sending EVPN encapsulated
     packets over a P2MP LSP.

   The control word is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0|       Reserved        |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   In the above diagram, the first 4 bits MUST be set to 0. The rest of
   the first 16 bits are reserved for future use. They MUST be set to 0
   when transmitting, and MUST be ignored upon receipt. The next 16 bits
   provide a sequence number that MUST also be set to zero by default.

20.
Acknowledgements

   Special thanks to Yakov Rekhter for reviewing this draft several
   times and providing valuable comments and for his very engaging
   discussions on several topics of this draft that helped shape this
   document. We would also like to thank Pedro Marques, Kaushik Ghosh,
   Nischal Sheth, Robert Raszuk, Amit Shukla and Nadeem Mohammed for
   discussions that helped shape this document. We would also like to
   thank Han Nguyen for his comments and support of this work. We would
   also like to thank Steve Kensil and Reshad Rahman for their reviews.
   Last but not least, many thanks to Jakob Heitz for his help to
   improve several sections of this draft.

21. Security Considerations

22. IANA Considerations

23. References

23.1 Normative References

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
             (VPLS) Using BGP for Auto-Discovery and Signaling", RFC
             4761, January 2007.

   [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service
             (VPLS) Using Label Distribution Protocol (LDP) Signaling",
             RFC 4762, January 2007.

   [RFC4271] Y. Rekhter et al., "A Border Gateway Protocol 4 (BGP-4)",
             RFC 4271, January 2006.

   [RFC4760] T. Bates et al., "Multiprotocol Extensions for BGP-4", RFC
             4760, January 2007.

23.2 Informative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [EVPN-REQ] A. Sajassi, R. Aggarwal et al., "Requirements for
             Ethernet VPN", draft-ietf-l2vpn-evpn-req-04.txt, July
             2013.

   [VPLS-MCAST] R. Aggarwal et al., "Multicast in VPLS",
             draft-ietf-l2vpn-vpls-mcast-14.txt, July 2013.

   [RT-CONSTRAIN] P. Marques et
             al., "Constrained Route Distribution
             for Border Gateway Protocol/MultiProtocol Label Switching
             (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks
             (VPNs)", RFC 4684, November 2006.

24. Authors' Addresses

   Ali Sajassi
   Cisco
   Email: sajassi@cisco.com

   Rahul Aggarwal
   Email: raggarwa_1@yahoo.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.com

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   James Uttaro
   AT&T
   200 S. Laurel Avenue
   Middletown, NJ 07748
   USA
   Email: uttaro@att.com

   Nabil Bitar
   Verizon Communications
   Email: nabil.n.bitar@verizon.com

   Ravi Shekhar
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA 94089 US
   Email: rshekhar@juniper.net

   Florin Balus
   Alcatel-Lucent
   Email: Florin.Balus@alcatel-lucent.com

   Keyur Patel
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134, US
   Email: keyupate@cisco.com

   Sami Boutros
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134, US
   Email: sboutros@cisco.com

   Samer Salam
   Cisco
   Email: ssalam@cisco.com

   John Drake
   Juniper Networks
   Email: jdrake@juniper.net