idnits 2.17.1

draft-ietf-bess-rfc7432bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([1]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 25, 2021) is 915 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 3023

  == Missing Reference: 'RFC4448' is mentioned on line 1056, but not defined

  == Missing Reference: 'EVI' is mentioned on line 2255, but not defined

  == Missing Reference: 'BD' is mentioned on line 2255, but not defined

  == Outdated reference: A later version (-21) exists of
     draft-ietf-bess-evpn-igmp-mld-proxy-07

  == Outdated reference: A later version (-08) exists of
     draft-ietf-bess-evpn-mh-split-horizon-01

  == Outdated reference: A later version (-14) exists of
     draft-ietf-bess-mvpn-evpn-aggregation-label-06

  == Outdated reference: A later version (-14) exists of
     draft-ietf-bier-evpn-04

  -- Obsolete informational reference (is this intentional?): RFC 5226
     (Obsoleted by RFC 8126)

     Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    BESS Working Group                                            A. Sajassi
3    Internet-Draft                                           LA. Burdet, Ed.
4    Intended status: Standards Track                                   Cisco
5    Expires: April 28, 2022                                         J. Drake
6                                                                     Juniper
7                                                                  J. Rabadan
8                                                                       Nokia
9                                                            October 25, 2021

11                         BGP MPLS-Based Ethernet VPN
12                        draft-ietf-bess-rfc7432bis-02

14   Abstract

16      This document describes procedures for BGP MPLS-based Ethernet VPNs
17      (EVPN). The procedures described here meet the requirements
18      specified in RFC 7209 -- "Requirements for Ethernet VPN (EVPN)".

20   Note to Readers

22      _RFC EDITOR: please remove this section before publication_

24      The complete and detailed set of all changes between this version and
25      RFC7432 may be found as an Annotated Diff (rfcdiff) here [1].

27   Status of This Memo

29      This Internet-Draft is submitted in full conformance with the
30      provisions of BCP 78 and BCP 79.

32      Internet-Drafts are working documents of the Internet Engineering
33      Task Force (IETF). Note that other groups may also distribute
34      working documents as Internet-Drafts. The list of current Internet-
35      Drafts is at https://datatracker.ietf.org/drafts/current/.

37      Internet-Drafts are draft documents valid for a maximum of six months
38      and may be updated, replaced, or obsoleted by other documents at any
39      time. It is inappropriate to use Internet-Drafts as reference
40      material or to cite them other than as "work in progress."

42      This Internet-Draft will expire on April 28, 2022.

44   Copyright Notice

46      Copyright (c) 2021 IETF Trust and the persons identified as the
47      document authors. All rights reserved.

49      This document is subject to BCP 78 and the IETF Trust's Legal
50      Provisions Relating to IETF Documents
51      (https://trustee.ietf.org/license-info) in effect on the date of
52      publication of this document. Please review these documents
53      carefully, as they describe your rights and restrictions with respect
54      to this document. Code Components extracted from this document must
55      include Simplified BSD License text as described in Section 4.e of
56      the Trust Legal Provisions and are provided without warranty as
57      described in the Simplified BSD License.

59   Table of Contents

61      1.  Introduction
62        1.1.  Summary of changes from RFC 7432
63      2.  Specification of Requirements
64      3.  Terminology
65      4.  BGP MPLS-Based EVPN Overview
66      5.  Ethernet Segment
67      6.  Ethernet Tag ID
68        6.1.  VLAN-Based Service Interface
69        6.2.
                VLAN Bundle Service Interface
70          6.2.1.  Port-Based Service Interface
71        6.3.  VLAN-Aware Bundle Service Interface
72          6.3.1.  Port-Based VLAN-Aware Service Interface
73        6.4.  EVPN PE Model
74      7.  BGP EVPN Routes
75        7.1.  Ethernet Auto-discovery Route
76        7.2.  MAC/IP Advertisement Route
77        7.3.  Inclusive Multicast Ethernet Tag Route
78        7.4.  Ethernet Segment Route
79        7.5.  ESI Label Extended Community
80        7.6.  ES-Import Route Target
81        7.7.  MAC Mobility Extended Community
82        7.8.  Default Gateway Extended Community
83        7.9.  Route Distinguisher Assignment per MAC-VRF
84        7.10. Route Targets
85          7.10.1.  Auto-derivation from the Ethernet Tag (VLAN ID)
86        7.11. EVPN Layer 2 Attributes Extended Community
87          7.11.1.  EVPN Layer 2 Attributes Partitioning
88        7.12. Route Prioritization
89      8.  Multihoming Functions
90        8.1.  Multihomed Ethernet Segment Auto-discovery
91          8.1.1.  Constructing the Ethernet Segment Route
92        8.2.  Fast Convergence
93          8.2.1.  Constructing Ethernet A-D per Ethernet Segment Route
94            8.2.1.1.  Ethernet A-D Route Targets
95        8.3.  Split Horizon
96          8.3.1.  ESI Label Assignment
97            8.3.1.1.  Ingress Replication
98            8.3.1.2.  P2MP MPLS LSPs
99            8.3.1.3.  MP2MP MPLS LSPs
100       8.4.  Aliasing and Backup Path
101         8.4.1.  Constructing Ethernet A-D per EVPN Instance Route
102       8.5.  Designated Forwarder Election
103       8.6.  Signaling Primary and Backup DF Elected PEs
104       8.7.  Interoperability with Single-Homing PEs
105     9.  Determining Reachability to Unicast MAC Addresses
106       9.1.  Local Learning
107       9.2.  Remote Learning
108         9.2.1.  Constructing MAC/IP Address Advertisement
109         9.2.2.  Route Resolution
110     10. ARP and ND
111       10.1.  Default Gateway
112         10.1.1.  Best Path selection for Default Gateway
113     11. Handling of Multi-destination Traffic
114       11.1.  Constructing Inclusive Multicast Ethernet Tag Route
115       11.2.  P-Tunnel Identification
116     12. Processing of Unknown Unicast Packets
117       12.1.  Ingress Replication
118       12.2.  P2MP MPLS LSPs
119     13. Forwarding Unicast Packets
120       13.1.  Forwarding Packets Received from a CE
121       13.2.  Forwarding Packets Received from a Remote PE
122         13.2.1.  Unknown Unicast Forwarding
123         13.2.2.  Known Unicast Forwarding
124     14. Load Balancing of Unicast Packets
125       14.1.  Load Balancing of Traffic from a PE to Remote CEs
126         14.1.1.  Single-Active Redundancy Mode
127         14.1.2.
                     All-Active Redundancy Mode
128       14.2.  Load Balancing of Traffic between a PE and a Local CE
129         14.2.1.  Data-Plane Learning
130         14.2.2.  Control-Plane Learning
131     15. MAC Mobility
132       15.1.  MAC Duplication Issue
133       15.2.  Sticky MAC Addresses
134       15.3.  Loop Protection
135     16. Multicast and Broadcast
136       16.1.  Ingress Replication
137       16.2.  P2MP or MP2MP LSPs
138         16.2.1.  Inclusive Trees
139     17. Convergence
140       17.1.  Transit Link and Node Failures between PEs
141       17.2.  PE Failures
142       17.3.  PE-to-CE Network Failures
143     18. Frame Ordering
144       18.1.  Flow Label
146     19. Use of Domain-wide Common Block (DCB) Labels
147     20. Security Considerations
148     21. IANA Considerations
149     22. References
150       22.1.  Normative References
151       22.2.  Informative References
152       22.3.  URIs
153     Appendix A.  Acknowledgments for This Document (2021)
154     Appendix B.  Contributors for This Document (2021)
155     Appendix C.  Acknowledgments from the First Edition (2015)
156       C.1.
                Contributors from the First Edition (2015)
157       C.2.  Authors from the First Edition (2015)
158     Authors' Addresses

160  1.  Introduction

162     Virtual Private LAN Service (VPLS), as defined in [RFC4664],
163     [RFC4761], and [RFC4762], is a proven and widely deployed technology.
164     However, the existing solution has a number of limitations when it
165     comes to multihoming and redundancy, multicast optimization,
166     provisioning simplicity, flow-based load balancing, and multipathing;
167     these limitations are important considerations for Data Center (DC)
168     deployments. [RFC7209] describes the motivation for a new solution
169     to address these limitations. It also outlines a set of requirements
170     that the new solution must address.

172     This document describes procedures for a BGP MPLS-based solution
173     called Ethernet VPN (EVPN) to address the requirements specified in
174     [RFC7209]. Please refer to [RFC7209] for the detailed requirements
175     and motivation. EVPN requires extensions to existing IP/MPLS
176     protocols as described in this document. In addition to these
177     extensions, EVPN uses several building blocks from existing MPLS
178     technologies.

180  1.1.  Summary of changes from RFC 7432

182     This section describes the significant changes between [RFC7432] and
183     this document.

185     -  Updates to Terminology i.a.
           BD, EVI, Ethernet Tag ID, P-tunnel,
186        DF/BDF/NDF, DCB;

188     -  Added Section 6.4 for description and disambiguation of EVPN
189        bridging terminology;

191     -  Added ES-Import route target auto-derivation for ESI types 0,4,5;
192     -  Precision of 'encoding' language for all references to 'Label'
193        fields;

195     -  Added Section 7.11 for usage of
196        EVPN Layer 2 Attributes Extended Community in EVPN Bridging;

198     -  Added Section 7.12, which proposes relative order-of-magnitude
199        route priority and processing to help achieve fast convergence;

201     -  Corrected Section 8.2.1 to include reference to E-TREE exception;

203     -  Updated Section 8.5 to include Backup- and Non-Designated Forwarder
204        roles to DF-Election algorithm, description of those roles and
205        signaling updates;

207     -  Updated Section 8.5 to specify DF Election behaviour for
208        Originating IP in different family;

210     -  Added Section 8.3.1.3 for MP2MP MPLS LSPs and updated Section 12.2;

212     -  Addressed conflicts in Best Path algorithm for Default Gateway in
213        Section 10.1.1;

215     -  Update to Section 14.1.1 redundancy mode description;

217     -  Added Section 15.3 describing a loop detection and protection
218        mechanism;

220     -  Added Section 18.1 describing Flow-label usage and signaling (see
221        also new Section 7.11);

223     -  Section 19 specifies use of Domain-wide Common Block (DCB) for
224        several cases;

226     -  Restructuring, namely Section 8.5 to Section 5, simplify all
227        Ethernet Tag ID references to Section 6;

229     -  Corrected Route Target and other extcomm 'attributes' references to
230        'extended communities'; and

232     -  Cross-references and editorial changes.

234  2.  Specification of Requirements

236     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
237     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
238     document are to be interpreted as described in [RFC2119].

240  3.  Terminology

242     BD: Broadcast Domain.
           In a bridged network, the broadcast domain
243        corresponds to a Virtual LAN (VLAN), where a VLAN is typically
244        represented by a single VLAN ID (VID) but can be represented by
245        several VIDs where Shared VLAN Learning (SVL) is used per
246        [IEEE.802.1Q_2014].

248     Bridge Table: An instantiation of a broadcast domain on a MAC-VRF.

250     CE: Customer Edge device, e.g., a host, router, or switch.

252     EVI: An EVPN instance spanning the Provider Edge (PE) devices
253        participating in that EVPN. An EVI may be comprised of one BD
254        (VLAN-based, VLAN Bundle, or Port-based services) or multiple BDs
255        (VLAN-aware Bundle or Port-based VLAN-Aware services).

257     MAC-VRF: A Virtual Routing and Forwarding table for Media Access
258        Control (MAC) addresses on a PE.

260     Ethernet Segment (ES): When a customer site (device or network) is
261        connected to one or more PEs via a set of Ethernet links, then
262        that set of links is referred to as an 'Ethernet segment'.

264     Ethernet Segment Identifier (ESI): A unique non-zero identifier that
265        identifies an Ethernet segment is called an 'Ethernet Segment
266        Identifier'.

268     VID: VLAN Identifier.

270     Ethernet Tag: Used to represent a BD that is configured on a given
271        ES for the purposes of DF election and identification
272        for frames received from the CE. Note that any of the following
273        may be used to represent a BD: VIDs (including Q-in-Q tags),
274        configured IDs, VNIs (Virtual Extensible Local Area Network
275        (VXLAN) Network Identifiers), normalized VIDs, I-SIDs (Service
276        Instance Identifiers), etc., as long as the representation of the
277        BDs is configured consistently across the multihomed PEs attached
278        to that ES.

280     Ethernet Tag ID: Normalized network wide ID that is used to identify
281        a BD within an EVI and carried in EVPN routes.

283     LACP: Link Aggregation Control Protocol.

285     MP2MP: Multipoint to Multipoint.

287     MP2P: Multipoint to Point.

289     P2MP: Point to Multipoint.

291     P2P: Point to Point.

293     P-tunnel: A tunnel through the network of one or more SPs. In this
294        document, P-tunnels are instantiated as bidirectional multicast
295        distribution trees.

297     PE: Provider Edge device.

299     Single-Active Redundancy Mode: When only a single PE, among all the
300        PEs attached to an Ethernet segment, is allowed to forward traffic
301        to/from that Ethernet segment for a given VLAN, then the Ethernet
302        segment is defined to be operating in Single-Active redundancy
303        mode.

305     All-Active Redundancy Mode: When all PEs attached to an Ethernet
306        segment are allowed to forward known unicast traffic to/from that
307        Ethernet segment for a given VLAN, then the Ethernet segment is
308        defined to be operating in All-Active redundancy mode.

310     BUM: Broadcast, unknown unicast, and multicast.

312     DF: Designated Forwarder.

314     Backup-DF (BDF): Backup-Designated Forwarder.

316     Non-DF (NDF): Non-Designated Forwarder.

318     DCB: Domain-wide Common Block (of labels), as in
319        [I-D.ietf-bess-mvpn-evpn-aggregation-label].

321     AC: Attachment Circuit.

323  4.  BGP MPLS-Based EVPN Overview

325     This section provides an overview of EVPN. An EVPN instance
326     comprises Customer Edge devices (CEs) that are connected to Provider
327     Edge devices (PEs) that form the edge of the MPLS infrastructure. A
328     CE may be a host, a router, or a switch. The PEs provide virtual
329     Layer 2 bridged connectivity between the CEs. There may be multiple
330     EVPN instances in the provider's network.

332     The PEs may be connected by an MPLS Label Switched Path (LSP)
333     infrastructure, which provides the benefits of MPLS technology, such
334     as fast reroute, resiliency, etc. The PEs may also be connected by
335     an IP infrastructure, in which case IP/GRE (Generic Routing
336     Encapsulation) tunneling or other IP tunneling can be used between
337     the PEs. The detailed procedures in this document are specified only
338     for MPLS LSPs as the tunneling technology.
        However, these procedures
339     are designed to be extensible to IP tunneling as the Packet Switched
340     Network (PSN) tunneling technology.

342     In an EVPN, MAC learning between PEs occurs not in the data plane (as
343     happens with traditional bridging in VPLS [RFC4761] [RFC4762]) but in
344     the control plane. Control-plane learning offers greater control
345     over the MAC learning process, such as restricting who learns what,
346     and the ability to apply policies. Furthermore, the control plane
347     chosen for advertising MAC reachability information is multi-protocol
348     (MP) BGP (similar to IP VPNs [RFC4364]). This provides flexibility
349     and the ability to preserve the "virtualization" or isolation of
350     groups of interacting agents (hosts, servers, virtual machines) from
351     each other. In EVPN, PEs advertise the MAC addresses learned from
352     the CEs that are connected to them, along with an MPLS label, to
353     other PEs in the control plane using Multiprotocol BGP (MP-BGP).
354     Control-plane learning enables load balancing of traffic to and from
355     CEs that are multihomed to multiple PEs. This is in addition to load
356     balancing across the MPLS core via multiple LSPs between the same
357     pair of PEs. In other words, it allows CEs to connect to multiple
358     active points of attachment. It also improves convergence times in
359     the event of certain network failures.

361     However, learning between PEs and CEs is done by the method best
362     suited to the CE: data-plane learning, IEEE 802.1x, the Link Layer
363     Discovery Protocol (LLDP), IEEE 802.1aq, Address Resolution Protocol
364     (ARP), management plane, or other protocols.

366     It is a local decision as to whether the Layer 2 forwarding table on
367     a PE is populated with all the MAC destination addresses known to the
368     control plane, or whether the PE implements a cache-based scheme.
369     For instance, the MAC forwarding table may be populated only with the
370     MAC destinations of the active flows transiting a specific PE.

372     The policy attributes of EVPN are very similar to those of IP-VPN.
373     An EVPN instance requires a Route Distinguisher (RD) that is unique
374     per MAC-VRF and one or more globally unique Route Targets (RTs). A
375     CE attaches to a BD on a PE, on an Ethernet interface that may be
376     configured for one or more Ethernet tags. If the Ethernet tags are
377     VLAN IDs, some deployment scenarios guarantee uniqueness of VLAN IDs
378     across EVPN instances: all points of attachment for a given EVPN
379     instance use the same VLAN ID, and no other EVPN instance uses this
380     VLAN ID. This document refers to this case as a "Unique VLAN EVPN"
381     and describes simplified procedures to optimize for it. See, for
382     example, Section 7.10.1, which describes automatically deriving the
383     RT(s) for each EVPN instance from the corresponding VID.

385  5.  Ethernet Segment

387     As indicated in [RFC7209], each Ethernet segment needs a unique
388     identifier in an EVPN. This section defines how such identifiers are
389     assigned and how they are encoded for use in EVPN signaling. Later
390     sections of this document describe the protocol mechanisms that
391     utilize the identifiers.

393     When a customer site is connected to one or more PEs via a set of
394     Ethernet links, then this set of Ethernet links constitutes an
395     "Ethernet segment". For a multihomed site, each Ethernet segment
396     (ES) is identified by a unique non-zero identifier called an Ethernet
397     Segment Identifier (ESI). An ESI is encoded as a 10-octet integer in
398     line format with the most significant octet sent first. The
399     following two ESI values are reserved:

401     -  ESI 0 denotes a single-homed site.

403     -  ESI {0xFF} (repeated 10 times) is known as MAX-ESI and is reserved.

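The ESI encoding and the two reserved values described above can be illustrated with a short sketch. It is not part of this specification: the names `encode_esi`, `classify_esi`, `ZERO_ESI`, and `MAX_ESI`, and the returned labels, are hypothetical and chosen only for this example. It assumes the ESI is handled as an unsigned integer that is serialized most significant octet first.

```python
ESI_LEN = 10                  # an ESI is always 10 octets on the wire
ZERO_ESI = bytes(ESI_LEN)     # reserved: ESI 0 denotes a single-homed site
MAX_ESI = b"\xff" * ESI_LEN   # reserved: MAX-ESI (0xFF repeated 10 times)


def encode_esi(value: int) -> bytes:
    """Return the 10-octet form of an ESI, most significant octet first."""
    if not 0 <= value < 1 << (8 * ESI_LEN):
        raise ValueError("ESI does not fit in 10 octets")
    return value.to_bytes(ESI_LEN, "big")


def classify_esi(esi: bytes) -> str:
    """Label an encoded ESI; the label strings are illustrative only."""
    if len(esi) != ESI_LEN:
        raise ValueError("ESI must be exactly 10 octets")
    if esi == ZERO_ESI:
        return "single-homed"
    if esi == MAX_ESI:
        return "max-esi"
    return "non-reserved"
```

In this sketch, a multihomed Ethernet segment would always carry a value for which `classify_esi` returns "non-reserved".
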
405     In general, an Ethernet segment SHOULD have a non-reserved ESI that
406     is unique network wide (i.e., across all EVPN instances on all the
407     PEs). If the CE(s) constituting an Ethernet segment is (are) managed
408     by the network operator, then ESI uniqueness should be guaranteed;
409     however, if the CE(s) is (are) not managed, then the operator MUST
410     configure a network-wide unique ESI for that Ethernet segment. This
411     is required to enable auto-discovery of Ethernet segments and
412     Designated Forwarder (DF) election.

414     In a network with managed and non-managed CEs, the ESI has the
415     following format:

417                 +---+---+---+---+---+---+---+---+---+---+
418                 | T |          ESI Value                |
419                 +---+---+---+---+---+---+---+---+---+---+

421     Where:

423     T (ESI Type) is a 1-octet field (most significant octet) that
424     specifies the format of the remaining 9 octets (ESI Value). The
425     following six ESI types can be used:

427     -  Type 0 (T=0x00) - This type indicates an arbitrary 9-octet ESI
428        value, which is managed and configured by the operator.

430     -  Type 1 (T=0x01) - When IEEE 802.1AX LACP is used between the PEs
431        and CEs, this ESI type indicates an auto-generated ESI value
432        determined from LACP by concatenating the following parameters:

434        +  CE LACP System MAC address (6 octets). The CE LACP System MAC
435           address MUST be encoded in the high-order 6 octets of the ESI
436           Value field.

438        +  CE LACP Port Key (2 octets). The CE LACP port key MUST be
439           encoded in the 2 octets next to the System MAC address.

441        +  The remaining octet will be set to 0x00.

443        As far as the CE is concerned, it would treat the multiple PEs that
444        it is connected to as the same switch. This allows the CE to
445        aggregate links that are attached to different PEs in the same
446        bundle.

448        This mechanism could be used only if it produces ESIs that satisfy
449        the uniqueness requirement specified above.

451     -  Type 2 (T=0x02) - This type is used in the case of indirectly
452        connected hosts via a bridged LAN between the CEs and the PEs. The
453        ESI Value is auto-generated and determined based on the Layer 2
454        bridge protocol as follows: If the Multiple Spanning Tree Protocol
455        (MSTP) is used in the bridged LAN, then the value of the ESI is
456        derived by listening to Bridge PDUs (BPDUs) on the Ethernet
457        segment. To achieve this, the PE is not required to run MSTP.
458        However, the PE must learn the Root Bridge MAC address and Bridge
459        Priority of the root of the Internal Spanning Tree (IST) by
460        listening to the BPDUs. The ESI Value is constructed as follows:

462        +  Root Bridge MAC address (6 octets). The Root Bridge MAC address
463           MUST be encoded in the high-order 6 octets of the ESI Value
464           field.

466        +  Root Bridge Priority (2 octets). The CE Root Bridge Priority
467           MUST be encoded in the 2 octets next to the Root Bridge MAC
468           address.

470        +  The remaining octet will be set to 0x00.

472        This mechanism could be used only if it produces ESIs that satisfy
473        the uniqueness requirement specified above.

475     -  Type 3 (T=0x03) - This type indicates a MAC-based ESI Value that
476        can be auto-generated or configured by the operator. The ESI Value
477        is constructed as follows:

479        +  System MAC address (6 octets). The PE MAC address MUST be
480           encoded in the high-order 6 octets of the ESI Value field.

482        +  Local Discriminator value (3 octets). The Local Discriminator
483           value MUST be encoded in the low-order 3 octets of the ESI Value.

485        This mechanism could be used only if it produces ESIs that satisfy
486        the uniqueness requirement specified above.

488     -  Type 4 (T=0x04) - This type indicates a router-ID ESI Value that
489        can be auto-generated or configured by the operator. The ESI Value
490        is constructed as follows:

492        +  Router ID (4 octets). The system router ID MUST be encoded in
493           the high-order 4 octets of the ESI Value field.

495        +  Local Discriminator value (4 octets). The Local Discriminator
496           value MUST be encoded in the 4 octets next to the Router ID.

498        +  The low-order octet of the ESI Value will be set to 0x00.

500        This mechanism could be used only if it produces ESIs that satisfy
501        the uniqueness requirement specified above.

503     -  Type 5 (T=0x05) - This type indicates an Autonomous System
504        (AS)-based ESI Value that can be auto-generated or configured by
505        the operator. The ESI Value is constructed as follows:

507        +  AS number (4 octets). This is an AS number owned by the system
508           and MUST be encoded in the high-order 4 octets of the ESI Value
509           field. If a 2-octet AS number is used, the high-order extra
510           2 octets will be 0x0000.

512        +  Local Discriminator value (4 octets). The Local Discriminator
513           value MUST be encoded in the 4 octets next to the AS number.

515        +  The low-order octet of the ESI Value will be set to 0x00.

517        This mechanism could be used only if it produces ESIs that satisfy
518        the uniqueness requirement specified above.

520     Note that a CE always sends packets belonging to a specific flow
521     using a single link towards a PE. For instance, if the CE is a host,
522     then, as mentioned earlier, the host treats the multiple links that
523     it uses to reach the PEs as a Link Aggregation Group (LAG). The CE
524     employs a local hashing function to map traffic flows onto links in
525     the LAG.

527     If a bridged network is multihomed to more than one PE in an EVPN
528     network via switches, then the support of All-Active redundancy mode
529     requires the bridged network to be connected to two or more PEs using
530     a LAG.

532     If a bridged network does not connect to the PEs using a LAG, then
533     only one of the links between the bridged network and the PEs must be
534     the active link for a given <ES, VLAN>.
        In this case, the set of
535     Ethernet A-D per ES routes advertised by each PE MUST have the
536     "Single-Active" bit in the flags of the ESI Label extended community
537     set to 1.

539  6.  Ethernet Tag ID

541     An Ethernet Tag ID is a 32-bit field containing either a 12-bit or
542     24-bit identifier that identifies a particular broadcast domain
543     (e.g., a VLAN) in an EVPN instance. The 12-bit identifier is called
544     the VLAN ID (VID). An EVPN instance consists of one or more
545     broadcast domains (one or more VLANs). VLANs are assigned to a given
546     EVPN instance by the provider of the EVPN service. A given VLAN can
547     itself be represented by multiple VIDs. In such cases, the PEs
548     participating in that VLAN for a given EVPN instance are responsible
549     for performing VLAN ID translation to/from locally attached CE
550     devices.

552     The following subsections discuss the relationship between broadcast
553     domains (e.g., VLANs), Ethernet Tag IDs (e.g., VIDs), and MAC-VRFs as
554     well as the setting of the Ethernet Tag ID, in the various EVPN BGP
555     routes (defined in Section 7), for the different types of service
556     interfaces described in [RFC7209].

558     The following Ethernet Tag ID value is reserved:

560     -  Ethernet Tag ID {0xFFFFFFFF} is known as MAX-ET.

562  6.1.  VLAN-Based Service Interface

564     With this service interface, an EVPN instance consists of only a
565     single broadcast domain (e.g., a single VLAN). Therefore, there is a
566     one-to-one mapping between a VID on this interface and a MAC-VRF.
567     Since a MAC-VRF corresponds to a single VLAN, it consists of a single
568     bridge table corresponding to that VLAN. If the VLAN is represented
569     by multiple VIDs (e.g., a different VID per Ethernet segment per PE),
570     then each PE needs to perform VID translation for frames destined to
571     its Ethernet segment(s).
        In such scenarios, the Ethernet frames
572     transported over an MPLS/IP network SHOULD remain tagged with the
573     originating VID, and a VID translation MUST be supported in the data
574     path and MUST be performed on the disposition PE. The Ethernet Tag
575     ID in all EVPN routes MUST be set to 0.

577  6.2.  VLAN Bundle Service Interface

579     With this service interface, an EVPN instance corresponds to multiple
580     broadcast domains (e.g., multiple VLANs); however, only a single
581     bridge table is maintained per MAC-VRF, which means multiple VLANs
582     share the same bridge table. This implies that MAC addresses MUST be
583     unique across all VLANs for that EVI in order for this service to
584     work. In other words, there is a many-to-one mapping between VLANs
585     and a MAC-VRF, and the MAC-VRF consists of a single bridge table.
586     Furthermore, a single VLAN must be represented by a single VID --
587     e.g., no VID translation is allowed for this service interface type.
588     The MPLS-encapsulated frames MUST remain tagged with the originating
589     VID. Tag translation is NOT permitted. The Ethernet Tag ID in all
590     EVPN routes MUST be set to 0.

592  6.2.1.  Port-Based Service Interface

594     This service interface is a special case of the VLAN bundle service
595     interface, where all of the VLANs on the port are part of the same
596     service and map to the same bundle. The procedures are identical to
597     those described in Section 6.2.

599  6.3.  VLAN-Aware Bundle Service Interface

601     With this service interface, an EVPN instance consists of multiple
602     broadcast domains (e.g., multiple VLANs) with each VLAN having its
603     own bridge table -- i.e., multiple bridge tables (one per VLAN) are
604     maintained by a single MAC-VRF corresponding to the EVPN instance.

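The difference in bridge-table layout between the service interfaces described so far can be sketched as a toy model. This is an illustration only, not part of this specification: the class and method names (`ServiceInterface`, `MacVrf`, `learn`, `lookup`) and the use of `None` as the key of the shared bridge table are assumptions made for the example.

```python
from enum import Enum


class ServiceInterface(Enum):
    VLAN_BASED = 1
    VLAN_BUNDLE = 2
    VLAN_AWARE_BUNDLE = 3


class MacVrf:
    """Toy MAC-VRF whose bridge tables follow the service interface type."""

    SHARED = None  # key of the single bridge table (illustrative choice)

    def __init__(self, service: ServiceInterface):
        self.service = service
        self.bridge_tables = {}  # table key -> {MAC address: attachment}

    def _table_key(self, vid: int):
        # VLAN-aware bundle: one bridge table per VLAN.
        if self.service is ServiceInterface.VLAN_AWARE_BUNDLE:
            return vid
        # VLAN-based and VLAN bundle: a single bridge table per MAC-VRF.
        return self.SHARED

    def learn(self, vid: int, mac: str, attachment: str) -> None:
        table = self.bridge_tables.setdefault(self._table_key(vid), {})
        table[mac] = attachment

    def lookup(self, vid: int, mac: str):
        return self.bridge_tables.get(self._table_key(vid), {}).get(mac)
```

With this model, learning the same MAC address in VLAN 10 and then in VLAN 20 of a `VLAN_BUNDLE` MAC-VRF overwrites the first entry, which is why Section 6.2 requires MAC addresses to be unique across all VLANs of the EVI. A `VLAN_AWARE_BUNDLE` MAC-VRF keeps the two entries in separate bridge tables.
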
606     Broadcast, unknown unicast, or multicast (BUM) traffic is sent only
607     to the CEs in a given broadcast domain; however, the broadcast
608     domains within an EVI either MAY each have their own P-Tunnel or MAY
609     share P-Tunnels -- e.g., all of the broadcast domains in an EVI MAY
610     share a single P-Tunnel.

612     In the case where a single VLAN is represented by a single VID and
613     thus no VID translation is required, an MPLS-encapsulated packet MUST
614     carry that VID. The Ethernet Tag ID in all EVPN routes MUST be set
615     to that VID. The advertising PE MAY advertise the MPLS Label1 in the
616     MAC/IP Advertisement route representing ONLY the EVI or representing
617     both the Ethernet Tag ID and the EVI. This decision is only a local
618     matter by the advertising PE (which is also the disposition PE) and
619     doesn't affect any other PEs.

621     In the case where a single VLAN is represented by different VIDs on
622     different CEs and thus VID translation is required, a normalized
623     Ethernet Tag ID (VID) MUST be carried in the EVPN BGP routes.
624     Furthermore, the advertising PE advertises the MPLS Label1 in the
625     MAC/IP Advertisement route representing both the Ethernet Tag ID and
626     the EVI, so that upon receiving an MPLS-encapsulated packet, it can
627     identify the corresponding bridge table from the MPLS EVPN label and
628     perform Ethernet Tag ID translation ONLY at the disposition PE --
629     i.e., the Ethernet frames transported over the MPLS/IP network MUST
630     remain tagged with the originating VID, and VID translation is
631     performed on the disposition PE. The Ethernet Tag ID in all EVPN
632     routes MUST be set to the normalized Ethernet Tag ID assigned by the
633     EVPN provider.

635  6.3.1.
Port-Based VLAN-Aware Service Interface 637 This service interface is a special case of the VLAN-aware bundle 638 service interface, where all of the VLANs on the port are part of the 639 same service and are mapped to a single bundle but without any VID 640 translation. The procedures are a subset of those described in 641 Section 6.3. 643 6.4. EVPN PE Model 645 Since this document discusses EVPN operation in relationship to MAC- 646 VRF, EVI, Broadcast Domain (BD), and Bridge Table (BT), it is 647 important to understand the relationship between these terms. 648 The PE model depicted below illustrates the 649 relationship among them. 651 +--------------------------------------------------+ 652 | | 653 | +------------------+ EVPN PE | 654 | Attachment | +------------------+ | 655 | Circuit(AC1) | | +----------+ | MPLS/NVO tnl 656 ----------------------*Bridge | | +----- 657 | | | |Table(BT1)| | / \ \ 658 | | | | |<------------------> |Eth| 659 | | | | VLAN x | | \ / / 660 | | | +----------+ | +----- 661 | | | ... | | 662 | | | +----------+ | MPLS/NVO tnl 663 | | | |Bridge | | +----- 664 | | | |Table(BT2)| | / \ \ 665 | | | | |<-------------------> |Eth| 666 ----------------------* VLAN y | | \ / / 667 | AC2 | | +----------+ | +----- 668 | | | MAC-VRF1 | | 669 | +-+ RD1/RT1 | | 670 | +------------------+ | 671 | | 672 | | 673 +---------------------------------------------------+ 675 Figure 1: EVPN PE Model 677 A tenant configured for an EVPN service instance (i.e., EVI) on a PE 678 is instantiated by a single MAC Virtual Routing and Forwarding table 679 (MAC-VRF) on that PE. A MAC-VRF consists of one or more Bridge 680 Tables (BTs) where each BT corresponds to a VLAN (broadcast domain - 681 BD). If a service interface for an EVPN PE is configured in VLAN- 682 Based mode (i.e., section 6.1), then there is only a single BT per 683 MAC-VRF (per EVI) - i.e., there is only one tenant VLAN per EVI.
684 However, if a service interface for an EVPN PE is configured in VLAN- 685 Aware Bundle mode (i.e., section 6.3), then there are several BTs per 686 MAC-VRF (per EVI) - i.e., there are several tenant VLANs per EVI. 687 The relationship among these terms can be summarized as follows: 689 - An EVI consists of one or more BDs and a MAC-VRF consists of one or 690 more BTs, one for each BD. A BD is identified by an Ethernet Tag 691 ID which is typically represented by a single VLAN ID (VID); 692 however, it can be represented by multiple VIDs (i.e., Shared VLAN 693 Learning (SVL) mode in 802.1Q). 695 - In VLAN-based mode, there is one EVI per VLAN and thus one BD/BT 696 per VLAN. Furthermore, there is one BT per MAC-VRF. 698 - The VLAN-bundle service can be considered analogous to SVL 699 mode in 802.1Q, i.e., one BD per EVI and one BT per MAC-VRF with 700 multiple VIDs representing that BD. 702 - In VLAN-aware bundle service, there is one EVI with multiple BDs 703 where each BD is represented by a VLAN. Furthermore, there are 704 multiple BTs in a single MAC-VRF. 706 Since a single tenant subnet is typically (and in this document) 707 represented by a VLAN (and thus supported by a single BT), for a 708 given tenant there are as many BTs as there are subnets, as shown in 709 the PE model above. 711 A MAC-VRF is identified by its corresponding route target and route 712 distinguisher. If operating in EVPN VLAN-Based mode, then a 713 PE that receives an EVPN route with the MAC-VRF route target 714 can identify the corresponding BT; however, if operating in EVPN 715 VLAN-Aware Bundle mode, then the receiving PE needs both the MAC-VRF 716 route target and Ethernet Tag ID in order to identify the 717 corresponding BT. 719 7. BGP EVPN Routes 721 This document defines a new BGP Network Layer Reachability 722 Information (NLRI) called the EVPN NLRI.
724 The format of the EVPN NLRI is as follows: 726 +-----------------------------------+ 727 | Route Type (1 octet) | 728 +-----------------------------------+ 729 | Length (1 octet) | 730 +-----------------------------------+ 731 | Route Type specific (variable) | 732 +-----------------------------------+ 734 The Route Type field defines the encoding of the rest of the EVPN 735 NLRI (Route Type specific EVPN NLRI). 737 The Length field indicates the length in octets of the Route Type 738 specific field of the EVPN NLRI. 740 This document defines the following Route Types: 742 + 1 - Ethernet Auto-Discovery (A-D) route 743 + 2 - MAC/IP Advertisement route 744 + 3 - Inclusive Multicast Ethernet Tag route 745 + 4 - Ethernet Segment route 747 The detailed encoding and procedures for these route types are 748 described in subsequent sections. 750 The EVPN NLRI is carried in BGP [RFC4271] using BGP Multiprotocol 751 Extensions [RFC4760] with an Address Family Identifier (AFI) of 25 752 (L2VPN) and a Subsequent Address Family Identifier (SAFI) of 70 753 (EVPN). The NLRI field in the MP_REACH_NLRI/MP_UNREACH_NLRI 754 attribute contains the EVPN NLRI (encoded as specified above). 756 In order for two BGP speakers to exchange labeled EVPN NLRI, they 757 must use BGP Capabilities Advertisements to ensure that they both are 758 capable of properly processing such NLRI. This is done as specified 759 in [RFC4760], by using capability code 1 (multiprotocol BGP) with an 760 AFI of 25 (L2VPN) and a SAFI of 70 (EVPN). 762 7.1. 
Ethernet Auto-discovery Route 764 An Ethernet A-D route type specific EVPN NLRI consists of the 765 following: 767 +---------------------------------------+ 768 | Route Distinguisher (RD) (8 octets) | 769 +---------------------------------------+ 770 |Ethernet Segment Identifier (10 octets)| 771 +---------------------------------------+ 772 | Ethernet Tag ID (4 octets) | 773 +---------------------------------------+ 774 | MPLS Label (3 octets) | 775 +---------------------------------------+ 777 For the purpose of BGP route key processing, only the Ethernet 778 Segment Identifier and the Ethernet Tag ID are considered to be part 779 of the prefix in the NLRI. The MPLS Label field is to be treated as 780 a route attribute as opposed to being part of the route. 782 The MPLS Label field is encoded as 3 octets, where the high-order 783 20 bits contain the label value. 785 For procedures and usage of this route, please see Sections 8.2 786 ("Fast Convergence") and 8.4 ("Aliasing and Backup Path"). 788 7.2. 
MAC/IP Advertisement Route 790 A MAC/IP Advertisement route type specific EVPN NLRI consists of the 791 following: 793 +---------------------------------------+ 794 | RD (8 octets) | 795 +---------------------------------------+ 796 |Ethernet Segment Identifier (10 octets)| 797 +---------------------------------------+ 798 | Ethernet Tag ID (4 octets) | 799 +---------------------------------------+ 800 | MAC Address Length (1 octet) | 801 +---------------------------------------+ 802 | MAC Address (6 octets) | 803 +---------------------------------------+ 804 | IP Address Length (1 octet) | 805 +---------------------------------------+ 806 | IP Address (0, 4, or 16 octets) | 807 +---------------------------------------+ 808 | MPLS Label1 (3 octets) | 809 +---------------------------------------+ 810 | MPLS Label2 (0 or 3 octets) | 811 +---------------------------------------+ 813 For the purpose of BGP route key processing, only the Ethernet Tag 814 ID, MAC Address Length, MAC Address, IP Address Length, and IP 815 Address fields are considered to be part of the prefix in the NLRI. 816 The Ethernet Segment Identifier, MPLS Label1, and MPLS Label2 fields 817 are to be treated as route attributes as opposed to being part of the 818 "route". Both the IP and MAC address lengths are in bits. 820 The MPLS Label1 and MPLS Label2 fields are encoded as 3 octets, where 821 the high-order 20 bits contain the label value. 823 For procedures and usage of this route, please see Sections 9 824 ("Determining Reachability to Unicast MAC Addresses") and 14 825 ("Load Balancing of Unicast Packets"). 827 7.3. 
Inclusive Multicast Ethernet Tag Route 829 An Inclusive Multicast Ethernet Tag route type specific EVPN NLRI 830 consists of the following: 832 +---------------------------------------+ 833 | RD (8 octets) | 834 +---------------------------------------+ 835 | Ethernet Tag ID (4 octets) | 836 +---------------------------------------+ 837 | IP Address Length (1 octet) | 838 +---------------------------------------+ 839 | Originating Router's IP Address | 840 | (4 or 16 octets) | 841 +---------------------------------------+ 843 For procedures and usage of this route, please see Sections 11 844 ("Handling of Multi-destination Traffic"), 12 845 ("Processing of Unknown Unicast Packets"), and 16 846 ("Multicast and Broadcast"). The IP address length is in bits. For 847 the purpose of BGP route key processing, only the Ethernet Tag ID, IP 848 Address Length, and Originating Router's IP Address fields are 849 considered to be part of the prefix in the NLRI. 851 7.4. Ethernet Segment Route 853 An Ethernet Segment route type specific EVPN NLRI consists of the 854 following: 856 +---------------------------------------+ 857 | RD (8 octets) | 858 +---------------------------------------+ 859 |Ethernet Segment Identifier (10 octets)| 860 +---------------------------------------+ 861 | IP Address Length (1 octet) | 862 +---------------------------------------+ 863 | Originating Router's IP Address | 864 | (4 or 16 octets) | 865 +---------------------------------------+ 867 For procedures and usage of this route, please see Section 8.5 868 ("Designated Forwarder Election"). The IP address length is in bits. 869 For the purpose of BGP route key processing, only the Ethernet 870 Segment ID, IP Address Length, and Originating Router's IP Address 871 fields are considered to be part of the prefix in the NLRI. 873 7.5. ESI Label Extended Community 875 This Extended Community is a new transitive Extended Community having 876 a Type field value of 0x06 and the Sub-Type 0x01. 
It may be 877 advertised along with Ethernet Auto-discovery routes, and it enables 878 split-horizon procedures for multihomed sites as described in 879 Section 8.3 ("Split Horizon"). The ESI Label field represents an ES 880 by the advertising PE, and it is used in split-horizon filtering by 881 other PEs that are connected to the same multihomed Ethernet segment. 883 The ESI Label field is encoded as 3 octets, where the high-order 884 20 bits contain the label value. 886 The ESI label value MAY be zero if no split-horizon filtering 887 procedures are required in any of the VLANs of the Ethernet Segment. 888 This is the case in [RFC8214] or Ethernet Segments using Local Bias 889 procedures in [I-D.ietf-bess-evpn-mh-split-horizon]. 891 Each ESI Label extended community is encoded as an 8-octet value, as 892 follows: 894 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 895 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 896 | Type=0x06 | Sub-Type=0x01 | Flags(1 octet)| Reserved=0 | 897 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 898 | Reserved=0 | ESI Label | 899 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 901 The low-order bit of the Flags octet is defined as the 902 "Single-Active" bit. A value of 0 means that the multihomed site 903 is operating in All-Active redundancy mode, and a value of 1 means 904 that the multihomed site is operating in Single-Active redundancy 905 mode. 907 7.6. ES-Import Route Target 909 This is a new transitive Route Target extended community carried with 910 the Ethernet Segment route. When used, it enables all the PEs 911 connected to the same multihomed site to import the Ethernet Segment 912 routes. 914 + The value MAY be derived automatically for ESI Type 0 by encoding 915 the high-order 6-octet portion of the 9-octet ESI Value, which 916 corresponds to part of the arbitrary value configured, in the ES- 917 Import Route Target. 
919 + The value is derived automatically for ESI Types 1, 2, and 3, by 920 encoding the high-order 6-octet portion of the 9-octet ESI Value, 921 which corresponds to a MAC address, in the ES-Import Route Target. 923 + The value MAY be derived automatically for ESI Types 4 and 5, by 924 encoding the high-order 6-octet portion of the 9-octet ESI Value, 925 which corresponds to a Router ID or AS number (4-octets) 926 respectively, and 2-octets of Local Discriminator, in the ES-Import 927 Route Target. 929 The format of this Extended Community is as follows: 931 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 932 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 933 | Type=0x06 | Sub-Type=0x02 | ES-Import ~ 934 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 935 ~ ES-Import Cont'd | 936 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 938 This document expands the definition of the Route Target extended 939 community to allow the value of the high-order octet (Type field) to 940 be 0x06 (in addition to the values specified in [RFC4360]). The 941 low-order octet (Sub-Type field) value 0x02 indicates that this 942 Extended Community is of type "Route Target". The new Type field 943 value 0x06 indicates that the structure of this RT is a 6-octet value 944 (e.g., a MAC address). A BGP speaker that implements RT Constraint 945 [RFC4684] MUST apply the RT Constraint procedures to the ES-Import RT 946 as well. 948 For procedures and usage of this extended community, please see 949 Section 8.1 ("Multihomed Ethernet Segment Auto-discovery"). 951 7.7. MAC Mobility Extended Community 953 This Extended Community is a new transitive Extended Community having 954 a Type field value of 0x06 and the Sub-Type 0x00. It may be 955 advertised along with MAC/IP Advertisement routes. The procedures 956 for using this extended community are described in Section 15 957 ("MAC Mobility"). 
959 The MAC Mobility extended community is encoded as an 8-octet value, 960 as follows: 962 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 963 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 964 | Type=0x06 | Sub-Type=0x00 |Flags(1 octet)| Reserved=0 | 965 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 966 | Sequence Number | 967 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 969 The low-order bit of the Flags octet is defined as the 970 "Sticky/static" flag and may be set to 1. A value of 1 means that 971 the MAC address is static and cannot move. The sequence number is 972 used to ensure that PEs retain the correct MAC/IP Advertisement route 973 when multiple updates occur for the same MAC address. 975 7.8. Default Gateway Extended Community 977 The Default Gateway community is an Extended Community of an Opaque 978 Type (see Section 3.3 of [RFC4360]). It is a transitive community, 979 which means that the first octet is 0x03. The value of the second 980 octet (Sub-Type) is 0x0d (Default Gateway) as assigned by IANA. The 981 Value field of this community is reserved (set to 0 by the senders, 982 ignored by the receivers). For procedures and usage of this extended 983 community, please see Section 10.1 ("Default Gateway"). 985 7.9. Route Distinguisher Assignment per MAC-VRF 987 The Route Distinguisher (RD) MUST be set to the RD of the MAC-VRF 988 that is advertising the NLRI. An RD MUST be assigned for a given 989 MAC-VRF on a PE. This RD MUST be unique across all MAC-VRFs on a PE. 990 It is RECOMMENDED to use the Type 1 RD [RFC4364]. The value field 991 comprises an IP address of the PE (typically, the loopback address) 992 followed by a number unique to the PE. This number may be generated 993 by the PE. In case of VLAN-based or VLAN Bundle services, this 994 number may also be generated out of the Ethernet Tag ID for the BD as 995 long as the value does not exceed a length of 16 bits. 
Or, in the 996 Unique VLAN EVPN case, the low-order 12 bits may be the 12-bit VLAN 997 ID, with the remaining high-order 4 bits set to 0. 999 7.10. Route Targets 1001 The EVPN route MAY carry one or more Route Target (RT) extended 1002 communities. RTs may be configured (as in IP VPNs) or may be derived 1003 automatically. 1005 If a PE uses RT Constraint, the PE advertises all such RTs using RT 1006 Constraints per [RFC4684]. The use of RT Constraints allows each 1007 EVPN route to reach only those PEs that are configured to import at 1008 least one RT from the set of RTs carried in the EVPN route. 1010 7.10.1. Auto-derivation from the Ethernet Tag (VLAN ID) 1012 For the "Unique VLAN EVPN" scenario (Section 4), it is highly 1013 desirable to auto-derive the RT from the Ethernet Tag (VLAN ID). The 1014 procedure for performing such auto-derivation is as follows: 1016 + The Global Administrator field of the RT MUST be set to the 1017 Autonomous System (AS) number with which the PE is associated. 1019 + The 12-bit VLAN ID MUST be encoded in the lowest 12 bits of the 1020 Local Administrator field, with the remaining bits set to zero. 1022 For VLAN-based and VLAN Bundle services, the RT may also be auto- 1023 derived as per the above rules but replacing the 12-bit VLAN ID with 1024 a 16-bit Ethernet Tag ID configured for the BD. If the Ethernet Tag 1025 ID length is 24 bits, the RT for the MAC-VRF can be auto-derived as 1026 per [RFC8365] section 5.1.2.1. 1028 7.11. EVPN Layer 2 Attributes Extended Community 1030 [RFC8214] defines this extended community ("L2-Attr"), to be included 1031 with per-EVI Ethernet A-D routes and mandatory if multihoming is 1032 enabled. 1034 Usage and applicability of this Extended community to Bridging is 1035 clarified here. 
1037 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1038 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1039 | MBZ |RSV|RSV|F|C|P|B| (MBZ = MUST Be Zero) 1040 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1042 The following bits in Control Flags from [RFC8214] are listed here 1043 for completeness only: 1045 Name Meaning 1046 --------------------------------------------------------------- 1047 P If set to 1 in multihoming Single-Active scenarios, 1048 this flag indicates that the advertising PE is the 1049 primary PE. MUST be set to 1 for multihoming 1050 All-Active scenarios by all active PE(s). 1052 B If set to 1 in multihoming Single-Active scenarios, 1053 this flag indicates that the advertising PE is the 1054 backup PE. 1056 C If set to 1, a control word [RFC4448] MUST be present 1057 when sending EVPN packets to this PE. It is 1058 recommended that the control word be included in the 1059 absence of an entropy label [RFC6790]. 1061 The bits in Control Flags are extended by the following defined bits: 1063 Name Meaning 1064 --------------------------------------------------------------- 1065 F If set to 1, a Flow Label MUST be present 1066 when sending EVPN packets to this PE. 1067 If set to 0, a Flow Label MUST NOT be present 1068 when sending EVPN packets to this PE. 1070 For procedures and usage of this extended community, with respect to 1071 Control Word and Flow Label, please see Section 18 1072 ("Frame Ordering"). 1074 For procedures and usage of this extended community, with respect to 1075 Primary-Backup bits, please see Section 8.5 1076 ("Designated Forwarder Election"). 1078 7.11.1. EVPN Layer 2 Attributes Partitioning 1080 The information carried in the L2-Attr Extended Community may be ESI- 1081 specific or BD/MAC-VRF-specific.
In order to minimize the processing 1082 overhead of configuration-time items such as MTU not expected to 1083 change at runtime based on failures, the Extended Community from 1084 [RFC8214] is partitioned, with a subset of information carried over 1085 each Ethernet A-D per EVI and Inclusive Multicast routes. 1087 The EVPN Layer 2 Attributes Extended Community, when added to 1088 Inclusive Multicast route: 1090 - BD/MAC-VRF attributes MTU, Control Word and Flow Label are 1091 conveyed, and; 1093 - per-ESI attributes P, B MUST be zero. 1095 +-------------------------------------------+ 1096 | Type (0x06) / Sub-type (0x04) (2 octets) | 1097 +-------------------------------------------+ 1098 | Control Flags (2 octets) | 1099 +-------------------------------------------+ 1100 | L2 MTU (2 octets) | 1101 +-------------------------------------------+ 1102 | Reserved (2 octets) | 1103 +-------------------------------------------+ 1105 1 1 1 1 1 1106 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1107 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1108 | MBZ | MBZ |F|C|MBZ| (MBZ = MUST Be Zero) 1109 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1111 The EVPN Layer 2 Attributes Extended Community is included on 1112 Ethernet A-D per EVI route and: 1114 - per-ESI attributes P, B are conveyed, and; 1116 - BD/MAC-VRF attributes MTU, Control Word and Flow Label MUST be 1117 zero. 
1119 +-------------------------------------------+ 1120 | Type (0x06) / Sub-type (0x04) (2 octets) | 1121 +-------------------------------------------+ 1122 | Control Flags (2 octets) | 1123 +-------------------------------------------+ 1124 | MBZ (2 octets) | 1125 +-------------------------------------------+ 1126 | Reserved (2 octets) | 1127 +-------------------------------------------+ 1129 1 1 1 1 1 1130 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1131 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1132 | MBZ | MBZ |P|B| (MBZ = MUST Be Zero) 1133 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1135 Note that in both of the above cases, the values conveyed in this 1136 extended community are at the granularity of an individual EVI (or 1137 [EVI, BD] for VLAN-aware bundle) and hence may vary for different 1138 EVIs. 1140 7.12. Route Prioritization 1142 In order to achieve the Fast Convergence described in Section 8.2 1143 ("Fast Convergence"), BGP speakers SHOULD prioritise advertisement, 1144 processing, and redistribution of routes according to each route type's 1145 impact on convergence relative to its expected scale: 1147 1. Ethernet A-D per ES (Mass-Withdraw Route Type 1) and Ethernet 1148 Segment (Route Type 4) routes are lower scale and have a high impact 1149 on convergence, and SHOULD be handled in first order of priority. 1151 2. Ethernet A-D per EVI, Inclusive Multicast Ethernet Tag route, and 1152 IP Prefix route defined in [RFC9136] are sent for each Bridge or 1153 AC at medium scale and may affect convergence, and SHOULD 1154 be handled in second order of priority. 1156 3. MAC advertisement route (zero and nonzero IP portion), Multicast 1157 Join Sync and Multicast Leave Sync routes defined in 1158 [I-D.ietf-bess-evpn-igmp-mld-proxy] are considered 'individual 1159 routes' and very-high scale or of relatively low importance for 1160 fast convergence and SHOULD be handled in last order of priority. 1162 8.
Multihoming Functions 1164 This section discusses the functions, procedures, and associated BGP 1165 routes used to support multihoming in EVPN. This covers both 1166 multihomed device (MHD) and multihomed network (MHN) scenarios. 1168 8.1. Multihomed Ethernet Segment Auto-discovery 1170 PEs connected to the same Ethernet segment can automatically discover 1171 each other with minimal to no configuration through the exchange of 1172 the Ethernet Segment route. 1174 8.1.1. Constructing the Ethernet Segment Route 1176 The Route Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. The 1177 value field comprises an IP address of the PE (typically, the 1178 loopback address) followed by a number unique to the PE. 1180 The Ethernet Segment Identifier (ESI) MUST be set to the 10-octet 1181 value described in Section 5. 1183 The BGP advertisement that advertises the Ethernet Segment route MUST 1184 also carry an ES-Import Route Target, as defined in Section 7.6. 1186 The Ethernet Segment route filtering MUST be done such that the 1187 Ethernet Segment route is imported only by the PEs that are 1188 multihomed to the same Ethernet segment. To that end, each PE that 1189 is connected to a particular Ethernet segment constructs an import 1190 filtering rule to import a route that carries the ES-Import Route 1191 Target, constructed from the ESI. 1193 8.2. Fast Convergence 1195 In EVPN, MAC address reachability is learned via the BGP control 1196 plane over the MPLS network. As such, in the absence of any fast 1197 protection mechanism, the network convergence time is a function of 1198 the number of MAC/IP Advertisement routes that must be withdrawn by 1199 the PE encountering a failure. For highly scaled environments, this 1200 scheme yields slow convergence. 
1202 To alleviate this, EVPN defines a mechanism to efficiently and 1203 quickly signal, to remote PE nodes, the need to update their 1204 forwarding tables upon the occurrence of a failure in connectivity to 1205 an Ethernet segment. This is done by having each PE advertise a set 1206 of one or more Ethernet A-D per ES routes for each locally attached 1207 Ethernet segment (refer to Section 8.2.1 below for details on how 1208 these routes are constructed). A PE may need to advertise more than 1209 one Ethernet A-D per ES route for a given ES because the ES may be in 1210 a multiplicity of EVIs and the RTs for all of these EVIs may not fit 1211 into a single route. Advertising a set of Ethernet A-D per ES routes 1212 for the ES allows each route to contain a subset of the complete set 1213 of RTs. Each Ethernet A-D per ES route is differentiated from the 1214 other routes in the set by a different Route Distinguisher (RD). 1216 Upon a failure in connectivity to the attached segment, the PE 1217 withdraws the corresponding set of Ethernet A-D per ES routes. This 1218 triggers all PEs that receive the withdrawal to update their next-hop 1219 adjacencies for all MAC addresses associated with the Ethernet 1220 segment in question. If no other PE had advertised an Ethernet A-D 1221 per ES route for the same segment, then the PE that received the 1222 withdrawal simply invalidates the MAC entries for that segment. 1223 Otherwise, the PE updates its next-hop adjacencies accordingly. 1225 8.2.1. Constructing Ethernet A-D per Ethernet Segment Route 1227 This section describes the procedures used to construct the Ethernet 1228 A-D per ES route, which is used for fast convergence (as discussed 1229 above) and for advertising the ESI label used for split-horizon 1230 filtering (as discussed in Section 8.3). Support of this route is 1231 REQUIRED. 1233 The Route Distinguisher (RD) MUST be a Type 1 RD [RFC4364]. 
The 1234 value field comprises an IP address of the PE (typically, the 1235 loopback address) followed by a number unique to the PE. 1237 The Ethernet Segment Identifier MUST be a 10-octet entity as 1238 described in Section 5 ("Ethernet Segment"). The Ethernet A-D route 1239 is not needed when the Segment Identifier is set to 0 (e.g., single- 1240 homed scenarios). An exception to this rule is described in 1241 [RFC8317]. 1243 The Ethernet Tag ID MUST be set to MAX-ET. 1245 The MPLS label in the NLRI MUST be set to 0. 1247 The ESI Label extended community MUST be included in the route. If 1248 All-Active redundancy mode is desired, then the "Single-Active" bit 1249 in the flags of the ESI Label extended community MUST be set to 0 and 1250 the MPLS label in that Extended Community MUST be set to a valid MPLS 1251 label value. The MPLS label in this Extended Community is referred 1252 to as the ESI label and MUST have the same value in each Ethernet A-D 1253 per ES route advertised for the ES. This label MUST be a downstream 1254 assigned MPLS label if the advertising PE is using ingress 1255 replication for receiving multicast, broadcast, or unknown unicast 1256 traffic from other PEs. If the advertising PE is using P2MP MPLS 1257 LSPs for sending multicast, broadcast, or unknown unicast traffic, 1258 then this label MUST be an upstream assigned MPLS label, unless DCB 1259 allocated labels are used. The usage of this label is described in 1260 Section 8.3. 1262 If Single-Active redundancy mode is desired, then the "Single-Active" 1263 bit in the flags of the ESI Label extended community MUST be set to 1 1264 and the ESI label SHOULD be set to a valid MPLS label value. 1266 8.2.1.1. Ethernet A-D Route Targets 1268 Each Ethernet A-D per ES route MUST carry one or more Route Target 1269 (RT) extended communities. The set of Ethernet A-D routes per ES 1270 MUST carry the entire set of RTs for all the EVPN instances to which 1271 the Ethernet segment belongs. 1273 8.3. 
Split Horizon 1275 Consider a CE that is multihomed to two or more PEs on an Ethernet 1276 segment ES1 operating in All-Active redundancy mode. If the CE sends 1277 a broadcast, unknown unicast, or multicast (BUM) packet to one of the 1278 Non-Designated Forwarder (Non-DF) PEs, say PE1, then PE1 will forward 1279 that packet to all or a subset of the other PEs in that EVPN 1280 instance, including the DF PE for that Ethernet segment. In this 1281 case, the DF PE to which the CE is multihomed MUST drop the packet 1282 and not forward it back to the CE. This filtering is referred to as 1283 "split-horizon filtering" in this document. 1285 When a set of PEs is operating in Single-Active redundancy mode, the 1286 use of this split-horizon filtering mechanism is highly recommended 1287 because it prevents transient loops at the time of failure or 1288 recovery that would impact the Ethernet segment -- e.g., when two PEs 1289 each believe they are the DF for that segment before the DF election 1290 procedure settles down. 1292 In order to achieve this split-horizon function, every BUM packet 1293 originating from a Non-DF PE is encapsulated with an MPLS label that 1294 identifies the Ethernet segment of origin (i.e., the segment from 1295 which the frame entered the EVPN network). This label is referred to 1296 as the ESI label and MUST be distributed by all PEs when operating in 1297 All-Active redundancy mode using a set of Ethernet A-D per ES routes, 1298 per Section 8.2.1 above. The ESI label SHOULD be distributed by all 1299 PEs when operating in Single-Active redundancy mode using a set of 1300 Ethernet A-D per ES routes. These routes are imported by the PEs 1301 connected to the Ethernet segment and also by the PEs that have at 1302 least one EVPN instance in common with the Ethernet segment in the 1303 route. As described in Section 8.2.1, the route MUST carry an ESI 1304 Label extended community with a valid ESI label.
The disposition PE 1305 relies on the value of the ESI label to determine whether or not a 1306 BUM frame is allowed to egress a specific Ethernet segment. 1308 8.3.1. ESI Label Assignment 1310 The following subsections describe the assignment procedures for the 1311 ESI label, which differ depending on the type of tunnels being used 1312 to deliver multi-destination packets in the EVPN network. 1314 8.3.1.1. Ingress Replication 1316 Each PE that operates in All-Active or Single-Active redundancy mode 1317 and that uses ingress replication to receive BUM traffic advertises a 1318 downstream assigned ESI label in the set of Ethernet A-D per ES 1319 routes for its attached ES. This label MUST be programmed in the 1320 platform label space by the advertising PE, and the forwarding entry 1321 for this label must result in NOT forwarding packets received with 1322 this label onto the Ethernet segment for which the label was 1323 distributed. 1325 The rules for the inclusion of the ESI label in a BUM packet by the 1326 ingress PE operating in All-Active redundancy mode are as follows: 1328 - A Non-DF ingress PE MUST include the ESI label distributed by the 1329 DF egress PE in the copy of a BUM packet sent to it. 1331 - An ingress PE (DF or Non-DF) SHOULD include the ESI label 1332 distributed by each Non-DF egress PE in the copy of a BUM packet 1333 sent to it. 1335 The rule for the inclusion of the ESI label in a BUM packet by the 1336 ingress PE operating in Single-Active redundancy mode is as follows: 1338 - An ingress DF PE SHOULD include the ESI label distributed by the 1339 egress PE in the copy of a BUM packet sent to it. 1341 In both All-Active and Single-Active redundancy mode, an ingress PE 1342 MUST NOT include an ESI label in the copy of a BUM packet sent to an 1343 egress PE that is not attached to the ES through which the BUM packet 1344 entered the EVI. 
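The ESI label inclusion rules above for ingress replication can be summarized as a small decision function. This is a minimal sketch, not a normative procedure; the function name esi_label_requirement and its parameters are hypothetical. Combinations that the rules above do not address (for example, a Non-DF ingress PE in Single-Active mode) return None rather than a guessed requirement level.

```python
from typing import Optional


def esi_label_requirement(mode: str, ingress_is_df: bool,
                          egress_on_es: bool, egress_is_df: bool) -> Optional[str]:
    """Return the requirement level for pushing the egress PE's ESI label.

    mode          -- "all-active" or "single-active" (hypothetical encoding)
    ingress_is_df -- ingress PE is the DF for the VLAN on the source ES
    egress_on_es  -- egress PE is attached to the ES the BUM packet entered on
    egress_is_df  -- egress PE is the DF for the VLAN on that ES
    """
    if mode not in ("all-active", "single-active"):
        raise ValueError("unknown redundancy mode: " + mode)

    # Both modes: never send an ESI label to a PE that is not on the source ES.
    if not egress_on_es:
        return "MUST NOT"

    if mode == "all-active":
        if egress_is_df:
            # A Non-DF ingress PE MUST include the DF egress PE's ESI label.
            # DF-to-DF is not covered by the rules in the text.
            return "MUST" if not ingress_is_df else None
        # Any ingress PE SHOULD include a Non-DF egress PE's ESI label.
        return "SHOULD"

    # Single-Active: only the ingress DF case is covered by the rules.
    return "SHOULD" if ingress_is_df else None
```

For instance, a Non-DF ingress PE sending to the DF on the same All-Active ES yields "MUST", while any copy sent to a PE that is not attached to the source ES yields "MUST NOT".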
1346 As an example, consider PE1 and PE2, which are multihomed to CE1 on 1347 ES1 and operating in All-Active multihoming mode. Further, consider 1348 that PE1 is using P2P or MP2P LSPs to send packets to PE2. Consider 1349 that PE1 is the Non-DF for VLAN1 and PE2 is the DF for VLAN1, and PE1 1350 receives a BUM packet from CE1 on VLAN1 on ES1. In this scenario, 1351 PE2 distributes an Inclusive Multicast Ethernet Tag route for VLAN1 1352 corresponding to an EVPN instance. So, when PE1 sends a BUM packet 1353 that it receives from CE1, it MUST first push onto the MPLS label 1354 stack the ESI label that PE2 has distributed for ES1. It MUST then 1355 push onto the MPLS label stack the MPLS label distributed by PE2 in 1356 the Inclusive Multicast Ethernet Tag route for VLAN1. The resulting 1357 packet is further encapsulated in the P2P or MP2P LSP label stack 1358 required to transmit the packet to PE2. When PE2 receives this 1359 packet, it determines, from the top MPLS label, the set of ESIs to 1360 which it will replicate the packet after any P2P or MP2P LSP labels 1361 have been removed. If the next label is the ESI label assigned by 1362 PE2 for ES1, then PE2 MUST NOT forward the packet onto ES1. If the 1363 next label is an ESI label that has not been assigned by PE2, then 1364 PE2 MUST drop the packet. It should be noted that in this scenario, 1365 if PE2 receives a BUM packet for VLAN1 from CE1, then it SHOULD 1366 encapsulate the packet with an ESI label received from PE1 when 1367 sending it to PE1 in order to avoid any transient loops during a 1368 failure scenario that would impact ES1 (e.g., port or link failure). 1370 8.3.1.2. P2MP MPLS LSPs 1372 The Non-DF PEs that operate in All-Active redundancy mode and that 1373 use P2MP LSPs to send BUM traffic advertise an upstream assigned ESI 1374 label in the set of Ethernet A-D per ES routes for their common 1375 attached ES. This label is upstream assigned by the PE that 1376 advertises the route. 
This label MUST be programmed by the other PEs 1377 that are connected to the ESI advertised in the route, in the context 1378 label space for the advertising PE. Further, the forwarding entry 1379 for this label must result in NOT forwarding packets received with 1380 this label onto the Ethernet segment for which the label was 1381 distributed. This label MUST also be programmed by the other PEs 1382 that import the route but are not connected to the ESI advertised in 1383 the route, in the context label space for the advertising PE. 1384 Further, the forwarding entry for this label must be a label pop with 1385 no other associated action. 1387 The DF PE that operates in Single-Active redundancy mode and that 1388 uses P2MP LSPs to send BUM traffic should advertise an upstream 1389 assigned ESI label in the set of Ethernet A-D per ES routes for its 1390 attached ES, just as described in the previous paragraph. 1392 As an example, consider PE1 and PE2, which are multihomed to CE1 on 1393 ES1 and operating in All-Active multihoming mode. Also, consider 1394 that PE3 belongs to one of the EVPN instances of ES1. Further, 1395 assume that PE1, which is the Non-DF, is using P2MP MPLS LSPs to send 1396 BUM packets. When PE1 sends a BUM packet that it receives from CE1, 1397 it MUST first push onto the MPLS label stack the ESI label that it 1398 has assigned for the ESI on which the packet was received. The 1399 resulting packet is further encapsulated in the P2MP MPLS label stack 1400 necessary to transmit the packet to the other PEs. Penultimate hop 1401 popping MUST be disabled on the P2MP LSPs used in the MPLS transport 1402 infrastructure for EVPN. When PE2 receives this packet, it 1403 decapsulates the top MPLS label and forwards the packet using the 1404 context label space determined by the top label. If the next label 1405 is the ESI label assigned by PE1 to ES1, then PE2 MUST NOT forward 1406 the packet onto ES1. 
When PE3 receives this packet, it decapsulates 1407 the top MPLS label and forwards the packet using the context label 1408 space determined by the top label. If the next label is the ESI 1409 label assigned by PE1 to ES1 and PE3 is not connected to ES1, then 1410 PE3 MUST pop the label and flood the packet over all local ESIs in 1411 that EVPN instance. It should be noted that when PE2 sends a BUM 1412 frame over a P2MP LSP, it should encapsulate the frame with an ESI 1413 label even though it is the DF for that VLAN, in order to avoid any 1414 transient loops during a failure scenario that would impact ES1 1415 (e.g., port or link failure). 1417 8.3.1.3. MP2MP MPLS LSPs 1419 The procedures for MP2MP tunnels follow Section 8.3.1.2, with the 1420 exceptions described in this section. 1422 When MP2MP tunnels are used, ESI-labels MUST be allocated from a DCB 1423 and the same label must be used by all the PEs attached to the same 1424 Ethernet Segment. 1426 In that way, any egress PE with local Ethernet Segments can identify 1427 the source ES of the received BUM packets. 1429 8.4. Aliasing and Backup Path 1431 In the case where a CE is multihomed to multiple PE nodes, using a 1432 Link Aggregation Group (LAG) with All-Active redundancy, it is 1433 possible that only a single PE learns a set of the MAC addresses 1434 associated with traffic transmitted by the CE. This leads to a 1435 situation where remote PE nodes receive MAC/IP Advertisement routes 1436 for these addresses from a single PE, even though multiple PEs are 1437 connected to the multihomed segment. As a result, the remote PEs are 1438 not able to effectively load balance traffic among the PE nodes 1439 connected to the multihomed Ethernet segment. This could be the 1440 case, for example, when the PEs perform data-plane learning on the 1441 access, and the load-balancing function on the CE hashes traffic from 1442 a given source MAC address to a single PE. 
1444 Another scenario where this occurs is when the PEs rely on control- 1445 plane learning on the access (e.g., using ARP), since ARP traffic 1446 will be hashed to a single link in the LAG. 1448 To address this issue, EVPN introduces the concept of 'aliasing', 1449 which is the ability of a PE to signal that it has reachability to an 1450 EVPN instance on a given ES even when it has learned no MAC addresses 1451 from that EVI/ES. The Ethernet A-D per EVI route is used for this 1452 purpose. A remote PE that receives a MAC/IP Advertisement route with 1453 a non-reserved ESI SHOULD consider the advertised MAC address to be 1454 reachable via all PEs that have advertised reachability to that MAC 1455 address's EVI/ES/Ethernet Tag ID via the combination of an Ethernet 1456 A-D per EVI route for that EVI/ES/Ethernet Tag ID AND Ethernet A-D 1457 per ES routes for that ES with the "Single-Active" bit in the flags 1458 of the ESI Label extended community set to 0. 1460 Note that the Ethernet A-D per EVI route may be received by a remote 1461 PE before it receives the set of Ethernet A-D per ES routes. 1462 Therefore, in order to handle corner cases and race conditions, the 1463 Ethernet A-D per EVI route MUST NOT be used for traffic forwarding by 1464 a remote PE until it also receives the associated set of Ethernet A-D 1465 per ES routes. 1467 The backup path is a closely related function, but it is used in 1468 Single-Active redundancy mode. In this case, a PE also advertises 1469 that it has reachability to a given EVI/ES using the same combination 1470 of Ethernet A-D per EVI route and Ethernet A-D per ES route as 1471 discussed above, but with the "Single-Active" bit in the flags of the 1472 ESI Label extended community set to 1. 
A remote PE that receives a 1473 MAC/IP Advertisement route with a non-reserved ESI SHOULD consider 1474 the advertised MAC address to be reachable via any PE that has 1475 advertised this combination of Ethernet A-D routes, and it SHOULD 1476 install a backup path for that MAC address. 1478 Please see Section 14.1.1 for a description of the backup path 1479 operation. 1481 8.4.1. Constructing Ethernet A-D per EVPN Instance Route 1483 This section describes the procedures used to construct the Ethernet 1484 A-D per EVPN instance (EVI) route, which is used for aliasing (as 1485 discussed above). Support of this route is OPTIONAL. 1487 The Route Distinguisher (RD) MUST be set per Section 7.9. 1489 The Ethernet Segment Identifier MUST be a 10-octet entity as 1490 described in Section 5 ("Ethernet Segment"). The Ethernet A-D route 1491 is not needed when the Segment Identifier is set to 0. 1493 The Ethernet Tag ID is set as defined in Section 6. 1495 Note that the above allows the Ethernet A-D per EVI route to be 1496 advertised with one of the following granularities: 1498 + One Ethernet A-D route per <ESI, Ethernet Tag ID> tuple per 1499 MAC-VRF. This is applicable when the PE uses MPLS-based 1500 disposition with VID translation or may be applicable when the 1501 PE uses MAC-based disposition with VID translation. 1503 + One Ethernet A-D route for each <ESI> per MAC-VRF (where the 1504 Ethernet Tag ID is set to 0). This is applicable when the PE uses 1505 MAC-based disposition or MPLS-based disposition without VID 1506 translation. 1508 The usage of the MPLS label is described in Section 14 1509 ("Load Balancing of Unicast Packets"). 1511 The Next Hop field of the MP_REACH_NLRI attribute of the route MUST 1512 be set to the IPv4 or IPv6 address of the advertising PE. 1514 The Ethernet A-D per EVI route MUST carry one or more Route Target 1515 (RT) extended communities, per Section 7.10. 1517 8.5.
Designated Forwarder Election 1519 Consider a CE that is a host or a router that is multihomed directly 1520 to more than one PE in an EVPN instance on a given Ethernet segment. 1522 In this scenario, only one of the PEs, referred to as the Designated 1523 Forwarder (DF), is responsible for certain actions: 1525 - Sending broadcast and multicast traffic for a given EVI to that CE. 1527 - If the flooding of unknown unicast traffic (i.e., traffic for which 1528 a PE does not know the destination MAC address, see Section 12) is 1529 allowed, sending unknown unicast traffic for a given EVI to that 1530 CE. 1532 - If the multihoming mode is Single-Active, sending (known) unicast 1533 traffic for a given EVI to that CE. 1535 Note that this behavior, which allows selecting a DF at the 1536 granularity of <ES, EVI>, is the default behavior in this 1537 specification. 1539 In this same scenario, a second PE, referred to as the 1540 Backup-Designated Forwarder (Backup-DF or BDF), is responsible for 1541 assuming the role of DF in the event of the DF's failure. Until this 1542 occurs, the Backup-DF PE belongs to the set of Non-DF PEs and behaves like a Non-DF 1543 PE for all forwarding considerations. 1545 All other PEs, referred to as Non-Designated Forwarders (Non-DF or 1546 NDF), are not responsible for any forwarding or for assuming any 1547 functionality from the DF in the event of its failure. 1549 The default procedure for DF election at the granularity of <ES, EVI> 1550 is referred to as "service carving". With service carving, it is 1551 possible to perform load-balancing of traffic destined to a given 1552 segment. The load-balancing procedure carves the set of EVIs on that 1553 ES among the PE nodes evenly such that every PE is the DF for a 1554 disjoint and distinct set of EVIs for that ES. The procedure for 1555 service carving, which follows the DF Election Finite 1556 State Machine defined in [RFC8584] Section 2.1, is as follows: 1558 1.
When a PE discovers the ESI of the attached Ethernet segment, 1559 it advertises an Ethernet Segment route with the associated 1560 ES-Import extended community. 1562 2. The PE then starts a timer (default value = 3 seconds) to allow 1563 the reception of Ethernet Segment routes from other PE nodes 1564 connected to the same Ethernet segment. This timer value should 1565 be the same across all PEs connected to the same Ethernet 1566 segment. 1568 3. When the timer expires, each PE builds an ordered list of the IP 1569 addresses of all the PE nodes connected to the Ethernet segment 1570 (including itself), in order of increasing numeric value. Each IP address 1571 in this list is extracted from the "IP Address length" and 1572 "Originating Router's IP address" fields of the advertised 1573 Ethernet Segment route. Every PE is then given an ordinal 1574 indicating its position in the ordered list, starting with 0 as 1575 the ordinal for the PE with the lowest IP address length and 1576 numeric value tuple. The ordinals are used to determine which PE 1577 node will be the DF for a given EVPN instance on the Ethernet 1578 segment, using the following rule: 1580 Assuming a redundancy group of N PE nodes, the PE with ordinal i 1581 is the DF for an <ES, EVI> when (V mod N) = i, where V is the 1582 Ethernet tag for that EVI. For VLAN-Aware Bundle service, 1583 the numerically lowest Ethernet tag in that EVI MUST be used in 1584 the modulo function. 1586 It should be noted that using the "Originating Router's IP 1587 address" field in the Ethernet Segment route to get the PE IP 1588 address needed for the ordered list allows for a CE to be 1589 multihomed across different ASes if such a need ever arises. 1591 4. For each EVPN instance, a second list of the IP addresses of all 1592 the PE nodes connected to the Ethernet segment is built. The PE 1593 that was determined to be the DF above is removed from that ordered 1594 candidate list, forming a backup redundancy group of M PE nodes.
1595 Every remaining PE is then given a second ordinal indicating its 1596 position in the secondary ordered list according to the same 1597 criteria as in step 3 above. 1599 The second ordinals are used to determine which PE node will be 1600 the BDF for a given EVPN instance on the Ethernet segment, using 1601 the same modulo rule as above, (V mod M) = i. 1603 5. The PE that is elected as a DF for a given <ES, EVI> will unblock 1604 BUM traffic, or all traffic if in Single-Active mode, for that 1605 EVI on the corresponding ES. Note that the DF PE unblocks BUM 1606 traffic in the egress direction towards the segment. All Non-DF 1607 PEs, including the Backup-DF PE, continue to drop 1608 multi-destination traffic in the egress direction towards that 1609 <ES, EVI>. 1611 In the case of link or port failure, the affected PE withdraws 1612 its Ethernet Segment route. This will re-trigger the service 1613 carving procedures on all the PEs in the redundancy group: the 1614 expected new DF is the BDF previously calculated in step 4. For 1615 PE node failure, or upon PE commissioning or decommissioning, the 1616 PEs re-trigger the service carving. In the case of Single-Active 1617 multihoming, when a service moves from one PE in the redundancy 1618 group to another PE as a result of re-carving, the PE that ends 1619 up being the elected DF for the service SHOULD trigger a MAC 1620 address flush notification towards the associated Ethernet 1621 segment. This can be done, for example, using the IEEE 802.1ak 1622 Multiple VLAN Registration Protocol (MVRP) 'new' declaration. 1624 It is RECOMMENDED that all future DF Election algorithms specify a 1625 procedure to select one Designated Forwarder (DF) PE, one Backup-DF 1626 PE, and a residual number of Non-DF PE(s). 1628 8.6.
Signaling Primary and Backup DF Elected PEs 1630 Once the Primary and Backup DF Elected PEs for a given <ES, EVI> are 1631 determined, the multihomed PEs for that ES will each advertise an 1632 Ethernet A-D per EVI route for that EVI and each will include an 1633 L2-Attr extended community with the P and B bits set to reflect the 1634 advertising PE's role for that EVI. 1636 It should be noted that if the L2-Attr extended community is included for All-Active 1637 mode, then the P bit must be set for all PEs in the redundancy 1638 group. 1640 8.7. Interoperability with Single-Homing PEs 1642 PEs that support only single-homed CE devices are referred to here as 1643 single-homing PEs. For single-homing PEs, all the above multihoming 1644 procedures can be omitted; however, to allow single-homing PEs 1645 to fully interoperate with multihoming PEs, some of the multihoming 1646 procedures described above SHOULD be supported even by single-homing 1647 PEs: 1649 - procedures related to processing Ethernet A-D routes for the 1650 purpose of fast convergence (Section 8.2 ("Fast Convergence")), to 1651 let single-homing PEs benefit from fast convergence 1653 - procedures related to processing Ethernet A-D routes for the 1654 purpose of aliasing (Section 8.4 ("Aliasing and Backup Path")), to 1655 let single-homing PEs benefit from load balancing 1657 - procedures related to processing Ethernet A-D routes for the 1658 purpose of a backup path (Section 8.4 1659 ("Aliasing and Backup Path")), to let single-homing PEs benefit 1660 from the corresponding convergence improvement 1662 9. Determining Reachability to Unicast MAC Addresses 1664 PEs forward packets that they receive based on the destination MAC 1665 address. This implies that PEs must be able to learn how to reach a 1666 given destination unicast MAC address. 1668 There are two components to MAC address learning -- "local learning" 1669 and "remote learning": 1671 9.1.
Local Learning 1673 A particular PE must be able to learn the MAC addresses from the CEs 1674 that are connected to it. This is referred to as local learning. 1676 The PEs in a particular EVPN instance MUST support local data-plane 1677 learning using standard IEEE Ethernet learning procedures. A PE must 1678 be capable of learning MAC addresses in the data plane when it 1679 receives packets such as the following from the CE network: 1681 - DHCP requests 1683 - An ARP Request for its own MAC 1685 - An ARP Request for a peer 1687 Alternatively, PEs MAY learn the MAC addresses of the CEs in the 1688 control plane or via management-plane integration between the PEs and 1689 the CEs. 1691 There are applications where a MAC address that is reachable via a 1692 given PE on a locally attached segment (e.g., with ESI X) may move, 1693 such that it becomes reachable via another PE on another segment 1694 (e.g., with ESI Y). This is referred to as "MAC Mobility". 1695 Procedures to support this are described in Section 15 1696 ("MAC Mobility"). 1698 9.2. Remote Learning 1700 A particular PE must be able to determine how to send traffic to MAC 1701 addresses that belong to or are behind CEs connected to other PEs, 1702 i.e., to remote CEs or hosts behind remote CEs. We call such MAC 1703 addresses "remote" MAC addresses. 1705 This document requires a PE to learn remote MAC addresses in the 1706 control plane. In order to achieve this, each PE advertises the MAC 1707 addresses it learns from its locally attached CEs in the control 1708 plane, to all the other PEs in that EVPN instance, using MP-BGP and, 1709 specifically, the MAC/IP Advertisement route. 1711 9.2.1. Constructing MAC/IP Address Advertisement 1713 BGP is extended to advertise these MAC addresses using the MAC/IP 1714 Advertisement route type in the EVPN NLRI. 1716 The RD MUST be set per Section 7.9. 1718 The Ethernet Segment Identifier is set to the 10-octet ESI described 1719 in Section 5 ("Ethernet Segment"). 
1721 The Ethernet Tag ID is set as defined in Section 6. 1723 The MAC Address Length field is in bits, and it is set to 48. MAC 1724 address length values other than 48 bits are outside the scope of 1725 this document. The encoding of a MAC address MUST be the 6-octet MAC 1726 address specified by [IEEE.802.1Q_2014] and [IEEE.802.1D_2004]. 1728 The IP Address field is optional. By default, the IP Address Length 1729 field is set to 0, and the IP Address field is omitted from the 1730 route. When a valid IP address needs to be advertised, it is then 1731 encoded in this route. When an IP address is present, the IP Address 1732 Length field is in bits, and it is set to 32 or 128 bits. Other IP 1733 Address Length values are outside the scope of this document. The 1734 encoding of an IP address MUST be either 4 octets for IPv4 or 1735 16 octets for IPv6. The Length field of the EVPN NLRI (which is in 1736 octets and is described in Section 7) is sufficient to determine 1737 whether an IP address is encoded in this route and, if so, whether 1738 the encoded IP address is IPv4 or IPv6. 1740 The MPLS Label1 field is encoded as 3 octets, where the high-order 1741 20 bits contain the label value. The MPLS Label1 MUST be downstream 1742 assigned, and it is associated with the MAC address being advertised 1743 by the advertising PE. The advertising PE uses this label when it 1744 receives an MPLS-encapsulated packet to perform forwarding based on 1745 the destination MAC address toward the CE. The forwarding procedures 1746 are specified in Sections 13 and 14. 1748 A PE may advertise the same single EVPN label for all MAC addresses 1749 in a given MAC-VRF. This label assignment is referred to as a per 1750 MAC-VRF label assignment. Alternatively, a PE may advertise a unique 1751 EVPN label per combination. This label 1752 assignment is referred to as a per label 1753 assignment. As a third option, a PE may advertise a unique EVPN 1754 label per combination. 
This label assignment is 1755 referred to as a per label assignment. As a 1756 fourth option, a PE may advertise a unique EVPN label per MAC 1757 address. This label assignment is referred to as a per MAC label 1758 assignment. All of these label assignment methods have their 1759 trade-offs. The choice of a particular label assignment methodology 1760 is purely local to the PE that originates the route. 1762 An assignment per MAC-VRF label requires the least number of EVPN 1763 labels but requires a MAC lookup in addition to an MPLS lookup on an 1764 egress PE for forwarding. On the other hand, a unique label per 1765 or a unique label per MAC allows an egress PE to 1766 forward a packet that it receives from another PE, to the connected 1767 CE, after looking up only the MPLS labels without having to perform a 1768 MAC lookup. This includes the capability to perform appropriate VLAN 1769 ID translation on egress to the CE. 1771 The MPLS Label2 field is an optional field. If it is present, then 1772 it is encoded as 3 octets, where the high-order 20 bits contain the 1773 label value. 1775 The Next Hop field of the MP_REACH_NLRI attribute of the route MUST 1776 be set to the IPv4 or IPv6 address of the advertising PE. 1778 The BGP advertisement for the MAC/IP Advertisement route MUST also 1779 carry one or more Route Target (RT) extended communities. RTs may be 1780 configured (as in IP VPNs) or may be derived automatically in the 1781 "Unique VLAN EVPN" case from the Ethernet Tag (VLAN ID), as described 1782 in Section 7.10.1. 1784 It is to be noted that this document does not require PEs to create 1785 forwarding state for remote MACs when they are learned in the control 1786 plane. When this forwarding state is actually created is a local 1787 implementation matter. 1789 9.2.2. 
Route Resolution 1791 If the Ethernet Segment Identifier field in a received MAC/IP 1792 Advertisement route is set to the reserved ESI value of 0 or MAX-ESI, 1793 then if the receiving PE decides to install forwarding state for the 1794 associated MAC address, it MUST be based on the MAC/IP Advertisement 1795 route alone. 1797 If the Ethernet Segment Identifier field in a received MAC/IP 1798 Advertisement route is set to a non-reserved ESI, and the receiving 1799 PE is locally attached to the same ESI, then the PE does not alter 1800 its forwarding state based on the received route. This ensures that 1801 local routes are preferred to remote routes. 1803 If the Ethernet Segment Identifier field in a received MAC/IP 1804 Advertisement route is set to a non-reserved ESI, then if the 1805 receiving PE decides to install forwarding state for the associated 1806 MAC address, it MUST be when both the MAC/IP Advertisement route AND 1807 the associated set of Ethernet A-D per ES routes have been received. 1808 The dependency of MAC route installation on Ethernet A-D per ES 1809 routes is to ensure that MAC routes don't get accidentally installed 1810 during a mass withdraw period. 1812 To illustrate this with an example, consider two PEs (PE1 and PE2) 1813 connected to a multihomed Ethernet segment ES1. All-Active 1814 redundancy mode is assumed. A given MAC address M1 is learned by PE1 1815 but not PE2. On PE3, the following states may arise: 1817 T1 When the MAC/IP Advertisement route from PE1 and the set of 1818 Ethernet A-D per ES routes and Ethernet A-D per EVI routes from 1819 PE1 and PE2 are received, PE3 can forward traffic destined to 1820 M1 to both PE1 and PE2. 1822 T2 If after T1 PE1 withdraws its set of Ethernet A-D per ES 1823 routes, then PE3 forwards traffic destined to M1 to PE2 only. 1825 T2' If after T1 PE2 withdraws its set of Ethernet A-D per ES 1826 routes, then PE3 forwards traffic destined to M1 to PE1 only. 
1828 T2'' If after T1 PE1 withdraws its MAC/IP Advertisement route, then 1829 PE3 treats traffic to M1 as unknown unicast. 1831 T3 PE2 also advertises a MAC route for M1, and then PE1 withdraws 1832 its MAC route for M1. PE3 continues forwarding traffic 1833 destined to M1 to both PE1 and PE2. In other words, despite M1 1834 withdrawal by PE1, PE3 forwards the traffic destined to M1 to 1835 both PE1 and PE2. This is because a flow from the CE, 1836 resulting in M1 traffic getting hashed to PE1, can get 1837 terminated, resulting in M1 being aged out in PE1; however, M1 1838 can be reachable by both PE1 and PE2. 1840 10. ARP and ND 1842 The IP Address field in the MAC/IP Advertisement route may optionally 1843 carry one of the IP addresses associated with the MAC address. This 1844 provides an option that can be used to minimize the flooding of ARP 1845 or Neighbor Discovery (ND) messages over the MPLS network and to 1846 remote CEs. This option also minimizes ARP (or ND) message 1847 processing on end-stations/hosts connected to the EVPN network. A PE 1848 may learn the IP address associated with a MAC address in the control 1849 or management plane between the CE and the PE. Or, it may learn this 1850 binding by snooping certain messages to or from a CE. When a PE 1851 learns the IP address associated with a MAC address of a locally 1852 connected CE, it may advertise this address to other PEs by including 1853 it in the MAC/IP Advertisement route. The IP address may be an IPv4 1854 address encoded using 4 octets or an IPv6 address encoded using 1855 16 octets. For ARP and ND purposes, the IP Address Length field MUST 1856 be set to 32 for an IPv4 address or 128 for an IPv6 address. 1858 If there are multiple IP addresses associated with a MAC address, 1859 then multiple MAC/IP Advertisement routes MUST be generated, one for 1860 each IP address. 
For instance, this may be the case when there are 1861 both an IPv4 and an IPv6 address associated with the same MAC address 1862 for dual-IP-stack scenarios. When the IP address is dissociated from 1863 the MAC address, the MAC/IP Advertisement route with that 1864 particular IP address MUST be withdrawn. 1866 Note that a MAC-only route can be advertised along with, but 1867 independently of, a MAC/IP route for scenarios where the MAC learning 1868 over an access network/node is done in the data plane and is independent 1869 of the ARP snooping that generates a MAC/IP route. In such scenarios, 1870 when the ARP entry times out and causes the MAC/IP route to be withdrawn, 1871 the MAC information will not be lost. In scenarios where the 1872 host MAC/IP is learned via the management or control plane, the 1873 sender PE may generate and advertise only the MAC/IP route. If the 1874 receiving PE receives both the MAC-only route and the MAC/IP route, 1875 then when it receives a withdraw message for the MAC/IP route, it 1876 MUST delete the corresponding entry from the ARP table but not the 1877 MAC entry from the MAC-VRF table, unless it also receives a withdraw 1878 message for the MAC-only route. 1880 When a PE receives an ARP Request for an IP address from a CE, and if 1881 the PE has the MAC address binding for that IP address, the PE SHOULD 1882 perform ARP proxy by responding to the ARP Request. 1884 In the same way, when a PE receives a Neighbor Solicitation for an IP 1885 address from a CE, the PE SHOULD perform ND proxy and respond if the 1886 PE has the binding information for the IP. 1888 10.1. Default Gateway 1890 When a PE needs to perform inter-subnet forwarding where each subnet 1891 is represented by a different broadcast domain (e.g., a different 1892 VLAN), the inter-subnet forwarding is performed at Layer 3, and the 1893 PE that performs such a function is called the default gateway for 1894 the EVPN instance.
In this case, when the PE receives an ARP Request 1895 for the IP address configured as the default gateway address, the PE 1896 originates an ARP Reply. 1898 Each PE that acts as a default gateway for a given EVPN instance MAY 1899 advertise in the EVPN control plane its default gateway MAC address 1900 using the MAC/IP Advertisement route, and each such PE indicates that 1901 such a route is associated with the default gateway. This is 1902 accomplished by requiring the route to carry the Default Gateway 1903 extended community defined in Section 7.8 1904 ("Default Gateway Extended Community"). The ESI field is set to zero 1905 when advertising the MAC route with the Default Gateway extended 1906 community. 1908 The IP Address field of the MAC/IP Advertisement route is set to the 1909 default gateway IP address for that subnet (e.g., an EVPN instance). 1910 For a given subnet (e.g., a VLAN or EVPN instance), the default 1911 gateway IP address is the same across all the participant PEs. The 1912 inclusion of this IP address enables the receiving PE to check its 1913 configured default gateway IP address against the one received in the 1914 MAC/IP Advertisement route for that subnet (or EVPN instance), and if 1915 there is a discrepancy, then the PE SHOULD notify the operator and 1916 log an error message. 1918 Unless it is known a priori (by means outside of this document) that 1919 all PEs of a given EVPN instance act as a default gateway for that 1920 EVPN instance, the MPLS label MUST be set to a valid downstream 1921 assigned label. 
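The default gateway IP address consistency check described earlier in this section can be sketched as follows. This is an illustrative example only: the function name, the logger name, and the `subnet` argument are hypothetical, and the sketch assumes that the receiving PE has already extracted the IP Address field from the MAC/IP Advertisement route carrying the Default Gateway extended community.

```python
import ipaddress
import logging

# Hypothetical logger name, chosen for illustration.
logger = logging.getLogger("evpn.default_gateway")


def gateway_ip_matches(configured_ip, received_ip, subnet):
    """Compare the locally configured default gateway IP address with the one
    received in a MAC/IP Advertisement route for the same subnet.

    Returns True when they match; otherwise logs an error (standing in for
    the operator notification) and returns False.
    """
    configured = ipaddress.ip_address(configured_ip)
    received = ipaddress.ip_address(received_ip)
    if configured == received:
        return True
    logger.error(
        "Default gateway IP mismatch for %s: configured %s, received %s",
        subnet, configured, received,
    )
    return False
```

Parsing both values with `ipaddress.ip_address` means that textually different but equivalent IPv6 representations compare as equal, which avoids spurious discrepancy reports.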
1923 Furthermore, even when all PEs of a given EVPN instance do act as a 1924 default gateway for that EVPN instance, if only some 1925 of these PEs have sufficient (routing) information to provide 1926 inter-subnet routing for all the inter-subnet traffic originated 1927 within the subnet associated with the EVPN instance, then when such a 1928 PE advertises in the EVPN control plane its default gateway MAC 1929 address using the MAC/IP Advertisement route and indicates that such 1930 a route is associated with the default gateway, the route MUST carry 1931 a valid downstream assigned label. 1933 If all PEs of a given EVPN instance act as a default gateway for that 1934 EVPN instance, and the same default gateway MAC address is used 1935 across all gateway devices, then no such advertisement is needed. 1936 However, if each default gateway uses a different MAC address, then 1937 each default gateway needs to be aware of the other gateways' MAC 1938 addresses, hence the need for such an advertisement. This is 1939 called MAC address aliasing, since a single default gateway can be 1940 represented by multiple MAC addresses. 1942 Each PE that receives this route and imports it as per procedures 1943 specified in this document follows the procedures in this section 1944 when replying to ARP Requests that it receives. 1946 Each PE that acts as a default gateway for a given EVPN instance and that 1947 receives this route and imports it as per procedures specified in 1948 this document MUST create MAC forwarding state that enables it to 1949 apply IP forwarding to the packets destined to the MAC address 1950 carried in the route. 1952 10.1.1. Best Path selection for Default Gateway 1954 The default gateway MAC address that is assigned to an IRB interface (for 1955 a subnet) in a PE MUST be unique in the context of that subnet. In other 1956 words, the same MAC address cannot be used by a host either 1957 intentionally or accidentally.
Therefore, if such a conflict 1958 arises, there needs to be a scheme to detect and resolve it. In 1959 order to properly detect such conflicts, the following BGP best path 1960 selection MUST be applied. 1962 * When comparing two routes, the route which has the Default Gateway 1963 extended community is preferred over a route which does not have 1964 the extended community. The PE that has advertised the MAC route 1965 without the Default Gateway extended community, upon receiving the 1966 route with the Default Gateway extended community, SHALL withdraw its 1967 route and raise an alarm. 1969 * When comparing two routes where both routes have the Default 1970 Gateway extended community, normal BGP best path processing is 1971 applied. 1973 * When comparing local and remote routes with the Default Gateway 1974 extended community, the local route is always preferred. 1976 * The MAC Mobility extended community SHALL NOT be attached to routes 1977 which also have the Default Gateway extended community on the sending 1978 side and SHALL be ignored on the receiving side. 1980 11. Handling of Multi-destination Traffic 1982 Procedures are required for a given PE to flood broadcast or 1983 multicast traffic received from a CE and with a given Ethernet tag to 1984 the other PEs in the associated [EVI, BD] (EVPN instance). In 1985 certain scenarios, as described in Section 12 1986 ("Processing of Unknown Unicast Packets"), a given PE may also need 1987 to flood unknown unicast traffic to other PEs. 1989 The PEs in a particular EVPN instance may use ingress replication, 1990 P2MP LSPs, or MP2MP LSPs to send unknown unicast, broadcast, or 1991 multicast traffic to other PEs. 1993 Each PE MUST advertise an "Inclusive Multicast Ethernet Tag route" to 1994 enable the above. The following subsection provides the procedures 1995 to construct the Inclusive Multicast Ethernet Tag route. Subsequent 1996 subsections describe its usage in further detail. 1998 11.1.
Constructing Inclusive Multicast Ethernet Tag Route 2000 The RD MUST be set per Section 7.9. 2002 The Ethernet Tag ID is set as defined in Section 6. 2004 The Originating Router's IP Address field value MUST be set to an IP 2005 address of the PE that should be common for all the EVIs on the PE 2006 (e.g., this address may be the PE's loopback address). The IP 2007 Address Length field is in bits. 2009 The Next Hop field of the MP_REACH_NLRI attribute of the route MUST 2010 be set to the IPv4 or IPv6 address of the advertising PE. 2012 The BGP advertisement for the Inclusive Multicast Ethernet Tag route 2013 MUST also carry one or more Route Target (RT) extended communities. 2014 The assignment of RTs as described in Section 7.10 MUST be followed. 2016 11.2. P-Tunnel Identification 2018 In order to identify the P-tunnel used for sending broadcast, unknown 2019 unicast, or multicast traffic, the Inclusive Multicast Ethernet Tag 2020 route MUST carry a Provider Multicast Service Interface (PMSI) Tunnel 2021 attribute as specified in [RFC6514]. 2023 Depending on the technology used for the P-tunnel for the EVPN 2024 instance on the PE, the PMSI Tunnel attribute of the Inclusive 2025 Multicast Ethernet Tag route is constructed as follows. 2027 + If the PE that originates the advertisement uses a P-multicast tree 2028 for the P-tunnel for EVPN, the PMSI Tunnel attribute MUST contain 2029 the identity of the tree (note that the PE could create the 2030 identity of the tree prior to the actual instantiation of the 2031 tree). 2033 + A PE that uses a P-multicast tree for the P-tunnel MAY aggregate 2034 two or more Broadcast Domains (BDs) present on the PE onto the same 2035 tree. In this case, in addition to carrying the identity of the 2036 tree, the PMSI Tunnel attribute MUST carry an MPLS label, which the 2037 PE has bound uniquely to the BD associated with this update (as 2038 determined by its RTs and Ethernet Tag ID). 
The assigned MPLS 2039 label is upstream allocated unless the procedures in section 19 2040 (Use of Domain-wide Common Block (DCB) Labels) are followed. If 2041 the PE has already advertised Inclusive Multicast Ethernet Tag 2042 routes for two or more BDs that it now desires to aggregate, then 2043 the PE MUST re-advertise those routes. The re-advertised routes 2044 MUST be the same as the original ones, except for the PMSI Tunnel 2045 attribute and the label carried in that attribute. 2047 + If the PE that originates the advertisement uses ingress 2048 replication for the P-tunnel for EVPN, the route MUST include the 2049 PMSI Tunnel attribute with the Tunnel Type set to Ingress 2050 Replication and the Tunnel Identifier set to a routable address of 2051 the PE. The PMSI Tunnel attribute MUST carry a downstream assigned 2052 MPLS label. This label is used to demultiplex the broadcast, 2053 multicast, or unknown unicast EVPN traffic received over an MP2P 2054 tunnel by the PE. 2056 12. Processing of Unknown Unicast Packets 2058 The procedures in this document do not require the PEs to flood 2059 unknown unicast traffic to other PEs. If PEs learn CE MAC addresses 2060 via a control-plane protocol, the PEs can then distribute MAC 2061 addresses via BGP, and all unicast MAC addresses will be learned 2062 prior to traffic to those destinations. 2064 However, if a destination MAC address of a received packet is not 2065 known by the PE, the PE may have to flood the packet. When flooding, 2066 one must take into account "split-horizon forwarding" as follows: The 2067 principles behind the following procedures are borrowed from the 2068 split-horizon forwarding rules in VPLS solutions [RFC4761] [RFC4762]. 2069 When a PE capable of flooding (say PEx) receives an unknown 2070 destination MAC address, it floods the frame. 
If the frame arrived 2071 from an attached CE, PEx must send a copy of that frame on every 2072 Ethernet segment (belonging to that EVI) for which it is the DF, 2073 other than the Ethernet segment on which it received the frame. In 2074 addition, the PE must flood the frame to all other PEs participating 2075 in that EVPN instance. If, on the other hand, the frame arrived from 2076 another PE (say PEy), PEx must send a copy of the packet on each 2077 Ethernet segment (belonging to that EVI) for which it is the DF. PEx 2078 MUST NOT send the frame to other PEs, since PEy would have already 2079 done so. Split-horizon forwarding rules apply to unknown MAC 2080 addresses. 2082 Whether or not to flood packets to unknown destination MAC addresses 2083 should be an administrative choice, depending on how learning happens 2084 between CEs and PEs. 2086 The PEs in a particular EVPN instance may use ingress replication 2087 using RSVP-TE P2P LSPs or LDP MP2P LSPs for sending unknown unicast 2088 traffic to other PEs. Or, they may use RSVP-TE P2MP or LDP P2MP for 2089 sending such traffic to other PEs. 2091 12.1. Ingress Replication 2093 If ingress replication is in use, the P-tunnel attribute, carried in 2094 the Inclusive Multicast Ethernet Tag routes for the EVPN instance, 2095 specifies the downstream label that the other PEs can use to send 2096 unknown unicast, multicast, or broadcast traffic for that EVPN 2097 instance to this particular PE. 2099 The PE that receives a packet with this particular MPLS label MUST 2100 treat the packet as a broadcast, multicast, or unknown unicast 2101 packet. Further, if the MAC address is a unicast MAC address, the PE 2102 MUST treat the packet as an unknown unicast packet. 2104 12.2. P2MP MPLS LSPs 2106 The procedures for using P2MP or MP2MP LSPs are very similar to the 2107 VPLS procedures described in [RFC7117]. 
The P-tunnel attribute used 2108 by a PE for sending unknown unicast, broadcast, or multicast traffic 2109 for a particular EVPN instance is advertised in the Inclusive 2110 Multicast Ethernet Tag route as described in Section 11 2111 ("Handling of Multi-destination Traffic"). 2113 The P-tunnel attribute specifies the P2MP or MP2MP LSP identifier. 2114 This is the equivalent of an Inclusive tree as described in 2115 [RFC7117]. Note that multiple BDs in the same or different EVIs may 2116 use the same P2MP or MP2MP LSP, using upstream labels [RFC7117] or 2117 DCB labels [I-D.ietf-bess-mvpn-evpn-aggregation-label]. This is the 2118 equivalent of an Aggregate Inclusive tree [RFC7117]. When P2MP or 2119 MP2MP LSPs are used for flooding unknown unicast traffic, packet 2120 reordering is possible. 2122 The PE that receives a packet on the P2MP or MP2MP LSP specified in 2123 the PMSI Tunnel attribute MUST treat the packet as a broadcast, 2124 multicast, or unknown unicast packet. Further, if the MAC address is 2125 a unicast MAC address, the PE MUST treat the packet as an unknown 2126 unicast packet. 2128 13. Forwarding Unicast Packets 2130 This section describes procedures for forwarding unicast packets by 2131 PEs, where such packets are received from either directly connected 2132 CEs or some other PEs. 2134 13.1. Forwarding Packets Received from a CE 2136 When a PE receives a packet from a CE with a given Ethernet Tag, it 2137 must first look up the packet's source MAC address. In certain 2138 environments that enable MAC security, the source MAC address MAY be 2139 used to validate the host identity and determine that traffic from 2140 the host can be allowed into the network. Source MAC lookup MAY also 2141 be used for local MAC address learning. 2143 If the PE decides to forward the packet, the destination MAC address 2144 of the packet must be looked up. 
If the PE has received MAC address 2145 advertisements for this destination MAC address from one or more 2146 other PEs or has learned it from locally connected CEs, the MAC 2147 address is considered a known MAC address. Otherwise, it is 2148 considered an unknown MAC address. 2150 For known MAC addresses, the PE forwards this packet to one of the 2151 remote PEs or to a locally attached CE. When forwarding to a remote 2152 PE, the packet is encapsulated in the EVPN MPLS label advertised by 2153 the remote PE, for that MAC address, and in the MPLS LSP label stack 2154 to reach the remote PE. 2156 If the MAC address is unknown and if the administrative policy on the 2157 PE requires flooding of unknown unicast traffic, then: 2159 - The PE MUST flood the packet to other PEs. The PE MUST first 2160 encapsulate the packet in the ESI MPLS label as described in 2161 Section 8.3. If ingress replication is used, the packet MUST be 2162 replicated to each remote PE, with the VPN label being the MPLS 2163 label advertised by the remote PE in a PMSI Tunnel attribute in the 2164 Inclusive Multicast Ethernet Tag route for the [EVI, BD] associated 2165 with the received packet's Ethernet tag. 2167 If P2MP LSPs are being used, the packet MUST be sent on the P2MP 2168 LSP of which the PE is the root, for the [EVI, BD] associated with 2169 the received packet's Ethernet tag. If the same P2MP LSP is used 2170 for all the BD's in the EVI, then all the PEs in the EVI MUST be 2171 the leaves of the P2MP LSP. If a different P2MP LSP is used for a 2172 given BD in the EVI, then only the PEs in that BD MUST be the 2173 leaves of the P2MP LSP. The packet MUST be encapsulated in the 2174 P2MP LSP label stack. 2176 If the MAC address is unknown, then, if the administrative policy on 2177 the PE does not allow flooding of unknown unicast traffic: 2179 - the PE MUST drop the packet. 2181 13.2. 
Forwarding Packets Received from a Remote PE 2183 This section describes the procedures for forwarding known and 2184 unknown unicast packets received from a remote PE. 2186 13.2.1. Unknown Unicast Forwarding 2188 When a PE receives an MPLS packet from a remote PE, then, after 2189 processing the MPLS label stack, if the top MPLS label ends up being 2190 a P2MP LSP label associated with an EVPN instance or -- in the case 2191 of ingress replication -- the downstream label advertised in the 2192 P-tunnel attribute, and after performing the split-horizon procedures 2193 described in Section 8.3: 2195 - If the PE is the designated forwarder of BUM traffic on a 2196 particular set of ESes for the [EVI, BD], the default behavior is 2197 for the PE to flood that traffic to these ESes. In other words, 2198 the default behavior is for the PE to assume that for BUM traffic 2199 it is not required to perform a destination MAC address lookup. As 2200 an option, the PE may perform a destination MAC lookup to flood the 2201 packet to only a subset of these ESes. For instance, the PE may 2202 decide to not flood a BUM packet on certain Ethernet segments even 2203 if it is the DF on the Ethernet segment, based on administrative 2204 policy. 2206 - If the PE is not the designated forwarder for any ES associated 2207 with the [EVI, BD], the default behavior is for it to drop the BUM 2208 traffic. 2210 13.2.2. Known Unicast Forwarding 2212 If the top MPLS label ends up being an EVPN label that was advertised 2213 in the unicast MAC advertisements, then the PE either forwards the 2214 packet based on CE next-hop forwarding information associated with 2215 the label or does a destination MAC address lookup to forward the 2216 packet to a CE. 2218 14. Load Balancing of Unicast Packets 2220 This section specifies the load-balancing procedures for sending 2221 known unicast packets to a multihomed CE. 2223 14.1. 
Load Balancing of Traffic from a PE to Remote CEs 2225 When a remote PE imports a MAC/IP Advertisement route for a given ES 2226 in a MAC-VRF, it MUST examine all imported Ethernet A-D routes for 2227 that ESI in order to determine the load-balancing characteristics of 2228 the Ethernet segment. 2230 14.1.1. Single-Active Redundancy Mode 2232 For a given ES, if a remote PE has imported the set of Ethernet A-D 2233 per ES routes from at least one PE, where the "Single-Active" flag in 2234 the ESI Label extended community is set, then that remote PE MUST 2235 deduce that the ES is operating in Single-Active redundancy mode. 2237 This means that for a given [EVI, BD], a given MAC address is 2238 reachable only via the PE announcing the associated MAC/IP 2239 Advertisement route; this PE will also have advertised an Ethernet 2240 A-D per EVI route for that [EVI, BD] with an L2-Attr extended 2241 community in which the P bit is set. That is, the Primary DF Elected PE 2242 is also responsible for sending known unicast frames to the CE and 2243 receiving unicast and BUM frames from it. Similarly, the Backup DF 2244 Elected PE will have advertised an Ethernet A-D per EVI route for 2245 [EVI, BD] with an L2-Attr extended community in which the B bit is 2246 set. 2248 If the Primary DF Elected PE loses connectivity to the CE, it SHOULD 2249 withdraw its set of Ethernet A-D per ES routes for the affected ES 2250 prior to withdrawing the affected MAC/IP Advertisement routes. The 2251 Backup DF Elected PE (which is now the Primary DF Elected PE) needs 2252 to advertise an Ethernet A-D per EVI route for [EVI, BD] with an 2253 L2-Attr extended community in which the P bit is set. Furthermore, 2254 the new Backup DF Elected PE needs to advertise an Ethernet A-D per 2255 EVI route for [EVI, BD] with an L2-Attr extended community in which 2256 the B bit is set.
2258 A remote PE SHOULD use the Primary DF Elected PE's withdrawal of its 2259 set of Ethernet A-D per ES routes as a trigger to update its 2260 forwarding entries for the associated MAC addresses to point at the 2261 Backup DF Elected PE. As the Backup DF Elected PE starts learning 2262 the MAC addresses over its attached ES, it will start sending MAC/IP 2263 Advertisement routes while the failed PE withdraws its routes. This 2264 mechanism minimizes the flooding of traffic during fail-over events. 2266 14.1.2. All-Active Redundancy Mode 2268 For a given ES, if the remote PE has imported the set of Ethernet A-D 2269 per ES routes from one or more PEs and none of them have the 2270 "Single-Active" flag in the ESI Label extended community set, then 2271 the remote PE MUST deduce that the ES is operating in All-Active 2272 redundancy mode. A remote PE that receives a MAC/IP Advertisement 2273 route with a non-reserved ESI SHOULD consider the advertised MAC 2274 address to be reachable via all PEs that have advertised reachability 2275 to that MAC address's EVI/ES/Ethernet Tag ID via the combination of 2276 an Ethernet A-D per EVI route for that EVI/ES/Ethernet Tag ID AND an 2277 Ethernet A-D per ES route for that ES. The remote PE MUST use 2278 received MAC/IP Advertisement routes and Ethernet A-D per EVI/per ES 2279 routes to construct the set of next hops for the advertised MAC 2280 address. 2282 Each next hop comprises an MPLS label stack that is to be used to 2283 reach a given egress PE and allow it to forward a packet. The 2284 portion of the MPLS label stack that is to be used by that egress PE 2285 to forward a packet is constructed by the remote PE as follows: 2287 - If a MAC/IP Advertisement route was received from that PE, then its 2288 label stack MUST be used in the next hop. 2290 - Otherwise, the label stack from the Ethernet A-D per EVI route that 2291 matches the MAC address' EVI/ES/Ethernet Tag ID MUST be used in the 2292 next hop. 
2294 The following example explains the above. 2296 Consider a CE (CE1) that is dual-homed to two PEs (PE1 and PE2) on a 2297 LAG interface (ES1), and is sending packets with source MAC address 2298 MAC1 on VLAN1 (mapped to EVI1). A remote PE, say PE3, is able to 2299 learn that MAC1 is reachable via PE1 and PE2. Both PE1 and PE2 may 2300 advertise MAC1 if they receive packets with MAC1 from CE1. If this 2301 is not the case, and if MAC1 is advertised only by PE1, PE3 still 2302 considers MAC1 as reachable via both PE1 and PE2, as both PE1 and PE2 2303 advertise a set of Ethernet A-D per ES routes for ES1 as well as an 2304 Ethernet A-D per EVI route for ES1 in EVI1. 2306 The MPLS label stack to send the packets to PE1 is the MPLS LSP stack 2307 to get to PE1 (at the top of the stack) followed by the EVPN label 2308 advertised by PE1 for CE1's MAC. 2310 The MPLS label stack to send packets to PE2 is the MPLS LSP stack to 2311 get to PE2 (at the top of the stack) followed by the MPLS label in 2312 the Ethernet A-D route advertised by PE2 for ES1 in EVI1, if PE2 has 2313 not advertised MAC1 in BGP. 2315 We will refer to these label stacks as MPLS next hops. 2317 The remote PE (PE3) can now load balance the traffic it receives from 2318 its CEs, destined for CE1, between PE1 and PE2. PE3 may use N-tuple 2319 flow information to hash traffic into one of the MPLS next hops for 2320 load balancing of IP traffic. Alternatively, PE3 may rely on the 2321 source MAC addresses for load balancing. 2323 Note that once PE3 decides to send a particular packet to PE1 or PE2, 2324 it can pick one out of multiple possible paths to reach the 2325 particular remote PE using regular MPLS procedures. For instance, if 2326 the tunneling technology is based on RSVP-TE LSPs and PE3 decides to 2327 send a particular packet to PE1, then PE3 can choose from multiple 2328 RSVP-TE LSPs that have PE1 as their destination.
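As an illustration only, the next-hop construction and flow hashing in the PE1/PE2/PE3 example above might be sketched as follows. The class and function names, the dictionary-based route tables, the label values, and the use of CRC32 as the hash are all assumptions made for this sketch; this document does not define them. A PE is treated as eligible only if it has advertised both an Ethernet A-D per ES route and an Ethernet A-D per EVI route, and the label from its MAC/IP Advertisement route is preferred when one exists.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MplsNextHop:
    egress_pe: str
    transport_label: int  # hypothetical LSP label to reach the egress PE (top of stack)
    evpn_label: int       # label the egress PE uses to forward the packet


def build_next_hops(mac_ip_labels, ad_per_evi_labels, ad_per_es_pes, transport_labels):
    """Build the set of MPLS next hops for one MAC address in All-Active mode.

    mac_ip_labels:     {pe: label} from MAC/IP Advertisement routes for the MAC
    ad_per_evi_labels: {pe: label} from Ethernet A-D per EVI routes
    ad_per_es_pes:     set of PEs that advertised Ethernet A-D per ES routes
    transport_labels:  {pe: label} for the LSP to each PE (illustrative only)
    """
    next_hops = []
    for pe in sorted(ad_per_es_pes):
        if pe not in ad_per_evi_labels:
            continue  # needs both A-D per ES and A-D per EVI to be eligible
        # Prefer the MAC/IP route label; otherwise fall back to the A-D per EVI label.
        label = mac_ip_labels.get(pe, ad_per_evi_labels[pe])
        next_hops.append(MplsNextHop(pe, transport_labels[pe], label))
    return next_hops


def pick_next_hop(next_hops, flow_tuple):
    """Hash N-tuple flow information onto one MPLS next hop (CRC32 is an assumption)."""
    key = "|".join(str(field) for field in flow_tuple).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# PE1 advertised MAC1; PE2 only advertised the A-D routes (label values are made up).
next_hops = build_next_hops(
    mac_ip_labels={"PE1": 1001},
    ad_per_evi_labels={"PE1": 2001, "PE2": 2002},
    ad_per_es_pes={"PE1", "PE2"},
    transport_labels={"PE1": 16, "PE2": 17},
)
flow = ("192.0.2.1", "198.51.100.7", 6, 49152, 443)
chosen = pick_next_hop(next_hops, flow)
```

Hashing on the flow tuple keeps all packets of one flow on the same next hop, which is why a deterministic hash is used here rather than round-robin selection.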
2330 When PE1 or PE2 receives the packet destined for CE1 from PE3, if the 2331 packet is a known unicast, it is forwarded to CE1. 2333 14.2. Load Balancing of Traffic between a PE and a Local CE 2335 A CE may be configured with more than one interface connected to 2336 different PEs or the same PE for load balancing, using a technology 2337 such as a LAG. The PE(s) and the CE can load balance traffic onto 2338 these interfaces using one of the following mechanisms. 2340 14.2.1. Data-Plane Learning 2342 Consider that the PEs perform data-plane learning for local MAC 2343 addresses learned from local CEs. This enables the PE(s) to learn a 2344 particular MAC address and associate it with one or more interfaces, 2345 if the technology between the PE and the CE supports multipathing. 2346 The PEs can now load balance traffic destined to that MAC address on 2347 the multiple interfaces. 2349 Whether the CE can load balance traffic that it generates on the 2350 multiple interfaces is dependent on the CE implementation. 2352 14.2.2. Control-Plane Learning 2354 The CE can be a host that advertises the same MAC address using a 2355 control protocol on all interfaces. This enables the PE(s) to learn 2356 the host's MAC address and associate it with all interfaces. The PEs 2357 can now load balance traffic destined to the host on all these 2358 interfaces. The host can also load balance the traffic it generates 2359 onto these interfaces, and the PE that receives the traffic employs 2360 EVPN forwarding procedures to forward the traffic. 2362 15. MAC Mobility 2364 It is possible for a given host or end-station (as defined by its MAC 2365 address) to move from one Ethernet segment to another; this is 2366 referred to as 'MAC Mobility' or 'MAC move', and it is different from 2367 the multihoming situation in which a given MAC address is reachable 2368 via multiple PEs for the same Ethernet segment. 
In a MAC move, there 2369 would be two sets of MAC/IP Advertisement routes -- one set with the 2370 new Ethernet segment and one set with the previous Ethernet segment 2371 -- and the MAC address would appear to be reachable via each of these 2372 segments. 2374 In order to allow all of the PEs in the EVPN instance to correctly 2375 determine the current location of the MAC address, all advertisements 2376 of it being reachable via the previous Ethernet segment MUST be 2377 withdrawn by the PEs, for the previous Ethernet segment, that had 2378 advertised it. 2380 If local learning is performed using the data plane, these PEs will 2381 not be able to detect that the MAC address has moved to another 2382 Ethernet segment, and the receipt of MAC/IP Advertisement routes, 2383 with the MAC Mobility extended community, from other PEs serves as 2384 the trigger for these PEs to withdraw their advertisements. If local 2385 learning is performed using the control or management planes, these 2386 interactions serve as the trigger for these PEs to withdraw their 2387 advertisements. 2389 In a situation where there are multiple moves of a given MAC, 2390 possibly between the same two Ethernet segments, there may be 2391 multiple withdrawals and re-advertisements. In order to ensure that 2392 all PEs in the EVPN instance receive all of these correctly through 2393 the intervening BGP infrastructure, introducing a sequence number 2394 into the MAC Mobility extended community is necessary. 2396 In order to process mobility events correctly, an implementation MUST 2397 handle scenarios in which sequence number wraparound occurs. 2399 Every MAC mobility event for a given MAC address will contain a 2400 sequence number that is set using the following rules: 2402 - A PE advertising a MAC address for the first time advertises it 2403 with no MAC Mobility extended community. 
2405 - A PE detecting a locally attached MAC address for which it had 2406 previously received a MAC/IP Advertisement route with a different 2407 Ethernet segment identifier advertises the MAC address in a MAC/IP 2408 Advertisement route tagged with a MAC Mobility extended community 2409 with a sequence number one greater than the sequence number in the 2410 MAC Mobility extended community of the received MAC/IP 2411 Advertisement route. In the case of the first mobility event for a 2412 given MAC address, where the received MAC/IP Advertisement route 2413 does not carry a MAC Mobility extended community, the value of the 2414 sequence number in the received route is assumed to be 0 for the 2415 purpose of this processing. 2417 - A PE detecting a locally attached MAC address for which it had 2418 previously received a MAC/IP Advertisement route with the same 2419 non-zero Ethernet segment identifier advertises it with: 2421 1. no MAC Mobility extended community, if the received route did 2422 not carry said extended community. 2424 2. a MAC Mobility extended community with the sequence number equal 2425 to the highest of the sequence number(s) in the received MAC/IP 2426 Advertisement route(s), if the received route(s) is (are) tagged 2427 with a MAC Mobility extended community. 2429 - A PE detecting a locally attached MAC address for which it had 2430 previously received a MAC/IP Advertisement route with the same zero 2431 Ethernet segment identifier (single-homed scenarios) advertises it 2432 with a MAC Mobility extended community with the sequence number set 2433 properly. In the case of single-homed scenarios, there is no need 2434 for ESI comparison. ESI comparison is done for multihoming in 2435 order to prevent false detection of MAC moves among the PEs 2436 attached to the same multihomed site. 
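The sequence number rules listed above can be sketched as a single function. This is a minimal illustration, not a normative algorithm: the function name and arguments are hypothetical, an ESI is modelled as a plain integer with 0 meaning single-homed, the single-homed case is assumed to behave like a move (increment by one), and wraparound is assumed to be modulo 2^32 because the sequence number is carried in a 4-octet field.

```python
from typing import Optional, Sequence

SEQ_MODULUS = 2 ** 32  # the sequence number occupies 4 octets


def sequence_to_advertise(
    local_esi: int,
    received_esi: Optional[int],
    received_seqs: Sequence[Optional[int]],
) -> Optional[int]:
    """Return the MAC Mobility sequence number to advertise, or None for
    "advertise without a MAC Mobility extended community".

    local_esi:     ESI on which the MAC was just learned locally (0 = single-homed)
    received_esi:  ESI in the previously received route(s), None if no route was received
    received_seqs: sequence numbers of those routes (None = no MAC Mobility community)
    """
    if received_esi is None:
        return None  # first advertisement of this MAC address

    present = [seq for seq in received_seqs if seq is not None]

    if received_esi == local_esi and local_esi != 0:
        # Same multihomed segment: not a move, so mirror the received state.
        return max(present) if present else None

    # Different ESI, or the same zero ESI (single-homed): treat as a MAC move.
    # A received route without the community counts as sequence number 0.
    previous = max(present) if present else 0
    return (previous + 1) % SEQ_MODULUS
```

For example, a MAC first learned elsewhere without the community and then seen locally on a different segment is advertised with sequence number 1, while a second PE on the same multihomed segment simply repeats the highest sequence number it received.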
2438 A PE receiving a MAC/IP Advertisement route for a MAC address with a 2439 different Ethernet segment identifier and a higher sequence number 2440 than that which it had previously advertised withdraws its MAC/IP 2441 Advertisement route. If two (or more) PEs advertise the same MAC 2442 address with the same sequence number but different Ethernet segment 2443 identifiers, a PE that receives these routes selects the route 2444 advertised by the PE with the lowest IP address as the best route. 2445 If the PE is the originator of the MAC route and it receives the same 2446 MAC address with the same sequence number that it generated, it will 2447 compare its own IP address with the IP address of the remote PE and 2448 will select the lowest IP. If its own route is not the best one, it 2449 will withdraw the route. 2451 15.1. MAC Duplication Issue 2453 A situation may arise where the same MAC address is learned by 2454 different PEs in the same VLAN because of two (or more) hosts being 2455 misconfigured with the same (duplicate) MAC address. In such a 2456 situation, the traffic originating from these hosts would trigger 2457 continuous MAC moves among the PEs attached to these hosts. It is 2458 important to recognize such a situation and avoid incrementing the 2459 sequence number (in the MAC Mobility extended community) to infinity. 2460 In order to remedy such a situation, a PE that detects a MAC mobility 2461 event via local learning starts an M-second timer (with a default 2462 value of M = 180), and if it detects N MAC moves before the timer 2463 expires (with a default value of N = 5), it concludes that a 2464 duplicate-MAC situation has occurred. The PE MUST alert the operator 2465 and stop sending and processing any BGP MAC/IP Advertisement routes 2466 for that MAC address until a corrective action is taken by the 2467 operator. The values of M and N MUST be configurable to allow for 2468 flexibility in operator control. 
Note that the other PEs in the EVPN 2469 instance will forward the traffic for the duplicate MAC address to 2470 one of the PEs advertising the duplicate MAC address. 2472 15.2. Sticky MAC Addresses 2474 There are scenarios in which it is desirable to configure some MAC 2475 addresses as static so that they are not subjected to MAC moves. In 2476 such scenarios, these MAC addresses are advertised with a MAC 2477 Mobility extended community where the static flag is set to 1 and the 2478 sequence number is set to zero. If a PE receives such advertisements 2479 and later learns the same MAC address(es) via local learning, then 2480 the PE MUST alert the operator. 2482 15.3. Loop Protection 2484 The EVPN MAC Duplication procedure in Section 15.1 prevents an 2485 endless EVPN MAC/IP route advertisement exchange for a duplicate MAC 2486 between two (or more) PEs. While this helps the control plane 2487 settle, if there is a backdoor link (loop) between two or more PEs 2488 attached to the same BD, BUM frames being sent by a CE are still 2489 endlessly looped within the BD through the backdoor link and among 2490 the PEs. This may cause unpredictable issues in the CEs connected to 2491 the affected BD. 2493 The EVPN MAC Duplication Mechanism in Section 15.1 MAY be extended 2494 with a Loop-protection action that is applied to the duplicate-MAC 2495 addresses. This additional mechanism resolves loops created by 2496 accidental or intentional backdoor links and SHOULD be enabled in all 2497 the PEs attached to the BD. 2499 After following the procedure in Section 15.1, when a PE detects a 2500 MAC M as duplicate, the PE behaves as follows: 2502 a) Stops advertising M and logs a duplicate event. 2504 b) Initializes a retry-timer, R seconds. 2506 c) Since Loop Protection is enabled, the PE executes a Loop 2507 Protection action, which we refer to as "Black-Holing" M.
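The duplicate-MAC detection window of Section 15.1 and steps a) to c) above might be combined as in the following sketch. The class and method names are hypothetical, time is passed in explicitly as seconds, the move that starts the M-second timer is assumed to count toward N, and the retry timer R is assumed to default to three times M (the minimum recommended value); none of these choices is mandated by this document.

```python
class DuplicateMacDetector:
    """Sketch of duplicate-MAC detection followed by Black-Holing."""

    def __init__(self, window_s=180.0, max_moves=5, retry_s=None):
        self.window_s = window_s    # M: detection window in seconds (default 180)
        self.max_moves = max_moves  # N: moves within the window (default 5)
        # R: retry timer; assumed here to default to 3 * M.
        self.retry_s = retry_s if retry_s is not None else 3 * window_s
        self._window = {}            # mac -> (timer start, moves counted)
        self._black_hole_until = {}  # mac -> time at which R expires

    def record_move(self, mac, now):
        """Record a locally detected MAC move; return True if mac is now duplicate."""
        start, count = self._window.get(mac, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0    # M-second timer expired: start a new window
        count += 1
        self._window[mac] = (start, count)
        if count >= self.max_moves:
            # Stop advertising, start retry timer R, and Black-Hole the MAC.
            self._black_hole_until[mac] = now + self.retry_s
            del self._window[mac]
            return True
        return False

    def should_discard(self, src_mac, now):
        """Return True if a frame with this source MAC hits a Black-Hole entry."""
        until = self._black_hole_until.get(src_mac)
        if until is None:
            return False
        if now >= until:
            del self._black_hole_until[src_mac]  # R expired: flush and restart
            return False
        return True


detector = DuplicateMacDetector()
results = [detector.record_move("00:11:22:33:44:55", t) for t in (0, 10, 20, 30, 40)]
```

A real implementation would also alert the operator and handle the other flush triggers (manual flush, route withdrawal, sticky-bit update); those are left out to keep the sketch short.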
2509 When the PE programs M as a Black-Hole MAC in the Bridge Table, M is 2510 no longer associated to the backdoor Attachment Circuit (AC), but to 2511 a Black-Hole destination. 2513 At this point and while M is in Black-Hole state: 2515 a) If a new frame is received (from the EVPN network or the backdoor 2516 AC) with MAC SA = M, the PE identifies M to be Black-Holed and 2517 discards the frame, ending the loop. 2519 b) Optionally, instead of simply discarding the frame with MAC SA = 2520 M, the PE MAY bring down the AC on which the offending frame is 2521 seen last. 2523 c) Optionally, any frame that arrives at the PE with MAC DA = M 2524 SHOULD be discarded too. 2526 When the retry-timer R for M expires, the PE flushes M from the 2527 Bridge Table and the process is restarted. In general, a Black-Hole 2528 MAC M can be flushed from the Bridge Table if any of the following 2529 events occur: 2531 o Retry-timer R for duplicate-MAC M expires (as discussed). R is 2532 initialized when M is detected as duplicate-MAC. Its value is 2533 configurable and SHOULD be at least three times the EVPN MAC 2534 Duplication M-timer window. 2536 o The operator manually flushes a Black-Hole MAC M. This should be 2537 done only if the conditions under which M was identified as 2538 duplicate have been cleared. 2540 o The remote PE withdraws the MAC/IP route for M and there are no 2541 other remote MAC/IP routes for M. 2543 o The remote PE sends a MAC/IP route update for M with the sticky-bit 2544 set (in the MAC Mobility extended community). 2546 16. Multicast and Broadcast 2548 The PEs in a particular EVPN instance may use ingress replication or 2549 P2MP or MP2MP LSPs to send multicast traffic to other PEs. 2551 16.1. Ingress Replication 2553 The PEs may use ingress replication for flooding BUM traffic as 2554 described in Section 11 ("Handling of Multi-destination Traffic"). A 2555 given broadcast packet must be sent to all the remote PEs. 
However, 2556 a given multicast packet for a multicast flow may be sent to only a 2557 subset of the PEs. Specifically, a given multicast flow may be sent 2558 to only those PEs that have receivers that are interested in the 2559 multicast flow. Determining which of the PEs have receivers for a 2560 given multicast flow is done using the procedures of 2561 [I-D.ietf-bess-evpn-igmp-mld-proxy]. 2563 16.2. P2MP or MP2MP LSPs 2565 A PE may use an "Inclusive" tree for sending a BUM packet. This 2566 terminology is borrowed from [RFC7117]. 2568 A variety of transport technologies may be used in the service 2569 provider (SP) network. For Inclusive P-multicast trees, these 2570 transport technologies include point-to-multipoint LSPs created by 2571 RSVP-TE or Multipoint LDP (mLDP) or BIER. 2573 16.2.1. Inclusive Trees 2575 An Inclusive tree allows the use of a single multicast distribution 2576 tree, referred to as an Inclusive P-multicast tree, in the SP network 2577 to carry all the multicast traffic from a specified set of EVPN 2578 instances on a given PE. A particular P-multicast tree can be set up 2579 to carry the traffic originated by sites belonging to a single EVPN 2580 instance, or to carry the traffic originated by sites belonging to 2581 several EVPN instances. The ability to carry the traffic of more 2582 than one EVPN instance on the same tree is termed 'Aggregation', and 2583 the tree is called an Aggregate Inclusive P-multicast tree or 2584 Aggregate Inclusive tree for short. The Aggregate Inclusive tree 2585 needs to include every PE that is a member of any of the EVPN 2586 instances that are using the tree. This implies that a PE may 2587 receive BUM traffic even if it doesn't have any receivers that are 2588 interested in receiving that traffic. 2590 An Inclusive or Aggregate Inclusive tree as defined in this document 2591 is a P2MP tree. 
A P2MP or MP2MP tree is used to carry traffic only 2592 for EVPN CEs that are connected to the PE that is the root of the 2593 tree. 2595 The procedures for signaling an Inclusive tree are the same as those 2596 in [RFC7117], with the VPLS A-D route replaced with the Inclusive 2597 Multicast Ethernet Tag route. The P-tunnel attribute [RFC7117] for 2598 an Inclusive tree is advertised with the Inclusive Multicast Ethernet 2599 Tag route as described in Section 11 2600 ("Handling of Multi-destination Traffic"). Note that for an 2601 Aggregate Inclusive tree, a PE can "aggregate" multiple EVPN 2602 instances on the same P2MP LSP using upstream labels or DCB allocated 2603 labels [I-D.ietf-bess-mvpn-evpn-aggregation-label]. The procedures 2604 for aggregation are the same as those described in [RFC7117], with 2605 VPLS A-D routes replaced by EVPN Inclusive Multicast Ethernet Tag 2606 routes. 2608 17. Convergence 2610 This section describes failure recovery from different types of 2611 network failures. 2613 17.1. Transit Link and Node Failures between PEs 2615 The use of existing MPLS fast-reroute mechanisms can provide failure 2616 recovery on the order of 50 ms, in the event of transit link and node 2617 failures in the infrastructure that connects the PEs. 2619 17.2. PE Failures 2621 Consider a host CE1 that is dual-homed to PE1 and PE2. If PE1 fails, 2622 a remote PE, PE3, can discover this based on the failure of the BGP 2623 session. This failure detection can be in the sub-second range if 2624 Bidirectional Forwarding Detection (BFD) is used to detect BGP 2625 session failures. PE3 can update its forwarding state to start 2626 sending all traffic for CE1 to only PE2. 2628 17.3. PE-to-CE Network Failures 2630 If the connectivity between the multihomed CE and one of the PEs to 2631 which it is attached fails, the PE MUST withdraw the set of Ethernet 2632 A-D per ES routes that had been previously advertised for that ES. 
2633 This enables the remote PEs to remove the MPLS next hop to this 2634 particular PE from the set of MPLS next hops that can be used to 2635 forward traffic to the CE. When the MAC entry on the PE ages out, 2636 the PE MUST withdraw the MAC address from BGP. 2638 When an EVI is decommissioned on an Ethernet segment, the PE MUST 2639 withdraw the Ethernet A-D per EVI route(s) announced for that 2640 EVI on that Ethernet segment. In addition, the PE MUST also withdraw the MAC/IP Advertisement 2641 routes that are impacted by the decommissioning. 2643 The Ethernet A-D per ES routes should be used by an implementation to 2644 optimize the withdrawal of MAC/IP Advertisement routes. When a PE 2645 receives a withdrawal of a particular Ethernet A-D route from an 2646 advertising PE, it SHOULD consider all the MAC/IP Advertisement 2647 routes that are learned from the same ESI as in the Ethernet A-D 2648 route from the advertising PE as having been withdrawn. This 2649 optimizes the network convergence times in the event of PE-to-CE 2650 failures. 2652 18. Frame Ordering 2654 If the value of the first nibble (bits 8 through 5) 2655 of the most significant octet of the destination MAC address (which 2656 follows the last MPLS label) happens to be 0x4 or 0x6, then the 2657 Ethernet frame can be misinterpreted as an IPv4 or IPv6 packet by 2658 intermediate P nodes performing ECMP based on deep packet inspection. 2659 As a result, packets belonging to the same flow may be load-balanced 2660 onto different ECMP paths and subjected to different 2661 delays. Therefore, packets belonging to the same flow can arrive at 2662 the destination out of order. This out-of-order delivery can happen 2663 during steady state in the absence of any failures, resulting in 2664 significant impact on network operations.
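As a non-normative illustration of the condition described above, the following Python sketch checks whether the first nibble of a destination MAC address is 0x4 or 0x6, the two values that a P node using deep packet inspection could read as an IP version number. The function names (parse_mac, first_nibble, may_be_mistaken_for_ip) and the sample addresses are hypothetical and are not defined by this document. The sketch only inspects the MAC address; it does not model the hashing behavior of any particular router.

```python
def parse_mac(text: str) -> bytes:
    """Convert a MAC address such as '40:00:5e:00:00:01' to 6 raw octets."""
    raw = bytes.fromhex(text.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address must be exactly 6 octets")
    return raw


def first_nibble(dest_mac: bytes) -> int:
    """Return the high-order 4 bits of the most significant octet."""
    return dest_mac[0] >> 4


def may_be_mistaken_for_ip(dest_mac: bytes) -> bool:
    """True if the first nibble equals the IPv4 (4) or IPv6 (6) version value."""
    return first_nibble(dest_mac) in (0x4, 0x6)


if __name__ == "__main__":
    # Illustrative addresses only; they are not taken from this document.
    for mac in ("40:00:5e:00:00:01", "6c:9c:ed:00:00:02", "00:1b:21:00:00:03"):
        print(mac, may_be_mistaken_for_ip(parse_mac(mac)))
```

With these sample inputs, the first two addresses are flagged (first nibble 4 and 6, respectively) and the third is not (first nibble 0).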
2666 To avoid the frame misordering described above, the 2667 following network-wide rules are applied: 2669 - If a network uses deep packet inspection for its ECMP, then the 2670 "Preferred PW MPLS Control Word" [RFC4385] MUST be used with the 2671 value 0 (e.g., a 4-octet field with a value of zero) when sending 2672 unicast EVPN-encapsulated packets over an MP2P LSP. 2674 - When sending EVPN-encapsulated packets over a P2MP or P2P RSVP-TE 2675 LSP, the control word SHOULD NOT be used. 2677 - When sending EVPN-encapsulated packets over a P2MP LSP (e.g., using 2678 mLDP signaling), the control word SHOULD be used. 2680 - If a network uses entropy labels per [RFC6790], then the control 2681 word SHOULD NOT be used when sending EVPN-encapsulated packets over 2682 an MP2P LSP. 2684 18.1. Flow Label 2686 The Flow Label adds entropy to divisible flows and enables 2687 ECMP load-balancing in the network. The Flow Label MAY be used in 2688 EVPN networks to achieve better load-balancing in the network, when 2689 transit nodes perform deep packet inspection for ECMP hashing. The 2690 following rules apply: 2692 - When the F-bit is set to 1, the PE announces the capability of both 2693 sending and receiving the Flow Label for known unicast. If the PE is 2694 capable of supporting the Flow Label, then: 2696 * upon receiving the F-bit set (F=1) from a remote PE, it MUST 2697 send known unicast packets to that PE with Flow labels; 2699 * alternatively, upon receiving the F-bit unset (F=0) from a 2700 remote PE, it MUST NOT send known unicast packets to that PE 2701 with Flow labels. 2703 - The Flow Label MUST NOT be used for EVPN-encapsulated BUM packets. 2705 - An ingress PE will push the Flow Label at the bottom of the stack 2706 of the EVPN-encapsulated known unicast packets sent to an egress PE 2707 that previously signaled the F-bit set to 1.
2709 - If a PE receives a unicast packet with two labels, then it can 2710 differentiate between [VPN label + ESI label] and [VPN label + Flow 2711 label] and there should be no ambiguity between ESI and Flow labels 2712 even if they overlap. The reason for this is that the downstream 2713 assigned VPN label for known unicast is different from that for BUM 2714 traffic and the ESI label (if present) comes after the BUM VPN label. 2715 Therefore, from the VPN label, the receiving PE knows whether the 2716 next label is an ESI label or a Flow label; i.e., if the VPN label 2717 is for known unicast, then the next label MUST be a Flow label and 2718 if the VPN label is for BUM traffic, then the next label MUST be an 2719 ESI label because BUM packets are not sent with Flow labels. 2721 - When sending EVPN-encapsulated packets over a P2MP LSP (either 2722 RSVP-TE or mLDP), the Flow Label SHOULD NOT be used. This is 2723 independent of any F-bit signalling in the L2-Attr Extended 2724 Community, which would still apply to unicast. 2726 19. Use of Domain-wide Common Block (DCB) Labels 2728 The use of DCB labels as described in 2729 [I-D.ietf-bess-mvpn-evpn-aggregation-label] is RECOMMENDED in the 2730 following cases: 2732 + Aggregate P-multicast trees: A P-multicast tree MAY aggregate the 2733 traffic of two or more BDs on a given ingress PE. When aggregation 2734 is needed, DCB Labels [I-D.ietf-bess-mvpn-evpn-aggregation-label] 2735 MAY be used in the MPLS label field of the Inclusive Multicast 2736 Ethernet Tag route's PMSI Tunnel Attribute. The use of DCB Labels, 2737 instead of upstream allocated labels, can greatly reduce the number 2738 of labels that the egress PEs need to process when P-multicast 2739 tunnel aggregation is used in a network with a large number of BDs.
2741 + BIER tunnels: As described in [I-D.ietf-bier-evpn], the use of 2742 labels with BIER tunnels in EVPN networks is similar to aggregate 2743 tunnels, since the ingress PE uses upstream allocated labels to 2744 identify the BD. As described in [I-D.ietf-bier-evpn], DCB labels 2745 can be allocated instead of upstream labels in the PMSI Tunnel 2746 Attribute so that the number of labels required on the egress PEs 2747 can be reduced. 2749 + ESI labels: The ESI labels advertised with EVPN A-D per ES routes 2750 MAY be allocated as DCB labels in general, and are RECOMMENDED to 2751 be allocated as DCB labels when used in combination with P2MP/BIER 2752 tunnels. 2754 When MP2MP tunnels are used, ESI-labels MUST be allocated from a DCB 2755 and the same label must be used by all the PEs attached to the same 2756 Ethernet Segment. In that way, any egress PE with local Ethernet 2757 Segments can identify the source ES of the received BUM packets. 2759 20. Security Considerations 2761 Security considerations discussed in [RFC4761] and [RFC4762] apply to 2762 this document for MAC learning in the data plane over an Attachment 2763 Circuit (AC) and for flooding of unknown unicast and ARP messages 2764 over the MPLS/IP core. Security considerations discussed in 2765 [RFC4364] apply to this document for MAC learning in the control 2766 plane over the MPLS/IP core. This section describes additional 2767 considerations. 2769 As mentioned in [RFC4761], there are two aspects to achieving data 2770 privacy and protecting against denial-of-service attacks in a VPN: 2771 securing the control plane and protecting the forwarding path. 2772 Compromise of the control plane could result in a PE sending customer 2773 data belonging to some EVPN to another EVPN, or black-holing EVPN 2774 customer data, or even sending it to an eavesdropper, none of which 2775 are acceptable from a data privacy point of view. 
In addition, 2776 compromise of the control plane could provide opportunities for 2777 unauthorized EVPN data usage (e.g., exploiting traffic replication 2778 within a multicast tree to amplify a denial-of-service attack based 2779 on sending large amounts of traffic). 2781 The mechanisms in this document use BGP for the control plane. 2782 Hence, techniques such as those discussed in [RFC5925] help 2783 authenticate BGP messages, making it harder to spoof updates (which 2784 can be used to divert EVPN traffic to the wrong EVPN instance) or 2785 withdrawals (denial-of-service attacks). In the multi-AS backbone 2786 options (b) and (c) [RFC4364], this also means protecting the 2787 inter-AS BGP sessions between the Autonomous System Border Routers 2788 (ASBRs), the PEs, or the Route Reflectors. 2790 Further discussion of security considerations for BGP may be found in 2791 the BGP specification itself [RFC4271] and in the security analysis 2792 for BGP [RFC4272]. The TCP Authentication Option, which replaces the earlier TCP MD5 2793 signature option for protecting BGP sessions, is specified in [RFC5925], while 2795 [RFC6952] includes an analysis of BGP keying and authentication 2796 issues. 2798 Note that [RFC5925] will not help in keeping MPLS labels private -- 2799 knowing the labels, one can eavesdrop on EVPN traffic. Such 2800 eavesdropping additionally requires access to the data path within an 2801 SP network. Users of VPN services are expected to take appropriate 2802 precautions (such as encryption) to protect the data exchanged over 2803 a VPN. 2805 One of the requirements for protecting the data plane is that the 2806 MPLS labels be accepted only from valid interfaces. For a PE, valid 2807 interfaces comprise links from other routers in the PE's own AS. For 2808 an ASBR, valid interfaces comprise links from other routers in the 2809 ASBR's own AS, and links from other ASBRs in ASes that have instances 2810 of a given EVPN.
It is especially important in the case of multi-AS 2811 EVPN instances that one accept EVPN packets only from valid 2812 interfaces. 2814 It is also important to help limit malicious traffic into a network 2815 for an impostor MAC address. The mechanism described in Section 15.1 2816 shows how duplicate MAC addresses can be detected and continuous 2817 false MAC mobility can be prevented. The mechanism described in 2818 Section 15.2 shows how MAC addresses can be pinned to a given 2819 Ethernet segment, such that if they appear behind any other Ethernet 2820 segments, the traffic for those MAC addresses can be prevented from 2821 entering the EVPN network from the other Ethernet segments. 2823 21. IANA Considerations 2825 This document defines a new NLRI, called "EVPN", to be carried in BGP 2826 using multiprotocol extensions. This NLRI uses the existing AFI of 2827 25 (L2VPN). IANA has assigned BGP EVPNs a SAFI value of 70. 2829 IANA has allocated the following EVPN Extended Community sub-types in 2830 [RFC7153], and this document is the only reference for them, in 2831 addition to [RFC7432]. 2833 0x00 MAC Mobility [RFC7432] 2834 0x01 ESI Label [RFC7432] 2835 0x02 ES-Import Route Target [RFC7432] 2837 This document creates a registry called "EVPN Route Types". New 2838 registrations will be made through the "RFC Required" procedure 2839 defined in [RFC5226]. The registry has a maximum value of 255. 2840 Initial registrations from [RFC7432] are as follows: 2842 0 Reserved [RFC7432] 2843 1 Ethernet Auto-discovery [RFC7432] 2844 2 MAC/IP Advertisement [RFC7432] 2845 3 Inclusive Multicast Ethernet Tag [RFC7432] 2846 4 Ethernet Segment [RFC7432] 2848 This document requests allocation of bit 3 in the "EVPN Layer 2 2849 Attributes Control Flags" registry with name F: 2851 F Flow Label MUST be present 2853 22. References 2855 22.1. 
Normative References 2857 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2858 Requirement Levels", BCP 14, RFC 2119, 2859 DOI 10.17487/RFC2119, March 1997. 2862 [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A 2863 Border Gateway Protocol 4 (BGP-4)", RFC 4271, 2864 DOI 10.17487/RFC4271, January 2006. 2867 [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended 2868 Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, 2869 February 2006. 2871 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 2872 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2873 2006. 2875 [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, 2876 "Multiprotocol Extensions for BGP-4", RFC 4760, 2877 DOI 10.17487/RFC4760, January 2007. 2880 [RFC4761] Kompella, K., Ed. and Y. Rekhter, Ed., "Virtual Private 2881 LAN Service (VPLS) Using BGP for Auto-Discovery and 2882 Signaling", RFC 4761, DOI 10.17487/RFC4761, January 2007. 2885 [RFC4762] Lasserre, M., Ed. and V. Kompella, Ed., "Virtual Private 2886 LAN Service (VPLS) Using Label Distribution Protocol (LDP) 2887 Signaling", RFC 4762, DOI 10.17487/RFC4762, January 2007. 2890 [RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP 2891 Extended Communities", RFC 7153, DOI 10.17487/RFC7153, 2892 March 2014. 2894 [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., 2895 Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based 2896 Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2897 2015. 2899 [RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J. 2900 Rabadan, "Virtual Private Wire Service Support in Ethernet 2901 VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017. 2904 [RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, 2905 J., Nagaraj, K., and S.
Sathappan, "Framework for Ethernet 2906 VPN Designated Forwarder Election Extensibility", 2907 RFC 8584, DOI 10.17487/RFC8584, April 2019. 2910 22.2. Informative References 2912 [I-D.ietf-bess-evpn-igmp-mld-proxy] 2913 Sajassi, A., Thoria, S., Mishra, M., Patel, K., Drake, J., 2914 and W. Lin, "IGMP and MLD Proxy for EVPN", draft-ietf- 2915 bess-evpn-igmp-mld-proxy-07 (work in progress), April 2916 2021. 2918 [I-D.ietf-bess-evpn-mh-split-horizon] 2919 Rabadan, J., Nagaraj, K., Lin, W., and A. Sajassi, "EVPN 2920 Multi-Homing Extensions for Split Horizon Filtering", 2921 draft-ietf-bess-evpn-mh-split-horizon-01 (work in 2922 progress), April 2021. 2924 [I-D.ietf-bess-mvpn-evpn-aggregation-label] 2925 Zhang, Z., Rosen, E., Lin, W., Li, Z., and I. Wijnands, 2926 "MVPN/EVPN Tunnel Aggregation with Common Labels", draft- 2927 ietf-bess-mvpn-evpn-aggregation-label-06 (work in 2928 progress), April 2021. 2930 [I-D.ietf-bier-evpn] 2931 Zhang, Z., Przygienda, T., Sajassi, A., and J. Rabadan, 2932 "EVPN BUM Using BIER", draft-ietf-bier-evpn-04 (work in 2933 progress), December 2020. 2935 [IEEE.802.1D_2004] 2936 IEEE, "IEEE Standard for Local and metropolitan area 2937 networks: Media Access Control (MAC) Bridges", IEEE 2938 802.1D-2004, DOI 10.1109/ieeestd.2004.94569, July 2004. 2941 [IEEE.802.1Q_2014] 2942 IEEE, "IEEE Standard for Local and metropolitan area 2943 networks--Bridges and Bridged Networks", IEEE 802.1Q-2014, 2944 DOI 10.1109/ieeestd.2014.6991462, December 2014. 2948 [RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", 2949 RFC 4272, DOI 10.17487/RFC4272, January 2006. 2952 [RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson, 2953 "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for 2954 Use over an MPLS PSN", RFC 4385, DOI 10.17487/RFC4385, 2955 February 2006. 2957 [RFC4664] Andersson, L., Ed. and E.
Rosen, Ed., "Framework for Layer 2958 2 Virtual Private Networks (L2VPNs)", RFC 4664, 2959 DOI 10.17487/RFC4664, September 2006. 2962 [RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, 2963 R., Patel, K., and J. Guichard, "Constrained Route 2964 Distribution for Border Gateway Protocol/MultiProtocol 2965 Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual 2966 Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684, 2967 November 2006. 2969 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 2970 IANA Considerations Section in RFCs", RFC 5226, 2971 DOI 10.17487/RFC5226, May 2008. 2974 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 2975 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 2976 June 2010. 2978 [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 2979 Encodings and Procedures for Multicast in MPLS/BGP IP 2980 VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012. 2983 [RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and 2984 L. Yong, "The Use of Entropy Labels in MPLS Forwarding", 2985 RFC 6790, DOI 10.17487/RFC6790, November 2012. 2988 [RFC6952] Jethanandani, M., Patel, K., and L. Zheng, "Analysis of 2989 BGP, LDP, PCEP, and MSDP Issues According to the Keying 2990 and Authentication for Routing Protocols (KARP) Design 2991 Guide", RFC 6952, DOI 10.17487/RFC6952, May 2013. 2994 [RFC7117] Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and 2995 C. Kodeboniya, "Multicast in Virtual Private LAN Service 2996 (VPLS)", RFC 7117, DOI 10.17487/RFC7117, February 2014. 2999 [RFC7209] Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N., 3000 Henderickx, W., and A. Isaac, "Requirements for Ethernet 3001 VPN (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014. 3004 [RFC8317] Sajassi, A., Ed., Salam, S., Drake, J., Uttaro, J., 3005 Boutros, S., and J.
Rabadan, "Ethernet-Tree (E-Tree) 3006 Support in Ethernet VPN (EVPN) and Provider Backbone 3007 Bridging EVPN (PBB-EVPN)", RFC 8317, DOI 10.17487/RFC8317, 3008 January 2018. 3010 [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., 3011 Uttaro, J., and W. Henderickx, "A Network Virtualization 3012 Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, 3013 DOI 10.17487/RFC8365, March 2018. 3016 [RFC9136] Rabadan, J., Ed., Henderickx, W., Drake, J., Lin, W., and 3017 A. Sajassi, "IP Prefix Advertisement in Ethernet VPN 3018 (EVPN)", RFC 9136, DOI 10.17487/RFC9136, October 2021. 3021 22.3. URIs 3023 [1] https://tools.ietf.org/rfcdiff?url1=https://www.rfc- 3024 editor.org/rfc/rfc7432.txt&url2=https://www.ietf.org/archive/id/ 3025 draft-ietf-bess-rfc7432bis-02.txt 3027 Appendix A. Acknowledgments for This Document (2021) 3029 Appendix B. Contributors for This Document (2021) 3031 In addition to the authors listed on the front page, the following 3032 co-authors have also contributed to this document: 3034 Appendix C. Acknowledgments from the First Edition (2015) 3036 Special thanks to Yakov Rekhter for reviewing this document several 3037 times and providing valuable comments, and for his very engaging 3038 discussions on several topics of this document that helped shape this 3039 document. We would also like to thank Pedro Marques, Kaushik Ghosh, 3040 Nischal Sheth, Robert Raszuk, Amit Shukla, and Nadeem Mohammed for 3041 discussions that helped shape this document. We would also like to 3042 thank Han Nguyen for his comments and support of this work. We would 3043 also like to thank Steve Kensil and Reshad Rahman for their reviews. 3044 We would like to thank Jorge Rabadan for his contribution to 3045 Section 5 of this document. We would like to thank Thomas Morin for 3046 his review of this document and his contribution of Section 8.7. 3047 Many thanks to Jakob Heitz for his help to improve several sections 3048 of this document.
3050 We would also like to thank Clarence Filsfils, Dennis Cai, Quaizar 3051 Vohra, Kireeti Kompella, and Apurva Mehta for their contributions to 3052 this document. 3054 Last but not least, special thanks to Giles Heron (our WG chair) for 3055 his detailed review of this document in preparation for WG Last Call 3056 and for making many valuable suggestions. 3058 C.1. Contributors from the First Edition (2015) 3060 In addition to the authors listed on the front page, the following 3061 co-authors have also contributed to this document: 3063 Keyur Patel 3064 Samer Salam 3065 Sami Boutros 3066 Cisco 3068 Yakov Rekhter 3069 Ravi Shekhar 3070 Juniper Networks 3072 Florin Balus 3073 Nuage Networks 3075 C.2. Authors from the First Edition (2015) 3077 Original Authors: 3079 Ali Sajassi 3080 Cisco 3082 EMail: sajassi@cisco.com 3084 Rahul Aggarwal 3085 Arktan 3087 EMail: raggarwa_1@yahoo.com 3089 Nabil Bitar 3090 Verizon Communications 3092 EMail: nabil.n.bitar@verizon.com 3094 Aldrin Isaac 3095 Bloomberg 3097 EMail: aisaac71@bloomberg.net 3099 James Uttaro 3100 AT&T 3102 EMail: uttaro@att.com 3104 John Drake 3105 Juniper Networks 3107 EMail: jdrake@juniper.net 3109 Wim Henderickx 3110 Alcatel-Lucent 3112 EMail: wim.henderickx@alcatel-lucent.com 3114 Authors' Addresses 3116 Ali Sajassi 3117 Cisco 3119 Email: sajassi@cisco.com 3120 Luc Andre Burdet (editor) 3121 Cisco 3123 Email: lburdet@cisco.com 3125 John Drake 3126 Juniper 3128 Email: jdrake@juniper.net 3130 Jorge Rabadan 3131 Nokia 3133 Email: jorge.rabadan@nokia.com