| draft-ietf-bess-evpn-overlay-04.txt | draft-ietf-bess-evpn-overlay-12.txt | |||
|---|---|---|---|---|
| L2VPN Workgroup A. Sajassi (Editor) | BESS Workgroup A. Sajassi (Editor) | |||
| INTERNET-DRAFT Cisco | INTERNET-DRAFT Cisco | |||
| Intended Status: Standards Track J. Drake (Editor) | Intended Status: Standards Track J. Drake (Editor) | |||
| Juniper | Juniper | |||
| N. Bitar | N. Bitar | |||
| Nokia | Nokia | |||
| R. Shekhar | R. Shekhar | |||
| Juniper | Juniper | |||
| J. Uttaro | J. Uttaro | |||
| AT&T | AT&T | |||
| W. Henderickx | W. Henderickx | |||
| Nokia | Nokia | |||
| Expires: December 10, 2016 June 10, 2016 | Expires: August 9, 2018 February 9, 2018 | |||
| A Network Virtualization Overlay Solution using EVPN | A Network Virtualization Overlay Solution using EVPN | |||
| draft-ietf-bess-evpn-overlay-04 | draft-ietf-bess-evpn-overlay-12 | |||
| Abstract | Abstract | |||
| This document describes how Ethernet VPN (EVPN) [RFC7432] can be used | This document specifies how Ethernet VPN (EVPN) can be used as a | |||
| as an Network Virtualization Overlay (NVO) solution and explores the | Network Virtualization Overlay (NVO) solution and explores the | |||
| various tunnel encapsulation options over IP and their impact on the | various tunnel encapsulation options over IP and their impact on the | |||
| EVPN control-plane and procedures. In particular, the following | EVPN control-plane and procedures. In particular, the following | |||
| encapsulation options are analyzed: VXLAN, NVGRE, and MPLS over GRE. | encapsulation options are analyzed: Virtual Extensible LAN (VXLAN), | |||
| | Network Virtualization using Generic Routing Encapsulation (NVGRE), | |||
| | and MPLS over Generic Routing Encapsulation (GRE). This specification | |||
| | is also applicable to Generic Network Virtualization Encapsulation | |||
| | (GENEVE) encapsulation; however, some incremental work is required | |||
| | which will be covered in a separate document. This document also | |||
| | specifies new multi-homing procedures for split-horizon filtering and | |||
| | mass-withdraw. It also specifies EVPN route constructions for | |||
| | VXLAN/NVGRE encapsulations and Autonomous System Boundary Router | |||
| | (ASBR) procedures for multi-homing of Network Virtualization (NV) | |||
| | Edge devices. | |||
| Status of this Memo | Status of this Memo | |||
| This Internet-Draft is submitted to IETF in full conformance with the | This Internet-Draft is submitted to IETF in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as | other groups may also distribute working documents as | |||
| Internet-Drafts. | Internet-Drafts. | |||
| skipping to change at page 2, line 7 | skipping to change at page 2, line 18 | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/1id-abstracts.html | http://www.ietf.org/1id-abstracts.html | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html | http://www.ietf.org/shadow.html | |||
| Copyright and License Notice | Copyright and License Notice | |||
| Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 2 Specification of Requirements . . . . . . . . . . . . . . . . . 5 | 2 Requirements Notation and Conventions . . . . . . . . . . . . . 5 | |||
| 3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 4 EVPN Features . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 4 EVPN Features . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 5 Encapsulation Options for EVPN Overlays . . . . . . . . . . . . 7 | 5 Encapsulation Options for EVPN Overlays . . . . . . . . . . . . 8 | |||
| 5.1 VXLAN/NVGRE Encapsulation . . . . . . . . . . . . . . . . . 7 | 5.1 VXLAN/NVGRE Encapsulation . . . . . . . . . . . . . . . . . 8 | |||
| 5.1.1 Virtual Identifiers Scope . . . . . . . . . . . . . . . 8 | 5.1.1 Virtual Identifiers Scope . . . . . . . . . . . . . . . 9 | |||
| 5.1.1.1 Data Center Interconnect with Gateway . . . . . . . 8 | 5.1.1.1 Data Center Interconnect with Gateway . . . . . . . 9 | |||
| 5.1.1.2 Data Center Interconnect without Gateway . . . . . . 9 | 5.1.1.2 Data Center Interconnect without Gateway . . . . . . 9 | |||
| 5.1.2 Virtual Identifiers to EVI Mapping . . . . . . . . . . . 9 | 5.1.2 Virtual Identifiers to EVI Mapping . . . . . . . . . . . 10 | |||
| 5.1.2.1 Auto Derivation of RT . . . . . . . . . . . . . . . 10 | 5.1.2.1 Auto Derivation of RT . . . . . . . . . . . . . . . 11 | |||
| 5.1.3 Constructing EVPN BGP Routes . . . . . . . . . . . . . 11 | 5.1.3 Constructing EVPN BGP Routes . . . . . . . . . . . . . 13 | |||
| 5.2 MPLS over GRE . . . . . . . . . . . . . . . . . . . . . . . 13 | 5.2 MPLS over GRE . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 6 EVPN with Multiple Data Plane Encapsulations . . . . . . . . . 13 | 6 EVPN with Multiple Data Plane Encapsulations . . . . . . . . . 15 | |||
| 7 NVE Residing in Hypervisor . . . . . . . . . . . . . . . . . . 14 | 7 Single-Homing NVEs - NVE Residing in Hypervisor . . . . . . . . 15 | |||
| 7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE | 7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE | |||
| Encapsulation . . . . . . . . . . . . . . . . . . . . . . . 14 | Encapsulation . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation . . 15 | ||||
| 8 NVE Residing in ToR Switch . . . . . . . . . . . . . . . . . . 15 | ||||
| 8.1 EVPN Multi-Homing Features . . . . . . . . . . . . . . . . 16 | ||||
| 8.1.1 Multi-homed Ethernet Segment Auto-Discovery . . . . . . 16 | ||||
| 8.1.2 Fast Convergence and Mass Withdraw . . . . . . . . . . . 16 | ||||
| 8.1.3 Split-Horizon . . . . . . . . . . . . . . . . . . . . . 16 | ||||
| 8.1.4 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 17 | ||||
| 8.1.5 DF Election . . . . . . . . . . . . . . . . . . . . . . 17 | ||||
| 8.2 Impact on EVPN BGP Routes & Attributes . . . . . . . . . . . 18 | ||||
| 8.3 Impact on EVPN Procedures . . . . . . . . . . . . . . . . . 18 | ||||
| 8.3.1 Split Horizon . . . . . . . . . . . . . . . . . . . . . 19 | ||||
| 8.3.2 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 19 | ||||
| 9 Support for Multicast . . . . . . . . . . . . . . . . . . . . . 20 | 7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation . . 16 | |||
| 10 Data Center Interconnections - DCI . . . . . . . . . . . . . . 20 | 8 Multi-Homing NVEs - NVE Residing in ToR Switch . . . . . . . . 17 | |||
| 10.1 DCI using GWs . . . . . . . . . . . . . . . . . . . . . . . 21 | 8.1 EVPN Multi-Homing Features . . . . . . . . . . . . . . . . 17 | |||
| 10.2 DCI using ASBRs . . . . . . . . . . . . . . . . . . . . . . 21 | 8.1.1 Multi-homed Ethernet Segment Auto-Discovery . . . . . . 18 | |||
| 10.2.1 ASBR Functionality with NVEs in Hypervisors . . . . . . 22 | 8.1.2 Fast Convergence and Mass Withdraw . . . . . . . . . . . 18 | |||
| 10.2.2 ASBR Functionality with NVEs in TORs . . . . . . . . . 22 | 8.1.3 Split-Horizon . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 11 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 24 | 8.1.4 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 18 | |||
| 12 Security Considerations . . . . . . . . . . . . . . . . . . . 24 | 8.1.5 DF Election . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 13 IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 | 8.2 Impact on EVPN BGP Routes & Attributes . . . . . . . . . . . 20 | |||
| 14 References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 8.3 Impact on EVPN Procedures . . . . . . . . . . . . . . . . . 20 | |||
| 14.1 Normative References . . . . . . . . . . . . . . . . . . . 25 | 8.3.1 Split Horizon . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 14.2 Informative References . . . . . . . . . . . . . . . . . . 26 | 8.3.2 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 21 | |||
| Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 | 8.3.3 Unknown Unicast Traffic Designation . . . . . . . . . . 21 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 | 9 Support for Multicast . . . . . . . . . . . . . . . . . . . . . 22 | |||
| | 10 Data Center Interconnections - DCI . . . . . . . . . . . . . . 23 | |||
| | 10.1 DCI using GWs . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| | 10.2 DCI using ASBRs . . . . . . . . . . . . . . . . . . . . . . 24 | |||
| | 10.2.1 ASBR Functionality with Single-Homing NVEs . . . . . . 25 | |||
| | 10.2.2 ASBR Functionality with Multi-Homing NVEs . . . . . . . 25 | |||
| | 11 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
| | 12 Security Considerations . . . . . . . . . . . . . . . . . . . 27 | |||
| | 13 IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 | |||
| | 14 References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| | 14.1 Normative References . . . . . . . . . . . . . . . . . . . 28 | |||
| | 14.2 Informative References . . . . . . . . . . . . . . . . . . 29 | |||
| | Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
| | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
| 1 Introduction | 1 Introduction | |||
| | This document specifies how Ethernet VPN (EVPN) [RFC7432] can be used | |||
| | as a Network Virtualization Overlay (NVO) solution and explores the | |||
| | various tunnel encapsulation options over IP and their impact on the | |||
| | EVPN control-plane and procedures. In particular, the following | |||
| | encapsulation options are analyzed: Virtual Extensible LAN (VXLAN) | |||
| | [RFC7348], Network Virtualization using Generic Routing Encapsulation | |||
| | (NVGRE) [RFC7637], and MPLS over Generic Routing Encapsulation (GRE) | |||
| | [RFC4023]. This specification is also applicable to Generic Network | |||
| | Virtualization Encapsulation (GENEVE) encapsulation [GENEVE]; | |||
| | however, some incremental work is required which will be covered in a | |||
| | separate document [EVPN-GENEVE]. This document also specifies new | |||
| | multi-homing procedures for split-horizon filtering and mass- | |||
| | withdraw. It also specifies EVPN route constructions for VXLAN/NVGRE | |||
| | encapsulations and Autonomous System Boundary Router (ASBR) | |||
| | procedures for multi-homing of Network Virtualization (NV) Edge | |||
| | devices. | |||
| In the context of this document, a Network Virtualization Overlay | In the context of this document, a Network Virtualization Overlay | |||
| (NVO) is a solution to address the requirements of a multi-tenant | (NVO) is a solution to address the requirements of a multi-tenant | |||
| data center, especially one with virtualized hosts, e.g., Virtual | data center, especially one with virtualized hosts, e.g., Virtual | |||
| Machines (VMs). The key requirements of such a solution, as described | Machines (VMs) or virtual workloads. The key requirements of such a | |||
| in [Problem-Statement], are: | solution, as described in [RFC7364], are: | |||
| - Isolation of network traffic per tenant | - Isolation of network traffic per tenant | |||
| - Support for a large number of tenants (tens or hundreds of | - Support for a large number of tenants (tens or hundreds of | |||
| thousands) | thousands) | |||
| - Extending L2 connectivity among different VMs belonging to a given | - Extending L2 connectivity among different VMs belonging to a given | |||
| tenant segment (subnet) across different PODs within a data center or | tenant segment (subnet) across different Point of Deliveries (PODs) | |||
| between different data centers | within a data center or between different data centers | |||
| - Allowing a given VM to move between different physical points of | - Allowing a given VM to move between different physical points of | |||
| attachment within a given L2 segment | attachment within a given L2 segment | |||
| The underlay network for NVO solutions is assumed to provide IP | The underlay network for NVO solutions is assumed to provide IP | |||
| connectivity between NVO endpoints (NVEs). | connectivity between NVO endpoints (NVEs). | |||
| This document describes how Ethernet VPN (EVPN) can be used as an NVO | This document describes how Ethernet VPN (EVPN) can be used as an NVO | |||
| solution and explores applicability of EVPN functions and procedures. | solution and explores applicability of EVPN functions and procedures. | |||
| In particular, it describes the various tunnel encapsulation options | In particular, it describes the various tunnel encapsulation options | |||
| for EVPN over IP, and their impact on the EVPN control-plane and | for EVPN over IP, and their impact on the EVPN control-plane and | |||
| procedures for two main scenarios: | procedures for two main scenarios: | |||
| a) when the NVE resides in the hypervisor, and | a) single-homing NVEs - when a NVE resides in the hypervisor, and | |||
| b) when the NVE resides in a Top of Rack (ToR) device | b) multi-homing NVEs - when a NVE resides in a Top of Rack (ToR) | |||
| | device | |||
| Note that the use of EVPN as an NVO solution does not necessarily | ||||
| mandate that the BGP control-plane be running on the NVE. For such | ||||
| scenarios, it is still possible to leverage the EVPN solution by | ||||
| using XMPP, or alternative mechanisms, to extend the control-plane to | ||||
| the NVE as discussed in [L3VPN-ENDSYSTEMS]. | ||||
| The possible encapsulation options for EVPN overlays that are | The possible encapsulation options for EVPN overlays that are | |||
| analyzed in this document are: | analyzed in this document are: | |||
| - VXLAN and NVGRE | - VXLAN and NVGRE | |||
| - MPLS over GRE | - MPLS over GRE | |||
| Before getting into the description of the different encapsulation | Before getting into the description of the different encapsulation | |||
| options for EVPN over IP, it is important to highlight the EVPN | options for EVPN over IP, it is important to highlight the EVPN | |||
| solution's main features, how those features are currently supported, | solution's main features, how those features are currently supported, | |||
| and any impact that the encapsulation has on those features. | and any impact that the encapsulation has on those features. | |||
| 2 Specification of Requirements | 2 Requirements Notation and Conventions | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| document are to be interpreted as described in [RFC2119]. | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| | capitals, as shown here. | |||
| 3 Terminology | 3 Terminology | |||
| | Most of the terminology used in this documents comes from [RFC7432] | |||
| | and [RFC7365]. | |||
| | VXLAN: Virtual Extensible LAN | |||
| | GRE: Generic Routing Encapsulation | |||
| | NVGRE: Network Virtualization using Generic Routing Encapsulation | |||
| | GENEVE: Generic Network Virtualization Encapsulation | |||
| | POD: Point of Delivery | |||
| | NV: Network Virtualization | |||
| NVO: Network Virtualization Overlay | NVO: Network Virtualization Overlay | |||
| NVE: Network Virtualization Endpoint | NVE: Network Virtualization Endpoint | |||
| VNI: Virtual Network Identifier (for VXLAN) | VNI: Virtual Network Identifier (for VXLAN) | |||
| VSID: Virtual Subnet Identifier (for NVGRE) | VSID: Virtual Subnet Identifier (for NVGRE) | |||
| EVPN: Ethernet VPN | EVPN: Ethernet VPN | |||
| EVI: An EVPN instance spanning the Provider Edge (PE) devices | EVI: An EVPN instance spanning the Provider Edge (PE) devices | |||
| participating in that EVPN. | participating in that EVPN | |||
| MAC-VRF: A Virtual Routing and Forwarding table for Media Access | MAC-VRF: A Virtual Routing and Forwarding table for Media Access | |||
| Control (MAC) addresses on a PE. | Control (MAC) addresses on a PE | |||
| | IP-VRF: A Virtual Routing and Forwarding table for Internet Protocol | |||
| | (IP) addresses on a PE | |||
| Ethernet Segment (ES): When a customer site (device or network) is | Ethernet Segment (ES): When a customer site (device or network) is | |||
| connected to one or more PEs via a set of Ethernet links, then that | connected to one or more PEs via a set of Ethernet links, then that | |||
| set of links is referred to as an 'Ethernet segment'. | set of links is referred to as an 'Ethernet segment'. | |||
| Ethernet Segment Identifier (ESI): A unique non-zero identifier that | Ethernet Segment Identifier (ESI): A unique non-zero identifier that | |||
| identifies an Ethernet segment is called an 'Ethernet Segment | identifies an Ethernet segment is called an 'Ethernet Segment | |||
| Identifier'. | Identifier'. | |||
| Ethernet Tag: An Ethernet tag identifies a particular broadcast | Ethernet Tag: An Ethernet tag identifies a particular broadcast | |||
| skipping to change at page 6, line 8 | skipping to change at page 6, line 41 | |||
| Single-Active Redundancy Mode: When only a single PE, among all the | Single-Active Redundancy Mode: When only a single PE, among all the | |||
| PEs attached to an Ethernet segment, is allowed to forward traffic | PEs attached to an Ethernet segment, is allowed to forward traffic | |||
| to/from that Ethernet segment for a given VLAN, then the Ethernet | to/from that Ethernet segment for a given VLAN, then the Ethernet | |||
| segment is defined to be operating in Single-Active redundancy mode. | segment is defined to be operating in Single-Active redundancy mode. | |||
| All-Active Redundancy Mode: When all PEs attached to an Ethernet | All-Active Redundancy Mode: When all PEs attached to an Ethernet | |||
| segment are allowed to forward known unicast traffic to/from that | segment are allowed to forward known unicast traffic to/from that | |||
| Ethernet segment for a given VLAN, then the Ethernet segment is | Ethernet segment for a given VLAN, then the Ethernet segment is | |||
| defined to be operating in All-Active redundancy mode. | defined to be operating in All-Active redundancy mode. | |||
| | PIM-SM: Protocol Independent Multicast - Sparse-Mode | |||
| | PIM-SSM: Protocol Independent Multicast - Source Specific Multicast | |||
| | Bidir PIM: Bidirectional PIM | |||
| 4 EVPN Features | 4 EVPN Features | |||
| EVPN was originally designed to support the requirements detailed in | EVPN [RFC7432] was originally designed to support the requirements | |||
| [RFC7209] and therefore has the following attributes which directly | detailed in [RFC7209] and therefore has the following attributes | |||
| address control plane scaling and ease of deployment issues. | which directly address control plane scaling and ease of deployment | |||
| | issues. | |||
| 1) Control plane traffic is distributed with BGP and Broadcast and | 1) Control plane information is distributed with BGP and Broadcast | |||
| Multicast traffic is sent using a shared multicast tree or with | and Multicast traffic is sent using a shared multicast tree or with | |||
| ingress replication. | ingress replication. | |||
| 2) Control plane learning is used for MAC (and IP) addresses instead | 2) Control plane learning is used for MAC (and IP) addresses instead | |||
| of data plane learning. The latter requires the flooding of unknown | of data plane learning. The latter requires the flooding of unknown | |||
| unicast and ARP frames; whereas, the former does not require any | unicast and Address Resolution Protocol (ARP) frames; whereas, the | |||
| flooding. | former does not require any flooding. | |||
| 3) Route Reflector is used to reduce a full mesh of BGP sessions | 3) Route Reflector (RR) is used to reduce a full mesh of BGP sessions | |||
| among PE devices to a single BGP session between a PE and the RR. | among PE devices to a single BGP session between a PE and the RR. | |||
| Furthermore, RR hierarchy can be leveraged to scale the number of BGP | Furthermore, RR hierarchy can be leveraged to scale the number of BGP | |||
| routes on the RR. | routes on the RR. | |||
| 4) Auto-discovery via BGP is used to discover PE devices | 4) Auto-discovery via BGP is used to discover PE devices | |||
| participating in a given VPN, PE devices participating in a given | participating in a given VPN, PE devices participating in a given | |||
| redundancy group, tunnel encapsulation types, multicast tunnel type, | redundancy group, tunnel encapsulation types, multicast tunnel type, | |||
| multicast members, etc. | multicast members, etc. | |||
| 5) All-Active multihoming is used. This allows a given customer | 5) All-Active multihoming is used. This allows a given customer | |||
| device (CE) to have multiple links to multiple PEs, and traffic | device (CE) to have multiple links to multiple PEs, and traffic | |||
| to/from that CE fully utilizes all of these links. This set of links | to/from that CE fully utilizes all of these links. | |||
| is termed an Ethernet Segment (ES). | ||||
| 6) When a link between a CE and a PE fails, the PEs for that EVI are | 6) When a link between a CE and a PE fails, the PEs for that EVI are | |||
| notified of the failure via the withdrawal of a single EVPN route. | notified of the failure via the withdrawal of a single EVPN route. | |||
| This allows those PEs to remove the withdrawing PE as a next hop for | This allows those PEs to remove the withdrawing PE as a next hop for | |||
| every MAC address associated with the failed link. This is termed | every MAC address associated with the failed link. This is termed | |||
| 'mass withdrawal'. | 'mass withdrawal'. | |||
| 7) BGP route filtering and constrained route distribution are | 7) BGP route filtering and constrained route distribution are | |||
| leveraged to ensure that the control plane traffic for a given EVI is | leveraged to ensure that the control plane traffic for a given EVI is | |||
| only distributed to the PEs in that EVI. | only distributed to the PEs in that EVI. | |||
| skipping to change at page 7, line 16 | skipping to change at page 8, line 6 | |||
| 9) VM Mobility mechanisms ensure that all PEs in a given EVI know | 9) VM Mobility mechanisms ensure that all PEs in a given EVI know | |||
| the ES with which a given VM, as identified by its MAC and IP | the ES with which a given VM, as identified by its MAC and IP | |||
| addresses, is currently associated. | addresses, is currently associated. | |||
| 10) Route Targets are used to allow the operator (or customer) to | 10) Route Targets are used to allow the operator (or customer) to | |||
| define a spectrum of logical network topologies including mesh, hub & | define a spectrum of logical network topologies including mesh, hub & | |||
| spoke, and extranets (e.g., a VPN whose sites are owned by different | spoke, and extranets (e.g., a VPN whose sites are owned by different | |||
| enterprises), without the need for proprietary software or the aid of | enterprises), without the need for proprietary software or the aid of | |||
| other virtual or physical devices. | other virtual or physical devices. | |||
| 11) Because the design goal for NVO is millions of instances per | Because the design goal for NVO is millions of instances per common | |||
| common physical infrastructure, the scaling properties of the control | physical infrastructure, the scaling properties of the control plane | |||
| plane for NVO are extremely important. EVPN and the extensions | for NVO are extremely important. EVPN and the extensions described | |||
| described herein, are designed with this level of scalability in | herein, are designed with this level of scalability in mind. | |||
| mind. | ||||
| 5 Encapsulation Options for EVPN Overlays | 5 Encapsulation Options for EVPN Overlays | |||
| 5.1 VXLAN/NVGRE Encapsulation | 5.1 VXLAN/NVGRE Encapsulation | |||
| Both VXLAN and NVGRE are examples of technologies that provide a data | Both VXLAN and NVGRE are examples of technologies that provide a data | |||
| plane encapsulation which is used to transport a packet over the | plane encapsulation which is used to transport a packet over the | |||
| common physical IP infrastructure between Network Virtualization | common physical IP infrastructure between Network Virtualization | |||
| Edges (NVEs) - e.g., VXLAN Tunnel End Points (VTEPs) in VXLAN | Edges (NVEs) - e.g., VXLAN Tunnel End Points (VTEPs) in VXLAN | |||
| network. Both of these technologies include the identifier of the | network. Both of these technologies include the identifier of the | |||
| skipping to change at page 8, line 9 | skipping to change at page 8, line 45 | |||
| VLAN ID gets mapped to an EVPN instance (EVI). | VLAN ID gets mapped to an EVPN instance (EVI). | |||
| VXLAN also provides an option of including an inner VLAN tag in the | VXLAN also provides an option of including an inner VLAN tag in the | |||
| encapsulated frame, if explicitly configured at the VTEP. This mode | encapsulated frame, if explicitly configured at the VTEP. This mode | |||
| of operation can map to VLAN Bundle Service in [RFC7432] because all | of operation can map to VLAN Bundle Service in [RFC7432] because all | |||
| the tenant's tagged frames map to a single bridge table / MAC-VRF, | the tenant's tagged frames map to a single bridge table / MAC-VRF, | |||
| and the inner VLAN tag is not used for lookup by the disposition PE | and the inner VLAN tag is not used for lookup by the disposition PE | |||
| when performing VXLAN decapsulation as described in section 6 of | when performing VXLAN decapsulation as described in section 6 of | |||
| [RFC7348]. | [RFC7348]. | |||
| [NVGRE] encapsulation is based on [GRE] and it mandates the inclusion | [RFC7637] encapsulation is based on GRE encapsulation and it mandates | |||
| of the optional GRE Key field which carries the VSID. There is a one- | the inclusion of the optional GRE Key field which carries the VSID. | |||
| to-one mapping between the VSID and the tenant VLAN ID, as described | There is a one-to-one mapping between the VSID and the tenant VLAN | |||
| in [NVGRE] and the inclusion of an inner VLAN tag is prohibited. This | ID, as described in [RFC7637] and the inclusion of an inner VLAN tag | |||
| mode of operation in [NVGRE] maps to VLAN Based Service in | is prohibited. This mode of operation in [RFC7637] maps to VLAN Based | |||
| [RFC7432]. | Service in [RFC7432]. | |||
| As described in the next section there is no change to the encoding | As described in the next section there is no change to the encoding | |||
| of EVPN routes to support VXLAN or NVGRE encapsulation except for the | of EVPN routes to support VXLAN or NVGRE encapsulation except for the | |||
| use of BGP Encapsulation extended community to indicate the | use of the BGP Encapsulation extended community to indicate the | |||
| encapsulation type (e.g., VxLAN or NVGRE). However, there is | encapsulation type (e.g., VXLAN or NVGRE). However, there is | |||
| potential impact to the EVPN procedures depending on where the NVE is | potential impact to the EVPN procedures depending on where the NVE is | |||
| located (i.e., in hypervisor or TOR) and whether multi-homing | located (i.e., in hypervisor or TOR) and whether multi-homing | |||
| capabilities are required. | capabilities are required. | |||
| 5.1.1 Virtual Identifiers Scope | 5.1.1 Virtual Identifiers Scope | |||
| Although VNIs are defined as 24-bit globally unique values, there are | Although VNIs are defined as 24-bit globally unique values, there are | |||
| scenarios in which it is desirable to use a locally significant value | scenarios in which it is desirable to use a locally significant value | |||
| for VNI, especially in the context of data center interconnect: | for VNI, especially in the context of data center interconnect: | |||
| skipping to change at page 9, line 49 | skipping to change at page 10, line 28 | |||
| |NVE2|--| | | | | |--|NVE4| | |NVE2|--| | | | | |--|NVE4| | |||
| +----+ +---------+ +--------------+ +---------+ +----+ | +----+ +---------+ +--------------+ +---------+ +----+ | |||
| |<------ DC 1 -----> <---- DC2 ------>| | |<------ DC 1 -----> <---- DC2 ------>| | |||
| Figure 2: Data Center Interconnect with ASBR | Figure 2: Data Center Interconnect with ASBR | |||
| 5.1.2 Virtual Identifiers to EVI Mapping | 5.1.2 Virtual Identifiers to EVI Mapping | |||
| When the EVPN control plane is used in conjunction with VXLAN (or | When the EVPN control plane is used in conjunction with VXLAN (or | |||
| NVGRE encapsulation), two options for mapping the VXLAN VNI (or NVGRE | NVGRE encapsulation), just like [RFC7432] where two options existed | |||
| VSID) to an EVI are possible: | for mapping broadcast domains (represented by VLAN IDs) to an EVI, in | |||
| | here there are also two options for mapping broadcast domains | |||
| | represented by VXLAN VNIs (or NVGRE VSIDs) to an EVI: | |||
| 1. Option 1: Single Subnet per EVI | 1. Option 1: Single Broadcast Domain per EVI | |||
| In this option, a single subnet represented by a VNI is mapped to a | In this option, a single Ethernet broadcast domain (e.g., subnet) | |||
| unique EVI. This corresponds to the VLAN Based service in [RFC7432], | represented by a VNI is mapped to a unique EVI. This corresponds to | |||
| where a tenant VLAN ID gets mapped to an EVPN instance (EVI). As | the VLAN Based service in [RFC7432], where a tenant-facing interface, | |||
| such, a BGP RD and RT is needed per VNI on every NVE. The advantage | logical interface (e.g., represented by a VLAN ID) or physical, gets | |||
| of this model is that it allows the BGP RT constraint mechanisms to | mapped to an EVPN instance (EVI). As such, a BGP RD and RT are needed | |||
| be used in order to limit the propagation and import of routes to | per VNI on every NVE. The advantage of this model is that it allows | |||
| only the NVEs that are interested in a given VNI. The disadvantage of | the BGP RT constraint mechanisms to be used in order to limit the | |||
| this model may be the provisioning overhead if RD and RT are not | propagation and import of routes to only the NVEs that are interested | |||
| derived automatically from VNI. | in a given VNI. The disadvantage of this model may be the | |||
| provisioning overhead if RD and RT are not derived automatically from | ||||
| VNI. | ||||
| In this option, the MAC-VRF table is identified by the RT in the | In this option, the MAC-VRF table is identified by the RT in the | |||
| control plane and by the VNI in the data-plane. In this option, the | control plane and by the VNI in the data-plane. In this option, the | |||
| specific the MAC-VRF table corresponds to only a single bridge table. | specific MAC-VRF table corresponds to only a single bridge table. | |||
| 2. Option 2: Multiple Subnets per EVI | 2. Option 2: Multiple Broadcast Domains per EVI | |||
| In this option, multiple subnets each represented by a unique VNI are | In this option, multiple subnets each represented by a unique VNI are | |||
| mapped to a single EVI. For example, if a tenant has multiple | mapped to a single EVI. For example, if a tenant has multiple | |||
| segments/subnets each represented by a VNI, then all the VNIs for | segments/subnets each represented by a VNI, then all the VNIs for | |||
| that tenant are mapped to a single EVI - e.g., the EVI in this case | that tenant are mapped to a single EVI - e.g., the EVI in this case | |||
| represents the tenant and not a subnet. This corresponds to the | represents the tenant and not a subnet. This corresponds to the | |||
| VLAN-aware bundle service in [RFC7432]. The advantage of this model | VLAN-aware bundle service in [RFC7432]. The advantage of this model | |||
| is that it doesn't require the provisioning of RD/RT per VNI. | is that it doesn't require the provisioning of RD/RT per VNI. | |||
| However, this is a moot point if option 1 with auto-derivation is | However, this is a moot point when compared to option 1 where auto- | |||
| used. The disadvantage of this model is that routes would be imported | derivation is used. The disadvantage of this model is that routes | |||
| by NVEs that may not be interested in a given VNI. | would be imported by NVEs that may not be interested in a given VNI. | |||
| In this option the MAC-VRF table is identified by the RT in the | In this option the MAC-VRF table is identified by the RT in the | |||
| control plane and a specific bridge table for that MAC-VRF is | control plane and a specific bridge table for that MAC-VRF is | |||
| identified by the <RT, Ethernet Tag ID> in the control plane. In this | identified by the <RT, Ethernet Tag ID> in the control plane. In this | |||
| option, the VNI in the data-plane is sufficient to identify a | option, the VNI in the data-plane is sufficient to identify a | |||
| specific bridge table - e.g., no need to do a lookup based on VNI and | specific bridge table. | |||
| Ethernet Tag ID fields to identify a bridge table. | ||||
| 5.1.2.1 Auto Derivation of RT | 5.1.2.1 Auto Derivation of RT | |||
| When the option of a single VNI per EVI is used, it is important to | When the option of a single VNI per EVI is used, in order to simplify | |||
| auto-derive RT for EVPN BGP routes in order to simplify configuration | configuration, the RT used for EVPN can be auto-derived. RD can be | |||
| for data center operations. RD can be auto generated as described in | auto generated as described in [RFC7432] and RT can be auto-derived | |||
| [RFC7432] and RT can be auto-derived as described next. | as described next. | |||
| Since a gateway PE as depicted in figure-1 participates in both the | Since a gateway PE as depicted in figure-1 participates in both the | |||
| DCN and WAN BGP sessions, it is important that when RT values are | DCN and WAN BGP sessions, it is important that when RT values are | |||
| auto-derived for VNIs, there is no conflict in RT spaces between DCN | auto-derived from VNIs, there is no conflict in RT spaces between DCN | |||
| and WAN networks assuming that both are operating within the same AS. | and WAN networks assuming that both are operating within the same AS. | |||
| Also, there can be scenarios where both VXLAN and NVGRE | Also, there can be scenarios where both VXLAN and NVGRE | |||
| encapsulations may be needed within the same DCN and their | encapsulations may be needed within the same DCN and their | |||
| corresponding VNIs are administered independently which means VNI | corresponding VNIs are administered independently which means VNI | |||
| spaces can overlap. In order to ensure that no such conflict in RT | spaces can overlap. In order to avoid conflicts in RT spaces, | |||
| spaces arises, RT values for DCNs are auto-derived as follows: | the 6-byte RT values with 2-octet AS number for DCNs can be auto- | |||
| derived as follows: | ||||
| 0 1 2 3 4 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 0 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | AS # |A| TYPE| D-ID |Service Instance ID| | | Global Administrator | Local Administrator | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+ | +-----------------------------------------------+---------------+ | |||
| | Local Administrator (Cont.) | | ||||
| +-------------------------------+ | ||||
| - 2 bytes of global admin field of the RT is set to the AS number. | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Global Administrator |A| TYPE| D-ID | Service ID | | ||||
| +-----------------------------------------------+---------------+ | ||||
| | Service ID (Cont.) | | ||||
| +-------------------------------+ | ||||
| - Three least significant bytes of the local admin field of the RT is | The 6-octet RT field consists of two sub-fields: | |||
| set to the VNI, VSID, I-SID, or VID. | ||||
| - The most significant bit of the local admin field of the RT is set | - Global Administrator sub-field: 2 octets. This sub-field contains | |||
| as follow: | an Autonomous System number assigned by IANA. | |||
| 0: auto-derived | ||||
| 1: manually-derived | ||||
| - The next 3 bits of the most significant byte of the local admin | - Local Administrator sub-field: 4 octets | |||
| field of the RT identifies the space in which the other 3 bytes are | ||||
| defined. The following spaces are defined: | ||||
| 0 : VID | ||||
| 1 : VXLAN | ||||
| 2 : NVGRE | ||||
| 3 : I-SID | ||||
| 4 : EVI | ||||
| 5 : dual-VID | ||||
| - The remaining 4 bits of the most significant byte of the local | * A: A single-bit field indicating if this RT is auto-derived | |||
| admin field of the RT identify the domain-id. The default value of | ||||
| domain-id is zero, indicating that only a single numbering space exists | 0: auto-derived | |||
| for a given technology. However, if more than one number | 1: manually-derived | |||
| space exists for a given technology (e.g., overlapping VXLAN spaces), | ||||
| then each of the number spaces needs to be identified by its | * Type: A 3-bit field that identifies the space in which | |||
| corresponding domain-id starting from 1. | the other 3 bytes are defined. The following spaces are | |||
| defined: | ||||
| 0 : VID (802.1Q VLAN ID) | ||||
| 1 : VXLAN | ||||
| 2 : NVGRE | ||||
| 3 : I-SID | ||||
| 4 : EVI | ||||
| 5 : dual-VID (QinQ VLAN ID) | ||||
| * D-ID: A 4-bit field that identifies the domain-id. The default | ||||
| value of domain-id is zero, indicating that only a single | ||||
| numbering space exists for a given technology. However, if | ||||
| more than one number space exists for a given | ||||
| technology (e.g., overlapping VXLAN spaces), then each of | ||||
| the number spaces needs to be identified by its | ||||
| corresponding domain-id starting from 1. | ||||
| * Service ID: This 3-octet field is set to VNI, VSID, I-SID, | ||||
| or VID. | ||||
| It should be noted that RT auto-derivation is applicable only to 2-octet | ||||
| AS numbers. For 4-octet AS numbers, the RT needs to be manually | ||||
| configured, since the 3-octet VNI field cannot fit within the 2-octet | ||||
| Local Administrator field. | ||||
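The field layout described above can be made concrete with a short sketch. It packs the 6-octet RT value (2-octet Global Administrator followed by the 4-octet Local Administrator holding A, Type, D-ID, and Service ID). The function name `derive_rt`, its parameters, and the `RT_TYPE` table name are hypothetical and chosen only for illustration; the Type code points and bit widths are taken from the list above.

```python
import struct

# Type code points as listed in the text above.
RT_TYPE = {"VID": 0, "VXLAN": 1, "NVGRE": 2, "I-SID": 3, "EVI": 4, "dual-VID": 5}


def derive_rt(asn, space, service_id, domain_id=0, auto=True):
    """Return the 6-octet RT value: Global Admin (2) + Local Admin (4).

    Hypothetical helper, not part of any draft or library.
    """
    if not 0 <= asn <= 0xFFFF:
        # Auto-derivation only applies to 2-octet AS numbers.
        raise ValueError("auto-derivation requires a 2-octet AS number")
    if not 0 <= service_id <= 0xFFFFFF:
        raise ValueError("Service ID must fit in 3 octets")
    if not 0 <= domain_id <= 0xF:
        raise ValueError("D-ID must fit in 4 bits")

    a_bit = 0 if auto else 1  # 0: auto-derived, 1: manually-derived
    local_admin = (
        (a_bit << 31)             # A: 1 bit
        | (RT_TYPE[space] << 28)  # Type: 3 bits
        | (domain_id << 24)       # D-ID: 4 bits
        | service_id              # Service ID: 24 bits
    )
    return struct.pack("!HI", asn, local_admin)


# Example: AS 65000, VXLAN space, VNI 10010, default domain.
print(derive_rt(65000, "VXLAN", 10010).hex())
```

Because the Type and D-ID bits sit above the Service ID, a VXLAN VNI and an NVGRE VSID with the same numeric value yield different RTs, which is the conflict the section sets out to avoid.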
| 5.1.3 Constructing EVPN BGP Routes | 5.1.3 Constructing EVPN BGP Routes | |||
| In EVPN, an MPLS label is distributed by the egress PE via the EVPN | In EVPN, an MPLS label, for instance one identifying a forwarding table, is | |||
| control plane and is placed in the MPLS header of a given packet by | distributed by the egress PE via the EVPN control plane and is placed | |||
| the ingress PE. This label is used upon receipt of that packet by the | in the MPLS header of a given packet by the ingress PE. This label is | |||
| egress PE for disposition of that packet. This is very similar to the | used upon receipt of that packet by the egress PE for disposition of | |||
| use of the VNI by the egress NVE, with the difference being that an | that packet. This is very similar to the use of the VNI by the egress | |||
| MPLS label has local significance while a VNI typically has global | NVE, with the difference being that an MPLS label has local | |||
| significance. Accordingly, and specifically to support the option of | significance while a VNI typically has global significance. | |||
| locally assigned VNIs, the MPLS label field in the MAC Advertisement, | Accordingly, and specifically to support the option of locally- | |||
| Ethernet AD per EVI, and Inclusive Multicast Ethernet Tag routes is | assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement | |||
| used to carry the VNI. For the balance of this memo, the MPLS label | route, the MPLS label field in the Ethernet AD per EVI route, and the | |||
| field will be referred to as the VNI field. The VNI field is used for | MPLS label field in the PMSI Tunnel Attribute of the Inclusive | |||
| both local and global VNIs, and for either case the entire 24-bit | Multicast Ethernet Tag (IMET) route are used to carry the VNI. For | |||
| field is used to encode the VNI value. | the balance of this memo, the above MPLS label fields will be | |||
| referred to as the VNI field. The VNI field is used for both local | ||||
| and global VNIs, and for either case the entire 24-bit field is used | ||||
| to encode the VNI value. | ||||
| For the VLAN-based service (a single VNI per MAC-VRF), the Ethernet | For the VLAN-based service (a single VNI per MAC-VRF), the Ethernet | |||
| Tag field in the MAC/IP Advertisement, Ethernet AD per EVI, and | Tag field in the MAC/IP Advertisement, Ethernet AD per EVI, and IMET | |||
| Inclusive Multicast route MUST be set to zero just as in the VLAN | route MUST be set to zero just as in the VLAN Based service in | |||
| Based service in [RFC7432]. | [RFC7432]. | |||
| For the VLAN-aware bundle service (multiple VNIs per MAC-VRF with | For the VLAN-aware bundle service (multiple VNIs per MAC-VRF with | |||
| each VNI associated with its own bridge table), the Ethernet Tag | each VNI associated with its own bridge table), the Ethernet Tag | |||
| field in the MAC Advertisement, Ethernet AD per EVI, and Inclusive | field in the MAC Advertisement, Ethernet AD per EVI, and IMET route | |||
| Multicast route MUST identify a bridge table within a MAC-VRF and the | MUST identify a bridge table within a MAC-VRF and the set of Ethernet | |||
| set of Ethernet Tags for that EVI needs to be configured consistently | Tags for that EVI needs to be configured consistently on all PEs | |||
| on all PEs within that EVI. For local VNIs, the value advertised in | within that EVI. For locally-assigned VNIs, the value advertised in | |||
| the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware | the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware | |||
| bundle service in [RFC7432]. Such setting must be done consistently | bundle service in [RFC7432]. Such setting must be done consistently | |||
| on all PE devices participating in that EVI within a given domain. | on all PE devices participating in that EVI within a given domain. | |||
| For global VNIs, the value advertised in the Ethernet Tag field | For global VNIs, the value advertised in the Ethernet Tag field | |||
| SHOULD be set to a VNI as long as it matches the existing semantics | SHOULD be set to a VNI as long as it matches the existing semantics | |||
| of the Ethernet Tag, i.e., it identifies a bridge table within a MAC- | of the Ethernet Tag, i.e., it identifies a bridge table within a MAC- | |||
| VRF and the set of VNIs are configured consistently on each PE in | VRF and the set of VNIs are configured consistently on each PE in | |||
| that EVI. | that EVI. | |||
| In order to indicate that which type of data plane encapsulation | In order to indicate which type of data plane encapsulation (i.e., | |||
| (i.e., VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP | VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP | |||
| Encapsulation extended community defined in [TUNNEL-ENCAP] and | Encapsulation extended community defined in [RFC5512] is included | |||
| [RFC5512] is included with all EVPN routes (i.e. MAC Advertisement, | with all EVPN routes (i.e. MAC Advertisement, Ethernet AD per EVI, | |||
| Ethernet AD per EVI, Ethernet AD per ESI, Inclusive Multicast | Ethernet AD per ESI, Inclusive Multicast Ethernet Tag, and Ethernet | |||
| Ethernet Tag, and Ethernet Segment) advertised by an egress PE. Five | Segment) advertised by an egress PE. Five new values have been | |||
| new values have been assigned by IANA to extend the list of | assigned by IANA to extend the list of encapsulation types defined in | |||
| encapsulation types defined in [TUNNEL-ENCAP] and they are listed in | [RFC5512] and they are listed in section 13. | |||
| section 13. | ||||
| The MPLS encapsulation tunnel type, listed in section 13, is needed | The MPLS encapsulation tunnel type, listed in section 13, is needed | |||
| in order to distinguish between an advertising node that only | in order to distinguish between an advertising node that only | |||
| supports non-MPLS encapsulations and one that supports MPLS and non- | supports non-MPLS encapsulations and one that supports MPLS and non- | |||
| MPLS encapsulations. An advertising node that only supports MPLS | MPLS encapsulations. An advertising node that only supports MPLS | |||
| encapsulation does not need to advertise any encapsulation tunnel | encapsulation does not need to advertise any encapsulation tunnel | |||
| types; i.e., if the BGP Encapsulation extended community is not | types; i.e., if the BGP Encapsulation extended community is not | |||
| present, then either MPLS encapsulation or a statically configured | present, then either MPLS encapsulation or a statically configured | |||
| encapsulation is assumed. | encapsulation is assumed. | |||
| The Ethernet Segment and Ethernet AD per ESI routes MAY be advertised | ||||
| with multiple encapsulation types as long as they use the same EVPN | ||||
| multi-homing procedures - e.g., the mix of VXLAN and NVGRE | ||||
| encapsulation types is a valid one but not the mix of VXLAN and MPLS | ||||
| encapsulation types. | ||||
| The Next Hop field of the MP_REACH_NLRI attribute of the route MUST | The Next Hop field of the MP_REACH_NLRI attribute of the route MUST | |||
| be set to the IPv4 or IPv6 address of the NVE. The remaining fields | be set to the IPv4 or IPv6 address of the NVE. The remaining fields | |||
| in each route are set as per [RFC7432]. | in each route are set as per [RFC7432]. | |||
| Note that the procedure defined here to use the MPLS Label field to | ||||
| carry the VNI in the presence of a Tunnel Encapsulation Extended | ||||
| Community specifying the use of a VNI, is aligned with the procedures | ||||
| described in section 8.2.2.2 of [TUNNEL-ENCAP] ("When a Valid VNI has | ||||
| not been Signaled"). | ||||
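As an illustration of the statement that the entire 24-bit field carries the VNI, the sketch below contrasts the two encodings of the 3-octet label field. It assumes the [RFC7432] convention that an MPLS label occupies the high-order 20 bits of the field; the function names are hypothetical.

```python
def encode_vni_field(vni):
    """Encode a VNI using all 24 bits of the 3-octet field."""
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits")
    return vni.to_bytes(3, "big")


def encode_mpls_label_field(label):
    """Encode an MPLS label in the high-order 20 bits of the field.

    Assumption: this follows the [RFC7432] label encoding; the low-order
    4 bits are left as zero here.
    """
    if not 0 <= label <= 0xFFFFF:
        raise ValueError("MPLS label must fit in 20 bits")
    return (label << 4).to_bytes(3, "big")


# The same number produces different octets depending on the encoding,
# which is why the receiver needs the encapsulation type to interpret it.
print(encode_vni_field(10010).hex(), encode_mpls_label_field(10010).hex())
```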
| 5.2 MPLS over GRE | 5.2 MPLS over GRE | |||
| The EVPN data-plane is modeled as an EVPN MPLS client layer sitting | The EVPN data-plane is modeled as an EVPN MPLS client layer sitting | |||
| over an MPLS PSN-tunnel server layer. Some of the EVPN functions | over an MPLS PSN-tunnel server layer. Some of the EVPN functions | |||
| (split-horizon, aliasing, and backup-path) are tied to the MPLS | (split-horizon, aliasing, and backup-path) are tied to the MPLS | |||
| client layer. If MPLS over GRE encapsulation is used, then the EVPN | client layer. If MPLS over GRE encapsulation is used, then the EVPN | |||
| MPLS client layer can be carried over an IP PSN tunnel transparently. | MPLS client layer can be carried over an IP PSN tunnel transparently. | |||
| Therefore, there is no impact to the EVPN procedures and associated | Therefore, there is no impact to the EVPN procedures and associated | |||
| data-plane operation. | data-plane operation. | |||
| The existing standards for MPLS over GRE encapsulation as defined by | The existing standards for MPLS over GRE encapsulation as defined by | |||
| [RFC4023] can be used for this purpose; however, when it is used in | [RFC4023] can be used for this purpose; however, when it is used in | |||
| conjunction with EVPN the GRE key field SHOULD be present, and SHOULD | conjunction with EVPN, it is recommended that the GRE key field be | |||
| be used to provide a 32-bit entropy field. The Checksum and Sequence | present and be used to provide a 32-bit entropy value only if the P | |||
| Number fields are not needed and their corresponding C and S bits | nodes can perform Equal-Cost Multipath (ECMP) hashing based on the | |||
| MUST be set to zero. A PE capable of supporting this encapsulation, | GRE key; otherwise, the GRE header SHOULD NOT include the GRE key. | |||
| should advertise its EVPN routes along with the Tunnel Encapsulation | The Checksum and Sequence Number fields MUST NOT be included and the | |||
| extended community indicating MPLS over GRE encapsulation, as | corresponding C and S bits in the GRE Packet Header MUST be set to | |||
| described in the previous section. | zero. A PE capable of supporting this encapsulation SHOULD advertise | |||
| its EVPN routes along with the Tunnel Encapsulation extended | ||||
| community indicating MPLS over GRE encapsulation as described in | ||||
| the previous section. | ||||
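A minimal sketch of the GRE header rules in this section follows: the C and S bits are always zero, and the key is present only when the caller decides the P nodes can hash on it. The bit positions and the MPLS protocol type 0x8847 follow the standard GRE header layout; the function name and the `entropy_key` parameter are hypothetical.

```python
import struct

GRE_FLAG_K = 0x2000      # Key Present bit in the first 16 bits of the header
ETHERTYPE_MPLS = 0x8847  # GRE protocol type for MPLS unicast


def build_gre_header(entropy_key=None):
    """Build a GRE header for MPLS over GRE as used with EVPN.

    Checksum (C) and Sequence Number (S) bits are never set. Pass
    entropy_key only if the P nodes can do ECMP hashing on the GRE key.
    """
    flags = GRE_FLAG_K if entropy_key is not None else 0
    header = struct.pack("!HH", flags, ETHERTYPE_MPLS)
    if entropy_key is not None:
        header += struct.pack("!I", entropy_key & 0xFFFFFFFF)
    return header


print(build_gre_header().hex())            # 4-octet header, no key
print(build_gre_header(0xDEADBEEF).hex())  # 8-octet header with entropy key
```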
| 6 EVPN with Multiple Data Plane Encapsulations | 6 EVPN with Multiple Data Plane Encapsulations | |||
| The use of the BGP Encapsulation extended community per [TUNNEL- | The use of the BGP Encapsulation extended community per [RFC5512] | |||
| ENCAP] and [RFC5512] allows each NVE in a given EVI to know each of | allows each NVE in a given EVI to know each of the encapsulations | |||
| the encapsulations supported by each of the other NVEs in that EVI; | supported by each of the other NVEs in that EVI; i.e., each of the | |||
| i.e., each of the NVEs in a given EVI may support multiple data plane | NVEs in a given EVI may support multiple data plane encapsulations. | |||
| encapsulations. An ingress NVE can send a frame to an egress NVE | An ingress NVE can send a frame to an egress NVE only if the set of | |||
| only if the set of encapsulations advertised by the egress NVE in the | encapsulations advertised by the egress NVE forms a non-empty | |||
| subject MAC/IP Advertisement or per EVI Ethernet AD route, forms a | intersection with the set of encapsulations supported by the ingress | |||
| non-empty intersection with the set of encapsulations supported by | NVE, and it is at the discretion of the ingress NVE which | |||
| the ingress NVE, and it is at the discretion of the ingress NVE which | ||||
| encapsulation to choose from this intersection. (As noted in | encapsulation to choose from this intersection. (As noted in | |||
| section 5.1.3, if the BGP Encapsulation extended community is not | section 5.1.3, if the BGP Encapsulation extended community is not | |||
| present, then the default MPLS encapsulation or a statically | present, then the default MPLS encapsulation or a locally configured | |||
| configured encapsulation is assumed.) | encapsulation is assumed.) | |||
| When a PE advertises multiple supported encapsulations, it MUST | ||||
| advertise encapsulations that use the same EVPN procedures including | ||||
| procedures associated with split-horizon filtering described in | ||||
| section 8.3.1. For example, VXLAN and NVGRE (or MPLS and MPLS over | ||||
| GRE) encapsulations use the same EVPN procedures and thus a PE can | ||||
| advertise both of them and can support either of them or both of them | ||||
| simultaneously. However, a PE MUST NOT advertise VXLAN and MPLS | ||||
| encapsulations together because (a) the MPLS field of EVPN routes is | ||||
| set to either an MPLS label or a VNI but not both and (b) some EVPN | ||||
| procedures (such as split-horizon filtering) are different for | ||||
| VXLAN/NVGRE and MPLS encapsulations. | ||||
| An ingress node that uses shared multicast trees for sending | An ingress node that uses shared multicast trees for sending | |||
| broadcast or multicast frames MUST maintain distinct trees for each | broadcast or multicast frames MAY maintain distinct trees for each | |||
| different encapsulation type. | different encapsulation type. | |||
| It is the responsibility of the operator of a given EVI to ensure | It is the responsibility of the operator of a given EVI to ensure | |||
| that all of the NVEs in that EVI support at least one common | that all of the NVEs in that EVI support at least one common | |||
| encapsulation. If this condition is violated, it could result in | encapsulation. If this condition is violated, it could result in | |||
| service disruption or failure. The use of the BGP Encapsulation | service disruption or failure. The use of the BGP Encapsulation | |||
| extended community provides a method to detect when this condition is | extended community provides a method to detect when this condition is | |||
| violated but the actions to be taken are at the discretion of the | violated but the actions to be taken are at the discretion of the | |||
| operator and are outside the scope of this document. | operator and are outside the scope of this document. | |||
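The selection and co-advertisement rules of this section can be sketched with plain set operations. The encapsulation names are illustrative strings rather than IANA code points, the grouping reflects only the examples given in the text (VXLAN with NVGRE, MPLS with MPLS over GRE), and both function names are hypothetical.

```python
# Encapsulations that share the same EVPN procedures, per the examples above.
PROCEDURE_GROUPS = (
    frozenset({"VXLAN", "NVGRE"}),
    frozenset({"MPLS", "MPLS-over-GRE"}),
)


def usable_encaps(ingress_supported, egress_advertised):
    """Return the encapsulations the ingress NVE may choose from.

    If the egress advertised no Encapsulation extended community, MPLS is
    assumed (a locally configured default is ignored in this sketch).
    """
    advertised = set(egress_advertised) or {"MPLS"}
    return set(ingress_supported) & advertised


def may_coadvertise(encaps):
    """True if all encapsulations fall within one procedure group."""
    return any(set(encaps) <= group for group in PROCEDURE_GROUPS)


print(usable_encaps({"VXLAN", "MPLS"}, {"VXLAN", "NVGRE"}))
print(may_coadvertise({"VXLAN", "MPLS"}))
```

An empty result from `usable_encaps` corresponds to the misconfiguration case described above, where the ingress NVE cannot send frames to that egress NVE and the operator has to intervene.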
| 7 NVE Residing in Hypervisor | 7 Single-Homing NVEs - NVE Residing in Hypervisor | |||
| When a NVE and its hosts/VMs are co-located in the same physical | When a NVE and its hosts/VMs are co-located in the same physical | |||
| device, e.g., when they reside in a server, the links between them | device, e.g., when they reside in a server, the links between them | |||
| are virtual and they typically share fate; i.e., the subject | are virtual and they typically share fate; i.e., the subject | |||
| hosts/VMs are typically not multi-homed or if they are multi-homed, | hosts/VMs are typically not multi-homed or if they are multi-homed, | |||
| the multi-homing is a purely local matter to the server hosting the | the multi-homing is a purely local matter to the server hosting the | |||
| VM and the NVEs, and need not be "visible" to any other NVEs residing | VM and the NVEs, and need not be "visible" to any other NVEs residing | |||
| on other servers, and thus does not require any specific protocol | on other servers, and thus does not require any specific protocol | |||
| mechanisms. The most common case of this is when the NVE resides on | mechanisms. The most common case of this is when the NVE resides on | |||
| the hypervisor. | the hypervisor. | |||
| In the sub-sections that follow, we will discuss the impact on EVPN | In the sub-sections that follow, we will discuss the impact on EVPN | |||
| procedures for the case when the NVE resides on the hypervisor and | procedures for the case when the NVE resides on the hypervisor and | |||
| the VXLAN (or NVGRE) encapsulation is used. | the VXLAN (or NVGRE) encapsulation is used. | |||
| 7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulation | 7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulation | |||
| In the scenario where all data centers are under a single | In scenarios where different groups of data centers are under | |||
| administrative domain, and there is a single global VNI space, the RD | different administrative domains, and these data centers are | |||
| MAY be set to zero in the EVPN routes. However, in the scenario where | connected via one or more backbone core providers as described in | |||
| different groups of data centers are under different administrative | [RFC7365], the RD must be a unique value per EVI or per NVE as | |||
| domains, and these data centers are connected via one or more | described in [RFC7432]. In other words, whenever there is more than | |||
| backbone core providers as described in [NOV3-Framework], the RD must | one administrative domain for global VNI, then a unique RD must be | |||
| be a unique value per EVI or per NVE as described in [RFC7432]. In | used, or whenever the VNI value has local significance, then a unique | |||
| other words, whenever there is more than one administrative domain | RD must be used. Therefore, it is recommended to use a unique RD as | |||
| for global VNI, then a non-zero RD MUST be used, or whenever the VNI | described in [RFC7432] at all times. | |||
| value have local significance, then a non-zero RD MUST be used. It is | ||||
| recommend to use a non-zero RD at all time. | ||||
| When the NVEs reside on the hypervisor, the EVPN BGP routes and | When the NVEs reside on the hypervisor, the EVPN BGP routes and | |||
| attributes associated with multi-homing are no longer required. This | attributes associated with multi-homing are no longer required. This | |||
| reduces the required routes and attributes to the following subset of | reduces the required routes and attributes to the following subset of | |||
| four out of eight: | four out of the total of eight listed in section 7 of [RFC7432]: | |||
| - MAC/IP Advertisement Route | - MAC/IP Advertisement Route | |||
| - Inclusive Multicast Ethernet Tag Route | - Inclusive Multicast Ethernet Tag Route | |||
| - MAC Mobility Extended Community | - MAC Mobility Extended Community | |||
| - Default Gateway Extended Community | - Default Gateway Extended Community | |||
| However, as noted in section 8.6 of [RFC7432] in order to enable a | However, as noted in section 8.6 of [RFC7432] in order to enable a | |||
| single-homing ingress NVE to take advantage of fast convergence, | single-homing ingress NVE to take advantage of fast convergence, | |||
| aliasing, and backup-path when interacting with multi-homed egress | aliasing, and backup-path when interacting with multi-homed egress | |||
| NVEs attached to a given Ethernet segment, the single-homing ingress | NVEs attached to a given Ethernet segment, the single-homing ingress | |||
| NVE SHOULD be able to receive and process Ethernet AD per ES and | NVE should be able to receive and process Ethernet AD per ES and | |||
| Ethernet AD per EVI routes. | Ethernet AD per EVI routes. | |||
| 7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation | 7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation | |||
| When the NVEs reside on the hypervisors, the EVPN procedures | When the NVEs reside on the hypervisors, the EVPN procedures | |||
| associated with multi-homing are no longer required. This limits the | associated with multi-homing are no longer required. This limits the | |||
| procedures on the NVE to the following subset of the EVPN procedures: | procedures on the NVE to the following subset of the EVPN procedures: | |||
| 1. Local learning of MAC addresses received from the VMs per section | 1. Local learning of MAC addresses received from the VMs per section | |||
| 10.1 of [RFC7432]. | 10.1 of [RFC7432]. | |||
| skipping to change at page 15, line 44 ¶ | skipping to change at page 17, line 22 ¶ | |||
| 4. Discovering other NVEs and constructing the multicast tunnels | 4. Discovering other NVEs and constructing the multicast tunnels | |||
| using the Inclusive Multicast Ethernet Tag routes. | using the Inclusive Multicast Ethernet Tag routes. | |||
| 5. Handling MAC address mobility events per the procedures of Section | 5. Handling MAC address mobility events per the procedures of Section | |||
| 16 in [RFC7432]. | 16 in [RFC7432]. | |||
| However, as noted in section 8.6 of [RFC7432] in order to enable a | However, as noted in section 8.6 of [RFC7432] in order to enable a | |||
| single-homing ingress NVE to take advantage of fast convergence, | single-homing ingress NVE to take advantage of fast convergence, | |||
| aliasing, and back-up path when interacting with multi-homed egress | aliasing, and back-up path when interacting with multi-homed egress | |||
| NVEs attached to a given Ethernet segment, a single-homing ingress | NVEs attached to a given Ethernet segment, a single-homing ingress | |||
| NVE SHOULD implement the ingress node processing of Ethernet AD per | NVE should implement the ingress node processing of Ethernet AD per | |||
| ES and Ethernet AD per EVI routes as defined in sections 8.2 Fast | ES and Ethernet AD per EVI routes as defined in sections 8.2 Fast | |||
| Convergence and 8.4 Aliasing and Backup-Path of [RFC7432]. | Convergence and 8.4 Aliasing and Backup-Path of [RFC7432]. | |||
| 8 NVE Residing in ToR Switch | 8 Multi-Homing NVEs - NVE Residing in ToR Switch | |||
| In this section, we discuss the scenario where the NVEs reside in the | In this section, we discuss the scenario where the NVEs reside in the | |||
| Top of Rack (ToR) switches AND the servers (where VMs are residing) | Top of Rack (ToR) switches AND the servers (where VMs are residing) | |||
| are multi-homed to these ToR switches. The multi-homing may operate | are multi-homed to these ToR switches. The multi-homing NVE operates | |||
| in All-Active or Single-Active redundancy mode. If the servers are | in All-Active or Single-Active redundancy mode. If the servers are | |||
| single-homed to the ToR switches, then the scenario becomes similar | single-homed to the ToR switches, then the scenario becomes similar | |||
| to that where the NVE resides on the hypervisor, as discussed in | to that where the NVE resides on the hypervisor, as discussed in | |||
| Section 7, as far as the required EVPN functionality is concerned. | Section 7, as far as the required EVPN functionality is concerned. | |||
| [RFC7432] defines a set of BGP routes, attributes and procedures to | [RFC7432] defines a set of BGP routes, attributes and procedures to | |||
| support multi-homing. We first describe these functions and | support multi-homing. We first describe these functions and | |||
| procedures, then discuss which of these are impacted by the VxLAN | procedures, then discuss which of these are impacted by the VXLAN | |||
| (or NVGRE) encapsulation and what modifications are required. | (or NVGRE) encapsulation and what modifications are required. As | |||
| will be seen later in this section, the only EVPN procedure that is | ||||
| impacted by a non-MPLS overlay encapsulation (e.g., VXLAN or NVGRE), | ||||
| which provides space for one ID rather than a stack of labels, is | ||||
| that of split-horizon filtering for multi-homed Ethernet Segments | ||||
| described in section 8.3.1. | ||||
| 8.1 EVPN Multi-Homing Features | 8.1 EVPN Multi-Homing Features | |||
| In this section, we will recap the multi-homing features of EVPN to | In this section, we will recap the multi-homing features of EVPN to | |||
| highlight the encapsulation dependencies. The section only describes | highlight the encapsulation dependencies. The section only describes | |||
| the features and functions at a high-level. For more details, the | the features and functions at a high-level. For more details, the | |||
| reader is to refer to [RFC7432]. | reader is to refer to [RFC7432]. | |||
| 8.1.1 Multi-homed Ethernet Segment Auto-Discovery | 8.1.1 Multi-homed Ethernet Segment Auto-Discovery | |||
| skipping to change at page 16, line 49 ¶ | skipping to change at page 18, line 31 ¶ | |||
| for all MAC addresses associated with the Ethernet segment in | for all MAC addresses associated with the Ethernet segment in | |||
| question. If no other NVE had advertised an Ethernet A-D route for | question. If no other NVE had advertised an Ethernet A-D route for | |||
| the same segment, then the NVE that received the withdrawal simply | the same segment, then the NVE that received the withdrawal simply | |||
| invalidates the MAC entries for that segment. Otherwise, the NVE | invalidates the MAC entries for that segment. Otherwise, the NVE | |||
| updates the next-hop adjacency list accordingly. | updates the next-hop adjacency list accordingly. | |||
| 8.1.3 Split-Horizon | 8.1.3 Split-Horizon | |||
| If a server that is multi-homed to two or more NVEs (represented by an | If a server that is multi-homed to two or more NVEs (represented by an | |||
| Ethernet segment ES1) and operating in an all-active redundancy mode | Ethernet segment ES1) and operating in an all-active redundancy mode | |||
| sends a BUM packet (i.e., Broadcast, Unknown unicast, or Multicast) | sends a BUM packet (i.e., Broadcast, Unknown unicast, or Multicast) to | |||
| packet to one of these NVEs, then it is important to ensure the | one of these NVEs, then it is important to ensure the packet is not | |||
| packet is not looped back to the server via another NVE connected to | looped back to the server via another NVE connected to this server. | |||
| this server. The filtering mechanism on the NVE to prevent such loops | The filtering mechanism on the NVE to prevent such loops and packet | |||
| and packet duplication is called "split-horizon filtering". | duplication is called "split-horizon filtering". | |||
| 8.1.4 Aliasing and Backup-Path | 8.1.4 Aliasing and Backup-Path | |||
| In the case where a station is multi-homed to multiple NVEs, it is | In the case where a station is multi-homed to multiple NVEs, it is | |||
| possible that only a single NVE learns a set of the MAC addresses | possible that only a single NVE learns a set of the MAC addresses | |||
| associated with traffic transmitted by the station. This leads to a | associated with traffic transmitted by the station. This leads to a | |||
| situation where remote NVEs receive MAC advertisement routes, for | situation where remote NVEs receive MAC advertisement routes, for | |||
| these addresses, from a single NVE even though multiple NVEs are | these addresses, from a single NVE even though multiple NVEs are | |||
| connected to the multi-homed station. As a result, the remote NVEs | connected to the multi-homed station. As a result, the remote NVEs | |||
| are not able to effectively load-balance traffic among the NVEs | are not able to effectively load-balance traffic among the NVEs | |||
| skipping to change at page 17, line 29 ¶ | skipping to change at page 19, line 11 ¶ | |||
| from a given source MAC address to a single NVE. Another scenario | from a given source MAC address to a single NVE. Another scenario | |||
| where this occurs is when the NVEs rely on control plane learning on | where this occurs is when the NVEs rely on control plane learning on | |||
| the access (e.g. using ARP), since ARP traffic will be hashed to a | the access (e.g. using ARP), since ARP traffic will be hashed to a | |||
| single link in the LAG. | single link in the LAG. | |||
| To alleviate this issue, EVPN introduces the concept of Aliasing. | To alleviate this issue, EVPN introduces the concept of Aliasing. | |||
| This refers to the ability of an NVE to signal that it has | This refers to the ability of an NVE to signal that it has | |||
| reachability to a given locally attached Ethernet segment, even when | reachability to a given locally attached Ethernet segment, even when | |||
| it has learnt no MAC addresses from that segment. The Ethernet A-D | it has learnt no MAC addresses from that segment. The Ethernet A-D | |||
| route per EVI is used to that end. Remote NVEs which receive MAC | route per EVI is used to that end. Remote NVEs which receive MAC | |||
| advertisement routes with non-zero ESI SHOULD consider the MAC | advertisement routes with non-zero ESI should consider the MAC | |||
| address as reachable via all NVEs that advertise reachability to the | address as reachable via all NVEs that advertise reachability to the | |||
| relevant Segment using Ethernet A-D routes with the same ESI and with | relevant Segment using Ethernet A-D routes with the same ESI and with | |||
| the Single-Active flag reset. | the Single-Active flag reset. | |||
| Backup-Path is a closely related function, albeit it applies to the | Backup-Path is a closely related function, albeit it applies to the | |||
| case where the redundancy mode is Single-Active. In this case, the | case where the redundancy mode is Single-Active. In this case, the | |||
| NVE signals that it has reachability to a given locally attached | NVE signals that it has reachability to a given locally attached | |||
| Ethernet Segment using the Ethernet A-D route as well. Remote NVEs | Ethernet Segment using the Ethernet A-D route as well. Remote NVEs | |||
| which receive the MAC advertisement routes, with non-zero ESI, SHOULD | which receive the MAC advertisement routes, with non-zero ESI, should | |||
| consider the MAC address as reachable via the advertising NVE. | consider the MAC address as reachable via the advertising NVE. | |||
| Furthermore, the remote NVEs SHOULD install a Backup-Path, for said | Furthermore, the remote NVEs should install a Backup-Path, for said | |||
| MAC, to the NVE which had advertised reachability to the relevant | MAC, to the NVE which had advertised reachability to the relevant | |||
| Segment using an Ethernet A-D route with the same ESI and with the | Segment using an Ethernet A-D route with the same ESI and with the | |||
| Single-Active flag set. | Single-Active flag set. | |||
| 8.1.5 DF Election | 8.1.5 DF Election | |||
| If a host is multi-homed to two or more NVEs on an Ethernet segment | If a host is multi-homed to two or more NVEs on an Ethernet segment | |||
| operating in all-active redundancy mode, then for a given EVI only | operating in all-active redundancy mode, then for a given EVI only | |||
| one of these NVEs, termed the Designated Forwarder (DF), is | one of these NVEs, termed the Designated Forwarder (DF), is | |||
| responsible for sending it broadcast, multicast, and, if configured | responsible for sending it broadcast, multicast, and, if configured | |||
| for that EVI, unknown unicast frames. | for that EVI, unknown unicast frames. | |||
| This is required in order to prevent duplicate delivery of multi- | This is required in order to prevent duplicate delivery of multi- | |||
| destination frames to a multi-homed host or VM, in case of all-active | destination frames to a multi-homed host or VM, in case of all-active | |||
| redundancy. | redundancy. | |||
| In NVEs where .1Q tagged frames are received from hosts, the DF | In NVEs where .1Q tagged frames are received from hosts, the DF | |||
| election is performed on host VLAN IDs (VIDs). It is assumed that for | election should be performed based on host VLAN IDs (VIDs) per | |||
| a given Ethernet Segment, VIDs are unique and consistent (e.g., no | section 8.5 of [RFC7432]. Furthermore, multi-homing PEs of a given | |||
| duplicate VIDs exist). | Ethernet Segment MAY perform DF election using configured IDs such as | |||
| | VNI, EVI, normalized VIDs, etc., as long as the IDs are configured | |||
| | consistently across the multi-homing PEs. | |||
| In GWs where VxLAN encapsulated frames are received, the DF election | In GWs where VXLAN encapsulated frames are received, the DF election | |||
| is performed on VNIs. Again, it is assumed that for a given Ethernet | is performed on VNIs. Again, it is assumed that for a given Ethernet | |||
| Segment, VNIs are unique and consistent (e.g., no duplicate VNIs | Segment, VNIs are unique and consistent (e.g., no duplicate VNIs | |||
| exist). | exist). | |||
| 8.2 Impact on EVPN BGP Routes & Attributes | 8.2 Impact on EVPN BGP Routes & Attributes | |||
| Since multi-homing is supported in this scenario, the entire set | Since multi-homing is supported in this scenario, the entire set | |||
| of BGP routes and attributes defined in [RFC7432] is used. The | of BGP routes and attributes defined in [RFC7432] is used. The | |||
| setting of the Ethernet Tag field in the MAC Advertisement, Ethernet | setting of the Ethernet Tag field in the MAC Advertisement, Ethernet | |||
| AD per EVI, and Inclusive Multicast routes follows that of section | AD per EVI, and Inclusive Multicast routes follows that of section | |||
| 5.1.3. Furthermore, the setting of the VNI field in the MAC | 5.1.3. Furthermore, the setting of the VNI field in the MAC | |||
| Advertisement and Ethernet AD per EVI routes follows that of section | Advertisement and Ethernet AD per EVI routes follows that of section | |||
| 5.1.3. | 5.1.3. | |||
| 8.3 Impact on EVPN Procedures | 8.3 Impact on EVPN Procedures | |||
| Two cases need to be examined here, depending on whether the NVEs are | Two cases need to be examined here, depending on whether the NVEs are | |||
| operating in Active/Standby or in All-Active redundancy. | operating in Single-Active or in All-Active redundancy mode. | |||
| First, let's consider the case of Active/Standby redundancy, where the | First, let's consider the case of Single-Active redundancy mode, where | |||
| hosts are multi-homed to a set of NVEs; however, only a single NVE is | the hosts are multi-homed to a set of NVEs; however, only a single | |||
| active at a given point of time for a given VNI. In this case, the | NVE is active at a given point of time for a given VNI. In this case, | |||
| aliasing is not required and the split-horizon may not be required, | the aliasing is not required and the split-horizon filtering may not | |||
| but other functions such as multi-homed Ethernet segment auto- | be required, but other functions such as multi-homed Ethernet segment | |||
| discovery, fast convergence and mass withdraw, backup path, and DF | auto-discovery, fast convergence and mass withdraw, backup path, and | |||
| election are required. | DF election are required. | |||
| Second, let's consider the case of All-Active redundancy. In this | Second, let's consider the case of All-Active redundancy mode. In | |||
| case, out of all the EVPN multi-homing features listed in section | this case, out of all the EVPN multi-homing features listed in | |||
| 8.1, the use of the VXLAN or NVGRE encapsulation impacts the split- | section 8.1, the use of the VXLAN or NVGRE encapsulation impacts the | |||
| horizon and aliasing features, since those two rely on the MPLS | split-horizon and aliasing features, since those two rely on the MPLS | |||
| client layer. Given that this MPLS client layer is absent with these | client layer. Given that this MPLS client layer is absent with these | |||
| types of encapsulations, alternative procedures and mechanisms are | types of encapsulations, alternative procedures and mechanisms are | |||
| needed to provide the required functions. Those are discussed in | needed to provide the required functions. Those are discussed in | |||
| detail next. | detail next. | |||
| 8.3.1 Split Horizon | 8.3.1 Split Horizon | |||
| In EVPN, an MPLS label is used for split-horizon filtering to support | In EVPN, an MPLS label is used for split-horizon filtering to support | |||
| All-Active multi-homing where an ingress NVE adds a label | All-Active multi-homing where an ingress NVE adds a label | |||
| corresponding to the site of origin (aka ESI Label) when | corresponding to the site of origin (aka ESI Label) when | |||
| encapsulating the packet. The egress NVE checks the ESI label when | encapsulating the packet. The egress NVE checks the ESI label when | |||
| attempting to forward a multi-destination frame out an interface, and | attempting to forward a multi-destination frame out an interface, and | |||
| if the label corresponds to the same site identifier (ESI) associated | if the label corresponds to the same site identifier (ESI) associated | |||
| with that interface, the packet gets dropped. This prevents the | with that interface, the packet gets dropped. This prevents the | |||
| occurrence of forwarding loops. | occurrence of forwarding loops. | |||
| Since the VXLAN or NVGRE encapsulation does not include this ESI | Since VXLAN and NVGRE encapsulations do not include the ESI label, | |||
| label, other means of performing the split-horizon filtering function | other means of performing the split-horizon filtering function must | |||
| MUST be devised. The following approach is recommended for split- | be devised for these encapsulations. The following approach is | |||
| horizon filtering when VXLAN (or NVGRE) encapsulation is used. | recommended for split-horizon filtering when VXLAN (or NVGRE) | |||
| | encapsulation is used. | |||
| Every NVE tracks the IP address(es) associated with the other NVE(s) | Every NVE tracks the IP address(es) associated with the other NVE(s) | |||
| with which it has shared multi-homed Ethernet Segments. When the NVE | with which it has shared multi-homed Ethernet Segments. When the NVE | |||
| receives a multi-destination frame from the overlay network, it | receives a multi-destination frame from the overlay network, it | |||
| examines the source IP address in the tunnel header (which | examines the source IP address in the tunnel header (which | |||
| corresponds to the ingress NVE) and filters out the frame on all | corresponds to the ingress NVE) and filters out the frame on all | |||
| local interfaces connected to Ethernet Segments that are shared with | local interfaces connected to Ethernet Segments that are shared with | |||
| the ingress NVE. With this approach, it is required that the ingress | the ingress NVE. With this approach, it is required that the ingress | |||
| NVE performs replication locally to all directly attached Ethernet | NVE performs replication locally to all directly attached Ethernet | |||
| Segments (regardless of the DF Election state) for all flooded | Segments (regardless of the DF Election state) for all flooded | |||
| traffic ingress from the access interfaces (i.e. from the hosts). | traffic ingress from the access interfaces (i.e. from the hosts). | |||
| This approach is referred to as "Local Bias", and has the advantage | This approach is referred to as "Local Bias", and has the advantage | |||
| that only a single IP address needs to be used per NVE for split- | that only a single IP address needs to be used per NVE for split- | |||
| horizon filtering, as opposed to requiring an IP address per Ethernet | horizon filtering, as opposed to requiring an IP address per Ethernet | |||
| Segment per NVE. | Segment per NVE. | |||
| In order to prevent unhealthy interactions between the split horizon | In order to allow proper operation of split-horizon filtering among | |||
| procedures defined in [RFC7432] and the local bias procedures | the same group of multi-homing PE devices, a mix of PE devices with | |||
| described in this document, a mix of MPLS over GRE encapsulations on | MPLS over GRE encapsulations running [RFC7432] procedures for split- | |||
| the one hand and VXLAN/NVGRE encapsulations on the other on a given | horizon filtering on the one hand and VXLAN/NVGRE encapsulations | |||
| Ethernet Segment is prohibited. | running local-bias procedures on the other on a given Ethernet | |||
| | Segment MUST NOT be configured. | |||
| 8.3.2 Aliasing and Backup-Path | 8.3.2 Aliasing and Backup-Path | |||
| The Aliasing and the Backup-Path procedures for VXLAN/NVGRE | The Aliasing and the Backup-Path procedures for VXLAN/NVGRE | |||
| encapsulation is very similar to the ones for MPLS. In case of MPLS, | encapsulation are very similar to the ones for MPLS. In case of MPLS, | |||
| two different Ethernet A-D routes are used for this purpose. The one | Ethernet A-D route per EVI is used for Aliasing when the | |||
| used for Aliasing has a VPN scope (per EVI) and carries a VPN label | corresponding Ethernet Segment operates in All-Active multi-homing, | |||
| but the one used for Backup-Path has Ethernet segment scope (per ES) | and the same route is used for Backup-Path when the corresponding | |||
| and doesn't carry any VPN specific info (e.g., Ethernet Tag and MPLS | Ethernet Segment operates in Single-Active multi-homing. In case of | |||
| label are set to zero). In case of VxLAN/NVGRE, the same two routes | VXLAN/NVGRE, the same route is used for the Aliasing and the Backup- | |||
| are used for the Aliasing and the Backup-Path. In case of Aliasing, | Path with the difference that the Ethernet Tag and VNI fields in | |||
| the Ethernet Tag and VNI fields in Ethernet A-D per EVI route is set | Ethernet A-D per EVI route are set as described in section 5.1.3. | |||
| as described in section 5.1.3. | ||||
| | 8.3.3 Unknown Unicast Traffic Designation | |||
| | In EVPN, when an ingress PE uses ingress replication to flood unknown | |||
| | unicast traffic to egress PEs, the ingress PE uses a different EVPN | |||
| | MPLS label (from the one used for known unicast traffic) to identify | |||
| | such BUM traffic. The egress PEs use this label to identify such BUM | |||
| | traffic and thus apply DF filtering for All-Active multi-homed sites. | |||
| | In the absence of unknown unicast traffic designation, and when | |||
| | unknown unicast flooding is enabled, there can be transient duplicate | |||
| | traffic to All-Active multi-homed sites under the following | |||
| | condition: the host MAC address is learned by the egress PE(s) and | |||
| | advertised to the ingress PE; however, the MAC advertisement has not | |||
| | been received or processed by the ingress PE, resulting in the host | |||
| | MAC address being unknown on the ingress PE but known on the | |||
| | egress PE(s). Therefore, when a packet destined to that host MAC | |||
| | address arrives on the ingress PE, it floods it via ingress | |||
| | replication to all the egress PE(s), and since the MAC address is | |||
| | known to the egress PE(s), multiple copies are sent to the All-Active | |||
| | multi-homed site. It should be noted that such transient packet | |||
| | duplication only happens when a) the destination host is multi-homed | |||
| | via All-Active redundancy mode, b) flooding of unknown unicast is | |||
| | enabled in the network, c) ingress replication is used, and d) | |||
| | traffic for the destination host arrives on the ingress PE before it | |||
| | learns the host MAC address via BGP EVPN advertisement. If it is | |||
| | desired to avoid occurrence of such transient packet duplication | |||
| | (however low the probability may be), then VXLAN-GPE encapsulation | |||
| | needs to be used between these PEs and the ingress PE needs to set | |||
| | the BUM Traffic Bit (B bit) [VXLAN-GPE] to indicate that this is | |||
| | ingress-replicated BUM traffic. | |||
| 9 Support for Multicast | 9 Support for Multicast | |||
| The EVPN Inclusive Multicast BGP route is used to discover the | The EVPN Inclusive Multicast Ethernet Tag (IMET) route is used to | |||
| multicast tunnels among the endpoints associated with a given EVI | discover the multicast tunnels among the endpoints associated with a | |||
| (e.g., given VNI) for VLAN-based service and a given <EVI,VLAN> for | given EVI (e.g., given VNI) for VLAN-based service and a given | |||
| VLAN-aware bundle service. The Ethernet Tag field of this route is | <EVI,VLAN> for VLAN-aware bundle service. All fields of this route are | |||
| set as described in section 5.1.3. The Originating router's IP | set as described in section 5.1.3. The Originating router's IP | |||
| address field is set to the NVE's IP address. This route is tagged | address field is set to the NVE's IP address. This route is tagged | |||
| with the PMSI Tunnel attribute, which is used to encode the type of | with the PMSI Tunnel attribute, which is used to encode the type of | |||
| multicast tunnel to be used as well as the multicast tunnel | multicast tunnel to be used as well as the multicast tunnel | |||
| identifier. The tunnel encapsulation is encoded by adding the BGP | identifier. The tunnel encapsulation is encoded by adding the BGP | |||
| Encapsulation extended community as per section 5.1.1. The following | Encapsulation extended community as per section 5.1.1. For example, | |||
| tunnel types as defined in [RFC6514] can be used in the PMSI tunnel | the PMSI Tunnel attribute may indicate the multicast tunnel is of | |||
| | type Protocol Independent Multicast - Sparse-Mode (PIM-SM); whereas, | |||
| | the BGP Encapsulation extended community may indicate the | |||
| | encapsulation for that tunnel is of type VXLAN. The following tunnel | |||
| | types as defined in [RFC6514] can be used in the PMSI tunnel | |||
| attribute for VXLAN/NVGRE: | attribute for VXLAN/NVGRE: | |||
| + 3 - PIM-SSM Tree | + 3 - PIM-SSM Tree | |||
| + 4 - PIM-SM Tree | + 4 - PIM-SM Tree | |||
| + 5 - BIDIR-PIM Tree | + 5 - Bidir-PIM Tree | |||
| + 6 - Ingress Replication | + 6 - Ingress Replication | |||
| Except for Ingress Replication, this multicast tunnel is used by the | In case of VXLAN and NVGRE encapsulation with locally-assigned VNIs, | |||
| PE originating the route for sending multicast traffic to other PEs, | just as in [RFC7432], each PE MUST advertise an IMET route to other | |||
| and is used by PEs that receive this route for receiving the traffic | PEs in an EVPN instance for the multicast tunnel type that it uses | |||
| originated by hosts connected to the PE that originated the route. | (i.e., ingress replication, PIM-SM, PIM-SSM, or Bidir-PIM tunnel). | |||
| | However, for globally-assigned VNIs, each PE MUST advertise an IMET | |||
| | route to other PEs in an EVPN instance for ingress replication or | |||
| | PIM-SSM tunnel, and MAY advertise an IMET route for PIM-SM or | |||
| | Bidir-PIM tunnel. In case of PIM-SM or Bidir-PIM tunnel, no | |||
| | information in the IMET route is needed by the PE to set up these | |||
| | tunnels. | |||
| In the scenario where the multicast tunnel is a tree, both the | In the scenario where the multicast tunnel is a tree, both the | |||
| Inclusive as well as the Aggregate Inclusive variants may be used. In | Inclusive as well as the Aggregate Inclusive variants may be used. In | |||
| the former case, a multicast tree is dedicated to a VNI. Whereas, in | the former case, a multicast tree is dedicated to a VNI. Whereas, in | |||
| the latter, a multicast tree is shared among multiple VNIs. This is | the latter, a multicast tree is shared among multiple VNIs. For VNI- | |||
| done by having the NVEs advertise multiple Inclusive Multicast routes | based service, the Aggregate Inclusive mode is accomplished by having | |||
| with different VNI encoded in the Ethernet Tag field, but with the | the NVEs advertise multiple IMET routes with different Route Targets | |||
| same tunnel identifier encoded in the PMSI Tunnel attribute. | (one per VNI) but with the same tunnel identifier encoded in the PMSI | |||
| | tunnel attribute. For VNI-aware bundle service, the Aggregate | |||
| | Inclusive mode is accomplished by having the NVEs advertise multiple | |||
| | IMET routes with different VNI encoded in the Ethernet Tag field, but | |||
| | with the same tunnel identifier encoded in the PMSI Tunnel attribute. | |||
| 10 Data Center Interconnections - DCI | 10 Data Center Interconnections - DCI | |||
| For DCI, the following two main scenarios are considered when | For DCI, the following two main scenarios are considered when | |||
| connecting data centers running evpn-overlay (as described here) over | connecting data centers running evpn-overlay (as described here) over | |||
| an MPLS/IP core network: | an MPLS/IP core network: | |||
| - Scenario 1: DCI using GWs | - Scenario 1: DCI using GWs | |||
| - Scenario 2: DCI using ASBRs | - Scenario 2: DCI using ASBRs | |||
| The following two subsections describe the operations for each of | The following two subsections describe the operations for each of | |||
| these scenarios. | these scenarios. | |||
| 10.1 DCI using GWs | 10.1 DCI using GWs | |||
| This is the typical scenario for interconnecting data centers over | This is the typical scenario for interconnecting data centers over | |||
| a WAN. In this scenario, EVPN routes are terminated and processed in | a WAN. In this scenario, EVPN routes are terminated and processed in | |||
| each GW and MAC/IP routes are always re-advertised from DC to WAN but | each GW and MAC/IP routes are always re-advertised from DC to WAN but | |||
| from WAN to DC, they are not re-advertised if unknown MAC address | from WAN to DC, they are not re-advertised if unknown MAC address | |||
| (and default IP address) are utilized in NVEs. In this scenario, each | (and default IP address) are utilized in NVEs. In this scenario, each | |||
| skipping to change at page 21, line 33 ¶ | skipping to change at page 24, line 12 ¶ | |||
| convergence time upon a link or NVE failure in a multi-homed network | convergence time upon a link or NVE failure in a multi-homed network | |||
| or device redundancy scenario, because the failure related BGP routes | or device redundancy scenario, because the failure related BGP routes | |||
| (such as mass withdraw message) do not need to get propagated all the | (such as mass withdraw message) do not need to get propagated all the | |||
| way to the remote NVEs in the remote DCs. This approach is described | way to the remote NVEs in the remote DCs. This approach is described | |||
| in detail in section 3.4 of [DCI-EVPN-OVERLAY]. | in detail in section 3.4 of [DCI-EVPN-OVERLAY]. | |||
| 10.2 DCI using ASBRs | 10.2 DCI using ASBRs | |||
| This approach can be considered as the opposite of the first approach | This approach can be considered as the opposite of the first approach | |||
| and it favors simplification at DCI devices over NVEs such that | and it favors simplification at DCI devices over NVEs such that | |||
| larger MAC-VRF (and IP-VRF) tables are need to be maintained on NVEs; | larger MAC-VRF (and IP-VRF) tables need to be maintained on NVEs; | |||
| whereas, DCI devices don't need to maintain any MAC (and IP) | whereas, DCI devices don't need to maintain any MAC (and IP) | |||
| forwarding tables. Furthermore, DCI devices do not need to terminate | forwarding tables. Furthermore, DCI devices do not need to terminate | |||
| and processed routes related to multi-homing but rather to relay | and process routes related to multi-homing but rather to relay these | |||
| these messages for the establishment of an end-to-end LSP. In | messages for the establishment of an end-to-end Label Switched Path | |||
| other words, DCI devices in this approach operate similar to ASBRs | (LSP). In other words, DCI devices in this approach operate | |||
| for inter-AS options B. This requires locally assigned VNIs to be | similar to ASBRs for inter-AS option B - section 10 of [RFC4364]. | |||
| used just like downstream assigned MPLS VPN labels where for all | This requires locally assigned VNIs to be used just like downstream | |||
| practical purposes the VNIs function like 24-bit VPN labels. This | assigned MPLS VPN labels where for all practical purposes the VNIs | |||
| approach is equally applicable to data centers (or Carrier Ethernet | function like 24-bit VPN labels. This approach is equally applicable | |||
| networks) with MPLS encapsulation. | to data centers (or Carrier Ethernet networks) with MPLS | |||
| | encapsulation. | |||
| In inter-AS option B, when an ASBR receives an EVPN route from its DC | In inter-AS option B, when an ASBR receives an EVPN route from its DC | |||
| over iBGP and re-advertises it to other ASBRs, it re-advertises the | over internal BGP (iBGP) and re-advertises it to other ASBRs, it re- | |||
| EVPN route by re-writing the BGP next-hops to itself, thus losing the | advertises the EVPN route by re-writing the BGP next-hops to itself, | |||
| identity of the PE that originated the advertisement. This re-write | thus losing the identity of the PE that originated the advertisement. | |||
| of BGP next-hop impacts the EVPN Mass Withdraw route (Ethernet A-D | This re-write of BGP next-hop impacts the EVPN Mass Withdraw route | |||
| per ES) and its procedure adversely. However, it does not impact EVPN | (Ethernet A-D per ES) and its procedure adversely. However, it does | |||
| Aliasing mechanism/procedure because when the Aliasing routes (Ethernet | not impact EVPN Aliasing mechanism/procedure because when the | |||
| A-D per EVI) are advertised, the receiving PE first resolves a MAC | Aliasing routes (Ethernet A-D per EVI) are advertised, the receiving PE | |||
| address for a given EVI into its corresponding <ES,EVI> and | first resolves a MAC address for a given EVI into its corresponding | |||
| subsequently, it resolves the <ES,EVI> into multiple paths (and their | <ES,EVI> and subsequently, it resolves the <ES,EVI> into multiple | |||
| associated next hops) via which the <ES,EVI> is reachable. Since | paths (and their associated next hops) via which the <ES,EVI> is | |||
| Aliasing and MAC routes are both advertised on a per-EVI basis and they | reachable. Since Aliasing and MAC routes are both advertised on a per-EVI | |||
| use the same RD and RT (per EVI), the receiving PE can associate them | basis and they use the same RD and RT (per EVI), the receiving PE can | |||
| together on a per BGP path basis (e.g., per originating PE) and thus | associate them together on a per BGP path basis (e.g., per | |||
| perform recursive route resolution - e.g., a MAC is reachable via an | originating PE) and thus perform recursive route resolution - e.g., a | |||
| <ES,EVI> which in turn, is reachable via a set of BGP paths, thus the | MAC is reachable via an <ES,EVI> which in turn, is reachable via a | |||
| MAC is reachable via the set of BGP paths. Since on a per EVI basis, | set of BGP paths, thus the MAC is reachable via the set of BGP paths. | |||
| the association of MAC routes and the corresponding Aliasing route is | Since on a per EVI basis, the association of MAC routes and the | |||
| fixed and determined by the same RD and RT, there is no ambiguity | corresponding Aliasing route is fixed and determined by the same RD | |||
| when the BGP next hop for these routes is re-written as these routes | and RT, there is no ambiguity when the BGP next hop for these routes | |||
| pass through ASBRs - i.e., the receiving PE may receive multiple | is re-written as these routes pass through ASBRs - i.e., the | |||
| Aliasing routes for the same EVI from a single next hop (a single | receiving PE may receive multiple Aliasing routes for the same EVI | |||
| ASBR), and it can still create multiple paths toward that <ES, EVI>. | from a single next hop (a single ASBR), and it can still create | |||
| multiple paths toward that <ES, EVI>. | ||||
| However, when the BGP next hop address corresponding to the | However, when the BGP next hop address corresponding to the | |||
| originating PE is re-written, the association between the Mass | originating PE is re-written, the association between the Mass | |||
| Withdraw route (Ether A-D per ES) and its corresponding MAC routes | Withdraw route (Ether A-D per ES) and its corresponding MAC routes | |||
| cannot be made based on their RDs and RTs because the RD for Mass | cannot be made based on their RDs and RTs because the RD for Mass | |||
| Withdraw route is different than the one for the MAC routes. | Withdraw route is different than the one for the MAC routes. | |||
| Therefore, the functionality needed at the ASBRs and the receiving | Therefore, the functionality needed at the ASBRs and the receiving | |||
| PEs depends on whether the Mass Withdraw route is originated and | PEs depends on whether the Mass Withdraw route is originated and | |||
| whether there is a need to handle route resolution ambiguity for this | whether there is a need to handle route resolution ambiguity for this | |||
| route. The following two subsections describe the functionality | route. The following two subsections describe the functionality | |||
| needed by the ASBRs and the receiving PEs depending on whether the | needed by the ASBRs and the receiving PEs depending on whether the | |||
| NVEs reside in hypervisors or in TORs. | NVEs reside in hypervisors or in TORs. | |||
| 10.2.1 ASBR Functionality with NVEs in Hypervisors | 10.2.1 ASBR Functionality with Single-Homing NVEs | |||
| When NVEs reside in hypervisors as described in section 7.1, there is | When NVEs reside in hypervisors as described in section 7.1, there is | |||
| no multi-homing and thus there is no need for the originating NVE to | no multi-homing and thus there is no need for the originating NVE to | |||
| send Ethernet A-D per ES or Ethernet A-D per EVI routes. However, as | send Ethernet A-D per ES or Ethernet A-D per EVI routes. However, as | |||
| noted in section 7, in order to enable a single-homing ingress NVE to | noted in section 7, in order to enable a single-homing ingress NVE to | |||
| take advantage of fast convergence, aliasing, and backup-path when | take advantage of fast convergence, aliasing, and backup-path when | |||
| interacting with multi-homing egress NVEs attached to a given | interacting with multi-homing egress NVEs attached to a given | |||
| Ethernet segment, the single-homing NVE SHOULD be able to receive and | Ethernet segment, the single-homing NVE should be able to receive and | |||
| process Ethernet AD per ES and Ethernet AD per EVI routes. The | process Ethernet AD per ES and Ethernet AD per EVI routes. The | |||
| handling of these routes is described in the next section. | handling of these routes is described in the next section. | |||
| 10.2.2 ASBR Functionality with NVEs in TORs | 10.2.2 ASBR Functionality with Multi-Homing NVEs | |||
| When NVEs reside in TORs and operate in multi-homing redundancy mode, | When NVEs reside in TORs and operate in multi-homing redundancy mode, | |||
| then as described in section 8, there is a need for the originating | then as described in section 8, there is a need for the originating | |||
| NVE to send Ethernet A-D per ES route(s) (used for mass withdraw) and | multi-homing NVE to send Ethernet A-D per ES route(s) (used for mass | |||
| Ethernet A-D per EVI routes (used for aliasing). As described above, | withdraw) and Ethernet A-D per EVI routes (used for aliasing). As | |||
| the re-write of BGP next-hop by ASBRs creates ambiguities when | described above, the re-write of BGP next-hop by ASBRs creates | |||
| Ethernet A-D per ES routes are received by the remote NVE in a | ambiguities when Ethernet A-D per ES routes are received by the | |||
| different ASBR because the receiving NVE cannot associate that route | remote NVE in a different ASBR because the receiving NVE cannot | |||
| with the MAC/IP routes of that Ethernet Segment advertised by the | associate that route with the MAC/IP routes of that Ethernet Segment | |||
| same originating NVE. This ambiguity inhibits the function of mass- | advertised by the same originating NVE. This ambiguity inhibits the | |||
| withdraw per ES by the receiving NVE in a different AS. | function of mass-withdraw per ES by the receiving NVE in a different | |||
| AS. | ||||
| As an example consider a scenario where CE is multi-homed to PE1 and | As an example consider a scenario where CE is multi-homed to PE1 and | |||
| PE2 where these PEs are connected via ASBR1 and then ASBR2 to the | PE2 where these PEs are connected via ASBR1 and then ASBR2 to the | |||
| remote PE3. Furthermore, consider that PE1 receives M1 from CE1 but | remote PE3. Furthermore, consider that PE1 receives M1 from CE1 but | |||
| not PE2. Therefore, PE1 advertises Eth A-D per ES1, Eth A-D per EVI1, | not PE2. Therefore, PE1 advertises Eth A-D per ES1, Eth A-D per EVI1, | |||
| and M1; whereas, PE2 only advertises Eth A-D per ES1 and Eth A-D per | and M1; whereas, PE2 only advertises Eth A-D per ES1 and Eth A-D per | |||
| EVI1. ASBR1 receives all these five advertisements and passes them to | EVI1. ASBR1 receives all these five advertisements and passes them to | |||
| ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them | ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them | |||
| to the remote PE3 with itself as the BGP next hop. PE3 receives these | to the remote PE3 with itself as the BGP next hop. PE3 receives these | |||
| five routes where all of them have the same BGP next-hop (i.e., | five routes where all of them have the same BGP next-hop (i.e., | |||
| skipping to change at page 24, line 12 ¶ | skipping to change at page 26, line 39 ¶ | |||
| In the above example, the PE3 receives two Aliasing routes with the | In the above example, the PE3 receives two Aliasing routes with the | |||
| same BGP next hop (ASBR2) but different RDs. One of the Aliasing routes | same BGP next hop (ASBR2) but different RDs. One of the Aliasing routes | |||
| has the same RD as the advertised MAC route (M1). PE3 follows the | has the same RD as the advertised MAC route (M1). PE3 follows the | |||
| route resolution procedure specified in [RFC7432] upon receiving the | route resolution procedure specified in [RFC7432] upon receiving the | |||
| two Aliasing routes - i.e., it resolves M1 to <ES, EVI1> and | two Aliasing routes - i.e., it resolves M1 to <ES, EVI1> and | |||
| subsequently it resolves <ES,EVI1> to a BGP path list with two paths | subsequently it resolves <ES,EVI1> to a BGP path list with two paths | |||
| along with the corresponding VNIs/MPLS labels (one associated with | along with the corresponding VNIs/MPLS labels (one associated with | |||
| PE1 and the other associated with PE2). It should be noted that even | PE1 and the other associated with PE2). It should be noted that even | |||
| though both paths are advertised by the same BGP next hop (ASBR2), | though both paths are advertised by the same BGP next hop (ASBR2), | |||
| the receiving PE3 can handle them properly. Therefore, M1 is | the receiving PE3 can handle them properly. Therefore, M1 is | |||
| reachable via two paths. This creates two end-to-end LSPs from PE3 to | reachable via two paths. This creates two end-to-end LSPs, from PE3 | |||
| PE1 for M1 such that when PE3 wants to forward traffic destined to | to PE1 and from PE3 to PE2, for M1 such that when PE3 wants to | |||
| M1, it can load balance between the two paths. Although route | forward traffic destined to M1, it can load balance between the two | |||
| resolution for Aliasing routes with the same BGP next hop is not | LSPs. Although route resolution for Aliasing routes with the same BGP | |||
| explicitly mentioned in [RFC7432], the is the expected operation and | next hop is not explicitly mentioned in [RFC7432], this is the | |||
| thus it is elaborated here. | expected operation and thus it is elaborated here. | |||
| When the AC between the PE2 and the CE fails and PE2 sends NLRI | When the AC between the PE2 and the CE fails and PE2 sends NLRI | |||
| withdrawal for Ether A-D per EVI routes and these withdrawals get | withdrawal for Ether A-D per EVI routes and these withdrawals get | |||
| propagated and received by the PE3, the PE3 removes the Aliasing | propagated and received by the PE3, the PE3 removes the Aliasing | |||
| route and updates the path list - i.e., it removes the path | route and updates the path list - i.e., it removes the path | |||
| corresponding to the PE2. Therefore, all the corresponding MAC routes | corresponding to the PE2. Therefore, all the corresponding MAC routes | |||
| for that <ES,EVI> that point to that path list will now have the | for that <ES,EVI> that point to that path list will now have the | |||
| updated path list with a single path associated with PE1. This action | updated path list with a single path associated with PE1. This action | |||
| can be considered as the mass-withdraw at the per-EVI level. The | can be considered as the mass-withdraw at the per-EVI level. The | |||
| mass-withdraw at per-EVI level has longer convergence time than the | mass-withdraw at per-EVI level has longer convergence time than the | |||
| mass-withdraw at per-ES level; however, it is much faster than the | mass-withdraw at per-ES level; however, it is much faster than the | |||
| convergence time when the withdraw is done on a per-MAC basis. | convergence time when the withdraw is done on a per-MAC basis. | |||
| If a PE becomes detached from a given ES, then in addition to | ||||
| withdrawing its previously advertised Ethernet AD Per ES routes, it | ||||
| MUST also withdraw its previously advertised Ethernet AD Per EVI | ||||
| routes for that ES. For a remote PE that is separated from the | ||||
| withdrawing PE by one or more EVPN inter-AS option B ASBRs, the | ||||
| withdrawal of the Ethernet AD Per ES routes is not actionable. | ||||
| However, a remote PE is able to correlate a previously advertised | ||||
| Ethernet AD Per EVI route with any MAC/IP Advertisement routes also | ||||
| advertised by the withdrawing PE for that <ES, EVI, BD>. Hence, when | ||||
| it receives the withdrawal of an Ethernet AD Per EVI route, it SHOULD | ||||
| remove the withdrawing PE as a next-hop for all MAC addresses | ||||
| associated with that <ES, EVI, BD>. | ||||
| In the previous example, when the AC between PE2 and the CE fails, | ||||
| PE2 will withdraw its Ethernet AD Per ES and Per EVI routes. When | ||||
| PE3 receives the withdrawal of an Ethernet AD Per EVI route, it | ||||
| removes PE2 as a valid next-hop for all MAC addresses associated with | ||||
| the corresponding <ES, EVI, BD>. Therefore, all the MAC next-hops | ||||
| for that <ES,EVI, BD> will now have a single next-hop, viz the LSP to | ||||
| PE1. | ||||
| In summary, it can be seen that aliasing (and backup path) | In summary, it can be seen that aliasing (and backup path) | |||
| functionality should work as is for inter-AS option B without | functionality should work as is for inter-AS option B without | |||
| requiring any additional functionality in ASBRs or PEs. However, the | requiring any additional functionality in ASBRs or PEs. However, the | |||
| mass-withdraw functionality falls back from per-ES mode to per-EVI | mass-withdraw functionality falls back from per-ES mode to per-EVI | |||
| mode for inter-AS option B - i.e., PEs receiving mass-withdraw route | mode for inter-AS option B - i.e., PEs receiving mass-withdraw route | |||
| from the same AS use Ether A-D per ES route; whereas, PEs receiving | from the same AS take action on Ether A-D per ES route; whereas, PEs | |||
| mass-withdraw route from different AS use Ether A-D per EVI route. | receiving mass-withdraw route from different AS take action on Ether | |||
| A-D per EVI route. | ||||
| 11 Acknowledgement | 11 Acknowledgement | |||
| The authors would like to thank Aldrin Isaac, David Smith, John | The authors would like to thank Aldrin Isaac, David Smith, John | |||
| Mullooly, Thomas Nadeau for their valuable comments and feedback. The | Mullooly, Thomas Nadeau, Samir Thoria, and Jorge Rabadan for their | |||
| authors would also like to thank Jakob Heitz for his contribution on | valuable comments and feedback. The authors would also like to thank | |||
| section 10.2. | Jakob Heitz for his contribution on section 10.2. | |||
| 12 Security Considerations | 12 Security Considerations | |||
| This document uses IP-based tunnel technologies to support data | This document uses IP-based tunnel technologies to support data | |||
| plane transport. Consequently, the security considerations of those | plane transport. Consequently, the security considerations of those | |||
| tunnel technologies apply. This document defines support for VXLAN | tunnel technologies apply. This document defines support for VXLAN | |||
| and NVGRE encapsulations. The security considerations from those | [RFC7348] and NVGRE [RFC7637] encapsulations. The security | |||
| documents as well as [RFC4301] apply to the data plane aspects of | considerations from those RFCs apply to the data plane aspects of | |||
| this document. | this document. | |||
| As with [RFC5512], any modification of the information that is used | As with [RFC5512], any modification of the information that is used | |||
| to form encapsulation headers, to choose a tunnel type, or to choose | to form encapsulation headers, to choose a tunnel type, or to choose | |||
| a particular tunnel for a particular payload type may lead to user | a particular tunnel for a particular payload type may lead to user | |||
| data packets getting misrouted, misdelivered, and/or dropped. | data packets getting misrouted, misdelivered, and/or dropped. | |||
| More broadly, the security considerations for the transport of IP | More broadly, the security considerations for the transport of IP | |||
| reachability information using BGP are discussed in [RFC4271] and | reachability information using BGP are discussed in [RFC4271] and | |||
| [RFC4272], and are equally applicable for the extensions described | [RFC4272], and are equally applicable for the extensions described | |||
| in this document. | in this document. | |||
| If the integrity of the BGP session is not itself protected, then an | ||||
| imposter could mount a denial-of-service attack by establishing | ||||
| numerous BGP sessions and forcing an IPsec SA to be created for each | ||||
| one. However, as such an imposter could wreak havoc on the entire | ||||
| routing system, this particular sort of attack is probably not of | ||||
| any special importance. | ||||
| It should be noted that a BGP session may itself be transported over | ||||
| an IPsec tunnel. Such IPsec tunnels can provide additional security | ||||
| to a BGP session. The management of such IPsec tunnels is outside | ||||
| the scope of this document. | ||||
| 13 IANA Considerations | 13 IANA Considerations | |||
| IANA has allocated the following BGP Tunnel Encapsulation Attribute | This document requests the following BGP Tunnel Encapsulation | |||
| Tunnel Types: | Attribute Tunnel Types from IANA and they have already been | |||
| allocated. The IANA registry needs to point to this document. | ||||
| 8 VXLAN Encapsulation | 8 VXLAN Encapsulation | |||
| 9 NVGRE Encapsulation | 9 NVGRE Encapsulation | |||
| 10 MPLS Encapsulation | 10 MPLS Encapsulation | |||
| 11 MPLS in GRE Encapsulation | 11 MPLS in GRE Encapsulation | |||
| 12 VXLAN GPE Encapsulation | 12 VXLAN GPE Encapsulation | |||
| 14 References | 14 References | |||
| 14.1 Normative References | 14.1 Normative References | |||
| [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC4271] Y. Rekhter, Ed., T. Li, Ed., S. Hares, Ed., "A Border | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| Gateway Protocol 4 (BGP-4)", January 2006. | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, | |||
| <http://www.rfc-editor.org/info/rfc8174>. | ||||
| [RFC4272] S. Murphy, "BGP Security Vulnerabilities Analysis.", | [RFC7432] Sajassi et al., "BGP MPLS Based Ethernet VPN", RFC 7432, | |||
| January 2006. | February 2014 | |||
| [RFC4301] S. Kent, K. Seo., "Security Architecture for the | [RFC7348] Mahalingam, M., et al, "VXLAN: A Framework for Overlaying | |||
| Internet Protocol.", December 2005. | Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, August | |||
| 2014 | ||||
| [RFC7637] Garg, P., et al., "NVGRE: Network Virtualization using | ||||
| Generic Routing Encapsulation", RFC 7637, September, 2015 | ||||
| [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation | [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation | |||
| Subsequent Address Family Identifier (SAFI) and the BGP | Subsequent Address Family Identifier (SAFI) and the BGP Tunnel | |||
| Tunnel Encapsulation Attribute", RFC 5512, April 2009. | Encapsulation Attribute", RFC 5512, April 2009. | |||
| [RFC7432] Sajassi et al., "BGP MPLS Based Ethernet VPN", RFC 7432, | [RFC4023] T. Worster et al., "Encapsulating MPLS in IP or Generic | |||
| February 2014 | Routing Encapsulation (GRE)", RFC 4023, March 2005 | |||
| 14.2 Informative References | 14.2 Informative References | |||
| [RFC7209] Sajassi et al., "Requirements for Ethernet VPN (EVPN)", RFC | [RFC7209] Sajassi et al., "Requirements for Ethernet VPN (EVPN)", RFC | |||
| 7209, May 2014 | 7209, May 2014 | |||
| [RFC7348] Mahalingam, M., et al, "VXLAN: A Framework for Overlaying | [RFC4272] S. Murphy, "BGP Security Vulnerabilities Analysis.", | |||
| Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, August | January 2006. | |||
| 2014 | ||||
| [NVGRE] Garg, P., et al., "NVGRE: Network Virtualization using | [RFC7364] Narten et al., "Problem Statement: Overlays for Network | |||
| Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre- | Virtualization", RFC 7364, October 2014. | |||
| 07.txt, November 11, 2014 | ||||
| [Problem-Statement] Narten et al., "Problem Statement: Overlays for | [RFC7365] Lasserre et al., "Framework for DC Network Virtualization", | |||
| Network Virtualization", draft-ietf-nvo3-overlay-problem-statement- | RFC 7365, October 2014. | |||
| 01, September 2012. | ||||
| [L3VPN-ENDSYSTEMS] Marques et al., "BGP-signaled End-system IP/VPNs", | [DCI-EVPN-OVERLAY] Rabadan et al., "Interconnect Solution for EVPN | |||
| draft-ietf-l3vpn-end-system, work in progress, October 2012. | Overlay networks", draft-ietf-bess-dci-evpn-overlay-08, work in | |||
| progress, February 8, 2018. | ||||
| [NOV3-FRWK] Lasserre et al., "Framework for DC Network | [RFC4271] Y. Rekhter, Ed., T. Li, Ed., S. Hares, Ed., "A Border | |||
| Virtualization", draft-ietf-nvo3-framework-01.txt, work in progress, | Gateway Protocol 4 (BGP-4)", January 2006. | |||
| October 2012. | ||||
| [DCI-EVPN-OVERLAY] Rabadan et al., "Interconnect Solution for EVPN | [RFC4364] Rosen, E., et al, "BGP/MPLS IP Virtual Private Networks | |||
| Overlay networks", draft-ietf-bess-dci-evpn-overlay-02, work in | (VPNs)", RFC 4364, February 2006. | |||
| progress, February 29, 2016. | ||||
| [TUNNEL-ENCAP] Rosen et al., "The BGP Tunnel Encapsulation | [TUNNEL-ENCAP] Rosen et al., "The BGP Tunnel Encapsulation | |||
| Attribute", draft-ietf-idr-tunnel-encaps-02, work in progress, May | Attribute", draft-ietf-idr-tunnel-encaps-08, work in progress, | |||
| 31, 2016. | January 11, 2018. | |||
| [RFC6514] R. Aggarwal et al., "BGP Encodings and Procedures for | ||||
| Multicast in MPLS/BGP IP VPNs", RFC 6514, February 2012 | ||||
| [VXLAN-GPE] Maino et al., "Generic Protocol Extension for VXLAN", | ||||
| draft-ietf-nvo3-vxlan-gpe-05, work in progress October 30, 2017. | ||||
| [GENEVE] J. Gross et al., "Geneve: Generic Network Virtualization | ||||
| Encapsulation", draft-ietf-nvo3-geneve-05, September 2017 | ||||
| [EVPN-GENEVE] S. Boutros et al., "EVPN control plane for Geneve", | ||||
| draft-boutros-bess-evpn-geneve-00.txt, June 2017 | ||||
| Contributors | Contributors | |||
| S. Salam K. Patel D. Rao S. Thoria D. Cai Cisco | S. Salam | |||
| K. Patel | ||||
| D. Rao | ||||
| S. Thoria | ||||
| D. Cai | ||||
| Cisco | ||||
| Y. Rekhter R. Shekhar Wen Lin Nischal Sheth Juniper | Y. Rekhter | |||
| A. Issac | ||||
| Wen Lin | ||||
| Nischal Sheth | ||||
| Juniper | ||||
| L. Yong Huawei | L. Yong | |||
| Huawei | ||||
| Authors' Addresses | Authors' Addresses | |||
| Ali Sajassi | Ali Sajassi | |||
| Cisco | Cisco | |||
| USA | ||||
| Email: sajassi@cisco.com | Email: sajassi@cisco.com | |||
| John Drake | John Drake | |||
| Juniper Networks | Juniper Networks | |||
| USA | ||||
| Email: jdrake@juniper.net | Email: jdrake@juniper.net | |||
| Nabil Bitar | Nabil Bitar | |||
| Nokia | Nokia | |||
| USA | ||||
| Email : nabil.bitar@nokia.com | Email : nabil.bitar@nokia.com | |||
| R. Shekhar | R. Shekhar | |||
| Juniper | Juniper | |||
| USA | ||||
| Email: rshekhar@juniper.net | Email: rshekhar@juniper.net | |||
| James Uttaro | James Uttaro | |||
| AT&T | AT&T | |||
| USA | ||||
| Email: uttaro@att.com | Email: uttaro@att.com | |||
| Wim Henderickx | Wim Henderickx | |||
| Alcatel-Lucent | Nokia | |||
| USA | ||||
| e-mail: wim.henderickx@nokia.com | e-mail: wim.henderickx@nokia.com | |||
| End of changes. 122 change blocks. | ||||
| 382 lines changed or deleted | 561 lines changed or added | |||
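Illustrative note on section 10.2.2: the recursive route resolution and the per-EVI mass-withdraw described there for a receiving PE behind inter-AS option B ASBRs can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not an implementation of [RFC7432]. The class `EvpnRib`, its method names, the RD strings, and the label values are all hypothetical and chosen for illustration. The one point it tries to capture is that Aliasing (Ethernet A-D per EVI) routes are keyed by RD rather than by BGP next hop, so two routes arriving from the same ASBR still yield two paths.

```python
class EvpnRib:
    """Toy model of a receiving PE (PE3 in the example); all names are hypothetical."""

    def __init__(self):
        # One entry per Ethernet A-D per EVI (Aliasing) route, keyed by
        # (RD, ES, EVI). The BGP next hop is deliberately NOT part of the key.
        self.aliasing = {}
        # MAC address -> (ES, EVI) it was learned on, taken from the MAC route.
        self.macs = {}

    def add_aliasing(self, rd, es, evi, next_hop, label):
        self.aliasing[(rd, es, evi)] = (next_hop, label)

    def withdraw_aliasing(self, rd, es, evi):
        # Per-EVI mass-withdraw: removing one Aliasing route shrinks the
        # path list of every MAC that resolves through this <ES,EVI>.
        self.aliasing.pop((rd, es, evi), None)

    def add_mac(self, mac, es, evi):
        self.macs[mac] = (es, evi)

    def resolve(self, mac):
        # Step 1: MAC -> <ES,EVI>. Step 2: <ES,EVI> -> list of paths.
        es, evi = self.macs[mac]
        return sorted(
            (rd, next_hop, label)
            for (rd, e, v), (next_hop, label) in self.aliasing.items()
            if (e, v) == (es, evi)
        )


rib = EvpnRib()
# Both Aliasing routes arrive with ASBR2 as next hop but with different RDs.
rib.add_aliasing("RD-PE1", "ES1", "EVI1", "ASBR2", 1001)  # label values assumed
rib.add_aliasing("RD-PE2", "ES1", "EVI1", "ASBR2", 1002)
rib.add_mac("M1", "ES1", "EVI1")
print(rib.resolve("M1"))   # two paths, one per originating PE

# The AC between PE2 and the CE fails; PE2 withdraws its per-EVI route.
rib.withdraw_aliasing("RD-PE2", "ES1", "EVI1")
print(rib.resolve("M1"))   # a single remaining path, via PE1
```

Because the withdrawal is processed once per <ES,EVI> rather than once per MAC, every MAC that points at that path list is updated together, which is the per-EVI fallback the summary of section 10.2.2 describes.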