| < draft-ietf-bess-evpn-optimized-ir-06.txt | draft-ietf-bess-evpn-optimized-ir-07.txt > | |||
|---|---|---|---|---|
| BESS Workgroup J. Rabadan, Ed. | BESS Workgroup J. Rabadan, Ed. | |||
| Internet Draft S. Sathappan | Internet-Draft S. Sathappan | |||
| Intended status: Standards Track Nokia | Intended status: Standards Track Nokia | |||
| Expires: January 14, 2021 W. Lin | ||||
| W. Lin | Juniper Networks | |||
| Juniper | ||||
| M. Katiyar | M. Katiyar | |||
| Versa Networks | Versa Networks | |||
| A. Sajassi | A. Sajassi | |||
| Cisco | Cisco Systems | |||
| July 13, 2020 | ||||
| Expires: April 22, 2019 October 19, 2018 | ||||
| Optimized Ingress Replication solution for EVPN | Optimized Ingress Replication solution for EVPN | |||
| draft-ietf-bess-evpn-optimized-ir-06 | draft-ietf-bess-evpn-optimized-ir-07 | |||
| Abstract | Abstract | |||
| Network Virtualization Overlay (NVO) networks using EVPN as control | Network Virtualization Overlay (NVO) networks using EVPN as control | |||
| plane may use Ingress Replication (IR) or PIM (Protocol Independent | plane may use Ingress Replication (IR) or PIM (Protocol Independent | |||
| Multicast) based trees to convey the overlay BUM traffic. PIM | Multicast) based trees to convey the overlay BUM traffic. PIM | |||
| provides an efficient solution to avoid sending multiple copies of | provides an efficient solution to avoid sending multiple copies of | |||
| the same packet over the same physical link, however it may not | the same packet over the same physical link, however it may not | |||
| always be deployed in the NVO core network. IR avoids the dependency | always be deployed in the NVO core network. IR avoids the dependency | |||
| on PIM in the NVO network core. While IR provides a simple multicast | on PIM in the NVO network core. While IR provides a simple multicast | |||
| transport, some NVO networks with demanding multicast applications | transport, some NVO networks with demanding multicast applications | |||
| require a more efficient solution without PIM in the core. This | require a more efficient solution without PIM in the core. This | |||
| document describes a solution to optimize the efficiency of IR in NVO | document describes a solution to optimize the efficiency of IR in NVO | |||
| networks. | networks. | |||
| Status of this Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF). Note that other groups may also distribute | |||
| other groups may also distribute working documents as Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | This Internet-Draft will expire on January 14, 2021. | |||
| http://www.ietf.org/ietf/1id-abstracts.txt | ||||
| The list of Internet-Draft Shadow Directories can be accessed at | ||||
| http://www.ietf.org/shadow.html | ||||
| This Internet-Draft will expire on April 22, 2019. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2020 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction | 1. Introduction | |||
| 2. Terminology and Conventions | 2. Terminology and Conventions | |||
| 3. Solution requirements | 3. Solution requirements | |||
| 4. EVPN BGP Attributes for optimized-IR | 4. EVPN BGP Attributes for optimized-IR | |||
| 5. Non-selective Assisted-Replication (AR) Solution Description | 5. Non-selective Assisted-Replication (AR) Solution Description | |||
| 5.1. Non-selective AR-REPLICATOR procedures | 5.1. Non-selective AR-REPLICATOR procedures | |||
| 5.2. Non-selective AR-LEAF procedures | 5.2. Non-selective AR-LEAF procedures | |||
| 5.3. RNVE procedures | 5.3. RNVE procedures | |||
| 5.4. Forwarding behavior in non-selective AR EVIs | 6. Selective Assisted-Replication (AR) Solution Description | |||
| 5.4.1. Broadcast and Multicast forwarding behavior | 6.1. Selective AR-REPLICATOR procedures | |||
| 5.4.1.1. Non-selective AR-REPLICATOR BM forwarding | 6.2. Selective AR-LEAF procedures | |||
| 5.4.1.2. Non-selective AR-LEAF BM forwarding | 7. Pruned-Flood-Lists (PFL) | |||
| 5.4.1.3. RNVE BM forwarding | 7.1. A PFL example | |||
| 5.4.2. Unknown unicast forwarding behavior | 8. AR Procedures for single-IP AR-REPLICATORS | |||
| 5.4.2.1. Non-selective AR-REPLICATOR/LEAF Unknown unicast | 9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon | |||
| forwarding | 9.1. Ethernet Segments on AR-LEAF nodes | |||
| 5.4.2.2. RNVE Unknown unicast forwarding | 9.2. Ethernet Segments on AR-REPLICATOR nodes | |||
| 10. Security Considerations | ||||
| 6. Selective Assisted-Replication (AR) Solution Description | 11. IANA Considerations | |||
| 6.1. Selective AR-REPLICATOR procedures | 12. Contributors | |||
| 6.2. Selective AR-LEAF procedures | 13. Acknowledgments | |||
| 6.3. Forwarding behavior in selective AR EVIs | 14. References | |||
| 6.3.1. Selective AR-REPLICATOR BM forwarding | 14.1. Normative References | |||
| 6.3.2. Selective AR-LEAF BM forwarding | 14.2. Informative References | |||
| 7. Pruned-Flood-Lists (PFL) | Authors' Addresses | |||
| 7.1. A PFL example | ||||
| 8. AR Procedures for single-IP AR-REPLICATORS | ||||
| 9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon | ||||
| 9.1. Ethernet Segments on AR-LEAF nodes | ||||
| 9.2. Ethernet Segments on AR-REPLICATOR nodes | ||||
| 10. Benefits of the optimized-IR solution | ||||
| 11. Security Considerations | ||||
| 12. IANA Considerations | ||||
| 13. References | ||||
| 13.1 Normative References | ||||
| 13.2 Informative References | ||||
| 14. Contributors | ||||
| 15. Acknowledgments | ||||
| 16. Authors' Addresses | ||||
| 1. Introduction | 1. Introduction | |||
| Ethernet Virtual Private Networks (EVPN) may be used as the control | Ethernet Virtual Private Networks (EVPN) may be used as the control | |||
| plane for a Network Virtualization Overlay (NVO) network. Network | plane for a Network Virtualization Overlay (NVO) network. Network | |||
| Virtualization Edge (NVE) devices and Provider Edges (PEs) that are | Virtualization Edge (NVE) devices and Provider Edges (PEs) that are | |||
| part of the same EVPN Instance (EVI) use Ingress Replication (IR) or | part of the same EVPN Instance (EVI) use Ingress Replication (IR) or | |||
| PIM-based trees to transport the tenant's BUM traffic. In NVO | PIM-based trees to transport the tenant's BUM traffic. In NVO | |||
| networks where PIM-based trees cannot be used, IR is the only option. | networks where PIM-based trees cannot be used, IR is the only option. | |||
| Examples of these situations are NVO networks where the core nodes | Examples of these situations are NVO networks where the core nodes | |||
| don't support PIM or the network operator does not want to run PIM in | don't support PIM or the network operator does not want to run PIM in | |||
| the core. | the core. | |||
| In some use-cases, the amount of replication for BUM (Broadcast, | In some use-cases, the amount of replication for BUM (Broadcast, | |||
| Unknown unicast and Multicast traffic) is kept under control on the | Unknown unicast and Multicast traffic) is kept under control on the | |||
| NVEs due to the following fairly common assumptions: | NVEs due to the following fairly common assumptions: | |||
| a) Broadcast is greatly reduced due to the proxy ARP (Address | a. Broadcast is greatly reduced due to the proxy ARP (Address | |||
| Resolution Protocol) and proxy ND (Neighbor Discovery) | Resolution Protocol) and proxy ND (Neighbor Discovery) | |||
| capabilities supported by EVPN on the NVEs. Some NVEs can even | capabilities supported by EVPN on the NVEs. Some NVEs can even | |||
| provide Dynamic Host Configuration Protocol(DHCP) server functions | provide Dynamic Host Configuration Protocol (DHCP) server | |||
| for the attached Tenant Systems (TS) reducing the broadcast even | functions for the attached Tenant Systems (TS) reducing the | |||
| further. | broadcast even further. | |||
| b) Unknown unicast traffic is greatly reduced in virtualized NVO | b. Unknown unicast traffic is greatly reduced in virtualized NVO | |||
| networks where all the MAC and IP addresses are learnt in the | networks where all the MAC and IP addresses are learned in the | |||
| control plane. | control plane. | |||
| c) Multicast applications are not used. | c. Multicast applications are not used. | |||
| If the above assumptions are true for a given NVO network, then IR | If the above assumptions are true for a given NVO network, then IR | |||
| provides a simple solution for multi-destination traffic. However, | provides a simple solution for multi-destination traffic. However, | |||
| the statement c) above is not always true and multicast applications | the statement c) above is not always true and multicast applications | |||
| are required in many use-cases. | are required in many use-cases. | |||
| When the multicast sources are attached to NVEs residing in | When the multicast sources are attached to NVEs residing in | |||
| hypervisors or low-performance-replication TORs Top Of the Rack | hypervisors or low-performance-replication TORs (Top Of Rack | |||
| switches), the ingress replication of a large amount of multicast | switches), the ingress replication of a large amount of multicast | |||
| traffic to a significant number of remote NVEs/PEs can seriously | traffic to a significant number of remote NVEs/PEs can seriously | |||
| degrade the performance of the NVE and impact the application. | degrade the performance of the NVE and impact the application. | |||
| This document describes a solution that makes use of two IR | This document describes a solution that makes use of two IR | |||
| optimizations: | optimizations: | |||
| i) Assisted-Replication (AR) | 1. Assisted-Replication (AR) | |||
| ii) Pruned-Flood-Lists (PFL) | ||||
| 2. Pruned-Flood-Lists (PFL) | ||||
| Both optimizations may be used together or independently so that the | Both optimizations may be used together or independently so that the | |||
| performance and efficiency of the network to transport multicast can | performance and efficiency of the network to transport multicast can | |||
| be improved. Both solutions require some extensions to [RFC7432] that | be improved. Both solutions require some extensions to [RFC7432] | |||
| are described in section 3. | that are described in Section 4. | |||
| Section 2 lists the requirements of the combined optimized-IR | Section 3 lists the requirements of the combined optimized-IR | |||
| solution, whereas sections 4 and 5 describe the Assisted-Replication | solution, whereas Section 5 and Section 6 describe the Assisted- | |||
| (AR) solution, and section 6 the Pruned-Flood-Lists (PFL) solution. | Replication (AR) solution, and Section 7 the Pruned-Flood-Lists (PFL) | |||
| solution. | ||||
| 2. Terminology and Conventions | 2. Terminology and Conventions | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| The following terminology is used throughout the document: | The following terminology is used throughout the document: | |||
| AC: Attachment Circuit | - AC: Attachment Circuit | |||
| Regular-IR: Refers to Regular Ingress Replication, where the source | - BM traffic: Refers to Broadcast and Multicast frames (excluding | |||
| NVE/PE sends a copy to each remote NVE/PE part of the EVI. | unknown unicast frames) | |||
| AR-IP: IP address owned by the AR-REPLICATOR and used to | - NVO: Network Virtualization Overlay | |||
| differentiate the ingress traffic that must follow the AR | ||||
| procedures. | ||||
| IR-IP: IP address used for Ingress Replication as in [RFC7432]. | - NVE: Network Virtualization Edge router | |||
| AR-VNI: VNI advertised by the AR-REPLICATOR along with the | - PE: Provider Edge router | |||
| Replicator-AR route. It is used to identify the ingress | ||||
| packets that must follow AR procedures ONLY in the Single-IP | ||||
| AR-REPLICATOR case. | ||||
| IR-VNI: VNI advertised along with the RT-3 for IR. | - AR-REPLICATOR: Assisted Replication - REPLICATOR, refers to an | |||
| NVE/PE that can replicate Broadcast and Multicast traffic received | ||||
| on overlay tunnels to other overlay tunnels. This document | ||||
| defines the control and data plane procedures that an AR- | ||||
| REPLICATOR needs to follow. | ||||
| AR forwarding mode: for an AR-LEAF, it means sending an AC BM packet | - AR-LEAF: Assisted Replication - LEAF, refers to an NVE/PE that - | |||
| to a single AR-REPLICATOR with tunnel destination IP AR-IP. | given its poor replication performance - sends all the Broadcast | |||
| For an AR-REPLICATOR, it means sending a BM packet to a | and Multicast traffic to an AR-REPLICATOR that can replicate the | |||
| selective number or all the overlay tunnels when the packet | traffic further on its behalf. | |||
| was previously received from an overlay tunnel. | ||||
| IR forwarding mode: it refers to the Ingress Replication behavior | - RNVE: Regular NVE, refers to an NVE that supports the procedures | |||
| explained in [RFC7432]. It means sending an AC BM packet copy | of [RFC8365] and does not support the procedures in this document. | |||
| to each remote PE/NVE in the EVI and sending an overlay BM | However, this document defines procedures to interoperate with | |||
| packet only to the ACs and not other overlay tunnels. | RNVEs. | |||
| PTA: PMSI Tunnel Attribute | - Replicator-AR route: an EVPN RT-3 (route type 3) that is | |||
| advertised by an AR-REPLICATOR to signal its capabilities. | ||||
| RT-3: EVPN Route Type 3, Inclusive Multicast Ethernet Tag route | - Regular-IR: Refers to Regular Ingress Replication, where the | |||
| source NVE/PE sends a copy to each remote NVE/PE part of the BD. | ||||
| RT-11: EVPN Route Type 11, Leaf Auto-Discovery (AD) route | - AR-IP: IP address owned by the AR-REPLICATOR and used to | |||
| differentiate the ingress traffic that must follow the AR | ||||
| procedures. | ||||
| VXLAN: Virtual Extensible LAN | - IR-IP: IP address used for Ingress Replication as in [RFC7432]. | |||
| GRE: Generic Routing Encapsulation | - AR-VNI: VNI advertised by the AR-REPLICATOR along with the | |||
| Replicator-AR route. It is used to identify the ingress packets | ||||
| that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR | ||||
| case. | ||||
| NVGRE: Network Virtualization using Generic Routing Encapsulation | - IR-VNI: VNI advertised along with the RT-3 for IR. | |||
| GENEVE: Generic Network Virtualization Encapsulation | - AR forwarding mode: for an AR-LEAF, it means sending an AC BM | |||
| packet to a single AR-REPLICATOR with tunnel destination IP AR-IP. | ||||
| For an AR-REPLICATOR, it means sending a BM packet to a selected | ||||
| number or all the overlay tunnels when the packet was previously | ||||
| received from an overlay tunnel. | ||||
| NVO: Network Virtualization Overlay | - IR forwarding mode: it refers to the Ingress Replication behavior | |||
| explained in [RFC7432]. It means sending an AC BM packet copy to | ||||
| each remote PE/NVE in the BD and sending an overlay BM packet only | ||||
| to the ACs and not other overlay tunnels. | ||||
| NVE: Network Virtualization Edge | - PTA: PMSI Tunnel Attribute | |||
| VNI: VXLAN Network Identifier | - RT-3: EVPN Route Type 3, Inclusive Multicast Ethernet Tag route | |||
| EVI: EVPN Instance. An EVPN instance spanning the Provider Edge (PE) | - RT-11: EVPN Route Type 11, Leaf Auto-Discovery (AD) route | |||
| devices participating in that EVPN | ||||
| 3. Solution requirements | - VXLAN: Virtual Extensible LAN | |||
| - GRE: Generic Routing Encapsulation | ||||
| - NVGRE: Network Virtualization using Generic Routing Encapsulation | ||||
| - GENEVE: Generic Network Virtualization Encapsulation | ||||
| - VNI: VXLAN Network Identifier | ||||
| - EVI: EVPN Instance. An EVPN instance spanning the Provider Edge | ||||
| (PE) devices participating in that EVPN | ||||
| - BD: Broadcast Domain, as defined in [RFC7432]. | ||||
| - TOR: Top Of Rack switch | ||||
| 3. Solution requirements | ||||
| The IR optimization solution specified in this document (optimized-IR | The IR optimization solution specified in this document (optimized-IR | |||
| hereafter) meets the following requirements: | hereafter) meets the following requirements: | |||
| a) The solution provides an IR optimization for BM (Broadcast and | a. It provides an IR optimization for BM (Broadcast and Multicast) | |||
| Multicast) traffic, while preserving the packet order for unicast | traffic without the need for PIM, while preserving the packet | |||
| applications, i.e., known and unknown unicast traffic should | order for unicast applications, i.e., known and unknown unicast | |||
| follow the same path. | traffic should follow the same path. This optimization is | |||
| required in low-performance NVEs. | ||||
| b) The solution is compatible with [RFC7432] and [RFC8365] and has no | b. It reduces the flooded traffic in NVO networks where some NVEs do | |||
| impact on the EVPN procedures for BM traffic. In particular, the | not need broadcast/multicast and/or unknown unicast traffic. | |||
| solution supports the following EVPN functions: | ||||
| o All-active multi-homing, including the split-horizon and | c. The solution is compatible with [RFC7432] and [RFC8365] and has | |||
| Designated Forwarder (DF) functions. | no impact on the EVPN procedures for BM traffic. In particular, | |||
| the solution supports the following EVPN functions: | ||||
| o Single-active multi-homing, including the DF function. | o All-active multi-homing, including the split-horizon and | |||
| Designated Forwarder (DF) functions. | ||||
| o Handling of multi-destination traffic and processing of | o Single-active multi-homing, including the DF function. | |||
| o Handling of multi-destination traffic and processing of | ||||
| broadcast and multicast as per [RFC7432]. | broadcast and multicast as per [RFC7432]. | |||
| c) The solution is backwards compatible with existing NVEs using a | d. The solution is backwards compatible with existing NVEs using a | |||
| non-optimized version of IR. A given EVI can have NVEs/PEs | non-optimized version of IR. A given BD can have NVEs/PEs | |||
| supporting regular-IR and optimized-IR. | supporting regular-IR and optimized-IR. | |||
| d) The solution is independent of the NVO specific data plane | e. The solution is independent of the NVO specific data plane | |||
| encapsulation and the virtual identifiers being used, e.g.: VXLAN | encapsulation and the virtual identifiers being used, e.g.: VXLAN | |||
| VNIs, NVGRE VSIDs or MPLS labels, as long as the tunnel is IP- | VNIs, NVGRE VSIDs or MPLS labels, as long as the tunnel is IP- | |||
| based. | based. | |||
| 4. EVPN BGP Attributes for optimized-IR | 4. EVPN BGP Attributes for optimized-IR | |||
| This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag | This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag | |||
| routes and attributes so that an NVE/PE can signal its optimized-IR | routes and attributes so that an NVE/PE can signal its optimized-IR | |||
| capabilities. | capabilities. | |||
| The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel | The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel | |||
| Attribute's (PTA) general format used in [RFC7432] are shown below: | Attribute's (PTA) general format used in [RFC7432] are shown below: | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | RD (8 octets) | | | RD (8 octets) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | Ethernet Tag ID (4 octets) | | | Ethernet Tag ID (4 octets) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | IP Address Length (1 octet) | | | IP Address Length (1 octet) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | Originating Router's IP Addr | | | Originating Router's IP Addr | | |||
| | (4 or 16 octets) | | | (4 or 16 octets) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | Flags (1 octet) | | | Flags (1 octet) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | Tunnel Type (1 octets) | | | Tunnel Type (1 octets) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | MPLS Label (3 octets) | | | MPLS Label (3 octets) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| | Tunnel Identifier (variable) | | | Tunnel Identifier (variable) | | |||
| +---------------------------------+ | +---------------------------------+ | |||
| The Flags field is defined as follows: | The Flags field is 8 bits long. This document defines the use of 4 | |||
| bits of this Flags field: | ||||
| 0 1 2 3 4 5 6 7 | - bits 3 and 4, forming together the Assisted-Replication Type (T) | |||
| +-+-+-+-+-+--+-+-+ | field | |||
| |rsvd | T |BM|U|L| | ||||
| +-+-+-+-+-+--+-+-+ | ||||
| Where a new type field (for AR) and two new flags (for PFL signaling) | - bit 5, called the Broadcast and Multicast (BM) flag | |||
| are defined: | ||||
| - T is the AR Type field (2 bits) that defines the AR role of the | - bit 6, called the Unknown (U) flag | |||
| advertising router: | ||||
| + 00 (decimal 0) = RNVE (non-AR support) | Bits 5 and 6 are collectively referred to as the PFL (Pruned-Flood | |||
| Lists) flags. | ||||
| + 01 (decimal 1) = AR-REPLICATOR | The T field and PFL flags are defined as follows: | |||
| + 10 (decimal 2) = AR-LEAF | - T is the AR Type field (2 bits) that defines the AR role of the | |||
| advertising router: | ||||
| + 11 (decimal 3) = RESERVED | o 00 (decimal 0) = RNVE (non-AR support) | |||
| - The PFL (Pruned-Flood-Lists) flags defined the desired behavior of | o 01 (decimal 1) = AR-REPLICATOR | |||
| the advertising router for the different types of traffic: | ||||
| + BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from | o 10 (decimal 2) = AR-LEAF | |||
| the BM flooding list. BM=0 means regular behavior. | ||||
| + U= Unknown flag. U=1 means "prune-me" from the Unknown flooding | o 11 (decimal 3) = RESERVED | |||
| list. U=0 means regular behavior. | ||||
| - Flag L is an existing flag defined in [RFC6514] (L=Leaf Information | - The PFL (Pruned-Flood-Lists) flags define the desired behavior of | |||
| Required) and it will be used only in the Selective AR Solution. | the advertising router for the different types of traffic: | |||
| Please refer to section 10 for the IANA considerations related to the | o BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me" | |||
| from the BM flooding list. BM=0 means regular behavior. | ||||
| o U= Unknown flag. U=1 means "prune-me" from the Unknown | ||||
| flooding list. U=0 means regular behavior. | ||||
| - Flag L is an existing flag defined in [RFC6514] (L=Leaf | ||||
| Information Required) and it will be used only in the Selective AR | ||||
| Solution. | ||||
| Please refer to Section 11 for the IANA considerations related to the | ||||
| PTA flags. | PTA flags. | |||
| In this document, the above RT-3 and PTA can be used in two different | In this document, the above RT-3 and PTA can be used in two different | |||
| modes for the same EVI/Ethernet Tag: | modes for the same BD: | |||
| o Regular-IR route: in this route, Originating Router's IP Address, | - Regular-IR route: in this route, Originating Router's IP Address, | |||
| Tunnel Type (0x06), MPLS Label, Tunnel Identifier and Flags MUST be | Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used | |||
| used as described in [RFC7432]. The Originating Router's IP Address | as described in [RFC7432] when Ingress Replication is in use. The | |||
| and Tunnel Identifier are set to an IP address that we denominate | NVE/PE that advertises the route will set the Next-Hop to an IP | |||
| IR-IP in this document. | address that we denominate IR-IP in this document. When | |||
| advertised by an AR-LEAF node, the Regular-IR route SHOULD be | ||||
| advertised with type T= AR-LEAF. | ||||
| o Replicator-AR route: this route is used by the AR-REPLICATOR to | - Replicator-AR route: this route is used by the AR-REPLICATOR to | |||
| advertise its AR capabilities, with the fields set as follows. | advertise its AR capabilities, with the fields set as follows: | |||
| + Originating Router's IP Address as well as the Tunnel Identifier | o Originating Router's IP Address MUST be set to an IP address of | |||
| are set to the same routable IP address that we denominate AR-IP | the PE that should be common to all the EVIs on the PE (usually | |||
| and SHOULD be different than the IR-IP for a given PE/NVE. | this is the PE's loopback address). The Tunnel Identifier and | |||
| Next-Hop SHOULD be set to the same IP address as the | ||||
| Originating Router's IP address when the NVE/PE originates the | ||||
| route. The Next-Hop address is referred to as the AR-IP and | ||||
| SHOULD be different than the IR-IP for a given PE/NVE. | ||||
| + Tunnel Type = Assisted-Replication (AR). Section 11 provides the | o Tunnel Type = Assisted-Replication Tunnel. Section 11 provides | |||
| allocated type value. | the allocated type value. | |||
| + T (AR role type) = 01 (AR-REPLICATOR). | o T (AR role type) = 01 (AR-REPLICATOR). | |||
| + L (Leaf Information Required) = 0 (for non-selective AR) or 1 | o L (Leaf Information Required) = 0 (for non-selective AR) or 1 | |||
| (for selective AR). | (for selective AR). | |||
| In addition, this document also uses the Leaf-AD route (RT-11) | In addition, this document also uses the Leaf-AD route (RT-11) | |||
| defined in [EVPN-BUM] in case the selective AR mode is used. The | defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the | |||
| Leaf-AD route MAY be used by the AR-LEAF in response to a Replicator- | selective AR mode is used. The Leaf-AD route MAY be used by the AR- | |||
| AR route (with the L flag set) to advertise its desire to receive the | LEAF in response to a Replicator-AR route (with the L flag set) to | |||
| multicast traffic from a specific AR-REPLICATOR. It is only used for | advertise its desire to receive the BM traffic from a specific AR- | |||
| selective AR and its fields are set as follows: | REPLICATOR. It is only used for selective AR and its fields are set | |||
| as follows: | ||||
| + Originating Router's IP Address is set to the advertising IR-IP | o Originating Router's IP Address is set to the advertising PE's | |||
| (same IP used by the AR-LEAF in regular-IR routes). | IP address (same IP used by the AR-LEAF in regular-IR routes). | |||
| The Next-Hop address is set to the IR-IP. | ||||
| + Route Key is the "Route Type Specific" NLRI of the Replicator-AR | o Route Key is the "Route Type Specific" NLRI of the Replicator- | |||
| route for which this Leaf-AD route is generated. | AR route for which this Leaf-AD route is generated. | |||
| + The AR-LEAF constructs an IP-address-specific route-target as | o The AR-LEAF constructs an IP-address-specific route-target as | |||
| indicated in [EVPN-BUM], by placing the IP address carried in the | indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by | |||
| Next Hop field of the received Replicator-AR route in the Global | placing the IP address carried in the Next-Hop field of the | |||
| Administrator field of the Community, with the Local | received Replicator-AR route in the Global Administrator field | |||
| Administrator field of this Community set to 0. Note that the | of the Community, with the Local Administrator field of this | |||
| same IP-address-specific import route-target is auto-configured | Community set to 0. Note that the same IP-address-specific | |||
| by the AR-REPLICATOR that sent the Replicator-AR, in order to | import route-target is auto-configured by the AR-REPLICATOR | |||
| control the acceptance of the Leaf-AD routes. | that sent the Replicator-AR, in order to control the acceptance | |||
| of the Leaf-AD routes. | ||||
| + The leaf-AD route MUST include the PMSI Tunnel attribute with the | o The leaf-AD route MUST include the PMSI Tunnel attribute with | |||
| Tunnel Type set to AR, type set to AR-LEAF and the Tunnel | the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel | |||
| Identifier set to the IR-IP of the advertising AR-LEAF. The PMSI | Identifier set to the IP of the advertising AR-LEAF. The PMSI | |||
| Tunnel attribute MUST carry a downstream-assigned MPLS label that | Tunnel attribute MUST carry a downstream-assigned MPLS label or | |||
| is used by the AR-REPLICATOR to send traffic to the AR-LEAF. | VNI that is used by the AR-REPLICATOR to send traffic to the | |||
| AR-LEAF. | ||||
| Each AR-enabled node MUST understand and process the AR type field in | Each AR-enabled node MUST understand and process the AR type field in | |||
| the PTA (Flags field) of the routes, and MUST signal the | the PTA (Flags field) of the routes, and MUST signal the | |||
| corresponding type (1 or 2) according to its administrative choice. | corresponding type (1 or 2) according to its administrative choice. | |||
| Each node, part of the EVI, MAY understand and process the BM/U | Each node attached to the BD may understand and process the BM/U | |||
| flags. Note that these BM/U flags may be used to optimize the | flags. Note that these BM/U flags may be used to optimize the | |||
| delivery of multi-destination traffic and their use SHOULD be an | delivery of multi-destination traffic and their use SHOULD be an | |||
| administrative choice, and independent of the AR role. | administrative choice, and independent of the AR role. | |||
| Non-optimized-IR nodes will be unaware of the new PMSI attribute flag | Non-optimized-IR nodes will be unaware of the new PMSI attribute flag | |||
| definition as well as the new Tunnel Type (AR), i.e. they will ignore | definition as well as the new Tunnel Type (AR), i.e. they will ignore | |||
| the information contained in the flags field for any RT-3 and will | the information contained in the flags field for any RT-3 and will | |||
| ignore the RT-3 routes with an unknown Tunnel Type (type AR in this | ignore the RT-3 routes with an unknown Tunnel Type (type AR in this | |||
| case). | case). | |||
| 5. Non-selective Assisted-Replication (AR) Solution Description | 5. Non-selective Assisted-Replication (AR) Solution Description | |||
| The following figure illustrates an example NVO network where the | Figure 1 illustrates an example NVO network where the non-selective | |||
| non-selective AR function is enabled. Three different roles are | AR function is enabled. Three different roles are defined for a | |||
| defined for a given EVI: AR-REPLICATOR, AR-LEAF and RNVE (Regular | given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE). The | |||
| NVE). The solution is called "non-selective" because the chosen AR- | solution is called "non-selective" because the chosen AR-REPLICATOR | |||
| REPLICATOR for a given flow MUST replicate the multicast traffic to | for a given flow MUST replicate the BM traffic to 'all' the NVE/PEs | |||
| 'all' the NVE/PEs in the EVI except for the source NVE/PE. | in the BD except for the source NVE/PE. | |||
| ( ) | ( ) | |||
| (_ WAN _) | (_ WAN _) | |||
| +---(_ _)----+ | +---(_ _)----+ | |||
| | (_ _) | | | (_ _) | | |||
| PE1 | PE2 | | PE1 | PE2 | | |||
| +------+----+ +----+------+ | +------+----+ +----+------+ | |||
| TS1--+ (EVI-1) | | (EVI-1) +--TS2 | TS1--+ (BD-1) | | (BD-1) +--TS2 | |||
| |REPLICATOR | |REPLICATOR | | |REPLICATOR | |REPLICATOR | | |||
| +--------+--+ +--+--------+ | +--------+--+ +--+--------+ | |||
| | | | | | | |||
| +--+----------------+--+ | +--+----------------+--+ | |||
| | | | | | | |||
| | | | | | | |||
| +----+ VXLAN/nvGRE/MPLSoGRE +----+ | +----+ VXLAN/nvGRE/MPLSoGRE +----+ | |||
| | | IP Fabric | | | | | IP Fabric | | | |||
| | | | | | | | | | | |||
| NVE1 | +-----------+----------+ | NVE3 | NVE1 | +-----------+----------+ | NVE3 | |||
| Hypervisor| TOR | NVE2 |Hypervisor | Hypervisor| TOR | NVE2 |Hypervisor | |||
| +---------+-+ +-----+-----+ +-+---------+ | +---------+-+ +-----+-----+ +-+---------+ | |||
| | (EVI-1) | | (EVI-1) | | (EVI-1) | | | (BD-1) | | (BD-1) | | (BD-1) | | |||
| | LEAF | | RNVE | | LEAF | | | LEAF | | RNVE | | LEAF | | |||
| +--+-----+--+ +--+-----+--+ +--+-----+--+ | +--+-----+--+ +--+-----+--+ +--+-----+--+ | |||
| | | | | | | | | | | | | | | |||
| VM11 VM12 TS3 TS4 VM31 VM32 | VM11 VM12 TS3 TS4 VM31 VM32 | |||
| Figure 1 Optimized-IR scenario | Figure 1: Optimized-IR scenario | |||
| 5.1. Non-selective AR-REPLICATOR procedures | In AR BDs such as BD-1 in the example, BM (Broadcast and Multicast) | |||
| traffic between two NVEs may follow a different path than unicast | ||||
| traffic. This solution recommends the replication of BM through the | ||||
| AR-REPLICATOR node, whereas unknown/known unicast will be delivered | ||||
| directly from the source node to the destination node without being | ||||
| replicated by any intermediate node. Unknown unicast SHALL follow | ||||
| the same path as known unicast traffic in order to avoid packet | ||||
| reordering for unicast applications and simplify the control and data | ||||
| plane procedures. | ||||
| Note that known unicast forwarding is not impacted by this solution. | ||||
| 5.1. Non-selective AR-REPLICATOR procedures | ||||
| An AR-REPLICATOR is defined as an NVE/PE capable of replicating | An AR-REPLICATOR is defined as an NVE/PE capable of replicating | |||
| ingress BM (Broadcast and Multicast) traffic received on an overlay | ingress BM (Broadcast and Multicast) traffic received on an overlay | |||
| tunnel to other overlay tunnels and local Attachment Circuits (ACs). | tunnel to other overlay tunnels and local Attachment Circuits (ACs). | |||
| The AR-REPLICATOR signals its role in the control plane and | The AR-REPLICATOR signals its role in the control plane and | |||
| understands where the other roles (AR-LEAF nodes, RNVEs and other AR- | understands where the other roles (AR-LEAF nodes, RNVEs and other AR- | |||
| REPLICATORs) are located. A given AR-enabled EVI service may have | REPLICATORs) are located. A given AR-enabled BD service may have | |||
| zero, one or more AR-REPLICATORs. In our example in figure 1, PE1 and | zero, one or more AR-REPLICATORs. In our example in Figure 1, PE1 | |||
| PE2 are defined as AR-REPLICATORs. The following considerations apply | and PE2 are defined as AR-REPLICATORs. The following considerations | |||
| to the AR-REPLICATOR role: | apply to the AR-REPLICATOR role: | |||
| a) The AR-REPLICATOR role SHOULD be an administrative choice in any | a. The AR-REPLICATOR role SHOULD be an administrative choice in any | |||
| NVE/PE that is part of an AR-enabled EVI. This administrative | NVE/PE that is part of an AR-enabled BD. This administrative | |||
| option to enable AR-REPLICATOR capabilities MAY be implemented as | option to enable AR-REPLICATOR capabilities MAY be implemented as | |||
| a system level option as opposed to as a per-MAC-VRF option. | a system level option as opposed to as a per-BD option. | |||
| b) An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY | b. An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY | |||
| advertise a Regular-IR route. The AR-REPLICATOR MUST NOT generate | advertise a Regular-IR route. The AR-REPLICATOR MUST NOT | |||
| a Regular-IR route if it does not have local attachment circuits | generate a Regular-IR route if it does not have local attachment | |||
| (AC). If the Regular-IR route is advertised, the AR Type field MAY | circuits (AC). If the Regular-IR route is advertised, the AR | |||
| be set to AR-REPLICATOR. | Type field is set to zero. | |||
| c) The Replicator-AR and Regular-IR routes will be generated | c. The Replicator-AR and Regular-IR routes are generated according | |||
| according to section 3. The AR-IP and IR-IP used by the | to section 3. The AR-IP and IR-IP used by the AR-REPLICATOR are | |||
| Replicator-AR will be different routable IP addresses. | different routable IP addresses. | |||
| d) When a node defined as AR-REPLICATOR receives a packet on an | d. When a node defined as AR-REPLICATOR receives a BM packet on an | |||
| overlay tunnel, it will do a tunnel destination IP lookup and | overlay tunnel, it will do a tunnel destination IP lookup and | |||
| apply the following procedures: | apply the following procedures: | |||
| o If the destination IP is the AR-REPLICATOR IR-IP Address the | o If the destination IP is the AR-REPLICATOR IR-IP Address the | |||
| node will process the packet normally as in [RFC7432]. | node will process the packet normally as in [RFC7432]. | |||
| o If the destination IP is the AR-REPLICATOR AR-IP Address the | o If the destination IP is the AR-REPLICATOR AR-IP Address the | |||
| node MUST replicate the packet to local ACs and overlay | node MUST replicate the packet to local ACs and overlay | |||
| tunnels (excluding the overlay tunnel to the source of the | tunnels (excluding the overlay tunnel to the source of the | |||
| packet). When replicating to remote AR-REPLICATORs the tunnel | packet). When replicating to remote AR-REPLICATORs the tunnel | |||
| destination IP will be an IR-IP. That will be an indication | destination IP will be an IR-IP. That will be an indication | |||
| for the remote AR-REPLICATOR that it MUST NOT replicate to | for the remote AR-REPLICATOR that it MUST NOT replicate to | |||
| overlay tunnels. The tunnel source IP used by the AR- | overlay tunnels. The tunnel source IP used by the AR- | |||
| REPLICATOR MUST be its IR-IP. | REPLICATOR MUST be its IR-IP when replicating to either AR- | |||
| REPLICATOR or AR-LEAF nodes. | ||||
| 5.2. Non-selective AR-LEAF procedures | An AR-REPLICATOR will follow a data path implementation compatible | |||
| with the following rules: | ||||
| - The AR-REPLICATORs will build a flooding list composed of ACs and | ||||
| overlay tunnels to remote nodes in the BD. Some of those overlay | ||||
| tunnels MAY be flagged as non-BM receivers based on the BM flag | ||||
| received from the remote nodes in the BD. | ||||
| - When an AR-REPLICATOR receives a BM packet on an AC, it will | ||||
| forward the BM packet to its flooding list (including local ACs | ||||
| and remote NVE/PEs), skipping the non-BM overlay tunnels. | ||||
| - When an AR-REPLICATOR receives a BM packet on an overlay tunnel, | ||||
| it will check the destination IP of the underlay IP header and: | ||||
| o If the destination IP matches its AR-IP, the AR-REPLICATOR will | ||||
| forward the BM packet to its flooding list (ACs and overlay | ||||
| tunnels) excluding the non-BM overlay tunnels. The AR- | ||||
| REPLICATOR will do source squelching to ensure the traffic is | ||||
| not sent back to the originating AR-LEAF. | ||||
| o If the destination IP matches its IR-IP, the AR-REPLICATOR will | ||||
| skip all the overlay tunnels from the flooding list, i.e. it | ||||
| will only replicate to local ACs. This is the regular IR | ||||
| behavior described in [RFC7432]. | ||||
| - While the forwarding behavior in AR-REPLICATORs and AR-LEAF nodes | ||||
| is different for BM traffic, as far as Unknown unicast traffic | ||||
| forwarding is concerned, AR-LEAF nodes behave exactly in the same | ||||
| way as AR-REPLICATORs do. | ||||
| - The AR-REPLICATOR/LEAF nodes will build an Unknown unicast flood- | ||||
| list composed of ACs and overlay tunnels to the IR-IP Addresses of | ||||
| the remote nodes in the BD. Some of those overlay tunnels MAY be | ||||
| flagged as non-U (Unknown unicast) receivers based on the U flag | ||||
| received from the remote nodes in the BD. | ||||
| o When an AR-REPLICATOR/LEAF receives an unknown packet on an AC, | ||||
| it will forward the unknown packet to its flood-list, skipping | ||||
| the non-U overlay tunnels. | ||||
| o When an AR-REPLICATOR/LEAF receives an unknown packet on an | ||||
| overlay tunnel, it will forward the unknown packet to its local ACs | ||||
| and never to an overlay tunnel. This is the regular IR | ||||
| behavior described in [RFC7432]. | ||||
| 5.2. Non-selective AR-LEAF procedures | ||||
| AR-LEAF is defined as an NVE/PE that - given its poor replication | AR-LEAF is defined as an NVE/PE that - given its poor replication | |||
| performance - sends all the BM traffic to an AR-REPLICATOR that can | performance - sends all the BM traffic to an AR-REPLICATOR that can | |||
| replicate the traffic further on its behalf. It MAY signal its AR- | replicate the traffic further on its behalf. It MAY signal its AR- | |||
| LEAF capability in the control plane and understands where the other | LEAF capability in the control plane and understands where the other | |||
| roles are located (AR-REPLICATOR and RNVEs). A given service can have | roles are located (AR-REPLICATOR and RNVEs). A given service can | |||
| zero, one or more AR-LEAF nodes. Figure 1 shows NVE1 and NVE3 (both | have zero, one or more AR-LEAF nodes. Figure 1 shows NVE1 and NVE3 | |||
| residing in hypervisors) acting as AR-LEAF. The following | (both residing in hypervisors) acting as AR-LEAF. The following | |||
| considerations apply to the AR-LEAF role: | considerations apply to the AR-LEAF role: | |||
| a) The AR-LEAF role SHOULD be an administrative choice in any NVE/PE | a. The AR-LEAF role SHOULD be an administrative choice in any NVE/PE | |||
| that is part of an AR-enabled EVI. This administrative option to | that is part of an AR-enabled BD. This administrative option to | |||
| enable AR-LEAF capabilities MAY be implemented as a system level | enable AR-LEAF capabilities MAY be implemented as a system level | |||
| option as opposed to as a per-MAC-VRF option. | option as opposed to as a per-BD option. | |||
| b) In this non-selective AR solution, the AR-LEAF MUST advertise a | b. In this non-selective AR solution, the AR-LEAF MUST advertise a | |||
| single Regular-IR inclusive multicast route as in [RFC7432]. The | single Regular-IR inclusive multicast route as in [RFC7432]. The | |||
| AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that | AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that | |||
| although this flag does not make any difference for the egress | although this flag does not make any difference for the egress | |||
| nodes when creating an EVPN destination to the the AR-LEAF, it is | nodes when creating an EVPN destination to the AR-LEAF, it is | |||
| RECOMMENDED the use of this flag for an easy operation and | RECOMMENDED to use this flag for an easy operation and | |||
| troubleshooting of the EVI. | troubleshooting of the BD. | |||
| c) In a service where there are no AR-REPLICATORs, the AR-LEAF MUST | c. In a service where there are no AR-REPLICATORs, the AR-LEAF MUST | |||
| use regular ingress replication. This will happen when a new | use regular ingress replication. This will happen when a new | |||
| update from the last former AR-REPLICATOR is received and contains | update from the last former AR-REPLICATOR is received and | |||
| a non-REPLICATOR AR type, or when the AR-LEAF detects that the | contains a non-REPLICATOR AR type, or when the AR-LEAF detects | |||
| last AR-REPLICATOR is down (next-hop tracking in the IGP or any | that the last AR-REPLICATOR is down (via next-hop tracking in the | |||
| other detection mechanism). Ingress replication MUST use the | IGP or any other detection mechanism). Ingress replication MUST | |||
| forwarding information given by the remote Regular-IR Inclusive | use the forwarding information given by the remote Regular-IR | |||
| Multicast Routes as described in [RFC7432]. | Inclusive Multicast Routes as described in [RFC7432]. | |||
| d) In a service where there is one or more AR-REPLICATORs (based on | d. In a service where there is one or more AR-REPLICATORs (based on | |||
| the received Replicator-AR routes for the EVI), the AR-LEAF can | the received Replicator-AR routes for the BD), the AR-LEAF can | |||
| locally select which AR-REPLICATOR it sends the BM traffic to: | locally select which AR-REPLICATOR it sends the BM traffic to: | |||
| o A single AR-REPLICATOR MAY be selected for all the BM packets | o A single AR-REPLICATOR MAY be selected for all the BM packets | |||
| received on the AR-LEAF attachment circuits (ACs) for a given | received on the AR-LEAF attachment circuits (ACs) for a given | |||
| EVI. This selection is a local decision and it does not have | BD. This selection is a local decision and it does not have | |||
| to match other AR-LEAF's selection within the same EVI. | to match other AR-LEAF's selection within the same BD. | |||
| o An AR-LEAF MAY select more than one AR-REPLICATOR and do | o An AR-LEAF MAY select more than one AR-REPLICATOR and do | |||
| either per-flow or per-EVI load balancing. | either per-flow or per-BD load balancing. | |||
| o In case of a failure on the selected AR-REPLICATOR, another | o In case of a failure on the selected AR-REPLICATOR, another | |||
| AR-REPLICATOR will be selected. | AR-REPLICATOR will be selected. | |||
| o When an AR-REPLICATOR is selected, the AR-LEAF MUST send all | o When an AR-REPLICATOR is selected, the AR-LEAF MUST send all | |||
| the BM packets to that AR-REPLICATOR using the forwarding | the BM packets to that AR-REPLICATOR using the forwarding | |||
| information given by the Replicator-AR route for the chosen | information given by the Replicator-AR route for the chosen | |||
| AR-REPLICATOR, with tunnel type = 0x0A (AR tunnel). The | AR-REPLICATOR, with tunnel type = 0x0A (AR tunnel). The | |||
| underlay destination IP address MUST be the AR-IP advertised | underlay destination IP address MUST be the AR-IP advertised | |||
| by the AR-REPLICATOR in the Replicator-AR route. | by the AR-REPLICATOR in the Replicator-AR route. | |||
| o AR-LEAF nodes SHALL send service-level BM control plane | o AR-LEAF nodes SHALL send service-level BM control plane | |||
| packets following regular IR procedures. An example would be | packets following regular IR procedures. An example would be | |||
| IGMP, MLD or PIM multicast packets. The AR-REPLICATORs MUST | IGMP, MLD or PIM multicast packets. The AR-REPLICATORs MUST | |||
| NOT replicate these control plane packets to other overlay | NOT replicate these control plane packets to other overlay | |||
| tunnels since they will use the regular IR-IP Address. | tunnels since they will use the regular IR-IP Address. | |||
| e) The use of an AR-REPLICATOR-activation-timer (in seconds) on the | e. The use of an AR-REPLICATOR-activation-timer (in seconds) on the | |||
| AR-LEAF nodes is RECOMMENDED. Upon receiving a new Replicator-AR | AR-LEAF nodes is RECOMMENDED. Upon receiving a new Replicator-AR | |||
| route where the AR-REPLICATOR is selected, the AR-LEAF will run a | route where the AR-REPLICATOR is selected, the AR-LEAF will run a | |||
| timer before programming the new AR-REPLICATOR. This will give the | timer before programming the new AR-REPLICATOR. This will give | |||
| AR-REPLICATOR some time to program the AR-LEAF nodes before the | the AR-REPLICATOR some time to program the AR-LEAF nodes before | |||
| AR-LEAF sends BM traffic. | the AR-LEAF sends BM traffic. | |||
| 5.3. RNVE procedures | ||||
| RNVE (Regular Network Virtualization Edge node) is defined as an | ||||
| NVE/PE without AR-REPLICATOR or AR-LEAF capabilities that does IR as | ||||
| described in [RFC7432]. The RNVE does not signal any AR role and is | ||||
| unaware of the AR-REPLICATOR/LEAF roles in the EVI. The RNVE will | ||||
| ignore the Flags in the Regular-IR routes and will ignore the | ||||
| Replicator-AR routes (due to an unknown tunnel type in the PTA) and | ||||
| the Leaf-AD routes (due to the IP-address-specific route-target). | ||||
| This role provides EVPN with the backwards compatibility required in | ||||
| optimized-IR EVIs. Figure 1 shows NVE2 as RNVE. | ||||
| 5.4. Forwarding behavior in non-selective AR EVIs | ||||
| In AR EVIs, BM (Broadcast and Multicast) traffic between two NVEs may | ||||
| follow a different path than unicast traffic. This solution | ||||
| recommends the replication of BM through the AR-REPLICATOR node, | ||||
| whereas unknown/known unicast will be delivered directly from the | ||||
| source node to the destination node without being replicated by any | ||||
| intermediate node. Unknown unicast SHALL follow the same path as | ||||
| known unicast traffic in order to avoid packet reordering for unicast | ||||
| applications and simplify the control and data plane procedures. | ||||
| Section 4.4.1. describes the expected forwarding behavior for BM | ||||
| traffic in nodes acting as AR-REPLICATOR, AR-LEAF and RNVE. Section | ||||
| 4.4.2. describes the forwarding behavior for unknown unicast traffic. | ||||
| Note that known unicast forwarding is not impacted by this solution. | ||||
| 5.4.1. Broadcast and Multicast forwarding behavior | ||||
| The expected behavior per role is described in this section. | ||||
| 5.4.1.1. Non-selective AR-REPLICATOR BM forwarding | ||||
| The AR-REPLICATORs will build a flooding list composed of ACs and | ||||
| overlay tunnels to remote nodes in the EVI. Some of those overlay | ||||
| tunnels MAY be flagged as non-BM receivers based on the BM flag | ||||
| received from the remote nodes in the EVI. | ||||
| o When an AR-REPLICATOR receives a BM packet on an AC, it will | ||||
| forward the BM packet to its flooding list (including local ACs and | ||||
| remote NVE/PEs), skipping the non-BM overlay tunnels. | ||||
| o When an AR-REPLICATOR receives a BM packet on an overlay tunnel, it | ||||
| will check the destination IP of the underlay IP header and: | ||||
| - If the destination IP matches its AR-IP, the AR-REPLICATOR will | ||||
| forward the BM packet to its flooding list (ACs and overlay | ||||
| tunnels) excluding the non-BM overlay tunnels. The AR-REPLICATOR | ||||
| will do source squelching to ensure the traffic is not sent back | ||||
| to the originating AR-LEAF. | ||||
| - If the destination IP matches its IR-IP, the AR-REPLICATOR will | ||||
| skip all the overlay tunnels from the flooding list, i.e. it | ||||
| will only replicate to local ACs. This is the regular IR | ||||
| behavior described in [RFC7432]. | ||||
| 5.4.1.2. Non-selective AR-LEAF BM forwarding | ||||
| The AR-LEAF nodes will build two flood-lists: | ||||
| 1) Flood-list #1 - composed of ACs and an AR-REPLICATOR-set of | ||||
| overlay tunnels. The AR-REPLICATOR-set is defined as one or more | ||||
| overlay tunnels to the AR-IP Addresses of the remote AR- | ||||
| REPLICATOR(s) in the EVI. The selection of more than one AR- | ||||
| REPLICATOR is described in section 4.2. and it is a local AR- | ||||
| LEAF decision. | ||||
| 2) Flood-list #2 - composed of ACs and overlay tunnels to the | ||||
| remote IR-IP Addresses. | ||||
| When an AR-LEAF receives a BM packet on an AC, it will check the | ||||
| AR-REPLICATOR-set: | ||||
| o If the AR-REPLICATOR-set is empty, the AR-LEAF will send the packet | ||||
| to flood-list #2. | ||||
| o If the AR-REPLICATOR-set is NOT empty, the AR-LEAF will send the | ||||
| packet to flood-list #1, where only one of the overlay tunnels of | ||||
| the AR-REPLICATOR-set is used. | ||||
| When an AR-LEAF receives a BM packet on an overlay tunnel, it will | An AR-LEAF will follow a data path implementation compatible with the | |||
| forward the BM packet to its local ACs and never to an overlay | following rules: | |||
| tunnel. This is the regular IR behavior described in [RFC7432]. | ||||
| 5.4.1.3. RNVE BM forwarding | - The AR-LEAF nodes will build two flood-lists: | |||
| The RNVE is completely unaware of the AR-REPLICATORs, AR-LEAF nodes | 1. Flood-list #1 - composed of ACs and an AR-REPLICATOR-set of | |||
| and BM/U flags (that information is ignored). Its forwarding behavior | overlay tunnels. The AR-REPLICATOR-set is defined as one or | |||
| is the regular IR behavior described in [RFC7432]. Any regular non-AR | more overlay tunnels to the AR-IP Addresses of the remote AR- | |||
| node is fully compatible with the RNVE role described in this | REPLICATOR(s) in the BD. The selection of more than one AR- | |||
| document. | REPLICATOR is described in point d) above and it is a local | |||
| AR-LEAF decision. | ||||
| 5.4.2. Unknown unicast forwarding behavior | 2. Flood-list #2 - composed of ACs and overlay tunnels to the | |||
| remote IR-IP Addresses. | ||||
| The expected behavior is described in this section. | - When an AR-LEAF receives a BM packet on an AC, it will check the | |||
| AR-REPLICATOR-set: | ||||
| 5.4.2.1. Non-selective AR-REPLICATOR/LEAF Unknown unicast forwarding | o If the AR-REPLICATOR-set is empty, the AR-LEAF will send the | |||
| packet to flood-list #2. | ||||
| While the forwarding behavior in AR-REPLICATORs and AR-LEAF nodes is | o If the AR-REPLICATOR-set is NOT empty, the AR-LEAF will send | |||
| different for BM traffic, as far as Unknown unicast traffic | the packet to flood-list #1, where only one of the overlay | |||
| forwarding is concerned, AR-LEAF nodes behave exactly in the same way | tunnels of the AR-REPLICATOR-set is used. | |||
| as AR-REPLICATORs do. | ||||
| The AR-REPLICATOR/LEAF nodes will build a flood-list composed of ACs | - When an AR-LEAF receives a BM packet on an overlay tunnel, it will | |||
| and overlay tunnels to the IR-IP Addresses of the remote nodes in the | forward the BM packet to its local ACs and never to an overlay | |||
| EVI. Some of those overlay tunnels MAY be flagged as non-U (Unknown | tunnel. This is the regular IR behavior described in [RFC7432]. | |||
| unicast) receivers based on the U flag received from the remote nodes | ||||
| in the EVI. | ||||
| o When an AR-REPLICATOR/LEAF receives an unknown packet on an AC, it | - AR-LEAF nodes process Unknown unicast traffic in the same way AR- | |||
| will forward the unknown packet to its flood-list, skipping the | REPLICATORS do, as described in Section 5.1. | |||
| non-U overlay tunnels. | ||||
| o When an AR-REPLICATOR/LEAF receives an unknown packet on an overlay | 5.3. RNVE procedures | |||
| tunnel, it will forward the unknown packet to its local ACs and never | ||||
| to an overlay tunnel. This is the regular IR behavior described in | ||||
| [RFC7432]. | ||||
| 5.4.2.2. RNVE Unknown unicast forwarding | RNVE (Regular Network Virtualization Edge node) is defined as an NVE/ | |||
| PE without AR-REPLICATOR or AR-LEAF capabilities that does IR as | ||||
| described in [RFC7432]. The RNVE does not signal any AR role and is | ||||
| unaware of the AR-REPLICATOR/LEAF roles in the BD. The RNVE will | ||||
| ignore the Flags in the Regular-IR routes and will ignore the | ||||
| Replicator-AR routes (due to an unknown tunnel type in the PTA) and | ||||
| the Leaf-AD routes (due to the IP-address-specific route-target). | ||||
| As described for BM traffic, the RNVE is completely unaware of the | This role provides EVPN with the backwards compatibility required in | |||
| REPLICATORs, LEAF nodes and BM/U flags (that information is ignored). | optimized-IR BDs. Figure 1 shows NVE2 as RNVE. | |||
| Its forwarding behavior is the regular IR behavior described in | ||||
| [RFC7432], also for Unknown unicast traffic. Any regular non-AR node | ||||
| is fully compatible with the RNVE role described in this document. | ||||
| 6. Selective Assisted-Replication (AR) Solution Description | 6. Selective Assisted-Replication (AR) Solution Description | |||
| Figure 1 is also used to describe the selective AR solution, however | Figure 1 is also used to describe the selective AR solution, however | |||
| in this section we consider NVE2 as one more AR-LEAF for EVI-1. The | in this section we consider NVE2 as one more AR-LEAF for BD-1. The | |||
| solution is called "selective" because a given AR-REPLICATOR MUST | solution is called "selective" because a given AR-REPLICATOR MUST | |||
| replicate the BM traffic to only the AR-LEAF that requested the | replicate the BM traffic to only the AR-LEAF that requested the | |||
| replication (as opposed to all the AR-LEAF nodes) and MAY replicate | replication (as opposed to all the AR-LEAF nodes) and MAY replicate | |||
| the BM traffic to the RNVEs. The same AR roles defined in section 4 | the BM traffic to the RNVEs. The same AR roles defined in Section 4 | |||
| are used here, however the procedures are slightly different. | are used here, however the procedures are different. | |||
| The following sub-sections describe the differences in the procedures | The following sub-sections describe the differences in the procedures | |||
| of AR-REPLICATOR/LEAFs compared to the non-selective AR solution. | of AR-REPLICATOR/LEAFs compared to the non-selective AR solution. | |||
| There is no change on the RNVEs. | There is no change on the RNVEs. | |||
| 6.1. Selective AR-REPLICATOR procedures | 6.1. Selective AR-REPLICATOR procedures | |||
| In our example in figure 1, PE1 and PE2 are defined as Selective AR- | ||||
| REPLICATORs. The following considerations apply to the Selective AR- | In our example in Figure 1, PE1 and PE2 are defined as Selective AR- | |||
| REPLICATORs. The following considerations apply to the Selective AR- | ||||
| REPLICATOR role: | REPLICATOR role: | |||
| a) The Selective AR-REPLICATOR capability SHOULD be an administrative | a. The Selective AR-REPLICATOR capability SHOULD be an | |||
| choice in any NVE/PE that is part of an AR-enabled EVI, as the AR | administrative choice in any NVE/PE that is part of an AR-enabled | |||
| role itself. This administrative option MAY be implemented as a | BD, as the AR role itself. This administrative option MAY be | |||
| system level option as opposed to as a per-MAC-VRF option. | implemented as a system level option as opposed to as a per-BD | |||
| option. | ||||
| b) Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF and | b. Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF | |||
| RNVE nodes (AR-LEAF nodes that sent only a regular-IR route are | and RNVE nodes. In spite of the 'Selective' administrative | |||
| accounted as RNVEs by the AR-REPLICATOR). In spite of the | option, an AR-REPLICATOR MUST NOT behave as a Selective AR- | |||
| 'Selective' administrative option, an AR-REPLICATOR MUST NOT | REPLICATOR if at least one of the AR-REPLICATORs has the L flag | |||
| behave as a Selective AR-REPLICATOR if at least one of the AR- | NOT set. If at least one AR-REPLICATOR sends a Replicator-AR | |||
| REPLICATORs has the L flag NOT set. If at least one AR-REPLICATOR | route with L=0 (in the BD context), the rest of the AR- | |||
| sends a Replicator-AR route with L=0 (in the EVI context), the | REPLICATORs will fall back to non-selective AR mode. | |||
| rest of the AR-REPLICATORs will fall back to non-selective AR | ||||
| mode. | ||||
| b) The Selective AR-REPLICATOR MUST follow the procedures described | c. The Selective AR-REPLICATOR MUST follow the procedures described | |||
| in section 4.1, except for the following differences: | in Section 5.1, except for the following differences: | |||
| o The Replicator-AR route MUST include L=1 (Leaf Information | o The Replicator-AR route MUST include L=1 (Leaf Information | |||
| Required) in the Replicator-AR route. This flag is used by the | Required) in the Replicator-AR route. This flag is used by | |||
| AR-REPLICATORs to advertise their 'selective' AR-REPLICATOR | the AR-REPLICATORs to advertise their 'selective' AR- | |||
| capabilities. In addition, the AR-REPLICATOR auto-configures | REPLICATOR capabilities. In addition, the AR-REPLICATOR auto- | |||
| its IP-address-specific import route-target as described in | configures its IP-address-specific import route-target as | |||
| section 3. | described in Section 4. | |||
| o The AR-REPLICATOR will build a 'selective' AR-LEAF-set with | o The AR-REPLICATOR will build a 'selective' AR-LEAF-set with | |||
| the list of nodes that requested replication to its own AR-IP. | the list of nodes that requested replication to its own AR-IP. | |||
| For instance, assuming NVE1 and NVE2 advertise a Leaf-AD route | For instance, assuming NVE1 and NVE2 advertise a Leaf-AD route | |||
| with PE1's IP-address-specific route-target and NVE3 | with PE1's IP-address-specific route-target and NVE3 | |||
| advertises a Leaf-AD route with PE2's IP-address-specific | advertises a Leaf-AD route with PE2's IP-address-specific | |||
| route-target, PE1 MUST only add NVE1/NVE2 to its selective AR- | route-target, PE1 MUST only add NVE1/NVE2 to its selective AR- | |||
| LEAF-set for EVI-1, and exclude NVE3. | LEAF-set for BD-1, and exclude NVE3. | |||
| o When a node defined and operating as Selective AR-REPLICATOR | o When a node defined and operating as Selective AR-REPLICATOR | |||
| receives a packet on an overlay tunnel, it will do a tunnel | receives a packet on an overlay tunnel, it will do a tunnel | |||
| destination IP lookup and if the destination IP is the AR- | destination IP lookup and if the destination IP is the AR- | |||
| REPLICATOR AR-IP Address, the node MUST replicate the packet | REPLICATOR AR-IP Address, the node MUST replicate the packet | |||
| to: | to: | |||
| + local ACs | + local ACs | |||
| + overlay tunnels in the Selective AR-LEAF-set (excluding the | ||||
| overlay tunnel to the source AR-LEAF). | ||||
| + overlay tunnels to the RNVEs if the tunnel source IP is the | ||||
| IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR | ||||
| MUST NOT replicate the BM traffic to remote RNVEs). In other | ||||
| words, the first-hop selective AR-REPLICATOR will replicate | ||||
| to all the RNVEs. | ||||
| + overlay tunnels to the remote Selective AR-REPLICATORs if | ||||
| the tunnel source IP is an IR-IP of its own AR-LEAF-set (in | ||||
| any other case, the AR-REPLICATOR MUST NOT replicate the BM | ||||
| traffic to remote AR-REPLICATORs), where the tunnel | ||||
| destination IP is the AR-IP of the remote Selective AR- | ||||
| REPLICATOR. The tunnel destination IP AR-IP will be an | ||||
| indication for the remote Selective AR-REPLICATOR that the | ||||
| packet needs further replication to its AR-LEAFs. | ||||
| 6.2. Selective AR-LEAF procedures | + overlay tunnels in the Selective AR-LEAF-set (excluding the | |||
| overlay tunnel to the source AR-LEAF). | ||||
| A Selective AR-LEAF chooses a single Selective AR-REPLICATOR per EVI | + overlay tunnels to the RNVEs if the tunnel source IP is the | |||
| and: | IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR | |||
| MUST NOT replicate the BM traffic to remote RNVEs). In | ||||
| other words, only the first-hop selective AR-REPLICATOR | ||||
| will replicate to all the RNVEs. | ||||
| o Sends all the EVI BM traffic to that AR-REPLICATOR and | + overlay tunnels to the remote Selective AR-REPLICATORs if | |||
| o Expects to receive the BM traffic for a given EVI from the same AR- | the tunnel source IP is an IR-IP of its own AR-LEAF-set (in | |||
| REPLICATOR. | any other case, the AR-REPLICATOR MUST NOT replicate the BM | |||
| traffic to remote AR-REPLICATORs), where the tunnel | ||||
| destination IP is the AR-IP of the remote Selective AR- | ||||
| REPLICATOR. The tunnel destination IP AR-IP will be an | ||||
| indication for the remote Selective AR-REPLICATOR that the | ||||
| packet needs further replication to its AR-LEAFs. | ||||
| In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective | A Selective AR-REPLICATOR data path implementation will be compatible | |||
| AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that is | with the following rules: | |||
| so, NVE1 will send all its BM traffic for EVI-1 to PE1. If other AR- | ||||
| LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic from | ||||
| PE1. These are the differences in the behavior of a Selective AR-LEAF | ||||
| compared to a non-selective AR-LEAF: | ||||
| a) The AR-LEAF role selective capability SHOULD be an administrative | - The Selective AR-REPLICATORs will build two flood-lists: | |||
| choice in any NVE/PE that is part of an AR-enabled EVI. This | ||||
| administrative option to enable AR-LEAF capabilities MAY be | ||||
| implemented as a system level option as opposed to as per-MAC-VRF | ||||
| option. | ||||
| b) The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs in | 1. Flood-list #1 - composed of ACs and overlay tunnels to the | |||
| the EVI. The Selective AR-LEAF MUST advertise a Leaf-AD route | remote nodes in the BD, always using the IR-IPs in the tunnel | |||
| after receiving a Replicator-AR route with L=1. It is recommended | destination IP addresses. Some of those overlay tunnels MAY | |||
| that the Selective AR-LEAF waits for a timer t before sending the | be flagged as non-BM receivers based on the BM flag received | |||
| Leaf-AD route, so that the AR-LEAF receives all the Replicator-AR | from the remote nodes in the BD. | |||
| routes for the EVI. | ||||
| c) In a service where there is more than one Selective AR-REPLICATOR, | 2. Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a | |||
| the Selective AR-LEAF MUST locally select a single Selective AR- | Selective AR-REPLICATOR-set, where: | |||
| REPLICATOR for the EVI. Once selected: | ||||
| o The Selective AR-LEAF will send a Leaf-AD route including the | + The Selective AR-LEAF-set is composed of the overlay | |||
| Route-key and IP-address-specific route-target of the selected | tunnels to the AR-LEAFs that advertise a Leaf-AD route for | |||
| AR-REPLICATOR. | the local AR-REPLICATOR. This set is updated with every | |||
| Leaf-AD route received/withdrawn from a new AR-LEAF. | ||||
| o The Selective AR-LEAF will send all the BM packets received on | + The Selective AR-REPLICATOR-set is composed of the overlay | |||
| the attachment circuits (ACs) for a given EVI to that AR- | tunnels to all the AR-REPLICATORs that send a Replicator-AR | |||
| REPLICATOR. | route with L=1. The AR-IP addresses are used as tunnel | |||
| destination IP. | ||||
| o In case of a failure on the selected AR-REPLICATOR, another | - When a Selective AR-REPLICATOR receives a BM packet on an AC, it | |||
| AR-REPLICATOR will be selected and a new Leaf-AD update will | will forward the BM packet to its flood-list #1, skipping the non- | |||
| be issued for the new AR-REPLICATOR. This new route will | BM overlay tunnels. | |||
| update the selective list in the new Selective AR-REPLICATOR. | ||||
| In case of failure on the active Selective AR-REPLICATOR, it | ||||
| is recommended for the Selective AR-LEAF to revert to IR | ||||
| behavior for a timer t to speed up the convergence. When the | ||||
| timer expires, the Selective AR-LEAF will resume its AR mode | ||||
| with the new Selective AR-REPLICATOR. | ||||
| All the AR-LEAFs in an EVI are expected to be configured as either | - When a Selective AR-REPLICATOR receives a BM packet on an overlay | |||
| selective or non-selective. A mix of selective and non-selective AR- | tunnel, it will check the destination and source IPs of the | |||
| LEAFs SHOULD NOT coexist in the same EVI. In case there is a non- | underlay IP header and: | |||
| selective AR-LEAF, its BM traffic sent to a selective AR-REPLICATOR | ||||
| will not be replicated to other AR-LEAFs that are not in its | ||||
| Selective AR-LEAF-set. | ||||
| 6.3. Forwarding behavior in selective AR EVIs | o If the destination IP matches its AR-IP and the source IP | |||
| matches an IP of its own Selective AR-LEAF-set, the AR- | ||||
| REPLICATOR will forward the BM packet to its flood-list #2, as | ||||
| long as the list of AR-REPLICATORs for the BD matches the | ||||
| Selective AR-REPLICATOR-set. If the Selective AR-REPLICATOR- | ||||
| set does not match the list of AR-REPLICATORs, the node reverts | ||||
| back to non-selective mode and flood-list #1 is used. | ||||
| This section describes the differences of the selective AR forwarding | o If the destination IP matches its AR-IP and the source IP does | |||
| mode compared to the non-selective mode. Compared to section 4.4, | not match any IP of its Selective AR-LEAF-set, the AR- | |||
| there are no changes for the forwarding behavior in RNVEs or for | REPLICATOR will forward the BM packet to flood-list #2 but | |||
| unknown unicast traffic. | skipping the AR-REPLICATOR-set. | |||
| 6.3.1. Selective AR-REPLICATOR BM forwarding | o If the destination IP matches its IR-IP, the AR-REPLICATOR will | |||
| use flood-list #1 but MUST skip all the overlay tunnels from | ||||
| the flooding list, i.e. it will only replicate to local ACs. | ||||
| This is the regular-IR behavior described in [RFC7432]. | ||||
| The Selective AR-REPLICATORs will build two flood-lists: | - In any case, non-BM overlay tunnels are excluded from flood-lists | |||
| and, also, source squelching is always done in order to ensure the | ||||
| traffic is not sent back to the originating source. If the | ||||
| encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not | ||||
| the bottom of the stack, the AR-REPLICATOR MUST copy the rest of | ||||
| the labels when forwarding them to the egress overlay tunnels. | ||||
| 1) Flood-list #1 - composed of ACs and overlay tunnels to the | 6.2. Selective AR-LEAF procedures | |||
| remote nodes in the EVI, always using the IR-IPs in the tunnel | ||||
| destination IP addresses. Some of those overlay tunnels MAY be | ||||
| flagged as non-BM receivers based on the BM flag received from | ||||
| the remote nodes in the EVI. | ||||
| 2) Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a | A Selective AR-LEAF chooses a single Selective AR-REPLICATOR per BD | |||
| Selective AR-REPLICATOR-set, where: | and: | |||
| o The Selective AR-LEAF-set is composed of the overlay tunnels | - Sends all the BD BM traffic to that AR-REPLICATOR and | |||
| to the AR-LEAFs that advertise a Leaf-AD route for the local | - Expects to receive the BM traffic for a given BD from the same AR- | |||
| AR-REPLICATOR. This set is updated with every Leaf-AD route | REPLICATOR. | |||
| received/withdrawn from a new AR-LEAF. | ||||
| o The Selective AR-REPLICATOR-set is composed of the overlay | In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective | |||
| tunnels to all the AR-REPLICATORs that send a Replicator-AR | AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that | |||
| route with L=1. The AR-IP addresses are used as tunnel | is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other | |||
| destination IP. | AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic | |||
| from PE1. These are the differences in the behavior of a Selective | ||||
| AR-LEAF compared to a non-selective AR-LEAF: | ||||
| When a Selective AR-REPLICATOR receives a BM packet on an AC, it will | a. The AR-LEAF role selective capability SHOULD be an administrative | |||
| forward the BM packet to its flood-list #1, skipping the non-BM | choice in any NVE/PE that is part of an AR-enabled BD. This | |||
| overlay tunnels. | administrative option to enable AR-LEAF capabilities MAY be | |||
| implemented as a system level option as opposed to as per-BD | ||||
| option. | ||||
| When a Selective AR-REPLICATOR receives a BM packet on an overlay | b. The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs | |||
| tunnel, it will check the destination and source IPs of the underlay | in the BD. The Selective AR-LEAF MUST advertise a Leaf-AD route | |||
| IP header and: | after receiving a Replicator-AR route with L=1. It is | |||
| RECOMMENDED that the Selective AR-LEAF waits for a timer t before | ||||
| sending the Leaf-AD route, so that the AR-LEAF receives all the | ||||
| Replicator-AR routes for the BD. | ||||
| - If the destination IP matches its AR-IP and the source IP | c. In a service where there is more than one Selective AR- | |||
| matches an IP of its own Selective AR-LEAF-set, the AR- | REPLICATOR, the Selective AR-LEAF MUST locally select a single | |||
| REPLICATOR will forward the BM packet to its flood-list #2, as | Selective AR-REPLICATOR for the BD. Once selected: | |||
| long as the list of AR-REPLICATORs for the EVI matches the | ||||
| Selective AR-REPLICATOR-set. If the Selective AR-REPLICATOR-set | ||||
| does not match the list of AR-REPLICATORs, the node reverts back | ||||
| to non-selective mode and flood-list #1 is used. | ||||
| - If the destination IP matches its AR-IP and the source IP does | o The Selective AR-LEAF will send a Leaf-AD route including the | |||
| not match any IP of its Selective AR-LEAF-set, the AR-REPLICATOR | Route-key and IP-address-specific route-target of the selected | |||
| will forward the BM packet to flood-list #2 but skipping the AR- | AR-REPLICATOR. | |||
| REPLICATOR-set. | ||||
| - If the destination IP matches its IR-IP, the AR-REPLICATOR will | o The Selective AR-LEAF will send all the BM packets received on | |||
| use flood-list #1 but MUST skip all the overlay tunnels from the | the attachment circuits (ACs) for a given BD to that AR- | |||
| flooding list, i.e. it will only replicate to local ACs. This is | REPLICATOR. | |||
| the regular-IR behavior described in [RFC7432]. | ||||
| In any case, non-BM overlay tunnels are excluded from flood-lists | o In case of a failure on the selected AR-REPLICATOR, another | |||
| and, also, source squelching is always done in order to ensure the | AR-REPLICATOR will be selected and a new Leaf-AD update will | |||
| traffic is not sent back to the originating source. If the | be issued for the new AR-REPLICATOR. This new route will | |||
| encapsulation is MPLSoGRE (or MPLSoUDP) and the EVI label is not the | update the selective list in the new Selective AR-REPLICATOR. | |||
| bottom of the stack, the AR-REPLICATOR MUST copy the rest of the | In case of failure on the active Selective AR-REPLICATOR, it | |||
| labels when forwarding them to the egress overlay tunnels. | is RECOMMENDED for the Selective AR-LEAF to revert to IR | |||
| behavior for a timer t to speed up the convergence. When the | ||||
| timer expires, the Selective AR-LEAF will resume its AR mode | ||||
| with the new Selective AR-REPLICATOR. | ||||
| 6.3.2. Selective AR-LEAF BM forwarding | All the AR-LEAFs in a BD are expected to be configured as either | |||
| selective or non-selective. A mix of selective and non-selective AR- | ||||
| LEAFs SHOULD NOT coexist in the same BD. In case there is a non- | ||||
| selective AR-LEAF, its BM traffic sent to a selective AR-REPLICATOR | ||||
| will not be replicated to other AR-LEAFs that are not in its | ||||
| Selective AR-LEAF-set. | ||||
| The Selective AR-LEAF nodes will build two flood-lists: | A Selective AR-LEAF will follow a data path implementation compatible | |||
| with the following rules: | ||||
| 1) Flood-list #1 - composed of ACs and the overlay tunnel to the | - The Selective AR-LEAF nodes will build two flood-lists: | |||
| selected AR-REPLICATOR (using the AR-IP as the tunnel | ||||
| destination IP). | ||||
| 2) Flood-list #2 - composed of ACs and overlay tunnels to the | 1. Flood-list #1 - composed of ACs and the overlay tunnel to the | |||
| remote IR-IP Addresses. | selected AR-REPLICATOR (using the AR-IP as the tunnel | |||
| destination IP). | ||||
| When an AR-LEAF receives a BM packet on an AC, it will check if there | 2. Flood-list #2 - composed of ACs and overlay tunnels to the | |||
| is any selected AR-REPLICATOR. If there is, flood-list #1 will be | remote IR-IP Addresses. | |||
| used. Otherwise, flood-list #2 will. | ||||
| When an AR-LEAF receives a BM packet on an overlay tunnel, it will | - When an AR-LEAF receives a BM packet on an AC, it will check if | |||
| forward the BM packet to its local ACs and never to an overlay | there is any selected AR-REPLICATOR. If there is, flood-list #1 | |||
| tunnel. This is the regular IR behavior described in [RFC7432]. | will be used. Otherwise, flood-list #2 will. | |||
| 7. Pruned-Flood-Lists (PFL) | - When an AR-LEAF receives a BM packet on an overlay tunnel, it will | |||
| forward the BM packet to its local ACs and never to an overlay | ||||
| tunnel. This is the regular IR behavior described in [RFC7432]. | ||||
| 7. Pruned-Flood-Lists (PFL) | ||||
| In addition to AR, the second optimization supported by this solution | In addition to AR, the second optimization supported by this solution | |||
| is the ability for all the EVI nodes to signal Pruned-Flood-Lists | is the ability for all the BD nodes to signal Pruned-Flood-Lists | |||
| (PFL). As described in section 3, an EVPN node can signal a given | (PFL). As described in section 3, an EVPN node can signal a given | |||
| value for the BM and U PFL flags in the IR Inclusive Multicast | value for the BM and U PFL flags in the IR Inclusive Multicast | |||
| Routes, where: | Routes, where: | |||
| + BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from | - BM= Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from | |||
| the BM flood-list. BM=0 means regular behavior. | the BM flood-list. BM=0 means regular behavior. | |||
| + U= Unknown flag. U=1 means "prune-me" from the Unknown flood-list. | - U= Unknown flag. U=1 means "prune-me" from the Unknown flood- | |||
| U=0 means regular behavior. | list. U=0 means regular behavior. | |||
| The ability to signal these PFL flags is an administrative choice. | The ability to signal these PFL flags is an administrative choice. | |||
| Upon receiving a non-zero PFL flag, a node MAY decide to honor the | Upon receiving a non-zero PFL flag, a node MAY decide to honor the | |||
| PFL flag and remove the sender from the corresponding flood-list. A | PFL flag and remove the sender from the corresponding flood-list. A | |||
| given EVI node receiving BUM traffic on an overlay tunnel MUST | given BD node receiving BUM traffic on an overlay tunnel MUST | |||
| replicate the traffic normally, regardless of the signaled PFL | replicate the traffic normally, regardless of the signaled PFL flags. | |||
| flags. | ||||
| This optimization MAY be used along with the AR solution. | This optimization MAY be used along with the AR solution. | |||
| 7.1. A PFL example | 7.1. A PFL example | |||
| In order to illustrate the use of the solution described in this | In order to illustrate the use of the solution described in this | |||
| document, we will assume that EVI-1 in figure 1 is optimized-IR | document, we will assume that BD-1 in figure 1 is optimized-IR | |||
| enabled and: | enabled and: | |||
| o PE1 and PE2 are administratively configured as AR-REPLICATORs, due | - PE1 and PE2 are administratively configured as AR-REPLICATORs, due | |||
| to their high-performance replication capabilities. PE1 and PE2 | to their high-performance replication capabilities. PE1 and PE2 | |||
| will send a Replicator-AR route with BM/U flags = 00. | will send a Replicator-AR route with BM/U flags = 00. | |||
| o NVE1 and NVE3 are administratively configured as AR-LEAF nodes, due | - NVE1 and NVE3 are administratively configured as AR-LEAF nodes, | |||
| to their low-performance software-based replication capabilities. | due to their low-performance software-based replication | |||
| They will advertise a Regular-IR route with type AR-LEAF. Assuming | capabilities. They will advertise a Regular-IR route with type | |||
| both NVEs advertise all the attached VMs in EVPN as soon as they | AR-LEAF. Assuming both NVEs advertise all the attached VMs in | |||
| come up and don't have any VMs interested in multicast | EVPN as soon as they come up and don't have any VMs interested in | |||
| applications, they will be configured to signal BM/U flags = 11 for | multicast applications, they will be configured to signal BM/U | |||
| EVI-1. | flags = 11 for BD-1. | |||
| o NVE2 is optimized-IR unaware; therefore it takes on the RNVE role | - NVE2 is optimized-IR unaware; therefore it takes on the RNVE role | |||
| in EVI-1. | in BD-1. | |||
| Based on the above assumptions the following forwarding behavior will | Based on the above assumptions the following forwarding behavior will | |||
| take place: | take place: | |||
| (1) Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1 | 1. Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1 | |||
| will further forward the BM packets to TS1, the WAN link, PE2 and | will further forward the BM packets to TS1, the WAN link, PE2 and | |||
| NVE2, but not to NVE3. PE2 and NVE2 will replicate the BM packets | NVE2, but not to NVE3. PE2 and NVE2 will replicate the BM | |||
| to their local ACs but we will avoid NVE3 having to replicate | packets to their local ACs but we will avoid NVE3 having to | |||
| unnecessarily those BM packets to VM31 and VM32. | replicate unnecessarily those BM packets to VM31 and VM32. | |||
| (2) Any BM packets received on PE2 from the WAN will be sent to PE1 | 2. Any BM packets received on PE2 from the WAN will be sent to PE1 | |||
| and NVE2, but not to NVE1 and NVE3, sparing the two hypervisors | and NVE2, but not to NVE1 and NVE3, sparing the two hypervisors | |||
| from replicating unnecessarily to their local VMs. PE1 and NVE2 | from replicating unnecessarily to their local VMs. PE1 and NVE2 | |||
| will replicate to their local ACs only. | will replicate to their local ACs only. | |||
| (3) Any Unknown unicast packet sent from VM31 will be forwarded by | 3. Any Unknown unicast packet sent from VM31 will be forwarded by | |||
| NVE3 to NVE2, PE1 and PE2 but not NVE1. The solution avoids the | NVE3 to NVE2, PE1 and PE2 but not NVE1. The solution avoids the | |||
| unnecessary replication to NVE1, since the destination of the | unnecessary replication to NVE1, since the destination of the | |||
| unknown traffic cannot be at NVE1. | unknown traffic cannot be at NVE1. | |||
| (4) Any Unknown unicast packet sent from TS1 will be forwarded by PE1 | 4. Any Unknown unicast packet sent from TS1 will be forwarded by PE1 | |||
| to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the | to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the | |||
| target of the unknown traffic cannot be at those NVEs. | target of the unknown traffic cannot be at those NVEs. | |||
| 8. AR Procedures for single-IP AR-REPLICATORS | 8. AR Procedures for single-IP AR-REPLICATORS | |||
| The procedures explained in sections 4 (Non-selective AR) and 5 | The procedures explained in Section 5 and Section 6 assume | |||
| (Selective AR) assume that the AR-REPLICATOR can use two local | that the AR-REPLICATOR can use two local routable IP addresses to | |||
| routable IP addresses to terminate and originate NVO tunnels, i.e. | terminate and originate NVO tunnels, i.e. IR-IP and AR-IP addresses. | |||
| IR-IP and AR-IP addresses. This is usually the case for PE-based AR- | This is usually the case for PE-based AR-REPLICATOR nodes. | |||
| REPLICATOR nodes. | ||||
| In some cases, the AR-REPLICATOR node does not support more than one | In some cases, the AR-REPLICATOR node does not support more than one | |||
| IP address to terminate and originate NVO tunnels, i.e. the IR-IP and | IP address to terminate and originate NVO tunnels, i.e. the IR-IP and | |||
| AR-IP are the same IP addresses. This may be the case in some | AR-IP are the same IP addresses. This may be the case in some | |||
| software-based or low-end AR-REPLICATOR nodes. If this is the case, | software-based or low-end AR-REPLICATOR nodes. If this is the case, | |||
| the procedures in sections 4 and 5 must be modified in the following | the procedures in Section 5 and Section 6 MUST be modified | |||
| way: | in the following way: | |||
| o The Replicator-AR routes generated by the AR-REPLICATOR use an AR- | - The Replicator-AR routes generated by the AR-REPLICATOR use an AR- | |||
| IP that will match its IR-IP. In order to differentiate the data | IP that will match its IR-IP. In order to differentiate the data | |||
| plane packets that need to use IR from the packets that must use AR | plane packets that need to use IR from the packets that must use | |||
| forwarding mode, the Replicator-AR route must advertise a different | AR forwarding mode, the Replicator-AR route MUST advertise a | |||
| VNI/VSID than the one used by the Regular-IR route. For instance, | different VNI/VSID than the one used by the Regular-IR route. For | |||
| the AR-REPLICATOR will advertise AR-VNI along with the Replicator- | instance, the AR-REPLICATOR will advertise AR-VNI along with the | |||
| AR route and IR-VNI along with the Regular-IR route. Since both | Replicator-AR route and IR-VNI along with the Regular-IR route. | |||
| routes have the same key, different RDs are needed for both routes. | Since both routes have the same key, different RDs are needed in | |||
| each route. | ||||
| o An AR-REPLICATOR will perform IR or AR forwarding mode for the | - An AR-REPLICATOR will perform IR or AR forwarding mode for the | |||
| incoming Overlay packets based on an ingress VNI lookup, as opposed | incoming Overlay packets based on an ingress VNI lookup, as | |||
| to the tunnel IP DA lookup described in sections 4 and 5. Note | opposed to the tunnel IP DA lookup. Note that, when replicating | |||
| that, when replicating to remote AR-REPLICATOR nodes, the use of | to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI | |||
| the IR-VNI or AR-VNI advertised by the egress node will determine | advertised by the egress node will determine the IR or AR | |||
| the IR or AR forwarding mode at the subsequent AR-REPLICATOR. | forwarding mode at the subsequent AR-REPLICATOR. | |||
| The rest of the procedures will follow what is described in sections | The rest of the procedures will follow what is described in | |||
| 4 and 5. | Section 5 and Section 6. | |||
| 9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon | 9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon | |||
| This section extends the procedures for the cases where AR-LEAF nodes | This section extends the procedures for the cases where AR-LEAF nodes | |||
| or AR-REPLICATOR nodes are attached to the same Ethernet Segment | or AR-REPLICATOR nodes are attached to the same Ethernet Segment | |||
| in the Broadcast Domain. The case where one (or more) AR-LEAF node(s) | in the BD. The case where one (or more) AR-LEAF node(s) and one (or | |||
| and one (or more) AR-REPLICATOR node(s) are attached to the same | more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment | |||
| Ethernet Segment is out of scope. | is out of scope. | |||
| 9.1. Ethernet Segments on AR-LEAF nodes | 9.1. Ethernet Segments on AR-LEAF nodes | |||
| If VXLAN or NVGRE are used, and if the Split-horizon is based on the | If VXLAN or NVGRE are used, and if the Split-horizon is based on the | |||
| tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split- | tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split- | |||
| horizon check will not work if there is an Ethernet-Segment shared | horizon check will not work if there is an Ethernet-Segment shared | |||
| between two AR-LEAF nodes, and the AR-REPLICATOR changes the tunnel | between two AR-LEAF nodes, and the AR-REPLICATOR changes the tunnel | |||
| IP SA of the packets with its own AR-IP. | IP SA of the packets with its own AR-IP. | |||
| In order to be compatible with the IP SA split-horizon check, the AR- | In order to be compatible with the IP SA split-horizon check, the AR- | |||
| REPLICATOR MAY keep the original received tunnel IP SA when | REPLICATOR MAY keep the original received tunnel IP SA when | |||
| replicating packets to a remote AR-LEAF or RNVE. This will allow DF | replicating packets to a remote AR-LEAF or RNVE. This will allow AR- | |||
| (Designated Forwarder) AR-LEAF nodes to apply Split-horizon check | LEAF nodes to apply Split-horizon check procedures for BM packets, | |||
| procedures for BM packets, before sending them to the local Ethernet- | before sending them to the local Ethernet-Segment. Even if the AR- | |||
| Segment. Even if the AR-LEAF's IP SA is preserved when replicating to | LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the | |||
| AR-LEAFs or RNVEs, the AR-REPLICATOR MUST always use its IR-IP as IP | AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to | |||
| SA when replicating to other AR-REPLICATORs. | other AR-REPLICATORs. | |||
| When EVPN is used for MPLS over GRE (or UDP), the ESI-label based | When EVPN is used for MPLS over GRE (or UDP), the ESI-label based | |||
| split-horizon procedure as in [RFC7432] will not work for multi-homed | split-horizon procedure as in [RFC7432] will not work for multi-homed | |||
| Ethernet-Segments defined on AR-LEAF nodes. "Local-Bias" is | Ethernet-Segments defined on AR-LEAF nodes. "Local-Bias" is | |||
| recommended in this case, as in the case of VXLAN or NVGRE explained | recommended in this case, as in the case of VXLAN or NVGRE explained | |||
| above. The "Local-Bias" and tunnel IP SA preservation mechanisms | above. The "Local-Bias" and tunnel IP SA preservation mechanisms | |||
| provide the required split-horizon behavior in non-selective or | provide the required split-horizon behavior in non-selective or | |||
| selective AR. | selective AR. | |||
| Note that if the AR-REPLICATOR implementation keeps the received | Note that if the AR-REPLICATOR implementation keeps the received | |||
| tunnel IP SA, the use of uRPF (unicast Reverse Path Forwarding) | tunnel IP SA, the use of uRPF (unicast Reverse Path Forwarding) | |||
| checks in the IP fabric based on the tunnel IP SA MUST be disabled. | checks in the IP fabric based on the tunnel IP SA MUST be disabled. | |||
| 9.2. Ethernet Segments on AR-REPLICATOR nodes | 9.2. Ethernet Segments on AR-REPLICATOR nodes | |||
| Ethernet Segments associated to one or more AR-REPLICATOR nodes | Ethernet Segments associated to one or more AR-REPLICATOR nodes | |||
| SHOULD follow "Local-Bias" procedures for EVPN all-active multi- | SHOULD follow "Local-Bias" procedures for EVPN all-active multi- | |||
| homing, as follows: | homing, as follows: | |||
| o For BUM traffic received on a local AR-REPLICATOR's AC, "Local- | - For BUM traffic received on a local AR-REPLICATOR's AC, "Local- | |||
| Bias" procedures as in [RFC8365] SHOULD be followed. | Bias" procedures as in [RFC8365] SHOULD be followed. | |||
| o For BUM traffic received on an AR-REPLICATOR overlay tunnel with | ||||
| AR-IP as the IP DA, "Local-Bias" SHOULD also be followed. That is, | ||||
| traffic received with AR-IP as IP DA will be treated as though it | ||||
| had been received on a local AC that is part of the ES and will be | ||||
| forwarded to all local ES, irrespective of their DF or NDF state. | ||||
| o BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP | ||||
| as the IP DA, will follow regular [RFC8365] "Local-Bias" rules and | ||||
| will not be forwarded to local ESes that are shared with the AR-LEAF | ||||
| or AR-REPLICATOR originating the traffic. | ||||
| 10. Benefits of the optimized-IR solution | - For BUM traffic received on an AR-REPLICATOR overlay tunnel with | |||
| AR-IP as the IP DA, "Local-Bias" SHOULD also be followed. That | ||||
| is, traffic received with AR-IP as IP DA will be treated as though | ||||
| it had been received on a local AC that is part of the ES and will | ||||
| be forwarded to all local ES, irrespective of their DF or NDF | ||||
| state. | ||||
| A solution for the optimization of Ingress Replication in EVPN is | - BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP | |||
| described in this document (optimized-IR). The solution brings the | as the IP DA, will follow regular [RFC8365] "Local-Bias" rules and | |||
| following benefits: | will not be forwarded to local ESes that are shared with the AR- | |||
| LEAF or AR-REPLICATOR originating the traffic. | ||||
| o Optimizes the multicast forwarding in low-performance NVEs, by | 10. Security Considerations | |||
| relaying the replication to high-performance NVEs (AR-REPLICATORs) | ||||
| and while preserving the packet ordering for unicast applications. | ||||
| o Reduces the flooded traffic in NVO networks where some NVEs do not | The Security Considerations in [RFC7432] and [RFC8365] apply to this | |||
| need broadcast/multicast and/or unknown unicast traffic. | document. | |||
| o It is fully compatible with existing EVPN implementations and EVPN | In addition, the procedures introduced by this document may bring | |||
| functions for NVO overlay tunnels. Optimized-IR NVEs and regular | some new risks for the successful delivery of BM traffic. Unicast | |||
| NVEs can be even part of the same EVI. | traffic is not affected by this document. The forwarding of | |||
| Broadcast and Multicast (BM) traffic is modified though, and BM | ||||
| traffic from the AR-LEAF nodes will be attracted by the existence of | ||||
| AR-REPLICATORs in the BD. An AR-LEAF will forward BM traffic to its | ||||
| selected AR-REPLICATOR, therefore an attack on the AR-REPLICATOR | ||||
| could impact the delivery of the BM traffic using that node. | ||||
| o It does not require any PIM-based tree in the NVO core of the | An implementation following the procedures in this document should not | |||
| network. | create BM loops, since the AR-REPLICATOR will always forward the BM | |||
| traffic using the correct tunnel IP Destination Address that | ||||
| indicates to the remote nodes how to forward the traffic. This is | ||||
| true in both the Non-Selective and Selective modes defined in this | ||||
| document. | ||||
| 11. Security Considerations | The Selective mode provides a multi-staged replication solution, | |||
| where a proper configuration of all the AR-REPLICATORs will avoid any | ||||
| issues. A mix of mistakenly configured Selective and Non-Selective | ||||
| AR-REPLICATORs in the same BD could theoretically create packet | ||||
| duplication in some AR-LEAFs; however, this document provides a | ||||
| fallback solution to Non-Selective mode in case the AR-REPLICATORs | ||||
| advertise an inconsistent AR Replication mode. | ||||
| This section will be added in future versions. | Finally, the use of PFL, as in Section 7, should be handled with care. | |||
| An intentional or unintentional misconfiguration of the BDs on a | ||||
| given leaf node may result in the leaf not receiving the required BM | ||||
| or Unknown unicast traffic. | ||||
| 12. IANA Considerations | 11. IANA Considerations | |||
| IANA has allocated the following Border Gateway Protocol (BGP) | IANA has allocated the following Border Gateway Protocol (BGP) | |||
| Parameters: | Parameters: | |||
| 1) Allocation in the P-Multicast Service Interface Tunnel (PMSI | - Allocation in the P-Multicast Service Interface Tunnel (PMSI | |||
| Tunnel) Tunnel Types registry: | Tunnel) Tunnel Types registry: | |||
| Value Meaning Reference | Value Meaning Reference | |||
| 0x0A Assisted-Replication Tunnel [This document] | 0x0A Assisted-Replication Tunnel [This document] | |||
| 2) Allocations in the P-Multicast Service Interface (PMSI) Tunnel | - Allocations in the P-Multicast Service Interface (PMSI) Tunnel | |||
| Attribute Flags registry: | Attribute Flags registry: | |||
| Value Name Reference | Value Name Reference | |||
| 3-4 Assisted-Replication Type (T) [This document] | 3-4 Assisted-Replication Type (T) [This document] | |||
| 5 Broadcast and Multicast (BM) [This document] | 5 Broadcast and Multicast (BM) [This document] | |||
| 6 Unknown (U) [This document] | 6 Unknown (U) [This document] | |||
| 13. References | 12. Contributors | |||
| 13.1 Normative References | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
| Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March | ||||
| 1997, <https://www.rfc-editor.org/info/rfc2119>. | ||||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | ||||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, | ||||
| <https://www.rfc-editor.org/info/rfc8174>. | ||||
| [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP | ||||
| Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", | ||||
| RFC 6514, DOI 10.17487/RFC6514, February 2012, <https://www.rfc- | ||||
| editor.org/info/rfc6514>. | ||||
| [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., | ||||
| Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet | ||||
| VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015, | ||||
| <https://www.rfc-editor.org/info/rfc7432>. | ||||
| [EVPN-BUM] Zhang et al., "Updates on EVPN BUM Procedures", draft- | ||||
| ietf-bess-evpn-bum-procedure-updates-04.txt, work in progress, June | ||||
| 2018. | ||||
| 13.2 Informative References | ||||
| [RFC8365] Sajassi et al., "A Network Virtualization Overlay Solution | ||||
| Using Ethernet VPN (EVPN)", RFC 8365, March, 2018. | ||||
| 14. Contributors | ||||
| In addition to the names in the front page, the following co-authors | In addition to the names in the front page, the following co-authors | |||
| also contributed to this document: | also contributed to this document: | |||
| Wim Henderickx | Wim Henderickx | |||
| Nokia | Nokia | |||
| Kiran Nagaraj | Kiran Nagaraj | |||
| Nokia | Nokia | |||
| skipping to change at page 25, line 35 ¶ | skipping to change at page 24, line 36 ¶ | |||
| Nischal Sheth | Nischal Sheth | |||
| Juniper Networks | Juniper Networks | |||
| Aldrin Isaac | Aldrin Isaac | |||
| Juniper | Juniper | |||
| Mudassir Tufail | Mudassir Tufail | |||
| Citibank | Citibank | |||
| 15. Acknowledgments | 13. Acknowledgments | |||
| The authors would like to thank Neil Hart, David Motz, Dai Truong, | The authors would like to thank Neil Hart, David Motz, Dai Truong, | |||
| Thomas Morin, Jeffrey Zhang and Shankar Murthy for their valuable | Thomas Morin, Jeffrey Zhang, Shankar Murthy and Krzysztof Szarkowicz | |||
| feedback and contributions. | for their valuable feedback and contributions. | |||
| 16. Authors' Addresses | 14. References | |||
| Jorge Rabadan (Editor) | 14.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
| Requirement Levels", BCP 14, RFC 2119, | ||||
| DOI 10.17487/RFC2119, March 1997, | ||||
| <https://www.rfc-editor.org/info/rfc2119>. | ||||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | ||||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
| [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP | ||||
| Encodings and Procedures for Multicast in MPLS/BGP IP | ||||
| VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, | ||||
| <https://www.rfc-editor.org/info/rfc6514>. | ||||
| [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., | ||||
| Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based | ||||
| Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February | ||||
| 2015, <https://www.rfc-editor.org/info/rfc7432>. | ||||
| [I-D.ietf-bess-evpn-bum-procedure-updates] | ||||
| Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A. | ||||
| Sajassi, "Updates on EVPN BUM Procedures", draft-ietf- | ||||
| bess-evpn-bum-procedure-updates-08 (work in progress), | ||||
| November 2019. | ||||
| 14.2. Informative References | ||||
| [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | ||||
| Uttaro, J., and W. Henderickx, "A Network Virtualization | ||||
| Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | ||||
| DOI 10.17487/RFC8365, March 2018, | ||||
| <https://www.rfc-editor.org/info/rfc8365>. | ||||
| Authors' Addresses | ||||
| J. Rabadan (editor) | ||||
| Nokia | Nokia | |||
| 777 E. Middlefield Road | 777 Middlefield Road | |||
| Mountain View, CA 94043 USA | Mountain View, CA 94043 | |||
| USA | ||||
| Email: jorge.rabadan@nokia.com | Email: jorge.rabadan@nokia.com | |||
| Senthil Sathappan | ||||
| S. Sathappan | ||||
| Nokia | Nokia | |||
| Email: senthil.sathappan@nokia.com | Email: senthil.sathappan@nokia.com | |||
| W. Lin | ||||
| Juniper Networks | ||||
| Mukul Katiyar | Email: wlin@juniper.net | |||
| M. Katiyar | ||||
| Versa Networks | Versa Networks | |||
| Email: mukul@versa-networks.com | Email: mukul@versa-networks.com | |||
| Wen Lin | A. Sajassi | |||
| Juniper Networks | Cisco Systems | |||
| Email: wlin@juniper.net | ||||
| Ali Sajassi | ||||
| Cisco | ||||
| Email: sajassi@cisco.com | Email: sajassi@cisco.com | |||
| End of changes. 223 change blocks. | ||||
| 731 lines changed or deleted | 752 lines changed or added | |||
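
The three forwarding rules that Section 9.2 of the -07 text lists for Ethernet Segments on AR-REPLICATOR nodes can be sketched as a small decision function. This is an illustrative sketch only, not part of either draft: the names `Ingress`, `LocalES`, `egress_segments`, `peer_ips` and `tunnel_ip_sa` are hypothetical, and the IR-IP branch assumes that the regular [RFC8365] "Local-Bias" rule at the receiving node is "forward only where this node is DF, and never onto an ES shared with the node identified by the tunnel IP SA". Excluding the ES on which a local frame arrived is likewise an assumption of the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Ingress(Enum):
    """Where the AR-REPLICATOR received the BUM frame (hypothetical names)."""
    LOCAL_AC = "local AC"
    TUNNEL_AR_IP = "overlay tunnel, IP DA = AR-IP"
    TUNNEL_IR_IP = "overlay tunnel, IP DA = IR-IP"


@dataclass(frozen=True)
class LocalES:
    esi: str                            # Ethernet Segment Identifier
    is_df: bool                         # is this node the DF for the ES?
    peer_ips: frozenset = frozenset()   # tunnel IPs of other nodes on the ES


def egress_segments(ingress, local_segments, ingress_esi=None, tunnel_ip_sa=None):
    """Return the ESIs of the local ESes onto which the frame is forwarded."""
    selected = []
    for es in local_segments:
        if ingress in (Ingress.LOCAL_AC, Ingress.TUNNEL_AR_IP):
            # Local AC, or AR-IP treated as though received on a local AC:
            # forward to all local ESes irrespective of DF or NDF state.
            # Assumption: never send back onto the ES the frame arrived on.
            if es.esi != ingress_esi:
                selected.append(es.esi)
        else:
            # IR-IP: assumed regular Local-Bias at the receiving node.
            if tunnel_ip_sa in es.peer_ips:
                continue                # ES shared with the originating node
            if es.is_df:
                selected.append(es.esi)
    return selected


segments = [
    LocalES("ES-1", is_df=True, peer_ips=frozenset({"192.0.2.1"})),
    LocalES("ES-2", is_df=False, peer_ips=frozenset({"192.0.2.2"})),
    LocalES("ES-3", is_df=True),
]
via_ar_ip = egress_segments(Ingress.TUNNEL_AR_IP, segments, tunnel_ip_sa="192.0.2.1")
via_ir_ip = egress_segments(Ingress.TUNNEL_IR_IP, segments, tunnel_ip_sa="192.0.2.1")
via_local = egress_segments(Ingress.LOCAL_AC, segments, ingress_esi="ES-1")
```

In this sketch the DF state is consulted only on the IR-IP path, which is the difference between the second and third bullets of Section 9.2.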