< draft-ietf-bess-evpn-pref-df-06.txt   draft-ietf-bess-evpn-pref-df-07.txt >
BESS Workgroup J. Rabadan, Ed. BESS Workgroup J. Rabadan, Ed.
Internet-Draft S. Sathappan Internet-Draft S. Sathappan
Intended status: Standards Track Nokia Intended status: Standards Track Nokia
Expires: December 21, 2020 T. Przygienda Expires: September 13, 2021 T. Przygienda
W. Lin W. Lin
J. Drake J. Drake
Juniper Networks Juniper Networks
A. Sajassi A. Sajassi
S. Mohanty S. Mohanty
Cisco Systems Cisco Systems
June 19, 2020 March 12, 2021
Preference-based EVPN DF Election Preference-based EVPN DF Election
draft-ietf-bess-evpn-pref-df-06 draft-ietf-bess-evpn-pref-df-07
Abstract Abstract
The Designated Forwarder (DF) in Ethernet Virtual Private Networks The Designated Forwarder (DF) in Ethernet Virtual Private Networks
(EVPN) is defined as the PE responsible for sending Broadcast, (EVPN) is defined as the PE responsible for sending Broadcast,
Unknown unicast and Multicast traffic (BUM) to a multi-homed device/ Unknown unicast and Multicast traffic (BUM) to a multi-homed device/
network in the case of an all-active multi-homing Ethernet Segment network in the case of an all-active multi-homing Ethernet Segment
(ES), or BUM and unicast in the case of single-active multi-homing. (ES), or BUM and unicast in the case of single-active multi-homing.
The DF is selected out of a candidate list of PEs that advertise the The DF is selected out of a candidate list of PEs that advertise the
same Ethernet Segment Identifier (ESI) to the EVPN network, according same Ethernet Segment Identifier (ESI) to the EVPN network, according
to the Default DF Election algorithm. to the Default DF Election algorithm. While the Default Algorithm
provides an efficient and automated way of selecting the DF across
While the Default Algorithm provides an efficient and automated way different Ethernet Tags in the ES, there are some use cases where a
of selecting the DF across different Ethernet Tags in the ES, there more 'deterministic' and user-controlled method is required. At the
are some use-cases where a more 'deterministic' and user-controlled same time, Service Providers require an easy way to force an on-
method is required. At the same time, Service Providers require an demand DF switchover in order to carry out some maintenance tasks on
easy way to force an on-demand DF switchover in order to carry out the existing DF or control whether a new active PE can preempt the
some maintenance tasks on the existing DF or control whether a new existing DF PE.
active PE can preempt the existing DF PE.
This document proposes an extension to the Default DF election This document proposes a DF Election algorithm that meets the
procedures so that the above requirements can be met. requirements of determinism and operation control.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 21, 2020. This Internet-Draft will expire on September 13, 2021.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3 1.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 3
1.2. Solution requirements . . . . . . . . . . . . . . . . . . 3 1.2. Solution requirements . . . . . . . . . . . . . . . . . . 3
2. Requirements Language and Terminology . . . . . . . . . . . . 4 2. Requirements Language and Terminology . . . . . . . . . . . . 4
3. EVPN BGP Attributes Extensions . . . . . . . . . . . . . . . 5 3. EVPN BGP Attributes Extensions . . . . . . . . . . . . . . . 5
4. Solution description . . . . . . . . . . . . . . . . . . . . 6 4. Solution description . . . . . . . . . . . . . . . . . . . . 6
4.1. Use of the Preference algorithm . . . . . . . . . . . . . 7 4.1. Use of the Highest-Preference Algorithm . . . . . . . . . 7
4.2. Use of the Preference algorithm in [RFC7432] Ethernet 4.2. Use of the Lowest-Preference Algorithm . . . . . . . . . 9
Segments . . . . . . . . . . . . . . . . . . . . . . . . 9 4.3. Use of the Highest-Preference algorithm in [RFC7432]
4.3. The Non-Revertive Capability . . . . . . . . . . . . . . 10 Ethernet Segments . . . . . . . . . . . . . . . . . . . . 9
5. Security Considerations . . . . . . . . . . . . . . . . . . . 13 4.4. The Non-Revertive Capability . . . . . . . . . . . . . . 10
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 5. Security Considerations . . . . . . . . . . . . . . . . . . . 14
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 13 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 13 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 14
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 15
9.1. Normative References . . . . . . . . . . . . . . . . . . 14 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 15
9.2. Informative References . . . . . . . . . . . . . . . . . 14 9.1. Normative References . . . . . . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15 9.2. Informative References . . . . . . . . . . . . . . . . . 16
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16
1. Introduction 1. Introduction
1.1. Problem Statement
[RFC7432] defines the Designated Forwarder (DF) in (PBB-)EVPN 1.1. Problem Statement
networks as the PE responsible for sending broadcast, multicast and
unknown unicast traffic (BUM) to a multi-homed device/network in the
case of an all-active multi-homing ES or BUM and unicast traffic to a
multi-homed device or network in case of single-active multi-homing.
The DF is selected out of a candidate list of PEs that advertise the [RFC7432] defines the Designated Forwarder (DF) in EVPN networks as
the PE responsible for sending broadcast, multicast and unknown
unicast traffic (BUM) to a multi-homed device/network in the case of
an all-active multi-homing ES or BUM and unicast traffic to a multi-
homed device or network in case of single-active multi-homing. The
DF is selected out of a candidate list of PEs that advertise the
Ethernet Segment Identifier (ESI) to the EVPN network and according Ethernet Segment Identifier (ESI) to the EVPN network and according
to the DF Election Algorithm, or DF Alg as per [RFC8584]. to the DF Election Algorithm, or DF Alg as per [RFC8584].
While the Default DF Alg [RFC7432] or HRW [RFC8584] provide an While the Default DF Alg [RFC7432] or HRW [RFC8584] provide an
efficient and automated way of selecting the DF across different efficient and automated way of selecting the DF across different
Ethernet Tags in the ES, there are some use-cases where a more Ethernet Tags in the ES, there are some use-cases where a more
'deterministic' and user-controlled method is required. At the same 'deterministic' and user-controlled method is required. At the same
time, Service Providers require an easy way to force an on-demand DF time, Service Providers require an easy way to force an on-demand DF
switchover in order to carry out some maintenance tasks on the switchover in order to carry out some maintenance tasks on the
existing DF or control whether a new active PE can preempt the existing DF or control whether a new active PE can preempt the
skipping to change at page 5, line 41 skipping to change at page 5, line 41
Where the following fields are defined as follows: Where the following fields are defined as follows:
o DF Alg can have the following values: o DF Alg can have the following values:
- Alg 0 - Default DF Election algorithm, or modulus-based - Alg 0 - Default DF Election algorithm, or modulus-based
algorithm as per [RFC7432]. algorithm as per [RFC7432].
- Alg 1 - HRW algorithm as per [RFC8584]. - Alg 1 - HRW algorithm as per [RFC8584].
- Alg 2 - Preference algorithm (this document). - Alg 2 - Highest-Preference algorithm (this document).
- Alg TBD - Lowest-Preference algorithm (this document). TBD
will be replaced by the allocated value at the time of
publication.
o Bitmap (2 octets) can have the following values: o Bitmap (2 octets) can have the following values:
1 1 1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|D|A| | |D|A| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Bitmap field in the DF Election Extended Community Figure 2: Bitmap field in the DF Election Extended Community
- Bit 0 (corresponds to Bit 24 of the DF Election Extended - Bit 0 (corresponds to Bit 24 of the DF Election Extended
Community and it is defined by this document): D bit or 'Don't Community and it is defined by this document): D bit or 'Don't
Preempt' bit (DP hereafter), determines if the PE advertising Preempt' bit (DP hereafter), determines if the PE advertising
the ES route requests the remote PEs in the ES not to preempt the ES route requests the remote PEs in the ES not to preempt
it as DF. The default value is DP=0, which is compatible with it as DF. The default value is DP=0, which is compatible with
the 'preempt' or 'revertive' behavior in the Default DF Alg the 'preempt' or 'revertive' behavior in the Default DF Alg
[RFC7432]. The DP bit SHOULD be ignored if the DF Alg is [RFC7432]. The DP capability is supported by Alg 2 and Alg
different than 2. TBD, and MAY be used with DF Alg 0 or 1. The procedures of the
DP capability for DF Alg 0 or 1 are out of the scope of this
document.
- Bit 1: AC-DF or AC-Influenced DF Election, as explained in - Bit 1: AC-DF or AC-Influenced DF Election, as explained in
[RFC8584]. When set to 1, it indicates the desire to use AC- [RFC8584]. When set to 1, it indicates the desire to use AC-
Influenced DF Election with the rest of the PEs in the ES. The Influenced DF Election with the rest of the PEs in the ES. The
AC-DF capability bit MAY be set along with the DP capability AC-DF capability bit MAY be set along with the DP capability
and DF Alg 2. and DF Alg 2 or Alg TBD.
o DF Preference (defined in this document): defines a 2-octet value o DF Preference (defined in this document): defines a 2-octet value
that indicates the PE preference to become the DF in the ES. The that indicates the PE preference to become the DF in the ES. The
allowed values are within the range 0-65535, and the default value allowed values are within the range 0-65535, and the default value
MUST be 32767. This value is the midpoint in the allowed MUST be 32767. This value is the midpoint in the allowed
Preference range of values, which gives the operator the Preference range of values, which gives the operator the
flexibility of choosing a significant number of values, above or flexibility of choosing a significant number of values, above or
below the default Preference. The DF Preference field is specific below the default Preference. The DF Preference field is specific
to DF Alg 2 and does not represent any Preference value for other to DF Alg 2 and DF Alg TBD, and does not represent any Preference
Algs. If the DF Alg is different than Alg 2, these two octets can value for other Algs. If the DF Alg is different than Alg 2 or
be encoded differently. Alg TBD, these two octets can be encoded differently.
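The Bitmap and DF Preference fields described above can be sketched as follows. This is a minimal illustration, not a reference encoder: it assumes the usual IETF convention that Bit 0 in Figure 2 is the most significant bit of the 2-octet Bitmap and that both fields are carried in network byte order. The function and constant names are hypothetical.

```python
from typing import Tuple

# Assumption: Bit 0 of Figure 2 is the most significant bit of the Bitmap.
DP_BIT = 0x8000      # D bit, "Don't Preempt"
AC_DF_BIT = 0x4000   # A bit, AC-Influenced DF Election
DEFAULT_PREFERENCE = 32767  # default DF Preference stated in the text


def encode_bitmap(dp: bool = False, ac_df: bool = False) -> bytes:
    """Return the 2-octet Bitmap with the D and A bits set as requested."""
    value = (DP_BIT if dp else 0) | (AC_DF_BIT if ac_df else 0)
    return value.to_bytes(2, "big")


def decode_bitmap(raw: bytes) -> Tuple[bool, bool]:
    """Return (dp, ac_df) from a 2-octet Bitmap."""
    if len(raw) != 2:
        raise ValueError("Bitmap must be exactly 2 octets")
    value = int.from_bytes(raw, "big")
    return bool(value & DP_BIT), bool(value & AC_DF_BIT)


def encode_preference(pref: int = DEFAULT_PREFERENCE) -> bytes:
    """Return the 2-octet DF Preference; allowed range is 0-65535."""
    if not 0 <= pref <= 65535:
        raise ValueError("DF Preference must be within 0-65535")
    return pref.to_bytes(2, "big")
```

With these assumptions, DP=1 alone yields the octets 0x80 0x00, and the default Preference is encoded as 0x7f 0xff.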
4. Solution description 4. Solution description
Figure 3 illustrates an example that will be used in the description Figure 3 illustrates an example that will be used in the description
of the solution. of the solution.
EVPN network EVPN network
+-------------------+ +-------------------+
| +-------+ ENNI Aggregation | +-------+ ENNI Aggregation
| <---ESI1,500 | PE1 | /\ +----Network---+ | <---ESI1,500 | PE1 | /\ +----Network---+
| <-----ESI2,100 | |===||=== | | <-----ESI2,100 | |===||=== |
| | |===||== \ vES1 | +----+ | | |===||== \ vES1 | +----+
+-----+ | | \/ |\----------------+CE1 | +-----+ | | \/ |\----------------+CE1 |
CE3--+ PE4 | +-------+ | \ ------------+ | CE3--+ PE4 | +-------+ | \ ------------+ |
+-----+ | | \ / | +----+ +-----+ | | \ / | +----+
| | | X | | | | X |
| <---ESI1,255 +-----+============ \ | | <---ESI1,255 +-----+============ \ |
| <-----ESI2,200 | PE2 |========== \ vES2 | +----+ | <-----ESI2,200 | PE2 |========== \ vES2 | +----+
| +-----+ | \ ----------+CE2 | | +-----+ | \ ----------+CE2 |
| | | --------------| | | | | --------------+ |
| +-----+ ----------------------+ | | +-----+ ----------------------+ |
| <-----ESI2,300 | PE3 +--/ | | +----+ | <-----ESI2,300 | PE3 +--/ | | +----+
| +-----+ +--------------+ | +-----+ +--------------+
--------------------+ --------------------+
Figure 3: Preference-based DF Election Figure 3: Preference-based DF Election
Figure 3 shows three PEs that are connecting EVCs coming from the Figure 3 shows three PEs that are connecting EVCs coming from the
Aggregation Network to their EVIs in the EVPN network. CE1 is Aggregation Network to their EVIs in the EVPN network. CE1 is
connected to vES1 - that spans PE1 and PE2 - and CE2 is connected to connected to vES1 - that spans PE1 and PE2 - and CE2 is connected to
vES2, that is defined in PE1, PE2 and PE3. vES2, that is defined in PE1, PE2 and PE3.
If the algorithm chosen for vES1 and vES2 is Alg 2, i.e., Preference- If the algorithm chosen for vES1 and vES2 is Alg 2 or Alg TBD, i.e.,
based, the PEs may become DF irrespective of their IP address and Highest-Preference or Lowest-Preference, the PEs may become DF
based on an administrative Preference value. The following sections irrespective of their IP address and based on an administrative
provide some examples of the procedures and how they are applied in Preference value. The following sections provide some examples of
the use-case of Figure 3. the procedures and how they are applied in the use-case of Figure 3.
4.1. Use of the Preference algorithm 4.1. Use of the Highest-Preference Algorithm
Assuming the operator wants to control - in a flexible way - what PE Assuming the operator wants to control - in a flexible way - what PE
becomes the DF for a given vES and the order in which the PEs become becomes the DF for a given vES and the order in which the PEs become
DF in case of multiple failures, the following procedure may be used: DF in case of multiple failures, the following procedure may be used:
a. vES1 and vES2 are now configurable with three optional parameters a. vES1 and vES2 are now configurable with three optional parameters
that are signaled in the DF Election extended community. These that are signaled in the DF Election extended community. These
parameters are the Preference, Preemption option (or "Don't parameters are the Preference, Preemption option (or "Don't
Preempt Me" option) and DF Alg. We will represent these Preempt Me" option) and DF Alg. We will represent these
parameters as (Pref,DP,Alg). Let's assume vES1 is configured as parameters as (Pref,DP,Alg). Let's assume vES1 is configured as
(500,0,Pref) in PE1, and (255,0,Pref) in PE2. vES2 is configured (500,0,Highest-Pref) in PE1, and (255,0,Highest-Pref) in PE2.
as (100,0,Pref), (200,0,Pref) and (300,0,Pref) in PE1, PE2 and vES2 is configured as (100,0,Highest-Pref), (200,0,Highest-Pref)
PE3 respectively. and (300,0,Highest-Pref) in PE1, PE2 and PE3 respectively.
b. The PEs will advertise an ES route for each vES, including the 3 b. The PEs will advertise an ES route for each vES, including the 3
parameters in the DF Election Extended Community. parameters in the DF Election Extended Community.
c. According to [RFC8584], each PE will run the DF election c. According to [RFC8584], each PE will run the DF election
algorithm upon expiration of the DF Wait timer. In this case, algorithm upon expiration of the DF Wait timer. In this case,
each PE runs the Preference-based DF Alg for each ES as follows: each PE runs the Highest-Preference DF Alg for each ES as
follows:
- The PE will check the DF Alg value in each ES route, and - The PE will check the DF Alg value in each ES route, and
assuming all the ES routes are consistent in this DF Alg and assuming all the ES routes are consistent in this DF Alg and
the value is 2 (Preference-based), the PE will run the the value is 2 (Highest-Preference), the PE will run the
procedure in this section. Otherwise, the procedure will fall procedure in this section. Otherwise, the procedure will fall
back to [RFC7432] Default Alg. back to [RFC7432] Default Alg.
- In this Preference-based Alg, each PE builds a list of - In this Highest-Preference Alg, each PE builds a list of
candidate PEs, ordered by Preference. E.g. PE1 will build a candidate PEs, ordered by Preference. E.g. PE1 will build a
list of candidate PEs for vES1 ordered by the Preference, from list of candidate PEs for vES1 ordered by the Preference, from
high to low: PE1>PE2. Hence PE1 will become the DF for vES1. high to low: PE1>PE2. Hence PE1 will become the DF for vES1.
In the same way, PE3 becomes the DF for vES2. In the same way, PE3 becomes the DF for vES2.
d. Note that, by default, the Highest-Preference is chosen for each d. Assuming some maintenance tasks had to be executed on, e.g., PE3,
ES or vES, however the ES configuration can be changed to the the operator could set vES2's Preference to, e.g., 50 so that PE2
Lowest-Preference algorithm as long as this option is consistent is forced to take over as DF for vES2 (irrespective of the DP
in all the PEs in the ES. E.g. vES1 could have been explicitly capability). Once the maintenance task on PE3 is over, the
configured as Alg Preference-based with Lowest-Preference, in operator could decide to leave the existing preference or
which case, PE2 would have been the DF. configure the old preference back.
e. Assuming some maintenance tasks had to be executed on PE3, the
operator could set vES2's Preference to e.g., 50 so that PE2 is
forced to take over as DF for vES2 (irrespective of the DP
capability). Once the maintenance on PE3 is over, the operator
could decide to leave the existing preference or configure the
old preference back.
f. In case of equal Preference in two or more PEs in the ES, the DP e. In case of equal Preference in two or more PEs in the ES, the DP
bit and the lowest IP of the candidate PEs are used as tie- bit and the lowest IP of the candidate PEs are used as tie-
breakers. After selecting the PEs with the highest Preference breakers. After selecting the PEs with the highest Preference
value, an implementation MUST first select the PE advertising the value, an implementation MUST first select the PE advertising the
DP bit set, and then select the PE with the lowest IP address (if DP bit set, and then select the PE with the lowest IP address (if
the DP bit selection does not yield a unique candidate). The the DP bit selection does not yield a unique candidate). The
PE's IP address is the address used in the candidate list and it PE's IP address is the address used in the candidate list and it
is derived from the Originating Router's IP address of the ES is derived from the Originating Router's IP address of the ES
route. Some examples of the use of the DP bit and IP address route. Some examples of the use of the DP bit and IP address
tie-breakers follow: tie-breakers follow:
- If vES1 parameters were (500,0,Pref) in PE1 and (500,1,Pref) - If vES1 parameters were (500,0,Highest-Pref) in PE1 and
in PE2, PE2 would be elected due to the DP bit. (500,1,Highest-Pref) in PE2, PE2 would be elected due to the
DP bit.
- If vES1 parameters were (500,0,Pref) in PE1 and (500,0,Pref) - If vES1 parameters were (500,0,Highest-Pref) in PE1 and
in PE2, PE1 would be elected, assuming PE1's IP address is (500,0,Highest-Pref) in PE2, PE1 would be elected, assuming
lower than PE2's. PE1's IP address is lower than PE2's.
g. The Preference is an administrative option that MUST be f. The Preference is an administrative option that MUST be
configured on a per-ES basis from the management plane, but MAY configured on a per-ES basis from the management plane, but MAY
also be dynamically changed based on the use of local policies. also be dynamically changed based on the use of local policies.
For instance, on PE1, ES1's Preference can be lowered from 500 to For instance, on PE1, ES1's Preference can be lowered from 500 to
100 in case the bandwidth on the ENNI port is decreased by 50% 100 in case the bandwidth on the ENNI port is decreased by 50%
(that could happen if e.g. the 2-port LAG between PE1 and the (that could happen if e.g. the 2-port LAG between PE1 and the
Aggregation Network loses one port). Policies MAY also trigger Aggregation Network loses one port). Policies MAY also trigger
dynamic Preference changes based on the PE's bandwidth dynamic Preference changes based on the PE's bandwidth
availability in the core, specific ports going operationally availability in the core, specific ports going operationally
down, etc. The definition of the actual local policies is out of down, etc. The definition of the actual local policies is out of
scope of this document. The default Preference value is 32767. scope of this document. The default Preference value is 32767.
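The Highest-Preference selection and its tie-breakers, as listed above, can be sketched as follows. This is an illustrative sketch only: the Candidate type, the function name and the IP addresses (taken from a documentation range) are assumptions, and the sketch covers only the ordering rules of this section, not the DF Wait timer or the non-revertive procedure.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    ip: str          # Originating Router's IP address of the ES route
    pref: int        # advertised DF Preference
    dp: bool = False # advertised DP bit


def elect_highest_preference(candidates: List[Candidate]) -> Candidate:
    """Pick the DF: highest Preference, then DP bit set, then lowest IP."""
    if not candidates:
        raise ValueError("empty candidate list")
    best = max(c.pref for c in candidates)
    tied = [c for c in candidates if c.pref == best]
    # First tie-breaker: prefer PEs advertising the DP bit, if any do.
    with_dp = [c for c in tied if c.dp]
    if with_dp:
        tied = with_dp
    # Second tie-breaker: lowest IP address (assumes one address family).
    return min(tied, key=lambda c: ip_address(c.ip))


# vES2 from Figure 3; the addresses are hypothetical.
ves2 = [
    Candidate("PE1", "192.0.2.1", 100),
    Candidate("PE2", "192.0.2.2", 200),
    Candidate("PE3", "192.0.2.3", 300),
]
print(elect_highest_preference(ves2).name)
```

Lowering PE3's Preference to 50 in this list makes PE2 the result, which matches the maintenance example in the text.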
The Preference Alg MAY be used along with the AC-DF capability. The Highest-Preference Alg MAY be used along with the AC-DF
Assuming all the PEs in the ES are configured consistently with capability. Assuming all the PEs in the ES are configured
Preference Alg and AC-DF capability, a given PE in the ES is not consistently with Highest-Preference Alg and AC-DF capability, a
considered as candidate for DF Election until its corresponding given PE in the ES is not considered as candidate for DF Election
Ethernet A-D per ES and Ethernet A-D per EVI routes are received, until its corresponding Ethernet A-D per ES and Ethernet A-D per EVI
as described in [RFC8584]. routes are received, as described in [RFC8584].
The procedures in this document can be used in [RFC7432] based ES or The procedures in this document can be used in [RFC7432] based ES or
vES as in [I-D.ietf-bess-evpn-virtual-eth-segment], and including vES as in [I-D.ietf-bess-evpn-virtual-eth-segment], and including
EVPN networks as in [RFC8214], [RFC7623] or [RFC8365]. EVPN networks as in [RFC8214], [RFC7623] or [RFC8365].
4.2. Use of the Preference algorithm in [RFC7432] Ethernet Segments 4.2. Use of the Lowest-Preference Algorithm
While the Preference-based DF Alg described in Section 4.1 is In addition to the Highest-Preference Alg described in Section 4.1
typically used in virtual ES scenarios where there is normally an this document defines the Lowest-Preference Alg. In this case, and
individual Ethernet Tag per vES, the existing [RFC7432] definition of using the example of vES1 in Figure 3, if the Lowest-Preference Alg
ES allows potentially up to thousands of Ethernet Tags on the same is configured in all the PEs in the ES, PE2 will be the DF due to its
ES. If this is the case, and the operator still wants to control who lower Preference.
the DF is for a given Ethernet Tag, the use of the Preference-based
DF Alg can also provide some level of load balancing.
In this type of scenarios, the ES is configured with an All the procedures described in Section 4.1 apply to the Lowest-
administrative Preference value, but then a range of Ethernet Tags Preference Alg, only replacing the Highest-Preference tie-breaker
can be defined to use the Highest-Preference or the Lowest-Preference with the Lowest-Preference tie-breaker. The Highest-Preference and
depending on the desired behavior. With this option, the PE will Lowest-Preference Algs are different Algs, therefore if two PEs
build a list of candidate PEs ordered by Preference, however the DF configured for Highest-Preference and Lowest-Preference respectively,
for a given Ethernet Tag will be determined by the local are attached to the same ES, the operational DF Election Alg will
configuration. fall back to the Default Alg.
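The relationship between the two Preference Algs and the fallback to the Default Alg can be sketched as follows. This is a minimal sketch under stated assumptions: the Algs are represented symbolically because the Lowest-Preference code point is still TBD, the names are hypothetical, and ties are not handled (Preferences are assumed to be distinct).

```python
from enum import Enum
from typing import Dict, Iterable


class DfAlg(Enum):
    # Symbolic names only; the Lowest-Preference code point is TBD.
    DEFAULT = "default"
    HRW = "hrw"
    HIGHEST_PREF = "highest-preference"
    LOWEST_PREF = "lowest-preference"


def operational_alg(advertised: Iterable[DfAlg]) -> DfAlg:
    """Use the advertised Alg only if every PE in the ES signals the same one."""
    algs = set(advertised)
    if len(algs) == 1:
        return algs.pop()
    return DfAlg.DEFAULT  # inconsistent Algs fall back to the Default Alg


def elect_by_preference(alg: DfAlg, prefs: Dict[str, int]) -> str:
    """Return the PE name elected by a Preference Alg (distinct values assumed)."""
    if alg is DfAlg.HIGHEST_PREF:
        return max(prefs, key=prefs.get)
    if alg is DfAlg.LOWEST_PREF:
        return min(prefs, key=prefs.get)
    raise ValueError("not a Preference-based Alg")


ves1 = {"PE1": 500, "PE2": 255}
alg = operational_alg([DfAlg.LOWEST_PREF, DfAlg.LOWEST_PREF])
print(elect_by_preference(alg, ves1))
```

If one PE signals Highest-Preference and the other Lowest-Preference, operational_alg returns the Default Alg and the Preference values are not used.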
4.3. Use of the Highest-Preference algorithm in [RFC7432] Ethernet
Segments
While the Highest-Preference (or Lowest-Preference for that matter)
DF Alg described in Section 4.1 is typically used in virtual ES
scenarios where there is normally an individual Ethernet Tag per vES,
the existing [RFC7432] definition of an ES allows potentially up to
thousands of Ethernet Tags on the same ES. If this is the case and the
Highest-Preference (or Lowest-Preference) Alg is configured in all
the PEs of the ES, the same PE will be the elected DF for all the
Ethernet Tags of the ES. A potential way to achieve more granular
load balancing is described below.
The ES is configured with an administrative Preference value and
e.g., Highest-Preference Alg, but then a range of Ethernet Tags can
be defined to use the Lowest-Preference depending on the desired
behavior. With this option, the PE will build a list of candidate
PEs ordered by Preference, however the DF for a given Ethernet Tag
will be determined by the local configuration.
For instance: For instance:
o Assuming ES3 is defined in PE1 and PE2, PE1 may be configured as o Assuming ES3 is defined in PE1 and PE2, PE1 may be configured as
(500,0,Preference) for ES3 and PE2 as (100,0,Preference). (500,0,Highest-Preference) for ES3 and PE2 as (100,0,Highest-
Preference).
o In addition, assuming VLAN-based service interfaces, the PEs will o In addition, assuming VLAN-based service interfaces and that the
be configured with (Ethernet Tag-range,high_or_low), E.g., PEs are attached to all Ethernet Tags in the range 1-4000, both
(1-2000,high) and (2001-4000, low). PE1 and PE2 will be configured with (Ethernet Tag-range,low),
e.g., (2001-4000, low).
o This will result in PE1 being DF for Ethernet Tags 1-2000 and PE2 o This will result in PE1 being DF for Ethernet Tags 1-2000 (since
being DF for Ethernet Tags 2001-4000. they use the default Highest-Preference Alg) and PE2 being DF for
Ethernet Tags 2001-4000, due to the local policy overriding the
Highest-Preference Alg.
For Ethernet Segments attached to three or more PEs, any other logic For Ethernet Segments attached to three or more PEs, any other logic
that provides a fair distribution of the DF function among the PEs is that provides a fair distribution of the DF function among the PEs is
valid, as long as that logic is consistent in all the PEs in the ES. valid, as long as that logic is consistent in all the PEs in the ES.
It is important to note that, when a local policy overrides the
Highest-Preference or Lowest-Preference signaled by all the PEs in
the ES, this local policy MUST be consistent in all the PEs of the
ES. If the local policy is inconsistent for a given Ethernet Tag in
the ES, black-holes or packet duplication may occur on that Ethernet
Tag.
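The ES3 example with a per-Ethernet-Tag local policy can be sketched as follows. This is an illustration only: the function name and the representation of the policy as a list of (low, high) Ethernet Tag ranges are assumptions, ties are not handled, and the same policy is assumed to be configured on every PE in the ES.

```python
from typing import Dict, List, Tuple


def df_for_tag(tag: int,
               prefs: Dict[str, int],
               lowest_ranges: List[Tuple[int, int]]) -> str:
    """Return the DF for one Ethernet Tag on an ES signaling Highest-Preference.

    Tags inside any of the inclusive lowest_ranges are overridden by local
    policy to use the lowest Preference; all other tags use the highest.
    """
    use_lowest = any(lo <= tag <= hi for lo, hi in lowest_ranges)
    pick = min if use_lowest else max
    return pick(prefs, key=prefs.get)


# ES3 as in the example: PE1 (500) and PE2 (100), policy (2001-4000, low).
es3 = {"PE1": 500, "PE2": 100}
policy = [(2001, 4000)]
for tag in (1, 2000, 2001, 4000):
    print(tag, df_for_tag(tag, es3, policy))
```

Running the same function with a different policy on one of the PEs would give a different DF for some tags, which is the inconsistency the text warns about.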
4.3. The Non-Revertive Capability 4.4. The Non-Revertive Capability
As discussed in Section 1.2 (d), a capability to NOT preempt the As discussed in Section 1.2 (d), a capability to NOT preempt the
existing DF for a given Ethernet Tag is required and therefore added existing DF for a given Ethernet Tag is required and therefore added
to the DF Election extended community. This option will allow a non- to the DF Election extended community. This option will allow a non-
revertive behavior in the DF election. revertive behavior in the DF election.
Note that, when a given PE in an ES is taken down for maintenance Note that, when a given PE in an ES is taken down for maintenance
operations, before bringing it back, the Preference may be changed in operations, before bringing it back, the Preference may be changed in
order to provide a non-revertive behavior. The DP bit and the order to provide a non-revertive behavior. The DP bit and the
mechanism explained in this section will be used for those cases when mechanism explained in this section will be used for those cases when
a former DF comes back up without any controlled maintenance a former DF comes back up without any controlled maintenance
operation, and the non-revertive option is desired in order to avoid operation, and the non-revertive option is desired in order to avoid
service impact. service impact.
In Figure 3, we assume that based on the Highest-Pref, PE3 is the DF In Figure 3, we assume that based on the Highest-Preference Alg, PE3
for ESI2. is the DF for ESI2.
If PE3 has a link, EVC or node failure, PE2 would take over as DF. If PE3 has a link, EVC or node failure, PE2 would take over as DF.
If/when PE3 comes back up again, PE3 will take over, causing some If/when PE3 comes back up again, PE3 will take over, causing some
unnecessary packet loss in the ES. unnecessary packet loss in the ES.
The following procedure avoids preemption upon failure recovery The following procedure avoids preemption upon failure recovery
(please refer to Figure 1): (please refer to Figure 3). The procedure supports a non-revertive
mode that can be used along with:
1. A new "Don't Preempt Me" capability is defined on a per-PE/per-ES o Highest-Preference Alg
o Highest-Preference Alg, where a local policy overrides the
Highest-Preference tie-breaker for a range of Ethernet Tags
o Lowest-Preference Alg
The procedure is described assuming Highest-Preference Alg in the ES,
where local policy overrides the tie-breaker for a given Ethernet
Tag, since this is the most complex case. The other two cases above
are a sub-set of this one and the differences will be explained
later.
1. A "Don't Preempt Me" capability is defined on a per-PE/per-ES
basis, as described in Section 3. If "Don't Preempt Me" is basis, as described in Section 3. If "Don't Preempt Me" is
disabled (default behavior), the advertised DP bit will be 0. If disabled (default behavior), the advertised DP bit will be 0. If
"Don't Preempt Me" is enabled, the ES route will be advertised "Don't Preempt Me" is enabled, the ES route will be advertised
with DP=1 ("Don't Preempt Me"). All the PEs in an ES SHOULD be with DP=1 ("Don't Preempt Me"). All the PEs in an ES SHOULD be
consistent in their configuration of the DP capability, however consistent in their configuration of the DP capability, however
this document do not enforce the consistency across all the PEs. this document does not enforce the consistency across all the
In case of inconsistency in the support of the DP capability in PEs. In case of inconsistency in the support of the DP
the PEs of the same ES, non-revertive behavior is not guaranteed. capability in the PEs of the same ES, non-revertive behavior is
not guaranteed. However, PEs supporting this capability will
still attempt this procedure.
2. Assuming we want to avoid 'preemption' in all the PEs in the ES, 2. We assume we want to avoid 'preemption' in all the PEs in the ES,
the three PEs are configured with the "Don't Preempt Me" the three PEs are configured with the "Don't Preempt Me"
capability. In this example, we assume ESI2 is configured as capability. In this example, we assume ESI2 is configured as
'DP=enabled' in the three PEs. 'DP=enabled' in the three PEs.
3. Assuming Ethernet Tag-1 uses Highest-Pref in vES2 and Ethernet 3. We also assume vES2 is attached to Ethernet Tag-1 and Ethernet
Tag-2 uses Lowest-Pref, when vES2 is enabled in the three PEs, Tag-2. vES2 uses Highest-Preference as DF Alg and a local policy
the PEs will exchange the ES routes and select PE3 as DF for is configured in the three PEs to use Lowest-Preference for
Ethernet Tag-1 (due to the Highest-Pref type), and PE1 as DF for Ethernet Tag-2. When vES2 is enabled in the three PEs, the PEs
Ethernet Tag-2 (due to the Lowest-Pref). will exchange the ES routes and select PE3 as DF for Ethernet
Tag-1 (due to the Highest-Preference), and PE1 as DF for Ethernet
Tag-2 (due to the Lowest-Preference).
4. If PE3's vES2 goes down (due to EVC failure - detected by OAM, or 4. If PE3's vES2 goes down (due to EVC failure - detected by OAM, or
port failure or node failure), PE2 will become the DF for port failure or node failure), PE2 will become the DF for
Ethernet Tag-1. No changes will occur for Ethernet Tag-2. Ethernet Tag-1. No changes will occur for Ethernet Tag-2.
5. When PE3's vES2 comes back up, PE3 will start a boot-timer (if 5. When PE3's vES2 comes back up, PE3 will start a boot-timer (if
booting up) or hold-timer (if the port or EVC recovers). That booting up) or hold-timer (if the port or EVC recovers). That
timer will allow some time for PE3 to receive the ES routes from timer will allow some time for PE3 to receive the ES routes from
PE1 and PE2. This timer is applied between the INIT and the PE1 and PE2. This timer is applied between the INIT and the
DF_WAIT states in the DF Election Finite State Machine described DF_WAIT states in the DF Election Finite State Machine described
skipping to change at page 13, line 5 skipping to change at page 13, line 44
not before, the PE will then compare its operational 'in-use' not before, the PE will then compare its operational 'in-use'
Pref with its administrative Pref. If different, the PE will Pref with its administrative Pref. If different, the PE will
send an ES route update with its administrative Pref and DP send an ES route update with its administrative Pref and DP
values. In the example, PE3 will be the new Highest-PE, values. In the example, PE3 will be the new Highest-PE,
therefore it will send an ES route update with therefore it will send an ES route update with
(Pref,DP)=(300,1). (Pref,DP)=(300,1).
- After running the DF Election, PE3 will become the new DF for - After running the DF Election, PE3 will become the new DF for
Ethernet Tag-1. No changes will occur for Ethernet Tag-2. Ethernet Tag-1. No changes will occur for Ethernet Tag-2.
If the ES uses Highest-Preference Alg (for all the Ethernet Tags, no
local policy), the PEs only need to select the "Highest-PE" as the
"reference-PE" (i.e., no need to select the "Lowest-PE"). If the ES
uses Lowest-Preference Alg for all the Ethernet Tags, the PEs only
need to select the "Lowest-PE" as the "reference-PE". The rest of
the procedure remains the same.
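
As a rough illustration of steps 3 and 4 of the procedure above, the following Python sketch elects a DF per Ethernet Tag, using Highest-Preference for the ES and a local policy that applies Lowest-Preference to Ethernet Tag-2. It is a minimal sketch, not the draft's normative procedure: the names `Candidate`, `elect_df` and `df_per_tag` are hypothetical, the Preference values 100 and 200 for PE1 and PE2 are assumed for the example (only PE3's value of 300 appears in the text above), and neither the tie-breakers nor the non-revertive handling of the DP bit is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One PE attached to the ES (hypothetical structure for illustration)."""
    name: str
    pref: int        # administrative Preference advertised in the ES route
    up: bool = True  # False when the PE's ES is down


def elect_df(candidates, use_lowest=False):
    """Return the name of the DF among the candidates that are up.

    Assumes all Preference values are distinct; tie-breaking is not modeled.
    """
    alive = [c for c in candidates if c.up]
    if not alive:
        return None
    pick = min if use_lowest else max
    return pick(alive, key=lambda c: c.pref).name


# Local policy (assumed representation): Ethernet Tags using Lowest-Preference.
LOWEST_PREF_TAGS = {2}


def df_per_tag(candidates, tags=(1, 2)):
    """Run the election independently for each Ethernet Tag."""
    return {
        tag: elect_df(candidates, use_lowest=(tag in LOWEST_PREF_TAGS))
        for tag in tags
    }


# Step 3: all three PEs are up.
all_up = [Candidate("PE1", 100), Candidate("PE2", 200), Candidate("PE3", 300)]
before_failure = df_per_tag(all_up)

# Step 4: PE3's vES2 goes down.
pe3_down = [
    Candidate("PE1", 100),
    Candidate("PE2", 200),
    Candidate("PE3", 300, up=False),
]
after_failure = df_per_tag(pe3_down)

print(before_failure)
print(after_failure)
```

In this sketch PE3 is elected for Ethernet Tag-1 and PE1 for Ethernet Tag-2 while all PEs are up, and only the Ethernet Tag-1 result changes (to PE2) when PE3 fails, matching the example in the text.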
Note that, irrespective of the DP bit, when a PE or ES comes back and Note that, irrespective of the DP bit, when a PE or ES comes back and
the PE advertises a DF Election Alg different than 2 (Preference the PE advertises a DF Election Alg different than the one configured
algorithm), the rest of the PEs in the ES MUST fall back to the in the rest of the PEs in the ES, all the PEs in the ES MUST fall
Default [RFC7432] Alg. back to the Default [RFC7432] Alg.
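
The fallback rule, as worded in the -07 text, can be sketched as follows. This is an illustrative reading only: the function name `effective_alg` and the dictionary representation of advertised DF Alg values are assumptions, and the sketch uses DF Alg 0 for the Default algorithm as registered by [RFC8584].

```python
DEFAULT_ALG = 0  # Default DF Election algorithm, per the [RFC8584] "DF Alg" registry


def effective_alg(advertised):
    """Return the DF Alg the PEs in the ES should run.

    `advertised` maps a PE name to the DF Alg value in its ES route
    (hypothetical representation). If every PE advertises the same
    value, that value is used; otherwise all PEs fall back to Default.
    """
    algs = set(advertised.values())
    if len(algs) == 1:
        return next(iter(algs))
    return DEFAULT_ALG


consistent = effective_alg({"PE1": 2, "PE2": 2, "PE3": 2})
mismatched = effective_alg({"PE1": 2, "PE2": 2, "PE3": 1})
print(consistent, mismatched)
```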
This document does not modify the use of the P and B bits in the This document does not modify the use of the P and B bits in the
Ethernet A-D per EVI routes [RFC8214] advertised by the PEs in the ES Ethernet A-D per EVI routes [RFC8214] advertised by the PEs in the ES
after running the DF Election, irrespective of the revertive or non- after running the DF Election, irrespective of the revertive or non-
revertive behavior in the PE. revertive behavior in the PE.
5. Security Considerations 5. Security Considerations
This document describes a DF Election Algorithm that provides This document describes a DF Election Algorithm that provides
absolute control (by configuration) over what PE is the DF for a absolute control (by configuration) over what PE is the DF for a
skipping to change at page 13, line 30 skipping to change at page 14, line 27
a malicious user that gets access to the configuration of a PE in the a malicious user that gets access to the configuration of a PE in the
ES may change the behavior of the network. In other DF Algs such as ES may change the behavior of the network. In other DF Algs such as
HRW, the DF Election is more automated and cannot be determined by HRW, the DF Election is more automated and cannot be determined by
configuration. configuration.
The non-revertive capability described in this document may be seen The non-revertive capability described in this document may be seen
as a security improvement over the regular EVPN revertive DF as a security improvement over the regular EVPN revertive DF
Election: an intentional link (or node) "flapping" on a PE will only Election: an intentional link (or node) "flapping" on a PE will only
cause service disruption once, when the PE goes to NDF state. cause service disruption once, when the PE goes to NDF state.
The document also describes how a local policy can override the
Highest-Preference Alg for a range of Ethernet Tags in the ES. If
the local policy is not consistent across all PEs in the ES and there
is an Ethernet Tag that ends up with an inconsistent use of Highest-
Preference or Lowest-Preference in different PEs, black-holing or
packet duplication may occur for that Ethernet Tag.
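
To make the risk concrete, the sketch below lets each PE run the election for one Ethernet Tag with its own local policy and then lists the PEs that consider themselves DF. It is a simplified illustration under stated assumptions: the Preference values for PE1 and PE2 and the helper names `local_view_of_df` and `forwarders` are hypothetical, and tie-breakers are not modeled.

```python
# Preference per PE (100 and 200 are assumed values for the example).
PREFS = {"PE1": 100, "PE2": 200, "PE3": 300}


def local_view_of_df(use_lowest):
    """DF as computed by one PE, given its own local policy for the tag."""
    pick = min if use_lowest else max
    return pick(PREFS, key=PREFS.get)


def forwarders(policy_by_pe):
    """Return the PEs that believe they are the DF for the Ethernet Tag.

    `policy_by_pe` maps a PE name to True if that PE applies
    Lowest-Preference for the tag, or False for Highest-Preference.
    """
    return sorted(
        pe
        for pe, use_lowest in policy_by_pe.items()
        if local_view_of_df(use_lowest) == pe
    )


# Consistent policy: exactly one DF.
consistent = forwarders({"PE1": True, "PE2": True, "PE3": True})

# PE1 applies Lowest-Preference, the others Highest-Preference:
# PE1 and PE3 both forward, so traffic may be duplicated.
duplication = forwarders({"PE1": True, "PE2": False, "PE3": False})

# PE3 applies Lowest-Preference, the others Highest-Preference:
# no PE believes it is the DF, so traffic may be black-holed.
black_hole = forwarders({"PE1": False, "PE2": False, "PE3": True})

print(consistent, duplication, black_hole)
```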
6. IANA Considerations 6. IANA Considerations
This document solicits the allocation of the following values: This document solicits the allocation of the following values:
o DF Alg = 2 in the [RFC8584] "DF Alg" registry, with name o DF Alg = 2 in the [RFC8584] "DF Alg" registry, with name "Highest-
"Preference Algorithm". Preference Algorithm".
o DF Alg = TBD in the same "DF Alg" registry, with name "Lowest-
Preference Algorithm".
o Bit 0 in the [RFC8584] DF Election Capabilities registry, with o Bit 0 in the [RFC8584] DF Election Capabilities registry, with
name "D (Don't Preempt) Capability" for Non-revertive ES. name "D (Don't Preempt) Capability" for Non-revertive ES.
7. Acknowledgments 7. Acknowledgments
The authors would like to thank Kishore Tiruveedhula for his review The authors would like to thank Kishore Tiruveedhula for his review
and comments. and comments. Also thank you to Luc Andre Burdet and Stephane
Litkowski for their thorough review and suggestions for a new DF Alg
for lowest-preference.
8. Contributors 8. Contributors
In addition to the authors listed, the following individuals also In addition to the authors listed, the following individuals also
contributed to this document: contributed to this document:
Kiran Nagaraj, Nokia Kiran Nagaraj, Nokia
Vinod Prabhu, Nokia Vinod Prabhu, Nokia
Selvakumar Sivaraj, Juniper Selvakumar Sivaraj, Juniper
Sami Boutros, VMWare Sami Boutros, VMWare
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
 End of changes. 50 change blocks. 
124 lines changed or deleted 190 lines changed or added
