BESS Workgroup                                           J. Rabadan, Ed.
Internet-Draft                                              S. Sathappan
Intended status: Standards Track                                   Nokia
Expires: 7 January 2023                                    T. Przygienda
                                                                  W. Lin
                                                                J. Drake
                                                        Juniper Networks
                                                              A. Sajassi
                                                              S. Mohanty
                                                           Cisco Systems
                                                             6 July 2022

                   Preference-based EVPN DF Election
                    draft-ietf-bess-evpn-pref-df-09

Abstract

   The Designated Forwarder (DF) in Ethernet Virtual Private Networks
   (EVPN) is defined as the PE responsible for sending Broadcast,
   Unknown unicast and Multicast traffic (BUM) to a multi-homed device/
   network in the case of an all-active multi-homing Ethernet Segment
   (ES), or BUM and unicast in the case of single-active multi-homing.
   The DF is selected out of a candidate list of PEs that advertise the
   same Ethernet Segment Identifier (ESI) to the EVPN network, according
   to the Default DF Election algorithm.  While the Default Algorithm
   provides an efficient and automated way of selecting the DF across
   different Ethernet Tags in the ES, there are some use cases where a
   more 'deterministic' and user-controlled method is required.  At the
   same time, Service Providers require an easy way to force an on-
   demand DF switchover in order to carry out some maintenance tasks on
   the existing DF or control whether a new active PE can preempt the
   existing DF PE.

   This document proposes a DF Election algorithm that meets the
   requirements of determinism and operation control.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 January 2023.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Problem Statement
     1.2.  Solution requirements
   2.  Requirements Language and Terminology
   3.  EVPN BGP Attributes Extensions
   4.  Solution description
     4.1.  Use of the Highest-Preference Algorithm
     4.2.  Use of the Lowest-Preference Algorithm
     4.3.  Use of the Highest-Preference algorithm in [RFC7432]
           Ethernet Segments
     4.4.  The Non-Revertive Capability
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgments
   8.  Contributors
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

1.1.  Problem Statement

   [RFC7432] defines the Designated Forwarder (DF) in EVPN networks as
   the PE responsible for sending broadcast, multicast and unknown
   unicast traffic (BUM) to a multi-homed device/network in the case of
   an all-active multi-homing ES or BUM and unicast traffic to a multi-
   homed device or network in case of single-active multi-homing.  The
   DF is selected out of a candidate list of PEs that advertise the
   Ethernet Segment Identifier (ESI) to the EVPN network and according
   to the DF Election Algorithm, or DF Alg as per [RFC8584].

   While the Default DF Alg [RFC7432] or HRW [RFC8584] provide an
   efficient and automated way of selecting the DF across different
   Ethernet Tags in the ES, there are some use-cases where a more
   'deterministic' and user-controlled method is required.  At the same
   time, Service Providers require an easy way to force an on-demand DF
   switchover in order to carry out some maintenance tasks on the
   existing DF or control whether a new active PE can preempt the
   existing DF PE.

   This document proposes a new DF Alg and capability to address the
   above needs.

1.2.  Solution requirements

   The procedures described in this document meet the following
   requirements:

   a.  The solution provides an administrative preference option so that
       the user can control in what order the candidate PEs may become
       DF, assuming they are all operationally ready to take over as DF.

   b.  This extension works for [RFC7432] Ethernet Segments and virtual
       ES, as defined in [I-D.ietf-bess-evpn-virtual-eth-segment].

   c.  The user may force a PE to preempt the existing DF for a given
       Ethernet Tag without re-configuring all the PEs in the ES.

   d.  The solution allows an option to NOT preempt the current DF, even
       if the former DF PE comes back up after a failure.
       This is also known as "non-revertive" behavior, as opposed to the
       [RFC7432] DF election procedures that are always revertive.

   e.  The solution works for single-active and all-active multi-homing
       Ethernet Segments.

2.  Requirements Language and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   *  AC - Attachment Circuit.  An AC has an Ethernet Tag associated to
      it.

   *  BUM - refers to the Broadcast, Unknown unicast and Multicast
      traffic.

   *  DF, NDF and BDF - Designated Forwarder, Non-Designated Forwarder
      and Backup Designated Forwarder.

   *  DF Alg or simply Alg - refers to Designated Forwarder Election
      Algorithm.

   *  HRW - Highest Random Weight, as per [RFC8584].

   *  ES, vES and ESI - Ethernet Segment, virtual Ethernet Segment and
      Ethernet Segment Identifier.

   *  EVI - EVPN Instance.

   *  ISID - refers to Service Instance Identifiers in Provider Backbone
      Bridging (PBB) networks.

   *  MAC-VRF - A Virtual Routing and Forwarding table for Media Access
      Control (MAC) addresses on a PE.

   *  BD - Broadcast Domain.  An EVI may be comprised of one (VLAN-Based
      or VLAN Bundle services) or multiple (VLAN-Aware Bundle services)
      Broadcast Domains.

   *  EVC - Ethernet Virtual Circuit.

   *  DP - refers to the "Don't Preempt me" capability in the DF
      Election extended community.

   *  OAM - refers to Operations And Maintenance protocols.

   *  Ethernet A-D per ES route - refers to [RFC7432] route type 1 or
      Auto-Discovery per Ethernet Segment route.

   *  Ethernet A-D per EVI route - refers to [RFC7432] route type 1 or
      Auto-Discovery per EVPN Instance route.

   *  Ethernet Tag - used to represent a Broadcast Domain that is
      configured on a given ES for the purpose of DF election.  Note
      that any of the following may be used to represent a Broadcast
      Domain: VIDs (including Q-in-Q tags), configured IDs, VNI (VXLAN
      Network Identifiers), normalized VID, I-SIDs (Service Instance
      Identifiers), etc., as long as the representation of the broadcast
      domains is configured consistently across the multi-homed PEs
      attached to that ES.  The Ethernet Tag value MUST be different
      from zero.

3.  EVPN BGP Attributes Extensions

   This solution reuses and extends the DF Election Extended Community
   defined in [RFC8584] that is advertised along with the ES route:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type(0x06)| RSV | DF Alg  |    Bitmap     ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~    Bitmap     |   Reserved    |   DF Preference (2 octets)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 1: DF Election Extended Community

   Where the following fields are defined as follows:

   *  DF Alg can have the following values:

      -  Alg 0 - Default DF Election algorithm, or modulus-based
         algorithm as per [RFC7432].

      -  Alg 1 - HRW algorithm as per [RFC8584].

      -  Alg 2 - Highest-Preference algorithm (this document).

      -  Alg TBD - Lowest-Preference algorithm (this document).  TBD
         will be replaced by the allocated value at the time of
         publication.

   *  Bitmap (2 octets) can have the following values:

                                 1 1 1 1 1 1
             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |D|A|                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 2: Bitmap field in the DF Election Extended Community

      -  Bit 0 (corresponds to Bit 24 of the DF Election Extended
         Community and is defined by this document): D bit or 'Don't
         Preempt' bit (DP hereafter).  It determines whether the PE
         advertising the ES route requests the remote PEs in the ES not
         to preempt it as DF.  The default value is DP=0, which is
         compatible with the 'preempt' or 'revertive' behavior in the
         Default DF Alg [RFC7432].  The DP capability is supported by
         Alg 2 and Alg TBD, and MAY be used with DF Alg 0 or 1.  The
         procedures of the DP capability for DF Alg 0 or 1 are out of
         the scope of this document.

      -  Bit 1: AC-DF or AC-Influenced DF Election, as explained in
         [RFC8584].  When set to 1, it indicates the desire to use AC-
         Influenced DF Election with the rest of the PEs in the ES.  The
         AC-DF capability bit MAY be set along with the DP capability
         and DF Alg 2 or Alg TBD.

   *  DF Preference (defined in this document): defines a 2-octet value
      that indicates the PE preference to become the DF in the ES.  The
      allowed values are within the range 0-65535, and the default
      value MUST be 32767.  This value is the midpoint in the allowed
      Preference range of values, which gives the operator the
      flexibility of choosing a significant number of values, above or
      below the default Preference.  The DF Preference field is
      specific to DF Alg 2 and DF Alg TBD, and does not represent any
      Preference value for other Algs.  If the DF Alg is different from
      Alg 2 or Alg TBD, these two octets can be encoded differently.

4.  Solution description

   Figure 3 illustrates an example that will be used in the description
   of the solution.
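
   As an illustration of the encoding shown in Figures 1 and 2, the
   following Python sketch packs and parses the 8-octet DF Election
   Extended Community.  It is not part of the specification: the
   function and constant names are hypothetical, and the Lowest-
   Preference Alg is omitted because its code point is still TBD.

```python
import struct

# Values taken from Figure 1 and the field descriptions above.
EC_TYPE = 0x06
EC_SUB_TYPE = 0x06
DF_ALG_DEFAULT = 0        # modulus-based, [RFC7432]
DF_ALG_HRW = 1            # [RFC8584]
DF_ALG_HIGHEST_PREF = 2   # this document
DEFAULT_PREF = 32767

# Bitmap bit 0 is the most significant bit of the 2-octet Bitmap
# (bit 24 of the Extended Community); bit 1 is the next one.
DP_BIT = 0x8000
AC_DF_BIT = 0x4000

# Type, Sub-Type, RSV+DF Alg, Bitmap, Reserved, DF Preference.
_LAYOUT = "!BBBHBH"


def pack_df_election_ec(df_alg, pref=DEFAULT_PREF, dp=False, ac_df=False):
    """Hypothetical helper: build the 8-octet extended community."""
    if not 0 <= df_alg <= 0x1F:
        raise ValueError("DF Alg is a 5-bit field")
    if not 0 <= pref <= 0xFFFF:
        raise ValueError("DF Preference is a 2-octet field")
    bitmap = (DP_BIT if dp else 0) | (AC_DF_BIT if ac_df else 0)
    # RSV (upper 3 bits of the third octet) is sent as zero.
    return struct.pack(_LAYOUT, EC_TYPE, EC_SUB_TYPE, df_alg, bitmap, 0, pref)


def unpack_df_election_ec(data):
    """Hypothetical helper: parse the 8-octet extended community."""
    ec_type, sub_type, alg_octet, bitmap, _reserved, pref = struct.unpack(
        _LAYOUT, data
    )
    if (ec_type, sub_type) != (EC_TYPE, EC_SUB_TYPE):
        raise ValueError("not a DF Election Extended Community")
    return {
        "df_alg": alg_octet & 0x1F,
        "dp": bool(bitmap & DP_BIT),
        "ac_df": bool(bitmap & AC_DF_BIT),
        "pref": pref,
    }


if __name__ == "__main__":
    ec = pack_df_election_ec(DF_ALG_HIGHEST_PREF, pref=500, dp=True)
    print(ec.hex())                   # 06060280000001f4
    print(unpack_df_election_ec(ec))
```

   The Preference is only meaningful to the receiver when the decoded
   DF Alg is one of the preference-based Algs; for other Algs the last
   two octets may carry something else.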

                EVPN network
            +-------------------+
            |                +-------+  ENNI    Aggregation
            |   <---ESI1,500 |  PE1  |   /\  +----Network---+
            | <-----ESI2,100 |       |===||===              |
            |                |       |===||== \     vES1    |  +----+
        +-----+              |       |   \/  |\----------------+CE1 |
   CE3--+ PE4 |              +-------+       | \   ------------+    |
        +-----+                  |           |  \ /         |  +----+
            |                    |           |   X          |
            |   <---ESI1,255  +-----+============ \         |
            | <-----ESI2,200  | PE2 |==========    \  vES2  |  +----+
            |                 +-----+        |     \ ----------+CE2 |
            |                    |           |   --------------+    |
            |                 +-----+    ----------------------+    |
            | <-----ESI2,300  | PE3 +--/     |              |  +----+
            |                 +-----+        +--------------+
            --------------------+

                Figure 3: Preference-based DF Election

   Figure 3 shows three PEs that are connecting EVCs coming from the
   Aggregation Network to their EVIs in the EVPN network.  CE1 is
   connected to vES1 - that spans PE1 and PE2 - and CE2 is connected to
   vES2, that is defined in PE1, PE2 and PE3.

   If the algorithm chosen for vES1 and vES2 is Alg 2 or Alg TBD, i.e.,
   Highest-Preference or Lowest-Preference, the PEs may become DF
   irrespective of their IP address and based on an administrative
   Preference value.  The following sections provide some examples of
   the procedures and how they are applied in the use-case of Figure 3.

4.1.  Use of the Highest-Preference Algorithm

   Assuming the operator wants to control - in a flexible way - what PE
   becomes the DF for a given vES and the order in which the PEs become
   DF in case of multiple failures, the following procedure may be used:

   a.  vES1 and vES2 are now configurable with three optional parameters
       that are signaled in the DF Election extended community.  These
       parameters are the Preference, Preemption option (or "Don't
       Preempt Me" option) and DF Alg.  We will represent these
       parameters as (Pref,DP,Alg).  Let's assume vES1 is configured as
       (500,0,Highest-Pref) in PE1, and (255,0,Highest-Pref) in PE2.
       vES2 is configured as (100,0,Highest-Pref), (200,0,Highest-Pref)
       and (300,0,Highest-Pref) in PE1, PE2 and PE3 respectively.

   b.  The PEs will advertise an ES route for each vES, including the 3
       parameters in the DF Election Extended Community.

   c.  According to [RFC8584], each PE will run the DF election
       algorithm upon expiration of the DF Wait timer.  In this case,
       each PE runs the Highest-Preference DF Alg for each ES as
       follows:

       *  The PE will check the DF Alg value in each ES route, and
          assuming all the ES routes are consistent in this DF Alg and
          the value is 2 (Highest-Preference), the PE will run the
          procedure in this section.  Otherwise, the procedure will fall
          back to [RFC7432] Default Alg.

       *  In this Highest-Preference Alg, each PE builds a list of
          candidate PEs, ordered by Preference.  For example, PE1 will
          build a list of candidate PEs for vES1 ordered by the
          Preference, from high to low: PE1>PE2.  Hence PE1 will become
          the DF for vES1.  In the same way, PE3 becomes the DF for
          vES2.

   d.  Assuming some maintenance tasks had to be executed on, e.g., PE3,
       the operator could set vES2's Preference on PE3 to, e.g., 50 so
       that PE2 is forced to take over as DF for vES2 (irrespective of
       the DP capability).  Once the maintenance task on PE3 is over,
       the operator could decide to leave the existing preference or
       configure the old preference back.

   e.  In case of equal Preference in two or more PEs in the ES, the DP
       bit and the lowest IP of the candidate PEs are used as tie-
       breakers.  After selecting the PEs with the highest Preference
       value, an implementation MUST first select the PE advertising the
       DP bit set, and then select the PE with the lowest IP address (if
       the DP bit selection does not yield a unique candidate).  The
       PE's IP address is the address used in the candidate list and it
       is derived from the Originating Router's IP address of the ES
       route.
       Some examples of the use of the DP bit and IP address tie-
       breakers follow:

       *  If vES1 parameters were (500,0,Highest-Pref) in PE1 and
          (500,1,Highest-Pref) in PE2, PE2 would be elected due to the
          DP bit.

       *  If vES1 parameters were (500,0,Highest-Pref) in PE1 and
          (500,0,Highest-Pref) in PE2, PE1 would be elected, assuming
          PE1's IP address is lower than PE2's.

   f.  The Preference is an administrative option that MUST be
       configured on a per-ES basis from the management plane, but MAY
       also be dynamically changed based on the use of local policies.
       For instance, on PE1, ES1's Preference can be lowered from 500 to
       100 in case the bandwidth on the ENNI port is decreased by 50%
       (that could happen if, e.g., the 2-port LAG between PE1 and the
       Aggregation Network loses one port).  Policies MAY also trigger
       dynamic Preference changes based on the PE's bandwidth
       availability in the core, specific ports going operationally
       down, etc.  The definition of the actual local policies is out of
       scope of this document.  The default Preference value is 32767.

   The Highest-Preference Alg MAY be used along with the AC-DF
   capability.  Assuming all the PEs in the ES are configured
   consistently with Highest-Preference Alg and AC-DF capability, a
   given PE in the ES is not considered as candidate for DF Election
   until its corresponding Ethernet A-D per ES and Ethernet A-D per EVI
   routes are received, as described in [RFC8584].

   The procedures in this document can be used in [RFC7432] based ES or
   vES as in [I-D.ietf-bess-evpn-virtual-eth-segment], and including
   EVPN networks as in [RFC8214], [RFC7623] or [RFC8365].

4.2.  Use of the Lowest-Preference Algorithm

   In addition to the Highest-Preference Alg described in Section 4.1,
   this document defines the Lowest-Preference Alg.
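
   The selection described in Section 4.1 (c) and (e), together with its
   Lowest-Preference counterpart, can be sketched in Python as follows.
   This is only an illustration under stated assumptions: all ES routes
   are assumed to already advertise the same preference-based DF Alg
   (the fallback to the Default Alg is not modeled), all PE addresses
   are assumed to be of the same IP version, and the Candidate type, the
   elect_df function and the 192.0.2.x addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Candidate:
    """One PE in the candidate list, as learned from its ES route."""
    name: str
    ip: str          # Originating Router's IP address of the ES route
    pref: int        # DF Preference (0-65535)
    dp: bool = False  # "Don't Preempt" bit


def elect_df(candidates, highest=True):
    """Pick the DF: Preference first, then DP=1, then lowest IP."""
    def rank(c):
        pref_rank = -c.pref if highest else c.pref
        # False sorts before True, so "not c.dp" prefers DP=1.
        return (pref_rank, not c.dp, ip_address(c.ip))
    return min(candidates, key=rank)


if __name__ == "__main__":
    # vES2 from Figure 3: PE3 is DF with Highest-Preference,
    # PE1 is DF with Lowest-Preference.
    ves2 = [
        Candidate("PE1", "192.0.2.1", 100),
        Candidate("PE2", "192.0.2.2", 200),
        Candidate("PE3", "192.0.2.3", 300),
    ]
    print(elect_df(ves2).name, elect_df(ves2, highest=False).name)

    # Equal Preference: the DP bit breaks the tie in favor of PE2.
    tie = [
        Candidate("PE1", "192.0.2.1", 500, dp=False),
        Candidate("PE2", "192.0.2.2", 500, dp=True),
    ]
    print(elect_df(tie).name)
```

   Expressing the three criteria as a single sort key keeps the tie-
   breaking order explicit, and switching between the two Algs only
   flips the sign of the Preference term.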
   Using the example of vES1 in Figure 3, if the Lowest-Preference Alg
   is configured in all the PEs in the ES, PE2 will be the DF due to its
   lower Preference.

   All the procedures described in Section 4.1 apply to the Lowest-
   Preference Alg, only replacing the Highest-Preference tie-breaker
   with the Lowest-Preference tie-breaker.  The Highest-Preference and
   Lowest-Preference Algs are different Algs; therefore, if two PEs
   configured for Highest-Preference and Lowest-Preference respectively
   are attached to the same ES, the operational DF Election Alg will
   fall back to the Default Alg.

4.3.  Use of the Highest-Preference algorithm in [RFC7432] Ethernet
      Segments

   While the Highest-Preference (or Lowest-Preference for that matter)
   DF Alg described in Section 4.1 is typically used in virtual ES
   scenarios where there is normally an individual Ethernet Tag per vES,
   the existing [RFC7432] definition of an ES allows potentially up to
   thousands of Ethernet Tags on the same ES.  In that case, if the
   Highest-Preference (or Lowest-Preference) Alg is configured in all
   the PEs of the ES, the same PE will be the elected DF for all the
   Ethernet Tags of the ES.  A potential way to achieve more granular
   load balancing is described below.

   The ES is configured with an administrative Preference value and,
   e.g., Highest-Preference Alg, but then a range of Ethernet Tags can
   be defined to use the Lowest-Preference depending on the desired
   behavior.  With this option, the PE will build a list of candidate
   PEs ordered by Preference; however, the DF for a given Ethernet Tag
   will be determined by the local configuration.

   For instance:

   *  Assuming ES3 is defined in PE1 and PE2, PE1 may be configured as
      (500,0,Highest-Preference) for ES3 and PE2 as (100,0,Highest-
      Preference).

   *  In addition, assuming VLAN-based service interfaces and that the
      PEs are attached to all Ethernet Tags in the range 1-4000, both
      PE1 and PE2 will be configured with (Ethernet Tag-range,low),
      e.g., (2001-4000, low).

   *  This will result in PE1 being DF for Ethernet Tags 1-2000 (since
      they use the default Highest-Preference Alg) and PE2 being DF for
      Ethernet Tags 2001-4000, due to the local policy overriding the
      Highest-Preference Alg.

   For Ethernet Segments attached to three or more PEs, any other logic
   that provides a fair distribution of the DF function among the PEs is
   valid, as long as that logic is consistent in all the PEs in the ES.
   It is important to note that, when a local policy overrides the
   Highest-Preference or Lowest-Preference signaled by all the PEs in
   the ES, this local policy MUST be consistent in all the PEs of the
   ES.  If the local policy is inconsistent for a given Ethernet Tag in
   the ES, black-holes or packet duplication may occur on that Ethernet
   Tag.

4.4.  The Non-Revertive Capability

   As discussed in Section 1.2 (d), a capability to NOT preempt the
   existing DF (for all the Ethernet Tags in the ES) is required and
   therefore added to the DF Election extended community.  This option
   will allow a non-revertive behavior in the DF election.

   Note that, when a given PE in an ES is taken down for maintenance
   operations, before bringing it back, the Preference may be changed in
   order to provide a non-revertive behavior.  The DP bit and the
   mechanism explained in this section will be used for those cases when
   a former DF comes back up without any controlled maintenance
   operation, and the non-revertive option is desired in order to avoid
   service impact.

   In Figure 3, we assume that based on the Highest-Preference Alg, PE3
   is the DF for ESI2.

   If PE3 has a link, EVC or node failure, PE2 would take over as DF.
   If/when PE3 comes back up again, PE3 will take over, causing some
   unnecessary packet loss in the ES.

   The following procedure avoids preemption upon failure recovery
   (please refer to Figure 3).  The procedure supports a non-revertive
   mode that can be used along with:

   *  Highest-Preference Alg

   *  Highest-Preference Alg, where a local policy overrides the
      Highest-Preference tie-breaker for a range of Ethernet Tags

   *  Lowest-Preference Alg

   The procedure is described assuming Highest-Preference Alg in the ES,
   where local policy overrides the tie-breaker for a given Ethernet
   Tag, since this is the most complex case.  The other two cases above
   are a sub-set of this one and the differences will be explained
   later.

   1.  A "Don't Preempt Me" capability is defined on a per-PE/per-ES
       basis, as described in Section 3.  If "Don't Preempt Me" is
       disabled (default behavior), the advertised DP bit will be 0.  If
       "Don't Preempt Me" is enabled, the ES route will be advertised
       with DP=1 ("Don't Preempt Me").  All the PEs in an ES SHOULD be
       consistent in their configuration of the DP capability; however,
       this document does not enforce the consistency across all the
       PEs.  In case of inconsistency in the support of the DP
       capability in the PEs of the same ES, non-revertive behavior is
       not guaranteed.  However, PEs supporting this capability will
       still attempt this procedure.

   2.  We assume we want to avoid 'preemption' in all the PEs in the ES,
       so the three PEs are configured with the "Don't Preempt Me"
       capability.  In this example, we assume ESI2 is configured as
       'DP=enabled' in the three PEs.

   3.  We also assume vES2 is attached to Ethernet Tag-1 and Ethernet
       Tag-2. vES2 uses Highest-Preference as DF Alg and a local policy
       is configured in the three PEs to use Lowest-Preference for
       Ethernet Tag-2.
       When vES2 is enabled in the three PEs, the PEs will exchange the
       ES routes and select PE3 as DF for Ethernet Tag-1 (due to the
       Highest-Preference), and PE1 as DF for Ethernet Tag-2 (due to the
       Lowest-Preference).

   4.  If PE3's vES2 goes down (due to EVC failure - detected by OAM, or
       port failure or node failure), PE2 will become the DF for
       Ethernet Tag-1.  No changes will occur for Ethernet Tag-2.

   5.  When PE3's vES2 comes back up, PE3 will start a boot-timer (if
       booting up) or hold-timer (if the port or EVC recovers).  That
       timer will allow some time for PE3 to receive the ES routes from
       PE1 and PE2.  This timer is applied between the INIT and the
       DF_WAIT states in the DF Election Finite State Machine described
       in [RFC8584].  PE3 will then:

       *  Select two "reference-PEs" among the ES routes in the vES, the
          "Highest-PE" and the "Lowest-PE":

          -  The Highest-PE is the PE with higher Preference, using the
             DP bit first (with DP=1 being better) and, after that, the
             lower PE-IP address as tie-breakers.  PE3 will select PE2
             as Highest-PE over PE1, since, when comparing (Pref,DP,PE-
             IP), (200,1,PE2-IP) wins over (100,1,PE1-IP).

          -  The Lowest-PE is the PE with lower Preference, using the DP
             bit first (with DP=1 being better) and, after that, the
             lower PE-IP address as tie-breakers.  PE3 will select PE1
             as Lowest-PE over PE2, since (100,1,PE1-IP) wins over
             (200,1,PE2-IP).

          -  Note that if there were only one remote PE in the ES,
             Lowest and Highest PE would be the same PE.

       *  Check its own administrative Pref and compare it with the one
          of the Highest-PE and Lowest-PE that have DP=1 in their ES
          routes.
          Depending on this comparison, PE3 will send the ES route with
          a (Pref,DP) that may be different from its administrative
          (Pref,DP):

          -  If PE3's Pref value is higher than or equal to the Highest-
             PE's, PE3 will send the ES route with an 'in-use'
             operational Pref equal to the Highest-PE's and DP=0.

          -  If PE3's Pref value is lower than or equal to the Lowest-
             PE's, PE3 will send the ES route with an 'in-use'
             operational Preference equal to the Lowest-PE's and DP=0.

          -  If PE3's Pref value is strictly lower than the Highest-PE's
             and strictly higher than the Lowest-PE's, PE3 will send the
             ES route with its administrative (Pref,DP)=(300,1).

          -  In this example, PE3's administrative Pref=300 is higher
             than the Highest-PE with DP=1, that is, PE2 (Pref=200).
             Hence PE3 will inherit PE2's preference and send the ES
             route with an operational 'in-use' (Pref,DP)=(200,0).

       *  Note that a PE will always send DP=0 as long as the advertised
          Pref is the 'in-use' operational Pref (as opposed to the
          'administrative' Pref).

       *  This ES route update sent by PE3, with (200,0,PE3-IP), will
          not cause any DF switchover for any Ethernet Tag.  PE2 will
          continue being DF for Ethernet Tag-1.  This is because the DP
          bit will be used as a tie-breaker in the DF election.  That
          is, if a PE has two candidate PEs with the same Pref, it will
          pick the one with DP=1.  There are no DF changes for Ethernet
          Tag-2 either.

   6.  For any subsequent received update/withdraw in the ES, the PEs
       will go through the process described in (5) to select Highest
       and Lowest-PEs, now considering themselves as candidates.  For
       instance, if PE2 fails, upon receiving PE2's ES route withdrawal,
       PE3 and PE1 will go through the selection of new Highest and
       Lowest-PEs (considering their own active ES route) and then they
       will run the DF Election.

       *  If a PE selects itself as new Highest or Lowest-PE and it was
          not before, the PE will then compare its operational 'in-use'
          Pref with its administrative Pref.  If different, the PE will
          send an ES route update with its administrative Pref and DP
          values.  In the example, PE3 will be the new Highest-PE,
          therefore it will send an ES route update with
          (Pref,DP)=(300,1).

       *  After running the DF Election, PE3 will become the new DF for
          Ethernet Tag-1.  No changes will occur for Ethernet Tag-2.

   If the ES uses Highest-Preference Alg (for all the Ethernet Tags, no
   local policy), the PEs only need to select the "Highest-PE" as the
   "reference-PE" (i.e., no need to select the "Lowest-PE").  If the ES
   uses Lowest-Preference Alg for all the Ethernet Tags, the PEs only
   need to select the "Lowest-PE" as the "reference-PE".  The rest of
   the procedure remains the same.

   Note that, irrespective of the DP bit, when a PE or ES comes back and
   the PE advertises a DF Election Alg different from the one configured
   in the rest of the PEs in the ES, all the PEs in the ES MUST fall
   back to the Default [RFC7432] Alg.

   This document does not modify the use of the P and B bits in the
   Ethernet A-D per EVI routes [RFC8214] advertised by the PEs in the ES
   after running the DF Election, irrespective of the revertive or non-
   revertive behavior in the PE.

5.  Security Considerations

   This document describes a DF Election Algorithm that provides
   absolute control (by configuration) over what PE is the DF for a
   given Ethernet Tag.  While this control is desired in many
   situations, a malicious user that gets access to the configuration of
   a PE in the ES may change the behavior of the network.  In other DF
   Algs such as HRW, the DF Election is more automated and cannot be
   determined by configuration.
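
   For reference when reasoning about the non-revertive behavior, the
   comparison made by a recovering PE in step 5 of Section 4.4 can be
   sketched in Python as follows.  This is one possible reading of that
   step, not a normative implementation: it assumes that a reference-PE
   is only taken into account when its ES route carries DP=1, and the
   EsRoute type, the in_use_pref function and the track_highest and
   track_lowest flags are hypothetical names.

```python
from typing import NamedTuple, Sequence, Tuple


class EsRoute(NamedTuple):
    """(Pref, DP) received in a remote PE's ES route."""
    pe: str
    pref: int
    dp: bool


def in_use_pref(
    admin_pref: int,
    admin_dp: bool,
    remote_routes: Sequence[EsRoute],
    track_highest: bool = True,
    track_lowest: bool = True,
) -> Tuple[int, bool]:
    """Return the (Pref, DP) a recovering PE advertises.

    track_highest / track_lowest select which reference-PEs matter:
    both for Highest-Preference with a Lowest-Preference local policy,
    only one of them for a pure Highest- or Lowest-Preference ES.
    """
    if not remote_routes:
        return admin_pref, admin_dp

    # Reference-PEs: Preference first, DP=1 preferred on equal Pref.
    # The IP tie-breaker does not change the resulting (Pref, DP).
    highest = max(remote_routes, key=lambda r: (r.pref, r.dp))
    lowest = min(remote_routes, key=lambda r: (r.pref, not r.dp))

    if track_highest and highest.dp and admin_pref >= highest.pref:
        return highest.pref, False   # inherit Highest-PE's Pref, DP=0
    if track_lowest and lowest.dp and admin_pref <= lowest.pref:
        return lowest.pref, False    # inherit Lowest-PE's Pref, DP=0
    return admin_pref, admin_dp      # administrative values are safe


if __name__ == "__main__":
    # Example of Section 4.4: PE3 (300,1) recovers while PE1 (100,1)
    # and PE2 (200,1) are active, so PE3 advertises (200,0).
    remotes = [EsRoute("PE1", 100, True), EsRoute("PE2", 200, True)]
    print(in_use_pref(300, True, remotes))
```

   Because the recovering PE never advertises a Preference that beats
   the current reference-PE while also setting DP=1, the DP tie-breaker
   keeps the existing DF in place.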

   The non-revertive capability described in this document may be seen
   as a security improvement over the regular EVPN revertive DF
   Election: an intentional link (or node) "flapping" on a PE will only
   cause service disruption once, when the PE goes to NDF state.

   The document also describes how a local policy can override the
   Highest-Preference Alg for a range of Ethernet Tags in the ES.  If
   the local policy is not consistent across all PEs in the ES and there
   is an Ethernet Tag that ends up with an inconsistent use of Highest-
   Preference or Lowest-Preference in different PEs, black-holing or
   packet duplication may occur for that Ethernet Tag.

6.  IANA Considerations

   This document solicits the allocation of the following values:

   *  DF Alg = 2 in the [RFC8584] "DF Alg" registry, with name "Highest-
      Preference Algorithm".

   *  DF Alg = TBD in the same "DF Alg" registry, with name "Lowest-
      Preference Algorithm".

   *  Bit 0 in the [RFC8584] DF Election Capabilities registry, with
      name "D (Don't Preempt) Capability" for Non-revertive ES.

7.  Acknowledgments

   The authors would like to thank Kishore Tiruveedhula and Sasha
   Vainshtein for their review and comments.  Also thank you to Luc
   Andre Burdet and Stephane Litkowski for their thorough review and
   suggestions for a new DF Alg for lowest-preference.

8.  Contributors

   In addition to the authors listed, the following individuals also
   contributed to this document:

   Kiran Nagaraj, Nokia

   Vinod Prabhu, Nokia

   Selvakumar Sivaraj, Juniper

   Sami Boutros, VMWare

9.  References

9.1.  Normative References

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
              VPN Designated Forwarder Election Extensibility",
              RFC 8584, DOI 10.17487/RFC8584, April 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [I-D.ietf-bess-evpn-virtual-eth-segment]
              Sajassi, A., Brissette, P., Schell, R., Drake, J. E., and
              J. Rabadan, "EVPN Virtual Ethernet Segment", Work in
              Progress, Internet-Draft, draft-ietf-bess-evpn-virtual-
              eth-segment-07, 6 July 2021.

9.2.  Informative References

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
              Rabadan, "Virtual Private Wire Service Support in Ethernet
              VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017.

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018.

   [RFC7623]  Sajassi, A., Ed., Salam, S., Bitar, N., Isaac, A., and W.
              Henderickx, "Provider Backbone Bridging Combined with
              Ethernet VPN (PBB-EVPN)", RFC 7623, DOI 10.17487/RFC7623,
              September 2015.

Authors' Addresses

   J. Rabadan (editor)
   Nokia
   520 Almanor Avenue
   Sunnyvale, CA 94085
   USA
   Email: jorge.rabadan@nokia.com

   S. Sathappan
   Nokia
   Email: senthil.sathappan@nokia.com

   T. Przygienda
   Juniper Networks
   Email: prz@juniper.net

   W. Lin
   Juniper Networks
   Email: wlin@juniper.net

   J. Drake
   Juniper Networks
   Email: jdrake@juniper.net

   A. Sajassi
   Cisco Systems
   Email: sajassi@cisco.com

   S. Mohanty
   Cisco Systems
   Email: satyamoh@cisco.com