BESS Working Group                                     P. Brissette, Ed.
Internet-Draft                                                A. Sajassi
Intended status: Standards Track                              LA. Burdet
Expires: January 6, 2022                                       S. Thoria
                                                           Cisco Systems
                                                                  B. Wen
                                                                 Comcast
                                                               E. Leyton
                                                        Verizon Wireless
                                                              J. Rabadan
                                                                   Nokia
                                                            July 5, 2021


              EVPN multi-homing port-active load-balancing
                     draft-ietf-bess-evpn-mh-pa-03

Abstract

   The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables
   establishing a logical link-aggregation connection with a redundant
   group of independent nodes. The purpose of multi-chassis LAG is to
   provide a solution to achieve higher network availability, while
   providing different modes of sharing/balancing of traffic. RFC 7432
   defines EVPN-based MC-LAG with single-active and all-active
   multi-homing load-balancing modes. This draft expands on existing
   redundancy mechanisms supported by EVPN and introduces support for a
   new port-active load-balancing mode.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 6, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Multi-Chassis Link Aggregation
   3.  Port-active Load-balancing Procedure
   4.  Designated Forwarder Algorithm to Elect per Port-active PE
     4.1.  Capability Flag
     4.2.  Modulo-based Algorithm
     4.3.  HRW Algorithm
     4.4.  Preferred-DF Algorithm
   5.  Convergence considerations
     5.1.  Primary / Backup per Ethernet-Segment
     5.2.  Backward Compatibility
   6.  Applicability
   7.  Overall Advantages
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   EVPN, as per [RFC7432], provides all-active per-flow load-balancing
   for multi-homing. It also defines single-active with service carving
   mode, where one of the PEs in the redundancy relationship is active
   per service.

   While these two multi-homing scenarios are the most widely used in
   data center and service provider access networks, there are scenarios
   where active-standby per-interface multi-homing load-balancing is
   useful and required. The main consideration for this mode of
   load-balancing is the determinism of traffic forwarding through a
   specific interface rather than statistical per-flow load-balancing
   across multiple PEs providing multi-homing. The determinism provided
   by active-standby per interface is also required for certain QoS
   features to work. While using this mode, customers also expect
   minimal convergence time during failures.

   A new type of load-balancing mode, port-active load-balancing, is
   defined. This draft describes how the new load-balancing mode can be
   supported via EVPN. The new mode may also be referred to as per
   interface active/standby.

                       +-----+
                       | PE3 |
                       +-----+
                    +-----------+
                    |  MPLS/IP  |
                    |  CORE     |
                    +-----------+
                  +-----+   +-----+
                  | PE1 |   | PE2 |
                  +-----+   +-----+
                     |         |
                     I1       I2
                      \       /
                       \     /
                        +---+
                        |CE1|
                        +---+

                   Figure 1: MC-LAG Topology

   Figure 1 shows an MC-LAG multi-homing topology where PE1 and PE2 are
   part of the same redundancy group providing multi-homing to CE1 via
   interfaces I1 and I2. Interfaces I1 and I2 are members of a LAG
   running LACP. The core, shown as IP or MPLS enabled, provides a wide
   range of L2 and L3 services. MC-LAG multi-homing functionality is
   decoupled from those services in the core and focuses on providing
   multi-homing to the CE. With per-port active/standby load-balancing,
   only one of the two interfaces, I1 or I2, is in forwarding state; the
   other interface is in standby. This also implies that all services on
   the active interface are in active mode and all services on the
   standby interface operate in standby mode.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Multi-Chassis Link Aggregation

   When a CE is multi-homed to a set of PE nodes using the [802.1AX]
   Link Aggregation Control Protocol (LACP), the PEs must act as if they
   were a single LACP speaker for the Ethernet links to form and operate
   as a Link Aggregation Group (LAG). To achieve this, the PEs
   connected to the same multi-homed CE must synchronize LACP
   configuration and operational data among them. The Interchassis
   Communication Protocol (ICCP) [RFC7275] has been used for that
   purpose. EVPN LAG greatly simplifies that solution. The
   simplification comes with a few assumptions:

   o  A CE device connected to multi-homing PEs may have a single LAG
      with all its links active, i.e., links in the LAG operate in
      all-active load-balancing mode.

   o  The same LACP parameters, such as system id, port priority, and
      port key, MUST be configured on peering PEs.

   Any discrepancies from this list are left for future study.
   Furthermore, mis-configuration and mis-wiring detection across
   peering PEs are also left for further study.

3.  Port-active Load-balancing Procedure

   The following steps describe the proposed procedure with EVPN LAG to
   support port-active load-balancing mode:

   a.  The Ethernet-Segment Identifier (ESI) MUST be assigned per access
       interface as described in [RFC7432]; it may be auto-derived or
       manually assigned. The access interface MAY be a Layer-2 or
       Layer-3 interface. The usage of an ESI over a Layer-3 interface
       is newly described in this document.

   b.  The Ethernet-Segment (ES) MUST be configured in port-active
       load-balancing mode on peering PEs for the specific access
       interface.

   c.  Peering PEs MAY exchange only the Ethernet-Segment (ES) route
       (Route Type-4) when the ESI is configured on a Layer-3 interface.

   d.  PEs in the redundancy group leverage the DF election defined in
       [RFC8584] to determine which PE keeps the port in active mode and
       which one(s) keep it in standby mode. While the DF election
       defined in [RFC8584] is per [ES, Ethernet Tag] granularity, for
       port-active mode of multi-homing, the DF election is done per
       [ES]. The details of this algorithm are described in Section 4.

   e.  The DF router MUST keep the corresponding access interface in up
       and forwarding active state for that Ethernet-Segment.

   f.  Non-DF routers MAY bring and keep the peering access interface
       attached to them in operational down state. If the interface is
       running LACP, then the non-DF PE MAY also set the LACP state to
       OOS (Out of Sync) as opposed to interface state down. This
       allows for better convergence on standby to active transition.

   g.  For EVPN-VPWS service, the usage of the primary/backup bits of
       the EVPN Layer-2 attributes extended community [RFC8214] is
       highly recommended to achieve better convergence.

4.  Designated Forwarder Algorithm to Elect per Port-active PE

   The ES routes, running in port-active load-balancing mode, are
   advertised with a new capability in the DF Election Extended
   Community as defined in [RFC8584]. Moreover, the ES associated with
   the port leverages the existing single-active procedure and signals
   the single-active bit along with the Ethernet-AD per-ES route.
   Finally, as in [RFC7432], the ESI-label based split-horizon
   procedures should be used to avoid transient echoed packets when
   Layer-2 circuits are involved.

4.1.  Capability Flag

   [RFC8584] defines a DF Election extended community, and a Bitmap
   field to encode "capabilities" to use with the DF election algorithm
   in the DF algorithm field.
   Bitmap (2 octets) is extended by the following value:

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |D|A|     |P|                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2: Amended Bitmap field in the DF Election Extended Community

   Bit 0:  'Don't Preempt' bit, as explained in
      [I-D.ietf-bess-evpn-pref-df].

   Bit 1:  AC-Influenced DF Election, as explained in [RFC8584].

   Bit 5:  (corresponds to Bit 29 of the DF Election Extended
      Community, since the Bitmap field starts at Bit 24, and is defined
      by this document): P bit or 'Port Mode' bit (P hereafter),
      indicates that the DF-Algorithm should be modified to consider the
      port only and not the Ethernet Tags.

4.2.  Modulo-based Algorithm

   The default DF Election algorithm, or modulus-based algorithm as in
   [RFC7432] and updated by [RFC8584], is used here, at the granularity
   of the ES only. Because the ES-Import RT community inherits only
   bytes 1-6 from the ESI, many deployments differentiate ESIs within
   these bytes only. For the modulo calculation, bytes 3-6 are used to
   determine the designated forwarder using Modulo-based DF assignment.

4.3.  HRW Algorithm

   The Highest Random Weight (HRW) algorithm defined in [RFC8584] MAY
   also be used and signaled, and modified to operate at the granularity
   of [ES] rather than per [ES, VLAN].

   [RFC8584] describes computing a 32-bit CRC over the concatenation of
   Ethernet Tag and ESI. For port-active load-balancing mode, the
   Ethernet Tag is simply removed from the CRC computation.

4.4.  Preferred-DF Algorithm

   When the new capability 'Port-Mode' is signaled, the algorithm is
   modified to consider the port only and not any associated Ethernet
   Tags. Furthermore, the "port-based" capability MUST be compatible
   with the "Don't Preempt" bit. When an interface recovers, a peering
   PE signaling the D-bit will enable non-revertive behaviour at the
   port level. The AC-DF bit MUST be set to zero.
   When an AC (sub-interface) goes down, it does not influence the DF
   election.

5.  Convergence considerations

   To improve convergence upon failure and recovery when port-active
   load-balancing mode is used, some advanced synchronization between
   peering PEs may be required. Port-active is challenging in the sense
   that the "standby" port is in down state. It takes some time to
   bring a "standby" port to up state and settle the network. For IRB
   and L3 services, the ARP / ND cache may be synchronized. Moreover,
   associated VRF tables may also be synchronized. For L2 services, MAC
   table synchronization may be considered.

   Finally, for members of a LAG running LACP, the ability to set the
   "standby" port in "out-of-sync" state, also known as "warm-standby",
   can be leveraged.

5.1.  Primary / Backup per Ethernet-Segment

   The L2 Info Extended Community MAY be advertised in the Ethernet A-D
   per ES route for fast convergence. Only the P and B bits are
   relevant to this specification. When advertised, the L2 Info
   Extended Community SHALL have only the P or B bits set and all other
   bits must be zero. The MTU must also be zero. A remote PE receiving
   the optional L2 Info Extended Community on Ethernet A-D per ES routes
   SHALL consider only the P and B bits. P and B bits received on
   Ethernet A-D per EVI routes per [RFC8214] are overridden.

5.2.  Backward Compatibility

   Implementations that comply with [RFC7432] or [RFC8214] only (i.e.,
   implementations that predate this specification) will not advertise
   the L2 Info Extended Community in Ethernet A-D per ES routes. That
   means that all remote PEs in the ES will not receive the P and B bits
   per ES and will continue to receive and honour the P and B bits
   received in Ethernet A-D per EVI route(s).
   Similarly, an implementation that complies with [RFC7432] or
   [RFC8214] only and that receives an L2 Info Extended Community will
   ignore it and will continue to use the default path resolution
   algorithm.

6.  Applicability

   A common deployment is to provide L2 or L3 service on the PEs
   providing multi-homing. The services could be any L2 EVPN such as
   EVPN VPWS, EVPN [RFC7432], etc. L3 service could be in a VPN context
   [RFC4364] or in the global routing context. When a PE provides first
   hop routing, EVPN IRB could also be deployed on the PEs. The
   mechanism defined in this draft is used between the PEs providing the
   L2 and/or L3 service, when the requirement is to use per port active.

   A possible alternative to the solution described in this draft is
   MC-LAG with ICCP [RFC7275] active-standby redundancy. However, ICCP
   requires LDP to be enabled as a transport for ICCP messages. There
   are many scenarios where LDP is not required, e.g., deployments with
   VXLAN or SRv6. The solution defined in this draft with EVPN does not
   mandate the use of LDP or ICCP and is independent of the underlay
   encapsulation.

7.  Overall Advantages

   The use of port-active multi-homing brings the following benefits to
   EVPN networks:

   a.  Open standards based per-interface single-active load-balancing
       mechanism that eliminates the need to run ICCP and LDP.

   b.  Agnostic of underlay technology (MPLS, VXLAN, SRv6) and
       associated services (L2, L3, Bridging, E-LINE, etc).

   c.  Provides a way to enable deterministic QoS over MC-LAG attachment
       circuits.

   d.  Fully compliant with [RFC7432]; does not require any new protocol
       enhancement to existing EVPN RFCs.

   e.  Can leverage various DF election algorithms, e.g., modulo, HRW,
       etc.

   f.  Replaces the legacy MC-LAG ICCP-based solution, and offers the
       following additional benefits:

       *  Efficiently supports 1+N redundancy mode (with EVPN using BGP
          RR), whereas ICCP requires a full mesh of LDP sessions among
          PEs in the redundancy group.

       *  Fast convergence with mass-withdraw is possible with EVPN;
          there is no equivalent in ICCP.

   g.  Addresses deployments where customers want per-interface
       single-active load-balancing but do not want to enable LDP (e.g.,
       they may be running VXLAN or SRv6 in the network). Currently
       there is no alternative for these deployments.

8.  Security Considerations

   The same Security Considerations described in [RFC7432] are valid for
   this document.

9.  IANA Considerations

   This document requests the allocation of the following value:

   o  Bit 5 in the [RFC8584] DF Election Capabilities registry, with
      name "P" (port mode load-balancing) Capability for port-active
      ES.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
              Rabadan, "Virtual Private Wire Service Support in Ethernet
              VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
              VPN Designated Forwarder Election Extensibility",
              RFC 8584, DOI 10.17487/RFC8584, April 2019.

10.2.  Informative References

   [I-D.ietf-bess-evpn-pref-df]
              Rabadan, J., Sathappan, S., Przygienda, T., Lin, W.,
              Drake, J., Sajassi, A., and satyamoh@cisco.com,
              "Preference-based EVPN DF Election", draft-ietf-bess-evpn-
              pref-df-07 (work in progress), March 2021.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006.

   [RFC7275]  Martini, L., Salam, S., Sajassi, A., Bocci, M.,
              Matsushima, S., and T. Nadeau, "Inter-Chassis
              Communication Protocol for Layer 2 Virtual Private Network
              (L2VPN) Provider Edge (PE) Redundancy", RFC 7275,
              DOI 10.17487/RFC7275, June 2014.

Authors' Addresses

   Patrice Brissette (editor)
   Cisco Systems
   Ottawa, ON
   Canada

   Email: pbrisset@cisco.com


   Ali Sajassi
   Cisco Systems
   USA

   Email: sajassi@cisco.com


   Luc Andre Burdet
   Cisco Systems
   Canada

   Email: lburdet@cisco.com


   Samir Thoria
   Cisco Systems
   USA

   Email: sthoria@cisco.com


   Bin Wen
   Comcast
   USA

   Email: Bin_Wen@comcast.com


   Edward Leyton
   Verizon Wireless
   USA

   Email: edward.leyton@verizonwireless.com


   Jorge Rabadan
   Nokia
   USA

   Email: jorge.rabadan@nokia.com
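
   The Capability Flag of Section 4.1 and the per-ES modulo election of
   Section 4.2 can be illustrated with a short Python sketch. It is
   non-normative and rests on several assumptions that are not stated in
   this document: bit 0 of the Bitmap is taken as the most significant
   bit (following the bit numbering of Figure 2), the ESI is a 10-octet
   value whose byte 0 is the ESI Type, bytes 3-6 are read as a
   big-endian unsigned integer, and candidate PEs are ordered by
   increasing numeric IP address as in the default [RFC7432] procedure.
   All constant and function names are hypothetical.

```python
import ipaddress
from typing import Sequence

# Assumption: bit 0 of the 2-octet Bitmap is the most significant bit,
# matching the left-to-right bit numbering of Figure 2.
D_BIT = 1 << (15 - 0)  # 'Don't Preempt'
A_BIT = 1 << (15 - 1)  # AC-Influenced DF Election
P_BIT = 1 << (15 - 5)  # 'Port Mode' (Bit 5, defined by this document)


def port_mode_signaled(bitmap: int) -> bool:
    """Return True when the P bit is set in a 16-bit capability bitmap."""
    return bool(bitmap & P_BIT)


def elect_df_per_es(esi: bytes, pe_addresses: Sequence[str]) -> str:
    """Pick the DF for a whole Ethernet Segment (no Ethernet Tag input).

    Assumption: ESI bytes 3-6 (byte 0 being the ESI Type) are read as a
    big-endian integer and reduced modulo the number of candidate PEs.
    """
    if len(esi) != 10:
        raise ValueError("ESI must be exactly 10 octets")
    if not pe_addresses:
        raise ValueError("at least one candidate PE is required")
    # Order candidates by increasing numeric value of their IP address.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    value = int.from_bytes(esi[3:7], "big")
    return ordered[value % len(ordered)]


if __name__ == "__main__":
    esi = bytes.fromhex("00112233445566778899")
    peers = ["192.0.2.2", "192.0.2.1"]
    print(port_mode_signaled(P_BIT | D_BIT))
    print(elect_df_per_es(esi, peers))
```

   Because no Ethernet Tag enters the computation, every service on the
   port resolves to the same DF, which is the property port-active mode
   relies on.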