BESS Working Group                                     P. Brissette, Ed.
Internet-Draft                                                A. Sajassi
Intended status: Standards Track                           Cisco Systems
Expires: August 26, 2021                                          B. Wen
                                                                 Comcast
                                                               E. Leyton
                                                        Verizon Wireless
                                                              J. Rabadan
                                                                   Nokia
                                                               L. Burdet
                                                               S. Thoria
                                                           Cisco Systems
                                                       February 22, 2021


              EVPN multi-homing port-active load-balancing
                     draft-ietf-bess-evpn-mh-pa-01

Abstract

   The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables
   the establishment of a logical link-aggregation connection with a
   redundant group of independent nodes. The purpose of multi-chassis
   LAG is to achieve higher network availability while providing
   different modes of sharing/balancing of traffic. The EVPN standard
   defines EVPN-based MC-LAG with single-active and all-active
   multi-homing load-balancing modes. This document expands on the
   existing redundancy mechanisms supported by EVPN and introduces
   support for the port-active load-balancing mode. In this document,
   port-active load-balancing mode is also referred to as per-interface
   active/standby.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 26, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Multi-Chassis Ethernet Bundles
   3.  Port-active load-balancing procedure
   4.  Algorithm to elect per port-active PE
     4.1.  Capability Flag
     4.2.  Modulo-based Designated Forwarder Algorithm
     4.3.  HRW Algorithm
     4.4.  Preferred-DF Algorithm
   5.  Convergence considerations
     5.1.  Primary / Backup per Ethernet-Segment
     5.2.  Backward Compatibility
   6.  Applicability
   7.  Overall Advantages
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   EVPN, as per [RFC7432], provides all-active per-flow load balancing
   for multi-homing. It also defines single-active with service carving
   mode, where one of the PEs in a redundancy relationship is active per
   service.

   While these two multi-homing scenarios are the most widely used in
   data center and service provider access networks, there are scenarios
   where active-standby per-interface multi-homing redundancy is useful
   and required. The main consideration for this mode of redundancy is
   the determinism of traffic forwarding through a specific interface
   rather than statistical per-flow load balancing across multiple PEs
   providing multi-homing. The determinism provided by active-standby
   per interface is also required for certain QoS features to work.
   While using this mode, customers also expect minimal convergence time
   during failures. A new load-balancing mode, port-active
   load-balancing, is therefore defined.

   This draft describes how that new redundancy mode can be supported
   via EVPN.

                          +-----+
                          | PE3 |
                          +-----+
                       +-----------+
                       |  MPLS/IP  |
                       |   CORE    |
                       +-----------+
                     +-----+   +-----+
                     | PE1 |   | PE2 |
                     +-----+   +-----+
                        |         |
                        I1       I2
                         \       /
                          \     /
                           +---+
                           |CE1|
                           +---+

                    Figure 1: MC-LAG Topology

   Figure 1 shows an MC-LAG multi-homing topology where PE1 and PE2 are
   part of the same redundancy group providing multi-homing to CE1 via
   interfaces I1 and I2. Interfaces I1 and I2 are Bundle-Ethernet
   interfaces running LACP. The core, shown as IP or MPLS enabled,
   provides a wide range of L2 and L3 services. MC-LAG multi-homing
   functionality is decoupled from those services in the core and
   focuses on providing multi-homing to the CE. With per-port
   active/standby redundancy, only one of the two interfaces, I1 or I2,
   is in forwarding state; the other interface is in standby. This also
   implies that all services on the active interface are in active mode
   and all services on the standby interface operate in standby mode.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Multi-Chassis Ethernet Bundles

   When a CE is multi-homed to a set of PE nodes using the [802.1AX]
   Link Aggregation Control Protocol (LACP), the PEs must act as if they
   were a single LACP speaker for the Ethernet links to form a bundle
   and operate as a Link Aggregation Group (LAG). To achieve this, the
   PEs connected to the same multi-homed CE must synchronize LACP
   configuration and operational data among them. The Inter-Chassis
   Communication Protocol (ICCP) has been used for that purpose. EVPN
   LAG greatly simplifies that solution. The simplification comes with
   a few assumptions:

   o  A CE device connected to multi-homing PEs may have a single LAG
      with all its links active, i.e., the links in the Ethernet bundle
      operate in all-active load-balancing mode.

   o  The same LACP parameters, such as system ID, port priority, and
      port key, MUST be configured on peering PEs.

   Any deviation from this list is left for future study. Furthermore,
   detection of mis-configuration and mis-wiring across peering PEs is
   also left for further study.

3.  Port-active load-balancing procedure

   The following steps describe the proposed procedure with EVPN LAG to
   support port-active load-balancing mode:

   a.  The Ethernet-Segment Identifier (ESI) MUST be assigned per access
       interface as described in [RFC7432]; it may be auto-derived or
       manually assigned. The access interface MAY be a Layer-2 or
       Layer-3 interface. The usage of an ESI over a Layer-3 interface
       is newly described in this document.

   b.  The Ethernet-Segment MUST be configured in port-active
       load-balancing mode on peering PEs for the specific access
       interface.

   c.  Peering PEs MAY exchange only the Ethernet-Segment route (Route
       Type 4) when the ESI is configured on a Layer-3 interface.

   d.  PEs in the redundancy group leverage the DF election defined in
       [RFC8584] to determine which PE keeps the port in active mode and
       which one(s) keep it in standby mode. While the DF election
       defined in [RFC8584] is at [ES, Ethernet Tag] granularity, for
       port-active mode of multi-homing the DF election is done per ES.
       The details of this algorithm are described in Section 4.

   e.  The DF router MUST keep the corresponding access interface in up
       and forwarding active state for that Ethernet-Segment.

   f.  Non-DF routers MAY bring the peering access interface attached to
       them to operational down state and keep it there. If the
       interface is running LACP, the non-DF PE MAY instead set the LACP
       state to OOS (Out of Sync) rather than bringing the interface
       down. This allows for better convergence on the standby-to-active
       transition.

   g.  For EVPN-VPWS service, the usage of the primary/backup bits of
       the EVPN Layer 2 Attributes Extended Community [RFC8214] is
       highly recommended to achieve better convergence.

4.  Algorithm to elect per port-active PE

   The ES routes of an ES running in port-active load-balancing mode
   are advertised with a new capability in the DF Election Extended
   Community defined in [RFC8584]. Moreover, the ES associated with the
   port leverages the existing single-active procedures and signals the
   single-active bit along with the Ethernet A-D per ES route. Finally,
   as in [RFC7432], the ESI-label based split-horizon procedures should
   be used to avoid transient echoed packets when L2 circuits are
   involved.

4.1.  Capability Flag

   [RFC8584] defines a DF Election extended community, and a Bitmap
   field to encode "capabilities" to use with the DF election algorithm
   in the DF algorithm field.
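
   As a non-normative illustration, the following minimal Python sketch
   decodes a DF Election Extended Community and tests individual
   capability bits. The 8-octet layout used here (Type 0x06, Sub-Type
   0x06, a 3-bit reserved field, the 5-bit DF Alg, the 2-octet Bitmap,
   and 3 reserved octets) is taken from [RFC8584]. The names
   DfElectionEc, parse_df_election_ec, and has_capability are
   hypothetical, and numbering Bitmap bits from the most significant bit
   is an assumption based on the field diagrams. The bit positions are
   those listed later in this section.

```python
import struct
from dataclasses import dataclass

# Bit numbers inside the 2-octet Bitmap. Assumption: bit 0 is the most
# significant bit, matching the left-to-right field diagrams.
DONT_PREEMPT_BIT = 0  # 'Don't Preempt' bit, see [PREF-DF]
AC_DF_BIT = 1         # AC-Influenced DF Election, see [RFC8584]
PORT_MODE_BIT = 5     # 'Port Mode' (P) bit requested by this document


@dataclass
class DfElectionEc:
    """Decoded fields of a DF Election Extended Community (hypothetical type)."""
    df_alg: int   # 5-bit DF algorithm identifier
    bitmap: int   # 16-bit capability bitmap

    def has_capability(self, bit: int) -> bool:
        # Bit 0 maps to mask 0x8000, bit 15 to mask 0x0001.
        return bool(self.bitmap & (0x8000 >> bit))


def parse_df_election_ec(raw: bytes) -> DfElectionEc:
    """Parse the 8-octet extended community; the last 3 octets are reserved."""
    if len(raw) != 8:
        raise ValueError("extended community must be 8 octets")
    ec_type, sub_type, rsv_alg, bitmap = struct.unpack("!BBBH", raw[:5])
    if (ec_type, sub_type) != (0x06, 0x06):
        raise ValueError("not a DF Election Extended Community")
    # The low 5 bits of the third octet carry the DF Alg; the top 3 are reserved.
    return DfElectionEc(df_alg=rsv_alg & 0x1F, bitmap=bitmap)


if __name__ == "__main__":
    ec = parse_df_election_ec(bytes.fromhex("0606000400000000"))
    print(ec.df_alg, ec.has_capability(PORT_MODE_BIT))
```

   A receiver following this sketch would switch to per-ES election only
   when has_capability(PORT_MODE_BIT) is true for the PEs attached to
   the ES; how disagreement between PEs is handled is outside the scope
   of the sketch.
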
   The Bitmap (2 octets) is extended by the following value:

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |D|A|     |P|                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2: Amended Bitmap field in the DF Election Extended Community

   Bit 0:  'Don't Preempt' bit, as explained in [PREF-DF].

   Bit 1:  AC-Influenced DF Election, as explained in [RFC8584].

   Bit 5:  (corresponds to Bit 29 of the DF Election Extended Community
      and is defined by this document): P bit or 'Port Mode' bit (P
      hereafter), determines that the DF-Algorithm should be modified to
      consider the port only and not the Ethernet Tags.

4.2.  Modulo-based Designated Forwarder Algorithm

   The default DF Election algorithm, or modulus-based algorithm as in
   [RFC7432] and updated by [RFC8584], is used here at the granularity
   of the ES only. Because the ES-Import RT community inherits only
   bytes 1-6 of the ESI, many deployments differentiate ESIs within
   these bytes only. For the modulo calculation, bytes 3-6 of the ESI
   are used to determine the Designated Forwarder using modulo-based DF
   assignment.

4.3.  HRW Algorithm

   The Highest Random Weight (HRW) algorithm defined in [RFC8584] MAY
   also be used and signalled, modified to operate at the granularity of
   the ES rather than per [ES, VLAN].

   [RFC8584] describes computing a 32-bit CRC over the concatenation of
   the Ethernet Tag and the ESI. For port-active load-balancing mode,
   the Ethernet Tag is simply removed from the CRC computation.

4.4.  Preferred-DF Algorithm

   When the new capability 'Port-Mode' is signalled, the algorithm is
   modified to consider the port only and not any associated Ethernet
   Tags. Furthermore, the "port-based" capability MUST be compatible
   with the 'DP' capability (for non-revertive behavior). The AC-DF bit
   MUST be set to zero. When an AC (sub-interface) goes down, it does
   not influence the DF election.

5.  Convergence considerations

   To improve convergence upon failure and recovery when port-active
   load-balancing mode is used, some advanced synchronization between
   peering PEs may be required. Port-active is challenging in the sense
   that the "standby" port is in down state. It takes some time to
   bring a "standby" port up and for the network to settle. For IRB and
   L3 services, the ARP / ND caches may be synchronized. Moreover, the
   associated VRF tables may also be synchronized. For L2 services, MAC
   table synchronization may be considered.

   Finally, for Bundle-Ethernet interfaces running LACP, the ability to
   set the "standby" port in "out-of-sync" state, also known as
   "warm-standby", can be leveraged.

5.1.  Primary / Backup per Ethernet-Segment

   The L2 Info Extended Community MAY be advertised in Ethernet A-D per
   ES routes for fast convergence. Only the P and B bits are relevant
   to this specification. When advertised, the L2 Info Extended
   Community SHALL have only the P or B bits set, and all other bits
   must be zero. The MTU must also be zero. A remote PE receiving the
   optional L2 Info Extended Community on Ethernet A-D per ES routes
   SHALL consider only the P and B bits. P and B bits received on
   Ethernet A-D per EVI routes per [RFC8214] are overridden.

5.2.  Backward Compatibility

   Implementations that comply with [RFC7432] or [RFC8214] only (i.e.,
   implementations that predate this specification) will not advertise
   the L2 Info Extended Community in Ethernet A-D per ES routes. That
   means that all remote PEs in the ES will not receive the P and B bits
   per ES and will continue to receive and honour the P and B bits in
   Ethernet A-D per EVI routes. Similarly, an implementation that
   complies with [RFC7432] or [RFC8214] only and that receives an L2
   Info Extended Community will ignore it and will continue to use the
   default path resolution algorithm.

6.  Applicability

   A common deployment is to provide L2 or L3 service on the PEs
   providing multi-homing. The services could be any L2 EVPN service
   such as EVPN VPWS, EVPN [RFC7432], etc. The L3 service could be in a
   VPN context [RFC4364] or in the global routing context. When a PE
   provides first-hop routing, EVPN IRB could also be deployed on the
   PEs. The mechanism defined in this draft is used between the PEs
   providing the L2 and/or L3 service when the requirement is to use
   per-port active redundancy.

   A possible alternative to the solution described in this draft is
   MC-LAG with ICCP [RFC7275] active-standby redundancy. However, ICCP
   requires LDP to be enabled as the transport for ICCP messages. There
   are many scenarios where LDP is not required, e.g., deployments with
   VXLAN or SRv6. The solution defined in this draft with EVPN does not
   mandate the use of LDP or ICCP and is independent of the underlay
   encapsulation.

7.  Overall Advantages

   The use of port-active multi-homing brings the following benefits to
   EVPN networks:

   a.  Open-standards-based per-interface single-active redundancy
       mechanism that eliminates the need to run ICCP and LDP.

   b.  Agnostic of underlay technology (MPLS, VXLAN, SRv6) and
       associated services (L2, L3, Bridging, E-LINE, etc.).

   c.  Provides a way to enable deterministic QoS over MC-LAG attachment
       circuits.

   d.  Fully compliant with [RFC7432]; does not require any new protocol
       enhancement to existing EVPN RFCs.

   e.  Can leverage various DF election algorithms, e.g., modulo, HRW,
       etc.

   f.  Replaces the legacy ICCP-based MC-LAG solution and offers the
       following additional benefits:

       *  Efficiently supports 1+N redundancy mode (with EVPN using BGP
          RR), whereas ICCP requires a full mesh of LDP sessions among
          the PEs in the redundancy group.

       *  Fast convergence with mass-withdraw is possible with EVPN;
          there is no equivalent in ICCP.

   g.  Customers want per-interface single-active redundancy but do not
       want to enable LDP (e.g., they may be running VXLAN or SRv6 in
       the network). Currently there is no alternative to this.

8.  Security Considerations

   The same Security Considerations described in [RFC7432] are valid for
   this document.

9.  IANA Considerations

   This document solicits the allocation of the following values:

   o  Bit 5 in the [RFC8584] DF Election Capabilities registry, with
      name "P" (port mode load-balancing) Capability, for port-active
      ES.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
              Rabadan, "Virtual Private Wire Service Support in Ethernet
              VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
              VPN Designated Forwarder Election Extensibility",
              RFC 8584, DOI 10.17487/RFC8584, April 2019.

10.2.  Informative References

   [PREF-DF]  Rabadan, J., "Preference-based EVPN DF Election", 2020.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006.

   [RFC7275]  Martini, L., Salam, S., Sajassi, A., Bocci, M.,
              Matsushima, S., and T. Nadeau, "Inter-Chassis
              Communication Protocol for Layer 2 Virtual Private Network
              (L2VPN) Provider Edge (PE) Redundancy", RFC 7275,
              DOI 10.17487/RFC7275, June 2014.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

Authors' Addresses

   Patrice Brissette (editor)
   Cisco Systems
   Ottawa, ON
   Canada

   Email: pbrisset@cisco.com


   Ali Sajassi
   Cisco Systems
   USA

   Email: sajassi@cisco.com


   Bin Wen
   Comcast
   USA

   Email: Bin_Wen@comcast.com


   Edward Leyton
   Verizon Wireless
   USA

   Email: edward.leyton@verizonwireless.com


   Jorge Rabadan
   Nokia
   USA

   Email: jorge.rabadan@nokia.com


   Luc Andre Burdet
   Cisco Systems
   Canada

   Email: lburdet@cisco.com


   Samir Thoria
   Cisco Systems
   USA

   Email: sthoria@cisco.com
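
   As a non-normative illustration of the modulo-based election in
   Section 4.2, the following minimal Python sketch picks the
   port-active PE for an Ethernet Segment. It assumes that the ESI is
   the full 10-octet value with byte 0 being the ESI Type, that bytes
   3-6 are read as an unsigned 32-bit integer in network byte order, and
   that candidate PEs are ordered by increasing IP address with ordinals
   starting at 0, as in [RFC7432]. The function name port_active_df is
   hypothetical.

```python
import ipaddress


def port_active_df(esi: bytes, pe_addresses: list) -> str:
    """Return the address of the PE elected DF for the whole ES.

    Assumptions (not stated normatively in the text): esi is 10 octets,
    byte 0 is the ESI Type, and bytes 3-6 form a big-endian integer.
    """
    if len(esi) != 10:
        raise ValueError("ESI must be 10 octets")
    if not pe_addresses:
        raise ValueError("at least one candidate PE is required")
    # Order candidates by numeric IP value, not by string comparison.
    candidates = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # Per-ES election: the ESI bytes replace the Ethernet Tag in the modulo.
    value = int.from_bytes(esi[3:7], "big")
    return candidates[value % len(candidates)]


if __name__ == "__main__":
    esi = bytes([0, 0, 0, 0, 0, 0, 7, 0, 0, 0])
    print(port_active_df(esi, ["192.0.2.10", "192.0.2.9"]))
```

   Because the Ethernet Tag no longer takes part in the computation,
   every service on the port resolves to the same PE, which is the
   determinism this mode is meant to provide.
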