BESS Working Group                                     Patrice Brissette
INTERNET-DRAFT                                               Ali Sajassi
Intended Status: Proposed Standard                         Cisco Systems

                                                                 Bin Wen
                                                                 Comcast

                                                           Edward Leyton
                                                        Verizon Wireless

                                                           Jorge Rabadan
                                                                   Nokia

Expires: May 3, 2020                                    October 31, 2019


              EVPN multi-homing port-active load-balancing
                   draft-brissette-bess-evpn-mh-pa-04

Abstract

   The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables
   the establishment of a logical link-aggregation connection with a
   redundant group of independent nodes. The purpose of multi-chassis
   LAG is to achieve higher network availability while providing
   different modes of sharing/balancing of traffic. The EVPN standard
   defines EVPN-based MC-LAG with single-active and all-active
   multi-homing load-balancing modes. This draft expands on the
   existing redundancy mechanisms supported by EVPN and introduces
   support for port-active load-balancing mode. In this document,
   port-active load-balancing mode is also referred to as
   per-interface active/standby.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1 Terminology
   2. Multi-Chassis Ethernet Bundles
   3. Port-active load-balancing procedure
   4. Algorithm to elect per port-active PE
     4.1 Capability Flag
     4.2 Modulo-based Designated Forwarder Algorithm
     4.3 HRW Algorithm
     4.4 Preferred-DF Algorithm
   5. Convergence considerations
   6. Applicability
   7. Overall Advantages
   8. Security Considerations
   9. IANA Considerations
   10. References
     10.1 Normative References
     10.2 Informative References
   Authors' Addresses

1. Introduction

   EVPN, as per [RFC7432], provides all-active per-flow load balancing
   for multi-homing. It also defines single-active mode with service
   carving, where one of the PEs in the redundancy relationship is
   active per service.

   While these two multi-homing scenarios are the most widely used in
   data center and service provider access networks, there are
   scenarios where active-standby per-interface multi-homing
   redundancy is useful and required. The main consideration for this
   mode of redundancy is the determinism of traffic forwarding through
   a specific interface, rather than statistical per-flow load
   balancing across multiple PEs providing multi-homing. The
   determinism provided by active-standby per interface is also
   required for certain QoS features to work. While using this mode,
   customers also expect minimal convergence time during failures. A
   new load-balancing mode, "port-active load-balancing", is therefore
   defined.

   This draft describes how that new redundancy mode can be supported
   via EVPN.

                    +-----+
                    | PE3 |
                    +-----+
                 +-----------+
                 |  MPLS/IP  |
                 |   CORE    |
                 +-----------+
               +-----+   +-----+
               | PE1 |   | PE2 |
               +-----+   +-----+
                  |         |
                  I1       I2
                   \       /
                    \     /
                     +---+
                     |CE1|
                     +---+

                      Figure 1. MC-LAG topology

   Figure 1 shows an MC-LAG multi-homing topology where PE1 and PE2 are
   part of the same redundancy group providing multi-homing to CE1 via
   interfaces I1 and I2. Interfaces I1 and I2 are Bundle-Ethernet
   interfaces running LACP. The core, shown as IP or MPLS enabled,
   provides a wide range of L2 and L3 services.
   MC-LAG multi-homing functionality is decoupled from those services
   in the core and focuses on providing multi-homing to the CE. With
   per-port active/standby redundancy, only one of the two interfaces,
   I1 or I2, is in forwarding state; the other interface is in
   standby. This also implies that all services on the active
   interface are in active mode and all services on the standby
   interface operate in standby mode.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2. Multi-Chassis Ethernet Bundles

   When a CE is multi-homed to a set of PE nodes using the [802.1AX]
   Link Aggregation Control Protocol (LACP), the PEs must act as if they
   were a single LACP speaker for the Ethernet links to form a bundle
   and operate as a Link Aggregation Group (LAG). To achieve this, the
   PEs connected to the same multi-homed CE must synchronize LACP
   configuration and operational data among them. The Inter-Chassis
   Communication Protocol (ICCP) [RFC7275] has been used for that
   purpose. EVPN LAG greatly simplifies that solution. The
   simplification comes with a few assumptions:

   - A CE device connected to multi-homing PEs may have a single LAG
   with all its links active, i.e., the links in the Ethernet bundle
   operate in all-active load-balancing mode.

   - The same LACP parameters, such as system ID, port priority and
   port key, MUST be configured on peering PEs.

   Any discrepancy from this list is left for future study.
   Furthermore, mis-configuration and mis-wiring detection across
   peering PEs are also left for further study.

3. Port-active load-balancing procedure

   The following steps describe the proposed procedure with EVPN LAG to
   support port-active load-balancing mode:

   1- The Ethernet-Segment Identifier (ESI) MUST be assigned per access
   interface as described in [RFC7432]; it may be auto-derived or
   manually assigned. The access interface MAY be a Layer-2 or Layer-3
   interface. The usage of ESI over a Layer-3 interface is newly
   described in this document.

   2- The Ethernet-Segment MUST be configured in port-active
   load-balancing mode on peering PEs for the specific access
   interface.

   3- Peering PEs MAY exchange only the Ethernet-Segment route (Route
   Type-4) when the ESI is configured on a Layer-3 interface.

   4- PEs in the redundancy group leverage the DF election defined in
   [RFC8584] to determine which PE keeps the port in active mode and
   which one(s) keep it in standby mode. While the DF election defined
   in [RFC8584] is per <ES, Ethernet Tag> granularity, for port-active
   mode of multi-homing the DF election is done per <ES>. The details
   of this algorithm are described in Section 4.

   5- The DF router MUST keep the corresponding access interface in up
   and forwarding-active state for that Ethernet-Segment.

   6- Non-DF routers MAY bring the peering access interface attached to
   them down and keep it in operational down state. If the interface is
   running LACP, the non-DF PE MAY instead set the LACP state to OOS
   (Out of Sync) rather than bringing the interface down. This allows
   for better convergence on standby-to-active transition.

   7- For EVPN-VPWS service, the usage of the primary/backup bits of
   the EVPN Layer 2 Attributes extended community [RFC8214] is highly
   recommended to achieve better convergence.

4. Algorithm to elect per port-active PE

   The ES routes, running in port-active load-balancing mode, are
   advertised with a new capability in the DF Election Extended
   Community as defined in [RFC8584].
   Moreover, the ES associated with the port leverages the existing
   single-active procedure and signals the single-active bit along with
   the Ethernet A-D per-ES route. Finally, as in [RFC7432], the
   ESI-label based split-horizon procedures should be used to avoid
   transient echoed packets when L2 circuits are involved.

4.1 Capability Flag

   [RFC8584] defines a DF Election Extended Community and a Bitmap
   field to encode "capabilities" to use with the DF election algorithm
   in the DF algorithm field. The Bitmap (2 octets) is extended with
   the following value:

                           1 1 1 1 1 1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |D|A|     |P|                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2 - Amended Bitmap field in the DF Election Extended Community

   - Bit 0: 'Don't Preempt' bit, as explained in [PREF-DF].

   - Bit 1: AC-Influenced DF Election, as explained in [RFC8584].

   - Bit 5: (corresponds to Bit 29 of the DF Election Extended
            Community and is defined by this document):
            P bit or 'Port Mode' bit (P hereafter). It indicates
            that the DF algorithm should be modified to consider
            the port only and not the Ethernet Tags.

4.2 Modulo-based Designated Forwarder Algorithm

   The default DF election algorithm, or modulus-based algorithm as in
   [RFC7432] and updated by [RFC8584], is used here at the granularity
   of <ES> only. Because the ES-Import RT community inherits only
   bytes 1-7 of the ESI, many deployments differentiate ESIs within
   these bytes only. For the modulo calculation, bytes 3-7 of the ESI
   are used to determine the designated forwarder.

4.3 HRW Algorithm

   The Highest Random Weight (HRW) algorithm defined in [RFC8584] MAY
   also be used and signaled, modified to operate at the granularity of
   <ES> rather than per <ES, Ethernet Tag>.

   [RFC8584] describes computing a 32-bit CRC over the concatenation of
   Ethernet Tag and ESI.
   For port-active load-balancing mode, the Ethernet Tag is simply
   removed from the CRC computation.

4.4 Preferred-DF Algorithm

   When the new 'Port Mode' capability is signaled, the Preferred-DF
   algorithm [PREF-DF] is modified to consider the port only and not
   any associated Ethernet Tags. Furthermore, the 'Port Mode'
   capability MUST be compatible with the 'DP' (Don't Preempt)
   capability, used for non-revertive operation. The AC-DF bit MUST be
   set to zero: when an AC (sub-interface) goes down, it does not
   influence the DF election.

5. Convergence considerations

   To improve convergence upon failure and recovery when port-active
   load-balancing mode is used, some advanced synchronization between
   peering PEs may be required. Port-active is challenging in the
   sense that the "standby" port is in down state: it takes some time
   to bring a "standby" port to up state and for the network to
   settle. For IRB and L3 services, ARP/ND caches may be synchronized.
   Moreover, the associated VRF tables may also be synchronized. For
   L2 services, MAC table synchronization may be considered.

   Finally, for a Bundle-Ethernet interface where LACP is running, the
   ability to set the "standby" port in "out-of-sync" state, also known
   as "warm-standby", can be leveraged.

6. Applicability

   A common deployment is to provide L2 or L3 service on the PEs
   providing multi-homing. The L2 service could be any L2 EVPN service
   such as EVPN VPWS or EVPN [RFC7432]. The L3 service could be in a
   VPN context [RFC4364] or in the global routing context. When a PE
   provides first-hop routing, EVPN IRB could also be deployed on the
   PEs. The mechanism defined in this draft is used between the PEs
   providing the L2 and/or L3 service when the requirement is to use
   port-active mode.

   A possible alternative to the solution described in this draft is
   MC-LAG with ICCP [RFC7275] active-standby redundancy. However, ICCP
   requires LDP to be enabled as a transport of ICCP messages.
   There are many scenarios where LDP is not required, e.g.,
   deployments with VXLAN or SRv6. The solution defined in this draft
   with EVPN does not mandate the use of LDP or ICCP and is
   independent of the underlay encapsulation.

7. Overall Advantages

   The use of port-active multi-homing brings the following benefits to
   EVPN networks:

   - Open-standards-based per-interface single-active redundancy
   mechanism that eliminates the need to run ICCP and LDP.

   - Agnostic of underlay technology (MPLS, VXLAN, SRv6) and associated
   services (L2, L3, Bridging, E-LINE, etc.).

   - Provides a way to enable deterministic QoS over MC-LAG attachment
   circuits.

   - Fully compliant with [RFC7432]; does not require any new protocol
   enhancement to existing EVPN RFCs.

   - Can leverage various DF election algorithms, e.g., modulo, HRW,
   etc.

   - Replaces the legacy MC-LAG ICCP-based solution and offers the
   following additional benefits:

      - Efficiently supports 1+N redundancy mode (with EVPN using BGP
      RR), whereas ICCP requires a full mesh of LDP sessions among the
      PEs in the redundancy group.

      - Fast convergence with mass-withdraw is possible with EVPN;
      there is no equivalent in ICCP.

   - Customers want per-interface single-active redundancy, but don't
   want to enable LDP (e.g., they may be running VXLAN or SRv6 in the
   network). Currently there is no alternative to this.

8. Security Considerations

   The same Security Considerations described in [RFC7432] are valid for
   this document.

9. IANA Considerations

   This document solicits the allocation of the following values:

   o Bit 5 in the [RFC8584] DF Election Capabilities registry,
     with name "P (port mode load-balancing) Capability" for
     port-active ES.

10. References

10.1 Normative References

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W.
              Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432,
              DOI 10.17487/RFC7432, February 2015.

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
              Rabadan, "Virtual Private Wire Service Support in Ethernet
              VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
              VPN Designated Forwarder Election Extensibility",
              RFC 8584, DOI 10.17487/RFC8584, April 2019.

10.2 Informative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in
              RFC 2119 Key Words", BCP 14, RFC 8174,
              DOI 10.17487/RFC8174, May 2017.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364,
              February 2006.

   [RFC7275]  Martini, L., Salam, S., Sajassi, A., Bocci, M.,
              Matsushima, S., and T. Nadeau, "Inter-Chassis
              Communication Protocol for Layer 2 Virtual Private Network
              (L2VPN) Provider Edge (PE) Redundancy", RFC 7275,
              DOI 10.17487/RFC7275, June 2014.

   [PREF-DF]  Rabadan, J., et al., "Preference-based EVPN DF Election",
              draft-ietf-bess-evpn-pref-df, Work in Progress,
              June 2019.

Authors' Addresses

   Patrice Brissette
   Cisco Systems
   Email: pbrisset@cisco.com

   Ali Sajassi
   Cisco Systems
   Email: sajassi@cisco.com

   Luc Andre Burdet
   Cisco Systems
   Email: lburdet@cisco.com

   Samir Thoria
   Cisco Systems
   Email: sthoria@cisco.com

   Jorge Rabadan
   Nokia
   Email: jorge.rabadan@nokia.com

   Bin Wen
   Comcast
   Email: Bin_Wen@comcast.com

   Edward Leyton
   Verizon
   Email: edward.leyton@verizonwireless.com
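
   The per-port election of Section 4 can be illustrated with a short,
   non-normative sketch. It is only an illustration under stated
   assumptions, not the procedure of this draft or of [RFC8584]: the
   function names are hypothetical, bit 0 of the Bitmap is taken to be
   the most significant bit as drawn in Figure 2, bytes 3-7 of the ESI
   are read as a big-endian integer, the HRW weight uses Python's
   zlib.crc32 over the ESI followed by the PE address, and HRW ties go
   to the numerically lowest address.

```python
import ipaddress
import zlib

# Assumption: Bitmap bit 0 is the most significant of the 16 bits,
# so bit 5 (the proposed 'P' / Port Mode bit) is mask 0x0400.
PORT_MODE_MASK = 1 << (15 - 5)


def has_port_mode(bitmap: int) -> bool:
    """Return True when the P bit is set in a 16-bit capability Bitmap."""
    return bool(bitmap & PORT_MODE_MASK)


def sorted_pes(pe_addresses):
    """Order candidate PEs by numerically increasing IP address."""
    return sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))


def modulo_df(esi: bytes, pe_addresses) -> str:
    """Per-ES modulo election: ESI bytes 3-7 stand in for the Ethernet Tag."""
    if len(esi) != 10:
        raise ValueError("an ESI is 10 octets long")
    pes = sorted_pes(pe_addresses)
    # Assumption: bytes 3-7 (0-based, inclusive) form a big-endian integer.
    value = int.from_bytes(esi[3:8], "big")
    return pes[value % len(pes)]


def hrw_df(esi: bytes, pe_addresses) -> str:
    """Per-ES HRW election: highest weight wins, no Ethernet Tag in the CRC."""
    def weight(addr: str) -> int:
        # Assumption: weight is CRC-32 over the ESI followed by the PE address.
        return zlib.crc32(esi + ipaddress.ip_address(addr).packed)

    # max() keeps the first maximum, so ties go to the lowest address.
    return max(sorted_pes(pe_addresses), key=weight)
```

   With the hypothetical ESI 00:11:22:33:44:55:66:77:88:99 and peering
   PEs 192.0.2.1 and 192.0.2.2, modulo_df() selects 192.0.2.2, because
   the integer formed by bytes 3-7 is odd. Both functions return the
   same PE regardless of the order in which the candidate list is
   supplied, which is what lets every PE in the redundancy group reach
   the same result independently.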