idnits 2.17.1

draft-mohanty-bess-evpn-bum-opt-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 2, 2019) is 1636 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC4271' is defined on line 367, but no explicit
     reference was found in the text

  == Unused Reference: 'I-D.ietf-bess-evpn-per-mcast-flow-df-election' is
     defined on line 390, but no explicit reference was found in the text

  == Unused Reference: 'RFC4364' is defined on line 396, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-21) exists of
     draft-ietf-bess-evpn-igmp-mld-proxy-04

  == Outdated reference: A later version (-09) exists of
     draft-ietf-bess-evpn-per-mcast-flow-df-election-01

     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 BESS WorkGroup S.
Mohanty
3 Internet-Draft                                     M. Ghosh
4 Intended status: Informational                     A. Sajassi
5 Expires: May 5, 2020                               Cisco Systems
6                                                    S. Breeze
7                                                    Claranet
8                                                    J. Uttaro
9                                                    ATT
10                                                   November 2, 2019

12 BGP EVPN Flood Traffic Optimization at EVPN Gateways
13 draft-mohanty-bess-evpn-bum-opt-01

15 Abstract

17 In EVPN, the Broadcast, Unknown Unicast and Multicast (BUM) traffic
18 is sent to all the routers participating in the EVPN instance. In a
19 multi-homing scenario, when more than one PE shares the same Ethernet
20 Segment, i.e., there is more than one PE in a redundancy group, only
21 the PE that is the Designated Forwarder (DF) for the ES will forward
22 a BUM packet on the access interface, whereas all non-DF PEs will drop
23 the packet. In deployments such as EVPN Gateways (EVPN GW) or Data
24 Center Interconnect (DCI) routers, this can be quite wasteful. This
25 is especially true if a large number of EVPN GW or DCI PEs
26 all participate in the same sets of ES and vES. This draft
27 explores the problem and proposes solutions.

29 Status of This Memo

31 This Internet-Draft is submitted in full conformance with the
32 provisions of BCP 78 and BCP 79.

34 Internet-Drafts are working documents of the Internet Engineering
35 Task Force (IETF). Note that other groups may also distribute
36 working documents as Internet-Drafts. The list of current
37 Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

39 Internet-Drafts are draft documents valid for a maximum of six months
40 and may be updated, replaced, or obsoleted by other documents at any
41 time. It is inappropriate to use Internet-Drafts as reference
42 material or to cite them other than as "work in progress."

44 This Internet-Draft will expire on May 5, 2020.

46 Copyright Notice

48 Copyright (c) 2019 IETF Trust and the persons identified as the
49 document authors. All rights reserved.
51 This document is subject to BCP 78 and the IETF Trust's Legal
52 Provisions Relating to IETF Documents
53 (https://trustee.ietf.org/license-info) in effect on the date of
54 publication of this document. Please review these documents
55 carefully, as they describe your rights and restrictions with respect
56 to this document. Code Components extracted from this document must
57 include Simplified BSD License text as described in Section 4.e of
58 the Trust Legal Provisions and are provided without warranty as
59 described in the Simplified BSD License.

61 Table of Contents

63 1. Requirements Language and Terminology
64 2. Introduction
65 3. Problem Description
66 4. Solutions
67   4.1. DF Election per-mcast-flow
68   4.2. Suppress the advertisement of the IMET route
69   4.3. Advertisement of the IMET route from the BDF
70 5. Protocol Considerations
71 6. Operational Considerations
72 7. Security Considerations
73 8. Acknowledgements
74 9. Contributors
75 10. References
76   10.1. Normative References
77   10.2. Informative References
78 Authors' Addresses

80 1. Requirements Language and Terminology

82 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
83 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
84 document are to be interpreted as described in [RFC2119].
86 o ES: Ethernet Segment

88 o vES: Virtual Ethernet Segment

90 o EVI: EVPN Instance; this corresponds to a MAC-VRF.

92 o IMET: Inclusive Multicast Ethernet Tag route
93 o DF: Designated Forwarder

95 o BDF: Backup Designated Forwarder

97 o DCI: Data Center Interconnect router

99 2. Introduction

101 EVPN [RFC7432] describes a solution for disseminating MAC addresses
102 over an MPLS core via the Border Gateway Protocol (BGP). In EVPN, data
103 plane learning is confined to the access, and control plane
104 learning happens via BGP in the core. This prevents unnecessary
105 flooding in the data plane, as traffic is directed to where the
106 destination was learnt from. However, in the case of Broadcast, Unknown
107 Unicast and Multicast (BUM) traffic, the PE needs to flood to
108 all the other PEs in the domain.

110 PEs elect a Designated Forwarder (DF) amongst themselves, for a given
111 ES, by exchanging type-4 routes via BGP. The role of a DF is to
112 forward BUM traffic received from the core towards its access-facing
113 interface. A PE in a non-DF role will drop flood traffic received on
114 its core-facing interface. Note that the DF election process is
115 confined to the set of PEs that host the same Ethernet Segment.
116 Remote PEs are not interested in type-4 routes for Ethernet Segments
117 that they do not host. Hence remote PEs are unaware of the DFs for
118 segments that are not local to them. Consequently, when a remote
119 PE needs to flood BUM traffic using ingress replication, it will
120 flood the frames to all participating PEs, irrespective of whether
121 they are DFs or not. The key to building the list of PEs to flood
122 to is the Inclusive Multicast Ethernet Tag (IMET) route, which is
123 described below.

125 The IMET route (type-3) in EVPN advertises the BUM label for the EVI
126 to all the other PEs that are interested in the same EVI. For ingress
127 replication the label is carried in the PMSI Tunnel attribute.
The
128 label is used to encapsulate the BUM traffic at the ingress entity.
129 This label is inserted just above the split-horizon label in the BUM
130 frame. When the BUM packet is received by a PE that is multi-homed
131 to the same Ethernet Segment as the PE that originated the BUM
132 packet and is the DF for that (EVI, ES) pair, the receiving PE pops
133 the transport label and checks whether the split-horizon label
134 is its own. If so, and no other ES is configured, it drops the
135 packet; otherwise it forwards the frame on all other
136 segments that are part of the same EVI. If the PE is not the DF, it
137 drops the packet immediately.

139             ____             ____
140          __/    \__      ___/    \___
141         /          \    /            \
142 CE1+--+-+VTEP1      DCI1              PE1+---+CE10
143       | |           |  |              |
144       | |           |  |              |
145 CE2+--+-+VTEP2 EVPN DCI2     EVPN     |
146         |     VXLAN |  |     MPLS     |
147         |      FWD  |  |     FWD      |
148 CE3+----+VTEP3      DCI3              |
149         |           |  |              |
150         |           |  |              |
151         |           |  |              |
152 CEn+----+VTEPk      DCIj             /
153          \__    ___/    \___      __/
154            \____/           \____/

156 An EVPN Datacenter network with VXLAN forwarding joined to a
157 traditional EVPN network with MPLS forwarding. Adjoining DCI routers
158 are referred to as EVPN GWs. A DCI will have a single vES (ESI) per BD,
159 with multiple VTEP next-hops.

161 Figure 1

163 3. Problem Description

165 In Figure 1 above, DCI1, DCI2 and DCI3 are all multi-homed EVPN
166 GWs for multiple VTEPs serving the same vES, say vES1. PE1 has a
167 single host, which is not multi-homed.

169 The same EVPN instance (Bridge-Domain) exists on all the PEs and
170 DCIs. For this EVPN instance, DCI1 is the Designated Forwarder on
171 vES1 and DCI2 is the backup DF [RFC8584]. When PE1 sends BUM
172 traffic, the flooded frames are received by DCI1, DCI2, DCI3 and so on
173 up to DCIj. DCI1 forwards the flood traffic on its vES towards
174 all VTEPs participating in vES1. DCI2, DCI3 and all DCIs up to DCIj
175 drop the flooded frames that they receive from the core.
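
The behaviour described above can be illustrated with a small simulation. The following Python sketch is not part of the EVPN procedures: the names Gateway and flood, and the choice of four DCIs, are assumptions made purely for illustration. It models a remote PE that ingress-replicates one BUM frame to every gateway from which it has received an IMET route, and counts how many copies are forwarded by a DF and how many are dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    """Hypothetical model of one DCI attached to vES1."""
    name: str
    is_df: bool             # DF for vES1 in this EVPN instance
    advertises_imet: bool   # has originated a Type-3 (IMET) route


def flood(gateways):
    """Ingress-replicate one BUM frame from a remote PE.

    Returns (copies_sent, copies_forwarded, copies_dropped).
    """
    # The remote PE floods to every gateway it learnt an IMET route from.
    flood_list = [g for g in gateways if g.advertises_imet]
    # Only the DF forwards the frame towards the access; the rest drop it.
    forwarded = sum(1 for g in flood_list if g.is_df)
    return len(flood_list), forwarded, len(flood_list) - forwarded


# Existing behaviour: every DCI advertises IMET, only DCI1 is the DF.
dcis = [Gateway(f"DCI{i}", is_df=(i == 1), advertises_imet=True)
        for i in range(1, 5)]
print(flood(dcis))  # (4, 1, 3)
```

In this illustrative run with four DCIs on vES1, three of the four replicated copies cross the core only to be dropped.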
177 Here it is wasteful for DCI2, DCI3 and so on up to DCIj to receive the
178 flooded frames. Whilst the majority of deployments have two DCIs in
179 the redundancy group, in some cases there may be more than
180 two on the same vES. One example is when the capacity demands on the
181 DCIs are close to their hardware limits. In this scenario,
182 operators may choose to protect their investment and increase
183 resilience by installing additional DCIs, instead of replacing the
184 existing ones or further segmenting the datacenter network. Further,
185 increasing the number of DCIs results in more efficient load-balancing
186 across VNIs.

188 We can now formally describe the issue. In general, consider an EVPN
189 instance, EVIi, that exists on a DCI, say DCIj. As per existing EVPN
190 behavior, even if DCIj is not the DF for any of its virtual Ethernet
191 Segments and there are no single-homed Ethernet Segments
192 that are part of EVIi on DCIj, DCIj will still receive BUM
193 traffic meant for EVIi from a remote PE, PEk. This traffic is simply
194 dropped, as DCIj is not the DF for any of these virtual Ethernet
195 Segments.

197 1. This is an unnecessary use of bandwidth in the EVPN core.

199 2. DCIj receives traffic only to drop it, which is a non-optimal use
200 of the L2 forwarding engine.

202 3. PEk replicates a copy of the Ethernet frame to DCIj only for it
203 to be dropped. This consumes cycles at PEk.

205 In this draft we address the above problem and give possible
206 solutions.

208 4. Solutions

210 4.1. DF Election per-mcast-flow

212 Reducing bandwidth consumption in the EVPN core is an operator's primary
213 concern.
Given that the majority of BUM traffic volume comes from
214 large multicast flows, adopting the mechanisms described in
215 [I-D.ietf-bess-evpn-per-mcast-flow-df-election] not only
216 improves the distribution of multicast traffic amongst DCI1...DCIj
217 for a given vES; techniques such as not advertising the Selective
218 Multicast Ethernet Tag (SMET) route from a non-DF DCI also ensure that
219 only DCIs that have won the election for a group receive multicast
220 traffic for that group.

221 This solution explicitly requires IGMP snooping in the BD where the
222 vES resides.

224 This solution does not solve the problem of unnecessary Broadcast and
225 Unknown Unicast traffic being replicated to non-DFs, but it addresses
226 the most prominent problem, bandwidth.

228 4.2. Suppress the advertisement of the IMET route

230 The next solution is for a DCI not to advertise the IMET route if the
231 only outcome would be to drop the flooded traffic:
232 o DCIj needs to advertise the Inclusive Multicast Ethernet Tag
233 route (Type-3 route) for an EVPN Instance, EVIi, only if
234 EVIi is configured on at least one Ethernet Segment that also
235 has a presence on another DCI (i.e., is multihomed) and DCIj is the DF
236 for that specific Ethernet Segment.

238 o The Type-3 route SHOULD also be advertised if there is a single-homed
239 Ethernet Segment in the EVI.

241 o When a DCI becomes the DF for its first vES in an EVPN Instance, the
242 IMET route should be advertised; when it transitions from DF to non-DF
243 on its last such vES, the route should be withdrawn.

245 In Figure 2, the same EVPN instance exists on DCI1, DCI2, DCI3,
246 DCIj and PE1. However, only DCI1 and PE1 advertise the IMET route,
247 so PE1 sends the flood traffic to DCI1 only.
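
The advertisement rule of Section 4.2 can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the Segment type, its fields and the function names are hypothetical, and the DF state is assumed to be supplied by whichever DF election procedure is in use.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """One Ethernet Segment of an EVI on the local DCI (hypothetical model)."""
    esi: str
    multihomed: bool
    local_is_df: bool = False  # only meaningful for multihomed segments


def should_advertise_imet(segments: Iterable[Segment]) -> bool:
    """Return True if the local DCI should originate the Type-3 (IMET) route."""
    for es in segments:
        if not es.multihomed:
            return True   # a single-homed segment always needs BUM traffic
        if es.local_is_df:
            return True   # DF for at least one multihomed segment
    return False


def imet_action(currently_advertised: bool, segments: Iterable[Segment]) -> str:
    """Map the current DF state to 'advertise', 'withdraw' or 'none'."""
    wanted = should_advertise_imet(segments)
    if wanted and not currently_advertised:
        return "advertise"   # first DF role gained
    if currently_advertised and not wanted:
        return "withdraw"    # last DF role lost
    return "none"


dci1 = [Segment("vES1", multihomed=True, local_is_df=True)]
dci2 = [Segment("vES1", multihomed=True, local_is_df=False)]
print(should_advertise_imet(dci1), should_advertise_imet(dci2))  # True False
print(imet_action(True, dci2))  # withdraw
```

Keeping the decision in a pure function of the local segment state means it can be re-evaluated after every DF election result, with imet_action deciding whether a route update is needed at all.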
249             ____             ____
250          __/    \__ - - ->___/    \___
251         /          \    /            \
252 CE1+--+-+VTEP1      DCI1              PE1+---+CE10
253       | |           |  |              |
254       | |           |  |              |
255 CE2+--+-+VTEP2 EVPN DCI2     EVPN     |
256         |     VXLAN |  |     MPLS     |
257         |      FWD  |  |     FWD      |
258 CE3+----+VTEP3      DCI3              |
259         |           |  |              |
260         |           |  |              |
261         |           |  |              |
262 CEn+----+VTEPk      DCIj             /
263          \__    ___/    \___      __/
264            \____/           \____/

266 An EVPN GW Network

268 Figure 2

270 With this approach, if the DF, DCI1, fails, BUM traffic will be dropped
271 until the IMET route from the newly elected DF (one of DCI2 through
272 DCIj) is received at PE1. Note, however, that with present behaviour BUM
273 traffic is also dropped until the peering PEs process the withdrawal of
274 the Type-4 route. Compared with the existing procedures, the convergence
275 delay with this proposal is MAX(Type-4 propagation delay, Type-3
276 propagation delay) after the new DF is elected. This leads to the next
277 solution, an extension for cases where convergence cannot be traded off
278 for bandwidth optimization.

279 4.3. Advertisement of the IMET route from the BDF

281 1. Multihomed PEs can easily compute the Backup DF, based on the DF
282 election mode in operation.

284 2. Extending the previous solution, we propose that a PE
285 advertise the Type-3 route for an EVI if and only if at least one of
286 the following conditions holds:

288 * It has a single-homed Ethernet Segment in the EVI.

290 * It is the DF for at least one ES or vES in that EVI.

292 * It is the BDF for at least one ES or vES in that EVI.

294 This would mean that, in Figure 2, in addition to the IMET route
295 advertised by DCI1, DCI2 also advertises the IMET route,
296 since it is the BDF. It can be seen from the above example that,
297 however many multi-homed PEs share the same vES, only two
298 DCIs will advertise IMET for an EVI. Of course, if there
299 are single-homed hosts, there may be additional IMET
300 advertisements.
But the real benefits are in the data plane, since
301 no BUM traffic is sent to DCIs that do not need it but
302 would nevertheless have received it under the existing EVPN
303 procedures.

305 It is important to note that the solutions involving suppression of
306 IMET should be limited to the following use cases:

308 1. BUM traffic for Ingress Replication (IR) cases

310 2. BDs with no IGMP/MLD/PIM proxy

312 3. BDs with no OISM or IRBs

314 4. BDs with vES associated with overlay tunnels and no other ACs

316 With these caveats, suppressing IMET at EVPN GWs that are neither DF
317 nor BDF provides complete control over BUM traffic distribution per vES
318 (per BD).

320 5. Protocol Considerations

322 This proposal is consistent with existing EVPN documents that deal with
323 BUM handling, [RFC7432] and [I-D.ietf-bess-evpn-igmp-mld-proxy].
324 Additionally, to take DF Type 4, as explained in
325 [I-D.ietf-bess-evpn-per-mcast-flow-df-election], into consideration
326 along with the other conditions specified in Sections 4 and 5, the PE
327 should advertise IMET if and only if there is at least one (S,G) for
328 which it is the DF. For all other DF Types, no additional
329 considerations are required.

331 6. Operational Considerations

333 None

335 7. Security Considerations

337 This document raises no new security issues for EVPN.

339 8. Acknowledgements

341 The authors would like to thank Jorge Rabadan, John Drake and Eric
342 Rosen for discussions related to this draft.

344 9. Contributors

346 Samir Thoria
347 Cisco Systems
348 US

350 Email: sthoria@cisco.com

352 Sameer Gulrajani
353 Cisco Systems
354 US

356 Email: sameerg@cisco.com

358 10. References

360 10.1. Normative References

362 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
363 Requirement Levels", BCP 14, RFC 2119,
364 DOI 10.17487/RFC2119, March 1997.

367 [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S.
Hares, Ed., "A
368 Border Gateway Protocol 4 (BGP-4)", RFC 4271,
369 DOI 10.17487/RFC4271, January 2006.

372 [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
373 Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
374 Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
375 2015.

377 [RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, J.,
378 Nagaraj, K., and S. Sathappan, "Framework for Ethernet VPN
379 Designated Forwarder Election Extensibility", RFC 8584,
380 DOI 10.17487/RFC8584, April 2019.

382 10.2. Informative References

384 [I-D.ietf-bess-evpn-igmp-mld-proxy]
385 Sajassi, A., Thoria, S., Patel, K., Yeung, D., Drake, J.,
386 and W. Lin, "IGMP and MLD Proxy for EVPN",
387 draft-ietf-bess-evpn-igmp-mld-proxy-04 (work in progress), September
388 2019.

390 [I-D.ietf-bess-evpn-per-mcast-flow-df-election]
391 Sajassi, A., Mishra, M., Thoria, S., Rabadan, J., and J.
392 Drake, "Per multicast flow Designated Forwarder Election
393 for EVPN", draft-ietf-bess-evpn-per-mcast-flow-df-election-01
394 (work in progress), March 2019.

396 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
397 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
398 2006.

400 Authors' Addresses

402 Satya Ranjan Mohanty
403 Cisco Systems
404 170 W. Tasman Drive
405 San Jose, CA 95134
406 USA

408 Email: satyamoh@cisco.com

410 Mrinmoy Ghosh
411 Cisco Systems
412 170 W. Tasman Drive
413 San Jose, CA 95134
414 USA

416 Email: mrghosh@cisco.com
417 Ali Sajassi
418 Cisco Systems
419 170 W. Tasman Drive
420 San Jose, CA 95134
421 USA

423 Email: sajassi@cisco.com

425 Sandy Breeze
426 Claranet
427 21 Southampton Row
428 London WC1B 5HA
429 United Kingdom

431 Email: sandy.breeze@eu.clara.net

433 Jim Uttaro
434 ATT
435 200 S. Laurel Avenue
436 Middletown, NJ 07748
437 USA

439 Email: uttaro@att.com