BESS Workgroup                                           J. Rabadan, Ed.
Internet-Draft                                              S. Sathappan
Intended status: Standards Track                                   Nokia
Expires: March 27, 2022                                           W. Lin
                                                        Juniper Networks
                                                              M. Katiyar
                                                          Versa Networks
                                                              A. Sajassi
                                                           Cisco Systems
                                                      September 23, 2021


            Optimized Ingress Replication solution for EVPN
                  draft-ietf-bess-evpn-optimized-ir-09

Abstract

   Network Virtualization Overlay (NVO) networks using EVPN as control
   plane may use Ingress Replication (IR) or PIM (Protocol Independent
   Multicast) based trees to convey the overlay Broadcast, Unknown
   unicast and Multicast (BUM) traffic. PIM provides an efficient
   solution to avoid sending multiple copies of the same packet over the
   same physical link, however it may not always be deployed in the NVO
   core network. IR avoids the dependency on PIM in the NVO network
   core. While IR provides a simple multicast transport, some NVO
   networks with demanding multicast applications require a more
   efficient solution without PIM in the core. This document describes
   a solution to optimize the efficiency of IR in NVO networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 27, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Conventions
   3.  Solution requirements
   4.  EVPN BGP Attributes for optimized-IR
   5.  Non-selective Assisted-Replication (AR) Solution Description
     5.1.  Non-selective AR-REPLICATOR procedures
     5.2.  Non-selective AR-LEAF procedures
     5.3.  RNVE procedures
   6.  Selective Assisted-Replication (AR) Solution Description
     6.1.  Selective AR-REPLICATOR procedures
     6.2.  Selective AR-LEAF procedures
   7.  Pruned-Flood-Lists (PFL)
     7.1.  A PFL example
   8.  AR Procedures for single-IP AR-REPLICATORS
   9.  AR Procedures and EVPN All-Active Multi-homing Split-Horizon
     9.1.  Ethernet Segments on AR-LEAF nodes
     9.2.  Ethernet Segments on AR-REPLICATOR nodes
   10. Security Considerations
   11. IANA Considerations
   12. Contributors
   13. Acknowledgments
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Authors' Addresses

1.  Introduction

   Ethernet Virtual Private Networks (EVPN) may be used as the control
   plane for a Network Virtualization Overlay (NVO) network. Network
   Virtualization Edge (NVE) devices and Provider Edges (PEs) that are
   part of the same EVPN Instance (EVI) use Ingress Replication (IR) or
   PIM-based trees to transport the tenant's Broadcast, Unknown unicast
   and Multicast (BUM) traffic. In NVO networks where PIM-based trees
   cannot be used, IR is the only option. Examples of these situations
   are NVO networks where the core nodes don't support PIM or the
   network operator does not want to run PIM in the core.

   In some use-cases, the amount of replication for BUM traffic is kept
   under control on the NVEs due to the following fairly common
   assumptions:

   a.  Broadcast is greatly reduced due to the proxy ARP (Address
       Resolution Protocol) and proxy ND (Neighbor Discovery)
       capabilities supported by EVPN on the NVEs. Some NVEs can even
       provide Dynamic Host Configuration Protocol (DHCP) server
       functions for the attached Tenant Systems (TS) reducing the
       broadcast even further.

   b.  Unknown unicast traffic is greatly reduced in virtualized NVO
       networks where all the MAC and IP addresses are learned in the
       control plane.

   c.  Multicast applications are not used.

   If the above assumptions are true for a given NVO network, then IR
   provides a simple solution for multi-destination traffic. However,
   the statement c) above is not always true and multicast applications
   are required in many use-cases.

   When the multicast sources are attached to NVEs residing in
   hypervisors or low-performance-replication TORs (Top Of Rack
   switches), the ingress replication of a large amount of multicast
   traffic to a significant number of remote NVEs/PEs can seriously
   degrade the performance of the NVE and impact the application.

   This document describes a solution that makes use of two IR
   optimizations:

   1.  Assisted-Replication (AR)

   2.  Pruned-Flood-Lists (PFL)

   Both optimizations may be used together or independently so that the
   performance and efficiency of the network to transport multicast can
   be improved. Both solutions require some extensions to [RFC7432]
   that are described in Section 4.

   Section 3 lists the requirements of the combined optimized-IR
   solution, whereas Section 5 and Section 6 describe the
   Assisted-Replication (AR) solution, and Section 7 the
   Pruned-Flood-Lists (PFL) solution.

2.  Terminology and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terminology is used throughout the document:

   -  AC: Attachment Circuit

   -  BM traffic: Refers to Broadcast and Multicast frames (excluding
      unknown unicast frames)

   -  NVO: Network Virtualization Overlay

   -  NVE: Network Virtualization Edge router

   -  PE: Provider Edge router

   -  AR-REPLICATOR: Assisted Replication - REPLICATOR, refers to an
      NVE/PE that can replicate Broadcast or Multicast traffic received
      on overlay tunnels to other overlay tunnels. This document
      defines the control and data plane procedures that an
      AR-REPLICATOR needs to follow.

   -  AR-LEAF: Assisted Replication - LEAF, refers to an NVE/PE that -
      given its poor replication performance - sends all the Broadcast
      and Multicast traffic to an AR-REPLICATOR that can replicate the
      traffic further on its behalf.

   -  RNVE: Regular NVE, refers to an NVE that supports the procedures
      of [RFC8365] and does not support the procedures in this document.
      However, this document defines procedures to interoperate with
      RNVEs.

   -  Replicator-AR route: an EVPN RT-3 (route type 3) that is
      advertised by an AR-REPLICATOR to signal its capabilities.

   -  Regular-IR: Refers to Regular Ingress Replication, where the
      source NVE/PE sends a copy to each remote NVE/PE part of the BD.

   -  AR-IP: IP address owned by the AR-REPLICATOR and used to
      differentiate the ingress traffic that must follow the AR
      procedures.

   -  IR-IP: IP address used for Ingress Replication as in [RFC7432].

   -  AR-VNI: VNI advertised by the AR-REPLICATOR along with the
      Replicator-AR route. It is used to identify the ingress packets
      that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR
      case.

   -  IR-VNI: VNI advertised along with the RT-3 for IR.

   -  AR forwarding mode: for an AR-LEAF, it means sending an AC BM
      packet to a single AR-REPLICATOR with tunnel destination IP AR-IP.
      For an AR-REPLICATOR, it means sending a BM packet to a selected
      number or all the overlay tunnels when the packet was previously
      received from an overlay tunnel.

   -  IR forwarding mode: it refers to the Ingress Replication behavior
      explained in [RFC7432]. It means sending an AC BM packet copy to
      each remote PE/NVE in the BD and sending an overlay BM packet only
      to the ACs and not other overlay tunnels.

   -  PTA: PMSI Tunnel Attribute

   -  RT-3: EVPN Route Type 3, Inclusive Multicast Ethernet Tag route

   -  RT-11: EVPN Route Type 11, Leaf Auto-Discovery (A-D) route

   -  VXLAN: Virtual Extensible LAN

   -  GRE: Generic Routing Encapsulation

   -  NVGRE: Network Virtualization using Generic Routing Encapsulation

   -  GENEVE: Generic Network Virtualization Encapsulation

   -  VNI: VXLAN Network Identifier

   -  EVI: EVPN Instance. An EVPN instance spanning the Provider Edge
      (PE) devices participating in that EVPN.

   -  BD: Broadcast Domain, as defined in [RFC7432].

   -  TOR: Top Of Rack switch

3.  Solution requirements

   The IR optimization solution specified in this document (optimized-IR
   hereafter) meets the following requirements:

   a.  It provides an IR optimization for BM (Broadcast and Multicast)
       traffic without the need for PIM, while preserving the packet
       order for unicast applications, i.e., known and unknown unicast
       traffic should follow the same path. This optimization is
       required in low-performance NVEs.

   b.  It reduces the flooded traffic in NVO networks where some NVEs do
       not need broadcast/multicast and/or unknown unicast traffic.

   c.  The solution is compatible with [RFC7432] and [RFC8365] and has
       no impact on the EVPN procedures for BM traffic. In particular,
       the solution supports the following EVPN functions:

       o  All-active multi-homing, including the split-horizon and
          Designated Forwarder (DF) functions.

       o  Single-active multi-homing, including the DF function.

       o  Handling of multi-destination traffic and processing of
          broadcast and multicast as per [RFC7432].

   d.  The solution is backwards compatible with existing NVEs using a
       non-optimized version of IR. A given BD can have NVEs/PEs
       supporting regular-IR and optimized-IR.

   e.  The solution is independent of the NVO specific data plane
       encapsulation and the virtual identifiers being used, e.g.: VXLAN
       VNIs, NVGRE VSIDs or MPLS labels, as long as the tunnel is
       IP-based.

4.  EVPN BGP Attributes for optimized-IR

   This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag
   routes and attributes so that an NVE/PE can signal its optimized-IR
   capabilities.

   The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel
   Attribute's (PTA) general format used in [RFC7432] are shown below:

               +---------------------------------+
               |  RD (8 octets)                  |
               +---------------------------------+
               |  Ethernet Tag ID (4 octets)     |
               +---------------------------------+
               |  IP Address Length (1 octet)    |
               +---------------------------------+
               |  Originating Router's IP Addr   |
               |  (4 or 16 octets)               |
               +---------------------------------+

               +---------------------------------+
               |  Flags (1 octet)                |
               +---------------------------------+
               |  Tunnel Type (1 octet)          |
               +---------------------------------+
               |  MPLS Label (3 octets)          |
               +---------------------------------+
               |  Tunnel Identifier (variable)   |
               +---------------------------------+

   The Flags field is 8 bits long. This document defines the use of 4
   bits of this Flags field:

   -  bits 3 and 4, forming together the Assisted-Replication Type (T)
      field

   -  bit 5, called the Broadcast and Multicast (BM) flag

   -  bit 6, called the Unknown (U) flag

   Bits 5 and 6 are collectively referred to as the PFL
   (Pruned-Flood-Lists) flags.
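
   As an illustration only, the bit assignments listed above can be
   sketched in Python as follows. The sketch assumes that bit 0 is the
   most significant bit of the Flags octet, which places the existing
   Leaf Information Required (L) flag of [RFC6514] in bit 7. The names
   PtaFlags, encode_flags and decode_flags are hypothetical and are not
   taken from any specification or library.

```python
from dataclasses import dataclass

# Bit positions come from the text: bits 3-4 = T, bit 5 = BM, bit 6 = U.
# ASSUMPTION: bit 0 is the most significant bit of the octet, so the
# pre-existing L flag sits in bit 7 (mask 0x01).
T_SHIFT = 3      # T field occupies bits 3-4
T_MASK = 0x18
BM_MASK = 0x04   # bit 5
U_MASK = 0x02    # bit 6
L_MASK = 0x01    # bit 7


@dataclass(frozen=True)
class PtaFlags:
    """Hypothetical in-memory view of the PTA Flags octet."""
    t: int = 0          # 2-bit Assisted-Replication Type field
    bm: bool = False    # "prune-me" from the BM flooding list
    u: bool = False     # "prune-me" from the Unknown flooding list
    l: bool = False     # Leaf Information Required


def encode_flags(flags: PtaFlags) -> int:
    """Pack the fields into a single octet value (0..255)."""
    if not 0 <= flags.t <= 3:
        raise ValueError("T is a 2-bit field")
    octet = flags.t << T_SHIFT
    if flags.bm:
        octet |= BM_MASK
    if flags.u:
        octet |= U_MASK
    if flags.l:
        octet |= L_MASK
    return octet


def decode_flags(octet: int) -> PtaFlags:
    """Unpack an octet value; bits 0-2 are ignored by this sketch."""
    return PtaFlags(
        t=(octet & T_MASK) >> T_SHIFT,
        bm=bool(octet & BM_MASK),
        u=bool(octet & U_MASK),
        l=bool(octet & L_MASK),
    )
```

   For example, under these assumptions a T value of 1 combined with
   L=1 is carried as the octet 0x09, and decoding 0x16 yields a T value
   of 2 with both PFL flags set.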

   The T field and PFL flags are defined as follows:

   -  T is the AR Type field (2 bits) that defines the AR role of the
      advertising router:

      o  00 (decimal 0) = RNVE (non-AR support)

      o  01 (decimal 1) = AR-REPLICATOR

      o  10 (decimal 2) = AR-LEAF

      o  11 (decimal 3) = RESERVED

   -  The PFL (Pruned-Flood-Lists) flags define the desired behavior of
      the advertising router for the different types of traffic:

      o  Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from
         the BM flooding list. BM=0 means regular behavior.

      o  Unknown (U) flag. U=1 means "prune-me" from the Unknown
         flooding list. U=0 means regular behavior.

   -  Flag L is an existing flag defined in [RFC6514] (L=Leaf
      Information Required) and it will be used only in the Selective AR
      Solution.

   Please refer to Section 11 for the IANA considerations related to the
   PTA flags.

   In this document, the above RT-3 and PTA can be used in two different
   modes for the same BD:

   -  Regular-IR route: in this route, Originating Router's IP Address,
      Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used
      as described in [RFC7432] when Ingress Replication is in use. The
      NVE/PE that advertises the route will set the Next-Hop to an IP
      address that we denominate IR-IP in this document. When
      advertised by an AR-LEAF node, the Regular-IR route SHOULD be
      advertised with type T = AR-LEAF.

   -  Replicator-AR route: this route is used by the AR-REPLICATOR to
      advertise its AR capabilities, with the fields set as follows:

      o  Originating Router's IP Address MUST be set to an IP address of
         the PE that should be common to all the EVIs on the PE (usually
         this is the PE's loopback address). The Tunnel Identifier and
         Next-Hop SHOULD be set to the same IP address as the
         Originating Router's IP address when the NVE/PE originates the
         route. The Next-Hop address is referred to as the AR-IP and
         SHOULD be different than the IR-IP for a given PE/NVE.

      o  Tunnel Type = Assisted-Replication Tunnel. Section 11 provides
         the allocated type value.

      o  T (AR role type) = 01 (AR-REPLICATOR).

      o  L (Leaf Information Required) = 0 (for non-selective AR) or 1
         (for selective AR).

   In addition, this document also uses the Leaf A-D route (RT-11)
   defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
   selective AR mode is used. The Leaf A-D route MAY be used by the
   AR-LEAF in response to a Replicator-AR route (with the L flag set) to
   advertise its desire to receive the BM traffic from a specific
   AR-REPLICATOR. It is only used for selective AR and its fields are
   set as follows:

      o  Originating Router's IP Address is set to the advertising PE's
         IP address (same IP used by the AR-LEAF in regular-IR routes).
         The Next-Hop address is set to the IR-IP.

      o  Route Key is the "Route Type Specific" NLRI of the
         Replicator-AR route for which this Leaf A-D route is generated.

      o  The AR-LEAF constructs an IP-address-specific route-target as
         indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
         placing the IP address carried in the Next-Hop field of the
         received Replicator-AR route in the Global Administrator field
         of the Community, with the Local Administrator field of this
         Community set to 0. Note that the same IP-address-specific
         import route-target is auto-configured by the AR-REPLICATOR
         that sent the Replicator-AR, in order to control the acceptance
         of the Leaf A-D routes.

      o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
         the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
         Identifier set to the IP of the advertising AR-LEAF. The PMSI
         Tunnel attribute MUST carry a downstream-assigned MPLS label or
         VNI that is used by the AR-REPLICATOR to send traffic to the
         AR-LEAF.
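
   The construction of the IP-address-specific route-target described
   above can be sketched as follows. This is a minimal Python
   illustration that assumes an IPv4 AR-IP and the 8-octet
   IPv4-address-specific extended community layout (type 0x01, Route
   Target sub-type 0x02); an IPv6 AR-IP would need a different community
   format and is not covered. The function names leaf_ad_route_target
   and replicator_accepts are hypothetical.

```python
import ipaddress
import struct

# ASSUMPTION: IPv4 AR-IP carried in a transitive IPv4-address-specific
# extended community with the Route Target sub-type.
EC_TYPE_IPV4_ADDR_SPECIFIC = 0x01
EC_SUBTYPE_ROUTE_TARGET = 0x02


def leaf_ad_route_target(replicator_next_hop: str) -> bytes:
    """Route-target an AR-LEAF attaches to its Leaf A-D route.

    Global Administrator = Next-Hop (AR-IP) of the received
    Replicator-AR route, Local Administrator = 0.
    """
    addr = ipaddress.IPv4Address(replicator_next_hop)
    return struct.pack(
        "!BB4sH",
        EC_TYPE_IPV4_ADDR_SPECIFIC,
        EC_SUBTYPE_ROUTE_TARGET,
        addr.packed,   # Global Administrator (4 octets)
        0,             # Local Administrator (2 octets)
    )


def replicator_accepts(route_targets, own_ar_ip: str) -> bool:
    """True if a Leaf A-D route carries this AR-REPLICATOR's
    auto-configured import route-target."""
    return leaf_ad_route_target(own_ar_ip) in route_targets
```

   In this sketch, an AR-REPLICATOR whose AR-IP is 192.0.2.1 would
   import a Leaf A-D route only if one of its route-targets equals
   leaf_ad_route_target("192.0.2.1").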

   Each AR-enabled node MUST understand and process the AR type field in
   the PTA (Flags field) of the routes, and MUST signal the
   corresponding type (1 or 2) according to its administrative choice.

   Each node attached to the BD may understand and process the BM/U
   flags. Note that these BM/U flags may be used to optimize the
   delivery of multi-destination traffic and their use SHOULD be an
   administrative choice, and independent of the AR role.

   Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
   definition as well as the new Tunnel Type (AR), i.e., they will
   ignore the information contained in the flags field for any RT-3 and
   will ignore the RT-3 routes with an unknown Tunnel Type (type AR in
   this case).

5.  Non-selective Assisted-Replication (AR) Solution Description

   Figure 1 illustrates an example NVO network where the non-selective
   AR function is enabled. Three different roles are defined for a
   given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE). The
   solution is called "non-selective" because the chosen AR-REPLICATOR
   for a given flow MUST replicate the BM traffic to 'all' the NVE/PEs
   in the BD except for the source NVE/PE.

                              (       )
                            (_   WAN   _)
                        +---(_         _)----+
                        |     (_     _)      |
                 PE1    |               PE2  |
                 +------+----+          +----+------+
            TS1--+  (BD-1)   |          |  (BD-1)   +--TS2
                 |REPLICATOR |          |REPLICATOR |
                 +--------+--+          +--+--------+
                          |                |
                       +--+----------------+--+
                       |                      |
                       |                      |
                  +----+ VXLAN/nvGRE/MPLSoGRE +----+
                  |    |      IP Fabric       |    |
                  |    |                      |    |
        NVE1      |    +-----------+----------+    |      NVE3
        Hypervisor|          TOR   | NVE2          |Hypervisor
        +---------+-+        +-----+-----+       +-+---------+
        |  (BD-1)   |        |  (BD-1)   |       |  (BD-1)   |
        |   LEAF    |        |   RNVE    |       |   LEAF    |
        +--+-----+--+        +--+-----+--+       +--+-----+--+
           |     |              |     |             |     |
          VM11  VM12           TS3   TS4           VM31  VM32

                     Figure 1: Optimized-IR scenario

   In AR BDs such as BD-1 in the example, BM (Broadcast and Multicast)
   traffic between two NVEs may follow a different path than unicast
   traffic. This solution recommends the replication of BM through the
   AR-REPLICATOR node, whereas unknown/known unicast will be delivered
   directly from the source node to the destination node without being
   replicated by any intermediate node. Unknown unicast SHALL follow
   the same path as known unicast traffic in order to avoid packet
   reordering for unicast applications and simplify the control and data
   plane procedures.

   Note that known unicast forwarding is not impacted by this solution.

5.1.  Non-selective AR-REPLICATOR procedures

   An AR-REPLICATOR is defined as an NVE/PE capable of replicating
   ingress BM (Broadcast and Multicast) traffic received on an overlay
   tunnel to other overlay tunnels and local Attachment Circuits (ACs).
   The AR-REPLICATOR signals its role in the control plane and
   understands where the other roles (AR-LEAF nodes, RNVEs and other
   AR-REPLICATORs) are located. A given AR-enabled BD service may have
   zero, one or more AR-REPLICATORs. In our example in Figure 1, PE1
   and PE2 are defined as AR-REPLICATORs. The following considerations
   apply to the AR-REPLICATOR role:

   a.  The AR-REPLICATOR role SHOULD be an administrative choice in any
       NVE/PE that is part of an AR-enabled BD. This administrative
       option to enable AR-REPLICATOR capabilities MAY be implemented as
       a system level option as opposed to as a per-BD option.

   b.  An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY
       advertise a Regular-IR route. The AR-REPLICATOR MUST NOT
       generate a Regular-IR route if it does not have local attachment
       circuits (AC). If the Regular-IR route is advertised, the AR
       Type field is set to zero.

   c.  The Replicator-AR and Regular-IR routes are generated according
       to Section 4. The AR-IP and IR-IP used by the AR-REPLICATOR are
       different routable IP addresses.

   d.  When a node defined as AR-REPLICATOR receives a BM packet on an
       overlay tunnel, it will do a tunnel destination IP lookup and
       apply the following procedures:

       o  If the destination IP is the AR-REPLICATOR IR-IP Address the
          node will process the packet normally as in [RFC7432].

       o  If the destination IP is the AR-REPLICATOR AR-IP Address the
          node MUST replicate the packet to local ACs and overlay
          tunnels (excluding the overlay tunnel to the source of the
          packet). When replicating to remote AR-REPLICATORs the tunnel
          destination IP will be an IR-IP. That will be an indication
          for the remote AR-REPLICATOR that it MUST NOT replicate to
          overlay tunnels. The tunnel source IP used by the
          AR-REPLICATOR MUST be its IR-IP when replicating to either
          AR-REPLICATOR or AR-LEAF nodes.

   An AR-REPLICATOR will follow a data path implementation compatible
   with the following rules:

   -  The AR-REPLICATORs will build a flooding list composed of ACs and
      overlay tunnels to remote nodes in the BD. Some of those overlay
      tunnels MAY be flagged as non-BM receivers based on the BM flag
      received from the remote nodes in the BD.

   -  When an AR-REPLICATOR receives a BM packet on an AC, it will
      forward the BM packet to its flooding list (including local ACs
      and remote NVE/PEs), skipping the non-BM overlay tunnels.

   -  When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
      it will check the destination IP of the underlay IP header and:

      o  If the destination IP matches its AR-IP, the AR-REPLICATOR will
         forward the BM packet to its flooding list (ACs and overlay
         tunnels) excluding the non-BM overlay tunnels. The
         AR-REPLICATOR will do source squelching to ensure the traffic
         is not sent back to the originating AR-LEAF.

      o  If the destination IP matches its IR-IP, the AR-REPLICATOR will
         skip all the overlay tunnels from the flooding list, i.e., it
         will only replicate to local ACs. This is the regular IR
         behavior described in [RFC7432].

   -  While the forwarding behavior in AR-REPLICATORs and AR-LEAF nodes
      is different for BM traffic, as far as Unknown unicast traffic
      forwarding is concerned, AR-LEAF nodes behave exactly in the same
      way as AR-REPLICATORs do.

   -  The AR-REPLICATOR/LEAF nodes will build an Unknown unicast
      flood-list composed of ACs and overlay tunnels to the IR-IP
      Addresses of the remote nodes in the BD. Some of those overlay
      tunnels MAY be flagged as non-U (Unknown unicast) receivers based
      on the U flag received from the remote nodes in the BD.

      o  When an AR-REPLICATOR/LEAF receives an unknown packet on an AC,
         it will forward the unknown packet to its flood-list, skipping
         the non-U overlay tunnels.

      o  When an AR-REPLICATOR/LEAF receives an unknown packet on an
         overlay tunnel, it will forward the unknown packet to its local
         ACs and never to an overlay tunnel. This is the regular IR
         behavior described in [RFC7432].

5.2.  Non-selective AR-LEAF procedures

   AR-LEAF is defined as an NVE/PE that - given its poor replication
   performance - sends all the BM traffic to an AR-REPLICATOR that can
   replicate the traffic further on its behalf. It MAY signal its
   AR-LEAF capability in the control plane and understands where the
   other roles are located (AR-REPLICATOR and RNVEs). A given service
   can have zero, one or more AR-LEAF nodes. Figure 1 shows NVE1 and
   NVE3 (both residing in hypervisors) acting as AR-LEAF. The following
   considerations apply to the AR-LEAF role:

   a.  The AR-LEAF role SHOULD be an administrative choice in any NVE/PE
       that is part of an AR-enabled BD. This administrative option to
       enable AR-LEAF capabilities MAY be implemented as a system level
       option as opposed to as a per-BD option.

   b.  In this non-selective AR solution, the AR-LEAF MUST advertise a
       single Regular-IR inclusive multicast route as in [RFC7432]. The
       AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that
       although this flag does not make any difference for the egress
       nodes when creating an EVPN destination to the AR-LEAF, it is
       RECOMMENDED to use this flag for an easy operation and
       troubleshooting of the BD.

   c.  In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
       use regular ingress replication. This will happen when a new
       update from the last former AR-REPLICATOR is received and
       contains a non-REPLICATOR AR type, or when the AR-LEAF detects
       that the last AR-REPLICATOR is down (via next-hop tracking in the
       IGP or any other detection mechanism). Ingress replication MUST
       use the forwarding information given by the remote Regular-IR
       Inclusive Multicast Routes as described in [RFC7432].

   d.  In a service where there is one or more AR-REPLICATORs (based on
       the received Replicator-AR routes for the BD), the AR-LEAF can
       locally select which AR-REPLICATOR it sends the BM traffic to:

       o  A single AR-REPLICATOR MAY be selected for all the BM packets
          received on the AR-LEAF attachment circuits (ACs) for a given
          BD. This selection is a local decision and it does not have
          to match other AR-LEAF's selection within the same BD.

       o  An AR-LEAF MAY select more than one AR-REPLICATOR and do
          either per-flow or per-BD load balancing.

       o  In case of a failure on the selected AR-REPLICATOR, another
          AR-REPLICATOR will be selected.

       o  When an AR-REPLICATOR is selected, the AR-LEAF MUST send all
          the BM packets to that AR-REPLICATOR using the forwarding
          information given by the Replicator-AR route for the chosen
          AR-REPLICATOR, with tunnel type = 0x0A (AR tunnel). The
          underlay destination IP address MUST be the AR-IP advertised
          by the AR-REPLICATOR in the Replicator-AR route.

       o  AR-LEAF nodes SHALL send service-level BM control plane
          packets following regular IR procedures. An example would be
          IGMP, MLD or PIM multicast packets. The AR-REPLICATORs MUST
          NOT replicate these control plane packets to other overlay
          tunnels since they will use the regular IR-IP Address.

   e.  The use of an AR-REPLICATOR-activation-timer (in seconds, default
       value is 3) on the AR-LEAF nodes is RECOMMENDED. Upon receiving
       a new Replicator-AR route where the AR-REPLICATOR is selected,
       the AR-LEAF will run a timer before programming the new
       AR-REPLICATOR. In case of a newly added AR-REPLICATOR, or in case
       the AR-REPLICATOR reboots, this timer will give the AR-REPLICATOR
       some time to program the AR-LEAF nodes before the AR-LEAF sends
       BM traffic.
       The AR-REPLICATOR-activation-timer SHOULD be configurable in
       seconds, and its value needs to account for the time it takes for
       the AR-LEAF Regular-IR inclusive multicast route to get to the
       AR-REPLICATOR and be programmed. While the
       AR-REPLICATOR-activation-timer is running, the AR-LEAF node will
       use regular ingress replication.

   An AR-LEAF will follow a data path implementation compatible with the
   following rules:

   -  The AR-LEAF nodes will build two flood-lists:

      1.  Flood-list #1 - composed of ACs and an AR-REPLICATOR-set of
          overlay tunnels. The AR-REPLICATOR-set is defined as one or
          more overlay tunnels to the AR-IP Addresses of the remote
          AR-REPLICATOR(s) in the BD. The selection of more than one
          AR-REPLICATOR is described in point d) above and it is a local
          AR-LEAF decision.

      2.  Flood-list #2 - composed of ACs and overlay tunnels to the
          remote IR-IP Addresses.

   -  When an AR-LEAF receives a BM packet on an AC, it will check the
      AR-REPLICATOR-set:

      o  If the AR-REPLICATOR-set is empty, the AR-LEAF will send the
         packet to flood-list #2.

      o  If the AR-REPLICATOR-set is NOT empty, the AR-LEAF will send
         the packet to flood-list #1, where only one of the overlay
         tunnels of the AR-REPLICATOR-set is used.

   -  When an AR-LEAF receives a BM packet on an overlay tunnel, it will
      forward the BM packet to its local ACs and never to an overlay
      tunnel. This is the regular IR behavior described in [RFC7432].

   -  AR-LEAF nodes process Unknown unicast traffic in the same way
      AR-REPLICATORs do, as described in Section 5.1.

5.3.  RNVE procedures

   RNVE (Regular Network Virtualization Edge node) is defined as an
   NVE/PE without AR-REPLICATOR or AR-LEAF capabilities that does IR as
   described in [RFC7432]. The RNVE does not signal any AR role and is
   unaware of the AR-REPLICATOR/LEAF roles in the BD.
The RNVE will 672 ignore the Flags in the Regular-IR routes and will ignore the 673 Replicator-AR routes (due to an unknown tunnel type in the PTA) and 674 the Leaf A-D routes (due to the IP-address-specific route-target). 676 This role provides EVPN with the backwards compatibility required in 677 optimized-IR BDs. Figure 1 shows NVE2 as RNVE. 679 6. Selective Assisted-Replication (AR) Solution Description 681 Figure 1 is also used to describe the selective AR solution, however 682 in this section we consider NVE2 as one more AR-LEAF for BD-1. The 683 solution is called "selective" because a given AR-REPLICATOR MUST 684 replicate the BM traffic to only the AR-LEAF that requested the 685 replication (as opposed to all the AR-LEAF nodes) and MAY replicate 686 the BM traffic to the RNVEs. The same AR roles defined in Section 4 687 are used here, however the procedures are different. 689 The following sub-sections describe the differences in the procedures 690 of AR-REPLICATOR/LEAFs compared to the non-selective AR solution. 691 There is no change on the RNVEs. 693 6.1. Selective AR-REPLICATOR procedures 695 In our example in Figure 1, PE1 and PE2 are defined as Selective AR- 696 REPLICATORs. The following considerations apply to the Selective AR- 697 REPLICATOR role: 699 a. The Selective AR-REPLICATOR capability SHOULD be an 700 administrative choice in any NVE/PE that is part of an AR-enabled 701 BD, as the AR role itself. This administrative option MAY be 702 implemented as a system level option as opposed to as a per-BD 703 option. 705 b. Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF 706 and RNVE nodes. In spite of the 'Selective' administrative 707 option, an AR-REPLICATOR MUST NOT behave as a Selective AR- 708 REPLICATOR if at least one of the AR-REPLICATORs has the L flag 709 NOT set. 
       If at least one AR-REPLICATOR sends a Replicator-AR route with
       L=0 (in the BD context), the rest of the AR-REPLICATORs will fall
       back to non-selective AR mode.

   c.  The Selective AR-REPLICATOR MUST follow the procedures described
       in Section 5.1, except for the following differences:

       o  The Replicator-AR route MUST include L=1 (Leaf Information
          Required). This flag is used by the AR-REPLICATORs to
          advertise their 'selective' AR-REPLICATOR capabilities. In
          addition, the AR-REPLICATOR auto-configures its
          IP-address-specific import route-target as described in
          Section 4.

       o  The AR-REPLICATOR will build a 'selective' AR-LEAF-set with
          the list of nodes that requested replication to its own AR-IP.
          For instance, assuming NVE1 and NVE2 advertise a Leaf A-D
          route with PE1's IP-address-specific route-target and NVE3
          advertises a Leaf A-D route with PE2's IP-address-specific
          route-target, PE1 MUST only add NVE1/NVE2 to its selective
          AR-LEAF-set for BD-1, and exclude NVE3.

       o  When a node defined and operating as Selective AR-REPLICATOR
          receives a packet on an overlay tunnel, it will do a tunnel
          destination IP lookup and, if the destination IP is the
          AR-REPLICATOR AR-IP Address, the node MUST replicate the
          packet to:

          +  local ACs

          +  overlay tunnels in the Selective AR-LEAF-set (excluding the
             overlay tunnel to the source AR-LEAF).

          +  overlay tunnels to the RNVEs if the tunnel source IP is the
             IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR
             MUST NOT replicate the BM traffic to remote RNVEs). In
             other words, only the first-hop selective AR-REPLICATOR
             will replicate to all the RNVEs.

          +  overlay tunnels to the remote Selective AR-REPLICATORs if
             the tunnel source IP is an IR-IP of its own AR-LEAF-set (in
             any other case, the AR-REPLICATOR MUST NOT replicate the BM
             traffic to remote AR-REPLICATORs), where the tunnel
             destination IP is the AR-IP of the remote Selective
             AR-REPLICATOR. The use of the AR-IP as the tunnel
             destination IP indicates to the remote Selective
             AR-REPLICATOR that the packet needs further replication to
             its AR-LEAFs.

   A Selective AR-REPLICATOR data path implementation will be compatible
   with the following rules:

   -  The Selective AR-REPLICATORs will build two flood-lists:

      1.  Flood-list #1 - composed of ACs and overlay tunnels to the
          remote nodes in the BD, always using the IR-IPs in the tunnel
          destination IP addresses. Some of those overlay tunnels MAY
          be flagged as non-BM receivers based on the BM flag received
          from the remote nodes in the BD.

      2.  Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a
          Selective AR-REPLICATOR-set, where:

          +  The Selective AR-LEAF-set is composed of the overlay
             tunnels to the AR-LEAFs that advertise a Leaf A-D route for
             the local AR-REPLICATOR. This set is updated with every
             Leaf A-D route received/withdrawn from a new AR-LEAF.

          +  The Selective AR-REPLICATOR-set is composed of the overlay
             tunnels to all the AR-REPLICATORs that send a Replicator-AR
             route with L=1. The AR-IP addresses are used as tunnel
             destination IP.

   -  When a Selective AR-REPLICATOR receives a BM packet on an AC, it
      will forward the BM packet to its flood-list #1, skipping the
      non-BM overlay tunnels.

   -  When a Selective AR-REPLICATOR receives a BM packet on an overlay
      tunnel, it will check the destination and source IPs of the
      underlay IP header and:

      o  If the destination IP matches its AR-IP and the source IP
         matches an IP of its own Selective AR-LEAF-set, the
         AR-REPLICATOR will forward the BM packet to its flood-list #2,
         as long as the list of AR-REPLICATORs for the BD matches the
         Selective AR-REPLICATOR-set. If the Selective
         AR-REPLICATOR-set does not match the list of AR-REPLICATORs,
         the node reverts to non-selective mode and flood-list #1 is
         used.

      o  If the destination IP matches its AR-IP and the source IP does
         not match any IP of its Selective AR-LEAF-set, the
         AR-REPLICATOR will forward the BM packet to flood-list #2 but
         skip the AR-REPLICATOR-set.

      o  If the destination IP matches its IR-IP, the AR-REPLICATOR will
         use flood-list #1 but MUST skip all the overlay tunnels from
         the flooding list, i.e., it will only replicate to local ACs.
         This is the regular-IR behavior described in [RFC7432].

   -  In any case, non-BM overlay tunnels are excluded from flood-lists
      and, also, source squelching is always done in order to ensure the
      traffic is not sent back to the originating source. If the
      encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not
      the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
      the labels when forwarding the packets to the egress overlay
      tunnels.

6.2.  Selective AR-LEAF procedures

   A Selective AR-LEAF chooses a single Selective AR-REPLICATOR per BD
   and:

   -  Sends all the BD BM traffic to that AR-REPLICATOR and

   -  Expects to receive the BM traffic for a given BD from the same
      AR-REPLICATOR.

   In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective
   AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that
   is so, NVE1 will send all its BM traffic for BD-1 to PE1.
   If other AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that
   traffic from PE1. These are the differences in the behavior of a
   Selective AR-LEAF compared to a non-selective AR-LEAF:

   a.  The Selective AR-LEAF capability SHOULD be an administrative
       choice in any NVE/PE that is part of an AR-enabled BD. This
       administrative option to enable AR-LEAF capabilities MAY be
       implemented as a system-level option rather than as a per-BD
       option.

   b.  The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs
       in the BD. The Selective AR-LEAF MUST advertise a Leaf A-D route
       after receiving a Replicator-AR route with L=1. It is
       RECOMMENDED that the Selective AR-LEAF wait for an
       AR-LEAF-join-wait-timer (in seconds, default value is 3) before
       sending the Leaf A-D route, so that the AR-LEAF can collect all
       the Replicator-AR routes for the BD before advertising the Leaf
       A-D route.

   c.  In a service where there is more than one Selective
       AR-REPLICATOR, the Selective AR-LEAF MUST locally select a single
       Selective AR-REPLICATOR for the BD. Once selected:

       o  The Selective AR-LEAF will send a Leaf A-D route including the
          Route-key and IP-address-specific route-target of the selected
          AR-REPLICATOR.

       o  The Selective AR-LEAF will send all the BM packets received on
          the attachment circuits (ACs) for a given BD to that
          AR-REPLICATOR.

       o  In case of a failure on the selected AR-REPLICATOR, another
          AR-REPLICATOR will be selected and a new Leaf A-D update will
          be issued for the new AR-REPLICATOR. This new route will
          update the selective list in the new Selective AR-REPLICATOR.
          In case of failure on the active Selective AR-REPLICATOR, it
          is RECOMMENDED for the Selective AR-LEAF to revert to IR
          behavior for the duration of an AR-REPLICATOR-activation-timer
          (in seconds, default value is 3) to speed up the convergence.
          When the timer expires, the Selective AR-LEAF will resume its
          AR mode with the new Selective AR-REPLICATOR. The
          AR-REPLICATOR-activation-timer MAY be the same configurable
          parameter as in Section 5.2.

   All the AR-LEAFs in a BD are expected to be configured as either
   selective or non-selective. Selective and non-selective AR-LEAFs
   SHOULD NOT coexist in the same BD. In case there is a non-selective
   AR-LEAF, its BM traffic sent to a selective AR-REPLICATOR will not be
   replicated to other AR-LEAFs that are not in its Selective
   AR-LEAF-set.

   A Selective AR-LEAF will follow a data path implementation compatible
   with the following rules:

   -  The Selective AR-LEAF nodes will build two flood-lists:

      1.  Flood-list #1 - composed of ACs and the overlay tunnel to the
          selected AR-REPLICATOR (using the AR-IP as the tunnel
          destination IP).

      2.  Flood-list #2 - composed of ACs and overlay tunnels to the
          remote IR-IP Addresses.

   -  When an AR-LEAF receives a BM packet on an AC, it will check if
      there is any selected AR-REPLICATOR. If there is, flood-list #1
      will be used. Otherwise, flood-list #2 will be used.

   -  When an AR-LEAF receives a BM packet on an overlay tunnel, it
      will forward the BM packet to its local ACs and never to an
      overlay tunnel. This is the regular IR behavior described in
      [RFC7432].

7.  Pruned-Flood-Lists (PFL)

   In addition to AR, the second optimization supported by this solution
   is the ability for all the BD nodes to signal Pruned-Flood-Lists
   (PFL). As described in Section 3, an EVPN node can signal a given
   value for the BM and U PFL flags in the IR Inclusive Multicast
   Routes, where:

   -  BM is the Broadcast and Multicast flag. BM=1 means "prune-me"
      from the BM flood-list. BM=0 means regular behavior.

   -  U is the Unknown flag. U=1 means "prune-me" from the Unknown
      flood-list. U=0 means regular behavior.
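
   As a non-normative illustration, the following Python sketch shows
   how a node that chooses to honor the PFL flags might derive its BM
   and Unknown flood-lists from the flags signaled by the remote nodes.
   The names RemoteNode, build_flood_lists and honor_pfl, as well as the
   node names in the usage note, are hypothetical and are not defined by
   this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteNode:
    """Hypothetical view of a remote BD node and its signaled PFL flags."""
    name: str
    bm: int  # BM flag: 1 means "prune-me" from the BM flood-list
    u: int   # U flag: 1 means "prune-me" from the Unknown flood-list


def build_flood_lists(remote_nodes, honor_pfl=True):
    """Return (bm_flood_list, unknown_flood_list) as lists of node names.

    honor_pfl models the local choice to honor the received flags.
    When it is False, every remote node stays in both flood-lists.
    """
    bm_list = [n.name for n in remote_nodes
               if not (honor_pfl and n.bm == 1)]
    u_list = [n.name for n in remote_nodes
              if not (honor_pfl and n.u == 1)]
    return bm_list, u_list
```

   For example, with remote nodes "pe2" (BM=0, U=0), "nve1" (BM=1, U=1)
   and "nve3" (BM=1, U=0), a node honoring the flags would keep only
   "pe2" in its BM flood-list, and "pe2" and "nve3" in its Unknown
   flood-list.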

   The ability to signal these PFL flags is an administrative choice.
   Upon receiving a non-zero PFL flag, a node MAY decide to honor the
   PFL flag and remove the sender from the corresponding flood-list. A
   given BD node receiving BUM traffic on an overlay tunnel MUST
   replicate the traffic normally, regardless of the signaled PFL flags.

   This optimization MAY be used along with the AR solution.

7.1.  A PFL example

   In order to illustrate the use of the solution described in this
   document, we will assume that BD-1 in Figure 1 is optimized-IR
   enabled and:

   -  PE1 and PE2 are administratively configured as AR-REPLICATORs, due
      to their high-performance replication capabilities. PE1 and PE2
      will send a Replicator-AR route with BM/U flags = 00.

   -  NVE1 and NVE3 are administratively configured as AR-LEAF nodes,
      due to their low-performance software-based replication
      capabilities. They will advertise a Regular-IR route with type
      AR-LEAF. Assuming both NVEs advertise all the attached VMs in
      EVPN as soon as they come up and don't have any VMs interested in
      multicast applications, they will be configured to signal BM/U
      flags = 11 for BD-1.

   -  NVE2 is optimized-IR unaware; therefore, it takes on the RNVE role
      in BD-1.

   Based on the above assumptions, the following forwarding behavior
   will take place:

   1.  Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1
       will further forward the BM packets to TS1, the WAN link, PE2 and
       NVE2, but not to NVE3. PE2 and NVE2 will replicate the BM
       packets to their local ACs, but NVE3 is spared from
       unnecessarily replicating those BM packets to VM31 and VM32.

   2.  Any BM packets received on PE2 from the WAN will be sent to PE1
       and NVE2, but not to NVE1 and NVE3, sparing the two hypervisors
       from replicating unnecessarily to their local VMs. PE1 and NVE2
       will replicate to their local ACs only.

   3.  Any Unknown unicast packet sent from VM31 will be forwarded by
       NVE3 to NVE2, PE1 and PE2 but not to NVE1. The solution avoids
       the unnecessary replication to NVE1, since the destination of the
       unknown traffic cannot be at NVE1.

   4.  Any Unknown unicast packet sent from TS1 will be forwarded by PE1
       to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the
       target of the unknown traffic cannot be at those NVEs.

8.  AR Procedures for single-IP AR-REPLICATORs

   The procedures explained in Sections 5 and 6 assume that the
   AR-REPLICATOR can use two local routable IP addresses to terminate
   and originate NVO tunnels, i.e., IR-IP and AR-IP addresses. This is
   usually the case for PE-based AR-REPLICATOR nodes.

   In some cases, the AR-REPLICATOR node does not support more than one
   IP address to terminate and originate NVO tunnels, i.e., the IR-IP
   and AR-IP are the same IP address. This may be the case in some
   software-based or low-end AR-REPLICATOR nodes. If this is the case,
   the procedures in Sections 5 and 6 MUST be modified in the following
   way:

   -  The Replicator-AR routes generated by the AR-REPLICATOR use an
      AR-IP that will match its IR-IP. In order to differentiate the
      data plane packets that need to use IR from the packets that must
      use AR forwarding mode, the Replicator-AR route MUST advertise a
      different VNI/VSID than the one used by the Regular-IR route. For
      instance, the AR-REPLICATOR will advertise AR-VNI along with the
      Replicator-AR route and IR-VNI along with the Regular-IR route.
      Since both routes have the same key, different RDs are needed in
      each route.

   -  An AR-REPLICATOR will perform IR or AR forwarding mode for the
      incoming Overlay packets based on an ingress VNI lookup, as
      opposed to the tunnel IP DA lookup.
      Note that, when replicating to remote AR-REPLICATOR nodes, the use
      of the IR-VNI or AR-VNI advertised by the egress node will
      determine the IR or AR forwarding mode at the subsequent
      AR-REPLICATOR.

   The rest of the procedures will follow what is described in
   Sections 5 and 6.

9.  AR Procedures and EVPN All-Active Multi-homing Split-Horizon

   This section extends the procedures for the cases where AR-LEAF nodes
   or AR-REPLICATOR nodes are attached to the same Ethernet Segment in
   the BD. The case where one (or more) AR-LEAF node(s) and one (or
   more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment
   is out of scope.

9.1.  Ethernet Segments on AR-LEAF nodes

   If VXLAN or NVGRE is used, and if the Split-horizon is based on the
   tunnel IP SA and "Local-Bias" as described in [RFC8365], the
   Split-horizon check will not work if there is an Ethernet-Segment
   shared between two AR-LEAF nodes, and the AR-REPLICATOR replaces the
   tunnel IP SA of the packets with its own AR-IP.

   In order to be compatible with the IP SA split-horizon check, the
   AR-REPLICATOR MAY keep the original received tunnel IP SA when
   replicating packets to a remote AR-LEAF or RNVE. This will allow
   AR-LEAF nodes to apply Split-horizon check procedures for BM packets,
   before sending them to the local Ethernet-Segment. Even if the
   AR-LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs,
   the AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating
   to other AR-REPLICATORs.

   When EVPN is used for MPLS over GRE (or UDP), the ESI-label based
   split-horizon procedure as in [RFC7432] will not work for multi-homed
   Ethernet-Segments defined on AR-LEAF nodes. "Local-Bias" is
   recommended in this case, as in the case of VXLAN or NVGRE explained
   above.
   The "Local-Bias" and tunnel IP SA preservation mechanisms provide the
   required split-horizon behavior in non-selective or selective AR.

   Note that if the AR-REPLICATOR implementation keeps the received
   tunnel IP SA, the use of uRPF (unicast Reverse Path Forwarding)
   checks in the IP fabric based on the tunnel IP SA MUST be disabled.

9.2.  Ethernet Segments on AR-REPLICATOR nodes

   Ethernet Segments associated with one or more AR-REPLICATOR nodes
   SHOULD follow "Local-Bias" procedures for EVPN all-active
   multi-homing, as follows:

   -  For BUM traffic received on an AR-REPLICATOR's local AC,
      "Local-Bias" procedures as in [RFC8365] SHOULD be followed.

   -  For BUM traffic received on an AR-REPLICATOR overlay tunnel with
      AR-IP as the IP DA, "Local-Bias" SHOULD also be followed. That
      is, traffic received with AR-IP as IP DA will be treated as though
      it had been received on a local AC that is part of the ES and will
      be forwarded to all local ESes, irrespective of their DF or NDF
      state.

   -  BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP
      as the IP DA will follow regular [RFC8365] "Local-Bias" rules and
      will not be forwarded to local ESes that are shared with the
      AR-LEAF or AR-REPLICATOR originating the traffic.

10.  Security Considerations

   The Security Considerations in [RFC7432] and [RFC8365] apply to this
   document.

   In addition, the procedures introduced by this document may bring
   some new risks for the successful delivery of BM traffic. Unicast
   traffic is not affected by this document. The forwarding of
   Broadcast and Multicast (BM) traffic is modified though, and BM
   traffic from the AR-LEAF nodes will be attracted by the existence of
   AR-REPLICATORs in the BD.
   An AR-LEAF will forward BM traffic to its selected AR-REPLICATOR;
   therefore, an attack on the AR-REPLICATOR could impact the delivery
   of the BM traffic using that node.

   An implementation following the procedures in this document should
   not create BM loops, since the AR-REPLICATOR will always forward the
   BM traffic using the correct tunnel IP Destination Address that
   indicates to the remote nodes how to forward the traffic. This is
   true in both the Non-Selective and Selective modes defined in this
   document.

   The Selective mode provides a multi-staged replication solution,
   where a proper configuration of all the AR-REPLICATORs will avoid any
   issues. A mix of mistakenly configured Selective and Non-Selective
   AR-REPLICATORs in the same BD could theoretically create packet
   duplication in some AR-LEAFs; however, this document provides a
   fallback to Non-Selective mode in case the AR-REPLICATORs advertise
   an inconsistent AR Replication mode.

   Finally, the use of PFL as in Section 7 should be handled with care.
   An intentional or unintentional misconfiguration of the BDs on a
   given leaf node may result in the leaf not receiving the required BM
   or Unknown unicast traffic.

11.  IANA Considerations

   IANA has allocated the following Border Gateway Protocol (BGP)
   Parameters:

   -  Allocation in the P-Multicast Service Interface Tunnel (PMSI
      Tunnel) Tunnel Types registry:

         Value   Meaning                        Reference
         0x0A    Assisted-Replication Tunnel    [This document]

   -  Allocations in the P-Multicast Service Interface (PMSI) Tunnel
      Attribute Flags registry:

         Value   Name                             Reference
         3-4     Assisted-Replication Type (T)    [This document]
         5       Broadcast and Multicast (BM)     [This document]
         6       Unknown (U)                      [This document]

12.  Contributors

   In addition to the names on the front page, the following co-authors
   also contributed to this document:

   Wim Henderickx
   Nokia

   Kiran Nagaraj
   Nokia

   Ravi Shekhar
   Juniper Networks

   Nischal Sheth
   Juniper Networks

   Aldrin Isaac
   Juniper

   Mudassir Tufail
   Citibank

13.  Acknowledgments

   The authors would like to thank Neil Hart, David Motz, Dai Truong,
   Thomas Morin, Jeffrey Zhang, Shankar Murthy and Krzysztof Szarkowicz
   for their valuable feedback and contributions.

14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [I-D.ietf-bess-evpn-bum-procedure-updates]
              Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A.
              Sajassi, "Updates on EVPN BUM Procedures", draft-ietf-
              bess-evpn-bum-procedure-updates-10 (work in progress),
              September 2021.

14.2.  Informative References

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018.

Authors' Addresses

   J. Rabadan (editor)
   Nokia
   777 Middlefield Road
   Mountain View, CA 94043
   USA

   Email: jorge.rabadan@nokia.com

   S. Sathappan
   Nokia

   Email: senthil.sathappan@nokia.com

   W. Lin
   Juniper Networks

   Email: wlin@juniper.net

   M. Katiyar
   Versa Networks

   Email: mukul@versa-networks.com

   A. Sajassi
   Cisco Systems

   Email: sajassi@cisco.com