idnits 2.17.1 draft-malhotra-bess-evpn-l3dl-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 293 has weird spacing: '...CE-Host to PE...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The exact meaning of the all-uppercase expression 'MAY NOT' is not defined in RFC 2119. If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. == The expression 'MAY NOT', while looking like RFC 2119 requirements text, is not defined in RFC 2119, and should not be used. Consider using 'MUST NOT' instead (if that is what you mean). Found 'MAY NOT' in this paragraph: o The EVPN flag 'E' MUST NOT be set in type 8/9 PDU from a CE. 
o A MAC entry for the MAC received in a type 8/9 PDU MUST be installed in the MAC-VRF table pointing to the AC to which the session is bound. o If an IPv4/IPv6 address is set in the PDU, an IPv4/IPv6 neighbor binding MUST be established for the IPv4/IPv6 address in the PDU to the MAC address in the PDU. In other words, a next-hop re-write for these IPv4/IPv6 neighbor entries MUST be installed using the MAC address in the PDU, and if required by forwarding logic, bound to the AC associated with the L3DL session. o Note that an IPv4/IPv6 address MAY NOT be set in a type 8/9 PDU received from a CE, in which case this PDU is only used for MAC learning. This MAY be the case in a non-IRB EVPN network, wherein, an EVPN PE is not a first-hop router for the attached CEs. -- The document date (July 2, 2019) is 1757 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 826' is mentioned on line 229, but not defined == Missing Reference: 'RFC 4861' is mentioned on line 230, but not defined == Missing Reference: 'RFC 7432' is mentioned on line 1016, but not defined == Missing Reference: 'EVPN IRB' is mentioned on line 314, but not defined == Missing Reference: 'RFC4861' is mentioned on line 914, but not defined == Unused Reference: 'RFC7432' is defined on line 1194, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'L3DL' -- Possible downref: Non-RFC (?) normative reference: ref. 'EVPN-IRB' -- Possible downref: Non-RFC (?) normative reference: ref. 'EVPN-PREFIX-ADV' -- Possible downref: Non-RFC (?) normative reference: ref. 'EVPN-IRB-MOBILITY' -- Possible downref: Non-RFC (?) normative reference: ref. 
'EVPN-IP-ALIASING' Summary: 2 errors (**), 0 flaws (~~), 10 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 INTERNET-DRAFT N. Malhotra, Ed. 3 K. Patel 4 Arrcus 5 Intended Status: Proposed Standard 6 J. Rabadan 7 Nokia 9 Expires: Jan 3, 2020 July 2, 2019 11 L3DL based PE-CE Control Plane for EVPN 12 draft-malhotra-bess-evpn-l3dl-00 14 Abstract 16 In an EVPN network, EVPN PEs provide VPN bridging and routing service 17 to connected CE devices based on BGP EVPN control plane. At present, 18 there is no PE-CE control plane defined for an EVPN PE to learn CE 19 MAC, IP, and any other routes from a CE that may be distributed in 20 EVPN control plane to enable unicast flows between CE devices. As a 21 result, EVPN PEs rely on data plane based gleaning of source MACs for 22 CE MAC learning, ARP/ND snooping for CE IPv4/IPv6 learning, and in 23 some cases, local configuration for learning prefix routes behind a 24 CE. A PE-CE control plane alternative to this traditional learning 25 approach, where applicable, offers certain distinct advantages that 26 in turn result in simplified EVPN operation. 28 This document defines a PE-CE control plane as an optional 29 alternative to traditional non-control-plane based PE-CE learning in 30 an EVPN network. It defines PE-CE control plane procedures and TLVs 31 based on L3DL as the base protocol, enumerates advantages that may be 32 achieved by using this PE-CE control plane, and discusses in detail 33 EVPN use cases that are simplified as a result. 35 Status of this Memo 37 This Internet-Draft is submitted to IETF in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF), its areas, and its working groups. 
Note that 42 other groups may also distribute working documents as 43 Internet-Drafts. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time. It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress". 50 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 52 The list of current Internet-Drafts can be accessed at 53 http://www.ietf.org/1id-abstracts.html 55 The list of Internet-Draft Shadow Directories can be accessed at 56 http://www.ietf.org/shadow.html 58 Copyright and License Notice 60 Copyright (c) 2017 IETF Trust and the persons identified as the 61 document authors. All rights reserved. 63 This document is subject to BCP 78 and the IETF Trust's Legal 64 Provisions Relating to IETF Documents 65 (http://trustee.ietf.org/license-info) in effect on the date of 66 publication of this document. Please review these documents 67 carefully, as they describe your rights and restrictions with respect 68 to this document. Code Components extracted from this document must 69 include Simplified BSD License text as described in Section 4.e of 70 the Trust Legal Provisions and are provided without warranty as 71 described in the Simplified BSD License. 73 Table of Contents 75 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 76 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 5 77 2. PE <-> CE Control Plane Overview . . . . . . . . . . . . . . . 7 78 3. TLVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 79 3.1 Overlay IPv4 Encapsulation PDU . . . . . . . . . . . . . . . 9 80 3.2 Overlay IPv6 Encapsulation PDU . . . . . . . . . . . . . . . 10 81 3.3 Overlay IPv4 Prefix Encapsulation PDU . . . . . . . . . . . 12 82 3.4 Overlay IPv6 Prefix Encapsulation PDU . . . . . . . . . . . 13 83 4. CE MAC/IP Learning on a PE AC . . . . . . . . . . . . . . . . . 
14 84 4.1 PE <-> CE L3DL Session Establishment . . . . . . . . . . . . 14 85 4.2 CE MAC/IP Learning . . . . . . . . . . . . . . . . . . . . . 14 86 5. PE Any-cast GW MAC/IP Learning on CE . . . . . . . . . . . . . 14 87 6. Remote CE MAC/IP Learning on CE . . . . . . . . . . . . . . . . 15 88 7. PE <-> CE Control Plane with EVPN All-active Multi-Homing . . . 16 89 7.1 All-active Multi-Homing Mode . . . . . . . . . . . . . . . . 16 90 7.2 Source MAC . . . . . . . . . . . . . . . . . . . . . . . . . 17 91 7.3 CE MAC/IP Learning with EVPN All-active Multi-Homing . . . . 17 92 7.4 LAG Member Link Failure . . . . . . . . . . . . . . . . . . 18 93 7.4.1 Session Re-establishment . . . . . . . . . . . . . . . . 18 94 7.4.2 TLV Retention . . . . . . . . . . . . . . . . . . . . . 18 95 7.4 LAG Failure . . . . . . . . . . . . . . . . . . . . . . . . 18 96 7.5 Example PE <-> CE Control Plane Flow with All-active 98 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 100 Multi-Homing . . . . . . . . . . . . . . . . . . . . . . . . 19 101 8. Software Neighbor Tables . . . . . . . . . . . . . . . . . . . 21 102 9. MAC/IP Learning Conflict Resolution . . . . . . . . . . . . . . 21 103 10. PE-CE Overlay Prefix Learning . . . . . . . . . . . . . . . . 22 104 11. Asymmetric EVPN-IRB . . . . . . . . . . . . . . . . . . . . . 22 105 12. Centralized Gateway EVPN-IRB . . . . . . . . . . . . . . . . . 22 106 13. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 22 107 13.1 Simplified EVPN Operations . . . . . . . . . . . . . . . . 22 108 13.1.1 EVPN All-active Multi-Homing . . . . . . . . . . . . . 23 109 13.1.2 Convergence on CE Host Moves . . . . . . . . . . . . . 24 110 13.1.2.1 Silent Hosts . . . . . . . . . . . . . . . . . . . 24 111 13.1.2.2 Probing . . . . . . . . . . . . . . . . . . . . . 25 112 13.1.3 ARP Gleaning Latency . . . . . . . . . . . . . . . . . 26 113 13.2 Applicability to non-EVPN Use Cases . . . . . . . . . . . . 26 114 14. Summary . . . . . . . . . . . 
. . . . . . . . . . . . . . . . 26 115 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 116 15.1 Normative References . . . . . . . . . . . . . . . . . . . 28 117 15.2 Informative References . . . . . . . . . . . . . . . . . . 28 118 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 29 119 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 120 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 29 122 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 124 1 Introduction 126 In an EVPN network, CE devices typically connect to an EVPN PE via 127 layer-2 interfaces that terminate in a BD on the PE. Multi-homed LAG 128 interfaces together with EVPN all-active multi-homing procedures are 129 used to achieve PE-CE link and PE node redundancy for fault-tolerance 130 and load-balancing. PEs provide overlay bridging and, optionally, 131 first-hop routing service for these CE devices based on an EVPN 132 control plane that is used to distribute CE MAC, IP, and prefix 133 reachability across PEs. 135 At present, there is no PE-CE control plane defined for an EVPN PE to 136 learn connected CE host MACs and IPs. As a result, EVPN PEs rely on: 138 o data plane based gleaning of source MAC for MAC learning, 139 o ARP snooping for IPv4 + MAC learning, and 140 o ND snooping for IPv6 + MAC learning. 142 A PE-CE control plane alternative to this traditional learning 143 approach, where applicable, can offer some distinct advantages across 144 various boot-up, mobility, and convergence scenarios: 146 o PE-CE learning is decoupled from non-deterministic hashing of 147 data, ARP, and ND packets from CEs over all-active multi-homed 148 LAG interfaces. 149 o PE-CE learning is decoupled from non-deterministic periodicity 150 of data traffic from CEs or, in an extreme scenario, from CE 151 device being silent for an extended period. 
152 o PE-CE learning is decoupled from non-deterministic CE behavior 153 with respect to unsolicited ARPs and NAs following boot-up and 154 moves. 155 o PE-CE learning is decoupled from latencies associated with data 156 packet triggered ARP and ND gleaning. 158 This in turn simplifies certain EVPN operations 159 such as aliasing, MAC and IP syncing across multi-homing PEs, and 160 probing on MAC/IP moves. In addition, it helps achieve 161 deterministic convergence behavior across various boot-up, mobility, 162 and failure scenarios. 164 A PE may also use local policy configuration for learning prefixes 165 behind a CE that does not run a dynamic routing protocol. A PE-CE 166 control plane can provide an operationally simpler alternative to 167 local configuration for such use cases, where CE and PE devices are 168 not under the same configuration management entity. 170 This document defines a new PE-CE control plane as an alternative to 171 traditional data-plane and ARP/ND snooping based PE-CE host learning 175 and to local configuration-based PE-CE prefix learning. It defines 176 PE-CE control plane procedures and TLVs based on [L3DL] as the base 177 protocol, enumerates advantages that may be achieved by using this 178 PE-CE control plane, and discusses in detail EVPN operations that are 179 simplified as a result. Use of the PE-CE control plane defined in this 180 document is intended to be optional and backwards compatible with CEs 181 that use traditional PE-CE learning within the same BD. 183 1.1 Terminology 185 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 186 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 187 "OPTIONAL" in this document are to be interpreted as described in 188 BCP14 [RFC2119] [RFC8174] when, and only when, they appear in all 189 capitals, as shown here. 
191 The following terms are used in this document: 193 o L3DL: Layer 3 Discovery and Liveness Protocol defined in [L3DL]. 194 o EVPN-IRB: A BGP-EVPN distributed control plane based integrated 195 routing and bridging fabric overlay discussed in [EVPN-IRB]. 196 o Underlay: IP or MPLS fabric core network that provides IP or 197 MPLS routed reachability between EVPN PEs. 198 o Overlay: VPN or service layer network consisting of EVPN PEs 199 or VPN provider-edge (PE) switch-router devices that run on top 200 of an underlay routed core. 201 o EVPN PE: A PE switch-router in a data-center fabric that 202 runs the overlay BGP-EVPN control plane and connects to overlay CE 203 host devices. An EVPN PE may also be the first-hop layer-3 204 gateway for CE/host devices. This document refers to an EVPN PE as a 205 logical function in a data-center fabric. This EVPN PE function 206 may be physically hosted on a top-of-rack switching device (ToR) 207 or at layer(s) above the ToR in the Clos fabric. An EVPN PE is 208 typically also an IP or MPLS tunnel end-point for overlay VPN 209 flows. 210 o CE: A tenant host device that has layer 2 connectivity to an 211 EVPN PE switch-router, either directly or via intermediate 212 switching device(s). 213 o Symmetric EVPN-IRB: An overlay fabric first-hop routing 214 architecture as defined in [EVPN-IRB], wherein overlay host-to-host 215 routed inter-subnet flows are routed at both ingress and 216 egress EVPN PEs. 217 o Asymmetric EVPN-IRB: An overlay fabric first-hop routing 218 architecture as defined in [EVPN-IRB], wherein overlay host-to-host 219 routed inter-subnet flows are routed and bridged at the ingress 220 PE and bridged at the egress PE. 
221 o Centralized EVPN-IRB: An overlay fabric first-hop routing 222 architecture, wherein overlay host-to-host routed inter-subnet 226 flows are routed at a centralized gateway, typically at one 227 of the spine layers, and where EVPN PEs are pure bridging 228 devices. 229 o ARP: Address Resolution Protocol [RFC826]. 230 o ND: IPv6 Neighbor Discovery Protocol [RFC4861]. 231 o Ethernet-Segment: physical Ethernet or LAG port that connects an 232 access device to an EVPN PE, as defined in [RFC7432]. 233 o ESI: Ethernet Segment Identifier as defined in [RFC7432]. 234 o LAG: Layer-2 link-aggregation, also known as layer-2 bundle, 235 port-channel, or bond interface. 236 o EVPN all-active multi-homing: PE-CE all-active multi-homing 237 achieved via a multi-homed layer-2 LAG interface on a CE with 238 member links to multiple PEs and related EVPN procedures on the 239 PEs. 240 o EVPN Aliasing: multi-homing procedure as defined in [RFC7432]. 241 o BD: Broadcast Domain. 242 o Bridge Table: An instantiation of a broadcast domain on a 243 MAC-VRF. 244 o AC: A PE Attachment Circuit. This may be an access (untagged) or 245 trunk (tagged) layer-2 interface that is a member of a local VLAN 246 or a BD. 250 2. PE <-> CE Control Plane Overview 252 The Layer 3 Discovery and Liveness (L3DL) protocol is defined in [L3DL] 253 as a protocol over Ethernet links to auto-discover a connected 254 neighbor's layer-2 and layer-3 attributes and encapsulations for the 255 purpose of bringing up upper layer routing protocols. This document 256 leverages L3DL as a PE-CE protocol in an EVPN network fabric on 257 access links between an EVPN PE and a CE. Specifically, 259 o PE-CE control plane based on L3DL protocol is proposed for CE 260 MAC learning as an alternative to data-plane based source MAC 261 learning. 
262 o PE-CE control plane based on L3DL protocol is proposed for CE 263 MAC-IP adjacency learning as an alternative to MAC-IP learning 264 based on ARP/ND snooping. 265 o PE-CE control plane based on L3DL is proposed for learning of 266 IP Prefixes and associated overlay indexes, as an alternative to 267 local configuration on the PE for use case defined in section 4.1 268 of [EVPN-PREFIX-ADV]. 270 Note that any specification related to base L3DL protocol itself is 271 considered out of scope for this document and will continue to be 272 covered in the base protocol spec. This document will instead focus 273 on procedures and TLV extensions needed to achieve the above learning 274 on PE-CE links in an EVPN network. Any text that relates to the base 275 protocol included in this document is simply background information 276 in the context of use cases covered in this document. The reader 277 should refer to the base L3DL protocol document for the exact L3DL 278 protocol specification. 280 +------------------------+ 281 | Underlay Network Fabric| 282 +------------------------+ 284 BGP-EVPN Peering 285 <------------------------------> 287 +------+ +------+ +------+ 288 | PE1 | ..... | PE2 | | PE3 | 289 +------+ +------+ +------+ 290 | \ / 291 L3DL Session \ ESI / 292 | L3DL \ / L3DL 293 CE-host to PE2 CE-Host to PE3 295 Figure 1 297 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 299 An L3DL session is established on layer-2 logical interfaces between 300 the EVPN PE and each connected CE host device. A session end-point on 301 a local logical interface is identified by peer Logical Link Endpoint 302 Identifier (LLEI) as defined in [L3DL]. L3DL HELLO messages are used 303 for end-point discovery and OPEN messages are exchanged between two 304 end-points to establish an L3DL peering. Once L3DL peering is 305 established, encapsulation TLVs are exchanged for learning. 
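The ordering described above (HELLO for discovery, OPEN exchange to establish the peering, encapsulation exchange only afterwards) can be pictured as a small state machine. The following is a minimal sketch only: the state and event names are hypothetical and chosen for illustration, and the authoritative state machine is the one in the base [L3DL] specification.

```python
from enum import Enum, auto


class SessionState(Enum):
    """Hypothetical session states; not taken from the L3DL specification."""
    DISCOVERING = auto()  # HELLOs are used to discover the peer end-point
    OPENING = auto()      # peer discovered, OPEN exchange in progress
    ESTABLISHED = auto()  # peering is up; encapsulation PDUs may be exchanged


# Allowed transitions, keyed by (current state, event name).
# Event names are illustrative assumptions.
TRANSITIONS = {
    (SessionState.DISCOVERING, "hello_received"): SessionState.OPENING,
    (SessionState.OPENING, "open_exchanged"): SessionState.ESTABLISHED,
}


def next_state(state, event):
    """Return the next state, or the unchanged state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def may_send_encapsulation(state):
    """Encapsulation PDUs are only exchanged once the peering is established."""
    return state is SessionState.ESTABLISHED
```

This simplification ignores retries, keep-alives, and error handling, all of which belong to the base protocol.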
307 In the context of an EVPN network, CE Attachment Circuits (AC logical 308 interfaces) typically terminate in a BD on the PE, with multi-homed 309 LAG interfaces used for EVPN all-active multi-homing. CE hosts may be 310 directly connected to EVPN PEs via access ports, or may be connected 311 on trunk-ports via another switch. In a common EVPN-IRB design, EVPN 312 PEs also function as distributed first-hop gateways for hosts in a 313 BD. While symmetric and asymmetric IRB designs are possible as 314 discussed in [EVPN-IRB], the procedures described in subsequent sections 315 assume symmetric IRB with distributed any-cast gateways on EVPN PEs. 316 Any deviations from these procedures for an asymmetric IRB design or a 317 centralized IRB design will be covered in future updates to this 318 document. 320 The next few sections focus on the additional L3DL TLVs and 321 procedures needed for PE-CE learning on EVPN PE ACs with and without 322 all-active multi-homing. 326 3. TLVs 328 This section defines new TLVs that are used by the PE-CE control plane 329 defined in this document. 331 3.1 Overlay IPv4 Encapsulation PDU 333 A new encapsulation PDU type is defined for the purpose of carrying 334 overlay IPv4 and MAC bindings. Alternatively, it may also be used to 335 carry an overlay MAC with a NULL IPv4 address in a non-IRB use case. 
337 0 1 2 3 338 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 339 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 340 | Type = 11 | PDU Length | 341 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 342 | | Count | 343 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 344 | Serial Number | 345 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 346 | IPv4 Address | 347 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 348 | PrefixLen |E| Rsvd | | 349 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 350 | MAC Address | 351 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 352 | more... | 353 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 354 | | Sig Type | Signature Length | 355 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 356 | Signature | 357 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 359 Figure 2 361 o A new L3DL PDU type (11) is requested for this PDU. 362 o The IPv4 Address is that of an overlay. 363 o MAC address carries the MAC binding for the particular IPv4 364 address if one is set in the PDU. If an IPv4 address is not set, 365 it simply signals an overlay MAC address. 366 o EVPN flag 'E' indicates if this encapsulation is being sent on 367 behalf of a remote host learnt via EVPN. Use of this flag is 368 covered in a later section. 370 This PDU is used to carry PE's any-cast gateway IPv4 address and MAC 371 bindings to a CE host device. Optionally, it may also be used to 372 relay a remote CE's IPv4 address and MAC bindings to a local CE host 374 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 376 within a subnet, as well as to send local CE IPv4 address and MAC 377 binding to the PE. Procedures related to use of this PDU are 378 discussed in subsequent sections. 380 The encapsulation list in this PDU MUST follow full replace semantics 381 as in the L3DL protocol specification. 
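As an illustration of the full replace semantics required above, the sketch below treats each received encapsulation list as the complete set of bindings for the session, so that any binding absent from the new list is withdrawn. This is one reading of "full replace" and the base [L3DL] specification governs the exact behavior. The Encap type, its field names, and apply_full_replace are hypothetical names introduced here; the addresses are documentation examples.

```python
from typing import NamedTuple, Optional


class Encap(NamedTuple):
    """One overlay binding from an encapsulation list (illustrative fields only)."""
    mac: str
    ipv4: Optional[str]      # None models a MAC-only entry (NULL IPv4 address)
    evpn_flag: bool = False  # models the 'E' flag


def apply_full_replace(current, received):
    """Replace the learned set with the received list.

    Returns (new_state, added, withdrawn) so the caller can install the
    added bindings and remove the withdrawn ones.
    """
    new_state = set(received)
    added = new_state - current
    withdrawn = current - new_state
    return new_state, added, withdrawn


# Usage: the second PDU omits the first binding, which is therefore withdrawn.
host_a = Encap("00:00:5e:00:53:01", "192.0.2.10")
host_b = Encap("00:00:5e:00:53:02", None)
state, added, withdrawn = apply_full_replace(set(), [host_a, host_b])
state, added, withdrawn = apply_full_replace(state, [host_b])
```

Computing the difference explicitly lets an implementation touch only the entries that changed instead of flushing and re-installing the whole table.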
383 3.2 Overlay IPv6 Encapsulation PDU 385 A new encapsulation PDU type is defined for the purpose of carrying 386 overlay IPv6 and MAC bindings: 388 0 1 2 3 389 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 390 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 391 | Type = 12 | PDU Length | 392 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 393 | | Count | 394 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 395 | Serial Number | 396 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 397 | | 398 + + 399 | | 400 + IPv6 Address + 401 | | 402 + + 403 | | 404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 405 | PrefixLen |E|R|O| Rsvd | | 406 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 407 | MAC Address | 408 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 409 | more... | 410 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 411 | | Sig Type | Signature Length | 412 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 413 | Signature | 414 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 416 Figure 3 418 o A new L3DL PDU type (12) is requested for this PDU. 419 o The IPv6 Address is that of an overlay. 420 o MAC address carries the MAC binding for IPv6 address in the PDU. 421 o An EVPN flag 'E' indicates if this encapsulation is being sent 422 on behalf of a remote host learnt via EVPN. Usage of this flag is 424 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 426 covered in a later section. 427 o A Router flag 'R' is used to carry "Router Flag" or "R-bit" as 428 defined in [RFC4861]. Usage of this flag for the purpose of 429 installing ND cache entries based on learning via this TLV is as 430 defined in [RFC4861] 431 o An Override flag 'O' is used to carry "Override Flag" or "O-bit" 432 as defined in [RFC4861]. 
Usage of this flag for the purpose of 433 installing ND cache entries based on learning via this TLV is as 434 defined in [RFC4861] 436 This PDU is used to carry PE's any-cast gateway IPv6 address and MAC 437 bindings to a CE host device. Optionally, it may also be used to 438 relay a remote CE's IPv6 address and MAC bindings to a local CE 439 within a subnet, as well as to send local CE IPv6 address and MAC 440 bindings to the PE. Procedures related to usage of this PDU are 441 discussed in subsequent sections. 443 The encapsulation list contained in this PDU MUST follow full replace 444 semantics as in the L3DL protocol specification. 446 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 448 3.3 Overlay IPv4 Prefix Encapsulation PDU 450 A new encapsulation PDU type is defined for the purpose of carrying 451 overlay IPv4 prefix routes for prefixes behind a CE that does not run 452 a dynamic routing protocol for use-case as defined in section 4.1 of 453 [EVPN-PREFIX-ADV]: 455 0 1 2 3 456 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 457 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 458 | Type = 13 | PDU Length | 459 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 460 | | Count | 461 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 462 | Serial Number | 463 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 464 | Prefix Count | IPv4 Prefix | 465 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 466 | | PrefixLen | | 467 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 468 | IPv4 Prefix | PrefixLen | 469 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 470 | more... | 471 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 472 | GW IP | 473 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 474 | more... 
| 475 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 476 | | Sig Type | Signature Length | 477 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 478 | Signature | 479 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 481 Figure 4 483 A CE device as defined in [EVPN-PREFIX-ADV], with prefixes behind it 484 MAY use the above PDU to send these prefixes to an EVPN PE with 485 itself as the GW. An EVPN PE MAY then advertise prefixes received via 486 this PDU as RT-5, with TS as the GW, as defined in [EVPN-PREFIX-ADV]. 488 o A new L3DL PDU type (13) is requested for this PDU. 489 o IPv4 Prefix is set to a prefix behind a CE. 490 o PrefixLen is set to IPv4 prefix length for the advertised prefix. 491 o GW-IP is set to the CE IPv4 address (advertised via Type 8 PDU). 493 Multiple prefixes may be set for a single GW IP. The encapsulation 494 list contained in this PDU MUST follow full replace semantics as in 495 the L3DL protocol specification. 499 3.4 Overlay IPv6 Prefix Encapsulation PDU 501 A new encapsulation PDU type is defined for the purpose of carrying 502 overlay IPv6 prefix routes for prefixes behind a CE that does not run 503 a dynamic routing protocol, for the use case defined in Section 4.1 of 504 [EVPN-PREFIX-ADV]: 506 0 1 2 3 507 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 508 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 509 | Type = 14 | PDU Length | 510 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 511 | | Count | 512 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 513 | Serial Number | 514 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 515 | Prefix Count | | 516 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 517 | | 518 + + 519 | | 520 + + 521 | IPv6 Prefix | 522 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 523 | | PrefixLen | more... 
| 524 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 525 | | 526 + + 527 | | 528 + GW IP + 529 | | 530 + + 531 | | 532 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 533 | more... | 534 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 535 | | Sig Type | Signature Length | 536 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 537 | Signature | 538 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 540 Figure 5 542 A CE device as defined in [EVPN-PREFIX-ADV], with prefixes behind it 543 MAY use the above PDU to send these prefixes to an EVPN PE with 544 itself as the GW. An EVPN PE MAY then advertise prefixes received via 545 this PDU as RT-5, with TS as the GW, as defined in [EVPN-PREFIX-ADV]. 549 o A new L3DL PDU type (14) is requested for this PDU. 550 o IPv6 Prefix is set to an IPv6 prefix behind a CE. 551 o PrefixLen is set to IPv6 prefix length for the advertised prefix. 552 o GW-IP is set to the CE IPv6 address (advertised via Type 9 PDU). 554 Multiple prefixes may be set for a single GW IP. The encapsulation 555 list contained in this PDU MUST follow full replace semantics as in 556 the L3DL protocol specification. 558 4. CE MAC/IP Learning on a PE AC 560 This section defines procedures for learning a connected CE MAC and 561 IP on a PE local attachment circuit (AC). 563 4.1 PE <-> CE L3DL Session Establishment 565 On an EVPN PE, 567 o A HELLO and/or OPEN PDU sent from a CE host source MAC is 568 received on a tagged or untagged interface that is a member of a 569 local BD, referred to here as an AC. 570 o OPEN messages are exchanged with the host on the AC. 571 o An L3DL session is established to the host source MAC and bound to 572 the local AC. 
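The PE-side binding described in the list above can be sketched as a table that maps a CE source MAC to the AC on which its session was established. This is a minimal illustration, assuming that sessions are keyed by source MAC alone and that ACs are identified by interface name; the class, method, and interface names are hypothetical and do not come from [L3DL].

```python
class PeSessionTable:
    """Tracks which local AC each CE L3DL session is bound to (illustrative only)."""

    def __init__(self, bd_member_interfaces):
        # Interfaces that are members of a local BD, i.e. usable as ACs.
        self._acs = set(bd_member_interfaces)
        # CE source MAC (lower-case) -> AC the session is bound to.
        self._sessions = {}

    def on_open_exchanged(self, src_mac, interface):
        """Bind the session for src_mac to the AC once the OPEN exchange completes.

        Returns False if the interface is not a member of a local BD.
        """
        if interface not in self._acs:
            return False
        self._sessions[src_mac.lower()] = interface
        return True

    def ac_for(self, src_mac):
        """Return the AC bound to the session for src_mac, or None if unknown."""
        return self._sessions.get(src_mac.lower())


# Usage with hypothetical interface names.
table = PeSessionTable(["eth1.100", "lag1"])
table.on_open_exchanged("00:00:5E:00:53:01", "eth1.100")
```

The AC recorded here is what a later learning step would point MAC and neighbor entries at.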
574 4.2 CE MAC/IP Learning 576 Overlay IPv4 and IPv6 encapsulation PDU types 8/9 from a CE are used 577 for the purpose of CE MAC/IP learning on a PE: 579 o The EVPN flag 'E' MUST NOT be set in a type 8/9 PDU from a CE. 580 o A MAC entry for the MAC received in a type 8/9 PDU MUST be 581 installed in the MAC-VRF table pointing to the AC to which the 582 session is bound. 583 o If an IPv4/IPv6 address is set in the PDU, an IPv4/IPv6 neighbor 584 binding MUST be established for the IPv4/IPv6 address in the PDU to 585 the MAC address in the PDU. In other words, a next-hop re-write for 586 these IPv4/IPv6 neighbor entries MUST be installed using the MAC 587 address in the PDU, and if required by forwarding logic, bound to 588 the AC associated with the L3DL session. 589 o Note that an IPv4/IPv6 address might not be set in a type 8/9 PDU 590 received from a CE, in which case this PDU is only used for MAC 591 learning. This may be the case in a non-IRB EVPN network, wherein 592 an EVPN PE is not a first-hop router for the attached CEs. 594 5. PE Any-cast GW MAC/IP Learning on CE 596 If L3DL based host learning is enabled on a PE with a distributed 600 any-cast gateway on the EVPN PE, 602 o The EVPN PE MUST send type 8/9 Overlay Encapsulation PDUs on 603 associated ACs with L3DL sessions toward CE hosts. 604 o Type 8/9 PDUs from an EVPN PE MUST be encoded with the any-cast 605 gateway IPv4/IPv6 address and any-cast gateway MAC address. 606 o EVPN flag 'E' MUST NOT be set in this PDU. 607 o A CE MAY process type 8/9 PDUs to establish GW IP to MAC 608 bindings and learn gateway MAC to LAG AC bindings, similar to 609 handling of type 8/9 PDUs on the PE described above. 611 Handling of type 8/9 PDUs for the purpose of gateway learning on the 612 host is desirable but optional. A CE MAY continue to use ARP and ND 613 for this purpose. 615 6. 
Remote CE MAC/IP Learning on CE 617 For CE to CE intra-subnet flows across the overlay, a CE needs to learn 618 and install a neighbor IP to MAC binding for remote CEs. This is 619 handled today by flooding ARP/ND requests across the overlay 620 bridge, optionally combined with an ARP/ND suppression cache on the 621 PE that is populated via MAC+IP EVPN route-type 2. With suppression, ARP/ND request 622 frames are trapped on the PE, which sends a local ARP/ND reply on behalf 623 of the remote CE. If L3DL based learning is enabled in the fabric, 624 L3DL may be used for this purpose to avoid overlay ARP/ND flooding, 625 data frame triggered ARP learning, and maintenance of an ARP 626 suppression cache on the PE. 628 o Remote MAC-IP routes learned via BGP EVPN route-type 2 that are 629 imported to a local MAC-VRF MAY also be sent as type 8/9 PDUs on 630 L3DL sessions to CEs over local ACs in that BD. 631 o EVPN flag 'E' MUST be set in this encapsulation in the PDU. 632 o A CE MAY install IPv4/IPv6 neighbor MAC bindings for remote 633 CEs within a subnet based on 'E' flagged type 8/9 PDUs received 634 from the PE. 636 Handling of type 8/9 PDUs for this purpose is optional but desirable 637 to get the full benefit of a fabric that is completely set up on boot-up, 638 avoids overlay flooding, and is decoupled from latencies associated 639 with data plane driven ARP and ND learning. 643 7. PE <-> CE Control Plane with EVPN All-active Multi-Homing 645 +------------------------+ 646 | Underlay Network Fabric| 647 +------------------------+ 649 BGP-EVPN Peering 650 <---------------------------------------------------> 651 +------+ +------+ +------+ +------+ 652 | PE1 | | PE2 | ..... 
| PEx | | PEy | 653 +------+ +------+ +------+ +------+ 654 \ / \ / 655 \ / \ / 656 \ / \ / 657 \ ESI-a / \ ESI-b / 658 L3DL \ / L3DL L3DL \ / L3DL 659 to PE1\ / to PE2 to PEx\ / to PEy 660 CE-Host CE-Host 662 Figure 6 664 In an EVPN all-active multi-homing setup, a LAG interface on the CE 665 includes member physical ports that connect to multiple PE devices. A 666 subset of these member ports that terminate at a PE are configured as 667 members of a local LAG interface at that PE. A LAG AC at the PE is a 668 logical interface in a BD, identified by this LAG interface and 669 optionally, an Ethernet Tag in case of trunk ports. 671 In order for L3DL based learning to work with EVPN all-active multi- 672 homing, a separate L3DL peering MUST be established between the CE 673 host and each PE device. For this reason, while an EVPN PE MAY form 674 an L3DL peering to a CE host on its local LAG AC, the CE host MUST 675 form an L3DL peering to a PE on a local LAG "member physical port". 677 A configurable All-active Multi-Homing mode is defined below in order 678 to be able to bind an L3DL peering to a LAG member-port as opposed to 679 a LAG interface. 681 7.1 All-active Multi-Homing Mode 683 When configured to run on a local LAG port in this mode, 685 o L3DL HELLO messages MUST be replicated on ALL LAG member ports. 686 o An L3DL OPEN message sent in response to a HELLO MUST be sent on 687 the LAG member port on which the HELLO was received. 688 o An L3DL session MUST be bound to the local LAG member port on 690 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 692 which the OPEN message was received. 693 o L3DL encapsulation PDUs MUST be sent on the local LAG member 694 port on which the session was bound. 695 o L3DL Keep-Alives MUST be sent on the local LAG member port on 696 which the session was bound. 698 Note that this may result in a PE receiving multiple HELLO PDUs from 699 a CE end-point. This however is harmless, as per the [L3DL] 700 specification. 
A PE simply drops redundant HELLOs from a MAC that it 701 has already replied to with an OPEN, within a retry time window. 703 7.2 Source MAC 705 L3DL relies on the source MAC address in the Ethernet frame to 706 establish a peering. When running L3DL on a LAG port (in all-active 707 multi-homing mode or regular mode), L3DL frames MUST use the LAG 708 interface MAC as the source MAC address in the Ethernet frame. 710 7.3 CE MAC/IP Learning with EVPN All-active Multi-Homing 712 In order to accomplish MAC/IP learning of CE host devices multi-homed 713 to EVPN fabric PEs via EVPN All-active Multi-Homing: 715 o A multi-homed CE device MUST be configured to run L3DL on a 716 local LAG interfaces in All-active Multi-Homing mode defined 717 above. 718 o EVPN PE MAY run L3DL on local LAG interfaces to multi-homed CE 719 devices in regular mode. 720 o EVPN PEs that share the same Ethernet Segment MUST use unique 721 source MACs (that of the local LAG) in HELLO/OPEN messages to 722 establish separate L3DL sessions to a CE. 724 With the above rules in place, 726 o An L3DL session on the CE is bound to a local LAG member-port. 727 o An L3DL session on the PE is bound to a local LAG AC port. 728 o A single L3DL session is established at the PE to a CE on the 729 local LAG AC. 730 o 'N' L3DL sessions are established at the CE, one to each PE on a 731 local LAG member interface, where N = number of multi-homing PEs 732 in an Ethernet Segment. 734 Once an L3DL session is established as above, all other host learning 735 procedures defined earlier for CE MAC/IP learning on a PE's AC port 736 apply as is to a LAG AC in an EVPN all-active multi-homing setup. 738 INTERNET DRAFT L3DL based PE-CE Control Plane for EVPN 740 7.4 LAG Member Link Failure 742 On a CE that is running in all-active multi-homing mode, an L3DL 743 session to a PE is bound to a LAG member interface. 
If the link that 744 the L3DL session is bound to fails, L3DL session will get torn down 745 at the CE by virtue of the session interface going down. If the CE 746 has additional active member link(s) to this PE, a new L3DL session 747 must be established on one of the active member links via HELLO PDUs 748 sent by the CE on its remaining active member links to the PE. 750 7.4.1 Session Re-establishment 752 L3DL session at the CE is torn down immediately following the session 753 interface failure. While the LAG interface at the PE is still 754 operationally UP, L3DL session at the PE is subject to Keep Alive 755 PDUs received from the CE. Once the session expires at the PE because 756 of missed Keep Alive PDUs from the CE, PE will respond to HELLO on 757 one of the active member link with an OPEN to re-establish a new 758 session. Note that the new session is still bound to the LAG AC at 759 the PE and to a new member link at the CE. 761 7.4.2 TLV Retention 763 TLVs learnt from a CE over a failed session MUST be retained at the 764 PE if the PE LAG AC is still operationally up following a member link 765 failure because of active member link(s) in the LAG. TLV retention 766 logic at the PE MAY be based on an age-out time, that is a local 767 matter at the PE. TLV age-out time MUST be higher than the missed 768 Keep Alive duration, after which the session is considered closed. 769 Once a new L3DL session is established, PE MUST implement a mark and 770 sweep logic to reconcile retained TLVs from the CE peer with the new 771 set of TLVs received from this CE. 773 7.4 LAG Failure 775 When a LAG member link failure results in the LAG interface being 776 operationally down, TLV age-out logic discussed above MUST NOT be in 777 effect. L3DL session MAY be be considered as DOWN immediately on the 778 LAG being down at the PE. This is so that, in the event of a total 779 connectivity loss between a PE and CE, CE learnt routes can be 780 withdrawn immediately. 
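
   The retention rules in 7.4.2 and the LAG failure rule that follows can
   be sketched in a few lines. This is only an illustration under stated
   assumptions: the class name PeTlvStore, the (MAC, IP) tuple used as a
   TLV key, and the method names are hypothetical and are not defined by
   this document or by [L3DL]. The sketch also assumes the PE has some
   local way to decide that the CE has finished replaying its TLVs on the
   new session, at which point sweep() is called.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical TLV key: (MAC, IP-or-None). Not an L3DL wire format.
Tlv = Tuple[str, Optional[str]]


@dataclass
class PeTlvStore:
    """TLVs learnt from one CE on a PE LAG AC (hypothetical helper)."""
    keepalive_timeout: float          # missed Keep Alive duration, seconds
    age_out: float                    # local TLV age-out time, seconds
    stale: Dict[Tlv, bool] = field(default_factory=dict)
    marked_at: Optional[float] = None

    def __post_init__(self) -> None:
        # 7.4.2: age-out MUST be higher than the missed Keep Alive duration.
        if self.age_out <= self.keepalive_timeout:
            raise ValueError("age_out must exceed keepalive_timeout")

    def learn(self, tlv: Tlv) -> None:
        """Install or refresh a TLV received on the current session."""
        self.stale[tlv] = False

    def session_lost(self, now: float, lag_up: bool) -> None:
        if not lag_up:
            # LAG failure: no retention, withdraw everything at once.
            self.stale.clear()
            self.marked_at = None
            return
        # Member link failure with the LAG still up: retain, but mark.
        self.stale = dict.fromkeys(self.stale, True)
        self.marked_at = now

    def sweep(self) -> List[Tlv]:
        """Remove TLVs not refreshed since the mark; return what was removed."""
        removed = [tlv for tlv, is_stale in self.stale.items() if is_stale]
        for tlv in removed:
            del self.stale[tlv]
        self.marked_at = None
        return removed

    def expire(self, now: float) -> List[Tlv]:
        """Age-out path: sweep if no new session refreshed the TLVs in time."""
        if self.marked_at is not None and now - self.marked_at >= self.age_out:
            return self.sweep()
        return []
```

   In this sketch a member link failure calls session_lost(lag_up=True),
   TLVs replayed on the new session go through learn(), and sweep() then
   drops whatever the CE did not re-advertise. A LAG failure calls
   session_lost(lag_up=False), which bypasses retention entirely.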
784   7.5 Example PE <-> CE Control Plane Flow with All-active Multi-Homing

786      An example L3DL over all-active multi-homing session flow is
787      discussed below for clarity.

789            +-------------+      +-------------+
790            |             |      |             |
791            |     PE2     |      |     PE3     |
792            |             |      |             |
793            +-+-----------+      +-+-----------+
794              |    LAG    |        |    LAG    |
795              ++--+---+--++        ++--+---+--++
796               |  |   |  |          |  |   |  |
797               |  |   |  |          |  |   |  |
798               |  |   |  |          |  |   |  |
799               |  |   |  |          |  |   |  |
800               |  |   |  |          |  |   |  |
801            +--+--+---+--+----------+--+---+--++
802            |               LAG                |
803            +----------------------------------+-+
804            |                                    |
805            |                 H1                 |
806            |                                    |
807            +------------------------------------+

809                           Figure 7

811      Example topology with CE H1 multi-homed to PE2 and PE3 via EVPN
812      all-active multi-homing LAG with four member ports to each PE:

814         H1 member ports to PE2:  i121, i122, i123, i124
815                                    |     |     |     |
816         PE2 member ports to H1:  i211, i212, i213, i214

818         H1 member ports to PE3:  i131, i132, i133, i134
819                                    |     |     |     |
820         PE3 member ports to H1:  i311, i312, i313, i314

822         H1 LAG port to PE2/PE3:  MLAG1
823         PE2 LAG port to H1:      LAG2
824         PE3 LAG port to H1:      LAG3
825         H1 LAG MAC:              LMAC1
826         PE2 LAG MAC:             LMAC2
827         PE3 LAG MAC:             LMAC3

829         H1 running L3DL on MLAG1 in All-active Multi-Homing mode
830         PE2 running L3DL on LAG2 in regular mode
831         PE3 running L3DL on LAG3 in regular mode

835               PE2                  H1                   PE3

837                |       HELLOs       |       HELLOs       |
838            LAG2|<-------------------|------------------->|LAG3
839            LAG2|<-------------------|------------------->|LAG3
840            LAG2|<-------------------|------------------->|LAG3
841            LAG2|<-------------------|------------------->|LAG3
842                |                    |                    |
843                |        OPEN        |        OPEN        |
844                |------------------->|<-------------------|
845            LAG2|                i122|i132                |LAG3
846                |                    |                    |
847                |        OPEN        |        OPEN        |
848                |<-------------------|------------------->|
849            LAG2|                i122|i132                |LAG3
850                |                    |                    |
851      Session to|          Session to|Session to          |Session to
852   LMAC1 on LAG2|       LMAC2 on i122|LMAC3 on i132       |LMAC1 on LAG3
853                |                    |                    |
854                |     Encap-PDU      |     Encap-PDU      |
855                |<-------------------|------------------->|
856            LAG2|                i122|i132                |LAG3
857                |        ACK         |        ACK         |
858                |------------------->|<-------------------|
859            LAG2|                    |                    |LAG3
860                |                    |                    |
861                |    Overlay-PDU     |    Overlay-PDU     |
862                |------------------->|<-------------------|
863            LAG2|                    |                    |LAG3
864                |        ACK         |        ACK         |
865                |<-------------------|------------------->|
866            LAG2|                i122|i132                |LAG3
867                |                    |                    |

869                               Figure 8

871      In the example flow shown above:

873      o H1: originates HELLO(SMAC=LMAC1) on all MLAG member ports
874      o PE2: Multiple HELLO(SMAC=LMAC1) copies received on port LAG2
875      o PE3: Multiple HELLO(SMAC=LMAC1) copies received on port LAG3
876      o PE2: A single OPEN(SMAC=LMAC2, DMAC=LMAC1) sent on port LAG2
877      o PE3: A single OPEN(SMAC=LMAC3, DMAC=LMAC1) sent on port LAG3
878      o PE2/PE3: duplicate HELLOs from same source LMAC1 are ignored
879      o H1: OPEN(SMAC=LMAC2, DMAC=LMAC1) received on member port i122
880      o H1: OPEN(SMAC=LMAC1, DMAC=LMAC2) sent on member port i122
881      o H1: Session established to LMAC2 on MLAG1 member port i122
885      o PE2: Session established to LMAC1 on LAG AC LAG2
886      o H1: OPEN(SMAC=LMAC3, DMAC=LMAC1) received on member port i132
887      o H1: OPEN(SMAC=LMAC1, DMAC=LMAC3) sent on member port i132
888      o H1: Session established to LMAC3 on MLAG1 member port i132
889      o PE3: Session established to LMAC1 on LAG AC LAG3
890      o H1: IP encapsulation PDUs (type 4/5) sent to LMAC2 and LMAC3
891      o PE2/PE3: H1 MAC and IP are learned
892      o PE2/PE3: overlay IP encapsulation PDUs (type 8/9) sent to LMAC1
893      o H1: Any-cast GW MAC and IP are learned
894      o H1: Remote host MAC and IP are learned

896   8. Software Neighbor Tables

898      Some networking stack implementations rely on ARP and ND populated
899      neighbor tables for software forwarding. In order to inter-work with
900      such an implementation, an L3DL learned IPv4/IPv6 neighbor entry MAY
901      also be installed in the ARP and ND neighbor table as a static /
902      permanent entry.

904      In addition,

906      o Pre-installing L3DL learned neighbor entries may help reduce
907        potential conflict with ARP or ND learned neighbor entries.
908      o Pre-installing L3DL learned neighbor entries may help reduce
909        reliance on data traffic triggered ARP requests / ND
910        solicitations and associated learning latency.

912      With respect to installing IPv6 entries learnt via L3DL in the IPv6 ND
913      cache, Router flag (R-bit) and Override flag (O-bit) received in L3DL
914      PDU should be handled as defined in [RFC4861].

916   9. MAC/IP Learning Conflict Resolution

918      If L3DL learned neighbor entries are not already installed as static
919      entries in the ARP/ND neighbor table, it is possible that a neighbor
920      IPv4/IPv6 adjacency may be learned both via L3DL and ARP/ND. Even if
921      L3DL learned entries were pre-installed in the neighbor table, a race
922      condition is still possible leading to a potential conflict between
923      ARP/ND learned and L3DL learned neighbor IP adjacency. In such
924      scenarios, the L3DL learned entry should be preferred for the purpose
925      of programming neighbor IP adjacencies in forwarding.

927      With respect to MAC-VRF entries, it is recommended that data plane
928      learning be turned off when L3DL based learning is enabled. However,
929      if it is not, data plane learned entries MUST be reconciled with L3DL
930      learned entries in software and, in case of a conflict, L3DL learned
931      entries preferred if L3DL based learning is enabled.

935   10. PE-CE Overlay Prefix Learning

937      [EVPN-PREFIX-ADV] section 4.1 defines a use case, wherein, a PE may
938      advertise IP prefixes and subnets behind a CE. In this use case, the
939      CE device does not run a dynamic routing protocol. Instead, these
940      prefixes are learnt on the PE via local policy or configuration.
941      Prefixes are then advertised by the PE as RT-5 with the CE as the GW.

943      The PE-CE control plane defined in this document MAY be used to learn
944      these prefixes from a CE as an alternative to local configuration on
945      the PE. Once an L3DL session is established between a CE and a PE, as
946      discussed earlier,

948      o A CE MAY send type 10/11 PDUs with these IPv4/IPv6 prefixes over
949        an L3DL session to a PE with the CE IP as the GW IP.
950      o A PE MAY advertise prefixes learnt via type 10/11 PDUs as RT-5
951        with CE IP as the GW IP.

953      To summarize, a PE would advertise:

955      o RT-2 for the CE MAC-IP learnt via type 8/9 PDU
956      o RT-5 for Prefixes learnt via type 10/11 PDU with GW IP = CE IP

958   11. Asymmetric EVPN-IRB

960      Any deviations from the above procedures proposed in this document
961      for asymmetric IRB design will be covered in subsequent updates to
962      this document.

964   12. Centralized Gateway EVPN-IRB

966      Any deviations from the above procedures proposed in this document
967      for centralized GW based IRB design will be covered in subsequent
968      updates to this document.

970   13. Use Cases

972   13.1 Simplified EVPN Operations

974      This section discusses in detail the benefits and simplifications
975      that may be achieved in the context of an EVPN network, if one
976      chooses to implement the PE-CE control plane defined in this document
977      as opposed to using traditional data-plane and ARP/ND snooping
978      based PE-CE learning.

982   13.1.1 EVPN All-active Multi-Homing

984                     +------------------------+
985                     | Underlay Network Fabric|
986                     +------------------------+

988                          BGP-EVPN Peering
989        <--------------------------------------------------->
990      +------+    +------+                 +------+    +------+
991      | PE1  |    | PE2  |      .....      | PEx  |    | PEy  |
992      +------+    +------+                 +------+    +------+
993         \             /                      \             /
994          \           /                        \           /
995           \         /                          \         /
996            \ ESI-a /                            \ ESI-b /

998           LAG Bundle                           LAG Bundle
999           to CE Host                           to CE Host

1001                             Figure 9

1003     Data plane and ARP/ND snooping based MAC/IP learning on PE-CE
1004     all-active multi-homed LAG ports is subject to unpredictable hashing
1005     of ARP, ND, and data frames from host to PE. As an example, an ARP
1006     request for a connected host might originate at PE1 but the resulting
1007     ARP response from the host might be received at PE2. Redundant EVPN
1008     PEs in all-active multi-homing mode typically handle this
1009     unpredictability via a combination of the methods below:

1011     o PEs can handle unsolicited ARP and ND response frames.
1012     o PEs can implement an additional mechanism to SYNC ARP, ND, and
1013       MAC tables across all PEs in a redundancy group for optimal
1014       forwarding to locally connected hosts.
1015     o PEs can implement EVPN aliasing procedures discussed in
1016       [RFC 7432] OR re-originate SYNCed MAC-IP adjacencies as local
1017       RT-2 to achieve MAC ECMP across the overlay.
1018     o PEs can also re-originate SYNCed MAC-IP adjacencies as local
1019       RT-2 to achieve IP ECMP across the overlay OR implement IP
1020       aliasing procedures discussed in [EVPN-IP-ALIASING].
1021     o PEs can also ensure EVPN sequence number SYNC for local MAC
1022       entries for EVPN mobility procedures to work correctly, as
1023       discussed in [EVPN-IRB-MOBILITY].

1025     The PE-CE control plane learning alternative defined in this document
1026     fully decouples MAC and IP learning over MLAG ports from
1027     unpredictable hashing of data, ARP, and ND frames on all-active
1031     multi-homed LAG member links. As a result, the above procedures, which
1032     essentially result from data-plane PE-CE learning on all-active
1033     multi-homed LAGs, can be simplified via the PE-CE control plane
1034     alternative defined in this document.
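
   The contrast drawn above can be illustrated with a toy model. The
   sketch below is an assumption-laden illustration, not a description of
   any real LAG implementation: the CRC32 based lag_member_for() merely
   stands in for a platform-specific LAG hash, and the names PES,
   snooping_learners, and l3dl_learners are hypothetical. It shows that
   each hashed frame is seen by exactly one PE, whereas a CE with one
   L3DL session per PE informs every PE directly.

```python
import zlib
from typing import Dict, List, Set

PES: List[str] = ["PE1", "PE2"]  # PEs sharing one Ethernet Segment


def lag_member_for(frame: bytes, pes: List[str]) -> str:
    """Stand-in for the CE's LAG hash; real hash inputs vary by platform."""
    return pes[zlib.crc32(frame) % len(pes)]


def snooping_learners(frames: List[bytes]) -> Dict[str, Set[int]]:
    """Map each PE to the indexes of the frames it receives and can snoop."""
    seen: Dict[str, Set[int]] = {pe: set() for pe in PES}
    for index, frame in enumerate(frames):
        seen[lag_member_for(frame, PES)].add(index)
    return seen


def l3dl_learners(ce_mac: str) -> Dict[str, str]:
    """With one L3DL session per PE, every PE receives the CE's PDU."""
    return {pe: ce_mac for pe in PES}
```

   Which PE sees a given frame in snooping_learners() depends entirely on
   the hash, which is why the snooping based approach needs the SYNC and
   aliasing procedures listed above.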
1036  13.1.2 Convergence on CE Host Moves

1038                    +------------------------+
1039                    | Underlay Network Fabric|
1040                    +------------------------+

1042                         BGP-EVPN Peering
1043                 <------------------------------>

1045            +------+    +------+         +------+
1046            | PE1  |    | PE2  |  .....  | PEx  |
1047            +------+    +------+         +------+
1048               |           |                |
1049             Hosts       Hosts            Hosts

1051                             Figure 10

1053     Host mobility across EVPN PE switches is a common occurrence in a
1054     data center fabric for flexibility in work load placement across a
1055     DC. Further, a host move must result in minimal, if any, disruption
1056     to traffic flows / services to / from the device.

1058     Data plane and ARP/ND snooping based PE-CE learning may result in
1059     unpredictable convergence times following host moves, for the
1060     following reasons:

1062     o A host may or may not send any data packet immediately following
1063       a move.
1064     o A host may or may not send an unsolicited ARP following a move.

1066     While probing procedures, discussed in the next sub-sections, are
1067     typically used to minimize convergence time, certain scenarios
1068     discussed below may still result in extended convergence times and
1069     flooding.

1071  13.1.2.1 Silent Hosts

1073     If a host is silent for an extended period following a move from PE1
1074     to PE2, any bridged traffic flow destined to this host will continue
1075     to be black-holed by PE1 until the MAC ages out at PE1. Once the
1076     MAC ages out at PE1, any bridged traffic flow destined to the host is
1080     flooded across the overlay bridge. Flooding of unknown unicast
1081     traffic on the overlay is enabled for this purpose. In summary, PE-CE
1082     learning that is based on data-plane and ARP/ND snooping may be
1083     subject to non-deterministic convergence time and flooding following
1084     host moves because of being heavily dependent on unpredictable CE
1085     behavior.
1087     PE-CE control plane based learning defined in this document fully
1088     decouples convergence in such scenarios from non-deterministic data
1089     flows and unsolicited ARP/ND behavior on a CE.

1091  13.1.2.2 Probing

1093     ARP and ND probing procedures are typically used to achieve host
1094     re-learning and convergence following host moves across the overlay:

1096     o Following a host move from PE1 to PE2, the host's MAC is
1097       discovered at PE2 as a local MAC via data frames received from
1098       the host. If PE2 has a prior REMOTE MAC-IP host route for this
1099       MAC from PE1, an ARP probe is typically triggered at PE2 to learn
1100       the MAC-IP as a local IP adjacency and to trigger EVPN RT-2
1101       advertisement for this MAC-IP across the overlay with new
1102       reachability via PE2.

1104     o Following a host move from PE1 to PE2, once PE1 receives a MAC
1105       or MAC-IP route from PE2 with a higher sequence number, an ARP
1106       probe is triggered at PE1 to clear the stale local MAC-IP
1107       neighbor adjacency OR re-learn the local MAC-IP in case the host
1108       has moved back or is a duplicate.

1110     o Following a local MAC age-out, if there is a local IP adjacency
1111       with this MAC, an ARP probe is triggered for this IP to either
1112       re-learn the local MAC and maintain local L3 and L2 reachability
1113       to this host OR to clear the ARP entry in case the host is indeed
1114       no longer local. Note that clearing of stale ARP entries
1115       following a move is required for traffic to converge in the event
1116       that the host was silent and not discovered at its new location.
1117       Once the stale ARP entry for the host is cleared, routed traffic
1118       destined for the host can re-trigger ARP discovery for this host
1119       at the new location. ARP flooding on the overlay MUST also be
1120       done to enable ARP discovery via routed flows.

1122     o Alternatively, the ARP probing timer may be tuned to be smaller
1123       than the MAC aging timer to avoid MAC age-out.
1125     The PE-CE control plane learning alternative defined in this document
1126     decouples host learning following moves from unpredictable host
1127     behavior with respect to sending data traffic and unsolicited ARPs,
1131     and as a result from ARP probing and MAC aging timer settings. Host
1132     move handling is hence greatly simplified to a very predictable and
1133     deterministic behavior.

1135  13.1.3 ARP Gleaning Latency

1137     If a CE's ARP binding is not already learned on a PE via an
1138     unsolicited ARP sent by the CE following events such as boot-up,
1139     flaps, and moves, a data frame that needs to be routed to the CE
1140     triggers the ARP or ND discovery process on the PE. On a typical
1141     hardware switching platform, an IP packet that does not resolve to a
1142     link layer re-write would be punted to the host stack, which delivers
1143     packets with incomplete link-layer resolution to ARP or ND. An
1144     ARP request / ND Solicitation is generated for the CE IP and an ARP
1145     response or NA results in installing a link-layer re-write for the CE
1146     IP. In an EVPN multi-homing environment, this procedure is further
1147     complicated as the response is only received by one of the PEs, which
1148     may or may not be the one that generated the ARP or ND request. The
1149     learned neighbor binding is SYNCed to other PEs that share the
1150     multi-homed Ethernet Segment. Routed flows can now be forwarded to the
1151     host via all PEs. Latency associated with such data frame driven ARP
1152     discovery may result in a significant initial convergence hit
1153     following triggers that warrant re-gleaning of CE IP to MAC binding.

1155     The PE-CE control plane learning alternative defined in this document
1156     results in proactive host learning following these scenarios,
1157     potentially avoiding a convergence hit on initial data packets.
1159  13.2 Applicability to non-EVPN Use Cases

1161     While the L3DL based host learning procedure described in this
1162     document focuses on the EVPN-IRB overlay fabric use case, it may also
1163     have benefits and applicability in non-EVPN use cases. Applicability
1164     of procedures described in this document to non-EVPN use cases is a
1165     topic for further study.

1167  14. Summary

1169     A PE-CE control plane is proposed as an alternative to data plane and
1170     ARP/ND snooping based PE-CE host MAC/IP learning and for PE-CE prefix
1171     learning. With a PE-CE control plane, CE host MAC and IP are
1172     deterministically learned on host boot-up, on host configuration,
1173     across host moves, on convergence triggers such as link failures,
1174     flaps, and PE re-boots, and on all-active multi-homing LAG links. A
1175     PE-CE control plane decouples CE MAC and IP learning from traffic
1176     flows sourced by a CE, from varying CE behavior with respect to
1177     sending unsolicited ARP/ND frames, and from hashing of CE sourced
1178     frames over all-active multi-homed LAG links. As a result, it helps
1182     achieve a predictable and reliable convergence behavior across these
1183     triggers and helps simplify certain EVPN procedures that are
1184     otherwise needed with data-plane and ARP/ND snooping based PE-CE
1185     learning. In addition, it may also be used for non-host learning use
1186     cases such as prefix learning.

1190  15. References

1192  15.1 Normative References

1194     [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
1195               Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
1196               Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
1197               2015.

1199     [L3DL]    Bush, R., Austein R., Patel, K., "Layer 3 Discovery and
1200               Liveness", Feb 2019.
1203     [EVPN-IRB] Sajassi, A., Salem, S., Thoria S., Drake J., Rabadan J.,
1204               "Integrated Routing and Bridging in EVPN", July 2018.

1208     [EVPN-PREFIX-ADV] Rabadan J., Henderickx W., Drake J., Lin W.,
1209               Sajassi, A., "IP Prefix Advertisement in EVPN", May 2018.

1213     [EVPN-IRB-MOBILITY] Malhotra, N., Sajassi, A., Rabadan, J., Drake
1214               J., Lingala A., Patekar A., "Extended Mobility Procedures
1215               for EVPN-IRB", June 2019.

1219     [EVPN-IP-ALIASING] Sajassi, A., Badoni, G., "L3 Aliasing and Mass
1220               Withdrawal Support for EVPN", July 2017.

1224     [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
1225               Requirement Levels", March 1997.

1228     [RFC8174] B. Leiba, "Ambiguity of Uppercase vs Lowercase in RFC 2119
1229               Key Words", May 2017.

1232  15.2 Informative References

1235  16. Acknowledgements

1237     Authors would like to thank Randy Bush and Rob Austein for detailed
1238     review and feedback to ensure consistency with base L3DL protocol
1239     specification, as well as for helping build detailed L3DL flows
1240     included in this document.

1242     Authors would like to thank Ali Sajassi and John Drake for detailed
1243     review and very valuable input on PE-CE protocol design for EVPN use
1244     cases as well as structuring this document for EVPN use cases.

1246  Contributors

1248     Randy Bush
1249     Arrcus & IIJ
1250     5147 Crystal Springs
1251     Bainbridge Island, WA 98110
1252     United States of America

1254     Email: randy@psg.com

1256  Authors' Addresses

1258     Neeraj Malhotra (Editor)
1259     Arrcus
1260     2077 Gateway Place, Suite #400
1261     San Jose, CA 95119, USA

1263     Email: neeraj.ietf@gmail.com

1265     Keyur Patel
1266     Arrcus
1267     2077 Gateway Place, Suite #400
1268     San Jose, CA 95119, USA

1270     Email: keyur@arrcus.com

1272     Jorge Rabadan
1273     Nokia
1274     777 E. Middlefield Road
1275     Mountain View, CA 94043, USA

1277     Email: jorge.rabadan@nokia.com