INTERNET-DRAFT                                               D. Eastlake
Intended status: Proposed Standard                Futurewei Technologies
                                                                   Z. Li
                                                               S. Zhuang
                                                     Huawei Technologies
                                                                R. White
                                                        Juniper Networks
Expires: July 5, 2021                                    January 6, 2021


                         EVPN VXLAN Bypass VTEP
          <draft-eastlake-bess-evpn-vxlan-bypass-vtep-07.txt>


Abstract

   A principal feature of EVPN is the ability to support multihoming
   from customer edge equipment (CE) to multiple provider edge
   equipment (PE) with all-active links. This draft specifies a
   mechanism to simplify PEs used with VXLAN tunnels and to enhance
   VXLAN Active-Active reliability.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the BESS working group mailing list.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Table of Contents

   1. Introduction
      1.1 Terminology and Acronyms

   2. VXLAN Gateway High Reliability
   3. Detailed Problem and Solution Requirement
   4. The Bypass VXLAN Extended Community Attribute
   5. Control Plane Processing

   6. Data Packet Processing
      6.1 Layer 2 Unicast Packet Forwarding
         6.1.1 Uplink
         6.1.2 Downlink
      6.2 BUM Packet Forwarding

   7. IANA Considerations
      7.1 IPv4 Specific
      7.2 IPv6 Specific

   8. Security Considerations

   Acknowledgements
   Contributors

   Normative References
   Informative References

   Authors' Addresses


1. Introduction

   A principal feature of EVPN is the ability to support multihoming
   from customer edge equipment (CE) to multiple provider edge
   equipment (PE) with links used in the all-active redundancy mode. In
   that mode, a device is multihomed to a group of two or more PEs, and
   all PEs in such a redundancy group can forward traffic to/from the
   multihomed device or network for a given VLAN [RFC7209]. This draft
   specifies a VXLAN gateway mechanism to simplify PE processing in the
   multihomed case and to enhance VXLAN Active-Active reliability.

1.1 Terminology and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the following acronyms and terms:

   All-Active Redundancy Mode - When a device is multihomed to a group
      of two or more PEs and when all PEs in such a redundancy group can
      forward traffic to/from the multihomed device or network for a
      given VLAN.

   BUM - Broadcast, Unknown unicast, and Multicast.

   CE - Customer Edge equipment.

   DCI - Data Center Interconnect.

   ESI - Ethernet Segment Identifier - A unique non-zero identifier that
      identifies an Ethernet segment.

   NVE - Network Virtualization Edge.

   PE - Provider Edge equipment.

   Single-Active Redundancy Mode - When a device or a network is
      multihomed to a group of two or more PEs and when only a single PE
      in such a redundancy group can forward traffic to/from the
      multihomed device or network for a given VLAN.

   VTEP - VXLAN Tunnel End Point.

   VXLAN - Virtual eXtensible Local Area Network [RFC7348].

2. VXLAN Gateway High Reliability

   One example of the current situation is a Data Center Interconnect
   (DCI) using VXLAN tunnels that is multihomed for reliability, as
   shown in Figure 1. Each PE, acting as a VXLAN Tunnel End Point
   (VTEP), uses a different IP address. Thus, each PE must process EVPN
   updates based on the ESIs [RFC7432].

                          .........
                          .  DCI  .
        +----------+      .       .      +----------+
        |    PE    +---------------------+    PE    |
        |VTEP IP-1 +---   . VXLAN .   ---+VTEP IP-3 |
        +----------+   \  .Tunnels.  /   +----------+
       /     |          -----   -----          |     \
   +--+      |            .  \ /  .            |      +--+
   |CE|      |            .   X   .            |      |CE|
   +--+      |            .  / \  .            |      +--+
       \     |          -----   -----          |     /
        +----------+   /  . VXLAN .  \   +----------+
        |    PE    +---   .Tunnels.   ---+    PE    |
        |VTEP IP-2 +---------------------+VTEP IP-4 |
        +----------+      .       .      +----------+
                          .........

                      Figure 1. Current Situation

   The situation is greatly simplified if the set of VTEPs connected to
   a particular Ethernet segment all use the same anycast IP address.
   PEs then no longer need to concern themselves with whether a remote
   CE is single-homed or multihomed. This situation is shown in
   Figure 2. The IP address within each VTEP group is synchronized by
   messages exchanged within that group.

                          .........
                          .  DCI  .
        +----------+      .       .      +----------+
        |  Anycast |      .       .      |  Anycast |
        |VTEP IP-1 +---   .       .   ---+VTEP IP-2 |
        +----------+   \  .       .  /   +----------+
       /     ^          \ .       . /          ^     \
   +--+      |           \.       ./           |      +--+
   |CE|    Sy|nc          >-------<          Sy|nc    |CE|
   +--+      |           /. VXLAN .\           |      +--+
       \     v          / . Tunnel. \          v     /
        +----------+   /  .       .  \   +----------+
        |  Anycast +---   .       .   ---+  Anycast |
        |VTEP IP-1 |      .       .      |VTEP IP-2 |
        +----------+      .       .      +----------+
                          .........

                  Figure 2. Situation Using Anycast

3. Detailed Problem and Solution Requirement

   In the scenario illustrated in Figure 3, where an enterprise site and
   a data center are interconnected, the VPN gateways (PE1 and PE2) and
   the enterprise site (CPE) are connected through a VXLAN tunnel to
   provide L2/L3 services between the enterprise site (CPE) and the data
   center. The data center gateway (CE1) is dual-homed to PE1 and PE2
   to access the VXLAN network, which enhances network access
   reliability. When one PE fails, services can be rapidly switched to
   the other PE, minimizing the impact on services.

   As shown in Figure 3, PE1 and PE2 use a virtual address, the Anycast
   VTEP address, as their Network Virtualization Edge (NVE) interface
   address at the network side. In this way, the CPE is aware of only
   one remote NVE interface and establishes a VXLAN tunnel with that
   virtual address. Packets from the CPE can reach CE1 through either
   PE1 or PE2. However, single-homed CEs, such as CE2 and CE3, may also
   exist. As a result, after reaching a PE, a packet from the CPE may
   need to be forwarded by the other PE to a single-homed CE.
   Therefore, a bypass VXLAN tunnel needs to be established between PE1
   and PE2. An EVPN peer relationship is established between PE1 and
   PE2, and different addresses, the bypass VTEP addresses, are
   configured on PE1 and PE2 so that they can establish the bypass
   VXLAN tunnel.

                             +-----+
   ----------------          | CPE |
         ^                   +-----+
         |                  /       \
         |                 /         \
   VXLAN Tunnel           /           \
         |               /             \
         |              /    Anycast    \
         v             +-----+ VTEP +-----+
   ---------           | PE1 |------| PE2 |
                       +-----+      +-----+
                         /\           /\
                        /  \         /  \
                       /    \ Trunk /    \
                      /      \     /      \
                     /       +\---/+       \
                    /        | \ / |        \
                   /         +--+--+         \
                  /             |             \
           +-----+           +-----+           +-----+
           | CE2 |           | CE1 |           | CE3 |
           +-----+           +-----+           +-----+

     Figure 3. Basic networking of the VXLAN active-active scenario

4. The Bypass VXLAN Extended Community Attribute

   This section specifies the extensions that meet the requirements
   given in Section 3 and enhance VXLAN active-active reliability.

   This document specifies two new BGP extended communities, the IPv4
   and IPv6 Bypass VXLAN Extended Communities. These extended
   communities are IPv4-address-specific or IPv6-address-specific,
   depending on whether the VTEP address to be accommodated is IPv4 or
   IPv6. In the new extended communities, the 4-byte or 16-byte global
   administrator field encodes the IPv4 or IPv6 VTEP address, and the
   2-byte local administrator field is formatted as shown in Figures 4
   and 5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type=0x01   | Sub-Type=TBA1 |         IPv4 Address          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     IPv4 Address (cont.)      |     Flags     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 4. IPv4-address-specific Bypass VXLAN Extended Community

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x00/0x40| Sub-Type=TBA2 |      Target IPv6 Address      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Target IPv6 Address (cont.)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Target IPv6 Address (cont.)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Target IPv6 Address (cont.)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Target IPv6 Address (cont.)  |     Flags     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 5. IPv6-address-specific Bypass VXLAN Extended Community

   Where:

      Type:
         0x01 = type for transitive IPv4-address-specific use.
         0x00 = type for transitive IPv6-address-specific use.
         0x40 = type for non-transitive IPv6-address-specific use.

      Sub-Type:
         TBA1 = sub-type for IPv4-address-specific use.
         TBA2 = sub-type for IPv6-address-specific use.

      IPv4/IPv6 Address: A VTEP address of that type.

      Flags: MUST be sent as zero and ignored on receipt.

      Reserved: MUST be sent as zero and ignored on receipt.

5. Control Plane Processing

   Using the topology in Figure 3:

   1) PE2 sends a multicast route to PE1. The source address of the
      route is the Anycast VTEP address shared by PE1 and PE2. The
      route carries the Bypass VXLAN Extended Community, which contains
      the bypass VTEP address of PE2.

   2) After receiving the multicast route from PE2, PE1 determines that
      it has an Anycast relationship with PE2, because the source
      address (Anycast VTEP address) of the route is the same as the
      local virtual address of PE1 and the route carries the Bypass
      VXLAN Extended Community. Based on the Bypass VXLAN Extended
      Community in the route, PE1 establishes a bypass VXLAN tunnel to
      PE2.

   3) PE1 learns the MAC addresses of the CEs from upstream packets
      sent by the CEs and advertises them to PE2 as BGP EVPN routes.
      These routes carry the ESI of the links through which the CEs are
      attached, information about the VLANs that the CEs access, and
      the Bypass VXLAN Extended Community.

   4) PE1 learns the MAC address of the CPE from downstream packets
      received at the network side, specifies that the next-hop address
      of the MAC route can be recursively resolved to a static VXLAN
      tunnel, and advertises the route to PE2. The next-hop address of
      the MAC route cannot be changed.

6. Data Packet Processing

   This section describes how Layer 2 unicast and BUM (Broadcast,
   Unknown unicast, and Multicast) packets are forwarded.
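
   The forwarding behavior walked through in Sections 6.1 and 6.2 can
   be condensed into two per-PE decisions: a MAC table lookup for known
   unicast and a replication list for BUM traffic. The Python sketch
   below is illustrative and non-normative; the PE class, the port
   names "network" and "bypass", and the MAC table contents are
   hypothetical. It assumes that a PE installs the MAC address of a CE
   single-homed to its peer with the bypass tunnel as the outbound
   port, based on the peer's EVPN MAC routes (Section 5), and it omits
   VLAN handling and Layer 3 forwarding.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

# Hypothetical port names used only in this sketch.
NETWORK = "network"  # VXLAN tunnel to the CPE, terminated on the Anycast VTEP address
BYPASS = "bypass"    # bypass VXLAN tunnel to the peer PE


@dataclass
class PE:
    name: str
    single_homed: FrozenSet[str]  # access ports of CEs attached only to this PE
    multi_homed: FrozenSet[str]   # access ports of CEs also attached to the peer PE
    mac_table: Dict[str, str] = field(default_factory=dict)  # MAC -> outbound port

    def unicast_out(self, in_port: str, dst_mac: str) -> Optional[str]:
        """Known Layer 2 unicast (Section 6.1): return the outbound port, if any."""
        out = self.mac_table.get(dst_mac)
        if out is None or out == in_port:
            # Unknown MAC (flooded as BUM instead) or a hairpin: nothing to send.
            return None
        return out

    def bum_out(self, in_port: str) -> List[str]:
        """BUM replication (Section 6.2): return every port that gets a copy."""
        if in_port == BYPASS:
            # A copy from the peer PE goes only to local single-homed CEs,
            # never to the CPE or to a multihomed CE the peer already served.
            return sorted(self.single_homed)
        targets = set(self.single_homed) | set(self.multi_homed) | {NETWORK, BYPASS}
        targets.discard(in_port)  # never send a copy back out the arrival port
        return sorted(targets)


# Figure 3: CE1 is dual-homed; CE2 is single-homed to PE1, CE3 to PE2.
pe1 = PE("PE1", frozenset({"CE2"}), frozenset({"CE1"}),
         {"mac-ce1": "CE1", "mac-ce2": "CE2", "mac-ce3": BYPASS, "mac-cpe": NETWORK})
pe2 = PE("PE2", frozenset({"CE3"}), frozenset({"CE1"}),
         {"mac-ce1": "CE1", "mac-ce2": BYPASS, "mac-ce3": "CE3", "mac-cpe": NETWORK})

print(pe1.unicast_out(NETWORK, "mac-ce3"))  # bypass  (CPE -> CE3 arriving at PE1)
print(pe2.unicast_out(BYPASS, "mac-ce3"))   # CE3
print(pe2.bum_out(NETWORK))                 # ['CE1', 'CE3', 'bypass']
print(pe1.bum_out(BYPASS))                  # ['CE2']
```

   The key choice in this sketch is the split-horizon rule for copies
   that arrive over the bypass tunnel: delivering them only to local
   single-homed CEs is what keeps CE1 and the CPE from receiving
   duplicate copies.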
   A description of how Layer 3 packets are forwarded, both within the
   same subnet and across subnets, will be provided in a future version
   of this document.

6.1 Layer 2 Unicast Packet Forwarding

   The following two subsections discuss Layer 2 unicast forwarding in
   the topology shown in Figure 3.

6.1.1 Uplink

   After receiving Layer 2 unicast packets destined for the CPE from
   CE1, CE2, and CE3, PE1 and PE2 search their local MAC address tables
   to obtain the outbound interfaces, perform VXLAN encapsulation on
   the packets, and forward them to the CPE.

6.1.2 Downlink

   After receiving a Layer 2 unicast packet sent by the CPE to CE1, PE1
   performs VXLAN decapsulation on the packet, searches the local MAC
   address table for the destination MAC address, obtains the outbound
   interface, and forwards the packet to CE1.

   After receiving a Layer 2 unicast packet sent by the CPE to CE2, PE1
   performs VXLAN decapsulation on the packet, searches the local MAC
   address table for the destination MAC address, obtains the outbound
   interface, and forwards the packet to CE2.

   After receiving a Layer 2 unicast packet sent by the CPE to CE3, PE1
   performs VXLAN decapsulation on the packet, searches the local MAC
   address table for the destination MAC address, and forwards the
   packet to PE2 over the bypass VXLAN tunnel. After the packet reaches
   PE2, PE2 looks up the destination MAC address, obtains the outbound
   interface, and forwards the packet to CE3.

   The process for PE2 to forward packets from the CPE is the same as
   that for PE1, with the roles of CE2 and CE3 swapped.

6.2 BUM Packet Forwarding

   Using the topology in Figure 3, if the destination address of a BUM
   packet from the CPE is the Anycast VTEP address shared by PE1 and
   PE2, the BUM packet may be forwarded to either PE1 or PE2. If the BUM
   packet reaches PE2, PE2 sends a copy of the packet to CE3 and CE1.
   In addition, PE2 sends a copy of the packet to PE1 through the bypass
   VXLAN tunnel between PE1 and PE2. After the copy of the packet
   reaches PE1, PE1 sends it to CE2, but not to the CPE or CE1. In this
   way, CE1 receives only one copy of the packet.

   Using the topology in Figure 3, after a BUM packet from CE2 reaches
   PE1, PE1 sends a copy of the packet to CE1 and the CPE. In addition,
   PE1 sends a copy of the packet to PE2 through the bypass VXLAN tunnel
   between PE1 and PE2. After the copy of the packet reaches PE2, PE2
   sends it to CE3, but not to the CPE or CE1.

   Using the topology in Figure 3, after a BUM packet from CE1 reaches
   PE1, PE1 sends a copy of the packet to CE2 and the CPE. In addition,
   PE1 sends a copy of the packet to PE2 through the bypass VXLAN tunnel
   between PE1 and PE2. After the copy of the packet reaches PE2, PE2
   sends it to CE3, but not to the CPE or CE1.

7. IANA Considerations

   IANA is requested to assign two new Extended Community attribute
   Sub-Types as follows:

7.1 IPv4 Specific

      Sub-Type Value  Name                             Reference
      --------------  -------------------------------  ----------
      TBA1            Bypass VXLAN Extended Community  [this doc]

7.2 IPv6 Specific

      Sub-Type Value  Name                             Reference
      --------------  -------------------------------  ----------
      TBA2            Bypass VXLAN Extended Community  [this doc]

8. Security Considerations

   TBD

   For general EVPN Security Considerations, see [RFC7432].

Acknowledgements

   The authors would like to thank the following for their comments and
   review of this document:

      TBD

Contributors

   The following individuals made significant contributions to this
   document:

      Haibo Wang
      Huawei Technologies
      Huawei Bldg., No. 156 Beiqing Road
      Beijing 100095
      China

      Email: rainsword.wang@huawei.com

Normative References

   [RFC2119] - Bradner, S., "Key words for use in RFCs to Indicate
      Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
      March 1997.

   [RFC7432] - Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
      Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
      Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

   [RFC8174] - Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
      2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
      2017.

Informative References

   [RFC7209] - Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N.,
      Henderickx, W., and A. Isaac, "Requirements for Ethernet VPN
      (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014.

   [RFC7348] - Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
      L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
      eXtensible Local Area Network (VXLAN): A Framework for
      Overlaying Virtualized Layer 2 Networks over Layer 3 Networks",
      RFC 7348, DOI 10.17487/RFC7348, August 2014.

Authors' Addresses

   Donald E. Eastlake, 3rd
   Futurewei Technologies
   2386 Panoramic Circle
   Apopka, FL 32703 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com

   Zhenbin Li
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: lizhenbin@huawei.com

   Shunwan Zhuang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095
   China

   Email: zhuangshunwan@huawei.com

   Russ White
   Juniper Networks

   Email: russ@riw.us

Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License. The definitive version of
   an IETF Document is that published by, or under the auspices of, the
   IETF. Versions of IETF Documents that are published by third parties,
   including those that are translated into other languages, should not
   be considered to be definitive versions of IETF Documents. The
   definitive version of these Legal Provisions is that published by, or
   under the auspices of, the IETF. Versions of these Legal Provisions
   that are published by third parties, including those that are
   translated into other languages, should not be considered to be
   definitive versions of these Legal Provisions. For the avoidance of
   doubt, each Contributor to the IETF Standards Process licenses each
   Contribution that he or she makes as part of the IETF Standards
   Process to the IETF Trust pursuant to the provisions of RFC 5378. No
   language to the contrary, or terms, conditions or rights that differ
   from or are inconsistent with the rights and licenses granted under
   RFC 5378, shall have any effect and shall be null and void, whether
   published or posted by such Contributor, or included with or in such
   Contribution.