idnits 2.17.1

draft-eastlake-bess-evpn-vxlan-bypass-vtep-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 16, 2018) is 2104 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1 TRILL Working Group                                    Donald Eastlake
2 INTERNET-DRAFT                                              Zhenbin Li
3                                                         Shunwan Zhuang
4                                                                 Huawei
5 Intended status: Proposed Standard
6 Expires: January 15, 2019                                July 16, 2018

8                         EVPN VXLAN Bypass VTEP
9

11 Abstract

13 A principal feature of EVPN is the ability to support multihoming
14 from a customer equipment (CE) to multiple provider edge equipment
15 (PE) with all-active links. This draft specifies a mechanism to
16 simplify PEs used with VXLAN tunnels and enhance VXLAN Active-Active
17 reliability.

19 Status of This Memo

21 This Internet-Draft is submitted to IETF in full conformance with the
22 provisions of BCP 78 and BCP 79.

24 Distribution of this document is unlimited. Comments should be sent
25 to the BESS working group mailing list.

27 Internet-Drafts are working documents of the Internet Engineering
28 Task Force (IETF), its areas, and its working groups. Note that
29 other groups may also distribute working documents as Internet-
30 Drafts.

32 Internet-Drafts are draft documents valid for a maximum of six months
33 and may be updated, replaced, or obsoleted by other documents at any
34 time. It is inappropriate to use Internet-Drafts as reference
35 material or to cite them other than as "work in progress."

37 The list of current Internet-Drafts can be accessed at
38 http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
39 Shadow Directories can be accessed at
40 http://www.ietf.org/shadow.html.

42 Table of Contents

44 1. Introduction
45   1.1 Terminology and Acronyms

47 2. VXLAN Gateway High Reliability
48 3. Detailed Problem and Solution Requirement
49 4. The Bypass VXLAN Extended Community Attribute
50 5. Control Plane Processing

52 6. Data Packet Processing
53   6.1 Layer 2 Unicast Packet Forwarding
54     6.1.1 Uplink
55     6.1.2 Downlink
56   6.2 BUM Packet Forwarding

58 7. IANA Considerations
59   7.1 IPv4 Specific
60   7.2 IPv6 Specific

62 8.
Security Considerations

64 Acknowledgements
65 Contributors

67 Normative References
68 Informative References

70 Authors' Addresses

72 1. Introduction

74 A principal feature of EVPN is the ability to support multihoming
75 from a customer equipment (CE) to multiple provider edge equipment
76 (PE) with links used in the all-active redundancy mode. That mode is
77 where a device is multihomed to a group of two or more PEs and where
78 all PEs in such a redundancy group can forward traffic to/from the
79 multihomed device or network for a given VLAN [RFC7209]. This draft
80 specifies a VXLAN gateway mechanism to simplify PE processing in the
81 multi-homed case and enhance VXLAN Active-Active reliability.

83 1.1 Terminology and Acronyms

85 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
86 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
87 "OPTIONAL" in this document are to be interpreted as described in BCP
88 14 [RFC2119] [RFC8174] when, and only when, they appear in all
89 capitals, as shown here.

91 This document uses the following acronyms and terms:

93 All-Active Redundancy Mode - When a device is multihomed to a group
94    of two or more PEs and when all PEs in such redundancy group can
95    forward traffic to/from the multihomed device or network for a
96    given VLAN.

98 BUM - Broadcast, Unknown unicast, and Multicast.

100 CE - Customer Edge equipment.

102 DCI - Data Center Interconnect.

104 ESI - Ethernet Segment Identifier - A unique non-zero identifier that
105    identifies an Ethernet segment.

107 NVE - Network Virtualization Edge.

109 PE - Provider Edge equipment.

111 Single-Active Redundancy Mode - When a device or a network is
112    multihomed to a group of two or more PEs and when only a single PE
113    in such a redundancy group can forward traffic to/from the
114    multihomed device or network for a given VLAN.

116 VXLAN - Virtual eXtensible Local Area Network [RFC7348].

118 VTEP - VXLAN Tunnel End Point.

120 2. VXLAN Gateway High Reliability

122 One example of the current situation would be a DCI (data center
123 interconnect) using VXLAN tunnels that is multihomed for reliability
124 as shown in Figure 1. Each PE as a VXLAN Tunnel End Point (VTEP) uses
125 a different IP address. Thus each PE must process EVPN updates based
126 on the ESIs [RFC7432].

128                        .........
129                        .  DCI  .
130      +----------+      .       .      +----------+
131      |    PE    +---------------------+    PE    |
132      |VTEP IP-1 +---   . VXLAN .   ---+VTEP IP-3 |
133      +----------+   \  .Tunnels.  /   +----------+
134       /    |         -----   -----         |    \
135   +--+     |           .  \ /  .           |     +--+
136   |CE|     |           .   X   .           |     |CE|
137   +--+     |           .  / \  .           |     +--+
138       \    |         -----   -----         |    /
139      +----------+   /  . VXLAN .  \   +----------+
140      |    PE    +---   .Tunnels.   ---+    PE    |
141      |VTEP IP-2 +---------------------+VTEP IP-4 |
142      +----------+      .       .      +----------+
143                        .........

145 Figure 1. Current Situation

147 The situation is greatly simplified if the set of VTEPs connected to
148 a particular Ethernet segment all use the same anycast IP address.
149 PEs no longer need to concern themselves with whether a remote CE is
150 single or multi-homed. The situation is as shown in Figure 2. The IP
151 address within each VTEP group is synchronized by messages within
152 that group.

154                        .........
155                        .  DCI  .
156      +----------+      .       .      +----------+
157      | Anycast  |      .       .      | Anycast  |
158      |VTEP IP-1 +---   .       .   ---+VTEP IP-2 |
159      +----------+   \  .       .  /   +----------+
160       /    ^         \ .       . /         ^    \
161   +--+     |          \.       ./          |     +--+
162   |CE|   Sy|nc         >-------<         Sy|nc   |CE|
163   +--+     |          /. VXLAN .\          |     +--+
164       \    v         / . Tunnel. \         v    /
165      +----------+   /  .       .  \   +----------+
166      | Anycast  +---   .       .   ---+ Anycast  |
167      |VTEP IP-1 |      .       .
                                          |VTEP IP-2 |
168      +----------+      .       .      +----------+
169                        .........

171 Figure 2. Situation Using Anycast

173 3. Detailed Problem and Solution Requirement

175 In the scenario illustrated in Figure 3, where an enterprise site and
176 a data center are interconnected, the VPN gateways (PE1 and PE2) and
177 the enterprise site (CPE) are connected through a VXLAN tunnel to
178 provide L2/L3 services between the enterprise site (CPE) and data
179 center. The data center gateway (CE1) is dual-homed to PE1 and PE2
180 to access the VXLAN network, which enhances network access
181 reliability. When one PE fails, services can be rapidly switched to
182 the other PE, minimizing the impact on services.

184 As shown in Figure 3, PE1 and PE2 use a virtual address as a Network
185 Virtualization Edge (NVE) interface address at the network side,
186 namely, the Anycast VTEP address. In this way, the CPE is aware of
187 only one remote NVE interface and establishes a VXLAN tunnel with the
188 virtual address. The packets from the CPE can reach CE1 through
189 either PE1 or PE2. However, single-homed CEs may exist, such as CE2
190 and CE3. As a result, after reaching a PE, the packets from the CPE
191 may need to be forwarded by the other PE to a single-homed CE.
192 Therefore, a bypass VXLAN tunnel needs to be established between PE1
193 and PE2. An EVPN peer relationship is established between PE1 and
194 PE2. Different addresses, namely, bypass VTEP addresses, are
195 configured for PE1 and PE2 so that they can establish a bypass VXLAN
196 tunnel.

198                           +-----+
199          ---------------- | CPE |
200             ^             +-----+
201             |              /    \
202             |             /      \
203        VXLAN Tunnel      /        \
204             |           /          \
205             |          /  Anycast   \
206             v       +-----+ VTEP +-----+
207          ---------  | PE1 |------| PE2 |
208                     +-----+      +-----+
209                        /\           /\
210                       /  \         /  \
211                      /    \ Trunk /    \
212                     /      \     /      \
213                    /       +\---/+       \
214                   /        | \ / |        \
215                  /         +--+--+         \
216                 /             |             \
217             +-----+        +-----+        +-----+
218             | CE2 |        | CE1 |        | CE3 |
219             +-----+        +-----+        +-----+

221 Figure 3.
Basic networking of the VXLAN active-active scenario

223 4. The Bypass VXLAN Extended Community Attribute

225 This section describes the extensions specified to meet the
226 requirements given in Section 3 and to enhance VXLAN active-active
227 reliability.

229 This document specifies two new BGP extended communities, both called
230 the Bypass VXLAN Extended Community. The extended communities have a
231 Type indicating they are transitive and are IPv4-address-specific or
232 IPv6-address-specific, depending on whether the VTEP address to be
233 accommodated is IPv4 or IPv6. In the new extended communities, the
234 4-byte or 16-byte global administrator field encodes the IPv4 or IPv6
235 address that is the VTEP address and the 2-byte local administrator
236 field is formatted as shown in Figures 4 and 5.

238  0                   1                   2                   3
239  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
240 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
241 |  Type=0x01    | Sub-Type=TBA1 |         IPv4 Address          |
242 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
243 |      IPv4 Address (cont.)     |     Flags     |   Reserved    |
244 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

246 Figure 4. IPv4-address-specific Bypass VXLAN Extended Community

248  0                   1                   2                   3
249  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
250 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
251 | Type=0x00/0x40| Sub-Type=TBA2 |      Target IPv6 Address      |
252 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
253 |                  Target IPv6 Address (cont.)                  |
254 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
255 |                  Target IPv6 Address (cont.)                  |
256 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
257 |                  Target IPv6 Address (cont.)                  |
258 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
259 |  Target IPv6 Address (cont.)
                                    |     Flags     |   Reserved    |
260 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

262 Figure 5. IPv6-address-specific Bypass VXLAN Extended Community

264 Where

266 Type:
267    0x01 = type for IPv4 specific use.
268    0x00 = type for transitive IPv6 specific use.
269    0x40 = type for non-transitive IPv6 specific use.

271 Sub-Type:
272    TBA1 = subtype for IPv4 specific use.

274    TBA2 = subtype for IPv6 specific use.

276 IPv4/IPv6: An address of that type.

278 Flags: MUST be sent as zero and ignored on receipt.

280 Reserved: MUST be sent as zero and ignored on receipt.

282 5. Control Plane Processing

284 Using the topology in Figure 3:

286 1) PE2 sends a multicast route to PE1. The source address of the
287 route is the Anycast VTEP address shared by PE1 and PE2. The route
288 carries the bypass VXLAN extended community attribute, including the
289 bypass VTEP address of PE1.

291 2) After receiving the multicast route from PE2, PE1 considers an
292 Anycast relationship to be established with PE2. This is because the
293 source address (Anycast VTEP address) of the route is the same as the
294 local virtual address of PE1 and the route carries the bypass VXLAN
295 extended community attribute. Based on the bypass VXLAN extended
296 community attribute of the route, PE1 establishes a bypass VXLAN
   tunnel to PE2.

298 3) PE1 learns the MAC addresses of the CEs through upstream packets
299 from the CEs and advertises them as routes to PE2 through BGP EVPN.
300 The routes carry the ESIs of the links accessed by the CEs,
301 information about the VLANs that the CEs access, and the bypass VXLAN
302 extended community attribute.

304 4) PE1 learns the MAC address of the CPE through downstream packets
305 at the network side, specifies that the next-hop address of the MAC
306 route can be iterated to a static VXLAN tunnel, and advertises the
307 route to PE2. The next-hop address of the MAC route cannot be
308 changed.

310 6.
Data Packet Processing

312 This section describes how Layer 2 unicast and BUM (Broadcast,
313 Unknown unicast, and Multicast) packets are forwarded. A description
314 of how Layer 3 packets are forwarded, both within a subnet and
315 across subnets, will be provided in a future version of this
316 document.

318 6.1 Layer 2 Unicast Packet Forwarding

320 The following two subsections discuss Layer 2 unicast forwarding in
321 the topology shown in Figure 3.

323 6.1.1 Uplink

325 After receiving Layer 2 unicast packets destined for the CPE from
326 CE1, CE2, and CE3, PE1 and PE2 search their local MAC address
327 tables to obtain outbound interfaces, perform VXLAN encapsulation on
328 the packets, and forward them to the CPE.

330 6.1.2 Downlink

332 After receiving a Layer 2 unicast packet sent by the CPE to CE1, PE1
333 performs VXLAN decapsulation on the packet, searches the local MAC
334 address table for the destination MAC address, obtains the outbound
335 interface, and forwards the packet to CE1.

337 After receiving a Layer 2 unicast packet sent by the CPE to CE2, PE1
338 performs VXLAN decapsulation on the packet, searches the local MAC
339 address table for the destination MAC address, obtains the outbound
340 interface, and forwards the packet to CE2.

342 After receiving a Layer 2 unicast packet sent by the CPE to CE3, PE1
343 performs VXLAN decapsulation on the packet, searches the local MAC
344 address table for the destination MAC address, and forwards it to PE2
345 over the bypass VXLAN tunnel. After the packet reaches PE2, PE2
346 searches its local MAC address table for the destination MAC address,
347 obtains the outbound interface, and forwards the packet to CE3.

349 The process for PE2 to forward packets from the CPE is the same as
350 that for PE1 to forward packets from the CPE.
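
As an illustration only, the downlink cases in Section 6.1.2 can be
sketched in Python. The PE class, the MAC strings, the interface
names, and the forward_downlink function are hypothetical and are not
defined by this document. The rule that a packet received over the
bypass tunnel is never sent back over it is an assumption of the
sketch, and unknown destinations (which would be handled as BUM
traffic) are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PE:
    """Hypothetical model of one PE in the Figure 3 topology."""
    name: str
    # Destination MAC -> local access interface (locally attached CEs).
    local_macs: Dict[str, str] = field(default_factory=dict)
    # MACs learned from the peer PE via BGP EVPN; reached over the bypass tunnel.
    bypass_macs: Set[str] = field(default_factory=set)
    peer: Optional["PE"] = None


def forward_downlink(pe: PE, dst_mac: str, from_bypass: bool = False) -> List[str]:
    """Return the forwarding steps for a decapsulated Layer 2 unicast packet."""
    if dst_mac in pe.local_macs:
        return [f"{pe.name} -> {pe.local_macs[dst_mac]}"]
    # Assumption: never send a packet back over the tunnel it arrived on.
    if not from_bypass and pe.peer is not None and dst_mac in pe.bypass_macs:
        return [f"{pe.name} -> bypass"] + forward_downlink(pe.peer, dst_mac, True)
    return [f"{pe.name} -> unknown (BUM handling, not modelled)"]


# Figure 3: CE1 is dual-homed; CE2 hangs off PE1 and CE3 off PE2.
pe1 = PE("PE1", {"mac-ce1": "trunk", "mac-ce2": "if-ce2"}, {"mac-ce3"})
pe2 = PE("PE2", {"mac-ce1": "trunk", "mac-ce3": "if-ce3"}, {"mac-ce2"})
pe1.peer, pe2.peer = pe2, pe1

print(forward_downlink(pe1, "mac-ce1"))  # delivered locally by PE1
print(forward_downlink(pe1, "mac-ce3"))  # relayed to PE2 over the bypass tunnel
```

Because both PEs hold the same kind of table, the same function covers
the statement that PE2 handles packets from the CPE in the same way
as PE1.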

352 6.2 BUM Packet Forwarding

354 Using the topology in Figure 3, if the destination address of a BUM
355 packet from the CPE is the Anycast VTEP address of PE1 and PE2, the
356 BUM packet may be forwarded to either PE1 or PE2. If the BUM packet
357 reaches PE2 first, PE2 sends a copy of the packet to CE3 and CE1. In
358 addition, PE2 sends a copy of the packet to PE1 through the bypass
359 VXLAN tunnel between PE1 and PE2. After the copy of the packet
360 reaches PE1, PE1 sends it to CE2, not to the CPE or CE1. In this
361 way, CE1 receives only one copy of the packet.

363 Using the topology in Figure 3, after a BUM packet from CE2 reaches
364 PE1, PE1 sends a copy of the packet to CE1 and the CPE. In addition,
365 PE1 sends a copy of the packet to PE2 through the bypass VXLAN tunnel
366 between PE1 and PE2. After the copy of the packet reaches PE2, PE2
367 sends it to CE3, not to the CPE or CE1.

369 Using the topology in Figure 3, after a BUM packet from CE1 reaches
370 PE1, PE1 sends a copy of the packet to CE2 and the CPE. In addition,
371 PE1 sends a copy of the packet to PE2 through the bypass VXLAN tunnel
372 between PE1 and PE2. After the copy of the packet reaches PE2, PE2
373 sends it to CE3, not to the CPE or CE1.

375 7. IANA Considerations

377 IANA is requested to assign two new Extended Community attribute
378 SubTypes as follows:

380 7.1 IPv4 Specific

382 Sub-Type Value  Name                             Reference
383 --------------  -------------------------------  ----------
384 TBA1            Bypass VXLAN Extended Community  [this doc]

386 7.2 IPv6 Specific

388 Sub-Type Value  Name                             Reference
389 --------------  -------------------------------  ----------
390 TBA2            Bypass VXLAN Extended Community  [this doc]

392 8. Security Considerations

394 TBD

396 For general EVPN Security Considerations, see [RFC7432].

398 Acknowledgements

400 The authors would like to thank the following for their comments and
401 review of this document:

403    TBD

405 Contributors

407 The following individuals made significant contributions to this
408 document:

410    Haibo Wang
411    Huawei Technologies
412    Huawei Bldg., No. 156 Beiqing Road
413    Beijing 100095
414    China

416    Email: rainsword.wang@huawei.com

418 Normative References

420 [RFC2119] - Bradner, S., "Key words for use in RFCs to Indicate
421    Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
422    March 1997.

424 [RFC7432] - Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
425    Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
426    Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

429 [RFC8174] - Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
430    2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
431    2017.

433 Informative References

435 [RFC7209] - Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N.,
436    Henderickx, W., and A. Isaac, "Requirements for Ethernet VPN
437    (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014.

440 [RFC7348] - Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
441    L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
442    eXtensible Local Area Network (VXLAN): A Framework for
443    Overlaying Virtualized Layer 2 Networks over Layer 3 Networks",
444    RFC 7348, DOI 10.17487/RFC7348, August 2014.

447 Authors' Addresses

449    Donald E. Eastlake, 3rd
450    Huawei Technologies
451    1424 Pro Shop Court
452    Davenport, FL 33896 USA

454    Phone: +1-508-333-2270
455    Email: d3e3e3@gmail.com

457    Zhenbin Li
458    Huawei Technologies
459    Huawei Bld., No.156 Beiqing Rd.
460    Beijing 100095
461    China

463    Email: lizhenbin@huawei.com

465    Shunwan Zhuang
466    Huawei Technologies
467    Huawei Bld., No.156 Beiqing Rd.
468    Beijing 100095
469    China

471    Email: zhuangshunwan@huawei.com

473 Copyright, Disclaimer, and Additional IPR Provisions

475 Copyright (c) 2018 IETF Trust and the persons identified as the
476 document authors. All rights reserved.

478 This document is subject to BCP 78 and the IETF Trust's Legal
479 Provisions Relating to IETF Documents
480 (http://trustee.ietf.org/license-info) in effect on the date of
481 publication of this document. Please review these documents
482 carefully, as they describe your rights and restrictions with respect
483 to this document. Code Components extracted from this document must
484 include Simplified BSD License text as described in Section 4.e of
485 the Trust Legal Provisions and are provided without warranty as
486 described in the Simplified BSD License. The definitive version of
487 an IETF Document is that published by, or under the auspices of, the
488 IETF. Versions of IETF Documents that are published by third parties,
489 including those that are translated into other languages, should not
490 be considered to be definitive versions of IETF Documents. The
491 definitive version of these Legal Provisions is that published by, or
492 under the auspices of, the IETF. Versions of these Legal Provisions
493 that are published by third parties, including those that are
494 translated into other languages, should not be considered to be
495 definitive versions of these Legal Provisions. For the avoidance of
496 doubt, each Contributor to the IETF Standards Process licenses each
497 Contribution that he or she makes as part of the IETF Standards
498 Process to the IETF Trust pursuant to the provisions of RFC 5378. No
499 language to the contrary, or terms, conditions or rights that differ
500 from or are inconsistent with the rights and licenses granted under
501 RFC 5378, shall have any effect and shall be null and void, whether
502 published or posted by such Contributor, or included with or in such
503 Contribution.