TRILL Working Group                                          D. Eastlake
INTERNET-DRAFT                                                     Z. Li
                                                               S. Zhuang
                                                                  Huawei
Intended status: Proposed Standard
Expires: July 7, 2019                                    January 8, 2019


                         EVPN VXLAN Bypass VTEP
           draft-eastlake-bess-evpn-vxlan-bypass-vtep-02.txt


Abstract

   A principal feature of EVPN is the ability to support multihoming
   from a customer edge (CE) device to multiple provider edge (PE)
   devices with all-active links.  This draft specifies a mechanism to
   simplify PEs used with VXLAN tunnels and enhance VXLAN Active-Active
   reliability.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited.  Comments should be sent
   to the BESS working group mailing list.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html.  The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Table of Contents

   1. Introduction
      1.1 Terminology and Acronyms

   2. VXLAN Gateway High Reliability
   3. Detailed Problem and Solution Requirement
   4. The Bypass VXLAN Extended Community Attribute
   5. Control Plane Processing

   6. Data Packet Processing
      6.1 Layer 2 Unicast Packet Forwarding
         6.1.1 Uplink
         6.1.2 Downlink
      6.2 BUM Packet Forwarding

   7. IANA Considerations
      7.1 IPv4 Specific
      7.2 IPv6 Specific

   8. Security Considerations

   Acknowledgements
   Contributors

   Normative References
   Informative References


1. Introduction

   A principal feature of EVPN is the ability to support multihoming
   from a customer edge (CE) device to multiple provider edge (PE)
   devices with links used in the all-active redundancy mode.  In that
   mode, a device is multihomed to a group of two or more PEs, and all
   PEs in such a redundancy group can forward traffic to/from the
   multihomed device or network for a given VLAN [RFC7209].  This draft
   specifies a VXLAN gateway mechanism to simplify PE processing in the
   multihomed case and enhance VXLAN Active-Active reliability.


1.1 Terminology and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the following acronyms and terms:

   All-Active Redundancy Mode - When a device is multihomed to a group
      of two or more PEs and when all PEs in such a redundancy group can
      forward traffic to/from the multihomed device or network for a
      given VLAN.

   BUM - Broadcast, Unknown unicast, and Multicast.

   CE - Customer Edge equipment.

   DCI - Data Center Interconnect.

   ESI - Ethernet Segment Identifier - A unique non-zero identifier that
      identifies an Ethernet segment.

   NVE - Network Virtualization Edge.

   PE - Provider Edge equipment.

   Single-Active Redundancy Mode - When a device or a network is
      multihomed to a group of two or more PEs and when only a single PE
      in such a redundancy group can forward traffic to/from the
      multihomed device or network for a given VLAN.

   VXLAN - Virtual eXtensible Local Area Network [RFC7348].

   VTEP - VXLAN Tunnel End Point.


2. VXLAN Gateway High Reliability

   One example of the current situation would be a DCI (data center
   interconnect) using VXLAN tunnels that is multihomed for reliability
   as shown in Figure 1.  Each PE, acting as a VXLAN Tunnel End Point
   (VTEP), uses a different IP address.  Thus each PE must process EVPN
   updates based on the ESIs [RFC7432].

                          .........
                          .  DCI  .
        +----------+      .       .      +----------+
        |    PE    +---------------------+    PE    |
        |VTEP IP-1 +---   . VXLAN .   ---+VTEP IP-3 |
        +----------+   \  .Tunnels.  /   +----------+
       /     |          -----   -----          |     \
   +--+      |            .  \ /  .            |      +--+
   |CE|      |            .   X   .            |      |CE|
   +--+      |            .  / \  .            |      +--+
       \     |          -----   -----          |     /
        +----------+   /  . VXLAN .  \   +----------+
        |    PE    +---   .Tunnels.   ---+    PE    |
        |VTEP IP-2 +---------------------+VTEP IP-4 |
        +----------+      .       .      +----------+
                          .........

                      Figure 1. Current Situation

   The situation is greatly simplified if the set of VTEPs connected to
   a particular Ethernet segment all use the same anycast IP address.
   PEs no longer need to concern themselves with whether a remote CE is
   single-homed or multihomed.  The situation is as shown in Figure 2.
   The IP address within each VTEP group is synchronized by messages
   within that group.

                          .........
                          .  DCI  .
        +----------+      .       .      +----------+
        | Anycast  |      .       .      | Anycast  |
        |VTEP IP-1 +---   .       .   ---+VTEP IP-2 |
        +----------+   \  .       .  /   +----------+
       /     ^          \ .       . /          ^     \
   +--+      |           \.       ./           |      +--+
   |CE|    Sy|nc          >-------<          Sy|nc    |CE|
   +--+      |           /. VXLAN .\           |      +--+
       \     v          / . Tunnel. \          v     /
        +----------+   /  .       .  \   +----------+
        | Anycast  +---   .       .   ---+ Anycast  |
        |VTEP IP-1 |      .       .      |VTEP IP-2 |
        +----------+      .       .      +----------+
                          .........

                   Figure 2. Situation Using Anycast


3. Detailed Problem and Solution Requirement

   In the scenario illustrated in Figure 3, where an enterprise site and
   a data center are interconnected, the VPN gateways (PE1 and PE2) and
   the enterprise site (CPE) are connected through a VXLAN tunnel to
   provide L2/L3 services between the enterprise site (CPE) and the data
   center.  The data center gateway (CE1) is dual-homed to PE1 and PE2
   to access the VXLAN network, which enhances network access
   reliability.  When one PE fails, services can be rapidly switched to
   the other PE, minimizing the impact on services.

   As shown in Figure 3, PE1 and PE2 use a virtual address as a Network
   Virtualization Edge (NVE) interface address at the network side,
   namely, the Anycast VTEP address.  In this way, the CPE is aware of
   only one remote NVE interface and establishes a VXLAN tunnel with the
   virtual address.  The packets from the CPE can reach CE1 through
   either PE1 or PE2.  However, single-homed CEs may exist, such as CE2
   and CE3.  As a result, after reaching a PE, the packets from the CPE
   may need to be forwarded by the other PE to a single-homed CE.
   Therefore, a bypass VXLAN tunnel needs to be established between PE1
   and PE2.  An EVPN peer relationship is established between PE1 and
   PE2.  Different addresses, namely, bypass VTEP addresses, are
   configured for PE1 and PE2 so that they can establish a bypass VXLAN
   tunnel.

                          +-----+
   ----------------       | CPE |
        ^                 +-----+
        |                  /    \
        |                 /      \
   VXLAN Tunnel          /        \
        |               /          \
        |              /  Anycast   \
        v           +-----+ VTEP +-----+
   ---------        | PE1 |------| PE2 |
                    +-----+      +-----+
                      /\           /\
                     /  \         /  \
                    /    \ Trunk /    \
                   /      \     /      \
                  /       +\---/+       \
                 /        | \ / |        \
                /         +--+--+         \
               /             |             \
           +-----+        +-----+        +-----+
           | CE2 |        | CE1 |        | CE3 |
           +-----+        +-----+        +-----+

    Figure 3. Basic networking of the VXLAN active-active scenario


4. The Bypass VXLAN Extended Community Attribute

   This section describes the extensions specified to meet the
   requirements given in Section 3 and to enhance VXLAN active-active
   reliability.

   This document specifies two new BGP extended communities, each called
   the Bypass VXLAN Extended Community.  The extended communities have a
   Type indicating they are transitive and are IPv4-address-specific or
   IPv6-address-specific, depending on whether the VTEP address to be
   accommodated is IPv4 or IPv6.  In the new extended communities, the
   4-byte or 16-byte global administrator field encodes the IPv4 or IPv6
   address that is the VTEP address and the 2-byte local administrator
   field is formatted as shown in Figures 4 and 5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x01     | Sub-Type=TBA1 |         IPv4 Address          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     IPv4 Address (cont.)      |     Flags     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 4. IPv4-address-specific Bypass VXLAN Extended Community

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x00/0x40| Sub-Type=TBA2 |      Target IPv6 Address      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Target IPv6 Address (cont.)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Target IPv6 Address (cont.)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Target IPv6 Address (cont.)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Target IPv6 Address (cont.)  |     Flags     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 5. IPv6-address-specific Bypass VXLAN Extended Community

   Where

   Type:
      0x01 = type for IPv4 specific use.
      0x00 = type for transitive IPv6 specific use.
      0x40 = type for non-transitive IPv6 specific use.

   Sub-Type:
      TBA1 = subtype for IPv4 specific use.

      TBA2 = subtype for IPv6 specific use.

   IPv4 Address/Target IPv6 Address: The VTEP address, of the
      corresponding address type.

   Flags: MUST be sent as zero and ignored on receipt.

   Reserved: MUST be sent as zero and ignored on receipt.


5. Control Plane Processing

   Using the topology in Figure 3:

   1) PE2 sends a multicast route to PE1.  The source address of the
   route is the Anycast VTEP address shared by PE1 and PE2.  The route
   carries the bypass VXLAN extended community attribute, including the
   bypass VTEP address of PE2.

   2) After receiving the multicast route from PE2, PE1 determines that
   an Anycast relationship is to be established with PE2.  This is
   because the source address (Anycast VTEP address) of the route is the
   same as the local virtual address of PE1 and the route carries the
   bypass VXLAN extended community attribute.  Based on the bypass VXLAN
   extended community attribute of the route, PE1 establishes a bypass
   VXLAN tunnel to PE2.

   3) PE1 learns the MAC addresses of the CEs through upstream packets
   from the CEs and advertises them as routes to PE2 through BGP EVPN.
   The routes carry the ESI of the links through which the CEs attach,
   information about the VLANs that the CEs access, and the bypass VXLAN
   extended community attribute.

   4) PE1 learns the MAC address of the CPE through downstream packets
   at the network side, specifies that the next-hop address of the MAC
   route can be resolved recursively to a static VXLAN tunnel, and
   advertises the route to PE2.  The next-hop address of the MAC route
   cannot be changed.


6. Data Packet Processing

   This section describes how Layer 2 unicast and BUM (Broadcast,
   Unknown unicast, and Multicast) packets are forwarded.  A description
   of how Layer 3 packets are forwarded, both within the same subnet and
   across subnets, will be provided in a future version of this
   document.


6.1 Layer 2 Unicast Packet Forwarding

   The following two subsections discuss Layer 2 unicast forwarding in
   the topology shown in Figure 3.


6.1.1 Uplink

   After receiving Layer 2 unicast packets destined for the CPE from
   CE1, CE2, and CE3, PE1 and PE2 search their local MAC address tables
   to obtain outbound interfaces, perform VXLAN encapsulation on the
   packets, and forward them to the CPE.


6.1.2 Downlink

   After receiving a Layer 2 unicast packet sent by the CPE to CE1, PE1
   performs VXLAN decapsulation on the packet, searches the local MAC
   address table for the destination MAC address, obtains the outbound
   interface, and forwards the packet to CE1.

   After receiving a Layer 2 unicast packet sent by the CPE to CE2, PE1
   performs VXLAN decapsulation on the packet, searches the local MAC
   address table for the destination MAC address, obtains the outbound
   interface, and forwards the packet to CE2.

   After receiving a Layer 2 unicast packet sent by the CPE to CE3, PE1
   performs VXLAN decapsulation on the packet, searches the local MAC
   address table for the destination MAC address, and forwards it to PE2
   over the bypass VXLAN tunnel.  After the packet reaches PE2, PE2
   searches its local MAC address table for the destination MAC address,
   obtains the outbound interface, and forwards the packet to CE3.

   The process for PE2 to forward packets from the CPE is the same as
   that for PE1 to forward packets from the CPE.

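   The downlink decision described above can be sketched informally as
   follows.  This is a non-normative illustration only: the MacEntry
   type, the forward_downlink() function, the interface names, and the
   example MAC and IP addresses (taken from documentation ranges) are
   all hypothetical and are not defined by this document.  The sketch
   assumes that each MAC table entry on a PE points either to a local
   attachment interface or to the bypass VTEP address of the other PE,
   and that a destination with no entry is treated as unknown unicast.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MacEntry:
    """Hypothetical MAC table entry: set exactly one of the two fields."""
    local_interface: Optional[str] = None  # attachment circuit on this PE
    bypass_peer: Optional[str] = None      # bypass VTEP address of the other PE


def forward_downlink(mac_table: Dict[str, MacEntry], dst_mac: str) -> str:
    """Return the action for a frame that has already been VXLAN-decapsulated."""
    entry = mac_table.get(dst_mac)
    if entry is None:
        # No entry: unknown unicast, which falls under BUM handling.
        return "flood"
    if entry.local_interface is not None:
        # CE is attached to this PE (for example CE1 or CE2 as seen from PE1).
        return f"local:{entry.local_interface}"
    if entry.bypass_peer is not None:
        # CE is single-homed to the other PE (for example CE3 as seen from PE1).
        return f"bypass:{entry.bypass_peer}"
    raise ValueError("MAC entry has neither a local interface nor a bypass peer")


# Illustrative table for PE1 in Figure 3 (all values are assumptions).
PE1_MAC_TABLE = {
    "00:00:5e:00:53:01": MacEntry(local_interface="trunk-to-ce1"),
    "00:00:5e:00:53:02": MacEntry(local_interface="if-to-ce2"),
    "00:00:5e:00:53:03": MacEntry(bypass_peer="192.0.2.2"),  # learned via PE2
}

if __name__ == "__main__":
    for mac in PE1_MAC_TABLE:
        print(mac, "->", forward_downlink(PE1_MAC_TABLE, mac))
```

   In this sketch the CE3 entry on PE1 would be populated from the MAC
   route that PE2 advertises with the bypass VXLAN extended community
   attribute, which is why it carries a bypass VTEP address instead of a
   local interface.
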
6.2 BUM Packet Forwarding

   Using the topology in Figure 3, if the destination address of a BUM
   packet from the CPE is the Anycast VTEP address of PE1 and PE2, the
   BUM packet may be forwarded to either PE1 or PE2.  If the BUM packet
   reaches PE2 first, PE2 sends a copy of the packet to CE3 and CE1.  In
   addition, PE2 sends a copy of the packet to PE1 through the bypass
   VXLAN tunnel between PE1 and PE2.  After the copy of the packet
   reaches PE1, PE1 sends it to CE2, not to the CPE or CE1.  In this
   way, CE1 receives only one copy of the packet.

   Using the topology in Figure 3, after a BUM packet from CE2 reaches
   PE1, PE1 sends a copy of the packet to CE1 and the CPE.  In addition,
   PE1 sends a copy of the packet to PE2 through the bypass VXLAN tunnel
   between PE1 and PE2.  After the copy of the packet reaches PE2, PE2
   sends it to CE3, not to the CPE or CE1.

   Using the topology in Figure 3, after a BUM packet from CE1 reaches
   PE1, PE1 sends a copy of the packet to CE2 and the CPE.  In addition,
   PE1 sends a copy of the packet to PE2 through the bypass VXLAN tunnel
   between PE1 and PE2.  After the copy of the packet reaches PE2, PE2
   sends it to CE3, not to the CPE or CE1.


7. IANA Considerations

   IANA is requested to assign two new Extended Community attribute
   SubTypes as follows:


7.1 IPv4 Specific

   Sub-Type Value   Name                              Reference
   --------------   -------------------------------   ----------
   TBA1             Bypass VXLAN Extended Community   [this doc]


7.2 IPv6 Specific

   Sub-Type Value   Name                              Reference
   --------------   -------------------------------   ----------
   TBA2             Bypass VXLAN Extended Community   [this doc]


8. Security Considerations

   TBD

   For general EVPN Security Considerations, see [RFC7432].

Acknowledgements

   The authors would like to thank the following for their comments and
   review of this document:

      TBD


Contributors

   The following individuals made significant contributions to this
   document:

      Haibo Wang
      Huawei Technologies
      Huawei Bldg., No. 156 Beiqing Road
      Beijing 100095
      China

      Email: rainsword.wang@huawei.com


Normative References

   [RFC2119] - Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
         March 1997.

   [RFC7432] - Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
         Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
         Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

   [RFC8174] - Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
         2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
         2017.


Informative References

   [RFC7209] - Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N.,
         Henderickx, W., and A. Isaac, "Requirements for Ethernet VPN
         (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014.

   [RFC7348] - Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
         L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
         eXtensible Local Area Network (VXLAN): A Framework for
         Overlaying Virtualized Layer 2 Networks over Layer 3 Networks",
         RFC 7348, DOI 10.17487/RFC7348, August 2014.


Authors' Addresses

      Donald E. Eastlake, 3rd
      Huawei Technologies
      1424 Pro Shop Court
      Davenport, FL 33896 USA

      Phone: +1-508-333-2270
      Email: d3e3e3@gmail.com


      Zhenbin Li
      Huawei Technologies
      Huawei Bld., No.156 Beiqing Rd.
      Beijing 100095
      China

      Email: lizhenbin@huawei.com


      Shunwan Zhuang
      Huawei Technologies
      Huawei Bld., No.156 Beiqing Rd.
      Beijing 100095
      China

      Email: zhuangshunwan@huawei.com


Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.  The definitive version of
   an IETF Document is that published by, or under the auspices of, the
   IETF.  Versions of IETF Documents that are published by third
   parties, including those that are translated into other languages,
   should not be considered to be definitive versions of IETF Documents.
   The definitive version of these Legal Provisions is that published
   by, or under the auspices of, the IETF.  Versions of these Legal
   Provisions that are published by third parties, including those that
   are translated into other languages, should not be considered to be
   definitive versions of these Legal Provisions.  For the avoidance of
   doubt, each Contributor to the IETF Standards Process licenses each
   Contribution that he or she makes as part of the IETF Standards
   Process to the IETF Trust pursuant to the provisions of RFC 5378.  No
   language to the contrary, or terms, conditions or rights that differ
   from or are inconsistent with the rights and licenses granted under
   RFC 5378, shall have any effect and shall be null and void, whether
   published or posted by such Contributor, or included with or in such
   Contribution.