INTERNET-DRAFT                                           Donald Eastlake
Intended status: Proposed Standard                            Weiguo Hao
                                                               Lili Wang
                                                               Yizhou Li
                                                          Shunwan Zhuang
                                                                  Huawei
Expires: April 13, 2019                                 October 14, 2018


                      Centralized EVPN DF Election
               draft-hao-bess-evpn-centralized-df-03.txt


Abstract

   This document proposes a centralized Designated Forwarder (DF)
   election mechanism to be used between an SDN (Software Defined
   Network) controller and each PE (Provider Edge) device in an EVPN
   network. Such a mechanism overcomes some issues with the current
   standalone DF election defined in RFC 7432.
   A new BGP capability and an additional DF Election Result Route Type
   are specified to support this centralized DF election mechanism.


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors or the BESS working group mailing list: bess@ietf.org.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Solution Overview
      3.1 Centralized DF Election Capability
   4. DF Election Result Route Type
      4.1 DF Election Result Route Encoding
      4.2 Centralized DF Election procedures
   5. Security Considerations
   6. IANA Considerations
   Normative References
   Informative References
   Acknowledgments
   Authors' Addresses


1. Introduction

   [RFC7432] defines a standardized Designated Forwarder (DF) election
   mechanism in EVPN networks to appoint one Provider Edge (PE) device
   as the DF from a candidate list of PEs for each VLAN (or VLAN bundle)
   connecting to a multi-homed Customer Edge (CE) device or access
   network. The DF PE is responsible for sending broadcast, unknown
   unicast and multicast (BUM) traffic to the multi-homed CE device or
   network, and non-DF PEs must drop such traffic. This DF based
   mechanism prevents duplicate packets from being injected into the
   multi-homed access network via multiple PEs.

   In [RFC7432] the DF is selected according to the VLAN modulus
   "service-carving" algorithm in order to perform load balancing for
   multi-destination traffic destined to a given segment. The algorithm
   ensures that each participating PE independently and unambiguously
   determines which one of the participating PEs is the DF; however, use
   of this algorithm has the following drawbacks [EVPN-HRW-DF]:

   1. Uneven load balancing in some VLAN configuration cases when the
      Ethernet tags have a non-uniform distribution, for instance when
      the Ethernet tags in use are all even or all odd.

   2. Unnecessary service disruption when PEs join or leave a redundancy
      group. In Figure 1 below, say v1, v2 and v3 are VLANs configured
      on the Ethernet segment with ESI 1, with associated Ethernet tags
      of value 3, 4 and 5 respectively. With three PEs in the redundancy
      group, PE1, PE2 and PE3 are the DFs for v1, v2 and v3
      respectively. When PE3 goes down, PE2 becomes the DF for v1 and
      v3 while PE1 becomes the DF for v2, so v1 and v2 change DF
      needlessly, causing unnecessary service disruption in v1 and v2.

   3. Lack of user control over DF election. In some cases, the user may
      want to flexibly control the load balancing based on VLAN number,
      bandwidth consumption, and other factors.
      The user should be allowed to use some specific DF re-election
      algorithm to avoid service disruption. The user also should be
      allowed to specify revertive and non-revertive mode for on-demand
      DF switchover in order to carry out some maintenance tasks.

   This document specifies a centralized DF election method to overcome
   the issues listed above. A physically distributed but logically
   centralized controller is deployed to perform the DF election
   calculation for all multi-homed PEs. Each individual multi-homed PE
   in the redundancy group should disable its own DF election process
   and listen for the DF election result from the SDN controller.
   [RFC7432] DF election procedures are extended for the interaction
   between the SDN Controller and each PE.


2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms and acronyms are used:

   CE:    Customer Edge device, e.g., a host, router, or switch.

   DF:    Designated Forwarder.

   Ethernet Segment (ES):  When a customer site (device or network) is
          connected to one or more PEs via a set of Ethernet links,
          then that set of links is referred to as an "Ethernet
          segment".

   ESI:   Ethernet Segment Identifier: A unique non-zero identifier
          that identifies an Ethernet segment.

   EVI:   An EVPN instance spanning the Provider Edge (PE) devices
          participating in that EVPN.

   EVPN:  Ethernet Virtual Private Network [RFC7432].

   PE:    Provider Edge device.

   NLRI:  Network Layer Reachability Information.

   SDN:   Software Defined Networking.

   VLAN:  Virtual Local Area Network.


3. Solution Overview

                  ------------------
                  | SDN Controller |
                  ------------------
                           |
      -------------------------------------------
     /                                           \
    |              MPLS EVPN Network              |
     \                                           /
      -------------------------------------------
         |        |        |        |        |
      -------  -------  -------  -------  -------
      | PE1 |  | PE2 |  | PE3 |  | PE4 |  | PE5 |
      -------  -------  -------  -------  -------
          \       |       /          \      /
            \     |     /             \    /
              \   |   /                \  /
               -------               -------
               | CE1 |               | CE2 |
               -------               -------

              Figure 1. Centralized DF Election Scenario

   In Figure 1, CE1 is multi-homed to PE1, PE2 and PE3 with ESI 1. CE2
   is multi-homed to PE4 and PE5 with ESI 2. The SDN controller will be
   pre-provisioned with the entire network's ESI related configuration.
   This includes the EVI, the Ethernet Tags on each ESI, the redundancy
   mode (active-active or active-standby) for each ESI, and EVI
   correspondence.

   Before each PE and the SDN controller exchange BGP route information
   for DF election, the SDN controller and each PE MUST negotiate a new
   BGP centralized DF election capability and role when OPEN messages
   are first exchanged; each PE participating in multi-homing is the
   client for the DF election information while the SDN controller is
   the server. For these PEs the regular DF election process as per
   [RFC7432] is disabled and each PE listens for the DF/Non-DF result
   from the SDN controller at the granularity of or . After the DF
   election server receives the Ethernet Segment route from each PE, it
   performs the DF election calculation based on a local algorithm and
   notifies each EVPN PE of the election result through a new EVPN
   route type.

3.1 Centralized DF Election Capability

   The centralized DF election capability is a new BGP capability
   [RFC5492] that a BGP speaker can use to indicate its support for the
   new DF election process.

   This capability is defined as follows:

   Capability code: TBD1

   Capability length: 2 octets

   Capability value: Consists of the "Election Flags" field and
      "Holding Time" field as follows:

      | 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      | Election  |      Holding Time in seconds      |
      |   Flags   |                                   |
      | (4 bits)  |             (12 bits)             |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   The use and meaning of these fields are as follows:

      Election Flags: This field contains bit flags as follows:

            | 0   1   2   3 |
            +---+---+---+---+
            | S |   Resv    |
            +---+---+---+---+

         S: The most significant bit is the election Server bit.
            When set to 1, this bit indicates that the BGP speaker is
            the Server (Controller) that has the DF election
            calculation capability for all multi-homed PEs in the
            entire EVPN network. When set to 0, it indicates that the
            BGP speaker is a Client which will await the DF election
            result from the Controller (Server).

         Resv: Reserved bits that MUST be sent as zero and ignored on
            receipt.

      Holding Time: This is the estimated maximum time in seconds it
         will take for the client to get DF election results from the
         controller after the BGP session is established. If no DF
         election result is received before the holding time expires,
         the PE reverts to the traditional EVPN DF election process as
         per [RFC7432].


4. DF Election Result Route Type

   The current BGP EVPN NLRI as defined in [RFC7432] is shown below:

      +-----------------------------------+
      |    Route Type (1 octet)           |
      +-----------------------------------+
      |     Length (1 octet)              |
      +-----------------------------------+
      | Route Type specific (variable)    |
      +-----------------------------------+

   This document defines an additional Route Type used for the server
   (SDN Controller) to send DF election results to each client (PE).
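
   The two fixed layouts described so far, the 2-octet capability value
   of Section 3.1 and the EVPN NLRI header shown above, can be sketched
   in Python as follows. This is a minimal illustration only, not part
   of the specification: the function and constant names are
   hypothetical, and bit 0 is assumed to be the most significant bit of
   the 16-bit value, as the description of the S bit implies.

```python
import struct
from typing import Tuple

S_BIT = 0x8000      # election Server bit (bit 0, most significant bit)
HOLD_MASK = 0x0FFF  # low-order 12 bits carry the Holding Time in seconds


def pack_capability_value(is_server: bool, holding_time: int) -> bytes:
    """Build the 2-octet capability value; reserved bits are sent as zero."""
    if not 0 <= holding_time <= HOLD_MASK:
        raise ValueError("Holding Time must fit in 12 bits")
    return struct.pack("!H", (S_BIT if is_server else 0) | holding_time)


def unpack_capability_value(value: bytes) -> Tuple[bool, int]:
    """Return (is_server, holding_time); reserved bits are ignored."""
    if len(value) != 2:
        raise ValueError("capability length must be 2 octets")
    (word,) = struct.unpack("!H", value)
    return bool(word & S_BIT), word & HOLD_MASK


def split_evpn_nlri(data: bytes) -> Tuple[int, bytes, bytes]:
    """Split one EVPN NLRI into (route type, type-specific octets, rest)."""
    if len(data) < 2:
        raise ValueError("NLRI shorter than its 2-octet header")
    route_type, length = data[0], data[1]
    end = 2 + length
    if len(data) < end:
        raise ValueError("truncated NLRI")
    return route_type, data[2:end], data[end:]
```

   For example, a controller advertising a holding time of 300 seconds
   would send the value 0x812C under these assumptions, while a client
   PE advertising the same holding time would send 0x012C.
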

   The Route Type is named the "DF Election Result Route Type". The
   detailed encoding of this route and associated procedures are
   described in the following sections.

4.1 DF Election Result Route Encoding

   The route type specific information for a DF Election Result Route
   NLRI consists of the following fields:

      Route Type specific information:
      +--------------------------------------+
      |  RD (8 octets)                       |
      +--------------------------------------+-------+
      |  Ethernet Segment Identifier (10 octets)     |
      +----------------------------------------------+
      |  TLVs ...
      +-------------------------------------

      Figure 2: DF Election Result Route Type specific information

   RD: The Route Distinguisher (RD) MUST be a Type 1 RD [RFC4364].
      The value field comprises an IP address of the Controller
      (typically, the loopback address) followed by a number unique
      to the Controller.

   ESI: Ethernet Segment Identifier: A non-zero 10-octet identifier
      for an Ethernet Segment.

   TLVs: Information in the TLVs field is encoded in
      Type/Length/Value triplets. Multiple TLVs can be included. This
      document specifies type 1, the VLAN Bitmap type, whose
      structure is as follows:

      +-----------------------------------------------+
      |  DF Election Result Type = 1                  |  (2 octets)
      +-----------------------------------------------+
      |  Length                                       |  (2 octets)
      +-+---------------------+-----------------------+
      |V|IP Addr Prefix Length|  (1 octet)
      +-+---------------------+-...-----------------------...---+
      |  Client PE IP Address  (4 or 16 octets)                 |
      +------------------------...--------------------+---...---+
      | RESV  | Start VLAN ID                         |  (2 octets)
      +-----------------------------------------------+
      |  VLAN bit-map....                       ...
      +------------------------------

               Figure 3. DF Election Result TLV Format

      o  DF Election Result Type (2 octets): Identifies the type
         of DF Election result as an unsigned integer in network
         byte order. This document defines type 1 as the "VLAN
         Bitmap" Type. TLVs with unknown types are ignored and
         skipped upon receipt.

      o  Length (2 octets): The total number of octets of the
         value part of the TLV as an unsigned integer in network
         byte order.

      The type and length are followed by the variable length value.
      This value, for the VLAN Bitmap type, consists of the following
      fields:

      o  V: A one-bit field that indicates which version of IP the
         TLV uses. A value of 1 implies IPv6 while 0 implies IPv4.

      o  IP Addr Prefix Length: Can be set to a value between 0 and
         32 (bits) for IPv4 and between 0 and 127 for IPv6. If the
         IP Addr Prefix Length is greater than 32 for IPv4, the TLV
         is corrupt and MUST be ignored.

      o  Client PE IP Address: A 32-bit or 128-bit field (IPv4 or
         IPv6 depending on the value of the V field) that identifies
         the PE.

      o  RESV: A 4-bit reserved field that MUST be sent as zero
         and ignored on receipt.

      o  Start VLAN ID: The 12-bit VLAN ID that is represented by
         the high order bit of the first byte of the VLAN bit-map.

      o  VLAN bit-map: The highest order bit indicates the VLAN
         equal to the start VLAN ID, the next highest bit
         indicates the VLAN equal to start VLAN ID + 1, continuing
         to the end of the VLAN bit-map field. A bit value of 1
         indicates DF and a bit value of 0 indicates non-DF.

4.2 Centralized DF Election procedures

   The controller has all ES related configuration information for the
   entire EVPN network. After the controller boots up, it can start a
   boot-timer to allow the establishment of BGP EVPN sessions with all
   multi-homed EVPN PEs. The controller also needs to receive all ES
   routes from those PEs before the boot-timer expires.
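
   The VLAN Bitmap TLV of Section 4.1 can be illustrated with a minimal
   Python sketch. It is not a reference implementation: the function
   names are hypothetical, the sketch assumes the V bit is the most
   significant bit of the octet it shares with the prefix length, and
   it assumes RESV occupies the high-order 4 bits of the 2-octet Start
   VLAN ID field, as Figure 3 suggests.

```python
import ipaddress
import struct

VLAN_BITMAP_TYPE = 1  # "VLAN Bitmap" DF Election Result Type


def encode_vlan_bitmap_tlv(pe_addr, prefix_len, start_vlan, df_vlans):
    """Encode one VLAN Bitmap TLV; a bit set to 1 marks the PE as DF."""
    addr = ipaddress.ip_address(pe_addr)
    is_v6 = addr.version == 6
    if not 0 <= prefix_len <= (127 if is_v6 else 32):
        raise ValueError("prefix length out of range")
    if not 0 <= start_vlan <= 0x0FFF:
        raise ValueError("Start VLAN ID must fit in 12 bits")
    offsets = [vlan - start_vlan for vlan in df_vlans]
    if any(off < 0 for off in offsets):
        raise ValueError("DF VLAN below Start VLAN ID")
    bitmap = bytearray((max(offsets) + 8) // 8 if offsets else 0)
    for off in offsets:
        bitmap[off // 8] |= 0x80 >> (off % 8)  # high-order bit first
    value = (bytes([(0x80 if is_v6 else 0) | prefix_len]) + addr.packed
             + struct.pack("!H", start_vlan) + bytes(bitmap))
    return struct.pack("!HH", VLAN_BITMAP_TYPE, len(value)) + value


def decode_vlan_bitmap_tlv(tlv):
    """Return (address, prefix_len, start_vlan, df_vlans), or None when
    the TLV has an unknown type or is corrupt and is therefore ignored."""
    tlv_type, length = struct.unpack_from("!HH", tlv, 0)
    if tlv_type != VLAN_BITMAP_TYPE:
        return None  # unknown types are skipped
    value = tlv[4:4 + length]
    if len(value) != length or length < 1:
        raise ValueError("truncated TLV")
    is_v6 = bool(value[0] & 0x80)
    prefix_len = value[0] & 0x7F
    if not is_v6 and prefix_len > 32:
        return None  # corrupt TLV
    addr_len = 16 if is_v6 else 4
    if length < 1 + addr_len + 2:
        raise ValueError("truncated TLV")
    addr = ipaddress.ip_address(value[1:1 + addr_len])
    (word,) = struct.unpack_from("!H", value, 1 + addr_len)
    start_vlan = word & 0x0FFF  # RESV bits are ignored on receipt
    bitmap = value[1 + addr_len + 2:]
    df_vlans = [start_vlan + i for i in range(len(bitmap) * 8)
                if bitmap[i // 8] & (0x80 >> (i % 8))]
    return addr, prefix_len, start_vlan, df_vlans
```

   A receiving PE would compare the decoded address with its own
   address and, on a match, treat the returned VLAN list as the set of
   VLANs for which it is the DF on that Ethernet segment.
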

   The controller preserves the ES routes received from all EVPN PEs.
   After it has received the above data, it can start to perform the DF
   election calculation for each ES based on a local algorithm. The
   default algorithm is the VLAN modulus method defined in Section 8.5
   of [RFC7432], relying on local VLAN configuration for each ES. A
   user-defined algorithm should be allowed.

   After the DF election calculation is finished on the controller, it
   will notify each multi-homed PE using the newly defined DF Election
   Result Route. The DF Election Result Route is per ES, i.e., the DF
   election results for all PEs connecting to the same ES are carried in
   one route. The DF Election Result Route that the controller
   advertises MUST carry an ES-Import Route Target. The DF Election
   Result Route filtering procedure is the same as the Ethernet Segment
   route filtering defined in [RFC7432], i.e., the DF Election Result
   Route MUST be imported only by the PEs that are multi-homed to the
   same Ethernet segment. Each multi-homed PE compares the Client PE IP
   Address with its local IP address; if the two IP addresses are the
   same, it takes the corresponding Start VLAN ID and VLAN bit-map as
   its DF election results.

   When the failure of a multi-homed PE is detected by the controller,
   the controller will initiate the DF re-election process. Because
   the controller decides which PE is DF or non-DF, the controller
   should ensure that the DF re-election does not cause unnecessary
   service disruption. In the example in Section 1, the controller
   should only redistribute the DF VLAN on PE3 to PE1 and PE2; the
   existing DF VLANs on PE1 and PE2 should remain unchanged to avoid
   service disruption.

   When the access link fails on one multi-homed PE, the PE will
   advertise an Ethernet Segment Withdraw message to the controller,
   which will trigger the DF re-election on the controller. The
   re-election principle in this case is the same as in the node
   failure case, to minimize service disruption.


5. Security Considerations

   Procedures and protocol extensions defined in this document do not
   affect the BGP security model. The communications between the SDN
   Controller and EVPN PEs should be protected to ensure security. BGP
   peerings are not automatic and require configuration, thus it is the
   responsibility of the network operator to ensure that they are
   trusted entities.


6. IANA Considerations

   Three IANA actions are requested as below.

   IANA is requested to assign a new BGP Capability Code in the
   Capability Code registry as follows:

      Value    Description                Reference
      ------   -----------------------    ---------------
      TBD1     Centralized DF Election    [this document]

   This document requests the assignment of value TBD2 in the "EVPN
   Route Types" registry created by [RFC7432] and modification of the
   registry to add the following:

      Value    Description           Reference
      ------   ------------------    ---------------
      TBD2     DF Election Result    [this document]

   IANA is requested to create a registry for "DF Election Result Types"
   as follows:

      Name: DF Election Result Types
      Registration Procedure: First Come First Served
      Reference: [this document]

      Type       Description      Reference
      --------   -------------    ---------------
      0          (reserved)
      1          VLAN Bitmap      [this document]
      2-65534    unassigned
      65535      (reserved)


Normative References

   [RFC2119] - Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
         March 1997.

   [RFC4364] - Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
         Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
         2006.

   [RFC5492] - Scudder, J. and R. Chandra, "Capabilities Advertisement
         with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February 2009.

   [RFC7432] - Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
         Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
         Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

   [RFC8174] - Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
         2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
         2017.


Informative References

   [EVPN-HRW-DF] - Mohanty S. et al. "A new Designated Forwarder
         Election for the EVPN", draft-mohanty-bess-evpn-df-election-02,
         work-in-progress, October 19, 2015.


Acknowledgments

   The authors wish to acknowledge the important contributions of
   Qiandeng Liang.


Authors' Addresses

   Donald Eastlake, 3rd
   Huawei Technologies
   1424 Pro Shop Court
   Davenport, FL 33896 USA

   Email: d3e3e3@gmail.com


   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012, China

   Email: haoweiguo@huawei.com


   Lili Wang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095, China

   Email: lily.wong@huawei.com


   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012, China

   Email: liyizhou@huawei.com


   Shunwan Zhuang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095, China

   Email: zhuangshunwan@huawei.com


Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.