INTERNET-DRAFT                                           Donald Eastlake
Intended status: Proposed Standard                            Weiguo Hao
                                                               Lili Wang
                                                               Yizhou Li
                                                          Shunwan Zhuang
                                                                  Huawei
Expires: September 19, 2018                               March 20, 2018

                      Centralized EVPN DF Election
               draft-hao-bess-evpn-centralized-df-01.txt

Abstract

   This document proposes a centralized Designated Forwarder (DF)
   election mechanism to be used between an SDN (Software Defined
   Network) controller and each PE (Provider Edge) device in an EVPN
   network. Such a mechanism overcomes the issues of the current
   standalone DF election defined in RFC 7432.
   A new BGP capability and an additional DF Election Result Route Type
   are specified to support this centralized DF mechanism.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors or the TRILL working group mailing list:
   trill@ietf.org.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Table of Contents

   1. Introduction
   2. Conventions used in this document

   3. Solution Overview
      3.1 Centralized DF Election Capability

   4. DF Election Result Route Type
      4.1 DF Election Result Route Encoding
      4.2 Centralized DF Election procedures

   5. Security Considerations
   6. IANA Considerations

   Normative References
   Informative References

   Acknowledgments
   Authors' Addresses

1. Introduction

   [RFC7432] defines the Designated Forwarder (DF) election mechanism in
   EVPN networks to appoint one PE as DF from a candidate list of PEs
   and VLANs (or VLAN bundles) connecting to a multi-homed CE device or
   access network. The DF PE is responsible for sending broadcast,
   unknown unicast, and multicast (BUM) traffic to the multi-homed CE
   device or network, and non-DF PEs must drop such traffic. This DF
   based mechanism is used to prevent duplicated packet injection into
   the multi-homed access network via multiple PEs.

   In [RFC7432] the DF is selected according to the VLAN modulus
   "service-carving" algorithm in order to perform load balancing for
   multi-destination traffic destined to a given segment. The algorithm
   ensures that each participating PE independently and unambiguously
   determines which one of the participating PEs is the DF; however, use
   of this algorithm has the following drawbacks [EVPN-HRW-DF]:

   1. Uneven load balancing in some VLAN configuration cases when the
      Ethernet tag follows a non-uniform distribution, for instance when
      the Ethernet tags are all even or all odd.

   2. Unnecessary service disruption when PEs join or leave a redundancy
      group. In Figure 1 below, say v1, v2 and v3 are VLANs configured
      on the Ethernet segment with ESI 1, with associated Ethernet tags
      of value 3, 4 and 5 respectively. PE1, PE2 and PE3 are then the
      DFs for v1, v2 and v3 respectively. When PE3 goes down, PE2
      becomes the DF for v1 and v3 while PE1 becomes the DF for v2.
      Only v3 needed a new DF, so the change of DF for v1 and v2 is
      needless churn that causes unnecessary service disruption in v1
      and v2.

   3. Lack of user control over DF election. In some cases, the user may
      want to flexibly control the load balancing based on VLAN number,
      bandwidth consumption, and other factors. The user should be
      allowed to use some specific DF re-election algorithm to avoid
      service disruption.
      The user should also be allowed to specify revertive or
      non-revertive mode for on-demand DF switchover in order to carry
      out maintenance tasks.

   This document specifies a centralized DF election method to overcome
   the issues described above. A physically distributed but logically
   centralized controller is deployed to perform the DF election
   calculation for all multi-homed PEs. Each individual multi-homed PE
   should disable its own DF election process and listen for the DF
   election result from the SDN controller. [RFC7432] DF election
   procedures are extended for the interaction between the SDN
   Controller and each PE.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms and acronyms are used:

   CE:   Customer Edge device, e.g., a host, router, or switch.

   DF:   Designated Forwarder.

   Ethernet Segment (ES): When a customer site (device or network) is
         connected to one or more PEs via a set of Ethernet links,
         then that set of links is referred to as an "Ethernet
         segment".

   ESI:  Ethernet Segment Identifier: A unique non-zero identifier
         that identifies an Ethernet segment.

   EVI:  An EVPN instance spanning the Provider Edge (PE) devices
         participating in that EVPN.

   EVPN: Ethernet Virtual Private Network.

   PE:   Provider Edge device.

   NLRI: Network Layer Reachability Information.

   SDN:  Software Defined Networking.

   VLAN: Virtual Local Area Network.

3. Solution Overview

                        ------------------
                        | SDN Controller |
                        ------------------
                                |
            ------------------------------------------
           /                                          \
          |             MPLS EVPN Network              |
           \                                          /
            ------------------------------------------
              |        |        |          |        |
           -------  -------  -------    -------  -------
           | PE1 |  | PE2 |  | PE3 |    | PE4 |  | PE5 |
           -------  -------  -------    -------  -------
               \       |       /            \      /
                \      |      /              \    /
                 \     |     /                \  /
                    -------                 -------
                    | CE1 |                 | CE2 |
                    -------                 -------

               Figure 1. Centralized DF Election Scenario

   In Figure 1, CE1 is multi-homed to PE1, PE2 and PE3 with ESI 1, and
   CE2 is multi-homed to PE4 and PE5 with ESI 2. The SDN controller
   will be pre-provisioned with the entire network's ESI related
   configuration. This includes EVI, the Ethernet Tags on each ESI,
   redundancy mode of active-active or active-standby for each ESI,
   and EVI correspondence.

   Before each PE and the SDN controller exchange BGP route information
   for DF election, the SDN controller and each PE MUST negotiate a new
   BGP centralized DF election capability and role when OPEN messages
   are first exchanged; each multi-homed PE is the client for DF
   election while the SDN controller is the server. For the DF election
   client, the regular DF election process as per [RFC7432] will be
   disabled, and each PE listens for the DF/Non-DF result from the SDN
   controller at the granularity of or . For the DF election server,
   after it receives the Ethernet Segment route from each PE, it will
   perform the DF election calculation based on a local algorithm and
   will notify each EVPN PE of the election result through a new EVPN
   route type.

3.1 Centralized DF Election Capability

   The centralized DF election capability is a new BGP capability
   [RFC5492] that can be used by a BGP speaker to indicate its support
   for the new DF election process.

   This capability is defined as follows:

   Capability code: TBD1

   Capability length: 2 octets

   Capability value: Consists of the "Election Flags" field and
      "Holding Time" field as follows:

        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      | Election  |      Holding Time in seconds      |
      |   Flags   |                                   |
      | (4 bits)  |             (12 bits)             |
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   The use and meaning of these fields are as follows:

      Election Flags: This field contains bit flags that indicate the
         DF election role of the BGP speaker, as follows:

           0   1   2   3
         +---+---+---+---+
         | C | S |  Rsv  |
         +---+---+---+---+

         C: The most significant bit is the election Client bit. When
            set to 1, it indicates that the BGP speaker is a Client
            which will await the DF election result from the
            Controller (Server).

         S: When set to 1, this bit indicates that the BGP speaker is
            the Server (Controller) that has the DF election
            calculation capability for all multi-homed PEs in the
            entire EVPN network.

         Rsv: Reserved bits that MUST be sent as zero and ignored on
            receipt.

      Holding Time: This is the estimated time in seconds it will
         take for the client to get the DF election result from the
         controller after the BGP session is established. When no
         DF election result is received within the holding time, PEs
         will revert to the traditional EVPN DF election process as
         per [RFC7432].

4. DF Election Result Route Type

   The current BGP EVPN NLRI as defined in [RFC7432] is shown below:

      +-----------------------------------+
      |    Route Type (1 octet)           |
      +-----------------------------------+
      |     Length (1 octet)              |
      +-----------------------------------+
      | Route Type specific (variable)    |
      +-----------------------------------+

   This document defines an additional Route Type used for the server
   (SDN Controller) to send DF election results to each client (PE).
   The Route Type is "DF Election Result Route Type".

   The detailed encoding of this route and associated procedures are
   described in the following sections.

4.1 DF Election Result Route Encoding

   The route type specific information for a DF Election Result Route
   NLRI consists of the following fields:

      +----------------------------------------------+
      |  RD (8 octets)                               |
      +----------------------------------------------+
      |  Ethernet Segment Identifier (10 octets)     |
      +----------------------------------------------+
      |  TLVs ...
      +-------------------------------------

             Figure 2: DF Election Result Route NLRI

   RD:   The Route Distinguisher (RD) MUST be a Type 1 RD [RFC4364].
         The value field comprises an IP address of the Controller
         (typically, the loopback address) followed by a number unique
         to the Controller.

   ESI:  Ethernet Segment Identifier: A non-zero 10-octet identifier
         for an Ethernet Segment.

   TLVs: Information in the TLVs field is encoded in
         Type/Length/Value triplets. Multiple TLVs can be included.
         This document specifies type 1, the VLAN Bitmap type, whose
         structure is as follows:

      +-----------------------------------------+
      |      DF Election Result Type = 1        |  (2 octets)
      +-----------------------------------------+
      |                Length                   |  (2 octets)
      +------------------------+----------------+
      |IP Address Prefix Length|                   (1 octet)
      +------------------------...----------------------+
      |      Client PE IP Address (4 or 16 octets)      |
      +------------------------...----------------------+
      | RESV  |          Start VLAN ID          |  (2 octets)
      +-----------------------------------------+
      |  VLAN bit-map....                     ...
      +------------------------------

             Figure 3. DF Election Result TLV Format

      o DF Election Result Type (2 octets): Identifies the type
        of DF Election result. This document defines type 1 as
        the "VLAN Bitmap" Type.
        TLVs with unknown types are ignored and skipped upon
        receipt.

      o Length (2 octets): The total number of octets of the
        value part of the TLV.

      The type and length are followed by the variable length value.
      This value, for the VLAN Bitmap type, consists of the following
      fields:

      o IP Address Prefix Length: Can be set to a value between 0
        and 32 (bits) for IPv4 and between 0 and 128 for IPv6.

      o Client PE IP Address: A 32-bit or 128-bit field (IPv4 or
        IPv6) that identifies the PE.

      o RESV: A 4-bit reserved field that MUST be sent as zero
        and ignored on receipt.

      o Start VLAN ID: The 12-bit VLAN ID that is represented by
        the high order bit of the first byte of the VLAN bit-map.

      o VLAN bit-map: The highest order bit indicates the VLAN
        equal to the start VLAN ID, the next highest bit
        indicates the VLAN equal to start VLAN ID + 1, continuing
        to the end of the VLAN bit-map field. A bit value of 1
        indicates DF and a bit value of 0 indicates non-DF.

4.2 Centralized DF Election procedures

   The controller has all ES related configuration information for the
   entire EVPN network. After the controller boots up, it can start a
   boot-timer to allow the establishment of BGP EVPN sessions with all
   multi-homed EVPN PEs. The controller also needs to receive all ES
   routes from those PEs before the boot-timer expires. The controller
   will preserve the ES routes of all EVPN PEs.

   Based on a local algorithm for each ES, the controller can then
   start to perform the DF election calculation. The default algorithm
   is the VLAN modulus method defined in Section 8.5 of [RFC7432],
   relying on the local VLAN configuration on each ES. A user defined
   algorithm should be allowed.

   After the DF election calculation is finished on the controller, it
   will notify each multi-homed PE using the newly defined DF Election
   Result Route.
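
   As a rough illustration of the calculation and notification steps,
   the following Python sketch applies the default VLAN modulus method
   to the example in Section 1 (Ethernet tags 3, 4 and 5) and packs one
   PE's result into a VLAN Bitmap TLV. It is only a sketch under stated
   assumptions, not part of the specification: the function names and
   the 192.0.2.x PE addresses are hypothetical, the Ethernet tag is
   assumed to equal the VLAN ID, PEs are assumed to be ordered by
   increasing IP address as in [RFC7432], and the IP Address Prefix
   Length is assumed to be the full address length.

```python
import ipaddress
import struct

VLAN_BITMAP_TYPE = 1  # DF Election Result Type defined in Section 4.1


def modulus_df(pe_addrs, ethernet_tag):
    """Return the DF for a tag: the PE at ordinal (tag mod N)."""
    # Assumption: candidates are ordered by increasing numeric IP address.
    ordered = sorted(pe_addrs, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[ethernet_tag % len(ordered)]


def vlan_bitmap(start_vlan, df_vlans):
    """Build the bit-map; the high-order bit of octet 0 is start_vlan."""
    if not df_vlans:
        return b""
    span = max(df_vlans) - start_vlan + 1
    bitmap = bytearray((span + 7) // 8)
    for vlan in df_vlans:
        offset = vlan - start_vlan  # assumes vlan >= start_vlan
        bitmap[offset // 8] |= 0x80 >> (offset % 8)  # 1 = DF, 0 = non-DF
    return bytes(bitmap)


def encode_vlan_bitmap_tlv(client_pe, start_vlan, df_vlans):
    """Encode one VLAN Bitmap TLV (type 1) for a single client PE."""
    addr = ipaddress.ip_address(client_pe)
    # Assumption: the prefix length is the full address length (32 or 128).
    value = bytes([addr.max_prefixlen]) + addr.packed
    # RESV (upper 4 bits) is sent as zero; lower 12 bits carry the VLAN ID.
    value += struct.pack("!H", start_vlan & 0x0FFF)
    value += vlan_bitmap(start_vlan, df_vlans)
    return struct.pack("!HH", VLAN_BITMAP_TYPE, len(value)) + value


# Hypothetical PE addresses taken from the documentation range.
pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
before = {tag: modulus_df(pes, tag) for tag in (3, 4, 5)}
after = {tag: modulus_df(pes[:2], tag) for tag in (3, 4, 5)}  # third PE failed
pe2_vlans = [tag for tag, pe in after.items() if pe == "192.0.2.2"]
tlv = encode_vlan_bitmap_tlv("192.0.2.2", min(pe2_vlans), pe2_vlans)
print(before, after, tlv.hex(), sep="\n")
```

   With three PEs, tags 3, 4 and 5 map to the first, second and third
   PE. After the third PE is removed, the second PE becomes DF for tags
   3 and 5 and the first PE for tag 4, which reproduces the churn
   described in Section 1. A controller using a user defined algorithm
   could replace modulus_df while keeping the same TLV encoding.
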
   The DF Election Result Route is per ES, i.e., the DF election
   results for all PEs connecting to the same ES are carried in one
   route. The DF Election Result Route advertised by the controller
   must carry an ES-Import Route Target. The DF Election Result Route
   filtering procedure is the same as the Ethernet Segment route
   filtering defined in [RFC7432], i.e., the DF Election Result Route
   MUST be imported only by the PEs that are multi-homed to the same
   Ethernet segment. Each multi-homed PE compares the Client PE IP
   Address with its local IP address. If the two IP addresses are the
   same, it takes the corresponding Start VLAN ID and VLAN bit-map as
   its DF election results.

   When the failure of a multi-homed PE is detected by the controller,
   the controller will initiate the DF re-election process. Because
   it is the controller that decides which PE is DF or non-DF, the
   controller should ensure that the DF re-election won't cause
   unnecessary service disruption. In the example in Section 1, the
   controller should only redistribute the DF VLAN on PE3 to PE1 and
   PE2; the existing DF VLANs on PE1 and PE2 should remain unchanged
   to avoid service disruption.

   When the access link fails on one multi-homed PE, the PE will
   advertise an Ethernet Segment route withdrawal to the controller,
   which will trigger the DF re-election on the controller. The
   re-election principle is the same as in the node failure case, to
   minimize service disruption.

5. Security Considerations

   Procedures and protocol extensions defined in this document do not
   affect the BGP security model. The communications between the SDN
   Controller and EVPN PEs should be protected to ensure security. BGP
   peerings are not automatic and require configuration, so it is the
   responsibility of the network operator to ensure that the peers are
   trusted entities.

6. IANA Considerations

   Three IANA actions are requested, as described below.

   IANA is requested to assign a new BGP Capability Code in the
   Capability Code registry as follows:

      Value   Description              Reference
      ------  -----------------------  ---------------
      TBD1    Centralized DF Election  [this document]

   IANA is requested to assign value TBD2 in the "EVPN Route Types"
   registry created by [RFC7432] and to modify the registry to add the
   following:

      Value   Description         Reference
      ------  ------------------  ---------------
      TBD2    DF Election Result  [this document]

   IANA is requested to create a registry for "DF Election Result Types"
   as follows:

      Name: DF Election Result Types
      Registration Procedure: First Come First Served
      Reference: [this document]

      Type      Description    Reference
      --------  -------------  ---------------
      0         (Reserved)
      1         VLAN Bitmap    [this document]
      2-65534   Unassigned
      65535     (Reserved)

Normative References

   [RFC2119] - Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
         March 1997.

   [RFC4364] - Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
         Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
         2006.

   [RFC5492] - Scudder, J. and R. Chandra, "Capabilities Advertisement
         with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February 2009.

   [RFC7432] - Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
         Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
         Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

   [RFC8174] - Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
         2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
         2017.

Informative References

   [EVPN-HRW-DF] - Mohanty, S., et al., "A new Designated Forwarder
         Election for the EVPN", draft-mohanty-bess-evpn-df-election-02,
         work-in-progress, October 19, 2015.

Acknowledgments

   The authors wish to acknowledge the important contributions of
   Qiandeng Liang.

Authors' Addresses

   Donald Eastlake, 3rd
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Email: d3e3e3@gmail.com

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012, China

   Email: haoweiguo@huawei.com

   Lili Wang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095, China

   Email: lily.wong@huawei.com

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012, China

   Email: liyizhou@huawei.com

   Shunwan Zhuang
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing 100095, China

   Email: zhuangshunwan@huawei.com

Copyright, Disclaimer, and Additional IPR Provisions

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.