Network Working Group                                            H. Chen
Internet-Draft                                                 Futurewei
Intended status: Standards Track                                 A. Wang
Expires: March 13, 2021                                    China Telecom
                                                                  L. Liu
                                                                 Fujitsu
                                                                  X. Liu
                                                          Volta Networks
                                                       September 9, 2020


                   PCE for Network High Availability
                   draft-chen-pce-ctr-availability-01

Abstract

   This document describes extensions to Path Computation Element (PCE)
   communication Protocol (PCEP) for improving the reliability or
   availability of a network controlled by a controller cluster.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 13, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.
Table of Contents

   1.  Introduction
   2.  Terminologies
   3.  PCE for Controller Cluster Reliability
     3.1.  Overview of Mechanism
     3.2.  Example
   4.  Extensions to PCEP
     4.1.  Capability
     4.2.  Controllers Object
   5.  Recovery Procedure
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   More and more networks are controlled by central controllers or
   controller clusters.  Externally, a controller cluster appears as a
   single controller.  Internally, it normally consists of two or more
   controllers working together to control a network, i.e., every
   network element (NE) in the network.  The reliability or
   availability of a network therefore depends heavily on its
   controller cluster: issues or failures in the cluster may greatly
   reduce the reliability or availability of the network.

   For a controller cluster comprising two or more controllers (i.e.,
   primary controller, secondary controller, and so on), failures in
   the cluster may split the cluster into several separate controller
   groups.
   These groups are unaware of each other and may be out of
   synchronization.  Two or more groups may each elect themselves as
   the primary group and try to control the network at the same time,
   which leads to conflicting control of the network.

   This document proposes procedures and extensions to PCEP that let
   the separated controllers or controller groups learn about each
   other, and thus elect exactly one new primary controller or
   controller group when the cluster is split because of failures in
   the cluster.

2.  Terminologies

   The following terms are used in this document.

   PCE:  Path Computation Element

   PCEP:  PCE communication Protocol

   PCC:  Path Computation Client

   NE:  Network Element

   CE:  Customer Edge

   PE:  Provider Edge

3.  PCE for Controller Cluster Reliability

   This section outlines the mechanism for controller cluster
   reliability or availability using PCEP, and illustrates some details
   through a simple example.

3.1.  Overview of Mechanism

   When a cluster of controllers is split into several separate groups
   because of failures in the cluster, the live controllers are still
   connected to the network (i.e., to network elements).  Through some
   of these connections, each group can get the information about the
   other groups.  Based on this information, a new primary controller
   or controller group is correctly elected to control the network.

   Each controller has a PCEP session with each of a given number of
   the same NEs in the network.  Each session is established and
   maintained over an IP path between the controller and the NE, and is
   a session of PCEP with the extensions defined in this document.

   In one example or configuration, the given number of NEs is one: the
   NE with the highest node ID.  Suppose that node PE2 is the NE with
   the highest ID.  The session between the primary controller (e.g.,
   A) and the NE (e.g., PE2) is a session of PCEP with extensions.
   Each of the non-primary controllers (e.g., B, C, ...)
   creates and maintains a PCEP session with this NE (e.g., PE2).

   In normal operations, the cluster has all its controllers connected.
   They are the primary controller controlling the network, the
   secondary controller, and so on, and they have current positions 1,
   2, and so on respectively.  The primary controller advertises the
   information about the controllers via its PCEP sessions to the given
   number of the same NEs.

   For example, it sends the information in a PCEP message to the NE
   (e.g., PE2), which forwards the information to each of the other
   controllers via its PCEP sessions with those controllers.

   When the cluster is split into several separate groups of
   controllers, each group elects an intent primary controller, an
   intent secondary controller, and so on from the group, which have
   intent positions 1, 2, and so on respectively.  The intent primary
   controller in each group advertises the information about the
   controllers in its group.

   The information advertised by the (intent) primary controller
   includes its current (intent) position, its old position, its
   priority to become a primary controller, the number of controllers
   in its group or cluster, and the IDs of the controllers, ordered
   according to their (intent) positions.  In addition, a flag C is
   included, indicating whether the advertising controller is
   Controlling the network (i.e., it is the current primary
   controller).

3.2.  Example

   Figure 1 shows a controller cluster comprising two controllers: the
   primary controller and the secondary controller.  Each controller
   has a PCEP session with the same NE, which is NE4.
   +---------------------------------------------------+
   |                Controller Cluster                 |
   |                                                   |
   |  +------------+               +------------+      |
   |  |Controller A|  Synchronize  |Controller B|      |
   |  |(Primary)   +---------------+(Secondary) |      |
   |  +------------+               +-----------++      |
   |        ^                                  |       |
   |        |_______________                   |       |
   |                        |                  |       |
   |                        v                  |       |
   +-----------------Channels to Network-------|-------+
   PCEP session---->        |                   | <--PCEP session
   between                  |                   |    between
   A and NEi                |                   |    B and NE4
   (i=1,2,..)               |                   |
            +---------+-----+---+-----------+   |
            |         |         |           |   |
        .---|---------|---------|------------\-/---.
       (    o NE1     o NE2     o NE3         o NE4 )
        '----------------- Network ----------------'

             Figure 1: Controller Cluster of 2 Controllers

   The primary controller (i.e., A) has a PCEP session with each NE in
   the network, including NE4.  The secondary controller (i.e., B) has
   a PCEP session with the same NE4, and the session is established and
   maintained over an IP path between B and NE4.

   In normal operations, controller A (Primary) sends NE4 a PCEP
   message containing the information about the controllers connected
   to it.  NE4 forwards the information to controller B (Secondary).
   The information includes:

      C = 1, A's current Position = 1, A's OldPosition = 1, A's
      Priority, NoControllers = 2, A's ID, B's ID

   When failures happen in the cluster, the live controllers act as
   follows:

   If the primary controller (e.g., A) is alive, it continues to be the
   primary controller.

   If the secondary controller (e.g., B) is alive and the primary
   controller is dead, the secondary controller promotes itself to be
   the new primary controller.  If the primary controller is alive but
   separated from the secondary controller, the secondary controller
   does not promote itself.
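
   The secondary controller's decision above can be sketched as
   follows.  This is only an illustration, not part of the protocol:
   the function name, its parameters, and the freshness window are
   assumptions made for the sketch.  It assumes that the secondary
   tracks (1) whether the in-cluster heartbeat from the primary is
   still arriving and (2) the time at which information about the
   primary was last received via the NE.

```python
from typing import Optional


def secondary_should_promote(heartbeat_alive: bool,
                             last_primary_update: Optional[float],
                             now: float,
                             freshness_window: float) -> bool:
    """Decide whether a live secondary controller promotes itself.

    All names and the freshness window are hypothetical; the draft only
    says the information must be updated "within a given time".
    """
    if heartbeat_alive:
        # The cluster is intact: the primary keeps its role.
        return False
    # The in-cluster connection to the primary has failed.  Check whether
    # the primary is still advertising itself through the shared NE.
    if (last_primary_update is not None
            and now - last_primary_update <= freshness_window):
        # Primary is alive but separated: do not promote.
        return False
    # No fresh information about the primary: treat it as dead.
    return True
```

   For instance, with a 5-second window, a secondary that lost the
   heartbeat but saw the primary's information 3 seconds ago stays
   secondary, while one that last saw it 10 seconds ago promotes
   itself.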

   With the extensions to PCEP, the secondary controller can determine
   the status of the primary controller from the information it
   receives about the primary controller.  The situation in which the
   primary controller is alive but separated from the secondary
   controller combines two conditions: (a) the connection between the
   primary controller and the secondary controller in the cluster has
   failed, and (b) both controllers are alive.  The secondary
   controller determines them as follows:

   For condition a, when the heartbeat from the primary stops, the
   secondary knows that the connection between the primary and the
   secondary controller has failed.

   For condition b, it checks whether the information about the primary
   controller has been updated within a given time.  If so, the primary
   controller is alive; otherwise, it is dead.

4.  Extensions to PCEP

   This section describes extensions to PCEP.

4.1.  Capability

   During PCEP session establishment, PCEP speakers (PCE or PCC)
   advertise their support for PCEP extensions for network reliability,
   in particular the High Availability of Controller cluster (HAC).  A
   new Controller HA Support Capability TLV is defined for HAC below.
   A PCEP speaker that supports HAC indicates this by including the TLV
   in the OPEN object in its OPEN message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type (TBD1)          |          Length (4)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Flags                            |C|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 2: Controller HA Support Capability TLV

   Type (16 bits):  TBD1 is to be assigned by IANA.

   Length (16 bits):  It indicates the length of the Capability value
      portion in octets, which is 4.

   Flags (32 bits):  One flag bit, C-bit, is defined.
      When it is set to one, it indicates that the PCEP speaker
      supports the high availability of controller cluster as a
      Controller.  When it is set to zero, it indicates that the PCEP
      speaker supports the high availability of controller cluster as
      a network element (NE).

   When two PCEP speakers establish a PCEP session between them, each
   speaker that supports HAC indicates this by including a Controller
   HA Support Capability TLV in the OPEN object in its OPEN message.

   If a PCEP speaker supporting HAC receives the Controller HA Support
   Capability TLV in the OPEN message from the other PCEP speaker over
   the PCEP session, it records that the other PCEP speaker (i.e., the
   remote end of the session) supports HAC; otherwise, it records that
   the other speaker does not.  Thus, for each of its PCEP sessions, it
   knows whether the remote PCEP speaker supports HAC.  If the C-bit in
   the received TLV is set to one, the remote PCEP speaker is a
   controller; otherwise, it is an NE.

   A PCE acting as a controller that supports HAC handles the
   information about the controllers in its cluster or group as
   follows:

   It sends the information in a PCEP message to each of a given set of
   NEs that run PCEP with HAC support whenever the information changes.
   The given set of NEs may be the one NE with the highest ID.

   It adjusts the positions of the controllers accordingly whenever
   there is a change in the information about the controllers received
   from an NE supporting HAC.

   An NE running PCEP with HAC support receives the information about
   the controllers from a PCE acting as a controller that supports HAC,
   and sends the information to every other such PCE with which it has
   a PCEP session, i.e., to all of them except the one from which the
   information was received.

4.2.  Controllers Object

   A new object, called the Controllers Object, is defined to contain
   the information about controllers.  A controller in a cluster may
   advertise the information in a PCEP Report message containing a
   Controllers Object of the following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Object-Class  |   OT  |Res|P|I|   Object Length (bytes)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                             TLVs                              +
   |                  (including Controllers TLV)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 3: Controllers Object

   Object-Class (8 bits):  It is to be assigned by IANA.  It identifies
      the PCEP object class.

   OT (4 bits):  It is to be assigned by IANA.  It identifies the PCEP
      object type.

   Res flags (2 bits):  Reserved field.  This field MUST be set to zero
      on transmission and MUST be ignored on receipt.

   P flag and I flag:  Refer to RFC 5440 [RFC5440], page 25.

   Object Length (16 bits):  It specifies the total object length
      including the header, in bytes.

   TLVs:  This field includes one TLV, called the Controllers TLV,
      which is defined below.

   Under the Controllers Object, a new TLV, called the Controllers TLV,
   is defined to contain the information about controllers.  It has the
   following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type (TBD2)          |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Flags    |C|   Position    |  OldPosition  |   Priority    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    | NoControllers |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Connected Controller 1 ID                   |
   :                                                               :
   |                   Connected Controller n ID                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 4: Controllers TLV

   Type (16 bits):  TBD2 is to be assigned by IANA.

   Length (16 bits):  It indicates the length of the value portion in
      octets.

   Flags (8 bits):  One flag bit, C-bit, is defined.  When set, it
      indicates that the position is the position of the current
      active primary controller.  In this case, C = 1 and Position = 1,
      which together indicate that the controller is the current
      active primary controller controlling the network.

   Position (8 bits):  It indicates the current/intent position of the
      controller in the controller cluster or group.  1: primary
      (first) controller, 2: secondary controller, 3: third controller,
      and so on (i.e., a Position of value n denotes the n-th
      controller in the cluster or group).

   OldPosition (8 bits):  It indicates the old position of the
      controller in the controller cluster before it is split.

   Priority (8 bits):  It indicates the priority of the controller to
      be elected as a primary controller.

   Reserved (24 bits):  Reserved field.  It must be set to zero on
      transmission and ignored on reception.

   NoControllers (8 bits):  It indicates the number of controllers
      connected to the controller advertising the TLV.

   Controller i ID (32 bits):  It represents the identifier (ID) of
      controller i at position i (i = 1, ..., n) in the cluster or
      group.
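
   The layout of the Controllers TLV can be illustrated with a short
   encoding and decoding sketch.  It is a non-normative example under
   stated assumptions: the Type value is a placeholder because TBD2 is
   not yet assigned, the class and field names are invented for the
   sketch, the C-bit is taken to be the low-order bit of the Flags
   octet (as drawn in Figure 4), and NoControllers is taken to equal
   the number of controller IDs carried, as in the examples in this
   document.

```python
import struct
from dataclasses import dataclass, field
from typing import List

# Placeholder only: the real Type value (TBD2) has not been assigned by IANA.
CONTROLLERS_TLV_TYPE = 0xFFFF


@dataclass
class ControllersTLV:
    c_flag: bool        # C-bit: sender is controlling the network
    position: int       # current/intent position (1 = primary)
    old_position: int   # position before the cluster was split
    priority: int       # priority to be elected as primary
    controller_ids: List[int] = field(default_factory=list)  # by position

    def encode(self) -> bytes:
        flags = 0x01 if self.c_flag else 0x00  # assumed: C is the low bit
        n = len(self.controller_ids)
        value = struct.pack("!BBBB", flags, self.position,
                            self.old_position, self.priority)
        value += struct.pack("!3xB", n)  # 24 reserved zero bits, NoControllers
        value += struct.pack(f"!{n}I", *self.controller_ids)
        # Length covers the value portion only, in octets.
        return struct.pack("!HH", CONTROLLERS_TLV_TYPE, len(value)) + value

    @classmethod
    def decode(cls, data: bytes) -> "ControllersTLV":
        tlv_type, length = struct.unpack_from("!HH", data, 0)
        if tlv_type != CONTROLLERS_TLV_TYPE:
            raise ValueError("not a Controllers TLV")
        flags, position, old_position, priority = struct.unpack_from(
            "!BBBB", data, 4)
        (n,) = struct.unpack_from("!3xB", data, 8)  # reserved bits ignored
        if length != 8 + 4 * n or len(data) < 4 + length:
            raise ValueError("inconsistent Controllers TLV length")
        ids = list(struct.unpack_from(f"!{n}I", data, 12))
        return cls(bool(flags & 0x01), position, old_position, priority, ids)
```

   With two 32-bit controller IDs, the value portion is 16 octets
   (4 + 4 + 2 * 4), so the encoded TLV is 20 octets long, and decoding
   the encoded bytes yields the original fields.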

5.  Recovery Procedure

   This section describes the recovery procedure for a controller
   cluster of n (n > 2) controllers, which are the primary controller
   A, the secondary controller B, ..., the n-th controller N.

   When failures happen in the cluster, it may be split into several
   separate groups of controllers.  In one policy, the group with the
   maximum number of controllers is responsible for controlling the
   network as the primary group of the cluster, in which the new
   primary controller, secondary controller, and so on are elected.

   For each separate group of controllers, the intent primary
   controller, secondary controller, and so on are elected.  The intent
   primary controller of the group advertises the information about its
   group.  The information includes its intent position, its old
   position, its priority to become a primary controller, the number of
   controllers in the group, and the identifiers of the controllers in
   the group.  The identifiers of the controllers are ordered according
   to their positions: the identifier of the intent primary controller,
   which has position 1, is the first one; the identifier of the intent
   secondary controller, which has position 2, is the second one; and
   so on.  Thus every separate group has the information about the
   other groups and can determine which group has the maximum number of
   controllers.

   In the case of a tie (i.e., two or more groups have the same maximum
   number of controllers), one policy is that the group containing the
   controller with the highest old position (e.g., the old primary
   controller) wins.  Another policy is that the group containing the
   controller with the highest priority wins.

   Some details of the recovery procedure at the current or intent
   primary controller of a controller cluster or group are as follows.

   In normal operations, the primary controller advertises the
   information about controllers containing:

      C = 1, Position = 1, OldPosition = 1, Primary Controller's
      Priority, NoControllers = n, Primary Controller's ID, Secondary
      Controller's ID, ..., and n-th Controller's ID.

   When failures cause the cluster to split, the intent primary
   controller advertises the information about controllers containing:

      C = 0, Position = 1, OldPosition = 1, Intent Primary Controller's
      Priority, NoControllers = m (m is the number of controllers in
      the group to which the intent primary controller belongs after
      the failures), Intent Primary Controller's ID, IDs of the other
      controllers connected.

   Then, after a given time, it checks whether its group has been
   elected as the primary group.  If so, it advertises the information
   about controllers containing:

      C = 1, Position = 1, OldPosition = 1, its Priority,
      NoControllers = m, the IDs of the controllers in the group.

   As an example, suppose that failures split the cluster into two
   separate groups: group 1 comprising A and C, and group 2 comprising
   B and N.  Each group elects its intent primary controller, secondary
   controller, and so on.  Suppose that controllers A and C are elected
   as the intent primary and secondary controllers respectively in
   group 1, and that controllers B and N are elected as the intent
   primary and secondary controllers respectively in group 2.

   Each of the intent primary controllers A and B advertises the
   information about the controllers in its group.  The information
   advertised by A includes:

      C = 0, Position = 1, OldPosition = 1, A's Priority,
      NoControllers = 2, A's ID, C's ID.

   The information advertised by B includes:

      C = 0, Position = 1, OldPosition = 2, B's Priority,
      NoControllers = 2, B's ID, N's ID.

   Groups 1 and 2 have the same number of controllers, which is 2.  But
   the OldPosition in group 1 (1, the old primary) ranks higher than
   that in group 2 (2, the old secondary).
   Group 1 is elected as the primary group, and the intent primary
   controller A in the primary group becomes the current primary
   controller.  After this determination, the information about the
   controllers in group 1 (i.e., the primary group) changes.  The
   updated information advertised by A includes:

      C = 1, Position = 1, OldPosition = 1, A's Priority,
      NoControllers = 2, A's ID, C's ID.

6.  IANA Considerations

   TBD

7.  Security Considerations

   TBD

8.  Acknowledgements

   TBD

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation
              Element (PCE) Communication Protocol (PCEP)", RFC 5440,
              DOI 10.17487/RFC5440, March 2009.

9.2.  Informative References

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

Authors' Addresses

   Huaimo Chen
   Futurewei
   Boston, MA
   USA

   Email: Huaimo.chen@futurewei.com


   Aijun Wang
   China Telecom
   Beiqijia Town, Changping District
   Beijing, 102209
   China

   Email: wangaj3@chinatelecom.cn


   Lei Liu
   Fujitsu
   USA

   Email: liulei.kddi@gmail.com


   Xufeng Liu
   Volta Networks
   McLean, VA
   USA

   Email: xufeng.liu.ietf@gmail.com