Network Working Group                                            H. Chen
Internet-Draft                                                 Futurewei
Intended status: Standards Track                                 A. Wang
Expires: 21 September 2022                                 China Telecom
                                                                  L. Liu
                                                                 Fujitsu
                                                                  X. Liu
                                                          Volta Networks
                                                           20 March 2022


                   PCE for Network High Availability
                   draft-chen-pce-ctr-availability-04

Abstract

   This document describes extensions to the Path Computation Element
   (PCE) Communication Protocol (PCEP) for improving the reliability or
   availability of a network controlled by a controller cluster.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 21 September 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminologies
   3.  PCE for Controller Cluster Reliability
     3.1.  Overview of Mechanism
     3.2.  Example
   4.  Extensions to PCEP
     4.1.  Capability
     4.2.  Controllers Object
   5.  Recovery Procedure
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   More and more networks are controlled by central controllers or
   controller clusters. Externally, a controller cluster appears as a
   single controller. Internally, it normally consists of two or more
   controllers working together to control a network, i.e., every
   network element (NE) in the network. The reliability or availability
   of a network is heavily dependent on its controller cluster. Issues
   or failures in the controller cluster may greatly impact the
   reliability or availability of the network.

   For a controller cluster comprising two or more controllers (i.e.,
   primary controller, secondary controller, and so on), failures in
   the cluster may split it into several separate controller groups.
   These groups are not aware of each other and may be out of
   synchronization.
   Two or more groups may be elected as primary groups to control the
   network at the same time, which may cause issues.

   This document proposes procedures and extensions to PCEP that allow
   the separated controllers or controller groups to learn about each
   other and thus correctly elect one new primary controller or
   controller group when the cluster is split because of failures in
   the cluster.

2.  Terminologies

   The following terms are used in this document.

   PCE:  Path Computation Element

   PCEP:  PCE Communication Protocol

   PCC:  Path Computation Client

   NE:  Network Element

   CE:  Customer Edge

   PE:  Provider Edge

3.  PCE for Controller Cluster Reliability

   This section outlines the mechanism for controller cluster
   reliability or availability using PCEP and illustrates some details
   through a simple example.

3.1.  Overview of Mechanism

   When a cluster of controllers is split into several separate groups
   because of failures in the cluster, the live controllers are still
   connected to the network (i.e., to network elements). Through some
   of these connections, each group can get information about the other
   groups. Based on this information, a new primary controller or
   controller group is correctly elected to control the network.

   Each controller has a PCEP session with each of a given number of
   the same NEs in the network. Each session is established and
   maintained over an IP path between the controller and the NE, and is
   a PCEP session with the extensions defined in this document.

   In one example configuration, the given set of NEs is the single NE
   with the highest node ID. Suppose that node PE2, acting as an NE,
   has the highest ID. The session between the primary controller
   (e.g., A) and the NE (e.g., PE2) is a PCEP session with extensions.
   Each of the non-primary controllers (e.g., B, C, ...) also creates
   and maintains a PCEP session with this NE (e.g., PE2).

   In normal operations, all controllers in the cluster are connected.
   They are the primary controller, which controls the network, the
   secondary controller, and so on, with current positions 1, 2, and so
   on, respectively. The primary controller advertises the information
   about the controllers via its PCEP sessions to the given number of
   the same NEs.

   For example, it sends the information in a PCEP message to the NE
   (e.g., PE2), which transfers the information to each of the other
   controllers via its PCEP sessions with them.

   When the cluster is split into several separate groups of
   controllers, each group elects an intent primary controller, an
   intent secondary controller, and so on from the group, which have
   intent positions 1, 2, and so on, respectively. The intent primary
   controller in each group advertises the information about the
   controllers in its group.

   The information advertised by the (intent) primary controller
   includes its current (intent) position, its old position, its
   priority to become a primary controller, the number of controllers
   in its group or cluster, and the IDs of the controllers, ordered
   according to their (intent) positions. In addition, a flag C
   indicates whether the advertising controller is currently
   Controlling the network: C is 1 for the current primary controller
   and 0 for an intent primary controller whose group has not yet been
   elected.

3.2.  Example

   Figure 1 shows a controller cluster comprising two controllers: the
   primary controller and the secondary controller. Each controller has
   a PCEP session with the same NE, which is NE4.

   +---------------------------------------------------+
   |                Controller Cluster                 |
   |                                                   |
   |  +------------+               +------------+      |
   |  |Controller A|  Synchronize  |Controller B|      |
   |  |(Primary)   +---------------+(Secondary) |      |
   |  +-----+------+               +------+-----+      |
   |        |                             |            |
   +--------|----Channels to Network------|------------+
            |                             |
            | <-- PCEP sessions between   | <-- PCEP session between
            |     A and NEi (i=1,2,..)    |     B and NE4
            |                             |
        .---+-----------------------------+---.
       (  o NE1                         o NE4  )
       (               Network                 )
       (  o NE2                  o NE3         )
        '-------------------------------------'

             Figure 1: Controller Cluster of 2 Controllers

   The primary PCE controller (i.e., A) has a PCEP session with each NE
   in the network, including NE4. The secondary controller (i.e., B)
   has a PCEP session with the same NE4, and this session is
   established and maintained over an IP path between B and NE4.

   In normal operations, controller A (Primary) sends NE4 a PCEP message
   containing the information about the controllers connected to it.
   NE4 transfers the information to controller B (Secondary). The
   information includes:

      C = 1, A's current Position = 1, A's OldPosition = 1, A's
      Priority, NoControllers = 2, A's ID, B's ID

   When failures happen in the cluster, the live controllers act as
   follows:

   If the primary controller (e.g., A) is alive, it continues to be the
   primary controller.

   If the secondary controller (e.g., B) is alive and the primary
   controller is dead, the secondary controller promotes itself to be
   the new primary controller. If the primary controller is alive but
   separated from the secondary controller, the secondary controller
   does not promote itself.
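
   The rule above for a live secondary controller can be sketched in
   Python as follows. This is only an illustrative sketch, not part of
   the protocol: the names Action and secondary_action and the two
   boolean inputs are assumptions introduced here, and how a controller
   obtains those two inputs is outside the scope of the sketch.

```python
from enum import Enum


class Action(Enum):
    """Possible outcomes for a live secondary controller (hypothetical names)."""
    STAY_SECONDARY = "stay secondary"
    PROMOTE = "promote to primary"


def secondary_action(primary_alive: bool, connected_to_primary: bool) -> Action:
    """Apply the promotion rule for a two-controller cluster.

    primary_alive: whether the primary controller is believed to be alive.
    connected_to_primary: whether the intra-cluster connection to the
        primary is still up (informational only in this sketch).
    """
    if primary_alive:
        # Covers both the normal case and the "alive but separated" case:
        # the secondary must not promote itself while the primary lives.
        return Action.STAY_SECONDARY
    # The primary is considered dead, so the secondary takes over.
    return Action.PROMOTE
```

   In this sketch the outcome depends only on whether the primary is
   alive; the connectivity input is kept to make the "alive but
   separated" case explicit to the reader.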

   With the extensions to PCEP, the secondary controller can determine
   the status of the primary controller based on the information it
   receives about the primary controller. The case in which the primary
   controller is alive but separated from the secondary controller
   combines two conditions: (a) the connection between the primary
   controller and the secondary controller in the cluster has failed,
   and (b) both controllers are alive. The secondary controller
   determines these conditions as follows:

   For condition a, when the heartbeat from the primary stops, the
   secondary knows that the connection between the primary and the
   secondary controller has failed.

   For condition b, it checks whether the information about the primary
   controller has been updated within a given time. If so, the primary
   controller is alive; otherwise, it is dead.

4.  Extensions to PCEP

   This section describes extensions to PCEP.

4.1.  Capability

   During PCEP session establishment, PCEP speakers (PCE or PCC)
   advertise their support for the PCEP extensions for network
   reliability, specifically the High Availability of Controller
   cluster (HAC). A new Controller HA Support Capability TLV is defined
   for HAC below. A PCEP speaker that supports HAC indicates this by
   including the TLV in the OPEN object in its OPEN message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type (TBD1)          |          Length (4)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Flags                            |C|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 2: Controller HA Support Capability TLV

   Type (16 bits):  TBD1 is to be assigned by IANA.

   Length (16 bits):  It indicates the length of the Capability value
      portion in octets, which is 4.

   Flags (32 bits):  One flag bit, the C-bit, is defined. When it is
      set to one, it indicates that the PCEP speaker supports the high
      availability of controller cluster as a controller. When it is
      set to zero, it indicates that the PCEP speaker supports the high
      availability of controller cluster as a network element (NE).

   When two PCEP speakers establish a PCEP session between them, each
   speaker that supports HAC includes a Controller HA Support
   Capability TLV in the OPEN object in its OPEN message.

   If a PCEP speaker supporting HAC receives the Controller HA Support
   Capability TLV in the OPEN message from the other PCEP speaker over
   the PCEP session, it records that the other PCEP speaker (i.e., the
   remote end of the session) supports HAC; otherwise, it records that
   the other speaker does not. Thus, for each of its PCEP sessions, it
   knows whether the remote PCEP speaker supports HAC. If the C-bit in
   the received TLV is set to one, the remote PCEP speaker is a
   controller; otherwise, it is an NE.

   A PCE acting as a controller that supports HAC handles the
   information about the controllers in its cluster or group as
   follows:

   It sends the information in a PCEP message to each of a given set of
   NEs that run PCEP with HAC support whenever the information changes.
   The given set of NEs may be the single NE with the highest ID.

   It adjusts the positions of the controllers accordingly whenever
   there is a change in the information about the controllers received
   from an NE supporting HAC.

   An NE running PCEP with HAC support receives the information about
   the controllers from a PCE acting as a controller that supports HAC,
   and sends the information to every other such PCE with which it has
   a PCEP session, i.e., to all of them except the one from which the
   information was received.

4.2.  Controllers Object

   A new object, called the Controllers Object, is defined to contain
   the information about controllers. A controller in a cluster may
   advertise the information in a PCEP Report message containing a
   Controllers Object of the following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Object-Class  |   OT  |Res|P|I|   Object Length (bytes)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                             TLVs                              +
   |                  (including Controllers TLV)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 3: Controllers Object

   Object-Class (8 bits):  It is to be assigned by IANA. It identifies
      the PCEP object class.

   OT (4 bits):  It is to be assigned by IANA. It identifies the PCEP
      object type.

   Res flags (2 bits):  Reserved field. This field MUST be set to zero
      on transmission and MUST be ignored on receipt.

   P flag and I flag:  Refer to Section 7.2 of RFC 5440 [RFC5440].

   Object Length (16 bits):  It specifies the total object length
      including the header, in bytes.

   TLVs:  This field includes one TLV, called the Controllers TLV,
      defined below.

   Under the Controllers Object, a new TLV, called the Controllers TLV,
   is defined to contain the information about controllers. It has the
   following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type (TBD2)          |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Flags    |C|   Position    |  OldPosition  |   Priority    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    | NoControllers |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Connected Controller 1 ID                   |
   :                                                               :
   |                   Connected Controller n ID                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 4: Controllers TLV

   Type (16 bits):  TBD2 is to be assigned by IANA.

   Length (16 bits):  It indicates the length of the value portion in
      octets.

   Flags (8 bits):  One flag bit, the C-bit, is defined. When set, it
      indicates that the position is the position of the current active
      primary controller. In this case, C = 1 and Position = 1, which
      indicate that the controller is the current active primary
      controller controlling the network.

   Position (8 bits):  It indicates the current/intent position of the
      controller in the controller cluster or group. 1: primary (first)
      controller, 2: secondary controller, 3: third controller, and so
      on (i.e., a Position of value n means the n-th controller in the
      cluster or group).

   OldPosition (8 bits):  It indicates the old position of the
      controller in the controller cluster before the cluster was
      split.

   Priority (8 bits):  It indicates the priority of the controller to be
      elected as a primary controller.

   Reserved (24 bits):  Reserved field. It must be set to zero on
      transmission and ignored on reception.

   NoControllers (8 bits):  It indicates the number of controllers in
      the cluster or group of the controller advertising the TLV. As in
      the examples in this document, the count includes the advertising
      controller itself and equals the number of controller IDs that
      follow.

   Connected Controller i ID (32 bits):  It represents the identifier
      (ID) of controller i at position i (i = 1, ..., n) in the cluster
      or group.
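
   The field layout in Figure 4 can be exercised with a small Python
   sketch that packs and unpacks the TLV in network byte order. It is
   a minimal illustration, not a reference implementation: the class
   name ControllersTlv is hypothetical, the type value 0xFFFF is only a
   stand-in for the unassigned TBD2, and the sketch assumes that
   NoControllers equals the number of controller IDs carried, so that
   the value portion is 8 + 4 * n octets long.

```python
import struct
from dataclasses import dataclass
from typing import List

# Stand-in for TBD2. This is NOT an IANA-assigned value.
CONTROLLERS_TLV_TYPE = 0xFFFF


@dataclass
class ControllersTlv:
    """Hypothetical in-memory form of the Controllers TLV of Figure 4."""
    c_flag: bool
    position: int
    old_position: int
    priority: int
    controller_ids: List[int]  # ordered by (intent) position

    def encode(self) -> bytes:
        # The C-bit is the last (least significant) bit of the Flags octet.
        flags = 0x01 if self.c_flag else 0x00
        n = len(self.controller_ids)
        value = struct.pack("!BBBB", flags, self.position,
                            self.old_position, self.priority)
        # 24-bit Reserved field (zero) followed by 8-bit NoControllers.
        value += b"\x00\x00\x00" + struct.pack("!B", n)
        value += b"".join(struct.pack("!I", cid)
                          for cid in self.controller_ids)
        # TLV header: 16-bit Type, 16-bit Length of the value portion.
        return struct.pack("!HH", CONTROLLERS_TLV_TYPE, len(value)) + value

    @classmethod
    def decode(cls, data: bytes) -> "ControllersTlv":
        tlv_type, length = struct.unpack_from("!HH", data, 0)
        if tlv_type != CONTROLLERS_TLV_TYPE:
            raise ValueError("not a Controllers TLV")
        if len(data) < 12 or len(data) < 4 + length:
            raise ValueError("truncated Controllers TLV")
        flags, position, old_position, priority = struct.unpack_from(
            "!BBBB", data, 4)
        n = data[11]  # NoControllers follows the 3 Reserved octets
        if length != 8 + 4 * n:
            raise ValueError("Length does not match NoControllers")
        ids = list(struct.unpack_from(f"!{n}I", data, 12))
        return cls(bool(flags & 0x01), position, old_position, priority, ids)
```

   For instance, a primary controller at position 1 advertising two
   controller IDs produces a 20-octet TLV (4 octets of header plus a
   16-octet value), and decoding those octets yields the same field
   values.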

5.  Recovery Procedure

   This section describes the recovery procedure for a controller
   cluster of n (n > 2) controllers, which are the primary controller A,
   the secondary controller B, ..., the n-th controller N.

   When failures happen in the cluster, it may be split into several
   separate groups of controllers. In one policy, the group with the
   maximum number of controllers is responsible for controlling the
   network as the primary group of the cluster, in which the new primary
   controller, secondary controller, and so on are elected.

   For each separated group of controllers, the intent primary
   controller, secondary controller, and so on are elected. The intent
   primary controller of the group advertises the information about its
   group. The information includes its intent position, its old
   position, its priority to become a primary controller, the number of
   controllers in the group, and the identifiers of the controllers in
   the group. The identifiers of the controllers are ordered according
   to their positions. The identifier of the intent primary controller,
   which has position 1, is the first one; the identifier of the intent
   secondary controller, which has position 2, is the second one; and so
   on. Thus every separated group has the information about the other
   groups and can determine which group has the maximum number of
   controllers.

   In the case of a tie (i.e., two or more groups have the same maximum
   number of controllers), one policy is that the group containing the
   controller with the highest old position (i.e., the smallest
   OldPosition value, e.g., the old primary controller) wins. In
   another policy, the group with the highest-priority controller wins.

   The remainder of this section details the recovery procedure as
   performed by the current or intent primary controller of a
   controller cluster or group.
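
   As a side illustration of the group-selection policy described
   above (the largest group wins; ties are broken by old position or by
   priority), a minimal Python sketch is given here. It is written
   under several assumptions that this document does not state: the
   names GroupInfo and elect_primary_group are hypothetical, each group
   is represented only by the values advertised by its intent primary
   controller, and a numerically larger Priority value is taken to mean
   a higher priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GroupInfo:
    """Values advertised by a group's intent primary (hypothetical type)."""
    primary_id: int      # ID of the group's intent primary controller
    old_position: int    # its OldPosition (1 = old primary controller)
    priority: int        # its Priority (assumed: larger value wins)
    no_controllers: int  # NoControllers, i.e., the size of the group


def elect_primary_group(groups: List[GroupInfo],
                        tie_break: str = "old_position") -> GroupInfo:
    """Pick the primary group: the largest group, then the tie-break policy."""
    if not groups:
        raise ValueError("no groups to choose from")
    largest = max(g.no_controllers for g in groups)
    candidates = [g for g in groups if g.no_controllers == largest]
    if tie_break == "old_position":
        # The highest old position is the smallest OldPosition value.
        return min(candidates, key=lambda g: g.old_position)
    if tie_break == "priority":
        return max(candidates, key=lambda g: g.priority)
    raise ValueError(f"unknown tie-break policy: {tie_break}")
```

   Because every group receives the same advertisements through the
   NEs, each group can run the same selection locally and reach the
   same result, provided that all groups apply the same tie-break
   policy.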

   In normal operations, the primary controller advertises the
   information about controllers containing:

      C = 1, Position = 1, OldPosition = 1, the primary controller's
      Priority, NoControllers = n, the primary controller's ID, the
      secondary controller's ID, ..., and the n-th controller's ID.

   When failures cause the cluster to split, the intent primary
   controller of a group advertises the information about controllers
   containing:

      C = 0, Position = 1, OldPosition = its position before the split
      (e.g., 1 for the old primary controller), the intent primary
      controller's Priority, NoControllers = m (m is the number of
      controllers in the group to which the intent primary controller
      belongs after the failures), the intent primary controller's ID,
      and the IDs of the other controllers connected.

   Then, after a given time, it checks whether its group has been
   elected as the primary group. If so, it advertises the information
   about controllers containing:

      C = 1, Position = 1, OldPosition unchanged, its Priority,
      NoControllers = m, and the IDs of the controllers in the group.

   As an example, suppose that failures split the cluster into two
   separate groups: group 1 comprising A and C, and group 2 comprising
   B and N. Each group elects its intent primary controller, secondary
   controller, and so on. Suppose that controllers A and C are elected
   as the intent primary and secondary controllers, respectively, in
   group 1, and that controllers B and N are elected as the intent
   primary and secondary controllers, respectively, in group 2.

   Each of the intent primary controllers A and B advertises the
   information about the controllers in its group. The information
   advertised by A includes:

      C = 0, Position = 1, OldPosition = 1, A's Priority,
      NoControllers = 2, A's ID, C's ID.

   The information advertised by B includes:

      C = 0, Position = 1, OldPosition = 2, B's Priority,
      NoControllers = 2, B's ID, N's ID.

   Groups 1 and 2 have the same number of controllers, which is 2.
   However, the OldPosition advertised for group 1 (value 1) ranks
   higher than that advertised for group 2 (value 2). Group 1 is
   therefore elected as the primary group, and the intent primary
   controller A in the primary group becomes the current primary
   controller. After this determination, the information about the
   controllers in group 1 (i.e., the primary group) changes. The
   updated information advertised by A includes:

      C = 1, Position = 1, OldPosition = 1, A's Priority,
      NoControllers = 2, A's ID, C's ID.

6.  IANA Considerations

   TBD

7.  Security Considerations

   TBD

8.  Acknowledgements

   TBD

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation
              Element (PCE) Communication Protocol (PCEP)", RFC 5440,
              DOI 10.17487/RFC5440, March 2009.

9.2.  Informative References

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

Authors' Addresses

   Huaimo Chen
   Futurewei
   Boston, MA
   United States of America
   Email: Huaimo.chen@futurewei.com

   Aijun Wang
   China Telecom
   Beiqijia Town, Changping District
   Beijing
   102209
   China
   Email: wangaj3@chinatelecom.cn

   Lei Liu
   Fujitsu
   United States of America
   Email: liulei.kddi@gmail.com

   Xufeng Liu
   Volta Networks
   McLean, VA
   United States of America
   Email: xufeng.liu.ietf@gmail.com