Network Working Group                                           K. Ogawa
Internet-Draft                                           NTT Corporation
Intended status: Standards Track                              W. M. Wang
Expires: August 23, 2012                   Zhejiang Gongshang University
                                                           E. Haleplidis
                                                    University of Patras
                                                           J. Hadi Salim
                                                       Mojatatu Networks
                                                       February 20, 2012


                   ForCES Intra-NE High Availability
                       draft-ietf-forces-ceha-03

Abstract

   This document discusses CE High Availability within a ForCES NE.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 23, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Definitions
   2.  Introduction
     2.1.  Document Scope
     2.2.  Quantifying Problem Scope
   3.  RFC5810 CE HA Framework
     3.1.  Current CE High Availability Support
       3.1.1.  Cold Standby Interaction with ForCES Protocol
       3.1.2.  Responsibilities for HA
   4.  CE HA Hot Standby
     4.1.  Changes to the FEPO model
     4.2.  FEPO processing
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix I - New FEPO version
   Authors' Addresses

1.  Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

   The following definitions are taken from [RFC3654] and [RFC3746]:

   Logical Functional Block (LFB) -- A template that represents a
   fine-grained, logically separate aspect of FE processing.

   ForCES Protocol -- The protocol used at the Fp reference point in the
   ForCES Framework in [RFC3746].

   ForCES Protocol Layer (ForCES PL) -- A layer in the ForCES
   architecture that embodies the ForCES protocol and the state transfer
   mechanisms as defined in [RFC5810].

   ForCES Protocol Transport Mapping Layer (ForCES TML) -- A layer in
   the ForCES protocol architecture that specifically addresses the
   protocol message transportation issues, such as how the protocol
   messages are mapped to different transport media (like SCTP, IP, TCP,
   UDP, ATM, Ethernet, etc.), and how to achieve and implement
   reliability, security, etc.

2.  Introduction

   Figure 1 illustrates a ForCES NE controlled by a set of redundant
   CEs, with CE1 being active and CE2 and CEn-1 being backups.

                       -----------------------------------------
                       |  ForCES Network Element               |
                       |                        +-----------+  |
                       |                        |   CEn-1   |  |
                       |                        |  (Backup) |  |
 --------------   Fc   | +------------+      +------------+ |  |
 | CE Manager |--------+-|    CE1     |------|    CE2     |-+  |
 --------------        | |  (Active)  |  Fr  |  (Backup)  |    |
       |               | +-------+--+-+      +---+---+----+    |
       | Fl            |         |  |    Fp      /   |         |
       |               |         |  +---------+ /    |         |
       |               |       Fp|            |/     |Fp       |
       |               |         |            |      |         |
       |               |         |     Fp    /+--+   |         |
       |               |         |  +-------+    |   |         |
       |               |         |  |            |   |         |
 --------------   Ff   | --------+--+--      ----+---+----+    |
 | FE Manager |--------+-|    FE1     |  Fi  |    FE2     |    |
 --------------        | |            |------|            |    |
                       | --------------      --------------    |
                       |   |  |  |  |          |  |  |  |      |
                       ----+--+--+--+----------+--+--+--+-------
                           |  |  |  |          |  |  |  |
                           |  |  |  |          |  |  |  |
                              Fi/f                Fi/f

      Fp: CE-FE interface
      Fi: FE-FE interface
      Fr: CE-CE interface
      Fc: Interface between the CE Manager and a CE
      Ff: Interface between the FE Manager and an FE
      Fl: Interface between the CE Manager and the FE Manager
      Fi/f: FE external interface

                     Figure 1: ForCES Architecture

   The ForCES architecture allows FEs to be aware of multiple CEs but
   enforces that only one CE be the master controller.  This is known in
   the industry as 1+N redundancy.  The master CE controls the FEs via
   the ForCES protocol operating on the Fp interface.  If the master CE
   becomes faulty, a backup CE takes over and NE operation continues.
   By definition, the currently documented setup is known as
   cold-standby.  The CE set is static and is passed to the FE by the FE
   Manager (FEM) via the Ff interface and to each CE by the CE Manager
   (CEM) via the Fc interface during the pre-association phase.

   From an FE perspective, the knobs of control for a CE set are defined
   by the FEPO LFB in [RFC5810], Appendix B.
   Section 3.1 of this document details these knobs further.

2.1.  Document Scope

   The reader is assumed to be familiar with the ForCES architecture,
   which is needed to make sense of the changes made here.  This
   document provides minimal background to set the context of the
   discussion in Section 4.

   By current definition, the Fr interface is out of scope for the
   ForCES architecture.  However, it is expected that organizations
   implementing a set of CEs will need to have the CEs communicate with
   each other via the Fr interface in order to achieve the
   synchronization necessary for controlling the FEs.

   The problem scope addressed by this document falls into two areas:

   1.  To describe with more clarity (than [RFC5810]) how the current
       cold-standby approach operates within the NE cluster.

   2.  To describe how to evolve the cold-standby setup to a hot-standby
       redundancy setup so as to improve the failover time and NE
       availability.

2.2.  Quantifying Problem Scope

   NE recovery and availability depend on several time-sensitive
   metrics:

   1.  How fast the CE plane failure is detected by the FE.

   2.  How fast a backup CE becomes operational.

   3.  How fast the FEs associate with the new master CE.

   4.  How fast the FEs recover their state and become operational.

   The current [RFC5810] design choices for meeting the above goals are
   driven by a desire for simplicity.

   To quantify the above criteria with the currently prescribed ForCES
   CE setup in [RFC5810]:

   1.  How fast the CE side detects a CE failure is left undefined.  To
       illustrate an extreme scenario, we could have a human operator
       acting as the monitoring entity to detect faulty CEs.  How fast
       such detection happens could be in the range of seconds to days.
       A more active monitor on the Fr interface could improve this
       detection.

   2.  How fast the backup CE becomes operational is also currently out
       of scope.  In the current setup, a backup CE need not be
       operational at all (for example, to save power), and therefore it
       is feasible for a monitoring entity to boot up a backup CE after
       it detects the failure of the master CE.  In Section 4 of this
       document we suggest that at least one backup CE be online so as
       to improve this metric.

   3.  How fast an FE associates with the new master CE is also
       currently undefined.  The cost of an FE connecting and
       associating adds to the recovery overhead.  As mentioned above,
       we suggest having at least one backup CE online.  In Section 4 we
       propose to zero out the connection and association cost on
       failover by having each FE associate with all online backup CEs
       after associating with the active CE.  Note that if an FE
       pre-associates with backup CEs, then the system will technically
       be operating in hot-standby mode.

   4.  Finally, how fast an FE recovers its state depends on how much NE
       state exists.  By the current ForCES definition, the new master
       CE assumes zero state on the FE and starts from scratch to update
       the FE.  So the larger the state, the longer the recovery.

3.  RFC5810 CE HA Framework

   To achieve CE High Availability, FEs and CEs MUST inter-operate per
   the [RFC5810] definition, which is repeated for contextual reasons in
   Section 3.1.  It should be noted that in this default setup, which
   MUST be implemented by CEs and FEs needing HA, the Fr plane is out of
   scope (and, if available, is proprietary to an implementation).

3.1.  Current CE High Availability Support

   As mentioned earlier, although there can be multiple redundant CEs,
   only one CE actively controls FEs in a ForCES NE.  In practice there
   may be only one backup CE.  At any moment in time only one master CE
   can control the FEs.  In addition, the FE connects and associates to
   only the master CE.  The FE and the CE PL are aware of the primary
   and one or more secondary CEs.
   This information (primary, secondary CEs) is configured on the FE and
   the CE PLs during pre-association by the FEM and the CEM
   respectively.

   Figure 2 below illustrates the ForCES message sequences that the FE
   uses to recover the connection in the currently defined cold-standby
   scheme.

      FE                 CE Primary          CE Secondary
      |                       |                    |
      |  Asso Estb,Caps exchg |                    |
    1 |<--------------------->|                    |
      |                       |                    |
      |     state update      |                    |
    2 |<--------------------->|                    |
      |                       |                    |
      |                       |                    |
      |                    FAILURE                 |
      |                                            |
      |      Asso Estb,Caps exchange               |
    3 |<------------------------------------------>|
      |                                            |
      |      Event Report (pri CE down)            |
    4 |------------------------------------------->|
      |                                            |
      |      state update from scratch             |
    5 |<------------------------------------------>|

                Figure 2: CE Failover for Cold Standby

3.1.1.  Cold Standby Interaction with ForCES Protocol

   High Availability parameterization in an FE is driven by configuring
   the FE Protocol Object (FEPO) LFB.

   The FEPO CEID component identifies the current master CE and the
   component table BackupCEs identifies the backup CEs.  The FEPO FE
   Heartbeat Interval, CE Heartbeat Dead Interval, and CE Heartbeat
   policy help in detecting connectivity problems between an FE and CE.
   The CE Failover policy defines how the FE should react on a detected
   failure.

   Figure 3 illustrates the defined state machine that facilitates
   connection recovery.

   The FE connects to the CE specified in the FEPO CEID component.  If
   it fails to connect to the defined CE, it moves that CE to the bottom
   of table BackupCEs and sets its CEID component to be the first CE
   retrieved from table BackupCEs.  The FE then attempts to associate
   with the CE designated as the new primary CE.  The FE continues
   through this procedure until it successfully connects to one of the
   CEs.
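
   The rotation through the CE set described above can be sketched as
   follows.  This is only an illustration under stated assumptions and
   is not part of [RFC5810]: the function name, the try_connect
   callback, and the max_attempts bound are hypothetical and exist here
   only so that the example is self-contained and terminates.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple


def cold_standby_connect(
    ceid: int,
    backup_ces: Sequence[int],
    try_connect: Callable[[int], bool],
    max_attempts: int,
) -> Tuple[bool, int, List[int]]:
    """Rotate through the CE set until one CE accepts the connection.

    Returns (connected, CEID component, BackupCEs table) as they stand
    when the loop stops. max_attempts is an assumption of this sketch;
    the text itself describes an unbounded retry.
    """
    backups = deque(backup_ces)
    for _ in range(max_attempts):
        if try_connect(ceid):
            return True, ceid, list(backups)
        # The CE that could not be reached moves to the bottom of BackupCEs ...
        backups.append(ceid)
        # ... and the first entry of BackupCEs becomes the new CEID.
        ceid = backups.popleft()
    return False, ceid, list(backups)
```

   With CEID 1, BackupCEs [2, 3], and only CE 3 reachable, the sketch
   ends with CEID 3 and BackupCEs [1, 2].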

                          FE tries to associate
                               +-->-----+
                               |        |
   (CE issues Teardown ||  +---+--------v----+
    Lost association) &&   | Pre-Association |
   CE failover policy = 0  | (Association    |
     +------------>-->-->  |   in            +<----+
     |                     | progress)       |     |
     |      CE Issues      +--------+--------+     |
     |     Association              |              | CEFTI
     |      Response                V              | timer
     |           ___________________+              | expires
     |          |                                  ^
     |          V                                  |
   +-+-----------+                          +------+-----+
   |             |                          |    Not     |
   |             | (CE issues Teardown ||   | Associated |
   |             |  Lost association) &&    |            +->---+
   | Associated  | CE Failover Policy = 1   |(May        | FE  |
   |             |                          | Continue   |try  v
   |             |-------->------->------>  | Forwarding)|assn |
   |             |                          |            |-<---+
   |             |                          |            |
   +-------------+                          +-------+----+
        ^                                           |
        |               CE Issues                   v
        |              Association                  |
        |                 Setup                     |
        +___________________________________________+

              Figure 3: FE State Machine considering HA

   When communication fails between the FE and CE (which can be caused
   by either CE or link failure, but is not FE related), either the TML
   on the FE will notify the FE PL of the failure or the failure will be
   detected using the HB messages between FEs and CEs.  The
   communication failure, regardless of how it is detected, MUST be
   considered as a loss of association between the CE and the
   corresponding FE.

   If the FE's FEPO CE Failover Policy is configured to mode 0 (the
   default), the FE will immediately transition to the pre-association
   phase.  This means that if association is established again, all FE
   state will need to be re-established.

   If the FE's FEPO CE Failover Policy is configured to mode 1, it
   indicates that the FE is capable of HA restart recovery.  In such a
   case, the FE transitions to the Not Associated state and the CEFTI
   timer [RFC5810] is started.  The FE MAY continue to forward packets
   during this state.  It MAY also recycle through any configured backup
   CEs in a round-robin fashion.
   It first adds its primary CE to the bottom of table BackupCEs and
   sets its CEID component to be the first secondary retrieved from
   table BackupCEs.  The FE then attempts to associate with the CE
   designated as the new primary CE.  If it fails to re-associate with
   any CE and the CEFTI expires, the FE then transitions to the
   pre-association state.

   If the FE, while in the Not Associated state, manages to reconnect to
   a new primary CE before the CEFTI expires, it transitions to the
   Associated state.  Once re-associated, the CE tries to synchronize
   any state that the FE may have lost during the Not Associated state.
   How the CE re-synchronizes such state is out of scope for the current
   ForCES architecture but would include issuing new configs and
   queries.

   An explicit message (a Config message setting the Primary CE
   component in the ForCES Protocol object) from the primary CE can also
   be used to change the Primary CE for an FE during normal protocol
   operation.  In this case, the FE transitions to the Not Associated
   state and attempts to associate with the new CE.

3.1.2.  Responsibilities for HA

   TML Level:

   1.  The TML controls logical connection availability and failover.

   2.  The TML also controls peer HA management.

   At this level, control of all lower layers, for example the transport
   level (such as IP addresses, MAC addresses, etc.), and the handling
   of associated links going down are the role of the TML.

   PL Level:

   All other functionality, including configuring the HA behavior during
   setup, the CE IDs used to identify primary and secondary CEs,
   protocol messages used to report CE failure (Event Report), Heartbeat
   messages used to detect association failure, messages to change the
   primary CE (Config), and other HA related operations described in
   Section 3.1, are the PL's responsibility.

   To put the two together: if a path to a primary CE is down, the TML
   takes care of failing over to a backup path, if one is available.  If
   the CE is totally unreachable, then the PL is informed and takes the
   appropriate actions described before.

4.  CE HA Hot Standby

   In this section we describe small extensions to the existing scheme
   to enable hot standby HA.  To achieve hot standby HA, we target
   specific goals defined in Section 2.2, namely:

   o  How fast a backup CE becomes operational.

   o  How fast the FEs associate with the new master CE.

   As described in Section 3.1, in the pre-association phase the FEM
   configures the FE to make it aware of all the CEs in the NE.  The FEM
   MUST configure the FE to make it aware of which CE is the master and
   MAY specify any backup CE(s).

4.1.  Changes to the FEPO model

   In order for the above to be achievable, a few changes to the FEPO
   model are needed.  Appendix I contains the XML definition of the new
   version 2 of the FEPO LFB.

   Changes from version 1 of the FEPO are:

   1.  Addition of a new datatype, status (unsigned char), with special
       values 0 (Disconnected), 1 (Connected), 2 (Associated), 3
       (Lost_Connection) and 4 (Unreachable).

   2.  Component BackupCEs (9) is renamed to AllCEs.  Instead of an
       Array of unsigned integers (CEID), it MUST be an Array in which
       each entry holds an unsigned integer (CEID) and an unsigned char
       (status) for one CE.

   3.  Addition of two special values to CEFailoverPolicyValues: 2 (High
       Availability without Graceful Restart) and 3 (High Availability
       with Graceful Restart).

   4.  Addition of one Event, the HAPrimaryCEDown event, which reports
       the last known CEID and the tentative new master CEID.

   As the FEPO component 9 is not backwards compatible with the previous
   version, there is an issue of interoperability between CE and FE.

   However, this is a pre-association version mismatch, and the managers
   have to identify the issue and not allow an association that would
   fail or cause problems.

4.2.  FEPO processing

   The FE's FEPO LFB version 2 AllCEs table (previously BackupCEs)
   contains all the CEIDs that the FE may connect and associate with.
   The ordering of the CE IDs in this table defines the priority order
   in which an FE will connect to the CEs.  In the pre-association
   phase, the first CE ID (lowest table index) in the AllCEs table MUST
   be the first CE ID that the FE will attempt to connect and associate
   with.  If the FE fails to connect and associate with the first CE ID,
   it will attempt to connect to the second CE ID and so forth, cycling
   back to the beginning of the list until there is a connection and an
   association.  The FE MUST associate with at least one CE.  Upon a
   successful association, the FEPO's CEID component identifies the
   current associated master CE.

   For the sake of simplicity, the FE MUST respond to messages issued
   only by the master CE.  This simplifies the synchronization and
   avoids the concept of locking FE state; i.e., the FE MUST drop any
   messages from backup CEs.  However, asynchronous events that the
   master CE has subscribed to, as well as heartbeats, are sent to all
   associated-to CEs.  Packet redirects continue to be sent only to the
   master CE.  The Heartbeat Interval, the CEHB Policy and the FEHB
   Policy MUST be the same for all CEs.

   Figure 4 illustrates the state machine that facilitates connection
   recovery with High Availability enabled.
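
   The message-handling rules of this section can be sketched as
   follows.  This is a minimal illustration, not a normative model: the
   class and method names (Fepo, AllCEsEntry, accepts_from, recipients)
   and the string message kinds are assumptions of this sketch.  Only
   the status values 0 to 4 are taken from Section 4.1.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class CEStatus(IntEnum):
    # Special values of the status datatype listed in Section 4.1.
    DISCONNECTED = 0
    CONNECTED = 1
    ASSOCIATED = 2
    LOST_CONNECTION = 3
    UNREACHABLE = 4


@dataclass
class AllCEsEntry:
    ceid: int
    status: CEStatus


@dataclass
class Fepo:
    ceid: int                    # CEID component: the current master CE
    all_ces: List[AllCEsEntry]   # AllCEs table, in priority order

    def accepts_from(self, sender_ceid: int) -> bool:
        """Only messages issued by the master CE are processed."""
        return sender_ceid == self.ceid

    def recipients(self, kind: str) -> List[int]:
        """CEIDs that an FE-originated message of the given kind goes to."""
        if kind == "redirect":
            # Packet redirects go to the master CE only.
            return [self.ceid]
        if kind in ("event", "heartbeat"):
            # Events and heartbeats go to every associated CE.
            return [e.ceid for e in self.all_ces
                    if e.status == CEStatus.ASSOCIATED]
        raise ValueError(f"unknown message kind: {kind}")
```

   For an FE whose master is CE 1 and which is also associated with
   CE 2, accepts_from(2) is false while recipients("heartbeat") contains
   both CE 1 and CE 2.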

                          FE tries to associate
                                +-->-----+
                                |        |
                                ^        v
   (CE issues Teardown ||  +----+--------+---+
    Lost association) &&   | Pre-Association |
   CE failover policy = 0  | (Association    +<-------------------+
     +------------>-->-->  |   in            +<-----+             |
     |                     | progress)       |      |             |
     |      CE Issues      +--------+--------+      |             |
     |     Association              |               |             |
     |      Response                V      Not Found || CEFTI     |
     |           ___________________+         timer expires       |
     |          |                                   |             |
     |          V                                   ^             |
   +-+-----------+                           +------+------+      |
   |             |                           |     Not     |      |
   |             | (CE issues Teardown ||    | Associated  |      |
   |             |  Lost association) &&     |             |    CEFTI
   | Associated  | (CE Failover Policy=2||   | (May        |    timer
   |             |  CE Failover Policy=3)    |  Continue   |   expires
   |             +---------->------->----->  |  Forwarding)|      |
   |             |                           |             |      |
   |             |                           | Search for  |      |
   |             |                 +--------->|    next     |      |
   |             |                 |         | associated  |      |
   |             |                 |         |     CE      |      |
   +-------------+                 |         +-------------+      |
        ^                          |                V             |
        |                          |                |             |
        |                          |            Found CE          |
        |                    CEHDI Expires    Send Event of       |
        |                          |           New CE ID.         |
        |                          |                |             |
        |                          |                V             |
        |                          |         +------+------+      |
        |                          ^---------+   Confirm   +------^
        |                                    |    State    |
        |               Received       +---->|             |
        |               different      |     | Wait for CE |
        |               CE ID.         ^     | to confirm  |
        |               Resend Event   |     |  new CE ID  |
        |                              +----<|             |
        |                                    +-----+-------+
        |           Received same CE ID            |
        +__________________________________________+

              Figure 4: FE State Machine considering HA

   Once the FE has associated with a master CE it moves to the
   post-association phase (Associated state).  In this state, the master
   CE MAY update the list of backup CEs.  It MAY also instruct the FE to
   use a different master CE.  It is assumed that the master CE will
   communicate with other CEs within the NE for the purpose of
   synchronization via the CE-CE interface.  The CE-CE interface is out
   of scope for this document.

      FE                   CE#1         CE#2 ... CE#N
      |                      |            |        |
      | Asso Estb,Caps exchg |            |        |
    1 |<-------------------->|            |        |
      |                      |            |        |
      |     state update     |            |        |
    2 |<-------------------->|            |        |
      |                      |            |        |
      |         Asso Estb,Caps exchg      |        |
   3I |<--------------------------------->|        |
     ...                    ...          ...      ...
      |              Asso Estb,Caps exchg          |
   3N |<------------------------------------------>|
      |                      |            |        |
    4 |<-------------------->|            |        |
      .                      .            .        .
   4x |<-------------------->|            |        |
      |                   FAILURE         |        |
      |                      |            |        |
      | Event Report (CE#2 is new master) |        |
    5 |---------------------------------->|------->|
      |                                   |        |
      | Config (Set CEID to CEID of CE#2) |        |
    6 |<----------------------------------|        |
    7 |<--------------------------------->|        |
      .                      .            .        .
   7x |<--------------------------------->|        |
      .                      .            .        .

                Figure 5: CE Failover for Hot Standby

   While in the post-association phase, if the CE Failover Policy is set
   to 2 (High Availability without Graceful Restart) or 3 (High
   Availability with Graceful Restart), then the FE, after successfully
   associating with the master CE, MUST attempt to connect and associate
   with all the CEs that it is aware of.  In Figure 5, steps #1 and #2
   illustrate the FE associating with CE#1 as the master, and steps #3I
   to #3N illustrate the subsequent association with backup CEs CE#2 to
   CE#N.  If the FE fails to connect or associate with some CEs, the FE
   MAY flag them as unreachable to avoid continuous attempts to connect.
   The FE MAY retry associating with unreachable CEs when possible.

   When the master CE is, for any reason, considered to be down, the FE
   will try to find the first associated CE from the list of all CEs in
   a round-robin fashion.

   If the FE is unable to find an associated CE in its list of CEs, then
   it will attempt to connect and associate with the first CE in the
   list of all CEs and continue in a round-robin fashion until it
   connects and associates with a CE.
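
   The search for the next associated CE can be sketched as follows.
   The text does not state where the round-robin search starts; this
   sketch assumes it starts at the table entry after the failed master
   and wraps around.  The function name and the (CEID, status) tuple
   representation of the AllCEs table are also assumptions made for
   illustration.

```python
from typing import List, Optional, Tuple

ASSOCIATED = 2  # status value "Associated" from Section 4.1


def next_master_candidate(
    all_ces: List[Tuple[int, int]], failed_ceid: int
) -> Optional[int]:
    """Return the CEID of the first associated CE after the failed master.

    all_ces is the AllCEs table as (CEID, status) pairs in table order.
    None means no CE is associated, in which case the FE falls back to
    connecting and associating from the top of the list.
    """
    ids = [ceid for ceid, _ in all_ces]
    # Assumption: start right after the failed master, else at index 0.
    start = ids.index(failed_ceid) + 1 if failed_ceid in ids else 0
    count = len(all_ces)
    for offset in range(count):
        ceid, status = all_ces[(start + offset) % count]
        if ceid != failed_ceid and status == ASSOCIATED:
            return ceid
    return None
```

   A None result corresponds to the fallback case in which the FE has to
   connect and associate again, starting from the first CE in the list.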

   Once the FE selects the associated CE to use as the new master, the
   FE sends a High Availability Primary CE Changed Event Notification to
   all associated CEs to notify them that the primary CE is down and
   which CE the reporting FE considers to be the new master.

   The new master CE MUST configure the CEID component of the FE within
   the time limit defined in the CEHDI Failover Timeout as a
   confirmation that the FE made the right choice.

      FE                   CE#1         CE#2 ... CE#N
      |                      |            |        |
      | Asso Estb,Caps exchg |            |        |
    1 |<-------------------->|            |        |
      |                      |            |        |
      |     state update     |            |        |
    2 |<-------------------->|            |        |
      |                      |            |        |
      |         Asso Estb,Caps exchg      |        |
   3I |<--------------------------------->|        |
      |                      |            |        |
     ...                    ...          ...      ...
      |              Asso Estb,Caps exchg          |
   3N |<------------------------------------------>|
      |                      |            |        |
    4 |<-------------------->|            |        |
      .                      .            .        .
   4x |<-------------------->|            |        |
      |                   FAILURE         |        |
      |                      |            |        |
      | Event Report (CE#2 is new master) |        |
    5 |---------------------------------->|------->|
      |                      |            |        |
      |      CEHDI Failover Timeout       |        |
      |                      |            |        |
      | Event Report (CE#N is new master) |        |
    6 |---------------------------------->|------->|
      |                      |            |        |
      |     Config (Set CEID to CEID of CE#N)      |
    7 |<-------------------------------------------|
   8a |<------------------------------------------>|
      .                      .            .        .
   8x |<------------------------------------------>|

                Figure 6: CE Failover for Hot Standby

   If the FE does not get confirmation within the CEHDI Failover
   Timeout, it picks the next CE on its list and advertises it as the
   new master.  In Figure 6, step #5 illustrates the FE selecting CE#2
   as its new master.  In step #6, after the timeout has occurred, the
   FE picks CE#N as its new master.  The FE receives confirmation that
   CE#N is the new master in step #7.
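
   The advertise-and-wait behaviour shown in Figure 6 can be sketched as
   follows.  This is a simplified illustration: confirmed_in_time is a
   hypothetical hook standing in for "send the event naming this CE and
   wait for a Config of the CEID component until the timeout", and the
   case in which a CE answers with a different CEID is not modelled.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def confirm_new_master(
    candidates: Iterable[int],
    confirmed_in_time: Callable[[int], bool],
) -> Tuple[Optional[int], List[int]]:
    """Advertise candidates in order until one confirms before the timeout.

    Returns the confirmed master CEID (None if nobody confirmed) and the
    list of CEIDs that were advertised, in order.
    """
    advertised: List[int] = []
    for ceid in candidates:
        # One event report per candidate (steps #5 and #6 in Figure 6).
        advertised.append(ceid)
        if confirmed_in_time(ceid):
            return ceid, advertised
        # Timeout: fall through and advertise the next CE on the list.
    return None, advertised
```

   With candidates [2, 5] and only CE 5 confirming in time, the result
   is master 5 after advertising CE 2 and then CE 5, which mirrors the
   sequence of Figure 6 with CE 5 standing in for CE#N.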

   If the CE that the FE assumed to be the master discovers that it
   should not be the new master CE, then it will configure the CEID with
   the ID of the proper master CE.  How the CEs decide which CE is the
   new master is also out of scope of this document and is assumed to be
   done via a CE-CE communication protocol.  The FE must then associate
   with the new CE.

   If the CEFTI timer expires in either the Not Associated or the
   Confirm state without a new master CE having been confirmed, then the
   FE MUST revert to the pre-association stage.

   In most High Availability architectures there exists the possibility
   of split-brain.  However, since in our setup the FE will never accept
   any configuration messages from any CE other than the master CE, we
   consider the FE as fenced against data corruption from other CEs that
   consider themselves the master.  The split-brain issue becomes mostly
   a CE-CE communication problem, which is considered to be out of
   scope.

   By virtue of having multiple CE connections, the FE switchover to a
   new master CE will be considerably faster.  The overall effect is an
   improvement of the NE recovery time in case of communication failure
   or faults of the master CE.  This satisfies the requirement we set
   out to achieve.

5.  IANA Considerations

   TBA

6.  Security Considerations

   TBA

7.  References

7.1.  Normative References

   [RFC5810]  Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang,
              W., Dong, L., Gopal, R., and J. Halpern, "Forwarding and
              Control Element Separation (ForCES) Protocol
              Specification", RFC 5810, March 2010.

7.2.  Informative References

   [RFC3654]  Khosravi, H. and T. Anderson, "Requirements for Separation
              of IP Control and Forwarding", RFC 3654, November 2003.

   [RFC3746]  Yang, L., Dantu, R., Anderson, T., and R. Gopal,
              "Forwarding and Control Element Separation (ForCES)
              Framework", RFC 3746, April 2004.

   [RFC5812]  Halpern, J. and J.
              Hadi Salim, "Forwarding and Control Element Separation
              (ForCES) Forwarding Element Model", RFC 5812, March 2010.

Appendix I - New FEPO version

   XXX: Describe this to conform to LFB extensions as prescribed in the
   model

   CEHBPolicyValues
      The possible values of CE heartbeat policy
      uchar
         CEHBPolicy0
            The CE heartbeat policy 0
         CEHBPolicy1
            The CE heartbeat policy 1

   FEHBPolicyValues
      The possible values of FE heartbeat policy
      uchar
         FEHBPolicy0
            The FE heartbeat policy 0
         FEHBPolicy1
            The FE heartbeat policy 1

   FERestartPolicyValues
      The possible values of FE restart policy
      uchar
         FERestartPolicy0
            The FE restart policy 0

   CEFailoverPolicyValues
      The possible values of CE failover policy
      uchar
         CEFailoverPolicy0
            The CE failover policy 0
            No High Availability or Graceful Restart.
         CEFailoverPolicy1
            Graceful Restart
         CEFailoverPolicy2
            High Availability without Graceful Restart
         CEFailoverPolicy3
            High Availability with Graceful Restart

   FEHACapab
      The supported HA features
      uchar
         GracefullRestart
            The FE supports Graceful Restart
         HA
            The FE supports HA

   CEStatusType
      Status values.  Status for each CE.
      uchar
         Disconnected
            No connection attempt with the CE yet.
         Connected
            The FE has connected with the CE.
         Associated
            The FE has associated with the CE.
         Lost_Connection
            The FE was associated with the CE
            but lost the connection.
         Unreachable
            The CE is deemed as unreachable by the FE.

   AllCEType
      Table Type for AllCE component.
         CEID
            ID of the CE
            uint32
         CEStatus
            Status of the CE
            CEStatusType

   FEPO
      The FE Protocol Object
      2.0

         CurrentRunningVersion
            Currently running ForCES version
            u8
         FEID
            Unicast FEID
            uint32
         MulticastFEIDs
            the table of all multicast IDs
            uint32
         CEHBPolicy
            The CE Heartbeat Policy
            CEHBPolicyValues
         CEHDI
            The CE Heartbeat Dead Interval in millisecs
            uint32
         FEHBPolicy
            The FE Heartbeat Policy
            FEHBPolicyValues
         FEHI
            The FE Heartbeat Interval in millisecs
            uint32
         CEID
            The Primary CE this FE is associated with
            uint32
         AllCEs
            The table of all CEs.
            AllCEType
         CEFailoverPolicy
            The CE Failover Policy
            CEFailoverPolicyValues
         CEFTI
            The CE Failover Timeout Interval in millisecs
            uint32
         FERestartPolicy
            The FE Restart Policy
            FERestartPolicyValues
         LastCEID
            The Primary CE this FE was last associated with
            uint32

         SupportableVersions
            the table of ForCES versions that FE supports
            u8
         HACapabilities
            the table of HA capabilities the FE supports
            FEHACapab

         PrimaryCEDown
            The primary CE has changed
            LastCEID
            LastCEID
         HAPrimaryCEDown
            The primary CE has changed
            LastCEID
            CEID
            LastCEID

Authors' Addresses

   Kentaro Ogawa
   NTT Corporation
   3-9-11 Midori-cho
   Musashino-shi, Tokyo  180-8585
   Japan

   Email: ogawa.kentaro@lab.ntt.co.jp


   Weiming Wang
   Zhejiang Gongshang University
   149 Jiaogong Road
   Hangzhou  310035
   P.R.China

   Phone: +86-571-88057712
   Email: wmwang@mail.zjgsu.edu.cn


   Evangelos Haleplidis
   University of Patras
   Patras
   Greece

   Email: ehalep@ece.upatras.gr


   Jamal Hadi Salim
   Mojatatu Networks
   Ottawa, Ontario
   Canada

   Email: hadi@mojatatu.com