Dynamic Host Configuration (DHC)                            T.
                                                               Mrugalski
Internet-Draft                                                       ISC
Intended status: Standards Track                              K. Kinnear
Expires: March 17, 2014                                            Cisco
                                                      September 13, 2013

                         DHCPv6 Failover Design
                draft-ietf-dhc-dhcpv6-failover-design-04

Abstract

   DHCPv6 defined in [RFC3315] does not offer server redundancy. This
   document defines a design for DHCPv6 failover, a mechanism for
   running two servers on the same network with capability for either
   server to take over clients' leases in case of server failure or
   network partition. This is a DHCPv6 Failover design document; it is
   not a protocol specification document. It is the second document in
   a planned series of three documents. DHCPv6 failover requirements
   are specified in [I-D.ietf-dhc-dhcpv6-failover-requirements]. A
   protocol specification document is planned to follow this document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 17, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Requirements Language
   2.  Glossary
   3.  Introduction
     3.1.  Design Requirements
     3.2.  Features out of Scope: Load Balancing
   4.  Protocol Overview
     4.1.  Failover State Machine Overview
     4.2.  Messages
   5.  Connection Management
     5.1.  Creating Connections
     5.2.  Endpoint Identification
   6.  Resource Allocation
     6.1.  Proportional Allocation
     6.2.  Independent Allocation
     6.3.  Choosing Allocation Algorithm
   7.  Information model
   8.  Failover Mechanisms
     8.1.  Time Skew
     8.2.  Lazy updates
     8.3.  MCLT concept
       8.3.1.  MCLT example
     8.4.  Unreachability detection
     8.5.  Re-allocating Leases
     8.6.  Sending Binding Update
     8.7.  Receiving Binding Update
     8.8.  Conflict Resolution
     8.9.  Acknowledging Reception
   9.  Endpoint States
     9.1.  State Machine Operation
     9.2.  State Machine Initialization
     9.3.  STARTUP State
       9.3.1.  Operation in STARTUP State
       9.3.2.  Transition Out of STARTUP State
     9.4.  PARTNER-DOWN State
       9.4.1.  Operation in PARTNER-DOWN State
       9.4.2.  Transition Out of PARTNER-DOWN State
     9.5.  RECOVER State
       9.5.1.  Operation in RECOVER State
       9.5.2.  Transition Out of RECOVER State
     9.6.  RECOVER-WAIT State
       9.6.1.  Operation in RECOVER-WAIT State
       9.6.2.  Transition Out of RECOVER-WAIT State
     9.7.  RECOVER-DONE State
       9.7.1.  Operation in RECOVER-DONE State
       9.7.2.  Transition Out of RECOVER-DONE State
     9.8.  NORMAL State
       9.8.1.  Operation in NORMAL State
       9.8.2.  Transition Out of NORMAL State
     9.9.  COMMUNICATIONS-INTERRUPTED State
       9.9.1.  Operation in COMMUNICATIONS-INTERRUPTED State
       9.9.2.  Transition Out of COMMUNICATIONS-INTERRUPTED State
     9.10. POTENTIAL-CONFLICT State
       9.10.1. Operation in POTENTIAL-CONFLICT State
       9.10.2. Transition Out of POTENTIAL-CONFLICT State
     9.11. RESOLUTION-INTERRUPTED State
       9.11.1. Operation in RESOLUTION-INTERRUPTED State
       9.11.2. Transition Out of RESOLUTION-INTERRUPTED State
     9.12. CONFLICT-DONE State
       9.12.1. Operation in CONFLICT-DONE State
       9.12.2. Transition Out of CONFLICT-DONE State
   10. Proposed extensions
     10.1. Active-active mode
   11. Dynamic DNS Considerations
     11.1. Relationship between failover and dynamic DNS update
     11.2. Exchanging DDNS Information
     11.3. Adding RRs to the DNS
     11.4. Deleting RRs from the DNS
     11.5. Name Assignment with No Update of DNS
   12. Reservations and failover
   13. Security Considerations
   14. IANA Considerations
   15. Acknowledgements
   16. References
     16.1. Normative References
     16.2. Informative References
   Authors' Addresses

1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.
Glossary

   This is a supplemental glossary that should be combined with
   definitions in Section 3 of
   [I-D.ietf-dhc-dhcpv6-failover-requirements].

   o  auto-partner-down - a capability where a failover server will move
      from COMMUNICATIONS-INTERRUPTED state to PARTNER-DOWN state
      automatically, without operator intervention.

   o  DDNS - Dynamic DNS. Typically used as an acronym referring to
      dynamic update of the DNS.

   o  Failover endpoint - The failover protocol allows for there to be a
      unique failover 'endpoint' for each failover relationship in which
      a failover server participates. The failover relationship is
      defined by a relationship name, and includes the failover partner
      IP address, the role this server takes with respect to that
      partner (primary or secondary), and the prefixes associated with
      that relationship. Note that a single prefix can only be
      associated with a single failover relationship. This failover
      endpoint can take actions and hold unique states. Typically,
      there is one failover endpoint per partner (server), although
      there may be more. 'Server' and 'failover endpoint' are
      synonymous only if the server participates in only one failover
      relationship. However, for the sake of simplicity 'Server' is
      used throughout the document to refer to a failover endpoint
      unless to do so would be confusing.

   o  Failover communication - all messages exchanged between partners.

   o  Independent Allocation - an allocation algorithm that splits the
      available pool of resources between the primary and secondary
      servers and is particularly well suited for vast pools (i.e. when
      available resources are not expected to deplete). See Section 6.2
      for details.

   o  Lease - an association of a DHCPv6 client with an IPv6 address or
      delegated prefix.

   o  Partner - name of the other DHCPv6 server that participates in a
      failover relationship.
      When the role (primary or secondary) is
      not important, the other server is referred to as a "failover
      partner" or simply partner.

   o  Primary Server - First out of two DHCPv6 servers that participate
      in a failover relationship. In active-passive mode this is the
      server that handles most of the client traffic. Its failover
      partner is referred to as the secondary server.

   o  Proportional Allocation - an allocation algorithm that splits the
      available resources between the primary and secondary servers and
      maintains proportions between available resources on both. It is
      particularly well suited for more limited resources. See
      Section 6.1 for details.

   o  Resource - Any type of resource that is managed by DHCPv6.
      Currently there are three types of such resources defined: a non-
      temporary IPv6 address, a temporary IPv6 address, and an IPv6
      prefix. Other resource types may be defined in the future.

   o  Responsive - A server that is responsive will respond to DHCPv6
      client requests.

   o  Secondary Server - Second of two DHCPv6 servers that participate
      in a failover relationship. Its failover partner is referred to
      as the primary server. In active-passive mode this server (the
      secondary) typically does not handle client traffic and acts as a
      backup.

   o  Server - A DHCPv6 server that implements DHCPv6 failover.
      'Server' and 'failover endpoint' are synonymous only if the server
      participates in only one failover relationship.

   o  Unresponsive - A server that is unresponsive will not respond to
      DHCPv6 client requests.

3.  Introduction

   The failover protocol design provides a means for cooperating DHCPv6
   servers to work together to provide a DHCPv6 service with
   availability that is increased beyond that which could be provided by
   a single DHCPv6 server operating alone.
   It is designed to protect
   DHCPv6 clients against server unreachability, including server
   failure and network partition. It is possible to deploy exactly two
   servers that are able to continue providing a lease on an IPv6
   address [RFC3315] or on an IPv6 prefix [RFC3633] without the DHCPv6
   client experiencing lease expiration or a reassignment of a lease to
   a different IPv6 address (or prefix) in the event of failure by one
   or the other of the two servers.

   This protocol defines active-passive mode, sometimes also called a
   hot standby model. This means that during normal operation one
   server is active (i.e. actively responds to clients' requests) while
   the second is passive (i.e. it does receive clients' requests, but
   does not respond to them and only maintains a copy of the lease
   database and is ready to take over incoming queries in case of
   primary server failure). Active-active mode (i.e. both servers
   actively handling clients' requests) is currently not supported for
   the sake of simplicity. Such a mode is likely to be defined as an
   extension at a later time and will probably be based on
   [I-D.ietf-dhc-dhcpv6-load-balancing].

   The failover protocol is designed to provide lease stability for
   leases with lease times beyond a short period. Due in part to the
   additional overhead required as well as requirements to handle time
   skew between failover partners (see Section 8.1), failover is not
   suitable for leases shorter than 30 seconds. The DHCPv6 Failover
   protocol MUST NOT be used for leases shorter than 30 seconds.

   This design attempts to fulfill all DHCPv6 failover requirements
   defined in [I-D.ietf-dhc-dhcpv6-failover-requirements].

3.1.  Design Requirements

   The following requirements are not related to the failover mechanism
   in general, but rather to this particular design.

   1.
       Minimize Asymmetry - while there are two distinct roles in
       failover (primary and secondary server), the differences between
       those two roles should be as small as possible. This will yield
       a simpler design as well as a simpler implementation of that
       design.

3.2.  Features out of Scope: Load Balancing

   While it is tempting to extend the DHCPv6 failover mechanism to also
   offer load balancing, as DHCPv4 failover did, this design does not do
   that. Here is the reasoning for this decision. In the general case
   (not related to failover), load balancing solutions are used when
   each server is not able to handle the total incoming traffic.
   However, by its very definition, DHCPv6 failover is supposed to
   ensure service availability despite failure of one server. That
   leads to the conclusion that each server must be able to handle all
   of the traffic. Therefore, in a properly provisioned setup, load
   balancing is not needed.

   It is likely that an active-active mode, which is essentially load
   balancing, will be defined as an extension in the near future.

4.  Protocol Overview

   The DHCPv6 Failover Protocol is defined as a communication between
   failover partners with all associated algorithms and mechanisms.
   Failover communication is conducted over a TCP connection established
   between the partners. The protocol reuses the framing format
   specified in Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460], but
   uses different message types. New failover-specific message types
   are listed in Section 4.2. All information is sent over the
   connection as typical DHCPv6 messages that convey DHCPv6 options,
   following the format defined in Section 22.1 of [RFC3315].

   After initialization, the primary server establishes a TCP connection
   with its partner. The primary server sends a CONNECT message with
   initial parameters. The secondary server responds with a CONNECTACK
   message.
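
   The framing mentioned above can be illustrated with a minimal sketch.
   It assumes the Bulk Leasequery layout (a 16-bit message-size followed
   by a DHCPv6 message made of a 1-octet msg-type, a 3-octet
   transaction-id and options) and the DHCPv6 option layout (16-bit
   code, 16-bit length, data). The value MSG_CONNECT and all function
   names are hypothetical, since this document assigns no message type
   values.

```python
import struct

# Hypothetical placeholder: this document assigns no message type values.
MSG_CONNECT = 250


def encode_option(code: int, data: bytes) -> bytes:
    """DHCPv6 option: 16-bit option code, 16-bit length, then the data."""
    return struct.pack("!HH", code, len(data)) + data


def encode_frame(msg_type: int, xid: int, options: bytes = b"") -> bytes:
    """Prefix a DHCPv6 message with its 16-bit size for use on a TCP stream."""
    body = bytes([msg_type]) + xid.to_bytes(3, "big") + options
    return struct.pack("!H", len(body)) + body


def decode_frame(buf: bytes):
    """Return (msg_type, xid, options, rest), or None if more bytes are needed."""
    if len(buf) < 2:
        return None
    (size,) = struct.unpack("!H", buf[:2])
    if size < 4:
        raise ValueError("message too short for msg-type and transaction-id")
    if len(buf) < 2 + size:
        return None  # incomplete frame, wait for more data from the socket
    body = buf[2:2 + size]
    return body[0], int.from_bytes(body[1:4], "big"), body[4:], buf[2 + size:]
```

   Because TCP delivers a byte stream, a receiver would call
   decode_frame() repeatedly on its buffer and keep the returned
   remainder for the next call.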

   If the primary server cannot immediately establish a connection with
   its partner, it will continue to attempt to establish a connection.
   See Section 5.1 for details.

   Depending on the failover state of each partner, they MUST initiate
   one of the binding update procedures. Each server MAY send an UPDREQ
   message to request its partner to send all updates that have not been
   sent yet (this case applies when the partner has an existing database
   and wants to update it). Alternatively, a server MAY choose to send
   an UPDREQALL message to request a full lease database transmission
   including all leases (this case applies in case of booting up a new
   server after installation, corruption or complete loss of database,
   or other catastrophic failure).

   Servers exchange lease information by using BNDUPD messages.
   Depending on the local and remote state of a lease, a server may
   either accept or reject the update. Reception of lease update
   information is confirmed by responding with a BNDACK message with
   appropriate status. The majority of the messages sent over a
   failover TCP connection consists of BNDUPD and BNDACK messages.

   A subset of available resources (addresses or prefixes) is reserved
   for secondary server use. This is required for handling a case where
   both servers are able to communicate with clients, but unable to
   communicate with each other. After the initial connection is
   established, the secondary server requests a pool of available
   addresses or prefixes by sending a POOLREQ message. The primary
   server assigns addresses or prefixes to the secondary by sending a
   series of BNDUPD messages. When this process is complete, the
   primary server sends a POOLRESP message to the secondary server. The
   secondary server may initiate such a pool request at any time when in
   communication with the primary server.
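
   The pool request exchange described above can be sketched as a small
   simulation. It only shows the message ordering (a series of BNDUPD
   messages followed by one POOLRESP). The Primary class, the share
   parameter and the tuple representation of messages are assumptions
   made for illustration; how many resources the primary actually hands
   over is governed by the allocation algorithm in Section 6.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    free: list                                  # resources available on the primary
    sent: list = field(default_factory=list)    # messages "sent" to the secondary

    def handle_poolreq(self, share: float = 0.5) -> None:
        """Answer a POOLREQ: hand over resources one BNDUPD at a time, then confirm."""
        count = int(len(self.free) * share)     # hypothetical split, not from the draft
        for _ in range(count):
            resource = self.free.pop()
            self.sent.append(("BNDUPD", resource))
        # POOLRESP is sent only after all BNDUPD messages for this request.
        self.sent.append(("POOLRESP", None))
```

   For example, a primary holding four free prefixes and using the
   default share would send two BNDUPD messages and then one POOLRESP.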

   Failover servers use a lazy update mechanism to update their failover
   partner about changes to their lease state database. After a server
   performs any modifications to its lease state database (assign a new
   lease, extend, release or expire an existing lease), it sends its
   response to the client's request first (performing the "regular"
   DHCPv6 operation) and then informs its failover partner using a
   BNDUPD message. This BNDUPD message SHOULD be sent soon after the
   response is sent to the DHCPv6 client, but there is no specific
   requirement of a minimum time in which to do so.

   The major problem with a lazy update mechanism is when the server
   crashes after sending a response to the client, but before sending
   the lazy update to its partner (or when communication between
   partners is interrupted). To solve this problem, the concept known
   as the Maximum Client Lead Time (initially designed for DHCPv4
   failover) is used. The MCLT is the maximum amount of time that one
   server can extend a lease for a client's binding beyond the time
   known by its failover partner. See Section 8.3 for a detailed
   description of how the MCLT affects assigned lifetimes.

   Servers verify each other's availability by periodically exchanging
   CONTACT messages. See Section 8.4 for discussion about detecting a
   partner's unreachability.

   A server that is being shut down transmits a DISCONNECT message,
   closes the connection with its failover partner and stops operation.
   A server SHOULD transmit any pending lease updates before
   transmitting a DISCONNECT message.

4.1.  Failover State Machine Overview

   The following section provides a simplified description of all
   states. For the sake of clarity and simplicity, it omits important
   details. For a complete description, see Section 9. In case of a
   disagreement between the simplified and complete description, please
   follow Section 9.
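
   The Maximum Client Lead Time rule from the protocol overview, which
   also constrains several of the states described in this section, can
   be illustrated with a short sketch. It caps the expiration time a
   server may grant at the partner-known expiration time plus the MCLT.
   The function name and parameters are hypothetical. Treating an
   unknown or already expired partner-known time as "now" is an
   assumption of this sketch; Section 8.3 is authoritative.

```python
def max_grantable_expiry(now, desired_expiry, partner_known_expiry, mclt):
    """Latest expiry a server may grant without exceeding the MCLT.

    All values are in seconds on the same clock. partner_known_expiry is
    None when the partner has never been told about this lease.
    """
    if partner_known_expiry is None:
        baseline = now                      # assumption: no knowledge means "now"
    else:
        baseline = max(now, partner_known_expiry)   # assumption: past expiry means "now"
    return min(desired_expiry, baseline + mclt)
```

   With an MCLT of 3600 seconds, a client asking for a one-day lease on
   a binding unknown to the partner would first receive a lease of one
   hour and obtain longer lifetimes on later renewals, once the partner
   has acknowledged an update.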

   Each server MUST be in one of the well-defined states. Depending on
   its current state a server may be either responsive (responds to
   clients' queries) or unresponsive (clients' queries are ignored).

   A server starts its operation in the short-lived STARTUP state. A
   server determines its partner's reachability and state and sets its
   own state based on that determination. It typically returns to the
   state it was in before shutdown, though the details can be
   complicated. See Section 9.3.2.

   During typical operation when servers maintain communication, both
   are in NORMAL state. In that state only the primary responds to
   clients' requests. The secondary server is unresponsive.

   If a server discovers that its partner is no longer reachable, it
   goes to COMMUNICATIONS-INTERRUPTED state. A server must be extra
   cautious, as it cannot distinguish whether its partner is down or
   only the communication between the servers is interrupted. Since
   communication between partners is not possible, a server must act on
   the assumption that its partner is up. A failover server must follow
   a defined procedure; in particular, it MUST NOT extend any lease more
   than the MCLT beyond its partner's knowledge of the lease expiration
   time. This imposes an additional burden on the server, in that
   clients will return to the server for lease renewals more frequently
   than they would otherwise. Therefore it is not recommended to
   operate for prolonged periods in this state. Once communication is
   reestablished, a server may go into NORMAL, POTENTIAL-CONFLICT or
   PARTNER-DOWN state. It may also stay in COMMUNICATIONS-INTERRUPTED
   state if certain conditions are met.

   Once a server is switched into PARTNER-DOWN (when auto-partner-down
   is used or as a result of administrative action), it can extend
   leases, regardless of the original server that initially granted the
   lease.
   In that state, the server handles leases from its own pool,
   but once its own pool is depleted, it is also able to serve resources
   from the pool of its downed partner. Some MCLT restrictions no
   longer apply, but the MCLT still affects whether or not a particular
   lease can be given to a different client. See Section 9.4.1 for
   details. Operation in this mode is less demanding for the server
   that remains operational than in COMMUNICATIONS-INTERRUPTED state,
   but PARTNER-DOWN does not offer any kind of redundancy. Even when in
   PARTNER-DOWN state, a failover server continues to attempt to connect
   with its failover partner.

   A server switches into RECOVER state when any of a variety of
   conditions are encountered:

   o  When a backup server contacts its failover partner for the first
      time.

   o  When either server discovers that its failover partner has
      contacted it before but it has no local record of this contact.
      If the record of previous contact is held in the lease-state
      database, then this situation implies that the server has lost its
      lease state database.

   o  When its failover partner is in PARTNER-DOWN state.

   Any of these conditions signals that the server needs to refresh its
   lease-state database from its partner. Once this operation is
   complete, it switches to RECOVER-WAIT and later to RECOVER-DONE. See
   Section 9.6.2.

   Once servers reestablish connection, they discover each other's
   state. Depending on the conditions, they may return to NORMAL or
   move to POTENTIAL-CONFLICT if the partner is in a state that doesn't
   allow a simple re-integration of the server's lease state databases.
   It is a goal of this protocol to minimize the possibility that
   POTENTIAL-CONFLICT state is ever entered. Servers running in
   POTENTIAL-CONFLICT do not respond to clients' requests and work only
   on resolving potential conflicts.
   Once outstanding lease updates are
   exchanged, servers move to CONFLICT-DONE or NORMAL states.

   Servers that are recovering from potential conflicts and lose
   communication switch to RESOLUTION-INTERRUPTED state.

   A server that is being shut down sends a DISCONNECT message. See
   Section 4.2. A server that receives a DISCONNECT message moves into
   COMMUNICATIONS-INTERRUPTED state.

4.2.  Messages

   The failover protocol is centered around the message exchanges used
   by one server to update its partner and respond to received updates.
   It should be noted that no specific formats or message type values
   are assigned in this document. Appropriate implementation details
   will be specified in a separate protocol specification document. The
   following list enumerates these messages:

   o  BNDUPD - The binding update message is used to send the binding
      lease changes to the partner. One message may contain one or more
      lease updates. The partner is expected to respond with a BNDACK
      message.

   o  BNDACK - The binding acknowledgement is used for confirmation of
      the received BNDUPD message. It may contain a positive or
      negative response (e.g. due to detected lease conflict).

   o  POOLREQ - The Pool Request message is used by one server
      (typically secondary) to request allocation of resources
      (addresses or prefixes) from its partner. The partner responds
      with POOLRESP.

   o  POOLRESP - The Pool Response message is used by one server
      (typically primary) to indicate that it has responded to its
      partner's request for resource allocation.

   o  UPDREQ - The update request message is used by one server to
      request that its partner send all binding database changes that
      have not been sent and confirmed already. The requested partner
      is expected to respond with zero or more BNDUPD messages, followed
      by UPDDONE that signals the end of updates.

   o  UPDREQALL - The update request all is used by one server to
      request that all binding database information be sent in order to
      recover from a total loss of its binding database by the
      requesting server. The requested server responds with zero or
      more BNDUPD messages, followed by UPDDONE that signals the end of
      updates.

   o  UPDDONE - The update done message is used by the server responding
      to an UPDREQ or UPDREQALL to indicate that all requested updates
      have been sent by the responding server and acknowledged by the
      requesting server.

   o  CONNECT - The connect message is used by the primary server to
      establish a high level connection with the other server, and to
      transmit several important configuration data items between the
      servers. The partner is expected to confirm by responding with a
      CONNECTACK message.

   o  CONNECTACK - The connect acknowledgement message is used by the
      secondary server to respond to a CONNECT message from the primary
      server.

   o  DISCONNECT - The disconnect message is used by either server when
      closing a connection and shutting down. No response is required
      for this message.

   o  STATE - The state message is used by either server to inform its
      partner about a change of failover state. In some cases it may be
      used to also inform the partner about current state, e.g. after
      connection is established in COMMUNICATIONS-INTERRUPTED or
      PARTNER-DOWN states.

   o  CONTACT - The contact message is used by either server to ensure
      that the other server continues to see the connection as
      operational. It MUST be transmitted periodically over every
      established connection if other message traffic is not flowing,
      and it MAY be sent at any time.

5.  Connection Management

5.1.
Creating Connections

   Every primary server implementing the failover protocol MUST attempt
   to connect to all of its partners periodically, where the period is
   implementation-dependent and SHOULD be configurable. In the event
   that a connection has been rejected by a CONNECTACK message with a
   reject-reason option contained in it or a DISCONNECT message, a
   server SHOULD reduce the frequency with which it attempts to connect
   to that server but it MUST continue to attempt to connect
   periodically.

   Every secondary server implementing the failover protocol MUST listen
   for connection attempts from the primary server.

   When a connection attempt succeeds, the primary server which has
   initiated the connection attempt MUST send a CONNECT message down the
   connection.

   When a connection attempt is received, the only information that the
   receiving server has is the IP address of the partner initiating a
   connection. If it has any relationships with the connecting server
   for which it is a secondary server, it should just await the CONNECT
   message to determine which relationship this connection is to serve.

   If it has no secondary relationships with the connecting server, it
   MUST drop the connection. The goal is to limit the resources
   expended dealing with attempts to create a spurious failover
   connection.

   To summarize -- a primary server MUST use a connection that it has
   initiated in order to send a CONNECT message. Every server that is a
   secondary server in a relationship simply listens for connection
   attempts from the primary server.

   Once a connection is established, the primary server MUST send a
   CONNECT message across the connection. A secondary server MUST wait
   for the CONNECT message from a primary server. If the secondary
   server doesn't receive a CONNECT message from the primary server in
   an installation-dependent amount of time, it MAY drop the connection.
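
   The two decisions described above (whether a listening server keeps
   an incoming connection, and how often a primary retries) can be
   sketched as follows. The Relationship record, the function names and
   the backoff factor are hypothetical; the text only requires that the
   retry frequency be reduced after a rejection and that attempts never
   stop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    name: str
    partner_ip: str
    role: str          # role of the local server: "primary" or "secondary"


def should_keep_connection(peer_ip, relationships):
    """Keep an incoming TCP connection only if we are secondary to that peer."""
    return any(
        r.partner_ip == peer_ip and r.role == "secondary" for r in relationships
    )


def next_connect_delay(base_delay, rejected, backoff=4.0):
    """Retry period for a primary: longer after a rejection, but never infinite."""
    return base_delay * backoff if rejected else base_delay
```

   A listener that keeps the connection still has to wait for the
   CONNECT message to learn which relationship the connection serves.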

   Every CONNECT message includes a TLS-request option, and if the
   CONNECTACK message does not reject the CONNECT message and the TLS-
   reply option says TLS MUST be used, then the servers will immediately
   enter into TLS negotiation.

   Once TLS negotiation is complete, the primary server MUST resend the
   CONNECT message on the newly secured TLS connection and then wait for
   the CONNECTACK message in response. The TLS-request and TLS-reply
   options MUST NOT appear in either this second CONNECT or its
   associated CONNECTACK message as they had in the first messages.

   The second message sent over a new connection (either a bare TCP
   connection or a connection utilizing TLS) is a STATE message. Upon
   the receipt of this message, the receiver can consider communications
   up.

5.2.  Endpoint Identification

   The proper operation of the failover protocol requires more than the
   transmission of messages between one server and the other. Each
   endpoint might seem to be a single DHCPv6 server, but in fact there
   are situations where additional flexibility in configuration is
   useful. A failover endpoint is always associated with a set of
   DHCPv6 prefixes that are configured on the DHCPv6 server where the
   endpoint appears. A DHCPv6 prefix MUST NOT be associated with more
   than one failover endpoint.

   The failover protocol SHOULD be configured with one failover
   relationship between each pair of failover servers. In this case
   there is one failover endpoint for that relationship on each failover
   partner. This failover relationship MUST have a unique name.

   There is typically little need for additional relationships between
   any two servers but there MAY be more than one failover relationship
   between two servers -- however each MUST have a unique relationship
   name.

   Any failover endpoint can take actions and hold unique states.
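
   The two configuration constraints stated above (unique relationship
   names, and no prefix associated with more than one failover
   endpoint) lend themselves to a simple configuration check. The
   sketch below is illustrative only: the function name and the
   (name, prefixes) input shape are assumptions, and prefixes are
   compared for equality only, without checking for overlap.

```python
import ipaddress


def find_config_errors(relationships):
    """Check a list of (relationship_name, [prefix, ...]) pairs.

    Returns a list of human-readable error strings (empty if valid).
    """
    errors = []
    seen_names = set()
    prefix_owner = {}
    for name, prefixes in relationships:
        if name in seen_names:
            errors.append(f"duplicate relationship name: {name}")
        seen_names.add(name)
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)   # normalizes the textual form
            owner = prefix_owner.setdefault(net, name)
            if owner != name:
                errors.append(f"prefix {net} is in both {owner} and {name}")
    return errors
```

   A server could run such a check when loading its configuration and
   refuse to start failover if any error is reported.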
596 This document frequently describes the behavior of the protocol in 597 terms of primary and secondary servers, not primary and secondary 598 failover endpoints. However, it is important to remember that every 599 'server' described in this document is in reality a failover endpoint 600 that resides in a particular process, and that several failover end- 601 points may reside in the same server process. 603 It is not the case that there is a unique failover endpoint for each 604 prefix that participates in a failover relationship. On one server, 605 there is (typically) one failover endpoint per partner, regardless of 606 how many prefixes are managed by that combination of partner and 607 role. Conversely, on a particular server, any given prefix will be 608 associated with exactly one failover endpoint. 610 When a connection is received from the partner, the unique failover 611 endpoint to which the message is directed is determined solely by the 612 IP address of the partner, the relationship-name, and the role of the 613 receiving server. 615 6. Resource Allocation 617 Currently there are two allocation algorithms defined for resources 618 (addresses or prefixes). Additional allocation schemes may be 619 defined as future extensions. 621 1. Proportional Allocation - This allocation algorithm is a direct 622 application of the algorithm defined in [dhcpv4-failover] to 623 DHCPv6. Remaining available resources are split between the 624 primary and secondary servers in a configured proportion. 625 Released resources are always returned to the primary server. 626 Primary and secondary servers may initiate a rebalancing 627 procedure when disparity between resources available to each 628 server reaches a preconfigured threshold. Only resources that 629 are not leased to any clients are "owned" by one of the servers. 
630 This algorithm is particularly well suited for scenarios where
631 the amount of available resources is limited, as may be the case with
632 prefix delegation. See Section 6.1 for details.

634 2. Independent Allocation - This allocation algorithm also assumes
635 that available resources are split between primary and secondary
636 servers. In this case, however, resources are assigned to a
637 specific server for all time, regardless of whether they are available or
638 currently in use. This algorithm is much simpler than proportional
639 allocation, because resource imbalance doesn't have to be checked
640 and there is no rebalancing for independent allocation. This
641 algorithm is particularly well suited for scenarios where
642 there is an abundance of available resources, which is typically
643 the case for DHCPv6 address allocation. See Section 6.2 for
644 details.

646 6.1. Proportional Allocation

648 In this allocation scheme, each server has its own pool of available
649 resources. Remaining available resources are split between the
650 primary and secondary servers in a configured proportion. Note that
651 a resource is not "owned" by a particular server throughout its
652 entire lifetime. Only a resource which is available is "owned" by a
653 particular server -- once it has been leased to a client, it is not
654 owned by either failover partner. When it finally becomes available
655 again, it will be owned initially by the primary server, and it may
656 or may not be allocated to the secondary server by the primary
657 server.

659 The flow of a resource is as follows: initially a resource is owned
660 by the primary server. It may be allocated to the secondary server
661 if it is available, and then it is owned by the secondary server.
662 Either server can allocate available resources that it owns to
663 clients, in which case it ceases to own them.
When the client 664 releases the resource or the lease on it expires, it will again 665 become available and will be owned by the primary. 667 A resource will not become owned by the server which allocated it 668 initially when it is released or the lease expires because, in 669 general, that server will have had to replenish its pool of available 670 resources well in advance of any likely lease expirations. Thus, 671 having a particular resource cycle back to the secondary might well 672 put the secondary more out of balance with respect to the primary 673 instead of enhancing the balance of available addresses or prefixes 674 between them. 676 Pools governed by proportional allocation are used for allocation 677 when the server is in all states, except PARTNER-DOWN. In PARTNER- 678 DOWN state the healthy partner can allocate from either pool (both 679 its own, and its partner's after some time constraints have elapsed). 680 This allocation and maintenance of these address pools is an area of 681 some sensitivity, since the goal is to maintain a more or less 682 constant ratio of available addresses between the two servers. 684 The initial allocation when the servers first integrate is triggered 685 by the POOLREQ message from the secondary to the primary. This is 686 followed (at some point) by the POOLRESP message where the primary 687 tells the secondary that it received and processed the POOLREQ 688 message. The primary sends the allocated resources to the secondary 689 via BNDUPD messages. The POOLRESP message may be sent before, 690 during, or at the completion of the BNDUPD message exchanges that 691 were triggered by the POOLREQ message. The POOLREQ/POOLRESP message 692 exchange is a trigger to the primary to perform a scan of its 693 database and to ensure that the secondary has enough resources (based 694 on some configured ratio). 
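The ownership flow described earlier in this section (primary owns a resource initially, may delegate it to the secondary, nobody owns it while leased, and it returns to the primary once free) can be modeled as a small sketch. The class and method names are hypothetical, and the sketch covers only states other than PARTNER-DOWN.

```python
from enum import Enum


class Owner(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    NONE = "none"  # leased to a client: owned by neither partner


class Resource:
    """Ownership of one address or prefix under proportional allocation."""

    def __init__(self):
        self.owner = Owner.PRIMARY  # initially owned by the primary

    def delegate_to_secondary(self):
        # Only an available, primary-owned resource can be handed over.
        if self.owner is not Owner.PRIMARY:
            raise ValueError("resource is not available on the primary")
        self.owner = Owner.SECONDARY

    def lease(self, server):
        # A server may lease only the available resources it owns.
        if server is Owner.NONE or self.owner is not server:
            raise ValueError("server does not own this resource")
        self.owner = Owner.NONE

    def free(self):
        # Release or expiry: ownership returns to the primary, never to
        # the secondary that may have leased it.
        self.owner = Owner.PRIMARY
```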
696 The primary server SHOULD examine some or all of its database from 697 time to time to determine if resources should be shifted between the 698 primary and secondary (in either direction). The POOLREQ/POOLRESP 699 message exchange allows the secondary server to explicitly request 700 that the primary server examine the entirety of its database to 701 ensure that the secondary has the appropriate resources available. 703 Servers frequently have several kinds of resources available on a 704 particular network segment. The failover protocol assumes that both 705 primary and secondary servers are configured in such a way that each 706 knows the type and number of resources on every network segment 707 participating in the failover protocol. The primary server is 708 responsible for allocating the secondary server the correct 709 proportion of available resources of each kind. 711 The resources are delegated to the secondary using the BNDUPD message 712 with a state of FREE_BACKUP, which indicates the resource is now 713 available for allocation by the secondary. Once the message is sent, 714 the primary MUST NOT use these resources for allocation to DHCPv6 715 clients. 717 Available resources can be delegated back to the primary server in 718 certain cases. BNDUPD will contain state FREE for leases that were 719 previously in FREE_BACKUP state. 721 The POOLREQ/POOLRESP message exchange initiated by the secondary is 722 valid at any time both partners remain in contact, and the primary 723 server SHOULD, whenever it receives the POOLREQ message, scan its 724 database of prefixes and determine if the secondary needs more 725 resources from any of the prefixes. 727 In order to support a reasonably dynamic balance of the resources 728 between the failover partners, the primary server needs to do 729 additional work to ensure that the secondary server has as many 730 resources as it needs (but that it doesn't have more than it needs). 
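One way the primary might compute how many available resources to shift is sketched below. The function name, the integer percentage parameter, and the threshold handling are assumptions made for illustration; this document does not prescribe a formula.

```python
def rebalance_delta(primary_free, secondary_free, secondary_percent, threshold=0):
    """How many available resources the primary should move.

    Positive result: move that many to the secondary (BNDUPD, FREE_BACKUP).
    Negative result: try to take that many back (BNDUPD, FREE).
    Zero: the imbalance is within the configured threshold.
    """
    total = primary_free + secondary_free
    target = total * secondary_percent // 100  # secondary's intended share
    delta = target - secondary_free
    if abs(delta) <= threshold:
        return 0  # not worth the overhead of rebalancing
    return delta
```

For example, with 90 available resources on the primary, 10 on the secondary, and a 50 percent share, the result is 40 resources to delegate to the secondary.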
732 The primary server SHOULD examine the balance of available resources 733 between the primary and secondary for a particular prefix whenever 734 the number of available resources for either the primary or secondary 735 changes by more than a configured limit. The primary server SHOULD 736 adjust the available resource balance as required to ensure the 737 configured resource balance, excepting that the primary server SHOULD 738 employ some threshold mechanism to such a balance adjustment in order 739 to minimize the overhead of maintaining this balance. 741 An example of a threshold approach is: do not attempt to re-balance 742 the prefixes on the primary and secondary until the out of balance 743 value exceeds a configured value. 745 The primary server can, at any time, send an available resource to 746 the secondary using a BNDUPD with the state FREE_BACKUP. The primary 747 server can attempt to take an available resource away from the 748 secondary by sending a BNDUPD with the state FREE. If the secondary 749 accepts the BNDUPD, then the resource is now available to the primary 750 and not available to the secondary. Of course, the secondary MUST 751 reject that BNDUPD if it has already used that resource for a DHCP 752 client. 754 6.2. Independent Allocation 756 In this allocation scheme, available resources are permanently (until 757 server configuration changes) split between servers. Available 758 resources are split between the primary and secondary servers as part 759 of initial connection establishment. Once resources are allocated to 760 each server, there is no need to reassign them. The resource 761 allocation is algorithmic in nature, and does not require a message 762 exchange for each resource allocated. This algorithm is simpler than 763 proportional allocation since it does not require a rebalancing 764 mechanism. It assumes that the pool assigned to each server will 765 never deplete. 
That is often a reasonable assumption for IPv6
766 addresses (e.g. servers are often assigned a /64 pool that contains
767 many more addresses than existing electronic devices on Earth). This
768 allocation mechanism SHOULD be used for IPv6 addresses, unless the
769 configured address pool is small or is otherwise administratively
770 limited.

772 Once each server is assigned a resource pool during initial
773 connection establishment, it may allocate assigned resources to
774 clients. Once a client releases a resource or its lease expires,
775 the resource returns to the pool of the server that leased
776 it. Resources never change servers.

778 Resources using the independent allocation approach are ignored when
779 a server processes a POOLREQ message.

781 During COMMUNICATION-INTERRUPTED events, a partner MAY continue
782 extending existing leases when requested by clients. A healthy
783 partner MUST NOT lease resources that were assigned to its downed
784 partner and later released by a client unless it is in PARTNER-DOWN
785 state. When it is in PARTNER-DOWN state, a server SHOULD use its own
786 pool first and then it MAY start making new assignments from its
787 downed partner's pool. As the assumption is that independent
788 allocation should be used only when available resources are vast and
789 not expected to be fully used at any given time, it is very unlikely
790 that the server will ever need to use its downed partner's pool. This
791 makes recovery, even after prolonged downtime, much easier.

793 6.3. Choosing Allocation Algorithm

795 All implementations SHOULD support both the proportional allocation
796 algorithm and the independent allocation algorithm. The specific
797 requirements for support (i.e., which algorithm(s) MUST be
798 supported), and the assignment of a specific algorithm to a specific
799 allocation domain, would be documented in any protocol specifications
800 that follow from this document.
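As an illustration of the independent allocation rules in Section 6.2, the pool selection for a new lease might look like the sketch below. The function name and return values are hypothetical, and any waiting period that applies before a partner's resources may be used is deliberately left out.

```python
def pick_pool(state, own_free, partner_free):
    """Choose the pool for a new lease under independent allocation.

    Returns "own", "partner", or None when nothing may be allocated.
    """
    if own_free > 0:
        return "own"      # a server SHOULD use its own pool first
    if state == "PARTNER-DOWN" and partner_free > 0:
        return "partner"  # MAY fall back to the downed partner's pool
    return None           # otherwise the partner's pool is off limits
```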
802 The proportional allocation mechanism is more flexible as it can
803 dynamically rebalance available resources between servers. That
804 rebalancing creates an additional burden for the servers and generates
805 more traffic between servers. The proportional algorithm can be
806 considered more efficient at managing available resources, compared
807 to the independent algorithm. That is an important aspect when
808 working in a network that is nearing address and/or prefix depletion.

810 Independent allocation can be used when the number of available
811 resources is large and there is no realistic danger of running out
812 of resources. Use of independent allocation makes communication
813 between partners simpler. It also makes recovery easier and
814 potential conflicts less likely to appear.

816 Typically independent allocation is used for IPv6 addresses, because
817 even for /64 pools a server will never run out of addresses to
818 assign, so there is no need to rebalance. For the prefix delegation
819 mechanism, the pool of available resources is typically much smaller, so there
820 is a danger of running out of prefixes. Therefore typically
821 proportional allocation will be used for prefix delegations.
822 Independent allocation still may be used, but the implications must be
823 well understood. For example, in a network that delegates /64
824 prefixes out of a /48 prefix (so there can be up to 65536 prefixes
825 delegated) and has 1000 requesting routers, it is safe to use
826 independent allocation.

828 It should be stressed that the independent allocation algorithm
829 SHOULD NOT be used when the number of resources is limited and there
830 is a realistic danger of depleting resources. If this recommendation
831 is violated, it may lead to a case where one server denies clients due
832 to pool depletion despite the fact that the other partner still has
833 many resources available.
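The arithmetic behind the /48 example above can be checked with a short sketch. The function names are hypothetical, and the even split of the pool between two servers is an assumption; this document does not specify how independent allocation divides a pool.

```python
def delegable_prefixes(pool_len, delegated_len):
    """Number of /delegated_len prefixes contained in one /pool_len prefix."""
    if not 0 <= pool_len <= delegated_len <= 128:
        raise ValueError("expected 0 <= pool_len <= delegated_len <= 128")
    return 2 ** (delegated_len - pool_len)


def per_server_headroom(pool_len, delegated_len, routers, servers=2):
    """Spare prefixes per server, assuming an even split (an assumption)
    and the worst case in which one server has to serve every router."""
    share = delegable_prefixes(pool_len, delegated_len) // servers
    return share - routers


print(delegable_prefixes(48, 64))         # 65536
print(per_server_headroom(48, 64, 1000))  # 31768
```

A negative headroom would indicate that a single server could run out of prefixes while its partner still has some, which is the situation the recommendation above warns about.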
835 With independent allocation it is very unlikely for a remaining
836 healthy server to allocate resources from its unavailable partner's
837 pool. That makes recovery easier and makes potential conflicts
838 less likely to appear.

840 7. Information model

842 In most DHCP servers a resource (an IP address or a prefix) can take
843 on several different binding-status values, sometimes also called
844 lease states. While probably no two DHCP server implementations have
845 exactly the same possible binding-status values, [RFC3315] enforces
846 some commonality among the general semantics of the binding-status
847 values used by various DHCP server implementations.

849 In order to transmit binding database updates between one server and
850 another using the failover protocol, some common denominator
851 binding-status values must be defined. It is not expected that these values
852 correspond with any actual implementation of the DHCP protocol in a
853 DHCP server, but rather that the binding-status values defined in
854 this document should be a common denominator of those in use by many
855 DHCP server implementations.

857 The lease binding-status values defined for the failover protocol are
858 listed below. Unless otherwise noted below, there MAY be client
859 information associated with each of these binding-status values.

861 ACTIVE -- The lease is assigned to a client. Client identification
862 data MUST appear.

864 EXPIRED -- indicates that a client's binding on a given lease has
865 expired. When the partner acks the BNDUPD of an expired lease,
866 the server sets its internal state to FREE*. Client identification
867 SHOULD appear.

869 RELEASED -- indicates that a client sent a RELEASE message. When
870 the partner acks the BNDUPD of a released lease, the server sets
871 its internal state to FREE*. Client identification SHOULD appear.

873 FREE* -- Once a lease has expired or been released, its state becomes
874 FREE*.
Depending on which algorithm and which pool were used to
875 allocate a given lease, FREE* may mean either FREE or FREE_BACKUP.
876 Implementations do not have to implement this FREE* state, but may
877 choose to switch to the destination state directly. For clarity
878 of representation, this transitional FREE* state is treated as a
879 separate state.

881 FREE -- is used when a DHCP server needs to communicate that a
882 resource is unused by any client, but it was not just released,
883 expired or reset by a network administrator. When the partner
884 acks the BNDUPD of a FREE lease, the server marks the lease as
885 available for assignment by the primary server. Note that on a
886 secondary server running in PARTNER-DOWN state, after waiting the
887 MCLT, the resource MAY be allocated to a client by the secondary
888 server. Client identification MAY appear and indicates the last
889 client to have used this resource as a hint.

891 FREE_BACKUP -- indicates that this resource can be allocated by the
892 secondary server to a client at any time. Note that on a primary
893 server running in PARTNER-DOWN state, after waiting the MCLT, the
894 resource MAY be allocated to a client by the primary server if
895 the proportional algorithm was used. Client identification MAY appear
896 and indicates the last client to have used this resource as a
897 hint.

899 ABANDONED -- indicates that a lease is considered unusable by the
900 DHCP system. The primary reason for entering this state is
901 reception of a DECLINE message for the lease. Client
902 identification MAY appear.

904 RESET -- indicates that this resource was made available by operator
905 command. This is a distinct state so that the reason that the
906 resource became FREE can be determined. Client identification MAY
907 appear.

909 The lease state machine is presented in Figure 1. Most states
910 are stationary, i.e.
the lease stays in a given state until an external
911 event triggers a transition to another state. The only transitional
912 state is FREE*. Once it is reached, the state machine immediately
913 transitions to either FREE or FREE_BACKUP state.

915                  +---------+
916   /------------->| ACTIVE  |<--------------\
917   |              +---------+               |
918   |               |   |   |                |
919   |       /--(8)--/  (3)  \--(9)-\         |
920   |       |           |          |         |
921   |       V           V          V         |
922   |   +-------+  +--------+ +---------+    |
923   |   |EXPIRED|  |RELEASED| |ABANDONED|    |
924   |   +-------+  +--------+ +---------+    |
925   |       |           |          |         |
926   |       |           |         (10)       |
927   |       |           |          V         |
928   |       |           |     +---------+    |
929   |       |           |     |  RESET  |    |
930   |       |           |     +---------+    |
931   |       |           |          |         |
932   |       \--(4)--\  (4) /--(4)--/         |
933   |               |   |  |                 |
934  (1)              V   V  V                (2)
935   |              /---------\               |
936   |              |  FREE*  |               |
937   |              \---------/               |
938   |               |      |                 |
939   |        /-(5)--/      \-(6)-\           |
940   |        |                   |           |
941   |        V                   V           |
942   |    +-------+         +-----------+     |
943   \----| FREE  |<--(7)-->|FREE_BACKUP|-----/
944        +-------+         +-----------+

946               FREE* transition

948 Figure 1: Lease State Machine

950 Transitions between states are results of the following events:

952 1. Primary server allocates a lease.

954 2. Secondary server allocates a lease.

956 3. Client sends RELEASE and the lease is released.

958 4. Partner acknowledges state change. This transition MAY also
959 occur if the server is in PARTNER-DOWN state and the MCLT has
960 passed since entry into the RELEASED, EXPIRED, or RESET state.

962 5. The lease belongs to a pool that is governed by
963 proportional allocation, or independent allocation is used and
964 this lease belongs to the primary server's pool.

966 6. The lease belongs to a pool that is governed by
967 independent allocation and the lease belongs to the secondary
968 server.

970 7. Pool rebalance event occurs (POOLREQ/POOLRESP messages are
971 exchanged). Addresses (or prefixes) belonging to the primary
972 server can be assigned to the secondary server pool (transition
973 from FREE to FREE_BACKUP) or vice versa.

975 8. The lease has expired.

977 9.
DECLINE message is received or a lease is deemed unusable for
978 other reasons.

980 10. An administrative action is taken to recover an abandoned
981 lease back to usable state. This transition MAY also occur due to
982 implementation-specific handling of an ABANDONED resource. One
983 possible example is a Neighbor Discovery or ICMPv6
984 Echo check of whether the address is still in use.

986 A resource that is no longer in use (due to expiration or release)
987 becomes FREE*. Depending on which allocation algorithm is used, the
988 resource returns to the primary (FREE) or
989 secondary pool (FREE_BACKUP). The conditions for specific
990 transitions are depicted in Figure 2.

992  +----------------+---------+-----------+
993  | \Resource owner|         |           |
994  |  \----------\  | Primary | Secondary |
995  |Algorithm     \ |         |           |
996  +----------------+---------+-----------+
997  | Proportional   |  FREE   |   FREE    |
998  | Independent    |  FREE   |FREE_BACKUP|
999  +----------------+---------+-----------+

1001 Figure 2: FREE* State Transitions

1003 In the case of servers operating in active-passive mode, while a majority
1004 of the resources are owned by the primary server, the secondary
1005 server will need a portion of the resources to serve new clients
1006 while operating in COMMUNICATION-INTERRUPTED state and also in
1007 PARTNER-DOWN state before it can take over the entire address pool
1008 (after the expiry of the MCLT).

1010 The secondary server cannot simply take over the entire resource pool
1011 immediately, since it could also be that both servers are able to
1012 communicate with DHCP clients, but unable to communicate with each
1013 other.

1015 The size of the resource pool allocated to the secondary is specified
1016 as a percentage of the currently available resources.
Thus, as the 1017 number of available resources changes on the primary server, the 1018 number of resources available to the secondary server MUST also 1019 change, although the frequency of the changes made to the secondary 1020 server's pool of address resources SHOULD be low enough to not use 1021 significant processing power or network bandwidth. 1023 The required size of this private pool allocated to the secondary 1024 server is based only on the arrival rate of new DHCP clients and the 1025 length of expected downtime of the primary server, and is not 1026 directly influenced by the total number of DHCP clients supported by 1027 the server pair. 1029 8. Failover Mechanisms 1031 This section lays out an overview of the communication between 1032 partners and other mechanisms required for failover operation. As 1033 this is a design document, not a protocol specification, high level 1034 ideas are presented without implementation specific details (e.g. on- 1035 wire protocol formats). 1037 8.1. Time Skew 1039 Partners exchange information about known lease states. To reliably 1040 compare a known lease state with an update received from a partner, 1041 servers must be able to reliably compare the times stored in the 1042 known lease state with the times received in the update. Although a 1043 simple approach would be to require both partners to use synchronized 1044 time, e.g. by using NTP, such a service may not always be available 1045 in some scenarios that failover expects to cover. Therefore a 1046 mechanism to measure and track relative time differences between 1047 servers is necessary. To do so, each message MUST contain 1048 information about the time of the transmission in the time context of 1049 the transmitter. The transmitting server MUST set this as close to 1050 the actual transmission as possible. 
Transmission here means the moment when data
1051 is added to the send queue of the socket (or the equivalent), as the
1052 application may not know the time of the actual transmission on
1053 the "wire". The receiving partner MUST store its own timestamp of
1054 reception as close to the actual reception as possible. The received
1055 timestamp information is then compared with the local timestamp.

1057 To account for packet delay variation (jitter), the measured
1058 difference is not used directly, but rather the moving average of
1059 the time differences of the last TIME_SKEW_PKTS_AVG packets is calculated. This
1060 averaged value is referred to as the time skew. Note that the time
1061 skew algorithm allows cooperation between servers with completely
1062 desynchronized clocks as well as those whose desynchronization itself
1063 is not constant.

1065 8.2. Lazy updates

1067 Lazy update refers to the requirement placed on a server implementing
1068 a failover protocol to update its failover partner whenever the
1069 binding database changes. A failover protocol which didn't support
1070 lazy update would require the failover partner update to complete
1071 before a DHCPv6 server could respond to a DHCPv6 client request.
1072 Such an approach is often referred to as 'lockstep' and is the opposite
1073 of lazy updates. The lazy update mechanism allows a server to
1074 allocate a new lease or extend an existing one and then update its
1075 failover partner as time permits.

1077 Although the lazy update mechanism does not introduce additional
1078 delays in server response times, it introduces other difficulties.
1079 The key problem with lazy update is that when a server fails after
1080 updating a client with a particular lease time and before updating
1081 its partner, the partner will believe that a lease has expired even
1082 though the client still retains a valid lease on that address or
1083 prefix.
It is also possible that the partner will have no record at
1084 all of the lease of the resource to the client.

1086 8.3. MCLT concept

1088 In order to handle the problems introduced by lazy updates (see
1089 Section 8.2), a period of time known as the "Maximum Client Lead
1090 Time" (MCLT) is defined and must be known to both the primary and
1091 secondary servers. Proper use of this time interval places an upper
1092 bound on the difference allowed between the lease time provided to a
1093 DHCPv6 client by a server and the lease time known by that server's
1094 failover partner.

1096 The MCLT is typically much less than the lease time that a server has
1097 been configured to offer a client, and so some strategy must exist to
1098 allow a server to offer the configured lease time to a client.
1099 During a lazy update the updating server typically updates its
1100 partner with a potential expiration time which is longer than the
1101 lease time previously given to the client and which is longer than
1102 the lease time that the server has been configured to give a client.
1103 This allows that server to give a longer lease time to the client the
1104 next time the client renews its lease, since the time that it will
1105 give to the client will not exceed, by more than the MCLT, the potential
1106 expiration time acknowledged by its partner.

1108 The fundamental relationship on which much of the correctness of this
1109 protocol depends is that the lease expiration time known to a DHCPv6
1110 client MUST NOT exceed, by more than the MCLT, the potential
1111 expiration time known to the failover partner of the server that
1112 granted the lease.

1113 The remainder of this section makes the above fundamental
1114 relationship more explicit.

1116 This protocol requires a DHCPv6 server to deal with several different
1117 lease intervals and places specific restrictions on their
1118 relationships.
The purpose of these restrictions is to allow the
1119 other server in the pair to make certain assumptions when
1120 the servers are unable to communicate with each other.

1122 The different times are:

1124 desired valid lifetime:

1126 The desired valid lifetime is the lease interval that a DHCPv6
1127 server would like to give to a DHCPv6 client in the absence of any
1128 restrictions imposed by the failover protocol. Its determination
1129 is outside of the scope of this protocol. Typically this is the
1130 result of external configuration of a DHCPv6 server.

1132 actual valid lifetime:
1133 The actual valid lifetime is the lease interval that a DHCPv6
1134 server gives out to a DHCPv6 client. It may be shorter than the
1135 desired valid lifetime (as explained below).

1137 potential valid lifetime:
1138 The potential valid lifetime is the potential lease expiration
1139 interval the local server reports to its partner in a BNDUPD
1140 message.

1142 acknowledged potential valid lifetime:
1143 The acknowledged potential valid lifetime is the potential lease
1144 interval the partner server has most recently acknowledged in a
1145 BNDACK message.

1147 8.3.1. MCLT example

1149 The following example demonstrates the MCLT concept in practice. The
1150 values used are arbitrarily chosen and are not a recommendation for
1151 actual values. The MCLT in this case is 1 hour. The desired valid
1152 lifetime is 3 days, and its renewal time is half the valid lifetime.

1154 When a server makes an offer for a new lease on an IP address to a
1155 DHCPv6 client, it determines the desired valid lifetime (in this
1156 case, 3 days). It then examines the acknowledged potential valid
1157 lifetime (which in this case is zero) and determines the remainder of
1158 the time left to run, which is also zero. It adds the MCLT to this
1159 value.
Since the actual valid lifetime cannot be allowed to exceed 1160 the remainder of the current acknowledged potential valid lifetime 1161 plus the MCLT, the offer made to the client is for the remainder of 1162 the current acknowledged potential valid lifetime (i.e. zero) plus 1163 the MCLT. Thus, the actual valid lifetime is 1 hour (the MCLT). 1165 Once the server has sent the REPLY to the DHCPv6 client, it will 1166 update its failover partner with the lease information. However, the 1167 desired potential valid lifetime will be composed of one half of the 1168 current actual valid lifetime added to the desired valid lifetime. 1169 Thus, the failover partner is updated with a BNDUPD with a potential 1170 valid lifetime of 1/2 hour + 3 days. 1172 When the primary server receives a BNDACK to its update of the 1173 secondary server's (partner's) potential valid lifetime, it records 1174 that as the acknowledged potential valid lifetime. A server MUST NOT 1175 send a BNDACK in response to a BNDUPD message until it is sure that 1176 the information in the BNDUPD message has been updated in its lease 1177 database. See Section 8.9. Thus, the primary server in this case 1178 can be sure that the secondary server has recorded the potential 1179 lease interval in its stable storage when the primary server receives 1180 a BNDACK message from the secondary server. 1182 When the DHCPv6 client attempts to renew at T1 (approximately one 1183 half an hour from the start of the lease), the primary server again 1184 determines the desired valid lifetime, which is still 3 days. It 1185 then compares this with the original acknowledged potential valid 1186 lifetime (1/2 hour + 3 days) and adjusts for the time passed since 1187 the secondary was last updated (1/2 hour). Thus the time remaining 1188 of the acknowledged potential valid interval is 3 days. Adding the 1189 MCLT to this yields 3 days plus 1 hour, which is more than the 1190 desired valid lifetime of 3 days. 
So the client is renewed for the
1191 desired valid lifetime -- 3 days.

1193 When the primary DHCPv6 server updates the secondary DHCPv6 server
1194 after the DHCPv6 client's renewal REPLY is complete, it will
1195 calculate the desired potential valid lifetime as the T1 fraction of
1196 the actual client valid lifetime (1/2 of 3 days this time = 1.5
1197 days). To this it will add the desired client valid lifetime of 3
1198 days, yielding a total desired potential valid lifetime of 4.5 days.
1199 In this way, the primary attempts to have the secondary always "lead"
1200 the client in its understanding of the client's valid lifetime so as
1201 to be able to always offer the client the desired client valid
1202 lifetime.

1204 Once the initial actual client valid lifetime of the MCLT has passed,
1205 the protocol operates effectively like the DHCPv6 protocol does today
1206 in its behavior concerning valid lifetimes. However, the guarantee
1207 that the actual client valid lifetime will never exceed the remaining
1208 acknowledged partner server potential valid lifetime by more than the
1209 MCLT allows full recovery from a variety of failures.

1211 8.4. Unreachability detection

1213 Each partner MUST maintain a FO_SEND timer for each failover
1214 connection. The FO_SEND timer is reset every time any message is
1215 transmitted. If the timer reaches the FO_SEND_MAX value, a CONTACT
1216 message is transmitted and the timer is reset. The CONTACT message may
1217 be transmitted at any time. An implementation MAY use additional
1218 mechanisms to detect partner unreachability.

1220 Implementers are advised to keep in mind that the timer-based CONTACT
1221 message mechanism is not perfect and may not detect some failures.
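The FO_SEND timer described at the start of this section could be kept per connection along the following lines. This is only a sketch: the class name, the use of plain numeric timestamps, and the polling style are assumptions, not something this document requires.

```python
class SendTimer:
    """Tracks idle time on one failover connection (FO_SEND timer)."""

    def __init__(self, fo_send_max, now=0.0):
        self.fo_send_max = fo_send_max  # FO_SEND_MAX, in seconds
        self.last_sent = now

    def on_message_sent(self, now):
        # Any transmitted message resets the timer.
        self.last_sent = now

    def poll(self, now):
        """Return True when a CONTACT message should be sent now."""
        if now - self.last_sent >= self.fo_send_max:
            self.last_sent = now  # sending CONTACT also resets the timer
            return True
        return False
```

Note that a timer of this kind observes only the failover connection itself.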
1223 In particular, if the partner is using one interface to reach clients 1224 ("downlink") and another to reach its partner ("uplink"), it is 1225 possible that communication with the clients will break, yet the 1226 mechanism will still claim full reachability. For that reason it is 1227 beneficial to share the same interface for client traffic and 1228 communication with the failover partner. That approach may have 1229 drawbacks in some network topologies. 1231 8.5. Re-allocating Leases 1233 When in PARTNER-DOWN state there is a waiting period after which a 1234 resource can be re-allocated to another client. For resources which 1235 are available when the server enters PARTNER-DOWN state, the period 1236 is the MCLT from the entry into PARTNER-DOWN state. For resources 1237 which are not available when the server enters PARTNER-DOWN state, 1238 the period is the MCLT after the later of the following times: the 1239 potential valid lifetime, the most recently transmitted potential 1240 valid lifetime, the most recently received acknowledged potential 1241 valid lifetime, and the most recently transmitted acknowledged 1242 potential valid lifetime. If this time would be earlier than the 1243 current time plus the MCLT, then the time the server entered PARTNER- 1244 DOWN state plus the maximum-client-lead-time is used. 1246 In any other state, a server cannot reallocate a resource from one 1247 client to another without first notifying its partner (through a 1248 BNDUPD message) and receiving acknowledgement (through a BNDACK 1249 message) that its partner is aware that that first client is not 1250 using the resource. 1252 This could be modeled in the following way. Though this specific 1253 implementation is in no way required, it may serve to better 1254 illustrate the concept. 1256 An "available" resource on a server may be allocated to any client. 
   A resource which was leased to a client and which expired or was
   released by that client would take on a new state, EXPIRED or
   RELEASED respectively. The partner server would then be notified
   that this resource was EXPIRED or RELEASED through a BNDUPD. When
   the sending server received the BNDACK for that resource showing it
   was FREE, it would move the resource from EXPIRED or RELEASED to
   FREE, and it would be available for allocation by the primary server
   to any clients.

   A server MAY reallocate a resource in the EXPIRED or RELEASED state
   to the same client with no restrictions provided it has not sent a
   BNDUPD message to its partner. This situation would exist if the
   lease expired or was released after the transition into PARTNER-DOWN
   state, for instance.

8.6.  Sending Binding Update

   This and the following section are written as though every BNDUPD
   message contains only a single binding update transaction in order to
   reduce the complexity of the discussion. Servers MAY generate
   messages with multiple binding update transactions in them, and their
   partner servers MAY process these messages. Before multiple binding
   update transactions are sent and processed over a failover
   connection, their use MUST be negotiated during the CONNECT and
   CONNECTACK connection establishment processing.

   Each server updates its failover partner about recent changes in
   lease states. Each update MUST include at least the following
   information:

   1.   resource type - non-temporary address or a prefix. Resource
        type can be indicated by the container that conveys the actual
        resource (e.g. an IA_NA option indicates a non-temporary IPv6
        address);

   2.   resource information - the actual address or prefix. That is
        conveyed using the appropriate option, e.g. an IAADDR for an
        address or an IAPREFIX for a prefix;

   3.   valid lifetime sent to client*;

   4.   IAID - Identity Association used by the client while obtaining
        a given lease. (Note1: one client may use many IAIDs
        simultaneously. Note2: IAID for IA, TA and PD are orthogonal
        number spaces.)*;

   5.   Next Expected Client Transmission (renewal time) - time interval
        since Client Last Transaction Time, when a response from a
        client is expected*;

   6.   potential valid lifetime - a lifetime that the server would be
        willing to set if there were no MCLT/failover restrictions
        imposed*;

   7.   preferred lifetime sent to client - the actual value sent back
        to the client*;

   8.   CLTT - Client Last Transaction Time, a timestamp of the last
        received transmission from a client*;

   9.   Client DUID*;

   10.  Resource state;

   11.  start time of state (especially for non-client updates).

   Items marked with an asterisk MUST appear only if the lease is or was
   associated with a client. Otherwise they MUST NOT appear.

   The BNDUPD message MAY contain additional information related to the
   updated lease. The additional information MAY include, but is not
   limited to:

   1.  assigned FQDN name, defined in [RFC4704];

   2.  Options Requested by the client, i.e. content of the ORO;

   3.  Relay Data option from DHCPv6 Leasequery, see [RFC5007]
       Section 4.1.2.4;

   4.  Any other options the updating partner deems useful.

   The receiving partner MAY store any additional information received,
   but it MAY choose to ignore it as well. Some information may be
   useful, so it is a good idea to keep or update it. One reason is
   FQDN information. A server SHOULD be prepared to clean up DNS
   information once the lease expires or is released. See Section 11
   for a detailed discussion about Dynamic DNS.
   Another reason the partner may be interested in keeping additional
   data is better support for leasequery [RFC5007] or bulk leasequery
   [RFC5460], which features queries based on Relay-ID, by link address
   and by Remote-ID.

8.7.  Receiving Binding Update

   When a server receives a BNDUPD message, it needs to decide how to
   process the binding update transaction it contains and whether that
   transaction represents a conflict of any sort. The conflict
   resolution process MUST be used on the receipt of every BNDUPD
   message, not just those that are received while in POTENTIAL-CONFLICT
   state, in order to increase the robustness of the protocol.

   There are four sorts of conflicts:

   1.  Two clients, one resource - This is the duplicate resource
       allocation conflict. Two different clients have each been
       allocated the same resource. See Section 8.8.

   2.  Two resources, one client conflict - This conflict exists when a
       client on one server is associated with one resource, and on
       the other server with a different resource in the same or related
       prefix. This does not refer to the case where a single client
       has resources in multiple different prefixes or administrative
       domains (e.g. a mobile client that changed its location), but
       rather the case where on the same prefix the client has a lease
       on one IP address in one server and on a different IP address on
       the other server.

       This conflict may or may not be a problem for a given DHCP server
       implementation and policy. If implementations and policies
       allow, both resources can be assigned to a given client. In the
       event that a DHCP server requires that a DHCP client have only
       one outstanding lease of a given type, the conflict MUST be
       resolved by accepting the lease which has the latest CLTT.
1381 It should be further clarified that DHCPv6 protocol makes 1382 assignments based on a (client DUID, resource type, IAID) 1383 triplet. The possibility of using different IAIDs was omitted in 1384 this paragraph for clarity. If one client is assigned multiple 1385 resources of the same type, but with different IAIDs, there is no 1386 conflict. Also, IAID values for different resource types are 1387 orthogonal, i.e. an IA_NA with IAID=1 is different than an IA_PD 1388 with IAID=1 and there is no conflict. 1390 3. binding-status conflict - This is normal conflict, where one 1391 server is updating the other with newer information. See 1392 Section 8.8 for details of how to resolve these conflicts. 1394 4. configuration conflict -- This kind of conflict stems from a 1395 differing configuration on one server than on the other server. 1396 It may be transient (last until both servers can process a new 1397 configuration) or it may be chronic. It cannot be resolved by 1398 communications over the failover connection, but must be resolved 1399 (if it is not transient) by administrator action to resolve the 1400 conflicts. 1402 8.8. Conflict Resolution 1404 The server receiving a lease update from its partner must evaluate 1405 the received lease information to see if it is consistent with 1406 already known state and decide which information - the previously 1407 known or that just received - is "better". The server should take 1408 into consideration the following aspects: if the lease is already 1409 assigned to a specific client, who had contact with client recently, 1410 start time of the lease, etc. 1412 When analyzing a BNDUPD message from a partner server, if there is 1413 insufficient information in the BNDUPD to process it, then reject the 1414 BNDUPD with reject-reason "Missing binding information". 
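   The decision rules that Figure 3 in this section tabulates can be
   sketched as a table lookup. This is a non-normative sketch: the
   Binding class, the resolve() and later() functions, and the
   representation of times as plain numbers are assumptions made for
   illustration only. The reject reasons are the strings quoted in this
   section.

```python
from dataclasses import dataclass
from typing import Optional

# Column of Figure 3, selected by the binding-status in the received BNDUPD.
COLUMN = {"ACTIVE": 0, "EXPIRED": 1, "RELEASED": 2,
          "FREE": 3, "FREE_BACKUP": 3, "RESET": 4, "ABANDONED": 4}

# Row of Figure 3, selected by the binding-status in the receiving server.
TABLE = {
    "ACTIVE":      ["accept5", "time2", "time1", "time2", "accept"],
    "EXPIRED":     ["time1", "accept", "accept", "accept", "accept"],
    "RELEASED":    ["time1", "time1", "accept", "accept", "accept"],
    "FREE":        ["accept"] * 5,
    "FREE_BACKUP": ["accept"] * 5,
    "RESET":       ["time3", "accept", "accept", "accept", "accept"],
    "ABANDONED":   ["reject4"] * 4 + ["accept"],
}

OUTDATED = "Outdated binding information"
LESS_CRITICAL = "Less critical binding information"
FATAL = "Fatal conflict exists: address in use by other client"


@dataclass
class Binding:                        # hypothetical record layout
    status: str
    client: Optional[str] = None
    cltt: Optional[float] = None      # client-last-transaction-time
    expiration: float = 0.0           # lease-expiration-time
    state_start: float = 0.0          # start-time-of-state


def later(update_time, local_time):
    """A missing time in the BNDUPD is never later; a missing local time
    always makes the BNDUPD later."""
    if update_time is None:
        return False
    if local_time is None:
        return True
    return update_time > local_time


def resolve(local, update, now, is_secondary):
    """Return (accepted, reject_reason) for a received BNDUPD."""
    rule = TABLE[local.status][COLUMN[update.status]]
    if rule == "accept":
        return True, None
    if rule == "reject4":
        return False, LESS_CRITICAL
    if rule == "accept5":
        if local.client == update.client or is_secondary:
            return True, None
        return False, FATAL
    if rule == "time1":
        ok = later(update.cltt, local.cltt)
    elif rule == "time2":
        ok = now > local.expiration
    else:  # time3
        ok = later(update.cltt, local.state_start)
    return (True, None) if ok else (False, OUTDATED)
```

   For example, resolve(Binding("EXPIRED", cltt=100),
   Binding("ACTIVE", cltt=50), now=200, is_secondary=False) rejects the
   update as outdated, because the received CLTT is not later than the
   local one.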
   If the resource in the BNDUPD is not a resource associated with the
   failover endpoint which received the BNDUPD message, then reject it
   with reject-reason "Illegal IP address or prefix (not part of any
   address or prefix pool)".

   Every BNDUPD message SHOULD contain a client-last-transaction-time
   option, which MUST, if it appears, be the time that the server last
   interacted with the DHCP client. It MUST NOT be, for instance, the
   time that the lease on an IP address expired. If there has been no
   interaction with the DHCP client in question (or there is no DHCP
   client presently associated with this resource), then there will be
   no client-last-transaction-time option in the BNDUPD message.

   The table in Figure 3 presents the conflict resolution outcome. To
   "accept" a BNDUPD means to update the server's bindings database with
   the information contained in the BNDUPD and, once the update is
   complete, send a BNDACK message corresponding to the BNDUPD message.
   To "reject" a BNDUPD means to leave the server's binding database
   unchanged and to respond to the BNDUPD with a BNDACK that includes a
   reject-reason option.

   When interpreting the information in the following table (Figure 3),
   for those rules that are listed with "time" -- if a BNDUPD doesn't
   have a client-last-transaction-time value, then it MUST NOT be
   considered later than the client-last-transaction-time in the
   receiving server's binding. If the BNDUPD contains a client-last-
   transaction-time value and the receiving server's binding does not,
   then the client-last-transaction-time value in the BNDUPD MUST be
   considered later than the server's.

                     binding-status in received BNDUPD
   binding-status
   in receiving                                       FREE         RESET
   server            ACTIVE     EXPIRED    RELEASED   FREE_BACKUP  ABANDONED

   ACTIVE            accept(5)  time(2)    time(1)    time(2)      accept
   EXPIRED           time(1)    accept     accept     accept       accept
   RELEASED          time(1)    time(1)    accept     accept       accept
   FREE/FREE_BACKUP  accept     accept     accept     accept       accept
   RESET             time(3)    accept     accept     accept       accept
   ABANDONED         reject(4)  reject(4)  reject(4)  reject(4)    accept

                       Figure 3: Conflict Resolution

   time(1): If the client-last-transaction-time in the BNDUPD is later
   than the client-last-transaction-time in the receiving server's
   binding, accept it, else reject it.

   time(2): If the current time is later than the receiving server's
   lease-expiration-time, accept it, else reject it.

   time(3): If the client-last-transaction-time in the BNDUPD is later
   than the start-time-of-state in the receiving server's binding,
   accept it, else reject it.

   (1,2,3): If rejecting, use reject reason "Outdated binding
   information".

   (4): Use reject reason "Less critical binding information".

   (5): If the clients in a BNDUPD message and in a receiving server's
   binding differ, then if the receiving server is a secondary accept
   it, else reject it with a reject reason of "Fatal conflict exists:
   address in use by other client".

   The lease update may be accepted or rejected. Rejection SHOULD NOT
   change the flag in a lease that says that it should be transmitted to
   the failover partner. If this flag is set, then it should be
   transmitted, but if it is not already set, the rejection of a lease
   state update SHOULD NOT trigger an automatic update of the failover
   partner sending the rejected update. The potential for update storms
   is too great, and in the unusual case where the servers simply can't
   agree, that disagreement is better than an update storm.

8.9.  Acknowledging Reception

   Upon acceptance of a binding update, the server MUST notify its
   partner that it updated its database. A server MUST NOT send the
   BNDACK before its database is updated. A BNDACK MUST contain at
   least the minimum set of information required to unambiguously
   identify the BNDUPD that triggered the BNDACK.

9.  Endpoint States

9.1.  State Machine Operation

   Each server (or, more accurately, failover endpoint) can take on a
   variety of failover states. These states play a crucial role in
   determining the actions that a server will perform when processing a
   request from a DHCPv6 client as well as dealing with changing
   external conditions (e.g., loss of connection to a failover partner).

   The failover state in which a server is running controls the
   following behaviors:

   o  Responsiveness -- the server is either responsive to DHCPv6 client
      requests or it is not.

   o  Allocation Pool -- which pool of addresses (or prefixes) can be
      used for advertisement on receipt of a SOLICIT or allocation on
      receipt of a REQUEST message.

   o  MCLT -- whether or not valid lifetimes are limited to what the
      partner has acknowledged plus the MCLT.

   A server will transition from one failover state to another based on
   the specific values held by the following state variables:

   o  Current failover state.

   o  Communications status (OK or not OK).

   o  Partner's failover state (if known).

   Whenever any of the above state variables changes state, the state
   machine is invoked, which may then trigger a change in the current
   failover state. Thus, whenever the communications status changes,
   the state machine processing is invoked. This may or may not result
   in a change in the current failover state.
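   One way to picture the three state variables and the rule that any
   change to them invokes the state machine is the sketch below. It is
   illustrative only: the FailoverEndpoint class, its method names and
   the toy_rule() transition function are hypothetical. The toy rule
   covers a single transition (NORMAL to COMMUNICATIONS-INTERRUPTED
   when communications fail) and stands in for the full set of
   transitions of this section.

```python
class FailoverEndpoint:
    """Holds the three state variables (hypothetical illustration)."""

    def __init__(self, state, comm_ok, transition):
        self.state = state              # current failover state
        self.comm_ok = comm_ok          # communications status
        self.partner_state = None       # partner's state, if known
        self._transition = transition   # (state, comm_ok, partner) -> state

    def set_comm_ok(self, ok):
        if ok != self.comm_ok:
            self.comm_ok = ok
            self._run_state_machine()

    def set_partner_state(self, partner_state):
        if partner_state != self.partner_state:
            self.partner_state = partner_state
            self._run_state_machine()

    def _run_state_machine(self):
        # A change of the current state is itself a state variable
        # change, so keep evaluating until the state is stable.
        while True:
            new_state = self._transition(
                self.state, self.comm_ok, self.partner_state)
            if new_state == self.state:
                return
            self.state = new_state


def toy_rule(state, comm_ok, partner_state):
    # Single illustrative transition; every other input keeps the state.
    if state == "NORMAL" and not comm_ok:
        return "COMMUNICATIONS-INTERRUPTED"
    return state


endpoint = FailoverEndpoint("NORMAL", comm_ok=True, transition=toy_rule)
endpoint.set_partner_state("NORMAL")     # invoked, no state change
state_after_partner_update = endpoint.state
endpoint.set_comm_ok(False)              # invoked, state changes
```

   The first call shows an invocation that leaves the current failover
   state unchanged; the second shows one that changes it.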
1536 Whenever a server transitions to a new failover state, the new state 1537 MUST be communicated to its failover partner in a STATE message if 1538 the communications status is OK. In addition, whenever a server 1539 makes a transition into a new state, it MUST record the new state, 1540 its current understanding of its partner's state, and the time at 1541 which it entered the new state in stable storage. 1543 The following state transition diagram gives a condensed view of the 1544 state machine. If there is a difference between the words describing 1545 a particular state and the diagram below, the words should be 1546 considered authoritative. 1548 In the state transition diagram below, the "+" or "-" in the upper 1549 right corner of each state is a notation about whether communication 1550 is ongoing with the other server. 1552 +---------------+ V +--------------+ 1553 | RECOVER -|+| | | STARTUP - | 1554 |(unresponsive) | +->+(unresponsive)| 1555 +------+--------+ +--------------+ 1556 +-Comm. OK +-----------------+ 1557 | Other State: | PARTNER DOWN - +<---------------------+ 1558 | RESOLUTION-INTER. | (responsive) | ^ 1559 All POTENTIAL- +----+------------+ | 1560 Others CONFLICT------------ | --------+ | 1561 | CONFLICT-DONE Comm. OK | +--------------+ | 1562 UPDREQ or Other State: | +--+ RESOLUTION - | | 1563 UPDREQALL | | | | | INTERRUPTED | | 1564 Rcv UPDDONE RECOVER All | | | (responsive) | | 1565 | +---------------+ | Others | | +------------+-+ | 1566 +->+RECOVER-WAIT +-| RECOVER | | | ^ | | 1567 |(unresponsive) | WAIT or | | Comm. | Ext. | 1568 +-----------+---+ DONE | | OK Comm. Cmd---->+ 1569 Comm.---+ Wait MCLT | V V V Failed | 1570 Changed | V +---+ +---+-----+--+-+ | | 1571 | +---+----------++ | | POTENTIAL + +-------+ | 1572 | |RECOVER-DONE +-| Wait | CONFLICT +------+ | 1573 +->+(unresponsive) | for |(unresponsive)| Primary | 1574 +------+--------+ Other +>+----+--------++ resolve Comm. | 1575 Comm. 
OK State: | | ^ conflict Changed| 1576 +---Other State:-+ RECOVER | Secondary | V V | | 1577 | | | DONE | resolve | ++----------+---++ | 1578 | All Others: POTENT. | | conflict | |CONFLICT-DONE-|+| | 1579 | Wait for CONFLICT--|-----+ | | | (responsive) | | 1580 | Other State: V V | +------+---------+ | 1581 | NORMAL or RECOVER ++------------+---+ | Other State: NORMAL | 1582 | | DONE | NORMAL + +<--------------+ | 1583 | +--+----------+-->+ (responsive) +-------External Command-->+ 1584 | ^ ^ +--------+--------+ | 1585 | | | | | | 1586 | Wait for Comm. OK Comm. Failed | | 1587 | Other Other | | External 1588 | State: State: | | Command 1589 | RECOVER-DONE NORMAL Start Safe Comm. OK or 1590 | | COMM. INT. Period Timer Other State: Safe 1591 | Comm. OK. | V All Others Period 1592 | Other State: | +---------+--------+ | expiration 1593 | RECOVER +--+ COMMUNICATIONS - +----+ | 1594 | +-------------+ INTERRUPTED | | 1595 RECOVER | (responsive) +------------------------->+ 1596 RECOVER-WAIT--------->+------------------+ 1598 Figure 4: Failover Endpoint State Machine 1600 9.2. State Machine Initialization 1602 The state machine is characterized by storage (in stable storage) of 1603 at least the following information: 1605 o Current failover state. 1607 o Previous failover state. 1609 o Start time of current failover state. 1611 o Partner's failover state. 1613 o Start time of partner's failover state. 1615 o Time most recent packet received from partner. 1617 The state machine is initialized by reading these data items from 1618 stable storage and restoring their values from the information saved. 1619 If there is no information in stable storage concerning these items, 1620 then they should be initialized as follows: 1622 o Current failover state: Primary: PARTNER-DOWN, Secondary: RECOVER 1624 o Previous failover state: None. 1626 o Start time of current failover state: Current time. 1628 o Partner's failover state: None until reception of STATE message. 
1630 o Start time of partner's failover state: None until reception of 1631 STATE message. 1633 o Time most recent packet received from partner: None until packet 1634 received. 1636 9.3. STARTUP State 1638 The STARTUP state affords an opportunity for a server to probe its 1639 partner server, before starting to service DHCP clients. When in the 1640 STARTUP state, a server attempts to learn its partner's state and 1641 determine (using that information if it is available) what state it 1642 should enter. 1644 The STARTUP state is not shown with any specific state transitions in 1645 the state machine diagram (Figure 4) because the processing during 1646 the STARTUP state can cause the server to transition to any of the 1647 other states, so that specific state transition arcs would only 1648 obscure other information. 1650 9.3.1. Operation in STARTUP State 1652 The server MUST NOT be responsive to DHCPv6 clients in STARTUP state. 1654 Whenever a STATE message is sent to the partner while in STARTUP 1655 state the STARTUP flag MUST be set in the message and the previously 1656 recorded failover state MUST be placed in the server-state option. 1658 9.3.2. Transition Out of STARTUP State 1660 The following algorithm is followed every time the server initializes 1661 itself, and enters STARTUP state. 1663 Step 1: 1665 If there is any record in stable storage of a previous failover state 1666 for this server, set PREVIOUS-STATE to the last recorded value in 1667 stable storage, and go to Step 2. 1669 If there is no record of any previous failover state in stable 1670 storage for this server, then set the PREVIOUS-STATE to RECOVER and 1671 set the TIME-OF-FAILURE to 0. This will allow two servers which 1672 already have lease information to synchronize themselves prior to 1673 operating. 1675 In some cases, an existing server will be commissioned as a failover 1676 server and brought back into operation where its partner is not yet 1677 available. 
In this case, the newly commissioned failover server will 1678 not operate until its partner comes online -- but it has operational 1679 responsibilities as a DHCP server nonetheless. To properly handle 1680 this situation, a server SHOULD be configurable in such a way as to 1681 move directly into PARTNER-DOWN state after the startup period 1682 expires if it has been unable to contact its partner during the 1683 startup period. 1685 Step 2: 1687 Implementations will differ in the ways that they deal with the state 1688 machine for failover endpoint states. In many cases, state 1689 transitions will occur when communications goes from "OK" to failed, 1690 or from failed to "OK", and some implementations will implement a 1691 portion of their state machine processing based on these changes. 1693 In these cases, during startup, if the previous state is one where 1694 communications was "OK", then set the previous state to the state 1695 that is the result of the communications failed state transition when 1696 in that state (if such transition exists -- some states don't have a 1697 communications failed state transition, since they allow both 1698 communications OK and failed). 1700 Step 3: 1702 Start the STARTUP state timer. The time that a server remains in the 1703 STARTUP state (absent any communications with its partner) is 1704 implementation dependent but SHOULD be short. It SHOULD be long 1705 enough for a TCP connection to be created to a heavily loaded partner 1706 across a slow network. 1708 Step 4: 1710 Attempt to create a TCP connection to the failover partner. 1712 Step 5: 1714 Wait for "communications OK". 1716 When and if communications become "okay", clear the STARTUP flag, and 1717 set the current state to the PREVIOUS-STATE. 
1719 If the partner is in PARTNER-DOWN state, and if the time at which it 1720 entered PARTNER-DOWN state (as received in the start-time-of-state 1721 option in the STATE message) is later than the last recorded time of 1722 operation of this server, then set CURRENT-STATE to RECOVER. If the 1723 time at which it entered PARTNER-DOWN state is earlier than the last 1724 recorded time of operation of this server, then set CURRENT-STATE to 1725 POTENTIAL-CONFLICT. 1727 Then, transition to the current state and take the "communications 1728 OK" state transition based on the current state of this server and 1729 the partner. 1731 Step 6: 1733 If the startup time expires the server SHOULD transition to the 1734 PREVIOUS-STATE. 1736 9.4. PARTNER-DOWN State 1738 PARTNER-DOWN state is a state either server can enter. When in this 1739 state, the server assumes that it is the only server operating and 1740 serving the client base. If one server is in PARTNER-DOWN state, the 1741 other server MUST NOT be operating. 1743 A server can enter PARTNER-DOWN state either as a result of operator 1744 intervention (when an operator determines that the server's partner 1745 is, indeed, down), or as a result of an optional auto-partner-down 1746 capability where PARTNER-DOWN state is entered automatically after a 1747 server has been in COMMUNICATIONS-INTERRUPTED state for a pre- 1748 determined period of time. 1750 9.4.1. Operation in PARTNER-DOWN State 1752 The server MUST be responsive in PARTNER-DOWN state, regardless if it 1753 is primary or secondary. 1755 It will allow renewal of all outstanding leases on resources. For 1756 those resources for which the server is using proportional 1757 allocation, it will allocate resources from its own pool, and after a 1758 fixed period of time (the MCLT interval) has elapsed from entry into 1759 PARTNER-DOWN state, it may allocate IP addresses from the set of all 1760 available pools. 
Server SHOULD fully deplete its own pool, before 1761 starting allocations from its downed partner's pool. 1763 Any resource tagged as available for allocation by the other server 1764 (at entry to PARTNER-DOWN state) MUST NOT be allocated to a new 1765 client until the MCLT beyond the entry into PARTNER-DOWN state has 1766 elapsed. 1768 A server in PARTNER-DOWN state MUST NOT allocate a resource to a DHCP 1769 client different from that to which it was allocated at the entrance 1770 to PARTNER-DOWN state until the MCLT beyond the maximum of the 1771 following times: client expiration time, most recently transmitted 1772 potential-expiration-time, most recently received ack of potential- 1773 expiration-time from the partner, and most recently acked potential- 1774 expiration-time to the partner. If this time would be earlier than 1775 the current time plus the maximum-client-lead-time, then the time the 1776 server entered PARTNER-DOWN state plus the maximum-client-lead-time 1777 is used. 1779 The server is not restricted by the MCLT when offering lease times 1780 while in PARTNER-DOWN state. 1782 In the unlikely case when there are two servers operating in a 1783 PARTNER-DOWN state, there is a chance of duplicate leases assigned. 1785 This leads to a POTENTIAL-CONFLICT (unresponsive) state when they re- 1786 establish contact. The duplicate lease issue can be postponed to a 1787 large extent by the server granting new leases first from its own 1788 pool. Therefore the server operating in PARTNER-DOWN state MUST use 1789 its own pool first for new leases before assigning any leases from 1790 its downed partner pool. 1792 9.4.2. Transition Out of PARTNER-DOWN State 1794 When a server in PARTNER-DOWN state succeeds in establishing a 1795 connection to its partner, its actions are conditional on the state 1796 and flags received in the STATE message from the other server as part 1797 of the process of establishing the connection. 
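   The rules that this section gives for leaving PARTNER-DOWN state
   reduce to a small lookup, sketched here for illustration. The
   function name and its arguments are hypothetical; the state names
   and outcomes are the ones listed in this section.

```python
CONFLICTING = {
    "NORMAL", "COMMUNICATIONS-INTERRUPTED", "PARTNER-DOWN",
    "POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED", "CONFLICT-DONE",
}


def partner_down_next_state(partner_state, startup_flag):
    """Next state of a server in PARTNER-DOWN after receiving a STATE
    message from its partner (hypothetical helper)."""
    if startup_flag:
        # STATE messages with the STARTUP bit set are ignored.
        return "PARTNER-DOWN"
    if partner_state in CONFLICTING:
        return "POTENTIAL-CONFLICT"
    if partner_state in {"RECOVER", "RECOVER-WAIT"}:
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError(f"unexpected partner state: {partner_state}")
```

   Note that a partner reporting NORMAL with the STARTUP bit set causes
   no transition, whereas the same state without the bit leads to
   POTENTIAL-CONFLICT.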
1799 If the STARTUP bit is set in the server-flags option of a received 1800 STATE message, a server in PARTNER-DOWN state MUST NOT take any state 1801 transitions based on reestablishing communications. Essentially, if 1802 a server is in PARTNER-DOWN state, it ignores all STATE messages from 1803 its partner that have the STARTUP bit set in the server-flags option 1804 of the STATE message. 1806 If the STARTUP bit is not set in the server-flags option of a STATE 1807 message received from its partner, then a server in PARTNER-DOWN 1808 state takes the following actions based on the state of the partner 1809 as received in a STATE message (either immediately after establishing 1810 communications or at any time later when a new state is received) 1812 o If the partner is in: [ NORMAL, COMMUNICATIONS-INTERRUPTED, 1813 PARTNER-DOWN, POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or 1814 CONFLICT-DONE ] state, then transition to POTENTIAL-CONFLICT state 1816 o If the partner is in: [ RECOVER, RECOVER-WAIT ] state stay in 1817 PARTNER-DOWN state 1819 o If the partner is in: [ RECOVER-DONE ] state transition into 1820 NORMAL state 1822 9.5. RECOVER State 1824 This state indicates that the server has no information in its stable 1825 storage or that it is re-integrating with a server in PARTNER-DOWN 1826 state after it has been down. A server in this state MUST attempt to 1827 refresh its stable storage from the other server. 1829 9.5.1. Operation in RECOVER State 1831 The server MUST NOT be responsive in RECOVER state. 1833 A server in RECOVER state will attempt to reestablish communications 1834 with the other server. 1836 9.5.2. Transition Out of RECOVER State 1838 If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, 1839 or CONFLICT-DONE state when communications are reestablished, then 1840 the server in RECOVER state will move to POTENTIAL-CONFLICT state 1841 itself. 
   If the other server is in any other state, then the server in RECOVER
   state will request an update of missing binding information by
   sending an UPDREQ or UPDREQALL message. If the server has determined
   that it has lost its stable storage because it has no record of ever
   having talked to its partner, while its partner does have a record of
   communicating with it, it MUST send an UPDREQALL message, otherwise
   it MUST send an UPDREQ message.

   It will wait for an UPDDONE message, and upon receipt of that message
   it will transition to RECOVER-WAIT state.

   If communications fails during the reception of the results of the
   UPDREQ or UPDREQALL message, the server will remain in RECOVER state,
   and will re-issue the UPDREQ or UPDREQALL when communications are
   re-established.

   If an UPDDONE message isn't received within an implementation
   dependent amount of time, and no BNDUPD messages are being received,
   the connection SHOULD be dropped.

               A                                        B
             Server                                   Server

               |                                        |
            RECOVER                               PARTNER-DOWN
               |                                        |
               | >--UPDREQ-------------------->         |
               |                                        |
               |        <---------------------BNDUPD--< |
               | >--BNDACK-------------------->         |
              ...                                      ...
               |                                        |
               |        <---------------------BNDUPD--< |
               | >--BNDACK-------------------->         |
               |                                        |
               |        <--------------------UPDDONE--< |
               |                                        |
          RECOVER-WAIT                                  |
               |                                        |
               | >--STATE-(RECOVER-WAIT)------>         |
               |                                        |
               |                                        |
   Wait MCLT from last known                            |
   time of failover operation                           |
               |                                        |
          RECOVER-DONE                                  |
               |                                        |
               | >--STATE-(RECOVER-DONE)------>         |
               |                                      NORMAL
               |        <-------------(NORMAL)-STATE--< |
             NORMAL                                     |
               | >--STATE-(NORMAL)------------>         |
               |                                        |
               |                                        |

               Figure 5: Transition out of RECOVER state

   If at any time while a server is in RECOVER state communications
   fails, the server will stay in RECOVER state.
When communications 1902 are restored, it will restart the process of transitioning out of 1903 RECOVER state. 1905 9.6. RECOVER-WAIT State 1907 This state indicates that the server has sent an UPDREQ or UPDREQALL 1908 and has received the UPDDONE message indicating that it has received 1909 all outstanding binding update information. In the RECOVER-WAIT 1910 state the server will wait for the MCLT in order to ensure that any 1911 processing that this server might have done prior to losing its 1912 stable storage will not cause future difficulties. 1914 9.6.1. Operation in RECOVER-WAIT State 1916 The server MUST NOT be responsive in RECOVER-WAIT state. 1918 9.6.2. Transition Out of RECOVER-WAIT State 1920 Upon entry to RECOVER-WAIT state the server MUST start a timer whose 1921 expiration is set to a time equal to the time the server went down 1922 (if known) or the time the server started (if the down-time is 1923 unknown) plus the maximum-client-lead-time. When this timer expires, 1924 the server will transition into RECOVER-DONE state. 1926 This is to allow any IP addresses that were allocated by this server 1927 prior to loss of its client binding information in stable storage to 1928 contact the other server or to time out. 1930 If this is the first time this server has run failover -- as 1931 determined by the information received from the partner, not 1932 necessarily only as determined by this server's stable storage (as 1933 that may have been lost), then the waiting time discussed above may 1934 be skipped, and the server MAY transition immediately to RECOVER-DONE 1935 state. 1937 If the server has never before run failover, then there is no need to 1938 wait in this state -- but, again, to determine if this server has run 1939 failover it is vital that the information provided by the partner be 1940 utilized, since the stable storage of this server may have been lost. 
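   The timer rule of this section can be sketched as follows. The
   function name, its parameters and the use of plain numbers for times
   are assumptions of this sketch. Returning the entry time when the
   partner reports that this server has never run failover models the
   optional immediate transition to RECOVER-DONE.

```python
def recover_wait_expiry(entered_at, mclt, started_at,
                        went_down_at=None, first_failover_run=False):
    """Time at which a server may leave RECOVER-WAIT (hypothetical helper).

    entered_at         -- time of entry into RECOVER-WAIT state
    mclt               -- maximum-client-lead-time, in the same units
    started_at         -- time the server started
    went_down_at       -- time the server went down, or None if unknown
    first_failover_run -- True if, according to the partner, this
                          server has never run failover before
    """
    if first_failover_run:
        # The wait MAY be skipped entirely.
        return entered_at
    base = went_down_at if went_down_at is not None else started_at
    return base + mclt
```

   If the computed expiration is already in the past when RECOVER-WAIT
   is entered, the timer expires at once and the server moves on to
   RECOVER-DONE.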

   If communications fail while a server is in RECOVER-WAIT state, this
   has no effect on the operation of this state. The server SHOULD
   continue to operate its timer, and if the timer expires during the
   period where communications with the other server have failed, then
   the server SHOULD transition to RECOVER-DONE state. This is unusual,
   since failover state transitions are not usually made while
   communications are interrupted, but in this case there is no reason
   to inhibit the timer.

9.7. RECOVER-DONE State

   This state exists to allow an interlocked transition for one server
   from RECOVER state and another server from PARTNER-DOWN or
   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

9.7.1. Operation in RECOVER-DONE State

   A server in RECOVER-DONE state SHOULD be unresponsive. It MAY
   respond to RENEW requests, but when doing so it MUST change the
   state of only those resources that appear in the RENEW request. It
   MUST NOT allocate any additional resources when in RECOVER-DONE
   state.

9.7.2. Transition Out of RECOVER-DONE State

   When a server in RECOVER-DONE state determines that its partner
   server has entered NORMAL or RECOVER-DONE state, then it will
   transition into NORMAL state.

   If communications fail while in RECOVER-DONE state, a server will
   stay in RECOVER-DONE state.

9.8. NORMAL State

   NORMAL state is the state used by a server when it is communicating
   with the other server, and any required resynchronization has been
   performed. While some bindings database synchronization is performed
   in NORMAL state, potential conflicts are resolved prior to entry into
   NORMAL state, as is any loss of binding database data.

   When entering NORMAL state, a server will send to the other server
   all currently unacknowledged binding updates as BNDUPD messages.

   When the above process is complete, if the server entering NORMAL
   state is a secondary server, then it will request resources
   (addresses and/or prefixes) for allocation using the POOLREQ message.

9.8.1. Operation in NORMAL State

   The primary server is responsive in NORMAL state. The secondary
   server is unresponsive in NORMAL state.

   When in NORMAL state a primary server will operate in the following
   manner:

   Lease time calculations
      As discussed in Section 8.3, the lease interval given to a DHCP
      client can never be more than the MCLT greater than the most
      recently received potential-expiration-time from the failover
      partner or the current time, whichever is later.

      As long as a server adheres to this constraint, the specifics of
      the lease interval that it gives to a DHCP client or the value of
      the potential-expiration-time sent to its failover partner are
      implementation-dependent.

   Lazy update of partner server
      After sending a REPLY that includes a lease update to a client,
      the server servicing a DHCP client request attempts to update its
      partner with the new binding information.

   Reallocation of resources between clients
      Whenever a client binding is released or expires, a BNDUPD message
      must be sent to the partner, setting the binding state to RELEASED
      or EXPIRED. However, until a BNDACK is received for this message,
      the resource cannot be allocated to another client. It cannot be
      allocated to the same client again if a BNDUPD was sent; otherwise
      it can. See Section 8.5 for details.

   In NORMAL state, each server receives binding updates from its
   partner server in BNDUPD messages. It records these in its client
   binding database in stable storage and then sends a corresponding
   BNDACK message to its partner server.

9.8.2. Transition Out of NORMAL State

   If an external command is received by a server in NORMAL state
   informing it that its partner is down, then it will transition into
   PARTNER-DOWN state. Generally, this would be an unusual situation,
   where some external agency knew the partner server was down prior to
   the failover server discovering it on its own.

   If a server in NORMAL state fails to receive acknowledgments for
   messages sent to its partner for an implementation-dependent period
   of time, it MAY move into COMMUNICATIONS-INTERRUPTED state. This
   situation might occur if the partner server was capable of
   maintaining the TCP connection between the servers and also capable
   of sending a CONTACT message periodically, but was (for some reason)
   incapable of processing BNDUPD messages.

   If communications are determined not to be "ok" (as defined in
   Section 8.4), then the server will transition into
   COMMUNICATIONS-INTERRUPTED state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed state from that expected by the server
   in NORMAL state, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state and take the appropriate state
   transition from there. For example, it would be expected for the
   partner to transition from POTENTIAL-CONFLICT into NORMAL state, but
   not for the partner to transition from NORMAL into POTENTIAL-CONFLICT
   state.

   If a server in NORMAL state receives a DISCONNECT message from its
   partner, the server should transition into COMMUNICATIONS-INTERRUPTED
   state.

9.9. COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with its partner.
   Primary and secondary servers cycle automatically (without
   administrative intervention) between NORMAL and
   COMMUNICATIONS-INTERRUPTED state as the network connection between
   them fails and recovers, or as the partner server cycles between
   operational and non-operational. No duplicate resource allocation
   can occur while the servers cycle between these states.

   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
   configured to support an automatic transition out of
   COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e.,
   an auto-partner-down period has been configured), then a timer MUST
   be started for the length of the configured auto-partner-down period.

   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
   the NORMAL state SHOULD raise some alarm condition to alert
   administrative staff to a potential problem in the DHCP subsystem.

9.9.1. Operation in COMMUNICATIONS-INTERRUPTED State

   In this state a server MUST respond to all DHCP client requests.
   When allocating new leases, each server allocates from its own pool,
   where the primary MUST allocate only FREE resources, and the
   secondary MUST allocate only FREE_BACKUP resources. When responding
   to RENEW messages, each server will allow continued renewal of a DHCP
   client's current lease on a resource irrespective of whether that
   lease was given out by the receiving server or not, although the
   renewal period MUST NOT exceed the maximum client lead time (MCLT)
   beyond the latest of: 1) the potential valid lifetime already
   acknowledged by the other server, 2) now, or 3) the potential valid
   lifetime received from the partner server.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged potential valid lifetime will not be updated
   in any new bindings.
   This is likely to eventually cause the actual valid lifetimes to
   converge to the MCLT (unless this is greater than the
   desired-client-lease-time).

   The server should continue to try to establish a connection with its
   partner.

9.9.2. Transition Out of COMMUNICATIONS-INTERRUPTED State

   If the auto-partner-down timer expires while a server is in the
   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
   PARTNER-DOWN state.

   If an external command is received by a server in
   COMMUNICATIONS-INTERRUPTED state informing it that its partner is
   down, it will transition immediately into PARTNER-DOWN state.

   If communications are restored with the other server, then the server
   in COMMUNICATIONS-INTERRUPTED state will transition into another
   state based on the state of the partner:

   o  NORMAL or COMMUNICATIONS-INTERRUPTED: Transition into the NORMAL
      state.

   o  RECOVER: Stay in COMMUNICATIONS-INTERRUPTED state.

   o  RECOVER-DONE: Transition into NORMAL state.

   o  PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
      RESOLUTION-INTERRUPTED: Transition into POTENTIAL-CONFLICT state.

   The following figure illustrates the transition from NORMAL to
   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

          Primary                          Secondary
          Server                            Server

          NORMAL                            NORMAL
             | >--CONTACT--------------------> |
             | <--------------------CONTACT--< |
             |     [TCP connection broken]     |
      COMMUNICATIONS          :         COMMUNICATIONS
        INTERRUPTED           :           INTERRUPTED
             |  [attempt new TCP connection]   |
             |      [connection succeeds]      |
             |                                 |
             | >--CONNECT--------------------> |
             | <-----------------CONNECTACK--< |
             |                              NORMAL
             | <-------------------STATE-----< |
          NORMAL                               |
             | >--STATE----------------------> |
             |                                 |
             | >--BNDUPD---------------------> |
             | <---------------------BNDACK--< |
             |                                 |
             | <---------------------BNDUPD--< |
             | >--BNDACK---------------------> |
            ...                               ...
             |                                 |
             | <--------------------POOLREQ--< |
             | >--POOLRESP-------------------> |
             |                                 |
             | >--BNDUPD-(#1)----------------> |
             | <---------------------BNDACK--< |
             |                                 |
             | >--BNDUPD-(#2)----------------> |
             | <---------------------BNDACK--< |
             |                                 |

   Figure 6: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED and
        back (example with 2 addresses allocated to secondary)

9.10. POTENTIAL-CONFLICT State

   This state indicates that the two servers are attempting to
   reintegrate with each other, but at least one of them was running in
   a state that did not guarantee automatic reintegration would be
   possible. In POTENTIAL-CONFLICT state the servers may determine that
   the same resource has been offered and accepted by two different
   clients.

   It is a goal of this protocol to minimize the possibility that
   POTENTIAL-CONFLICT state is ever entered.

   When a primary server enters POTENTIAL-CONFLICT state it should
   request that the secondary send it all updates of which it is
   currently unaware by sending an UPDREQ message to the secondary
   server.

   A secondary server entering POTENTIAL-CONFLICT state will wait for
   the primary to send it an UPDREQ message.

9.10.1. Operation in POTENTIAL-CONFLICT State

   Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
   DHCP requests.

9.10.2. Transition Out of POTENTIAL-CONFLICT State

   If communications with the partner fail while in POTENTIAL-CONFLICT
   state, then the server will transition to RESOLUTION-INTERRUPTED
   state.

   Whenever either server receives an UPDDONE message from its partner
   while in POTENTIAL-CONFLICT state, it MUST transition to a new state.
   The primary MUST transition to CONFLICT-DONE state, and the secondary
   MUST transition to NORMAL state. This will cause the primary server
   to leave POTENTIAL-CONFLICT state prior to the secondary, since the
   primary sends an UPDREQ message and receives an UPDDONE before the
   secondary sends an UPDREQ message and receives its UPDDONE message.

   When a secondary server receives an indication that the primary
   server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE
   state, it SHOULD send an UPDREQ message to the primary server.

          Primary                          Secondary
          Server                            Server
             |                                 |
    POTENTIAL-CONFLICT                POTENTIAL-CONFLICT
             |                                 |
             | >--UPDREQ---------------------> |
             |                                 |
             | <---------------------BNDUPD--< |
             | >--BNDACK---------------------> |
            ...                               ...
             |                                 |
             | <---------------------BNDUPD--< |
             | >--BNDACK---------------------> |
             |                                 |
             | <--------------------UPDDONE--< |
       CONFLICT-DONE                           |
             | >--STATE--(CONFLICT-DONE)-----> |
             | <---------------------UPDREQ--< |
             |                                 |
             | >--BNDUPD---------------------> |
             | <---------------------BNDACK--< |
            ...                               ...
             | >--BNDUPD---------------------> |
             | <---------------------BNDACK--< |
             |                                 |
             | >--UPDDONE--------------------> |
             |                              NORMAL
             | <------------STATE--(NORMAL)--< |
          NORMAL                               |
             | >--STATE--(NORMAL)------------> |
             |                                 |
             | <--------------------POOLREQ--< |
             | >--POOLRESP-------------------> |
             |                                 |

            Figure 7: Transition out of POTENTIAL-CONFLICT

9.11. RESOLUTION-INTERRUPTED State

   This state indicates that the two servers were attempting to
   reintegrate with each other in POTENTIAL-CONFLICT state, but
   communications failed prior to completion of reintegration.

   The RESOLUTION-INTERRUPTED state exists because servers are not
   responsive in POTENTIAL-CONFLICT state, and if one server drops out
   of service while both servers are in POTENTIAL-CONFLICT state, the
   server that remains in service will not be able to process DHCP
   client requests and there will be no DHCP service available. The
   RESOLUTION-INTERRUPTED state is the state that a server moves to if
   its partner disappears while it is in POTENTIAL-CONFLICT state.

   When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise an
   alarm condition to alert administrative staff of a problem in the
   DHCP subsystem.

9.11.1. Operation in RESOLUTION-INTERRUPTED State

   In this state a server MUST respond to all DHCP client requests.
   When allocating new resources, each server SHOULD allocate from its
   own pool (if that can be determined), where the primary SHOULD
   allocate only FREE resources, and the secondary SHOULD allocate only
   FREE_BACKUP resources.
   When responding to renewal requests, each server will allow continued
   renewal of a DHCP client's current lease independent of whether that
   lease was given out by the receiving server or not, although the
   renewal period MUST NOT exceed the maximum client lead time (MCLT)
   beyond the latest of: 1) the potential valid lifetime already
   acknowledged by the other server, 2) now, or 3) the potential valid
   lifetime received from the partner server.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged potential valid lifetime will not be updated
   in any new bindings.

9.11.2. Transition Out of RESOLUTION-INTERRUPTED State

   If an external command is received by a server in
   RESOLUTION-INTERRUPTED state informing it that its partner is down,
   it will transition immediately into PARTNER-DOWN state.

   If communications are restored with the other server, then the server
   in RESOLUTION-INTERRUPTED state will transition into
   POTENTIAL-CONFLICT state.

9.12. CONFLICT-DONE State

   This state indicates that during the process where the two servers
   are attempting to reintegrate with each other, the primary server
   has received all of the updates from the secondary server. It makes
   a transition into CONFLICT-DONE state in order that it may be totally
   responsive to the client load. There is no operational difference
   between CONFLICT-DONE and NORMAL for the primary, as in both states
   it responds to all clients' requests. The distinction between
   CONFLICT-DONE and NORMAL states will be more apparent when a load
   balancing extension is defined.

9.12.1. Operation in CONFLICT-DONE State

   A primary server in CONFLICT-DONE state is fully responsive to all
   DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
   state).

   If communications fail, the server remains in CONFLICT-DONE state.
   If communications become OK, the server remains in CONFLICT-DONE
   state until the conditions for transition out are satisfied.

9.12.2. Transition Out of CONFLICT-DONE State

   If communications with the partner fail while in CONFLICT-DONE
   state, then the server will remain in CONFLICT-DONE state.

   When a primary server determines that the secondary server has made a
   transition into NORMAL state, the primary server will also transition
   into NORMAL state.

10. Proposed extensions

   The following section discusses possible extensions to the proposed
   failover mechanism. Listed extensions must be simple enough not to
   complicate the failover protocol further. Any proposals that are
   considered complex will be defined as stand-alone extensions in
   separate documents.

10.1. Active-active mode

   A very simple way to achieve active-active mode is to remove the
   restriction that the secondary server MUST NOT respond to SOLICIT and
   REQUEST messages. Instead it could respond, but MUST have a lower
   preference than the primary server. Clients discovering available
   servers will receive ADVERTISE messages from both servers, but are
   expected to select the primary server as it has a higher preference
   value configured. The subsequent REQUEST message will be directed to
   the primary server.

   The benefit of this approach, compared to the "basic" active-passive
   solution, is that there is no delay between primary failure and the
   moment when the secondary starts serving requests.

11. Dynamic DNS Considerations

   DHCP servers (and clients) can use DNS Dynamic Updates as described
   in RFC 2136 [RFC2136] to maintain DNS name-mappings as they maintain
   DHCP leases. Many different administrative models for DHCP-DNS
   integration are possible.
   Descriptions of several of these models, and guidelines that DHCP
   servers and clients should follow in carrying them out, are laid out
   in RFC 4704 [RFC4704].

   The nature of the failover protocol introduces some issues concerning
   dynamic DNS (DDNS) updates that do not arise in non-failover
   environments. This section describes these issues, and defines the
   information which failover partners should exchange in order to
   ensure consistent behavior. The presence of this section should not
   be interpreted as requiring an implementation of the DHCPv6 failover
   protocol to also support DDNS updates.

   The purpose of this discussion is to clarify the areas where the
   failover and DHCP-DDNS protocols intersect for the benefit of
   implementations which support both protocols, not to introduce a new
   requirement into the DHCPv6 failover protocol. Thus, a DHCPv6 server
   which implements the failover protocol MAY also support dynamic DNS
   updates, but if it does support dynamic DNS updates it SHOULD utilize
   the techniques described here in order to correctly distribute them
   between the failover partners. See RFC 4704 [RFC4704] as well as RFC
   4703 [RFC4703] for information on how DHCPv6 servers deal with
   potential conflicts when updating DNS even without failover.

   From the standpoint of the failover protocol, there is no reason why
   a server which is utilizing the DDNS protocol to update a DNS server
   should not be a partner with a server which is not utilizing the DDNS
   protocol to update a DNS server. However, a server which is not able
   to support DDNS or is not configured to support DDNS SHOULD output a
   warning message when it receives BNDUPD messages which indicate that
   its failover partner is configured to support the DDNS protocol to
   update a DNS server.
   An implementation MAY consider this an error and refuse to operate,
   or it MAY choose to operate anyway, having warned the administrator
   of the problem in some way.

11.1. Relationship between failover and dynamic DNS update

   The failover protocol describes the conditions under which each
   failover server may renew a lease to its current DHCP client, and
   describes the conditions under which it may grant a lease to a new
   DHCP client. An analogous set of conditions determines when a
   failover server should initiate a DDNS update, and when it should
   attempt to remove records from the DNS. The failover protocol's
   conditions are based on the desired external behavior: avoiding
   duplicate address and prefix assignments; allowing clients to
   continue using leases which they obtained from one failover partner
   even if they can only communicate with the other partner; allowing
   the secondary DHCP server to grant new leases even if it is unable to
   communicate with the primary server. The desired external DDNS
   behavior for DHCP failover servers is similar to that described above
   for the failover protocol itself:

   1. Allow timely DDNS updates from the server which grants a lease to
      a client. Recognize that there is often a DDNS update lifecycle
      which parallels the DHCP lease lifecycle. This is likely to
      include the addition of records when the lease is granted, and
      the removal of DNS records when the leased resource is
      subsequently made available for allocation to a different client.

   2. Communicate enough information between the two failover servers
      to allow one to complete the DDNS update 'lifecycle' even if the
      other server originally granted the lease.

   3. Avoid redundant or overlapping DDNS updates, where both failover
      servers are attempting to perform DDNS updates for the same
      lease-client binding.

   4. Avoid situations where one partner is attempting to add RRs
      related to a lease binding while the other partner is attempting
      to remove RRs related to the same lease binding.

   While DHCP servers configured for DDNS typically perform these
   operations on both the AAAA and the PTR resource records, this is not
   required. It is entirely possible that a DHCP server could be
   configured to only update the DNS with PTR records, and the DHCPv6
   clients could be responsible for updating the DNS with their own AAAA
   records. In this case, the discussions here would apply only to the
   PTR records.

11.2. Exchanging DDNS Information

   In order for either server to be able to complete a DDNS update, or
   to remove DNS records which were added by its partner, both servers
   need to know the FQDN associated with the lease-client binding. In
   addition, to properly handle DDNS updates, additional information is
   required. All of the following information needs to be transmitted
   between the failover partners:

   1. The FQDN that the client requested be associated with the
      resource. If the client doesn't request a particular FQDN and
      one is synthesized by the failover server, or if the failover
      server is configured to replace a client-requested FQDN with a
      different FQDN, then the server-generated value would be used.

   2. The FQDN that was actually placed in the DNS for this lease. It
      may differ from the client-requested FQDN due to some form of
      disambiguation or other DHCP server configuration (as described
      above).

   3. The status of any DDNS operations in progress or completed.

   4. Information sufficient to allow the failover partner to remove
      the FQDN from the DNS should that become necessary.
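
   The four items listed above could be carried in a per-binding record
   along the following lines. This is a hedged sketch only: the design
   does not define a wire format or these names, so `DdnsBindingInfo`,
   `DdnsStatus`, the status values, and the opaque `removal_info` field
   are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DdnsStatus(Enum):
    """Hypothetical status values for item 3; not defined by the design."""
    NOT_STARTED = "not-started"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass(frozen=True)
class DdnsBindingInfo:
    """DDNS data one partner might send to the other for a single binding."""
    requested_fqdn: str             # item 1: client-requested or server-generated
    actual_fqdn: Optional[str]      # item 2: what was actually placed in the DNS
    status: DdnsStatus              # item 3: state of the DDNS operation
    removal_info: Optional[bytes]   # item 4: opaque data needed to remove the RRs


if __name__ == "__main__":
    info = DdnsBindingInfo(
        requested_fqdn="host.example.com.",
        actual_fqdn="host-2.example.com.",  # e.g. after disambiguation
        status=DdnsStatus.COMPLETED,
        removal_info=b"\x00\x01",           # placeholder bytes
    )
    print(info)
```

   Keeping the requested and actual FQDNs as separate fields lets the
   receiving partner notice when disambiguation changed the name.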

   These data items are the minimum set necessary to reliably allow two
   failover partners to share the responsibility of keeping the DNS up
   to date with the resources allocated to clients.

   This information would typically be included in BNDUPD messages sent
   from one failover partner to the other. Failover servers MAY choose
   not to include this information in BNDUPD messages if there has been
   no change in the status of any DDNS update related to the lease.

   The partner server receiving BNDUPD messages containing the DDNS
   information SHOULD compare the status information and the FQDN with
   the current DDNS information it has associated with the lease
   binding, and update its notion of the DDNS status accordingly.

   Some implementations will choose to send a BNDUPD without waiting
   for the DDNS update to complete, and then will send a second BNDUPD
   once the DDNS update is complete. Other implementations will delay
   sending the partner a BNDUPD until the DDNS update has been
   acknowledged by the DNS server, or until some time limit has
   elapsed, in order to avoid sending a second BNDUPD.

   The FQDN option contains the FQDN that will be associated with the
   AAAA RR (if the server is performing an AAAA RR update for the
   client). The PTR RR can be generated automatically from the IP
   address or prefix value. The FQDN may be composed in any of several
   ways, depending on server configuration and the information provided
   by the client in its DHCP messages. The client may supply a hostname
   which it would like the server to use in forming the FQDN, or it may
   supply the entire FQDN. The server may be configured to attempt to
   use the information the client supplies, it may be configured with an
   FQDN to use for the client, or it may be configured to synthesize an
   FQDN.
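
   As an illustration of how the PTR owner name can be derived
   automatically from a leased IPv6 address, the sketch below uses the
   `reverse_pointer` attribute of Python's standard `ipaddress` module.
   The helper name `ptr_owner_name` is hypothetical, and deriving a
   name for a delegated prefix is not covered here.

```python
import ipaddress


def ptr_owner_name(address: str) -> str:
    """Return the ip6.arpa owner name for an IPv6 address.

    Hypothetical helper: each of the 32 nibbles of the address becomes
    one label, in reverse order, under ip6.arpa. Raises ValueError
    (ipaddress.AddressValueError) if the input is not an IPv6 address.
    """
    return ipaddress.IPv6Address(address).reverse_pointer


if __name__ == "__main__":
    # 2001:db8::/32 is the IPv6 documentation prefix.
    print(ptr_owner_name("2001:db8::1"))
```

   Because the name depends only on the address, either failover
   partner can compute it locally; it does not need to be exchanged.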

   Since the server interacting with the client may not have completed
   the DDNS update at the time it sends the first BNDUPD about the lease
   binding, there may be cases where the FQDN in later BNDUPD messages
   does not match the FQDN included in earlier messages. For example,
   the responsive server may be configured to handle situations where
   two or more DHCP client FQDNs are identical by modifying the
   most-specific label in the FQDNs of some of the clients in an attempt
   to generate unique FQDNs for them (a process sometimes called
   "disambiguation"). Alternatively, at sites which use some or all of
   the information which clients supply to form the FQDN, it's possible
   that a client's configuration may be changed so that it begins to
   supply new data. The server interacting with the client may react by
   removing the DNS records which it originally added for the client,
   and replacing them with records that refer to the client's new FQDN.
   In such cases, the server SHOULD include the FQDN that was actually
   used in the DDNS options of any subsequent BNDUPD messages exchanged
   between the failover partners. This information may be necessary in
   order to allow the non-responsive partner to detect client
   configuration changes that change the hostname or FQDN data which the
   client includes in its DHCP requests.

11.3. Adding RRs to the DNS

   A failover server which is going to perform DDNS updates SHOULD
   initiate the DDNS update when it grants a new lease to a client. The
   server which did not grant the lease SHOULD NOT initiate a DDNS
   update when it receives the BNDUPD after the lease has been granted.
   The failover protocol ensures that only one of the partners will
   grant a lease to any individual client, so it follows that this
   requirement will prevent both partners from initiating updates
   simultaneously. The server initiating the update SHOULD follow the
   protocol in RFC 4704 [RFC4704]. The server may be configured to
   perform an AAAA RR update on behalf of its clients, or not.
   Ordinarily, a failover server will not initiate DDNS updates when it
   renews leases. In two cases, however, a failover server MAY initiate
   a DDNS update when it renews a lease to its existing client:

   1. When the lease was granted before the server was configured to
      perform DDNS updates, the server MAY be configured to perform
      updates when it next renews existing leases. The server which
      granted the lease is the server which should initiate the DDNS
      update.

   2. If a server is in PARTNER-DOWN state, it can conclude that its
      partner is no longer attempting to perform an update for the
      existing client. If the remaining server has not recorded that
      an update for the binding has been successfully completed, the
      server MAY initiate a DDNS update. It MAY initiate this update
      immediately upon entry to PARTNER-DOWN state, it may perform this
      in the background, or it MAY initiate this update upon next
      hearing from the DHCP client.

11.4. Deleting RRs from the DNS

   The failover server which makes a resource FREE* SHOULD initiate any
   DDNS deletes, if it has recorded that DNS records were added on
   behalf of the client.

   A server not in PARTNER-DOWN state "makes a resource FREE" when it
   initiates a BNDUPD with a binding-status of FREE, FREE_BACKUP,
   EXPIRED, or RELEASED. Its partner confirms this status by
   acknowledging that BNDUPD, and upon receipt of the BNDACK the server
   has "made the resource FREE".
   Conversely, a server in PARTNER-DOWN state "makes a resource FREE"
   when it sets the binding-status to FREE, since in PARTNER-DOWN state
   no communication with the partner is required.

   It is at this point that it should initiate the DDNS operations to
   delete RRs from the DNS. Its partner SHOULD NOT initiate DDNS
   deletes for DNS records related to the lease binding as part of
   sending the BNDACK message. The partner MAY have issued BNDUPD
   messages with a binding-status of FREE, EXPIRED, or RELEASED
   previously, but the other server will have rejected these BNDUPD
   messages.

   The failover protocol ensures that only one of the two partner
   servers will be able to make a resource FREE*. The server making the
   resource FREE may be doing so while it is in NORMAL communication
   with its partner, or it may be in PARTNER-DOWN state. If a server is
   in PARTNER-DOWN state, it may be performing DDNS deletes for RRs
   which its partner added originally. This allows a single remaining
   partner server to assume responsibility for all of the DDNS activity
   which the two servers were undertaking.

   Another implication of this approach is that no DDNS RR deletes will
   be performed while either server is in COMMUNICATIONS-INTERRUPTED
   state, since no resources are moved into the FREE* state during that
   period.

11.5. Name Assignment with No Update of DNS

   In some cases, a DHCP server is configured to return a name to the
   DHCPv6 client but not enter that name into the DNS. This is
   typically a name that it has discovered or generated from information
   it has received from the client. In this case this name information
   SHOULD be communicated to the failover partner, if only to ensure
   that the partner will return the same name in the event that it
   becomes the server with which the DHCPv6 client begins to interact.

12. Reservations and failover

   Some DHCP servers support a capability to offer specific
   preconfigured resources to DHCP clients. These are real DHCP
   clients that run the entire DHCP protocol, but these servers always
   offer the client a specific preconfigured resource, and they offer
   that resource to no other clients. Such a capability has several
   names, but it is sometimes called a "reservation", in that the
   resource is reserved for a particular DHCP client.

   In a situation where there are two DHCP servers serving the same
   prefix without using failover, the two DHCP servers need to have
   disjoint resource pools, but identical reservations for the DHCP
   clients.

   In a failover context, both servers need to be configured with the
   proper reservations in an identical manner, but if we stop there
   problems can occur around the edge conditions where reservations are
   made for a resource that has already been leased to a different
   client. Different servers handle this conflict in different ways,
   but the goal of the failover protocol is to allow correct operation
   with any server's approach to the normal processing of the DHCP
   protocol.

   The general solution with regard to reservations is as follows.
   Whenever a reserved resource becomes FREE (i.e., when first
   configured or whenever a client frees it or it expires or is reset),
   the primary server MUST show that resource as FREE (and thus
   available for its own allocation) and it MUST send it to the
   secondary server in a BNDUPD with a flag set showing that it is
   reserved and with a status of FREE_BACKUP.

   Note that this implies that a reserved resource goes through the
   normal state changes from FREE to ACTIVE (and possibly back to FREE).
The failover protocol supports this approach to reservations, i.e.,
where the resource undergoes the normal state changes of any
resource, but it can only be offered to the client for which it is
reserved.

From the above, it follows that a reservation solely on the secondary
will not necessarily allow the secondary to offer that address to
the client for which it is reserved.  The reservation must also
appear on the primary for the secondary to be able to offer the
resource to the client for which it is reserved.

When the reservation on a resource is cancelled, if the resource is
currently FREE and the server is the primary, or FREE_BACKUP and the
server is the secondary, the server MUST send a BNDUPD to the other
server with the binding-status FREE and an indication that the
resource is no longer reserved.

13.  Security Considerations

DHCPv6 failover is an extension of the standard DHCPv6 protocol, so
all security considerations from [RFC3315], Section 23 and [RFC3633],
Section 15 related to the server apply.

Because traffic exchanged between clients and servers is not
encrypted, an attacker who has penetrated the network and is able to
intercept that traffic will not gain any additional information by
also sniffing communication between partners.

An attacker that is able to impersonate one partner can efficiently
perform a denial of service attack on the remaining uncompromised
server.  Several techniques may be used: pretending that conflict
resolution is required, requesting rebalance, claiming that a valid
lease was released or declined, etc.  For that reason, servers SHOULD
support failover connections over TLS, as explained in Section 5.1.
Such secure connections SHOULD be optional and configurable by the
administrator.

A server MUST NOT operate in PARTNER-DOWN if its partner is up.
Network administrators are expected to switch the remaining active
server to PARTNER-DOWN state only if they are sure that its partner
server is indeed down.  Failing to obey this requirement will likely
result in both servers assigning duplicate leases to different
clients.  Implementers should take that into consideration if they
decide to implement the auto-partner-down timer-based transition to
PARTNER-DOWN state.

Running a network protected by DHCPv6 failover requires more
resources than running without it.  In particular, some of the
resources are allocated to the secondary server and are not
immediately usable in normal (i.e., non-failure) operation, though
over time they will be rebalanced and end up on the server that needs
them.  While limiting this pool may be preferable from a resource
utilization perspective, it must be reasonably large so that the
secondary can take over once the primary becomes unavailable.

14.  IANA Considerations

IANA is not requested to perform any actions at this time.

15.  Acknowledgements

This document extensively uses concepts, definitions and other parts
of the [dhcpv4-failover] document.  The authors would like to thank
Shawn Routher, Greg Rabil, Bernie Volz and Marcin Siodelski for their
significant involvement and contributions.  The authors would also
like to thank VithalPrasad Gaitonde, Krzysztof Gierlowski, Krzysztof
Nowicki and Michal Hoeft for their insightful comments.

This work has been partially supported by the Department of Computer
Communications (a division of Gdansk University of Technology) and
the Polish Ministry of Science and Higher Education under the
European Regional Development Fund, Grant No.
POIG.01.01.02-00-045/09-00 (Future Internet Engineering Project).

16.  References

16.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
           and M. Carney, "Dynamic Host Configuration Protocol for
           IPv6 (DHCPv6)", RFC 3315, July 2003.

[RFC3633]  Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic
           Host Configuration Protocol (DHCP) version 6", RFC 3633,
           December 2003.

[RFC4703]  Stapp, M. and B. Volz, "Resolution of Fully Qualified
           Domain Name (FQDN) Conflicts among Dynamic Host
           Configuration Protocol (DHCP) Clients", RFC 4703,
           October 2006.

[RFC4704]  Volz, B., "The Dynamic Host Configuration Protocol for
           IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN)
           Option", RFC 4704, October 2006.

[RFC5007]  Brzozowski, J., Kinnear, K., Volz, B., and S. Zeng,
           "DHCPv6 Leasequery", RFC 5007, September 2007.

16.2.  Informative References

[I-D.ietf-dhc-dhcpv6-failover-requirements]
           Mrugalski, T. and K. Kinnear, "DHCPv6 Failover
           Requirements",
           draft-ietf-dhc-dhcpv6-failover-requirements-07 (work in
           progress), July 2013.

[I-D.ietf-dhc-dhcpv6-load-balancing]
           Kostur, A., "DHC Load Balancing Algorithm for DHCPv6",
           draft-ietf-dhc-dhcpv6-load-balancing-00 (work in
           progress), December 2012.

[RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
           "Dynamic Updates in the Domain Name System (DNS UPDATE)",
           RFC 2136, April 1997.

[RFC5460]  Stapp, M., "DHCPv6 Bulk Leasequery", RFC 5460, February
           2009.

[dhcpv4-failover]
           Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S.,
           Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover
           Protocol", draft-ietf-dhc-failover-12 (work in progress),
           March 2003.

Authors' Addresses

Tomasz Mrugalski
Internet Systems Consortium, Inc.
950 Charter Street
Redwood City, CA 94063
USA

Phone: +1 650 423 1345
Email: tomasz.mrugalski@gmail.com

Kim Kinnear
Cisco Systems, Inc.
1414 Massachusetts Ave.
Boxborough, Massachusetts 01719
USA

Phone: +1 (978) 936-0000
Email: kkinnear@cisco.com