Dynamic Host Configuration (DHC)                            T. Mrugalski
Internet-Draft                                                       ISC
Intended status: Standards Track                              K. Kinnear
Expires: September 13, 2012                                        Cisco
                                                          March 12, 2012


                         DHCPv6 Failover Design
             draft-mrugalski-dhc-dhcpv6-failover-design-01

Abstract

   DHCPv6, as defined in RFC 3315, does not offer server redundancy. This document defines a design for DHCPv6 failover, a mechanism for running two servers on the same network with the capability for either server to take over clients' leases in case of server failure or network partition. This is a DHCPv6 failover design document; it is not a protocol specification. It is the second document in a planned series of three. DHCPv6 failover requirements are specified in a separate requirements document, and a protocol specification document is planned to follow this one.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 13, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Requirements Language
   2.  Glossary
   3.  Introduction
     3.1.  Additional Requirements
     3.2.  Features out of Scope: Load Balancing
   4.  Protocol Overview
     4.1.  Failover State Machine Overview
   5.  Connection Management
     5.1.  Creating Connections
     5.2.  Endpoint Identification
   6.  Resource Allocation
     6.1.  Proportional Allocation
     6.2.  Independent Allocation
     6.3.  Determining Allocation Approach
       6.3.1.  IPv6 Addresses
       6.3.2.  IPv6 Prefixes
   7.  Failover Mechanisms
     7.1.  Time Skew
     7.2.  Time expression
     7.3.  Lazy updates
     7.4.  MCLT concept
       7.4.1.  MCLT example
     7.5.  Unreachability detection
     7.6.  Re-allocating Leases
     7.7.  Sending Data
       7.7.1.  Required Data
       7.7.2.  Optional Data
     7.8.  Receiving Data
       7.8.1.  Conflict Resolution
       7.8.2.  Acknowledging Reception
   8.  Endpoint States
     8.1.  State Machine Operation
     8.2.  State Machine Initialization
     8.3.  STARTUP State
       8.3.1.  Operation in STARTUP State
       8.3.2.  Transition Out of STARTUP State
     8.4.  PARTNER-DOWN State
       8.4.1.  Operation in PARTNER-DOWN State
       8.4.2.  Transition Out of PARTNER-DOWN State
     8.5.  RECOVER State
       8.5.1.  Operation in RECOVER State
       8.5.2.  Transition Out of RECOVER State
     8.6.  RECOVER-WAIT State
       8.6.1.  Operation in RECOVER-WAIT State
       8.6.2.  Transition Out of RECOVER-WAIT State
     8.7.  RECOVER-DONE State
       8.7.1.  Operation in RECOVER-DONE State
       8.7.2.  Transition Out of RECOVER-DONE State
     8.8.  NORMAL State
       8.8.1.  Operation in NORMAL State
       8.8.2.  Transition Out of NORMAL State
     8.9.  COMMUNICATIONS-INTERRUPTED State
       8.9.1.  Operation in COMMUNICATIONS-INTERRUPTED State
       8.9.2.  Transition Out of COMMUNICATIONS-INTERRUPTED State
     8.10. POTENTIAL-CONFLICT State
       8.10.1. Operation in POTENTIAL-CONFLICT State
       8.10.2. Transition Out of POTENTIAL-CONFLICT State
     8.11. RESOLUTION-INTERRUPTED State
       8.11.1. Operation in RESOLUTION-INTERRUPTED State
       8.11.2. Transition Out of RESOLUTION-INTERRUPTED State
     8.12. CONFLICT-DONE State
       8.12.1. Operation in CONFLICT-DONE State
       8.12.2. Transition Out of CONFLICT-DONE State
     8.13. PAUSED State
       8.13.1. Operation in PAUSED State
       8.13.2. Transition Out of PAUSED State
     8.14. SHUTDOWN State
       8.14.1. Operation in SHUTDOWN State
       8.14.2. Transition Out of SHUTDOWN State
   9.  Proposed extensions
     9.1.  Active-active mode
   10. Dynamic DNS Considerations
   11. Reservations and failover
   12. Protocol entities
     12.1. Failover Protocol
     12.2. Protocol constants
   13. Open questions
   14. Security Considerations
   15. IANA Considerations
   16. Acknowledgements
   17. References
     17.1. Normative References
     17.2. Informative References
   Authors' Addresses

1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Glossary

   This is a supplemental glossary that should be combined with the definitions in Section 3 of [requirements].

   o  Failover endpoint - The failover protocol allows for a unique failover 'endpoint' per partner per role per relationship (where role is primary or secondary, and the relationship is defined by the relationship-name). This failover endpoint can take actions and hold unique states. Typically, there is one failover endpoint per partner (server), although there may be more. 'Server' and 'failover endpoint' are synonymous only if the server participates in only one failover relationship. However, for the sake of simplicity, 'server' is used throughout this document to refer to a failover endpoint unless doing so would be confusing.

   o  Failover transmission - all messages exchanged between partners.

   o  Independent Allocation - a resource allocation algorithm that splits the available pool of resources between the primary and secondary servers; it is particularly well suited for vast pools (i.e., when available resources are not expected to be depleted). See Section 6.2 for details.

   o  Primary Server

   o  Proportional Allocation - a resource allocation algorithm that splits the available free leases between the primary and secondary servers; it is particularly well suited for more limited resources. See Section 6.1 for details.

   o  Resource - an IPv6 address or an IPv6 prefix.

   o  Responsive - A server that is responsive will respond to DHCPv6 client requests.

   o  Secondary Server

   o  Server - A DHCPv6 server that implements DHCPv6 failover. 'Server' and 'failover endpoint' are synonymous only if the server participates in only one failover relationship.

   o  Unresponsive - A server that is unresponsive will not respond to DHCPv6 client requests.

3.  Introduction

   The failover protocol design provides a means for cooperating DHCPv6 servers to work together to provide a DHCPv6 service with availability that is increased beyond that which could be provided by a single DHCPv6 server operating alone. It is designed to protect DHCPv6 clients against server unreachability, including server failure and network partition. It makes it possible to deploy exactly two servers that are able to continue providing a lease on an IPv6 address or an IPv6 prefix, without the DHCPv6 client experiencing lease expiration or reassignment of its lease to a different address or prefix, in the event that either of the two servers fails.

   This protocol defines an active-passive mode, sometimes also called the hot standby model. This means that during normal operation one server is active (i.e., it actively responds to clients' requests) while the second is passive (i.e., it receives clients' requests but does not respond to them; it only maintains a copy of the lease database and is ready to take over incoming queries in case the primary server fails). Active-active mode (i.e., both servers actively handling clients' requests) is currently not supported, for the sake of simplicity. Such a mode may be defined as an extension at a later time.

   The failover protocol is designed to provide lease stability for leases with lease times beyond a short period. Due to the additional overhead required, failover is not suitable for leases shorter than 30 seconds. The DHCPv6 Failover protocol MUST NOT be used for leases shorter than 30 seconds.
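
   The 30-second floor lends itself to a configuration-time check. The following Python sketch is illustrative only: the PoolConfig record, its field names, and check_failover_lease_times() are hypothetical, and applying the limit to the valid lifetime is an assumption rather than something this design specifies.

```python
from dataclasses import dataclass

# Minimum lease time for which failover may be used (Section 3).
MIN_FAILOVER_LEASE_SECONDS = 30


@dataclass
class PoolConfig:
    # Hypothetical configuration record; field names are illustrative only.
    name: str
    valid_lifetime: int  # seconds
    failover_enabled: bool


def check_failover_lease_times(pools):
    """Return one error string per failover-enabled pool whose lease
    time is below the 30-second minimum."""
    errors = []
    for pool in pools:
        # Assumption: the 30-second rule is checked against the valid lifetime.
        if pool.failover_enabled and pool.valid_lifetime < MIN_FAILOVER_LEASE_SECONDS:
            errors.append(
                f"{pool.name}: valid lifetime {pool.valid_lifetime}s is below "
                f"the {MIN_FAILOVER_LEASE_SECONDS}s failover minimum"
            )
    return errors


if __name__ == "__main__":
    pools = [
        PoolConfig("lab", valid_lifetime=20, failover_enabled=True),
        PoolConfig("office", valid_lifetime=3600, failover_enabled=True),
        PoolConfig("guest", valid_lifetime=20, failover_enabled=False),
    ]
    for err in check_failover_lease_times(pools):
        # prints: lab: valid lifetime 20s is below the 30s failover minimum
        print(err)
```

   Pools that do not participate in failover are left alone, since the restriction applies only to the failover protocol.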

   This design attempts to fulfill all DHCPv6 failover requirements defined in [requirements].

3.1.  Additional Requirements

   The following requirements are not related to the failover mechanism in general, but rather to this particular design.

   1.  Minimize Asymmetry - while there are two distinct roles in failover (primary and secondary server), the differences between those two roles should be as small as possible. This will yield a simpler design as well as a simpler implementation of that design.

3.2.  Features out of Scope: Load Balancing

   It may be tempting to extend the DHCPv6 failover mechanism to also offer load balancing, as DHCPv4 failover did. This design deliberately does not, for the following reason. In the general case (not related to failover), load balancing solutions are used when a single server is not able to handle the total incoming traffic. However, by its very definition, DHCPv6 failover is supposed to ensure service availability despite the failure of one server. That leads to the conclusion that each server must be able to handle the whole traffic on its own. Therefore, in a properly provisioned setup, load balancing is not needed.

4.  Protocol Overview

   The DHCPv6 Failover Protocol is defined as the communication between failover partners, together with all associated algorithms and mechanisms. Failover communication is conducted over a TCP connection established between the partners. The protocol reuses the framing format specified in Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460], but uses different message types. Additional failover-specific message types will be defined. All information is sent over the connection as typical DHCPv6 options, following the format defined in Section 22.1 of [RFC3315].

   After initialization, the primary server establishes a TCP connection with its partner. The primary server sends a CONNECT message with initial parameters. The secondary server responds with a CONNECTACK message.
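
   As a rough sketch of the framing described above, assuming the Bulk Leasequery TCP framing (a 2-octet msg-size field, then a 1-octet msg-type and a 3-octet transaction-id) carrying options in the Section 22.1 format of [RFC3315], a message such as CONNECT could be built and parsed as follows. The message-type and option-code values are hypothetical placeholders; this design does not assign them.

```python
import struct

# Hypothetical values: this design does not assign failover message types
# or option codes; real values would come from the protocol specification.
MSG_CONNECT = 200
OPT_EXAMPLE = 0xFF00


def encode_option(code: int, data: bytes) -> bytes:
    """DHCPv6 option (RFC 3315, Section 22.1): option-code and
    option-len (2 octets each, network byte order), then option-data."""
    return struct.pack("!HH", code, len(data)) + data


def frame_message(msg_type: int, transaction_id: int, options: bytes) -> bytes:
    """Prefix a DHCPv6-style message with the 2-octet msg-size field used
    for TCP framing in Bulk Leasequery (RFC 5460, Section 5.1)."""
    if not 0 <= transaction_id < 1 << 24:
        raise ValueError("transaction-id must fit in 3 octets")
    body = bytes([msg_type]) + transaction_id.to_bytes(3, "big") + options
    if len(body) > 0xFFFF:
        raise ValueError("message too long for a 2-octet msg-size")
    # msg-size counts the message that follows, not the msg-size field itself.
    return struct.pack("!H", len(body)) + body


def read_frame(buf: bytes):
    """Split one framed message off the front of buf.

    Returns (msg_type, transaction_id, options, remaining_bytes), or None
    if buf does not yet hold a complete message."""
    if len(buf) < 2:
        return None
    (size,) = struct.unpack("!H", buf[:2])
    if len(buf) < 2 + size:
        return None
    if size < 4:
        raise ValueError("message shorter than msg-type + transaction-id")
    body, rest = buf[2:2 + size], buf[2 + size:]
    return body[0], int.from_bytes(body[1:4], "big"), body[4:], rest
```

   Because TCP is a byte stream, read_frame() returns None until a complete message has arrived, so a receiver can keep accumulating data and retry.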

   Depending on the failover state of each partner, they MUST initiate one of the binding update procedures. Each server MAY send an UPDREQ message to request that its partner send all updates that have not been sent yet (this applies when the requesting server has an existing database and wants to bring it up to date). Alternatively, a server MAY send an UPDREQALL message to request a full lease database transmission, including all leases (this applies when a new server boots up after installation, after corruption or complete loss of its database, or after another catastrophic failure).

   Servers exchange lease information by using BNDUPD messages. Depending on the local and remote state of a lease, a server may either accept or reject the update. Reception of lease update information is confirmed by responding with a BNDACK message with an appropriate status. The majority of the messages sent over a failover TCP connection are BNDUPD and BNDACK messages.

   A subset of available resources (addresses or prefixes) is reserved for the secondary server's use. This is required to handle the case where both servers are able to communicate with clients, but unable to communicate with each other. After the initial connection is established, the secondary server requests a pool of available resources by sending a POOLREQ message. The primary server assigns a pool to the secondary by transmitting a POOLRESP message and then sending a series of BNDUPD messages. The secondary server may initiate such a pool request at any time while it maintains communication with the primary server.

   Failover servers use a lazy update mechanism to update their failover partner about changes to their lease state database.
   After a server performs any modification to its lease state database (assigning a new lease, extending an existing one, or releasing or expiring a lease), it sends its response to the client's request first (performing the "regular" DHCPv6 operation) and then informs its failover partner using a BNDUPD message. This BNDUPD message SHOULD be sent soon after the response is sent to the DHCPv6 client, but there is no specific deadline by which it must be sent.

   The major problem with the lazy update mechanism is the case when a server crashes after sending a response to a client but before sending the lazy update to its partner (or when communication between the partners is interrupted). To solve this problem, the concept known as the Maximum Client Lead Time (MCLT), initially designed for DHCPv4 failover, is used. The MCLT is the maximum amount of time that one server can extend a lease for a client's binding beyond the time known by its failover partner. See Section 7.4 for a detailed description of how the MCLT affects assigned lease times.

   Servers verify each other's availability by periodically exchanging CONTACT messages. See Section 7.5 for a discussion of detecting a partner's unreachability.

   A server that is being shut down transmits a DISCONNECT message, closes the connection with its failover partner, and stops operation. A server SHOULD transmit any pending lease updates before transmitting the DISCONNECT message.

4.1.  Failover State Machine Overview

   The following section provides a simplified description of all states. For the sake of clarity and simplicity, it omits important details. For a complete description, see Section 8. In case of a disagreement between the simplified and the complete description, Section 8 takes precedence.

   Each server may be in one of several well-defined states.
   In each state a server may be either responsive (it responds to clients' queries) or unresponsive (clients' queries are ignored).

   A server starts its operation in the short-lived STARTUP state. In this state, a server determines its partner's reachability and state, and usually returns to the state it was in before shutdown.

   During typical operation, when the servers maintain communication, both are in the NORMAL state. In that state only the primary responds to clients' requests; the secondary server is unresponsive.

   If a server discovers that its partner is no longer reachable, it goes to the COMMUNICATIONS-INTERRUPTED state. The server must be extra cautious, as it cannot distinguish whether its partner is down or only the communication between the servers is interrupted. Since communication between the partners is not possible, a server must act on the assumption that its partner, if it is up, follows the defined procedures. In particular, it must not extend any lease by more than the MCLT beyond the time known by its partner. That imposes an additional burden on the server; therefore, operating for prolonged periods in this state is not recommended. Once communication is reestablished, the server may go into the NORMAL, POTENTIAL-CONFLICT, or PARTNER-DOWN state. It may also stay in COMMUNICATIONS-INTERRUPTED if certain conditions are met.

   Once a server is switched into the PARTNER-DOWN state (when auto-partner-down is used, or as a result of administrative action), it can extend leases regardless of which server originally granted them. In that state the server handles leases from its own pool, but is also able to serve the pool of its downed partner. MCLT restrictions no longer apply. Operation in this state is less demanding for the server that remains operational than in the COMMUNICATIONS-INTERRUPTED state, but PARTNER-DOWN does not offer any kind of redundancy.

   When a server loses its database (e.g.,
   due to a first-time run or a catastrophic failure), or detects that its partner is in the PARTNER-DOWN state and additional conditions are met, it switches to the RECOVER state. In that state the server acknowledges that the content of its database is doubtful and that it needs to refresh its database from its partner. Once this operation is done, it switches to RECOVER-WAIT and later to RECOVER-DONE.

   Once the servers reestablish a connection, they discover each other's state. Depending on the conditions, they may return to NORMAL or move to POTENTIAL-CONFLICT in case of an unexpected partner state. It is a goal of this protocol to minimize the possibility that the POTENTIAL-CONFLICT state is ever entered. Servers in the POTENTIAL-CONFLICT state do not respond to clients' requests and work on resolving potential conflicts. Once outstanding lease updates are exchanged, the servers move to the CONFLICT-DONE or NORMAL state.

   Servers that lose communication while recovering from a potential conflict switch to the RESOLUTION-INTERRUPTED state.

   A server that is being shut down briefly switches to the SHUTDOWN state and communicates its state to its partner before actual termination.

5.  Connection Management

5.1.  Creating Connections

   Every server implementing the failover protocol SHOULD attempt to connect to all of its partners periodically, where the period is implementation dependent and SHOULD be configurable. In the event that a connection has been rejected by a CONNECTACK message containing a reject-reason option, or by a DISCONNECT message, a server SHOULD reduce the frequency with which it attempts to connect to that server, but it SHOULD continue to attempt to connect periodically.

   When a connection attempt succeeds, if the server generating the connection attempt is a primary server for that relationship, then it MUST send a CONNECT message down the connection. If it is not a
If it is not a 404 primary server for the relationship, then it MUST just drop the 405 connection and wait for the primary server to connect to it. 407 When a connection attempt is received, the only information that the 408 receiving server has is the IP address of the partner initiating a 409 connection. It also knows whether it has the primary role for any 410 failover relationships with the connecting server. If it has any 411 relationships for which it is a primary server, it should initiate a 412 connection of its own to the partner server, one for each primary 413 relationship it has with that server. 415 If it has any relationships with the connecting server for which it 416 is a seconary server, it should just await the CONNECT message to 417 determine which relationship this connection is to serve. 419 If it has no secondary relationships with the connecting server, it 420 SHOULD drop the connection. 422 To summarize -- a primary server MUST use a connection that it has 423 initiated in order to send a CONNECT message. Every server that is a 424 secondary server in a relationship attempts to create a connection to 425 the server which is primary in the relationship, but that connection 426 is only used to stimulate the primary server into recognizing that 427 the secondary server is ready for operation. The reason behind this 428 is that the secondary server has no way to communicate to the primary 429 server which relationship a connection is designed to serve. 431 A server which has multiple secondary relationships with a primary 432 server SHOULD only send one stimulus connection attempt to the 433 primary server. 435 Once a connection is established, the primary server MUST send a 436 CONNECT message across the connection. A secondary server MUST wait 437 for the CONNECT message from a primary server. 
If the secondary 438 server doesn't receive a CONNECT message from the primary server in 439 an installation dependent amount of time, it MAY drop the connection 440 and send another stimulus connection attempt to the primary server. 442 Every CONNECT message includes a TLS-request option, and if the 443 CONNECTACK message does not reject the CONNECT message and the TLS- 444 reply option says TLS MUST be used, then the servers will immediately 445 enter into TLS negotiation. 447 Once TLS negotiation is complete, the primary server MUST resend the 448 CONNECT message on the newly secured TLS connection and then wait for 449 the CONNECTACK message in response. The TLS-request and TLS-reply 450 options MUST NOT appear in either this second CONNECT or its 451 associated CONNECTACK message as they had in the first messages. 453 The second message sent over a new connection (either a bare TCP 454 connection or a connection utilizing TLS) is a STATE message. Upon 455 the receipt of this message, the receiver can consider communications 456 up. 458 A secondary server MUST NOT respond to the closing of a TCP 459 connection with a blind attempt to reconnect -- there may be another 460 TCP connection to the same failover partner already in use. 462 5.2. Endpoint Identification 464 The proper operation of the failover protocol requires more than the 465 transmission of messages between one server and the other. Each 466 endpoint might seem to be a single DHCPv6 server, but in fact there 467 are situations where additional flexibility in configuration is 468 useful. A failover endpoint is always associated with a set of 469 DHCPv6 prefixes that are configured on the DHCPv6 server where the 470 endpoint appears. A DHCPv6 prefix MUST NOT be associated with more 471 than one failover endpoint. 473 The failover protocol SHOULD be configured with one failover 474 relationship between each pair of failover servers. 
In this case 475 there is one failover endpoint for that relationship on each failover 476 partner. This failover relationship MUST have a unique name. 478 There is typically little need for addtional relationships between 479 any two servers but there MAY be more than one failover relationship 480 between two servers -- however each MUST have a unique relationship 481 name. 483 Any failover endpoint can take actions and hold unique states. 485 This document frequently describes the behavior of the protocol in 486 terms of primary and secondary servers, not primary and secondary 487 failover endpoints. However, it is important to remember that every 488 'server' described in this document is in reality a failover endpoint 489 that resides in a particular process, and that several failover end- 490 points may reside in the same server process. 492 It is not the case that there is a unique failover endpoint for each 493 prefix that participates in a failover relationship. On one server, 494 there is (typically) one failover endpoint per partner, regardless of 495 how many prefixes are managed by that combination of partner and 496 role. Conversely, on a particular server, any given prefix will be 497 associated with exactly one failover endpoint. 499 When a connection is received from the partner, the unique failover 500 endpoint to which the message is directed is determined solely by the 501 IP address of the partner, the relationship-name, and the role of the 502 receiving server. 504 6. Resource Allocation 506 Currently there are two allocation algorithms defined for resources 507 (addresses or prefixes). Additional allocation schemes may be 508 defined as future extensions. 510 1. Proportional Allocation - This allocation algorithm is a direct 511 application of algorithm defined in [dhcpv4-failover] to DHCPv6. 512 Available resources are split between primary and secondary 513 server. Released resources are always returned to primary 514 server. 
       Primary and secondary servers may initiate a rebalancing procedure when the disparity between the resources available to each server reaches a preconfigured threshold. Only resources that are not leased to any client are "owned" by one of the servers. This algorithm is particularly well suited for scenarios where the amount of available resources is limited, as may be the case for prefix delegation. See Section 6.1 for details.

   2.  Independent Allocation - This allocation algorithm also assumes that available resources are split between the primary and secondary servers. In this case, however, resources are assigned to a specific server for all time, regardless of whether they are available or currently in use. This algorithm is much simpler than proportional allocation, because resource imbalance does not have to be checked and there is no rebalancing. This algorithm is particularly well suited for scenarios where there is an abundance of available resources, which is typically the case for DHCPv6 address allocation. See Section 6.2 for details.

6.1.  Proportional Allocation

   In this allocation scheme, each server has its own pool of available resources. Note that a resource is not "owned" by a particular server throughout its entire lifetime. Only a resource which is available is "owned" by a particular server; once it has been leased to a client, it is not owned by either failover partner. When it finally becomes available again, it will be owned initially by the primary server, and it may or may not be allocated to the secondary server by the primary server.

   So, the flow of a resource is as follows: initially a resource is owned by the primary server. It may be allocated to the secondary server if it is available, and then it is owned by the secondary
Either server can allocate available resources which it owns to clients, in which case it ceases to own them. When the client releases the resource or the lease on it expires, it will again become available and will be owned by the primary.

When a resource is released or its lease expires, it does not become owned by the server which initially allocated it because, in general, that server will have had to replenish its pool of available resources well in advance of any likely lease expirations. Thus, having a particular resource cycle back to the secondary might well put the secondary more out of balance with respect to the primary instead of enhancing the balance of available addresses or prefixes between them.

TODO: Need to rework this v4-specific vocabulary to v6, once we decide what things will look like in v6.

When proportional allocation is used, these proportional pools are used for allocation in every state except PARTNER-DOWN state. In PARTNER-DOWN state a failover server can allocate from either pool. The allocation and maintenance of these address pools is an area of some sensitivity, since the goal is to maintain a more or less constant ratio of available addresses between the two servers.

TODO: Reuse the rest of the description from section 5.4 of [dhcpv4-failover] here.

6.2. Independent Allocation

In this allocation scheme, available resources are split between the primary and secondary servers as part of initial connection establishment. Once resources are allocated to each server, there is no need to reassign them. This algorithm is simpler than proportional allocation since it requires no more initial communication and does not require a rebalancing mechanism, but it assumes that the pool assigned to each server will never deplete. That is often a reasonable assumption for IPv6 addresses (e.g.,
servers are often assigned a /64 pool that contains many more addresses than there are electronic devices on Earth). This allocation mechanism SHOULD be used for IPv6 addresses, unless the configured address pool is small or is otherwise administratively limited.

Once each server is assigned a resource pool during initial connection establishment, it may allocate assigned resources to clients. Once a client releases a resource or its lease expires, the resource returns to the pool of the same server. Resources never change servers.

While in COMMUNICATIONS-INTERRUPTED state, a partner MAY continue extending existing leases when requested by clients. A healthy partner MUST NOT lease resources that were assigned to its downed partner and later released by a client unless it is in PARTNER-DOWN state.

6.3. Determining Allocation Approach

6.3.1. IPv6 Addresses

6.3.2. IPv6 Prefixes

7. Failover Mechanisms

This section lays out an overview of the communication between partners and other mechanisms required for failover operation. As this is a design document, not a protocol specification, high-level ideas are presented without implementation-specific details (e.g., on-wire formats are not defined). Implementation details will be specified in a separate draft.

7.1. Time Skew

Partners exchange information about known lease states. To reliably compare a known lease state with an update received from a partner, servers must be able to reliably compare the times stored in the known lease state with the times received in the update. Although a simple approach would be to require both partners to use synchronized time, e.g., by using NTP, such a service may become unavailable in some scenarios that failover expects to cover, e.g., network partition. Therefore a mechanism to measure and track relative time differences between servers is necessary.
To do so, each message MUST contain an FO_TIMESTAMP option that contains the timestamp of the transmission in the time context of the transmitter. The transmitting server MUST set this as close to the actual transmission as possible. The receiving partner MUST store its own timestamp of the reception event as close to the actual reception as possible. The received timestamp information is then compared with the local timestamp.

To account for packet delay variation (jitter), the measured difference is not used directly; rather, the moving average of the time differences of the last TIME_SKEW_PKTS_AVG packets is calculated. This averaged value is referred to as the time skew. Note that the time skew algorithm allows cooperation between servers with completely desynchronized clocks as well as those whose desynchronization itself is not constant.

7.2. Time expression

Timestamps are expressed as the number of seconds since midnight (UTC), January 1, 2000, modulo 2^32. Note that this is the same approach as used in the creation of DUID-LLT (see Section 9.2 of [RFC3315]).

Time differences are expressed in seconds and are signed.

7.3. Lazy updates

Lazy update refers to the requirement placed on a server implementing a failover protocol to update its failover partner whenever the binding database changes. A failover protocol which didn't support lazy update would require the failover partner update to complete before a DHCPv6 server could respond to a DHCPv6 client request. The lazy update mechanism allows a server to allocate a new lease or extend an existing one and then update its failover partner as time permits.

Although the lazy update mechanism does not introduce additional delays in server response times, it introduces other difficulties.
The key problem with lazy update is that when a server fails after updating a client with a particular lease time and before updating its partner, the partner will believe that a lease has expired even though the client still retains a valid lease on that address or prefix.

7.4. MCLT concept

In order to handle the problem introduced by lazy updates (see Section 7.3), a period of time known as the "Maximum Client Lead Time" (MCLT) is defined and must be known to both the primary and secondary servers. Proper use of this time interval places an upper bound on the difference allowed between the lease time provided to a DHCPv6 client by a server and the lease time known by that server's failover partner.

The MCLT is typically much less than the lease time that a server has been configured to offer a client, and so some strategy must exist to allow a server to offer the configured lease time to a client. During a lazy update the updating server typically updates its partner with a potential expiration time which is longer than the lease time previously given to the client and which is longer than the lease time that the server has been configured to give a client. This allows that server to give a longer lease time to the client the next time the client renews its lease, since the time that it will give to the client will not exceed the MCLT beyond the potential expiration time acknowledged by its partner.

The fundamental relationship on which much of the correctness of this protocol depends is that the lease expiration time known to a DHCPv6 client MUST NOT under any circumstances be more than the maximum client lead time (MCLT) greater than the potential expiration time known to a server's partner.

The remainder of this section makes the above fundamental relationship more explicit.
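
As an illustration only, the following Python sketch expresses this relationship and replays the numbers from the example in Section 7.4.1. It makes the following assumptions:

- All times are integer seconds on one local clock that is already corrected for time skew (Section 7.1).
- The function names are hypothetical.
- The potential valid lifetime rule (half of the actual valid lifetime plus the desired valid lifetime) is the choice made in that example, not a protocol requirement.

```python
# Sketch of the MCLT bound (Section 7.4). Names are hypothetical; all
# values are integer seconds on a single, skew-corrected local clock.

HOUR = 3600
DAY = 24 * HOUR


def max_actual_valid_lifetime(desired: int, acked_potential_expiry: int,
                              now: int, mclt: int) -> int:
    """Longest valid lifetime that may be given to a client at `now`.

    The client's lease must not end more than `mclt` seconds after the
    potential expiration time most recently acknowledged by the partner.
    """
    remaining_acked = max(0, acked_potential_expiry - now)
    return min(desired, remaining_acked + mclt)


def potential_valid_lifetime(actual: int, desired: int) -> int:
    """Potential valid lifetime sent in BNDUPD, as in the Section 7.4.1
    example: the T1 fraction (assumed 1/2) of the actual lifetime plus
    the desired lifetime."""
    return actual // 2 + desired


if __name__ == "__main__":
    desired, mclt = 3 * DAY, HOUR
    # New lease at t=0: nothing acknowledged yet, so only the MCLT is allowed.
    first = max_actual_valid_lifetime(desired, 0, 0, mclt)
    # BNDUPD sent right after the REPLY; assume the partner acks it at once.
    acked_expiry = 0 + potential_valid_lifetime(first, desired)
    # Renewal at T1 (half an hour later): the full 3 days fit under the bound.
    renewed = max_actual_valid_lifetime(desired, acked_expiry, first // 2, mclt)
    print(first, renewed, potential_valid_lifetime(renewed, desired))
```

The values computed here match the walkthrough:
- The first lease lasts 1 hour.
- The renewal at T1 grants the full 3 days.
- The next potential valid lifetime sent to the partner is 4.5 days.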

This protocol requires a DHCPv6 server to deal with several different lease intervals and places specific restrictions on their relationships. The purpose of these restrictions is to allow the other server in the pair to be able to make certain assumptions in the absence of an ability to communicate between servers.

The different times are:

desired valid lifetime:
   The desired valid lifetime is the lease interval that a DHCPv6 server would like to give to a DHCPv6 client in the absence of any restrictions imposed by the failover protocol. Its determination is outside of the scope of this protocol. Typically this is the result of external configuration of a DHCPv6 server.

actual valid lifetime:
   The actual valid lifetime is the lease interval that a DHCPv6 server gives out to a DHCPv6 client. It may be shorter than the desired valid lifetime (as explained below).

potential valid lifetime:
   The potential valid lifetime is the potential lease expiration interval that the local server sends to its partner in a BNDUPD message.

acknowledged potential valid lifetime:
   The acknowledged potential valid lifetime is the potential lease interval the partner server has most recently acknowledged in a BNDACK message.

7.4.1. MCLT example

The following example demonstrates the MCLT concept in practice. The values used are arbitrarily chosen and are not a recommendation for actual values. The MCLT in this case is 1 hour. The desired valid lifetime is 3 days, and the renewal time (T1) is half the valid lifetime.

When a server makes an offer for a new lease on an IP address to a DHCPv6 client, it determines the desired valid lifetime (in this case, 3 days). It then examines the acknowledged potential valid lifetime (which in this case is zero) and determines the remainder of the time left to run, which is also zero. To this it adds the MCLT.
Since the actual valid lifetime cannot be allowed to exceed the remainder of the current acknowledged potential valid lifetime plus the MCLT, the offer made to the client is for the remainder of the current acknowledged potential valid lifetime (i.e., zero) plus the MCLT. Thus, the actual valid lifetime is 1 hour.

Once the server has sent the REPLY to the DHCPv6 client, it will update its failover partner with the lease information. However, the desired potential valid lifetime will be composed of one half of the current actual valid lifetime added to the desired valid lifetime. Thus, the failover partner is updated with a BNDUPD with a potential valid lifetime of 3 days + 1/2 hour.

When the primary server receives a BNDACK to its update of the secondary server's (partner's) potential valid lifetime, it records that as the acknowledged potential valid lifetime. A server MUST NOT send a BNDACK in response to a BNDUPD message until it is sure that the information in the BNDUPD message has been updated in its lease database. Thus, the primary server in this case can be sure that the secondary server has recorded the potential lease interval in its stable storage when the primary server receives a BNDACK message from the secondary server.

When the DHCPv6 client attempts to renew at T1 (approximately half an hour from the start of the lease), the primary server again determines the desired valid lifetime, which is still 3 days. It then compares this with the remaining acknowledged potential valid lifetime (3 days + 1/2 hour) and adjusts for the time passed since the secondary was last updated (1/2 hour). Thus the time remaining of the acknowledged potential valid interval is 3 days. Adding the MCLT to this yields 3 days plus 1 hour, which is more than the desired valid lifetime of 3 days.
So the client is renewed for the desired valid lifetime -- 3 days.

When the primary DHCPv6 server updates the secondary DHCPv6 server after the DHCPv6 client's renewal REPLY is complete, it will calculate the desired potential valid lifetime as the T1 fraction of the actual client valid lifetime (1/2 of 3 days this time = 1.5 days). To this it will add the desired client valid lifetime of 3 days, yielding a total desired potential valid lifetime of 4.5 days. In this way, the primary attempts to have the secondary always "lead" the client in its understanding of the client's valid lifetime so as to be able to always offer the client the desired client valid lifetime.

Once the initial actual client valid lifetime of the MCLT is past, the protocol operates effectively like the DHCPv6 protocol does today in its behavior concerning valid lifetimes. However, the guarantee that the actual client valid lifetime will never exceed the remaining acknowledged partner server potential valid lifetime by more than the MCLT allows full recovery from a variety of failures.

7.5. Unreachability detection

Each partner maintains an FO_SEND timer for each partner connection. The FO_SEND timer is reset every time any message is transmitted. If the timer reaches the FO_SEND_MAX value, a CONTACT message is transmitted and the timer is reset. The CONTACT message may be transmitted at any time.

Discussion: Perhaps it would be more reasonable to use an echo-reply approach, rather than periodic transmissions?

7.6. Re-allocating Leases

TODO: Describe controlled re-allocation of released/expired leases to different clients.

7.7. Sending Data

Each server updates its failover partner about recent changes in lease states. Each update must include the following information:

1. resource type - non-temporary address or a prefix

2. resource information - actual address or prefix

3. valid lifetime requested by the client

4. IAID - Identity Association Identifier used by the client while obtaining this lease. (Note 1: one client may use many IAIDs simultaneously. Note 2: IAIDs for IA_NA, IA_TA and IA_PD are independent number spaces.)

5. valid lifetime sent to the client

6. potential valid lifetime

7. preferred lifetime sent to the client

8. CLTT - Client Last Transaction Time, a timestamp of the last received transmission from a client

9. assigned FQDN names, if any (optional)

Discussion: Do we need T1 as well? Something like the next expected client transmission?

Q: Maybe we could reuse IA_NA and IA_PD options here? Yes.

Q: Do we care about preferred lifetime? (presumably no). Certainly not what was requested by the client.

Q: Do we care about IAID? (presumably yes) Yes.

7.7.1. Required Data

7.7.2. Optional Data

7.8. Receiving Data

7.8.1. Conflict Resolution

TODO: This is just a loose collection of notes. This section will probably need to be rewritten as a flowchart of some kind.

The server receiving a lease update from its partner must evaluate the received lease information to see if it is consistent with already known state and decide which information -- previously known or just received -- is "better". The server should take into consideration the following aspects:
- whether the lease is already assigned to a specific client,
- which server had contact with the client most recently,
- the start time of the lease, etc.

The lease update may be accepted or rejected. Rejection SHOULD NOT change the flag in a lease that says that it should be transmitted to the failover partner. If this flag is set, then it should be transmitted, but if it is not already set, the rejection of a lease state update SHOULD NOT trigger an automatic update of the failover partner sending the rejected update.
The potential for update storms is too great, and in the unusual case where the servers simply can't agree, that disagreement is better than an update storm.

Discussion: There will definitely be different types of update rejections. For example, this will allow a server to distinguish between receiving a new lease that it has not seen before and receiving from its partner an old version of a lease for which a newer state is already known.

8. Endpoint States

8.1. State Machine Operation

Each server (or, more accurately, failover endpoint) can take on a variety of failover states. These states play a crucial role in determining the actions that a server will perform when processing a request from a DHCPv6 client as well as dealing with changing external conditions (e.g., loss of connection to a failover partner).

The failover state in which a server is running controls the following behaviors:

o Responsiveness -- the server is either responsive to DHCPv6 client requests or it is not.

o Allocation Pool -- which pool of addresses (or prefixes) can be used for allocation on receipt of a SOLICIT message.

o MCLT -- whether or not the server ensures that valid lifetimes do not extend beyond what the partner has acknowledged plus the MCLT.

A server will transition from one failover state to another based on the specific values held by the following state variables:

o Current failover state.

o Communications status (OK or not OK).

o Partner's failover state (if known).

Whenever either of the last two of the above state variables changes state, the state machine is invoked, which may then trigger a change in the current failover state. Thus, whenever the communications status changes, state machine processing is invoked. This may or may not result in a change in the current failover state.
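
A minimal sketch of this event-driven structure follows, assuming hypothetical class and method names:
- Only the two changing inputs (communications status and partner state) invoke the state machine.
- Each transition is appended to a list that stands in for stable storage.
- As an example transition function, only the PARTNER-DOWN rules of Section 8.4.2 are filled in; the other states are left out.

```python
# Sketch of an event-driven failover endpoint (Section 8.1). Class and
# method names are hypothetical; only the PARTNER-DOWN rules of
# Section 8.4.2 are implemented as an example transition function.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    STARTUP = "STARTUP"
    NORMAL = "NORMAL"
    COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
    PARTNER_DOWN = "PARTNER-DOWN"
    POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"
    RESOLUTION_INTERRUPTED = "RESOLUTION-INTERRUPTED"
    CONFLICT_DONE = "CONFLICT-DONE"
    RECOVER = "RECOVER"
    RECOVER_WAIT = "RECOVER-WAIT"
    RECOVER_DONE = "RECOVER-DONE"
    SHUTDOWN = "SHUTDOWN"
    PAUSED = "PAUSED"


# Partner states that move a PARTNER-DOWN server to POTENTIAL-CONFLICT.
_TO_CONFLICT = {State.NORMAL, State.COMMUNICATIONS_INTERRUPTED,
                State.PARTNER_DOWN, State.POTENTIAL_CONFLICT,
                State.RESOLUTION_INTERRUPTED, State.CONFLICT_DONE}


@dataclass
class Endpoint:
    state: State
    comm_ok: bool = False
    partner_state: Optional[State] = None
    # Stand-in for stable storage: (new state, partner state) per transition.
    stable_log: List[Tuple[State, Optional[State]]] = field(default_factory=list)

    def set_comm_ok(self, ok: bool) -> None:
        if ok != self.comm_ok:
            self.comm_ok = ok
            self._run_state_machine()

    def set_partner_state(self, st: State) -> None:
        # Assumes STATE messages with the STARTUP bit set were already
        # filtered out by the caller (Section 8.4.2).
        if st != self.partner_state:
            self.partner_state = st
            self._run_state_machine()

    def _run_state_machine(self) -> None:
        new = self._next_state()
        if new is not None and new != self.state:
            self.state = new
            # A real server would also store the entry time and, if
            # comm_ok, send its new state in a STATE message.
            self.stable_log.append((new, self.partner_state))

    def _next_state(self) -> Optional[State]:
        if self.state is State.PARTNER_DOWN and self.comm_ok:
            if self.partner_state in _TO_CONFLICT:
                return State.POTENTIAL_CONFLICT
            if self.partner_state is State.RECOVER_DONE:
                return State.NORMAL
            # RECOVER, RECOVER-WAIT, SHUTDOWN, PAUSED or unknown: stay.
        return None
```

With these rules, consider an endpoint created as Endpoint(State.PARTNER_DOWN) that then sees communications become OK:
- If its partner reports RECOVER, the endpoint stays in PARTNER-DOWN.
- If the partner then reports RECOVER-DONE, the endpoint moves to NORMAL.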

Whenever a server transitions to a new failover state, the new state MUST be communicated to its failover partner in a STATE message if the communications status is OK. In addition, whenever a server makes a transition into a new state, it MUST record the new state, its current understanding of its partner's state, and the time at which it entered the new state in stable storage.

The following state transition diagram gives a condensed view of the state machine. If there is a difference between the words describing a particular state and the diagram below, the words should be considered authoritative.

A transition into SHUTDOWN or PAUSED state is not represented in the following figure, since other than sending that state to its partner, the remaining actions involved look just like the server halting in its otherwise current state, which then becomes the previous state upon server restart.

   [state transition diagram]

                 Figure 1: Failover Endpoint State Machine

8.2. State Machine Initialization

TODO

8.3. STARTUP State

The STARTUP state affords an opportunity for a server to probe its partner server, before starting to service DHCP clients. When in the STARTUP state, a server attempts to learn its partner's state and determine (using that information if it is available) what state it should enter.

The STARTUP state is not shown with any specific state transitions in the state machine diagram (Figure 1) because the processing during the STARTUP state can cause the server to transition to any of the other states, so that specific state transition arcs would only obscure other information.

8.3.1. Operation in STARTUP State

The server MUST NOT be responsive in STARTUP state.

Whenever a STATE message is sent to the partner while in STARTUP state, the STARTUP flag MUST be set in the message and the previously recorded failover state MUST be placed in the server-state option.

8.3.2. Transition Out of STARTUP State

The following algorithm is followed every time the server initializes itself and enters STARTUP state.

Step 1:

If there is any record in stable storage of a previous failover state for this server, set PREVIOUS-STATE to the last recorded value in stable storage, and go to Step 2.

If there is no record of any previous failover state in stable storage for this server, then set PREVIOUS-STATE to RECOVER and set TIME-OF-FAILURE to 0. This will allow two servers which already have lease information to synchronize themselves prior to operating.

In some cases, an existing server will be commissioned as a failover server and brought back into operation where its partner is not yet available. In this case, the newly commissioned failover server will not operate until its partner comes online -- but it has operational responsibilities as a DHCP server nonetheless. To properly handle this situation, a server SHOULD be configurable in such a way as to move directly into PARTNER-DOWN state after the startup period expires if it has been unable to contact its partner during the startup period.
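
Step 1, together with the optional configuration described in the previous paragraph, could be sketched as follows. In this sketch:
- States are plain strings and TIME-OF-FAILURE is an integer.
- The function names and the partner_down_if_alone flag are hypothetical.
- The text does not say where TIME-OF-FAILURE comes from when a previous state is recorded, so the sketch leaves it untouched in that case.

```python
# Sketch of Step 1 of Section 8.3.2 plus the optional "move directly to
# PARTNER-DOWN" configuration. Names and the flag are hypothetical.
from typing import Optional, Tuple


def step1_previous_state(recorded_state: Optional[str]) -> Tuple[str, Optional[int]]:
    """Return (PREVIOUS-STATE, TIME-OF-FAILURE set by this step).

    A second element of None means Step 1 leaves TIME-OF-FAILURE as is.
    """
    if recorded_state is not None:
        return recorded_state, None
    # No record in stable storage: start from RECOVER so that servers
    # which already hold lease information synchronize before operating.
    return "RECOVER", 0


def state_after_startup_timeout(previous_state: str,
                                partner_down_if_alone: bool) -> str:
    """State to enter when the STARTUP timer expires without the partner
    ever having been contacted (Step 6 and the configuration above)."""
    if partner_down_if_alone:
        return "PARTNER-DOWN"
    return previous_state
```

The timeout helper is only meant to be called when communications never became OK during the startup period. Otherwise the server has already left STARTUP in Step 5.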

Step 2:

If the previous state is one where communications were "OK", then set the previous state to the state that is the result of the communications failed state transition (if such a transition exists -- some states don't have a communications failed state transition, since they allow both communications OK and failed).

Step 3:

Start the STARTUP state timer. The time that a server remains in the STARTUP state (absent any communications with its partner) is implementation dependent but SHOULD be short. It SHOULD be long enough for a TCP connection to be created to a heavily loaded partner across a slow network.

Step 4:

Attempt to create a TCP connection to the failover partner.

Step 5:

Wait for "communications OK".

When and if communications become "OK", clear the STARTUP flag, and set the current state to the PREVIOUS-STATE.

If the partner is in PARTNER-DOWN state, and if the time at which it entered PARTNER-DOWN state (as received in the start-time-of-state option in the STATE message) is later than the last recorded time of operation of this server, then set CURRENT-STATE to RECOVER. If the time at which it entered PARTNER-DOWN state is earlier than the last recorded time of operation of this server, then set CURRENT-STATE to POTENTIAL-CONFLICT.

Then, transition to the current state and take the "communications OK" state transition based on the current state of this server and the partner.

Step 6:

If the startup time expires, the server SHOULD transition to the PREVIOUS-STATE.

8.4. PARTNER-DOWN State

PARTNER-DOWN state is a state either server can enter. When in this state, the server assumes that it is the only server operating and serving the client base. If one server is in PARTNER-DOWN state, the other server MUST NOT be operating.

8.4.1. Operation in PARTNER-DOWN State

The server MUST be responsive in PARTNER-DOWN state.

It will allow renewal of all outstanding leases on IP addresses. For those IP addresses for which the server is using proportional allocation, it will allocate IP addresses from its own pool. After a fixed period of time (the MCLT interval) has elapsed from entry into PARTNER-DOWN state, it will allocate IP addresses from the set of all available IP addresses.

Any IP address tagged as available for allocation by the other server (at entry to PARTNER-DOWN state) MUST NOT be allocated to a new client until the maximum-client-lead-time beyond the entry into PARTNER-DOWN state has elapsed.

A server in PARTNER-DOWN state MUST NOT allocate an IP address to a DHCP client different from the one to which it was allocated at the entrance to PARTNER-DOWN state until the maximum-client-lead-time has elapsed beyond the maximum of the following times:
- the client expiration time,
- the most recently transmitted potential-expiration-time,
- the most recently received ack of potential-expiration-time from the partner, and
- the most recently acked potential-expiration-time to the partner.

See Section 7.4 for details. If this time would be earlier than the current time plus the maximum-client-lead-time, then the time the server entered PARTNER-DOWN state plus the maximum-client-lead-time is used.

The server is not restricted by the MCLT when offering lease times while in PARTNER-DOWN state.

8.4.2. Transition Out of PARTNER-DOWN State

When a server in PARTNER-DOWN state succeeds in establishing a connection to its partner, its actions are conditional on the state and flags received in the STATE message from the other server as part of the process of establishing the connection.

If the STARTUP bit is set in the server-flags option of a received STATE message, a server in PARTNER-DOWN state MUST NOT take any state transitions based on reestablishing communications. Essentially, if a server is in PARTNER-DOWN state, it ignores all STATE messages from its partner that have the STARTUP bit set in the server-flags option of the STATE message.

TODO: This paragraph needs to be moved.

If the STARTUP bit is not set in the server-flags option of a STATE message received from its partner, then a server in PARTNER-DOWN state takes the following actions based on the state of the partner as received in a STATE message (either immediately after establishing communications or at any time later when a new state is received):

If the partner is in:

   NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN, POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE state

      transition to POTENTIAL-CONFLICT state

If the partner is in:

   RECOVER, RECOVER-WAIT, SHUTDOWN, or PAUSED state

      stay in PARTNER-DOWN state

If the partner is in:

   RECOVER-DONE state

      transition into NORMAL state

8.5. RECOVER State

This state indicates that the server has no information in its stable storage or that it is re-integrating with a server in PARTNER-DOWN state after it has been down. A server in this state MUST attempt to refresh its stable storage from the other server.

8.5.1. Operation in RECOVER State

The server MUST NOT be responsive in RECOVER state.

A server in RECOVER state will attempt to reestablish communications with the other server.

8.5.2. Transition Out of RECOVER State

If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE state when communications are reestablished, then the server in RECOVER state will move to POTENTIAL-CONFLICT state itself.

If the other server is in any other state, then the server in RECOVER state will request an update of missing binding information by sending an UPDREQ message. If the server has determined that it has lost its stable storage because it has no record of ever having talked to its partner, while its partner does have a record of communicating with it, it MUST send an UPDREQALL message, otherwise it MUST send an UPDREQ message.

It will wait for an UPDDONE message, and upon receipt of that message it will transition to RECOVER-WAIT state.

If communications fails during the reception of the results of the UPDREQ or UPDREQALL message, the server will remain in RECOVER state, and will re-issue the UPDREQ or UPDREQALL when communications are re-established.

If an UPDDONE message isn't received within an implementation dependent amount of time, and no BNDUPD messages are being received, the connection SHOULD be dropped.

          A                                   B
        Server                              Server

          |                                   |
       RECOVER                          PARTNER-DOWN
          |                                   |
          | >--UPDREQ---------------------->  |
          |                                   |
          | <----------------------BNDUPD--<  |
          | >--BNDACK---------------------->  |
         ...                                 ...
          |                                   |
          | <----------------------BNDUPD--<  |
          | >--BNDACK---------------------->  |
          |                                   |
          | <---------------------UPDDONE--<  |
          |                                   |
    RECOVER-WAIT                              |
          |                                   |
          | >--STATE-(RECOVER-WAIT)-------->  |
          |                                   |
          |                                   |
Wait MCLT from last known                     |
time of failover operation                    |
          |                                   |
    RECOVER-DONE                              |
          |                                   |
          | >--STATE-(RECOVER-DONE)-------->  |
          |                                NORMAL
          | <--------------(NORMAL)-STATE--<  |
       NORMAL                                 |
          | >--STATE-(NORMAL)-------------->  |
          |                                   |
          |                                   |

             Figure 2: Transition out of RECOVER state

If, at any time while a server is in RECOVER state communications fails, the server will stay in RECOVER state.
When communications are restored, it will restart the process of transitioning out of RECOVER state.

8.6. RECOVER-WAIT State

This state indicates that the server has sent an UPDREQ or UPDREQALL and has received the UPDDONE message indicating that it has received all outstanding binding update information. In the RECOVER-WAIT state the server will wait for the MCLT in order to ensure that any processing that this server might have done prior to losing its stable storage will not cause future difficulties.

8.6.1. Operation in RECOVER-WAIT State

The server MUST NOT be responsive in RECOVER-WAIT state.

8.6.2. Transition Out of RECOVER-WAIT State

Upon entry to RECOVER-WAIT state the server MUST start a timer. The timer's expiration is the time the server went down (if known), or the time the server started (if the down-time is unknown), plus the maximum-client-lead-time. When this timer expires, the server will transition into RECOVER-DONE state.

This allows any DHCP clients that were allocated IP addresses by this server prior to the loss of its client binding information in stable storage to contact the other server, or their leases to time out.

If this is the first time this server has run failover, then there is no need to wait in this state. The waiting time discussed above may be skipped, and the server may transition immediately to RECOVER-DONE state. Whether this server has run failover before MUST be determined using the information received from the partner, not only this server's stable storage, because that stable storage may have been lost.
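
The timer computation above might be sketched as follows, assuming integer-second times and hypothetical parameter names:
- The first_failover_run flag stands for whatever indication the partner provides that this server has never run failover. As stated above, this should not be derived from local stable storage alone.
- The expiry check takes no communications status as input, matching the rule in this section that a communications failure does not stop the timer.

```python
# Sketch of the RECOVER-WAIT timer (Section 8.6.2). Times are integer
# seconds; all parameter and function names are hypothetical.
from typing import Optional


def recover_wait_deadline(now: int, start_time: int, mclt: int,
                          down_time: Optional[int] = None,
                          first_failover_run: bool = False) -> int:
    """Absolute time at which RECOVER-WAIT may move to RECOVER-DONE."""
    if first_failover_run:
        # Never ran failover before (as learned from the partner):
        # the wait MAY be skipped.
        return now
    # Use the down time if it is known, otherwise the start time.
    base = down_time if down_time is not None else start_time
    return base + mclt


def timer_expired(now: int, deadline: int) -> bool:
    # Communications status is deliberately not an input: the timer
    # keeps running even while communications with the partner fail.
    return now >= deadline
```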
   If communications fail while a server is in RECOVER-WAIT state, this
   has no effect on the operation of this state.  The server SHOULD
   continue to run its timer.  If the timer expires while
   communications with the other server are interrupted, the server
   SHOULD transition to RECOVER-DONE state.  This is rare, because
   failover state transitions are not usually made while communications
   are interrupted, but in this case there is no reason to inhibit the
   timer.

8.7. RECOVER-DONE State

   This state exists to allow an interlocked transition into NORMAL
   state for one server coming from RECOVER state and for the other
   server coming from PARTNER-DOWN or COMMUNICATIONS-INTERRUPTED state.

8.7.1. Operation in RECOVER-DONE State

   A server in RECOVER-DONE state MUST respond only to RENEW and REBIND
   messages.

8.7.2. Transition Out of RECOVER-DONE State

   When a server in RECOVER-DONE state determines that its partner
   server has entered NORMAL or RECOVER-DONE state, it will transition
   into NORMAL state.

   If communications fail while in RECOVER-DONE state, a server will
   stay in RECOVER-DONE state.

8.8. NORMAL State

   NORMAL state is the state used by a server when it is communicating
   with the other server and any required resynchronization has been
   performed.  Some binding database synchronization is performed in
   NORMAL state, but potential conflicts and binding database data loss
   are both resolved before entry into NORMAL state.

   When entering NORMAL state, a server will send all currently
   unacknowledged binding updates to the other server as BNDUPD
   messages.

   When this process is complete, if the server entering NORMAL state
   is a secondary server, it will request IP addresses for allocation
   using the POOLREQ message.

8.8.1. Operation in NORMAL State

   When in NORMAL state a server will operate in the following manner:

   Lease time calculations
      As discussed in Section 7.4, the lease interval given to a DHCP
      client can never be more than the MCLT greater than the later of
      the most recently received potential-expiration-time from the
      failover partner and the current time.

      As long as a server adheres to this constraint, the specifics of
      the lease interval that it gives to a DHCP client and the value
      of the potential-expiration-time sent to its failover partner are
      implementation dependent.

   Lazy update of partner server
      After sending a REPLY that includes a lease update to a client,
      the server that serviced the client's request attempts to update
      its partner with the new binding information.  The server
      transmits both the desired valid lifetime and the actual valid
      lifetime.

   Reallocation of IP addresses between clients
      Whenever a client binding is released or expires, a BNDUPD
      message must be sent to the partner, setting the binding state to
      RELEASED or EXPIRED.  However, until a BNDACK is received for this
      message, the IP address cannot be allocated to another client.
      If the BNDUPD has already been sent, the address cannot be
      allocated to the same client again either; otherwise, it can.
      See Section 7.6.

   In NORMAL state, each server receives binding updates from its
   partner server in BNDUPD messages.  It records these in its client
   binding database in stable storage and then sends a corresponding
   BNDACK message to its partner server.

8.8.2. Transition Out of NORMAL State

   If a server in NORMAL state receives an external command informing
   it that its partner is down, it will transition into PARTNER-DOWN
   state.  Generally, this would be an unusual situation, where some
   external agency knew the partner server was down.
   Using the command in this case would be appropriate if the polling
   interval and timeout were long.

   If a server in NORMAL state fails to receive acknowledgements to
   messages sent to its partner for an implementation dependent period
   of time, it MAY move into COMMUNICATIONS-INTERRUPTED state.  This
   situation might occur if the partner server could maintain the TCP
   connection between the servers and send a CONTACT message every
   tSend seconds, but was (for some reason) unable to process BNDUPD
   messages.

   If communications are determined not to be "ok" (as defined in
   Section 7.5), the server will transition into
   COMMUNICATIONS-INTERRUPTED state.

   A server in NORMAL state may receive a message showing that its
   partner has changed state in a way the server does not expect.  In
   that case the server should transition into
   COMMUNICATIONS-INTERRUPTED state and make the appropriate state
   transition from there.  For example, a partner would be expected to
   move from POTENTIAL-CONFLICT into NORMAL state, but not from NORMAL
   into POTENTIAL-CONFLICT state.

   If a server in NORMAL state receives any message from its partner
   indicating that the partner has changed into SHUTDOWN state, the
   server should transition into PARTNER-DOWN state.

8.9. COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with its partner.  Primary and secondary
   servers cycle automatically (without administrative intervention)
   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
   connection between them fails and recovers, or as the partner server
   cycles between operational and non-operational.  No duplicate IP
   address allocation can occur while the servers cycle between these
   states.
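   Two mechanisms help make this cycle safe.  The first is the split of
   resources into FREE and BACKUP pools.  The second is the lifetime
   limit stated in Section 8.8.1 and applied to renewals in Section
   8.9.1.  The following non-normative Python sketch computes the
   largest valid lifetime a server could grant under that limit.  The
   function name and parameters are hypothetical.  All times are
   assumed to be absolute times in integer seconds, and the current
   time is counted among the candidates, as in Section 8.8.1.

```python
from typing import Optional


def max_valid_lifetime(
    now: int,
    mclt: int,
    desired_lifetime: int,
    acked_potential_expiry: Optional[int] = None,
    lease_expiry: Optional[int] = None,
    partner_potential_expiry: Optional[int] = None,
) -> int:
    """Largest valid lifetime (seconds from now) allowed by the MCLT rule.

    Hypothetical helper; None means the corresponding time is unknown.
    """
    # Latest of the current time and the known expiration times.
    candidates = [now]
    for t in (acked_potential_expiry, lease_expiry, partner_potential_expiry):
        if t is not None:
            candidates.append(t)
    # The lease may extend at most MCLT beyond that point ...
    latest_allowed_expiry = max(candidates) + mclt
    # ... and never beyond the configured desired lifetime.
    return min(desired_lifetime, latest_allowed_expiry - now)
```

   Eventually no acknowledged times are updated in
   COMMUNICATIONS-INTERRUPTED state.  With no such times available, the
   sketch returns the smaller of the desired lifetime and the MCLT,
   which matches the "current time plus the MCLT" outcome described in
   Section 8.9.1.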
   When a server enters COMMUNICATIONS-INTERRUPTED state, it may have
   been configured to support an automatic transition out of
   COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e.,
   a "safe period" has been configured; see Section 10).  If so, a
   timer MUST be started for the length of the configured safe period.

   A server transitioning into COMMUNICATIONS-INTERRUPTED state from
   NORMAL state SHOULD raise an alarm condition to alert administrative
   staff to a potential problem in the DHCP subsystem.

8.9.1. Operation in COMMUNICATIONS-INTERRUPTED State

   In this state a server MUST respond to all DHCP client requests.
   When allocating a new lease, each server allocates from its own
   pool.  The primary MUST allocate only FREE resources (addresses or
   prefixes), and the secondary MUST allocate only BACKUP resources
   (addresses or prefixes).  When responding to RENEW messages, each
   server will allow continued renewal of a DHCP client's current lease
   on an IP address or prefix, irrespective of whether that lease was
   given out by the receiving server.  However, the renewal period MUST
   NOT exceed the maximum client lead time (MCLT) beyond the latest of:
   1) the potential valid lifetime already acknowledged by the other
   server, 2) the lease-expiration-time, or 3) the potential valid
   lifetime received from the partner server.

   However, since the server cannot communicate with its partner in
   this state, the acknowledged potential valid lifetime will not be
   updated in any new bindings.  This is likely to eventually cause the
   actual valid lifetimes to be the current time plus the MCLT (unless
   this is greater than the desired-client-lease-time).

   The server should continue to try to establish a connection with its
   partner.

8.9.2. Transition Out of COMMUNICATIONS-INTERRUPTED State

   If the safe period timer expires while a server is in
   COMMUNICATIONS-INTERRUPTED state, it will transition immediately
   into PARTNER-DOWN state.

   If a server in COMMUNICATIONS-INTERRUPTED state receives an external
   command informing it that its partner is down, it will transition
   immediately into PARTNER-DOWN state.

   If communications are restored with the other server, then the
   server in COMMUNICATIONS-INTERRUPTED state will transition into
   another state based on the state of the partner:

   o  NORMAL or COMMUNICATIONS-INTERRUPTED: Transition into NORMAL
      state.

   o  RECOVER: Stay in COMMUNICATIONS-INTERRUPTED state.

   o  RECOVER-DONE: Transition into NORMAL state.

   o  PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
      RESOLUTION-INTERRUPTED: Transition into POTENTIAL-CONFLICT state.

   o  SHUTDOWN: Transition into PARTNER-DOWN state.

   The following figure illustrates the transition from NORMAL to
   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state.

      Primary                        Secondary
      Server                          Server

      NORMAL                          NORMAL
         | >--CONTACT------------------> |
         | <------------------CONTACT--< |
         |    [TCP connection broken]    |
  COMMUNICATIONS          :       COMMUNICATIONS
    INTERRUPTED           :         INTERRUPTED
         | [attempt new TCP connection]  |
         |    [connection succeeds]      |
         |                               |
         | >--CONNECT------------------> |
         | <---------------CONNECTACK--< |
         |                            NORMAL
         | <--------------------STATE--< |
      NORMAL                             |
         | >--STATE--------------------> |
         |                               |
         | >--BNDUPD-------------------> |
         | <-------------------BNDACK--< |
         |                               |
         | <-------------------BNDUPD--< |
         | >--BNDACK-------------------> |
        ...                             ...
         |                               |
         | <------------------POOLREQ--< |
         | >--POOLRESP-(2)-------------> |
      t> |                               |
         | >--BNDUPD-(#1)--------------> |
         | <-------------------BNDACK--< |
         |                               |
         | <------------------POOLREQ--< |
         | >--POOLRESP-(0)-------------> |
         |                               |
         | >--BNDUPD-(#2)--------------> |
         | <-------------------BNDACK--< |
         |                               |

      Figure 3: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED and
             back (example with 2 addresses allocated to secondary)

8.10. POTENTIAL-CONFLICT State

   This state indicates that the two servers are attempting to
   reintegrate with each other, but at least one of them was running in
   a state that did not guarantee that automatic reintegration would be
   possible.  In POTENTIAL-CONFLICT state the servers may determine
   that the same resource has been offered to and accepted by two
   different clients.

   It is a goal of this protocol to minimize the possibility that
   POTENTIAL-CONFLICT state is ever entered.

   When a primary server enters POTENTIAL-CONFLICT state, it should
   request that the secondary send it all updates of which it is
   currently unaware by sending an UPDREQ message to the secondary
   server.

   A secondary server entering POTENTIAL-CONFLICT state will wait for
   the primary to send it an UPDREQ message.

8.10.1. Operation in POTENTIAL-CONFLICT State

   A server in POTENTIAL-CONFLICT state MUST NOT process any incoming
   DHCP requests.

8.10.2. Transition Out of POTENTIAL-CONFLICT State

   If communications with the partner fail while in POTENTIAL-CONFLICT
   state, the server will transition to RESOLUTION-INTERRUPTED state.

   Whenever either server receives an UPDDONE message from its partner
   while in POTENTIAL-CONFLICT state, it MUST transition to a new state.
   The primary MUST transition to CONFLICT-DONE state, and the secondary
   MUST transition to NORMAL state.
   As a result, the primary server leaves POTENTIAL-CONFLICT state
   before the secondary does.  The primary sends its UPDREQ message and
   receives an UPDDONE before the secondary sends its own UPDREQ
   message and receives its UPDDONE message.

   When a secondary server receives an indication that the primary
   server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE
   state, it SHOULD send an UPDREQ message to the primary server.

      Primary                        Secondary
      Server                          Server

         |                               |
POTENTIAL-CONFLICT              POTENTIAL-CONFLICT
         |                               |
         | >--UPDREQ-------------------> |
         |                               |
         | <-------------------BNDUPD--< |
         | >--BNDACK-------------------> |
        ...                             ...
         |                               |
         | <-------------------BNDUPD--< |
         | >--BNDACK-------------------> |
         |                               |
         | <------------------UPDDONE--< |
   CONFLICT-DONE                         |
         | >--STATE-(CONFLICT-DONE)----> |
         | <-------------------UPDREQ--< |
         |                               |
         | >--BNDUPD-------------------> |
         | <-------------------BNDACK--< |
        ...                             ...
         | >--BNDUPD-------------------> |
         | <-------------------BNDACK--< |
         |                               |
         | >--UPDDONE------------------> |
         |                            NORMAL
         | <-----------STATE-(NORMAL)--< |
      NORMAL                             |
         | >--STATE-(NORMAL)-----------> |
         |                               |
         | <------------------POOLREQ--< |
         | >--POOLRESP-(n)-------------> |
         |          addresses            |

              Figure 4: Transition out of POTENTIAL-CONFLICT

8.11. RESOLUTION-INTERRUPTED State

   This state indicates that the two servers were attempting to
   reintegrate with each other in POTENTIAL-CONFLICT state, but
   communications failed before reintegration was completed.

   If the servers remained in POTENTIAL-CONFLICT state while
   communications were interrupted, neither server would respond to
   DHCP client requests.  If one server had crashed, there might then
   be no server able to process DHCP requests.
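   The following non-normative Python sketch summarizes the
   reintegration transitions of Section 8.10.2 together with the exits
   from RESOLUTION-INTERRUPTED state given in Section 8.11.2.  The
   State and Event names are hypothetical.  The sketch covers only the
   transitions described in those two sections and leaves every other
   combination unchanged.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()
    POTENTIAL_CONFLICT = auto()
    RESOLUTION_INTERRUPTED = auto()
    CONFLICT_DONE = auto()
    PARTNER_DOWN = auto()


class Event(Enum):
    COMMS_FAILED = auto()
    COMMS_RESTORED = auto()
    UPDDONE_RECEIVED = auto()
    PARTNER_DOWN_COMMAND = auto()  # external "partner is down" command


def next_state(state: State, event: Event, is_primary: bool) -> State:
    """Return the next state for the reintegration states only."""
    if state is State.POTENTIAL_CONFLICT:
        if event is Event.COMMS_FAILED:
            return State.RESOLUTION_INTERRUPTED
        if event is Event.UPDDONE_RECEIVED:
            # The primary finishes first and becomes fully responsive;
            # the secondary goes straight to NORMAL.
            return State.CONFLICT_DONE if is_primary else State.NORMAL
    elif state is State.RESOLUTION_INTERRUPTED:
        if event is Event.COMMS_RESTORED:
            # Reintegration starts over.
            return State.POTENTIAL_CONFLICT
        if event is Event.PARTNER_DOWN_COMMAND:
            return State.PARTNER_DOWN
    # Any other combination leaves the state unchanged in this sketch.
    return state
```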
   When a server enters RESOLUTION-INTERRUPTED state, it SHOULD raise
   an alarm condition to alert administrative staff to a problem in the
   DHCP subsystem.

8.11.1. Operation in RESOLUTION-INTERRUPTED State

   In this state a server MUST respond to all DHCP client requests.
   When allocating new resources (addresses or prefixes), each server
   SHOULD allocate from its own pool (if that can be determined).  The
   primary SHOULD allocate only FREE resources, and the secondary
   SHOULD allocate only BACKUP resources.  When responding to renewal
   requests, each server will allow continued renewal of a DHCP
   client's current lease, irrespective of whether that lease was given
   out by the receiving server.  However, the renewal period MUST NOT
   exceed the maximum client lead time (MCLT) beyond the latest of:
   1) the potential valid lifetime already acknowledged by the other
   server, 2) the lease-expiration-time, or 3) the potential valid
   lifetime received from the partner server.

   However, since the server cannot communicate with its partner in
   this state, the acknowledged potential valid lifetime will not be
   updated in any new bindings.

8.11.2. Transition Out of RESOLUTION-INTERRUPTED State

   If a server in RESOLUTION-INTERRUPTED state receives an external
   command informing it that its partner is down, it will transition
   immediately into PARTNER-DOWN state.

   If communications are restored with the other server, then the
   server in RESOLUTION-INTERRUPTED state will transition into
   POTENTIAL-CONFLICT state.

8.12. CONFLICT-DONE State

   This state indicates that, while the two servers are attempting to
   reintegrate with each other, the primary server has received all of
   the updates from the secondary server.
   The primary makes a transition into CONFLICT-DONE state so that it
   can be fully responsive to the client load.  In NORMAL state, by
   contrast, it would be in a "balanced" responsive state, running the
   load-balancing algorithm.

   TODO: We do not support load balancing, so CONFLICT-DONE is actually
   equivalent to NORMAL.  CONFLICT-DONE needs to be removed and all
   references to it replaced with NORMAL.

8.12.1. Operation in CONFLICT-DONE State

   A primary server in CONFLICT-DONE state is fully responsive to all
   DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
   state).

   If communications fail, the server remains in CONFLICT-DONE state.
   If communications become OK, it remains in CONFLICT-DONE state until
   the conditions for transitioning out are satisfied.

8.12.2. Transition Out of CONFLICT-DONE State

   If communications with the partner fail while in CONFLICT-DONE
   state, the server will remain in CONFLICT-DONE state.

   When a primary server determines that the secondary server has made
   a transition into NORMAL state, the primary server will also
   transition into NORMAL state.

8.13. PAUSED State

   TODO: Remove PAUSED state completely.

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively short
   time.  The other server can then transition immediately to
   COMMUNICATIONS-INTERRUPTED state and begin servicing all DHCP
   clients, with no interruption in service to new DHCP clients.

   A server that is aware that it is shutting down temporarily SHOULD
   send a STATE message with the server-state option containing PAUSED
   state and close the TCP connection.
   A server may or may not transition internally into PAUSED state.  In
   either case, the 'previous' state used when it is restarted MUST be
   the state the server was in before it received the command to shut
   down and restart, i.e., the state that preceded its entry into
   PAUSED state.  See Section 8.3.2 concerning the use of the previous
   state upon server restart.

   When entering PAUSED state, the server MUST store the previous state
   in stable storage and use that state as the previous state when it
   is restarted.

8.13.1. Operation in PAUSED State

   A server MUST NOT perform any operation while in PAUSED state.

8.13.2. Transition Out of PAUSED State

   A server makes a transition out of PAUSED state by being restarted.
   At that time, the previous state MUST be the state the server was in
   before entering PAUSED state.

8.14. SHUTDOWN State

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively long
   time.  The other server can then transition immediately to
   PARTNER-DOWN state and take over completely for the server going
   down.

   When entering SHUTDOWN state, the server MUST record the previous
   state in stable storage for use when the server is restarted.  It
   MUST also record the current time as the last time operational.

   A server that is aware that it is shutting down SHOULD send a STATE
   message with the server-state field containing SHUTDOWN.

8.14.1. Operation in SHUTDOWN State

   A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

   If a server in SHUTDOWN state receives any message indicating that
   the partner has moved to PARTNER-DOWN state, it MUST record RECOVER
   state as the previous state to be used when it is restarted.
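   The stable-storage bookkeeping described above for SHUTDOWN state
   can be sketched in Python as follows.  This is a non-normative
   illustration.  The hypothetical StableStorage class stands in for
   whatever persistent store an implementation uses, and its writes are
   assumed to be durable.  States are represented as plain strings for
   brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StableStorage:
    """Hypothetical stand-in for a server's persistent store."""

    previous_state: Optional[str] = None
    last_time_operational: Optional[float] = None


def enter_shutdown(
    storage: StableStorage, current_state: str, now: float
) -> str:
    # Record the state to use as "previous" on restart, and the
    # current time as the last time operational.
    storage.previous_state = current_state
    storage.last_time_operational = now
    return "SHUTDOWN"


def on_partner_state(
    storage: StableStorage, my_state: str, partner_state: str
) -> None:
    # If the partner has taken over while this server is shutting
    # down, record RECOVER as the previous state to use on restart.
    if my_state == "SHUTDOWN" and partner_state == "PARTNER-DOWN":
        storage.previous_state = "RECOVER"
```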
   A server SHOULD wait for a few seconds after informing the partner of
   its entry into SHUTDOWN state (if communications are okay) to
   determine whether the partner has entered PARTNER-DOWN state.

8.14.2. Transition Out of SHUTDOWN State

   A server makes a transition out of SHUTDOWN state by being restarted.

9. Proposed extensions

   The following section discusses possible extensions to the proposed
   failover mechanism.  Listed extensions must be simple enough not to
   further complicate the failover protocol.  Any proposals that are
   considered complex will be defined as stand-alone extensions in
   separate documents.

9.1. Active-active mode

   A very simple way to achieve active-active mode is to remove the
   restriction that the secondary server MUST NOT respond to SOLICIT
   and REQUEST messages.  Instead, it could respond, but it MUST have a
   lower preference than the primary server.  Clients discovering
   available servers will receive ADVERTISE messages from both servers,
   but are expected to select the primary server, because it has the
   higher preference value configured.  The subsequent REQUEST message
   will then be directed to the primary server.

   Discussion: Do DHCPv6 clients actually do this?  DHCPv4 clients were
   rumored to wait for a "while" to accept the best offer, but, to a
   first approximation, they all take the first acceptable offer they
   receive.

   The benefit of this approach, compared to the "basic" active-passive
   solution, is that there is no delay between the failure of the
   primary and the moment when the secondary starts serving requests.

   Discussion: Setting both servers' preference to an equal value could
   theoretically work as a crude attempt to provide load balancing.
   It would not do much good on its own, because one (faster) server
   could be chosen more frequently.  This assumes that, with equal
   preferences set, clients will pick the first responding server,
   which DHCPv6 does not mandate.  We could design a simple mechanism
   that dynamically updates the preference depending on the usage of
   available resources.  This concept has not been investigated in
   detail yet.

10. Dynamic DNS Considerations

   TODO: Describe DNS Update challenges in a failover environment.
   These are described in detail in Section 5.12 of [dhcpv4-failover].

11. Reservations and failover

   TODO: Describe how lease reservation works with failover.  See
   Section 5.13 in [dhcpv4-failover].

12. Protocol entities

   Discussion: It is unclear whether the following sections belong in
   the design draft or the protocol draft.  They are currently kept
   here as a scratchpad listing things that will have to be defined
   eventually.  Whether they stay in this document or move to the
   protocol specification document is TBD.

12.1. Failover Protocol

   This section lists the options that will be defined in the failover
   protocol specification, with a rough description of the purpose and
   content of each.  The exact on-wire format will be defined in the
   protocol specification.

   1.  OPTION_FO_TIMESTAMP - conveys timestamp information.  It is used
       by the time skew measurement algorithm (see Section 7.1).

12.2. Protocol constants

   This section lists various constants that have to be defined in the
   actual protocol specification.

   1.  TIME_SKEW_PKTS_AVG - number of packets that are used to calculate
       the average time skew between partners (see Section 7.1).

13. Open questions

   This is a scratchpad.  This section will be removed once the
   questions are answered.

   Q: Do we want to support temporary addresses?  I think not.
   They are short-lived by definition, so clients should not mind
   getting new temporary addresses.

   Q: Do we want to support CGA-registered addresses?  There is
   currently work in the DHC WG on this, but I have not looked at it
   yet.  If that is complicated, we may define it as an extension
   rather than here.  [If it moves forward, we need to support it.]

14. Security Considerations

   TODO: The Security Considerations section will contain loose notes
   and will be transformed into consistent text once the core design
   solidifies.

15. IANA Considerations

   IANA is not requested to perform any actions at this time.

16. Acknowledgements

   This document extensively uses concepts, definitions, and other
   parts of the [dhcpv4-failover] document.  The authors would like to
   thank Shawn Routher, Greg Rabil, and Bernie Volz for their
   significant involvement and contributions.

   This work has been partially supported by the Department of Computer
   Communications (a division of Gdansk University of Technology) and
   the Polish Ministry of Science and Higher Education under the
   European Regional Development Fund, Grant No.
   POIG.01.01.02-00-045/09-00 (Future Internet Engineering Project).

17. References

17.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2131]  Droms, R., "Dynamic Host Configuration Protocol",
              RFC 2131, March 1997.

   [RFC3074]  Volz, B., Gonczi, S., Lemon, T., and R. Stevens, "DHC Load
              Balancing Algorithm", RFC 3074, February 2001.

   [RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
              and M. Carney, "Dynamic Host Configuration Protocol for
              IPv6 (DHCPv6)", RFC 3315, July 2003.

   [RFC3633]  Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic
              Host Configuration Protocol (DHCP) version 6", RFC 3633,
              December 2003.
   [RFC4704]  Volz, B., "The Dynamic Host Configuration Protocol for
              IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN)
              Option", RFC 4704, October 2006.

   [RFC5460]  Stapp, M., "DHCPv6 Bulk Leasequery", RFC 5460,
              February 2009.

17.2. Informative References

   [I-D.ietf-dhc-dhcpv6-redundancy-consider]
              Tremblay, J., Brzozowski, J., Chen, J., and T. Mrugalski,
              "DHCPv6 Redundancy Deployment Considerations",
              draft-ietf-dhc-dhcpv6-redundancy-consider-02 (work in
              progress), October 2011.

   [RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
              "Dynamic Updates in the Domain Name System (DNS UPDATE)",
              RFC 2136, April 1997.

   [dhcpv4-failover]
              Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S.,
              Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover
              Protocol", draft-ietf-dhc-failover-12 (work in progress),
              March 2003.

   [requirements]
              Mrugalski, T. and K. Kinnear, "DHCPv6 Failover
              Requirements",
              draft-ietf-dhc-dhcpv6-failover-requirements-00 (work in
              progress), October 2011.

Authors' Addresses

   Tomasz Mrugalski
   Internet Systems Consortium, Inc.
   950 Charter Street
   Redwood City, CA 94063
   USA

   Phone: +1 650 423 1345
   Email: tomasz.mrugalski@gmail.com

   Kim Kinnear
   Cisco Systems, Inc.
   1414 Massachusetts Ave.
   Boxborough, Massachusetts 01719
   USA

   Phone: +1 (978) 936-0000
   Email: kkinnear@cisco.com