idnits 2.17.1 

draft-mrugalski-dhc-dhcpv6-failover-design-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 39 instances of too long lines in the document, the longest
     one being 2 characters in excess of 72.

  ** The abstract seems to contain references ([RFC3315]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     In this state a server MUST respond to all DHCP client requests.
     When allocating new resources (addresses or prefixes), each server SHOULD
     allocate from its own pool (if that can be determined), where the primary
     SHOULD allocate only FREE resources, and the secondary SHOULD allocate
     only BACKUP resources.  When responding to renewal requests, each server
     will allow continued renewal of a DHCP client's current lease
     irrespective of whether that lease was given out by the receiving server
     or not, although the renewal period MUST not exceed the maximum client
     lead time (MCLT) beyond the latest of: 1) the potential valid lifetime
     already acknowledged by the other server or 2) the lease-expiration-time
     or 3) potential valid lifetime received from the partner server.

  -- The document date (March 12, 2012) is 4428 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2131' is defined on line 1847, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3074' is defined on line 1850, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3633' is defined on line 1857, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4704' is defined on line 1861, but no explicit
     reference was found in the text

  == Unused Reference: 'I-D.ietf-dhc-dhcpv6-redundancy-consider' is defined
     on line 1870, but no explicit reference was found in the text

  == Unused Reference: 'RFC2136' is defined on line 1876, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 3315 (Obsoleted by RFC 8415)

  ** Obsolete normative reference: RFC 3633 (Obsoleted by RFC 8415)

  == Outdated reference: A later version (-03) exists of
     draft-ietf-dhc-dhcpv6-redundancy-consider-02


     Summary: 4 errors (**), 0 flaws (~~), 9 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Dynamic Host Configuration (DHC)                            T. Mrugalski
3	Internet-Draft                                                       ISC
4	Intended status: Standards Track                              K. Kinnear
5	Expires: September 13, 2012                                        Cisco
6	                                                          March 12, 2012

8	                         DHCPv6 Failover Design
9	             draft-mrugalski-dhc-dhcpv6-failover-design-01

11	Abstract

13	   DHCPv6, as defined in RFC 3315, does not offer server redundancy.
14	   This document defines a design for DHCPv6 failover, a mechanism for
15	   running two servers on the same network with the capability for
16	   either server to take over clients' leases in case of server failure
17	   or network partition.  This is a DHCPv6 failover design document; it
18	   is not a protocol specification document.  It is the second document
19	   in a planned series of three.  DHCPv6 failover requirements are
20	   specified in a separate requirements document.  A protocol
21	   specification document is planned to follow this document.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on September 13, 2012.

40	Copyright Notice

42	   Copyright (c) 2012 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Requirements Language  . . . . . . . . . . . . . . . . . . . .  4
58	   2.  Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . .  4
59	   3.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  5
60	     3.1.  Additional Requirements  . . . . . . . . . . . . . . . . .  5
61	     3.2.  Features out of Scope: Load Balancing  . . . . . . . . . .  6
62	   4.  Protocol Overview  . . . . . . . . . . . . . . . . . . . . . .  6
63	     4.1.  Failover Machine State Overview  . . . . . . . . . . . . .  7
64	   5.  Connection Management  . . . . . . . . . . . . . . . . . . . .  9
65	     5.1.  Creating Connections . . . . . . . . . . . . . . . . . . .  9
66	     5.2.  Endpoint Identification  . . . . . . . . . . . . . . . . . 10
67	   6.  Resource Allocation  . . . . . . . . . . . . . . . . . . . . . 11
68	     6.1.  Proportional Allocation  . . . . . . . . . . . . . . . . . 12
69	     6.2.  Independent Allocation . . . . . . . . . . . . . . . . . . 13
70	     6.3.  Determining Allocation Approach  . . . . . . . . . . . . . 13
71	       6.3.1.  IPv6 Addresses . . . . . . . . . . . . . . . . . . . . 13
72	       6.3.2.  IPv6 Prefixes  . . . . . . . . . . . . . . . . . . . . 13
73	   7.  Failover Mechanisms  . . . . . . . . . . . . . . . . . . . . . 13
74	     7.1.  Time Skew  . . . . . . . . . . . . . . . . . . . . . . . . 14
75	     7.2.  Time expression  . . . . . . . . . . . . . . . . . . . . . 14
76	     7.3.  Lazy updates . . . . . . . . . . . . . . . . . . . . . . . 14
77	     7.4.  MCLT concept . . . . . . . . . . . . . . . . . . . . . . . 15
78	       7.4.1.  MCLT example . . . . . . . . . . . . . . . . . . . . . 16
79	     7.5.  Unreachability detection . . . . . . . . . . . . . . . . . 17
80	     7.6.  Re-allocating Leases . . . . . . . . . . . . . . . . . . . 17
81	     7.7.  Sending Data . . . . . . . . . . . . . . . . . . . . . . . 18
82	       7.7.1.  Required Data  . . . . . . . . . . . . . . . . . . . . 18
83	       7.7.2.  Optional Data  . . . . . . . . . . . . . . . . . . . . 18
84	     7.8.  Receiving Data . . . . . . . . . . . . . . . . . . . . . . 18
85	       7.8.1.  Conflict Resolution  . . . . . . . . . . . . . . . . . 18
86	       7.8.2.  Acknowledging Reception  . . . . . . . . . . . . . . . 19
87	   8.  Endpoint States  . . . . . . . . . . . . . . . . . . . . . . . 19
88	     8.1.  State Machine Operation  . . . . . . . . . . . . . . . . . 19
89	     8.2.  State Machine Initialization . . . . . . . . . . . . . . . 22
90	     8.3.  STARTUP State  . . . . . . . . . . . . . . . . . . . . . . 22
91	       8.3.1.  Operation in STARTUP State . . . . . . . . . . . . . . 22
92	       8.3.2.  Transition Out of STARTUP State  . . . . . . . . . . . 22
93	     8.4.  PARTNER-DOWN State . . . . . . . . . . . . . . . . . . . . 24
94	       8.4.1.  Operation in PARTNER-DOWN State  . . . . . . . . . . . 24
95	       8.4.2.  Transition Out of PARTNER-DOWN State . . . . . . . . . 24

97	     8.5.  RECOVER State  . . . . . . . . . . . . . . . . . . . . . . 25
98	       8.5.1.  Operation in RECOVER State . . . . . . . . . . . . . . 25
99	       8.5.2.  Transition Out of RECOVER State  . . . . . . . . . . . 25
100	     8.6.  RECOVER-WAIT State . . . . . . . . . . . . . . . . . . . . 27
101	       8.6.1.  Operation in RECOVER-WAIT State  . . . . . . . . . . . 28
102	       8.6.2.  Transition Out of RECOVER-WAIT State . . . . . . . . . 28
103	     8.7.  RECOVER-DONE State . . . . . . . . . . . . . . . . . . . . 28
104	       8.7.1.  Operation in RECOVER-DONE State  . . . . . . . . . . . 29
105	       8.7.2.  Transition Out of RECOVER-DONE State . . . . . . . . . 29
106	     8.8.  NORMAL State . . . . . . . . . . . . . . . . . . . . . . . 29
107	       8.8.1.  Operation in NORMAL State  . . . . . . . . . . . . . . 29
108	       8.8.2.  Transition Out of NORMAL State . . . . . . . . . . . . 30
109	     8.9.  COMMUNICATIONS-INTERRUPTED State . . . . . . . . . . . . . 31
110	       8.9.1.  Operation in COMMUNICATIONS-INTERRUPTED State  . . . . 31
111	       8.9.2.  Transition Out of COMMUNICATIONS-INTERRUPTED State . . 32
112	     8.10. POTENTIAL-CONFLICT State . . . . . . . . . . . . . . . . . 33
113	       8.10.1. Operation in POTENTIAL-CONFLICT State  . . . . . . . . 34
114	       8.10.2. Transition Out of POTENTIAL-CONFLICT State . . . . . . 34
115	     8.11. RESOLUTION-INTERRUPTED State . . . . . . . . . . . . . . . 35
116	       8.11.1. Operation in RESOLUTION-INTERRUPTED State  . . . . . . 36
117	       8.11.2. Transition Out of RESOLUTION-INTERRUPTED State . . . . 36
118	     8.12. CONFLICT-DONE State  . . . . . . . . . . . . . . . . . . . 36
119	       8.12.1. Operation in CONFLICT-DONE State . . . . . . . . . . . 37
120	       8.12.2. Transition Out of CONFLICT-DONE State  . . . . . . . . 37
121	     8.13. PAUSED State . . . . . . . . . . . . . . . . . . . . . . . 37
122	       8.13.1. Operation in PAUSED State  . . . . . . . . . . . . . . 37
123	       8.13.2. Transition Out of PAUSED State . . . . . . . . . . . . 38
124	     8.14. SHUTDOWN State . . . . . . . . . . . . . . . . . . . . . . 38
125	       8.14.1. Operation in SHUTDOWN State  . . . . . . . . . . . . . 38
126	       8.14.2. Transition Out of SHUTDOWN State . . . . . . . . . . . 38
127	   9.  Proposed extensions  . . . . . . . . . . . . . . . . . . . . . 38
128	     9.1.  Active-active mode . . . . . . . . . . . . . . . . . . . . 39
129	   10. Dynamic DNS Considerations . . . . . . . . . . . . . . . . . . 39
130	   11. Reservations and failover  . . . . . . . . . . . . . . . . . . 39
131	   12. Protocol entities  . . . . . . . . . . . . . . . . . . . . . . 39
132	     12.1. Failover Protocol  . . . . . . . . . . . . . . . . . . . . 40
133	     12.2. Protocol constants . . . . . . . . . . . . . . . . . . . . 40
134	   13. Open questions . . . . . . . . . . . . . . . . . . . . . . . . 40
135	   14. Security Considerations  . . . . . . . . . . . . . . . . . . . 40
136	   15. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 40
137	   16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 41
138	   17. References . . . . . . . . . . . . . . . . . . . . . . . . . . 41
139	     17.1. Normative References . . . . . . . . . . . . . . . . . . . 41
140	     17.2. Informative References . . . . . . . . . . . . . . . . . . 41
141	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 42

143	1.  Requirements Language

145	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
146	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
147	   document are to be interpreted as described in RFC 2119 [RFC2119].

149	2.  Glossary

151	   This is a supplemental glossary that should be combined with
152	   definitions in Section 3 of [requirements].

154	   o  Failover endpoint - The failover protocol allows for there to be a
155	      unique failover 'endpoint' per partner per role per relationship
156	      (where role is primary or secondary and the relationship is
157	      defined by the relationship-name).  This failover endpoint can
158	      take actions and hold unique states.  Typically, there is a one
159	      failover endpoint per partner (server), although there may be
160	      more.  'Server' and 'failover endpoint' are synonymous only if the
161	      server participates in only one failover relationship.  However,
162	      for the sake of simplicity 'Server' is used throughout the
163	      document to refer to a failover endpoint unless to do so would be
164	      confusing.

166	   o  Failover transmission - all messages exchanged between partners.

168	   o  Independent Allocation - a prefix allocation algorithm to split
169	      the available pool of resources between the primary and secondary
170	      servers that is particularly well suited for vast pools (i.e. when
171	      available resources are not expected to deplete).  See Section 6.2
172	      for details.

174	   o  Primary Server

176	   o  Proportional Allocation - a prefix allocation algorithm to split
177	      the available free leases between the primary and secondary
178	      servers that is particularly well suited for more limited
179	      resources.  See Section 6.1 for details.

181	   o  Resource - an IPv6 address or a IPv6 prefix.

183	   o  Responsive - A server that is responsive, will respond to DHCPv6
184	      client requests.

186	   o  Secondary Server

188	   o  Server - A DHCPv6 server that implements DHCPv6 failover.
189	      'Server' and 'failover endpoint' as synonymous only if server
190	      participates in only one failover relationship.

192	   o  Unresponsive - A server that is unresponsive will not respond to
193	      DHCPv6 client requests.

195	3.  Introduction

197	   The failover protocol design provides a means for cooperating DHCPv6
198	   servers to work together to provide a DHCPv6 service with
199	   availability that is increased beyond that which could be provided by
200	   a single DHCPv6 server operating alone.  It is designed to protect
201	   DHCPv6 clients against server unreachability, including server
202	   failure and network partition.  It is possible to deploy exactly two
203	   servers that are able to continue providing a lease on an IPv6
204	   address or on an IPv6 prefix without the DHCPv6 client experiencing
205	   lease expiration or a reassignment of a lease to a different IPv6
206	   address in the event of failure by one or the other of the two
207	   servers.

209	   This protocol defines active-passive mode, sometimes also called hot
210	   standby model.  This means that during normal operation one server is
211	   active (i.e. actively responds to clients' requests) while the second
212	   is passive (i.e. it does receive clients' requests, but does not
213	   respond to them and only maintains a copy of lease database and is
214	   ready to take over incoming queries in case of primary server
215	   failure).  Active-active mode (i.e. both servers actively handling
216	   clients' requests) is currently not supported for the sake of
217	   simplicity.  Such mode may be defined as an exension at a later time.

219	   The failover protocol is designed to provide lease stability for
220	   leases with lease times beyond a short period.  Due to the additional
221	   overhead required, failover is not suitable for leases shorter than
222	   30 seconds.  The DHCPv6 Failover protocol MUST NOT be used for leases
223	   shorter than 30 seconds.

225	   This design attempts to fulfill all DHCPv6 failover requirements
226	   defined in [requirements].

228	3.1.  Additional Requirements

230	   The following requirements are not related to failover mechanism in
231	   general, but rather to this particular design.

233	   1.  Minimize Asymmetry - while there are two distinct roles in
234	       failover (primary and secondary server), the differences between
235	       those two roles should be as small as possible.  This will yield
236	       a simpler design as well as a simpler implementation of that
237	       design.

239	3.2.  Features out of Scope: Load Balancing

241	   It may be tempting to extend DHCPv6 failover mechanism to also offer
242	   load balancing, as DHCPv4 failover did.  Here is the reasoning for
243	   this decision.  In general case (not related to failover) load
244	   balancing solutions are used when each server is not able to handle
245	   total incoming traffic.  However, by the very definition, DHCPv6
246	   failover is supposed to assume service availability despite failure
247	   of one server.  That leads to conclusion that each server must be
248	   able to handle whole traffic.  Therefore in properly provisioned
249	   setup, load balancing is not needed.

251	4.  Protocol Overview

253	   The DHCPv6 Failover Protocol is defined as a communication between
254	   failover partners with all associated algorithms and mechanisms.
255	   Failover communication is conducted over a TCP connection established
256	   between the partners.  The protocol reuses the framing format
257	   specified in Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460], but
258	   uses different message types.  Additional failover-specific message
259	   types will be defined.  All information is sent over the connection
260	   as typical DHCPv6 Options, following format defined in Section 22.1
261	   of [RFC3315].

263	   After initialization, the primary server establishes a TCP connection
264	   with its partner.  The primary server sends a CONNECT message with
265	   initial parameters.  Secondary server responds with CONNECTACK.

267	   Depending on the failover state of each partner, they MUST initiate
268	   one of the binding update procedures.  Each server MAY send an UPDREQ
269	   message to request its partner to send all updates that have not been
270	   sent yet (this case applies when partner has an existing database and
271	   wants to update it).  Alternatively, a server MAY choose to send an
272	   UPDREQALL message to request a full lease database transmission
273	   including all leases (this case applies in case of booting up new
274	   server after installation, corruption or complete loss of database,
275	   or other catastrophic failure).

277	   Servers exchange lease information by using BNDUPD messages.
278	   Depending on local and remote state of a lease, a server may either
279	   accept or reject the update.  Reception of lease update information
280	   is confirmed by responding with BNDACK message with appropriate
281	   status.  The majority of the messages sent over a failover TCP
282	   connection consists of BNDUPD and BNDACK messages.

284	   A subset of available resources (addresses or prefixes) is reserved
285	   for secondary server use.  This is required for handling a case where
286	   both servers are able to communicate with clients, but unable to
287	   communicate with each other.  After initial connection is
288	   established, the secondary server requests a pool of available
289	   addresses by sending a POOLREQ message.  The primary server assigns a
290	   pool to the secondary by transmitting a POOLRESP message and then
291	   sending a series of BNDUPD messages.  The secondary server may
292	   initiate such pool request at any time when maintaining communication
293	   with primary server.

295	   Failover servers use a lazy update mechanism to update their failover
296	   partner about changes to their lease state database.  After a server
297	   performs any modifications to its lease state database (assign a new
298	   lease, extend an existing one, release or expire a lease), it sends
299	   its response to the client's request first (performing the "regular"
300	   DHCPv6 operation) and then informs its failover partner using a
301	   BNDUPD message.  This BNDUPD message SHOULD be sent soon after the
302	   response is sent to the DHCPv6 client, but there is no specific
303	   requirement of a minimum time in which to do so.

305	   The major problem with lazy update mechanism is the case when the
306	   server crashes after sending response to client, but before sending
307	   the lazy update to its partner (or when communication between
308	   partners is interrupted).  To solve this problem, concept known as
309	   the Maximum Client Lead Time (MCLT) (initially designed for DHCPv4
310	   failover) is used.  The MCLT is the maximum amount of time that one
311	   server can extend a lease for a client's binding beyond the time
312	   known by its failover partner.  See Section 7.4 for detailed
313	   desciption how MCLT affects assigned lease times.

315	   Servers verify each others availability by periodically exchanging
316	   CONTACT messages.  See Section 7.5 for discussion about detecting
317	   partner's unreachability.

319	   A server that is being shut down transmits a DISCONNECT message,
320	   closes the connection with its failover partner and stops operation.
321	   A Server SHOULD transmit any pending lease updates before
322	   transmitting DISCONNECT message.

324	4.1.  Failover Machine Sate Overview

326	   The following section provides simplified description of all states.
327	   For the sake of clarity and simplicity, it omits important details.
328	   For complete description, see Section 8.  In case of a disagreement
329	   between simplified and complete description, please follow Section 8.

331	   Each server may be in one of the well defines states.  In each state
332	   a server may be either responsive (responds to clients' queries) or
333	   unresponsive (clients' queries are ignored).

335	   A server starts its operation in short-lived STARTUP state.  A server
336	   determines its partner reachibility and state and usually returns
337	   back to the state it was in before shutdown.

339	   During typical operation when servers maintain communication, both
340	   are in NORMAL state.  In that state only primary responds to clients'
341	   requests.  A secondary server in unresponsive.

343	   If a server discovers that its partner is no longer reachable, it
344	   goes to COMMUNICATIONS-INTERRUPTED state.  Server must be extra
345	   cautious as it can't distingush if its partner is down or just
346	   communication between servers is interrupted.  Since communication
347	   between partners is not possible, a server must act on the assumtion
348	   that if its partner is up, it follows defined procedure.  In
349	   particular, not extend any lease beyond its partner knowledge by at
350	   most MCLT.  That imposes additional burden on the server.  Therefore
351	   it is not recommended to operate for prolonged periods in this state.
352	   Once communication is reestablished, server may go into NORMAL,
353	   POTENTIAL-CONFLICT or PARTNER-DOWN state.  It may also stay in
354	   COMMUNICATIONS-INTERRUPTED if certain conditions are met.

356	   Once a server is switched into PARTNER-DOWN (when auto-partner-down
357	   is used or as a result of administrative action), it can extend
358	   leases, regardless of the original server that initially granted the
359	   lease.  In that state server handles leases from its own pool, but is
360	   albo able to serve pool from its downed partner.  MCLT restrictions
361	   no longer apply.  Operation in this mode is less demanding for the
362	   server that remains operational, than in COMMUNICATIONS-INTERRUPTED
363	   state, but PARTNER-DOWN does not offer any kind of redundancy.

365	   When server loses its database (e.g. due to first time run or
366	   catastrophic failure) or detects that is partner is in PARTNER-DOWN
367	   state and additional conditions are met, it switches to RECOVER
368	   state.  In that state server acknowledges that content of its
369	   database is doubtful and needs to refresh its database from its
370	   partner.  Once this operation is done, it switches to RECOVER-WAIT
371	   and later to RECOVER-DONE.

373	   Once servers reestablish connection, they discover each others'
374	   state.  Depending on the conditions, they may return to NORMAL or
375	   move to POTENTINAL-CONFLICT in case of unexpected partner's state.
376	   It is a goal of this protocol to minimize the possibility that
377	   POTENTIAL-CONFLICT state is ever entered.  Servers running in
378	   POTENTIAL-CONFLICT do not respond to clients' requests and work on
379	   resolving potential conflicts.  Once outstanding lease updates are
380	   exchanged, servers move to CONFLICT-DONE or NORMAL states.

382	   Servers that are recovering from potential conflict and loose
383	   communication, switch to RESOLUTION-INTERRUPTED.

385	   Server that is being shut down, switches briefly to SHUTDOWN state
386	   and communicates its state to its partner before actual termination.

388	5.  Connection Management

390	5.1.  Creating Connections

392	   Every server implementing the failover protocol SHOULD attempt to
393	   connect to all of its partners periodically, where the period is
394	   implementation dependent and SHOULD be configurable.  In the event
395	   that a connection has been rejected by a CONNECTACK message with a
396	   reject-reason option contained in it or a DISCONNECT message, a
397	   server SHOULD reduce the frequency with which it attempts to connect
398	   to that server but it SHOULD continue to attempt to connect
399	   periodically.

401	   When a connection attempt succeeds, if the server generating the
402	   connection attempt is a primary server for that relationship, then it
403	   MUST send a CONNECT message down the connection.  If it is not a
404	   primary server for the relationship, then it MUST just drop the
405	   connection and wait for the primary server to connect to it.

407	   When a connection attempt is received, the only information that the
408	   receiving server has is the IP address of the partner initiating a
409	   connection.  It also knows whether it has the primary role for any
410	   failover relationships with the connecting server.  If it has any
411	   relationships for which it is a primary server, it should initiate a
412	   connection of its own to the partner server, one for each primary
413	   relationship it has with that server.

415	   If it has any relationships with the connecting server for which it
416	   is a seconary server, it should just await the CONNECT message to
417	   determine which relationship this connection is to serve.

419	   If it has no secondary relationships with the connecting server, it
420	   SHOULD drop the connection.

422	   To summarize -- a primary server MUST use a connection that it has
423	   initiated in order to send a CONNECT message.  Every server that is a
424	   secondary server in a relationship attempts to create a connection to
425	   the server which is primary in the relationship, but that connection
426	   is only used to stimulate the primary server into recognizing that
427	   the secondary server is ready for operation.  The reason behind this
428	   is that the secondary server has no way to communicate to the primary
429	   server which relationship a connection is designed to serve.

431	   A server which has multiple secondary relationships with a primary
432	   server SHOULD only send one stimulus connection attempt to the
433	   primary server.

435	   Once a connection is established, the primary server MUST send a
436	   CONNECT message across the connection.  A secondary server MUST wait
437	   for the CONNECT message from a primary server.  If the secondary
438	   server doesn't receive a CONNECT message from the primary server in
439	   an installation dependent amount of time, it MAY drop the connection
440	   and send another stimulus connection attempt to the primary server.

442	   Every CONNECT message includes a TLS-request option, and if the
443	   CONNECTACK message does not reject the CONNECT message and the TLS-
444	   reply option says TLS MUST be used, then the servers will immediately
445	   enter into TLS negotiation.

447	   Once TLS negotiation is complete, the primary server MUST resend the
448	   CONNECT message on the newly secured TLS connection and then wait for
449	   the CONNECTACK message in response.  The TLS-request and TLS-reply
450	   options MUST NOT appear in either this second CONNECT or its
451	   associated CONNECTACK message as they had in the first messages.

453	   The second message sent over a new connection (either a bare TCP
454	   connection or a connection utilizing TLS) is a STATE message.  Upon
455	   the receipt of this message, the receiver can consider communications
456	   up.

458	   A secondary server MUST NOT respond to the closing of a TCP
459	   connection with a blind attempt to reconnect -- there may be another
460	   TCP connection to the same failover partner already in use.

462	5.2.  Endpoint Identification

464	   The proper operation of the failover protocol requires more than the
465	   transmission of messages between one server and the other.  Each
466	   endpoint might seem to be a single DHCPv6 server, but in fact there
467	   are situations where additional flexibility in configuration is
468	   useful.  A failover endpoint is always associated with a set of
469	   DHCPv6 prefixes that are configured on the DHCPv6 server where the
470	   endpoint appears.  A DHCPv6 prefix MUST NOT be associated with more
471	   than one failover endpoint.

473	   The failover protocol SHOULD be configured with one failover
474	   relationship between each pair of failover servers.  In this case
475	   there is one failover endpoint for that relationship on each failover
476	   partner.  This failover relationship MUST have a unique name.

478	   There is typically little need for addtional relationships between
479	   any two servers but there MAY be more than one failover relationship
480	   between two servers -- however each MUST have a unique relationship
481	   name.

483	   Any failover endpoint can take actions and hold unique states.

485	   This document frequently describes the behavior of the protocol in
486	   terms of primary and secondary servers, not primary and secondary
487	   failover endpoints.  However, it is important to remember that every
488	   'server' described in this document is in reality a failover endpoint
489	   that resides in a particular process, and that several failover end-
490	   points may reside in the same server process.

492	   It is not the case that there is a unique failover endpoint for each
493	   prefix that participates in a failover relationship.  On one server,
494	   there is (typically) one failover endpoint per partner, regardless of
495	   how many prefixes are managed by that combination of partner and
496	   role.  Conversely, on a particular server, any given prefix will be
497	   associated with exactly one failover endpoint.

499	   When a connection is received from the partner, the unique failover
500	   endpoint to which the message is directed is determined solely by the
501	   IP address of the partner, the relationship-name, and the role of the
502	   receiving server.
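The endpoint selection rule can be sketched in Python as a table keyed by the three values named above. This is a minimal illustration under assumptions: `EndpointKey`, `EndpointTable` and the string endpoint handles are hypothetical, since this document defines no data structures. The table also rejects a prefix that is already associated with another endpoint, mirroring the rule stated earlier in this section.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class EndpointKey(NamedTuple):
    partner_ip: str         # IP address of the connecting partner
    relationship_name: str  # unique failover relationship name
    role: str               # role of the receiving server: "primary" or "secondary"


class EndpointTable:
    """Maps incoming connections to failover endpoints (hypothetical helper)."""

    def __init__(self) -> None:
        self._endpoints: Dict[EndpointKey, str] = {}
        self._prefix_owner: Dict[str, EndpointKey] = {}

    def add(self, key: EndpointKey, endpoint: str, prefixes: Iterable[str]) -> None:
        prefixes = list(prefixes)
        if key in self._endpoints:
            raise ValueError(f"endpoint for {key} already configured")
        # A DHCPv6 prefix MUST NOT be associated with more than one endpoint.
        for prefix in prefixes:
            if prefix in self._prefix_owner:
                raise ValueError(f"{prefix} already belongs to another endpoint")
        self._endpoints[key] = endpoint
        for prefix in prefixes:
            self._prefix_owner[prefix] = key

    def lookup(self, partner_ip: str, relationship_name: str,
               role: str) -> Optional[str]:
        # Only these three values select the endpoint.
        return self._endpoints.get(EndpointKey(partner_ip, relationship_name, role))
```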

504	6.  Resource Allocation

506	   Currently there are two allocation algorithms defined for resources
507	   (addresses or prefixes).  Additional allocation schemes may be
508	   defined as future extensions.

510	   1.  Proportional Allocation - This allocation algorithm is a direct
511	       application of the algorithm defined in [dhcpv4-failover] to DHCPv6.
512	       Available resources are split between the primary and secondary
513	       servers.  Released resources are always returned to the primary
514	       server.  The primary and secondary servers may initiate a
515	       rebalancing procedure when the disparity between the resources
516	       available to each server reaches a preconfigured threshold.  Only
517	       resources that are not leased to any client are "owned" by one of
518	       the servers.  This algorithm is particularly well suited for
519	       scenarios where the amount of available resources is limited, as
520	       may be the case for prefix delegation.  See Section 6.1 for details.

522	   2.  Independent Allocation - This allocation algorithm also assumes
523	       that available resources are split between the primary and
524	       secondary servers.  In this case, however, resources are assigned
525	       to a specific server for all time, regardless of whether they are
526	       available or currently leased.  This algorithm is much simpler
527	       than proportional allocation, because resource imbalance does not
528	       have to be checked and there is no rebalancing for independent
529	       allocation.  This algorithm is particularly well suited for
530	       scenarios where there is an abundance of available resources,
531	       which is typically the case for DHCPv6 address allocation.  See
532	       Section 6.2 for details.

534	6.1.  Proportional Allocation

536	   In this allocation scheme, each server has its own pool of available
537	   resources.  Note that a resource is not "owned" by a particular
538	   server throughout its entire lifetime.  Only a resource which is
539	   available is "owned" by a particular server -- once it has been
540	   leased to a client, it is not owned by either failover partner.  When
541	   it finally becomes available again, it will be owned initially by the
542	   primary server, and it may or may not be allocated to the secondary
543	   server by the primary server.

545	   So, the flow of a resource is as follows: initially a resource is
546	   owned by the primary server.  It may be allocated to the secondary
547	   server if it is available, and then it is owned by the secondary
548	   server.  Either server can allocate available resources which it
549	   owns to clients, in which case it ceases to own them.  When the
550	   client releases the resource or the lease on it expires, it will
551	   again become available and will be owned by the primary.
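The ownership flow just described can be modelled as a small state table. The following Python sketch assumes resources are identified by strings and that a rebalancing step is represented by a single `give_to_secondary()` call; the names are hypothetical and not part of any defined message exchange.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Owner(Enum):
    PRIMARY = "primary"      # available, owned by the primary
    SECONDARY = "secondary"  # available, allocated to the secondary
    LEASED = "leased"        # leased to a client, owned by neither partner


class ProportionalPool:
    """Tracks who owns each resource under proportional allocation."""

    def __init__(self, resources: Iterable[str]) -> None:
        # Initially every resource is owned by the primary.
        self.owner: Dict[str, Owner] = {r: Owner.PRIMARY for r in resources}

    def give_to_secondary(self, resource: str) -> None:
        # Only an available resource owned by the primary can be given away.
        if self.owner[resource] is not Owner.PRIMARY:
            raise ValueError(f"{resource} is not available on the primary")
        self.owner[resource] = Owner.SECONDARY

    def lease(self, server: Owner, resource: str) -> None:
        # A server may lease only available resources that it owns.
        if self.owner[resource] is not server:
            raise ValueError(f"{resource} is not owned by {server.value}")
        self.owner[resource] = Owner.LEASED

    def release(self, resource: str) -> None:
        # Released or expired resources always go back to the primary.
        if self.owner[resource] is not Owner.LEASED:
            raise ValueError(f"{resource} is not leased")
        self.owner[resource] = Owner.PRIMARY

    def available(self, server: Owner) -> List[str]:
        return sorted(r for r, o in self.owner.items() if o is server)
```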

553	   A resource will not become owned by the server which allocated it
554	   initially when it is released or the lease expires because, in
555	   general, that server will have had to replenish its pool of available
556	   resources well in advance of any likely lease expirations.  Thus,
557	   having a particular resource cycle back to the secondary might well
558	   put the secondary more out of balance with respect to the primary
559	   instead of enhancing the balance of available addresses or prefixes
560	   between them.

562	   TODO: Need to rework this v4-specific vocabulary to v6, once we
563	   decide how things will look in v6.

565	   When they are used, these proportional pools are used for allocation
566	   in every state except PARTNER-DOWN.  In PARTNER-DOWN state a
567	   failover server can allocate from either pool.  The allocation and
568	   maintenance of these pools is an area of some sensitivity, since
569	   the goal is to maintain a more or less constant ratio of available
570	   addresses or prefixes between the two servers.
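The draft leaves the rebalancing details to [dhcpv4-failover]. Purely as an illustration, a threshold check on the ratio of available resources might look like the sketch below. The `secondary_share` and `threshold` parameters, and the choice to measure imbalance as a fraction of all available resources, are assumptions of this example rather than values or rules defined here.

```python
def rebalance_amount(primary_free: int, secondary_free: int,
                     secondary_share: float = 0.5,
                     threshold: float = 0.2) -> int:
    """Number of available resources to move toward the secondary.

    Positive: primary -> secondary; negative: secondary -> primary;
    zero: the imbalance is below the threshold, so no rebalancing.
    secondary_share and threshold are hypothetical configuration knobs.
    """
    total = primary_free + secondary_free
    if total == 0:
        return 0
    target = round(total * secondary_share)
    delta = target - secondary_free
    # Rebalance only when the deviation from the configured ratio,
    # as a fraction of all available resources, reaches the threshold.
    if abs(delta) / total < threshold:
        return 0
    return delta
```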

572	   TODO: Reuse rest of the description from section 5.4 from
573	   [dhcpv4-failover] here.

575	6.2.  Independent Allocation

577	   In this allocation scheme, available resources are split between the
578	   primary and secondary servers as part of initial connection
579	   establishment.  Once resources are allocated to each server, there
580	   is no need to reassign them.  This algorithm is simpler than
581	   proportional allocation, since it requires only the initial split
582	   and no rebalancing mechanism, but it assumes that the pool assigned
583	   to each server will never be depleted.  That is often a reasonable
584	   assumption for IPv6 addresses (e.g., servers are often assigned a
585	   /64 pool that contains many more addresses than there are
586	   electronic devices on Earth).  This allocation mechanism SHOULD be
587	   used for IPv6 addresses, unless the configured address pool is
588	   small or is otherwise administratively limited.
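The document does not say how the pool is divided. Assuming an even split, Python's standard `ipaddress` module can divide a /64 pool into two /65 halves, one per server; the half-and-half policy and the `split_pool` name are hypothetical.

```python
import ipaddress
from typing import Tuple


def split_pool(pool: str) -> Tuple[ipaddress.IPv6Network, ipaddress.IPv6Network]:
    """Split an IPv6 pool into two equal halves (hypothetical policy)."""
    net = ipaddress.IPv6Network(pool)
    # subnets(prefixlen_diff=1) yields exactly two halves, lowest first.
    primary_half, secondary_half = net.subnets(prefixlen_diff=1)
    return primary_half, secondary_half
```

Each half of a /64 still holds 2^63 addresses, which is why depletion is not a practical concern in the common case.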

591	   Once each server is assigned a resource pool during initial
592	   connection establishment, it may allocate the assigned resources to
593	   clients.  Once a client releases a resource or its lease expires,
594	   the resource returns to the pool of the same server.  Resources
595	   never change servers.

597	   In COMMUNICATIONS-INTERRUPTED state, a partner MAY continue
598	   extending existing leases when requested by clients.  A healthy
599	   partner MUST NOT lease resources that were assigned to its downed
600	   partner and later released by a client unless it is in PARTNER-DOWN
601	   state.
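The independent allocation rules above reduce to a simple check. In this Python sketch (class and method names hypothetical), a resource's owner is fixed at pool-split time, release never changes it, and a server may give a partner-owned resource to a new client only in PARTNER-DOWN state. Renewals of existing leases and the PARTNER-DOWN timing constraints in Section 8.4.1 are not modelled.

```python
from typing import Dict, Set


class IndependentPool:
    """Independent allocation: each resource has a fixed owner."""

    def __init__(self, owners: Dict[str, str]) -> None:
        self.owners = dict(owners)    # resource -> "primary" or "secondary"
        self.leased: Set[str] = set()

    def may_allocate(self, server: str, resource: str, state: str) -> bool:
        """May `server` lease `resource` to a new client in `state`?"""
        if resource in self.leased:
            return False
        if self.owners[resource] == server:
            return True
        # The partner's resources (including ones a client released) are
        # off limits unless this server is in PARTNER-DOWN state.
        return state == "PARTNER-DOWN"

    def allocate(self, resource: str) -> None:
        self.leased.add(resource)

    def release(self, resource: str) -> None:
        # Release or expiry frees the resource but never changes its owner.
        self.leased.discard(resource)
```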

603	6.3.  Determining Allocation Approach

605	6.3.1.  IPv6 Addresses

607	6.3.2.  IPv6 Prefixes

609	7.  Failover Mechanisms

611	   This section gives an overview of the communication between
612	   partners and of other mechanisms required for failover operation.
613	   As this is a design document, not a protocol specification,
614	   high-level ideas are presented without implementation-specific
615	   details (e.g., on-wire formats are not defined).  Implementation
616	   details will be specified in a separate draft.

618	7.1.  Time Skew

620	   Partners exchange information about known lease states.  To reliably
621	   compare a known lease state with an update received from a partner,
622	   servers must be able to reliably compare the times stored in the
623	   known lease state with the times received in the update.  Although a
624	   simple approach would be to require both partners to use synchronized
625	   time, e.g., by using NTP, such a service may become unavailable in
626	   some scenarios that failover expects to cover, e.g., a network
627	   partition.  Therefore a mechanism to measure and track relative time
628	   differences between servers is necessary.  To do so, each message
629	   MUST contain an FO_TIMESTAMP option that contains the timestamp of
630	   the transmission in the time context of the transmitter.  The
631	   transmitting server MUST set this as close to the actual transmission
632	   as possible.  The receiving partner MUST record its own timestamp of
633	   the reception event as close to the actual reception as possible.
634	   The received timestamp is then compared with the local timestamp.

636	   To account for packet delay variation (jitter), the measured
637	   difference is not used directly; instead, the moving average of the
638	   time differences measured for the last TIME_SKEW_PKTS_AVG packets is
639	   calculated.  This averaged value is referred to as the time skew.
640	   Note that the time skew algorithm allows cooperation between servers
641	   with completely desynchronized clocks as well as between servers
642	   whose desynchronization itself is not constant.
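A minimal Python sketch of the time skew estimate, assuming the window size TIME_SKEW_PKTS_AVG is passed in (its value is not defined here) and that timestamps have already been converted to plain seconds. The class name is hypothetical, and modulo-2^32 wraparound is ignored for brevity.

```python
from collections import deque
from typing import Deque, Optional


class TimeSkewEstimator:
    """Moving-average time skew to one partner (a sketch)."""

    def __init__(self, time_skew_pkts_avg: int) -> None:
        # Keep only the last TIME_SKEW_PKTS_AVG samples.
        self.samples: Deque[float] = deque(maxlen=time_skew_pkts_avg)

    def on_message(self, fo_timestamp: float, local_rx_time: float) -> None:
        # Sample = local reception time minus the partner's transmission
        # time; it includes one-way transit delay as well as clock offset.
        self.samples.append(local_rx_time - fo_timestamp)

    @property
    def skew(self) -> Optional[float]:
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

    def to_local(self, partner_time: float) -> float:
        """Translate a time from the partner's clock into local time."""
        skew = self.skew
        if skew is None:
            raise RuntimeError("no FO_TIMESTAMP samples received yet")
        return partner_time + skew
```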

644	7.2.  Time expression

646	   Timestamps are expressed as number of seconds since midnight (UTC),
647	   January 1, 2000, modulo 2^32.  Note that this is the same approach as
648	   used in the creation of DUID-LLT (see Section 9.2 of [RFC3315]).

650	   Time differences are expressed in seconds and are signed.
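The time expression can be sketched as follows. Interpreting a signed difference between two modulo-2^32 timestamps as the shortest distance around the wrap (in the style of serial number arithmetic) is an assumption of this sketch, valid only while the true difference stays below 2^31 seconds; the function names are hypothetical.

```python
from datetime import datetime, timezone

# Reference point: midnight UTC, January 1, 2000 (as for DUID-LLT).
FO_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)
MOD = 2 ** 32


def fo_timestamp(t: datetime) -> int:
    """Seconds since FO_EPOCH, modulo 2^32.  `t` must be timezone-aware."""
    return int((t - FO_EPOCH).total_seconds()) % MOD


def fo_time_diff(a: int, b: int) -> int:
    """Signed difference a - b in seconds, assuming |a - b| < 2^31."""
    d = (a - b) % MOD
    return d - MOD if d >= 2 ** 31 else d
```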

652	7.3.  Lazy updates

654	   Lazy update refers to the requirement placed on a server implementing
655	   a failover protocol to update its failover partner whenever the
656	   binding database changes.  A failover protocol which didn't support
657	   lazy update would require the failover partner update to complete
658	   before a DHCPv6 server could respond to a DHCPv6 client request.  The
659	   lazy update mechanism allows a server to allocate a new or extend an
660	   existing lease and then update its failover partner as time permits.
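The essence of lazy update is ordering: reply to the client first, then inform the partner. The Python sketch below assumes a per-lease flag marking that the lease must still be sent to the partner (such a flag is mentioned in Section 7.8.1); `LazyUpdater`, `send_reply` and `send_bndupd` are hypothetical, and BNDACK handling and retransmission are omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass
class Lease:
    resource: str
    valid_lifetime: int
    pending_update: bool = False  # "must be sent to the partner" flag


class LazyUpdater:
    """Reply to the client first; update the partner later."""

    def __init__(self) -> None:
        self.queue: Deque[Lease] = deque()

    def on_client_reply(self, lease: Lease,
                        send_reply: Callable[[Lease], None]) -> None:
        send_reply(lease)              # no waiting for the partner
        if not lease.pending_update:   # queue each lease at most once
            lease.pending_update = True
            self.queue.append(lease)

    def flush(self, send_bndupd: Callable[[Lease], None]) -> None:
        """Called whenever time permits, e.g. from an idle loop."""
        while self.queue:
            lease = self.queue.popleft()
            lease.pending_update = False   # a later change re-queues it
            send_bndupd(lease)
```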

662	   Although the lazy update mechanism does not introduce additional
663	   delays in server response times, it introduces other difficulties.
664	   The key problem with lazy update is that when a server fails after
665	   updating a client with a particular lease time and before updating
666	   its partner, the partner will believe that a lease has expired even
667	   though the client still retains a valid lease on that address or
668	   prefix.

670	7.4.  MCLT concept

672	   In order to handle the problem introduced by lazy updates (see
673	   Section 7.3), a period of time known as the "Maximum Client Lead
674	   Time" (MCLT) is defined and must be known to both the primary and
675	   secondary servers.  Proper use of this time interval places an upper
676	   bound on the difference allowed between the lease time provided to a
677	   DHCPv6 client by a server and the lease time known by that server's
678	   failover partner.

680	   The MCLT is typically much less than the lease time that a server has
681	   been configured to offer a client, and so some strategy must exist to
682	   allow a server to offer the configured lease time to a client.
683	   During a lazy update the updating server typically updates its
684	   partner with a potential expiration time which is longer than the
685	   lease time previously given to the client and which is longer than
686	   the lease time that the server has been configured to give a client.
687	   This allows that server to give a longer lease time to the client the
688	   next time the client renews its lease, since the time that it will
689	   give to the client will not exceed the MCLT beyond the potential
690	   expiration time acknowledged by its partner.

692	   The fundamental relationship on which much of the correctness of this
693	   protocol depends is that the lease expiration time known to a DHCPv6
694	   client MUST NOT under any circumstances be more than the maximum
695	   client lead time (MCLT) greater than the potential expiration time
696	   known to a server's partner.
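Written as formulas, with t the current time, E_client the lease expiration time known to the client, E_ack the potential expiration time most recently acknowledged by the partner in a BNDACK, and L_desired the desired valid lifetime, the fundamental relationship and the lifetime rule applied in the example of Section 7.4.1 read as follows. When E_ack is not earlier than t, granting L_actual(t) keeps E_client = t + L_actual(t) within the bound; the max(0, ...) term covers a lease the partner has not yet acknowledged, for which only the MCLT may be granted.

```latex
\begin{aligned}
E_{\mathrm{client}} &\le E_{\mathrm{ack}} + \mathrm{MCLT} \\
L_{\mathrm{actual}}(t) &= \min\bigl(L_{\mathrm{desired}},\;
  \max(0,\, E_{\mathrm{ack}} - t) + \mathrm{MCLT}\bigr)
\end{aligned}
```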

698	   The remainder of this section makes the above fundamental
699	   relationship more explicit.

701	   This protocol requires a DHCPv6 server to deal with several different
702	   lease intervals and places specific restrictions on their
703	   relationships.  The purpose of these restrictions is to allow the
704	   other server in the pair to be able to make certain assumptions in
705	   the absence of an ability to communicate between servers.

707	   The different times are:

709	   desired valid lifetime:
710	      The desired valid lifetime is the lease interval that a DHCPv6
711	      server would like to give to a DHCPv6 client in the absence of any
712	      restrictions imposed by the failover protocol.  Its determination
713	      is outside of the scope of this protocol.  Typically this is the
714	      result of external configuration of a DHCPv6 server.

716	   actual valid lifetime:
717	      The actual valid lifetime is the lease interval that a DHCPv6
718	      server gives out to a DHCPv6 client.  It may be shorter than the
719	      desired valid lifetime (as explained below).

721	   potential valid lifetime:
722	      The potential valid lifetime is the potential lease expiration
723	      interval the local server sends to its partner in a BNDUPD
724	      message.

726	   acknowledged potential valid lifetime:
727	      The acknowledged potential valid lifetime is the potential lease
728	      interval the partner server has most recently acknowledged in a
729	      BNDACK message.

731	7.4.1.  MCLT example

733	   The following example demonstrates the MCLT concept in practice.  The
734	   values used are arbitrarily chosen and are not a recommendation for
735	   actual values.  The MCLT in this case is 1 hour.  The desired valid
736	   lifetime is 3 days, and its renewal time is half the valid lifetime.

738	   When a server makes an offer for a new lease on an IP address to a
739	   DHCPv6 client, it determines the desired valid lifetime (in this
740	   case, 3 days).  It then examines the acknowledged potential valid
741	   lifetime (which in this case is zero) and determines the remainder of
742	   the time left to run, which is also zero.  To this it adds the MCLT.
743	   Since the actual valid lifetime cannot be allowed to exceed the
744	   remainder of the current acknowledged potential valid lifetime plus
745	   the MCLT, the offer made to the client is for the remainder of the
746	   current acknowledged potential valid lifetime (i.e., zero) plus the
747	   MCLT.  Thus, the actual valid lifetime is 1 hour.

749	   Once the server has sent the REPLY to the DHCPv6 client, it will
750	   update its failover partner with the lease information.  The desired
751	   potential valid lifetime is composed of one half of the current
752	   actual valid lifetime added to the desired valid lifetime.
753	   Thus, the failover partner is updated with a BNDUPD with a potential
754	   valid lifetime of 3 days + 1/2 hour.

756	   When the primary server receives a BNDACK to its update of the
757	   secondary server's (partner's) potential valid lifetime, it records
758	   that as the acknowledged potential valid lifetime.  A server MUST NOT
759	   send a BNDACK in response to a BNDUPD message until it is sure that
760	   the information in the BNDUPD message has been updated in its lease
761	   database.  Thus, the primary server in this case can be sure that the
762	   secondary server has recorded the potential lease interval in its
763	   stable storage when the primary server receives a BNDACK message from
764	   the secondary server.

766	   When the DHCPv6 client attempts to renew at T1 (approximately half
767	   an hour from the start of the lease), the primary server again
768	   determines the desired valid lifetime, which is still 3 days.  It
769	   then compares this with the remaining acknowledged potential valid
770	   lifetime (3 days + 1/2 hour) and adjusts for the time passed since
771	   the secondary was last updated (1/2 hour).  Thus the time remaining
772	   of the acknowledged potential valid interval is 3 days.  Adding the
773	   MCLT to this yields 3 days plus 1 hour, which is more than the
774	   desired valid lifetime of 3 days.  So the client is renewed for the
775	   desired valid lifetime -- 3 days.

777	   When the primary DHCPv6 server updates the secondary DHCPv6 server
778	   after the DHCPv6 client's renewal REPLY is complete, it will
779	   calculate the desired potential valid lifetime as the T1 fraction of
780	   the actual client valid lifetime (1/2 of 3 days this time = 1.5
781	   days).  To this it will add the desired client valid lifetime of 3
782	   days, yielding a total desired potential valid lifetime of 4.5 days.
783	   In this way, the primary attempts to have the secondary always "lead"
784	   the client in its understanding of the client's valid lifetime so as
785	   to be able to always offer the client the desired client valid
786	   lifetime.

788	   Once the initial actual client valid lifetime of the MCLT is past,
789	   the protocol operates effectively like the DHCPv6 protocol does today
790	   in its behavior concerning valid lifetimes.  However, the guarantee
791	   that the actual client valid lifetime will never exceed the remaining
792	   acknowledged partner server potential valid lifetime by more than the
793	   MCLT allows full recovery from a variety of failures.
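The arithmetic of the example can be reproduced with two small functions. This Python sketch assumes that the partner acknowledges each BNDUPD immediately, so the acknowledged potential valid lifetime equals the one sent, and that T1 is half the valid lifetime as in the example; the function names are hypothetical.

```python
HOUR = 3600
DAY = 24 * HOUR


def actual_valid_lifetime(desired: int, acked_remaining: int, mclt: int) -> int:
    """Lifetime granted to the client: never more than the remainder of
    the partner-acknowledged potential valid lifetime plus the MCLT."""
    return min(desired, max(acked_remaining, 0) + mclt)


def potential_valid_lifetime(actual: int, desired: int, t1_divisor: int = 2) -> int:
    """Value sent in BNDUPD: the T1 fraction (1/t1_divisor, one half in
    the example) of the actual lifetime plus the desired lifetime."""
    return actual // t1_divisor + desired


def mclt_example():
    mclt, desired = HOUR, 3 * DAY
    first = actual_valid_lifetime(desired, 0, mclt)            # new lease
    first_potential = potential_valid_lifetime(first, desired)
    # The client renews at T1, half an hour after the partner was updated.
    remaining = first_potential - HOUR // 2
    renewed = actual_valid_lifetime(desired, remaining, mclt)
    second_potential = potential_valid_lifetime(renewed, desired)
    return first, first_potential, renewed, second_potential
```

The four values returned are 1 hour, 3 days + 1/2 hour, 3 days and 4.5 days, matching the walkthrough above.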

795	7.5.  Unreachability detection

797	   Each partner maintains an FO_SEND timer for each partner connection.
798	   The FO_SEND timer is reset every time any message is transmitted.  If
799	   the timer reaches the FO_SEND_MAX value, a CONTACT message is
800	   transmitted and the timer is reset.  The CONTACT message may be
801	   transmitted at any time.
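A sketch of the FO_SEND timer, assuming the caller supplies the current time and polls periodically; the value of FO_SEND_MAX is not defined in this document, and `ContactTimer` is a hypothetical name.

```python
from typing import Callable


class ContactTimer:
    """FO_SEND timer for one partner connection (sketch)."""

    def __init__(self, fo_send_max: float, now: float) -> None:
        self.fo_send_max = fo_send_max
        self.last_sent = now

    def on_message_sent(self, now: float) -> None:
        # Any transmitted message resets the FO_SEND timer.
        self.last_sent = now

    def poll(self, now: float, send_contact: Callable[[], None]) -> bool:
        """Send CONTACT once FO_SEND reaches FO_SEND_MAX; report whether it did."""
        if now - self.last_sent >= self.fo_send_max:
            send_contact()
            self.on_message_sent(now)
            return True
        return False
```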

803	   Discussion: Perhaps it would be more reasonable to use an echo-reply
804	   approach rather than periodic transmissions?

806	7.6.  Re-allocating Leases

808	   TODO: Describe controlled re-allocation of released/expired leases to
809	   different clients.

811	7.7.  Sending Data

813	   Each server updates its failover partner about recent changes in
814	   lease states.  Each update must include the following information:

816	   1.  resource type - non-temporary address or a prefix

818	   2.  resource information - actual address or prefix

820	   3.  valid lifetime requested by the client

822	   4.  IAID - identifier of the IA used by the client when obtaining
823	       this lease.  (Note 1: one client may use many IAIDs at once.
824	       Note 2: IA_NA, IA_TA and IA_PD IAIDs are separate number spaces.)

826	   5.  valid lifetime sent to the client

828	   6.  potential valid lifetime

830	   7.  preferred lifetime sent to the client

832	   8.  CLTT - Client Last Transaction Time, a timestamp of the last
833	       received transmission from a client

835	   9.  assigned FQDN names, if any (optional)

837	   Discussion: Do we need T1 as well?  Something like next expected
838	   client transmission?

840	   Q: Maybe we could reuse IA_NA and IA_PD options here?  Yes.

842	   Q: Do we care about preferred lifetime? (presumably no).  Certainly
843	   not what was requested by the client.

845	   Q: Do we care about IAID? (presumably yes) Yes.
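For illustration only, the fields listed above could be grouped into a single record. The field names, the use of seconds, and the two sanity checks in `__post_init__` are assumptions of this Python sketch; no on-wire format is implied. The IAID range check reflects the 4-octet IAID of [RFC3315].

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ResourceType(Enum):
    ADDRESS = "non-temporary address"
    PREFIX = "prefix"


@dataclass(frozen=True)
class LeaseUpdate:
    """Contents of one lease update (field names are hypothetical)."""
    resource_type: ResourceType
    resource: str                   # address or prefix
    requested_valid_lifetime: int   # seconds, as requested by the client
    iaid: int                       # IAID the client used for this lease
    valid_lifetime: int             # seconds, sent to the client
    potential_valid_lifetime: int   # seconds, sent to the partner
    preferred_lifetime: int         # seconds, sent to the client
    cltt: int                       # Client Last Transaction Time
    fqdns: Tuple[str, ...] = ()     # assigned FQDN names (optional)

    def __post_init__(self) -> None:
        # IAIDs are 4-octet values in DHCPv6.
        if not 0 <= self.iaid < 2 ** 32:
            raise ValueError("IAID must fit in 32 bits")
        if self.preferred_lifetime > self.valid_lifetime:
            raise ValueError("preferred lifetime exceeds valid lifetime")
```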

847	7.7.1.  Required Data

849	7.7.2.  Optional Data

851	7.8.  Receiving Data

853	7.8.1.  Conflict Resolution

855	   TODO: This is just a loose collection of notes.  This section will
856	   probably need to be rewritten as a flowchart of some kind.

858	   The server receiving a lease update from its partner must evaluate
859	   the received lease information to see if it is consistent with the
860	   already known state and decide which information - previously known
861	   or just received - is "better".  The server should take into
862	   consideration aspects such as whether the lease is already assigned
863	   to a specific client, which server had contact with the client most
864	   recently, and the start time of the lease.

866	   The lease update may be accepted or rejected.  Rejection SHOULD NOT
867	   change the flag in a lease that says that it should be transmitted to
868	   the failover partner.  If this flag is set, then it should be
869	   transmitted, but if it is not already set, the rejection of a lease
870	   state update SHOULD NOT trigger an automatic update of the failover
871	   partner sending the rejected update.  The potential for update storms
872	   is too great, and in the unusual case where the servers simply can't
873	   agree, that disagreement is better than an update storm.
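Because the acceptance criteria are still open, the following Python sketch uses a hypothetical rule: the lease state with the more recent CLTT wins, after translating the partner's CLTT into local time using the time skew of Section 7.1. What it does illustrate faithfully is the rejection behaviour described above: a rejected update neither changes the lease's pending-update flag nor triggers an update back to the partner. `KnownLease` and `handle_update` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class KnownLease:
    cltt: float                   # local time of last client contact
    pending_update: bool = False  # "send to partner" flag


def handle_update(db: Dict[str, KnownLease], resource: str,
                  partner_cltt: float, skew: float) -> bool:
    """Accept or reject a lease update; returns True if accepted.

    Hypothetical rule: the state with the more recent CLTT wins.
    `skew` is local time minus partner time.
    """
    received_cltt = partner_cltt + skew  # partner clock -> local clock
    known = db.get(resource)
    if known is None:
        db[resource] = KnownLease(cltt=received_cltt)
        return True
    if received_cltt > known.cltt:
        known.cltt = received_cltt       # pending_update is left unchanged
        return True
    # Rejected: leave pending_update as it is and send nothing back to
    # the partner, to avoid update storms.
    return False
```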

875	   Discussion: There will definitely be different types of update
876	   rejections.  For example, this will allow a server to treat a newly
877	   received lease that it has not seen before differently from an old
878	   version of a lease, sent by the partner, for which a newer state is
879	   already known.

881	7.8.2.  Acknowledging Reception

883	8.  Endpoint States

885	8.1.  State Machine Operation

887	   Each server (or, more accurately, failover endpoint) can take on a
888	   variety of failover states.  These states play a crucial role in
889	   determining the actions that a server will perform when processing a
890	   request from a DHCPv6 client as well as dealing with changing
891	   external conditions (e.g., loss of connection to a failover partner).

893	   The failover state in which a server is running controls the
894	   following behaviors:

896	   o  Responsiveness -- the server is either responsive to DHCPv6 client
897	      requests or it is not.

899	   o  Allocation Pool -- which pool of addresses (or prefixes) can be
900	      used for allocation on receipt of a SOLICIT message.

902	   o  MCLT -- whether valid lifetimes are limited to what the partner
903	      has acknowledged plus the MCLT.

905	   A server will transition from one failover state to another based on
906	   the specific values held by the following state variables:

908	   o  Current failover state.

910	   o  Communications status (OK or not OK).

912	   o  Partner's failover state (if known).

914	   Whenever either of the last two of the above state variables
915	   changes, the state machine is invoked, which may then trigger a
916	   change in the current failover state.  Thus, whenever the
917	   communications status changes, the state machine processing is
918	   invoked.  This may or may not result in a change in the current
919	   failover state.

921	   Whenever a server transitions to a new failover state, the new state
922	   MUST be communicated to its failover partner in a STATE message if
923	   the communications status is OK.  In addition, whenever a server
924	   makes a transition into a new state, it MUST record the new state,
925	   its current understanding of its partner's state, and the time at
926	   which it entered the new state in stable storage.

928	   The following state transition diagram gives a condensed view of the
929	   state machine.  If there is a difference between the words describing
930	   a particular state and the diagram below, the words should be
931	   considered authoritative.

933	   A transition into SHUTDOWN or PAUSED state is not represented in the
934	   following figure, since other than sending that state to its partner,
935	   the remaining actions involved look just like the server halting in
936	   its otherwise current state, which then becomes the previous state
937	   upon server restart.

939	        +---------------+  V  +--------------+
940	        |    RECOVER -|+|  |  |   STARTUP  - |
941	        |(unresponsive) |  +->+(unresponsive)|
942	        +------+--------+     +--------------+
943	        +-Comm. OK             +-----------------+
944	        |     Other State:     |  PARTNER DOWN - +<----------------------+
945	        |    RESOLUTION-INTER. | (responsive)    |                       ^
946	       All     POTENTIAL-      +----+------------+                       |
947	      Others   CONFLICT------------ | --------+                          |
948	        |      CONFLICT-DONE     Comm. OK     |     +--------------+     |
949	     UPDREQ or                 Other State:   |  +--+ RESOLUTION - |     |
950	     UPDREQALL                  |       |     |  |  | INTERRUPTED  |     |
951	     Rcv UPDDONE             RECOVER    All   |  |  | (responsive) |     |
952	        |  +---------------+    |      Others |  |  +------------+-+     |
953	        +->+RECOVER-WAIT +-| RECOVER    |     |  |         ^     |       |
954	           |(unresponsive) |  WAIT or   |     |  Comm.     |    Ext.     |
955	           +-----------+---+  DONE      |     |  OK     Comm.   Cmd----->+
956	    Comm.---+     Wait MCLT     |       V     V  V     Failed            |
957	    Changed |          V    +---+   +---+-----+--+-+       |             |
958	     |  +---+----------++   |       |  POTENTIAL + +-------+             |
959	     |  |RECOVER-DONE +-|  Wait     |  CONFLICT    +------+              |
960	     +->+(unresponsive) |  for      |(unresponsive)|   Primary           |
961	        +------+--------+  Other  +>+----+--------++   resolve     Comm. |
962	         Comm. OK          State: |      |        ^    conflict  Changed |
963	    +---Other State:-+   RECOVER  |   Secondary   |       V       V   |  |
964	    |    |           |     DONE   |    resolve    |   ++----------+---++ |
965	    | All Others:  POTENT.  |     |   conflict    |   |CONFLICT-DONE-|+| |
966	    | Wait for    CONFLICT- | ----+    see (9.10) |   | (responsive)   | |
967	    | Other State:          V            V        |   +------+---------+ |
968	    | NORMAL or RECOVER    ++------------+---+      Other State: NORMAL  |
969	    |    |       DONE      |     NORMAL    + +<--------------+           |
970	    |    +--+----------+-->+   (balanced)    +-------External Command--->+
971	    |       ^          ^   +--------+--------+       or Other State:     |
972	    |       |          |            |             |  SHUTDOWN            |
973	    |   Wait for   Comm. OK  Comm. Failed or      |                      |
974	    |    Other      Other    Other State: PAUSED  |               External
975	    |    State:     State:          |             |                Command
976	    | RECOVER-DONE  NORMAL     Start Safe      Comm. OK                or
977	    |       |     COMM. INT.  Period Timer    Other State:            Safe
978	    |    Comm. OK.     |            V          All Others           Period
979	    |   Other State:   |  +---------+--------+    |             expiration
980	    |     RECOVER      +--+ COMMUNICATIONS - +----+                      |
981	    |       +-------------+   INTERRUPTED    |                           |
982	    RECOVER               |  (responsive)    +-------------------------->+
983	    RECOVER-WAIT--------->+------------------+

985	                 Figure 1: Failover Endpoint State Machine

987	8.2.  State Machine Initialization

989	   TODO

991	8.3.  STARTUP State

993	   The STARTUP state affords an opportunity for a server to probe its
994	   partner server, before starting to service DHCP clients.  When in the
995	   STARTUP state, a server attempts to learn its partner's state and
996	   determine (using that information if it is available) what state it
997	   should enter.

999	   The STARTUP state is not shown with any specific state transitions in
1000	   the state machine diagram (Figure 1) because the processing during
1001	   the STARTUP state can cause the server to transition to any of the
1002	   other states, so that specific state transition arcs would only
1003	   obscure other information.

1005	8.3.1.  Operation in STARTUP State

1007	   The server MUST NOT be responsive in STARTUP state.

1009	   Whenever a STATE message is sent to the partner while in STARTUP
1010	   state, the STARTUP flag MUST be set in the message and the previously
1011	   recorded failover state MUST be placed in the server-state option.

1013	8.3.2.  Transition Out of STARTUP State

1015	   The following algorithm is followed every time the server initializes
1016	   itself, and enters STARTUP state.

1018	   Step 1:

1020	   If there is any record in stable storage of a previous failover state
1021	   for this server, set PREVIOUS-STATE to the last recorded value in
1022	   stable storage, and go to Step 2.

1024	   If there is no record of any previous failover state in stable
1025	   storage for this server, then set the PREVIOUS-STATE to RECOVER and
1026	   set the TIME-OF-FAILURE to 0.  This will allow two servers which
1027	   already have lease information to synchronize themselves prior to
1028	   operating.

1030	   In some cases, an existing server will be commissioned as a failover
1031	   server and brought back into operation when its partner is not yet
1032	   available.  In this case, the newly commissioned failover server will
1033	   not operate until its partner comes online -- but it has operational
1034	   responsibilities as a DHCP server nonetheless.  To properly handle
1035	   this situation, a server SHOULD be configurable in such a way as to
1036	   move directly into PARTNER-DOWN state after the startup period
1037	   expires if it has been unable to contact its partner during the
1038	   startup period.

1040	   Step 2:

1042	   If the previous state is one where communications were "OK", then
1043	   set the previous state to the state that is the result of the
1044	   communications failed state transition (if such a transition exists
1045	   -- some states don't have a communications failed state transition,
1046	   since they allow both communications OK and failed).

1048	   Step 3:

1050	   Start the STARTUP state timer.  The time that a server remains in the
1051	   STARTUP state (absent any communications with its partner) is
1052	   implementation dependent but SHOULD be short.  It SHOULD be long
1053	   enough for a TCP connection to be created to a heavily loaded partner
1054	   across a slow network.

1056	   Step 4:

1058	   Attempt to create a TCP connection to the failover partner.

1060	   Step 5:

1062	   Wait for "communications OK".

1064	   When and if communications become "okay", clear the STARTUP flag, and
1065	   set the current state to the PREVIOUS-STATE.

1067	   If the partner is in PARTNER-DOWN state, and if the time at which it
1068	   entered PARTNER-DOWN state (as received in the start-time-of-state
1069	   option in the STATE message) is later than the last recorded time of
1070	   operation of this server, then set CURRENT-STATE to RECOVER.  If the
1071	   time at which it entered PARTNER-DOWN state is earlier than the last
1072	   recorded time of operation of this server, then set CURRENT-STATE to
1073	   POTENTIAL-CONFLICT.

1075	   Then, transition to the current state and take the "communications
1076	   OK" state transition based on the current state of this server and
1077	   the partner.

1079	   Step 6:

1081	   If the startup time expires, the server SHOULD transition to the
1082	   PREVIOUS-STATE.
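The STARTUP algorithm can be summarized in Python as below. This is a sketch under assumptions: state names are plain strings, the Step 2 mapping lists only the two communications-failed arcs clearly visible in Figure 1 (the full mapping is not spelled out in this document), `partner_down_on_timeout` stands for the optional configuration described under Step 1, and the subsequent "communications OK" transition is left to the caller. Function names are hypothetical.

```python
from typing import Optional

# Step 2 needs each state's "communications failed" transition.  Only the
# two arcs clearly shown in Figure 1 are listed here (an assumption).
COMM_FAILED = {
    "NORMAL": "COMMUNICATIONS-INTERRUPTED",
    "POTENTIAL-CONFLICT": "RESOLUTION-INTERRUPTED",
}


def previous_state(recorded: Optional[str]) -> str:
    """Steps 1 and 2: PREVIOUS-STATE derived from stable storage."""
    if recorded is None:
        return "RECOVER"          # no record: synchronize before operating
    return COMM_FAILED.get(recorded, recorded)


def leave_startup(prev: str, comm_ok: bool, timer_expired: bool,
                  partner_state: Optional[str] = None,
                  partner_state_start: float = 0.0,
                  last_operation: float = 0.0,
                  partner_down_on_timeout: bool = False) -> Optional[str]:
    """Steps 5 and 6.  Returns the state to enter, or None to stay in STARTUP."""
    if comm_ok:
        current = prev
        if partner_state == "PARTNER-DOWN":
            if partner_state_start > last_operation:
                current = "RECOVER"
            elif partner_state_start < last_operation:
                current = "POTENTIAL-CONFLICT"
            # Equal times are not covered by the text; keep PREVIOUS-STATE.
        return current
    if timer_expired:
        # Optional configuration from Step 1: go straight to PARTNER-DOWN.
        return "PARTNER-DOWN" if partner_down_on_timeout else prev
    return None
```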

1084	8.4.  PARTNER-DOWN State

1086	   PARTNER-DOWN state is a state either server can enter.  When in this
1087	   state, the server assumes that it is the only server operating and
1088	   serving the client base.  If one server is in PARTNER-DOWN state, the
1089	   other server MUST NOT be operating.

1091	8.4.1.  Operation in PARTNER-DOWN State

1093	   The server MUST be responsive in PARTNER-DOWN state.

1095	   It will allow renewal of all outstanding leases on IP addresses.  For
1096	   those IP addresses for which the server is using proportional
1097	   allocation, it will allocate IP addresses from its own pool, and
1098	   after a fixed period of time (the MCLT interval) has elapsed from
1099	   entry into PARTNER-DOWN state, it will allocate IP addresses from the
1100	   set of all available IP addresses.

1102	   Any IP address tagged as available for allocation by the other server
1103	   (at entry to PARTNER-DOWN state) MUST NOT be allocated to a new
1104	   client until the maximum-client-lead-time beyond the entry into
1105	   PARTNER-DOWN state has elapsed.

1107	   A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
1108	   DHCP client different from that to which it was allocated at the
1109	   entrance to PARTNER-DOWN state until the maximum-client-lead-time
1110	   beyond the maximum of the following times: client expiration time,
1111	   most recently transmitted potential-expiration-time, most recently
1112	   received ack of potential-expiration-time from the partner, and most
1113	   recently acked potential-expiration-time to the partner.  See section
1114	   7.1.5 for details.  If this time would be earlier than the current
1115	   time plus the maximum-client-lead-time, then the time the server
1116	   entered PARTNER-DOWN state plus the maximum-client-lead-time is used.
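   The reallocation deadline described above can be computed as follows.
   This sketch assumes that all times are integer seconds on this
   server's clock.  The function and parameter names are hypothetical.
   Comparing the candidate time against the current time plus the MCLT
   is equivalent to asking whether every recorded time is already in the
   past.

```python
def earliest_reallocation_time(client_expiration: int,
                               sent_pet: int,
                               received_ack_pet: int,
                               acked_pet_to_partner: int,
                               partner_down_entered: int,
                               now: int,
                               mclt: int) -> int:
    """Earliest time a PARTNER-DOWN server may give an address to a
    client other than the one holding it at entry to PARTNER-DOWN."""
    # MCLT beyond the latest of the four recorded times.
    candidate = max(client_expiration, sent_pet,
                    received_ack_pet, acked_pet_to_partner) + mclt
    # If every recorded time is already in the past, use the time of
    # entry into PARTNER-DOWN state plus the MCLT instead.
    if candidate < now + mclt:
        candidate = partner_down_entered + mclt
    return candidate
```

   As an example, take an MCLT of 3600 seconds, entry into PARTNER-DOWN
   at t=1000, and all recorded times at or before t=2000.  A query at
   t=5000 then yields 4600 (entry plus MCLT) rather than 5600.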

1118	   The server is not restricted by the MCLT when offering lease times
1119	   while in PARTNER-DOWN state.

1121	8.4.2.  Transition Out of PARTNER-DOWN State

1123	   When a server in PARTNER-DOWN state succeeds in establishing a con-
1124	   nection to its partner, its actions are conditional on the state and
1125	   flags received in the STATE message from the other server as part of
1126	   the process of establishing the connection.

1128	   If the STARTUP bit is set in the server-flags option of a received
1129	   STATE message, a server in PARTNER-DOWN state MUST NOT take any state
1130	   transitions based on reestablishing communications.  Essentially, if
1131	   a server is in PARTNER-DOWN state, it ignores all STATE messages from
1132	   its partner that have the STARTUP bit set in the server-flags option
1133	   of the STATE message.  TODO: This paragraph needs to be moved.

1135	   If the STARTUP bit is not set in the server-flags option of a STATE
1136	   message received from its partner, then a server in PARTNER-DOWN
1137	   state takes the following actions based on the state of the partner
1138	   as received in a STATE message (either immediately after establishing
1139	   communications or at any later time when a new state is received):

1141	   If the partner is in:

1143	   o  NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN, POTENTIAL-
1144	      CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE: Transition
1145	      into POTENTIAL-CONFLICT state.

1147	   o  RECOVER, RECOVER-WAIT, SHUTDOWN, or PAUSED: Stay in PARTNER-DOWN
1148	      state.

1150	   o  RECOVER-DONE: Transition into NORMAL state.
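   The transitions out of PARTNER-DOWN state can be expressed as a
   lookup table.  This sketch assumes that states are represented as
   strings.  The name partner_down_next_state is hypothetical.  Partner
   states the text does not list raise an error instead of guessing a
   transition.

```python
TO_POTENTIAL_CONFLICT = {
    "NORMAL", "COMMUNICATIONS-INTERRUPTED", "PARTNER-DOWN",
    "POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED", "CONFLICT-DONE",
}
STAY_IN_PARTNER_DOWN = {"RECOVER", "RECOVER-WAIT", "SHUTDOWN", "PAUSED"}


def partner_down_next_state(partner_state: str, startup_bit: bool) -> str:
    """Next state of a PARTNER-DOWN server after a partner STATE message."""
    if startup_bit:
        # STATE messages with the STARTUP bit set cause no transition.
        return "PARTNER-DOWN"
    if partner_state in TO_POTENTIAL_CONFLICT:
        return "POTENTIAL-CONFLICT"
    if partner_state in STAY_IN_PARTNER_DOWN:
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError(f"partner state not covered: {partner_state}")
```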

1160	8.5.  RECOVER State

1162	   This state indicates that the server has no information in its stable
1163	   storage or that it is re-integrating with a server in PARTNER-DOWN
1164	   state after it has been down.  A server in this state MUST attempt to
1165	   refresh its stable storage from the other server.

1167	8.5.1.  Operation in RECOVER State

1169	   The server MUST NOT be responsive in RECOVER state.

1171	   A server in RECOVER state will attempt to reestablish communications
1172	   with the other server.

1174	8.5.2.  Transition Out of RECOVER State

1176	   If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED,
1177	   or CONFLICT-DONE state when communications are reestablished, then
1178	   the server in RECOVER state will move to POTENTIAL-CONFLICT state
1179	   itself.

1181	   If the other server is in any other state, then the server in RECOVER
1182	   state will request an update of missing binding information by
1183	   sending an UPDREQ or UPDREQALL message.  If the server has determined
1184	   that it has lost its stable storage because it has no record of ever
1185	   having talked to its partner, while its partner does have a record of
1186	   communicating with it, it MUST send an UPDREQALL message; otherwise,
1187	   it MUST send an UPDREQ message.

1189	   It will wait for an UPDDONE message, and upon receipt of that message
1190	   it will transition to RECOVER-WAIT state.

1192	   If communications fails during the reception of the results of the
1193	   UPDREQ or UPDREQALL message, the server will remain in RECOVER state,
1194	   and will re-issue the UPDREQ or UPDREQALL when communications are re-
1195	   established.

1197	   If an UPDDONE message is not received within an implementation-
1198	   dependent amount of time, and no BNDUPD messages are being received,
1199	   the connection SHOULD be dropped.
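   The decision a RECOVER server makes once communications is
   reestablished can be sketched as shown here.  The two boolean inputs
   are assumptions about how an implementation might record whether each
   server remembers having talked to the other.  The name
   recover_on_connect is hypothetical.  Receipt of UPDDONE, which is not
   modelled here, then moves the server to RECOVER-WAIT state.

```python
from typing import Optional, Tuple

CONFLICT_STATES = {"POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED",
                   "CONFLICT-DONE"}


def recover_on_connect(partner_state: str,
                       we_remember_partner: bool,
                       partner_remembers_us: bool
                       ) -> Tuple[str, Optional[str]]:
    """Return (next_state, message_to_send) for a server in RECOVER."""
    if partner_state in CONFLICT_STATES:
        return "POTENTIAL-CONFLICT", None
    # Stable storage is considered lost when this server has no record
    # of its partner but the partner does have a record of this server.
    if not we_remember_partner and partner_remembers_us:
        return "RECOVER", "UPDREQALL"
    return "RECOVER", "UPDREQ"
```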

1201	                   A                                        B
1202	                 Server                                  Server

1204	                   |                                        |
1205	                RECOVER                               PARTNER-DOWN
1206	                   |                                        |
1207	                   | >--UPDREQ-------------------->         |
1208	                   |                                        |
1209	                   |        <---------------------BNDUPD--< |
1210	                   | >--BNDACK-------------------->         |
1211	                  ...                                      ...
1212	                   |                                        |
1213	                   |        <---------------------BNDUPD--< |
1214	                   | >--BNDACK-------------------->         |
1215	                   |                                        |
1216	                   |        <--------------------UPDDONE--< |
1217	                   |                                        |
1218	              RECOVER-WAIT                                  |
1219	                   |                                        |
1220	                   | >--STATE-(RECOVER-WAIT)------>         |
1221	                   |                                        |
1222	                   |                                        |
1223	          Wait MCLT from last known                         |
1224	             time of failover operation                     |
1225	                   |                                        |
1226	              RECOVER-DONE                                  |
1227	                   |                                        |
1228	                   | >--STATE-(RECOVER-DONE)------>         |
1229	                   |                                     NORMAL
1230	                   |        <-------------(NORMAL)-STATE--< |
1231	                NORMAL                                      |
1232	                   | >--STATE-(NORMAL)------------->         |
1233	                   |                                        |
1234	                   |                                        |

1236	                 Figure 2: Transition out of RECOVER state

1238	   If communications fails at any time while a server is in RECOVER
1239	   state, the server will stay in RECOVER state.  When communications
1240	   is restored, it will restart the process of transitioning out of
1241	   RECOVER state.

1243	8.6.  RECOVER-WAIT State

1245	   This state indicates that the server has sent an UPDREQ or UPDREQALL
1246	   message and has received the UPDDONE message indicating that it has
1247	   received all outstanding binding update information.  In RECOVER-WAIT
1248	   state, the server will wait for the MCLT in order to ensure that any
1249	   processing that this server might have done prior to losing its
1250	   stable storage will not cause future difficulties.

1252	8.6.1.  Operation in RECOVER-WAIT State

1254	   The server MUST NOT be responsive in RECOVER-WAIT state.

1256	8.6.2.  Transition Out of RECOVER-WAIT State

1258	   Upon entry to RECOVER-WAIT state the server MUST start a timer whose
1259	   expiration is set to a time equal to the time the server went down
1260	   (if known) or the time the server started (if the down-time is
1261	   unknown) plus the maximum-client-lead-time.  When this timer expires,
1262	   the server will transition into RECOVER-DONE state.

1264	   This gives any DHCP client to which this server allocated an IP
1265	   address, before this server lost its client binding information in
1266	   stable storage, time to contact the other server or for its lease
1267	   to time out.

1268	   If this is the first time this server has run failover, then the
1269	   waiting time discussed above may be skipped, and the server may
1270	   transition immediately to RECOVER-DONE state.  To determine whether
1271	   this server has run failover before, it is vital to use the
1272	   information received from the partner, not only this server's stable
1273	   storage, since that storage may have been lost.
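   The RECOVER-WAIT timer can be sketched as follows.  This minimal
   illustration assumes integer-second timestamps.  The helper name is
   hypothetical, as is the partner_says_first_run flag, which stands for
   "the partner's information shows that this server has never run
   failover".  A return value of None means the wait may be skipped.

```python
from typing import Optional


def recover_wait_deadline(down_time: Optional[int], start_time: int,
                          mclt: int,
                          partner_says_first_run: bool) -> Optional[int]:
    """Expiry time of the RECOVER-WAIT timer, or None if no wait is needed."""
    if partner_says_first_run:
        # First run of failover, as determined from the partner's
        # information: the server may move straight to RECOVER-DONE.
        return None
    # Use the time the server went down if known, else its start time.
    base = down_time if down_time is not None else start_time
    return base + mclt
```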

1280	   If communications fails while a server is in RECOVER-WAIT state, it
1281	   has no effect on the operation of this state.  The server SHOULD
1282	   continue to operate its timer, and if the timer expires during the
1283	   period where communications with the other server have failed, then
1284	   the server SHOULD transition to RECOVER-DONE state.  This is rare:
1285	   failover state transitions are not usually made while communications
1286	   are interrupted, but in this case there is no reason to inhibit the
1287	   timer.

1289	8.7.  RECOVER-DONE State

1291	   This state exists to allow an interlocked transition for one server
1292	   from RECOVER state and another server from PARTNER-DOWN or
1293	   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

1295	8.7.1.  Operation in RECOVER-DONE State

1297	   A server in RECOVER-DONE state MUST respond only to RENEW and REBIND
1298	   messages.

1300	8.7.2.  Transition Out of RECOVER-DONE State

1302	   When a server in RECOVER-DONE state determines that its partner
1303	   server has entered NORMAL or RECOVER-DONE state, then it will
1304	   transition into NORMAL state.

1306	   If communications fails while in RECOVER-DONE state, a server will
1307	   stay in RECOVER-DONE state.

1309	8.8.  NORMAL State

1311	   NORMAL state is the state used by a server when it is communicating
1312	   with the other server, and any required resynchronization has been
1313	   performed.  While some binding database synchronization is performed
1314	   in NORMAL state, potential conflicts and binding database data loss
1315	   are resolved prior to entry into NORMAL state.

1317	   When entering NORMAL state, a server will send to the other server
1318	   all currently unacknowledged binding updates as BNDUPD messages.

1320	   When the above process is complete, if the server entering NORMAL
1321	   state is a secondary server, then it will request IP addresses for
1322	   allocation using the POOLREQ message.

1324	8.8.1.  Operation in NORMAL State

1326	   When in NORMAL state a server will operate in the following manner:

1328	   Lease time calculations
1329	      As discussed in Section 7.4, the lease given to a DHCP client can
1330	      never expire more than the MCLT after the later of the current time
1331	      and the most recently received potential-expiration-time from the
1332	      failover partner.

1334	      As long as a server adheres to this constraint, the specifics of
1335	      the lease interval that it gives to a DHCP client or the value of
1336	      the potential-expiration-time sent to its failover partner are
1337	      implementation dependent.

1339	   Lazy update of partner server
1340	      After sending a REPLY message that includes a lease update to a
1341	      client, the server servicing the DHCP client request attempts to
1342	      update its partner with the new binding information.  The server
1343	      transmits both the desired valid lifetime and the actual valid
1344	      lifetime.

1345	   Reallocation of IP addresses between clients
1346	      Whenever a client binding is released or expires, a BNDUPD mes-
1347	      sage must be sent to the partner, setting the binding state to
1348	      RELEASED or EXPIRED.  However, until a BNDACK is received for this
1349	      message, the IP address cannot be allocated to another client.  It
1350	      cannot be allocated to the same client again if a BNDUPD was sent,
1351	      otherwise it can.  See Section 7.6.
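   The constraint in the "Lease time calculations" item can be sketched
   as a clamp on the expiration time a server grants.  The sketch assumes
   absolute times in integer seconds.  The function names are
   hypothetical.  How the desired lifetime is chosen remains
   implementation dependent.

```python
def max_lease_expiration(now: int, received_pet: int, mclt: int) -> int:
    """Latest expiration time a server in NORMAL state may grant."""
    return max(received_pet, now) + mclt


def granted_expiration(now: int, desired_lifetime: int,
                       received_pet: int, mclt: int) -> int:
    """Clamp the desired lease so it honours the MCLT constraint."""
    return min(now + desired_lifetime,
               max_lease_expiration(now, received_pet, mclt))
```

   Consider a binding for which no potential-expiration-time has been
   received yet (modelled here as 0) and an MCLT of 3600 seconds.  A
   desired one-day lease is then shortened to expire one hour from now.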

1353	   In NORMAL state, each server receives binding updates from its
1354	   partner server in BNDUPD messages.  It records these in its client
1355	   binding database in stable storage and then sends a corresponding
1356	   BNDACK message to its partner server.

1358	8.8.2.  Transition Out of NORMAL State

1360	   If an external command is received by a server in NORMAL state
1361	   informing it that its partner is down, then it will transition into
1362	   PARTNER-DOWN state.  Generally, this would be an unusual situation,
1363	   where some external agency knew the partner server was down.  Using
1364	   the command in this case would be appropriate if the polling interval
1365	   and timeout were long.

1367	   If a server in NORMAL state fails to receive acks to messages sent to
1368	   its partner for an implementation-dependent period of time, it MAY
1369	   move into COMMUNICATIONS-INTERRUPTED state.  This situation might
1370	   occur if the partner server was capable of maintaining the TCP
1371	   connection between the servers and of sending a CONTACT message
1372	   every tSend seconds, but was (for some reason) incapable of
1373	   processing BNDUPD messages.

1375	   If communications is determined not to be "ok" (as defined in
1376	   Section 7.5), then the server will transition into COMMUNICATIONS-
1377	   INTERRUPTED state.

1378	   If a server in NORMAL state receives any messages from its partner
1379	   where the partner has changed state from that expected by the server
1380	   in NORMAL state, then the server should transition into
1381	   COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
1382	   sition from there.  For example, it would be expected for the partner
1383	   to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
1384	   the partner to transition from NORMAL into POTENTIAL-CONFLICT state.

1386	   If a server in NORMAL state receives any messages from its partner
1387	   where the partner has changed into SHUTDOWN state, the server should
1388	   transition into PARTNER-DOWN state.
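   The transitions out of NORMAL state can be summarised as a small
   decision function.  This sketch is simplified in two ways.  It treats
   any reported partner state other than NORMAL or SHUTDOWN as an
   unexpected change, and it omits the optional (MAY) transition taken
   when acks are not received.  The function and parameter names are
   hypothetical.

```python
from typing import Optional


def normal_next_state(partner_down_command: bool,
                      communications_ok: bool,
                      partner_state: Optional[str]) -> str:
    """Next state for a server currently in NORMAL state.

    partner_state is the state reported in the latest message from the
    partner, or None if no new state was reported.
    """
    if partner_down_command:
        return "PARTNER-DOWN"
    if not communications_ok:
        return "COMMUNICATIONS-INTERRUPTED"
    if partner_state == "SHUTDOWN":
        return "PARTNER-DOWN"
    if partner_state is not None and partner_state != "NORMAL":
        # Simplification: any other reported state counts as unexpected.
        return "COMMUNICATIONS-INTERRUPTED"
    return "NORMAL"
```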

1390	8.9.  COMMUNICATIONS-INTERRUPTED State

1392	   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
1393	   unable to communicate with its partner.  Primary and secondary
1394	   servers cycle automatically (without administrative intervention)
1395	   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
1396	   connection between them fails and recovers, or as the partner server
1397	   cycles between operational and non-operational.  No duplicate IP
1398	   address allocation can occur while the servers cycle between these
1399	   states.

1401	   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
1402	   configured to support an automatic transition out of COMMUNICATIONS-
1403	   INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period"
1404	   has been configured, see section 10), then a timer MUST be started
1405	   for the length of the configured safe period.

1407	   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
1408	   the NORMAL state SHOULD raise some alarm condition to alert
1409	   administrative staff to a potential problem in the DHCP subsystem.

1411	8.9.1.  Operation in COMMUNICATIONS-INTERRUPTED State

1413	   In this state a server MUST respond to all DHCP client requests.
1414	   When allocating new leases, each server allocates from its own pool,
1415	   where the primary MUST allocate only FREE resources (addresses or
1416	   prefixes), and the secondary MUST allocate only BACKUP resources
1417	   (addresses or prefixes).  When responding to RENEW messages, each
1418	   server will allow continued renewal of a DHCP client's current lease
1419	   on an IP address or prefix irrespective of whether that lease was
1420	   given out by the receiving server or not, although the renewal period
1421	   MUST NOT exceed the maximum client lead time (MCLT) beyond the latest
1422	   of: 1) the potential valid lifetime already acknowledged by the other
1423	   server, 2) the lease-expiration-time, or 3) the potential valid
1424	   lifetime received from the partner server.
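   The allocation and renewal rules for COMMUNICATIONS-INTERRUPTED state
   can be sketched as follows.  The sketch assumes that each of the three
   inputs to max_renewal_expiration (items 1) to 3) above) is an absolute
   expiration time in integer seconds.  Both function names are
   hypothetical.

```python
def allocatable_state(is_primary: bool) -> str:
    """Binding state from which a server allocates new leases while in
    COMMUNICATIONS-INTERRUPTED state."""
    return "FREE" if is_primary else "BACKUP"


def max_renewal_expiration(acked_pet: int, lease_expiration: int,
                           received_pet: int, mclt: int) -> int:
    """Latest expiration time a renewal may extend a lease to."""
    # acked_pet: potential valid lifetime already acked by the partner
    # lease_expiration: the lease-expiration-time
    # received_pet: potential valid lifetime received from the partner
    return max(acked_pet, lease_expiration, received_pet) + mclt
```

   The same renewal limit applies in RESOLUTION-INTERRUPTED state.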

1426	   However, since the server cannot communicate with its partner in this
1427	   state, the acknowledged potential valid lifetime will not be updated
1428	   in any new bindings.  This is likely to eventually cause the actual
1429	   valid lifetimes to be the current time plus the MCLT (unless this is
1430	   greater than the desired-client-lease-time).

1432	   The server should continue to try to establish a connection with its
1433	   partner.

1435	8.9.2.  Transition Out of COMMUNICATIONS-INTERRUPTED State

1437	   If the safe period timer expires while a server is in the
1438	   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
1439	   PARTNER-DOWN state.

1441	   If an external command is received by a server in COMMUNICATIONS-
1442	   INTERRUPTED state informing it that its partner is down, it will
1443	   transition immediately into PARTNER-DOWN state.

1445	   If communications is restored with the other server, then the server
1446	   in COMMUNICATIONS-INTERRUPTED state will transition into another
1447	   state based on the state of the partner:

1449	   o  NORMAL or COMMUNICATIONS-INTERRUPTED: Transition into the NORMAL
1450	      state.

1452	   o  RECOVER: Stay in COMMUNICATIONS-INTERRUPTED state.

1454	   o  RECOVER-DONE: Transition into NORMAL state.

1456	   o  PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or RESOLUTION-
1457	      INTERRUPTED: Transition into POTENTIAL-CONFLICT state.

1459	   o  SHUTDOWN: Transition into PARTNER-DOWN state.
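   The list above maps directly onto a lookup table.  In this sketch,
   partner states the list does not mention (for example RECOVER-WAIT)
   raise a KeyError instead of guessing a transition.  The table and
   function names are hypothetical.

```python
COMMS_RESTORED_NEXT_STATE = {
    "NORMAL": "NORMAL",
    "COMMUNICATIONS-INTERRUPTED": "NORMAL",
    "RECOVER": "COMMUNICATIONS-INTERRUPTED",
    "RECOVER-DONE": "NORMAL",
    "PARTNER-DOWN": "POTENTIAL-CONFLICT",
    "POTENTIAL-CONFLICT": "POTENTIAL-CONFLICT",
    "CONFLICT-DONE": "POTENTIAL-CONFLICT",
    "RESOLUTION-INTERRUPTED": "POTENTIAL-CONFLICT",
    "SHUTDOWN": "PARTNER-DOWN",
}


def comms_restored_next_state(partner_state: str) -> str:
    """Next state of a COMMUNICATIONS-INTERRUPTED server once
    communications with the partner is restored."""
    # Raises KeyError for partner states the text does not cover.
    return COMMS_RESTORED_NEXT_STATE[partner_state]
```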

1461	   The following figure illustrates the transition from NORMAL to
1462	   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

1464	      Primary                                Secondary
1465	       Server                                  Server

1467	       NORMAL                                  NORMAL
1468	         | >--CONTACT------------------->         |
1469	         |        <--------------------CONTACT--< |
1470	         |         [TCP connection broken]        |
1471	    COMMUNICATIONS          :              COMMUNICATIONS
1472	      INTERRUPTED           :                INTERRUPTED
1473	         |      [attempt new TCP connection]      |
1474	         |         [connection succeeds]          |
1475	         |                                        |
1476	         | >--CONNECT------------------->         |
1477	         |        <-----------------CONNECTACK--< |
1478	         |                                     NORMAL
1479	         |        <-------------------STATE-----< |
1480	       NORMAL                                     |
1481	         | >--STATE--------------------->         |
1482	         |                                        |
1483	         | >--BNDUPD-------------------->         |
1484	         |        <---------------------BNDACK--< |
1485	         |                                        |
1486	         |        <---------------------BNDUPD--< |
1487	         | >------BNDACK---------------->         |
1488	        ...                                      ...
1489	         |                                        |
1490	         |        <--------------------POOLREQ--< |
1491	         | >--POOLRESP-(2)-------------->         |
1492	         |                                        |
1493	         | >--BNDUPD-(#1)--------------->         |
1494	         |        <---------------------BNDACK--< |
1495	         |                                        |
1496	         |        <--------------------POOLREQ--< |
1497	         | >--POOLRESP-(0)-------------->         |
1498	         |                                        |
1499	         | >--BNDUPD-(#2)--------------->         |
1500	         |        <---------------------BNDACK--< |
1501	         |                                        |

1503	    Figure 3: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED and
1504	          back (example with 2 addresses allocated to secondary)

1506	8.10.  POTENTIAL-CONFLICT State

1508	   This state indicates that the two servers are attempting to
1509	   reintegrate with each other, but at least one of them was running in
1510	   a state that did not guarantee automatic reintegration would be
1511	   possible.  In POTENTIAL-CONFLICT state the servers may determine that
1512	   the same resource has been offered and accepted by two different
1513	   clients.

1515	   It is a goal of this protocol to minimize the possibility that
1516	   POTENTIAL-CONFLICT state is ever entered.

1518	   When a primary server enters POTENTIAL-CONFLICT state it should
1519	   request that the secondary send it all updates of which it is
1520	   currently unaware by sending an UPDREQ message to the secondary
1521	   server.

1523	   A secondary server entering POTENTIAL-CONFLICT state will wait for
1524	   the primary to send it an UPDREQ message.

1526	8.10.1.  Operation in POTENTIAL-CONFLICT State

1528	   Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
1529	   DHCP requests.

1531	8.10.2.  Transition Out of POTENTIAL-CONFLICT State

1533	   If communications fails with the partner while in POTENTIAL-CONFLICT
1534	   state, then the server will transition to RESOLUTION-INTERRUPTED
1535	   state.

1537	   Whenever either server receives an UPDDONE message from its partner
1538	   while in POTENTIAL-CONFLICT state, it MUST transition to a new state.
1539	   The primary MUST transition to CONFLICT-DONE state, and the secondary
1540	   MUST transition to NORMAL state.  This will cause the primary server
1541	   to leave POTENTIAL-CONFLICT state prior to the secondary, since the
1542	   primary sends an UPDREQ message and receives an UPDDONE before the
1543	   secondary sends an UPDREQ message and receives its UPDDONE message.

1545	   When a secondary server receives an indication that the primary
1546	   server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE
1547	   state, it SHOULD send an UPDREQ message to the primary server.
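   The transitions out of POTENTIAL-CONFLICT state can be sketched as
   shown here, together with the secondary's reaction when the primary
   enters CONFLICT-DONE.  The event names and function names are
   hypothetical.  Only the resulting states and the UPDREQ message come
   from the text.

```python
from typing import Optional, Tuple


def potential_conflict_event(is_primary: bool,
                             event: str) -> Tuple[str, Optional[str]]:
    """Return (next_state, message_to_send) for a POTENTIAL-CONFLICT server.

    event is one of the hypothetical names "comm-failed" or "upddone".
    """
    if event == "comm-failed":
        return "RESOLUTION-INTERRUPTED", None
    if event == "upddone":
        return ("CONFLICT-DONE" if is_primary else "NORMAL"), None
    raise ValueError(f"unhandled event: {event}")


def secondary_on_partner_state(old: str, new: str) -> Optional[str]:
    """Message a secondary sends when the primary's state changes."""
    if old == "POTENTIAL-CONFLICT" and new == "CONFLICT-DONE":
        return "UPDREQ"
    return None
```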

1549	       Primary                                Secondary
1550	       Server                                  Server

1552	         |                                        |
1553	   POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
1554	         |                                        |
1555	         | >--UPDREQ-------------------->         |
1556	         |                                        |
1557	         |        <---------------------BNDUPD--< |
1558	         | >--BNDACK-------------------->         |
1559	        ...                                      ...
1560	         |                                        |
1561	         |        <---------------------BNDUPD--< |
1562	         | >--BNDACK-------------------->         |
1563	         |                                        |
1564	         |        <--------------------UPDDONE--< |
1565	   CONFLICT-DONE                                  |
1566	         | >--STATE--(CONFLICT-DONE)---->         |
1567	         |        <---------------------UPDREQ--< |
1568	         |                                        |
1569	         | >--BNDUPD-------------------->         |
1570	         |        <---------------------BNDACK--< |
1571	        ...                                      ...
1572	         | >--BNDUPD-------------------->         |
1573	         |        <---------------------BNDACK--< |
1574	         |                                        |
1575	         | >--UPDDONE------------------->         |
1576	         |                                     NORMAL
1577	         |        <------------STATE--(NORMAL)--< |
1578	      NORMAL                                      |
1579	         | >--STATE--(NORMAL)----------->         |
1580	         |                                        |
1581	         |        <--------------------POOLREQ--< |
1582	         | >------POOLRESP-(n)---------->         |
1583	         |              addresses                 |

1585	              Figure 4: Transition out of POTENTIAL-CONFLICT

1587	8.11.  RESOLUTION-INTERRUPTED State

1589	   This state indicates that the two servers were attempting to
1590	   reintegrate with each other in POTENTIAL-CONFLICT state, but
1591	   communications failed prior to completion of re-integration.

1593	   If the servers remained in POTENTIAL-CONFLICT while communications
1594	   was interrupted, neither server would be responsive to DHCP client
1595	   requests, and if one server had crashed, then there might be no
1596	   server able to process DHCP requests.

1598	   When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise an
1599	   alarm condition to alert administrative staff of a problem in the
1600	   DHCP subsystem.

1602	8.11.1.  Operation in RESOLUTION-INTERRUPTED State

1604	   In this state a server MUST respond to all DHCP client requests.
1605	   When allocating new resources (addresses or prefixes), each server
1606	   SHOULD allocate from its own pool (if that can be determined), where
1607	   the primary SHOULD allocate only FREE resources, and the secondary
1608	   SHOULD allocate only BACKUP resources.  When responding to renewal
1609	   requests, each server will allow continued renewal of a DHCP client's
1610	   current lease irrespective of whether that lease was given out by the
1611	   receiving server or not, although the renewal period MUST NOT exceed
1612	   the maximum client lead time (MCLT) beyond the latest of: 1) the
1613	   potential valid lifetime already acknowledged by the other server,
1614	   2) the lease-expiration-time, or 3) the potential valid lifetime
1615	   received from the partner server.

1617	   However, since the server cannot communicate with its partner in this
1618	   state, the acknowledged potential valid lifetime will not be updated
1619	   in any new bindings.

1621	8.11.2.  Transition Out of RESOLUTION-INTERRUPTED State

1623	   If an external command is received by a server in RESOLUTION-
1624	   INTERRUPTED state informing it that its partner is down, it will
1625	   transition immediately into PARTNER-DOWN state.

1627	   If communications is restored with the other server, then the server
1628	   in RESOLUTION-INTERRUPTED state will transition into POTENTIAL-
1629	   CONFLICT state.

1631	8.12.  CONFLICT-DONE State

1633	   This state indicates that during the process where the two servers
1634	   are attempting to re-integrate with each other, the primary server
1635	   has received all of the updates from the secondary server.  It
1636	   transitions into CONFLICT-DONE state so that it can be fully
1637	   responsive to the client load, as opposed to NORMAL state, where it
1638	   would be in a "balanced" responsive state, running the load balancing
1639	   algorithm.

1641	   TODO: We do not support load balancing, so CONFLICT-DONE is actually
1642	   equal to NORMAL.  Need to remove CONFLICT-DONE and replace all
1643	   references to it with NORMAL.

1645	8.12.1.  Operation in CONFLICT-DONE State

1647	   A primary server in CONFLICT-DONE state is fully responsive to all
1648	   DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
1649	   state).

1651	   If communications fails, remain in CONFLICT-DONE state.  If
1652	   communications becomes OK, remain in CONFLICT-DONE state until the
1653	   conditions for transition out become satisfied.

1655	8.12.2.  Transition Out of CONFLICT-DONE State

1657	   If communications fails with the partner while in CONFLICT-DONE
1658	   state, then the server will remain in CONFLICT-DONE state.

1660	   When a primary server determines that the secondary server has made a
1661	   transition into NORMAL state, the primary server will also transition
1662	   into NORMAL state.

1664	8.13.  PAUSED State

1666	   TODO: Remove PAUSED state completely

1668	   This state exists to allow one server to inform another that it will
1669	   be out of service for what is predicted to be a relatively short
1670	   time, and to allow the other server to transition to COMMUNICATIONS-
1671	   INTERRUPTED state immediately and to begin servicing all DHCP clients
1672	   with no interruption in service to new DHCP clients.

1674	   A server which is aware that it is shutting down temporarily SHOULD
1675	   send a STATE message with the server-state option containing PAUSED
1676	   state and close the TCP connection.

1678	   While a server may or may not transition internally into PAUSED
1679	   state, the 'previous' state determined when it is restarted MUST be
1680	   the state the server was in prior to receiving the command to shut-
1681	   down and restart and which precedes its entry into the PAUSED state.
1682	   See Section 8.3.2 concerning the use of the previous state upon
1683	   server restart.

1685	   When entering PAUSED state, the server MUST store the previous state
1686	   in stable storage, and use that state as the previous state when it
1687	   is restarted.

1689	8.13.1.  Operation in PAUSED State

1691	   A server MUST NOT perform any operation while in PAUSED state.

1693	8.13.2.  Transition Out of PAUSED State

1695	   A server makes a transition out of PAUSED state by being restarted.
1696	   At that time, the previous state MUST be the state the server was in
1697	   prior to entering the PAUSED state.

1699	8.14.  SHUTDOWN State

1701	   This state exists to allow one server to inform another that it will
1702	   be out of service for what is predicted to be a relatively long time,
1703	   and to allow the other server to transition immediately to PARTNER-
1704	   DOWN state, and take over completely for the server going down.

1706	   When entering SHUTDOWN state, the server MUST record the previous
1707	   state in stable storage for use when the server is restarted.  It
1708	   also MUST record the current time as the last time operational.

1710	   A server which is aware that it is shutting down SHOULD send a STATE
1711	   message with the server-state option containing SHUTDOWN.

1713	8.14.1.  Operation in SHUTDOWN State

1715	   A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

1717	   If a server receives any message indicating that the partner has
1718	   moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
1719	   MUST record RECOVER state as the previous state to be used when it is
1720	   restarted.

1722	   A server SHOULD wait for a few seconds after informing the partner of
1723	   entry into SHUTDOWN state (if communications are okay) to determine
1724	   if the partner entered PARTNER-DOWN state.
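   The stable-storage bookkeeping required around SHUTDOWN state can be
   sketched as follows.  StableStorage is a hypothetical in-memory
   stand-in for whatever persistent store an implementation uses.  Times
   are assumed to be integer seconds.

```python
from dataclasses import dataclass


@dataclass
class StableStorage:
    """Hypothetical stand-in for the server's persistent state."""
    previous_state: str = ""
    last_time_operational: int = 0


def enter_shutdown(storage: StableStorage, current_state: str,
                   now: int) -> None:
    # Record the state to resume from and the last operational time.
    storage.previous_state = current_state
    storage.last_time_operational = now


def on_partner_state_in_shutdown(storage: StableStorage,
                                 partner_state: str) -> None:
    # If the partner took over, this server must restart in RECOVER.
    if partner_state == "PARTNER-DOWN":
        storage.previous_state = "RECOVER"
```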

1726	8.14.2.  Transition Out of SHUTDOWN State

1728	   A server makes a transition out of SHUTDOWN state by being restarted.

1730	9.  Proposed extensions

1732	   The following section discusses possible extensions to the proposed
1733	   failover mechanism.  Listed extensions must be simple enough not to
1734	   further complicate the failover protocol.  Any proposals that are
1735	   considered complex will be defined as stand-alone extensions in
1736	   separate documents.

1738	9.1.  Active-active mode

1740	   A very simple way to achieve active-active mode is to remove the
1741	   restriction that the secondary server MUST NOT respond to SOLICIT and
1742	   REQUEST messages.  Instead, it could respond, but MUST have a lower
1743	   preference than the primary server.  Clients discovering available
1744	   servers will receive ADVERTISE messages from both servers, but are
1745	   expected to select the primary server, as it has a higher preference
1746	   value configured.  The subsequent REQUEST message will then be
1747	   directed to the primary server.
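   The server selection this scheme relies on can be sketched from the
   client's side.  The sketch follows RFC 3315 in treating an ADVERTISE
   without a Preference option as having preference 0.  Breaking ties in
   favour of the first ADVERTISE received is an assumption, since DHCPv6
   does not mandate it.  The Advertise type is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Advertise:
    """Hypothetical summary of a received ADVERTISE message."""
    server: str
    preference: int = 0  # Preference option value; 0 if absent


def select_server(advertises: List[Advertise]) -> str:
    """Return the server whose ADVERTISE carries the highest preference.

    max() keeps the first of several equal maxima, so ties go to the
    ADVERTISE that appears first in the list (i.e. was received first).
    """
    return max(advertises, key=lambda a: a.preference).server
```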

1749	   Discussion: Do DHCPv6 clients actually do this?  DHCPv4 clients were
1750	   rumored to wait for a "while" to accept the best offer, but to a
1751	   first approximation, they all take the first offer they receive that
1752	   is even acceptable.

1754	   The benefit of this approach, compared to the "basic" active-passive
1755	   solution, is that there is no delay between the primary's failure and
1756	   the moment when the secondary starts serving requests.

1758	   Discussion: Setting both servers' preference to an equal value could
1759	   theoretically work as a crude attempt to provide load balancing.  It
1760	   would not do much good on its own, as one (faster) server could be
1761	   chosen more frequently (assuming that clients pick the first server
1762	   to respond when preference values are equal, which is not mandated
1763	   by DHCPv6).  We could design a simple mechanism that dynamically
1764	   updates the preference value depending on the usage of available
1765	   resources.  This concept has not been investigated in detail yet.
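One possible shape for the dynamic preference idea is sketched below. It is purely hypothetical, since the concept has not been investigated. In this sketch, each server derives its Preference option value from the share of its own pool that is still free. The function name and the linear mapping are assumptions. Values are capped at 254 because an RFC 3315 client that receives a preference of 255 proceeds immediately, without collecting further ADVERTISE messages, which would defeat the comparison.

```python
def dynamic_preference(free: int, total: int, max_pref: int = 254) -> int:
    """Hypothetical linear mapping from free pool share to a preference.

    A server with a larger share of its pool still free advertises a higher
    preference, nudging new clients towards the less loaded partner.
    """
    if not 0 <= max_pref <= 254:
        # 255 is avoided: RFC 3315 clients act on it without waiting.
        raise ValueError("max_pref must be between 0 and 254")
    if total <= 0:
        # Nothing configured: advertise the lowest preference.
        return 0
    if not 0 <= free <= total:
        raise ValueError("free must be between 0 and total")
    return (free * max_pref) // total
```

With this mapping, a primary with 600 of 1000 resources free would advertise 152, while a secondary with 200 of 1000 free would advertise 50. As noted above, this only spreads load among clients that actually compare preferences.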

1767	10.  Dynamic DNS Considerations

1769	   TODO: Describe DNS Update challenges in a failover environment.
1770	   They are described for DHCPv4 in Section 5.12 of [dhcpv4-failover].

1772	11.  Reservations and failover

1774	   TODO: Describe how lease reservation works with failover.  See
1775	   Section 5.13 in [dhcpv4-failover].

1777	12.  Protocol entities

1779	   Discussion: It is unclear whether the following sections belong in
1780	   the design or the protocol draft.  They are currently kept here as a
1781	   scratchbook listing things that will have to be defined eventually.
1782	   Whether they will stay in this document or be moved to the protocol
1783	   specification document is TBD.

1785	12.1.  Failover Protocol

1787	   This section enumerates the options that will be defined in the
1788	   failover protocol specification.  A rough description of the purpose
1789	   and content of each option is given.  The exact on-wire format will
1790	   be defined in the protocol specification.

1792	   1.  OPTION_FO_TIMESTAMP - conveys a timestamp.  It is used by the
1793	       time skew measurement algorithm (see Section 7.1).

1795	12.2.  Protocol constants

1797	   This section enumerates various constants that have to be defined in
1798	   the actual protocol specification.

1800	   1.  TIME_SKEW_PKTS_AVG - the number of packets used to calculate the
1801	       average time skew between partners (see Section 7.1).
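The time skew algorithm itself is specified in Section 7.1. The following is only a hedged sketch of how a constant like TIME_SKEW_PKTS_AVG could bound a moving average. Each sample is assumed to be the local receive time minus the partner's time carried in OPTION_FO_TIMESTAMP, and network transit delay is ignored. The class name is hypothetical. The window size of 3 used in the checks is arbitrary, since the actual value of the constant has not been defined.

```python
from collections import deque
from typing import Deque, Optional


class TimeSkewEstimator:
    """Hypothetical moving average of the clock skew to the partner.

    A positive skew means the local clock is ahead of the partner's clock
    (plus whatever transit delay the packet experienced, which this sketch
    does not try to remove).
    """

    def __init__(self, time_skew_pkts_avg: int) -> None:
        if time_skew_pkts_avg < 1:
            raise ValueError("time_skew_pkts_avg must be at least 1")
        # deque(maxlen=...) silently drops the oldest sample when full.
        self._samples: Deque[float] = deque(maxlen=time_skew_pkts_avg)

    def add_packet(self, partner_timestamp: float, local_receive_time: float) -> None:
        # One sample per received packet carrying OPTION_FO_TIMESTAMP.
        self._samples.append(local_receive_time - partner_timestamp)

    def average_skew(self) -> Optional[float]:
        # No estimate until at least one packet has been seen.
        if not self._samples:
            return None
        return sum(self._samples) / len(self._samples)
```

Bounding the window keeps the estimate responsive to clock drift, while still smoothing out per-packet jitter.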

1803	13.  Open questions

1805	   This section is a scratchbook.  It will be removed once the questions
1806	   are answered.

1808	   Q: Do we want to support temporary addresses?  I think not.  They are
1809	   short-lived by definition, so clients should not mind getting new
1810	   temporary addresses.

1812	   Q: Do we want to support CGA-registered addresses?  There is
1813	   currently work in the DHC WG on this, but I haven't looked at it yet.
1814	   If that is complicated, we may not define it here, but rather as an
1815	   extension.  [If it moves forward, we need to support it.]

1817	14.  Security Considerations

1819	   TODO: Security considerations section will contain loose notes and
1820	   will be transformed into consistent text once the core design
1821	   solidifies.

1823	15.  IANA Considerations

1825	   IANA is not requested to perform any actions at this time.

1827	16.  Acknowledgements

1829	   This document extensively uses concepts, definitions, and other parts
1830	   of the [dhcpv4-failover] document.  The authors would like to thank Shawn
1831	   Routhier, Greg Rabil, and Bernie Volz for their significant
1832	   involvement and contributions.

1834	   This work has been partially supported by the Department of Computer
1835	   Communications (a division of Gdansk University of Technology) and
1836	   the Polish Ministry of Science and Higher Education under the
1837	   European Regional Development Fund, Grant No.
1838	   POIG.01.01.02-00-045/09-00 (Future Internet Engineering Project).

1840	17.  References

1842	17.1.  Normative References

1844	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1845	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1847	   [RFC2131]  Droms, R., "Dynamic Host Configuration Protocol",
1848	              RFC 2131, March 1997.

1850	   [RFC3074]  Volz, B., Gonczi, S., Lemon, T., and R. Stevens, "DHC Load
1851	              Balancing Algorithm", RFC 3074, February 2001.

1853	   [RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
1854	              and M. Carney, "Dynamic Host Configuration Protocol for
1855	              IPv6 (DHCPv6)", RFC 3315, July 2003.

1857	   [RFC3633]  Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic
1858	              Host Configuration Protocol (DHCP) version 6", RFC 3633,
1859	              December 2003.

1861	   [RFC4704]  Volz, B., "The Dynamic Host Configuration Protocol for
1862	              IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN)
1863	              Option", RFC 4704, October 2006.

1865	   [RFC5460]  Stapp, M., "DHCPv6 Bulk Leasequery", RFC 5460,
1866	              February 2009.

1868	17.2.  Informative References

1870	   [I-D.ietf-dhc-dhcpv6-redundancy-consider]
1871	              Tremblay, J., Brzozowski, J., Chen, J., and T. Mrugalski,
1872	              "DHCPv6 Redundancy Deployment Considerations",
1873	              draft-ietf-dhc-dhcpv6-redundancy-consider-02 (work in
1874	              progress), October 2011.

1876	   [RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
1877	              "Dynamic Updates in the Domain Name System (DNS UPDATE)",
1878	              RFC 2136, April 1997.

1880	   [dhcpv4-failover]
1881	              Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S.,
1882	              Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover
1883	              Protocol", draft-ietf-dhc-failover-12 (work in progress),
1884	              March 2003.

1886	   [requirements]
1887	              Mrugalski, T. and K. Kinnear, "DHCPv6 Failover
1888	              Requirements",
1889	              draft-ietf-dhc-dhcpv6-failover-requirements-00 (work in
1890	              progress), October 2011.

1892	Authors' Addresses

1894	   Tomasz Mrugalski
1895	   Internet Systems Consortium, Inc.
1896	   950 Charter Street
1897	   Redwood City, CA  94063
1898	   USA

1900	   Phone: +1 650 423 1345
1901	   Email: tomasz.mrugalski@gmail.com

1903	   Kim Kinnear
1904	   Cisco Systems, Inc.
1905	   1414 Massachusetts Ave.
1906	   Boxborough, Massachusetts  01719
1907	   USA

1909	   Phone: +1 (978) 936-0000
1910	   Email: kkinnear@cisco.com