idnits 2.17.1 

draft-ietf-dhc-failover-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 100 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 16 instances of too long lines in the document, the longest
     one being 4 characters in excess of 72.

  ** The abstract seems to contain references ([RFC2131]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 1171 has weird spacing: '... of all  of th...'

  == Line 1233 has weird spacing: '...eserved    not...'

  == Line 1716 has weird spacing: '...    Len  reque...'

  == Line 4115 has weird spacing: '...ore the  expir...'

  == Line 4197 has weird spacing: '...'s hash  algor...'

  == (1 more instance...)

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     In this state a server MUST respond to all DHCP client requests,
     and the algorithm for load balancing described in section 5.3 MUST NOT be
     used.  When allocating new IP addresses, each server allocates from its
     own IP address pool, where the primary MUST allocate only FREE IP
     addresses, and the secondary MUST allocate only BACKUP IP addresses. When
     responding to renewal requests, each server will allow continued renewal
     of a DHCP client's current lease on an IP address irrespec-tive of
     whether that lease was given out by the receiving server or not, although
     the renewal period MUST not exceed the maximum client lead time (MCLT)
     beyond the potential-expiration-time already ack-nowledged by the other
     server or the lease-expiration-time or potential-expiration-time received
     from the partner server.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 1999) is 8889 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 588

  == Missing Reference: 'IPAMTLS' is mentioned on line 4174, but not defined

  -- Looks like a reference, but probably isn't: '256' on line 4207

  == Unused Reference: 'RFC 2132' is defined on line 4329, but no explicit
     reference was found in the text

  == Unused Reference: 'IMAPTLS' is defined on line 4338, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2246 (ref. 'TLS') (Obsoleted by RFC
     4346)

  ** Obsolete normative reference: RFC 2487 (ref. 'SMTPTLS') (Obsoleted by
     RFC 3207)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NAMESPACE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DDNS'


     Summary: 10 errors (**), 0 flaws (~~), 13 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Ralph Droms
2	INTERNET DRAFT                                       Bucknell University

4	                                                             Kim Kinnear
5	                                                              Mark Stapp
6	                                                           Cisco Systems

8	                                                             Bernie Volz
9	                                                            Steve Gonczi
10	                                                        Process Software

12	                                                              Greg Rabil
13	                                                             Mike Dooley
14	                                                              Arun Kapur
15	                                                       Quadritek Systems

17	                                                               June 1999
18	                                                   Expires December 1999

20	                         DHCP Failover Protocol
21	                    <draft-ietf-dhc-failover-04.txt>

23	Status of this Memo

25	   This document is an Internet-Draft and is in full conformance with
26	   all provisions of Section 10 of RFC2026.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF), its areas, and its working groups.  Note that
30	   other groups may also distribute working documents as Internet-
31	   Drafts.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   The list of current Internet-Drafts can be accessed at
39	   http://www.ietf.org/ietf/1id-abstracts.txt

41	   The list of Internet-Draft Shadow Directories can be accessed at
42	   http://www.ietf.org/shadow.html.

44	Copyright Notice

46	   Copyright (C) The Internet Society (1999). All Rights Reserved.

48	Abstract

50	   DHCP [RFC 2131] allows for multiple servers to be operating on a
51	   single network. Some sites are interested in running multiple servers
52	   in such a way so as to provide redundancy in case of server failure.
53	   In order for this to work reliably, the cooperating primary and
54	   secondary servers must maintain a consistent database of the lease
55	   information.  This implies that servers will need to coordinate any
56	   and all lease activity so that this information is synchronized in
57	   case of failover.

59	   This document defines a protocol to provide this synchronization
60	   between two servers. One server is designated the "primary" server,
61	   the other is the "secondary" server. Additionally, this document
62	   describes a protocol which allows each server to determine to which
63	   DHCP clients it should provide service when both servers are
64	   operating in order to support load balancing as well as when one
65	   server has failed in order to support increased DHCP service
66	   availability.

68	   This document is a complete rewrite of draft-ietf-dhc-failover-
69	   03.txt.  That earlier draft described a UDP based failover protocol,
70	   and this draft describes a closely related protocol which uses TCP as
71	   a transport and includes new load-balancing and security
72	   capabilities.

74	Table of Contents

76	    1.  Introduction................................................. 4
77	    2.  Terminology.................................................. 5
78	    2.1.  Requirements terminology................................... 5
79	    2.2.  DHCP and failover terminology.............................. 5
80	    3.  Background and External Requirements......................... 7
81	    3.1.  Key aspects of the DHCP protocol........................... 7
82	    3.2.  BOOTP relay agent implementation........................... 9
83	    3.3.  What does it mean if a server can't communicate with its partner?
84	10
85	    3.4.  Challenging scenarios for a Failover protocol............. 10
86	    3.5.  Using TCP to detect partner server failure................ 11
87	    4.  Design Goals................................................ 13
88	    4.1.  Design requirements for this protocol..................... 13
89	    4.2.  Goals for this protocol................................... 13
90	    4.3.  Limitations of this Protocol.............................. 14
91	    5.  Protocol Overview........................................... 15
92	    5.1.  Messages and States....................................... 15
93	    5.2.  Fundamental restrictions.................................. 18
94	    5.3.  Load balancing............................................ 24
95	    5.4.  Operating in NORMAL state................................. 25
96	    5.5.  Operating in COMMUNICATIONS-INTERRUPTED state............. 25
97	    5.6.  Operating in PARTNER-DOWN state........................... 25
98	    5.7.  Operating in RECOVER state................................ 26
99	    6.  Packet Formats.............................................. 26
100	    6.1.  Common message format..................................... 26
101	    6.2.  Common option format...................................... 28
102	    6.3.  BNDUPD message format..................................... 40
103	    6.4.  BNDACK message format..................................... 42
104	    6.5.  Bulking for BNDUPD and BNDACK messages.................... 44
105	    6.6.  UPDREQ message format..................................... 44
106	    6.7.  UPDREQALL message format.................................. 44
107	    6.8.  UPDDONE message format.................................... 44
108	    6.9.  POOLREQ message format.................................... 45
109	    6.10.  POOLRESP message format.................................. 45
110	    6.11.  CONNECT message format................................... 46
111	    6.12.  CONNECTACK message format................................ 46
112	    6.13.  STATE message format..................................... 47
113	    6.14.  CONTACT message format................................... 48
114	    7.  Protocol Messages........................................... 48
115	    7.1.  BNDUPD message............................................ 48
116	    7.2.  BNDACK message............................................ 57
117	    7.3.  UPDREQ message............................................ 58
118	    7.4.  UPDREQALL message......................................... 59
119	    7.5.  UPDDONE message........................................... 60
120	    7.6.  POOLREQ message........................................... 60
121	    7.7.  POOLRESP message.......................................... 61
122	    7.8.  CONNECT message........................................... 62
123	    7.9.  CONNECTACK message........................................ 65
124	    7.10.  STATE message............................................ 68
125	    7.11.  CONTACT message.......................................... 69
126	    8.  Connection Management....................................... 70
127	    8.1.  Connection granularity.................................... 70
128	    8.2.  Creating the TCP connection............................... 70
129	    8.3.  Using the TCP connection for determining communications status. 71
130	    8.4.  Using the TCP connection for binding data................. 73
131	    8.5.  Using the TCP connection for control messages............. 73
132	    8.6.  Losing the TCP connection................................. 73
133	    9.  Protocol States............................................. 73
134	    9.1.  Server Initialization..................................... 74
135	    9.2.  Server State Transitions.................................. 74
136	    9.3.  STARTUP state............................................. 77
137	    9.4.  PARTNER-DOWN state........................................ 79
138	    9.5.  RECOVER state............................................. 81
139	    9.6.  NORMAL state.............................................. 83
140	    9.7.  COMMUNICATIONS-INTERRUPTED State.......................... 86
141	    9.8.  POTENTIAL-CONFLICT state.................................. 89
142	    9.9.  RECOVER-DONE state........................................ 90
143	    9.10.  PAUSED state............................................. 91
144	    9.11.  SHUTDOWN state........................................... 91
145	    10.  Safe Period................................................ 92
146	    11.  Security................................................... 94
147	    11.1.  Simple shared secret..................................... 94
148	    11.2.  TLS...................................................... 94
149	    12.  Hash algorithm for load balancing.......................... 95
150	    13.  Acknowledgments............................................ 96
151	    14.  References................................................. 97
152	    15.  Author's information....................................... 98
153	    16.  Full Copyright Statement................................... 99

155	1.  Introduction

157	   DHCP [RFC 2131] allows for multiple servers to be operating on a sin-
158	   gle network.  Some sites are interested in running multiple servers
159	   in such a way so as to provide redundancy in case of server failure
160	   since the DHCP subsystem is in many cases a critical part of the net-
161	   work infrastructure.

163	   This document defines a protocol to provide synchronization between
164	   two servers in order that each can take over for the other should
165	   either one fail or become unreachable.

167	   One server is designated the "primary" server, the other is the
168	   "secondary" server, and all DHCP client requests are sent to each
169	   server.

171	   In order to provide a high availability DHCP service, these
172	   cooperating primary and secondary servers must maintain a consistent
173	   database of lease information.  This implies that servers will need
174	   to coordinate any and all lease activity so that this information is
175	   synchronized in case failover is required.  The protocol messages and
176	   processing techniques required to maintain a consistent database are
177	   specified in the protocol described here.

179	   The failover protocol also contains an algorithm which allows each
180	   server to determine to which DHCP clients it should provide service
181	   when both servers are operating normally, and this capability can be
182	   used to support load balancing.

184	2.  Terminology

186	   This section discusses both the generic requirements terminology com-
187	   mon to many IETF protocol specifications as well as specialized DHCP
188	   and failover protocol specific terminology.

190	2.1.  Requirements terminology

192	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
193	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
194	   document are to be interpreted as described in RFC 2119 [RFC 2119].

196	2.2.  DHCP and failover terminology

198	   This document uses the following terms:

200	      o "DHCP client" or "client"

202	        A DHCP client is an Internet host using DHCP to obtain confi-
203	        guration parameters such as a network address.

205	      o "DHCP server" or "server"

207	        A DHCP server is an Internet host that returns configuration
208	        parameters to DHCP clients.

210	      o "binding"

212	        A binding is a collection of configuration parameters, including
213	        at least an IP address, associated with or "bound to" a DHCP
214	        client.  Bindings are managed by DHCP servers.

216	      o "binding database"

218	        The collection of bindings managed by a primary and secondary.

220	      o "failover endpoint"

222	        The failover protocol allows for there to be a unique failover
223	        endpoint per partner per role (where role is primary or secon-
224	        dary).  This failover endpoint can take actions and hold unique
225	        states.  There are thus a maximum of two failover endpoints per
226	        server per partner (one for each partner as a primary and one
227	        for that same partner as a secondary.)

229	      o "lazy update"
230	        Lazy update refers to the requirement placed on a server imple-
231	        menting a failover protocol to update its failover partner when-
232	        ever the binding database changes.  A failover protocol which
233	        didn't support lazy update would require the failover partner
234	        update to be complete before a DHCP server could respond to a
235	        DHCP client request with a DHCPACK.  A failover protocol which
236	        does support lazy update places no such restriction on the
237	        update of the failover partner server, and so a server can allo-
238	        cate an IP address or extend a lease on an IP address and then
239	        update its failover partner as time permits.  A failover proto-
240	        col which supports lazy update not only removes the requirement
241	        to update the failover partner prior to responding to a DHCP
242	        client with a DHCPACK, but also allows gathering up batches of
243	        updates from one failover server to its partner.

245	      o "subnet address pool"

247	        A subnet address pool is the set of IP addresses associated
248	        with a particular network number and subnet mask.  In the
249	        simple case, there is a single network number and subnet mask
250	        and a set of IP addresses.  In the more complex case (sometimes
251	        called "secondary subnets", sometimes "superscopes"), several
252	        (apparently unrelated) network number and subnet mask
253	        combinations with their associated IP addresses may all be
254	        configured together into one subnet address pool.

256	      o "Primary server" or "Primary"

258	        A DHCP server configured to provide primary service to a set of
259	        DHCP clients for a particular set of subnet address pools.

261	      o "Secondary server" or "Secondary"

263	        A DHCP server configured to act as backup to a primary server
264	        for a particular set of subnet address pools.

266	      o "stable storage"

268	        Every DHCP server is assumed to have some form of what is called
269	        "stable storage".  Stable storage is used to hold information
270	        concerning IP address bindings (among other things) so that this
271	        information is not lost in the event of a server failure which
272	        requires restart of the server.

274	      o "MCLT"

276	        The MCLT refers to maximum client lead time.  This time is con-
277	        figured on the primary server and transmitted from the primary
278	        to the secondary server in the CONNECT message.  It is the max-
279	        imum amount of time that one server can give to a client for a
280	        binding beyond that known and ACKed by the partner server.  See
281	        section 5.2.1 for details.
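
   The bound that the MCLT places on a lease can be sketched as follows.
   This is an illustration only, not part of the protocol definition.
   The function name, the use of integer seconds, and the choice to
   measure the lead time from "now" when the partner has acknowledged
   nothing later are all assumptions made for the example.

```python
def grant_expiration(now, desired_expiration, partner_acked_expiration, mclt):
    """Return the expiration time a server may give a client for a binding.

    now                       -- current time, in seconds
    desired_expiration        -- expiration the server would like to grant
    partner_acked_expiration  -- latest expiration for this binding that the
                                 partner server knows about and has ACKed
    mclt                      -- maximum client lead time, in seconds
    """
    # Assumption for this sketch: if the partner has acknowledged nothing
    # later than "now", the lead time is measured from "now".
    base = max(now, partner_acked_expiration)
    # Never exceed the partner-acknowledged time by more than the MCLT.
    return min(desired_expiration, base + mclt)
```

   For instance, with an MCLT of 3600 seconds and nothing acknowledged
   by the partner, a request for a one-day lease at time 1000 would be
   limited to an expiration time of 4600.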

283	3.  Background and External Requirements

285	   This section highlights key aspects of the DHCP protocol on which the
286	   failover protocol depends.  It also discusses the requirements that
287	   the failover protocol places on other aspects of the network infras-
288	   tructure, and some general issues surrounding server failure detec-
289	   tion.  Some failure scenarios that provide particular challenges to a
290	   failover protocol are discussed.  Finally, the challenges inherent in
291	   using a TCP connection as a means to detect failure of a partner
292	   server are elaborated.

294	3.1.  Key aspects of the DHCP protocol

296	   The failover protocol is designed to augment the DHCP protocol as
297	   described in RFC 2131 [RFC 2131].  There are several key aspects of
298	   the DHCP protocol which are required by the failover protocol in
299	   order to successfully meet its design goals.

301	3.1.1.  Broadcast behavior

303	   There are two aspects of the broadcast behavior of the DHCP protocol
304	   which are key to making the failover protocol operate successfully.
305	   The first is simply that the DHCP protocol requires a DHCP client to
306	   broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages.
307	   Because of this requirement, a DHCP client who was communicating with
308	   one server will automatically be able to communicate with another
309	   server if one is available.

311	   The second aspect of broadcast behavior is similar to the first, but
312	   involves the distinction between a DHCPREQUEST/RENEW and
313	   DHCPREQUEST/REBINDING.  A DHCPREQUEST/RENEW is the message that a
314	   DHCP client uses to extend its lease.  It is unicast to the DHCP
315	   server from which it acquired the lease.  However, the DHCP protocol
316	   (in a farsighted move) was explicitly designed so that in the event
317	   that a DHCP client cannot contact the server from which it received a
318	   lease on an IP address using a DHCPREQUEST/RENEW, the client is
319	   required to broadcast its renewal using a DHCPREQUEST/REBINDING to
320	   any available DHCP server.  Since all DHCP clients were required to
321	   implement this algorithm, the failover protocol can have a different
322	   server from the one that initially granted a lease be the server to
323	   renew a lease.  Thus, one server can take over for another with no
324	   interruption in the service as experienced by the DHCP client or its
325	   associated applications software.
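
   The client-side choice between a unicast RENEW and a broadcast
   REBINDING can be sketched as below.  The T1 and T2 fractions are the
   defaults given in RFC 2131 (0.5 and 0.875 of the lease time); the
   enum and function names are hypothetical and exist only for this
   illustration.

```python
from enum import Enum


class RequestKind(Enum):
    NONE = "no request needed yet"
    RENEW_UNICAST = "DHCPREQUEST/RENEW, unicast to the granting server"
    REBIND_BROADCAST = "DHCPREQUEST/REBINDING, broadcast to any server"
    EXPIRED = "lease expired, client must stop using the address"


def next_request(elapsed, lease_time, t1_fraction=0.5, t2_fraction=0.875):
    """Decide what a client should send, given seconds elapsed on its lease."""
    if elapsed >= lease_time:
        return RequestKind.EXPIRED
    if elapsed >= t2_fraction * lease_time:
        # The granting server did not answer the unicast RENEW in time,
        # so any available server (e.g. the failover partner) may answer.
        return RequestKind.REBIND_BROADCAST
    if elapsed >= t1_fraction * lease_time:
        return RequestKind.RENEW_UNICAST
    return RequestKind.NONE
```

   The REBIND_BROADCAST branch is the one the failover protocol relies
   on: it is what lets a server other than the original grantor extend
   the lease.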

327	3.1.2.  Client responsibility

329	   In the DHCP protocol the DHCP clients are entrusted with a consider-
330	   able responsibility.  In particular, after they are granted a lease
331	   on an IP address, they are enjoined to only use that IP address while
332	   their lease is valid.  Every DHCP client is expected to stop using an
333	   IP address if the expiration time on the lease has passed and if it
334	   cannot get an extension on the lease for that IP address from some
335	   DHCP server.  Thus, the correct behavior of every DHCP client in this
336	   regard is required to ensure the integrity of the DHCP service.  On
337	   the other hand, incorrect behavior by a client in this area will tend
338	   to adversely affect at most one other DHCP client.

340	   Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or
341	   DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or
342	   broadcast for a REBINDING) MUST still have time to run on the lease
343	   for that IP address.  The DHCP server sends the DHCPACK back unicast
344	   to the IP address from which the RENEW or REBINDING originated.

346	   Given the existing responsibility placed on the client to only use an
347	   IP address when the lease is valid, and to only send in a RENEW or
348	   REBINDING if the lease is valid, the failover protocol relies on DHCP
349	   clients to perform responsibly and will, in the absence of conflict-
350	   ing information, believe a DHCP client that is attempting to RENEW or
351	   REBIND a lease on an IP address is the legitimate owner of that IP
352	   address.

354	   One troublesome issue is that of the DHCP client responsibility when
355	   sending in DHCPREQUEST/INIT-REBOOT requests.  While the original DHCP
356	   RFC was written to require a DHCP client to have time left to run on
357	   the lease for an IP address if the client is sending an INIT-REBOOT
358	   request, it was sufficiently unclear that some client vendors didn't
359	   realize this until recently.  Since the INIT-REBOOT request was sent
360	   with the IP address in the dhcp-requested-address option and not in
361	   the ciaddr (for perfectly good reasons), the similarity to the RENEW
362	   and REBINDING case was lost on many people.

364	   At present, the failover protocol does not assume that a client send-
365	   ing in an INIT-REBOOT request necessarily has a valid lease on the IP
366	   address appearing in the dhcp-requested-address option in the INIT-
367	   REBOOT request.

369	   The implications of this are as follows: Assume that there is a DHCP
370	   client that gets a lease from one server while that server is unable
371	   to communicate with its failover partner.  Then, assume that after
372	   that client reboots it is able only to communicate with the other
373	   failover server.  If the failover servers have not been able to com-
374	   municate with each other during this process, then the DHCP client
375	   will get a new IP address instead of being able to continue to use
376	   its existing IP address. This will affect no applications on the DHCP
377	   client, since it is rebooting.  However, it will use up an additional
378	   IP address in this marginal case.

380	3.1.3.  Stable storage update before DHCPACK

382	   The DHCP protocol allocates resources, and in order to operate
383	   correctly it requires that a DHCP server update some form of stable
384	   storage prior to sending a DHCPACK to a DHCP client in order to grant
385	   that client a lease on an IP address.

387	   One of the goals of the failover protocol is that it not add signifi-
388	   cant additional time to this already time consuming requirement to
389	   update stable storage prior to a DHCPACK.  In particular, adding a
390	   requirement to communicate with another server prior to sending a
391	   DHCPACK would simplify the failover protocol, but it would limit the
392	   potential scalability of any DHCP server which employed the failover
393	   protocol in an unacceptable manner.
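
   The ordering described above (stable storage first, then the
   DHCPACK, with the partner updated afterwards) can be sketched with a
   toy model.  The class, method, and event names are hypothetical;
   the sketch records only the order of operations and sends no real
   messages.

```python
class LazyUpdateServer:
    """Toy model of a failover server that supports lazy update."""

    def __init__(self):
        self.stable_storage = {}   # ip -> (client_id, expiration)
        self.pending_updates = []  # binding changes not yet sent to partner
        self.events = []           # ordered record of what happened

    def grant_lease(self, client_id, ip, expiration):
        # 1. Commit the binding to stable storage first.
        self.stable_storage[ip] = (client_id, expiration)
        self.events.append("stable-storage-write")
        # 2. Only then answer the client; no partner round trip is needed.
        self.events.append("DHCPACK")
        # 3. Remember the change so the partner can be updated later.
        self.pending_updates.append((ip, client_id, expiration))

    def flush_updates(self):
        """Hand back everything queued, as one batch for the partner."""
        batch, self.pending_updates = self.pending_updates, []
        if batch:
            self.events.append("partner-update(%d)" % len(batch))
        return batch
```

   Because grant_lease() never waits on the partner, the time to answer
   a client is bounded by the stable storage write alone, and several
   binding changes can be gathered into a single batch.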

395	3.2.  BOOTP relay agent implementation

397	   Many DHCP clients are not resident on the same network segment as a
398	   DHCP server.  In order to support this form of network architecture,
399	   most contemporary routers implement something known as a BOOTP Relay
400	   Agent.  This capability inside of a router listens for all broadcasts
401	   at the DHCP port, port 67, and will relay any broadcasts that it
402	   receives on to a DHCP server.  The IP address of the DHCP server must
403	   have been previously configured into the router.  As part of the
404	   relay process, the relay agent will place the address of the inter-
405	   face on which it received the broadcast into the giaddr field of the
406	   DHCP packet.

408	   Since the failover protocol requires two DHCP servers to receive any
409	   broadcast DHCP messages, in order to work with DHCP clients which are
410	   not local to the DHCP server, the BOOTP relay agent on the router
411	   closest to the DHCP client must be configured to point at more than
412	   one DHCP server.

414	   Most BOOTP relay agent implementations allow this duplication of
415	   packets.
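
   A relay agent that duplicates a client broadcast to both failover
   servers might behave roughly as follows.  This is a sketch under
   stated assumptions: the DhcpPacket type and relay_broadcast function
   are hypothetical, the addresses are documentation addresses, and
   giaddr is filled in only when it is still zero, as RFC 1542 requires
   of relay agents.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DhcpPacket:
    giaddr: str = "0.0.0.0"   # relay agent address field
    payload: bytes = b""      # remainder of the message, left opaque here


def relay_broadcast(packet, receiving_interface_addr, server_addrs):
    """Return (server, packet) pairs to unicast for one client broadcast."""
    if packet.giaddr == "0.0.0.0":
        # Record the interface on which the broadcast was received.
        packet = replace(packet, giaddr=receiving_interface_addr)
    # One copy per configured DHCP server, so both failover partners see it.
    return [(server, packet) for server in server_addrs]
```

   Configuring server_addrs with the addresses of both the primary and
   the secondary is what satisfies the requirement above.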

417	   If this is not possible, an administrator might be able to configure
418	   the relay agent with a subnet broadcast address, but in this case the
419	   primary and secondary DHCP servers in a failover pair must both
420	   reside on the same subnet.   While this is a realistic configuration,
421	   it is not the one that most people will use.

423	3.3.  What does it mean if a server can't communicate with its partner?

425	   In any protocol designed to allow one server to take over some
426	   responsibilities from a partner server in the event of "failure" of
427	   that partner server, there is an inherent difficulty in determining
428	   when that partner server has failed.

430	   In fact, it is fundamentally impossible for one server to distinguish
431	   a network communications failure from the outright failure of the
432	   server with which it is trying to communicate.  In the case where
433	   each server is handing out resources (in this case IP addresses) to a
434	   client community, mistaking an inability to communicate with a
435	   partner server for failure of that partner server could easily cause
436	   both servers to be handing out the same IP addresses to different
437	   clients.

439	   One way that this is sometimes handled is for there to be more than
440	   two servers.  In the case of an odd number of servers, the servers
441	   that can still communicate with a majority of other servers will con-
442	   sider themselves operational, and any server which can't communicate
443	   with a majority of other servers must immediately cease operations.

445	   While this technique works in some domains, having the only server
446	   with which a DHCP client can communicate voluntarily shut itself
447	   down seems like something worth avoiding.

449	   The failover protocol will operate correctly while the two servers
450	   are unable to communicate with each other, whether or not both are
451	   running.  At some point there may be resource contention, and if one
452	   of the servers is actually down, then the operator can inform the
453	   other server and the operational server will be able to use all of
454	   the downed server's resources.

456	   The protocol also allows detection of an orderly shutdown of a parti-
457	   cipating server.

459	3.4.  Challenging scenarios for a Failover protocol

461	   There exist two failure scenarios which pose particular challenges
462	   to the correctness guarantees of a failover protocol.

464	3.4.1.  Primary Server crash before "lazy" update:

466	   In the case where the primary server sends a DHCPACK to a client for
467	   a newly allocated IP address and then crashes prior to sending the
468	   corresponding update to the secondary server, the secondary server
469	   will have no record of the IP address allocation.  When the secondary
470	   server takes over, it may well try to allocate that IP address to a
471	   different client.  If the first client to receive the IP address is
472	   not on the network at that time (even though its lease has not yet
473	   expired), an ICMP echo (i.e., ping) will not prevent the secondary
474	   server from allocating that IP address to a different client.

477	   The failover protocol deals with this situation by having the primary
478	   and secondary servers allocate addresses for new clients from dis-
479	   joint address pools.  See section 5.4 for details.

481	   A more likely (in that DHCPRENEWs are presumably more common than
482	   DHCPDISCOVERs) and more subtle version of this problem is where the
483	   primary server crashes after extending a client's lease time, and
484	   before updating the secondary with a new time using a lazy update.
485	   After the secondary takes over, if the client is not connected to the
486	   network the secondary will believe the client's lease has expired
487	   when, in fact, it has not.  In this case as well, the IP address
488	   might be reallocated to a different client while the first client is
489	   still using it.

491	   This scenario is handled by the failover protocol through control of
492	   the lease time and the use of the maximum client lead time (MCLT).
493	   See section 5.2.1 for details.

495	3.4.2.  Network partition where DHCP servers can't communicate but each
496	can talk to clients:

498	   Several conditions are required for this situation to occur.  First,
499	   due to a network failure, the primary and secondary servers cannot
500	   communicate.  As well, some of the DHCP clients must be able to com-
501	   municate with the primary server, and some of the clients must now
502	   only be able to communicate with the secondary server.  When this
503	   condition occurs, both primary and secondary servers could attempt to
504	   allocate IP addresses for new clients from the same pool of available
505	   addresses.  At some point, then, two clients will end up being allo-
506	   cated the same IP address.  This will cause problems when the network
507	   failure that created this situation is corrected.

509	   The failover protocol deals with this situation by having the primary
510	   and secondary servers allocate addresses for new clients from dis-
511	   joint address pools.  See section 5.4 for details.
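
	   As an illustration only, the disjoint-pool idea can be sketched in
	   Python.  The Allocator class, the split_pool helper, the pool sizes
	   and the 192.0.2.x addresses are assumptions made for this sketch;
	   they are not defined by this protocol.

```python
from ipaddress import IPv4Address


class Allocator:
    """Hands out addresses for new clients from a private pool only."""

    def __init__(self, pool):
        self.free = list(pool)   # addresses this server alone may allocate
        self.bindings = {}       # client id -> address

    def allocate(self, client_id):
        if client_id in self.bindings:
            return self.bindings[client_id]
        if not self.free:
            return None          # pool exhausted; nothing to offer
        addr = self.free.pop(0)
        self.bindings[client_id] = addr
        return addr


def split_pool(addresses, secondary_count):
    """Split one address list into disjoint primary and secondary pools."""
    addresses = list(addresses)
    cut = len(addresses) - secondary_count
    return addresses[:cut], addresses[cut:]


# Hypothetical 10-address range; the last 3 are set aside for the secondary.
all_addrs = [IPv4Address("192.0.2.10") + i for i in range(10)]
primary_pool, secondary_pool = split_pool(all_addrs, 3)
primary = Allocator(primary_pool)
secondary = Allocator(secondary_pool)

# During a partition each server serves its own new clients independently.
addr_a = primary.allocate("client-a")
addr_b = secondary.allocate("client-b")
```

	   Because the two pools share no address, the two servers cannot hand
	   the same address to different new clients, even when they cannot
	   exchange updates.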

513	3.5.  Using TCP to detect partner server failure

515	   There are several characteristics of TCP that are important to the
516	   functioning of the failover protocol, which uses one TCP connection
517	   both for bulk data transfer and to assess communications integrity
518	   with the other server.  Reliable and ordered message delivery are
519	   chief among these important characteristics.

521	   It would be nice to use the capabilities built in to TCP to allow it
522	   to determine if communications integrity exists to the failover
523	   partner but this strategy contains some problems which require
524	   analysis.  There exist three fundamental cases for an open TCP con-
525	   nection that must be examined.

527	      1.  When no data is being sent then no messages are traveling
528	          across the TCP connection.

530	      2.  When data is queued to be sent, and the receiver has not
531	          blocked the sending of additional data, then messages are
532	          flowing across the TCP connection containing the application's
533	          data.

535	      3.  When data is queued to be sent, and the receiver has blocked
536	          the transmission of additional data, then persist messages
537	          (window probes and their acknowledgements) are exchanged so
538	          that the sender doesn't miss the receiver opening the window
539	          for further transmissions.

541	   The first case can be turned into the second case by sending
542	   application-level keep-alive messages periodically when there is no
543	   other data queued to be sent.  Note that TCP keep-alive messages
544	   might be used as well, but they present additional problems.

546	   Thus, we can ensure that the TCP connection has messages flowing
547	   periodically across the connection fairly easily.  The question
548	   remains as to what TCP will do if the other end of the connection
549	   fails to respond (either because of network partition or because the
550	   receiving server crashes). TCP will attempt to retransmit a message
551	   with an exponential backoff, and will eventually time out that
552	   retransmission.  However, the length of that timeout cannot, in gen-
553	   eral, be set on a per-connection basis, and is frequently as long as
554	   nine minutes, though in some cases it may be as short as two minutes.
555	   On some systems it can be set system-wide, while on some systems it
556	   cannot be changed at all.

558	   A value for this timeout that would be appropriate for the failover
559	   protocol, say less than 1 minute, could have unpleasant side-effects
560	   on other applications running on the same server, assuming that it
561	   could be changed at all on the host operating system.

563	   Nine minutes is a long time for the DHCP service to be unavailable to
564	   any new clients that were being served by the server which has
565	   crashed, when there is another server running that could respond to
566	   them immediately as soon as it determines that its partner is not
567	   operational.

569	   The conclusion drawn from this analysis is that TCP provides very
570	   useful support for the failover protocol in the areas of reliable and
571	   ordered message delivery, but cannot by itself be relied upon to
572	   detect partner server failure in a fashion acceptable to the needs of
573	   the failover protocol.  Additional failover protocol capabilities
574	   will need to be created to support timely detection of partner server
575	   failure.  See section 8.3 for details on this mechanism.
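
	   A minimal Python sketch of application-level failure detection
	   follows.  It is not the mechanism of section 8.3; the PartnerMonitor
	   class, its method names, and the 10 and 60 second values used with
	   it are assumptions chosen only to show the idea of timing out on
	   silence rather than waiting for TCP to give up.

```python
class PartnerMonitor:
    """Tracks when the partner was last heard from on the connection."""

    def __init__(self, contact_interval, max_silence, now):
        self.contact_interval = contact_interval  # seconds between keep-alives
        self.max_silence = max_silence            # tolerated silence, seconds
        self.last_received = now
        self.last_sent = now

    def on_message_received(self, now):
        # Any message from the partner shows that the path is working.
        self.last_received = now

    def on_message_sent(self, now):
        self.last_sent = now

    def should_send_keepalive(self, now):
        # A keep-alive is only needed when no other traffic went out lately.
        return now - self.last_sent >= self.contact_interval

    def partner_unreachable(self, now):
        # Declared by the application, independent of TCP's own timeout.
        return now - self.last_received > self.max_silence
```

	   With a keep-alive interval of 10 seconds and a tolerated silence of
	   60 seconds, a partner that stops responding is noticed after about
	   one minute instead of the several minutes TCP retransmission may
	   take.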

577	4.  Design Goals

579	   This section lists the design requirements, the design goals, and the
580	   limitations of the failover protocol.

582	4.1.  Design requirements for this protocol

584	   The following list of requirements must be (and are) met by this pro-
585	   tocol.  They are listed in priority order.

587	      1.  Implementations of this protocol must work with existing DHCP
588	          client implementations based on the DHCP protocol [1].

590	      2.  Implementations of the protocol must work with existing BOOTP
591	          relay agent implementations.

593	      3.  The protocol must provide failover redundancy between servers
594	          that are not located on the same subnet.

596	4.2.  Goals for this protocol

598	   The following goals are met by this protocol as well, though they are
599	   less important than the requirements listed above. These goals are
600	   listed in priority order.

602	      1.  Provide for continued service to DHCP clients through an
603	          automated mechanism in the event of failure of the primary
604	          server.

606	      2.  Avoid binding an IP address to a client while that binding is
607	          currently valid for another client.  In other words, do not
608	          allocate the same IP address to two clients.

610	      3.  Minimize any need for manual administrative intervention.

612	      4.  Introduce no additional delays in server response time as a
613	          result of the network communications required to implement the
614	          failover protocol, i.e., don't require communications with the
615	          partner between the receipt of a DHCPREQUEST and the
616	          corresponding DHCPACK.

618	      5.  Share IP address ranges between primary and secondary servers;
619	          i.e., impose no requirement that the pool of available
620	          addresses be divided between servers.

622	      6.  Continue to meet the goals and objectives of this protocol in
623	          the event of server failure or network partition.

625	      7.  Provide graceful reintegration of full protocol service after
626	          server failure or network partition.

628	      8.  Allow for one computer to act as a secondary server for
629	          multiple primary servers.  Other topologies (e.g., mesh) are
630	          also possible.  Primary and secondary servers SHOULD be viewed
631	          as "logical" servers and not necessarily physical computers.

633	      9.  Ensure that an existing client can keep its existing IP
634	          address binding if it can communicate with either the primary
635	          or secondary DHCP server implementing this protocol - not just
636	          whichever server originally offered it the binding.

638	      10. Ensure that a new client can get an IP address from some
639	          server. Ensure that in the face of partition, where servers
640	          continue to run but cannot communicate with each other, the
641	          above goals and requirements may be met. In addition, when the
642	          partition condition is removed, allow graceful automatic re-
643	          integration without requiring human intervention.

645	      11. If either primary or secondary server loses all of the
646	          information that it has stored in stable storage, it should be
647	          able to refresh its stable storage from the other server.

649	      12. Support load balancing between the primary and secondary
650	          servers, and allow configuration of the percentage of the
651	          client population served by each with a moderately fine granu-
652	          larity.

654	4.3.  Limitations of this Protocol

656	   The following are explicit limitations of this protocol.

658	      1.  This protocol provides only one level of redundancy through a
659	          single secondary server for each primary server.

661	      2.  A subset of the address pool is reserved for secondary server
662	          use.  In order to handle the failure case where both servers
663	          are able to communicate with DHCP clients, but unable to com-
664	          municate with each other, a subset of the IP address pool must
665	          be set aside as a private address pool for the secondary
666	          server. The secondary can use these to service newly arrived
667	          DHCP clients during such a period.  The size of this private
668	          pool SHOULD be based only on the arrival rate of new DHCP
669	          clients and the length of expected downtime, and is not influ-
670	          enced in any way by the total number of DHCP clients supported
671	          by the server pair.

673	      3.  The primary and secondary servers do not respond to client
674	          requests at all while recovering from a failure that could
675	          have resulted in duplicate IP assignments (that is, while
676	          synchronizing in POTENTIAL-CONFLICT state).

678	5.  Protocol Overview

680	   This section discusses the failover protocol at a relatively high
681	   level.  In the event that a description in this section conflicts
682	   (or appears to conflict due to the overview nature of this section)
683	   with information in later sections of this draft, the information
684	   in the later sections should be considered authoritative.

686	5.1.  Messages and States

688	   This protocol is centered around the message exchange used by one
689	   server to update the other server of binding database changes result-
690	   ing from DHCP client activity:

692	      o Communication of binding database changes

694	        The binding update (BNDUPD) message is used to send the binding
695	        database changes to the partner server, and the partner server
696	        responds with a binding acknowledgement (BNDACK) message when it
697	        has successfully committed those changes to its own stable
698	        storage.
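
	        The ordering described above (commit to stable storage first,
	        acknowledge second) can be sketched in Python.  The dictionary
	        layout of the messages, the handle_bndupd function, and the
	        append-only log file are assumptions for illustration; they
	        are not the wire format or storage design of this protocol.

```python
import json
import os
import tempfile


def handle_bndupd(update, store_path):
    """Persist a binding update, and only then build the acknowledgement."""
    with open(store_path, "a", encoding="utf-8") as store:
        store.write(json.dumps(update, sort_keys=True) + "\n")
        store.flush()
        os.fsync(store.fileno())   # force the record to stable storage
    # Reaching this point means the write completed, so it is safe to ack.
    return {
        "type": "BNDACK",
        "ip": update["ip"],
        "potential_expiration": update["potential_expiration"],
    }


# Hypothetical update, written to a throwaway file for demonstration.
update = {"type": "BNDUPD", "ip": "192.0.2.10", "potential_expiration": 1000}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bindings.log")
    ack = handle_bndupd(update, path)
    with open(path, encoding="utf-8") as store:
        stored = [json.loads(line) for line in store]
```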

700	   All of the other messages involve ancillary issues:

702	      o Management of available IP addresses

704	        The pool request (POOLREQ) is used by the secondary server to
705	        request an allocation of IP addresses from the primary server.

707	        The pool response (POOLRESP) is used by the primary server to
708	        inform the secondary server how many IP addresses it was allo-
709	        cated as the result of a pool request.

711	      o Synchronization of the binding databases between the servers
712	        after they've been out of communications

714	        The update request (UPDREQ) message is used by one server to
715	        request that its partner send it all binding database informa-
716	        tion that it has not already seen.  The update request all
717	        (UPDREQALL) message is used by one server to request that all
718	        binding database information be sent in order to recover from a
719	        total loss of its lease state database by the requesting server.
720	        The update done (UPDDONE) message is used by the responding
721	        server to indicate that all requested updates have been sent by
722	        the responding server and acked by the requesting server.

724	      o Connection establishment

726	        The connect (CONNECT) message is used by either server to estab-
727	        lish a high level connection with the other server, and to
728	        transmit several important configuration data items between the
729	        servers.  The connect acknowledgement message (CONNECTACK) is
730	        used to respond to a CONNECT message from another server.

732	      o Server synchronization

734	        The state change (STATE) message is used by either server to
735	        inform the other server of a change of failover state.

737	      o Connection integrity management

739	        The contact (CONTACT) message is used by either server to ensure
740	        that the other server continues to see the connection as opera-
741	        tional.  It MUST be transmitted periodically over every esta-
742	        blished connection if other message traffic is not flowing, and
743	        it MAY be sent at any time.

745	5.1.1.  Failover endpoints

747	   The proper operation of the failover protocol requires more than the
748	   transmission of messages between one server and the other.  Each end-
749	   point might seem to be a single DHCP server, but in fact there are
750	   many situations where additional flexibility in configuration is use-
751	   ful.

753	   For instance, there might be several servers which are each primary
754	   for a distinct set of address pools, and one server which is
755	   secondary for all of those address pools.  The situation with the
756	   primaries is straightforward, but the secondary will need to maintain
757	   a separate failover state, partner state, and communications up/down
758	   status for each of the separate primary servers for which it is act-
759	   ing as a secondary.

761	   The failover protocol calls for there to be a unique failover end-
762	   point per partner per role (where role is primary or secondary).
763	   This failover endpoint can take actions and hold unique states.
764	   There are thus a maximum of two failover endpoints per partner (one
765	   for the partner as a primary and one for that same partner as a
766	   secondary.)

768	   Thus, in the case where there are two primary servers A and B each
769	   backed up by a single common secondary server C, there is one fail-
770	   over endpoint on each of A and B, and two different failover end-
771	   points on C.  The two different failover endpoints on C each have
772	   unique states and independent TCP connections.

774	   This document describes the behavior of the protocol in terms of pri-
775	   mary and secondary servers, not primary and secondary failover end-
776	   points.  However, it is important to remember that every 'server'
777	   described in this document is in reality a failover endpoint that
778	   resides in a particular process, and that many failover endpoints may
779	   reside in the same process.

781	   It is not the case that there is a unique failover endpoint for each
782	   subnet that participates in a failover relationship.  On one server,
783	   there is one failover endpoint per partner per role, regardless of
784	   how many subnets or address pools are managed by that combination of
785	   partner and role.  Conversely, any given subnet or pool will be asso-
786	   ciated with exactly one failover endpoint on a single server.

788	   When a connection is received from the partner, the unique failover
789	   endpoint to which the message is directed is determined solely by the
790	   IP address of the partner and the setting of the SECONDARY bit in the
791	   'flags' field of the contact message.
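
	   One way to picture "one failover endpoint per partner per role" is
	   a table keyed by the pair (partner address, role).  The Python
	   sketch below is illustrative only: the class names, the "initial"
	   placeholder state, the pool labels, and the addresses are
	   assumptions, and the mapping from the SECONDARY bit to a role value
	   is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverEndpoint:
    partner_addr: str
    role: str                    # "primary" or "secondary"
    state: str = "initial"       # placeholder, not a protocol state name
    pools: list = field(default_factory=list)


class EndpointRegistry:
    """Holds at most one endpoint per (partner address, role) pair."""

    def __init__(self):
        self._endpoints = {}

    def add(self, partner_addr, role, pools=()):
        key = (partner_addr, role)
        if key in self._endpoints:
            raise ValueError(f"endpoint already exists for {key}")
        endpoint = FailoverEndpoint(partner_addr, role, pools=list(pools))
        self._endpoints[key] = endpoint
        return endpoint

    def lookup(self, partner_addr, role):
        # Each endpoint keeps its own state and connection status.
        return self._endpoints.get((partner_addr, role))

    def __len__(self):
        return len(self._endpoints)


# Server C acts as secondary for two primaries, A and B.
server_c = EndpointRegistry()
for_a = server_c.add("192.0.2.1", "secondary", pools=["pool-a"])
for_b = server_c.add("192.0.2.2", "secondary", pools=["pool-b"])
```

	   Any number of subnets or pools can hang off one endpoint, which is
	   why the key contains no subnet.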

793	   Throughout this document, the states and actions taken by "servers"
794	   are described.  The terms "server", "primary server", and "secondary
795	   server" are commonly used to describe the failover endpoint taking
796	   these states and performing these actions.  This description is
797	   wholly accurate only for the simplest of cases, where all of the
798	   address pools on one server are backed up by all of the address pools
799	   on another server.  In this case, there is a single failover endpoint
800	   in each server.  In all other cases, the term "server" is used to
801	   describe one of the two possible failover endpoints per partner.

803	5.2.  Fundamental restrictions

805	   This protocol places several fundamental restrictions on what one
806	   server can do in the absence of knowledge of the other server, and
807	   these restrictions are key to the correct operation of the protocol.

809	5.2.1.  Control of lease time

811	   The key problem with lazy update is that when a server fails
812	   after updating a client with a particular lease time and before
813	   updating its partner, the partner will believe that a lease has
814	   expired even though the client still retains a valid lease on that IP
815	   address.

817	   In order to handle this problem, a period of time known as the "Max-
818	   imum Client Lead Time" (MCLT) is defined and must be known to both
819	   the primary and secondary servers.  Proper use of this time interval
820	   places an upper bound on the difference allowed between the lease
821	   time provided to a DHCP client by a server and the lease time known
822	   by that server's partner. However, the MCLT is typically much less
823	   than the lease time that a server has been configured to offer a
824	   client, and so some strategy must exist to allow a server to offer
825	   the configured lease time to a client.  During a lazy update the
826	   updating server typically updates its partner with a potential
827	   expiration time which is longer than the lease time previously given
828	   to the client and which is longer than the lease time that the server
829	   has been configured to give a client.  This allows that server to
830	   give a longer lease time to the client the next time the client
831	   renews its lease, since the time that it will give to the client will
832	   not exceed the MCLT beyond the potential expiration time acknowledged
833	   by the partner.

835	   When moving to the PARTNER-DOWN state (where a server is allowed to
836	   reallocate the partner's IP addresses), a server will wait the Max-
837	   imum Client Lead Time before allocating any IP addresses from its
838	   partner's pool to any new DHCP clients.  Thus, any clients which have
839	   a lease on an IP address with a lease time greater than that known by
840	   the server moving into PARTNER-DOWN state will either have contacted
841	   that server during the MCLT period or their leases will have expired.

843	   When a server has transitioned to PARTNER-DOWN state, it MUST NOT
844	   reallocate an IP address from one client to another client until an
845	   additional maximum client lead time interval after the lease by the
846	   original client expires. (Actually, until the maximum client lead
847	   time after what it believes to be the lease expiration time of the
848	   first client.)

850	   Some optimizations exist for this restriction, in that it only
851	   applies to leases that were issued BEFORE entering PARTNER-DOWN. Once
852	   a server has entered PARTNER-DOWN and it leases out an address, it
853	   need not wait this time as long as it has never communicated with the
854	   partner since the lease was given out.

856	   The fundamental relationship on which much of the correctness of this
857	   protocol depends is that the lease expiration time known to a DHCP
858	   client MUST NOT be more than the maximum client lead time greater
859	   than the potential expiration time known to a server's partner.

861	   The remainder of this section makes the above fundamental relation-
862	   ship more explicit.

864	   This protocol requires a DHCP server to deal with several different
865	   lease intervals and places specific restrictions on their relation-
866	   ships. The purpose of these restrictions is to allow the other server
867	   in the pair to be able to make certain assumptions in the absence of
868	   an ability to communicate between servers.

870	   The different lease times are:

872	      o desired lease interval

874	        The desired lease interval is the lease interval that a DHCP
875	        server would like to give to a DHCP client in the absence of any
876	        restrictions imposed by the Failover protocol.  Its determina-
877	        tion is outside of the scope of this protocol. Typically this is
878	        the result of external configuration of a DHCP server.

880	      o actual lease interval

882	        The actual lease interval is the lease interval that a DHCP
883	        server gives out to a DHCP client in the dhcp-lease-time option
884	        of a DHCPACK packet.  It may be shorter than the desired client
885	        lease interval (as explained below).

887	      o potential lease interval

889	        The potential lease interval is the lease expiration interval
890	        the local server tells to its partner in the potential-
891	        expiration-time option of a BNDUPD message.

893	      o acknowledged potential lease interval

895	        The acknowledged potential lease interval is the potential lease
896	        interval the partner server has most recently acknowledged in
897	        the potential-expiration-time option of a BNDACK message.

899	   The key restriction (and guarantee) that any server makes with
900	   respect to lease intervals is that the actual client lease interval
901	   never exceeds the acknowledged potential lease interval (if any) by
902	   more than a fixed amount.  This fixed amount is called the "Maximum
903	   Client Lead Time" (MCLT).

905	   The MCLT MAY be configurable on the primary server, but for correct
906	   server operation it MUST be the same and known to both the primary
907	   and secondary servers.  The secondary server determines the MCLT from
908	   the MCLT option sent from the primary server to the secondary server
909	   in the CONNECT or CONNECTACK message.

911	   A server MUST record in its stable storage both the actual lease
912	   interval and the most recently acknowledged potential lease interval
913	   for each IP address binding.  It is assumed that the desired client
914	   lease interval can be determined through techniques outside of the
915	   scope of this protocol.

917	   Again, the fundamental relationship among these times which MUST be
918	   maintained is:

920	       actual lease interval <
921	       ( acknowledged potential lease interval + MCLT )

923	   Figure 5.1-1 illustrates an initial lease to a client using the rules
924	   discussed in the example which follows it.

926	              DHCP                 Primary             Secondary
927	       time   Client               Server               Server

929	                | (time in intervals) |  (absolute time)   |
930	                |                     |                    |
931	                | >-DHCPDISCOVER->    |                    |
932	                |     <---DHCPOFFER-< |                    |
933	                |                     |                    |
934	                | >-DHCPREQUEST->     |                    |
935	                |   (selecting)       |                    |
936	                |                     |                    |
937	         t      |  <--------DHCPACK-< |                    |
938	                |  lease-time=MCLT    |                    |
939	                |                     |    >-BNDUPD-->     |
940	                |                     |  lease-expiration=t+MCLT
941	                |                     |  potential-expiration=t+(MCLT/2)+X
942	                |                     |                    |
943	                |                     |     <-BNDACK-<     |
944	                |                     |  potential-expiration=t+(MCLT/2)+X
945	               ...                   ...                  ...
946	                |                     |                    |
947	      t+MCLT/2  | >-DHCPREQUEST->     |                    |
948	                |      (renew)        |                    |
949	                |                     |                    |
950	         t1     |  <--------DHCPACK-< |                    |
951	                |   lease-time=X      |                    |
952	                |                     |    >-BNDUPD-->     |
953	                |                     |  lease-expiration=t1+X
954	                |                     |  potential-expiration=t1+(X/2)+X
955	                |                     |                    |
956	                |                     |     <-BNDACK-<     |
957	                |                     |  potential-expiration=t1+(X/2)+X
958	               ...                   ...                  ...

960	           Figure 5.1-1:  Lazy Update Message Traffic
961	                          X = Desired Lease Interval

963	   DISCUSSION:

965	      This protocol mandates no algorithm for these lease intervals,
966	      as long as the above fundamental relationship is preserved.

968	      In the interests of clarity, however, let's examine a specific
969	      example.  The MCLT in this case is 1 hour.  The desired lease
970	      interval is 3 days, and its renewal time is half the lease inter-
971	      val.

973	      The rules for this example are:

975	      o What to tell the client:

977	        Take the remainder of the acknowledged potential lease interval.
978	        If this is a new lease, then this value will be zero.  If this
979	        remainder plus the MCLT is greater than the desired lease
980	        interval, give the client the desired lease interval; otherwise
981	        give the client the remainder plus the MCLT.

983	      o What to tell the failover partner server:

985	        Take the renewal interval (typically half of the actual client
986	        lease interval), add to it the desired lease interval, and add
987	        it to the current time to yield the value that goes into the
988	        potential-expiration-time option.

990	        Also tell the failover partner the actual lease interval by
991	        adding it to the current time to yield the value that goes into
992	        the lease-expiration option.

994	      In operation this might work as follows:

996	      When a server makes an offer for a new lease on an IP address to a
997	      DHCP client, it determines the desired lease interval (in this
998	      case, 3 days).  It then examines the acknowledged potential lease
999	      interval (which in this case is zero) and determines the remainder
1000	      of the time left to run, which is also zero.  To this it adds the
1001	      MCLT.  Since the actual lease interval cannot be allowed to exceed
1002	      the remainder of the current acknowledged potential lease interval
1003	      plus the MCLT, the offer made to the client is for the remainder
1004	      of the current acknowledged potential lease interval (i.e., zero)
1005	      plus the MCLT.  Thus, the actual lease interval is 1 hour.

1007	      Once the server has performed the ACK to the DHCP client, it will
1008	      update the secondary server with the lease information. However,
1009	      the desired potential lease interval will be composed of one
1010	      half of the current actual lease interval added to the desired
1011	      lease interval. Thus, the secondary server is updated with a
1012	      BNDUPD with a lease interval of 3 days + 1/2 hour specified in the
1013	      IP Address Lease Time Option (Option 51).

1015	      When the primary server receives an ACK to its update of the
1016	      secondary server's (partner's) potential lease interval, it
1017	      records that as the acknowledged potential lease interval.  A
1018	      server MUST NOT send a BNDACK in response to a BNDUPD message
1019	      until it is sure that the information in the BNDUPD message
1020	      resides in its stable storage.  Thus, the primary server in this
1021	      case can be sure that the secondary server has recorded the poten-
1022	      tial lease interval in its stable storage when the primary server
1023	      receives a BNDACK message from the secondary server.

1025	      When the DHCP client attempts to renew at T1 (approximately one
1026	      half an hour from the start of the lease), the primary server
1027	      again determines the desired lease interval, which is still 3
1028	      days.  It then compares this with the remaining acknowledged
1029	      potential lease interval (3 days + 1/2 hour) and adjusts for the
1030	      time passed since the secondary was last updated (1/2 hour).  Thus
1031	      the time remaining of the acknowledged potential lease interval is
1032	      3 days.  Adding the MCLT to this yields 3 days plus 1 hour, which
1033	      is more than the desired lease interval of 3 days.  So the client
1034	      is renewed for the desired lease interval -- 3 days.

1036	      When the primary DHCP server updates the secondary DHCP server
1037	      after the DHCP client's renewal ACK is complete, it will calculate
1038	      the desired potential lease interval as the T1 fraction of the
1039	      actual client lease interval (1/2 of 3 days this time = 1.5 days).
1040	      To this it will add the desired client lease interval of 3 days,
1041	      yielding a total desired partner server lease interval of 4.5
1042	      days.  In this way, the primary attempts to have the secondary
1043	      always "lead" the client in its understanding of the client's
1044	      lease interval so as to be able to always offer the client the
1045	      desired client lease interval.

1047	      Once the initial actual client lease interval of the MCLT is past,
1048	      the protocol operates effectively like the DHCP protocol does
1049	      today in its behavior concerning lease intervals. However, the
1050	      guarantee that the actual client lease interval will never exceed
1051	      the remaining acknowledged partner server lease interval by more
1052	      than the MCLT allows full recovery from a variety of failures.
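
	      The arithmetic of this example can be checked with a short
	      Python sketch.  The function names and the starting timestamp
	      are assumptions made for the sketch, and it assumes that each
	      BNDACK arrives before the client's next renewal.  As noted
	      above, the protocol itself mandates no particular algorithm.

```python
from datetime import datetime, timedelta

MCLT = timedelta(hours=1)        # values taken from the example above
DESIRED = timedelta(days=3)


def actual_lease(now, acked_potential_expiration, mclt=MCLT, desired=DESIRED):
    """What to tell the client: min(desired, remainder + MCLT)."""
    if acked_potential_expiration is None:
        remainder = timedelta(0)             # new lease, nothing acked yet
    else:
        remainder = max(acked_potential_expiration - now, timedelta(0))
    return min(desired, remainder + mclt)


def potential_expiration(now, actual, desired=DESIRED):
    """What to tell the partner: now + renewal interval + desired interval."""
    renewal = actual / 2                     # renewal time is half the lease
    return now + renewal + desired


t = datetime(2000, 1, 1)                     # arbitrary start time

# Initial lease: nothing acknowledged yet, so the client gets only the MCLT.
first = actual_lease(t, None)
acked = potential_expiration(t, first)       # 3 days + 1/2 hour after t

# Renewal half an hour later: 3 days remain, so the full lease is allowed.
t1 = t + first / 2
second = actual_lease(t1, acked)
acked2 = potential_expiration(t1, second)    # 4.5 days after t1
```

	      In both steps the lease given to the client stays within the
	      MCLT of what the partner has acknowledged.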

1054	5.2.2.  Controlled re-allocation of IP addresses

1056	   When in PARTNER-DOWN state there is a waiting period after which an
1057	   IP address can be re-allocated to another client.  For leases which
1058	   are available when the server enters PARTNER-DOWN state, the period
1059	   is the MCLT from entry into PARTNER-DOWN state.  For IP addresses
1060	   which are not available when the server enters PARTNER-DOWN state,
1061	   the period is the MCLT after the lease becomes available.  See sec-
1062	   tion 9.4.2 for more details.
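
	   The two cases above reduce to a single rule, sketched here in
	   Python.  The function and parameter names are hypothetical and
	   times are plain seconds; the sketch does not model the further
	   rules of section 9.4.2.

```python
def earliest_reallocation(partner_down_entered, became_available, mclt):
    """Earliest time an address may be given to a different client.

    partner_down_entered: time the server entered PARTNER-DOWN state.
    became_available: time the lease became (or will become) available.
    mclt: Maximum Client Lead Time.
    """
    if became_available <= partner_down_entered:
        # Already available on entry: wait the MCLT from entry.
        return partner_down_entered + mclt
    # Not yet available on entry: wait the MCLT after it becomes available.
    return became_available + mclt
```

	   For example, with an MCLT of 3600 seconds and entry to
	   PARTNER-DOWN at time 1000, an address that was already available
	   may be re-allocated at 4600, while one that becomes available at
	   2000 must wait until 5600.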

1064	   In any other state, a server cannot reallocate an address from one
1065	   client to another without first notifying its partner (through a
1066	   BNDUPD message) and receiving acknowledgement (through a BNDACK mes-
1067	   sage) that its partner is aware that that first client is not using
1068	   the address.

1070	   This could be modeled in the following way. Though this specific
1071	   implementation is in no way required, it may serve to better illus-
1072	   trate the concept.

1074	   An "available" IP address on a server may be allocated to any client.
1075	   An IP address which was leased to a client and which expired or was
1076	   released by that client would take on a new state, EXPIRED or
1077	   RELEASED respectively.  The partner server would then be notified
1078	   that this IP address was EXPIRED or RELEASED through a BNDUPD.  When
1079	   the sending server received the BNDACK for that IP address showing it
1080	   was FREE, it would move the IP address from EXPIRED or RELEASED to
1081	   FREE, and it would be available for allocation by the primary server
1082	   to any clients.

1084	   A server MAY reallocate an IP address in the EXPIRED or RELEASED
1085	   state to the same client with no restrictions.
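
   The model above (for states other than PARTNER-DOWN) might be
   sketched as follows.  The class and function names are hypothetical
   and, like the model itself, in no way required; sending the BNDUPD
   is left to the caller:

```python
from dataclasses import dataclass
from typing import Optional

FREE, ACTIVE, EXPIRED, RELEASED = "FREE", "ACTIVE", "EXPIRED", "RELEASED"


@dataclass
class Lease:
    state: str = FREE
    client: Optional[str] = None   # last client to hold the address


def end_lease(lease, released):
    """Client released the address or the lease ran out.

    The caller is expected to send a BNDUPD to the partner afterwards.
    """
    lease.state = RELEASED if released else EXPIRED


def on_bndack(lease):
    """Partner acknowledged the update; the address may now be reused."""
    if lease.state in (EXPIRED, RELEASED):
        lease.state = FREE


def may_allocate(lease, client):
    """True if this server may hand the address to `client`."""
    if lease.state == FREE:
        return True
    if lease.state in (EXPIRED, RELEASED):
        return lease.client == client   # same client only
    return False
```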

1087	5.3.  Load balancing

1089	   In order to implement load balancing between a primary and secondary
1090	   server pair, each server must respond to DHCPDISCOVER requests from
1091	   some clients and not from other clients.  In order to do this suc-
1092	   cessfully, each server must be able to determine immediately upon
1093	   receipt of a DHCP client request whether it is to service this
1094	   request or to ignore it in order to allow the other server to service
1095	   the request.

1097	   In addition, it should be possible to configure the percentage of
1098	   clients which will be serviced by either the primary or secondary
1099	   server.  This configuration should be more or less continuous, from
1100	   all serviced by the primary through an even split with half serviced
1101	   by each, to all serviced by the secondary.

1103	   The technique chosen to support these goals is to define a hash func-
1104	   tion which must be applied to the client-identifier or to the htype
1105	   concatenated with the chaddr if no client-identifier is specified.
1106	   The result of this hash function is a number between 0 and 255
1107	   which maps into one of 256 "hash-buckets".  Each hash bucket is
1108	   assigned to one server or the other by the primary server whenever a
1109	   connection is established, through use of the hash-bucket-assignment
1110	   option.

1112	   The hash-bucket-assignment option uses a 32 octet value field (con-
1113	   taining 256 bits), with one bit associated with each possible hash
1114	   bucket.  If the bit corresponding to a hash bucket is a 1 in the
1115	   hash-bucket-assignment option, then the secondary server is required
1116	   to service all DHCP client requests that map into that hash bucket
1117	   when in NORMAL state.

1119	   For example, if the primary server sends a hash-bucket-assignment
1120	   option to the secondary with the following 32 octets:

1122	                                  buckets
1123	       FF FF FF FF FF FF FF FF  ( 0   - 63 )
1124	       FF FF FF FF FF FF FF FF  ( 64  - 127 )
1125	       00 00 00 00 00 00 00 00  ( 128 - 191 )
1126	       00 00 00 00 00 00 00 00  ( 192 - 255 )

1128	   then the secondary MUST service any DHCP client requests where the
1129	   client-identifier or htype concatenated with the chaddr hashes into
1130	   the bucket values of 0 through 127.

1132	   See section 12 for the code to implement the hash bucket algorithm.
1133	   Each server MUST implement this same algorithm in order for all
1134	   clients to get service.
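
   As an illustration only, looking up a bucket in the 32 octet value
   could be sketched as below.  The hash function itself is the one in
   section 12 and is not reproduced here; the bucket number is taken
   as an input.  The function name is hypothetical, and the bit order
   within an octet (most significant bit first) is an assumption that
   the example above cannot confirm, since every octet in it is either
   FF or 00:

```python
def secondary_serves(assignment, bucket):
    """True if the secondary must service requests hashing to `bucket`.

    `assignment` is the 32-octet hash-bucket-assignment value.
    Assumption: within each octet the most significant bit belongs to
    the lowest-numbered bucket.
    """
    if len(assignment) != 32:
        raise ValueError("hash-bucket-assignment must be 32 octets")
    if not 0 <= bucket <= 255:
        raise ValueError("bucket must be in 0..255")
    octet = assignment[bucket // 8]
    return bool(octet & (0x80 >> (bucket % 8)))


# The 32 octets from the example above: buckets 0-127 set, 128-255 clear.
example = bytes([0xFF] * 16 + [0x00] * 16)
```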

1136	5.4.  Operating in NORMAL state

1138	   When in NORMAL state, each server services DHCPDISCOVER's and all
1139	   other DHCP requests other than DHCPREQUEST/RENEWAL or
1140	   DHCPREQUEST/REBINDING from the client set defined by the load balanc-
1141	   ing algorithm.  Each server services DHCPREQUEST/RENEWAL or
1142	   DHCPREQUEST/REBINDING requests from any client.

1144	   In general, whenever the binding database is changed in stable
1145	   storage, then a BNDUPD message is sent with the contents of that
1146	   change to the partner server.  The partner server then writes the
1147	   information about that binding in its bindings database in stable
1148	   storage and replies with a BNDACK message.

1150	5.5.  Operating in COMMUNICATIONS-INTERRUPTED state

1152	   When operating in COMMUNICATIONS-INTERRUPTED state, each server is
1153	   operating independently, but does not assume that its partner is not
1154	   operating.  The partner server might be operating and simply unable
1155	   to communicate with this server, or might not be operating.

1157	   Each server responds to the full range of DHCP client messages that
1158	   it receives, but in such a way that graceful reintegration is always
1159	   possible when its partner comes back into contact with it.

1161	5.6.  Operating in PARTNER-DOWN state

1163	   When operating in PARTNER-DOWN state, a server assumes that its
1164	   partner is not currently operating, but does make allowances for the
1165	   possibility that that server was operating in the past.  It responds
1166	   to all DHCP client requests in PARTNER-DOWN state.

1168	   Any transactions that the partner server may have had with DHCP
1169	   clients but been unable to communicate to this server are allowed for
1170	   in the algorithms that are used to gradually take over full control
1171	   of all of the addresses configured into the server.

1173	5.7.  Operating in RECOVER state

1175	   A server operating in RECOVER state assumes that it is reintegrating
1176	   with a server that has been operating in PARTNER-DOWN state, and that
1177	   it needs to update its bindings database before it services DHCP
1178	   client requests.

1180	   A server may also operate in RECOVER state in order to fully recover
1181	   its bindings database from its partner server.

1183	6.  Packet Formats

1185	   This section describes the message format common to all failover
1186	   messages, and then defines the options used in the failover
1187	   protocol.

1189	6.1.  Common message format

1191	   All failover protocol messages are sent over the TCP connection
1192	   between failover endpoints and encoded using a packet format specific
1193	   to the failover protocol.

1195	   There exists a common message format for all failover messages, which
1196	   utilizes the options in a way similar to the DHCP protocol.  For each
1197	   message type, some options are required and some are optional.  In
1198	   addition, when a message is received any options that are not under-
1199	   stood by the receiving server MUST be ignored.

1201	   All of the fields in the fixed portion of the packet MUST be filled
1202	   with correct data in every message sent.

1204	   0                   1                   2                   3
1205	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1206	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1207	   |         packet length (2)     | msg type (1)  |payload off (1)|
1208	   +---------------+---------------+---------------+---------------+
1209	   |                            xid (4)                            |
1210	   +---------------------------------------------------------------+
1211	   |     0 or more additional header bytes  (variable)             |
1212	   +---------------------------------------------------------------+
1213	   |                    payload data  (variable)                   |
1214	   |                                                               |
1215	   |               formatted as DHCP-style options                 |
1216	   |         using a unique option number space in the ?R6?        |
1217	   |                   format defined by [NAMESPACE]               |
1218	   +---------------------------------------------------------------+

1220	   packet length - 2 bytes, network byte order

1222	   This is the length of the packet.  It includes the two byte packet
1223	   length itself.

1225	   msg type - 1 byte

1227	   The message type field is used to distinguish between messages.

1229	   The following message types are defined:

1231	   Value   Message Type
1232	   -----   ------------
1233	   0       reserved    not used
1234	   1       POOLREQ     request allocation of addresses
1235	   2       POOLRESP    respond with allocation count
1236	   3       BNDUPD      update partner with binding info
1237	   4       BNDACK      acknowledge receipt of binding update
1238	   5       CONNECT     establish connection with partner
1239	   6       CONNECTACK  respond to attempt to establish contact with partner
1240	   7       UPDREQALL   request full transfer of binding info
1241	   8       UPDDONE     ack send and ack of req'd binding info
1242	   9       UPDREQ      req transfer of un-acked binding info
1243	   10      STATE       inform partner of current state or state change
1244	   11      CONTACT     probe communications integrity with partner

1246	   New message types should be defined in one of two ranges, 0-127 or
1247	   128-255.  The range of 0-127 is used for messages that MUST be
1248	   supported by every server, and if a server receives a message in the
1249	   range of 0-127 that it doesn't understand, it MUST drop the TCP con-
1250	   nection.  The range of 128-255 is used for messages which MAY be sup-
1251	   ported but are not required, and if a server receives a message in
1252	   this range that it does not understand it SHOULD ignore the message.

1254	   payload offset - 1 byte

1256	   The byte offset of the Payload Data, from the beginning of the
1257	   failover packet header. The value for the current protocol version is
1258	   8.

1260	   xid - 4 bytes, network byte order

1262	   This is the transaction id of the failover packet.  The sender of a
1263	   failover protocol packet is responsible for setting this number, and
1264	   the receiver of the packet copies the number over into any response
1265	   packet, treating it as opaque data.  The sender SHOULD ensure that
1266	   every packet sent from a particular failover endpoint over the
1267	   associated TCP connection has a unique transaction id unless that
1268	   packet is a re-transmission.

1270	   payload data - variable length

1272	   The options are placed after the header, after skipping payload
1273	   offset bytes from the beginning of the packet.  The payload data
1274	   options are not preceded by a "cookie" value.

1276	   The payload data is formatted as DHCP style options using the two
1277	   byte option number and two byte option length format as specified in
1278	   the recommendations of the DHCP panel in [NAMESPACE].

1280	   The maximum length of the payload data in octets is 2048 less the
1281	   size of the header, i.e., the maximum packet length is 2048 octets.
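
   A minimal sketch of the fixed header, assuming the current payload
   offset of 8 (no additional header bytes).  The function names and
   the returned action strings are hypothetical:

```python
import struct

HEADER = struct.Struct("!HBBI")   # length, msg type, payload offset, xid
MAX_PACKET = 2048


def pack_message(msg_type, xid, payload=b""):
    """Prefix `payload` (already encoded options) with the header."""
    length = HEADER.size + len(payload)
    if length > MAX_PACKET:
        raise ValueError("packet exceeds 2048 octets")
    return HEADER.pack(length, msg_type, HEADER.size, xid) + payload


def unpack_message(packet):
    """Return (msg type, xid, payload) for one complete packet."""
    length, msg_type, offset, xid = HEADER.unpack_from(packet)
    if length != len(packet) or not HEADER.size <= offset <= length:
        raise ValueError("malformed failover packet")
    # Honour the payload offset so extra header bytes are skipped.
    return msg_type, xid, packet[offset:length]


def unknown_type_action(msg_type):
    """Policy for a message type the receiver does not understand."""
    return "drop-connection" if msg_type <= 127 else "ignore"
```

   Reading the payload offset from the packet rather than assuming 8
   lets a receiver skip header bytes it does not know about.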

1283	6.2.  Common option format

1285	   The options contained in the payload data section of the failover
1286	   packet all use the two byte option number and two byte length format
1287	   as specified by the recommendations of the DHCP panel in [NAMESPACE].

1289	   The option numbers are drawn from an option number space unique to
1290	   the failover protocol.  All of the message types share a common
1291	   option number space and common options definitions, though not all
1292	   options are required or meaningful for every message.

1294	   In contrast to the options which appear in DHCP client and server
1295	   packets, the options in failover messages are ordered.  That is, for
1296	   some messages the order in which the options appear in the payload
1297	   data area is significant.  The messages for which this is the case
1298	   spell it out in detail.

1300	   All options which refer to time use an absolute time in GMT.  Time
1301	   synchronization has already been achieved between the source and
1302	   the target server using the CONNECT message.  All time fields in
1303	   the options defined below use a time represented as seconds
1304	   elapsed since Jan 1, 1970 (i.e., the ANSI C time_t time value
1305	   representation).  Note that this is (at present) a signed field.

1307	   Additional options can be defined for intervendor or vendor specific
1308	   use with little difficulty due to the large number of option numbers
1309	   available.
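
   The option layout described in this section could be sketched as
   follows.  The helper names are hypothetical; the decoder returns a
   list rather than a dictionary so that option order is preserved:

```python
import struct


def encode_option(code, value):
    """Two-byte option number, two-byte length, then the value."""
    return struct.pack("!HH", code, len(value)) + value


def decode_options(payload):
    """Return (code, value) pairs in the order they appear."""
    options, pos = [], 0
    while pos < len(payload):
        if pos + 4 > len(payload):
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", payload, pos)
        pos += 4
        if pos + length > len(payload):
            raise ValueError("truncated option value")
        options.append((code, payload[pos:pos + length]))
        pos += length
    return options


def encode_time(code, seconds_since_epoch):
    """Time option: four octets, network byte order.

    Packed as signed here because the text calls the field signed.
    """
    return encode_option(code, struct.pack("!i", seconds_since_epoch))
```

   For example, encode_option(2, bytes([192, 0, 2, 1])) produces the
   eight octets of an assigned-IP-address option (code 2, length 4).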

1311	6.2.1.  binding-status

1313	   This option is used to convey the current state of a binding.

1315	       Code          Len     Type
1316	   +-----+-----+------+-----+-----+
1317	   |  0  |  1  |   0  |  1  | 1-9 |
1318	   +-----+-----+------+-----+-----+

1320	   Legal values for this option are:

1322	   Value Binding Status
1323	   ----- ------------------------------------------------
1324	   1     FREE           Lease has never been used
1325	   2     ACTIVE         Lease is assigned to a client
1326	   3     EXPIRED        Lease has expired
1327	   4     RELEASED       Lease has been released by client
1328	   5     ABANDONED      A server, or client flagged address as unusable
1329	   6     RESET          Lease was freed by some external agent
1330	   7     BACKUP         Lease belongs to secondary's private address pool
1331	   8     EXPIRED-GRACE  Lease will become available after this period
1332	   9     RELEASED-GRACE Lease will become available after this period

1334	6.2.2.  assigned-IP-address

1336	   The IP address to which this message refers.

1338	        Code         Len          Address
1339	   +-----+-----+------+-----+----+-----+-----+-----+
1340	   |  0  |  2  |   0  |  4  | a1 |  a2 |  a3 |  a4 |
1341	   +-----+-----+------+-----+----+-----+-----+-----+

1343	6.2.3.  sending-server-IP-address

1345	   The IP address of the server sending this message.

1347	        Code         Len          Address
1348	   +-----+-----+------+-----+----+-----+-----+-----+
1349	   |  0  |  3  |   0  |  4  | a1 |  a2 |  a3 |  a4 |
1350	   +-----+-----+------+-----+----+-----+-----+-----+

1352	6.2.4.  addresses-transferred

1354	   A 32 bit unsigned integer in network byte order.  It reports the
1355	   number of addresses transferred by the primary to the secondary
1356	   server (addresses to be used for the secondary server's private
1357	   address pool).

1359	        Code         Len       Number of Addresses
1360	   +-----+-----+------+-----+----+-----+-----+-----+
1361	   |  0  |  4  |   0  |  4  | n1 |  n2 |  n3 |  n4 |
1362	   +-----+-----+------+-----+----+-----+-----+-----+

1364	6.2.5.  client-identifier

1366	   The format, code and conventions used are identical to DHCP option
1367	   61.

1369	        Code         Len       Client Identifier
1370	   +-----+-----+------+-----+----+-----+---
1371	   |  0  |  5  |   0  |  n  | i1 |  i2 | ...
1372	   +-----+-----+------+-----+----+-----+--

1374	6.2.6.  client-hardware-address

1376	   The format is similar to DHCP option 61. Byte t1 (type) MUST be set
1377	   to the proper ARP hardware address code, as defined in the ARP
1378	   section of RFC 1700 (it MUST NOT be zero).

1380	        Code         Len      MAC address
1381	   +-----+-----+------+-----+----+-----+-----+---
1382	   |  0  |  6  |   0  |  n  | t1 |  m1 |  m2 | ...
1383	   +-----+-----+------+-----+----+-----+-----+---

1385	   Either Client Id, Client Hardware Address or BOTH MAY be present in
1386	   binding update transactions. At least one of them MUST be present.
1387	   If both are present, the Client Id MUST be used to uniquely identify
1388	   the owner of the binding (exactly as in RFC 2131).

1390	6.2.7.  client-FQDN

1392	   If an implementation supports Dynamic DNS updates, this option can be
1393	   used to communicate the DNS name that was set. Uses the format of the
1394	   Client FQDN option (81) as described in [DDNS] and extended to fit in
1395	   the two byte code and length approach of the DHCP panel.

1397	        Code         Len     Flags Rcode1 Rcode2 Domain Name
1398	   +-----+-----+------+-----+-----+------+------+-----+------
1399	   |  0  |  7  |   0  |  n  |  f  |  r1  |  r2  |  d1 | d2...
1400	   +-----+-----+------+-----+-----+------+------+-----+------

1402	6.2.8.  reject-reason

1404	   This option is used to selectively reject binding updates. It MAY be
1405	   used in BNDACK message, always associated with an assigned-IP-address
1406	   option, which contains the IP address of the update being rejected.

1408	        Code         Len     Reason Code
1409	   +-----+-----+------+-----+----------+
1410	   |  0  |  8  |   0  |  1  |    R1    |
1411	   +-----+-----+------+-----+----------+

1413	   Reason codes :

1415	   0   Reserved
1416	   1   Illegal IP address (not part of any address pool)
1417	   2   Fatal conflict exists: address in use by other client.
1418	   3   Missing binding information.
1419	   4   Connection rejected, time mismatch too great.
1420	   5   Connection rejected, invalid MCLT.
1421	   6   Connection rejected, unknown reason.
1422	   7   Connection rejected, duplicate connection.
1423	   8   Connection rejected, invalid failover partner.
1424	   9   TLS not supported
1425	   10  TLS supported but not configured
1426	   11  TLS required but not supported by partner
1427	   12  Message digest not supported
1428	   13  Message digest not configured
1429	   14  Protocol version mismatch
1430	   15  Missing binding information
1431	   16  Outdated binding information
1432	   17  Less critical binding information
1433	   18-253, reserved.
1434	   254 Unknown: Error occurred but does not match any reason code
1435	   255 Reserved for code expansion

1437	6.2.9.  message

1439	   This option is used to supply a human readable message.  It may be
1440	   used in association with the Reject Reason Code to provide a human
1441	   readable error message for the reject.

1443	        Code         Len         Text
1444	   +-----+-----+------+-----+------+-----+--
1445	   |  0  |  9  |   0  |  n  |  c1  | c2  | ...
1446	   +-----+-----+------+-----+------+-----+--

1448	6.2.10.  MCLT

1450	   Maximum Client Lead Time, in seconds.  A 32 bit integer value, in
1451	   network byte order.

1453	        Code         Len             Time
1454	   +-----+-----+------+-----+----+-----+-----+-----+
1455	   |  0  |  10 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
1456	   +-----+-----+------+-----+----+-----+-----+-----+

1458	6.2.11.  vendor-class-identifier

1460	   A string which identifies the vendor of the failover protocol
1461	   implementation.

1463	   The code for this option is 11, and its minimum length is 1.

1465	        Code         Len           vendor class string
1466	   +-----+-----+------+-----+----+-----+---
1467	   |  0  |  11 |   0  |  n  | c1 |  c2 |  ...
1468	   +-----+-----+------+-----+----+-----+---

1470	6.2.12.  current-time

1472	   The current time expressed as an absolute time in GMT represented as
1473	   seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t time value
1474	   representation).

1476	        Code         Len          Current Time
1477	   +-----+-----+------+-----+----+-----+-----+-----+
1478	   |  0  |  12 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
1479	   +-----+-----+------+-----+----+-----+-----+-----+

1481	6.2.13.  lease-expiration-time

1483	   The lease expiration time expressed as an absolute time in GMT
1484	   represented as seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t
1485	   time value representation).

1487	   The lease expiration time is the time that a server has ACKed to a
1488	   DHCP client.

1490	        Code         Len          Time
1491	   +-----+-----+------+-----+----+-----+-----+-----+
1492	   |  0  |  13 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
1493	   +-----+-----+------+-----+----+-----+-----+-----+

1495	6.2.14.  potential-expiration-time

1497	   The potential expiration time expressed as an absolute time in GMT
1498	   represented as seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t
1499	   time value representation).

1501	   The potential expiration time is the time that one server tells
1502	   another server that it may ACK to a client.

1504	        Code         Len          Time
1505	   +-----+-----+------+-----+----+-----+-----+-----+
1506	   |  0  |  14 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
1507	   +-----+-----+------+-----+----+-----+-----+-----+

1509	6.2.15.  grace-expiration-time

1511	   The grace expiration time expressed as an absolute time in GMT
1512	   represented as seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t
1513	   time value representation).

1515	   The grace expiration time is the time that a grace period will
1516	   expire.

1518	        Code         Len          Time
1519	   +-----+-----+------+-----+----+-----+-----+-----+
1520	   |  0  |  15 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
1521	   +-----+-----+------+-----+----+-----+-----+-----+

1523	6.2.16.  client-last-transaction-time

1525	   The time at which this server last received a DHCP request from a
1526	   particular client expressed as an absolute time in GMT represented as
1527	   seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t time value
1528	   representation).

1530	        Code         Len   Client Last Transaction Time
1531	   +-----+-----+------+-----+----+-----+-----+-----+
1532	   |  0  |  16 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
1533	   +-----+-----+------+-----+----+-----+-----+-----+

1535	6.2.17.  start-time-of-state

1537	   The time at which the state contained in this message began,
1538	   expressed as an absolute time in GMT represented as seconds elapsed
1539	   since Jan 1, 1970 (i.e.  ANSI C time_t time value representation).

1541	   This option is used for different states in different messages.  In a
1542	   BNDUPD message it represents the start time of the state of the lease
1543	   in the BNDUPD message.  In a STATE message, it represents the start
1544	   time of the partner server's failover state.

1546	        Code         Len      Start Time of State
1547	   +-----+-----+------+-----+----+-----+-----+-----+
1548	   |  0  |  17 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
1549	   +-----+-----+------+-----+----+-----+-----+-----+

1551	6.2.18.  server-state

1553	   This option is used to convey the current state of the failover
1554	   endpoint in the sending server.

1556	       Code          Len     Server State
1557	   +-----+-----+------+-----+-----+
1558	   |  0  |  18 |   0  |  1  | 1-9 |
1559	   +-----+-----+------+-----+-----+

1561	   Legal values for this option are:

1563	   Value   Server State
1564	   -----   -------------------------------------------------------------
1565	   0       reserved
1566	   1       STARTUP                      Startup state (1)
1567	   2       NORMAL                       Normal state
1568	   3       COMMUNICATIONS-INTERRUPTED   Communication interrupted (safe)
1569	   4       PARTNER-DOWN                 Partner down (unsafe mode)
1570	   5       POTENTIAL-CONFLICT           Synchronizing
1571	   6       RECOVER                      Recovering bindings from partner
1572	   7       PAUSED                       Shutting down for a short period.
1573	   8       SHUTDOWN                     Shutting down for an extended
1574	                                        period.
1575	   9       RECOVER-DONE                 Interlock state prior to NORMAL

1577	6.2.19.  server-flags

1579	   This option is used to convey the current flags of the failover
1580	   endpoint in the sending server.

1582	       Code          Len     Server Flags
1583	   +-----+-----+------+-----+-------+
1584	   |  0  |  19 |   0  |  1  | flags |
1585	   +-----+-----+------+-----+-------+

1587	   Legal values for this option are:

1589	   Currently, bit 5 is defined.  All other bits
1590	   are reserved, and must be set to 0.

1592	      o STARTUP

1594	        Bit 5 is the STARTUP flag.  Bit 5 MUST be set to 1 whenever the
1595	        server is in STARTUP state, and set to 0 otherwise.  (Note that
1596	        when in STARTUP state, the state transmitted in the server-state
1597	        option is usually the last recorded state from stable storage,
1598	        but see section 9.3 for details.)

1600	6.2.20.  vendor-specific-options

1602	   This option is used to convey options specific to a particular
1603	   vendor's implementation.  The vendor class identifier is used to
1604	   specify which option space the embedded options are drawn from.

1606	   It functions similarly to the vendor class identifier and vendor
1607	   specific options in the DHCP protocol.

1609	   This option contains other options in the same two byte code, two
1610	   byte length format.  If this option appears in a message without a
1611	   corresponding vendor class identifier, it MUST be ignored.

1613	        Code         Len        Embedded options
1614	   +-----+-----+------+-----+----+-----+---
1615	   |  0  |  20 |   0  |  n  | c1 |  c2 |  ...
1616	   +-----+-----+------+-----+----+-----+---

1618	6.2.21.  max-unacked-bndupd

1620	   The maximum number of BNDUPD messages that this server is prepared to
1621	   accept over the TCP connection without causing the TCP connection to
1622	   block.

1624	        Code         Len     Maximum Unacked BNDUPD
1625	   +-----+-----+------+-----+----+-----+-----+-----+
1626	   |  0  |  21 |   0  |  4  | n1 |  n2 |  n3 |  n4 |
1627	   +-----+-----+------+-----+----+-----+-----+-----+

1629	6.2.22.  server-role

1631	   This option is used to convey the role of the failover endpoint in
1632	   the sending server.

1634	       Code          Len      Role
1635	   +-----+-----+------+-----+-------+
1636	   |  0  |  22 |   0  |  1  |   r1  |
1637	   +-----+-----+------+-----+-------+

1639	   A value of 0 indicates that the failover endpoint is a primary server
1640	   and a value of 1 indicates that it is a secondary server.

1642	6.2.23.  receive-timer

1644	   The number of seconds within which the server must receive a packet
1645	   from its partner, or it will assume that the partner is down or the
1646	   communication path to the partner has failed.

1648	        Code         Len         Receive Timer
1649	   +-----+-----+------+-----+----+-----+-----+-----+
1650	   |  0  |  23 |   0  |  4  | s1 |  s2 |  s3 |  s4 |
1651	   +-----+-----+------+-----+----+-----+-----+-----+

1653	6.2.24.  hash-bucket-assignment

1655	   The set of hash values to which the receiving server MUST respond.
1656	   See section 5.3 for more information on how this option is used.

1658	   This option consists of a set of 32 bytes, in network byte order,
1659	   where each bit corresponds to one of 256 possible hash bucket values.
1660	   If a bit is set to 1, the recipient is required to service the
1661	   requests whose client-identifier or htype concatenated with the
1662	   chaddr (if no client-identifier exists) map into the corresponding
1663	   hash bucket.

1665	        Code         Len        Hash Buckets
1666	   +-----+-----+------+-----+----+-----+-----+-----+
1667	   |  0  |  24 |   0  |  32 | b1 |  b2 | ... | b32 |
1668	   +-----+-----+------+-----+----+-----+-----+-----+

1670	6.2.25.  message-digest

1672	   The message digest for this message.

1674	   This option consists of a variable number of bytes which contain the
1675	   message digest of the message prior to the inclusion of this option.

1677	   When this option appears in a message, it MUST appear as the last
1678	   option in the message.

1680	        Code         Len       Message Digest
1681	   +-----+-----+------+-----+----+-----+-----
1682	   |  0  |  25 |   0  |  n  | d1 |  d2 | ...
1683	   +-----+-----+------+-----+----+-----+-----

1685	6.2.26.  protocol-version

1687	   The protocol version being used by the server. It is only sent in the
1688	   CONNECT and CONNECTACK messages.

1690	        Code         Len    Version
1691	   +-----+-----+------+-----+----+
1692	   |  0  |  26 |   0  |  1  | v1 |
1693	   +-----+-----+------+-----+----+

1695	6.2.27.  TLS-request

1697	   This option contains information relating to TLS security
1698	   negotiation.  It is sent in a CONNECT message.

1700	   The first byte, req, is the TLS request from this server.  A value of
1701	   0 indicates no TLS operation, a value of 1 indicates that TLS
1702	   operation is desired, and a value of 2 indicates that TLS operation
1703	   is required to establish communications with this server.

1705	   The second byte, acc, is what this server will accept for TLS
1706	   operation.  A value of 0 means that this server will not accept TLS
1707	   connections.  A value of 1 means that this server will accept TLS
1708	   connections.

1710	   If req is not zero, then acc MUST be 1.

1712	   This allows a server which is not configured for TLS support to
1713	   inform its partner that it will accept a TLS connection although it
1714	   does not desire one, for instance.

1716	        Code         Len  request accept
1717	   +-----+-----+------+-----+----+----+
1718	   |  0  |  27 |   0  |  2  | req| acc|
1719	   +-----+-----+------+-----+----+----+
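
   The constraint between the two value octets can be checked before
   the option is sent.  This is a sketch only; the function name is
   hypothetical and it returns just the two value octets, without the
   option code and length:

```python
def encode_tls_request(req, acc):
    """Build the two value octets of a TLS-request option."""
    if req not in (0, 1, 2):
        raise ValueError("req must be 0 (none), 1 (desired) or 2 (required)")
    if acc not in (0, 1):
        raise ValueError("acc must be 0 or 1")
    if req != 0 and acc != 1:
        raise ValueError("acc MUST be 1 when req is not zero")
    return bytes([req, acc])
```

   A server that does not want TLS but will accept it would send
   req 0 and acc 1.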

1721	6.2.28.  TLS-reply

1723	   This option contains information relating to TLS security
1724	   negotiation.  It is sent in a CONNECTACK message.

1726	   A value of 0 indicates no TLS operation; a value of 1 indicates
1727	   that TLS operation is required.

1729	        Code         Len     TLS
1730	   +-----+-----+------+-----+----+
1731	   |  0  |  28 |   0  |  1  | t1 |
1732	   +-----+-----+------+-----+----+

1734	6.3.  BNDUPD message format

1736	   The binding update (BNDUPD) message is used to send the binding data-
1737	   base changes to the partner server.

1739	   The message type for the BNDUPD message is 3.

1741	   The xid of the BNDUPD MUST be unique with respect to other failover
1742	   messages transmitted from this failover endpoint.

1744	   The following table summarizes the various options for the BNDUPD
1745	   message.

1747	                                        binding-status

1749	   Option                        ACTIVE     EXPIRED    RELEASED   FREE
1750	   ------                        ------     -------    --------   ----
1751	   assigned-IP-address           MUST       MUST       MUST       MUST
1752	   binding-status                MUST       MUST       MUST       MUST
1753	   client-identifier             MAY        MAY        MAY        MAY
1754	   client-hardware-address       MUST       MUST       MUST       MAY
1755	   lease-expiration-time         MUST       MUST NOT   MUST NOT   MUST NOT
1756	   potential-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
1757	   grace-expiration-time         MUST NOT   MUST NOT   MUST NOT   MUST NOT
1758	   start-time-of-state           SHOULD     SHOULD     SHOULD     SHOULD
1759	   client-last-trans.-time       SHOULD     SHOULD     SHOULD     MAY
1760	   client-FQDN(1)                SHOULD     SHOULD     SHOULD     SHOULD
1761	   all others                    MAY        MAY        MAY        MAY

1763	                                        binding-status
1764	                                                          BACKUP
1765	                                EXPIRED-     RELEASED-    RESET
1766	   Option                       GRACE        GRACE        ABANDONED
1767	   ------                       ------       -----        ---------
1768	   assigned-IP-address          MUST         MUST         MUST
1769	   binding-status               MUST         MUST         MUST
1770	   client-identifier            MAY          MAY          MAY(2)
1771	   client-hardware-address      MAY          MAY          MAY(2)
1772	   lease-expiration-time        MUST NOT     MUST NOT     MUST NOT
1773	   potential-expiration-time    MUST NOT     MUST NOT     MUST NOT
1774	   grace-expiration-time        MUST         MUST         MUST NOT
1775	   start-time-of-state          SHOULD       SHOULD       SHOULD
1776	   client-last-trans.-time      SHOULD       SHOULD       MAY
1777	   client-FQDN(1)               SHOULD       SHOULD       SHOULD
1778	   all others                   MAY          MAY          MAY

1780	   (1) Only SHOULD appear if client supplies a host name and dynamic DNS
1781	       is used.

1783	   (2) MUST NOT if binding-status is ABANDONED.

1785	                 Table 6.3-1: Options used in a BNDUPD message

1787	6.4.  BNDACK message format

1789	   A server sends a binding acknowledgement (BNDACK) message when it has
1790	   successfully committed binding database changes received from a fail-
1791	   over partner in a BNDUPD message to its own stable storage.

1793	   The message type for the BNDACK message is 4.

1795	   The xid in a BNDACK MUST be the same as the xid of the corresponding
1796	   BNDUPD.

1798	   The following table summarizes the options for the BNDACK message.

1800	                                        binding-status

1802	   Option                        ACTIVE     EXPIRED    RELEASED   FREE
1803	   ------                        ------     -------    --------   ----
1804	   assigned-IP-address           MUST       MUST       MUST       MUST
1805	   binding-status                MUST       MUST       MUST       MUST
1806	   client-identifier             MAY        MAY        MAY        MAY
1807	   client-hardware-address       MUST       MUST       MUST       MAY
1808	   reject-reason                 MAY        MAY        MAY        MAY
1809	   message                       MAY        MAY        MAY        MAY
1810	   lease-expiration-time         MUST       MUST NOT   MUST NOT   MUST NOT
1811	   potential-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
1812	   grace-expiration-time         MUST NOT   MUST NOT   MUST NOT   MUST NOT
1813	   start-time-of-state           SHOULD     SHOULD     SHOULD     SHOULD
1814	   client-last-trans.-time       SHOULD     SHOULD     SHOULD     MAY
1815	   client-FQDN(1)                SHOULD     SHOULD     SHOULD     SHOULD
1816	   all others                    MAY        MAY        MAY        MAY

1818	                                        binding-status
1819	                                                          BACKUP
1820	                                EXPIRED-     RELEASED-    RESET
1821	   Option                       GRACE        GRACE        ABANDONED
1822	   ------                       ------       -----        ---------
1823	   assigned-IP-address          MUST         MUST         MUST
1824	   binding-status               MUST         MUST         MUST
1825	   client-identifier            MAY          MAY          MAY(2)
1826	   client-hardware-address      MAY          MAY          MAY(2)
1827	   reject-reason                MAY          MAY          MAY
1828	   message                      MAY          MAY          MAY
1829	   lease-expiration-time        MUST NOT     MUST NOT     MUST NOT
1830	   potential-expiration-time    MUST NOT     MUST NOT     MUST NOT
1831	   grace-expiration-time        MUST         MUST         MUST NOT
1832	   start-time-of-state          SHOULD       SHOULD       SHOULD
1833	   client-last-trans.-time      SHOULD       SHOULD       MAY
1834	   client-FQDN(1)               SHOULD       SHOULD       SHOULD
1835	   all others                   MAY          MAY          MAY

1837	   (1) Only SHOULD appear if client supplies a host name and dynamic DNS
1838	       is used.

1840	   (2) MUST NOT if binding-status is ABANDONED.

1842	                  Table 6.4-1: Options used in a BNDACK message

1844	6.5.  Bulking for BNDUPD and BNDACK messages

1846	   DISCUSSION:

1848	      Bulking is planned for this protocol, but it hasn't been specified
1849	      in this revision of the draft.  Once the draft settles down, we
1850	      will specify the bulking approach in detail.

1852	6.6.  UPDREQ message format

1854	   The update request (UPDREQ) message is used by one server to request
1855	   that its partner send it all binding database information that it has
1856	   not already seen.

1858	   The message type for the UPDREQ message is 9.

1860	   The xid in a UPDREQ message MUST be unique among messages transmitted
1861	   from this failover endpoint during the life of this connection.

1863	   There are no options that MUST appear in a UPDREQ message.  Any
1864	   option MAY appear.

1866	6.7.  UPDREQALL message format

1868	   The update request all (UPDREQALL) message is used by one server to
1869	   request that all binding database information be sent in order to
1870	   recover from a total loss of its lease state database by the request-
1871	   ing server.

1873	   The message type for the UPDREQALL message is 7.

1875	   The xid in a UPDREQALL message MUST be unique among messages
1876	   transmitted from this failover endpoint during the life of this con-
1877	   nection.

1879	   There are no options that MUST appear in an UPDREQALL message.  Any
1880	   option MAY appear.

1882	6.8.  UPDDONE message format

1884	   The update done (UPDDONE) message is used by the responding server to
1885	   indicate that all requested updates have been sent by the responding
1886	   server as BNDUPD messages and acked by the requesting server using
1887	   BNDACK messages.  While a BNDACK message MUST have been received for
1888	   each IP address that was sent in a BNDUPD message, the BNDACK message
1889	   could have contained a reject-reason in order to NAK that specific
1890	   update.

1892	   Thus, this message confirms that the requesting server has received
1893	   and responded to a BNDUPD message for all of the requested updates,
1894	   but it does not require the requesting server to accept all of the
1895	   offered updates.

1897	   The message type for the UPDDONE message is 8.

1899	   The xid in an UPDDONE message MUST be identical to the xid in the
1900	   UPDREQ or UPDREQALL message that initiated the update process.

1902	   There are no options that MUST appear in an UPDDONE message.  Any
1903	   option MAY appear.

1905	6.9.  POOLREQ message format

1907	   The pool request (POOLREQ) is used by the secondary server to request
1908	   an allocation of IP addresses from the primary server.

1910	   The message type for the POOLREQ message is 1.

1912	   The xid in a POOLREQ message MUST be unique among messages transmit-
1913	   ted from this failover endpoint during the life of this connection.

1915	   There are no options that MUST appear in a POOLREQ message.  Any
1916	   option MAY appear.

1918	6.10.  POOLRESP message format

1920	   The pool response (POOLRESP) is used by the primary server to inform
1921	   the secondary server how many IP addresses it was allocated as the
1922	   result of a pool request.

1924	   The message type for the POOLRESP message is 2.

1926	   The xid in the POOLRESP message MUST be identical to the xid in the
1927	   POOLREQ message for which this POOLRESP is a response.

1929	   The following table shows the options that MUST appear in a POOLRESP
1930	   message:

1932	           Option
1933	           ------
1934	           addresses-transferred       MUST

1936	                          Table 6.10-1: Options used in a POOLRESP message

1938	6.11.  CONNECT message format

1940	   The connect (CONNECT) message is used by either server to establish a
1941	   high level connection with the other server, and to transmit several
1942	   important configuration data items between the servers.

1944	   The message type for the CONNECT message is 5.

1946	   The xid in a CONNECT message MUST be unique among messages transmit-
1947	   ted from this failover endpoint during the life of this connection.

1949	   The CONNECT message MUST be the first message sent down a newly esta-
1950	   blished connection.

1952	   The following table summarizes the options that are associated with
1953	   the CONNECT message:

1955	                                      role

1957	   Option                      primary       secondary
1958	   ------                      ------        ---------
1959	   sending-server-IP-address   MUST          MUST
1960	   server-role                 MUST          MUST
1961	   max-unacked-bndupd          MUST          MUST
1962	   receive-timer               MUST          MUST
1963	   current-time                MUST          MUST
1964	   vendor-class-identifier     MUST          MUST
1965	   protocol-version            MUST          MUST
1966	   TLS-request                 MUST(1)       MUST(1)
1967	   MCLT                        MUST          MUST NOT
1968	   hash-bucket-assignment      MUST          MUST NOT
1969	   all others                  MAY           MAY

1971	   (1) If the CONNECT message is being sent on a TLS secured connection,
1972	   then there MUST NOT be a TLS-request option.

1974	                  Table 6.11-1: Options used in a CONNECT message
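The rules in Table 6.11-1 can be read as a simple validation pass over
the option names present in a CONNECT message.  The following is a
minimal sketch under stated assumptions, not part of the protocol
definition: the function name, the string labels for roles, and the
format of the returned violation strings are all hypothetical.

```python
# Options that Table 6.11-1 marks MUST for both roles.
REQUIRED_BY_BOTH = {
    "sending-server-IP-address", "server-role", "max-unacked-bndupd",
    "receive-timer", "current-time", "vendor-class-identifier",
    "protocol-version",
}
# Options that are MUST for the primary and MUST NOT for the secondary.
PRIMARY_ONLY = {"MCLT", "hash-bucket-assignment"}


def check_connect_options(role, option_names, tls_secured):
    """Return a sorted list of violations; an empty list means none found.

    role is "primary" or "secondary" (hypothetical labels); tls_secured
    says whether the message travels on an already TLS secured connection.
    """
    present = set(option_names)
    problems = [f"missing {name}" for name in REQUIRED_BY_BOTH - present]
    # Note (1): TLS-request is required unless the connection is secured.
    if tls_secured and "TLS-request" in present:
        problems.append("forbidden TLS-request")
    if not tls_secured and "TLS-request" not in present:
        problems.append("missing TLS-request")
    for name in PRIMARY_ONLY:
        if role == "primary" and name not in present:
            problems.append(f"missing {name}")
        if role == "secondary" and name in present:
            problems.append(f"forbidden {name}")
    return sorted(problems)
```

A receiver could run such a check before acting on a CONNECT message and
report any violations in the message option of its CONNECTACK.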

1976	6.12.  CONNECTACK message format

1978	   The connect response (CONNECTACK) message is used by a server to
1979	   respond to the receipt of a CONNECT message.

1981	   The message type for the CONNECTACK message is 6.

1983	   The xid in the CONNECTACK message MUST be identical to the xid in the
1984	   CONNECT message for which this CONNECTACK is a response.

1986	   The following table summarizes the options associated with the CON-
1987	   NECTACK message:

1989	   Option
1990	   ------
1991	   sending-server-IP-address   MUST
1992	   server-role                 MUST
1993	   max-unacked-bndupd          MUST
1994	   receive-timer               MUST
1995	   current-time                MUST
1996	   vendor-class-identifier     MUST
1997	   protocol-version            MUST
1998	   TLS-reply                   MUST(1)
1999	   reject-reason               MAY(2)
2000	   message                     MAY

2002	   (1) If the CONNECTACK is being sent over an already TLS secured
2003	       connection, then the TLS-reply option MUST NOT appear.

2005	   (2) Indicates a rejection of the CONNECT message.

2007	                  Table 6.12-1: Options used in a CONNECTACK message

2009	6.13.  STATE message format

2011	   The state (STATE) message is used by either server to communicate the
2012	   current state of the failover endpoint with the other server.  It
2013	   MUST be sent immediately after a connection is established with
2014	   another server, and it MUST be sent whenever the server's state
2015	   changes.

2017	   The message type for the STATE message is 10.

2019	   The xid in a STATE message MUST be unique among messages transmitted
2020	   from this failover endpoint during the life of this connection.

2022	   The following table shows the options that MUST appear in a STATE
2023	   message:

2025	           Option
2026	           ------
2027	           sending-state               MUST
2028	           server-flags                MUST
2029	           start-time-of-state         MUST

2031	                          Table 6.13-1: Options used in a STATE message

2033	6.14.  CONTACT message format

2035	   The contact (CONTACT) message is used by either server to verify that
2036	   the connection is operational to the other server.

2038	   The message type for the CONTACT message is 11.

2040	   The xid in a CONTACT message MUST be unique among messages transmit-
2041	   ted from this failover endpoint during the life of this connection.

2043	   The following table shows the options that MUST appear in a CONTACT
2044	   message:

2046	           Option
2047	           ------
2048	           current-time                MUST

2050	                          Table 6.14-1: Options used in a CONTACT message

2052	7.  Protocol Messages

2054	   This section contains the detailed definition of the protocol mes-
2055	   sages, including the information to include when sending the message,
2056	   as well as the actions to take upon receiving the message.

2058	7.1.  BNDUPD message

2060	   The binding update (BNDUPD) message is used to send the binding data-
2061	   base changes to the partner server, and the partner server responds
2062	   with a binding acknowledgement (BNDACK) message when it has success-
2063	   fully committed those changes to its own stable storage.

2065	   The rest of the failover protocol exists to determine whether the
2066	   partner server is able to communicate or not, and to enable the
2067	   partners to exchange BNDUPD/BNDACK messages in order to keep their
2068	   binding databases in stable storage synchronized.

2070	7.1.1.  Sending the BNDUPD message

2072	   A BNDUPD message SHOULD be generated whenever any binding changes.  A
2073	   change might be in the binding-status, the lease-expiration-time, or
2074	   even just the last-transaction-time.  In general, any time a DHCP
2075	   client sends in a packet that results in a DHCP server writing to its
2076	   stable storage, a BNDUPD message SHOULD be generated.

2078	   The BNDUPD (and BNDACK) messages refer to the binding-status of the
2079	   IP address, and this protocol defines a series of binding-statuses,
2080	   discussed in more detail below.  Some servers may not support all of
2081	   these binding-statuses, and so in those cases they will not be sent,
2082	   and upon receipt a reasonable interpretation should be made.

2084	   All BNDUPD messages MUST contain the assigned-IP-address option,
2085	   which carries the IP address about which the BNDUPD message is being
2086	   sent.

2088	   All BNDUPD messages MUST contain the binding-status option, and it
2089	   will have one of the values in the following list.  This list
2090	   discusses the meanings of the various binding-statuses and the infor-
2091	   mation that should go into the BNDUPD message because of them.

2093	      o ACTIVE

2095	        Indicates that the IP address is currently leased to a DHCP
2096	        client.

2098	        client-hardware-address

2100	        The client-hardware-address option MUST appear, and be set from
2101	        the MAC address of the DHCP client to which this IP address is
2102	        leased.

2104	        client-identifier

2106	        If the DHCP client to which this IP address is leased used a
2107	        client-identifier option to identify itself, then the client-
2108	        identifier MUST appear in the BNDUPD message, else it MUST NOT
2109	        appear.

2111	        lease-expiration-time
2112	        The lease-expiration-time option MUST appear, and be set to the
2113	        expiration time most recently ACKed to the DHCP client.  Note
2114	        that the time ACKed to a DHCP client is a lease duration in
2115	        seconds, while the lease-expiration-time option in a BNDUPD mes-
2116	        sage is an absolute time value.

2118	        potential-expiration-time

2120	        The potential-expiration-time option MUST appear, and be set to
2121	        a value beyond that of the lease-expiration-time.  This is the
2122	        value that is ACKed by the BNDACK message.  A server sending a
2123	        BNDUPD message MUST be able to recover the potential-
2124	        expiration-time sent in every BNDUPD, not just those that
2125	        receive a corresponding BNDACK, in order to be able to protect
2126	        against possible duplicate allocation of IP addresses after
2127	        transitioning to PARTNER-DOWN state. See section 5.2.1 for
2128	        details as to why the potential-expiration-time exists and
2129	        guidelines for how to decide the value.

2131	      o EXPIRED

2133	        A binding-status of EXPIRED is used when a client's binding on
2134	        an IP address has expired and the server does not wish to imple-
2135	        ment an expired-grace period.  When the partner server ACKs the
2136	        BNDUPD of an EXPIRED IP address, the server sets its internal
2137	        state to FREE.  The IP address is then available for allocation
2138	        to any client of the primary server.

2140	        client-hardware-address

2142	        There SHOULD be a DHCP client associated with the IP address
2143	        whose binding has expired.  If there is, then the client-
2144	        hardware-address option MUST appear, and be set from the MAC
2145	        address of the DHCP client to which this IP address was leased.

2147	        client-identifier

2149	        There SHOULD be a DHCP client associated with the IP address
2150	        whose binding has expired.  If there is, then if the DHCP client
2151	        to which this IP address was leased used a client-identifier
2152	        option to identify itself, then the client-identifier MUST
2153	        appear in the BNDUPD message, else it MUST NOT appear.

2155	      o RELEASED

2157	        A binding-status of RELEASED is used when a DHCP client sends in
2158	        a DHCPRELEASE message and the server does not wish to implement
2159	        a released-grace period.  When the partner server ACKs the
2160	        BNDUPD of a RELEASED IP address, the server sets its internal
2161	        state to FREE, and it is available for allocation by the primary
2162	        server to any DHCP client.

2164	        client-hardware-address

2166	        There SHOULD be a DHCP client associated with the IP address
2167	        whose binding has been released.  If there is, then the client-
2168	        hardware-address option MUST appear, and be set from the MAC
2169	        address of the DHCP client which released this IP address.

2171	        client-identifier

2173	        There SHOULD be a DHCP client associated with the IP address
2174	        whose binding has been released.  If there is, then if the DHCP
2175	        client which released this IP address used a client-identifier
2176	        option to identify itself, then the client-identifier MUST
2177	        appear in the BNDUPD message, else it MUST NOT appear.

2179	      o FREE

2181	        A binding-status of FREE is used when a DHCP server needs to
2182	        communicate that an IP address is available for allocation to
2183	        another server, but it was not just released, expired, or reset
2184	        by a network administrator.  When the partner server ACKs the
2185	        BNDUPD of a FREE IP address, the server sets its internal state
2186	        such that it is available for allocation to any DHCP client.

2188	        client-hardware-address

2190	        There MAY be a DHCP client associated with the IP address whose
2191	        binding is now desired to be FREE.  If there is, then the
2192	        client-hardware-address option MUST appear, and be set from the
2193	        MAC address of the DHCP client which released this IP address.

2195	        client-identifier

2197	        There MAY be a DHCP client associated with the IP address whose
2198	        binding is now desired to be FREE.  If there is, then if the
2199	        DHCP client which released this IP address used a client-
2200	        identifier option to identify itself, then the client-identifier
2201	        MUST appear in the BNDUPD message, else it MUST NOT appear.

2203	      o EXPIRED-GRACE

2205	        Some servers support a grace period after lease expiration, to
2206	        handle clock speed differences between clients and servers as
2207	        well as to limit the number of times names are removed and
2208	        subsequently added to dynamic DNS.

2210	        client-hardware-address

2212	        There MAY be a DHCP client associated with the IP address whose
2213	        binding has now expired.  If there is, then the client-
2214	        hardware-address option MUST appear, and be set from the MAC
2215	        address of the DHCP client to which this IP address was leased.

2217	        client-identifier

2219	        There MAY be a DHCP client associated with the IP address whose
2220	        binding has now expired.  If there is, then if the DHCP client
2221	        which most recently leased this IP address used a client-
2222	        identifier option to identify itself, then the client-identifier
2223	        MUST appear in the BNDUPD message, else it MUST NOT appear.

2225	        grace-expiration-time

2227	        The grace-expiration-time option MUST appear, and is the length
2228	        of time that this server will wait before trying to make the IP
2229	        address available after the lease has expired for this IP
2230	        address.

2232	      o RELEASED-GRACE

2234	        Some servers support a grace period after lease release by a
2235	        DHCP client, to handle clock speed differences between clients
2236	        and servers as well as to limit the number of times names are
2237	        removed and subsequently added to dynamic DNS.

2239	        client-hardware-address

2241	        There MAY be a DHCP client associated with the IP address whose
2242	        binding has now been released by sending a DHCPRELEASE.  If
2243	        there is, then the client-hardware-address option MUST appear,
2244	        and be set from the MAC address of the DHCP client which
2245	        released this IP address.

2247	        client-identifier

2249	        There MAY be a DHCP client associated with the IP address whose
2250	        binding has been released.  If there is, then if the DHCP client
2251	        which most recently leased this IP address used a client-
2252	        identifier option to identify itself, then the client-identifier
2253	        MUST appear in the BNDUPD message, else it MUST NOT appear.

2269	        grace-expiration-time

2271	        The grace-expiration-time MUST appear, and is the length of time
2272	        that this server will wait before trying to make the IP address
2273	        available after the lease was released for this IP address.

2275	      o ABANDONED

2277	        An ABANDONED IP address is one that has been considered unusable
2278	        by the DHCP subsystem.  An IP address for which a valid PING
2279	        response was received SHOULD be set to ABANDONED.

2281	        client-hardware-address

2283	        There SHOULD NOT be a DHCP client associated with an ABANDONED
2284	        IP address.  The client-hardware-address option MUST NOT appear
2285	        in the BNDUPD message.

2287	        client-identifier

2289	        There SHOULD NOT be a DHCP client associated with the IP address
2290	        whose binding has now been ABANDONED.  The client-identifier
2291	        option MUST NOT appear in the BNDUPD message.

2293	      o RESET

2295	        The RESET value of the binding-status is used to indicate that
2296	        this IP address was made available by operator command.

2298	      o BACKUP

2300	        The BACKUP value of binding-status indicates that this IP
2301	        address belongs to the secondary server, and can be allocated by
2302	        that server to a DHCP client at any time.

2304	        client-hardware-address

2306	        There MAY be a DHCP client associated with a BACKUP IP address.
2307	        If there is, the client-hardware-address option MUST appear, and
2308	        be set from the MAC address of the DHCP client to which this IP
2309	        address was most recently associated.

2311	        client-identifier

2313	        There MAY be a DHCP client associated with this IP address.  If
2314	        the DHCP client to which this IP address is leased used a
2315	        client-identifier option to identify itself, then the client-
2316	        identifier MUST appear in the BNDUPD message, else it MUST NOT
2317	        appear.

2319	   The following option information is generic to all BNDUPD messages,
2320	   regardless of the value of the binding-status.

2322	   o start-time-of-state

2324	     The start-time-of-state SHOULD appear.  It is set to the time at
2325	     which this IP address first took on the state that corresponds to
2326	     the current value of binding-status.

2328	   o last-transaction-time

2330	     The last-transaction-time value SHOULD appear.  This is the time at
2331	     which this DHCP server last received a packet from the DHCP client
2332	     referenced by the client-identifier or client-hardware-address that
2333	     was associated with the IP address referenced by the assigned-IP-
2334	     address.

2336	   o client-FQDN

2338	     If the DHCP server is performing dynamic DNS operations on behalf
2339	     of the DHCP client represented by the client-identifier or client-
2340	     hardware-address, then it should include a client-FQDN option con-
2341	     taining the host name, domain name, and status of any dynamic DNS
2342	     operations enabled.

2344	   The BNDUPD message SHOULD be sent as soon as possible from the time
2345	   that the DHCP client received a response and the lease bindings data-
2346	   base is written on stable storage.
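As an illustration of the ACTIVE case described above, the following
sketch assembles the options of a BNDUPD for a lease that was just ACKed
to a client.  It is a hedged example only: the function name, the
dictionary representation, and the extra_seconds parameter are
hypothetical.  In particular, how far the potential-expiration-time
should lie beyond the lease-expiration-time is governed by section
5.2.1, which this sketch does not attempt to reproduce.

```python
def active_bndupd_options(ip, hw_addr, ack_time, lease_seconds,
                          extra_seconds, state_start, client_id=None):
    """Build the option dictionary for an ACTIVE binding update.

    ack_time and state_start are absolute times in seconds;
    lease_seconds is the duration that was ACKed to the DHCP client.
    """
    if extra_seconds <= 0:
        raise ValueError("potential-expiration-time must lie beyond "
                         "the lease-expiration-time")
    # The client is told a duration; the BNDUPD carries an absolute time.
    lease_expiration = ack_time + lease_seconds
    options = {
        "assigned-IP-address": ip,
        "binding-status": "ACTIVE",
        "client-hardware-address": hw_addr,
        "lease-expiration-time": lease_expiration,
        "potential-expiration-time": lease_expiration + extra_seconds,
        "start-time-of-state": state_start,
    }
    # client-identifier appears only if the client used one.
    if client_id is not None:
        options["client-identifier"] = client_id
    return options
```

For example, a 3600 second lease ACKed at time 1000 with 600 seconds of
extra margin yields a lease-expiration-time of 4600 and a
potential-expiration-time of 5200.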

2348	7.1.2.  Receiving the BNDUPD message

2350	   When a server receives a BNDUPD message, it needs to decide how to
2351	   process the message and whether the message represents a conflict
2352	   of any sort. The conflict resolution process is used on the receipt
2353	   of every BNDUPD message, not just those that are received while in
2354	   POTENTIAL-CONFLICT state, in order to increase the robustness of the
2355	   protocol.

2357	   There are two sorts of conflict.  The first, more serious conflict is
2358	   when a server receives a BNDUPD message from its partner for an
2359	   ACTIVE IP address and finds that the client specified in the BNDUPD
2360	   message is different from the client associated with this ACTIVE IP
2361	   address in this server's bindings database.

2363	   The second sort of conflict is where the receiving server has in its
2364	   bindings database the client specified in the BNDUPD message associ-
2365	   ated with a different IP address.

2367	   These two conflict cases can both occur together with the same BNDUPD
2368	   message.

2370	   When receiving a BNDUPD message, the server first determines the IP
2371	   address from the assigned-IP-address option, and then determines if
2372	   there was any client associated with this IP address by looking for
2373	   the client-identifier option.  If there is no client-identifier
2374	   option, then the server looks for a client-hardware-address option,
2375	   and ultimately determines the client's identity specified in the
2376	   BNDUPD.

2378	   The client specified in the BNDUPD message is compared to the client
2379	   currently associated with the IP address in this server's bindings
2380	   database.  If they are the same, continue.  If there is no client in
2381	   this server's binding database, continue.  If there is a client in
2382	   this server's bindings database, and it is different from that speci-
2383	   fied in the BNDUPD message, a 'client conflict' exists.  See the sec-
2384	   tion below on conflict resolution.  If the client specified in the
2385	   BNDUPD message is associated with a different IP address in this
2386	   server's bindings database in the same subnet, then an 'IP address
2387	   conflict' exists. This does not refer to the case where a single
2388	   client has addresses in multiple different subnets or administrative
2389	   domains, but rather the case where in the same subnet the client has
2390	   a lease on one IP address in one server and on a different IP
2391	   address on the other server.  See the section below on conflict reso-
2392	   lution.
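The identification and comparison steps above can be sketched as
follows.  This is an illustration under assumptions, not a definitive
implementation: the bindings database is modeled as a dictionary from IP
address to client identity for a single subnet, and a BNDUPD that names
no client is assumed to raise no conflict, a case the text does not
spell out.

```python
def client_of(options):
    """Identify the client: client-identifier first, then hardware address."""
    if "client-identifier" in options:
        return options["client-identifier"]
    return options.get("client-hardware-address")


def detect_conflicts(update_options, subnet_bindings):
    """Return the set of conflicts a BNDUPD raises against local state.

    subnet_bindings maps IP address -> client identity (or None) for the
    subnet containing the assigned IP address (hypothetical structure).
    """
    ip = update_options["assigned-IP-address"]
    client = client_of(update_options)
    conflicts = set()
    if client is None:
        return conflicts  # assumption: no client named, nothing to compare
    local_client = subnet_bindings.get(ip)
    if local_client is not None and local_client != client:
        conflicts.add("client conflict")
    # Same client bound to a different address in the same subnet.
    if any(other_ip != ip and other_client == client
           for other_ip, other_client in subnet_bindings.items()):
        conflicts.add("IP address conflict")
    return conflicts
```

Both conflicts can be reported for a single BNDUPD, which matches the
observation that the two cases may occur together.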

2394	   If none of the conflicts mentioned above exist, then develop a time
2395	   for both the BNDUPD message and the server's information.

2397	   The times for both the BNDUPD and the server's information are
2398	   developed independently in the following way:  If there is a client-
2399	   last-transaction-time, use that.  If there isn't, but there is a
2400	   start-time-of-state, use that.  If there isn't, but there is a
2401	   lease-expiration-time, use that.  If there isn't, then use the time
2402	   the BNDUPD message was received for a BNDUPD message, and the current
2403	   time for the server's information.
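The fallback order just described amounts to taking the first available
value from an ordered list.  A minimal sketch, assuming each side's
information is held in a dictionary with the hypothetical field names
shown (they are not option names from this protocol):

```python
def effective_time(record, fallback):
    """Pick the comparison time for a BNDUPD or for local binding data.

    record uses hypothetical keys; fallback is the receipt time of the
    BNDUPD, or the current time when evaluating the server's own data.
    """
    for field in ("last_transaction_time", "start_time_of_state",
                  "expiration_time"):
        value = record.get(field)
        if value is not None:
            return value
    return fallback
```

The same function is applied once to the BNDUPD and once to the
server's own record, each with its own fallback.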

2405	   Then the server determines the binding-status in the BNDUPD, and
2406	   takes the following actions based on binding-status:

2408	   (In the following list, to "accept" a BNDUPD means to update the
2409	   server's bindings database with the information contained in the
2410	   BNDUPD and once that update is complete, send a BNDACK message
2411	   corresponding to the BNDUPD message).

2413	      o ACTIVE in BNDUPD

2415	        If the BNDUPD is LATER than the server's information, accept it,
2416	        else reject it.

2418	      o EXPIRED or EXPIRED-GRACE in BNDUPD

2420	        If the binding-status in the receiving server's bindings data-
2421	        base is ACTIVE, then reject the BNDUPD.  Otherwise, accept the
2422	        BNDUPD.

2424	        If the binding-status in the BNDUPD is EXPIRED-GRACE and the
2425	        server receiving the BNDUPD does not implement a grace period
2426	        for expired leases, then the server MUST set its lease expira-
2427	        tion to the value of the grace-expiration-time in the BNDUPD.

2429	      o RELEASED or RELEASED-GRACE in BNDUPD

2431	        If the BNDUPD is LATER than the server's information, accept it,
2432	        else reject it.

2434	        If the binding-status in the BNDUPD is RELEASED-GRACE and the
2435	        server receiving the BNDUPD does not implement a grace period
2436	        for released leases, then the server MUST set its lease expira-
2437	        tion to the value of the grace-expiration-time in the BNDUPD.

2439	      o FREE or BACKUP in BNDUPD

2441	        If the binding-status in the receiving server's database is
2442	        ACTIVE and the lease-expiration-time has not yet been reached,
2443	        reject it, else accept it.

2445	      o RESET or ABANDONED in BNDUPD

2447	        Accept it under all circumstances.
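The per-status rules in the list above can be collected into a single
decision function.  The sketch below is illustrative only and assumes
that no conflict was detected beforehand.  The function and parameter
names are hypothetical, and "LATER" is interpreted here as a strictly
greater time value, which is an assumption rather than something the
text states.

```python
def decide_bndupd(status, update_time, local_time,
                  local_status, local_lease_expiration, now):
    """Return "accept" or "reject" for a conflict-free BNDUPD."""
    if status in ("ACTIVE", "RELEASED", "RELEASED-GRACE"):
        # Accept only if the update is LATER than the local information.
        return "accept" if update_time > local_time else "reject"
    if status in ("EXPIRED", "EXPIRED-GRACE"):
        return "reject" if local_status == "ACTIVE" else "accept"
    if status in ("FREE", "BACKUP"):
        still_leased = (local_status == "ACTIVE"
                        and local_lease_expiration is not None
                        and now < local_lease_expiration)
        return "reject" if still_leased else "accept"
    if status in ("RESET", "ABANDONED"):
        return "accept"  # accepted under all circumstances
    raise ValueError(f"unknown binding-status: {status}")
```

Adjusting the local lease expiration for the EXPIRED-GRACE and
RELEASED-GRACE cases is a separate step that would follow an "accept"
result.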

2449	7.1.3.  Conflict resolution when receiving the BNDUPD message

2451	   When either of the following conflicts exists between the informa-
2452	   tion in a BNDUPD message and the information held in the receiving
2453	   server's bindings database, it should be resolved in the following
2454	   manner:

2456	      o client conflict

2458	        This is the duplicate IP address allocation conflict. There are
2459	        two different clients each allocated the same address.

2461	        If times for both exist, use the LATER update, else use the
2462	        information from the primary server.

2464	      o IP address conflict

2466	        An IP address conflict exists when a client on one server is
2467	        associated with one IP address, and on the other server with a
2468	        different IP address in the same or a related subnet. If one
2469	        binding-status is ACTIVE and the other is anything but ACTIVE,
2470	        then the information in the ACTIVE binding SHOULD be used.  Oth-
2471	        erwise, if times exist, then the LATER SHOULD be used. Other-
2472	        wise, if times do not exist, then the information from the pri-
2473	        mary server should be used.
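The two resolution rules can be sketched as below.  This is a hedged
illustration: the names are hypothetical, the result says only which
side's information prevails ("update" for the BNDUPD, "local" for the
receiving server), and a tie between equal times is resolved in favor of
the local information, a choice the text leaves open.

```python
def _by_time_or_primary(update_time, local_time, update_from_primary):
    """Use the LATER side if both times exist, else the primary's data."""
    if update_time is not None and local_time is not None:
        return "update" if update_time > local_time else "local"
    return "update" if update_from_primary else "local"


def resolve_client_conflict(update_time, local_time, update_from_primary):
    """Two different clients were allocated the same IP address."""
    return _by_time_or_primary(update_time, local_time, update_from_primary)


def resolve_ip_conflict(update_status, local_status,
                        update_time, local_time, update_from_primary):
    """One client holds different IP addresses on the two servers."""
    update_active = update_status == "ACTIVE"
    local_active = local_status == "ACTIVE"
    # Exactly one side ACTIVE: the ACTIVE binding is used.
    if update_active != local_active:
        return "update" if update_active else "local"
    return _by_time_or_primary(update_time, local_time, update_from_primary)
```

The update_from_primary flag is true when the BNDUPD was sent by the
primary server, that is, when the receiving server is the secondary.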

2475	7.2.  BNDACK message

2477	   Every BNDUPD message that is received by a server MUST be responded
2478	   to with a corresponding BNDACK message.  The receiving server SHOULD
2479	   respond quickly to every BNDUPD message but it MAY choose to respond
2480	   preferentially to DHCP client requests instead of BNDUPD messages,
2481	   since there is no absolute time period within which a BNDACK must be
2482	   sent in response to a BNDUPD message, and DHCP clients frequently do
2483	   have time constraints that must be met.

2485	7.2.1.  Sending the BNDACK message

2487	   The BNDACK message MUST contain the same xid as the corresponding
2488	   BNDUPD message.

2490	   All of the options which appear in the BNDUPD message MUST be
2491	   included in the BNDACK message.  The values in the options MAY be
2492	   updated to reflect current information on the server sending the
2493	   BNDACK.   Note that update of this information may be used for infor-
2494	   mational purposes, but MUST NOT be assumed to necessarily be recorded
2495	   in the stable storage of the server who sent the BNDUPD message
2496	   because there is no corresponding ACK of the BNDACK message.  Any
2497	   information that SHOULD be recorded in the partner server's stable
2498	   storage MUST be transmitted in a subsequent BNDUPD.

2500	   If the server is accepting the BNDUPD, the BNDACK message includes
2501	   only those options that appear in the BNDUPD message. If the server
2502	   is rejecting the BNDUPD, the additional option reject-reason MUST
2503	   appear in the BNDACK message, and the message option SHOULD appear in
2504	   this case containing a human-readable error message describing in
2505	   some detail the reason for the rejection of the BNDUPD message.
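
   As a rough illustration of the rules in this section, the sketch
   below builds a BNDACK from a received BNDUPD.  Modelling messages
   as Python dictionaries, and the function name build_bndack, are
   assumptions made for the example; the wire format is defined
   elsewhere in this document.

```python
from typing import Dict, Optional


def build_bndack(bndupd: Dict, reject_reason: Optional[int] = None,
                 message: Optional[str] = None) -> Dict:
    """Build a BNDACK for a received BNDUPD (both modelled as dicts)."""
    ack = {
        "type": "BNDACK",
        "xid": bndupd["xid"],                # MUST match the BNDUPD xid
        "options": dict(bndupd["options"]),  # echo every BNDUPD option
    }
    if reject_reason is not None:
        # Rejection: reject-reason MUST appear, message SHOULD appear.
        ack["options"]["reject-reason"] = reject_reason
        if message is not None:
            ack["options"]["message"] = message
    return ack
```

   The options are copied rather than shared, so adding reject-reason
   to the BNDACK does not alter the stored BNDUPD.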

2507	7.2.2.  Receiving the BNDACK message

2509	   When a server receives a BNDACK message, if it doesn't contain a
2510	   reject-reason option that means that the BNDUPD message was accepted,
2511	   and the server which sent the BNDUPD MUST update its stable storage
2512	   with the potential-expiration-time value sent in the BNDUPD message
2513	   and returned in the BNDACK message.  Other values sent in the BNDUPD
2514	   message MAY be used as desired.

2516	7.3.  UPDREQ message

2518	   The update request (UPDREQ) message is used by one server to request
2519	   that its partner send it all of the binding database information that
2520	   it has not already seen.   Since each server is required to keep
2521	   track at all times of the binding information the other server has
2522	   received and ACKed, one server can request transmission of all un-
2523	   ACKed binding database information held by the other server by using
2524	   the UPDREQ message.

2526	   The UPDREQ message is used whenever the sending server cannot proceed
2527	   before it has processed all previously un-ACKed binding update infor-
2528	   mation, since the UPDREQ message should yield a corresponding UPDDONE
2529	   message.  The UPDDONE message is not sent until the server that sent
2530	   the UPDREQ message has responded to all of the BNDUPD messages gen-
2531	   erated by the UPDREQ message with BNDACK messages. Thus, the sender
2532	   of the UPDREQ message can be sure upon receipt of an UPDDONE message
2533	   that it has received and committed to stable storage all outstanding
2534	   binding database updates.

2536	   See section 9, Protocol state transitions, for the details of when the
2537	   UPDREQ message is sent.

2539	7.3.1.  Sending the UPDREQ message

2541	   There are no options for the UPDREQ message.

2543	   The UPDREQ message is sent with a unique xid.

2545	7.3.2.  Receiving the UPDREQ message

2547	   A server receiving an UPDREQ message MUST send all binding database
2548	   changes that have not yet been ACKed by the sending server.   These
2549	   changes are sent as undistinguished BNDUPD messages.

2551	   However, the server which received and is processing the UPDREQ mes-
2552	   sage MUST track the BNDACK messages that correspond to the BNDUPD
2553	   messages triggered by the UPDREQ message and, when they are all
2554	   received, the server MUST send an UPDDONE message.

2556	   When queuing up the BNDUPD messages for transmission to the sender of
2557	   the UPDREQ message, the receiving server MUST honor the value
2558	   returned in the max-unacked-bndupd option in the CONNECT or CONNEC-
2559	   TACK message that set up the connection with the sending server.  It
2560	   MUST NOT send more BNDUPD messages without receiving corresponding
2561	   BNDACKs than the value returned in max-unacked-bndupd.
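
   One way to satisfy both requirements above (honoring max-unacked-
   bndupd and sending UPDDONE only after every BNDACK has arrived) is
   a simple send window.  The class below is a sketch under stated
   assumptions: updates are identified by their xid, the send callback
   is hypothetical, and the xid carried by the UPDDONE message is not
   modelled.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Set


class UpdateRequestResponder:
    """Sends queued BNDUPDs without exceeding max-unacked-bndupd."""

    def __init__(self, updates: Iterable[int], max_unacked: int,
                 send: Callable[[str, Optional[int]], None]) -> None:
        self.pending: Deque[int] = deque(updates)  # xids not yet sent
        self.unacked: Set[int] = set()             # sent, awaiting BNDACK
        self.max_unacked = max_unacked
        self.send = send
        self.done_sent = False
        self._pump()

    def _pump(self) -> None:
        # Fill the window, never exceeding the partner's limit.
        while self.pending and len(self.unacked) < self.max_unacked:
            xid = self.pending.popleft()
            self.unacked.add(xid)
            self.send("BNDUPD", xid)
        # Everything sent and ACKed: send UPDDONE exactly once.
        if not self.pending and not self.unacked and not self.done_sent:
            self.done_sent = True
            self.send("UPDDONE", None)

    def on_bndack(self, xid: int) -> None:
        """Record a BNDACK and send more BNDUPDs if the window allows."""
        self.unacked.discard(xid)
        self._pump()
```

   With three queued updates and a limit of 2, the responder sends two
   BNDUPDs at once, sends the third after the first BNDACK, and sends
   UPDDONE after the last BNDACK.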

2563	7.4.  UPDREQALL message

2565	   The update request all (UPDREQALL) message is used by one server to
2566	   request that its partner send it all of the binding database informa-
2567	   tion.  This message is used to allow one server to recover from a
2568	   failure of stable storage and to restore its binding database in its
2569	   entirety from the other server.

2571	   A server which sends an UPDREQALL message cannot proceed until all of
2572	   its binding update information is restored, and it knows that all of
2573	   that information is restored when an UPDDONE message is received.

2575	   See section 9, Protocol state transitions, for the details of when the
2576	   UPDREQALL message is sent.

2578	7.4.1.  Sending the UPDREQALL message

2580	   There are no options for the UPDREQALL message.

2582	   The UPDREQALL message is sent with a unique xid.

2584	7.4.2.  Receiving the UPDREQALL message

2586	   A server receiving an UPDREQALL message MUST send all binding data-
2587	   base information to the sending server.  These changes are sent as
2588	   undistinguished BNDUPD messages.

2590	   However, the server receiving the UPDREQALL message MUST track the
2591	   BNDACK messages that correspond to the BNDUPD messages triggered by
2592	   the UPDREQALL message and, when they are all received, the server
2593	   MUST send an UPDDONE message.

2595	   When queuing up the BNDUPD messages for transmission to the sender of
2596	   the UPDREQALL message, the receiving server MUST honor the value
2597	   returned in the max-unacked-bndupd option in the CONNECT or CONNEC-
2598	   TACK message that set up the connection with the sending server.  It
2599	   MUST NOT send more BNDUPD messages without receiving corresponding
2600	   BNDACKs than the value returned in max-unacked-bndupd.

2602	7.5.  UPDDONE message

2604	   The update done (UPDDONE) message is used by a server receiving an
2605	   UPDREQ or UPDREQALL message to signify that it has sent all of the
2606	   BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
2607	   it has received a BNDACK for each of those messages.

2609	7.5.1.  Sending the UPDDONE message

2611	   The UPDDONE message SHOULD be sent as soon as the last BNDACK message
2612	   corresponding to a BNDUPD message requested by the UPDREQ or
2613	   UPDREQALL is received from the server which sent the UPDREQ or
2614	   UPDREQALL.

2616	7.5.2.  Receiving the UPDDONE message

2618	   A server receiving the UPDDONE message knows that all of the informa-
2619	   tion that it requested by sending an UPDREQ or UPDREQALL message has
2620	   now been sent and that it has recorded this information in its stable
2621	   storage.  It typically uses the receipt of an UPDDONE message to
2622	   move to a different failover state.  See sections 9.5.2 and 9.8.3 for
2623	   details.

2625	7.6.  POOLREQ message

2627	   The pool request (POOLREQ) message is used by the secondary server to
2628	   request an allocation of IP addresses from the primary server.   It
2629	   MUST be sent by a secondary server to a primary server to request IP
2630	   address allocation by the primary.  The IP addresses allocated are
2631	   transmitted using normal BNDUPD messages from the primary to the
2632	   secondary.

2634	   The POOLREQ message SHOULD be sent from the secondary to the primary
2635	   whenever the secondary transitions into NORMAL state.  It SHOULD
2636	   periodically be resent in order that any change in the number of
2637	   available IP addresses on the primary be reflected in the pool on the
2638	   secondary.

2640	7.6.1.  Sending the POOLREQ message

2642	   The POOLREQ message has no options.  It must be sent with a unique
2643	   xid.

2645	7.6.2.  Receiving the POOLREQ message

2647	   When a primary server receives a POOLREQ message it SHOULD examine
2648	   the binding database and determine how many IP addresses the secon-
2649	   dary server should have, and set these IP addresses to BACKUP state.
2650	   It SHOULD then send BNDUPD messages concerning all of these IP
2651	   addresses to the secondary server.

2653	   Servers frequently have several kinds of IP addresses available on a
2654	   particular network segment.  The failover protocol assumes that both
2655	   primary and secondary servers are configured in such a way that each
2656	   knows the type and number of IP addresses on every network segment
2657	   participating in the failover protocol.  The primary server is
2658	   responsible for allocating the secondary server the correct propor-
2659	   tion of available IP addresses of each kind, and the secondary server
2660	   is responsible for being configured in such a way that it can tell
2661	   the kind of every IP address based solely on the IP address itself.

2663	   A primary server MUST keep track of how many IP addresses were allo-
2664	   cated as a result of processing the POOLREQ message, and send that
2665	   number in the POOLRESP message.

2667	   A primary server MAY choose to defer processing a POOLREQ message
2668	   until a more convenient time to process it, but it should not depend
2669	   on the secondary server to retransmit the POOLREQ message in that
2670	   case.

2672	   If a secondary server receives a POOLREQ message it SHOULD report an
2673	   error.

2675	7.7.  POOLRESP message

2677	   A primary server sends a POOLRESP message to a secondary server after
2678	   the allocation process for available addresses to the secondary
2679	   server is complete.  Typically this message will precede some of the
2680	   BNDUPD messages that the primary uses to send the actual allocated IP
2681	   addresses to the secondary.

2683	7.7.1.  Sending the POOLRESP message

2685	   The POOLRESP message MUST contain the same xid as the corresponding
2686	   POOLREQ message.

2688	   The only option which MUST appear in a POOLRESP message is:

2690	      o addresses-transferred

2692	        The number of addresses allocated to the secondary server by the
2693	        primary server as a result of a POOLREQ is contained in the
2694	        addresses-transferred option in a POOLRESP message.  Note this
2695	        is the number of addresses that are transferred to the secondary
2696	        in the primary's binding database as a result of the correspond-
2697	        ing POOLREQ message, and that it may be some time before they
2698	        can all be transmitted to the secondary server through the use
2699	        of BNDUPD messages.

2701	7.7.2.  Receiving the POOLRESP message

2703	   When a secondary server receives a POOLRESP message, it SHOULD send
2704	   another POOLREQ message if the value of the addresses-transferred
2705	   option is non-zero.

2707	   Typically, no other action is taken on the reception of a POOLRESP
2708	   message.

2710	7.8.  CONNECT message

2712	   The connect message is used to establish an application-level con-
2713	   nection over a newly created TCP connection.  It gives the source
2714	   information for the connection, and some important configuration
2715	   information.  It may be sent by either primary or secondary server.
2716	   It is sent by the initiator of a TCP connection.

2718	7.8.1.  Sending the CONNECT message

2720	   The CONNECT message MUST be the first message sent by the initiator
2721	   of a TCP connection after the establishment of a new TCP connection
2722	   with another server participating in the failover protocol.

2724	   The xid of the CONNECT message must be unique.

2726	   The IP address of the sending server MUST be placed in the sending-
2727	   server-IP-address option.  This information is placed in an option
2728	   inside of the packet in order to allow the identity of the sender to
2729	   be covered by a shared secret.

2731	   The role of the sending failover endpoint (i.e., either primary or
2732	   secondary) MUST be placed in the server-role option.

2734	   The current time MUST be placed in the current-time option.

2736	   The number of BNDUPD messages the server can accept without blocking
2737	   the TCP connection MUST be placed in the max-unacked-bndupd option.
2738	   This MUST be a number equal to or greater than 1, SHOULD be a number
2739	   greater than 10, and SHOULD be a number less than 100.

2741	   The length of the receive timer (tReceive, see section 8.3) MUST be
2742	   placed in the receive-timer option.

2744	   If the sending server is a primary server, then the MCLT MUST be
2745	   placed in the MCLT option.

2747	   If the sending server is a primary server, then the hash-bucket-
2748	   assignment option MUST be included in the CONNECT message.  The value
2749	   of the hash-bucket-assignment option is determined from the specific
2750	   buckets that the primary server has determined that the secondary
2751	   server MUST service as part of the load-balancing algorithm.  The way
2752	   in which the primary server determines this information is outside
2753	   the scope of this protocol definition.  The primary server SHOULD
2754	   be able to be configured with a percentage of clients that the secon-
2755	   dary server will be instructed to service, and the primary server
2756	   SHOULD convert that percentage value into a corresponding set of bits
2757	   in the hash-bucket-assignment option that are set to a 1, indicating
2758	   that the secondary server MUST service clients which map to those
2759	   hash buckets.
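
   The conversion from a configured percentage to a set of bits is
   left to the implementation.  The sketch below shows one possible
   approach and should not be read as the required method.  It assumes
   256 hash buckets carried as a 32-byte bitmap, assigns the lowest-
   numbered buckets to the secondary, and assumes the most significant
   bit of each byte is the lowest-numbered bucket; none of these
   choices is specified in this section.

```python
def hash_bucket_assignment(percent_for_secondary: float,
                           num_buckets: int = 256) -> bytes:
    """Return a bitmap with about the requested share of bits set to 1."""
    if not 0 <= percent_for_secondary <= 100:
        raise ValueError("percentage must be between 0 and 100")
    count = round(num_buckets * percent_for_secondary / 100)
    bitmap = bytearray(num_buckets // 8)
    for bucket in range(count):
        # Assumption: most significant bit first within each byte.
        bitmap[bucket // 8] |= 0x80 >> (bucket % 8)
    return bytes(bitmap)
```

   A real implementation might spread the assigned buckets instead of
   taking a contiguous range; the text only requires that the set bits
   correspond to the buckets the secondary is to service.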

2761	   The vendor class identifier MUST be placed in the vendor-class-
2762	   identifier option.

2764	   The protocol-version option MUST be included in every CONNECT mes-
2765	   sage.  The current value of the protocol version is 1.

2767	   The TLS-request option MUST be sent and contains the desired TLS con-
2768	   nection request as well as information concerning whether TLS is sup-
2769	   ported.  If this CONNECT message is being sent over an already
2770	   created TLS connection, the TLS-request MUST NOT appear.

2772	7.8.2.  Receiving the CONNECT message

2774	   When a server receives a TCP connection on the failover port, it
2775	   should wait for a CONNECT message.

2777	   When a server receives a CONNECT message it should:

2779	      1.  Record the time at which the message was received.

2781	      2.  Examine the protocol-version option, and decide if this server
2782	          is capable of interoperating with another server running that
2783	          protocol version.  If not, then send the CONNECTACK message
2784	          with the appropriate reject-reason.  The server MUST include
2785	          its protocol-version in the CONNECTACK message.

2787	      3.  Examine the TLS-request option.  Figure out the TLS-reply
2788	          value based on the capabilities and configuration of this
2789	          server, and save it for the CONNECTACK message.  If the
2790	          results of the TLS negotiation result in a connection rejec-
2791	          tion, then go immediately to send the CONNECTACK message.

2793	          The possibilities are:

2795	                CONNECT      CONNECTACK
2796	              TLS-request     TLS-reply
2797	                               Reject
2798	              req acc     t1   Reason   Comments
2799	              --- ---     --   ------   --------
2800	              0   0       0
2801	              0   0       1    11       receiver requires TLS
2802	              0   1       0
2803	              0   1       1
2804	              1   0       -             request doesn't make sense
2805	              1   1       0
2806	              1   1       1
2807	              2   0       -             request doesn't make sense
2808	              2   1       0    9 or 10  receiver won't do TLS
2809	              2   1       1

2811	      4.  Check to see if there is a message-digest option in the CON-
2812	          NECT message.  If there was, and the server does not support
2813	          message-digests, then reject the connection with the appropri-
2814	          ate reject-reason in the CONNECTACK.

2816	      5.  Determine if the sender (from the sending-server-IP-address
2817	          option) and the role of the sender (from the server-role
2818	          option) represent a server with which the receiver was config-
2819	          ured to engage in failover activity.

2821	          If not, then the receiving server should reject the CONNECT
2822	          request by sending a CONNECTACK message with a reject-reason
2823	          value of: 8, invalid failover partner.

2825	          If it is, then the receiving failover endpoint should be
2826	          determined.

2828	      6.  Decide if the time delta between the sending of the packet, in
2829	          the current-time option, and the receipt of the packet,
2830	          recorded in step 1 above, is acceptable.  A server MAY require
2831	          an arbitrarily small delta in time values in order to set up a
2832	          failover connection with another server.

2834	          If the delta between the time values is too great, the server
2835	          should reject the CONNECT request by sending a CONNECTACK mes-
2836	          sage with a reject-reason of 4, time mismatch too great.

2838	          If the time mismatch is not considered too great then the
2839	          receiving server MUST record the delta between the servers.
2840	          The receiving server MUST use this delta to correct all of the
2841	          absolute times received from the other server in all time-
2842	          valued options.  Note that servers can participate in fail-
2843	          over with arbitrarily great time mismatches, as long as the
2844	          mismatch is more or less constant.

2846	      7.  If the receiving server is a secondary server, it MUST examine
2847	          the MCLT option in the CONNECT request and use the value of
2848	          the MCLT as the MCLT for this failover endpoint.

2850	          A receiving secondary server SHOULD be able to operate with
2851	          any MCLT sent by the primary,  but if it cannot, then it
2852	          should send a CONNECTACK with a reject-reason of 5, MCLT
2853	          mismatch.

2855	      8.  The receiving server MAY use the vendor-class-identifier to do
2856	          vendor specific processing.
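
   The time handling in steps 1 and 6 above can be sketched as shown
   below.  The function names and the max_delta policy parameter are
   hypothetical, times are taken to be integer seconds, and the delta
   is defined here as local time minus partner time; only the reject-
   reason value of 4 comes from the text.

```python
from typing import Optional, Tuple

REJECT_TIME_MISMATCH = 4   # "time mismatch too great" (from the text)


def check_time_delta(partner_current_time: int, received_at: int,
                     max_delta: Optional[int]) -> Tuple[Optional[int], int]:
    """Return (reject_reason, delta); reject_reason is None on success.

    delta is the local receipt time minus the partner's current-time.
    max_delta is a local policy limit; None accepts any delta.
    """
    delta = received_at - partner_current_time
    if max_delta is not None and abs(delta) > max_delta:
        return REJECT_TIME_MISMATCH, delta
    return None, delta


def to_local_time(partner_time: int, delta: int) -> int:
    """Correct an absolute time received from the partner."""
    return partner_time + delta
```

   For example, if the partner's current-time is 1000 and the message
   arrives at local time 1030, the recorded delta is 30, and a lease
   expiration of 5000 sent by the partner is treated as 5030 locally.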

2858	7.9.  CONNECTACK message

2860	   The CONNECTACK message is sent to accept or reject a CONNECT message.
2861	   It is sent by the server which accepted the TCP connection and
2862	   received a CONNECT message.

2864	7.9.1.  Sending the CONNECTACK message

2866	   The xid of the CONNECTACK message must be that of the corresponding
2867	   CONNECT message.

2869	   The IP address of the sending server MUST be placed in the sending-
2870	   server-IP-address option.  This information is placed in an option
2871	   inside of the packet in order to allow the identity of the sender to
2872	   be covered by a shared secret.

2874	   The role of the sending failover endpoint (i.e., either primary or
2875	   secondary) MUST be placed in the server-role option.

2877	   The current time MUST be placed in the current-time option.

2879	   The protocol-version option MUST be included in every CONNECTACK mes-
2880	   sage.  The current value of the protocol version is 1.

2882	   If the connection has been rejected, the reject-reason option MUST be
2883	   placed in the CONNECTACK message with an appropriate reason, and a
2884	   message option SHOULD be included with a human-readable error message
2885	   describing the reason for the rejection in some detail.  If the
2886	   reject-reason option appears, then the remaining options listed below
2887	   do not appear.

2889	   The results of the TLS negotiation MUST be placed in the TLS-reply
2890	   option.  If this CONNECTACK message is being sent over an already TLS
2891	   secured connection, then there MUST NOT be a TLS-reply option.

2893	   If there was a message-digest option in the CONNECT message, then
2894	   there MUST be a message-digest in the CONNECTACK message if it does
2895	   not contain a reject-reason.

2897	   The number of BNDUPD messages the server can accept without blocking
2898	   the TCP connection MUST be placed in the max-unacked-bndupd option.
2899	   This SHOULD be a number greater than 10, and SHOULD be a number less
2900	   than 100.

2902	   The length of the receive timer (tReceive, see section 8.3) MUST be
2903	   placed in the receive-timer option.

2905	   If the sending server is a primary server, then the MCLT MUST be
2906	   placed in the MCLT option.

2908	   The vendor class identifier MUST be placed in the vendor-class-
2909	   identifier option.

2911	   If the server is rejecting the CONNECT message, then the reject-
2912	   reason option MUST appear.  A message option MAY appear to give a
2913	   human readable version of the rejection reason.

2915	   After sending a CONNECTACK message, the server MUST send a STATE mes-
2916	   sage.

2918	   After sending a CONNECTACK message, the server MUST start two timers
2919	   for the connection: tSend and tReceive.   The tSend timer SHOULD be
2920	   approximately 20 percent of the time in the receive-timer option in
2921	   the corresponding CONNECT message.  The tReceive timer SHOULD be the
2922	   time sent in the receive-timer option in the CONNECTACK message.

2924	   The tReceive timer is reset whenever a message is received from this
2925	   TCP connection.  If it ever expires, the TCP connection is dropped
2926	   and communications with this partner is considered not ok.

2928	   The tSend timer is reset whenever a packet is sent over this connec-
2929	   tion. When it expires, a CONTACT message MUST be sent.
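
   The behavior of the two timers can be sketched as follows.  The
   class is illustrative only: it is driven by explicit clock values
   rather than real timers, and the class name, method names, and
   returned action strings are assumptions of the example.

```python
class ConnectionTimers:
    """tSend / tReceive bookkeeping driven by an explicit clock value."""

    def __init__(self, now: float, partner_receive_timer: float,
                 own_receive_timer: float) -> None:
        # tSend: about 20 percent of the partner's advertised value.
        self.t_send = partner_receive_timer / 5
        # tReceive: the value this server advertised to the partner.
        self.t_receive = own_receive_timer
        self.last_sent = now
        self.last_received = now

    def on_send(self, now: float) -> None:
        self.last_sent = now          # any packet sent resets tSend

    def on_receive(self, now: float) -> None:
        self.last_received = now      # any message received resets tReceive

    def poll(self, now: float) -> str:
        """Return the action due at time `now`."""
        if now - self.last_received >= self.t_receive:
            return "drop-connection"  # communications considered not ok
        if now - self.last_sent >= self.t_send:
            return "send-CONTACT"
        return "idle"
```

   Because tSend is well below the partner's tReceive, an otherwise
   idle connection carries several CONTACT messages per tReceive
   period, so a single delayed message need not expire the partner's
   timer.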

2931	7.9.2.  Receiving the CONNECTACK message

2933	   When a CONNECTACK message is received, the following actions should
2934	   be taken:

2936	      1.  Record the time the packet was received.

2938	      2.  Check to see if there is a reject-reason option in the CONNEC-
2939	          TACK message.  If not, continue with step 3.  If there is a
2940	          reject-reason option, the server SHOULD report the error code.
2941	          If a message option appears a server SHOULD display the string
2942	          from the message option in a user visible way.  The server
2943	          MUST close the connection if a reject-reason option appears.

2945	      3.  Check to see if the xid on the CONNECTACK matches an outstand-
2946	          ing CONNECT message on this TCP connection.

2948	      4.  Check the value of the TLS-reply option, and if it was 1, then
2949	          skip processing of the rest of the CONNECTACK message, and
2950	          immediately enter into TLS connection setup.

2952	          If it does not, a server SHOULD report an error.

2954	      5.  Examine the value of the protocol-version option.  If this
2955	          server is able to establish connections with another server
2956	          running this protocol version, then continue, else close the
2957	          connection.

2959	      6.  Check to see if the sending-server-IP-address and server-role
2960	          in the CONNECTACK message correspond to the failover endpoint
2961	          for which this TCP connection was created.

2963	          If it was not, the server MUST drop the TCP connection and
2964	          SHOULD report an error.

2966	      7.  Decide if the time delta between the sending of the packet, in
2967	          the current-time option, and the receipt of the packet,
2968	          recorded in step 1 above, is acceptable.  A server MAY require
2969	          an arbitrarily small delta in time values in order to set up a
2970	          failover connection with another server.

2972	          If the delta between the time values is too great, the server
2973	          should drop the TCP connection.

2975	          If the time mismatch is not considered too great then the
2976	          receiving server MUST record the delta between the servers.
2977	          The receiving server MUST use this delta to correct all of the
2978	          absolute times received from the other server in all time-
2979	          valued options.  Note that the failover protocol is con-
2980	          structed so that two servers can be failover partners with
2981	          arbitrarily great time mismatches.

2983	      8.  If the receiving server is a secondary server, it MUST examine
2984	          the MCLT option in the CONNECTACK message and use the value of
2985	          the MCLT as the MCLT for this failover endpoint.

2987	          A receiving secondary server SHOULD be able to operate with
2988	          any MCLT sent by the primary,  but if it cannot, then it MUST
2989	          drop the TCP connection.

2991	      9.  The receiving server MAY use the vendor-class-identifier to do
2992	          vendor specific processing.

2994	      10. After accepting a CONNECTACK message, the server MUST send a
2995	          STATE message.

2997	          After receiving a CONNECTACK message, the server MUST start
2998	          two timers for the connection: tSend and tReceive.   The tSend
2999	          timer SHOULD be approximately 20 percent of the time in the
3000	          receive-timer option in the corresponding CONNECTACK message.
3001	          The tReceive timer SHOULD be set to the time sent in the
3002	          receive-timer option in the CONNECT message.

3004	          The tReceive timer is reset whenever a message is received
3005	          from this TCP connection.  If it ever expires, the TCP connec-
3006	          tion is dropped and communications with this partner is con-
3007	          sidered not ok.

3009	          The tSend timer is reset whenever a packet is sent over this
3010	          connection. When it expires, a CONTACT message MUST be sent.

3012	7.10.  STATE message

3014	   The state (STATE) message is used to communicate the current failover
3015	   state to the partner server.

3017	   The STATE message MUST be sent after sending a CONNECTACK message
3018	   that didn't contain a reject-reason option, and MUST be sent after
3019	   receiving a CONNECTACK message without a reject-reason option.

3021	   A STATE message MUST be sent whenever the failover endpoint changes
3022	   its failover state and a connection exists to the partner.

3024	   The STATE message requires no response from the failover partner.

3026	7.10.1.  Sending the STATE message

3028	   The current failover state is placed in the server-state option and
3029	   the current state of the STARTUP flag is placed in the server-flags
3030	   option.

3032	   The message is sent with a unique xid.

3034	   A server SHOULD only send the STATE message either when the connec-
3035	   tion is created (i.e., after sending or receiving a CONNECTACK message
3036	   with no reject-reason option), or when there is a change from the
3037	   values sent in a previous STATE message.

3039	7.10.2.  Receiving the STATE message

3041	   Every STATE message SHOULD indicate a change in state or a change in
3042	   the flags.

3044	   When a STATE message is received, any state transitions specified in
3045	   section 9 are taken.

3047	   No response to a STATE message is required.

3049	7.11.  CONTACT message

3051	   The contact (CONTACT) message is sent to verify communications
3052	   integrity with a failover partner. The CONTACT message is sent when
3053	   no messages have been sent to the failover partner for a specified
3054	   period of time.  This is determined by the tSend timer expiring (see
3055	   section 8.3).

3057	7.11.1.  Sending the CONTACT message

3059	   The current time is placed in the current-time option, and the CON-
3060	   TACT message is sent.

3062	7.11.2.  Receiving the CONTACT message

3064	   When a CONTACT message is received, the tReceive timer is reset (as
3065	   it is with any message that is received).

3067	   A server MAY use the time in the current-time option and the time
3068	   recorded above to refine the delta time calculations between the
3069	   servers.

3071	8.  Connection Management

3073	   Servers participating in the failover protocol communicate over TCP
3074	   connections.   These TCP connections are used both to transmit bind-
3075	   ing information from one server to another as well as to allow each
3076	   server to determine whether communications is possible with the other
3077	   server.

3079	   Central to the operation of the failover protocol is a notion of
3080	   "communications okay" or "communications failed".  Failover state
3081	   transitions are taken in many cases when the status of communications
3082	   with the partner changes, and the existence or non-existence of a TCP
3083	   connection between failover endpoints is used to determine if com-
3084	   munications is "okay" or "failed".

3086	   A single TCP connection exists which connects two failover endpoints.

3088	8.1.  Connection granularity

3090	   There exists one TCP connection between each set of failover end-
3091	   points.  See section 5.1.1 for an explanation of failover endpoint.

3093	   There are a maximum of two TCP connections between any two servers
3094	   implementing the failover protocol, one for each of the possible
3095	   failover endpoints between these two servers.  There is a minimum of
3096	   one TCP connection between one server and every other failover server
3097	   with which it implements the failover protocol.

3099	8.2.  Creating the TCP connection

3101	   Every server implementing the failover protocol MUST listen on port
3102	   647 for incoming failover TCP connections.  The source port of the
3103	   TCP connection is unimportant.

3105	   Every server implementing the failover protocol SHOULD attempt to
3106	   connect to all of its partners periodically, where the period is
3107	   implementation dependent and SHOULD be configurable. In the event
3108	   that a connection has been rejected by a CONNECTACK message with a
3109	   reject-reason option contained in it, a server SHOULD reduce the fre-
3110	   quency with which it attempts to connect to that server but it SHOULD
3111	   continue to attempt to connect periodically.
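
   The retry behavior described above might be implemented along the
   following lines.  Doubling the period after each rejection and
   capping it at a maximum are assumptions of this sketch; the text
   only asks that the frequency be reduced and that attempts continue.

```python
class ReconnectSchedule:
    """Chooses the delay before the next connection attempt."""

    def __init__(self, base_period: float, max_period: float) -> None:
        self.base_period = base_period  # configurable normal period
        self.max_period = max_period    # cap, so attempts never stop
        self.current = base_period

    def on_rejected(self) -> float:
        """CONNECTACK carried a reject-reason: retry less often."""
        self.current = min(self.current * 2, self.max_period)
        return self.current

    def on_connected(self) -> float:
        """Connection established: return to the normal period."""
        self.current = self.base_period
        return self.current
```

   Each method returns the number of seconds to wait before the next
   attempt, so the caller can feed the value directly into whatever
   timer facility it uses.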

3113	   Once a connection is established, the first message sent across the
3114	   connection MUST be a CONNECT message. This message establishes the
3115	   identity of the failover endpoint making the connection.

3117	   Every CONNECT message includes a TLS-request option, and if the CON-
3118	   NECTACK message does not reject the CONNECT message and the TLS-reply
3119	   option says TLS MUST be used, then the servers will enter into TLS
3120	   negotiation.

3122	   Once that negotiation is complete, then the server MUST resend the
3123	   CONNECT message on the newly secured TLS connection and then wait for
3124	   the CONNECTACK message in response.  The TLS-request and TLS-reply
3125	   options MUST have the same values in this second CONNECT and CONNEC-
3126	   TACK message as they had in the first messages.

3128	   The second message sent over a new connection is a STATE message.
3129	   Upon the receipt of this message, the receiver can consider communi-
3130	   cations up.

3132	   It is entirely possible that two servers will attempt to make connec-
3133	   tions to each other essentially simultaneously, and then each will
3134	   send a CONNECT message down the new connection.  In this case each
3135	   server will receive a CONNECT message on one connection having
3136	   already sent a CONNECT message on the other connection.  In the event
3137	   that the primary server receives a CONNECT message from the secondary
3138	   server either while waiting for a CONNECTACK message from a secondary
3139	   server or when it has a valid connection open to a secondary server,
3140	   it will close the connection on which the CONNECT message was
3141	   received.

3143	8.3.  Using the TCP connection for determining communications status

3145	   The TCP connection is used to determine the communications status of
3146	   the other server, i.e., communications-ok, or communications-
3147	   interrupted.

3149	   Three things must happen for a server to consider that communications
3150	   are ok with respect to another server:

3152	      1.  A TCP connection must be established to the other server.

3154	      2.  A CONNECT message must be received and a CONNECTACK message
3155	          sent in response.  The CONNECT message is used to determine
3156	          the identity of the failover endpoint of the other end of the
3157	          TCP connection -- without it, the failover endpoint cannot be
3158	          uniquely determined.  Without knowledge of the failover end-
3159	          point, then the entity with which communications is ok is
3160	          undetermined.

3162	      3.  A STATE message must be received from the other server over
3163	          the connection.  This STATE message initializes important
3164	          information necessary to the operation of the state machine
3165	          that governs the behavior of this failover endpoint.

3167	   There are two ways that a server can determine that communications
3168	   has failed:

3170	      1.  The TCP connection can go down, yielding an error when
3171	          attempting to send a message.  This will happen at least as
3172	          often as the period of the tSend timer.

3174	      2.  The tReceive timer can expire.

3176	   In either of these cases, communications is considered interrupted.
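
The three conditions and two failure events above amount to a small
status tracker.  The sketch below is one way to hold that bookkeeping;
the class and attribute names are hypothetical, and resetting all three
flags on failure is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass
class CommsStatus:
    """Tracks the three conditions for communications-ok (names assumed)."""
    tcp_established: bool = False
    connect_exchanged: bool = False  # CONNECT received and CONNECTACK sent
    state_received: bool = False     # STATE message received from the partner

    def is_ok(self) -> bool:
        # All three conditions must hold for communications to be ok.
        return (self.tcp_established
                and self.connect_exchanged
                and self.state_received)

    def on_send_error(self) -> None:
        # Failure case 1: the TCP connection went down while sending.
        self._interrupt()

    def on_treceive_expired(self) -> None:
        # Failure case 2: the tReceive timer expired.
        self._interrupt()

    def _interrupt(self) -> None:
        # Assumed: a new connection must satisfy every condition again.
        self.tcp_established = False
        self.connect_exchanged = False
        self.state_received = False
```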

3178	   Several difficulties arise when trying to use one TCP connection for
3179	   both bulk data transfer as well as to sense the communications status
3180	   of the other server.   One aspect of the problem stems from the dif-
3181	   ferent requirements of both uses.  The bulk data transfer is of
3182	   course critically important to the protocol, but the speed with which
3183	   it is processed is not terribly significant.  It might well be
3184	   minutes before a BNDUPD message is processed, and while not optimal,
3185	   such an occasional delay doesn't compromise the correctness of the
3186	   protocol. However, the speed with which one server detects the other
3187	   server is up (or, more importantly, down) is more highly constrained.
3188	   Generally one server should be able to detect that the other server
3189	   is not communicating within a minute or less.

3191	   These differing time constraints make it difficult to use the same
3192	   TCP connection for data transfer as well as to sense communications
3193	   integrity.   See section 3.5 for additional details on TCP.

3195	   The solution to this problem is to require that some message be
3196	   received by each end of the connection within a limited time or that
3197	   the connection will be considered down.  If no messages have been
3198	   sent recently, then a CONTACT message is sent.

3200	   In the case where there is no data queued to be sent, this is not a
3201	   problem, but in the case where there is data queued to be sent to the
3202	   partner, then the CONTACT message will not actually be transmitted
3203	   until the queued data is sent.  Section 3.5 explains why waiting for
3204	   TCP to determine that the connection is down is not acceptable, and
3205	   leads to a requirement that the receiving server never block the
3206	   sending server from sending CONTACT packets.

3208	   In order to meet this requirement, each server tells the other server
3209	   the number of outstanding BNDUPD messages that it will accept.  The
3210	   receiving server is required to always be able to accept that many
3211	   BNDUPD messages off of the connection's input queue even if it cannot
3212	   process them immediately, and to accept all other messages immedi-
3213	   ately.

3215	   Thus, the sending server's TCP is never blocked from sending a mes-
3216	   sage except for very short periods, less than a few seconds unless
3217	   the network connection itself has problems.  In this case, if the
3218	   CONTACT messages don't make it to the partner then the partner will
3219	   close the connection.

3221	8.4.  Using the TCP connection for binding data

3223	   Binding data, in the form of BNDUPD messages and BNDACK messages to
3224	   respond to them, are sent across the TCP connection.

3226	   In order to support timely detection of any failure in the partner
3227	   server, the TCP connection MUST NOT block for more than a very short
3228	   time, on the order of a few seconds.  Therefore, a server that is
3229	   sending BNDUPD messages MUST send only a restricted number before
3230	   receiving BNDACK messages about previous messages sent.

3232	   The number of outstanding BNDUPD messages that each server will
3233	   accept without causing TCP to block transmission of additional data
3234	   (i.e., CONTACT messages) is sent by each server in the CONNECT and
3235	   CONNECTACK messages in the max-unacked-bndupd option.
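
A minimal sketch of the sending side of this window follows.  It assumes
the partner's max-unacked-bndupd value has already been read from its
CONNECT or CONNECTACK message; the class name, the in-memory queue, and
the "sent" list standing in for writes to the TCP connection are all
hypothetical.

```python
from collections import deque


class BndupdSender:
    """Limits outstanding BNDUPD messages to the partner's advertised window."""

    def __init__(self, partner_max_unacked: int) -> None:
        self.partner_max_unacked = partner_max_unacked
        self.unacked = 0        # BNDUPDs sent but not yet acknowledged
        self.pending = deque()  # updates waiting for room in the window
        self.sent = []          # stand-in for writes to the TCP connection

    def queue(self, update) -> None:
        self.pending.append(update)
        self._flush()

    def on_bndack(self) -> None:
        # Each BNDACK frees one slot in the window.
        if self.unacked > 0:
            self.unacked -= 1
        self._flush()

    def _flush(self) -> None:
        # Send only while the partner has promised to accept more.
        while self.pending and self.unacked < self.partner_max_unacked:
            self.sent.append(self.pending.popleft())
            self.unacked += 1
```

Because at most partner_max_unacked updates are ever in flight, a CONTACT
message written after them is not held up behind an unbounded backlog.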

3237	8.5.  Using the TCP connection for control messages

3239	   The TCP connection is used for control messages: POOLREQ, UPDREQ,
3240	   STATE, UPDREQALL and the corresponding reply messages: POOLRESP,
3241	   UPDDONE.  A server MUST immediately accept all of these messages from
3242	   the TCP connection.  A server MUST immediately accept any BNDACK
3243	   which is received as well.

3245	8.6.  Losing the TCP connection

3247	   When the TCP connection is lost, then communications is not ok with
3248	   the other server.  A server which has lost communications SHOULD
3249	   immediately attempt to reconnect to the other server, and should
3250	   retry these connection attempts periodically.

3252	   Any BNDUPD or other messages that have been received but not yet pro-
3253	   cessed from the partner SHOULD be processed as soon as possible.

3255	9.  Protocol States

3257	This section discusses the various states that a failover endpoint may
3258	take, and the server actions required when entering the state, operating
3259	in the state, and leaving the state, as well as the events that cause
3260	transitions out of the state into another state.

3262	The state transition diagram in Figure 9.2-1 is relevant for this
3263	section.  In the event that the textual description of a state differs
3264	from the state transition diagram, the textual description is to be con-
3265	sidered authoritative.  This is the common state transition diagram for
3266	both servers in a failover pair.

3268	9.1.  Server Initialization

3270	   When a server starts, it starts out in STARTUP state.  See section 9.3
3271	   below for details.

3273	9.2.  Server State Transitions

3275	   Whenever a server transitions into a new state, it MUST record the
3276	   state and the time at which it entered that state in stable storage.
3277	   If communications is "ok", it MUST also send a STATE message to its
3278	   failover partner.

3280	   Figure 9.2-1 is the diagram of the server state transitions. The
3281	   remainder of this section contains information important to the
3282	   understanding of that diagram.

3284	   The server stays in the current state until all of the actions speci-
3285	   fied on the state transition are complete.  If communications fails
3286	   during one of the actions, the server simply stays in the current
3287	   state and attempts a transition whenever the conditions for a transi-
3288	   tion are later fulfilled.

3290	   In the state transition diagram below, the "+" or "-" in the upper
3291	   right corner of each state is a notation about whether communication
3292	   is ongoing with the other server.

3294	   The legend "responsive", "balanced", or "unresponsive" in each state
3295	   indicates whether the server is responsive to all DHCP client
3296	   requests, running in load balanced mode, or totally unresponsive in
3297	   the respective state.  The terms "responsive" and "unresponsive" have
3298	   the obvious meanings, while "balanced" means that a DHCP server may
3299	   respond to all DHCPREQUEST messages that are RENEWAL or REBINDING,
3300	   and to all other messages from clients for which the load balancing
3301	   algorithm indicates that it MUST respond.  See sections 5.3 and
3302	   9.6.2 for details on load balancing.

3304	   In the state transition diagram below, when communication is reesta-
3305	   blished between the two servers, each must record the state of the
3306	   partner when communication was restored.  State transitions on one
3307	   server in some cases imply state transitions on the partner server,
3308	   so a record of the current state of the partner server must be kept
3309	   by each server.

3311	   If the state of the partner changes while communicating, a server
3312	   moves through the communications-failed transition and into whatever
3313	   state results.  It then immediately moves through whatever state
3314	   transition is appropriate given the current state of the partner
3315	   server.  A server performing this operation SHOULD NOT drop the TCP
3316	   connection to its partner.

3318	   DISCUSSION:

3320	      The point of this technique is simplicity, both in explanation of
3321	      the protocol and in its implementation.  The alternative to this
3322	      technique of memory of partner state and automatic state transi-
3323	      tion on change of partner state is to have every state in the fol-
3324	      lowing diagram have a state transition for every possible state of
3325	      the partner.  With the approach adopted, only the states in which
3326	      communications are reestablished require a state transition for
3327	      each possible partner state.

3329	   The current state of a server MUST be recorded in stable storage and
3330	   thus be available to the server after a server restart.

3332	        +---------------+  V  +--------------+
3333	        |    RECOVER  - |  |  |   STARTUP  - |
3334	        |(unresponsive) |  +->|(unresponsive)|
3335	        +---------------+     +--------------+
3336	           Comm. OK             +-----------------+
3337	          Other State:-RECOVER  |  PARTNER DOWN - |<-----+
3338	          |      |              | (responsive)    |      |
3339	         All   POTENTIAL-       +-----------------+      |
3340	       Others  CONFLICT------------ | --------+  ^(see   |
3341	          |                     Comm. OK      |  | 9.8.3)|
3342	         UPDREQ(ALL)          Other State:    |  +-----+ |
3343	       Wait UPDDONE            |        |     | Comm.  | |
3344	     Wait MCLT from fail   RECOVER  All Others| Failed | |
3345	      +--------------+         |        V     V  |     | |
3346	      |RECOVER-DONE +|      +--+    +--------------+   | |
3347	      |(unresponsive)|      |       |  POTENTIAL + |<--+ |
3348	      +--------------+   Wait for +>|  CONFLICT    |     |
3349	         Comm. OK         Other   | |(unresponsive)|<--- | --+
3350	     +--Other State:-+    State:  | +--------------+     |   |
3351	     |   |           |   RECOVER  |         |            |   |
3352	     |   All      POTENT.  DONE   | Resolve Conflict     |   |
3353	     |  Others:  CONFLICT-- | ----+     (see 9.8)        |   |
3354	     | Wait for             V               V            |   |
3355	     | Other State: NORMAL +-----------------+           |   |
3356	     |   V                 |     NORMAL    + | External  |   |
3357	     |   +--+----------+-->|   (balanced)    |-Command-->+   |
3358	     |      ^          ^   +-----------------+           |   |
3359	     |      |          |            |                    |   |
3360	     |  Wait for   Comm. OK       Comm.            External  |
3361	     |   Other      Other        Failed            Command   |
3362	     |   State:     State:          |                or  |   |
3363	     |RECOVER-DONE  NORMAL     Start Safe        Safe    |   |
3364	     |      |     COMM. INT.  Period Timer       Period  |   |
3365	     |   Comm. OK.     |            V            expiration  |
3366	     |  Other State:   |  +------------------+           |   |
3367	     |    RECOVER      +--| COMMUNICATIONS - |-----------+   |
3368	     V      +-------------|   INTERRUPTED    |   Comm. OK    |
3369	    RECOVER               |  (responsive)    |--Other State:-+
3370	    RECOVER-DONE--------->+------------------+   All Others

3372	           Figure 9.2-1:  Server state diagram.

3374	9.3.  STARTUP state

3376	   The STARTUP state affords an opportunity for a server to probe its
3377	   partner server, before starting to service DHCP clients.

3379	   DISCUSSION:

3381	      Without the STARTUP state, a server would likely start in a state
3382	      derived from its previously stored state (held in stable storage),
3383	      if any.  However, this may be inconsistent with the current state
3384	      of the partner.  The STARTUP state affords the opportunity for a
3385	      server to potentially learn the partner's state and determine if
3386	      that state is consistent with its derived starting state or
3387	      whether some significant state change has occurred at the partner
3388	      that forces the server to start in another state.  This is
3389	      especially critical if significant time has elapsed while the
3390	      server was down.

3392	9.3.1.  Operation while in STARTUP state

3394	   Whenever a server is in STARTUP state, it MUST be unresponsive to
3395	   DHCP client requests, and so the time spent in the STARTUP state is
3396	   necessarily short, typically on the order of a few seconds to a few
3397	   tens of seconds.  The exact time spent in the STARTUP state is imple-
3398	   mentation dependent, and the primary and secondary server are not
3399	   required to spend the same amount of time in the STARTUP state.

3401	   Whenever a STATE message is sent to the partner while in STARTUP
3402	   state the STARTUP bit MUST be set in the server-flags option and the
3403	   previously recorded failover state MUST be placed in the server-state
3404	   option.

3406	9.3.2.  Transition out of STARTUP state

3408	   Each server starts out in STARTUP state every time it initializes
3409	   itself, and performs the following algorithm as part of its initiali-
3410	   zation:

3412	      1.  Do not send any messages until step 5.

3414	      2.  Is there any record in stable storage of a previous failover
3415	          state?  If yes, set previous-state to the last recorded state
3416	          in stable storage, and continue with step 3.

3418	          Is there any configuration information that indicates that
3419	          this server was previously running but lost its stable
3420	          storage?  Such information must typically come from some
3421	          administrative intervention, since it is difficult for a
3422	          server to distinguish first startup from a startup after it
3423	          has lost its stable storage.  If yes, then set the previous-
3424	          state to RECOVER, and set the time-of-failure to whatever time
3425	          was configured, and go on to step 3.  This time-of-failure
3426	          will be used in the transition out of the RECOVER state into
3427	          the RECOVER-DONE state, below.

3429	          If there is no record of any previous failover state in stable
3430	          storage nor of any previous operational activity for this
3431	          server, then set the previous-state to PARTNER-DOWN if this
3432	          server is a primary and RECOVER if this server is a secondary,
3433	          and set the time-of-failure to a time before the maximum-
3434	          client-lead-time before now.  If using standard Posix times, 0
3435	          would typically do quite well.

3437	      3.  Is the previous-state NORMAL?  If yes, set the previous-state
3438	          to COMMUNICATIONS-INTERRUPTED.

3440	      4.  Start the STARTUP state timer.  The time that a server remains
3441	          in the STARTUP state (absent any communications with its
3442	          partner) is implementation dependent and SHOULD be
3443	          configurable.  It SHOULD be long enough for a TCP connection to
3444	          be created to a heavily loaded partner across a slow network.

3446	      5.  Attempt to create a TCP connection to the failover partner.
3447	          See section 8.2.

3449	      6.  Wait for "communications okay", i.e., the process discussed in
3450	          section 8.2 "Creating the TCP Connection", to complete,
3451	          including the receipt of a STATE message from the partner.

3453	          When and if communications become "okay", clear the STARTUP
3454	          flag, and set the current state to the previous-state.

3456	          If the partner is in PARTNER-DOWN state, and if the time at
3457	          which it entered PARTNER-DOWN state (as received in the start-
3458	          time-of-state option in the STATE message) is later than the
3459	          last recorded time of operation of this server, then set the
3460	          current state to RECOVER.

3462	          Then, transition to the current state and take the "communica-
3463	          tions okay" state transition based on the current state of
3464	          this server and the partner.

3466	      7.  If the STARTUP state timer expires, take an implementation
3467	          dependent action:  The server MAY go to the previous-state, or
3468	          the server MAY wait.

3470	          Reasons to go to previous-state and begin processing:

3472	          If the current server is the only operational server, then if
3473	          it waits, there will be no operational DHCP servers.  This
3474	          situation could occur very easily where one server fails and
3475	          then the other crashes and reboots.  If the rebooting server
3476	          doesn't start processing DHCP client requests without first
3477	          being in communication with the other server, then the level
3478	          of DHCP redundancy is not particularly high.  This is an
3479	          appropriate approach if the possibility of partition is low,
3480	          or if the safe period expiration time is well beyond the time
3481	          at which an operator would notice and react to a partition
3482	          situation.  It is also quite appropriate if the safe period
3483	          will never expire.

3485	          Reasons to wait:

3487	          If the current server has been down for longer than the
3488	          maximum-client-lead-time, and it is partitioned from the other
3489	          server, then when it returns it will attempt to use its own
3490	          available addresses to allocate to new DHCP clients, and the
3491	          other server may well be in PARTNER-DOWN state and may have
3492	          already allocated some of those available addresses to DHCP
3493	          clients.  In cases where the possibility of partition is high,
3494	          and the safe period expiration time is less than the likely
3495	          operator reaction time, this is a good approach to use.
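
Steps 2, 3, and 6 of the algorithm above can be sketched as two small
functions.  This is an illustration under assumptions: the function and
parameter names are hypothetical, state names are plain strings, times
are Posix seconds, and a configured failure time being present is taken
to mean "the administrator says stable storage was lost".

```python
from typing import Optional, Tuple

POSIX_EPOCH = 0.0  # step 2 suggests 0 as a time safely older than now - MCLT


def derive_previous_state(stored_state: Optional[str],
                          configured_failure_time: Optional[float],
                          is_primary: bool) -> Tuple[str, Optional[float]]:
    """Steps 2 and 3: choose previous-state and any time-of-failure."""
    if stored_state is not None:
        # A state was recorded in stable storage; step 2 sets no failure time.
        previous, time_of_failure = stored_state, None
    elif configured_failure_time is not None:
        # Configuration says the server ran before but lost stable storage.
        previous, time_of_failure = "RECOVER", configured_failure_time
    else:
        # First startup: no record of any previous operation.
        previous = "PARTNER-DOWN" if is_primary else "RECOVER"
        time_of_failure = POSIX_EPOCH
    if previous == "NORMAL":
        # Step 3: never resume directly in NORMAL.
        previous = "COMMUNICATIONS-INTERRUPTED"
    return previous, time_of_failure


def state_after_contact(previous_state: str,
                        partner_state: str,
                        partner_state_start: float,
                        last_operation_time: float) -> str:
    """Step 6: current state once communications become okay."""
    if (partner_state == "PARTNER-DOWN"
            and partner_state_start > last_operation_time):
        # The partner entered PARTNER-DOWN after this server last operated.
        return "RECOVER"
    return previous_state
```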

3497	9.4.  PARTNER-DOWN state

3499	   PARTNER-DOWN state is a state either server can enter.  When in this
3500	   state, the server does not assume that the other server could still
3501	   be operating and servicing a different set of clients, but instead
3502	   assumes that it is the only server operating.  For this reason, only
3503	   one server should be operating in this state at a time.

3505	9.4.1.  Upon entry to PARTNER-DOWN state

3507	   No special actions are required when entering PARTNER-DOWN state.

3509	   The server should continue to attempt to connect to the partner
3510	   periodically.

3512	9.4.2.  Operation while in PARTNER-DOWN state

3514	   A server in PARTNER-DOWN state MUST respond to DHCP client requests.
3515	   It will allow renewal of all outstanding leases on IP addresses, and
3516	   will allocate IP addresses from its own pool, and after a fixed
3517	   period of time (the MCLT interval) has elapsed from entry into
3518	   PARTNER-DOWN state, it will allocate IP addresses from the set of all
3519	   available IP addresses.

3521	   Once a server has entered NORMAL state, the PARTNER-DOWN state is
3522	   entered only on command of an external agency (typically an adminis-
3523	   trator of some sort) or after the expiration of an externally config-
3524	   ured minimum safe-time after the beginning of COMMUNICATIONS-
3525	   INTERRUPTED state.

3527	   Any available IP address tagged as belonging to the other server (at
3528	   entry to PARTNER-DOWN state) MUST NOT be used until the maximum-
3529	   client-lead-time beyond the entry into PARTNER-DOWN state has
3530	   elapsed.

3532	   A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
3533	   DHCP client different from that to which it was allocated at the
3534	   entrance to PARTNER-DOWN state until the maximum-client-lead-time
3535	   beyond its expiration time has elapsed.  If this time would be
3536	   earlier than the current time plus the maximum-client-lead-time, then
3537	   the current time plus the maximum-client-lead-time is used.
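
The two timing rules above reduce to simple arithmetic.  The sketch below
assumes times are Posix seconds and that "the current time" is whatever
the caller passes as now; the function and parameter names are
hypothetical.

```python
def partner_pool_usable(entered_partner_down: float,
                        now: float,
                        mclt: float) -> bool:
    """True once MCLT has elapsed since entry into PARTNER-DOWN state."""
    return now >= entered_partner_down + mclt


def earliest_reallocation_time(lease_expiry: float,
                               now: float,
                               mclt: float) -> float:
    """Earliest time an address may be given to a different client."""
    # Expiration plus MCLT, but never earlier than now plus MCLT.
    return max(lease_expiry + mclt, now + mclt)
```

For example, with an MCLT of 3600 seconds, a lease that expired at time
100 and a current time of 500 give an earliest reallocation time of 4100.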

3539	   Two options exist for lease times given out while in PARTNER-DOWN
3540	   state, with different ramifications flowing from each.

3542	   If the server wishes the Failover protocol to protect it from loss of
3543	   stable storage in PARTNER-DOWN state, then it should ensure that the
3544	   MCLT based lease time restrictions in Section 5.1 are maintained,
3545	   even in PARTNER-DOWN state.

3547	   If the server wishes to forego the protection of the Failover proto-
3548	   col in the event of loss of stable storage, then it need recognize no
3549	   restrictions on actual client lease times while in PARTNER-DOWN
3550	   state.

3552	   A server in PARTNER-DOWN state attempts to establish communications
3553	   and synchronization with its partner.

3555	9.4.3.  Transitions out of PARTNER-DOWN state

3557	   When a server in PARTNER-DOWN state succeeds in establishing a con-
3558	   nection to its partner, its actions are conditional on the state and
3559	   flags received in the STATE message from the other server as part of
3560	   the process of establishing the connection.

3562	   If the STARTUP bit is set in the server-flags option of a received
3563	   STATE message, a server in PARTNER-DOWN state MUST NOT take any state
3564	   transitions based on reestablishing communications. Essentially, if a
3565	   server is in PARTNER-DOWN state, it ignores all STATE messages from
3566	   its partner that have the STARTUP bit set in the server-flags option
3567	   of the STATE message.

3569	   If the STARTUP bit is not set in the server-flags option of a STATE
3570	   message received from its partner, then a server in PARTNER-DOWN
3571	   state takes the following actions based on the value of the server-
3572	   state option in the received STATE message:

3574	      o partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN or
3575	        POTENTIAL-CONFLICT state

3577	        transition to POTENTIAL-CONFLICT state

3579	      o partner in RECOVER state

3581	        stay in PARTNER-DOWN state

3583	      o partner in RECOVER-DONE state

3585	        transition into NORMAL state
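
The rules of this section can be written as a single lookup.  The sketch
below is illustrative only; the function name and the use of plain
strings for states are assumptions, and an unrecognized partner state is
treated as an error because the text does not cover it.

```python
CONFLICT_PARTNER_STATES = {
    "NORMAL",
    "COMMUNICATIONS-INTERRUPTED",
    "PARTNER-DOWN",
    "POTENTIAL-CONFLICT",
}


def partner_down_next_state(partner_state: str, startup_bit: bool) -> str:
    """State a server in PARTNER-DOWN moves to after a received STATE."""
    if startup_bit:
        # STATE messages with the STARTUP bit set are ignored.
        return "PARTNER-DOWN"
    if partner_state in CONFLICT_PARTNER_STATES:
        return "POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError(f"partner state not covered here: {partner_state}")
```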

3587	9.5.  RECOVER state

3589	   This state indicates that the server has no information in its stable
3590	   storage or that it is re-integrating with a server in PARTNER-DOWN
3591	   state after it has been down.  A server in this state will attempt to
3592	   refresh its stable storage from the other server.

3594	9.5.1.  Operation in RECOVER state

3596	   A server in RECOVER MUST NOT respond to DHCP client requests.

3598	   A server in RECOVER state will attempt to reestablish communications
3599	   with the other server.

3601	9.5.2.  Transitions out of RECOVER state

3603	   If the other server is in POTENTIAL-CONFLICT state when communica-
3604	   tions are reestablished, then the server in RECOVER state will move
3605	   to POTENTIAL-CONFLICT state itself.

3607	   If the other server is in RECOVER state, then this server SHOULD
3608	   signal an error and halt processing.

3610	   If the other server is in any other state, then the server in RECOVER
3611	   state will request an update of missing binding information by send-
3612	   ing an UPDREQ message.  If the server has been instructed (through
3613	   configuration or other external agency) that it has lost its stable
3614	   storage, it MUST send an UPDREQALL message, otherwise it MUST send an
3615	   UPDREQ message.

3617	   It will wait for an UPDDONE message, and upon receipt of that message
3618	   it will start a timer whose expiration is set to a time equal to the
3619	   time the server went down (if known) or the current time (if the
3620	   down-time is unknown) plus the maximum-client-lead-time.  When this
3621	   timer goes off, the server will transition into RECOVER-DONE state.
3622	   This is to allow any IP addresses that were allocated by this server
3623	   prior to loss of its client binding information in stable storage to
3624	   contact the other server or to time out.

3626	   See Figure 9.5.2-1.

3628	   DISCUSSION:

3630	      The actual requirement on this wait period in RECOVER is that it
3631	      start when the recovering server went down, not necessarily when
3632	      it came back up.  If the time when the recovering server failed is
3633	      known, then it could be communicated to the recovering server, and
3634	      the wait period could be reduced to the maximum-client-lead-time
3635	      less the difference between the current time and the time the
3636	      server failed. In this way, the waiting period could be minimized.

3638	   If an UPDDONE message isn't received within an implementation
3639	   dependent amount of time, and no BNDUPD messages are being received,
3640	   then the UPDREQ(ALL) message will be re-transmitted.

3642	                A                                        B
3643	              Server                                  Server

3645	                |                                        |
3646	             RECOVER                               PARTNER-DOWN
3647	                |                                        |
3648	                | >--UPDREQ-------------------->         |
3649	                |                                        |
3650	                |        <---------------------BNDUPD--< |
3651	                | >--BNDACK-------------------->         |
3652	               ...                                      ...
3653	                |                                        |
3654	                |        <---------------------BNDUPD--< |
3655	                | >--BNDACK-------------------->         |
3656	                |                                        |
3657	                |        <--------------------UPDDONE--< |
3658	                |                                        |
3659	       Wait MCLT from last known                         |
3660	          time of operation                              |
3661	                |                                        |
3662	           RECOVER-DONE                                  |
3663	                |                                        |
3664	                | >--STATE-(RECOVER-DONE)------>         |
3665	                |                                     NORMAL
3666	                |        <-------------(NORMAL)-STATE--< |
3667	             NORMAL                                      |
3668	                |                                        |
3669	                |                                        |

3671	              Figure 9.5.2-1:  Transition out of RECOVER state
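
The choice of action on reconnecting and the wait that follows UPDDONE
can be sketched as below.  The function names, the string labels for
actions, and the use of Posix seconds are assumptions for illustration
only.

```python
from typing import Optional


def recover_action(partner_state: str, lost_stable_storage: bool) -> str:
    """What a server in RECOVER does when communications are reestablished."""
    if partner_state == "POTENTIAL-CONFLICT":
        return "enter POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "signal error and halt"
    # Any other partner state: request the missing binding information.
    return "send UPDREQALL" if lost_stable_storage else "send UPDREQ"


def recover_done_time(now: float,
                      mclt: float,
                      down_time: Optional[float] = None) -> float:
    """Time at which the server may move to RECOVER-DONE after UPDDONE."""
    # Use the time the server went down if known, else the current time.
    base = down_time if down_time is not None else now
    return base + mclt


def remaining_wait(now: float,
                   mclt: float,
                   down_time: Optional[float] = None) -> float:
    """Seconds still to wait; zero if the period has already passed."""
    return max(0.0, recover_done_time(now, mclt, down_time) - now)
```

With an MCLT of 3600 seconds, a server that went down at time 8000 and
receives UPDDONE at time 10000 waits a further 1600 seconds; if the down
time is unknown, it waits the full 3600.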

3673	9.6.  NORMAL state

3675	   NORMAL state is the state used by a server when it can communicate
3676	   with the other server.

3678	9.6.1.  Upon Entry to NORMAL state

3680	   When entering NORMAL state, a server will send to the other server
3681	   all currently unacknowledged binding updates as BNDUPD messages.

3683	   When the above process is complete, if the server entering NORMAL
3684	   state is a secondary server, then it will request IP addresses for
3685	   allocation using the POOLREQ message.

3687	9.6.2.  Processing DHCP client requests and load balancing

3689	   When in NORMAL state, each server MUST process all requests from some
3690	   DHCP clients, and MUST NOT process any request other than a
3691	   DHCPREQUEST/RENEWAL or a DHCPREQUEST/REBINDING request from some
3692	   other DHCP clients.  The load balancing algorithm determines into
3693	   which set a particular DHCP client falls.

3695	   As discussed in section 5.3, each server will take the client-
3696	   identifier from each DHCP client request (or the htype concatenated
3697	   to the front of the chaddr if no client-identifier is present in the
3698	   request), and hash it with the algorithm given in section 12.  The
3699	   result of this hash algorithm is a number between 0 and 255.
3700	   This number is used to index into the bit array received by a server
3701	   in the hash-bucket-assignment option (if the server is a secondary),
3702	   or into the inverse of the bit array sent to the secondary in the
3703	   hash-bucket-assignment option if the server is a primary.

3705	   If the bit found from this indexing process is a 1 bit, then the
3706	   server MUST process this DHCP request.

3708	   In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or
3709	   DHCPREQUEST/REBINDING request it receives.
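
The bucket lookup can be sketched as follows.  The hash of section 12 is
not reproduced here; the caller passes in its result as bucket.  The
function names are hypothetical, and the bit ordering (bucket 0 is the
most significant bit of the first octet) and the 32-octet length of the
bit array are assumptions that are not stated in this section.

```python
def bucket_bit(hba: bytes, bucket: int) -> int:
    """Bit for a hash bucket in the hash-bucket-assignment bit array."""
    if len(hba) != 32:
        raise ValueError("expected a 256-bit (32-octet) array")
    if not 0 <= bucket <= 255:
        raise ValueError("bucket must be between 0 and 255")
    # ASSUMPTION: bucket 0 is the most significant bit of octet 0.
    return (hba[bucket // 8] >> (7 - bucket % 8)) & 1


def must_process(hba: bytes,
                 bucket: int,
                 is_primary: bool,
                 is_renew_or_rebind: bool) -> bool:
    """Whether a server in NORMAL state processes this client request."""
    if is_renew_or_rebind:
        # DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING are always processed.
        return True
    bit = bucket_bit(hba, bucket)
    if is_primary:
        # The primary indexes the inverse of the array it sent.
        bit ^= 1
    return bit == 1
```

Because the primary uses the inverse of the array held by the secondary,
exactly one of the two servers answers any request that is not a renewal
or a rebinding.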

3711	9.6.3.  Operation in NORMAL state

3713	   When in NORMAL state, for every DHCP client request that it
3714	   processes, as determined by the algorithm described in section 9.6.2,
3715	   above, a server will operate in the following manner:

3717	      o Lease time calculations

3719	        As discussed in section 5.2.1, "Control of lease time", the
3720	        lease interval given to a DHCP client can never be more than the
3721	        MCLT greater than the most recently received potential-
3722	        expiration-time from the failover partner or the current time,
3723	        whichever is later.

3725	        As long as a server adheres to this constraint, the specifics of
3726	        the lease interval that it gives to a DHCP client or the value
3727	        of the potential-expiration-time sent to its failover partner
3728	        are implementation dependent.  One possible approach is dis-
3729	        cussed in section 5.2.1, but that particular approach is in no
3730	        way required by this protocol.

3732	      o Lazy update of partner server

3734	        After an ACK of an IP address binding, the server servicing a
3735	        DHCP client request attempts to update its partner with the new
3736	        binding information.  The lease time used in the update of the
3737	        secondary MUST be at that given to the DHCP client in the
3738	        DHCPACK, and the potential-expiration-time MUST be at least the
3739	        lease time, and SHOULD be longer.

3741	      o Reallocation of IP addresses between clients

3743	        Whenever a client binding is released or expires, a BNDUPD
3744	        message must be sent to the partner, setting the binding state
3745	        to RELEASED or EXPIRED.  However, until a BNDACK is received for
3746	        this message, the IP address cannot be allocated to another
3747	        client.  It can be allocated to the same client again.
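
The lease time constraint in the first item above can be expressed as a
bound on the interval offered to a client.  The sketch below assumes
Posix seconds and hypothetical function names; it shows only the upper
bound, not any particular policy for choosing lease times.

```python
from typing import Optional


def max_lease_interval(now: float,
                       mclt: float,
                       partner_potential_expiration: Optional[float] = None
                       ) -> float:
    """Longest lease interval the constraint allows at time now."""
    # Later of the current time and the partner's potential-expiration-time.
    if partner_potential_expiration is None:
        base = now
    else:
        base = max(partner_potential_expiration, now)
    return base + mclt - now


def clamp_lease(desired: float,
                now: float,
                mclt: float,
                partner_potential_expiration: Optional[float] = None
                ) -> float:
    """Desired lease interval, reduced if it would exceed the bound."""
    bound = max_lease_interval(now, mclt, partner_potential_expiration)
    return min(desired, bound)
```

With an MCLT of 3600 seconds and no potential-expiration-time on record,
a client asking for a one-day lease is given 3600 seconds.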

3749	   In NORMAL state, each server receives binding updates from its
3750	   partner server in BNDUPD messages.  It records these in its client
3751	   binding database in stable storage and then sends a corresponding
3752	   BNDACK message to its partner.  It MUST ensure that the information
3753	   is recorded in stable storage prior to sending the BNDACK message
3754	   back to its partner.
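
   The rule in section 9.6.3 on reallocating an IP address between
   clients might be checked as in the following sketch.  The binding
   record, its field names and the use of a string as client
   identifier are hypothetical simplifications chosen for
   illustration; a real server's binding database will differ.

```c
#include <stdbool.h>
#include <string.h>

typedef enum { BIND_ACTIVE, BIND_RELEASED, BIND_EXPIRED } bind_state;

/* Hypothetical, simplified binding record. */
typedef struct {
    bind_state state;
    bool       partner_acked;   /* BNDACK received for the latest BNDUPD */
    char       client_id[32];   /* client that holds or last held the address */
} binding;

/* May the address described by *b be given to the client client_id? */
bool may_allocate_to(const binding *b, const char *client_id)
{
    if (strcmp(b->client_id, client_id) == 0)
        return true;             /* the same client may always get it again */
    if (b->state == BIND_ACTIVE)
        return false;            /* still held by a different client */
    return b->partner_acked;     /* released or expired: wait for the BNDACK */
}
```

   The design point is that only the BNDACK, not the local release or
   expiration, makes the address available to a different client.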

3756	9.6.4.  Transitions out of NORMAL state

3758	   If an external command is received by a server in NORMAL state
3759	   informing it that its partner is down, then it will transition into
3760	   PARTNER-DOWN state.

3762	   If a server in NORMAL state fails to receive acks to messages sent to
3763	   its partner for an implementation dependent period of time, it MAY
3764	   move into COMMUNICATIONS-INTERRUPTED state.  This situation might
3765	   occur if the partner server was capable of maintaining the TCP
3766	   connection between the servers and also capable of sending a CONTACT
3767	   message every tSend seconds, but was (for some reason) incapable of
3768	   processing BNDUPD messages.

3770	   If communications are determined not to be "ok" (as defined in
3771	   section 8), the server moves into COMMUNICATIONS-INTERRUPTED state.

3773	   If a server in NORMAL state receives any messages from its partner
3774	   where the partner has changed state from that expected by the server
3775	   in NORMAL state, then the server should transition into
3776	   COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
3777	   sition from there.  For example, it would be expected for the partner
3778	   to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
3779	   the partner to transition from NORMAL into POTENTIAL-CONFLICT state.
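
   The transitions out of NORMAL state listed in this section can be
   summarized by a small function such as the one below.  It is a
   sketch under stated assumptions: the enum and function names are
   hypothetical, and the act_on_ack_timeout flag models the fact that
   the move on missing acks is only a MAY.

```c
#include <stdbool.h>

typedef enum {
    ST_NORMAL, ST_COMMUNICATIONS_INTERRUPTED, ST_PARTNER_DOWN
} fo_state;

typedef enum {
    EV_PARTNER_DOWN_COMMAND,     /* external command: partner is down */
    EV_ACK_TIMEOUT,              /* no acks for an implementation dependent time */
    EV_COMMS_NOT_OK,             /* communications not "ok" (section 8) */
    EV_UNEXPECTED_PARTNER_STATE  /* partner reports a state not expected here */
} normal_event;

/* Next state for a server that is currently in NORMAL state. */
fo_state next_state_from_normal(normal_event ev, bool act_on_ack_timeout)
{
    switch (ev) {
    case EV_PARTNER_DOWN_COMMAND:
        return ST_PARTNER_DOWN;
    case EV_ACK_TIMEOUT:
        /* Optional transition: left to the implementation. */
        return act_on_ack_timeout ? ST_COMMUNICATIONS_INTERRUPTED : ST_NORMAL;
    case EV_COMMS_NOT_OK:
    case EV_UNEXPECTED_PARTNER_STATE:
        return ST_COMMUNICATIONS_INTERRUPTED;
    }
    return ST_NORMAL;  /* unknown event: stay in NORMAL */
}
```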

3781	9.7.  COMMUNICATIONS-INTERRUPTED State

3783	   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
3784	   unable to communicate with the other server.  Primary and secondary
3785	   servers cycle automatically (without administrative intervention)
3786	   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
3787	   connection between them fails and recovers, or as the partner server
3788	   cycles between operational and non-operational.  No duplicate IP
3789	   address allocation can occur while the servers cycle between these
3790	   states.

3792	9.7.1.  Upon Entry to COMMUNICATIONS-INTERRUPTED state

3794	   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
3795	   configured to support an automatic transition out of COMMUNICATIONS-
3796	   INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period"
3797	   has been configured, see section 10), then a timer MUST be started
3798	   for the length of the configured safe period.

3800	   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
3801	   the NORMAL state SHOULD raise some alarm condition to alert adminis-
3802	   trative staff to a potential problem in the DHCP subsystem.
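
   The entry actions described above could be sketched as follows.
   The context structure and function names are hypothetical and
   exist only for this example; how a real server schedules timers or
   raises alarms is implementation dependent.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-partner context for this sketch. */
typedef struct {
    bool   safe_period_configured;
    time_t safe_period;        /* configured safe period, in seconds */
    bool   timer_running;
    time_t timer_deadline;     /* absolute time at which the timer fires */
    bool   alarm_raised;
} comm_int_ctx;

/* Actions taken on entry to COMMUNICATIONS-INTERRUPTED state. */
void enter_communications_interrupted(comm_int_ctx *c, time_t now,
                                      bool from_normal)
{
    /* The timer is started only if a safe period has been configured. */
    c->timer_running = c->safe_period_configured;
    if (c->timer_running)
        c->timer_deadline = now + c->safe_period;
    if (from_normal)
        c->alarm_raised = true;   /* SHOULD alert administrative staff */
}

/* True once the safe period timer has expired (see section 9.7.3). */
bool safe_period_expired(const comm_int_ctx *c, time_t now)
{
    return c->timer_running && now >= c->timer_deadline;
}
```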

3804	9.7.2.  Operation in COMMUNICATIONS-INTERRUPTED State

3806	   In this state a server MUST respond to all DHCP client requests, and
3807	   the algorithm for load balancing described in section 5.3 MUST NOT be
3808	   used.  When allocating new IP addresses, each server allocates from
3809	   its own IP address pool, where the primary MUST allocate only FREE IP
3810	   addresses, and the secondary MUST allocate only BACKUP IP addresses.
3811	   When responding to renewal requests, each server will allow continued
3812	   renewal of a DHCP client's current lease on an IP address
3813	   irrespective of whether that lease was given out by the receiving
3814	   server or not, although the renewal period MUST NOT exceed the
3815	   maximum client lead time (MCLT) beyond the potential-expiration-time
3816	   already acknowledged by the other server or the lease-expiration-time
3817	   or potential-expiration-time received from the partner server.

3819	   However, since the server cannot communicate with its partner in this
3820	   state, the acknowledged-potential-expiration time will not be updated
3821	   in any new bindings.  This is likely to eventually cause the actual-
3822	   client-lease-times to be the current-time plus the maximum-client-
3823	   lead-time (unless this is greater than the desired-client-lease-
3824	   time).
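
   Two of the rules in this section lend themselves to a short
   sketch: which pool a server may allocate new addresses from, and
   the lease length that results for a binding the partner has never
   acknowledged.  The type and function names are hypothetical, and
   lease lengths are assumed to be expressed in seconds.

```c
#include <stdbool.h>

typedef enum { ROLE_PRIMARY, ROLE_SECONDARY } fo_role;
typedef enum { ADDR_FREE, ADDR_BACKUP, ADDR_ACTIVE } addr_state;

/* New allocations: the primary uses only FREE addresses, the
 * secondary only BACKUP addresses. */
bool may_allocate_new(fo_role role, addr_state state)
{
    return (role == ROLE_PRIMARY) ? (state == ADDR_FREE)
                                  : (state == ADDR_BACKUP);
}

/* Lease length for a new binding that has no acknowledged
 * potential-expiration-time: the MCLT, unless the desired lease
 * time is shorter. */
long unacked_lease_seconds(long desired, long mclt)
{
    return (desired < mclt) ? desired : mclt;
}
```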

3826	9.7.3.  Transition out of COMMUNICATIONS-INTERRUPTED State

3828	   If the safe period timer expires while a server is in the
3829	   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
3830	   PARTNER-DOWN state.

3832	   If an external command is received by a server in COMMUNICATIONS-
3833	   INTERRUPTED state informing it that its partner is down, it will
3834	   transition immediately into PARTNER-DOWN state.

3836	   If communications are restored with the other server, then the server
3837	   in COMMUNICATIONS-INTERRUPTED state will transition into another
3838	   state based on the state of the partner:

3840	      o partner in NORMAL or COMMUNICATIONS-INTERRUPTED

3842	        Transition into the NORMAL state.

3844	      o partner in RECOVER

3846	        Stay in COMMUNICATIONS-INTERRUPTED state.

3848	      o partner in RECOVER-DONE

3850	        Transition into NORMAL state.

3852	      o partner in PARTNER-DOWN or POTENTIAL-CONFLICT

3854	        Transition into POTENTIAL-CONFLICT state.

3856	      o partner in PAUSED

3858	        Stay in COMMUNICATIONS-INTERRUPTED state.

3860	      o partner in SHUTDOWN

3862	        Transition into PARTNER-DOWN state.
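
   The list above maps directly onto a lookup on the partner's
   state, as in this sketch.  The enum and function names are
   hypothetical; only the partner states named in the list are
   covered.

```c
typedef enum {
    ST_NORMAL, ST_COMMUNICATIONS_INTERRUPTED, ST_PARTNER_DOWN,
    ST_POTENTIAL_CONFLICT, ST_RECOVER, ST_RECOVER_DONE,
    ST_PAUSED, ST_SHUTDOWN
} fo_state;

/* State entered by a server in COMMUNICATIONS-INTERRUPTED state once
 * communications are restored, given the state of its partner. */
fo_state comm_int_next_state(fo_state partner)
{
    switch (partner) {
    case ST_NORMAL:
    case ST_COMMUNICATIONS_INTERRUPTED:
    case ST_RECOVER_DONE:
        return ST_NORMAL;
    case ST_PARTNER_DOWN:
    case ST_POTENTIAL_CONFLICT:
        return ST_POTENTIAL_CONFLICT;
    case ST_SHUTDOWN:
        return ST_PARTNER_DOWN;
    case ST_RECOVER:
    case ST_PAUSED:
        return ST_COMMUNICATIONS_INTERRUPTED;   /* stay put */
    }
    return ST_COMMUNICATIONS_INTERRUPTED;       /* unknown: stay put */
}
```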

3864	   The following figure illustrates the transition from NORMAL to
3865	   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

3867	             Primary                                Secondary
3868	              Server                                  Server

3870	              NORMAL                                  NORMAL
3871	                | >--CONTACT------------------->         |
3872	                |        <--------------------CONTACT--< |
3873	                |         [TCP connection broken]        |
3874	           COMMUNICATIONS          :              COMMUNICATIONS
3875	             INTERRUPTED           :                INTERRUPTED
3876	                |      [attempt new TCP connection]      |
3877	                |         [connection succeeds]          |
3878	                |                                        |
3879	                | >--CONNECT------------------->         |
3880	                |        <-----------------CONNECTACK--< |
3881	                |        <-------------------STATE-----< |
3882	                |                                     NORMAL
3883	                | >--STATE--------------------->         |
3884	              NORMAL                                     |
3885	                | >--BNDUPD-------------------->         |
3886	                |        <---------------------BNDACK--< |
3887	                |                                        |
3888	                |        <---------------------BNDUPD--< |
3889	                | >------BNDACK---------------->         |
3890	               ...                                      ...
3891	                |                                        |
3892	                |        <--------------------POOLREQ--< |
3893	                | >--POOLRESP-(2)-------------->         |
3894	                |                                        |
3895	                | >--BNDUPD-(#1)--------------->         |
3896	                |        <---------------------BNDACK--< |
3897	                |                                        |
3898	                |        <--------------------POOLREQ--< |
3899	                | >--POOLRESP-(0)-------------->         |
3900	                |                                        |
3901	                | >--BNDUPD-(#2)--------------->         |
3902	                |        <---------------------BNDACK--< |
3903	                |                                        |

3905	       Figure 9.7.3-1:  Transition from NORMAL to COMMUNICATIONS-
3906	                        INTERRUPTED and back (example with 2
3907	                        addresses allocated to secondary)

3909	9.8.  POTENTIAL-CONFLICT state

3911	   This state indicates that the two servers are attempting to re-
3912	   integrate with each other, but at least one of them was running in a
3913	   state that did not guarantee automatic reintegration would be
3914	   possible.  In POTENTIAL-CONFLICT state the servers may determine that
3915	   the same IP address has been offered and accepted by two different
3916	   DHCP clients.

3918	   It is a goal of this protocol to minimize the possibility that
3919	   POTENTIAL-CONFLICT state is ever entered.

3921	9.8.1.  Upon Entry to POTENTIAL-CONFLICT

3923	   When a primary server enters POTENTIAL-CONFLICT state it should
3924	   request that the secondary send it all updates of which it is
3925	   currently unaware by sending an UPDREQ message to the secondary
3926	   server.

3928	   A secondary server entering POTENTIAL-CONFLICT state will wait for
3929	   the primary to send it an UPDREQ message.

3931	9.8.2.  Operation in POTENTIAL-CONFLICT state

3933	   Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
3934	   DHCP requests.

3936	9.8.3.  Transitions out of POTENTIAL-CONFLICT state

3938	   If communications fail with the partner while in POTENTIAL-CONFLICT
3939	   state, then a primary server will transition to PARTNER-DOWN state
3940	   and a secondary server will stay in POTENTIAL-CONFLICT state.

3942	   Whenever either server receives an UPDDONE message from its partner
3943	   while in POTENTIAL-CONFLICT state, it MUST transition to NORMAL
3944	   state.  This will cause the primary server to leave POTENTIAL-
3945	   CONFLICT state prior to the secondary, since the primary sends an
3946	   UPDREQ message and receives an UPDDONE before the secondary sends an
3947	   UPDREQ message and receives its UPDDONE message.

3949	   When a secondary server receives an indication that the primary
3950	   server has transitioned from POTENTIAL-CONFLICT to NORMAL state, it
3951	   SHOULD send an UPDREQ message to the primary server.

3953	              Primary                                Secondary
3954	              Server                                  Server

3956	                |                                        |
3957	         POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
3958	                |                                        |
3959	                | >--UPDREQ-------------------->         |
3960	                |                                        |
3961	                |        <---------------------BNDUPD--< |
3962	                | >--BNDACK-------------------->         |
3963	               ...                                      ...
3964	                |                                        |
3965	                |        <---------------------BNDUPD--< |
3966	                | >--BNDACK-------------------->         |
3967	                |                                        |
3968	                |        <--------------------UPDDONE--< |
3969	              NORMAL                                     |
3970	                | >--STATE--(NORMAL)----------->         |
3971	                |        <---------------------UPDREQ--< |
3972	                |                                        |
3973	                | >--BNDUPD-------------------->         |
3974	                |        <---------------------BNDACK--< |
3975	               ...                                      ...
3976	                | >--BNDUPD-------------------->         |
3977	                |        <---------------------BNDACK--< |
3978	                |                                        |
3979	                | >--UPDDONE------------------->         |
3980	                |                                     NORMAL
3981	                |                                        |
3982	                |        <--------------------POOLREQ--< |
3983	                | >------POOLRESP-(?)---------->         |
3984	                |                                        |

3986	           Figure 9.8.3-1:  Transition out of POTENTIAL-CONFLICT
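
   The behavior of sections 9.8.1 and 9.8.3 can be condensed into a
   single step function, sketched below.  All names are hypothetical,
   and the result only reports the next state and whether an UPDREQ
   should be sent; the exchange of BNDUPD and BNDACK messages shown
   in the figure is not modeled.

```c
#include <stdbool.h>

typedef enum { ROLE_PRIMARY, ROLE_SECONDARY } fo_role;
typedef enum { ST_POTENTIAL_CONFLICT, ST_NORMAL, ST_PARTNER_DOWN } fo_state;

typedef enum {
    EV_ENTERED,              /* server has just entered POTENTIAL-CONFLICT */
    EV_COMMS_FAILED,         /* communications with the partner failed */
    EV_UPDDONE_RECEIVED,     /* UPDDONE received from the partner */
    EV_PARTNER_WENT_NORMAL   /* partner moved POTENTIAL-CONFLICT -> NORMAL */
} pc_event;

typedef struct {
    fo_state next;           /* state after handling the event */
    bool     send_updreq;    /* whether to send an UPDREQ now */
} pc_step;

pc_step potential_conflict_step(fo_role role, pc_event ev)
{
    pc_step r = { ST_POTENTIAL_CONFLICT, false };

    switch (ev) {
    case EV_ENTERED:
        r.send_updreq = (role == ROLE_PRIMARY);   /* secondary waits */
        break;
    case EV_COMMS_FAILED:
        if (role == ROLE_PRIMARY)
            r.next = ST_PARTNER_DOWN;             /* secondary stays */
        break;
    case EV_UPDDONE_RECEIVED:
        r.next = ST_NORMAL;
        break;
    case EV_PARTNER_WENT_NORMAL:
        r.send_updreq = (role == ROLE_SECONDARY);
        break;
    }
    return r;
}
```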

3988	9.9.  RECOVER-DONE state

3990	   This state exists to allow an interlocked transition for one server
3991	   from RECOVER state and another server from PARTNER-DOWN or
3992	   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

3994	9.9.1.  Operation in RECOVER-DONE state

3996	   A server in RECOVER-DONE state MUST respond only to
3997	   DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.

3999	9.9.2.  Transitions out of RECOVER-DONE state

4001	   When a server in RECOVER-DONE state determines that its partner
4002	   server has entered NORMAL state, then it will transition into NORMAL
4003	   state as well.
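
   The rules on which DHCP client requests a server may answer in
   COMMUNICATIONS-INTERRUPTED (section 9.7.2), POTENTIAL-CONFLICT
   (section 9.8.2) and RECOVER-DONE (section 9.9.1) state can be
   collected into one predicate, as in this sketch.  The names are
   hypothetical; in particular, request_kind is an assumed
   classification of incoming client messages and is not defined by
   this protocol.

```c
#include <stdbool.h>

typedef enum {
    ST_COMMUNICATIONS_INTERRUPTED, ST_POTENTIAL_CONFLICT, ST_RECOVER_DONE
} fo_state;

/* Assumed classification of incoming DHCP client messages. */
typedef enum {
    REQ_DISCOVER,      /* DHCPDISCOVER */
    REQ_SELECTING,     /* DHCPREQUEST in response to an offer */
    REQ_INIT_REBOOT,   /* DHCPREQUEST from a rebooting client */
    REQ_RENEWAL,       /* DHCPREQUEST/RENEWAL */
    REQ_REBINDING      /* DHCPREQUEST/REBINDING */
} request_kind;

/* May a server in the given state respond to this kind of request? */
bool may_respond(fo_state state, request_kind req)
{
    switch (state) {
    case ST_COMMUNICATIONS_INTERRUPTED:
        return true;                    /* responds to all requests */
    case ST_POTENTIAL_CONFLICT:
        return false;                   /* processes no requests */
    case ST_RECOVER_DONE:
        return req == REQ_RENEWAL || req == REQ_REBINDING;
    }
    return false;
}
```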

4005	9.10.  PAUSED state

4007	   This state exists to allow one server to inform another that it will
4008	   be out of service for what is predicted to be a relatively short
4009	   time, and to allow the other server to transition to COMMUNICATIONS-
4010	   INTERRUPTED state immediately and to begin servicing all DHCP clients
4011	   with no interruption in service to new DHCP clients.

4013	   A server which is aware that it is shutting down temporarily SHOULD
4014	   send a STATE message with the server-state option containing PAUSED
4015	   state.

4017	   While a server may or may not transition internally into PAUSED
4018	   state, the 'previous' state determined when it is restarted MUST be
4019	   the state the server was in prior to receiving the command to shut-
4020	   down and restart and which precedes its entry into the PAUSED state.
4021	   See section 9.3.2 concerning the use of the previous state upon
4022	   server restart.

4024	9.10.1.  Upon entry to PAUSED state

4026	   When entering PAUSED state, the server MUST store the previous state
4027	   in stable storage, and use that state as the previous state when it
4028	   is restarted.

4030	9.10.2.  Transitions out of PAUSED state

4032	   A server transitions out of PAUSED state by being restarted.  At that
4033	   time, the previous state MUST be the state the server was in prior to
4034	   entering the PAUSED state.

4036	9.11.  SHUTDOWN state

4038	   This state exists to allow one server to inform another that it will
4039	   be out of service for what is predicted to be a relatively long time,
4040	   and to allow the other server to transition immediately to PARTNER-
4041	   DOWN state, and take over completely for the server going down.

4043	   A server which is aware that it is shutting down SHOULD send a STATE
4044	   message with the server-state field containing SHUTDOWN.

4046	   While a server may or may not transition internally into SHUTDOWN
4047	   state, the 'previous' state determined when it is restarted MUST be
4048	   the state active prior to the command to shutdown.  See section 9.3.2
4049	   concerning the use of the previous state upon server restart.

4051	9.11.1.  Upon entry to SHUTDOWN state

4053	   When entering SHUTDOWN state, the server MUST record the previous
4054	   state in stable storage for use when the server is restarted.  It
4055	   also MUST record the current time as the last time operational.

4060	9.11.2.  Operation in SHUTDOWN state

4062	   A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

4064	   If a server receives any message indicating that the partner has
4065	   moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
4066	   MUST record RECOVER state as the previous state to be used when it is
4067	   restarted.

4069	   A server SHOULD wait for a few seconds after informing the partner of
4070	   entry into SHUTDOWN state (if communications are okay) to determine
4071	   whether the partner will enter PARTNER-DOWN state.

4073	9.11.3.  Transitions out of SHUTDOWN state

4075	   A server transitions out of SHUTDOWN state by being restarted.
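
   The bookkeeping required on entry to and during SHUTDOWN state
   might look like the following sketch.  The stable_record structure
   is a hypothetical stand-in for whatever the server keeps in stable
   storage; actually writing it to disk is outside the scope of the
   example.

```c
#include <time.h>

typedef enum {
    ST_NORMAL, ST_COMMUNICATIONS_INTERRUPTED, ST_PARTNER_DOWN,
    ST_RECOVER, ST_SHUTDOWN
} fo_state;

/* Hypothetical image of the data kept in stable storage. */
typedef struct {
    fo_state previous_state;     /* used as 'previous' state on restart */
    time_t   last_operational;   /* last time the server was operational */
} stable_record;

/* On entry to SHUTDOWN: remember the state being left and the time. */
void enter_shutdown(stable_record *rec, fo_state current, time_t now)
{
    rec->previous_state   = current;
    rec->last_operational = now;
}

/* While in SHUTDOWN: if the partner reports PARTNER-DOWN, the server
 * must come back up through RECOVER. */
void partner_state_seen_in_shutdown(stable_record *rec, fo_state partner)
{
    if (partner == ST_PARTNER_DOWN)
        rec->previous_state = ST_RECOVER;
}
```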

4077	10.  Safe Period

4079	   Due to the restrictions imposed on each server while in
4080	   COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
4081	   is not feasible for either server.  One reason that this state
4082	   exists at all is to allow the servers to easily survive transient
4083	   network communications failures of a few minutes to a few days
4084	   (although the actual time periods will depend a great deal on the
4085	   DHCP activity of the network in terms of arrival and departure of
4086	   DHCP clients on the network).

4088	   Eventually, when the servers are unable to communicate, they will
4089	   have to move into a state where they no longer can re-integrate
4090	   without some possibility of a duplicate IP address allocation.
4091	   There are two ways that they can move into this state (known as
4092	   PARTNER-DOWN).

4094	   First, they can be informed by external command that, indeed, the
4095	   partner server is down.  In this case, there is no difficulty in
4096	   moving into the PARTNER-DOWN state since it is an accurate reflection
4097	   of reality and the protocol has been designed to operate correctly
4098	   (even during reintegration) if, when in PARTNER-DOWN state, the
4099	   partner is, indeed, down.

4101	   The more difficult scenario is when the servers are running unat-
4102	   tended for extended periods, and in this case an option is provided
4103	   to configure something called a "safe-period" into each server.  This
4104	   OPTIONAL safe-period is the period after which either the primary or
4105	   secondary server will automatically transition to PARTNER-DOWN from
4106	   COMMUNICATIONS-INTERRUPTED state.  If this transition is completed
4107	   and the partner is not down, then the possibility of duplicate IP
4108	   address allocations will exist.

4110	   The goal of the "safe-period" is to allow network operations staff
4111	   some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
4112	   state.  During the safe-period the only requirement is that the net-
4113	   work operations staff determine if both servers are still running --
4114	   and if they are, to either fix the network communications failure
4115	   between them, or to take one of the servers down before the
4116	   expiration of the safe-period.

4118	   The length of the safe-period is installation dependent, and depends
4119	   in large part on the number of unallocated IP addresses within the
4120	   subnet address pool and the expected frequency of arrival of previ-
4121	   ously unknown DHCP clients requiring IP addresses.  Many environments
4122	   should be able to support safe-periods of several days.

4124	   During this safe period, either server will allow renewals from any
4125	   existing client.  The only limitation concerns the need for IP
4126	   addresses for the DHCP server to hand out to new DHCP clients and the
4127	   need to re-allocate IP addresses to different DHCP clients.

4129	   The number of "extra" IP addresses required is equal to the expected
4130	   total number of new DHCP clients encountered during the safe period.
4131	   This is dependent only on the arrival rate of new DHCP clients, not
4132	   the total number of outstanding leases on IP addresses.
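
   The sizing rule above is simple arithmetic and can be sketched as
   follows.  The function names and the choice of hours as the time
   unit are assumptions made for the example; the arrival rate is
   whatever a site estimates for previously unknown DHCP clients.

```c
#include <limits.h>

/* "Extra" addresses needed: expected number of new DHCP clients
 * arriving during the safe period. */
unsigned long extra_addresses_needed(unsigned long new_clients_per_hour,
                                     unsigned long safe_period_hours)
{
    return new_clients_per_hour * safe_period_hours;
}

/* Conversely, the longest safe period (in hours) that a given number
 * of spare addresses can support at that arrival rate. */
unsigned long max_safe_period_hours(unsigned long spare_addresses,
                                    unsigned long new_clients_per_hour)
{
    if (new_clients_per_hour == 0)
        return ULONG_MAX;   /* no new clients: addresses never run out */
    return spare_addresses / new_clients_per_hour;
}
```

   For instance, at 5 new clients per hour, a 72-hour safe period
   calls for 5 * 72 = 360 spare addresses, regardless of how many
   leases are already outstanding.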

4134	   In the unlikely event that a relatively short safe period of an hour
4135	   is all that can be used (given a dearth of IP addresses or a very
4136	   high arrival rate of new DHCP clients), even that can provide sub-
4137	   stantial benefits in allowing the DHCP subsystem to ride through
4138	   minor problems that could occur and be fixed within that hour.  In
4139	   these cases, no possibility of duplicate IP address allocation
4140	   exists, and re-integration after the failure is solved will be
4141	   automatic and require no operator intervention.

4143	11.  Security

4145	   It is very desirable to assure the integrity of failover partners and
4146	   to thus ensure proper operation of the servers. For example, denial
4147	   of service attacks are possible by the communication of invalid state
4148	   information to both servers.

4150	   The Failover protocol MAY be secured either by using a simple shared
4151	   secret message digest which covers each message or by using TLS [TLS]
4152	   (Transport Layer Security).

4154	11.1.  Simple shared secret

4156	   A simple shared secret message digest MAY be used to cover each mes-
4157	   sage.  Since there are a number of configuration parameters that must
4158	   already be the same on each server in a pair, it is not unreasonable
4159	   to require a shared secret to be configured as well.

4161	   Only information within the packet and covered by the message digest
4162	   is used for operation of the protocol. It is for this reason that the
4163	   IP address of the sending server is sent in the sending-server-IP-
4164	   address option of the CONNECT and CONNECTACK messages.

4166	   This message digest is placed in the message-digest option.  The dig-
4167	   est covers the message prior to the inclusion of the message-digest
4168	   option.

4170	11.2.  TLS

4172	   TLS, Transport Layer Security, as specified in [TLS] MAY be used. The
4173	   use of TLS would be similar to the way it is used with SMTP [SMTPTLS]
4174	   and IMAP/POP3/ACAP [IMAPTLS].

4176	   To request the use of TLS, the server that successfully opened a
4177	   connection to its peer MUST send the TLS-request option as part of
4178	   the CONNECT message.  The server receiving the TLS-request option
4179	   MUST respond with a TLS-reply option indicating its acceptance or
4180	   rejection of the TLS-request in the CONNECT message.

4182	   If the CONNECTACK message contained a TLS-reply of 1, then both
4183	   servers begin TLS negotiation.

4185	   Upon completion of this negotiation, the server which originally sent
4186	   the CONNECT message MUST resend its CONNECT message without any
4187	   TLS-request, and must wait for a corresponding CONNECTACK.

4189	   Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [TLS] cipher
4190	   suite is REQUIRED in Failover servers supporting TLS. This is
4191	   important as it assures that any two compliant implementations can be
4192	   configured to interoperate.

4194	12.  Hash algorithm for load balancing

4196	The following hash function is an implementation of the algorithm known
4197	as "Pearson's hash".  The Pearson's hash algorithm was originally
4198	published in the Communications of the ACM, Vol. 33, No. 6 (June 1990),
4199	pp. 677-680.  The author, Peter K. Pearson, has kindly granted his
4200	permission to use this algorithm, free of any encumbrances.

4202	To make primary-backup load balancing possible, both servers MUST use
4203	the same hash function.

    /* A "mixing table" of 256 distinct values, in pseudo-random order. */

    unsigned char failover_hash_mx_tbl[256] =
    {
    251, 175, 119, 215,  81,  14,  79, 191, 103,  49,
    181, 143, 186, 157,   0, 232,  31,  32,  55,  60,
    152,  58,  17, 237, 174,  70, 160, 144, 220,  90,
     57, 223,  59,   3,  18, 140, 111, 166, 203, 196,
    134, 243, 124,  95, 222, 179, 197,  65, 180,  48,
     36,  15, 107,  46, 233, 130, 165,  30, 123, 161,
    209,  23,  97,  16,  40,  91, 219,  61, 100,  10,
    210, 109, 250, 127,  22, 138,  29, 108, 244,  67,
    207,   9, 178, 204,  74,  98, 126, 249, 167, 116,
     34,  77, 193, 200, 121,   5,  20, 113,  71,  35,
    128,  13, 182,  94,  25, 226, 227, 199,  75,  27,
     41, 245, 230, 224,  43, 225, 177,  26, 155, 150,
    212, 142, 218, 115, 241,  73,  88, 105,  39, 114,
     62, 255, 192, 201, 145, 214, 168, 158, 221, 148,
    154, 122,  12,  84,  82, 163,  44, 139, 228, 236,
    205, 242, 217,  11, 187, 146, 159,  64,  86, 239,
    195,  42, 106, 198, 118, 112, 184, 172,  87,   2,
    173, 117, 176, 229, 247, 253, 137, 185,  99, 164,
    102, 147,  45,  66, 231,  52, 141, 211, 194, 206,
    246, 238,  56, 110,  78, 248,  63, 240, 189,  93,
     92,  51,  53, 183,  19, 171,  72,  50,  33, 104,
    101,  69,   8, 252,  83, 120,  76, 135,  85,  54,
    202, 125, 188, 213,  96, 235, 136, 208, 162, 129,
    190, 132, 156,  38,  47,   1,   7, 254,  24,   4,
    216, 131,  89,  21,  28, 133,  37, 153, 149,  80,
    170,  68,   6, 169, 234, 151
    };

    /*
     * Pearson's hash of a key.  The hash is seeded with the key length
     * and the key bytes are consumed from last to first.
     */
    unsigned char failover_p_hash(
            const unsigned char *key, /* Key to hash (e.g., MAC address) */
            int len                   /* Length of key in bytes */
            )
    {
        unsigned char hash = (unsigned char) len;
        int i;

        for (i = len; i > 0; )
        {
            /* Mix the next key byte into the hash via the table. */
            hash = failover_hash_mx_tbl[hash ^ key[--i]];
        }
        return hash;
    }

4251	13.  Acknowledgments

4253	   Ralph Droms started it all, by sketching out an initial interserver
4254	   draft that embodied ideas from several past IETF meetings.  In that
4255	   draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
4256	   Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

4258	   Kim Kinnear and Bob Cole each extended that draft, separately and
4259	   then together, until they created an interserver draft that supported
4260	   any number of servers.  The complexity of that approach was just too
4261	   great, and that draft wasn't greeted with enthusiasm by many, includ-
4262	   ing its authors.

4264	   It did however lead to a much simpler approach embodied in the first
4265	   Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
4266	   Droms.  This draft posited only two servers -- a primary and a secon-
4267	   dary.

4269	   Kim Kinnear then wrote the Safe Failover draft to layer on top of the
4270	   Failover Draft and increase its robustness in the face of certain
4271	   rare network failures.

4273	   At the spring 1998 IETF meeting in LA, the DHC working group said
4274	   that they wanted a merged Failover and Safe Failover draft.  Steve
4275	   Gonczi and Bernie Volz stepped up and produced the raw material for
4276	   such a merged draft, along with a new message format designed around
4277	   DHCP options and other extensions and clarifications.  Kim Kinnear
4278	   edited their work into draft format and made other changes in time
4279	   for the Summer Chicago IETF meeting.

4281	   During the summer and fall of 1998, two groups worked on separate
4282	   implementations of the UDP failover draft.  Bernie Volz and Steve
4283	   Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul
4284	   Fox made up the other.  These two groups worked together to produce
4285	   considerable changes and simplifications of the protocol during that
4286	   period, and Steve Gonczi and Kim Kinnear edited those changes into
4287	   the -03 draft in time for submission to the December 1998 Orlando
4288	   IETF meeting.

4290	   In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of
4291	   people interested in the failover draft.  During that meeting a gen-
4292	   eral agreement was reached to recast the failover protocol to use TCP
4293	   instead of UDP.  In addition, the group together brainstormed a work-
4294	   able load-balancing technique.  Kim Kinnear volunteered to rewrite
4295	   the entire draft to include the changes made at that meeting as well
4296	   as to restructure the draft along guidelines suggested by Thomas Nar-
4297	   ten.  The current draft represents the results of that effort.

4299	   The initial idea for a hash-based load balancing approach was offered
4300	   by Ted Lemon, and the determination of an algorithm and its integra-
4301	   tion into the draft was done by Steve Gonczi.  The security section
4302	   was spearheaded by Bernie Volz.  Both contributed considerably to the
4303	   ideas and text in the rest of the draft with several reviews.

4305	   These most recent changes have been widely circulated among the other
4306	   authors, but that does not preclude any of them from expressing
4307	   disagreement with what is contained in this draft at any future time.

4309	   Many people have reviewed the various earlier drafts that went into
4310	   this result.  At American Internet, ideas were contributed by Brad
4311	   Parker.  At Cisco Systems, Paul Fox and Ellen Garvey have
4312	   contributed greatly to the form of the protocol.

4314	   Glenn Waters of Bay Networks contributed ideas and enthusiasm to make
4315	   a Failover protocol that was both "safe" and "lazy".

4317	   Many thanks to Peter K. Pearson, the author of Pearson's hash, who
4318	   has kindly granted his permission to use this algorithm for DHCP load
4319	   balancing, free of any encumbrances.

4321	14.  References

4323	   [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
4324	      2131, March 1997.

4326	   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
4327	      Requirement Levels", RFC 2119, March 1997.

4329	   [RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
4330	      Extensions", RFC 2132, March 1997.

4332	   [TLS] Dierks, T., Allen, C., "The TLS Protocol, Version 1.0", RFC
4333	      2246, January 1999.

4335	   [SMTPTLS] Hoffman, P., "SMTP Service Extension for Secure SMTP over
4336	      TLS", RFC 2487, January 1999.

4338	   [IMAPTLS] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC
4339	      2595, June 1999.

4341	   [NAMESPACE] Carney, M., "draft-ietf-dhc-option_review_and_namespace-
4342	      00.txt", June 1999.

4344	   [DDNS] Rekhter, Y., Stapp, M., "draft-ietf-dhc-dhcp-dns-10.txt",
4345	      June, 1999.

4347	15.  Authors' information

4349	      Ralph Droms
4350	      323 Dana Engineering
4351	      Bucknell University
4352	      Lewisburg, PA  17837

4354	      Phone: (717) 524-1145
4355	      EMail: droms@bucknell.edu

4357	      Greg Rabil, Mike Dooley, Arun Kapur
4358	      Lucent Technologies (Quadritek)
4359	      10 Valley Stream Parkway, Suite 240
4360	      Malvern, PA 19355

4362	      Phone: (800) 208-2747

4364	      EMail: grabil@lucent.com
4365	             mdooley@lucent.com
4366	             akapur@lucent.com

4368	      Kim Kinnear
4369	      Mark Stapp
4370	      Cisco Systems
4371	      250 Apollo Drive
4372	      Chelmsford, MA  01824
4373	      Phone: (978) 244-8000

4375	      EMail: kkinnear@cisco.com
4376	             mjs@cisco.com

4378	      Bernie Volz
4379	      Steve Gonczi
4380	      Process Software Corporation
4381	      959 Concord St.
4382	      Framingham, MA  01701

4384	      Phone: (508) 879-6994

4386	      EMail: volz@process.com
4387	             gonczi@process.com

4389	16.  Full Copyright Statement

4391	Copyright (C) The Internet Society (1999). All Rights Reserved.

4393	This document and translations of it may be copied and furnished to oth-
4394	ers, and derivative works that comment on or otherwise explain it or
4395	assist in its implementation may be prepared, copied, published and dis-
4396	tributed, in whole or in part, without restriction of any kind, provided
4397	that the above copyright notice and this paragraph are included on all
4398	such copies and derivative works.  However, this document itself may not
4399	be modified in any way, such as by removing the copyright notice or
4400	references to the Internet Society or other Internet organizations,
4401	except as needed for the  purpose of developing Internet standards in
4402	which case the procedures for copyrights defined in the Internet Stan-
4403	dards process must be followed, or as required to translate it into
4404	languages other than English.

4406	The limited permissions granted above are perpetual and will not be
4407	revoked by the Internet Society or its successors or assigns.

4409	This document and the information contained herein is provided on an "AS
4410	IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
4411	FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
4412	LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
4413	INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FIT-
4414	NESS FOR A PARTICULAR PURPOSE.

4416	Open Issues

4418	   These issues need to be resolved:

4420	      1.  We need to deal with the option space, and the procedures for
4421	          managing it.  Probably IANA.

4423	      2.  Figure out a better way to identify vendors.  How about an
4424	          SNMP Enterprise MIB value?

4426	      3.  Need more clarity in the conflict resolution section, probably
4427	          backed up by real implementation experience.  Learned a lot
4428	          from the UDP implementation and experience with it in the real
4429	          world, and need equivalent learning from a TCP implementation
4430	          with no messages out of order or lost.