idnits 2.17.1 

draft-ietf-dhc-failover-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 112 instances of too long lines in the document, the longest
     one being 7 characters in excess of 72.

  ** The abstract seems to contain references ([RFC2131]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 1584 has weird spacing: '...od ends  addre...'

  == Line 2140 has weird spacing: '...eserved    not...'

  == Line 2671 has weird spacing: '... accept    acc...'

  == Line 2672 has weird spacing: '... accept    acc...'

  == Line 2673 has weird spacing: '... accept    acc...'

  == (7 more instances...)

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     In this state a server MUST respond to all DHCP client requests,
     and any load balancing (described in section 5.3) MUST NOT be used.  When
     allocating new IP addresses, each server SHOULD allocate from its own IP
     address pool (if that can be determined), where the primary SHOULD
     allocate only FREE IP addresses, and the secondary SHOULD allocate only
     BACKUP IP addresses.  When responding to renewal requests, each server
     will allow continued renewal of a DHCP client's current lease on an IP
     address irrespective of whether that lease was given out by the receiving
     server or not, although the renewal period MUST not exceed the maximum
     client lead time (MCLT) beyond the latest of: 1) the
     potential-expiration-time already acknowledged by the other server or 2)
     the lease-expiration-time or 3) `potential-expiration-time received from
     the partner server.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 2002) is 8130 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '3' on line 2325

  -- Looks like a reference, but probably isn't: '4' on line 2836

  -- Looks like a reference, but probably isn't: '9' on line 2955

  -- Looks like a reference, but probably isn't: '7' on line 3018

  -- Looks like a reference, but probably isn't: '8' on line 3063

  -- Looks like a reference, but probably isn't: '1' on line 3102

  -- Looks like a reference, but probably isn't: '2' on line 3153

  -- Looks like a reference, but probably isn't: '5' on line 3191

  -- Looks like a reference, but probably isn't: '6' on line 3386

  -- Looks like a reference, but probably isn't: '10' on line 3558

  -- Looks like a reference, but probably isn't: '11' on line 3606

  -- Looks like a reference, but probably isn't: '12' on line 3629

  == Unused Reference: 'RFC 2139' is defined on line 5780, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DHCID'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DNSRES'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FQDN'

  ** Downref: Normative reference to an Informational RFC: RFC 2104

  ** Obsolete normative reference: RFC 2139 (Obsoleted by RFC 2866)

  ** Obsolete normative reference: RFC 2246 (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2401 (Obsoleted by RFC 4301)

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Obsolete normative reference: RFC 2487 (Obsoleted by RFC 3207)


     Summary: 12 errors (**), 0 flaws (~~), 10 warnings (==), 17 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                        Ralph Droms
3	INTERNET DRAFT                                               Kim Kinnear
4	                                                              Mark Stapp
5	                                                           Cisco Systems

7	                                                             Bernie Volz
8	                                                                Ericsson

10	                                                            Steve Gonczi
11	                                                         Network Engines

13	                                                              Greg Rabil
14	                                                             Mike Dooley
15	                                                              Arun Kapur
16	                                                     Lucent Technologies

18	                                                               July 2001
19	                                                    Expires January 2002

21	                         DHCP Failover Protocol
22	                    <draft-ietf-dhc-failover-09.txt>

24	Status of this Memo

26	   This document is an Internet-Draft and is in full conformance with
27	   all provisions of Section 10 of RFC2026.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF), its areas, and its working groups.  Note that
31	   other groups may also distribute working documents as Internet-
32	   Drafts.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   The list of current Internet-Drafts can be accessed at
40	   http://www.ietf.org/ietf/1id-abstracts.txt

42	   The list of Internet-Draft Shadow Directories can be accessed at
43	   http://www.ietf.org/shadow.html.

45	Copyright Notice

47	   Copyright (C) The Internet Society (2001). All Rights Reserved.

49	Abstract

51	   DHCP [RFC 2131] allows for multiple servers to be operating on a
52	   single network.  Some sites are interested in running multiple
53	   servers in such a way so as to provide redundancy in case of server
54	   failure.  In order for this to work reliably, the cooperating primary
55	   and secondary servers must maintain a consistent database of the
56	   lease information.  This implies that servers will need to coordinate
57	   any and all lease activity so that this information is synchronized
58	   in case of failover.

60	   This document defines a protocol to provide such synchronization
61	   between two servers.  One server is designated the "primary" server,
62	   the other is the "secondary" server.  This document also describes a
63	   way to integrate the failover protocol with the DHCP load balancing
64	   approach.

66	Table of Contents

68	    1.  Introduction................................................. 4
69	    2.  Terminology.................................................. 5
70	    2.1.  Requirements terminology................................... 5
71	    2.2.  DHCP and failover terminology.............................. 5
72	    3.  Background and External Requirements......................... 9
73	    3.1.  Key aspects of the DHCP protocol........................... 9
74	    3.2.  BOOTP relay agent implementation........................... 11
75	    3.3.  What does it mean if a server can't communicate with its partner? 12
76	    3.4.  Challenging scenarios for a Failover protocol.............. 13
77	    3.5.  Using TCP to detect partner server failure................. 14
78	    4.  Design Goals................................................. 15
79	    4.1.  Design goals for this protocol............................. 15
80	    4.2.  Limitations of this protocol............................... 17
81	    5.  Protocol Overview............................................ 17
82	    5.1.  Messages and States........................................ 17
83	    5.2.  Fundamental guarantees..................................... 20
84	    5.3.  Load balancing............................................. 27
85	    5.4.  IP address allocations between servers..................... 28
86	    5.5.  Operating in NORMAL state.................................. 30
87	    5.6.  Operating in COMMUNICATIONS-INTERRUPTED state.............. 31
88	    5.7.  Operating in PARTNER-DOWN state............................ 31
89	    5.8.  Operating in RECOVER state................................. 31
90	    5.9.  Operating in STARTUP state................................. 31
91	    5.10.  Time synchronization between servers...................... 32
92	    5.11.  IP address binding-status................................. 32
93	    5.12.  DNS dynamic update considerations......................... 36
94	    5.13.  Reservations and failover................................. 41
95	    5.14.  Dynamic BOOTP and failover................................ 42
96	    5.15.  Guidelines for selecting MCLT............................. 43
97	    5.16.  What is sent in response to an UPDREQ or UPDREQALL message? 43
98	    5.17.  How do you determine that your partner is "up to date" for 45
99	    6.  Common Message Format........................................ 45
100	    6.1.  Message header format...................................... 46
101	    6.2.  Common option format....................................... 48
102	    6.3.  Batching multiple binding update transactions in one BNDUPD mes- 49
103	    7.  Protocol Messages............................................ 51
104	    7.1.  BNDUPD message [3]......................................... 51
105	    7.2.  BNDACK message [4]......................................... 62
106	    7.3.  UPDREQ message [9]......................................... 65
107	    7.4.  UPDREQALL message [7]...................................... 66
108	    7.5.  UPDDONE message [8]........................................ 67
109	    7.6.  POOLREQ message [1]........................................ 68
110	    7.7.  POOLRESP message [2]....................................... 69
111	    7.8.  CONNECT message [5]........................................ 70
112	    7.9.  CONNECTACK message [6]..................................... 74
113	    7.10.  STATE message [10]........................................ 78
114	    7.11.  CONTACT message [11]...................................... 79
115	    7.12.  DISCONNECT message [12]................................... 80
116	    8.  Connection Management........................................ 81
117	    8.1.  Connection granularity..................................... 81
118	    8.2.  Creating the TCP connection................................ 81
119	    8.3.  Using the TCP connection for determining communications status 83
120	    8.4.  Using the TCP connection for binding data.................. 85
121	    8.5.  Using the TCP connection for control messages.............. 85
122	    8.6.  Losing the TCP connection.................................. 85
123	    9.  Failover Endpoint States..................................... 86
124	    9.1.  Server Initialization...................................... 86
125	    9.2.  Server State Transitions................................... 86
126	    9.3.  STARTUP state.............................................. 90
127	    9.4.  PARTNER-DOWN state......................................... 93
128	    9.5.  RECOVER state.............................................. 95
129	    9.6.  RECOVER-WAIT state......................................... 97
130	    9.7.  RECOVER-DONE state......................................... 98
131	    9.9.  COMMUNICATIONS-INTERRUPTED State........................... 101
132	    9.10.  POTENTIAL-CONFLICT state.................................. 105
133	    9.11.  RESOLUTION-INTERRUPTED state.............................. 107
134	    9.12.  CONFLICT-DONE state....................................... 108
135	    9.13.  PAUSED state.............................................. 108
136	    9.14.  SHUTDOWN state............................................ 109
137	    10.  Safe Period................................................. 110
138	    11.  Security.................................................... 111
139	    11.1.  Simple shared secret...................................... 112
140	    11.2.  TLS....................................................... 113
141	    12.  Failover Options............................................ 113
142	    12.1.  addresses-transferred..................................... 114
143	    12.2.  assigned-IP-address....................................... 114
144	    12.3.  binding-status............................................ 114
145	    12.4.  client-identifier......................................... 115
146	    12.5.  client-hardware-address................................... 115
147	    12.6.  client-last-transaction-time.............................. 115
148	    12.7.  client-reply-options...................................... 116
149	    12.8.  client-request-options.................................... 116
150	    12.9.  DDNS...................................................... 117
151	    12.10.  delayed-service-parameter................................ 118
152	    12.11.  hash-bucket-assignment................................... 118
153	    12.12.  IP-flags................................................. 119
154	    12.13.  lease-expiration-time.................................... 120
155	    12.14.  max-unacked-bndupd....................................... 120
156	    12.15.  MCLT..................................................... 120
157	    12.16.  message.................................................. 121
158	    12.17.  message-digest........................................... 121
159	    12.18.  potential-expiration-time................................ 122
160	    12.19.  receive-timer............................................ 122
161	    12.20.  protocol-version......................................... 122
162	    12.21.  reject-reason............................................ 123
163	    12.22.  relationship-name........................................ 124
164	    12.23.  server-flags............................................. 124
165	    12.24.  server-state............................................. 125
166	    12.25.  start-time-of-state...................................... 125
167	    12.26.  TLS-reply................................................ 126
168	    12.27.  TLS-request.............................................. 126
169	    12.28.  vendor-class-identifier.................................. 126
170	    12.29.  vendor-specific-options.................................. 127
171	    13.  IANA Considerations......................................... 127
172	    14.  Acknowledgments............................................. 127
173	    15.  References.................................................. 129
174	    16.  Author's information........................................ 131
175	    17.  Full Copyright Statement.................................... 132

177	1.  Introduction

179	   DHCP [RFC 2131] allows for multiple servers to be operating on a sin-
180	   gle network.  Some sites are interested in running multiple servers
181	   in such a way so as to provide redundancy in case of server failure
182	   since the DHCP subsystem is in many cases a critical part of the net-
183	   work infrastructure.

185	   This document defines a protocol to provide synchronization between
186	   two servers in order that each can take over for the other should
187	   either one fail or become unreachable.

189	   One server is designated the "primary" server, the other is the
190	   "secondary" server, and most DHCP client requests are sent to each
191	   server (see section 3.1.1 for details).

193	   In order to provide a high-availability DHCP service, these
194	   cooperating primary and secondary servers must maintain a consistent
195	   database of lease information.  This implies that servers will need
196	   to coordinate all lease activity so that this information is syn-
197	   chronized in case failover is required.  The protocol messages and
198	   processing techniques required to maintain a consistent database are
199	   specified in the protocol described here.

201	   The failover protocol also contains a way to integrate the DHCP load-
202	   balancing algorithm described in [RFC 3074] with the failover proto-
203	   col.

205	2.  Terminology

207	   This section discusses both the generic requirements terminology com-
208	   mon to many IETF protocol specifications as well as specialized DHCP
209	   and failover protocol specific terminology.

211	2.1.  Requirements terminology

213	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
214	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
215	   document are to be interpreted as described in RFC 2119 [RFC 2119].

217	2.2.  DHCP and failover terminology

219	   This document uses the following terms:

221	      o  "available IP address"

223	         An IP address is "available" if it may be allocated by a
224	         specific DHCP server.  An IP address is considered (for the
225	         purposes of this document) to be available to a single server
226	         for allocation unless otherwise noted.  An IP address available
227	         for allocation on a primary server has state FREE, and an IP
228	         address available for allocation on a secondary server has
229	         state BACKUP.

231	      o  "binding"
232	         A binding is a collection of configuration parameters, includ-
233	         ing at least an IP address, associated with or "bound to" a
234	         DHCP client.  Bindings are managed by DHCP servers.

236	      o  "binding database"

238	         The collection of bindings managed by a primary and secondary.

240	      o  "binding update transaction"

242	         A binding update transaction refers to the set of information
243	         (contained in options) necessary to perform a binding update
244	         for a single IP address.  It comprises the
245	         assigned-IP-address option and the binding-status option, along
246	         with other options as appropriate.

248	      o  "binding-status"

250	         The binding-status is the status of an IP address with respect
251	         to its association with a client.  There are specific binding-
252	         status values defined for use by the failover protocol, e.g.,
253	         ACTIVE, FREE, RELEASED, ABANDONED, etc.  These are designed to
254	         map more or less directly onto the binding-status values used
255	         internally in most DHCP server implementations.  The term
256	         binding-status refers to the concept also sometimes known as
257	         "lease state" or "IP address state", but in this document the
258	         term "state" is reserved for the failover state of a failover
259	         endpoint, and binding-status is always used to refer to the
260	         state associated with an IP address or lease.

262	      o "DHCP client" or "client"

264	        A DHCP client is an Internet host using DHCP to obtain confi-
265	        guration parameters such as a network address.  The term
266	        "client" used within this document always means a DHCP client,
267	        and never one of the two failover servers.

269	      o "DHCP server" or "server"

271	        A DHCP server is an Internet host that returns configuration
272	        parameters to DHCP clients.

274	      o "DDNS"

276	        An abbreviation for "Dynamic DNS", which refers to the capabil-
277	        ity to update a DNS server's name (actually resource record)
278	        database using an on-the-wire protocol defined in [RFC 2136].

280	      o "DNS"

282	        An abbreviation for "Domain Name System", a scheme where a cen-
283	        tral name repository is used to map names to IP addresses and IP
284	        addresses to names.

286	      o "failover endpoint"

288	        The failover protocol allows for a unique failover endpoint
289	        per partner per role (where role is primary or secondary).
290	        This failover endpoint can take actions and hold unique
291	        states.  There are thus a maximum of two failover endpoints per
292	        server per partner (one for each partner as a primary and one
293	        for that same partner as a secondary).

295	      o "FQDN"

297	        An FQDN is a "fully qualified domain name".  A fully qualified
298	        domain name generally is a host name with at least one zone
299	        name; for example, "www.dhcp.org" is a fully qualified domain
300	        name.

302	      o "lazy update"

304	        Lazy update refers to the requirement placed on a server imple-
305	        menting a failover protocol to update its failover partner when-
306	        ever the binding database changes.  A failover protocol which
307	        didn't support lazy update would require the failover partner
308	        update to be complete before a DHCP server could respond to a
309	        DHCP client request with a DHCPACK.  A failover protocol which
310	        does support lazy update places no such restriction on the
311	        update of the failover partner server, and so a server can allo-
312	        cate an IP address or extend a lease on an IP address and then
313	        update its failover partner as time permits.  A failover proto-
314	        col which supports lazy update not only removes the requirement
315	        to update the failover partner prior to responding to a DHCP
316	        client with a DHCPACK, but also allows gathering up batches of
317	        updates from one failover server to its partner.

319	      o "MCLT"

321	        The MCLT refers to maximum client lead time.  This time is con-
322	        figured on the primary server and transmitted from the primary
323	        to the secondary server in the CONNECT message.  It is the max-
324	        imum amount of time that one server can extend a lease for a
325	        client's binding beyond the time known by the partner server.
326	        See section 5.2.1 for details.

328	      o "partner"

330	        A "partner", for the purposes of this document, refers to a
331	        failover server, typically the other failover server.  In many
332	        (if not most) cases, the failover protocol is symmetric with
333	        respect to the primary or secondary nature of the servers, and
334	        so it is often appropriate to discuss "updating the partner
335	        server", since it could be a primary server updating a secondary
336	        server or a secondary server updating a primary server.

338	      o "Primary server" or "Primary"

340	        A DHCP server configured to provide primary service to a set of
341	        DHCP clients for a particular set of subnet address pools.

343	      o "RR"

345	        "RR" is an abbreviation for "resource record".  All records in
346	        the DNS are resource records.  The resource records of most
347	        relevance to this document are the "A" resource record, which
348	        maps a DNS name to a particular IP address, the "PTR" resource
349	        record, which allows a "reverse map", from the IP address back
350	        to a DNS name, and the "KEY" resource record, which is used in
351	        ways defined in [FQDN] to tag a DNS name with the identity of
352	        the DHCP client with which it is associated.

354	      o "Secondary server" or "Secondary"

356	        A DHCP server configured to act as backup to a primary server
357	        for a particular set of subnet address pools.

359	      o "stable storage"

361	        Every DHCP server is assumed to have some form of what is called
362	        "stable storage".  Stable storage is used to hold information
363	        concerning IP address bindings (among other things) so that this
364	        information is not lost in the event of a server failure which
365	        requires restart of the server.

367	      o "state"

369	        In this document, the term "state" refers exclusively to the
370	        state of a failover endpoint, for example: NORMAL,
371	        COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN.  It is not used to
372	        refer to any attributes of an IP address or a binding of an IP
373	        address.  See "binding-status".

375	      o "subnet address pool"
376	        A subnet address pool is the set of IP addresses which is asso-
377	        ciated with a particular network number and subnet mask.  In the
378	        simple case, there is a single network number and subnet mask
379	        and a set of IP addresses.  In the more complex case (sometimes
380	        called "secondary subnets", sometimes "superscopes"), several
381	        (apparently unrelated) network number and subnet mask combina-
382	        tions with their associated IP addresses may all be configured
383	        together into one subnet address pool.
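
   As a rough illustration of the "MCLT" entry above, the following
   Python sketch caps a lease so that it never extends more than the
   MCLT beyond the expiration time known by the partner.  The function
   name and its parameters are hypothetical and not part of the
   protocol.  Using the current time as the baseline when the partner
   knows of no expiration time (or only one in the past) is an
   assumption of this example; section 5.2.1 gives the actual rules.

```python
from datetime import datetime, timedelta
from typing import Optional


def capped_lease_expiry(now: datetime,
                        desired_expiry: datetime,
                        partner_known_expiry: Optional[datetime],
                        mclt: timedelta) -> datetime:
    """Return the latest expiry this server may grant (hypothetical helper)."""
    # Assumption of this sketch: if the partner knows of no expiration
    # time, or only of one already in the past, measure from "now".
    if partner_known_expiry is None:
        baseline = now
    else:
        baseline = max(now, partner_known_expiry)
    # The lease may not run more than MCLT beyond what the partner knows.
    return min(desired_expiry, baseline + mclt)
```

   With an MCLT of one hour, a client asking for a three-day lease
   from a server whose partner knows nothing about the binding would,
   under these assumptions, receive a one-hour lease at first.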

385	3.  Background and External Requirements

387	   This section highlights key aspects of the DHCP protocol on which the
388	   failover protocol depends.  It also discusses the requirements that
389	   the failover protocol places on other aspects of the network infras-
390	   tructure, and some general issues surrounding server failure detec-
391	   tion.  Some failure scenarios that provide particular challenges to a
392	   failover protocol are discussed.  Finally, the challenges inherent in
393	   using a TCP connection as a means to detect failure of a partner
394	   server are elaborated.

396	3.1.  Key aspects of the DHCP protocol

398	   The failover protocol is designed to augment the DHCP protocol as
399	   described in RFC 2131 [RFC 2131].  There are several key aspects of
400	   the DHCP protocol which are required by the failover protocol in
401	   order to successfully meet its design goals.

403	3.1.1.  Broadcast behavior

405	   There are two aspects of the broadcast behavior of the DHCP protocol
406	   which are key to making the failover protocol operate successfully.
407	   The first is simply that the DHCP protocol requires a DHCP client to
408	   broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages.
409	   Because of this requirement, a DHCP client that was communicating
410	   with one server will automatically be able to communicate with
411	   another server if one is available.

413	   The second aspect of broadcast behavior is similar to the first, but
414	   involves the distinction between a DHCPREQUEST/RENEW and
415	   DHCPREQUEST/REBINDING.  A DHCPREQUEST/RENEW is the message that a
416	   DHCP client uses to extend its lease.  It is unicast to the DHCP
417	   server from which it acquired the lease.  However, the DHCP protocol
418	   (in a farsighted move) was explicitly designed so that in the event
419	   that a DHCP client cannot contact the server from which it received a
420	   lease on an IP address using a DHCPREQUEST/RENEW, the client is
421	   required to broadcast its renewal using a DHCPREQUEST/REBINDING to
422	   any available DHCP server.  Since all DHCP clients were required to
423	   implement this algorithm, the failover protocol can have a different
424	   server from the one that initially granted a lease be the server to
425	   renew a lease.  Thus, one server can take over for another with no
426	   interruption in the service as experienced by the DHCP client or its
427	   associated applications software.
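
   The RENEW/REBINDING distinction described above can be sketched
   from the client's point of view.  The Python fragment below is
   illustrative only: the function name and return strings are
   hypothetical, and the timer fractions are the RFC 2131 defaults
   (T1 at 0.5 and T2 at 0.875 of the lease duration), which a server
   may override.

```python
T1_FRACTION = 0.5    # RFC 2131 default for the renewal timer (T1)
T2_FRACTION = 0.875  # RFC 2131 default for the rebinding timer (T2)


def renewal_action(elapsed: float, lease: float) -> str:
    """Describe what a client does `elapsed` seconds into a `lease`-second lease."""
    t1 = T1_FRACTION * lease
    t2 = T2_FRACTION * lease
    if elapsed >= lease:
        # Lease has run out with no extension: stop using the address.
        return "EXPIRED: stop using address"
    if elapsed >= t2:
        # Original server did not answer: any server may extend the lease.
        return "REBINDING: broadcast DHCPREQUEST"
    if elapsed >= t1:
        # Ask the server that granted the lease to extend it.
        return "RENEW: unicast DHCPREQUEST to leasing server"
    return "BOUND: no action"
```

   The REBINDING branch is the one the failover protocol relies on:
   once the client broadcasts, the partner of the failed server can
   answer and extend the lease.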

429	3.1.2.  Client responsibility

431	   In the DHCP protocol the DHCP clients are entrusted with a consider-
432	   able responsibility.  In particular, after they are granted a lease
433	   on an IP address, they are enjoined to only use that IP address while
434	   their lease is valid.  Every DHCP client is expected to stop using an
435	   IP address if the expiration time on the lease has passed and if it
436	   cannot get an extension on the lease for that IP address from some
437	   DHCP server.  Thus, the correct behavior of every DHCP client in this
438	   regard is required to ensure the integrity of the DHCP service.  On
439	   the other hand, incorrect behavior by a client in this area will tend
440	   to adversely affect at most one other DHCP client.

442	   Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or
443	   DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or
444	   broadcast for a REBINDING) MUST still have time to run on the lease
445	   for that IP address.  The DHCP server sends the DHCPACK back unicast
446	   to the IP address from which the RENEW or REBINDING originated.

448	   Given the existing responsibility placed on the client to only use an
449	   IP address when the lease is valid, and to only send in a RENEW or
450	   REBINDING if the lease is valid, the failover protocol relies on DHCP
451	   clients to perform responsibly and will, in the absence of conflict-
452	   ing information, believe a DHCP client that is attempting to RENEW or
453	   REBIND a lease on an IP address is the legitimate owner of that IP
454	   address.

456	   If clients do not follow these rules, it is possible for an address
457	   to be in use by more than one client. For a single server, this hap-
458	   pens because the server has leased the expired address to another
459	   client and the original client is also attempting to use the address.
460	   The server would NAK the renewal request. This is made slightly worse
461	   in the failover protocol if the two servers are unable to communicate
462	   with each other and one server leases an available address to a new
463	   client while the other server receives a renewal from a different
464	   client.  In this case, both servers lease the same address to dif-
465	   ferent clients for the MCLT time.

467	   One troublesome issue is that of the DHCP client responsibility when
468	   sending in DHCPREQUEST/INIT-REBOOT requests.  While the original DHCP
469	   RFC was written to require a DHCP client to have time left to run on
470	   the lease for an IP address if the client is sending an INIT-REBOOT
471	   request, it was sufficiently unclear that some client vendors didn't
472	   realize this until recently.  Since the INIT-REBOOT request was sent
473	   with the IP address in the dhcp-requested-address option and not in
474	   the ciaddr (for perfectly good reasons), the similarity to the RENEW
475	   and REBINDING case was lost on many people.

477	   At present, the failover protocol does not assume that a client send-
478	   ing in an INIT-REBOOT request necessarily has a valid lease on the IP
479	   address appearing in the dhcp-requested-address option in the INIT-
480	   REBOOT request.

482	   The implications of this are as follows: Assume that there is a DHCP
483	   client that gets a lease from one server while that server is unable
484	   to communicate with its failover partner.  Then, assume that after
485	   that client reboots it is able only to communicate with the other
486	   failover server.  If the failover servers have not been able to com-
487	   municate with each other during this process, then the DHCP client
488	   will get a new IP address instead of being able to continue to use
489	   its existing IP address. This will affect no applications on the DHCP
490	   client, since it is rebooting.  However, it will use up an additional
491	   IP address in this marginal case.

493	3.1.3.  Stable storage update before DHCPACK

495	   The DHCP protocol allocates resources, and in order to operate
496	   correctly it requires that a DHCP server update some form of stable
497	   storage prior to sending a DHCPACK to a DHCP client in order to grant
498	   that client a lease on an IP address.

500	   One of the goals of the failover protocol is that it not add signifi-
501	   cant additional time to this already time consuming requirement to
502	   update stable storage prior to a DHCPACK.  In particular, adding a
503	   requirement to communicate with another server prior to sending a
504	   DHCPACK would greatly simplify the failover protocol, but it would
505	   unacceptably limit the potential scalability of any DHCP server which
506	   employed the failover protocol.

508	3.2.  BOOTP relay agent implementation

510	   Many DHCP clients are not resident on the same network segment as a
511	   DHCP server.  In order to support this form of network architecture,
512	   most contemporary routers implement something known as a BOOTP Relay
513	   Agent.  This capability inside of a router listens for all broadcasts
514	   at the DHCP port, port 67, and will relay any broadcasts that it
515	   receives on to a DHCP server.  The IP address of the DHCP server must
516	   have been previously configured into the router.  As part of the
517	   relay process, the relay agent will place the address of the inter-
518	   face on which it received the broadcast into the giaddr field of the
519	   DHCP packet.

521	   Since the failover protocol requires two DHCP servers to receive any
522	   broadcast DHCP messages, in order to work with DHCP clients which are
523	   not local to the DHCP server, the BOOTP relay agent on the router
524	   closest to the DHCP client must be configured to point at more than
525	   one DHCP server.
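
   The relay behavior described above can be sketched as follows.  This
   is a minimal illustration only: DhcpPacket and relay_broadcast are
   hypothetical names, the packet is reduced to two fields, and leaving
   an already-set giaddr untouched is an assumption not stated in the
   text above.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class DhcpPacket:
    """Hypothetical, heavily simplified stand-in for a DHCP packet."""
    xid: int
    giaddr: str = "0.0.0.0"


def relay_broadcast(packet: DhcpPacket,
                    receiving_interface_addr: str,
                    server_addrs: List[str]) -> List[Tuple[str, DhcpPacket]]:
    """Return one (server, packet) pair per configured DHCP server."""
    if not server_addrs:
        raise ValueError("at least one DHCP server must be configured")
    # Record the receiving interface in giaddr unless it is already set
    # (assumption: an earlier relay may have filled it in).
    if packet.giaddr == "0.0.0.0":
        packet = replace(packet, giaddr=receiving_interface_addr)
    # Duplicate the packet to every configured server, so that both
    # failover partners see the client's broadcast.
    return [(server, packet) for server in server_addrs]
```

   For failover, server_addrs would hold the addresses of both the
   primary and the secondary server.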

527	   Most BOOTP relay agent implementations allow this duplication of
528	   packets.

530	   If this is not possible, an administrator might be able to configure
531	   the relay agent with a subnet broadcast address, but in this case the
532	   primary and secondary DHCP servers in a failover pair must both
533	   reside on the same subnet.

535	3.3.  What does it mean if a server can't communicate with its partner?

537	   In any protocol designed to allow one server to take over some
538	   responsibilities from a partner server in the event of "failure" of
539	   that partner server, there is an inherent difficulty in determining
540	   when that partner server has failed.

542	   In fact, it is fundamentally impossible for one server to distinguish
543	   a network communications failure from the outright failure of the
544	   server to which it is trying to communicate.  In the case where each
545	   server is handing out resources (in this case IP addresses) to a
546	   client community, mistaking an inability to communicate with a
547	   partner server for failure of that partner server could easily cause
548	   both servers to be handing out the same IP addresses to different
549	   clients.

551	   One way that this is sometimes handled is for there to be more than
552	   two servers.  In the case of an odd number of servers, the servers
553	   that can still communicate with a majority of other servers will con-
554	   sider themselves operational, and any server which can't communicate
555	   with a majority of other servers must immediately cease operations.
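
   The majority rule just described might look like the sketch below.
   It assumes the conventional reading in which a server counts itself
   when testing for a majority; has_majority is a hypothetical name.
   This technique is NOT used by the failover protocol.

```python
def has_majority(reachable_peers: int, total_servers: int) -> bool:
    """True if this server plus the peers it can reach form a strict
    majority of all servers (assumption: the server counts itself)."""
    if total_servers < 1 or not 0 <= reachable_peers < total_servers:
        raise ValueError("reachable_peers must be in [0, total_servers)")
    return (reachable_peers + 1) * 2 > total_servers
```

   With three servers, a server that can reach one peer keeps running,
   while an isolated server must stop, even if it is the only server
   that some clients can reach.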

557	   While this technique works in some domains, having the only server to
558	   which a DHCP client can communicate voluntarily shut itself down
559	   seems like something worth avoiding.

561	   The failover protocol will operate correctly while the two servers
562	   are unable to communicate with each other, whether or not both are
563	   running.  At some point there may be resource contention, and if
564	   one of the servers is actually down, then the operator can inform
565	   the operational server, which will then be able to use all of the
566	   failed server's resources.

568	   The protocol also allows detection of an orderly shutdown of a parti-
569	   cipating server.

571	3.4.  Challenging scenarios for a Failover protocol

573	   There exist two failure scenarios which provide particular challenges
574	   to the correctness guarantees of a failover protocol.

576	3.4.1.  Primary Server crash before "lazy" update:

578	   In the case where the primary server sends a DHCPACK to a client for
579	   a newly allocated IP address and then crashes prior to sending the
580	   corresponding update to the secondary server, the secondary server
581	   will have no record of the IP address allocation.  When the secondary
582	   server takes over, it may well try to allocate that IP address to a
583	   different client.  In the case where the first client to receive the
584	   IP address is not on the net at the time (yet while there was still
585	   time to run on its lease), an ICMP echo (i.e., ping) will not prevent
586	   the secondary server from allocating that IP address to a different
587	   client.

589	   The failover protocol deals with this situation by having the primary
590	   and secondary servers allocate addresses for new clients from dis-
591	   joint address pools.  See section 5.5 for details.

593	   A more likely (in that DHCPREQUEST/RENEWs are presumably more common
594	   than DHCPDISCOVERs) and more subtle version of this problem is where
595	   the primary server crashes after extending a client's lease time, and
596	   before updating the secondary with a new time using a lazy update.
597	   After the secondary takes over, if the client is not connected to the
598	   network the secondary will believe the client's lease has expired
599	   when, in fact, it has not.  In this case as well, the IP address
600	   might be reallocated to a different client while the first client is
601	   still using it.

603	   This scenario is handled by the failover protocol through control of
604	   the lease time and the use of the maximum client lead time (MCLT).
605	   See section 5.2.1 for details.

607	3.4.2.  Network partition where DHCP servers can't communicate but each
608	can talk to clients:

610	   Several conditions are required for this situation to occur.  First,
611	   due to a network failure, the primary and secondary servers cannot
612	   communicate.  As well, some of the DHCP clients must be able to com-
613	   municate with the primary server, and some of the clients must now
614	   only be able to communicate with the secondary server.  When this
615	   condition occurs, both primary and secondary servers could attempt to
616	   allocate IP addresses for new clients from the same pool of available
617	   addresses.  At some point, then, two clients will end up being allo-
618	   cated the same IP address.  This will cause problems when the network
619	   failure that created this situation is corrected.

621	   The failover protocol deals with this situation by having the primary
622	   and secondary servers allocate addresses for new clients from dis-
623	   joint address pools.  See section 5.5 for details.
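
   The effect of disjoint pools can be illustrated with the sketch
   below.  It is not the mechanism of section 5.5; PoolAllocator is a
   hypothetical name, and the only property shown is that two servers
   drawing new-client addresses from disjoint sets cannot hand out the
   same address while partitioned.

```python
from typing import Dict, Iterable, Optional


class PoolAllocator:
    """One server's view: new clients are served only from its own set
    of free addresses (hypothetical helper, not from the draft)."""

    def __init__(self, own_free: Iterable[str]) -> None:
        self._own_free = set(own_free)
        self.bindings: Dict[str, str] = {}

    def allocate(self, client_id: str) -> Optional[str]:
        """Bind a free address to client_id, or return None if this
        server's own set is exhausted."""
        if not self._own_free:
            return None
        addr = min(self._own_free)  # deterministic choice for the example
        self._own_free.remove(addr)
        self.bindings[client_id] = addr
        return addr


# During a partition each server allocates independently.
primary = PoolAllocator({"192.0.2.10", "192.0.2.11"})
secondary = PoolAllocator({"192.0.2.12"})
primary.allocate("client-a")
secondary.allocate("client-b")
```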

625	3.5.  Using TCP to detect partner server failure

627	   There are several characteristics of TCP that are important to the
628	   functioning of the failover protocol, which uses one TCP connection
629	   both for bulk data transfer and to assess communications integrity
630	   with the other server.  Reliable and ordered message delivery are
631	   chief among these important characteristics.

633	   It would be nice to use the capabilities built in to TCP to allow it
634	   to determine if communications integrity exists to the failover
635	   partner but this strategy contains some problems which require
636	   analysis.  There exist three fundamental cases for an open TCP con-
637	   nection that must be examined.

639	      1.  When no data is being sent on a TCP connection, the TCP layer
640	          also does not exchange any signaling messages to assure that
641	          the peer is still up.

643	      2.  When data is queued to be sent, and the receiver has not
644	          blocked the sending of additional data, then messages
645	          containing the application's data are flowing across the TCP
646	          connection.

648	      3.  When data is queued to be sent, and the receiver has blocked
649	          the transmission of additional data, then persist messages are
650	          flowing from the receiver to the sender to ensure that the
651	          sender doesn't miss the receiver opening the window for
652	          further transmissions.

654	   The first case can be turned into the second case by sending
655	   application-level keep-alive messages periodically when there is no
656	   other data queued to be sent.  Note that TCP keep-alive messages
657	   might be used as well, but they present additional problems.

659	   Thus, we can ensure that the TCP connection has messages flowing
660	   periodically across the connection fairly easily.  The question
661	   remains as to what TCP will do if the other end of the connection
662	   fails to respond (either because of network partition or because the
663	   receiving server crashes). TCP will attempt to retransmit a message
664	   with an exponential backoff, and will eventually time out that
665	   retransmission.  However, the length of that timeout cannot, in gen-
666	   eral, be set on a per-connection basis, and is frequently as long as
667	   nine minutes, though in some cases it may be as short as two minutes.
668	   On some systems it can be set system-wide, while on other systems it
669	   cannot be changed at all.

671	   A value for this timeout that would be appropriate for the failover
672	   protocol, say less than 1 minute, could have unpleasant side-effects
673	   on other applications running on the same server, assuming that it
674	   could be changed at all on the host operating system.

676	   Nine minutes is a long time for the DHCP service to be unavailable to
677	   any new clients that were being served by the server which has
678	   crashed, when there is another server running that could respond to
679	   them as soon as it determines that its partner is not operational.

681	   The conclusion drawn from this analysis is that TCP provides very
682	   useful support for the failover protocol in the areas of reliable and
683	   ordered message delivery, but cannot by itself be relied upon to
684	   detect partner server failure in a fashion acceptable to the needs of
685	   the failover protocol.  Additional failover protocol capabilities
686	   have been created to support timely detection of partner server
687	   failure.  See section 8.3 for details on this mechanism.
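
   A generic application-level timeout of the kind this analysis calls
   for is sketched below.  It is not the mechanism defined in section
   8.3; PartnerLiveness and max_silence are hypothetical names, and
   time is passed in explicitly so the example is deterministic.

```python
class PartnerLiveness:
    """Tracks when the partner was last heard from, independently of
    TCP's own retransmission timeout (hypothetical helper)."""

    def __init__(self, max_silence: float, now: float) -> None:
        if max_silence <= 0:
            raise ValueError("max_silence must be positive")
        self.max_silence = max_silence
        self.last_heard = now

    def message_received(self, now: float) -> None:
        """Call for every message (data or keep-alive) from the partner."""
        self.last_heard = now

    def partner_reachable(self, now: float) -> bool:
        """False once the partner has been silent longer than allowed."""
        return (now - self.last_heard) <= self.max_silence
```

   Because the limit is held by the application, it can be well under
   one minute without changing any system-wide TCP setting.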

689	4.  Design Goals

691	   This section lists the design goals and the limitations of the fail-
692	   over protocol.

694	4.1.  Design goals for this protocol

696	   The following is a list of goals that are met by this protocol.  They
697	   are listed in priority order.

699	      1.  Implementations of this protocol must work with existing DHCP
700	          client implementations based on the DHCP protocol [RFC 2131].

702	      2.  Implementations of the protocol must work with existing BOOTP
703	          relay agent implementations.

705	      3.  The protocol must provide failover redundancy between servers
706	          that are not located on the same subnet.

708	      4.  Provide for continued service to DHCP clients through an
709	          automated mechanism in the event of failure of the primary
710	          server.

712	      5.  Avoid binding an IP address to a client while that binding is
713	          currently valid for another client.  In other words, do not
714	          allocate the same IP address to two clients.

716	      6.  Minimize any need for manual administrative intervention.

718	      7.  Introduce no additional delays in server response time as a
719	          result of the network communications required to implement the
720	          failover protocol, i.e., don't require communications with the
721	          partner between the receipt of a DHCPREQUEST and the
722	          corresponding DHCPACK.

724	      8.  Share IP address ranges between primary and secondary servers;
725	          i.e., impose no requirement that the pool of available
726	          addresses be manually or permanently divided between servers.

728	      9.  Continue to meet the goals and objectives of this protocol in
729	          the event of server failure or network partition.

731	      10. Provide graceful reintegration of full protocol service after
732	          server failure or network partition.

734	      11. Allow for one computer to act as a secondary server for multi-
735	          ple primary servers.  The protocol must allow failover primary
736	          and secondary configuration choices to be made at a granular-
737	          ity smaller than "all of the subnets served by a single
738	          server", though individual implementations may choose not to
739	          allow such flexibility.

741	      12. Ensure that an existing client can keep its existing IP
742	          address binding if it can communicate with either the primary
743	          or secondary DHCP server implementing this protocol - not just
744	          the server that originally offered it the binding.

746	      13. Ensure that a new client can get an IP address from some
747	          server.  Ensure that in the face of partition, where servers
748	          continue to run but cannot communicate with each other, the
749	          above goals and requirements may be met.  In addition, when
750	          the partition condition is removed, allow graceful automatic
751	          re-integration without requiring human intervention.

753	      14. If either primary or secondary server loses all of the infor-
754	          mation that it has stored in stable storage, ensure that it be
755	          able to refresh its stable storage from the other server.

757	      15. Support load balancing between the primary and secondary
758	          servers, and allow configuration of the percentage of the
759	          client population served by each with a moderately fine
760	          granularity.

762	4.2.  Limitations of this protocol

764	   The following are explicit limitations of this protocol.

766	      1.  This protocol provides only one level of redundancy through a
767	          single secondary server for each primary server.

769	      2.  A subset of the address pool is reserved for secondary server
770	          use.  In order to handle the failure case where both servers
771	          are able to communicate with DHCP clients, but unable to com-
772	          municate with each other, a subset of the IP address pool must
773	          be set aside as a private address pool for the secondary
774	          server.  The secondary can use these to service newly arrived
775	          DHCP clients during such a period.  The required size of this
776	          private pool is based only on the arrival rate of new DHCP
777	          clients and the length of expected downtime, and is not influ-
778	          enced in any way by the total number of DHCP clients supported
779	          by the server pair.

781	          The failover protocol can be used in a mode where both the
782	          primary and secondary servers can share the load between them
783	          when both are operating.  In this load balancing mode, the
784	          addresses allocated by the primary server to the secondary
785	          server are not unused, but are used instead to service the
786	          portion of the client base to which the secondary server is
787	          required to respond.  See section 5.3 for more information on
788	          load balancing.

790	      3.  The primary and secondary servers do not respond to client
791	          requests at all while recovering from a failure that could
792	          have resulted in duplicate IP assignments (that is, while
793	          synchronizing in POTENTIAL-CONFLICT state).
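
   Limitation 2 above states that the size of the secondary's private
   pool depends only on the arrival rate of new clients and the
   expected downtime.  One simple way to turn that into a number is
   sketched below; the product formula and the name private_pool_size
   are assumptions, not something this document specifies.

```python
import math


def private_pool_size(new_clients_per_hour: float,
                      expected_downtime_hours: float) -> int:
    """Estimate of addresses needed to serve new clients during an
    outage (assumed formula: arrival rate times downtime, rounded up).
    The total client population does not appear in the estimate."""
    if new_clients_per_hour < 0 or expected_downtime_hours < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(new_clients_per_hour * expected_downtime_hours)
```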

795	5.  Protocol Overview

797	   This section discusses the failover protocol at a relatively high
798	   level.  In the event that a description in this section
799	   conflicts (or appears to conflict due to the overview nature of this
800	   section) with information in later sections of this draft, the infor-
801	   mation in the later sections should be considered authoritative.

803	5.1.  Messages and States

805	   This protocol is centered around the message exchange used by one
806	   server to update the other server of binding database changes result-
807	   ing from DHCP client activity:

809	      o Communication of binding database changes

811	        The binding update (BNDUPD) message is used to send the binding
812	        database changes to the partner server, and the partner server
813	        responds with a binding acknowledgement (BNDACK) message when it
814	        has successfully committed those changes to its own stable
815	        storage.

817	   All of the other messages involve ancillary issues:

819	      o Management of available IP addresses

821	        The pool request (POOLREQ) message is used by the secondary
822	        server to request an allocation of IP addresses from the primary
823	        server.  The pool response (POOLRESP) message is used by the
824	        primary server to inform the secondary server how many IP
825	        addresses were allocated to the secondary server as the result
826	        of the pool request.

828	      o Synchronization of the binding databases between the servers
829	        after they've been out of communications

831	        The update request (UPDREQ) message is used by one server to
832	        request that its partner send it all binding database informa-
833	        tion that it has not already seen.  The update request all
834	        (UPDREQALL) message is used by one server to request that all
835	        binding database information be sent in order to recover from a
836	        total loss of its binding database by the requesting server.
837	        The update done (UPDDONE) message is used by the responding
838	        server to indicate that all requested updates have been sent by
839	        the responding server and acked by the requesting server.

841	      o Connection establishment

843	        The connect (CONNECT) message is used by the primary server to
844	        establish a high level connection with the other server, and to
845	        transmit several important configuration data items between the
846	        servers.  The connect acknowledgement message (CONNECTACK) is
847	        used by the secondary server to respond to a CONNECT message
848	        from the primary server.  The disconnect (DISCONNECT) message is
849	        used by either server when closing a connection.

851	      o Server synchronization

853	        The state change (STATE) message is used by either server to
854	        inform the other server of a change of failover state.

856	      o Connection integrity management

858	        The contact (CONTACT) message is used by either server to ensure
859	        that the other server continues to see the connection as opera-
860	        tional.  It MUST be transmitted periodically over every esta-
861	        blished connection if other message traffic is not flowing, and
862	        it MAY be sent at any time.
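
   The messages listed above can be summarized in a small table, as in
   the sketch below.  The message names, purposes, and sender roles are
   taken from the list; the Python names (Purpose, MESSAGES, may_send)
   are hypothetical, and no wire-format codes are implied.

```python
from enum import Enum


class Purpose(Enum):
    BINDING_CHANGES = "communication of binding database changes"
    POOL_MANAGEMENT = "management of available IP addresses"
    DATABASE_SYNC = "synchronization of the binding databases"
    CONNECTION_SETUP = "connection establishment"
    SERVER_SYNC = "server synchronization"
    CONNECTION_INTEGRITY = "connection integrity management"


# name -> (purpose, sender); sender is "primary", "secondary" or "either".
MESSAGES = {
    "BNDUPD": (Purpose.BINDING_CHANGES, "either"),
    "BNDACK": (Purpose.BINDING_CHANGES, "either"),
    "POOLREQ": (Purpose.POOL_MANAGEMENT, "secondary"),
    "POOLRESP": (Purpose.POOL_MANAGEMENT, "primary"),
    "UPDREQ": (Purpose.DATABASE_SYNC, "either"),
    "UPDREQALL": (Purpose.DATABASE_SYNC, "either"),
    "UPDDONE": (Purpose.DATABASE_SYNC, "either"),
    "CONNECT": (Purpose.CONNECTION_SETUP, "primary"),
    "CONNECTACK": (Purpose.CONNECTION_SETUP, "secondary"),
    "DISCONNECT": (Purpose.CONNECTION_SETUP, "either"),
    "STATE": (Purpose.SERVER_SYNC, "either"),
    "CONTACT": (Purpose.CONNECTION_INTEGRITY, "either"),
}


def may_send(role: str, message: str) -> bool:
    """True if a server in the given role sends this message."""
    _, sender = MESSAGES[message]
    return sender == "either" or sender == role
```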

864	5.1.1.  Failover endpoints

866	   The proper operation of the failover protocol requires more than the
867	   transmission of messages between one server and the other.  Each end-
868	   point might seem to be a single DHCP server, but in fact there are
869	   many situations where additional flexibility in configuration is use-
870	   ful.

872	   For instance, there might be several servers which are each primary
873	   for a distinct set of address pools, and one server which is secon-
874	   dary for all of those address pools.  The situation with the pri-
875	   maries is straightforward, but the secondary will need to maintain a
876	   separate failover state, partner state, and communications up/down
877	   status for each of the separate primary servers for which it is act-
878	   ing as a secondary.

880	   The failover protocol calls for there to be a unique failover end-
881	   point per partner per role (where role is primary or secondary).
882	   This failover endpoint can take actions and hold unique states.
883	   There are thus a maximum of two failover endpoints per partner (one
884	   for the partner as a primary and one for that same partner as a
885	   secondary).

887	   Thus, in the case where there are two primary servers A and B each
888	   backed up by a single common secondary server C, there is one fail-
889	   over endpoint on each of A and B, and two different failover end-
890	   points on C.  The two different failover endpoints on C each have
891	   unique states and independent TCP connections.

893	   This document frequently describes the behavior of the protocol in
894	   terms of primary and secondary servers, not primary and secondary
895	   failover endpoints.  However, it is important to remember that every
896	   'server' described in this document is in reality a failover endpoint
897	   that resides in a particular process, and that many failover end-
898	   points may reside in the same process.

900	   It is not the case that there is a unique failover endpoint for each
901	   subnet address pool that participates in a failover relationship.  On
902	   one server, there is one failover endpoint per partner per role,
903	   regardless of how many subnet address pools are managed by that com-
904	   bination of partner and role.  Conversely, on a particular server,
905	   any given subnet address pool will be associated with exactly one
906	   failover endpoint.

908	   When a connection is received from the partner, the unique failover
909	   endpoint to which the message is directed is determined solely by the
910	   IP address of the partner and the port to which the connection is
911	   directed by the partner.  See section 8.2.
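
   The endpoint selection rule above can be sketched as a lookup keyed
   on the partner's IP address and the local port to which the partner
   connected.  FailoverEndpoint and EndpointRegistry are hypothetical
   names, and the addresses and port numbers are arbitrary example
   values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FailoverEndpoint:
    partner_addr: str
    local_port: int
    role: str  # this server's role toward the partner
    pools: List[str] = field(default_factory=list)


class EndpointRegistry:
    """All failover endpoints residing in one server process."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, int], FailoverEndpoint] = {}

    def add(self, endpoint: FailoverEndpoint) -> None:
        key = (endpoint.partner_addr, endpoint.local_port)
        if key in self._by_key:
            raise ValueError("endpoint already defined for %r" % (key,))
        self._by_key[key] = endpoint

    def lookup(self, partner_addr: str, local_port: int) -> FailoverEndpoint:
        """Select the endpoint for an incoming connection."""
        return self._by_key[(partner_addr, local_port)]


# Server C acts as secondary for two primaries, A and B.
registry_c = EndpointRegistry()
registry_c.add(FailoverEndpoint("192.0.2.1", 5000, "secondary", ["net-a"]))
registry_c.add(FailoverEndpoint("192.0.2.2", 5000, "secondary", ["net-b"]))
```

   Several address pools may share one endpoint, but each pool belongs
   to exactly one endpoint on a given server.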

913	5.2.  Fundamental guarantees

915	   There are several fundamental restrictions this protocol places on what
916	   one server can do in the absence of knowledge of the other server.
917	   Operating within these restrictions allows certain guarantees to be
918	   made to the partner server, and these are key to the correct opera-
919	   tion of the protocol.

921	5.2.1.  Control of lease time

923	   The key problem with lazy update is that when a server fails after
924	   updating a client with a particular lease time and before updating
925	   its partner, the partner will believe that a lease has expired even
926	   though the client still retains a valid lease on that IP address.

928	   In order to handle this problem, a period of time known as the "Max-
929	   imum Client Lead Time" (MCLT) is defined and must be known to both
930	   the primary and secondary servers.  Proper use of this time interval
931	   places an upper bound on the difference allowed between the lease
932	   time provided to a DHCP client by a server and the lease time known
933	   by that server's partner.  However, the MCLT is typically much less
934	   than the lease time that a server has been configured to offer a
935	   client, and so some strategy must exist to allow a server to offer
936	   the configured lease time to a client.  During a lazy update the
937	   updating server typically updates its partner with a potential
938	   expiration time which is longer than the lease time previously given
939	   to the client and which is longer than the lease time that the server
940	   has been configured to give a client.  This allows that server to
941	   give a longer lease time to the client the next time the client
942	   renews its lease, since the time that it will give to the client will
943	   not exceed the MCLT beyond the potential expiration time acknowledged
944	   by its partner.

946	   The PARTNER-DOWN state exists so that a server can be sure that its
947	   partner is, indeed, down.  Correct operation while in that state
948	   requires (generally) that the server wait the MCLT after anything
949	   that happened prior to its transition into PARTNER-DOWN state (or,
950	   more accurately, when the other server went down if that is known).
951	   Thus, the server MUST wait the MCLT after the partner server went
952	   down before allocating any of the partner's addresses which were
953	   available for allocation.  In the event the partner was not in com-
954	   munication prior to going down, it might have allocated one or more
955	   of its FREE addresses to a DHCP client and been unable to inform the
956	   server entering PARTNER-DOWN prior to going down itself.  By waiting
957	   the MCLT after the time the partner went down, the server in
958	   PARTNER-DOWN state ensures that any clients which have a lease on one
959	   of the partner's FREE addresses will either time out or contact the
960	   server in PARTNER-DOWN by the time that period ends.

962	   In addition, once a server has made a transition to PARTNER-DOWN
963	   state, it MUST NOT reallocate an IP address from one client to
964	   another client until the longer of the following two times:

966	      o The MCLT after the time the partner server went down (see
967	        above).

969	      o An additional MCLT interval after the lease by the original
970	        client expires.  (Actually, until the maximum client lead time
971	        after what it believes to be the lease expiration time of the
972	        client.)
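
   The "longer of the following two times" rule above can be written
   as a single expression.  The function name is hypothetical, all
   times are absolute seconds, and exceptions for leases issued after
   entering PARTNER-DOWN are not modeled.

```python
def earliest_reallocation_time(partner_down_time: float,
                               believed_lease_expiry: float,
                               mclt: float) -> float:
    """Earliest time a server in PARTNER-DOWN may give an address that
    was leased to one client to a different client."""
    return max(partner_down_time + mclt, believed_lease_expiry + mclt)
```

   For example, with an MCLT of 3600 seconds, a partner that went down
   at time 1000, and a lease believed to expire at time 2000, the
   address may not be reallocated before time 5600.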

974	   Some optimizations exist for this restriction, in that it only
975	   applies to leases that were issued BEFORE entering PARTNER-DOWN. Once
976	   a server has entered PARTNER-DOWN and it leases out an address, it
977	   need not wait this time as long as it has not communicated with the
978	   partner since the lease was given out.

980	   The fundamental relationship on which much of the correctness of this
981	   protocol depends is that the lease expiration time known to a DHCP
982	   client MUST NOT be more than the maximum client lead time greater
983	   than the potential expiration time known to a server's partner.

985	   The remainder of this section makes the above fundamental relation-
986	   ship more explicit.

988	   This protocol requires a DHCP server to deal with several different
989	   lease intervals and places specific restrictions on their relation-
990	   ships. The purpose of these restrictions is to allow the other server
991	   in the pair to be able to make certain assumptions in the absence of
992	   an ability to communicate between servers.

994	   The different lease times are:

996	   o desired lease interval
997	     The desired lease interval is the lease interval that a DHCP server
998	     would like to give to a DHCP client in the absence of any restric-
999	     tions imposed by the Failover protocol.  Its determination is out-
1000	     side of the scope of this protocol. Typically this is the result of
1001	     external configuration of a DHCP server.

1003	   o actual lease interval

1005	     The actual lease interval is the lease interval that a DHCP server
1006	     gives out to a DHCP client in the dhcp-lease-time option of a
1007	     DHCPACK packet.  It may be shorter than the desired client lease
1008	     interval (as explained below).

1010	   o potential lease interval

1012	     The potential lease interval is the lease expiration interval the
1013	     local server sends to its partner in the potential-expiration-time
1014	     option of a BNDUPD message.

1016	   o acknowledged potential lease interval

1018	     The acknowledged potential lease interval is the potential lease
1019	     interval the partner server has most recently acknowledged in the
1020	     potential-expiration-time option of a BNDACK message.

1022	   The key restriction (and guarantee) that any server makes with
1023	   respect to lease intervals is that the actual client lease interval
1024	   never exceeds the acknowledged potential lease interval (if any) by
1025	   more than a fixed amount.  This fixed amount is called the "Maximum
1026	   Client Lead Time" (MCLT).

1028	   The MCLT MAY be configurable on the primary server, but for correct
1029	   server operation it MUST be the same and known to both the primary
1030	   and secondary servers.  The secondary server determines the MCLT from
1031	   the MCLT option sent from the primary server to the secondary server
1032	   in the CONNECT message.

1034	   A server MUST record in its stable storage both the actual lease
1035	   interval and the most recently acknowledged potential lease interval
1036	   for each IP address binding.  It is assumed that the desired client
1037	   lease interval can be determined through techniques outside of the
1038	   scope of this protocol.  See section 7.1.5 for more details concern-
1039	   ing the times that the server MUST record in its stable storage and
1040	   the way that they interact with the lease time that may be offered to
1041	   a DHCP client.

1043	   Again, the fundamental relationship among these times which MUST be
1044	   maintained is:

1046	       actual lease interval <=
1047	       ( acknowledged potential lease interval + MCLT )
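
   One way a server might apply this relationship when choosing a
   lease time is sketched below.  The function names are hypothetical,
   times are in seconds, and the boundary case (a lease of exactly the
   acknowledged potential interval plus MCLT) is treated as permitted,
   which is an interpretation of the "never exceeds ... by more than"
   wording above.

```python
from typing import Optional


def max_lease_interval(now: float,
                       acked_potential_expiry: Optional[float],
                       mclt: float) -> float:
    """Longest lease interval that may be given at time `now`.

    acked_potential_expiry is the absolute potential expiration time
    most recently acknowledged by the partner, or None if there is
    none for this binding."""
    if acked_potential_expiry is None:
        remaining = 0.0
    else:
        remaining = max(0.0, acked_potential_expiry - now)
    return remaining + mclt


def actual_lease_interval(now: float,
                          desired: float,
                          acked_potential_expiry: Optional[float],
                          mclt: float) -> float:
    """Desired lease interval, shortened if the partner has not yet
    acknowledged a late enough potential expiration time."""
    return min(desired, max_lease_interval(now, acked_potential_expiry, mclt))
```

   A client with no acknowledged binding therefore receives at most
   the MCLT, however long the desired lease interval is.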

1049	   Figure 5.2.1-1 illustrates an initial lease to a client using the
1050	   rules discussed in the example which follows it.  Note that this is
1051	   only one example -- as long as the fundamental relationship is
1052	   preserved, the actual times used could be quite different.

1054	              DHCP                 Primary             Secondary
1055	       time   Client               Server               Server

1057	                | (time in intervals) |  (absolute time)   |
1058	                |                     |                    |
1059	                | >-DHCPDISCOVER->    |                    |
1060	                |     <---DHCPOFFER-< |                    |
1061	                |  lease-time=MCLT    |                    |
1062	                |                     |                    |
1063	                | >-DHCPREQUEST->     |                    |
1064	                |   (selecting)       |                    |
1065	                |                     |                    |
1066	         t      |  <--------DHCPACK-< |                    |
1067	                |  lease-time=MCLT    |                    |
1068	                |                     |    >-BNDUPD-->     |
1069	                |                     |  lease-expiration=t+MCLT
1070	                |                     |  potential-expiration=t+(MCLT/2)+X
1071	                |                     |                    |
1072	                |                     |     <-BNDACK-<     |
1073	                |                     |  potential-expiration=t+(MCLT/2)+X
1074	               ...                   ...                  ...
1075	                |                     |                    |
1076	      t+MCLT/2  | >-DHCPREQUEST->     |                    |
1077	                |      (renew)        |                    |
1078	                |                     |                    |
1079	         t1     |  <--------DHCPACK-< |                    |
1080	                |   lease-time=X      |                    |
1081	                |                     |    >-BNDUPD-->     |
1082	                |                     |  lease-expiration=t1+X
1083	                |                     |  potential-expiration=t1+(X/2)+X
1084	                |                     |                    |
1085	                |                     |     <-BNDACK-<     |
1086	                |                     |  potential-expiration=t1+(X/2)+X
1087	               ...                   ...                  ...

1089	           Figure 5.2.1-1:  Lazy Update Message Traffic
1090	                          X = Desired Lease Interval
1091	                          Assumes renewal interval = lease interval / 2

1093	   DISCUSSION:

1095	      This protocol mandates only that the above fundamental relation-
1096	      ship concerning lease intervals is preserved.

1098	      In the interests of clarity, however, let's examine a specific
1099	      example.  The MCLT in this case is 1 hour.  The desired lease
1100	      interval is 3 days, and its renewal time is half the lease inter-
1101	      val.

1103	      The rules for this example are:

1105	      o What to tell the client:

1107	        Take the remainder of the acknowledged potential lease interval.
1108	        If this is a new lease, then this value will be zero.  If this
1109	        remainder plus the MCLT is greater than the desired lease
1110	        interval, give the client the desired lease interval; otherwise,
1111	        give the client the remainder plus the MCLT.

1113	      o What to tell the failover partner server:

1115	        Take the renewal interval (typically half of the actual client
1116	        lease interval), add to it the desired lease interval, and add
1117	        it to the current time to yield the value that goes into the
1118	        potential-expiration-time option.

1120	        Also tell the failover partner the actual lease interval by
1121	        adding it to the current time to yield the value that goes into
1122	        the lease-expiration option.

1124	      In operation this might work as follows:

1126	      When a server makes an offer for a new lease on an IP address to a
1127	      DHCP client, it determines the desired lease interval (in this
1128	      case, 3 days).  It then examines the acknowledged potential lease
1129	      interval (which in this case is zero) and determines the remainder
1130	      of the time left to run, which is also zero.  To this it adds the
1131	      MCLT.  Since the actual lease interval cannot be allowed to exceed
1132	      the remainder of the current acknowledged potential lease interval
1133	      plus the MCLT, the offer made to the client is for the remainder
1134	      of the current acknowledged potential lease interval (i.e., zero)
1135	      plus the MCLT.  Thus, the actual lease interval is 1 hour.

1137	      Once the server has performed the DHCPACK to the DHCP client, it
1138	      will update the secondary server with the lease information. How-
1139	      ever, the desired potential lease interval will be composed of one
1140	      half of the current actual lease interval added to the desired
1141	      lease interval. Thus, the secondary server is updated with a
1142	      BNDUPD with a lease interval of 3 days + 1/2 hour specified in the
1143	      potential-expiration-time option.

1145	      When the primary server receives a BNDACK to its update of the
1146	      secondary server's (partner's) potential lease interval, it
1147	      records that as the acknowledged potential lease interval.  A
1148	      server MUST NOT send a BNDACK in response to a BNDUPD message
1149	      until it is sure that the information in the BNDUPD message
1150	      resides in its stable storage.  Thus, the primary server in this
1151	      case can be sure that the secondary server has recorded the poten-
1152	      tial lease interval in its stable storage when the primary server
1153	      receives a BNDACK message from the secondary server.

1155	      When the DHCP client attempts to renew at T1 (approximately one
1156	      half an hour from the start of the lease), the primary server
1157	      again determines the desired lease interval, which is still 3
1158	      days.  It then compares this with the remaining acknowledged
1159	      potential lease interval (3 days + 1/2 hour) and adjusts for the
1160	      time passed since the secondary was last updated (1/2 hour).  Thus
1161	      the time remaining of the acknowledged potential lease interval is
1162	      3 days.  Adding the MCLT to this yields 3 days plus 1 hour, which
1163	      is more than the desired lease interval of 3 days.  So the client
1164	      is renewed for the desired lease interval -- 3 days.

1166	      When the primary DHCP server updates the secondary DHCP server
1167	      after the DHCP client's renewal ACK is complete, it will calculate
1168	      the desired potential lease interval as the T1 fraction of the
1169	      actual client lease interval (1/2 of 3 days this time = 1.5 days).
1170	      To this it will add the desired client lease interval of 3 days,
1171	      yielding a total desired partner server lease interval of 4.5
1172	      days.  In this way, the primary attempts to have the secondary
1173	      always "lead" the client in its understanding of the client's
1174	      lease interval so as to be able to always offer the client the
1175	      desired client lease interval.

1177	      Once the initial actual client lease interval of the MCLT is past,
1178	      the protocol operates effectively like the DHCP protocol does
1179	      today in its behavior concerning lease intervals. However, the
1180	      guarantee that the actual client lease interval will never exceed
1181	      the remaining acknowledged partner server lease interval by more
1182	      than the MCLT allows full recovery from a variety of failures.
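
   As an illustration only, the example rules above can be sketched in
   Python.  The function and parameter names are assumptions made for
   this sketch and are not defined by the protocol.  All times are in
   seconds, and the renewal interval is assumed to be half of the
   actual client lease interval, as in the example.

```python
def client_lease_interval(now, acked_potential_expiration, desired, mclt):
    """Lease interval to give the client ("what to tell the client")."""
    if acked_potential_expiration is None:
        # New lease: nothing has been acknowledged by the partner yet.
        remainder = 0
    else:
        # Remainder of the acknowledged potential lease interval.
        remainder = max(0, acked_potential_expiration - now)
    # Never exceed the remainder plus the MCLT, and never exceed the
    # desired lease interval.
    return min(desired, remainder + mclt)


def partner_update_times(now, lease_interval, desired):
    """Absolute times for the BNDUPD ("what to tell the partner")."""
    lease_expiration = now + lease_interval
    # Renewal interval (assumed: half the lease) plus the desired interval.
    potential_expiration = now + lease_interval // 2 + desired
    return lease_expiration, potential_expiration
```

   With an MCLT of 1 hour and a desired lease interval of 3 days, the
   first call yields a 1 hour lease and a potential expiration of
   3 days + 1/2 hour; a renewal half an hour later yields the full
   3 day lease and a potential expiration 4.5 days ahead.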

1184	5.2.2.  Controlled re-allocation of IP addresses

1186	   When in PARTNER-DOWN state there is a waiting period after which an
1187	   IP address can be re-allocated to another client.  For IP addresses
1188	   which are available when the server enters PARTNER-DOWN state, the
1189	   period is the MCLT from entry into PARTNER-DOWN state.  For IP
1190	   addresses which are not available when the server enters PARTNER-DOWN
1191	   state, the period is the MCLT after the IP address becomes available.
1192	   See section 9.4.2 for more details.
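
   A minimal sketch of the PARTNER-DOWN waiting period described above
   follows.  The function names are hypothetical, times are absolute
   seconds, and treating the boundary instant itself as permitted is an
   assumption of this sketch; section 9.4.2 remains the authoritative
   description.

```python
def earliest_reallocation_time(partner_down_entered_at, became_available_at, mclt):
    """Earliest time an address may be given to a different client.

    An address that was already available when PARTNER-DOWN was entered
    waits the MCLT from entry; an address that became available later
    waits the MCLT from the time it became available.
    """
    return max(partner_down_entered_at, became_available_at) + mclt


def may_reallocate(now, partner_down_entered_at, became_available_at, mclt):
    """True once the waiting period for this address has passed."""
    return now >= earliest_reallocation_time(
        partner_down_entered_at, became_available_at, mclt
    )
```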

1194	   In any other state, a server cannot reallocate an address from one
1195	   client to another without first notifying its partner (through a
1196	   BNDUPD message) and receiving acknowledgement (through a BNDACK mes-
1197	   sage) that its partner is aware that that first client is not using
1198	   the address.

1200	   This could be modeled in the following way.  Though this specific
1201	   implementation is in no way required, it may serve to better illus-
1202	   trate the concept.

1204	   An "available" IP address on a server may be allocated to any client.
1205	   An IP address which was leased to a client and which expired or was
1206	   released by that client would take on a new state, EXPIRED or
1207	   RELEASED respectively.  The partner server would then be notified
1208	   that this IP address was EXPIRED or RELEASED through a BNDUPD.  When
1209	   the sending server received the BNDACK for that IP address showing it
1210	   was FREE, it would move the IP address from EXPIRED or RELEASED to
1211	   FREE, and it would be available for allocation by the primary server
1212	   to any client.

1214	   A server MAY reallocate an IP address in the EXPIRED or RELEASED
1215	   state to the same client with no restrictions provided it has not
1216	   sent a BNDUPD message to its partner.  This situation would exist if
1217	   the lease expired or was released after the transition into PARTNER-
1218	   DOWN state, for instance.

1220	5.3.  Load balancing

1222	   In order to implement load balancing between a primary and secondary
1223	   server pair, each server must respond to DHCPDISCOVER requests from
1224	   some clients and not from other clients.  In order to do this suc-
1225	   cessfully, each server must be able to determine immediately upon
1226	   receipt of a DHCP client request whether it is to service this
1227	   request or to ignore it in order to allow the other server to service
1228	   the request.

1230	   In addition, it should be possible to configure the percentage of
1231	   clients which will be serviced by either the primary or secondary
1232	   server.  This configuration should be more or less continuous, from
1233	   all clients serviced by the primary through an even split with half
1234	   serviced by each, to all clients serviced by the secondary.

1236	   The technique chosen to support these goals is described in [RFC
1237	   3074].

1239	   A bitmap-style Hash Bucket Assignment (as described in [RFC 3074]) is
1240	   used to determine which DHCP clients can be processed.  There are two
1241	   potential HBA's in a failover server -- a server HBA and a failover
1242	   HBA.   The way that a server acquires a server HBA is outside of the
1243	   scope of the failover protocol, but both servers in a failover pair
1244	   MUST have the same server HBA. The failover HBA (which specifies the
1245	   clients that the secondary is supposed to process) is sent by the
1246	   primary server to the secondary server whenever a connection is esta-
1247	   blished, using the hash-bucket-assignment option defined in section
1248	   12.11.

1250	   When using the server HBA (if any) and the failover HBA (if any), to
1251	   decide whether to process a DHCP request, the server HBA always
1252	   applies in every failover state, and the failover HBA (which MUST be
1253	   a subset of the server HBA) is used by the secondary server to decide
1254	   which packets to process when in NORMAL state.
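
   The way the two HBAs combine on the secondary can be sketched as
   follows.  This is a hedged illustration: the names are hypothetical,
   the bit ordering within each byte of the 256-bit bitmap is an
   assumption of this sketch (consult [RFC 3074] for the actual
   encoding), and the bucket number is taken as an input rather than
   computed with the hash function that [RFC 3074] defines.

```python
def bucket_in_hba(hba, bucket):
    """True if the given hash bucket (0..255) is set in a 32-byte HBA.

    Assumption: bucket n is bit (n % 8) of byte (n // 8).
    """
    return bool(hba[bucket // 8] & (1 << (bucket % 8)))


def is_subset(failover_hba, server_hba):
    """True if every bucket set in failover_hba is also set in server_hba."""
    return all((f & ~s) == 0 for f, s in zip(failover_hba, server_hba))


def secondary_should_process(bucket, server_hba, failover_hba, state):
    """Decide whether the secondary processes a request in this bucket."""
    # The server HBA (if any) applies in every failover state.
    if server_hba is not None and not bucket_in_hba(server_hba, bucket):
        return False
    # The failover HBA (if any) is consulted only in NORMAL state.
    if state == "NORMAL" and failover_hba is not None:
        return bucket_in_hba(failover_hba, bucket)
    return True
```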

1256	5.4.  IP address allocations between servers

1258	   The failover protocol allows a DHCP server which implements it to
1259	   operate correctly in spite of the uncertainty over whether its
1260	   partner has failed or whether the communications link to its partner
1261	   has failed.  This is made possible in part by the existence of
1262	   separate address pools on each server for allocation to newly arrived
1263	   DHCP clients.

1265	   Thus, each server has its own pool of available IP addresses.  Note
1266	   that an IP address is not "owned" by a particular server throughout
1267	   its entire lifetime.  Only an IP address which is available is
1268	   "owned" by a particular server -- once it has been leased to a DHCP
1269	   client, it is not owned by either failover partner.  When it finally
1270	   becomes available again, it will be owned initially by the primary
1271	   server, and it may or may not be allocated to the secondary server by
1272	   the primary server.

1274	   So, the flow of IP address ownership is as follows: initially an IP
1275	   address is owned by the primary server.  It may be allocated to the
1276	   secondary server if it is available, and then it is owned by the
1277	   secondary server.  Either server can allocate available IP addresses
1278	   which it owns to DHCP clients, in which case it ceases to own them.
1279	   When the DHCP client releases the address or the lease on it expires,
1280	   it will again become available and will be owned by the primary.

1282	   An IP address will not become owned by the server which allocated it
1283	   initially when it is released or the lease expires because, in gen-
1284	   eral, that server will have had to replenish its pool of available
1285	   addresses well in advance of any likely lease expirations.  Thus,
1286	   having a particular IP address cycle back to the secondary might well
1287	   put the secondary more out of balance with respect to the primary
1288	   instead of enhancing the balance of available addresses between them.

1290	   These address pools are used when in COMMUNICATIONS-INTERRUPTED state
1291	   and while waiting for the MCLT expiration in PARTNER-DOWN state.  In
1292	   addition, when using load balancing, these pools are used when in
1293	   NORMAL state as well.

1295	   The allocation and maintenance of these address pools is an area of
1296	   some sensitivity, since the goal is to maintain a more or less con-
1297	   stant ratio of available addresses between the two servers.

1299	   The initial allocation when the servers first integrate is triggered
1300	   by the POOLREQ message from the secondary to the primary.  This is
1301	   followed by the POOLRESP message where the primary tells the secon-
1302	   dary how many IP addresses it allocated to the secondary.  Then, the
1303	   primary sends the allocated IP addresses to the secondary via BNDUPD
1304	   messages.  The POOLREQ/POOLRESP message is a trigger to the primary
1305	   to perform a scan of its database and to ensure that the secondary
1306	   has enough IP addresses (based on some configured ratio).

1308	   The actual IP addresses are sent to the secondary using the BNDUPD
1309	   message with a state of BACKUP, which indicates the IP address is now
1310	   available for allocation by the secondary.  Once the message is sent,
1311	   the primary MUST NOT use these addresses for allocation to DHCP
1312	   clients.

1314	   The POOLREQ/POOLRESP message exchange initiated by the secondary is
1315	   valid at any time, and the primary server SHOULD, whenever it
1316	   receives the POOLREQ message, scan its database of address pools and
1317	   determine if the secondary needs more IP addresses from any of the IP
1318	   address pools.

1320	   However, in order to support a reasonably dynamic balance of the IP
1321	   addresses between the failover partners, the primary server needs to
1322	   do additional work to ensure that the secondary server has as many IP
1323	   addresses as it needs (but that it doesn't have *more* than it needs
1324	   either).

1326	   The primary server SHOULD examine the balance of available addresses
1327	   between the primary and secondary for a particular address pool when-
1328	   ever the number of available addresses for either the primary or
1329	   secondary changes.  The primary server SHOULD adjust the available
1330	   address balance as required to ensure the configured address balance,
1331	   excepting that the primary server SHOULD apply some threshold
1332	   mechanism to such a balance adjustment in order to minimize the over-
1333	   head of maintaining this balance.

1335	   An example of a threshold approach is: do not attempt to re-balance
1336	   the available pools on the primary and secondary until the out of
1337	   balance value exceeds a configured value.
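
   One possible form of such a threshold check is sketched below.  The
   function name, the representation of the configured ratio as the
   secondary's share of all available addresses, and the sign
   convention of the result are assumptions of this sketch, not
   requirements of the protocol.

```python
def addresses_to_move(primary_free, secondary_free, secondary_share, threshold):
    """Number of available addresses the primary should move.

    A positive result means "send this many to the secondary" (BNDUPD
    with state BACKUP); a negative result means "try to take this many
    back" (BNDUPD with state FREE); zero means the imbalance is within
    the configured threshold and nothing is done.
    """
    total = primary_free + secondary_free
    target_secondary = round(total * secondary_share)
    imbalance = target_secondary - secondary_free
    if abs(imbalance) <= threshold:
        return 0
    return imbalance
```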

1339	   The primary server can, at any time, send an available IP address to
1340	   the secondary using a BNDUPD with the state BACKUP.  The primary
1341	   server can attempt to take an available IP address away from the
1342	   secondary by sending a BNDUPD with the state FREE.  If the secondary
1343	   accepts the BNDUPD, the address is then available to the primary and
1344	   not available to the secondary.  Of course, the secondary MUST reject
1345	   that BNDUPD if it has already used that IP address for a DHCP client.

1347	   Whenever the primary server examines the possible available IP
1348	   addresses which it could send to the secondary server, the primary
1349	   server SHOULD take into account whether load balancing is in use, and
1350	   it SHOULD attempt to send to the secondary any IP addresses whose
1351	   most recent client would be processed by the secondary under the
1352	   current load balancing regime in use.  Likewise, when removing avail-
1353	   able IP addresses from the secondary server when load balancing is in
1354	   use, the primary server SHOULD first remove those IP addresses whose
1355	   most recent client would be processed by the primary server under the
1356	   current load balancing regime in use.

1358	5.5.  Operating in NORMAL state

1360	   When in NORMAL state, each server services DHCPDISCOVER's and all
1361	   other DHCP requests other than DHCPREQUEST/RENEWAL or
1362	   DHCPREQUEST/REBINDING from the client set defined by the load balanc-
1363	   ing algorithm [RFC 3074].  Each server services DHCPREQUEST/RENEWAL
1364	   or DHCPREQUEST/REBINDING requests from any client.

1366	   In general, whenever the binding database is changed in stable
1367	   storage (other than a change resulting from receiving a BNDUPD from
1368	   the failover partner), then a BNDUPD message is sent with the con-
1369	   tents of that change to the partner server.  The partner server then
1370	   writes the information about that binding in its bindings database in
1371	   stable storage and replies with a BNDACK message.

1373	   The binding database in a DHCP server would normally be changed as a
1374	   result of DHCP protocol activity with a DHCP client  (e.g., granting
1375	   a lease to a DHCP client through the familiar
1376	   DISCOVER/OFFER/REQUEST/ACK cycle or extending a lease due to a
1377	   renewal from a DHCP client) or possibly (on some servers) because a
1378	   lease has expired or undergone another state change that must be
1379	   recorded in the DHCP binding database.  These are the state changes
1380	   that would be communicated to the partner server using a BNDUPD mes-
1381	   sage.  Of course, receipt of a BNDUPD message itself will normally
1382	   cause an update of the binding database for all of the IP addresses
1383	   contained in the BNDUPD, and a binding database change such as this
1384	   MUST NOT trigger a corresponding BNDUPD message to the partner.

1386	5.6.  Operating in COMMUNICATIONS-INTERRUPTED state

1388	   When operating in COMMUNICATIONS-INTERRUPTED state, each server is
1389	   operating independently, but does not assume that its partner is not
1390	   operating.  The partner server might be operating and simply unable
1391	   to communicate with this server, or might not be operating.

1393	   Each server responds to the full range of DHCP client messages that
1394	   it receives (subject to server load balancing [RFC 3074]), but in
1395	   such a way that graceful reintegration is always possible when its
1396	   partner comes back into contact with it.

1398	5.7.  Operating in PARTNER-DOWN state

1400	   When operating in PARTNER-DOWN state, a server assumes that its
1401	   partner is not currently operating, but does make allowances for the
1402	   possibility that that server was operating in the past, though possi-
1403	   bly out of communications with this server.  It responds to all DHCP
1404	   client requests in PARTNER-DOWN state (subject to server load balanc-
1405	   ing [RFC 3074]).

1407	5.8.  Operating in RECOVER state

1409	   A server operating in RECOVER state assumes that it is reintegrating
1410	   with a server that has been operating in PARTNER-DOWN state, and that
1411	   it needs to update its bindings database before it services DHCP
1412	   client requests.

1414	   A server may also operate in RECOVER state in order to fully recover
1415	   its bindings database from its partner server.

1417	5.9.  Operating in STARTUP state

1419	   A server operating in STARTUP state assumes that failover is opera-
1420	   tional, and it spends a short time whenever it comes up attempting to
1421	   contact the partner.  During this short time, the server is unrespon-
1422	   sive to DHCP client requests.  This period exists in order to give a
1423	   server a chance to determine that its partner has changed state since
1424	   it was last in communications, and to react to that changed state (if
1425	   any) prior to responding to DHCP client requests.

1427	   The startup period SHOULD be conditioned on the length of time the
1428	   server has been down (if that can be determined).  If the server has
1429	   been down less than the MCLT then it can wait only a few (say 5 or
1430	   10) seconds.  If it has been down a longer time (such that the
1431	   partner may well have moved to PARTNER-DOWN state), a considerably
1432	   longer startup period of 30 to 60 seconds may be warranted, since the
1433	   consequences of running while the partner is in PARTNER-DOWN state
1434	   are unpleasant.

1436	   The period of time a server remains in STARTUP state SHOULD be long
1437	   enough to ensure that it will connect to the other server if that
1438	   server is available for connections.

1440	5.10.  Time synchronization between servers

1442	   The failover protocol is designed to operate between two servers
1443	   which have time values which differ by an arbitrarily large amount.
1444	   A particular implementation MAY choose to only support servers whose
1445	   time values differ by an arbitrarily small amount.

1447	   In any event, whether large or only small differences in time values
1448	   are supported, every message that is received MUST be tagged with a
1449	   time value as soon as possible after receipt.  This time value is
1450	   used along with the time value that is sent in every message between
1451	   the failover partners to develop a delta time between the servers.
1452	   This delta time is used during the connection process to establish a
1453	   baseline delta time between the servers, and upon receipt of each
1454	   message, the delta time for that message is used to refine the delta
1455	   time for the server pair.

1457	   While the algorithm for this refinement of delta time is not speci-
1458	   fied as part of this protocol, a server SHOULD allow the delta time
1459	   value for a pair of failover servers to be periodically updated to
1460	   account for time drift.  In addition, the delta time value between
1461	   servers SHOULD be smoothed in some fashion, so that transient network
1462	   delays will not cause it to vary wildly.

1464	   A server SHOULD recognize a drastic change in the delta time value as
1465	   an event to be signaled to a network administrator, and SHOULD also
1466	   reset the time delta between the failover partners.

1468	   The specific definitions of a minor or drastic change in delta time
1469	   as well as the algorithm used to smooth minor changes into the run-
1470	   ning delta time are implementation issues and are not further
1471	   addressed in this document.
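
   Purely as an illustration of one way an implementation might smooth
   the delta time, the sketch below uses an exponentially weighted
   moving average and treats any sample that differs from the running
   value by more than a fixed threshold as a drastic change.  The class
   name, the smoothing factor, and the threshold are all assumptions of
   this sketch; this document does not specify any of them.

```python
class DeltaTracker:
    """Running estimate of the time delta between failover partners."""

    def __init__(self, alpha=0.125, drastic_threshold=60.0):
        self.alpha = alpha                          # smoothing factor
        self.drastic_threshold = drastic_threshold  # seconds
        self.delta = None                           # no baseline yet

    def update(self, sent_time, received_time):
        """Fold one message into the estimate.

        sent_time is the time value carried in the message; received_time
        is the local tag applied on receipt.  Returns True if the change
        was drastic (the estimate is reset and the caller should notify
        an administrator), otherwise False.
        """
        sample = float(received_time - sent_time)
        if self.delta is None:
            self.delta = sample  # baseline established at connection
            return False
        if abs(sample - self.delta) > self.drastic_threshold:
            self.delta = sample  # reset rather than smooth
            return True
        self.delta += self.alpha * (sample - self.delta)
        return False
```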

1473	5.11.  IP address binding-status

1475	   In most DHCP servers an IP address can take on several different
1476	   binding-status values, sometimes also called states.  While probably
1477	   no two DHCP servers have exactly the same possible binding-status
1478	   values, the DHCP RFC enforces some commonality among the general
1479	   semantics of the binding-status values used by various DHCP server
1480	   implementations.

1482	   In order to transmit binding database updates between one server and
1483	   another using the failover protocol, some common denominator
1484	   binding-status values must be defined.  It is not expected that these
1485	   binding-status values correspond with any actual implementation of
1486	   the DHCP protocol in a DHCP server, but rather that the binding-
1487	   status values defined in this document should be a common denominator
1488	   of those in use by many DHCP server implementations.  It is a goal of
1489	   this protocol that any DHCP server can map the various IP address
1490	   binding-status values that it uses internally into these failover IP
1491	   address binding-status values on transmission of binding database
1492	   updates to its partner, and likewise that it can map any failover IP
1493	   address binding-status values it received in a binding update into
1494	   its internal IP address binding-status values.

1496	   The IP address binding-status values defined for the failover proto-
1497	   col are listed below.  Unless otherwise noted below, there MAY be
1498	   client information associated with each of these binding-status
1499	   values.

1501	      o ACTIVE -- Lease is assigned to a client. Client identification
1502	        MUST appear.

1504	      o EXPIRED -- indicates that a client's binding on an IP address
1505	        has expired. When the partner server ACK's the BNDUPD of an
1506	        EXPIRED IP address, the server sets its internal state to FREE.
1507	        It is then available for allocation to any client of the primary
1508	        server.  It may be allocated to the same client on the server
1509	        where the lease expired if a BNDUPD containing the EXPIRED state
1510	        has not yet been sent to the partner (e.g., in the event that
1511	        the servers are not in communication).  Client identification
1512	        SHOULD appear.

1514	      o RELEASED -- indicates that a DHCP client sent in a DHCPRELEASE
1515	        message.  When the partner server ACK's the BNDUPD of a
1516	        RELEASED IP address, the server sets its internal state to FREE,
1517	        and it is available for allocation by the primary server to any
1518	        DHCP client.  It may be allocated to the same client if a BNDUPD
1519	        has not yet been sent to the partner.  Client identification
1520	        SHOULD appear.

1522	      o FREE -- is used when a DHCP server needs to communicate that an
1523	        IP address is unused by any DHCP client, but it was not just
1524	        released, expired, or reset by a network administrator.  When
1525	        the partner server ACK's the BNDUPD of a FREE IP address, the
1526	        server sets its internal state such that it is available for
1527	        allocation by the primary DHCP server to any DHCP client.  (Note
1528	        that in PARTNER-DOWN state, after waiting the MCLT, the IP
1529	        address MAY be allocated to a DHCP client by the secondary
1530	        server.)

1532	        Note that when an IP address that was allocated by the secondary
1533	        reverts to the FREE state, it must (like any other IP address)
1534	        be assigned to the secondary through the POOLREQ/BNDUPD process
1535	        before the secondary can reallocate it.

1537	        Client identification MAY appear.

1539	      o ABANDONED -- indicates that an IP address is considered unusable
1540	        by the DHCP subsystem.  An IP address for which a valid PING
1541	        response was received SHOULD be set to ABANDONED.  An IP address
1542	        for which a DHCPDECLINE was received should be set to ABANDONED.
1543	        Client identification MUST NOT appear.

1545	      o RESET -- indicates that this IP address was made available by
1546	        operator command.  This is a distinct state so that the reason
1547	        that the IP address became FREE can be determined.  Client iden-
1548	        tification MAY appear.

1550	      o BACKUP -- indicates that this IP address can be allocated by the
1551	        secondary server to a DHCP client at any time. When the MCLT has
1552	        passed after its time of entry into PARTNER-DOWN state, the IP
1553	        address may be allocated by the primary to any DHCP client.
1554	        Client identification MAY appear.

1556	   These binding-status values are communicated from one failover
1557	   partner to another using the binding-status option; see section 12.3
1558	   for details of this option.  Unless otherwise noted above there MAY
1559	   be client information associated with each of these binding-status
1560	   values.

1562	   An IP address will move between these binding-status values using the
1563	   following state transition diagram:

1565	                                        DHCP client DECLINE or
1566	                                        server detected problem
1567	                                        from any state
1568	                                                  |
1569	                                                  V
1570	                          +----------+         +--+------+
1571	         External   >---->|   RESET  |   (3)   |ABANDONED|
1572	         command          |          +<--------+         |
1573	                          +----------+         +---------+
1574	                               |
1575	                           Comm w/Partner(1)
1576	                               V
1577	     +---------+  Comm(1) +----------+   Comm(1) +---------+
1578	     | EXPIRED |--------->|  FREE    |<----------| RELEASED|
1579	     |         |w/Partner |          | w/Partner |         |
1580	     +---------+          +----------+           +---------+
1581	       ^     ^             |    |  +-----------+       ^
1582	       |     |             |    |              |       |
1583	       | Exp. grace     IP |  IP addr alloc.  IP addr  |
1584	       | period ends  address  to sec.(2)     reserved |
1585	       |     |        leased    V              |       |
1586	       |     |        by   |   +----------+    |       |
1587	       |     |        primary  |  BACKUP  |<---+       |
1588	       |   wait for        |   |          |            |
1589	       |  grace period     |   +----------+            |
1590	       |     |             |       |                   |
1591	       |     |             |    IP addr leased by      |
1592	       |  Expired grace    |       secondary           |
1593	       |  period exists    V       V                   |
1594	       |     |           +----------+                  |
1595	       |     | Lease on  |  ACTIVE  | DHCPRELEASE      |
1596	       +-----+-IP addr---|          |------------------+
1597	               expires   +----------+

1599	       Figure 5.11-1:  Transitions between binding-status values.

1601	       (1) This transition MAY also occur if the server is in
1602	       PARTNER-DOWN state and the MCLT has passed since the entry
1603	       in the RELEASED, EXPIRED, or RESET states.

1605	       (2) This transition MAY occur if the server is the secondary
1606	       and the MCLT has passed since its entry into PARTNER-DOWN state.

1608	       (3) This transition MAY occur due to an implementation specific
1609	       handling of ABANDONED IP addresses.

1611	   Again, note that a DHCP server implementing the failover protocol
1612	   does not have to implement either this state machine or use these
1613	   particular binding-status values in its normal operation of allocat-
1614	   ing IP addresses to DHCP clients.  It only needs to map its internal
1615	   binding-status values onto these "standard" binding-status values,
1616	   and map these "standard" binding-status values back into its internal
1617	   binding-status values.  For example, a server which implements a
1618	   grace period for an IP address binding SHOULD simply wait to update
1619	   its partner server until the grace period on that binding has run
1620	   out.

1622	   The process of setting an IP address to FREE deserves some detailed
1623	   discussion.  When an IP address is moved to the EXPIRED, RELEASED,
1624	   or RESET binding-status on a server, it will send a BNDUPD with the
1625	   binding-status of EXPIRED, RELEASED, or RESET to its partner.  If its
1626	   partner agrees that this is acceptable (see sections 7.1.2 and 7.1.3
1627	   concerning why a server might not accept a BNDUPD), it will return a
1628	   BNDACK with no reject-reason, signifying that it accepted the update.
1629	   As part of the BNDUPD processing, the server returning the BNDACK
1630	   will set the binding-status of the IP address to FREE, and upon
1631	   receipt of the BNDACK the server which sent the BNDUPD will set the
1632	   binding-status of the IP address to FREE.  Thus, the EXPIRED,
1633	   RELEASED, or RESET binding-status is something of a transitory state.
1634	   This process is encoded in the transition diagram above by "Comm
1635	   w/Partner".
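
	   The exchange described above can be sketched as follows.  This is
	   only an illustration under stated assumptions, not part of the
	   protocol definition: the Binding and BndAck classes, the function
	   names, and the reject-reason text are hypothetical, and a real
	   server would apply the acceptance rules of sections 7.1.2 and 7.1.3
	   instead of the "acceptable" flag used here.

```python
from dataclasses import dataclass
from typing import Optional

# Binding-status values that are transitory on the way to FREE.
TRANSITORY = {"EXPIRED", "RELEASED", "RESET"}


@dataclass
class Binding:
    ip: str
    status: str


@dataclass
class BndAck:
    ip: str
    reject_reason: Optional[str] = None  # None: the update was accepted


def handle_bndupd(receiver: Binding, status: str, acceptable: bool = True) -> BndAck:
    """Partner side: process a BNDUPD and return the BNDACK."""
    if not acceptable:
        # Hypothetical reason text; real reject-reasons are defined elsewhere.
        return BndAck(receiver.ip, reject_reason="rejected")
    # Accepting EXPIRED, RELEASED, or RESET sets the local binding to FREE.
    receiver.status = "FREE" if status in TRANSITORY else status
    return BndAck(receiver.ip)


def handle_bndack(sender: Binding, ack: BndAck) -> None:
    """Sender side: on an accepting BNDACK, finish the move to FREE."""
    if ack.reject_reason is None and sender.status in TRANSITORY:
        sender.status = "FREE"
```

	   If the BNDACK carries a reject-reason, the sender's binding simply
	   stays in its transitory binding-status in this sketch.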

1637	5.12.  DNS dynamic update considerations

1639	   DHCP servers (and clients) can use DNS Dynamic Updates as described
1640	   in [RFC 2136] to maintain DNS name-mappings as they maintain DHCP
1641	   leases.  Many different administrative models for DHCP-DNS integra-
1642	   tion are possible.  Descriptions of several of these models, and
1643	   guidelines that DHCP servers and clients should follow in carrying
1644	   them out, are laid out in [FQDN].  The nature of the DHCP failover
1645	   protocol introduces some issues concerning dynamic DNS updates that
1646	   are not part of non-failover DHCP environments.  This section
1647	   describes these issues, and defines the information which failover
1648	   partners should exchange and the protocol which they should follow in
1649	   order to ensure consistent behavior.  The presence of this section
1650	   should not be interpreted as requiring that implementations of the
1651	   DHCP failover protocol must also support DDNS updates.  The purpose
1652	   of this discussion is to clarify the areas where the DHCP failover
1653	   and DHCP-DDNS protocols intersect for the benefit of implementations
1654	   which support both protocols, not to introduce a new requirement into
1655	   the DHCP failover protocol.  Thus, a DHCP server which implements the
1656	   failover protocol MAY also support dynamic DNS updates, but if it
1657	   does support dynamic DNS updates it SHOULD utilize the techniques
1658	   described here in order to correctly distribute them between the
1659	   failover partners.  See [FQDN], [DNSRES], and [DHCID] for details of
1660	   how DHCP servers update DNS.

1662	   From the standpoint of the failover protocol, there is no reason why
1663	   a server which is utilizing the DDNS protocol to update a DNS server
1664	   should not be a partner with a server which is not utilizing the DDNS
1665	   protocol to update a DNS server.  However, a server which is not able
1666	   to support DDNS or is not configured to support DDNS SHOULD output a
1667	   warning message when it receives BNDUPD messages which indicate that
1668	   its failover partner is configured to support the DDNS protocol to
1669	   update a DNS server.  An implementation MAY consider this an error
1670	   and refuse to operate, or it MAY choose to operate anyway, having
1671	   warned the user of the problem in some way.

1673	5.12.1.  Relationship between failover and dynamic DNS update

1675	   The failover protocol describes the conditions under which each fail-
1676	   over server may renew a lease to its current DHCP client, and
1677	   describes the conditions under which it may grant a lease to a new
1678	   DHCP client.  An analogous set of conditions determines when a fail-
1679	   over server should initiate a DDNS update, and when it should attempt
1680	   to remove records from the DNS. The failover protocol's conditions
1681	   are based on the desired external behavior: avoiding duplicate
1682	   address assignments; allowing clients to continue using leases which
1683	   they obtained from one failover partner even if they can only commun-
1684	   icate with the other partner; allowing the backup DHCP server to
1685	   grant new leases even if it is unable to communicate with the primary
1686	   server.  The desired external DDNS behavior for DHCP failover servers
1687	   is:

1689	      1.  Allow timely DDNS updates from the server which grants a
1690	          client a lease. Recognize that there is often a DDNS update
1691	          lifecycle which parallels the DHCP lease lifecycle. This is
1692	          likely to include the addition of records when the lease is
1693	          granted, and the removal of DNS records when the lease is sub-
1694	          sequently made available for allocation to a different client.

1696	      2.  Communicate enough information between the two failover
1697	          servers to allow one to complete the DDNS update 'lifecycle'
1698	          even if the other server originally granted the lease.

1700	      3.  Avoid redundant or overlapping DDNS updates, where both fail-
1701	          over servers are attempting to perform DDNS updates for the
1702	          same lease-client binding. Avoid situations where one partner
1703	          is attempting to add RRs related to a lease binding while the
1704	          other partner is attempting to remove RRs related to the same
1705	          lease binding.

1707	5.12.2.  Use of the DDNS option

1709	   In order for either server to be able to complete a DDNS update, or
1710	   to remove DNS records which were added by its partner, both servers
1711	   need to know the FQDN associated with the lease-client binding. The
1712	   FQDN associated with the client's A RR and PTR RR SHOULD be communi-
1713	   cated from the server which adds records into the DNS to its partner.
1714	   The initiating server SHOULD use the DDNS option in the BNDUPD mes-
1715	   sages to inform the partner server of the status of any DDNS updates
1716	   associated with a lease binding. Failover servers MAY choose not to
1717	   include the DDNS option in BNDUPD messages if there has been no
1718	   change in the status of any DDNS update related to the lease binding.
1719	   The partner server receiving BNDUPD messages containing the DDNS
1720	   option SHOULD compare the status flags and the FQDN contained in the
1721	   option data with the current DDNS information it has associated with
1722	   the lease binding, and update its notion of the DDNS status accord-
1723	   ingly.

1725	   The initiating server MAY send a BNDUPD to its partner before the
1726	   DDNS update has been successfully completed. If it does so, it SHOULD
1727	   leave the 'C' bit in the Flags field clear, to indicate to the
1728	   partner that the DDNS update may not be complete. When the DDNS
1729	   update has been successfully acknowledged by the DNS server, the ini-
1730	   tiating DHCP server SHOULD include the DDNS option in its next BNDUPD
1731	   message about the binding, so that the partner server will be able to
1732	   record the final status of the DDNS update. The initiating server
1733	   SHOULD set the 'C' bit in the DDNS option if the DDNS update was suc-
1734	   cessfully accepted by the DNS server.

1736	   Some implementations will choose to send a BNDUPD without waiting for
1737	   the DDNS update to complete, and then will send a second BNDUPD once
1738	   the DDNS update is complete. Other implementations will delay sending
1739	   the partner a BNDUPD until the DDNS update has been acknowledged by
1740	   the DNS server, or until some time-limit has elapsed, in order to
1741	   avoid sending a second BNDUPD.

1743	   The Domain Name field in the DDNS option contains the FQDN that will
1744	   be associated with the A RR (if the server is performing an A RR
1745	   update for the client) and the PTR RR. This FQDN may be composed in
1746	   any of several ways, depending on server configuration and the infor-
1747	   mation provided by the client in its DHCP messages. The client may
1748	   supply a hostname which it would like the server to use in forming
1749	   the FQDN, or it may supply the entire FQDN. The server may be config-
1750	   ured to attempt to use the information the client supplies, it may be
1751	   configured with an FQDN to use for the client, or it may be
1752	   configured to synthesize an FQDN. The responsive server SHOULD
1753	   include the FQDN that it will be using in DDNS updates it initiates
1754	   when it sends the DDNS option.

1756	   Since the responsive server may not have completed the DDNS update at
1757	   the time it sends the first BNDUPD about the lease binding, there may
1758	   be cases where the FQDN in later BNDUPD messages does not match the
1759	   FQDN included in earlier messages.  For example, the responsive
1760	   server may be configured to handle situations where two or more DHCP
1761	   client FQDNs are identical by modifying the most-specific label in
1762	   the FQDNs of some of the clients in an attempt to generate unique
1763	   FQDNs for them (a process sometimes called "disambiguation").  Alter-
1764	   natively, at sites which use some or all of the information which
1765	   clients supply to form the FQDN, it's possible that a client's confi-
1766	   guration may be changed so that it begins to supply new data.  The
1767	   responsive server may react by removing the DNS records which it ori-
1768	   ginally added for the client, and replacing them with records that
1769	   refer to the client's new FQDN. In such cases, the responsive server
1770	   SHOULD include the actual FQDN that was used in subsequent DDNS
1771	   options.  The responsive server SHOULD include relevant client-option
1772	   data in the client-request-options option in its BNDUPD messages.
1773	   This information may be necessary in order to allow the non-
1774	   responsive partner to detect client configuration changes that change
1775	   the hostname or FQDN data which the client includes in its DHCP
1776	   requests.

1778	5.12.3.  Adding RRs to the DNS

1780	   A failover server which is going to perform DDNS updates SHOULD ini-
1781	   tiate the DDNS update when it grants a new lease to a client. The
1782	   non-responsive partner SHOULD NOT initiate a DDNS update when it
1783	   receives the BNDUPD after the lease has been granted. The failover
1784	   protocol ensures that only one of the partners will grant a lease to
1785	   any individual client, so it follows that this requirement will
1786	   prevent both partners from initiating updates simultaneously. The
1787	   server initiating the update SHOULD follow the protocol in [FQDN].
1788	   The server may be configured to perform an A RR update on behalf of
1789	   its clients, or not. Ordinarily, a failover server will not initiate
1790	   DDNS updates when it renews leases. In two cases, however, a failover
1791	   server MAY initiate a DDNS update when it renews a lease to its
1792	   existing client:

1794	      1.  When the lease was granted before the server was configured to
1795	          perform DDNS updates, the server MAY be configured to perform
1796	          updates when it next renews existing leases. Since both
1797	          servers are responsive to renewals in NORMAL state, it is not
1798	          enough to simply require the non-responsive server to avoid a
1799	          DNS update in this case.  The server which would be responsive
1800	          to a DHCPDISCOVER from this client (even though the current
1801	          request is a DHCPREQUEST/RENEW) is the server which should
1802	          initiate the DDNS update.

1804	      2.  If a server is in PARTNER-DOWN state, it can conclude that its
1805	          partner is no longer attempting to perform an update for the
1806	          existing client. If the remaining server has not recorded that
1807	          an update for the binding has been successfully completed, the
1808	          server MAY initiate a DDNS update.  It MAY initiate this
1809	          update immediately upon entry to PARTNER-DOWN state, it may
1810	          perform this in the background, or it MAY initiate this update
1811	          upon next hearing from the DHCP client.

1813	5.12.4.  Deleting RRs from the DNS

1815	   The failover server which makes an IP address FREE SHOULD initiate
1816	   any DDNS deletes, if it has recorded that DNS records were added on
1817	   behalf of the client.

1819	   A server not in PARTNER-DOWN state "makes an IP address FREE" when it
1820	   initiates a BNDUPD with a binding-status of FREE, EXPIRED, or
1821	   RELEASED.  Its partner confirms this status by acking that BNDUPD,
1822	   and upon receipt of the ACK the server has "made the IP address
1823	   FREE".  Conversely, a server in PARTNER-DOWN state "makes an IP
1824	   address FREE" when it sets the binding-status to FREE, since in
1825	   PARTNER-DOWN state no communication with the partner is required.

1827	   It is at this point that it should initiate the DDNS operations to
1828	   delete RRs from the DNS. Its partner SHOULD NOT initiate DDNS
1829	   deletes for DNS records related to the lease binding as part of send-
1830	   ing the BNDACK message.  The partner MAY have issued BNDUPD messages
1831	   with a binding-status of FREE, EXPIRED, or RELEASED previously, but
1832	   the other server will have NAKed these BNDUPD messages.

1834	   The failover protocol ensures that only one of the two partner
1835	   servers will be able to make a lease FREE. The server making the
1836	   lease FREE may be doing so while it is in NORMAL communication with
1837	   its partner, or it may be in PARTNER-DOWN state. If a server is in
1838	   PARTNER-DOWN state, it may be performing DDNS deletes for RRs which
1839	   its partner added originally. This allows a single remaining partner
1840	   server to assume responsibility for all of the DDNS activity which
1841	   the two servers were undertaking.

1843	   Another implication of this approach is that no DDNS RR deletes will
1844	   be performed while either server is in COMMUNICATIONS-INTERRUPTED
1845	   state, since no IP addresses are moved into the FREE state during
1846	   that period.
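
	   The rule for which server initiates DDNS deletes can be condensed
	   into a small decision function.  This is a sketch only; the function
	   and parameter names are hypothetical.  It assumes the caller has
	   already set (PARTNER-DOWN) or proposed via BNDUPD (otherwise) a
	   binding-status of FREE, EXPIRED, or RELEASED for the address.

```python
def has_made_address_free(server_state: str, bndupd_acked: bool) -> bool:
    """True once this server counts as having made the address FREE."""
    if server_state == "PARTNER-DOWN":
        # No communication with the partner is required in this state.
        return True
    # Otherwise the partner must have acked the BNDUPD.  In
    # COMMUNICATIONS-INTERRUPTED state no ack can arrive, so this stays False.
    return bndupd_acked


def should_initiate_ddns_delete(server_state: str, bndupd_acked: bool,
                                dns_records_added: bool) -> bool:
    """True if this server should start deleting RRs for the binding."""
    return dns_records_added and has_made_address_free(server_state, bndupd_acked)
```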

1848	5.13.  Reservations and failover

1850	   Some DHCP servers support a capability to offer specific pre-
1851	   configured IP addresses to DHCP clients.  These are real DHCP
1852	   clients that perform the entire DHCP protocol, but these servers
1853	   always offer the client a specific pre-configured IP address -- and
1854	   they offer that IP address to no other clients.  Such a capability
1855	   has several names, but it is sometimes called a "reservation", in
1856	   that the IP address is reserved for a particular DHCP client.

1858	   In a situation where there are two DHCP servers serving the same sub-
1859	   net without using failover, the two DHCP servers need to have dis-
1860	   joint IP address pools, but identical reservations for the DHCP
1861	   clients.

1863	   In a failover context, both servers need to be configured with the
1864	   proper reservations in an identical manner, but if we stop there
1865	   problems can occur around the edge conditions where reservations are
1866	   made for an IP address that has already been leased to a different
1867	   client.  Different servers handle this conflict in different ways,
1868	   but the goal of the failover protocol is to allow correct operation
1869	   with any server's approach to the normal processing of the DHCP pro-
1870	   tocol.

1872	   The general solution with regard to reservations is as follows.
1873	   Whenever a reserved IP address becomes FREE (i.e., when first config-
1874	   ured or whenever a client frees it or it expires or is reset), the
1875	   primary server MUST show that IP address as FREE (and thus available
1876	   for its own allocation) and it MUST send it to the secondary server
1877	   with the R bit set in the IP-flags option and the binding-status
1878	   BACKUP.

1880	   Note that this implies that a reserved IP address goes through the
1881	   normal state changes from FREE to ACTIVE (and possibly back to FREE).
1882	   The failover protocol supports this approach to reservations, i.e.,
1883	   where the IP address undergoes the normal state changes of any IP
1884	   address, but it can only be offered to the client for which it is
1885	   reserved.  Other approaches to the support of reservations exist in
1886	   some DHCP server implementations (e.g., where the IP address is
1887	   apparently leased to a particular client forever, without any expira-
1888	   tion).  The goal is for the failover protocol to support any of the
1889	   usual approaches to reservations, both those that allow an IP address
1890	   to go through different states when reserved, and those that don't.

1892	   From the above, it follows that a reservation solely on the secondary
1893	   will not necessarily allow the secondary to offer that address to the
1894	   client for which it is reserved.  The reservation must also appear on
1895	   the primary for the secondary to be able to offer the IP address to
1896	   the client for which it is reserved.

1898	   When the reservation on an IP address is cancelled, if the IP address
1899	   is currently FREE and the server is the primary, or BACKUP and the
1900	   server is the secondary, the server MUST send a BNDUPD to the other
1901	   server with the binding-status FREE and the R bit clear.
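
	   The two BNDUPD cases for reservations described in this section can
	   be sketched as follows.  The dictionary layout and function names
	   are hypothetical stand-ins for the binding-status option and the R
	   bit of the IP-flags option; this is an illustration, not a wire
	   format.

```python
from typing import Dict, Optional


def bndupd_for_reserved_address_freed() -> Dict[str, object]:
    """Primary: a reserved address became FREE locally.

    The primary keeps the address FREE in its own database and tells the
    secondary about it with binding-status BACKUP and the R bit set.
    """
    return {"binding-status": "BACKUP", "R": True}


def bndupd_for_cancelled_reservation(role: str, status: str) -> Optional[Dict[str, object]]:
    """Either server: the reservation on an address was cancelled.

    Returns the update to send, or None when the text above requires none.
    """
    if (role == "primary" and status == "FREE") or \
       (role == "secondary" and status == "BACKUP"):
        return {"binding-status": "FREE", "R": False}
    return None
```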

1903	5.14.  Dynamic BOOTP and failover

1905	   Some DHCP servers support a capability to offer IP addresses to BOOTP
1906	   clients without having a particular address previously allocated for
1907	   those clients.  This capability is often called something like
1908	   "dynamic BOOTP".  It is discussed briefly in RFC 1534 [RFC 1534].

1910	   This capability has a negative interaction with the fundamental ele-
1911	   ments of the failover protocol, in that an address handed out to a
1912	   BOOTP device has no term (or effectively no term, in that usually
1913	   they are considered leases for "forever").  There is no opportunity
1914	   to hand out a lease which is only the MCLT long when first hearing
1915	   from a BOOTP device, because they may only interact once with the
1916	   DHCP server and they have no notion of a lease expiration time.  Thus
1917	   the entire concept of the MCLT and waiting the MCLT after entering
1918	   PARTNER-DOWN state is defeated when dealing with BOOTP devices.

1920	   With some restrictions, however, dynamic BOOTP devices can be sup-
1921	   ported in a server on a subnet where failover is supported.  The only
1922	   restriction (and it is not small) is that on any portion of the sub-
1923	   net (in any address pool) where dynamic BOOTP devices can be allo-
1924	   cated IP addresses, a DHCP server MUST NOT ever use any of the IP
1925	   addresses which were previously available for allocation by its fail-
1926	   over partner.  Thus, the addresses allocated by the primary to the
1927	   secondary for allocation that might have been allocated to BOOTP dev-
1928	   ices MUST NOT ever be used by the primary server even if it is in
1929	   PARTNER-DOWN state and has waited the MCLT after entering that state.
1930	   Conversely, addresses available for allocation by the primary MUST
1931	   NOT be used by the secondary even if it is in PARTNER-DOWN state.
1932	   The reason for this is that one of those IP addresses could have been
1933	   allocated by the secondary server to a BOOTP device, and the primary
1934	   server would have no way of ever knowing that happened.

1936	   Whenever a server sends a BNDUPD message to its partner, if the
1937	   client associated with the IP address is a BOOTP client, then the
1938	   server MUST set the B bit in the IP-flags option.

1940	   There is a very slight possibility that a BOOTP client could get an
1941	   IP address on each server of a failover pair.  When these two servers
1942	   eventually attempt to resolve this conflict, they SHOULD agree to
1943	   disagree, since it is not possible to know which IP address the BOOTP
1944	   client will actually use -- indeed, it could use both.  Operator
1945	   intervention will, in general, be required to rectify this situation.
1946	   Fortunately, it is extremely unlikely to ever actually occur.

1948	5.15.  Guidelines for selecting MCLT

1950	   There is no one correct value for the MCLT.  There is an explicit
1951	   tradeoff between various factors in selecting an MCLT value.

1953	5.15.1.  Short MCLT

1955	   A short MCLT value will mean that after entering PARTNER-DOWN state,
1956	   a server will only have to wait a short time before it can start
1957	   allocating its partner's IP addresses to DHCP clients.  Furthermore,
1958	   it will only have to wait a short time after the expiration of a
1959	   lease on an IP address before it can reallocate that IP address to
1960	   another DHCP client.

1962	   However, the downside of a short MCLT value is that the initial lease
1963	   interval that will be offered to every new DHCP client will be short,
1964	   which will cause increased traffic as those clients will need to send
1965	   in their first renew after half of a short MCLT time.  In addition,
1966	   the lease extensions that a server in COMMUNICATIONS-INTERRUPTED
1967	   state can give will be only the MCLT after the server has been in
1968	   COMMUNICATIONS-INTERRUPTED for around the desired client lease
1969	   period.  If a server stays in COMMUNICATIONS-INTERRUPTED for that
1970	   long, then the leases it hands out will be short and that will
1971	   increase the load on that server, possibly causing difficulty.

1973	5.15.2.  Long MCLT

1975	   A long MCLT value will mean that the initial lease period will be
1976	   longer and the time that a server in COMMUNICATIONS-INTERRUPTED state
1977	   will be able to extend leases (after it has been in COMMUNICATIONS-
1978	   INTERRUPTED state for around the desired client lease period) will be
1979	   longer.

1981	   However, a server entering PARTNER-DOWN state will have to wait the
1982	   longer MCLT before being able to allocate its partner's IP addresses
1983	   to new DHCP clients.  This may mean that additional IP addresses are
1984	   required in order to cover this time period.  Further, the server in
1985	   PARTNER-DOWN will have to wait the longer MCLT from every lease
1986	   expiration before it can reallocate an IP address to a different DHCP
1987	   client.
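
	   The tradeoff in sections 5.15.1 and 5.15.2 can be made concrete with
	   a little arithmetic.  The helper below is a hypothetical sketch: it
	   assumes the initial lease offered to a new client equals the MCLT
	   and that the client first renews at half of that lease, as the text
	   above describes.

```python
def mclt_consequences(mclt_seconds: int) -> dict:
    """Return the intervals that follow directly from a chosen MCLT."""
    return {
        # Initial lease interval offered to every new DHCP client.
        "initial_lease": mclt_seconds,
        # Delay before a new client sends its first renew.
        "first_renew_after": mclt_seconds / 2,
        # Wait after entering PARTNER-DOWN, and after each lease
        # expiration, before an address can be given to another client.
        "partner_down_wait": mclt_seconds,
    }
```

	   For example, an MCLT of one hour means a first renew after 30
	   minutes and a one hour wait in PARTNER-DOWN state, while an MCLT of
	   ten minutes shortens the wait but brings the first renew forward to
	   five minutes.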

1989	5.16.  What is sent in response to an UPDREQ or UPDREQALL message?

1991	   In section 7.3, the UPDREQ message is defined, and it says that the
1992	   receiving server sends to the requesting server "all of the binding
1993	   database information that it has not already seen".  In section
1994	   7.4.2, the UPDREQALL message is defined, and it says that the receiv-
1995	   ing server sends to the requesting server "all binding database
1996	   information".

1998	   Both of these statements need further elaboration.

2000	   First, for the UPDREQ message, the information to be sent in BNDUPD
2001	   messages concerns "all of the binding database information it has not
2002	   already seen".  Since every BNDUPD is acked by the receiving server,
2003	   the sending server need only keep track of which IP addresses have
2004	   binding database changes not yet seen by the partner, and when they
2005	   are finally acked by the partner it can record that.  Thus, at any
2006	   time, it knows which IP addresses have unacked binding database
2007	   information.  This is less simple when, across reconfigurations of
2008	   the servers, an IP address can change the failover partner to which
2009	   it is associated.  In that case, it is important to reset the indica-
2010	   tion that the partner has seen this binding information.  See section
2011	   5.17, below, for a more complete discussion of this issue.

2013	   Second, in the event that a failover server's binding database infor-
2014	   mation is restored from a backup, it will be partially out of date.
2015	   In this case, its partner's indication of which binding database
2016	   information the restored server has seen will also be out of date.

2018	   The solution to this problem is for a server which is connecting with
2019	   its partner to check the partner's last communicated time, and if it
2020	   is very much ahead of its own last communicated time, go into
2021	   RECOVER state and transmit an UPDREQALL to allow it to refresh its
2022	   state.  See section 9.3.2, step 5.  If the partner's last communi-
2023	   cated time is very much behind its own record of when it last commun-
2024	   icated with the partner, then it SHOULD invalidate its information on
2025	   which binding database information the partner server knows, so that
2026	   it will send all of its relevant binding database information to the
2027	   partner.
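
	   The comparison of last communicated times can be sketched as below.
	   The text does not quantify "very much ahead" or "very much behind",
	   so the threshold parameter, the function name, and the returned
	   labels are assumptions made for illustration only.  Both times are
	   taken to be seconds on the same clock scale.

```python
def reconcile_on_connect(own_last_comm: int, partner_last_comm: int,
                         threshold: int) -> str:
    """Decide what a connecting server does about stale binding data."""
    if partner_last_comm - own_last_comm > threshold:
        # Our database looks restored from a backup: refresh it fully.
        return "RECOVER_AND_SEND_UPDREQALL"
    if own_last_comm - partner_last_comm > threshold:
        # The partner looks restored: forget what we believe it has seen,
        # so that all relevant binding information is sent again.
        return "INVALIDATE_PARTNER_KNOWLEDGE"
    return "NO_ACTION"
```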

2029	   Third, in the event that a server receives an UPDREQALL message,
2030	   what constitutes "all binding database information"?  At first glance
2031	   this would seem to be information on every configured IP address in
2032	   the server.  While this would be technically correct, it may impose
2033	   a serious and unacceptable performance penalty on servers which have
2034	   millions of configured IP addresses.  What can be done to lessen the
2035	   data that must be sent for an UPDREQALL?

2037	   When sending "all binding database information", if the sending
2038	   server sends only information concerning IP addresses which have been
2039	   at some time associated with clients, it will send enough information
2040	   to satisfy the needs of the failover protocol.  It need not send
2041	   information on any IP addresses that have never been used, since
2042	   presumably they will be initialized as available to the primary
2043	   server (i.e., FREE) on any server employing failover.

2045	5.17.  How do you determine that your partner is "up to date" for a
2046	specific binding?

2048	   Throughout this document, one server is assumed to know for each IP
2049	   address binding whether or not its partner is "up to date" for that
2050	   binding.  There are some subtle issues involved in recording this "up
2051	   to date" information about a specific binding.

2053	   In a steady state world, it would suffice to have a single bit in the
2054	   binding database to represent the information about whether the
2055	   partner was or was not up to date.

2057	   In a more complex environment a configuration change affecting a par-
2058	   ticular IP address may change the failover endpoint with which it is
2059	   associated, and if this should happen, any "up to date" bit which is
2060	   written into the bindings database will be accurate for only the pre-
2061	   vious failover endpoint, but not the current failover endpoint.  If
2062	   failover is disabled and then re-enabled (and the "up to date" bits,
2063	   if used, are not cleared) problems can also occur.

2065	   A server MUST be able to relate the "up to date" condition to a
2066	   particular failover endpoint and even a particular instantiation of
2067	   that failover endpoint.  The techniques to do this are implementation
2068	   dependent.

2070	   In addition, section 7.4 requires that a server be able to remember
2071	   that an UPDREQALL message has been received and to treat every UPDREQ
2072	   message as an UPDREQALL message until the first UPDDONE message is
2073	   sent.  One way to do this is to clear all of the "up to date" indica-
2074	   tions for an entire failover endpoint upon receipt of an UPDREQALL
2075	   message, thereby ensuring that every active binding will be sent to
2076	   the partner whether through the completion of this UPDREQALL or
2077	   through processing of a subsequent UPDREQ message.  This is actually
2078	   better than remembering that an UPDREQALL was received and turning
2079	   every UPDREQ into an UPDREQALL, since any information sent in an
2080	   incomplete UPDREQALL (or subsequent UPDREQ messages turned into "all"
2081	   messages) will be remembered and not re-sent.
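
	   Since the techniques are implementation dependent, the following is
	   just one possible sketch.  It keys the "up to date" indications by a
	   hypothetical (endpoint name, instantiation number) pair and clears
	   them for the whole endpoint when an UPDREQALL arrives.  The class
	   and method names are assumptions, not taken from this document.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (failover endpoint name, instantiation number); both are hypothetical.
EndpointKey = Tuple[str, int]


class UpToDateTracker:
    def __init__(self) -> None:
        self._acked: Dict[EndpointKey, Set[str]] = {}

    def mark_acked(self, endpoint: EndpointKey, ip: str) -> None:
        """The partner acked the latest BNDUPD for this address."""
        self._acked.setdefault(endpoint, set()).add(ip)

    def mark_changed(self, endpoint: EndpointKey, ip: str) -> None:
        """The binding changed locally; the partner is no longer current."""
        self._acked.get(endpoint, set()).discard(ip)

    def is_up_to_date(self, endpoint: EndpointKey, ip: str) -> bool:
        return ip in self._acked.get(endpoint, set())

    def on_updreqall(self, endpoint: EndpointKey) -> None:
        """Clear every indication for the endpoint on UPDREQALL."""
        self._acked.pop(endpoint, None)

    def pending(self, endpoint: EndpointKey, used_ips: Iterable[str]) -> List[str]:
        """Addresses whose binding information still has to be sent."""
        return sorted(ip for ip in used_ips if not self.is_up_to_date(endpoint, ip))
```

	   Because the key includes the instantiation, an indication recorded
	   for a previous endpoint or a previous enabling of failover is never
	   mistaken for one that applies to the current partner.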

2083	6.  Common Message Format

2085	   This section discusses the message format that all failover
2086	   messages have in common, including the message header format as well
2087	   as the common option format.  See section 12 for the definitions
2088	   of the specific options used in the failover protocol.

2090	6.1.  Message header format

2092	   The options contained in the payload data section of the failover
2093	   message all use a two byte option number and two byte length format.

2095	   All failover protocol messages are sent over the TCP connection
2096	   between failover endpoints and encoded using a message format
2097	   specific to the failover protocol.

2099	   There exists a common message format for all failover messages, which
2100	   utilizes the options in a way similar to the DHCP protocol.  For each
2101	   message type, some options are required and some are optional.  In
2102	   addition, when a message is received any options that are not under-
2103	   stood by the receiving server MUST be ignored.

2105	   All of the fields in the fixed portion of the message MUST be filled
2106	   with correct data in every message sent.

2108	   0                   1                   2                   3
2109	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2110	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2111	   |        message length (2)     | msg type (1)  |payload off (1)|
2112	   +---------------+---------------+---------------+---------------+
2113	   |                            time (4)                           |
2114	   +---------------------------------------------------------------+
2115	   |                            xid (4)                            |
2116	   +---------------------------------------------------------------+
2117	   |     0 or more additional header bytes  (variable)             |
2118	   +---------------------------------------------------------------+
2119	   |                    payload data  (variable)                   |
2120	   |                                                               |
2121	   |               formatted as DHCP-style options                 |
2122	   |           using a two byte option code and two byte length    |
2123	   |                  See section 6.2 for details.                 |
2124	   +---------------------------------------------------------------+

2126	   message length - 2 bytes, network byte order

2128	   This is the length of the message in bytes. It includes the two byte
2129	   message length itself.  The maximum length is 2048 bytes.  The
2130	   minimum length is 12.

2132	   msg type - 1 byte

2134	   The message type field is used to distinguish between messages.

2136	   The following message types are defined:

2138	   Value   Message Type
2139	   -----   ------------
2140	   0       reserved    not used
2141	   1       POOLREQ     request allocation of addresses
2142	   2       POOLRESP    respond with allocation count
2143	   3       BNDUPD      update partner with binding info
2144	   4       BNDACK      acknowledge receipt of binding update
2145	   5       CONNECT     establish connection with the secondary
2146	   6       CONNECTACK  respond to connection attempt by partner
2147	   7       UPDREQALL   request full transfer of binding info
2148	   8       UPDDONE     confirm req'd binding info was sent and acked
2149	   9       UPDREQ      request transfer of un-acked binding info
2150	   10      STATE       inform partner of current state or state change
2151	   11      CONTACT     probe communications integrity with partner
2152	   12      DISCONNECT  close a connection

2154	   New message types should be defined in one of two ranges, 0-127 or
2155	   128-255.  The range 0-127 is used for messages that MUST be
2156	   supported by every server; if a server receives a message in the
2157	   range 0-127 that it does not understand, it MUST close the TCP
2158	   connection.  The range 128-255 is used for messages which MAY be
2159	   supported but are not required; if a server receives a message in
2160	   this range that it does not understand, it SHOULD ignore the message.
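
The handling of unrecognized message types described above can be
sketched as follows.  This is a minimal illustration, not part of the
protocol definition; the names UnknownTypeAction, KNOWN_TYPES and
action_for_unknown_type are hypothetical.

```python
from enum import Enum


class UnknownTypeAction(Enum):
    CLOSE_CONNECTION = "close"  # range 0-127: MUST close the TCP connection
    IGNORE = "ignore"           # range 128-255: SHOULD ignore the message


# Message types taken from the table above (value -> name).
KNOWN_TYPES = {
    1: "POOLREQ", 2: "POOLRESP", 3: "BNDUPD", 4: "BNDACK",
    5: "CONNECT", 6: "CONNECTACK", 7: "UPDREQALL", 8: "UPDDONE",
    9: "UPDREQ", 10: "STATE", 11: "CONTACT", 12: "DISCONNECT",
}


def action_for_unknown_type(msg_type: int) -> UnknownTypeAction:
    """Decide what to do with a msg type that is not in KNOWN_TYPES."""
    if not 0 <= msg_type <= 255:
        raise ValueError("msg type is a single byte")
    if msg_type <= 127:
        return UnknownTypeAction.CLOSE_CONNECTION
    return UnknownTypeAction.IGNORE
```

A receiver would first look the value up in KNOWN_TYPES and call
action_for_unknown_type only when the lookup fails.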

2162	   payload offset - 1 byte

2164	   The byte offset of the Payload Data, from the beginning of the
2165	   failover message header. The value for the current protocol version
2166	   (version 1) is 8.

2168	   time - 4 bytes, network byte order

2170	   The absolute time in GMT when the message was transmitted,
2171	   represented as seconds elapsed since Jan 1, 1970 (i.e., similar to
2172	   the ANSI C time_t time value representation).  While the ANSI C
2173	   time_t value is signed, the value used in this specification is
2174	   unsigned.

2176	   A server SHOULD set this time as close to the actual transmission of
2177	   the message as possible.

2179	   xid - 4 bytes, network byte order

2181	   This is the transaction id of the failover message. The sender of a
2182	   failover protocol message is responsible for setting this number, and
2183	   the receiver of the message copies the number over into any response
2184	   message, treating it as opaque data. The sender MUST ensure that
2185	   every message sent from a particular failover endpoint over the
2186	   associated TCP connection has a unique transaction id.

2188	   For failover messages that have no corresponding response message,
2189	   the XID value is meaningless, but MUST be supplied. The XID value is
2190	   used solely by the receiver of a response message to determine the
2191	   corresponding request message.

2193	   Request messages where the XID is used in the corresponding response
2194	   messages are: POOLREQ, BNDUPD, CONNECT, UPDREQALL, and UPDREQ. The
2195	   corresponding response messages are POOLRESP, BNDACK, CONNECTACK,
2196	   UPDDONE, and UPDDONE, respectively.

2198	   As requests/responses don't survive connection reestablishment, XIDs
2199	   only need to be unique during a specific connection.
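
One possible way to satisfy the XID rules above is a per-connection
counter together with a table of outstanding requests.  The sketch
below is only an illustration; the class XidTracker and its methods
are hypothetical, and it assumes fewer than 2^32 messages are sent on
a single connection.

```python
import itertools

# Request -> response pairing, as listed in the text above.
RESPONSE_FOR = {
    "POOLREQ": "POOLRESP",
    "BNDUPD": "BNDACK",
    "CONNECT": "CONNECTACK",
    "UPDREQALL": "UPDDONE",
    "UPDREQ": "UPDDONE",
}


class XidTracker:
    """Allocates XIDs for one TCP connection (create a new one per connection)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._pending = {}  # xid -> name of the request awaiting a response

    def new_xid(self, msg_name: str) -> int:
        xid = next(self._counter) & 0xFFFFFFFF  # the xid field is 4 bytes
        if msg_name in RESPONSE_FOR:
            self._pending[xid] = msg_name
        return xid

    def match_response(self, xid: int, response_name: str):
        """Return the request name this response answers, or None."""
        request = self._pending.get(xid)
        if request is None or RESPONSE_FOR[request] != response_name:
            return None
        del self._pending[xid]
        return request
```

Messages without a response (for example STATE) still receive an XID,
but nothing is recorded for them.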

2201	   payload data - variable length

2203	   The options are placed after the header, after skipping payload
2204	   offset bytes from the beginning of the message.  The payload data
2205	   options are not preceded by a "cookie" value.

2207	   The payload data is formatted as DHCP style options using two byte
2208	   option codes and two byte option lengths.  The option codes are in a
2209	   namespace which is unique to the failover protocol.

2211	   The maximum length of the payload data in octets is 2048 less the
2212	   size of the header, i.e., the maximum message length is 2048 octets.
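
The fixed header fields shown in the diagram above can be decoded with
a few lines of code.  The following is a minimal sketch, assuming no
additional header bytes are present; the names HEADER and parse_header
are hypothetical.  The payload offset field is returned as received
and is not interpreted here.

```python
import struct

# message length (2), msg type (1), payload offset (1), time (4), xid (4),
# all in network byte order.
HEADER = struct.Struct("!HBBII")
MIN_MESSAGE_LENGTH = 12
MAX_MESSAGE_LENGTH = 2048


def parse_header(data: bytes) -> dict:
    """Decode the fixed part of a failover message header."""
    if len(data) < HEADER.size:
        raise ValueError("fewer bytes than the fixed header")
    length, msg_type, payload_offset, sent_time, xid = HEADER.unpack_from(data)
    if not MIN_MESSAGE_LENGTH <= length <= MAX_MESSAGE_LENGTH:
        raise ValueError("message length out of range: %d" % length)
    return {
        "length": length,
        "msg_type": msg_type,
        "payload_offset": payload_offset,
        "time": sent_time,
        "xid": xid,
    }
```

Because message length includes the two byte length field itself, a
reader on a TCP stream can read two bytes, then read length minus two
further bytes to obtain the complete message.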

2214	6.2.  Common option format

2216	   The options contained in the payload data section of the failover
2217	   message all use a two byte option number and two byte length format.

2219	   The option numbers are drawn from an option number space unique to
2220	   the failover protocol.  All of the message types share a common
2221	   option number space and common options definitions, though not all
2222	   options are required or meaningful for every message.

2224	   In contrast to the options which appear in DHCP client and server
2225	   messages, the options in failover messages are ordered.  That is, for
2226	   some messages the order in which the options appear in the payload
2227	   data area is significant.  The messages for which option ordering is
2228	   significant explicitly describe the ordering requirements.  If no
2229	   ordering requirements are mentioned, then the order is not signifi-
2230	   cant for that message.

2232	   All options which refer to time use an absolute time in GMT.  Time
2233	   synchronization has already been achieved between the source and
2234	   the target server using the CONNECT message, and is updated and
2235	   refined using the time in every packet.

2237	   The time value is an unsigned 32 bit integer in network byte order
2238	   giving the number of seconds since 00:00 UTC, 1st January 1970. This
2239	   can be converted to an NTP timestamp by adding decimal 2208988800.
2240	   This time format will not wrap until the year 2106.  Until sometime
2241	   in 2038, it is equal to the ANSI C time_t value (which is a signed 32
2242	   bit value and will overflow into a negative number in 2038).
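
The arithmetic in the preceding paragraph can be checked with a short
sketch.  The function names are hypothetical; the NTP value returned
is a plain count of seconds since 1 January 1900 and is not reduced to
a 32 bit NTP field.

```python
import struct
from datetime import datetime, timedelta, timezone

NTP_OFFSET = 2208988800  # seconds from 1 January 1900 to 1 January 1970
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_time(seconds: int) -> bytes:
    """Encode a time value as an unsigned 32 bit integer, network byte order."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("time does not fit in an unsigned 32 bit field")
    return struct.pack("!I", seconds)


def to_ntp_seconds(seconds: int) -> int:
    """Convert a failover time value to seconds since 1 January 1900."""
    return seconds + NTP_OFFSET


def as_utc(seconds: int) -> datetime:
    """Render a failover time value as a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)
```

Evaluating as_utc(0xFFFFFFFF) gives a date in 2106, and
as_utc(0x7FFFFFFF) gives a date in 2038, matching the wrap and
overflow years stated above.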

2244	   Options should appear only once in each message (except for BNDUPD
2245	   and BNDACK messages where batching is used; see section 6.3 for
2246	   details).  An option that appears twice is not concatenated, but
2247	   treated as an error.
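
A payload parser following the rules above might look like the sketch
below.  It is illustrative only: parse_options is a hypothetical name,
and the sketch assumes that the two byte option length counts only the
option data, as in ordinary DHCP options.  The result is a list rather
than a mapping so that option order is preserved.

```python
import struct
from typing import List, Tuple


def parse_options(payload: bytes, allow_repeats: bool = False) -> List[Tuple[int, bytes]]:
    """Split payload data into (option code, option data) pairs, in order.

    allow_repeats would be set for batched BNDUPD and BNDACK messages,
    where the same option may legitimately appear more than once.
    """
    options = []
    seen = set()
    pos = 0
    while pos < len(payload):
        if len(payload) - pos < 4:
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", payload, pos)
        pos += 4
        if len(payload) - pos < length:
            raise ValueError("truncated option data")
        if code in seen and not allow_repeats:
            raise ValueError("option %d appears more than once" % code)
        seen.add(code)
        options.append((code, payload[pos:pos + length]))
        pos += length
    return options
```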

2249	   Specific option values are described in section 12.

2251	   See section 13 for how to define additional options.

2253	6.3.  Batching multiple binding update transactions in one BNDUPD mes-
2254	sage

2256	   Implementations of this protocol MAY send multiple binding update
2257	   transactions in one BNDUPD message, where a binding update transac-
2258	   tion is defined as the set of options which are associated with the
2259	   update of a single IP address.  All implementations of this protocol
2260	   MUST be prepared to receive BNDUPD messages which contain multiple
2261	   binding update transactions and respond correctly to them, including
2262	   replying with a BNDACK message which contains status for the multiple
2263	   binding update transactions contained in the BNDUPD message.

2265	   In the discussion of sending and receiving BNDUPD messages in section
2266	   7.1 and BNDACK messages in section 7.2, each BNDUPD message and
2267	   BNDACK message is assumed to contain a single binding update transac-
2268	   tion in order to reduce the complexity of the discussions in section
2269	   7.

2271	   Multiple binding update transactions MAY be batched together in one
2272	   BNDUPD protocol message with the data sets for the individual tran-
2273	   sactions delimited by the assigned-IP-address option, which MUST
2274	   appear first in the option set for each transaction.  Ordering of
2275	   options between the assigned-IP-address options is not significant.
2276	   This is illustrated in the following schematic representation:

2278	       Non-IP Address/Non-client specific options first
2279	       assigned-IP-address option for the first IP address
2280	           Options pertaining to first address, including at least the
2281	           binding-status option and others as required.
2282	       assigned-IP-address option for the second IP address
2283	           Options pertaining to second address, including at least the
2284	           binding-status option and others as required.
2285	       ...
2286	       Trailing options (message digest).
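
The grouping shown in the schematic can be sketched as a single pass
over the parsed options.  This is an illustration under stated
assumptions: split_transactions is a hypothetical helper, the option
codes are supplied by the caller (the real values are defined in
section 12), and any option whose code is in trailing_codes is treated
as a trailing option wherever it appears.

```python
def split_transactions(options, assigned_ip_code, trailing_codes=frozenset()):
    """Group (code, data) pairs into leading options, per-address
    binding update transactions, and trailing options."""
    leading, transactions, trailing = [], [], []
    for code, data in options:
        if code in trailing_codes:
            trailing.append((code, data))
        elif code == assigned_ip_code:
            # assigned-IP-address always opens a new transaction.
            transactions.append([(code, data)])
        elif transactions:
            # Order inside a transaction is not significant.
            transactions[-1].append((code, data))
        else:
            leading.append((code, data))
    return leading, transactions, trailing
```

Each element of the returned transactions list begins with the
assigned-IP-address option for that address, followed by the options
that pertain to it.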

2288	   There MUST be a one-to-one correspondence between BNDUPD and BNDACK
2289	   messages, and every BNDACK message MUST contain status for all of the
2290	   binding update transactions in the corresponding BNDUPD message.

2292	   The BNDACK message corresponding to a BNDUPD message MUST contain
2293	   assigned-IP-address options for all of the binding update transac-
2294	   tions in the BNDUPD message.  Thus, every BNDACK message contains
2295	   exactly the same assigned-IP-address options as does its correspond-
2296	   ing BNDUPD message.  The order of the assigned-IP-address options
2297	   MAY, however, be different.  Here is a schematic representation of a
2298	   BNDACK:

2300	       Non-IP Address/Non-client specific options first
2301	       assigned-IP-address option for the first IP address
2302	           If rejected, reject-reason option and message option.
2303	       assigned-IP-address option for the second IP address
2304	           If rejected, reject-reason option and message option.
2305	       ...
2306	       Trailing options (message digest).

2308	   If the server chooses to reject some or all of the IP address
2309	   binding information in a BNDUPD message, the corresponding BNDACK
2310	   message MUST contain a reject-reason option following every failed
2311	   assigned-IP-address option in order to indicate that the binding
2312	   update transaction for that IP address was not accepted and why.  As
2313	   with a BNDACK message containing a single binding update transaction,
2314	   an assigned-IP-address option without any associated reject-reason
2315	   option indicates a successful binding update transaction.
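
The correspondence between a batched BNDUPD and its BNDACK can be
checked as in the following sketch.  The function bndack_results and
its argument shapes are hypothetical; addresses are represented as
strings purely for readability.

```python
def bndack_results(bndupd_addresses, ack_transactions):
    """Map each address in a BNDACK to its outcome.

    bndupd_addresses: addresses sent in the BNDUPD, in any order.
    ack_transactions: (address, reject_reason or None) pairs taken from
        the BNDACK; None means no reject-reason option was present.
    Returns {address: None if accepted, else the reject reason}.
    """
    acked = [address for address, _ in ack_transactions]
    # Same assigned-IP-address options are required, but order MAY differ.
    if sorted(acked) != sorted(bndupd_addresses):
        raise ValueError("BNDACK addresses do not match the BNDUPD")
    return {address: reason for address, reason in ack_transactions}
```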

2317	7.  Protocol Messages

2319	   This section contains the detailed definition of the protocol mes-
2320	   sages, including the information to include when sending the message,
2321	   as well as the actions to take upon receiving the message.  The mes-
2322	   sage type for each message appears as [n] in the heading for the mes-
2323	   sage (see section 6.1).

2325	7.1.  BNDUPD message [3]

2327	   The binding update (BNDUPD) message is used to send the binding data-
2328	   base changes (known as binding update transactions) to the partner
2329	   server, and the partner server responds with a binding acknowledge-
2330	   ment (BNDACK) message when it has successfully committed those
2331	   changes to its own stable storage.

2333	   The rest of the failover protocol exists to determine whether the
2334	   partner server is able to communicate or not, and to enable the
2335	   partners to exchange BNDUPD/BNDACK messages in order to keep their
2336	   binding databases in stable storage synchronized.

2338	   The rest of this section is written as though every BNDUPD message
2339	   contains only a single binding update transaction in order to reduce
2340	   the complexity of the discussion.  See section 6.3 for information on
2341	   how to create and process BNDUPD and BNDACK messages which contain
2342	   multiple binding update transactions.  Note that while a server MAY
2343	   generate BNDUPD messages with multiple binding update transactions,
2344	   every server MUST be able to process a BNDUPD message which contains
2345	   multiple binding update transactions and generate the corresponding
2346	   BNDACK messages with status for multiple binding update transactions.

2348	   The following table summarizes the various options for the BNDUPD
2349	   message.

2351	                                        binding-status            BACKUP
2352	                                                                  RESET
2353	                                                                  ABANDONED
2354	   Option                        ACTIVE     EXPIRED    RELEASED   FREE
2355	   ------                        ------     -------    --------   ----
2356	   assigned-IP-address (3)       MUST       MUST       MUST       MUST
2357	   IP-flags                      MUST(4)    MUST(4)    MUST(4)    MUST(4)
2358	   binding-status                MUST       MUST       MUST       MUST
2359	   client-identifier             MAY        MAY        MAY        MAY(2)
2360	   client-hardware-address       MUST       MUST       MUST       MAY(2)
2361	   lease-expiration-time         MUST       MUST NOT   MUST NOT   MUST NOT
2362	   potential-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
2363	   start-time-of-state           SHOULD     SHOULD     SHOULD     SHOULD
2364	   client-last-trans.-time       MUST       SHOULD     MUST       MAY
2365	   DDNS(1)                       SHOULD     SHOULD     SHOULD     SHOULD
2366	   client-request-options        SHOULD     SHOULD NOT SHOULD     SHOULD NOT
2367	   client-reply-options          SHOULD     SHOULD NOT SHOULD NOT SHOULD NOT

2369	   (1) MUST if server is performing dynamic DNS for this IP address, else
2370	       MUST NOT.
2371	   (2) MUST NOT if binding-status is ABANDONED.
2372	   (3) assigned-IP-address MUST be the first option for an IP address
2373	   (4) IP-flags option MUST appear if any flags are non-zero, else it
2374	       MAY appear.

2376	             Table 7.1-1: Options used in a BNDUPD message
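
The MUST and MUST NOT entries of Table 7.1-1 lend themselves to a
mechanical check.  The sketch below is one hypothetical way to express
them; check_bndupd_options is not defined by this protocol, option
names are used as plain strings, and the SHOULD, MAY and conditional
entries (IP-flags, DDNS) are deliberately left out.

```python
STATUSES = {"ACTIVE", "EXPIRED", "RELEASED", "FREE",
            "BACKUP", "RESET", "ABANDONED"}


def check_bndupd_options(status, present):
    """Return a list of Table 7.1-1 violations for one transaction.

    status: the binding-status value of the transaction.
    present: set of option names that appear in the transaction.
    """
    if status not in STATUSES:
        raise ValueError("unknown binding-status: %s" % status)
    problems = []

    def must(name):
        if name not in present:
            problems.append(name + " MUST appear")

    def must_not(name):
        if name in present:
            problems.append(name + " MUST NOT appear")

    must("assigned-IP-address")
    must("binding-status")
    if status == "ACTIVE":
        must("lease-expiration-time")
        must("potential-expiration-time")
    else:
        must_not("lease-expiration-time")
        must_not("potential-expiration-time")
    if status in ("ACTIVE", "EXPIRED", "RELEASED"):
        must("client-hardware-address")
    if status in ("ACTIVE", "RELEASED"):
        must("client-last-transaction-time")
    if status == "ABANDONED":  # note (2) of the table
        must_not("client-identifier")
        must_not("client-hardware-address")
    return problems
```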

2378	7.1.1.  Sending the BNDUPD message

2380	   A BNDUPD message SHOULD be generated whenever any binding changes.  A
2381	   change might be in the binding-status, the lease-expiration-time, or
2382	   even just the client-last-transaction-time.  In general, any time a
2383	   DHCP server writes to its stable storage, a BNDUPD message SHOULD be
2384	   generated.  This will often be the result of the processing of a DHCP
2385	   client request, but it might also be the result of a successful
2386	   dynamic DNS update operation.  Stable storage updates due to BNDUPD
2387	   or BNDACK messages SHOULD NOT result in additional BNDUPD messages.

2389	   BNDUPD (and BNDACK) messages refer to the binding-status of the IP
2390	   address, and this protocol defines a series of binding-statuses, dis-
2391	   cussed in more detail below.  Some servers may not support all of
2392	   these binding-statuses, and so in those cases they will not be sent.
2393	   Upon receipt of a BNDUPD message which contains an unsupported
2394	   binding-status, a reasonable interpretation should be made (see sec-
2395	   tion 5.10).

2397	   All BNDUPD messages MUST contain the IP address of the binding update
2398	   transaction in the assigned-IP-address option.

2400	   All binding update transactions MUST contain an IP-flags option if
2401	   the value of any of the flags would be non-zero.  The IP-flags option
2402	   MAY be omitted if all of the flags that it contains are zero.  The
2403	   IP-flags option contains a flag which indicates if the IP address is
2404	   currently reserved on the server sending the BNDUPD message.  It also
2405	   contains a flag which indicates that the lease is associated with a
2406	   client that used the BOOTP protocol (as opposed to the DHCP protocol)
2407	   to interact with the DHCP server.

2409	   All binding update transactions contain a binding-status option, and
2410	   it will have one of the values found in section 5.11.  Client infor-
2411	   mation consists of client-hardware-address and possibly a client-
2412	   identifier, and is explained in more detail later in this section.
2413	   The following table indicates whether client information should or
2414	   should not appear with each binding-status in a binding update tran-
2415	   saction:

2417	       binding-status       includes client information
2418	       ------------------------------------------------
2419	       ACTIVE                      MUST
2420	       EXPIRED                     SHOULD
2421	       RELEASED                    SHOULD
2422	       FREE                        MAY
2423	       ABANDONED                   MUST NOT
2424	       RESET                       MAY
2425	       BACKUP                      MAY

2427	         Table 7.1.1-1: Client information required by various
2428	         binding-status values.

2430	   The ACTIVE binding-status requires some options to indicate the
2431	   length of the binding:

2433	      o lease-expiration-time

2435	        The lease-expiration-time option MUST appear, and be set to the
2436	        expiration time most recently ACKed to the DHCP client.  Note
2437	        that the time ACKed to a DHCP client is a lease duration in
2438	        seconds, while the lease-expiration-time option in a BNDUPD mes-
2439	        sage is an absolute time value.

2441	      o potential-expiration-time
2442	        The potential-expiration-time option MUST appear, and be set to
2443	        a value beyond that of the lease-expiration time.  This is the
2444	        value that is ACKed by the BNDACK message.  A server sending a
2445	        BNDUPD message MUST be able to recover the potential-
2446	        expiration-time sent in every BNDUPD, not just those that
2447	        receive a corresponding BNDACK, in order to be able to protect
2448	        against possible duplicate allocation of IP addresses after
2449	        transitioning to PARTNER-DOWN state. See section 5.2.1 for
2450	        details as to why the potential-expiration-time exists and
2451	        guidelines for how to decide on the value.
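
The relationship between the two ACTIVE time options can be shown with
a small worked sketch.  The function and parameter names are
hypothetical, and the amount by which the potential-expiration-time
exceeds the lease-expiration-time (extra_seconds here) is left to the
guidelines in section 5.2.1.

```python
def active_expiration_times(ack_time: int, lease_seconds: int, extra_seconds: int):
    """Compute the absolute times carried in a BNDUPD for an ACTIVE binding.

    ack_time: absolute time (seconds since 1970) the lease was ACKed.
    lease_seconds: lease duration ACKed to the DHCP client.
    extra_seconds: how far beyond the lease expiration the
        potential-expiration-time is set; must be positive.
    """
    if extra_seconds <= 0:
        raise ValueError("potential-expiration-time must be beyond "
                         "lease-expiration-time")
    # The client sees a duration; the BNDUPD carries an absolute time.
    lease_expiration_time = ack_time + lease_seconds
    potential_expiration_time = lease_expiration_time + extra_seconds
    return lease_expiration_time, potential_expiration_time
```

For instance, a one hour lease ACKed at time 1000 with ten minutes of
extra time yields a lease-expiration-time of 4600 and a
potential-expiration-time of 5200.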

2453	   The following option information applies to all BNDUPD messages,
2454	   regardless of the value of the binding-status, unless otherwise
2455	   noted.

2457	   o Identifying the client

2459	     For many of the binding-status values a client MUST appear while
2460	     for others a client MAY appear, and for some a client MUST NOT
2461	     appear.

2463	     A client is identified in a BNDUPD message by at least one and pos-
2464	     sibly two options.   The client-hardware-address option MUST appear
2465	     any time that a client appears in a BNDUPD message, and contains
2466	     the hardware type and chaddr information from the DHCP request
2467	     packet.  A failover client-identifier option MUST appear any time
2468	     that a client appears in a BNDUPD message if and only if that
2469	     client used a DHCP client-identifier option when communicating with
2470	     the DHCP server.  See sections 12.5 and 12.4 for details of how to
2471	     construct these two options from a DHCP request packet.

2473	   o start-time-of-state

2475	     The start-time-of-state SHOULD appear.  It is set to the time at
2476	     which this IP address first took on the state that corresponds to
2477	     the current value of binding-status.

2479	   o client-last-transaction-time

2481	     The client-last-transaction-time value SHOULD appear.  This is the
2482	     time at which this DHCP server last received a packet from the DHCP
2483	     client referenced by the client-identifier or client-hardware-
2484	     address that was associated with the IP address referenced by the
2485	     assigned-IP-address.

2487	   o DDNS

2489	     If the DHCP server is performing dynamic DNS operations on behalf
2490	     of the DHCP client represented by the client-identifier or client-
2491	     hardware-address, then it should include a DDNS option containing
2492	     the domain name and status of any dynamic DNS operations enabled.

2494	   o client-request-options

2496	     If the BNDUPD was triggered by a request from a DHCP client (typi-
2497	     cally those with binding-status of ACTIVE and RELEASED), then the
2498	     server SHOULD include options of interest to a failover partner
2499	     from the client's request packet in the client-request-options for
2500	     transmission to its partner (see section 12.8).

2502	     A server sending a BNDUPD SHOULD remember the "interesting" options
2503	     or the information that would appear in an "interesting" option for
2504	     transmission at a time when the BNDUPD is not closely associated
2505	     with a DHCP client request.

2507	     A server SHOULD send the following "interesting" options.  It MAY
2508	     send any DHCP client options.  As new options are defined, the RFC
2509	     defining these options SHOULD include information that they are
2510	     "interesting to failover servers" if they should be sent as part of
2511	     a BNDUPD.

2513	         option          option
2514	         number          name
2515	         -----------------------------------------

2517	         12              host-name
2518	         81              client-FQDN [FQDN]
2519	         82              relay-agent-information [RFC 3046]
2520	         77              user-class [RFC 3004]
2521	         60              vendor-class-identifier
2522	         118             subnet-selection [RFC 3011]

2524	           Table 7.1.1-2: Options which SHOULD be sent in
2525	           the client-request-options option in a BNDUPD message.

2527	   o client-reply-options

2529	     If the BNDUPD was triggered by a request from a DHCP client (typi-
2530	     cally those with binding-status of ACTIVE and RELEASED), then the
2531	     server SHOULD include options of interest to a failover partner
2532	     from the server's DHCP reply packet in the client-reply-options for
2533	     transmission to its partner (see section 12.7).

2535	     A server sending a BNDUPD SHOULD remember the "interesting" options
2536	     or the information that would appear in an "interesting" option for
2537	     transmission at a time when the BNDUPD is not closely associated
2538	     with a DHCP client request.

2540	     A server SHOULD send the following "interesting" options.  It MAY
2541	     send any DHCP client options.  As new options are defined, the RFC
2542	     defining these options SHOULD include information that they are
2543	     "interesting to failover servers" if they should be sent as part of
2544	     a BNDUPD.

2546	         option          option
2547	         number          name
2548	         -----------------------------------------

2550	         58              renewal-time
2551	         59              rebinding-time

2553	           Table 7.1.1-3: Options which SHOULD be sent in
2554	           the client-reply-options option in a BNDUPD message.

2556	   The BNDUPD message SHOULD be sent as soon as possible after the DHCP
2557	   client has received a response and the lease bindings database has
2558	   been written to stable storage.

2560	7.1.2.  Receiving the BNDUPD message

2562	   When a server receives a BNDUPD message, it needs to decide how to
2563	   process the binding update transaction it contains and whether that
2564	   transaction represents a conflict of any sort. The conflict resolu-
2565	   tion process MUST be used on the receipt of every BNDUPD message, not
2566	   just those that are received while in POTENTIAL-CONFLICT state, in
2567	   order to increase the robustness of the protocol.

2569	   There are three sorts of conflicts:

2571	      o Two clients, one IP address conflict

2573	        This is the duplicate IP address allocation conflict. There are
2574	        two different clients each allocated the same address.  See sec-
2575	        tion 7.1.3 for how to resolve this conflict.

2577	      o Two IP addresses, one client conflict

2579	        This conflict exists when a client on one server is associated
2580	        with one IP address, and on the other server with a different
2581	        IP address in the same or a related subnet. This does not refer
2582	        to the case where a single client has addresses in multiple
2583	        different subnets or administrative domains, but rather the case
2584	        where on the same subnet the client has a lease on one IP
2585	        address on one server and on a different IP address on the other
2586	        server.

2588	        This conflict may or may not be a problem for a given DHCP
2589	        server implementation.  In the event that a DHCP server requires
2590	        that a DHCP client have only one outstanding lease for an IP
2591	        address on one subnet, this conflict should be resolved by
2592	        accepting the lease information which has the latest client-
2593	        last-transaction-time.

2595	      o binding-status conflict

2597	        This is the normal conflict, where one server is updating the
2598	        other with newer information.  See section 7.1.3 for details of
2599	        how to resolve these conflicts.

2601	7.1.3.  Deciding whether to accept the binding update transaction in a
2602	BNDUPD message

2604	   When analyzing a BNDUPD message from a partner server, if there is
2605	   insufficient information in the BNDUPD to process it, then reject the
2606	   BNDUPD with reject-reason 3: "Missing binding information".

2608	   If the IP address in the BNDUPD is not an IP address associated with
2609	   the failover endpoint which received the BNDUPD message, then reject
2610	   it with reject-reason 1: "Illegal IP address (not part of any address
2611	   pool)".

2613	   IP addresses undergo binding status changes for several reasons,
2614	   including receipt and processing of DHCP client requests, administra-
2615	   tive inputs and receipt of BNDUPD messages.  Every DHCP server needs
2616	   to respond to DHCP client requests and administrative inputs with
2617	   changes to its internal record of the binding-status of an IP
2618	   address, and this response is not in the scope of the failover proto-
2619	   col.  However, the receipt of BNDUPD messages implies at least a pos-
2620	   sible change of the binding-status for an IP address, and must be
2621	   discussed here.  See section 7.1.2 for general actions to take upon
2622	   receipt of a BNDUPD message.

2624	   When receiving a BNDUPD message, it is important to note that it may
2625	   not be current, in that the server receiving the BNDUPD message may
2626	   have had a more recent interaction with the DHCP client than its
2627	   partner who sent the BNDUPD message.  In this case, the receiving
2628	   server MUST reject the BNDUPD message. The reject reason SHOULD be
2629	   15: "Outdated binding information".  In addition, it is worth noting
2630	   that two (and possibly three) binding-status values are the direct
2631	   result of interaction with a DHCP client, ACTIVE and RELEASED (and
2632	   possibly ABANDONED).  All other binding-status values are either the
2633	   result of the expiration of a time period or interaction with an
2634	   external agency (e.g., a network administrator).

2636	   Every BNDUPD message SHOULD contain a client-last-transaction-time
2637	   option, which MUST, if it appears, be the time that the server last
2638	   interacted with the DHCP client.  It MUST NOT be, for instance, the
2639	   time that the lease on an IP address expired.  If there has been no
2640	   interaction with the DHCP client in question (or there is no DHCP
2641	   client presently associated with this IP address), then there will be
2642	   no client-last-transaction-time option in the BNDUPD message.

2644	   The list in Figure 7.1.3-1 is indexed by the binding-status that a
2645	   server receives in a BNDUPD message.  In many cases, the binding-
2646	   status of an IP address within the receiving server's data storage
2647	   will have an effect upon the checks performed prior to accepting the
2648	   new binding-status in a BNDUPD message.

2650	   In Figure 7.1.3-1, to "accept" a BNDUPD means to update the server's
2651	   bindings database with the information contained in the BNDUPD and
2652	   once that update is complete, send a BNDACK message corresponding to
2653	   the BNDUPD message.  To "reject" a BNDUPD means to respond to the
2654	   BNDUPD with a BNDACK with a reject-reason option included.

2656	   When interpreting the information in the following table (Figure
2657	   7.1.3-1), for those rules that are listed with "time" -- if a BNDUPD
2658	   doesn't have a client-last-transaction-time value, then it MUST NOT
2659	   be considered later than the client-last-transaction-time in the
2660	   receiving server's binding.   If the BNDUPD contains a client-last-
2661	   transaction-time value and the receiving server's binding does not,
2662	   then the client-last-transaction-time value in the BNDUPD MUST be
2663	   considered later than the server's.

2665	                        binding-status in received BNDUPD
2666	       binding-status
2667	       in receiving                                  FREE       RESET
2668	       server          ACTIVE   EXPIRED   RELEASED   BACKUP   ABANDONED

2670	       ACTIVE          accept(5) time(2)   time(1)    time(2)   accept
2671	       EXPIRED         time(1)   accept    accept     accept    accept
2672	       RELEASED        time(1)   time(1)   accept     accept    accept
2673	       FREE/BACKUP     accept    accept    accept     accept    accept
2674	       RESET           time(3)   accept    accept     accept    accept
2675	       ABANDONED       reject(4) reject(4) reject(4)  reject(4) accept

2677	       time(1): If the client-last-transaction-time in the BNDUPD
2678	       is later than the client-last-transaction-time in the
2679	       receiving server's binding, accept it, else reject it.

2681	       time(2): If the current time is later than the receiving
2682	       server's lease-expiration-time, accept it, else reject it.

2684	       time(3): If the client-last-transaction-time in the BNDUPD
2685	       is later than the start-time-of-state in the receiving server's
2686	       binding, accept it, else reject it.

2688	       (1,2,3): If rejecting, use reject reason 15: "Outdated binding
2689	       information".

2691	       (4): Use reject reason 16: "Less critical binding information".

2693	       (5): If the clients in a BNDUPD message and in a receiving
2694	       server's binding differ, then if the receiving server is a
2695	       secondary accept it, else reject it with a reject reason of 2:
2696	       "Fatal conflict exists: address in use by other client".

2698	                Figure 7.1.3-1:  Accepting BNDUPD messages
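
Figure 7.1.3-1 and its notes can be expressed as a lookup table plus a
few comparisons.  The sketch below is illustrative rather than
normative; RULES, is_later and decide are hypothetical names.  The
function returns None to mean "accept" and a reject reason number
otherwise.  Applying the missing-value rule to start-time-of-state in
time(3) is an assumption of this sketch.

```python
from typing import Optional

# Column for the binding-status carried in the received BNDUPD.
COLUMN = {"ACTIVE": 0, "EXPIRED": 1, "RELEASED": 2,
          "FREE": 3, "BACKUP": 3, "RESET": 4, "ABANDONED": 4}

ACCEPT_ALL = ("accept",) * 5
# Row for the binding-status held by the receiving server.
RULES = {
    "ACTIVE":    ("accept5", "time2", "time1", "time2", "accept"),
    "EXPIRED":   ("time1", "accept", "accept", "accept", "accept"),
    "RELEASED":  ("time1", "time1", "accept", "accept", "accept"),
    "FREE":      ACCEPT_ALL,
    "BACKUP":    ACCEPT_ALL,
    "RESET":     ("time3", "accept", "accept", "accept", "accept"),
    "ABANDONED": ("reject4",) * 4 + ("accept",),
}


def is_later(update_time, stored_time) -> bool:
    """A missing time in the BNDUPD is never later; a present one is
    later than a missing stored time."""
    if update_time is None:
        return False
    if stored_time is None:
        return True
    return update_time > stored_time


def decide(local_status, received_status, *, now=0,
           update_cltt=None, local_cltt=None,
           local_lease_expiration=0, local_start_time_of_state=None,
           same_client=True, receiver_is_secondary=False) -> Optional[int]:
    rule = RULES[local_status][COLUMN[received_status]]
    if rule == "accept":
        return None
    if rule == "accept5":   # note (5): differing clients
        return None if same_client or receiver_is_secondary else 2
    if rule == "time1":
        return None if is_later(update_cltt, local_cltt) else 15
    if rule == "time2":
        return None if now > local_lease_expiration else 15
    if rule == "time3":
        return None if is_later(update_cltt, local_start_time_of_state) else 15
    return 16               # note (4): receiving server shows ABANDONED
```

For example, decide("ACTIVE", "EXPIRED", now=100,
local_lease_expiration=200) returns 15, since the receiving server's
lease has not yet expired.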

2700	   If the IP address in the BNDUPD message has the R flag set in the
2701	   IP-flags option, indicating it is a reserved IP address, and if the
2702	   binding-status in the BNDUPD is BACKUP, then if the receiving server
2703	   does not show the IP address as reserved, the receiving server SHOULD
2704	   reject the BNDUPD using reject reason 19: "IP not reserved on this
2705	   server".

2707	7.1.4.  Accepting the BNDUPD message

2709	   When accepting a BNDUPD message, the information contained in the
2710	   client-request-options and client-reply-options SHOULD be examined
2711	   for any information of interest to this server.  For instance, a
2712	   server which wished to detect changes in client specified host names
2713	   might want to examine and save information from the host-name or
2714	   client-FQDN options.  Servers which expect to utilize information
2715	   from the relay-agent-information option would want to store this
2716	   information.

2718	7.1.5.  Time values related to the BNDUPD message

2720	   There are four time values that MAY be sent in a BNDUPD message.

2722	      o lease-expiration-time

2724	        The time that the server gave to the client, i.e., the time that
2725	        the server believes that the client's lease will expire.

2727	      o potential-expiration-time

2729	        The time that the server wants to be sure its partner waits
2730	        (added to the MCLT) before assuming that this lease has expired.
2731	        Typically some time beyond the desired client lease time.

2733	      o client-last-transaction-time

2735	        The time that the client last interacted with this server.

2737	      o start-time-of-state

2739	        The time at which the binding first went into the current state.

2741	   As discussed in section 5.2, each server knows what its partner has
2742	   ACKed with regard to potential-expiration time.  In addition, each
2743	   server needs to remember what it has told its partner as the
2744	   potential-expiration-time.  Moreover, each server must remember what
2745	   it has acked to the *other* server as the most recent potential-
2746	   expiration-time from that server.

2748	   Remember that each server both sends a potential-expiration-time and
2749	   receives an ACK for it, and also receives a potential-expiration-time
2750	   from its partner and must remember what it has ACKed for that.

2752	   While they don't have to be named in any particular way, the times
2753	   that a server needs to remember for every IP address in order to
2754	   implement the failover protocol are:

2756	      o lease-expiration-time
2757	        The time that a server gave to the DHCP client.  A DHCP server
2758	        needs to remember this time already, just to be a DHCP server.
2759	        A server SHOULD update this time with the lease-expiration time
2760	        received from a partner in a BNDUPD if the received lease-
2761	        expiration time is later than the lease-expiration time recorded
2762	        for this binding.

2764	      o sent-potential-expiration-time

2766	        The latest time sent to the partner for a potential-expiration-
2767	        time.

2769	      o acked-potential-expiration-time

2771	        The latest time that the partner has acked for a potential
2772	        expiration time.  Typically the same as sent-potential-
2773	        expiration-time if there is not a BNDUPD outstanding.

2775	      o received-potential-expiration-time

2777	        The latest time that this server has ever received as a
2778	        potential-expiration-time from its partner in a BNDUPD that this
2779	        server ACKed.

2781	   So, a server has to remember two additional times concerning BNDUPD
2782	   messages that it has initiated, and one additional time concerning
2783	   BNDUPD messages that it has received.  How are these times used?

2785	   First, let's look at the time that a DHCP server can offer to a DHCP
2786	   client.  A server can offer to a DHCP client a time that is no longer
2787	   than the MCLT beyond the max( received-potential-expiration-time,
2788	   acked-potential-expiration-time).  One might think that the server
2789	   should be able to offer only the MCLT beyond the acked-potential-
2790	   expiration-time, and while that is certainly simple and easy to
2791	   understand, it has negative consequences in actual operation.

2793	   To illustrate this, in the simple case where the primary updates the
2794	   secondary for a while and then fails, if the secondary can then renew
2795	   the client for only the MCLT beyond the acked-potential-expiration-
2796	   time, then the secondary will only be able to renew the client for
2797	   the MCLT, because the secondary has never sent a BNDUPD packet to the
2798	   primary concerning this IP address and client, and so its acked-
2799	   potential-expiration-time is zero.

2801	   However, since the secondary is allowed to renew the client with the
2802	   MCLT beyond the max( received-potential-expiration-time, acked-
2803	   potential-expiration-time), then the secondary can usually renew the
2804	   client for the full lease period, at least for the first renew it
2805	   sees from the client, since the received-potential-expiration-time is
2806	   generally longer than the client's desired lease interval.  The
2807	   difference in renew times could make a big difference in server load
2808	   on the secondary in this case.

2810	   What are the consequences of allowing a server to offer a DHCP client
2811	   a lease term of the MCLT beyond the max( received-potential-
2812	   expiration-time, acked-potential-expiration-time)?  The consequences
2813	   appear whenever a server enters PARTNER-DOWN state, and affect how
2814	   long that server has to wait before reallocating expired leases.
2815	   With this approach, when a server goes into PARTNER-DOWN state, it
2816	   must wait the MCLT beyond the max( lease-expiration-time, sent-
2817	   potential-expiration-time, acked-potential-expiration-time,
2818	   received-potential-expiration-time ) for each IP address before it
2819	   can reallocate that IP address to another DHCP client.   One might
2820	   normally think that it needed to wait only the MCLT beyond the max(
2821	   lease-expiration-time, received-potential-expiration-time ), i.e.,
2822	   beyond what it has told the client and what it has explicitly acked
2823	   to the other server.  But with the optimization discussed above,
2824	   where either server can offer the DHCP client a lease term of the
2825	   MCLT beyond the max( received-potential-expiration-time, acked-
2826	   potential-expiration-time), the additional times sent-
2827	   potential-expiration-time and acked-potential-expiration-time must be
2828	   added into the expression, since the partner could have used those
2829	   times as part of its own lease time calculation.

2831	   Thus this optimization may require a longer waiting time when enter-
2832	   ing PARTNER-DOWN state, but will generally allow servers to operate
2833	   considerably more effectively when running in COMMUNICATIONS-
2834	   INTERRUPTED state.
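
The two calculations described in this section can be sketched as
follows.  This is a minimal illustration and not part of the protocol:
the class and function names are hypothetical, times are absolute
seconds with zero meaning "never set", and flooring the offer at the
current time is an assumption drawn from the example above, in which a
zero acked-potential-expiration-time still permits a renewal of the
MCLT.

```python
from dataclasses import dataclass


@dataclass
class BindingTimes:
    """Per-address times from section 7.1.5 (absolute seconds, 0 = never set)."""
    lease_expiration: int = 0
    sent_potential_expiration: int = 0
    acked_potential_expiration: int = 0
    received_potential_expiration: int = 0


def latest_offerable_expiration(t: BindingTimes, mclt: int, now: int) -> int:
    """Latest expiration time a server may give a client for this binding."""
    base = max(t.received_potential_expiration, t.acked_potential_expiration)
    # Assumption: the MCLT is never measured from a point in the past.
    return max(base, now) + mclt


def earliest_reallocation(t: BindingTimes, mclt: int) -> int:
    """Earliest time a server in PARTNER-DOWN state may reallocate the address."""
    return max(
        t.lease_expiration,
        t.sent_potential_expiration,
        t.acked_potential_expiration,
        t.received_potential_expiration,
    ) + mclt
```

With an MCLT of 3600 and a received-potential-expiration-time of 5000,
a server that has never sent a BNDUPD for the address could still
offer an expiration of 8600 rather than only the MCLT from now.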

2836	7.2.  BNDACK message [4]

2838	   A server sends a binding acknowledgement (BNDACK) message when it has
2839	   processed a BNDUPD message and after it has successfully committed to
2840	   stable storage any binding database changes made as a result of pro-
2841	   cessing the BNDUPD message.  A BNDACK message is used to accept
2842	   or reject a BNDUPD message.  A BNDACK message which contains a
2843	   reject-reason option is a rejection of the corresponding BNDUPD mes-
2844	   sage.

2846	   In order to reduce the complexity of the discussion, the rest of this
2847	   section is written as though every BNDUPD message contains only a
2848	   single binding update transaction and thus every corresponding BNDACK
2849	   message would also contain reply information about only a single
2850	   binding update transaction.  See section 6.3 for information on how
2851	   to create and process BNDUPD and BNDACK messages which contain multi-
2852	   ple binding update transactions.

2854	   Note that while a server MAY generate BNDUPD messages with multiple
2855	   binding update transactions, every server MUST be able to process a
2856	   BNDUPD message which contains multiple binding update transactions
2857	   and generate the corresponding BNDACK messages with status for multi-
2858	   ple binding update transactions.  If a server does not ever create
2859	   BNDUPD messages which contain multiple binding update transactions,
2860	   then it does not need to be able to process a received BNDACK message
2861	   with multiple binding update transactions.  However, all servers MUST
2862	   be able to create BNDACK messages which deal with multiple binding
2863	   update transactions received in a BNDUPD message.

2865	   Every BNDUPD message that is received by a server MUST be responded
2866	   to with a corresponding BNDACK message.  The receiving server SHOULD
2867	   respond quickly to every BNDUPD message but it MAY choose to respond
2868	   preferentially to DHCP client requests instead of BNDUPD messages,
2869	   since there is no absolute time period within which a BNDACK must be
2870	   sent in response to a BNDUPD message, while DHCP clients frequently
2871	   have strict time constraints.

2873	   A BNDACK message can only be sent in response to a BNDUPD message
2874	   using the same TCP connection from which the BNDUPD message was
2875	   received, since the XIDs in BNDUPD messages are guaranteed unique
2876	   only during the life of a single TCP connection.  When a connection
2877	   to a partner server goes down, a server with unprocessed BNDUPD mes-
2878	   sages MAY simply drop all of those messages, since it can be sure
2879	   that the partner will resend them when they are next in communica-
2880	   tions (albeit with a different XID), or it MAY instead choose to pro-
2881	   cess those BNDUPD messages, but it MUST NOT send any BNDACK messages
2882	   in response.

2884	   The following table summarizes the options for the BNDACK message.

2886	   Option                        accept       reject
2887	   ------                        ------       ------
2888	   assigned-IP-address  (1)      MUST         MUST
2889	   IP-flags                      SHOULD NOT   SHOULD NOT
2890	   binding-status                SHOULD NOT   SHOULD NOT
2891	   client-identifier             SHOULD NOT   SHOULD NOT
2892	   client-hardware-address       SHOULD NOT   SHOULD NOT
2893	   reject-reason                 SHOULD NOT   MUST
2894	   message                       SHOULD NOT   SHOULD
2895	   lease-expiration-time         SHOULD NOT   SHOULD NOT
2896	   potential-expiration-time     SHOULD NOT   SHOULD NOT
2897	   start-time-of-state           SHOULD NOT   SHOULD NOT
2898	   client-last-trans.-time       SHOULD NOT   SHOULD NOT
2899	   DDNS(1)                       SHOULD NOT   SHOULD NOT

2901	   (1) assigned-IP-address MUST be the first option for an IP address

2903	              Table 7.2-1: Options used in a BNDACK message

2905	7.2.1.  Sending the BNDACK message

2907	   The BNDACK message MUST contain the same xid as the corresponding
2908	   BNDUPD message.

2910	   The assigned-IP-address option from the BNDUPD message MUST be
2911	   included in the BNDACK message.  Any additional options from the
2912	   BNDUPD message SHOULD NOT appear in the BNDACK message.  Note that
2913	   any information sent in options (e.g., a later lease-expiration time)
2914	   in the BNDACK message MUST NOT be assumed to necessarily be recorded
2915	   in the stable storage of the server that receives the BNDACK message
2916	   because there is no corresponding ACK of the BNDACK message.  Any
2917	   information that SHOULD be recorded in the partner server's stable
2918	   storage MUST be transmitted in a subsequent BNDUPD.

2920	   If the server is accepting the BNDUPD, the BNDACK message includes
2921	   only the assigned-IP-address option.  If the server is rejecting the
2922	   BNDUPD, the additional option reject-reason MUST appear in the BNDACK
2923	   message, and the message option SHOULD appear in this case containing
2924	   a human-readable error message describing in some detail the reason
2925	   for the rejection of the BNDUPD message.
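
The option rules above, for the single-transaction case, can be
sketched as follows.  The function name and the dictionary
representation of a message are hypothetical and chosen only for
illustration; the wire encoding is not shown.

```python
from typing import List, Optional, Tuple

Option = Tuple[str, object]


def build_bndack(xid: int, assigned_ip: str,
                 reject_reason: Optional[int] = None,
                 message: Optional[str] = None) -> dict:
    """Build a BNDACK for a single binding update transaction."""
    # assigned-IP-address MUST be present and MUST be the first option.
    options: List[Option] = [("assigned-IP-address", assigned_ip)]
    if reject_reason is not None:
        # A rejection MUST carry reject-reason and SHOULD carry message.
        options.append(("reject-reason", reject_reason))
        if message:
            options.append(("message", message))
    # The xid MUST match the xid of the corresponding BNDUPD.
    return {"type": "BNDACK", "xid": xid, "options": options}
```

An accepting BNDACK built this way carries only the
assigned-IP-address option; no other option from the BNDUPD is echoed.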

2927	   If the server rejects the BNDUPD message with a BNDACK and a reject-
2928	   reason option, it may be because the server believes that it has
2929	   binding information that the other server should know.  A server
2930	   which is rejecting a BNDUPD may initiate a BNDUPD of its own in order
2931	   to update its partner with what it believes is better binding infor-
2932	   mation, but it MUST ensure through some means that it will not end up
2933	   in a situation where each server is sending BNDUPD messages as fast
2934	   as possible because they can't agree on which server has better bind-
2935	   ing data.  Placing a considerable delay on the initiation of a BNDUPD
2936	   message after sending a BNDACK with a reject-reason would be one way
2937	   to ensure this situation doesn't occur.

2939	7.2.2.  Receiving the BNDACK message

2941	   When a server receives a BNDACK message, if it doesn't contain a
2942	   reject-reason option that means that the BNDUPD message was accepted,
2943	   and the server which sent the BNDUPD SHOULD update its stable storage
2944	   with the potential-expiration-time value sent in the BNDUPD message.

2946	   If the BNDACK message contains a reject-reason option, that means
2947	   that the BNDUPD was rejected.  There SHOULD be a message option in
2948	   the BNDACK giving a text reason for the rejection, and the server
2949	   SHOULD log the message in some way.  The server MUST NOT immediately
2950	   try to resend the BNDUPD message as there is no reason to believe the
2951	   partner won't reject it a second time.  However a server MAY choose
2952	   to send another BNDUPD at some future time, for instance when the
2953	   server next processes an update request from its partner.
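
A minimal sketch of the receiving side follows.  It assumes that the
value recorded on acceptance is the acked-potential-expiration-time of
section 7.1.5; the function name and the dictionary layout are
hypothetical.

```python
def handle_bndack(binding: dict, sent_bndupd: dict, bndack_options: dict,
                  log: list) -> bool:
    """Process a BNDACK for one binding; return True if the BNDUPD was accepted."""
    if "reject-reason" not in bndack_options:
        # Accepted: the partner has now ACKed the time sent in the BNDUPD.
        binding["acked-potential-expiration-time"] = \
            sent_bndupd["potential-expiration-time"]
        return True
    # Rejected: log the reason.  The caller must not resend immediately.
    reason = bndack_options["reject-reason"]
    text = bndack_options.get("message", "(no message option)")
    log.append(f"BNDUPD rejected, reason {reason}: {text}")
    return False
```

A False return only records the rejection; when, or whether, to send
another BNDUPD is left to the caller.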

2955	7.3.  UPDREQ message [9]

2957	   The update request (UPDREQ) message is used by one server to request
2958	   that its partner send it all of the binding database information that
2959	   it has not already seen.   Since each server is required to keep
2960	   track at all times of the binding information the other server has
2961	   ACKed, one server can request transmission of all un-ACKed binding
2962	   database information held by the other server by using the UPDREQ
2963	   message.

2965	   The UPDREQ message is used whenever the sending server cannot proceed
2966	   before it has processed all previously un-ACKed binding update infor-
2967	   mation, since the UPDREQ message should yield a corresponding UPDDONE
2968	   message.  The UPDDONE message is not sent until the server that sent
2969	   the UPDREQ message has responded to all of the BNDUPD messages gen-
2970	   erated by the UPDREQ message with BNDACK messages (they may either be
2971	   accepted or rejected by the BNDACK messages, but they MUST have been
2972	   responded to). Thus, the sender of the UPDREQ message can be sure
2973	   upon receipt of an UPDDONE message that it has received and committed
2974	   to stable storage all outstanding binding database updates.

2976	   See section 9, Failover Endpoint States, for the details of when the
2977	   UPDREQ message is sent.

2979	7.3.1.  Sending the UPDREQ message

2981	   The UPDREQ message has no message specific options.

2983	7.3.2.  Receiving the UPDREQ message

2985	   A server receiving an UPDREQ message MUST send all binding database
2986	   changes that have not yet been ACKed by the sending server.   These
2987	   changes are sent as undistinguished BNDUPD messages.

2989	   However, the server which received and is processing the UPDREQ mes-
2990	   sage MUST track the BNDACK messages that correspond to the BNDUPD
2991	   messages triggered by the UPDREQ message and, when they are all
2992	   received, the server MUST send an UPDDONE message.

2994	   The server processing the UPDREQ message and sending BNDUPD messages
2995	   to its partner SHOULD only track the BNDUPD and BNDACK message pairs
2996	   for un-ACKed binding database changes that were present upon the
2997	   receipt of the UPDREQ message.  A server which has received an UPDREQ
2998	   message SHOULD send BNDUPD messages for binding database changes that
2999	   occur after receipt of the UPDREQ message, but it SHOULD NOT include
3000	   those additional BNDUPD messages and their corresponding BNDACK mes-
3001	   sages in the accounting necessary to consider the UPDREQ complete and
3002	   subsequently send the UPDDONE message.  If some additional binding
3003	   database changes end up becoming part of the set of BNDUPD messages
3004	   considered as part of the UPDREQ (due to whatever algorithm the
3005	   server uses to scan its binding database for un-ACKed changes), it
3006	   will probably not cause any difficulty, but a server MUST NOT attempt
3007	   to include all such later BNDUPD messages in the accounting for the
3008	   UPDREQ in order to be able to transmit an UPDDONE message.

3010	   When queuing up the BNDUPD messages for transmission to the sender of
3011	   the UPDREQ message, the server processing the UPDREQ message MUST
3012	   honor the value returned in the max-unacked-bndupd option in the CON-
3013	   NECT or CONNECTACK message that set up the connection with the send-
3014	   ing server.  It MUST NOT send more BNDUPD messages without receiving
3015	   corresponding BNDACKs than the value returned in max-unacked-bndupd.
3016	   (See section 8 for more details.)
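
One possible way to do the accounting described above is sketched
below.  The class is hypothetical, and it simplifies matters by
treating max-unacked-bndupd as if it applied only to the BNDUPD
messages triggered by the UPDREQ; a real server would count every
outstanding BNDUPD on the connection against that limit.

```python
from collections import deque


class UpdreqTracker:
    """Tracks the BNDUPD/BNDACK pairs that one UPDREQ is waiting on."""

    def __init__(self, updreq_xid, unacked_bindings, max_unacked):
        self.updreq_xid = updreq_xid  # the UPDDONE must reuse this xid
        # Snapshot: only changes un-ACKed when the UPDREQ arrived are counted.
        self.pending = deque(unacked_bindings)
        self.in_flight = set()
        self.max_unacked = max_unacked

    def next_to_send(self):
        """Bindings that may be sent now without exceeding max-unacked-bndupd."""
        batch = []
        while self.pending and len(self.in_flight) < self.max_unacked:
            ip = self.pending.popleft()
            self.in_flight.add(ip)
            batch.append(ip)
        return batch

    def on_bndack(self, ip):
        """Record a BNDACK (accept or reject); return True when UPDDONE is due."""
        self.in_flight.discard(ip)
        return not self.pending and not self.in_flight
```

Binding changes made after the UPDREQ arrived are still sent as BNDUPD
messages, but they are never added to the tracker, so they cannot
delay the UPDDONE message.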

3018	7.4.  UPDREQALL message [7]

3020	   The update request all (UPDREQALL) message is used by one server to
3021	   request that its partner send it all of the binding database informa-
3022	   tion.  This message is used to allow one server to recover from a
3023	   failure of stable storage and to restore its binding database in its
3024	   entirety from the other server.

3026	   A server which sends an UPDREQALL message cannot proceed until all of
3027	   its binding update information is restored, and it knows that all of
3028	   that information is restored when an UPDDONE message is received.

3030	   See section 9, Failover Endpoint States, for the details of when
3031	   the UPDREQALL message is sent.

3033	   The UPDREQALL message has no message specific options.

3035	7.4.1.  Sending the UPDREQALL message

3037	   The UPDREQALL message is sent.

3039	7.4.2.  Receiving the UPDREQALL message

3041	   A server receiving an UPDREQALL message MUST send all binding data-
3042	   base information to the sending server.  See section 5.16 for details
3043	   of what might actually comprise "all binding database information".

3045	   A server receiving an UPDREQALL message MUST remember that such a
3046	   message has been received and MUST ensure that all binding
3047	   information extant at that point is sent to the partner prior to any
3048	   UPDDONE message being sent to that partner.  One way to do this is
3049	   to remember the receipt of an UPDREQALL message and to treat every
3050	   subsequent UPDREQ message as an UPDREQALL message until it sends the
3051	   first UPDDONE message after receipt of the UPDREQALL message.  This
3052	   requirement exists because communications may fail and become re-
3053	   established between the two servers, and the specific conditions
3054	   which provoked the UPDREQALL message may no longer exist even though
3055	   the UPDREQALL message may not yet have completed.  See section 5.17
3056	   for information on a more efficient way to meet the above require-
3057	   ment.

3059	   This information is sent as undistinguished BNDUPD messages.
3060	   Otherwise the processing is the same as for the UPDREQ message.  See
3061	   section 7.3.2 for details.
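
The simple approach of treating later UPDREQ messages as UPDREQALL
messages can be sketched as a single flag.  The class name is
hypothetical, and the sketch assumes the flag is kept somewhere that
survives loss and re-establishment of the connection.

```python
class UpdateRequestState:
    """Remembers an UPDREQALL until the first UPDDONE has been sent."""

    def __init__(self):
        self.updreqall_pending = False

    def on_request(self, msg_type: str) -> str:
        """Return the kind of update that must actually be performed."""
        if msg_type == "UPDREQALL":
            self.updreqall_pending = True
        # An UPDREQ is upgraded while an UPDREQALL is still incomplete.
        return "UPDREQALL" if self.updreqall_pending else "UPDREQ"

    def on_upddone_sent(self) -> None:
        """Clear the flag once an UPDDONE message has been sent."""
        self.updreqall_pending = False
```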

3063	7.5.  UPDDONE message [8]

3065	   The update done (UPDDONE) message is used by a server receiving an
3066	   UPDREQ or UPDREQALL message to signify that it has sent all of the
3067	   BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
3068	   it has received a BNDACK for each of those messages.

3070	   While a BNDACK message MUST have been received for each BNDUPD mes-
3071	   sage prior to the transmission of the UPDDONE message, this doesn't
3072	   necessarily mean that all of the BNDUPD messages were accepted, only
3073	   that all of them were responded to with a BNDACK message.  Thus, a
3074	   NAK (comprised of a BNDACK message containing a reject-reason option)
3075	   could be used to reject a BNDUPD, but for the purposes of the UPDDONE
3076	   message, such a NAK would count as a response to the associated
3077	   BNDUPD message, and would not block the eventual transmission of the
3078	   UPDDONE message.

3080	   The xid in an UPDDONE message MUST be identical to the xid in the
3081	   UPDREQ or UPDREQALL message that initiated the update process.

3083	   The UPDDONE message has no message specific options.

3085	7.5.1.  Sending the UPDDONE message

3087	   The UPDDONE message SHOULD be sent as soon as the last BNDACK message
3088	   corresponding to a BNDUPD message requested by the UPDREQ or
3089	   UPDREQALL is received from the server which sent the UPDREQ or
3090	   UPDREQALL.  The XID of the UPDDONE message MUST be the same as the
3091	   XID of the corresponding UPDREQ or UPDREQALL message.

3093	7.5.2.  Receiving the UPDDONE message

3095	   A server receiving the UPDDONE message knows that all of the informa-
3096	   tion that it requested by sending an UPDREQ or UPDREQALL message has
3097	   now been sent and that it has recorded this information in its stable
3098	   storage.  It typically uses the receipt of an UPDDONE message to move
3099	   to a different failover state.  See sections 9.5.2 and 9.8.3 for
3100	   details.

3102	7.6.  POOLREQ message [1]

3104	   The pool request (POOLREQ) message is used by the secondary server to
3105	   request an allocation of IP addresses from the primary server.   It
3106	   MUST be sent by a secondary server to a primary server to request IP
3107	   address allocation by the primary.  The IP addresses allocated are
3108	   transmitted using normal BNDUPD messages from the primary to the
3109	   secondary.

3111	   The POOLREQ message SHOULD be sent from the secondary to the primary
3112	   whenever the secondary makes a transition into NORMAL state.  It
3113	   SHOULD periodically be resent in order that any change in the number
3114	   of available IP addresses on the primary be reflected in the pool on
3115	   the secondary.  The period may be influenced by the secondary
3116	   server's leasing activity.

3118	   The POOLREQ message has no message specific options.

3120	7.6.1.  Sending the POOLREQ message

3122	   The POOLREQ message is sent.

3124	7.6.2.  Receiving the POOLREQ message

3126	   When a primary server receives a POOLREQ message it SHOULD examine
3127	   the binding database and determine how many IP addresses the secon-
3128	   dary server should have, and set these IP addresses to BACKUP state.
3129	   It SHOULD then send BNDUPD messages concerning all of these IP
3130	   addresses to the secondary server.

3132	   Servers frequently have several kinds of IP addresses available on a
3133	   particular network segment.  The failover protocol assumes that both
3134	   primary and secondary servers are configured in such a way that each
3135	   knows the type and number of IP addresses on every network segment
3136	   participating in the failover protocol.  The primary server is
3137	   responsible for allocating the secondary server the correct propor-
3138	   tion of available IP addresses of each kind, and the secondary server
3139	   is responsible for being configured in such a way that it can tell
3140	   the kind of every IP address based solely on the IP address itself.

3142	   A primary server MUST keep track of how many IP addresses were allo-
3143	   cated as a result of processing the POOLREQ message, and send that
3144	   number in the POOLRESP message.

3146	   A primary server MAY choose to defer processing a POOLREQ message
3147	   until a more convenient time to process it, but it should not depend
3148	   on the secondary server to resend the POOLREQ message in that case.

3150	   If a secondary server receives a POOLREQ message it SHOULD report an
3151	   error.

3153	7.7.  POOLRESP message [2]

3155	   A primary server sends a POOLRESP message to a secondary server after
3156	   the allocation process for available addresses to the secondary
3157	   server is complete.  Typically this message will precede some of the
3158	   BNDUPD messages that the primary uses to send the actual allocated IP
3159	   addresses to the secondary.

3161	   The xid in the POOLRESP message MUST be identical to the xid in the
3162	   POOLREQ message for which this POOLRESP is a response.

3164	7.7.1.  Sending the POOLRESP message

3166	   The POOLRESP message MUST contain the same xid as the corresponding
3167	   POOLREQ message.

3169	   Exactly one option MUST appear in a POOLRESP message:

3171	      o addresses-transferred

3173	        The number of addresses allocated to the secondary server by the
3174	        primary server as a result of a POOLREQ is contained in the
3175	        addresses-transferred option in a POOLRESP message.  Note this
3176	        is the number of addresses that are transferred to the secondary
3177	        in the primary's binding database as a result of the correspond-
3178	        ing POOLREQ message, and that it may be some time before they
3179	        can all be transmitted to the secondary server through the use
3180	        of BNDUPD messages.

3182	7.7.2.  Receiving the POOLRESP message

3184	   When a secondary server receives a POOLRESP message, it SHOULD send
3185	   another POOLREQ message if the value of the addresses-transferred
3186	   option is non-zero.

3188	   Typically, no other action is taken on the reception of a POOLRESP
3189	   message.

3191	7.8.  CONNECT message [5]

3193	   The connect message is used to establish an application-level con-
3194	   nection over a newly created TCP connection.  It gives the source
3195	   information for the connection and critical configuration informa-
3196	   tion.  It MUST be sent only by the primary server.  Either server can
3197	   initiate a TCP connection, but the CONNECT message is only sent by
3198	   the primary server.

3200	   The CONNECT message MUST be the first message sent down a newly esta-
3201	   blished connection, and it MUST be sent only by the primary server.

3203	   The following table summarizes the options that are associated with
3204	   the CONNECT message:

3206	   Option
3207	   ------
3208	   relationship-name           MUST
3209	   max-unacked-bndupd          MUST
3210	   receive-timer               MUST
3211	   vendor-class-identifier     MUST
3212	   protocol-version            MUST
3213	   TLS-request                 MUST (1)
3214	   MCLT                        MUST
3215	   hash-bucket-assignment      MUST

3217	   (1) MUST NOT if CONNECT is being sent over a TLS connection

3219	              Table 7.8-1: Options used in a CONNECT message

3221	7.8.1.  Sending the CONNECT message

3223	   The CONNECT message MUST be the first message sent by the primary
3224	   server after the establishment of a new TCP connection with a secon-
3225	   dary server participating in the failover protocol.

3227	   The xid of the CONNECT message is not related to any previous xid
3228	   sequence, but initiates the sequence for this connection.

3230	   The name of the failover relationship MUST be placed in the
3231	   relationship-name option.  This information is placed in an option
3232	   inside of the message in order to allow the identity of the sender to
3233	   be covered by a shared secret.

3235	   The number of BNDUPD messages the primary server can accept without
3236	   blocking the TCP connection MUST be placed in the max-unacked-bndupd
3237	   option.  This MUST be a number equal to or greater than 1, SHOULD be
3238	   a number greater than 10, and SHOULD be a number less than 100.

3240	   The length of the receive timer (tReceive, see section 8.3) MUST be
3241	   placed in the receive-timer option.

3243	   The MCLT MUST be placed in the MCLT option.

3245	   The hash-bucket-assignment option MUST be included in the CONNECT
3246	   message.  In the event that load balancing is not configured for this
3247	   server, the hash-bucket-assignment option will indicate that.  The
3248	   value of the hash-bucket-assignment option is determined from the
3249	   specific buckets that the primary server has determined that the
3250	   secondary server MUST service as part of the load-balancing
3251	   algorithm.  The way in which the primary server determines this
3252	   information is outside the scope of this protocol definition.  The
3253	   primary server SHOULD be configured with a percentage of clients that
3254	   the secondary server will be instructed to service, and the primary
3255	   server SHOULD use the algorithm in [RFC 3074] to generate a Hash
3256	   Bucket Assignment which it sends to the secondary server.

3258	   The vendor class identifier MUST be placed in the vendor-class-
3259	   identifier option.

3261	   The protocol-version option MUST be included in every CONNECT mes-
3262	   sage.  The current value of the protocol version is 1.

3264	   The TLS-request option MUST be sent and contains the desired TLS con-
3265	   nection request as well as information concerning whether TLS is sup-
3266	   ported.  If this CONNECT message is being sent over an already
3267	   created TLS connection, the TLS-request option MUST NOT appear.
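
The option requirements of this section can be gathered into one
sketch.  The function name, the dictionary representation, and the use
of an opaque byte string for the hash-bucket-assignment value are
assumptions made for illustration; the option encodings are defined
elsewhere in this document.

```python
def build_connect_options(relationship_name: str, max_unacked_bndupd: int,
                          receive_timer: int, vendor_class_id: str, mclt: int,
                          hash_bucket_assignment: bytes,
                          tls_request: int = 0, over_tls: bool = False) -> dict:
    """Assemble the options of Table 7.8-1 for a CONNECT message."""
    if max_unacked_bndupd < 1:
        raise ValueError("max-unacked-bndupd MUST be 1 or greater")
    # The SHOULD limits (greater than 10, less than 100) are not enforced.
    options = {
        "relationship-name": relationship_name,
        "max-unacked-bndupd": max_unacked_bndupd,
        "receive-timer": receive_timer,
        "vendor-class-identifier": vendor_class_id,
        "protocol-version": 1,  # the current protocol version
        "MCLT": mclt,
        "hash-bucket-assignment": hash_bucket_assignment,
    }
    # TLS-request MUST NOT appear when already running over TLS.
    if not over_tls:
        options["TLS-request"] = tls_request
    return options
```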

3269	7.8.2.  Receiving the CONNECT message

3271	   When a server establishes a TCP connection on a failover port, if it
3272	   is a primary server it should send a CONNECT message, and if it is a
3273	   secondary server it should wait for a CONNECT message before sending
3274	   any messages.  To avoid denial of service attacks, a secondary should
3275	   only wait for a CONNECT message on a new connection for a limited
3276	   amount of time and close the connection if none is received during
3277	   that time.

3279	   When a secondary server receives a CONNECT message it should:

3281	      1.  Record the time at which the message was received.

3283	      2.  Examine the protocol-version option, and decide if this server
3284	          is capable of interoperating with another server running that
3285	          protocol version.  If not, send the CONNECTACK message with
3286	          the reject reason 14: "Protocol version mismatch".  The server
3287	          MUST include its protocol-version in the CONNECTACK message.

3289	      3.  Examine the TLS-request option.  Figure out the TLS-reply
3290	          value based on the capabilities and configuration of this
3291	          server.  If the result for the TLS-reply value is a 1 and the
3292	          connection is accepted, indicating use of TLS, then immedi-
3293	          ately send the CONNECTACK message and go into TLS negotiation.
3294	          If the TLS-reply value implies rejection of the connection,
3295	          then immediately send the CONNECTACK message with the TLS-
3296	          reply value and the appropriate reject-reason option value.
3297	          In all other cases, save the TLS-reply option information for
3298	          the eventual CONNECTACK message.

3300	          The possibilities for TLS-request and TLS-reply are:

3302	          CONNECT CONNECTACK
3303	            TLS     TLS
3304	          request  reply
3305	                        Reject
3306	            t1      t1  Reason   Comments
3307	            --      --  ------   --------
3308	            0       0           no TLS used
3309	            0       1    11     primary won't use TLS, secondary requires TLS
3310	            1       0           primary desires TLS, secondary doesn't
3311	            1       1           primary desires TLS, secondary will use TLS
3312	            2       0    9, 10  primary requires TLS and secondary won't
3313	            2       1           primary requires TLS and secondary will use TLS

3315	      4.  Check to see if there is a message-digest option in the CON-
3316	          NECT message.  If there was, and the server does not support
3317	          message-digests, then reject the connection with reject reason
3318	          12: "Message digest not supported" in the CONNECTACK.  If the
3319	          server does support message-digests, then check this message
3320	          for validity based on the message-digest, and reject it if the
3321	          digest indicates the message was altered with reject reason
3322	          20: "Message digest failed to compare".

3324	      5.  Determine if the sender (from the relationship-name option)
3325	          and the implicit role of the sender (i.e., primary) represents
3326	          a server with which the receiver was configured to engage in
3327	          failover activity.  This is performed after any TLS or message
3328	          digest processing so that it occurs after a secure connection
3329	          is created, to ensure that there is no tampering with the
3330	          relationship name of the partner.  In the absence of any other
3331	          security capability (i.e., when TLS or a message digest is not
3332	          used), the server MAY wish to be configured with the IP
3333	          address of the partner and check the source-ip of the CONNECT
3334	          message against that IP address as a weak form of security.

3336	          If the sender is not such a server, the receiving server
3337	          should reject the CONNECT request by sending a CONNECTACK
3338	          message with a reject-reason of 8, invalid failover partner.

3340	          If the sender is such a server, then the receiving failover
3341	          endpoint should be determined.

3343	      6.  Decide if the time delta between the sending of the message,
3344	          in the time field, and the receipt of the message, recorded in
3345	          step 1 above, is acceptable.  A server MAY require an
3346	          arbitrarily small delta in time values in order to set up a
3347	          failover connection with another server.  See section 5.10 for
3348	          information on time synchronization.

3350	          If the delta between the time values is too great, the server
3351	          should reject the CONNECT request by sending a CONNECTACK mes-
3352	          sage with a reject-reason of 4, time mismatch too great.

3354	          If the time mismatch is not considered too great then the
3355	          receiving server MUST record the delta between the servers.
3356	          The receiving server MUST use this delta to correct all of the
3357	          absolute times received from the other server in all time-
3358	          valued options.  Note that servers can participate in failover
3359	          with arbitrarily great time mismatches, as long as the
3360	          mismatch is more or less constant.

3362	      7.  Examine the MCLT option in the CONNECT request and use the
3363	          value of the MCLT as the MCLT for this failover endpoint.

3365	          The secondary server SHOULD be able to operate with any MCLT
3366	          sent by the primary, but if it cannot, then it should send a
3367	          CONNECTACK with a reject-reason of 5, MCLT mismatch.  In the
3368	          event that the MCLT from the primary does not match that con-
3369	          figured on the secondary, and the secondary will run with the
3370	          primary's value, then the secondary MUST save the MCLT in
3371	          secondary storage since it will need it even if it cannot con-
3372	          tact the primary.  The secondary MUST NOT use a different MCLT
3373	          value than it received from the primary even if it cannot con-
3374	          tact the primary.

3376	      8.  The server MUST store the hash-bucket-assignment option for
3377	          use during processing in NORMAL state.  If this hash bucket
3378	          assignment conflicts with the secondary server's configured
3379	          hash bucket assignment for use in other than NORMAL state, the
3380	          secondary server should send a CONNECTACK with a reject reason
3381	          of 19, Hash bucket assignment conflict.

3383	      9.  The receiving server MAY use the vendor-class-identifier to do
3384	          vendor specific processing.
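
The time check in step 6 can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the 60-second limit, the class name PartnerClock, and the function name check_time_delta are assumptions made for the example. The computed delta includes network transit time, which the sketch does not try to separate out.

```python
from dataclasses import dataclass

MAX_ACCEPTABLE_DELTA = 60  # seconds; hypothetical local policy, not from the draft
REJECT_TIME_MISMATCH = 4   # reject-reason 4: time mismatch too great


@dataclass
class PartnerClock:
    delta: int  # receiver clock minus sender clock, in seconds

    def to_local(self, partner_time: int) -> int:
        """Correct an absolute time received from the partner."""
        return partner_time + self.delta


def check_time_delta(sent_time: int, received_time: int,
                     max_delta: int = MAX_ACCEPTABLE_DELTA):
    """Return (PartnerClock, None) if acceptable, else (None, reject_reason)."""
    delta = received_time - sent_time
    if abs(delta) > max_delta:
        return None, REJECT_TIME_MISMATCH
    return PartnerClock(delta), None
```

A receiver would keep the returned PartnerClock with the failover endpoint and pass every time-valued option from the partner through to_local() before using it.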

3386	7.9.  CONNECTACK message [6]

3388	   The CONNECTACK message is sent to accept or reject a CONNECT message.
3389	   It is sent by the secondary server which received a CONNECT message.

3391	   Attempting immediately to reconnect after either receiving a CONNEC-
3392	   TACK with a reject-reason or after sending a CONNECTACK with a
3393	   reject-reason could yield unwanted looping behavior, since the reason
3394	   that the connection was rejected may well not have changed since the
3395	   last attempt.  A simple suggested solution is to wait a minute or two
3396	   after sending or receiving a CONNECTACK message with a reject-reason
3397	   before attempting to reestablish communication.

3399	   The following table summarizes the options associated with the CON-
3400	   NECTACK message:

3402	   Option                     accept       reject
3403	   ------
3404	   relationship-name           MUST        MUST
3405	   max-unacked-bndupd          MUST        MUST NOT
3406	   receive-timer               MUST        MUST NOT
3407	   vendor-class-identifier     MUST        MUST NOT
3408	   protocol-version            MUST        MUST
3409	   TLS-reply                   (1)         (2)
3410	   reject-reason               MUST NOT    MUST
3411	   message                     MUST NOT    SHOULD
3412	   MCLT                        MUST NOT    MUST NOT
3413	   hash-bucket-assignment      MUST NOT    MUST NOT

3415	   (1) MUST NOT if sending CONNECTACK after TLS negotiation, MUST
3416	   if TLS-request in CONNECT, else MUST NOT.
3417	   (2) MUST if TLS-request in CONNECT message, else MUST NOT.

3419	              Table 7.9-1: Options used in a CONNECTACK message
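
The rules in Table 7.9-1, including footnotes (1) and (2), can be expressed as a small checker. The sketch below is illustrative only: the function name connectack_option_errors and its two boolean parameters are assumptions, and options are represented simply as a set of option names. The "message" option on a reject is a SHOULD and is therefore not enforced.

```python
ACCEPT_REQUIRED = {"relationship-name", "max-unacked-bndupd", "receive-timer",
                   "vendor-class-identifier", "protocol-version"}
REJECT_REQUIRED = {"relationship-name", "protocol-version", "reject-reason"}
NEVER_ALLOWED = {"MCLT", "hash-bucket-assignment"}


def connectack_option_errors(options, tls_request_in_connect, over_tls):
    """Return a list of Table 7.9-1 violations for a CONNECTACK option set."""
    present = set(options)
    rejecting = "reject-reason" in present
    required = set(REJECT_REQUIRED if rejecting else ACCEPT_REQUIRED)
    forbidden = set(NEVER_ALLOWED)
    if rejecting:
        # Options that are MUST on accept but MUST NOT on reject.
        forbidden |= ACCEPT_REQUIRED - REJECT_REQUIRED
    else:
        forbidden.add("message")
    # Footnotes (1) and (2): when TLS-reply is required or forbidden.
    if not rejecting and over_tls:
        forbidden.add("TLS-reply")
    elif tls_request_in_connect:
        required.add("TLS-reply")
    else:
        forbidden.add("TLS-reply")
    errors = ["missing " + name for name in sorted(required - present)]
    errors += ["forbidden " + name for name in sorted(forbidden & present)]
    return errors
```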

3421	7.9.1.  Sending the CONNECTACK message

3423	   The xid of the CONNECTACK message MUST be that of the corresponding
3424	   CONNECT message.

3426	   The name of the relationship MUST be placed in the relationship-name
3427	   option.  This information is placed in an option inside of the mes-
3428	   sage in order to allow the identity of the sender to be covered by a
3429	   shared secret.

3431	   The protocol-version option MUST be included in every CONNECTACK mes-
3432	   sage.  The current value of the protocol version is 1.

3434	   If the connection has been rejected, the reject-reason option MUST be
3435	   placed in the CONNECTACK message with an appropriate reason, and a
3436	   message option SHOULD be included with a human-readable error message
3437	   describing the reason for the rejection in some detail.  If the
3438	   reject-reason option appears, then the remaining options listed below
3439	   do not appear.  The sending server should close the connection after
3440	   sending the CONNECTACK if the connection was rejected.

3442	   The results of the TLS negotiation MUST be placed in the TLS-reply
3443	   option.  If this CONNECTACK message is being sent over an already TLS
3444	   secured connection, then there MUST NOT be a TLS-reply option.

3446	   If there was a message-digest option in the CONNECT message, then
3447	   there MUST be a message-digest in the CONNECTACK message and any sub-
3448	   sequent messages if the CONNECTACK does not contain a reject-reason.

3450	   The number of BNDUPD messages the server can accept without blocking
3451	   the TCP connection MUST be placed in the max-unacked-bndupd option.
3452	   This SHOULD be a number greater than 10, and SHOULD be a number less
3453	   than 100.

3455	   The length of the receive timer (tReceive, see section 8.3) MUST be
3456	   placed in the receive-timer option.

3458	   The vendor class identifier MUST be placed in the vendor-class-
3459	   identifier option.

3461	   After a connection is created (either by sending a CONNECTACK message
3462	   to the first CONNECT message, or sending a CONNECTACK message to a
3463	   CONNECT message received over a TLS connection), the server MUST send
3464	   a STATE message.

3466	   After a connection is created, the server MUST start two timers for
3467	   the connection: tSend and tReceive.  The tSend timer SHOULD be
3468	   approximately 33 percent of the time in the receive-timer option in
3469	   the corresponding CONNECT message.  The tReceive timer SHOULD be the
3470	   time sent in the receive-timer option in the CONNECTACK message.

3472	   The tReceive timer is reset whenever a message is received from this
3473	   TCP connection.  If it ever expires, the TCP connection is dropped
3474	   and communications with this partner is considered not ok.  The
3475	   reject reason 17: "No traffic within sufficient time" is placed in
3476	   the DISCONNECT message sent prior to dropping the TCP connection.

3478	   The tSend timer is reset whenever a message is sent over this connec-
3479	   tion. When it expires, a CONTACT message MUST be sent.
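
The timer rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name ConnectionTimers and its method names are hypothetical, times are plain seconds supplied by the caller, and tSend is taken as one third of the partner's receive-timer value.

```python
class ConnectionTimers:
    """Tracks tSend and tReceive for one failover connection."""

    def __init__(self, own_receive_timer, partner_receive_timer, now):
        self.t_receive = own_receive_timer         # silence we tolerate
        self.t_send = partner_receive_timer / 3.0  # roughly 33 percent
        self.last_received = now
        self.last_sent = now

    def message_received(self, now):
        """Any received message resets tReceive."""
        self.last_received = now

    def message_sent(self, now):
        """Any sent message resets tSend."""
        self.last_sent = now

    def contact_due(self, now):
        """True when tSend has expired and a CONTACT must be sent."""
        return now - self.last_sent >= self.t_send

    def receive_expired(self, now):
        """True when tReceive has expired and the connection is dropped."""
        return now - self.last_received >= self.t_receive
```

A server loop would call contact_due() and receive_expired() periodically; on receive_expired() it would send a DISCONNECT carrying reject reason 17 and then drop the TCP connection.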

3481	7.9.2.  Receiving the CONNECTACK message

3483	   If a CONNECTACK message is received with a different XID from the one
3484	   in the CONNECT that was sent, it SHOULD be ignored.  To avoid denial
3485	   of service attacks, a primary should only wait for a CONNECTACK mes-
3486	   sage on a new connection for a limited amount of time and close the
3487	   connection if none is received during that time.

3489	   When a CONNECTACK message is received, the following actions should
3490	   be taken:

3492	      1.  Record the time the message was received.

3494	      2.  Check to see if the xid on the CONNECTACK matches an outstand-
3495	          ing CONNECT message on this TCP connection.

3497	      3.  Check to see if there is a reject-reason option in the
3498	          CONNECTACK message.  If not, continue with step 4.  If there
3499	          is one, the server SHOULD report the error code.  If a message
3500	          option appears, a server SHOULD display the string from the
3501	          message option in a user-visible way.  The server MUST close
3502	          the connection if a reject-reason option appears.

3504	      4.  Check the value of the TLS-reply option (if any, which there
3505	          won't be if this CONNECT is taking place utilizing TLS), and
3506	          if it was 1, then skip processing of the rest of the CONNEC-
3507	          TACK message, and immediately enter into TLS connection setup.

3509	          This step occurs prior to steps 5 and 6 in order to allow
3510	          creation of a secure connection (if required) prior to pro-
3511	          cessing the protocol version and IP address information.

3513	      5.  Examine the value of the protocol-version option.  If this
3514	          server is able to establish connections with another server
3515	          running this protocol version, then continue, else close the
3516	          connection.

3518	      6.  Decide if the time delta between the sending of the message,
3519	          in the time field, and the receipt of the message, recorded in
3520	          step 1 above, is acceptable.  A server MAY require an arbi-
3521	          trarily small delta in time values in order to set up a fail-
3522	          over connection with another server.

3524	          If the delta between the time values is too great, the server
3525	          should drop the TCP connection (see section 7.12).

3527	          If the time mismatch is not considered too great then the
3528	          receiving server MUST record the delta between the servers.
3529	          The receiving server MUST use this delta to correct all of the
3530	          absolute times received from the other server in all time-
3531	          valued options.  Note that the failover protocol is con-
3532	          structed so that two servers can be failover partners with
3533	          arbitrarily great time mismatches.

3535	      7.  The receiving server MAY use the vendor-class-identifier to do
3536	          vendor specific processing.

3538	      8.  After accepting a CONNECTACK message, the server MUST send a
3539	          STATE message.

3541	          After receiving a CONNECTACK message, the server MUST start
3542	          two timers for the connection: tSend and tReceive.  The tSend
3543	          timer SHOULD be approximately 33 percent of the time in the
3544	          receive-timer option in the corresponding CONNECTACK message.
3545	          The tReceive timer SHOULD be set to the time sent in the
3546	          receive-timer option in the CONNECT message.

3548	          The tReceive timer is reset whenever a message is received
3549	          from this TCP connection.  If it ever expires, the TCP connec-
3550	          tion is dropped and communications with this partner is con-
3551	          sidered not ok.  The reject reason 17: "No traffic within suf-
3552	          ficient time" is placed in the DISCONNECT message sent prior
3553	          to dropping the TCP connection.

3555	          The tSend timer is reset whenever a message is sent over this
3556	          connection. When it expires, a CONTACT message MUST be sent.
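
The ordering of the checks in the list above can be sketched as a single dispatch function. This is an illustration only: the function name handle_connectack, the dictionary layout of the message, and the returned action strings are assumptions made for the example. The time-delta check of step 6 and the timer setup of step 8 are omitted for brevity.

```python
def handle_connectack(msg, expected_xid, supported_versions=(1,)):
    """Return the next action for a received CONNECTACK.

    msg is assumed to look like {"xid": int, "options": {name: value}}.
    """
    # Step 2: a CONNECTACK for an unknown xid is ignored.
    if msg["xid"] != expected_xid:
        return "ignore"
    opts = msg["options"]
    # Step 3: a reject-reason always ends the connection.
    if "reject-reason" in opts:
        return "close"
    # Step 4: TLS setup happens before any further option processing.
    if opts.get("TLS-reply") == 1:
        return "start-tls"
    # Step 5: protocol version must be one this server can work with.
    if opts.get("protocol-version") not in supported_versions:
        return "close"
    # Step 8: the connection is accepted, so a STATE message follows.
    return "send-state"
```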

3558	7.10.  STATE message [10]

3560	   The state (STATE) message is used to communicate the current failover
3561	   state to the partner server.

3563	   The STATE message MUST be sent after sending a CONNECTACK message
3564	   that didn't contain a reject-reason option, and MUST be sent after
3565	   receiving a CONNECTACK message without a reject-reason option.

3567	   A STATE message MUST be sent whenever the failover endpoint changes
3568	   its failover state and a connection exists to the partner.

3570	   The STATE message requires no response from the failover partner.

3572	   The following table shows the options that MUST appear in a STATE
3573	   message:

3575	   Option
3576	   ------
3577	   server-state                MUST
3578	   server-flags                MUST
3579	   start-time-of-state         MUST

3581	              Table 7.10-1: Options used in a STATE message

3583	7.10.1.  Sending the STATE message

3585	   The current failover state is placed in the server-state option and
3586	   the current state of the STARTUP flag is placed in the server-flags
3587	   option.

3589	   The message is sent with a unique xid.

3591	   A server SHOULD only send the STATE message either when the
3592	   connection is created (i.e., after sending or receiving a CONNECTACK
3593	   message with no reject-reason option), or when there is a change
3594	   from the values sent in a previous STATE message.
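
The sending rule above (send on connection creation, otherwise only on change) can be sketched as follows. The class name StateAnnouncer, the method name on_event, and the dictionary used to stand in for a STATE message are assumptions made for the example; a real implementation would also assign a unique xid.

```python
class StateAnnouncer:
    """Decides whether a STATE message needs to be sent."""

    def __init__(self):
        self._last = None  # option values sent in the previous STATE

    def on_event(self, state, startup_flag, start_time,
                 connection_created=False):
        """Return the STATE options to send, or None if nothing changed."""
        current = {
            "server-state": state,
            "server-flags": {"STARTUP": startup_flag},
            "start-time-of-state": start_time,
        }
        if connection_created or current != self._last:
            self._last = current
            return current
        return None
```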

3596	7.10.2.  Receiving the STATE message

3598	   Every STATE message SHOULD indicate a change in state or a change in
3599	   the flags.

3601	   When a STATE message is received, any state transitions specified in
3602	   section 9 are taken.

3604	   No response to a STATE message is required.

3606	7.11.  CONTACT message [11]

3608	   The contact (CONTACT) message is sent to verify communications
3609	   integrity with a failover partner.  The CONTACT message is sent when
3610	   no messages have been sent to the failover partner for a specified
3611	   period of time.  This is determined by the tSend timer expiring (see
3612	   section 8.3).

3614	   The CONTACT message has no message specific options.

3616	7.11.1.  Sending the CONTACT message

3618	   The CONTACT message is sent.

3620	7.11.2.  Receiving the CONTACT message

3622	   When a CONTACT message is received, the tReceive timer is reset (as
3623	   it is with any message that is received).

3625	   A server SHOULD use the time in the time field and the time the mes-
3626	   sage was received to refine the delta time calculations between the
3627	   servers.

3629	7.12.  DISCONNECT message [12]

3631	   The DISCONNECT is the last message sent over a connection before
3632	   dropping an established connection (note that an established connec-
3633	   tion is one where a CONNECTACK has been sent without a reject rea-
3634	   son).

3636	   After sending or receiving a DISCONNECT message, a server needs to
3637	   have some mechanism to prevent an error loop. Simply reconnecting to
3638	   the partner immediately is not the best option, especially after
3639	   several consecutive attempts.

3641	   A simple suggested solution is to wait a minute or two after sending
3642	   or receiving a DISCONNECT before attempting to reestablish communica-
3643	   tion.

3645	   The DISCONNECT message MUST be the last message sent down a connec-
3646	   tion before it is closed.

3648	   The following table summarizes the options that are associated with
3649	   the DISCONNECT message:

3651	   Option
3652	   ------
3653	   reject-reason               MUST
3654	   message                     SHOULD

3656	              Table 7.12-1: Options used in a DISCONNECT message

3658	7.12.1.  Sending the DISCONNECT message

3660	   The DISCONNECT message MUST be the last message sent by a server
3661	   which is dropping a TCP connection.

3663	   The xid of the DISCONNECT message must be unique.

3665	   The reject-reason option MUST appear giving a reason why the connec-
3666	   tion was dropped.  A message option SHOULD appear giving a human
3667	   readable error message with possibly more details.

3669	7.12.2.  Receiving the DISCONNECT message

3671	   When a server receives a DISCONNECT message it should log the message
3672	   if there was one and possibly raise an alarm of some sort if the
3673	   reject reason was one that was sufficiently serious.

3675	8.  Connection Management

3677	   Servers participating in the failover protocol communicate over TCP
3678	   connections.   These TCP connections are used both to transmit bind-
3679	   ing information from one server to another as well as to allow each
3680	   server to determine whether communications is possible with the other
3681	   server.

3683	   Central to the operation of the failover protocol is a notion of
3684	   "communications okay" or "communications failed".  Failover state
3685	   transitions are taken in many cases when the status of communications
3686	   with the partner changes, and the existence or non-existence of a TCP
3687	   connection between failover endpoints is used to determine if
3688	   communications is "okay" or "failed".

3690	   A single TCP connection exists which connects two failover endpoints.

3692	8.1.  Connection granularity

3694	   There exists one TCP connection between each set of failover end-
3695	   points.  See section 5.1.1 for an explanation of failover endpoints.

3697	   There are a maximum of two TCP connections between any two servers
3698	   implementing the failover protocol, one for each of the possible
3699	   failover endpoints between these two servers.  There is a minimum of
3700	   one TCP connection between one server and every other failover server
3701	   with which it implements the failover protocol.

3703	8.2.  Creating the TCP connection

3705	   There are two ports used for initiating TCP connections, correspond-
3706	   ing to the two roles that a server can fill with respect to another
3707	   server.  Every server implementing the failover protocol MUST listen
3708	   on at least one of these ports.  Port 647 is the port to which pri-
3709	   mary servers will attempt a connection, and port 847 is the port to
3710	   which secondary servers will attempt a connection. When a connection
3711	   attempt is received on port 647, it is therefore from a primary
3712	   server, and the primary server is attempting to connect to this
3713	   secondary server. Likewise, when a connection attempt is received on
3714	   port 847, it is therefore from a secondary server, and the secondary
3715	   server is attempting to connect to this primary server.  See the
3716	   schematic representation below:

3718	      Primary Server
3719	      --------------
3720	       Listens on port 847 for secondary server to connect to it
3721	       Periodically connects on port 647 to contact secondary

3723	      Secondary Server
3724	      --------------
3725	       Listens on port 647 for primary server to connect to it
3726	       Periodically connects on port 847 to contact primary

3728	   Every server implementing the failover protocol SHOULD attempt to
3729	   connect to all of its partners periodically, where the period is
3730	   implementation dependent and SHOULD be configurable.  In the event
3731	   that a connection has been rejected by a CONNECTACK message with a
3732	   reject-reason option contained in it or a DISCONNECT message, a
3733	   server SHOULD reduce the frequency with which it attempts to connect
3734	   to that server but it SHOULD continue to attempt to connect periodi-
3735	   cally.

3737	   If a connection attempt has been received from another server in a
3738	   particular role (i.e., from a specific failover endpoint) then the
3739	   receiving server MUST NOT initiate a connection attempt to the
3740	   partner server in that same role.

3742	   If both servers happen to attempt to connect simultaneously, the
3743	   secondary server MUST drop its attempt in favor of the primary's
3744	   attempt.  Thus, in the event that a secondary server receives a con-
3745	   nection attempt to port 647 from a primary server when it has already
3746	   initiated a connection attempt to port 847 on the same primary
3747	   server, it MUST accept the connection to port 647 and it MUST drop
3748	   the connection attempt to port 847.  In the event that a primary
3749	   server receives a connection attempt to port 847 from a secondary
3750	   server when it has already initiated a connection attempt to port 647
3751	   on that same server, it MUST reject the connection attempt to port
3752	   847 and continue to pursue the connection attempt on port 647.
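
The port roles and the tie-breaking rule for simultaneous connection attempts can be sketched as follows. The function names ports_for and resolve_simultaneous_connect and the action strings are assumptions made for the example; only the port numbers and the required outcomes come from the text above.

```python
PRIMARY_TARGET_PORT = 647    # primaries connect here; secondaries listen
SECONDARY_TARGET_PORT = 847  # secondaries connect here; primaries listen


def ports_for(role):
    """Return (listen_port, connect_port) for 'primary' or 'secondary'."""
    if role == "primary":
        return SECONDARY_TARGET_PORT, PRIMARY_TARGET_PORT
    if role == "secondary":
        return PRIMARY_TARGET_PORT, SECONDARY_TARGET_PORT
    raise ValueError("unknown role: %r" % (role,))


def resolve_simultaneous_connect(role):
    """Return ((inbound_action, port), (outbound_action, port)).

    Applies when an inbound attempt arrives while this server's own
    outbound attempt to the same partner is still in progress.
    """
    listen_port, connect_port = ports_for(role)
    if role == "secondary":
        # The secondary yields to the primary's attempt.
        return ("accept", listen_port), ("drop", connect_port)
    # The primary keeps pursuing its own attempt.
    return ("reject", listen_port), ("continue", connect_port)
```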

3754	   Once a connection is established, the primary server MUST send a CON-
3755	   NECT message across the connection.  A secondary server MUST wait for
3756	   the CONNECT message from a primary server.

3758	   Every CONNECT message includes a TLS-request option, and if the CON-
3759	   NECTACK message does not reject the CONNECT message and the TLS-reply
3760	   option says TLS MUST be used, then the servers will immediately enter
3761	   into TLS negotiation.

3763	   Once TLS negotiation is complete, the primary server MUST resend the
3764	   CONNECT message on the newly secured TLS connection and then wait for
3765	   the CONNECTACK message in response.  The TLS-request and TLS-reply
3766	   options MUST NOT appear in either this second CONNECT or its associ-
3767	   ated CONNECTACK message as they had in the first messages.

3769	   The second message sent over a new connection (either a bare TCP con-
3770	   nection or a connection utilizing TLS) is a STATE message.  Upon the
3771	   receipt of this message, the receiver can consider communications up.

3773	   It is entirely possible that two servers will attempt to make connec-
3774	   tions to each other essentially simultaneously, and in this case the
3775	   secondary server will be waiting for a CONNECT message on each con-
3776	   nection.  The primary server MUST send a CONNECT message over one
3777	   connection and it MUST close the other connection.

3779	   A secondary server MUST NOT respond to the closing of a TCP connec-
3780	   tion with a blind attempt to reconnect -- there may be another TCP
3781	   connection to the same failover partner already in use.

3783	8.3.  Using the TCP connection for determining communications status

3785	   The TCP connection is used to determine the communications status of
3786	   the other server, i.e., communications-ok, or communications-
3787	   interrupted.

3789	   Three things must happen for a server to consider that communications
3790	   are ok with respect to another server:

3792	      1.  A TCP connection must be established to the other server.

3794	      2.  A CONNECT message must be received and a CONNECTACK message
3795	          sent in response.  The CONNECT message is used to determine
3796	          the identity of the failover endpoint at the other end of the
3797	          TCP connection -- without it, the failover endpoint cannot be
3798	          uniquely determined.  Without knowledge of the failover
3799	          endpoint, the entity with which communications is ok is
3800	          undetermined.

3802	      3.  A STATE message must be received from the other server over
3803	          the connection.  This STATE message initializes important
3804	          information necessary to the operation of the state machine
3805	          that governs the behavior of this failover endpoint.

3807	   There are two ways that a server can determine that communications
3808	   has failed:

3810	      1.  The TCP connection can go down, yielding an error when
3811	          attempting to send or receive a message. This will happen at
3812	          least as often as the period of the tSend timer.

3814	      2.  The tReceive timer can expire.

3816	   In either of these cases, communications is considered interrupted.

3818	   If the tReceive timer expires, the connection MUST be dropped.  The
3819	   reject reason 17: "No traffic within sufficient time" is placed in
3820	   the DISCONNECT message sent prior to dropping the TCP connection.

3822	   Several difficulties arise when trying to use one TCP connection for
3823	   both bulk data transfer as well as to sense the communications status
3824	   of the other server.   One aspect of the problem stems from the dif-
3825	   ferent requirements of both uses.  The bulk data transfer is of
3826	   course critically important to the protocol, but the speed with which
3827	   it is processed is not terribly significant.  It might well be
3828	   minutes before a BNDUPD message is processed, and while not optimal,
3829	   such an occasional delay doesn't compromise the correctness of the
3830	   protocol. However, the speed with which one server detects the other
3831	   server is up (or, more importantly, down) is more highly constrained.
3832	   Generally one server should be able to detect that the other server
3833	   is not communicating within a minute or less.

3835	   These differing time constraints make it difficult to use the same
3836	   TCP connection for data transfer as well as to sense communications
3837	   integrity.   See section 3.5 for additional details on TCP.

3839	   The solution to this problem is to require that some message be
3840	   received by each end of the connection within a limited time or that
3841	   the connection will be considered down.  If no messages have been
3842	   sent recently, then a CONTACT message is sent.

3844	   In the case where there is no data queued to be sent, this is not a
3845	   problem, but in the case where there is data queued to be sent to the
3846	   partner, then the CONTACT message will not actually be transmitted
3847	   until the queued data is sent.  Section 3.5 explains why waiting for
3848	   TCP to determine that the connection is down is not acceptable, and
3849	   leads to a requirement that the receiving server never block the
3850	   sending server from sending CONTACT messages.

3852	   In order to meet this requirement, each server tells the other server
3853	   the number of outstanding BNDUPD messages that it will accept.  The
3854	   receiving server is required to always be able to accept that many
3855	   BNDUPD messages off of the connection's input queue even if it cannot
3856	   process them immediately, and to accept all other messages immedi-
3857	   ately.

3859	   Thus, the sending server's TCP is never blocked from sending a mes-
3860	   sage except for very short periods, less than a few seconds unless
3861	   the network connection itself has problems.  In this case, if the
3862	   CONTACT messages don't make it to the partner then the partner will
3863	   close the connection.

3865	   DISCUSSION:

3867	      When implementing this capability, one needs to be careful when
3868	      sending any message on the TCP connection as TCP can easily block
3869	      the server if the local TCP send buffers are full.  This can't be
3870	      prevented because if the receiver is not reachable (via the net-
3871	      work), the sending TCP can't send and thus it will be unable to
3872	      empty the local TCP send buffers.  So, all send operations either
3873	      need to assume they may block for some time or non-blocking sends
3874	      must be used carefully.
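
One way to follow the advice in the discussion above is to use non-blocking sends and treat a full send buffer as "try again later". The sketch below is only an illustration: the helper name try_send is an assumption, and a real server would keep the unsent remainder and retry once the socket becomes writable again.

```python
import socket


def try_send(sock, data):
    """Send what the kernel will accept without blocking.

    Returns the number of bytes accepted, or 0 if the local TCP send
    buffer is full and the call would have blocked.
    """
    try:
        return sock.send(data)
    except BlockingIOError:
        return 0
```

The socket must first be placed in non-blocking mode with sock.setblocking(False). Because send() may accept only part of the data, the caller has to track how much of each message has actually been written.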

3876	8.4.  Using the TCP connection for binding data

3878	   Binding data, in the form of BNDUPD messages and BNDACK messages to
3879	   respond to them, are sent across the TCP connection.

3881	   In order to support timely detection of any failure in the partner
3882	   server, the TCP connection MUST NOT block for more than a very short
3883	   time, on the order of a few seconds.  Therefore, a server that is
3884	   sending BNDUPD messages MUST send only a restricted number before
3885	   receiving BNDACK messages about previous messages sent.

3887	   The number of outstanding BNDUPD messages that each server will
3888	   accept without causing TCP to block transmission of additional data
3889	   (i.e., CONTACT messages) is sent by each server in the CONNECT and
3890	   CONNECTACK messages in the max-unacked-bndupd option.
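
The restriction on outstanding BNDUPD messages amounts to a send window sized by the partner's max-unacked-bndupd value. A minimal sketch follows; the class name BndupdSender and its methods are assumptions made for the example, and the returned lists stand in for messages that would be written to the TCP connection.

```python
from collections import deque


class BndupdSender:
    """Limits unacknowledged BNDUPD messages to the partner's stated maximum."""

    def __init__(self, partner_max_unacked):
        self.limit = partner_max_unacked
        self.unacked = set()   # xids sent and awaiting a BNDACK
        self.queue = deque()   # (xid, payload) waiting for room

    def submit(self, xid, payload):
        """Queue a BNDUPD; return the messages that may be sent now."""
        self.queue.append((xid, payload))
        return self._drain()

    def ack(self, xid):
        """Record a BNDACK; return any queued messages now allowed out."""
        self.unacked.discard(xid)
        return self._drain()

    def _drain(self):
        sendable = []
        while self.queue and len(self.unacked) < self.limit:
            xid, payload = self.queue.popleft()
            self.unacked.add(xid)
            sendable.append((xid, payload))
        return sendable
```

Control messages such as CONTACT bypass this window, since the partner is required to accept them immediately.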

3892	8.5.  Using the TCP connection for control messages

3894	   The TCP connection is used for control messages: POOLREQ, UPDREQ,
3895	   STATE, CONTACT, UPDREQALL and the corresponding reply messages: POOL-
3896	   RESP, UPDDONE.  A server MUST immediately accept all of these mes-
3897	   sages from the TCP connection.  A server MUST immediately accept any
3898	   BNDACK which is received as well.

3900	8.6.  Losing the TCP connection

3902	   When the TCP connection is lost, then communications is not ok with
3903	   the other server.  A server which has lost communications SHOULD
3904	   immediately attempt to reconnect to the other server, and should
3905	   retry these connection attempts periodically.

3907	   An acknowledgement message (BNDACK, POOLRESP, UPDDONE) can
3908	   only be sent in response to a request message (BNDUPD, POOLREQ,
3909	   UPDREQ, UPDREQALL) on the same TCP connection from which the request
3910	   was received, in part since the XID's in the request messages are
3911	   guaranteed unique only during the life of a single TCP connection.

3913	   When a connection to a partner server goes down, a server with unpro-
3914	   cessed request messages MAY simply drop all of those messages, since
3915	   it can be sure that the partner will resend them when they are next
3916	   in communications.  A server with unprocessed BNDUPD messages when a
3917	   TCP connection goes down MAY instead choose to process those BNDUPD
3918	   messages, but it MUST NOT send any BNDACK messages in response (again
3919	   because of the issues surrounding XID uniqueness).
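
The two permitted ways of handling unprocessed requests after a connection loss can be sketched as follows. The function name drain_on_disconnect, the tuple layout of the pending list, and the apply_bndupd callback are assumptions made for the example.

```python
def drain_on_disconnect(pending, apply_bndupd=None):
    """Handle requests left unprocessed when the TCP connection went down.

    pending: list of (msg_type, xid, body) tuples, emptied by this call.
    apply_bndupd: optional callable(body); if given, BNDUPDs are still
        applied locally, otherwise every pending request is dropped.
    Returns the acknowledgements to send, which is always an empty list,
    because xids are unique only within the lost connection.
    """
    if apply_bndupd is not None:
        for msg_type, _xid, body in pending:
            if msg_type == "BNDUPD":
                apply_bndupd(body)
    pending.clear()
    return []
```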

3921	   When the TCP connection is closed explicitly, the DISCONNECT message
3922	   with a reject-reason option (and, ideally, a message option) MUST be
3923	   sent over the TCP connection.

3925	9.  Failover Endpoint States

3927	   This section discusses the various states that a failover endpoint
3928	   may take, and the server actions required when entering the state,
3929	   operating in the state, and leaving the state, as well as the events
3930	   that cause transitions out of the state into another state.

3932	   The state transition diagram in Figure 9.2-1 is relevant for this
3933	   section. This is the common state transition diagram for both servers
3934	   in a failover pair.  In the event that the textual description of a
3935	   state differs from the state transition diagram, the textual descrip-
3936	   tion is to be considered authoritative.

3938	9.1.  Server Initialization

3940	   When a server starts it starts out in STARTUP state.  See section 9.3
3941	   below for details.

3943	9.2.  Server State Transitions

3945	   Whenever a server makes a transition into a new state, it MUST record
3946	   the state and the time at which it entered that state in stable
3947	   storage.  If communications is "ok", it MUST also send a STATE mes-
3948	   sage to its failover partner.
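
The requirement above can be sketched as a single helper that persists the new state before announcing it. This is an illustration under stated assumptions: the function name enter_state, the JSON file format, and the send_state callback are all hypothetical, and "stable storage" is modeled as a file replaced atomically after an fsync.

```python
import json
import os
import tempfile
import time


def enter_state(path, new_state, comms_ok, send_state, now=None):
    """Record a state transition in stable storage, then notify the partner."""
    now = int(time.time()) if now is None else now
    record = {"state": new_state, "start-time-of-state": now}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a partial record.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic rename over the old record
    if comms_ok:
        send_state(record)
    return record
```

On restart, a server would read the same file to recover the state it was in and the time at which it entered that state.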

3950	   Figure 9.2-1 is the diagram of the server state transitions. The
3951	   remainder of this section contains information important to the
3952	   understanding of that diagram.

3954	   The server stays in the current state until all of the actions
3955	   specified on the state transition are complete.  If communications
3956	   fails during one of the actions, the server simply stays in the
3957	   current state and attempts a transition whenever the conditions for a
3958	   transition are later fulfilled.

3960	   In the state transition diagram below, the "+" or "-" in the upper
3961	   right corner of each state is a notation about whether communication
3962	   is ongoing with the other server.

3964	   The legend "responsive", "balanced", or "unresponsive" in each state
3965	   indicates whether the server is responsive to all DHCP client
3966	   requests, running in load balanced mode, or totally unresponsive in
3967	   the respective state.  The terms "responsive" and "unresponsive" have
3968	   the obvious meanings, while "balanced" means that a DHCP server may
3969	   respond to all DHCPREQUEST messages that are RENEWAL or REBINDING,
3970	   and to all other messages from clients to which the load balancing
3971	   algorithm indicates that it MUST respond.  See sections 5.3 and
3972	   9.8.2 for details on load balancing.
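
   The three legends can be read as a simple predicate.  The sketch
   below is illustrative only; should_respond and its parameters are
   hypothetical names, and the load balancing decision itself is
   assumed to be computed elsewhere.

```python
def should_respond(mode, is_renew_or_rebind, load_balance_selects_us):
    """Decide whether to answer a client message, given the state's legend.

    mode is one of "responsive", "balanced", or "unresponsive".
    """
    if mode == "responsive":
        return True
    if mode == "unresponsive":
        return False
    if mode == "balanced":
        # RENEWAL and REBINDING requests are always answered; anything
        # else is answered only if load balancing assigns it to us.
        return is_renew_or_rebind or load_balance_selects_us
    raise ValueError("unknown mode: %r" % (mode,))
```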

3974	   In the state transition diagram below, when communication is reesta-
3975	   blished between the two servers, each must record the state of the
3976	   partner when communication was restored.  State transitions on one
3977	   server in some cases imply state transitions on the partner server,
3978	   so a record of the current state of the partner server must be kept
3979	   by each server.

3981	   If the state of the partner changes while communicating, a server
3982	   moves through the communications-failed transition and into whatever
3983	   state results.  It then immediately moves through whatever state
3984	   transition is appropriate given the current state of the partner
3985	   server.  A server performing this operation SHOULD NOT close the TCP
3986	   connection to its partner.

3988	   DISCUSSION:

3990	      The point of this technique is simplicity, both in explanation of
3991	      the protocol and in its implementation.  The alternative to this
3992	      technique of memory of partner state and automatic state transi-
3993	      tion on change of partner state is to have every state in the fol-
3994	      lowing diagram have a state transition for every possible state of
3995	      the partner.  With the approach adopted, only the states in which
3996	      communications are reestablished require a state transition for
3997	      each possible partner state.

3999	   The current state of a server MUST be recorded in stable storage and
4000	   thus be available to the server after a server restart.

4002	   A transition into SHUTDOWN or PAUSED state is not represented in the
4003	   following figure, since other than sending that state to its partner,
4004	   the remaining actions involved look just like the server halting in
4005	   its otherwise current state, which then becomes the previous state
4006	   upon server restart.

4008	        +---------------+  V  +--------------+
4009	        |    RECOVER -|+|  |  |   STARTUP  - |
4010	        |(unresponsive) |  +->+(unresponsive)|
4011	        +------+--------+     +--------------+
4012	        +-Comm. OK             +-----------------+
4013	        |     Other State:     |  PARTNER DOWN - +<----------------------+
4014	        |    RESOLUTION-INTER. | (responsive)    |                       ^
4015	       All     POTENTIAL-      +----+------------+                       |
4016	      Others   CONFLICT------------ | --------+                          |
4017	        |      CONFLICT-DONE     Comm. OK     |     +--------------+     |
4018	     UPDREQ or                 Other State:   |  +--+ RESOLUTION - |     |
4019	     UPDREQALL                  |       |     |  |  | INTERRUPTED  |     |
4020	     Rcv UPDDONE             RECOVER    All   |  |  | (responsive) |     |
4021	        |  +---------------+    |      Others |  |  +------------+-+     |
4022	        +->+RECOVER-WAIT +-| RECOVER    |     |  |         ^     |       |
4023	           |(unresponsive) |  WAIT or   |     |  Comm.     |    Ext.     |
4024	           +-----------+---+  DONE      |     |  OK     Comm.   Cmd----->+
4025	    Comm.---+     Wait MCLT     |       V     V  V     Failed            |
4026	    Changed |          V    +---+   +---+-----+--+-+       |             |
4027	     |  +---+----------++   |       |  POTENTIAL + +-------+             |
4028	     |  |RECOVER-DONE +-|  Wait     |  CONFLICT    +------+              |
4029	     +->+(unresponsive) |  for      |(unresponsive)|   Primary           |
4030	        +------+--------+  Other  +>+----+--------++   resolve     Comm. |
4031	         Comm. OK          State: |      |        ^    conflict  Changed |
4032	    +---Other State:-+   RECOVER  |   Secondary   |       V       V   |  |
4033	    |    |           |     DONE   |    resolve    |   ++----------+---++ |
4034	    | All Others:  POTENT.  |     |   conflict    |   |CONFLICT-DONE-|+| |
4035	    | Wait for    CONFLICT- | ----+    see (9.10) |   | (responsive)   | |
4036	    | Other State:          V            V        |   +------+---------+ |
4037	    | NORMAL or RECOVER    ++------------+---+      Other State: NORMAL  |
4038	    |    |       DONE      |     NORMAL    + +<--------------+           |
4039	    |    +--+----------+-->+   (balanced)    +-------External Command--->+
4040	    |       ^          ^   +--------+--------+       or Other State:     |
4041	    |       |          |            |             |  SHUTDOWN            |
4042	    |   Wait for   Comm. OK  Comm. Failed or      |                      |
4043	    |    Other      Other    Other State: PAUSED  |               External
4044	    |    State:     State:          |             |                Command
4045	    | RECOVER-DONE  NORMAL     Start Safe      Comm. OK                or
4046	    |       |     COMM. INT.  Period Timer    Other State:            Safe
4047	    |    Comm. OK.     |            V          All Others           Period
4048	    |   Other State:   |  +---------+--------+    |             expiration
4049	    |     RECOVER      +--+ COMMUNICATIONS - +----+                      |
4050	    |       +-------------+   INTERRUPTED    |                           |
4051	    RECOVER               |  (responsive)    +-------------------------->+
4052	    RECOVER-WAIT--------->+------------------+
4053	                    Figure 9.2-1:  Server state diagram.

4055	9.3.  STARTUP state

4057	   The STARTUP state affords an opportunity for a server to probe its
4058	   partner server, before starting to service DHCP clients.

4060	   DISCUSSION:

4062	      Without the STARTUP state, a server would likely start in a state
4063	      derived from its previously stored state (held in stable storage),
4064	      if any.  However, this may be inconsistent with the current state
4065	      of the partner.  The STARTUP state affords the opportunity for a
4066	      server to potentially learn the partner's state and determine if
4067	      that state is consistent with its derived starting state or
4068	      whether some significant state change has occurred at the partner
4069	      that forces the server to start in another state.  This is
4070	      especially critical if significant time has elapsed while the
4071	      server was down.

4073	9.3.1.  Operation while in STARTUP state

4075	   Whenever a server is in STARTUP state, it MUST be unresponsive to
4076	   DHCP client requests, and so the time spent in the STARTUP state is
4077	   necessarily short, typically on the order of a few seconds to a few
4078	   tens of seconds.  The exact time spent in the STARTUP state is imple-
4079	   mentation dependent, and the primary and secondary server are not
4080	   required to spend the same amount of time in the STARTUP state.  See
4081	   section 5.9 for some guidelines on the time to spend in STARTUP
4082	   state.

4084	   Whenever a STATE message is sent to the partner while in STARTUP
4085	   state the STARTUP bit MUST be set in the server-flags option and the
4086	   previously recorded failover state MUST be placed in the server-state
4087	   option.

4089	9.3.2.  Transition out of STARTUP state

4091	   Each server starts out in STARTUP state every time it initializes
4092	   itself, and performs the following algorithm as part of its initiali-
4093	   zation:

4095	      1.  Is there any record in stable storage of a previous failover
4096	          state?  If yes, set previous-state to the last recorded state
4097	          in stable storage, and continue with step 2.

4099	          Is there any configuration information that indicates that
4100	          this server was previously running but lost its stable
4101	          storage?  Such information must typically come from some
4102	          administrative intervention, since it is difficult for a
4103	          server to distinguish first startup from a startup after it
4104	          has lost its stable storage.  If yes, then set the previous-
4105	          state to RECOVER, and set the time-of-failure to whatever time
4106	          was configured, and go on to step 2.  This time-of-failure
4107	          will be used in the transition out of the RECOVER-WAIT state
4108	          into the RECOVER-DONE state, below.

4110	          If there is no record of any previous failover state in stable
4111	          storage for this server, then set the previous-state to
4112	          RECOVER and set the time-of-failure to a time more than the
4113	          maximum-client-lead-time in the past.  If using standard POSIX
4114	          times, 0 would typically do quite well.  This will allow two
4115	          servers which already have lease information to synchronize
4116	          themselves prior to operating.

4118	          Note that neither server is responsive to DHCP client requests
4119	          while in the RECOVER state.  If both servers can communicate,
4120	          however, they will come out of the RECOVER state and progress
4121	          through RECOVER-WAIT to RECOVER-DONE and thence to NORMAL or
4122	          COMMUNICATIONS-INTERRUPTED state quickly.  If both have state,
4123	          then they will exchange information.  If only one has state,
4124	          then the one that does not will complete its update of its
4125	          partner quickly (since it has nothing to send).

4127	          In some cases, an existing server will be commissioned as a
4128	          failover server and brought back into operation where its
4129	          partner is not yet available.  In this case, the newly commis-
4130	          sioned failover server will not operate until its partner
4131	          comes online -- but it has operational responsibilities as a
4132	          DHCP server nonetheless.  To properly handle this situation, a
4133	          server SHOULD be configurable in such a way as to move
4134	          directly into PARTNER-DOWN state after the startup period
4135	          expires if it has been unable to contact its partner during
4136	          the startup period.

4138	      2.  If the previous state is one where communications was "OK",
4139	          then set the previous state to the state that is the result of
4140	          the communications failed state transition in Figure 9.2-1 (if
4141	          such transition is shown -- some states don't have a communi-
4142	          cations failed state transition, since they allow both commun-
4143	          ications OK and failed).

4145	      3.  Start the STARTUP state timer.  The time that a server remains
4146	          in the STARTUP state (absent any communications with its
4147	          partner) is implementation dependent and SHOULD be
4148	          configurable.  It SHOULD be long enough for a TCP connection
4149	          to be created to a heavily loaded partner across a slow net-
4150	          work.

4152	      4.  Attempt to create a TCP connection to the failover partner.
4153	          See section 8.2.

4155	      5.  Wait for "communications okay", i.e., the process discussed in
4156	          section 8.2 "Creating the TCP Connection", to complete,
4157	          including the receipt of a STATE message from the partner.

4159	          When and if communications become "okay", clear the STARTUP
4160	          flag, and set the current state to the previous-state.

4162	          If the partner is in PARTNER-DOWN state, and if the time at
4163	          which it entered PARTNER-DOWN state (as received in the
4164	          start-time-of-state option in the STATE message) is later than
4165	          the last recorded time of operation of this server, then set
4166	          the current state to RECOVER.  If the time at which it entered
4167	          PARTNER-DOWN state is earlier than the last recorded time of
4168	          operation of this server, then set the current state to
4169	          POTENTIAL-CONFLICT.

4171	          Then, transition to the current state and take the "communica-
4172	          tions okay" state transition based on the current state of
4173	          this server and the partner.

4175	      6.  If the startup time expires, take an implementation dependent
4176	          action:  The server MAY go to the previous-state, or the
4177	          server MAY wait.

4179	          Reasons to go to previous-state and begin processing:

4181	          If the current server is the only operational server, then if
4182	          it waits, there will be no operational DHCP servers.  This
4183	          situation could occur very easily where one server fails and
4184	          then the other crashes and reboots.  If the rebooting server
4185	          doesn't start processing DHCP client requests without first
4186	          being in communication with the other server, then the level
4187	          of DHCP redundancy is not particularly high.  This is an
4188	          appropriate approach if the possibility of partition is low,
4189	          or if the safe period expiration time is well beyond the time
4190	          at which an operator would notice and react to a partition
4191	          situation.  It is also quite appropriate if the safe period
4192	          will never expire.

4194	          Reasons to wait:

4196	          If the current server has been down for longer than the
4197	          maximum-client-lead-time, and it is partitioned from the other
4198	          server, then when it returns it will attempt to use its own
4199	          available addresses to allocate to new DHCP clients, and the
4200	          other server may well be in PARTNER-DOWN state and may have
4201	          already allocated some of those available addresses to DHCP
4202	          clients.  In cases where the possibility of partition is high,
4203	          and the safe period expiration time is less than the likely
4204	          operator reaction time, this is a good approach to use.
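
   Steps 1 and 5 of the algorithm above can be sketched as two small
   functions.  This is a rough illustration under stated assumptions:
   StableStorage and both function names are hypothetical, step 2
   (applying the communications failed transition from Figure 9.2-1)
   is left out, and the case where the two times compared in step 5
   are equal is not addressed by the text, so the sketch simply keeps
   previous-state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StableStorage:
    """Hypothetical view of what survives a restart."""
    last_state: Optional[str] = None                 # last recorded state
    configured_failure_time: Optional[float] = None  # set by an operator


def previous_state_at_startup(storage: StableStorage) -> Tuple[str, Optional[float]]:
    """Step 1: choose previous-state and, when recovering, time-of-failure."""
    if storage.last_state is not None:
        return storage.last_state, None
    if storage.configured_failure_time is not None:
        return "RECOVER", storage.configured_failure_time
    # No record at all: POSIX time 0 is well over one MCLT in the past.
    return "RECOVER", 0.0


def state_after_contact(previous_state, partner_state,
                        partner_state_start, last_operation_time):
    """Step 5: adjust the starting state once the partner's STATE arrives."""
    if partner_state == "PARTNER-DOWN":
        if partner_state_start > last_operation_time:
            return "RECOVER"
        if partner_state_start < last_operation_time:
            return "POTENTIAL-CONFLICT"
    # Equal times, or any other partner state: keep previous-state.
    return previous_state
```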

4206	9.4.  PARTNER-DOWN state

4208	   PARTNER-DOWN state is a state either server can enter.  When in this
4209	   state, the server does not assume that the other server could still
4210	   be operating and servicing a different set of clients, but instead
4211	   assumes that it is the only server operating. If one server is in
4212	   PARTNER-DOWN state, the other server MUST NOT be operating.

4214	9.4.1.  Upon entry to PARTNER-DOWN state

4216	   No special actions are required when entering PARTNER-DOWN state.

4218	   The server should continue to attempt to connect to the partner
4219	   periodically.

4221	9.4.2.  Operation while in PARTNER-DOWN state

4223	   A server in PARTNER-DOWN state MUST respond to DHCP client requests.
4224	   It will allow renewal of all outstanding leases on IP addresses, and
4225	   will allocate IP addresses from its own pool, and after a fixed
4226	   period of time (the MCLT interval) has elapsed from entry into
4227	   PARTNER-DOWN state, it will allocate IP addresses from the set of all
4228	   available IP addresses.

4230	   Once a server has entered NORMAL state, the PARTNER-DOWN state is
4231	   entered only on command of an external agency (typically an adminis-
4232	   trator of some sort) or after the expiration of an externally config-
4233	   ured minimum safe-time after the beginning of COMMUNICATIONS-
4234	   INTERRUPTED state.

4236	   Any IP address tagged as available for allocation by the other server
4237	   (at entry to PARTNER-DOWN state) MUST NOT be allocated to a new
4238	   client until the maximum-client-lead-time beyond the entry into
4239	   PARTNER-DOWN state has elapsed.

4241	   A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
4242	   DHCP client different from that to which it was allocated at the
4243	   entrance to PARTNER-DOWN state until the maximum-client-lead-time
4244	   beyond the maximum of the following times: client expiration time,
4245	   most recently transmitted potential-expiration-time, most recently
4246	   received ack of potential-expiration-time from the partner, and most
4247	   recently acked potential-expiration-time to the partner.  See section
4248	   7.1.5 for details.  If this time would be earlier than the current
4249	   time plus the maximum-client-lead-time, then the time the server
4250	   entered PARTNER-DOWN state plus the maximum-client-lead-time is used.
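
   The two timing rules above can be sketched as follows.  This is one
   literal reading of the text, not a definitive interpretation; the
   function and parameter names are hypothetical, and all times are
   assumed to be seconds on a common clock.

```python
def partner_free_address_usable(now, entered_partner_down, mclt):
    """True once an address that was available to the partner may be
    allocated to a new client (MCLT after entry to PARTNER-DOWN)."""
    return now >= entered_partner_down + mclt


def earliest_reallocation_time(client_expiration, sent_pet,
                               pet_acked_by_partner, pet_acked_to_partner,
                               entered_partner_down, now, mclt):
    """Earliest time an address may be given to a different client.

    "pet" abbreviates potential-expiration-time.  The fallback branch
    follows the last sentence of the paragraph above as written.
    """
    candidate = max(client_expiration, sent_pet,
                    pet_acked_by_partner, pet_acked_to_partner) + mclt
    if candidate < now + mclt:
        return entered_partner_down + mclt
    return candidate
```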

4252	   Two options exist for lease times given out while in PARTNER-DOWN
4253	   state, with different ramifications flowing from each.

4255	   If the server wishes the Failover protocol to protect it from loss of
4256	   stable storage in PARTNER-DOWN state, then it should ensure that the
4257	   MCLT based lease time restrictions in section 5.1 are maintained,
4258	   even in PARTNER-DOWN state.

4260	   If the server wishes to forgo the protection of the Failover proto-
4261	   col in the event of loss of stable storage, then it need recognize no
4262	   restrictions on actual client lease times while in PARTNER-DOWN
4263	   state.

4265	   A server in PARTNER-DOWN state MUST continue to attempt to establish
4266	   communications and synchronization with its partner.

4268	9.4.3.  Transitions out of PARTNER-DOWN state

4270	   When a server in PARTNER-DOWN state succeeds in establishing a con-
4271	   nection to its partner, its actions are conditional on the state and
4272	   flags received in the STATE message from the other server as part of
4273	   the process of establishing the connection.

4275	   If the STARTUP bit is set in the server-flags option of a received
4276	   STATE message, a server in PARTNER-DOWN state MUST NOT take any state
4277	   transitions based on reestablishing communications. Essentially, if a
4278	   server is in PARTNER-DOWN state, it ignores all STATE messages from
4279	   its partner that have the STARTUP bit set in the server-flags option
4280	   of the STATE message.

4282	   If the STARTUP bit is not set in the server-flags option of a STATE
4283	   message received from its partner, then a server in PARTNER-DOWN
4284	   state takes the following actions based on the value of the server-
4285	   state option in the received STATE message (either immediately after
4286	   establishing communications or at any time later when a new state is
4287	   received):

4289	      o partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN,
4290	        POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE
4291	        state

4293	        transition to POTENTIAL-CONFLICT state

4295	      o partner in RECOVER, RECOVER-WAIT, SHUTDOWN, or PAUSED state

4297	        stay in PARTNER-DOWN state

4299	      o partner in RECOVER-DONE state

4301	        transition into NORMAL state
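
   The rules in this section amount to a small lookup.  The following
   sketch is illustrative; next_state_in_partner_down is a
   hypothetical name, and the state names are taken from the list
   above.

```python
TO_POTENTIAL_CONFLICT = {
    "NORMAL", "COMMUNICATIONS-INTERRUPTED", "PARTNER-DOWN",
    "POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED", "CONFLICT-DONE",
}
STAY_IN_PARTNER_DOWN = {"RECOVER", "RECOVER-WAIT", "SHUTDOWN", "PAUSED"}


def next_state_in_partner_down(partner_state, startup_bit_set):
    """State a server in PARTNER-DOWN moves to after a STATE message."""
    # STATE messages carrying the STARTUP bit cause no transition.
    if startup_bit_set:
        return "PARTNER-DOWN"
    if partner_state in TO_POTENTIAL_CONFLICT:
        return "POTENTIAL-CONFLICT"
    if partner_state in STAY_IN_PARTNER_DOWN:
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError("state not covered by this section: %r" % (partner_state,))
```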

4303	9.5.  RECOVER state

4305	   This state indicates that the server has no information in its stable
4306	   storage or that it is re-integrating with a server in PARTNER-DOWN
4307	   state after it has been down.  A server in this state MUST attempt to
4308	   refresh its stable storage from the other server.

4310	9.5.1.  Operation in RECOVER state

4312	   A server in RECOVER state MUST NOT respond to DHCP client requests.

4314	   A server in RECOVER state will attempt to reestablish communications
4315	   with the other server.

4317	9.5.2.  Transitions out of RECOVER state

4319	   If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED,
4320	   or CONFLICT-DONE state when communications are reestablished, then
4321	   the server in RECOVER state will move to POTENTIAL-CONFLICT state
4322	   itself.

4324	   If the other server is in any other state, then the server in RECOVER
4325	   state will request an update of missing binding information by send-
4326	   ing an UPDREQ or UPDREQALL message.  If the server has been
4327	   instructed (through configuration or other external agency) that it
4328	   has lost its stable storage, or if it has deduced that from the fact
4329	   that it has no record of ever having talked to its partner while its
4330	   partner does have a record of communicating with it, it MUST send an
4331	   UPDREQALL message; otherwise it MUST send an UPDREQ message.  See
4332	   Figure 9.5.2-1.
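
   The choice made on reestablishing communications can be sketched
   as below.  The function names and the returned tuples are
   hypothetical conventions of this sketch, not protocol elements.

```python
CONFLICT_STATES = {"POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED",
                   "CONFLICT-DONE"}


def lost_stable_storage(told_by_operator, we_remember_partner,
                        partner_remembers_us):
    """True if storage loss was configured or can be deduced."""
    deduced = (not we_remember_partner) and partner_remembers_us
    return told_by_operator or deduced


def recover_action(partner_state, storage_lost):
    """What a server in RECOVER does once it can talk to its partner."""
    if partner_state in CONFLICT_STATES:
        return ("transition", "POTENTIAL-CONFLICT")
    if storage_lost:
        return ("send", "UPDREQALL")
    return ("send", "UPDREQ")
```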

4334	   It will wait for an UPDDONE message, and upon receipt of that message
4335	   it will transition to RECOVER-WAIT state.

4337	   If communications fails during the reception of the results of the
4338	   UPDREQ or UPDREQALL message, the server will remain in RECOVER state,
4339	   and will re-issue the UPDREQ or UPDREQALL when communications are
4340	   re-established.  (See section 5.17).

4342	   If an UPDDONE message isn't received within an implementation depen-
4343	   dent amount of time, and no BNDUPD messages are being received, the
4344	   connection SHOULD be dropped.

4346	                A                                        B
4347	              Server                                  Server

4349	                |                                        |
4350	             RECOVER                               PARTNER-DOWN
4351	                |                                        |
4352	                | >--UPDREQ-------------------->         |
4353	                |                                        |
4354	                |        <---------------------BNDUPD--< |
4355	                | >--BNDACK-------------------->         |
4356	               ...                                      ...
4357	                |                                        |
4358	                |        <---------------------BNDUPD--< |
4359	                | >--BNDACK-------------------->         |
4360	                |                                        |
4361	                |        <--------------------UPDDONE--< |
4362	                |                                        |
4363	           RECOVER-WAIT                                  |
4364	                |                                        |
4365	                | >--STATE-(RECOVER-WAIT)------>         |
4366	                |                                        |
4367	                |                                        |
4368	       Wait MCLT from last known                         |
4369	          time of failover operation                     |
4370	                |                                        |
4371	           RECOVER-DONE                                  |
4372	                |                                        |
4373	                | >--STATE-(RECOVER-DONE)------>         |
4374	                |                                     NORMAL
4375	                |        <-------------(NORMAL)-STATE--< |
4376	             NORMAL                                      |
4377	                | >--STATE-(NORMAL)------------>         |
4378	                |                                        |
4379	                |                                        |

4381	              Figure 9.5.2-1:  Transition out of RECOVER state

4383	   If, at any time while a server is in RECOVER state, communications
4384	   fails, the server will stay in RECOVER state.  When communications
4385	   are restored, it will restart the process of transitioning out of
4386	   RECOVER state.

4388	9.6.  RECOVER-WAIT state

4390	   This state indicates that the server has done an UPDREQ or UPDREQALL
4391	   and has received the UPDDONE message indicating that it has received
4392	   all outstanding binding update information.  In the RECOVER-WAIT
4393	   state the server will wait for the MCLT in order to ensure that any
4394	   processing that this server might have done prior to losing its
4395	   stable storage will not cause future difficulties.

4397	9.6.1.  Operation in RECOVER-WAIT state

4399	   A server in RECOVER-WAIT state MUST NOT respond to DHCP client requests.

4401	9.6.2.  Transitions out of RECOVER-WAIT state

4403	   Upon entry to RECOVER-WAIT state the server MUST start a timer whose
4404	   expiration is set to a time equal to the time the server went down
4405	   (if known) or the time the server started (if the down-time is
4406	   unknown) plus the maximum-client-lead-time.  When this timer goes
4407	   off, the server will transition into RECOVER-DONE state.

4409	   This is to allow any clients whose IP addresses were allocated by
4410	   this server prior to loss of its client binding information in stable
4411	   storage to contact the other server, or their leases to time out.

4413	   If this is the first time this server has run failover -- as
4414	   determined by the information received from the partner, not
4415	   necessarily only as determined by this server's stable storage (as
4416	   that may have been lost) -- then the waiting time discussed above may
4417	   be skipped, and the server may transition immediately to RECOVER-DONE
4418	   state.
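
   The timer described above can be sketched as follows.  The names
   are hypothetical, times are assumed to be seconds on a single
   clock, and first_time_failover is assumed to have been determined
   from information supplied by the partner.

```python
def recover_wait_deadline(mclt, started_at, went_down_at=None):
    """Time at which the RECOVER-WAIT timer expires.

    Uses the time the server went down when known, otherwise the
    time it started, plus the maximum-client-lead-time.
    """
    base = went_down_at if went_down_at is not None else started_at
    return base + mclt


def may_enter_recover_done(now, deadline, first_time_failover):
    """True when the move to RECOVER-DONE is allowed."""
    # A server that has never run failover may skip the wait.
    return first_time_failover or now >= deadline
```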

4420	   See Figure 9.5.2-1.

4422	   DISCUSSION:

4424	      The actual requirement on this wait period in RECOVER-WAIT is that
4425	      it start no earlier than when the recovering server went down, not
4426	      necessarily when it came back up.  If the time when the recovering
4427	      server failed is known, it could be communicated to that server
4428	      (perhaps through actions of the network administrator), and the
4429	      wait period could be reduced to the maximum-client-lead-time less
4430	      the difference between the current time and the time the server
4431	      failed.  In this way, the waiting period could be minimized.
4432	      Various heuristics could be used to estimate this time, for
4433	      example if the recovering server periodically updates stable
4434	      storage with a time stamp, the wait period could be calculated to
4435	      start at the time of the last update of stable storage plus the
4436	      time required for the next update (which never occurred).  This
4437	      estimate is later than the time the server went down, but probably
4438	      not too much later.

4440	      If the server has never before run failover, then there is no need
4441	      to wait in this state -- but, again, to determine if this server
4442	      has run failover it is vital that the information provided by the
4443	      partner be utilized, since the stable storage of this server may
4444	      have been lost.

4446	   If communications fails while a server is in RECOVER-WAIT state, it
4447	   has no effect on the operation of this state.  The server SHOULD
4448	   continue to operate its timer, and if the timer goes off during the
4449	   period where communications with the other server have failed, then
4450	   the server SHOULD transition to RECOVER-DONE state.  This is rare --
4451	   failover state transitions are not usually made while communications
4452	   are interrupted, but in this case there is no reason to inhibit the
4453	   timer.  A server MAY stay in RECOVER-WAIT state even after expiry of
4454	   the timer and transition to RECOVER-DONE state upon re-establishing
4455	   communications with the partner if desired.  The key point here is to
4456	   allow the timer to continue to operate, not whether or not the state
4457	   transition is made before or after communications are re-established.

4459	9.7.  RECOVER-DONE state

4461	   This state exists to allow an interlocked transition for one server
4462	   from RECOVER state and another server from PARTNER-DOWN or
4463	   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

4465	9.7.1.  Operation in RECOVER-DONE state

4467	   A server in RECOVER-DONE state MUST respond only to
4468	   DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.

4470	9.7.2.  Transitions out of RECOVER-DONE state

4472	   When a server in RECOVER-DONE state determines that its partner
4473	   server has entered NORMAL or RECOVER-DONE state, then it will transi-
4474	   tion into NORMAL state.

4476	   If communications fails while in RECOVER-DONE state, a server will
4477	   stay in RECOVER-DONE state.

4479	9.8.  NORMAL state

4481	   NORMAL state is the state used by a server when it is communicating
4482	   with the other server, and any required resynchronization has been
4483	   performed. While some binding database synchronization is performed
4484	   in NORMAL state, potential conflicts and binding database data loss
4485	   are resolved prior to entry into NORMAL state.

4487	9.8.1.  Upon entry to NORMAL state

4489	   When entering NORMAL state, a server will send to the other server
4490	   all currently unacknowledged binding updates as BNDUPD messages.

4492	   When the above process is complete, if the server entering NORMAL
4493	   state is a secondary server, then it will request IP addresses for
4494	   allocation using the POOLREQ message.

4496	9.8.2.  Processing DHCP client requests and load balancing

4498	   In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or
4499	   DHCPREQUEST/REBINDING request it receives.  It processes other
4500	   requests only for those clients dictated by the load balancing
4501	   algorithm specified in [RFC 3074].

4503	   As discussed in section 5.3, each server will take the client-
4504	   identifier from each DHCP client request (or the client-hardware-
4505	   address, i.e., the chaddr if no client-identifier is present in the
4506	   request) and use it as the 'Request ID' specified in [RFC 3074].
4507	   After applying the algorithm specified in [RFC 3074] and comparing
4508	   the result with the hash bucket assignment (performed during connect
4509	   processing between failover servers), each failover server will be
4510	   able to unambiguously determine if it should process the DHCP client
4511	   request.
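
   The selection of the Request ID and the bucket check can be
   sketched as below.  Note the assumptions: the hash used here is a
   stand-in (a CRC reduced to 256 values) and is NOT the hash defined
   in [RFC 3074]; a real implementation must use the RFC 3074
   algorithm.  The function names and the representation of the
   bucket assignment as a set of integers are also hypothetical.

```python
import zlib


def request_id(client_identifier, chaddr):
    """Use the client-identifier if present, otherwise the chaddr."""
    return client_identifier if client_identifier is not None else chaddr


def stand_in_hash(data):
    """Placeholder mapping bytes to one of 256 buckets.

    This is NOT the RFC 3074 hash; it only keeps the sketch runnable.
    """
    return zlib.crc32(data) % 256


def should_process(client_identifier, chaddr, my_buckets,
                   hash_fn=stand_in_hash):
    """True if this server's bucket assignment covers the client."""
    return hash_fn(request_id(client_identifier, chaddr)) in my_buckets
```

   Because both servers apply the same function to the same Request
   ID and hold complementary bucket assignments, exactly one of them
   selects any given client.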

4513	9.8.3.  Operation in NORMAL state

4515	   When in NORMAL state, for every DHCP client request that it
4516	   processes, as determined by the algorithm described in section 9.8.2,
4517	   above, a server will operate in the following manner:

4519	      o Lease time calculations

4521	        As discussed in section 5.2.1, "Control of lease time", the
4522	        lease interval given to a DHCP client can never be more than the
4523	        MCLT greater than the most recently received potential-
4524	        expiration-time from the failover partner or the current time,
4525	        whichever is later.

4527	        As long as a server adheres to this constraint, the specifics of
4528	        the lease interval that it gives to a DHCP client or the value
4529	        of the potential-expiration-time sent to its failover partner
4530	        are implementation dependent.  One possible approach is dis-
4531	        cussed in section 5.2.1, but that particular approach is in no
4532	        way required by this protocol.

4534	        See section 7.1.5 for details concerning the storage of time
4535	        associated with IP addresses and how to use these times when
4536	        calculating lease times for DHCP clients.

4538	      o Lazy update of partner server

4540	        After a DHCPACK of an IP address binding, the server servicing a
4541	        DHCP client request attempts to update its partner with the new
4542	        binding information.  The lease time used in the update of the
4543	        partner MUST be at least that given to the DHCP client in the
4544	        DHCPACK, and the potential-expiration-time MUST be at least the
4545	        lease time, and SHOULD be considerably longer.

4547	      o Reallocation of IP addresses between clients

4549	        Whenever a client binding is released or expires, a BNDUPD mes-
4550	        sage must be sent to the partner, setting the binding state to
4551	        RELEASED or EXPIRED.  However, until a BNDACK is received for
4552	        this message, the IP address cannot be allocated to another
4553	        client.  It cannot be allocated to the same client again if a
4554	        BNDUPD was sent, otherwise it can.  See section 5.2.2.
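
   The lease time constraint in the first item above can be written
   as a short calculation.  This is a sketch under stated
   assumptions: the names are hypothetical, times are in seconds on
   one clock, and a missing potential-expiration-time from the
   partner is treated as "none received yet".

```python
def max_lease_interval(now, mclt, partner_pet=None):
    """Longest lease interval that may be offered at time `now`.

    partner_pet is the most recently received potential-expiration-
    time from the failover partner, or None if there is none.
    """
    base = now if partner_pet is None else max(partner_pet, now)
    return base + mclt - now


def clamp_lease(desired, now, mclt, partner_pet=None):
    """Limit a desired lease interval to what the constraint allows."""
    return min(desired, max_lease_interval(now, mclt, partner_pet))
```

   With an MCLT of 3600 seconds and no time received from the partner,
   a client asking for a one day lease is limited to 3600 seconds;
   later leases can grow as the partner's potential-expiration-time
   advances.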

4556	   In NORMAL state, each server receives binding updates from its
4557	   partner server in BNDUPD messages.  It records these in its client
4558	   binding database in stable storage and then sends a corresponding
4559	   BNDACK message to its partner server.  It MUST ensure that the infor-
4560	   mation is recorded in stable storage prior to sending the BNDACK mes-
4561	   sage back to its partner.

4563	9.8.4.  Transitions out of NORMAL state

4565	   If an external command is received by a server in NORMAL state
4566	   informing it that its partner is down, then transition into PARTNER-
4567	   DOWN state.  Generally, this would be an unusual situation, where
4568	   some external agency knew the partner server was down.  Using the
4569	   command in this case would be appropriate if the polling interval and
4570	   timeout were long.

4572	   If a server in NORMAL state fails to receive acks to messages sent to
4573	   its partner for an implementation dependent period of time, it MAY
4574	   move into COMMUNICATIONS-INTERRUPTED state.  This situation might
4575	   occur if the partner server was capable of maintaining the TCP
4576	   connection between the servers and also capable of sending a CONTACT
4577	   message every tSend seconds, but was (for some reason) incapable of
4578	   processing BNDUPD messages.

4580	   If communications is determined not to be "ok" (as defined in
4581	   section 8), then transition into COMMUNICATIONS-INTERRUPTED state.

4583	   If a server in NORMAL state receives any messages from its partner
4584	   where the partner has changed state from that expected by the server
4585	   in NORMAL state, then the server should transition into
4586	   COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
4587	   sition from there.  For example, it would be expected for the partner
4588	   to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
4589	   the partner to transition from NORMAL into POTENTIAL-CONFLICT state.

4591	   If a server in NORMAL state receives any messages from its partner
4592	   where the partner has changed into PAUSED state, the server should
4593	   transition into COMMUNICATIONS-INTERRUPTED state.  If a server in
4594	   NORMAL state receives any messages from its partner where the partner
4595	   has changed into SHUTDOWN state, the server should transition into
4596	   PARTNER-DOWN state.
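
   The transitions out of NORMAL state described in this section can
   be summarized in a small decision function.  This is a sketch, not
   a normative algorithm.  The argument names, the order in which the
   conditions are tested, and the default set of "expected" partner
   states are all assumptions made for illustration; the text above
   gives only one example of an unexpected partner state.

```python
NORMAL = "NORMAL"
COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
PARTNER_DOWN = "PARTNER-DOWN"


def next_state_from_normal(partner_down_command=False, comms_ok=True,
                           acks_overdue=False, partner_state=NORMAL,
                           expected_partner_states=(NORMAL,)):
    """Return the state a server in NORMAL should be in after the inputs."""
    if partner_down_command:
        return PARTNER_DOWN                 # external "partner is down" command
    if partner_state == "SHUTDOWN":
        return PARTNER_DOWN                 # partner announced SHUTDOWN
    if not comms_ok:
        return COMMUNICATIONS_INTERRUPTED   # communications not "ok"
    if acks_overdue:
        return COMMUNICATIONS_INTERRUPTED   # a MAY in the text; taken here
    if partner_state == "PAUSED":
        return COMMUNICATIONS_INTERRUPTED
    if partner_state not in expected_partner_states:
        return COMMUNICATIONS_INTERRUPTED   # unexpected partner state change
    return NORMAL
```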

4598	9.9.  COMMUNICATIONS-INTERRUPTED State

4600	   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
4601	   unable to communicate with the other server.  Primary and secondary
4602	   servers cycle automatically (without administrative intervention)
4603	   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
4604	   connection between them fails and recovers, or as the partner server
4605	   cycles between operational and non-operational.  No duplicate IP
4606	   address allocation can occur while the servers cycle between these
4607	   states.

4609	9.9.1.  Upon entry to COMMUNICATIONS-INTERRUPTED state

4611	   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
4612	   configured to support an automatic transition out of COMMUNICATIONS-
4613	   INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period"
4614	   has been configured, see section 10), then a timer MUST be started
4615	   for the length of the configured safe period.

4617	   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
4618	   the NORMAL state SHOULD raise some alarm condition to alert adminis-
4619	   trative staff to a potential problem in the DHCP subsystem.

4621	9.9.2.  Operation in COMMUNICATIONS-INTERRUPTED State

4623	   In this state a server MUST respond to all DHCP client requests, and
4624	   the algorithm for load balancing described in section 5.3 MUST NOT be
4625	   used.  When allocating new IP addresses, each server allocates from
4626	   its own IP address pool, where the primary MUST allocate only FREE IP
4627	   addresses, and the secondary MUST allocate only BACKUP IP addresses.
4628	   When responding to renewal requests, each server will allow continued
4629	   renewal of a DHCP client's current lease on an IP address irrespec-
4630	   tive of whether that lease was given out by the receiving server or
4631	   not, although the renewal period MUST NOT exceed the maximum client
4632	   lead time (MCLT) beyond the latest of: 1) the potential-expiration-
4633	   time already acknowledged by the other server, or 2) the lease-
4634	   expiration-time, or 3) the potential-expiration-time received from
4635	   the partner server.

4637	   However, since the server cannot communicate with its partner in this
4638	   state, the acknowledged-potential-expiration time will not be updated
4639	   in any new bindings.  This is likely to eventually cause the actual-
4640	   client-lease-times to be the current time plus the maximum-client-
4641	   lead-time (unless this is greater than the desired-client-lease-
4642	   time).
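
   As a rough illustration of the two rules above, the following
   sketch computes the latest time to which a renewal may extend a
   lease and checks which addresses a server may hand to a new client.
   The function names and the "primary"/"secondary" role strings are
   hypothetical; times are assumed to be absolute seconds.

```python
def renewal_limit(mclt, acked_potential_expiration, lease_expiration,
                  partner_potential_expiration):
    """Latest expiration a renewal may reach: MCLT beyond the latest time."""
    latest = max(acked_potential_expiration, lease_expiration,
                 partner_potential_expiration)
    return latest + mclt


def may_allocate(role, binding_status):
    """Primary allocates only FREE addresses, secondary only BACKUP ones."""
    wanted = {"primary": "FREE", "secondary": "BACKUP"}[role]
    return binding_status == wanted
```

   Because no new acknowledgements arrive in this state, the first and
   third inputs stop advancing, which is why the limit tends toward the
   current lease expiration plus the MCLT.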

4644	   The server should continue to try to establish a connection with its
4645	   partner.

4647	9.9.3.  Transition out of COMMUNICATIONS-INTERRUPTED State

4649	   If the safe period timer expires while a server is in the
4650	   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
4651	   PARTNER-DOWN state.

4653	   If an external command is received by a server in COMMUNICATIONS-
4654	   INTERRUPTED state informing it that its partner is down, it will
4655	   transition immediately into PARTNER-DOWN state.

4657	   If communications is restored with the other server, then the server
4658	   in COMMUNICATIONS-INTERRUPTED state will transition into another
4659	   state based on the state of the partner:

4661	      o partner in NORMAL or COMMUNICATIONS-INTERRUPTED
4662	        The partner SHOULD NOT be in NORMAL state here, since upon res-
4663	        toration of communications it MUST have created a new TCP con-
4664	        nection which would have forced it into COMMUNICATIONS-
4665	        INTERRUPTED state.  Still, we should account for every state
4666	        just in case.

4668	        Transition into the NORMAL state.

4670	      o partner in RECOVER

4672	        Stay in COMMUNICATIONS-INTERRUPTED state.

4674	      o partner in RECOVER-DONE

4676	        Transition into NORMAL state.

4678	      o partner in PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
4679	        RESOLUTION-INTERRUPTED

4681	        Transition into POTENTIAL-CONFLICT state.

4683	      o partner in PAUSED

4685	        Stay in COMMUNICATIONS-INTERRUPTED state.

4687	      o partner in SHUTDOWN

4689	        Transition into PARTNER-DOWN state.
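
   The list above maps directly onto a lookup table keyed by the
   partner's state.  The sketch below is one possible encoding; the
   table and function names are hypothetical, and state names are
   represented as plain strings for brevity.

```python
# Next state for a server in COMMUNICATIONS-INTERRUPTED once
# communications is restored, keyed by the partner's reported state.
NEXT_STATE_BY_PARTNER_STATE = {
    "NORMAL": "NORMAL",
    "COMMUNICATIONS-INTERRUPTED": "NORMAL",
    "RECOVER": "COMMUNICATIONS-INTERRUPTED",      # stay
    "RECOVER-DONE": "NORMAL",
    "PARTNER-DOWN": "POTENTIAL-CONFLICT",
    "POTENTIAL-CONFLICT": "POTENTIAL-CONFLICT",
    "CONFLICT-DONE": "POTENTIAL-CONFLICT",
    "RESOLUTION-INTERRUPTED": "POTENTIAL-CONFLICT",
    "PAUSED": "COMMUNICATIONS-INTERRUPTED",       # stay
    "SHUTDOWN": "PARTNER-DOWN",
}


def next_state_after_reconnect(partner_state):
    """Look up the transition; unknown partner states raise KeyError."""
    return NEXT_STATE_BY_PARTNER_STATE[partner_state]
```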

4691	   The following figure illustrates the transition from NORMAL to
4692	   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

4694	             Primary                                Secondary
4695	              Server                                  Server

4697	              NORMAL                                  NORMAL
4698	                | >--CONTACT------------------->         |
4699	                |        <--------------------CONTACT--< |
4700	                |         [TCP connection broken]        |
4701	           COMMUNICATIONS          :              COMMUNICATIONS
4702	             INTERRUPTED           :                INTERRUPTED
4703	                |      [attempt new TCP connection]      |
4704	                |         [connection succeeds]          |
4705	                |                                        |
4706	                | >--CONNECT------------------->         |
4707	                |        <-----------------CONNECTACK--< |
4708	                |                                     NORMAL
4709	                |        <-------------------STATE-----< |
4710	              NORMAL                                     |
4711	                | >--STATE--------------------->         |
4712	                |
4713	                | >--BNDUPD-------------------->         |
4714	                |        <---------------------BNDACK--< |
4715	                |                                        |
4716	                |        <---------------------BNDUPD--< |
4717	                | >------BNDACK---------------->         |
4718	               ...                                      ...
4719	                |                                        |
4720	                |        <--------------------POOLREQ--< |
4721	                | >--POOLRESP-(2)-------------->         |
4722	                |                                        |
4723	                | >--BNDUPD-(#1)--------------->         |
4724	                |        <---------------------BNDACK--< |
4725	                |                                        |
4726	                |        <--------------------POOLREQ--< |
4727	                | >--POOLRESP-(0)-------------->         |
4728	                |                                        |
4729	                | >--BNDUPD-(#2)--------------->         |
4730	                |        <---------------------BNDACK--< |
4731	                |                                        |

4733	       Figure 9.9.3-1:  Transition from NORMAL to COMMUNICATIONS-
4734	                        INTERRUPTED and back (example with 2
4735	                        addresses allocated to secondary)

4737	9.10.  POTENTIAL-CONFLICT state

4739	   This state indicates that the two servers are attempting to re-
4740	   integrate with each other, but at least one of them was running in a
4741	   state that did not guarantee automatic reintegration would be
4742	   possible.  In POTENTIAL-CONFLICT state the servers may determine that
4743	   the same IP address has been offered and accepted by two different
4744	   DHCP clients.

4746	   It is a goal of this protocol to minimize the possibility that
4747	   POTENTIAL-CONFLICT state is ever entered.

4749	9.10.1.  Upon entry to POTENTIAL-CONFLICT state

4751	   When a primary server enters POTENTIAL-CONFLICT state it should
4752	   request that the secondary send it all updates of which it is
4753	   currently unaware by sending an UPDREQ message to the secondary
4754	   server.

4756	   A secondary server entering POTENTIAL-CONFLICT state will wait for
4757	   the primary to send it an UPDREQ message.

4759	9.10.2.  Operation in POTENTIAL-CONFLICT state

4761	   Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
4762	   DHCP requests.

4764	9.10.3.  Transitions out of POTENTIAL-CONFLICT state

4766	   If communications fails with the partner while in POTENTIAL-CONFLICT
4767	   state, then the server will transition to RESOLUTION-INTERRUPTED
4768	   state.

4770	   Whenever either server receives an UPDDONE message from its partner
4771	   while in POTENTIAL-CONFLICT state, it MUST transition to a new state.
4772	   The primary MUST transition to CONFLICT-DONE state, and the secondary
4773	   MUST transition to NORMAL state.  This will cause the primary server
4774	   to leave POTENTIAL-CONFLICT state prior to the secondary, since the
4775	   primary sends an UPDREQ message and receives an UPDDONE before the
4776	   secondary sends an UPDREQ message and receives its UPDDONE message.

4778	   When a secondary server receives an indication that the primary
4779	   server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE
4780	   state, it SHOULD send an UPDREQ message to the primary server.

4782	              Primary                                Secondary
4783	              Server                                  Server

4785	                |                                        |
4786	         POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
4787	                |                                        |
4788	                | >--UPDREQ-------------------->         |
4789	                |                                        |
4790	                |        <---------------------BNDUPD--< |
4791	                | >--BNDACK-------------------->         |
4792	               ...                                      ...
4793	                |                                        |
4794	                |        <---------------------BNDUPD--< |
4795	                | >--BNDACK-------------------->         |
4796	                |                                        |
4797	                |        <--------------------UPDDONE--< |
4798	           CONFLICT-DONE                                 |
4799	                | >--STATE--(CONFLICT-DONE)---->         |
4800	                |        <---------------------UPDREQ--< |
4801	                |                                        |
4802	                | >--BNDUPD-------------------->         |
4803	                |        <---------------------BNDACK--< |
4804	               ...                                      ...
4805	                | >--BNDUPD-------------------->         |
4806	                |        <---------------------BNDACK--< |
4807	                |                                        |
4808	                | >--UPDDONE------------------->         |
4809	                |                                     NORMAL
4810	                |        <------------STATE--(NORMAL)--< |
4811	              NORMAL                                     |
4812	                |        <--------------------POOLREQ--< |
4813	                | >------POOLRESP-(n)---------->         |
4814	                |              addresses                 |

4816	           Figure 9.10.3-1:  Transition out of POTENTIAL-CONFLICT
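
   The textual rules of section 9.10.3 can be expressed as a small
   event handler.  This is a minimal sketch: the event strings
   "UPDDONE" and "comms-failed" and the role strings are hypothetical
   labels chosen for the example.

```python
def on_event_in_potential_conflict(role, event):
    """Next state for a server in POTENTIAL-CONFLICT after one event."""
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    if event == "comms-failed":
        return "RESOLUTION-INTERRUPTED"
    if event == "UPDDONE":
        # The primary leaves first and waits in CONFLICT-DONE.
        return "CONFLICT-DONE" if role == "primary" else "NORMAL"
    return "POTENTIAL-CONFLICT"
```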

4818	9.11.  RESOLUTION-INTERRUPTED state

4820	   This state indicates that the two servers were attempting to re-
4821	   integrate with each other in POTENTIAL-CONFLICT state, but
4822	   communications failed prior to completion of re-integration.

4824	   If the servers remained in POTENTIAL-CONFLICT while communications
4825	   was interrupted, neither server would be responsive to DHCP client
4826	   requests, and if one server had crashed, then there might be no
4827	   server able to process DHCP requests.

4829	9.11.1.  Upon entry to RESOLUTION-INTERRUPTED state

4831	   When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise an
4832	   alarm condition to alert administrative staff of a problem in the
4833	   DHCP subsystem.

4835	9.11.2.  Operation in RESOLUTION-INTERRUPTED state

4837	   In this state a server MUST respond to all DHCP client requests, and
4838	   any load balancing (described in section 5.3) MUST NOT be used.  When
4839	   allocating new IP addresses, each server SHOULD allocate from its own
4840	   IP address pool (if that can be determined), where the primary SHOULD
4841	   allocate only FREE IP addresses, and the secondary SHOULD allocate
4842	   only BACKUP IP addresses.  When responding to renewal requests, each
4843	   server will allow continued renewal of a DHCP client's current lease
4844	   on an IP address irrespective of whether that lease was given out by
4845	   the receiving server or not, although the renewal period MUST NOT
4846	   exceed the maximum client lead time (MCLT) beyond the latest of: 1)
4847	   the potential-expiration-time already acknowledged by the other
4848	   server, or 2) the lease-expiration-time, or 3) the potential-
4849	   expiration-time received from the partner server.

4851	   However, since the server cannot communicate with its partner in this
4852	   state, the acknowledged-potential-expiration time will not be updated
4853	   in any new bindings.

4855	9.11.3.  Transitions out of RESOLUTION-INTERRUPTED state

4857	   If an external command is received by a server in RESOLUTION-
4858	   INTERRUPTED state informing it that its partner is down, it will
4859	   transition immediately into PARTNER-DOWN state.

4861	   If communications is restored with the other server, then the server
4862	   in RESOLUTION-INTERRUPTED state will transition into POTENTIAL-
4863	   CONFLICT state.

4865	9.12.  CONFLICT-DONE state

4867	   This state indicates that during the process where the two servers
4868	   are attempting to re-integrate with each other, the primary server
4869	   has received all of the updates from the secondary server.  It makes a
4870	   transition into CONFLICT-DONE state in order that it may be totally
4871	   responsive to the client load, as opposed to NORMAL state where it
4872	   would be in a "balanced" responsive state, running the load balancing
4873	   algorithm.

4875	9.12.1.  Upon entry to CONFLICT-DONE state

4877	   A secondary server should never enter CONFLICT-DONE state.

4879	9.12.2.  Operation in CONFLICT-DONE state

4881	   A primary server in CONFLICT-DONE state is fully responsive to all
4882	   DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
4883	   state).

4885	   If communications fails, remain in CONFLICT-DONE state.  If communi-
4886	   cations becomes OK, remain in CONFLICT-DONE state until the condi-
4887	   tions for transition out become satisfied.

4889	9.12.3.  Transitions out of CONFLICT-DONE state

4891	   If communications fails with the partner while in CONFLICT-DONE
4892	   state, then the server will remain in CONFLICT-DONE state.

4894	   When a primary server determines that the secondary server has made a
4895	   transition into NORMAL state, the primary server will also transition
4896	   into NORMAL state.

4898	9.13.  PAUSED state

4900	   This state exists to allow one server to inform another that it will
4901	   be out of service for what is predicted to be a relatively short
4902	   time, and to allow the other server to transition to COMMUNICATIONS-
4903	   INTERRUPTED state immediately and to begin servicing all DHCP clients
4904	   with no interruption in service to new DHCP clients.

4906	   A server which is aware that it is shutting down temporarily SHOULD
4907	   send a STATE message with the server-state option containing PAUSED
4908	   state and close the TCP connection.

4910	   While a server may or may not transition internally into PAUSED
4911	   state, the 'previous' state determined when it is restarted MUST be
4912	   the state the server was in prior to receiving the command to shut-
4913	   down and restart and which precedes its entry into the PAUSED state.
4914	   See section 9.3.2 concerning the use of the previous state upon
4915	   server restart.

4917	9.13.1.  Upon entry to PAUSED state

4919	   When entering PAUSED state, the server MUST store the previous state
4920	   in stable storage, and use that state as the previous state when it
4921	   is restarted.

4923	9.13.2.  Transitions out of PAUSED state

4925	   A server makes a transition out of PAUSED state by being restarted.
4926	   At that time, the previous state MUST be the state the server was in
4927	   prior to entering the PAUSED state.

4929	9.14.  SHUTDOWN state

4931	   This state exists to allow one server to inform another that it will
4932	   be out of service for what is predicted to be a relatively long time,
4933	   and to allow the other server to transition immediately to PARTNER-
4934	   DOWN state, and take over completely for the server going down.

4936	9.14.1.  Upon entry to SHUTDOWN state

4938	   When entering SHUTDOWN state, the server MUST record the previous
4939	   state in stable storage for use when the server is restarted.  It
4940	   also MUST record the current time as the last time operational.

4942	   A server which is aware that it is shutting down SHOULD send a STATE
4943	   message with the server-state field containing SHUTDOWN.

4945	9.14.2.  Operation in SHUTDOWN state

4947	   A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

4949	   If a server receives any message indicating that the partner has
4950	   moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
4951	   MUST record RECOVER state as the previous state to be used when it is
4952	   restarted.

4954	   A server SHOULD wait for a few seconds after informing the partner of
4955	   entry into SHUTDOWN state (if communications are okay) to determine
4956	   if the partner entered PARTNER-DOWN state.

4958	9.14.3.  Transitions out of SHUTDOWN state

4960	   A server makes a transition out of SHUTDOWN state by being restarted.

4962	10.  Safe Period

4964	   Due to the restrictions imposed on each server while in
4965	   COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
4966	   is not feasible for either server.  One reason that this state
4967	   exists at all is to allow the servers to easily survive transient
4968	   network communications failures of a few minutes to a few days
4969	   (although the actual time periods will depend a great deal on the
4970	   DHCP activity of the network in terms of arrival and departure of
4971	   DHCP clients on the network).

4973	   Eventually, when the servers are unable to communicate, they will
4974	   have to move into a state where they no longer can re-integrate
4975	   without some possibility of a duplicate IP address allocation.  There
4976	   are two ways that they can move into this state (known as PARTNER-
4977	   DOWN).

4979	   They can either be informed by external command that, indeed, the
4980	   partner server is down.  In this case, there is no difficulty in mov-
4981	   ing into the PARTNER-DOWN state since it is an accurate reflection of
4982	   reality and the protocol has been designed to operate correctly (even
4983	   during reintegration) as long as, when in PARTNER-DOWN state, the
4984	   partner is, indeed, down.

4986	   The more difficult scenario is when the servers are running unat-
4987	   tended for extended periods, and in this case an option is provided
4988	   to configure something called a "safe-period" into each server.  This
4989	   OPTIONAL safe-period is the period after which either the primary or
4990	   secondary server will automatically transition to PARTNER-DOWN from
4991	   COMMUNICATIONS-INTERRUPTED state.  If this transition is completed
4992	   and the partner is not down, then the possibility of duplicate IP
4993	   address allocations will exist.

4995	   The goal of the "safe-period" is to allow network operations staff
4996	   some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
4997	   state.  During the safe-period the only requirement is that the net-
4998	   work operations staff determine if both servers are still running --
4999	   and if they are, to either fix the network communications failure
5000	   between them, or to take one of the servers down before the
5001	   expiration of the safe-period.

5003	   The length of the safe-period is installation dependent, and depends
5004	   in large part on the number of unallocated IP addresses within the
5005	   subnet address pool and the expected frequency of arrival of
5006	   previously unknown DHCP clients requiring IP addresses.  Many
5007	   environments should be able to support safe-periods of several days.

5009	   During this safe period, either server will allow renewals from any
5010	   existing client.  The only limitation concerns the need for IP
5011	   addresses for the DHCP server to hand out to new DHCP clients and the
5012	   need to re-allocate IP addresses to different DHCP clients.

5014	   The number of "extra" IP addresses required is equal to the expected
5015	   total number of new DHCP clients encountered during the safe period.
5016	   This is dependent only on the arrival rate of new DHCP clients, not
5017	   the total number of outstanding leases on IP addresses.
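
   The sizing rule above is simple arithmetic.  The sketch below works
   it in both directions, assuming (as a simplification not stated in
   the text) a constant average arrival rate of new clients; the
   function names and the choice of hours as the unit are hypothetical.

```python
import math


def extra_addresses_needed(new_clients_per_hour, safe_period_hours):
    """Addresses needed to serve new clients for the whole safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def longest_safe_period_hours(available_addresses, new_clients_per_hour):
    """Longest safe period the available addresses can sustain."""
    if new_clients_per_hour <= 0:
        return math.inf
    return available_addresses / new_clients_per_hour
```

   For example, two new clients per hour over a three-day safe period
   call for 144 spare addresses, independent of how many leases are
   already outstanding.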

5019	   In the unlikely event that a relatively short safe period of an hour
5020	   is all that can be used (given a dearth of IP addresses or a very
5021	   high arrival rate of new DHCP clients), even that can provide sub-
5022	   stantial benefits in allowing the DHCP subsystem to ride through
5023	   minor problems that could occur and be fixed within that hour.  In
5024	   these cases, no possibility of duplicate IP address allocation
5025	   exists, and re-integration after the failure is solved will be
5026	   automatic and require no operator intervention.

5028	11.  Security

5030	   The Failover protocol communicates DHCP lease activity and this data
5031	   is generally easily discovered via other means, such as by pinging
5032	   addresses and doing DNS lookups. Therefore, the need to encrypt the
5033	   data over the wire is likely not great (though some sites may feel
5034	   differently).

5036	   However, it is very desirable to assure the integrity of failover
5037	   partners and to thus ensure proper operation of the servers. For
5038	   example, denial of service attacks are possible by the communication
5039	   of invalid state information to one or both servers.

5041	   Therefore, the Failover protocol MUST be capable of being secured by
5042	   using a simple shared secret message digest which covers each mes-
5043	   sage.  This provides authentication of the servers, but does not pro-
5044	   vide encryption of the data exchange.

5046	   The Failover protocol MAY also be secured by using TLS [RFC 2246]
5047	   (Transport Layer Security) if encryption of the data exchange is
5048	   desired.  The use of the shared secret or TLS will not protect
5049	   against TCP or IP layer attacks (such as someone sending fake TCP RST
5050	   segments). IPsec [RFC 2401] SHOULD be used to protect against most
5051	   (if not all) of these kinds of attacks.

5053	11.1.  Simple shared secret

5055	   Messages between the failover partners can be authenticated through
5056	   the use of a shared secret, which is never sent over the network and
5057	   must be known by each server. How each server is told about this
5058	   shared secret and secures its storage of the shared secret is outside
5059	   the scope of this document.  If a server is configured with a shared
5060	   secret for a partner, it MUST send the message-digest option in ALL
5061	   messages to that partner and it MUST treat any messages received from
5062	   that partner without a message-digest option as failing authentica-
5063	   tion and reject them with reject reason 21: "Missing message digest".
5064	   Note that the message digest option MUST be the first option in the
5065	   message.

5067	   If a server is not configured with a shared secret for a partner, it
5068	   MUST NOT send the message-digest option in any message to that
5069	   partner and it MUST treat any messages received from that partner
5070	   with a message-digest option as failing authentication with reject
5071	   reason 13: "Message digest not configured".

5073	   The shared secret is used to calculate a 16 octet message-digest
5074	   which is sent in every failover message in the message-digest option.
5075	   See section 12.16. The message-digest contains a one-way 16 octet
5076	   HMAC-MD5 [RFC 2104] hash calculated over a stream of octets consist-
5077	   ing of the entire message concatenated with the shared secret.

5079	   For calculation, the message includes the message-digest option with
5080	   the message-digest data zeroed (16-octets of zero). Once the calcula-
5081	   tion is complete, these 16 octets of zero are replaced by the 16-
5082	   octet HMAC-MD5 hash and the message is sent.

5084	   For verification, the 16-octet message-digest is saved and replaced
5085	   with 16-octets of zero and calculated per above. The resulting HMAC-
5086	   MD5 hash is compared to the received hash and if they match, the mes-
5087	   sage is assumed authenticated.
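
   The calculation and verification steps above can be sketched with
   the HMAC support in the Python standard library.  Two points are
   assumptions of this sketch rather than statements of the protocol:
   the shared secret is used as the HMAC key in the manner of
   [RFC 2104], and the position of the 16 digest octets within the
   message is passed in as a hypothetical digest_offset parameter
   instead of being parsed from the option header.

```python
import hashlib
import hmac

DIGEST_LEN = 16  # HMAC-MD5 output size in octets


def _mac(message, digest_offset, secret):
    """HMAC-MD5 over the message with the digest octets zeroed."""
    buf = bytearray(message)
    buf[digest_offset:digest_offset + DIGEST_LEN] = bytes(DIGEST_LEN)
    return hmac.new(secret, bytes(buf), hashlib.md5).digest()


def sign(message, digest_offset, secret):
    """Return the message with the zeroed digest replaced by the hash."""
    buf = bytearray(message)
    buf[digest_offset:digest_offset + DIGEST_LEN] = _mac(
        message, digest_offset, secret)
    return bytes(buf)


def verify(message, digest_offset, secret):
    """Save the received digest, recompute, and compare in constant time."""
    received = bytes(message[digest_offset:digest_offset + DIGEST_LEN])
    return hmac.compare_digest(received, _mac(message, digest_offset, secret))
```

   A receiver for which verify() returns False would then close the
   connection and notify operators.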

5089	   A failover partner that fails to authenticate a received message or
5090	   receives a message without a message-digest option when configured
5091	   with a shared secret MUST close the connection immediately and take
5092	   steps to notify operators.

5094	   Every time a CONNECT message is received, the time at which that mes-
5095	   sage was sent by the partner (i.e., the time that actually appears in
5096	   the message itself) MUST be saved.  If a CONNECT message is ever
5097	   received containing that time or containing a time before that time,
5098	   it MUST be rejected.

5100	   The XID (see section 6.1) of every message received at a failover
5101	   endpoint MUST be greater than that of the previous message received
5102	   on that failover endpoint or the message just received MUST be
5103	   rejected.
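
   The two replay checks above (CONNECT send time and XID ordering)
   can be kept in a small per-endpoint object.  This is a sketch under
   assumptions: the class and method names are hypothetical, times and
   XIDs are plain integers, and XID wraparound is not handled.

```python
class ReplayGuard:
    """Tracks the last accepted CONNECT time and XID for one endpoint."""

    def __init__(self):
        self.last_connect_time = None
        self.last_xid = None

    def accept_connect(self, sent_time):
        """Reject a CONNECT whose send time is not later than the saved one."""
        if self.last_connect_time is not None and sent_time <= self.last_connect_time:
            return False
        self.last_connect_time = sent_time
        return True

    def accept_xid(self, xid):
        """Reject a message whose XID is not greater than the previous one."""
        if self.last_xid is not None and xid <= self.last_xid:
            return False
        self.last_xid = xid
        return True
```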

5105	   A server MAY operate with arbitrary time skew between servers (see
5106	   section 5.10), but when using a shared secret administrators MAY wish
5107	   to configure a maximum allowable time skew between a failover server
5108	   and its partner(s).  Servers SHOULD allow an administrator to config-
5109	   ure a maximum allowable time skew between two failover partners.

5111	11.2.  TLS

5113	   TLS, Transport Layer Security, as specified in [RFC 2246] MAY be
5114	   used.  The use of TLS would be similar to the way it is used with
5115	   SMTP [RFC 2487] and IMAP/POP3/ACAP [RFC 2595].

5117	   To request the use of TLS, the primary MUST send the TLS-request
5118	   option as part of the CONNECT message. The secondary receiving the
5119	   TLS-request option MUST respond with a TLS-reply option indicating
5120	   its acceptance or rejection of the TLS-request in the CONNECTACK
5121	   message.

5123	   If the CONNECTACK message contained a TLS-reply of 1, then both
5124	   servers immediately begin TLS negotiation.

5126	   Upon completion of this negotiation, the primary server sends another
5127	   CONNECT message without any TLS-request option, and must wait for a
5128	   corresponding CONNECTACK.

5130	   Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [RFC 2246]
5131	   cipher suite is REQUIRED in Failover servers supporting TLS. This is
5132	   important as it assures that any two compliant implementations can be
5133	   configured to interoperate.

5135	12.  Failover Options

5137	   This section lists all of the options that are currently defined to
5138	   be used with the failover protocol.  See section 6.2 for details con-
5139	   cerning time values.

5141	12.1.  addresses-transferred

5143	   A 32-bit unsigned integer in network byte order. Reports the number of
5144	   addresses transferred by the primary to the secondary server
5145	   (addresses to be used for the secondary server's private address
5146	   pool).

5148	        Code        Len       Number of Addresses
5149	   +-----+-----+-----+-----+----+-----+-----+-----+
5150	   |  0  |  1  |  0  |  4  | n1 |  n2 |  n3 |  n4 |
5151	   +-----+-----+-----+-----+----+-----+-----+-----+
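
   Reading the diagram above as a two-octet option code (1), a
   two-octet length (4), and a four-octet count, the option could be
   packed and unpacked as in this sketch.  The function names are
   hypothetical.

```python
import struct

ADDRESSES_TRANSFERRED = 1  # option code from the diagram


def encode_addresses_transferred(count):
    """Pack code, length and a 32-bit count in network byte order."""
    return struct.pack("!HHI", ADDRESSES_TRANSFERRED, 4, count)


def decode_addresses_transferred(data):
    """Unpack the option and return the address count."""
    code, length, count = struct.unpack("!HHI", data[:8])
    if code != ADDRESSES_TRANSFERRED or length != 4:
        raise ValueError("not an addresses-transferred option")
    return count
```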

5153	12.2.  assigned-IP-address

5155	   The DHCP managed IP address to which this message refers.

5157	        Code        Len          Address
5158	   +-----+-----+-----+-----+----+-----+-----+-----+
5159	   |  0  |  2  |  0  |  4  | a1 |  a2 |  a3 |  a4 |
5160	   +-----+-----+-----+-----+----+-----+-----+-----+

5162	12.3.  binding-status

5164	   This option is used to convey the current state of a binding.

5166	       Code         Len     Type
5167	   +-----+-----+-----+-----+-----+
5168	   |  0  |  3  |  0  |  1  | 1-7 |
5169	   +-----+-----+-----+-----+-----+

5171	   Legal values for this option are:

5173	   Value Binding Status
5174	   ----- ------------------------------------------------
5175	   1     FREE           Lease is currently available to the primary
5176	   2     ACTIVE         Lease is assigned to a client
5177	   3     EXPIRED        Lease has expired
5178	   4     RELEASED       Lease has been released by client
5179	   5     ABANDONED      A server, or client flagged address as unusable
5180	   6     RESET          Lease was freed by some external agent
5181	   7     BACKUP         Lease belongs to secondary's private address pool
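
   The table of legal values lends itself to an enumeration.  The
   sketch below, with hypothetical names, encodes the option as a
   two-octet code (3), a two-octet length (1), and one value octet, and
   rejects values outside 1-7 when decoding.

```python
import struct
from enum import IntEnum


class BindingStatus(IntEnum):
    FREE = 1
    ACTIVE = 2
    EXPIRED = 3
    RELEASED = 4
    ABANDONED = 5
    RESET = 6
    BACKUP = 7


BINDING_STATUS = 3  # option code from the diagram


def encode_binding_status(status):
    """Pack code, length and the one-octet status value."""
    return struct.pack("!HHB", BINDING_STATUS, 1, int(BindingStatus(status)))


def decode_binding_status(data):
    """Unpack the option; an illegal value raises ValueError."""
    code, length, value = struct.unpack("!HHB", data[:5])
    if code != BINDING_STATUS or length != 1:
        raise ValueError("not a binding-status option")
    return BindingStatus(value)
```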

5183	12.4.  client-identifier

5185	   This is the client-identifier for the client associated with a
5186	   binding.  The client-identifier data is subject to the same
5187	   conventions as DHCP option 61 [RFC 2132].

5189	        Code        Len       Client Identifier
5190	   +-----+-----+-----+-----+----+-----+---
5191	   |  0  |  4  |  0  |  n  | i1 |  i2 | ...
5192	   +-----+-----+-----+-----+----+-----+--

5194	12.5.  client-hardware-address

5196	   This is the hardware address for the client associated with a
5197	   binding.  Byte t1 (type) MUST be set to the proper ARP hardware
5198	   address code, as defined in the ARP section of RFC 1700 (it MUST NOT
5199	   be zero).

5201	        Code        Len     htype   chaddr
5202	   +-----+-----+-----+-----+----+-----+-----+---
5203	   |  0  |  5  |  0  |  n  | t1 |  c1 |  c2 | ...
5204	   +-----+-----+-----+-----+----+-----+-----+---

5206	12.6.  client-last-transaction-time

5208	   The time at which this server last received a DHCP request from a
5209	   particular client expressed as an absolute time (see section 6.2).

5211	        Code        Len    client last transaction time
5212	   +-----+-----+-----+-----+----+-----+-----+-----+
5213	   |  0  |  6  |  0  |  4  | t1 |  t2 |  t3 |  t4 |
5214	   +-----+-----+-----+-----+----+-----+-----+-----+

5216	12.7.  client-reply-options

5218	   This option contains options from a DHCP server's reply to a DHCP
5219	   client request.  It is sent in a BNDUPD message.  The first 4 bytes
5220	   of the option contain the "magic number" of the option area from
5221	   which the DHCP reply options were taken and serve to define the
5222	   format of the rest of the sub-options contained in this option.
5223	   After the magic number, the options included are in the normal
5224	   options format appropriate for that magic number.

5226	   A server SHOULD NOT include all of the options in a DHCP server's
5227	   reply to a client's request in this option, but rather a server
5228	   SHOULD include only those options which are of likely interest to its
5229	   partner server.  See section 7.1 for details.

5231	        Code        Len         Magic Number      Embedded options
5232	   +-----+-----+-----+-----+----+----+----+----+----+----+--
5233	   |  0  |  7  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 |  ...
5234	   +-----+-----+-----+-----+----+----+----+----+----+----+--

5236	12.8.  client-request-options

5238	   This option contains options from a DHCP client's request.  It is
5239	   sent in a BNDUPD message.  The first 4 bytes of the option contain
5240	   the "magic number" of the option area from which the DHCP client's
5241	   request options were taken and serve to define the format of the
5242	   rest of the sub-options contained in this option.  After the magic
5243	   number, the options included are in the normal options format
5244	   appropriate for that magic number.

5246	   A server SHOULD NOT include all of the options in a DHCP client
5247	   request in this option, but rather a server SHOULD include only those
5248	   options which are of likely interest to its partner server.  See
5249	   section 7.1 for details.

5251	        Code        Len         Magic Number      Embedded options
5252	   +-----+-----+-----+-----+----+----+----+----+----+----+--
5253	   |  0  |  8  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 |  ...
5254	   +-----+-----+-----+-----+----+----+----+----+----+----+--
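
   The payload of client-reply-options and client-request-options could
   be unpacked as sketched below.  The sketch assumes the magic number
   is the options-area cookie 99.130.83.99 of RFC 2131, in which case
   the embedded options use the one byte code, one byte length format of
   RFC 2132.  The function names are hypothetical, and a repeated option
   code simply overwrites the earlier value in this simplified version.

```python
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])  # RFC 2131 options cookie


def split_embedded_options(payload: bytes):
    """Split an option payload into (magic number, embedded options)."""
    if len(payload) < 4:
        raise ValueError("payload too short to hold a magic number")
    return payload[:4], payload[4:]


def parse_dhcp_options(data: bytes) -> dict:
    """Parse RFC 2132 style options into a {code: value} dictionary.

    Pad options (code 0) are skipped and the end option (code 255)
    stops parsing.
    """
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 0:
            i += 1
            continue
        if code == 255:
            break
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options[code] = value
        i += 2 + length
    return options
```

   A receiver would call parse_dhcp_options only after checking that
   the magic number is one it understands.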

5256	12.9.  DDNS

5258	   If an implementation supports Dynamic DNS updates, this option is
5259	   used to communicate the status of the DDNS update associated with a
5260	   particular lease binding.  The Flags field conveys the types of DNS
5261	   RRs that are to be updated by the DHCP server, and the status of the
5262	   DDNS update.  The Domain Name field conveys the DNS FQDN that the
5263	   DHCP server is using to refer to the client, in DNS encoding as
5264	   specified in [RFC 1035].

5266	       Code        Len        Flags      Domain Name
5267	   +-----+-----+-----+-----+-----+------+------+-----+------
5268	   |  0  |  9  |  0  |  n  |   flags    |  d1  |  d2 | ...
5269	   +-----+-----+-----+-----+-----+------+------+-----+------

5271	   The Flags field is a 16-bit field; several bit positions are
5272	   specified here.

5274	                        1 1 1 1 1 1
5275	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
5276	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5277	   |C|A|D|P|       MBZ             |
5278	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5280	   The bits (numbered from the least-significant bit in network
5281	   byte-order) are used as follows:

5283	   0 (C): name to address (such as A RR) update successfully completed
5284	   1 (A): Server is controlling A RR on behalf of the client
5285	   2 (D): address to name (such as PTR RR) update successfully completed (Done)
5286	   3 (P): Server is controlling PTR RR on behalf of the client
5287	   4-15 : Must be zero

5289	   All of the unspecified bit positions SHOULD be set to 0 by servers
5290	   sending the Failover-DDNS option, and they MUST be ignored by servers
5291	   receiving the option.
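
   The flag tests can be sketched in Python as follows.  The text above
   numbers bits from the least-significant bit, while the diagram draws
   bit 0 at the left; the sketch therefore makes the numbering an
   explicit parameter and defaults to the rule given in the text.  That
   default, and all names used, are assumptions of this example.

```python
import struct

# Bit positions of the named flags, as listed above.
DDNS_BITS = {"C": 0, "A": 1, "D": 2, "P": 3}


def decode_ddns_flags(flags: int, msb_first: bool = False) -> dict:
    """Return the C, A, D and P flags of a 16-bit Flags value.

    With msb_first=False, bit 0 is the least-significant bit.
    With msb_first=True, bit 0 is the most-significant bit.
    Unspecified bits are ignored, as required of receivers.
    """
    result = {}
    for name, bit in DDNS_BITS.items():
        shift = 15 - bit if msb_first else bit
        result[name] = bool((flags >> shift) & 1)
    return result


def decode_ddns_payload(payload: bytes, msb_first: bool = False):
    """Split a DDNS option payload into (flags dict, domain name bytes)."""
    if len(payload) < 2:
        raise ValueError("payload too short to hold the Flags field")
    (flags,) = struct.unpack("!H", payload[:2])
    return decode_ddns_flags(flags, msb_first), payload[2:]
```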

5293	12.10.  delayed-service-parameter

5295	   The delayed-service-parameter is an optional load balancing tuning
5296	   parameter, defined in [RFC 3074].  If it is used, it MUST be sent in
5297	   the same message as the hash-bucket-assignment option (see section
5298	   12.11).

5300	   Format :

5302	       Code        Len    Seconds
5303	   +-----+-----+-----+-----+----+
5304	   |  0  |  10 |  0  |  1  | S  |
5305	   +-----+-----+-----+-----+----+

5307	   S is a one byte value, 1..255.

5309	12.11.  hash-bucket-assignment

5311	   A set of load balancing hash values for the secondary server.  A one
5312	   bit in the hash buckets indicates that the secondary is to service
5313	   that set of clients.  See section 5.3 for more information on how
5314	   this option is used.  This option is only sent from the primary to
5315	   the secondary.

5317	   The format and usage of the data in this option is defined in [RFC
5318	   3074].

5320	        Code        Len        Hash Buckets
5321	   +-----+-----+-----+-----+-----+-----+-----+-----+
5322	   |  0  |  11 |  0  |  32 |  b1 |  b2 | ... | b32 |
5323	   +-----+-----+-----+-----+-----+-----+-----+-----+
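
   Testing whether the secondary services a given hash bucket might
   look like the sketch below.  The 32 bytes are treated as a 256 bit
   map indexed by a bucket number in the range 0..255, which is assumed
   to come from the hash defined in [RFC 3074] and is not computed here.
   The order of bits within each byte is an assumption exposed as a
   parameter; [RFC 3074] is authoritative.  The function name is
   hypothetical.

```python
def secondary_serves(hash_buckets: bytes, bucket: int,
                     lsb_first: bool = True) -> bool:
    """Return True if the bit for `bucket` is set in the 32-byte map."""
    if len(hash_buckets) != 32:
        raise ValueError("hash bucket data must be exactly 32 bytes")
    if not 0 <= bucket <= 255:
        raise ValueError("bucket must be in the range 0..255")
    byte = hash_buckets[bucket // 8]
    position = bucket % 8
    # Assumed bit order: see the lead-in text.
    mask = (1 << position) if lsb_first else (0x80 >> position)
    return bool(byte & mask)
```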

5325	12.12.  IP-flags

5327	   This option is used to convey the current flags of the assigned-IP-
5328	   address option preceding it.

5330	       Code         Len       IP Flags
5331	   +-----+-----+-----+-----+-----+-----+
5332	   |  0  |  12 |  0  |  2  |  f1 |  f2 |
5333	   +-----+-----+-----+-----+-----+-----+

5335	   The IP-flags field is a 16-bit field; two bit positions are
5336	   specified here.

5338	                        1 1 1 1 1 1
5339	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
5340	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5341	   |R|B|           MBZ             |
5342	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5344	   The bits (numbered from the least-significant bit in network
5345	   byte-order) are used as follows:

5347	   0 (R): RESERVED  (this bit is allocated, in use and named "RESERVED")
5348	          Bit 0 MUST be set to 1 whenever the IP address in the preceding
5349	          assigned-IP-address option is reserved on the server sending the
5350	          packet.
5351	   1 (B): BOOTP
5352	          Bit 1 MUST be set to 1 whenever the IP address in the preceding
5353	          assigned-IP-address option is an IP address which has been
5354	          allocated due to an interaction with a BOOTP client (as opposed
5355	          to a DHCP client).
5356	   2-15  : Must be zero

5358	12.13.  lease-expiration-time

5360	   The lease expiration time is the lease interval that a DHCP server
5361	   has ACKed to a DHCP client added to the time at which that ACK was
5362	   transmitted -- expressed as an absolute time (see section 6.2).

5364	        Code        Len          Time
5365	   +-----+-----+-----+-----+----+-----+-----+-----+
5366	   |  0  |  13 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
5367	   +-----+-----+-----+-----+----+-----+-----+-----+

5369	12.14.  max-unacked-bndupd

5371	   The maximum number of BNDUPD messages that this server is prepared to
5372	   accept over the TCP connection without causing the TCP connection to
5373	   block.  A 32 bit unsigned integer value, in network byte order.

5375	        Code        Len     Maximum Unacked BNDUPD
5376	   +-----+-----+-----+-----+----+-----+-----+-----+
5377	   |  0  |  14 |  0  |  4  | n1 |  n2 |  n3 |  n4 |
5378	   +-----+-----+-----+-----+----+-----+-----+-----+

5380	12.15.  MCLT

5382	   Maximum Client Lead Time, an interval, in seconds.  A 32 bit unsigned
5383	   integer value, in network byte order.

5385	        Code        Len             Time
5386	   +-----+-----+-----+-----+----+-----+-----+-----+
5387	   |  0  |  15 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
5388	   +-----+-----+-----+-----+----+-----+-----+-----+

5390	12.16.  message

5392	   This option is used to supply a human readable message text.  It may
5393	   be used in association with the Reject Reason Code to provide a human
5394	   readable error message for the reject.

5396	        Code        Len         Text
5397	   +-----+-----+-----+-----+------+-----+--
5398	   |  0  |  16 |  0  |  n  |  c1  | c2  | ...
5399	   +-----+-----+-----+-----+------+-----+--

5401	12.17.  message-digest

5403	   The message digest for this message.

5405	   This option consists of a variable number of bytes which contain the
5406	   message digest of the message prior to the inclusion of this option.

5408	   When this option appears in a message, it MUST appear as the first
5409	   option in the message.  It MUST appear in every message if message
5410	   digests are required.  The Type MUST be configurable (once additional
5411	   types are defined).  When additional types are defined, they MUST be
5412	   specified as either optional (MAY be supported) or required (MUST be
5413	   supported).  See the section on IANA considerations for more details.

5415	        Code        Len      Type   Message Digest
5416	   +-----+-----+-----+-----+-----+-----+-----+--
5417	   |  0  |  17 |  0  |  n  |  t  |  d1 |  d2 | ...
5418	   +-----+-----+-----+-----+-----+-----+-----+--

5420	      Type:    0      Not Allowed
5421	               1      HMAC-MD5
5422	               2-255  Not Allowed
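
   A type 1 (HMAC-MD5) message-digest option could be built and checked
   as sketched below, using the Python standard library.  The sketch
   assumes that the caller supplies exactly the bytes to be digested
   (the message prior to the inclusion of this option) and a shared
   secret key; how those are obtained is outside this example, and the
   function names are hypothetical.

```python
import hashlib
import hmac
import struct

MESSAGE_DIGEST = 17  # option code shown in the diagram above
HMAC_MD5 = 1         # the only digest type currently defined


def build_message_digest_option(key: bytes, message: bytes) -> bytes:
    """Return the option: header, type byte, then the 16-byte digest."""
    digest = hmac.new(key, message, hashlib.md5).digest()
    payload = bytes([HMAC_MD5]) + digest
    return struct.pack("!HH", MESSAGE_DIGEST, len(payload)) + payload


def verify_message_digest_option(key: bytes, message: bytes,
                                 option: bytes) -> bool:
    """Check a received option against a locally computed digest."""
    if len(option) < 5:
        return False
    code, length = struct.unpack("!HH", option[:4])
    payload = option[4:4 + length]
    if code != MESSAGE_DIGEST or len(payload) != length:
        return False
    if payload[0] != HMAC_MD5:
        return False
    expected = hmac.new(key, message, hashlib.md5).digest()
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(payload[1:], expected)
```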

5424	12.18.  potential-expiration-time

5426	   The potential expiration time is the time that one server tells
5427	   another server that it may wish to grant in a lease to a DHCP client.
5428	   It is an absolute time.  See section 6.2.

5430	        Code        Len          Time
5431	   +-----+-----+-----+-----+----+-----+-----+-----+
5432	   |  0  |  18 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
5433	   +-----+-----+-----+-----+----+-----+-----+-----+

5435	12.19.  receive-timer

5437	   The number of seconds (an interval) within which the server must
5438	   receive a message from its partner, or it will assume that
5439	   communications with the partner are not OK.  An unsigned 32 bit
5440	   integer in network byte order.

5442	        Code        Len         Receive Timer
5443	   +-----+-----+-----+-----+----+-----+-----+-----+
5444	   |  0  |  19 |  0  |  4  | s1 |  s2 |  s3 |  s4 |
5445	   +-----+-----+-----+-----+----+-----+-----+-----+

5447	12.20.  protocol-version

5449	   The protocol version being used by the server. It is only sent in the
5450	   CONNECT and CONNECTACK messages.  The current value for the version
5451	   is 1.

5453	        Code        Len    Version
5454	   +-----+-----+-----+-----+-----+
5455	   |  0  |  20 |  0  |  1  |  1  |
5456	   +-----+-----+-----+-----+-----+

5458	12.21.  reject-reason

5460	   This option is used to selectively reject binding updates. It MAY be
5461	   used in a BNDACK message or a CONNECTACK message, always associated
5462	   with an assigned-IP-address option, which contains the IP address of
5463	   the update being rejected.

5465	        Code        Len   Reason Code
5466	   +-----+-----+-----+-----+-----+
5467	   |  0  |  21 |  0  |  1  |  R1 |
5468	   +-----+-----+-----+-----+-----+

5470	   Reason codes (section where referenced in parentheses):

5472	   0   Reserved
5473	   1   Illegal IP address (not part of any address pool). (7.1.3)
5474	   2   Fatal conflict exists: address in use by other client. (7.1.3)
5475	   3   Missing binding information. (7.1.3)
5476	   4   Connection rejected, time mismatch too great. (7.8.2)
5477	   5   Connection rejected, invalid MCLT. (7.8.2)
5478	   6   Connection rejected, unknown reason. (not specifically referenced)
5479	   7   Connection rejected, duplicate connection. (unused)
5480	   8   Connection rejected, invalid failover partner. (7.8.2)
5481	   9   TLS not supported. (7.8.2)
5482	   10  TLS supported but not configured. (7.8.2)
5483	   11  TLS required but not supported by partner. (7.8.2)
5484	   12  Message digest not supported. (11.1)
5485	   13  Message digest not configured. (11.1)
5486	   14  Protocol version mismatch. (7.8.2)
5487	   15  Outdated binding information. (7.1.3)
5488	   16  Less critical binding information. (7.1.3)
5489	   17  No traffic within sufficient time. (8.6)
5490	   18  Hash bucket assignment conflict. (7.8.2)
5491	   19  IP not reserved on this server. (7.1.3)
5492	   20  Message digest failed to compare. (7.8.2)
5493	   21  Missing message digest. (7.1.3)
5494	   22-253 Reserved.
5495	   254 Unknown: Error occurred but does not match any reason code.
5496	   255 Reserved for code expansion.
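
   Because reject-reason is always associated with an assigned-IP-
   address option, the two might be encoded together as sketched below.
   Placing assigned-IP-address first is an assumption of this example,
   and the helper names are hypothetical.

```python
import ipaddress
import struct

ASSIGNED_IP_ADDRESS = 2  # option code from section 12.2
REJECT_REASON = 21       # option code shown in the diagram above


def encode_option(code: int, payload: bytes) -> bytes:
    """Two-byte code, two-byte length, then the payload."""
    return struct.pack("!HH", code, len(payload)) + payload


def encode_rejection(address: str, reason: int) -> bytes:
    """Encode the rejected IPv4 address followed by its reason code."""
    if not 0 <= reason <= 255:
        raise ValueError("reason code must fit in one byte")
    packed = ipaddress.IPv4Address(address).packed
    return (encode_option(ASSIGNED_IP_ADDRESS, packed)
            + encode_option(REJECT_REASON, bytes([reason])))
```

   For example, encode_rejection("192.0.2.10", 15) would report
   outdated binding information for that address.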

5498	12.22.  relationship-name

5500	   A string which is a unique identifier for the failover relationship.

5502	        Code        Len       Relationship Name
5503	   +-----+-----+-----+-----+----+-----+---
5504	   |  0  |  22 |  0  |  n  | c1 |  c2 |  ...
5505	   +-----+-----+-----+-----+----+-----+---

5507	12.23.  server-flags

5509	   This option is used to convey the current flags of the failover
5510	   endpoint in the sending server.

5512	       Code         Len     Server Flags
5513	   +-----+-----+-----+-----+-------+
5514	   |  0  |  23 |  0  |  1  | flags |
5515	   +-----+-----+-----+-----+-------+

5517	   The flags field is an 8-bit field; one bit position is
5518	   specified here.

5520	    0 1 2 3 4 5 6 7
5521	   +-+-+-+-+-+-+-+-+
5522	   |S|   MBZ       |
5523	   +-+-+-+-+-+-+-+-+

5525	   The bits (numbered from the least-significant bit in network
5526	   byte-order) are used as follows:

5528	   0 (S): STARTUP,
5529	          Bit 0 MUST be set to 1 whenever the server is in STARTUP state,
5530	          and set to 0 otherwise.  (Note that when in STARTUP state, the
5531	          state transmitted in the server-state option is usually the last
5532	          recorded state from stable storage, but see section 9.3 for
5533	          details.)
5534	   1-7  : Must be zero

5536	12.24.  server-state

5538	   This option is used to convey the current state of the failover
5539	   endpoint in the sending server.

5541	       Code         Len   Server State
5542	   +-----+-----+-----+-----+-----+
5543	   |  0  |  24 |  0  |  1  | 1-11|
5544	   +-----+-----+-----+-----+-----+

5546	   Legal values for this option are:

5548	   Value   Server State
5549	   -----   -------------------------------------------------------------
5550	   0       reserved
5551	   1       STARTUP                      Startup state (1)
5552	   2       NORMAL                       Normal state
5553	   3       COMMUNICATIONS-INTERRUPTED   Communication interrupted (safe)
5554	   4       PARTNER-DOWN                 Partner down (unsafe mode)
5555	   5       POTENTIAL-CONFLICT           Synchronizing
5556	   6       RECOVER                      Recovering bindings from partner
5557	   7       PAUSED                       Shutting down for a short period.
5558	   8       SHUTDOWN                     Shutting down for an extended
5559	                                        period.
5560	   9       RECOVER-DONE                 Interlock state prior to NORMAL
5561	   10      RESOLUTION-INTERRUPTED       Comm. failed during resolution
5562	   11      CONFLICT-DONE                Primary has resolved its conflicts

5564	   (1) The STARTUP state is never sent to the partner server; it is
5565	   indicated by the STARTUP bit in the server-flags option (see section
5566	   12.23).
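
   The relationship between the server-state value and the STARTUP flag
   can be sketched as follows.  The sketch assumes the usual case, in
   which a server in STARTUP state transmits its last recorded state;
   section 9.3 describes the exceptions, which are not modelled here.
   All names are hypothetical.

```python
from enum import IntEnum


class ServerState(IntEnum):
    STARTUP = 1
    NORMAL = 2
    COMMUNICATIONS_INTERRUPTED = 3
    PARTNER_DOWN = 4
    POTENTIAL_CONFLICT = 5
    RECOVER = 6
    PAUSED = 7
    SHUTDOWN = 8
    RECOVER_DONE = 9
    RESOLUTION_INTERRUPTED = 10
    CONFLICT_DONE = 11


def state_to_send(current: ServerState, last_recorded: ServerState):
    """Return (server-state value to transmit, STARTUP flag)."""
    if current is ServerState.STARTUP:
        if last_recorded is ServerState.STARTUP:
            raise ValueError("STARTUP is never sent to the partner")
        # Usual case only: send the state read from stable storage.
        return last_recorded, True
    return current, False
```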

5568	12.25.  start-time-of-state

5570	   This option is used for different states in different messages.  In a
5571	   BNDUPD message it represents the start time of the state of the lease
5572	   in the BNDUPD message.  In a STATE message, it represents the start
5573	   time of the partner server's failover state.  In all cases it is an
5574	   absolute time.

5576	        Code        Len      Start Time of State
5577	   +-----+-----+-----+-----+----+-----+-----+-----+
5578	   |  0  |  25 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
5579	   +-----+-----+-----+-----+----+-----+-----+-----+

5581	12.26.  TLS-reply

5583	   This option contains information relating to TLS security
5584	   negotiation.  It is sent in a CONNECTACK message.

5586	   A t1 value of 0 indicates no TLS operation; a value of 1 indicates
5587	   that TLS operation is required.

5589	        Code        Len      TLS
5590	   +-----+-----+-----+-----+-----+
5591	   |  0  |  26 |  0  |  1  |  t1 |
5592	   +-----+-----+-----+-----+-----+

5594	12.27.  TLS-request

5596	   This option contains information relating to TLS security
5597	   negotiation.  It is sent in a CONNECT message.

5599	   The t1 byte is the TLS request from the primary server.  A value of 0
5600	   indicates no TLS operation (to communicate, the secondary server MUST
5601	   NOT require TLS), a value of 1 indicates that TLS operation is
5602	   desired but not required (to communicate, the secondary server MAY
5603	   utilize TLS), and a value of 2 indicates that TLS operation is
5604	   required (to communicate, the secondary server MUST utilize TLS) to
5605	   establish communications with the primary server.

5607	        Code        Len      TLS
5608	   +-----+-----+-----+-----+-----+
5609	   |  0  |  27 |  0  |  1  |  t1 |
5610	   +-----+-----+-----+-----+-----+

5612	12.28.  vendor-class-identifier

5614	   A string which identifies the vendor of the failover protocol
5615	   implementation.

5617	        Code        Len    vendor class string
5618	   +-----+-----+-----+-----+----+-----+---
5619	   |  0  |  28 |  0  |  n  | c1 |  c2 |  ...
5620	   +-----+-----+-----+-----+----+-----+---

5622	12.29.  vendor-specific-options

5624	   This option is used to convey options specific to a particular
5625	   vendor's implementation.  The vendor class identifier is used to
5626	   specify which option space the embedded options are drawn from.
5627	   Every message that uses vendor specific options MUST have a vendor-
5628	   class-identifier option in it.

5630	   It functions similarly to the vendor class identifier and vendor
5631	   specific options in the DHCP protocol.

5633	   This option contains other options in the same two byte code, two
5634	   byte length format.  If this option appears in a message without a
5635	   corresponding vendor class identifier, it MUST be ignored.

5637	        Code        Len     Embedded options
5638	   +-----+-----+-----+-----+----+-----+---
5639	   |  0  |  29 |  0  |  n  | c1 |  c2 |  ...
5640	   +-----+-----+-----+-----+----+-----+---
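
   The rule that vendor-specific-options is ignored without a vendor-
   class-identifier can be sketched as below.  The sketch parses a flat
   sequence of options in the two byte code, two byte length format and
   applies the same parser to the embedded options.  The function names
   are hypothetical.

```python
import struct

VENDOR_CLASS_IDENTIFIER = 28
VENDOR_SPECIFIC_OPTIONS = 29


def parse_options(data: bytes) -> list:
    """Parse options into a list of (code, value) pairs, in order."""
    options = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options.append((code, value))
        offset += length
    return options


def vendor_options(message_options: bytes):
    """Return (vendor class, embedded options) for one message.

    If no vendor-class-identifier is present, any
    vendor-specific-options are ignored and (None, []) is returned.
    """
    parsed = parse_options(message_options)
    vendor_class = next(
        (v for c, v in parsed if c == VENDOR_CLASS_IDENTIFIER), None)
    if vendor_class is None:
        return None, []
    embedded = []
    for code, value in parsed:
        if code == VENDOR_SPECIFIC_OPTIONS:
            embedded.extend(parse_options(value))
    return vendor_class, embedded
```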

5642	13.  IANA Considerations

5644	   This document defines several number spaces (failover options, fail-
5645	   over message types, message digest types, and failover reject reason
5646	   codes). For all of these number spaces, certain values are defined in
5647	   this specification.  New values may only be defined by IETF Con-
5648	   sensus, as described in [RFC 2434]. Basically, this means that they
5649	   are defined by RFCs approved by the IESG.

5651	14.  Acknowledgments

5653	   Ralph Droms started it all, by sketching out an initial interserver
5654	   draft that embodied ideas from several past IETF meetings.  In that
5655	   draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
5656	   Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

5658	   Kim Kinnear and Bob Cole each extended that draft, separately and
5659	   then together, until they created an interserver draft that supported
5660	   any number of servers.  The complexity of that approach was just too
5661	   great, and that draft wasn't greeted with enthusiasm by many, includ-
5662	   ing its authors.

5664	   It did however lead to a much simpler approach embodied in the first
5665	   Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
5666	   Droms.  This draft posited only two servers -- a primary and a secon-
5667	   dary.

5669	   Kim Kinnear then wrote the Safe Failover draft to layer on top of the
5670	   Failover Draft and increase its robustness in the face of certain
5671	   rare network failures.

5673	   At the spring 1998 IETF meeting in LA, the DHC working group said
5674	   that they wanted a merged Failover and Safe Failover draft.  Steve
5675	   Gonczi and Bernie Volz stepped up and produced the raw material for
5676	   such a merged draft, along with a new message format designed around
5677	   DHCP options and other extensions and clarifications.  Kim Kinnear
5678	   edited their work into draft format and made other changes in time
5679	   for the Summer Chicago IETF meeting.

5681	   Many people have reviewed the various earlier drafts that went into
5682	   this result.  At American Internet, ideas were contributed by Brad
5683	   Parker.  At Cisco Systems, Paul Fox and Ellen Garvey contributed to
5684	   the design of the protocol.

5686	   During the summer and fall of 1998, two groups worked on separate
5687	   implementations of the UDP failover draft.  Bernie Volz and Steve
5688	   Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul
5689	   Fox made up the other.  These two groups worked together to produce
5690	   considerable changes and simplifications of the protocol during that
5691	   period, and Steve Gonczi and Kim Kinnear edited those changes into
5692	   the -03 draft in time for submission to the December 1998 Orlando
5693	   IETF meeting.

5695	   In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of
5696	   people interested in the failover draft.  During that meeting a gen-
5697	   eral agreement was reached to recast the failover protocol to use TCP
5698	   instead of UDP.  In addition, the group together brainstormed a work-
5699	   able load-balancing technique.  Kim Kinnear rewrote the entire draft
5700	   to include the changes made at that meeting as well as to restructure
5701	   the draft along guidelines suggested by Thomas Narten.  The result
5702	   was the -04 draft, submitted prior to the Oslo IETF meeting.

5704	   The initial idea for a hash-based load balancing approach was offered
5705	   by Ted Lemon, and the determination of an algorithm and its integra-
5706	   tion into the draft was done by Steve Gonczi.  The security section
5707	   was spearheaded by Bernie Volz.  Both contributed considerably to the
5708	   ideas and text in the rest of the draft with several reviews.

5710	   In early October of 1999, three conference calls were held to discuss
5711	   the -04 draft.  The -05 includes changes as a result of those calls,
5712	   perhaps the largest of which was to remove the load balancing
5713	   approach into a separate draft.   Thanks to all of the many people
5714	   who participated in the conference calls.  Changes were made because
5715	   of contributions by: Ted Lemon, David Erdmann, Richard Jones, Rob
5716	   Stevens, Thomas Narten, Diana Lane, and Andre Kostur.

5718	   Another conference call was held in mid-January of 2000, and the -06
5719	   draft was produced to tighten up the -05 draft both technically
5720	   and editorially.

5722	   The -07 draft was edited by Kim Kinnear and was based in part on
5723	   reviews by Richard Jones, Bernie Volz, and Steve Gonczi.  It embodies
5724	   several technical updates as well as numerous editorial revisions
5725	   that enhanced both correctness and clarity.

5727	   The -08 draft was edited by Kim Kinnear and was based on the results
5728	   of two conference calls held in October and November of 2000.  It
5729	   includes the correct second port number, a new state to synchronize
5730	   conflict resolution with load balancing, a generally accepted
5731	   approach to secondary pool allocation, and many other updates based
5732	   on both operational and implementation experience.

5734	   This, the -09 draft, was edited by Kim Kinnear based on discussions
5735	   held at the Minneapolis IETF in December of 2000, as well as issues
5736	   raised by Ted Lemon based on implementation and deployment.  The
5737	   specific changes were mailed to the dhcp-v4 list.

5739	   These most recent changes have not been widely circulated among the
5740	   other authors prior to submission to the IETF.

5742	   Glenn Waters of Nortel Networks contributed ideas and enthusiasm to
5743	   make a Failover protocol that was both "safe" and "lazy".

5745	15.  References

5747	   [DHCID] Stapp, M., Lemon, T., Gustafsson, A., "draft-ietf-dnsext-
5748	      dhcid-rr-02.txt", March, 2001.

5750	   [DNSRES] Stapp, M., "draft-ietf-dhc-dns-resolution-01.txt", March,
5751	      2001.

5753	   [FQDN] Rekhter, Y., Stapp, M., "draft-ietf-dhc-fqdn-option-01.txt",
5754	      March, 2001.

5756	   [RFC 1035] Mockapetris, P., "Domain Names - Implementation and
5757	      Specification", RFC 1035, November 1987.

5759	   [RFC 1534] Droms, R., "Interoperation between DHCP and BOOTP", RFC
5760	      1534, October 1993.

5762	   [RFC 2104] Krawczyk, H., Bellare, M., and Canetti, R., "HMAC: Keyed
5763	      Hashing for Message Authentication", RFC 2104, IBM T.J. Watson
5764	      Research Center, University of California at San Diego, February
5765	      1997.

5767	   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
5768	      Requirement Levels", BCP 14, RFC 2119, March 1997.

5770	   [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
5771	      2131, March 1997.

5773	   [RFC 2132] Alexander, S.,  Droms, R., "DHCP Options and BOOTP Vendor
5774	      Extensions", Internet RFC 2132, March 1997.

5776	   [RFC 2136] P. Vixie, S. Thomson, Y. Rekhter, J. Bound, "Dynamic
5777	      Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April
5778	      1997.

5780	   [RFC 2139] Rigney, C., "RADIUS Accounting", RFC 2139, Livingston
5781	      Enterprises, April 1997.

5783	   [RFC 2246] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246,
5784	      January 1999.

5786	   [RFC 2401] Kent, S., Atkinson, R., "Security Architecture for the
5787	      Internet Protocol", RFC 2401, November 1998.

5789	   [RFC 2434] Alvestrand, H. and T. Narten, "Guidelines for Writing an
5790	      IANA Considerations Section in RFCs", BCP 26, RFC 2434, October
5791	      1998.

5793	   [RFC 2487] Hoffman, P., "SMTP Service Extension for Secure SMTP over
5794	      TLS", RFC 2487, January 1999.

5796	   [RFC 2595] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC
5797	      2595, June 1999.

5799	   [RFC 3004] Stump, G., Droms, R., Gu, Y., Vyaghrapuri, R., Demirtjis,
5800	      A., Privat, J., "The User Class Option for DHCP", November 2000.

5802	   [RFC 3011] Waters, G., "The IPv4 Subnet Selection Option for DHCP",
5803	      RFC 3011, November 2000.

5805	   [RFC 3046] Patrick, M., "DHCP Relay Agent Information Option", RFC
5806	      3046, January 2001.

5808	   [RFC 3074] Volz, B., Gonczi, S., Lemon, T., Stevens, R., "DHC Load-
5809	      balancing Algorithm", RFC 3074, February 2001.

5811	16.  Authors' information

5813	      Ralph Droms
5814	      Kim Kinnear
5815	      Mark Stapp
5816	      Cisco Systems
5817	      250 Apollo Drive
5818	      Chelmsford, MA  01824

5820	      Phone: (978) 244-8000

5822	      EMail: rdroms@cisco.com
5823	             kkinnear@cisco.com
5824	             mjs@cisco.com

5826	      Bernie Volz
5827	      Ericsson
5828	      959 Concord St.
5829	      Framingham, MA  01701

5831	      Phone: +1-617-513-9060

5833	      EMail: bernie.volz@ericsson.com

5835	      Steve Gonczi
5836	      Network Engines, Inc.
5837	      25 Dan Road
5838	      Canton, MA 02021-2817

5840	      Phone: (781) 332-1165

5842	      EMail: steve.gonczi@networkengines.com

5844	      Greg Rabil, Mike Dooley, Arun Kapur
5845	      Lucent Technologies
5846	      400 Lapp Road
5847	      Malvern, PA 19355

5849	      Phone: (800) 208-2747
5850	      EMail: grabil@lucent.com
5851	             mdooley@lucent.com
5852	             akapur@lucent.com

5854	17.  Full Copyright Statement

5856	Copyright (C) The Internet Society (2000). All Rights Reserved.

5858	This document and translations of it may be copied and furnished to oth-
5859	ers, and derivative works that comment on or otherwise explain it or
5860	assist in its implementation may be prepared, copied, published and dis-
5861	tributed, in whole or in part, without restriction of any kind, provided
5862	that the above copyright notice and this paragraph are included on all
5863	such copies and derivative works.  However, this document itself may not
5864	be modified in any way, such as by removing the copyright notice or
5865	references to the Internet Society or other Internet organizations,
5866	except as needed for the  purpose of developing Internet standards in
5867	which case the procedures for copyrights defined in the Internet Stan-
5868	dards process must be followed, or as required to translate it into
5869	languages other than English.

5871	The limited permissions granted above are perpetual and will not be
5872	revoked by the Internet Society or its successors or assigns.

5874	This document and the information contained herein is provided on an "AS
5875	IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
5876	FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
5877	LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
5878	INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FIT-
5879	NESS FOR A PARTICULAR PURPOSE.