idnits 2.17.1 

draft-ietf-dhc-failover-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 33 instances of too long lines in the document, the longest
     one being 6 characters in excess of 72.

  ** The abstract seems to contain references ([RFC2131]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 1414 has weird spacing: '...od ends  lease...'

  == Line 1910 has weird spacing: '...eserved    not...'

  == Line 2068 has weird spacing: '...  htype   chad...'

  == Line 2402 has weird spacing: '...    Len  reque...'

  == Line 5138 has weird spacing: '...ore the  expir...'

  == (1 more instance...)

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     In this state a server MUST respond to all DHCP client requests,
     and the algorithm for load balancing described in section 5.3 MUST NOT be
     used.  When allocating new IP addresses, each server allocates from its
     own IP address pool, where the primary MUST allocate only FREE IP
     addresses, and the secondary MUST allocate only BACKUP IP addresses. When
     responding to renewal requests, each server will allow continued renewal
     of a DHCP client's current lease on an IP address irrespec-tive of
     whether that lease was given out by the receiving server or not, although
     the renewal period MUST not exceed the maximum client lead time (MCLT)
     beyond the potential-expiration-time already ack-nowledged by the other
     server or the lease-expiration-time or potential-expiration-time received
     from the partner server.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     In this state a server MUST respond to all DHCP client requests,
     and any load balancing (described in section 5.3) MUST NOT be used.  When
     allocating new IP addresses, each server SHOULD allocate from its own IP
     address pool (if that can be determined), where the primary MUST allocate
     only FREE IP addresses, and the secondary MUST allocate only BACKUP IP
     addresses.  When responding to renewal requests, each server will allow
     continued renewal of a DHCP client's current lease on an IP address
     irrespective of whether that lease was given out by the receiving server
     or not, although the renewal period MUST not exceed the maximum client
     lead time (MCLT) beyond the potential-expiration-time already
     acknowledged by the other server or the lease-expiration-time or
     potential-expiration-time received from the partner server.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 2000) is 8776 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 657

  == Missing Reference: 'USERCLASS' is mentioned on line 5371, but not defined

  == Missing Reference: 'IPAMTLS' is mentioned on line 5236, but not defined

  == Unused Reference: 'RFC 2132' is defined on line 5339, but no explicit
     reference was found in the text

  == Unused Reference: 'IMAPTLS' is defined on line 5348, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2246 (ref. 'TLS') (Obsoleted by RFC
     4346)

  ** Obsolete normative reference: RFC 2487 (ref. 'SMTPTLS') (Obsoleted by
     RFC 3207)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NAMESPACE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DDNS'

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     'MD5')

  ** Obsolete normative reference: RFC 2139 (ref. 'RADIUS') (Obsoleted by RFC
     2866)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'LOADB'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'AGENTINFO'


     Summary: 11 errors (**), 0 flaws (~~), 14 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Ralph Droms
2	INTERNET DRAFT                                       Bucknell University

4	                                                             Kim Kinnear
5	                                                              Mark Stapp
6	                                                           Cisco Systems

8	                                                             Bernie Volz
9	                                                            Steve Gonczi
10	                                                        Process Software

12	                                                              Greg Rabil
13	                                                             Mike Dooley
14	                                                              Arun Kapur
15	                                                     Lucent Technologies

17	                                                            October 1999
18	                                                      Expires April 2000

20	                         DHCP Failover Protocol
21	                    <draft-ietf-dhc-failover-05.txt>

23	Status of this Memo

25	   This document is an Internet-Draft and is in full conformance with
26	   all provisions of Section 10 of RFC2026.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF), its areas, and its working groups.  Note that
30	   other groups may also distribute working documents as Internet-
31	   Drafts.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   The list of current Internet-Drafts can be accessed at
39	   http://www.ietf.org/ietf/1id-abstracts.txt

41	   The list of Internet-Draft Shadow Directories can be accessed at
42	   http://www.ietf.org/shadow.html.

44	Copyright Notice

46	   Copyright (C) The Internet Society (1999). All Rights Reserved.

48	Abstract

50	   DHCP [RFC 2131] allows for multiple servers to be operating on a
51	   single network.  Some sites are interested in running multiple
52	   servers in such a way as to provide redundancy in case of server
53	   failure.  In order for this to work reliably, the cooperating primary
54	   and secondary servers must maintain a consistent database of the
55	   lease information.  This implies that servers will need to coordinate
56	   any and all lease activity so that this information is synchronized
57	   in case of failover.

59	   This document defines a protocol to provide this synchronization
60	   between two servers.  One server is designated the "primary" server,
61	   the other is the "secondary" server.  This document also describes a
62	   way to integrate the failover protocol with the DHCP loadbalancing
63	   approach.

65	   This document is a significant revision of draft-ietf-dhc-failover-
66	   04.txt.

68	Table of Contents

70	    1.  Introduction
71	    2.  Terminology
72	    2.1.  Requirements terminology
73	    2.2.  DHCP and failover terminology
74	    3.  Background and External Requirements
75	    3.1.  Key aspects of the DHCP protocol
76	    3.2.  BOOTP relay agent implementation
77	    3.3.  What does it mean if a server can't communicate with its partner?
78	    3.4.  Challenging scenarios for a Failover protocol
79	    3.5.  Using TCP to detect partner server failure
80	    4.  Design Goals
81	    4.1.  Design requirements for this protocol
82	    4.2.  Goals for this protocol
83	    4.3.  Limitations of this Protocol
84	    5.  Protocol Overview
85	    5.1.  Messages and States
86	    5.2.  Fundamental restrictions
87	    5.3.  Load balancing
88	    5.4.  Operating in NORMAL state
89	    5.5.  Operating in COMMUNICATIONS-INTERRUPTED state
90	    5.6.  Operating in PARTNER-DOWN state
91	    5.7.  Operating in RECOVER state
92	    5.8.  Operating in STARTUP state
94	    5.9.  Time synchronization between servers
95	    5.10.  IP address binding-status
96	    5.11.  DNS dynamic update considerations
97	    5.12.  Reservations and failover
98	    5.13.  Dynamic BOOTP and failover
99	    5.14.  Guidelines for selecting MCLT
100	    6.  Packet Formats
101	    6.1.  Common message format
102	    6.2.  Common option format
103	    6.3.  BNDUPD message format
104	    6.4.  BNDACK message format
105	    6.5.  Bulking for BNDUPD and BNDACK messages
106	    6.6.  UPDREQ message format
107	    6.7.  UPDREQALL message format
108	    6.8.  UPDDONE message format
109	    6.9.  POOLREQ message format
110	    6.10.  POOLRESP message format
111	    6.11.  CONNECT message format
112	    6.12.  CONNECTACK message format
113	    6.13.  STATE message format
114	    6.14.  CONTACT message format
115	    6.15.  DISCONNECT message format
116	    7.  Protocol Messages
117	    7.1.  BNDUPD message
118	    7.2.  BNDACK message
119	    7.3.  UPDREQ message
120	    7.4.  UPDREQALL message
121	    7.5.  UPDDONE message
122	    7.6.  POOLREQ message
123	    7.7.  POOLRESP message
124	    7.8.  CONNECT message
125	    7.9.  CONNECTACK message
126	    7.10.  STATE message
127	    7.11.  CONTACT message
128	    7.12.  DISCONNECT message
129	    8.  Connection Management
130	    8.1.  Connection granularity
131	    8.2.  Creating the TCP connection
132	    8.3.  Using the TCP connection for determining communications status
133	    8.4.  Using the TCP connection for binding data
134	    8.5.  Using the TCP connection for control messages
135	    8.6.  Losing the TCP connection
136	    9.  Protocol States
137	    9.1.  Server Initialization
138	    9.2.  Server State Transitions
139	    9.3.  STARTUP state
140	    9.4.  PARTNER-DOWN state
141	    9.5.  RECOVER state
142	    9.6.  NORMAL state
143	    9.7.  COMMUNICATIONS-INTERRUPTED State
144	    9.8.  POTENTIAL-CONFLICT state
145	    9.9.  RESOLUTION-INTERRUPTED state
146	    9.10.  RECOVER-DONE state
147	    9.11.  PAUSED state
148	    9.12.  SHUTDOWN state
149	    10.  Safe Period
150	    11.  Security
151	    11.1.  Simple shared secret
152	    11.2.  TLS
153	    12.  Acknowledgments
154	    13.  References
155	    14.  Author's information
156	    15.  Full Copyright Statement

158	1.  Introduction

160	   DHCP [RFC 2131] allows multiple servers to operate on a single
161	   network.  Some sites are interested in running multiple servers in
162	   such a way as to provide redundancy in case of server failure, since
163	   the DHCP subsystem is in many cases a critical part of the network
164	   infrastructure.

166	   This document defines a protocol to provide synchronization between
167	   two servers in order that each can take over for the other should
168	   either one fail or become unreachable.

170	   One server is designated the "primary" server, the other is the
171	   "secondary" server, and all DHCP client requests are sent to each
172	   server.

174	   In order to provide a high-availability DHCP service, these
175	   cooperating primary and secondary servers must maintain a consistent
176	   database of lease information.  This implies that servers will need
177	   to coordinate any and all lease activity so that this information is
178	   synchronized in case failover is required.  The protocol messages and
179	   processing techniques required to maintain a consistent database are
180	   specified in the protocol described here.

182	   The failover protocol also contains an algorithm which allows each
183	   server to determine to which DHCP clients it should provide service
184	   when both servers are operating normally, and this capability can be
185	   used to support load balancing.

187	2.  Terminology

189	   This section discusses both the generic requirements terminology
190	   common to many IETF protocol specifications and the specialized
191	   terminology specific to DHCP and the failover protocol.

193	2.1.  Requirements terminology

195	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
196	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
197	   document are to be interpreted as described in RFC 2119 [RFC 2119].

199	2.2.  DHCP and failover terminology

201	   This document uses the following terms:

203	      o "DHCP client" or "client"

205	        A DHCP client is an Internet host using DHCP to obtain confi-
206	        guration parameters such as a network address.  The term
207	        "client" used within this document always means a DHCP client,
208	        and never one of the two failover servers.

210	      o "DHCP server" or "server"

212	        A DHCP server is an Internet host that returns configuration
213	        parameters to DHCP clients.

215	      o "binding"

217	        A binding is a collection of configuration parameters, including
218	        at least an IP address, associated with or "bound to" a DHCP
219	        client.  Bindings are managed by DHCP servers.

221	      o "binding database"

223	        The collection of bindings managed by a primary and secondary.

225	      o "failover endpoint"

227	        The failover protocol allows for there to be a unique failover
228	        endpoint per partner per role (where role is primary or secon-
229	        dary).  This failover endpoint can take actions and hold unique
230	        states.  There are thus a maximum of two failover endpoints per
231	        server per partner (one for each partner as a primary and one
232	        for that same partner as a secondary).

234	      o "lazy update"

236	        Lazy update refers to the requirement placed on a server imple-
237	        menting a failover protocol to update its failover partner when-
238	        ever the binding database changes.  A failover protocol which
239	        didn't support lazy update would require the failover partner
240	        update to be complete before a DHCP server could respond to a
241	        DHCP client request with a DHCPACK.  A failover protocol which
242	        does support lazy update places no such restriction on the
243	        update of the failover partner server, and so a server can allo-
244	        cate an IP address or extend a lease on an IP address and then
245	        update its failover partner as time permits.  A failover proto-
246	        col which supports lazy update not only removes the requirement
247	        to update the failover partner prior to responding to a DHCP
248	        client with a DHCPACK, but also allows gathering up batches of
249	        updates from one failover server to its partner.

251	      o "subnet address pool"

253	        A subnet address pool is the set of IP addresses associated with
254	        a particular network number and subnet mask.  In the
255	        simple case, there is a single network number and subnet mask
256	        and a set of IP addresses.  In the more complex case (sometimes
257	        called "secondary subnets", sometimes "superscopes"), several
258	        (apparently unrelated) network number and subnet mask combina-
259	        tions with their associated IP addresses may all be configured
260	        together into one subnet address pool.

262	      o "Primary server" or "Primary"

264	        A DHCP server configured to provide primary service to a set of
265	        DHCP clients for a particular set of subnet address pools.

267	      o "Secondary server" or "Secondary"

269	        A DHCP server configured to act as backup to a primary server
270	        for a particular set of subnet address pools.

272	      o "stable storage"

274	        Every DHCP server is assumed to have some form of what is called
275	        "stable storage".  Stable storage is used to hold information
276	        concerning IP address bindings (among other things) so that this
277	        information is not lost in the event of a server failure which
278	        requires restart of the server.

280	      o "MCLT"

281	        The MCLT refers to maximum client lead time.  This time is con-
282	        figured on the primary server and transmitted from the primary
283	        to the secondary server in the CONNECT message.  It is the max-
284	        imum amount of time that one server can give to a client for a
285	        binding beyond that known and ACKed by the partner server.  See
286	        section 5.2.1 for details.

288	      o "DNS"

290	        An abbreviation for "Domain Name System", a scheme where a cen-
291	        tral name repository is used to map names to IP addresses and IP
292	        addresses to names.

294	      o "FQDN"

296	        An FQDN is a "fully qualified domain name".  A fully qualified
297	        domain name generally is a host name with at least one zone
298	        name; for example, "www.dhcp.org" is a fully qualified domain
299	        name.

301	      o "partner"

303	        A "partner", for the purposes of this document, refers to a
304	        failover server, typically the other failover server.  In many
305	        (if not most) cases, the failover protocol is symmetric with
306	        respect to the primary or secondary nature of the servers, and
307	        so it is often appropriate to discuss "updating the partner
308	        server", since it could be a primary server updating a secondary
309	        server or a secondary server updating a primary server.

311	      o "RR"

313	        "RR" is an abbreviation for "resource record".  All records in
314	        the DNS are resource records.  The resource records of most
315	        relevance to this document are the "A" resource record, which
316	        maps a DNS name to a particular IP address, the "PTR" resource
317	        record, which allows a "reverse map", from the IP address back
318	        to a DNS name, and the "KEY" resource record, which is used in
319	        ways defined in [DDNS] to tag a DNS name with the identity of
320	        the DHCP client with which it is associated.

322	      o "DDNS"

324	        An abbreviation for "Dynamic DNS", which refers to the capabil-
325	        ity to update a DNS server's name (actually resource record)
326	        database using an on-the-wire protocol defined in [RFC2136].

328	      o "binding-status"

329	        The binding-status is the status of an IP address with respect
330	        to its association with a client.  There are specific binding-
331	        status values defined for use by the failover protocol, e.g.,
332	        ACTIVE, FREE, RELEASED, ABANDONED, etc.  These are designed to
333	        map more or less directly onto the binding-status values used
334	        internally in most DHCP server implementations.  The term
335	        binding-status refers to the concept also sometimes known as
336	        "lease state" or "IP address state", but in this document the
337	        term "state" is reserved for the failover state of a failover
338	        endpoint, and binding-status is always used to refer to the
339	        state associated with an IP address or lease.
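
   The "lazy update" and "MCLT" definitions above can be sketched
   together.  The following is a minimal illustration, not the
   protocol's actual processing rules (see section 5.2.1 for those).
   The class name LazyUpdateServer, its fields, and the in-memory
   dictionary standing in for stable storage are all hypothetical.  It
   assumes that, when the partner has ACKed nothing for an address (or
   only a time already in the past), the MCLT bound is measured from
   the current time.

```python
from dataclasses import dataclass, field


@dataclass
class LazyUpdateServer:
    """Hypothetical failover endpoint that updates its partner lazily."""

    mclt: int                                            # maximum client lead time (s)
    acked_expiry: dict = field(default_factory=dict)     # ip -> expiry ACKed by partner
    stable_storage: dict = field(default_factory=dict)   # stand-in for a disk store
    pending: list = field(default_factory=list)          # updates not yet sent

    def grant(self, ip, client_id, now, desired_lease):
        """Grant or extend a lease without waiting for the partner."""
        # The client may receive at most MCLT beyond what the partner has
        # ACKed; with nothing (or only a past time) ACKed, count from now.
        limit = max(self.acked_expiry.get(ip, now), now) + self.mclt
        expiry = min(now + desired_lease, limit)
        self.stable_storage[ip] = (client_id, expiry)    # commit before the DHCPACK
        self.pending.append((ip, client_id, expiry))     # partner is updated later
        return expiry                                    # expiry sent in the DHCPACK

    def flush(self):
        """Return every queued update as one batch and clear the queue."""
        batch, self.pending = self.pending, []
        return batch
```

   In this sketch grant() never blocks on the partner, and flush()
   hands back a batch of updates that could be sent whenever time
   permits, which is the behavior the lazy update definition allows.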

341	3.  Background and External Requirements

343	   This section highlights key aspects of the DHCP protocol on which the
344	   failover protocol depends.  It also discusses the requirements that
345	   the failover protocol places on other aspects of the network infras-
346	   tructure, and some general issues surrounding server failure detec-
347	   tion.  Some failure scenarios that provide particular challenges to a
348	   failover protocol are discussed.  Finally, the challenges inherent in
349	   using a TCP connection as a means to detect failure of a partner
350	   server are elaborated.

352	3.1.  Key aspects of the DHCP protocol

354	   The failover protocol is designed to augment the DHCP protocol as
355	   described in RFC 2131 [RFC 2131].  There are several key aspects of
356	   the DHCP protocol which are required by the failover protocol in
357	   order to successfully meet its design goals.

359	3.1.1.  Broadcast behavior

361	   There are two aspects of the broadcast behavior of the DHCP protocol
362	   which are key to making the failover protocol operate successfully.
363	   The first is simply that the DHCP protocol requires a DHCP client to
364	   broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages.
365	   Because of this requirement, a DHCP client who was communicating with
366	   one server will automatically be able to communicate with another
367	   server if one is available.

369	   The second aspect of broadcast behavior is similar to the first, but
370	   involves the distinction between a DHCPREQUEST/RENEW and
371	   DHCPREQUEST/REBINDING.  A DHCPREQUEST/RENEW is the message that a
372	   DHCP client uses to extend its lease.  It is unicast to the DHCP
373	   server from which it acquired the lease.  However, the DHCP protocol
374	   (in a farsighted move) was explicitly designed so that in the event
375	   that a DHCP client cannot contact the server from which it received a
376	   lease on an IP address using a DHCPREQUEST/RENEW, the client is
377	   required to broadcast its renewal using a DHCPREQUEST/REBINDING to
378	   any available DHCP server.  Since all DHCP clients were required to
379	   implement this algorithm, the failover protocol can have a different
380	   server from the one that initially granted a lease be the server to
381	   renew a lease.  Thus, one server can take over for another with no
382	   interruption in the service as experienced by the DHCP client or its
383	   associated applications software.
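
   The RENEW and REBINDING behavior described above can be sketched
   from the client's point of view.  This is an illustration only.  The
   function name renewal_action and its return strings are
   hypothetical, and the T1 and T2 timers with their default values of
   0.5 and 0.875 of the lease time come from RFC 2131 rather than from
   this document.  Retransmission within each phase is omitted.

```python
def renewal_action(elapsed, lease_time, t1=None, t2=None):
    """Return what a well-behaved client does `elapsed` seconds into a lease."""
    t1 = 0.5 * lease_time if t1 is None else t1      # RFC 2131 default for T1
    t2 = 0.875 * lease_time if t2 is None else t2    # RFC 2131 default for T2
    if elapsed >= lease_time:
        return "stop using address"                  # lease ran out unextended
    if elapsed >= t2:
        return "broadcast DHCPREQUEST/REBINDING"     # any server may answer
    if elapsed >= t1:
        return "unicast DHCPREQUEST/RENEW"           # ask the granting server
    return "no action"
```

   The REBINDING branch is what lets a failover partner extend a lease
   that it did not originally grant.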

385	3.1.2.  Client responsibility

387	   In the DHCP protocol the DHCP clients are entrusted with a consider-
388	   able responsibility.  In particular, after they are granted a lease
389	   on an IP address, they are enjoined to only use that IP address while
390	   their lease is valid.  Every DHCP client is expected to stop using an
391	   IP address if the expiration time on the lease has passed and if it
392	   cannot get an extension on the lease for that IP address from some
393	   DHCP server.  Thus, the correct behavior of every DHCP client in this
394	   regard is required to ensure the integrity of the DHCP service.  On
395	   the other hand, incorrect behavior by a client in this area will tend
396	   to adversely affect at most one other DHCP client.

398	   Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or
399	   DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or
400	   broadcast for a REBINDING) MUST still have time to run on the lease
401	   for that IP address.  The DHCP server sends the DHCPACK back unicast
402	   to the IP address from which the RENEW or REBINDING originated.

404	   Given the existing responsibility placed on the client to only use an
405	   IP address when the lease is valid, and to only send in a RENEW or
406	   REBINDING if the lease is valid, the failover protocol relies on DHCP
407	   clients to perform responsibly and will, in the absence of conflict-
408	   ing information, believe a DHCP client that is attempting to RENEW or
409	   REBIND a lease on an IP address is the legitimate owner of that IP
410	   address.

412	   If clients do not follow these rules, it is possible for an address
413	   to be in use by more than one client. For a single server, this hap-
414	   pens because the server has leased the expired address to another
415	   client and the original client is also attempting to use the address.
416	   The server would NAK the renewal request. This is made slightly worse
417	   in the failover protocol if the two servers are unable to communicate
418	   with each other and one server leases an available address to a new
419	   client while the other server receives a renewal from a different
420	   client.  In this case, both servers lease the same address to dif-
421	   ferent clients for the MCLT time.

423	   One troublesome issue is that of the DHCP client responsibility when
424	   sending in DHCPREQUEST/INIT-REBOOT requests.  While the original DHCP
425	   RFC was written to require a DHCP client to have time left to run on
426	   the lease for an IP address if the client is sending an INIT-REBOOT
427	   request, it was sufficiently unclear that some client vendors didn't
428	   realize this until recently.  Since the INIT-REBOOT request was sent
429	   with the IP address in the dhcp-requested-address option and not in
430	   the ciaddr (for perfectly good reasons), the similarity to the RENEW
431	   and REBINDING case was lost on many people.

433	   At present, the failover protocol does not assume that a client send-
434	   ing in an INIT-REBOOT request necessarily has a valid lease on the IP
435	   address appearing in the dhcp-requested-address option in the INIT-
436	   REBOOT request.

438	   The implications of this are as follows: Assume that there is a DHCP
439	   client that gets a lease from one server while that server is unable
440	   to communicate with its failover partner.  Then, assume that after
441	   that client reboots it is able only to communicate with the other
442	   failover server.  If the failover servers have not been able to com-
443	   municate with each other during this process, then the DHCP client
444	   will get a new IP address instead of being able to continue to use
445	   its existing IP address. This will affect no applications on the DHCP
446	   client, since it is rebooting.  However, it will use up an additional
447	   IP address in this marginal case.

449	3.1.3.  Stable storage update before DHCPACK

451	   The DHCP protocol allocates resources, and in order to operate
452	   correctly it requires that a DHCP server update some form of stable
453	   storage prior to sending a DHCPACK to a DHCP client in order to grant
454	   that client a lease on an IP address.

456	   One of the goals of the failover protocol is that it not add signifi-
457	   cant additional time to this already time-consuming requirement to
458	   update stable storage prior to a DHCPACK.  In particular, adding a
459	   requirement to communicate with another server prior to sending a
460	   DHCPACK would simplify the failover protocol, but it would limit the
461	   potential scalability of any DHCP server which employed the failover
462	   protocol in an unacceptable manner.

464	3.2.  BOOTP relay agent implementation

466	   Many DHCP clients are not resident on the same network segment as a
467	   DHCP server.  In order to support this form of network architecture,
468	   most contemporary routers implement something known as a BOOTP Relay
469	   Agent.  This capability inside of a router listens for all broadcasts
470	   at the DHCP port, port 67, and will relay any broadcasts that it
471	   receives on to a DHCP server.  The IP address of the DHCP server must
472	   have been previously configured into the router.  As part of the
473	   relay process, the relay agent will place the address of the inter-
474	   face on which it received the broadcast into the giaddr field of the
475	   DHCP packet.

477	   Since the failover protocol requires two DHCP servers to receive any
478	   broadcast DHCP messages, in order to work with DHCP clients which are
479	   not local to the DHCP server, the BOOTP relay agent on the router
480	   closest to the DHCP client must be configured to point at more than
481	   one DHCP server.

483	   Most BOOTP relay agent implementations allow this duplication of
484	   packets.
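
   The relay behaviour described in this section can be sketched in
   Python as follows.  This is an illustration only: RelayConfig,
   relay_broadcast, and the dictionary packet representation are
   hypothetical names, not part of any real relay agent, and the check
   that giaddr is still zero before filling it in is an assumption
   covering the single-relay case described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DHCP_SERVER_PORT = 67  # port on which the relay agent listens for broadcasts


@dataclass
class RelayConfig:
    # Hypothetical configuration: the address of the interface facing the
    # clients, and the DHCP servers that broadcasts are relayed to.
    interface_addr: str
    servers: List[str]


def relay_broadcast(packet: Dict[str, str],
                    cfg: RelayConfig) -> List[Tuple[str, Dict[str, str]]]:
    """Return one (server, packet) pair per configured DHCP server."""
    relayed = dict(packet)
    # Record the receiving interface in giaddr (assumed: only if unset).
    if relayed.get("giaddr", "0.0.0.0") == "0.0.0.0":
        relayed["giaddr"] = cfg.interface_addr
    # Duplicate the packet so that both failover partners receive it.
    return [(server, dict(relayed)) for server in cfg.servers]
```

   Configuring two entries in "servers" is what allows both members of
   a failover pair to see each client broadcast.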

486	   If this is not possible, an administrator might be able to configure
487	   the relay agent with a subnet broadcast address, but in this case the
488	   primary and secondary DHCP servers in a failover pair must both
489	   reside on the same subnet.   While this is a realistic configuration,
490	   it is not the one that most people will use.

492	3.3.  What does it mean if a server can't communicate with its partner?

494	   In any protocol designed to allow one server to take over some
495	   responsibilities from a partner server in the event of "failure" of
496	   that partner server, there is an inherent difficulty in determining
497	   when that partner server has failed.

499	   In fact, it is fundamentally impossible for one server to distinguish
500	   a network communications failure from the outright failure of the
501	   server with which it is trying to communicate.  In the case where
502	   each server is handing out resources (in this case IP addresses) to
503	   a client community, mistaking an inability to communicate with a
504	   partner server for failure of that partner server could easily cause
505	   both servers to be handing out the same IP addresses to different
506	   clients.

508	   One way that this is sometimes handled is for there to be more than
509	   two servers.  In the case of an odd number of servers, the servers
510	   that can still communicate with a majority of other servers will con-
511	   sider themselves operational, and any server which can't communicate
512	   with a majority of other servers must immediately cease operations.
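
   As an illustration of the majority technique just described (which
   the failover protocol does not use), a server's decision could be
   sketched as below.  The function name is hypothetical, and counting
   the local server together with the peers it can reach is the usual
   quorum interpretation, assumed here.

```python
def may_continue_operating(reachable_peers: int, total_servers: int) -> bool:
    """Majority rule: this server plus the peers it can still reach
    must form a strict majority of all servers."""
    in_partition = reachable_peers + 1  # count the local server too
    return 2 * in_partition > total_servers
```

   With three servers, a server that can reach one peer keeps running,
   while a fully isolated server stops.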

514	   While this technique works in some domains, having the only server
515	   with which a DHCP client can communicate voluntarily shut itself
516	   down seems like something worth avoiding.

518	   The failover protocol will operate correctly while both servers are
519	   unable to communicate, whether they are both running or not.  At some
520	   point there may be resource contention, and if one of the servers is
521	   actually down, then the operator can inform the other server and the
522	   operational server will be able to use all of the downed server's
523	   resources.

525	   The protocol also allows detection of an orderly shutdown of a parti-
526	   cipating server.

528	3.4.  Challenging scenarios for a Failover protocol

530	   There exist two failure scenarios which pose particular challenges
531	   to the correctness guarantees of a failover protocol.

533	3.4.1.  Primary Server crash before "lazy" update:

535	   In the case where the primary server sends a DHCPACK to a client for
536	   a newly allocated IP address and then crashes prior to sending the
537	   corresponding update to the secondary server, the secondary server
538	   will have no record of the IP address allocation.  When the secondary
539	   server takes over, it may well try to allocate that IP address to a
540	   different client.  In the case where the first client to receive the
541	   IP address is not on the net at the time (yet while there was still
542	   time to run on its lease), an ICMP echo (i.e., ping) will not prevent
543	   the secondary server from allocating that IP address to a different
544	   client.

546	   The failover protocol deals with this situation by having the primary
547	   and secondary servers allocate addresses for new clients from dis-
548	   joint address pools.  See section 5.4 for details.

550	   A more likely (in that DHCPRENEWs are presumably more common than
551	   DHCPDISCOVERs) and more subtle version of this problem is where the
552	   primary server crashes after extending a client's lease time, and
553	   before updating the secondary with a new time using a lazy update.
554	   After the secondary takes over, if the client is not connected to the
555	   network the secondary will believe the client's lease has expired
556	   when, in fact, it has not.  In this case as well, the IP address
557	   might be reallocated to a different client while the first client is
558	   still using it.

560	   This scenario is handled by the failover protocol through control of
561	   the lease time and the use of the maximum client lead time (MCLT).
562	   See section 5.2.1  for details.

564	3.4.2.  Network partition where DHCP servers can't communicate but each
565	can talk to clients:

567	   Several conditions are required for this situation to occur.  First,
568	   due to a network failure, the primary and secondary servers cannot
569	   communicate.  As well, some of the DHCP clients must be able to com-
570	   municate with the primary server, and some of the clients must now
571	   only be able to communicate with the secondary server.  When this
572	   condition occurs, both primary and secondary servers could attempt to
573	   allocate IP addresses for new clients from the same pool of available
574	   addresses.  At some point, then, two clients will end up being allo-
575	   cated the same IP address.  This will cause problems when the network
576	   failure that created this situation is corrected.

578	   The failover protocol deals with this situation by having the primary
579	   and secondary servers allocate addresses for new clients from dis-
580	   joint address pools.  See section 5.4 for details.

582	3.5.  Using TCP to detect partner server failure

584	   There are several characteristics of TCP that are important to the
585	   functioning of the failover protocol, which uses one TCP connection
586	   both for bulk data transfer and to assess communications
587	   integrity with the other server.  Reliable and ordered message
588	   delivery are chief among these important characteristics.

590	   It would be nice to use the capabilities built in to TCP to allow it
591	   to determine if communications integrity exists to the failover
592	   partner but this strategy contains some problems which require
593	   analysis.  There exist three fundamental cases for an open TCP con-
594	   nection that must be examined.

596	      1.  When no data is being sent then no messages are traveling
597	          across the TCP connection.

599	      2.  When data is queued to be sent, and the receiver has not
600	          blocked the sending of additional data, then messages are
601	          flowing across the TCP connection containing the application's
602	          data.

604	      3.  When data is queued to be sent, and the receiver has blocked
605	          the transmission of additional data, then persist (window
606	          probe) messages are flowing from the sender to the receiver
607	          to ensure that the sender doesn't miss the receiver opening
608	          the window for further transmissions.

610	   The first case can be turned into the second case by sending
611	   application-level keep-alive messages periodically when there is no
612	   other data queued to be sent.  Note TCP keep-alive messages might be
613	   used as well, but they present additional problems.

615	   Thus, we can ensure that the TCP connection has messages flowing
616	   periodically across the connection fairly easily.  The question
617	   remains as to what TCP will do if the other end of the connection
618	   fails to respond (either because of network partition or because the
619	   receiving server crashes). TCP will attempt to retransmit a message
620	   with an exponential backoff, and will eventually time out that
621	   retransmission.  However, the length of that timeout cannot, in gen-
622	   eral, be set on a per-connection basis, and is frequently as long as
623	   nine minutes, though in some cases it may be as short as two minutes.
624	   On some systems it can be set system-wide, while on some systems it
625	   cannot be changed at all.

627	   A value for this timeout that would be appropriate for the failover
628	   protocol, say less than 1 minute, could have unpleasant side-effects
629	   on other applications running on the same server, assuming that it
630	   could be changed at all on the host operating system.

632	   Nine minutes is a long time for the DHCP service to be unavailable to
633	   any new clients that were being served by the server which has
634	   crashed, when there is another server running that could respond to
635	   them immediately as soon as it determines that its partner is not
636	   operational.

638	   The conclusion drawn from this analysis is that TCP provides very
639	   useful support for the failover protocol in the areas of reliable and
640	   ordered message delivery, but cannot by itself be relied upon to
641	   detect partner server failure in a fashion acceptable to the needs of
642	   the failover protocol.  Additional failover protocol capabilities
643	   will need to be created to support timely detection of partner server
644	   failure.  See section 8.3 for details on this mechanism.
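
   The idea of detecting partner failure at the application level,
   rather than waiting for TCP's retransmission timeout, can be sketched
   as below.  This is not the mechanism of section 8.3, which is not
   reproduced here; ContactMonitor and max_silence are hypothetical
   names, and the 60 second value merely echoes the "less than 1
   minute" figure mentioned above.

```python
import time
from typing import Optional


class ContactMonitor:
    """Tracks how long it has been since the partner was last heard."""

    def __init__(self, max_silence: float, now: Optional[float] = None):
        self.max_silence = max_silence  # seconds of silence tolerated
        self.last_heard = time.monotonic() if now is None else now

    def message_received(self, now: Optional[float] = None) -> None:
        # Any message from the partner, keep-alive or data, resets the timer.
        self.last_heard = time.monotonic() if now is None else now

    def partner_unreachable(self, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        return current - self.last_heard > self.max_silence
```

   Because the timer belongs to the failover application, it can be
   much shorter than the system-wide TCP timeout without affecting
   other applications on the same host.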

646	4.  Design Goals

648	   This section lists the design requirements, the design goals, and the
649	   limitations of the failover protocol.

651	4.1.  Design requirements for this protocol

653	   The following list of requirements must be (and are) met by this pro-
654	   tocol.  They are listed in priority order.

656	      1.  Implementations of this protocol must work with existing DHCP
657	          client implementations based on the DHCP protocol [1].

659	      2.  Implementations of the protocol must work with existing BOOTP
660	          relay agent implementations.

662	      3.  The protocol must provide failover redundancy between servers
663	          that are not located on the same subnet.

665	4.2.  Goals for this protocol

667	   The following goals are met by this protocol as well, though they are
668	   less important than the requirements listed above. These goals are
669	   listed in priority order.

671	      1.  Provide for continued service to DHCP clients through an
672	          automated mechanism in the event of failure of the primary
673	          server.

675	      2.  Avoid binding an IP address to a client while that binding is
676	          currently valid for another client.  In other words, do not
677	          allocate the same IP address to two clients.

679	      3.  Minimize any need for manual administrative intervention.

681	      4.  Introduce no additional delays in server response time as a
682	          result of the network communications required to implement the
683	          failover protocol, i.e., don't require communications with the
684	          partner between the receipt of a DHCPREQUEST and the
685	          corresponding DHCPACK.

687	      5.  Share IP address ranges between primary and secondary servers;
688	          i.e., impose no requirement that the pool of available
689	          addresses be divided between servers.

691	      6.  Continue to meet the goals and objectives of this protocol in
692	          the event of server failure or network partition.

694	      7.  Provide graceful reintegration of full protocol service after
695	          server failure or network partition.

697	      8.  Allow for one computer to act as a secondary server for multi-
698	          ple primary servers. Other topologies (e.g., mesh) are also
699	          possible.  Primary and secondary servers SHOULD be viewed as
700	          "logical" servers and not necessarily physical computers.

702	      9.  Ensure that an existing client can keep its existing IP
703	          address binding if it can communicate with either the primary
704	          or secondary DHCP server implementing this protocol - not just
705	          whichever server originally offered it the binding.

707	      10. Ensure that a new client can get an IP address from some
708	          server. Ensure that in the face of partition, where servers
709	          continue to run but cannot communicate with each other, the
710	          above goals and requirements may be met. In addition, when the
711	          partition condition is removed, allow graceful automatic re-
712	          integration without requiring human intervention.

714	      11. If either primary or secondary server loses all of the infor-
715	          mation that it has stored in stable storage, it should be able
716	          to refresh its stable storage from the other server.

718	      12. Support load balancing between the primary and secondary
719	          servers, and allow configuration of the percentage of the
720	          client population served by each with a moderately fine granu-
721	          larity.

723	4.3.  Limitations of this Protocol

725	   The following are explicit limitations of this protocol.

727	      1.  This protocol provides only one level of redundancy through a
728	          single secondary server for each primary server.

730	      2.  A subset of the address pool is reserved for secondary server
731	          use.  In order to handle the failure case where both servers
732	          are able to communicate with DHCP clients, but unable to com-
733	          municate with each other, a subset of the IP address pool must
734	          be set aside as a private address pool for the secondary
735	          server.  The secondary can use these to service newly arrived
736	          DHCP clients during such a period.  The size of this private
737	          pool SHOULD be based only on the arrival rate of new DHCP
738	          clients and the length of expected downtime, and is not influ-
739	          enced in any way by the total number of DHCP clients supported
740	          by the server pair.

742	          The failover protocol can be used in a mode where both the
743	          primary and secondary servers can share the load between them
744	          when both are operating.  In this loadbalancing mode, the
745	          addresses allocated by the primary server to the secondary
746	          server are not unused, but are used instead to service the
747	          portion of the client base to which the secondary server
748	          is required to respond.  See section 5.3 for more information
749	          on loadbalancing.

751	      3.  The primary and secondary servers do not respond to client
752	          requests at all while recovering from a failure that could
753	          have resulted in duplicate IP assignments (that is, while
754	          synchronizing in POTENTIAL-CONFLICT state).
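
   Limitation 2 above says the private pool size depends only on the
   arrival rate of new clients and the expected downtime.  A minimal
   sizing sketch follows; the simple product of the two quantities is
   an assumption made for illustration, since this document gives no
   formula, and the function and parameter names are hypothetical.

```python
import math


def private_pool_size(new_clients_per_hour: float,
                      expected_downtime_hours: float) -> int:
    """Addresses the secondary needs to serve new clients while the
    servers cannot communicate (assumed: arrival rate x downtime)."""
    return math.ceil(new_clients_per_hour * expected_downtime_hours)
```

   For example, 5 new clients per hour over an expected 8 hour outage
   suggests a private pool of 40 addresses, regardless of how many
   clients the server pair supports in total.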

756	5.  Protocol Overview

758	   This section will discuss the failover protocol at a relatively high
759	   level of detail.  In the event that a description in this section
760	   conflicts (or appears to conflict due to the overview nature of this
761	   section) with information in later sections of this draft, the infor-
762	   mation in the later sections should be considered authoritative.

764	5.1.  Messages and States

766	   This protocol is centered around the message exchange used by one
767	   server to update the other server of binding database changes result-
768	   ing from DHCP client activity:

770	      o Communication of binding database changes

772	        The binding update (BNDUPD) message is used to send the binding
773	        database changes to the partner server, and the partner server
774	        responds with a binding acknowledgement (BNDACK) message when it
775	        has successfully committed those changes to its own stable
776	        storage.

778	   All of the other messages involve ancillary issues:

780	      o Management of available IP addresses

782	        The pool request (POOLREQ) is used by the secondary server to
783	        request an allocation of IP addresses from the primary server.
784	        The pool response (POOLRESP) is used by the primary server to
785	        inform the secondary server how many IP addresses were allocated
786	        to the secondary server as the result of the pool request.

788	      o Synchronization of the binding databases between the servers
789	        after they've been out of communications

791	        The update request (UPDREQ) message is used by one server to
792	        request that its partner send it all binding database informa-
793	        tion that it has not already seen.  The update request all
794	        (UPDREQALL) message is used by one server to request that all
795	        binding database information be sent in order to recover from a
796	        total loss of its binding database by the requesting server.
797	        The update done (UPDDONE) message is used by the responding
798	        server to indicate that all requested updates have been sent by
799	        the responding server and acked by the requesting server.

801	      o Connection establishment

803	        The connect (CONNECT) message is used by the primary server to
804	        establish a high level connection with the other server, and to
805	        transmit several important configuration data items between the
806	        servers.  The connect acknowledgement message (CONNECTACK) is
807	        used by the secondary server to respond to a CONNECT message
808	        from the primary server.  The disconnect (DISCONNECT) message is
809	        used by either server when closing a connection.

811	      o Server synchronization

813	        The state change (STATE) message is used by either server to
814	        inform the other server of a change of failover state.

816	      o Connection integrity management

818	        The contact (CONTACT) message is used by either server to ensure
819	        that the other server continues to see the connection as opera-
820	        tional.  It MUST be transmitted periodically over every esta-
821	        blished connection if other message traffic is not flowing, and
822	        it MAY be sent at any time.

824	5.1.1.  Failover endpoints

826	   The proper operation of the failover protocol requires more than the
827	   transmission of messages between one server and the other.  Each end-
828	   point might seem to be a single DHCP server, but in fact there are
829	   many situations where additional flexibility in configuration is use-
830	   ful.

832	   For instance, there might be several servers which are each primary
833	   for a distinct set of address pools, and one server which is secon-
834	   dary for all of those address pools.  The situation with the pri-
835	   maries is straightforward, but the secondary will need to maintain a
836	   separate failover state, partner state, and communications up/down
837	   status for each of the separate primary servers for which it is act-
838	   ing as a secondary.

840	   The failover protocol calls for there to be a unique failover end-
841	   point per partner per role (where role is primary or secondary).
842	   This failover endpoint can take actions and hold unique states.
843	   There are thus a maximum of two failover endpoints per partner (one
844	   for the partner as a primary and one for that same partner as a
845	   secondary.)

847	   Thus, in the case where there are two primary servers A and B each
848	   backed up by a single common secondary server C, there is one fail-
849	   over endpoint on each of A and B, and two different failover end-
850	   points on C.  The two different failover endpoints on C each have
851	   unique states and independent TCP connections.

853	   This document describes the behavior of the protocol in terms of pri-
854	   mary and secondary servers, not primary and secondary failover end-
855	   points.  However, it is important to remember that every 'server'
856	   described in this document is in reality a failover endpoint that
857	   resides in a particular process, and that many failover endpoints may
858	   reside in the same process.

860	   It is not the case that there is a unique failover endpoint for each
861	   subnet that participates in a failover relationship.  On one server,
862	   there is one failover endpoint per partner per role, regardless of
863	   how many subnets or address pools are managed by that combination of
864	   partner and role.  Conversely, on a particular server, any given sub-
865	   net or pool will be associated with exactly one failover endpoint.

867	   When a connection is received from the partner, the unique failover
868	   endpoint to which the message is directed is determined solely by the
869	   IP address of the partner and the setting of the SECONDARY bit in the
870	   'flags' field of the CONTACT message.
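
   The endpoint selection just described can be sketched as a table
   keyed by partner address and SECONDARY bit.  The class and field
   names below are hypothetical and the sketch does not interpret the
   meaning of the bit; it only shows that the pair of values selects
   exactly one endpoint, each with its own state.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Key used to select an endpoint: (partner IP address, SECONDARY flag bit).
EndpointKey = Tuple[str, bool]


@dataclass
class FailoverEndpoint:
    partner_addr: str
    secondary_bit: bool
    comms_up: bool = False  # communications up/down status for this endpoint


class EndpointTable:
    """Holds every failover endpoint living in one server process."""

    def __init__(self) -> None:
        self._endpoints: Dict[EndpointKey, FailoverEndpoint] = {}

    def add(self, endpoint: FailoverEndpoint) -> None:
        key = (endpoint.partner_addr, endpoint.secondary_bit)
        if key in self._endpoints:
            raise ValueError("endpoint already defined for this partner and role")
        self._endpoints[key] = endpoint

    def lookup(self, partner_addr: str,
               secondary_bit: bool) -> Optional[FailoverEndpoint]:
        return self._endpoints.get((partner_addr, secondary_bit))

    def __len__(self) -> int:
        return len(self._endpoints)
```

   Since the second key component is a single bit, at most two
   endpoints can exist for any one partner, matching the limit stated
   earlier in this section.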

872	   Throughout this document, the states and actions taken by "servers"
873	   are described.  The terms "server", "primary server", and "secondary
874	   server" are commonly used to describe the failover endpoint taking
875	   these states and performing these actions.  This description is
876	   wholly accurate only for the simplest of cases, where all of the
877	   address pools on one server are backed up by all of the address pools
878	   on another server.  In this case, there is a single failover endpoint
879	   in each server.  In all other cases, the term "server" is used to
880	   describe one of the two possible failover endpoints per partner.

882	5.2.  Fundamental restrictions

884	   There are several fundamental restrictions this protocol places on
885	   what one server can do in the absence of knowledge of the other
886	   server, and these restrictions are key to the correct operation of
887	   the protocol.

889	5.2.1.  Control of lease time

891	   The key problem with lazy update is that when a server fails
892	   after updating a client with a particular lease time and before
893	   updating its partner, the partner will believe that a lease has
894	   expired even though the client still retains a valid lease on that IP
895	   address.

897	   In order to handle this problem, a period of time known as the "Max-
898	   imum Client Lead Time" (MCLT) is defined and must be known to both
899	   the primary and secondary servers.  Proper use of this time interval
900	   places an upper bound on the difference allowed between the lease
901	   time provided to a DHCP client by a server and the lease time known
902	   by that server's partner.  However, the MCLT is typically much less
903	   than the lease time that a server has been configured to offer a
904	   client, and so some strategy must exist to allow a server to offer
905	   the configured lease time to a client.  During a lazy update the
906	   updating server typically updates its partner with a potential
907	   expiration time which is longer than the lease time previously given
908	   to the client and which is longer than the lease time that the server
909	   has been configured to give a client.  This allows that server to
910	   give a longer lease time to the client the next time the client
911	   renews its lease, since the time that it will give to the client will
912	   not exceed the MCLT beyond the potential expiration time acknowledged
913	   by the partner.

915	   The PARTNER-DOWN state exists so that a server can be sure that its
916	   partner is, indeed, down.  Correct operation while in that state
917	   requires (generally) that the server wait the MCLT after anything
918	   that happened prior to its transition into PARTNER-DOWN state (or,
919	   more accurately, when the other server went down if that is known).
920	   Thus, the server MUST wait the Maximum Client Lead Time after the
921	   partner server went down before allocating any of the partner's FREE
922	   addresses.  In the event the partner was not in communication prior
923	   to going down, it might have allocated one or more of its FREE
924	   addresses to a DHCP client and been unable to inform the server
925	   entering PARTNER-DOWN prior to going down itself.  By waiting the
926	   MCLT after the time the partner went down, the server in PARTNER-DOWN
927	   state ensures that any clients which have a lease on one of the
928	   partner's FREE addresses will either time out or contact the server
929	   in PARTNER-DOWN by the time that period ends.

931	   In addition, once a server has transitioned to PARTNER-DOWN state, it
932	   MUST NOT reallocate an IP address from one client to another client
933	   until an additional MCLT interval after the lease by the original
934	   client expires.  (Actually, until the maximum client lead time after
935	   what it believes to be the lease expiration time of the first
936	   client.)

938	   Some optimizations exist for this restriction, in that it only
939	   applies to leases that were issued BEFORE entering PARTNER-DOWN. Once
940	   a server has entered PARTNER-DOWN and it leases out an address, it
941	   need not wait this time as long as it has never communicated with the
942	   partner since the lease was given out.
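
   The PARTNER-DOWN waiting rules above can be sketched as two small
   time calculations.  The function and parameter names are
   hypothetical; times are seconds on any common clock, and the
   issued_in_partner_down flag assumes the server has not communicated
   with its partner since that lease was given out.

```python
def partner_free_usable_at(partner_down_time: float, mclt: float) -> float:
    """Earliest time the partner's FREE addresses may be allocated."""
    return partner_down_time + mclt


def reallocation_allowed_at(believed_expiration: float, mclt: float,
                            issued_in_partner_down: bool) -> float:
    """Earliest time an address may be given to a different client."""
    if issued_in_partner_down:
        # Optimization: the partner cannot have extended this lease.
        return believed_expiration
    # Lease predates PARTNER-DOWN: the partner may have extended it by
    # up to the MCLT without being able to report the extension.
    return believed_expiration + mclt
```

   With an MCLT of 3600 seconds, a lease believed to expire at time
   10000 that was issued before PARTNER-DOWN cannot be reallocated
   until time 13600.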

944	   The fundamental relationship on which much of the correctness of this
945	   protocol depends is that the lease expiration time known to a DHCP
946	   client MUST NOT be more than the maximum client lead time greater
947	   than the potential expiration time known to a server's partner.

949	   The remainder of this section makes the above fundamental relation-
950	   ship more explicit.

952	   This protocol requires a DHCP server to deal with several different
953	   lease intervals and places specific restrictions on their relation-
954	   ships. The purpose of these restrictions is to allow the other server
955	   in the pair to be able to make certain assumptions in the absence of
956	   an ability to communicate between servers.

958	   The different lease times are:

960	      o desired lease interval

962	        The desired lease interval is the lease interval that a DHCP
963	        server would like to give to a DHCP client in the absence of any
964	        restrictions imposed by the Failover protocol.  Its determina-
965	        tion is outside of the scope of this protocol. Typically this is
966	        the result of external configuration of a DHCP server.

968	      o actual lease interval

970	        The actual lease interval is the lease interval that a DHCP
971	        server gives out to a DHCP client in the dhcp-lease-time option
972	        of a DHCPACK packet.  It may be shorter than the desired client
973	        lease interval (as explained below).

975	      o potential lease interval

977	        The potential lease interval is the lease expiration interval
978	        the local server tells to its partner in the potential-
979	        expiration-time option of a BNDUPD message.

981	      o acknowledged potential lease interval

983	        The acknowledged potential lease interval is the potential lease
984	        interval the partner server has most recently acknowledged in
985	        the potential-expiration-time option of a BNDACK message.

987	   The key restriction (and guarantee) that any server makes with
988	   respect to lease intervals is that the actual client lease interval
989	   never exceeds the acknowledged potential lease interval (if any) by
990	   more than a fixed amount.  This fixed amount is called the "Maximum
991	   Client Lead Time" (MCLT).

993	   The MCLT MAY be configurable on the primary server, but for correct
994	   server operation it MUST be the same and known to both the primary
995	   and secondary servers.  The secondary server determines the MCLT from
996	   the MCLT option sent from the primary server to the secondary server
997	   in the CONNECT message.

999	   A server MUST record in its stable storage both the actual lease
1000	   interval and the most recently acknowledged potential lease interval
1001	   for each IP address binding.  It is assumed that the desired client
1002	   lease interval can be determined through techniques outside of the
1003	   scope of this protocol.  See section 7.1.4 for more details concern-
1004	   ing the times that the server MUST record in its stable storage and
1005	   the way that they interact with the lease time that may be offered to
1006	   a DHCP client.

1008	   Again, the fundamental relationship among these times which MUST be
1009	   maintained is:

1011	       actual lease interval <=
1012	       ( acknowledged potential lease interval + MCLT )

1014	   Figure 5.1-1 illustrates an initial lease to a client using the rules
1015	   discussed in the example which follows it.

1017	              DHCP                 Primary             Secondary
1018	       time   Client               Server               Server

1020	                | (time in intervals) |  (absolute time)   |
1021	                |                     |                    |
1022	                | >-DHCPDISCOVER->    |                    |
1023	                |     <---DHCPOFFER-< |                    |
1024	                |                     |                    |
1025	                | >-DHCPREQUEST->     |                    |
1026	                |   (selecting)       |                    |
1027	                |                     |                    |
1028	         t      |  <--------DHCPACK-< |                    |
1029	                |  lease-time=MCLT    |                    |
1030	                |                     |    >-BNDUPD-->     |
1031	                |                     |  lease-expiration=t+MCLT
1032	                |                     |  potential-expiration=t+(MCLT/2)+X
1033	                |                     |                    |
1034	                |                     |     <-BNDACK-<     |
1035	                |                     |  potential-expiration=t+(MCLT/2)+X
1036	               ...                   ...                  ...
1037	                |                     |                    |
1038	      t+MCLT/2  | >-DHCPREQUEST->     |                    |
1039	                |      (renew)        |                    |
1040	                |                     |                    |
1041	         t1     |  <--------DHCPACK-< |                    |
1042	                |   lease-time=X      |                    |
1043	                |                     |    >-BNDUPD-->     |
1044	                |                     |  lease-expiration=t1+X
1045	                |                     |  potential-expiration=t1+(X/2)+X
1046	                |                     |                    |
1047	                |                     |     <-BNDACK-<     |
1048	                |                     |  potential-expiration=t1+(X/2)+X
1049	               ...                   ...                  ...

1051	           Figure 5.1-1:  Lazy Update Message Traffic
1052	                          X = Desired Lease Interval

1054	   DISCUSSION:

1056	      This protocol mandates no algorithm concerning these lease inter-
1057	      vals, as long as the above fundamental relationship is preserved.

1059	      In the interests of clarity, however, let's examine a specific
1060	      example.  The MCLT in this case is 1 hour.  The desired lease
1061	      interval is 3 days, and its renewal time is half the lease inter-
1062	      val.

1064	      The rules for this example are:

1066	      o What to tell the client:

1068	        Take the remainder of the acknowledged potential lease interval.
1069	        If this is a new lease, then this value will be zero.  If this
1070	        remainder plus the MCLT is greater than the desired lease inter-
1071	        val, give the client the desired lease interval, else give the
1072	        client the remainder plus the MCLT.

1074	      o What to tell the failover partner server:

1076	        Take the renewal interval (typically half of the actual client
1077	        lease interval), add to it the desired lease interval, and add
1078	        it to the current time to yield the value that goes into the
1079	        potential-expiration-time option.

1081	        Also tell the failover partner the actual lease interval by
1082	        adding it to the current time to yield the value that goes into
1083	        the lease-expiration option.

1085	      In operation this might work as follows:

1087	      When a server makes an offer for a new lease on an IP address to a
1088	      DHCP client, it determines the desired lease interval (in this
1089	      case, 3 days).  It then examines the acknowledged potential lease
1090	      interval (which in this case is zero) and determines the remainder
1091	      of the time left to run, which is also zero.  To this it adds the
1092	      MCLT.  Since the actual lease interval cannot be allowed to exceed
1093	      the remainder of the current acknowledged potential lease interval
1094	      plus the MCLT, the offer made to the client is for the remainder
1095	      of the current acknowledged potential lease interval (i.e., zero)
1096	      plus the MCLT.  Thus, the actual lease interval is 1 hour.

1098	      Once the server has performed the ACK to the DHCP client, it will
1099	      update the secondary server with the lease information. However,
1100	      the desired potential lease interval will be composed of one
1101	      half of the current actual lease interval added to the desired
1102	      lease interval. Thus, the secondary server is updated with a
1103	      BNDUPD with a lease interval of 3 days + 1/2 hour specified in the
1104	      potential-expiration-time option.

1106	      When the primary server receives an ACK to its update of the
1107	      secondary server's (partner's) potential lease interval, it
1108	      records that as the acknowledged potential lease interval.  A
1109	      server MUST NOT send a BNDACK in response to a BNDUPD message
1110	      until it is sure that the information in the BNDUPD message
1111	      resides in its stable storage.  Thus, the primary server in this
1112	      case can be sure that the secondary server has recorded the poten-
1113	      tial lease interval in its stable storage when the primary server
1114	      receives a BNDACK message from the secondary server.

1116	      When the DHCP client attempts to renew at T1 (approximately one
1117	      half an hour from the start of the lease), the primary server
1118	      again determines the desired lease interval, which is still 3
1119	      days.  It then compares this with the remaining acknowledged
1120	      potential lease interval (3 days + 1/2 hour) and adjusts for the
1121	      time passed since the secondary was last updated (1/2 hour).  Thus
1122	      the time remaining of the acknowledged potential lease interval is
1123	      3 days.  Adding the MCLT to this yields 3 days plus 1 hour, which
1124	      is more than the desired lease interval of 3 days.  So the client
1125	      is renewed for the desired lease interval -- 3 days.

1127	      When the primary DHCP server updates the secondary DHCP server
1128	      after the DHCP client's renewal ACK is complete, it will calculate
1129	      the desired potential lease interval as the T1 fraction of the
1130	      actual client lease interval (1/2 of 3 days this time = 1.5 days).
1131	      To this it will add the desired client lease interval of 3 days,
1132	      yielding a total desired partner server lease interval of 4.5
1133	      days.  In this way, the primary attempts to have the secondary
1134	      always "lead" the client in its understanding of the client's
1135	      lease interval so as to be able to always offer the client the
1136	      desired client lease interval.

1138	      Once the initial actual client lease interval of the MCLT is past,
1139	      the protocol operates effectively like the DHCP protocol does
1140	      today in its behavior concerning lease intervals. However, the
1141	      guarantee that the actual client lease interval will never exceed
1142	      the remaining acknowledged partner server lease interval by more
1143	      than the MCLT allows full recovery from a variety of failures.
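
      The example rules above can be sketched in code.  This is only an
      illustration of the arithmetic in this discussion, not a required
      algorithm; the function names and the use of Python's timedelta
      are assumptions made for the sketch.

```python
from datetime import timedelta

MCLT = timedelta(hours=1)     # example value from the text
DESIRED = timedelta(days=3)   # desired lease interval from the text


def client_lease(ack_potential_remaining, desired=DESIRED, mclt=MCLT):
    """Interval to give the client ("what to tell the client")."""
    limit = ack_potential_remaining + mclt
    return desired if limit > desired else limit


def partner_potential(actual_lease, desired=DESIRED, t1_fraction=0.5):
    """Potential lease interval for the BNDUPD, relative to now."""
    return actual_lease * t1_fraction + desired


# Initial lease: nothing acknowledged yet, so the remainder is zero.
first = client_lease(timedelta(0))
first_potential = partner_potential(first)

# Renewal half an hour later, after the BNDACK has been received.
remaining = first_potential - timedelta(minutes=30)
second = client_lease(remaining)
second_potential = partner_potential(second)
```

      With these inputs the sketch yields the values in the example:
      a first lease of 1 hour, a potential interval of 3 days + 1/2
      hour, a renewed lease of 3 days, and a potential interval of 4.5
      days.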

1145	5.2.2.  Controlled re-allocation of IP addresses

1147	   When in PARTNER-DOWN state there is a waiting period after which an
1148	   IP address can be re-allocated to another client.  For leases which
1149	   are available when the server enters PARTNER-DOWN state, the period
1150	   is the MCLT from entry into PARTNER-DOWN state.  For IP addresses
1151	   which are not available when the server enters PARTNER-DOWN state,
1152	   the period is the MCLT after the lease becomes available.  See sec-
1153	   tion 9.4.2 for more details.
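
   As a minimal sketch of the waiting period just described (the
   function name and the use of plain numeric timestamps are
   assumptions, and section 9.4.2 remains the authoritative text):

```python
def reallocation_time(partner_down_entry, became_available, mclt):
    """Earliest time an address may be given to a different client.

    All arguments are in the same time unit (for example, seconds).
    became_available is the time at which the lease became available.
    """
    # Available before entry: wait MCLT from entry into PARTNER-DOWN.
    # Otherwise: wait MCLT from the time the lease becomes available.
    return max(partner_down_entry, became_available) + mclt
```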

1155	   In any other state, a server cannot reallocate an address from one
1156	   client to another without first notifying its partner (through a
1157	   BNDUPD message) and receiving acknowledgement (through a BNDACK mes-
1158	   sage) that its partner is aware that that first client is not using
1159	   the address.

1161	   This could be modeled in the following way. Though this specific
1162	   implementation is in no way required, it may serve to better illus-
1163	   trate the concept.

1165	   An "available" IP address on a server may be allocated to any client.
1166	   An IP address which was leased to a client and which expired or was
1167	   released by that client would take on a new state, EXPIRED or
1168	   RELEASED respectively.  The partner server would then be notified
1169	   that this IP address was EXPIRED or RELEASED through a BNDUPD.  When
1170	   the sending server received the BNDACK for that IP address showing it
1171	   was FREE, it would move the IP address from EXPIRED or RELEASED to
1172	   FREE, and it would be available for allocation by the primary server
1173	   to any clients.

1175	   A server MAY reallocate an IP address in the EXPIRED or RELEASED
1176	   state to the same client with no restrictions.

1178	5.3.  Load balancing

1180	   In order to implement load balancing between a primary and secondary
1181	   server pair, each server must respond to DHCPDISCOVER requests from
1182	   some clients and not from other clients.  In order to do this suc-
1183	   cessfully, each server must be able to determine immediately upon
1184	   receipt of a DHCP client request whether it is to service this
1185	   request or to ignore it in order to allow the other server to service
1186	   the request.

1188	   In addition, it should be possible to configure the percentage of
1189	   clients which will be serviced by either the primary or secondary
1190	   server.  This configuration should be more or less continuous, from
1191	   all serviced by the primary through an even split with half serviced
1192	   by each, to all serviced by the secondary.

1194	   The technique chosen to support these goals is described in [LOADB].
1195	   When using the load balancing algorithm in [LOADB] among two servers
1196	   implementing the failover protocol, both servers MUST use the same
1197	   information from the DHCP client packet as the Request ID for the
1198	   load balancing algorithm.  Both servers MUST use the dhcp-client-
1199	   identifier (if it appears), and the client-hardware-address if the
1200	   dhcp-client-identifier does not.  The client-hardware-address is con-
1201	   structed from the htype and chaddr fields of the DHCP client request
1202	   in the same manner as described for creation of the client-hardware-
1203	   address option in section 6.2.

1205	   A bitmap-style Hash Bucket Assignment (as described in section 5.2 of
1206	   [LOADB]) is sent by the primary server to the secondary server when-
1207	   ever a connection is established, using the hash-bucket-assignment
1208	   option defined in section 6.2.  This Hash Bucket Assignment is used
1209	   by the secondary server to decide which packets to process when in
1210	   NORMAL state.
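
   The choice of Request ID and the bitmap lookup might be sketched as
   follows.  Several details here are assumptions made for
   illustration: the hardware address is taken to be the htype byte
   followed by the significant bytes of chaddr, the bitmap is taken to
   be 32 bytes covering 256 buckets with bucket n stored at bit
   (n mod 8) of byte (n div 8), and stand_in_hash is a placeholder
   that is NOT the hash function defined in [LOADB].

```python
def request_id(client_identifier, htype, chaddr):
    """Pick the bytes fed to the load balancing hash."""
    if client_identifier is not None:
        return bytes(client_identifier)
    # Assumed layout: htype byte followed by the hardware address.
    return bytes([htype]) + bytes(chaddr)


def stand_in_hash(data):
    """Placeholder only; a real server uses the [LOADB] hash."""
    return sum(data) % 256


def bucket_assigned(hba, bucket):
    """Test one bucket in a 32-byte bitmap (bit order is assumed)."""
    return bool(hba[bucket // 8] & (1 << (bucket % 8)))


def secondary_serves(hba, client_identifier, htype, chaddr):
    rid = request_id(client_identifier, htype, chaddr)
    return bucket_assigned(hba, stand_in_hash(rid))
```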

1212	   The way in which either primary or secondary servers determine the
1213	   hash bucket assignment for it to use when in other than NORMAL state
1214	   is outside of the scope of this document.  Note, however, that the
1215	   primary and secondary servers MUST use identical hash bucket assign-
1216	   ments when not in NORMAL state.  This common hash bucket assignment
1217	   MAY be for all of the hash buckets, indicating that there is no other
1218	   DHCP server sharing the load with this failover pair, or it MAY be
1219	   for a subset of the hash buckets, which would indicate that there
1220	   exists another server or server pair with which this DHCP server pair
1221	   is sharing the load.

1223	5.4.  Operating in NORMAL state

1225	   When in NORMAL state, each server services DHCPDISCOVERs and all
1226	   other DHCP requests other than DHCPREQUEST/RENEWAL or
1227	   DHCPREQUEST/REBINDING from the client set defined by the load balanc-
1228	   ing algorithm.  Each server services DHCPREQUEST/RENEWAL or
1229	   DHCPREQUEST/REBINDING requests from any client.
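
   A minimal sketch of this decision follows.  The message labels and
   the in_client_set flag (the result of the load balancing check for
   this client) are names assumed for illustration.

```python
ANY_CLIENT = ("DHCPREQUEST/RENEWAL", "DHCPREQUEST/REBINDING")


def services_in_normal(message, in_client_set):
    """Decide whether a server in NORMAL state answers a message."""
    if message in ANY_CLIENT:
        return True           # renewals and rebinds: any client
    return in_client_set      # everything else: load balanced set
```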

1231	   In general, whenever the binding database is changed in stable
1232	   storage, then a BNDUPD message is sent with the contents of that
1233	   change to the partner server.  The partner server then writes the
1234	   information about that binding in its bindings database in stable
1235	   storage and replies with a BNDACK message.

1237	5.5.  Operating in COMMUNICATIONS-INTERRUPTED state

1239	   When operating in COMMUNICATIONS-INTERRUPTED state, each server is
1240	   operating independently, but does not assume that its partner is not
1241	   operating.  The partner server might be operating and simply unable
1242	   to communicate with this server, or might not be operating.

1244	   Each server responds to the full range of DHCP client messages that
1245	   it receives, but in such a way that graceful reintegration is always
1246	   possible when its partner comes back into contact with it.

1248	5.6.  Operating in PARTNER-DOWN state

1250	   When operating in PARTNER-DOWN state, a server assumes that its
1251	   partner is not currently operating, but does make allowances for the
1252	   possibility that that server was operating in the past, though possi-
1253	   bly out of communications with this server.  It responds to all DHCP
1254	   client requests in PARTNER-DOWN state.

1256	5.7.  Operating in RECOVER state

1258	   A server operating in RECOVER state assumes that it is reintegrating
1259	   with a server that has been operating in PARTNER-DOWN state, and that
1260	   it needs to update its bindings database before it services DHCP
1261	   client requests.

1263	   A server may also operate in RECOVER state in order to fully recover
1264	   its bindings database from its partner server.

1266	5.8.  Operating in STARTUP state

1268	   A server operating in STARTUP state assumes that failover is opera-
1269	   tional, and it spends a short time whenever it comes up attempting to
1270	   contact the partner.  During this time (generally a few seconds), the
1271	   server is unresponsive to DHCP client requests.  This period exists
1272	   in order to give a server a chance to determine that its partner has
1273	   changed state since it was last in communications, and to react to
1274	   that changed state (if any) prior to responding to DHCP client
1275	   requests.

1277	   The period of time a server remains in STARTUP state SHOULD be long
1278	   enough to ensure that it will connect to the other server if that
1279	   server is available for connections.

1281	5.9.  Time synchronization between servers

1283	   The failover protocol is designed to operate between two servers
1284	   which have time values which differ by an arbitrarily large amount.
1285	   A particular implementation MAY choose to only support servers whose
1286	   time values differ by an arbitrarily small amount.

1288	   In any event, whether large or only small differences in time values
1289	   are supported, every message that is received MUST be tagged with a
1290	   time value as soon as possible after receipt.  This time value is
1291	   used along with the time value that is sent in every message between
1292	   the failover partners to develop a delta time between the servers.
1293	   This delta time is used during the connection process to establish a
1294	   baseline delta time between the servers, and upon receipt of each
1295	   message, the delta time for that message is used to refine the delta
1296	   time for the server pair.

1298	   While the algorithm for this refinement of delta time is not speci-
1299	   fied as part of this protocol, a server SHOULD allow the delta time
1300	   value for a pair of failover servers to be periodically updated to
1301	   account for time drift.  In addition, the delta time value between
1302	   servers SHOULD be smoothed in some fashion, so that transient network
1303	   delays will not cause it to vary wildly.

1305	   A server SHOULD recognize a drastic change in the delta time value as
1306	   an event to be signaled to a network administrator.
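
   Since the refinement algorithm is not specified, the following is
   only one possible sketch.  It assumes an exponentially weighted
   moving average for smoothing; the class name, the alpha weight and
   the alarm threshold are all hypothetical choices.

```python
class DeltaTime:
    """Smoothed clock offset between this server and its partner."""

    def __init__(self, alpha=0.125, alarm_threshold=60.0):
        self.alpha = alpha                      # smoothing weight
        self.alarm_threshold = alarm_threshold  # seconds
        self.delta = None                       # no baseline yet

    def update(self, sent_time, received_time):
        """Fold in one message; return True on a drastic change."""
        sample = received_time - sent_time
        if self.delta is None:
            self.delta = float(sample)          # baseline at connect
            return False
        drastic = abs(sample - self.delta) > self.alarm_threshold
        self.delta += self.alpha * (sample - self.delta)
        return drastic
```

   A True return value corresponds to the event that SHOULD be
   signaled to a network administrator.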

1308	5.10.  IP address binding-status

1310	   In most DHCP servers an IP address can take on several different
1311	   binding-status values, sometimes also called states.  While no two
1312	   DHCP servers probably have exactly the same possible binding-status
1313	   values, the DHCP RFC enforces some commonality among the general
1314	   semantics of the binding-status values used by various DHCP server
1315	   implementations.

1317	   In order to transmit binding database updates between one server and
1318	   another using the failover protocol, some common denominator
1319	   binding-status values must be defined.  It is not expected that these
1320	   binding-status-values correspond with any actual implementation of
1321	   the DHCP protocol in a DHCP server, but rather that the binding-
1322	   status values defined in this document should be a superset of most
1323	   if not all DHCP server implementations.  It is a goal of this proto-
1324	   col that any DHCP server can map the various IP address binding-
1325	   status values that it uses internally into these failover IP address
1326	   binding-status values on transmission of binding database updates to
1327	   its partner, and likewise that it can map any failover IP address
1328	   binding-status values into its internal IP address binding-status
1329	   values upon receipt of a binding database update.

1331	   The IP address binding-status values defined for the failover proto-
1332	   col are:

1334	      o FREE

1336	        Lease may be allocated to any DHCP client.

1338	      o ACTIVE

1340	        Lease is assigned to a client.  It MUST have client information
1341	        associated with it.

1343	      o EXPIRED

1345	        Lease has expired.  It may be allocated to the same client.

1347	      o RELEASED

1349	        Lease has been released by client.  It may be allocated to the
1350	        same client.

1352	      o ABANDONED
1353	        A server or client flagged the address as unusable.

1355	      o RESET

1357	        Lease was freed by some external agent.

1359	      o BACKUP

1361	        Lease belongs to secondary's private address pool.

1363	   These binding-status values are communicated from one failover
1364	   partner to another using the binding-status option, see section 6.2
1365	   for details of this option.  Unless otherwise noted above there MAY
1366	   be client information associated with each of these binding-status
1367	   values.

1369	   Again, note that a DHCP server implementing the failover protocol
1370	   does not have to implement either this state machine or use these
1371	   particular binding-status values in its normal operation of allocat-
1372	   ing IP addresses to DHCP clients.  It only needs to map its internal
1373	   binding-status-values onto these "standard" binding-status values,
1374	   and map these "standard" binding-status values back into its internal
1375	   binding-status values.  In particular, a server which implements a
1376	   grace period for an IP address binding SHOULD simply wait to update
1377	   its partner server until the grace period on that binding has run
1378	   out.

1380	   The process of setting an IP address to FREE deserves some detailed
1381	   discussion.  When an IP address is moved to the EXPIRED, RELEASED, or
1382	   RESET binding-status on a server, it will send a BNDUPD with the
1383	   binding-status of EXPIRED, RELEASED, or RESET to its partner.  If its
1384	   partner agrees that is acceptable (see sections 7.1.2 and 7.13 con-
1385	   cerning why a server might not accept a BNDUPD) it will return a
1386	   BNDACK with no reject-reason, signifying that it accepted the update.
1387	   As part of the BNDUPD processing, the server returning the BNDACK
1388	   will set the binding-status of the IP address to FREE, and upon
1389	   receipt of the BNDACK the server which sent the BNDUPD will set the
1390	   binding-status of the IP address to FREE.  Thus, the EXPIRED,
1391	   RELEASED, or RESET binding-status is something of a transitory state.
1392	   This process is encoded in the transition diagram below by "Comm
1393	   w/Partner".

1395	   An IP address will move between these lease binding-status values
1396	   using the following state transition diagram:

1398	                                        DHCP client DECLINE or
1399	                                        server detected problem
1400	                                        from any state
1401	                          +----------+     V   +---------+
1402	         External   >---->|   RESET  |     |   |ABANDONED|
1403	         command          |          |     +-->|         |
1404	                          +----------+         +---------+
1405	                               |
1406	                           Comm w/Partner
1407	                               V
1408	     +---------+  Comm    +----------+   Comm    +---------+
1409	     | EXPIRED |--------->|  FREE    |<----------| RELEASED|
1410	     |         | w/Partner|          | w/Partner |         |
1411	     +---------+          +----------+           +---------+
1412	       ^     ^             |        |                  ^
1413	       | Exp. grace   IP address  IP addr alloc.       |
1414	       | period ends  leased by   to secondary         |
1415	       |     |        primary       V                  |
1416	       |     |             |      +----------+         |
1417	       |     |             |      |  BACKUP  |         |
1418	       |   wait for        |      |          |         |
1419	       |  grace period     |      +----------+         |
1420	       |     |             |       |                   |
1421	       |     |             |    IP addr leased by      |
1422	       |  Expired grace    |       secondary           |
1423	       |  period exists    V       V                   |
1424	       |     |           +----------+                  |
1425	       |     | Lease on  |  ACTIVE  | DHCPRELEASE      |
1426	       +-----+-IP addr---|          |------------------+
1427	               expires   +----------+

1429	       Figure 5.10-1:  Transitions between binding-status values.
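
   The diagram can also be written as a lookup table, as in the sketch
   below.  The event strings are labels chosen for illustration, the
   optional grace period is omitted, and treating "External command"
   as reachable from any state is an assumption, since the diagram
   does not show where that arrow starts.

```python
from enum import Enum

Status = Enum(
    "Status", "FREE ACTIVE EXPIRED RELEASED ABANDONED RESET BACKUP")

TRANSITIONS = {
    (Status.FREE, "leased by primary"): Status.ACTIVE,
    (Status.FREE, "allocated to secondary"): Status.BACKUP,
    (Status.BACKUP, "leased by secondary"): Status.ACTIVE,
    (Status.ACTIVE, "lease expires"): Status.EXPIRED,
    (Status.ACTIVE, "DHCPRELEASE"): Status.RELEASED,
    (Status.EXPIRED, "comm w/partner"): Status.FREE,
    (Status.RELEASED, "comm w/partner"): Status.FREE,
    (Status.RESET, "comm w/partner"): Status.FREE,
}


def next_status(current, event):
    """Return the binding-status that follows an event."""
    if event in ("DHCPDECLINE", "server detected problem"):
        return Status.ABANDONED        # allowed from any state
    if event == "external command":
        return Status.RESET            # assumed: from any state
    return TRANSITIONS[(current, event)]   # KeyError if not drawn
```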

1431	   If a server receives a binding-status that it doesn't implement
1432	   internally, it should do something reasonable. A server which doesn't
1433	   support an ABANDONED binding-status could set the IP address ACTIVE
1434	   and belonging to a client which will never be seen in a DHCP request.

1436	5.10.1.  IP address binding-status changes from BNDUPD messages

1438	   IP addresses undergo binding status changes for several reasons,
1439	   including receipt and processing of DHCP client requests, administra-
1440	   tive inputs and receipt of BNDUPD messages.  Every DHCP server needs
1441	   to respond to DHCP client requests and administrative inputs with
1442	   changes to its internal record of the binding-status of an IP
1443	   address, and this response is not in the scope of the failover proto-
1444	   col.  However, the receipt of BNDUPD messages implies at least a pos-
1445	   sible change of the binding-status for an IP address, and must be
1446	   discussed here.  See section 7.1.2 for general actions to take upon
1447	   receipt of a BNDUPD message.

1449	   When receiving a BNDUPD message, it is important to note that it may
1450	   not be current, in that the server receiving the BNDUPD message may
1451	   have had a more recent interaction with the DHCP client than its
1452	   partner who sent the BNDUPD message.  In this case, the receiving
1453	   server MUST reject the BNDUPD message.  In addition, it is worth not-
1454	   ing that two (and possibly three) binding-status values are the
1455	   direct result of interaction with a DHCP client, ACTIVE and RELEASED
1456	   (and possibly ABANDONED).  All other binding-status values are either
1457	   the result of the expiration of a time period or interaction with an
1458	   external agency (e.g., a network administrator).

1460	   Every BNDUPD message SHOULD contain a client-last-transaction-time
1461	   option, which MUST, if it appears, be the time that the server last
1462	   interacted with the DHCP client.  It MUST NOT be, for instance, the
1463	   time that the lease on an IP address expired.  If there has been no
1464	   interaction with the DHCP client in question (or there is no DHCP
1465	   client presently associated with this IP address), then there will be
1466	   no client-last-transaction-time option in the BNDUPD message.

1468	   The following list is indexed by the binding-status that a server
1469	   receives in a BNDUPD message.  In many cases, the binding-status of
1470	   an IP address within the receiving server's data storage will have an
1471	   effect upon the checks performed prior to accepting the new binding-
1472	   status in a BNDUPD message.

1474	   In the following list, to "accept" a BNDUPD means to update the
1475	   server's bindings database with the information contained in the
1476	   BNDUPD and once that update is complete, send a BNDACK message
1477	   corresponding to the BNDUPD message.  To "reject" a BNDUPD means to
1478	   respond to the BNDUPD with a BNDACK with a reject-reason option
1479	   included.

1481	   When interpreting the rules in the following list, if a BNDUPD
1482	   doesn't have a client-last-transaction-time value, then it MUST NOT
1483	   be considered later than the client-last-transaction-time in the
1484	   receiving server's binding.   If the BNDUPD contains a client-last-
1485	   transaction-time value and the receiving server's binding does not,
1486	   then the client-last-transaction-time value in the BNDUPD MUST be
1487	   considered later than the server's.

1489	   The second rule concerns clients and IP addresses.  If the client in
1490	   a BNDUPD message and the client in a receiving server's binding both
1491	   exist and if they differ, then if the receiving server's binding-
1492	   status is ACTIVE and the binding-status in the BNDUPD is ACTIVE, then
1493	   if the receiving server is a secondary server accept it, else reject
1494	   it.

1496	   Otherwise, look up the binding-status in the BNDUPD in this list:

1498	      o ACTIVE in BNDUPD

1500	        If the receiving server's binding-status is ACTIVE, FREE, or
1501	        BACKUP, then accept it.

1503	        If the receiving server's binding-status is ABANDONED or RESET,
1504	        then reject it.

1506	        If the receiving server's binding-status is RELEASED or EXPIRED,
1507	        then if the client-last-transaction-time in the BNDUPD is later
1508	        than the client-last-transaction-time in the receiving server's
1509	        binding, accept it, else reject it.

1511	      o EXPIRED in BNDUPD

1513	        If the receiving server's binding-status is ACTIVE, then if the
1514	        current time is later than the receiving server's lease-
1515	        expiration-time, accept it, else reject it.

1517	        If the receiving server's binding-status is ABANDONED or RESET,
1518	        reject it.

1520	        If the receiving server's binding-status is FREE or BACKUP,
1521	        accept it.

1523	        If the receiving server's binding-status is RELEASED, then if
1524	        the client-last-transaction-time is greater in the BNDUPD than
1525	        in the receiving server's binding, then accept it, else reject
1526	        it.

1528	      o RELEASED in BNDUPD

1530	        If the receiving server's binding-status is ACTIVE, then if the
1531	        client-last-transaction-time is greater than the client-last-
1532	        transaction-time in the receiving server's binding, accept it,
1533	        else reject it.

1535	        If the receiving server's binding-status is RELEASED, FREE or
1536	        BACKUP, accept it.

1538	        If the receiving server's binding-status is ABANDONED or RESET,
1539	        reject it.

1541	      o FREE or BACKUP in BNDUPD

1543	        If the receiving server's binding-status is ACTIVE and the
1544	        current time is later than the lease-expiration-time accept it,
1545	        else reject it.

1547	        If the receiving server's binding-status is ABANDONED, reject
1548	        it.

1550	        If the receiving server's binding-status is FREE or BACKUP or
1551	        RESET, accept it.

1553	      o RESET or ABANDONED in BNDUPD

1555	        Accept the new binding-status under all circumstances.
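
   The rules in this section can be collected into a single decision
   function, sketched below.  The Binding record and all function
   names are assumptions made for illustration.  Combinations that the
   list does not mention (for example, EXPIRED in a BNDUPD against an
   EXPIRED binding) raise an error here rather than guess at an
   answer, and an ACTIVE binding with no recorded expiration time is
   treated as not yet expired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    status: str
    client: Optional[bytes] = None
    cltt: Optional[float] = None     # client-last-transaction-time
    lease_expiration: Optional[float] = None


def cltt_later(update, local):
    """True if the BNDUPD's client-last-transaction-time is later."""
    if update.cltt is None:
        return False                 # a missing value is never later
    if local.cltt is None:
        return True                  # present beats missing
    return update.cltt > local.cltt


def lease_over(local, now):
    exp = local.lease_expiration
    return exp is not None and now > exp


def accept_bndupd(update, local, now, receiver_is_secondary):
    """Return True to accept the BNDUPD, False to reject it."""
    u, s = update.status, local.status
    if (update.client is not None and local.client is not None
            and update.client != local.client
            and u == "ACTIVE" and s == "ACTIVE"):
        return receiver_is_secondary
    if u in ("RESET", "ABANDONED"):
        return True
    if s == "ABANDONED":
        return False                 # rejected in every list entry
    if u == "ACTIVE":
        if s in ("ACTIVE", "FREE", "BACKUP"):
            return True
        if s == "RESET":
            return False
        if s in ("RELEASED", "EXPIRED"):
            return cltt_later(update, local)
    elif u == "EXPIRED":
        if s == "ACTIVE":
            return lease_over(local, now)
        if s == "RESET":
            return False
        if s in ("FREE", "BACKUP"):
            return True
        if s == "RELEASED":
            return cltt_later(update, local)
    elif u == "RELEASED":
        if s == "ACTIVE":
            return cltt_later(update, local)
        if s in ("RELEASED", "FREE", "BACKUP"):
            return True
        if s == "RESET":
            return False
    elif u in ("FREE", "BACKUP"):
        if s == "ACTIVE":
            return lease_over(local, now)
        if s in ("FREE", "BACKUP", "RESET"):
            return True
    raise ValueError(f"no rule listed for {u} update on {s} binding")
```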

1557	5.11.  DNS dynamic update considerations

1559	   DHCP servers (and clients) can use DNS Dynamic Updates as described
1560	   in [RFC2136] to maintain DNS name-mappings as they maintain DHCP
1561	   leases.  Many different administrative models for DHCP-DNS integra-
1562	   tion are possible.  Descriptions of several of these models, and
1563	   guidelines that DHCP servers and clients should follow in carrying
1564	   them out, are laid out in [DDNS].  The nature of the DHCP failover
1565	   protocol introduces some issues concerning dynamic DNS updates that
1566	   are not part of non-failover DHCP environments.  This section
1567	   describes these issues, and defines the information which failover
1568	   partners should exchange and the protocol which they should follow in
1569	   order to ensure consistent behavior.  The presence of this section
1570	   should not be interpreted as requiring that implementations of the
1571	   DHCP failover protocol must also support DDNS updates.  The purpose
1572	   of this discussion is to clarify the areas where the DHCP failover
1573	   and DHCP-DDNS protocols intersect for the benefit of implementations
1574	   which support both protocols, not to introduce a new requirement into
1575	   the DHCP failover protocol.  Thus, a DHCP server which implements the
1576	   failover protocol MAY also support dynamic DNS updates, but if it
1577	   does support dynamic DNS updates it SHOULD utilize the techniques
1578	   described here in order to correctly distribute them between the
1579	   failover partners.

1581	5.11.1.  Relationship between failover and dynamic DNS update

1583	   The failover protocol describes the conditions under which each fail-
1584	   over server may renew a lease to its current DHCP client, and
1585	   describes the conditions under which it may grant a lease to a new
1586	   DHCP client.  An analogous set of conditions determines when a fail-
1587	   over server should initiate a DDNS update, and when it should attempt
1588	   to remove records from the DNS. The failover protocol's conditions
1589	   are based on the desired external behavior: avoiding duplicate
1590	   address assignments; allowing clients to continue using leases which
1591	   they obtained from one failover partner even if they can only commun-
1592	   icate with the other partner; allowing the backup DHCP server to
1593	   grant new leases even if it is unable to communicate with the primary
1594	   server.  The desired external DDNS behavior for DHCP failover servers
1595	   is:

1597	      1.  Allow timely DDNS updates from the server which grants a
1598	          client a lease. Recognize that there is often a DDNS update
1599	          lifecycle which parallels the DHCP lease lifecycle. This is
1600	          likely to include the addition of records when the lease is
1601	          granted, and the removal of DNS records when the lease is sub-
1602	          sequently made available for allocation to a different client.

1604	      2.  Communicate enough information between the two failover
1605	          servers to allow one to complete the DDNS update 'lifecycle'
1606	          even if the other server originally granted the lease.

1608	      3.  Avoid redundant or overlapping DDNS updates, where both fail-
1609	          over servers are attempting to perform DDNS updates for the
1610	          same lease-client binding. Avoid situations where one partner
1611	          is attempting to add RRs related to a lease binding while the
1612	          other partner is attempting to remove RRs related to the same
1613	          lease binding.

1615	5.11.2.  Use of the DDNS option

1617	   In order for either server to be able to complete a DDNS update, or
1618	   to remove DNS records which were added by its partner, both servers
1619	   need to know the FQDN associated with the lease-client binding. The
1620	   FQDN associated with the client's A RR and PTR RR SHOULD be communi-
1621	   cated from the server which adds records into the DNS to its partner.
1622	   The initiating server SHOULD use the DDNS option in the BNDUPD mes-
1623	   sages to inform the partner server of the status of any DDNS updates
1624	   associated with a lease binding. Failover servers MAY choose not to
1625	   include the DDNS option in BNDUPD messages if there has been no
1626	   change in the status of any DDNS update related to the lease binding.
1627	   The partner server receiving BNDUPD messages containing the DDNS
1628	   option SHOULD compare the status flags and the FQDN contained in the
1629	   option data with the current DDNS information it has associated with
1630	   the lease binding, and update its notion of the DDNS status accord-
1631	   ingly.

1633	   The initiating server MAY send a BNDUPD to its partner before the
1634	   DDNS update has been successfully completed. If it does so, it SHOULD
1635	   leave the 'C' bit in the Flags field clear, to indicate to the
1636	   partner that the DDNS update may not be complete. When the DDNS
1637	   update has been successfully acknowledged by the DNS server, the ini-
1638	   tiating DHCP server SHOULD include the DDNS option in its next BNDUPD
1639	   message about the binding, so that the partner server will be able to
1640	   record the final status of the DDNS update. The initiating server
1641	   SHOULD set the 'C' bit in the DDNS option if the DDNS update was suc-
1642	   cessfully accepted by the DNS server.

1644	   Some implementations will choose to send a BNDUPD without waiting for
1645	   the DDNS update to complete, and then will send a second BNDUPD once
1646	   the DDNS update is complete. Other implementations will delay sending
1647	   the partner a BNDUPD until the DDNS update has been acknowledged by
1648	   the DNS server, or until some time-limit has elapsed, in order to
1649	   avoid sending a second BNDUPD.

1651	   The Domain Name field in the DDNS option contains the FQDN that will
1652	   be associated with the A RR (if the server is performing an A RR
1653	   update for the client) and the PTR RR. This FQDN may be composed in
1654	   any of several ways, depending on server configuration and the infor-
1655	   mation provided by the client in its DHCP messages. The client may
1656	   supply a hostname which it would like the server to use in forming
1657	   the FQDN, or it may supply the entire FQDN. The server may be config-
1658	   ured to attempt to use the information the client supplies, it may be
1659	   configured with an FQDN to use for the client, or it may be config-
1660	   ured to synthesize an FQDN. The responsive server SHOULD include the
1661	   FQDN that it will be using in DDNS updates it initiates when it sends
1662	   the DDNS option.

1664	   Since the responsive server may not have completed the DDNS update at
1665	   the time it sends the first BNDUPD about the lease binding, there may
1666	   be cases where the FQDN in later BNDUPD messages does not match the
1667	   FQDN included in earlier messages. For example, the responsive server
1668	   may be configured to handle situations where two or more DHCP client
1669	   FQDNs are identical by modifying the most-specific label in the FQDNs
1670	   of some of the clients in an attempt to generate unique FQDNs for
1671	   them. Alternatively, at sites which use some or all of the informa-
1672	   tion which clients supply to form the FQDN, it's possible that a
1673	   client's configuration may be changed so that it begins to supply new
1674	   data. The responsive server may react by removing the DNS records
1675	   which it originally added for the client, and replacing them with
1676	   records that refer to the client's new FQDN. In such cases, the
1677	   responsive server SHOULD include the actual FQDN that was used in
1678	   subsequent DDNS options. The responsive server SHOULD include
1679	   relevant client-option data in the client-request-options option in
1680	   its BNDUPD messages. This information may be necessary in order to
1681	   allow the non-responsive partner to detect client configuration
1682	   changes that change the hostname or FQDN data which the client
1683	   includes in its DHCP requests.

1685	5.11.3.  Adding RRs to the DNS

1687	   A failover server which is going to perform DDNS updates SHOULD ini-
1688	   tiate the DDNS update when it grants a new lease to a client. The
1689	   non-responsive partner SHOULD NOT initiate a DDNS update when it
1690	   receives the BNDUPD after the lease has been granted. The failover
1691	   protocol ensures that only one of the partners will grant a lease to
1692	   any individual client, so it follows that this requirement will
1693	   prevent both partners from initiating updates simultaneously. The
1694	   server initiating the update SHOULD follow the protocol in [DDNS].
1695	   The server may be configured to perform an A RR update on behalf of
1696	   its clients, or not. Ordinarily, a failover server will not initiate
1697	   DDNS updates when it renews leases. In two cases, however, a failover
1698	   server MAY initiate a DDNS update when it renews a lease to its
1699	   existing client:

1701	      1.  When the lease was granted before the server was configured to
1702	          perform DDNS updates, the server MAY be configured to perform
1703	          updates when it next renews existing leases. Since both
1704	          servers are responsive to renewals in NORMAL state, it is not
1705	          enough to simply require the non-responsive server to avoid a
1706	          DNS update in this case.  The server which would be responsive
1707	          to a DHCPDISCOVER from this client (even though the current
1708	          request is a DHCPREQUEST/RENEW) is the server which should
1709	          initiate the DDNS update.

1711	      2.  If a server is in PARTNER-DOWN state, it can conclude that its
1712	          partner is no longer attempting to perform an update for the
1713	          existing client. If the remaining server has not recorded that
1714	          an update for the binding has been successfully completed, the
1715	          server MAY initiate a DDNS update.  It MAY initiate this
1716	          update immediately upon entry to PARTNER-DOWN state, it MAY
1717	          perform this in the background, or it MAY initiate this update
1718	          upon next hearing from the DHCP client.
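
The rules in this section can be read as a small decision function.  The
following is a minimal sketch only; the function name, the event labels
and the keyword arguments are hypothetical and are not defined by this
document.  A True result means the update is permitted; for the two
renewal cases it is a MAY, so an implementation could still decline.

```python
def may_initiate_ddns_update(event, state="NORMAL",
                             update_recorded_complete=False,
                             lease_predates_ddns=False,
                             responsive_to_discover=False):
    """Return True if this server may start a DDNS update for a binding.

    event is "grant" (this server granted a new lease), "bndupd" (this
    server learned of the lease from its partner) or "renew".
    """
    if event == "grant":
        return True       # the granting server SHOULD initiate the update
    if event == "bndupd":
        return False      # the non-responsive partner SHOULD NOT
    if event != "renew":
        raise ValueError("unknown event: %r" % (event,))
    # Case 2: the partner is down and no completed update is on record.
    if state == "PARTNER-DOWN" and not update_recorded_complete:
        return True
    # Case 1: the lease predates DDNS configuration; only the server that
    # would answer a DHCPDISCOVER from this client takes the update.
    return lease_predates_ddns and responsive_to_discover
```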

1720	5.11.4.  Deleting RRs from the DNS

1722	   The failover server which makes a lease FREE SHOULD initiate any DDNS
1723	   deletes, if it has recorded that DNS records were added on behalf of
1724	   the client.

1726	   A server "makes a lease FREE" when it initiates a BNDUPD with a
1727	   binding-status of FREE, EXPIRED, or RELEASED.  Its partner confirms
1728	   this status by acking that BNDUPD, and upon receipt of the ACK the
1729	   server has "made the address FREE". It is at this point that it
1730	   should initiate the DDNS operations to delete RRs from the DNS.  Its
1731	   partner SHOULD NOT initiate DDNS deletes for DNS records related to
1732	   the lease binding as part of sending the BNDACK message.  The
1733	   partner MAY have issued BNDUPD messages with a binding-status of
1734	   FREE, EXPIRED, or RELEASED previously, but the other server will have
1735	   NAKed these BNDUPD messages.

1737	   The failover protocol ensures that only one of the two partner
1738	   servers will be able to make a lease FREE. The server making the
1739	   lease FREE may be doing so while it is in NORMAL communication with
1740	   its partner, or it may be in PARTNER-DOWN state. If a server is in
1741	   PARTNER-DOWN state, it may be performing DDNS deletes for RRs which
1742	   its partner added originally. This allows a single remaining partner
1743	   server to assume responsibility for all of the DDNS activity which
1744	   the two servers were undertaking.

1746	   Another implication of this approach is that no DDNS RR deletes will
1747	   be performed while either server is in COMMUNICATIONS-INTERRUPTED
1748	   state, since no IP addresses are moved into the FREE state during
1749	   that period.

1751	5.12.  Reservations and failover

1753	   Some DHCP servers support a capability to offer specific pre-
1754	   configured IP addresses to DHCP clients.  These are real DHCP
1755	   clients that run the entire DHCP protocol, but these servers always
1756	   offer the client a specific pre-configured IP address -- and they
1757	   offer that IP address to no other clients.  Such a capability has
1758	   several names, but it is sometimes called a "reservation", in that
1759	   the IP address is reserved for a particular DHCP client.

1761	   In a situation where two DHCP servers serve the same subnet without
1762	   using failover, the two DHCP servers need to have disjoint IP
1763	   address pools, but identical reservations for the DHCP
1764	   clients.

1766	   In a failover context, both servers need to be configured with the
1767	   proper reservations in an identical manner, but if we stop there
1768	   problems can occur around the edge conditions where reservations are
1769	   made for an IP address that has already been leased to a different
1770	   client.  Different servers handle this conflict in different ways,
1771	   but the goal of the failover protocol is to allow correct operation
1772	   with any server's approach to the normal processing of the DHCP pro-
1773	   tocol.

1775	   The general solution with regard to reservations is as follows.
1776	   Whenever a reserved IP address becomes FREE (i.e., when first config-
1777	   ured or whenever a client frees it or it expires or is reset), the
1778	   primary server MUST show that IP address as FREE (and thus available
1779	   for its own allocation) and it MUST send it to the secondary server
1780	   as BACKUP, in order that the secondary server be able to allocate it
1781	   as well.

1783	5.13.  Dynamic BOOTP and failover

1785	   Some DHCP servers support a capability to offer IP addresses to BOOTP
1786	   clients without having a particular address previously allocated for
1787	   those clients.  This capability is often called something like
1788	   "dynamic BOOTP".  It is not a capability explicitly discussed in
1789	   either the DHCP or BOOTP RFCs, but rather a pragmatic capability
1790	   which can work reasonably well for a small set of legacy BOOTP dev-
1791	   ices.

1793	   This capability has a negative interaction with the fundamental ele-
1794	   ments of the failover protocol, in that an address handed out to a
1795	   BOOTP device has no term (or effectively no term, in that usually
1796	   they are considered leases for "forever").  There is no opportunity
1797	   to hand out a lease which is only the MCLT long when first hearing
1798	   from a BOOTP device, because they may only interact once with the
1799	   DHCP server and they have no notion of a lease expiration time.  Thus
1800	   the entire concept of the MCLT and waiting the MCLT after entering
1801	   PARTNER-DOWN state is broken when dealing with BOOTP devices.

1803	   With some restrictions, however, dynamic BOOTP devices can be sup-
1804	   ported in a server on a subnet where failover is supported.  The only
1805	   restriction (and it is not small) is that on any portion of the sub-
1806	   net (in any address pool) where dynamic BOOTP devices can be allo-
1807	   cated IP addresses, a DHCP server MUST NOT ever use any of the IP
1808	   addresses which were previously available for allocation by its fail-
1809	   over partner.  Thus, the addresses allocated by the primary to the
1810	   secondary for allocation MUST NOT ever be used by the primary server
1811	   even if it is in PARTNER-DOWN state and has waited the MCLT after
1812	   entering that state.  This is because one of those IP addresses
1813	   could have been allocated by the secondary server to a BOOTP
1814	   device, and the primary server would have no way of ever knowing that
1815	   happened.

1817	5.14.  Guidelines for selecting MCLT

1819	   There is no one correct value for the MCLT.  There is an explicit
1820	   tradeoff between various factors in selecting an MCLT value.

1822	5.14.1.  Short MCLT

1824	   A short MCLT value will mean that after entering PARTNER-DOWN state,
1825	   a server will only have to wait a short time before it can start
1826	   allocating its partner's IP addresses to DHCP clients.  Furthermore,
1827	   it will only have to wait a short time after the expiration of a
1828	   lease on an IP address before it can reallocate that IP address to
1829	   another DHCP client.

1831	   However, the downside of a short MCLT value is that the initial lease
1832	   interval that will be offered to every new DHCP client will be short,
1833	   which will cause increased traffic as those clients will need to send
1834	   their first renewal after half of that short MCLT.  In addition,
1835	   the lease extensions that a server in COMMUNICATIONS-INTERRUPTED
1836	   state can give will be only the MCLT after the server has been in
1837	   COMMUNICATIONS-INTERRUPTED for around the desired client lease
1838	   period.  If a server stays in COMMUNICATIONS-INTERRUPTED for that
1839	   long, then the leases it hands out will be short and that will
1840	   increase the load on that server, possibly causing difficulty.

1842	5.14.2.  Long MCLT

1844	   A long MCLT value will mean that the initial lease period will be
1845	   longer and the time that a server in COMMUNICATIONS-INTERRUPTED state
1846	   will be able to extend leases (after it has been in COMMUNICATIONS-
1847	   INTERRUPTED state for around the desired client lease period) will be
1848	   longer.

1850	   However, a server entering PARTNER-DOWN state will have to wait the
1851	   longer MCLT before being able to allocate its partner's IP addresses
1852	   to new DHCP clients.  This may mean that additional IP addresses are
1853	   required in order to cover this time period.  Further, the server in
1854	   PARTNER-DOWN will have to wait the longer MCLT from every lease
1855	   expiration before it can reallocate an IP address to a different DHCP
1856	   client.
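
The tradeoff can be made concrete with a little arithmetic.  This is an
illustrative sketch, not part of the protocol: the function and key names
are hypothetical, all values are in seconds, and the first renewal is
taken to occur at half of the initial (MCLT-long) lease, as described in
section 5.14.1.

```python
def mclt_consequences(mclt, partner_down_entry, lease_expiration):
    """Times (in seconds) that follow from a given MCLT.

    partner_down_entry and lease_expiration are absolute times.
    """
    return {
        # A new client is first offered a lease of MCLT and renews at half.
        "first_renewal_after": mclt // 2,
        # Partner's addresses become usable MCLT after entering PARTNER-DOWN.
        "partner_addresses_usable_at": partner_down_entry + mclt,
        # An expired lease may go to another client MCLT after expiration.
        "expired_lease_reusable_at": lease_expiration + mclt,
    }
```

For example, with an MCLT of 3600, a server that entered PARTNER-DOWN at
time 10000 and holds a lease that expired at time 12000 gets a first
renewal after 1800 seconds, may use its partner's addresses at 13600, and
may reuse the expired lease at 15600.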

1858	6.  Packet Formats

1860	   This section describes the message format that all failover
1861	   messages have in common, and then defines the options used in the
1862	   failover protocol.

1864	6.1.  Common message format

1866	   All failover protocol messages are sent over the TCP connection
1867	   between failover endpoints and encoded using a message format
1868	   specific to the failover protocol.

1870	   There exists a common message format for all failover messages, which
1871	   utilizes the options in a way similar to the DHCP protocol.  For each
1872	   message type, some options are required and some are optional.  In
1873	   addition, when a message is received any options that are not
1874	   understood by the receiving server MUST be ignored.

1876	   All of the fields in the fixed portion of the message MUST be filled
1877	   with correct data in every message sent.

1879	   0                   1                   2                   3
1880	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1881	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1882	   |      message length (2)       | msg type (1)  |payload off (1)|
1883	   +---------------+---------------+---------------+---------------+
1884	   |                            time (4)                           |
1885	   +---------------------------------------------------------------+
1886	   |                            xid (4)                            |
1887	   +---------------------------------------------------------------+
1888	   |     0 or more additional header bytes  (variable)             |
1889	   +---------------------------------------------------------------+
1890	   |                    payload data  (variable)                   |
1891	   |                                                               |
1892	   |               formatted as DHCP-style options                 |
1893	   |         using a unique option number space in the RFC TBD     |
1894	   |                   format defined by [NAMESPACE]               |
1895	   +---------------------------------------------------------------+

1897	   message length - 2 bytes, network byte order

1899	   This is the length of the message.  It includes the two byte message
1900	   length itself.  The maximum length is 2048 bytes.

1902	   msg type - 1 byte

1904	   The message type field is used to distinguish between messages.

1906	   The following message types are defined:

1908	   Value   Message Type
1909	   -----   ------------
1910	   0       reserved    not used
1911	   1       POOLREQ     request allocation of addresses
1912	   2       POOLRESP    respond with allocation count
1913	   3       BNDUPD      update partner with binding info
1914	   4       BNDACK      acknowledge receipt of binding update
1915	   5       CONNECT     establish connection with the secondary
1916	   6       CONNECTACK  respond to attempt to establish connection with partner
1917	   7       UPDREQALL   request full transfer of binding info
1918	   8       UPDDONE     ack send and ack of req'd binding info
1919	   9       UPDREQ      req transfer of un-acked binding info
1920	   10      STATE       inform partner of current state or state change
1921	   11      CONTACT     probe communications integrity with partner
1922	   12      DISCONNECT  close a connection

1924	   New message types should be defined in one of two ranges, 0-127 or
1925	   128-255.  The range of 0-127 is used for messages that MUST be
1926	   supported by every server, and if a server receives a message in the
1927	   range of 0-127 that it doesn't understand, it MUST close the TCP
1928	   connection.  The range of 128-255 is used for messages which MAY be
1929	   supported but are not required, and if a server receives a message in
1930	   this range that it does not understand it SHOULD ignore the message.
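
A receiver might apply the two ranges as in the following sketch.  The
table repeats the message types listed above; the function name and the
returned action strings are hypothetical.  The boundary used is 0-127 for
required messages and 128-255 for optional ones, and the reserved value 0
is treated here as a message that is not understood.

```python
KNOWN_MESSAGE_TYPES = {
    1: "POOLREQ", 2: "POOLRESP", 3: "BNDUPD", 4: "BNDACK",
    5: "CONNECT", 6: "CONNECTACK", 7: "UPDREQALL", 8: "UPDDONE",
    9: "UPDREQ", 10: "STATE", 11: "CONTACT", 12: "DISCONNECT",
}


def action_for_message_type(msg_type):
    """Return "handle", "close" or "ignore" for a received msg type."""
    if not 0 <= msg_type <= 255:
        raise ValueError("msg type is a single byte")
    if msg_type in KNOWN_MESSAGE_TYPES:
        return "handle"
    if msg_type <= 127:
        return "close"    # required range, not understood: close TCP
    return "ignore"       # optional range, not understood: skip message
```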

1932	   payload offset - 1 byte

1934	   The byte offset of the Payload Data, from the beginning of the
1935	   failover message header. The value for the current protocol version
1936	   is 8.

1938	   time - 4 bytes, network byte order

1940	   The absolute time in GMT when the message was transmitted,
1941	   represented as seconds elapsed since Jan 1, 1970 (i.e., similar to
1942	   the ANSI C time_t time value representation).  While the ANSI C
1943	   time_t value is signed, the value used in this specification is
1944	   unsigned.

1946	   A server SHOULD set this time as close to the actual transmission of
1947	   the message as possible.

1949	   xid - 4 bytes, network byte order

1951	   This is the transaction id of the failover message.  The sender of a
1952	   failover protocol message is responsible for setting this number, and
1953	   the receiver of the message copies the number over into any response
1954	   message, treating it as opaque data.  The sender SHOULD ensure that
1955	   every message sent from a particular failover endpoint over the
1956	   associated TCP connection has a unique transaction id unless that
1957	   message is a re-transmission.

1959	   payload data - variable length

1961	   The options are placed after the header, after skipping payload
1962	   offset bytes from the beginning of the message.  The payload data
1963	   options are not preceded by a "cookie" value.

1965	   The payload data is formatted as DHCP style options using the two
1966	   byte option number and two byte option length format as specified in
1967	   the recommendations of the DHCP panel in [NAMESPACE].

1969	   The maximum length of the payload data in octets is 2048 less the
1970	   size of the header, i.e., the maximum message length is 2048 octets.
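
As an illustration, the fixed fields of the header can be packed and
parsed as shown below.  This is a sketch under stated assumptions: the
names Header, pack_header and parse_header are hypothetical, only the
fixed fields from the diagram are handled, and the payload offset is
carried as given rather than interpreted.

```python
import struct
from collections import namedtuple

# message length (2), msg type (1), payload offset (1), time (4), xid (4),
# all in network byte order; time is unsigned in this specification.
_FIXED = struct.Struct("!HBBII")
MAX_MESSAGE_LENGTH = 2048

Header = namedtuple("Header", "length msg_type payload_offset time xid")


def pack_header(header):
    """Encode the fixed header fields; length covers the whole message."""
    if not 0 <= header.length <= MAX_MESSAGE_LENGTH:
        raise ValueError("message length must not exceed 2048 bytes")
    return _FIXED.pack(*header)


def parse_header(data):
    """Decode the fixed header fields from the start of data."""
    if len(data) < _FIXED.size:
        raise ValueError("truncated failover header")
    header = Header(*_FIXED.unpack_from(data))
    if header.length > MAX_MESSAGE_LENGTH:
        raise ValueError("message length must not exceed 2048 bytes")
    return header
```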

1972	6.2.  Common option format

1974	   The options contained in the payload data section of the failover
1975	   message all use the two byte option number and two byte length format
1976	   as specified by the recommendations of the DHCP panel in [NAMESPACE].

1978	   The option numbers are drawn from an option number space unique to
1979	   the failover protocol.  All of the message types share a common
1980	   option number space and common options definitions, though not all
1981	   options are required or meaningful for every message.

1983	   In contrast to the options which appear in DHCP client and server
1984	   messages, the options in failover messages are ordered.  That is, for
1985	   some messages the order in which the options appear in the payload
1986	   data area is significant.  The messages for which this is the case
1987	   spell it out in detail.
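
A sketch of the two byte code, two byte length encoding follows.  The
function names are hypothetical.  The decoder returns a list rather than
a dictionary so that the order of options on the wire is preserved,
since order can be significant.

```python
import struct

_OPTION_HEADER = struct.Struct("!HH")  # option code, option length


def encode_option(code, value):
    """Encode one option as code, length, then the value bytes."""
    return _OPTION_HEADER.pack(code, len(value)) + value


def decode_options(payload):
    """Return [(code, value), ...] in the order found in payload."""
    options = []
    pos = 0
    while pos < len(payload):
        if pos + _OPTION_HEADER.size > len(payload):
            raise ValueError("truncated option header")
        code, length = _OPTION_HEADER.unpack_from(payload, pos)
        pos += _OPTION_HEADER.size
        if pos + length > len(payload):
            raise ValueError("truncated option data")
        options.append((code, payload[pos:pos + length]))
        pos += length
    return options
```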

1989	   All options which refer to time use an absolute time in
1990	   GMT.  Time synchronization has already been achieved between the
1991	   source and the target server using the CONNECT message and is updated
1992	   using the time in every packet.  All time fields in the options
1993	   defined below use a time represented as seconds elapsed since Jan 1,
1994	   1970 (i.e. ANSI C time_t time value representation).  Note that this
1995	   is (at present) a signed field.

1997	   Additional options can be defined for intervendor or vendor specific
1998	   use with limited difficulty due to the large number of option numbers
1999	   available.

2001	6.2.1.  binding-status

2003	   This option is used to convey the current state of a binding.

2005	       Code          Len     Type
2006	   +-----+-----+------+-----+-----+
2007	   |  0  |  1  |   0  |  1  | 1-7 |
2008	   +-----+-----+------+-----+-----+

2010	   Legal values for this option are:

2012	   Value Binding Status
2013	   ----- ------------------------------------------------
2014	   1     FREE           Lease has never been used
2015	   2     ACTIVE         Lease is assigned to a client
2016	   3     EXPIRED        Lease has expired
2017	   4     RELEASED       Lease has been released by client
2018	   5     ABANDONED      A server or client flagged address as unusable
2019	   6     RESET          Lease was freed by some external agent
2020	   7     BACKUP         Lease belongs to secondary's private address pool
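
The values above map naturally onto an enumeration.  The sketch below is
one possible representation; the class and function names are
hypothetical, and values outside 1-7 are rejected.

```python
import enum
import struct

BINDING_STATUS_CODE = 1  # option code from the diagram above


class BindingStatus(enum.IntEnum):
    FREE = 1
    ACTIVE = 2
    EXPIRED = 3
    RELEASED = 4
    ABANDONED = 5
    RESET = 6
    BACKUP = 7


def encode_binding_status(status):
    """Encode the option as code (2), length (2), status (1)."""
    return struct.pack("!HHB", BINDING_STATUS_CODE, 1, BindingStatus(status))


def decode_binding_status(value):
    """Decode the one-byte option value; raises ValueError if illegal."""
    if len(value) != 1:
        raise ValueError("binding-status value must be one byte")
    return BindingStatus(value[0])
```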

2022	6.2.2.  assigned-IP-address

2024	   The IP address to which this message refers.

2026	        Code         Len          Address
2027	   +-----+-----+------+-----+----+-----+-----+-----+
2028	   |  0  |  2  |   0  |  4  | a1 |  a2 |  a3 |  a4 |
2029	   +-----+-----+------+-----+----+-----+-----+-----+
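
For example, the option could be encoded with the standard ipaddress
module as sketched here.  The function names are hypothetical; the
address is IPv4, as the four-byte value in the diagram implies.

```python
import ipaddress
import struct

ASSIGNED_IP_ADDRESS_CODE = 2  # option code from the diagram above


def encode_assigned_ip_address(address):
    """Encode a dotted-quad string as code (2), length (2), address (4)."""
    packed = ipaddress.IPv4Address(address).packed
    return struct.pack("!HH", ASSIGNED_IP_ADDRESS_CODE, len(packed)) + packed


def decode_assigned_ip_address(value):
    """Decode the four-byte option value to a dotted-quad string."""
    if len(value) != 4:
        raise ValueError("assigned-IP-address value must be four bytes")
    return str(ipaddress.IPv4Address(value))
```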

2031	6.2.3.  sending-server-IP-address

2033	   The IP address of the server sending this message.

2035	        Code         Len          Address
2036	   +-----+-----+------+-----+----+-----+-----+-----+
2037	   |  0  |  3  |   0  |  4  | a1 |  a2 |  a3 |  a4 |
2038	   +-----+-----+------+-----+----+-----+-----+-----+

2040	6.2.4.  addresses-transferred

2042	   A 32 bit unsigned long in network byte order. Reports the number of
2043	   addresses transferred by the primary to the secondary server
2044	   (addresses to be used for the secondary server's private address
2045	   pool).

2047	        Code         Len       Number of Addresses
2048	   +-----+-----+------+-----+----+-----+-----+-----+
2049	   |  0  |  4  |   0  |  4  | n1 |  n2 |  n3 |  n4 |
2050	   +-----+-----+------+-----+----+-----+-----+-----+

2052	6.2.5.  client-identifier

2054	   The format, code and conventions used are identical to DHCP option
2055	   61.

2057	        Code         Len       Client Identifier
2058	   +-----+-----+------+-----+----+-----+---
2059	   |  0  |  5  |   0  |  n  | i1 |  i2 | ...
2060	   +-----+-----+------+-----+----+-----+--

2062	6.2.6.  client-hardware-address

2064	   The format is similar to DHCP option 61. Byte t1 (type) MUST be set
2065	   to the proper ARP hardware address code, as defined in the ARP
2066	   section of RFC 1700 (it MUST NOT be zero).

2068	        Code         Len     htype   chaddr
2069	   +-----+-----+------+-----+----+-----+-----+---
2070	   |  0  |  6  |   0  |  n  | t1 |  c1 |  c2 | ...
2071	   +-----+-----+------+-----+----+-----+-----+---

2073	   Either client-identifier, client-hardware-address or BOTH MAY be
2074	   present in binding update transactions. At least one of them MUST be
2075	   present.  If both are present, the client-identifier MUST be used to
2076	   uniquely identify the owner of the binding (exactly as in RFC 2131).
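
The selection rule can be sketched as follows.  The function name and
the returned tuple are hypothetical; both arguments are the raw option
values, and the hardware address value begins with its htype byte.

```python
def binding_owner_key(client_identifier=None, client_hardware_address=None):
    """Pick the value that identifies the owner of a binding."""
    hw = client_hardware_address
    if hw is not None and (len(hw) == 0 or hw[0] == 0):
        raise ValueError("htype byte MUST NOT be zero")
    if client_identifier is not None:
        # When both are present the client-identifier takes precedence.
        return ("client-identifier", client_identifier)
    if hw is not None:
        return ("client-hardware-address", hw)
    raise ValueError("at least one identifying option MUST be present")
```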

2078	6.2.7.  DDNS

2080	   If an implementation supports Dynamic DNS updates, this option is
2081	   used to communicate the status of the DDNS update associated with a
2082	   particular lease binding.  The Flags field conveys the types of DNS
2083	   RRs that are to be updated by the DHCP server, and the status of the
2084	   DDNS update.  The Domain Name field conveys the DNS FQDN that the
2085	   DHCP server is using to refer to the client, in DNS encoding as
2086	   specified in [RFC1035].

2088	       Code         Len        Flags      Domain Name
2089	   +-----+-----+------+-----+-----+------+------+-----+------
2090	   |  0  |  7  |   0  |  n  |   flags    |  d1  |  d2 | ...
2091	   +-----+-----+------+-----+-----+------+------+-----+------

2093	   The Flags field is a 16-bit field; several bit positions are
2094	   specified here.

2096	   15               7             0
2097	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2098	   |           MBZ         |P|D|A|C|
2099	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2101	   The bits (numbered from the least-significant bit in network
2102	   byte-order) are used as follows:

2104	   0 (C): A RR update successfully completed
2105	   1 (A): Server is controlling A RR on behalf of the client
2106	   2 (D): PTR RR update successfully completed (Done)
2107	   3 (P): Server is controlling PTR RR on behalf of the client
2108	   4-15 : Must be zero

2110	   All of the unspecified bit positions SHOULD be set to 0 by servers
2111	   sending the Failover-DDNS option, and they MUST be ignored by servers
2112	   receiving the option.
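
The following sketch shows one way to build and read this option.  The
constant and function names are hypothetical.  The Domain Name is
encoded as length-prefixed labels ending in a zero byte, per [RFC1035],
without compression; unspecified flag bits are sent as zero and masked
off on receipt.

```python
import struct

DDNS_OPTION_CODE = 7
FLAG_C = 0x0001  # A RR update successfully completed
FLAG_A = 0x0002  # server is controlling A RR on behalf of the client
FLAG_D = 0x0004  # PTR RR update successfully completed
FLAG_P = 0x0008  # server is controlling PTR RR on behalf of the client
_DEFINED = FLAG_C | FLAG_A | FLAG_D | FLAG_P


def dns_encode(fqdn):
    """Encode an ASCII FQDN as uncompressed DNS labels."""
    wire = b""
    for label in fqdn.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("label length must be 1..63")
        wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def encode_ddns_option(flags, name_wire):
    """Encode code (2), length (2), flags (2), then the domain name."""
    value = struct.pack("!H", flags & _DEFINED) + name_wire
    return struct.pack("!HH", DDNS_OPTION_CODE, len(value)) + value


def decode_ddns_option(value):
    """Return (flags, name_wire) from the option value."""
    if len(value) < 2:
        raise ValueError("DDNS option value too short")
    (flags,) = struct.unpack_from("!H", value)
    return flags & _DEFINED, value[2:]
```

A server that has sent a BNDUPD before its DDNS update finished would
send FLAG_A alone, then FLAG_A | FLAG_C once the DNS server has accepted
the A RR update.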

2114	6.2.8.  reject-reason

2116	   This option is used to selectively reject binding updates. It MAY be
2117	   used in a BNDACK message, always with an assigned-IP-address option
2118	   that contains the IP address of the update being rejected.

2120	        Code         Len     Reason Code
2121	   +-----+-----+------+-----+----------+
2122	   |  0  |  8  |   0  |  1  |    R1    |
2123	   +-----+-----+------+-----+----------+

2125	   Reason codes :

2127	   0   Reserved
2128	   1   Illegal IP address (not part of any address pool)
2129	   2   Fatal conflict exists: address in use by other client.
2130	   3   Missing binding information.
2131	   4   Connection rejected, time mismatch too great.
2132	   5   Connection rejected, invalid MCLT.
2133	   6   Connection rejected, unknown reason.
2134	   7   Connection rejected, duplicate connection.
2135	   8   Connection rejected, invalid failover partner.
2136	   9   TLS not supported
2137	   10  TLS supported but not configured
2138	   11  TLS required but not supported by partner
2139	   12  Message digest not supported
2140	   13  Message digest not configured
2141	   14  Protocol version mismatch
2142	   15  Missing binding information
2143	   16  Outdated binding information
2144	   17  Less critical binding information
2145	   18  No traffic within sufficient time
2146	   19  Hash bucket assignment conflict
2147	   20-253, reserved.
2148	   254 Unknown: Error occurred but does not match any reason code
2149	   255 Reserved for code expansion

2151	6.2.9.  message

2153	   This option is used to supply a human readable message.  It may be
2154	   used in association with the Reject Reason Code to provide a human
2155	   readable error message for the reject.

2157	        Code         Len         Text
2158	   +-----+-----+------+-----+------+-----+--
2159	   |  0  |  9  |   0  |  n  |  c1  | c2  | ...
2160	   +-----+-----+------+-----+------+-----+--

2162	6.2.10.  MCLT

2164	   Maximum Client Lead Time, in seconds.  A 32 bit integer value, in
2165	   network byte order.

2167	        Code         Len             Time
2168	   +-----+-----+------+-----+----+-----+-----+-----+
2169	   |  0  |  10 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
2170	   +-----+-----+------+-----+----+-----+-----+-----+

2172	6.2.11.  vendor-class-identifier

2174	   A string which identifies the vendor of the failover protocol
2175	   implementation.

2177	   The code for this option is 11, and its minimum length is 1.

2179	        Code         Len           vendor class string
2180	   +-----+-----+------+-----+----+-----+---
2181	   |  0  |  11 |   0  |  n  | c1 |  c2 |  ...
2182	   +-----+-----+------+-----+----+-----+---

2184	6.2.12.  lease-expiration-time

2186	   The lease expiration time expressed as an absolute time in GMT
2187	   represented as seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t
2188	   time value representation).

2190	   The lease expiration time is the time that a server has ACKed to a
2191	   DHCP client.

2193	        Code         Len          Time
2194	   +-----+-----+------+-----+----+-----+-----+-----+
2195	   |  0  |  13 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
2196	   +-----+-----+------+-----+----+-----+-----+-----+

2198	6.2.13.  potential-expiration-time

2200	   The potential expiration time expressed as an absolute time in GMT
2201	   represented as seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t
2202	   time value representation).

2204	   The potential expiration time is the time that one server tells
2205	   another server that it may ACK to a client.

2207	        Code         Len          Time
2208	   +-----+-----+------+-----+----+-----+-----+-----+
2209	   |  0  |  14 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
2210	   +-----+-----+------+-----+----+-----+-----+-----+

2212	6.2.14.  grace-expiration-time

2214	   The grace expiration time expressed as an absolute time in GMT
2215	   represented as seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t
2216	   time value representation).

2218	   The grace expiration time is the time that a grace period will
2219	   expire.

2221	        Code         Len          Time
2222	   +-----+-----+------+-----+----+-----+-----+-----+
2223	   |  0  |  15 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
2224	   +-----+-----+------+-----+----+-----+-----+-----+

2226	6.2.15.  client-last-transaction-time

2228	   The time at which this server last received a DHCP request from a
2229	   particular client expressed as an absolute time in GMT represented as
2230	   seconds elapsed since Jan 1, 1970 (i.e.  ANSI C time_t time value
2231	   representation).

2233	        Code         Len   Client Last Transaction Time
2234	   +-----+-----+------+-----+----+-----+-----+-----+
2235	   |  0  |  16 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
2236	   +-----+-----+------+-----+----+-----+-----+-----+

2238	6.2.16.  start-time-of-state

2240	   The time at which the state contained in this message began,
2241	   expressed as an absolute time in GMT represented as seconds elapsed
2242	   since Jan 1, 1970 (i.e.  ANSI C time_t time value representation).

2244	   This option is used for different states in different messages.  In a
2245	   BNDUPD message it represents the start time of the state of the lease
2246	   in the BNDUPD message.  In a STATE message, it represents the start
2247	   time of the partner server's failover state.

2249	        Code         Len      Start Time of State
2250	   +-----+-----+------+-----+----+-----+-----+-----+
2251	   |  0  |  17 |   0  |  4  | t1 |  t2 |  t3 |  t4 |
2252	   +-----+-----+------+-----+----+-----+-----+-----+

2254	6.2.17.  server-state

2256	   This option is used to convey the current state of the failover
2257	   endpoint in the sending server.

2259	       Code          Len     Server State
2260	   +-----+-----+------+-----+-----+
2261	   |  0  |  18 |   0  |  1  | 1-9 |
2262	   +-----+-----+------+-----+-----+

2264	   Legal values for this option are:

2266	   Value   Server State
2267	   -----   -------------------------------------------------------------
2268	   0       reserved
2269	   1       STARTUP                      Startup state (1)
2270	   2       NORMAL                       Normal state
2271	   3       COMMUNICATIONS-INTERRUPTED   Communication interrupted (safe)
2272	   4       PARTNER-DOWN                 Partner down (unsafe mode)
2273	   5       POTENTIAL-CONFLICT           Synchronizing
2274	   6       RECOVER                      Recovering bindings from partner
2275	   7       PAUSED                       Shutting down for a short period.
2276	   8       SHUTDOWN                     Shutting down for an extended
2277	                                        period.
2278	   9       RECOVER-DONE                 Interlock state prior to NORMAL
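
The table above maps naturally onto an enumeration.  The following
Python sketch is illustrative only; the class and function names are
hypothetical, and the parser assumes it is handed exactly one complete
server-state option.

```python
from enum import IntEnum


class ServerState(IntEnum):
    """Server states from the table above (0 is reserved and omitted)."""
    STARTUP = 1
    NORMAL = 2
    COMMUNICATIONS_INTERRUPTED = 3
    PARTNER_DOWN = 4
    POTENTIAL_CONFLICT = 5
    RECOVER = 6
    PAUSED = 7
    SHUTDOWN = 8
    RECOVER_DONE = 9


def parse_server_state(option: bytes) -> ServerState:
    """Decode a 5-byte option: code 18, length 1, one state byte."""
    if len(option) != 5 or option[:4] != b"\x00\x12\x00\x01":
        raise ValueError("malformed server-state option")
    # ServerState() raises ValueError for 0 (reserved) and values above 9.
    return ServerState(option[4])
```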

2280	6.2.18.  server-flags

2282	   This option is used to convey the current flags of the failover
2283	   endpoint in the sending server.

2285	       Code          Len     Server Flags
2286	   +-----+-----+------+-----+-------+
2287	   |  0  |  19 |   0  |  1  | flags |
2288	   +-----+-----+------+-----+-------+

2290	   Legal values for this option are:

2292	   Currently, only bit 5 is defined.  All other bits are reserved, and
2293	   must be set to 0.

2295	      o STARTUP

2297	        Bit 5 is the STARTUP flag.  Bit 5 MUST be set to 1 whenever the
2298	        server is in STARTUP state, and set to 0 otherwise.  (Note that
2299	        when in STARTUP state, the state transmitted in the server-state
2300	        option is usually the last recorded state from stable storage,
2301	        but see section 9.3 for details.)

2303	6.2.19.  vendor-specific-options

2305	   This option is used to convey options specific to a particular
2306	   vendor's implementation.  The vendor class identifier is used to
2307	   specify which option space the embedded options are drawn from.

2309	   It functions similarly to the vendor class identifier and vendor
2310	   specific options in the DHCP protocol.

2312	   This option contains other options in the same two byte code, two
2313	   byte length format.  If this option appears in a message without a
2314	   corresponding vendor class identifier, it MUST be ignored.

2316	        Code         Len        Embedded options
2317	   +-----+-----+------+-----+----+-----+---
2318	   |  0  |  20 |   0  |  n  | c1 |  c2 |  ...
2319	   +-----+-----+------+-----+----+-----+---

2321	6.2.20.  max-unacked-bndupd

2323	   The maximum number of BNDUPD messages that this server is prepared
2324	   to accept over the TCP connection without causing the TCP connection
2325	   to block.

2327	        Code         Len     Maximum Unacked BNDUPD
2328	   +-----+-----+------+-----+----+-----+-----+-----+
2329	   |  0  |  21 |   0  |  4  | n1 |  n2 |  n3 |  n4 |
2330	   +-----+-----+------+-----+----+-----+-----+-----+

2332	6.2.21.  receive-timer

2334	   The number of seconds within which the server must receive a message
2335	   from its partner, or it will assume that the partner is down or the
2336	   communication path to the partner has failed.

2338	        Code         Len         Receive Timer
2339	   +-----+-----+------+-----+----+-----+-----+-----+
2340	   |  0  |  23 |   0  |  4  | s1 |  s2 |  s3 |  s4 |
2341	   +-----+-----+------+-----+----+-----+-----+-----+
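
One way to apply the receive-timer value is sketched below in Python.
This is an assumption-laden illustration, not a required design: the
class name is hypothetical, the clock is passed in explicitly so that
the logic is easy to test, and a message arriving exactly at the limit
is treated as being on time.

```python
class ReceiveTimer:
    """Tracks whether the partner has been heard from recently enough."""

    def __init__(self, timeout_seconds: int, now: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_received = now

    def message_received(self, now: float) -> None:
        """Record the arrival of any message from the partner."""
        self.last_received = now

    def expired(self, now: float) -> bool:
        """True once more than timeout_seconds have passed in silence."""
        return now - self.last_received > self.timeout_seconds
```

When expired() returns True, the server would treat the partner as down
or the communication path as failed, as described above.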

2343	6.2.22.  hash-bucket-assignment

2345	   The set of hash values to which the receiving server MUST respond.
2346	   See section 5.3 for more information on how this option is used.

2348	   The format and usage of the data in this option is defined in
2349	   [LOADB].

2351	        Code         Len        Hash Buckets
2352	   +-----+-----+------+-----+----+-----+-----+-----+
2353	   |  0  |  24 |   0  |  32 | b1 |  b2 | ... | b32 |
2354	   +-----+-----+------+-----+----+-----+-----+-----+

2356	6.2.23.  message-digest

2358	   The message digest for this message.

2360	   This option consists of a variable number of bytes which contain the
2361	   message digest of the message prior to the inclusion of this option.

2363	   When this option appears in a message, it MUST appear as the last
2364	   option in the message.

2366	        Code         Len       Message Digest
2367	   +-----+-----+------+-----+----+-----+-----
2368	   |  0  |  25 |   0  |  n  | d1 |  d2 | ...
2369	   +-----+-----+------+-----+----+-----+-----
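
The ordering rule above can be checked while walking the options of a
received message.  The Python sketch below assumes that the buffer
holds only the options portion of a message, each option using the
two-byte code and two-byte length format shown in the diagrams; the
function names are hypothetical, and no digest algorithm is implied.

```python
import struct

MESSAGE_DIGEST = 25  # option code taken from the diagram above


def iter_options(buf: bytes):
    """Yield (code, data) for each option in buf."""
    offset = 0
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        if offset + length > len(buf):
            raise ValueError("truncated option data")
        yield code, buf[offset:offset + length]
        offset += length


def digest_is_last(buf: bytes) -> bool:
    """True if message-digest is absent or appears only as the last option."""
    codes = [code for code, _ in iter_options(buf)]
    return MESSAGE_DIGEST not in codes[:-1]
```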

2371	6.2.24.  protocol-version

2373	   The protocol version being used by the server. It is only sent in the
2374	   CONNECT and CONNECTACK messages.

2376	        Code         Len    Version
2377	   +-----+-----+------+-----+----+
2378	   |  0  |  26 |   0  |  1  | v1 |
2379	   +-----+-----+------+-----+----+

2381	6.2.25.  TLS-request

2383	   This option contains information relating to TLS security
2384	   negotiation.  It is sent in a CONNECT message.

2386	   The first byte, req, is the TLS request from this server.  A value of
2387	   0 indicates no TLS operation, a value of 1 indicates that TLS
2388	   operation is desired, and a value of 2 indicates that TLS operation
2389	   is required to establish communications with this server.

2391	   The second byte, acc, is what this server will accept for TLS
2392	   operation.  A value of 0 means that this server will not accept TLS
2393	   connections.  A value of 1 means that this server will accept TLS
2394	   connections.

2396	   If req is not zero, then acc MUST be 1.

2398	   This allows a server which is not configured to require TLS support
2399	   to inform its partner that it will accept a TLS connection although
2400	   it does not desire one, for instance.

2402	        Code         Len  request accept
2403	   +-----+-----+------+-----+----+----+
2404	   |  0  |  27 |   0  |  2  | req| acc|
2405	   +-----+-----+------+-----+----+----+
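
The constraint between req and acc can be enforced when the option is
built.  The following Python sketch is illustrative; the function name
is hypothetical, and only the values listed above are accepted.

```python
import struct

TLS_REQUEST = 27  # option code taken from the diagram above


def encode_tls_request(req: int, acc: int) -> bytes:
    """Build a TLS-request option from the req and acc bytes."""
    if req not in (0, 1, 2):
        raise ValueError("req must be 0 (none), 1 (desired) or 2 (required)")
    if acc not in (0, 1):
        raise ValueError("acc must be 0 (will not accept) or 1 (will accept)")
    if req != 0 and acc != 1:
        raise ValueError("if req is not zero, acc MUST be 1")
    return struct.pack("!HHBB", TLS_REQUEST, 2, req, acc)
```

For example, encode_tls_request(0, 1) describes a server that does not
ask for TLS but will accept it if its partner does.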

2407	6.2.26.  TLS-reply

2409	   This option contains information relating to TLS security
2410	   negotiation.  It is sent in a CONNECTACK message.

2412	   A value of 0 indicates no TLS operation; a value of 1 indicates
2413	   that TLS operation is required.

2415	        Code         Len     TLS
2416	   +-----+-----+------+-----+----+
2417	   |  0  |  28 |   0  |  1  | t1 |
2418	   +-----+-----+------+-----+----+

2420	6.2.27.  client-request-options

2422	   This option contains options from a DHCP client's request.  It is
2423	   sent in a BNDUPD message.  The first 4 bytes of the option contain
2424	   the "magic number" of the option area from which the DHCP client's
2425	   request options were taken and serve to define the format of the
2426	   rest of the sub-options contained in this option.  After the magic
2427	   number, the options included are in the normal options format
2428	   appropriate for that magic number.

2430	   A server SHOULD NOT include all of the options in a DHCP client
2431	   request in this option, but rather a server SHOULD include only those
2432	   options which are of likely interest to its partner server.  See
2433	   section 7.1 for details.

2435	        Code         Len         Magic Number      Embedded options
2436	   +-----+-----+------+-----+----+----+----+----+----+----+--
2437	   |  0  |  29 |   0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 |  ...
2438	   +-----+-----+------+-----+----+----+----+----+----+----+--
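
As a sketch of how the magic number and the embedded options fit
together, the Python below builds a client-request-options option for
the common case in which the client's options came from an option area
beginning with the RFC 2131 magic cookie (99.130.83.99), where each
embedded option has a one-byte code and a one-byte length.  The
function name is hypothetical, and other magic numbers would need
their own encoders.

```python
import struct
from typing import Dict

CLIENT_REQUEST_OPTIONS = 29                   # code from the diagram above
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])  # RFC 2131 magic cookie


def encode_client_request_options(dhcp_options: Dict[int, bytes]) -> bytes:
    """Wrap selected DHCP client options in a failover option."""
    body = DHCP_MAGIC_COOKIE
    for code, value in dhcp_options.items():
        if not 0 < code < 255 or len(value) > 255:
            raise ValueError("not encodable as an RFC 2131 style option")
        body += bytes([code, len(value)]) + value
    # The outer failover option uses a two-byte code and two-byte length.
    return struct.pack("!HH", CLIENT_REQUEST_OPTIONS, len(body)) + body
```

In keeping with the text above, a caller would pass only the options
likely to interest the partner rather than the whole client request.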

2440	6.2.28.  client-reply-options

2442	   This option contains options from a DHCP server's reply to a DHCP
2443	   client request.  It is sent in a BNDUPD message.  The first 4 bytes
2444	   of the option contain the "magic number" of the option area from
2445	   which the DHCP reply options were taken and serve to define the
2446	   format of the rest of the sub-options contained in this option.
2447	   After the magic number, the options included are in the normal
2448	   options format appropriate for that magic number.

2450	   A server SHOULD NOT include all of the options in a DHCP server's
2451	   reply to a client's request in this option, but rather a server
2452	   SHOULD include only those options which are of likely interest to its
2453	   partner server.  See section 7.1 for details.

2455	        Code         Len         Magic Number      Embedded options
2456	   +-----+-----+------+-----+----+----+----+----+----+----+--
2457	   |  0  |  30 |   0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 |  ...
2458	   +-----+-----+------+-----+----+----+----+----+----+----+--

2460	6.3.  BNDUPD message format

2462	   The binding update (BNDUPD) message is used to send the binding data-
2463	   base changes to the partner server.

2465	   The message type for the BNDUPD message is 3.

2467	   The xid of the BNDUPD MUST be unique with respect to other failover
2468	   messages transmitted from this failover endpoint.

2470	   The following table summarizes the various options for the BNDUPD
2471	   message.

2473	                                        binding-status

2475	   Option                        ACTIVE     EXPIRED    RELEASED   FREE
2476	   ------                        ------     -------    --------   ----
2477	   assigned-IP-address           MUST       MUST       MUST       MUST
2478	   binding-status                MUST       MUST       MUST       MUST
2479	   client-identifier             MAY        MAY        MAY        MAY
2480	   client-hardware-address       MUST       MUST       MUST       MAY
2481	   lease-expiration-time         MUST       MUST NOT   MUST NOT   MUST NOT
2482	   potential-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
2483	   grace-expiration-time         MUST NOT   MUST NOT   MUST NOT   MUST NOT
2484	   start-time-of-state           SHOULD     SHOULD     SHOULD     SHOULD
2485	   client-last-trans.-time       MUST       SHOULD     MUST       MAY
2486	   DDNS(1)                       SHOULD     SHOULD     SHOULD     SHOULD
2487	   client-request-options        SHOULD     SHOULD NOT SHOULD     SHOULD NOT
2488	   client-reply-options          SHOULD     SHOULD NOT SHOULD     SHOULD NOT
2489	   all others                    MAY        MAY        MAY        MAY

2491	                                        binding-status

2493	                                BACKUP
2494	                                RESET
2495	   Option                       ABANDONED
2496	   ------                       ---------
2497	   assigned-IP-address          MUST
2498	   binding-status               MUST
2499	   client-identifier            MAY(2)
2500	   client-hardware-address      MAY(2)
2501	   lease-expiration-time        MUST NOT
2502	   potential-expiration-time    MUST NOT
2503	   grace-expiration-time        MUST NOT
2504	   start-time-of-state          SHOULD
2505	   client-last-trans.-time      MAY
2506	   DDNS(1)                      SHOULD
2507	   client-request-options       SHOULD NOT
2508	   client-reply-options         SHOULD NOT
2509	   all others                   MAY

2511	   (1) SHOULD appear only if the server supports dynamic DNS.

2513	   (2) MUST NOT if binding-status is ABANDONED.

2515	             Table 6.3-1: Options used in a BNDUPD message
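
A receiver could check the MUST and MUST NOT entries of Table 6.3-1
mechanically.  The Python sketch below transcribes only the ACTIVE,
EXPIRED, RELEASED and FREE columns and ignores the SHOULD and MAY
rows; the dictionary and function names are hypothetical, and option
names are spelled as they appear in the table.

```python
_NO_TIMES = {"lease-expiration-time", "potential-expiration-time",
             "grace-expiration-time"}

# Options marked MUST in Table 6.3-1, per binding-status.
MUST = {
    "ACTIVE": {"assigned-IP-address", "binding-status",
               "client-hardware-address", "lease-expiration-time",
               "potential-expiration-time", "client-last-trans.-time"},
    "EXPIRED": {"assigned-IP-address", "binding-status",
                "client-hardware-address"},
    "RELEASED": {"assigned-IP-address", "binding-status",
                 "client-hardware-address", "client-last-trans.-time"},
    "FREE": {"assigned-IP-address", "binding-status"},
}

# Options marked MUST NOT in Table 6.3-1, per binding-status.
MUST_NOT = {
    "ACTIVE": {"grace-expiration-time"},
    "EXPIRED": _NO_TIMES,
    "RELEASED": _NO_TIMES,
    "FREE": _NO_TIMES,
}


def check_bndupd(status, present):
    """Return (missing MUST options, present MUST NOT options), sorted."""
    present = set(present)
    return sorted(MUST[status] - present), sorted(MUST_NOT[status] & present)
```

A message passes this check when both returned lists are empty.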

2517	6.4.  BNDACK message format

2519	   A server sends a binding acknowledgement (BNDACK) message when it has
2520	   successfully committed binding database changes received from a fail-
2521	   over partner in a BNDUPD message to its own stable storage.

2523	   The message type for the BNDACK message is 4.

2525	   The xid in a BNDACK MUST be the same as the xid of the corresponding
2526	   BNDUPD.

2528	   The following table summarizes the options for the BNDACK message.

2530	                                        binding-status

2532	   Option                        ACTIVE     EXPIRED    RELEASED   FREE
2533	   ------                        ------     -------    --------   ----
2534	   assigned-IP-address           MUST       MUST       MUST       MUST
2535	   binding-status                MUST       MUST       MUST       MUST
2536	   client-identifier             MAY        MAY        MAY        MAY
2537	   client-hardware-address       MUST       MUST       MUST       MAY
2538	   reject-reason                 MAY        MAY        MAY        MAY
2539	   message                       MAY        MAY        MAY        MAY
2540	   lease-expiration-time         MUST       MUST NOT   MUST NOT   MUST NOT
2541	   potential-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
2542	   grace-expiration-time         MUST NOT   MUST NOT   MUST NOT   MUST NOT
2543	   start-time-of-state           SHOULD     SHOULD     SHOULD     SHOULD
2544	   client-last-trans.-time       SHOULD     SHOULD     SHOULD     MAY
2545	   DDNS(1)                       SHOULD     SHOULD     SHOULD     SHOULD
2546	   all others                    MAY        MAY        MAY        MAY

2548	                                        binding-status
2549	                                BACKUP
2550	                                RESET
2551	   Option                       ABANDONED
2552	   ------                       ---------
2553	   assigned-IP-address          MUST
2554	   binding-status               MUST
2555	   client-identifier            MAY
2556	   client-hardware-address      MAY(2)
2557	   reject-reason                MAY
2558	   message                      MAY
2559	   lease-expiration-time        MUST NOT
2560	   potential-expiration-time    MUST NOT
2561	   grace-expiration-time        MUST NOT
2562	   start-time-of-state          SHOULD
2563	   client-last-trans.-time      MAY
2564	   DDNS(1)                      SHOULD
2565	   all others                   MAY

2567	   (1) SHOULD appear only if the server supports dynamic DNS.

2569	   (2) MUST NOT if binding-status is ABANDONED.

2571	              Table 6.4-1: Options used in a BNDACK message

2573	6.5.  Bulking for BNDUPD and BNDACK messages
2574	   DISCUSSION:

2576	      Bulking is planned for this protocol, but it hasn't been specified
2577	      in this revision of the draft.  Once the draft settles down, we
2578	      will specify the bulking approach in detail.

2580	6.6.  UPDREQ message format

2582	   The update request (UPDREQ) message is used by one server to request
2583	   that its partner send it all binding database information that it has
2584	   not already seen.

2586	   The message type for the UPDREQ message is 9.

2588	   The xid in a UPDREQ message MUST be unique among messages transmitted
2589	   from this failover endpoint during the life of this connection.

2591	   There are no options that MUST appear in an UPDREQ message.  Any
2592	   option MAY appear, though very few will likely be useful.

2594	6.7.  UPDREQALL message format

2596	   The update request all (UPDREQALL) message is used by one server to
2597	   request that all binding database information be sent in order to
2598	   recover from a total loss of its binding database by the requesting
2599	   server.

2601	   The message type for the UPDREQALL message is 7.

2603	   The xid in a UPDREQALL message MUST be unique among messages
2604	   transmitted from this failover endpoint during the life of this con-
2605	   nection.

2607	   There are no options that MUST appear in an UPDREQALL message.  Any
2608	   option MAY appear, though very few will likely be useful.

2610	6.8.  UPDDONE message format

2612	   The update done (UPDDONE) message is used by the responding server to
2613	   indicate that all requested updates have been sent by the responding
2614	   server as BNDUPD messages and responded to by the requesting server
2615	   using BNDACK messages.  While a BNDACK message MUST have been
2616	   received for each BNDUPD message prior to the transmission of the
2617	   UPDDONE message, this doesn't necessarily mean that all of the BNDUPD
2618	   messages were accepted, only that all of them were responded to with
2619	   a BNDACK message.  Thus, a NAK (comprised of a BNDACK message con-
2620	   taining a reject-reason option) could be used to reject a BNDUPD, but
2621	   for the purposes of the UPDDONE message, such a NAK would count as a
2622	   response to the associated BNDUPD message, and would not block the
2623	   eventual transmission of the UPDDONE message.

2625	   The message type for the UPDDONE message is 8.

2627	   The xid in an UPDDONE message MUST be identical to the xid in the
2628	   UPDREQ or UPDREQALL message that initiated the update process.

2630	   There are no options that MUST appear in an UPDDONE message.  Any
2631	   option MAY appear, though very few will likely be useful.
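
The bookkeeping implied above, in which every BNDUPD must be answered
by a BNDACK (accepting or rejecting) before UPDDONE is sent, can be
sketched in Python as follows.  The class and method names are
hypothetical, and the sketch assumes the responding server knows when
it has queued its final BNDUPD.

```python
class UpdateSession:
    """Tracks one UPDREQ or UPDREQALL exchange on the responding server."""

    def __init__(self, request_xid: int) -> None:
        self.request_xid = request_xid  # xid of the UPDREQ or UPDREQALL
        self.unacked = set()            # xids of BNDUPDs awaiting a BNDACK
        self.all_sent = False

    def sent_bndupd(self, xid: int) -> None:
        self.unacked.add(xid)

    def received_bndack(self, xid: int, rejected: bool = False) -> None:
        # A rejection (BNDACK with reject-reason) still counts as a response.
        self.unacked.discard(xid)

    def finished_sending(self) -> None:
        self.all_sent = True

    def upddone_xid(self):
        """Return the xid to use in UPDDONE, or None if it is too early."""
        if self.all_sent and not self.unacked:
            return self.request_xid
        return None
```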

2633	6.9.  POOLREQ message format

2635	   The pool request (POOLREQ) is used by the secondary server to request
2636	   an allocation of IP addresses from the primary server.

2638	   The message type for the POOLREQ message is 1.

2640	   The xid in a POOLREQ message MUST be unique among messages transmit-
2641	   ted from this failover endpoint during the life of this connection.

2643	   There are no options that MUST appear in a POOLREQ message.  Any
2644	   option MAY appear.

2646	6.10.  POOLRESP message format

2648	   The pool response (POOLRESP) is used by the primary server to inform
2649	   the secondary server how many IP addresses were allocated to the
2650	   secondary server as the result of the pool request.

2652	   The message type for the POOLRESP message is 2.

2654	   The xid in the POOLRESP message MUST be identical to the xid in the
2655	   POOLREQ message for which this POOLRESP is a response.

2657	   The following table shows the options that MUST appear in a POOLRESP
2658	   message:

2660	           Option
2661	           ------
2662	           addresses-transferred       MUST

2664	              Table 6.10-1: Options used in a POOLRESP message

2666	6.11.  CONNECT message format

2668	   The connect (CONNECT) message is used by the primary server to estab-
2669	   lish a high level connection with the other server, and to transmit
2670	   several important configuration data items between the servers.

2672	   The message type for the CONNECT message is 5.

2674	   The xid in a CONNECT message MUST be unique among messages transmit-
2675	   ted from this failover endpoint during the life of this connection.

2677	   The CONNECT message MUST be the first message sent down a newly esta-
2678	   blished connection.

2680	   The following table summarizes the options that are associated with
2681	   the CONNECT message:

2683	   Option
2684	   ------
2685	   sending-server-IP-address   MUST
2686	   max-unacked-bndupd          MUST
2687	   receive-timer               MUST
2688	   vendor-class-identifier     MUST
2689	   protocol-version            MUST
2690	   TLS-request                 MUST
2691	   MCLT                        MUST
2692	   hash-bucket-assignment      MUST
2693	   all others                  MAY

2695	              Table 6.11-1: Options used in a CONNECT message

2697	6.12.  CONNECTACK message format

2699	   The connect response (CONNECTACK) message is used by a secondary
2700	   server to respond to the receipt of a CONNECT message from the pri-
2701	   mary server.

2703	   The message type for the CONNECTACK message is 6.

2705	   The xid in the CONNECTACK message MUST be identical to the xid in the
2706	   CONNECT message for which this CONNECTACK is a response.

2708	   The following table summarizes the options associated with the CON-
2709	   NECTACK message:

2711	   Option
2712	   ------
2713	   sending-server-IP-address   MUST
2714	   max-unacked-bndupd          MUST
2715	   receive-timer               MUST
2716	   vendor-class-identifier     MUST
2717	   protocol-version            MUST
2718	   TLS-reply                   MUST
2719	   reject-reason               MAY(1)
2720	   message                     MAY
2721	   MCLT                        MUST NOT
2722	   hash-bucket-assignment      MUST NOT

2724	   (1) Indicates a rejection of the CONNECT message.

2726	              Table 6.12-1: Options used in a CONNECTACK message

2728	6.13.  STATE message format

2730	   The state (STATE) message is used by either server to communicate the
2731	   current state of the failover endpoint with the other server.  It
2732	   MUST be sent immediately after connection negotiation completes with
2733	   the other server, and it MUST be sent whenever the server's state
2734	   changes.

2736	   The message type for the STATE message is 10.

2738	   The xid in a STATE message MUST be unique among messages transmitted
2739	   from this failover endpoint during the life of this connection.

2741	   The following table shows the options that MUST appear in a STATE
2742	   message:

2744	           Option
2745	           ------
2746	           server-state                MUST
2747	           server-flags                MUST
2748	           start-time-of-state         MUST

2750	                      Table 6.13-1: Options used in a STATE message

2752	6.14.  CONTACT message format

2754	   The contact (CONTACT) message is used by either server to verify that
2755	   the connection is operational to the other server.

2757	   The message type for the CONTACT message is 11.

2759	   The xid in a CONTACT message MUST be unique among messages transmit-
2760	   ted from this failover endpoint during the life of this connection.

2762	   There are no options that MUST be used in a CONTACT message.

2764	6.15.  DISCONNECT message format

2766	   The disconnect (DISCONNECT) message is used by either server just
2767	   prior to closing a connection.

2769	   The message type for the DISCONNECT message is 12.

2771	   The xid in a DISCONNECT message MUST be unique among messages
2772	   transmitted from this failover endpoint during the life of this con-
2773	   nection.

2775	   The DISCONNECT message MUST be the last message sent down a connec-
2776	   tion before it is closed.

2778	   The following table summarizes the options that are associated with
2779	   the DISCONNECT message:

2781	   Option
2782	   ------
2783	   reject-reason               MUST
2784	   message                     SHOULD

2786	              Table 6.15-1: Options used in a DISCONNECT message

2788	7.  Protocol Messages

2790	   This section contains the detailed definition of the protocol mes-
2791	   sages, including the information to include when sending the message,
2792	   as well as the actions to take upon receiving the message.

2794	7.1.  BNDUPD message

2796	   The binding update (BNDUPD) message is used to send the binding data-
2797	   base changes to the partner server, and the partner server responds
2798	   with a binding acknowledgement (BNDACK) message when it has success-
2799	   fully committed those changes to its own stable storage.

2801	   The rest of the failover protocol exists to determine whether the
2802	   partner server is able to communicate or not, and to enable the
2803	   partners to exchange BNDUPD/BNDACK messages in order to keep their
2804	   binding databases in stable storage synchronized.

2806	7.1.1.  Sending the BNDUPD message

2808	   A BNDUPD message SHOULD be generated whenever any binding changes.  A
2809	   change might be in the binding-status, the lease-expiration-time, or
2810	   even just the last-transaction-time.  In general, any time a DHCP
2811	   client sends in a packet that results in a DHCP server writing to its
2812	   stable storage, a BNDUPD message SHOULD be generated.

2814	   The BNDUPD (and BNDACK) messages refer to the binding-status of the
2815	   IP address, and this protocol defines a series of binding-statuses,
2816	   discussed in more detail below.  Some servers may not support all of
2817	   these binding-statuses, and so in those cases they will not be sent,
2818	   and upon receipt a reasonable interpretation should be made.

2820	   All BNDUPD messages MUST contain the assigned-IP-address option,
2821	   which carries the IP address about which the BNDUPD message is being
2822	   sent.

2824	   All BNDUPD messages MUST contain the binding-status option, and it
2825	   will have one of the values in the following list.  This list
2826	   discusses the meanings of the various binding-statuses and the infor-
2827	   mation that should go into the BNDUPD message because of them.

2829	      o ACTIVE

2831	        Indicates that the IP address is currently leased to a DHCP
2832	        client.

2834	        client-hardware-address

2836	        The client-hardware-address option MUST appear, and be set from
2837	        the htype and chaddr of the DHCP client to which this IP address
2838	        is leased.

2840	        client-identifier

2842	        If the DHCP client to which this IP address is leased used a
2843	        client-identifier option to identify itself, then the client-
2844	        identifier MUST appear in the BNDUPD message, else it MUST NOT
2845	        appear.

2847	        lease-expiration-time

2849	        The lease-expiration-time option MUST appear, and be set to the
2850	        expiration time most recently ACKed to the DHCP client.  Note
2851	        that the time ACKed to a DHCP client is a lease duration in
2852	        seconds, while the lease-expiration-time option in a BNDUPD mes-
2853	        sage is an absolute time value.

2855	        potential-expiration-time

2857	        The potential-expiration-time option MUST appear, and be set to
2858	        a value beyond that of the lease-expiration-time.  This is the
2859	        value that is ACKed by the BNDACK message.  A server sending a
2860	        BNDUPD message MUST be able to recover the potential-
2861	        expiration-time sent in every BNDUPD, not just those that
2862	        receive a corresponding BNDACK, in order to be able to protect
2863	        against possible duplicate allocation of IP addresses after
2864	        transitioning to PARTNER-DOWN state. See section 5.2.1 for
2865	        details as to why the potential-expiration-time exists and
2866	        guidelines for how to decide the value.

2868	      o EXPIRED

2870	        A binding-status of EXPIRED is used when a client's binding on
2871	        an IP address has expired and the server does not wish to imple-
2872	        ment an expired-grace period.  When the partner server ACKs the
2873	        BNDUPD of an EXPIRED IP address, the server sets its internal
2874	        state to FREE.  The address is then available for allocation to
2875	        any client of the primary server.

2877	        client-hardware-address

2879	        There SHOULD be a DHCP client associated with the IP address
2880	        whose binding has expired.  If there is, then the client-
2881	        hardware-address option MUST appear, and be set from the htype
2882	        and chaddr of the DHCP client to which this IP address was
2883	        leased.

2885	        client-identifier

2887	        There SHOULD be a DHCP client associated with the IP address
2888	        whose binding has expired.  If there is, then if the DHCP client
2889	        to which this IP address was leased used a client-identifier
2890	        option to identify itself, then the client-identifier MUST
2891	        appear in the BNDUPD message, else it MUST NOT appear.

2893	      o RELEASED
2894	        A binding-status of RELEASED is used when a DHCP client sends in
2895	        a DHCPRELEASE message and the server does not wish to implement
2896	        a released-grace period.  When the partner server ACKs the
2897	        BNDUPD of a RELEASED IP address, the server sets its internal
2898	        state to FREE, and it is available for allocation by the primary
2899	        server to any DHCP client.

2901	        client-hardware-address

2903	        There SHOULD be a DHCP client associated with the IP address
2904	        whose binding has been released.  If there is, then the client-
2905	        hardware-address option MUST appear, and be set from the htype
2906	        and chaddr of the DHCP client which released this IP address.

2908	        client-identifier

2910	        There SHOULD be a DHCP client associated with the IP address
2911	        whose binding has been released.  If there is, then if the DHCP
2912	        client which released this IP address used a client-identifier
2913	        option to identify itself, then the client-identifier MUST
2914	        appear in the BNDUPD message, else it MUST NOT appear.

2916	      o FREE

2918	        A binding-status of FREE is used when a DHCP server needs to
2919	        communicate that an IP address is available for allocation to
2920	        another server, but it was not just released, expired, or reset
2921	        by a network administrator.  When the partner server ACKs the
2922	        BNDUPD of a FREE IP address, the server sets its internal state
2923	        such that it is available for allocation to any DHCP client.

2925	        client-hardware-address

2927	        There MAY be a DHCP client associated with the IP address whose
2928	        binding is now desired to be FREE.  If there is, then the
2929	        client-hardware-address option MUST appear, and be set from the
2930	        htype and chaddr of the DHCP client which released this IP
2931	        address.

2933	        client-identifier

2935	        There MAY be a DHCP client associated with the IP address whose
2936	        binding is now desired to be FREE.  If there is, then if the
2937	        DHCP client which released this IP address used a client-
2938	        identifier option to identify itself, then the client-identifier
2939	        MUST appear in the BNDUPD message, else it MUST NOT appear.

2941	        client-hardware-address
2942	        There MAY be a DHCP client associated with the IP address whose
2943	        binding has now expired.  If there is, then the client-
2944	        hardware-address option MUST appear, and be set from the htype
2945	        and chaddr of the DHCP client which released this IP address.

2947	        client-identifier

2949	        There MAY be a DHCP client associated with the IP address whose
2950	        binding has now expired.  If there is, then if the DHCP client
2951	        which most recently leased this IP address used a client-
2952	        identifier option to identify itself, then the client-identifier
2953	        MUST appear in the BNDUPD message, else it MUST NOT appear.

2955	        grace-expiration-time

2957	        The grace-expiration-time option MUST appear, and is the length
2958	        of time that this server will wait before trying to make the IP
2959	        address available after the lease has expired for this IP
2960	        address.

2962	        client-hardware-address

2964	        There MAY be a DHCP client associated with the IP address whose
2965	        binding has now been released by sending a DHCPRELEASE.  If
2966	        there is, then the client-hardware-address option MUST appear,
2967	        and be set from the htype and chaddr of the DHCP client which
2968	        released this IP address.

2970	        client-identifier

2972	        There MAY be a DHCP client associated with the IP address whose
2973	        binding has been released.  If there is, then if the DHCP client
2974	        which most recently leased this IP address used a client-
2975	        identifier option to identify itself, then the client-identifier
2976	        MUST appear in the BNDUPD message, else it MUST NOT appear.

2978	        client-hardware-address

2980	        There MAY be a DHCP client associated with the IP address whose
2981	        binding is now desired to be FREE.  If there is, then the
2982	        client-hardware-address option MUST appear, and be set from the
2983	        htype and chaddr of the DHCP client which released this IP
2984	        address.

2986	        client-identifier

2988	        There MAY be a DHCP client associated with the IP address whose
2989	        binding is now desired to be FREE.  If there is, then if the
2990	        DHCP client which released this IP address used a client-
2991	        identifier option to identify itself, then the client-identifier
2992	        MUST appear in the BNDUPD message, else it MUST NOT appear.

2994	        grace-expiration-time

2996	        The grace-expiration-time MUST appear, and is the length of time
2997	        that this server will wait before trying to make the IP address
2998	        available after the lease was released for this IP address

3000	      o ABANDONED

3002	        An ABANDONED IP address is one that has been considered unusable
3003	        by the DHCP subsystem.  An IP address for which a valid PING
3004	        response was received SHOULD be set to ABANDONED.

3006	        client-hardware-address

3008	        There SHOULD NOT be a DHCP client associated with an ABANDONED
3009	        IP address.  The client-hardware-address option MUST NOT appear
3010	        in the BNDUPD message.

3012	        client-identifier

3014	        There SHOULD NOT be a DHCP client associated with the IP address
3015	        whose binding has now been ABANDONED.  The client-identifier
3016	        option MUST NOT appear in the BNDUPD message.

3018	      o RESET

3020	        The RESET value of the binding-status is used to indicate that
3021	        this IP address was made available by operator command.

3023	      o BACKUP

3025	        The BACKUP value of binding-status indicates that this IP
3026	        address belongs to the secondary server, and can be allocated by
3027	        that server to a DHCP client at any time.

3029	        client-hardware-address

3031	        There MAY be a DHCP client associated with a BACKUP IP address.
3032	        If there is, the client-hardware-address option MUST appear, and
3033	        be set from the htype and chaddr of the DHCP client to which
3034	        this IP address was most recently associated.

3036	        client-identifier
3037	        There MAY be a DHCP client associated with this IP address.  If
3038	        the DHCP client to which this IP address is leased used a
3039	        client-identifier option to identify itself, then the client-
3040	        identifier MUST appear in the BNDUPD message, else it MUST NOT
3041	        appear.

3043	   The following option information is generic to all BNDUPD messages,
3044	   regardless of the value of the binding-status.

3046	   o start-time-of-state

3048	     The start-time-of-state SHOULD appear.  It is set to the time at
3049	     which this IP address first took on the state that corresponds to
3050	     the current value of binding-status.

3052	   o last-transaction-time

3054	     The last-transaction-time value SHOULD appear.  This is the time at
3055	     which this DHCP server last received a packet from the DHCP client
3056	     referenced by the client-identifier or client-hardware-address that
3057	     was associated with the IP address referenced by the assigned-IP-
3058	     address.

3060	   o DDNS

3062	     If the DHCP server is performing dynamic DNS operations on behalf
3063	     of the DHCP client represented by the client-identifier or client-
3064	     hardware-address, then it should include a DDNS option containing
3065	     the host name, domain name, and status of any dynamic DNS opera-
3066	     tions enabled.

3068	   o client-request-options

3070	     If the BNDUPD was triggered by a request from a DHCP client (typi-
3071	     cally those with binding-status of ACTIVE and RELEASED), then the
3072	     server SHOULD include options of interest to a failover partner
3073	     from the client's request packet in the client-request-options for
3074	     transmission to its partner.

3076	     A server sending a BNDUPD need not remember the "interesting"
3077	     options or the information that would appear in an "interesting"
3078	     option for transmission at a time when the BNDUPD is not closely
3079	     associated with a DHCP client request.

3081	     A server SHOULD send the following "interesting" options.  It MAY
3082	     send any DHCP client options.  As new options are defined, the RFC
3083	     defining these options SHOULD include information that they are
3084	     "interesting to failover servers" if they should be sent as part of
3085	     a BNDUPD.

3087	         option          option
3088	         number          name
3089	         -----------------------------------------

3091	         12              host-name
3092	         81              client-FQDN [DDNS]
3093	         82              relay-agent-information [AGENTINFO]
3094	         TBD             user-class [USERCLASS]
3095	         60              vendor-class-identifier

3097	           Table 7.1.1-1: Options which SHOULD be sent in
3098	           the client-request-options option in a BNDUPD message.

3100	   o client-reply-options

3102	     If the BNDUPD was triggered by a request from a DHCP client (typi-
3103	     cally those with binding-status of ACTIVE and RELEASED), then the
3104	     server SHOULD include options of interest to a failover partner
3105	     from the server's DHCP reply packet in the client-reply-options for
3106	     transmission to its partner.

3108	     A server sending a BNDUPD need not remember the "interesting"
3109	     options or the information that would appear in an "interesting"
3110	     option for transmission at a time when the BNDUPD is not closely
3111	     associated with a DHCP client request.

3113	     A server SHOULD send the following "interesting" options.  It MAY
3114	     send any DHCP client options.  As new options are defined, the RFC
3115	     defining these options SHOULD include information that they are
3116	     "interesting to failover servers" if they should be sent as part of
3117	     a BNDUPD.

3119	         option          option
3120	         number          name
3121	         -----------------------------------------

3123	         58              renewal-time
3124	         59              rebinding-time

3126	           Table 7.1.1-2: Options which SHOULD be sent in
3127	           the client-reply-options option in a BNDUPD message.

3129	   The BNDUPD message SHOULD be sent as soon as possible from the time
3130	   that the DHCP client received a response and the lease bindings data-
3131	   base is written on stable storage.

3133	7.1.2.  Receiving the BNDUPD message

3135	   When a server receives a BNDUPD message, it needs to decide how to
3136	   process the message and whether the message represents a conflict
3137	   of any sort. The conflict resolution process SHOULD be used on the
3138	   receipt of every BNDUPD message, not just those that are received
3139	   while in POTENTIAL-CONFLICT state, in order to increase the robust-
3140	   ness of the protocol.

3142	   There are three sorts of conflicts:

3144	      o Two clients one IP address conflict

3146	        This is the duplicate IP address allocation conflict. There are
3147	        two different clients each allocated the same address.  There
3148	        cannot be a client conflict unless there is a client specified
3149	        in the BNDUPD message.  See section 5.10.1 for how to resolve
3150	        this conflict.

3152	      o Two IP addresses one client conflict

3154	        This conflict exists when a client on one server is associated
3155	        with one IP address, and on the other server with a different
3156	        IP address in the same or a related subnet. This does not refer
3157	        to the case where a single client has addresses in multiple dif-
3158	        ferent subnets or administrative domains, but rather the case
3159	        where on the same subnet the client has a lease on one IP
3160	        address in one server and on a different IP address on the other
3161	        server.

3163	        This conflict may or may not be a problem for a given DHCP
3164	        server implementation.  In the event that a DHCP server requires
3165	        that a DHCP client have only one outstanding lease for an IP
3166	        address on one subnet, this conflict should be resolved by
3167	        accepting the update which has the latest client-last-
3168	        transaction-time.

3170	      o binding-status conflict

3172	        This is the normal conflict, where one server is updating the
3173	        other with newer information.  See section 5.10.1 for details of
3174	        how to resolve these conflicts.

3176	      See section 5.10.1 for details of how to process binding-status
3177	      changes in BNDUPD messages.
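
The three conflict types listed above can be sketched as a small
classifier.  This is an illustration only, not part of the protocol:
the Binding record, its field names, and the classify() helper are
hypothetical, and "same or a related subnet" is simplified to an exact
subnet match.  Resolving the conflicts (section 5.10.1) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Binding:
    """Hypothetical record for one IP address (names are assumptions)."""
    ip: str
    subnet: str
    client: Optional[str]  # client-identifier or hardware address, if any
    status: str            # binding-status, e.g. "ACTIVE", "FREE"


def classify(local_by_ip: Dict[str, Binding], update: Binding) -> Set[str]:
    """Return the kinds of conflict an incoming BNDUPD would raise."""
    kinds: Set[str] = set()
    local = local_by_ip.get(update.ip)
    if local is not None:
        # A client conflict needs a client on both sides.
        if update.client and local.client and update.client != local.client:
            kinds.add("two-clients-one-ip")
        if update.status != local.status:
            kinds.add("binding-status")
    if update.client:
        # Same client already bound to a different address on this subnet.
        for other in local_by_ip.values():
            if (other.client == update.client
                    and other.ip != update.ip
                    and other.subnet == update.subnet):
                kinds.add("two-ips-one-client")
                break
    return kinds
```

A receiver could run such a check on every BNDUPD, in line with the
recommendation that conflict resolution not be limited to the
POTENTIAL-CONFLICT state.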

3179	7.1.3.  Accepting the BNDUPD message

3181	   When accepting a BNDUPD message, the information contained in the
3182	   client-request-options and client-reply-options SHOULD be examined
3183	   for any information of interest to this server.  For instance, a
3184	   server which wished to detect changes in client specified host names
3185	   might want to examine and save information from the host-name or
3186	   client-FQDN options.  Servers which expect to utilize information
3187	   from the relay-agent-information option would want to store this
3188	   information.

3190	7.1.4.  Time values related to the BNDUPD message

3192	   There are three time values that may be sent in a BNDUPD message.

3194	      o lease-expiration-time

3196	        The time that the server gave to the client, i.e., the time that
3197	        the server believes that the client's lease will expire.

3199	      o potential-expiration-time

3201	        The time that the server wants to be sure its partner waits
3202	        (added to the MCLT) before assuming that this lease has expired.
3203	        Typically some time beyond the desired client lease time.

3205	      o client-last-transaction-time

3207	        The time that the client last interacted with this server.

3209	   As discussed in section 5.2, each server knows what its partner has
3210	   ACKed with regard to potential-expiration-time.  In addition, each
3211	   server needs to remember what it has told its partner as the
3212	   potential-expiration-time.  Moreover, each server must remember what
3213	   it has acked to the *other* server as the most recent potential-
3214	   expiration-time from that server.

3216	   Remember that each server sends a potential-expiration-time and
3217	   receives an ACK for that as well as receiving a potential-
3218	   expiration-time and needing to remember what it has acked for that.

3220	   While they don't have to be named in any particular way, the times
3221	   that a server needs to remember for every IP address in order to
3222	   implement the failover protocol are:

3224	      o lease-expiration-time
3225	        The time that this server gave to the DHCP client.  A DHCP
3226	        server needs to remember this time already, just to be a DHCP
3227	        server.

3229	      o sent-potential-expiration-time

3231	        The latest time sent to the partner for a potential-expiration-
3232	        time.

3234	      o acked-potential-expiration-time

3236	        The latest time that the partner has acked for a potential-
3237	        expiration-time.  Typically the same as sent-potential-
3238	        expiration-time if there is not a BNDUPD outstanding.

3240	      o received-potential-expiration-time

3242	        The latest time that this server has ever received as a
3243	        potential-expiration-time from its partner in a BNDUPD that this
3244	        server ACKed.

3246	   So, a server has to remember two additional times concerning BNDUPD
3247	   messages that it has initiated, and one additional time concerning
3248	   BNDUPD messages that it has received.  How are these times used?

3250	   First, let's look at the time that a DHCP server can offer to a DHCP
3251	   client.  A server can offer to a DHCP client a time that is no
3252	   longer than the MCLT beyond the max( received-potential-expiration-
3253	   time, acked-potential-expiration-time).  One might think that the
3254	   server should be able to offer only the MCLT beyond the acked-
3255	   potential-expiration-time, and while that is certainly simple and
3256	   easy to understand, it has negative consequences in actual operation.

3258	   To illustrate this, in the simple case where the primary updates the
3259	   secondary for a while and then fails, if the secondary can then renew
3260	   the client for only the MCLT beyond the acked-potential-expiration-
3261	   time, then the secondary will only be able to renew the client for
3262	   the MCLT, because the secondary has never sent a BNDUPD packet to the
3263	   primary concerning this IP address and client, and so its acked-
3264	   potential-expiration-time is zero.

3266	   However, if we allow the secondary to renew the client with the MCLT
3267	   beyond the max( received-potential-expiration-time, acked-potential-
3268	   expiration-time), then the secondary can usually renew the client for
3269	   the full lease period, at least for the first renew it sees from the
3270	   client, since the received-potential-expiration-time is generally
3271	   longer than the client's desired lease interval.  The difference in
3272	   renew times could make a big difference in server load on the
3273	   secondary in this case.

3275	   What are the consequences of allowing a server to offer a DHCP client
3276	   a lease term of the MCLT beyond the max( received-potential-
3277	   expiration-time, acked-potential-expiration-time)?  The consequences
3278	   appear whenever a server enters PARTNER-DOWN state, and affect how
3279	   long that server has to wait before reallocating expired leases.
3280	   With this approach, when a server goes into PARTNER-DOWN state, it
3281	   must wait the MCLT beyond the max( lease-expiration-time, sent-
3282	   potential-expiration-time, acked-potential-expiration-time,
3283	   received-potential-expiration-time ) for each IP address before it
3284	   can reallocate that IP address to another DHCP client.   One might
3285	   normally think that it needed to wait only the MCLT beyond the max(
3286	   lease-expiration-time, received-potential-expiration-time ), i.e.,
3287	   beyond what it has told the client and what it has explicitly acked
3288	   to the other server.  But with the optimization discussed above --
3289	   where either server can offer the DHCP client a lease term of the
3290	   MCLT beyond the max( received-potential-expiration-time, acked-
3291	   potential-expiration-time) -- the additional times sent-
3292	   potential-expiration-time and acked-potential-expiration-time must be
3293	   added into the expression, since the partner could have used those
3294	   times as part of its own lease time calculation.

3296	   Thus this optimization may require a longer waiting time when enter-
3297	   ing PARTNER-DOWN state, but will generally allow servers to operate
3298	   considerably more effectively when running in COMMUNICATIONS-
3299	   INTERRUPTED state.
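
The two time calculations in this section can be sketched as follows.
This is an illustration under stated assumptions, not normative text:
the LeaseTimes record and both function names are hypothetical, times
are integer seconds, and a value of zero means "never set".  The use of
"now" as a floor is an interpretation of the example above, in which a
secondary whose acked-potential-expiration-time is zero can renew for
only the MCLT.

```python
from dataclasses import dataclass


@dataclass
class LeaseTimes:
    """Times remembered per IP address (0 means never set)."""
    lease_expiration: int = 0
    sent_potential_expiration: int = 0
    acked_potential_expiration: int = 0
    received_potential_expiration: int = 0


def latest_offerable_expiration(t: LeaseTimes, mclt: int, now: int) -> int:
    """Latest lease expiration this server may offer a DHCP client."""
    base = max(t.received_potential_expiration,
               t.acked_potential_expiration)
    # Assumption: a base time in the past is treated as "now".
    return max(base, now) + mclt


def partner_down_reallocation_time(t: LeaseTimes, mclt: int) -> int:
    """Earliest time the address may be reallocated in PARTNER-DOWN."""
    return mclt + max(t.lease_expiration,
                      t.sent_potential_expiration,
                      t.acked_potential_expiration,
                      t.received_potential_expiration)
```

For example, with an MCLT of 3600, a received-potential-expiration-time
of 5000, and the current time at 900, the first function yields 8600,
whereas a server with no recorded times could offer only until 4500.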

3301	7.2.  BNDACK message

3303	   Every BNDUPD message that is received by a server MUST be responded
3304	   to with a corresponding BNDACK message.  The receiving server SHOULD
3305	   respond quickly to every BNDUPD message but it MAY choose to respond
3306	   preferentially to DHCP client requests instead of BNDUPD messages,
3307	   since there is no absolute time period within which a BNDACK must be
3308	   sent in response to a BNDUPD message, and DHCP clients frequently do
3309	   have time constraints that must be met.

3311	   A BNDACK message can only be sent in response to a BNDUPD message
3312	   using the same TCP connection from which the BNDUPD message was
3313	   received, since the XIDs in BNDUPD messages are guaranteed unique
3314	   only during the life of a single TCP connection.  When a connection
3315	   to a partner server goes down, a server with unprocessed BNDUPD mes-
3316	   sages MAY simply drop all of those messages, since it can be sure
3317	   that the partner will retransmit them when they are next in communi-
3318	   cations.  A server with unprocessed BNDUPD messages when a TCP con-
3319	   nection goes down MAY instead choose to process those BNDUPD mes-
3320	   sages, but it MUST NOT send any BNDACK messages in response (again
3321	   because of the issues surrounding XID uniqueness).

3323	7.2.1.  Sending the BNDACK message

3325	   The BNDACK message MUST contain the same xid as the corresponding
3326	   BNDUPD message.

3328	   All of the options which appear in the BNDUPD message MUST be
3329	   included in the BNDACK message.  The values in the options MAY be
3330	   updated to reflect current information on the server sending the
3331	   BNDACK.   Note that update of this information may be used for infor-
3332	   mational purposes, but MUST NOT be assumed to necessarily be recorded
3333	   in the stable storage of the server who sent the BNDUPD message
3334	   because there is no corresponding ACK of the BNDACK message.  Any
3335	   information that SHOULD be recorded in the partner server's stable
3336	   storage MUST be transmitted in a subsequent BNDUPD.

3338	   If the server is accepting the BNDUPD, the BNDACK message includes
3339	   only those options that appeared in the BNDUPD message. If the server
3340	   is rejecting the BNDUPD, the additional option reject-reason MUST
3341	   appear in the BNDACK message, and the message option SHOULD appear in
3342	   this case containing a human-readable error message describing in
3343	   some detail the reason for the rejection of the BNDUPD message.

3345	   If the server rejects the BNDUPD message with a BNDACK and a reject-
3346	   storage.  It typically uses the receipt of an UPDDONE message to
3347	   binding information that the other server should know.  A server
3348	   which is rejecting a BNDUPD may initiate a BNDUPD of its own in order
3349	   to update its partner with what it believes is better binding
3350	   information, but it MUST ensure through some means that it will not
3351	   end up in a situation where each server is sending BNDUPD messages as
3352	   fast as possible because they can't agree on which server has better
3353	   binding data.  Placing a reasonable delay on the initiation of a
3354	   BNDUPD message after sending a BNDACK with a reject-reason would be
3355	   one way to ensure this situation doesn't occur.
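
The construction rules of this section can be sketched as follows.
The dictionary layout of a message and the build_bndack() helper are
hypothetical, used only for illustration.  The numeric reject-reason
passed by a caller is a placeholder, since the actual codes are defined
elsewhere in this document.

```python
def build_bndack(bndupd, reject_reason=None, message=None):
    """Build a BNDACK for a received BNDUPD (hypothetical dict layout)."""
    # Every option from the BNDUPD is echoed; copy so the original
    # message is left untouched.
    options = dict(bndupd["options"])
    if reject_reason is not None:
        # Rejection: reject-reason MUST appear, message SHOULD appear.
        options["reject-reason"] = reject_reason
        if message is not None:
            options["message"] = message
    # The xid MUST match the corresponding BNDUPD.
    return {"type": "BNDACK", "xid": bndupd["xid"], "options": options}
```

An accepting server calls build_bndack(bndupd) with no further
arguments, so the BNDACK carries only the options that appeared in the
BNDUPD.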

3357	7.2.2.  Receiving the BNDACK message

3359	   When a server receives a BNDACK message, if it doesn't contain a
3360	   reject-reason option that means that the BNDUPD message was accepted,
3361	   and the server which sent the BNDUPD MUST update its stable storage
3362	   with the potential-expiration-time value sent in the BNDUPD message
3363	   and returned in the BNDACK message.  Other values sent in the BNDUPD
3364	   message MAY be used as desired.

3366	7.3.  UPDREQ message

3368	   The update request (UPDREQ) message is used by one server to request
3369	   that its partner send it all of the binding database information that
3370	   it has not already seen.   Since each server is required to keep
3371	   track at all times of the binding information the other server has
3372	   received and ACKed, one server can request transmission of all un-
3373	   ACKed binding database information held by the other server by using
3374	   the UPDREQ message.

3376	   The UPDREQ message is used whenever the sending server cannot proceed
3377	   before it has processed all previously un-ACKed binding update infor-
3378	   mation, since the UPDREQ message should yield a corresponding UPDDONE
3379	   message.  The UPDDONE message is not sent until the server that sent
3380	   the UPDREQ message has responded to all of the BNDUPD messages gen-
3381	   erated by the UPDREQ message with BNDACK messages. Thus, the sender
3382	   of the UPDREQ message can be sure upon receipt of an UPDDONE message
3383	   that it has received and committed to stable storage all outstanding
3384	   binding database updates.

3386	   See section 9, Protocol state transitions, for the details of when
3387	   the UPDREQ message is sent.

3389	7.3.1.  Sending the UPDREQ message

3391	   There are no options for the UPDREQ message.

3393	   The UPDREQ message is sent with a unique xid.

3395	7.3.2.  Receiving the UPDREQ message

3397	   A server receiving an UPDREQ message MUST send all binding database
3398	   changes that have not yet been ACKed by the sending server.   These
3399	   changes are sent as undistinguished BNDUPD messages.

3401	   However, the server which received and is processing the UPDREQ mes-
3402	   sage MUST track the BNDACK messages that correspond to the BNDUPD
3403	   messages triggered by the UPDREQ message and, when they are all
3404	   received, the server MUST send an UPDDONE message.

3406	   The server processing the UPDREQ message and sending BNDUPD messages
3407	   to its partner SHOULD only track the BNDUPD and BNDACK message pairs
3408	   for unACKed binding database changes that were present upon the
3409	   receipt of the UPDREQ message.  A server which has received an UPDREQ
3410	   message SHOULD send BNDUPD messages for binding database changes that
3411	   occur after receipt of the UPDREQ message, but it SHOULD NOT include
3412	   those additional BNDUPD messages and their corresponding BNDACK mes-
3413	   sages in the accounting necessary to consider the UPDREQ complete and
3414	   subsequently send the UPDDONE message.  If some additional binding
3415	   database changes end up becoming part of the set of BNDUPD messages
3416	   considered as part of the UPDREQ (due to whatever algorithm the
3417	   server uses to scan its bindings database for unacked changes) it
3418	   will probably not cause any difficulty, but a server MUST NOT attempt
3419	   to include all such later BNDUPD messages in the accounting for the
3420	   UPDREQ in order to be able to transmit an UPDDONE message.

3422	   When queuing up the BNDUPD messages for transmission to the sender of
3423	   the UPDREQ message, the server processing the UPDREQ message MUST
3424	   honor the value returned in the max-unacked-bndupd option in the CON-
3425	   NECT or CONNECTACK message that set up the connection with the send-
3426	   ing server.  It MUST NOT send more BNDUPD messages without receiving
3427	   corresponding BNDACKs than the value returned in max-unacked-bndupd.
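
The accounting described in this section can be sketched as follows.
The UpdreqTracker class and its method names are hypothetical.  As a
simplification, the max-unacked-bndupd window is applied only to the
BNDUPD messages tracked for this UPDREQ, whereas a real server would
count every un-ACKed BNDUPD on the connection.  Later binding changes
are simply never added to the snapshot.

```python
from collections import deque


class UpdreqTracker:
    """Tracks the BNDUPD/BNDACK pairs owed for one UPDREQ."""

    def __init__(self, updreq_xid, unacked_ips, max_unacked_bndupd):
        self.updreq_xid = updreq_xid
        self.pending = deque(unacked_ips)  # snapshot taken at receipt
        self.in_flight = {}                # BNDUPD xid -> IP address
        self.window = max_unacked_bndupd
        self._next_xid = 1                 # hypothetical xid allocator
        self.done = False

    def sendable(self):
        """Return (xid, ip) pairs that may be sent without exceeding
        the partner's max-unacked-bndupd value."""
        out = []
        while self.pending and len(self.in_flight) < self.window:
            ip = self.pending.popleft()
            xid = self._next_xid
            self._next_xid += 1
            self.in_flight[xid] = ip
            out.append((xid, ip))
        return out

    def poll_upddone(self):
        """Return the xid for the UPDDONE once, when all is ACKed."""
        if not self.done and not self.pending and not self.in_flight:
            self.done = True
            return self.updreq_xid  # UPDDONE reuses the UPDREQ xid
        return None

    def on_bndack(self, xid):
        """Record a BNDACK; return the UPDDONE xid if now complete."""
        self.in_flight.pop(xid, None)
        return self.poll_upddone()
```

A caller would invoke sendable() after each BNDACK to refill the
window, and call poll_upddone() once up front to cover the case where
nothing was un-ACKed when the UPDREQ arrived.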

3429	7.4.  UPDREQALL message

3431	   The update request all (UPDREQALL) message is used by one server to
3432	   request that its partner send it all of the binding database informa-
3433	   tion.  This message is used to allow one server to recover from a
3434	   failure of stable storage and to restore its binding database in its
3435	   entirety from the other server.

3437	   A server which sends an UPDREQALL message cannot proceed until all of
3438	   its binding update information is restored, and it knows that all of
3439	   that information is restored when an UPDDONE message is received.

3441	   See section 9, Protocol state transitions, for the details of when
3442	   the UPDREQALL message is sent.

3444	7.4.1.  Sending the UPDREQALL message

3446	   There are no options for the UPDREQALL message.

3448	   The UPDREQALL message is sent with a unique xid.

3450	7.4.2.  Receiving the UPDREQALL message

3452	   A server receiving an UPDREQALL message MUST send all binding data-
3453	   base information to the sending server.  These changes are sent as
3454	   undistinguished BNDUPD messages.

3456	   However, the server processing the UPDREQALL message MUST track the
3457	   BNDACK messages that correspond to the BNDUPD messages triggered by
3458	   the UPDREQALL message and, when they are all received, the server
3459	   MUST send an UPDDONE message.

3461	   Just as specified for the processing of the UPDREQ message, the
3462	   server processing the UPDREQALL message and sending BNDUPD messages
3463	   to its partner SHOULD only track the BNDUPD and BNDACK message pairs
3464	   for unACKed binding database changes that were present upon the
3465	   receipt of the UPDREQALL message.  A server which has received an
3466	   UPDREQALL message SHOULD send BNDUPD messages for binding database
3467	   changes that occur after receipt of the UPDREQALL message, but it
3468	   SHOULD NOT include those additional BNDUPD messages and their
3469	   corresponding BNDACK messages in the accounting necessary to consider
3470	   the UPDREQALL complete and subsequently send the UPDDONE message.  If
3471	   some additional binding database changes end up becoming part of the
3472	   set of BNDUPD messages considered as part of the UPDREQALL (due to
3473	   whatever algorithm the server uses to scan its bindings database for
3474	   unacked changes) it will probably not cause any difficulty, but a
3475	   server MUST NOT attempt to include all such later BNDUPD messages in
3476	   the accounting for the UPDREQALL in order to be able to transmit an
3477	   UPDDONE message.

3479	   When queuing up the BNDUPD messages for transmission to the sender of
3480	   the UPDREQALL message, the server processing the UPDREQALL MUST honor
3481	   the value returned in the max-unacked-bndupd option in the CONNECT or
3482	   CONNECTACK message that set up the connection with the sending
3483	   server.  It MUST NOT send more BNDUPD messages without receiving
3484	   corresponding BNDACKs than the value returned in max-unacked-bndupd.

3486	7.5.  UPDDONE message

3488	   The update done (UPDDONE) message is used by a server receiving an
3489	   UPDREQ or UPDREQALL message to signify that it has sent all of the
3490	   BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
3491	   it has received a BNDACK for each of those messages.

3493	7.5.1.  Sending the UPDDONE message

3495	   The UPDDONE message SHOULD be sent as soon as the last BNDACK message
3496	   corresponding to a BNDUPD message requested by the UPDREQ or
3497	   UPDREQALL is received from the server which sent the UPDREQ or
3498	   UPDREQALL.  The XID of the UPDDONE message MUST be the same as the
3499	   XID of the corresponding UPDREQ or UPDREQALL message.

3501	7.5.2.  Receiving the UPDDONE message

3503	   A server receiving the UPDDONE message knows that all of the informa-
3504	   tion that it requested by sending an UPDREQ or UPDREQALL message has
3505	   now been sent and that it has recorded this information in its stable
3506	   storage.  It typically uses that the receipt of an UPDDONE message to
3507	   move to a different failover state.  See sections 9.5.2 and 9.8.3 for
3508	   details.

3510	7.6.  POOLREQ message

3512	   The pool request (POOLREQ) message is used by the secondary server to
3513	   request an allocation of IP addresses from the primary server.   It
3514	   MUST be sent by a secondary server to a primary server to request IP
3515	   address allocation by the primary.  The IP addresses allocated are
3516	   transmitted using normal BNDUPD messages from the primary to the
3517	   secondary.

3519	   The POOLREQ message SHOULD be sent from the secondary to the primary
3520	   whenever the secondary transitions into NORMAL state.  It SHOULD
3521	   periodically be resent in order that any change in the number of
3522	   available IP addresses on the primary be reflected in the pool on the
3523	   secondary.  The period may be influenced by the secondary server's
3524	   leasing activity.

3526	7.6.1.  Sending the POOLREQ message

3528	   The POOLREQ message has no options.  It must be sent with a unique
3529	   xid.

3531	7.6.2.  Receiving the POOLREQ message

3533	   When a primary server receives a POOLREQ message it SHOULD examine
3534	   the binding database and determine how many IP addresses the secon-
3535	   dary server should have, and set these IP addresses to BACKUP state.
3536	   It SHOULD then send BNDUPD messages concerning all of these IP
3537	   addresses to the secondary server.

3539	   Servers frequently have several kinds of IP addresses available on a
3540	   particular network segment.  The failover protocol assumes that both
3541	   primary and secondary servers are configured in such a way that each
3542	   knows the type and number of IP addresses on every network segment
3543	   participating in the failover protocol.  The primary server is
3544	   responsible for allocating the secondary server the correct propor-
3545	   tion of available IP addresses of each kind, and the secondary server
3546	   is responsible for being configured in such a way that it can tell
3547	   the kind of every IP address based solely on the IP address itself.

3549	   A primary server MUST keep track of how many IP addresses were allo-
3550	   cated as a result of processing the POOLREQ message, and send that
3551	   number in the POOLRESP message.

3553	   A primary server MAY choose to defer processing a POOLREQ message
3554	   until a more convenient time to process it, but it should not depend
3555	   on the secondary server to retransmit the POOLREQ message in that
3556	   case.

3558	   If a secondary server receives a POOLREQ message it SHOULD report an
3559	   error.

3561	7.7.  POOLRESP message

3563	   A primary server sends a POOLRESP message to a secondary server after
3564	   the allocation process for available addresses to the secondary
3565	   server is complete.  Typically this message will precede some of the
3566	   BNDUPD messages that the primary uses to send the actual allocated IP
3567	   addresses to the secondary.

3569	7.7.1.  Sending the POOLRESP message

3571	   The POOLRESP message MUST contain the same xid as the corresponding
3572	   POOLREQ message.

3574	   The only option which MUST appear in a POOLRESP message is:

3576	      o addresses-transferred

3578	        The number of addresses allocated to the secondary server by the
3579	        primary server as a result of a POOLREQ is contained in the
3580	        addresses-transferred option in a POOLRESP message.  Note this
3581	        is the number of addresses that are transferred to the secondary
3582	        in the primary's binding database as a result of the correspond-
3583	        ing POOLREQ message, and that it may be some time before they
3584	        can all be transmitted to the secondary server through the use
3585	        of BNDUPD messages.

3587	7.7.2.  Receiving the POOLRESP message

3589	   When a secondary server receives a POOLRESP message, it SHOULD send
3590	   another POOLREQ message if the value of the addresses-transferred
3591	   option is non-zero.

3593	   Typically, no other action is taken on the reception of a POOLRESP
3594	   message.

3596	7.8.  CONNECT message

3598	   The connect message is used to establish an applications level con-
3599	   nection over a newly created TCP connection.  It gives the source
3600	   information for the connection, and some important configuration
3601	   information.  It MUST be sent only by the primary server.  Either
3602	   server can initiate a TCP connection, but the CONNECT message is only
3603	   sent by the primary server.

3605	7.8.1.  Sending the CONNECT message

3607	   The CONNECT message MUST be the first message sent by the primary
3608	   server after the establishment of a new TCP connection with a secon-
3609	   dary server participating in the failover protocol.

3611	   The xid of the CONNECT message must be unique.

3613	   The IP address of the primary server MUST be placed in the sending-
3614	   server-IP-address option.  This information is placed in an option
3615	   inside of the message in order to allow the identity of the sender to
3616	   be covered by a shared secret.

3618	   The number of BNDUPD messages the primary server can accept without
3619	   blocking the TCP connection MUST be placed in the max-unacked-bndupd
3620	   option.  This MUST be a number equal to or greater than 1, SHOULD be
3621	   a number greater than 10, and SHOULD be a number less than 100.
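
The three constraints on max-unacked-bndupd can be checked as in the
sketch below.  The function name and the shape of its return value are
hypothetical.  Violating the MUST makes the value invalid, while the
two SHOULD bounds only produce warnings.

```python
def check_max_unacked_bndupd(value):
    """Return (valid, warnings) for a proposed max-unacked-bndupd."""
    if value < 1:
        # MUST be equal to or greater than 1.
        return False, ["max-unacked-bndupd MUST be at least 1"]
    warnings = []
    if value <= 10:
        warnings.append("max-unacked-bndupd SHOULD be greater than 10")
    if value >= 100:
        warnings.append("max-unacked-bndupd SHOULD be less than 100")
    return True, warnings
```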

3623	   The length of the receive timer (tReceive, see section 8.3) MUST be
3624	   placed in the receive-timer option.

3626	   The MCLT MUST be placed in the MCLT option.

3628	   The hash-bucket-assignment option MUST be included in the CONNECT
3629	   message.  In the event that load balancing is not configured for this
3630	   server, the hash-bucket-assignment option will indicate that.  The
3631	   value of the hash-bucket-assignment option is determined from the
3632	   specific buckets that the primary server has determined that the
3633	   secondary server MUST service as part of the load-balancing algo-
3634	   rithm.  The way in which the primary server determines this informa-
3635	   tion is outside the scope of this protocol definition.  The primary
3636	   server SHOULD be configured with a percentage of clients that the
3637	   secondary server will be instructed to service, and the primary
3638	   server SHOULD use the algorithm in [LOADB] to generate a Hash Bucket
3639	   Assignment which it sends to the secondary server.

3641	   The vendor class identifier MUST be placed in the vendor-class-
3642	   identifier option.

3644	   The protocol-version option MUST be included in every CONNECT mes-
3645	   sage.  The current value of the protocol version is 1.

3647	   The TLS-request option MUST be sent and contains the desired TLS con-
3648	   nection request as well as information concerning whether TLS is sup-
3649	   ported.  If this CONNECT message is being sent over an already
3650	   created TLS connection, the TLS-request option MUST NOT appear.
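
	   The option requirements above can be collected into one helper.
	   The following is a minimal sketch, not part of the protocol
	   definition: the function names, the dict layout, and the
	   (req, acc) tuple used for the TLS-request value are assumptions
	   made for illustration.  Only the option names and the numeric
	   constraints are taken from the text.

```python
PROTOCOL_VERSION = 1  # "The current value of the protocol version is 1."


def build_connect_options(primary_ip, max_unacked_bndupd, receive_timer,
                          mclt, hash_bucket_assignment, vendor_class,
                          tls_request=None, over_tls=False):
    """Return the options of a CONNECT message as a plain dict.

    The dict layout is hypothetical; the option names and the
    constraints checked here come from section 7.8.1.
    """
    if max_unacked_bndupd < 1:
        raise ValueError("max-unacked-bndupd MUST be >= 1")
    options = {
        "sending-server-IP-address": primary_ip,
        "max-unacked-bndupd": max_unacked_bndupd,
        "receive-timer": receive_timer,
        "MCLT": mclt,
        "hash-bucket-assignment": hash_bucket_assignment,
        "vendor-class-identifier": vendor_class,
        "protocol-version": PROTOCOL_VERSION,
    }
    if over_tls:
        # Second CONNECT, sent over an already created TLS connection.
        if tls_request is not None:
            raise ValueError("TLS-request MUST NOT appear over TLS")
    else:
        if tls_request is None:
            raise ValueError("TLS-request MUST be sent")
        options["TLS-request"] = tls_request
    return options


def max_unacked_in_recommended_range(n):
    """True when n meets both SHOULDs: greater than 10, less than 100."""
    return 10 < n < 100
```

	   A value such as 20 for max-unacked-bndupd satisfies the MUST and
	   both SHOULDs; a value of 5 satisfies only the MUST.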

3652	7.8.2.  Receiving the CONNECT message

3654	   When a server receives a TCP connection on the failover port, if it
3655	   is a primary server it should send a CONNECT message, and if it is a
3656	   secondary server it should wait for a CONNECT message.

3658	   When a secondary server receives a CONNECT message it should:

3660	      1.  Record the time at which the message was received.

3662	      2.  Examine the protocol-version option, and decide if this server
3663	          is capable of interoperating with another server running that
3664	          protocol version.  If not, send the CONNECTACK message with
3665	          the appropriate reject-reason.  The server MUST include its
3666	          protocol-version in the CONNECTACK message.

3668	      3.  Examine the TLS-request option.  Determine the TLS-reply
3669	          value based on the capabilities and configuration of this
3670	          server, and save it for the CONNECTACK message.  If the TLS
3671	          negotiation results in a connection rejection, then go
3672	          immediately to sending the CONNECTACK message.

3674	          The possibilities are:

3676	               CONNECT        CONNECTACK
3677	             TLS-request       TLS-reply

3679	                                    Reject
3680	              req acc          t1   Reason   Comments
3681	              --- ---          --   ------   --------
3682	              0   0            0
3683	              0   0            1    11       receiver requires TLS
3684	              0   1            0
3685	              0   1            1
3686	              1   0            -             request doesn't make sense
3687	              1   1            0
3688	              1   1            1
3689	              2   0            -             request doesn't make sense
3690	              2   1            0    9 or 10  receiver won't do TLS
3691	              2   1            1

3693	      4.  Check to see if there is a message-digest option in the CON-
3694	          NECT message.  If there was, and the server does not support
3695	          message-digests, then reject the connection with the appropri-
3696	          ate reject-reason in the CONNECTACK.

3698	      5.  Determine if the sender (from the sending-server-IP-address
3699	          option) and the implicit role of the sender (i.e., primary)
3700	          represent a server with which the receiver was configured to
3701	          engage in failover activity.  This is performed after any TLS
3702	          processing so that it occurs after a secure connection is
3703	          created, to ensure that there is no tampering with the IP
3704	          address of the partner.

3706	          If not, then the receiving server should reject the CONNECT
3707	          request by sending a CONNECTACK message with a reject-reason
3708	          value of: 8, invalid failover partner.

3710	          If it is, then the receiving failover endpoint should be
3711	          determined.

3713	      6.  Decide if the time delta between the sending of the message,
3714	          in the time field, and the receipt of the message, recorded in
3715	          step 1 above, is acceptable.  A server MAY require an arbi-
3716	          trarily small delta in time values in order to set up a fail-
3717	          over connection with another server.  See section 5.9 for
3718	          information on time synchronization.

3720	          If the delta between the time values is too great, the server
3721	          should reject the CONNECT request by sending a CONNECTACK mes-
3722	          sage with a reject-reason of 4, time mismatch too great.

3724	          If the time mismatch is not considered too great then the
3725	          receiving server MUST record the delta between the servers.
3726	          The receiving server MUST use this delta to correct all of the
3727	          absolute times received from the other server in all time-
3728	          valued options.  Note that servers can participate in fail-
3729	          over with arbitrarily great time mismatches, as long as the
3730	          mismatch is more or less constant.

3732	      7.  If the receiving server is a secondary server, it MUST examine
3733	          the MCLT option in the CONNECT request and use the value of
3734	          the MCLT as the MCLT for this failover endpoint.

3736	          A receiving secondary server SHOULD be able to operate with
3737	          any MCLT sent by the primary,  but if it cannot, then it
3738	          should send a CONNECTACK with a reject-reason of 5, MCLT
3739	          mismatch.

3741	      8.  The server MUST store the hash-bucket-assignment option for
3742	          use during processing in NORMAL state.  If this hash bucket
3743	          assignment conflicts with the secondary server's configured
3744	          hash bucket assignment for use in other than NORMAL state, the
3745	          secondary server should send a CONNECTACK with a reject reason
3746	          of 19, Hash bucket assignment conflict.

3748	      9.  The receiving server MAY use the vendor-class-identifier to do
3749	          vendor specific processing.
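
	   Steps 5 through 8 above can be sketched as a single check that
	   yields either a reject-reason or the recorded time delta.  This
	   is an illustrative sketch only: the message and configuration
	   dicts, their keys, and the two caller-supplied predicates are
	   hypothetical.  The reject-reason values 8, 4, 5 and 19 are the
	   ones named in the steps; the protocol-version, TLS and message-
	   digest checks are omitted because the text does not give their
	   reject-reason values at this point.

```python
REJECT_INVALID_PARTNER = 8        # step 5: invalid failover partner
REJECT_TIME_MISMATCH = 4          # step 6: time mismatch too great
REJECT_MCLT_MISMATCH = 5          # step 7: MCLT mismatch
REJECT_HASH_BUCKET_CONFLICT = 19  # step 8: hash bucket assignment conflict


def check_connect(msg, cfg, received_at):
    """Return (reject_reason, delta) for a received CONNECT message.

    reject_reason is None when the CONNECT is acceptable, in which case
    delta is the time difference the receiver must record.
    """
    # Step 5: is the sender a configured failover partner?
    if msg["sending-server-IP-address"] != cfg["partner_ip"]:
        return REJECT_INVALID_PARTNER, None

    # Step 6: compare the time field with the receipt time from step 1.
    delta = received_at - msg["time"]
    if abs(delta) > cfg["max_time_delta"]:
        return REJECT_TIME_MISMATCH, None

    # Step 7: can this server operate with the primary's MCLT?
    if not cfg["mclt_supported"](msg["MCLT"]):
        return REJECT_MCLT_MISMATCH, None

    # Step 8: does the assignment conflict with local configuration?
    if cfg["buckets_conflict"](msg["hash-bucket-assignment"]):
        return REJECT_HASH_BUCKET_CONFLICT, None

    return None, delta
```

	   Returning the delta together with the verdict keeps the value
	   measured in step 6 available for later correction of time-valued
	   options.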

3751	7.9.  CONNECTACK message

3753	   The CONNECTACK message is sent to accept or reject a CONNECT message.
3754	   It is sent by the secondary server which received a CONNECT message.

3756	   Attempting immediately to reconnect after either receiving a CONNEC-
3757	   TACK with a reject-reason or after sending a CONNECTACK with a
3758	   reject-reason could yield unwanted looping behavior, since the reason
3759	   that the connection was rejected may well not have changed since the
3760	   last attempt.  A simple suggested solution is to wait a minute or two
3761	   after sending or receiving a CONNECTACK message with a reject-reason
3762	   before attempting to reestablish communication.

3764	7.9.1.  Sending the CONNECTACK message

3766	   The xid of the CONNECTACK message MUST be that of the corresponding
3767	   CONNECT message.

3769	   The IP address of the sending server MUST be placed in the sending-
3770	   server-IP-address option.  This information is placed in an option
3771	   inside of the message in order to allow the identity of the sender to
3772	   be covered by a shared secret.

3774	   The protocol-version option MUST be included in every CONNECTACK mes-
3775	   sage.  The current value of the protocol version is 1.

3777	   If the connection has been rejected, the reject-reason option MUST be
3778	   placed in the CONNECTACK message with an appropriate reason, and a
3779	   message option SHOULD be included with a human-readable error message
3780	   describing the reason for the rejection in some detail.  If the
3781	   reject-reason option appears, then the remaining options listed below
3782	   do not appear.  The sending server should close the connection after
3783	   sending the CONNECTACK if the connection was rejected.

3785	   The results of the TLS negotiation MUST be placed in the TLS-reply
3786	   option.  If this CONNECTACK message is being sent over an already TLS
3787	   secured connection, then there MUST NOT be a TLS-reply option.

3789	   If there was a message-digest option in the CONNECT message, then
3790	   there MUST be a message-digest in the CONNECTACK message and any sub-
3791	   sequent messages if the CONNECTACK does not contain a reject-reason.

3793	   The number of BNDUPD messages the server can accept without blocking
3794	   the TCP connection MUST be placed in the max-unacked-bndupd option.
3795	   This SHOULD be a number greater than 10, and SHOULD be a number less
3796	   than 100.

3798	   The length of the receive timer (tReceive, see section 8.3) MUST be
3799	   placed in the receive-timer option.

3801	   The vendor class identifier MUST be placed in the vendor-class-
3802	   identifier option.

3804	   If the server is rejecting the CONNECT message, then the reject-
3805	   reason option MUST appear.  A message option SHOULD appear to give a
3806	   human readable version of the rejection reason.

3808	   After a connection is created (either by sending a CONNECTACK message
3809	   to the first CONNECT message, or sending a CONNECTACK message to a
3810	   CONNECT message received over a TLS connection), the server MUST send
3811	   a STATE message.

3813	   After a connection is created, the server MUST start two timers for
3814	   the connection: tSend and tReceive.  The tSend timer SHOULD be
3815	   approximately 33 percent of the time in the receive-timer option in
3816	   the corresponding CONNECT message.  The tReceive timer SHOULD be the
3817	   time sent in the receive-timer option in the CONNECTACK message.

3819	   The tReceive timer is reset whenever a message is received from this
3820	   TCP connection.  If it ever expires, the TCP connection is dropped
3821	   and communications with this partner is considered not ok.

3823	   The tSend timer is reset whenever a message is sent over this connec-
3824	   tion. When it expires, a CONTACT message MUST be sent.
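
	   The two timers can be modelled without real clocks by passing
	   the current time into each call.  The class and function names
	   below are assumptions made for illustration; the 33 percent
	   figure and the choice of which receive-timer value feeds which
	   timer follow the paragraphs above for the server sending the
	   CONNECTACK.

```python
class ConnectionTimers:
    """Tracks tSend and tReceive for one failover connection."""

    def __init__(self, t_send, t_receive, now):
        self.t_send = t_send
        self.t_receive = t_receive
        self.last_sent = now
        self.last_received = now

    def on_send(self, now):
        # tSend is reset whenever a message is sent.
        self.last_sent = now

    def on_receive(self, now):
        # tReceive is reset whenever a message is received.
        self.last_received = now

    def contact_due(self, now):
        # When tSend expires, a CONTACT message MUST be sent.
        return now - self.last_sent >= self.t_send

    def receive_expired(self, now):
        # When tReceive expires, the TCP connection is dropped.
        return now - self.last_received >= self.t_receive


def timers_for_connectack_sender(connect_receive_timer,
                                 connectack_receive_timer, now):
    """tSend from the CONNECT's value, tReceive from the CONNECTACK's."""
    return ConnectionTimers(0.33 * connect_receive_timer,
                            connectack_receive_timer, now)
```

	   With a receive-timer of 60 seconds in the CONNECT message, this
	   sketch makes a CONTACT message due a little under 20 seconds
	   after the last message was sent.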

3826	7.9.2.  Receiving the CONNECTACK message

3828	   If a CONNECTACK message is received with a different XID from the one
3829	   in the CONNECT that was sent, it SHOULD be ignored.

3831	   When a CONNECTACK message is received, the following actions should
3832	   be taken:

3834	      1.  Record the time the message was received.

3836	      2.  Check to see if there is a reject-reason option in the CONNEC-
3837	          TACK message.  If not, continue with step 3.  If there is a
3838	          reject-reason option, the server SHOULD report the error code.
3839	          If a message option appears a server SHOULD display the string
3840	          from the message option in a user visible way.  The server
3841	          MUST close the connection if a reject-reason option appears.

3843	      3.  Check to see if the xid on the CONNECTACK matches an outstand-
3844	          ing CONNECT message on this TCP connection.

3846	      4.  Check the value of the TLS-reply option, and if it was 1, then
3847	          skip processing of the rest of the CONNECTACK message, and
3848	          immediately enter into TLS connection setup.

3850	          If it does not, a server SHOULD report an error.

3852	          This step occurs prior to steps 5 and 6 in order to allow
3853	          creation of a secure connection (if required) prior to pro-
3854	          cessing the protocol version and IP address information.

3856	      5.  Examine the value of the protocol-version option.  If this
3857	          server is able to establish connections with another server
3858	          running this protocol version, then continue, else close the
3859	          connection.

3861	      6.  Decide if the time delta between the sending of the message,
3862	          in the time field, and the receipt of the message, recorded in
3863	          step 1 above, is acceptable.  A server MAY require an arbi-
3864	          trarily small delta in time values in order to set up a fail-
3865	          over connection with another server.

3867	          If the delta between the time values is too great, the server
3868	          should drop the TCP connection.

3870	          If the time mismatch is not considered too great then the
3871	          receiving server MUST record the delta between the servers.
3872	          The receiving server MUST use this delta to correct all of the
3873	          absolute times received from the other server in all time-
3874	          valued options.  Note that the failover protocol is con-
3875	          structed so that two servers can be failover partners with
3876	          arbitrarily great time mismatches.

3878	      7.  If the receiving server is a secondary server, it MUST examine
3879	          the MCLT option in the CONNECT request and use the value of
3880	          the MCLT as the MCLT for this failover endpoint.

3882	          A receiving secondary server SHOULD be able to operate with
3883	          any MCLT sent by the primary,  but if it cannot, then it MUST
3884	          drop the TCP connection.

3886	      8.  If the receiving server is a secondary server, it MUST store
3887	          the hash-bucket-assignment option for use during processing
3888	          during NORMAL state.  If this hash bucket assignment conflicts
3889	          with the server's configured hash bucket assignment for use in
3890	          other than NORMAL state, the secondary server should send a
3891	          CONNECTACK with a reject reason of 19, Hash bucket assignment
3892	          conflict.

3894	      9.  The receiving server MAY use the vendor-class-identifier to do
3895	          vendor specific processing.

3897	      10. After accepting a CONNECTACK message, the server MUST send a
3898	          STATE message.

3900	          After receiving a CONNECTACK message, the server MUST start
3901	          two timers for the connection: tSend and tReceive.  The tSend
3902	          timer SHOULD be approximately 20 percent of the time in the
3903	          receive-timer option in the corresponding CONNECTACK message.
3904	          The tReceive timer SHOULD be set to the time sent in the
3905	          receive-timer option in the CONNECT message.

3907	          The tReceive timer is reset whenever a message is received
3908	          from this TCP connection.  If it ever expires, the TCP connec-
3909	          tion is dropped and communications with this partner is con-
3910	          sidered not ok.

3912	          The tSend timer is reset whenever a message is sent over this
3913	          connection. When it expires, a CONTACT message MUST be sent.
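
	   The delta recorded in step 6 above, and its use to correct
	   absolute times from the partner, can be sketched as follows.
	   The function names are hypothetical, and the sketch assumes
	   that network transit time is negligible compared with the clock
	   difference, which the text does not state.

```python
def clock_delta(partner_sent_time, local_received_time):
    """Difference between the local clock and the partner's clock.

    partner_sent_time is the time field of the received message;
    local_received_time is the receipt time recorded in step 1.
    """
    return local_received_time - partner_sent_time


def to_local_time(partner_absolute_time, delta):
    """Correct an absolute time from a time-valued option."""
    return partner_absolute_time + delta
```

	   For example, if the partner stamped its message 400 and it
	   arrived at local time 1000, the delta is 600, and a lease
	   expiration reported by the partner as 500 corresponds to local
	   time 1100.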

3915	7.10.  STATE message

3917	   The state (STATE) message is used to communicate the current failover
3918	   state to the partner server.

3920	   The STATE message MUST be sent after sending a CONNECTACK message
3921	   that didn't contain a reject-reason option, and MUST be sent after
3922	   receiving a CONNECTACK message without a reject-reason option.

3924	   A STATE message MUST be sent whenever the failover endpoint changes
3925	   its failover state and a connection exists to the partner.

3927	   The STATE message requires no response from the failover partner.

3929	7.10.1.  Sending the STATE message

3931	   The current failover state is placed in the server-state option and
3932	   the current state of the STARTUP flag is placed in the server-flags
3933	   option.

3935	   The message is sent with a unique xid.

3937	   A server SHOULD send the STATE message only when the connection is
3938	   created (i.e., after sending or receiving a CONNECTACK message with
3939	   no reject-reason option), or when there is a change from the values
3940	   sent in a previous STATE message.

3942	7.10.2.  Receiving the STATE message

3944	   Every STATE message SHOULD indicate a change in state or a change in
3945	   the flags.

3947	   When a STATE message is received, any state transitions specified in
3948	   section 9 are taken.

3950	   No response to a STATE message is required.

3952	7.11.  CONTACT message

3954	   The contact (CONTACT) message is sent to verify communications
3955	   integrity with a failover partner. The CONTACT message is sent when
3956	   no messages have been sent to the failover partner for a specified
3957	   period of time.  This is determined by the tSend timer expiring (see
3958	   section 8.3).

3960	7.11.1.  Sending the CONTACT message

3962	   The CONTACT message is sent.

3964	7.11.2.  Receiving the CONTACT message

3966	   When a CONTACT message is received, the tReceive timer is reset (as
3967	   it is with any message that is received).

3969	   A server MAY use the time in the time field and the time recorded
3970	   above to refine the delta time calculations between the servers.

3972	7.12.  DISCONNECT message

3974	   The DISCONNECT is the last message sent over a connection before
3975	   dropping an established connection.

3977	   After sending or receiving a DISCONNECT message, a server needs to
3978	   have some mechanism to prevent an error loop. Simply reconnecting to
3979	   the partner immediately is not the best option, especially after
3980	   several consecutive attempts.

3982	   A simple suggested solution is to wait a minute or two after sending
3983	   or receiving a DISCONNECT before attempting to reestablish communica-
3984	   tion.

3986	7.12.1.  Sending the DISCONNECT message

3988	   The DISCONNECT message MUST be the last message sent by a server
3989	   which is dropping a TCP connection.

3991	   The xid of the DISCONNECT message must be unique.

3993	   The reject-reason option MUST appear giving a reason why the connec-
3994	   tion was dropped.  A message option SHOULD appear giving a human
3995	   readable error message with possibly more details.

3997	7.12.2.  Receiving the DISCONNECT message

3999	   When a server receives a DISCONNECT message it should log the message
4000	   if there was one and possibly raise an alarm of some sort if the
4001	   reject reason was one that was sufficiently serious.

4003	8.  Connection Management

4005	   Servers participating in the failover protocol communicate over TCP
4006	   connections.   These TCP connections are used both to transmit bind-
4007	   ing information from one server to another as well as to allow each
4008	   server to determine whether communications is possible with the other
4009	   server.

4011	   Central to the operation of the failover protocol is a notion of
4012	   "communications okay" or "communications failed".  Failover state
4013	   transitions are taken in many cases when the status of communications
4014	   with the partner changes, and the existence or non-existence of a TCP
4015	   connection between failover endpoints is used to determine if com-
4016	   munications is "okay" or "failed".

4018	   A single TCP connection exists which connects two failover endpoints.

4020	8.1.  Connection granularity

4022	   There exists one TCP connection between each set of failover end-
4023	   points.  See section 5.1.1 for an explanation of failover endpoint.

4025	   There are a maximum of two TCP connections between any two servers
4026	   implementing the failover protocol, one for each of the possible
4027	   failover endpoints between these two servers.  There is a minimum of
4028	   one TCP connection between one server and every other failover server
4029	   with which it implements the failover protocol.

4031	8.2.  Creating the TCP connection

4033	   Every server implementing the failover protocol MUST listen on port
4034	   647 for incoming failover TCP connections.  The source port of the
4035	   TCP connection is unimportant.

4037	   Every server implementing the failover protocol SHOULD attempt to
4038	   connect to all of its partners periodically, where the period is
4039	   implementation dependent and SHOULD be configurable. In the event
4040	   that a connection has been rejected by a CONNECTACK message with a
4041	   reject-reason option contained in it or a DISCONNECT message, a
4042	   server SHOULD reduce the frequency with which it attempts to connect
4043	   to that server but it SHOULD continue to attempt to connect periodi-
4044	   cally.
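
	   One simple way to reduce the connection frequency after a
	   rejection is sketched below.  The function name and the 90
	   second default are assumptions; the default is chosen only to
	   fall within the "minute or two" suggested in sections 7.9 and
	   7.12.

```python
def next_connect_delay(normal_period, last_attempt_rejected,
                       reject_delay=90.0):
    """Seconds to wait before the next connection attempt.

    After a CONNECTACK with a reject-reason or a DISCONNECT, wait at
    least reject_delay; otherwise use the configured period.
    """
    if last_attempt_rejected:
        return max(normal_period, reject_delay)
    return normal_period
```

	   The server keeps retrying in both cases; only the interval
	   between attempts changes.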

4046	   Once a connection is established, the primary server MUST send a CON-
4047	   NECT message across the connection.  A secondary server MUST wait for
4048	   the CONNECT message from a primary server.

4050	   Every CONNECT message includes a TLS-request option, and if the CON-
4051	   NECTACK message does not reject the CONNECT message and the TLS-reply
4052	   option says TLS MUST be used, then the servers will immediately enter
4053	   into TLS negotiation.

4055	   Once TLS negotiation is complete, the primary server MUST resend the
4056	   CONNECT message on the newly secured TLS connection and then wait for
4057	   the CONNECTACK message in response.  The TLS-request and TLS-reply
4058	   options MUST have the same values in this second CONNECT and CONNEC-
4059	   TACK message as they had in the first messages.

4061	   The second message sent over a new connection (either a bare TCP con-
4062	   nection or a connection utilizing TLS) is a STATE message.  Upon the
4063	   receipt of this message, the receiver can consider communications up.

4065	   It is entirely possible that two servers will attempt to make connec-
4066	   tions to each other essentially simultaneously, and in this case the
4067	   secondary server will be waiting for a CONNECT message on each con-
4068	   nection.  The primary server MUST send a CONNECT message over one
4069	   connection and it MUST close the other connection.

4071	   A secondary server MUST NOT respond to the closing of a TCP connec-
4072	   tion with a blind attempt to reconnect -- there may be another TCP
4073	   connection to the same failover partner already in use.

4075	8.3.  Using the TCP connection for determining communications status

4077	   The TCP connection is used to determine the communications status of
4078	   the other server, i.e., communications-ok, or communications-
4079	   interrupted.

4081	   Three things must happen for a server to consider that communications
4082	   are ok with respect to another server:

4084	      1.  A TCP connection must be established to the other server.

4086	      2.  A CONNECT message must be received and a CONNECTACK message
4087	          sent in response.  The CONNECT message is used to determine
4088	          the identity of the failover endpoint of the other end of the
4089	          TCP connection -- without it, the failover endpoint cannot be
4090	          uniquely determined.  Without knowledge of the failover end-
4091	          point, the entity with which communications is ok is
4092	          undetermined.

4094	      3.  A STATE message must be received from the other server over
4095	          the connection.  This STATE message initializes important
4096	          information necessary to the operation of the state machine
4097	          that governs the behavior of this failover endpoint.

4099	   There are two ways that a server can determine that communications
4100	   has failed:

4102	      1.  The TCP connection can go down, yielding an error when
4103	          attempting to send or receive a message. This will happen at
4104	          least as often as the period of the tSend timer.

4106	      2.  The tReceive timer can expire.

4108	   In either of these cases, communications is considered interrupted.
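
	   The three conditions for "communications ok" and the two
	   failure causes can be tracked with a few flags.  This is a
	   minimal sketch; the class and attribute names are hypothetical.

```python
class CommStatus:
    """Tracks whether communications with the partner is ok."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.tcp_established = False    # condition 1
        self.connect_exchanged = False  # condition 2: CONNECT/CONNECTACK
        self.state_received = False     # condition 3: STATE received

    def ok(self):
        # All three conditions must hold.
        return (self.tcp_established
                and self.connect_exchanged
                and self.state_received)

    def on_tcp_error(self):
        # Failure cause 1: error while sending or receiving.
        self._reset()

    def on_receive_timer_expired(self):
        # Failure cause 2: tReceive expired.
        self._reset()
```

	   Communications is reported ok only after the partner's STATE
	   message has arrived, which matches the ordering of the three
	   conditions in the list.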

4110	   Several difficulties arise when trying to use one TCP connection for
4111	   both bulk data transfer as well as to sense the communications status
4112	   of the other server.   One aspect of the problem stems from the dif-
4113	   ferent requirements of both uses.  The bulk data transfer is of
4114	   course critically important to the protocol, but the speed with which
4115	   it is processed is not terribly significant.  It might well be
4116	   minutes before a BNDUPD message is processed, and while not optimal,
4117	   such an occasional delay doesn't compromise the correctness of the
4118	   protocol. However, the speed with which one server detects the other
4119	   server is up (or, more importantly, down) is more highly constrained.
4120	   Generally one server should be able to detect that the other server
4121	   is not communicating within a minute or less.

4123	   These differing time constraints make it difficult to use the same
4124	   TCP connection for data transfer as well as to sense communications
4125	   integrity.   See section 3.5 for additional details on TCP.

4127	   The solution to this problem is to require that some message be
4128	   received by each end of the connection within a limited time or that
4129	   the connection will be considered down.  If no messages have been
4130	   sent recently, then a CONTACT message is sent.

4132	   In the case where there is no data queued to be sent, this is not a
4133	   problem, but in the case where there is data queued to be sent to the
4134	   partner, then the CONTACT message will not actually be transmitted
4135	   until the queued data is sent.  Section 3.5 explains why waiting for
4136	   TCP to determine that the connection is down is not acceptable, and
4137	   leads to a requirement that the receiving server never block the
4138	   sending server from sending CONTACT messages.

4140	   In order to meet this requirement, each server tells the other server
4141	   the number of outstanding BNDUPD messages that it will accept.  The
4142	   receiving server is required to always be able to accept that many
4143	   BNDUPD messages off of the connection's input queue even if it cannot
4144	   process them immediately, and to accept all other messages immedi-
4145	   ately.

4147	   Thus, the sending server's TCP is never blocked from sending a mes-
4148	   sage except for very short periods, less than a few seconds unless
4149	   the network connection itself has problems.  In this case, if the
4150	   CONTACT messages don't make it to the partner then the partner will
4151	   close the connection.

4153	   DISCUSSION:

4155	      When implementing this capability, one needs to be careful when
4156	      sending any message on the TCP connection as TCP can easily block
4157	      the server if the local TCP send buffers are full.  This can't be
4158	      prevented because if the receiver is not reachable (via the net-
4159	      work), the sending TCP can't send and thus it will be unable to
4160	      empty the local TCP send buffers.  So, all send operations either
4161	      need to assume they may block for some time or non-blocking sends
4162	      must be used.

4164	8.4.  Using the TCP connection for binding data

4166	   Binding data, in the form of BNDUPD messages and BNDACK messages to
4167	   respond to them, are sent across the TCP connection.

4169	   In order to support timely detection of any failure in the partner
4170	   server, the TCP connection MUST NOT block for more than a very short
4171	   time, on the order of a few seconds.  Therefore, a server that is
4172	   sending BNDUPD messages MUST send only a restricted number before
4173	   receiving BNDACK messages about previous messages sent.

4175	   The number of outstanding BNDUPD messages that each server will
4176	   accept without causing TCP to block transmission of additional data
4177	   (i.e., CONTACT messages) is sent by each server in the CONNECT and
4178	   CONNECTACK messages in the max-unacked-bndupd option.
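
	   A sender can enforce the partner's limit by counting BNDUPD
	   messages that have not yet been acknowledged.  The class below
	   is an illustrative sketch with hypothetical names; it assumes
	   that each BNDACK acknowledges exactly one BNDUPD, identified by
	   its xid.

```python
class BndupdWindow:
    """Limits unacknowledged BNDUPD messages on one connection."""

    def __init__(self, partner_max_unacked):
        # Value of the partner's max-unacked-bndupd option.
        self.limit = partner_max_unacked
        self.outstanding = set()

    def can_send(self):
        return len(self.outstanding) < self.limit

    def sent(self, xid):
        if not self.can_send():
            raise RuntimeError("max-unacked-bndupd limit reached")
        self.outstanding.add(xid)

    def acked(self, xid):
        # A BNDACK for an unknown xid is ignored here.
        self.outstanding.discard(xid)
```

	   Messages other than BNDUPD, such as CONTACT, are not counted,
	   so they can always be written to the connection.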

4180	8.5.  Using the TCP connection for control messages

4182	   The TCP connection is used for control messages: POOLREQ, UPDREQ,
4183	   STATE, CONTACT, UPDREQALL and the corresponding reply messages: POOL-
4184	   RESP, UPDDONE.  A server MUST immediately accept all of these mes-
4185	   sages from the TCP connection.  A server MUST immediately accept any
4186	   BNDACK which is received as well.

4188	8.6.  Losing the TCP connection

4190	   When the TCP connection is lost, then communications is not ok with
4191	   the other server.  A server which has lost communications SHOULD
4192	   immediately attempt to reconnect to the other server, and should
4193	   retry these connection attempts periodically.

4195	   A BNDACK message can only be sent in response to a BNDUPD message
4196	   using the same TCP connection from which the BNDUPD message was
4197	   received, since the XIDs in BNDUPD messages are guaranteed unique
4198	   only during the life of a single TCP connection.  When a connection
4199	   to a partner server goes down, a server with unprocessed BNDUPD mes-
4200	   sages MAY simply drop all of those messages, since it can be sure
4201	   that the partner will retransmit them when they are next in communi-
4202	   cations.  A server with unprocessed BNDUPD messages when a TCP con-
4203	   nection goes down MAY instead choose to process those BNDUPD mes-
4204	   sages, but it MUST NOT send any BNDACK messages in response (again
4205	   because of the issues surrounding XID uniqueness).

4207	   When the TCP connection is closed explicitly, the DISCONNECT message
4208	   with a reject-reason option (and, ideally, a message option) MUST be
4209	   sent over the TCP connection.

4211	9.  Protocol States

4213	   This section discusses the various states that a failover endpoint
4214	   may take, and the server actions required when entering the state,
4215	   operating in the state, and leaving the state, as well as the events
4216	   that cause transitions out of the state into another state.

4218	   The state transition diagram in Figure 9.2-1 is relevant for this
4219	   section. This is the common state transition diagram for both servers
4220	   in a failover pair.  In the event that the textual description of a
4221	   state differs from the state transition diagram, the textual descrip-
4222	   tion is to be considered authoritative.

4224	9.1.  Server Initialization

4226	   When a server starts, it begins in STARTUP state.  See section 9.4
4227	   below for details.

4229	9.2.  Server State Transitions

4231	   Whenever a server transitions into a new state, it MUST record the
4232	   state and the time at which it entered that state in stable storage.
4233	   If communications is "ok", it MUST also send a STATE message to its
4234	   failover partner.
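
	   The two obligations on entering a state can be sketched as
	   follows.  The file format, the function name and the send_state
	   callback are assumptions; the protocol requires only that the
	   state and entry time reach stable storage and that a STATE
	   message is sent when communications is ok.

```python
import json
import os


def enter_state(path, new_state, entered_at, comm_ok, send_state):
    """Record a state transition, then notify the partner if possible."""
    record = {"state": new_state, "entered_at": entered_at}

    # Write to a temporary file and rename it so that a crash cannot
    # leave a partially written record behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)

    if comm_ok:
        send_state(new_state)
    return record
```

	   Writing the record before sending the STATE message means the
	   partner is never told about a state that a restarted server
	   could not recover.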

4236	   Figure 9.2-1 is the diagram of the server state transitions. The
4237	   remainder of this section contains information important to the
4238	   understanding of that diagram.

4240	   The server stays in the current state until all of the actions speci-
4241	   fied on the state transition are complete.  If communications fails
4242	   during one of the actions, the server simply stays in the current
4243	   state and attempts a transition whenever the conditions for a transi-
4244	   tion are later fulfilled.

4246	   In the state transition diagram below, the "+" or "-" in the upper
4247	   right corner of each state is a notation about whether communication
4248	   is ongoing with the other server.

4250	   The legend "responsive", "balanced", or "unresponsive" in each state
4251	   indicates whether the server is responsive to all DHCP client
4252	   requests, running in load balanced mode, or totally unresponsive in
4253	   the respective state.  The terms "responsive" and "unresponsive" have
4254	   the obvious meanings, while "balanced" means that a DHCP server may
4255	   respond to all DHCPREQUEST messages that are RENEWAL or REBINDING,
4256	   and to all other messages from clients to which the load balancing
4257	   algorithm indicates that it MUST respond.  See sections 5.3 and
4258	   9.6.2 for details on load balancing.

4260	   In the state transition diagram below, when communication is reesta-
4261	   blished between the two servers, each must record the state of the
4262	   partner when communication was restored.  State transitions on one
4263	   server in some cases imply state transitions on the partner server,
4264	   so a record of the current state of the partner server must be kept
4265	   by each server.

4267	   If the state of the partner changes while communicating, a server
4268	   moves through the communications-failed transition and into whatever
4269	   state results.  It then immediately moves through whatever state
4270	   transition is appropriate given the current state of the partner
4271	   server.  A server performing this operation SHOULD NOT close the TCP
4272	   connection to its partner.

4274	   DISCUSSION:

4276	      The point of this technique is simplicity, both in explanation of
4277	      the protocol and in its implementation.  The alternative to this
4278	      technique of memory of partner state and automatic state transi-
4279	      tion on change of partner state is to have every state in the fol-
4280	      lowing diagram have a state transition for every possible state of
4281	      the partner.  With the approach adopted, only the states in which
4282	      communications are reestablished require a state transition for
4283	      each possible partner state.

4285	   The current state of a server MUST be recorded in stable storage and
4286	   thus be available to the server after a server restart.

4288	        +---------------+  V  +--------------+
4289	        |    RECOVER  - |  |  |   STARTUP  - |
4290	        |(unresponsive) |  +->|(unresponsive)|
4291	        +---------------+     +--------------+
4292	           Comm. OK            +-----------------+
4293	          Other State:-RECOVER |  PARTNER DOWN - |<-----------------+
4294	          |      |             | (responsive)    |                  |
4295	         All   POTENTIAL-      +-----------------+ +--------------+ |
4296	       Others  CONFLICT------------ | --------+    |  RESOLUTION  | |
4297	          |                     Comm. OK      |    |  INTERRUPTED | |
4298	         UPDREQ(ALL)          Other State:    |  +-| (responsive) | |
4299	       Wait UPDDONE            |        |     |  | +--------------+ |
4300	     Wait MCLT from fail   RECOVER  All Others| Comm. OK  ^     |   |
4301	      +--------------+         |        V     V  V        |    Ext. |
4302	      |RECOVER-DONE +|      +--+    +--------------+    Comm.  Cmd. |
4303	      |(unresponsive)|      |       |  POTENTIAL + |    Failed  |   |
4304	      +--------------+   Wait for +>|  CONFLICT    |------+     +-->|
4305	         Comm. OK         Other   | |(unresponsive)|<--------+      |
4306	     +--Other State:-+    State:  | +--------------+         |      |
4307	     |   |           |   RECOVER  |         |                |      |
4308	     |   All      POTENT.  DONE   | Resolve Conflict         |      |
4309	     |  Others:  CONFLICT-- | ----+     (see 9.8)            |      |
4310	     | Wait for             V               V                |      |
4311	     | Other State: NORMAL +-----------------+               |      |
4312	     |   V                 |     NORMAL    + | External      |      |
4313	     |   +--+----------+-->|   (balanced)    |-Command---+-- | -----+
4314	     |      ^          ^   +-----------------+           |   |
4315	     |      |          |            |                    |   |
4316	     |  Wait for   Comm. OK       Comm.            External  |
4317	     |   Other      Other        Failed            Command   |
4318	     |   State:     State:          |                or  |   |
4319	     |RECOVER-DONE  NORMAL     Start Safe        Safe    |   |
4320	     |      |     COMM. INT.  Period Timer       Period  |   |
4321	     |   Comm. OK.     |            V            expiration  |
4322	     |  Other State:   |  +------------------+           |   |
4323	     |    RECOVER      +--| COMMUNICATIONS - |-----------+   |
4324	     V      +-------------|   INTERRUPTED    |   Comm. OK    |
4325	    RECOVER               |  (responsive)    |--Other State:-+
4326	    RECOVER-DONE--------->+------------------+   All Others

4328	           Figure 9.2-1:  Server state diagram.

4330	9.3.  STARTUP state

4332	   The STARTUP state affords an opportunity for a server to probe its
4333	   partner server, before starting to service DHCP clients.

4335	   DISCUSSION:

4337	      Without the STARTUP state, a server would likely start in a state
4338	      derived from its previously stored state (held in stable storage),
4339	      if any.  However, this may be inconsistent with the current state
4340	      of the partner.  The STARTUP state affords the opportunity for a
4341	      server to potentially learn the partner's state and determine if
4342	      that state is consistent with its derived starting state or
4343	      whether some significant state change has occurred at the partner
4344	      that forces the server to start in another state.  This is
4345	      especially critical if significant time has elapsed while the
4346	      server was down.

4348	9.3.1.  Operation while in STARTUP state

4350	   Whenever a server is in STARTUP state, it MUST be unresponsive to
4351	   DHCP client requests, and so the time spent in the STARTUP state is
4352	   necessarily short, typically on the order of a few seconds to a few
4353	   tens of seconds.  The exact time spent in the STARTUP state is imple-
4354	   mentation dependent, and the primary and secondary server are not
4355	   required to spend the same amount of time in the STARTUP state.

4357	   Whenever a STATE message is sent to the partner while in STARTUP
4358	   state the STARTUP bit MUST be set in the server-flags option and the
4359	   previously recorded failover state MUST be placed in the server-state
4360	   option.

4362	9.3.2.  Transition out of STARTUP state

4364	   Each server starts out in STARTUP state every time it initializes
4365	   itself, and performs the following algorithm as part of its initiali-
4366	   zation:

4368	      1.  Is there any record in stable storage of a previous failover
4369	          state?  If yes, set previous-state to the last recorded state
4370	          in stable storage, and continue with step 2.

4372	          Is there any configuration information that indicates that
4373	          this server was previously running but lost its stable
4374	          storage?  Such information must typically come from some
4375	          administrative intervention, since it is difficult for a
4376	          server to distinguish first startup from a startup after it
4377	          has lost its stable storage.  If yes, then set the previous-
4378	          state to RECOVER, and set the time-of-failure to whatever time
4379	          was configured, and go on to step 2.  This time-of-failure
4380	          will be used in the transition out of the RECOVER state into
4381	          the RECOVER-DONE state, below.

4383	          If there is no record of any previous failover state in stable
4384	          storage nor of any previous operational activity for this
4385	          server, then set the previous-state to PARTNER-DOWN if this
4386	          server is a primary and RECOVER if this server is a secondary,
4387	          and set the time-of-failure to a time earlier than now minus
4388	          the maximum-client-lead-time.  With standard POSIX times, 0
4389	          would typically do quite well.

4391	      2.  Is the previous-state NORMAL?  If yes, set the previous-state
4392	          to COMMUNICATIONS-INTERRUPTED.

4394	      3.  Start the STARTUP state timer.  The time that a server remains
4395	          in the STARTUP state (absent any communications with its
4396	          partner) is implementation dependent and SHOULD be configur-
4397	          able.  It SHOULD be long enough for a TCP connection to be
4398	          created to a heavily loaded partner across a slow network.

4400	      4.  Attempt to create a TCP connection to the failover partner.
4401	          See section 8.2.

4403	      5.  Wait for "communications okay", i.e., the process discussed in
4404	          section 8.2 "Creating the TCP Connection", to complete,
4405	          including the receipt of a STATE message from the partner.

4407	          When and if communications become "okay", clear the STARTUP
4408	          flag, and set the current state to the previous-state.

4410	          If the partner is in PARTNER-DOWN state, and if the time at
4411	          which it entered PARTNER-DOWN state (as received in the
4412	          start-time-of-state option in the STATE message) is later than
4413	          the last recorded time of operation of this server, then set
4414	          the current state to RECOVER.  If the time at which it entered
4415	          PARTNER-DOWN state is earlier than the last recorded time of
4416	          operation of this server, then set the current state to
4417	          POTENTIAL-CONFLICT.

4419	          Then, transition to the current state and take the "communica-
4420	          tions okay" state transition based on the current state of
4421	          this server and the partner.

4423	      6.  If the startup time expires, take an implementation dependent
4424	          action:  The server MAY go to the previous-state, or the
4425	          server MAY wait.

4427	          Reasons to go to previous-state and begin processing:

4429	          If the current server is the only operational server, then if
4430	          it waits, there will be no operational DHCP servers.  This
4431	          situation could occur very easily where one server fails and
4432	          then the other crashes and reboots.  If the rebooting server
4433	          doesn't start processing DHCP client requests without first
4434	          being in communication with the other server, then the level
4435	          of DHCP redundancy is not particularly high.  This is an
4436	          appropriate approach if the possibility of partition is low,
4437	          or if the safe period expiration time is well beyond the time
4438	          at which an operator would notice and react to a partition
4439	          situation.  It is also quite appropriate if the safe period
4440	          will never expire.

4442	          Reasons to wait:

4444	          If the current server has been down for longer than the
4445	          maximum-client-lead-time, and it is partitioned from the other
4446	          server, then when it returns it will attempt to use its own
4447	          available addresses to allocate to new DHCP clients, and the
4448	          other server may well be in PARTNER-DOWN state and may have
4449	          already allocated some of those available addresses to DHCP
4450	          clients.  In cases where the possibility of partition is high,
4451	          and the safe period expiration time is less than the likely
4452	          operator reaction time, this is a good approach to use.
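
The state selection in the algorithm above can be sketched as
follows.  This is an illustration under stated assumptions, not a
definitive implementation: the function and parameter names are
hypothetical, states are plain strings, and times are POSIX seconds.
The text does not say what to do when the partner's PARTNER-DOWN
entry time equals this server's last recorded time of operation, so
the sketch leaves the previous-state unchanged in that case.

```python
def initial_previous_state(stored_state, lost_storage_configured,
                           configured_failure_time, is_primary):
    """Steps 1 and 2: derive (previous_state, time_of_failure).

    time_of_failure is None when a stored state exists.
    """
    if stored_state is not None:
        previous, failure_time = stored_state, None
    elif lost_storage_configured:
        previous, failure_time = "RECOVER", configured_failure_time
    else:
        # First startup: POSIX time 0 precedes now minus any MCLT.
        previous = "PARTNER-DOWN" if is_primary else "RECOVER"
        failure_time = 0
    if previous == "NORMAL":
        previous = "COMMUNICATIONS-INTERRUPTED"
    return previous, failure_time


def state_after_contact(previous_state, partner_state,
                        partner_state_start, last_operation_time):
    """Step 5: choose the current state once communications are okay."""
    if partner_state == "PARTNER-DOWN":
        if partner_state_start > last_operation_time:
            return "RECOVER"
        if partner_state_start < last_operation_time:
            return "POTENTIAL-CONFLICT"
    # Equal times are not covered by the text; keep previous_state.
    return previous_state
```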

4454	9.4.  PARTNER-DOWN state

4456	   PARTNER-DOWN state is a state either server can enter.  When in this
4457	   state, the server does not assume that the other server could still
4458	   be operating and servicing a different set of clients, but instead
4459	   assumes that it is the only server operating. If one server is in
4460	   PARTNER-DOWN state, the other server MUST NOT be operating.

4462	9.4.1.  Upon entry to PARTNER-DOWN state

4464	   No special actions are required when entering PARTNER-DOWN state.

4466	   The server should continue to attempt to connect to the partner
4467	   periodically.

4469	9.4.2.  Operation while in PARTNER-DOWN state

4471	   A server in PARTNER-DOWN state MUST respond to DHCP client requests.
4472	   It will allow renewal of all outstanding leases on IP addresses, and
4473	   will allocate IP addresses from its own pool, and after a fixed
4474	   period of time (the MCLT interval) has elapsed from entry into
4475	   PARTNER-DOWN state, it will allocate IP addresses from the set of all
4476	   available IP addresses.

4478	   Once a server has entered NORMAL state, the PARTNER-DOWN state is
4479	   entered only on command of an external agency (typically an adminis-
4480	   trator of some sort) or after the expiration of an externally config-
4481	   ured minimum safe-time after the beginning of COMMUNICATIONS-
4482	   INTERRUPTED state.

4484	   Any available IP address tagged as belonging to the other server (at
4485	   entry to PARTNER-DOWN state) MUST NOT be used until the maximum-
4486	   client-lead-time beyond the entry into PARTNER-DOWN state has
4487	   elapsed.
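
As an illustration of the rule above, the following Python sketch
filters the addresses a server in PARTNER-DOWN state could allocate
to new clients.  The representation of an available address as an
(address, owner) pair and the name allocatable_addresses are
assumptions; only the MCLT condition comes from the text.  The
sketch does not cover the additional per-binding restrictions
described in this section.

```python
def allocatable_addresses(available, own_role, now,
                          entered_partner_down, mclt):
    """Return the available addresses usable for new allocations.

    available: iterable of (address, owner) pairs, where owner is
    "primary" or "secondary" as tagged at entry to PARTNER-DOWN.
    """
    # The partner's addresses open up only once the MCLT has fully
    # elapsed since this server entered PARTNER-DOWN state.
    partner_pool_open = (now - entered_partner_down) >= mclt
    return [addr for addr, owner in available
            if owner == own_role or partner_pool_open]
```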

4489	   A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
4490	   DHCP client different from that to which it was allocated at the
4491	   entrance to PARTNER-DOWN state until the maximum-client-lead-time
4492	   beyond the maximum of the following times: client expiration time,
4493	   most recently transmitted potential-expiration-time, most recently
4494	   received ack of potential-expiration-time from the partner, and most
4495	   recently acked potential-expiration-time to the partner.  See section
4496	   7.1.4 for details.  If this time would be earlier than the current
4497	   time plus the maximum-client-lead-time, then the time the server
4498	   entered PARTNER-DOWN state plus the maximum-client-lead-time is used.

4500	   Two options exist for lease times given out while in PARTNER-DOWN
4501	   state, with different ramifications flowing from each.

4503	   If the server wishes the Failover protocol to protect it from loss of
4504	   stable storage in PARTNER-DOWN state, then it should ensure that the
4505	   MCLT based lease time restrictions in Section 5.1 are maintained,
4506	   even in PARTNER-DOWN state.

4508	   If the server wishes to forgo the protection of the Failover proto-
4509	   col in the event of loss of stable storage, then it need recognize no
4510	   restrictions on actual client lease times while in PARTNER-DOWN
4511	   state.

4513	   A server in PARTNER-DOWN state MUST continue to attempt to establish
4514	   communications and synchronization with its partner.

4516	9.4.3.  Transitions out of PARTNER-DOWN state

4518	   When a server in PARTNER-DOWN state succeeds in establishing a con-
4519	   nection to its partner, its actions are conditional on the state and
4520	   flags received in the STATE message from the other server as part of
4521	   the process of establishing the connection.

4523	   If the STARTUP bit is set in the server-flags option of a received
4524	   STATE message, a server in PARTNER-DOWN state MUST NOT take any state
4525	   transitions based on reestablishing communications. Essentially, if a
4526	   server is in PARTNER-DOWN state, it ignores all STATE messages from
4527	   its partner that have the STARTUP bit set in the server-flags option
4528	   of the STATE message.

4530	   If the STARTUP bit is not set in the server-flags option of a STATE
4531	   message received from its partner, then a server in PARTNER-DOWN
4532	   state takes the following actions based on the value of the server-
4533	   state option in the received STATE message:

4535	      o partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN or
4536	        POTENTIAL-CONFLICT state

4538	        transition to POTENTIAL-CONFLICT state

4540	      o partner in RECOVER state

4542	        stay in PARTNER-DOWN state

4544	      o partner in RECOVER-DONE state

4546	        transition into NORMAL state
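
The transitions above can be written as a small decision function.
This Python sketch is illustrative only; the function name and the
use of strings for state names are assumptions.

```python
def partner_down_next_state(partner_state, startup_flag):
    """Next state for a server in PARTNER-DOWN after a STATE message."""
    if startup_flag:
        # STATE messages with the STARTUP bit set are ignored.
        return "PARTNER-DOWN"
    if partner_state in ("NORMAL", "COMMUNICATIONS-INTERRUPTED",
                         "PARTNER-DOWN", "POTENTIAL-CONFLICT"):
        return "POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError("no transition listed for " + partner_state)
```

Raising an error for an unlisted partner state is a design choice of
the sketch; the text does not specify behavior for that case.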

4548	9.5.  RECOVER state

4550	   This state indicates that the server has no information in its stable
4551	   storage or that it is re-integrating with a server in PARTNER-DOWN
4552	   state after it has been down.  A server in this state will attempt to
4553	   refresh its stable storage from the other server.

4555	9.5.1.  Operation in RECOVER state

4557	   A server in RECOVER state MUST NOT respond to DHCP client requests.

4559	   A server in RECOVER state will attempt to reestablish communications
4560	   with the other server.

4562	9.5.2.  Transitions out of RECOVER state

4564	   If the other server is in POTENTIAL-CONFLICT state when communica-
4565	   tions are reestablished, then the server in RECOVER state will move
4566	   to POTENTIAL-CONFLICT state itself.

4568	   If the other server is in RECOVER state, then this server SHOULD sig-
4569	   nal an error and halt processing.

4571	   If the other server is in any other state, then the server in RECOVER
4572	   state will request an update of missing binding information by send-
4573	   ing an UPDREQ message.  If the server has been instructed (through
4574	   configuration or other external agency) that it has lost its stable
4575	   storage, it MUST send an UPDREQALL message; otherwise it MUST send an
4576	   UPDREQ message.

4578	   It will wait for an UPDDONE message, and upon receipt of that message
4579	   it will start a timer whose expiration is set to a time equal to the
4580	   time the server went down (if known) or the current time (if the
4581	   down-time is unknown) plus the maximum-client-lead-time.  When this
4582	   timer goes off, the server will transition into RECOVER-DONE state.
4583	   This is to allow any DHCP clients holding IP addresses allocated by
4584	   this server prior to loss of its client binding information in stable
4585	   storage to contact the other server or to time out.

4587	   See Figure 9.5.2-1.

4589	   DISCUSSION:

4591	      The actual requirement on this wait period in RECOVER is that it
4592	      start when the recovering server went down, not necessarily when
4593	      it came back up.  If the time when the recovering server failed is
4594	      known, it could be communicated to the recovering server (perhaps
4595	      through actions of the network administrator), and the wait period
4596	      could be reduced to the maximum-client-lead-time less the differ-
4597	      ence between the current time and the time the server failed.  In
4598	      this way, the waiting period could be minimized.
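
The arithmetic of the wait period can be sketched in Python as
follows.  The function names are hypothetical and all times are
assumed to be in seconds on a common clock.

```python
def recover_done_deadline(now, mclt, down_time=None):
    """Time at which the server may enter RECOVER-DONE state.

    now is the time the UPDDONE message was received; down_time is
    the time the server went down, or None if that is unknown.
    """
    start = now if down_time is None else down_time
    return start + mclt


def remaining_wait(now, mclt, down_time=None):
    """Seconds still to wait: MCLT less the time already spent down."""
    return max(0, recover_done_deadline(now, mclt, down_time) - now)
```

For example, with an MCLT of 3600 seconds and a server known to have
failed 600 seconds before UPDDONE arrives, the remaining wait is
3000 seconds; with an unknown failure time it is the full 3600.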

4600	   If an UPDDONE message isn't received within an implementation depen-
4601	   dent amount of time, and no BNDUPD messages are being received, then
4602	   the UPDREQ(ALL) message will be re-transmitted.

4604	                A                                        B
4605	              Server                                  Server

4607	                |                                        |
4608	             RECOVER                               PARTNER-DOWN
4609	                |                                        |
4610	                | >--UPDREQ-------------------->         |
4611	                |                                        |
4612	                |        <---------------------BNDUPD--< |
4613	                | >--BNDACK-------------------->         |
4614	               ...                                      ...
4615	                |                                        |
4616	                |        <---------------------BNDUPD--< |
4617	                | >--BNDACK-------------------->         |
4618	                |                                        |
4619	                |        <--------------------UPDDONE--< |
4620	                |                                        |
4621	       Wait MCLT from last known                         |
4622	          time of operation                              |
4623	                |                                        |
4624	           RECOVER-DONE                                  |
4625	                |                                        |
4626	                | >--STATE-(RECOVER-DONE)------>         |
4627	                |                                     NORMAL
4628	                |        <-------------(NORMAL)-STATE--< |
4629	             NORMAL                                      |
4630	                |                                        |
4631	                |                                        |

4633	              Figure 9.5.2-1:  Transition out of RECOVER state

4635	9.6.  NORMAL state

4637	   NORMAL state is the state used by a server when it is communicating
4638	   with the other server, and any required resynchronization has been
4639	   performed.  While some binding database synchronization is performed
4640	   in NORMAL state, potential conflicts and binding database data loss
4641	   are resolved prior to entry into NORMAL state.

4643	9.6.1.  Upon Entry to NORMAL state

4645	   When entering NORMAL state, a server will send to the other server
4646	   all currently unacknowledged binding updates as BNDUPD messages.

4648	   When the above process is complete, if the server entering NORMAL
4649	   state is a secondary server, then it will request IP addresses for
4650	   allocation using the POOLREQ message.

4652	9.6.2.  Processing DHCP client requests and load balancing

4654	   When in NORMAL state, each server MUST process all requests from some
4655	   DHCP clients, and MUST NOT process any request other than a
4656	   DHCPREQUEST/RENEWAL or a DHCPREQUEST/REBINDING request from some
4657	   other DHCP clients.

4659	   However, if the load balancing algorithm specified in [LOADB] is used
4660	   with a pair of servers implementing the failover protocol, then each
4661	   server needs to test each incoming DHCP client request to see if it
4662	   should process that request.

4664	   As discussed in section 5.3, each server will take the client-
4665	   identifier from each DHCP client request (or the client-hardware-
4666	   address, i.e., the htype concatenated to the front of the chaddr if
4667	   no client-identifier is present in the request) and use it as the
4668	   'Request ID' specified in [LOADB].  After applying the algorithm
4669	   specified in [LOADB] and comparing the result with the hash bucket
4670	   assignment (performed during connect processing between failover
4671	   servers), each failover server will be able to unambiguously deter-
4672	   mine if it should process the DHCP client request.

4674	   In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or
4675	   DHCPREQUEST/REBINDING request it receives.
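
The request test described in this section might look like the
following Python sketch.  It is not the [LOADB] algorithm itself:
the hash is passed in as a caller-supplied function (bucket_of), and
that parameter, the message_kind strings, and the my_buckets set are
all assumptions made for illustration.

```python
def request_id(client_identifier, htype, chaddr):
    """Build the 'Request ID' from a DHCP client request.

    Uses the client-identifier if present, otherwise the htype
    concatenated to the front of the chaddr.
    """
    if client_identifier:
        return bytes(client_identifier)
    return bytes([htype]) + bytes(chaddr)


def should_process(message_kind, req_id, my_buckets, bucket_of):
    """Decide whether this server handles a request in NORMAL state."""
    # RENEWAL and REBINDING DHCPREQUESTs are always processed.
    if message_kind in ("RENEWAL", "REBINDING"):
        return True
    # Otherwise process only if the hash bucket is assigned to us.
    return bucket_of(req_id) in my_buckets
```

Passing the hash in as a parameter keeps the sketch independent of
the details in [LOADB]; a real server would supply that algorithm
and the bucket assignment agreed during connect processing.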

4677	9.6.3.  Operation in NORMAL state

4679	   When in NORMAL state, for every DHCP client request that it
4680	   processes, as determined by the algorithm described in section 9.6.2,
4681	   above, a server will operate in the following manner:

4683	      o Lease time calculations

4685	        As discussed in section 5.2.1, "Control of lease time", the
4686	        lease interval given to a DHCP client can never be more than the
4687	        MCLT greater than the most recently received potential-
4688	        expiration-time from the failover partner or the current time,
4689	        whichever is later.

4691	        As long as a server adheres to this constraint, the specifics of
4692	        the lease interval that it gives to a DHCP client or the value
4693	        of the potential-expiration-time sent to its failover partner
4694	        are implementation dependent.  One possible approach is
4695	        discussed in section 5.2.1, but that particular approach is in
4696	        no way required by this protocol.

4698	        See section 7.1.4 for details concerning the storage of time
4699	        associated IP addresses and how to use these times when calcu-
4700	        lating lease times for DHCP clients.

4702	      o Lazy update of partner server

4704	        After an ACK of an IP address binding, the server servicing a
4705	        DHCP client request attempts to update its partner with the new
4706	        binding information.  The lease time used in the update of the
4707	        secondary MUST be that given to the DHCP client in the
4708	        DHCPACK, and the potential-expiration-time MUST be at least the
4709	        lease time, and SHOULD be longer.

4711	      o Reallocation of IP addresses between clients

4713	        Whenever a client binding is released or expires, a BNDUPD mes-
4714	        sage must be sent to the partner, setting the binding state to
4715	        RELEASED or EXPIRED.  However, until a BNDACK is received for
4716	        this message, the IP address cannot be allocated to another
4717	        client.  It can be allocated to the same client again.
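
The lease time constraint in the "Lease time calculations" item
above can be sketched in Python as follows.  The function names and
the desired_interval parameter are hypothetical; times are in
seconds, and a partner potential-expiration-time of None means none
has been received for the binding.

```python
def max_lease_interval(now, mclt, partner_potential_expiration=None):
    """Longest lease interval that may be given to a client."""
    base = now
    if partner_potential_expiration is not None:
        # Use whichever is later: the partner's time or now.
        base = max(now, partner_potential_expiration)
    return base + mclt - now


def lease_interval(desired_interval, now, mclt,
                   partner_potential_expiration=None):
    """Clamp the desired lease interval to the permitted maximum."""
    limit = max_lease_interval(now, mclt, partner_potential_expiration)
    return min(desired_interval, limit)
```

With no potential-expiration-time from the partner, the limit is
simply the MCLT, so a first lease is short and later renewals can
grow once the partner has acknowledged a later time.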

4719	   In NORMAL state, each server receives binding updates from its
4720	   partner server in BNDUPD messages.  It records these in its client
4721	   binding database in stable storage and then sends a corresponding
4722	   BNDACK message to its partner server.  It MUST ensure that the infor-
4723	   mation is recorded in stable storage prior to sending the BNDACK mes-
4724	   sage back to its partner server.

4726	9.6.4.  Transitions out of NORMAL state

4728	   If an external command is received by a server in NORMAL state
4729	   informing it that its partner is down, then transition into PARTNER-
4730	   DOWN state.

4732	   If a server in NORMAL state fails to receive acks to messages sent to
4733	   its partner for an implementation dependent period of time, it MAY
4734	   move into COMMUNICATIONS-INTERRUPTED state.  This situation might
4735	   occur if the partner server was capable of maintaining the TCP con-
4736	   nection between the servers and also capable of sending a CONTACT
4737	   message every tSend seconds, but was (for some reason) incapable of
4738	   processing BNDUPD messages.

4740	   If the communications is determined to not be "ok" (as defined in
4741	   section 8), then transition into COMMUNICATIONS-INTERRUPTED state.

4743	   If a server in NORMAL state receives any messages from its partner
4744	   where the partner has changed state from that expected by the server
4745	   in NORMAL state, then the server should transition into
4746	   COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
4747	   sition from there.  For example, it would be expected for the partner
4748	   to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
4749	   the partner to transition from NORMAL into POTENTIAL-CONFLICT state.

4751	9.7.  COMMUNICATIONS-INTERRUPTED State

4753	   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
4754	   unable to communicate with the other server.  Primary and secondary
4755	   servers cycle automatically (without administrative intervention)
4756	   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
4757	   connection between them fails and recovers, or as the partner server
4758	   cycles between operational and non-operational.  No duplicate IP
4759	   address allocation can occur while the servers cycle between these
4760	   states.

4762	9.7.1.  Upon Entry to COMMUNICATIONS-INTERRUPTED state

4764	   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
4765	   configured to support an automatic transition out of COMMUNICATIONS-
4766	   INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period"
4767	   has been configured, see section 10), then a timer MUST be started
4768	   for the length of the configured safe period.

4770	   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
4771	   the NORMAL state SHOULD raise some alarm condition to alert adminis-
4772	   trative staff to a potential problem in the DHCP subsystem.

4774	9.7.2.  Operation in COMMUNICATIONS-INTERRUPTED State

4776	   In this state a server MUST respond to all DHCP client requests, and
4777	   the algorithm for load balancing described in section 5.3 MUST NOT be
4778	   used.  When allocating new IP addresses, each server allocates from
4779	   its own IP address pool, where the primary MUST allocate only FREE IP
4780	   addresses, and the secondary MUST allocate only BACKUP IP addresses.
4781	   When responding to renewal requests, each server will allow continued
4782	   renewal of a DHCP client's current lease on an IP address irrespec-
4783	   tive of whether that lease was given out by the receiving server or
4784	   not, although the renewal period MUST NOT exceed the maximum client
4785	   lead time (MCLT) beyond the potential-expiration-time already ack-
4786	   nowledged by the other server or the lease-expiration-time or
4787	   potential-expiration-time received from the partner server.

4789	   However, since the server cannot communicate with its partner in this
4790	   state, the acknowledged-potential-expiration time will not be updated
4791	   in any new bindings.  This is likely to eventually cause the actual-
4792	   client-lease-times to be the current time plus the maximum-client-
4793	   lead-time (unless this is greater than the desired-client-lease-
4794	   time).

4796	9.7.3.  Transition out of COMMUNICATIONS-INTERRUPTED State

4798	   If the safe period timer expires while a server is in the
4799	   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
4800	   PARTNER-DOWN state.

4802	   If an external command is received by a server in COMMUNICATIONS-
4803	   INTERRUPTED state informing it that its partner is down, it will
4804	   transition immediately into PARTNER-DOWN state.

4806	   If communications is restored with the other server, then the server
4807	   in COMMUNICATIONS-INTERRUPTED state will transition into another
4808	   state based on the state of the partner:

4810	      o partner in NORMAL or COMMUNICATIONS-INTERRUPTED

4812	        The partner really SHOULD NOT be in NORMAL state here, since
4813	        upon restoration of communications it MUST have created a new
4814	        TCP connection which would have forced it into COMMUNICATIONS-
4815	        INTERRUPTED state.  Still, we should account for every state
4816	        just in case.

4818	        Transition into the NORMAL state.

4820	      o partner in RECOVER

4822	        Stay in COMMUNICATIONS-INTERRUPTED state.

4824	      o partner in RECOVER-DONE

4826	        Transition into NORMAL state.

4828	      o partner in PARTNER-DOWN or POTENTIAL-CONFLICT

4830	        Transition into POTENTIAL-CONFLICT state.

4832	      o partner in PAUSED

4834	        Stay in COMMUNICATIONS-INTERRUPTED state.

4836	      o partner in SHUTDOWN

4838	        Transition into PARTNER-DOWN state.
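
The list above maps directly onto a lookup table.  The following
Python sketch is illustrative; the table name, the function name,
and the use of strings for states are assumptions.

```python
# Next state for a server in COMMUNICATIONS-INTERRUPTED state, keyed
# by the partner's state when communications are restored.
COMM_INTERRUPTED_TRANSITIONS = {
    "NORMAL": "NORMAL",
    "COMMUNICATIONS-INTERRUPTED": "NORMAL",
    "RECOVER": "COMMUNICATIONS-INTERRUPTED",
    "RECOVER-DONE": "NORMAL",
    "PARTNER-DOWN": "POTENTIAL-CONFLICT",
    "POTENTIAL-CONFLICT": "POTENTIAL-CONFLICT",
    "PAUSED": "COMMUNICATIONS-INTERRUPTED",
    "SHUTDOWN": "PARTNER-DOWN",
}


def comm_interrupted_next_state(partner_state):
    """Look up the transition; unlisted states raise KeyError."""
    return COMM_INTERRUPTED_TRANSITIONS[partner_state]
```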

4840	   The following figure illustrates the transition from NORMAL to
4841	   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

4843	             Primary                                Secondary
4844	              Server                                  Server

4846	              NORMAL                                  NORMAL
4847	                | >--CONTACT------------------->         |
4848	                |        <--------------------CONTACT--< |
4849	                |         [TCP connection broken]        |
4850	           COMMUNICATIONS          :              COMMUNICATIONS
4851	             INTERRUPTED           :                INTERRUPTED
4852	                |      [attempt new TCP connection]      |
4853	                |         [connection succeeds]          |
4854	                |                                        |
4855	                | >--CONNECT------------------->         |
4856	                |        <-----------------CONNECTACK--< |
4857	                |        <-------------------STATE-----< |
4858	                |                                     NORMAL
4859	                | >--STATE--------------------->         |
4860	              NORMAL                                     |
4861	                | >--BNDUPD-------------------->         |
4862	                |        <---------------------BNDACK--< |
4863	                |                                        |
4864	                |        <---------------------BNDUPD--< |
4865	                | >------BNDACK---------------->         |
4866	               ...                                      ...
4867	                |                                        |
4868	                |        <--------------------POOLREQ--< |
4869	                | >--POOLRESP-(2)-------------->         |
4870	                |                                        |
4871	                | >--BNDUPD-(#1)--------------->         |
4872	                |        <---------------------BNDACK--< |
4873	                |                                        |
4874	                |        <--------------------POOLREQ--< |
4875	                | >--POOLRESP-(0)-------------->         |
4876	                |                                        |
4877	                | >--BNDUPD-(#2)--------------->         |
4878	                |        <---------------------BNDACK--< |
4879	                |                                        |

4881	       Figure 9.7.3-1:  Transition from NORMAL to COMMUNICATIONS-
4882	                        INTERRUPTED and back (example with 2
4883	                        addresses allocated to secondary)

4885	9.8.  POTENTIAL-CONFLICT state

4887	   This state indicates that the two servers are attempting to re-
4888	   integrate with each other, but at least one of them was running in a
4889	   state that did not guarantee automatic reintegration would be
4890	   possible.  In POTENTIAL-CONFLICT state the servers may determine that
4891	   the same IP address has been offered and accepted by two different
4892	   DHCP clients.

4894	   It is a goal of this protocol to minimize the possibility that
4895	   POTENTIAL-CONFLICT state is ever entered.

4897	9.8.1.  Upon Entry to POTENTIAL-CONFLICT

4899	   When a primary server enters POTENTIAL-CONFLICT state it should
4900	   request that the secondary send it all updates of which it is
4901	   currently unaware by sending an UPDREQ message to the secondary
4902	   server.

4904	   A secondary server entering POTENTIAL-CONFLICT state will wait for
4905	   the primary to send it an UPDREQ message.

4907	9.8.2.  Operation in POTENTIAL-CONFLICT state

4909	   Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
4910	   DHCP requests.

4912	9.8.3.  Transitions out of POTENTIAL-CONFLICT state

4914	   If communications fails with the partner while in POTENTIAL-CONFLICT
4915	   state, then a primary server will transition to PARTNER-DOWN state
4916	   and a secondary server will stay in POTENTIAL-CONFLICT state.

4918	   Whenever either server receives an UPDDONE message from its partner
4919	   while in POTENTIAL-CONFLICT state, it MUST transition to NORMAL
4920	   state.  This will cause the primary server to leave POTENTIAL-
4921	   CONFLICT state prior to the secondary, since the primary sends an
4922	   UPDREQ message and receives an UPDDONE before the secondary sends an
4923	   UPDREQ message and receives its UPDDONE message.

4925	   When a secondary server receives an indication that the primary
4926	   server has transitioned from POTENTIAL-CONFLICT to NORMAL state, it
4927	   SHOULD send an UPDREQ message to the primary server.

4929	              Primary                                Secondary
4930	              Server                                  Server

4932	                |                                        |
4933	         POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
4934	                |                                        |
4935	                | >--UPDREQ-------------------->         |
4936	                |                                        |
4937	                |        <---------------------BNDUPD--< |
4938	                | >--BNDACK-------------------->         |
4939	               ...                                      ...
4940	                |                                        |
4941	                |        <---------------------BNDUPD--< |
4942	                | >--BNDACK-------------------->         |
4943	                |                                        |
4944	                |        <--------------------UPDDONE--< |
4945	              NORMAL                                     |
4946	                | >--STATE--(NORMAL)----------->         |
4947	                |        <---------------------UPDREQ--< |
4948	                |                                        |
4949	                | >--BNDUPD-------------------->         |
4950	                |        <---------------------BNDACK--< |
4951	               ...                                      ...
4952	                | >--BNDUPD-------------------->         |
4953	                |        <---------------------BNDACK--< |
4954	                |                                        |
4955	                | >--UPDDONE------------------->         |
4956	                |                                     NORMAL
4957	                |                                        |
4958	                |        <--------------------POOLREQ--< |
4959	                | >------POOLRESP-(n)---------->         |
4960	                |              addresses                 |

4962	           Figure 9.8.3-1:  Transition out of POTENTIAL-CONFLICT
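
   As a non-normative illustration, the transitions described in this
   section can be sketched in Python as below.  The enum and function
   names are hypothetical and are not part of the protocol.  The sketch
   follows only the text of section 9.8.3 as written and does not model
   RESOLUTION-INTERRUPTED state (section 9.9).

```python
from enum import Enum, auto


class Role(Enum):
    PRIMARY = auto()
    SECONDARY = auto()


class State(Enum):
    POTENTIAL_CONFLICT = auto()
    NORMAL = auto()
    PARTNER_DOWN = auto()


class Event(Enum):
    COMMUNICATIONS_FAILED = auto()
    UPDDONE_RECEIVED = auto()


def next_state(role, event):
    """Return the next state for a server now in POTENTIAL-CONFLICT."""
    if event is Event.UPDDONE_RECEIVED:
        # Either server moves to NORMAL once its partner sends UPDDONE.
        return State.NORMAL
    if event is Event.COMMUNICATIONS_FAILED:
        # Only the primary leaves; the secondary stays where it is.
        if role is Role.PRIMARY:
            return State.PARTNER_DOWN
        return State.POTENTIAL_CONFLICT
    raise ValueError("event not covered by this sketch")
```

   Because the primary sends its UPDREQ first, it receives UPDDONE and
   reaches NORMAL before the secondary does, as the figure above shows.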

4964	9.9.  RESOLUTION-INTERRUPTED state

4966	   This state indicates that the two servers were attempting to re-
4967	   integrate with each other in POTENTIAL-CONFLICT state, but
4968	   communications failed prior to completion of re-integration.

4970	   If the servers remained in POTENTIAL-CONFLICT while communications
4971	   was interrupted, neither server would be responsive to DHCP client
4972	   requests, and if one server had crashed, then there might be no
4973	   server able to process DHCP requests.

4975	9.9.1.  Upon Entry to RESOLUTION-INTERRUPTED state

4977	   When a server enters RESOLUTION-INTERRUPTED state, it SHOULD raise an
4978	   alarm condition to alert administrative staff of a problem in the
4979	   DHCP subsystem.

4981	9.9.2.  Operation in RESOLUTION-INTERRUPTED state

4983	   In this state a server MUST respond to all DHCP client requests, and
4984	   any load balancing (described in section 5.3) MUST NOT be used.  When
4985	   allocating new IP addresses, each server SHOULD allocate from its own
4986	   IP address pool (if that can be determined), where the primary MUST
4987	   allocate only FREE IP addresses, and the secondary MUST allocate only
4988	   BACKUP IP addresses.  When responding to renewal requests, each
4989	   server will allow continued renewal of a DHCP client's current lease
4990	   on an IP address irrespective of whether that lease was given out by
4991	   the receiving server or not, although the renewal period MUST NOT
4992	   exceed the maximum client lead time (MCLT) beyond the potential-
4993	   expiration-time already acknowledged by the other server or the
4994	   lease-expiration-time or potential-expiration-time received from the
4995	   partner server.

4997	   However, since the server cannot communicate with its partner in this
4998	   state, the acknowledged-potential-expiration time will not be updated
4999	   in any new bindings.
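
   The allocation and renewal rules above might be sketched as follows.
   This is a non-normative Python illustration.  All names are
   hypothetical, times are plain seconds, and taking the later of the
   two known expiration times as the baseline for the MCLT limit is an
   interpretation of the text rather than something it states outright.

```python
def may_allocate(role, address_state):
    """New allocations: primary uses FREE, secondary uses BACKUP."""
    if role == "primary":
        return address_state == "FREE"
    if role == "secondary":
        return address_state == "BACKUP"
    raise ValueError("role must be 'primary' or 'secondary'")


def latest_renewal_expiry(mclt, acked_potential_expiration,
                          partner_reported_expiration):
    """Latest expiry a renewal may be given (assumed: later baseline)."""
    baseline = max(acked_potential_expiration,
                   partner_reported_expiration)
    return baseline + mclt


def capped_renewal_expiry(requested_expiry, mclt,
                          acked_potential_expiration,
                          partner_reported_expiration):
    """Clamp the expiry a client asks for to the MCLT limit."""
    limit = latest_renewal_expiry(mclt, acked_potential_expiration,
                                  partner_reported_expiration)
    return min(requested_expiry, limit)
```

   For example, with an MCLT of 3600 seconds and known expiration times
   of 1000 and 2000, no renewal would extend past time 5600.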

5001	9.9.3.  Transitions out of RESOLUTION-INTERRUPTED state

5003	   If an external command is received by a server in RESOLUTION-
5004	   INTERRUPTED state informing it that its partner is down, it will
5005	   transition immediately into PARTNER-DOWN state.

5007	   If communications is restored with the other server, then the server
5008	   in RESOLUTION-INTERRUPTED state will transition into POTENTIAL-
5009	   CONFLICT state.

5011	9.10.  RECOVER-DONE state

5013	   This state exists to allow an interlocked transition for one server
5014	   from RECOVER state and another server from PARTNER-DOWN or
5015	   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

5017	9.10.1.  Operation in RECOVER-DONE state

5019	   A server in RECOVER-DONE state MUST respond only to
5020	   DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.

5022	9.10.2.  Transitions out of RECOVER-DONE state

5024	   When a server in RECOVER-DONE state determines that its partner
5025	   server has entered NORMAL state, then it will transition into NORMAL
5026	   state as well.

5028	9.11.  PAUSED state

5030	   This state exists to allow one server to inform another that it will
5031	   be out of service for what is predicted to be a relatively short
5032	   time, and to allow the other server to transition to COMMUNICATIONS-
5033	   INTERRUPTED state immediately and to begin servicing all DHCP clients
5034	   with no interruption in service to new DHCP clients.

5036	   A server which is aware that it is shutting down temporarily SHOULD
5037	   send a STATE message with the server-state option containing PAUSED
5038	   state and close the TCP connection.

5040	   While a server may or may not transition internally into PAUSED
5041	   state, the 'previous' state determined when it is restarted MUST be
5042	   the state the server was in prior to receiving the command to shut
5043	   down and restart and which precedes its entry into the PAUSED state.
5044	   See section 9.3.2 concerning the use of the previous state upon
5045	   server restart.

5047	9.11.1.  Upon entry to PAUSED state

5049	   When entering PAUSED state, the server MUST store the previous state
5050	   in stable storage, and use that state as the previous state when it
5051	   is restarted.

5053	9.11.2.  Transitions out of PAUSED state

5055	   A server transitions out of PAUSED state by being restarted.  At that
5056	   time, the previous state MUST be the state the server was in prior to
5057	   entering the PAUSED state.

5059	9.12.  SHUTDOWN state

5061	   This state exists to allow one server to inform another that it will
5062	   be out of service for what is predicted to be a relatively long time,
5063	   and to allow the other server to transition immediately to PARTNER-
5064	   DOWN state, and take over completely for the server going down.

5066	   A server which is aware that it is shutting down SHOULD send a STATE
5067	   message with the server-state field containing SHUTDOWN.

5069	   While a server may or may not transition internally into SHUTDOWN
5070	   state, the 'previous' state determined when it is restarted MUST be
5071	   the state active prior to the command to shut down.  See section
5072	   9.3.2 concerning the use of the previous state upon server restart.

5074	9.12.1.  Upon entry to SHUTDOWN state

5076	   When entering SHUTDOWN state, the server MUST record the previous
5077	   state in stable storage for use when the server is restarted.  It
5078	   also MUST record the current time as the last time operational.

5080	   A server which is aware that it is shutting down SHOULD send a STATE
5081	   message with the server-state field containing SHUTDOWN.

5083	9.12.2.  Operation in SHUTDOWN state

5085	   A server in SHUTDOWN state MUST NOT respond to any DHCP client input.

5087	   If a server receives any message indicating that the partner has
5088	   moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
5089	   MUST record RECOVER state as the previous state to be used when it is
5090	   restarted.

5092	   A server SHOULD wait for a few seconds after informing the partner of
5093	   entry into SHUTDOWN state (if communications are okay) to determine
5094	   whether the partner will enter PARTNER-DOWN state.
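
   A minimal, non-normative Python sketch of the rule for choosing the
   'previous' state to store while in SHUTDOWN state follows.  The
   function and parameter names are hypothetical, and states are
   represented as plain strings for brevity.

```python
def previous_state_to_store(state_before_shutdown, partner_state=None):
    """Pick the 'previous' state a server in SHUTDOWN should record.

    partner_state is the most recent state reported by the partner
    while this server is in SHUTDOWN, or None if nothing was received.
    """
    if partner_state == "PARTNER-DOWN":
        # The partner has taken over completely, so this server must
        # come back up as though its previous state were RECOVER.
        return "RECOVER"
    return state_before_shutdown
```

   Waiting a few seconds before exiting gives the partner's STATE
   message a chance to arrive, so that the stored value is up to date.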

5096	9.12.3.  Transitions out of SHUTDOWN state

5098	   A server transitions out of SHUTDOWN state by being restarted.

5100	10.  Safe Period

5102	   Due to the restrictions imposed on each server while in
5103	   COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
5104	   is not feasible for either server.  One reason that this state
5105	   exists at all is to allow the servers to easily survive transient
5106	   network communications failures of a few minutes to a few days
5107	   (although the actual time periods will depend a great deal on the
5108	   DHCP activity of the network in terms of arrival and departure of
5109	   DHCP clients on the network).

5111	   Eventually, when the servers are unable to communicate, they will
5112	   have to move into a state where they no longer can re-integrate
5113	   without some possibility of a duplicate IP address allocation.  There
5114	   are two ways that they can move into this state (known as PARTNER-
5115	   DOWN).

5117	   First, they can be informed by external command that the partner
5118	   server is, indeed, down.  In this case, there is no difficulty in
5119	   moving into the PARTNER-DOWN state, since it is an accurate
5120	   reflection of reality and the protocol has been designed to operate
5121	   correctly (even during reintegration) if the partner is, indeed,
5122	   down while the server is in PARTNER-DOWN state.

5124	   The more difficult scenario is when the servers are running unat-
5125	   tended for extended periods, and in this case an option is provided
5126	   to configure something called a "safe-period" into each server.  This
5127	   OPTIONAL safe-period is the period after which either the primary or
5128	   secondary server will automatically transition to PARTNER-DOWN from
5129	   COMMUNICATIONS-INTERRUPTED state.  If this transition is completed
5130	   and the partner is not down, then the possibility of duplicate IP
5131	   address allocations will exist.
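
   The automatic transition can be pictured with the non-normative
   Python sketch below.  The names are hypothetical, times are in
   seconds, and treating the exact moment of expiry as "expired" is an
   assumption.  An unconfigured safe-period is modelled as None.

```python
from typing import Optional

COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
PARTNER_DOWN = "PARTNER-DOWN"


def state_after_check(seconds_in_state: float,
                      safe_period: Optional[float]) -> str:
    """State of a server that has been in COMMUNICATIONS-INTERRUPTED.

    seconds_in_state is the time spent in that state so far.
    """
    if safe_period is not None and seconds_in_state >= safe_period:
        # Safe-period expired: assume the partner is down.
        return PARTNER_DOWN
    # No safe-period configured, or it has not yet run out.
    return COMMUNICATIONS_INTERRUPTED
```

   Without a configured safe-period, only an external command moves the
   server into PARTNER-DOWN state.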

5133	   The goal of the "safe-period" is to allow network operations staff
5134	   some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
5135	   state.  During the safe-period the only requirement is that the net-
5136	   work operations staff determine if both servers are still running --
5137	   and if they are, to either fix the network communications failure
5138	   between them, or to take one of the servers down before the
5139	   expiration of the safe-period.

5141	   The length of the safe-period is installation dependent, and depends
5142	   in large part on the number of unallocated IP addresses within the
5143	   subnet address pool and the expected frequency of arrival of previ-
5144	   ously unknown DHCP clients requiring IP addresses.  Many environments
5145	   should be able to support safe-periods of several days.

5147	   During this safe period, either server will allow renewals from any
5148	   existing client.  The only limitation concerns the need for IP
5149	   addresses for the DHCP server to hand out to new DHCP clients and the
5150	   need to re-allocate IP addresses to different DHCP clients.

5152	   The number of "extra" IP addresses required is equal to the expected
5153	   total number of new DHCP clients encountered during the safe period.
5154	   This is dependent only on the arrival rate of new DHCP clients, not
5155	   the total number of outstanding leases on IP addresses.
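
   As a rough worked example of this sizing rule, the Python sketch
   below multiplies an assumed arrival rate by the safe period.  The
   function name, the units, and the sample figures are illustrative
   assumptions only and are not taken from any measurement.

```python
import math


def extra_addresses_needed(new_clients_per_hour, safe_period_hours):
    """Expected number of new clients arriving during the safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


# Assumed figures: 2 new clients per hour, 3-day (72 hour) safe period.
print(extra_addresses_needed(2, 72))  # prints 144
```

   The number of leases already outstanding does not enter the
   calculation, since existing clients can keep renewing.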

5157	   In the unlikely event that a relatively short safe period of an hour
5158	   is all that can be used (given a dearth of IP addresses or a very
5159	   high arrival rate of new DHCP clients), even that can provide sub-
5160	   stantial benefits in allowing the DHCP subsystem to ride through
5161	   minor problems that could occur and be fixed within that hour.  In
5162	   these cases, no possibility of duplicate IP address allocation
5163	   exists, and re-integration after the failure is solved will be
5164	   automatic and require no operator intervention.

5166	11.  Security

5168	   The Failover protocol communicates DHCP lease activity and this data
5169	   is generally easily discovered via other means, such as by pinging
5170	   addresses and doing DNS lookups. Therefore, the need to encrypt the
5171	   data over the wire is likely not great (though some sites may feel
5172	   differently).

5174	   However, it is very desirable to assure the integrity of failover
5175	   partners and to thus ensure proper operation of the servers. For
5176	   example, denial of service attacks are possible by the communication
5177	   of invalid state information to one or both servers.

5179	   Therefore, the Failover protocol MUST be capable of being secured by
5180	   using a simple shared secret message digest which covers each mes-
5181	   sage.  This provides authentication of the servers, but does not pro-
5182	   vide encryption of the data exchange.

5184	   The Failover protocol MAY also be secured by using TLS [TLS] (Tran-
5185	   sport Layer Security) if encryption of the data exchange is desired.
5186	   The use of the shared secret or TLS will not protect against TCP or
5187	   IP layer attacks (such as someone sending fake TCP RST segments).
5188	   IPsec SHOULD be used to protect against most (if not all) of these
5189	   kinds of attacks.

5191	11.1.  Simple shared secret

5193	   Messages between the failover partners are authenticated through the
5194	   use of a shared secret, which is never sent over the network and must
5195	   be known by each server. How each server is told about this shared
5196	   secret and secures its storage of the shared secret is outside the
5197	   scope of this document.  If a server is configured with a shared
5198	   secret for a partner, it MUST send the message-digest option in ALL
5199	   messages to that partner and it MUST treat any messages received from
5200	   that partner without a message-digest option as failing authentica-
5201	   tion.

5203	   If a server is not configured with a shared secret for a partner, it
5204	   MUST NOT send the message-digest option in any message to that
5205	   partner and it MUST treat any messages received from that partner
5206	   with a message-digest option as failing authentication.

5208	   The shared secret is used to calculate a 16 octet message-digest
5209	   which is sent in every failover message as the message-digest option.
5210	   See section 6.2.25. The message-digest contains a one-way 16 octet
5211	   MD5 [MD5] hash calculated over a stream of octets consisting of the
5212	   entire message concatenated with the shared secret.

5214	   For calculation, the message includes the message-digest option with
5215	   the message-digest data zeroed (16-octets of zero). Once the calcula-
5216	   tion is complete, these 16 octets of zero are replaced by the 16-
5217	   octet MD5 hash and the message is sent.

5219	   For verification, the 16-octet message-digest is saved and replaced
5220	   with 16-octets of zero and calculated per above. The resulting MD5
5221	   hash is compared to the received hash and if they match, the message
5222	   is assumed authenticated.
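
   The calculation and verification steps above can be sketched in
   Python as follows.  This is a non-normative illustration.  The
   function names and the "offset" parameter (the position of the 16
   digest octets within the message) are hypothetical; the actual
   option layout is defined in section 6.2.25.

```python
import hashlib
import hmac

DIGEST_LEN = 16  # size of an MD5 hash in octets


def _with_digest(message, offset, digest):
    """Return message with the 16 digest octets at offset replaced."""
    return message[:offset] + digest + message[offset + DIGEST_LEN:]


def compute_digest(message, offset, secret):
    """MD5 over the message (digest zeroed) followed by the secret."""
    zeroed = _with_digest(message, offset, bytes(DIGEST_LEN))
    return hashlib.md5(zeroed + secret).digest()


def sign(message, offset, secret):
    """Fill in the message-digest data before the message is sent."""
    digest = compute_digest(message, offset, secret)
    return _with_digest(message, offset, digest)


def verify(message, offset, secret):
    """Recompute the hash and compare it with the received one."""
    received = message[offset:offset + DIGEST_LEN]
    expected = compute_digest(message, offset, secret)
    return hmac.compare_digest(received, expected)
```

   The constant-time comparison via hmac.compare_digest is a design
   choice of this sketch; the text above only requires that the two
   hashes match.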

5224	   A failover partner that fails to authenticate a received message or
5225	   receives a message without a message-digest option when configured
5226	   with a shared secret MUST close the connection immediately and take
5227	   steps to notify operators.

5229	   This use of the shared secret is very similar to that used for RADIUS
5230	   Accounting [RADIUS].

5232	11.2.  TLS

5234	   TLS, Transport Layer Security, as specified in [TLS] MAY be used.
5235	   The use of TLS would be similar to the way it is used with SMTP
5236	   [SMTPTLS] and IMAP/POP3/ACAP [IMAPTLS].

5238	   To request the use of TLS, the server that successfully opened a con-
5239	   nection to its peer MUST send the TLS option as part of the CONNECT
5240	   message.  The server receiving the TLS option MUST respond with a
5241	   TLS-reply option indicating its acceptance or rejection of the TLS-
5242	   request in the CONNECT message.

5244	   If the CONNECTACK message contained a TLS-reply of 1, then both
5245	   servers begin TLS negotiation.

5247	   Upon completion of this negotiation, the server which originally sent
5248	   the CONNECT message MUST resend its CONNECT message without any TLS-
5249	   request, and must wait for a corresponding CONNECTACK.
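
   The ordering of messages seen by the connecting server might be
   summarized by the non-normative Python sketch below.  The function
   name and the step strings are hypothetical, and the handling of any
   TLS-reply value other than 1 is deliberately left out because it is
   not covered by the text above.

```python
def initiator_steps(request_tls, tls_reply=None):
    """List the steps the connecting server performs, in order."""
    if request_tls:
        steps = ["send CONNECT with TLS option"]
    else:
        steps = ["send CONNECT"]
    steps.append("receive CONNECTACK")
    if request_tls and tls_reply == 1:
        # The partner accepted: negotiate TLS, then repeat the CONNECT
        # exchange, this time without any TLS-request.
        steps.append("negotiate TLS")
        steps.append("send CONNECT without TLS-request")
        steps.append("receive CONNECTACK")
    return steps
```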

5251	   Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [TLS] cipher
5252	   suite is REQUIRED in Failover servers supporting TLS. This is impor-
5253	   tant as it assures that any two compliant implementations can be con-
5254	   figured to interoperate.

5256	12.  Acknowledgments

5258	   Ralph Droms started it all, by sketching out an initial interserver
5259	   draft that embodied ideas from several past IETF meetings.  In that
5260	   draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
5261	   Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

5263	   Kim Kinnear and Bob Cole each extended that draft, separately and
5264	   then together, until they created an interserver draft that supported
5265	   any number of servers.  The complexity of that approach was just too
5266	   great, and that draft wasn't greeted with enthusiasm by many, includ-
5267	   ing its authors.

5269	   It did however lead to a much simpler approach embodied in the first
5270	   Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
5271	   Droms.  This draft posited only two servers -- a primary and a secon-
5272	   dary.

5274	   Kim Kinnear then wrote the Safe Failover draft to layer on top of the
5275	   Failover Draft and increase its robustness in the face of certain
5276	   rare network failures.

5278	   At the spring 1998 IETF meeting in LA, the DHC working group said
5279	   that they wanted a merged Failover and Safe Failover draft.  Steve
5280	   Gonczi and Bernie Volz stepped up and produced the raw material for
5281	   such a merged draft, along with a new message format designed around
5282	   DHCP options and other extensions and clarifications.  Kim Kinnear
5283	   edited their work into draft format and made other changes in time
5284	   for the Summer Chicago IETF meeting.

5286	   During the summer and fall of 1998, two groups worked on separate
5287	   implementations of the UDP failover draft.  Bernie Volz and Steve
5288	   Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul
5289	   Fox made up the other.  These two groups worked together to produce
5290	   considerable changes and simplifications of the protocol during that
5291	   period, and Steve Gonczi and Kim Kinnear edited those changes into
5292	   the -03 draft in time for submission to the December 1998 Orlando
5293	   IETF meeting.

5295	   In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of
5296	   people interested in the failover draft.  During that meeting a gen-
5297	   eral agreement was reached to recast the failover protocol to use TCP
5298	   instead of UDP.  In addition, the group together brainstormed a work-
5299	   able load-balancing technique.  Kim Kinnear rewrote the entire draft
5300	   to include the changes made at that meeting as well as to restructure
5301	   the draft along guidelines suggested by Thomas Narten.  The result
5302	   was the -04 draft, submitted prior to the Oslo IETF meeting.

5304	   The initial idea for a hash-based load balancing approach was offered
5305	   by Ted Lemon, and the determination of an algorithm and its integra-
5306	   tion into the draft was done by Steve Gonczi.  The security section
5307	   was spearheaded by Bernie Volz.  Both contributed considerably to the
5308	   ideas and text in the rest of the draft with several reviews.

5310	   In early October of 1999, three conference calls were held to discuss
5311	   the -04 draft.  The current draft (-05) includes changes as a result
5312	   of those calls, perhaps the largest of which was to move the load-
5313	   balancing approach into a separate draft.  Thanks to all of the many
5314	   people who participated in the conference calls.  This current draft
5315	   was changed because of contributions by: Ted Lemon, David Erdmann,
5316	   Richard Jones, Rob Stevens, Thomas Narten, Diana Lane, and Andre Kos-
5317	   tur.

5319	   These most recent changes have been widely circulated among the other
5320	   authors, but that does not preclude any of them from expressing
5321	   disagreement with what is contained in this draft at any future time.

5323	   Many people have reviewed the various earlier drafts that went into
5324	   this result.  At American Internet, ideas were contributed by Brad
5325	   Parker.  At Cisco Systems Paul Fox and Ellen Garvey contributed to
5326	   the design of the protocol.

5328	   Glenn Waters of Nortel Networks contributed ideas and enthusiasm to
5329	   make a Failover protocol that was both "safe" and "lazy".

5331	13.  References

5333	   [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
5334	      2131, March 1997.

5336	   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
5337	      Requirement Levels", BCP 14, RFC 2119, March 1997.

5339	   [RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
5340	      Extensions", RFC 2132, March 1997.

5342	   [TLS] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246, January
5343	      1999.

5345	   [SMTPTLS] Hoffman, P., "SMTP Service Extension for Secure SMTP over
5346	      TLS", RFC 2487, January 1999.

5348	   [IMAPTLS] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC
5349	      2595, June 1999.

5351	   [NAMESPACE] Carney, M., "draft-ietf-dhc-option_review_and_namespace-
5352	      00.txt", June 1999.

5354	   [DDNS] Rekhter, Y., Stapp, M., "draft-ietf-dhc-dhcp-dns-11.txt",
5355	      October, 1999.

5357	   [MD5] Rivest, R., and Dusse, S., "The MD5 Message-Digest Algorithm",
5358	      RFC 1321, MIT Laboratory for Computer Science, RSA Data Security
5359	      Inc., April 1992.

5361	   [RADIUS] Rigney, C., "RADIUS Accounting", RFC 2139, Livingston
5362	      Enterprises, April 1997.

5364	   [LOADB] Volz, B., Gonczi, S., Lemon, T., Stevens, R., "draft-ietf-
5365	      dhc-loadb-00.txt", October, 1999.

5367	   [RFC1035] Mockapetris, P., "Domain Names - Implementation and
5368	      Specification", STD 13, RFC 1035, November 1987.

5370	   [AGENTINFO] Patrick, M., "draft-ietf-dhc-agent-options-07.txt",
5371	      August, 1999.

5372	   [USERCLASS] Stump, G., Droms, R., "draft-ietf-dhc-userclass-04.txt",
5373	      October, 1999.

5374	   [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., Bound, J., "Dynamic
5375	      Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April
5376	      1997.

5378	14.  Authors' information

5380	      Ralph Droms
5381	      323 Dana Engineering
5382	      Bucknell University
5383	      Lewisburg, PA  17837

5385	      Phone: (717) 524-1145
5386	      EMail: droms@bucknell.edu

5388	      Greg Rabil, Mike Dooley, Arun Kapur
5389	      Lucent Technologies
5390	      10 Valley Stream Parkway, Suite 240
5391	      Malvern, PA 19355

5393	      Phone: (800) 208-2747

5395	      EMail: grabil@lucent.com
5396	             mdooley@lucent.com
5397	             akapur@lucent.com

5399	      Kim Kinnear
5400	      Mark Stapp
5401	      Cisco Systems
5402	      250 Apollo Drive
5403	      Chelmsford, MA  01824

5405	      Phone: (978) 244-8000

5407	      EMail: kkinnear@cisco.com
5408	             mjs@cisco.com

5410	      Bernie Volz
5411	      Steve Gonczi
5412	      Process Software Corporation
5413	      959 Concord St.
5414	      Framingham, MA  01701

5416	      Phone: (508) 879-6994

5418	      EMail: volz@process.com
5419	             gonczi@process.com

5421	15.  Full Copyright Statement

5423	Copyright (C) The Internet Society (1999). All Rights Reserved.

5425	This document and translations of it may be copied and furnished to oth-
5426	ers, and derivative works that comment on or otherwise explain it or
5427	assist in its implementation may be prepared, copied, published and dis-
5428	tributed, in whole or in part, without restriction of any kind, provided
5429	that the above copyright notice and this paragraph are included on all
5430	such copies and derivative works.  However, this document itself may not
5431	be modified in any way, such as by removing the copyright notice or
5432	references to the Internet Society or other Internet organizations,
5433	except as needed for the  purpose of developing Internet standards in
5434	which case the procedures for copyrights defined in the Internet Stan-
5435	dards process must be followed, or as required to translate it into
5436	languages other than English.

5438	The limited permissions granted above are perpetual and will not be
5439	revoked by the Internet Society or its successors or assigns.

5441	This document and the information contained herein is provided on an "AS
5442	IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
5443	FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
5444	LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
5445	INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FIT-
5446	NESS FOR A PARTICULAR PURPOSE.

5448	Open Issues

5450	   These issues need to be resolved:

5452	      1.  Need to figure out how to get 16 bit options without referenc-
5453	          ing the [NAMESPACE] draft, since it doesn't really define them
5454	          anymore.

5456	      2.  We need to deal with the option space, and the procedures for
5457	          managing it.  Probably IANA.

5459	      3.  Figure out a better way to identify vendors.  How about an
5460	          SNMP Enterprise MIB value?

5462	      4.  Need to tie reject-reasons to text of draft, remove obsolete
5463	          reject-reasons.

5465	      5.  Using tables, compress description of sending BNDUPD message
5466	          to save duplicated words, enhance description of differences.