idnits 2.17.1 

draft-ietf-dhc-dhcpv6-failover-requirements-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC3315]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 19, 2013) is 3928 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 3315 (Obsoleted by RFC 8415)

  ** Obsolete normative reference: RFC 3633 (Obsoleted by RFC 8415)

  == Outdated reference: A later version (-02) exists of
     draft-ietf-dhc-dhcpv6-load-balancing-00


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Dynamic Host Configuration (DHC)                            T. Mrugalski
3	Internet-Draft                                                       ISC
4	Intended status: Informational                                K. Kinnear
5	Expires: January 20, 2014                                          Cisco
6	                                                           July 19, 2013

8	                      DHCPv6 Failover Requirements
9	             draft-ietf-dhc-dhcpv6-failover-requirements-07

11	Abstract

13	   The DHCPv6 protocol, defined in [RFC3315], allows for multiple
14	   servers to operate on a single network; however, it does not define
15	   any way the servers could share information about currently active
16	   clients and their leases.  Some sites are interested in running
17	   multiple servers in such a way as to provide increased availability
18	   in case of server failure.  In order for this to work reliably, the
19	   cooperating primary and secondary servers must maintain a consistent
20	   database of the lease information.  [RFC3315] allows for but does not
21	   define any redundancy or failover mechanisms.  This document outlines
22	   requirements for DHCPv6 failover, enumerates related problems, and
23	   discusses the proposed scope of work to be conducted.  This document
24	   does not define a DHCPv6 failover protocol.

26	Status of this Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at http://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on January 20, 2014.

43	Copyright Notice

45	   Copyright (c) 2013 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	   2.  Definitions
62	   3.  Scope of work
63	     3.1.  Alternatives to Failover
64	       3.1.1.  Short-lived addresses
65	       3.1.2.  Redundant servers
66	       3.1.3.  Distributed databases
67	       3.1.4.  Load Balancing
68	   4.  Failover Scenarios
69	     4.1.  Hot Standby Model
70	     4.2.  Geographically Distributed Failover
71	     4.3.  Load balancing
72	     4.4.  1-to-1, m-to-1 and m-to-n models
73	     4.5.  Split prefixes
74	     4.6.  Long lived connections
75	     4.7.  Partial server communication loss
76	   5.  Principles of DHCPv6 Failover
77	     5.1.  Failure modes
78	       5.1.1.  Server Failure
79	       5.1.2.  Network partition
80	     5.2.  Synchronization mechanisms
81	       5.2.1.  Lockstep
82	       5.2.2.  Lazy updates
83	   6.  DHCPv4 and DHCPv6 Failover Comparison
84	   7.  DHCPv6 Failover Requirements
85	     7.1.  Features out of scope
86	   8.  Security Considerations
87	   9.  IANA Considerations
88	   10. Acknowledgements
89	   11. References
90	     11.1. Normative References
91	     11.2. Informative References
92	   Authors' Addresses

94	1.  Introduction

96	   The DHCPv6 protocol, defined in [RFC3315], allows for multiple
97	   servers to operate on a single network; however, it does not define
98	   how the servers can share the same address and prefix delegation
99	   pools and allow a client to seamlessly extend its existing leases
100	   when the original server is down.  [RFC3315] provides for these
101	   capabilities, but does not document how the servers cooperate and
102	   communicate to provide them.  Some sites are interested in running
103	   multiple servers in such a way as to provide redundancy in case of
104	   server failure.  In order for this to work reliably, the cooperating
105	   primary and secondary servers must maintain a consistent database of
106	   the lease information.

108	   This document discusses failover implementation scenarios, failure
109	   modes, and synchronization approaches to provide background to the
110	   list of requirements for a DHCPv6 failover protocol.  It then defines
111	   a minimum set of requirements that failover must provide to be
112	   useful, while acknowledging that additional features may be specified
113	   as extensions.  This document does not define a DHCPv6 failover
114	   protocol.

116	   The failover model, to which these requirements apply, will initially
117	   be a pairwise "hot standby" model (see Section 4.1) with a primary
118	   server used in normal operation switching over to a backup secondary
119	   server in the event of failure.  Optionally, a secondary server may
120	   provide failover service for multiple primary servers.  However, the
121	   requirements will not preclude a future load-balancing extension
122	   where there is a symmetric failover relationship.

124	   The DHCPv6 failover concept borrows heavily from its DHCPv4
125	   counterpart [dhcpv4-failover], which never completed the
126	   standardization process but has several successful, operationally
127	   proven vendor-specific implementations.  For a discussion of
128	   commonalities and differences, see Section 6.

130	2.  Definitions

132	   This section defines terms that are relevant to DHCPv6 failover.

134	   Definitions from [RFC3315] are included by reference.  In particular,
135	   a client means any device (e.g., an end-user host, a CPE (Customer
136	   Premises Equipment), or another router) that implements the client
137	   functionality of the DHCPv6 protocol.  A server means a DHCPv6
138	   server, unless explicitly noted otherwise.  A relay is a DHCPv6 relay.

140	      A binding (or client binding) is a group of server data records
141	      containing the information the server has about the addresses in
142	      an IA (Identity Association, see Section 10 of [RFC3315]) or
143	      configuration information explicitly assigned to the client.
144	      Configuration information that has been returned to a client
145	      through a policy - for example, the information returned to all
146	      clients on the same link - does not require a binding.

148	      DDNS - an abbreviation for "Dynamic DNS", which refers to the
149	      capability to update a DNS server's name database using the on-
150	      the-wire protocol defined in [RFC2136].  Clients and servers can
151	      negotiate the scope of such updates as defined in [RFC4704].

153	      Failover - an ability of one partner to continue offering services
154	      provided by another partner, with minimal or no impact on clients.

156	      FQDN - a fully qualified domain name.  A fully qualified domain
157	      name generally is a host name with at least one domain label under
158	      the top-level domain.  For example "dhcp.example.org" is a fully
159	      qualified domain name.

161	      High Availability - a desired property of DHCPv6 servers to
162	      continue providing services despite experiencing unwanted events
163	      such as server crashes, link failures, or network partitions.

165	      Load Balancing - the ability for two or more servers to each
166	      process some portion of the client request traffic in a conflict-
167	      free fashion.

169	      Lease - an IPv6 address, an IPv6 prefix or other resource that was
170	      assigned ("leased") by a server to a specific client.  A lease may
171	      include additional information, like associated fully qualified
172	      domain name (FQDN) and/or information about associated DNS
173	      updates.  A client obtains a lease for a specified period of time
174	      (valid lifetime).

176	      Partner - A "partner", for the purpose of this document, refers to
177	      a failover server, typically the other failover server in a
178	      failover relationship.

180	      Stable Storage - each DHCP server is required to keep its lease
181	      database in some form of storage (known as "stable storage") that
182	      will be consistent throughout reboots, crashes and power failures.

184	      Partner Failure - A power outage, unexpected shutdown, crash or
185	      other type of failure that renders a partner unable to continue
186	      its operation.

188	3.  Scope of work

190	   In order to fit within the IETF process effectively and efficiently,
191	   the standardization effort for DHCPv6 failover is expected to proceed
192	   with the creation of documents of increasing specificity.  It begins
193	   with this document specifying the requirements for DHCPv6 failover
194	   ("requirements document").  Later documents are expected to address
195	   the design of the DHCPv6 failover protocol ("design document"), and
196	   if sufficient interest exists, the protocol details required to
197	   implement the DHCPv6 failover protocol itself ("protocol document").
198	   The goal of this partitioning is, in part, to ease the validation,
199	   review, and approval of the DHCPv6 failover protocol by presenting it
200	   in comprehensible parts to the larger community.

202	   Additional documents describing extensions may also be defined.

204	   DHCPv6 Failover requirements are presented in Section 7.

206	3.1.  Alternatives to Failover

208	   There are many scenarios when it seems that a failover capability
209	   would be useful.  However, there are often much simpler approaches
210	   that will meet the required goals.  This section documents examples
211	   where failover is not really needed.

213	3.1.1.  Short-lived addresses

215	   There are cases when IPv6 addresses are used only for a short time,
216	   but there is a need to have a high degree of confidence that those
217	   addresses will be served.  A notable example is PXE (Preboot
218	   eXecution Environment) [RFC5970].  This is a mechanism for obtaining
219	   configuration early in the process of bootstrapping over the network.

221	   The PXE BIOS acquires an address in order to load the operating
222	   system image and continue booting.  The address and possibly other
223	   configuration parameters are used during the boot process and are
224	   discarded thereafter.  Any lack of available DHCPv6 service at this
225	   time will prevent such devices from booting.

227	   Instead of deploying failover, it is better to use the much simpler
228	   preference mechanism, defined in [RFC3315].  For example, consider
229	   two or more servers, each having a distinct preference set (e.g.,
230	   10 and 20).  All of them will answer a client's request.  The client
231	   should choose the one with the largest preference value.  In case of
232	   failure of the most preferred server, the next server will keep
233	   responding to clients' queries.  This approach is simple to deploy,
234	   but does not offer lease stability, i.e., in case of server failure,
235	   clients' addresses and prefixes will change.
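
The preference-based selection described above can be sketched as follows.
This is a minimal illustration under stated assumptions, not client code
taken from [RFC3315]: the Advertise class, its field names, and the
choose_server helper are hypothetical, and a real client may weigh other
factors when choosing among servers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Advertise:
    """Hypothetical, simplified view of a server's ADVERTISE message."""
    server_id: str
    preference: int  # 0..255; a larger value is more preferred


def choose_server(adverts: List[Advertise]) -> Optional[Advertise]:
    """Return the advertisement with the largest preference, if any."""
    if not adverts:
        return None
    return max(adverts, key=lambda a: a.preference)


# Both servers answer; the client picks the larger preference value.
both_up = [Advertise("server-a", 10), Advertise("server-b", 20)]
print(choose_server(both_up).server_id)   # server-b

# If the most preferred server fails, only the other one answers.
only_a = [Advertise("server-a", 10)]
print(choose_server(only_a).server_id)    # server-a
```

Note that the second selection yields a different server, and therefore
a different address or prefix, which is the lack of lease stability
mentioned above.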

237	3.1.2.  Redundant servers

239	   In some cases the desire to deploy failover is motivated by high
240	   availability, i.e., to continue providing services despite server
241	   failure.  If there are no additional requirements, that goal may be
242	   fulfilled by simply deploying two or more independent servers on
243	   the same link.

245	   There are several well-documented approaches showing how such a
246	   deployment could work.  They are discussed in detail in [RFC6853].
247	   Each of those approaches is simpler to deploy and maintain than full
248	   failover.

250	3.1.3.  Distributed databases

252	   Some servers may allow their lease database to be stored in external
253	   databases.  Another possible alternative to failover is to configure
254	   two servers to connect to the same distributed database.

256	   Care should be taken to understand how inconsistencies are resolved
257	   in such database backends and how such conflict resolutions affect
258	   DHCPv6 server operation.

260	   It is also essential to use only a database that provides equivalent
261	   reliability and failover capability.  Otherwise the single point of
262	   failure is only moved to a different location (database rather than
263	   DHCPv6 server).  Such a configuration does not improve redundancy,
264	   but significantly complicates deployment.

266	   A common misconception regarding database-based redundancy is the
267	   assumption that conflict resolution after recovering from a network
268	   partition is not necessary.  To explain that fallacy, let's consider
269	   an example where there is a very small pool with only one address.
270	   There are two servers, each connected to a co-located database node
271	   (i.e., running on the same hardware).  A network partition occurs.
272	   Each server is operating, but has lost connection to its partner.
273	   Two clients request an address, one from each server.  Each server
274	   consults its database and discovers that only one address is
275	   available, so it assigns that address to its client.  As a result,
276	   the servers have assigned the same address to two different clients.
277	   Making the scenario more realistic (millions of addresses rather than
278	   one) only decreases the failure probability; it does not eliminate
279	   the underlying issue.
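
The single-address scenario can be reproduced with a small simulation.
This is only a sketch: the DatabaseNode class and the client names are
hypothetical, and the partition is modelled simply by giving each node
its own copy of the pool with no replication between them.

```python
class DatabaseNode:
    """Hypothetical co-located database node holding a lease table."""

    def __init__(self, pool):
        self.free = set(pool)
        self.leases = {}  # address -> client id

    def assign(self, client_id):
        """Lease any free address to client_id; None if the pool is empty."""
        if not self.free:
            return None
        address = min(self.free)
        self.free.remove(address)
        self.leases[address] = client_id
        return address


pool = ["2001:db8::1"]
# During the partition the nodes cannot replicate to each other,
# so each one still believes the single address is free.
node_a = DatabaseNode(pool)
node_b = DatabaseNode(pool)

addr_a = node_a.assign("client-1")
addr_b = node_b.assign("client-2")

# After the partition heals, find addresses leased to different clients.
conflicts = {
    addr
    for addr in node_a.leases.keys() & node_b.leases.keys()
    if node_a.leases[addr] != node_b.leases[addr]
}
print(conflicts)  # {'2001:db8::1'}
```

A larger pool makes the overlap less likely in any one run, but the
conflict check is still required once the nodes reconnect.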

281	   Any solution that involves a distributed database implementation of
282	   DHCPv6 failover must take into account the requirements for security.
283	   See Section 8 for additional information.

285	3.1.4.  Load Balancing

287	   Sometimes the desire to deploy more than one server is based on the
288	   assumption that they will share the client traffic.  Administrators
289	   that are interested in such a capability are advised to deploy a load
290	   balancing mechanism, defined in [I-D.ietf-dhc-dhcpv6-load-balancing].

292	4.  Failover Scenarios

294	   The following section provides several examples of deployment
295	   scenarios and use cases that may be associated with capabilities
296	   commonly referred to as failover.  These scenarios may be in or out
297	   of scope for the DHCPv6 failover protocol to which this document's
298	   requirements apply; they are enumerated here to provide a common
299	   basis for discussion.

301	4.1.  Hot Standby Model

303	   In the simplest case, there are two partners that are connected to
304	   the same network.  Only one of the partners ("primary") provides
305	   services to clients.  In case of its failure, the second partner
306	   ("secondary") continues handling services previously handled by the
307	   first partner.  As both servers are connected to the same network, a
308	   partner that fails to communicate with its partner while also
309	   receiving requests from clients may assume with high probability that
310	   its partner is down and the network is functional.  This assumption
311	   may affect its operation.
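
The inference available in the hot standby case can be written down as a
small decision function.  This is an illustrative sketch only: the
function name, its inputs, and the returned labels are hypothetical and
are not states defined by any failover protocol.

```python
def infer_partner_state(partner_reachable: bool,
                        client_requests_seen: bool) -> str:
    """Guess the partner's condition when both servers share one network."""
    if partner_reachable:
        return "partner-up"
    if client_requests_seen:
        # The shared network still delivers client traffic, so the
        # partner itself has most likely failed.
        return "partner-probably-down"
    # No partner and no clients: the local network may be at fault.
    return "unknown"


print(infer_partner_state(False, True))   # partner-probably-down
print(infer_partner_state(False, False))  # unknown
```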

313	4.2.  Geographically Distributed Failover

315	   Servers may be physically located in separate locations.  A common
316	   example of such a topology is where a service provider has at least a
317	   regional high performance network between geographically distributed
318	   datacenters.  In such a scenario, one server is located in one
319	   datacenter and its failover partner is located in another remote
320	   datacenter.  In this scenario, when one partner finds that it cannot
321	   communicate with the other partner, it does not necessarily mean that
322	   the other partner is down.

324	4.3.  Load balancing

326	   A desire to have more than one server in a network may also be
327	   created by the desire to have incoming traffic be handled by several
328	   servers.  This decreases the load each server must endure when all
329	   servers are operational.  Although such a capability does not,
330	   strictly, require failover - it is clear that failover makes such an
331	   architecture more straightforward.

333	   Note that in a load balancing situation which includes failover, each
334	   individual server must be able to handle the full load normally
335	   handled by both servers working together, or there is not a true
336	   increase in availability.

338	4.4.  1-to-1, m-to-1 and m-to-n models

340	   A failover relationship for a specific network is provided by two
341	   failover partners.  Those partners communicate with each other and
342	   back up all pools.  This scenario is sometimes referred to as the
343	   1-to-1 model and is considered relatively simple.  In larger networks
344	   one server may be participating in several failover relationships,
345	   i.e., it provides failover for several address or prefix pools, each
346	   served by separate partners.  Such a scenario can be referred to as
347	   m-to-1.  The most complex scenario - m-to-n - assumes that each
348	   partner participates in multiple failover relationships.

350	4.5.  Split prefixes

352	   Due to the extensive IPv6 address space, it is possible to provide
353	   semi-redundant service by splitting the available pool of addresses
354	   into two or more non-overlapping pools, with each server handling its
355	   own smaller pool.  Several versions of such a scenario are discussed
356	   in [RFC6853].
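
As a minimal sketch of such a split, the Python standard library's
ipaddress module can divide a prefix into non-overlapping halves.  The
prefix below is taken from the documentation range and the assignment of
halves to servers is an assumption made for illustration.

```python
import ipaddress

prefix = ipaddress.ip_network("2001:db8:1::/64")

# Split the /64 into two /65 halves, one per server.
pool_a, pool_b = prefix.subnets(prefixlen_diff=1)

print(pool_a)                    # 2001:db8:1::/65
print(pool_b)                    # 2001:db8:1:0:8000::/65
print(pool_a.overlaps(pool_b))   # False
```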

358	4.6.  Long lived connections

360	   Certain nodes may maintain long lived connections.  Since the IPv6
361	   address space is large, techniques exist (e.g., [RFC6853]) that use
362	   the easy availability of IPv6 addresses in order to provide increased
363	   DHCPv6 availability.  However, these approaches do not generally
364	   provide for stable IPv6 addresses for DHCPv6 clients should the
365	   server with which the client is interacting become unavailable.

367	   The obvious benefit of stable addresses is the ability to update DNS
368	   infrequently.  While the DNS can be updated every time an IPv6
369	   address changes, it introduces delays and (depending on DNS
370	   configuration) old entries may be cached for prolonged periods of
371	   time.

373	   The other benefit of having a stable address is that many monitoring
374	   solutions provide statistics on a per IP basis, so IP changes make
375	   measuring characteristics of a given box more difficult.

377	4.7.  Partial server communication loss

379	   There is a scenario where the DHCPv6 server may be configured to
380	   serve clients on one network adapter and communicate with a partner
381	   server (server to server traffic) on a different network adapter.  In
382	   this scenario, if the server loses connectivity on the network
383	   adapter used to communicate with the clients because of network
384	   adapter (hardware) failure, the DHCPv6 failover protocol gives the
385	   partner no indication of the loss of service.  Since the servers
386	   are still able to communicate with each other, the partner remains
387	   ignorant of the loss of service to clients.

389	5.  Principles of DHCPv6 Failover

391	   This section describes important issues that will affect any DHCPv6
392	   failover protocol.  This section is not intended to define
393	   implementation details, but rather high level concepts and issues
394	   that are important to DHCPv6 failover.  These issues form a basis for
395	   later documents which deal with the solutions to these issues.

397	   The general failover concept assumes that there are backup servers
398	   that can provide service in case of a primary server failure.  In
399	   theory there could be more than one backup server that could take up
400	   the role if such a need arises.  However, having more than two servers
401	   introduces a very difficult issue of synchronizing between partners.
402	   In the case of just a pair of cooperating servers, the notification
403	   and processes can result in only one of two states: fully successful
404	   (got response from a partner) and total failure (no response, failure
405	   event occurred).  Were there more than two partners participating in
406	   a relationship, there would be intermediate, inconsistent states
407	   where some partners had updated their state and some had not.  This
408	   would greatly complicate protocol design, and would give little
409	   advantage in return.  Therefore an approach that assumes a pair of
410	   cooperating servers was chosen.

412	5.1.  Failure modes

414	   This section documents failure modes.

416	5.1.1.  Server Failure

418	   Servers may become unresponsive due to a software crash, hardware
419	   failure, power outage or any number of other reasons.  The failover
420	   partner will detect such an event due to lack of responses from the
421	   down partner.  In this failure mode, the assumption is that the
422	   server is the only equipment that is off-line and all other network
423	   equipment is operating normally.  In particular, communication
424	   between other nodes is not interrupted.

426	   When working under the assumption that this is the only type of
427	   failure that can happen, the server may safely assume that its
428	   partner's unreachability means that the partner is down, so other
429	   nodes (clients in particular) are not able to reach it either and no
430	   services are provided.

432	   It should be noted that recovery after the failed server is brought
433	   back on-line is straightforward, due to the fact that it just needs
434	   to download current information from the lease database of the
435	   healthy partner and there is no conflict resolution required.

437	   This is by far the most common failure mode between two failover
438	   partners.

440	   When the two servers are located physically close to each other,
441	   possibly in the same room, the probability that a failure to
442	   communicate between failover partners is due to server failure is
443	   increased.

445	5.1.2.  Network partition

447	   Another possible cause of partner unreachability is a failure in the
448	   network that connects the two servers.  This may be caused by failure
449	   of any kind of network equipment: router, switch, physical cables, or
450	   optic fibers.  As a result of such a failure the network is split
451	   into two or more disjoint sections (partitions) that are not able to
452	   communicate with each other.  Such an event is called a network
453	   partition.  If failover partners are located in different partitions,
454	   they won't be able to communicate with each other.  Nevertheless,
455	   each partner may still be able to serve clients that happen to be
456	   part of the same partition.

458	   If this failure mode is taken into consideration, a server can't
459	   assume that partner unreachability automatically means that its
460	   partner is down.  It must consider the fact that the partner may
461	   continue operating and interacting with a subset of the clients.  The
462	   only valid assumption is that the partner also detected the network
463	   partition event and follows procedures specified for such a
464	   situation.

466	   It should be noted that recovery after a partitioned network is
467	   rejoined is significantly more complicated than recovery from a
468	   server failure event.  As both servers may have kept serving clients,
469	   they have two separate lease databases, and they need to agree on the
470	   state of each lease (or follow any other algorithm to bring their
471	   lease databases into agreement).

473	   This failure mode is more likely (though still rare) in the situation
474	   where two servers are in physically distant locations with multiple
475	   network elements between them.  This is the case in geographically
476	   distributed failover (see Section 4.2).

478	5.2.  Synchronization mechanisms

480	   Partners must exchange information about changes made to the lease
481	   database.  There are at least two types of synchronization methods
482	   that may be used.  These concepts are related to distributed
483	   databases, so some familiarity with distributed database technology
484	   is useful to better understand this topic.

486	5.2.1.  Lockstep

488	   When a server receives a REQUEST message from a client it consults
489	   its lease database and assigns requested addresses and/or prefixes.
490	   To make sure that its partner maintains a consistent database, it
491	   then sends information about a new or just updated lease to the
492	   partner and waits for the partner's response.  After the response
493	   from its partner is received the REPLY message is transmitted to the
494	   client.

496	   This approach has the benefit of having a completely consistent lease
497	   database between partners at all times.  Unfortunately, there is
498	   typically a significant performance penalty for this approach as each
499	   response sent to a client is delayed by the total sum of the delays
500	   caused by two transmissions between partners and the processing by
501	   the second partner.  The second partner is expected to update its own
502	   copy of the lease database in permanent storage, so this delay is not
503	   negligible, even in fast networks.

505	   Due to the advent of fast SSD (solid state disk) and battery backed
506	   RAM (random access memory) disk technology, this write performance
507	   penalty can be limited to some degree.
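
The lockstep ordering can be sketched as below.  This is a simplified,
single-process illustration: the Partner and LockstepServer classes and
their methods are hypothetical, the partner update is a direct method
call rather than a network exchange, and stable storage is represented
by an in-memory dictionary.

```python
class Partner:
    """Hypothetical in-process stand-in for the failover partner."""

    def __init__(self):
        self.leases = {}  # address -> client id

    def update(self, address, client_id):
        """Store the lease and acknowledge the update."""
        self.leases[address] = client_id
        return True


class LockstepServer:
    def __init__(self, partner, pool):
        self.partner = partner
        self.free = list(pool)
        self.leases = {}

    def handle_request(self, client_id):
        """Assign an address, synchronize, and only then build the REPLY."""
        address = self.free.pop(0)
        self.leases[address] = client_id
        # Wait for the partner's acknowledgement before answering.
        if not self.partner.update(address, client_id):
            raise RuntimeError("partner did not acknowledge the update")
        return ("REPLY", client_id, address)


partner = Partner()
server = LockstepServer(partner, ["2001:db8::1", "2001:db8::2"])
reply = server.handle_request("client-1")

# By the time the REPLY exists, both databases already agree.
print(reply)
print(server.leases == partner.leases)  # True
```

The cost of this ordering is visible in handle_request: the client's
answer cannot be produced until the partner call has returned.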

509	5.2.2.  Lazy updates

511	   Another approach to synchronizing the lease databases is to transmit
512	   the REPLY message to the client before completing the update to the
513	   partner.  The server sends the REPLY to the client immediately after
514	   assigning appropriate addresses and/or prefixes and initiates the
515	   partner update at a later time, depending on the algorithm chosen.
516	   Another variation of this approach is to initiate transmission to the
517	   partner, but not wait for its response before sending the REPLY to
518	   the client.

520	   This approach has the benefit of a minimal impact on server response
521	   times, thus it is much better from a performance perspective.
522	   However, it makes the lease databases loosely synchronized between
523	   partners.  This makes the synchronization more complex (and
524	   particularly the re-integration after a network partition event), as
525	   there may be cases where one client has been given a lease on an
526	   address or prefix of which the partner is not aware (e.g., if the
527	   server crashes after sending REPLY to the client, but before sending
528	   update information to its partner).
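
The lazy ordering, and the window of inconsistency it creates, can be
sketched as follows.  As an illustration under stated assumptions, the
Partner and LazyServer classes are hypothetical, the update queue is an
in-memory deque, and the moment at which flush is called stands in for
whatever update algorithm an implementation chooses.

```python
from collections import deque


class Partner:
    """Hypothetical in-process stand-in for the failover partner."""

    def __init__(self):
        self.leases = {}  # address -> client id

    def update(self, address, client_id):
        self.leases[address] = client_id


class LazyServer:
    def __init__(self, partner, pool):
        self.partner = partner
        self.free = list(pool)
        self.leases = {}
        self.pending = deque()  # updates not yet sent to the partner

    def handle_request(self, client_id):
        """Assign an address and reply at once; queue the partner update."""
        address = self.free.pop(0)
        self.leases[address] = client_id
        self.pending.append((address, client_id))
        return ("REPLY", client_id, address)

    def flush(self):
        """Send all queued updates to the partner."""
        while self.pending:
            address, client_id = self.pending.popleft()
            self.partner.update(address, client_id)


partner = Partner()
server = LazyServer(partner, ["2001:db8::1"])
reply = server.handle_request("client-1")

# The client already holds a lease the partner does not know about.
print(partner.leases)                   # {}
server.flush()
print(server.leases == partner.leases)  # True
```

If the server crashed between handle_request and flush, the queued
update would be lost, which is the case described in the text above.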

530	6.  DHCPv4 and DHCPv6 Failover Comparison

532	   There are significant similarities between existing DHCPv4 and
533	   envisaged DHCPv6 failover.  In particular, both serve IP addresses to
534	   clients, require maintaining consistent databases among partners,
535	   need to perform consistent DNS updates, must be able to take over
536	   bindings offered by a failed partner, and must be able to resume
537	   operation after the partner has recovered.  DNS conflict resolution
538	   works on the same principles in both DHCPv4 and DHCPv6.

540	   Nevertheless, there are significant differences.  IPv6 introduced
541	   prefix delegation [RFC3633] that is a crucial element of the DHCPv6
542	   protocol.  IPv6 also introduced the concept of deprecated addresses
543	   with separate preferred and valid lifetimes, both being configured
544	   via DHCPv6.  The negative response (NACK) in DHCPv4 has been replaced
545	   with the ability in DHCPv6 to provide a corrected response in a REPLY
546	   message that differs from a REQUEST.

548	   Also, the typical large address space (close to 2^64 addresses on /64
549	   prefixes expected to be available on most networks) may make managing
550	   address assignment significantly different from DHCPv4 failover.  In
551	   DHCPv4 it was not possible to use a hash or calculated technique to
552	   divide the significantly more limited address space and therefore
553	   much of the protocol that deals with pool balancing and rebalancing
554	   might not be necessary and can be done mathematically.  Also, because
555	   of the much lower degree of contention for IP addresses, the DHCPv6
556	   failover protocol does not need to be tuned to support rapid
557	   reclamation of IPv6 addresses following the loss of a failover peer's
558	   database.
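
   One possible way to divide a /64 mathematically is sketched below.
   This is an assumption-laden illustration, not a mechanism defined by
   this document: the function name, the use of SHA-256 over the client
   DUID, and the choice of the top interface identifier bit to mark the
   allocating server are all hypothetical.

```python
import hashlib
import ipaddress


def candidate_address(prefix, duid, server_index, attempt=0):
    """Derive a candidate address from the half of a /64 owned by one server.

    Hypothetical scheme: the top bit of the 64-bit interface identifier
    is the server index (0 or 1); the remaining 63 bits come from a hash
    of the client DUID.  Reserved interface identifiers are not filtered
    out in this sketch.
    """
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    if server_index not in (0, 1):
        raise ValueError("exactly two partners are supported")
    # 'attempt' lets a server derive a different candidate after a conflict.
    digest = hashlib.sha256(duid + attempt.to_bytes(4, "big")).digest()
    low63 = int.from_bytes(digest[:8], "big") & ((1 << 63) - 1)
    iid = (server_index << 63) | low63
    return net[iid]


duid = bytes.fromhex("000100011c39cf88080027fe8f95")  # example DUID bytes
addr_from_server0 = candidate_address("2001:db8:0:1::/64", duid, 0)
addr_from_server1 = candidate_address("2001:db8:0:1::/64", duid, 1)
```

   Because the two halves are disjoint, neither server can hand out an
   address from the other's half, so no pool balancing exchange is
   needed for addresses in this scheme.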

560	   However, DHCPv6 Prefix Delegation is similar to IPv4 addressing in
561	   terms of the number of available leases and therefore techniques for
562	   pool balancing and rebalancing and more rapid reclamation of prefixes
563	   allocated by a failed peer will be needed.
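
   The difference in scale can be made concrete with simple
   arithmetic.  The pool and delegated prefix lengths below (/48 and
   /56) are illustrative assumptions, not values taken from this
   document.

```python
def delegable_prefixes(pool_len, delegated_len):
    """Number of delegated prefixes of one length that fit in a pool prefix."""
    if not 0 <= pool_len <= delegated_len <= 128:
        raise ValueError("need 0 <= pool_len <= delegated_len <= 128")
    return 2 ** (delegated_len - pool_len)


addresses_in_slash64 = 2 ** 64                 # address pool on a single /64
prefixes_in_pool = delegable_prefixes(48, 56)  # /56 delegations from a /48
```

   Under these assumptions the pool holds only 256 delegable prefixes,
   the same order of magnitude as the addresses in an IPv4 /24, whereas
   a single /64 holds 2^64 addresses.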

565	7.  DHCPv6 Failover Requirements

567	   This section summarizes the requirements for DHCPv6 failover.

569	   Certain capabilities may be useful in some, but not all, scenarios.

571	   Such additional features will be considered optional parts of
572	   failover, and will be split and defined in separate documents.  As
573	   such, this document can be considered an attempt to define
574	   requirements for the DHCPv6 failover "core" protocol.

576	   The core of the DHCPv6 failover protocol is expected to provide the
577	   following properties:

579	   1.   The number of supported partners must be exactly two, i.e.,
580	        there are at most two servers that are aware of a specific
581	        lease.

583	   2.   For each prefix or address pool, a server must not participate
584	        in more than one failover relationship.

586	   3.   The defined protocol must support the m-to-1 model (i.e., one
587	        server may form more than one relationship), but an
588	        implementation may choose to implement only the 1-to-1 model
589	        (i.e., everything from one server is backed up on another).

591	   4.   One partner must be able to continue serving leases offered by
592	        the other partner.  This property is also sometimes called
593	        "lease stability".  The failure of either failover partner
594	        should have minimal or no impact on client connectivity.  In
595	        particular, it must not force the client to change addresses
596	        and/or change prefixes delegated to it.  Lease stability has the
597	        aim of avoiding disturbance to long-lived connections.

599	   5.   Prefix delegation must be supported.

601	   6.   Use of the failover protocol must not introduce significant
602	        performance impact on server response times.  Therefore
603	        synchronization between partners must be done using some form of
604	        lazy updates (see Section 5.2.2).

606	   7.   The pair of failover servers must be able to recover from a
607	        server down failure (see Section 5.1.1).

609	   8.   The pair of failover servers must be able to recover from a
610	        network partition event (see Section 5.1.2).

612	   9.   The design must allow secure communication between the failover
613	        partners.

615	   10.  The definition of extensions to this core protocol should be
616	        allowed, when possible.

618	   Depending on the specific nature of the failure, the recovery
619	   procedures mentioned in points 7 and 8 may require manual
620	   intervention.
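
   Requirements 1 to 3 above constrain how failover relationships may
   be configured.  A minimal sketch of a configuration check follows;
   the data layout (dictionaries with "name", "servers", and "pools"
   keys) and the function name are hypothetical and not part of any
   specification.

```python
def check_relationships(relationships):
    """Return a list of violations of requirements 1 and 2.

    Each relationship is a dict with hypothetical keys:
    'name' (str), 'servers' (iterable of str), 'pools' (iterable of str).
    """
    errors = []
    seen = {}  # (server, pool) -> name of the relationship already using it
    for rel in relationships:
        servers = set(rel["servers"])
        if len(servers) != 2:
            # Requirement 1: exactly two partners per relationship.
            errors.append(f"{rel['name']}: needs exactly two partners")
        for server in sorted(servers):
            for pool in rel["pools"]:
                key = (server, pool)
                if key in seen:
                    # Requirement 2: one relationship per server and pool.
                    errors.append(
                        f"{rel['name']}: {server} already serves {pool} "
                        f"in {seen[key]}"
                    )
                else:
                    seen[key] = rel["name"]
    return errors


# m-to-1 model (requirement 3): server C backs up both A and B,
# each for a different pool.
m_to_1 = [
    {"name": "rel1", "servers": ["A", "C"], "pools": ["2001:db8:1::/64"]},
    {"name": "rel2", "servers": ["B", "C"], "pools": ["2001:db8:2::/64"]},
]
m_to_1_errors = check_relationships(m_to_1)
```

   In this example server C takes part in two relationships, but never
   twice for the same pool, so the configuration passes the check.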

622	   High Availability is a property of the protocol that allows clients
623	   to receive DHCPv6 services despite the failure of individual DHCPv6
624	   servers.  In particular, it means that the server that takes over
625	   providing service to clients from its failed partner will continue
626	   serving the same addresses and/or prefixes.  This property is also
627	   called "lease stability".

629	   Although progress on a standardized interoperable DHCPv4 failover
630	   protocol has stalled, vendor-specific DHCPv4 failover protocols have
631	   been deployed that meet these requirements to a large extent.
632	   Accordingly, it would be appropriate to take into account the likely
633	   coexistence of DHCPv4 and DHCPv6 failover solutions.  In particular,
634	   certain features that are common to both IPv4 and IPv6
635	   implementations, such as the DNS Update mechanism, should be taken
636	   into consideration to ensure compatible operation.

638	7.1.  Features out of scope

640	   The following features are explicitly out of scope.

642	   1.  Load Balancing - this capability is considered an extension and
643	       may be defined in a separate document.  It must not be part of
644	       the core protocol.  The primary reason for this is the desire to
645	       limit core protocol complexity.  See
646	       [I-D.ietf-dhc-dhcpv6-load-balancing].

649	   2.  Configuration synchronization - two failover partners are
650	       expected to maintain the same configuration.  Mismatched
651	       configuration between partners is a frequent problem in failover
652	       solutions.  Unfortunately, that is an open-ended problem, since
653	       different servers have very different configuration data models.

655	   3.  m-to-n model (see Section 4.4)

657	   4.  Servers participating in multiple failover relationships for any
658	       given prefix or address pool.

660	8.  Security Considerations

662	   The design must provide a mechanism whereby each peer in a failover
663	   relationship can identify the other peer, authenticate that
664	   identification, and validate that the identified peer is the one with
665	   which communication is intended.  This mechanism should also provide
666	   optional support for confidentiality.

668	   The protocol specification, when it is written, should provide
669	   operational guidelines in the case of authentication mechanisms that
670	   require access to network servers that have the potential to be
671	   unreachable (e.g., what to do if a partner is reachable, but a remote
672	   Certificate Authority is unreachable due to a network partition
673	   event).

674	   The security considerations for the design itself will be discussed
675	   in the design document.

677	9.  IANA Considerations

679	   IANA is not requested to perform any actions at this time.

681	10.  Acknowledgements

683	   This document extensively uses concepts, definitions and other parts
684	   of the [dhcpv4-failover] document.  Thanks to Bernie Volz and Shawn
685	   Routhier for their frequent reviews and substantial contributions.
686	   The authors would also like to thank Qin Wu, Jean-Francois Tremblay,
687	   Frank Sweetser, Jiang Sheng, Yu Fu, Greg Rabil, Vithalprasad
688	   Gaitonde, Krzysztof Nowicki, Steinar Haug, Elwyn Davies, Ted Lemon,
689	   Benoit Claise and Stephen Farrell for their comments and feedback.

691	   This work has been partially supported by the Department of Computer
692	   Communications (a division of Gdansk University of Technology) and
693	   the National Centre for Research and Development (Poland) under the
694	   European Regional Development Fund, Grant No.
695	   POIG.01.01.02-00-045/09-00 (Future Internet Engineering Project).

697	11.  References

699	11.1.  Normative References

701	   [RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
702	              and M. Carney, "Dynamic Host Configuration Protocol for
703	              IPv6 (DHCPv6)", RFC 3315, July 2003.

705	   [RFC3633]  Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic
706	              Host Configuration Protocol (DHCP) version 6", RFC 3633,
707	              December 2003.

709	   [RFC4704]  Volz, B., "The Dynamic Host Configuration Protocol for
710	              IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN)
711	              Option", RFC 4704, October 2006.

713	11.2.  Informative References

715	   [I-D.ietf-dhc-dhcpv6-load-balancing]
716	              Kostur, A., "DHC Load Balancing Algorithm for DHCPv6",
717	              draft-ietf-dhc-dhcpv6-load-balancing-00 (work in
718	              progress), December 2012.

720	   [RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
721	              "Dynamic Updates in the Domain Name System (DNS UPDATE)",
722	              RFC 2136, April 1997.

724	   [RFC5970]  Huth, T., Freimann, J., Zimmer, V., and D. Thaler, "DHCPv6
725	              Options for Network Boot", RFC 5970, September 2010.

727	   [RFC6853]  Brzozowski, J., Tremblay, J., Chen, J., and T. Mrugalski,
728	              "DHCPv6 Redundancy Deployment Considerations", BCP 180,
729	              RFC 6853, February 2013.

731	   [dhcpv4-failover]
732	              Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S.,
733	              Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover
734	              Protocol", draft-ietf-dhc-failover-12 (work in progress),
735	              March 2003.

737	Authors' Addresses

739	   Tomek Mrugalski
740	   Internet Systems Consortium, Inc.
741	   950 Charter Street
742	   Redwood City, CA  94063
743	   USA

745	   Phone: +1 650 423 1345
746	   Email: tomasz.mrugalski@gmail.com

748	   Kim Kinnear
749	   Cisco Systems, Inc.
750	   1414 Massachusetts Ave.
751	   Boxborough, Massachusetts  01719
752	   USA

754	   Phone: +1 (978) 936-0000
755	   Email: kkinnear@cisco.com