Dynamic Host Configuration (DHC)                            T. Mrugalski
Internet-Draft                                                       ISC
Intended status: Informational                                K. Kinnear
Expires: January 20, 2014                                          Cisco
                                                           July 19, 2013


                      DHCPv6 Failover Requirements
             draft-ietf-dhc-dhcpv6-failover-requirements-07

Abstract

   The DHCPv6 protocol, defined in [RFC3315], allows for multiple
   servers to operate on a single network; however, it does not define
   any way for the servers to share information about currently active
   clients and their leases.  Some sites are interested in running
   multiple servers in such a way as to provide increased availability
   in case of server failure.  In order for this to work reliably, the
   cooperating primary and secondary servers must maintain a consistent
   database of the lease information.  [RFC3315] allows for but does
   not define any redundancy or failover mechanisms.  This document
   outlines requirements for DHCPv6 failover, enumerates related
   problems, and discusses the proposed scope of work to be conducted.
   This document does not define a DHCPv6 failover protocol.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 20, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
   3.  Scope of work
     3.1.  Alternatives to Failover
       3.1.1.  Short-lived addresses
       3.1.2.  Redundant servers
       3.1.3.  Distributed databases
       3.1.4.  Load Balancing
   4.  Failover Scenarios
     4.1.  Hot Standby Model
     4.2.  Geographically Distributed Failover
     4.3.  Load balancing
     4.4.  1-to-1, m-to-1 and m-to-n models
     4.5.  Split prefixes
     4.6.  Long lived connections
     4.7.  Partial server communication loss
   5.  Principles of DHCPv6 Failover
     5.1.  Failure modes
       5.1.1.  Server Failure
       5.1.2.  Network partition
     5.2.  Synchronization mechanisms
       5.2.1.  Lockstep
       5.2.2.  Lazy updates
   6.  DHCPv4 and DHCPv6 Failover Comparison
   7.  DHCPv6 Failover Requirements
     7.1.  Features out of scope
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The DHCPv6 protocol, defined in [RFC3315], allows for multiple
   servers to operate on a single network; however, it does not define
   how the servers can share the same address and prefix delegation
   pools and allow a client to seamlessly extend its existing leases
   when the original server is down.  [RFC3315] provides for these
   capabilities, but does not document how the servers cooperate and
   communicate to provide them.  Some sites are interested in running
   multiple servers in such a way as to provide redundancy in case of
   server failure.  In order for this to work reliably, the cooperating
   primary and secondary servers must maintain a consistent database of
   the lease information.

   This document discusses failover implementation scenarios, failure
   modes, and synchronization approaches to provide background to the
   list of requirements for a DHCPv6 failover protocol.
   It then defines a minimum set of requirements that failover must
   provide to be useful, while acknowledging that additional features
   may be specified as extensions.  This document does not define a
   DHCPv6 failover protocol.

   The failover model to which these requirements apply will initially
   be a pairwise "hot standby" model (see Section 4.1), with a primary
   server used in normal operation switching over to a backup secondary
   server in the event of failure.  Optionally, a secondary server may
   provide failover service for multiple primary servers.  However, the
   requirements will not preclude a future load-balancing extension
   where there is a symmetric failover relationship.

   The DHCPv6 failover concept borrows heavily from its DHCPv4
   counterpart [dhcpv4-failover], which never completed the
   standardization process but has several successful, operationally
   proven vendor-specific implementations.  For a discussion of
   commonalities and differences, see Section 6.

2.  Definitions

   This section defines terms that are relevant to DHCPv6 failover.

   Definitions from [RFC3315] are included by reference.  In particular,
   client means any device (e.g., an end-user host, CPE (Customer
   Premises Equipment), or other router) that implements the client
   functionality of the DHCPv6 protocol.  A server means a DHCPv6
   server, unless explicitly noted otherwise.  A relay is a DHCPv6
   relay.

      A binding (or client binding) is a group of server data records
      containing the information the server has about the addresses in
      an IA (Identity Association, see Section 10 of [RFC3315]) or
      configuration information explicitly assigned to the client.
      Configuration information that has been returned to a client
      through a policy - for example, the information returned to all
      clients on the same link - does not require a binding.

      DDNS - an abbreviation for "Dynamic DNS", which refers to the
      capability to update a DNS server's name database using the
      on-the-wire protocol defined in [RFC2136].  Clients and servers
      can negotiate the scope of such updates as defined in [RFC4704].

      Failover - the ability of one partner to continue offering
      services provided by another partner, with minimal or no impact on
      clients.

      FQDN - a fully qualified domain name.  A fully qualified domain
      name generally is a host name with at least one domain label under
      the top-level domain.  For example, "dhcp.example.org" is a fully
      qualified domain name.

      High Availability - a desired property of DHCPv6 servers to
      continue providing services despite experiencing unwanted events
      such as server crashes, link failures, or network partitions.

      Load Balancing - the ability for two or more servers to each
      process some portion of the client request traffic in a
      conflict-free fashion.

      Lease - an IPv6 address, an IPv6 prefix or other resource that was
      assigned ("leased") by a server to a specific client.  A lease may
      include additional information, such as an associated fully
      qualified domain name (FQDN) and/or information about associated
      DNS updates.  A client obtains a lease for a specified period of
      time (valid lifetime).

      Partner - a "partner", for the purpose of this document, refers to
      a failover server, typically the other failover server in a
      failover relationship.

      Stable Storage - each DHCP server is required to keep its lease
      database in some form of storage (known as "stable storage") that
      will remain consistent across reboots, crashes and power failures.

      Partner Failure - a power outage, unexpected shutdown, crash or
      other type of failure that renders a partner unable to continue
      its operation.

3.  Scope of work

   In order to fit within the IETF process effectively and efficiently,
   the standardization effort for DHCPv6 failover is expected to proceed
   with the creation of documents of increasing specificity.  It begins
   with this document specifying the requirements for DHCPv6 failover
   ("requirements document").  Later documents are expected to address
   the design of the DHCPv6 failover protocol ("design document") and,
   if sufficient interest exists, the protocol details required to
   implement the DHCPv6 failover protocol itself ("protocol document").
   The goal of this partitioning is, in part, to ease the validation,
   review, and approval of the DHCPv6 failover protocol by presenting it
   in comprehensible parts to the larger community.

   Additional documents describing extensions may also be defined.

   DHCPv6 Failover requirements are presented in Section 7.

3.1.  Alternatives to Failover

   There are many scenarios in which it seems that a failover capability
   would be useful.  However, there are often much simpler approaches
   that will meet the required goals.  This section documents examples
   where failover is not really needed.

3.1.1.  Short-lived addresses

   There are cases when IPv6 addresses are used only for a short time,
   but there is a need to have a high degree of confidence that those
   addresses will be served.  A notable example is PXE, the Preboot
   eXecution Environment [RFC5970].  This is a mechanism for obtaining
   configuration early in the process of bootstrapping over the network.

   The PXE BIOS acquires an address in order to load the operating
   system image and continue booting.  The address and possibly other
   configuration parameters are used during the boot process and are
   discarded thereafter.  Any lack of available DHCPv6 service at this
   time will prevent such devices from booting.

   In such cases, instead of deploying failover, it is better to use the
   much simpler preference mechanism defined in [RFC3315].  For example,
   consider two or more servers, each configured with a distinct
   preference value (e.g., 10 and 20).  Each server will answer a
   client's request, and the client should choose the one with the
   larger preference value.  If the most preferred server fails, the
   next server will keep responding to clients' queries.  This approach
   is simple to deploy, but does not offer lease stability, i.e., in
   case of server failure, clients' addresses and prefixes will change.

3.1.2.  Redundant servers

   In some cases the desire to deploy failover is motivated by high
   availability, i.e., to continue providing services despite server
   failure.  If there are no additional requirements, that goal may be
   fulfilled by simply deploying two or more independent servers on the
   same link.

   There are several well-documented approaches showing how such a
   deployment could work.  They are discussed in detail in [RFC6853].
   Each of those approaches is simpler to deploy and maintain than full
   failover.

3.1.3.  Distributed databases

   Some servers may allow their lease database to be stored in external
   databases.  Another possible alternative to failover is to configure
   two servers to connect to the same distributed database.

   Care should be taken to understand how inconsistencies are resolved
   in such database backends and how such conflict resolutions affect
   DHCPv6 server operation.

   It is also essential to use only a database that provides equivalent
   reliability and failover capability.  Otherwise the single point of
   failure is only moved to a different location (the database rather
   than the DHCPv6 server).  Such a configuration does not improve
   redundancy, but significantly complicates deployment.

   A common misconception regarding database-based redundancy is the
   assumption that conflict resolution after recovering from a network
   partition is not necessary.  To explain that fallacy, consider an
   example where there is a very small pool with only one address.
   There are two servers, each connected to a co-located database node
   (i.e., running on the same hardware).  A network partition occurs.
   Each server is operating, but has lost connection to its partner.
   Two clients request an address, one from each server.  Each server
   consults its database and discovers that only one address is
   available, so it assigns that address to its client.  As a result,
   the two servers have assigned the same address to different clients.
   Making the scenario more realistic (millions of addresses rather than
   one) only decreases the probability of failure; it does not eliminate
   the underlying issue.

   Any solution that involves a distributed database implementation of
   DHCPv6 failover must take into account the requirements for security.
   See Section 8 for additional information.

3.1.4.  Load Balancing

   Sometimes the desire to deploy more than one server is based on the
   assumption that they will share the client traffic.  Administrators
   who are interested in such a capability are advised to deploy a load
   balancing mechanism, defined in [I-D.ietf-dhc-dhcpv6-load-balancing].

4.  Failover Scenarios

   The following section provides several examples of deployment
   scenarios and use cases that may be associated with capabilities
   commonly referred to as failover.  These scenarios may be in or out
   of scope for the DHCPv6 failover protocol to which this document's
   requirements apply; they are enumerated here to provide a common
   basis for discussion.

4.1.  Hot Standby Model

   In the simplest case, there are two partners that are connected to
   the same network.
   Only one of the partners ("primary") provides services to clients.
   In case of its failure, the second partner ("secondary") continues
   handling services previously handled by the first partner.  As both
   servers are connected to the same network, a server that fails to
   communicate with its partner while still receiving requests from
   clients may assume with high probability that its partner is down and
   the network is functional.  This assumption may affect its operation.

4.2.  Geographically Distributed Failover

   Servers may be physically located in separate locations.  A common
   example of such a topology is where a service provider has at least a
   regional high performance network between geographically distributed
   datacenters.  In such a scenario, one server is located in one
   datacenter and its failover partner is located in another, remote
   datacenter.  In this scenario, when one partner finds that it cannot
   communicate with the other partner, it does not necessarily mean that
   the other partner is down.

4.3.  Load balancing

   A desire to have more than one server in a network may also be
   motivated by the desire to have incoming traffic handled by several
   servers.  This decreases the load each server must endure when all
   servers are operational.  Although such a capability does not
   strictly require failover, it is clear that failover makes such an
   architecture more straightforward.

   Note that in a load balancing situation which includes failover, each
   individual server must be able to handle the full load normally
   handled by both servers working together, or there is not a true
   increase in availability.

4.4.  1-to-1, m-to-1 and m-to-n models

   A failover relationship for a specific network is provided by two
   failover partners.  Those partners communicate with each other and
   back up all pools.
   This scenario is sometimes referred to as the 1-to-1 model and is
   considered relatively simple.  In larger networks one server may be
   participating in several failover relationships, i.e., it provides
   failover for several address or prefix pools, each served by separate
   partners.  Such a scenario can be referred to as m-to-1.  The most
   complex scenario - m-to-n - assumes that each partner participates in
   multiple failover relationships.

4.5.  Split prefixes

   Due to the extensive IPv6 address space, it is possible to provide
   semi-redundant service by splitting the available pool of addresses
   into two or more non-overlapping pools, with each server handling its
   own smaller pool.  Several versions of such a scenario are discussed
   in [RFC6853].

4.6.  Long lived connections

   Certain nodes may maintain long lived connections.  Since the IPv6
   address space is large, techniques exist (e.g., [RFC6853]) that use
   the easy availability of IPv6 addresses in order to provide increased
   DHCPv6 availability.  However, these approaches do not generally
   provide for stable IPv6 addresses for DHCPv6 clients should the
   server with which the client is interacting become unavailable.

   The obvious benefit of stable addresses is the ability to update DNS
   infrequently.  While the DNS can be updated every time an IPv6
   address changes, doing so introduces delays and (depending on DNS
   configuration) old entries may be cached for prolonged periods of
   time.

   The other benefit of having a stable address is that many monitoring
   solutions provide statistics on a per-IP basis, so IP address changes
   make measuring the characteristics of a given host more difficult.

4.7.  Partial server communication loss

   A DHCPv6 server may be configured to serve clients on one network
   adapter and to communicate with its partner server (server-to-server
   traffic) on a different network adapter.
   In this scenario, if the server loses connectivity on the network
   adapter used to communicate with the clients because of a network
   adapter (hardware) failure, the DHCPv6 failover protocol gives the
   partner no indication of the loss of service.  Since the servers are
   still able to communicate with each other, the partner remains
   unaware of the loss of service to clients.

5.  Principles of DHCPv6 Failover

   This section describes important issues that will affect any DHCPv6
   failover protocol.  This section is not intended to define
   implementation details, but rather high level concepts and issues
   that are important to DHCPv6 failover.  These issues form a basis for
   later documents which deal with the solutions to these issues.

   The general failover concept assumes that there are backup servers
   that can provide service in case of a primary server failure.  In
   theory there could be more than one backup server that could take up
   the role if such a need arises.  However, having more than two
   servers introduces the very difficult issue of synchronization among
   partners.  In the case of just a pair of cooperating servers, the
   notification process can result in only one of two states: fully
   successful (a response was received from the partner) or total
   failure (no response; a failure event occurred).  Were there more
   than two partners participating in a relationship, there would be
   intermediate, inconsistent states where some partners had updated
   their state and some had not.  This would greatly complicate protocol
   design, and would give little advantage in return.  Therefore an
   approach that assumes a pair of cooperating servers was chosen.

5.1.  Failure modes

   This section documents failure modes.

5.1.1.  Server Failure

   Servers may become unresponsive due to a software crash, hardware
   failure, power outage or any number of other reasons.
   The failover partner will detect such an event due to the lack of
   responses from the down partner.  In this failure mode, the
   assumption is that the server is the only equipment that is off-line
   and all other network equipment is operating normally.  In
   particular, communication between other nodes is not interrupted.

   When working under the assumption that this is the only type of
   failure that can happen, the server may safely assume that its
   partner's unreachability means that the partner is down, so other
   nodes (clients in particular) are not able to reach it either and it
   is providing no services.

   It should be noted that recovery after the failed server is brought
   back on-line is straightforward: the recovered server just needs to
   download current information from the lease database of the healthy
   partner, and no conflict resolution is required.

   This is by far the most common failure mode between two failover
   partners.

   When the two servers are located physically close to each other,
   possibly in the same room, the probability that a failure to
   communicate between failover partners is due to server failure is
   increased.

5.1.2.  Network partition

   Another possible cause of partner unreachability is a failure in the
   network that connects the two servers.  This may be caused by failure
   of any kind of network equipment: router, switch, physical cables, or
   optic fibers.  As a result of such a failure the network is split
   into two or more disjoint sections (partitions) that are not able to
   communicate with each other.  Such an event is called a network
   partition.  If failover partners are located in different partitions,
   they won't be able to communicate with each other.  Nevertheless,
   each partner may still be able to serve clients that happen to be
   part of the same partition.
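
   The consequence of both partners continuing to serve clients during a
   partition can be illustrated with a minimal Python sketch that reuses
   the single-address pool example from Section 3.1.3.  The function
   `assign`, the two dictionaries, and the client identifiers are
   hypothetical names introduced only for illustration; the sketch does
   not represent any failover protocol or server implementation.

```python
def assign(local_db: dict, pool: list, client_id: str):
    """Give the client the first pool address not leased in local_db."""
    for address in pool:
        if address not in local_db:
            local_db[address] = client_id
            return address
    return None  # pool exhausted from this server's point of view


pool = ["2001:db8::1"]   # a pool with a single address
db_a, db_b = {}, {}      # each server's own lease view while partitioned

addr_a = assign(db_a, pool, "client-1")   # request served by server A
addr_b = assign(db_b, pool, "client-2")   # request served by server B

# Addresses that both servers leased, each to a different client.
conflicts = {a for a in db_a if a in db_b and db_a[a] != db_b[a]}
```

   Because neither server can see the other's lease database, both hand
   out the same address, and the conflict only becomes visible once the
   two databases are compared after the partition heals.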

   If this failure mode is taken into consideration, a server can't
   assume that partner unreachability automatically means that its
   partner is down.  It must consider the fact that the partner may
   continue operating and interacting with a subset of the clients.  The
   only valid assumption is that the partner also detected the network
   partition event and follows procedures specified for such a
   situation.

   It should be noted that recovery after a partitioned network is
   rejoined is significantly more complicated than recovery from a
   server failure event.  As both servers may have kept serving clients,
   they have two separate lease databases, and they need to agree on the
   state of each lease (or follow any other algorithm to bring their
   lease databases into agreement).

   This failure mode is more likely (though still rare) in the situation
   where two servers are in physically distant locations with multiple
   network elements between them.  This is the case in geographically
   distributed failover (see Section 4.2).

5.2.  Synchronization mechanisms

   Partners must exchange information about changes made to the lease
   database.  There are at least two types of synchronization methods
   that may be used.  These concepts are related to distributed
   databases, so some familiarity with distributed database technology
   is useful to better understand this topic.

5.2.1.  Lockstep

   When a server receives a REQUEST message from a client, it consults
   its lease database and assigns the requested addresses and/or
   prefixes.  To make sure that its partner maintains a consistent
   database, it then sends information about the new or just updated
   lease to the partner and waits for the partner's response.  After the
   response from its partner is received, the REPLY message is
   transmitted to the client.
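
   The ordering of steps described above can be sketched in Python as
   follows.  This is a minimal illustration under stated assumptions,
   not a protocol definition: the classes `Partner` and
   `LockstepServer`, the `update` method, and the `log` list are
   hypothetical, and the partner exchange is modeled as a direct method
   call rather than as messages on the network.

```python
from dataclasses import dataclass, field


@dataclass
class Partner:
    """Hypothetical stand-in for the partner's copy of the lease database."""
    leases: dict = field(default_factory=dict)

    def update(self, address: str, client_id: str) -> bool:
        # The partner records the lease, then acknowledges the update.
        self.leases[address] = client_id
        return True


@dataclass
class LockstepServer:
    partner: Partner
    free: list                                  # addresses not yet leased
    leases: dict = field(default_factory=dict)
    log: list = field(default_factory=list)     # order of events, for illustration

    def handle_request(self, client_id: str) -> str:
        # 1. Consult the local lease database and assign an address.
        address = self.free.pop(0)
        self.leases[address] = client_id
        # 2. Send the lease to the partner and wait for its response.
        if not self.partner.update(address, client_id):
            raise RuntimeError("no response from partner; REPLY withheld")
        self.log.append("partner-ack")
        # 3. Only now is the REPLY transmitted to the client.
        self.log.append("reply-sent")
        return address


partner = Partner()
server = LockstepServer(partner=partner, free=["2001:db8::1", "2001:db8::2"])
leased = server.handle_request("client-A")
```

   In this sketch the partner's database already contains the lease by
   the time the client receives its REPLY, which is the defining
   property of the lockstep approach.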
496 This approach has the benefit of having a completely consistent lease 497 database between partners at all times. Unfortunately, there is 498 typically a significant performance penalty for this approach as each 499 response sent to a client is delayed by the total sum of the delays 500 caused by two transmissions between partners and the processing by 501 the second partner. The second partner is expected to update its own 502 copy of the lease database in permanent storage, so this delay is not 503 negligible, even in fast networks. 505 Due to the advent of fast SSD (solid state disk) and battery backed 506 RAM (random access memory) disk technology, this write performance 507 penalty can be limited to some degree. 509 5.2.2. Lazy updates 511 Another approach to synchronizing the lease databases is to transmit 512 the REPLY message to the client before completing the update to the 513 partner. The server sends the REPLY to the client immediately after 514 assigning appropriate addresses and/or prefixes and initiates the 515 partner update at a later time, depending on the algorithm chosen. 516 Another variation of this approach is to initiate transmission to the 517 partner, but not wait for its response before sending the REPLY to 518 the client. 520 This approach has benefit of a minimal impact on server response 521 times, thus it is much better from a performance perspective. 522 However, it makes the lease databases loosely synchronized between 523 partners. This makes the synchronization more complex (and 524 particularly the re-integration after a network partition event), as 525 there may be cases where one client has been given a lease on an 526 address or prefix of which the partner is not aware (e.g., if the 527 server crashes after sending REPLY to the client, but before sending 528 update information to its partner). 530 6. 
DHCPv4 and DHCPv6 Failover Comparison 532 There are significant similarities between existing DHCPv4 and 533 envisaged DHCPv6 failovers. In particular both serve IP addresses to 534 clients, require maintaining consistent databases among partners, 535 need to perform consistent DNS Updates, must be able take over 536 bindings offered by failed partner, must be able to resume operation 537 after partner is recovered. DNS conflict resolution works on the 538 same principles in both DHCPv4 and DHCPv6. 540 Nevertheless, there are significant differences. IPv6 introduced 541 prefix delegation [RFC3633] that is a crucial element of the DHCPv6 542 protocol. IPv6 also introduced the concept of deprecated addresses 543 with separate preferred and valid lifetimes, both being configured 544 via DHCPv6. Negative response (NACK) in DHCPv4 has been replaced 545 with the ability in DHCPv6 to provide corrected response in a REPLY 546 message that differs from a REQUEST. 548 Also, the typical large address space (close to 2^64 addresses on /64 549 prefixes expected to be available on most networks) may make managing 550 address assignment significantly different from DHCPv4 failover. In 551 DHCPv4 it was not possible to use a hash or calculated technique to 552 divide the significantly more limited address space and therefore 553 much of the protocol that deals with pool balancing and rebalancing 554 might not be necessary and can be done mathematically. Also, because 555 of the much lower degree of contention for IP addresses, the DHCPv6 556 failover protocol does not need to be tuned to support rapid 557 reclamation of IPv6 addresses following the loss of a failover peer's 558 database. 560 However, DHCPv6 Prefix Delegation is similar to IPv4 addressing in 561 terms of the number of available leases and therefore techniques for 562 pool balancing and rebalancing and more rapid reclamation of prefixes 563 allocated by a failed peer will be needed. 565 7. 
DHCPv6 Failover Requirements 567 This section summarizes the requirements for DHCPv6 failover. 569 Certain capabilities may be useful in some, but not all scenarios. 571 Such additional features will be considered optional parts of 572 failover, and will be split and defined in separate documents. As 573 such, this document can be considered an attempt to define 574 requirements for the DHCPv6 failover "core" protocol. 576 The core of the DHCPv6 failover protocol is expected to provide the 577 following properties: 579 1. The number of supported partners must be exactly two, i.e., 580 there are at most two servers that are aware of a specific 581 lease. 583 2. For each prefix or address pool, a server must not participate 584 in more than one failover relationship. 586 3. The defined protocol must support the m-to-1 model (i.e., one 587 server may form more than one relationship), but an 588 implementation may choose to implement only the 1-to-1 model 589 (i.e., everything from one server is backed on another). 591 4. One partner must be able to continue serving leases offered by 592 the other partner. This property is also sometimes called 593 "lease stability". The failure of either failover partner 594 should have minimal or no impact on client connectivity. In 595 particular, it must not force the client to change addresses 596 and/or change prefixes delegated to it. Lease stability has the 597 aim of avoiding disturbance to long-lived connections. 599 5. Prefix delegation must be supported. 601 6. Use of the failover protocol must not introduce significant 602 performance impact on server response times. Therefore 603 synchronization between partners must be done using some form of 604 lazy updates (see Section 5.2.2). 606 7. The pair of failover servers must be able to recover from a 607 server down failure (see Section 5.1.1). 609 8. The pair of failover servers must be able to recover from a 610 network partition event (see Section 5.1.2). 612 9. 
The design must allow secure communication between the failover partners.

10. The definition of extensions to this core protocol should be allowed, when possible.

Depending on the specific nature of the failure, the recovery procedures mentioned in points 7 and 8 may require manual intervention.

High Availability is a property of the protocol that allows clients to receive DHCPv6 services despite the failure of individual DHCPv6 servers. In particular, it means that the server that takes over providing service to clients from its failed partner will continue serving the same addresses and/or prefixes. This property is also called "lease stability".

Although progress on a standardized inter-operable DHCPv4 failover protocol has stalled, vendor-specific DHCPv4 failover protocols have been deployed that meet these requirements to a large extent. Accordingly, it would be appropriate to take into account the likely coexistence of DHCPv4 and DHCPv6 failover solutions. In particular, certain features that are common to both IPv4 and IPv6 implementations, such as the DNS Update mechanism, should be taken into consideration to ensure compatible operation.

7.1. Features out of scope

The following features are explicitly out of scope.

1. Load Balancing - this capability is considered an extension and may be defined in a separate document. It must not be part of the core protocol, but rather defined as an extension. The primary reason for this is the desire to limit core protocol complexity. See [I-D.ietf-dhc-dhcpv6-load-balancing].

2. Configuration synchronization - two failover partners are expected to maintain the same configuration. Mismatched configuration between partners is a frequent problem in failover solutions.
Unfortunately, that is an open-ended problem, since different servers have very different configuration data models.

3. m-to-n model (see Section 4.4).

4. Servers participating in multiple failover relationships for any given prefix or address pool.

8. Security Considerations

The design must provide a mechanism whereby each peer in a failover relationship can identify the other peer, authenticate that identification, and validate that the identified peer is the one with which communication is intended. This mechanism should also optionally provide support for confidentiality.

The protocol specification, when it is written, should provide operational guidelines for authentication mechanisms that require access to network servers that have the potential to be unreachable (e.g., what to do if a partner is reachable but the remote Certificate Authority is unreachable due to a network partition event).

The security considerations for the design itself will be discussed in the design document.

9. IANA Considerations

IANA is not requested to perform any actions at this time.

10. Acknowledgements

This document extensively uses concepts, definitions, and other parts of the [dhcpv4-failover] document. Thanks to Bernie Volz and Shawn Routhier for their frequent reviews and substantial contributions. The authors would also like to thank Qin Wu, Jean-Francois Tremblay, Frank Sweetser, Jiang Sheng, Yu Fu, Greg Rabil, Vithalprasad Gaitonde, Krzysztof Nowicki, Steinar Haug, Elwyn Davies, Ted Lemon, Benoit Claise, and Stephen Farrell for their comments and feedback.

This work has been partially supported by the Department of Computer Communications (a division of Gdansk University of Technology) and the National Centre for Research and Development (Poland) under the European Regional Development Fund, Grant No.
POIG.01.01.02-00-045/09-00 (Future Internet Engineering Project).

11. References

11.1. Normative References

[RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C., and M. Carney, "Dynamic Host Configuration Protocol for IPv6 (DHCPv6)", RFC 3315, July 2003.

[RFC3633]  Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic Host Configuration Protocol (DHCP) version 6", RFC 3633, December 2003.

[RFC4704]  Volz, B., "The Dynamic Host Configuration Protocol for IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN) Option", RFC 4704, October 2006.

11.2. Informative References

[I-D.ietf-dhc-dhcpv6-load-balancing]  Kostur, A., "DHC Load Balancing Algorithm for DHCPv6", draft-ietf-dhc-dhcpv6-load-balancing-00 (work in progress), December 2012.

[RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, "Dynamic Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April 1997.

[RFC5970]  Huth, T., Freimann, J., Zimmer, V., and D. Thaler, "DHCPv6 Options for Network Boot", RFC 5970, September 2010.

[RFC6853]  Brzozowski, J., Tremblay, J., Chen, J., and T. Mrugalski, "DHCPv6 Redundancy Deployment Considerations", BCP 180, RFC 6853, February 2013.

[dhcpv4-failover]  Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S., Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover Protocol", draft-ietf-dhc-failover-12 (work in progress), March 2003.

Authors' Addresses

Tomek Mrugalski
Internet Systems Consortium, Inc.
950 Charter Street
Redwood City, CA 94063
USA

Phone: +1 650 423 1345
Email: tomasz.mrugalski@gmail.com

Kim Kinnear
Cisco Systems, Inc.
1414 Massachusetts Ave.
Boxborough, Massachusetts 01719
USA

Phone: +1 (978) 936-0000
Email: kkinnear@cisco.com