idnits 2.17.1 draft-ietf-dhc-failover-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 1184 instances of lines with control characters in the document. ** The abstract seems to contain references ([RFC2131]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? RFC 2119 keyword, line 212: '... Secondary Servers SHOULD be viewed as...' RFC 2119 keyword, line 257: '...his private pool SHOULD be based only ...' RFC 2119 keyword, line 262: '...econdary Servers SHOULD pause normal D...' RFC 2119 keyword, line 363: '... SHOULD ensure that every packet sen...' RFC 2119 keyword, line 390: '... message MUST have the same transact...' (47 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 237 has weird spacing: '...through redun...' == Line 305 has weird spacing: '...ow easy recog...' == Line 422 has weird spacing: '...pproach as in...' 
== Line 423 has weird spacing: '... one of these...' == Line 513 has weird spacing: '... could just ...' == (1 more instance...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 1999) is 9173 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'RFC 2131' on line 42 == Unused Reference: '2' is defined on line 1986, but no explicit reference was found in the text == Unused Reference: '3' is defined on line 1991, but no explicit reference was found in the text == Unused Reference: '4' is defined on line 1994, but no explicit reference was found in the text == Outdated reference: A later version (-12) exists of draft-ietf-dhc-failover-00 -- Possible downref: Normative reference to a draft: ref. '3' == Outdated reference: A later version (-01) exists of draft-ietf-dhc-security-arch-00 -- Possible downref: Normative reference to a draft: ref. '4' Summary: 14 errors (**), 0 flaws (~~), 12 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
--------------------------------------------------------------------------------
1    Network Working Group                        Ralph Droms
2    INTERNET DRAFT                               Bucknell University

4                                                 Greg Rabil
5                                                 Mike Dooley
6                                                 Arun Kapur
7                                                 Quadritek Systems

9                                                 Kim Kinnear
10                                                American Internet

12                                                Steve Gonczi
13                                                Bernie Volz
14                                                Process Software

16                                                August 1998
17                                                Expires March 1999

19                       DHCP Failover Protocol
20

22   Status of this Memo

24   This document is an Internet-Draft. Internet-Drafts are working
25   documents of the Internet Engineering Task Force (IETF), its areas,
26   and its working groups. Note that other groups may also distribute
27   working documents as Internet-Drafts.

29   Internet-Drafts are draft documents valid for a maximum of six months
30   and may be updated, replaced, or obsoleted by other documents at any
31   time. It is inappropriate to use Internet-Drafts as reference
32   material or to cite them other than as ``work in progress.''

34   To learn the current status of any Internet-Draft, please check the
35   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
36   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
37   munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
38   ftp.isi.edu (US West Coast).

40   Abstract

42   DHCP [RFC 2131] allows for multiple servers to be operating on a
43   single network. Some sites are interested in running multiple servers
44   in such a way so as to provide redundancy in case of server failure.
45   In order for this to work reliably, the cooperating Primary and
46   Secondary servers must maintain a consistent database of the lease

48   DRAFT                                        January 1998

50   information. This implies that servers will need to coordinate any
51   and all lease activity so that this information is synchronized in
52   case of failover.

54   This document defines a protocol to provide this synchronization
55   between two servers. One server is designated the "Primary" server,
56   the other is the "Secondary" server.
Additionally, this document
57   describes a protocol for the automatic transfer of control from the
58   Primary to the Secondary in the case of failure (failover), as well
59   as a network partition.

61   This document is a merge of draft-ietf-dhc-failover-01.txt and
62   draft-ietf-dhc-safe-failover-proto-00.txt, along with substantial
63   changes to each. Unfortunately, this merge was not completed with
64   sufficient time to allow review by any of the authors of
65   draft-ietf-dhc-failover-01.txt, and so it may well not reflect their
66   views even though their names appear as authors. See Section 11,
67   issue #1 and Section 12 for more details.

69   1. Introduction

71   As the use of DHCP servers in networked environments grows, the
72   dependency of those networks on the DHCP server increases. This is
73   particularly true of the hosts that receive their configuration
74   information from the DHCP server. Therefore, it is very important to
75   be able to provide reliable, continuous availability of DHCP
76   services.

78   This specification describes a protocol to support automatic failover
79   from a primary to its secondary server. The failover mechanism
80   allows the secondary server to perform DHCP actions while the primary
81   is down, or when a network failure prevents the primary and secondary
82   from communicating. The protocol also specifies how reintegration is
83   achieved when the primary again becomes operational or when the
84   primary and secondary can again communicate.

86   In providing the specification for the failover, the protocol
87   specifies how to guarantee reliable delivery of changes to the secondary.
88   This is required to synchronize the secondary's lease data with that
89   of the primary. The protocol further specifies a mechanism to allow
90   the secondary to determine if it can communicate with the primary
91   server. The secondary will automatically begin to service DHCP
92   requests whenever it cannot communicate with the primary.
When the
93   primary server becomes available again, the secondary will convey any
94   changes that occurred since the time of failover back to the primary.

96   Through careful control of the difference between the lease times
100  offered to DHCP clients and the lease time known by the secondary
101  server, the protocol allows the primary to communicate with the
102  secondary after the primary has completed communication with the DHCP
103  client (a technique known as "lazy" update) and still guarantee that
104  duplicate IP address allocations do not occur. Thus, the protocol
105  does not directly impact the ability of a DHCP server to respond to
106  DHCP client requests.

108  1.1. Requirements Terminology

110  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
111  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
112  document are to be interpreted as described in RFC 2119 [RFC 2119].

114  1.2. DHCP Terminology

116  This document uses the following terms:

118     o "DHCP client" or "client"

120        A DHCP client is an Internet host using DHCP to obtain
121        configuration parameters such as a network address.

123     o "DHCP server" or "server"

125        A DHCP server is an Internet host that returns configuration
126        parameters to DHCP clients.

128     o "binding"

130        A binding is a collection of configuration parameters,
131        including at least an IP address, associated with or "bound to" a
132        DHCP client. Bindings are managed by DHCP servers.

134     o "binding database"

136        The collection of bindings managed by a primary and secondary.

138     o "subnet address pool"

140        A subnet address pool is the set of IP addresses which is
141        associated with a particular network number and subnet mask. In
142        the simple case, there is a single network number and subnet
143        mask and a set of IP addresses.
In the more complex case
144        (sometimes called "secondary subnets", sometimes
145        "superscopes"), several (apparently unrelated) network number and
146        subnet mask combinations with their associated IP addresses
149        may all be configured together into one subnet address pool.

151     o "primary server" or "primary"

153        A DHCP server configured to provide primary service to a set
154        of DHCP clients for a particular set of subnet address pools.

156     o "secondary server" or "secondary"

158        A DHCP server configured to act as backup to a primary server
159        for a particular set of subnet address pools.

161     o "stable storage"

163        Every DHCP server is assumed to have some form of what is
164        called "stable storage". Stable storage is used to hold
165        information concerning IP address bindings (among other
166        things) so that this information is not lost in the event of a
167        server failure which requires restart of the server.

169  1.3. Requirements for this protocol

171  The following list of goals must be (and are) achieved by this
172  protocol.

174     1. Implementations of this protocol must work with existing DHCP
175        client implementations based on the DHCP protocol [1].

177     2. Implementations of the protocol must work with existing BOOTP
178        relay implementations.

180     3. The protocol must provide failover redundancy between servers
181        that are not located on the same subnet.

183  1.4. Goals for this protocol

185     1. Provide for continued service to DHCP clients through an
186        automated mechanism in the event of failure of the Primary
187        Server.

189     2. Avoid binding an IP address to a client while that binding is
190        currently valid for another client. In other words, don't
191        allocate the same IP address to two clients.

193     3. Minimize any need for manual administrative intervention.

197     4. Introduce no additional delays in server response time as a
198        result of inter-server communication.

200     5.
Share IP address ranges between primary and secondary
201        servers; i.e., impose no requirement that the pool of
202        available addresses be divided between servers.

204     6. Continue to meet the goals and objectives of this protocol in
205        the event of server failure or network partition.

207     7. Provide graceful reintegration of full protocol service after
208        server failure or network partition.

210     8. Allow for one computer to act as a Secondary Server for
211        multiple Primary Servers. Other topologies (e.g., mesh) are also
212        possible. Primary and Secondary Servers SHOULD be viewed as
213        "logical" servers and not necessarily physical computers.

215     9. Ensure that an existing client can keep its existing IP
216        address binding if it can communicate with either the Primary
217        or Secondary DHCP server implementing this protocol - not
218        just the server that originally offered it the binding.

220     10. Ensure that a new client can get an IP address from some
221        server. Ensure that in the face of partition, where servers
222        continue to run but cannot communicate with each other, the
223        above goals and requirements may be met. In addition, when
224        the partition condition is removed, allow graceful automatic
225        re-integration without requiring human intervention.

227     11. If either Primary or Secondary Server loses all of the
228        information that it has stored in stable storage, it should be
229        able to refresh its stable storage from the other server.

231  1.5. Limitations of this Protocol

233  The following are explicit limitations of this protocol.

235     1. Under normal operation, only one server at a time will
236        service DHCP client requests; this protocol provides reliability
237        through redundancy but not load balancing.

239     2. This protocol provides only one level of redundancy through a
240        single Secondary Server for each Primary Server.

242     3.
The protocol provides a way to detect when the primary and
243        secondary server cannot communicate, but once this condition
246        has been detected, does not (indeed, cannot) provide any way
247        to further distinguish between network failure and failure of
248        one of the servers.

250     4. A small number of IP addresses are reserved for Secondary
251        Server use. In order to handle the failure case where both
252        servers are able to communicate with DHCP clients, but unable
253        to communicate with each other, a small number of IP
254        addresses must be set aside as a private address pool for the
255        Secondary Server. The Secondary can use these to service
256        newly arrived DHCP clients during such a period. The size of
257        this private pool SHOULD be based only on the arrival rate of
258        new DHCP clients and the length of expected downtime, and is
259        not influenced in any way by the total number of DHCP clients
260        supported by the server pair.

262     5. The Primary and Secondary Servers SHOULD pause normal DHCP
263        transaction processing while resynchronizing, after a system
264        failure.

266  2. Protocol Operations

268  The protocol messages needed to provide redundant/failover servers can be
269  grouped into three areas:

271     o Messages to keep the Secondary Server's lease data
272       synchronized with that of the Primary so that when failover occurs,
273       there is no degradation of service.

275     o Messages that allow the Secondary to determine the operational
276       state of the Primary, so as to know when to start servicing
277       DHCP traffic.

279     o Messages that are used to coordinate the Primary regaining
280       control when it has become available again.

282  2.1. Time synchronization between communicating servers

284  Each Binding update message carries a "sent time stamp" (the time
285  when the message was sent in GMT). This provides a simple mechanism
286  to determine any "time drift" between communicating servers.
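As a rough illustration of the time drift idea, the following Python sketch computes the drift as arrival time minus sent time (packet travel time ignored) and uses it to bias an absolute time received from the partner. The function names and the sign convention for applying the correction are assumptions made for this example; the draft only states that the drift is the difference between the two times and that received absolute times are corrected by it.

```python
def time_drift(sent_time: int, arrive_time: int) -> int:
    """Drift in seconds: receiver's clock at arrival minus the sender's
    "sent time stamp". Packet travel time is treated as negligible."""
    return arrive_time - sent_time


def to_local_time(remote_time: int, drift: int) -> int:
    """Bias an absolute time from the sender into the receiver's clock.
    Assumption: the correction is applied by adding the drift."""
    return remote_time + drift


# The partner stamped the packet at 900000000; our clock read 900000007
# when it arrived, so our clock runs 7 seconds ahead of the partner's.
drift = time_drift(sent_time=900_000_000, arrive_time=900_000_007)

# A lease time expressed in the partner's clock, converted to ours.
lease_end_local = to_local_time(900_003_600, drift)
```

All values are seconds since the epoch in GMT, matching the time representation the draft uses for its time options.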
288  DISCUSSION:

290     If a UDP packet is successfully transmitted (i.e., it does not
291     get lost), the packet travel time is negligible in the framework
295     of DHCP leases. By providing a GMT "sent time" stamp, the
296     recipient can compare this with its notion of the current GMT time at
297     the time it receives the packet. The difference (plus the packet
298     travel time, which we ignore) is the time drift. The recipient
299     can use this time drift value to bias all "absolute time" values
300     it receives from the sender.

302  2.2. Failover Protocol Messages

304  The Failover Protocol messages are encoded using a packet format
305  specific to the Failover Protocol. To allow easy recognition of
306  Failover Protocol messages, BOOTP packet "op" field values 3..14 are
307  proposed to mark various Failover Protocol messages. A Failover
308  Protocol message is always unicast from the source to the destination.
309  The sender, and never the recipient, is responsible for reliable
310  re-transmission.

312  2.3. Failover Protocol packet header format

314      0                   1                   2                   3
315      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
316     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
317     |     op (1)    |    rev (1)    |      payload offset (2)       |
318     +---------------+---------------+---------------+---------------+
319     |                            xid (4)                            |
320     +---------------------------------------------------------------+
321     |         0 or more additional header bytes (variable)          |
322     +---------------------------------------------------------------+
323     |         Payload data, formatted as DHCP-style options         |
324     |         (although using a unique option number space)         |
325     |                          (variable)                           |
326     +---------------------------------------------------------------+

328  op - 1 byte

330  These values extend the number space of the existing BOOTP message
331  type "Op" field.
The following types are defined:

335      3   DHCPPOOLREQ
336      4   DHCPPOOLRESP
337      5   DHCPBNDUPD
338      6   DHCPBNDACK
339      7   DHCPPOLL
340      8   DHCPPRPL
341      9   DHCPCTLREQ
342     10   DHCPCTLRET
343     11   DHCPCTLACK
344     12   DHCPCTLACKACK
345     13   DHCPREQUEREQ
346     14   DHCPREQUERESP

348  rev - 1 byte

350  Failover protocol version supported. Set to 1 for the Failover
351  Protocol described in this draft.

353  payload offset - 2 bytes, network byte order

355  The byte offset of the Payload area, from the beginning of the
356  Failover packet header. The value for the current protocol version is 8.

358  xid - 4 bytes, network byte order

360  The sender of a failover protocol packet is responsible for setting
361  this number, and the receiver of the packet copies the number over
362  into any response packet. To the receiver it is opaque. The sender
363  SHOULD ensure that every packet sent to a particular IP address and
364  port combination has a unique transaction id unless that packet is a
365  re-transmission.

367  2.4. DHCPPOOLREQ and DHCPPOOLRESP:

369  Whenever the Secondary server transitions into NORMAL mode, it first
370  sends a DHCPPOOLREQ message to initiate a transfer of a small range
371  of IP addresses that will serve as its private address pool.

373  This is necessary, because initially the Secondary server has no such
374  address pool, and its pool gets depleted when it hands out addresses
375  in COMMUNICATION-INTERRUPTED mode. This is why the request is sent
376  every time the Secondary server transitions into NORMAL mode. The
377  DHCPPOOLREQ message does not carry any payload data. When the Primary
378  Server gets a DHCPPOOLREQ message, it computes which addresses should
379  be transferred to the Secondary, and queues up DHCPBNDUPD
380  transactions, setting the Status of these bindings to "BACKUP". Having done
381  this, it sends a DHCPPOOLRESP message. The DHCPPOOLRESP message
385  carries the "Number of addresses transferred" as its payload.
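Because a DHCPPOOLREQ carries no payload, such a message consists of nothing more than the fixed 8-byte header from Section 2.3. The sketch below, which uses Python's standard struct module, packs and unpacks that header. The helper names pack_header and unpack_header are invented for this illustration, and the sketch assumes no additional header bytes are present (payload offset 8, as stated for the current protocol version).

```python
import struct

# op (1 byte), rev (1 byte), payload offset (2 bytes), xid (4 bytes),
# all in network byte order.
HEADER = struct.Struct("!BBHI")

DHCPPOOLREQ = 3  # op value from the list in Section 2.3


def pack_header(op: int, xid: int, rev: int = 1, payload_offset: int = 8) -> bytes:
    """Build the fixed failover header (hypothetical helper)."""
    return HEADER.pack(op, rev, payload_offset, xid)


def unpack_header(packet: bytes):
    """Split a packet into header fields and payload bytes.
    The payload starts payload_offset bytes from the start of the header."""
    op, rev, payload_offset, xid = HEADER.unpack_from(packet, 0)
    return op, rev, payload_offset, xid, packet[payload_offset:]


# A DHCPPOOLREQ has no payload, so the packet is just the header.
pool_request = pack_header(DHCPPOOLREQ, xid=0x12345678)
fields = unpack_header(pool_request)
```

Reading the payload from payload_offset, rather than from a hard-coded 8, is what would let a receiver skip any additional header bytes a later protocol revision might add.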
387  The Secondary server keeps sending DHCPPOOLREQ messages until it
388  receives a DHCPPOOLRESP with "Number of addresses transferred" = 0,
389  or it decides that the partner is not responding. Each one of these
390  messages MUST have the same transaction ID. If a new transaction ID
391  is used in one of these messages, the receiving server will begin the
392  transmission of the DHCPBNDUPD messages all over again. To be clear,
393  if the Secondary Server receives a DHCPPOOLRESP message with "Number
394  of addresses transferred" > 0, it MUST send another DHCPPOOLREQ
395  message. This mechanism makes it possible for the Primary Server to pace
396  the transfer (e.g., it could generate all addresses at once, or
397  one-by-one).

399  The Primary Server must respond to each DHCPPOOLREQ message it
400  receives. If it has already generated all private addresses, or it
401  has no available addresses, it MUST send DHCPPOOLRESP with "Number
402  of addresses transferred" = 0.

404  2.5. DHCPREQUEREQ and DHCPREQUERESP:

406  Whenever either server wishes to be updated with the information that
407  the other server knows and has not yet transmitted to it, it will send a
408  DHCPREQUEREQ.

410  The DHCPREQUEREQ message does not carry any payload data. When
411  either server gets a DHCPREQUEREQ message, it computes which updates
412  should be transferred to the Secondary, and queues up DHCPBNDUPD
413  transactions as appropriate. Having done this, it sends a
414  DHCPREQUERESP message. The DHCPREQUERESP message carries the "Number of
415  addresses queued up" as its payload. The set of binding updates
416  queued up will depend on the requesting server's state. (The state
417  has already been communicated via prior DHCPPOLL/DHCPPRPL messages.)

419  The Secondary server keeps sending DHCPREQUEREQ messages until it
420  receives a DHCPREQUERESP with "Number of addresses queued up" = 0,
421  or it decides that the partner is not responding.
This is the same
422  approach as is used for the DHCPPOOLREQ/DHCPPOOLRESP messages. Each
423  one of these DHCPREQUEREQ messages MUST have the same transaction ID.
424  Use of a new transaction ID will cause re-building of the outgoing
425  binding update queue.

427  The Primary Server must respond to each DHCPREQUEREQ message it
428  receives. If it has already queued up all of the previously unsent
429  binding updates, then it MUST send DHCPREQUERESP with "Number of
430  addresses queued up" = 0.

434  2.6. DHCPBNDUPD

436  The Primary notifies the Secondary (or the other way around) of a binding
437  state and data change.

439  In response to a binding update, the recipient server MUST respond
440  with a DHCPBNDACK message. Multiple binding updates can be batched
441  up, and sent in one Failover Protocol message.

443  2.7. DHCPBNDACK

445  This message implements a positive or negative acknowledgement of
446  one or more binding updates.

448  A binding update (or a batch of binding updates sent as one message)
449  is matched up with its associated acknowledgment by having the
450  same Xid field value in the message header.

452  The server sending a DHCPBNDACK message MAY include any of the
453  options that are acceptable in a DHCPBNDUPD message when the
454  DHCPBNDACK message is returned to the sender. If any of this
455  information differs from the information in the DHCPBNDUPD message, the
456  receiver SHOULD update its bindings database with that information
457  upon receipt of the DHCPBNDACK message.

459  The DHCPBNDACK MAY selectively reject one or more updates by
460  including one or more IP address - Reject Reason option pairs in the
461  message body.

463  The DHCPBNDACK implicitly acknowledges any binding updates it replies
464  to, except those it enumerates using Reject Reason Codes.

466  2.8.
DHCPPOLL

468  In order to determine the state of a given server, or to communicate
469  a critical change in its own status, a participant can use this
470  message.

472  This message inquires about the current state of the recipient, and
473  tells the recipient what state the sender is in.

475  After sending a DHCPPOLL message, the participant listens for
476  a DHCPPRPL message in response.

480  2.9. DHCPPRPL

482  This message replies to the DHCPPOLL message (PRPL=Poll reply). The
483  DHCPPRPL also carries server status information (see message payload
484  details below).

486  After a failover, when the Primary Server is restarted, the following
487  messages are used to coordinate the Primary taking control back from
488  the Secondary:

490     DHCPCTLREQ - Request for control
491     DHCPCTLRET - Return of control initiated
492     DHCPCTLACK - Return of control completed
493     DHCPCTLACKACK - Return of control completed message acknowledged.

495  The Primary Server sends a DHCPCTLREQ message, indicating that it
496  would like to take control of the bindings database. The Secondary
497  Server replies with a DHCPCTLRET message, which serves as a signal to
498  the Primary "Stand by to receive binding updates". This message then
499  is followed by a set of binding updates from the secondary to the
500  primary. When all updates have been transmitted (and acknowledged)
501  from Secondary to Primary, a DHCPCTLACK message is sent from the
502  Secondary to the Primary, to signal that "all updates from the
503  Secondary are now completed".

505  DISCUSSION:

507     Note that the DHCPCTLACK message type must be transmitted
508     reliably, as the Primary Server will not start servicing clients
509     until it has received the DHCPCTLACK message. To provide this
510     reliability, the DHCPCTLACKACK message is provided. This provides
511     an acknowledgment of the DHCPCTLACK message, and the DHCPCTLACK
512     message will be periodically re-sent until it is acknowledged.
We
513     could just periodically re-send the DHCPCTLACK message until we
514     start receiving binding updates from the Primary, but the Primary
515     may not have any updates to send at all, hence the need for an
516     explicit DHCPCTLACKACK message.

518  The Primary Server transitions into NORMAL state upon receiving a
519  DHCPCTLACK from the secondary, when the secondary has completed
520  sending all of its updates during synchronization. The DHCPCTLACKACK
521  message is needed to prevent the primary from waiting and not
522  servicing clients if the DHCPCTLACK message got lost. The Secondary server
523  will keep re-sending the DHCPCTLACK message until:

525     1. It decides that the primary is not responding, so the
526        Secondary server goes into COMMUNICATION-INTERRUPTED mode.

530     2. It receives a DHCPCTLACKACK or a DHCPBNDUPD message from the
531        primary. The Primary's DHCPBNDUPD messages would start
532        arriving at the Secondary server, if the Primary did get the
533        DHCPCTLACK, but the DHCPCTLACKACK message got lost.

535  3. Protocol Payload Data Format

537  Payload data is encoded as a set of flexible DHCP/BOOTP style
538  options (the usual 1 byte option code, 1 byte length, and "length"
539  bytes of data). The options are placed after the header, after
540  skipping PayloadOffset bytes. The payload data options are not preceded
541  by a "cookie" value.

543  Since the packet is NOT a DHCP/BOOTP protocol packet, the options
544  used here do not conflict with any existing "proper" DHCP/BOOTP
545  options. In fact, these options are allocated in relationship to the
546  DHCP option space in the following way. In cases where the syntax
547  and semantics of a Failover Payload Option are identical to those of a
548  DHCP/BOOTP option, the same option number is used. For
549  options unique to the Failover protocol, option numbers starting at
550  230 are used.
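The code/length/data layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are invented, DHCP's single-byte pad (0) and end (255) options are not treated specially because the draft does not say whether they appear in failover payloads, and the one-byte value carried in option 230 is an arbitrary placeholder.

```python
FAILOVER_OPTION_BASE = 230  # first option number unique to the Failover protocol


def encode_option(code: int, data: bytes) -> bytes:
    """Encode one option as: 1 byte code, 1 byte length, then the data."""
    if not 0 <= code <= 255:
        raise ValueError("option code must fit in one byte")
    if len(data) > 255:
        raise ValueError("option data must be at most 255 bytes")
    return bytes([code, len(data)]) + data


def decode_options(payload: bytes) -> list:
    """Walk the payload and return (code, data) pairs in wire order."""
    options = []
    i = 0
    while i < len(payload):
        if i + 2 > len(payload):
            raise ValueError("truncated option header")
        code, length = payload[i], payload[i + 1]
        data = payload[i + 2 : i + 2 + length]
        if len(data) != length:
            raise ValueError("truncated option data")
        options.append((code, data))
        i += 2 + length
    return options


# Option 50 reuses the DHCP "requested IP address" number; option 230 is
# the first failover-specific number (placeholder one-byte value here).
payload = encode_option(50, bytes([10, 0, 0, 5])) + encode_option(
    FAILOVER_OPTION_BASE, b"\x02"
)
decoded = decode_options(payload)
```

Returning the options as an ordered list rather than a dictionary keeps repeated option codes intact, which matters when one packet carries several option sets.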
552  Thus, all new Failover Protocol option numbers are assigned from a
553  continuous range beginning with 230. This number is shown as an X in
554  the tables below.

556  The protocol is permissive in allowing various other DHCP options in
557  binding updates. As long as the sender wishes to use an option, it
558  MAY include it. On the other hand, the recipient MUST ignore any
559  option it is not expecting.

561  Multiple DHCPBNDUPD transactions can be batched together in one UDP
562  packet. Option sets for individual transactions MUST always begin
563  with the IP address (Option 50). This is the only restriction on
564  payload item ordering. In any other case, payload data items can be
565  included in any desired order.

567  In case an implementation chooses to use the DHCPBNDNAK mechanism,
568  the DHCPBNDNAK message SHOULD contain one or more Option 50s from the
569  NAK-ed message, to indicate which specific update items are being
570  NAK-ed.

572  While the synchronization is in progress, the secondary MUST NOT
573  accept client requests, and the primary MUST NOT send any updates to
574  the secondary. This is necessary to allow the Primary to be the sole
575  arbiter of any conflicting updates.

579  3.1. DHCP Server Status

581  This option is used to convey the current state of a server.

583     Code Len Type
584     +--+---+------+
585     | X| 1 | 1-15 |
586     +--+---+------+

588  Allowed values for this option:

590     Value   State                  Meaning
591     -----   -----                  -------
592     1       UNKNOWN-STATE
593     2       PRIMARY-NORMAL         Normal state
594     3       BACKUP-NORMAL
595     4       PRIMARY-COMINT         Communication interrupted (safe)
596     5       BACKUP-COMINT
597     6       PRIMARY-PARTNERDOWN    Partner down (unsafe
598                                    mode)
599     7       BACKUP-PARTNERDOWN
600     8       PRIMARY-CONFLICT       Synchronizing, after a
601                                    "Partner-Down"
602                                    divergence
603     9       PRIMARY-SYNC           Synchronizing, after a
604                                    "communications-interrupted"
606                                    divergence.
607     10      BACKUP-SYNC
608     11      PRIMARY-RECOVER        Recovering ALL
609                                    bindings from partner
610     12      BACKUP-RECOVER
611     13      FAILOVER-DISABLED      The server is running
612                                    with the failover
613                                    protocol disabled.
614                                    (standalone)

616     14      SERVER-PAUSED          The server is inactive,
617                                    shutting down for a short period.
618     15      SERVER-SHUTDOWN        The server is inactive,
619                                    shutting down for an extended period.

621  When a server is being re-started, it should send a DHCPPOLL message
622  to its partner, reporting its status (SERVER-PAUSED). In response,
623  the recipient SHOULD go into COMMUNICATION-INTERRUPTED mode.

627  When a server is being shut down, it should send a DHCPPOLL message
628  to its partner, reporting its status (SERVER-SHUTDOWN).

630  In response, the recipient SHOULD go into PARTNER-DOWN mode.

632  3.2. DHCP Binding Status

634  This option is used to convey the current state of a binding. This
635  option is mandatory for DHCPBNDUPD messages.

637      Code   Len   Type
638     +-----+-----+-----+
639     | X+1 |  1  | 1-7 |
640     +-----+-----+-----+

642  Legal values for this option are:

644     Value   Status       Meaning
645     -----   ------       -------
646     1       FREE         The lease has never been used
647     2       ACTIVE       Assigned to a client *
648     3       EXPIRED
649     4       RELEASED     A client released the lease
650     5       ABANDONED    A server or client flagged address
651                          as not usable.
652     6       RESET        Lease was freed by some
653                          external agent.
654     7       BACKUP       Lease is set aside for Secondary
655                          server's private address pool.

657  3.3. Assigned IP address

659  Uses identical code and format to DHCP Option 50 (requested IP
660  address).

662      Code   Len         Address
663     +-----+-----+-----+-----+-----+-----+
664     | 50  |  4  | a1  | a2  | a3  | a4  |
665     +-----+-----+-----+-----+-----+-----+

669  3.4. Lease grant time

671  This option carries an absolute GMT time value; time synchronization
672  has already been achieved between the source and the target server
673  using the Sent Time Stamp option.
The value is represented as seconds since January 1, 1970 (i.e., the
ANSI C time_t time value representation).

     Code    Len          Time
   +------+-----+-----+-----+-----+-----+
   | X+2  |  4  | t1  | t2  | t3  | t4  |
   +------+-----+-----+-----+-----+-----+

3.5.  Sent Time Stamp

A time stamp, using GMT, recording when the packet was sent. It is
used to determine the time drift between the sender and the
recipient. The time drift is defined as the difference between
"Arrive Time (GMT)" and "Send Time (GMT)". The actual packet travel
time is assumed to be negligible in this context. All Date-Time
values contained in Failover messages will be corrected by the time
drift before being stored by the recipient.

     Code   Len          Time
   +-----+-----+-----+-----+-----+-----+
   | X+3 |  4  | t1  | t2  | t3  | t4  |
   +-----+-----+-----+-----+-----+-----+

The time is a 32-bit unsigned long in network byte order, in units of
seconds (GMT since EPOCH).

3.6.  Number of addresses transferred to Secondary Server

A 32-bit unsigned long in network byte order. Reports the number of
addresses transferred by the Primary to the Secondary Server
(addresses to be used for the Secondary Server's private address
pool).

     Code   Len    Number of addresses
   +-----+-----+-----+-----+-----+-----+
   | X+4 |  4  | n1  | n2  | n3  | n4  |
   +-----+-----+-----+-----+-----+-----+

3.7.  Lease Duration

Uses the format and code of the standard DHCP IP Address Lease Time
option, and is used in exactly the same way as that option is used in
the DHCPOFFER message. The time is in units of seconds, and is
specified as a 32-bit unsigned integer. A Lease Duration of
0xFFFFFFFF indicates an infinite lease.

     Code   Len        Lease Time
   +-----+-----+-----+-----+-----+-----+
   | 51  |  4  | t1  | t2  | t3  | t4  |
   +-----+-----+-----+-----+-----+-----+

3.8.  Client Identifier

The format, code and conventions used are identical to DHCP option
61.
     Code   Len   Type   Client-Identifier
   +-----+-----+-----+-----+-----+---
   | 61  |  n  | t1  | i1  | i2  | ...
   +-----+-----+-----+-----+-----+---

3.9.  Client Hardware Address

The format is similar to DHCP option 61. T1 (type) MUST be set to the
proper ARP hardware address code (it MUST NOT be zero). TBD:
Reference the ARP document here.

     Code   Len   Type   Hardware Address
   +-----+-----+-----+-----+-----+---
   | X+5 |  n  | t1  | i1  | i2  | ...
   +-----+-----+-----+-----+-----+---

Either the Client Id, the Client Hardware Address, or both MAY be
present in binding update transactions. At least one of them MUST be
present. If both are present, the Client Id MUST be used to uniquely
identify the owner of the binding (exactly as in RFC 2131).

3.10.  Host Name

Uses the format and code of DHCP option 12.

     Code   Len   Host Name
   +-----+-----+-----+-----+-----+-----+-----+-----+--
   | 12  |  n  | h1  | h2  | h3  | h4  | h5  | h6  | ...
   +-----+-----+-----+-----+-----+-----+-----+-----+--

3.11.  Domain Name

Uses the format and code of DHCP option 15.

     Code   Len   Domain Name
   +-----+-----+-----+-----+-----+-----+--
   | 15  |  n  | d1  | d2  | d3  | d4  | ...
   +-----+-----+-----+-----+-----+-----+--

3.12.  Reject Reason Code

This option is used to selectively reject binding updates. It MAY be
used in a DHCPBNDACK message, always following an option 50 (the
option 50 contains the IP address of the specific update being
rejected).

     Code   Len   Reason code
   +-----+-----+-----+
   | X+6 |  1  | R1  |
   +-----+-----+-----+

Reason codes:

   1   Illegal IP address (not part of any address pool)
   2   Fatal conflict exists: address in use by other client

3.13.  MDLI

Maximum Delta Lease Interval, in seconds. A 32-bit integer value, in
network byte order.
     Code    Len          Time
   +------+-----+-----+-----+-----+-----+
   | X+7  |  4  | t1  | t2  | t3  | t4  |
   +------+-----+-----+-----+-----+-----+

4.  Exchange of control between Primary and Secondary

The Primary and Secondary Servers coordinate the exchange of control
over the bindings database through the use of DHCPPOLL and DHCPCTLREQ
messages. In normal operation:

The Primary sends notification of each change to its bindings
database to the Secondary, and the Secondary keeps its bindings
database synchronized with the Primary's database.

The Secondary periodically sends DHCPPOLL messages to the Primary,
and the Primary responds to each DHCPPOLL message with a DHCPPRPL
message. If the Secondary does not receive a DHCPPRPL response
message, the Secondary takes control of the bindings database and
begins answering requests from DHCP clients. Note that it should be
possible to configure the Secondary not to perform the automatic
switch-over.

The conditions under which a Secondary takes control of the bindings
database, e.g., the number of consecutive missing acknowledgments,
should be configurable in the Secondary by the DHCP administrator.

The Secondary records any changes it makes to the bindings database
while it has control. The Secondary continues to send DHCPPOLL
messages to the Primary. The DHCPPOLL messages also carry information
on the state of the Secondary Server.

To regain control of the bindings database, e.g., after the Primary
Server has recovered from a failure or from a partitioned network
condition, the Primary sends a DHCPCTLREQ message to the Secondary.
The Secondary stops answering DHCP client requests, and responds to
the Primary with a DHCPCTLRET message. After sending the DHCPCTLRET
message, the Secondary sends DHCPBNDUPD messages for each of the
changes it has made to the bindings database.
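The polling rule described earlier in this section (the Secondary
takes control after a configurable number of unanswered DHCPPOLL
messages, unless automatic switch-over is disabled) can be sketched as
follows. This is only an illustration: the class name PollMonitor, the
field names, and the default threshold of 3 are assumptions, not part
of the protocol.

```python
from dataclasses import dataclass


@dataclass
class PollMonitor:
    """Illustrative Secondary-side bookkeeping for DHCPPOLL/DHCPPRPL.

    All names and defaults here are hypothetical; the draft only says
    that the takeover conditions should be configurable.
    """
    max_missed: int = 3         # assumed administrator setting
    auto_takeover: bool = True  # automatic switch-over may be disabled
    missed: int = 0             # consecutive unanswered DHCPPOLLs
    in_control: bool = False    # True once the Secondary has taken over

    def on_reply(self) -> None:
        # A DHCPPRPL arrived, so the Primary is alive: reset the count.
        self.missed = 0

    def on_timeout(self) -> bool:
        # No DHCPPRPL arrived for the most recent DHCPPOLL.
        self.missed += 1
        if self.auto_takeover and self.missed >= self.max_missed:
            self.in_control = True
        return self.in_control
```

Control is deliberately not handed back in on_reply(): in the text
above, the Primary regains control through the DHCPCTLREQ exchange,
not merely by answering a poll.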
The Primary sends a DHCPBNDACK for each DHCPBNDUPD message it
receives. The Secondary completes the transfer of control by sending
a DHCPCTLACK message to the Primary as soon as all of its updates
have been acknowledged.

Note that the Primary SHOULD NOT send any DHCPBNDUPD messages while
synchronization is in progress with the Secondary.

Once the synchronization is completed, the Primary transitions into
NORMAL state and starts sending DHCPBNDUPD transactions for any
accumulated binding changes it may have.

5.  Duplicate address assignment scenarios

In the following two scenarios, the protocol could end up allocating
duplicate IP addresses, unless the measures recommended in Section 6
are taken:

Primary Server crash before "lazy" update: In the case where the
Primary Server sends an ACK to a client for a newly allocated IP
address and then crashes prior to sending the corresponding update to
the Secondary Server, the Secondary Server will have no record of the
IP address allocation. When the Secondary Server takes over, it may
well try to allocate that IP address to a different client. In the
case where the first client to receive the IP address is not on the
net at the time (while there is still time to run on its lease), an
ICMP echo (i.e., ping) will not prevent the Secondary Server from
allocating that IP address to a different client.

A more likely and subtle version of this problem is where the Primary
Server crashes after extending a client's lease time, and before
updating the Secondary with the new time using a lazy update. After
the Secondary takes over, if the client is not connected to the
network, the Secondary will believe the client's lease has expired
when, in fact, it has not. In this case as well, the IP address might
be reallocated to a different client while the first client is still
using it.
Network partition where servers can't communicate but each can talk
to clients: Several conditions are required for this situation to
occur. First, due to a network failure, the Primary and Secondary
Servers cannot communicate. As well, some of the DHCP clients must
be able to communicate with the Primary Server, and some of the
clients must now only be able to communicate with the Secondary
Server. When this condition occurs, both Primary and Secondary
Servers could attempt to allocate IP addresses for new clients from
the same pool of available addresses. At some point, then, two
clients will end up being allocated the same IP address. This will
cause potentially serious problems when the network failure that
created this situation is corrected.

The next section details how the Failover Protocol prevents either of
the above scenarios (and other related scenarios) from causing
duplicate IP address allocation.

6.  Duplicate Address Assignment Control

There are several ways that the Failover protocol avoids the
possibility of duplicate address assignment.

6.1.  Control of lease time

The key problem with lazy update is that when the primary server
fails after updating a client with a particular lease time and before
updating the secondary server, the secondary server will believe that
a lease has expired even though the client still retains a valid
lease on that IP address.

In order to handle this problem, a period of time known as the
"maximum delta lease interval" (MDLI) is defined and must be known to
both the primary and secondary servers. Proper use of this time
interval places an upper bound on the difference allowed between the
lease time provided to a DHCP client and the lease time known by the
secondary server.
So that the MDLI does not become the maximum lease time that the
primary can ever provide to a client, during a lazy update the
primary typically updates the secondary with a lease time that is
longer than the lease time previously given to the client.

In the case where the secondary needs to take over from the primary,
the secondary will not reallocate any IP addresses from one client to
a different client. When transitioning to the PARTNER-DOWN state
(where the secondary is allowed to reallocate IP addresses), the
secondary will wait the maximum-delta-lease-interval before
completing the state transition. Thus, any clients which have a lease
on an IP address with a lease time greater than that known by the
secondary will either have contacted the secondary during that time
or their lease will have expired.

This protocol requires a DHCP server to deal with several different
lease intervals and places specific restrictions on their
relationships. The purpose of these restrictions is to allow the
other server in the pair to be able to make certain assumptions in
the absence of an ability to communicate between servers.

The different lease times are:

o  desired client lease interval

   The desired client lease interval is the lease interval that the
   DHCP server would like to give to the DHCP client in the absence
   of any restrictions imposed by the Failover Protocol. Its
   determination is outside of the scope of this protocol. Typically
   this is the result of external configuration of a DHCP server.

o  actual client lease interval

   The actual client lease interval is the lease interval that the
   DHCP server gives out to the DHCP client. It may be shorter than
   the desired client lease interval (as explained below).
o  Primary Server lease interval

   The Primary Server lease interval is the interval after which the
   Primary Server believes that the DHCP client's lease will expire.

o  desired Secondary Server lease interval

   The desired Secondary Server lease interval is the interval that
   the Primary Server tells the Secondary Server is the one after
   which the lease will expire.

o  acknowledged Secondary Server lease interval

   The acknowledged Secondary Server lease interval is the interval
   the Secondary Server has most recently acknowledged. The key
   restriction (and guarantee) that the Primary Server makes with
   respect to lease intervals is that the actual client lease
   interval never exceeds the acknowledged Secondary Server lease
   interval (if any) by more than a fixed amount. This fixed amount
   is called the "maximum delta lease interval" (MDLI).

The MDLI MAY be configurable, but for correct server operation it
MUST be known to both the Primary and Secondary Servers.

The Primary Server MUST record in its state both the Primary Server
lease interval and the most recently acknowledged Secondary Server
lease interval. It is assumed that the desired client lease interval
can be determined through techniques outside of the scope of this
protocol.

The above lease time descriptions are written for the case where the
Primary server is operating and in communication with the Secondary
server. In the case where the Secondary server is operating out of
communication with the Primary server, the relationships must hold
in the other direction.
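The key restriction above can be sketched as a simple clamp on the
lease interval offered to a client. This is a minimal illustration
under stated assumptions: the function and parameter names are
hypothetical, all intervals are whole seconds, and the result is kept
one second below the bound so that it also satisfies a strict
"less than" reading of the restriction.

```python
def actual_client_lease(desired: int, acked_partner: int, mdli: int) -> int:
    """Return the lease interval (seconds) a server may give a client.

    desired        desired client lease interval
    acked_partner  remaining lease interval most recently acknowledged
                   by the other server (0 if none has been acknowledged)
    mdli           maximum delta lease interval

    Hypothetical helper: the draft mandates no particular algorithm.
    """
    # Stay one second under (acked_partner + mdli); this is an assumed
    # rounding choice, not something the draft prescribes.
    limit = acked_partner + mdli - 1
    return max(0, min(desired, limit))
```

For example, with an MDLI of one hour and no acknowledged interval
yet, a desired lease of three days is clamped to just under one hour;
once the other server has acknowledged an interval longer than the
desired lease, the desired lease is granted in full.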
The fundamental relationship among these times which MUST be
maintained is:

   actual client lease interval <
      ( acknowledged other server lease interval + MDLI )

The "acknowledged other server lease interval" is the acknowledged
secondary server lease interval for the Primary server, and it would
be the acknowledged primary server lease interval for the Secondary
server when it is operating out of contact with the Primary server.

DISCUSSION:

   This protocol mandates no particular detailed algorithms
   concerning these lease intervals, as long as the above fundamental
   relationship is preserved.

   In the interests of clarity, however, let's examine a specific
   example. The MDLI in this case is 1 hour. The desired client
   lease interval is 3 days. In operation this might work as
   follows:

   When a Primary Server makes an offer for a new lease on an IP
   address to a DHCP client, it determines the desired client lease
   interval (in this case, 3 days). It then examines the
   acknowledged Secondary lease interval (which in this case is
   zero). Since the actual client lease interval cannot be allowed to
   exceed the current Secondary lease interval by more than the MDLI,
   the offer made to the DHCP client (the actual client lease
   interval) is for (essentially) the MDLI, 1 hour.

   Once the Primary Server has performed the ACK to the DHCP client,
   it will update the Secondary Server with the lease information.
   However, the Secondary Server lease interval will be composed of
   the current actual client lease interval + (1.5 * desired client
   lease interval). Thus, the Secondary Server is updated with a
   lease interval of 4.5 days + 1 hour.

   When the Primary Server receives an ACK to its update of the
   Secondary Server's lease interval, it records that as the
   acknowledged Secondary Server lease interval.
   The Primary Server MUST ensure that the Secondary Server has
   received and recorded in its stable storage the Secondary Server
   lease interval.

   When the DHCP client attempts to renew at T1 (approximately one
   half hour from the start of the lease), the Primary Server again
   determines the desired client lease time, which is still 3 days.
   It then compares this with the remaining acknowledged Secondary
   Server lease interval (adjusting for the time passed since the
   Secondary Server was last updated), which is 4.5 days + 1/2 hour.
   The actual client lease interval can be set to the desired client
   lease interval, as it is less than the acknowledged Secondary
   lease interval.

   When the Primary DHCP server updates the Secondary DHCP server
   after the DHCP client's renewal ACK is complete, it will calculate
   the Secondary Server lease interval as the actual client lease
   interval (3 days this time) + 0.5 * the desired client lease
   interval (1.5 days). In this way, the Primary attempts to have the
   Secondary always "lead" the client in its understanding of the
   client's lease interval.

   Once the initial actual client lease interval of the MDLI is past,
   the protocol operates effectively like the DHCP protocol does
   today in its behavior concerning lease intervals. However, the
   guarantee that the actual client lease interval will never exceed
   the acknowledged Secondary Server lease interval by more than the
   MDLI allows full recovery from failures in lazy update.

6.2.  Controlled re-allocation of IP addresses

When the servers cannot communicate, neither server will allow an IP
address previously used by one client to be offered to a different
client.
As a corollary, during normal operations the primary server must
update the secondary server whenever a lease expires or an IP address
is released, and must receive acknowledgement of that update before
offering the expired or released IP address to a different client.

7.  Server States

The following server states are defined:

NORMAL state:

NORMAL state is the state used by a server when it can communicate
with the other server in the Primary-Secondary Server pair. When in
this state, the Primary responds to DHCP client requests, while the
Secondary does not.

COMMUNICATION-INTERRUPTED state:

A server goes into this state whenever it is unable to communicate
with the other server. Both the Primary and Secondary Servers can go
into this state, although the behavior changes that result are
different. Primary and Secondary Servers cycle automatically (without
administrative intervention) between NORMAL and
COMMUNICATION-INTERRUPTED state as the network connection between
them fails and recovers, or as the partner server cycles between
operational and non-operational. No duplicate IP address allocation
can occur while the servers cycle between these states. In this state
both servers may respond to DHCP client requests. When allocating new
IP addresses, each server allocates from a different pool. When
responding to renewal requests, each server will allow continued
renewal of a DHCP client's current lease on an IP address.

PARTNER-DOWN state:

PARTNER-DOWN state is a state either server can enter. Once a server
has entered NORMAL state, the PARTNER-DOWN state is entered only on
command of an external agency (typically an administrator of some
sort) or after the expiration of an externally configured minimum
safe-time after the beginning of COMMUNICATION-INTERRUPTED state.
When in this state, the server no longer assumes that the other
server could still be operational and servicing a different set of
clients, but instead assumes that it is the only server operating.
Only one server should be operating in this state at a time. The
server in this state will respond to DHCP client requests. It will
allow renewal of all outstanding leases on IP addresses, and will
allocate IP addresses from its own pool, and after a fixed period of
time, it will allocate IP addresses from the set of all available IP
addresses. The server will transition out of PARTNER-DOWN state after
automatic re-integration with the companion server is complete. This
automatic re-integration will typically be initiated by the restart
of the server which was down.

POTENTIAL-CONFLICT state:

This state indicates that the two servers are attempting to
reintegrate with each other, but at least one of them was running in
a state that did not guarantee automatic reintegration would be
possible. In POTENTIAL-CONFLICT state the servers may determine that
the same IP address has been offered and accepted by two different
DHCP clients.

RECOVER state:

This state indicates that the server has no information in its stable
storage. A server in this state will attempt to refresh its stable
storage from the other server.

SYNC state:

In this state, the Secondary Server attempts to synchronize its
stable storage with the Primary Server. Both the Primary and
Secondary may have information that the other lacks.

8.  Primary Server Operation

This section discusses the operation of the primary server using the
state transition diagram in Figure 8.2-1.

8.1.  Primary Server Initialization

When the Primary Server starts, there are three possibilities: it
has never started before and therefore has no record of any previous
state nor of any client binding information; it has started before
and has a record of a previous state and possibly of some client
binding information; it has started before, but failed
catastrophically, and now has no record of any previous state (nor of
any client binding information).

When the Primary Server starts, if it has any record of a previous
state, then if that state was NORMAL or COMMUNICATION-INTERRUPTED it
moves to COMMUNICATION-INTERRUPTED state. If that state was
PARTNER-DOWN or POTENTIAL-CONFLICT, then it moves to PARTNER-DOWN
state. If that state was RECOVER, then the Primary Server moves into
the RECOVER state.

If it has no record of any previous state, then either this is an
initial startup, or a recovery from a catastrophic failure where
stable storage and all client binding information was lost. The two
cases are distinguished by some external configuration indication to
the Primary Server that it is recovering from a catastrophic failure.

8.2.  Primary Server State Transitions

Figure 8.2-1 is the diagram of the Primary Server's state
transitions. The remainder of this section contains information
important to the understanding of that diagram.

The server stays in the current state until all of the actions
specified on the state transition are complete. If communication
fails during one of the actions, the server simply stays in the
current state and attempts a transition whenever the conditions for a
transition are later fulfilled.
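The startup rule of Section 8.1 can be sketched as a small lookup.
The function name and the catastrophic_recovery flag are hypothetical.
Two points are assumptions rather than statements of Section 8.1:
mapping an externally indicated catastrophic recovery to RECOVER
(suggested by Section 8.4), and returning None for a first-ever
startup, whose starting state Section 8.1 does not name.

```python
from typing import Optional


def primary_start_state(previous: Optional[str],
                        catastrophic_recovery: bool = False) -> Optional[str]:
    """Pick the Primary Server's state at startup (illustrative only).

    previous is the state recorded in stable storage, or None if no
    record exists.
    """
    if previous in ("NORMAL", "COMMUNICATION-INTERRUPTED"):
        return "COMMUNICATION-INTERRUPTED"
    if previous in ("PARTNER-DOWN", "POTENTIAL-CONFLICT"):
        return "PARTNER-DOWN"
    if previous == "RECOVER":
        return "RECOVER"
    if previous is None:
        # No record: an external indication tells the two cases apart.
        # Assumption: recovery maps to RECOVER; the initial-startup
        # state is not specified in Section 8.1, so None is returned.
        return "RECOVER" if catastrophic_recovery else None
    raise ValueError(f"unknown recorded state: {previous!r}")
```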
In the state transition diagram below, the "+" or "-" in the upper
right corner of each state is a notation about whether communication
is ongoing with the Secondary Server. The legend "responsive" or
"unresponsive" in each state indicates whether the Primary Server is
responsive to DHCP client requests in the respective state.

In the state transition diagram below, when communication is
reestablished between the Primary and Secondary Server, the Primary
server must record the state of the Secondary Server at the time
communication was reestablished.

If the state of the Secondary Server changes while communicating,
then the Primary Server moves through the communications-failed
transition, and into whatever state results. It then immediately
moves through whatever state transition is appropriate given the
current state of the Secondary Server.

DISCUSSION:

   The point of this technique is simplicity, both in explanation of
   the protocol and in its implementation. The alternative to this
   technique of memory of partner state and automatic state
   transition on change of partner state is to have every state in
   the following diagram have a state transition for every possible
   state of the partner. With the approach adopted, only the states
   in which communications are reestablished require a state
   transition for each possible partner state.

All state transitions of the Primary Server must be recorded in its
stable storage, and thus be available to the server after a server
restart.
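The technique described above (remember the partner's state, and on a
change of partner state pass through the communications-failed
transition before re-evaluating) can be sketched as follows. The
class and method names are hypothetical, and the transition choices
are one reading of Figure 8.2-1 together with Sections 7, 8.3, 8.4
and 8.6; the synchronization actions and the MDLI wait in RECOVER are
omitted, so this is not a complete or normative state machine.

```python
from typing import Optional

NORMAL = "NORMAL"
COMM_INT = "COMMUNICATION-INTERRUPTED"
PARTNER_DOWN = "PARTNER-DOWN"
CONFLICT = "POTENTIAL-CONFLICT"
RECOVER = "RECOVER"


class PrimaryFsm:
    """Illustrative Primary Server state tracker (assumed reading)."""

    def __init__(self, state: str) -> None:
        self.state = state
        # Partner state recorded when communication was reestablished.
        self.partner: Optional[str] = None

    def comm_failed(self) -> None:
        self.partner = None
        if self.state == NORMAL:
            self.state = COMM_INT
        elif self.state == CONFLICT:
            self.state = PARTNER_DOWN

    def comm_ok(self, partner: str) -> None:
        self.partner = partner
        if self.state == RECOVER:
            # Sync from the Secondary and the MDLI wait are omitted.
            self.state = NORMAL
        elif self.state == PARTNER_DOWN:
            ok = partner in (RECOVER, NORMAL)
            self.state = NORMAL if ok else CONFLICT
        elif self.state == COMM_INT:
            ok = partner in (NORMAL, COMM_INT, RECOVER)
            self.state = NORMAL if ok else CONFLICT

    def partner_state_seen(self, partner: str) -> None:
        # A changed partner state is handled as "communications failed"
        # followed at once by "communications OK" with the new state.
        if self.partner is not None and partner != self.partner:
            self.comm_failed()
            self.comm_ok(partner)
```

With this structure only comm_ok() needs a case per partner state,
which is the simplification the DISCUSSION above argues for.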
   Previous Primary State and resulting starting state:

      NORMAL or COMMUNICATION-INTERRUPTED -> COMMUNICATION-INTERRUPTED
      RECOVER                             -> RECOVER
      PARTNER-DOWN or POTENTIAL-CONFLICT  -> PARTNER-DOWN

   RECOVER (-, unresponsive)

      Comm. OK, Sec. State RECOVER:
         note possible error                       -> NORMAL
      Comm. OK, Sec. State all others:
         Sec->Pri sync, wait MDLI from failure     -> NORMAL

   PARTNER-DOWN (-, responsive)

      Comm. OK, Sec. State RECOVER or NORMAL:
         Pri->Sec sync                             -> NORMAL
      Comm. OK, Sec. State all others              -> POTENTIAL-CONFLICT

   POTENTIAL-CONFLICT (+, responsive)

      Resolve conflict                             -> NORMAL
      Comm. failed                                 -> PARTNER-DOWN

   NORMAL (+, responsive)

      Comm. failed                  -> COMMUNICATION-INTERRUPTED
      External command              -> PARTNER-DOWN

   COMMUNICATION-INTERRUPTED (-, responsive)

      External command or "Safe Period" expiration -> PARTNER-DOWN
      Comm. OK, Sec. State NORMAL, COMM. INT. or RECOVER:
         Pri<->Sec sync                            -> NORMAL
      Comm. OK, Sec. State all others              -> POTENTIAL-CONFLICT

Figure 8.2-1: Primary Server state diagram.

8.3.  Primary Server in PARTNER-DOWN state

When it is in PARTNER-DOWN state, the Primary Server operates largely
as does a normal DHCP server, with none of the special algorithms
described below. In PARTNER-DOWN state the Primary Server MUST
respond to DHCP client requests.

Any available IP address tagged as belonging to the Secondary Server
(at entry to PARTNER-DOWN state) MUST NOT be used until the MDLI
beyond the entry into PARTNER-DOWN state has elapsed.
The Primary Server MUST NOT allocate an IP address to a DHCP client
different from that to which it was allocated at the entrance to
PARTNER-DOWN state until the MDLI beyond its expiration time has
elapsed. If this time would be earlier than the current time plus
the MDLI, then the current time plus the MDLI is used.

Two options exist for lease times, with different ramifications
flowing from each.

If the Primary Server wishes the Failover Protocol to protect it from
loss of stable storage in any state, then it should ensure that the
MDLI-based lease time restrictions in Section 6.1 are maintained,
even in PARTNER-DOWN state.

If the Primary Server wishes to forgo the protection of the Failover
Protocol in the event of loss of stable storage, then it need
recognize no restrictions on actual client lease times while in
PARTNER-DOWN state.

The Primary Server MUST poll the Secondary Server and attempt to
establish communications and synchronization with it.

Once the Primary succeeds in contacting the Secondary Server, the
Primary examines the state of the Secondary Server. If the state of
the Secondary Server is RECOVER or NORMAL, then both servers have
been running in such a way that duplicate IP address allocations were
inhibited. In this case, the Primary Server updates the Secondary
Server with its client binding information, and moves into the NORMAL
state.

Once contact has been established, if the state of the Secondary
Server is anything other than RECOVER or NORMAL then the Primary
Server moves into the POTENTIAL-CONFLICT state.

8.4.  Primary Server in RECOVER state

When the Primary Server is initialized in the RECOVER state it
expects to refresh its stable storage from an existing Secondary
Server. In this state the Primary Server MUST NOT respond to DHCP
client requests.
When the Primary Server succeeds in contacting the Secondary Server,
if it determines that the Secondary Server is itself in the RECOVER
state (which indicates that the Secondary Server has no existing
client binding information), the Primary Server will move directly
into NORMAL state after signaling some kind of an error (since some
person had to explicitly start the Primary Server in RECOVER state to
refresh its lost client binding information from the Secondary, and
the Secondary had no state).

If the Primary Server determines that the Secondary Server is in any
state other than RECOVER, then the Secondary Server has some client
binding information that the Primary Server needs before it moves
into the NORMAL state. The Primary Server will attempt to refresh
its state from the Secondary Server, and it will remain in the
RECOVER state until it is successful in doing so.

The Primary Server MUST remain in RECOVER state until a period of at
least the MDLI has passed since the Primary Server was known to have
failed. This allows any clients holding IP addresses that were
allocated by the Primary Server prior to the loss of its client
binding information in stable storage either to contact the Secondary
Server or to have their leases time out.

DISCUSSION:

   The actual requirement on this wait period in RECOVER is that it
   start when the Primary Server went down, not necessarily when it
   came back up. If the time when the Primary Server failed is
   known, then it could be communicated to the recovering server, and
   the wait period could be reduced to the MDLI less the difference
   between the current time and the time the server failed. In this
   way, the waiting period could be minimized.

8.5.  Primary Server in NORMAL state

When in NORMAL state, the Primary Server takes the following actions
to implement the Safe Failover Protocol:

o  Lease Time Calculations

   As discussed in Section 6.1, "Control of lease time", the lease
   interval given to a DHCP client can never be more than the maximum
   delta lease interval greater than the acknowledged Secondary
   Server lease interval.

   As long as the Primary Server adheres to this constraint, the
   specifics of the lease intervals that it gives to either the DHCP
   client or the Secondary DHCP server are implementation dependent.
   One possible approach is shown in Section 6.1, but that particular
   approach is in no way required by this protocol.

o  Lazy Update of Secondary Server

   After an ACK of an IP address binding, the Primary Server attempts
   to update the Secondary with the binding information. The lease
   time used in the update of the Secondary MUST be at least that
   given to the DHCP client in the DHCPACK. It MAY, however, be
   longer.

o  Reallocation of IP Addresses Between Clients

   Whenever a client binding is released, a DHCPBNDUPD message must
   be sent to the Secondary Server, setting the binding state to
   RELEASED. However, until a DHCPBNDACK is received for this
   message, the IP address cannot be allocated to another client.

8.6.  Primary Server in COMMUNICATION-INTERRUPTED Mode

When in COMMUNICATION-INTERRUPTED state the Primary Server operates
in such a way that correct operation is ensured even if the Secondary
Server is still up and operational, but unable to communicate with
the Primary Server. When communications are reestablished between the
Primary and Secondary Servers, if both are still in
COMMUNICATION-INTERRUPTED state, then the re-integration of their
operation will proceed automatically and without human intervention.
   The protocol is designed to ensure that reintegration will proceed
   in an error-free manner and that no actions taken by either server
   while in COMMUNICATION-INTERRUPTED state will cause problems during
   reintegration.

   The Primary Server operates in COMMUNICATION-INTERRUPTED state as it
   does in NORMAL state.

   However, since it cannot communicate with the Secondary in this
   state, the acknowledged-Secondary-lease-time will not be updated in
   any new bindings.  This is likely to eventually cause the
   actual-client-lease-times to be the current-time plus the MDLI
   (unless this is greater than the desired-client-lease-time).

   The Primary Server can simply queue updates to the Secondary on
   communication interruption and stay in the NORMAL state.  If, at the
   time communication with the Secondary is reestablished, the
   Secondary remains in the NORMAL state as well, then the queued
   updates for the Secondary will simply be processed.

   COMMUNICATION-INTERRUPTED state for the Primary Server is a signal
   that it has stopped queuing updates to the Secondary, and is able to
   respond to a variety of possible Secondary states.

   It is anticipated that some alarm condition would be raised upon the
   transition from NORMAL state to COMMUNICATION-INTERRUPTED state.
   Once the Primary Server has been in COMMUNICATION-INTERRUPTED state
   for a period equal to the safe-period, then it can (if configured to
   do so) transition into the PARTNER-DOWN state.  An external command
   may also force a transition to PARTNER-DOWN state.

9.  Secondary Server Operation

   The Secondary Server responds to DHCP client requests only in the
   PARTNER-DOWN and COMMUNICATION-INTERRUPTED states.

9.1.  Secondary Server Initialization

   When the Secondary Server starts, there are three possibilities: it
   has never started before and therefore has no record of any previous
   state nor of any client binding information; it has started before
   and has a record of a previous state and possibly of some client
   binding information; it has started before, but failed
   catastrophically, and now has no record of any previous state (nor
   of any client binding information).

   When the Secondary Server starts, if it has any record of a previous
   state, then if that state was NORMAL, COMMUNICATION-INTERRUPTED, or
   SYNC, it moves to COMMUNICATION-INTERRUPTED state.  If that state
   was PARTNER-DOWN or POTENTIAL-CONFLICT, then it moves to
   PARTNER-DOWN state.  In all other cases (both other previous states
   and the cases where there is no record of a previous state), the
   Secondary Server moves into the RECOVER state.

9.2.  Secondary Server State Transitions

   The server stays in the current state until all of the actions
   specified on the state transition are complete.  If communication
   fails during one of the actions, the server simply stays in the
   current state and attempts a transition whenever the conditions for
   a transition are later fulfilled.

   In the state transition diagram below, the "+" or "-" in the upper
   right corner of each state is a notation about whether communication
   is ongoing with the Primary Server.  The legend "responsive" or
   "unresponsive" in each state indicates whether the Secondary Server
   is responsive to DHCP client requests in the respective state.

   In the state transition diagram below, when communication is
   reestablished between the Secondary and Primary Server, the
   Secondary Server must record the state of the Primary Server when
   the communication was reestablished.
   If the state of the Primary Server changes while communicating,
   then the Secondary Server moves through the communications-
   interrupted transition, and into whatever state results.  It then
   immediately moves through whatever state transition is appropriate
   for the current state of the Primary Server.

   All state transitions of the Secondary Server must be recorded in
   its stable storage, and thus be available to the server after a
   server restart.

   Previous Secondary State:

     NORMAL  RECOVER                 PARTNER DOWN
     COMM. INT.                   POTENTIAL CONFLICT
     SYNC       |                          |
   +---+        V                          V
   |    +----------------+        +-----------------+
   |    |  RECOVER     - |        | PARTNER DOWN  - |<-----+
   |    | (unresponsive) |        |  (responsive)   |      |
   |    +----------------+        +-----------------+      |
   |         |     |                    |        ^         |
   |     Comm. OK  |                Comm. OK     |         |
   |   Pri. State: |               Pri. State: Comm.       |
   |         |     |             V All Others  Failed      |
   |         |  RECOVER     +<---+      V        |         |
   |         |     |        |       +--------------+       |
   |         |     |    Comm. OK    | POTENTIAL  + |       |
   |        All    |   Pri. State:  |   CONFLICT   |       |
   |      Others   |     RECOVER    |(unresponsive)|<----- | ----+
   |         |    Note      |       +--------------+       |     |
   |         |   Poss.  Sec->Pri         |                 |     |
   |         V   Error    Sync.  Resolve Conflict          |     |
   |     Pri->Sec  |        V            V                 |     |
   |       Sync    |       +-----------------+             |     |
   |         V     V       | NORMAL        + |--External-->+     |
   |         +-----++----->| (unresponsive)  |   Command   |     |
   |                ^      +-----------------+             |     |
   |            Pri<->Sec      |         ^                 |     |
   |              Sync         | Start Alloc Timer         |     |
   |                |          |     Sec->Pri              |     |
   |     +--------------+      |       Sync                |     |
   |     |            + |----->+         |             External  |
   |     |     SYNC     |    Comm.   Comm. OK           Command  |
   |     |(unresponsive)|   Failed  Pri. State:           or     |
   |     +--------------+      |      RECOVER      "Safe Period" |
   |        ^                  V         |            expiration |
   |        |              +------------------+            |     |
   |    Comm. OK           | COMMUNICATIONS - |----------->+     |
   |   Pri. State:         |   INTERRUPTED    |   Comm. OK       |
   |     NORMAL------------|   (responsive)   |---Pri. State:----+
   |   COMM. INT.          +------------------+   All Others
   |                                ^
   +--------------------------------+

          Figure 9.2-1: Secondary Server State Diagram.

9.3.  Secondary Server in RECOVER state

   The Secondary DHCP server comes up in the RECOVER state when it has
   no record of any previous state (or that previous state was
   RECOVER).

   It stays in this state until it establishes communication with the
   Primary Server, and is unresponsive to DHCP client requests in this
   state.  Essentially it is idle until it can contact the Primary
   Server.

   When it establishes communication with the Primary Server, it
   attempts to load its client binding database from that of the
   Primary Server using the techniques specified in Section 6.

   Once the Secondary Server's client binding database is refreshed
   from that of the Primary, the Secondary Server moves into NORMAL
   state.

9.4.  Secondary Server in NORMAL state

   In NORMAL state, the Secondary Server receives state updates from
   the Primary Server in DHCPBNDUPD messages.  It records these in its
   client binding database in stable storage and then sends the
   corresponding DHCPBNDACK message to the Primary Server.

   While in NORMAL state, the Secondary Server MUST also acquire a
   series of IP addresses from the Primary Server to be used to satisfy
   DHCPDISCOVER requests from DHCP clients when in
   COMMUNICATION-INTERRUPTED state.  See Section 2.2.2 for details of
   this acquisition process.

   The Secondary Server periodically polls the Primary Server with the
   DHCPPOLL message.  If it fails to receive a DHCPPRPL message in
   reply after a configured number of retries or some administratively
   determined time, the Secondary Server transitions into
   COMMUNICATION-INTERRUPTED state.  Both the DHCPPOLL and DHCPPRPL
   messages carry the current status of the sender.
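
   The NORMAL-state behavior described above (record each DHCPBNDUPD
   before acknowledging it, and fall back to COMMUNICATION-INTERRUPTED
   once polling fails) can be sketched as follows.  This is only an
   illustration, not part of the protocol: the class and method names,
   the in-memory dictionary standing in for stable storage, and the
   max_poll_retries setting are all assumptions made for the example,
   and the message formats are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    NORMAL = "NORMAL"
    COMMUNICATION_INTERRUPTED = "COMMUNICATION-INTERRUPTED"


@dataclass
class SecondaryInNormal:
    """Hypothetical model of a Secondary Server that starts in NORMAL."""

    max_poll_retries: int  # assumed configuration knob, not a protocol field
    state: State = State.NORMAL
    missed_polls: int = 0
    # Stand-in for the client binding database in stable storage.
    bindings: dict = field(default_factory=dict)

    def on_bndupd(self, ip: str, client_id: str, lease_expiry: float) -> str:
        # Record the binding first; only then acknowledge it.
        self.bindings[ip] = (client_id, lease_expiry)
        return "DHCPBNDACK"

    def on_poll_result(self, got_reply: bool) -> State:
        # A DHCPPRPL resets the failure count; a missed reply increments it.
        if got_reply:
            self.missed_polls = 0
        else:
            self.missed_polls += 1
            # Initial poll plus the configured retries have all failed.
            if self.missed_polls > self.max_poll_retries:
                self.state = State.COMMUNICATION_INTERRUPTED
        return self.state
```

   With max_poll_retries set to 2, this sketch leaves NORMAL on the
   third consecutive unanswered DHCPPOLL (the initial poll plus two
   retries); any DHCPPRPL received in between resets the count.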

   If an external command is received by the Secondary Server, it can
   move from NORMAL to PARTNER-DOWN state directly.  Such a command
   might be sent when the Primary Server was removed from service, and
   an operator wanted the Secondary Server to take over immediately and
   completely from the Primary Server.  (Note that the Secondary Server
   takes over from the Primary Server when in COMMUNICATION-INTERRUPTED
   state, but less completely than in PARTNER-DOWN state.)

9.5.  Secondary Server in COMMUNICATION-INTERRUPTED state

   When in COMMUNICATION-INTERRUPTED state the Secondary Server
   operates in such a way that correct operation is ensured even if the
   Primary Server is still up and operational, but unable to
   communicate with the Secondary Server.  When communications are
   reestablished between the Primary and Secondary Servers, if both are
   still in COMMUNICATION-INTERRUPTED state, then the re-integration of
   their operation will proceed automatically and without human
   intervention.  The protocol is designed to ensure that reintegration
   will proceed in an error-free manner and that no actions taken by
   either server while in COMMUNICATION-INTERRUPTED state will cause
   any conflicts to occur during re-integration.

   In COMMUNICATION-INTERRUPTED state, the Secondary Server responds to
   DHCP client requests.

   When processing a DHCPREQUEST from a DHCP client, the Secondary
   Server MUST ensure that the client-lease-time is never more than the
   maximum-delta-lease-interval from the current-time, independent of
   the desired-client-lease-time.

   When processing a DHCPRELEASE request from a DHCP client or the
   expiration of a lease, the Secondary Server must not reallocate the
   IP address to a different client.
   If the same client subsequently performs a DHCPDISCOVER request, the
   Secondary Server SHOULD offer it the previously used IP address.

   When processing a DHCPDISCOVER request from a DHCP client, the
   Secondary MUST allocate IP addresses from the list of IP addresses
   that it acquired from the Primary Server while in NORMAL state (see
   Section 9.4).  When it exhausts this list, it MUST stop responding
   to DHCPDISCOVER requests (except those it can satisfy by offering
   expired or released IP addresses to their previously bound clients).

   The Secondary Server MUST continue to send DHCPPOLL messages to the
   Primary Server when in COMMUNICATION-INTERRUPTED state.  If it
   receives a DHCPPRPL message in reply, the Secondary Server
   determines the state of the Primary Server.  If the Primary Server
   is in NORMAL or COMMUNICATION-INTERRUPTED state, then the Secondary
   Server moves into the SYNC state.

   If, however, the Primary Server is in RECOVER state, then the
   Secondary Server updates the Primary Server with its known client
   binding information, and moves into NORMAL state upon completion of
   that update.

   If instructed to by an outside agency (e.g., an administrator), the
   Secondary Server SHOULD move into PARTNER-DOWN state.  Once the
   Secondary Server has been in COMMUNICATION-INTERRUPTED state for a
   period equal to the safe-period, then it may (if configured to do
   so) transition into the PARTNER-DOWN state in the absence of an
   external command.

9.6.  Secondary Server in SYNC state

   The Secondary Server does not respond to DHCP client requests when
   in SYNC state.

   DISCUSSION:

      This is the entire reason for this state's existence; otherwise
      the activities specified for this state could happen as part of a
      state transition from the COMMUNICATION-INTERRUPTED state to the
      NORMAL state.
      However, in the COMMUNICATION-INTERRUPTED state the Secondary
      Server responds to DHCP client requests.  Having the Secondary
      Server respond to DHCP client requests during the synchronization
      process (and thus taking actions requiring further
      synchronization) seemed like a bad idea.

   The Secondary Server synchronizes its information with the Primary
   Server while in SYNC state.  Both Primary and Secondary Servers may
   have information the other lacks because of operations performed
   while communications were interrupted.

   During the synchronization process, the Secondary Server continues
   to poll the Primary Server with DHCPPOLL messages.  If it fails to
   receive a reply, it moves back into COMMUNICATION-INTERRUPTED state.

   When synchronization is complete, the Secondary Server moves into
   NORMAL state.

9.7.  Secondary Server in PARTNER-DOWN state

   The Secondary Server responds to DHCP client requests when in
   PARTNER-DOWN state.

   Any available IP address which does not belong to the private pool
   established by the Secondary Server (at entry to PARTNER-DOWN state)
   MUST NOT be used until a period equal to the MDLI has elapsed since
   the entry into PARTNER-DOWN state.

   The Secondary Server MUST NOT allocate an IP address to a DHCP
   client different from that to which it was allocated at the entrance
   to PARTNER-DOWN state until the MDLI beyond its expiration time has
   elapsed.  If this time would be earlier than the current time plus
   the MDLI, then the current time plus the MDLI is used.

   Two options exist for lease times, with different ramifications
   flowing from each.

   If the Secondary Server wishes the Failover Protocol to protect it
   from loss of stable storage in any state, then it should ensure that
   the MDLI-based lease time restrictions in Section 6.1 are
   maintained, even in PARTNER-DOWN state.
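
   A minimal sketch of this first option follows.  It assumes one
   reading of the Section 6.1 restriction as summarized in Sections
   8.5 and 8.6: a lease may not extend more than the MDLI beyond the
   lease time the partner last acknowledged, or beyond the current
   time if that acknowledged time has already passed.  The function
   and parameter names are hypothetical, and all times are plain
   seconds.

```python
def clamp_lease_expiry(now: float,
                       desired_lease: float,
                       acked_partner_expiry: float,
                       mdli: float) -> float:
    """Return the latest expiry time a server may give a client.

    Assumption: the bound is MDLI past the partner's acknowledged
    expiry, measured from `now` instead once that expiry is in the past.
    """
    bound = max(acked_partner_expiry, now) + mdli
    # Never hand out more than the client asked for, nor more than the bound.
    return min(now + desired_lease, bound)
```

   For example, with an MDLI of one hour and no recent acknowledgment
   from the partner, a client asking for a one-day lease would receive
   one hour in this sketch, while a client asking for ten minutes
   would receive the ten minutes it asked for.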

   If the Secondary Server wishes to forgo the protection of the Safe
   Failover Protocol in the event of loss of stable storage, then it
   MAY recognize no restrictions on actual client lease times while in
   PARTNER-DOWN state.

   The Secondary Server continues to poll the Primary Server with
   DHCPPOLL messages.  If the Secondary Server receives a reply, and
   the Primary Server is in the RECOVER state, the Secondary Server
   updates the Primary Server with all of the Secondary's client
   binding information, and then moves into the NORMAL state.

   If communications with the Primary Server are reestablished, and the
   Primary Server is in any state other than RECOVER, the Secondary
   Server moves into the POTENTIAL-CONFLICT state (as does the Primary
   Server).

9.8.  Secondary Server in POTENTIAL-CONFLICT state

   The Secondary Server enters POTENTIAL-CONFLICT state when the
   combination of its state and that of the Primary indicates that a
   potential conflict of IP address allocation has occurred.  There is
   no guarantee that such a conflict has occurred -- just the
   possibility.  In this state each server compares its client binding
   information with that of the other server and any conflicts are
   resolved in an implementation-dependent manner.

   When (and if) the resolution process completes, each server moves
   into the NORMAL state.

10.  Safe Period

   Due to the restrictions imposed on each server while in
   COMMUNICATION-INTERRUPTED state, long-term operation in this state
   is not feasible for either server.
   One reason that this state exists at all is to allow the servers to
   easily survive transient network communications failures of a few
   minutes to a few days (although the actual time periods will depend
   a great deal on the DHCP activity of the network in terms of arrival
   and departure of DHCP clients on the network).

   Eventually, when the servers are unable to communicate, they will
   have to move into a state where they can no longer re-integrate
   without some possibility of a duplicate IP address allocation.
   There are two ways that they can move into this state (known as
   PARTNER-DOWN).

   First, they can be informed by external command that the partner
   server is, indeed, down.  In this case, there is no difficulty in
   moving into the PARTNER-DOWN state since it is an accurate
   reflection of reality and the protocol has been designed to operate
   correctly (even during reintegration) if, when in PARTNER-DOWN
   state, the partner is, indeed, down.

   The second way applies when the servers are running unattended for
   extended periods; in this case the option is provided to configure
   a "safe-period" into each server.  This OPTIONAL safe-period is the
   period after which either the Primary or Secondary Server will
   automatically transition to PARTNER-DOWN from
   COMMUNICATION-INTERRUPTED state.  If this transition is completed
   and the partner is not down, then the possibility of duplicate IP
   address allocations will exist.

   The goal of the "safe-period" is to allow network operations staff
   some time to react to a server moving into COMMUNICATION-INTERRUPTED
   state.
   During the safe-period the only requirement is that the network
   operations staff determine whether both servers are still running
   and, if they are, either fix the network communications failure
   between them or take one of the servers down before the expiration
   of the safe-period.

   The length of the safe-period is installation dependent, and depends
   in large part on the number of unallocated IP addresses within the
   subnet address pool and the expected frequency of arrival of
   previously unknown DHCP clients requiring IP addresses.  Many
   environments should be able to support safe-periods of several days.

   During this safe period, either server will allow renewals from any
   existing client.  The only limitation concerns the need for IP
   addresses for the DHCP server to hand out to new DHCP clients and
   the need to re-allocate IP addresses to different DHCP clients.

   The number of "extra" IP addresses required is equal to the expected
   total number of new DHCP clients encountered during the safe period.
   This is dependent only on the arrival rate of new DHCP clients, not
   the total number of outstanding leases on IP addresses.

   In the unlikely event that a relatively short safe period of an hour
   is all that can be used (given a dearth of IP addresses or a very
   high arrival rate of new DHCP clients), even that can provide
   substantial benefits in allowing the DHCP subsystem to ride through
   minor problems that could occur and be fixed within that hour.  In
   these cases, no possibility of duplicate IP address allocation
   exists, and re-integration after the failure is solved will be
   automatic and require no operator intervention.

11.  Open Issues

   A number of details remain to be worked out.  They are as follows:

   1. Level of Agreement and Completion

      This draft is incomplete in two senses.
      First, none of the authors agree with everything written, and
      quite a number of issues remain to be worked out among the
      various authors (to say nothing about the rest of the community).
      Second, this draft is not yet complete enough to support creation
      of inter-operable implementations.

      However, we believe that even though this draft is very much a
      work in progress, there is value in sharing it with the rest of
      the DHCP community in its current form.

   2. Failover Port

      We need to resolve whether the Failover protocol runs on the same
      port as the DHCP protocol or on a different one.  In the
      interests of allowing implementation of the Failover protocol by
      a different process or sub-process, having it use a different
      port seems reasonable.

   3. High Level Operations

      While the detailed operations are beginning to come together, the
      higher level operations (like reintegration) are, as yet,
      incompletely specified.  This will be rectified in a later
      revision.

   4. Option Spaces

      The draft currently reflects some rather fuzzy goals of using
      DHCP options where they apply but also defining new options.  It
      uses the "user defined option space" for this, which is probably
      not a good idea.  Perhaps the DHCP Panel will produce a larger
      option space in which all of these options can be defined, or
      perhaps (as it is written in the draft) this protocol will just
      have to define entirely unique options.

   5. Subnet Level Granularity

      This protocol talks about a server being in one state or another;
      however, the desire is for this protocol to operate independently
      in each address pool for which a primary and secondary server are
      defined.  In this way, the "server" state really refers to the
      "subnet" state.
      Once the protocol is validated, the editing work to make it
      operate at subnet granularity will be performed.

   6. Secondary Server Communications with DHCP Clients

      There are two situations where we may want to allow the secondary
      server to communicate with DHCP clients even though the secondary
      can communicate with the primary and would normally be
      unresponsive to DHCP client requests.

      The first situation which deserves consideration is where the
      secondary has given a DHCP client a lease on an IP address when
      it was not able to communicate with the primary, and then
      subsequently the secondary becomes able to communicate with the
      primary.  When the client unicasts its DHCPREQUEST to the
      secondary to renew its lease, the secondary will not be able to
      communicate with the client (as this protocol is defined).
      Should we allow the secondary to extend the lease for the DHCP
      client and then inform the primary of that extension using the
      DHCPBNDUPD message in the same way as the primary uses that
      message?

      The second situation arises where a client can only communicate
      with the secondary due to some network failure, but the primary
      and secondary server can communicate.  As written, the protocol
      will not allow the secondary to offer a lease to the DHCP client,
      but it would be straightforward to modify the protocol to allow
      the secondary to do so.  The only difficult part of this change
      to the protocol would be to suggest how the secondary would know
      that the DHCP client could talk only to the secondary.  But,
      given that if the DHCP primary could talk to the DHCP client, the
      secondary would expect to hear about it in DHCPBNDUPD messages at
      some point, the absence of such messages could be used as a
      signal to communicate to the DHCP client in question.

   7. UDP or TCP

      There has been much debate about the utility of using UDP for the
      failover protocol, since it doesn't supply guaranteed delivery.
      Certainly rebuilding TCP out of UDP would be a mistake.  Some
      factors to consider in this debate are as follows:

      First, it is important to recognize that mere receipt of a packet
      by the other server in the pair (e.g., receipt of a DHCPBNDUPD
      packet by the secondary server) is not sufficient for the primary
      to update its own bindings database with new information about
      what the secondary knows.  In all cases of transfers of bindings
      information, the receiver of a DHCPBNDUPD message MUST update its
      own stable storage prior to replying with a DHCPBNDACK message
      (except in the marginal case where all of the updates are
      rejected).  An action is required by the receiving server and an
      explicit ACK is needed by the sending server to ensure the
      integrity of the protocol.  So, just knowing that the other
      server has received a Failover protocol packet is not
      intrinsically interesting.

      Second, the DHCP protocol, both the client and server side, is
      being implemented in progressively smaller and smaller machines.
      While this progression is most evident in DHCP clients, there
      exist implementations today of DHCP servers embedded in devices
      that are by no stretch of the imagination traditional "servers"
      running mainstream operating systems.  In many ways, the Failover
      protocol is very well suited to such devices.  Adding additional
      protocol infrastructure requirements to implement the Failover
      protocol could easily prevent its implementation in devices that
      in some ways need it most.

      Third, there are only a few cases where the Failover protocol
      requires guaranteed delivery of packets.  In particular, the
      normal Primary to Secondary DHCPBNDUPD message does not have to
      be delivered reliably.
      The consequences of lost DHCPBNDUPD messages are handled by the
      use of the MDLI, for the simple reason that since these messages
      are "lazy", they may not get delivered because of a server
      failover prior to their transmission.  Given that the protocol is
      robust in the face of loss of either a DHCPBNDUPD message or a
      DHCPBNDACK message, a technique known as "fire and forget" may be
      used with this protocol and two cooperating implementations.  If
      the DHCPBNDACK message contains all of the information originally
      in the DHCPBNDUPD message, then the DHCPBNDUPD message may be
      transmitted and forgotten by the sending server (typically the
      primary).  When and if the secondary receives the DHCPBNDUPD and
      replies with a DHCPBNDACK message and the primary receives it,
      the primary will update its stable storage with a new picture of
      what the secondary knows about the lease time.  If either of
      these messages is lost, the only downside is that the DHCP client
      associated with the binding in question may receive a shorter
      lease for one lease period than it would otherwise.  This "fire
      and forget" technique could substantially ease both the
      complexity of implementation and memory requirements of an
      implementation of the Failover protocol, especially where two
      servers were communicating over a very slow link.

12.  Acknowledgments

   Ralph Droms started it all, by sketching out an initial interserver
   draft that embodied ideas from several past IETF meetings.  In that
   draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
   Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

   Kim Kinnear and Bob Cole each extended that draft, separately and
   then together, until they created an interserver draft that
   supported any number of servers.
   The complexity of that approach was just too great, and led to a
   much simpler approach embodied in the first Failover draft by Greg
   Rabil, Mike Dooley, Arun Kapur, and Ralph Droms.  This draft posited
   only two servers -- a primary and a secondary.  Kim Kinnear then
   wrote the Safe Failover draft to layer on top of the Failover draft
   and increase its robustness in the face of certain rare network
   failures.  At the spring 1998 IETF meeting in LA, the DHC working
   group said that they wanted a merged Failover and Safe Failover
   draft.  Steve Gonczi and Bernie Volz stepped up and produced the raw
   material for such a merged draft, along with a new message format
   designed around DHCP options and other extensions and
   clarifications.  Kim Kinnear edited their work into draft format and
   made other changes, and that is what you have in your hands.

   Many people have reviewed the various drafts that went into this
   result.  At American Internet, ideas have been contributed by Mark
   Stapp, Brad Parker, and Ellen Garvey.  Glenn Waters of Bay Networks
   contributed ideas and enthusiasm to make a Failover protocol that
   was both "safe" and "lazy".

13.  References

   [1]  Droms, R., "Dynamic Host Configuration Protocol", RFC 2131,
        March 1997.

   [2]  Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
        Extensions", RFC 2132, March 1997.

   [3]  Rabil, G., Dooley, M., Kapur, A., Droms, R., "DHCP Failover
        Protocol", draft-ietf-dhc-failover-00.txt.

   [4]  Gudmundsson, O., "Security Architecture for DHCP",
        draft-ietf-dhc-security-arch-00.txt.

14.  Authors' information

   Ralph Droms
   323 Dana Engineering
   Bucknell University
   Lewisburg, PA 17837

   Phone: (717) 524-1145
   EMail: droms@bucknell.edu

   Greg Rabil, Mike Dooley, Arun Kapur
   Quadritek Systems, Inc.
   10 Valley Stream Parkway, Suite 240
   Malvern, PA 19355

   Phone: (800) 208-2747
   EMail: grabil@quadritek.com
          mdooley@quadritek.com
          akapur@quadritek.com

   Kim Kinnear
   American Internet Corporation
   4 Preston Ct.
   Bedford, MA 01730-2334

   Phone: (781) 276-4587
   EMail: kinnear@american.com

   Steve Gonczi, Bernie Volz
   Process Software Corporation
   959 Concord St.
   Framingham, MA 01701

   Phone: (508) 879-6994
   EMail: gonczi@process.com
          volz@process.com